Home > PHP 6 and MySQL 5: Visual QuickPro Guide (3 ed.) > Topic

better database design: memory limits, rows and tables

Posted by goingcrazy 
This forum is now read only. You can not log in or make any changes. Please use the new forum linked above.
As of March 14, 2011, this forum has been replaced and is permanently set to read-only mode. This means you can view any existing post but cannot open new posts or reply to existing ones. The new forum is located athttp://www.larryullman.com/forums/If you need to post a message, you'll need to register there. If you have to post in order to follow up on a thread started here, well, that's going to be a bit of a pain. I would recommend just posting the appropriate information, or copying the original text from here, or linking to your thread here. I apologize for the inconvenience but this purge will make for better forums in the long run. Thanks.
better database design: memory limits, rows and tables
March 04, 2011 01:37AM
mysql version: 5
php version: 6

I have a database that currently has 12 different tables. I am making some changes to increase flexibility to the user and I could reduce the number of tables. Total tables would go from 12 to 10. The table that absorbed the other 2 would triple in size (number of rows would increase accordingly).

questions
1) is there any issues with memory that I should be aware of, can the table get too large in size or too many rows? is there a max for memory per table or max memory for database?

2) will there be any speed issues with this particular table getting 3 times larger. Queries often select thousands of rows at a time. This is a large table, millions of rows, so the absorbing of the other tables will increase it by millions more rows. This table is only 6 months old and is going to continue to grow and grow, my preference is to have it be as large as I can before I have to limit it.

3) Is a database better designed with fewer tables that contain more rows or more tables and less rows the way to go?

Thank you for the help. Barrett
Re: better database design: memory limits, rows and tables
March 04, 2011 06:24AM
Hey,

How much do you know about database normalization? There is a reason why data are split into multiple tables. Keeping redundancy at a minimum, lowering the risk of inconsistent data, and to lower the risk of personal errors. I will try to explain shortly why this is important to you.

Ok. You have a large database for a forum. This forum has users, forums and posts. I will quickly give you an example of a possible database design:

Inconsistent data - Very basic user info::

users (id, username, password, postal_code*)
location (postal_code*, postal_name)

Underlined is PRIMARY KEY - The identifier. As you can see, some of the columns have a symbol (*). This is called a foreign key. What it does is to "link" or connect that data together with a primary key logically. Either in THAT table or another table.

Question: What if postal_name was also in the "users"-table? You could end up with spelling a postal code wrong, or a postal name wrong. Then you would have inconsistent data. To remove this issue, we split data into different tables.

Dependencies - connecting data logically in a forum:

Ok. You have a large forum. We need to make sure that that every post is identified with the poster, (user) and is in a forum. This is where the foreign key comes into play. This is how we CONNECT all this information to each other. Now I will provide you we a pretty basic example.

The table defenition:
users (user_id, username, password)
forums (forum_id, forum_name)
posts (post_id, forum_id*, user_id*, post_content,)

Example content:
users (101, goingcrazy, somePassword)
forums (10, PHP 6 and MySQL 5)
posts (5000, 10, 101, I have a database that currently has 12 different tables....)

In the table post. The value 10 refer to the forum ID and 101 is a reference to the user ID.

As you can see, the posts table is using the users and forums tables to CONNECT that very post to a FORUM and to a USER. Now you hopefully understand HOW this is done, but I'm also going to explain WHY it is done to you.

Why you should split tables:

Consider this example. You have a table for football players. SOME of the players have nicknames. If you add the column "nicknames" to the table you store football players in, maybe 50 of 62 columns would be EMPTY. That is an indication for you that you should SPLIT that column into a new table. REPETITIVE values are also a great indication. If all the data in one column in a table have the value of 1, you have done something wrong in the design process.

Is there a limit?
I would say no. Databases are great for storing data, and even large forums is not using very huge space. You should absolutely think about HOW the data is stored. If a six month old database hold millions of rows, I'm pretty sure you are doing something wrong along the way.

Give us your table definition and some examples of data, and maybe we can help you optimize it! :)

Sorry for the long post. I'm just not able to sleep... English is far from my native language, but I hope you get something from this. Would be happy to explain this better to you. Please ask if you wonder about anything.



Edited 2 time(s). Last edit at 03/04/2011 06:31AM by Antonio Conte.
Re: better database design: memory limits, rows and tables
March 04, 2011 02:47PM
Antonio, thanks for the detailed and well thought response! Just to add two things. First, to reinforce what Antonio said, the PRIMARY, if not ONLY, consideration should be normalization. The data needs to be stored reliably. You can improve speed by doing other things; you can never improve a flawed database design.

Second, when MySQL and other major databases talk about a lot of data or rows, they mean millions of rows and gigabytes of data. I say this for other people's frame of reference. You say you have millions of records, so you have large tables, but still what the database was intended for. I would argue that having so much data is further proof that you need to maintain data integrity.

Best Wishes,
Larry

Writer/Web Developer/Instructor
Forum Moderator
Re: better database design: memory limits, rows and tables
March 04, 2011 03:22PM
Thank you for your kind words, Larry. I would credit my database teacher. After a half year of database studies at school, I think I got a pretty good hang on it. MySQL is in my opinion a very underestimated language. You can achieve almost everything must people would normally do in PHP if you construct your queries right. I have learned to love SQL in general. Hopefully I can get others to like it too!
Re: better database design: memory limits, rows and tables
March 04, 2011 04:22PM
You're welcome. Thanks for helping out. I really appreciate it.

Just so you know, MySQL isn't technically a language, but yes, you should always try to do as much as possible in the database and SQL queries.

Best Wishes,
Larry

Writer/Web Developer/Instructor
Forum Moderator
Re: better database design: memory limits, rows and tables
March 04, 2011 09:30PM
Antonio,
ok, one thing I've learned from you and Larry:
1) don't worry about speed, the focus for database design should be on normalization and data integrity.

My database stores data for a manufacturing process, measuring different inputs or items. Each item puts 1440 values a day into the database, the number of items varies, but can exceed 100. 100 x 1440 = 144,000 rows a day, that is how the table has gotten so big. In a nutshell I have the following: (using easier names for simplicity)

table #, table name, structure
table 1: store, structure: store_id, store_name
table 2: items, structure: store_id, item_id, item_name
table 3: carrots, structure: store_id, item_id, broccoli_id, datetime, value
table 4: broccoli, structure: store_id, broccoli_id, datetime, value

Table 3 and 4 are the ones that get the 1440 values a day from each item.
I am considering turning table 3 and 4 into a new table 3 named vegetables, name of item would come from items table
table 3: vegetables, structure: store_id, item_id, vegetable_id, datetime, value

With the new structure I could tally all vegetables and would not be limited to carrots and broccoli, user would just name the vegetable in the items table with item_name. I do not have my book with me, but when I'm home I am going to reread the chapter on database design and normalization. It sounds like making this change will increase flexibility to user, and have no effect on data integrity, with a cleaner (less tables and redundancy) structure. The size of the vegetable table will be twice as large as that of the original carrots and broccoli tables, but that should not be a worry.What do you think?

thanks for help! Barrett
Re: better database design: memory limits, rows and tables
March 04, 2011 09:49PM
With what you said, I don't really understand the current design at all. I expect it's a matter of your vegetables-carrots-broccoli analogy not being complete. For example, why is there a broccoli_id column in carrots and broccoli? Why does carrots have an item_id filed but broccoli does not? In short, without it being more coherent, I couldn't say how it should be designed.

Another criteria is how the data will be used. How long will the data be stored? What kinds of queries need to be run on the data (i.e., what kinds of questions will be asked and what would those answers look like)?

Best Wishes,
Larry

Writer/Web Developer/Instructor
Forum Moderator
Re: better database design: memory limits, rows and tables
March 04, 2011 10:00PM
I thought that post was better, but it was sloppy, working against a dead battery in laptop without charger handy.

Current design:
table #, table name, structure
table 1: store, structure: store_id, store_name
table 2: items, structure: store_id, item_id, item_name
table 3: carrots, structure: store_id, item_id, carrot_id, datetime, value
table 4: broccoli, structure: store_id, item_id, broccoli_id, datetime, value

upon review, I am unsure why table 3 and 4 have the carrot_id and broccoli_id rows, they are not referenced in queries, set up as auto increment, should I remove them?

Typical Query
1) given a date range, either days (1440 + results) or an hour range select the values for 20 items. Using php, various math is applied for various reports.
-- using a JOIN would select the item_name from Table "items" to show in report.

Hope that clears it up some. Barrett
Re: better database design: memory limits, rows and tables
March 05, 2011 09:17AM
Without the values in mind, this is probably a more clever way to START the designing:

store (store_id, store_name)
items (item_id, item_name)

OK! Now you are able to work with stores and to create items. You talk about expanding items. This is how you need to do it. The question is wheter the values you use are in any way connected to the items. Is this a table for keeping data of the manifacturing process of food for fabrics and such? I really don't know how to work further with your data without AN ACTUALL EXAMPLE of data.

Start by giving me five or something examples from your carots table this way:

carots (store_id, item_id, carrot_id, datetime, value)
- Underline primary key, *-symbol for the foreign keys. This is for knowing the structure.

Then give live examples like this:
carots (100, 1576, 100424, 2011-03-05, someValue)
carots (100, 1577, 100437, 2011-03-05, someValue)
carots (100, 1578, 100501, 2011-03-08, someValue)
carots (100, 1579, 100654, 2011-03-08, someValue)
carots (100, 1580, 100683, 2011-03-09, someValue)

Hopefully, we can figure something out. :)
Re: better database design: memory limits, rows and tables
March 05, 2011 09:10PM
Ok. Yes, values are tied to each individual item. For example, if an item is an zucchini, the value would be a percent representing a moisture content, .99, .80 etc. The value is taken at 1 minute intervals; the reason for the date time value. The stores and items are related, 1 store could have 20 items, another store could have 15 items

current structure of carrots table is:
carrots (store_id*, item_id*, carrot_id, datetime, value)

some examples of data in the carrot table
store_id, item_id, carrot_id, datetime, value
carrots (100, 19, 190000, 2011-02-28 00:00:00, .89)
carrots (100, 19, 190001, 2011-02-28 00:01:00, .89)
carrots (100, 19, 190002, 2011-02-28 00:02:00, .89)
carrots (100, 19, 190003 ,2011-02-28 00:03:00, .88)

the structure I am considering moving to, which eliminates the carrots table and broccoli table, they become the vegetable table

store (store_id, store_name)
items(item_index_id,, store_id*, item_id, item_name)
vegetables (store_id*, vegetable_index_id, item_id, datetime, value)

typical queries will pull 1440 or more rows of data from vegetable tables, either for 1 item or multiple, using a JOIN query select the item name from items table and store_name from store table then display results in a table or graph.

Going to a vegetable table will increase flexibility for user, they can enter their name of any vegetable, not just have carrots and broccoli.
Re: better database design: memory limits, rows and tables
March 06, 2011 09:15AM
Ok. Thanks. Then you are absolutely thinking about this the right way. What I notice though, is that there are a lot of repetitive values in your tables. Therefor, you would gain from a redesign, and you are right about the simple joins to connect the data. Don't mind if you have any problems of thinking how to Join this data later on. We will help you with that.

One thing though. This is pretty complicated stuff. At least consulting with a specialist in databases would be a clever thing to do. Even though I consider myself pretty good with must of this stuff, this is in know way an easy task to perform. I will give it my best though, as I find your problem inspirational. Really hope we can pull this off! :D

----------------------------------------

Starting with stores:
stores (store_id, store_name)
- Maybe you need others fields like description, location, employees. That won't be a problem here. You will allow to keep track of larger amount of data for each store. You had this already - good!

Then you have the products.
products (product_id, product_name, starting_value)
- If the individual product, like a carrot, have a starting level of moisture, then you would benefit in using it here. It will allow you to ONLY make a row if there are any change to that value. I don't know if this is possible for you - that you know best yourself. Else, remove starting_value. The key here is that EACH product are unique. Carrot, broccoli, zucchini, pea. This is very you would place all data connected to the individual products. This will allow you to create categories like vegetables, fruits, baking product and such in the feature. This is not - however - 100 carrots. It's THE carrot. THE pea. THE broccoli.

----------------------------------

Ok. It's pretty simple so far. The question is, you seem to need an unique identifier for every carrot, every pee, every broccoli. The question is how we proceed according to your needs. If you need to keep TRACK of the numbers of carrots, pees and broccolis for each company, like say Beer Company INC, I would recommend keeping that data in something like "items in store".

Items in store
items (product_id*, store_id*, number_of_items)
- Ok. This is possibly magic for you. Instead of keeping track of the numbers of items through "the process table", you keep it here. You would also want to set the product_id and store_id to a COMBINED PRIMARY. This is what will save you from redundant data. You won't allow several occurrences of CARROT for the unique store Beer Company INC. You will instead update the numbers of products in store constantly. This database will need constant UPDATING instead of CREATION of rows, and will save you lots of data. This structure will, however, allow Beer Company INC. to keep track of their carrots, peas and broccoli.

----------------------------------

The process table - some initial thoughts:
- This is a guess: You keep track of the moisture for ALL carrots in that one process right? If you have 20 carrots in the same process, each carrot will have the same level of moisture? This would be my guess, looking at your example data. If this is wrong however, we need another round of thinking for this table. I also notice that you keep track of the moisture ever minute, even when there are no change in moisture! Could you instead keep track of changes to moisture at a given time instead? That would remove very much data!

If all my thoughts were right, here we go:
process (process_id, store_id*, product_id*, datetime, moisture)
- Looking at the structure from your last post, I notice you have both vegetable_index_id and item_id. If item_id is carrot, (19) and vegetable_index_id is 190000, 190001, 190002, then you are onto something. vegetable_index_id should anyhow be listed first of the columns. My guess is that you want to keep all data from each store combined. There are no need for that. (Select * FROM process WHERE store = 5) with a couple of joins will do the trick.

I'm starting to get tired here. Hope some of my thoughts where right. You will need to keep track of an amazing amount of data no matter what, but at least you can get this more logical and stable. Get back to me.
Re: better database design: memory limits, rows and tables
March 07, 2011 01:41AM
Antonio,
to answer a couple of your questions
1) yes, I have to treat each vegetable or item as an individual, they cannot be grouped together. Each store may have 10 items or 20 or whatever assigned to it, but it is important that each item be treated as an individual, even if they are two carrots, each carrot is different.
2) I hadn't before thought about just updating the value, if there wasn't a change from minute to minute, don't add a row. You are right that it would save a ton of space, size or memory. A big problem I see is that there are a dozen or so reports that are generated for the data, and much of it is predicated on a single value representing 1 minute of time, and it some instances I believe it has to be that way. I'll have to go that more thought, but my gut says don't try it, at least not yet.
3) COMBINED PRIMARY
-- in my items table, items(item_index_id,, store_id*, item_id, item_name),
I created the row item_index_id to make a PRIMARY key, which I auto increment every row and therefore it is unique to every row. Each store, say store 1, will have item id's that begin at 1 and go on and on, for example
store #1 has items 1,2,3,4,5,6,7,8,9
store #2 has items 1,2,3,4,5,6,7,8,9, 10, 11

Question: Can and should I eliminate the item_index_id and just create a COMBINED PRIMARY with store_id and item_id?

From my previous post, the example of data in the carrots table,
store_id, item_id, carrot_id, datetime, value
carrots (100, 19, 190000, 2011-02-28 00:00:00, .89)
carrots (100, 19, 190001, 2011-02-28 00:01:00, .89)
carrots (100, 19, 190002, 2011-02-28 00:02:00, .89)
carrots (100, 19, 190003 ,2011-02-28 00:03:00, .88)

The 19's were just a coincidence. Here 19000 represents the carrot_id, or "an id for that value, item_id and time combination". It too was created to make a PRIMARY KEY. If I could do a COMBINED PRIMARY of 3 rows, store_id, item_id and datetime I too could eliminate the carrot_id. But just looked to mysql, and it appears 2 is the limit for a combined primary?

let me know what you think. From my very first post, it looks like I should dump the carrots table, broccoli table, and just go with a vegetable table or what you called a process table?

thanks for the help here. Barrett
Re: better database design: memory limits, rows and tables
March 07, 2011 09:18AM
Then check this out!

stores (store_id, store_name)
products (product_id, product_name, starting_moisture_if_any)
process (process_id, process_start, process_stop, store_id*)
unique_products (unique_products_id, process_id*, product_id*)
values (unique_products_id, datetime, value)

Explanation:
I think I am onto something here. You would only need to register a process ONCE. That is about right. You then create the unique products, give them an id, connect them to a process, and tell which type of product it is. You then use that ID for the unique products to store values in the "values" table. This may seem like a very little gain, but think about this. We have removed 14000 columns of store_ids and product_ids. THAT is a huge amount of data.

This will also make something a lot easier for you. You can add more information about each store. Logo path, description, etc. You can add every product you would ever need, and even add more information about each product. The Process table does not NEED the process_start and _stop columns, but they can be used for statistics and calculation how many values are in a process.

-------------------------

Tell me what you think of this. Have I missed a point? I can't really see any problems with my table structures at the moment. One things for sure. We are getting closer to something much better for you and your customers. ;)



Edited 1 time(s). Last edit at 03/08/2011 10:34AM by Antonio Conte.
Re: better database design: memory limits, rows and tables
March 08, 2011 01:50AM
oh yea, much closer! I may have been too generic in the beginning, but I wasn't sure how far we'd get. The process actually occurs every day for every item, yet still each process is unique to each item, so the start and end time isn't super significant, if needed it can be found by querying the earliest and latest values in the values table. Sort of easiest to think of it as not a process, but just an on going measurement, like maybe measuring the temperature outside.

With that said, I do think you are onto something. What do you think of this structure, with the process table removed and a combination of the products table and unique_products table combined into a new products table. the savings, as you pointed out, would come from eliminating the store_id from the values table.

stores (store_id, store_name)
products (product_id, store_id*, product_number, product_name)
--- note: the product number is used in various reports
values (value_index_id, product_id*, datetime, value)
--- note: created the value_index_id for PRIMARY key, auto incremented
--- note: datetime would not be a unique value, because of multiple product_ids

one little hangup: the data is imported from a text file. The text file reads store_id, product_number, datetime and value. When that file is uploaded a query would have to go and grab the product_id from the products table and use that in the insert query instead of the store_id and product_number.
Alternative would be to go with the following structure, which adds the store_id and product_number back into the values table and removes the product_id, so net is +1 in the row count, does make a difference though in such a huge table (millions of rows). I plan on storing a store's data as long as I can, various reasons, but it does make the table much larger.

stores (store_id, store_name)
products (product_id, store_id*, product_number, product_name)
values (value_index_id, store_id*, product_number*, datetime, value)

Let me know your thoughts. Any which way, it seems this is the route, and definately not use product specific tables, like carrot table or broccoli table, just use a values table.

Barrett
Re: better database design: memory limits, rows and tables
March 08, 2011 10:57AM
You are starting to get this!

Create ONE carrot, ONE broccoli, etc before you use that data in products. There's a reason why I used more tables than you. More tables are not bad if they keep the data more structured logically, remove redundancy and never let you misspell "Carrot" for "Craatos" in one row.

--------------------------------------------

You did not like my "unique_products" structure it seems. :P

Do you see the point of it? Adding 100 rows for unique carrots in ONE table is going to let you use just an ID for the unique product in you values table. That will save you even more repetitive and pointless data.

stores (store_id, store_name)
products (product_id, product_name)
unique_products (unique_products_id, store_id*, product_id*)
values (unique_products_id*, datetime, value)
Re: better database design: memory limits, rows and tables
March 08, 2011 08:13PM
Quote
structure repeated
stores (store_id, store_name)
products (product_id, product_name)
unique_products (unique_products_id, store_id*, product_id*)
values (unique_products_id*, datetime, value)

I think I am following you, I guess what I am trying to say is that every product_id in the products table is unique, no two are ever a like. So in above, the products table and the unique_products table would be the same size, names would be like carrot 1, carrot 2, brocolli, 1, brocolli 2, etc.. Combining the two tables into one would produce:

stores (store_id, store_name)
products (products_id, store_id*, product_number, product_name)
values (products_id*, datetime, value)

With the issue of reading the text file and having the store_id and product_id readily available, but not the product_id, I can't decide if its worth the work to adjust to using the product_id, like in above structure or change the values table to:

values (value_index_id, store_id*, product_number*, datetime, value)

this makes the table larger, but it is so much easier to use and code for the store_id and product_number; they are commonly used in reports and other areas, product_id would not be.

let me know what you think.
Re: better database design: memory limits, rows and tables
March 08, 2011 10:29PM
Your first example is the way to do this. When I talk about Products and Unique_products, that's only for seperating the NAME of the vegetable from the unique products. (Called just products in yours) This is not very important, although it makes your table INF2 Instead of INF3 (Partially relashionship in data. Like Postal Numbers and Postal names.) This is only logical though, as I'm pretty sure you never type in "Carrot" manually for each row. ;)

Text file issue:
Is it really necessary to use a text file? Do you read store_id and product_id from it? Why not use the data in your table instead? I'm thinking you put to much weight on reading from this file instead on relying on the database as a whole. With Stores, Products and Unique_products, you should never have to read anything from a text file.

When it comes to using store_id* and product_number* in Values, of course it works. The problem is that EVERY SINGLE MINUTE, you store two rows of unnecessary data! The only smart thing to do is to have store_id and product_id in the Products table. The query is not very hard to write.

SELECT a.unique_products_id as number, c.store_name as store, b.product_name as product, a.datetime, a.value
FROM values as a 
INNER JOIN products as b on (a.unique_products_id = b.unique_products_id)
INNER JOIN stores as c on (b.store_id = c.store_id)
ORDER BY a.unique_products_id

I think this query might need SOME altering, but this is pretty much it.
Re: better database design: memory limits, rows and tables
March 09, 2011 01:44AM
text file
yeah, I am stuck with the initial insert of data into the values table coming from a text file. I do read store_id and panel_id from it. Once the data is insert into mysql, then the text file is no longer used.

here's the structure I'll go with, insert will be done using a JOIN that grabs the product_id from products table.

stores (store_id, store_name)
products (products_id, store_id*, product_number, product_name)
values (products_id*, datetime, value)
-- note: in the values table, i'll use products_id and datetime as a combined PRIMARY KEY, is that correct?

thanks for the help on this.
Re: better database design: memory limits, rows and tables
March 09, 2011 07:03AM
Quote
goingcrazy
here's the structure I'll go with, insert will be done using a JOIN that grabs the product_id from products table.

stores (store_id, store_name)
products (products_id, store_id*, product_number, product_name)
values (products_id*, datetime, value)
-- note: in the values table, i'll use products_id and datetime as a combined PRIMARY KEY, is that correct?

Ok. You confuse me with naming. The logical thing for me would be to use the NAME of product_number as Primary Key. It's the product_number (200-500) of product _id (19), product_name (Carrot) that should be the unique identifier, right? Product_id (19) is always product_name (Carrot), but product_number is what's changing?

Combined Primary: That is correct. With Product_number (Your product_id, see discussion of naming above) as a single primary key, you could only store that number once ever. :)

Just glad to help out. :)
Re: better database design: memory limits, rows and tables
March 10, 2011 12:57AM
no, in the products table, the product_number and product_name are associated. for example, product_number (19) is always product_name (carrot). so products_id is what changes and gets used in the values table. I know my language is a bit confusing, outside of this forum, I don't get an opportunity to talk with anyone about php, mysql, so I'm a bit weak with formal naming, etc.

Thanks for the help on this, in addition to ending up with a better design than I would have had, I've learned some other things along the way. keep up the good work, guys like you are a big help in this forum, and its appreciated.

Barrett
Re: better database design: memory limits, rows and tables
March 10, 2011 01:22PM
Then we agree. :)

Glad I could help out.
Re: better database design: memory limits, rows and tables
March 11, 2011 03:37PM
Yes, Antonio, thank you very much for helping out with this. I really appreciate it!

Best Wishes,
Larry

Writer/Web Developer/Instructor
Forum Moderator