What Is Larry Thinking? #70 => Going Big, Micro Version, Part 2

In this edition…

About This Newsletter

This is the third and final issue of my two-part newsletter on “going big”. (Yes, third in a two-part series.) By “going big” I mean how one transitions from a Web site with little to moderate traffic, to one that can handle tons of traffic. The first newsletter looked at going big from the macro perspective: theory, implementation, hardware, and networking. The second newsletter was the first look at the micro perspective: how to write code that scales well. The emphasis there was on the code itself, along with the development process.

In this newsletter, I’ll provide a couple of resources for two other key components: the underlying database and the user’s browser. As always, questions, comments, and all feedback are much appreciated. And thanks for your interest in what I have to say and do!

Q&A => How do you optimize a database?

In 2008 (!!!), Bryan wrote me asking how to best optimize a database. The fact that this email came in over four years ago is testament both to how long I’ve been doing this newsletter and how long it can take me to answer questions (worst case scenario)! Yeesh.

Bryan wrote:

I was hoping you could touch a bit on MySQL optimization and indexing. This is something that is very important to me as I run a high-traffic advertising network, and I’m always looking to speed things up. Unfortunately, it is very difficult to find reliable and complete information on when and what to index. I have found many sources, but so many contradict each other, so it is very difficult to know what’s the correct way.

Although I’ve written about the subject of MySQL indexes in various books of mine, it’s a topic always worth rehashing. Simply put, creating the proper indexes for your tables improves the performance of your database (well, for SELECT queries anyway, which are the most important).

To start, you should index columns that are:

The primary key
Frequently used in WHERE clauses
Frequently used in ORDER BY clauses
Frequently used as the basis for a JOIN

You should not index columns that:

Allow NULL values
Have a very limited range of values (such as Y/N or 1/0)

The other thing to keep in mind is how the index is defined. For example, if queries might sort by last name, you could index just the first few characters of the last name column. By doing so, you get the performance benefit of treating the column as if it were of fixed-length. You can also create indexes on more than one column at once. For example, you might create an index on the combination of the email address and password (for logging in purposes). If the index was created with the columns named in that order, you would also effectively be creating an index on the email address by itself, which could be beneficial. The same goes for indexing last_name, first_name: that creates an index applicable to ORDER BY last_name, first_name as well as one that applies to just ORDER BY last_name.

To confirm that you’ve got the right indexes for your database, use the EXPLAIN command to verify how well MySQL is using the indexes you’ve created for the queries your site executes.

A second factor for performance is how your columns are defined. Databases (and computers) work with numbers, particularly integers, faster than they do strings, so prefer numeric types whenever possible.

Bryan had also stated:

Currently, the only effective way I have found to reduce load on MySQL is to simply reduce the amount of queries made.

Reducing the number of queries is always a great policy. Well, when you’re just starting… As you grow big, the opposite turns out to be the case.

In my books, I always argue for normalizing a database first and foremost. Normalization protects the integrity of your data, which is critical, but it does so at the cost of performance. Normalization often requires complex queries, like JOINs, UNIONs, subqueries, and groupings, to retrieve the information needed. These complex queries are the hardest for the database to execute, and therefore take the most time. For most sites, the extra effort (and the performance hit) is not a problem. But when sites get big, most sites drop the single complex query for multiple, faster queries.

On a less busy site, the act of running a query will be the most time consuming, so one more complex, slower query is best. On a very busy site, especially when you have multiple database servers, individual queries can be executed faster. Therefore, breaking complex queries into smaller, faster queries, is preferred.

The fact is that as sites get bigger, relational databases are used less and less as originally intended (i.e., as relational). The switch that many active sites are making to non-relational (aka NoSQL or non-SQL) databases isn’t so much a change from one database type to another as it is a change from using a relational database in a non-relational manner to using an actual non-relational database.

A similar trick that big sites use to improve performance is to break up what could be single tables into multiple. For example, you might have a users table that stores everything about a user. Each time a user logs in, or any time any query references that table, any extraneous information stored in the table–the user’s address, gender, whether the user likes watermelon or not, whatever–impacts the performance of that query, even when that information is not referenced by the query itself. The fix in such cases is to put the bare minimum required information in the primary users table (e.g., username, email address, password, user ID, and registration date), and move everything else to another. Again, multiple queries will be required to retrieve all the information, but multiple smaller queries can execute faster than single larger queries. You’ll also find that all queries will run better if large text and binary columns are moved to their own separate tables.

If you do this right, the first table could have SELECT queries performed using any column (e.g., the email address or the username), but the secondary tables would always use SELECT queries off of foreign keys: the FK related to the PK from the original table.

If you want to improve the performance of your database, also take a look at the code that’s using the database. Be absolutely certain to only retrieve (e.g., in PHP) the columns and rows you’ll absolutely use. Also try to limit uses of:

JOINs
UNIONs
Subqueries
Groupings

Finally, the best way to optimize your database and database interactions is to not query the database at all. This can be accomplished by making smart use of caching.

On the Web => 15 Pro-tips for MySQL Users

Joshua Thijssen just posted on Speaker Deck the slides for his presentation “15 Pro-tips for MySQL Users“. As the name implies, for the most part, the tips are for more advanced MySQL users, although a couple are the kinds of things I would hope most MySQL users are already doing. There’s plenty to learn here, or at least consider, and it doesn’t take long to peruse. (And, there are actually 16 tips.)
Once you’ve gone through that presentation, you may want to consider some of the others that Thijssen has posted on Speaker Deck. In particular, I’d consider “REST in practice” and “Alice & Bob: public key cryptography 101“.

On the Web => How to lose weight (in the browser)

A team of very smart Web developers recently posted “How to lose weight (in the browser)“. This is a front-end guide to improving the performance of your Web site.
The site has 24 specific recommendations in these areas:

HTML
CSS
JavaScript
jQuery
Images
Server

All of the recommendations are thoroughly explained: not just what you should do, but why. Many recommendations are also supported with specific data and references. For others, there are links to useful tools to help you implement the particular suggestion.

I wouldn’t say this resource has anything especially novel (i.e., information you won’t find elsewhere), but the breadth of coverage, the directness of the recommendations, and the cohesiveness makes it an excellent resource for any Web developer.

Larry Ullman’s Book News => “The Yii Book”

I just this past week finally (finally!) posted another update on “The Yii Book“. This is version 0.7, and includes two new chapters:

Chapter 15, “Internationalization”
Chapter 16, “Leaving the Browser”

This adds about 40 more pages of material, bringing the total to around 405 pages as a PDF.

These two chapters do reflect a change in the table of contents, but I’ve not removed any planned content, just re-arranged it. The next two chapters are:
Chapter 17, “Improving Performance”
Chapter 18, “Advanced Database Issues”

I’ll probably target another update after writing those two. The following three chapters will be harder and more involved, I suspect.
The current plan is still to complete the first edition for the book using version 1 of the Yii framework. Then I’ll see where Yii 2 is at before making a decision about translations and print copies (i.e., doing those for Yii 1 or waiting for Yii 2).

And thanks to everyone for their interest in this book and to everyone that has already purchased a copy!

Series

Blog post categories

Note