What is Larry Thinking? #68 => Going Big, Macro Version

April 8, 2013

In this edition…

About This Newsletter

The subject of this newsletter is “going big”. By that I mean how to transition from a Web site with little to moderate traffic, to one that can handle tons of traffic. (How you get the traffic itself is an entirely different issue.)

To be entirely forthcoming, “going big” is not my forte, which is to say that I don’t have a ton of direct, personal experience in this area. Among the X dozens of Web sites I’ve worked on over the past 14 years, only a smattering have had the demands of a “big” or “big-ish” site. Which makes sense, as statistically, not that many sites are “big”. In the grand scheme of things, the number of “big” sites is such a small percentage as to be almost negligible. This is a fact I’ll speak more about at the beginning of this newsletter.

That being said, I do know a fair amount about the subject, and I know, and have spoken in detail with, people that are directly responsible for heavily trafficked sites. So, although I’m not an expert in “going big”, I’m not just guessing here, either.

As I was writing this newsletter, it also became “big” (as in wordy), so I’ve split it into two. This, the first, looks at going big from the macro perspective: theory, implementation, hardware, and networking. In the next newsletter, I’ll look at the micro perspective: how to write code that scales well.

As always, questions, comments, and all feedback are much appreciated. And thanks for your interest in what I have to say and do!

What is Larry Thinking? => The Myths of Going Big

One of the most common, and most potentially ruinous, statements I see beginning Web developers make is:

I have this great idea for a Web site, and it’s going to be huge, so I need to start with the infrastructure that can support a huge site.

I’ve seen this far too many times, often with people who only have an idea: no Web site, no Web development skills, sometimes not even the domain name yet!

Now, to be clear, dreaming is fine. And pursuing a project because you have a great idea is certainly justifiable. In fact, I would argue that it helps to have an idea in mind before you take the time (or spend any money) to pursue new skills; doing so gives you a target to aim for.

But sentences like the one above frighten me because acting on them is expensive. Putting the cart before the horse (so to speak) is one way businesses fail and people lose lots of money. The resources required by a busy site (hardware, networks, and staffing) are especially expensive. These are resources you shouldn’t spend money on until you have to, or very nearly have to. I like a good analogy, so let’s try one on…

My wife and I have two cars and two kids. Eventually my kids will be of driving age as well, at which point it would probably be beneficial to have at least one more car. Seeing that need on the horizon, should I buy another car today? Well, my kids are 6, so…no. Not only would I be wasting money now and for ten years (maintenance, insurance), but it’s conceivable that the third car would never be necessary (the happiest reason being that we all move to Venice, Italy).

This brings me to what I consider to be the two biggest myths of going big:

  1. Your Web site will ever go big.
  2. Going big is a difficult transition to make on the fly.

Calling the first a myth sounds pessimistic, but the statistics back it up. The top two Web sites (in terms of pageviews per day) are Google and Facebook. They each average around 400–450 million pageviews per day. That is the upper echelon for “big”.

Amazon is extremely popular, ranked around #10 among the busiest sites, and it gets around 45 million pageviews per day. This means that the #10 ranked site gets about one-tenth the traffic of the top site. ESPN, one of my favorite sites, is also extremely popular, but it ends up being around #100, and gets around 12 million pageviews per day. Evernote, a software company, is ranked around #500, and gets about 500,000 pageviews per day. The #500 ranked site gets about one one-thousandth of the traffic of the top site. This is a good place to stop, as 500,000 pageviews per day reasonably counts as “big”. (Although a proper, definitive number for “big” is somewhere lower than that, and would depend upon what, exactly, the site does: 100,000 views of mostly text content is a different beast than 100,000 views of mostly video.)
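
As a quick check on those ratios, here is a minimal Python sketch that works out each site's traffic as a share of the top site's. The pageview numbers are the approximate 2013 figures quoted above (using 450 million, the upper end of the quoted range, for the top site); the function and variable names are only illustrative.

```python
# Approximate daily pageviews as quoted in the text (2013 figures, not current data).
DAILY_PAGEVIEWS = {
    "Top site (rank ~1)": 450_000_000,  # upper end of the quoted 400-450 million range
    "Amazon (rank ~10)": 45_000_000,
    "ESPN (rank ~100)": 12_000_000,
    "Evernote (rank ~500)": 500_000,
}


def share_of_top(pageviews, top=450_000_000):
    """Return a site's traffic as a fraction of the top site's traffic."""
    return pageviews / top


if __name__ == "__main__":
    for site, views in DAILY_PAGEVIEWS.items():
        # Show each site's pageviews and its percentage of the top site's traffic.
        print(f"{site}: {views:,} pageviews/day, {share_of_top(views):.2%} of the top site")
```

Under these numbers, the #10 site works out to exactly one-tenth of the top site, and the #500 site to roughly one nine-hundredth, which is where the "one one-thousandth" figure comes from.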

With those numbers in mind, what are the odds that your site will end up in the top 500 of all Web sites in the world (regardless of how good your idea is)? The odds are not good. The assumption that your site will be a huge success is a bad one to make. Spending money based upon that assumption is a catastrophe.

(My site, in case you’re curious, gets about 20,000 pageviews per day, easily handled by one Virtual Private Server [VPS] with the help of a Content Delivery Network [CDN] for better international load times.)

I suspect beginners make the mistake of thinking they need to support “big” from the outset due to a lack of knowledge, a misunderstanding. This is the second myth: that it’s hard to transition from little to moderate traffic to handling a lot of traffic. Over 14 years, I’ve switched hosts (a few times), domain names (once), and hosting packages (a few times), all with little or no down time. In the process, I’ve gone from spending $5/month (all prices US) to spending around $65/month (now). It would have been foolish to have spent $65/month (let alone the hundreds per month that a dedicated server costs) all those years ago before I needed those kinds of resources.

So what should you do? Keep reading and I’ll tell you. But first…

Let me just add that I don’t think having a high traffic site should ever be a goal. Just as being popular in high school isn’t all it’s cracked up to be, and just as I believe it’s shallow to dream merely of being rich, you shouldn’t aspire merely to create a busy, popular Web site. Create a great Web site that addresses a need, that solves a problem, and sufficient popularity (and income, if that’s a hope) will follow. I say this as a person who never tried to sell a lot of books, but rather aspired to write good books. In doing so, I managed to sell enough along the way.

Q&A => How do I get started with a new site?

A while back, Jennifer had written in asking how to get started with a new site. She was not asking in terms of programming, but rather in terms of hosting and such. If I were a complete beginner starting today, I would (all prices in USD):

  1. Buy the domain name for the site (cost: $10/year or so).
  2. Develop the entire site on your own computer (cost: $0!).
  3. Have your friends and family try out your site on your computer (cost: $0!).
  4. Read more, practice more, study more.
  5. Rebuild the site from scratch on your computer. This time the result will be at least 20% better (cost: $0!).
  6. Move the site to a quality shared hosting situation (cost: $15/month or so).
  7. When your site is busy enough that shared hosting can no longer support it, move to a basic VPS package with a quality host (cost: $30–50/month or so).
  8. When your site’s traffic grows, add on a CDN (cost: $5–15/month or so).
  9. When your site’s traffic outgrows the VPS, get a better VPS package (cost: $80/month or so).
  10. When your site’s traffic outgrows any VPS, move to a dedicated server (cost: $300/month or so).

Using this sequence, you’ll see that you’ve only spent a total of about $10 for the first several months of the project. This is as it should be for most people. Even for a somewhat busy site, such as mine, I’m still only talking about a few hundred dollars per year. Let your infrastructure grow with the demand. If you work with quality hosting companies, there will be little or no downtime, even as you switch from one hosting plan to another, or from one hosting company to another. This is how it’s done all the time.
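
To see how the bill grows with each step, here is a rough Python sketch of the yearly cost at each stage of the sequence above. It rests on a few assumptions that are not in the list itself: price ranges are replaced by their midpoints ($40 for the basic VPS, $10 for the CDN), the CDN is kept once it has been added, and the domain costs a flat $10 per year. The stage names and function names are made up for illustration.

```python
DOMAIN_PER_YEAR = 10  # USD, "or so"

# (stage, monthly hosting cost in USD).
# Assumption: ranges from the list are replaced by their midpoints,
# and the CDN stays in place after it is first added.
STAGES = [
    ("local development", 0),
    ("shared hosting", 15),
    ("basic VPS", 40),
    ("basic VPS + CDN", 40 + 10),
    ("better VPS + CDN", 80 + 10),
    ("dedicated server + CDN", 300 + 10),
]


def yearly_cost(monthly_hosting):
    """Domain registration plus twelve months of hosting, in USD."""
    return DOMAIN_PER_YEAR + 12 * monthly_hosting


def cost_table():
    """Map each stage name to its approximate yearly cost in USD."""
    return {name: yearly_cost(monthly) for name, monthly in STAGES}


if __name__ == "__main__":
    for name, cost in cost_table().items():
        print(f"{name}: about ${cost:,} per year")
```

The exact figures matter less than the shape: each step costs noticeably more than the one before it, which is the argument for not taking a step until the traffic demands it.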

You’ll also notice that I’ve added an iterative process there: create the site, get some input, get better yourself, and recreate it. Obviously I don’t do this myself on new projects, but the fact is that you’ll learn on every project you work on. The site I begin tomorrow will be slightly better than the one I finished yesterday (hopefully). When you’re just getting started, the quality difference between that first project and the second will be dramatic.

I should also add that I’m a big fan of Amazon’s Web services as an affordable, global, easily-expanded solution. But they’re not that easy for beginners to master and use. My instructions are for the common person, implementing what should be a good idea. If you’re on your second or third good idea, have a track record of success, and have some funding, the game plan will be different.

Q&A => Who should I use for my hosting?

I personally use ServInt for my hosting, and have for about 6–7 years now. I’m so, so happy with their service. They only provide Virtual Private Servers (VPS) and dedicated hosting, so ServInt is not for everyone. But I think their packages are reasonably priced and their customer service is excellent. Their customer service is excellent! I’m paying $59 (USD) per month and am happily doing so. I have complete control over my little area of the server and don’t have to worry about what someone else might have done that would bring my site down. I have a few sites on the server, and the VPS handles them easily.

In the forums, these companies have also been recommended:

Besides letting you all know about a good hosting company, if you’re looking for one, I wanted to thank the people who have used me as a reference when creating their own accounts with ServInt. I don’t know who has done so, but a couple of people have signed up with ServInt and mentioned me in the past few months, which gives me a small credit on my account. Thanks for that!

Q&A => What kind of infrastructure does a big site need?

I forget who asked me this question or when, but I’ll revisit it now as it’s pertinent. First, I have to define my terms. By “big”, in terms of traffic, let’s look at sites in the 100,000 to 10 million pageviews per day category. Below that range, you probably only need a single server and a CDN. Above that range, like an Amazon, YouTube, Twitter, Google, or Facebook, it’s not a question of how many servers are required, but how many buildings of servers are needed.

I recently spoke with someone responsible for a Web site that receives between 10 and 20 million pageviews per day. This is definitely a top 50 site. The site is a mixture of text content, images, and video. What do you think is required to handle that kind of demand? The answer may surprise you.

In this particular case, two and a half people maintain the site. The site runs on eight (8) Web servers, with two database servers, all in one location. They also use a CDN. That’s it. And these are actually Windows servers, running .NET.

Servers are very capable machines. When configured properly, a little bit of hardware can go a long way.
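
Some back-of-the-envelope arithmetic shows why so few machines can be enough. The Python sketch below converts daily pageviews into an average number of pageviews per second per Web server. It is only a rough estimate under stated assumptions: traffic is spread evenly across servers, one pageview is counted as one unit of work (in reality a page involves several requests, many of them served by the CDN), and the `peak_factor` parameter is a hypothetical knob for busy hours, not a figure from the site in question.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def pageviews_per_second(daily_pageviews, servers=1, peak_factor=1.0):
    """Average pageviews per second handled by each server.

    peak_factor is a hypothetical multiplier for busy periods
    (1.0 means traffic is assumed to be perfectly even all day).
    """
    if servers < 1:
        raise ValueError("servers must be at least 1")
    return daily_pageviews / SECONDS_PER_DAY * peak_factor / servers


if __name__ == "__main__":
    for daily in (10_000_000, 20_000_000):
        rate = pageviews_per_second(daily, servers=8)
        print(f"{daily:,} pageviews/day over 8 servers: about {rate:.1f} per second each")
```

With eight Web servers, 10 to 20 million pageviews per day averages out to roughly 14.5 to 28.9 pageviews per second per server. Real traffic comes in peaks rather than a flat line, so the true load at busy times is higher, but the numbers are still far smaller than the daily totals suggest.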

On the Blog => How to Pick a Data Center

In November, I posted a short article on my blog titled “How to Pick a Data Center”. The post is a description of, and reference to, an article put out by my hosting company on the geographic importance of selecting where your server is physically located.

On the Web => Scaling Lessons Learned at Dropbox

Last year I came across a good article titled Scaling Lessons Learned at Dropbox. In it, the author, Rajiv Eranki (the second engineer hired at Dropbox), explains how they successfully managed to scale Dropbox from 4,000 users to 40 million. It’s a very interesting read, presenting many good, concrete solutions to a problem that most of us are never fortunate enough to have. A bit technical at times, yes, but worth the read.

On the Web => Useful Services

Thomas Fuchs, creator of the script.aculo.us JavaScript library and the Freckle online time tracking site, posted a great page of useful services a couple of weeks back. These are sites and tools that Thomas and Amy Hoy have found to be invaluable in running their Freckle SaaS (Software as a Service). From Web site and server monitoring, to log management, to email tools, there are a dozen or more products that come highly recommended (and not all of them cost money). If you’re going big with a project, these are exactly the kinds of resources you need.

Larry Ullman’s Book News => “The Yii Book” version 0.6 Update Posted

Last week, I posted an update to “The Yii Book”. This is version 0.6, and it completes Part 2 of the book. The book currently consists of 14 chapters and 360 pages (as a PDF)!

In a couple of months, once the first version of the book is done, I’ll start pursuing translations, a print version, and an update for Yii framework version 2 (when that’s appropriate). In the meantime, I’m working away at getting the first edition completed.

And thanks to everyone for their interest in this book and to everyone that has already purchased a copy!