Using Amazon’s CloudFront as a CDN

February 20, 2011

In October 2010, I revamped my entire Web site, even switching the domain name in the process. The previous, outdated version of the site was custom built, using a standard template system (i.e., a couple of included files). The site didn’t have many features or much content on any one page, and it had only a smattering of dynamic behavior (a SuckerFish menu being about the most elaborate). The new version of the site uses WordPress as its basis, which means worse performance. Plus, I’m using several plug-ins, and there’s a ton more content on each page. In short, I have a better site in many ways but it performs much more poorly.

Thus, I’m embarking on the tedious but valuable process of improving the site’s performance in any way possible. One solution I just embraced is using Amazon’s CloudFront as a Content Delivery Network (CDN). In this post I explain why I chose CloudFront, how I went about setting it up, and what it’s actually costing me.

I first heard about CDNs about two years ago, when I started applying Yahoo!’s YSlow tool to my site. The premise behind a CDN is this: even though data travels across the Internet super quickly, the greater the distance that data has to travel the longer it will take, even if it’s an issue of milliseconds. If you have a worldwide market, users in, say, Australia or India will be loading your site more slowly than those in, say, the United States, where the site’s actual server is. This makes sense. However, upon finally learning about CDNs, I promptly dismissed the idea, as I really can’t afford having servers around the world, not to mention that creating the network and software for managing a CDN is completely beyond me. There are, and were, commercial CDN companies around, but those were generally geared towards bigger, more active sites than mine.

Note: For a basis of comparison, my site currently gets a bit over 60,000 visits and 400,000 page views per month, which translates into approximately 1.5 million hits (i.e., things being accessed) and 20GB of data. My site has a lot of textual content, but not that much in terms of media. If your site has more page views or media, the need for a CDN is greater.

Why Amazon?

When Amazon started promoting their Web services, I looked into them, but was quickly scared away by some of the calculations I was seeing. It turns out that if you enter the wrong numbers into Amazon’s payment estimator, you can quickly, and erroneously, come back with estimates of $1700 (US) per month! Well, well outside of my price range. And perhaps about 1,700 times more expensive than the reality. So why did I return to, and eventually use, Amazon?

I researched a number of CDNs, from the consumer-friendly MaxCDN to the professional Edgecast. Edgecast has a great network (one of the best I saw) and I’m sure would be fine, but you can’t even get a sense of how much it will cost without speaking to someone. It didn’t look like Edgecast is intended for the commoner. Conversely, MaxCDN is reasonably priced and has plugins for WordPress, but currently only has servers in North America and Europe. I figured if I was going to use a CDN, I wanted one that was more global (and I know that I get significant traffic from the Middle East, the Far East, and Australia/New Zealand). So this brought me back to Amazon, whose network is stellar, for obvious reasons. Amazon’s services are reasonably priced compared to the other numbers I’ve seen, and are really easy to use.

I’m not saying that CloudFront is the best CDN or the right CDN for you, but just like searching with Google is easy and reliable, or selling iPhone software through iTunes makes sense, there’s an argument for going with something user-friendly and established.

How do you set up Amazon’s CloudFront?

The first thing you have to do is create an Amazon Web Services account. This is free and opens up all of Amazon’s Web Services for you to pick and choose from: computation, database, e-commerce, networking, storage, etc. All I wanted (to start) was CloudFront, Amazon’s CDN. CloudFront requires that you use Amazon’s Simple Storage Service (S3), so you must first sign up for that (actually, you may be able to get away with not using it, but that’d be harder). S3 just provides a way to put content on Amazon’s system, so that it can then be distributed to CloudFront. Amazon lists the pricing for using S3 and it’s really cheap, like 14 cents per gigabyte stored to start. There are additional costs for data transfer and the number of requests, but neither is significant (I’ll share some hard numbers later on).

Note: At the time of this writing, Amazon has a Free Usage Tier, which gives you a good amount of usage for free for a year.

Once you’ve signed up with S3, you create a “bucket”, which is a storage vessel: a way of organizing your content. I created a single bucket for my single site, but you may need to create more than one. Into this bucket you’d put any content that needs to be stored. For my site, this is the template’s image files, the CSS files, the JS libraries, and whatever media is used by any posts. It does not include the PHP scripts or anything requiring immediate access to my server (e.g., databases and database calls). Right now, for my site, all the S3-stored data is only a few megabytes in size.

One trick is to organize the data in the bucket the same way it’s organized on your own server. This will make references to stored data that much easier. The data stored in an S3 bucket is available for serving (using the domain <your-bucket-name>.s3.amazonaws.com). In other words, www.example.com/img/me.png can now also be found at mybucketname.s3.amazonaws.com/img/me.png. But serving content from S3 provides little benefit over just hosting the data on your own server. To serve that data over a CDN, sign up for Amazon CloudFront.
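To illustrate the "same organization" trick, here is a minimal Python sketch that walks a local site directory and works out the bucket key each static file would get. The function name `bucket_keys` and the extension list are hypothetical choices for this example, and nothing here talks to S3; it only computes the names.

```python
from pathlib import Path

# Hypothetical list of file types worth storing in the bucket.
STATIC_EXTENSIONS = {".png", ".jpg", ".gif", ".css", ".js"}


def bucket_keys(site_root):
    """Map each static file under site_root to a key with the same relative path."""
    root = Path(site_root)
    mapping = {}
    for path in sorted(root.rglob("*")):
        # PHP scripts and other server-side files are deliberately skipped.
        if path.is_file() and path.suffix.lower() in STATIC_EXTENSIONS:
            # as_posix() keeps forward slashes, so the key matches the URL path.
            mapping[path] = path.relative_to(root).as_posix()
    return mapping
```

With keys built this way, a file at img/me.png on the server ends up at img/me.png in the bucket, so only the host name in a URL has to change.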

CloudFront has its own pricing, which you’ll pay on top of the S3 cost (although, again, it’s not as much as you might think). Different regions cost different amounts, from fifteen cents per gigabyte to twenty cents per gigabyte (the costs go down the more terabytes you transfer; I can’t imagine ever transferring a terabyte of data from this server). There’s also a per request cost. Once you’ve signed up for CloudFront, you need to create a “distribution”. You can create as many distributions as you want, like one for each bucket, although I just needed one for now.
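As a rough sanity check on those per-gigabyte rates, the arithmetic can be sketched in a few lines of Python. This assumes, pessimistically, that all 20GB of the site's monthly data (the figure from the note near the top) went through CloudFront, and it ignores the per-request and S3 charges. The function name is made up for the example, and amounts are kept in whole cents to avoid floating-point rounding.

```python
def transfer_cost_cents(gigabytes, cents_per_gb):
    """Return the data-transfer charge in whole cents at a per-gigabyte rate."""
    return gigabytes * cents_per_gb


MONTHLY_GB = 20  # the site's total monthly data, per the note near the top

low = transfer_cost_cents(MONTHLY_GB, 15)   # cheapest region quoted
high = transfer_cost_cents(MONTHLY_GB, 20)  # most expensive region quoted
print(f"${low / 100:.2f} to ${high / 100:.2f} per month")
```

Even under that worst-case assumption the transfer charge stays at a few dollars per month, nowhere near the four-figure estimates a mistyped number can produce.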

When creating the distribution, you can choose to download or stream the content. For most data, such as CSS and images, download is the proper method. You can select to log the transactions or not (I did not). I’ll get to CNAMEs in a bit. The comments field is for your own use. Then select “Enabled” and create the distribution.

The distribution gets its own domain name, which will be something really unmemorable, like d23590wishpx48.cloudfront.net. It will take perhaps a day for the distribution to be completely deployed, but after that you can use the CDN. To do so, just change the URL of any resource on your site so that it points to the distribution domain name. For example, www.example.com/img/me.png becomes d23590wishpx48.cloudfront.net/img/me.png. Now, instead of that image only being served from your server, it’ll be served from whatever CloudFront server is closest to the user (the non-CloudFront content will still be downloaded from your server).
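The URL change is mechanical, so it can be sketched in Python. This is only an illustration: the host is the example distribution domain used above, the extension list is a hypothetical choice, and the function expects full URLs with a scheme (http://...), since a bare "www.example.com/img/me.png" would be parsed as a path.

```python
from urllib.parse import urlsplit, urlunsplit

CDN_HOST = "d23590wishpx48.cloudfront.net"  # example distribution domain
STATIC_EXTENSIONS = (".png", ".jpg", ".gif", ".css", ".js")  # hypothetical list


def to_cdn_url(url, cdn_host=CDN_HOST):
    """Point a static resource at the CDN host; leave every other URL alone."""
    parts = urlsplit(url)
    if not parts.path.lower().endswith(STATIC_EXTENSIONS):
        return url  # e.g. PHP pages still come from the original server
    return urlunsplit(parts._replace(netloc=cdn_host))
```

For example, `to_cdn_url("http://www.example.com/img/me.png")` yields the same path on the CloudFront domain, while a URL ending in .php is returned unchanged.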

CloudFront provides “origin pull” service, which is common for CDNs. What this means is that the first time someone in Japan visits your site, their browser will request that image. CloudFront will then grab the image from S3 and store it at the “edge” location (that is, the CloudFront server) in Tokyo. Subsequent requests from Japan for the same resource will then immediately be served from the Tokyo system. The same goes for users all over the world. CloudFront will also handle caching of your data, which you can also customize. For example, CloudFront, by default, will check for updated versions of a file if a request comes more than 24 hours after the previous request for the same file.
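The origin-pull idea can be modeled with a toy cache in Python. This is a simplified sketch, not CloudFront's actual implementation: the class and parameter names are invented, one object stands in for a single edge location, and a copy is treated as stale 24 hours after the edge fetched it (a simplifying assumption).

```python
import time

DAY_SECONDS = 24 * 60 * 60


class EdgeCache:
    """Toy model of one edge location that pulls from the origin on a miss."""

    def __init__(self, fetch_from_origin, max_age=DAY_SECONDS, clock=time.time):
        self._fetch = fetch_from_origin  # callable: path -> content
        self._max_age = max_age
        self._clock = clock              # injectable so tests can fake time
        self._store = {}                 # path -> (content, time fetched)
        self.origin_fetches = 0

    def get(self, path):
        now = self._clock()
        cached = self._store.get(path)
        if cached is not None and now - cached[1] <= self._max_age:
            return cached[0]             # served straight from the edge
        content = self._fetch(path)      # first request, or the copy is stale
        self.origin_fetches += 1
        self._store[path] = (content, now)
        return content
```

Only the first request (and the first one after the copy goes stale) pays the cost of the trip back to the origin; everything in between is answered locally.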

Finally, a cosmetic improvement is to make it look like you’re running your own CDN. To do so, you can associate a CNAME with the distribution domain name. In my case, on my server I created a CNAME (like an alias) so that cloudfront.larryullman.com equates to d23590wishpx48.cloudfront.net. This becomes part of the DNS record of my server and my site. Then I returned to Amazon’s panel and added that CNAME to the distribution. Now, any file destined for the CDN is linked to cloudfront.larryullman.com instead of www.larryullman.com , a minor but useful distinction.

What is it costing me?

The main thing keeping me from using a CDN before was cost, or what I thought the cost would be. I’ve been using CloudFront for about a week now. The following image shows the hard numbers and my to-date cost (for one week) of 12 cents. Yep, 12 cents.

Now, to be fair, I haven’t been making the most of CloudFront yet, and I’m slowly making sure that as many resources as possible are stored on, and pulled from, the CDN, so the cost will no doubt go up. But it’s looking like my charges should be maybe a couple of dollars per month. If I can get the slightest bit of speed improvement for those dollars, it will be worth it. Speaking of speed…

Did it work?

I started with a CDN to improve my site’s performance, so the final question is: Have I? Well, I can’t say. The site’s server happens to be only a couple hundred miles away from me, and the CloudFront server that I’d likely pull from is just about in the same zip code as it, so I won’t personally see the benefit. Also, YSlow, which factors in use of a CDN as a grading component, only recognizes a few CDNs offhand, meaning you still have to tell YSlow that you are, in fact, tapping into the CDN potential. And, for those users in foreign lands, far away from my server, only a third or so of the data will be provided by the CDN. But every little bit helps, in my opinion, and those little performance improvements will be that much more meaningful as the site grows in popularity. And there’s obviously a justification for a CDN: as you can see in the above image, approximately 64% of the CDN requests are coming from outside of the United States (which is probably to say North America). For pennies, or even dollars, per month, it’s worth doing.

Update

First, it’s been a couple of days since I posted this, and I’ve pointed more of the site’s resources to the CDN, so my total charge thus far is all the way up to 22 cents. So there’s that. I expect I’m going to get to a point of around $2-3 per month.

Second, one argument against using Amazon’s Web Services as I’ve described here is that every time I change a file that’s served via the CDN, such as a CSS file, I need to upload the latest version to Amazon’s S3. I happen to be using a WordPress plug-in that will upload changed files for me, but still, this is something to consider.
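If you don't have a plug-in handling this, one way to spot which files need re-uploading is to compare content hashes against a saved manifest. The Python below is a minimal sketch of that idea, not a description of how any particular plug-in works; the function names and the in-memory manifest are assumptions for the example, and the actual upload step is left out.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return a SHA-256 hex digest of the file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed_files(paths, known_digests):
    """List files whose contents differ from the digests recorded last time.

    known_digests (path string -> digest) is updated in place, so a second
    call with unchanged files returns an empty list.
    """
    changed = []
    for path in paths:
        key = str(path)
        digest = file_digest(path)
        if known_digests.get(key) != digest:
            changed.append(key)
            known_digests[key] = digest
    return changed
```

Only the files returned by `changed_files` would need to be sent to S3 again; in practice the manifest would be saved to disk between runs.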

Another Update

It’s now been over 2 years since I started using Amazon’s Web Services. For the previous complete month (March 2013), my total bill was $3.44 (US). Most of that ($3.27) was for CloudFront, with 16 cents for S3 (storage) and a penny for extra data transfer. In that month, CloudFront served up about 700,000 requests in the US, over 900,000 in Europe, about 600,000 in Asia and the Pacific, plus 100,000 for South America and Australia.