What is Larry Thinking? #45 => Olio

September 18, 2011


About This Newsletter

I used to do crossword puzzles regularly, although sadly I don’t anymore (I don’t get a newspaper, and I just can’t get into doing crossword puzzles on the computer). If you’ve done crossword puzzles, or know someone who does, you know that what they really teach you is an esoteric list of words that are useless in daily life. Olio is one such word: a miscellaneous collection of things. This newsletter has no single theme, so when searching for a title, I thought of olio. Olio: that’s what you’re getting here. And, if you’re lucky, it may also be the answer to 44 Down.

A couple of notes… First, in my previous newsletter I suggested that this issue might include my thoughts on doing mobile development with Flex 4.5, which I’ve been playing with for the past month. I haven’t gotten around to writing anything up yet, and there’s enough to say on the subject that I’ll probably cover the material in several blog posts. Or maybe it will be in the next newsletter. Who knows? Second, I’m giving away copies of the fourth edition of my “PHP and MySQL for Dynamic Web Sites: Visual QuickPro Guide” in this newsletter, as I just received my copies of the book last week.

As always, questions, comments, and all feedback are much appreciated. And thanks for your interest in what I have to say and do!

What Were You Thinking => Storing Uploaded Files in Multiple Directories

In the previous newsletter, I answered a question about storing uploaded files in multiple directories. Jamie was kind enough to point me to a video in which a member of the YouTube development team explains how they made YouTube scalable. The video is an hour long, and some of it is fairly technical, but if you’re curious about how a site like YouTube is developed, or how to design scalable Web sites in general, it’s worth your time. Around the 13:50 mark, the speaker begins talking about handling thumbnails: at one point YouTube could no longer accept new videos, because they had maxed out the number of files allowed per directory. Another point made in the presentation, which I might expound upon in a future newsletter, is that YouTube’s main concern wasn’t the site’s performance but the performance of its developers. In other words, how quickly changes could be made to the system on the fly was a much higher priority.

Thanks, Jamie, for the reference!
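As a reminder of what spreading uploads across directories can look like, here’s a minimal PHP sketch of one common approach: deriving subdirectory names from a hash so files land evenly across many folders. This isn’t necessarily what YouTube did or exactly what I described last time; the upload_path() function, the “uploads” base directory, and the two-level layout are hypothetical choices for illustration.

```php
<?php
// Hypothetical helper: spread files across nested subdirectories so that no
// single directory has to hold every upload. The two-level, two-character
// layout (up to 256 x 256 folders) is an illustrative assumption.
function upload_path(string $baseDir, string $fileName): string
{
    // md5() returns 32 hex characters that vary unpredictably with the
    // input, so files are distributed roughly evenly across the folders.
    $hash = md5($fileName);
    $level1 = substr($hash, 0, 2);
    $level2 = substr($hash, 2, 2);
    return $baseDir . '/' . $level1 . '/' . $level2 . '/' . $fileName;
}

// Usage sketch (assumes each stored file name is already unique):
$path = upload_path('uploads', 'video_12345.jpg');
echo $path, "\n"; // uploads/<2 hex chars>/<2 hex chars>/video_12345.jpg

// In a real upload handler you would then create the folder and move the file:
// mkdir(dirname($path), 0755, true);
// move_uploaded_file($_FILES['upload']['tmp_name'], $path);
```

Because the path is computed from the name alone, the script that later serves the file can recompute the same path without storing it anywhere.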

On the Web => NoSQL Standouts

InfoWorld recently posted a long, good article discussing the particular strengths and weaknesses of various NoSQL (aka non-relational) databases. If you’re intrigued by NoSQL databases, and maybe even thinking about playing around with one, it’s well worth the read.

Personally, I still find CouchDB and MongoDB the most attractive, partly because of their use of JavaScript and JSON, and partly because of the existing PHP libraries for interacting with them.

eWeek also recently ran an article on NoSQL databases, titled “Does NoSQL Matter for Your Organization?” This is a great question, because the benefits of non-relational databases (significant speed and scalability improvements) are something only a small subset of sites really need. The article also has a couple of good links to more useful information on the subject of non-relational databases.

On the Web => Optimizing Web Page Performance

While doing some research, I came across an article describing how browsers download Web page resources. The article explains that the HTTP/1.1 specification recommends that a browser keep no more than two simultaneous connections to a single server, so page components (images, CSS, JavaScript, and other media) are downloaded two at a time. This means that if your site has two CSS files, two JavaScript inclusions, and 12 images (16 components in all), the browser has to go through eight rounds of downloads, not counting the initial request for the HTML. However, the limit applies per hostname, so if some of the content comes from, say, www1.example.com and some from www2.example.com, the number of simultaneous downloads effectively doubles, which can greatly improve the site’s performance. As the article points out, though, performance got worse in its tests when more than four hostnames were used (and most browsers also cap the total number of simultaneous downloads, regardless of where the content comes from).

Anyway, this is an older article, but worth the quick read. The article contains links to two other good (but dated) discussions of this and related issues. This article is part of a larger four-part series on optimizing Web page performance.
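To make the arithmetic and the hostname trick concrete, here’s a rough PHP sketch. The shard_url() and download_rounds() functions are hypothetical, www1/www2.example.com are placeholder hostnames, and the rounds calculation is an idealized model that ignores browsers’ own overall connection caps. The design point worth noting is that each resource always maps to the same hostname, so browsers can still cache it.

```php
<?php
// Idealized model: with $perHost simultaneous downloads per hostname,
// $components files need ceil(components / (perHost * hostCount)) rounds.
function download_rounds(int $components, int $perHost, int $hostCount): int
{
    return (int) ceil($components / ($perHost * $hostCount));
}

// Hypothetical "domain sharding" helper: pick a hostname from the resource
// path itself, so a given file is always served from the same host (which
// keeps browser caching effective) while different files are spread out.
function shard_url(string $path, array $hosts): string
{
    $clean = ltrim($path, '/');
    // crc32() returns a non-negative integer on 64-bit builds of PHP.
    $index = crc32($clean) % count($hosts);
    return 'http://' . $hosts[$index] . '/' . $clean;
}

// 2 CSS files + 2 JavaScript files + 12 images = 16 components.
echo download_rounds(16, 2, 1), "\n"; // 8 rounds from a single hostname
echo download_rounds(16, 2, 2), "\n"; // 4 rounds split across two hostnames

$hosts = ['www1.example.com', 'www2.example.com'];
echo shard_url('/images/logo.png', $hosts), "\n";
```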

On the Web => Dive into HTML5

Some time ago I came across a book titled “Dive Into HTML5”, written by Mark Pilgrim. Although the work was also published as an O’Reilly book titled “HTML5: Up and Running”, you can read the content online at diveintohtml5.org. It’s a very clear, well-written, informative book, and a lovely Web site, too.

On the Blog => How Web Hosts Prey on Beginners

I was recently reading Popular Science (which I get a lot out of) when I ran across an ad for a major, well-known Web hosting company. The company obviously focuses on cheap hosting, with plans starting at US $5 per month, but a few things in the ad caught my eye. In particular, the ad reminded me of how many Web hosting companies, especially those that provide cheap hosting, prey on the ignorance of beginning Web developers. So I wrote about it in a blog post, “How Web Hosts Prey on Beginners.”

What is Larry Thinking? => Looking Both Ways

I randomly came across this quote, attributed to a person named Doug Linder:

A good programmer is someone who looks both ways before crossing a one-way street.

I have no idea who Doug Linder is (perhaps three of you will email me with this information), but it’s a really good quote. Restated, good programmers don’t rely upon expectations. I would suggest this corollary:

A bad programmer makes assumptions.

That may be a bit harsh (perhaps it’s better phrased as “An amateur programmer makes assumptions”), but the fact is that making assumptions leads to bugs, user interface problems, and security vulnerabilities. One of the hardest things for a developer, I think, is imagining how other people might use the software you create. Maybe it’s just me, but when I program, I have a sense of what the application (or Web site) should do and how it should be used. It’s very difficult for me to imagine what else people might try to do with the system I’ve created. But it’s exactly the ability to properly handle non-standard uses of a site or application, including malicious ones, that raises its security and professionalism.

As an example, one of the first things I do when testing the usability and security of a script is to submit its form without doing anything to it at all (i.e., entering no data and making no selections). What happens then? Are server-side errors shown, such as PHP’s “undefined variable” notices? The same goes for pages that expect to receive values in the URL: What happens if no value is present? What if the value is a string instead of an integer? What if it’s a negative integer instead of a positive one? Smartly written software works as it should when used correctly and, when used incorrectly, responds to the end user with appropriate messages, not server errors.
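Here’s a minimal sketch of handling a URL value defensively in PHP. The get_positive_int() function is a hypothetical helper, not a built-in; it relies on PHP’s filter_var() with FILTER_VALIDATE_INT and a min_range option to reject missing, non-numeric, zero, and negative values.

```php
<?php
// Hypothetical helper: read a required positive integer (such as ?id=42)
// from an array like $_GET. Returns null for anything unusable, so the page
// can show a friendly message instead of a PHP notice or a broken query.
function get_positive_int(array $source, string $key): ?int
{
    if (!isset($source[$key])) {
        return null; // no value present in the URL
    }
    // FILTER_VALIDATE_INT rejects strings like "abc" or "4.5";
    // min_range => 1 also rejects zero and negative integers.
    $value = filter_var($source[$key], FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1],
    ]);
    return ($value === false) ? null : $value;
}

// Usage sketch in a page script such as view.php?id=42:
// $id = get_positive_int($_GET, 'id');
// if ($id === null) {
//     echo '<p>Please select a valid item.</p>';
//     exit;
// }
```

Passing the source array in, rather than reading $_GET inside the function, also makes the check easy to exercise with deliberately bad values.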

As another example, when many beginning programmers write a PHP script that handles an HTML form, they assume that the PHP script will only ever receive data from that form as written. What most beginners don’t realize is that it’s very easy to fake an HTML form and submit any data at all to your PHP script. Say a site has form.html, which is submitted to form.php, and the form has a gender drop-down menu with option values of M and F. One might assume that the PHP script will only ever receive one of those two characters, making the associated variable safe to use in a query such as INSERT INTO users (…, gender, …) VALUES (…, '{$_POST['gender']}', …). In fact, this leaves the script vulnerable to SQL injection attacks…

I could create a form on my own computer that contains a textarea named gender, plus any other elements needed to pass whatever other validation the server-side script performs. I set the form’s action attribute to point to the site’s form.php script, even though the form is loaded from my own computer. Into that textarea, I enter ';DROP TABLE users;. This turns the query into INSERT INTO users (…, gender, …) VALUES (…, '';DROP TABLE users;', …). The theory is that the apostrophe I entered closes the string and the semicolon ends the INSERT query (leaving it syntactically invalid), after which the DROP TABLE users; query gets executed instead. (Whether a second statement like this actually runs depends on whether the database interface allows multiple statements per call, but other forms of injection don’t need that.) Not good. In this particular case, the security hole would be easily closed by actually verifying that gender has a value of M or F, rather than assuming it does just because your HTML form is written that way.
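In PHP, the whitelist check described above might look like the following sketch. The validate_gender() and insert_user() functions, along with the users table’s email and gender columns, are hypothetical. The second function goes a step beyond that fix by using a PDO prepared statement, so that even a value that slips past validation is treated as data rather than as part of the SQL command.

```php
<?php
// Whitelist validation: accept only the values the form is supposed to send,
// rather than assuming the browser sent one of them.
function validate_gender($value): ?string
{
    $allowed = ['M', 'F'];
    // Strict in_array() avoids PHP's loose comparisons between types.
    if (is_string($value) && in_array($value, $allowed, true)) {
        return $value;
    }
    return null;
}

// Hypothetical insert using a PDO prepared statement: the values are sent
// separately from the SQL, so an apostrophe in the input cannot end the
// string literal and start a new statement.
function insert_user(PDO $pdo, string $email, string $gender): bool
{
    $stmt = $pdo->prepare('INSERT INTO users (email, gender) VALUES (?, ?)');
    return $stmt->execute([$email, $gender]);
}

// Usage sketch in form.php:
// $gender = validate_gender($_POST['gender'] ?? null);
// if ($gender === null) {
//     echo '<p>Please select your gender.</p>';
// } else {
//     insert_user($pdo, $email, $gender);
// }
```

Validation and prepared statements address different things: the first catches bad data before it’s used, while the second keeps any data from being interpreted as SQL.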

This is just one example that’s easy enough to both do and undo. But I want to stress that assumptions don’t just create security holes for malicious people to exploit; they also create pits for your site’s users to fall into, making your site less usable or entirely unusable. I’ve certainly been to many sites where I had no idea what was expected of me (from a user interface perspective) or what I needed to do to continue using the site. In short, assumptions were being made on the front end and back end that I wasn’t privy to.

So, your assignment, should you choose to accept it, is to revisit a legacy script or piece of code you’ve written and see what, if any, assumptions you’ve made. Because you never know when some idiot is driving the wrong way down a one-way street!

Book Giveaway => “PHP and MySQL for Dynamic Web Sites: Visual QuickPro Guide” (4th Edition)

You must be subscribed to the newsletter to qualify for the book giveaway.

Larry Ullman’s Book News => “PHP and MySQL for Dynamic Web Sites” (4th Edition) and “Modern JavaScript”

I’m pleased to say that on Friday, September 9th, I received my hard copies of the latest edition of “PHP and MySQL for Dynamic Web Sites: Visual QuickPro Guide”. This would suggest that the book should be available for purchase, or in bookstores, now or very soon. For details about the book, please see its corresponding pages. I’ve also created a new forum for this edition. Thanks to everyone for their interest in the book!

I’m also pleased to say that I’ve officially begun writing my next book, “Modern JavaScript: Develop and Design.” The first two chapters are on their way to the editors! The first chapter presents a history of JavaScript in Web development terms, ending with today’s conventions. It’s a short chapter that could have been used as a long introduction, but I think there’s an argument for understanding JavaScript’s history in order to best use it today. Also, you’re likely to run across legacy code, so it’s beneficial to be able to recognize what you should no longer be doing.

The second chapter covers the importance of the DOCTYPE and then introduces HTML5. I will be using HTML5, in a limited way, for all of the examples in this book. This is partly because HTML5 has some great new form elements that are usable now and partly because HTML5 is clearly the future standard. Next, the chapter discusses key approaches to Web development and JavaScript programming: graceful degradation, progressive enhancement, and unobtrusive JavaScript. If you’re not already using these last two approaches, they’ll change the way you think about Web development and design. The chapter then jumps right into a real-world use of JavaScript, form validation, to whet your appetite and provide a sense of what you’re working towards.

After I send off this newsletter, I’ll begin writing Chapter 3. It goes through the “tools of the trade”: browsers, validators, debuggers, Web resources, and so on. And that concludes Part 1 of the book; Part 2 starts teaching JavaScript as a standalone language.

I believe the book will be made available online as I write it, and I’ll post details about that once I have them.