A Simple Approach to Site Security

April 17, 2010

There are two kinds of security that Web sites, applications, and operating systems can provide: perceived security and actual security. Perceived security is still important, because that’s what convinces users that it’s safe to, for example, provide their personal information to your Web site. But actual security is the key. Think of it as the difference between having a sign in front of your house saying it’s protected by a security system and actually having a security system. But if you’re anything like me, you’ve never tried to hack someone’s Web site and aren’t generally inclined to think like a person who would, so how do you make your sites secure? Here’s what I do…

You really have to start thinking about security from the get-go, which means the database. Secondary column properties make a big difference to the reliability of the stored data. This means identifying columns as NOT NULL, UNSIGNED, a particular number of decimals, and applying a UNIQUE index (all of these as appropriate, of course). These settings will prevent bad data from getting into the database, such as negative quantities or duplicate email addresses (the PHP code will need to handle those MySQL rejections, though).
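As a rough sketch of those column properties and of handling the rejections in PHP: the table, its columns, and the friendly_db_error() helper below are hypothetical names made up for illustration. The numbers are MySQL's server error codes for a duplicate key (1062), a NULL in a NOT NULL column (1048), and an out-of-range value (1264, which MySQL reports as an error only when running in strict SQL mode).

```php
<?php
// Hypothetical table; the names are for illustration only.
$create_table = "CREATE TABLE signups (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    email VARCHAR(80) NOT NULL,
    quantity SMALLINT UNSIGNED NOT NULL,
    amount DECIMAL(7,2) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY (email)
)";

// Turn a MySQL error number into a message that is safe to show a user.
function friendly_db_error(int $errno): string
{
    switch ($errno) {
        case 1062: // duplicate entry for a UNIQUE index
            return 'That email address has already been registered.';
        case 1048: // NULL given for a NOT NULL column
            return 'A required value was missing.';
        case 1264: // out-of-range value (strict SQL mode)
            return 'One of the numbers submitted was out of range.';
        default:
            return 'The request could not be saved. Please try again later.';
    }
}
```

After a failed query, the number would come from mysqli_errno($dbc), or from the exception's getCode() if mysqli is set to throw exceptions.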

In your PHP code, you’ll want to use regular expressions to validate data when you can. And all strings should be run through an escaping function like mysqli_real_escape_string() prior to using them in a database query. I would do this even if the data already passed a regular expression (you cannot be too careful). You can also consider applying strip_tags().
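Here is a minimal sketch of that sequence: strip tags, validate with a regular expression, then escape right before the query. The function names, the pattern (letters, spaces, apostrophes, periods, and hyphens, 2 to 40 characters), and the users table are assumptions for the example, not rules to copy as-is.

```php
<?php
// Hypothetical validator: returns the cleaned name, or null when invalid.
function validate_name($raw): ?string
{
    if (!is_string($raw)) {
        return null; // e.g. an array sent as name[]=x
    }
    $name = trim(strip_tags($raw));
    if (preg_match('/^[A-Za-z\' .-]{2,40}$/', $name) !== 1) {
        return null;
    }
    return $name;
}

// Escape even though the value passed validation.
// $dbc is assumed to be an open mysqli connection.
function build_name_query(mysqli $dbc, string $name): string
{
    $safe = mysqli_real_escape_string($dbc, $name);
    return "SELECT id FROM users WHERE last_name = '$safe'";
}
```

A name such as O'Brien passes validation with its apostrophe intact, which is exactly why the escaping step still matters.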

For numeric values, I strongly recommend type-casting:

$something = (int) $_POST['something'];

Type-casting to numbers will make the data safe to use in queries. If you type-cast a non-numeric string as an integer, the result will be zero, so you can type-cast, then check for a valid value (i.e., greater than zero). Valid numeric values will just be formally converted from a string with that value to a number with that value.
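The cast-then-check idea might look something like this. The positive_int() helper is a hypothetical name, and the usage line assumes PHP 7 or later for the ?? operator.

```php
<?php
// Hypothetical helper: returns a positive integer, or false when the
// value is missing or does not cast to a number greater than zero.
function positive_int($raw)
{
    if (!is_scalar($raw)) {
        return false; // null and arrays are rejected outright
    }
    $number = (int) $raw;
    return ($number > 0) ? $number : false;
}

// Typical use with form data:
$something = positive_int($_POST['something'] ?? null);
```

One thing to be aware of: a string that merely starts with digits, such as '12abc', casts to 12, so the cast makes the value safe for a query without proving the user typed a clean number.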

With variables, it’s also best to assume the values are invalid or not present at all, then prove otherwise.
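In code, "assume invalid, then prove otherwise" can be sketched as follows. The read_order() function and its two form fields are made up for the example, and it uses PHP's built-in filter_var() for the email check where a regular expression would also work.

```php
<?php
// Hypothetical handler for a form with 'email' and 'quantity' fields.
function read_order(array $post): array
{
    // Start from "invalid" and only overwrite after a check passes.
    $email = false;
    $quantity = false;
    $errors = array();

    if (isset($post['email']) && is_string($post['email'])
        && filter_var($post['email'], FILTER_VALIDATE_EMAIL)) {
        $email = $post['email'];
    } else {
        $errors[] = 'Please enter a valid email address.';
    }

    if (isset($post['quantity']) && is_scalar($post['quantity'])
        && (int) $post['quantity'] > 0) {
        $quantity = (int) $post['quantity'];
    } else {
        $errors[] = 'Please enter a quantity greater than zero.';
    }

    return array('email' => $email, 'quantity' => $quantity, 'errors' => $errors);
}
```

Because every index is checked with isset() before it is read, calling read_order($_POST) on an empty submission produces two error messages and no "undefined index" notices.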

When you get to your HTML, there’s not much you can do to guarantee security, but you can encourage it. For example, you can limit the length of a text input to whatever is the maximum size of the corresponding column in the database. Or you can use drop-down menus with preset values. Client-side validation using JavaScript is nice for the end user, but is not a real security tool as JavaScript can easily be disabled.

Once you’ve done all that (and hopefully you already are doing most of these things), you can run some tests to find potential security holes. To start, I think about the three kinds of people that use the site: those that use it perfectly, exactly as intended; those that use it without malicious intent, but that might cause problems; and those that are trying to hack the site. In the first category you have pretty much just me. I’m developing the site, I know what it’s supposed to do and how I think it’s to be used, and I’m not likely to break it. In the second category is almost everyone. They just want to use the site, they may make mistakes, they may have apostrophes in their names, and they just expect the site to work regardless. For these people, the goal is to provide a proper user interface, point out mistakes when they occur, and ensure that clean, proper data is used at every step of the way. For the third category of user, they’ll do everything they can, including taking extraordinary steps, to try to break your site in order to get some information they deem useful.

There’s really no point in testing the site as I’d use it, so I quickly start imitating the behavior of the second group of users. First, I test my PHP scripts by submitting forms without doing anything at all. The result should be appropriate error messages to the end user, without ANY “undefined index” or similar PHP errors. The same goes for scripts that expect to receive values in the URL: test them without sending any values in the URL and see the result. Second, I fill out forms using invalid values: numbers for strings and vice versa. What is the result? Third, I fill out the forms using potentially valid, but complicated values, such as strings with apostrophes and quotation marks, possibly even with HTML.

As I said, I’m not maliciously inclined (or I’d like to think I’m not), so it’s hard for me to think like a hacker. Much of what a hacker might try to do will be caught by the previous set of tests, though. If your site handles invalid data, apostrophes, quotation marks, and HTML tags properly, you’ve already done a lot to prevent bad things from happening. Hackers might take things further though, like creating their own HTML form that submits data to your PHP scripts (you can POST data without using a form at all, too). What if a hacker were to create a copy of your form, and replace your “quantity” or “states” drop-down menu with a text input, so they could use their own values? How well would the PHP script handle that contingency?
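One way to handle the swapped drop-down is to check the submitted value against the same list used to build the menu. This is only a sketch; the $allowed_states array and the valid_choice() helper are hypothetical.

```php
<?php
// Hypothetical list; in practice it would be the same array that
// generates the options in the drop-down menu.
$allowed_states = array('CA', 'NY', 'PA', 'TX');

// Returns the value if it is one of the allowed choices, false otherwise.
function valid_choice($raw, array $allowed)
{
    // The third argument makes in_array() compare strictly, so a
    // number or other type can never loosely match a string.
    if (is_string($raw) && in_array($raw, $allowed, true)) {
        return $raw;
    }
    return false;
}
```

With this in place, it no longer matters whether the value arrived from your menu, a copied form, or a hand-built POST request: anything not in the list is rejected.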

Hackers also like to get into places they shouldn’t, so try directly accessing scripts or directories that should be protected. Also try directly running scripts that are meant to be included, such as a configuration file. If you’re serving files to only authenticated users, can those files be viewed directly?
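For included files, one possible guard is to compare the script the server was asked to run against the include file itself. The requested_directly() helper and the file paths are assumptions for this sketch, and comparing only base names means two files with the same name in different directories would be treated as the same file.

```php
<?php
// Hypothetical helper: true when the script PHP was asked to run is the
// include file itself, rather than a page that included it.
function requested_directly(string $include_file, string $script_filename): bool
{
    return basename($include_file) === basename($script_filename);
}

// At the top of a hypothetical includes/config.inc.php, the same check
// would be written inline, since no other file has been loaded yet:
//
// if (basename($_SERVER['SCRIPT_FILENAME']) === basename(__FILE__)) {
//     http_response_code(403);
//     exit;
// }
```

Testing it is just what the paragraph above describes: load the include file's URL in the browser and confirm nothing useful comes back.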

Sometimes the goal of a hacker is to find out information about your server, so you have to be extra careful that the site does not give too much away, specifically, too much about PHP, Apache (or whatever Web server), and MySQL. Towards that end, you have to make sure that MySQL errors are handled properly, without revealing anything to the end user. To do this, you may need to temporarily break your database scripts to see the result. For example, you’ve got a site working and it’s not giving away any secrets, whether or not the user behaves themselves. But what if the database server goes down or is too busy or something else happens to prevent your PHP scripts from connecting? What would the user see then? Hopefully nothing but a generic “system is down/come back later” apology.
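A failed connection could be handled along these lines. The open_database() function is hypothetical: it takes the connection attempt and the logger as callables so that the failure path can be exercised without actually taking the database down.

```php
<?php
// Hypothetical wrapper. $connect is any callable that returns a
// connection, or returns false or throws on failure, for example
// function () { return mysqli_connect('host', 'user', 'pass', 'db'); }
// $log receives the real error, which the user never sees.
function open_database(callable $connect, callable $log): array
{
    $generic = 'The system is temporarily unavailable. Please come back later.';
    try {
        $dbc = $connect();
    } catch (Throwable $e) {
        $log($e->getMessage());
        return array('dbc' => null, 'message' => $generic);
    }
    if ($dbc === false) {
        $log('Database connection failed without an exception.');
        return array('dbc' => null, 'message' => $generic);
    }
    return array('dbc' => $dbc, 'message' => null);
}
```

In a real script the logger might be something like error_log(), and the page would print only the generic message before stopping.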

With practice, developing sites to properly handle all of these situations and types of users becomes second nature (not that you shouldn’t continue to test them). But I know that as you’re just getting going, it’s natural to feel like you haven’t done enough or tested enough. I hope that these few paragraphs provide a better feel for the process.