Thursday, September 17, 2009

how to build a high-traffic website

this is all common sense information based on some simple principles. i am not an engineer, but i feel i understand what makes things fast or slow in general, and how to apply those principles in a world of software and enterprise systems.

let's start with cars. something light is going to be faster than something heavier, given the same horsepower. if it's lighter it takes less force to move it. you can compare this to computing as anything 'light' in memory use, disk operations or cpu cycles will take less time to finish 'computing.' keep everything light so you can scale 1000x at a moment's notice.

next, you need your 'frame' to be strong enough to handle the horsepower you throw at it. sheet metal and aluminum will fold up pretty quickly when the torque of 500 horses jars and twists it all at once. this applies not only to the framework of your software, but to the framework of your systems and your network which your systems operate on. they need to scale infinitely (yes, as in to infinity) and they need to be built to handle all the future needs/requirements. when you try to add a hack or reinvent your architecture later you'll find yourself in quite a tight spot, and it will be better if you can just move your existing resources around to handle additional load.

of course, if a car is exorbitantly fast you'll need lots of safety to keep all that power and speed in check. this can mean redundancy and also testing/integrity checking. first you need to consider your technology: are you using Hadoop? if so, you've got a big gaping single point of failure (the NameNode) you need to ensure never goes down, or can always be replicated or hot-failed quickly. what happens when any given part of your infrastructure goes down? will it cause the whole thing to come to a screeching halt or will your car still drive OK with just 7 cylinders? you'll also need to write tests for your software if it changes frequently to ensure it's operating as expected. you would also do well to implement a configuration management and tripwire solution to track any change on your systems and apply the correct known-good configuration. this isn't a security requirement, this is essential to just keeping everything running.

next, you'll need lots of dials, knobs and lights to know everything is running smoothly. when you have a race car or even a humble sports car, you need to know your oil is OK, your fuel is plentiful and that your engine isn't knocking. this means lots and lots of stats and graphs and monitors. more than you could ever need. you don't want to be caught with your pants down and your car stalled out in the middle of the track because you weren't watching the oil temperature. you need to make sure your disks aren't filling up, and know before they do. you need to know the *rate* at which your resources are being used so you have ample warning before things get too far gone to fix. you need to know when some service is down, and if at all possible get it back up automatically.
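to make the 'rate' point concrete, here's a tiny sketch of projecting when a disk fills up. the sample numbers are invented; a real monitor would pull usage from statvfs() or your graphing system:

```python
# rough sketch: estimate time-until-full for a filesystem from two
# usage samples taken some interval apart. the numbers below are
# made up; in practice you'd collect them with os.statvfs() or
# from whatever is already graphing your disks.

def hours_until_full(used_pct_then, used_pct_now, interval_hours):
    """linear projection of when usage hits 100%."""
    rate = (used_pct_now - used_pct_then) / interval_hours  # pct per hour
    if rate <= 0:
        return None  # not growing; no projected fill time
    return (100.0 - used_pct_now) / rate

# hypothetical samples: 70% used an hour ago, 72% used now
eta = hours_until_full(70.0, 72.0, 1.0)
print(f"projected hours until disk is full: {eta:.1f}")
```

the same linear projection works for any countable resource: inodes, connection counts, table sizes. the point is to alarm on the projection, not on the current value.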

finally: the car's performance is dependent upon its mechanics and engineers. you need the people working on it to know what they're doing and you have to make sure they're doing their job right, or you could end up with a catastrophe. you can't afford to have a developer writing sql queries that take 50 seconds to complete. you also can't afford to have a sysadmin making local edits or ignoring system faults, or really anybody faltering on their layer. you *need* to police everyone and make sure everything is done *right* with no shortcuts or bad designs. you cannot ignore the fact that diligence will always result in a more reliable system than one built without it.

Friday, September 11, 2009

how much memory is my application really using?

In order to properly visualize how much memory is being used by an application, you must understand the basics of how memory is managed by the Virtual Memory Manager.

Memory is allocated by the kernel in units called 'pages'. Memory in userland applications is lumped into two generic categories: pages that have been allocated and pages that have been used. This distinction is based on the idea of 'overcommittal' of memory. Many applications attempt to allocate much more memory for themselves than they will ever use. If a program asks for memory and the kernel refuses, the program usually dies (though how to handle the failure is up to the application). To prevent this, the kernel allows programs to over-commit themselves to more memory than they could possibly use. The result is that programs think they can use a lot of memory when really your system probably doesn't even have that much memory.
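To see the allocated-versus-used distinction in action, here is a small Python sketch (Linux-only, since it reads /proc/self/status): mapping a large anonymous region grows the virtual size immediately, while the resident size barely moves until the pages are actually written.

```python
# Sketch: watch the gap between memory a process has *allocated*
# (virtual size, VmSize) and what it has actually *touched*
# (resident set, VmRSS). Linux-only.

import mmap
import re

def vm_stats():
    """Return (VmSize, VmRSS) in kB for the current process."""
    stats = {}
    with open("/proc/self/status") as f:
        for line in f:
            m = re.match(r"(VmSize|VmRSS):\s+(\d+)\s+kB", line)
            if m:
                stats[m.group(1)] = int(m.group(2))
    return stats["VmSize"], stats["VmRSS"]

size_before, rss_before = vm_stats()
region = mmap.mmap(-1, 64 * 1024 * 1024)  # 'allocate' 64 MB, touch nothing
size_after, rss_after = vm_stats()

# Virtual size jumps by ~64 MB; resident size barely moves,
# because untouched pages are never actually backed by RAM.
print("VmSize grew by", size_after - size_before, "kB")
print("VmRSS  grew by", rss_after - rss_before, "kB")
```

Writing into the region afterwards would pull VmRSS up toward VmSize, one faulted page at a time.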

The Linux kernel supports tuning of several VM parameters, including overcommit. There are 3 basic modes of operation: heuristic, always and never. Heuristic overcommit (the default) allows programs to overcommit memory, but refuses an allocation if it is wildly more than the system contains. Always overcommit will pretty much guarantee an allocation. Never overcommit will strictly only allow allocation up to the amount of swap + a configurable percentage of real memory (RAM). The benefit of never overcommit is that applications won't simply be killed because the system ran out of free pages; instead they receive a nice friendly error when trying to allocate memory and *can* decide for themselves what to do.
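On a Linux box you can check which mode you are in by reading the sysctl files directly. A minimal Python sketch (the names are just labels for the 0/1/2 values of vm.overcommit_memory):

```python
# Sketch of inspecting the overcommit knobs discussed above.
# Linux sysctls live under /proc/sys/vm/. Mode 0 is heuristic
# (the default), 1 is always, 2 is never.

MODES = {0: "heuristic", 1: "always", 2: "never"}

def overcommit_mode():
    with open("/proc/sys/vm/overcommit_memory") as f:
        return MODES[int(f.read())]

def overcommit_ratio():
    """Percent of RAM counted toward the commit limit in 'never' mode."""
    with open("/proc/sys/vm/overcommit_ratio") as f:
        return int(f.read())

print("overcommit mode: ", overcommit_mode())
print("overcommit ratio:", overcommit_ratio(), "%")
```

The same values can of course be read and set with sysctl; writing to these files requires root.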

Now, let's say the program has allocated all it needs. The userland app now uses some of the memory. But how much is it using? How much memory is available for other programs? The dirty truth is, this is hard to figure out mostly because the kernel won't give us any easy answers. Most older kernels simply don't have the facility to report what memory is being used so we have to guesstimate. What we want to know is, given one or more processes, how much ram is actually being used versus what is allocated?

With the exception of very new kernels, all you can really tell about the used memory of a program is how much of it is in physical RAM. This is called the Resident Set Size, or RSS. Since the 2.6.13 kernel we can use the 'smaps' /proc/ file to get a better idea of the RSS use. The RSS is split into two basic groups: shared (pages used by 2 or more processes) and private (pages used by only one process). The memory a process physically takes up in RAM combines the shared and private RSS, but this is misleading: the shared memory is used by many programs, so counting it more than once would falsely inflate the amount of used memory. The solution is to take all the processes whose memory you want to measure, count all the private pages they use, and then (optionally) count the shared pages too. The catch is that other programs outside your set may be using the same shared pages, so this can only be considered a rough estimate of the total memory your application is using (it may be an over-estimation).

With recent kernels some extra stats have been added which can aid in giving a good idea of the amount of memory used. Pss is a value found in smaps on kernels greater than 2.6.24. It contains the private RSS plus the shared memory divided by the number of processes using it, which (when summed with the other processes sharing the same memory) gives you a closer idea of the amount of memory really being used - though again this is not completely accurate, because processes outside the set you're measuring may also be using that shared memory. In very recent kernels the 'pagemap' file allows a program to examine all the pages allocated and get a very fine-tuned look at what is allocated by what. This is also useful for determining what is in swap, which otherwise would be impossible to find out.
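As a rough illustration (not the meminfo.pl script discussed below, just a Python sketch of the same smaps arithmetic on Linux), here is one way to total up the private, shared and Pss figures for a single process:

```python
# Sketch: sum per-process memory from /proc/<pid>/smaps.
# Private and shared are totalled from the Clean/Dirty lines;
# Pss is only present on >= 2.6.25 kernels. Linux-only, and you
# need permission to read the target process's smaps file.

import os
import re

def smaps_summary(pid):
    """Return a dict of kB totals: private, shared, pss."""
    totals = {"private": 0, "shared": 0, "pss": 0}
    with open(f"/proc/{pid}/smaps") as f:
        for line in f:
            m = re.match(r"(Private|Shared)_(?:Clean|Dirty):\s+(\d+)\s+kB", line)
            if m:
                totals[m.group(1).lower()] += int(m.group(2))
                continue
            m = re.match(r"Pss:\s+(\d+)\s+kB", line)
            if m:
                totals["pss"] += int(m.group(1))
    return totals

print(smaps_summary(os.getpid()))
```

To measure a whole application, run this over every pid belonging to it and sum the private totals; the Pss totals give the fairer per-process share of what's shared.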

Based on all this information, one thing should be clear: without a modern kernel, you will never really know how much memory your processes are using! However, guesstimation can be very accurate and so we will try our best with what we have. I have created a perl script which can give you a rough idea of memory usage based on smaps information. You can pass it a program name and it will try to summarize the memory of all processes based on that program name. Alternatively you can pass it pid numbers and it will give stats on each pid and a summary. For example, to check the memory used by all apache processes, run 'meminfo.pl httpd'. To check the memory of all processes on the system, run 'ps ax | awk 'NR>1 {print $1}' | xargs meminfo.pl' (the NR>1 skips the PID header line).

Some guidelines for looking at memory usage:

* Ignore the free memory, buffers and cache figures. 99% of the time they will not tell you what you want to know, and they are misleading. Buffers and cache may be reclaimed at any time if the system is running out of resources, and 'free' memory is almost meaningless - it counts pages of physical memory not yet allocated, and realistically your RAM should be mostly allocated most of the time (for buffers, cache, etc. as well as miscellaneous kernel and userland memory).

* If you don't have Pss or Pagemap in your kernel, a rough guess of used memory can be had by either adding up all the Private RSS of every process or subtracting the Shared RSS from the RSS total for every process. This still doesn't account for kernel memory which is actually in use and other factors but it's a good start.

* Do not make the mistake of confusing swap use with 'running out of memory'. Swap is used by the kernel even if you have tons of free memory. The kernel tries to intelligently move inactive memory to swap to keep a balance between responsiveness and speed of memory allocation/buffer+cache reclamation. Basically, it's better to have lots of memory you don't use sitting in swap than in physical RAM. You can tune your VM to swap less or more, but it depends on your application.

Monday, September 7, 2009

personal preferences for managing large sets of host configuration

after dealing with a rather befuddling cfengine configuration for several years i can honestly say that nobody should ever be allowed to maintain it by themselves. by the same token, a large group should never be allowed to maintain it without a very rigorous method of verifying changes, and policies that will *actually* be enforced by a manager, team lead or other designated person(s).

the problem you get when managing large host configurations with something like cfengine is the flexibility in determining configuration, and how people differ in their approach to applying it. say you want to configure apache on 1000 hosts, many of them having differences in the configuration. generally one method will be set down and *most* people will do it the same way. this allows for simplicity in terms of making an update to an existing config. but what about edge cases? those few, weird, new changes that don't work with the existing model? perhaps one involves a version of the software so much newer or older that the configuration method changes drastically.

how can you trust your admins to make it work 'the right way' when they need to make a major change? the fact is, you can't. it's not like they're setting out to create a bad precedent. most of the time people just want to get something to work and don't have the foggiest how, but they try anyway. this results in something which is much different and slightly broken compared to your old working model, but since it's new nobody else knows how it works.

i don't have a time-tested solution to this, but i know how i'd do it if i had to set down the original model. it comes down to the same logic behind managing open source software. it's important every contributed change works, yes? and though you trust your committers you want to ensure no problems crop up. you want to make sure no drastic design changes happen and that in general things are being done the right way. the only way to do that involves two policies.

1. managerial overview. this is similar to peer-review, except there are only 1 or 2 peers whose task it is to look at every single commit, understand why it was done the way it was, and decide to fix or remove the commit if it violates the accepted working model. this requires a certain amount of time from an employee, so a team lead makes more sense than a manager. it's not a critical role but it is an important one, and anyone who understands the model and the inner workings of your configuration management language should be capable.

2. strict procedures for changing configuration management. this means you take the time to enumerate every example of how your admins can modify your configuration management. typically this also includes a "catch-all" instruction to get verification from a manager, team lead, or other person-in-charge for any change outside the scope of the original procedures. this requires a delicate touch; bark too hard at misdirected changes and people will prefer not to contact you in the future. on the other hand, if you don't insist that people follow the original procedures and use good judgment, you'll get called all the time.

at the end of the day it all comes down to how you lay out your config management. it needs to be simple and user-friendly while at the same time extensible and flexible. you want it to be able to grow with uncommon uses while at the same time not being over-designed or clunky. in my opinion, the best way to go about this is to break up everything into sections, just like you would a filesystem full of irregular material.

the parent directories should be very vague/general and become more specific as they go down, while at the same time always having the ability to group similar configuration into those sub-directories. a depth of 4 or 5 is a good target. don't get too worried about making it too specific; the more general you are at each step, the easier it is to expand the configuration there in the future as well as making it easier for users to find and apply configuration where it makes sense.
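as a sketch of what that general-to-specific layering might look like (every name here is invented, not a prescription):

```
config/
  services/              # most general: what kind of thing is this?
    web/
      apache/
        vhosts/          # most specific: per-site config lands here
        modules/
      nginx/
    database/
      mysql/
        replication/
  hosts/
    roles/               # group hosts by role, not by hostname
```

the point is that a new piece of configuration almost always has an obvious home, and a whole new service just adds one directory rather than reshaping the tree.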

you also need to consider: "how practical is it to manage this configuration on this host?" for any given change on any given host, it should take no longer than 5 minutes to determine how to make that change (using your configuration management system). in this way anyone from the NOC to the system engineers or architects can modify the system in real-time when it counts. documentation is no substitute for intuitive user-friendliness. documentation should explain why something is the way it is, not how to do it or where to find it.

note puppet's style guide and its reasons for its formatting: "These guidelines were developed at Stanford University, where complexity drove the need for a highly formalized and structured stylistic practice which is strictly adhered to by the dozen Unix admins"

*update* i don't think i even touched on it, but in cfengine the use of modules should probably be leveraged heavily rather than sparingly. the more code you shove into your modules, the less you'll need in your inputs, and thus the easier it will be to manage the hulking beast of lengthy input scripts.

Tuesday, September 1, 2009

e-mail compared to the postal service

I think it's interesting to compare the way e-mail works to the way the Postal Service works. Let's look at it from the perspective of the Postal Service handling our e-mail.

First off, everyone pays to send and receive e-mail. Even if it's indirectly (an advertisement in your mail or on your mailbox) someone's paying for your mail. Instead of buying postage for each item you send, you pay a lump sum every month - around $2 per month let's say. Each item you send has the same rate - a postcard and a package are the same under your monthly fee. However, you can only send packages up to 10lbs in weight. You can also only store up to 100lbs (that's not to scale) of mail in your mailbox at any time. If you want to send more or keep more mail, you'll have to pay more per month.

When you mail a letter [e-mail], it goes to your local post office. But what then? In the world of e-mail, every post office is basically a separate fiefdom and subject to its own laws. Your post office will try to deliver your mail to where it needs to go, but it may have to go through several other post offices along the way, and all of them have different rules about whether it is allowed to be delivered. Most of it comes down to whether or not they determine your message is unsolicited bulk mail [spam]. If they think your mail is too suspect, or you're sending too many pieces of mail at once, or you sent it from a post office that may send a lot of spam-like letters, or a number of other reasons: they will not accept your mail. You or your post office will have to convince the other post offices that indeed your letter is a real and honest letter and should be accepted.

Companies that advertise via mail in this way are subject to a common problem: post offices refusing their mail. Sometimes it's genuine; a company may be sending what amounts to spam, and any given post office may not want to pass that on since it may be filling up their queues with excessive mail. A lot of the time the mail is genuine but the post office still deemed it to be spam. Either they're just sending too much and seem suspicious, or some of the recipients don't live at those addresses anymore, who knows. Maybe they just don't have the right official papers [SPF records] and the post office wants them to prove they're real. Everybody has their own criteria and they can do what they want in terms of sending your mail through or not.

It's not their fault. The poor post offices are under siege by tons and tons of mail which most of their users don't want. If the post offices weren't so selective, you'd have full mail bags dumped at your mailbox every day and it'd be nearly impossible for you to sort out an important document from the IRS or something like that. They need to verify that the mail coming in is vaguely genuine or it'll just clog up the whole system.

This isn't an exact representation of how the system works, but it's close. The sending and receiving of mail is basically a game of can-i-send-it, should-i-accept-it, with lots of negotiating and back-and-forth just to get some letters through. But how to solve it? Lots of suggestions have been put forth, even to the point of requiring additional fees to receive or send mail. But I think the heart of the problem lies in the unregulated nature of what is now an essential, core method of communication for the people of the world.

Postal mail is regulated, and though that body is woefully out of date and badly managed it does deliver the mail in a reliable and fairly timely manner. The system also allows for 3rd parties to ensure delivery of a message. This needs to be taken into account with our e-mail systems. We should have a regulated method of reliably delivering e-mail, but also allow for 3rd parties that can ensure the delivery of a message. A similar system would require each internet user's mailbox to be independent of any particular provider, so that anyone anywhere in the world can come and drop mail in it, without needing to pay for the privilege of receiving said mail and without thought of whether or not their post office will allow them to receive it [spam blocking]. E-mail should become the next necessary component of communication amongst humans, but to do this you need to break down the barriers of cost and technicality.

I think the idea of requiring a small fee to send some mail is time-tested and effective. Would you spend $0.10 to send a letter to an editor telling them how they're a nazi because their news story had a certain slant? And would spam companies be able to deliver such huge mountains of unsolicited bulk mail if each one cost them $0.10? It wouldn't fix the problem because you'd still have people who have it in their budget to spend a gross amount of money on targeted advertising/marketing, but they couldn't afford to simply mail every person in the world for something they're probably not going to buy. This could also help fund a central regulated organized system of vetting mail and ensuring it gets delivered, instead of having all different post offices decide if they'll deliver you your mail or not.