Tuesday, August 23, 2011

A tip for handling long downtime

So you push out a piece of code and it eats your live database. The site is broken, and you need to take it down to repair the database. How long are you going to keep the site down? 30 minutes? 6 hours?

If you have to keep the site down until the database is fixed, bring up a read-only copy of the site code running against an old snapshot of the database. Put a banner on every page saying the site is under emergency maintenance and that parts of it are temporarily disabled.
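To make the read-only idea concrete, here is a minimal sketch, assuming the site is a Python WSGI app. The names READ_ONLY, SAFE_METHODS, BANNER and read_only_middleware are made up for illustration, and the one-hour Retry-After is an arbitrary guess. It only refuses write requests; showing the banner on normal pages would still be up to your templates.

```python
# Hypothetical flag: set it while the site runs off the old snapshot,
# and clear it once the real database is back.
READ_ONLY = True

# HTTP methods that should not modify anything.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}

BANNER = b"The site is under emergency maintenance; parts of it are temporarily disabled."


def read_only_middleware(app):
    """Wrap a WSGI app so write requests are refused while READ_ONLY is set."""
    def wrapped(environ, start_response):
        method = environ.get("REQUEST_METHOD", "GET")
        if READ_ONLY and method not in SAFE_METHODS:
            # Tell the client this is temporary instead of letting the
            # write reach a database that cannot accept it.
            start_response("503 Service Unavailable", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(BANNER))),
                ("Retry-After", "3600"),  # assumed estimate, in seconds
            ])
            return [BANNER]
        # Reads pass straight through to the real application.
        return app(environ, start_response)
    return wrapped
```

Wrapping the app at the outermost layer means individual pages don't each have to know about the outage; anything that only reads keeps working.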

This way your users can keep using at least the read-only parts of the site, and not all of your traffic goes out the window. Keep this in mind when developing the site too: failing to update a page's hit counter in the database, for example, should be a soft error, not something that breaks the page.
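The hit counter case might look something like this. It's only a sketch, assuming SQLite and a hypothetical pages table with id, title, body and hits columns; bump_hit_counter and render_page are illustrative names, not anything from a real framework.

```python
import logging
import sqlite3

log = logging.getLogger("site")


def bump_hit_counter(conn, page_id):
    """Try to record a page view; return False instead of raising on failure."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE pages SET hits = hits + 1 WHERE id = ?", (page_id,)
            )
        return True
    except sqlite3.Error as exc:
        # Soft error: log it and move on, the page itself is still fine.
        log.warning("hit counter not updated for page %s: %s", page_id, exc)
        return False


def render_page(conn, page_id):
    """Return the page text, or None if the page does not exist."""
    row = conn.execute(
        "SELECT title, body FROM pages WHERE id = ?", (page_id,)
    ).fetchone()
    if row is None:
        return None
    bump_hit_counter(conn, page_id)  # result ignored on purpose
    return "%s\n\n%s" % row
```

If the database is read-only, the UPDATE fails, a warning gets logged, and the visitor still sees the page.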

If you don't have a place to host this temporary database + site code, think about getting one. Secondary/failover hosts would work at a time like this, or maybe your existing host(s) need more capacity so they can run the temporary copy too.