Author: John Allspaw & Jesse Robbins (eds)
Publisher: O'Reilly, 2010
Aimed at: Anyone who has to manage a web site
Pros: An interesting and readable collection of essays
Cons: Sometimes obvious, raises more questions than answers
Reviewed by: Ian Elliot
One of the biggest problems that confronts us is working out a reasonably robust and scalable architecture for a web site. It sounds almost silly that, after working on the problem for so long, there can still be any doubt about how to do it - but there is!
This is a collection of fairly diverse essays on different aspects of "Keeping the Data on Time", which is the book's subtitle. More prosaically, it's about building web sites and keeping them running, mostly told from personal experience. You might think that much of this stuff is obvious, but think again. Many of the stories start out with "this is how it should be done" and end with a big "but".
That said, there are some contributions that you will find less than interesting. The first, for example, is Web Operations: The Career. Its point is that most web ops people get there by accident and that it's mostly a question of just getting on with it. My guess is that most readers will skip this essay and start the book at How Picnik Uses Cloud Computing, an essay on planning to use cloud-based resources - Amazon EC2 in particular. Of course this was written before the Amazon outages and the doubts that have since set in over cloud reliability, but it is still relevant, if fairly obvious, stuff.
Next we have a look at metrics and how to use them - boring but essential. Continuous Deployment rehearses some of the agile arguments, and if you already know them, their application to the web will come as no surprise.
Infrastructure As Code is quite a nice idea, but really it's an argument for a service-oriented architecture that can run on almost any hardware. The idea is that your infrastructure should be a code repository that can be deployed easily without worrying about the exact hardware configuration.
The next essay is on another standard topic - monitoring - but it is well written and a refreshingly honest account of the aims and difficulties of monitoring a web site with an eye to maximizing uptime.
Chapter 7 is a rerun of an almost classic essay, "How Complex Systems Fail". I for one have never understood why the essay is regarded as important - it mostly states the obvious in overgeneralizations.
Chapter 8 is an interview with Heather Champ, who does something important at Flickr. It isn't quite clear what her role is, but it seems to be about keeping the community happy when the system fails. This is a sometimes amusing account of turning failure into a community-based joke, and it offers some food for thought about how to present your failures to your users.
Dealing with Unexpected Traffic Spikes is a really good account of what happens when your popular website suddenly becomes even more popular. It is basically an everyday story of a cache that didn't really know what to do when it was overwhelmed by requests. You can almost feel the panic! For me this was the best read in the entire book, and there was a sense of anticlimax as the essays returned to more mundane topics:
- Dev and Ops Collaboration and Cooperation
- How Your Visitors Feel: User-Facing Metrics
- Relational Database Strategy and Tactics for the Web
- How to Make Failure Beautiful: The Art and Science of Postmortems
- Nonrelational Databases
- Agile Infrastructure
- Things that Go Bump in the Night (and How to Sleep Through Them)
Overall, what you make of this book depends on how much you already know and what you are trying to do. Some of the essays will resonate with you more than others, and some you will think are downright boring and obvious, so at the end of the day your verdict on the book will depend on what proportion of it you find relevant. The book doesn't often give exact solutions to the problems it raises; its advice is more of the "you need to move in this direction" sort.
If you want to know about making big websites work, then this is a good place to start, if only to discover what you already know.