Authors: Baron Schwartz, Peter Zaitsev & Vadim Tkachenko
Aimed at: MySQL DBAs, but useful for developers too
Pros: Really strong technical tips and discussions
Cons: The broad scope means some topics could benefit from more detailed discussions.
Reviewed by: Kay Ewbank
Databases are a bit like old cars; getting them started and running is just the first step on a lifetime of tweaking and tinkering with the engine to try to get them to run more smoothly and to find ways to make them go faster. Most DBAs spend their time wrestling with settings to try to improve the performance, and this book is packed with 770 pages talking about how to do just that. I can't describe the book as an easy read, but the authors do make the going as easy as possible with a certain amount of humour.
The subtitle of the book is 'Optimization, Backups and Replication', and this is the third edition covering MySQL 5.5. The authors all work for Percona, and the book does talk about Percona tools for investigating MySQL performance, but as open source utilities they are available free of charge.
The book kicks off with a look at MySQL Architecture and History before getting into the a useful chapter on benchmarking MySQL, where the authors discuss what and how to measure, tactics, tools, and some go through some examples of benchmarks.
Having worked out just how slowly your MySQL server is running, the next chapter is dedicated to how to profile the performance of your server. The authors say that the three most common performance-related requests they receive is to find out whether a server is doing all its work optimally; to find out why a specific query isn't running fast enough; and to troubleshoot intermittent incidents when the server stalls or freezes. This is a good chapter that looks in depth at just what 'performance' means, then goes through how to profile entire apps, particular queries, and workloads. You're then shown how to use the profile for optimization. I thought the section on profiling to get to the bottom of intermittent problems was particularly interesting, because these can be so annoying and hard to track down.
Chapter 4 looks at how to optimize schema and data types. Some of the advice here seems obvious - choose small and simple data types, avoid nulls if possible - but the more detailed discussions could well bear fruit. The section on schema design seemed more likely to produce useful performance benefits, though the whole topic of schema design is such a matter of experience that I suspect the advice would be most useful to fairly experienced admins.
The classic 'quick fix' for improving performance is to make use of indexes, and the next chapter looks at indexing for high performance. The benefits and drawbacks of b-tree indexes and hash indexes are discussed before the authors go on to look at specific strategies such as isolating the index column in queries; indexing on part of a column to avoid the need to index long text columns; how to choose a good column order, and a number of other very specific ideas and techniques.
Next comes a chapter on optimizing the performance of queries. This is another hefty chapter, and it starts off with the question of why are queries slow? The opening section on query basics goes through a couple of obvious questions that are too often ignored; are you asking for data you don't need, and is the server having to analyze more rows than it needs to. There's a good section on ways to restructure queries more efficiently; whether it's better to have one complex query or several simple ones to provide the same final answer, for example. Next comes a description of the limitations of the MySQL Query Optimizer, looking at the different parts (such as hash joins and parallel execution) that can cause problems. The chapter ends with a set of tips for optimizing different types of queries - Joins, subqueries, Group By, Union.
Chapter 7 looks at advanced MySQL features such as partitioned tables, views, cursors, full-text searches and the MySQL query cache to see how to use each in the most efficient way. Chapters 8 and 9 continue with this basic theme; chapter 8 looks at how to optimize server settings such as memory, concurrency and I/O. The most useful section of this chapter is a fairly short part on what not to do. Chapter 9 looks at how the operating system and hardware optimization affect MySQL performance, with topics such as how to balance memory and disk resources, when to use solid state storage, how to optimize RAID performance, and using SANs. The chapter ends with a useful little section on how to make use of operating system status utilities such as vmstat and iostat, and how to identify if a machine is CPU-bound, I/O-bound , or swapping.
The next chapter, on replication, comes in at 70 pages, and is pretty tough going. Working out why a replicated system isn't running fast enough takes knowledge of more than just the database, you also need to understand just what's going on with your replicas. This means the chapter starts from a basic overview of what replication is, followed by details of how to set up replication and the recommended configuration. The two main types of replication - statement based and row based - are compared, then the authors get in to the different topologies - master and multiple replicas, master-master in active-active, master-master in active-passive, ring replicas, trees and pyramids. By the end of this, you know more about replicas than you probably thought you really wanted to, so it's a bit upsetting that the very next section is titled "Why replication doesn't help scale writes", followed by "when will replicas begin to lag"! There's a section on replication administration and maintenance that shows how to measure lag, how to resync a replica, and how to change a master. The final (and fairly long) section in this chapter looks at replication problems and solutions.
Scalability and high availability each get a chapter, looking more at the basics of how they can each help improve performance. The high availability chapter looks at how to improve mean time between failures, and time to recovery. There's a short chapter on MySQL in the cloud, looking at the benefits, drawbacks and myths, and what it means in practical terms when used on Amazon and other solutions.
More practically useful is the chapter on application-level optimization. This starts with an examination of the common problems, then looks in depth at different caching options. There's a meaty chapter on backup and recovery and how to survive recovery, and the final chapter looks at a variety of tools that you can use.
Overall, this is a very useful book. Its 770 pages are densely packed, I was even hoping for a quick cartoon or bit of motivational drivel at times to lighten the load. It's a hard read, but only because the subject material is hard. You can't expect to find out when to use ISAM and when to use INNODB, or twenty ways to make a query faster, without having to read detailed material on why the changes and choices matter. If you administer a MySQL server, or if you're writing code against a MySQL database, this book will improve the way your apps perform, and that should be reward enough for the effort you'll have to put in.