Page 1 of 2
Author: Jason Strate and Grant Fritchey
Audience: Performance DBAs/Developers
Reviewer: Ian Stirk
This book discusses indexes, a primary means of improving performance in SQL Server, how does it fare?
Indexes are typically a dry subject matter, however they are fundamental to both understanding the structure of data and helping improve query performance. I did wonder why anyone would want to write a book solely about indexes. After all, if you wanted to know about a given make of car, would you buy a book that’s only about the car’s engine? The answer I think, from a performance perspective, is yes.
Before reading the book, I had a list of questions that I hoped would be covered, this would let me know how detailed the book was. These included:
Will it use practical examples to prove a point?
Does it mention the plan cache as input to the DTA?
Does it identify the limit in the number of missing indexes?
Will it provide scripts to make my working life easier?
Will it provide scripts to automatically fix my databases?
It successfully answered most of my questions – showing me the book had both depth and quality. Below is a chapter-by-chapter exploration of the topics covered.
Chapter 1 Index fundamentals
The chapter provides a comprehensive overview of what indexes are, the different types, and why they are important. The types of index discussed include heaps (sic), clustered, non-clustered, columnstore, xml, spatial, hash, range, and full-text search. Also covered are primary keys, unique indexes, included columns, partitioned indexes, filtered indexes, and compression.
The various options used for creating an index are covered in good detail. These options include fillfactor, pad_index, sort_in_tempdb, ignore_dup_key, statisitics_norecompute, drop_existing, online, allow_row_locks, allow_page_locks, maxdop, data_compression, on partitions.
A brief overview of the system tables that contain index metadata is provided, including: sys.indexes, sys.index_columns, sys.xml_indexes, sys.spatial_indexes, sys.column_store_dictionaries, and sys.column_store_segments.
Useful discussions, diagrams, practical example code, and website links are given throughout. These traits apply to the whole of the book.
Chapter 2 Index Storage Fundamentals
There is great detail in this chapter, probable more than you would need to know. Parts of it are academic, however it does give you a detailed understanding of how data is physically stored and related.
Rows, pages and extents are discussed, before delving deep into the various page types. The different types of page tie the data structures together, and physically implement the database. Page types covered are: File header, boot, PFS, GAM, SGAM, DCM, BCM, IAM, Data, Index, and LOB (text/image). Index organization in terms of heaps, b-trees and columnar is described.
Various DBCC commands are discussed, and example SQL code provided, to look at the content of individual physical pages. I’ve used this in the past when DBCC CHECKDB has reported corruption (unfortunately for me, the typical response of a database restore was not possible due to missing trans logs, but luckily the corruption was limited to a non-clustered index that could easily be rebuilt :-).
The chapter continues with a discussion of a major cause of index degradation – fragmentation. It covers forwarded records (for heaps only) and page splits.
Chapter 3 Index Metadata and Statistics
The chapter opens with a look at index statistics i.e. information about the density and distribution of column values in an index, these are used by the optimizer to create an efficient access path to the data. The chapter provides a detailed look at what index statistics are and why they are important. Stale statistics can make SQL that should run in a few seconds take many minutes (or hours) to run. Helpfully, SQL is provided to examine the state of your index statistics.
Usage stats in relation to the DMO sys.dm_db_index_usage_stats are discussed next. This DMO contains metrics about how an index is being used (i.e. seek, scan, lookup), in terms of the number of batches that use this index (rather than the number of rows accessed via each method). It also provides associated last accessed dates for each index access method. This can be useful in determining how, when and if an index is used.
Operational stats in relation to the DMO sys.dm_db_index_operational_stats are then discussed. This DMO contains metrics about how an index is being used (i.e. seek, scan, lookup), in terms of the number of rows accessed via each method. Additional useful columns relate to row/page locks, lock escalations, latch contention, and compression are discussed in detail.
Finally, index metrics relating to space and fragmentation as discussed in the DMO sys.dm_db_index_physical_stats. Fragmentation can result in data taking longer to read since the data is not in the required logical/physical sequence. Fragmentation is typically removed during a maintenance period.
While it is great having lots of details about the various DMOs, and how the indexes are being used, I wanted more information about how to interpret the data (some is given), and what to do next.
Chapter 4 XML Indexes
Chapter 5 Spatial Indexing
Chapter 6 Full-Text Indexing
These are atypical indexes, relating to unstructured data. I think they’re included for completeness rather than for detail (after all they contain the word ‘index’). That said, in each case a satisfactory overview is given with examples to get you started in your investigations.
XML indexes allow you to query XML by storing path, tags, and value details in an index. Spatial data relates to the geometry and geography data types. Full-text indexing relates to indexing text based content/documents. In the previous edition of the book, these three chapters were combined into one.
Chapter 7 Indexing Memory-Optimized Tables
In-Memory tables are stored in memory, unlike their on-disk counterparts. They are the primary reason for upgrading to SQL Server 2014, providing significant performance improvements. The chapter describes what In-Memory tables are, and how to create them.
The chapter next discusses the two indexing options associated with In-Memory tables: hash indexes are optimized for key lookups, and range indexes are useful for searching for a range of data. Example code is given to illustrate the usage of both types of index.
Chapter 8 Index myths and best practices
This chapter was a nice bit of fun, short, sharp and quite revealing. Various myths were discussed (typically illustrated with practical examples), including:
Databases don’t need indexes
Primary keys are always clustered
Online index operations don’t block
Any column in a multicolumn index can be used as a filter
Clustered indexes store rows in physical order
Indexes always output in the same order
All tables should have heap/clustered index
This was followed by some index best practices. Maybe the best advice given is that you should always test things yourself! The best practices included:
Use clustered indexes on Primary Key by default
Balance index counts (i.e. reads < updates)
Set FillFactor for indexes individually
Index Foreign Keys
Continuously review your index environment