Expert Performance Indexing in Azure SQL and SQL Server 2022 |
Page 1 of 3 Author: Edward Pollack & Jason Strate This book discusses indexes, a primary means of improving performance in SQL Server, how does it fare? Indexes are typically a dry subject matter, however they are fundamental to both understanding the structure of data and helping improve query performance. I did wonder why anyone would want to write a book solely about indexes. After all, if you wanted to know about a given make of car, would you buy a book that’s only about the car’s engine? The answer I think, from a performance perspective, is yes. The book, the 4th edition of the title, is purported to be an update on the 2019 edition, undoubtedly timed for the release of SQL Server 2022. Below is a chapter-by-chapter exploration of the topics covered. Chapter 1: Index Fundamentals The chapter provides a comprehensive overview of what indexes are, the different types, and why they are important. The types of indexes discussed include heaps (sic), clustered, non-clustered, columnstore, xml, spatial, hash, range, and full-text search. Also covered are primary keys, unique indexes, included columns, partitioned indexes, filtered indexes, and compression. The various options used for creating an index are covered in good detail. These options include fillfactor, pad_index, sort_in_tempdb, ignore_dup_key, statisitics_norecompute, drop_existing, online, allow_row_locks, allow_page_locks, maxdop, data_compression, on partitions. A brief overview of the system tables that contain index metadata is provided, including: sys.indexes, sys.index_columns, sys.xml_indexes, sys.spatial_indexes, sys.column_store_dictionaries, and sys.column_store_segments. Since SQL Server 2019, index rebuilds are resumable, details on this are described briefly. The Table of Contents lists sys.selective_xml:index_paths and sys.selective_xml:index_namespaces - which are nonsense, helpfully the underlying chapter correctly uses SYS.SELECTIVE_XML_INDEX_PATHS and SYS.SELECTIVE_XML_INDEX_NAMESPACES. These faults exited in the book’s previous edition - makes me wonder how much of an ‘update’ this book is over its previous edition Useful discussions, diagrams, practical example code, and website links are given throughout. These traits apply to the whole of the book.
Chapter 2: Index Storage Fundamentals There is great detail in this chapter, probable more than most would need to know. Parts of it are academic, however it does give you a detailed understanding of how data is physically stored and related. Rows, pages and extents are discussed, before delving deep into the various page types. The different types of pages tie the data structures together, and physically implement the database. Page types covered include: File header, boot, PFS, GAM, SGAM, DCM, IAM, Data, Index, and LOB (text/image). Index organization in terms of heaps, b-trees and columnar is described. The chapter ends with a brief look at the characteristics (e.g. restrictions) of the various index types: heap, clustered, non-clustered and columnstore. The chapter says columnstore indexes can’t be ordered, but a significant new feature of SQL Server 2022 is to have ordered columnstore data, this helps create more granular segments, giving better segment elimination and thus faster performance – so much for an updated edition for SQL Server 2022… Chapter 3: Examining Index Contents Sometimes, you may need to delve deeper into the physical structure and content of a database page, for example when you investigate database corruption. This can be done using Dynamic Management Functions (DMFs) or the older DBCC commands. DMFs have the advantage of filtering and JOINing to other system tables/views to get additional related data. The DMFs discussed are:
In each case, useful descriptions, code, and outputs are discussed. Next, various DBCC commands are discussed. These are old-time commands, so older DBAs are more likely to be familiar with them. The DBCC commands are discussed, and example SQL code provided, to look at the content of individual physical pages. I’ve used this in the past when DBCC CHECKDB has reported corruption (unfortunately for me, the typical response of a database restore was not possible due to missing trans logs, but luckily the corruption was limited to a non-clustered index that could easily be rebuilt :-). Overall, a useful and detailed chapter showing how to look at the physical database structures. Although this is a new chapter, much of its content previously belonged to chapter 2. Chapter 4: Fragmentation As data changes, index data can become fragmented, which can degrade query performance. This short chapter discusses how to determine how fragmented an index is. As throughout the book, there are some great examples to illustrate the points made. The chapter looks at both forwarded records (for heaps only) and page splits – both of these use the DMF sys.dm_db_index_physical_stats. There’s a link to chapter 11 which discusses how fragmentation affects performance, together with various approaches to reducing fragmentation. Although this is a new chapter, much of its content previously belonged to Chapter 9. Chapter 5: Index Metadata and Statistics The chapter opens with a look at index statistics i.e. information about the density and distribution of column values in an index, these are used by the optimizer to create an efficient access path to the data. The chapter provides a detailed look at what index statistics are and why they are important. Stale statistics can make SQL that should run in a few seconds take many minutes (or hours) to run. Helpfully, SQL is provided to examine the state of your index statistics. Usage stats in relation to the DMO (Dynamic Management Object) sys.dm_db_index_usage_stats are discussed next. This DMO contains metrics about how an index is being used (i.e. seek, scan, lookup), in terms of the number of batches that use this index (rather than the number of rows accessed via each method). It also provides associated last accessed dates for each index access method. This can be useful in determining how, when and if an index is used. Operational stats in relation to the DMO sys.dm_db_index_operational_stats are then discussed. This DMO contains metrics about how an index is being used (i.e. seek, scan, lookup), in terms of the number of rows accessed via each method. Additional useful columns relate to row/page locks, lock escalations, latch contention, and compression are discussed in detail. Next, index metrics relating to space and fragmentation as discussed in the DMO sys.dm_db_index_physical_stats. Fragmentation can result in data taking longer to read since the data is not in the required logical/physical sequence. Fragmentation is typically removed during a maintenance period. Lastly, Columnstore statistics are discussed. These are different from the typical rowstore indexes. The DMOs sys.dm_db_column_store_row_group_physical_stat and sys.dm_db_column_store_row_group_operational_stats are used to obtain physical and operational statistics respectively. While it is great having lots of details about the various DMOs, and how the indexes are being used, I wanted more information about how to interpret the data (some is given), and what to do next. Aspects of fragmentation information is discussed in too many places, in the previous chapter, this chapter, and in Chapter 11 – it would be better to have it all in the dedicated fragmentation chapter. |
||||
Last Updated ( Tuesday, 04 April 2023 ) |