Page 2 of 7
Chapter 5 Index Fragmentation
Fragmentation affect performance, especially for index scans. The chapter opens with a discussion of external fragmentation (logical order of pages doesn’t match the physical order), and internal fragmentation (free space in index e.g. due to deletions). The DMV sys.dm_db_index_physical_stats is used to get fragmentation amount. Fillfactor and pad_index are discussed as a way of reducing fragmentation, at the expense of non-full pages (so less data per read).
Index maintenance provides a mechanism to rebuild or reorganise the index pages to reduce the degree of fragmentation. Typically, when the fragmentation is over 30%, a rebuild is desired, when it is between 5% - 30% a reorg is recommended. Index maintenance needs some thought since it can impact the transaction log, network traffic, and high availability (HA) options.
Finally, some common patterns that cause fragmentation are examined and illustrated with code. These patterns include: indexing on random values e.g. GUIDs, increasing row size during updates, the use of row versioning (this adds 14-bytes per row), and DBCC SHRINK.
The chapter provides useful code showing what fragmentation is, and how it can be reduced. Common pattern of fragmentation are discussed together with solutions to remove it. Often it involves a balance between page fullness and acceptable amounts of page splits. The author suggests experimenting with the fillfactor, starting with 100% and reducing it by 5% until the DMVs show it gives a good value. There’s a useful reference grid of the various index maintenance options (e.g. online/offline) against the various SQL Server versions. Lastly, there’s a reference to Ola Hallengren’s very useful database maintenance script.
I sometimes wonder if a tool could examine the degree of fragmentation together with the number of reads/write and use this to create a more optimal fillfactor when rebuilding the index…
Chapter 6 Designing and Tuning the Indexes
Indexes are a primary source of performance improvement. The chapter starts with a look at the factors to consider when choosing a clustered index (i.e. static, small, increasing value). Code is provided to illustrate the impact of uniqueness and duplicates on the size of a clustered index – showing the importance of specifying uniqueness when possible. Problems of stale statistics, hotspots, and uniqueidentifiers are discussed.
The chapter then discusses factors to consider when choosing a non-clustered index. Here you should aim for selectivity, else the index may not be used. Code is given to illustrate how having several separate non-clustered indexes compares with a composite non-clustered index (the former can produce intersecting indexes), testing will help you determine which is better for your workload.
Optimizing and tuning indexes is an iterative process, and probably belongs towards the end of the development cycle, when the schema and associated queries are more stable. At the project start, it’s possible to create primary and foreign keys. The importance of testing with expected data volumes and workloads is emphasised. Unused or inefficient indexes should be removed. The DMV sys.dm_db_index_usage_stats can be used to show how and when an index is used, and the DMV sys.dm_db_index_operation_stats provides deeper information about locks, I/O, access method etc.
Lastly, various methods of detecting suboptimal queries are discussed. These include looking at queries with the most I/O, and most CPU. Useful tools include SQL Profiler and Extended Events. Additionally, the DMV sys.dm_exec_query_stats is easy to use, as are the missing indexes DMVs.
This chapter gives some great insights into why problems occur and how having the right index design can correct them. There’s a useful discussion on the consequences of updating clustered indexes, and its impact of non-clustered indexes and fragmentation. There’s an interesting point about defining a non-clustered index as unique when possible, since this saves space and also helps the optimizer create better plans. The reason for using ‘option (recompile)’ on the sample code is not explained (it’s to ensure the SQL is not cached).
Chapter 7 Constraints
The chapter begins by examining the purpose of constraints. Constraints allow relationships to be trusted, ensure data integrity and allow business rules to be implemented.
Primary keys (PKs) uniquely identify a row, internally it’s implemented as a unique key that doesn’t allow null values. By default a PK is created as a unique clustered index. PKs can be identified via the is_primary_key column in the catalog view sys.indexes. Unique constraints allow one null value. Unique constraints allow the optimizer to create better execution plans, but index changes are more expensive. Uniqueness can be implemented using a unique constraint or a unique index.
Foreign Key (FK) constraints enforce relationships between tables. However, there is a cost with FKs, each insert needs a lookup, and example code is given that shows this in the execution plan. It is possible to implement FKs without creating an index, this typically leads to a scan and slow performance. It’s generally a good idea to create indexes on FKs. Details of FKs are given in sys.foreign_keys and sys.foreign_keys_columns.
Lastly, check constraints enforce domain integrity by limiting the values in a column. These can help the optimizer produce a more efficient execution plan. Sometimes they can be expensive for large data changes. Metadata is stored in sys.check_constraints.
This chapter shows constraints are important in enforcing relationships and rules. There are good links to related chapters and various catalog views. This chapter did not have a summary section.
Chapter 8 Triggers
Triggers contain code that runs in response to certain events e.g. inserting data. There are triggers for data manipulation language (DML) changes, data definition language (DDL) changes, and logon.
The chapter opens with a discussion of DML triggers. These can be useful for both auditing and enforcing business rules across tables. There are 2 types, ‘instead of’ – which replace the data modification, and the more typical ‘after’ – which runs after data modification. Good example code is provided to illustrate their working. There is an overhead associated with using triggers. The inserted and deleted tables are introduced, being stored in the version store of tempdb.
DDL triggers are responses to DDL events e.g. create table. They are useful in auditing and restricting changes. Since they occur after the event, they can be expensive if a large table needs to be rolled back. Information about the event can be obtained via the EVENTDATA function. DDL triggers exist at both the server and database level, metadata is stored in sys.triggers and sys.server_triggers.
Logon triggers fire after successful authentication but before a session is established. This can be useful in restricting resources or multiple logons. For busy servers, client code may need to change to handle potential timeouts.
It is possible to determine what individual columns have changed by using the update() and columns_updated() functions. Session context_info can be used to pass information to a trigger.
This chapter provides a good overview of the various types of triggers, what they are, how they are used, and their potential impact on performance.
Chapter 9 Views
Views provide a layer of abstraction, hiding implementation, and restricting access. The chapter opens with a look at regular views, being just metadata, with the virtual table being created at run time. They simplify security, since access can be granted to the view instead of the underlying tables. The accompanying code illustrates how sometimes views can contain unnecessary joins that a basic query wouldn’t need. Partitioned views combine data from different tables using the UNION ALL command. This can be useful for archiving and data sharding.
Indexed views are materialized views, storing actual data. Each time data in a base table is updated the indexed view is synchronously updated too. Indexed views require a clustered index, and can also have non-clustered indexes, these are often used to enforce uniqueness on a subset of values. There are many restrictions on created an indexed view, a technet link provides further information.
This chapter contains a good description of the different types of view, how to create them, and how they are used, together with any performance implications. There’s a useful tip about SQL Server not performing join elimination in the case of composite FK constraints, the solution is to define the view with outer joins.
Chapter 10 User-Defined Functions
Code reuse is an important principle in software development, and is typically implemented as User-Defined Functions (UDFs) in T-SQL, however reuse can cause performance problems.
The chapter opens with a look at multi-Statement functions, these have a BEGIN…END structure with a single RETURN statement. There are 2 types, scalar – which return a single scalar value, and table-valued – which builds and returns a table. These can be expensive if used as part of a SELECT or WHERE clause. The optimizer doesn’t add the true cost of the UDF to the execution plan, assuming only 1 row is returned irrespective of the statistics, this can result in a significantly suboptimal plan.
Inline table-value functions (TVFs) are sometimes called parameterized views. The code is expanded inline, there is no separate call, and also the statistics are used in creating the execution plan. Where possible, multi-Statement scalar functions should be converted to inline functions (if they return one value). There’s an interesting, if tortuous, example of converting a multi-statement function to inline by making creative use of nested CTEs and returning the data as XML, to improve performance.
This chapter provides a good overview of the different function types and how they should be used, bearing in mind the potential performance problems. As always, there are plenty of examples and detail to support the assertions made. I like the author’s recommendation of using schemabinding on functions to prevent potential errors due to metadata changes. The image on listing 10-10 is difficult to read.
Chapter 11 XML
XML is the standard for platform independent information exchange. The chapter opens with the dilemma about whether to use XML or not, since while it offers the flexibility, processing it can be expensive. XML indexes can improve performance, but usually take up a lot of space (often more than the underlying XML data itself). It is suggested, where possible, to shred the XML and store in tables. It is possible to see the internal XML index names via the sys.internal_tables view
The XML datatype is stored internally as UFT-16 with some compression, and doesn’t preserve the original XML document – there is an interesting example that illustrates this. Typed XML binds the XML to a schema, this provides better validation but can be costly.
XQuery can be used to process the XML data, using various methods (i.e. value, exists, query, nodes, and modify) – example code is provided for each processing methods. OpenXML is another method of processing XML in SQL Server, it uses the MSXML parser. The chapter ends with a look at the FOR XML statement which gets data as XML in 4 methods: raw, auto, explicit and path.
This chapter looks at another area of SQL Server functionality with an emphasis on how it relates to performance and internals. There’s a useful link to the MSDN site for further information about XML indexes and XML DML. I was a surprised no mention was made of using the CLR to process XML, especially since the CLR chapter does mention XQuery.