Improve SQL performance – An Intelligent Update Statistics Utility

Written by Ian Stirk

Wednesday, 05 December 2012

Keeping statistics up to date is vital to database performance important. Stale statistics can result in queries taking much longer to run. Here's a utility that will update the statistics in an intelligent manner.

Previously, in Improve SQL Performance – Know Your Statistics, I showed how we can determine if index statistics in SQL Server are up to date. This is important because statistics can determine if an index will be used, and how it is used with a SQL query. Stale statistics can result in a query performing poorly.

In that article I provided SQL code that showed when the statistics were last updated and how much the underlying data had changed since the last update. Here I go one step further and provide SQL code that will update the statistics, and do so in an intelligent manner.

Typically, SQL Server itself will automatically update an index’s statistics when 20% of the index’s data changes. While this may be sufficient some times, there are many cases where it is not. This is clearly shown by the common practice of manually updating statistics when a query starts to run slowly.

Intelligent Update Statistics Utility

The SQL code provided here is intelligent since it:

Only updates those indexes that have changed
Uses a sampling algorithm that reflects the number of rows in the index
Runs frequently and is pre-emptive

If we only update the statistics of those indexes whose underlying data has changed, we make more efficient use of resources (time, memory, CPU etc). With small tables it makes sense to sample more rows since it is easier to miss a column value compared with a larger table. With larger tables a smaller sample will often suffice since we are more likely to capture a given column value. Running the utility frequently will ensure the statistics are up to date and we incorporate any new boundary values (e.g. a new date).

To update the statistics, in an intelligent manner, you should run the following SQL code on the relevant database:

listing1

Overview of SQL code

If you want the code for this utility it can be found in the CodeBin.

The script starts with the statement SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED. The purpose of this is to ensure the SQL that follows does not take out any locks and does not honour any locks, allowing it to run faster and without hindrance.

In the SQL code, four system tables/views are involved in intelligently updating statistics. A brief description of each is given below:

Table/View	Description
sys.indexes	Contains details for each index, e.g. name
sys.sysindexes	Contains details for each index and table. e.g. row count, number of row changes since statistics last updated
sys.tables	Contains table information, e.g. name
sys.schemas	Contains details of schema objects, e.g. name

In the SQL code the sys.indexes system view joins with the sys.sysindexes system view based on their common name and id columns. Joins to the sys.table and sys.schema system tables are provided to obtain the name of the schema and table the index’s statistics relate to. This is required when we want to update the relevant statistics.

We capture the name of the schema, table, index, together with the number of rows in the index, into a temporary table named #IndexUsage. We filter on the joined system views based on various criteria; specifically, we’re concerned only with indexes that relate to user tables as opposed to system tables. Additionally, we’re concerned only with clustered and non-clustered indexes, ignoring heaps. We want only indexes that have at least 100 rows, and finally, we’re concerned only with indexes whose data has changed since the last time update statistics was run on the index. It’s possible to amend this section of the script so that we update only the statistics of indexes that have changed by a given number or percentage of rows.

The name of the index that we want to update the statistics on is pre-pended with the name of the schema and table, together with the keywords UPDATE STATISTICS. Additionally it is appended with the number of rows to sample. An intelligent sampling algorithm determines how many rows to sample, based on the number of rows in the index. The resultant SQL commands are concatenated, and the result is stored in the variable @UpdateStatisticsSQL.

The content of the variable @UpdateStatisticsSQL is displayed for debugging purposes, via the PRINT command. This variable is then used as input to the system stored procedure sp_executesql; running this executes the SQL commands and updates the statistics on the relevant indexes.

Commenting out the call to sp_executesql will show what statistics would have been updated, without actually executing any updates. This can be useful in determining the impact of the update.

The algorithm used to determine the percentage of rows to sample is given below:

Rows in index	Sampling percentage/row count
Up to 500,000	100% (up to 500,000 rows sampled)
Up to 1,000,000	50% (up to 500,000 rows sampled)
Up to 5,000,000	25% (up to 1,250,000 rows sampled)
Up to 10,000,000	10% (up to 1,000,000 rows sampled)
Up to 50,000,000	2% (up to 1,000,000 rows sampled)
Up to 100,000,000	1% (up to 1,000,000 rows sampled)
Over 100,000,000	3,000,000 rows

The algorithm is weighted such that smaller indexes have a higher sampling percentage. This seems appropriate because you’re more likely to miss an important data value if you have a lower sampling percentage in a small index. Similarly, you’re more likely to repeat a data value if you have a high sampling percentage on a large index.

You can alter this algorithm to suit your needs. These sampling values have proved helpful on some of the systems I’ve used (that have 600 indexes, some with row counts in excess of 11 billion). Ultimately, the sampling percentages should reflect the variability in the underlying column data values.

It’s possible to limit the number of indexes that will have their statistics updated by filtering, out or in, by adding a WHERE condition. It’s also possible to filter out or in by database name, schema name, table name, index name, or the number of rows changed.

This SQL code could be implemented in a SQL Server agent job, and scheduled to run periodically, thus automatically keeping your statistics up to date. This would pre-empt any problems based on stale statistics, fixing them before they occur.

Conclusion

The SQL code provided allows index statistics to be updated in an intelligent manner.

System tables and DMVs provide a quick and easy way to investigate and prevent performance problems, including updating index statistics intelligently. There really are a great number of SQL Server secrets that DMVs can reveal. For further information, see my recent book “SQL Server DMVs in Action”, which contains more than 100 useful scripts.

The code for this article can be found in the CodeBin.

Improve SQL Performance – Know Your Statistics

Identifying your slowest SQL queries

Improve SQL performance – find your missing indexes

A Generic SQL Performance Test Harness

SQL Server DMVs in Action (Book Review)

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.

Comments

or email your comment to: comments@i-programmer.info

Last Updated ( Thursday, 02 July 2020 )

Recent Articles

Recent Book Reviews

Popular Articles

Intelligent Update Statistics Utility

Overview of SQL code

Related articles:

Comments