Author: Kalen Delaney
Publisher: Red Gate
Audience: Professionals working with any RDBMS, despite being SQL Server specific
Reviewed by: Nikos Vaggalis
Although this began as a standard book review, it quickly grew into a more wide-ranging article as a result of the additional and extended commentary.
While concurrency is always covered as part of a book on SQL Server, this title is dedicated entirely to it. Excessive, one might say; quite the opposite, I would argue, and with good reason. It aims to raise awareness of a frequently underrated subject whose misunderstanding can result in serious performance complications.
The problem is that no matter how well designed your application, database, or SQL queries are when working harmoniously in isolation, going multi-user without putting considerable effort into understanding concurrency (and how your specific DBMS handles it) will sooner or later lead your edifice to collapse, shaken by performance degradation, unacceptable wait times, or even corrupted data.
This is where this slim guide comes in: to properly educate anyone working with an RDBMS on staying out of trouble, building a well-performing multi-user application, and being ready to troubleshoot when the need arises.
The ideal way to read this book is not to skip a line, never mind a page; otherwise you risk overlooking the hidden gems: the little, subtle details, drawn from the author's more than two decades of professional experience, that make it stand out from other titles.
Although the techniques and information cover SQL Server, I don't see why they can't, for the most part, be applied in general, since the underlying principles are identical; it is just the jargon that differs. Therefore, wherever possible, I will attempt to translate the SQL Server-specific terms into their more generic counterparts.
This book is available as a free PDF from the Redgate website.
Chapter 1 starts with the basics of transactions and their ACID properties, and explains that the database engine guarantees Atomicity, Consistency and Durability, but leaves control of the Isolation to the developer, although it does provide a default level.
It then elaborates on the read phenomena that, subject to the degree of strictness imposed by the ANSI isolation level, can either be prevented or allowed.
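As a quick reference for the relationship the chapter describes, the ANSI SQL-92 levels and the read phenomena each one still permits form a simple matrix; here it is sketched in Python (the level and phenomenon names follow the SQL-92 standard):

```python
# ANSI SQL-92 isolation levels mapped to the read phenomena each level
# still allows; stricter levels prevent more of them.
PHENOMENA = ("dirty read", "non-repeatable read", "phantom read")

ALLOWED = {
    "READ UNCOMMITTED": {"dirty read", "non-repeatable read", "phantom read"},
    "READ COMMITTED":   {"non-repeatable read", "phantom read"},
    "REPEATABLE READ":  {"phantom read"},
    "SERIALIZABLE":     set(),  # none of the phenomena can occur
}

for level in ALLOWED:
    risks = [p for p in PHENOMENA if p in ALLOWED[level]]
    print(f"{level}: allows {', '.join(risks) if risks else 'none'}")
```

Reading down the table makes the trade-off concrete: each step up in strictness removes a phenomenon, and, as the book goes on to discuss, typically at a cost in concurrency.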
As a historical note, before the establishment of the ANSI SQL-92 standard, which introduced the isolation levels, locking was controlled via the proprietary methods of the vendors that existed at the time, e.g. Ingres. So, for example, using "set lockmode session where readlock = nolock" or "set lockmode on my_table where readlock = exclusive" was not portable, was potentially dangerous, and was hard to maintain in code.
With the advent of the standard, any DBMS product could be used in the same way, so saying 'READ COMMITTED' had identical semantics universally while the underlying locking implementation could vary from vendor to vendor.
You also get to learn that in choosing an isolation level, being too strict can hamper concurrency, while being too loose can risk data integrity, as in the case of lost updates, i.e. transactions overwriting each other's data. Lost updates are especially "popular" with front-end programmers who write database applications without comprehending the engine's internals, thinking that knowing how to do a SELECT or UPDATE is all that is required, and erroneously assuming that the DBMS will work its magic and handle everything else for them.
Worst of all, when this breaks down, it can be hard to identify which component is at fault: the code, the DBMS, or the evil user?
Of course, the damage inflicted by overwriting each other's data depends on how mission-critical the system in question is. It might not be that big a deal, or may even be knowingly allowed under the "last writer wins" rule in, say, a personnel management application, but it is not acceptable in a flight booking or banking application.
The end of the chapter makes a prompt context switch from the pessimistic to the optimistic model by introducing the new Snapshot isolation level, and explains why this is a change for the better.
However, there is more to pessimistic concurrency than tackling isolation levels, hence Chapters 2 to 5 examine the rest of its components.
Chapter 2 covers the fundamental aspects of locking, which are split into five categories:
- the unit of data/resource locked - e.g. row, page or table
- the type/mode of the lock - e.g. shared, exclusive, update
- the duration of the lock
- lock ownership/scope of the lock - transaction-wide, statement-wide
- lock metadata - the data stored in the system catalogues on current locking
In the Lock Resources section we find a rather insightful example explaining that the decision of what to lock depends on the current isolation level in relation to the table's underlying data structure.
So under READ COMMITTED and REPEATABLE READ isolation, querying a table with a clustered index (where the data rows are physically ordered by the index, i.e. stored in a B-tree) takes KEY locks on the index rows, while querying a heap table (one with no clustered index) takes RID locks on the data rows; but if a non-clustered index is involved (where the data rows are not physically ordered by the index rows), then both types of locks, KEY and RID, are taken. Querying a range of values through an index under SERIALIZABLE isolation takes key-range locks.
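To summarise that example, the lock resource chosen under READ COMMITTED and REPEATABLE READ follows from the table's storage structure; a small lookup table, encoding only what the paragraph above states, makes the mapping explicit:

```python
# Lock resources taken under READ COMMITTED / REPEATABLE READ, as
# described in the book's Lock Resources example.
LOCK_RESOURCES = {
    "clustered index":              ["KEY"],         # locks on the index rows
    "heap":                         ["RID"],         # locks on the data rows
    "heap via non-clustered index": ["KEY", "RID"],  # locks on both
}
# Under SERIALIZABLE, ranged index queries take key-range locks instead.

for structure, locks in LOCK_RESOURCES.items():
    print(f"{structure}: {' + '.join(locks)}")
```

The takeaway is that the "same" query can acquire quite different locks depending on how the table is physically organised, which matters when diagnosing blocking.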