Codd and his Rules
Written by Mike James   
Thursday, 05 October 2017

The Join

This description would be incomplete without mentioning the operation that really makes real-world database manipulations possible - the join.

If you have two database tables, A and B, then to form the join A*B you first create the Cartesian product A X B. The Cartesian product of two database tables is just the table that has the columns of both A and B, and whose lines are created by taking every possible combination of a line from A and a line from B. For example, if A is:

x      y
tom    29
fred   30
codd   32

and B is:

x      z
tom    france
codd   spain

then the Cartesian product of the two tables A X B is:

x     x’    y   z
tom   tom   29  france
tom   codd  29  spain
fred  tom   30  france
fred  codd  30  spain
codd  tom   32  france
codd  codd  32  spain

You should be able to see how this works but notice that the field x from table B has been renamed x’ to make it distinct.
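The construction of the Cartesian product can be sketched in a few lines of Python. This is only an illustration, assuming each table is held as a list of dictionaries; the function name `cartesian_product` is made up for the example.

```python
from itertools import product

# Tables A and B from the text, one dictionary per line of the table.
A = [{"x": "tom", "y": 29}, {"x": "fred", "y": 30}, {"x": "codd", "y": 32}]
B = [{"x": "tom", "z": "france"}, {"x": "codd", "z": "spain"}]


def cartesian_product(left, right):
    """Pair every line of `left` with every line of `right`.

    The x field of the right-hand table is stored under the name x'
    so that the two x columns stay distinct.
    """
    return [
        {"x": a["x"], "x'": b["x"], "y": a["y"], "z": b["z"]}
        for a, b in product(left, right)
    ]


AxB = cartesian_product(A, B)
for row in AxB:
    print(row)
```

With three lines in A and two in B the result has 3 x 2 = 6 lines, in the same order as the table above.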

The second step in forming the join is to remove all of the lines that do not have identical values in the common fields, i.e. x and x’. So A*B is:

x     x’    y   z
tom   tom   29  france
codd  codd  32  spain

If you keep both the x and x’ columns you get what has come to be called an “equi-join”. If you also take out the duplicate x’ column you get a “natural join”, or just a “join”.
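Both steps, and the choice of keeping or dropping the duplicate column, can be put together in a short Python sketch. It assumes the same list-of-dictionaries representation of a table; the function `join_on_x` and its `keep_duplicate` flag are hypothetical names used only for illustration.

```python
# Tables A and B from the text, one dictionary per line of the table.
A = [{"x": "tom", "y": 29}, {"x": "fred", "y": 30}, {"x": "codd", "y": 32}]
B = [{"x": "tom", "z": "france"}, {"x": "codd", "z": "spain"}]


def join_on_x(left, right, keep_duplicate=True):
    """Join two tables on their common field x."""
    rows = []
    for a in left:
        for b in right:  # step 1: every combination of lines
            if a["x"] == b["x"]:  # step 2: keep only matching lines
                row = {"x": a["x"], "x'": b["x"], "y": a["y"], "z": b["z"]}
                if not keep_duplicate:
                    del row["x'"]  # optionally drop the repeated column
                rows.append(row)
    return rows


both_columns = join_on_x(A, B)
single_column = join_on_x(A, B, keep_duplicate=False)
print(both_columns)
print(single_column)
```

A real database engine would not build the full Cartesian product first, but the result is defined as if it had.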

By using joins it is possible to break a database down into a number of smaller tables which can be put back together. Exactly how you break a database down is a question of which “normal form” you opt to use, and here we start getting into the depths of database design. Put simply, normal forms are mostly about removing the redundancies in a database to push the representation closer to that of a pure set and a set-based algebra.
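As a rough illustration of breaking data into smaller tables and putting it back together, here is a sketch using Python's built-in sqlite3 module. The table names `person` and `visit` and the extra line for tom are assumptions made for the example, not part of the original tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each person's age is stored exactly once...
    CREATE TABLE person (x TEXT PRIMARY KEY, y INTEGER);
    -- ...however many countries that person is linked to.
    CREATE TABLE visit (x TEXT REFERENCES person(x), z TEXT);
    INSERT INTO person VALUES ('tom', 29), ('fred', 30), ('codd', 32);
    INSERT INTO visit VALUES ('tom', 'france'), ('tom', 'spain'), ('codd', 'spain');
""")

# The join puts the two smaller tables back together.
rejoined = conn.execute(
    "SELECT person.x, y, z FROM person "
    "JOIN visit ON person.x = visit.x "
    "ORDER BY person.x, z"
).fetchall()
print(rejoined)
```

In the joined result tom's age appears twice, but it is stored only once, which is the kind of redundancy that the normal forms aim to remove.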

Implementation

Needless to say, Codd’s approach was seen as very attractive - although not at first by his employer, IBM. In 1982 IBM finally caught on and announced SEQUEL and their new database, DB2, both based on Codd’s relational theories. Codd had his own database language called Alpha, but the IBM team developed their own, which wasn't really a relational language. Nevertheless, SEQUEL became popular and eventually turned into SQL when the Oracle database was released. SQL contains lots of features that go beyond what Codd considered to be the pure theory of relational databases - but as it is so popular it has become what we all think of as the relational database language. So much so that Microsoft even named its database engine SQL Server.

 


 

The whole subject became so heated and confused that in 1985 Codd published his (in)famous 12 rules, which were the principles that a relational database should obey. Interestingly, Codd’s rules have become a stick that the “database thought police” use to beat innocent programmers rather than a guiding light - for this reason you will find them reproduced below.

His book, "The Relational Model for Database Management", covers the practical aspects of the design of relational databases. In over 500 pages it defines the rules that a system needs to follow in order to be described as truly relational, together with the motivation behind them.

Codd attempted to remove the “procedural” approach from databases, and many think that this isn’t possible using a theory based on relations. Even more radically, some go so far as to think that it isn’t desirable and that the mathematician’s phobia of procedure, i.e. dynamic processes, shouldn’t be foisted onto the programmer. But notice that this isn't the motivation behind the many NoSQL databases that appear to be gaining support. That is more about the practical difficulties of building databases that are distributed across servers and available to many users at the same time. These are not issues that Codd, Codd's rules or SQL ever considered.

In 1981 Codd was awarded the Turing Award and in 1982 the ACM chose his 1970 paper as one of the 25 most important contributions to the industry. Whatever the true and long-term value of the relational model, Codd never gave up the 12-rule approach and defined 12 rules for On-line Analytical Processing (OLAP)! He retired from IBM in 1984 and set up two companies to provide consultancy to the database world.

The 12 rules of Codd

Of the 12 rules only the first 6 have to be satisfied for a database to be called “relational” but there is a rule 0 which has to be obeyed - perhaps they should be called Codd’s 13 rules?

 

Rule 0: Relational Database Management

For any system that is advertised as, or claimed to be, a relational database management system, that system must be able to manage databases entirely through its relational capabilities.

 

Rule 1: Representation of information

All information in a relational database is represented explicitly at the logical level and in exactly one way - by values in tables.

 

Rule 2: Guaranteed logical accessibility

Each and every datum in a relational database is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name.
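To make Rule 2 concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table `person` and the helper `fetch_datum` are hypothetical, and the key column is passed in explicitly as a simplification, although in principle it follows from the table name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (x TEXT PRIMARY KEY, y INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("tom", 29), ("fred", 30), ("codd", 32)],
)


def fetch_datum(conn, table, key_column, key_value, column):
    """Return one value, located by table name, primary key value and column name."""
    # Identifiers cannot be bound as SQL parameters, so they are interpolated;
    # a real program would check them against the catalog first.
    sql = f'SELECT "{column}" FROM "{table}" WHERE "{key_column}" = ?'
    row = conn.execute(sql, (key_value,)).fetchone()
    return None if row is None else row[0]


age = fetch_datum(conn, "person", "x", "codd", "y")
print(age)
```

No pointer, record number or storage position is needed to reach the value, only the three logical names.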

 

Rule 3: Systematic representation of missing information

Null values (distinct from the empty character string or a string of blank characters and distinct from zero or any other number) are supported in fully relational database management systems for representing missing information and inapplicable information in a systematic way, independent of the data type.
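The distinction that Rule 3 draws between a null and an ordinary value such as zero or the empty string can be seen in a small sqlite3 sketch. The `person` table and its contents are assumptions for the example; Python's `None` is stored as an SQL NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (x TEXT PRIMARY KEY, y INTEGER, z TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [
        ("tom", 29, "france"),
        ("fred", 0, ""),        # a real zero and a real empty string
        ("codd", None, None),   # age and country are missing
    ],
)


def names(sql):
    """Run a query and return the first column of every line."""
    return [row[0] for row in conn.execute(sql)]


missing_age = names("SELECT x FROM person WHERE y IS NULL")
zero_age = names("SELECT x FROM person WHERE y = 0")
missing_country = names("SELECT x FROM person WHERE z IS NULL")
empty_country = names("SELECT x FROM person WHERE z = ''")
print(missing_age, zero_age, missing_country, empty_country)
```

The same `IS NULL` test works for the integer column and the text column, which is what "independent of the data type" amounts to in practice.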

 

Rule 4: Dynamic online catalog

The database description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data.
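As an illustration of Rule 4, SQLite keeps its catalog in a table called `sqlite_master` that can be queried with an ordinary SELECT. The sketch below uses Python's sqlite3 module; the catalog name is specific to SQLite and other systems use different names, and the two tables are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (x TEXT PRIMARY KEY, y INTEGER)")
conn.execute("CREATE TABLE visit (x TEXT, z TEXT)")

# The catalog is interrogated with the same language as the regular data.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```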

 

Rule 5: Comprehensive data sub-language

A relational system may support several languages and various modes of terminal use (for example, the “fill in the blanks” mode). There must be, however, at least one language whose statements are expressible, per some well-defined syntax, as character strings, and that is comprehensive in supporting all of the following items:

  • Data definition
  • View definition
  • Data manipulation
  • Integrity constraints
  • Authorization
  • Transaction boundaries (begin, commit, rollback)
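Most of the items in this list can be seen in a few SQL statements passed as character strings. The following is a sketch using Python's sqlite3 module with made-up table and view names; SQLite has no authorization statements, so that item is not shown.

```python
import sqlite3

# isolation_level=None leaves transaction control to explicit SQL statements.
conn = sqlite3.connect(":memory:", isolation_level=None)

# Data definition, with an integrity constraint attached to the column.
conn.execute("CREATE TABLE person (x TEXT PRIMARY KEY, y INTEGER CHECK (y >= 0))")
# View definition.
conn.execute("CREATE VIEW over_thirty AS SELECT x FROM person WHERE y > 30")

# Data manipulation inside explicit transaction boundaries.
conn.execute("BEGIN")
conn.execute("INSERT INTO person VALUES ('tom', 29)")
conn.execute("INSERT INTO person VALUES ('codd', 32)")
conn.execute("COMMIT")

conn.execute("BEGIN")
conn.execute("DELETE FROM person")
conn.execute("ROLLBACK")  # the delete is undone

view_names = [row[0] for row in conn.execute("SELECT x FROM over_thirty")]
count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(view_names, count)
```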

Rule 6: Updatable views

All views that are theoretically updatable are also updatable by the system.

 

Rule 7: High level insert, update and delete

The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data, but also to the insertion, update and the deletion of data.
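In SQL terms, Rule 7 means that a single statement can change a whole set of lines without the program looping over them one at a time. A small sqlite3 sketch, assuming a made-up `person` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (x TEXT PRIMARY KEY, y INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("tom", 29), ("fred", 30), ("codd", 32)],
)

# One statement updates every line that satisfies the condition.
cursor = conn.execute("UPDATE person SET y = y + 1 WHERE y >= 30")
updated = cursor.rowcount

ages = dict(conn.execute("SELECT x, y FROM person"))
print(updated, ages)
```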

 

Rule 8: Physical data independence

Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods.
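Adding an index is a typical change of access method. The sqlite3 sketch below, using a made-up `person` table and index name, runs the same query before and after creating an index to show that the program sees no difference in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (x TEXT PRIMARY KEY, y INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("tom", 29), ("fred", 30), ("codd", 32)],
)

query = "SELECT x FROM person WHERE y > 29 ORDER BY x"

before = conn.execute(query).fetchall()
# Change the access method; the query text is left untouched.
conn.execute("CREATE INDEX person_by_age ON person (y)")
after = conn.execute(query).fetchall()

print(before == after)
```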

 

Rule 9: Logical data independence

Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables.
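One way to picture Rule 9 is to split a base table in two and put a view with the old name in its place. The sqlite3 sketch below is only an illustration with made-up names (`trip`, `person`, `visit`); the application's query is the same string before and after the change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trip (x TEXT, y INTEGER, z TEXT);
    INSERT INTO trip VALUES ('tom', 29, 'france'), ('codd', 32, 'spain');
""")

app_query = "SELECT x, y, z FROM trip ORDER BY x"
before = conn.execute(app_query).fetchall()

# Information-preserving change: split the base table into two
# and expose the old shape as a view with the old name.
conn.executescript("""
    CREATE TABLE person AS SELECT DISTINCT x, y FROM trip;
    CREATE TABLE visit AS SELECT x, z FROM trip;
    DROP TABLE trip;
    CREATE VIEW trip AS
        SELECT person.x AS x, person.y AS y, visit.z AS z
        FROM person JOIN visit ON person.x = visit.x;
""")

after = conn.execute(app_query).fetchall()
print(before == after)
```

This only covers retrieval; whether updates through such a view still work is the subject of Rule 6.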

 

Rule 10: Integrity independence

Integrity constraints specific to a particular database must be definable in the relational data sublanguage and storable in the catalog, not in the application programs.
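A CHECK constraint is a simple case of an integrity rule that lives in the database rather than in the program. In this sqlite3 sketch, with a made-up `person` table, the constraint is declared in SQL, recorded in SQLite's `sqlite_master` catalog, and enforced by the engine without any checking code in the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rule "age cannot be negative" is declared in the data sublanguage.
conn.execute("CREATE TABLE person (x TEXT PRIMARY KEY, y INTEGER CHECK (y >= 0))")
conn.execute("INSERT INTO person VALUES ('tom', 29)")

# The database, not the application, rejects the invalid line.
try:
    conn.execute("INSERT INTO person VALUES ('fred', -1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The constraint is stored in the catalog along with the table definition.
definition = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'person'"
).fetchone()[0]
print(rejected, "CHECK" in definition)
```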

 

Rule 11: Distribution independence

Whether or not a system supports database distribution, it must have a data sublanguage that can support distributed databases without impairing application programs or terminal activities.

 

Rule 12: The nonsubversion rule

If a relational system has a low-level (single-record-at-a-time) language, that low-level language cannot be used to subvert or bypass the integrity rules and constraints expressed in the higher-level relational language (multiple-records-at-a-time).

