Normalization

Normalization is a series of steps followed to obtain a database design that allows for efficient access and storage of data. These steps reduce data redundancy and the chances of data becoming inconsistent.

First Normal Form[edit]

First Normal Form eliminates repeating groups by putting each into a separate table and connecting them with a one-to-many relationship.

Two rules follow this definition:

Each table has a primary key made of one or several fields and uniquely identifying each record
Each field is atomic, it does not contain more than one value

For instance, assuming a table WAGON to follow each wagon in every station.

wagon_id    integer  (ex. 101)
description string   (ex. 'wagon_type, empty_weight, capacity, designer, design_date')
state       string   (ex. 'under repair')
timestamp   datetime (ex. '22/12/2008 17:37')
station     string   (ex. 'New York Grand Central')

The primary key is (wagon_id, timestamp).
This table is not in 1NF because "description" is not atomic. To move it in 1NF we have to split "description" field in its components:

wagon_id     integer
wagon_type   string
empty_weight number
capacity     number
designer     string
design_date  datetime
state        string
timestamp    datetime
station      string

Second Normal Form[edit]

Second Normal Form requires that every non-key attribute must depend on the whole of every candidate key and not on just a part of a key.

To fulfill this normal form it is necessary to eliminate functional dependencies on a partial key by putting the fields in a separate table from those that are dependent on the whole key.

In our example, "wagon_type", "empty_weight", "capacity"... only depends on "wagon_id" but not on "timestamp" field of the primary key, so this table is not in 2NF. In order to reach 2NF, we have to split the table in two in the way that each field of each table depends on all the fields of the primary key:

wagon_id     integer
wagon_type   string
empty_weight number
capacity     number
designer     string
design_date  datetime

wagon_id     integer
timestamp    datetime
state        string
station      string

Third Normal Form[edit]

Third Normal Form requires that non-key attributes must depend only on candidate keys.

To fulfill this normal form it is necessary to put all fields that are in functional dependencies on non-key fields in a separate table.

At this stage, all non-key fields are dependent on the whole of every key and nothing but the key(s).

In our example, in the first table it is most likely that "empty_weight", "capacity", "designer" and "design_date" depend on "wagon_type", so we have to split this table in two:

wagon_id     integer
wagon_type   string

wagon_type   string
empty_weight number
capacity     number
designer     string
design_date  datetime

Now our example with its 3 tables is in 3NF.

Fourth Normal Form[edit]

Fourth Normal Form separates independent multi-valued facts stored in one table into separate tables.

In the last table of our example, it is clear that "empty_weight" and "capacity" are interesting for every day wagon management whereas "designer" and "design_date" for historical purpose. So we can think they will never be used at the same time and have to split the table in two to fit the 4NF.

wagon_type   string
empty_weight number
capacity     number

wagon_type   string
designer     string
design_date  datetime

Fifth Normal Form[edit]

Fifth Normal Form breaks out data redundancy that is not covered by any of the previous normal forms.

Also see[edit]

Denormalize - the reverse process

Examples in an article by Barry Wise: http://www.barrywise.com/2008/01/database-normalization-and-design-techniques/

Glossary of Terms
A	B	C	D	E	F	G	H	I	J	K	L	M	N	O	P	Q	R	S	T	U	V	W	X	Y	Z	#

Normalization

Contents

First Normal Form[edit]

Second Normal Form[edit]

Third Normal Form[edit]

Fourth Normal Form[edit]

Fifth Normal Form[edit]

Also see[edit]

Navigation menu

Personal tools

Site Navigation

Search

Site highlights

Wiki Navigation

Tools