For useful Documents like this and Lots of more Educational and Technological Stuff Visit... www.thecodexpert.com

Normalization

Normalization is the process of efficiently organizing data in a database. This includes creating tables and
establishing relationships between those tables according to rules designed both to protect the data and to
make the database more flexible by eliminating redundancy and inconsistent dependency. Generally there are
two goals of the normalization process:
1. Eliminating redundant data (for example, storing the same data in more than one table).
2. Ensuring data dependencies make sense (only storing related data in a table).

Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is
logically stored.

Redundant data wastes disk space and creates maintenance problems. If data that exists in more than one
place must be changed, the data must be changed in exactly the same way in all locations. A customer address
change is much easier to implement if that data is stored only in the Customers table and nowhere else in the
database.

Preliminary Definitions
In this section I introduce several definitions that are common jargon in the world of database administration
and normalization.

Entity: The word ‘entity’ as it relates to databases can simply be defined as the general name for the
information that is to be stored within a single table. For example, if I were interested in storing information
about the school’s students, then ‘student’ would be the entity. The student entity would likely be composed
of several pieces of information, for example: student identification number, name, and email address. These
pieces of information are better known as attributes.

Primary key: A primary key uniquely identifies a row of data found within a table. Referring to the school
system, the student identification number would be the primary key for the student table since an ID would
uniquely identify each student.
Note that a primary key might not necessarily correspond to one specific attribute. In fact, it could be the
result of a combination of several components of the entity. For example, while a location could not be a
primary key for a class, since there might be several classes held there throughout the day, the combined time
and location would make a satisfactory primary key, since no two classes could be held at the same time in the
same location. When multiple attributes are used to derive a primary key, this key is known as a concatenated
primary key.

Relationship: Understanding of the various relationships both between the data items forming the various
entities and between the entities themselves forms the crux of database normalization. There are three types of
data relationships that you should be aware of:

• One-to-one (1:1) - A one-to-one relationship signifies that each instance of a given entity relates to exactly
one instance of another entity. For example, each student would have exactly one grade record, and each
grade record would be specific to one student.

• One-to-many (1:N) - A one-to-many relationship signifies that each instance of a given entity relates to one
or more instances of another entity. For example, one professor entity could be found teaching several classes,
and each class could in turn be mapped to one professor.

• Many-to-many (M:N) - A many-to-many relationship signifies that many instances of a given entity relate
to many instances of another entity. To illustrate, a schedule could be comprised of many classes, and a class
could be found within many schedules.

Foreign key: A foreign key forms the basis of a 1:N relationship between two tables. The foreign key can
be found within the N (many-side) table, and maps to the primary key found in the 1 table. To illustrate, the
primary key in the professor table (probably a unique identification number) would be introduced as the foreign
key within the classes entity, since it would be necessary to map a particular professor to several classes.

Entity-relationship diagram (ERD): An ERD is essentially a graphical representation of the database
structure. These diagrams, regardless of whether they are built using the latest design software or scrawled on
a napkin with a crayon, are immensely useful towards attaining a better understanding of the dynamics of the
various database relationships.

Determinant and Dependent

The terms determinant and dependent can be described as follows:

1. The expression X → Y means 'if I know the value of X, then I can obtain the value of Y' (in a table or
somewhere).
2. In the expression X → Y, X is the determinant and Y is the dependent attribute.
3. The value X determines the value of Y.
4. The value Y depends on the value of X.

Functional Dependencies (FD)

A functional dependency can be described as follows:

1. An attribute is functionally dependent if its value is determined by another attribute.
2. That is, if we know the value of one (or several) data items, then we can find the value of another (or
several).
3. Functional dependencies are expressed as X → Y, where X is the determinant and Y is the
functionally dependent attribute.
4. If A → (B,C) then A → B and A → C.
5. If (A,B) → C, then it is not necessarily true that A → C and B → C.
6. If A → B and B → A, then A and B are in a 1-1 relationship.
7. If A → B then for A there can only ever be one value for B.

Examples: PERSON Relation

PERSON
SIN   NAME      CITY
123   Laurent   Toronto
324   Bill      Toronto
574   Bill      Montreal

What can we say about the PERSON table?
“If I know the SIN, I know the name.”
The SIN attribute determines the NAME attribute.
The attribute NAME is functionally dependent on the attribute SIN.
Warning: Knowing the NAME does not imply knowing the SIN: NAME ↛ SIN

NOTATION: SIN → NAME
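
The reasoning about the PERSON table can be sketched in code. The following is a minimal Python illustration, not part of the original material; the helper name holds is hypothetical. Note that a dependency that holds in three sample rows is only evidence about this data, not proof of the underlying business rule.

```python
# Rows of the PERSON relation from the example above.
PERSON = [
    {"SIN": 123, "NAME": "Laurent", "CITY": "Toronto"},
    {"SIN": 324, "NAME": "Bill", "CITY": "Toronto"},
    {"SIN": 574, "NAME": "Bill", "CITY": "Montreal"},
]


def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent holds in the given rows.

    determinant and dependent are tuples of column names (hypothetical helper).
    """
    seen = {}
    for row in rows:
        x = tuple(row[c] for c in determinant)
        y = tuple(row[c] for c in dependent)
        # The same X value paired with a different Y value violates X -> Y.
        if seen.setdefault(x, y) != y:
            return False
    return True


sin_determines_name = holds(PERSON, ("SIN",), ("NAME",))
name_determines_sin = holds(PERSON, ("NAME",), ("SIN",))
print(sin_determines_name, name_determines_sin)
```

NAME fails as a determinant because the value Bill appears with two different SIN values.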

Transitive Dependencies (TD)

A transitive dependency can be described as follows:

1. An attribute is transitively dependent if its value is determined by another attribute which is not a key.
2. If X → Y and X is not a key then this is a transitive dependency.
3. A transitive dependency exists when A → B → C but NOT A → C directly.

Multi-Valued Dependencies (MVD)

A multi-valued dependency can be described as follows:

1. A table involves a multi-valued dependency if it may contain multiple values for an entity.
2. A multi-valued dependency may arise as a result of enforcing 1st normal form.
3. X →→ Y, i.e. X multi-determines Y, when for each value of X we can have more than one value of Y.
4. If A →→ B and A →→ C then we have a single attribute A which multi-determines two other independent
attributes, B and C.
5. If A →→ (B,C) then we have an attribute A which multi-determines a set of associated attributes, B and
C.

Join Dependencies (JD)

A join dependency can be described as follows:

If a table can be decomposed into three or more smaller tables, it must be capable of being joined again on
common keys to form the original table.

Modification Anomalies

A major objective of data normalization is to avoid modification anomalies. These come in three flavours:

1. An insertion anomaly is a failure to place information about a new database entry into all the places
in the database where information about that new entry needs to be stored. In a properly normalized
database, information about a new entry needs to be inserted into only one place in the database. In an
inadequately normalized database, information about a new entry may need to be inserted into more
than one place, and, human fallibility being what it is, some of the needed additional insertions may be
missed. There are circumstances in which certain facts cannot be recorded at all. For example, each
record in a "Faculty and Their Courses" table might contain a Faculty ID, Faculty Name, Faculty Hire
Date, and Course Code—thus we can record the details of any faculty member who teaches at least
one course, but we cannot record the details of a newly-hired faculty member who has not yet been
assigned to teach any courses. This phenomenon is known as an insertion anomaly.

An insertion anomaly. Until the new faculty member is assigned to teach at least one course, his
details cannot be recorded.

2. A deletion anomaly is a failure to remove information about an existing database entry when it is time
to remove that entry. In a properly normalized database, information about an old, to-be-gotten-rid-of
entry needs to be deleted from only one place in the database. In an inadequately normalized database,
information about that old entry may need to be deleted from more than one place, and, human
fallibility being what it is, some of the needed additional deletions may be missed. There are
circumstances in which the deletion of data representing certain facts necessitates the deletion of data
representing completely different facts. The "Faculty and Their Courses" table described in the
previous example suffers from this type of anomaly, for if a faculty member temporarily ceases to be
assigned to any courses, we must delete the last of the records on which that faculty member appears.
This phenomenon is known as a deletion anomaly.

A deletion anomaly. All information about Dr. Giddens is lost when he temporarily ceases to be assigned to
any courses.

3. The same information can be expressed on multiple records; therefore updates to the table may result
in logical inconsistencies. For example, each record in an "Employees' Skills" table might contain an
Employee ID, Employee Address, and Skill; thus a change of address for a particular employee will
potentially need to be applied to multiple records (one for each of his skills). If the update is not
carried through successfully—if, that is, the employee's address is updated on some records but not
others—then the table is left in an inconsistent state. Specifically, the table provides conflicting
answers to the question of what this particular employee's address is. This phenomenon is known as an
update anomaly.

An update anomaly. As shown in the figure, the same employee has different addresses on different records.

All three kinds of anomalies are highly undesirable, since their occurrence constitutes corruption of the
database. Properly normalised databases are much less susceptible to corruption than are unnormalised
databases.
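
The three anomalies can be reproduced in a few lines. The following is a minimal sketch using Python's standard sqlite3 module with an in-memory database. The table layouts follow the "Faculty and Their Courses" and "Employees' Skills" descriptions above, but the column names, the NOT NULL constraints, and every data value other than the name Dr. Giddens are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed layout: one row per (faculty member, course), so a course is mandatory.
conn.execute(
    "CREATE TABLE faculty_courses ("
    " faculty_id INTEGER NOT NULL,"
    " faculty_name TEXT NOT NULL,"
    " hire_date TEXT NOT NULL,"
    " course_code TEXT NOT NULL,"
    " PRIMARY KEY (faculty_id, course_code))"
)
conn.execute(
    "INSERT INTO faculty_courses VALUES (1, 'Dr. Giddens', '2001-02-10', 'ENG-206')"
)

# Insertion anomaly: a new hire with no course yet cannot be recorded.
try:
    conn.execute(
        "INSERT INTO faculty_courses VALUES (2, 'Dr. Newhire', '2024-01-15', NULL)"
    )
    insertion_blocked = False
except sqlite3.IntegrityError:
    insertion_blocked = True

# Deletion anomaly: removing the last course row also removes the person.
conn.execute(
    "DELETE FROM faculty_courses WHERE faculty_id = 1 AND course_code = 'ENG-206'"
)
giddens_rows = conn.execute(
    "SELECT COUNT(*) FROM faculty_courses WHERE faculty_id = 1"
).fetchone()[0]

# Update anomaly: the address is repeated once per skill.
conn.execute(
    "CREATE TABLE employee_skills (employee_id INTEGER, address TEXT, skill TEXT)"
)
conn.executemany(
    "INSERT INTO employee_skills VALUES (?, ?, ?)",
    [(7, "12 Oak St", "Typing"), (7, "12 Oak St", "Shorthand")],
)
# The change of address reaches only one of the employee's two rows.
conn.execute(
    "UPDATE employee_skills SET address = '98 Elm Ave'"
    " WHERE employee_id = 7 AND skill = 'Typing'"
)
distinct_addresses = conn.execute(
    "SELECT COUNT(DISTINCT address) FROM employee_skills WHERE employee_id = 7"
).fetchone()[0]

print(insertion_blocked, giddens_rows, distinct_addresses)
```

After the partial update the table gives two different answers for the address of employee 7, which is exactly the inconsistency described above.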

Normal Forms
The database community has developed a series of guidelines for ensuring that databases are normalized.
These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred
to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often
see 1NF, 2NF, and 3NF along with the occasional 4NF. Fifth normal form is very rarely seen.

First Normal Form


A table is in first normal form (1NF) if all the key attributes have been defined and it contains no
repeating groups. This can be summarized in the following points:

1. Eliminate repeating groups in individual tables.
2. Create a separate table for each set of related data.
3. Identify each set of related data with a primary key.

Consider the following example



This structure creates the following problems:

• Order 123 has no room for more than 3 products.
• Order 456 has wasted space for product2 and product3.

In order to create a table that is in first normal form we must extract the repeating groups and place them in a
separate table, which I shall call ORDER_LINE.

I have removed 'product1', 'product2' and 'product3', so there are no repeating groups.

Each row contains one product for one order, so this allows an order to contain any number of products.
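
The extraction of the repeating group can be sketched as follows. This is a minimal Python illustration: the column names product1, product2 and product3 and the order numbers come from the text above, while the product values and the function name to_order_lines are hypothetical.

```python
# Hypothetical ORDER rows with a repeating group: order 123 fills all three
# product slots, order 456 uses only the first one.
orders_wide = [
    {"order_id": 123, "product1": "hammer", "product2": "nails", "product3": "saw"},
    {"order_id": 456, "product1": "paint", "product2": None, "product3": None},
]


def to_order_lines(rows):
    """Turn each filled product slot into its own ORDER_LINE row."""
    lines = []
    for row in rows:
        for column in ("product1", "product2", "product3"):
            if row[column] is not None:
                lines.append({"order_id": row["order_id"], "product": row[column]})
    return lines


order_line = to_order_lines(orders_wide)
for line in order_line:
    print(line)
```

Because every ORDER_LINE row holds exactly one product, a fourth product for order 123 is just one more row, and order 456 no longer carries empty slots.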

Second Normal Form


A table is in second normal form (2NF) if and only if it is in 1NF and every non key attribute is fully
functionally dependent on the whole of the primary key (i.e. there are no partial dependencies).

1. Anomalies can occur when attributes are dependent on only part of a multi-attribute (composite) key.
2. A relation is in second normal form when all non-key attributes are dependent on the whole key. That
is, no attribute is dependent on only a part of the key.
3. Any relation having a key with a single attribute is in second normal form.

Second normal form (2NF) further addresses the concept of removing duplicative data:

• Meet all the requirements of the first normal form.
• Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
• Create relationships between these new tables and their predecessors through the use of foreign keys.

Take the following table structure as an example:

Order (order_id, cust, cust_address, cust_contact, order_date, order_total)

Here, assuming the primary key is the composite (order_id, cust), we should realize that cust_address and
cust_contact are functionally dependent on cust but not on order_id, therefore they are not dependent on the
whole key. To make this table 2NF these attributes must be removed and placed somewhere else.
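
A partial dependency of this kind can be searched for mechanically. The sketch below is a minimal Python illustration under two assumptions: the key is taken to be the composite (order_id, cust), as the phrase "the whole key" suggests, and the sample rows are invented. The helper names holds and partial_dependencies are hypothetical.

```python
from itertools import combinations

COLUMNS = ("order_id", "cust", "cust_address", "cust_contact",
           "order_date", "order_total")
# Invented sample rows; order numbers are assumed to restart per customer,
# so order_id alone does not identify a row.
ROWS = [dict(zip(COLUMNS, values)) for values in [
    (1, "Acme", "1 Main St", "Ann", "2024-01-05", 100),
    (2, "Acme", "1 Main St", "Ann", "2024-02-11", 250),
    (1, "Bolt", "9 High Rd", "Bob", "2024-03-02", 75),
]]
KEY = ("order_id", "cust")  # assumed composite key


def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent holds in the given rows."""
    seen = {}
    for row in rows:
        x = tuple(row[c] for c in determinant)
        y = tuple(row[c] for c in dependent)
        if seen.setdefault(x, y) != y:
            return False
    return True


def partial_dependencies(rows, key):
    """List (part_of_key, attribute) pairs where a proper subset of the key
    already determines a non-key attribute."""
    non_key = [c for c in COLUMNS if c not in key]
    found = []
    for size in range(1, len(key)):
        for part in combinations(key, size):
            for attribute in non_key:
                if holds(rows, part, (attribute,)):
                    found.append((part, attribute))
    return found


violations = partial_dependencies(ROWS, KEY)
print(violations)
```

On this sample data only cust_address and cust_contact are reported, each determined by cust alone, which matches the attributes the text says must be moved out.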

Third Normal Form


A table is in third normal form (3NF) if and only if it is in 2NF and every non key attribute is non transitively
dependent on the primary key (i.e. there are no transitive dependencies).

1. Anomalies can occur when a relation contains one or more transitive dependencies.
2. A relation is in 3NF when it is in 2NF and has no transitive dependencies.
3. A relation is in 3NF when 'All non-key attributes are dependent on the key, the whole key and nothing
but the key'.

Take the following table structure as an example:

order(order_id, cust, cust_address, cust_contact, order_date, order_total)

Here we should realise that cust_address and cust_contact are functionally dependent on cust which is
not a key. To make this table 3NF these attributes must be removed and placed somewhere else.

Eliminate fields that do not depend on the key.

Values in a record that are not part of that record's key do not belong in the table. In general, any time the
contents of a group of fields may apply to more than a single record in the table, consider placing those fields
in a separate table.

For example, in an Employee Recruitment table, a candidate's university name and address may be included.
But you need a complete list of universities for group mailings. If university information is stored in the
Candidates table, there is no way to list universities with no current candidates. Create a separate Universities
table and link it to the Candidates table with a university code key.
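
The Universities and Candidates split can be sketched with Python's standard sqlite3 module. This is a minimal illustration: the table names follow the paragraph above, while the column names, the university codes and all sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE universities (
    university_code TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    address TEXT NOT NULL
);
CREATE TABLE candidates (
    candidate_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    university_code TEXT REFERENCES universities(university_code)
);
INSERT INTO universities VALUES ('U1', 'Example University', '1 College Rd');
INSERT INTO universities VALUES ('U2', 'Sample Institute', '2 Campus Way');
INSERT INTO candidates VALUES (1, 'A. Candidate', 'U1');
""")

# The mailing list covers every university, with or without candidates.
mailing_list = conn.execute(
    "SELECT name FROM universities ORDER BY university_code"
).fetchall()

# Universities that currently have no candidates at all.
no_candidates = conn.execute(
    "SELECT u.university_code FROM universities u"
    " LEFT JOIN candidates c ON c.university_code = u.university_code"
    " WHERE c.candidate_id IS NULL"
).fetchall()

print(mailing_list)
print(no_candidates)
```

With university details held only in the Candidates table, the second university could not have been stored at all, since no candidate row exists to carry it.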

Boyce-Codd Normal Form


A table is in Boyce-Codd normal form (BCNF) if and only if it is in 3NF and every determinant is a candidate
key.

Take the following table structure as an example:

schedule(campus, course, class, time, room/bldg)

Take the following sample data:


Note that no two buildings on any of the university campuses have the same name, thus ROOM/BLDG →
CAMPUS. As the determinant is not a candidate key this table is NOT in Boyce-Codd normal form.

This table should be decomposed into the following relations:

R1(course, class, room/bldg, time)

R2(room/bldg, campus)
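
The decomposition can be checked on sample rows: projecting the schedule onto R1 and R2 and joining the two projections on room/bldg should give back the original table. The sketch below is a minimal Python illustration; the rows are invented, since the original sample data is not reproduced here, and the helper name project is hypothetical.

```python
# Invented sample rows for schedule(campus, course, class, time, room/bldg).
SCHEDULE = [
    {"campus": "East", "course": "English 101", "class": 1,
     "time": "08:00", "room_bldg": "212 AYE"},
    {"campus": "East", "course": "English 101", "class": 2,
     "time": "10:00", "room_bldg": "212 AYE"},
    {"campus": "West", "course": "English 101", "class": 3,
     "time": "08:00", "room_bldg": "102 PPR"},
]


def project(rows, columns):
    """Project rows onto the given columns, dropping duplicate rows."""
    result = []
    for row in rows:
        item = {c: row[c] for c in columns}
        if item not in result:
            result.append(item)
    return result


r1 = project(SCHEDULE, ("course", "class", "room_bldg", "time"))
r2 = project(SCHEDULE, ("room_bldg", "campus"))

# Natural join of R1 and R2 on their common attribute room_bldg.
rejoined = [{**a, **b} for a in r1 for b in r2
            if a["room_bldg"] == b["room_bldg"]]

lossless = ({frozenset(row.items()) for row in rejoined}
            == {frozenset(row.items()) for row in SCHEDULE})
print(len(r1), len(r2), lossless)
```

R2 stores each building's campus once, however many classes meet there, and the join recovers every original row without adding spurious ones.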

Normalizing an Example Table


These steps demonstrate the process of normalizing a fictitious student table.
Un-normalized table:

Student#  Advisor  Adv-Room  Class1  Class2  Class3
1022      Jones    412       101-07  143-01  159-02
4123      Smith    216       201-01  211-02  214-01

First Normal Form: No Repeating Groups

Tables should have only two dimensions. Since one student has several classes, these classes should be
listed in a separate table. Fields Class1, Class2, and Class3 in the above records are indications of design
trouble.

Spreadsheets often use the third dimension, but tables should not. Another way to look at this problem is
that, with a one-to-many relationship, you should not put the one side and the many side in the same table.
Instead, create another table in first normal form by eliminating the repeating group (Class#), as shown below:

Student#  Advisor  Adv-Room  Class#
1022      Jones    412       101-07
1022      Jones    412       143-01
1022      Jones    412       159-02
4123      Smith    216       201-01
4123      Smith    216       211-02
4123      Smith    216       214-01

Second Normal Form: Eliminate Redundant Data

Note the multiple Class# values for each Student# value in the above table. Class# is not functionally
dependent on Student# (primary key), so this relationship is not in second normal form.

The following two tables demonstrate second normal form:

Students:

Student#  Advisor  Adv-Room
1022      Jones    412
4123      Smith    216

Registration:

Student#  Class#
1022      101-07
1022      143-01
1022      159-02
4123      201-01
4123      211-02
4123      214-01

Third Normal Form: Eliminate Data Not Dependent On Key

In the last example, Adv-Room (the advisor's office number) is functionally dependent on the Advisor
attribute. The solution is to move that attribute from the Students table to the Faculty table, as shown
below:

Students:

Student#  Advisor
1022      Jones
4123      Smith

Faculty:

Name   Room  Dept
Jones  412   42
Smith  216   42
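
The whole walk-through can be replayed in code. The following minimal Python sketch starts from the un-normalized rows given above and derives each stage in turn; the variable names are hypothetical, and the Dept column is left out because it does not appear in the un-normalized table.

```python
# Un-normalized rows: Student#, Advisor, Adv-Room, Class1, Class2, Class3.
UNNORMALIZED = [
    (1022, "Jones", 412, "101-07", "143-01", "159-02"),
    (4123, "Smith", 216, "201-01", "211-02", "214-01"),
]

# First normal form: one row per (student, class), no repeating group.
first_nf = [
    (student, advisor, room, class_no)
    for student, advisor, room, *classes in UNNORMALIZED
    for class_no in classes
]

# Second normal form: student details are stored once, registrations apart.
students_2nf = sorted({(s, advisor, room) for s, advisor, room, _ in first_nf})
registration = [(s, class_no) for s, _, _, class_no in first_nf]

# Third normal form: the advisor's room moves to a Faculty table.
students_3nf = sorted({(s, advisor) for s, advisor, _ in students_2nf})
faculty = sorted({(advisor, room) for _, advisor, room in students_2nf})

print(students_3nf)
print(faculty)
print(registration)
```

Each advisor's room now appears in exactly one row, so a room change is a single update.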