You are on page 1of 2

A Synthesis Method for Database Normalization

Jamal Munshi, Sonoma State Univesity, 1995


All rights reserved

In the RDBMS metaphor, a database is a collection of tables that together


represent a cohesive data environment of the business enterprise. Each table has a
unique name within the database and consists of columns and rows. Each column has
a unique name within the table and represents an attribute relevant to the user.
Each row contains an instance of the table. One or more columns form a key
structure whose value appears only once in each table and uniquely identifies a row
in the table. Tables may also contain one or more foreign keys which appear as keys
in other tables. It is through these foreign keys that row to row linkages are
established between related tables. It is a property of the relational model that the
conceptual data structure is established by the semantic content of the data rather
than the preconceived or so called `hard wired' reporting requirements. Therefore,
theoretically, if the set of tables is properly constructed, all queries that are
meaningful within the semantic constraints are possible using SQL.

The construction is proper if all non-key attributes are functionally dependent on


the key, the whole key, and nothing but the key and if no part of the key is
functionally dependent on any other part of the key. This state of the tables is
referred to as the `normalized set' and the normalization process consists of
procedures to produce this set.

Conventionally, the normalization concept and procedure are explained in stages


called `normal forms'. The process begins with one single table that contains all
attributes (columns) that are relevant to the database being designed. The single
large table is then decomposed stepwise into normal forms. Each step addresses a
single normalization issue.

In the first step repeating groups are removed and the resultant tables are said to
be in the first normal form. The first normal relations are then decomposed
stagewise by addressing each of the normalization criteria. When dependencies on
part of the key are removed, the relations are said to be in the `second normal
form'. Similarly, when transitive dependencies (dependencies on non-key attributes)
are removed by further decomposition the result is the third normal form or the so
called Boyce Codd normal form.

Further decomposition to remove multivalued dependencies may be used to produce


a set of relations in the fourth normal form. Some designers recognize a Domain Key
Normal Form (Fagin 1981) in which every constraint on the relations is the result of
only two variables: the key and the domains of the non-key attributes. No
generalized method for achieving this state has been proposed.

The decomposition method of normalization is difficult to use and confusing to


students and analysts alike and it is based on the unrealistic notion that we begin
the design process with the largest possible tables. In the synthesis method
proposed in this paper, we reverse the process and begin with the smallest possible
tables which are normalized by definition since each of these tables states a single
dependency equation.

We call these relations `elemental tables. They are produced by the following
procedure: first list all the entity-types and their attributes; select the attributes
one at a time and determine the functional dependency and key structure needed to
identify an instance of this entity type; pick those attributes from the attribute list
and construct the key for that one single non-key (or partial key) attribute; and
place the attribute and its key structure into an elemental table. Then go on to the
next attribute and keep constructing elemental tables until all attributes have been
accounted for.

Although we now have a normalized set of relations, their use is cumbersome since
the number of tables that must be referenced by SQL queries will be unnecessarily
large. The database will also suffer severe degradation problems. To enhance
database performance and simplify queries, we now synthesize larger tables from
the elemental tables. In the synthesis process we delete elemental relations that
are semantically redundant and combine relations that have exactly the same key
structure. The result will be a set of tables that are as normalized as any analysis
method is able to achieve but the procedure is immensely simpler. In the resultant
tables every non-key attribute is FD on the key, the whole key, and nothing but the
key and no attribute of the composite key is FD on any other part of the key.

As an example, consider Chens famous example (Chen, P., ACM TODS, vol. 1, no. 1)

EMPLOYEES (SSN, NAME,AGE,DEPT,BUDGET,PROJECT,PROJNAME,PCTIME)

The functional dependencies are described in the paper. First we produce the
elemental tables as follows:

ER1(SSN,NAME) ER2(SSN,AGE) ER3(SSN,DEPT) ER4(SSN,PROJECT)


ER5(DEPT,BUDGET) ER6(PROJECT,PROJNAME) ER7(SSN,PROJECT,PCTIME)

ER7 contains all the information in ER4. Therefore ER4 is semantically redundant
and is dropped. ER1, ER2, and ER3 contain identical keys. They are collapsed into a
single table. The resultant normalized tables are:

NORMAL1(SSN,NAME,AGE,DEPT) NORMAL2(DEPT,BUDGET)
NORMAL3(PROJECT,PROJNAME) NORMAL4(SSN,PROJECT,PCTIME)

You might also like