A database is a collection of related information stored so that it is available to many users for
different purposes. The management of data in a database system is done by means of a
general-purpose software package called a database management system (DBMS).
Some DBMS examples include MySQL, PostgreSQL, Microsoft Access, SQL Server, and Oracle.
What is RDBMS?
RDBMS stands for Relational Database Management System. The RDBMS is the basis for SQL
and for many widely used database systems such as MS SQL Server, Oracle, MySQL, and Microsoft
Access.
A relational database management system (RDBMS) is a database management system
(DBMS) that is based on the relational model introduced by Edgar F. Codd.
What is a table or relation?
The data in an RDBMS is stored in database objects called tables. A table is a collection of
related data entries and consists of columns and rows.
The following is an example of a CUSTOMERS table:
ID   NAME     AGE   ADDRESS     SALARY
     Ramesh   32    Ahmedabad   2000
     Anu      20    Delhi       1000
     Ram      23    Delhi       10020
     Ramesh   32    Ahmedabad   2000
DBMS Users
There are four main types of users of a DBMS:
Database administrator
Database designer
End users
Application programmers.
Database Designers
The database designer's work is done before the database is actually implemented.
Database designers are responsible for identifying the data to be stored in the database and
choosing appropriate structures to represent and store this data.
End Users
End users are the people who use the applications that have been developed. They need not know about
the internal workings, the database design, or the access mechanisms. They just use the system to get their
tasks done.
Application Programmers
These users write application programs that interact with the database. Application programs
can be written in programming languages such as C, C++, or Java. Such programs access
the database by issuing the appropriate request, typically an SQL statement, to the DBMS.
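As a rough illustration of an application program issuing an SQL statement, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for the DBMS; the table name `customers` and the ID values are hypothetical, while the remaining values follow the CUSTOMERS example.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, "
    "address TEXT, salary INTEGER)"
)
# The ID values 1-3 are hypothetical; the other values come from the CUSTOMERS example.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000),
        (2, "Anu", 20, "Delhi", 1000),
        (3, "Ram", 23, "Delhi", 10020),
    ],
)

# The application sends its request as an SQL statement; the DBMS returns the rows.
delhi_customers = conn.execute(
    "SELECT name, salary FROM customers WHERE address = ? ORDER BY id",
    ("Delhi",),
).fetchall()
print(delhi_customers)
```

The program never touches the stored files directly; it only states what data it wants, and the DBMS decides how to retrieve it.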
The logical level thus describes the entire database in terms of a small number of
relatively simple structures.
3. View level or External Schema
The highest level of abstraction describes only part of the entire database. Even at the
logical level, complexity remains because of the variety of information stored in a
large database. Many users of the database
system do not need all this information; instead, they need to access only a part of
the database. The view level of abstraction exists to simplify their interaction with
the system.
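One common way to provide a view level is an SQL view. The sketch below assumes SQLite through Python's sqlite3 module; the table `customers`, the view `customer_directory`, and the sample rows are hypothetical names and values chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, "
    "address TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "Anu", "Delhi", 1000), (2, "Ram", "Delhi", 10020)],
)

# External schema (hypothetical): a view that exposes only name and address,
# hiding the id and salary columns from users who do not need them.
conn.execute(
    "CREATE VIEW customer_directory AS SELECT name, address FROM customers"
)

cursor = conn.execute("SELECT * FROM customer_directory ORDER BY name")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

Users of `customer_directory` see a simpler, partial picture of the database, which is the purpose of the view level.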
Data Independence
Data independence means that 'the application is independent of the storage structure and
access strategy of data'. In other words, it is the ability to modify the schema definition at one
level without affecting the schema definition at the next higher level.
There are two types of data independence:
1. Physical Data Independence
Physical data independence is the ability to modify the internal schema without having
to alter the conceptual schema or application programs. Changes to the
internal schema might include:
Using new storage devices.
Using different file organizations or storage structures.
2. Logical Data Independence
Logical data independence is the ability to modify the conceptual schema without
having to alter external schemas or application programs. Changes to the
conceptual schema may include the addition or deletion of entities, attributes or
relationships, and should be possible without altering existing external
schemas.
NOTE: Logical data independence is more difficult to achieve than physical data
independence, because application programs depend heavily on the logical structure of the data.
Data Models
A data model describes how the logical structure of a database is modeled. Data models are
the fundamental means of introducing abstraction in a DBMS. They define how data items are
connected to each other and how they are processed and stored inside the system.
Network Model
The network model was developed as an alternative to the hierarchical model. In the
network model, entities are organized in a graph in which some entities can be accessed
through several paths.
Relational Model
In the relational model of a database, all data is represented in terms of tuples, grouped
into relations. A database organized in terms of the relational model is a relational database.
Database Languages
A DBMS must provide appropriate languages and interfaces for each category of users to
express database queries and updates. Database languages are used to create and maintain
a database on a computer. Database languages can be grouped into several categories.
Keys in DBMS
A key is a single field or a combination of multiple fields in a table. It is used to fetch or retrieve
records (rows) from a table according to a condition or requirement. Keys are also used
to create relationships among different database tables or views. The following are the various
types of keys available in a DBMS:
Super Key
A super key is a set of one or more fields/columns that can be used to identify a record uniquely
in a table. Example: the primary key, unique keys and alternate keys are all super keys.
Candidate Key
A candidate key is a minimal set of one or more fields/columns that can identify a record uniquely in
a table. There can be multiple candidate keys in one table. Each candidate key can work as the
primary key. Example: in the diagram below, ID, RollNo and EnrollNo are candidate keys, since
all three fields can work as the primary key.
Primary Key
A primary key is a set of one or more fields/columns of a table that uniquely identifies a record in
a database table. It cannot accept null or duplicate values. Only one candidate key can be chosen as the
primary key.
Alternate Key
An alternate key is a key that could work as the primary key. Basically, it is a candidate key that
is currently not the primary key.
Example: in the diagram below, RollNo and EnrollNo become alternate keys when we define ID as the
primary key.
Foreign Key
A foreign key is a field in a database table that is the primary key in another table. It can accept
multiple null values and duplicate values. Example:
we can have a DeptID column in the Employee table which points to the DeptID column
in a Department table, where it is the primary key.
Secondary Key
A secondary key identifies tuples, but not uniquely. Example: Name and Address are secondary keys.
Unique Key
A unique key is a set of one or more fields/columns of a table that uniquely identifies a record in
a database table. It is like the primary key, but it can accept null values (some systems, such as
SQL Server, allow only one null) and it cannot have
duplicate values. We can have more than one unique key in a table.
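The key types described above can be tried out with a small sketch. It assumes SQLite through Python's sqlite3 module; the table names `student` and `department`, the `Name` and `DeptName` columns, and all sample values are hypothetical. Here ID is the primary key, RollNo and EnrollNo are alternate keys declared as UNIQUE, Name is a secondary key, and DeptID is a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("CREATE TABLE department (DeptID INTEGER PRIMARY KEY, DeptName TEXT)")
conn.execute(
    """
    CREATE TABLE student (
        ID       INTEGER PRIMARY KEY,                    -- primary key
        RollNo   INTEGER NOT NULL UNIQUE,                -- alternate key
        EnrollNo TEXT    NOT NULL UNIQUE,                -- alternate key
        Name     TEXT,                                   -- secondary key (not unique)
        DeptID   INTEGER REFERENCES department(DeptID)   -- foreign key
    )
    """
)
conn.execute("INSERT INTO department VALUES (10, 'Physics')")
conn.execute("INSERT INTO student VALUES (1, 101, 'E-001', 'Adam', 10)")


def try_insert(row):
    """Return 'ok' if the row is accepted, otherwise the constraint error text."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


duplicate_primary = try_insert((1, 102, "E-002", "Alex", 10))   # ID 1 is already used
duplicate_unique = try_insert((2, 101, "E-003", "Alex", 10))    # RollNo 101 is already used
unknown_department = try_insert((3, 103, "E-004", "Alex", 99))  # no department 99 exists
null_foreign_key = try_insert((4, 104, "E-005", "Adam", None))  # NULL DeptID, repeated Name

for result in (duplicate_primary, duplicate_unique, unknown_department, null_foreign_key):
    print(result)
```

Only the last insert is accepted: a foreign key may be null and a secondary key such as Name may repeat, while primary and unique keys reject duplicates.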
Since for each value of A there is associated one and only one value of B.
Example
Integrity Constraints
Constraints are used to limit the type of data that can go into a table. Integrity constraints are
used to ensure accuracy and consistency of data in a relational database. Constraints can be
specified when a table is created (with the CREATE TABLE statement) or after the table is
created (with the ALTER TABLE statement).
You can define integrity constraints to enforce the business rules you want to associate with
the information in a database. If any of the results of a DML statement execution violate an
integrity constraint, then Oracle rolls back the statement and returns an error.
Example: assume that you define an integrity constraint for the salary column of the
employees table. This integrity constraint enforces the rule that no row in this table can
contain a numeric value greater than 10,000 in this column. If an INSERT or UPDATE
statement attempts to violate this integrity constraint, then Oracle rolls back the statement and
returns an error message.
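The salary rule from this example can be sketched as a CHECK constraint. The text describes Oracle; the sketch below instead assumes SQLite through Python's sqlite3 module, which behaves similarly for this case. The `id` and `name` columns and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Integrity constraint: no row may hold a salary greater than 10,000.
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, "
    "salary NUMERIC CHECK (salary <= 10000))"
)
conn.execute("INSERT INTO employees VALUES (1, 'Anu', 1000)")

try:
    # This UPDATE would violate the constraint, so the statement is rejected.
    conn.execute("UPDATE employees SET salary = 10020 WHERE id = 1")
    outcome = "accepted"
except sqlite3.IntegrityError as exc:
    outcome = f"rejected: {exc}"

# The failed statement left the stored value unchanged.
salary_after = conn.execute(
    "SELECT salary FROM employees WHERE id = 1"
).fetchone()[0]
print(outcome, salary_after)
```

The same rule could be added to an existing table with ALTER TABLE in systems that support adding constraints that way.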
There are two types of integrity constraints:
1. Entity Integrity Constraints
2. Referential Integrity Constraints
Entity Integrity Constraint
The entity integrity constraint states that primary keys can't be null. There must be a proper
value in the primary key field.
This is because the primary key value is used to identify individual rows in a table. If there
were null values for primary keys, it would mean that we could not identify those rows.
On the other hand, fields other than the primary key can contain null values. A null value means
that the value for that field is not known. A null value is different from a zero value or a space.
In the Car Rental database, each car in the Car table must have a proper and unique Reg_No.
There might be a car whose rate is unknown, for example because the car is broken or brand new, i.e.
the Rate field has a null value. See the picture below.
The entity integrity constraint ensures that a specific row in a table can be identified.
Examples
Rule 1. You can't delete any of the rows in the CarType table that are visible in the picture
since all the car types are in use in the Car table.
Rule 2. You can't change any of the model_ids in the CarType table since all the car types are
in use in the Car table.
Rule 3. The values that you can enter in the model_id field in the Car table must be in the
model_id field in the CarType table.
Rule 4. The model_id field in the Car table can have a null value, which means that the car
type of that car is not known.
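The four rules above can be checked with a small sketch. It assumes SQLite through Python's sqlite3 module with foreign key enforcement switched on; the `model_name` column and all sample values are hypothetical, while CarType, Car, model_id, Reg_No and Rate follow the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("CREATE TABLE CarType (model_id INTEGER PRIMARY KEY, model_name TEXT)")
# SQLite needs an explicit NOT NULL to forbid nulls in a non-integer primary key.
conn.execute(
    "CREATE TABLE Car (Reg_No TEXT PRIMARY KEY NOT NULL, Rate REAL, "
    "model_id INTEGER REFERENCES CarType(model_id))"
)
conn.execute("INSERT INTO CarType VALUES (1, 'Compact')")
conn.execute("INSERT INTO Car VALUES ('ABC-123', 50.0, 1)")


def attempt(sql):
    """Return True if the statement is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


rule1 = attempt("DELETE FROM CarType WHERE model_id = 1")              # type is in use
rule2 = attempt("UPDATE CarType SET model_id = 2 WHERE model_id = 1")  # type is in use
rule3 = attempt("INSERT INTO Car VALUES ('XYZ-999', 40.0, 7)")         # no such car type
rule4 = attempt("INSERT INTO Car VALUES ('NEW-001', NULL, NULL)")      # unknown type and rate
print(rule1, rule2, rule3, rule4)
```

The first three statements are rejected and the fourth is accepted, matching Rules 1 to 4.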
Normalization
Normalization was developed by IBM researcher E.F. Codd in the 1970s. Database
normalization is a database schema design technique by which an existing schema is
modified to minimize redundancy and dependency of data. Redundancy means storing the same
data item in more than one place. Normalization splits a table into smaller tables and defines
relationships between them to increase clarity in organizing data.
While designing a database from an entity-relationship model, the main problem in
the raw database is redundancy.
Problem Without Normalization
Without normalization, it becomes difficult to handle and update the database
without facing data loss. Insertion, update and deletion anomalies are very frequent
if the database is not normalized. To understand these anomalies, let us take the
example of a Student table.
S_Id   S_Name   S_Address   Subject_Opted
401    Adam     Noida       Bio
402    Alex     Panipat     Maths
403    Stuart   Jammu       Maths
404    Adam     Noida       Physics
Update Anomaly: To update the address of a student who occurs twice or more
in a table, we have to update the S_Address column in all of those rows, or else the data
will become inconsistent.
Insertion Anomaly: Suppose that for a new admission we have the student id (S_Id),
name and address of a student, but the student has not opted for any subjects yet. Then
we have to insert NULL there, leading to an insertion anomaly.
Deletion Anomaly: If student 401 (S_Id) has only one subject and temporarily drops it,
then when we delete that row, the entire student record is deleted along with it.
To solve this problem, the raw database needs to be normalized. This is a step-by-step
process that removes a different kind of redundancy and anomaly at each step. At each step a
specific rule is followed to remove a specific kind of impurity, in order to give the database a
slim and clean look.
Emp-Id   Emp-Name   Month   Sales   Bank-Id   Bank-Name
E01      AA         Jan     1000    B01       SBI
                    Feb     1200
                    Mar     850
E02      BB         Jan     2200    B02       UTI
                    Feb     2500
E03      CC         Jan     1700    B01       SBI
                    Feb     1800
                    Mar     1850
                    Apr     1725
In the sample table above, there are multiple rows under each value of the key Emp-Id.
Although it is considered to be the primary key, Emp-Id cannot uniquely identify
any single row. Further, each primary key value points to a variable-length record (3 rows for
E01, 2 for E02 and 4 for E03).
A relation is in first normal form (1NF) if
it contains no non-atomic values and each row provides a unique combination of
values.
The above table in UNF can be processed to create the following table in 1NF.
Emp-Id   Emp-Name   Month   Sales   Bank-Id   Bank-Name
E01      AA         Jan     1000    B01       SBI
E01      AA         Feb     1200    B01       SBI
E01      AA         Mar     850     B01       SBI
E02      BB         Jan     2200    B02       UTI
E02      BB         Feb     2500    B02       UTI
E03      CC         Jan     1700    B01       SBI
E03      CC         Feb     1800    B01       SBI
E03      CC         Mar     1850    B01       SBI
E03      CC         Apr     1725    B01       SBI
As you can see, each row now contains a unique combination of values. Unlike in UNF, this
relation contains only atomic values, i.e. the rows cannot be further decomposed, so the
relation is in 1NF.
Let us explain. Emp-Id is the primary key of the above relation. Emp-Name, Month, Sales
and Bank-Name all depend upon Emp-Id. But the attribute Bank-Name depends on Bank-Id,
which is not the primary key of the table. So the table is in 1NF, but not in 2NF. If this portion can be
moved into another related relation, the table would come to 2NF.
Emp-Id   Emp-Name   Month   Sales   Bank-Id
E01      AA         Jan     1000    B01
E01      AA         Feb     1200    B01
E01      AA         Mar     850     B01
E02      BB         Jan     2200    B02
E02      BB         Feb     2500    B02
E03      CC         Jan     1700    B01
E03      CC         Feb     1800    B01
E03      CC         Mar     1850    B01
E03      CC         Apr     1725    B01

Bank-Id   Bank-Name
B01       SBI
B02       UTI
After moving this portion into another relation, we store a smaller amount of data in two
relations without any loss of information. There is also a significant reduction in redundancy.
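The claim that the split loses no information can be checked with a short sketch in plain Python. The rows are those of the 1NF table; the variable names are only illustrative. The two projections are joined again on Bank-Id and compared with the original relation.

```python
# Rows of the 1NF relation: (Emp-Id, Emp-Name, Month, Sales, Bank-Id, Bank-Name)
rows_1nf = {
    ("E01", "AA", "Jan", 1000, "B01", "SBI"),
    ("E01", "AA", "Feb", 1200, "B01", "SBI"),
    ("E01", "AA", "Mar", 850, "B01", "SBI"),
    ("E02", "BB", "Jan", 2200, "B02", "UTI"),
    ("E02", "BB", "Feb", 2500, "B02", "UTI"),
    ("E03", "CC", "Jan", 1700, "B01", "SBI"),
    ("E03", "CC", "Feb", 1800, "B01", "SBI"),
    ("E03", "CC", "Mar", 1850, "B01", "SBI"),
    ("E03", "CC", "Apr", 1725, "B01", "SBI"),
}

# Projection 1: every column except Bank-Name.
emp_sales = {(emp, name, month, sales, bank)
             for (emp, name, month, sales, bank, _) in rows_1nf}

# Projection 2: the Bank-Id / Bank-Name pairs, stored once per bank.
banks = {(bank, bank_name) for (*_, bank, bank_name) in rows_1nf}

# Natural join of the two projections on Bank-Id.
rejoined = {
    (emp, name, month, sales, bank, bank_name)
    for (emp, name, month, sales, bank) in emp_sales
    for (bank_id, bank_name) in banks
    if bank == bank_id
}

lossless = rejoined == rows_1nf
cells_before = len(rows_1nf) * 6
cells_after = len(emp_sales) * 5 + len(banks) * 2
print(lossless, cells_before, cells_after)
```

The join reproduces the original rows exactly, and the number of stored values drops from 54 to 49 because each bank name is now stored only once.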
In the table above, [Book ID] determines [Genre ID], and [Genre ID] determines
[Genre Type]. Therefore, [Book ID] determines [Genre Type] via [Genre ID], so
we have a transitive functional dependency, and this structure does not satisfy third
normal form.
To bring this table to third normal form, we split the table into two as follows:
Now all non-key attributes are fully functionally dependent only on the primary key.
In [TABLE_BOOK], both [Genre ID] and [Price] are dependent only on [Book
ID]. In [TABLE_GENRE], [Genre Type] is dependent only on [Genre ID].
SID   Major   Fname
100   Math    Ram
150   Phy     Shyam
200   Math    Rohan
250   Math    Ram
200   Phy     Rohit
Here there is a functional dependency FD: Fname -> Major. Any faculty member advises in only one
subject. Therefore, given the Fname, we can determine the Major. Thus Fname is a
determinant, but it is not a candidate key.
According to the definition of BCNF, every determinant must be a candidate key, so Fname
would have to be a candidate key. This table can therefore be
decomposed into two relations, STU-ADV(SID, Fname) and ADV-SUBJ(Fname, Major). These relations are in BCNF.
STU-ADV(SID, Fname)
Key (SID, Fname)
SID   Fname
100   Ram
150   Shyam
200   Rohan
250   Ram
200   Rohit

ADV-SUBJ(Fname, Major)
Key (Fname)
Fname   Major
Ram     Math
Shyam   Phy
Rohan   Math
Ram     Math
Rohit   Phy
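The reasoning about Fname and Major can be checked mechanically on the sample rows. The sketch below is plain Python; the helper names `holds` and `is_unique` are hypothetical. Note that a check on sample data can only refute a dependency or a key, it cannot prove that one holds for every possible row.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in these rows."""
    seen = {}
    for row in rows:
        left = tuple(row[attr] for attr in lhs)
        right = tuple(row[attr] for attr in rhs)
        # The first value seen for a left-hand side must match every later one.
        if seen.setdefault(left, right) != right:
            return False
    return True


def is_unique(rows, attrs):
    """Return True if no two rows share the same values for attrs."""
    values = [tuple(row[attr] for attr in attrs) for row in rows]
    return len(set(values)) == len(values)


# Rows of the SID / Major / Fname table.
rows = [
    {"SID": 100, "Major": "Math", "Fname": "Ram"},
    {"SID": 150, "Major": "Phy", "Fname": "Shyam"},
    {"SID": 200, "Major": "Math", "Fname": "Rohan"},
    {"SID": 250, "Major": "Math", "Fname": "Ram"},
    {"SID": 200, "Major": "Phy", "Fname": "Rohit"},
]

fd_holds = holds(rows, ["Fname"], ["Major"])       # each adviser has one major
fname_is_key = is_unique(rows, ["Fname"])          # Ram appears twice
sid_fname_is_key = is_unique(rows, ["SID", "Fname"])
print(fd_holds, fname_is_key, sid_fname_is_key)
```

Fname determines Major but does not identify rows uniquely, which is exactly the situation BCNF forbids and the reason for the decomposition.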
It is in BCNF.
It has no multi-valued dependency (MVD).
MVD exists if
employee#   skills        hobbies
            Programming   Golf
            Programming   Bowling
            Analysis      Golf
            Analysis      Bowling
            Analysis      Golf
            Analysis      Gardening
            Management    Golf
            Management    Gardening
This table is difficult to maintain since adding a new hobby requires multiple new rows
corresponding to each skill. This problem is created by the pair of multi-valued dependencies
EMPLOYEE#--->SKILLS and EMPLOYEE#--->HOBBIES. A much better alternative
would be to decompose INFO into two relations:
skills(employee#, skill)
employee#   skills
            Programming
            Analysis
            Analysis
            Management

hobbies(employee#, hobby)
employee#   hobbies
            Golf
            Bowling
            Golf
            Gardening
It is in 4NF.
buyer   vendor   item
Sally
Mary
Sally
Mary
Sally
Problem: With the above table structure, if Claiborne starts to sell Jeans,
how many records must you create to record this fact? The problem is that there are pairwise
cyclical dependencies in the primary key. That is, in order to determine the item you must
know the buyer and the vendor, to determine the vendor you must know the buyer and the
item, and finally to determine the buyer you must know the vendor and the item.
Solution: The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item,
and Vendor-Item. The following tables are in 5NF.
Buyer-Vendor
buyer   vendor
Sally   Liz Claiborne
Mary    Liz Claiborne
Sally   Jordach
Mary    Jordach
Buyer-Item
buyer   item
Sally   Blouses
Mary    Blouses
Sally   Jeans
Mary    Jeans
Sally   Sneakers
Vendor-Item
vendor    item
Jordach   Jeans
Jordach   Sneakers