Chapter 6: Elements of Database Systems

ACCOUNTING INFORMATION SYSTEMS: A DATABASE APPROACH


by: Uday S. Murthy, Ph.D., ACA and S. Michael Groomer, Ph.D., CPA, CISA

Elements of Database Systems


Learning Objectives
After studying this chapter you should be able to:

distinguish between the file-oriented approach and the database approach

discuss fundamental relational database concepts such as composite and foreign
keys

specify the types of relationships that can be represented in database systems

provide a detailed description of the relational database model

discuss database integrity, emphasizing entity and referential integrity in
particular

explain and provide examples of validation rules in relational database systems

discuss how views and permissions can be used to restrict access to sensitive
data in relational database systems

explain the data dictionary concept

describe the types of database languages

construct SQL queries to extract information from relational database systems

discuss database backup and recovery methods

explain concepts such as concurrency control

explain in general terms concepts such as the object-oriented approach to
developing database systems

The elements of computer-based information systems were discussed in Chapter 2. In
this chapter we focus on the elements of database systems. As discussed in Chapter 1,
the next generation of accounting systems will have an enterprise-wide orientation and
will most likely be built on a database platform. We will first differentiate between the
older file-oriented approach and the more recent database approach. Various data
structure elements specific to database systems will then be discussed. The major
types of database models will then be reviewed. The remainder of the chapter will focus
exclusively on the relational database model, which has grown to become widely
accepted as the platform of choice for robust enterprise applications.

File-oriented and database approaches contrasted


The early applications of computer technology were in automating transaction
processing systems. These early applications were developed using COBOL. Each
transaction processing system was created and treated independently with its own set
of files and programs. There was virtually no integration across application areas. Most
business data processing systems developed in the 1970s and the early 1980s
employed this approach. This approach to transaction processing systems (TPS) is referred to as the file-oriented
approach. A more modern approach is to develop an integrated set of application
systems with all data stored in a shared repository, i.e., the enterprise database. This
approach is referred to as the database approach. It is important to note that a
significant number of file-oriented systems exist in many businesses today. Most of
these file-oriented systems are written in COBOL and continue to receive maintenance.
These systems will be around for some years to come due primarily to the significant
investment businesses have made in these systems. The term "legacy systems" is used
to describe these older COBOL systems. Systems administrators in many organizations
have to grapple with the problems associated with interfacing these legacy systems with
newer systems that operate in a database or 4GL environment. Furthermore, many of
these legacy systems had to be overhauled to deal with the Y2K problem, since their
"date" data structures typically allocated only two digits for the year. The contrast
between the file-oriented and database approaches is most stark in the context of
custom-developed accounting information systems. However, as discussed in Chapter
3, many companies use off-the-shelf accounting software packages such as
QuickBooks Pro, Peachtree, and Great Plains. Lower-end (less expensive) accounting
packages tend to conform more to the file-oriented approach while the higher-end (more
expensive) packages tend to conform more to the database approach. Thus, the
discussion below of the file-oriented and database approaches is also relevant in the
context of accounting software packages. Both the file-oriented and the database
approaches will now be described, and the relative advantages and drawbacks of each
will be discussed.

The file-oriented approach


As indicated above, the file-oriented approach involves creating a set of files, as
needed, for each transaction processing application such as sales or purchases. A set
of COBOL programs and data files is created to satisfy the information needs of each
application. As shown in the figure on the next page, each application's files and
programs are created and maintained independently of other applications.

Duplication of files across applications is one consequence of this independence. For
example, in the above figure, File A is used in application programs 1 and 3. However,
two instances of File A must be created - programs 1 and 3 cannot simply share File A
since there is no way of allowing concurrent access to the same file in COBOL. Another
important characteristic of the file-oriented approach is that each COBOL program is
required to define the data structures that will be used in the program. For example, a
CUSTOMER file might have the following fields: CUSTOMER-NO, NAME, STREET-ADDRESS,
CITY, STATE, and ZIP. The format of this file, including the field names and
data types (e.g., text, numeric, date/time), must be explicitly defined. Such definitions
constitute the "data structure" for a particular file. If you are familiar with COBOL, you
might recall that every COBOL program has four divisions -- the identification division,
the environment division, the data division, and the procedure division. The data division
is where the data structures are defined; these are then manipulated in the procedure
division. Thus, in the file-oriented approach, every program defines all the data
structures it uses.

Drawbacks of the file-oriented approach


The most significant drawback of the file-oriented approach is data redundancy or
duplication, which is caused by the lack of sharing of data across applications. For
example, a marketing application would have to create its own customer file, although a
customer file already exists in the sales application. The resulting data redundancy or
duplication has many undesirable consequences. Apart from merely consuming more
storage space, data redundancy can easily lead to data inconsistencies. Adding new
data, changing existing data, or deleting data has to be repeated for each instance of
the duplicate files. For example, consider the marketing and sales application referred
to above. If a customer's address change is recorded in the sales application but not the
marketing application, the result is inconsistent data across the two applications with no
indication of which address is the correct address. Thus, data redundancy can cause
data inconsistencies, which are more problematic than simply the extra storage space
consumed by duplicate data.
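
The address-change scenario can be made concrete with a small sketch. It is only an illustration: the two dictionaries stand in for the separate customer files of the sales and marketing applications, and the customer number, name, addresses, and function names are all hypothetical.

```python
# Hypothetical illustration: each application keeps its own copy of the customer file.
sales_customers = {"C001": {"name": "ABC Corp", "address": "12 Oak St"}}
marketing_customers = {"C001": {"name": "ABC Corp", "address": "12 Oak St"}}


def record_address_change(customer_file, customer_no, new_address):
    """Update the address in ONE application's private copy of the file."""
    customer_file[customer_no]["address"] = new_address


def find_inconsistencies(file_a, file_b):
    """Return customer numbers whose records differ between the two copies."""
    shared = file_a.keys() & file_b.keys()
    return sorted(no for no in shared if file_a[no] != file_b[no])


# The change is recorded in the sales application but never repeated in marketing.
record_address_change(sales_customers, "C001", "98 Elm Ave")
conflicts = find_inconsistencies(sales_customers, marketing_customers)
print(conflicts)  # ['C001']
```

Nothing in either file indicates which of the two addresses is correct; the conflict is detected only because the sketch compares the copies explicitly.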
A second drawback of the file-oriented approach is the proliferation of files: because
each application creates its own files as needed, data maintenance becomes
significantly more difficult. With several versions of the same file in
different applications, ensuring consistency of data across all applications becomes
more difficult as the number of files multiplies. A third drawback of the file-oriented
approach is the length of time normally required for application development. Files
needed for a new application must be created from scratch since sharing of existing
files is not possible.
A fourth and rather significant disadvantage of the file-oriented approach is the lack of
independence between the data structures and the application programs that access
those data structures. As indicated earlier, the "data" division in the COBOL program
defines the data structures of all files used in the program. These data structures have
to be redefined in every COBOL program that accesses the same file. Any change in a
data structure of a file has to be painstakingly effected in each of the several COBOL
programs that access the file. The drawbacks of the file-oriented approach are
summarized in the table on the next page.
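
The lack of data independence can be sketched in a few lines. The example below uses Python rather than COBOL, and the fixed-width record layout (a 5-character customer number, a 10-character name, and a 5-character zip code) and the two "programs" are hypothetical. The point is that each program carries its own copy of the layout, just as each COBOL program carries its own data division.

```python
# Hypothetical fixed-width CUSTOMER record: 5-char number, 10-char name, 5-char zip.
# Each "program" below hard-codes this layout in its own code.


def billing_program_zip(record):
    """Billing program: its own copy of the layout (zip in positions 15-19)."""
    return record[15:20]


def mailing_program_zip(record):
    """Mailing program: a second, independent copy of the same layout."""
    return record[15:20]


old_record = "00042" + "ABC Corp".ljust(10) + "47405"
print(billing_program_zip(old_record))  # 47405

# The file's structure changes: the zip field grows from 5 to 9 characters.
new_record = "00042" + "ABC Corp".ljust(10) + "474051234"

# Both programs still run, but each truncates the zip until it is edited by hand.
print(billing_program_zip(new_record))  # 47405
print(mailing_program_zip(new_record))  # 47405
```

Every program that reads the file would need the same edit, which is the maintenance burden described above.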

Drawbacks of the File-Oriented Approach


Drawback | Explanation
Data redundancy | Duplication of data (files) across applications poses a data maintenance problem and can potentially cause problems of data inconsistencies. Excessive duplication also results in high data storage costs.
Proliferation of files | The task of maintaining files can become very complex as the number of applications multiplies, since each application creates its own set of files.
Lengthy application development | Inability to share data in existing files increases the time required to create new applications, since all files needed for the new application must be independently created.
Lack of data independence | Data structures and the procedures that modify the data are both defined within the same program (in the "Data" and "Procedure" divisions). Data structures and/or procedures cannot be independently modified.

The Database Approach


In contrast to the file-oriented approach, the database approach centers on creating
an organization-wide repository of data that all applications and all users can share.
Rather than having multiple instances of the same file, each set of data is uniquely
stored, as shown in the figure that follows. Note that in the database approach the term
"data set" is used instead of "file." Each application interfaces with the data it needs by
accessing the appropriate data sets from the organization's repository or database. The
"sharing" of data in the common repository, stored on disk, is transparent to the users.
Concurrent access to the same data set is handled by the database management
system. Note in the following figure (next page) that the DBMS interfaces with the
operating system. The actual retrieval of the data sets stored on disks is handled by the
operating system.

A database is an integrated repository of an organization's data containing a series of
interrelated data sets. The data sets are designed to store data about entities such as
customers, employees, and vendors, and also events such as sales, which are really
relationships between entities. Specifically, the "sales" event represents a relationship
between the "customers" and "finished goods inventory" data sets (i.e., finished goods
inventory is sold to customers). The repository is integrated in that there is no
duplication of data sets - every entity and event data set is stored just once. The data
sets are interrelated in that common attributes exist between data sets to signify
relationships between entities and events. Coordination among the data sets involves
ensuring that updates to one data set do not result in data inconsistencies in related
data sets.
The tasks of creating, updating, and managing data sets are handled by the database
management system (DBMS). The definition of data sets in terms of their structure is
done using DBMS facilities. Updates to these data sets are also performed using
features in the DBMS. In effect, the DBMS serves as the interface between the
application programs and users requesting access to data sets and the operating
system which actually retrieves the data from their physical locations on a magnetic
disk. When there are multiple concurrent requests to access the same data set, it is the
DBMS that prioritizes and coordinates the requests. Details regarding this process of
handling multiple simultaneous requests to the same data set, a process called
concurrency control, will be discussed a little later. The DBMS also handles a variety of
other functions, most notably backup and security.

Advantages of the Database Approach

In contrast to the file-oriented approach, the database approach has many advantages.
First, data redundancy is essentially eliminated since every data set is stored only once
in the repository. Multiple applications requiring access to the same data set simply
share that data set, perhaps even simultaneously, as indicated above. No longer are
duplicate versions of the same data set maintained for different applications (as was
necessary in the file-oriented approach). This sharing of data is a key feature of the
database approach. If customer names and addresses are stored in only one data set,
a change in a particular customer's address would need to be made just once.
A second advantage of the database approach, related to the first advantage, is that
data inconsistencies are much less likely to occur. Since customer ABC's record is
stored just once in the database, there is no possibility of having different versions of
ABC's name and address in different files.
A third and very significant advantage of the database approach is data
independence. Recall that in the file-oriented approach data structures must be defined
in each application program that accesses those data structures. In a database, data
structures are defined using the DBMS independently of application programs that
access the data sets. It is not the application programs but the DBMS that is used to
define data structures. An application program that requires access to the customer
data set does not have to define the structure of the customer data set. This
independent definition permits changes to be made in the structures of data sets without
having to modify each application program that accesses the affected data sets. Thus, if
the "zip code" field in the customer data set needs to be changed from a five digit field
to a nine digit field, this change is performed once using the DBMS. None of the
application programs that access the customer data set need to be modified.
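
A minimal sketch of data independence follows, using Python's built-in sqlite3 module as a stand-in DBMS. The table, column names, sample data, and the mailing_label function are assumptions made for illustration. Note also that SQLite does not enforce field widths, so storing a nine-digit zip code needs no schema statement here; many other DBMS would require an explicit change to the column definition, made once through the DBMS. The added column stands in for such a structural change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The structure is defined once, through the DBMS, not inside any application.
conn.execute(
    "CREATE TABLE customers (customer_no TEXT PRIMARY KEY, name TEXT, zip TEXT)"
)
conn.execute("INSERT INTO customers VALUES ('C001', 'ABC Corp', '47405')")


def mailing_label(conn, customer_no):
    """Application code: asks for fields by name and defines no record layout."""
    row = conn.execute(
        "SELECT name, zip FROM customers WHERE customer_no = ?", (customer_no,)
    ).fetchone()
    return f"{row[0]}, {row[1]}"


before = mailing_label(conn, "C001")

# The zip now holds nine digits, and a column is added to the table's structure.
# The application function above is not edited.
conn.execute("UPDATE customers SET zip = '47405-1234' WHERE customer_no = 'C001'")
conn.execute("ALTER TABLE customers ADD COLUMN phone TEXT")

after = mailing_label(conn, "C001")
print(before)  # ABC Corp, 47405
print(after)   # ABC Corp, 47405-1234
```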
A fourth advantage of the database approach is that sharing of data and the data
independence concept permit rapid application development. New applications using
data that already exists in the database can be very quickly developed. The
time-consuming steps of defining the data structures and setting up files are eliminated.
The fifth and final advantage of the database approach is that the important functions of
backup, control, and security are centralized. Virtually all DBMS come with backup
facilities to periodically back up the entire database. Corrupted data sets can be easily
restored from the backup. Facilities also exist within the DBMS software to specify
access restrictions for either the entire repository, specific data sets, or specific data
items within each data set. Controls over what kind of data can be entered into each
data set can also be specified at the level of the data set. These controls, called integrity
constraints or validation rules, will be discussed later in the chapter. In contrast to the
database approach, the file-oriented approach required backup, control, and security to
be performed and specified on an application-by-application basis.
The advantages of the database approach are summarized in the following table on the
next page.

Advantages of the Database Approach


Advantage | Explanation
Data redundancy virtually eliminated | Each data set is stored just once in the repository, thereby reducing data storage costs.
No data inconsistencies | Since each data item is stored only once, there cannot be multiple versions of that data item.
Data independence | Data structures are defined separately from the application programs. Changes can be made to data structures without having to modify application programs.
Rapid application development | Ability to share data shortens the time required to create new applications.
Centralized backup, control, and security | Backup, control, and security tasks are all handled centrally by the database management system.

Drawbacks of the database approach


In comparison to the advantages just cited, the database approach has a few
drawbacks. First, within the organization, performing tasks beyond the most basic
database activities can be extremely complex. Although DBMS software is becoming
increasingly user friendly, most complex tasks such as administering the database
require considerable expertise. A second drawback of the database approach is that
DBMS include a number of fairly complex features for controlling the integrity of data
entered into tables. Mastering the intricacies of table integrity features can be quite a
daunting task. However, given their role as control and security consultants within the
organization, auditors (internal and external) must familiarize themselves with the
DBMS features that allow control and security to be specified. From an auditor's
perspective, the complexity of DBMS might pose significant problems in addressing
audit concerns of control, security, and integrity of data. Third, data stored within the
database can most easily be accessed using DBMS specific features and utilities, such
as the DBMS query language and built-in reports. Thus, for the purposes of the annual
financial statement audit, the auditor must become adept at creating and running
database queries to ensure that there are no material errors in the financial statements.
A fourth and final drawback of the database approach pertains to the centralization of
control and security. Although centralization was listed as an advantage, it can also be a
weakness since intruders need only penetrate the DBMS shield to have access to all of
the organization's data. Access and control restrictions specified on data sets are done
using DBMS features; those same features can be used to "turn off" the access and
control restrictions unless adequate safeguards exist. Thus, for the DBMS control and
security features to be reliable, there should be strong control procedures over access
to the database (control procedures are discussed in detail in Chapter 10). The
disadvantages of the database approach are summarized in the following table. The
advantages of the database approach outweigh the few drawbacks. With the falling cost
of hardware and software, most organizations should be able to justify the investment
associated with the database approach.

Disadvantages of the Database Approach


Disadvantage | Explanation
Complexity of DBMS administration | The administration of large scale database systems requires significant resources and expertise.
Data integrity using complex DBMS features | Configuring the database to ensure data integrity requires considerable expertise and intricate knowledge of DBMS features. Accountants and auditors must familiarize themselves with control and security concerns for DBMS and how these can be implemented in database environments.
Data accessible only through DBMS | Accountants and auditors must be competent in using the DBMS to access data for the purpose of generating useful information and fulfilling audit objectives.
Centralized backup, control, and security | Backup, control, and security are typically centralized, potentially making the organization vulnerable to a hacker who can break through the central security shield.

Fundamental database concepts


Having contrasted the file-oriented and database approaches, let us examine some
fundamental concepts in the database approach. The data set concept discussed
earlier is a crucial building block of database models and database systems. A data set
is created for every entity and every event of interest that needs to be represented in
the database. A primary key is a unique identifier of a record in a file. Similarly, in a
database, every data set must have a unique identifier of records within the data set.
This unique identifier is referred to as the primary key of the data set. When multiple
fields are required to uniquely identify a record in a data set, the result is referred to as a composite
key or a concatenated key. Every field in a data set other than the primary key is
referred to as a non-key attribute. Some of the non-key attributes may be used to sort
the file (or data set) to facilitate answering user queries.
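
The uniqueness requirement on primary and composite keys can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions: the table names, field names, and sample values are hypothetical, and underscores replace hyphens because hyphens are not valid in unquoted SQL names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Single-field primary key: customer_no alone identifies a record.
conn.execute("CREATE TABLE customers (customer_no TEXT PRIMARY KEY, name TEXT)")
# Composite key: neither field alone identifies a record, but the pair does.
conn.execute(
    "CREATE TABLE items_ordered ("
    " order_no TEXT, item_no TEXT, quantity INTEGER,"
    " PRIMARY KEY (order_no, item_no))"
)


def try_insert(sql, params):
    """Return True if the DBMS accepts the row, False if a key rule rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("INSERT INTO customers VALUES (?, ?)", ("C001", "ABC Corp")),
    # Rejected: a second record with the same primary key value.
    try_insert("INSERT INTO customers VALUES (?, ?)", ("C001", "XYZ Inc")),
    try_insert("INSERT INTO items_ordered VALUES (?, ?, ?)", ("S10", "I7", 2)),
    # Accepted: same order, different item, so the pair is still unique.
    try_insert("INSERT INTO items_ordered VALUES (?, ?, ?)", ("S10", "I8", 1)),
    # Rejected: the (order_no, item_no) pair already exists.
    try_insert("INSERT INTO items_ordered VALUES (?, ?, ?)", ("S10", "I7", 5)),
]
print(results)  # [True, False, True, True, False]
```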

As discussed earlier, a unique aspect of the database approach is the interrelationships
between data sets. These interrelationships are defined in different ways depending on
the type of database model (to be discussed in the next section). In terms of our
discussion of keys, it is relevant to discuss the concept of a foreign key. A foreign key
is a field in a data set that is the primary key in a related data set, referred to as the
"master" data set. There are two variants of the "foreign key" concept -- a foreign key
can either be part of a composite primary key or simply a non-key attribute in a data
set. That is, when a data set has a composite primary key (more than one field making
up the primary key), then each individual element of that composite key will usually be
the primary key in another related data set. Alternatively, a foreign key can simply be a
non-key attribute in a data set which happens to be a primary key in a related data set.
It is important to identify foreign keys in a database because it is these keys that enable
linking of data sets that are related to one another. The concept of foreign keys will be
explained later in this chapter using an example. The various key types are summarized
in the following table.

Keys in Database Environments


Key | Explanation
Primary key | Unique identifier of records in a data set.
Composite (concatenated) key | Two or more fields taken together serve as the primary key in the data set.
Non-key attribute | Any field that is not a primary key attribute.
Foreign key | Two variants: (1) an element of a composite key in a data set which is the primary key in a related data set, (2) a non-key attribute in a data set which is the primary key in a related data set.

Relationships between entities and events, both of which are represented in the
database by means of data sets, can be of three types, referred to as the relationship
cardinality. One-to-one (1:1), one-to-many (1:M), and many-to-many (M:M)
relationships are the three relationship types; the shorthand for depicting each
relationship type is shown in parentheses. Consider the relationship between the
"department" and "manager" entities. A 1:1 relationship between departments and
managers implies that each department can have one and only one manager and each
manager can manage one and only one department. Now consider the relationship
between the "salespersons" and "customers" entities. A 1:M relationship between
salespersons and customers means that each salesperson can have many customers
but every customer is assigned to exactly one salesperson. Note that a 1:M relationship
can be interpreted as a M:1 relationship when read from the opposite direction. Thus,
the relationship from customers to salespersons is a M:1 relationship (many customers
have one salesperson). A M:M relationship between salespersons and customers
indicates that each salesperson can have many customers and each customer can
work with many salespersons.
As another example of the various relationship cardinalities, consider the relationship
between students and tutors. A 1:1 relationship indicates that each student is assigned
exactly one tutor and each tutor is assigned exactly one student. A 1:M relationship
from tutors to students indicates that each tutor has many students but every student is
assigned exactly one tutor. Thus, this relationship would be read as a M:1 relationship
from students to tutors. A M:M relationship between tutors and students indicates that
each student can have many tutors and each tutor can have many students. Note that
the "M" side of a relationship, that is the "many" side, can be interpreted as "one or
more." The various relationship types are summarized in the table below.

Relationship Types

Relationship | Explanation
1:1 | One-to-one (e.g., one professor in one office)
1:M | One-to-many (e.g., one advisor has many students)
M:M | Many-to-many (e.g., a class has many students, and a student can be in many classes)
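
The three cardinalities can be told apart mechanically by counting partners on each side of a relationship. The sketch below is illustrative only: the function name and the tutor and student identifiers are hypothetical. In practice, cardinality is a design decision about what the business permits, not something inferred from sample data; the function merely reports which pattern a given set of pairs fits.

```python
from collections import defaultdict


def classify_cardinality(pairs):
    """Classify observed (left, right) pairs as '1:1', '1:M', 'M:1', or 'M:M'."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)
    # A side is "many" if any single member is paired with more than one partner.
    left_has_many = any(len(r) > 1 for r in rights_per_left.values())
    right_has_many = any(len(l) > 1 for l in lefts_per_right.values())
    if left_has_many and right_has_many:
        return "M:M"
    if left_has_many:
        return "1:M"
    if right_has_many:
        return "M:1"
    return "1:1"


# (tutor, student) pairs for the three situations described above.
one_to_one = [("T1", "S1"), ("T2", "S2")]
one_to_many = [("T1", "S1"), ("T1", "S2"), ("T2", "S3")]
many_to_many = [("T1", "S1"), ("T1", "S2"), ("T2", "S2")]

print(classify_cardinality(one_to_one))    # 1:1
print(classify_cardinality(one_to_many))   # 1:M
print(classify_cardinality(many_to_many))  # M:M
# Reading the 1:M relationship from students to tutors gives M:1.
print(classify_cardinality([(s, t) for t, s in one_to_many]))  # M:1
```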

Overview of database models


Given the understanding of fundamental database concepts discussed above, we now
turn to a description of the dominant database model today, i.e., the relational model.
Two older database models are the hierarchical and the network model -- these models
were popular in the 1970s but have since been displaced by the relational model. The
hierarchical and network models are rarely found in practice and are therefore not
discussed in this chapter. An emerging model is the object-oriented database model
which we shall briefly explore when we consider emerging concepts in the database
arena.

Relational Model Overview


The relational model, first proposed by E.F. Codd, uses the concept of a "relation" to
store data. A relation is simply a two-dimensional table with rows and columns. The
rows, also referred to as "tuples," are the records in the data set and the columns are
the fields. The terms "table" and "relation" are synonymous and are used
interchangeably. Henceforth, we shall use the term "table" rather than "relation" and
"row" rather than "tuple." A "customer" data set would be implemented as a
two-dimensional table with the columns containing the various attributes of customers (e.g.,
name, address, phone number, etc.) and the rows containing the records (i.e.,
customers). The concept of a table is thus intuitively appealing and easy to understand.

In the relational model, then, a series of tables is constructed for storing data relevant
to the situation. A relational model for a sales order and collection system is shown in
the figure below. Note that one table is used to represent each entity and each event.

The resulting tables are SALES-REGIONS, CUSTOMERS, SALES-ORDERS, ITEMS-ORDERED,
COLLECTIONS, ORDERS-COLLECTIONS, and ITEMS. The arrows in the
figure are drawn to point out the links between tables (i.e., the common fields between
tables). The single and double headed arrows signify "1" and "M" relationships as
before. Our convention is to indicate the primary key in a table by
underlining it. In the case of a composite primary key, two (or more) fields will be
underlined.
Recall that we introduced the concept of a "foreign key" earlier in the chapter. Let us
revisit that concept in the context of the relational model. To repeat, a foreign key is
either a non-key attribute in a table that is a primary key in a related table or an element
of a composite key in a table that is a primary key in a related table. The "related table"
is in effect the "master" table for that key field. In the above set of tables, the
CUSTOMER-NO and REGION-NO fields in the SALES-ORDERS table represent one
variant of the foreign key concept -- they are non-key attributes in the SALES-ORDERS
table, but each of them is a primary key in a related table. CUSTOMER-NO is the
primary key in the CUSTOMERS table, and REGION-NO is the primary key in the
SALES-REGIONS table. "CUSTOMERS" is considered to be the "master" table for the
CUSTOMER-NO field, and "SALES-REGIONS" is considered to be the "master"
table for the REGION-NO field.
Now let us turn to the second variant of foreign keys -- those that are elements of
composite primary keys. Note that the ORDERS-COLLECTIONS table has a
composite key of RECEIPT-NO and ORDER-NO (i.e., RECEIPT-NO and ORDER-NO
taken together uniquely determine the rows in the ORDERS-COLLECTIONS table). Per
the definition of a foreign key, RECEIPT-NO and ORDER-NO are each considered
to be foreign keys in the ORDERS-COLLECTIONS table since each of them is a
primary key in a related "master" table (RECEIPT-NO is the primary key in
COLLECTIONS and ORDER-NO is the primary key in SALES-ORDERS). Similarly,
ORDER-NO and ITEM-NO are foreign keys in the ITEMS-ORDERED table (the
"master" table for items is the ITEMS table). The relevance of the distinction between
these two variants of foreign keys will become apparent a little later in the chapter when
we discuss the concepts of entity integrity and referential integrity. The convention we
will use to indicate foreign keys in the relational model is with an asterisk (*) at the end
of the foreign key field.
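
Both variants of the foreign key can be sketched with Python's built-in sqlite3 module. The sketch rests on several assumptions: underscores replace hyphens because hyphens are not valid in unquoted SQL names, only the key fields named in the text are included, the quantity column and all sample values are hypothetical, and the REFERENCES clause is SQL's way of declaring a foreign key. SQLite checks foreign keys only when the PRAGMA shown is turned on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE sales_regions (region_no TEXT PRIMARY KEY);
CREATE TABLE customers (customer_no TEXT PRIMARY KEY);
CREATE TABLE items (item_no TEXT PRIMARY KEY);
-- Variant 1: customer_no and region_no are non-key attributes here,
-- but primary keys in their "master" tables.
CREATE TABLE sales_orders (
    order_no    TEXT PRIMARY KEY,
    customer_no TEXT REFERENCES customers (customer_no),
    region_no   TEXT REFERENCES sales_regions (region_no)
);
-- Variant 2: each element of the composite primary key is a foreign key.
CREATE TABLE items_ordered (
    order_no TEXT REFERENCES sales_orders (order_no),
    item_no  TEXT REFERENCES items (item_no),
    quantity INTEGER,  -- hypothetical non-key attribute
    PRIMARY KEY (order_no, item_no)
);
INSERT INTO sales_regions VALUES ('R1');
INSERT INTO customers VALUES ('C001');
INSERT INTO items VALUES ('I7');
""")


def accepted(sql, params):
    """Return True if the row is stored, False if a key rule rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


order_ok = accepted("INSERT INTO sales_orders VALUES (?, ?, ?)", ("S10", "C001", "R1"))
# Customer C999 has no row in the master table, so the order is rejected.
order_bad = accepted("INSERT INTO sales_orders VALUES (?, ?, ?)", ("S11", "C999", "R1"))
line_ok = accepted("INSERT INTO items_ordered VALUES (?, ?, ?)", ("S10", "I7", 2))
# Item I99 has no row in the items master table, so the line is rejected.
line_bad = accepted("INSERT INTO items_ordered VALUES (?, ?, ?)", ("S10", "I99", 1))
print(order_ok, order_bad, line_ok, line_bad)  # True False True False
```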
In the relational model, M:M relationships are represented using composite key tables.
Thus, the M:M relationship between SALES-ORDERS and COLLECTIONS is
implemented by creating a new ORDERS-COLLECTIONS table, which has a composite
key comprising the primary keys of the SALES-ORDERS and COLLECTIONS tables,
i.e., the two tables involved in the M:M relationship. That is, the ORDERS-COLLECTIONS
table has a composite primary key of RECEIPT-NO, ORDER-NO, with
RECEIPT-NO being the primary key of the COLLECTIONS table and ORDER-NO being
the primary key of the SALES-ORDERS table. Similarly, the M:M relationship between
SALES-ORDERS and ITEMS is implemented by means of the ITEMS-ORDERED table
which also has a composite key, formed by taking the primary key of SALES-ORDERS
and the primary key of ITEMS (i.e., ORDER-NO, ITEM-NO). The ITEMS-ORDERED
and ORDERS-COLLECTIONS composite key tables above both have non-key attributes.
However, it is possible for a composite key table to have no non-key attributes at all. In
that case, the composite key table is referred to as an all key relation (i.e., a table
where all fields comprise the primary key).
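
As an illustration, the sketch below builds the composite key table between sales orders and collections and follows the M:M relationship in both directions, using Python's built-in sqlite3 module. Underscores replace hyphens in the names, and the amount_applied column, the sample rows, and the function names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_orders (order_no TEXT PRIMARY KEY);
CREATE TABLE collections (receipt_no TEXT PRIMARY KEY);
CREATE TABLE orders_collections (
    receipt_no     TEXT REFERENCES collections (receipt_no),
    order_no       TEXT REFERENCES sales_orders (order_no),
    amount_applied INTEGER,  -- hypothetical non-key attribute
    PRIMARY KEY (receipt_no, order_no)
);
INSERT INTO sales_orders VALUES ('S10'), ('S11');
INSERT INTO collections VALUES ('R1'), ('R2');
-- Receipt R1 covers two orders; order S11 is covered by two receipts.
INSERT INTO orders_collections VALUES
    ('R1', 'S10', 100), ('R1', 'S11', 40), ('R2', 'S11', 60);
""")


def orders_paid_by(receipt_no):
    """Follow the M:M relationship from one collection to its sales orders."""
    rows = conn.execute(
        "SELECT so.order_no FROM collections c"
        " JOIN orders_collections oc ON oc.receipt_no = c.receipt_no"
        " JOIN sales_orders so ON so.order_no = oc.order_no"
        " WHERE c.receipt_no = ? ORDER BY so.order_no",
        (receipt_no,),
    ).fetchall()
    return [row[0] for row in rows]


def receipts_for(order_no):
    """Follow the same relationship in the opposite direction."""
    rows = conn.execute(
        "SELECT receipt_no FROM orders_collections"
        " WHERE order_no = ? ORDER BY receipt_no",
        (order_no,),
    ).fetchall()
    return [row[0] for row in rows]


print(orders_paid_by("R1"))  # ['S10', 'S11']
print(receipts_for("S11"))   # ['R1', 'R2']
```

The tables are linked only through the values of their common fields, which is why the same composite key table serves queries in either direction.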
It is important to note that relationships between tables are represented implicitly using
foreign keys. By contrast, the older hierarchical and network models represented
relationships explicitly using physical pointers. The process of designing a relational
model, in terms of the number and structure of tables and the keys linking tables, will be
discussed in the next chapter. Suffice it to say for now that the process of designing a
relational database model is non-trivial.
The simplicity and ease of use of the relational model, and hence its superiority over the
hierarchical and network models, are evident in many ways. First, the use of foreign keys
results in the representation of all relationships implicitly and not explicitly (as was the
case in the hierarchical and network models). Thus, any two tables can be related as
long as they have a common field. Query processing in the relational model is therefore far
simpler since users do not have to be cognizant of the physical pointers between data
sets. In terms of the types of relationships that can be represented in the relational
model, 1:1, 1:M and M:M relationships can all be represented. The way in which each of
the relationship types is represented will be discussed in the next chapter.
Relational database management systems (RDBMS) are available for enterprise
oriented operating systems such as Unix, Linux, and Windows Server 2012 as well as
personal computer oriented operating systems such as Windows and Mac OS X.
Oracle 12c, IBM Informix 12.1, Microsoft SQL Server, IBM DB2, and Sybase are some
of the popular enterprise RDBMS that run on industrial-strength servers. For individual
users running Windows, some of the popular RDBMS packages are Microsoft Access
and dBase.

The Relational Model Explored


Although there have been recent advances in database technology focused on the
object-oriented model, which we will briefly explore towards the end of this chapter, the
vast majority of database oriented business information systems are built using the
relational model platform. Some years ago, RDBMS were hailed for their simplicity and
ease of use but assailed for their poor performance relative to DBMS built on the
then-prevailing hierarchical and network models (which are now defunct).
However, in recent years, the performance of RDBMS has improved considerably, and
as a result RDBMS have gained widespread acceptance in the marketplace. Even
personal computer based RDBMS such as Microsoft Access are proving to be powerful
enough for creating complex and robust database applications to meet the information
needs of a variety of businesses.
We will first examine the rules to which tables must conform. RDBMS vary in terms of
their compliance with these rules. In addition to these basic rules, we will also focus on
various RDBMS features that have implications for control and security of the database.
As accountants, you are likely to be called upon to work with and give advice on the
control and security aspects of database systems. Integrity constraints, validation rules,
permissions, views, and the data dictionary are some of the features of RDBMS that are
relevant from a control and security standpoint.
Rules for tables
Tables in a relational database must conform to a number of rules. Each table in the
database must have a unique name; no two tables can have the same name. Duplicate
columns and rows are not permitted within a table; no two columns can have the same
name in a table, and no two rows can have the same value in every column. The
sequence of rows and columns is immaterial. However, the convention is to list the
primary key field(s) at the left and the non-key fields on the right. Rows in tables are
usually ordered in ascending or descending order of the primary key, although this
ordering is not essential. Tables can easily be sorted on fields other than the primary
key.
Every table must have a designated primary key - a unique identifier of every row in the
table. The primary key could either be one field only, or more than one field taken
together. As discussed earlier, when multiple fields make up the primary key, the result
is referred to as a composite key or a concatenated key. Note that a table could have
more than one unique identifier, but only one can be chosen as the primary key. For
example, a student table could store both the social security number (SSN) and a
university assigned student ID (SID) number. The student table would thus have two
unique identifiers (SSN and SID), but only one can be designated as the primary key.
Relationships between tables are represented using common fields between them. As
discussed earlier, these common fields are foreign keys. Recall that a foreign key is
either an individual element of a composite primary key or a non-key attribute in one
table that is the primary key in another table. As discussed above, the "CUSTOMER-NO"
field in the sales orders table in the relational model is also the primary key in the
customers table and is thus a foreign key in the sales orders table. The rules for tables
are summarized below.

Table names must be unique in the database.

Every table must have a primary key.

Duplicate rows and duplicate columns are not allowed.

The order of rows and columns is immaterial.
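
The rules summarized above can be observed in a working RDBMS. The following is a minimal sketch, assuming Python's built-in sqlite3 module and a throwaway in-memory database; the table and column names are hypothetical and are not taken from the chapter's figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
conn.execute("CREATE TABLE customers (customer_no INTEGER PRIMARY KEY, name TEXT)")

rejected = []  # descriptions of the rule violations the RDBMS refused

# Rule: table names must be unique in the database.
try:
    conn.execute("CREATE TABLE customers (customer_no INTEGER PRIMARY KEY)")
except sqlite3.OperationalError:
    rejected.append("duplicate table name")

# Rule: no two columns in a table can have the same name.
try:
    conn.execute(
        "CREATE TABLE orders (order_no INTEGER PRIMARY KEY, amount REAL, amount REAL)"
    )
except sqlite3.OperationalError:
    rejected.append("duplicate column name")

# Rule: duplicate rows are not allowed; the primary key keeps rows distinct.
conn.execute("INSERT INTO customers VALUES (1, 'Acme Supply')")
try:
    conn.execute("INSERT INTO customers VALUES (1, 'Acme Supply')")
except sqlite3.IntegrityError:
    rejected.append("duplicate row")

print(rejected)
```

Each of the three attempts is refused by the database engine itself, not by application code, which is the sense in which these are rules of the model.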

Entity and referential integrity


Tables must conform to a number of integrity constraints. Two key constraints are
entity integrity and referential integrity. Entity integrity means that the primary key
field (or fields in case of a composite key) in a table cannot be null and must be unique.
This integrity constraint applies to every table in the database. What entity integrity
simply means is that the primary key field must have a value -- it cannot be left blank.
Furthermore, the value of every primary key in a table must be unique -- no two rows in
the table can have the same primary key value. Most RDBMS are equipped with
features that automatically enforce entity integrity. Thus, the RDBMS will signal an error
if the user attempts to insert a new row without specifying a value for the primary key
field, or if the value specified is a duplicate value.
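
As an illustration of automatic enforcement, the sketch below attempts both kinds of violation against a small table. It assumes Python's built-in sqlite3 module; the "items" table and its columns are hypothetical. NOT NULL is written out explicitly because SQLite, unlike most RDBMS, does not imply it for every kind of primary key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; NOT NULL is stated explicitly (see the note above).
conn.execute(
    "CREATE TABLE items (item_no TEXT PRIMARY KEY NOT NULL, description TEXT)"
)
conn.execute("INSERT INTO items VALUES ('A100', 'Widget')")

errors = []  # messages signalled by the RDBMS
for row in [(None, "No key supplied"), ("A100", "Duplicate key")]:
    try:
        conn.execute("INSERT INTO items VALUES (?, ?)", row)
    except sqlite3.IntegrityError as exc:
        errors.append(str(exc))

# Only the original, valid row remains in the table.
row_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(errors, row_count)
```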
Referential integrity means that foreign keys must either be null or match an existing
value in the "master" table for the foreign key. It is important to note that foreign keys
can only be null when they are non-key attributes. It is perfectly legal for a non-key
attribute to be null. However, a foreign key can never be null when it is an element of a
composite key, because a null value in an element of a composite key would violate
entity integrity.
Referential integrity is best explained in the context of an example. Consider the
relational model in the figure presented earlier in the chapter. The CUSTOMER-NO field
in the SALES-ORDERS table is a foreign key (because it is a non-key attribute in the
SALES-ORDERS table and a primary key in the CUSTOMERS table). What referential
integrity requires is that every value of CUSTOMER-NO in the SALES-ORDERS table
must exist in the CUSTOMERS table. In other words, there cannot be a customer
number in the SALES-ORDERS table that does not exist in the CUSTOMERS table.
Simply put, you cannot have a sales order on a non-existing customer. The
CUSTOMERS table is considered the "master" table for the CUSTOMER-NO foreign
key. That is, every new customer number must first appear in the CUSTOMERS table
before it can appear anywhere else in the database (i.e., in other tables). Note however
that referential integrity allows a foreign key field to be left blank when it is a non-key
attribute. For example, a null value in the CUSTOMER-NO foreign key field in the
SALES-ORDERS table might signify a cash sale (in which case it may not be necessary
to keep track of the customer number). Referential integrity also applies to the REGION-NO
field in the SALES-ORDERS table (every REGION-NO in SALES-ORDERS must
exist in the SALES-REGIONS table). Thus, referential integrity dictates that all foreign
key fields must have a corresponding value in the "master" table for the foreign key
field.
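
The customer example can be tried out directly. This is a minimal sketch, assuming Python's built-in sqlite3 module; the table and column names are lower-case stand-ins for the CUSTOMERS and SALES-ORDERS tables, and the sample values are invented. Note that SQLite enforces foreign keys only after the PRAGMA shown; other RDBMS typically enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.execute("CREATE TABLE customers (customer_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE sales_orders (
        order_no    INTEGER PRIMARY KEY,
        customer_no INTEGER REFERENCES customers (customer_no),  -- non-key foreign key
        amount      REAL
    )
""")
conn.execute("INSERT INTO customers VALUES (101, 'Acme Supply')")

conn.execute("INSERT INTO sales_orders VALUES (1, 101, 250.0)")  # existing customer
conn.execute("INSERT INTO sales_orders VALUES (2, NULL, 40.0)")  # cash sale, null key

try:
    conn.execute("INSERT INTO sales_orders VALUES (3, 999, 75.0)")  # no customer 999
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

order_count = conn.execute("SELECT COUNT(*) FROM sales_orders").fetchone()[0]
print(orphan_accepted, order_count)
```

The order for the existing customer and the cash sale with a null customer number are both accepted; the order for the non-existent customer is refused.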
Entity integrity and referential integrity are essential to ensure an error-free database.
Entity integrity prevents tables from having duplicate or missing primary keys, which
would prevent rows from being located and queries from being answered. The main
purpose of referential integrity is to ensure the validity of foreign keys. As discussed
earlier, foreign keys are how links between tables are implemented. If referential
integrity is not enforced, then relationships between tables may be corrupted because of
invalid foreign key values.
Data validation rules
In addition to entity and referential integrity, a number of data validation rules can be
prescribed for each table in the database. The purpose of these validation rules is to
prevent erroneous data from being entered into the table. Note the emphasis on the
word "prevent" -- these rules are aimed at prevention rather than detection of errors.
Thus, validation rules in database systems present an opportunity for accountants and
auditors to propose a wide range of controls that can be built into the systems to
prevent errors from creeping into the database. Let us first discuss examples of data
validation rules and then how they work to prevent errors.
Validation rules can be established for individual fields within a table to restrict the data
that can be entered into the field. Rules that refer to more than one field in a table can
also be defined either directly at the field level or in some database systems at the
overall table level. Data validation rules at the field level can be specified to ensure that
the value entered in the field is in a range of acceptable values. Minimum values,
maximum values and both minimum and maximum values for data can be specified. For
example, an "hours worked" field can have a minimum value (1), a maximum value
(40), or both a minimum and a maximum value (>=1 AND <= 40). If the user attempts to
enter a value of 50, the system will reject the input and display an error message.
Another type of validation rule allows valid values for a field to be specified such that
data input is accepted only if it matches one of the acceptable values. A "shipping code"
field may be designed to accept only one of the following values: 'S' for in-state, 'O' for
out of state but within the U.S., and 'I' for international. If the user attempts to enter a
value other than S, O, or I, an error message will be displayed and the erroneous data will not
be accepted. Rules can also be specified to ensure that the correct number of digits
has been entered into a field. For example, a "zip code" field can be designed to
accept exactly five digits, or exactly nine digits in the format 99999-9999.
Rules can also be specified to ensure valid relationships between fields. Take for
example a "sales order" table with the fields "order date" and "ship date." A rule can be
specified to ensure that the ship date is always on or after the order date. While some
database systems allow such rules to be specified in either of the two fields, other
systems refer to such multi-field validation rules as "record validation rules" (or "table
level validation rules"). Validation rules in RDBMS are summarized in the following
table.

Validation Rules

Range test                 Greater than a minimum and/or less than a maximum value?
Validity test              One of the acceptable values for this field?
Length test                Correct number of digits entered?
Valid combinations test    Correct mathematical or logical relationship between
                           fields in a table?
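
In SQL-based systems, rules of these four kinds are commonly written as CHECK constraints. The sketch below is one possible rendering, assuming Python's built-in sqlite3 module; the table "validation_demo", its columns, and the sample rows are hypothetical, and the digit-pattern test uses SQLite's GLOB operator, which other RDBMS may not offer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE validation_demo (
        row_no        INTEGER PRIMARY KEY,
        hours_worked  INTEGER CHECK (hours_worked BETWEEN 1 AND 40),    -- range test
        shipping_code TEXT    CHECK (shipping_code IN ('S', 'O', 'I')), -- validity test
        zip_code      TEXT    CHECK (                                   -- length test
            zip_code GLOB '[0-9][0-9][0-9][0-9][0-9]'
            OR zip_code GLOB '[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
        ),
        order_date    TEXT,
        ship_date     TEXT,
        CHECK (ship_date >= order_date)                       -- valid combinations test
    )
""")

insert_sql = "INSERT INTO validation_demo VALUES (?, ?, ?, ?, ?, ?)"
conn.execute(insert_sql, (1, 38, "S", "30602", "2024-03-01", "2024-03-04"))  # valid row

# One invalid row per rule; dates are ISO strings, so text comparison orders them.
bad_rows = {
    "range test": (2, 50, "S", "30602", "2024-03-01", "2024-03-04"),
    "validity test": (3, 38, "X", "30602", "2024-03-01", "2024-03-04"),
    "length test": (4, 38, "S", "3060", "2024-03-01", "2024-03-04"),
    "valid combinations test": (5, 38, "S", "30602", "2024-03-05", "2024-03-04"),
}
rejected = []
for rule, row in bad_rows.items():
    try:
        conn.execute(insert_sql, row)
    except sqlite3.IntegrityError:
        rejected.append(rule)

accepted_rows = conn.execute("SELECT COUNT(*) FROM validation_demo").fetchone()[0]
print(rejected, accepted_rows)
```

Because the constraints live in the table definition, every program or user that writes to the table is subject to them, which is what makes them preventive controls.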

Restricting access to tables


In addition to such validation rules that are defined for each table, access restrictions
can be defined at the database level for controlling access to sensitive data such as
tables containing employee salary data. Most RDBMS provide the ability to designate
authorized and unauthorized users for tables. Certain tables can be made accessible to
all users except those listed, or only to the listed users. Unauthorized users receive
error messages when they attempt to access a table which they have not been
authorized to access.
Another method of restricting access to sensitive data is to create a view. A view is a
virtual table. A view appears to the user as a table but is actually a version of an existing
table with some of the table's data hidden from view (hence the term "view"). For
example, a view can be constructed of the "employee" table that hides the "salary" and
"home phone number" fields. The rest of the data in the employee table such as the
name, office phone number, email address, etc. would appear in the view. All users can
be given permission to access this view, but only a few individuals would be granted
access to the actual employee table. Apart from hiding certain columns, rows that
contain sensitive data can also be hidden. For example, information pertaining to high
level executives can be omitted from the view by excluding all rows in the "employee"
table where the "rank" field is higher than 8 (assuming that all top executives have
values greater than 8 in the "rank" field). Hiding of rows and columns can be combined
in the same view. To summarize, access to sensitive data can be restricted by (1)
specifying which users can access the sensitive table by setting "permissions" and (2)
creating views in which sensitive fields are hidden and permitting access only to the
view.
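
The employee view described above might be defined as follows. This is a sketch only, assuming Python's built-in sqlite3 module; the column names (including "emp_rank" for the "rank" field) and the sample employees are hypothetical. SQLite has no user accounts, so the permission step is not shown; in a multi-user RDBMS it would typically be done with the SQL GRANT statement on the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_no INTEGER PRIMARY KEY, name TEXT, office_phone TEXT,
        home_phone TEXT, salary REAL, emp_rank INTEGER
    )
""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Ann Lee", "x101", "555-0101", 52000.0, 3),
        (2, "Raj Patel", "x102", "555-0102", 61000.0, 5),
        (3, "Pat Stone", "x100", "555-0100", 240000.0, 9),  # top executive
    ],
)

# The view hides two columns (home_phone, salary) and all rows with rank above 8.
conn.execute("""
    CREATE VIEW employee_directory AS
    SELECT emp_no, name, office_phone
    FROM employee
    WHERE emp_rank <= 8
""")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY emp_no")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```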
Data Dictionary
In an RDBMS, the data dictionary contains a variety of information about the contents
of the database. In a sense, the data dictionary contains data about data, or "meta
data." Note that the data dictionary is not a single table in the database but a collection
of hidden system tables. The information stored in these hidden system tables
comprising the data dictionary is useful both from the standpoint of application
development and database maintenance and also from a control and security
standpoint. Some examples of the kind of data contained within the data dictionary are
the names of all tables, the columns (attributes) contained in each table along with their
data formats, and the privileges held by each user authorized to access the database.
By querying the data dictionary, users can locate all tables in which a particular attribute
exists. Note that access to the data dictionary should be restricted to select systems
designers and auditors who would find the need to refer to the data dictionary for a
variety of reasons. For example, an auditor interested in determining which users are
authorized to read the "customers" table could find that information by examining the
data dictionary. Or if the auditor would like to know how many tables contain the
"employee salary" field, the answer can be obtained from the data dictionary.
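
The auditor's second question (which tables contain a given field?) can be answered by querying the system catalog. The sketch below assumes Python's built-in sqlite3 module, where the catalog is the sqlite_master table and column details come from PRAGMA table_info; other RDBMS expose the same information under different names. The helper function tables_containing and the three tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE sales (invoice_no INTEGER PRIMARY KEY, customer_no INTEGER, total REAL)"
)
conn.execute("CREATE TABLE items (item_no INTEGER PRIMARY KEY, description TEXT)")


def tables_containing(connection, column_name):
    """Return the names of all tables that have a column called column_name."""
    table_names = [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    matches = []
    for table in table_names:
        # PRAGMA table_info yields one row per column; position 1 is the column name.
        columns = [info[1] for info in connection.execute(f"PRAGMA table_info({table})")]
        if column_name in columns:
            matches.append(table)
    return matches


found = tables_containing(conn, "customer_no")
print(found)
```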

Languages for RDBMS


There are three categories of RDBMS languages: one for defining the relational
database schema, one that is used to access database tables from within conventional
application programs, and one that can be used by end users to perform ad hoc queries
of the database. The data definition language (DDL) is used to program the database
schema. The design and setup of the database is typically performed by the database
administrator (DBA) who is the individual having overall control over the database.
Using the DDL, the DBA can create tables, define authorized users of each table and
specify validation rules for individual tables.
The language that is used within application programs written in conventional languages
such as COBOL, or more recently C and Visual Basic, is called the data manipulation language (DML).
These DML statements are embedded within third generation languages such as
COBOL or C, or even fourth generation languages. The purpose of the DML statements
is to allow the programs to access tables in a database whenever required. The need
for a separate DML embedded within conventional programs is fast diminishing as most
RDBMS today have powerful programming components built into them. For example,
Microsoft Access provides the ability to develop powerful applications using Visual
Basic for Applications (VBA) which is available within Access.
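
The division of labor between DDL and embedded DML can be sketched in a present-day host language. The example below assumes Python with its built-in sqlite3 module in place of COBOL or C; the "items" table and the receive_stock function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema, including a validation rule (normally the DBA's task).
conn.execute("""
    CREATE TABLE items (
        item_no     INTEGER PRIMARY KEY,
        description TEXT NOT NULL,
        qty_on_hand INTEGER NOT NULL CHECK (qty_on_hand >= 0)
    )
""")


def receive_stock(connection, item_no, description, quantity):
    """Host-language application logic with DML statements embedded in it."""
    updated = connection.execute(
        "UPDATE items SET qty_on_hand = qty_on_hand + ? WHERE item_no = ?",
        (quantity, item_no),
    ).rowcount
    if updated == 0:  # the item is not on file yet, so add it
        connection.execute(
            "INSERT INTO items VALUES (?, ?, ?)", (item_no, description, quantity)
        )
    connection.commit()


receive_stock(conn, 1250, "Desk lamp", 10)  # first receipt inserts the row
receive_stock(conn, 1250, "Desk lamp", 5)   # second receipt updates it
on_hand = conn.execute(
    "SELECT qty_on_hand FROM items WHERE item_no = 1250"
).fetchone()[0]
print(on_hand)
```

The program decides when to touch the database; the embedded statements do the actual reading and writing of the table.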
The third category of DBMS languages is the data query language (DQL). The DQL is
used by end users to perform ad hoc queries on the database. The DQL should ideally
be easy to use so that end users without extensive programming experience can
execute simple statements to satisfy relatively straightforward information needs. There
are two broad categories of DQL - GUI oriented querying referred to as Query By
Example (QBE), and command line oriented querying. The most popular command line
DQL is Structured Query Language (SQL - pronounced "sequel"). SQL will be
discussed in greater detail in the next section.
In GUI oriented querying, QBE, the user is presented with a shell of the table or tables
containing the data that would answer the user's query. In the appropriate field, the user
simply provides an example of the data he/she is looking for (hence the term "query by
example"). In effect, the user specifies select conditions for fields and can also indicate
which fields the result should be sorted on. For example, if the user wants all sales
orders where the amount exceeded $1,000, he/she would first invoke the QBE module
to view a "skeleton" of the orders table. Then, the user would tab over to the "amount"
field, type in >1000, and execute the query. The skeleton table would then show the
rows that met the condition specified by the user (or a message if no rows were found to
satisfy the condition). As is possible in the "view" concept discussed earlier, the user
can hide certain fields from the answer. The following figure shows QBE in action, to
show orders exceeding $1,000, using the default query design view in Microsoft Access.

In addition to QBE and SQL, two other RDBMS tools are noteworthy. Most RDBMS
include a report writer which can be used to create custom reports formatted to the
user's specifications. The user simply indicates which table or view to use as the input
and can determine the precise nature of the report. The fields to total, at what points to
provide subtotals, the header and footer for each page, and the end of report summary
information are some examples of the report characteristics that the user can control.
The second useful RDBMS tool is the forms editor which can be used to create custom
data input forms. Rather than using the table itself to enter data, users (especially
novice users) can be provided with easy to use forms that simplify the process of
entering and retrieving data. Forms can be designed to supply default values for fields
and for specifying custom formats to facilitate data entry. Other than simply for data
input, forms also represent the user-interface component of powerful custom
applications that can be developed using the RDBMS' programming tools. Program
code modules can be associated with buttons on forms such that a whole series of
actions are automatically executed when the user clicks on a button after entering data
into form fields. This use of forms will be discussed in the next chapter. Shown below is
an example of a Microsoft Access form, used to add, update, and delete information
about customers.

Structured Query Language -- SQL


As indicated above, a very popular database language for data definition, manipulation,
and query is Structured Query Language (SQL). In the context of the three types of
database languages discussed earlier (i.e., DDL, DML, and DQL), SQL includes
statements for all three purposes. The set of "create" commands within SQL are used to
define tables, i.e., as a DDL. SQL statements can be embedded into conventional
programming languages (i.e., used as a DML). Finally, and in its most common use,
SQL can be used as a DQL by end users seeking answers to ad hoc queries.
There are four primary operations that can be performed on tables using SQL -- the
SELECT operation (to select rows from one or more tables), the INSERT operation (to
insert rows into a table), the UPDATE operation (to modify one or more rows in a table),
and the DELETE operation (to delete one or more rows from a table). The most
commonly used SQL operation to answer ad hoc queries is the SELECT operation.
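
The four operations can be seen side by side in the following minimal sketch, which assumes Python's built-in sqlite3 module; the cut-down "customers" table and its three rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customerno INTEGER PRIMARY KEY, name TEXT, balance REAL)"
)

# INSERT: add rows to a table.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Acme Supply", 1200.0), (2, "Baker Foods", 0.0), (3, "Cole Motors", 7300.0)],
)

# UPDATE: modify one or more existing rows.
updated = conn.execute(
    "UPDATE customers SET balance = balance - 200 WHERE customerno = 1"
).rowcount

# DELETE: remove one or more rows.
deleted = conn.execute("DELETE FROM customers WHERE balance = 0").rowcount

# SELECT: retrieve rows (here, everything that is left).
remaining = conn.execute(
    "SELECT customerno, name, balance FROM customers ORDER BY customerno"
).fetchall()
print(updated, deleted, remaining)
```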
The SELECT operation can simultaneously (1) create a horizontal subset of a table, i.e.,
selecting all rows from a table that meet a certain condition, (2) link tables using the
common field between them, and (3) create a vertical subset of a table or tables, i.e.,
displaying only certain fields in tables.
The general syntax of the SELECT operation is as follows:
SELECT <table_name1>.<field_name1>, <table_name1>.<field_name2>,
<table_name2>.<field_name2>
FROM <table_name1>, <table_name2>...
WHERE <table_name1>.<common_field1> = <table_name2>.<common_field1> ....
AND <condition> [INTO <result table>]
The "INTO" syntax in square brackets is optional -- it results in the creation of a new
table to store the query results. The <condition> portion of the SQL statement (i.e., the
WHERE clause) is specified using a field from a table listed in the FROM part of the
statement (e.g., CUSTOMERS.BALANCE > 5000). Note that the WHERE condition
clause is used to specify both the joins necessary to obtain the query result and the
criterion or criteria to be applied. The order of joins and criteria specifications is not
material. Tables needed for joins and for criteria specifications should be listed in the
FROM portion of the SQL statement. Note that the above syntax is not universal across
all RDBMS -- slight variations from one RDBMS to the next will exist. Let us now look at
some examples of SQL in action.
Consider the following four tables in a sales database information system (the primary
key in each table is underlined, and foreign keys are identified with an asterisk at the
end of the field):
CUSTOMERS (CUSTOMERNO, NAME, ADDRESS, PHONE, BALANCE,
CREDIT_LIMIT)
SALES (INVOICENO, DATE, CUSTOMERNO*, SALESPERSON, TOTAL)
ITEMS (ITEMNO, DESCRIPTION, QTY_ON_HAND, COST_PRICE)
ITEMS_SOLD (INVOICENO*, ITEMNO*, QTY_SOLD, SELLING_PRICE)
Let us assume that the sales manager has the following queries: (1) which customers, if
any, have exceeded their credit limit? (2) what are the names and phone numbers of
customers who have been sold merchandise by John Doe? (3) what are the names and
current balances of customers who have been sold item number 1250? The SQL
queries to answer each of these queries follow:
Query no. 1:
SELECT * FROM CUSTOMERS
WHERE CREDIT_LIMIT < BALANCE;

(Note: the * means all fields)

Query no. 2:
SELECT CUSTOMERS.NAME, CUSTOMERS.PHONE
FROM CUSTOMERS, SALES
WHERE CUSTOMERS.CUSTOMERNO = SALES.CUSTOMERNO AND
SALES.SALESPERSON = 'John Doe';

Query no. 3:
SELECT CUSTOMERS.NAME, CUSTOMERS.BALANCE
FROM CUSTOMERS, SALES, ITEMS_SOLD
WHERE CUSTOMERS.CUSTOMERNO = SALES.CUSTOMERNO
AND SALES.INVOICENO = ITEMS_SOLD.INVOICENO
AND ITEMS_SOLD.ITEMNO = 1250;

The statements shown above use the syntax <table-name.field-name> to jointly refer to
both a field and the table in which the field appears. Joins are performed by indicating
which fields in the two tables should equal one another (i.e., which fields are common
between the two tables). In query number 3 above, the WHERE clause specifies (1) the
two joins needed (between CUSTOMERS and SALES using CUSTOMERNO, and
between SALES and ITEMS_SOLD using INVOICENO) and (2) the criterion involving
the ITEMNO field in the ITEMS_SOLD table. The CUSTOMERS, SALES, and
ITEMS_SOLD tables must all be specified in the FROM portion of the query.
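
To see the three queries produce results, they can be run against a small sample database. The sketch below assumes Python's built-in sqlite3 module. All sample rows are invented, the primary keys are assumed to be the first field of each table (and the INVOICENO/ITEMNO pair in ITEMS_SOLD), string literals are written with single quotes as SQLite expects, and an ORDER BY clause is added to queries 2 and 3 only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERS (CUSTOMERNO INTEGER PRIMARY KEY, NAME TEXT, ADDRESS TEXT,
                            PHONE TEXT, BALANCE REAL, CREDIT_LIMIT REAL);
    CREATE TABLE SALES (INVOICENO INTEGER PRIMARY KEY, "DATE" TEXT,
                        CUSTOMERNO INTEGER REFERENCES CUSTOMERS (CUSTOMERNO),
                        SALESPERSON TEXT, TOTAL REAL);
    CREATE TABLE ITEMS (ITEMNO INTEGER PRIMARY KEY, DESCRIPTION TEXT,
                        QTY_ON_HAND INTEGER, COST_PRICE REAL);
    CREATE TABLE ITEMS_SOLD (INVOICENO INTEGER REFERENCES SALES (INVOICENO),
                             ITEMNO INTEGER REFERENCES ITEMS (ITEMNO),
                             QTY_SOLD INTEGER, SELLING_PRICE REAL,
                             PRIMARY KEY (INVOICENO, ITEMNO));

    -- Invented sample data
    INSERT INTO CUSTOMERS VALUES
        (1, 'Acme Supply', '12 Oak St', '555-0111', 6200.0, 5000.0),
        (2, 'Baker Foods', '9 Elm Ave', '555-0122', 800.0, 2000.0),
        (3, 'Cole Motors', '4 Pine Rd', '555-0133', 150.0, 1000.0);
    INSERT INTO ITEMS VALUES
        (1250, 'Desk lamp', 40, 18.0), (1300, 'Office chair', 12, 95.0);
    INSERT INTO SALES VALUES
        (5001, '2024-03-01', 1, 'John Doe', 190.0),
        (5002, '2024-03-02', 2, 'Jane Roe', 360.0),
        (5003, '2024-03-03', 3, 'John Doe', 72.0);
    INSERT INTO ITEMS_SOLD VALUES
        (5001, 1300, 1, 190.0), (5002, 1250, 10, 36.0), (5003, 1250, 2, 36.0);
""")

# Query no. 1: customers who have exceeded their credit limit.
over_limit = conn.execute(
    "SELECT * FROM CUSTOMERS WHERE CREDIT_LIMIT < BALANCE"
).fetchall()

# Query no. 2: customers who have been sold merchandise by John Doe.
sold_by_doe = conn.execute("""
    SELECT CUSTOMERS.NAME, CUSTOMERS.PHONE
    FROM CUSTOMERS, SALES
    WHERE CUSTOMERS.CUSTOMERNO = SALES.CUSTOMERNO
      AND SALES.SALESPERSON = 'John Doe'
    ORDER BY CUSTOMERS.NAME
""").fetchall()

# Query no. 3: customers who have been sold item number 1250.
bought_1250 = conn.execute("""
    SELECT CUSTOMERS.NAME, CUSTOMERS.BALANCE
    FROM CUSTOMERS, SALES, ITEMS_SOLD
    WHERE CUSTOMERS.CUSTOMERNO = SALES.CUSTOMERNO
      AND SALES.INVOICENO = ITEMS_SOLD.INVOICENO
      AND ITEMS_SOLD.ITEMNO = 1250
    ORDER BY CUSTOMERS.NAME
""").fetchall()

print(over_limit, sold_by_doe, bought_1250, sep="\n")
```

With larger data sets, a customer with several qualifying invoices would appear once per invoice in queries 2 and 3; adding DISTINCT after SELECT is the usual way to list each customer only once.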
While the exact form of SQL syntax can vary from one RDBMS to the next, the general
format should be similar to that shown above. The various DBMS languages/tools are
summarized in the following table.

DBMS Languages

Language/tool                       Explanation
DDL - Data Definition Language      Used to create tables, set permissions on tables,
                                    define validation rules in tables, and perform
                                    other functions such as backup.
DML - Data Manipulation Language    Embedded into application programs written in a
                                    third or fourth generation language. The DML
                                    statements allow the program to interface with
                                    the database.
DQL - Data Query Language           General term for user-oriented interfaces to the
                                    database to enable end users to obtain answers to
                                    ad hoc questions.
SQL - Structured Query Language     A widely accepted standard relational database
                                    query language. Command line interface using four
                                    main operators: SELECT, INSERT, UPDATE, and DELETE.
QBE - Query By Example              Graphical interface for querying. User is
                                    presented with a shell of a table to be queried
                                    in which the user can enter an example of what
                                    he/she is looking for as a means of querying the
                                    table.
Report Writer                       Allows custom reports to be generated from tables
                                    in a very user-friendly intuitive manner.
Forms Editor                        Permits the creation of user-friendly interfaces
                                    to tables. Forms can be made to appear like the
                                    documents and paper forms that are familiar to
                                    the user.

DBMS backup and control features
In order to protect the organization from accidental or intentional corruption of the
database, periodic backups should be performed. The most basic form of backup is the
static backup. This backup procedure first involves closing all programs and shutting
down the database. Next, the entire database is backed up, either to a separate disk or
to tape. During the backup procedure all users are "locked out" of the database (i.e.,
prevented from accessing the database). The reason that this backup method is
referred to as a "static" method is because table structures and values are saved at a
particular point in time. If the database crashes, it can only be recovered to the state at
which it was last backed up. In other words, transactions that occurred since the last
backup are lost. Most personal computer RDBMS offer only static database backup.
A superior backup method is called dynamic backup. Often available only on
mainframe RDBMS, this method involves periodic static backup combined with logging
of each individual transaction to a backup magnetic disk in addition to the primary
magnetic disk. In effect, every transaction is recorded twice -- once on the primary
database disk and once on a backup disk. In the event of a hardware or software failure
which results in corruption of the database, the static backup is retrieved and the new
transactions from the backup disk are "applied" to the backed up version of the
database. This process results in recovery of the database to the status at the point of
failure. In effect, the database is reconstructed as if the crash had never occurred. One
popular dynamic backup method is the redundant array of inexpensive (or
independent) disks (RAID). As the name suggests, RAID uses an array of magnetic
disks for recording transactions. The RAID controller determines which disks each
transaction will be written on - as indicated earlier, each transaction will be written on
more than one disk. If a disk fails, the RAID controller can still retrieve all the
transactions because every single data item on the failed disk would have been written
on some other disk in the array. Especially with the cost of magnetic disks declining
rapidly, RAID systems have become more affordable than ever before. The web site of
the Advanced Computer & Network Corporation provides an excellent description of the
different levels of RAID, from level 0, to level 53, to level 0+1.
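
The difference between recovering from a static backup alone and recovering from a static backup plus a transaction log can be shown with a small simulation. The sketch below is written in Python and models the database as a simple dictionary of inventory balances; the data, the function names, and the in-memory "log" are all hypothetical simplifications of what an RDBMS does on disk.

```python
import copy

database = {"A100": 100, "B200": 50}  # live data: item -> quantity on hand
transaction_log = []                  # second record of each change (the backup disk)


def apply_transaction(db, txn):
    """Apply one (item, change) transaction to a database image."""
    item, change = txn
    db[item] = db.get(item, 0) + change


def record(txn):
    """Dynamic backup: write each transaction to the database and to the log."""
    apply_transaction(database, txn)
    transaction_log.append(txn)


# Periodic static backup: a copy of the database at one point in time.
static_backup = copy.deepcopy(database)
transaction_log.clear()

# Normal processing after the backup.
record(("A100", -30))
record(("B200", 20))
record(("C300", 5))
state_at_failure = copy.deepcopy(database)  # kept only to check the recovery

database.clear()  # simulated crash: the live database is lost

# Static recovery: transactions since the last backup are gone.
static_only = copy.deepcopy(static_backup)

# Dynamic recovery: restore the static backup, then re-apply the logged transactions.
recovered = copy.deepcopy(static_backup)
for txn in transaction_log:
    apply_transaction(recovered, txn)

print(static_only, recovered)
```

Only the second recovery reproduces the state at the point of failure.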
When multiple users can access a database via a network, a critical concern is control
over concurrent or simultaneous updates. If two users are allowed to update a table at
the same time, the database may be left with inconsistent values after the two users
perform their respective updates. For example, assume that 100 units of a finished
goods inventory item are in stock. Next assume that two sales clerks each process a
sales transaction for 80 units at exactly the same time (both would be permitted to do
so, since the system would display available inventory of 100 to both users). At the
conclusion of both transactions, the inventory item would have a balance of -60! To
prevent such an occurrence, all RDBMS have some form of concurrency control. The
most rudimentary form of concurrency control is called a "lock out." When one user
accesses a table with the intention of updating it, all other users are simply "locked out"
from the table. The other users receive a message indicating that the table is currently
unavailable. However, this is an extreme form of concurrency control. Users who only
intend to read data in the table should be permitted to do so. A less stringent form of
concurrency control is the "write lock" in which users who only intend to read data from
the table are granted access but users who intend to update values in the table are
denied access. Remember that in a multi-user DBMS environment, it is essential to
allow simultaneous access to tables unless corruption of the data might result (as in the
case of simultaneous updates).
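
The inventory example can be reproduced in a few lines. The first half of the sketch below replays the uncontrolled sequence of events step by step; the second half shows one simple form of concurrency control, an exclusive lock held while the balance is checked and updated. It is written in Python using the standard threading module; the InventoryItem class is hypothetical and stands in for the locking an RDBMS performs internally.

```python
import threading

# Without concurrency control: both clerks read the balance before either writes.
stock = 100
seen_by_clerk_1 = stock
seen_by_clerk_2 = stock
if seen_by_clerk_1 >= 80:  # clerk 1 sees 100 available, so the sale proceeds
    stock -= 80
if seen_by_clerk_2 >= 80:  # clerk 2 also saw 100, so this sale proceeds too
    stock -= 80
uncontrolled_balance = stock


class InventoryItem:
    """Hypothetical inventory record whose updates are protected by a lock."""

    def __init__(self, on_hand):
        self.on_hand = on_hand
        self._lock = threading.Lock()

    def sell(self, quantity):
        """Check and update as one unit; any other updater waits its turn."""
        with self._lock:
            if quantity > self.on_hand:
                return False  # insufficient stock: the sale is refused
            self.on_hand -= quantity
            return True


item = InventoryItem(100)
results = []


def clerk():
    results.append(item.sell(80))


threads = [threading.Thread(target=clerk) for _ in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(uncontrolled_balance, sorted(results), item.on_hand)
```

With the lock in place, whichever clerk acquires it first completes the sale, and the other is refused because only 20 units remain.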

Emerging database systems concepts


We conclude this chapter with a brief discussion of an emerging concept relating to
database systems. Object-oriented (OO) approaches to modeling and implementing
database systems are becoming increasingly popular. This approach employs object-oriented
modeling (OOM) techniques to model the domain of interest and then
implements the resulting model using an object-oriented database management system
(OODBMS). The object-oriented approach focuses on the objects of interest in the
domain. Customers, vendors, employees, sales orders, and receipts are all viewed as
objects that have certain attributes. OOM involves identifying the objects of interest,
their attributes, and relationships between objects.
A critical feature unique to the OO approach is that an "object" package includes both
the attributes of the object and the methods or procedures that pertain to that object.
The methods might dictate how the object's attributes are modified in response to
different events, or how the object causes changes in the attributes of other objects.
Thus, a key difference between the database models described earlier and the OO
approach is that OO models combine data (attributes) and procedures (methods) in one
package, i.e., the "object." This feature of OO models is referred to as encapsulation:
attributes and methods are represented together in one capsule. Another powerful
feature of OO models is inheritance. OO models depict the real world as a hierarchy of
object classes, with lower level classes inheriting attributes and methods from higher
level classes. Thus, lower level object classes do not need to redefine attributes and
methods that are common to the higher level object classes in the class hierarchy.
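
Encapsulation and inheritance are easiest to see in an object-oriented programming language. The following is a minimal sketch in Python, not an OODBMS; the BusinessPartner and Customer classes, their attributes, and their methods are hypothetical.

```python
class BusinessPartner:
    """Higher-level class: attributes and methods shared by all partners."""

    def __init__(self, number, name):
        self.number = number
        self.name = name
        self.balance = 0.0

    def post(self, amount):
        """A method packaged with the data it changes (encapsulation)."""
        self.balance += amount

    def label(self):
        return f"{self.number} {self.name}"


class Customer(BusinessPartner):
    """Lower-level class: inherits number, name, balance, post(), and label()."""

    def __init__(self, number, name, credit_limit):
        super().__init__(number, name)
        self.credit_limit = credit_limit  # attribute specific to customers

    def over_limit(self):
        return self.balance > self.credit_limit


customer = Customer(101, "Acme Supply", 5000.0)
customer.post(6200.0)  # inherited method; not redefined in Customer
print(customer.label(), customer.over_limit())
```

The Customer class defines only what is specific to customers; everything else comes from the higher-level class.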
An OO model contains all details needed for implementation and object-oriented DBMS
are powerful enough to represent all the information contained in the model. However,
most organizations that have made heavy investments in RDBMS see little need to
migrate to OO environments. While OO modeling methods are available, there is no
consensus regarding the "best" method to use. Finally, although OODBMS are
beginning to become commercially available, they have not gained much acceptance in
the marketplace probably due to their relatively high cost and poor performance in
comparison to RDBMS. Gemstone, Jade, ObjectDB, and Objectivity are some
examples of OODBMS.

Summary
The chapter began by contrasting the older file-oriented approach with the database
approach. Drawbacks of the file-oriented approach and advantages and limitations of
the database approach were discussed. Key database concepts such as primary,
concatenated, and foreign keys were described. The various types of relationships such
as 1:1, 1:M, and M:M relationships were then explained. The relational model was then
explored in detail. Rules for relations, entity and referential integrity, and validation
rules for relational database systems were explained. The process of restricting access
to data in a relational database was then discussed. The data dictionary concept was
then explained. The three major database languages - the data definition language, the
data manipulation language, and the data query language - were described in terms of
their functions. SQL, a popular relational database query language, was discussed in
some detail along with examples. Finally, backup and control procedures for relational
database systems were discussed. These include static backup, dynamic backup,
RAID, and concurrency control. The chapter concluded by discussing the emerging
concept of object-oriented modeling and implementation of database systems.

Key Terms
Composite key
Concatenated key
Concurrency control
Data definition language
Data dictionary
Data independence
Data manipulation language
Data query language
Database approach
Dynamic backup
Encapsulation
Entity integrity
File-oriented approach
Foreign key
Forms editor
Inheritance
Object-oriented
Redundant array of inexpensive disks
Referential integrity
Relationship cardinality
Report writer
Static backup
Structured query language (SQL)


Key Web Sites


PC based DBMS

Microsoft Access - Microsoft's popular RDBMS for the personal computer

dBase - one of the earliest (and still around) RDBMS for the personal computer

Base - the database software that is part of the Apache OpenOffice suite

Server based DBMS

IBM DB2 - IBM's relational DBMS for the enterprise

Microsoft SQL Server - the latest version is SQL Server 2005

IBM Informix 12.1 - Informix Dynamic Server, an enterprise-strength relational database

Oracle 12c - the latest version of Oracle's database, the industry leader in RDBMS technology

Sybase - a cross-platform RDBMS

MySQL - an open source (i.e., free) multi-platform RDBMS

Other sites

A home page dedicated to information about the SQL standard

An interactive online SQL tutorial

The W3Schools.com site for SQL - a good place to learn SQL

Network World article on RAID

Explanation of the different levels of RAID

A site with links to various object-oriented database systems


Discussion Questions
1. Briefly describe the file-oriented approach to data processing.
2. Provide an overview level description of the database approach to data
processing.
3. Distinguish between the file-oriented and database approaches in terms of their
relative advantages and disadvantages.
4. What do you understand by the term "legacy systems"?
5. Explain the concept of data independence.
6. Giving examples, explain the concept of foreign keys.
7. Indicate the key features of the object-oriented model.
8. How are many-to-many relationships represented in the relational model?
Explain in the context of the following scenario: an employee can be working on
many projects, and a project can have many employees working on it.
9. What are the rules to which tables must conform in the relational model?
10. Giving examples, explain the concepts of entity and referential integrity.
11. What are data validation rules? Why are validation rules in database
environments superior to application controls in a file-oriented environment?
12. Explain the methods by which access to sensitive data in a relational database
can be restricted.
13. Explain the concept of the "data dictionary." Why do auditors find the data
dictionary useful?
14. What are the three broad categories of database languages? Briefly indicate the
function of each language type.
15. Describe the four major SQL operators.
16. Distinguish between static and dynamic database backup. Explain the function of
RAID.
17. Giving examples, explain the concept of concurrency control in database
environments.


Problems and Exercises


1. Aggies-R-Us would like your assistance in developing a logical database model for
their purchasing application. Based on discussions with key managers at Aggies-R-Us,
you determine the following information: (1) a vendor can supply many parts and a
particular part can be supplied by many vendors, and (2) each part can be stored in
many warehouses and each warehouse can store many parts.
Required: Draw a relational model to reflect the relationships between vendors, parts,
and warehouses. List the tables in your model and draw appropriate links between the
tables (use single-headed and double-headed arrows to indicate the relationship
cardinality). You may make reasonable assumptions regarding the fields to be
represented in each table; be sure to indicate the primary key in each table.
2. Answer the questions that follow with reference to the following tables in a relational
database. The assumptions pertaining to the tables are (1) an employee can work on
many projects, (2) a project can have many employees working on it, and (3) the
"hours-worked" field is used to keep track of the number of hours worked by each
employee on each project.
EMPLOYEES (EMPLOYEE-NO, NAME, PHONE-NO, OFFICE)
CUSTOMERS (CUSTOMER-NO, NAME, ADDRESS, BALANCE)
PROJECTS (PROJECT-NO, DATE, CUSTOMER-NO, BILLING-AMOUNT)
HOURS (EMPLOYEE-NO, PROJECT-NO, HOURS-WORKED)
Required:
a) Identify the primary key in each table.
b) Identify foreign keys, if any.
3. Answer the questions that follow with reference to the following tables in a relational
database. The assumptions pertaining to the tables are (1) an instructor can be
teaching many courses, (2) a student can be taking many classes, (3) there can be only
one instructor teaching a particular course, and (4) each class can have many students
enrolled in it.
INSTRUCTORS (INSTRUCTOR-NO, NAME, PHONE-NO, OFFICE)
STUDENTS (STUDENT-NO, NAME, ADDRESS, PHONE, YEAR-JOINED,
GRADUATION-YEAR, COLLEGE)
COURSES (COURSE-NO, DESCRIPTION, CREDIT-HOURS, INSTRUCTOR-NO)
ENROLLMENTS (COURSE-NO, STUDENT-NO, STATUS)
Required:
a) Identify the primary key in each table.
b) Identify foreign keys, if any.
4. Consider a scenario in which work orders require many parts and the same part
could be used on different work orders. Construct tables to show how this relationship
would be implemented in the relational model. You may make any reasonable
assumptions regarding fields to be represented for work orders and parts.
5. Answer the questions that follow with reference to the following tables in a relational
database. The assumptions pertaining to these three tables are (1) a customer can
have many invoices, (2) an invoice can have many items, and (3) the STR field
indicates the state sales tax rate for each customer's state.
CUSTOMERS
CUSTOMER#  NAME       ADDRESS1         ADDRESS2   STATE  STR   BALANCE
456        ABC Corp.  111 Any St.      Houston    TX     6.25  34560.65
457        DEF Corp.  22 Anywhere Dr.  New York   NY     6.50   2145.90
458        GHI Corp.  5 Someplace Ct.  Miami      FL     6.45  45670.75
459        JKL Corp.  56 Some Dr.      Bryan      TX     6.25  21009.50
460        MNO Corp.  7 Noplace Cir.   San Diego  CA     5.50   4561.00

INVOICES
INVOICE#  DATE     CUSTOMER#  AMOUNT
1001      11-1-95  456        450.75
1002      11-2-95  457        560.25
          11-2-95  460        300.10
1003      11-2-95  459        890.25
1004      11-3-95  450        425.50

INVOICE-ITEMS
INVOICE#  ITEM#  DESC    PRICE  QTY
1001      121    Widget  2.25   45
1001      540    Bolt    0.40   25
1002      211    Gear    3.70   10
1003      121    Widget  2.25   15
1003      121    Widget  2.25   10
1006      348    Nut     0.25    5

Required:
a) List all violations of entity integrity in the above tables.
b) List all violations of referential integrity in the above tables.

6. Consider the following tables in a relational database. Provide the appropriate
"SELECT" SQL statement necessary to answer the queries that follow. Primary keys
are underlined and foreign key fields have an asterisk at the end of the field.
CUSTOMERS (CUSTNO, CNAME, CADDRESS, BALANCE)
SALESPERSONS (SPNO, SNAME, DATE_EMPLOYED, SALARY)
SALES (INVOICENO, DATE, CUSTNO*, SPNO*)
Required:
a) List the salesperson name and salary for all sales to customers whose balance is
greater than $20,000.
b) List the names and addresses of all customers who have been sold merchandise
by salespersons employed before 1/1/96.
7. Consider the following tables in a relational database which are in third normal form.
Provide the appropriate "SELECT" SQL statement necessary to answer the queries that
follow. Primary keys are underlined and foreign key fields have an asterisk at the end of
the field.
CUSTOMERS (CUSTOMERNO, NAME, ADDRESS, REGION, BALANCE)
INVOICES (INVOICENO, DATE, CUSTOMERNO*, SALESPERSON, AMOUNT)
ITEMS-SOLD (INVOICENO*, ITEMNO*, QUANTITY_SOLD, SELLING_PRICE)
INVENTORY (ITEMNO, DESCRIPTION, QUANTITY_ON_HAND)
Required:
a) List the invoice number, item number, item description and selling price on all
invoices by salesperson "John Doe."
b) List the customer names, invoice numbers, and invoice dates for all invoices
where the quantity sold exceeded 100.
8. Visit Web sites describing personal computer relational database products such as
Microsoft Access, dBase, and OpenOffice Base. Develop criteria that you feel are
important in evaluating the products, and rate each product in terms of the criteria you
develop. Summarize your findings regarding the product you feel best meets the criteria
you develop.
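
For readers who want to test their SELECT statements for Problems 6 and 7, the following sketch shows one way to do so with Python's built-in sqlite3 module. The VENDORS and PURCHASE_ORDERS tables, their fields, and the sample rows are hypothetical and were chosen so as not to give away any of the answers above. The query illustrates the general pattern those problems call for: the selection condition applies to one table, while the fields listed come from a table joined to it through a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables and rows, for illustration only.
conn.executescript("""
CREATE TABLE vendors (
    vendor_no INTEGER PRIMARY KEY,
    vname     TEXT NOT NULL,
    city      TEXT NOT NULL
);
CREATE TABLE purchase_orders (
    po_no     INTEGER PRIMARY KEY,
    po_date   TEXT NOT NULL,
    vendor_no INTEGER NOT NULL REFERENCES vendors (vendor_no),
    total     REAL NOT NULL
);
INSERT INTO vendors VALUES (1, 'Acme', 'Houston');
INSERT INTO vendors VALUES (2, 'Zenith', 'Miami');
INSERT INTO purchase_orders VALUES (100, '2013-01-05', 1, 250.0);
INSERT INTO purchase_orders VALUES (101, '2013-01-09', 2, 75.0);
INSERT INTO purchase_orders VALUES (102, '2013-02-01', 1, 40.0);
""")

# List the purchase order number, vendor name, and total for every
# purchase order placed with a vendor located in a given city.
query = """
SELECT po.po_no, v.vname, po.total
FROM purchase_orders AS po
JOIN vendors AS v ON v.vendor_no = po.vendor_no
WHERE v.city = ?
ORDER BY po.po_no
"""

rows = conn.execute(query, ("Houston",)).fetchall()
for row in rows:
    print(row)
```

The same join can also be written by listing both tables in the FROM clause and placing the matching condition in the WHERE clause. Either form can be adapted to the tables in Problems 6 and 7 by loading a few test rows first.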

Last Updated: August 19, 2013

Copyright 1996-2013 CyberText Publishing, Inc. All Rights Reserved
