
GTU MCA Sem - 3

3630002 - SQL (BCS - I)

Unit – 4

Database Design

Compiled and Prepared By:


Dr. Jaypalsinh A. Gohil
Content Outline

 Database Design Life Cycle (DBLC)

 Database Design strategies

 Centralized vs. Decentralized Database Systems
Database Life Cycle (DBLC)
The Database Life Cycle (DBLC) contains six phases, as shown in the figure below.
1. Database Initial Study:

1.1 Analyze the Company Situation:
The company situation describes the general conditions in which a
company operates, its organizational structure, and its mission. To
analyze the company situation, the database designer must discover
what the company’s operational components are, how they function,
and how they interact.

1.2 Define Problems and Constraints:


The designer has both formal and informal sources of information. If the company has existed for any length of time, it already has some kind of system in place, either manual or computer-based. The process of defining problems might initially appear to be unstructured: during the initial problem definition, the designer is likely to collect very broad, general problem descriptions, and these quickly accumulate into a host of candidate problems to address.

1.3 Define Objectives:

A proposed database system must be designed to help solve at least the major problems identified during the problem discovery process. The designer's job is to make sure that the database system objectives, as seen by the designer, correspond to those envisioned by the end user(s). In any case, the database designer must begin to address the following questions:

Q1: What is the proposed system's initial objective?

Q2: Will the system interface with other existing or future systems in the company?

Q3: Will the system share data with other systems or users?
1.4 Define Scope and Boundaries:
The system’s scope defines the extent of the design according to
operational requirements. Will the database design encompass the
entire organization, one or more departments within the organization,
or one or more functions of a single department? The designer must
know the “size of the ballpark.” Knowing the scope helps in defining
the required data structures, the type and number of entities, the
physical size of the database, and so on.

The proposed system is also subject to limits known as boundaries, which are external to the system. Boundaries are also imposed by existing hardware and software: ideally, the designer could choose the hardware and software that would best accomplish the system goals, but in practice the design must often work within what already exists.

Thus, the scope and boundaries become the factors that force the design into a specific mold, and the designer's job is to design the best system possible within those constraints.
2. Database Design:

The second phase focuses on the design of the database model that
will support company operations and objectives. This is arguably the
most critical DBLC phase: making sure that the final product meets
user and system requirements.

At this point, there are two views of the data within the system: the business view of data as a source of information, and the designer's view of the data structure, its access, and the activities required to transform the data into information. The following figure contrasts those views.

II. DBMS Software Selection:

The selection of DBMS software is critical to the information system's smooth operation. Although the factors affecting the purchasing decision vary from company to company, some of the most common are:

1. Cost
2. DBMS features and tools
3. Underlying model
4. Portability
5. DBMS hardware requirements
III. Logical Design:
Query Processing
Given a query, there are generally a variety of methods for computing
the answer. For example, we have seen that, in SQL, a query could be
expressed in several different ways. Each SQL query can itself be
translated into a relational-algebra expression in one of several ways.
Furthermore, the relational-algebra representation of a query specifies
only partially how to evaluate a query; there are usually several ways
to evaluate relational-algebra expressions. As an illustration, consider
the query:
select balance from account where balance < 2500

This query can be translated into either of the following relational-algebra expressions:

• σ_balance<2500 (Π_balance (account))

• Π_balance (σ_balance<2500 (account))
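The equivalence of the two orderings can be sketched in Python over an in-memory relation of rows; the account data here is made up for illustration:

```python
# Sketch (not from the slides): the two relational-algebra orderings
# evaluated over an illustrative in-memory "account" relation.
account = [
    {"account_number": "A-101", "balance": 500},
    {"account_number": "A-102", "balance": 3000},
    {"account_number": "A-103", "balance": 2400},
]

# σ_balance<2500 (Π_balance (account)): project first, then select.
projected = [{"balance": t["balance"]} for t in account]
result1 = [t for t in projected if t["balance"] < 2500]

# Π_balance (σ_balance<2500 (account)): select first, then project.
selected = [t for t in account if t["balance"] < 2500]
result2 = [{"balance": t["balance"]} for t in selected]

assert result1 == result2  # both orderings give the same answer
print(result1)  # [{'balance': 500}, {'balance': 2400}]
```

Although both orderings produce the same result, the second one passes fewer tuples to the projection when the selection is applied early, which is exactly why the choice of evaluation order matters.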
Query Processing
Query Evaluation Plan

A relational-algebra operation annotated with instructions on how to evaluate it is called an evaluation primitive.

A sequence of primitive operations that can be used to evaluate a query is a query-execution plan or query-evaluation plan.
Figure 13.2 illustrates an evaluation plan for our example query,
in which a particular index (denoted in the figure as “index 1”) is
specified for the selection operation. The query-execution engine
takes a query-evaluation plan, executes that plan, and returns the
answers to the query.
Query Processing
Measure of Query Cost:
The cost of query evaluation can be measured in terms of a number of different resources, including disk accesses and the CPU time required to execute a query.

The response time for a query-evaluation plan (that is, the wall-clock time required to execute the plan), assuming no other activity is going on in the computer, would account for all these costs and could be used as a good measure of the cost of the plan.

In large database systems, however, disk accesses (which we measure as the number of transfers of blocks from disk) are usually the most important cost, since disk accesses are slow compared to in-memory operations. We therefore use the number of block transfers from disk as a measure of the actual cost.
Query Processing
Measure of Query Cost:
A more accurate measure would therefore estimate:

1. The number of seek operations performed.
2. The number of blocks read.
3. The number of blocks written.
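The three measures above can be combined into a single estimated I/O cost. A minimal sketch, assuming made-up timing constants (the slides give no figures):

```python
# Illustrative sketch: combining seeks and block transfers into one
# estimated I/O cost. The timing constants are assumed example values.
AVG_SEEK_TIME = 0.004          # seconds per seek (assumption)
BLOCK_TRANSFER_TIME = 0.0001   # seconds per block transferred (assumption)

def estimated_cost(seeks, blocks_read, blocks_written):
    """Each seek pays a seek cost; every block read or written pays
    one block-transfer cost."""
    transfers = blocks_read + blocks_written
    return seeks * AVG_SEEK_TIME + transfers * BLOCK_TRANSFER_TIME

# e.g. a linear scan of a 1,000-block file: one initial seek, 1,000 reads.
print(estimated_cost(seeks=1, blocks_read=1000, blocks_written=0))
```

This is why block transfers dominate the cost model for large scans: the per-block transfer term grows with the size of the relation, while the seek term stays small for sequential access.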


Query Processing
Selection Operation:

Basic Algorithms:

A1. Linear search
In a linear search, the system scans each file block and tests all records to see whether they satisfy the selection condition. If br denotes the number of blocks in the file, selections on key attributes have an average cost of br/2, but still have a worst-case cost of br.

A2. Binary search
If the file is ordered on an attribute, and the selection condition is an equality comparison on that attribute, we can use a binary search to locate records that satisfy the selection. The system performs the binary search on the blocks of the file. The number of blocks that need to be examined to find a block containing the required records is ⌈log2(br)⌉.
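The two algorithms can be contrasted on a toy "file" of sorted blocks, counting block reads; the block layout is illustrative, not from the slides:

```python
# Sketch: A1 (linear search) vs. A2 (binary search) over a toy sorted
# file, where each inner list stands for one disk block of key values.
from math import ceil, log2

blocks = [[10, 12, 15], [20, 24, 30], [31, 35, 40], [42, 50, 55]]

def linear_search(blocks, key):
    """A1: scan block by block; cost grows linearly with br."""
    reads = 0
    for block in blocks:
        reads += 1
        if key in block:
            return reads
    return reads

def binary_search(blocks, key):
    """A2: binary search on the blocks; about ceil(log2(br)) reads."""
    reads, lo, hi = 0, 0, len(blocks) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        reads += 1
        block = blocks[mid]
        if block[0] <= key <= block[-1]:
            return reads        # found the block holding the key range
        if key < block[0]:
            hi = mid - 1
        else:
            lo = mid + 1
    return reads

print(linear_search(blocks, 35))   # 3 block reads
print(binary_search(blocks, 35))   # 2 block reads
print(ceil(log2(len(blocks))))     # bound on binary-search block reads
```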
Query Processing
Sorting:

We can sort a relation by building an index on the sort key, and then using that index to read the relation in sorted order. However, such a process orders the relation only logically, through an index, rather than physically.

Hence, the reading of tuples in the sorted order may lead to a disk access for each record, which can be very expensive, since the number of records can be much larger than the number of blocks. For this reason, it may be desirable to order the records physically.

Sorting of relations that do not fit in memory is called external sorting. The most commonly used technique for external sorting is the external sort–merge algorithm, which we describe next.

1. In the first stage, a number of sorted runs are created; each run is sorted, but contains only some of the records of the relation.

    i = 0;
    repeat
        read M blocks of the relation, or the rest of the
            relation, whichever is smaller;
        sort the in-memory part of the relation;
        write the sorted data to run file Ri;
        i = i + 1;
    until the end of the relation
2. In the second stage, the runs are merged. Suppose, for now, that the total number of runs, N, is less than M.

    read one block of each of the N files Ri into a buffer page in memory;
    repeat
        choose the first tuple (in sort order) among all buffer pages;
        write the tuple to the output, and delete it from the buffer page;
        if the buffer page of any run Ri is empty and not end-of-file(Ri)
            then read the next block of Ri into the buffer page;
    until all buffer pages are empty

The output of the merge stage is the sorted relation.
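The two stages can be simulated in memory with Python lists standing in for disk blocks; M and the block size here are illustrative assumptions:

```python
# Minimal in-memory simulation of the two-stage external sort-merge.
# M = number of "blocks" that fit in memory; block_size is illustrative.
import heapq

def external_sort_merge(relation, M, block_size=2):
    # Stage 1: create sorted runs of at most M blocks each.
    run_len = M * block_size
    runs = []
    for i in range(0, len(relation), run_len):
        runs.append(sorted(relation[i:i + run_len]))
    # Stage 2: merge the runs, repeatedly emitting the smallest head tuple.
    # heapq.merge plays the role of the buffer-page loop (assumes N < M).
    return list(heapq.merge(*runs))

data = [24, 19, 31, 33, 14, 16, 21, 3, 7, 2]
print(external_sort_merge(data, M=2))  # the sorted relation
```

With M=2 and two-record blocks, stage 1 produces three sorted runs of at most four records each, and stage 2 merges them in a single pass, mirroring the pseudocode above.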


Join:

Nested-Loop Join:

This algorithm is called the nested-loop join algorithm, since it basically consists of a pair of nested for loops. Relation r is called the outer relation and relation s the inner relation of the join, since the loop for r encloses the loop for s. The algorithm uses the notation tr · ts, where tr and ts are tuples.

    for each tuple tr in r do begin
        for each tuple ts in s do begin
            test pair (tr, ts) to see if they satisfy the join condition θ;
            if they do, add tr · ts to the result
        end
    end
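The pseudocode above translates almost line for line into Python; the relations and the equality join condition here are illustrative:

```python
# Direct Python rendering of the nested-loop join pseudocode, with an
# equality condition (theta) on a shared attribute. Data is illustrative.
r = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
s = [{"id": 1, "city": "Pune"}, {"id": 3, "city": "Surat"}]

def nested_loop_join(r, s, theta):
    result = []
    for tr in r:                 # outer relation
        for ts in s:             # inner relation
            if theta(tr, ts):    # test the pair against the join condition
                result.append({**tr, **ts})  # tr . ts: concatenated tuple
    return result

print(nested_loop_join(r, s, lambda tr, ts: tr["id"] == ts["id"]))
# [{'id': 1, 'name': 'Ann', 'city': 'Pune'}]
```

Note that every tuple of s is scanned once per tuple of r, which is what makes the choice of outer and inner relation matter for the I/O cost.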
Evaluation of Expressions
Now we consider how to evaluate an expression containing multiple
operations. The obvious way to evaluate an expression is simply to
evaluate one operation at a time, in an appropriate order. The result of
each evaluation is materialized in a temporary relation for subsequent
use.
Materialization:
If we apply the materialization approach, we start from the lowest-level operations in the expression (at the bottom of the tree). In our example, there is only one such operation: the selection operation on account.

In our example, the inputs to the join are the customer relation and the
temporary relation created by the selection on account. The join can now be
evaluated, creating another temporary relation. By repeating the process, we
will eventually evaluate the operation at the root of the tree, giving the final
result of the expression. In our example, we get the final result by executing
the projection operation at the root of the tree, using as input the temporary
relation created by the join.

Evaluation as just described is called materialized evaluation, since the results of each intermediate operation are created (materialized) and then used for evaluation of the next-level operations.
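The bottom-up process described above can be sketched with explicit temporary relations; the relation contents and the projected attribute are illustrative assumptions:

```python
# Sketch of materialized evaluation for the example shape above:
# a selection on account, a join with customer, and a projection at
# the root. All data and attribute names are illustrative.
customer = [{"customer_name": "Ann", "account_number": "A-101"},
            {"customer_name": "Bob", "account_number": "A-102"}]
account = [{"account_number": "A-101", "balance": 500},
           {"account_number": "A-102", "balance": 3000}]

# Step 1: materialize the selection into a temporary relation.
temp1 = [t for t in account if t["balance"] < 2500]
# Step 2: materialize the join of customer with temp1 into another temp.
temp2 = [{**c, **a} for c in customer for a in temp1
         if c["account_number"] == a["account_number"]]
# Step 3: the root operation (projection) reads temp2 to give the result.
result = [{"customer_name": t["customer_name"]} for t in temp2]
print(result)  # [{'customer_name': 'Ann'}]
```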
Pipelining:

We can improve query-evaluation efficiency by reducing the number of temporary files that are produced. We achieve this reduction by combining several relational operations into a pipeline of operations, in which the results of one operation are passed along to the next operation in the pipeline. Evaluation as just described is called pipelined evaluation.

For example, consider an expression in which a projection is applied to the result of a join. If materialization were applied, evaluation would involve creating a temporary relation to hold the result of the join, and then reading back that result to perform the projection. These operations can instead be combined: when the join operation generates a tuple of its result, it passes that tuple immediately to the projection operation for processing. By combining the join and the projection, we avoid creating the intermediate result, and instead create the final result directly.
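Python generators give a natural sketch of this idea: each operator yields one tuple at a time to the next operator, so the join result is never materialized. The relations here are illustrative:

```python
# Sketch: pipelining a join into a projection with generators, so no
# intermediate join result is materialized. Data is illustrative.
r = [{"id": i, "val": i * 10} for i in range(3)]
s = [{"id": i, "tag": chr(65 + i)} for i in range(3)]

def join(r, s):
    for tr in r:
        for ts in s:
            if tr["id"] == ts["id"]:
                yield {**tr, **ts}      # each joined tuple is passed along...

def project(tuples, attrs):
    for t in tuples:                    # ...and projected as soon as it arrives
        yield {a: t[a] for a in attrs}

# The pipeline: tuples flow join -> project one at a time.
print(list(project(join(r, s), ["id", "tag"])))
# [{'id': 0, 'tag': 'A'}, {'id': 1, 'tag': 'B'}, {'id': 2, 'tag': 'C'}]
```

Contrast this with the materialized version, which would first build the full join result as a list before the projection reads it back.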
References

 Elmasri, R. and Navathe, S. B., “Fundamentals of Database Systems”, 5th Edition, Pearson Education (2008)

 Silberschatz, A., Korth, H. F. and Sudarshan, S., “Database System Concepts”, 5th Edition, McGraw Hill Publication

 Singh, S. K., “Database Systems: Concepts, Design & Applications”, Pearson Education

 Rob, P. and Coronel, C., “Database Systems: Design, Implementation & Management”, Thomson Publication
