
Query Optimization in Distributed Database Systems

Non-Distributed Query Processing Architecture

Suppose we want to retrieve the names of all
customers who have one or more accounts
in branches in the city of Edina. We can
write the SQL statement for this question as:
SELECT c.Cname
FROM Customer c, Branch b, Account a
WHERE c.CID = a.CID
AND a.Bname = b.Bname
AND b.Bcity = 'Edina';

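The corresponding relational algebra expression, with PJ for
project, SL for select, and CP for Cartesian product, is:
PJ_Cname (SL_{Customer.CID = Account.CID AND Account.Bname = Branch.Bname AND Bcity = 'Edina'}
(Customer CP (Account CP Branch)))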
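
To make the example query concrete, the following is a minimal sketch that runs it on a single, non-distributed database using Python's built-in sqlite3 module. The column types, the Ano account-number column, and all sample rows are assumptions made for illustration; only the table names, the CID, Cname, Bname, and Bcity columns, and the query itself come from the text. DISTINCT is added here so that a customer with several Edina accounts is listed once, which matches the set semantics of the projection.

```python
import sqlite3

# In-memory database standing in for the single-site bank database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CID INTEGER PRIMARY KEY, Cname TEXT);
    CREATE TABLE Branch   (Bname TEXT PRIMARY KEY, Bcity TEXT);
    -- Ano is a hypothetical account-number column.
    CREATE TABLE Account  (Ano INTEGER PRIMARY KEY, CID INTEGER, Bname TEXT);

    -- Hypothetical sample data.
    INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO Branch   VALUES ('Main', 'Edina'), ('North', 'Eagan');
    INSERT INTO Account  VALUES (10, 1, 'Main'), (11, 2, 'North'),
                                (12, 1, 'Main'), (13, 3, 'Main');
""")

rows = conn.execute("""
    SELECT DISTINCT c.Cname
    FROM Customer c, Branch b, Account a
    WHERE c.CID = a.CID
      AND a.Bname = b.Bname
      AND b.Bcity = 'Edina'
    ORDER BY c.Cname
""").fetchall()

edina_customers = [name for (name,) in rows]
print(edina_customers)  # ['Ann', 'Cid']
```

Bob is excluded because his only account is at a branch outside Edina, and Ann appears once even though she holds two accounts at the Edina branch.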


Distributed Query Processing Architecture

Mapping Global Query into Local Queries
It is the responsibility of the
controlling site to use the global data
dictionary (GDD) to determine the
distribution information and
reconstruct the global view from
local physical fragments.

Suppose the EMP relation is horizontally
fragmented based on the value of the LOC
attribute. Each employee works at one of three
possible locations (LA, NY, or MPLS). The LA
server stores the information about employees
who work in LA. Similarly, NY and MPLS servers
store the information about employees who
work at these locations.
Now consider a query that retrieves the names
of all employees who make more than $50,000.

Global query: PJ_Ename (SL_{sal > 50000} (EMP))
LA's query: PJ_Ename (SL_{sal > 50000} (LA_EMP))
NY's query: PJ_Ename (SL_{sal > 50000} (NY_EMP))
MPLS's query: PJ_Ename (SL_{sal > 50000} (MPLS_EMP))

PJ_Ename (SL_{sal > 50000} (LA_EMP))
UN
PJ_Ename (SL_{sal > 50000} (NY_EMP))
UN
PJ_Ename (SL_{sal > 50000} (MPLS_EMP))

PJ_Ename (SL_{sal > 50000} (LA_EMP))
UN
(
PJ_Ename (SL_{sal > 50000} (NY_EMP))
UN
PJ_Ename (SL_{sal > 50000} (MPLS_EMP))
)

PJ_Ename (SL_{sal > 50000} (
PJ_{Ename, sal} (LA_EMP)
UN
(
PJ_{Ename, sal} (NY_EMP)
UN
PJ_{Ename, sal} (MPLS_EMP)
)
))
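
The mapping from the global query to local queries can be sketched in a few lines of Python. This is an illustration only: the fragments are modeled as in-memory lists, the sample rows are invented, and the function names local_query and select_after_union are hypothetical. It shows that pushing the select and project down to each fragment and then taking the union gives the same names as reconstructing EMP from its fragments first and filtering afterwards.

```python
# Hypothetical sample rows, one list per horizontal fragment: (Ename, sal, LOC)
LA_EMP = [("Alice", 62000, "LA"), ("Arun", 48000, "LA")]
NY_EMP = [("Bea", 51000, "NY")]
MPLS_EMP = [("Carl", 50000, "MPLS"), ("Dee", 75000, "MPLS")]


def local_query(fragment, min_sal=50000):
    """PJ_Ename (SL_{sal > min_sal} (fragment)), as run at a single site."""
    return {ename for (ename, sal, _loc) in fragment if sal > min_sal}


def union_of_local_queries(fragments):
    """Run the local query on every fragment, then union the results."""
    result = set()
    for fragment in fragments:
        result |= local_query(fragment)
    return result


def select_after_union(fragments, min_sal=50000):
    """Rebuild EMP by unioning the fragments, then select and project once."""
    emp = [row for fragment in fragments for row in fragment]
    return {ename for (ename, sal, _loc) in emp if sal > min_sal}


fragments = [LA_EMP, NY_EMP, MPLS_EMP]
names = union_of_local_queries(fragments)
print(sorted(names))  # ['Alice', 'Bea', 'Dee']
assert names == select_after_union(fragments)
```

Both strategies return the same answer; what differs in a real system is how many tuples each one has to ship between sites before the filter is applied.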

For each alternative, the global optimizer must
also decide where to union these results together.
One approach would be to union all three
intermediate relations at the client or controlling
site.
Another approach is to use one of the database
servers at MPLS, LA, or NY to perform the union.
The anticipated size of each intermediate relation,
the local database server speed, and the
communication link speed are factors that are
used to decide on a plan with the smallest
communication cost.

Consider a simple query that
performs a select operation on the
Branch relation in our example bank
database. This query enters Site 1 of
a three-site system.
To analyze the impact of distributing
the Branch relation on the overall
execution, we consider three cases.

In the first case, the Branch relation is
stored at Site 2 and it is not fragmented
or replicated. In the second case, the
Branch relation is replicated and there
are two copies of it, one at Site 2 and
one at Site 3. In the third case, the
Branch relation is fragmented
horizontally with one fragment stored at
Site 2 and another at Site 3.

Case 1: The Branch relation is stored entirely
at Site 2. In this case, since the Branch
relation is not fragmented or replicated, the
global query is mapped directly to a local
query that must run at Site 2. The results of
the select operation need to be sent back to
Site 1 to be displayed to the user. To run the
query at Site 2, Site 1 sends a message to
Site 2 passing the SL expression to it. We will
assume that the SL command takes one
message to send.

The number of messages required to
return the results to Site 1 depends on
how many tuples of the Branch relation
qualify. For simplicity, we assume that
each row of the relation takes one
message to be sent. Therefore, the
results require N messages if there are N
tuples that are returned from the select.
This alternative therefore requires N + 1
messages: one for the command and N for the results.
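
The message count for Case 1 can be written as a one-line function. This is a sketch under the text's simplifying assumptions (one message for the command, one message per returned row); the function name is hypothetical.

```python
def single_site_messages(n_qualifying: int) -> int:
    """Messages between Site 1 and the one remote site that runs the select."""
    command_messages = 1             # Site 1 sends the SL expression
    result_messages = n_qualifying   # one message per qualifying tuple
    return command_messages + result_messages


print(single_site_messages(40))  # 41
```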

Case 2: The Branch relation is replicated with copies
stored at Site 2 and Site 3. In this case, since
Branch is replicated but not fragmented, the global
query still maps to a single site local query. Since
there are two copies of the relation in the system,
the global optimizer will have to decide where it is
best to run the query. In either case, the number of
messages to run the command at the remote site
and send the results back is N + 1, the same as in
Case 1. In this case, the optimizer has to consider
factors such as the local processing speed of each
server as well as the communication link
speed between Site 1 and each of the two
candidate sites. If the link speed for both
sites is the same, the processing speed
and workload are used to break the tie.
But if the link between Site 1 and Site 2 is
much faster than the link between Site 1
and Site 3, running the select operation at
Site 2 is clearly better only if the processing
speed and load of the servers at Site 2 and
Site 3 are comparable. Otherwise, the optimizer
must weigh the faster link against the slower or
busier server.
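
One way to picture the Case 2 trade-off is a small estimate that combines link speed with local processing time. The cost model below is an assumption made for illustration, not a formula from the text: each of the N + 1 messages is charged a fixed per-message time on the link to the candidate site, and the server's speed and load are folded into a single local-time figure. The Replica class, its field names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    site: str
    seconds_per_message: float  # assumed cost of one message on the link to Site 1
    local_seconds: float        # assumed time to run the select there, load included


def estimated_seconds(replica: Replica, n_qualifying: int) -> float:
    """Assumed model: (N + 1) messages on the link plus local processing."""
    messages = n_qualifying + 1
    return messages * replica.seconds_per_message + replica.local_seconds


def choose_replica(replicas, n_qualifying: int) -> Replica:
    """Pick the copy with the smallest estimated response time."""
    return min(replicas, key=lambda r: estimated_seconds(r, n_qualifying))


# Site 2 has the faster link; Site 3 has the faster (or less loaded) server.
site2 = Replica("Site 2", seconds_per_message=0.001, local_seconds=0.50)
site3 = Replica("Site 3", seconds_per_message=0.010, local_seconds=0.20)

print(choose_replica([site2, site3], n_qualifying=100).site)  # Site 2
print(choose_replica([site2, site3], n_qualifying=10).site)   # Site 3
```

With many qualifying tuples the faster link dominates and Site 2 wins; with few tuples the faster server at Site 3 wins, which is why link speed alone does not settle the choice.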

Case 3: The Branch relation is horizontally fragmented with
fragments at Site 2 and Site 3. In this case, the global
query is mapped into two local queries, each running
against one of the two fragments. Since fragments of
the Branch relation are not replicated, the global
optimizer has only one option for the site where it
needs to run each local query.
As assumed before, the query returns N tuples. In this
case, however, the N tuples reside on two different
sites. Let's assume that N2 and N3 represent the
number of qualifying tuples at Site 2 and Site 3,
respectively, where N = N2 + N3. We need to union the
tuples from Site 2 and Site 3 to produce the answer to the query.

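There are three sites where we can
perform the union. In every plan, Site 1 sends one
command message to each of Site 2 and Site 3, which
accounts for the constant 2:
a. Union is performed at Site 1. In this case, as
shown in Figure 4.18a, the communication
cost is 2 + N2 + N3.
b. Union is performed at Site 2. Site 3 ships its N3
tuples to Site 2, which returns all N tuples to Site 1.
As shown in Figure 4.18b, the communication
cost is 2 + N + N3.
c. Union is performed at Site 3. Site 2 ships its N2
tuples to Site 3, which returns all N tuples to Site 1.
As shown in Figure 4.18c, the communication
cost is 2 + N + N2.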

Clearly, the lowest communication cost is
associated with Plan 1 (union at Site 1), since
2 + N2 + N3 = 2 + N. If Site 1 does not have
a database server to perform the union, Plan 2
(union at Site 2) or Plan 3 (union at Site 3)
must be considered.
If N2 > N3, then Plan 2 is superior. If
N2 < N3, then Plan 3 is better. If N2 = N3,
then both plans have the same
communication cost. In this case, the
optimizer must consider each database
server's speed and/or load to decide
where to perform the union.
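
The comparison of the three union plans can be checked with a short sketch that uses the message counts given for Case 3. The function names and the site1_can_union flag are hypothetical; the cost expressions are the ones stated in the text (one message per command, one message per tuple).

```python
def union_plan_costs(n2: int, n3: int) -> dict:
    """Communication cost, in messages, of performing the union at each site."""
    n = n2 + n3
    return {
        "Site 1": 2 + n2 + n3,  # both fragments ship results straight to Site 1
        "Site 2": 2 + n + n3,   # Site 3 ships N3 to Site 2, Site 2 returns N
        "Site 3": 2 + n + n2,   # Site 2 ships N2 to Site 3, Site 3 returns N
    }


def cheapest_union_sites(n2: int, n3: int, site1_can_union: bool = True) -> list:
    """Return every site that ties for the lowest communication cost."""
    costs = union_plan_costs(n2, n3)
    if not site1_can_union:
        del costs["Site 1"]
    best = min(costs.values())
    return sorted(site for site, cost in costs.items() if cost == best)


print(union_plan_costs(30, 10))
# {'Site 1': 42, 'Site 2': 52, 'Site 3': 72}
print(cheapest_union_sites(30, 10))                          # ['Site 1']
print(cheapest_union_sites(30, 10, site1_can_union=False))   # ['Site 2']
print(cheapest_union_sites(10, 10, site1_can_union=False))   # ['Site 2', 'Site 3']
```

When the function returns two sites, communication cost alone cannot break the tie, and server speed or load has to decide.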

From this discussion, we can summarize that the
total processing cost of a query in a distributed
database consists of the amount of time it takes
to generate the distributed query execution plan;
the time it takes to send the commands that
each local server must run; the time it takes for
the local DBEs to execute the local queries; and
the additional communication cost required
between the controlling site and the local sites
for transferring the intermediate and/or final
results.
