architecture
Suppose we want to retrieve the names of all customers who have one or more accounts in branches in the city of Edina. We can write the SQL statement for this query as:
    SELECT c.Cname
    FROM Customer c, Branch b, Account a
    WHERE c.CID = a.CID AND a.Bname = b.Bname AND b.Bcity = 'Edina';
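To make the query concrete, the following minimal sketch runs it against an in-memory SQLite database. The table layouts are reduced to the columns the query uses, and the sample rows (Ann, Bob, Cy, the Main and West branches) are invented purely for illustration; they are not part of the original bank database.

```python
import sqlite3


def customers_in_city(city):
    """Return the sorted names of customers holding an account in `city`."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Hypothetical sample data; only the columns used by the query are modeled.
    cur.executescript("""
        CREATE TABLE Customer (CID INTEGER, Cname TEXT);
        CREATE TABLE Branch (Bname TEXT, Bcity TEXT);
        CREATE TABLE Account (CID INTEGER, Bname TEXT);
        INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO Branch VALUES ('Main', 'Edina'), ('West', 'Minneapolis');
        INSERT INTO Account VALUES (1, 'Main'), (2, 'West'), (3, 'Main');
    """)
    rows = cur.execute(
        "SELECT c.Cname FROM Customer c, Branch b, Account a "
        "WHERE c.CID = a.CID AND a.Bname = b.Bname AND b.Bcity = ?",
        (city,),
    ).fetchall()
    conn.close()
    return sorted(name for (name,) in rows)


print(customers_in_city("Edina"))  # ['Ann', 'Cy']
```

The city is passed as a bound parameter rather than pasted into the SQL text, which also avoids the quoting problem with the string literal.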
PJ_Cname (SL_Bcity = 'Edina' (Customer CP (Account CP Branch)))
Here PJ denotes projection, SL selection, and CP Cartesian product.
Distributed query processing architecture
Mapping Global Query into Local Queries
It is the responsibility of the controlling site to use the global data dictionary (GDD) to determine the distribution information and reconstruct the global view from local physical fragments.
Suppose the EMP relation is horizontally fragmented based on the value of the LOC attribute. Each employee works at one of three possible locations (LA, NY, or MPLS). The LA server stores the information about employees who work in LA. Similarly, the NY and MPLS servers store the information about employees who work at those locations. Now consider a query that retrieves the names of all employees who make more than $50,000. This query can be mapped to the local fragments in several alternative ways:
PJ_Ename (SL_sal>50000 (LA_EMP)) UN PJ_Ename (SL_sal>50000 (NY_EMP)) UN PJ_Ename (SL_sal>50000 (MPLS_EMP))
PJ_Ename (SL_sal>50000 (LA_EMP)) UN ( PJ_Ename (SL_sal>50000 (NY_EMP)) UN PJ_Ename (SL_sal>50000 (MPLS_EMP)) )
SL_sal>50000 ( PJ_Ename (LA_EMP) UN ( PJ_Ename (NY_EMP) UN PJ_Ename (MPLS_EMP) ) )
For the third form to be valid, each projection must also retain the sal attribute, since the selection is applied after the projection.
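The first two alternatives above differ only in how the unions are grouped. The sketch below models each fragment as a small in-memory list and shows that both groupings yield the same answer. The employee rows and the `local_query` helper are hypothetical and stand in for the work each local server would do.

```python
# Hypothetical horizontal fragments of EMP, one per location.
LA_EMP = [
    {"Ename": "Lee", "sal": 62000, "LOC": "LA"},
    {"Ename": "Kim", "sal": 41000, "LOC": "LA"},
]
NY_EMP = [
    {"Ename": "Ortiz", "sal": 75000, "LOC": "NY"},
]
MPLS_EMP = [
    {"Ename": "Berg", "sal": 58000, "LOC": "MPLS"},
    {"Ename": "Hall", "sal": 50000, "LOC": "MPLS"},
]


def local_query(fragment, min_sal=50000):
    """Select rows with sal > min_sal, then project onto Ename (as a set)."""
    return {row["Ename"] for row in fragment if row["sal"] > min_sal}


# Alternative 1: union of the three local results, left to right.
flat = local_query(LA_EMP) | local_query(NY_EMP) | local_query(MPLS_EMP)

# Alternative 2: NY and MPLS are unioned first, then combined with LA.
nested = local_query(LA_EMP) | (local_query(NY_EMP) | local_query(MPLS_EMP))

print(flat == nested)  # True
```

Because union is associative, the grouping does not change the result; it only changes where intermediate results are combined and therefore how much data moves between sites.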
For each alternative, the global optimizer must also decide where to union these results together. One approach would be to union all three intermediate relations at the client or controlling site. Another approach is to use one of the database servers at MPLS, LA, or NY to perform the union. The anticipated size of each intermediate relation, the local database server speed, and the communication link speed are the factors used to choose the plan with the smallest communication cost.
Consider a simple query that performs a select operation on the Branch relation in our example bank database. This query enters Site 1 of a three-site system. To analyze the impact of distributing the Branch relation on the overall execution, we consider three cases.
In the first case, the Branch relation is stored at Site 2 and it is not fragmented or replicated. In the second case, the Branch relation is replicated and there are two copies of it, one at Site 2 and one at Site 3. In the third case, the Branch relation is fragmented horizontally with one fragment stored at Site 2 and another at Site 3.
Case 1: The Branch relation is stored entirely at Site 2. In this case, since the Branch relation is not fragmented or replicated, the global query is mapped directly to a local query that must run at Site 2. The results of the select operation need to be sent back to Site 1 to be displayed to the user. To run the query at Site 2, Site 1 sends a message to Site 2 passing the SL expression to it. We will assume that the SL command takes one message to send.
The number of messages required to return the results to Site 1 depends on how many tuples of the Branch relation qualify. For simplicity, we assume that each row of the relation takes one message to be sent. Therefore, the results require N messages if N tuples are returned from the select. This alternative requires N + 1 messages in total.
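The N + 1 count can be illustrated with a small simulation. This is only a sketch under the simplifying assumptions stated above (one message for the command, one message per returned row); the Branch rows and the `run_remote_select` helper are made up for the example.

```python
# Hypothetical contents of the Branch relation stored at Site 2.
BRANCH_AT_SITE2 = [
    {"Bname": "Main", "Bcity": "Edina"},
    {"Bname": "West", "Bcity": "Minneapolis"},
    {"Bname": "Lake", "Bcity": "Edina"},
]


def run_remote_select(relation, predicate):
    """Simulate Site 1 running a select at a remote site.

    Returns (rows, messages), where messages counts one message for the
    command plus one message per qualifying row sent back.
    """
    messages = 1                      # Site 1 -> remote site: the SL command
    rows = [row for row in relation if predicate(row)]
    messages += len(rows)             # remote site -> Site 1: one per row
    return rows, messages


rows, messages = run_remote_select(
    BRANCH_AT_SITE2, lambda row: row["Bcity"] == "Edina"
)
print(len(rows), messages)  # 2 3
```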
Case 2: The Branch relation is replicated with copies stored at Site 2 and Site 3. In this case, since Branch is replicated but not fragmented, the global query still maps to a single-site local query. Since there are two copies of the relation in the system, the global optimizer has to decide where it is best to run the query. In either case, the number of messages needed to run the command at the remote site and send the results back is N + 1, the same as in Case 1. The optimizer therefore has to consider factors such as the local processing speed of each server as well as the communication link speed between Site 1 and each of the two candidate sites. If the link speed for both sites is the same, the processing speed and workload are used to break the tie. But if the link between Site 1 and Site 2 is much faster than the link between Site 1 and Site 3, running the select operation at Site 2 is guaranteed to be better only if the processing speed and load of the servers at Site 2 and Site 3 are the same.
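One way to picture this trade-off is a simple additive estimate: the time to move N + 1 messages over the link plus the time the server needs to run the select. This cost model, the function names, and the numbers below are all assumptions made for illustration; the text does not prescribe a specific formula.

```python
def estimated_time(n_rows, msg_time, proc_time):
    """Assumed model: (N + 1) messages over the link plus local processing."""
    return (n_rows + 1) * msg_time + proc_time


def choose_site(n_rows, sites):
    """Pick the replica site with the smallest estimated time.

    `sites` maps a site name to (seconds per message, seconds to process).
    """
    return min(sites, key=lambda name: estimated_time(n_rows, *sites[name]))


# Hypothetical figures: Site 2 has the faster link, Site 3 the faster server.
SITES = {
    "Site 2": (0.010, 0.50),
    "Site 3": (0.030, 0.20),
}

print(choose_site(100, SITES))  # Site 2
print(choose_site(5, SITES))    # Site 3
```

With many qualifying rows the faster link dominates and Site 2 wins; with few rows the faster server at Site 3 wins, which is why link speed alone does not settle the choice.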
Case 3: The Branch relation is horizontally fragmented with fragments at Site 2 and Site 3. In this case, the global query is mapped into two local queries, each running against one of the two fragments. Since the fragments of the Branch relation are not replicated, the global optimizer has only one option for the site where it needs to run each local query. As assumed before, the query returns N tuples. In this case, however, the N tuples reside on two different sites. Let N2 and N3 represent the number of qualifying tuples at Site 2 and Site 3, respectively, where N = N2 + N3. We need to union the tuples from Site 2 and Site 3 to form the answer to the query.
There are three sites where we can perform the union:
a. Union is performed at Site 1. In this case, as shown in Figure 4.18a, the communication cost is 2 + N2 + N3.
b. Union is performed at Site 2. In this case, as shown in Figure 4.18b, the communication cost is 2 + N + N3.
c. Union is performed at Site 3. In this case, as shown in Figure 4.18c, the communication cost is 2 + N + N2.
Clearly, the lowest communication cost is associated with Plan 1 (union at Site 1). If Site 1 does not have a database server to perform the union, Plan 2 (union at Site 2) or Plan 3 (union at Site 3) must be considered. If N2 > N3, then Plan 2 is superior. If N2 < N3, then Plan 3 is better. If N2 = N3, both plans have the same communication cost. In that case, the optimizer must consider each database server's speed and/or load to decide where to perform the union.
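The comparison of the three plans can be written down directly from the message counts given for Case 3. The sketch below computes each plan's communication cost and lists the cheapest union site or sites; the function names and the `site1_can_union` flag are illustrative choices, not part of the original material.

```python
def union_plan_costs(n2, n3):
    """Message counts for performing the union at each site (Case 3)."""
    n = n2 + n3
    return {
        "Site 1": 2 + n2 + n3,  # both fragments ship their rows to Site 1
        "Site 2": 2 + n + n3,   # Site 3 ships N3 rows, Site 2 ships N rows
        "Site 3": 2 + n + n2,   # Site 2 ships N2 rows, Site 3 ships N rows
    }


def cheapest_sites(n2, n3, site1_can_union=True):
    """Return the union site(s) with the lowest communication cost."""
    costs = union_plan_costs(n2, n3)
    if not site1_can_union:
        del costs["Site 1"]
    best = min(costs.values())
    return sorted(site for site, cost in costs.items() if cost == best)


print(union_plan_costs(30, 10))
print(cheapest_sites(30, 10))                          # ['Site 1']
print(cheapest_sites(30, 10, site1_can_union=False))   # ['Site 2']
print(cheapest_sites(20, 20, site1_can_union=False))   # ['Site 2', 'Site 3']
```

When two sites tie, the function returns both, leaving the final choice to other factors such as server speed and load.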
From this discussion, we can summarize that the total processing cost of a query in a distributed database consists of the amount of time it takes to generate the distributed query execution plan; the time it takes to send the commands that each local server must run; the time it takes for the local DBEs to execute the local queries; and the additional communication cost required between the controlling site and the local sites for transferring the intermediate and/or final results.
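The four components in this summary can be captured in a small record type. This is a minimal sketch that simply adds the components; the field names and sample timings are hypothetical, and a real system might overlap some of these phases (for example, local queries running in parallel) rather than summing them.

```python
from dataclasses import dataclass


@dataclass
class QueryCost:
    """Time components (in seconds) of a distributed query, per the summary."""
    plan_generation: float    # building the distributed execution plan
    command_transfer: float   # sending commands to the local servers
    local_execution: float    # local DBEs running their local queries
    result_transfer: float    # moving intermediate and final results

    def total(self) -> float:
        # Assumption: components are sequential, so they simply add up.
        return (self.plan_generation + self.command_transfer
                + self.local_execution + self.result_transfer)


cost = QueryCost(plan_generation=0.05, command_transfer=0.02,
                 local_execution=0.40, result_transfer=0.30)
print(round(cost.total(), 2))  # 0.77
```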