
Parallel Database Systems

The Future Of High Performance Database Systems


David DeWitt and Jim Gray, 1992

Presented By Ajith Karimpana

Parallel Databases
History of Parallel Databases
Why Parallel Databases?
How are they implemented?
Where are they implemented?
Future of Parallel Databases

The History

History of Parallel Databases


Mainframes dominated most database and transaction processing tasks.
Parallel machines were practically written off.
Specialized database machines were built on trendy, special-purpose hardware.
The relational data model brought about a revolution.

Relational Data Model Revolution
Uniform operations applied to uniform streams of data.
Each operator produces a new relation, so operators can be composed into dataflow graphs.
This allows two kinds of parallelism: pipelined parallelism and partitioned parallelism.

Pipelined Parallelism
Streaming the output of one operator into the input of another operator, so that the two operators work in series.

Partitioned Parallelism
Partitioning the input data among multiple processors and memories, so that an operator is split into many independent operators, each working on a part of the data.
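A minimal sketch of partitioned parallelism, assuming the data is already split into in-memory lists: the same select operator is cloned once per partition, the clones run independently, and a merge step concatenates their outputs. Threads stand in for the separate processors and disks of a real system; the names `select_partition` and `partitioned_select` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def select_partition(partition, predicate):
    """One clone of the select operator, working only on its own partition."""
    return [row for row in partition if predicate(row)]


def partitioned_select(partitions, predicate, workers=4):
    """Run one select clone per partition in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: select_partition(p, predicate), partitions)
        # Merge step: concatenate the per-partition outputs in partition order.
        return [row for part in results for row in part]


partitions = [[1, 8, 3], [12, 5], [7, 20, 2]]
print(partitioned_select(partitions, lambda x: x > 4))  # [8, 12, 5, 7, 20]
```

Because each clone touches only its own partition, no coordination is needed until the merge.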

WHY?

Parallel Databases: Why?


The Philosophy
The ideal database machine would have a single infinitely fast processor with an infinite memory with infinite bandwidth, and it would be infinitely cheap (free). But do we have such an ideal machine?

NO

So the challenge is to build an infinitely fast processor out of infinitely many processors of finite speed, and to build an infinitely large memory with infinite memory bandwidth from infinitely many storage units of finite speed.
Answer to this challenge: Parallel Databases

The Implementation

Parallel Databases- Implementation


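Parallel Database Implementation: The Basic Techniques
Two key properties:
Linear speedup: twice as much hardware performs the same task in half the elapsed time.
Linear scaleup: twice as much hardware performs a twice-as-large task in the same elapsed time.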
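Speedup and scaleup, the usual measures of a parallel database, are simple ratios of elapsed times. The sketch below computes both; the timings are hypothetical numbers chosen only to illustrate one linear and one sublinear result.

```python
def speedup(small_system_time: float, big_system_time: float) -> float:
    """Same problem on both systems: small-system elapsed time / big-system elapsed time.
    Linear speedup means N times the hardware gives a speedup of N."""
    return small_system_time / big_system_time


def scaleup(small_system_small_problem: float, big_system_big_problem: float) -> float:
    """Small-system elapsed time on the small problem / big-system elapsed time on an
    N-times bigger problem. Linear scaleup means the ratio stays at 1.0."""
    return small_system_small_problem / big_system_big_problem


# Hypothetical measurements, in seconds, for a 1-node and a 4-node system.
print(speedup(100.0, 25.0))   # 4.0 -> linear speedup on 4 nodes
print(scaleup(100.0, 125.0))  # 0.8 -> sublinear scaleup
```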

Two Kinds of Scaleup

Batch: the same query running on an N-times larger database.

Transactional: N times as many clients submitting N times as many requests against an N-times larger database.

Threats To Linear Speedup/Scaleup
Startup, interference and skew.
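A toy elapsed-time model, with made-up costs, shows how two of the usual threats, startup cost and skew, erode speedup. It assumes processes are launched one after another and that the job ends when the largest partition finishes; interference is deliberately left out. The function name and cost units are hypothetical.

```python
def parallel_elapsed(partition_sizes, time_per_tuple, startup_per_process):
    """Toy elapsed-time model for one partitioned operator (hypothetical costs).

    - Startup: processes are launched one after another, so startup cost
      grows with the number of processes.
    - Skew: all partitions run concurrently, but the job finishes only when
      the largest (slowest) partition finishes.
    Interference (contention for shared resources) is not modelled here.
    """
    startup = startup_per_process * len(partition_sizes)
    slowest = max(partition_sizes) * time_per_tuple
    return startup + slowest


one_node = parallel_elapsed([1000], time_per_tuple=1, startup_per_process=10)
balanced = parallel_elapsed([250, 250, 250, 250], 1, 10)
skewed = parallel_elapsed([700, 100, 100, 100], 1, 10)
print(one_node, balanced, skewed)       # 1010 290 740
print(round(one_node / balanced, 2))    # 3.48: startup alone keeps speedup below 4
print(round(one_node / skewed, 2))      # 1.36: skew is far more damaging
```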

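Hardware Architecture
Shared Memory: all processors share direct access to a common global memory and to all disks.
Shared Disk: each processor has a private memory but direct access to all disks.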

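Hardware Architecture
Shared Nothing: each memory and disk is owned by one processor, which acts as a server for that data; processors communicate only over the interconnection network.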

Parallel Dataflow Approach To SQL Software

The relational data model was originally proposed to improve programmer productivity by offering a nonprocedural database language. SQL also brought data independence: programs do not specify how a query is to be executed, so they keep working as the database schema evolves. Because of these properties, relational queries can be executed as dataflow graphs that use both pipelined and partitioned parallelism.
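A minimal sketch of a pipelined dataflow plan, using Python generators as operators: each operator consumes its input stream tuple by tuple, so a tuple flows through scan, select and project before the next one is read. The generators interleave in a single thread, so this shows the streaming structure only; in a parallel system each operator would run on its own processor. The relation and operator names are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List

Row = dict


def scan(table: Iterable[Row]) -> Iterator[Row]:
    """Leaf operator: produce the stored tuples one at a time."""
    for row in table:
        yield row


def select(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Pass through only the tuples that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows: Iterable[Row], columns: List[str]) -> Iterator[Row]:
    """Keep only the requested columns of each tuple."""
    for row in rows:
        yield {col: row[col] for col in columns}


# Hypothetical relation; the plan is a chain of operators, each consuming the
# previous operator's output stream as soon as a tuple is produced.
employees = [
    {"name": "Ann", "dept": "sales"},
    {"name": "Bob", "dept": "eng"},
    {"name": "Cy", "dept": "sales"},
]
plan = project(select(scan(employees), lambda r: r["dept"] == "sales"), ["name"])
print(list(plan))  # [{'name': 'Ann'}, {'name': 'Cy'}]
```

Nothing in the query text says how to run it, which is what lets the system pick a pipelined (or partitioned) plan.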

Data Partitioning

Partitioning a relation involves distributing its tuples over several disks. Three kinds:
Round-robin partitioning
Range partitioning
Hash partitioning
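The three schemes can be sketched as functions that assign each tuple to one of several partitions. This is an illustration under simple assumptions: partitions are in-memory lists, the partitioning attribute is extracted by a `key` function, and range boundaries are given as a sorted list. All names and sample values are hypothetical.

```python
import bisect
from typing import Any, Callable, List, Sequence

Key = Callable[[Any], Any]


def round_robin(tuples: Sequence[Any], n: int) -> List[list]:
    """Tuple i goes to partition i mod n."""
    parts: List[list] = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        parts[i % n].append(t)
    return parts


def hash_partition(tuples: Sequence[Any], n: int, key: Key) -> List[list]:
    """Partition = hash(partitioning attribute) mod n.
    Note: Python salts str hashes per process; a real system needs a stable hash."""
    parts: List[list] = [[] for _ in range(n)]
    for t in tuples:
        parts[hash(key(t)) % n].append(t)
    return parts


def range_partition(tuples: Sequence[Any], boundaries: Sequence[Any], key: Key) -> List[list]:
    """Partition i holds keys k with boundaries[i-1] <= k < boundaries[i]
    (boundaries sorted ascending; gives len(boundaries) + 1 partitions)."""
    parts: List[list] = [[] for _ in range(len(boundaries) + 1)]
    for t in tuples:
        parts[bisect.bisect_right(boundaries, key(t))].append(t)
    return parts


ages = [23, 41, 35, 19, 58, 30]  # hypothetical partitioning-attribute values
print(round_robin(ages, 3))                              # [[23, 19], [41, 58], [35, 30]]
print(hash_partition(ages, 3, key=lambda a: a))          # [[30], [19, 58], [23, 41, 35]]
print(range_partition(ages, [30, 40], key=lambda a: a))  # [[23, 19], [35, 30], [41, 58]]
```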

Figure: range, round-robin and hash partitioning

Round-Robin
Ideal for applications that wish to read the entire relation sequentially for each query. Not ideal for point and range queries, since each of the n disks must be searched.

Hash
Ideal for point queries on the partitioning attribute. Ideal for sequential scans of the entire relation. Not ideal for point queries on non-partitioning attributes. Not ideal for range queries on the partitioning attribute.

Range
Ideal for point and range queries on the partitioning attribute.
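The trade-offs above come down to how many partitions a query must read. The sketch below answers that for a query `low <= key <= high` (a point query when `low == high`), assuming range partitioning uses sorted boundaries as in a `bisect`-based layout. The function name, scheme labels and sample values are hypothetical.

```python
import bisect
from typing import List, Sequence


def partitions_to_search(scheme: str, n: int, low, high,
                         boundaries: Sequence = (),
                         on_partitioning_attribute: bool = True) -> List[int]:
    """Which of the n partitions a query for low <= key <= high must read.
    A point query is the case low == high."""
    everything = list(range(n))
    if not on_partitioning_attribute or scheme == "round_robin":
        return everything              # no placement information to exploit
    if scheme == "hash":
        if low == high:
            return [hash(low) % n]     # point query: exactly one partition
        return everything              # hashing scatters neighbouring keys
    if scheme == "range":
        if len(boundaries) + 1 != n:
            raise ValueError("range partitioning needs n - 1 boundaries")
        first = bisect.bisect_right(boundaries, low)
        last = bisect.bisect_right(boundaries, high)
        return list(range(first, last + 1))
    raise ValueError(f"unknown scheme: {scheme}")


print(partitions_to_search("round_robin", 3, 35, 35))                  # [0, 1, 2]
print(partitions_to_search("hash", 3, 35, 35))                         # [2]
print(partitions_to_search("hash", 3, 25, 35))                         # [0, 1, 2]
print(partitions_to_search("range", 3, 25, 35, boundaries=[30, 40]))   # [0, 1]
```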

Handling Of Skew

When a relation is partitioned (with any scheme except round-robin), the distribution of tuples may be skewed: a high percentage of tuples is placed in some partitions and fewer tuples in others.
Two kinds:

Data skew (attribute-value skew)
Execution skew (partition skew)
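A small sketch of attribute-value skew under hash partitioning: all tuples that share one value must land in the same partition, so a frequent value overloads it, while round-robin stays balanced regardless of the values. The data is hypothetical, the skew measure (largest partition over the average) is one simple choice, and execution skew is not modelled.

```python
from typing import Hashable, List, Sequence


def hash_partition_sizes(keys: Sequence[Hashable], n: int) -> List[int]:
    """Number of tuples each of n hash partitions receives."""
    sizes = [0] * n
    for k in keys:
        sizes[hash(k) % n] += 1
    return sizes


def round_robin_sizes(keys: Sequence[Hashable], n: int) -> List[int]:
    """Round-robin ignores the values, so sizes differ by at most one."""
    return [len(keys[i::n]) for i in range(n)]


def skew_ratio(sizes: Sequence[int]) -> float:
    """Largest partition divided by the average partition; 1.0 is perfectly balanced."""
    return max(sizes) / (sum(sizes) / len(sizes))


# Hypothetical attribute values: 70 tuples share the value 7, 30 are distinct.
keys = [7] * 70 + list(range(30))
print(hash_partition_sizes(keys, 4))              # [8, 8, 7, 77]
print(skew_ratio(hash_partition_sizes(keys, 4)))  # 3.08
print(round_robin_sizes(keys, 4))                 # [25, 25, 25, 25]
```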

Parallelism With Relational Operators
Consider a simple sequential query.

A Relational Dataflow Graph
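As an illustrative stand-in for a dataflow graph, the sketch below parallelizes a sort: each input partition is scanned, a split operator routes every tuple to the sort clone that owns its key range, the clones sort in parallel, and a merge concatenates the runs. Because the ranges are disjoint and ordered, concatenation yields fully sorted output. This is one common way to build such a graph, not necessarily the one in the slide's figure; the boundaries, data and function names are hypothetical, and threads stand in for processors.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor


def split(rows, boundaries):
    """Split operator: route each tuple to the sort clone that owns its key range."""
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for r in rows:
        buckets[bisect.bisect_right(boundaries, r)].append(r)
    return buckets


def parallel_sort(input_partitions, boundaries):
    n = len(boundaries) + 1
    # Scan + split: every input partition sends its tuples to the range streams.
    streams = [[] for _ in range(n)]
    for part in input_partitions:
        for i, bucket in enumerate(split(part, boundaries)):
            streams[i].extend(bucket)
    # Sort: one sort clone per key range, run in parallel.
    with ThreadPoolExecutor(max_workers=n) as pool:
        sorted_runs = list(pool.map(sorted, streams))
    # Merge: ranges are disjoint and ordered, so concatenation is enough.
    return [r for run in sorted_runs for r in run]


data = [[42, 7, 19], [3, 88, 50], [61, 25, 11]]
print(parallel_sort(data, boundaries=[20, 50]))  # [3, 7, 11, 19, 25, 42, 50, 61, 88]
```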

Famous Implementations Of Parallel Databases

Teradata, Tandem NonStop SQL, Gamma, The Super Database Computer, Bubba, nCUBE

The Future

Parallel Databases- The Future


Research Problems
Parallel Query Optimization
Application Program Parallelism
Physical Database Design
On-line Data Reorganization and Utilities

Future Directions
Many commercial success stories, but research issues remain unresolved.
Some applications are not well supported by the relational data model.
Object-oriented design?

Thank You

Grilling Time!!
