
Advanced Databases: Introduction

dr. Toon Calders
prof. dr. Jan Paredaens

Outline
  Motivation for the course
  Other DH courses
  Practical organization
  Course topics
  Project
  Overview of changes

Motivation for the Course


Database = a piece of software to handle data:
  store, maintain, and query

The ideal system depends on the situation:
  data type: simple / semi-structured / complex / ...
  types of queries: simple lookup / analytical / ...
  type of usage: multi-user / single-user / distributed / ...

Motivation for the Course


Relational databases are tuned towards:
  simple data
  simple, ad-hoc queries
  multiple users

Other models are more suitable for other types of data:
  Object-Oriented, Deductive, Semi-Structured Databases, Data warehouses

Motivation for the Course


Study different data models
  Advantages, disadvantages
  Conceptual level: what are the important notions?
  What's underneath?

In a scientific way
  exact, not just claims

Motivation for the Course


Student knows:
  different database models

Understands:
  why they are introduced
  conceptual notions

Is able to:
  quickly master vendor-specific products

Other DH Courses
Relational database systems
  (2ID05) Databases and Data Modelling
  (2ID35) Database Technology
    transactions, indexing, query optimization, distributed DB

Other database models
  (2ID45) Advanced Databases
  (2II15) Data Mining
  (2ID25) Information Retrieval
  (2ID99) Capita Selecta DH

Practical Organization
In principle:

Wed 8:45 - 10:30  Practical session  M 1.46
  no new material
  opportunity to practice, ask questions
  solve exercises together

Fri 10:45 - 12:30  Lectures  HG 6.09
  XML: Paredaens (6 lectures)
  other parts: Calders

Practical Organization
Important information
  http://wwwis.win.tue.nl/~tcalders/teaching/advancedDB/

Subscribe to 2ID45 on studyweb!
  messages to the whole class group: lecture postponed, room changes, ...

  t.calders@tue.nl

Practical Organization
Course material
  Book: Silberschatz, Korth, Sudarshan. Database System Concepts, 5th edition. McGraw-Hill International.
  Lots of additional material on course webpage:
    papers
    slides
    solutions to exercises

Practical Organization
Grades:
  70% written exam
  30% group project

No project = no grade
The grade for the project can be transferred to August; the same holds for the grade for the exam
Grades expire in August

Course Topics
  Limitations of the relational model
  Deductive databases
  Object-Oriented Databases
  Data Warehousing & OLAP
  Semi-Structured data

Limitations of the relational model


Not every query can be expressed
  Transitive closure cannot be expressed in Relational Algebra
    Give all cities reachable from Antwerp by plane
    Give all smallest components of a part
    Give all descendants of person X
  Not even if you're very smart: proof
  Extension to other relational query languages
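
The limitation can be made concrete with a small Python sketch. It assumes a hypothetical relation `flight` of direct connections (all city names except Antwerp are made up). A fixed expression contains a fixed number of joins, so it only follows paths up to a fixed length; finding everything reachable needs the join to be repeated until nothing new appears, and the number of rounds depends on the data. This is only an illustration of the intuition, not the proof given in the lecture.

```python
# Hypothetical direct flights (illustrative data, not from the course).
flight = {("Antwerp", "London"), ("London", "Oslo"), ("Oslo", "Helsinki")}


def compose(r, s):
    """Join and project: pairs (x, z) with (x, y) in r and (y, z) in s."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}


def paths_up_to(edges, k):
    """What a fixed expression with k - 1 joins can see: paths of length <= k."""
    result, step = set(edges), set(edges)
    for _ in range(k - 1):
        step = compose(step, edges)
        result |= step
    return result


def transitive_closure(edges):
    """Repeat the join until a fixpoint; the number of rounds depends on the data."""
    closure = set(edges)
    while True:
        new = compose(closure, edges) - closure
        if not new:
            return closure
        closure |= new


def reachable_from(pairs, city):
    """All destinations paired with the given source city."""
    return {dst for (src, dst) in pairs if src == city}


# Two hops are not enough to find Helsinki; the fixpoint finds it.
print(reachable_from(paths_up_to(flight, 2), "Antwerp"))
print(reachable_from(transitive_closure(flight), "Antwerp"))
```

Adding one more flight to the chain would again defeat any fixed `k`, while `transitive_closure` keeps working unchanged.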

Deductive Databases
Motivation is two-fold:
  Add deductive capabilities to databases; the database contains:
    facts (extensional relations)
    rules to generate derived facts (intensional relations)
    Database is knowledge base
  Extend the querying
    Datalog allows for recursion

Deductive Databases
Datalog as engine of deductive databases
  similarities with Prolog
  has facts and rules
  rules define (possibly recursive) views

Semantics not always clear
  safety
  negation
  recursion

Deductive Databases
g(a,b).
g(b,c).
g(a,d).

reach(X,X) :- g(X,Y).
reach(X,Y) :- g(X,Y).
reach(X,Z) :- reach(X,Y), reach(Y,Z).

node(X) :- g(X,Y).
node(Y) :- g(X,Y).

unreach(X,Y) :- node(X), node(Y), not reach(X,Y).
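
To see what this program derives, the following Python sketch evaluates it by hand. It is a naive bottom-up evaluation written only for this one program, not a general Datalog engine: the recursive `reach` rules are iterated to a fixpoint first, and the negated rule for `unreach` is evaluated afterwards, once `reach` is complete. The function names are made up for this illustration.

```python
# Facts: g(a,b). g(b,c). g(a,d).
g = {("a", "b"), ("b", "c"), ("a", "d")}


def eval_reach(g):
    """First stratum: least fixpoint of the three reach rules."""
    reach = set()
    while True:
        derived = set()
        # reach(X,X) :- g(X,Y).
        derived |= {(x, x) for (x, _) in g}
        # reach(X,Y) :- g(X,Y).
        derived |= set(g)
        # reach(X,Z) :- reach(X,Y), reach(Y,Z).
        derived |= {(x, z) for (x, y1) in reach for (y2, z) in reach if y1 == y2}
        if derived <= reach:
            return reach
        reach |= derived


def eval_node(g):
    """node(X) :- g(X,Y).  node(Y) :- g(X,Y)."""
    return {x for (x, _) in g} | {y for (_, y) in g}


def eval_unreach(node, reach):
    """Second stratum: reach is complete, so 'not reach(X,Y)' is a set test."""
    return {(x, y) for x in node for y in node if (x, y) not in reach}


reach = eval_reach(g)
node = eval_node(g)
unreach = eval_unreach(node, reach)
print(sorted(reach))
print(sorted(unreach))
```

On these facts `reach` holds six pairs, including the derived `reach(a,c)`, and `unreach` holds the remaining ten of the sixteen node pairs. Evaluating `unreach` before `reach` has reached its fixpoint would give wrong answers, which is one reason negation combined with recursion needs care.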

Deductive Databases
In this topic we study:
  How to handle negation and recursion in the same program
  How to efficiently evaluate Datalog queries

OO Databases
Many applications require the storage and manipulation of complex data
  design databases
  geometric databases

Object-Oriented programming languages manipulate complex objects
  classes, methods, inheritance, polymorphism

OO Databases
Very simple example:
  Class book
    set of authors
    title
    set of keywords

Extremely simple to model in OO language
Hard in relational database!

OO Databases
In many applications persistency of the data is nevertheless required
  protection against system failure
  consistency of the data

Mapping an object in an OO language to tuples of atomic values in a relational database is often problematic

OO Databases
Either we ignore the multivalued dependencies

Title                     Author        Keyword
Database System Concepts  Silberschatz  Database
Database System Concepts  Korth         Database
Database System Concepts  Sudarshan     Database
Database System Concepts  Silberschatz  Storage
Database System Concepts  Korth         Storage
Database System Concepts  Sudarshan     Storage

This table is in 3NF, BCNF

OO Databases
Or we go to 4NF

Title                     Author
Database System Concepts  Silberschatz
Database System Concepts  Korth
Database System Concepts  Sudarshan

Title                     Keyword
Database System Concepts  Database
Database System Concepts  Storage
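
The two relational options for the book example can be compared with a short Python sketch. The `Book` class below is a hypothetical rendering of the "Class book" slide; the sketch shows the single flat table (every author paired with every keyword) next to the 4NF decomposition, and checks that joining the two smaller tables gives back the flat one.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Book:
    """Hypothetical OO model of a book: one title, a set of authors, a set of keywords."""
    title: str
    authors: frozenset
    keywords: frozenset


book = Book(
    title="Database System Concepts",
    authors=frozenset({"Silberschatz", "Korth", "Sudarshan"}),
    keywords=frozenset({"Database", "Storage"}),
)

# Option 1: one flat table; every author is paired with every keyword.
flat = {(book.title, a, k) for a, k in product(book.authors, book.keywords)}

# Option 2: 4NF decomposition into two tables.
book_author = {(book.title, a) for a in book.authors}
book_keyword = {(book.title, k) for k in book.keywords}

# Joining the two 4NF tables on title reconstructs the flat table.
rejoined = {
    (t1, a, k)
    for (t1, a) in book_author
    for (t2, k) in book_keyword
    if t1 == t2
}

print(len(flat), len(book_author), len(book_keyword))  # 6 3 2
print(rejoined == flat)                                # True
```

In the OO version the book is a single object; in either relational version it is spread over several tuples that have to be put back together by the application.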

OO Databases
Basically OODB = persistent OO programming language
  Very important concept, but rather uninteresting scientifically

This topic will mainly be self-study
  Reading book chapter + Q&A session

Data Warehousing & OLAP


[Figure: data warehouse architecture, in four stages]
  Data Sources: operational DBs, other sources
    Extract, Transform, Load, Refresh
  Data Storage: Data Warehouse, Data Marts, Metadata, Monitor & Integrator
    Serve
  OLAP Engine: OLAP Server, ROLAP Server
  Front-End Tools: Analysis, Query/Reporting, Data Mining

Data Warehousing & OLAP


Transaction processing                  Flight reservations
  Operational setting                     ticket sales
  Up-to-date = critical                   do not sell a seat twice
  Simple data                             reservation, date, name
  Simple queries; only touch a            Give flight details of X
  small part of the database              List flights to Y

Data Warehousing & OLAP


Decision support                        Flight company
  Off-line setting                        Evaluate ROI flights
  Historical data                         Flights of last year
  Summarized data                         # passengers per carrier for destination X
  Integrate different databases           Passengers, fuel costs, maintenance info
  Statistical queries                     Average % of seats sold/month/destination
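
The difference between the two kinds of queries can be sketched with Python's built-in `sqlite3` module. The `flight` table, its columns, and all rows are made up for this illustration. The first query is a transaction-processing style lookup that touches one row; the other two are decision-support style aggregates that scan and summarize many rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flight (flight_no TEXT, carrier TEXT, destination TEXT, "
    "month TEXT, seats INTEGER, seats_sold INTEGER)"
)
# Made-up rows, for illustration only.
conn.executemany(
    "INSERT INTO flight VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("F1", "CarrierA", "X", "2008-01", 100, 80),
        ("F2", "CarrierA", "X", "2008-02", 100, 60),
        ("F3", "CarrierB", "X", "2008-01", 200, 150),
        ("F4", "CarrierB", "Y", "2008-01", 200, 100),
    ],
)

# Transaction processing: give flight details of one flight.
details = conn.execute(
    "SELECT carrier, destination, month FROM flight WHERE flight_no = ?",
    ("F1",),
).fetchone()

# Decision support: number of passengers per carrier for destination X.
per_carrier = conn.execute(
    "SELECT carrier, SUM(seats_sold) FROM flight "
    "WHERE destination = ? GROUP BY carrier ORDER BY carrier",
    ("X",),
).fetchall()

# Decision support: average % of seats sold per month and destination.
pct_sold = conn.execute(
    "SELECT month, destination, AVG(100.0 * seats_sold / seats) FROM flight "
    "GROUP BY month, destination ORDER BY month, destination"
).fetchall()

print(details)
print(per_carrier)
print(pct_sold)
```

On a real warehouse the aggregate queries would run over historical data integrated from several operational databases, which is why they are usually kept away from the system that sells the tickets.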

Data Warehousing & OLAP


In this topic we will study:
  Conceptual models for decision support
  Database explosion problem
  Efficient implementation strategies
    indexing, view materialization
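
One common reading of the database explosion problem is that the number of possible aggregate views grows exponentially with the number of dimensions: every subset of the dimensions is a candidate group-by. The Python sketch below enumerates them for three hypothetical dimensions; the dimension names are assumptions for illustration, and the lecture may define the problem differently.

```python
from itertools import combinations

# Hypothetical dimensions of a flight data cube (assumed for illustration).
dimensions = ("carrier", "destination", "month")


def all_group_bys(dims):
    """Every subset of the dimensions is one candidate aggregate view."""
    return [
        subset
        for size in range(len(dims) + 1)
        for subset in combinations(dims, size)
    ]


views = all_group_bys(dimensions)
print(len(views))  # 8, i.e. 2 ** 3
for view in views:
    print(view)
```

With ten dimensions there are already 1024 candidate views, so materializing all of them is rarely an option, and choosing which views to materialize becomes part of the implementation strategy.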

XML
Why is XML important?
  simple
  open
  non-proprietary
  widely accepted data exchange format

XML is like HTML but
  no fixed set of tags
    X = extensible
  no fixed semantics (or representation) of tags
    representation determined by separate stylesheet
    semantics determined by application
  no fixed structure
    user-defined schemas

XML
<PersonList Type="Student" Date="2004-12-12">
  <Title Value="Student List"/>
  <Contents>
    <Person>
      <Name>Jan Vijs</Name>
      <Id>11</Id>
      <Address>
        <Number>123</Number>
        <Street>Turnstreet</Street>
      </Address>
    </Person>
    <Person>
      <Id>66</Id>
      <Address>
        <Street>Hole Rd</Street>
      </Address>
    </Person>
  </Contents>
</PersonList>
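
The "no fixed structure" point shows up as soon as this document is processed. The following Python sketch uses the standard-library `xml.etree.ElementTree` module to read the example; the second `Person` has no `Name` and no `Number`, so the application has to cope with missing elements. The dictionary keys are chosen for this illustration only.

```python
import xml.etree.ElementTree as ET

doc = """
<PersonList Type="Student" Date="2004-12-12">
  <Title Value="Student List"/>
  <Contents>
    <Person>
      <Name>Jan Vijs</Name>
      <Id>11</Id>
      <Address>
        <Number>123</Number>
        <Street>Turnstreet</Street>
      </Address>
    </Person>
    <Person>
      <Id>66</Id>
      <Address>
        <Street>Hole Rd</Street>
      </Address>
    </Person>
  </Contents>
</PersonList>
"""

root = ET.fromstring(doc.strip())

people = []
for person in root.iter("Person"):
    people.append({
        # findtext returns None when the element is absent.
        "name": person.findtext("Name"),
        "id": person.findtext("Id"),
        "street": person.findtext("Address/Street"),
        "number": person.findtext("Address/Number"),
    })

print(root.get("Type"), root.find("Title").get("Value"))
for entry in people:
    print(entry)
```

A relational table would need NULLs or extra tables for the missing values; here the elements are simply not present.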

XML
In this topic:
  XML
  XQuery, XSLT
  LiXQuery

Taught by prof. Paredaens

Project
Pick one of the 4 topics:
  deductive databases / rule-based systems
  object-oriented databases
  data warehouses
  semi-structured databases

Formulate your own project
  illustrating the different course concepts
  showing you mastered the technology

Project
Make a project proposal (WEEK 10)
  examples of last year will be given
  fulfilling certain constraints
  listing technologies to be used

Status report (WEEK 15)
Final report (WEEK 20)
Project presentations (WEEKS 21 & 22)

Overview of Changes
First some facts and figures regarding Spring 2008
  Heterogeneous group
  [Charts: student background (outside NL, HBO, BSc TU/e) and programme (CSE, BIS)]

Overview of Changes
Some suggestions I decided to act upon:
  1. Start with the difficult material:
       expressiveness of RA, Gaifman locality
  2. Too much time is being spent on XML
       (5+5) -> (6+3), and a topic (XSLT) has been added
  3. Disproportionate weight given to XML in exam
       project no longer exclusively XML

Overview of Changes
Some suggestions I decided to act upon:
  4. Some materials and instruction just too hard
       extra exercises will be added; more modular
  5. The course was split up in lots of individual subjects, with no apparent relation to one another
       tried to handle that in the course motivation

Overview of Changes
Some suggestions that were ignored:
  "A google for 'advanced databases' returns quite some courses from other universities that look interesting to me. Perhaps the lecturers could take a look at those."
    When (re-)constructing the course last year, other universities' ADB courses were surveyed. Many of the interesting topics are already handled in other courses (Data Mining, Information Retrieval, Database Technology).

Overview of Changes
Some suggestions that were ignored:
  "Don't discuss prerequisite knowledge too much, it is prerequisite."
    Heterogeneous group.
  "Balance the course subjects more, TC was discussed very specific while the other 3 subjects were treated in global."
    Time spent on TC is justified by its difficulty and its importance for database theory; it also motivates OODB & Deductive DB.

Overview of Changes
Take-away message
  (some?) lecturers do act on questionnaires
  filling out the questionnaires is useful

Summary
Relational model has limitations
  simple queries
  simple data

OODBs allow complex data types
Deductive databases, Datalog: complex queries
Somewhere in-between: data warehouses and OLAP
  special requirements, special data structures

Semi-structured data can be stored in XML
Project complements theoretical lectures
Instructions for clarification

!! See you on Friday !!