
Incremental Detection of Inconsistencies in Distributed Data
Existing System:
Real-life data is often dirty. To clean the data, efficient algorithms for detecting errors
have to be in place. Errors in the data are typically detected as violations of constraints (data
quality rules), such as functional dependencies (FDs), denial constraints, and conditional
functional dependencies (CFDs). When the data is in a centralized database, it is known that two
SQL queries suffice to detect its violations of a set of CFDs. It is increasingly common to find
data partitioned vertically or horizontally and distributed across different sites. This is
highlighted by the recent interest in SaaS and Cloud computing, MapReduce and columnar
DBMS. In the distributed setting, however, it is much harder to detect errors in the data.
1. No conditional functional dependencies
2. No error detection in distributed data
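The centralized case mentioned above can be illustrated with a minimal sketch, assuming a hypothetical relation `cust(cc, ac, zip, street, city)` and two hypothetical CFDs: a constant one, (cc = '44', ac = '131') -> city = 'EDI', and a variable one, (cc = '44', zip) -> street. The table, column names and rules are assumptions made for illustration. The sketch uses one SQL query per kind of violation and is not the general two-query encoding for an arbitrary set of CFDs.

```python
import sqlite3


def detect_cfd_violations(conn):
    """Return (row ids violating the constant CFD, zips violating the variable CFD)."""
    # Query 1: single-tuple violations of the constant CFD
    # (cc = '44', ac = '131') -> city = 'EDI'.
    constant = conn.execute(
        "SELECT rowid FROM cust "
        "WHERE cc = '44' AND ac = '131' AND city <> 'EDI'"
    ).fetchall()
    # Query 2: multi-tuple violations of the variable CFD
    # (cc = '44', zip) -> street, i.e. one zip mapped to several streets.
    variable = conn.execute(
        "SELECT zip FROM cust WHERE cc = '44' "
        "GROUP BY zip HAVING COUNT(DISTINCT street) > 1"
    ).fetchall()
    return sorted(r[0] for r in constant), sorted(r[0] for r in variable)


# Hypothetical sample data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cust (cc TEXT, ac TEXT, zip TEXT, street TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO cust VALUES (?, ?, ?, ?, ?)",
    [
        ("44", "131", "EH4 8LE", "Mayfield", "EDI"),
        ("44", "131", "EH4 8LE", "Crichton", "EDI"),  # same zip, other street
        ("44", "131", "EH2 4HF", "Preston", "NYC"),   # wrong city for this pattern
        ("01", "908", "07974", "Mtn Ave", "MH"),      # matches neither pattern
    ],
)
constant_violations, variable_violations = detect_cfd_violations(conn)
```

Both queries need all the relevant attributes at one site, which is why the same check becomes harder once the relation is partitioned and distributed.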
GLOBALSOFT TECHNOLOGIES
IEEE PROJECTS & SOFTWARE DEVELOPMENTS
IEEE FINAL YEAR PROJECTS|IEEE ENGINEERING PROJECTS|IEEE STUDENTS
PROJECTS|IEEE
BULK PROJECTS|BE/BTECH/ME/MTECH/MS/MCA PROJECTS|CSE/IT/ECE/EEE
PROJECTS
CELL: +91 9!9" #9$"% +91 99&&' #"(% +91 9!9" "(9$% +91 9($1!
!$!$1
V)*)+: ,,,-.)/012304546738+*-649 M0)1 +6:)333.)/01*3:546738+*;9:0)1-86:
Proposed System:
Techniques have been proposed to reduce data shipment, e.g., counters, pointers, and tags in base
relations. While these could be incorporated into our solution, they do not yield bounded or optimal
incremental detection algorithms. There has also been a host of work on query processing and
multi-query optimization for distributed data. The former typically aims to generate
distributed query plans that reduce data shipment or response time. Optimization strategies, e.g.,
semi-joins, bloomjoins, and more recent techniques, have proved useful in main-memory distributed
databases (e.g., MonetDB and H-Store), and in cloud computing and MapReduce. Our algorithms
leverage techniques from this line of work to reduce data shipment when validating multiple CFDs,
in particular.
1. Reduce data shipment or response time
2. Error Detection
3. Vertical Partitions
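As an illustration of how a semi-join cuts data shipment between sites, here is a minimal sketch, assuming two hypothetical vertical partitions that share a tuple identifier `id`. The function name, the row layout and the way shipment is counted are assumptions for illustration only. This is the textbook semi-join, not the detection algorithm of the proposed system.

```python
def semi_join(site_a, site_b, key):
    """Join rows of site_a with rows of site_b on `key` using a semi-join.

    Returns the joined rows and the number of items shipped between sites
    (join-key values sent from A to B plus rows sent back from B to A).
    """
    # Step 1: site A ships only its distinct join-key values to site B.
    keys_from_a = {row[key] for row in site_a}
    # Step 2: site B ships back only the rows whose key occurs at site A.
    reduced_b = [row for row in site_b if row[key] in keys_from_a]
    # Step 3: site A completes the join locally through a hash lookup.
    by_key = {}
    for row in reduced_b:
        by_key.setdefault(row[key], []).append(row)
    joined = [{**a, **b} for a in site_a for b in by_key.get(a[key], [])]
    shipped = len(keys_from_a) + len(reduced_b)
    return joined, shipped


# Hypothetical partitions: zip codes live at site A, streets at site B.
site_a = [{"id": 1, "zip": "EH4 8LE"}, {"id": 2, "zip": "EH2 4HF"}]
site_b = [
    {"id": 1, "street": "Mayfield"},
    {"id": 2, "street": "Preston"},
    {"id": 3, "street": "Main"},
    {"id": 4, "street": "High"},
    {"id": 5, "street": "Park"},
]
joined, shipped = semi_join(site_a, site_b, "id")
```

Only the rows of site B that can take part in the join cross the network. A bloomjoin follows the same idea but ships a compact Bloom filter of the keys instead of the keys themselves.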
Key Features
First, we are currently experimenting with real-life datasets from different
applications, to find out when incremental detection is most effective. Second,
we also intend to extend our algorithms to data that is partitioned both
vertically and horizontally. Third, we plan to develop MapReduce algorithms
for incremental violation detection. Fourth, we are to extend our approach to
support constraints defined in terms of similarity predicates (e.g., matching
dependencies for record matching) beyond equality comparison, for which
hash-based indices may not work and more robust indexing techniques need
to be explored.
Algorithm
Incremental algorithm
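The idea of incremental detection can be sketched as follows, assuming a single site and one functional dependency lhs -> rhs. The class and method names are hypothetical. The sketch keeps a hash-based index so that each update inspects only the group of tuples sharing the updated lhs value. It illustrates the general idea and is not the bounded or optimal distributed algorithm itself.

```python
from collections import defaultdict


class IncrementalFDChecker:
    """Maintains violations of one FD lhs -> rhs under inserts and deletes."""

    def __init__(self):
        # Hash index: lhs value -> rhs value -> ids of tuples with that pair.
        self.index = defaultdict(lambda: defaultdict(set))

    def _violating(self, lhs):
        # A group violates the FD when one lhs value maps to several rhs values.
        group = self.index.get(lhs, {})
        if len(group) <= 1:
            return set()
        return set().union(*group.values())

    def insert(self, tid, lhs, rhs):
        """Insert a tuple and return the ids that newly become violations."""
        before = self._violating(lhs)
        self.index[lhs][rhs].add(tid)
        return self._violating(lhs) - before

    def delete(self, tid, lhs, rhs):
        """Delete a tuple and return the ids that stop being violations."""
        before = self._violating(lhs)
        ids = self.index[lhs][rhs]
        ids.discard(tid)
        if not ids:
            del self.index[lhs][rhs]
        return before - self._violating(lhs)


# Hypothetical usage: zip -> street, updated one tuple at a time.
checker = IncrementalFDChecker()
added_first = checker.insert(1, "EH4 8LE", "Mayfield")
added_second = checker.insert(2, "EH4 8LE", "Crichton")
```

Each call returns only the change to the violation set, so the cost of an update depends on the size of the affected group and not on the size of the whole relation.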
