Data Integrity Workshop
Washington, D.C.
November 14-15, 2008
by
Sen-Yoni Musingo, Ph.D.
1
2
Data Integrity Components
3
Accuracy in Data Integrity
• Accuracy refers to how close a measurement
(e.g., gender) is to the expected value (male or
female).
• In archery, an arrow represents a measurement
and the bulls-eye represents the expected or
accepted value. Accuracy corresponds to the
distance between the arrows and the bulls-eye.
• Your data is accurate if it is clean and
precise!
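
To make the archery picture concrete, each recorded value can be treated as an arrow and a trusted reference value as the bulls-eye. The sketch below is only an illustration: the function names and the sample values are assumptions, not part of the workshop material.

```python
def accuracy_rate(recorded, expected):
    """Share of recorded values that match the expected (reference) values."""
    if len(recorded) != len(expected) or not recorded:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(1 for r, e in zip(recorded, expected) if r == e)
    return matches / len(recorded)


def mean_absolute_error(measured, expected):
    """Average distance between numeric measurements and the expected values."""
    if len(measured) != len(expected) or not measured:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(m - e) for m, e in zip(measured, expected)) / len(measured)


# Hypothetical sample: gender as entered vs. a verified reference.
recorded_gender = ["male", "female", "femle", "male"]
expected_gender = ["male", "female", "female", "male"]
print(accuracy_rate(recorded_gender, expected_gender))  # 0.75
```

For categorical fields such as gender, a match rate is the natural measure; for numeric fields, the mean absolute error plays the role of the distance to the bulls-eye.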
6
Accuracy in Data Integrity
If it stinks, you need to wash it!
It Stinks!
Very Clean
8
Accuracy in Data Integrity
How Precise Are Your Data?
Bulls-Eye Every Time?
13
Accuracy in Data Integrity
Data Precision
Also called reproducibility or repeatability, precision is the
degree to which further reporting of the data
shows the same or similar results.
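
As a rough illustration of repeatability, the spread of the same figure across repeated reports can be measured. The sketch below uses the population standard deviation from Python's `statistics` module; the function name and the enrollment counts are hypothetical.

```python
import statistics


def repeatability(reports):
    """Spread of one figure across repeated reports; 0.0 means identical runs."""
    if len(reports) < 2:
        raise ValueError("need at least two reports to assess repeatability")
    return statistics.pstdev(reports)


# Hypothetical: the same enrollment count reported in three successive runs.
stable_runs = [1200, 1200, 1200]
drifting_runs = [1200, 1187, 1215]
print(repeatability(stable_runs))        # 0.0
print(repeatability(drifting_runs) > 0)  # True
```

A spread of zero says the reports are precise, not that they are accurate: three identical reports can all be equally far from the bulls-eye.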
14
Consistency in Data Integrity
Data Consistency is achieved through:
Standardization
Integration
Automation
Replication
Synchronization
16
Consistency in Data Integrity
Whose standards?
17
Consistency in Data Integrity
Standardization
A process of achieving agreement on
common data definitions, representation, and
structures to which all data layers must
conform
Without Standardization:
• Data exchange and interoperability are
problematic and costly
• Data cannot be aligned with the enterprise
architecture
• Data quality and consistency are compromised
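
A minimal sketch of what a common data definition can look like in practice, assuming (hypothetically) that every data layer has agreed to store gender as "M" or "F". The mapping and the layer names are invented for illustration.

```python
# Hypothetical agreed-upon definition: gender is stored as "M" or "F".
GENDER_STANDARD = {
    "m": "M", "male": "M",
    "f": "F", "female": "F",
}


def standardize_gender(raw):
    """Return the standard code, or None when the value fits no definition."""
    return GENDER_STANDARD.get(raw.strip().lower())


# The same attribute as three hypothetical data layers might represent it.
layers = {"district": "Male", "state": " f ", "legacy": "unknown"}
print({source: standardize_gender(value) for source, value in layers.items()})
# {'district': 'M', 'state': 'F', 'legacy': None}
```

Returning `None` for values that fit no definition, rather than guessing, keeps nonconforming data visible so it can be corrected at the source.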
18
Consistency in Data Integrity
Integration
A process of combining data from different sources and
providing the user with a unified view of these data.
ETL is an integration process used in data warehousing to extract data
from outside sources, transform these data to fit business needs, and
load these data into the warehouse.
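
The three ETL steps can be sketched with the Python standard library. Everything specific here is an assumption made for illustration: the CSV layout, the `scores` table, and the transformation rules. An in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical outside source: a CSV export with inconsistent formatting.
SOURCE_CSV = "student_id,gender,score\n101, male ,88\n102,FEMALE,93\n"


def extract(text):
    """Extract: read rows from the outside source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: fit the rows to the (assumed) warehouse conventions."""
    return [
        (int(row["student_id"]), row["gender"].strip().upper()[0], int(row["score"]))
        for row in rows
    ]


def load(records, connection):
    """Load: write the transformed records into the warehouse table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS scores "
        "(student_id INTEGER PRIMARY KEY, gender TEXT, score INTEGER)"
    )
    connection.executemany("INSERT INTO scores VALUES (?, ?, ?)", records)
    connection.commit()


warehouse = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), warehouse)
print(warehouse.execute("SELECT * FROM scores ORDER BY student_id").fetchall())
# [(101, 'M', 88), (102, 'F', 93)]
```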
20
Consistency in Data Integrity
Automation
A process that uses a computerized control
system to reduce or minimize the need for
human intervention
Its goal is twofold:
• To avoid mistakes in data entry by making the initial
entering of the data as automatic as possible.
Different situations require different automation
methods and equipment
• To avoid having to re-enter data to perform a different
task with it.
22
Consistency in Data Integrity
How Automated Is Your Data System?
23
Consistency in Data Integrity
Replication and Synchronization
24
Consistency in Data Integrity
Replication or Mirroring
A process used to generate and manage
multiple copies of data at one or more sites,
allowing employees to stay connected to
essential business information and applications
Data replication also provides a backup system
in case of a catastrophic failure
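
One simple way to check that copies held at several sites still match the original is to compare checksums. This is a sketch under stated assumptions: plain dictionaries stand in for storage at each site, and the function names are hypothetical.

```python
import hashlib


def checksum(data):
    """SHA-256 fingerprint of a block of bytes."""
    return hashlib.sha256(data).hexdigest()


def replicate(data, sites):
    """Write a copy of the data to every site (a dict stands in for storage)."""
    for site in sites:
        site["copy"] = bytes(data)


def replicas_match(data, sites):
    """True when every site's copy has the same checksum as the original."""
    expected = checksum(data)
    return all(checksum(site["copy"]) == expected for site in sites)


primary = b"enrollment,2008,1200\n"
sites = [{}, {}]
replicate(primary, sites)
print(replicas_match(primary, sites))  # True

sites[1]["copy"] = b"enrollment,2008,1201\n"  # simulate a corrupted copy
print(replicas_match(primary, sites))  # False
```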
25
Consistency in Data Integrity
Synchronization
• A process used to consolidate data being
moved from system to system
• Bad data is never spread from system to
system, so the information delivered
across the enterprise is up-to-date,
consistent and accurate
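
The idea that bad data should not spread can be sketched as a synchronization step that validates each record before copying it and only overwrites older versions. The validation rule, the `updated` counter, and the record layout below are assumptions made for illustration.

```python
def is_valid(record):
    """Hypothetical rule: a record needs an id and a score from 0 to 100."""
    return record.get("id") is not None and 0 <= record.get("score", -1) <= 100


def synchronize(source, target):
    """Copy valid, newer records from source into target; return the rejects."""
    rejected = []
    for record in source:
        if not is_valid(record):
            rejected.append(record)  # bad data stays where it is
            continue
        current = target.get(record["id"])
        if current is None or record["updated"] > current["updated"]:
            target[record["id"]] = record
    return rejected


source_system = [
    {"id": 1, "score": 91, "updated": 3},
    {"id": 2, "score": 250, "updated": 5},  # out of range: must not spread
]
target_system = {1: {"id": 1, "score": 85, "updated": 1}}
rejected = synchronize(source_system, target_system)
print(target_system[1]["score"], len(rejected))  # 91 1
```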
26
Consistency in Data Integrity
Can You Synchronize and Replicate Your Data?
27
Reliability in Data Integrity
Reliability
28
Reliability in Data Integrity
• Reliability = Accuracy + Consistency + More…
• Data are reliable also when they are:
  • Complete: contain all the data elements needed for
  the intended purposes of use
  • Timely: accessible and available to users as
  needed, when needed
  • Valid: represent what is being measured
  • Secure: protected against malicious or
  unintentional alterations
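
Three of these properties (complete, timely, valid) lend themselves to simple automated checks on each record. The sketch below is illustrative only: the required fields, the allowed gender codes, and the submission deadline are assumptions. Security is not covered here.

```python
from datetime import date

REQUIRED_FIELDS = ("student_id", "gender", "submitted_on")  # assumed
VALID_GENDERS = {"M", "F"}                                  # assumed


def check_record(record, deadline):
    """Return a list of reliability problems found in one record."""
    problems = []
    missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
    if missing:
        problems.append("incomplete: missing " + ", ".join(missing))
    gender = record.get("gender")
    if gender not in (None, "") and gender not in VALID_GENDERS:
        problems.append("invalid: gender " + repr(gender))
    submitted = record.get("submitted_on")
    if isinstance(submitted, date) and submitted > deadline:
        problems.append("late: submitted after deadline")
    return problems


deadline = date(2008, 11, 14)
record = {"student_id": 101, "gender": "X", "submitted_on": date(2008, 11, 20)}
print(check_record(record, deadline))
# ["invalid: gender 'X'", 'late: submitted after deadline']
```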
29
Reliability in Data Integrity
How Good Are Your Data Sources?
31
Reliability in Data Integrity
Your Data Are As Good As Your Sources!
32
Reliability in Data Integrity
How Useful Are Your Data?
35
Reliability in Data Integrity
How Timely Are Your Data?
36
Reliability in Data Integrity
Timeliness
• Making data available in the form
needed, when needed, and where
needed
• Timeliness of Data Collection and
Submission
• Timeliness of Data Processing, Analysis
and Reporting
37
Reliability in Data Integrity
How Accessible and Visible Are Your Data?
In Black Hole?
38
Reliability in Data Integrity
……Out of the Black Hole?
39
Data Integrity Drivers
40
Data Integrity Drivers
Do You Collaborate with Your Stakeholders?
42
Data Integrity Drivers
How Strong is Your Collaboration?
43
Data Integrity Drivers
Do You Have A Data Integrity Workgroup?
44
Data Integrity Drivers
Does the Workgroup Meet Regularly?
45
Data Integrity Drivers
Business Rules
• Do you have consistent and coherent
business rules for collection, submission,
maintenance and use of data?
• What is the role of the Data Integrity
Workgroup?
46
Data Integrity Drivers
Data Meltdown
47
Data Integrity Drivers
Some Safeguards:
• Data should be physically, technically,
and logically secured!
• Should have policies and procedures to
ensure that sensitive data access is on a
“need to know” basis
• Should have user authentication to
provide assurance as to who is
accessing what, when and how
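
The last two safeguards can be sketched together: authenticate the user, check a need-to-know grant, and record who accessed what, when and how. The user table, the grants, and the dataset names are hypothetical. The unsalted SHA-256 hash only keeps the example short and is not an adequate way to store real passwords.

```python
import hashlib
import hmac
from datetime import datetime, timezone

# Hypothetical users and need-to-know grants.
# Unsalted SHA-256 is for brevity only; it is not suitable for real passwords.
USERS = {"analyst": hashlib.sha256(b"s3cret").hexdigest()}
GRANTS = {"analyst": {"enrollment"}}
AUDIT_LOG = []  # entries: (when, who, what, how, allowed)


def authenticate(user, password):
    """True when the supplied password matches the stored hash for the user."""
    stored = USERS.get(user)
    supplied = hashlib.sha256(password.encode()).hexdigest()
    return stored is not None and hmac.compare_digest(stored, supplied)


def read_dataset(user, password, dataset):
    """Return the dataset only to authenticated users with a grant for it."""
    allowed = authenticate(user, password) and dataset in GRANTS.get(user, set())
    AUDIT_LOG.append((datetime.now(timezone.utc), user, dataset, "read", allowed))
    if not allowed:
        raise PermissionError(f"{user} may not read {dataset}")
    return f"contents of {dataset}"


print(read_dataset("analyst", "s3cret", "enrollment"))  # contents of enrollment
```

Every attempt is logged, including refused ones, so the log can answer who tried to access what, and when.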
48
Data Integrity Drivers
Malicious Altering
• Internal threat from users: conscious
and intentional attack
• External threats: viruses, worms, and
hackers from the Internet
• Theft and security breach
• Fraud
50
Data Integrity Drivers
Inadvertent/Accidental Altering
• Well-meaning users compromising information
through inadvertent or ill-advised actions
• Hardware malfunction
52
Data Integrity Drivers
Firewall:
Prevent unauthorized electronic access to your
networked computer system
Permit, deny, encrypt, decrypt, or proxy all
computer traffic between different security
domains
Prevent hackers from accessing a computer and
also keep information from being sent out from
your computer without your knowledge.
Don’t prevent virus attacks but, in some
circumstances, they can stop viruses from
sending information from an infected computer
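
The permit/deny part of this description can be illustrated with a toy rule table. This is a sketch of the general idea of rule matching, not of any particular firewall product; the rules, addresses, and ports are invented for the example.

```python
import ipaddress

# Hypothetical rule set: the first matching rule wins, and the default is deny.
RULES = [
    ("deny", ipaddress.ip_network("0.0.0.0/0"), 23),      # no telnet from anywhere
    ("permit", ipaddress.ip_network("10.0.0.0/8"), 443),  # internal HTTPS only
]


def decide(source_ip, port):
    """Return 'permit' or 'deny' for traffic from source_ip to the given port."""
    address = ipaddress.ip_address(source_ip)
    for action, network, rule_port in RULES:
        if address in network and port == rule_port:
            return action
    return "deny"


print(decide("10.1.2.3", 443))     # permit
print(decide("203.0.113.9", 443))  # deny
```

Denying by default means that traffic nobody thought to write a rule for is blocked rather than allowed.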
54
Data Infrastructure
Data Collection and Submission
56
Got Data Integrity?
Data Integrity is collecting, processing, maintaining and using information reliably,
accurately, and consistently, even if nobody is watching you!
59