
MANAGING DATA RESOURCES:

OBJECTIVES
Understand the concept of data management and the quality of data
Understand basic file organization concepts and the problems of managing data resources in a traditional file environment
Describe how a database management system organizes information and compare the principal database models
Understand important database design principles

Data: The Key Corporate Resource


Data are representations of facts
There is a cost associated with the collection and storage of data
The value of data may decline with the passage of time because data are sometimes context specific, so they have a high degree of obsolescence
There is a need to manage data effectively on a regular basis

Steps in Management of Data
Data Profiling (understanding data)
Data Quality Management (improving quality)
Data Integration (combining similar data from multiple sources)
Data Augmentation (improving the value of data)

Problems in Data Management


The amount of data increases exponentially with time, but data often become out of date, creating a huge maintenance problem
Data are scattered throughout the organisation and are captured by different persons, in different formats, using different devices, in different files/databases
An ever-increasing amount of external data is required for decision making and needs to be collected and integrated with the internal data
Data security, quality and integrity are critical and can easily be jeopardized
The statutory obligations to store data differ from country to country and keep changing
Data are often captured and used offline without going through quality control checks, hence the validity of the data is questionable

Dimensions of Quality of Data
Accuracy
Accessibility
Relevance
Timeliness
Completeness

File Organization Terms and Concepts

Bit: Smallest unit of data; binary digit (0 or 1)
Byte: Group of bits that represents a single character
Field: Group of characters forming a word, a group of words, or a complete number
Record: Group of related fields
File: Group of records of the same type
Database: Group of related files
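The terms above nest inside one another, which a few lines of Python can make concrete. This is only an illustrative sketch: the field names (`name`, `course`, `grade`), the file names and the sample values are hypothetical and do not come from the slides.

```python
# Illustrative sketch of the data hierarchy; all names and values are hypothetical.

# A byte represents a single character; the character "A" is one byte of 8 bits.
char_as_byte = "A".encode("ascii")
bits = format(char_as_byte[0], "08b")   # the 8 bits of that byte, as text

# A field is a group of characters (a word, several words, or a number).
name_field = "John Smith"

# A record is a group of related fields.
record = {"name": name_field, "course": "IS 101", "grade": "B+"}

# A file is a group of records of the same type.
course_file = [
    record,
    {"name": "Mary Jones", "course": "IS 101", "grade": "A"},
]

# A database is a group of related files.
database = {"course_file": course_file, "financial_file": []}
```

Reading the sketch from the bottom up gives the same ordering as the list: database, file, record, field, byte, bit.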

Data Hierarchy
DATABASE > FILES > RECORDS > FIELDS > CHARACTERS

Key Data Concepts
Database: Group of related files
Entity: Generalised class of people, places or things for which data are collected
Attribute: A characteristic of an entity
Data Item: The specific value of an attribute
Key Field: A field or a set of fields that is used to identify a record for retrieving or updating data
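As a rough illustration of these concepts, the sketch below models one entity with three attributes and uses one of them as the key field. The `Order` class, its attribute names and the sample data items are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Order:              # entity: a class of things we collect data about
    order_number: int     # attribute, used here as the key field
    order_date: str       # attribute
    amount: float         # attribute


# Each value below (4340, "2024-02-08", 120.50, ...) is a data item.
orders = [
    Order(4340, "2024-02-08", 120.50),
    Order(4341, "2024-02-09", 75.00),
]

# Index the records by key field so that one record can be identified directly.
orders_by_key = {order.order_number: order for order in orders}


def find_order(order_number):
    """Return the record identified by the key field, or None if absent."""
    return orders_by_key.get(order_number)
```

Because the key field identifies exactly one record, `find_order(4341)` can retrieve or update that record without scanning the others.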

Entities and Attributes

Alternative approaches to organising data

Traditional / Application Oriented Approach (an approach where a set of data files is created for each application)
Database Approach (an approach where a pool of related data is shared by multiple applications)

Traditional File Processing

DATA BASE APPROACH

[Figure: application programs (such as payroll, invoicing, inventory, financial accounting) and management/user inquiries access the DATA BASE through the DBMS]

Problems with the Traditional File Environment
Data redundancy and inconsistency:
Data redundancy: The presence of duplicate data in multiple data files, so that the same data are stored in more than one place
Data inconsistency: The same attribute may have different values in different files

Program-data dependence: The coupling of data stored in files and the specific programs required to update and maintain those files, such that changes in programs require changes to the data
Lack of flexibility: A traditional file system can deliver routine scheduled reports after extensive programming efforts, but it cannot deliver ad hoc reports or respond to unanticipated information requirements in a timely fashion

Poor security: Because there is little control or management of data, management has no knowledge of who is accessing or even changing the organization's data
Lack of data sharing and availability: Information cannot flow freely across different functional areas or different parts of the organization. Users who find different values of the same piece of information in two different systems may not use these systems because they cannot trust the accuracy of the data
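Redundancy and inconsistency, the first problem listed above, can be shown with two application-specific files that each keep their own copy of the same attribute. The sketch below is a minimal illustration; the file names, the customer record and the `find_inconsistencies` helper are all hypothetical.

```python
# Two application files each store the customer's city (data redundancy).
billing_file = [{"customer_id": 1, "name": "A. Kumar", "city": "Mumbai"}]
# The address change was recorded in the shipping file only.
shipping_file = [{"customer_id": 1, "name": "A. Kumar", "city": "Pune"}]


def find_inconsistencies(file_a, file_b, key, attribute):
    """Return the keys whose attribute value differs between the two files."""
    b_index = {rec[key]: rec for rec in file_b}
    conflicts = []
    for rec in file_a:
        other = b_index.get(rec[key])
        if other is not None and other[attribute] != rec[attribute]:
            conflicts.append(rec[key])
    return conflicts


# Customer 1 now has two different cities (data inconsistency).
conflicts = find_inconsistencies(billing_file, shipping_file, "customer_id", "city")
```

In a shared database the city would be stored once, so an update made by one application would be seen by the other.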

Database
A database is an organised, logical grouping of related data, arranged so that a single piece of software provides access to all the data.

THE DATABASE APPROACH TO DATA MANAGEMENT

Database Management System (DBMS)
Software for creating and maintaining databases
Permits firms to rationally manage data for the entire firm
Acts as an interface between application programs and physical data files
Separates the logical and physical views of data
Solves many problems of the traditional data file approach

The Contemporary Database Environment

Components of DBMS:
Data model
Data definition language: Specifies the content and structure of the database and defines each data element
Data manipulation language: Used to process data in a database
Data dictionary: Stores definitions of data elements and data characteristics

Data Model
A data model defines the way data are conceptually structured
Examples of model forms include hierarchical, network, relational and hypermedia

Data Definition Language
DDL is the language used by programmers to specify the type of information in, and the structure of, a database. It is essentially a link between the logical view and the physical view of the database.
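As one concrete example of a DDL, SQL's `CREATE TABLE` statement specifies the structure of a table and defines each data element. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `supplier` table and its column names are assumptions for illustration, not taken from the slides.

```python
import sqlite3

# An in-memory SQLite database, so nothing is written to disk.
ddl_conn = sqlite3.connect(":memory:")

# DDL: define the content and structure of a table and each data element in it.
ddl_conn.execute(
    """
    CREATE TABLE supplier (
        supplier_number INTEGER PRIMARY KEY,
        supplier_name   TEXT NOT NULL,
        supplier_city   TEXT
    )
    """
)

# Read the structure back from the DBMS to confirm what was defined.
# Each row of PRAGMA table_info holds (cid, name, type, notnull, default, pk).
column_names = [row[1] for row in ddl_conn.execute("PRAGMA table_info(supplier)")]
```

The statement describes only the logical structure; how SQLite lays the rows out physically is left to the DBMS.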

Data Manipulation Language
The language used with a third- or fourth-generation language to manipulate the data in the database. It contains commands that permit end users and programming specialists to extract data from the database to satisfy information requests and to develop applications that access data from the database. DML gives the user the ability to retrieve, sort, display and delete the contents of a database. DML includes commands such as SELECT, MODIFY and DELETE.
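The retrieve, sort, modify and delete operations described above can be sketched with SQL, one widely used DML. Note that in SQL the command that modifies existing rows is spelled `UPDATE`. The example runs against an in-memory SQLite database through Python's `sqlite3` module; the `part` table and its rows are hypothetical.

```python
import sqlite3

dml_conn = sqlite3.connect(":memory:")
dml_conn.execute(
    "CREATE TABLE part (part_number INTEGER PRIMARY KEY, description TEXT, unit_price REAL)"
)
dml_conn.executemany(
    "INSERT INTO part VALUES (?, ?, ?)",
    [(137, "Door latch", 22.50), (150, "Door seal", 6.00), (152, "Compressor", 70.00)],
)

# Retrieve and sort: parts cheaper than 25, cheapest first.
cheap_parts = dml_conn.execute(
    "SELECT part_number, description FROM part WHERE unit_price < 25 ORDER BY unit_price"
).fetchall()

# Modify: change the price of one part.
dml_conn.execute("UPDATE part SET unit_price = 24.00 WHERE part_number = 137")

# Delete: remove one part.
dml_conn.execute("DELETE FROM part WHERE part_number = 152")

remaining_parts = dml_conn.execute(
    "SELECT part_number, unit_price FROM part ORDER BY part_number"
).fetchall()
```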

Data Dictionary
The data dictionary stores definitions of data elements and data characteristics such as usage, physical representation, ownership, authorisation and security
The data dictionary provides a standard definition for each data element
It also serves as metadata (data about data)
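A data dictionary can be pictured as a lookup table of metadata, one entry per data element. The sketch below is a simplified stand-in for what a real DBMS maintains; the element names, owners and application names are hypothetical.

```python
# Hypothetical data dictionary: each entry is metadata describing one data element.
data_dictionary = {
    "customer_id": {
        "description": "Unique identifier assigned to each customer",
        "physical_representation": {"type": "integer", "size_bytes": 4},
        "owner": "Sales department",
        "authorised_to_update": ["Sales clerk"],
        "used_by": ["Invoicing", "Order entry"],
    },
    "credit_limit": {
        "description": "Maximum credit extended to the customer",
        "physical_representation": {"type": "decimal", "size_bytes": 8},
        "owner": "Finance department",
        "authorised_to_update": ["Credit manager"],
        "used_by": ["Invoicing"],
    },
}


def elements_used_by(application):
    """List the data elements that the given application uses (usage metadata)."""
    return sorted(
        name for name, meta in data_dictionary.items() if application in meta["used_by"]
    )
```

A query such as `elements_used_by("Invoicing")` shows which programs would be affected if the definition of an element changed.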

Sample Data Dictionary Report

Types of Databases
Relational DBMS
Hierarchical and network DBMS
Object-oriented databases

Relational DBMS:
Represents data as two-dimensional tables called relations
Relates data across tables based on a common data element
Examples: DB2, Oracle, MS SQL Server
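Relating tables through a common data element is what a join does. The following sketch uses SQLite (via Python's `sqlite3`) rather than any of the products named above, and the `supplier` and `part` tables, their columns and their rows are assumptions made for the example.

```python
import sqlite3

rel_conn = sqlite3.connect(":memory:")
rel_conn.executescript(
    """
    CREATE TABLE supplier (
        supplier_number INTEGER PRIMARY KEY,
        supplier_name   TEXT
    );
    CREATE TABLE part (
        part_number     INTEGER PRIMARY KEY,
        description     TEXT,
        supplier_number INTEGER REFERENCES supplier(supplier_number)
    );
    INSERT INTO supplier VALUES (4058, 'CBM Inc.'), (2038, 'Ace Inc.');
    INSERT INTO part VALUES (137, 'Door latch', 4058), (150, 'Door seal', 2038);
    """
)

# supplier_number appears in both tables; the join matches rows on that common element.
parts_with_suppliers = rel_conn.execute(
    """
    SELECT p.part_number, p.description, s.supplier_name
    FROM part AS p
    JOIN supplier AS s ON p.supplier_number = s.supplier_number
    ORDER BY p.part_number
    """
).fetchall()
```

Each supplier name is stored once in `supplier`; the `part` table refers to it by number instead of repeating it.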

The Relational Data Model

Hierarchical and Network DBMS


Hierarchical DBMS:
Organizes data in a tree-like structure
Supports one-to-many parent-child relationships
Prevalent in large legacy systems
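The tree-like structure can be sketched as nested parent and child segments, where each child has exactly one parent and access proceeds from the root downward. The segment names below are hypothetical and only loosely modelled on a human resources system.

```python
# Each segment has one parent and any number of children (one-to-many).
hr_tree = {
    "segment": "Employee",
    "children": [
        {
            "segment": "Compensation",
            "children": [
                {"segment": "Performance ratings", "children": []},
                {"segment": "Salary history", "children": []},
            ],
        },
        {"segment": "Job assignments", "children": []},
        {"segment": "Benefits", "children": []},
    ],
}


def path_to(node, target, path=()):
    """Return the root-to-target path of segment names, or None if not found."""
    path = path + (node["segment"],)
    if node["segment"] == target:
        return list(path)
    for child in node["children"]:
        found = path_to(child, target, path)
        if found is not None:
            return found
    return None
```

Reaching `"Salary history"` means walking through `"Employee"` and `"Compensation"` first, which hints at why access paths that were not planned in advance are awkward in this model.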

A Hierarchical Database for a Human Resources System

Network DBMS: Depicts data logically as many-to-many relationships
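A many-to-many relationship means a child can have several parents as well as a parent having several children. The sketch below illustrates only that logical idea, not how a network DBMS stores its pointers; the student and course names are hypothetical.

```python
from collections import defaultdict

# Each pair links one student to one course.
enrollments = [
    ("Student A", "Course 1"),
    ("Student A", "Course 2"),
    ("Student B", "Course 1"),
]

courses_of = defaultdict(set)    # student -> courses taken
students_in = defaultdict(set)   # course -> students enrolled

for student, course in enrollments:
    courses_of[student].add(course)
    students_in[course].add(student)
```

`Student A` belongs to two courses and `Course 1` has two students, so the relationship can be followed in either direction.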

The Network Data Model

Disadvantages of hierarchical and network DBMS:
Outdated
Less flexible than RDBMS
Lack support for ad hoc and English-language-like queries

Physical design
Entity-relationship diagram: Methodology for documenting databases, illustrating relationships between database entities
Normalization: Process of creating small, stable data structures from complex groups of data
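Normalization can be sketched by taking an order structure that contains a repeating group of parts (with supplier details repeated inside each part) and splitting it into smaller tables that each describe one thing. The attribute names and sample values below are assumptions for illustration.

```python
# Unnormalized: each order carries a repeating group of parts,
# and supplier details are repeated inside every part.
unnormalized_orders = [
    {
        "order_number": 1634,
        "order_date": "2024-02-02",
        "parts": [
            {"part_number": 152, "description": "Compressor", "unit_price": 70.0,
             "quantity": 2, "supplier_number": 4058, "supplier_name": "CBM Inc."},
            {"part_number": 137, "description": "Door latch", "unit_price": 22.5,
             "quantity": 3, "supplier_number": 4058, "supplier_name": "CBM Inc."},
        ],
    },
    {
        "order_number": 1635,
        "order_date": "2024-02-03",
        "parts": [
            {"part_number": 152, "description": "Compressor", "unit_price": 70.0,
             "quantity": 1, "supplier_number": 4058, "supplier_name": "CBM Inc."},
        ],
    },
]


def normalize(orders):
    """Split the nested orders into four small tables keyed by their identifiers."""
    order_table, part_table, supplier_table, order_line_table = {}, {}, {}, []
    for order in orders:
        order_table[order["order_number"]] = {"order_date": order["order_date"]}
        for part in order["parts"]:
            supplier_table[part["supplier_number"]] = {"supplier_name": part["supplier_name"]}
            part_table[part["part_number"]] = {
                "description": part["description"],
                "unit_price": part["unit_price"],
                "supplier_number": part["supplier_number"],
            }
            order_line_table.append({
                "order_number": order["order_number"],
                "part_number": part["part_number"],
                "quantity": part["quantity"],
            })
    return order_table, part_table, supplier_table, order_line_table


order_table, part_table, supplier_table, order_line_table = normalize(unnormalized_orders)
```

After the split, the compressor's description and price are stored once in `part_table`, and the supplier's name once in `supplier_table`, however many orders refer to them.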

CREATING A DATABASE ENVIRONMENT

An Unnormalized Relation for ORDER

Normalized Tables Created from ORDER

An Entity-Relationship Diagram (Logical View of Data)

Distributing Databases
Centralized database:
Used by a single central processor or by multiple processors in a client/server network
There are advantages and disadvantages to having all corporate data in one location
Security is higher in central environments, and risks are lower
If data demands are highly decentralized, then a decentralized design is less costly and more flexible

Distributed database
Databases can be decentralized either by partitioning or by replicating
Partitioned database: The database is divided into segments or regions. For example, a customer database can be divided into Eastern customers and Western customers, and two separate databases maintained in the two regions
Duplicated database: The database is completely duplicated at two or more locations. The separate databases are synchronized in off hours on a batch basis
Regardless of which method is chosen, data administrators and business managers need to understand how the data in different databases will be coordinated and how business processes might be affected by the decentralization
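The difference between partitioning and duplicating can be sketched with plain Python lists standing in for databases. The customer records, region labels and site names are hypothetical, and real systems would also need the synchronization step mentioned above.

```python
import copy

customers = [
    {"customer_id": 1, "name": "A. Kumar", "region": "East"},
    {"customer_id": 2, "name": "B. Lee", "region": "West"},
    {"customer_id": 3, "name": "C. Diaz", "region": "East"},
]


def partition_by_region(records):
    """Partitioned database: each region holds only its own segment of the data."""
    partitions = {}
    for rec in records:
        partitions.setdefault(rec["region"], []).append(rec)
    return partitions


def duplicate_to_sites(records, sites):
    """Duplicated database: every site holds its own complete copy of the data."""
    return {site: copy.deepcopy(records) for site in sites}


partitions = partition_by_region(customers)
replicas = duplicate_to_sites(customers, ["Site 1", "Site 2"])
```

Partitioning stores each record once but splits it by region; duplication stores every record at every site, which is why the copies must later be brought back into agreement.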

Distributed Databases

Ensuring Data Quality:
Corporate and government databases have unexpectedly poor levels of data quality
National consumer credit reporting databases have error rates of 20-35%
32% of the records in the FBI's Computerized Criminal History file are inaccurate, incomplete, or ambiguous
Gartner Group estimates that consumer data in corporate databases degrades at the rate of 2% a month
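To get a feel for what a 2% monthly degradation rate implies over time, the sketch below compounds it month by month. Treating the rate as compounding on the still-accurate share is an assumption made here; the source states only the monthly figure.

```python
# Assumption: "2% a month" compounds on the share of records that is still accurate.
MONTHLY_DEGRADATION = 0.02


def accurate_share_after(months, monthly_rate=MONTHLY_DEGRADATION):
    """Fraction of records still accurate after the given number of months."""
    return (1 - monthly_rate) ** months


share_after_one_year = accurate_share_after(12)
```

Under this reading, roughly 78% of the records would still be accurate after one year, so about a fifth would have degraded.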

Ensuring Data Quality
The quality of decision making in a firm is directly related to the quality of data in its databases
Data Quality Audit: Structured survey of the accuracy and level of completeness of the data in an information system
Data Cleansing: Consists of activities for detecting and correcting data in a database or file that are incorrect, incomplete, improperly formatted, or redundant
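The detection half of data cleansing can be sketched as a pass over the records that flags incomplete, improperly formatted and redundant entries. The sample records, the simple e-mail pattern and the `audit` function are hypothetical; a real audit would apply rules specific to the organisation's data.

```python
import re

raw_records = [
    {"customer_id": 1, "email": "a.kumar@example.com", "phone": "555-0101"},
    {"customer_id": 2, "email": "", "phone": "555-0102"},              # incomplete
    {"customer_id": 3, "email": "not-an-email", "phone": "555-0103"},  # badly formatted
    {"customer_id": 1, "email": "a.kumar@example.com", "phone": "555-0101"},  # redundant
]

# Deliberately simple pattern: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def audit(records):
    """Return the positions of records that are incomplete, malformed or redundant."""
    issues = {"incomplete": [], "improperly_formatted": [], "redundant": []}
    seen = set()
    for index, rec in enumerate(records):
        fingerprint = tuple(sorted(rec.items()))
        if fingerprint in seen:
            issues["redundant"].append(index)
            continue
        seen.add(fingerprint)
        if any(value in ("", None) for value in rec.values()):
            issues["incomplete"].append(index)
        elif not EMAIL_PATTERN.match(rec["email"]):
            issues["improperly_formatted"].append(index)
    return issues


issues = audit(raw_records)
```

Correcting the flagged records (filling in the missing e-mail, fixing the format, dropping the duplicate) is the second half of cleansing and usually needs human judgment or a trusted reference source.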

MANAGEMENT OPPORTUNITIES, CHALLENGES, AND SOLUTIONS

Key Organizational Elements in the Database Environment
