OBJECTIVES
Understand the concept of data management and the quality of data
Understand the basic file organization concepts and the problems of managing data resources in a traditional file environment
Describe how a database management system organizes information and compare the principal database models
Understand important database design principles
Steps in Management of Data
Data Profiling (understanding the data)
Data Quality Management (improving quality)
Data Integration (combining similar data from multiple sources)
Data Augmentation (improving the value of data)
Bit: Smallest unit of data; binary digit (0, 1)
Byte: Group of bits that represents a single character
Field: Group of words or a complete number
Record: Group of related fields
File: Group of records of the same type
Database: Group of related files
Data Hierarchy (from largest to smallest unit)
Database, Files, Records, Fields, Characters
Key Data Concepts
Database: Group of related files
Entity: Generalised class of people, places or things for which data is collected
Attribute: A characteristic of an entity
Data Item: The specific value of an attribute
Key Field: A field or a set of fields that is used to identify the record for retrieving or updating data
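The terms above can be sketched in a few lines of Python. This is only an illustration: the Student entity, its attributes, and the sample values are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass

# Entity: "Student" is a generalised class of people about whom data is collected.
@dataclass
class Student:
    student_id: int   # attribute, used here as the key field
    name: str         # attribute
    course: str       # attribute

# Each Student instance is a record: a group of related fields.
records = [
    Student(101, "Asha", "MBA"),
    Student(102, "Ravi", "MCA"),
]

# A file is a group of records of the same type. Indexing it by the key field
# lets a single record be located for retrieval or update.
student_file = {s.student_id: s for s in records}

# "Ravi" is a data item: the specific value of the attribute `name`.
found = student_file[102]
print(found.name)
```

Using the key field as the dictionary key mirrors its role in the definition: it identifies exactly one record.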
Traditional / Application Oriented Approach (An Approach where a set of data files are created for each application)
Data Base Approach (An approach where a pool of related data is shared by multiple applications)
Problems with the Traditional File Environment
Data redundancy and inconsistency:
Data redundancy: The presence of duplicate data in multiple data files, so that the same data are stored in more than one place or location
Data inconsistency: The same attribute may have different values in different files
Program-data dependence: The coupling of data stored in files and the specific programs required to update and maintain those files, such that changes in programs require changes to the data
Lack of flexibility: A traditional file system can deliver routine scheduled reports after extensive programming efforts, but it cannot deliver ad hoc reports or respond to unanticipated information requirements in a timely fashion
Poor security: Because there is little control or management of data, management may have no knowledge of who is accessing or even making changes to the organization's data
Lack of data sharing and availability: Information cannot flow freely across different functional areas or different parts of the organization. Users find different values of the same piece of information in two different systems, and hence they may not use these systems because they cannot trust the accuracy of the data
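Redundancy and inconsistency can be made concrete with a small sketch. It assumes two hypothetical application files (billing and shipping) that each keep their own copy of customer data. The file names, keys, and values are invented for illustration.

```python
# Two application-specific "files" (hypothetical), each keeping its own copy
# of the same customer data: this duplication is data redundancy.
billing_file = {
    "C001": {"name": "A. Kumar", "city": "Pune"},
    "C002": {"name": "B. Singh", "city": "Delhi"},
}
shipping_file = {
    "C001": {"name": "A. Kumar", "city": "Mumbai"},  # address updated here only
    "C002": {"name": "B. Singh", "city": "Delhi"},
}

def find_inconsistencies(file_a, file_b):
    """Return (key, attribute, value_a, value_b) wherever the shared data disagree."""
    conflicts = []
    for key in sorted(file_a.keys() & file_b.keys()):
        for attr in sorted(file_a[key].keys() & file_b[key].keys()):
            if file_a[key][attr] != file_b[key][attr]:
                conflicts.append((key, attr, file_a[key][attr], file_b[key][attr]))
    return conflicts

conflicts = find_inconsistencies(billing_file, shipping_file)
print(conflicts)
```

Because each application updates only its own file, the same attribute ends up with two values, and neither system can tell which one is correct.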
Database
A database is an organised, logical grouping of related data, arranged so that a single software system provides access to all the data.
Database Management System (DBMS)
Software for creating and maintaining databases
Permits firms to rationally manage data for the entire firm
Acts as an interface between application programs and physical data files
Separates the logical and physical views of data
Solves many problems of the traditional data file approach
Components of DBMS:
Data model
Data definition language: Specifies the content and structure of the database and defines each data element
Data manipulation language: Used to process data in a database
Data dictionary: Stores definitions of data elements and data characteristics
Data Model
The data model defines the way data are conceptually structured. Examples of model forms include hierarchical, network, relational, and hypermedia.
Data Definition Language
DDL is the language used by programmers to specify the type of information and the structure of the database. It is essentially a link between the logical view and the physical view of the database.
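As a hedged illustration of a DDL statement, the sketch below uses SQL through Python's built-in sqlite3 module. The slides do not name a particular DBMS or language, so the choice of SQLite, the employee table, and its columns are all assumptions.

```python
import sqlite3

# An in-memory SQLite database, used here only as a convenient stand-in DBMS.
conn = sqlite3.connect(":memory:")

# DDL: CREATE TABLE specifies the structure of the data (names, types,
# and constraints) without storing any actual data values.
conn.execute(
    """
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        dept   TEXT,
        salary REAL CHECK (salary >= 0)
    )
    """
)

# The DBMS now knows about the table even though it holds no rows.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(tables, row_count)
```

The programmer states only the logical structure. How the table is laid out on disk is left to the DBMS, which is the sense in which DDL links the logical and physical views.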
Data Manipulation Language
The language used with a third- or fourth-generation language to manipulate the data in the database. It contains commands that permit end users and programming specialists to extract data from the database to satisfy information requests and to develop applications that access data from the database. DML gives the user the ability to retrieve, sort, display and delete the contents of the database. DML includes commands such as SELECT, MODIFY and DELETE.
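A minimal sketch of DML commands, again assuming SQL through Python's sqlite3 module (an assumption, since the slides name no specific DBMS). In SQL the modify operation is spelled UPDATE. The employee table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee (emp_id, name, salary) VALUES (?, ?, ?)",
    [(1, "Asha", 50000.0), (2, "Ravi", 42000.0), (3, "Meena", 61000.0)],
)

# Modify: change the contents of an existing record.
conn.execute("UPDATE employee SET salary = salary + 3000 WHERE emp_id = ?", (2,))

# Delete: remove a record.
conn.execute("DELETE FROM employee WHERE emp_id = ?", (1,))

# Retrieve and sort: extract data to satisfy an information request.
rows = conn.execute(
    "SELECT name, salary FROM employee ORDER BY salary DESC"
).fetchall()
print(rows)
```

The SELECT statement here is an ad hoc request: no program had to be written in advance for this particular combination of columns and sort order.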
Data Dictionary
The data dictionary stores definitions of data elements and data characteristics such as usage, physical representation, ownership, authorisation and security
The data dictionary provides a standard definition for each data element
It also serves as metadata (data about data)
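One way to see metadata in practice is to ask a DBMS to describe its own tables. The sketch below assumes SQLite through Python's sqlite3 module and a hypothetical customer table. It captures only structural metadata (name, type, whether a value is required, whether the column is a key). Characteristics such as ownership or authorisation would need to be recorded separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
data_dictionary = {
    row[1]: {"type": row[2], "required": bool(row[3]), "key": bool(row[5])}
    for row in conn.execute("PRAGMA table_info(customer)")
}

for element, definition in data_dictionary.items():
    print(element, definition)
```

Nothing printed here is customer data. It is data about the data elements, which is what the term metadata refers to.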
Object-oriented databases
Relational DBMS:
Represents data as two-dimensional tables called relations
Relates data across tables based on a common data element
Examples: DB2, Oracle, MS SQL Server
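Relating two tables through a common data element can be sketched as follows. SQLite (through Python's sqlite3 module) stands in for a relational DBMS, and the supplier and part tables are hypothetical examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, supplier_name TEXT);
    CREATE TABLE part (part_id INTEGER PRIMARY KEY, part_name TEXT, supplier_id INTEGER);
    INSERT INTO supplier VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO part VALUES (10, 'Bolt', 1), (11, 'Gear', 2), (12, 'Nut', 1);
    """
)

# supplier_id appears in both tables; it is the common data element
# that lets the DBMS relate each part to its supplier.
rows = conn.execute(
    """
    SELECT part.part_name, supplier.supplier_name
    FROM part
    JOIN supplier ON part.supplier_id = supplier.supplier_id
    ORDER BY part.part_id
    """
).fetchall()
print(rows)
```

The supplier name is stored once in the supplier table and combined with part data only at query time.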
Hierarchical and network DBMS:
Outdated
Less flexible compared to RDBMS
Lack support for ad hoc and English-language-like queries
Physical design
Entity-relationship diagram: Methodology for documenting databases that illustrates relationships between database entities
Normalization: Process of creating small, stable data structures from complex groups of data
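A rough sketch of normalization, using plain Python structures and a hypothetical order list: a flat structure that repeats customer details in every row is split into two smaller structures linked by a key. The field names and values are illustrative only.

```python
# Unnormalized: every order row repeats the customer's details.
orders_flat = [
    {"order_id": 1, "cust_id": "C1", "cust_name": "Asha", "item": "Pen"},
    {"order_id": 2, "cust_id": "C1", "cust_name": "Asha", "item": "Ink"},
    {"order_id": 3, "cust_id": "C2", "cust_name": "Ravi", "item": "Pad"},
]

customers = {}   # one entry per customer, keyed by cust_id
orders = []      # one entry per order, referring to the customer by key only

for row in orders_flat:
    customers[row["cust_id"]] = {"cust_name": row["cust_name"]}
    orders.append(
        {"order_id": row["order_id"], "cust_id": row["cust_id"], "item": row["item"]}
    )

print(customers)
print(orders)
```

After the split, a customer's name is stored in exactly one place, so changing it cannot leave some order rows with the old value.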
Distributing Databases
Centralized database:
Distributed database
Databases can be decentralized either by partitioning or by replicating
Partitioned database: The database is divided into segments or regions. For example, a customer database can be divided into Eastern customers and Western customers, with two separate databases maintained in the two regions
Duplicated database: The database is completely duplicated at two or more locations. The separate databases are synchronized in off hours on a batch basis
Regardless of which method is chosen, data administrators and business managers need to understand how the data in different databases will be coordinated and how business processes might be affected by the decentralization
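The two approaches can be contrasted in a short sketch. The customer rows, site names, and the one-way "primary overwrites replicas" synchronisation are simplifying assumptions made for illustration. Real distributed DBMS products coordinate copies in more elaborate ways.

```python
customers = [
    {"cust_id": 1, "name": "Asha", "region": "East"},
    {"cust_id": 2, "name": "Ravi", "region": "West"},
    {"cust_id": 3, "name": "Meena", "region": "East"},
]

def partition_by_region(rows):
    """Partitioned database: each site holds only its own region's rows."""
    sites = {}
    for row in rows:
        sites.setdefault(row["region"], []).append(row)
    return sites

def duplicate(rows, locations):
    """Duplicated database: every location holds a complete copy."""
    return {loc: [dict(r) for r in rows] for loc in locations}

def batch_sync(primary, replicas):
    """Off-hours batch synchronisation (simplified): overwrite each replica."""
    for loc in replicas:
        replicas[loc] = [dict(r) for r in primary]

partitioned = partition_by_region(customers)
replicas = duplicate(customers, ["Site A", "Site B"])

# A new customer is added centrally during the day.
customers.append({"cust_id": 4, "name": "Kiran", "region": "West"})
stale_count = len(replicas["Site B"])   # replicas lag until the next batch run

batch_sync(customers, replicas)
synced_count = len(replicas["Site B"])
print(stale_count, synced_count)
```

The gap between the stale and synced counts is the coordination issue the slide raises: between batch runs, a duplicated site can answer queries with out-of-date data.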
Distributed Databases
Ensuring Data Quality:
Corporate and government databases have unexpectedly poor levels of data quality
National consumer credit reporting databases have error rates of 20-35%
32% of the records in the FBI's Computerized Criminal History file are inaccurate, incomplete, or ambiguous
Gartner Group estimates that consumer data in corporate databases degrades at the rate of 2% a month
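To get a feel for what a 2% monthly degradation rate implies over a year, the sketch below assumes the rate compounds, that is, each month 2% of the still-accurate records go bad and nothing is corrected. The slide does not say how the rate is applied, so this is one possible reading.

```python
def fraction_still_accurate(monthly_rate, months):
    """Share of records still accurate, assuming the loss compounds monthly."""
    return (1 - monthly_rate) ** months

after_one_year = fraction_still_accurate(0.02, 12)
degraded = 1 - after_one_year
print(f"{degraded:.1%} of records degraded after 12 months")
```

Under this assumption roughly a fifth of the consumer records would be out of date after one year without any cleansing.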
The quality of decision making in a firm is directly related to the quality of data in its databases
Data Quality Audit: Structured survey of the accuracy and level of completeness of the data in an information system
Data Cleansing: Consists of activities for detecting and correcting data in a database or file that are incorrect, incomplete, improperly formatted, or redundant
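A minimal sketch of an audit and a cleansing pass over a hypothetical customer list is shown below. The records, the required fields, and the rule "keep the first record per customer key" are assumptions chosen for illustration. A real audit would also check accuracy against an external source.

```python
records = [
    {"cust_id": "C1", "name": " asha rao ", "phone": "98200 11111"},
    {"cust_id": "C2", "name": "Ravi Shah", "phone": ""},
    {"cust_id": "C1", "name": "Asha Rao", "phone": "9820011111"},  # redundant
]

def audit(rows, required=("cust_id", "name", "phone")):
    """Data quality audit: share of records with every required field filled in."""
    complete = sum(all(r.get(f, "").strip() for f in required) for r in rows)
    return complete / len(rows)

def cleanse(rows):
    """Data cleansing: fix formatting, then drop redundant records by key."""
    cleaned = {}
    for r in rows:
        fixed = {
            "cust_id": r["cust_id"].strip(),
            "name": " ".join(r["name"].split()).title(),
            "phone": "".join(ch for ch in r["phone"] if ch.isdigit()),
        }
        cleaned.setdefault(fixed["cust_id"], fixed)  # keep first record per key
    return list(cleaned.values())

completeness = audit(records)
cleaned = cleanse(records)
print(f"complete: {completeness:.0%}")
print(cleaned)
```

Cleansing can repair formatting and remove the redundant copy, but the missing phone number for C2 stays missing. Incomplete data can be detected automatically, yet filling it in needs a source outside the file.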