
ICT II

Topic One: Introduction to Data Management
By
Bwiino Keefa
Email: kbwiino@mubs.ac.ug
MUBS JINJA CAMPUS
Dept. Marketing & Management
Data are Raw Facts
Proper data management is the
foundation on which business
success is built
Successful management of data is critical to
any organizational mission to make
informed decisions.
Motivation/Importance of Data Management
 Data management plays a significant role in an
organization’s ability to generate revenue and control
costs.
 Successfully being able to share, store, protect and
retrieve the ever-increasing amount of data can be
the competitive advantage for organizations today.
 Data management helps organizations to mitigate
risks.
 It enables decision making in organizations

What Are the Benefits of Good Data
Management?

 Optimum data quality


 Improved user confidence
 Efficient and timely access to data
 Improved knowledge and
understanding of the agency’s data
holdings
 Improves decision making in an
organisation
What Are the Costs of Poor Data
Management?
 Misinterpretation of the data
 Lost data
 Inaccessible data
 Indefensible data
 Wasted time and money
 Missed deadlines
 Lost user confidence
 Any of these can amount to business failure.
Managing Data Resources:
 An information system provides users with timely, accurate,
and relevant information.
 The information is stored in computer files.
 When files are properly arranged and maintained, users can
easily access and retrieve the information when they need it.
 If the files are not properly managed, they can lead to chaos in
information processing.
 Even if the hardware and software are excellent, the
information system can be very inefficient because of poor file
management.

Data Management
 Data Management is a broad field of study,
but essentially is the process of managing
data as a resource that is valuable to an
organization or business.
 Data management can also be the
development and execution of architectures,
policies, practices and procedures in order to
manage the information lifecycle needs of
an enterprise in an effective manner.
Areas of Data Management
 Data Modeling - means first creating a structure for the
data that you collect and use, and then organizing
this data so that it is easy to access and
efficient to store and retrieve for reports and
analysis.
 Data warehousing - is storing data effectively so
that it can be accessed and used efficiently in future.
 Data Movement - is the ability to move data from
one place to another. For instance, data needs to be
moved from where it is collected to a database and
then to an end user.
 Database Administration - is extremely important in
managing data. Every organization or enterprise needs
database administrators that are responsible for the database
environment.
 Data mining - is a process in which large amounts of data
are sifted through to show trends, relationships, and patterns.
Data mining is a crucial component to data management
because it exposes interesting information about the data
being collected. It is important to note that data is primarily
collected so it can be used to find these patterns,
relationships and trends that can help a business grow or
create profit.
File Organization Terms and Concepts

 A computer system organizes data in a hierarchy that starts
with the bit.
 A bit represents 0 or 1.
 8 bits are grouped to form a byte. Each byte represents one
character, number, or symbol.
 Bytes can be grouped to form a field. It can represent a
person’s name or age.
 Related fields can be grouped to form a record. Related fields
can be student’s name, course taken and the grade.
 Related records can be grouped to form a file.
 Related files can be grouped to form a database
The Data Hierarchy
Level      Example
Database   Student database (Course File, Financial File, Personal History File)
File       Course File, with the attributes NAME, COURSE, GRADE:
             Edward Kabaale   BBA   A
             Cosmas Engen     BBA   B
Record     Edward Kabaale   ERM   A
Field      Edward Kabaale (NAME field)
Byte       01000001 (letter A in binary)
Bit        0 or 1
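The hierarchy can be traced in a few lines of Python. This is only an illustrative sketch: the variable names and the use of dictionaries and lists to stand in for records and files are assumptions, not part of the slides.

```python
# Bit and byte: the letter "A" is stored as one byte, i.e. eight bits.
letter = "A"
byte_value = ord(letter)              # numeric code of the character
bits = format(byte_value, "08b")      # the same byte written as 8 bits

# Field: a group of bytes (characters) with a meaning, e.g. a name.
name_field = "Edward Kabaale"

# Record: related fields grouped together.
record = {"name": name_field, "course": "BBA", "grade": "A"}

# File: related records grouped together.
course_file = [
    record,
    {"name": "Cosmas Engen", "course": "BBA", "grade": "B"},
]

# Database: related files grouped together (the other files are left empty here).
student_database = {
    "course": course_file,
    "financial": [],
    "personal_history": [],
}

print(bits)                               # 01000001
print(len(name_field.encode("ascii")))    # 14 (bytes in the NAME field)
```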
More File Organization Terms
Key field
Every record in a file should contain at least one field that uniquely
identifies that record so that the record can be retrieved, updated,
or sorted. This identifier field is called a key field (primary key).

Key field: STUDENT No

NAME           STUDENT No   COURSE    GRADE
Sarah Kissa    959010054    CIS 500   A
Daniel Boles   969010055    IST 203   B
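As a rough sketch of why the key field matters, the records above can be indexed by student number so that each one can be retrieved, updated, or sorted. The dictionary-based index and the field names are illustrative assumptions.

```python
records = [
    {"name": "Sarah Kissa", "student_no": "959010054", "course": "CIS 500", "grade": "A"},
    {"name": "Daniel Boles", "student_no": "969010055", "course": "IST 203", "grade": "B"},
]

# Build an index on the key field; a repeated value would break uniqueness.
by_key = {}
for rec in records:
    key = rec["student_no"]
    if key in by_key:
        raise ValueError(f"duplicate key field value: {key}")
    by_key[key] = rec

found = by_key["969010055"]               # retrieve one record by its key
by_key["959010054"]["grade"] = "B"        # update one record by its key
sorted_keys = sorted(by_key)              # sort records by key

print(found["name"])     # Daniel Boles
print(sorted_keys)       # ['959010054', '969010055']
```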

Accessing Records from Computer Files

 Computer stores files on secondary storage devices.


 Records can be arranged in several ways on storage
media.
 How an individual record can be accessed or retrieved
depends on how the records are arranged on storage media.
Most computer applications use direct (random) access.
 There are mainly two ways to organize records:
sequentially or randomly.
 In sequential file organization, data records must be
retrieved in the same physical sequence in which they are
stored.

 In direct or random file organization, data
records can be accessed in any sequence as
users desire, without regard to actual physical
order on the storage media.
 Sequential file organization is the only file
organization that can be used on magnetic
tape. Example: Payroll
 Direct or random file organization is utilized
with magnetic disk.
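The difference between the two access methods can be sketched with fixed-length records in an in-memory file. The record layout (a 4-byte employee number and a 20-byte name) and the sample names are assumptions made for illustration only.

```python
import io
import struct

# Assumed fixed-length layout: 4-byte employee number + 20-byte name.
RECORD = struct.Struct("<I20s")

storage = io.BytesIO()
for emp_no, name in [(1, "Alice"), (2, "Brian"), (3, "Carol"), (4, "David")]:
    storage.write(RECORD.pack(emp_no, name.encode("ascii")))

def read_sequential(f, wanted):
    """Read records in stored order until the wanted one is reached."""
    f.seek(0)
    reads = 0
    while True:
        chunk = f.read(RECORD.size)
        if not chunk:
            return None, reads
        reads += 1
        emp_no, raw = RECORD.unpack(chunk)
        if emp_no == wanted:
            return raw.rstrip(b"\x00").decode("ascii"), reads

def read_direct(f, position):
    """Jump straight to the record at a known 0-based position."""
    f.seek(position * RECORD.size)
    _, raw = RECORD.unpack(f.read(RECORD.size))
    return raw.rstrip(b"\x00").decode("ascii")

seq_name, seq_reads = read_sequential(storage, 4)
direct_name = read_direct(storage, 3)

print(seq_name, seq_reads)   # David 4  (every earlier record was read first)
print(direct_name)           # David    (one seek, one read)
```

Sequential access had to pass over three records to reach the fourth, which is why it suits batch jobs such as payroll that process every record anyway.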
Traditional Approach to Data Management
(File based approach)

• The “old” way of doing things; still often
used in practice.
• Separate information stored on separate
files.
• It is a way of collecting and maintaining
data in an organization that leads to each
functional area or division creating and
maintaining its own data files and programs.
Changes and updates are made to these files
separately.
File Processing Example:
Sales: knows how many of Products A, B, and C have been sold.
  File stores Prod. Name, Production Schedule, and Sales.
Production: knows how much of Products A, B, and C have been produced.
  File stores Prod. Name, Production Schedule, and Number Produced.
Marketing: knows the price of Products A, B, and C.
  File stores Prod. Name and Product Price.

Advantages of File Based
Approach
 Backup:
 It is possible to take faster, automatic backups
of data stored in the files of computer-based
systems.
 Computer systems provide functionality to
serve this purpose. It is also possible to develop
a specific application program for this purpose.
 Compactness:
 It is possible to store data compactly.
 Data Retrieval:
 Computer-based systems provide enhanced data
retrieval techniques to retrieve data stored in
files in an easy and efficient way.
 Editing:
 It is easy to edit any information stored in
computers in the form of files.
 Specific application programs or editing
software can be used for this purpose.
 Remote Access:
 In computer-based systems, it is possible to
access data remotely.
 So, to access data it is not necessary for a user to
be present at the location where the data are
kept.
 Sharing:
 Data stored in files of computer-based systems
can be shared among multiple users at the same
time.
Problems of File Based Approach
 Data Redundancy:
 It is possible that the same information may be
duplicated in different files. This leads to data
redundancy, which wastes storage space.
 Data Inconsistency:
 Because of data redundancy, the data may
not be in a consistent state: the same data is stored in
multiple places and the copies can disagree.
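A small sketch of how redundancy turns into inconsistency, using the Sales and Production files from the file processing example, both of which store the production schedule. The schedule values, quantities, and the helper `find_inconsistencies` are hypothetical.

```python
# Two departmental files that both store the production schedule (redundancy).
sales_file = {"Product A": {"production_schedule": "Week 12", "sales": 40}}
production_file = {"Product A": {"production_schedule": "Week 12", "number_produced": 60}}

# Production updates its own file only; the Sales copy is now stale.
production_file["Product A"]["production_schedule"] = "Week 14"

def find_inconsistencies(file_a, file_b, field):
    """Return the keys whose duplicated field differs between two files."""
    return [
        key
        for key in file_a
        if key in file_b and file_a[key].get(field) != file_b[key].get(field)
    ]

conflicts = find_inconsistencies(sales_file, production_file, "production_schedule")
print(conflicts)   # ['Product A']
```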

 Difficulty in Accessing Data:
 Accessing data is not convenient or efficient in
a file processing system.
 Limited Data Sharing:
 Data are scattered in various files. Different
files may also have different formats, and these files
may be stored in different folders belonging to
different departments.
 So, due to this data isolation, it is difficult to
share data among different applications.
 Integrity Problems:
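 Data integrity means that the data contained in
the database is both correct and consistent. For
this purpose the data stored in the database must
satisfy certain consistency constraints. In a file-based
system these constraints are buried in application
programs, so they are hard to add or change.
 Atomicity Problems:
 Any operation on a database must be atomic. This
means it must happen in its entirety or not at
all. A file-based system cannot easily guarantee this
when a failure occurs partway through an update.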
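To make atomicity concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `account` table, the balances, and the `transfer` helper are hypothetical; the point is only that a failed operation leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, source, target, amount):
    """Move money between accounts as one all-or-nothing operation."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False

# The credit to B succeeds, but the debit would make A negative, so both are undone.
ok = transfer(conn, "A", "B", 500)
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, balances)   # False {'A': 100, 'B': 50}
```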

 Concurrent Access Anomalies:
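 Multiple users are allowed to access data
simultaneously for the sake of better
performance and faster response. Without
coordination, concurrent updates to the same file
can leave the data inconsistent.
 Security Problems:
 A database should be accessible to each user
only in a limited way.
 Each user should be allowed to access only the data
relevant to his or her requirements. This is hard to
enforce in a file-based system.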
 Data dependence - Using file-based system, the physical
structure and storage of the data files and records are defined
in the application program code. This characteristic is known
as program-data dependence. Making changes to an existing
structure is rather difficult and requires modification of the
program. Such maintenance activities are time-consuming
and subject to error.
 Incompatible file format - The structures of the files are
dependent on the application programming language.
A file structure provided in one programming
language, such as the direct or indexed-sequential files
available in COBOL, may be different from the
structure generated by another programming language such as
C.
 Lack of flexibility refers to the fact that it is very difficult to
create new reports from the data when needed. Ad hoc reports
are impossible; a new report could require several weeks of
work by more than one programmer and the creation of
intermediate files to combine data from disparate files.

Understanding Terms

 Data redundancy is the presence of duplicate data in multiple
data files. In this situation there is confusion of results because
the data can have different meanings in different files.
 Program-data dependence is the tight relationship between data
stored in files and the specific programs required to update and
maintain those files. This dependency is very inefficient,
resulting in the need to make changes in many programs when a
common piece of data (such as zip code) changes.

Database Approach to data
management

 In order to overcome the limitations of the
file-based approach, the concept of the database
and the Database Management System
(DBMS) emerged for data management.
 Many programs and users can share
data in a database
 Secures data so only authorized users
can access certain data
Basic Database Definitions
 Database: A collection of related data. A necessity for almost
any enterprise to carry out its business. Consists of raw facts, and when
organized may be transformed into information
 Data: Known facts that can be recorded and have an
implicit meaning.
 Mini-world: Some part of the real world about which data
is stored in a database. For example, student grades and
transcripts at a university.
 Database Management System (DBMS): A collection of
software to facilitate the creation and maintenance of a DB.
 Database System: The DBMS software together with the
data. Sometimes, applications are also included.

 Data-Item (field):
It is a character or group of characters that has a
specific meaning. For example, cid and cname in a
customer table.
 A record:
 It is a collection of logically related fields. A
record consists of values for each field.

 A file:
It is a collection of related records arranged in a specific
sequence.
 Metadata:
 A set of data that describes and gives information about
other data. In other words, data about data is called
metadata.
 System Catalog:
 The system catalog is a collection of tables and views
that contain important information about a database. A
system catalog is available for each database.

 Data dictionary:
 A data dictionary is a file that contains metadata and is usually part of
the system catalog. It has the following four components: Entities,
Attributes, Relationships and Keys
 Entity
– A generalized class of people, places, or things (objects) for which
data are collected, stored, and maintained
– E.g., Customer, Employee
 Attribute
– A characteristic of an entity; something the entity is identified by
– E.g., Customer name, Employee name

Database Keys

 Keys
– A field or set of fields in a record that is used to identify the
record
– E.g., a registration number that uniquely identifies a student record

 Primary Key
– The key chosen as the main unique identifier of a record, e.g. regno,
employee_ID

 Candidate Key
– Any field or minimal set of fields that could uniquely identify a
record. The primary key is chosen from the candidate keys; the others
remain available as alternatives, e.g. NIN, NSSF number, TIN, Passport Number
 Foreign Key
– A field in one table that refers to the primary key of another
table; it is used to enforce referential integrity between the two
tables in the database
 Compound Key
– This is when more than one field is combined to
form a primary key, e.g. Studentno and courseID
 Composite Key
– A composite key is similar to a compound key, but the
columns which are part of a composite key
are always keys in that table.
 Surrogate Key
– A surrogate key is a kind of primary key that is not
derived from the application data. It is a system-generated
value (often a sequential number) which uniquely identifies
the entity in the system and is normally not shown to the user.
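The key types above can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `student` and `enrolment` tables and their columns are hypothetical, and here a system-generated surrogate key is used as the primary key so that `regno` and `nin` remain as candidate keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
CREATE TABLE student (
    student_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key (system generated)
    regno       TEXT NOT NULL UNIQUE,               -- candidate key
    nin         TEXT NOT NULL UNIQUE                -- another candidate key
);
CREATE TABLE enrolment (
    student_key INTEGER NOT NULL REFERENCES student(student_key),  -- foreign key
    course_id   TEXT NOT NULL,
    grade       TEXT,
    PRIMARY KEY (student_key, course_id)            -- compound primary key
);
""")

def attempt(sql, params):
    """Run one INSERT and report whether the key constraints allowed it."""
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

new_student = "INSERT INTO student (regno, nin) VALUES (?, ?)"
new_enrolment = "INSERT INTO enrolment VALUES (?, ?, ?)"

results = {
    "first student": attempt(new_student, ("R001", "NIN-1")),
    "duplicate regno": attempt(new_student, ("R001", "NIN-2")),
    "enrolment": attempt(new_enrolment, (1, "C1", "A")),
    "same student, other course": attempt(new_enrolment, (1, "C2", "B")),
    "duplicate student + course": attempt(new_enrolment, (1, "C1", "B")),
    "unknown student": attempt(new_enrolment, (99, "C1", "A")),
}
for case, outcome in results.items():
    print(f"{case}: {outcome}")
```

Only the first, third and fourth inserts are accepted; the others violate a candidate key, the compound primary key, or the foreign key respectively.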
Database Management System (DBMS)

• Software for creating and maintaining databases

• Permits firms to rationally manage data for the entire firm

• Acts as an interface between application programs and
physical data files

• Separates logical and physical views of data

• Solves many problems of the traditional data file approach


• Examples of DBMS?

[Figure: The Contemporary Database Environment]
Functional Components of DBMS
 Data Definition Language (DDL) - It defines each
element as it appears in the database. The DDL is the
formal language programmers use to specify the content
and structure of the database.
 Data Manipulation Language (DML) - It is a set of
procedural commands that enable programmers to append,
modify, update, and retrieve data. The DML uses simple
verbs like sort, delete, insert, select, display

 A query language - It enables the user to
make queries from the database. SQL is the
standard query and data manipulation language for
relational database management systems.
 Report Generators - It enables generation of
reports from a database. The programs
enable reports to be presented using pictures,
graphics, maps etc.

 Application Generators - Most DBMS
packages include programming facilities
available in 4th Generation Languages
(4GLs).
 User Interface - This is a shell that provides
the environment for interaction of a user
with the database.
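The DDL, DML, query language and report generator components described above can be seen side by side in a short sketch using Python's built-in `sqlite3` module. The `customer` table with `cid` and `cname` follows the earlier field example; the customer names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of the data.
conn.execute("CREATE TABLE customer (cid INTEGER PRIMARY KEY, cname TEXT NOT NULL)")

# DML: insert, update and delete data.
conn.executemany(
    "INSERT INTO customer (cid, cname) VALUES (?, ?)",
    [(1, "Amina"), (2, "Brian"), (3, "Carol")],
)
conn.execute("UPDATE customer SET cname = ? WHERE cid = ?", ("Bryan", 2))
conn.execute("DELETE FROM customer WHERE cid = ?", (3,))
conn.commit()

# Query language: ask the database a question.
rows = conn.execute("SELECT cid, cname FROM customer ORDER BY cid").fetchall()

# A very small "report generator": format the query result for reading.
report = "\n".join(f"{cid:>3}  {cname}" for cid, cname in rows)
print(report)
```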

Advantages of Database Approach
 Control of data redundancy -The database approach
attempts to eliminate redundancy by integrating the files.
Although the database approach does not eliminate
redundancy entirely, it controls the amount of redundancy
inherent in the database.
 Data consistency - By eliminating or controlling
redundancy, the database approach reduces the risk of
inconsistencies occurring. It ensures all copies of the data
are kept consistent.
 More information from the same amount of data - With
the integration of operational data in the database
approach, it may be possible to derive additional
information from the same data.
 Sharing of data - Database belongs to the entire
organization and can be shared by all authorized users.
 Improved data integrity - Database integrity refers to the
validity and consistency of stored data. Integrity is usually
expressed in terms of constraints, which are consistency
rules that the database is not permitted to violate.
 Improved security - The database approach provides
protection of the data from unauthorized users. It may
take the form of user names and passwords that identify the user
type and their access rights for operations including
retrieval, insertion, updating and deletion.

 Enforcement of standards -The integration of the
database enforces the necessary standards including data
formats, naming conventions, documentation standards,
update procedures and access rules.
 Increased concurrency - Database can manage concurrent
data access effectively. It ensures that users do not interfere
with one another in ways that would cause loss of information or
loss of integrity.
 Improved backup and recovery services - Modern
database management systems provide facilities to
minimize the amount of processing that can be lost
following a failure by using the transaction approach.
Disadvantages of Database Approach
 Complexity - Database management system is an extremely complex
piece of software. All parties must be familiar with its functionality
to take full advantage of it. Therefore, training for the
administrators, designers and users is required.
 Size - The database management system consumes a substantial
amount of main memory as well as a large amount of disk
space in order to run efficiently.
 Cost of DBMS - A multi-user database management system may be
very expensive. Even after the installation, there is a high recurrent
annual maintenance cost on the software.
 Cost of conversion - When moving from a file-based system to a
database system, the company incurs additional expenses
on hardware acquisition and training.

 Performance - As the database approach is to cater for
many applications rather than exclusively for a particular
one, some applications may not run as fast as before.
 Higher impact of a failure - The database approach
increases the vulnerability of the system due to
centralization. As all users and applications rely on the
database being available, the failure of any component can
bring operations to a halt and seriously affect services to the
customer.

Database Principles
 Data Independence - This describes the separation of
data and data handling from the functional processing of the data
and from the programs that use the data.
 Data Integrity - This is the accuracy and consistency of stored
data, which is easier to maintain when data is held in a single,
integrated database.
 Data Redundancy/Data Duplication - This describes the case
where a particular data element is individually kept at several
places (records, files, etc) in the database.
 Data Security - This is the ability of a database system to
preserve and protect the data which it holds.

Database Models
 Collection of logical constructs used to
represent data structure and relationships
within the database
 Conceptual models: logical nature of data
representation
 Implementation models: emphasis on how the
data are represented in the database

 Relationships in Conceptual Models
 One-to-one(1:1)
 One-to-many (1:M)
 Many-to-many (M:N)

 Implementation Database Models


 Hierarchical
 Network
 Relational
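The three relationship types listed above can be illustrated with plain Python data structures. This is a sketch only; the departments, employees, students and courses are hypothetical examples.

```python
from collections import defaultdict

# One-to-one (1:1): each department has one manager, each manager one department.
manager_of = {"Accounting": "Johns", "Marketing": "Fiske"}
is_one_to_one = len(set(manager_of.values())) == len(manager_of)

# One-to-many (1:M): one department has many employees,
# but each employee belongs to exactly one department.
department_of = {"emp1": "Accounting", "emp2": "Accounting", "emp3": "Marketing"}
employees_in = defaultdict(list)
for employee, department in department_of.items():
    employees_in[department].append(employee)

# Many-to-many (M:N): a student takes many courses and a course has many
# students, so the relationship is stored as a list of (student, course) pairs.
enrolments = [("S1", "C1"), ("S1", "C2"), ("S2", "C1")]
courses_of = defaultdict(set)
students_of = defaultdict(set)
for student, course in enrolments:
    courses_of[student].add(course)
    students_of[course].add(student)

print(is_one_to_one)                 # True
print(employees_in["Accounting"])    # ['emp1', 'emp2']
print(sorted(courses_of["S1"]))      # ['C1', 'C2']
print(sorted(students_of["C1"]))     # ['S1', 'S2']
```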

Hierarchical Database model

Hierarchical DBMS:

• Organizes data in a tree-like structure

• Supports one-to-many parent-child relationships

• Prevalent in large legacy systems

[Figure: A Hierarchical Database for a Human Resources System]
Network Data Model

 Network data model


– An expansion of the hierarchical database model
with an owner-member relationship in which a
member may have many owners

[Figure: Project 1 and Project 2 linked to Department A, Department B and Department C]
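The difference between the hierarchical and network models can be sketched as follows. The project and department names come from the figure, but which department is linked to which project is an assumption made for illustration, as is treating projects as owners and departments as members.

```python
# Hierarchical model: every member (child) has exactly one owner (parent).
hierarchical_parent = {
    "Department A": "Project 1",
    "Department B": "Project 1",
    "Department C": "Project 2",
}

# Network model: a member may have many owners.
network_owners = {
    "Department A": {"Project 1"},
    "Department B": {"Project 1", "Project 2"},   # two owners: not a tree any more
    "Department C": {"Project 2"},
}

def members_of(owner, owners_by_member):
    """List the members linked to one owner."""
    return sorted(m for m, owners in owners_by_member.items() if owner in owners)

# A hierarchy is the special case of a network with one owner per member.
as_network = {member: {parent} for member, parent in hierarchical_parent.items()}

print(members_of("Project 2", as_network))       # ['Department C']
print(members_of("Project 2", network_owners))   # ['Department B', 'Department C']
```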

Relational Data Model

 Relational data model


 All data elements are placed in two-dimensional
tables, called relations, that are the logical
equivalent of files

Data Table 1: Project Table

Project Number   Description     Dept. Number
155              Payroll         257
498              Widgets         632
226              Sales manager   598

Data Table 2: Department Table

Dept. Number   Dept. Name      Manager SSN
257            Accounting      421-55-99993
632            Manufacturing   765-00-3192
598            Marketing       098-40-1370

Data Table 3: Manager Table

SSN           Last Name   First Name   Hire Date   Dept. Number
005-10-6321   Johns       Francine     10-7-65     257
549-77-1001   Buckley     Bill         2-17-79     650
098-40-1370   Fiske       Steven       1-5-85      598
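Because the Project and Department tables share the Dept. Number column, they can be linked with a join. The sketch below loads the rows shown above into an in-memory SQLite database; the table and column names (`project`, `dept_no`, and so on) are illustrative choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    dept_no     INTEGER PRIMARY KEY,
    dept_name   TEXT,
    manager_ssn TEXT
);
CREATE TABLE project (
    project_no  INTEGER PRIMARY KEY,
    description TEXT,
    dept_no     INTEGER REFERENCES department(dept_no)
);
""")
conn.executemany(
    "INSERT INTO department VALUES (?, ?, ?)",
    [
        (257, "Accounting", "421-55-99993"),
        (632, "Manufacturing", "765-00-3192"),
        (598, "Marketing", "098-40-1370"),
    ],
)
conn.executemany(
    "INSERT INTO project VALUES (?, ?, ?)",
    [(155, "Payroll", 257), (498, "Widgets", 632), (226, "Sales manager", 598)],
)

# Join the two relations on their common column to find each project's department.
rows = conn.execute("""
    SELECT p.description, d.dept_name
    FROM project AS p
    JOIN department AS d ON d.dept_no = p.dept_no
    ORDER BY p.project_no
""").fetchall()

for description, dept_name in rows:
    print(f"{description}: {dept_name}")
# Payroll: Accounting
# Sales manager: Marketing
# Widgets: Manufacturing
```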

Entity Relationship Database
Model
 Complements the relational data model
concepts
 Represented in an entity relationship
diagram (ERD)
 Based on entities, attributes, and relationships

Database Types
• Flat file
– Has no relationship between its records
– Used to store and manipulate a single table or file
– Stores each record as a line of text, and uses commas, tabs, or other indicators within the
line to separate the items
– E.g. comma-separated values (CSV)
• File organizer
– Goes beyond the capabilities of a flat file to store and/or retrieve data

• Single User
– Only one person can use the database at any time (e.g. Microsoft Outlook and Quicken,
used to store and manipulate personal data)

• Multiuser
– Networked computer systems need multiuser DBMSs
– Allows several people in an organization to access the data and to see each other’s changes

• Centralized vs. Distributed
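A flat file in CSV form can be read with Python's standard `csv` module. In this sketch the file content is held in a string so the example is self-contained; the rows reuse the student records from the data hierarchy.

```python
import csv
import io

# Each record is one line of text; commas separate the items.
flat_file = io.StringIO(
    "name,course,grade\n"
    "Edward Kabaale,BBA,A\n"
    "Cosmas Engen,BBA,B\n"
)

records = list(csv.DictReader(flat_file))

# A flat file holds a single table, so any "query" is just a scan of its lines.
grade_a = [rec["name"] for rec in records if rec["grade"] == "A"]
print(grade_a)   # ['Edward Kabaale']
```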



 General-purpose database
 Can be used for a large number of applications

 Special-purpose database
 Designed for a limited number of applications
or to serve a specific need

 Open-Source database systems


 PostgreSQL, MySQL
Using Databases with Other
Software

 Front-end application
 One that directly interacts with people or users
 Back-end application
 Interacts with other programs or applications
 System designers are increasingly using the
Web as the front end to database systems

STEPS IN DATABASE DESIGN

 Requirement analysis
What does the user want?
• Conceptual database design
Defining the entities and attributes, and
the relationships between these --> The
ER model
• Physical database design
Implementation of the conceptual design
using a Database Management System
Normalization

 Normalization is used to streamline the
database design by removing redundant data,
such as repeating groups.

 A database that is not normalized will have
inefficient queries, and may lose information
when a row is deleted or update only part of the
information when a value changes.
Normal Forms

 A set of conditions on table structure that
improves maintenance. Normalization
removes processing anomalies:
 Update
 Inconsistent Data
 Addition
 Deletion
Rule of thumb: all attributes depend on the key, the whole
key and nothing but the key.
1NF   Keys and no repeating groups
2NF   No partial dependencies
3NF   No transitive dependencies
BCNF  All determinants are candidate keys
4NF   No multivalued dependencies
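A minimal sketch of removing a partial dependency (the step to 2NF), using plain Python structures. The sample rows are hypothetical: the key is (student_no, course_id), but the student's name depends on student_no alone, so it is repeated and can be updated inconsistently.

```python
# Unnormalized: the name is repeated for every course a student takes.
unnormalized = [
    {"student_no": "S1", "name": "Sarah Kissa", "course_id": "CIS 500", "grade": "A"},
    {"student_no": "S1", "name": "Sarah Kissa", "course_id": "IST 203", "grade": "B"},
    {"student_no": "S2", "name": "Daniel Boles", "course_id": "IST 203", "grade": "B"},
]

# Split into two tables: one keyed by student_no, one keyed by the full key.
students = {}
enrolments = []
for row in unnormalized:
    students[row["student_no"]] = {"name": row["name"]}
    enrolments.append(
        {"student_no": row["student_no"], "course_id": row["course_id"], "grade": row["grade"]}
    )

# The name now lives in exactly one place, so one update changes it everywhere.
students["S1"]["name"] = "Sarah K. Kissa"

# Deleting an enrolment no longer risks losing the student's details.
enrolments = [e for e in enrolments if e["student_no"] != "S2"]

print(len(students), len(enrolments))   # 2 2
print(students["S2"]["name"])           # Daniel Boles
```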
Importance of a database in Organizations
 Simplifies the search and utilization of information in an
organization.
 Helps managers monitor the progress of business operations,
so they can take quick and appropriate steps if a
problem occurs.
 Assists in organizing organizational data such as
employee / member biodata, consumer
biodata, lists of products, payment of salaries, payment of
bills, and others.
 Facilitates data access activities for members, including
data acquisition and manipulation of data, such as adding and
deleting data according to the authority they have been given.
 Keeps the organization’s data secure, because data
can be protected by requiring a login and password for
access.
 The database can help to determine a better strategy for the
advancement of an organization in the future.
 Assists marketing activities, because the database collects
complete and detailed customer data that supports
the marketing activities of an organization or company.
 The database can save on the operating costs an
organization / company incurs to manage information.

Trends – Distributed Databases
• Distributed database
– Also called a virtualized database
– Actual data may be spread across several databases at different
locations, allowing more users direct access at different user sites
• Master database file: a database that records the existence of all other
databases and the location of those database files, and records the
initialization information for the database
• Transaction database file: records transactions, where a transaction is a unit of work performed
within a DBMS against a database and treated in a coherent and
reliable way independent of other transactions
• Replicated database
– Database that holds a duplicate set of frequently used data

Centralized Databases
• Used by a single central processor or multiple processors
in a client/server network

• There are advantages and disadvantages to having all
corporate data in one location.

• Security is higher and risks are lower in centralized environments.

• If data demands are highly decentralized, then a
decentralized design is less costly and more flexible.

Database Administration
• Database administrator
– A skilled and trained computer professional who directs all
activities related to an organization’s database, including providing
security from intruders
• responsible for
– Overall design and coordination of the database
– Development and maintenance of schemas
– Development and maintenance of the data dictionary
– Implementation of the DBMS
– System and user documentation
– User support and training
– Overall operation of the DBMS
– Testing and maintaining of the DBMS
– Establishing emergency and recovery procedures

Database Recoverability
 Recoverability is usually defined as a way to store data as a
backup and then test the backups to make sure that
they are valid.
 The task of integrity means making sure that data pulled
from certain records or files is in fact valid.
 Data integrity is extremely important, especially
when creating reports or when data is used for
analysis. If your data is invalid,
your results will be worthless.
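The idea of taking a backup and then testing it can be sketched with Python's built-in `sqlite3` module, whose connections offer a `backup` method. The `student` table and its rows are illustrative, and in-memory databases stand in for real database files.

```python
import sqlite3

# A small "live" database.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE student (student_no TEXT PRIMARY KEY, name TEXT)")
live.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [("959010054", "Sarah Kissa"), ("969010055", "Daniel Boles")],
)
live.commit()

# Step 1: store the data as a backup.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Step 2: test the backup. First ask SQLite to check its internal structure,
# then confirm that it holds the same rows as the live database.
check = backup.execute("PRAGMA integrity_check").fetchone()[0]
query = "SELECT student_no, name FROM student ORDER BY student_no"
same_rows = live.execute(query).fetchall() == backup.execute(query).fetchall()

print(check, same_rows)   # ok True
```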
Database Security
 Security is an essential task for database administrators. For
instance, database administrators are usually in charge of
giving clearance and access to certain databases in an
organization.
 Another important task is availability. Availability is
defined as making sure a database is up and running. The
more uptime, usually the higher the level of productivity.
 Performance is related to availability; it means
getting the most out of the hardware, applications and data.
Performance is usually constrained by an
organization’s budget, physical equipment and resources.
Ensuring Data Quality
• The quality of decision making in a firm is directly
related to the quality of data in its databases.
• Data Quality Audit: Structured survey of the accuracy
and level of completeness of the data in an
information system
• Data Cleansing: Consists of activities for detecting
and correcting data in a database or file that are
incorrect, incomplete, improperly formatted, or
redundant
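A data quality audit followed by cleansing might look like the sketch below. The sample records, the rule that a student number has nine digits, and the `audit` helper are all assumptions for illustration; a real audit would use the organization's own rules.

```python
import re

records = [
    {"student_no": "959010054", "name": "Sarah Kissa", "grade": "A"},
    {"student_no": "959010054", "name": "Sarah Kissa", "grade": "A"},   # redundant copy
    {"student_no": "96901", "name": "Daniel Boles", "grade": "B"},      # badly formatted number
    {"student_no": "979010056", "name": "", "grade": "C"},              # missing name
]

def audit(records):
    """Return (record index, problem) pairs for every problem found."""
    issues = []
    seen = set()
    for index, rec in enumerate(records):
        if any(not str(value).strip() for value in rec.values()):
            issues.append((index, "incomplete"))
        if not re.fullmatch(r"\d{9}", rec["student_no"]):
            issues.append((index, "improperly formatted"))
        fingerprint = tuple(sorted(rec.items()))
        if fingerprint in seen:
            issues.append((index, "redundant"))
        seen.add(fingerprint)
    return issues

issues = audit(records)

# Cleansing here simply sets aside flagged records; in practice many of them
# would be corrected rather than dropped.
flagged = {index for index, _ in issues}
clean = [rec for index, rec in enumerate(records) if index not in flagged]

print(issues)       # [(1, 'redundant'), (2, 'improperly formatted'), (3, 'incomplete')]
print(len(clean))   # 1
```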
