
Designing and Building Access

Database Systems

Edition 7, October 2009

Mark Gregory
___________________________________________________________________________

Page 1 of 144
École Supérieure de Commerce de Rennes

ESC Rennes School of Business, France

Previously,

Edition 0   School of Computing and Engineering, University of Huddersfield   March 2000
Edition 7   ESC Rennes School of Business, France                             October 2009

Work on this document started when Mark Gregory was working at
the University of Huddersfield, UK in 1999/2000. Some of the
material included was originally written by my then colleague Dr.
Steve Wade, who is still at Huddersfield. Other portions draw on
work by Dr. Ken Lunn, who has moved on to the IT Directorate of
the UK National Health Service.
The Sixth Edition was a very substantial revision of the previous
year’s edition, as is this Seventh Edition.

1. INTRODUCTION: WHO IS THIS DOCUMENT FOR?..................................... 13

1.1. Preface...................................................................................................................................................... 13

1.2. Skills required.......................................................................................................................................... 13

1.3. The aims of the remainder of this document ......................................................................................... 13

1.4. The structure of this document and how to use it................................................................................. 14

1.5. About Learning Access ........................................................................................................................... 15


1.5.1. Starter: the naïve user: Level 1 .......................................................................................................... 15
1.5.2. The thinking user: Level 2 ................................................................................................................. 15
1.5.3. The competent (“power”) user: Level 3............................................................................................. 15
1.5.4. Advanced: the programmer or systems integrator: Level 4........................................................................ 15
1.5.5. What is the relevance of MS Access skills? ...................................................................................... 15

1.6. Testing yourself........................................................................................................................................ 16


1.6.1. What you should already know.......................................................................................................... 16
1.6.2. What you should already be able to do or need now to learn ............................................................ 16
1.6.3. A checklist of more advanced skills ........................................................................................................ 17
1.6.4. Further help on learning to learn ....................................................................................................... 17

1.7. Conventions Used in this document ....................................................................................................... 17

1.8. Limitations ............................................................................................................................................... 17

1.9. Acknowledgements .................................................................................................................................. 18

SECTION 1 - THE PRINCIPLES OF DATABASE.................................................... 19

2. A BRIEF INTRODUCTION TO DATABASES.................................................. 19

2.1. Databases (bases de données) and how they are designed ................................................................... 19

2.2. Models used by systems analysts ............................................................................................................ 19

2.3. Simple model of data processing ............................................................................................................ 19

2.4. Why study Databases? ............................................................................................................................ 20


2.4.1. They are used in every significant BIS .............................................................................................. 20
2.4.2. They are at the heart of ...................................................................................................................... 20

2.5. Background.............................................................................................................................................. 20

2.6. How to store data .................................................................................................................................... 20

2.7. What is a database?................................................................................................................................. 21

2.8. Why keep data in different tables? ........................................................................................................ 21

2.9. Learning a minimum about database .................................................................................................... 23

2.10. Basic concepts ...................................................................................................................................... 23

2.11. Entity (type, class) ............................................................................................................................... 24

2.12. An example: Students by Programme ............................................................................................... 25

2.13. Attribute............................................................................................................................................... 25

2.14. Primary and Foreign Keys.................................................................................................................. 25

2.15. Why have foreign keys? ...................................................................................................................... 26

2.16. Entity occurrences – student .............................................................................................................. 26

2.17. Entity occurrences – programme ....................................................................................................... 26

2.18. Queries.................................................................................................................................................. 26

2.19. Students and Modules ......................................................................................................................... 27

2.20. Three Vital Rules................................................................................................................................. 27

2.21. Resolving Many-to-Many Relationships ........................................................................................... 27

2.22. Towards a more complete entity relationship attribute model........................................................ 30

2.23. Example Query – definition................................................................................................................ 31

2.24. Example Query – results ..................................................................................................................... 31

2.25. What is a query? (French ‘une requête’) ......................................................................................... 32

2.26. Data Dictionary ................................................................................................................................... 32

2.27. Entity-Relationship Diagrams (ERD) ................................................................................................ 32

2.28. Another example database: NorthWind / Les Comptoirs................................................................ 33

2.29. Relationship: meaning and characteristics........................................................................................ 33

2.30. Why it’s important to relate entities .................................................................................................. 34

2.31. Degree of relationship (simplified) ..................................................................................................... 34

2.32. Simple database design ....................................................................................................................... 34

3. A SIMPLE METHODOLOGY FOR DESIGNING MICROSOFT ACCESS DATABASES ............................................................................................................ 35

3.0. Introduction – background to the methodology ................................................................................... 35


3.0.1. Database, or something else?............................................................................................................. 35
3.0.2. What is a methodology? .................................................................................................................... 35
3.0.3. Assumptions ...................................................................................................................................... 36
3.0.4. Introduction to modelling business information systems: why we have chosen certain techniques... 36
3.0.5. What we’re trying to achieve together............................................................................................... 37
3.0.6. Business Process Modelling: Documenting a Business Process........................................................ 37
3.0.7. Why have we chosen the techniques we have?.................................................................................. 38

3.0.8. Business Process Modelling .................................................................................................................. 38
3.0.9. SSADM ............................................................................................................................................... 38
3.0.10. MERISE .............................................................................................................................................. 39

3.1. Feasibility study....................................................................................................................................... 39

3.2. Set out Project Terms of Reference ....................................................................................................... 40

3.3. Analyse the needs of users ...................................................................................................................... 40


3.3.1. Identify business processes using a high-level Use Case diagram..................................................... 40
3.3.2. Identify detailed requirements for a process to be computerised: carry out Process Modelling ....... 40

3.4. Decide the purpose and basic contents of the database – Data Modelling ......................................... 41
3.4.1. Basic Constructs of ER Modelling .................................................................................................... 41
3.4.2. Deciding entity types ......................................................................................................................... 41
3.4.3. Entities............................................................................................................................................... 41
3.4.4. Relationships ..................................................................................................................................... 42
3.4.5. Fields: What are the attributes of each entity?................................................................................... 43
3.4.6. Data type: Domain............................................................................................................................. 44
3.4.7. Identify Domains................................................................................................................................... 44
3.4.8. Classifying Relationships .................................................................................................................. 45
3.4.9. Keys: primary and secondary (“foreign”) .......................................................................................... 46
3.4.10. Normalisation....................................................................................................................................... 48
3.4.11. ER Notation....................................................................................................................................... 48
3.4.12. Online tutorial.................................................................................................................................... 50
3.4.13. DFDs and ERDs – why both? How are they linked?......................................................................... 50
3.4.14. Why BOTH Data and Process models?............................................................................................. 51

3.5. Cross-check: entity life history............................................................................................................... 51


3.5.1. Cross-check DFD and ERA ................................................................................................................... 51
3.5.2. Time dimension.................................................................................................................................... 51

3.6. Model User <-> System Interactions...................................................................................................... 51

3.7. Define required outputs: reports, forms, queries ................................................................................. 52

3.8. How will Input / Update be carried out (Forms etc.)? ......................................................................... 52

3.9. Work through your design on paper, whiteboard, etc. ........................................................................ 52

3.10. Implementing processes in Access ...................................................................................................... 52


3.10.1. System data processing...................................................................................................................... 52

3.11. Define (“design” in Access terms) the database: Build a prototype................................................ 53

3.12. Refine / iterate / implement................................................................................................................. 53

3.13. Test the database ................................................................................................................................. 53

3.14. Obtain User Feedback......................................................................................................................... 53

3.15. Refine the system by Iteration ............................................................................................................ 54

4. PUTTING DATABASE DESIGN THEORY INTO PRACTICE ......................... 54

4.1. Design aids ............................................................................................................................................... 54

4.2. An Exercise .............................................................................................................................................. 54

4.3. Achieving real competence in Database Design .................................................................................... 55


4.3.1. Documented scenarios.......................................................................................................................... 55
4.3.2. Suggested but undocumented scenarios................................................................................................. 55
4.3.3. Further study........................................................................................................................................ 56

5. MORE ABOUT DATABASES.......................................................................... 56

5.1. What is a database?................................................................................................................................. 56

5.2. The history of databases ......................................................................................................................... 56

5.3. Implementing data models in MS Access .............................................................................................. 57

5.4. What is a database management system?.............................................................................................. 58

5.5. First challenge: database design............................................................................................................. 58

5.6. Second challenge: database implementation ......................................................................................... 58

5.7. An inductive approach ............................................................................................................................ 58

5.8. What Is a Database?................................................................................................................................ 59

5.9. What Is a DBMS?.................................................................................................................................... 59

SECTION 2 – USING MICROSOFT ACCESS TO BUILD GOOD DATABASES..... 60

6. INTRODUCTION TO MICROSOFT ACCESS ................................................. 60

6.1. What is a database management system?.............................................................................................. 60


6.1.1. Software which manages a database.................................................................................................. 60
6.1.2. Implements entities as tables, maintaining and enforcing relationships............................................. 60
6.1.3. Deals with all the component disc files ............................................................................................. 60
6.1.4. Provides functions such as................................................................................................................. 60
6.1.5. An approachable programming language................................................................................................ 60

6.2. Important facilities of more advanced DBMS ...................................................................................... 60

6.3. Further facilities of more advanced DBMS........................................................................................... 61


6.3.1. Other RDBMS................................................................................................................................... 62

6.4. Why we want business students to learn Access ................................................................................... 62


6.4.1. The relative ease-of-use of MS Access.............................................................................................. 62
6.4.2. MS Access is easily obtained ............................................................................................................ 63
6.4.3. MS Access supports usable programming languages............................................................................... 63

7. MS ACCESS IMPLEMENTATION OF DATA MODELS.................................. 63

7.1. Tables, one per entity type...................................................................................................................... 63

7.2. Fields, one per attribute .......................................................................................................................... 63

7.3. Records, one per entity occurrence........................................................................................................ 63

7.4. Attribute types in MS Access.................................................................................................................. 63

7.5. Permitted data types in MS Access ........................................................................................................ 64


7.5.1. Use of Number or Currency fields..................................................................................................... 65
7.5.2. Storing telephone numbers ................................................................................................................ 65
7.5.3. Controlling data entry formats with masks ........................................................................................ 65

7.6. Keys .......................................................................................................................................................... 66


7.6.1. Candidate keys................................................................................................................................... 66
7.6.2. Primary key ....................................................................................................................................... 66
7.6.3. Multi-part primary keys ..................................................................................................................... 66
7.6.4. Entity integrity rule............................................................................................................................ 66
7.6.5. Foreign keys ...................................................................................................................................... 66

7.7. Relationships............................................................................................................................................ 66
7.7.2. Relationships and linking: Enforcing referential integrity where appropriate ................................... 68

7.8. System outputs ......................................................................................................................................... 68


7.8.1. Queries .............................................................................................................................................. 69
7.8.2. Reports .............................................................................................................................................. 69
7.8.3. Forms................................................................................................................................................. 69

7.9. System inputs ........................................................................................................................................... 69


7.9.1. Forms, sub-forms and their use with 1: M and M: N relationships.................................................... 69
7.9.2. Field-specific validation checks......................................................................................................... 69
7.9.3. Using relational integrity to carry out inter-table validation checks .................................................. 70
7.9.4. Table-level checks on forms .................................................................................................................. 70

7.10. Implementing processes ...................................................................................................................... 70


7.10.1. Data processing in Access ................................................................................................................. 70
7.10.2. Functional elements in Access........................................................................................................... 70

7.11. System data transformations .............................................................................................................. 71


7.11.1. Append and Update queries............................................................................................................... 71
7.11.2. Macros ................................................................................................................................................ 71
7.11.3. Visual Basic for Applications (VBA) modules inside Access....................................................................... 71
7.11.4. Visual Basic programs outside Access.................................................................................................... 71

8. WAYS IN WHICH TO LEARN MORE MS ACCESS........................................ 72

8.1. Sample databases and applications included with Microsoft Access .................................................. 72
8.1.1. NorthWind Traders sample database (English edition) / Les Comptoirs (édition française)............ 72
8.1.2. Database Wizards (Assistants) .......................................................................................................... 72

SECTION 3 – THE ANYTOWN DISTANCE LEARNING BUSINESS SCHOOL EXAMPLE................................................................................................................. 73

9. EXAMPLE SCENARIO: ANYTOWN DISTANCE LEARNING BUSINESS SCHOOL................................................................................................................... 73

10. BACKGROUND: STUDYING .......................................................................... 73

11. A CLOSER LOOK INTO "MANAGING STUDENTS" ..................................... 74

12. THE PROCESS OF DECIDING WHAT HAPPENS TO STUDENTS............... 74

13. COURSE REVIEW .......................................................................................... 74

14. SIMPLIFYING ASSUMPTIONS ....................................................................... 75

15. EXTERNAL ENTITIES..................................................................................... 75

16. PROCESSES................................................................................................... 75

16.1. Process Applicants............................................................................................................................... 75

16.2. Admit students to Course – Course Enrolment ................................................................................ 75

16.3. Register students on core and optional modules............................................................................... 75

16.4. Teach and assess a module.................................................................................................................. 75

16.5. Prepare for and hold exam board (jury) ........................................................................................... 75


• Collect together the results for all students for all modules they have been studying................................ 75
• Review module results in exam board ....................................................................................................... 75
• Decide student status in exam board.......................................................................................................... 75

16.6. Review Course ..................................................................................................................................... 76

17. DOCUMENTS .................................................................................................. 76

17.1. Course Description .............................................................................................................................. 76


17.1.1. List of Modules.................................................................................................................................. 76

17.2. Management Reports .......................................................................................................................... 76

18. ENTITY AND ATTRIBUTE LISTS ................................................................... 76

19. EXAMPLE STUDENT RECORD REPORT...................................................... 77

20. ANYTOWN HIGH-LEVEL USE CASE DIAGRAM ........................................... 78

21. ANYTOWN: CONTEXT DIAGRAM ................................................................. 79

22. LEVEL 1 DFD .................................................................................................. 80

23. EXAMPLE LEVEL 2 DFD................................................................................ 81

24. DATA DICTIONARY ........................................................................................ 82

24.1. Data dictionary for Anytown Business School.................................................................................. 82

25. ANYTOWN ER DIAGRAM............................................................................... 94

26. ANYTOWN SYSTEM IMPLEMENTATION...................................................... 95

27. TERMINOLOGY ASSOCIATED WITH DATA MODELLING AND DATABASE DESIGN ....... 95

0. REFERENCES ................................................................................................ 96

0.1. Basics of structured analysis................................................................................................................... 96

0.2. Database theory ......................................................................................................................................... 96

0.3. DataFlow Diagrams (DFDs) ................................................................................................................... 96

0.4. Entity relationship modelling ................................................................................................................. 96

0.5. Use Case ................................................................................................................................................... 96

0.6. Basics of Object Oriented Analysis and Design (OOAD) ............................................................................. 97

1. APPENDIX 1 BUSINESS PROCESS ANALYSIS USING USE CASE ANALYSIS ................................................................................................................ 98

1.1. What is a Use Case Diagram? ................................................................................................................ 98

1.2. What to do if a use case diagram won’t fit on a single page? ............................................................ 101

1.3. Finding Use Cases.................................................................................................................................. 101

1.4. Naming Use Cases.................................................................................................................................. 102

1.5. Describing Use Cases............................................................................................................................. 103

1.6. Using Use Cases to identify System Inputs and Outputs.................................................................... 103

1.7. Other resources for learning about Use Cases.................................................................................... 103

2. APPENDIX 2 DATA FLOW DIAGRAMS ....................................................... 104

2.1. What are Data Flow Diagrams (DFDs)? ............................................................................................. 104

2.2. Why use Data Flow Diagrams? ............................................................................................................ 104

2.3. What is a DFD? Main elements............................................................................................................ 105

2.4. The components of a DFD..................................................................................................................... 105

2.5. What appears on a DFD? ..................................................................................................................... 106


2.5.2. Listing the elements of a DFD ......................................................................................................... 107

2.6. The Data Flow Diagram Symbols – SSADM Notation....................................................................... 107

2.7. Making a Data Flow Diagram: a Top-Down Approach..................................................................... 107

2.8. The elements of a DFD .......................................................................................................................... 108

2.9. Creating DFDs ....................................................................................................................................... 108

2.10. First List the Elements of the Data Flow Diagram ......................................................................... 110

2.11. Drawing the Context Diagram ......................................................................................................... 110

2.12. Expanding a context diagram to give a level 1 DFD....................................................................... 110

2.13. Questions to ask yourself .................................................................................................................. 111

2.14. Rules for DFDs................................................................................................................................... 111

2.15. Some points on logical DFDs ............................................................................................................ 111

2.16. Supporting documentation ............................................................................................................... 111

2.17. Summary: “levelled” DFDs .............................................................................................................. 112

3. APPENDIX 3 WHEN TO USE A SPREADSHEET, AND WHEN TO USE A DATABASE ............................................................................................................ 113

3.1. Introduction ........................................................................................................................................... 113

3.2. Spreadsheets versus databases ............................................................................................................. 113


3.2.1. What spreadsheets are good at......................................................................................................... 113
3.2.2. What databases are better at ............................................................................................................ 113
3.2.3. Using spreadsheets and database together ....................................................................................... 114
3.2.4. Summary.......................................................................................................................................... 114

3.3. What to do if your spreadsheet skills are weak................................................................................... 115

3.4. What to do if your database skills are weak........................................................................................ 115

3.5. Conclusion.............................................................................................................................................. 116

3.6. Acknowledgements – bibliography for Appendix 3............................................................................ 116

4. APPENDIX 4: REASONS WHY A DATABASE IS TO BE PREFERRED TO A SPREADSHEET - SPREADSHEET DOES NOT EQUAL DATABASE.................. 117

4.2. More Than a List................................................................................................................................... 117

4.3. Create the Database .............................................................................................................................. 118

4.4. Create a Data Entry Form.................................................................................................................... 120

5. APPENDIX 5: ACCESS HINTS - DESIGNING FOR USE ............................................. 122

5.1. Getting more help..................................................................................................................................... 122

5.2. Unlocking the power of many-to-many relationships ................................................................................ 122

5.3. Some difficulties associated with forms and subforms and how to overcome them.................................. 125

5.4. Subform not updated................................................................................................................................ 125

5.5. Detail subform does not show the subset of records based on the value of the current master form record ........ 126

6. APPENDIX 6: NORMALISATION ........................................................................... 129

6.1. Introduction to Normalisation.............................................................................................................. 129

6.2. Introduction ........................................................................................................................................... 129

6.3. Preliminary remarks ............................................................................................................................. 129

6.4. Terminology ........................................................................................................................................... 130


6.4.1. Records............................................................................................................................................. 130
6.4.2. Field names ....................................................................................................................................... 130
6.4.3. Keys.................................................................................................................................................. 130

6.5. The various stages of normalisation..................................................................................................... 132


6.5.1. Convert data into unnormalised form (UNF, 0NF) .................................................................................. 132
6.5.2. Convert UNF into First Normal Form (1NF) ........................................................................................... 132
6.5.3. Convert 1NF into Second normal form (2NF)......................................................................................... 132
6.5.4. Convert 2NF into Third normal form (3NF) ............................................................................................ 133

6.6. Further normalisation........................................................................................................................... 133

6.7. A full example of normalisation ........................................................................................................... 133


6.7.1. Step 1 - Convert data into UNF.......................................................................................................... 134
6.7.2. Step 2 - Convert data into 1NF .......................................................................................................... 134
6.7.3. Step 3 - Convert data into 2NF .......................................................................................................... 135
6.7.4. Step 4 - Convert data into 3NF .......................................................................................................... 135

6.8. Normalisation: A Summary.................................................................................................................. 136

6.9. Normalisation complements top-down entity-relationship modelling............................................... 137

6.10. What is achieved by normalisation? ................................................................................................ 137

6.11. How is normalisation used in practice? ........................................................................................... 137

6.12. Still confused? .................................................................................................................................... 137

6.13. Some questions with which to check your understanding.............................................................. 137

7. APPENDIX 7 INSTALLING AND USING MICROSOFT VISIO ..................... 140

7.1. Introduction ........................................................................................................................................... 140

7.2. Visualize complex information to better understand it ...................................................................... 140

7.3. Learning Visio........................................................................................................................................ 141

7.4. Creating DFDs using Visio ................................................................................................................... 141

7.5. Installing SSADM support.................................................................................................................... 141

8. APPENDIX 8 STRUCTURED WALKTHROUGHS, A WAY TO IMPROVE THE
QUALITY OF ANALYSIS........................................................................................ 143

8.1. How to seek for perfection! Improving the quality of our work ....................................................... 143

8.2. References for Appendix 8.................................................................................................................... 144

1. Introduction: Who is this document for?

1.1. Preface
This booklet aims to help you learn how to design and build applications using Microsoft
Access. It is written to be read and understood as you work on your own design and build
experiments.
It is a higher-level self-instruction booklet: it assumes that you are already a fairly
competent Access user.
If you need to learn how to use Microsoft Access, please see section 1.6 for further advice.

1.2. Skills required


Modern relational database management systems such as Microsoft Access have been designed
to enable users to get as far as is reasonably possible without needing software design and
construction (“programming”) skills. Four basic levels of skill can be recognised in database
use. These are:
♦ LEVEL 1 – Database User
Straightforward data input, amendment and querying, such as
might be undertaken by a clerical or professional worker who is
expected to capture and use data as a small part of their every
day work;
♦ LEVEL 2 – Database Builder
Basic database implementation skills, including design of
simple databases and implementation of the design as a series
of tables, queries and reports; such skills might be anticipated
in a professional worker in an office environment who has some
responsibility for the basic information systems (IS) needed in
that office, but whose primary job responsibility is not IS-
oriented;
♦ LEVEL 3 – Database Administrator
Real database design and implementation competence. You
would expect this in an information systems professional. But
this same higher level of competence may also be found in
certain business-oriented individuals who take a real pride in
using computers to their full potential. Such individuals are
sometimes referred to as power users. The work of such an
individual includes serving the needs of other clerical and
professional office workers by undertaking detailed analysis,
design and implementation work and creating systems usable
by other office workers and business professionals.
♦ LEVEL 4 – Database Professional
Expert user with programming skills.

1.3. The aims of the remainder of this document


The aims of working through the remainder of this document depend on the level of skill you
wish to reach:
♦ To help you to learn the principles of modelling
information systems in an experiential, problem-oriented way
and not just a theoretical one. (Corresponds to Level 1 above)
♦ If you are aspiring to general competence in business studies:
to give you reasonable skills in the analysis and
construction of effective, albeit small-scale, computerised
business information systems. (Corresponds to Level 2
above)
♦ If you want really to exploit the power of databases and
systems and / or you aspire to the challenge of managing
information systems professionals: To help you reach the
point where you can analyse a user's requirements, design
them a solution, and refine the solution by means of
building a working prototype in MS Access. (Corresponds to
Level 3 above)
♦ If you are a budding IS professional, or wish to become a
systems analyst or consultant: This document is a starting point
only – you will need specific additional training and
experience. (Corresponds to Level 4 above)
Note that material which only applies to Level 3 or above is shown in grey-background Arial Narrow, like
this paragraph.

1.4. The structure of this document and how to use it


This document consists of:
♦ Section 1 – The Principles of Database
∗ Data modelling using ERM
∗ Databases
♦ Section 2 – Using Microsoft Access to build good databases
∗ System implementation using Microsoft Access
♦ Section 3 – The Anytown Distance Learning Business School
example
∗ A fully worked example of the analysis and design of a
system for a virtual enterprise
∗ However, please note that this example does not enter
into the business-oriented aspects of the assignment
you are doing
♦ Appendix 1 Use Case analysis
∗ User interaction modelling using Use Case scenarios
♦ Appendix 2 Data Flow Diagrams
∗ Process modelling using DFD data modelling
♦ Appendix 3 When to use a spreadsheet, and when to use a
database
♦ Appendix 4 Reasons why a database is to be preferred to a
spreadsheet
♦ Appendix 5: Access Hints – Designing for Use
♦ Appendix 6: Normalisation
♦ Appendix 7 Installing and using Microsoft Visio
♦ Appendix 8 Structured Walkthroughs, a way to improve the
quality of your analysis
All readers should start with Section 1. Then read the rest of the document, ignoring the
Level 3 material shown in grey-background Arial Narrow. Later, reread the document including
that material.

1.5. About Learning Access


You should work your way through the following stages:

1.5.1. Starter: the naïve user: Level 1


You should already be at (or perhaps beyond) this stage. If you aren’t – learn how to
use Access now! This Designing and Building Access Database Systems guide cannot
teach you these basic skills, which you are assumed already to have. If you lack them,
you will have to acquire, revise and practise them at the same time as you are reading
this booklet. You’ll find a checklist just below, in section 1.6.

1.5.2. The thinking user: Level 2


The business specialist who nevertheless thinks carefully about how s/he can best use a
computer to help them to get their work done, or who spots a new application area or
ICT-related business opportunity.
Working through this document, and using the facilities of each Office program just
a little bit more each time, should get you to about this stage.

1.5.3. The competent (“power”) user: Level 3


This is the person who becomes known as the person to whom to talk when no-one
else in the department or office seems to know what to do! This is the person who has
mastered spreadsheets and uses them frequently, and who knows when to use a
database.

1.5.4. Advanced: the programmer or systems integrator: Level 4


Further competence in Access will require you to begin to use the power of the VBA
programming language and to understand SQL. This subject is beyond the scope of this
module, and is NOT expected in ESC business students.

1.5.5. What is the relevance of MS Access skills?


The main reason for advising business students to learn Access is that it is possible
using Access to build reasonably powerful Information Systems (IS) with a tool which
is reasonably straightforward (if not always easy!). In effect, you are building a
Prototype system using an End User Computing tool. At the same time, you are
consolidating what you have been taught in first and second year modules. All this is
essential if you are to achieve the learning outcomes of the module that you are
studying.
Facility in Access is itself a marketable skill. You should find it much easier to obtain
certain internships or placements as a result of the fact that you know industry standard
software like Access. In addition, Access is a reasonably complete implementation of
the theoretical relational database model originally defined by Edgar Codd and
popularised by many authors (notably Chris Date; see Date 2003). Relational
databases are a very powerful way to structure data and to be able to get the
information you need as a future manager.

1.6. Testing yourself


In this section, we summarise what we consider to be the basic knowledge and ability you need
to have in Microsoft Access.
The most important first step is to take a first step! Get hold of a copy of Access and start to
use it. As you do so, tick off the various things on the list below. You can start reading this
Designing and Building Access Database Systems guide in parallel, but please understand that
you cannot understand what is in this book without actually testing your practical ability and
knowledge.

1.6.1. What you should already know


We have already revised or introduced the following concepts:
♦ What are Tables?
♦ Designing a Table
♦ Keys: primary and foreign
♦ Relationships and linking

1.6.2. What you should already be able to do or need now to learn


You should aim at the following practical competences, which you may have acquired
in the first year at ESC Rennes, or which you may now need to learn:
Competence (tick when you can do this)
♦ Fundamental skills
∗ Starting Access
∗ Creating a database
∗ Creating a Table
∗ Adding Data
∗ Creating a Query
∗ Adding a second table
∗ Linking tables with a relationship –
establishing foreign keys
♦ Forms – basic concepts
∗ Creating a form based on a table using the
form-building assistant / wizard
∗ Changing the design of the form
∗ Adding records using a form
♦ Reports – basic concepts
∗ Creating a report

∗ Creating a report based on a query
♦ Relationships
∗ Creating a relationship between tables
∗ Creating a query which uses linked tables
∗ Forms, sub-forms and their use with one-to-
many (1:M) relationships
∗ Many-to-many relationships and multi-part
primary keys
♦ Forms – more advanced use
∗ Using list and combo boxes
∗ Combo boxes (zones de liste déroulantes)
and subforms (sous-formulaires)
∗ Creating a subform (sous-formulaire)
∗ Inserting a subform into a main form
∗ Subforms of subforms
∗ Adding record navigation buttons

1.6.3. A checklist of more advanced skills


If you are aiming at Level 3 or Level 4 competence, you will need to achieve:
Competence (tick when you can do this)
∗ Competence in update and append queries
∗ Advanced data validation techniques
∗ Basic competence in Visual Basic programming
∗ Basic competence in SQL (structured query language)
∗ Dealing with problematic many-to-many relationships

1.6.4. Further help on learning to learn


See appendix 4 for a very basic introduction to Microsoft Access, and appendix 3.4 for
some suggested websites.

1.7. Conventions Used in this document


Points which are significant only to more advanced users are indicated like this paragraph.

VITAL POINTS are indicated like this!


1.8. Limitations
This document is aimed at people who are comparatively new to systems analysis and design,
and who are not aiming to be experts in that field. It therefore aims to be useful and usable
without necessarily being totally complete. Where a conflict exists between being totally
comprehensive (but unnecessarily difficult), and being comprehensible and straightforward,
the second approach is adopted. The aim is to exclude material which is extraneous in the
sense that most business people, and indeed many analysts, do not need to consider it.
Therefore complex issues such as ternary relationships are ignored. Instead, the main issues
are concentrated on and the reader is encouraged to understand them and to apply them. Once
the reader is comfortable with the approach adopted in this document and has achieved some
real competence in database design and implementation, he/she can read more advanced texts
and tackle the more difficult issues. Until then, the slightly simplified (but never facile)
approach adopted in this document is a sensible compromise.
Microsoft Access is very unusual as a database management package in that it is intended
both to be very useful to people who are new to databases, and also to offer the full power of a
programmable system to more advanced users. It aims to make the gap between intermediate
and advanced use as small as it can be by providing both
macros and Visual Basic for Applications (VBA). Macros can be used to automate repetitive
sequences of commands or instructions, such as those needed to open a form without having
first explicitly to open the database window. VBA is a full programming language, and it can be
used for relatively complex tasks such as advanced field validation, and also for dealing with
anticipated errors and automatically recovering from them. VBA is a programming language
which is based on Microsoft's Visual Basic system. In Office 97 and beyond, the same VBA
language is used in all the major Microsoft applications, Word, Excel and Access.
Although this document does not assume familiarity with Visual Basic, certain more advanced
uses of Access do require awareness of such Visual Basic concepts as functions (subprograms
which return a result). Many of the more advanced features of Access, such as validation
rules, use Visual Basic syntax.
Business students should NOT normally attempt to master the VBA programming language.
However, at certain points in this document, VBA is used to illustrate more advanced
techniques.

1.9. Acknowledgements
I should like to thank:
♦ Former Huddersfield colleagues Dr. Steve Wade and Dr. Ken
Lunn
♦ ESC Rennes colleagues, notably Dr. Renaud Macgilchrist
♦ Previous ESC Rennes students
The following students gave me permission to reuse parts of their excellent
work on the Anytown Business School group case. I have incorporated this
case as a worked example in this document, and made significant use of these
students’ work:
Marine CORRE; Marie GALATAUD; Emmanuelle HAMEURY;
Naïla MALTI

SECTION 1 - THE PRINCIPLES OF DATABASE

2. A brief introduction to databases

2.1. Databases (bases de données) and how they are designed


In this chapter, we give consideration to databases: what they are, to some extent how they are
used, and to a limited extent how they are designed.
The subject matter here involves a specialised vocabulary, and a degree of complexity. Many of
the ideas surrounding databases are on first encounter quite strange, but they quickly become
intuitive if you combine a study of database theory with an attempt to make the ideas work in
practice. So: stick with that approach, learn a little then try it out!

2.2. Models used by systems analysts


These are examples only!
♦ Interaction models
∗ UCDs (Use Case Diagrams)
♦ Process models
∗ DFDs (Data Flow Diagrams)
♦ Data models
∗ ERA (Entity Relationship Attribute diagram)

2.3. Simple model of data processing

[Diagram: a Data Source feeds the Data Processing System, which passes Information to a
Recipient. The Data Processing System stores data in, and retrieves data from, the Database.]

This diagram, which shows the structure of a data processing system (a synonym for business
information system), highlights the central importance of the database as the place where data is
stored and from which it is retrieved.
2.4. Why study Databases?

2.4.1. They are used in every significant BIS


♦ Store details of orders, customers etc.
♦ Support product catalogue in B2C applications

2.4.2. They are at the heart of


♦ “Database marketing”
♦ CRM – customer relationship management
♦ ERP – enterprise resource planning

2.5. Background
Some understanding of what a database is, how it is used, and (to a greater or lesser extent) how
databases are designed is essential to understanding electronic business.
Businesses are systems; they use Information Systems, which are based on Information and
Communications Technology.
Example: any e-commerce company provides a Web window onto its internal catalogue, in the
form of a web page connected to a database.
Every stakeholder needs information from the business. They generally obtain this as
information presented on forms (screens), reports and dynamic web pages (web pages which
show the current contents of a database and permit stakeholders to update that database).

2.6. How to store data


♦ Data is stored in tables: 2-dimensional structures
♦ In MS Office terms:
∗ Word tables (also PowerPoint)
∗ Excel worksheets
∗ Access tables
♦ The 2-dimensional table which follows was created in Word

Relative strengths and weaknesses of Word, Excel and Access for storing data
♦ Word processing: e.g. Word
Advantages:
∗ Simple, well understood by people with weak computing skills
∗ Excellent formatting options
Disadvantages:
∗ No formulae (or only very rudimentary ones)
∗ Tables are not related in any way
∗ Can only be updated by one person at a time
∗ The data in a table has no “structure” known to the computer
♦ Spreadsheet: e.g. Excel
Advantages:
∗ Some degree of structure – cells organised into rows and columns,
with links possible between the cells
∗ Very powerful data manipulation using formulae
∗ Separate tables can be held in different worksheets
∗ Items of data can be related together using lookup formulae such as
VLOOKUP (RECHERCHEV) and HLOOKUP (RECHERCHEH)
Disadvantages:
∗ Persistent data is not safe
∗ Size limits – 65,536 rows (until Office 2007)
∗ No design methodology or coherence – it is possible and easy to mix
data up in a way which makes it impossible to find, update and relate
∗ Poor support for queries – searching is slow, and the lookup formulae
are far from being intuitive
∗ Can only be updated by one person at a time
♦ Database: e.g. Access
Advantages:
∗ Each kind of data is stored by the database management system
(DBMS) in its own separate table. The tables are related together
in accordance with the Relational data model – this gives
coherence to the collection of tables, which is the whole database
∗ Very powerful data structuring and querying. In fact a query is just
a results table which combines selected data from one or more
stored tables. The database program enables users to say
what data they need: they construct a query which precisely
specifies what data is to be retrieved into the results table
∗ Safer persistent data (though less safe than bigger, more powerful
DBMS programs like Microsoft SQL Server, ORACLE etc)
∗ Is multi-user: that is, more than one person at a time can change
(update) the database
∗ Since every record in a table has the same basic structure, it is
much easier and / or more cost-effective to process complete sets
of records under program control
Disadvantages:
∗ More difficult to use and to learn (at first)
∗ Requires thoughtful use and advance planning
∗ Access databases are not directly web-accessible
∗ The programming language within Access, VBA, is too difficult
and / or inappropriate for most business users to learn
Figure 1 Comparative strengths and weaknesses of data storage in two dimensional tables: Microsoft
Office tools

2.7. What is a database?


♦ A linked collection of tables (tables)
♦ Each table containing data about a single kind of thing
♦ Data in the separate tables can be combined ("joined") to
answer user needs for information

2.8. Why keep data in different tables?


♦ Company A uses a single table named orders to record orders
they receive, while Company B uses a relational database with
two tables: orders and customers.
♦ When a customer places an order with Company A, a new
record (or row) in the table orders is created.
♦ Because Company A has only one table of data, all the
information pertaining to that order must be put into a single
record: the customer’s general information, such as name and
address, is stored in the same record as the order information,
such as product description, quantity, and price. If customers
place more than one order, their general information will need
to be re-entered and thus duplicated for each order they place.

Order number  Customer name        Customer address  Product code  Product description  Unit of sale  Price per unit  Quantity  Amount
O001          GREGORY Mark         1 La Rue          P001          Apples               kg            0,80 €          2,5       2,00 €
O002          GREGORY Mark         1 La Rue          P876          Oranges              kg            0,90 €          1         0,90 €
O003          MACGILCHRIST Renaud  1 La Croix        P001          Apples               kg            0,80 €          3         2,40 €
O004          GREGORY Mark         1 La Rue          P001          Apples               kg            0,80 €          2         1,60 €
O005          MACGILCHRIST Renaud  1 La Croix        P876          Oranges              kg            1,05 €          1,5       1,58 €
O006          GREGORY Mark         11 La Rue         P001          Apples               kg            0,90 €          2         1,80 €
O007          GOT Guillaume        1 L’Avenue        P001          Apples               kg            0,90 €          3         2,70 €
O008          GREGORY Mark         99 Le Chemin      P876          Oranges              kg            0,90 €          1,5       1,35 €

Notes: the address in order O006 (11 La Rue) is mistyped; the address in order O008
(99 Le Chemin) is a changed address.

♦ Whenever there is duplicate data, as in the case above, many
inconsistencies may arise when users try to query the
database. Additionally, a customer’s change of address might
require the database manager to find all records in orders that
the customer placed, and change the address data for each
one.
♦ Company B is much better off with its relational database.
Each of its customers has one and only one record of general
information stored in the table customers. Each customer’s
record is identified by a unique customer code which will serve
as the relational key. When a customer orders from Company
B, the record in orders need contain only a reference to the
customer’s code, because all of the customer’s general
information is already stored in customers.
♦ Indeed, Company B might go further and introduce a product
table. It then has:
∗ CUSTOMER table
Customer number  Customer name        Customer address
C001             GREGORY Mark         1 La Rue
C002             MACGILCHRIST Renaud  1 La Croix
C003             GOT Guillaume        1 L'Avenue
∗ PRODUCT table
Product code  Product description  Unit of sale  Standard price per unit
P001          Apples               kg            0,80 €
P876          Oranges              kg            0,90 €
Note: price per unit is on both the PRODUCT and ORDER tables! One is standard, the
other order-specific.
∗ ORDER table
Order number  Customer number  Product code  Actual price per unit  Quantity  Amount
O001          C001             P001          0,80 €                 2,5       2,00 €
O002          C001             P876          0,90 €                 1         0,90 €
O003          C002             P001          0,80 €                 3         2,40 €
O004          C001             P001          0,80 €                 2         1,60 €
O005          C002             P876          1,05 €                 1,5       1,58 €
O006          C001             P001          0,90 €                 2         1,80 €
O007          C003             P001          0,90 €                 3         2,70 €
O008          C001             P876          0,90 €                 1,5       1,35 €
♦ This still isn’t perfect, since Orders and their Details continue to
be mixed together in one table. [1]
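
The three-table design can be tried out in a few lines. The following is a minimal sketch using Python's built-in sqlite3 module rather than Access; the table and column names (customer, product, orders, customer_no and so on) are assumptions chosen for illustration, and only a few of the sample orders are loaded. It suggests how a change of address is made once, in the customer table, and then shows up on every order when the tables are joined.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_no TEXT PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE product  (product_code TEXT PRIMARY KEY, description TEXT,
                       unit_of_sale TEXT, standard_price REAL);
CREATE TABLE orders   (order_no TEXT PRIMARY KEY,
                       customer_no TEXT REFERENCES customer(customer_no),
                       product_code TEXT REFERENCES product(product_code),
                       actual_price REAL, quantity REAL);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    ("C001", "GREGORY Mark", "1 La Rue"),
    ("C002", "MACGILCHRIST Renaud", "1 La Croix"),
    ("C003", "GOT Guillaume", "1 L'Avenue"),
])
conn.executemany("INSERT INTO product VALUES (?, ?, ?, ?)", [
    ("P001", "Apples", "kg", 0.80),
    ("P876", "Oranges", "kg", 0.90),
])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", [
    ("O001", "C001", "P001", 0.80, 2.5),
    ("O002", "C001", "P876", 0.90, 1.0),
    ("O003", "C002", "P001", 0.80, 3.0),
    ("O008", "C001", "P876", 0.90, 1.5),
])

# The customer moves house: one row changes, however many orders exist.
conn.execute("UPDATE customer SET address = ? WHERE customer_no = ?",
             ("99 Le Chemin", "C001"))

# Rebuild the flat, one-big-table view by joining the three tables.
flat_orders = conn.execute("""
    SELECT o.order_no, c.name, c.address, p.description,
           o.actual_price, o.quantity
    FROM orders AS o
    JOIN customer AS c ON c.customer_no = o.customer_no
    JOIN product  AS p ON p.product_code = o.product_code
    ORDER BY o.order_no
""").fetchall()

# Every order for this customer now shows the same, single address.
addresses_for_c001 = {row[2] for row in flat_orders if row[1] == "GREGORY Mark"}
```

Because the address is stored only once, a mistyped variant such as "11 La Rue" cannot appear on one order and not on the others.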

2.9. Learning a minimum about database


Every business student needs to know about and understand:
♦ Database principles
∗ Tables
∗ Queries
∗ A query is a results table
♦ Introduction to database design
♦ How to use a sample database
♦ How databases are used - ERP, CRM etc.

2.10. Basic concepts


♦ Entity: class of thing about which data is stored
Examples: student; programme – these are tables of data.
♦ Occurrence: a single instance of an entity
Example: ETU2004987 Smith, John – this is one record in the table of data.
♦ Attribute: a single fact that describes, qualifies or is otherwise
a property of an entity
Example: Programme name, value for John Smith: MA International
Business
♦ Key: attribute(s) which uniquely identify a single occurrence of
an entity
Example: student number uniquely identifies a Student
♦ Relationship: a logical connection or dependency between
two entities
Example: any one programme has many students; any one student is on
precisely one programme: we say that a one to many relationship exists
between programme and student

[1] The solution here includes the introduction of a link or intersection entity, called Order
Detail. See section 2.20 for a general description of what must be done.

2.11. Entity (type, class)


An entity represents a class of objects, usually in the real world. Synonyms for entity include
class and type.
♦ Entities are of importance to the area of business being
investigated
♦ They are objects about which data is stored
♦ Represented as boxes on an Entity Relationship Model (ERM)
♦ Examples:
∗ Student
∗ Programme
An entity has a number of different data attributes or properties, that is, facts about the thing.
For example a student will have a student number, a last name, a first name, and a programme
code. A programme will have a programme code, name and programme leader.

2.12. An example: Students by Programme

Entity relationship diagram (Programme 1:M Student; the crow's foot is at the Student end)

Programme
    PK   Programme code
         Programme name
         LMD level
         Leader

Student
    PK   Student no
         Surname
         Forenames
    FK1  Programme code

Sample data

Programme
Programme code  Programme name          LMD level  Leader
PGE             Programme Grande École  M          RIVET Philippe
EMBA            Executive MBA           M          MINDAY Don

Student
Student no  Surname   Forenames       Programme code
20099234    Leuchars  Annabelle       EMBA
20099235    Dromsky   Pierre-Charles  PGE
20099897    Mozart    Anne-Marie      PGE


In the diagram, the two rectangular boxes represent entity types. Here, they are programme and
student. They are represented as different entity types because they represent different things in
the real world. At least in theory, a programme could exist without any students. Almost by
definition, a student is on a programme of some kind, but it is clear that programme and student
are not the same things. It is equally clear that they are related. The diagram represents this
relationship by using a line with a crow's foot at one end of it. The end with the crow's foot
represents the many end of a one to many relationship, often written simply as 1:M.
It is necessary to have an additional attribute on the student which links the student to its
owning programme. On the sample data provided with the diagram, we have shown Annabelle
Leuchars as being a student on the Executive MBA, by including the Programme code in the
Student table. Programme code is a foreign key, which links the Student back to her
Programme.

2.13. Attribute
An attribute is a property of an entity, a single fact about the entity. An entity type will
normally have several different attributes, one (or occasionally more) of which uniquely
identifies every instance of the entity type. The identifying attribute or group of attributes is
called the primary key for the entity type.
The attributes of Programme are Programme code (primary key), Programme name, LMD level and
Leader.
The attributes of Student are Student no (primary key), Surname, Forenames and Programme code
(foreign key).
Programme code has to be present as a foreign key in the student entity in order to represent the
relationship which exists between programme and student.

2.14. Primary and Foreign Keys


♦ A primary key is an attribute or combination of attributes which
uniquely identifies an entity occurrence
♦ To make a link between the Many (child) end of a relationship
and its One (parent) end, the Primary key of the One end is
repeated in the Many end
♦ In the Many entity, it is known as the Foreign Key
♦ What are Foreign Keys? A foreign key is an attribute that
completes a relationship by identifying the parent entity.
Foreign keys provide a method for maintaining integrity
(coherence, consistency) in the data. Every relationship in the
data model must be supported by a foreign key.
♦ Identifying Foreign Keys: Every dependent and subtype entity
in the data model must have a foreign key for each relationship
in which it participates. Foreign keys are formed in dependent
and subtype entities by migrating the entire primary key from
the parent entity.

2.15. Why have foreign keys?


♦ One particular programme has many students on it. That group
of students (we call the group an entity type or table or set –
the words are synonyms) is defined by having the same
programme code
♦ The programme code of programme is of course the Primary
key
♦ It is also, in the student table, the Foreign key which links each
student to the programme of which s/he is a part
♦ Remember that the foreign key repeats, at the many end, the
primary key of the one end of the one to many relationship.
This is essential if the database management software is to be
able to link back together the students on a given programme,
or to look up the details of the programme for a given student.
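
The role of the foreign key in maintaining integrity can be seen in a small experiment. This is a minimal sketch using Python's built-in sqlite3 module instead of Access; the table and column names are assumptions, and the programme code 'XYZ' and the student 'Test Student' are invented for the test. Note that SQLite only enforces foreign keys once the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE programme (
    programme_code TEXT PRIMARY KEY,
    programme_name TEXT NOT NULL
);
CREATE TABLE student (
    student_no     TEXT PRIMARY KEY,
    surname        TEXT NOT NULL,
    forenames      TEXT NOT NULL,
    -- Foreign key: repeats the primary key of the One (parent) end.
    programme_code TEXT NOT NULL REFERENCES programme(programme_code)
);
""")

conn.execute("INSERT INTO programme VALUES ('EMBA', 'Executive MBA')")
conn.execute(
    "INSERT INTO student VALUES ('20099234', 'Leuchars', 'Annabelle', 'EMBA')")

# 'XYZ' is a made-up programme code with no parent row, so the insert is refused.
try:
    conn.execute(
        "INSERT INTO student VALUES ('20099999', 'Test', 'Student', 'XYZ')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

The refused insert is the point of the exercise: the database will not accept a student whose programme does not exist, so every student can always be linked back to a programme.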

2.16. Entity occurrences – student


Having decided the general attributes of an entity type, it is then possible to store records
relating to occurrences of the entity, typically in the real world. So, in the example above,
details of three students are given, two on one programme, one on another.

2.17. Entity occurrences – programme


In the example above, details of two programmes are given.

2.18. Queries
The purpose of a database is to enable users to get the specific information they need. This can
be done using queries. Queries are both useful in themselves, and also are used as the basis for
reports and for forms.
♦ To answer a question like "who is programme leader for a
given student?", we can get all the necessary information by a
query on both tables - programme and student
♦ Note that the name of the programme leader should be an
attribute of programme, and definitely NOT of student!
Joining the two tables is the work of the relational database management system software
(RDBMS). A user of the database formulates a query, and the RDBMS looks up details of
occurrences in both entity types, joining the answers together as a result presented to the user.
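
As an illustration, the question "who is programme leader for a given student?" might be expressed as follows. This is a sketch in Python's built-in sqlite3 module rather than Access; the table and column names and the helper function programme_leader_for are assumptions made for the example, while the data comes from the sample tables in section 2.12.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE programme (programme_code TEXT PRIMARY KEY, programme_name TEXT,
                        lmd_level TEXT, leader TEXT);
CREATE TABLE student (student_no TEXT PRIMARY KEY, surname TEXT, forenames TEXT,
                      programme_code TEXT REFERENCES programme(programme_code));
INSERT INTO programme VALUES ('PGE', 'Programme Grande École', 'M', 'RIVET Philippe');
INSERT INTO programme VALUES ('EMBA', 'Executive MBA', 'M', 'MINDAY Don');
INSERT INTO student VALUES ('20099234', 'Leuchars', 'Annabelle', 'EMBA');
INSERT INTO student VALUES ('20099235', 'Dromsky', 'Pierre-Charles', 'PGE');
INSERT INTO student VALUES ('20099897', 'Mozart', 'Anne-Marie', 'PGE');
""")


def programme_leader_for(student_no):
    """Return the leader of the programme that the given student is on."""
    row = conn.execute("""
        SELECT p.leader
        FROM student AS s
        JOIN programme AS p ON p.programme_code = s.programme_code
        WHERE s.student_no = ?
    """, (student_no,)).fetchone()
    return row[0] if row else None


leader = programme_leader_for("20099234")
```

The leader's name is stored once, on the programme, and reaches the student only through the join on programme code.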

2.19. Students and Modules

[Diagram: the Module and Student entity boxes, joined by a line with a crow's foot at both ends]

In this diagram, the two rectangular boxes represent entity types. Here, they are Module and
Student. The relationship is Many-to-Many. The diagram represents this relationship by using a
line with a crow's foot at both ends of it. Each crow's foot marks a many end of
the many to many relationship, often written simply as M:M or M:N.
This model reflects the empirical observations that:
1. Any one student studies many modules
2. Any one module has many students
Many-to-Many relationships are very common. They are also problematical – this is because
actual database management systems like Access (and almost all others) cannot support Many-
to-Many relationships directly.
However, by following simple rules, it is possible to eliminate many-to-many relationships.

2.20. Three Vital Rules


1. An attribute can only hold a single fact
∗ If the name of an attribute is a list (e.g. something in
the plural, like Student Qualifications) this is a sign that
another entity is needed
2. The Primary key of the One end of a One to Many relationship also
appears as a foreign key in the Many end
3. A Many to Many relationship can be resolved into two One to Many
relationships, both going to a Link (or Intersection) entity type
These are Rules – there is no need to question them, just to apply them!

2.21. Resolving Many-to-Many Relationships


The many-to-many relationship is removed by:


♦ Introducing a link or intersection entity
♦ Drawing 1-to-many links FROM each original entity TO the new
one
∗ Note that the primary key of EACH parent entity
becomes part of the COMPOUND primary key of the
link entity
♦ Here, the primary key of Module is Module Code, and that of
Student is Student No. Both become the compound primary
key of the Registration entity
Entity relationship diagram (Module 1:M Module Registration M:1 Student)

Module
    PK      Module code
            Module name
            Module leader

Module Registration
    PK,FK1  Module code
    PK,FK2  Student no
            Module result

Student
    PK      Student no
            Surname
            Forenames

Sample data for Module Registration

Module code  Student no  Module result
IS402E       20099234    A
IS402E       20099235    B
IS402E       20099897    C
OB401E       20099234    Fx

♦ Note that there is only one primary key, made up of two
attributes
∗ Neither Module code nor Student no is unique in the
Module registration table – but the combination is
unique
The combination stays unique unless a student is allowed to do a module a second time, in which
case it is necessary to add a further attribute, usually a date, to the
compound primary key in order to make it unique again:

Module code  Student no  Date  Module result
IS402E       20099234    2009  A
IS402E       20099235    2009  B
IS402E       20099897    2009  C
OB401E       20099234    2009  Fx
OB401E       20099234    2010  E
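
A compound primary key of this kind can be sketched as follows, again using Python's built-in sqlite3 module rather than Access. The table and column names are assumptions (the Date attribute is called year here), and the foreign keys to Module and Student are left out to keep the example short. The last insert, with result 'D', is invented purely to show a duplicate key being refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE module_registration (
        module_code   TEXT NOT NULL,
        student_no    TEXT NOT NULL,
        year          INTEGER NOT NULL,
        module_result TEXT,
        -- One primary key made up of three attributes.
        PRIMARY KEY (module_code, student_no, year)
    )
""")
rows = [
    ("IS402E", "20099234", 2009, "A"),
    ("IS402E", "20099235", 2009, "B"),
    ("IS402E", "20099897", 2009, "C"),
    ("OB401E", "20099234", 2009, "Fx"),
    ("OB401E", "20099234", 2010, "E"),  # the retake: same module and student, new year
]
conn.executemany("INSERT INTO module_registration VALUES (?, ?, ?, ?)", rows)

# Repeating all three key values is refused by the database.
try:
    conn.execute(
        "INSERT INTO module_registration VALUES ('OB401E', '20099234', 2010, 'D')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Both attempts at OB401E by the same student are kept, one per year.
attempts = conn.execute(
    "SELECT year, module_result FROM module_registration "
    "WHERE module_code = 'OB401E' AND student_no = '20099234' ORDER BY year"
).fetchall()
```

Without the year in the key, the second attempt at OB401E could not be stored at all.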

2.22. Towards a more complete entity relationship attribute model
As analysis proceeds, the model is gradually refined and improved. Still incomplete, it might
look like this:

Programme
    PK      Programme code
            Programme name
            LMD level
            Programme leader surname
            Programme leader forenames

Qualification
    PK      Qualification

Student
    PK      Student no
            Student surname
            Student forenames
    FK1     Programme code
            Student gender
            Student birthdate

Award
    PK,FK1  Qualification
    PK,FK2  Student no
            Award result

Module
    PK      Module code
            Module name

Module Operation
    PK,FK1  Module code
    PK      Module year semester
            Module leader

Module Registration
    PK,FK1  Module code
    PK,FK1  Module year semester
    PK,FK2  Student no
            Module grade
            Module mark

Note that this model has introduced a number of changes:
♦ There is greater precision in the attribute names chosen
♦ We wish to record a student’s qualifications, so we have
introduced Qualification
♦ Because a many-to-many relationship exists between student
and qualification, an intermediate (link) entity has been
introduced; we observe that in the real world a specific award is
given to each student who qualifies in something, so we’ve
called the link entity Award
♦ We observe that many modules are offered and “run” (that is,
they occur and are taught) for several years in succession (and
sometimes in more than one semester in a year), and further
that in some cases students take a module one year, fail it, and
do it again in a subsequent year; therefore we introduce a
Module Operation, the run of a module in a given year and
semester
♦ The model remains incomplete but it’s now good enough to be
worth prototyping (building and testing) in Access – so that we
can check that it meets our needs for storing data and (above
all) retrieving information in a very flexible way

2.23. Example Query – definition


Suppose we want to show a list of the students and the programme they are on, in ascending
order of student last name. The slide is a screen shot indicating how this query is constructed in
the Microsoft Access RDBMS.

I created it using the query design wizard (assistant) in Access. In the simple query wizard, I
specified fields from the student table and from the programme table which I wished to appear
in the result. Here, I wanted a list of students with details of the programme they are following.
The screen shot shows the resulting query: it indicates that there are two tables which are joined
together in the preparation of the result, and it also indicates which fields take part in the result.

2.24. Example Query – results


This slide shows the results of running (executing) the query. The results of the query have the
form of a table, in which the columns are the attributes from the participating tables and the
rows are the result records.

How has Access created this result? Probably something like this: it reads each record in the
student table. One of the attributes of student is the programme code. Programme code is the
foreign key in the student table; it is also the primary key in the programme table. Access looks
up the details from the programme corresponding to the programme code for each student
record. In effect, it joins together the two tables on the basis of the linking foreign key.
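
The lookup described above can be imitated by hand. The sketch below uses plain Python dictionaries and lists, not Access; it is only one plausible way of picturing the join (a real database engine may use quite different strategies internally), and the names programmes, students and students_with_programme are assumptions made for the example.

```python
# Programme table, keyed by its primary key (programme code).
programmes = {
    "PGE": {"name": "Programme Grande École", "leader": "RIVET Philippe"},
    "EMBA": {"name": "Executive MBA", "leader": "MINDAY Don"},
}

# Student table: each record carries programme_code as a foreign key.
students = [
    {"student_no": "20099234", "surname": "Leuchars",
     "forenames": "Annabelle", "programme_code": "EMBA"},
    {"student_no": "20099235", "surname": "Dromsky",
     "forenames": "Pierre-Charles", "programme_code": "PGE"},
    {"student_no": "20099897", "surname": "Mozart",
     "forenames": "Anne-Marie", "programme_code": "PGE"},
]


def students_with_programme(students, programmes):
    """Join each student to their programme via the foreign key."""
    result = []
    for student in students:
        # Follow the foreign key to the matching programme record.
        programme = programmes[student["programme_code"]]
        result.append(
            (student["surname"], student["forenames"], programme["name"]))
    # Tuples sort on their first element first: ascending surname.
    return sorted(result)


listing = students_with_programme(students, programmes)
```

Each row of listing combines attributes from both tables, which is what the results table of the query contains.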

2.25. What is a query?


(French ‘une requête’)
♦ A query is a response to a question formulated by a user of the
database
♦ A query takes the form of a table, in this case, a results table
∗ Technically, data tables are sets or relations
∗ So are the results of a query
♦ The power of a relational database is that it treats all stored
data and derived information as what mathematicians call sets
(sometimes called relations)
∗ Therefore database software is “simply” a computerised
implementation of mathematical set manipulation
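
One practical consequence is that the result of a query can itself be queried, just like a stored table. The following sketch uses Python's built-in sqlite3 module and a saved query (a view); the view name student_programme and the table and column names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE programme (programme_code TEXT PRIMARY KEY, programme_name TEXT);
CREATE TABLE student (student_no TEXT PRIMARY KEY, surname TEXT,
                      programme_code TEXT REFERENCES programme(programme_code));
INSERT INTO programme VALUES ('PGE', 'Programme Grande École');
INSERT INTO programme VALUES ('EMBA', 'Executive MBA');
INSERT INTO student VALUES ('20099234', 'Leuchars', 'EMBA');
INSERT INTO student VALUES ('20099235', 'Dromsky', 'PGE');
INSERT INTO student VALUES ('20099897', 'Mozart', 'PGE');

-- A saved query: its result has rows and columns just like a stored table.
CREATE VIEW student_programme AS
    SELECT s.student_no, s.surname, p.programme_name
    FROM student AS s
    JOIN programme AS p ON p.programme_code = s.programme_code;
""")

# A second query that treats the first query's result as if it were a table.
pge_surnames = [row[0] for row in conn.execute(
    "SELECT surname FROM student_programme "
    "WHERE programme_name = 'Programme Grande École' ORDER BY surname")]
```

Access offers something comparable when one saved query is used as the source for another query, a form or a report.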

2.26. Data Dictionary


A store of data about data, a dictionary is a database used by analysts, programmers etc. or by
you as you design a database. I find it useful to use a spreadsheet for this purpose, and you will
find an example dictionary implemented as a spreadsheet for the Anytown case later in this
document.

2.27. Entity-Relationship Diagrams (ERD)


Entity-Relationship Diagrams show:
♦ Entity types = the kind of things data is collected about in the
database
∗ Entities = the specific things data is collected about
♦ Relationship = the way specific entities of one type are related
to specific entities of the other type
♦ Attributes = the specific data items of interest stored for each
entity type
♦ ERDs help determine what kinds of data will be included in a
database, and how the database will be structured
♦ They are an excellent communication medium between users
and developers

2.28. Another example database: NorthWind / Les Comptoirs
♦ Provided with Microsoft Access – usually to be found under
help menu, Sample databases. It includes tables such as:
∗ Product
∗ Order
∗ Customer
∗ Supplier
∗ Purchase order
The screen-shot shows a slightly-improved version of NorthWind.

2.29. Relationship: meaning and characteristics


♦ A link between entities which is significant for this type of
system
♦ Degree of relationship
♦ Optionality: does this relationship have to hold?
♦ Name: link phrase
E.g. Customer places orders.
∗ Two names?
A relationship can be named in either of two directions, depending
on which entity you start from. Thus:
(i) Customer places orders
(ii) Orders are placed by customers

2.30. Why it’s important to relate entities


Construction of the query in the previous example was eased because a proper design process
had been undertaken in order to determine what entity types would be represented in the
database, and how they would be related. This design process resulted in the simple entity
relationship model presented in section 2.12. A line was used to link the two entity boxes
together; this line had a crow's foot at the many end of the one to many relationship which
analysis indicated exists between programme and student.
So one of the most important results of analysis is to establish what entity types are, and how
they are related. Relationships are links between entities which are significant for this type of
information and which are normally true in reality. A relationship can have a name: actually, it
can have two, one read from one end of the link, the other from the other end. In the earlier
example of Student and Programme, we can recognise two relationships - "student is on
programme" and "programme enrols student". The relationship is said to have a degree of 1:M,
which can be read one to many. In this case, the relationship is said to be mandatory: that is to
say, a student is not a student if they are not on a programme, and a programme is not a
programme if it has no students. (The second assertion may not always be the case, and it is
possible to represent a relationship as being optional from one or both ends.)

2.31. Degree of relationship (simplified)


The degree of a relationship indicates, for each end, whether more than one occurrence at that
end can be associated with a single entity occurrence at the other end.
♦ There are three basic possibilities:
∗ One to one: 1:1
∗ One to many: 1:M
∗ Many to many: M:N
For more information concerning the degree of a relationship, please see 3.4.4

2.32. Simple database design


Design means deciding:
♦ What are the main entities?
♦ What are the attributes of each entity?
∗ What is the data type of each attribute?
∗ Validation rules
♦ Keys: primary and “foreign”
♦ Relationships and linking
Before building, for example, a Microsoft Access database, it is essential to carry out at least
some database design. These are the main points that have got to be addressed in even the most
informal design exercise.
The aim is to identify all the main entities and to give them the appropriate attributes.
Having done that, consideration has to be given to the kind of data which each attribute will
actually hold. Broadly speaking, the type of data is either numeric, text or something more
special-purpose like a date. If text, consideration has to be given to the maximum number of
characters that can be stored. Data which appears to be numeric may not in fact be so:
telephone numbers, for example, must be stored as text, as indeed should code numbers like
student numbers.
Note the usefulness of a simple data dictionary here. If you decide that a customer number is to
be five letters (as it is in NorthWind), then it needs to be five letters everywhere it is used. You
record that decision in the data dictionary.
It is absolutely critical to identify the primary key for each entity type, and to ensure that
there is a foreign key at the many end of any one to many relationship which is discovered as
you think about how the entity types are related.
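
The point about telephone numbers and code numbers is easy to demonstrate. This is a minimal sketch in Python with the built-in sqlite3 module; the telephone number is invented for illustration, and the rule that a student number has exactly eight characters is an assumption suggested by the sample data, not a rule stated in this document.

```python
import sqlite3

phone_as_typed = "0299546363"  # hypothetical number, invented for illustration

# Treated as a number, the leading zero disappears.
phone_as_number = int(phone_as_typed)

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        -- Code numbers are stored as text; the CHECK is a simple validation rule
        -- (eight characters is an assumption based on the sample data).
        student_no TEXT PRIMARY KEY CHECK (length(student_no) = 8),
        surname    TEXT NOT NULL,
        telephone  TEXT
    )
""")
conn.execute("INSERT INTO student VALUES (?, ?, ?)",
             ("20099234", "Leuchars", phone_as_typed))

# Stored as text, the number comes back exactly as it was typed.
stored_phone = conn.execute("SELECT telephone FROM student").fetchone()[0]

# A seven-character student number breaks the validation rule and is refused.
try:
    conn.execute("INSERT INTO student VALUES (?, ?, ?)",
                 ("2009923", "Short", None))
    short_number_rejected = False
except sqlite3.IntegrityError:
    short_number_rejected = True
```

Decisions of this kind (data type, length, validation rule) are exactly what the data dictionary is there to record.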

3. A Simple Methodology for Designing Microsoft Access Databases

A methodology is a coherent set of methods, linked by a common underlying philosophy. The
methodology I suggest for designing and building Access databases is described in this section, and
elaborated on in the remainder of this document. Some of the specific methods are derived from the
British SSADM, Structured Systems Analysis and Design Methodology; and one UML [2] technique, that
of Use Case analysis, is employed. However, the Simple Methodology presented here is greatly
simplified to make it more appropriate to business use.

3.0. Introduction – background to the methodology

3.0.1. Database, or something else?


As the writer of the biblical book Ecclesiastes wrote nearly three millennia ago, "There
is nothing new under the sun"! Whatever, it is certainly the case that most standard
information handling problems encountered in business are common to more than one
business. It is therefore very likely that you will be able to find a system which has
been written for someone else but which is (more or less) directly applicable to an
information systems requirement you are analysing. If you can find a packaged
solution which is (more or less) applicable to your company or to a client whom you
are advising, then you can save yourself a lot of effort and the client a lot of money.
But what if there is no such package, or if it really doesn't suit the needs of the client /
user?
A Database may be, and often is, appropriate in many contexts - but still consider
alternatives, such as spreadsheets, for more straightforward or small-scale work, or
where system users are already highly familiar with spreadsheets.
Spreadsheets are great in some contexts, and there is immense power in advanced
spreadsheet packages like Microsoft Excel and Lotus 1-2-3. [3] And as we will begin to
see as we become more knowledgeable about information systems modelling and more
experienced in our use of relational databases, there are deficiencies also in the
relational database approach. For this reason, newer techniques such as object oriented
systems analysis and design are beginning to be used by IS/IT professionals. But for
now, what you must do is work hard to get familiar with systems analysis using the
structured approach, and database design using the relational approach.

[2] UML is the Unified Modelling Language, a set of notations largely used by information systems professionals
and particularly associated with a style of programming called Object Oriented or OO. The only UML notation
we employ in this module is the Use Case diagram, UCD.
[3] However, it is a serious error to use a spreadsheet when a database is necessary. Please see appendices 3 and 4
for a discussion of reasons why a database is often superior to a spreadsheet.

3.0.2. What is a methodology?

A methodology is a set of methods for tackling a particular class of problem; the
methods should be linked by a coherent philosophy and be consistent with one
another. Formal (mathematical) and semi-formal (strictly defined) methodologies have
been defined for the analysis, design and construction of information systems.
However, they are often too rigid, too prescriptive or quite simply too long-winded to
be useful for people who are still learning the basics of the craft and who are tackling
relatively small problems. The approach adopted in the rest of this document is
methodical, but does NOT follow any one specific methodology; instead, it follows a
simplified methodology of my own. [4] So, if you think a full-blown database is
appropriate, you need to consider the steps outlined in the main parts of this section,
many of them associated with a particular method. [5]

3.0.3. Assumptions
The approach described in this document is applicable only to relatively small
applications: such as proof-of-concept prototype systems or perhaps end user
computing systems. So:
∗ The requirement is relatively small scale
E.g. the specific needs of the department in which you work; or
(part or all) of a small business
∗ A prototype (and perhaps target system) can be
implemented using Microsoft Access or a similar end-
user orientated database
Even if it’s too large for Access, people often create an initial (or
“prototype”) system in Microsoft Access. This is then used to
establish the complete requirements for an eventual full, or “target”
system. Or the target system may be sufficiently small to be
realisable using Microsoft Access.
∗ You are acting as the Analyst or System Designer
This document exists to help people design an effective database
application. In business, it is normal to distinguish between those
who use a system, the so-called users, and those who analyse,
design and implement a system – the developers.
This document treats you throughout as though you were acting in
the developer role.
What if you are the user as well as the developer?
Then you are in the situation sometimes described as end user
development, where a business person or student develops a system
for their own use and perhaps also for the use of other members of
their team or department.
Wherever possible, get someone else – e.g. a member of your team
– to act in the role of a true system user. Their perspective may be
different but also complementary.

[4] Some of the techniques I use are borrowed from the UK standard SSADM (Structured Systems Analysis and
Design Methodology); the French equivalent is MERISE (see, for example,
http://www.commentcamarche.net/merise/concintro.php3). Nevertheless, these methodologies in their full form
are far too complex to be used by business people unaided.
[5] Please note that what follows does not directly correspond to the structure of the assignment that you have to
undertake, but that there are close parallels.

3.0.4. Introduction to modelling business information systems: why we have chosen certain techniques

♦ Business situations and how to model them
Business information systems have several overlapping aspects:    Our preferred modelling technique for each of these “views”:
The situations and decisions that influence the system            Usage models – Use Case Diagrams
The processes through which information flows through system      Process models – Data Flow Diagrams
Taking account of the life cycles of information
The manner in which data and information is organised             Data models - ERA Diagrams (Entity Relationship Attribute diagram)

3.0.5. What we’re trying to achieve together


♦ Three viewpoints - Trois cas de figure
∗ You need a system and you want someone else to build
it for you
Then you need to know how to specify what you need
∗ You need a system and you have to build it yourself
Then you need to know how to analyse and build what you
need
∗ You’re an entrepreneur and you want to build a
business
So: along with your business, marketing, organisational, etc.
strategies: you need a systems strategy and process and data
models
∗ In every case – YOU have work to do!

3.0.6. Business Process Modelling: Documenting a Business Process


♦ Is itself a business process that involves naming business
processes and subdividing them into their basic elements
∗ Helps clarify the problem the information system
attempts to solve – requirements analysis
∗ Can later be used as program specifications
♦ Business Process Modelling is a hot topic associated with
quality management
♦ Business Process Reengineering (BPR) = the complete
redesign of a business process using ICT

3.0.7. Why have we chosen the techniques we have?
♦ Entity relationship attribute model
The ER model maps directly to tables and fields in commonly-used database
management systems. It is much less complex than classes, the parallel in the
more recent (but very technical) object oriented (OO) approach.
♦ Use case model
This technique is intended specifically for use with business users, and it is
reasonably visual. It is therefore a very good basis for a dialogue between
you as system users and IS professionals.
♦ Dataflow diagrams
This technique is intended specifically for use with business users, and it is
reasonably visual. It also breaks large problems down into smaller, more-
manageable ones. It is therefore a very good basis for a dialogue between
you as system users and IS professionals.

3.0.8. Business Process Modelling


It has long been recognised that there is a need for a fundamental and high-level
analysis of the business's processes. In fact, in early strategic information systems
planning exercises, it was not uncommon to seek to do a data flow diagram and an
entity relationship model for a complete organisation. However, these tools may be too
low-level when all that is required is to identify potential systems requirements.
More recently, specific business process modelling techniques have been developed.
The approach which is gaining favour currently is that of the Business Process
Management Initiative. See http://www.bpmi.org/ (checked 24/11/2008)
This approach, though of great interest, is outside the scope of this document.
Instead, the assumption is that a single business process is being “computerised”, that
is, supported by a new computer-based information system.

3.0.9. SSADM
SSADM itself is less widely used than once it was but remains important, not least
because it is relatively easy for business people to understand when compared with
more modern techniques.
For a good worked example of all SSADM techniques, please see
http://www.systemsanalysis.org.uk/ accessed 24/11/2008.
Wikipedia (accessed 26/02/2008) has a useful summary of SSADM (Structured
Systems Analysis and Design Methodology):
http://en.wikipedia.org/wiki/Structured%20Systems%20Analysis%20and%20Design%20Method
The following material was found at http://www.edrawsoft.com/SSADM.php accessed
03/01/2009.
♦ Introduction - Structured Systems Analysis and Design
Methodology (SSADM)
SSADM (Structured Systems Analysis and Design Method) is another method dealing with
information systems design. It was developed in the UK by the CCTA (Central Computer and
Telecommunications Agency) in the early 1980s. It is the UK government's standard method
for carrying out the systems analysis and design stages of an information technology project.
SSADM has been traditionally used for the development of medium or large systems. However,
one variant of SSADM is 'Micro SSADM' which is for small systems. SSADM starts from
defining the information system strategy and then develops a feasibility study module. These
are followed by requirements analysis, requirements specification, logical system specification
and a final physical system design.
♦ Structured Systems Analysis and Design Methodology
(SSADM) Stages
SSADM consists of 5 main stages (which are broken down into several sub-stages). The 5 main
stages are:
♦ Feasibility Study
The Feasibility Study involves a high level analysis of a business area to determine
whether it’s feasible to develop a particular system. Data Flow Modelling and (high-
level) Logical Data Modelling can be used as techniques during this stage.
♦ Requirements Analysis
In the Requirements Analysis stage requirements are identified, the current
business environment is modelled, and business system options are produced and
presented. One of these options will be chosen and then refined. Data Flow Modelling
and Logical Data Modelling can be used as techniques during this stage.
♦ Requirements Specification
In the Requirements Specification the functional and non-functional requirements are
specified as a result of the previous stage. Data Flow Modelling, Logical Data
Modelling and Entity Event Modelling can be used as techniques during this stage.
♦ Logical System Specification
In the Logical System Specification the development and implementation
environment are specified, and the logical design of update and enquiry processing
and system dialogues are carried out.
♦ Physical Design
During the Physical Design the logical system specification and technical
specification are used to create a physical design and a set of program specifications.
♦ Applicability of SSADM
Unlike rapid application development, which conducts steps in parallel, SSADM builds each
step on the work that was prescribed in the previous step with no deviation from the model.
Because of the rigid structure of the methodology, SSADM is praised for its control over
projects and its ability to develop better quality systems. Most current developers find it too
onerous in its application, however.

3.0.10. MERISE
This is a French equivalent to SSADM.
See, for example, http://www.commentcamarche.net/merise/concintro.php3 accessed 24/11/2008.

3.1. Feasibility study


Research the basic situation and write up a natural-language description of the scenario. Go on
to describe the purpose of the database: who will use it, why, and for storing what kind of data.
♦ Include a list of the basic, obvious system requirements (things
the system must do) as enunciated by its users or their
representatives
♦ Set out the basic constraints – budget, timescale, existing
systems, etc.
♦ Can the user afford the hardware, software, development effort
and training required to implement the system?
♦ Do the benefits (financial and other) exceed the costs?
♦ Is there sufficient time to build a system in this way, or will the
user have to make do with a bought-in package?
The result of a feasibility study is usually a Go / No-Go decision, with the go sometimes given
to a proposal whose scope is more limited than the original idea.

3.2. Set out Project Terms of Reference


The terms of reference for a project are an agreement between the client for a project and the
people responsible for carrying out the project. The terms of reference describe the context and
scope of the project, identify the client, implementers and project management approach, and
set out the overall timetable for the project.

3.3. Analyse the needs of users


The next step is a thorough analysis of user requirements.
You might use Use Case diagrams (see also Appendix 1) and Data Flow Diagrams (see also
Appendix 2) for this purpose:

3.3.1. Identify business processes using a high-level Use Case diagram
This should be done for a whole area of business, not just the part which you intend to
computerise. The purpose of this step is to ensure that you have a good idea of what is,
and is not, in the scope of a particular process to be computerised. This can be done
using a high-level Use Case diagram; the technique is described in Appendix 1.

3.3.2. Identify detailed requirements for a process to be computerised: carry out Process Modelling
This is done using Data Flow Diagrams, which can be used in an iterative, refining
fashion both to understand the needs of users better, and to improve on the existing
situation with models of a better, computerised, solution.
The elements of a Data Flow Diagram (DFD) are:
♦ External entities
These are people and/or systems outside the scope of the system being
modelled. They may also indicate required entities in the subsequent data
model.
♦ Processes
These are business activities within the scope of the system being modelled,
which process things and data in order to carry out an activity of value to the
business.
♦ Data stores
These are places where data is held. They also indicate required entities in the
subsequent Data Model.
♦ Flows
DFDs are intended to identify flows of data.
For a tutorial on DFDs, see http://www.cems.uwe.ac.uk/~tdrewry/dfds.htm checked
24/11/2008.
Appendix 2 describes the technique.

3.4. Decide the purpose and basic contents of the database – Data Modelling
The approach adopted here is to develop an Entity Relationship (Attribute) [ER; ERA] model
(after the method first proposed by Peter Chen, 1976).
What data has to be stored? And how is it related?

3.4.1. Basic Constructs of ER Modelling


The ER model views the real world as a construct of entities and associations between
entities.

3.4.2. Deciding entity types


♦ An entity type is a class of real-world thing of which there is
(usually) more than one occurrence
♦ Its name is a noun or noun-phrase
♦ As an initial indication: in a natural language description of the
scenario, underline the names of things which occur more than
once in the real world
♦ Look for “persistence” - longevity, ongoing significance
♦ Do not include reports - this is output information, we are
looking for the structure of the underlying data
Invoices may be stored as an entity type. However, in my view, an Invoice is
a report which is created at a specific moment in order to seek payment from
a customer. It can therefore be argued that there is no need to store its details
in the database.
♦ Do not include "calculated" items as attributes
For example, it is usually wrong to store calculated per-ordered-item costs in
a specific attribute such as amount, when these can quickly be calculated as
(quantity * unit price) at the time of use of a query or report.
♦ “Rationalise” (merge, remove) irrelevant entities
♦ It isn’t always clear when something is an attribute of
something else, or an entity type in its own right; however, an
attribute should never be a list – if it is, this indicates another
entity
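The advice above about calculated items can be sketched as follows. This is only an illustration: it uses SQLite through Python's standard sqlite3 module as a stand-in for Access, and the OrderItem table and its column names are assumptions made for the example. The amount is never stored; it is derived as quantity * unit price each time the query runs.

```python
import sqlite3

# Hypothetical table and column names, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OrderItem ("
    " order_no INTEGER, product_code TEXT,"
    " quantity INTEGER, unit_price REAL,"
    " PRIMARY KEY (order_no, product_code))"
)
conn.executemany(
    "INSERT INTO OrderItem VALUES (?, ?, ?, ?)",
    [(1, "A100", 3, 2.50), (1, "B200", 1, 10.00)],
)

# There is no 'amount' column: the value is calculated at the time of use.
rows = conn.execute(
    "SELECT product_code, quantity * unit_price AS amount"
    " FROM OrderItem ORDER BY product_code"
).fetchall()
print(rows)
```

Because the amount is recalculated on demand, a later correction to quantity or unit price can never leave a stale stored total behind.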

3.4.3. Entities [6]
Entities are the principal data objects about which information is to be collected.
Entities are usually recognizable concepts, either concrete or abstract, such as people,
places, things, or events which have relevance to the database. Some specific examples
of entities are Employees and Projects. An entity is analogous to a table in the
relational model.
Entities can be classified as independent or dependent (in some methodologies, the terms
used are strong and weak, respectively). An independent entity is one that does not rely on
another for identification. A dependent entity is one that relies on another for identification.

[6] This material was in part found at http://www.edrawsoft.com/datamodel.php checked 18/10/2009.
An entity occurrence (also called an instance) is an individual occurrence of an entity.
An occurrence is analogous to a row in the relational model.
♦ Special Entity Types
∗ Associative entities (also known as link or intersection
entities) are entities used to associate two or more
entities in order to reconcile a many-to-many
relationship.
∗ Subtype entities are used in generalisation hierarchies
to represent a subset of instances of their parent entity,
called the supertype, but which have attributes or
relationships that apply only to the subset.
An example is B2B Customer, a specialisation of Customer. The
Customer entity has the main attributes. The B2B Customer entity then has
additional attributes specific to B2B, for example, credit
arrangements or contact details. Customer and B2B Customer have a
one-to-one relationship.
Associative entities and generalisation hierarchies are discussed in more
detail below.
♦ What are the main entities / tables?
We now go on to decide which tables are necessary and how they link
together. There should be a table for each class of real-world thing, or 'entity'.

3.4.4. Relationships
A Relationship represents an association between two or more entities. Examples of
such relationships might be:
1. Employees are assigned to projects
2. Projects have subtasks
3. Departments manage one or more projects
Relationships are classified in terms of degree, connectivity, cardinality, and existence.
These concepts are discussed below.
♦ Relationships and linking
How are the entity types inter-related? There are three basic possibilities,
sometimes referred to as the cardinality of the relationship. Cardinality
specifies how many instances of an entity relate to one instance of another
entity.
Ordinality is also closely linked to cardinality. While cardinality specifies the
number of occurrences of a relationship, ordinality describes the relationship
as either mandatory or optional. In other words, cardinality specifies the
maximum number of related records and ordinality specifies the absolute
minimum number of related records. When the minimum number is zero, the
relationship is usually called optional and when the minimum number is one
or more, the relationship is usually called mandatory.
∗ 1:1 (one to one)
In a one-to-one relationship, each record in Table A can have only
one matching record in Table B, and each record in Table B can
have only one matching record in Table A. This type of relationship
is not common, because most information related in this way would
be in one table. For example, it may not be necessary to have a
separate credit reference entity; instead, its attributes could appear
on the customer entity.
You might use a one-to-one relationship to divide a table with many
fields, to isolate part of a table for security reasons, or to store
information that applies only to a subset of the main table. For
example, you might want to create a table to track employees
participating in a fundraising football match. The additional attributes
for employees who are also football players would be stored in a
football player table, linked one-to-one to employee. This is done
because the vast majority of employees will not be football players.
Similarly, you might have a general customer table, and then link it
to a B2B table (for B2B-specific elements) and a B2C one. See also
generalisation hierarchies below.
∗ 1:M (one to many)
A one-to-many relationship is the most common type of
relationship. In a one-to-many relationship, a record in Table A can
have many matching records in Table B, but a record in Table B has
only one matching record in Table A.
∗ M:N (many to many) and their resolution into two 1:M,
1:N relationships to a new link entity
In a many-to-many relationship, a record in Table A can have many
matching records in Table B, and a record in Table B can have
many matching records in Table A. This type of relationship can
only be stored in a database by defining a third table (called a
junction table, or a link or intersection entity) whose primary key
consists of or includes two fields - the foreign keys from both
Tables A and B. A many-to-many relationship is really two one-to-
many relationships with a third table. For example, an Orders table
and a Products table have a many-to-many relationship that's
defined by creating two one-to-many relationships to an Order
Details table.
It is occasionally necessary to add another attribute to the key to
ensure uniqueness – often this is a date/time field.
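The resolution of a many-to-many relationship into two one-to-many relationships can be sketched as below. The sketch uses SQLite via Python's sqlite3 module rather than Access, and the column names and sample rows are assumptions for illustration. The junction table OrderDetails takes the two foreign keys as its compound primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders   (order_id   INTEGER PRIMARY KEY, customer TEXT);
CREATE TABLE Products (product_id INTEGER PRIMARY KEY, name TEXT);
-- Junction (link) table: one row per product on an order.
CREATE TABLE OrderDetails (
    order_id   INTEGER NOT NULL REFERENCES Orders(order_id),
    product_id INTEGER NOT NULL REFERENCES Products(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
INSERT INTO Orders VALUES (1, 'Smith'), (2, 'Jones');
INSERT INTO Products VALUES (10, 'Widget'), (20, 'Gadget');
INSERT INTO OrderDetails VALUES (1, 10, 5), (1, 20, 1), (2, 10, 2);
""")

# One order has many products ...
products_on_order_1 = [r[0] for r in conn.execute(
    "SELECT p.name FROM OrderDetails d"
    " JOIN Products p ON p.product_id = d.product_id"
    " WHERE d.order_id = 1 ORDER BY p.name")]

# ... and one product appears on many orders.
orders_for_widget = [r[0] for r in conn.execute(
    "SELECT d.order_id FROM OrderDetails d"
    " JOIN Products p ON p.product_id = d.product_id"
    " WHERE p.name = 'Widget' ORDER BY d.order_id")]

print(products_on_order_1, orders_for_widget)
```

If the same product could appear on the same order more than once, a further attribute (such as a date/time) would have to join the key, as noted above.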

3.4.5. Fields: What are the attributes of each entity?


Attributes describe the entity with which they are associated. A particular instance of an
attribute is a value. For example, "DOLMAN Arthur" is one value of the attribute
Name. The domain of an attribute is the collection of all possible values an attribute
can have. The domain of Name is a character string.
Attributes can be classified as identifiers or descriptors. Identifiers, more commonly
called keys, uniquely identify an instance of an entity. A descriptor describes a non-
unique characteristic of an entity instance.
What attributes / fields are required? In other words: what characteristics or attributes
does each table have? For example, an animal type entity has a primary key of type,
and other attributes, such as the number of legs this kind of animal has, and its normal
diet.
Each of the characteristics represents a different field in the table and, to differentiate
them, each needs a unique name. A database management system such as Access
needs to be told the name of each field (attribute) and the type of data (text, numeric,
date etc.) which that field represents. For a text field, the maximum number of characters,
e.g. the length of the longest name to be stored, must also be specified.

AVOID “repeating fields”: fields whose name is in
the plural, or which imply a plural, almost
certainly require a list of values. This is almost
invariably a sign that an extra table is needed.
As an example in a students database, do not make qualifications into fields of Student
– a student already has many qualifications and will gain more. The very fact that the
word qualifications is in the plural is an indication that the relationship between
student and qualification is in fact many-to-many. So a good database design is:

Note that, in accordance with the rule that the primary key of the one end of a
one-to-many relationship becomes an attribute of the many end – where it is known as a
foreign key – the entity type Award has attributes Qualification and Student no. Very
frequently, the combination of the foreign keys is the best primary key for the new
entity type. However, it is sometimes necessary to add a date or time attribute to make
the key unique – this is arguably necessary here because it is possible to envisage a
student achieving a qualification on more than one date. However, for simplicity, we
have ignored this rare possibility here.
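The design described above might be sketched as follows, again using SQLite through Python's sqlite3 module as a stand-in for Access. The table names Student, Qualification and Award follow the text; the column spellings and the sample data are assumptions. The primary key of Award is the combination of its two foreign keys, so the same award cannot be recorded twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
CREATE TABLE Student (student_no INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Qualification (qualification TEXT PRIMARY KEY);
CREATE TABLE Award (
    student_no    INTEGER NOT NULL REFERENCES Student(student_no),
    qualification TEXT    NOT NULL REFERENCES Qualification(qualification),
    PRIMARY KEY (student_no, qualification)
);
""")
conn.executemany("INSERT INTO Student VALUES (?, ?)",
                 [(1, "DOLMAN Arthur"), (2, "WADE Susan")])
conn.executemany("INSERT INTO Qualification VALUES (?)",
                 [("BSc",), ("MSc",), ("PhD",)])
conn.executemany("INSERT INTO Award VALUES (?, ?)",
                 [(1, "BSc"), (1, "MSc"), (2, "BSc")])


def qualifications_of(student_no):
    """List a student's qualifications; no plural field on Student is needed."""
    rows = conn.execute(
        "SELECT qualification FROM Award WHERE student_no = ?"
        " ORDER BY qualification", (student_no,))
    return [r[0] for r in rows]


# The compound key rejects a second identical award.
try:
    conn.execute("INSERT INTO Award VALUES (1, 'BSc')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(qualifications_of(1), duplicate_rejected)
```

A student gains a further qualification simply by adding a row to Award; the structure of Student never changes.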

3.4.6. Data type: Domain


For each field: it is necessary to consider the type of data it will contain. Numeric?
Integer or real? etc. If text: how many characters? The combination of type, size and
all its potential values is sometimes called the domain of an attribute.

3.4.7. Identify Domains


This concept, which goes beyond the Chen model, is both well-based theoretically and very
useful in practice. A Domain is the list of all possible values of an attribute. Thus you might
know that the set of all possible values of a Sex attribute is Male and Female (for mammals);
you might also choose to add the value Hermaphrodite (to cover worms). It is also very
common to permit a Null value, meaning that for a particular individual we do not know what
their sex is. However, with these four permitted values, we have defined ALL possible values of
that attribute. From this we can state that a Sex attribute should be a 1-character Text field, and
that a Validation rule should permit only the values M, F, H and (perhaps) space, representing
<null>. We have identified the domain of the Sex attribute.
It is important to think about the Domain of an attribute for two reasons:
♦ The Domain determines the data type, size and permitted
values
All attributes having the same Domain should have the same data type, size and
permitted values.
Therefore a Surname should be defined in the same way throughout a database
implementation.
Neither MS Access nor the vast majority of actual database management systems
provide direct support for the Domain concept - instead, it is the responsibility of the
implementer to ensure that all attributes which share the same domain are defined
with the same type (e.g. numeric integer, text ….), size (e.g. long, double, 5-
character text ….), and that appropriate validation rules are defined and enforced.
In the case of the animal type entity, the number of legs attribute is an integer
number in the range 2 to 1000. The data type is integer; the domain is the total set of
possible values, in this case, 2, 4, 6, 8… 1000 (millipede!).
♦ Validation rules: What rules apply to each field having this
domain?
Simple example: Sex may be male or female. All other values should be
disallowed by a validation rule, which permits only M (male/masculine) or F
(female/feminine) (and, perhaps, unknown) as values for a Sex attribute.
Consider the validation rules for each data attribute. For example, in the
animal entity, the attribute number of legs must be a value in the domain of
all possible values. Values such as 3 and 5 are never valid. Consider
setting a rule which does not permit these values. This has the benefit that it
decreases the likelihood of storing bad data.
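The two validation rules discussed above can be sketched as follows. In Access such rules would be entered as field validation rules; here, as an assumption made purely for illustration, they are written as SQLite CHECK constraints and exercised from Python. The table and column names are also assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AnimalType (
    type TEXT PRIMARY KEY,
    -- Domain of number_of_legs: the even integers from 2 to 1000.
    number_of_legs INTEGER NOT NULL
        CHECK ((number_of_legs BETWEEN 2 AND 1000) AND (number_of_legs % 2 = 0))
);
CREATE TABLE Patient (
    patient_code INTEGER PRIMARY KEY,
    -- Domain of sex: M, F or H; a null (unknown) value also passes the check.
    sex TEXT CHECK (sex IN ('M', 'F', 'H'))
);
""")


def accepts(sql, params):
    """Return True if the row is stored, False if a validation rule rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


dog_ok = accepts("INSERT INTO AnimalType VALUES (?, ?)", ("dog", 4))
three_legs_ok = accepts("INSERT INTO AnimalType VALUES (?, ?)", ("tripod", 3))
sex_x_ok = accepts("INSERT INTO Patient VALUES (?, ?)", (1, "X"))
sex_unknown_ok = accepts("INSERT INTO Patient VALUES (?, ?)", (2, None))

print(dog_ok, three_legs_ok, sex_x_ok, sex_unknown_ok)
```

Every attribute sharing a domain would repeat the same type, size and rule, since the database software does not do this for us.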

3.4.8. Classifying Relationships


Relationships are classified by their degree, connectivity, cardinality, direction, type,
and existence. Not all modelling methodologies use all these classifications.
♦ Degree of a Relationship
The degree of a relationship is the number of entities associated with the
relationship. The degree is usually two; such relationships are called
binary relationships.
The n-ary relationship is the general form for degree n. Special cases are the binary
and ternary, where the degree is 2 and 3, respectively.
∗ Binary relationships
This association between two entities is the most common type in the real
world.

(a) A recursive binary relationship occurs when an entity is
related to itself. An example might be "some employees are
married to other employees".
∗ Ternary relationships
A ternary relationship involves three entities and is used when a binary
relationship is inadequate. Many modelling approaches recognize only
binary relationships. Ternary or n-ary relationships are decomposed into
two or more binary relationships. They are sufficiently rare to be ignored in
the remainder of this document.
♦ Direction
The direction of a relationship indicates the originating entity of a binary
relationship. The entity from which a relationship originates is the parent
entity; the entity where the relationship terminates is the child entity.
The direction of a relationship is determined by its connectivity. In a one-to-
one relationship the direction is from the independent entity to a dependent
entity. If both entities are independent, the direction is arbitrary. With one-to-
many relationships, the entity occurring once is the parent. The direction of
many-to-many relationships is arbitrary.
♦ Type of relationship
An identifying relationship is one in which the child entity is also a dependent
entity. A non-identifying relationship is one in which both entities are independent.
♦ Existence
Existence denotes whether the existence of an entity instance is dependent upon the
existence of another, related, entity instance. The existence of an entity in a
relationship is defined as either mandatory or optional. If an instance of an entity
must always occur for an entity to be included in a relationship, then it is mandatory.
An example of mandatory existence is the statement "every project must be
managed by a single department". If the instance of the entity is not required, it is
optional. An example of optional existence is the statement, "employees may be
assigned to work on projects".
♦ Generalisation Hierarchies
A generalisation hierarchy is a form of abstraction that specifies that two or more
entities that share common attributes can be generalized into a higher level entity
type called a supertype or generic entity. The lower-level entities become the
subtypes, or categories, of the supertype. Subtypes are dependent entities.
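A generalisation hierarchy of the kind just described might be implemented as sketched below, taking the Customer and B2B Customer example used earlier. SQLite via Python's sqlite3 module stands in for Access, and the column names are assumptions. The subtype table shares the primary key of the supertype, which gives the one-to-one link and makes the subtype a dependent entity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
CREATE TABLE Customer (                 -- supertype: attributes common to all
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE B2BCustomer (              -- subtype: attributes for B2B only
    customer_id  INTEGER PRIMARY KEY REFERENCES Customer(customer_id),
    credit_limit REAL NOT NULL
);
INSERT INTO Customer VALUES (1, 'Acme Ltd'), (2, 'J. Smith');
INSERT INTO B2BCustomer VALUES (1, 5000.0);
""")

# Every customer appears once; B2B attributes are present only for the subset.
rows = conn.execute(
    "SELECT c.name, b.credit_limit FROM Customer c"
    " LEFT JOIN B2BCustomer b ON b.customer_id = c.customer_id"
    " ORDER BY c.customer_id").fetchall()

# A subtype row cannot exist without its supertype row.
try:
    conn.execute("INSERT INTO B2BCustomer VALUES (99, 100.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(rows, orphan_rejected)
```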

3.4.9. Keys: primary and secondary (“foreign”)


What are the primary and foreign keys in the attributes of the entities you have
identified?
There is one and only one primary key per entity type. One (sometimes more) field(s)
will uniquely identify each entity in a database; therefore, we have to set it to be the
primary key.
The primary key of an animal patient might be its name or the owner name. However,
both of these are bad choices. Why? What better alternative can you suggest?
Patient also needs to contain a foreign key - the name of the animal type. Why?

NEVER FORGET: if a table is at the many end of
one (or more) one-to-many relationship(s), then the
attribute or attributes which uniquely identify
records in the One table must also appear as
attribute(s) in the Many table. These one or more
attributes, known together as a Foreign key, are
essential because they link records in the Many
table to a record in the One table.
Therefore, it is necessary to have a foreign key in
Episode corresponding to the patient-code attribute in
Patient, and also a foreign key corresponding to the
treatment-name attribute in Treatment. The
combination of patient-code and treatment-name is
not, however, sufficient in this case to act as the
primary key; it is also necessary to include the date in
order to create a unique key. As a rule of thumb, if
there are two or more columns within a given table
which together are the logical way to identify that row
(and the way you would always join to the table), then
use those as a compound key; otherwise assign a
separate auto-increment column as a primary key.
∗ Candidate keys
There may be more than one possible candidate for use as the
primary key of a table. For example, in an employee table, you
could use either the company generated employee number, or the
Social Security number. In this situation, we say that there are two
candidate primary keys.
∗ Choosing a primary key
One primary key must be selected for the table. A primary key can
sometimes be a compound key, that is it may consist of two or more
elements, which in combination uniquely identify the entity
occurrence.
There may be several candidates, but each entity has one and only
one primary key.
∗ Entity integrity rule
The entity integrity rule says that no field participating in the primary
key of an entity may be null. Null means that no value has been entered; it is
not the same as zero or a string of spaces.
∗ Multi-part primary keys
Where a link or intersection entity is used to resolve a many to many
relationship into two one to many relationships, it is common for
each foreign key in the child entity to form a part of a compound
primary key. Sometimes it may be necessary to add an additional
part to ensure that the primary key is unique for each instance; most
commonly, it is necessary to add a Date.
∗ Foreign keys
In order to create a one to many relationship between two entity
types, the primary key of the parent entity (or, much more rarely,
another candidate key) is replicated in the child entity as the so-
called foreign key.
Foreign keys implement one to many (1:M) relationships in the
following way. If two entity types are related 1:M, then the primary
key attribute(s) (or, rarely, the alternate key attribute(s)) of the one
entity MUST appear as attribute(s) of the many entity. This is
because this is the only way in which the database software can
“join” the many records to the one. Consider a situation in which
students are on a programme. The entity types are Programme and
Student, related 1:M. If the primary key of Programme is
Programme_Code, then Student must also have a Programme_Code
attribute.
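The Programme and Student example can be sketched as follows. SQLite via Python's sqlite3 module is used in place of Access; Programme_Code comes from the text, while the other columns and the sample rows are assumptions. The foreign key in Student is what allows the many records to be joined to the one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Programme (
    Programme_Code TEXT PRIMARY KEY,
    title          TEXT NOT NULL
);
CREATE TABLE Student (
    student_no     INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    -- Foreign key: the primary key of the 'one' entity, repeated on the 'many'.
    Programme_Code TEXT NOT NULL REFERENCES Programme(Programme_Code)
);
INSERT INTO Programme VALUES ('BIS', 'Business Information Systems'),
                             ('MKT', 'Marketing');
INSERT INTO Student VALUES (1, 'Ann', 'BIS'), (2, 'Bob', 'BIS'), (3, 'Cyd', 'MKT');
""")

# Join each programme to its students by matching the key values.
students_by_programme = {}
for code, name in conn.execute(
        "SELECT p.Programme_Code, s.name FROM Programme p"
        " JOIN Student s ON s.Programme_Code = p.Programme_Code"
        " ORDER BY p.Programme_Code, s.name"):
    students_by_programme.setdefault(code, []).append(name)

print(students_by_programme)
```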
♦ Ensuring referential integrity
The terms referential integrity, linking, Primary and Foreign keys and
relationship can be described in this way: Two tables can be linked by a
relationship. This link can be one-to-one (e.g. husband to wife), or one-to-
many (e.g. one brand of car gives rise to many models of car - but each
model has one and only one brand). Coordination is accomplished with
relationships between tables. A relationship works by matching data in key
fields - usually a field with the same name in both tables. In most cases, these
matching fields are the primary key from one table, which provides a unique
identifier for each record, and a foreign key in the other table. For example,
employees can be associated with orders they're responsible for by creating a
relationship between the Employee table and the Order table using the
EmployeeID fields. You can ask Microsoft Access to enforce referential
integrity: if a table such as patient is related to another table animal type, and
referential integrity is enforced, then Access will only allow a new patient to
be introduced if the animal type already exists in the animal type table.
When you create the properties of a new relationship, you can specify the
behaviour to be followed:
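∗ Insert
A new child record can only be inserted if its foreign key matches
an existing record in the parent table, as in the patient and animal
type example above.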
∗ Update
If a primary key is changed in an owning table, should the system
automatically change the related foreign keys? The answer will
usually be yes, and the option should be set.
To be certain, it is necessary to model the ordinality of a relationship, as
mentioned in section 3.4.4 and again in section 3.4.11.
∗ Delete
If a parent record is deleted, should the system automatically delete
all the associated child records? Setting this option should only be
done after careful thought!
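The insert, update and delete behaviours can be illustrated with the patient and animal type example. This is a sketch under stated assumptions: SQLite via Python's sqlite3 module replaces Access, the column names are invented, and the choice made here is to cascade updates but to refuse the deletion of a parent record that is still in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE AnimalType (type TEXT PRIMARY KEY);
CREATE TABLE Patient (
    patient_code INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    type         TEXT NOT NULL
        REFERENCES AnimalType(type) ON UPDATE CASCADE ON DELETE RESTRICT
);
INSERT INTO AnimalType VALUES ('cat'), ('dog');
INSERT INTO Patient VALUES (1, 'Rex', 'dog');
""")


def attempt(sql):
    """Run one statement and report whether referential integrity allowed it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


# Insert: the animal type 'snake' does not exist, so the patient is refused.
insert_unknown_type = attempt("INSERT INTO Patient VALUES (2, 'Sid', 'snake')")

# Update: changing the parent key is carried through to the child records.
conn.execute("UPDATE AnimalType SET type = 'canine' WHERE type = 'dog'")
type_after_update = conn.execute(
    "SELECT type FROM Patient WHERE patient_code = 1").fetchone()[0]

# Delete: the parent is still referenced, so the deletion is refused.
delete_parent_in_use = attempt("DELETE FROM AnimalType WHERE type = 'canine'")

print(insert_unknown_type, type_after_update, delete_parent_in_use)
```

Replacing RESTRICT with CASCADE would delete Rex along with the animal type, which is why that option deserves careful thought.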

3.4.10. Normalisation
We should now go back and check that each attribute list:
∗ Is complete
∗ Has the right attributes on the right entities
We may choose to use the formal relational data analysis technique called normalisation. This
technique is described in appendix 6. It is a useful cross-check, but is not essential.

3.4.11. ER Notation
There is no standard for representing data objects in ER diagrams. Each modelling
methodology uses its own notation.
The original notation used by Chen is widely used in academic texts and journals but rarely
seen in either CASE (Computer Aided Software Engineering) tools or publications by non-
academics. Today, there are a number of notations used, among the more common being
Bachman, crow's foot, IDEF1X and SSADM.

Source: http://en.wikipedia.org/wiki/File:ERD_Representation.svg accessed
18/10/2009.
All notational styles represent entities as rectangular boxes and relationships as lines
connecting boxes. Each style uses a special set of symbols to represent the cardinality
of a connection.
♦ Showing relationships diagrammatically using the crow’s foot
notation
The symbols used in this document for the basic ER constructs are taken
from the American Information Engineering tradition and are also called the
crow’s foot notation (in French, patte d’oie).
∗ Entities are represented by labelled rectangles. The
label is the name of the entity. Entity names should be
singular nouns.
∗ Relationships are represented by a solid line connecting
two entities. The name of the relationship is written
above the line. Relationship names should be verbs.
∗ Attributes, when included, are listed inside the entity
rectangle. Attributes which are identifiers are underlined.
Attribute names should be singular nouns.
∗ Cardinality of many is represented by a line ending in a
crow's foot. If the crow's foot is omitted, the cardinality is
one.
∗ Existence is represented by placing a circle or a
perpendicular bar on the line. Mandatory existence is
shown by the bar (which looks like a 1) next to the entity
of which an instance is required. Optional existence is
shown by placing a circle next to the entity that is
optional.
There are many different ways of drawing entity-relationship diagrams. In
most of this document, we show one-to-many relationships using the crow’s
foot notation without particular concern for the ordinality.
Where it is desirable or necessary to consider ordinality (whether or not a relationship
is mandatory) we can use an extended set of symbols:

We have not been this precise in the remainder of this document.

3.4.12. Online tutorial
For an additional online tutorial about entity relationship modelling, see
http://www.cems.uwe.ac.uk/~tdrewry/lds.htm checked 24/11/2008. Note that this
tutorial sticks rigidly to the SSADM modelling conventions and names and makes
reference to Logical Data Structures, LDS. As it makes clear, “Logical data structures
are data models, and are sometimes called entity-relationship (ER) models or even
entity-attribute-relationship models.” In other words, LDS is a synonym for Entity
Relationship Model.

3.4.13. DFDs and ERDs – why both? How are they linked?


♦ DFDs are process models; BUT
∗ Data stores usually have to be stored in the database, as
one or more entity types
∗ External entities may have to be stored in the database, as
one or more entity types
♦ ERM is a data model
∗ Used to analyse data requirements, and to design
database tables, attributes and relationships

3.4.14. Why BOTH Data and Process models?


To analyse requirements for data, you should create an Entity Relationship model
(also known as a Data Model, an ER model and an ERA model). An ERA model
is a complementary technique to process modelling, done for example using Data
Flow Diagrams – both are necessary before the overall requirements of a system
are understood.
Process modelling using DFDs can be used in an iterative, refining fashion both to
understand the needs of users better, and to improve on the existing situation with
models of a better, computerised, solution.
There is no direct link between DFD process models and ERA data models. However,
data stores and data flows give clues as to what data needs to be modelled:
♦ External entities and data stores indicate required entities
Although note that in some cases a data store is actually an updateable view
of (that is, an updateable query on) one or more entities.
♦ Data flows to and from external entities indicate system inputs
and outputs

3.5. Cross-check: entity life history


This is another useful cross-check.

3.5.1. Cross-check DFD and ERA


The data stores and external entities on the DFD must have counterpart entity types on the
ERA diagram – although the correspondence is not necessarily one to one (it can be an
updatable view of one or more entity types), there should be somewhere to store all the data
indicated in the DFD.
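This cross-check can be carried out on paper, but a small script makes the idea concrete. The sketch below is an assumption-laden illustration in Python: the data store, external entity and entity type names are invented for a veterinary scenario, and each DFD element is mapped by hand to the entity types that hold its data.

```python
# Entity types on the ERA diagram (hypothetical names).
entity_types = {"Patient", "AnimalType", "Treatment", "Episode", "Owner"}

# Each DFD data store or external entity, with the entity types that store it.
dfd_elements = {
    "D1 Patient records": {"Patient", "AnimalType"},  # a view over two entity types
    "D2 Treatment history": {"Episode", "Treatment"},
    "Owner (external entity)": {"Owner"},
    "D3 Supplier list": set(),                        # nowhere to store this yet
}


def unmatched(dfd_elements, entity_types):
    """Return DFD elements with no entity type, or with an unknown entity type."""
    return sorted(
        name for name, targets in dfd_elements.items()
        if not targets or not targets <= entity_types
    )


problems = unmatched(dfd_elements, entity_types)
print(problems)
```

Anything listed in problems indicates either a missing entity type or a DFD element that is outside the scope of the database.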

3.5.2. Time dimension


Ensure that, for all major entity types, there are processes which CReate, Update, and Delete
(CRUD!) them. It may be necessary to create specific processes to carry out operations which
create, update or delete entity types. Note, however, that some systems never delete data;
instead, they archive it.
Formal Entity Life History models exist as part of SSADM but are rarely constructed nowadays. It is
usually sufficient simply to ensure that the above points 3.5.1 and 3.5.2 are respected. See also
http://www.cems.uwe.ac.uk/~tdrewry/modeling.htm#Modeling%20Techniques
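The time-dimension check amounts to a simple matrix of processes against entity types. A minimal Python sketch follows; the process and entity names are assumptions, and each process is annotated by hand with the operations it performs.

```python
# Each process lists the (entity type, operation) pairs it performs (hypothetical).
processes = {
    "Register patient": {("Patient", "create")},
    "Amend patient details": {("Patient", "update")},
    "Archive patient": {("Patient", "delete")},
    "Record episode": {("Episode", "create"), ("Patient", "update")},
}

REQUIRED = ("create", "update", "delete")


def missing_operations(processes, entity_types):
    """For each entity type, list the required operations that no process performs."""
    covered = set()
    for effects in processes.values():
        covered |= effects
    gaps = {}
    for entity in sorted(entity_types):
        missing = [op for op in REQUIRED if (entity, op) not in covered]
        if missing:
            gaps[entity] = missing
    return gaps


gaps = missing_operations(processes, {"Patient", "Episode"})
print(gaps)
```

Here Episode has no process that updates or deletes it, so either new processes are needed or a deliberate decision (for example, to archive rather than delete) should be recorded.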

3.6. Model User <-> System Interactions


Work out what interactions will take place between system users and the information system
you are creating. This can be done by using a low-level Use Case diagram or by other
techniques not taught in the School.
♦ System inputs and outputs
Identification of these can be aided by Use Case diagrams, since these
indicate who needs to use a system and for what. The Use Case diagram will
indicate the basic interactions between the system and its users - some of
these will be data input / update actions, others will involve information
output. They can also be used to derive a list of system inputs and outputs –
e.g. forms, reports and/or webpages. The technique is described in Appendix
1.

3.7. Define required outputs: reports, forms, queries


What is the database actually for? What Information is to be Output? (Query etc.). Think out
what OUTPUTS are required from the system. This is the information that system users require
FROM the system in order to be able to do their jobs more effectively, to make better decisions
etc.
From this, you can create a list of the information-yielding Forms and Reports required, and the
Queries necessary to support those forms and reports.

3.8. How will Input / Update be carried out (Forms etc.)?


Think out what the necessary system INPUTS are, and when system users will be in a position
to store that input data. How will Input / update be carried out (Forms etc.)?
Although formal dialogue design methods exist, they are usually taught as part of specific HCI
(Human Computer Interface) or webpage design courses. The approach recommended in this
module is simply to identify the necessary forms, perhaps sketching them out on paper or in an
MS Office product. The Use Case diagram produced as part of the modelling of User <->
System interactions tells you where forms, etc, are needed: each interaction between a human
user (actor) and a Use Case implies a need for Inputs and Outputs.

3.9. Work through your design on paper, whiteboard, etc.


As far as you can, ensure that the system you are proposing will do the job you have outlined
for it. Doing this properly is time consuming and is hard work, but it usually pays off in terms
of avoided wasted implementation effort at the computer. It is also extremely useful in helping
to identify the steps in a test plan, that is, the tests that you will have to carry out on the
implemented system and the expected results of those tests.
An excellent technique for improving the quality of your work is to work with team colleagues.
This can either be explicit collaboration (e.g. working in pairs); or it can take the form of
structured walkthroughs. Structured walkthroughs are discussed in appendix 8.

3.10. Implementing processes in Access


The implementation of complex processes will involve at least macros and perhaps module
programming in (for example) the Visual Basic for Applications programming language.
However, for many simple database applications, this is not necessary. Instead, all that is
required is to guide the user as to which form, query or report they should next be using. This
can be done by creating an initial form, called a switchboard, and by the use of menus and
submenus. This approach leaves the choice of what step to take next to the user, a style of use
which we sometimes call event-driven.

3.10.1. System data processing


To carry out the data processing itself, Microsoft Access supports update, append and
delete operations which operate on complete sets of records defined by corresponding
queries. The contents of the database can therefore be changed under the control of
queries.
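As an illustration of set-based update, append and delete operations, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Access, and the Student and Alumni tables are hypothetical names invented for this example. In Access itself you would build the equivalent update, append and delete queries in the query designer.

```python
import sqlite3

# Hypothetical tables, invented for this sketch only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT, Year INTEGER);
    CREATE TABLE Alumni  (StudentID INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO Student VALUES (1, 'Ann', 1), (2, 'Bob', 3), (3, 'Cy', 3);
""")

# Update query: move every first-year student up a year.
conn.execute("UPDATE Student SET Year = Year + 1 WHERE Year = 1")

# Append query: copy all final-year students into Alumni.
conn.execute("INSERT INTO Alumni (StudentID, Name) "
             "SELECT StudentID, Name FROM Student WHERE Year = 3")

# Delete query: remove those final-year students from Student.
conn.execute("DELETE FROM Student WHERE Year = 3")

students = conn.execute(
    "SELECT StudentID, Year FROM Student ORDER BY StudentID").fetchall()
alumni = conn.execute(
    "SELECT StudentID FROM Alumni ORDER BY StudentID").fetchall()
print(students, alumni)
```

Each statement acts on every record that satisfies its WHERE condition, not on one record at a time.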
Beyond that, it is necessary to use a programming language. The specification of data
processing is carried out by means of descriptions of the algorithms involved, using techniques
such as pseudocode. The language supported within Access is Visual Basic for Applications.
The use of pseudocode and of programming languages is beyond the scope of this document.

3.11. Define (“design” in Access terms) the database: Build a prototype


This is the stage at which you "tell" Access what your design decisions are, as you design the
tables, attributes, forms, queries, reports etc. There's a fair bit of work here, but if you have
done your "homework" in terms of carefully carrying out the steps already described, and have
already taken the trouble to become reasonably competent in your use of Microsoft Access as a
package, you should be able to concentrate on a good Access implementation of a design in
which you can already place some confidence.

3.12. Refine / iterate / implement


At each stage, but particularly at the design stage and early implementation stage, you will be
learning more about what your system should be doing and you may also be finding problems
with the way it currently works. Where necessary, revise the earlier work and then go on to
ensure that later work is in line with the changed requirement. If you do this conscientiously,
there should not be too many surprises at the system testing stage.

3.13. Test the database


As you implement the system, test as many aspects as you can as you go along. If, for example,
you have created a table and a form by which to input and/or update data in that table, ensure
that the form works and that the data you can input is sensible and appropriately validated. As
you add new forms and subforms, you should find that you can both test them, and also go back
to forms and queries you implemented earlier and further check that they behave as they ought
to do.
When you find problems, do not rush to solve them too quickly. It is wise to carry out as many
tests as you can, building up a “bug list”, or register of problems / deficiencies. Tackle these in
small batches, rather than one at a time, because you will often find that apparently different
problems are linked and have the same underlying causes.

3.14. Obtain User Feedback


Eventually, you will have a working system which does not have too many obvious
implementation faults. However, it is more than likely that it will not be what the would-be
system users expected. It is often the case that users react badly when first shown a system. This
is likely to be for one of the following reasons:
♦ You, the designer / implementer, have made mistakes in
implementing a user's requirements
You may well have omitted a user requirement, or got it wrong. Such
deficiencies are bugs, and you have to put them right.
♦ You, the designer / implementer, have failed to understand
some aspect of the user's requirements of a system
The fault in this case may lie with you or the user or both, but you have to
reach some agreement on what needs to be done to put the system right, and
it will normally be at your expense.
♦ The user has not themselves sufficiently thought through what
they require of a system
Even though the "fault" here is more obviously with the user, you still have
to reach some agreement on what needs to be done to put the system right.

♦ The user, inspired by seeing the implemented system, decides
that they would like the system to do more
Great! A business opportunity! You enter the required additional
functionality on a document which you may grandly term the Enhancement
Register, work out the implications in terms of additional design and
implementation effort, and tell the user what the enhancements will cost - in
terms of later delivery and / or an increased bill. You should never allow
yourself to get dragged into a cycle of continuously responding to such
changes as you go along, without explicit renegotiation of the terms of
reference agreed at the outset of the project.

3.15. Refine the system by Iteration


The ideal in any area of activity is "Right, on time, first time, and every time" - that is,
wherever possible, we should aim to avoid repeated work and wasteful repetition. However,
even the most experienced information systems designers do not achieve this when designing
and implementing information systems. It is almost invariably necessary to go back to earlier
stages in the simple methodology, repeating the work. Note that if you change work carried out
at an early stage in the process, you will normally have to repeat not only that step but also all
subsequent steps. Professional systems analysts and designers build in a significant amount of
time and budget into their original plan in order to cover this reality.

4. Putting database design theory into practice

4.1. Design aids


Use some or all of the following as appropriate to the context in which you are working.
♦ Diagramming tools
E.g. Microsoft Visio Professional, SmartDraw, or EDraw. For more on using
Visio, please see Appendix 7.
♦ Using a spreadsheet as a simple data dictionary
Professional developers often make use of a database about the design
decisions they make. This database about databases is sometimes called a
Data Dictionary, sometimes a Repository. For the small-scale scenarios dealt
with in this module, it is possible and sensible to use a relatively-
straightforward spreadsheet as a simple Data Dictionary. Such spreadsheets
can be a useful aid when recording attributes and rearranging them
subsequently. See section 24.1 for an example of how this can be useful.
♦ CASE tools
CASE tools -- CASE here stands for computer aided software engineering (and has
nothing to do with use case diagrams!) -- are tools which are used by professionals
to manage the development of large and complex systems. The use of these tools is
beyond the scope of this document.

4.2. An Exercise
Assume that you are the people who originally designed the University of Anytown database used as an
example later in this booklet. Now that you know about the various stages required to analyse user needs
and design a database solution, carry out those steps for yourself for a business school. Go through the
various stages and carefully document what you do at each stage. Or, if you are responsible for database
design in an assignment I have set, do the same thing for that database.
This is a significant piece of work - it will probably take you at least a few hours of effort, and may well
take you a week of on-and-off effort.
When you have finished the University of Anytown database design, compare the results of your work
with those of the original analyst / designers. You should find that you have reached similar or better
conclusions.

4.3. Achieving real competence in Database Design


If you can successfully tackle this exercise and test, you have achieved a reasonable mastery of
database design and should go on to a really difficult task. So you need to test out your skills! You can
do some work on one of the suggested scenarios that follow.
Alternatively and/or additionally, move on directly to your own scenario, such as the one you are working
on as part of an assignment set by your teachers.

4.3.1. Documented scenarios


I have produced a companion document, called Database design and implementation cases,
which is available on request. This describes the following scenarios.
♦ Instruction Training Company
This requirement is separately documented as "Instruction Training Company".
♦ Dating agency
This requirement is separately documented as "Seekers Dating Agency".
♦ Filing system
This requirement is separately documented as "Filing System".
♦ Video Collection
This requirement is separately documented as "Video Collection".
♦ A catalogue of your record, CD or tape collection
- To include entities such as Album, Artist, Track, and attributes such as media (e.g.
CD, tape, vinyl) and play-time (length of track in minutes and seconds).
This requirement is separately documented as "Media System"
♦ A contact / correspondence management system
This requirement is separately documented as "Contact Manager".
♦ A Student CD Library system
This requirement is separately documented as "Student CD Library".

4.3.2. Suggested but undocumented scenarios


Decide on a scenario of value to you in your work or in your leisure time in which a database
might be of value. In each case, you should prepare a scenario document, similar to the ones
listed above, which sets out the main features of the problem area you intend to tackle. Here
are some ideas - you can of course come up with your own idea, but you are advised to discuss
it with a tutor just in case it is inappropriately complex:
♦ Details of the modules (“courses”) you have studied, the
lectures and classes which formed part of that module, and a
diary of significant events.
♦ Details of your personal research - to include entities such as
references, authors, citations.

Be aware that this is a fairly difficult example to tackle - for example, how will your
database design cope with an article by many authors (NOT just two or three, maybe
five?).

4.3.3. Further study


Before going much further with database design, and assuming you are full of enthusiasm for
databases, you are advised to study the topic in a textbook, learning more about concepts such
as normalisation, which is a very useful technique for ensuring that each table has exactly the
right attributes. Normalisation is introduced in appendix 6. For a basic treatment, see [Hughes
2000]; for an advanced treatment, see [Date 2003].

5. More about Databases

5.1. What is a database?


A database is a collection of inter-related data, stored together without unnecessary
redundancy, which can serve multiple uses and applications. It is the physical implementation
on a computer of the data model (entity-relationship model) created by the data analyst.
Databases were originally a reaction to uncoordinated “conventional files”.
Entities are implemented as tables.
Entity occurrences are records.
Attributes are implemented as fields.
Key relationships may be enforced - in Access, for example, it is possible to "enforce
referential integrity". This ensures that in a one to many situation, the many-end record cannot
exist unless a one-end record already exists. This is good, in that it prevents unwittingly
creating records that are not linked to anything else.
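The effect of enforcing referential integrity can be sketched as follows. This is an illustration only: it uses Python's sqlite3 module rather than Access, and the Programme and Student tables and their columns are assumed names. In Access the same rule is switched on by ticking "Enforce Referential Integrity" when the relationship is defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE Programme (ProgrammeCode TEXT PRIMARY KEY, Title TEXT);
    CREATE TABLE Student (
        StudentID     INTEGER PRIMARY KEY,
        Name          TEXT,
        ProgrammeCode TEXT NOT NULL REFERENCES Programme (ProgrammeCode)
    );
    INSERT INTO Programme VALUES ('MBA', 'Master of Business Administration');
""")

# The one-end record exists, so this many-end record is accepted.
conn.execute("INSERT INTO Student VALUES (1, 'Ann', 'MBA')")

# There is no programme 'XYZ', so this record is refused.
try:
    conn.execute("INSERT INTO Student VALUES (2, 'Bob', 'XYZ')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
print(orphan_rejected, student_count)
```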
The emphasis throughout is on effective data retrieval in order to answer arbitrarily complex
questions.

5.2. The history of databases


♦ Hierarchic and network databases were invented in the 1960s.
♦ 1970: Dr. E.F. Codd introduces the concept of the relational
database
In 1970, the expatriate British researcher Edgar Codd was working for IBM in the
United States. He suggested that a better basis for database implementation was
relational set theory, a mathematical approach.
♦ Concepts are relatively simple and have a strong theoretical
basis, that of mathematical set theory
In a relational database, care is taken to keep all the data for a set of like entities in
what is mathematically a relation or a set, but what we would probably refer to as a
table of records: e.g. student. Each different kind of entity is kept separately: so we
might also have a programme entity (or relation or table - these are equivalent
terms). Student records are linked to a programme record by means of a shared
linking attribute, in this case, the programme code.
♦ Relational model does have limitations but is currently the
dominant paradigm (way of thinking)
♦ Object databases are just beginning to become commercially
significant and might dominate eventually
The relational database paradigm has been dominant since about 1980, and has yet
to be displaced by the more recent object database approach.
∗ Oracle – hybrid object-relational approach

5.3. Implementing data models in MS Access


♦ Entities are implemented as tables
∗ Entity occurrences are records or rows in the tables
♦ Attributes are implemented as fields
♦ Key relationships should almost always be enforced
∗ Set cascade update yes; think hard about cascade
delete

The diagram shows a situation in which a foreign key, programme
code, in a Student table is being linked to the corresponding
programme code in the Programme table. It is necessary for a
Programme to exist before a Student can be registered. It is
probably appropriate automatically to cascade any change to the
programme code in Programme to each Student record having that
code. By contrast, the deletion of a Programme might not require
the deletion of linked Students (who perhaps studied on the
programme before it was deleted).
♦ The emphasis has to be on effective data retrieval in order to
answer arbitrarily complex questions
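The "cascade update yes, think hard about cascade delete" advice above can be sketched like this, again using Python's sqlite3 module as a stand-in for Access and assumed table names. ON UPDATE CASCADE plays the role of "Cascade Update Related Fields"; ON DELETE RESTRICT corresponds to leaving "Cascade Delete Related Records" unticked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Programme (ProgrammeCode TEXT PRIMARY KEY);
    CREATE TABLE Student (
        StudentID     INTEGER PRIMARY KEY,
        ProgrammeCode TEXT NOT NULL
            REFERENCES Programme (ProgrammeCode)
            ON UPDATE CASCADE      -- cascade update: yes
            ON DELETE RESTRICT     -- cascade delete: no
    );
    INSERT INTO Programme VALUES ('MBA');
    INSERT INTO Student VALUES (1, 'MBA'), (2, 'MBA');
""")

# Changing the programme code is propagated to every linked Student record.
conn.execute("UPDATE Programme SET ProgrammeCode = 'MBA2' WHERE ProgrammeCode = 'MBA'")
codes = [row[0] for row in
         conn.execute("SELECT ProgrammeCode FROM Student ORDER BY StudentID")]

# Deleting a programme that still has students is refused, not cascaded.
try:
    conn.execute("DELETE FROM Programme WHERE ProgrammeCode = 'MBA2'")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

print(codes, delete_blocked)
```

With this setting the linked Student records survive, but the Programme record cannot be removed while they refer to it. If programmes really must be withdrawn while past students are kept, one possible design (an assumption, not something prescribed here) is to mark the Programme as inactive rather than deleting it.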

5.4. What is a database management system?
♦ Software which manages a database
♦ Implements entities as tables, maintaining and enforcing
relationships
♦ Deals with all the component disc files
♦ Provides functions such as
∗ Table creation and structural updating
∗ Insert, update and delete operations, on individual
records and on complete sets of records
∗ Queries, reports and forms

5.5. First challenge: database design


A professional systems analyst will carry out parallel data analysis and also process modelling.
The results of data analysis will be compared with Data Flow Diagrams (process view): data
stores and flows give clues as to what data needs to be modelled.
♦ Decide the purpose of the database
♦ Analyse the needs of its users in data terms
∗ Compare with Data Flow Diagrams (process view): data
stores and external entities give clues as to what data
needs to be modelled
∗ Informal cross-referencing between process and data
models
Such cross-referencing may point up potential problems in one or
both sets of analysis.
♦ Design the database on paper
Design the database on paper, define the computer implementation, and only
then start building it!
It is very important to consider how input / update will be carried out
(forms etc.); in connection with electronic business it is worth noting that
nearly every interaction with the system will take place through a web
interface.

5.6. Second challenge: database implementation


♦ Define the computer implementation
∗ Tables
∗ Attribute types: e.g. student number is 11 chars text,
split three letters and eight digits
♦ How will input / update be carried out?
∗ Forms
∗ Web interface
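The student number rule quoted above (11 characters of text: three letters followed by eight digits) is the kind of attribute definition that is worth pinning down precisely before implementation. A minimal sketch of the check in Python follows; the function name is hypothetical, and the assumption that the letters come first is taken from the wording of the example.

```python
import re

# Assumed layout: exactly three letters followed by exactly eight digits.
STUDENT_NUMBER = re.compile(r"[A-Za-z]{3}[0-9]{8}")

def is_valid_student_number(value: str) -> bool:
    """Return True if value is 11 characters: three letters then eight digits."""
    return STUDENT_NUMBER.fullmatch(value) is not None

print(is_valid_student_number("ABC12345678"))
print(is_valid_student_number("AB123456789"))
```

In Access the same constraint would normally be expressed through the field size together with an input mask or validation rule on the field.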

5.7. An inductive approach


♦ Learning from examples:
∗ Student: Anytown case
See section 9.
∗ NorthWind2003
NorthWind2003 is a complex but useful example in an e-business
context, based as it is on the standard Microsoft NorthWind
example.

5.8. What Is a Database?


♦ A structured collection of ELECTRONICALLY STORED data
∗ Controlled & accessed through computers
∗ The structure is given by predefined relationships
between predefined types of data items
♦ May include many types of data

5.9. What Is a DBMS?


♦ Database management system (DBMS) = an integrated set of
programs, used to define, update, and control the database
♦ Examples
∗ Small: MS Access, OpenOffice.org Base
∗ Medium: MS SQL Server, MySQL, PostgreSQL
∗ Large: ORACLE, IBM DB2

SECTION 2 – USING MICROSOFT ACCESS TO BUILD GOOD
DATABASES

6. Introduction to Microsoft Access


The Microsoft Office Access relational database management system is software to manage databases.

6.1. What is a database management system?


Microsoft Access is an example of a Relational DataBase Management System (RDBMS). It is
positioned in the marketplace as an office productivity aid. As such, it has limited resilience
and recovery facilities, can be used by more than one person at the same time, but is NOT
suitable for "mission-critical" high reliability or high performance applications - beyond a
handful of concurrent users, Access runs out of steam!
Access - and other small scale databases, such as Filemaker Pro - offer the standard features
expected of such programs:

6.1.1. Software which manages a database

6.1.2. Implements entities as tables, maintaining and enforcing relationships

6.1.3. Deals with all the component disc files


In Access, there is only one file per database. Bigger databases have much more
complex file structures stored on disc.

6.1.4. Provides functions such as


♦ Table creation and structural updating
♦ Form-based insert, update and delete operations, on individual
records and on complete sets of records
♦ Query and report facilities
Access provides powerful query and report facilities. When you define a
query, what Access does on your behalf, behind the scenes, is to create and
then run a query expressed in a powerful industry-standard programming
language called SQL (Structured Query Language). You can in fact see the
generated SQL if you use View / SQL View.
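To give a feel for what such SQL looks like, here is a hand-written query in the general style that SQL View displays for a two-table query. It is a sketch, not actual Access output: the tables and data are invented, and the query is run through Python's sqlite3 module so that the example can be executed without Access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Programme (ProgrammeCode TEXT PRIMARY KEY, Title TEXT);
    CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT, ProgrammeCode TEXT);
    INSERT INTO Programme VALUES ('MBA', 'Master of Business Administration'),
                                 ('MSC', 'MSc International Business');
    INSERT INTO Student VALUES (1, 'Ann', 'MBA'), (2, 'Bob', 'MSC'), (3, 'Cy', 'MBA');
""")

# List the students on one programme, with the programme title.
query = """
    SELECT Student.Name, Programme.Title
    FROM Programme INNER JOIN Student
         ON Programme.ProgrammeCode = Student.ProgrammeCode
    WHERE Programme.ProgrammeCode = 'MBA'
    ORDER BY Student.Name
"""
rows = conn.execute(query).fetchall()
for name, title in rows:
    print(name, "-", title)
```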

6.1.5. An approachable programming language


Access offers Visual Basic for Applications (VBA) and SQL.

6.2. Important facilities of more advanced DBMS


More advanced DBMS offer additional features, and are usually structured to operate in a so-
called "client server" situation. In a client-server application, client programs running on PCs
or other low-power computers present data to individual users. The data itself is managed and
stored on a database server computer to which all the client machines are connected.
The clients may either be directly connected to the distant database server, or they may run a
local database (usually Access) which connects to the distant database server (e.g. MS SQL
Server, Oracle ….) using the Open DataBase Connectivity feature ODBC, or they may present
the data in the database by means of web pages. The architecture then looks something like:

6.3. Further facilities of more advanced DBMS


♦ Support many users and multiple applications
∗ MS Access does this, sort of ... an individual database
may support a handful of users
♦ Depend upon a data dictionary (sometimes called a repository)

♦ Integrate with the CASE (Computer Aided Software Engineering) tool which
created and maintains the data dictionary
♦ Implement resilience and recovery mechanisms
These things include roll-forward and / or roll-back mechanisms so that complete transactions
(only) are carried out. Such mechanisms are essential to prevent situations where, for example,
money leaves one company’s bank account, but never reaches another company’s.
♦ Enforce security
Only privileged users should be able to see things like payroll data.
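The roll-back idea mentioned under resilience and recovery can be sketched as follows. The example uses Python's sqlite3 module, a hypothetical Account table and a hypothetical transfer function. It deliberately credits the receiving account before debiting the paying account, so that the roll-back is visible when the debit fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Account (Name TEXT PRIMARY KEY,
                          Balance INTEGER NOT NULL CHECK (Balance >= 0));
    INSERT INTO Account VALUES ('CompanyA', 100), ('CompanyB', 50);
""")

def transfer(amount):
    """Move amount from CompanyA to CompanyB as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE Account SET Balance = Balance + ? WHERE Name = 'CompanyB'",
                (amount,))
            conn.execute(
                "UPDATE Account SET Balance = Balance - ? WHERE Name = 'CompanyA'",
                (amount,))
        return True
    except sqlite3.IntegrityError:
        return False

ok_small = transfer(30)    # succeeds: both updates are committed together
ok_large = transfer(500)   # debit would go negative: the credit is rolled back too
balances = dict(conn.execute("SELECT Name, Balance FROM Account"))
print(ok_small, ok_large, balances)
```

After the failed transfer neither balance has changed, which is exactly the guarantee that a complete transaction (only) is carried out.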

6.3.1. Other RDBMS


Examples of so-called "industrial strength" databases include
♦ The “Big Three”
These are the commercial databases at the heart of most large
enterprises:

(a) Microsoft SQL Server

(b) Oracle Corporation ORACLE

(c) IBM DB2


♦ Computer Associates INGRES
The first commercial-strength database management system, INGRES is
now an Open Source product.
♦ MySQL
MySQL is a relational database management system which has more than
11 million installations.
MySQL is popular for web applications and acts as the database
component of the LAMP, BAMP, MAMP, and WAMP platforms
(Linux/BSD/Mac/Windows-Apache-MySQL-PHP/Perl/Python), and for
open-source bug tracking tools like Bugzilla. Its popularity for use with web
applications is closely tied to the popularity of the PHP programming
language and the Ruby on Rails programming framework, which are often
combined with MySQL.
♦ PostgreSQL
These systems are capable of supporting very large numbers of users and transaction rates
measured in tens or hundreds every second. They are typically used to store so-called
"corporate" databases, and are "overkill" in the context of the personal and team productivity
applications to which Access is well-suited.

6.4. Why we want business students to learn Access


We expect our students to become competent Access users and (to a limited extent) designers.
We start off with Access for a number of reasons:

6.4.1. The relative ease-of-use of MS Access


Industrial strength databases, such as ORACLE, are harder to learn and less well
integrated into the PC environment, whereas MS Access is easily accessible (sic!)

6.4.2. MS Access is easily obtained
It forms a part of the Microsoft Office Professional and Premium office suites
(although it is included neither in the Small Business Edition nor the Student edition).
MS Access is available in French on ESC Rennes student workstations for students
who do not have a copy on their own personal machine. Alternatively a free copy can
be obtained by means of the MSDN Academic Alliance membership of the School.

6.4.3. MS Access supports usable programming languages


MS Access supports and integrates with the two most widely-used programming languages
associated with personal productivity aids and with databases. These two languages are Basic
and SQL. In fact, MS Access provides Basic and SQL in a number of ways:
♦ Visual Basic for Applications (VBA)
The dialect of Basic supported by MS Access is VBA - the same language also used
internally by several other Microsoft Office products, including Excel and the Visio
business drawing package.
♦ Visual Basic (VB) itself
The database engine underlying Access, the so-called "Jet Engine", can provide
database facilities to programs written in Visual Basic.
♦ SQL: Structured Query Language
SQL, Structured Query Language, is a database query language that was adopted
as an industry standard in 1986.
In their SQL standard, the American National Standards Institute ANSI declared that
the official pronunciation for SQL is "es queue el". However, many database
professionals have taken to the "slang" pronunciation sequel that reflects the
language's original name, Sequel, which IBM shortened to SQL because of a
trademark conflict.
Access supports a reasonably-comprehensive subset of the ANSI SQL 92 standard.
This provides the basis for a high degree of integration with other databases - so
that, for example, an Access database can act as a client to an industrial-strength
database running on a server computer.

7. MS Access implementation of data models


Access provides a direct implementation of many of the features of a data model (entity relationship
model). Once you have completed an ERM, you can implement it in Access as:

7.1. Tables, one per entity type

7.2. Fields, one per attribute

7.3. Records, one per entity occurrence

7.4. Attribute types in MS Access


What data type should you use for a field in a table?
Decide what kind of data type to use for a field based on these considerations:
• What kind of values do you want to allow in the field? For example, you can't
store text in a field with a Number data type.
• How much storage space do you want to use for values in the field?
• What types of operations do you want to perform on the values in the field?
For example, Microsoft Access can sum values in Number or Currency fields,
but not values in Text or OLE Object fields.
• Do you want to sort or index a field? Memo, Hyperlink, and OLE Object
fields can't be sorted or indexed.
• Do you want to use a field to group records in queries or reports? Memo,
Hyperlink, and OLE Object fields can't be used to group records.
• How do you want to sort values in a field? In a Text field, numbers sort as
strings of characters (1, 10, 100, 2, 20, 200, and so on), not as numeric values.
Use a Number or Currency field to sort numbers as numeric values. Also,
many date formats will not sort properly if entered in a Text field. Use a
Date/Time field to ensure proper sorting.
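The sorting point above is easy to demonstrate. The short Python sketch below (with made-up sample values) sorts the same data first as text and then as true numbers or dates. The dates are assumed to be in day/month/year format.

```python
from datetime import datetime

# Numbers held as text sort character by character.
values_as_text = ["1", "2", "10", "20", "100", "200"]
text_order = sorted(values_as_text)
numeric_order = sorted(values_as_text, key=int)

# Dates held as day/month/year text do not sort chronologically.
dates_as_text = ["15/03/2009", "01/12/2009", "02/01/2008"]
date_text_order = sorted(dates_as_text)
date_order = sorted(dates_as_text,
                    key=lambda s: datetime.strptime(s, "%d/%m/%Y"))

print(text_order)
print(numeric_order)
print(date_text_order)
print(date_order)
```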

7.5. Permitted data types in MS Access


The following table summarises all the field data types available in Microsoft Access, their uses, and their
storage sizes.

Data Type: Text (Français: Texte)
Use: Text or combinations of text and numbers, such as addresses. Also numbers that do not require calculations, such as phone numbers, part numbers, or postal codes.
Size: Up to 255 characters (if you need more text in a field, you have to use a Memo field).

Data Type: Memo (Mémo)
Use: Lengthy text and numbers, such as notes or descriptions.
Size: Up to 64,000 characters.

Data Type: Number (Numérique)
Use: Numeric data to be used for mathematical calculations, except calculations involving money (use Currency type). Set the FieldSize property to define the specific Number type, which is one of the five listed next.
Size: 1, 2, 4, 8 bytes.

    Number type: Byte (Octet: Numérique 1 octet)
    Use: Stores numbers from 0 to 255 (no fractions).
    Size: 1 byte

    Number type: Integer (Entier: Numérique 2 octets)
    Use: Stores numbers from –32,768 to 32,767 (no fractions).
    Size: 2 bytes

    Number type: Long Integer (Entier Long: Numérique 4 octets)
    Use: (Default) Stores numbers from –2,147,483,648 to 2,147,483,647 (no fractions).
    Size: 4 bytes

    Number type: Single (Réel simple)
    Use: Stores numbers from –3.402823E38 to –1.401298E–45 for negative values and from 1.401298E–45 to 3.402823E38 for positive values.
    Size: 4 bytes

    Number type: Double (Réel double)
    Use: Stores numbers from –1.79769313486231E308 to –4.94065645841247E–324 for negative values and from 4.94065645841247E–324 to 1.79769313486231E308 for positive values. 15 decimal places.
    Size: 8 bytes

Data Type: Date/Time (Date/Heure)
Use: Dates and times.
Size: 8 bytes

Data Type: Currency (Monétaire)
Use: Currency values. Use the Currency data type to prevent rounding off during calculations. Accurate to 15 digits to the left of the decimal point and 4 digits to the right.
Size: 8 bytes

Data Type: AutoNumber (Numérotation automatique)
Use: Unique sequential (incrementing by 1) or random numbers automatically inserted when a record is added.
NB: if you use an automatically numbered field as part of the primary key of a table, and you also have to use it as the foreign key in a linked table, the data type required in the many end is long integer, which is how in fact an AutoNumber field is stored.
Size: 4 bytes

Data Type: Yes/No (Oui/Non)
Use: Fields that will contain only one of two values, such as Yes/No, True/False, On/Off.
Size: 1 bit

Data Type: OLE Object (Liaison OLE)
Use: Objects (such as Microsoft Word documents, Microsoft Excel spreadsheets, pictures, sounds, or other binary data), created in other programs using the OLE protocol, that can be linked to or embedded in a Microsoft Access table. You must use a bound object frame in a form or report to display the OLE object.
Size: Up to one gigabyte (subject to disc space!)

Data Type: Hyperlink (Hyperlien)
Use: Field that will store hyperlinks. A hyperlink can be a UNC (Universal Naming Convention) path to a file, or a URL.
Size: Up to 64,000 characters

Data Type: Assistant for choosing from a list (Assistant Liste de choix)
Use: Creates a field which permits you to choose, from a scrolling list, a value which comes either from another table or from a specified list of permitted values. If you choose this option, a wizard appears to help you to define the field.
Size: The same size as the primary key of the corresponding table. In the (common) case where this is an AutoNumber field, it will be 4 bytes in length.

7.5.1. Use of Number or Currency fields


Microsoft Access provides two field data types to store data containing numeric
values: Number or Currency.
Use a Number field to store numeric data to be used for mathematical calculations,
except calculations that involve money or that require a high degree of accuracy. The
kind and size of numeric values that can be stored in a Number field is controlled by
setting the FieldSize property. For example, the Byte field size will only store whole
numbers (no decimal values) from 0 to 255 and occupies 1 byte of disk space.
Use a Currency field to prevent rounding off during calculations. A Currency field is
accurate to 15 digits to the left of the decimal point and 4 digits to the right. A
Currency field occupies 8 bytes of disk space.
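The rounding problem that Currency avoids can be seen in a few lines of Python. In this sketch Python's float stands in for a Single or Double field, and the decimal module stands in for Currency. This is an analogy chosen for illustration, not a description of how Access stores Currency values internally.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.10 or 0.20 exactly.
float_total = 0.10 + 0.20
float_is_exact = (float_total == 0.30)

# Decimal arithmetic keeps money amounts exact.
decimal_total = Decimal("0.10") + Decimal("0.20")
decimal_is_exact = (decimal_total == Decimal("0.30"))

# Holding a result to four places after the decimal point, as Currency does.
third = (Decimal("1") / Decimal("3")).quantize(Decimal("0.0001"))

print(float_is_exact, decimal_is_exact, third)
```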

7.5.2. Storing telephone numbers


Quite a lot of data which we informally refer to as numbers are in practice nothing of
the kind! For example, telephone numbers include non-numeric characters such as the plus
sign, brackets and spaces, and leading zeros which must be preserved: +44 (0)789 12345.
For this reason, such fields must be stored as text. See the next section for details of
how to ensure that the data stored in the fields is correctly formatted.
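A short Python sketch shows what goes wrong if a telephone number is treated as a number. The sample numbers are invented for the illustration.

```python
# A national number with a leading zero, converted to a number and back.
phone = "0789 12345"
as_number = int(phone.replace(" ", ""))
round_trip = str(as_number)          # the leading zero has been lost

# An international number cannot be converted to a number at all.
international = "+44 (0)789 12345"
try:
    int(international)
    stored_as_number = True
except ValueError:
    stored_as_number = False

print(round_trip, stored_as_number)
```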

7.5.3. Controlling data entry formats with masks


When you have several people entering data in your database, you can define how
users must enter data in specific fields to help maintain consistency and to make your
database easier to manage. For example, you can set an input mask for a form so that
users can only enter telephone numbers in the Swedish format or addresses in the
French format. You can set a specific format for the input mask, and select another
format so that the same data is displayed differently.
For full details of masks and how to use them, please refer to Microsoft Access
documentation available online:
http://office.microsoft.com/en-us/access/HA100964521033.aspx#2

7.6. Keys

7.6.1. Candidate keys


If there is only one candidate key, it has to be the primary key, and the comments
below for primary key apply.
If a key is a candidate key but not a primary key, it is wise to set additional properties:
Required set to Yes (null forbidden), and Indexed set to Yes (No Duplicates).

7.6.2. Primary key


There is only one primary key per table (although the single primary key may have
multiple attributes within it -- please refer to the next section).
If the key has only one part, select it, and use Edit / Primary key to set the attribute as
primary key.
♦ Primary keys in MS Access
A commonly-used technique in Access is to use an AutoNumber field as a
primary key attribute. An AutoNumber field is in fact a Long Integer value.
For this reason, the data type of a corresponding foreign key field should be
set to Long Integer.

7.6.3. Multi-part primary keys


The primary key may be multipart. To create a multipart primary key in Access, select
the first field, then, holding the control key, select the second and subsequent parts.
Once all parts of the primary key are selected, use Edit / Primary key to set them as
the primary key.

7.6.4. Entity integrity rule


This rule states that no field participating in the primary key of an entity may be null.
This rule may be enforced in Microsoft Access as follows: set the Required (null
forbidden) property to Yes for each attribute which participates in the primary key.
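The entity integrity rule, applied to a multi-part primary key, can be sketched as follows. The example uses Python's sqlite3 module and a hypothetical Enrolment table. Note that in SQLite the key columns must be declared NOT NULL explicitly, which mirrors the advice above to forbid nulls on each key attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Enrolment (
        StudentID  INTEGER NOT NULL,
        ModuleCode TEXT    NOT NULL,
        Grade      TEXT,
        PRIMARY KEY (StudentID, ModuleCode)   -- two-part primary key
    )
""")
conn.execute("INSERT INTO Enrolment VALUES (1, 'DB101', NULL)")  # Grade may be null

def rejected(sql):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# A null in any part of the primary key is refused ...
null_key_rejected = rejected("INSERT INTO Enrolment VALUES (2, NULL, 'A')")
# ... and so is a second record with the same key values.
duplicate_key_rejected = rejected("INSERT INTO Enrolment VALUES (1, 'DB101', 'B')")

print(null_key_rejected, duplicate_key_rejected)
```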

7.6.5. Foreign keys


In Microsoft Access, a foreign key is created by adding, to the table at the many end of
a one-to-many relationship, attributes which correspond to the primary key of the table at
the one end. These attributes must have the same data type and size as the attributes in
the primary table. If the primary key is an AutoNumber field, the attribute in the many
table should be declared as a Long Integer. Then a one-to-many relationship should be
established, as is described in the next section.
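The pairing of an automatically numbered primary key with a plain integer foreign key can be sketched like this. The example uses Python's sqlite3 module, where AUTOINCREMENT plays a role similar to AutoNumber. The Customer and CustomerOrder tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY AUTOINCREMENT,   -- numbered automatically
        Name       TEXT NOT NULL
    );
    CREATE TABLE CustomerOrder (
        OrderID    INTEGER PRIMARY KEY AUTOINCREMENT,
        CustomerID INTEGER NOT NULL                     -- plain integer, NOT auto-numbered
                   REFERENCES Customer (CustomerID),
        OrderDate  TEXT
    );
""")

# The database chooses the new customer's key; we read it back ...
new_id = conn.execute("INSERT INTO Customer (Name) VALUES ('Acme')").lastrowid
# ... and store the same value in the many-end table as the foreign key.
conn.execute(
    "INSERT INTO CustomerOrder (CustomerID, OrderDate) VALUES (?, '2009-10-01')",
    (new_id,))

orders = conn.execute("SELECT OrderID, CustomerID FROM CustomerOrder").fetchall()
print(new_id, orders)
```

The foreign key column is an ordinary integer: only the one-end table generates numbers, and the many-end table simply copies them.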

7.7. Relationships
Defining relationships in Access involves you in adding the tables you want to relate to the
Relationships window, and then dragging the primary key field from one table and dropping it
on the foreign key field in the other table.

The kind of relationship that Microsoft Access creates depends on how the related fields are
defined:
♦ One-to-many relationship
A one-to-many relationship is created if only one of the related fields is a
primary key or has a unique index. This is usually the case.
♦ One-to-one relationship
A one-to-one relationship is created if both of the related fields are primary
keys and / or have unique indexes.
Sometimes Access recognises this automatically, as here, when a B2B
customer table is being created to hold fields specific to B2B customers:

The result is:

Sometimes Access will not automatically recognise a one-to-one relationship
and you may need to force one; you do this by setting the Indexed property
of the foreign key attribute to Yes (No Duplicates), that is, duplicates not allowed.
So, if we have this B2B table which we want to link back to Customer:

Then we have to set the Indexed property of the field CustomerID:

♦ Many-to-many relationship
A many-to-many relationship is really two one-to-many relationships with a
third table whose primary key consists of [7] two fields - the foreign keys from
the two other tables. This has already been discussed in section 2.21.

7.7.2. Relationships and linking: Enforcing referential integrity where appropriate

Should you enforce referential integrity? As you define a 1: M relationship, you are
invited to check the box which enforces referential integrity. You should normally do
this - it means that you cannot create a child record for a non-existent parent record,
and this is a powerful and useful validation / data integrity constraint in nearly every
case. Do NOT do this if your ER model shows the relationship to be optional.
To summarise why you might want to use referential integrity: if a table such as animal
patient is related to another table animal type, and referential integrity is enforced, then
a database management system will only allow an actual patient record to be inserted if
the type of the animal already appears in the animal type table. This is a highly-
desirable constraint, since it ensures that questions to which a precise answer is needed
actually get one. Without it, someone might erroneously update the database to say of
Fido that he is a dawg, and a query which lists all dogs would omit Fido.
If you do set the option to enforce referential integrity, it is common also to set the
option for Cascade update; it is potentially dangerous to set the option for Cascade
delete, and you should only do this if you are certain of what you are doing.
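
The effect of enforcing referential integrity can be sketched as follows. This is not Access itself: it is a small Python example using the standard-library sqlite3 module, and the table and field names (AnimalType, AnimalPatient, TypeName, PatientName) are assumptions made for illustration. It shows the "dawg" record being rejected, and a cascade update carrying a change of the parent key through to the child records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.executescript("""
CREATE TABLE AnimalType (TypeName TEXT PRIMARY KEY);
CREATE TABLE AnimalPatient (
    PatientName TEXT PRIMARY KEY,
    TypeName    TEXT NOT NULL
        REFERENCES AnimalType(TypeName) ON UPDATE CASCADE  -- cascade update only
);
INSERT INTO AnimalType VALUES ('dog'), ('cat');
INSERT INTO AnimalPatient VALUES ('Rex', 'dog');
""")

def try_insert(name, type_name):
    """Return True if the patient record is accepted, False if it is rejected."""
    try:
        with conn:
            conn.execute("INSERT INTO AnimalPatient VALUES (?, ?)", (name, type_name))
        return True
    except sqlite3.IntegrityError:
        return False

fido_as_dawg = try_insert("Fido", "dawg")  # no such parent record: rejected
fido_as_dog = try_insert("Fido", "dog")    # parent record exists: accepted

# Cascade update: renaming the parent key also updates the child records.
with conn:
    conn.execute("UPDATE AnimalType SET TypeName = 'canine' WHERE TypeName = 'dog'")
canines = sorted(row[0] for row in conn.execute(
    "SELECT PatientName FROM AnimalPatient WHERE TypeName = 'canine'"))
print(fido_as_dawg, fido_as_dog, canines)
```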

7.8. System outputs

[7] Or, includes them, along with another attribute which ensures uniqueness, usually a date.
7.8.1. Queries
A query is a temporary results table resulting from joining together fields taken from
one or more database tables. A query can also include calculated fields.
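
To make this concrete, here is a small sketch of a query which joins two tables and adds a calculated field. It is written in Python with the standard-library sqlite3 module rather than in Access, and the Product and OrderDetail tables and their contents are invented for the example. Note that OrderDetail also has a two-part primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (
    ProductID   INTEGER PRIMARY KEY,
    ProductName TEXT,
    UnitPrice   REAL
);
CREATE TABLE OrderDetail (
    OrderID   INTEGER,
    ProductID INTEGER REFERENCES Product(ProductID),
    Quantity  INTEGER,
    PRIMARY KEY (OrderID, ProductID)   -- multi-part primary key
);
INSERT INTO Product VALUES (1, 'Tea', 2.5), (2, 'Coffee', 4.0);
INSERT INTO OrderDetail VALUES (100, 1, 4), (100, 2, 3);
""")

# The query joins the two tables and adds a calculated field, Amount.
rows = conn.execute("""
    SELECT od.OrderID, p.ProductName, od.Quantity * p.UnitPrice AS Amount
    FROM OrderDetail AS od
    JOIN Product AS p ON p.ProductID = od.ProductID
    ORDER BY p.ProductName
""").fetchall()
print(rows)
```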

7.8.2. Reports
Reports are comprehensive summaries of a situation, and normally involve data from
several tables. As such, a report is usually based on a query rather than on a single
table. A report is frequently intended to be printed, rather than viewed on-screen.

7.8.3. Forms
Forms are used to get data into a system, and may also be used to get information out -
- see the next section.

7.9. System inputs

7.9.1. Forms, sub-forms and their use with 1: M and M: N relationships


A form can be used to input data into a table. Where two tables are linked by a one-to-
many relationship, it is common to use a form and an associated subform.
A subform is a form within a form. The primary form is called the main form, and the
form within the form is called the subform. A form/subform combination is often
referred to as a hierarchical form, a master/detail form, or a parent/child form.
Subforms are especially effective when you want to show data from tables or queries
with a one-to-many relationship. For example, you could create a form with a subform
to show data from a Customer table and a Cars table. One Customer can own many
Cars. Conversely, one Car can be owned by only one Customer at a time. The data in
the Customer table is the "one" side of the relationship. The data in the Cars table is
the "many" side of the relationship. From a Customer-based form, a subform of type
datasheet or continuous form can show the details of all the various cars owned by that
customer. Alternatively, the form/subform relationship can be used in the opposite
sense, since with a Service record displayed, a user might want to show the details of
the (one) car which is being serviced.
Subforms can be nested, that is, a 1:M:N situation such as Customer : Order : Order
Detail can be implemented as a form containing a subform which itself contains a
subform.
Unfortunately, there is no straightforward way to show the three tables which
participate in a many to many relationship (for example, Order, Product and the link
table OrderDetail). Often, it is adequate just to use two (different) form/subform
combinations. Where it is necessary to show the contents of all three tables at the same
time, a technique which is frequently applicable is to have a form and subform
relationship, with the link between the subform table and its other owner being
implemented as a combo box. An example of the use of this technique is provided in
appendix 5.2.

7.9.2. Field-specific validation checks


In Microsoft Access, these may take the form of simplified Visual Basic rules, such as
that a field must be either M or F. Where the list of permitted values will never
change, such as in the case of gender, it is sensible to include the rule as a property of
the attribute being defined. When the valid values form part of a variable list, it is
probably better to set that list up as a separate table and to enforce referential integrity
– see the next section.
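
A field-level rule of this kind can be sketched outside Access as a CHECK constraint. The example below uses Python's standard-library sqlite3 module; the Person table and the helper function accepts are hypothetical names chosen for the illustration, and the constraint is only an analogue of an Access validation rule, not its actual syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Person (
        PersonID INTEGER PRIMARY KEY,
        Gender   TEXT NOT NULL CHECK (Gender IN ('M', 'F'))  -- fixed list of values
    )
""")

def accepts(gender):
    """Return True if a row with this gender passes the field-level rule."""
    try:
        with conn:
            conn.execute("INSERT INTO Person (Gender) VALUES (?)", (gender,))
        return True
    except sqlite3.IntegrityError:
        return False

results = {g: accepts(g) for g in ("M", "F", "X")}
print(results)
```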

7.9.3. Using referential integrity to carry out inter-table validation checks
Where two tables are linked in a one to many relationship, it is usually good practice
to enforce referential integrity. See section 7.7.2. This makes it impossible to introduce
a child record for a non-existent parent; this is often of considerable value in
improving the design of a database.
A variant of this technique involves the specific identification of so-called lookup
tables. A lookup table contains the valid values of an attribute. By making the lookup
table a parent entity to the table whose values are to be verified, it becomes impossible
to enter “bad” data, that is, data not authorised by the lookup table. In this example,
the grade attribute of a student’s result in a module has been made into a lookup based
on the valid values stored in the parent Grade table. It is therefore impossible to record
an invalid grade.

7.9.4. Table-level checks on forms


On a form, it is possible to cross check fields. For example, you might not allow the title Mr for a
person whose gender is female. However, to do so requires the use of VBA.

7.10. Implementing processes

7.10.1. Data processing in Access


People expect systems to do things for them! This normally involves some amount of
data processing or transformation.

7.10.2. Functional elements in Access


♦ Implicit data processing
For example, a query involves joining two or more tables and generating a
single unnormalised table which is the required result - Access is doing a lot
of processing to achieve this, although this is not particularly visible to the
system user.
♦ Calculations: using expressions in Microsoft Access
A query may involve arithmetic elements, such as creating totals and sub-
totals. When defining the query, instead of specifying a field, specify a
calculated field. The syntax is exemplified in:
Amount : [Quantity] * [Unit price]
The operators (such as * for multiply) and functions (such as ) permitted in
expressions are documented at
http://office.microsoft.com/en-us/access/HP051866381033.aspx
which also describes the Expression
Builder, which considerably eases the task of building valid expressions.
Expressions are also used in forms and reports.
♦ SQL
The set-processing language in Access is SQL: Structured Query Language, which is
an ANSI (American National Standards Institute) standard language. As the name
suggests, SQL is a means of asking queries (questions) of a database and getting
back answers. Part of the power of the relational data model is that, provided that the
database consists of normalised entities, it is possible to ask almost any arbitrarily-
complex question and get an answer.
However, SQL is more than a conventional query language. It also provides set
manipulation facilities, that is, it is possible to create whole new sets of data and to
store them in tables, and / or it is possible to update complete sets of records in a
single operation. Access implements this functionality as ‘append’ and ‘update’
queries – see below, section 7.11.1.
♦ Occasional need for record-at-a-time navigation and
processing
Access usually manipulates records a set at a time. Sometimes, it is necessary to
carry out record-at-a-time navigation and processing under program control. This is
achieved in Access by means of:
∗ Recordsets: the Access mechanism for making tables
and queries available a record at a time
∗ Visual Basic: the language in which you can manipulate
individual records
ESC students should not normally try to learn how to do this.
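
For readers who are curious about the difference, the contrast between set-at-a-time and record-at-a-time processing can be sketched as follows. The example is written in Python with the standard-library sqlite3 module, not in Access or Visual Basic, and the Product and PriceArchive tables are invented for the purpose. The INSERT ... SELECT and UPDATE statements play the role of append and update queries; the loop plays the role of stepping through a recordset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, UnitPrice REAL);
CREATE TABLE PriceArchive (ProductID INTEGER, OldPrice REAL);
INSERT INTO Product VALUES (1, 10.0), (2, 20.0), (3, 40.0);
""")

with conn:
    # "Append" query: copy a whole set of rows into another table.
    conn.execute(
        "INSERT INTO PriceArchive (ProductID, OldPrice) "
        "SELECT ProductID, UnitPrice FROM Product")
    # "Update" query: change a whole set of rows in a single statement.
    conn.execute("UPDATE Product SET UnitPrice = UnitPrice * 2 WHERE UnitPrice < 30")

# Record-at-a-time processing: visit the rows one by one under program control.
running_total = 0.0
for product_id, unit_price in conn.execute(
        "SELECT ProductID, UnitPrice FROM Product ORDER BY ProductID"):
    running_total += unit_price

archived = conn.execute("SELECT COUNT(*) FROM PriceArchive").fetchone()[0]
prices = [row[0] for row in conn.execute(
    "SELECT UnitPrice FROM Product ORDER BY ProductID")]
print(archived, prices, running_total)
```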

7.11. System data transformations


Access provides the following explicit ways in which to take stored data and either to change
the way in which it is stored, or to transform it under program control:

7.11.1. Append and Update queries


These are the user-accessible means by which Access provides set manipulation
facilities, that is, it is possible to create whole new sets of data and to store them in
tables, and / or it is possible to update complete sets of records in a single operation.
They are not described further in this document.

7.11.2. Macros
Macros are stored sequences of user commands.

7.11.3. Visual Basic for Applications (VBA) modules inside Access


Program code can be linked to objects within a database, such as forms, etc. This program
code is written in VBA.

7.11.4. Visual Basic programs outside Access


A program written in Visual Basic can manipulate data stored in a Microsoft Access database.
The facility to do this is called ActiveX Data Objects, ADO.
This document does not cover these topics in any systematic manner. ESC students should not
normally attempt to learn programming.
8. Ways in which to learn more MS Access

8.1. Sample databases and applications included with Microsoft Access

The material in this section is based on, and quotes from, the MS Access help files.
Microsoft Access provides a sample database that you can use while you're learning Microsoft
Access.

8.1.1. NorthWind Traders sample database (English edition) / Les Comptoirs (édition française)

Use this sample database when you're first learning Microsoft Access. The NorthWind
database contains the sales data for a fictitious company called NorthWind Traders,
which imports and exports speciality foods from around the world. By viewing the
tables, queries, forms, reports, macros, and modules included in the NorthWind
database, you can develop ideas for your own Microsoft Access database. You can
also use the NorthWind data to experiment with Microsoft Access before you enter
your own data. For example, you may want to practice designing queries using the
Orders table since it contains enough records to produce meaningful results.
A French version of the same application can be built using the French-language
edition of Access. It is entitled Les Comptoirs. It is not quite as complete as the
American English version.
We have created an improved version of NorthWind which overcomes some of the
weaknesses in its original design and implementation by Microsoft.

8.1.2. Database Wizards (Assistants)


Microsoft Access also includes a database wizard (assistant) that you can use to create
common databases, such as a Contact Management database [8]. You can use the
databases created by the database wizard as-is or as a learning tool to help you design
your own databases.

[8] However, please note that this Contact Management system will NOT meet the requirement set out in section 4.3.1!
SECTION 3 – THE ANYTOWN DISTANCE LEARNING BUSINESS
SCHOOL EXAMPLE
9. Example scenario: Anytown Distance Learning Business
School
The Anytown Distance Learning Business School offers general business courses at undergraduate and
postgraduate levels. The undergraduate course is a Bachelor of Arts (BA) course called Business
Studies. The postgraduate course is a Master of Business Administration (MBA). Each course is
administered by a Course Coordinator.
Students apply for a course, BA or MBA. [9] They send in an application form containing their personal
details, and their desired course. On behalf of the School, the appropriate Course Coordinator checks
whether the course is available and that the student has already obtained the necessary academic
qualifications. If the course is available (not yet full) and the student is qualified, he or she is enrolled in
the course, and the School confirms the enrolment by sending a confirmation letter to the student. If the
course is unavailable or the student is not sufficiently qualified, the student is sent a rejection letter.

10. Background: Studying


The academic year is divided into two teaching semesters, the first of which runs from October to
January, the second from February to May. At the start of a semester students must register for the
modules they will be taking in that semester. These are the core modules for that semester of the course,
and the electives chosen by the student. All modules last only one semester. There is a third semester,
over the summer, running from June to September. No ordinary modules are taught in the third semester.
However, MBA students do their dissertation in the third semester.
Once the student is enrolled on a course, they have to study modules, each of which has an associated
credit. The credit for an undergraduate module is 10; that for a postgraduate module is 15. To be
awarded a BA, a student on the BA course has to achieve 360 credits, and this is normally achieved by
the student studying six 10-credit modules per semester for each of two semesters per year for each of 3
years: 6 X 10 X 2 X 3 = 360. On the MBA, students study four 15-credit modules per semester for each
of two semesters in a single year: 4 X 15 X 2 = 120. They then undertake a single dissertation (project)
in the third semester; the dissertation is for 60 credits. A student is awarded an MBA when he or she has
achieved 180 credits. The table summarises the structure of each course:
Course  Modules per  Credits per  Taught semesters  Number of  Sub-total  Project  Total
        semester     module       per year          years                 credits
BA      6            10           2                 3          360        0        360
MBA     4            15           2                 1          120        60       180
The student chooses the modules they will study at the start of each semester. A module is taught by a
module leader. Each module is assessed by coursework and by an exam, in varying proportions – one
module might be 60% exam, another 50%. A student passes a module if the mark for the module as a
whole exceeds 40%. If they pass, they are awarded the module credits; if they fail, they are awarded zero
credits. If a student fails a module, he or she has to take an alternative module in a later semester, so that
they can obtain sufficient credits.
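
The credit arithmetic above can be checked with a few lines of Python. The function name total_credits and its parameter names are my own, chosen for this sketch; the figures are those given in the scenario.

```python
def total_credits(modules_per_semester, credits_per_module,
                  taught_semesters_per_year, years, project_credits=0):
    """Credits from taught modules plus any project (dissertation) credits."""
    taught = (modules_per_semester * credits_per_module
              * taught_semesters_per_year * years)
    return taught + project_credits

ba_total = total_credits(6, 10, 2, 3)                       # 6 x 10 x 2 x 3
mba_total = total_credits(4, 15, 2, 1, project_credits=60)  # 4 x 15 x 2, plus dissertation
print(ba_total, mba_total)
```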
Study is by means of Distance Learning. Coursework is submitted by email. The students do not need to
come to Anytown, except when they have exams to do at the end of the first and second semesters.

[9] Note that course in this case study is neither programme nor module – but, as we will see, it is closer to
programme than module.
11. A Closer Look into "Managing Students"
This section focuses on the part of the system that supports the administration of information about
students on courses and modules.
• Students study modules drawn from two lists of modules held for the School, one of
undergraduate modules, the other of postgraduate ones.
• Modules have titles and a unique identifying code. Each module has a pre-defined value
expressed as a number of credits.
• Modules are of two kinds - some modules are core, some are electives, that is they are optional.
• Core modules must be taken by all students on the course. The course regulations will specify
how many optional modules a student can take and what these options might be.
• Students who pass a module are awarded the number of credits specified as that module’s value.
If they fail, they get zero credits.
• Students construct a programme of study by doing core modules, to which are added the optional
(elective) modules they select from those available.
• Every course defines a maximum period of enrolment within which time the course must be
completed. This is normally five years for an undergraduate course and three years for a
postgraduate course. If a student does not complete within this time, the decision of the next
exam board will be that they have failed the course.
• Students may suspend studies or withdraw from the course. The date on which this happens must
be recorded.
• Each component (coursework or exam) of a module has a certain percentage weighting and a
student’s overall mark for a module is calculated by combining the marks for each component.
• An exam board (jury) meets after each semester to consider the marks obtained by students and
to determine whether they have passed or failed the modules they were registered for, and what
their status on the course now is. This process is described in more detail in section 12.
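
The calculation of a module mark and the pass rule can be sketched in Python as below. The function names are hypothetical. I have assumed that the coursework weighting is simply 100% minus the exam weighting, that the pass test ("exceeds 40%") is applied to the unrounded mark, and that the mark is rounded to a whole number only for display; the scenario does not state these details. The sample figures are taken from the example report in section 19.

```python
def overall_mark(coursework_mark, exam_mark, exam_weighting_pct):
    """Combine the two component marks using their percentage weightings."""
    coursework_weighting_pct = 100 - exam_weighting_pct  # assumption: weights sum to 100
    return (coursework_mark * coursework_weighting_pct
            + exam_mark * exam_weighting_pct) / 100

def module_result(mark, module_credits, pass_mark=40):
    """A module is passed only if the mark exceeds the pass mark."""
    if mark > pass_mark:
        return ("Pass", module_credits)
    return ("Fail", 0)

statistics_mark = overall_mark(55, 33, 60)  # 40% coursework, 60% exam
ebs_mark = overall_mark(63, 25, 50)         # 50% coursework, 50% exam
ethics_mark = overall_mark(33, 0, 0)        # 100% coursework, no exam

for mark in (statistics_mark, ebs_mark, ethics_mark):
    print(round(mark), module_result(mark, 15))
```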

12. The process of deciding what happens to students


At the end of every semester, the module leaders and course coordinator meet together in an exam board
(jury). The exam board is chaired by the Dean of the School, who represents the School’s management.
The exam board firstly looks at the results for each module, checking to see that they are reasonable and
sensible – that is, that the marks awarded are neither too high nor too low. If there is a problem, all the
marks awarded are scaled up or down by a percentage decided by the board.
The exam board is then presented with a list of all the students on the course. For each student, there is a
report, called a Student Results Summary, which shows what modules the student has undertaken and
the results they have achieved. An example of this report is given below in section 19. The jury then
decides for each student whether they have:

1. Passed all the necessary credits, including all the core modules for the course and at least the
necessary number of options from the course’s collection of optional modules; in this case, the
decision is that they have succeeded in the course and they are awarded a BA or MBA.
2. Not yet passed all the necessary credits, but are making satisfactory progress: the decision is
that they may proceed, taking further credits as necessary.
3. Failed to make satisfactory progress, that is, they are failing to complete too many modules
or have exceeded the maximum time they may stay on the course: the decision is that they have
failed in the course as a whole.
After the exam board, a revised version of the Student Record is printed and sent to the students.
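
The three possible decisions can be expressed as a small decision function. This is only a sketch: the scenario does not say how many failed modules count as "too many", so the threshold max_failed_modules is an assumption, as are the class and field names. The requirement for enough optional modules is folded into the credit total here. The returned values are taken from the Status list in section 18.

```python
from dataclasses import dataclass

@dataclass
class StudentProgress:
    credits_passed: int
    credits_required: int
    core_modules_outstanding: int
    modules_failed: int
    years_enrolled: int
    max_years: int

def decide_status(p, max_failed_modules=3):
    """Return the exam board decision; max_failed_modules is an assumed threshold."""
    # 1. All necessary credits passed, including every core module.
    if p.credits_passed >= p.credits_required and p.core_modules_outstanding == 0:
        return "Passed"
    # 3. Unsatisfactory progress: too many failures, or over the time limit.
    if p.years_enrolled > p.max_years or p.modules_failed > max_failed_modules:
        return "Failed"
    # 2. Otherwise the student may proceed and take further credits.
    return "Progressing"

finished = decide_status(StudentProgress(180, 180, 0, 1, 1, 3))
continuing = decide_status(StudentProgress(105, 180, 0, 1, 1, 3))
out_of_time = decide_status(StudentProgress(60, 180, 2, 1, 4, 3))
print(finished, continuing, out_of_time)
```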

13. Course Review


Once a year, a Course Review meeting is held at which each course is reviewed to see whether the list of
modules is still appropriate, or whether some modules have become obsolete, or whether new modules
need to be devised. The same meeting has the power to change the relative weightings of the coursework
and exam components, and to decide which modules are core on a course and which are electives.
Module leaders can propose changes to module specifications. The Dean can propose changes to the
programme itself.

14. Simplifying Assumptions


This database does not contain full details of qualifications. [10] So data about the following things is
simply stored as large text fields (Access Memo type), because it does not need to be queried:
Course Qualifications Required
Applicant / Student Previous Qualifications

Each module has only one teacher, and that teacher is the module leader. One teacher may however be
the module leader for a number of modules.

15. External entities


These are:
♦ Applicant
♦ Student
♦ Course Coordinator
♦ Module Leader
♦ Management – the Dean
16. Processes
This list is not necessarily complete.

16.1. Process Applicants


Decide whether to accept or reject students, on the basis of their qualifications and the
availability of spare places on the course.

16.2. Admit students to Course – Course Enrolment


Details of the applicant are transferred to the Student table.

16.3. Register students on core and optional modules


This is done at the start of each semester. Students fill in a form stating which optional modules
they wish to do and confirming the core modules that they have to do.

16.4. Teach and assess a module


Each module lasts one semester.

16.5. Prepare for and hold exam board (jury)


• Collect together the results for all students for all modules they have
been studying
• Review module results in exam board
• Decide student status in exam board

[10] For a more thoughtful approach to how to manage qualifications, please see section 2.22.
16.6. Review Course
This is described in section 13.

17. Documents
The Course Coordinators currently produce and maintain the following documents:

17.1. Course Description


Produced for each course, this describes the course, says what qualifications students have to
have, and lists the core and elective modules. It is updated for and as a result of the Course
Review meeting (section 13). The Course Description is a report; it is not an entity.

17.1.1. List of Modules
For each module, the following data has to be kept:
♦ Module Code
♦ Module Title
♦ Course Code – The Course on which the module is used – a
module is used either on the BA or the MBA
♦ The Lecturer who is the Module Leader
♦ Elective or Core?
♦ Examination weighting %

17.2. Management Reports


The following statistics and analyses would be of value to management:
♦ Analysis of the results for each module – average mark,
standard deviation, percentage of students who do not pass the
module.
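
As an illustration of this analysis, the three statistics could be computed as follows in Python. The function name and the sample marks are invented. I have assumed the population standard deviation (the scenario does not say which form is wanted) and that a mark must exceed 40 to pass, as stated in the scenario.

```python
from statistics import mean, pstdev

def module_analysis(marks, pass_mark=40):
    """Average mark, standard deviation and percentage of students not passing."""
    not_passed = sum(1 for mark in marks if mark <= pass_mark)
    return {
        "average": mean(marks),
        "std_dev": pstdev(marks),  # population standard deviation (an assumption)
        "pct_not_passed": 100 * not_passed / len(marks),
    }

analysis = module_analysis([30, 40, 50, 60])  # invented sample marks
print(analysis)
```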
18. Entity and Attribute Lists
Some initial analysis work has been undertaken; however, please note that it is not complete, and you
are expected to add to it.
Some of the entities are Applicant, Course, Lecturer (module leader), and Student.
Some of the attributes for some of these entities are:
Applicant
    Applicant No
    Date of Application
    Applicant Name
    Applicant Address
    Applicant Country
    Actual Qualifications (stored as a Memo field)
Course
    Course Code
    Course Name
    Course Description
    Qualifications Required (stored as a Memo field)
Lecturer
    Lecturer No
    Lecturer Name
    Home Address
Student
    Student No
    Course Code (Foreign key)
    Student Name
    Student Address
    Date of Birth
    Previous Qualifications (stored as a Memo field)
    Status (Applicant / Enrolled / Passed / Failed / Withdrawn / Suspended / Progressing)

19. Example Student Record Report


This is an extract of the information presented to the exam board for one student; it is later
printed and sent to the student.
Student Results Summary
Printed 10/07/2008
Name: LEUCHARS Annabelle
Address: 12 Acacia Avenue ILKLEY BD25 1EE
Course: MBA
Date application
received: 25/07/2007
Date admitted
into School 01/09/2007

Year 1 Semester 1 Results

Module code  Credits  Core or elective  Module name                  Teacher         Course work proportion  Course work mark  Exam proportion  Exam mark  Overall mark  Result
M001         15       C                 Statistics                   BENNETT Gordon  40%                     55                60%              33         42            Pass
M567         15       C                 Electronic Business Systems  GREGORY Mark    50%                     63                50%              25         44            Pass
M999         15       E                 Ethics                       ELLUL Jacques   100%                    33                0%               -          33            Fail

(etc.)

Year 1 Semester 2 Results


(etc.)

Year 1 Dissertation
Module code  Credits  Core or elective  Module name                  Teacher         Course work proportion  Course work mark  Exam proportion  Exam mark  Overall mark  Result
M234         60       C                 Dissertation                 GREEN David     100%                    56                0%               -          56            Pass

CREDITS ASSESSED SO FAR 105 180


CREDITS ATTEMPTED, BUT STILL OUTSTANDING - MUST BE DONE LATER 10
AVERAGE FOR COURSE AS A WHOLE 49%

20. Anytown high-level Use Case diagram
Please note that the label <<include>> can also be written « include ». Note also that Microsoft Visio
employs <<uses>> or « uses » instead of << include >> - they mean the same thing.

21. Anytown: Context diagram

22. Level 1 DFD

23. Example Level 2 DFD

24. Data dictionary
We now need to move towards a good ERA model by means of top-down entity attribute modelling.
The approach I have adopted here is to work on the basis of the list of "obvious" entities which I identified in section 18, put them into a spreadsheet, and gradually add the
appropriate attributes. The spreadsheet is an extended example of what is sometimes called a Data Dictionary.

24.1. Data dictionary for Anytown Business School

Data Dictionary for Anytown Business School

Description

External entities
Applicant
Student
Course Coordinator
Module Leader
Dean

Processes

Process (P) or Sub-process (S)?  Process No  Sub-process No  Process name                                        DFD number  Description
P                                1           -               Process applications                                1           -
P                                2           -               Admit students to course                            2           -
P                                3           -               Register students on core and elective modules      3           -
P                                4           -               Prepare for and hold exam board                     4           -
S                                4           1               Collect module results and produce student profile  4.1         -
S                                4           2               Review module results                               4.2         The scaling factor (if any) is applied to the recorded student results before the Student Results Summary is reprinted
S                                4           3               Decide student status                               4.3         -
S                                4           4               Print results letters                               4.4         -
P                                5           -               Teach and assess module                             5           -
P                                6           -               Review programmes and modules                       6           -
P                                7           -               Produce management reports                          7           -

Data Stores

Store No  Store name
1         Applicants
2         Students
3         Module registrations
4         Module results
5         Module specifications
6         Student profiles
7         Course specifications

Data Flows

External entity     Process No  Direction  Name of flow                                Process name                                    Description
Applicant           1           Inward     Application                                 Process applications                            -
Applicant           1           Outward    Acceptance or rejection                     Process applications                            -
Course Coordinator  1           Outward    Application                                 Process applications                            -
Course Coordinator  1           Inward     Decision                                    Process applications                            -
Course Coordinator  6           Inward     Course description                          Review programmes and modules                   -
Student             3           Inward     Module choices                              Register students on core and elective modules  -
Student             4.2         Inward     Coursework and exams for assessment         Review module results                           -
Student             5           Outward    Module results letters                      Teach and assess module                         -
Student             4.4         Outward    Student results letters                     Print results letters                           -
Module Leader       6           Inward     Proposed changes to module specification    Review programmes and modules                   -
Module Leader       6           Outward    Module specification as revised and agreed  Review programmes and modules                   -
Module Leader       6           Inward     Module results                              Review programmes and modules                   -
Dean                7           Outward    Management reports                          Produce management reports                      Analysis of the results of each module; course description; list of modules
Dean                6           Inward     Proposed changes to programme               Review programmes and modules                   -

Entities

Attribute                    Primary? (Y or C)  Foreign?  Type            Size  Format  Input mask   Validation rules  Description

Applicant / Student (An applicant becomes a student when they are enrolled)
Student number               Y                  -         Text            11    >       LLL00000000  -                 -
Student forenames            -                  -         Text            30    -       -            -                 -
Student last name            -                  -         Text            20    >       -            -                 -
Date of birth                -                  -         Date/time       -     Date    00/00/0000   -                 -
Student address line 1       -                  -         Text            20    -       -            -                 -
Student address line 2       -                  -         Text            20    -       -            -                 -
Student city                 -                  -         Text            50    >       -            -                 -
Student postcode             -                  -         Text            8     -       -            -                 -
Student country              -                  -         Text            32    -       -            -                 -
Course code                  -                  Y         Text            3     -       -            -                 -
Application date             -                  -         Date/time       20    Date    00/00/0000   -                 -
Enrolment date               -                  -         Date/time       20    Date    00/00/0000   -                 -
Finishing date               -                  -         Date/time       20    Date    00/00/0000   -                 -
Status                       -                  -         Text            12    -       -            Applicant / Enrolled / Passed / Failed / Withdrawn / Suspended / Progressing  -
Term address line 1          -                  -         Text            20    -       -            -                 -
Term address line 2          -                  -         Text            20    -       -            -                 -
Term city                    -                  -         Text            50    >       -            -                 Defaults to Anytown
Term postcode                -                  -         Text            8     -       -            -                 -
Contact details              -                  -         Text            60    -       -            -                 -
Previous qualifications      -                  -         Memo            -     -       -            -                 -

Employee
Employee number              Y                  -         Text            7     >       LLL0000      -                 e.g. EMP1234
Employee forenames           -                  -         Text            30    -       -            -                 -
Employee last name           -                  -         Text            20    -       -            -                 -
Employee role                -                  -         Text            20    -       -            Course coordinator / module leader / Dean  -
Social security number       -                  -         Text            16    -       -            -                 -
Employee address line 1      -                  -         Text            20    -       -            -                 -
Employee address line 2      -                  -         Text            20    -       -            -                 -
Employee city                -                  -         Text            50    -       -            -                 -
Employee postcode            -                  -         Text            8     -       -            -                 -
Employee country             -                  -         Text            32    -       -            -                 -
Employee contact details     -                  -         Memo            -     -       -            -                 E.g. telephone numbers, etc.

Programme
Level                        Y                  -         Text            1     >       L            P/U               -
Credits per module           -                  -         Integer         -     -       -            -                 10 if undergrad; 15 if postgrad
Project credits              -                  -         Integer         -     -       -            -                 60 if postgrad
Credits required             -                  -         Integer         -     -       -            -                 360 if undergrad; 180 if postgrad

Course
Course code                  Y                  -         Text            3     -       -            MBA / BA          -
Course name                  -                  -         Text            40    -       -            -                 BA Business Studies or Master of Business Administration
Course coordinator           -                  Y         Long Integer    -     -       -            -                 Employee who manages the course
Level                        -                  Y         Text            1     -       -            -                 -
Required qualifications      -                  -         Memo            -     -       -            -                 -
Max number of students       -                  -         Integer         -     -       -            -                 -
Normal number of years       -                  -         Integer         -     -       -            -                 -
Max number of years          -                  -         Integer         -     -       -            -                 -
Module value                 -                  -         Integer         -     -       -            -                 -
Modules per semester         -                  -         Integer         -     -       -            -                 -
Taught semesters per year    -                  -         Integer         -     -       -            -                 -
Description                  -                  -         Memo            -     -       -            -                 -

Module
Module code                  Y                  -         Text            4     -       -            -                 -
Module title                 -                  -         Text            1     -       -            -                 -
Course code                  -                  Y         Text            1     -       -            -                 -
Specification                -                  -         Hypertext link  -     -       -            -                 -

Module operation (A Module Operation is the operation, or running, of a module in a given year)
Module code                  C                  Y         Text            4     -       -            -                 -
Year                         C                  -         Text            4     -       -            -                 -
Core / elective / obsolete?  -                  -         Text            1     -       L            C/E/O             -
Examination weighting        -                  -         Integer         -     %       -            -                 -
Teacher                      -                  -         Text            7     >       LLL0000      -                 -
Scaling factor applied       -                  -         Single          -     %       -            -                 Scaling factor, in %, decided by jury

Registration Result
Module code                  C                  Y         Text            4     -       L000         -                 -
Student number               C                  Y         Text            11    >       LLL00000000  -                 -
Date course work received    -                  -         Date/time       -     -       -            -                 -
Course work mark             -                  -         Integer         -     %       -            -                 -
Exam mark                    -                  -         Integer         -     %       -            -                 -
Overall mark                 -                  -         Integer         -     %       -            -                 -
Module result                -                  -         Text            4     -       -            Pass / fail       -

Relationships

Parent               Relationship      Child                Degree  Description
Programme            Includes          Course               1:M     A Course is part of either a Postgraduate or an Undergraduate Programme
Course               Enrols            Applicant / Student  1:M     An application is made by an Applicant for a Course. If they are acceptable, they may Enrol on the Course. They are then a Student on that Course
Course               Consists Of       Module               1:M     A Course is delivered as a series of Modules
Employee             Coordinates       Course               1:M     An Employee whose role is Course Co-ordinator, coordinates the Course
Employee             Leads             Module               1:M     An Employee whose role is Lecturer, Leads the Module
Module               Results In        Registration Result  1:M     Registration Result resolves the many-to-many relationship between Module and Student
Applicant / Student  Is Registered On  Registration Result  1:M     Registration Result resolves the many-to-many relationship between Module and Student
Module               Runs as           Module operation     1:M     -

Use Cases

Use case                       Actor               Relationship   Other use case    Description
Input applicant details        Course coordinator  Used by actor  -                 -
Confirm applicant as student   Course coordinator  Used by actor  -                 -
Register student on modules    Student             Used by actor  -                 -
Input module results           Module leader       Used by actor  -                 -
Print module results for jury  Course coordinator  Used by actor  -                 -
Print student results for jury Course coordinator  Used by actor  -                 -
Print student results letters  Course coordinator  Used by actor  -                 -
Change course structure        Dean                Used by actor  -                 -
Change course structure        -                   Includes       Update module     -
Update module                  Module leader       Used by actor  -                 Module, and Module Operation
Update module                  Dean                Includes       Allocate teacher  -
Update student status          Jury                Used by actor  -                 -
Record new staff member        Dean                Used by actor  -                 -
Submit coursework              Student             Used by actor  -                 -
Request management reports     Dean                Used by actor  -                 -
Allocate teacher               Dean                Used by actor  -                 -

System Outputs

Reports                                   Description
Analysis of the results of each module    Average mark, standard deviation, percentage of
                                          students who have not passed
Student Results Summary                   See section 10 of scenario

Forms    Sub-Forms    Description

Queries    Description

System Inputs

Forms                                     Sub-Forms    Description
Applicant details
Student details
Record student module choices                          Programme and module details
Update course structure                                Module and Module Operation details
Update module
Update student Registration results
Update member of staff
Coursework receipt

Validation Checks    Description

(NB: this is intended for inter-entity validation checks, and there are none in this
particular scenario)

25. Anytown ER diagram

26. Anytown system implementation
To use the analysis and design work we have already undertaken, you would begin by translating
the ERA model (data model) into equivalent Access objects. Entities become tables, attributes
become fields, and relationships are defined as relationships! Similarly, the Use Case diagram
has already been used to identify the inputs and outputs listed in the dictionary above.
Implementation in Access involves converting these into equivalent forms and subforms.
You might like to try this for yourself (because we have not uploaded an Anytown database).
Over to you to try…
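
If you would like to see the translation described above in miniature before trying it in Access, here is a rough sketch. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than Access, and the field names (programme_id, title, overall_mark and so on) are hypothetical simplifications, not the attributes of the Anytown data dictionary. Each entity becomes a table, each 1:M relationship becomes a foreign key on the child table, and Registration Result resolves the many-to-many relationship between Module and Student.

```python
import sqlite3

# Hypothetical, simplified field names: the real attributes belong in the data dictionary.
SCHEMA = """
CREATE TABLE Programme (
    programme_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL
);
CREATE TABLE Course (              -- Programme Includes Course (1:M)
    course_id    INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    programme_id INTEGER NOT NULL REFERENCES Programme(programme_id)
);
CREATE TABLE Module (              -- Course Consists Of Module (1:M)
    module_id    INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    course_id    INTEGER NOT NULL REFERENCES Course(course_id)
);
CREATE TABLE Student (             -- Course Enrols Student (1:M)
    student_id   INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    course_id    INTEGER NOT NULL REFERENCES Course(course_id)
);
CREATE TABLE RegistrationResult (  -- resolves Module M:M Student
    student_id    INTEGER NOT NULL REFERENCES Student(student_id),
    module_id     INTEGER NOT NULL REFERENCES Module(module_id),
    overall_mark  INTEGER,
    module_result TEXT,
    PRIMARY KEY (student_id, module_id)
);
"""


def build_database():
    """Create an empty in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the relationships
    conn.executescript(SCHEMA)
    return conn


def table_names(conn):
    """List the tables that were created, in alphabetical order."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


conn = build_database()
print(table_names(conn))
```

In Access itself you would create the same tables in Design View and draw the same links in the Relationships window; the sketch is only meant to show how directly the ERA model maps onto tables, fields and relationships.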

27. Terminology associated with data modelling and database design


Unfortunately, for historical and other reasons, a complex and varied vocabulary has built up in the
area of database design, and specifically of normalisation. The literature of normalisation generally follows
the vocabulary established by Edgar Codd. This is based on a specialised branch of mathematics, and is
frankly obscure. I have therefore deliberately changed the vocabulary in the rest of this document to follow
the Entity Relationship Attribute approach we have been using so far. It is possible that you will encounter
other vocabularies, and therefore I have set out the equivalences in the table below. The only columns you
need to know are the one headed ERA and the one headed Access. OOAD stands for object oriented
analysis and design, a more recent approach which we do not teach in the School.

Meaning                     File     Spreadsheet                                ERA                Access        Codd          OOAD

Class of object, thing      File     (worksheet)                                Entity             Table         Relation      Class
Instance of object, thing   Record   Row                                        Entity occurrence  Record        Tuple         Instance
Property, fact about        Field    Column                                     Attribute          Field         Attribute     Attribute
Relationship                (none)   [achieved painfully by lookup functions]   Relationship       Relationship  Relationship  Association
Set of all possible values  (none)   (none)                                     (none)             (none)        Domain        Domain / type
Operation                   (none)   Formula                                    (none)             Module        (none)        Operation / method
Degree (1:M etc.)           (none)   (none)                                     Degree             Degree        Cardinality   Multiplicity

0. References

0.1. Basics of structured analysis


There are myriad books on basic "structured" systems analysis and design
techniques. One which I would recommend for its combination of cheapness and
accessibility is:
Hughes, Martin
Mastering systems analysis and design
Basingstoke: Macmillan, 2000
ISBN 0-333-74803-4 Out of print.
The following book is excellent for setting the techniques used in this document in a
business-oriented context:
Curtis, Graham & David Cobham
Business Information Systems: Analysis, Design & Practice 6ed
Financial Times / Prentice Hall, 2008

0.2. Database theory


The classic reference for students who really want to understand database theory is the book by
Chris Date. Date was a collaborator with Dr. Edgar Codd, who invented the relational data
model, until the death of the latter in 2003:
Date, Chris
An introduction to database systems (8th edition)
Reading, MA: Addison-Wesley, 2003

This book is frankly difficult at first encounter, but it remains the classic reference on relational
databases.

0.3. DataFlow Diagrams (DFDs)


For a tutorial on DFDs, see http://www.cems.uwe.ac.uk/~tdrewry/dfds.htm

0.4. Entity relationship modelling


Entity relationship modelling was originally proposed by Peter Chen in the seminal
article
Chen, Peter
The entity-relationship model—toward a unified view of data
ACM Transactions on Database Systems (TODS), Volume 1, Issue 1 (March 1976).
Special issue: papers from the international conference on very large data bases,
September 22-24, 1975, Framingham, MA. ISSN 0362-5915.
I use a simplified version of his notation in this document.
For an additional online tutorial about entity relationship modelling, see
http://www.cems.uwe.ac.uk/~tdrewry/lds.htm

0.5. Use Case


Wikipedia (accessed 26/02/2008) has a useful summary of Use Case diagrams
http://en.wikipedia.org/wiki/Use_case_diagram
See also http://www.systemsanalysis.org.uk/ accessed 15/06/2008.
0.6. Basics of Object Oriented Analysis and Design (OOAD)
The structured approach is not the only one used in industry: the more recent object oriented
analysis and design approach is described in books such as the one listed below. OO is
conceptually more difficult than the older structured approach. But if you need a textbook on
this more recent approach, I recommend the following, although it is NOT suitable for the basic
"structured" systems analysis and design techniques used in this document:
Bennett, Simon & McRobb, Steve & Farmer, Ray
Object-oriented systems analysis and design using UML 3/e
Maidenhead: McGraw-Hill, 2006
£44.99
A good textbook for initial study of object-oriented analysis and design. Rather long-winded in
places.

1. Appendix 1 Business Process Analysis using Use Case Analysis
With thanks to Dr. Ken Lunn, former colleague at the University of Huddersfield, whose material
has formed the main basis for this section.
A Use Case is a definition of a meaningful interaction with a computer system. If you have used
the internet to buy things, an example of a Use Case would be choosing something from an online
catalogue, and another might be paying for the goods.
Use Case modelling is part of requirements definition and systems analysis. At the high level, a
set of Use Case diagrams defines the presentation of the system, and these diagrams are excellent
tools for discussion with stakeholders of a system, such as users and sponsors. At a more detailed
level, Use Cases are used to fully specify the external functionality of a system.
Use Cases are part of the information required by developers to design and implement a system.
Use Case diagrams say "what" a system does. The detailed analysis of Use Cases begins to say
something of "how" the system behaves in an environment. However, it does not say "how" a
system is structured internally to provide that behaviour. In computer system development you
will frequently see this separation emphasised. Before you decide how a system works, you need
to determine what it does first - a simple and obvious rule, but one so often forgotten to many
people's ultimate regret. That’s why Use Case diagrams (UCDs) and Use Case models (UCDs
with supporting text documents) can be so useful.

1.1. What is a Use Case Diagram?


A Use Case Diagram models a complete business process.
It consists of three key elements:
• ACTORS: People or things that use a system. An Actor might be a
clerk in a business, a manager, or even a customer accessing a
system via the Internet. Other computer systems are also called
Actors. A banking system that you send information to might well be
an Actor in your system.
• USE CASES: A Use Case is a meaningful piece of functionality
provided by a computer system. It can be quite complicated.
Examples would be printing invoices, accepting payment, ordering
goods.
Use case: A use case in a use case diagram is a visual
representation of distinct, identifiable (nameable) business
functionality in a system.
To choose a business process as a likely candidate for
modelling as a use case, you need to ensure that the
business process is discrete in nature. Discrete means
separately and clearly identifiable.
As the first step in identifying use cases, you should list the
discrete business functions in your problem statement.
Each of these business functions can be classified as a
potential use case.
• RELATIONSHIPS: These are links between Actors and Use Cases.
Actors use Use Cases, and also Use Cases can use other Use
Cases.

We draw a Use Case as an ellipse with the name of the Use Case underneath.
Sometimes the name is put inside.

[Figure: two ellipses, each labelled "A Use Case", one with the name underneath and one with the name inside]

The Use Case name is a concise, active description of the behaviour carried out by the
Use Case, such as "print invoice". Do not write mini-essays to describe the behaviour of
the Use Case - we shall use a more elaborate means for describing the behaviour in full.

An Actor is drawn as a stick person:

[Figure: a stick person labelled "An Actor"]

This is rather an unusual choice of notation when the Actor is an external computer system, but
you will get used to it. An Actor is really a role, not a person. One person may use the
system under many different roles. When finding actors, you are looking for the roles
that people adopt, not the people or even the job titles.
Relationships are drawn as lines, usually with an arrow:

[Figure: "An Actor" joined by an arrow to "A Use Case"]

This means that an Actor uses the Use Case. In any relationship there will be two-way
communication. The direction of the arrow indicates who initiates the interaction. Often
in an interactive system, it is the Actor that initiates the dialogue, but it can be the Use
Case. Sometimes the arrow is left out.
A Use Case can use another Use Case. If you have a piece of well-defined functionality,
it makes sense to re-use this wherever possible. Also, sometimes a Use Case gets too big
to manage sensibly and it makes sense to break this down into smaller Use Cases.
There are two ways Use Cases can relate. The first is where a Use Case "includes"
another Use Case. In this case the second Use Case is always invoked as part of the
execution of the first. This is drawn with an arrow pointing to the Use Case that is
included, with the label <<include>> tagged to the line:

[Figure: Use Cases "Change Invoice" and "Add item to sales ledger" joined by an arrow labelled <<include>>]

Please note that the label <<include>> can also be written «include». Note also that
Microsoft Visio employs <<uses>> or «uses» instead of <<include>>.
Sometimes a Use Case is only called occasionally from another Use Case. From the
scenario analysis of the business, this will often be to support an alternative path or an
exception. We draw this with an arrow pointing the other way (yes it is confusing at
first) where the arrow points to the calling Use Case. So below, Chase Payment
sometimes calls Issue Warning Letter.

[Figure: "Issue Warning Letter" with an arrow labelled <<extend>> pointing to "Chase Payment"]


So now we have the building blocks for a sophisticated description of a system's external
behaviour. Let us look now at an application that manages payments for customers. A
credit controller might be able to print invoices, chase payments, process payments,
correct invoices, correct deliveries and register bad debts using a computer system.
Part of chasing payment may involve either issuing warning letters, where the computer
system prints one off, or telephoning the customer, where the computer system provides
a means for the controller to log the results of the conversation.

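
[Figure: Use Case Diagram for the credit control application. Actors: Customer;
Credit Control Clerk. Use Cases: Print Invoice, Chase Payment, Issue Warning Letter,
Telephone Reminder, Process Payment, Receive Payment, Receive Cheque, Receive Bank
Credit, Correct Invoice, Correct Delivery, Register Bad Debt. Issue Warning Letter and
Telephone Reminder are linked to Chase Payment by <<extend>>; the diagram also shows
one <<include>> link and two further <<extend>> links.]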

With a Use Case Diagram like the one above, you are getting a clear picture of who uses
a system, and what they can do with it. You also have forced some decisions, and
provided some external structure to the system.
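
For readers who find it easier to think in lists than in pictures, the credit control example can also be written down as data. The following is a minimal sketch, not a standard notation: the class UseCaseModel and its method names are hypothetical, and only the relationships stated in the text are recorded (the clerk's six Use Cases, and the two Use Cases that extend Chase Payment).

```python
from dataclasses import dataclass, field


@dataclass
class UseCaseModel:
    """Tiny in-memory stand-in for a Use Case Diagram (illustrative only)."""
    uses: dict = field(default_factory=dict)      # actor -> Use Cases the actor uses
    extends: dict = field(default_factory=dict)   # base Use Case -> Use Cases that extend it

    def add_use(self, actor, use_case):
        self.uses.setdefault(actor, []).append(use_case)

    def add_extend(self, base, extension):
        self.extends.setdefault(base, []).append(extension)

    def available_to(self, actor):
        """Use Cases an actor uses directly, each followed by its optional extensions."""
        result = []
        for use_case in self.uses.get(actor, []):
            result.append(use_case)
            result.extend(self.extends.get(use_case, []))
        return result


model = UseCaseModel()
for name in ["Print Invoice", "Chase Payment", "Process Payment",
             "Correct Invoice", "Correct Delivery", "Register Bad Debt"]:
    model.add_use("Credit Control Clerk", name)
model.add_extend("Chase Payment", "Issue Warning Letter")
model.add_extend("Chase Payment", "Telephone Reminder")

print(model.available_to("Credit Control Clerk"))
```

Writing the model down this way makes it easy to answer the question the diagram is meant to answer: who uses the system, and what can they do with it?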

1.2. What to do if a use case diagram won’t fit on a single page?

Answer: split the diagram over several pages, with each page corresponding to a single
area of business activity.

1.3. Finding Use Cases


The first stage of analysis is to map out the business using:
• A High Level Business Activity Model, that breaks a business
down into a simple, three level hierarchy of activities.
• Scenario Analysis of Business Activities, using primary and
alternative path analysis.
• Construction of Business Processes or Business Workflows
from the Scenario Analysis, and described using Activity
Diagrams.

The use of these techniques is not taught on this module, nor is it described in
this document. That is because the scope of activity for the kind of systems
described in the rest of the document is assumed to be relatively small scale, in
relatively clear-cut situations. So you should already know what the business
process is, and you are simply trying to improve it. Once you have a business
process fully defined, you go around all the activities asking the simple
question "is there a potential use of a computer system here?" Sometimes an
activity in a business process needs no system support, sometimes it needs
one use case, and sometimes it needs many use cases.
We are now beginning to see the rudiments of a methodology emerging. It starts in the
business arena, describing the business in some detail. Then it starts to think about
where computer systems are used. The first thing to worry about is what the system
does, and how it fits in to the business, not how it does it in detailed technical terms.

1.4. Naming Use Cases


This section summarises the document “What Makes a Good Use-Case Name?” by Dr.
Use Case (aka Leslee Probasco, Rational Software Canada) to be found at
http://download.boulder.ibm.com/ibmdl/pub/software/dw/rationaledge/mar01/WhatMakesaGoodUseCaseNameMar01.pdf
accessed 15/12/2008.
“Auction” is a poor choice of use case name. Is it a noun or is it a verb? Both.
Having clear and meaningful use-case names is very important; it's worth spending the
time on up front to get them right.
Why Should We Care About Use-Case Names, Anyway?
Why should we care so much about what name we give a use case? When
defining the requirements that a part of the business needs, the project team
and customers must agree on scope definition and cost and schedule estimates.
Ultimately they must make the decision to either proceed with the project or to
cancel it (one of the objectives of the inception phase during which UCDs are
initially created).
Often the only information available about the identified actors and use cases
are their names. Along with specified features and other system requirements,
this must be sufficient for all stakeholders to have a clear enough
understanding of the functionality of the system in order to make this critical
"thumbs-up" or "thumbs-down" decision.
Naming use cases should enable anyone (at least anyone familiar with the
problem domain) to be able to look at a use-case diagram -- noting the actor
and use case names, and their associations - and have a pretty good idea of
the value or goal to be achieved by each use case. To accomplish this, it is
very important to choose the names of all actors and use cases with this
objective in mind.
The "Golden Rule of Use-Case Names" suggested by Probasco comes from the
Rational Unified Process (RUP). This states that "each use case should have a
name that indicates what value (or goal) is achieved by the actor's interaction
with the system" (if all goes as expected! :>).
Here are some good questions to help you adhere to this rule:
• Why would the actor initiate this interaction with the system?
• What goal does the actor have in mind when undertaking these
actions?
• What value is achieved and for which actor?
The preference is to have a use-case name that begins with a verb – just as in
a To-Do list. For an ATM system, a to-do list might include actions such as
"Withdraw Cash," "Transfer Funds," "Service the ATM," and "Deposit Funds."
Names like "Cash Withdrawal," "Funds Transfer," "ATM Service," and the
like do not follow this rule.
So: All use-case names should indicate what value or goal is achieved by the
actor(s)' interaction with the system and must be stated in the active form,
beginning with a verb.
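
The verb-first rule is simple enough to check mechanically, at least roughly. The sketch below is only an illustration: the function name starts_with_verb and the verb list KNOWN_VERBS are hypothetical, and a short hand-made list of verbs is an assumption standing in for real judgement about the language of your problem domain.

```python
# Hypothetical starter list of verbs: extend it for your own problem domain.
KNOWN_VERBS = {"withdraw", "transfer", "service", "deposit"}


def starts_with_verb(name, verbs=KNOWN_VERBS):
    """Return True if the first word of a use-case name is a known verb."""
    words = name.split()
    return bool(words) and words[0].lower() in verbs


for name in ["Withdraw Cash", "Cash Withdrawal", "Service the ATM", "ATM Service"]:
    print(name, "->", "ok" if starts_with_verb(name) else "rename")
```

A check like this cannot tell you whether a name expresses the actor's goal; it only flags names such as "Cash Withdrawal" that begin with a noun.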

1.5. Describing Use Cases


The high level Use Case Diagrams above are fine for a “mile-high”, or “bird’s-eye”, view
of the computer system’s behaviour. For many stakeholders, such as sponsors and
managers, this will be enough. However, the analyst who thinks the job is done is in for
a rude awakening. Once you have defined the use cases at the high level, a lot of work
may still be necessary to open these up and define them in detail.
At the very least, the Use Case may need to be described in some detail in a text
document which explains the use case diagram. Sometimes people distinguish between
the diagram alone, the UCD, and the use case model, which is the UCD and other
supporting documentation.
Now we know what the system presents to the various users (or actors), we need to
define in fine detail the "how" of that interaction. Use cases can be further defined as
detailed sequences of behaviour. This is beyond the scope of this simple introduction.

1.6. Using Use Cases to identify System Inputs and Outputs


At every point on a Use Case diagram where an arrow connects a (human) actor to a Use
Case, one or more Forms, Reports or Web Pages are needed to input data to the process
or to output information. You can therefore make a list of these interactions, and
state what means (Form, Report or Web Page) is appropriate to each interaction.
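
The list-making step described above can be sketched in a few lines. The actor and Use Case pairs below are taken from the Anytown use case list earlier in this document; the function names and the rule of thumb in suggest_means (printing or reporting suggests a Report, anything else a Form) are hypothetical simplifications, and in practice you would decide the means for each interaction yourself.

```python
# Actor / Use Case links taken from the Anytown use case list.
LINKS = [
    ("Course coordinator", "Input applicant details"),
    ("Course coordinator", "Confirm applicant as student"),
    ("Student", "Register student on modules"),
    ("Module leader", "Input module results"),
    ("Course coordinator", "Print module results for jury"),
    ("Dean", "Request management reports"),
]


def suggest_means(use_case):
    """Hypothetical rule of thumb: printing or reporting suggests a Report, else a Form."""
    first_word = use_case.split()[0].lower()
    if first_word == "print" or "report" in use_case.lower():
        return "Report"
    return "Form"


def list_interactions(links):
    """One (actor, use case, means) entry per arrow on the diagram."""
    return [(actor, use_case, suggest_means(use_case)) for actor, use_case in links]


for actor, use_case, means in list_interactions(LINKS):
    print(f"{actor}: {use_case} -> {means}")
```

The resulting list is a first draft of the System Inputs and System Outputs sections of the data dictionary.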

1.7. Other resources for learning about Use Cases


There’s plenty of material about use case modelling on the Web, but much of it is
unnecessarily complicated because it is described as part of the full UML language
which business students should not learn.
One reference is http://www.parlezuml.com/tutorials/usecases/usecases_intro.pdf
accessed 24/11/2008 and written by Jason Gorman.

2. Appendix 2 Data Flow Diagrams
This appendix summarises the steps required in creating a dataflow diagram. Examples of
context, level 1 and a single level two diagram are given earlier in the document, in the
University of Anytown business school case.
Creating a DFD is a process also called Data Flow Modelling: the process of identifying,
modelling and documenting how data moves around an information system. Data Flow
Modelling examines processes (activities that transform data from one form to another), data
stores (the holding areas for data), external entities (what sends data into a system or receives
data from a system), and data flows (routes by which data can flow).

2.1. What are Data Flow Diagrams (DFDs)?


♦ DFDs are diagrams that show how data flows within a
system
♦ They are initially created as two simple HIGH LEVEL
diagrams: the Context Diagram and the Level One Data
Flow Diagram (DFD)
♦ Parts of the high level diagram may then be ‘exploded’
(alternatively, expanded or zoomed) to show more detail
♦ DFDs represent the flow of data between different
processes within a system, together with the original flows
of data into the system and the output flows of information
from the system to its users, clients or other consumers
∗ The idea is that the diagrams be simple and
intuitive, not focusing on details
∗ They should describe what the users of systems
do, rather than what computers do
♦ Limitations:
∗ Focus only on flows of information (not physical
flows of goods)
∗ The diagrams therefore ignore flows of materials
∗ Nor do they show how a process works: its
decision points (if this, then this, otherwise this) or
repetitions – such diagrams, sometimes called
flowcharts, are NOT a part of the approach
documented in this booklet

2.2. Why use Data Flow Diagrams?


♦ They are a technique used in structured analysis, that is,
a tool used by the systems analyst for analysing
requirements
♦ They are also a communications aid
∗ Between user and analyst

∗ Within an analysis team
♦ The diagrams can be used to solve disagreements about
how work is being done or how it should be done in the
future

2.3. What is a DFD? Main elements


♦ Context diagram
♦ Sources and Recipients of information (“external
entities”)
♦ Main system inputs and outputs
♦ Main data stores

2.4. The components of a DFD


A complete DFD is a hierarchical set of diagrams:
♦ One Context diagram
∗ Shows the system boundary by indicating the
name of the main process in the system, and the
External Entities which lie outside the system
∗ Summarizes the main data flows into and out of
the system
∗ The Context diagram defines the system
boundary: what is a part of the system under study
(and what is outside it); this is also sometimes called
the scope of the system
∗ Defines data flows to, and information flows from,
a system
∗ Identifies the data flows and stores within the
system
∗ Identifies the functions (processes) performed by
the system
♦ Levelled set of DFDs
∗ One Level 1 DFD, which identifies the business
processes and breaks them down into
subprocesses
∗ Several (2 to 10, typically 7) Level 2 DFDs
♦ Main system functions (processes), the key functions,
appear as Level 1 DFDs
♦ Where necessary, explode level 1 DFDs into component
level 2 DFDs
∗ This means that there are parent and child
processes
∗ Consider parent as a window onto its children
∗ Possible to look at a process at any level of detail,
so may need Level 2 diagrams, or even Level 3
♦ All key documents should be identified as data flows

NB: this is NOT a DFD; this diagram shows the STRUCTURE of a DFD!

2.5. What appears on a DFD?


There are only FIVE kinds of thing on a DFD:
♦ Processes
A Process takes data in and processes it to create output
data or information; the inputs are modified or
transformed in the process of generating the outputs.
Processes can often be identified in the real world with
actions undertaken by individuals or whole departments.
For example, a sales representative is part of a process
Take Order. That process may itself be a part of (a sub-
process of) the larger process, Fulfil Order.
♦ External entities
An External Entity is a source of data to the system as a
whole, or a client or consumer of information (processed
data) produced by the system. They are outside of the
system being modelled. External Entities are terminators
which indicate where data comes from and where output
information goes to. In designing a system, we have no
idea about what these terminators do or how they do it.
♦ Data stores
A Data Store is a place where data is stored: typically, a
folder in a filing cabinet (a paper file) or a file in a
computer. Data Stores represent a place in the process
where data comes to rest. A DFD does not say anything
about the relative timing of the processes, so an example
data store might be a place to accumulate data over a
year for the annual accounting process
♦ Data flows
∗ Data flows in from an external entity, another
process, or a data store
∗ Data flows out to an external entity (in this case it
is information), or to another process or a data
store
Data flows may sometimes be obvious in the real-world as documents,
such as order forms (a data flow into a Fulfil Orders process) or
invoices (a data flow out from a Bill Customers process).
♦ Elementary process descriptions
A process which is relatively simple and straightforward is NOT
broken down into sub-processes; instead, it is described in a
paragraph or so of text called an Elementary Process Description

2.5.2. Listing the elements of a DFD


Analysts normally identify and list the main elements of a DFD on paper or in a
spreadsheet before they go on to make the actual diagram. This list of elements
is sometimes formally maintained as a Data Dictionary.

2.6. The Data Flow Diagram Symbols – SSADM Notation

♦ A Process box: carries a number (for example, 1) and a Process
Description. The Number and Description are the same as in the
Elements List (data dictionary).
♦ A Data Store: carries an identifier (for example, D1) and the Name of
Data. The Number and Description are the same as in the Elements List.
♦ Source or Destination: a symbol labelled Source/Destination.
♦ Arrows show DATA FLOWS.

2.7. Making a Data Flow Diagram: a Top-Down Approach


♦ The systems analyst makes a context level DFD, which
shows the interaction (data flows) between the system
(represented by one process) and the system
environment (represented by external entities).
♦ The system is decomposed in a lower level (Level 1) into
a set of processes, data stores, and the data flows
between these processes and data stores.
♦ Each process on the Level 1 diagram may then be
decomposed into an even lower level diagram (Level 2)
containing its subprocesses.
♦ This approach then continues on the subsequent
subprocesses, until a necessary and sufficient level of
detail is reached; a process at this level is called a
primitive process.
♦ A primitive process is briefly described in natural
language (English, French etc.). This description is
sometimes called an Elementary Process Description.

2.8. The elements of a DFD


Every page in a DFD should contain fewer than 10 components. If a diagram has more
than 10 (exceptionally 20) components, then several components (typically processes)
should be combined into one, and another, lower-level DFD generated that describes
the combined component in more detail. Each component should be numbered, as
should each subcomponent, and so on. So for example, a top level DFD would have
components 1, 2, 3, 4 and 5; the sub-component DFD of component 3 would have
components 3.1, 3.2, 3.3, and 3.4; and the sub-sub-component DFD of component 3.2
would have components 3.2.1, 3.2.2, and 3.2.3.
♦ Context diagram (one only)
♦ Level 1 DFD (one only, identifying up to seven / eight
main processes)
♦ Level 2 DFDs (two to seven / eight in number)
♦ All key documents should be identified as data flows
♦ Key processes or functions appear in Level 1 DFD
♦ Where necessary, explode level 1 DFD into component
level 2 DFDs
∗ Parent and child processes
∗ Consider parent as a window onto its children
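
The numbering scheme described at the start of this section is mechanical, so it can be sketched in a few lines. This is only an illustration: the function names child_numbers and diagram_level are hypothetical, and the sketch assumes that components are simply numbered from 1 upwards within each diagram.

```python
def child_numbers(parent, count):
    """Numbers of `count` sub-components of `parent` (use "" for the top level DFD)."""
    prefix = f"{parent}." if parent else ""
    return [f"{prefix}{i}" for i in range(1, count + 1)]


def diagram_level(number):
    """Level of the diagram on which a component appears: "3" -> 1, "3.2" -> 2."""
    return number.count(".") + 1


print(child_numbers("", 5))     # components of the top level DFD
print(child_numbers("3", 4))    # components of the DFD that explodes component 3
print(child_numbers("3.2", 3))  # components of the DFD that explodes component 3.2
```

Because each number carries its parent's number as a prefix, you can always tell from a component's number which diagram it belongs to and which parent process it is a window onto.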

2.9. Creating DFDs


♦ Pencil and paper: often good enough, especially for a
first draft
♦ Can use MS Draw in Word, PowerPoint
∗ Very painful!
♦ Drawing packages
∗ MS Visio Professional

(a) Comprehensive but expensive

(b) In Visio, the DFD is called a data flow model
diagram in the "software" category, or
diagramme de modèle de flux de données in the
“logiciels” category

(c) SSADM support is no longer provided in the
standard Visio product
MS Visio is now available to ESC students via the Microsoft
Developers’ Network Academic Alliance (MSDNAA)
Electronic Licence Management System (ELMS). You should
by now have received an email from Microsoft’s agent
e-Academy telling you how you can benefit from this scheme.
In order to create a drawing of a particular kind, you use both
a template file and a stencil file. These together tell Visio
what kind of symbols can be used. The equivalent terms in
French are un modèle and un gabarit.
For more information on Microsoft Visio and its use, please
see appendix 7.
∗ SmartDraw (www.smartdraw.com)

(a) Slightly less expensive

(b) There’s also a free time-limited trial edition


This product has built in support for SSADM, and is
significantly cheaper than Microsoft Visio. However, it is
much less widely used, and there is much less third party
support for this product. So although it is an excellent
product and quite appropriate for the work you will do on
this module, it is unlikely that you will be using it in your
professional life.
∗ EDraw (http://www.edrawsoft.com/)
This new product also has good support for SSADM, and is
significantly cheaper than Microsoft Visio. However, it is
much less widely used, and there is much less third party
support for this product. So although it is an excellent
product and quite appropriate for the work you will do on
this module, it is unlikely that you will be using it in your
professional life.
∗ Dia
So far, the open source software available does not seem to
me to be as good as it needs to be seriously to compete with
Visio. Sadly, there is as yet no open source software that I
can identify which is of sufficient quality to recommend in
the area of drawing and computer aided software
engineering. You might consider Dia, which is an open
source diagramming package which can be used to create
some analysis diagrams.
♦ CASE tools: Computer Aided Software Engineering
Professional systems analysts often work in a project environment in which
they and other developers (programmers, etc.) use comprehensive packages
called CASE tools to document an entire system development and
implementation approach. The market leading CASE tool is IBM’s Rational
Rose product. It is far too complex to be used by business professionals
alone and unaided, although business specialists are a very important part of
the overall project team (they, systems analysts, designers, programmers,
etc.) which carries out large information systems projects.

2.10. First List the Elements of the Data Flow Diagram


♦ The Sources or Destinations of Data
∗ Where the data comes from or goes to
(sometimes called External Entities)
♦ The Processes
∗ The processes that use or change the data
♦ Documents: flows of data
The documents used in or created by a process - for example, paper
reports output by a system; paper forms used to “capture” (record)
data for input into a system – are often a very good starting point for
initial systems analysis. Sometimes, in complex systems, flows of data
between processes are also distinct documents.
♦ The Data Stores
∗ Repositories of data (can be card indexes,
folders, computer files, documents)

2.11. Drawing the Context Diagram


This level shows the overall context of the system and its operating environment and
shows the whole system as just one process. It does not show data stores.
A context diagram is a top level (sometimes also known as Level 0) data flow diagram.
It only contains one process box that generalizes the function of the entire system in
relationship to external entities.
♦ Draw a box to represent the system under study and give
it a name
♦ Add the External Entities outside the system box
♦ For each data flow, put directional arrows from and to the
external entities to and from the system box
♦ Label the data flows; typically these are documents, but
they may also be other kinds of message

2.12. Expanding a context diagram to give a level 1 DFD


♦ Draw a big box to contain the expanded diagram
♦ Name it across the top
♦ Add the data flows entering or leaving the box
♦ Work back from a flow leaving the box to identify the sub-
process (child process) which creates that output flow and
add it to the diagram
♦ Identify the data flows which are inputs to that process
♦ Those data flows may be the direct outputs of other child
processes; or, more often, the other child process
outputs to a data store, which then sits between the
two child processes
♦ Number the child processes; if there are more than
seven or eight, it will be necessary to group together
some processes at this level, and create a Level Two
diagram for that process

2.13. Questions to ask yourself


♦ For each process in the top level DFD:
∗ Does it need to read from or write to a data store?
∗ Does it send data to or read it from another
process?
∗ Does the process have access to all the data it
needs?
∗ Does it need to be specified in more detail, using
a Level 2 diagram, or is it simple enough to
describe in (possibly, structured) English as an
Elementary Process Description?

2.14. Rules for DFDs


♦ Process name has form imperative-verb followed by
object (noun phrase), e.g. Enrol students
∗ Not “Enrolment” – which is a noun
♦ All processes and data stores must somewhere have
data going in to them and away from them
♦ All data flows must start from or end with a process -
otherwise, what makes them happen?

2.15. Some points on logical DFDs


♦ Arrows show DATA flows - not the sequence of
processes, nor physical flows
♦ Sources and Destinations NEVER connect directly to a
Data Store – always via a Process
♦ A Process must have at least one Data Input and one
Data Output
♦ Arrows not to or from data stores should be labelled with
the data that is flowing
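Several of the rules in sections 2.14 and 2.15 are mechanical enough to be checked automatically. The Python sketch below is purely illustrative: the way the diagram is represented, the function name check_dfd and the example element names are all assumptions made for this example, and it does not check the naming rule for processes.

```python
# Hypothetical representation: each element has a kind, each flow is a tuple.
PROCESS, STORE, EXTERNAL = "process", "store", "external"


def check_dfd(elements, flows):
    """Return a list of rule violations for a data flow diagram.

    elements: dict mapping element name -> PROCESS, STORE or EXTERNAL
    flows: list of (source, target, label) tuples
    """
    problems = []
    for source, target, label in flows:
        kinds = (elements[source], elements[target])
        # Every data flow must start from or end with a process.
        if PROCESS not in kinds:
            problems.append(f"flow {source} -> {target} does not touch a process")
        # Arrows not to or from a data store must be labelled.
        if STORE not in kinds and not label:
            problems.append(f"flow {source} -> {target} needs a label")
    for name, kind in elements.items():
        if kind == EXTERNAL:
            continue
        has_input = any(target == name for _, target, _ in flows)
        has_output = any(source == name for source, _, _ in flows)
        # Processes and data stores need data going in and coming out.
        if not (has_input and has_output):
            problems.append(f"{kind} {name} lacks an input or an output")
    return problems


elements = {
    "Student": EXTERNAL,
    "Enrol students": PROCESS,
    "Student records": STORE,
}
good_flows = [
    ("Student", "Enrol students", "application form"),
    ("Enrol students", "Student records", ""),
    ("Student records", "Enrol students", ""),
    ("Enrol students", "Student", "confirmation letter"),
]
# An external entity writing straight into a data store breaks the rules.
bad_flows = good_flows + [("Student", "Student records", "application form")]

print(check_dfd(elements, good_flows))
print(check_dfd(elements, bad_flows))
```

A check of this kind is no substitute for reviewing the diagram with its users, but it catches the slips that are easiest to make when a diagram is redrawn several times.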

2.16. Supporting documentation


♦ Elementary Process Descriptions
∗ Description in natural language of a process
which is not further exploded in lower-level
diagrams
∗ Shouldn’t need to be long – if it is, may indicate
need for another level of diagram
♦ The list of elements - External entity list, etc – which is
typically stored in a Data Dictionary

2.17. Summary: “levelled” DFDs


The output of the whole process of analysing processes (!) is sometimes referred to as a
Levelled Data Flow Diagram. A Levelled DFD consists of:
♦ One Context diagram
♦ One Level one DFD
♦ Multiple level two DFDs (two to seven/eight in number)
♦ (Rarely) level three DFDs (two to seven/eight in number
for each level two process which needs explosion)
♦ Elementary Process Descriptions where necessary



3. Appendix 3: When to use a spreadsheet, and when to use
a database

3.1. Introduction
We normally store data on computers when we have many occurrences of a specific kind
of record, and we want to process specific records, or the complete set of records. For
example, we may want to maintain a list of companies. For purposes of comparison, we
will normally choose to store the same items of data about each occurrence. For
example, we will store the name of each company, its principal sector of activity, and
the address of its global headquarters. A widely accepted way of storing data, indeed, we
may even refer to it as the “natural” way to store such data, is by means of two-
dimensional tables.
Many widely used office productivity programs provide good facilities for storing two-
dimensional tables. We can use a word processing program, such as Microsoft Word; a
spreadsheet, such as Microsoft Excel; or a database, such as Microsoft Access. However, each
program has specific strengths and weaknesses when it stores data in this way. Refer back to
section 2.6 for more on this.

3.2. Spreadsheets versus databases

3.2.1. What spreadsheets are good at


∗ Spreadsheets combine conceptual simplicity, very
powerful data manipulation and analysis facilities,
and good information presentation facilities
∗ Spreadsheets are easier to design and to use
than are databases
∗ It is comparatively easy to evolve the use of a
spreadsheet as the context of its use changes
∗ Functions make it easy to use previously
programmed data analytical techniques
∗ It is possible to program new functions, or to have
them written for you so that you can use a specific
data analytical technique
∗ Recent versions of Excel have excellent
presentation facilities and they also connect very
well to Word or PowerPoint

3.2.2. What databases are better at


∗ Spreadsheets are by their very nature highly
insecure – anyone who can access a spreadsheet
can see all the data in that spreadsheet; industrial
strength databases make it impossible for users
who are not privileged to see data to do so
∗ Spreadsheets can rapidly become very complex,
and it is very difficult to understand what the
overall structure of the spreadsheet is; as a result,
they can become a nightmare to maintain
∗ It is difficult for more than a very small number of
people to use a single spreadsheet at one time,
and almost impossible to stop them from
interacting with each other, often in a conflicting
way
∗ Spreadsheets become unwieldy beyond a few
thousand records (an Excel 2003 worksheet holds at
most 65,536 rows); databases can handle millions
∗ Databases can support tens, or even thousands,
of simultaneous users

3.2.3. Using spreadsheets and databases together


Microsoft Office is an integrated suite of programs. As a result, there are
many different ways in which Microsoft Excel and Microsoft Access can be
made to work together. In the most difficult cases, it will be necessary to
program this interchange using the Microsoft Office Automation feature.
However, this is often not necessary. As stated in
http://www.dummies.com/WileyCDA/DummiesArticle/id-2128.html, (checked
20/11/2008) “You can do plenty of importing and exporting data between
Microsoft Office applications without writing any code at all. For example, you
can perform the following actions:
• Import and export data by using options on the Access File menu.
• E-mail Access objects, such as reports, by choosing Send To --> Mail
Recipient.
• Use the OfficeLinks feature to send objects to other programs.
• Use basic Windows cut-and-paste techniques and OLE (Object Linking and
Embedding) to copy and link data between programs.
• Merge data from Access tables to Microsoft Word letters, labels, envelopes, or
other reports, using the Word Mail Merge feature. (Search the Word Help
system for merge.)
If you're just looking to get data from Access to another program (or vice
versa), writing code is probably not the easiest approach. Any of the previous
approaches are easier than writing custom VBA code to do the job.”
If you wish to incorporate data stored in Microsoft Access and manipulate it in
a Microsoft Excel spreadsheet, for example because you wish to create a
monthly report with charts and graphics, you can create a query which extracts
the data from several tables and then sends the results to Microsoft Excel in the
form of an external data range.
In the opposite direction, it is possible to create a list of data in Excel and make
it available in Microsoft Access as though it were an external table. Or you can
use a Microsoft Access form to enter data into Excel. For further information,
see the online help facility in each of the two products.

3.2.4. Summary
When you are manipulating data for yourself alone, or as part of a small team,
or in a very small business, spreadsheets are likely to be more intuitive, initially
more productive and easier to get started with. However, as the volumes of
data, or the number of users, increase, databases become by far the preferable
option. It is often a sensible and viable option to prototype the requirements for
a small business information system using a spreadsheet, and then, when the
requirements are quite clear, to transfer the data storage element to a database.
See also http://www.epinions.com/content_972857476 (checked 20/11/2008)

3.3. What to do if your spreadsheet skills are weak


Almost all business professionals need to be reasonably competent with a spreadsheet. If
you are not confident that you can use formulae in spreadsheets to carry out moderately
complex data analysis, I suggest that you follow some of the tutorials that you can find
using Google. Here are a few suggestions – though they are unlikely to be the best ones:
Spreadsheet – Microsoft’s own tutorials: Link to Excel tutorials
(http://office.microsoft.com/en-gb/training/CR100654561033.aspx)
Spreadsheet – Excel 2003 - English: http://www.yevol.com/excel2003/index.htm
Spreadsheet – Excel 2003 - français:
http://translate.google.com/translate?u=http%3A%2F%2Fwww.yevol.com%2Fexcel%2Findex.htm&sl=en&tl=fr&hl=en&ie=UTF-8
Spreadsheet – Excel 2007 - English: http://www.yevol.com/excel/index.htm
Spreadsheet – Excel 2007 - français:
http://translate.google.com/translate?u=http%3A%2F%2Fwww.yevol.com%2Fexcel%2Findex.htm&sl=en&tl=fr&hl=en&ie=UTF-8

3.4. What to do if your database skills are weak


Work through some or all of an online tutorial guide – there are some excellent Access
tutorials available on the World Wide Web. See what you can find for yourself using
Google; here are some suggestions:
Database – Microsoft’s own tutorials: Link to Access tutorials
(http://office.microsoft.com/en-gb/training/CR100654561033.aspx)
Database – Access 2003 - English: http://www.yevol.com/en/access2003/index.htm
Database – Access 2003 - français: http://www.yevol.com/access2003/index.htm
Database – Access 2007 - English: http://www.yevol.com/en/access/index.htm
Database – Access 2007 - français:
http://translate.google.com/translate?u=http%3A%2F%2Fwww.yevol.com%2Faccess%2Findex.htm&sl=en&tl=fr&hl=en&ie=UTF-8
Alternatively and / or additionally, if you were here in your first year, revise the work
that you carried out on Access in the second semester of the first year.



3.5. Conclusion
There is usually more than one way to skin a cat, as we say in English! There are
certainly many ways of storing data. Each program has specific strengths and
weaknesses. You are advised to consider very carefully who it is that requires what
information, who collects the data, what kind of transformation is required between the
input data and the output information, and then to choose the right program – or
programs. For it is no coincidence that Microsoft, and other vendors, sell office
productivity suites. A suite is a collection of programs, each program having specific
strengths and weaknesses. Microsoft Office is just such a suite. As such, it provides
many ways to store data using one program, and to link to that data in another. A very
common scenario is that the data is stored in a database, while required output
information is presented using the data analytical features of a spreadsheet. A good
workman knows his or her tools, and chooses the appropriate tool for the job!

3.6. Acknowledgements – bibliography for Appendix 3


Thanks to St James Software (http://www.sjsoft.com/) for the original material
on which I based this appendix.
Thanks to Epinions for http://www.epinions.com/content_972857476 (checked
20/11/2008); see http://www.epinions.com/about/ for information about this
organisation.



4. Appendix 4: Reasons why a database is to be preferred to
a spreadsheet - Spreadsheet Does Not Equal Database
This section was taken from http://www.pcmag.com/article2/0,2817,1435148,00.asp (checked
24/11/2008), an article which originally appeared in the United States edition of PC Magazine,
dated February 3, 2004. This material is Copyright (c) 2004 Ziff Davis Media Inc. All Rights
Reserved.

♦ February 3, 2004
♦ By Helen Bradley

Since the days of Lotus 1-2-3, people have used spreadsheet programs for everything
from word processing to data management. Doing the former is silly. Doing the latter,
however, is viable, especially in the latest version of Microsoft Excel. But though you
may be more comfortable with Excel, a real relational database program like Microsoft
Access is a better choice for managing data—for a number of reasons.
♦ Databases are safer. Excel, for example, does everything
in memory, so that any unsaved data may be lost if your
system crashes. Databases write data to the hard drive
immediately.
♦ Databases can handle more data. Sure, Excel can
technically handle more than 65,000 rows of data, but
doing so will likely bog down even the fastest PC.
♦ Databases can easily link tables of related data together,
such as customers and orders or musical groups and
albums (as well as the songs on each album). This is
where the words relational and database come together.
Storing related data together in a single table or
spreadsheet can be unwieldy and invite errors.
We'll look at a situation for which Access is a better tool than Excel and show you how an
Access solution works. If you've never used Access before, that's okay; we'll walk you
through how to create everything from scratch. We used Access 2002 for the instructions,
but you'll find the process is similar in all versions of Access. We chose Access because
so many users have it already, but you can do the same things in other relational databases
such as FileMaker or Microsoft SQL Server. For more on picking the right database, see
"Databases for All Reasons" in our issue of January 2003 at
http://www.pcmag.com/article2/0,1759,760886,00.asp (checked 24/11/2008).

4.2. More Than a List


Consider a veterinarian's office: To record pet and owner details, you could use a list in
an Excel worksheet, but you'd encounter difficulties. If you create one record for each
owner, how would you handle an owner with multiple pets? You could add a field for
each pet, which would work for most clients. But a client who runs a breeding kennel
with 25 cats and innumerable kittens would force your data record to grow to an
excessive length.
On the other hand, if you organize your data so you have one record per pet, you would
have to enter the owner details for each pet in the household. This is unnecessarily
repetitive. And if an owner changes his address, you would have to find and update all
his pets' records individually.
The better solution is to have two lists—one with the owner details and one with the pet
details—and then link the two by including a field in both lists with a common piece of
information. For example, give each owner a unique code number, which you can then
use in his pets' records. That way, you can find a pet, check the owner code, and then
find the owner's details in the owner file. Likewise, you can look up an owner, find his
code, and then extract all the pet records with that owner number.
Although you have two lists, each owner and each pet has only one entry in the system.
It's neat and efficient, and it solves another problem our veterinarian may encounter:
When client breeders sell kittens to new owners, the new owners may become clients,
too. To change a pet from one owner to another, simply change the owner code in the
pet's record and if necessary add a new owner record.
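The two-list idea can be tried out in a few lines. The sketch below uses Python's built-in sqlite3 module as a stand-in for Access, which is an assumption made purely so that the example is runnable; the table and column names (owner, patient, owner_nr, owner_code) and the helper functions pets_of and owner_of are likewise illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owner (owner_nr INTEGER PRIMARY KEY, surname TEXT, town TEXT);
    CREATE TABLE patient (
        patient_no INTEGER PRIMARY KEY,
        patient_name TEXT,
        owner_code INTEGER REFERENCES owner (owner_nr));
    INSERT INTO owner VALUES
        (1, 'Brown', 'Athens'), (2, 'Smith', 'Athens'), (3, 'Green', 'Atlanta');
    INSERT INTO patient VALUES
        (1, 'Peaches', 2), (2, 'Sam', 1), (3, 'Dobbin', 3), (4, 'Ginger', 3);
""")


def pets_of(owner_nr):
    """Take an owner's code and extract every pet record carrying that code."""
    rows = conn.execute(
        "SELECT patient_name FROM patient WHERE owner_code = ? "
        "ORDER BY patient_name", (owner_nr,))
    return [name for (name,) in rows]


def owner_of(patient_name):
    """Find a pet, read its owner code, then fetch the owner's surname."""
    return conn.execute(
        "SELECT o.surname FROM patient p "
        "JOIN owner o ON o.owner_nr = p.owner_code "
        "WHERE p.patient_name = ?", (patient_name,)).fetchone()[0]


pets_before = pets_of(3)
# A change of address touches exactly one owner row, however many pets there are.
rows_changed = conn.execute(
    "UPDATE owner SET town = 'Athens' WHERE owner_nr = 3").rowcount
# Transferring a pet: only the owner code in the pet's record changes.
conn.execute("UPDATE patient SET owner_code = 1 WHERE patient_name = 'Ginger'")
print(pets_before, pets_of(3), owner_of("Ginger"), rows_changed)
```

Each owner and each pet is stored once, so neither the address change nor the transfer requires any record to be retyped.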

4.3. Create the Database


To create the database requires two tables, one for owners and one for pets, with a field
common to both—the owner code. We will set up the relationship between the two
tables and add a form to make it easier to enter data.
Each table needs a structure that includes a list of field names and types, as well as the
sizes of the fields. Each table must also have a primary key—a field that contains a piece
of information unique to that record. In the owner's table, the primary key is the owner
code; in the pet table, we'll use a similar field called the pet code. We will use an
AutoNumber field type for each. Access will then assign a unique sequential value to
that field for each record.
To build your database, launch Access, choose Blank Database from the task pane, and
name your file PetHosp.mdb. Click on Create then double-click on the Create table in
Design view option. The Table1:Table dialog will appear. Type Owner Nr as the first
field name, then tab over to the next column and enter AutoNumber as the Data Type.
(Access automatically completes the entry once you've typed the first letter.) Now enter
the rest of the data as shown on the next page. Here are the fields and types:
Field Name      Data Type
Owner Nr        AutoNumber
Surname         Text
First Name      Text
Title           Text
Address 1       Text
Address 2       Text
Town            Text
Postcode        Text
Date Created    Date/Time

If you want, you can add a description for each field to explain its contents as well as a
caption. The caption is a name that is used in place of the field name in reports and
forms. If you use shortened or cryptic field names, captions are a good idea.
To set a primary key, right-click on the area to the left of the Owner Nr field and choose
Primary Key. A key icon will appear, indicating that the field is the primary key. Save
the file with the name Owner, and click on the table's Close button.
Repeat this process to create a second table for pets with these fields:



Field Name      Data Type
patient no      AutoNumber
patient name    Text
owner code      Number
animal type     Text
condition       Text
treatment date  Date/Time
leave date      Date/Time
date of birth   Date/Time
Set patient no as the primary key, name the table patients, and close it.
Once you have created the tables, you can define the relationship between them. When
you do this, Access helps you maintain your data integrity. For example, you can set up
the relationship so that removing an owner automatically removes any of his pets from
the patients table.
Choose Tools | Relationships. When the Show Table dialog appears, click on the Owner
table and then select Add. Do the same with the patients table and then click on Close.
Small dialogs will appear, showing the structure of the two tables. Drag the Owner Nr
field from the Owner table and drop it on the owner code field in the patients table. When you
let go of the mouse button, the Edit Relationships dialog appears with these two fields
listed. Select the Enforce Referential Integrity check box and the Cascade Delete
Related Records check box. This ensures that if an owner is removed, all his pets are
removed, too. Click on Create to set up the relationship, which is one-to-many—one
owner can have many pets (Figure 1). Click on the window's Close button and answer
Yes when prompted to save the changes.
Figure 1
Owner and Patients have a one-to-many relationship.

Now you can enter data into the tables. Click on Tables in the Objects bar and
double-click on Owner to open it in datasheet view. Type the following data into the
table (the number in the Owner Nr field will be entered automatically):



Owner Nr  Surname  First Name  Title     Address 1      Address 2  Town     Postcode  Date Created
1         Brown    Joe         12/12/98  1 Blah St      Downtown   Athens   12345     01/04/04
2         Smith    Anne        2/2/2000  2 Blah Avenue  Uptown     Athens   12345     12/04/04
3         Green    Rick        5/5/2000  3 Blah Blvd    Trackside  Atlanta  56789     15/04/04

Close the table and then repeat the process to add the following data to the patients
table (the patient no will be added automatically):
patient no  owner code  animal type  patient name  condition  treatment date  leave date  date of birth
1           2           Cat          Peaches       fever      30/04/2004      01/05/2004  01/04/2003
2           1           Dog          Sam                                                  01/04/2003
3           3           Horse        Dobbin                                               03/03/1999
4           3           Cat          Ginger                                               01/04/2003

4.4. Create a Data Entry Form


Although you could continue to add data using the two tables separately, it's easier to
use a form that displays all the related data. Access can do this for you. Close both
tables and click on the Forms icon in the Objects bar and double-click on Create form by
using wizard.
From the Tables / Queries drop-down list choose Table:Owner and click on the double
angle brackets (>>) to move all the Available Fields to the Selected Fields pane. Then
choose Table:Patients and move only the animal type and patient name fields from the
Available Fields to the Selected Fields pane. Then click on Next.
Access will ask you, How do you want to view your data? Choose by owner and click on
the Form with subform(s) option and then choose Next. When prompted, select
Datasheet as the layout type for the subform and choose Next. Pick a style for your form
(any will do) and click on Next. Type a form name, such as Owner and patient details,
select Open the form to view or enter information, and click on Finish to end.
A form appears on the screen with the client data on top and the details of the pets
belonging to the client in a table below (Figure 2). You'll see two sets of record
navigation tools. The one at the bottom of the table is for the patients subform and the
other is for the owner records. Click on the Next Record button for the Owner data and
you will see that pets are displayed for that owner.
Figure 2



Now you can add a new owner and his pet, as well as add a new patient to one of the
existing clients. To see what is happening behind the scenes, close the form and open the
patients table. You'll see that the data has been entered into the fields patient no and
owner code, even though neither field was included on the form. The patient no number
is automatically entered, because the field type is AutoNumber and the owner code field
is automatically set to the owner’s number, since the records are related through the
form's design.
Remove an owner from the Owner table by opening the table, selecting the owner, and
clicking on Delete. You'll be warned that a record in another data file will be affected
(the owner's pets will be removed when the owner is). This is the result of selecting the
Cascade Delete Related Records check box when setting up the relationship. The same
does not work in reverse: deleting a pet leaves its owner in place, so it is possible to
have an owner with no pets in the Owner table.
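The effect of the Enforce Referential Integrity and Cascade Delete Related Records check boxes can be imitated outside Access. The sketch below again uses Python's sqlite3 module as a stand-in, which is an assumption for illustration only: in SQLite the equivalent behaviour is requested with a foreign key declared ON DELETE CASCADE, and AUTOINCREMENT plays roughly the role of AutoNumber. The table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked to
conn.executescript("""
    CREATE TABLE owner (
        owner_nr INTEGER PRIMARY KEY AUTOINCREMENT,  -- stands in for AutoNumber
        surname TEXT);
    CREATE TABLE patient (
        patient_no INTEGER PRIMARY KEY AUTOINCREMENT,
        patient_name TEXT,
        owner_code INTEGER NOT NULL
            REFERENCES owner (owner_nr) ON DELETE CASCADE);
    INSERT INTO owner (surname) VALUES ('Brown'), ('Smith'), ('Green');
    INSERT INTO patient (patient_name, owner_code)
        VALUES ('Peaches', 2), ('Sam', 1), ('Dobbin', 3), ('Ginger', 3);
""")


def count(table):
    """Number of rows currently held in the named table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Referential integrity: a pet cannot point at an owner that does not exist.
try:
    conn.execute(
        "INSERT INTO patient (patient_name, owner_code) VALUES ('Stray', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Cascade delete: removing owner Green also removes Dobbin and Ginger.
patients_before = count("patient")
conn.execute("DELETE FROM owner WHERE surname = 'Green'")
patients_after = count("patient")

# Not in reverse: deleting Sam leaves owner Brown in place with no pets.
conn.execute("DELETE FROM patient WHERE patient_name = 'Sam'")
owners_left = count("owner")
print(orphan_rejected, patients_before, patients_after, owners_left)
```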



5. Appendix 5: Access Hints - Designing for Use
The whole point of using Microsoft Access is to permit the safe, effective and efficient storage of data in
tables so that information can be retrieved from them. We have seen that this involves analysing what data
is to be stored in what tables. Designing a set of database tables which correctly reflects the structure of the
data is, as we have seen, very important. However, it is almost as important:
♦ To ensure that users of the database can get out the information they
are looking for - this is done, in technical terms, using reports and
forms and queries;
♦ To enable users easily to put correct data into the underlying tables –
this is done using forms and subforms.
In this section, we will suggest that the best way in which to get data into a database is to use forms and
subforms which are based on the relational structure of the data. Very approximately, we will use forms and
subforms which correspond to master and detail tables.

5.1. Getting more help


The web is crucial to learning more about Microsoft Access and in particular for getting help with
problems which you find too difficult to solve alone. There are many forums in which people help
each other. There are many people who are very anxious to help by writing about how they have
solved problems which probably seemed complicated to them when they first encountered them!
A key to getting help is to formulate your Google query very carefully, adding just the right
keywords. For example, when writing this appendix, I used the Google search:

I found material helpful to my writing in the first three sites which Google displayed! They are:
♦ http://www.microsoft.com/communities/newsgroups/en-us/default.aspx?dg=microsoft.public.access.queries&tid=462cbebf-5bef-437b-88f6-fbf70e774da0&cat=&lang=&cr=&sloc=&p=1
♦ http://articles.techrepublic.com.com/5100-10878_11-5285168.html
♦ http://www.access-programmers.co.uk/forums/showthread.php?t=170227

5.2. Unlocking the power of many-to-many relationships


The examples here are based on the following database structure:



In this database, the many-to-many relationship between Student and Module Operation has been
resolved by introducing the link entity Module Registration. But how do we make it easy for the
database user to input data into the database and to get it out again?
The answer requires the use of forms and subforms that are based on one of the two “owning”
tables and the link (junction, intersection) table. Using the nomenclature introduced on the
diagram, we can refer to an A side and a B side of the many-to-many relationship between
Student and Module Operation. Which we take as A and which as B is a matter of choice.
You base the parent form on the A side of a relationship – here, on Module Operation.
You base the subform on the junction table.
You then use a combo box based on the table on the other side of the junction table, the B side –
here, on Student. The combo box goes on the subform.
In summary, to do data entry for a many to many relationship, use the two main tables and their
junction table to create a form, subform and combo setup.
More generally, when accommodating a many-to-many relationship via an associate table, you'll
need to base forms on a query which combines the fields from the various tables involved. Make
sure that this query includes all the non-key fields you may want to modify or may need from both
the many and the one table, and, if necessary, from other tables as well. It is also essential that
the foreign key that represents the one side from the associate table be included.
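To make the role of the junction table concrete, here is a minimal sketch that uses Python's sqlite3 module as a stand-in for Access. The column names (module_code, year, student_no), the sample data and the helper functions are assumptions made for this illustration; the real attributes are those shown on the database structure diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        student_no INTEGER PRIMARY KEY, surname TEXT, forenames TEXT);
    CREATE TABLE module_operation (
        module_code TEXT, year INTEGER,
        PRIMARY KEY (module_code, year));
    -- Junction table: one row per (module operation, student) pair.
    CREATE TABLE module_registration (
        module_code TEXT, year INTEGER, student_no INTEGER,
        PRIMARY KEY (module_code, year, student_no),
        FOREIGN KEY (module_code, year)
            REFERENCES module_operation (module_code, year),
        FOREIGN KEY (student_no) REFERENCES student (student_no));
    INSERT INTO student VALUES (101, 'Martin', 'Claire'), (102, 'Dupont', 'Jean');
    INSERT INTO module_operation VALUES ('DB1', 2009), ('IS2', 2009);
""")


def register(module_code, year, student_no):
    """What the subform does: add one row to the junction table."""
    conn.execute("INSERT INTO module_registration VALUES (?, ?, ?)",
                 (module_code, year, student_no))


def students_on(module_code, year):
    """Read from the A side (module operation) across to the B side (students)."""
    rows = conn.execute(
        "SELECT s.surname FROM module_registration r "
        "JOIN student s ON s.student_no = r.student_no "
        "WHERE r.module_code = ? AND r.year = ? ORDER BY s.surname",
        (module_code, year))
    return [surname for (surname,) in rows]


def modules_of(student_no):
    """Read in the other direction: from a student to the modules taken."""
    rows = conn.execute(
        "SELECT module_code FROM module_registration "
        "WHERE student_no = ? ORDER BY module_code", (student_no,))
    return [code for (code,) in rows]


register("DB1", 2009, 101)
register("DB1", 2009, 102)
register("IS2", 2009, 101)
print(students_on("DB1", 2009), modules_of(101))
```

The same three registration rows answer questions from either side, which is why the choice of A side and B side for the form is a matter of convenience rather than of data structure.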
Consider the following example. Assume that you are the module leader responsible for the
operation of a module in a given year and that you want to manage students for whom you are
responsible.
The steps you should follow are these:
♦ You will create a form with subform based on a query, which in
this case I have called Module Operation Registration. (I’ve included
all the main fields from Module, Module Operation, Module
Registration and Students because I use the same query in several
forms.) Using the nomenclature introduced above, Module Operation
is on the A side (as is its parent entity, Module).
♦ Create a form – subform - subform based on the query (here for
Module, Module Operation and Module Registration as combined in
the query Module Operation Registration). Ensure that, as a minimum,
the fields displayed include all the elements of all the primary keys of
the tables.
♦ Turn the foreign key corresponding to the primary key of the B
side of the many-to-many into a combo box. Here, turn the Student
no field in the subform into a combo box. That combo's rowsource
needs to independently query the Students table so as to return the
possible values of Student no. When you choose a value it is inserted
as the bound field. Your combo box therefore needs to include the
student number, but also a concatenation of first and last names of the
Students, so you have some meaningful data from which to select.
This will enable you to choose an existing student from the list (or
even enter a new one, although this is probably undesirable). To
accomplish this task, open the completed form in Design view
and change the foreign key field's bound control to a combo box.
(Right–click the control, choose Change To, and then select Combo
Box.) Set the combo box control's Row Source property to an
appropriate SQL statement (see footnote 11):
SELECT DISTINCT [Module Operation Registration].[Student no],
[Module Operation Registration].[Student surname],
[Module Operation Registration].[Student forenames]
FROM [Module Operation Registration]
ORDER BY [Student surname], [Student forenames];
♦ In addition, set the Column Count property to 3 (so that the three
fields in the SELECT statement will be displayed). Return to Form
view and display the control's drop-down list.
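The idea behind the combo box, a stored student number paired with a readable name to choose from, can be sketched as follows. This is an illustration only: it uses Python's sqlite3 module in place of Access, queries a students table directly, and concatenates names with SQLite's || operator; the student names and the functions combo_rows and choose are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (
        student_no INTEGER PRIMARY KEY, surname TEXT, forenames TEXT);
    INSERT INTO students VALUES
        (101, 'Martin', 'Claire'), (102, 'Dupont', 'Jean'), (103, 'Dupont', 'Anne');
""")

# First column: the value that is stored. Second column: what the user reads.
ROW_SOURCE = """
    SELECT student_no, surname || ', ' || forenames AS full_name
    FROM students
    ORDER BY surname, forenames
"""


def combo_rows():
    """Rows offered by the drop-down list, as (bound value, display text)."""
    return conn.execute(ROW_SOURCE).fetchall()


def choose(display_text):
    """Return the student number that is stored when a name is picked."""
    for student_no, full_name in combo_rows():
        if full_name == display_text:
            return student_no
    raise KeyError(display_text)


print(combo_rows())
print(choose("Dupont, Jean"))
```

The user selects by name, but only the student number is written to the junction table, so the foreign key always refers to an existing student.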

Footnote 11: You don’t need to create this SQL statement yourself. Instead, separately create
a query that combines the fields that you need in the usual way, using Design mode (mode
création). Test that it works, then display it in SQL mode. Copy the SQL SELECT statement that
Access has generated and use it to replace the SELECT statement in the Row Source mentioned
above.
5.3. Some difficulties associated with forms and subforms and how to
overcome them
The examples here are based on the following database structure:

5.4. Subform not updated


A common problem is that when you scroll through records in the main form, the
subform is not updated. The records on the main form and the subform are not
synchronized. This is because the subform is not always automatically linked to the
main form.
When you create a subform or subreport by dragging a form or report from the
Database window onto another form or report or by using the Form Wizard, Microsoft
Access does automatically set the LinkChildFields and LinkMasterFields properties, but
only under one of the following conditions:
∗ Both the main form or report and the child object are based
on tables, and a relationship between those tables has been
defined with the Relationships command. Microsoft Access
uses the fields that relate the two tables as the linking fields.
∗ The main form or report is based on a table with a primary
key, and the subform or subreport is based on a table or
query that contains a field with the same name and the same
or a compatible data type as the primary key. Microsoft
Access uses the primary key from the main object's
underlying table and the identically named field from the child
object's underlying table or query as the linking fields.
Recalculation occurs automatically for controls that reference other fields on the same
form or fields in subforms. Recalculation does not occur automatically for subform
controls that only reference fields on the main (master) form or in other subforms.
This is because subforms notify the main form of any changes, but the master form
does not notify the subforms of changes. Nor do subforms on the same main form notify
one another of any changes.



Otherwise, it is necessary to set these properties explicitly. A common situation is where
it is necessary to incorporate an existing form as a subform to a newly-established one.
Because setting the properties changes the definition of the existing form, it is wise to
take a new copy of the existing form and to work with that, rather than the original.
SOLUTION: Setting LinkChildFields, LinkMasterFields Properties explicitly
You should use the LinkMasterFields and LinkChildFields subform control properties to
link the main form and subform automatically. You can manually update the subform by
pressing the F9 (recalculate) key.
You can use the LinkChildFields and LinkMasterFields properties together to specify
how Microsoft Access links records in a form or report to records in a subform,
subreport, or embedded object, such as a chart. If these properties are set, Microsoft
Access automatically updates the related record in the subform when you change to a
new record in a main form.
You can set the LinkChildFields and LinkMasterFields properties for the subform,
subreport, or embedded object as follows:
∗ The LinkChildFields property. Enter the name of one or more
linking fields in the subform, subreport, or embedded object.
∗ The LinkMasterFields property. Enter the name of one or
more linking fields or controls in the main form or report.
You can use the Subform/Subreport Field Linker to set these properties by clicking the
Build button to the right of the property box in the property sheet.
You can use the name of a control (including the name of a calculated control) to set
the LinkMasterFields property, but you can't use the name of a control to set the
LinkChildFields property. If you want to use a calculated value as the link for a subform,
subreport, or embedded object, define a calculated field in the child object's underlying
query and set the LinkChildFields property to the field.
When you specify more than one field or control name for these property settings, you
must enter the same number of fields or controls for each property setting and separate
the names with a semicolon (;).
Note The linking fields don't have to be included in the main object or in the child
object. As long as they are contained in the objects' underlying tables or queries, you
can use the fields to link the objects. When you use a wizard, Microsoft Access
automatically includes the linking fields.
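What the two properties achieve can be modelled in a few lines of Python. This is a conceptual sketch only and says nothing about how Access implements the link internally: the function linked_records, the field names and the sample records are all assumptions made for illustration.

```python
def linked_records(master_record, child_records,
                   link_master_fields, link_child_fields):
    """Return the child records whose linking fields match the master record.

    The two property values are given as they would be typed into the
    property sheet: field names separated by semicolons.
    """
    master_names = [name.strip() for name in link_master_fields.split(";")]
    child_names = [name.strip() for name in link_child_fields.split(";")]
    # Both properties must name the same number of fields.
    if len(master_names) != len(child_names):
        raise ValueError("both properties must list the same number of fields")
    return [
        child for child in child_records
        if all(child[c] == master_record[m]
               for m, c in zip(master_names, child_names))
    ]


registrations = [
    {"Module code": "DB1", "Year": 2008, "Student no": 101},
    {"Module code": "DB1", "Year": 2009, "Student no": 102},
    {"Module code": "DB1", "Year": 2009, "Student no": 103},
    {"Module code": "IS2", "Year": 2009, "Student no": 101},
]
current_master = {"Module code": "DB1", "Year": 2009}
shown = linked_records(current_master, registrations,
                       "Module code;Year", "Module code;Year")
print([record["Student no"] for record in shown])
```

Each time the main form moves to a different record, the same filter is applied again with the new master values, which is what keeps the subform synchronized.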

5.5. Detail subform does not show the subset of records based on the
value of the current master form record
Unless you take specific action, a detail subform does not show the subset of records
based on the value of the current master form record when that master form record
changes.
SOLUTION
Solving this problem requires both SQL and some simple VBA, which deals with certain
events.
Event programming is a very powerful tool that you can use within your VBA code to
monitor user actions, take appropriate action when a user does something, or monitor
the state of the application as it changes.
An Event is an action initiated either by user action or by other VBA code. An Event
Procedure is a Sub procedure that you write, according to the specification of the event,
which is called automatically by Access when an event of that particular type occurs.



A form frmComboTest is defined:

This is a test form which shows how the ProductID combo box displays values based on
the CategoryID selected.
The content, the RowSource for the ProductID, is obtained using the SQL statement:
SELECT DISTINCT Products.ProductID, Products.ProductName
FROM Products
WHERE Products.CategoryID = [Forms]![frmComboTest]![CategoryID]
UNION
SELECT DISTINCT Null, Null FROM Products
ORDER BY Products.ProductName;
The ProductID combo box is requeried in two places: the Current event of the form
(its OnCurrent property) and the Change event of the CategoryID combo box. The code
required for the Current event procedure is:
Private Sub Form_Current()
    ' Reload the product list each time the form moves to another record.
    ProductID.Requery
End Sub
The code required for the Change event on the parent (master) is:
Private Sub CategoryID_Change()
    ' Clear the old product choice, then reload the list for the new category.
    ProductID.Value = Null
    ProductID.Requery
End Sub
Pay close attention to the RowSource for the ProductID combo box. The RowSource is
an SQL statement. It is based on a UNION query with the appropriate Product table
records as well as a row that contains null values. When the CategoryID combo box is
changed, the ProductID combo box receives the null value. This is how the contents of
the ProductID combo box are cleared.
Unfortunately, the SQL used as the RowSource has to be written by you, the user – this
particular kind of SQL statement cannot automatically be generated on the basis of a
user-defined query.
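The behaviour of this RowSource can be tried outside Access. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for the Access database engine. The sample products and the function name product_rows are hypothetical, and the ? parameter plays the role of the [Forms]![frmComboTest]![CategoryID] reference.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the RowSource above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products ("
    "ProductID INTEGER PRIMARY KEY, ProductName TEXT, CategoryID INTEGER)"
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [(1, "Chai", 1), (2, "Chang", 1), (3, "Aniseed Syrup", 2)],
)


def product_rows(category_id):
    """Return the rows the ProductID combo box would list for one category."""
    sql = """
        SELECT DISTINCT ProductID, ProductName
        FROM Products
        WHERE CategoryID = ?
        UNION
        SELECT NULL, NULL
        ORDER BY ProductName
    """
    return conn.execute(sql, (category_id,)).fetchall()


# The all-null row sorts first, followed by the products of the chosen category.
print(product_rows(1))
print(product_rows(2))
```

Each call corresponds to one Requery of the combo box: the list always contains the empty row plus the products of the category currently selected.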
Note The syntax for referring to objects, such as forms and controls, is not completely
straightforward.
Use either of the following syntax statements to reference a control on a main form:



Forms!formname!controlname
Me!controlname
(In more recent versions, you can substitute dot (.) for exclamation mark (!) between
objects.)
One of the most common mistakes made in Access form development is improper
syntax when referencing controls on a subform. As far as Access is concerned, a
subform is just another control on the main form.
To refer to a subform or a control on a subform, you must remember that Access treats
the subform as a control. Essentially, you have a form containing a control (the
subform) which itself contains a control. To express that arrangement in terms Access
can decipher, you need the Form property, as follows:
Forms!mainform!subform.Form.controlonsubform
Me!subform.Form.controlonsubform
In other words, subform is simply a control on the main form.
In the example given above, frmComboTest is a form. If it is included as a form on
another form (for example, mainform), that is, if it is a subform of mainform, then the
SQL SELECT statement needs to be amended:
SELECT DISTINCT Products.ProductID, Products.ProductName
FROM Products
WHERE Products.CategoryID = [Forms]![mainform]![frmComboTest].[Form]![CategoryID]
UNION
SELECT DISTINCT Null, Null FROM Products
ORDER BY Products.ProductName;
If the full syntax is not respected, Access returns an error when the field on the form is
used.



6. Appendix 6: Normalisation

6.1. Introduction to Normalisation


With thanks to my former colleague at Huddersfield, Steve Wade, on whose material much of this
section is based! I have also referred to the book by Graham Curtis (Curtis & Cobham 2008).
Normalisation is a bottom-up technique for relational data analysis based on analysing inter-
relationships between data items. From our point of view, this is just an alternative way to
establish the entity types, their attributes, and their relationships. We will use it primarily as a way
of crosschecking that we have found all the relevant entities, attributes and relationships. Curtis
(2008) says: “Normalisation results in a fine tuning of the entity model. It may lead to more entities
and relationships being defined if the entity model does not contain entities in the simplest form.
The analyst is moving away from considering a high-level logical model of the organisation to the
detailed analysis of data and its impact on that model. Doing this ensures that data is organised in
such a way that (1) updating a piece of data generally requires its update in only one place, and
(2) deletion of a specified piece of data does not lead to the unintended loss of other data.”
The end product of the technique is a set of entities designed to minimise redundancy of
data and to avoid consistency problems.

6.2. Introduction
♦ The relational database has a mathematical basis in Set
Theory
♦ It is possible to exploit the mathematical basis for
relational database design to improve the quality of the
actual design. Normalisation is a formal technique for
ensuring that the right attributes appear on the right
entities
♦ Also called relational data analysis, the technique of
normalisation is based on a property of data called
dependency or functional dependency.
♦ Normalisation aims to yield a set of entities designed to
∗ Minimise data redundancy
∗ Avoid consistency problems
♦ Normalisation is a “Bottom up” technique
Instead of starting with a top-down analysis of user requirements, this
technique starts with the existing situation: the technique examines business
documents as they are currently used in existing business processes. From
this, it induces the necessary database entities. For example, the starting
point might be an existing purchase order form. As we saw above in section
5, normalisation enables us to deduce the need for several entity types,
including purchase order, supplier, product and order detail line.
♦ It is applied to attributes discovered on paper and
computer forms, viewed as a table (cf. Spreadsheet view)

6.3. Preliminary remarks



An Entity name takes the form of a Singular noun (or occasionally noun phrase), e.g. Student,
Module, Module registration.
♦ Attributes should be singular and represent a single fact
about an entity.
♦ They MUST NOT be lists (= more than one fact); an
Awards attribute for a student is WRONG
♦ Each attribute should depend upon the whole key (an
issue only if the key is compound)
♦ If an attribute depends on part only of the key, this is
WRONG
♦ Each attribute should depend only on the key
♦ If an attribute depends on any other non-key attribute,
this is WRONG
♦ What do I mean by wrong?
If you design a database which does not respect the rules which follow, it is
likely that you will store duplicate, and therefore potentially inconsistent, data;
or that other inconsistencies will develop, especially when you delete entity
occurrences.

6.4. Terminology

6.4.1. Records
Data tends to be held in groups of items - each individual item of data is a field, and the
group of fields constitutes a record.

6.4.2. Field names


Field names are also called attributes.

6.4.3. Keys
♦ Introduction
Before we can store details of (facts about) things in a database, we need
unique labels, that is, identifiers or names, for the entities about which
attributes are to be stored. These identifiers, or keys, need to be chosen with
precision and consistency.
Candidate keys are possible labels / names / identifiers.
Where there is more than one candidate key, we need to choose one as the
primary key.
Usually we choose numeric / short keys.
Often, we deliberately create a unique key (perhaps intended to be computer-
generated), such as a student enrolment number
∗ Candidate keys
There may be more than one possible key in a given situation. For
example, we might identify an Employee by her payroll number, or
by her National Insurance (NI) number in the UK, or Social Security
(US) number.
∗ Choose numeric / short keys
This may imply encoding, e.g. BAIB for BA (Honours) International
Business.
∗ Often need to create a unique key (perhaps
computer-generated)
Microsoft Access offers the AutoNumber facility to assist in
generating unique keys.
♦ Key types
Keys may be:
∗ Simple: single attribute
∗ Secondary - identifies a group of linked
occurrences
This document does not discuss secondary keys.
∗ Compound

(a) This means the key is made up of more than one attribute
(b) Each attribute is often a single key in another relation
∗ Candidate key
An attribute or combination of attributes is a candidate key if it
uniquely identifies a record. We have to choose to make one the
Primary key, and leave others as Alternate keys (i.e. candidate
keys which have not been chosen as the Primary key).
∗ Foreign keys
Foreign keys implement one to many (1:M) relationships in the
following way. If two entity types are related 1:M, then the primary
key attribute(s) (or, rarely, the alternate key attribute(s)) of the one
entity MUST appear as attribute(s) of the many entity. This is
because this is the only way in which the database software can
“join” the many records to the one. Consider a situation in which
students are on a programme. The entity types are Programme and
Student, related 1:M. If the primary key of Programme is
Programme_Code, then Student must also have a
Programme_Code attribute.
∗ Questions and Answers

Question: How many primary keys must an entity have?

Answer: one

Question: How many foreign keys does an entity have?

Answer: potentially several – one per 1:M relationship in which the entity is at the many end
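The Programme and Student example can be sketched in runnable form. This is an illustration only, assuming Python's built-in sqlite3 module in place of Access. Programme_Code comes from the text above; the other column names and the sample students are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
    CREATE TABLE Programme (
        Programme_Code TEXT PRIMARY KEY,
        Programme_Name TEXT NOT NULL
    );
    CREATE TABLE Student (
        Student_No     INTEGER PRIMARY KEY,
        Student_Name   TEXT NOT NULL,
        -- Foreign key: the primary key of the "one" entity appears on the "many" entity
        Programme_Code TEXT NOT NULL REFERENCES Programme (Programme_Code)
    );
""")
conn.execute(
    "INSERT INTO Programme VALUES ('BAIB', 'BA (Honours) International Business')"
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(1, "Ann", "BAIB"), (2, "Bob", "BAIB")],
)

# The shared Programme_Code value is what lets the database join many students to one programme.
joined = conn.execute("""
    SELECT s.Student_Name, p.Programme_Name
    FROM Student AS s
    JOIN Programme AS p ON p.Programme_Code = s.Programme_Code
    ORDER BY s.Student_No
""").fetchall()

# A student pointing at a programme that does not exist is rejected.
try:
    conn.execute("INSERT INTO Student VALUES (3, 'Cy', 'XXXX')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(joined, orphan_rejected)
```

In Access the same rule is set up in the Relationships window by enforcing referential integrity between the two tables.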
♦ Functional Dependency
This is a fundamental concept, initially a bit difficult to grasp.
Consider an entity E that has two attributes A and B. The attribute B of the
entity is functionally dependent on the attribute A if and only if for each value
of A no more than one value of B is associated. In other words, the value of
attribute A uniquely determines the value of B and if there were several entity
occurrences that had the same value of A then all these entity occurrences
will have an identical value of attribute B.
A and B need not be single attributes. They could be any subsets of the
attributes of an entity E (possibly single attributes). We may then write
E.A -> E.B
This can be read as A determines B (though this is not strictly correct), or that
B depends on A (which is true).
♦ Dependency made simple(r!)
An attribute B is functionally dependent on A if, for any particular value of A,
there is exactly one value of B that goes with it, and there can be no other.
Example: healthy, sound animals of a given type all have the same number of
legs.
Knowing an animal is a dog, we know it ought to have four legs.
Number of legs is functionally dependent upon animal type.
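The definition can be turned into a small check on sample data. The sketch below is illustrative only: the function name is_functionally_dependent and the animal records are hypothetical. Sample data can refute a dependency but can never prove it, because the dependency is a statement about all possible data, not just the rows to hand.

```python
def is_functionally_dependent(rows, a, b):
    """Return True if no value of attribute(s) a is paired with two values of b.

    rows is a list of dicts; a and b are tuples of attribute names.
    """
    seen = {}
    for row in rows:
        a_value = tuple(row[name] for name in a)
        b_value = tuple(row[name] for name in b)
        # Remember the first B value seen for this A value; any later
        # different B value for the same A breaks the dependency.
        if seen.setdefault(a_value, b_value) != b_value:
            return False
    return True


animals = [
    {"name": "Rex", "animal_type": "dog", "legs": 4},
    {"name": "Fido", "animal_type": "dog", "legs": 4},
    {"name": "Tweety", "animal_type": "bird", "legs": 2},
]

# Number of legs depends on animal type ...
print(is_functionally_dependent(animals, ("animal_type",), ("legs",)))
# ... but the name does not depend on the number of legs (two dogs, two names).
print(is_functionally_dependent(animals, ("legs",), ("name",)))
```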

6.5. The various stages of normalisation

6.5.1. Convert data into unnormalised form (UNF, 0NF)


List out all the data attributes you can identify. I find it useful to record them on a
spreadsheet or in a word processor running in outline mode - using either of these tools,
it is easy to reorder data as you realise that particular attributes are part of another
entity from the one you first thought.

6.5.2. Convert UNF into First Normal Form (1NF)


♦ 1NF: The rule
♦ There must be only one value per cell (row / column
intersection) in the entity viewed as a table. That is, an
attribute must be a single value, and not a list.
♦ Identify Groups: data field(s) (one or many) that can have
multiple values for the single main key
♦ You should remove repeating groups ('remove' means
set up as a separate entity)
♦ Key to the new entity will be a compound key comprising
the original key plus additional information needed to
uniquely identify individual occurrences

6.5.3. Convert 1NF into Second normal form (2NF)


♦ Rule for 2NF: A 1NF entity is also in 2NF if every non-key
attribute depends on the whole of the key
♦ Avoids duplication, which is inefficient and leads to
update problems
♦ For each entity, determine whether the key is compound
♦ For entities which have a compound key, ask: "Are there
any non-key attributes which depend only on part of the
key?"
♦ If there are: remove them (i.e. set up a new entity)

6.5.4. Convert 2NF into Third normal form (3NF)


♦ For each entity, ask: "Are there any non-key attributes
dependent only on any other non-key attributes?"
♦ If there are: remove them (i.e. set up a new entity)
♦ Rule for 3NF:
A 2NF entity is also in 3NF if no non-key attribute
depends on any other non-key attribute
♦ Avoids duplication, which is inefficient and leads to
update problems
♦ Gives us somewhere to store data about one entity when
no related record exists yet (in the example below, details
of a supplier when there is no order)

6.6. Further normalisation


Fourth normal, Boyce-Codd normal and further normal forms have been identified in the database
literature but circumstances in which they are needed are so rare as to be of little practical
significance.

6.7. A full example of normalisation


We shall consider how each of these steps might be carried out on the following document:

Purchase Order placed by Entrepôt Direct, Lille        Date 01/05/2004

Purchase order number   1234567        Supplier number    1
Supplier name           Aardvark       Supplier address   23bis rue du Flâneur
                                                          35000 RENNES

Supplier product code   Product Name    Quantity required   Packet size   Purchase price   Sub-total
A1                      Digital Radio   10                  1             160,00           1600,00
A2                      Whiteboard      16                  1             120,00           1920,00

Pre-tax total           3520,00
VAT rate                19,6%
VAT                     689,92
Total remitted          4209,92



6.7.1. Step 1 - Convert data into UNF

This involves representing the data in the Purchase Order in the following format:

Purchase order number
Supplier number
Supplier Name
Supplier Address
    Supplier product code
    Product Name
    Quantity required
    Packet size
    Purchase price

This UNF representation indicates that:

• "Purchase order number" is the key data item.
• "Supplier number, Name and Address" occur once per order.
• The remaining items are repeated a number of times as a group. They are indented above –
this emphasises the repetition.

6.7.2. Step 2 - Convert data into 1NF

Remove repeating groups, i.e. groups of data fields (or a single data field) that may have multiple values for
a single value of the key.
Set such groups up as a separate entity.
The key to this new entity will be a compound key comprising the original key plus additional information to
identify individual occurrences.
Applying this to the above example gives us the following:

First Normal Form

Purchase order number     Purchase order number
Supplier number           Supplier product code
Supplier Name             Product Name
Supplier Address          Quantity required
                          Packet size
                          Purchase price

We now have two entities, purchase order and purchase order detail.

The rule for 1NF is therefore:

There must be only one value per cell (row/column intersection) in the entity. Put in another way, an entity is
in first normal form (1NF) if there are no repeating groups of attributes.
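The move from UNF to 1NF can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a prescribed method: the dictionary layout, the key name "lines" and the function name to_first_normal_form are hypothetical, while the data values come from the purchase order shown earlier.

```python
# The purchase order in unnormalised form: one record whose "lines"
# attribute is a repeating group (a list, which 1NF forbids).
unf_order = {
    "purchase_order_number": 1234567,
    "supplier_number": 1,
    "supplier_name": "Aardvark",
    "supplier_address": "23bis rue du Flâneur, 35000 RENNES",
    "lines": [
        {"supplier_product_code": "A1", "product_name": "Digital Radio",
         "quantity_required": 10, "packet_size": 1, "purchase_price": 160.00},
        {"supplier_product_code": "A2", "product_name": "Whiteboard",
         "quantity_required": 16, "packet_size": 1, "purchase_price": 120.00},
    ],
}


def to_first_normal_form(order):
    """Split one UNF order into a purchase order row and its detail rows."""
    # The purchase order keeps every attribute except the repeating group.
    header = {key: value for key, value in order.items() if key != "lines"}
    # Each detail row carries the original key plus the product code, which
    # together form the compound key of the new entity.
    details = [
        {"purchase_order_number": order["purchase_order_number"], **line}
        for line in order["lines"]
    ]
    return header, details


purchase_order, purchase_order_details = to_first_normal_form(unf_order)
print(purchase_order)
for detail in purchase_order_details:
    print(detail)
```

Every value in the two results is now a single value, never a list, which is exactly what the 1NF rule asks for.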



6.7.3. Step 3 - Convert data into 2NF

For each entity we must ask:

• Does the entity have a compound key?
• If it does, then we must ask: Are there any non-key attributes which depend on only a part of
the key?

Rule for 2NF

A 1NF entity is also in 2NF if every non-key attribute depends on the whole of the key.

Any attributes that are dependent on only a part of the key should be removed and stored in their own
entity along with the part-key on which they depend.
Applying this rule to our example leads us to produce the following representation:

Second Normal Form

Purchase order number     Purchase order number     Supplier product code
Supplier number           Supplier product code     Product Name
Supplier Name             Quantity required         Packet size
Supplier Address          Purchase price

What was wrong with 1NF, and what have we gained by moving to 2NF?
The answer is that the 1NF representation contains unnecessary repetition of "Product Name" and
"Packet size" information for every product ordered.
The same product may be ordered many hundreds of times, so storing the data in 1NF could
waste disk space. More importantly, this amount of redundancy in the way data is stored could lead to
significant update problems.
Another problem with our 1NF representation is that there is nowhere in the database to store information
about products which are not currently on order.
So to summarise, by normalising, we have discovered a third entity type, which is going to be called
something like Product, or Stock item.
To avoid the possibility of the database becoming inconsistent (with some copies of the same data being
updated whilst other copies are overlooked) we would ideally like to store each piece of data only once.
This is really what normalisation is all about.
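The 2NF step can be sketched in the same spirit. The example below is only an illustration: the second order (number 1234568) is invented to show the repetition, and the function name to_second_normal_form is hypothetical.

```python
# 1NF purchase order detail rows:
# (order number, product code, product name, quantity, packet size, price)
details_1nf = [
    (1234567, "A1", "Digital Radio", 10, 1, 160.00),
    (1234567, "A2", "Whiteboard", 16, 1, 120.00),
    (1234568, "A1", "Digital Radio", 5, 1, 160.00),  # hypothetical second order
]


def to_second_normal_form(rows):
    """Move attributes that depend only on the product code into their own entity."""
    # Order lines keep the compound key and the attributes that depend on all of it.
    order_lines = [
        (order_no, code, quantity, price)
        for order_no, code, _name, quantity, _size, price in rows
    ]
    # Product name and packet size depend on the product code alone, so they
    # are stored once per product, however often that product is ordered.
    products = {}
    for _order_no, code, name, _quantity, size, _price in rows:
        products[code] = (name, size)
    return order_lines, products


order_lines, products = to_second_normal_form(details_1nf)
print(order_lines)
print(products)
```

"Digital Radio" appears twice in the 1NF rows but only once in products, so a change of name has to be made in one place only. A product with no orders could also be added to products, which the 1NF layout had no room for.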

6.7.4. Step 4 - Convert data into 3NF


For each 2NF entity we must ask:
• Are any of the non-key attributes dependent on any other non-key attributes?
so that we can enforce the following rule:

Rule for 3NF



A 2NF entity is also in 3NF if no non-key attribute depends on any other non-key attribute.

Applying this rule to our example would give the following representation:

Third Normal Form

PURCHASE ORDER            PURCHASE ORDER LINE       SUPPLIER
Purchase order number     Purchase order number     Supplier number
Supplier number           Supplier product code     Supplier Name
                          Quantity required         Supplier Address
                          Purchase price

PRODUCT
Supplier product code
Product Name
Packet size

Note: The items in CAPITALS above are suggested names for the entities now identified.
Again we should ask the question: "What's wrong with 2NF?"
Our 2NF representation included unnecessary repetition of "Supplier Name" and "Supplier Address" for
every purchase order associated with the same supplier.
This corresponds to the first problem that we discussed with regard to 1NF.
The second problem corresponds to the fact that in the 2NF representation there is nowhere to store
information about suppliers from whom nothing is currently on order.
So normalisation here has identified the existence of a supplier entity.
There is another possible reason for interdependency of attributes. This is that one attribute is calculated
from others. It is very wise not to make such calculated attributes part of the database structure. Instead, it
is better simply to remove them, and to create them as calculated fields on queries, reports or forms as
they are needed. In this example, it would be unwise to have a subtotal attribute calculated as quantity
required times purchase price. Instead, as the subtotal is needed – e.g. on a report or form – it should
normally be recalculated by a formula. The exception to this advice is where it genuinely is necessary to
store the value calculated at one point in time, typically for accounting reasons.
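The 3NF design, and the advice about calculated attributes, can be tried out as follows. This is a minimal sketch, assuming Python's built-in sqlite3 module in place of Access. The table and column names are one possible rendering of the entity and attribute names above, not names fixed by this document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Supplier (
        SupplierNumber  INTEGER PRIMARY KEY,
        SupplierName    TEXT,
        SupplierAddress TEXT
    );
    CREATE TABLE Product (
        SupplierProductCode TEXT PRIMARY KEY,
        ProductName         TEXT,
        PacketSize          INTEGER
    );
    CREATE TABLE PurchaseOrder (
        PurchaseOrderNumber INTEGER PRIMARY KEY,
        SupplierNumber      INTEGER REFERENCES Supplier (SupplierNumber)
    );
    CREATE TABLE PurchaseOrderLine (
        PurchaseOrderNumber INTEGER REFERENCES PurchaseOrder (PurchaseOrderNumber),
        SupplierProductCode TEXT REFERENCES Product (SupplierProductCode),
        QuantityRequired    INTEGER,
        PurchasePrice       REAL,
        PRIMARY KEY (PurchaseOrderNumber, SupplierProductCode)
    );
    INSERT INTO Supplier VALUES (1, 'Aardvark', '23bis rue du Flâneur, 35000 RENNES');
    INSERT INTO Product VALUES ('A1', 'Digital Radio', 1);
    INSERT INTO Product VALUES ('A2', 'Whiteboard', 1);
    INSERT INTO PurchaseOrder VALUES (1234567, 1);
    INSERT INTO PurchaseOrderLine VALUES (1234567, 'A1', 10, 160.0);
    INSERT INTO PurchaseOrderLine VALUES (1234567, 'A2', 16, 120.0);
""")

# The sub-total is not stored anywhere: it is calculated in the query.
lines = conn.execute("""
    SELECT p.ProductName,
           l.QuantityRequired * l.PurchasePrice AS SubTotal
    FROM PurchaseOrderLine AS l
    JOIN Product AS p ON p.SupplierProductCode = l.SupplierProductCode
    WHERE l.PurchaseOrderNumber = 1234567
    ORDER BY l.SupplierProductCode
""").fetchall()
pre_tax_total = sum(sub_total for _name, sub_total in lines)

print(lines)
print(pre_tax_total)
```

The query rebuilds the sub-totals and the pre-tax total printed on the original purchase order without any of them being held in a table. In Access, the same calculation would be a calculated field in a query joining the two tables.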

6.8. Normalisation: A Summary


♦ First normal form: No repeating attributes or groups of
attributes
♦ Second normal form: Remove dependencies on part
(only) of the key
♦ Third normal form: Remove inter-dependencies
♦ Most important aspects of normalisation
Generally, the 3NF representation is the ideal we should strive for.
There are higher normal forms for dealing with anomalous situations. They are explained in the
database literature, but rarely have any practical significance.
The steps involved in normalisation may be summarised as follows:
1. Separate repeating groups
2. Separate partial-key dependencies
3. Separate non-key dependencies
♦ ESSENTIAL rule: No repeating attributes (no plurals) (1NF)
If you break this essential rule, you will end up with a completely unusable
database design.
♦ GOLDEN rule: “An entity is fully normalised if every non-
key attribute depends upon the key, the whole key and
nothing but the key.”
Even if you do not carry out all the stages of normalisation, this can be a very
useful final check to apply to the results of any database design work.

6.9. Normalisation complements top-down entity-relationship modelling

Once you have completed normalisation, you should compare the results with those achieved by
the top-down modelling carried out in accordance with Chen's ER model and resolve any
inconsistencies.

6.10. What is achieved by normalisation?


Normalisation of entity types leads to a data model that forms the basis of a good database
design because it:
• Decomposes entity types into their simple “atomic constituents”, that is, their basic parts
• Ensures that data is not unnecessarily repeated
• Allows data on entities to be independent of the existence of other entities
It is also important to realise that the data that was associated with the original unnormalised
entity is still recoverable. The entities are connected at the entity level by their key attributes.
Having normalised each of the entity types in the model, it is possible to recombine the entity
types in order to answer specific questions. This is done in Microsoft Access by means of queries
which join together more than one table.
• So a further advantage of normalisation is that tables which are in third normal form can
be recombined so as to answer almost any conceivable question.

6.11. How is normalisation used in practice?


Very few database designers use normalisation as the only way in which they build data models.
Instead, normalisation is used in one of the two following ways:
• The data model is built in two ways, using top-down entity modelling (section 24) and bottom-
up normalisation. The results are then compared and any inconsistencies resolved by specific
design decisions.
• The data model created by top-down means is checked to ensure that all the entities are in third
normal form. This can be done either by using the three rules outlined above for each stage of
normalisation, or simply by applying the so-called “golden rule” of normalisation.

6.12. Still confused?


See if Microsoft’s explanation at
http://support.microsoft.com/default.aspx?scid=kb;EN-US;283878
(checked 24/11/2008) is any help. There’s also an associated webcast.

6.13. Some questions with which to check your understanding


The answers to these questions are NOT in this document. However, if you do them, you may
approach the author of this document to discuss them!

1. Identify the duplicated data in the following table:

Emp. No.   Emp. Name   Job Code   Job Title           Start Date   Finish Date
123        Smith       A1         Trainee Analyst     3/2/2000     5/2/2001
349        Cairns      P3         Senior Programmer   3/2/2001     3/9/2001
541        McPhee      P3         Senior Programmer   5/2/2001     7/4/2003
123        Smith       A2         Analyst             6/2/2001     7/5/2003
541        McPhee      P4         Chief Programmer    8/4/2003
123        Smith       A3         Senior Analyst      8/5/2003     10/10/2003
632        Keith       A2         Analyst             9/4/2004
123        Smith       M1         Project Manager     11/10/2003

2. Given the following table, are the following statements true or false?

a. A customer can have more than one salesperson
b. A salesperson can have more than one customer
c. A discount code is associated with only one discount percentage.
d. The total sales to date is determined solely by customer no.

S.Person No.   S.Person Name   Cust. No.   Cust. Name   Cust. Disc. Code   Cust. Disc %   Total Sales to Date
002            Kellerman       257         Jones        A                  10             2272
                               286         Brown        B                  15             189
                               295         Foster       B                  15             23652
014            Adams           317         Green        C                  20             24272
                               352         Tate         A                  10             5734
027            Jennings        463         Young        O                  0              5734
                               494         Peacock      C                  20             4153
                               295         Foster       B                  15             2253

i) Choose a key field for the above table.


ii) Remove the repeating groups from the above table to produce 1NF tables.
iii) Perform 2NF analysis on the tables identified in (ii)
iv) Perform 3NF analysis on the tables identified in (iii)
v) Represent the tables in (iv) as an ERM.

3. Consider a retail company that stores sales information. The information is currently stored in a single file. The
company has several stores (shops) and the file has a record for every product line on sale at each store. The
file also contains details of future price changes and the effective date for which these have been scheduled.
The file therefore has the following structure:

Store   Item   Description    Annual Sales to Date   Price   Effective Date
10      AB13   Towel          1100.00                1.15    30/06/03
10      CF99   Handkerchief   350.00                 0.38    30/06/03
10      CF99   Handkerchief   *                      0.47    31/10/03
10      HK76   Jeans          1700.00                26.99   30/06/03
20      AB13   Towel          840.00                 *       *

The entries marked * have values that have already been entered, i.e. * represents 'ditto'.

(I) What problems might result from storing this data in a single table?
(II) Take the data in the file through to third normal form.
(III) Does the new file structure address all the problems identified in (I)?
(IV) If the sales manager wanted to add the following data to the files:
- Supplier Name for each item
- Name of store manager for each store
- Maximum quantity of each item to be stored in each store
Where would the data be stored in your 3NF model? Would any new tables be
required?

4. The following table shows the breakdown of student marks on different courses by assignment number. In this
example we have a repeating group inside a repeating group. For each course there is repeating student data
and for each student there is repeating assignment data.

Course Code   Course Title       Student Code   Student Name   Ass. No.   Ass. Subject                 Ass. Mark
SA            Systems Analysis   A1234          Wade           1          Dataflow Diagrams            75
                                                               2          Entity Relationship Models   67
                                                               3          Normalisation                25
                                 A1235          Walker         1          Dataflow Diagrams            60
                                                               2          Entity Relationship Models   54
                                                               3          Normalisation                32
DB            Database Design    A1230          Smith          1          SQL                          65
                                                               2          Prototype Implementation     54
                                 A1234          Wade           1          SQL                          55
                                                               2          Prototype Implementation     64

i) Take the data in this report through to 3NF. What are the benefits of storing this data in third normal
form?

5. The Natural Yoghurt Company sells many products. Each product is composed of several raw ingredients that
are supplied by various vendors. A particular ingredient is always supplied by the same vendor; however a
vendor may supply more than one ingredient. The product line (product offering) is divided up so that only one
department is responsible for a particular product. However, each department is responsible for more than one
product. Each manager manages exactly one department. The following data items must be stored in the
Natural Yoghurt Company’s database:

Product Number        EmployeeID of manager   Product name
Manager’s name        Ingredient Number       Department Identification No.
Ingredient name       Department name         Qty of ingredient required for product
Dept office address   Dept. phone number      Vendor ID
Vendor name           Vendor address

Derive an entity relationship model and a set of 3NF tables from the above description.



7. Appendix 7 Installing and using Microsoft Visio

7.1. Introduction
MS Visio is now available to ESC students via the Microsoft Developer Network
Academic Alliance (MSDNAA) Electronic Licence Management System (ELMS). You
should by now have received an email from e-academy telling you how you can benefit
from this scheme.
In order to create a drawing of a particular kind, you use both a template file and a
stencil file. These together tell Visio what kind of symbols can be used. The equivalent
terms in French are un modèle and un gabarit.
Microsoft Office Visio 2007 makes it easy for business and ICT professionals to
visualise, explore, and communicate complex information. Rather than complicated text
and tables that are hard to understand, you can use Visio diagrams that communicate
information at a glance. Instead of static pictures, you can create data-connected Visio
diagrams that display data, are easy to refresh, and dramatically increase your
productivity. You can use the wide variety of diagrams in Office Visio 2007 to
understand, act on, and share information about organizational systems, resources, and
processes throughout an enterprise.
Office Visio 2007 is available in two stand-alone editions: Office Visio Professional and
Office Visio Standard. Office Visio Standard 2007 has the same basic functionality as
Visio Professional 2007 and includes a subset of its features and templates. Office Visio
Professional 2007 offers advanced functionality, such as data connectivity and
visualization features, that Office Visio Standard 2007 does not.

7.2. Visualize complex information to better understand it


Office Visio 2007 provides a wide range of templates — business process flowcharts,
network diagrams, workflow diagrams, database models, and software diagrams — you
can use to visualize and streamline business processes, track projects and resources,
chart organizations, map networks, diagram building sites, and optimize systems.
You can more easily visualize processes, systems, and complex information using new
or improved features in Office Visio 2007:
♦ Get started quickly with templates.
Office Visio 2007 includes specific tools to support the diverse
diagramming needs of IT and business professionals. Create a broader
range of diagrams with new templates, such as the ITIL (Information
Technology Infrastructure Library) template and the Value Stream
Mapping template in Office Visio Professional 2007. Use the
predefined Microsoft SmartShapes symbols and powerful search
capabilities to locate the right shape, whether it is saved on a
computer or on the Web.
♦ Quickly access templates you use often.
In the Getting Started window, find the template you need by
browsing simplified template categories and using large template
previews. Locate the templates you used recently by using the new
Recent Templates view in the Getting Started window.
♦ Get inspired by sample diagrams.



Find new sample diagrams more easily by opening the Getting Started
window and using the Samples category in Office Visio Professional
2007. View sample diagrams that are integrated with data to get ideas
for creating your own diagrams, to realize how data provides more
context for many diagram types, and to determine which template you
want to use.
♦ Connect shapes without drawing connectors.
The AutoConnect functionality in Office Visio 2007 connects shapes,
distributes them evenly, and aligns them for you — with only one
click. When you move the connected shapes, they stay connected and
the connectors automatically reroute between the shapes

7.3. Learning Visio


It’s worth investing a little (but not much?) effort into mastering Visio.
The standard Microsoft tutorials can be found at
http://office.microsoft.com/en-us/visio/CH102262071033.aspx

7.4. Creating DFDs using Visio


In Visio, the DFD is called a data flow model diagram in the "software" category, or
diagramme de modèle de flux de données in the “logiciels” category. Here you will find
various DFD conventions; one widely used in America is Gane-Sarson.

7.5. Installing SSADM support


In this document, I have followed the SSADM convention (particularly for DFDs). The
standard Visio product no longer contains support for SSADM shapes. I can make
an SSADM stencil and template available on request.
To use them with an English-language or French-language version of Visio, please
follow the instructions below. I am indebted to the Danish specialist Pavel Hruby,
whose web site is to be found at http://www.phruby.com/index.html (checked
24/11/2008), for the basic information on which I based this approach.
Typically, Visio 2002 keeps stencils and templates in the folder
C:\Program Files\Microsoft Office\Visio10\1033\Solutions\Software
in the English-language version of the product, and in the folder
C:\Program Files\Microsoft Office\Visio10\1033\Solutions\Logiciel
in the French language version of the product.
I believe that Visio 2003 uses the folder
C:\Program Files\Microsoft Office\Visio11\1036\
in both the English-language and French language versions of the product.
I believe that Visio 2007 uses the folder
C:\Program Files\Microsoft Office\Office12\1036\
in both the English-language and French language versions of the product.
It is not very wise to put your additional files in these Microsoft folders. I advise you
instead to keep them in a different folder, and to tell Visio where to find them. So:
♦ In English-language Visio:
∗ Download the files to any folder, except for the
folder in which Visio 2003 stores its own stencils
and templates.
∗ Start Visio, click "Tools" and "Options". In the
"Advanced" tab, click "File Paths..." and type in the
fields "Stencils:" and "Templates:" the paths to the
directory with the SSADM stencil and template.
Restart Visio. The template "SSADM" will appear
under the "(Other)" tab, not under the Software tab
as it was in certain earlier versions of Visio.
♦ In French-language Visio:
∗ Download the files to any folder, except for the
folder in which Visio 2003 stores its own stencils
and templates.
∗ Start Visio, click "Outils" and "Options". In the
"Options avancés" tab, click "Chemins d’accès..."
and type in the fields "Modèles:" and "Gabarits:"
the paths to the directory with the SSADM stencil
and template. Restart Visio. The template
"SSADM" will appear under the "(Autres)" tab.

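The advice above (keep the SSADM files out of Visio's own folders) can be checked mechanically. The following is only a minimal sketch in Python, not part of the Visio product or of the template: the function name is hypothetical, the folder list simply repeats the paths quoted in this section (two of which are stated only as "I believe"), and the example download folder is invented for illustration.

```python
from pathlib import PureWindowsPath

# Folders quoted in the text above. The Visio 2003 and Visio 2007 entries are
# given there only as "I believe", so treat them as unverified assumptions.
VISIO_FOLDERS = [
    r"C:\Program Files\Microsoft Office\Visio10\1033\Solutions\Software",
    r"C:\Program Files\Microsoft Office\Visio10\1033\Solutions\Logiciel",
    r"C:\Program Files\Microsoft Office\Visio11\1036",
    r"C:\Program Files\Microsoft Office\Office12\1036",
]


def is_inside_visio_folder(candidate, visio_folders=VISIO_FOLDERS):
    """Return True if candidate is one of the Visio folders or lies beneath one.

    PureWindowsPath compares paths case-insensitively and never touches the
    disk, so this works on any machine, whether or not Visio is installed.
    """
    cand = PureWindowsPath(candidate)
    for folder in visio_folders:
        base = PureWindowsPath(folder)
        if cand == base or base in cand.parents:
            return True
    return False


# Hypothetical download folder, chosen only for illustration.
my_folder = r"C:\Users\me\Documents\SSADM"
if is_inside_visio_folder(my_folder):
    print("Choose another folder: this one belongs to Visio.")
else:
    print("This folder is safe to use for the SSADM stencil and template.")
```

If the check passes, the same folder is the one to type into the "Stencils:" and "Templates:" fields described above.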


8. Appendix 8 Structured Walkthroughs, a way to improve the quality of analysis

8.1. How to seek for perfection! Improving the quality of our work
This section borrows heavily from Bell, Douglas (2005), himself quoting Weinberg,
Gerald M. (1998).
Some aspects of systems analysis require considerable precision. In effect, we are
looking for the correct answer to a precise problem. This is a way of thinking which is
alien to most of us most of the time. Human beings are used to imprecision! But in
analysis and in programming, we seek zero-defect, bug-free implementations.[12]
However, in this real world, this is impossible to achieve and very difficult to approach.
Once we have created something, such as an entity relationship diagram, it is wise to
assume that it will contain faults. However, we can become very blind to our own
mistakes. For this reason, it is a common experience that someone else can spot errors
better than the author himself. This observation led to the invention of the so-called
"structured walk-through". Credit for its invention belongs to Gerry Weinberg, in his
book "The psychology of computer programming". Weinberg suggested that
programmers see their programs as an extension of themselves. He suggested that we get
very involved with our own creations and tend to regard them as manifestations of our
own thoughts. Since we are unable to find fault with ourselves, we become unable to see
mistakes in what we have created; the recognition of such failings is unacceptable to us.
This failing is sometimes referred to as cognitive dissonance.
The solution, as Douglas Bell says in his book "Software engineering for students: a
programming approach", is to seek help with fault finding. In doing this we will
relinquish our private relationship with our work. When applied to computer
programming, the approach is sometimes called ego-less programming. But it has much
wider application, specifically to anything we create which needs to be more-or-less
correct. Seeking help can be a completely informal technique, carried out by
colleagues in a friendly manner. It must not be a formalised or rigid procedure of the
organisation. Such formalisation destroys its ethos and therefore its effectiveness.
Instead, if you get a friend or colleague to inspect your work, it is extraordinary to
witness how quickly someone else can see a fault that has been defeating you for hours.
Studies also show that different people tend to uncover different types of fault. This
further reinforces the need for team techniques.
An extension to this approach is the so-called “structured walkthrough”. A structured
walk-through is simply a term for an organised meeting at which an artefact is examined
by a group of colleagues. The aim of the meeting is to try to find faults which might
otherwise go undetected for some time. The word structured in this context simply
means well organised. The term walk-through means that the producer of the artefact has
to explain to the meeting
♦ Step by step the working of the artefact
and
♦ All the assumptions behind that artefact.
The very act of explaining the artefact, and of course of letting other people look at that
artefact, will enable errors or problems to be detected much more quickly. It is important
that the meeting concern itself only with the identification of problems or serious errors
of style. The designer of the artefact should correct the problems subsequently.

[12] Please note: the zero-defect ideal is emphatically not expected in the work that you do for assessment.
Instead, we are aiming for “good enough”! This appendix is included only because of the extremely useful
technique it illustrates.

Keys to success in the use of structured walk-throughs include:
♦ Assembling the right group of colleagues.
♦ Distributing the artefact to participants before the meeting.
♦ Concentrating totally on the artefact itself, rather than the person; individual
criticism should be avoided.
♦ Scheduling the meeting in advance, with a fixed duration.
The benefits of structured walk-throughs can be summarised as:
♦ The quality of the artefact is improved because more faults are found, and
because errors of style (which can lead subsequently to errors of interpretation
by others) are eliminated.
♦ Misunderstandings of the original requirements are more likely to be detected.
♦ The earlier a problem is found with an artefact, the cheaper it will be to fix it.
But there are obvious problems in using this technique in an organisational culture that
is not collaborative and supportive.

8.2. References for Appendix 8


Bell, Douglas (2005) "Software Engineering for Students (4 ed)" Pearson, March 2005,
Paperback, 448 pages, ISBN13: 9780321261274 ISBN10: 0321261275

Weinberg, Gerald M. (1998) "The Psychology of Computer Programming: Silver
Anniversary Edition" Dorset House, September 1998, Paperback,
ISBN13: 9780932633422 ISBN10: 0932633420
