
Data Warehousing in Fisheries: A Case Study of The National Fisheries Resources Research Institute (FIRRI)

By Onyango Gerald, B.Sc. (Mak), M.Sc. (Mak), Department of Information Systems, Faculty of Computing and Information Technology, Makerere University. Email: gponyango@yahoo.co.uk / Phone: +256782523740

A Project Report Submitted to the School of Graduate Studies in Partial Fulfillment of the Requirements for the Award of the Degree of Master of Science in Computer Science of Makerere University

OPTION: Management Information Systems

Supervisor: Dr. Ogao Patrick, Department of Information Systems, Faculty of Computing and Information Technology, Makerere University. Email: ogao@cit.mak.ac.ug, Phone: +256-41-540628, Fax: +256-41-540620

October 2006

Declaration
I, Onyango Gerald, do hereby declare that this Project Report is original and has not been published and/or submitted for any other degree award to any other University before.

Signed: .......................................................... Gerald Onyango, B.Sc., M.Sc.

Date: ...........................................

Approval: This Project Report has been submitted for examination with my approval as the supervisor.

Signed: .......................................................... Dr. Patrick Ogao, Ph.D. Department of Information Systems, Faculty of Computing and Information Technology.

Date: ...........................................

Dedication
To Dad and Mum who made it possible.

Good things in life do not come by easily


Acknowledgments
There are a number of people who made all this possible. Thanks be to God for the strength and wisdom He gave me throughout the study, and to the various people who assisted me in one way or another and enabled me to see the fruits of this project.

My sincere appreciation goes to my supervisor, Dr. Patrick Ogao, without whose help this work would not be as it is.

Without the support of my parents and siblings, this study would not have been nurtured to fruition. I also acknowledge all my friends and classmates for having made my academics at Makerere joyous and fruitful. Special thanks go to my coursemates Mr. Ssemwogerere Tom and Mr. Ndyanabo Antony, who provided me with the core software I used in this project.

MAY THE ALMIGHTY GOD BLESS YOU ALL ABUNDANTLY


Abstract
Information on the status and trends in fisheries is key to the sustainable exploitation and management of fisheries resources. In Uganda, the organisation charged with research and dissemination of fisheries information is The National Fisheries Resources Research Institute (FIRRI). FIRRI packages this information in brochures, posters, videos, and press releases. However, preparation of this information within FIRRI is an uphill task because most of it is scattered in files and other storage media across the Institute's different research disciplines. This called for a system that can centralise the storage and dissemination of the information generated within FIRRI. A Data Warehousing system was chosen to remedy the situation.

Work on the development of the Data Warehousing system commenced with the development of a Data Mart for one of the research disciplines. A Data Mart for the Fish Biology and Ecology discipline was designed and developed using Microsoft SQL Server 2005. SQL Server Integration Services (SSIS) was used to develop the Extract, Transform, and Load (ETL) tools, while SQL Server Analysis Services (SSAS) was used to develop the dimensional data cubes. Microsoft Excel, fitted with the Cube Analysis add-in, was chosen as the end-user interface. Validation of the system proved it to be functioning as required. The results of the study show that it is possible to centralise information storage and retrieval in fisheries using a data warehouse, and provide evidence that centralised data storage, information retrieval, and reporting in FIRRI are attainable.

Shortage of time did not allow for the development of a fully fledged Data Warehouse, complete with a web interface. Since it was only possible to build a Data Mart for one of the disciplines within FIRRI, it is proposed that future work comprise the development of an enterprise-wide Data Warehouse that can be accessed through the World Wide Web / Internet.


Contents

1 Introduction
  1.1 Background to The Study
    1.1.1 The Case for a Data Warehouse (DW) and Data Mining
  1.2 Problem Statement
  1.3 Objectives
    1.3.1 General Objective
    1.3.2 Specific Objectives
  1.4 Justification of the study
  1.5 Scope

2 Literature Review
  2.1 Introduction
  2.2 The National Fisheries Resources Research Institute (FIRRI)
  2.3 Data Warehousing
    2.3.1 The Data Warehousing Concept
    2.3.2 Data Warehouse Design Model
    2.3.3 Data Warehouse Structure and Tools
    2.3.4 Data Mart
    2.3.5 Current Approaches to Data Warehouse (DW) Development
    2.3.6 Analysis of the Current Approaches to Data Warehouse Development
  2.4 Data Mining
  2.5 Some Fisheries Data Warehousing Projects

3 Methodology
  3.1 Introduction
  3.2 System Study and Analysis
  3.3 System Design
  3.4 System Development
  3.5 System Validation

4 Implementation
  4.1 Introduction
  4.2 System Analysis
    4.2.1 Fisheries Data
    4.2.2 Usage of FIRRI's Information System
    4.2.3 Functional Requirements
    4.2.4 Non-functional Requirements
    4.2.5 User Requirements
    4.2.6 System Requirements
  4.3 System Design
    4.3.1 Logical Models
    4.3.2 Facts
    4.3.3 Dimensions
  4.4 System Development
    4.4.1 Database Development
    4.4.2 Data Extraction, Transformation, and Load (ETL)
    4.4.3 Analysis Cubes
    4.4.4 End-user Application
    4.4.5 System Validation
  4.5 Conclusions, Limitations, and Future Work
    4.5.1 Conclusions
    4.5.2 Limitations
    4.5.3 Future Work

List of Figures

4.1 Warehouse Architecture for the FIRRI Fisheries Data Warehouse
4.2 Fish Catch Model
4.3 Fish Prey Model
4.4 Fish Biology Model
4.5 Fish Gonad Model
4.6 Fish Catch-Length Model
4.7 Fish Catch Fact
4.8 Fish Catch-Length Fact
4.9 Fish Biology Fact
4.10 Fish Prey Fact
4.11 Fish Gonad Fact
4.12 Date Dimension
4.13 Water Body Dimension
4.14 Catch Type Dimension
4.15 Species Dimension
4.16 Fishing Gear Dimension
4.17 Sex and Maturity Dimension
4.18 Prey Type Dimension
4.19 Length Dimension
4.20 Excel Source Adapter Extraction
4.21 Flatfile Source Adapter
4.22 Unique Identifiers for Rows of Data
4.23 Dimension and Fact Table Load
4.24 Foreach Loop Containers
4.25 Centralised Running / Execution of Packages
4.26 Execute SQL Task Editor
4.27 Data Cleaning and Surrogate Key Generation Control Flow
4.28 Cleaning the Dimension-Data Flow and Generating Surrogate Keys
4.29 Fuzzy Lookup
4.30 Correcting Spelling Mistakes and Adding Missing Data Entries
4.31 Sorting The Data Flows
4.32 Surrogate Key Generation
4.33 Inner Joining Two Data Flows
4.34 Mapping Source Data to The Destination Table
4.35 Fact-Data Cleaning and Transformation Data Flow Task
4.36 Surrogate Key Assignment
4.37 Loading Data into the Warehouse
4.38 Example of Warehouse Dimension Table Data Flow Task
4.39 Example of Warehouse Fact Table Data Flow Task
4.40 Structure of Analysis Data Cube
4.41 Length Frequency Distribution of Oreochromis niloticus
4.42 Maximum Weight of Selected Fish Species Across 4 Quarters
4.43 Check of Rows Written to the Data Warehouse
4.44 System Validation
4.45 Rows Written During Data Load

List of Acronyms

1. CMR - CSIRO Marine Research
2. DW - Data Warehouse
3. EDW - Enterprise Data Warehouse
4. FAO - Food and Agriculture Organization
5. FIRRI - Fisheries Resources Research Institute
6. GUI - Graphic User Interface
7. IT - Information Technology
8. MOLAP - Multidimensional Online Analytical Processing
9. NARS - National Agricultural Research System
10. ODBC - Open Database Connectivity
11. OLAP - Online Analytical Processing
12. PARI - Public Agricultural Research Institute
13. SSAS - SQL Server Analysis Services
14. SSIS - SQL Server Integration Services

Chapter 1 Introduction
1.1 Background to The Study

Fisheries is the industry or occupation devoted to the catching, processing, or selling of fish, shellfish, or other aquatic animals (The Free Dictionary, 2006) [40]. The fisheries and aquaculture sector is extremely important in terms of food security, revenue generation, and employment (Sugiyama, 2005) [39]. Sugiyama (2005) [39] noted that catching or farming aquatic resources makes an integral contribution to rural livelihoods in many parts of the Pacific region. But although fisheries resources are renewable, they can be depleted through unsustainable exploitation. It is therefore important to ensure that there is guided development and management of this asset so that it can continue contributing to the livelihood of the people who depend on it.

Sugiyama (2005) [39] argues that knowledge of the status and trends of fisheries, including socio-economic information on fishing communities, is key to using aquatic resources in a sustainable way. She believes that adequate, timely, and reliable fisheries data and information provide a basis for sound policy development, better decision-making, and responsible fisheries management. This information is required at the national level for the maintenance of food security, for describing the social and economic benefits of fisheries, for assessing the validity of fisheries policy, and for tracking the performance of fisheries management. Sugiyama (2005) [39] also observed an increasing need for fisheries information outside of the government sector. Consequently, information is a priority for the sustainable exploitation and management of fish stocks (FIRRI, 2003) [11].

In Uganda, the national institution mandated to undertake, promote, and streamline fisheries research, and to ensure dissemination and application of research results, is The National Fisheries Resources Research Institute (FIRRI) (FIRRI, 2000; FIRRI, 2001; FIRRI, 2002; FIRRI, 2003; FIRRI, 2004; FIRRI, 2005; FIRRI, 2006a) [8], [9], [10], [11], [12], [13], [14]. FIRRI contributes to the fisheries sub-sector developmental objective by providing information to guide sustainable management of capture fisheries resources and development of aquaculture (FIRRI, 2003) [11]. The final products of FIRRI's outputs are therefore Technical Guidelines containing technologies, methods, and advice to guide the development and management of the fisheries of different aquatic systems, and the development of aquaculture. The information packages are produced in the form of books, booklets, fact sheets, brochures, posters, video films, and press releases to service providers and resource users. FIRRI disseminates this information to fishing communities and other end-users through community barazas, workshops, and radio and TV shows (FIRRI, 2006) [14].

The information system within FIRRI was originally manual and paper-based. With the advent of computers, different functional areas within the institute developed their own file management systems. This independent keeping of files by the individual functional areas created data redundancy and inconsistency, program-data dependence, inflexibility, poor security, and a lack of data sharing and availability. Inmon (1993) [18] argues that factors such as having the same data present on different systems in different departments; difficulty in getting timely, meaningful information; multiple systems giving different answers to the same business questions; and limited analysis by decision makers and policy planners, due to the non-availability of sophisticated tools and of easily decipherable, timely, and comprehensive information, call for a data warehouse.

Having noted that the lack of effective and timely information flow from research to fishing communities and other stakeholders is a major constraint to sustainable fish production and utilisation, FIRRI is devoted to the development of a Fisheries Database and Information Centre (FIRRI, 2006) [14]. FIRRI hopes that this centre will facilitate timely acquisition and exchange of information on all water bodies in the country and also create a central station from which this information can be obtained. However, Mahadik (2002) [26] claims that as the quantities of information and data handled by organisations increase, traditional means of analysing the data, such as reports and query tools, prove to be inadequate. Mahadik (2002) [26] believes that powerful system navigation and information exploration tools that use hypermedia, dynamic visual querying, and tree maps should be made available. Mahadik (2002) [26] asserts that it should be ensured that employees are free to communicate with each other and share data and information freely across the organisation; that data dictionaries are created and regulated; and that online data is reformatted before being inserted into the company-wide databases. Mahadik (2002) [26] claims that the latest development in analytical tools that enables organisations to find meaning in their data is data mining.

Therefore, to enhance the availability of information in the Ugandan fisheries sector, there is a need to enhance the processing efficiency of the data analysed in FIRRI, and also to enhance the dissemination capacity. The optimal solution to this problem would be to build a data warehouse in FIRRI and add data mining tools to it, to improve the efficiency of data analysis and information dissemination.

1.1.1 The Case for a Data Warehouse (DW) and Data Mining

A data warehouse is a subject-oriented, time-variant, nonvolatile database or repository of information collected from many different sources and centrally stored, usually in a single location. Information from multiple sources in different locations, applications, or files, whether in different operating units or departments, can be standardised and stored in a single repository. This consolidation of the data store eliminates the reconciliation of inconsistent data, avoids lengthy ad hoc manipulation of data from different sources, and improves data quality.

In a data warehousing system, users can create most of their own queries and reports themselves. A user identifies the information he or she wants, makes a request (query) to the data warehouse, and the data or information stored in the warehouse is delivered. Tools such as Online Analytical Processing (OLAP) and data mining improve end-user analysis capabilities and shrink the time between the occurrence of an event and the subsequent alerting of managers. In a data warehousing system, data can be retrieved in a matter of minutes.

A data warehouse contains only trusted data, data that has been cleaned. This guarantees the accuracy and reliability of the data and information in and from the warehouse. Historical data is also stored within the data warehouse, and this information can be used to carry out trend analysis and what-if analyses.

1.2 Problem Statement

FIRRI scientists are required to produce field reports after every field trip. The institute is also required to produce quarterly reports and an annual report detailing the activities performed within the period, as well as packaged information for the stakeholders in the fisheries sector. Under the current set-up, whereby information and data are scattered among different functional areas, integration of data, compilation of reports, and packaging of information for stakeholders is an uphill task. Dissemination of information through community barazas, workshops, and radio and TV shows does not enable real-time provision of information, as one has to wait until such an event is organised before one can get access to the information. Moreover, the current information system cannot handle complicated ad hoc enquiries such as cross-tabulation.
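As a sketch of what such an enquiry looks like once the data sit in one store, the following Python snippet builds a cross-tabulation of catch weight by species and water body with a single query. It uses the standard-library sqlite3 module rather than the SQL Server platform used in the project, and all table names, species, and figures are invented for illustration:

```python
import sqlite3

# Hypothetical catch records: (species, water_body, weight_kg).
rows = [
    ("Nile perch", "Lake Victoria", 4.2),
    ("Nile perch", "Lake Kyoga", 3.1),
    ("Tilapia", "Lake Victoria", 1.0),
    ("Tilapia", "Lake Victoria", 1.4),
    ("Tilapia", "Lake Kyoga", 0.9),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catch (species TEXT, water_body TEXT, weight_kg REAL)")
conn.executemany("INSERT INTO catch VALUES (?, ?, ?)", rows)

# Cross-tabulation: total catch weight per species (rows) by water body
# (columns), expressed as conditional aggregation over the fact data.
crosstab = conn.execute("""
    SELECT species,
           SUM(CASE WHEN water_body = 'Lake Victoria' THEN weight_kg ELSE 0 END),
           SUM(CASE WHEN water_body = 'Lake Kyoga'    THEN weight_kg ELSE 0 END)
    FROM catch
    GROUP BY species
    ORDER BY species
""").fetchall()

for species, victoria, kyoga in crosstab:
    print(f"{species:10s}  Victoria={victoria:.1f}  Kyoga={kyoga:.1f}")
```

With the detailed records centralised, the same pattern answers any species-by-water-body question without manually collating files from different disciplines.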

1.3 Objectives

1.3.1 General Objective

To develop a Data Warehousing information system that supports fisheries data and information storage, and retrieval, from a centralised location.

1.3.2 Specific Objectives

i. To review work similar to, and literature related to, Data Warehousing in fisheries
ii. To design a Data Warehousing system for centralised storage and retrieval of fisheries data
iii. To implement a Data Warehousing system for centralised storage and retrieval of fisheries data
iv. To validate the fisheries Data Warehousing system developed

1.4 Justification of the study

A data warehouse system will provide a centralised location for data and information storage and retrieval, and a range of ad hoc and standardised query tools, analytical tools, and graphical reporting facilities for data mining. These tools will perform high-level analyses of hidden patterns, relationships, or trends, and will drill into more detail where needed. The patterns inferred from the data could be used to predict future behaviour and guide decision-making. The data warehouse will increase information availability, efficiency, and the scope and accuracy of scientific reporting, and provide new opportunities for reaching out and passing on information to the fisher community via the Internet. Sugiyama (2005) [39] believes that with more accurate and timely information at the community level, the public is likely to be better informed and supportive of efforts to manage fisheries and aquatic resources in a responsible manner. She claims that disseminating timely and readily understandable information on the status and trends of fisheries should help ensure transparency in fisheries management, as called for by the Code of Conduct for Responsible Fisheries.

1.5 Scope

Conceptually, the study will focus on the design, development, and implementation of a data warehouse and data mining system that can enhance data analysis and information dissemination from FIRRI. Geographically, the study will focus on the Fisheries Resources Research Institute.

Chapter 2 Literature Review


2.1 Introduction

This chapter reviews the works of various writers deemed relevant to the study.

2.2 The National Fisheries Resources Research Institute (FIRRI)

Established in 1947, the National Fisheries Resources Research Institute (FIRRI) is a semi-autonomous Public Agricultural Research Institute (PARI) of Uganda operating under the National Agricultural Research System (NARS) (FIRRI, 2006) [14]. As the fisheries research arm of NARO, research by FIRRI currently focuses on providing information for increasing and sustaining fish production and utilisation (FIRRI, 2004 [12]; FIRRI, 2006 [14]).

FIRRI has its headquarters in Jinja and an outstation at Kajjansi, where the scientific work is organised according to disciplines such as: stock assessment (fish biomass, exploitation rates, etc.); fish biology and ecology (biodiversity and conservation); fish habitat quality and quantity, distribution, food webs, physico-chemical characteristics and primary production (water quality); invertebrate studies and food webs; wetlands; aquatic weeds such as water hyacinth; socio-economics (livelihood analysis, co-management); and aquaculture/fish farming (seed production, feeds (live and commercial feeds), and pond management/commercial aquaculture) (FIRRI, 2003 [11]; FIRRI, 2006 [14]).

2.3 Data Warehousing

2.3.1 The Data Warehousing Concept

Data warehousing is the process of collecting data to be stored in a managed database in which the data are subject-oriented, integrated, time-variant, and nonvolatile, for the support of decision making (Chan, 1999) [3]. Data from the different operations of a corporation are reconciled and stored in a central repository (a data warehouse), from where analysts extract information that enables better decision making (Cho and Ngai, 2003) [4]. Data can then be aggregated or parsed, and sliced and diced as needed, in order to provide information (Fox, 2004) [15].

According to Inmon (1993) [18], a Data Warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data used in support of decision-making processes. Subject-oriented means that a data warehouse focuses on the high-level entities of the business (Chan, 1999) [3] and that the data are organised according to subject (Zeng et al., 2003 [43]; Ma et al., 2000 [24]). For example, fisheries data would be organised by fish species, water body, or type of fishing gear. Integrated means that the data are stored with consistent formats, naming conventions, measurement of variables, encoding structures, physical attributes, and domain constraints (Ma et al., 2000 [24]; Chan, 1999 [3]; O'Leary, 1999 [30]). For example, whereas an organisation may have four or five unique coding schemes for ethnicity, in a data warehouse there is only one coding scheme (Chan, 1999) [3].

Time-variant means that warehouses provide access to a greater volume of more detailed information over a longer period (Zeng et al., 2003) [43] and that the data are associated with a point in time (Chan, 1999 [3]; O'Leary, 1999 [30]), such as a month, quarter, or year. Warehouse data are nonvolatile in that data entering the database are rarely, if ever, changed once loaded into the warehouse (Zeng et al., 2003 [43]; Chan, 1999 [3]). The data in the warehouse are read-only; updates or refreshes of the data occur on a periodic, incremental, or full-refresh basis (Zeng et al., 2003) [43]. In short, nonvolatile means that the data do not change (Chan, 1999) [3].
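The "integrated" property can be sketched in a few lines of Python: two hypothetical source systems that encode the same fish species differently are translated to a single warehouse coding scheme during loading. All names, codes, and figures here are invented for illustration; they are not FIRRI's actual coding schemes:

```python
# Two hypothetical source systems that code the same species differently.
lab_records = [{"sp": "L.nil", "weight_kg": 3.2}, {"sp": "O.nilo", "weight_kg": 1.1}]
survey_records = [{"species_code": 101, "weight_kg": 2.7}]  # 101 = Nile perch

# A single warehouse coding scheme, plus per-source translation tables,
# gives the "integrated" property: one consistent encoding after loading.
WAREHOUSE_CODES = {"NILE_PERCH": "Lates niloticus", "NILE_TILAPIA": "Oreochromis niloticus"}
LAB_TO_WH = {"L.nil": "NILE_PERCH", "O.nilo": "NILE_TILAPIA"}
SURVEY_TO_WH = {101: "NILE_PERCH"}

def integrate(records, key, mapping):
    """Translate a source-specific species code to the warehouse code."""
    return [{"species": mapping[r[key]], "weight_kg": r["weight_kg"]} for r in records]

warehouse_rows = (integrate(lab_records, "sp", LAB_TO_WH)
                  + integrate(survey_records, "species_code", SURVEY_TO_WH))
print(warehouse_rows)
```

In the project itself this role is played by the SSIS cleaning and lookup transformations; the sketch only shows why a single agreed encoding is the precondition for meaningful cross-source queries.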

2.3.2 Data Warehouse Design Model

Data warehouses typically use multidimensional and relational storage structures (Bose, 2006) [2] whose models are developed using cube and star schemas (Velasquez et al., 2005) [42]. The multidimensional structure physically stores the data in array-like structures similar to a data cube. In the relational structure the data are stored in a relational database using the star and snowflake schemas (Bose, 2006 [2]; O'Leary, 1999 [30]).

Bose (2006) [2] observed that summary data are modeled as a multidimensional data cube consisting of measure and dimension attributes. With the support of OLAP for multidimensional analysis, users can synthesise enterprise information through comparative customised viewing, and analyse historical and projected data (Ma et al., 2000) [24]. Bose (2006) [2] noted that at the instance level the values of the dimension attributes are assumed to uniquely determine the values of all measure attributes. He affirms that a multidimensional data cube consisting of dimension and measure attributes is called a fact table. In addition, a multidimensional data cube contains a dimension table for each dimension attribute in the star schema (Bose, 2006) [2]. O'Leary (1999) [30] observes that at the centre of the star is the event table (the fact table), and surrounding the event, at the points of the star, are dimension tables containing the resources, time, and location dimensions.

O'Leary (1999) [30] noted that fact tables hold particular measures of the event, and include foreign key references to dimension tables at each of the points on the star. He also observed that the particular process being modelled influences which resources, events, agents, or locations are included in the dimensions and the number of tables used to represent each. O'Leary (1999) [30] observed that dimension tables describe the properties of the dimensions at hand, and are kept for each dimension that decision makers would like to either roll up or drill down. He noted that in some situations there is a need to generate additional tables from some of the dimensions, resulting in a snowflake schema.

The snowflake schema is a star schema whose dimension tables are normalised (Bose, 2006) [2], and whose dimensions have embedded foreign keys so that dimension tables have relationships with other dimension tables, creating tables for attributes within a dimension table (O'Leary, 1999) [30]. A DW design is often built around a time dimension so that the DW contains data over several periods of time. This feature allows users to perform extensive yearly, quarterly, and monthly analyses that help enable the identification of patterns and trends (Theodoratos and Sellis, 1999) [41]. O'Leary (1999) [30] noted that use of the star or snowflake schemas is aimed at limiting access and query problems in a data warehouse environment.
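A minimal star schema in this spirit can be sketched in Python with the standard-library sqlite3 module. The fact table carries the measures plus foreign keys; the dimension tables at the points of the star carry the descriptive attributes, including the time dimension used for roll-ups. Table and column names here are illustrative only, not the FIRRI design described in Chapter 4:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables at the points of the star (hypothetical columns).
cur.execute("CREATE TABLE dim_species (species_key INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER)")

# Fact table at the centre: measures plus a foreign key to each dimension.
cur.execute("""
    CREATE TABLE fact_catch (
        species_key INTEGER REFERENCES dim_species(species_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        weight_kg   REAL,
        fish_count  INTEGER
    )
""")

cur.executemany("INSERT INTO dim_species VALUES (?, ?)",
                [(1, "Lates niloticus"), (2, "Oreochromis niloticus")])
cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                [(1, 2005, 1), (2, 2005, 2)])
cur.executemany("INSERT INTO fact_catch VALUES (?, ?, ?, ?)",
                [(1, 1, 120.5, 40), (1, 2, 98.0, 31), (2, 1, 55.2, 60)])

# A roll-up along the time dimension: total catch weight per species per quarter.
rollup = cur.execute("""
    SELECT s.name, d.quarter, SUM(f.weight_kg)
    FROM fact_catch f
    JOIN dim_species s ON f.species_key = s.species_key
    JOIN dim_date d    ON f.date_key = d.date_key
    GROUP BY s.name, d.quarter
    ORDER BY s.name, d.quarter
""").fetchall()
print(rollup)
```

Snowflaking would simply normalise a dimension further, e.g. splitting a water-body dimension into water body and region tables linked by another foreign key.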

2.3.3 Data Warehouse Structure and Tools

Ma et al. (2000) [24] observed that the data warehouse has a distinct structure whose components include current detail data, older detail data, lightly summarised data, highly summarised data, and meta-data. The current detail data reflect the most recent happenings, are stored on disk, and are accessed by end-user analysts (Ma et al., 2000) [24]. Meta-data are data that describe the meaning and structure of business data, as well as how they are created, accessed, and used (Devlin, 1997) [6]. They describe what is in the data warehouse, specify what comes into and out of it, schedule extracts based on a business events schedule, document and monitor data synchronisation requirements, and measure data quality (Ma et al., 2000) [24].

Chan (1999) [3] observed that Internet-based Decision Support Systems (DSS) and Executive Information Systems (EIS) can be built on data warehouses to support distributed decision processes. Web-based multidimensional on-line analytical processing (MOLAP) systems enable users to view summary data and zoom in on details by column, by row, or by cell displayed on multilayer spreadsheets (Chan, 1999) [3]. According to Chan (1999) [3], this slice and dice capability enables users to examine data horizontally, and changes in aggregated performance data can be traced back to unit-level productivity. He observed that in a networked environment, this means decision makers can link forecasting with operational data in a dynamic manner.
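The slice and roll-up operations behind this capability can be illustrated with a toy in-memory cube in plain Python. In the project these operations are performed by the SSAS cubes; here the three dimensions, species names, and figures are all invented:

```python
from collections import defaultdict

# A tiny in-memory "cube": total catch weight indexed by three hypothetical
# dimensions (species, water_body, quarter).
cube = {
    ("Nile perch", "Victoria", 1): 120.5,
    ("Nile perch", "Kyoga",    1): 33.0,
    ("Nile perch", "Victoria", 2): 98.0,
    ("Tilapia",    "Victoria", 1): 55.2,
}

def slice_cube(cube, quarter):
    """Slice: fix one dimension (quarter) and keep the remaining two."""
    return {(sp, wb): v for (sp, wb, q), v in cube.items() if q == quarter}

def rollup_by(cube, axis):
    """Roll up: aggregate away all dimensions except the chosen axis."""
    totals = defaultdict(float)
    for key, v in cube.items():
        totals[key[axis]] += v
    return dict(totals)

q1 = slice_cube(cube, 1)         # the Q1 "slice" of the cube
by_species = rollup_by(cube, 0)  # totals per species, across all quarters
print(q1)
print(by_species)
```

"Dicing" is the same idea with more than one dimension fixed at once; drill-down is the inverse of roll-up, returning from the aggregate to the detailed cells.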

According to Pipe (1997) [32], a warehousing system has: design tools to design warehouse databases; source data acquisition tools to capture data from source tables and databases, and to clean, enhance, transport, and apply it to data warehouse databases; a data manager to manage and access warehouse data; graphic user interface (GUI) and Web-based data access tools to provide end-users with the tools they need to access and analyse warehouse data; a delivery manager to distribute warehouse data and other information objects to other data warehouses, desktop applications, and Web servers on a corporate Intranet; middleware to connect data access tools to warehouse databases, and the delivery manager to target systems; an information directory to provide administrators and business users with information about the contents and meaning of data stored in warehouse databases; and warehouse management tools to administer data warehouse operations. GUIs use multimedia to enhance the impact of the information and decision-making support generated through data warehousing (Ma et al., 2000) [24].

2.3.4 Data Mart

A data mart is a subset of the enterprise-wide data warehouse (O'Leary, 1999 [30]; Poe et al., 1998 [33]; Singh, 1998 [38]). Unlike the data warehouse, which is traditionally meant to address the needs of the organisation from an enterprise perspective, a data mart has a limited scope and performs the role of a departmental, regional or functional data warehouse (Bose, 2006 [2]; Singh, 1998 [38]; Poe et al., 1998 [33]). According to Bose (2006) [2], the difference between an enterprise DW and a data mart is essentially a matter of scope. Because data marts are developed for specific business purposes, system design, implementation, testing and installation are less costly than for data warehouses (O'Leary, 1999) [30]. O'Leary (1999) [30] observed that where data warehouses can take years to develop, data marts can be developed in a few months, at a much smaller cost. A data mart often uses aggregation or summarisation of the data to enhance query performance.
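The aggregation idea behind a data mart can be sketched as follows; the detail rows, dates, and station names are invented for illustration. A summary table is computed once at load time, so departmental queries read a handful of summary rows instead of scanning the full detail set.

```python
# Detail-level rows as they might sit in the warehouse: (date, station, catch_kg).
detail = [
    ("2005-01-10", "Jinja", 40), ("2005-01-24", "Jinja", 35),
    ("2005-02-03", "Entebbe", 60), ("2005-02-17", "Jinja", 25),
]

# Precompute a monthly summary once, at load time ...
summary = {}
for date, station, kg in detail:
    month = date[:7]                                  # e.g. "2005-01"
    summary[(month, station)] = summary.get((month, station), 0) + kg

# ... so a departmental query is a cheap dictionary lookup, not a full scan.
jan_jinja = summary[("2005-01", "Jinja")]
```

The trade-off is the usual one: the summary answers its anticipated queries quickly, but any question at a finer grain than (month, station) must fall back to the detail data.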

2.3.5 Current Approaches to Data Warehouse (DW) Development

A number of approaches can be used to build a DW. An organisation may either build a single DW or adopt a multi-tier data warehouse system (Bose, 2006) [2]. In the single data warehouse architecture there is one centralised data warehouse to which source systems feed their data directly, and from which end users obtain data and/or information. In the multi-tier warehousing system, an enterprise data warehouse coexists with several data marts (Bose, 2006 [2]; Pipe, 1997 [32]). In this system one can have either an independent or a dependent data mart architecture (Zeng et al., 2003 [43]; Bose, 2006 [2]). In the independent data mart architecture, the source systems feed the data marts, and the warehouse is fed by the data marts (Bose, 2006 [2]; Zeng et al., 2003 [43]). According to Bose (2006) [2] and Zeng et al. (2003) [43], the dependent data mart architecture has a central data warehouse that contains the corporate view of the data and supplies the departmental data marts with the specific data they require.

Bose (2006) [2] observed that the variations on the multi-tier approach that have been implemented in organisations are top-down, bottom-up and hybrid. With the top-down approach, data marts are seen as a follow-on to the construction of an Enterprise Data Warehouse (EDW) (Atkinson, 2001 [1]; Pipe, 1997 [32]). In this implementation approach, data flows from the source to the enterprise warehouse to the data marts, and the implementation follows the waterfall approach (Pipe, 1997) [32].

The bottom-up approach is to first build data marts and then an EDW (Atkinson, 2001 [1]; Pipe, 1997 [32]). The enterprise warehouse evolves bottom-up as a new layer on top of existing data marts: data marts are loaded directly from source systems, and the enterprise warehouse is loaded from the data marts (Bose, 2006) [2]. In this case the corporate data warehouse project begins with a small pilot project for a specific subject area. Bose (2006) [2] affirms that in so doing, both a data mart and the first data warehouse are created simultaneously.

Bose (2006) [2] and Pipe (1997) [32] noted that the hybrid approach, or parallel strategy, might include elements of both the top-down and bottom-up approaches. They argue that in this approach the enterprise model is developed first and documented at a high level, so that certain subject areas may be modelled in more detail as warehouse development proceeds. In this approach, therefore, the data warehouse is developed incrementally (Pipe, 1997) [32].

2.3.6 Analysis of the Current Approaches to Data Warehouse Development

Bose (2006) [2] says that the implementation of a single data warehouse establishes a single, reliable source for data and provides a more integrated solution for reporting and decision support across functional areas. However, a single warehouse is not well suited for highly specialised data needs (Bose, 2006) [2]. Bose (2006) [2] believes that the data mart solution, where a data warehouse coexists with data marts, may be well suited for highly specialised data needs. Pipe (1997) [32] affirms that in the multi-tier warehouse architecture involving an EDW and underlying data marts, data is located where it can deliver the highest availability and performance, without sacrificing integrity or control over the management of corporate data for business decision-making. Bose (2006) [2] and Pipe (1997) [32] claim that in the long run, a multi-tier warehouse architecture/system is the optimal one.

Both the top-down and bottom-up approaches to data warehouse development have their strengths and weaknesses. The advantage of the top-down implementation approach is that it leads to a planned, integrated multi-tier solution, and improves the consistency of information in the data marts (Bose, 2006 [2]; Atkinson, 2001 [1]; Pipe, 1997 [32]). However, a top-down approach can create problems when the data marts are added later, and it cannot deliver solutions fast enough for an organisation to quickly exploit new business opportunities (Atkinson, 2001) [1]. Bose (2006) [2] and Pipe (1997) [32] argue that this approach usually takes more time and is relatively costly.

Bose (2006) [2] and Pipe (1997) [32] point out that the bottom-up approach gives quick results and a high return on investment. However, if the spread of data marts in this approach is not controlled, there can be integration problems between the data marts and the future EDW (Atkinson, 2001) [1]. Bose (2006) [2] believes that the bottom-up approach eventually yields a disintegrated warehouse because the data marts often do not conform to a common model. Pipe (1997) [32] concludes that an ideal solution would be a synergistic marriage of the top-down and bottom-up approaches, to maximise the strengths and minimise the weaknesses of each. Pipe (1997) [32] claims this strategy supports incremental and evolutionary data warehouse development. Bose (2006) [2] advises that during development, too much must not be taken on at once, as this can leave users feeling abandoned and the development team overwhelmed. He believes that an incremental approach yields the best results.

2.4 Data Mining

Data mining is the process of applying artificial intelligence techniques to large data sets in order to determine data patterns (Ma et al., 2000) [24] and extract previously unknown but significant information (Singh, 1998) [38]. On the front-end (client side), data-mining tools allow users to analyse the contents of the data warehouse via graphical, tabular, geographic, and syntactic reports. The front-end data-mining tool provides the user with an intuitive, graphical tool for creating new analyses and navigating the data warehouse. This helps focus users' analysis so that relevant information can be obtained faster and more effectively (Ma et al., 2000) [24].

Data mining applications utilise information stored in the warehouse to generate business-oriented, end-user-customised information (Ma et al., 2000) [24], and statistical summaries from different views of data (Cho and Ngai, 2003) [4]. They can be applied in conjunction with OLAP to form an integrated business solution (Cho and Ngai, 2003) [4]. Data mining is critical to the enterprise that wants to exploit operational and other available data to improve the quality of decision-making and gain critical competitive advantages (Ma et al., 2000) [24]. Accurate data identification and analysis improves the quality of decision making; strong navigation, computation, and synthesis capabilities make it possible to gain critical competitive advantages; relevant information is obtained faster and time is used more effectively (Ma et al., 2000) [24].
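As a minimal illustration of pattern discovery, the hypothetical survey records below are scanned for the most frequent habitat/species pairing. Real data mining tools apply far richer statistical and AI techniques, but the principle of surfacing previously untabulated regularities from accumulated data is the same.

```python
from collections import Counter

# Hypothetical survey observations: (habitat, species) pairs.
observations = [
    ("rocky", "Haplochromis"), ("rocky", "Haplochromis"),
    ("muddy", "Clarias"), ("rocky", "Nile perch"),
    ("muddy", "Clarias"), ("rocky", "Haplochromis"),
]

# The simplest form of pattern discovery: rank co-occurring pairs by frequency
# to suggest an association worth investigating (e.g. habitat preference).
pairs = Counter(observations)
most_common_pair, count = pairs.most_common(1)[0]
```

A finding like this would then be handed to a biologist as a candidate hypothesis, not a conclusion; the mining step only points at where to look.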

2.5 Some Fisheries Data Warehousing Projects

A number of countries have embraced the data warehousing and data mining concepts in their fisheries sectors. NetCoast (2001) [29] observed that the European Union implemented COASTBASE, a virtual coastal and marine data warehouse for integrated, distributed information search, access and feedback. The CoastBase client (based on HTML and Java) provides uniform, multilingual and interactive access to all CoastBase services. The ultimate aim of the project was to improve marine and coastal research, assessment, policy making and cooperation along Europe's coasts.

Rees and Finney (2000) [34] report that CSIRO Marine Research (CMR), Australia, developed a data warehouse using ORACLE 8i. The client software is written in Java and uses Java's Remote Method Invocation (RMI) and Java Database Connectivity (JDBC) to connect to the underlying ORACLE data store. The database schema has marine, biological, chemical and physical oceanographic parameters, and is designed so that sampled parameters are primarily referenced via spatial coordinates and a time stamp. This allows users to examine integrated datasets according to spatial and temporal constraints. For example, a biologist interested in species distribution in a particular geographic area can also acquire any available habitat data (e.g. water column parameters and seafloor sediment composition) for that region and time period, a feature that may be important if one is looking for any correlations between habitat type and species distribution. This is also useful if models are to be employed in order to interpolate between known data points to produce a species distribution map, or to plot the potential distribution of a species based on its known habitat or other environmental surrogate. Users invoke the data warehouse interface via CMR's web page.

Kupca (2004) [22] reports that in Iceland, the Marine Research Institute, Reykjavik, developed a fisheries data warehouse structured around 48 tables that include biological sample data, catch data, stomach data, tagging data and incomplete data that do not fit the common DW structure. To make the DW portable and platform independent, the Linux operating system and PostgreSQL RDBMS were chosen. PHP was used as the programming language to develop a web-based interface and the upload and extraction parts. An SQL command sent to the database retrieves and presents its result in an HTML table. To ease the use of the DW, there are predefined table aliases and groups of useful joins that enable composition of complex multiline queries within seconds. Metadata are split into five topics (biological samples data, stomach data, catch data, acoustic data and tagging data) and include the information on time, position and species in each topic.

Fisheries data warehouses have been put to varying advantages. Scottish Executive Publications (2006) [36] asserts that the IFISH data warehouse brings together fisheries data as a shared resource and has resulted in a substantial reduction in the burden on each department to produce data for the other. The Government of British Columbia (2006) [16] reports that their webpage, FishInfo BC, provides on-line access to the British Columbia Fisheries Data Warehouse and to federal-provincial fisheries datasets, where all data are linked to active maps and to standard tables and reports that allow users to choose exactly what they want to know about any location and then print their own personal reports. In support of India's development of a DW that includes fisheries among other agricultural disciplines, Sharma et al. (2006) [37] claimed that a DW can improve the quality of research and planning, reduce the duplication of research efforts, encourage dissemination of research findings, and facilitate qualitative research supported by agricultural databases. Therefore, development of a data warehouse that has data mining capabilities would go a long way in improving fisheries management in the Ugandan fisheries sector.


Chapter 3 Methodology
3.1 Introduction

This chapter describes the approach used to undertake the fisheries data warehouse development project. The case study method was used because it captures the essence of the study: case studies normally take a cross-sectional research approach focused on subject variables. According to Olsen and Marie (2004) [31], cross-sectional research gives better subject selection and measurements. The project was implemented in three main phases: system study and analysis; system design; and system development. The means used to validate the system under development are also described.

3.2 System Study and Analysis

According to Kakinda (2000) [19], research design is the structure or nature of research, which may be either qualitative or quantitative. A qualitative approach was used to evaluate the information systems, datasets, and procedures pertaining to the management of research work in FIRRI. Fact finding was based on:

1. Interviews carried out with the staff of the National Fisheries Resources Research Institute (FIRRI). Sample interview questions are presented in Appendices 1, 2, and 3.

2. Document analysis: a number of documents were analysed so as to gain more understanding of the type and contents of the reports required of the data warehousing system. Documents studied included FIRRI's annual reports for the years 1997 - 1998, 1999 - 2000, 2000 - 2001, 2002 - 2003, 2003 - 2004, and 2004 - 2005; field reports; as well as FIRRI's survey report on its study of the Upper Victoria Nile River under the Bujagali Hydroelectric Power Project (NARO/FIRRI, 2001) [28].


3.3 System Design

The methodology used to design the dimensional model was adapted from Kimball (1996) [20] and Connolly and Begg (2001) [5]. First, the subject matter for the data mart was identified and the grain of the fact table (what a fact table record represents) decided. The grain of the fact table determined the minimum level at which data was referenced, and also enabled the identification of the dimensions, as well as the grain of each of the dimension tables. The dimensions were then identified and conformed. For each dimension chosen, all dimensional attributes that filled out each dimensional table were described.

Next, the facts that populate each fact table record were chosen. Facts comprised numeric additive quantities and were expressed at the level implied by the grain. Once fact tables had been selected, each fact table was re-examined to determine whether there were opportunities to use precalculations. This applied to those values that might be incorrectly derived by users. As many text descriptions as possible were then added to the dimension tables. The duration of the database (how far back in time the fact table goes) was then chosen. To track slowly changing dimensions, all records related to an old attribute name were linked to that old attribute name, and those related to the new attribute name were accordingly linked to it. Query priorities and query modes were then decided.
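The handling of slowly changing dimensions described above can be sketched as a Type 2-style update, where an attribute change closes the current dimension row and opens a new one, so old facts stay linked to the old value. The station and zone values are invented for illustration.

```python
from datetime import date

# Each dimension row carries validity dates; an open row has valid_to = None.
station_dim = [
    {"key": 1, "station": "Masese", "zone": "Eastern",
     "valid_from": date(1997, 1, 1), "valid_to": None},
]

def update_attribute(dim, station, new_zone, change_date):
    """Close the current row for a station and open a new row with the new zone."""
    current = next(r for r in dim
                   if r["station"] == station and r["valid_to"] is None)
    current["valid_to"] = change_date
    dim.append({"key": max(r["key"] for r in dim) + 1, "station": station,
                "zone": new_zone, "valid_from": change_date, "valid_to": None})

update_attribute(station_dim, "Masese", "Central", date(2003, 7, 1))
# Facts recorded before July 2003 still join to key 1 (zone "Eastern");
# later facts join to key 2 (zone "Central").
```

This is why fact tables reference surrogate dimension keys rather than natural names: the key pins each fact to the attribute values that were current when it was recorded.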

3.4 System Development

System development involves the implementation, testing and refinement of the system. The data warehouse was developed iteratively using the Data Warehouse Lifecycle based on Zachman's approach. A multi-tier warehouse architecture involving an EDW and underlying data marts was developed using the hybrid/parallel approach to data warehouse development (Bose, 2006 [2]; Atkinson, 2001 [1]; Pipe, 1997 [32]). The project started with the development of a data mart for the Fish Biology and Ecology research discipline in FIRRI. The steps to developing a data warehouse/mart advocated by Roland and Leonard (2005), Velasquez et al. (2005) [42], and Chan (1999) [3] were considered during the development process. Sample data was run through the system to establish whether it was functioning as required.

3.5 System Validation

Validation entails the confirmation, by examination and provision of objective evidence, that an information system has been implemented correctly and that it conforms to user needs and intended uses. During design and development planning, the validation plan was developed to identify required validation tasks and procedures for reporting anomalies and their resolution. In the requirements definition phase, testable user and functional requirements for the data warehouse were established.

During the design phase, care was taken to ensure that the software development and management procedures were consistent with accepted practices. At the implementation phase, functional testing was performed to check whether the system performs functions as specified in the functional specifications. To facilitate tracking and problem resolution, each batch of input data extracted was assigned a unique identifier linking it back to the source. The system was also fitted with a log file as an indirect link between the source and the input transaction. The mapping utilised by the ETL tool was reviewed, care being taken to ensure that the data being loaded into specific data elements was in fact being sourced from the right tables in the source systems. Each data element was given a formal description and a mapping back to the source table(s) used to populate it during the ETL process. Simple database queries were run on the tables in the warehouse to count the number of records in the data warehouse. These counts were then compared with the number of data entries in the source systems. Equality of these counts led to the assumption that no records had been left out due to an error during the ETL or simple load process. This was further verified by the lack of errors (not necessarily warnings) in the exception reporting by the ETL tool. For additional verification, actual rows from both the source and data warehouse tables were randomly selected, printed, and listed side by side for comparison.
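The record-count reconciliation described above can be sketched as follows, using SQLite in-memory tables as stand-ins for a source system and the warehouse (the table names and rows are invented for illustration).

```python
import sqlite3

# Stand-in source system table.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE catch_source (id INTEGER, kg REAL)")
src.executemany("INSERT INTO catch_source VALUES (?, ?)",
                [(1, 40.0), (2, 35.5), (3, 60.2)])

# Stand-in warehouse fact table, populated by a (simulated) load.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE catch_fact (id INTEGER, kg REAL)")
dw.executemany("INSERT INTO catch_fact VALUES (?, ?)",
               [(1, 40.0), (2, 35.5), (3, 60.2)])

def row_count(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

# Equal counts suggest, but do not prove, that the load dropped no records;
# hence the additional row-by-row spot checks described in the text.
counts_match = row_count(src, "catch_source") == row_count(dw, "catch_fact")
```

A count match can mask offsetting errors (one row dropped, one duplicated), which is why the validation also compared randomly selected rows side by side.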


Chapter 4 Implementation
4.1 Introduction

This chapter describes what was used as the basis for understanding and implementing the fisheries data warehousing project. The findings of the requirements elicitation, the analyses of these findings, and their subsequent use in developing the system are presented.

4.2 System Analysis

4.2.1 Fisheries Data

Fisheries data typically comprises information on the activity of fisherfolk and their catches, plus results of scientific surveys aimed at learning more about the biology, population dynamics, and movements of the species concerned. This information is then used by the fisherfolk and fisheries managers to anticipate the most favourable conditions and locations for fishing, and thereby maximise the catches while reducing effort. The data can also be used to conduct independent assessment of stocks and modelling of the resource dynamics, so as to be able to support, confirm or dispute the soundness of decisions made by the relevant fishery managers.

Fisheries data includes: reports and information summaries on catch and landings data; scientific observational data; and environmental data. Summary (and derived) data includes aggregated statistics by region, season, lengths of fish caught, etc. Catch-and-effort data includes information on the fishing activity of the fishermen (boat movements; hours, locations and depths fished; gear type used, etc.). Scientific survey data is similar to the catch-and-effort data from commercial operations but is less biased towards areas where catches would be expected to be highest. Biological data may be collected on commercial boats or scientific surveys. Environmental data is ancillary data, such as water temperatures and other hydrologic conditions, which may provide insight into the biological patterns observed.

4.2.2 Usage of FIRRI's Information System

Information gathering, analysis and dissemination in FIRRI is shared among its eight disciplines. The most important discipline is, reportedly, the Fisheries Biology and Ecology Discipline. Field data is mainly obtained on a quarterly basis, though sometimes data is obtained monthly. This data is stored and used to perform routine analyses and produce standardised reports such as field reports, quarterly reports, and annual reports. Workshop papers and papers meant for scientific publications are also prepared from the reports generated from the system.

Reports prepared by FIRRI's Fish Biology and Ecology discipline are aimed at answering questions on the structure of fish stocks and how this varies with location, and on the life history of fish species, particularly with regard to age and growth, recruitment to the fishery, reproductive biology, migration and other movement patterns, diet and place in the ecosystem, and natural mortality in the absence of fishing pressure. This information is used to prepare brochures, enact legislation aimed at fisheries management and conservation, monitor the environmental conditions and fish habitats in the different water bodies, regulate the fishing effort, and recommend to stakeholders the best fishing practices that may lead to sustainable exploitation of the fish resources.

The stakeholders in the fisheries sector who need and make use of the information generated and packaged in FIRRI include: the fisherfolk; the National Agricultural Research Organisation (NARO); international and regional collaborators, such as research institutions around Lake Victoria [the Kenya Marine and Fisheries Research Institute (KMFRI) in Kenya and the Tanzania Fisheries Research Institute (TAFIRI) in Tanzania] and the Lake Victoria Fisheries Organisation of the East African Community; the Uganda Fisheries Department; several departments at Makerere University, such as the Zoology Department and Makerere University's Institute of Environment and Natural Resources; NGOs such as the Uganda Fisheries and Fish Conservation Association (UFFCA); legislators in Uganda's parliament; schools; and the general public.

Currently, data is mainly stored in Excel files on desktop computers scattered among the different functional areas of the institute. Some historical data are still stored in paper files, though efforts are being made to transform them into electronic form for storage in a relational database. Data analysis is carried out using Microsoft Excel, SPSS, and other statistical packages. ArcView GIS is being used to present some of the results from the analysis. Most reports are written using Microsoft Word.

4.2.3 Functional Requirements

There are a number of functionalities expected of any information system aimed at improving the current information management in FIRRI. The system should:

1. be able to extract data from various files in different storage areas and store them in a centralised location from where data and information can be retrieved;

2. be able to generate fisheries reports directly from the system;

3. be able to store a massive amount of data over a long period of time so as to enable trend analysis;

4. have an allowance for occasional loading of lump-sum data in the event that a lot of data is accumulated during a given quarter.

4.2.4 Non-functional Requirements

The four major non-functional requirements are: (i) system accessibility, (ii) system security, (iii) software operability, and (iv) system performance.

1. System accessibility: any end-user should be in a position to access dynamic reports that have resulted from the analysis of fisheries data.

2. System security: depending on repository content, the system should provide for differing levels of access to repository content.

3. Software operability: the initial system should be able to make use of the software environment within FIRRI, and therefore be able to run on the Windows operating system.

4. System performance: the system should be able to handle at least 40 concurrent end-users.

4.2.5 User Requirements

The users require a system with:

1. A facility for generation of fisheries reports;

2. Ability to centralise data and information retrieval;

3. A provision for aggregations and generating summaries;

4. Ability to carry out trend analysis;

5. Ability to project trends;

6. Provision of reliability of at least 98 percent uptime.

4.2.6 System Requirements

Since the warehouse database software should run on the Windows operating system platform, Microsoft SQL Server 2005 is recommended. Microsoft SQL Server 2005 has the SQL Server Management Studio and SQL Server Business Intelligence Development Studio; these include SQL Server 2005 Integration Services (SSIS) and SQL Server 2005 Analysis Services (SSAS), which are ideal for warehouse development. To run SQL Server 2005, the following hardware and software are required.

1. VGA or higher resolution;

2. A Microsoft mouse or compatible pointing device;

3. Microsoft Internet Explorer 6.0 SP1 or later;

4. Internet Information Services (IIS) 5.0 or later;

5. ASP.NET 2.0;

6. Windows Installer 3.1 or later;

7. Microsoft Data Access Components (MDAC) 2.8 SP1 or later;

8. Itanium processor or higher;

9. Minimum processor speed of 1 GHz;

10. Memory (RAM) of at least 512 MB;

11. Windows 2003, or higher, operating system.

4.3 System Design

In light of the informational content and the nature of analysis required to produce information, the system that best redresses the shortcomings of the information system in FIRRI is a data warehousing system. The architectural design of the new system, showing how data flows throughout the system, is presented in Figure 4.1. The two processes a data warehouse undergoes are data loading (entry) and access. Loading is carried out using Extract, Transform and Load (ETL) tools, while warehouse data can be accessed using OLAP tools. Therefore, data will be entered into the FIRRI data warehouse using ETL tools that extract data already entered into operational systems.

In FIRRI's architectural design, data is extracted from operational data sources that include the operational system in FIRRI, flat files, the Internet, and decentralised databases located in the district fisheries offices within the country. The extracted data will be loaded into the staging area, where it will be cleaned and then loaded into the data warehouse. The data in the warehouse will be in the form of metadata, summary data, and raw data. The warehouse has a provision for archiving and backing up the data. From the data warehouse, the information and data are availed to the data marts. The data marts are tailored around the different functional units within FIRRI, such as Aquaculture and Socioeconomics (Figure 4.1), or FIRRI's partners in fisheries information usage and delivery. End-users in FIRRI's different disciplines and partner institutions interact with the data marts and are then able to analyse or mine the data and come up with their reports.


Figure 4.1: Warehouse Architecture for the FIRRI Fisheries Data Warehouse

The trigger for the ETL process will be changes and additions to source data, which will bring about a processing requirement for the data. The data profile for FIRRI's fisheries data warehouse includes quarterly extractions of fisheries data and dimensional updates, and occasional monthly input of data. Therefore, in FIRRI, the data warehouse ETL will have a set of quarterly processing requirements, where changes and additions to source data will be extracted and processed through the system quarterly.
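A minimal sketch of one quarterly extract-clean-load step might look like the following. The file layout, field names, and rejection rule are assumptions for illustration, and an in-memory SQLite table stands in for the staging database; a production implementation would use the SSIS tooling described later.

```python
import csv
import io
import sqlite3

# A quarterly extract arriving as a flat file (hypothetical layout).
extract = io.StringIO(
    "date,station,species,catch_kg\n"
    "2005-04-02,Jinja,Tilapia,40\n"
    "2005-05-11,Entebbe,Nile perch,not_recorded\n"
    "2005-06-20,Jinja,Tilapia,25\n"
)

staging = sqlite3.connect(":memory:")
staging.execute(
    "CREATE TABLE stg_catch (date TEXT, station TEXT, species TEXT, catch_kg REAL)")

loaded, rejected = 0, 0
for row in csv.DictReader(extract):
    try:
        kg = float(row["catch_kg"])      # cleaning: reject non-numeric measures
    except ValueError:
        rejected += 1                    # rejected rows feed an exception report
        continue
    staging.execute("INSERT INTO stg_catch VALUES (?, ?, ?, ?)",
                    (row["date"], row["station"], row["species"], kg))
    loaded += 1
```

Keeping the loaded/rejected counts per batch supports the validation strategy of Chapter 3, where batch counts are reconciled against the source.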

4.3.1 Logical Models

Being the most important discipline, the Fish Biology and Ecology discipline was chosen for the development of a data mart that will eventually lead to an enterprise-wide data warehouse (EDW) for FIRRI. The data mart models consist of five fact tables and eight dimensions, found in the fish catch dimensional model (Figure 4.2), fish prey dimensional model (Figure 4.3), biology dimensional model (Figure 4.4), gonad dimensional model (Figure 4.5), and catch-length dimensional model (Figure 4.6). A given fish species was taken as the grain of the catch fact table, while an individual fish specimen was taken as the grain for the biology, gonad, catch-length and prey fact tables. The conformed dimensions are the date, geography, species, water body, catch type, project, and fishing gear dimensions.

4.3.2 Facts

The fact tables are the Catch, Catch-Length, Biology, Prey, and Gonad tables. The Catch fact table stores catch sample data. It comprises the additive measures weight of fish, number of fish, number of boat crew, and number of fishing gear (Figure 4.7). The Catch-Length fact table stores the length measurements of the sampled catch and comprises the semi-additive fact length (Figure 4.8). The Biology

Figure 4.2: Fish Catch Model

Figure 4.3: Fish Prey Model


Figure 4.4: Fish Biology Model

Figure 4.5: Fish Gonad Model


Figure 4.6: Fish Catch-Length Model

Figure 4.7: Fish Catch Fact

fact table stores biological facts about the fish sampled, and comprises: an additive fact, fish weight; a semi-additive fact, total length; and a non-additive fact, serial number (Figure 4.9). The Prey fact table contains data about the prey ingested by the fish. It comprises the additive facts predator weight, prey weight, total food weight, total food count, and prey count; the semi-additive fact predator total length; and the non-additive fact digestive state (Figure 4.10). The Gonad fact table stores gonadal statistics. It stores the semi-additive facts fish weight, number of gonads, gonadal weight, total length, and number of eggs counted, and a non-additive fact, serial number (Figure 4.11). The attribute SourceID has been included in all fact tables to link their data to the source of data extraction.
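The distinction between additive and semi-additive facts constrains how the warehouse may aggregate them, as the sketch below illustrates with invented sample rows: fish weight can be summed across any dimension, whereas summing lengths is not meaningful, so length-type facts are aggregated by average (or minimum/maximum) instead.

```python
# Sample fact rows: (species, weight_g, total_length_cm) -- invented values.
facts = [
    ("Tilapia", 350.0, 24.5),
    ("Tilapia", 410.0, 26.0),
    ("Nile perch", 2200.0, 58.0),
]

# Additive fact: summing fish weight across any dimension is meaningful.
total_weight = sum(w for _, w, _ in facts)

# Semi-additive fact: a sum of lengths has no physical meaning, so
# aggregate with an average (or min/max) instead.
tilapia_lengths = [l for s, _, l in facts if s == "Tilapia"]
mean_tilapia_length = sum(tilapia_lengths) / len(tilapia_lengths)
```

Encoding this rule in the query layer (sum for weights and counts, average for lengths) prevents exactly the kind of incorrectly derived values that the precalculation step in Chapter 3 was meant to guard against.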


Figure 4.8: Fish Catch-Length Fact

Figure 4.9: Fish Biology Fact

Figure 4.10: Fish Prey Fact


Figure 4.11: Fish Gonad Fact

Figure 4.12: Date Dimension

4.3.3 Dimensions

Eight dimensions were identified among the five fact tables. The dimensions are the Date, Water Body, Catch Type, Species, Fishing Gear, Sex-Maturity, Prey Type, and Length dimensions.

Date Dimension
This dimension contains attributes that detail the time the data was collected. It has the levels Year, Half, Quarter, Month, and Date (Figure 4.12). The Month level has the attributes Month and Month Name. The attribute SourceID points to the data source.
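The Year > Half > Quarter > Month > Date hierarchy can be derived mechanically from the calendar, so a date dimension is typically generated rather than keyed in. A minimal sketch (the column names are illustrative, not the actual table schema):

```python
from datetime import date, timedelta

def date_dimension_row(d):
    """Derive the Year > Half > Quarter > Month levels for one calendar date."""
    return {
        "date": d.isoformat(),
        "year": d.year,
        "half": 1 if d.month <= 6 else 2,
        "quarter": (d.month - 1) // 3 + 1,
        "month": d.month,
        "month_name": d.strftime("%B"),
    }

# Populate the dimension for a whole year in one pass.
start, end = date(2005, 1, 1), date(2005, 12, 31)
rows = [date_dimension_row(start + timedelta(days=i))
        for i in range((end - start).days + 1)]
```

Generating the rows up front means every fact load can join to a date key that already carries its Half, Quarter, and Month rollups.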

Water Body Dimension
The dimension contains attributes about the water body where the fish was caught. It includes attributes such as water body type, water body name, zone, station, location, and fishing area (Figure 4.13). The attribute SourceID points to the source of data extraction.

Catch Type Dimension
This dimension contains attributes detailing the nature of the sample, or catch type, and the fishing


Figure 4.13: Water Body Dimension

Figure 4.14: Catch Type Dimension

conditions prevailing at the time when the fish was caught. It includes the attributes catch type, fishing time, season, and moon (Figure 4.14). The attribute SourceID ties the data to its extraction source.

Species Dimension
Contains attributes detailing the hierarchy and levels in the nomenclature of a given fish species. It includes the scientific names, abbreviations of scientific names, and common names for attributes such as Kingdom, Phylum, Genus, Family, Order, Class and Species (Figure 4.15). The attribute SourceID points to the source of the data.

Fishing Gear Dimension
Contains attributes detailing the characteristics of the fishing gear used to catch the fish sampled. Attributes such as gear type, size, fleet, ply and operation are included (Figure 4.16). The attribute SourceID links the data to the source of extraction.

Sex-Maturity Dimension
Contains attributes detailing the sexual characteristics and the maturity state of the fish sampled. Attributes such as sex, maturity, gonad state, fat content, and stomach fullness are included (Figure 4.17). The attribute SourceID ties the data entry to its origin or extraction source.

Prey Type Dimension
Contains attributes that depict the type of prey eaten by a given fish. The attributes PreyName Short


Figure 4.15: Species Dimension

Figure 4.16: Fishing Gear Dimension

Figure 4.17: Sex and Maturity Dimension


Figure 4.18: Prey Type Dimension

Figure 4.19: Length Dimension

and Prey Name are included (Figure 4.18). The attribute SourceID points to the data source.

Length Dimension
Contains the attribute detailing the length sizes of fish sampled. It has the attribute length (Figure 4.19). The attribute SourceID ties the data to its extraction source.

4.4 System Development

This section covers the physical design, the data staging design, and the development of the system. Because the data warehouse system needed to run smoothly within the current software environment in FIRRI, Microsoft SQL Server 2005 was chosen as the database for the initial development of the data warehousing system; it was also the only database technology readily available to the researcher.

Microsoft SQL Server 2005 provides the SQL Server Management Studio and the SQL Business Intelligence Development Studio, which were used for the development of the databases and the ETL tools, respectively. These studios include SQL Server 2005 Integration Services (SSIS) and SQL Server 2005 Analysis Services (SSAS). SSIS has a set of built-in tasks, containers, transformations, and data adapters that can eliminate the need to write any lines of code during warehouse development. These SSIS features were used during data warehouse development.

4.4.1 Database Development

The staging database and the data warehouse, for use in the fisheries data warehouse, were created using SQL Server Management Studio. The staging area was divided into two parts for use in data transformation and validation. The second part of the staging area is used to verify that the right transformations have been carried out on the data before the data is loaded into the data warehouse. Dimensional tables were then created in each of the databases.

Figure 4.20: Excel Source Adapter Extraction

Figure 4.21: Flat File Source Adapter

4.4.2 Data Extraction, Transformation, and Load (ETL)

Data extraction and load tools were developed using SSIS, found in SQL Business Intelligence Development Studio. The extraction, transformation, and load of data into the warehouse were divided into: extraction of data from source tables into the staging database; data transformation and cleaning before load into the data warehouse; data load into the warehouse; and development of cubes before deployment to the warehouse server. The ETL packages can connect to a wide variety of data sources. These were developed in the form of packages within SSIS projects. When the different drivers are connected to a package, the package can extract data from flat files, Excel spreadsheets, XML documents, or tables and views in relational databases. A package can connect to relational databases using .NET and OLE DB providers, and to legacy databases using ODBC drivers. Figures 4.20 and 4.21 show data flow tasks which have Excel, flat files, or databases as their source and destination systems.

Extracting data from the source systems into the Staging Area Before extracting data from the source tables, each row of data in the source tables was assigned a unique identifier (e.g. FecundityID in Figure 4.22) that was mapped to the SourceID column in the staging database. This unique identifier tied the data to its source table or file. In this way each data entry in the warehouse can be traced back to the source table, file or folder.
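The lineage mapping described above can be sketched as follows. This is an illustrative Python sketch, not the actual SSIS package; the table and column names (Fecundity, FecundityID) follow the example in Figure 4.22, and the SourceID format is an assumption.

```python
def tag_with_source(rows, source_table, id_column):
    """Copy each source row, mapping its unique identifier to SourceID."""
    tagged = []
    for row in rows:
        staged = dict(row)
        # The source row's unique identifier becomes the lineage key,
        # so each warehouse entry can be traced back to its source table.
        staged["SourceID"] = f"{source_table}:{row[id_column]}"
        tagged.append(staged)
    return tagged

source_rows = [
    {"FecundityID": 101, "Species": "Oreochromis niloticus", "EggCount": 350},
    {"FecundityID": 102, "Species": "Lates niloticus", "EggCount": 4200},
]
staged = tag_with_source(source_rows, "Fecundity", "FecundityID")
print(staged[0]["SourceID"])  # Fecundity:101
```

With this scheme, comparing SourceID values in the warehouse against the original identifiers is enough to trace any row back to its origin.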

The first step in creating the packages was to create an Integration Services project. The project included templates for objects (data sources, data source views, and packages) used in the data transformation solution. Connection managers, which connect packages to data sources and destinations, such as the Excel connection manager, OLE DB connection manager, or Flat file connection manager, were then added to the package. After creating connection managers for source and destination data, Data Flow tasks were added to the package. An example of a package that has data flow tasks added to the control flow is presented in Figure 4.23. The Data Flow tasks encapsulate data flow engines that move data between sources and destinations, and provide the functionality for transforming, cleaning, and modifying data as it is moved. Most of the extract, transform, and load (ETL) processes occur in the Data Flow tasks.

Figure 4.22: Unique Identifiers for Rows of Data

Figure 4.23: Dimension and Fact Table Load

Source and destination adapters that point to source and destination tables were then defined, with a connector joining the two, as shown in Figures 4.20 and 4.21. The data flows between abstracted sources and destinations that do not contain connectivity information, but instead contain references to connection managers (e.g. localhost.Biodiversity Staging, Excel Connection Manager, Prey Connection Manager) that define physically where the data sources and destinations are. The data flows for extracting the data from the source systems and populating the dimension and fact tables in the staging area are similar. To ensure high-speed data copying, transformations were not performed on the data while it is moving from the source file to the staging destination table. Packages used to populate the dimension and fact tables were developed, as outlined above, for each of the dimension models.

Figure 4.24: Foreach Loop Containers

The package that populates the Biology dimension model is designed to demonstrate the ability of a package to iterate through any number of files in a folder and extract data from multiple file sources. It uses the Foreach Loop container (Figure 4.24). When the package is run, the Foreach Loop container iterates through a collection of files in a folder. Each time a file is found that matches the set criteria, the Foreach Loop container updates a variable with the file name. This causes the connection manager to connect to a different file, and the data flow task processes a different data set and loads it into the staging area.
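The iteration pattern above can be sketched in Python (the real package uses an SSIS Foreach Loop container, not Python; the folder contents and file-name pattern here are hypothetical):

```python
import glob
import os
import tempfile

def load_folder(folder, pattern, load_file):
    """Process each matching file, as the Foreach Loop container does."""
    loaded = []
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        # Each iteration updates the "file name" variable (here, `path`),
        # so the same data flow processes a different data set.
        load_file(path)
        loaded.append(os.path.basename(path))
    return loaded

# Demonstrate with a temporary folder of (hypothetical) biology data files.
with tempfile.TemporaryDirectory() as folder:
    for name in ("biology_2004.csv", "biology_2005.csv", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    staged = []
    loaded = load_folder(folder, "biology_*.csv", staged.append)
print(loaded)  # ['biology_2004.csv', 'biology_2005.csv']
```

Only files matching the set criteria (`biology_*.csv`) are picked up; `notes.txt` is skipped, mirroring how the container filters the file collection.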

To enable centralised extraction of data from source systems, a package was created that runs all the packages that extract data from source systems into the staging area (Figure 4.25). At runtime, SQL queries that truncate the dimension and fact tables are executed first, using Execute SQL tasks (Figure 4.26). Package execution tasks, which execute the packages that populate the different dimensional models, then run (Figure 4.25).
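The run order of the master package can be sketched as below. This is a sketch only; the table and package names are hypothetical, and in the real system the truncation and child-package execution are SSIS Execute SQL and Execute Package tasks.

```python
def run_master(tables, packages, execute_sql, execute_package):
    """Truncate staging tables first, then run each extraction package."""
    for table in tables:
        execute_sql(f"TRUNCATE TABLE {table};")  # Execute SQL task
    for package in packages:
        execute_package(package)                 # Execute Package task

# Record the execution order in a log list for demonstration.
log = []
run_master(
    ["dbo.DimWaterBody", "dbo.FactBiology"],
    ["ExtractBiology.dtsx", "ExtractCatch.dtsx"],
    execute_sql=log.append,
    execute_package=log.append,
)
print(log[0])  # TRUNCATE TABLE dbo.DimWaterBody;
```

Truncating before any extraction runs guarantees the staging area holds only the current batch.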

Data Cleaning and Transformation


Before being loaded into the data warehouse, data extracted from the multiple files was cleaned or transformed using built-in transformations contained in SSIS. Surrogate keys for the fact tables are generated and assigned before the data is loaded into the warehouse. The control flow for the data cleaning and surrogate key generation is presented in Figure 4.27. At runtime, before the tasks that clean the dimension tables and assign surrogate keys are executed, an SQL task is used to find the prevailing maximum dimension / surrogate key. The task passes the maximum surrogate key on to the data flow as a variable.

Figure 4.25: Centralised Running / Execution of Packages

Figure 4.26: Execute SQL Task Editor

Figure 4.27: Data Cleaning and Surrogate Key Generation Control Flow

Figure 4.28: Cleaning the Dimension-Data Flow and Generating Surrogate Keys

In the dimension load tasks, the fuzzy lookup is used to find data rows with spelling mistakes and correct them (Figure 4.28). The fuzzy lookup adapter looks up the correct spellings in a reference table and replaces the column entry that has the wrong spelling or a missing data entry (Figures 4.29 and 4.30). After correction of mistakes, the data flow is passed through a slowly changing dimension editor that compares data sets in the data flow with those in the destination table (Figure 4.28). New data rows are passed through, while rows with changes are passed to an OLE DB command adapter that updates the changing entry in the destination table.
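A rough analogue of the fuzzy lookup step: close misspellings are replaced by their best match from a reference table, while unmatched values pass through unchanged. Python's `difflib` stands in for SSIS's fuzzy matching algorithm, and the reference species list and cutoff are assumptions for illustration.

```python
import difflib

# Hypothetical reference table of correct species names.
REFERENCE = ["Oreochromis niloticus", "Lates niloticus", "Rastrineobola argentea"]

def fuzzy_correct(value, reference=REFERENCE, cutoff=0.8):
    """Replace a misspelled entry with its closest reference-table match."""
    matches = difflib.get_close_matches(value, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else value

print(fuzzy_correct("Oreochromis niloticas"))  # Oreochromis niloticus
```

The cutoff controls how aggressive the correction is: too low and distinct names get merged, too high and genuine misspellings slip through, the same trade-off the fuzzy lookup's similarity threshold governs.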


Figure 4.29: Fuzzy Lookup

Figure 4.30: Correcting Spelling Mistakes and Adding Missing Data Entries


Figure 4.31: Sorting the Data Flows

For new data sets, a dimension key is generated. The source data is split into two by a multicast adapter (Figure 4.28). One path only performs a sort (Figure 4.31) to prepare for the merge join. The other path first sorts and removes rows with duplicate sort values, then the script component is used to generate and assign the surrogate keys (Figure 4.32). The maximum dimension key value, which was passed to the flow as a variable, is incremented by the script component for every row that passes through. This adds a surrogate key value to the data flow. The two data flows are then inner joined on the sort keys using the merge join transformation (Figure 4.33), resulting in an updated data flow with new surrogate keys. The data flow from the source is then mapped onto the destination dimension table (Figure 4.34). This is done for all dimension tables.
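The script component's key-generation logic reduces to the sketch below: start from the prevailing maximum key (found by the earlier SQL task) and increment it once per row. This is an illustrative Python version, not the actual SSIS script component; row contents and the column name `DimensionKey` are hypothetical.

```python
def assign_surrogate_keys(new_rows, max_existing_key):
    """Assign consecutive surrogate keys starting above the current maximum."""
    keyed = []
    next_key = max_existing_key
    for row in new_rows:
        next_key += 1  # incremented for every row that passes through
        keyed.append({**row, "DimensionKey": next_key})
    return keyed

new_rows = [{"WaterBody": "Lake Victoria"}, {"WaterBody": "Lake Kyoga"}]
keyed = assign_surrogate_keys(new_rows, max_existing_key=57)
print([r["DimensionKey"] for r in keyed])  # [58, 59]
```

Starting above the existing maximum guarantees new keys never collide with keys already in the dimension table.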

In the fact-table load tasks, the conditional split adapter is used to find rows with missing column entries and divert them into an error table for further management and possible cleaning or addition of missing data (Figure 4.35). Before the bad data is loaded into the error table, the audit transformation adds information about the task and package where the error was detected so as to enable corrections. Good data is passed on to a slowly changing dimension editor that compares the data flow with the destination fact table. Any data that has the same unique ID as a data entry in the destination table is not passed through, while data with changes is passed to an OLE DB command for update of the corresponding entry in the destination table.
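The conditional-split-plus-audit pattern can be sketched as below; the package and task names are hypothetical, and the real audit transformation records rather more execution metadata than shown here.

```python
def split_and_audit(rows, required, package, task):
    """Divert rows with missing required columns to an audited error output."""
    good, errors = [], []
    for row in rows:
        if all(row.get(col) not in (None, "") for col in required):
            good.append(row)
        else:
            # Audit columns record where the error was detected,
            # making later correction possible.
            errors.append({**row, "AuditPackage": package, "AuditTask": task})
    return good, errors

rows = [
    {"SourceID": "Fecundity:101", "Length": 24.5},
    {"SourceID": "Fecundity:102", "Length": None},  # missing entry
]
good, errors = split_and_audit(rows, ["SourceID", "Length"],
                               "LoadBiology", "FactCleanTask")
print(len(good), len(errors))  # 1 1
```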

New data flows are passed on to lookup transformations that look up surrogate keys in the dimension tables and assign them to the corresponding foreign keys in the fact tables (Figure 4.36) before the data is inserted into the destination fact table. In the event of an error, the error is passed over to an error flow, and audit information is added to it. A union join is used to join all error data flows before the data flow that has errors is inserted into an error table (Figure 4.35). This process is the same for all the fact tables.

Figure 4.32: Surrogate Key Generation

Figure 4.33: Inner Joining Two Data Flows

Figure 4.34: Mapping Source Data to the Destination Table

Figure 4.35: Fact-Data Cleaning and Transformation Data Flow Task

Loading Data into the Warehouse


The package developed for loading data into the warehouse is presented in Figure 4.37. Whereas the data flows loading the warehouse dimensions (Figure 4.38) are first passed to a sort editor that removes any duplicated rows of data before being passed on to a slowly changing dimension adapter, the flows that populate the fact tables (Figure 4.39) go to the slowly changing dimension adapter directly, without a sort adapter. In both cases the slowly changing dimension transformation adapter compares incoming data with that in the warehouse. If the unique identifier of a row of incoming data matches that of data in the destination table, and no changes have been made to the column entries, the row is not passed through. If the unique identifier matches that of data in the destination table, but one or more of the columns in the incoming data has changed, the row is directed to an OLE DB Command adapter that updates the concerned row in the destination table. If the unique identifier of the incoming data flow has no match in the destination table, then the incoming row is passed on as new data output, and the destination adapter inserts it as new data in the warehouse table. This process is replicated for all fact and dimension tables.
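The three-way routing described above can be sketched as follows. This mirrors the slowly changing dimension transformation's outputs, not its implementation; the row shapes and `SourceID` values are hypothetical.

```python
def scd_route(incoming, warehouse):
    """Route rows: warehouse maps SourceID -> existing row.
    Returns (inserts, updates); unchanged rows are dropped."""
    inserts, updates = [], []
    for row in incoming:
        existing = warehouse.get(row["SourceID"])
        if existing is None:
            inserts.append(row)       # no match: new data output
        elif existing != row:
            updates.append(row)       # match with changes: update path
        # identical rows are not passed through
    return inserts, updates

warehouse = {"F:1": {"SourceID": "F:1", "Length": 24.5}}
incoming = [
    {"SourceID": "F:1", "Length": 25.0},  # changed  -> update
    {"SourceID": "F:2", "Length": 30.1},  # unmatched -> insert
]
inserts, updates = scd_route(incoming, warehouse)
print(len(inserts), len(updates))  # 1 1
```

Dropping unchanged rows is what makes repeated loads idempotent: re-running the package against unmodified sources leaves the warehouse untouched.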

4.4.3 Analysis Cubes

The analysis cubes were developed using SQL Server Analysis Services (SSAS). Analysis Services is a middle-tier server for online analytical processing (OLAP) and data mining. The Analysis Services system includes a server that manages multidimensional cubes of data for analysis and provides rapid client access to cube information. Analysis Server is the server component of Analysis Services that is specifically designed to create and maintain multidimensional data structures and provide multidimensional data in response to client queries. The structure of the multidimensional cubes developed, showing the data view of all fact and dimension tables, is presented in Figure 4.40.

Figure 4.36: Surrogate Key Assignment

Figure 4.37: Loading Data into the Warehouse

Figure 4.38: Example of Warehouse Dimension Table Data Flow Task

Figure 4.39: Example of Warehouse Fact Table Data Flow Task

4.4.4 Enduser Application

Endusers will access the data warehouse through Microsoft Excel. Microsoft Excel was chosen because most of the endusers in FIRRI are already well versed in the use of Excel, and because Excel has an add-in, the Microsoft Office Excel Add-in for SQL Server Analysis Services, that works well with SQL Server Analysis Services cubes. The add-in provides analysis capabilities and flexible reporting for data imported into Excel from Analysis Services cubes. By invoking the add-in from within Excel, the enduser can import data from the Analysis Services cubes, use Analysis Services techniques to analyze the data, and then, leveraging their existing Excel skills, use Excel functionality to manipulate and present the data in reports. From within Excel, the endusers can use Excel formatting and calculation features, combine data from multiple dimensions, use drillthrough to see source data, drill up and drill down, expand and collapse, isolate and eliminate, and pivot the data.

The enduser has the option to use the pivot table to generate a report like the one presented in Figure 4.41, or to use the cube analysis add-in to create a report such as the one presented in Figure 4.42. The enduser can then format the report and produce colourful charts such as the one presented in Figure 4.42.

4.4.5 System Validation

This section presents the results of the different mechanisms put in place to validate the data warehouse system developed. The system can extract data from multiple sources and centralise them in one location as planned. This is evidenced by the availability of data from all source systems in the warehouse. Comparison of the unique identifiers for the data in the source systems with those in the data warehouse shows they are the same. Additional data warehouse verification, by listing actual rows from randomly selected tables in both the source (Figure 4.22) and the data warehouse (Figure 4.43), also shows an exact match, further confirmation that the data warehouse is functioning as required. The unique identifier, referred to as FecundityID in the source system, corresponds to the identifier in the data warehouse, where it is referred to as the SourceID.

Figure 4.40: Structure of Analysis Data Cube

Figure 4.41: Length Frequency Distribution of Oreochromis niloticus

Figure 4.42: Maximum Weight of Selected Fish Species Across 4 Quarters

Figure 4.43: Check of Rows Written to the Data Warehouse

SSIS has a progress log that shows the time package execution started and ended, the data source and its destination, the number of rows in the source table, and the number of rows written (extracted successfully) (Figure 4.44). This inbuilt mechanism validates the execution of the package. In addition, there is colour coding of the adapters and transformations in a package as it executes. Yellow indicates the package is executing, while red shows that there is an error in the package and execution was not successful. Green indicates that execution was successful, and the number of rows written is also indicated on the connector between the source and destination (Figure 4.45). Examination of the system logs shows that the number of records in the source systems is an exact match with that in the data warehouse. Equality of these counts indicates that records were not left out due to an error during the ETL or simple load process. This was further verified by the absence of errors (as opposed to warnings) in the exception reporting by the ETL tool.
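The count comparison described above amounts to the check below. The counts and table names are hypothetical; in practice the numbers come from the SSIS progress log and from row counts queried in the source and warehouse databases.

```python
def validate_counts(source_counts, warehouse_counts):
    """Return the tables whose source and warehouse row counts disagree."""
    return [table for table, n in source_counts.items()
            if warehouse_counts.get(table) != n]

source_counts = {"Fecundity": 1250, "CatchAssessment": 9804}
warehouse_counts = {"Fecundity": 1250, "CatchAssessment": 9804}
print(validate_counts(source_counts, warehouse_counts))  # []
```

An empty result means no records were dropped during the ETL or load; any table named in the result warrants inspection of the corresponding package's error output.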


Figure 4.44: System Validation


Figure 4.45: Rows Written During Data Load

4.5 Conclusions, Limitations, and Future Work

4.5.1 Conclusions

This project focused on verifying whether designing and implementing a data warehousing system in FIRRI would bring about centralisation of storage and retrieval of fisheries data and information. Results show that a data warehouse can greatly improve the storage, retrieval, and dissemination of fisheries information. Given that the data warehouse enables aggregation of data and information from different source systems, easy execution of complex queries, and real-time dissemination and retrieval of data and information, one cannot underestimate its power in positively affecting the operations of the fisheries sector. It provides evidence that centralised data storage, information retrieval, and reporting in FIRRI are both possible and attainable.

4.5.2 Limitations

In the course of this study, a number of problems were encountered. Time and financial constraints were the major ones. Owing to these limitations, it was not possible to develop a web interface for the warehouse, as the preferred software, WebIntelligence, could not be procured.


4.5.3 Future Work

The data warehouse provides a mechanism for extracting data from any type of database management system on any networked information system, to the extent of facilitating data transmission and exchange on the Internet. This is vital for the dissemination of information to fisheries stakeholders who are not within FIRRI. Since in this study it was only possible to design and develop a data mart, future work should focus on developing an enterprise-wide data warehouse that can be accessed by endusers even via the World Wide Web.



Appendix 1: Lead Scientists Questionnaire


(i). INTRODUCTION I am carrying out a study on the use of data warehousing in fisheries. This interview is aimed at finding out the kind of information you would want out of the warehouse and the way it should be formatted and presented.

(ii). RESPONSIBILITIES Describe FIRRI and its relationship to the rest of the fisheries sector. What are your primary responsibilities?

(iii). RESEARCH OBJECTIVES AND ISSUES What are the objectives of FIRRI? What are its top priority research goals? What functions and departments within FIRRI are most crucial to ensuring that these key success factors are achieved? What role do they play? How do they work together to ensure success? What are the key research issues you face today? Is there anything that prevents you from meeting your research objectives? Where does FIRRI stand in the use of information technology?

(iv). ANALYSES REQUIREMENTS What role does data analysis play in decisions made by fisheries managers? What key information is required to make or support the decisions you make in the process of achieving your goals and overcoming obstacles? How do you get this information today? Is there other information, not available to you today, that you believe would have a significant impact on helping meet your goals? Which reports do you currently use? What data on the report is important? How do you use the information? If the report were dynamic, what would the report do differently? What analytic capabilities would you like to have?

Thank You


Appendix 2: Information System Audit Questionnaire


(i). INTRODUCTION I am carrying out a study on the use of data warehousing in fisheries. This interview is aimed at finding out the kind of information you would want out of the warehouse and the way it should be formatted and presented.

(ii). RESPONSIBILITIES Describe FIRRI and its relationship to the rest of the fisheries sector. What are its primary responsibilities? Which interest groups does it support?

(iii). USER SUPPORT / ANALYSES AND DATA REQUIREMENTS What is the current process used to disseminate information? What tools are used to access/analyse information today? Who uses them? Are you asked to perform routine analyses? Do you create standardised reports? Describe typical ad hoc requests. How long does it take to fulfil these requests? What is the technical and analytical sophistication of the users? What is the biggest bottleneck/issue with the current data access process?

(iv). DATA AVAILABILITY AND QUALITY Which source systems are used for frequently-requested information? How often is the data updated? Availability following update? How much history is available? What are the known bottlenecks in current source systems? Do you currently have common source files? Who maintains the source files? How are changes captured? What else should I know about FIRRI and its information systems? What must this project accomplish to be deemed successful? Thank You


Appendix 3: End-User Questionnaire


(i). INTRODUCTION I am carrying out a study on the use of data warehousing in fisheries. This interview is aimed at finding out the kind of information you would want out of the warehouse and the way it should be formatted and presented.

(ii). RESPONSIBILITIES Describe FIRRI and its relationship to the rest of the fisheries sector. What are your primary responsibilities?

(iii). RESEARCH OBJECTIVES AND ISSUES What are the objectives of FIRRI? What are its top priority research goals? What are the key research issues you face today? Describe your research disciplines. How do you distinguish between research disciplines? How do you categorise research disciplines?

(iv). ANALYSES REQUIREMENTS What type of routine analysis do you currently perform? What data is used? How do you currently get the data? What do you do with the information once you get it? What analysis would you like to perform? Are there potential improvements to your current method/process? Which reports do you currently use? What data on the report is important? How do you use the information? If the report were dynamic, what would the report do differently? What analytic capabilities would you like to have? Are there specific bottlenecks to getting at information? How much historical information is required?

What must this project accomplish to be deemed successful? Thank You

