ORIGINAL ARCHIVAL COPY
LAND USE EFFECTS ON WATER QUALITY: BUILDING A FRAMEWORK FOR
CHICAGO RIVER WATERSHED
BY
NAILA GHIDEY ISMAIL MAHDI
DEPARTMENT OF
CIVIL, ARCHITECTURAL, AND ENVIRONMENTAL ENGINEERING
Submitted in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy in Environmental Engineering
in the Graduate College of the
Illinois Institute of Technology
Approved
Adviser
Chicago, Illinois
May 2012
UMI Number: 3529157
All rights reserved
Published by ProQuest LLC 2012. Copyright in the Dissertation held by the Author.
Microform Edition ProQuest LLC.
All rights reserved. This work is protected against
unauthorized copying under Title 17, United States Code.
Copyright by
NAILA GHIDEY ISMAIL MAHDI
May 2012
ACKNOWLEDGEMENT
I am deeply grateful to my advisor, Professor Krishna Pagilla, for his constant
support. Without his help this work would not have been possible. I would also like to
thank the members of my committee for their input. Special thanks go to Dr. Tzuoh-Ying
Su of the U.S. Army Corps of Engineers (USACE), Chicago District, for providing
information and data.
I am greatly indebted to my dear husband Haithum Elhadi for his huge support
and assistance. I dedicate this thesis to him and to our wonderful children Sapheya,
Nadia, Nour and Yahia.
TABLE OF CONTENTS
ACKNOWLEDGEMENT
LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS
ABSTRACT
CHAPTER
1. INTRODUCTION
1.1 Introduction
1.2 Statement of the Problem
1.3 Goals of the Study
1.4 Objectives of the Study
1.5 Overview of the Thesis
2. LITERATURE REVIEW AND THEORETICAL BACKGROUND
2.1 Introduction
2.2 Land Use Effect in Urban Watershed
2.3 Regulations
2.4 Watershed Modeling
2.5 Data Integration and Data Warehouse
2.6 Conclusion
3. STUDY AREA
3.1 Introduction
3.2 Watershed Characteristics
3.3 Watershed Data Used in the Study
3.4 Watershed Elements
3.5 Conclusion
4. WATERSHED DATA WAREHOUSE
4.1 Introduction
4.2 Data Warehouse Technology
4.3 Watershed Data Warehouse
4.4 The Development of Watershed Data Warehouse
4.5 Graphical User Interfaces
4.6 Chicago River Watershed Data Warehouse
4.7 Conclusion
5. DATA DRIVEN MODEL TO PREDICT WATER QUALITY
5.1 Introduction
5.2 Methodology
5.3 Data Mining Methodology
5.4 Case Study
5.5 Implementation and Results
5.6 Conclusion
6. WATER QUALITY MODELING USING BASINS/HSPF
6.1 Introduction
6.2 Methodology
6.3 Watershed Simulation
6.4 HSPF Simulation Results
6.5 Total Annual Loads of Nutrients
6.6 Detailed Land Use Export Coefficients
6.7 Conclusion
7. CONCLUSIONS
7.1 Summary
7.2 Future Research Work
APPENDIX
A. DATA WAREHOUSE AND DATA MINING
B. BASINS/HSPF
BIBLIOGRAPHY
LIST OF TABLES
Table
2.1 Characteristics of major watershed models
3.1 Sources and types of potential pollutants in the study area
3.2 Sources' data description
3.3 Average annual North Side WRP effluent
4.1 The Bus Architecture Matrix for Watershed Data Warehouse
4.2 Entity definition
4.3 Watershed Data Warehouse tables' statistics
4.4 Watershed water quality fact data table
5.1 Predictors' properties
5.2 Prediction accuracy of regression models
5.3 Total nitrate classes
5.4 Prediction accuracy of ANN model
5.5 Prediction accuracy of logistic regression model
5.6 Prediction accuracy of SVM model
5.7 Prediction accuracy of decision tree model
5.8 Prediction accuracy of lazy learner model
5.9 Prediction accuracy of naïve Bayes model
6.1 Meteorological data required for HSPF
6.2 Some of the TIA percentages adopted for this study based on literature
6.3 General calibration/validation targets or tolerances for HSPF
6.4 Calibration/Sensitivity analysis for EIA equations for this study
6.5 Statistical results of hydrology calibration
6.6 Statistical results of hydrology validation
6.7 Statistical results of water quality calibration
6.8 Statistical results of water quality validation
6.9 Comparing physical and data driven models
6.10 Simulated annual loads of total nitrogen
6.11 Simulated annual loads of total phosphorus
LIST OF FIGURES
Figure
1.1 Elements of the research topic
2.1 Major land use areas in USA
2.2 Components of a typical watershed/hydrologic model
2.3 Structure chart for PERLND module
2.4 Structure chart for IMPLND module
2.5 Structure chart for RCHRES module
2.6 A flow diagram of the hydrological components of HSPF
3.1 Study area
3.2 Urban land use in Chicago
3.3 Locations of data sources
3.4 Basic watershed elements
4.1 Data warehouse components
4.2 Roll-up for the land use type dimension and related attributes
4.3 Star schema model for watershed water quality data mart
4.4 Watershed data warehouse multi-dimensional model
4.5 Graphical user interface for watershed data warehouse
4.6 An ad hoc analysis example for watershed data warehouse
4.7 Water quality and quantity stations used in the watershed assessment
4.8 TKN historical data
4.9 Total nitrates historical data
4.10 Total phosphorus historical data
4.11 N/P ratio for upstream station
4.12 N/P ratio for downstream station
4.13 Dissolved oxygen historical data
4.14 DO vs. water temperature for upstream station
4.15 DO vs. water temperature for downstream station
4.16 Water temperature vs. air temperature
5.1 Data mining methodology
5.2 k-fold cross validation method
5.3 Histograms of attributes
5.4 Scatter plot matrix of attributes
5.5 Decision tree regression model
5.6 Actual vs. predicted total nitrates
6.1 The Chicago River Watershed delineation process using BASINS
6.2 Schematic created by WinHSPF for the upper Chicago River subbasins
6.3 GenScn window where performance of model was evaluated
6.4 Simulation of flow for calibration period
6.5 The duration curve for calibration period
6.6 Observed vs. simulated flow scatter plot for calibration period
6.7 Simulation of flow for validation period
6.8 The duration curve for validation period
6.9 Observed vs. simulated flow scatter plot for validation period
6.10 Simulation of total nitrates for calibration period
6.11 Simulation of total ammonium for calibration period
6.12 Simulation of orthophosphate for calibration period
6.13 Simulation of total nitrates for validation period
6.14 Simulation of total ammonium for validation period
6.15 Simulation of orthophosphate for validation period
6.16 Point and nonpoint source nutrient loadings
6.17 Land use area in Upper Chicago River Basin
6.18 Total nitrogen loads in Upper Chicago River Basin
6.19 Total phosphorus loads in Upper Chicago River Basin
6.20 Average export coefficients for total nitrogen
6.21 Average export coefficients for total phosphorus
LIST OF SYMBOLS
Symbol Definition
ANN Artificial Neural Network
BAM Bus Architecture Matrix
BASINS Better Assessment Science Integrating Point & Non-point Sources
CMAP Chicago Metropolitan Agency for Planning
CWA Clean Water Act
DM Data Mining
DO Dissolved Oxygen
DW Data Warehouse
EC Export Coefficient
EIA Effective Impervious Area
EPA Environmental Protection Agency
GIS Geographical Information System
GUI Graphical User Interface
HSPF Hydrological Simulation Program-FORTRAN
IEPA Illinois Environmental Protection Agency
MAE Mean Absolute Error
MWRDGC Metropolitan Water Reclamation District of Greater Chicago
NPS Nonpoint Sources
NSE Nash-Sutcliffe Efficiency
NWS National Weather Service
PME Percent Mean Error
PS Point Sources
RAE Relative Absolute Error
RMSE Root Mean Square Error
RRSE Root Relative Squared Error
ROC Receiver Operating Characteristic
SQL Structured Query Language
SVM Support Vector Machines
TIA Total Impervious Area
TKN Total Kjeldahl Nitrogen
TMDL Total Maximum Daily Loads
TN Total Nitrogen
TP Total Phosphorus
USEPA US Environmental Protection Agency
USGS U.S. Geological Survey
WDW Watershed Data Warehouse
WEKA The Waikato Environment for Knowledge Analysis
WQS Water Quality Standards
ABSTRACT
The purpose of this study is to introduce a framework that enables a holistic
watershed approach to modeling the dynamics of water quality and land use in a highly
urbanized watershed.
The land use-water quality relationship is complex and has not been adequately
addressed for highly urbanized watersheds. Factors such as inadequate urban planning,
increasing impervious area and the dynamics of population growth contribute to this
complexity. Moreover, point sources are generally easier to identify and control than
nonpoint sources such as urban storm water runoff. Land use in the watershed affects both
the quantities and the transport pathways of pollutant inputs. Examining the factors that
govern the relationship between different land uses and water quality within a watershed
can therefore give insight into existing and potential sources of contamination.
The two backbone concepts in this study are the holistic watershed perspective
and the role of historical data records in the assessment, modeling and integration tools
of the watershed framework. Analysis of the records explains watershed conditions,
identifies the major problem areas, and justifies the modeling and post-analysis
procedures. Data sources are valuable, but data availability, heterogeneity and
conformity are the main challenges in integrating them.
This research presents an approach to integrate watershed data in a single
repository, along with methodologies for analyzing and assessing the watershed using data
warehouse and data mining technologies. It introduces a multi-dimensional model that
incorporates 40 years' worth of watershed data from different source agencies in a
central repository and supports complex querying of watershed data and discovery of
trends and patterns in the data.
Using the developed central repository, this thesis also introduces data driven
modeling. Several regression and classification algorithms were assessed for their
suitability for predicting total nitrates from a few watershed attributes. The results show
acceptable prediction accuracy.
Five years of water quality simulation using the multi-purpose environmental
analysis system BASINS, coupled with the comprehensive, conceptual, continuous,
watershed-scale simulation model HSPF, yielded export coefficients for Level III
(detailed) land use in the Chicago River watershed. The water quality simulation approach
used in this research to generate the coefficients is a new contribution for the Chicago
River watershed and other highly urbanized watersheds.
The calibrated and validated continuous model can be used to investigate and
analyze different scenarios and possible future conditions, providing a planning tool for
regulatory environmental agencies. The data driven models developed can serve as an
operational tool for maintaining water quality parameters, especially if TMDLs and WQS
are developed for the Chicago River Watershed. The proposed framework can thus be
considered robust, given its integration, planning and operating techniques and tools.
Furthermore, an optimization tool is introduced in the future work section.
CHAPTER 1
INTRODUCTION
1.1 Introduction
The pollution of urban watersheds has become a serious problem that threatens
the urban ecological environment. Surface water quality issues are increasing in the
highly urbanized watersheds of the Chicago metropolitan area, as in most urban
watersheds in the United States. The sources of pollution and their contributions depend
strongly on the type of land use and land cover in the watershed. Although contributions
from point sources can be identified, quantified and controlled, the same cannot be said
for nonpoint source contributions.
1.2 Statement of the Problem
Urban storm water runoff is considered a major source of pollutants in highly
urbanized watersheds (Bian et al., 2011; Brezonik et al., 2002). It changes the quantities
of water available for direct runoff, streamflow and groundwater flow, and it considerably
affects the chemical, physical and biological processes in receiving water bodies. The
complexity of the factors that govern these processes and the random patterns of
precipitation make storm water runoff pollution difficult to control (Bian et al., 2011;
Zhu et al., 2008).
Nutrients, such as nitrogen and phosphorus, are essential for a healthy and diverse
aquatic environment. However, excessive amounts of these nutrients can have
undesirable effects on water quality, resulting in adverse changes in the biological and
aquatic life (USEPA, 2000). Potential risks to human health are also associated with the
growth of harmful algal blooms (Hamed et al., 2004). In the 1998 list of impaired waters,
the states reported that sedimentation was the leading cause of water quality impairment,
followed by nutrient contamination (USEPA, 2000).
Runoff from different types of land use carries different kinds of contaminants
and pollutants. For example, runoff from agricultural land carries high amounts of
nutrients and sediments, while runoff from developed urban areas may carry sodium and
sulfate from road salt treatment along with other materials such as rubber and metals
(Tong et al., 2002).
Moreover, different types of land cover can modify the hydrologic cycle, water
balance, water temperature and other land surface and water characteristics through the
changes they impose on processes such as evapotranspiration, infiltration, percolation,
sedimentation and erosion (Tong et al., 2002; LeBlanc et al., 1997). Thus, land use type
affects not only the amount of runoff and pollutant inputs but also the transport pathways
of those inputs (Tong et al., 2002).
Forested land typically contributes small amounts of nutrients, while land uses
that involve fertilization and soil disturbance contribute large amounts (Calderon, 2009).
The strong relationship between land use type and the quantity and quality of water is
well documented (Tong et al., 2002; Gburek et al., 1999).
1.3 Goals of the Study
Examining the factors that govern the relationship between different land uses and
water quality within a watershed can give insight into existing and potential sources of
contamination. For future planning, development and decision making, reliable analysis
and assessment tools are also needed to predict future water quality conditions under
various scenarios.
Watershed management has been accepted by water resource managers and policy
makers as an effective methodology for addressing the full range of concerns. It promotes
the development of coordinated programs to control point source contamination, reduce
polluted runoff, and protect drinking water sources (USEPA, 2001). To formulate sound
watershed management plans, it is essential to understand the intrinsic environmental
informatics of urban watersheds (Tong et al., 2009).
Previous studies that aggregated watershed elements to evaluate land use effects
on water quality did not adequately consider the detailed spatial and temporal aspects of
the urban watershed. Incorporating detailed land use and historical data records to
quantify impacts on water quality is the key element of the tools developed in this study.
Understanding the different watershed elements, especially those related to land use
impacts, will provide a better assessment of current conditions and a good indication of
future conditions under any planned land use development. Reviewing historical data
records for basic watershed elements such as water quality, water quantity, land use,
climate and watershed characteristics, and the interactions among them, allows a
thorough understanding of the past and present conditions of the watershed and supports
better decisions for the future.
This research provides a framework that develops watershed management
planning and policy making tools to assess, analyze and quantify detailed land use
effects on water quality in a watershed context. The framework consists of
methodologies and components for data integration, analysis and assessment, such as
data warehousing, data driven modeling, watershed assessment and watershed modeling.
An optimization approach that uses the watershed modeling outputs is introduced later.
Tools such as data mining techniques and watershed models are used to analyze,
describe and predict the behavior of the watershed and how it is affected by highly
urbanized land use.
1.4 Objectives of the Study
The purpose of this study is to understand and model the dynamics of nutrients in
a highly urbanized watershed. The effect of detailed urban land use on nutrient runoff to
water bodies in the Chicago River Watershed is investigated. Different tools are applied
to water quality, water quantity, point and nonpoint source, geospatial, meteorological
and land use data in a holistic watershed approach to examine nutrient pollution.
The Chicago River watershed is located in northern Illinois, drains
approximately 645 mi², and is 82% urban land use. The highly urbanized watershed has
recently faced issues such as the invasion of Asian carp and other water quality problems,
which prompted serious discussion of drastic actions such as hydrological separation of
the Great Lakes and Mississippi River basins, or even re-reversal of the Chicago River
itself.
United States policies and regulations, such as the Clean Water Act (CWA), were
created and are implemented to help maintain the quality of water resources in the
United States (IEPA, 2009). Under section 303(d) of the CWA, states are required to
develop lists of impaired waters; this program is the U.S. Environmental Protection
Agency's (EPA's) national tracking system for impaired waters. A state's 303(d) list
identifies waters where the required pollution controls are not sufficient to attain or
maintain applicable WQS, and states are required to establish prioritized Total
Maximum Daily Loads (TMDLs) for the identified waters.
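For reference, a TMDL is conventionally expressed as the sum of its allocations. The formulation below is the standard USEPA definition and is added here for clarity rather than taken from this study:

```latex
\mathrm{TMDL} = \sum \mathrm{WLA} + \sum \mathrm{LA} + \mathrm{MOS}
```

Here WLA denotes the waste load allocations for point sources, LA the load allocations for nonpoint sources and natural background, and MOS a margin of safety. The split between WLA and LA mirrors the point and nonpoint source distinction discussed in Section 1.1.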
TMDL development for the lakes and rivers of the Chicago River Watershed that EPA
lists as impaired waters (303(d) list) is still in progress. Little has been done for the
watershed so far: only a "Stage 1" TMDL report was recently presented, as partial
fulfillment, by the Illinois Environmental Protection Agency (IEPA) and the United States
Environmental Protection Agency (USEPA).
The purpose of that project was to develop TMDLs for impaired water bodies in a
portion of the watershed, the Upper North Branch of the Chicago River. The potential
causes of impairment proposed in the report for those segments were chloride, dissolved
oxygen, fecal coliform, pH, water temperature and total phosphorus. A final TMDL report
has not yet been published.
The framework proposed in this study provides tools to assess the watershed,
predict water quality parameters and quantify detailed land use effects on water quality,
and it could be used to help maintain any TMDLs and WQS developed for the watershed.
1.4.1 Strategy. The elements of the proposed framework are shown in Figure 1.1.
They consist of a watershed data warehouse component; a data analysis and watershed
assessment component; a modeling component that yields export coefficients; and finally
an optimization component.
A local watershed data warehouse (WDW) that integrates and aggregates different
types of available data from various agencies will be constructed. The WDW will make it
easy to access, retrieve and manage data records, resolve missing data issues, and
integrate, analyze and assess historical watershed data. Water quantity, water quality,
climate, land use and other watershed data can be stored and integrated to support
watershed assessment or to supply the data required for modeling, both in this study and
in similar studies in the area. The local WDW will help: 1) develop a deeper understanding
of the watershed, 2) establish watershed management decision making and analytics
capabilities, and 3) facilitate more meaningful stakeholder interactions.
Existing data integration methods are deficient in their ability to provide easy
access to synthesized data for the watershed, because monitoring records are usually
managed separately by different organizations. Retrieving data for watershed analysis
from these sources depends mainly on users' ability to navigate through them. Even the
systems built to alleviate this issue have proved deficient in their ability to provide
decision making tools and interfaces that allow navigation through the data records.
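The integration problem described above can be illustrated with a minimal Python sketch. It assumes two hypothetical agency feeds with different field names, date layouts and units; all names, fields and values are illustrative and are not the actual agency formats used in this study. Both feeds are mapped into one common record keyed by station, date and parameter.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical sketch: map records from two agency feeds with different layouts
# into one common Observation record. Feed layouts and field names are assumptions.


@dataclass(frozen=True)
class Observation:
    station: str
    day: date
    parameter: str
    value_mg_per_l: float
    source: str


# Assumed conversion factors from each reported unit to mg/L.
TO_MG_PER_L = {"mg/L": 1.0, "ug/L": 0.001}


def from_agency_a(row: dict) -> Observation:
    """Feed A (assumed): ISO date string, values already reported in mg/L."""
    return Observation(row["site"], date.fromisoformat(row["date"]),
                       row["param"].upper(), float(row["value"]), "A")


def from_agency_b(row: dict) -> Observation:
    """Feed B (assumed): separate year/month/day fields and an explicit unit."""
    value = float(row["result"]) * TO_MG_PER_L[row["unit"]]
    return Observation(row["station_id"], date(row["yr"], row["mo"], row["dy"]),
                       row["characteristic"].upper(), value, "B")


def integrate(feed_a, feed_b):
    """Merge both feeds into one list keyed by (station, day, parameter).

    When both feeds report the same key, the feed A record is kept because it is
    processed first; the result is sorted by station, day and parameter.
    """
    merged = {}
    for obs in [*map(from_agency_a, feed_a), *map(from_agency_b, feed_b)]:
        merged.setdefault((obs.station, obs.day, obs.parameter), obs)
    return sorted(merged.values(), key=lambda o: (o.station, o.day, o.parameter))
```

A production warehouse would also log which duplicate was discarded and handle unexpected units explicitly; in this sketch an unknown unit simply raises a `KeyError`.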
The framework proposed in this study develops a multi-dimensional data
integration model. This model makes it easy to investigate data at its most atomic level,
and hence to access, retrieve and integrate data flexibly across many different spatial and
temporal levels. Analysis of the historical data record gives insight into previous and
existing watershed conditions and the watershed's sensitivity to different parameters,
making it easy to concentrate either on the whole watershed or on a specific
subwatershed. A graphical user interface specifically tailored for the watershed is
introduced to facilitate access to the WDW and bring the benefits of the
multi-dimensional model to different stakeholders. An ad hoc analysis tool that allows
users to summarize data, perform analyses, and slice and dice data to assess the
watershed is also introduced. Data mining techniques are investigated to develop data
driven models that predict water quality parameters.
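As a sketch of how a multi-dimensional (star schema) model supports roll-up and slicing, the following Python example uses the standard-library `sqlite3` module with a fact table of measurements and two dimension tables. Table names, stations, subwatersheds and values are hypothetical placeholders, not the WDW design or data of this study.

```python
import sqlite3

# Hypothetical minimal star schema: one fact table of water quality measurements
# and two dimensions (station, date). Names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_station (station_key INTEGER PRIMARY KEY, name TEXT, subwatershed TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_water_quality (
    station_key INTEGER REFERENCES dim_station(station_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    parameter TEXT,
    value REAL
);
""")
conn.executemany("INSERT INTO dim_station VALUES (?, ?, ?)",
                 [(1, "Upstream", "North Branch"), (2, "Downstream", "Main Stem")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(1, 2005, 6), (2, 2005, 7), (3, 2006, 6)])
conn.executemany("INSERT INTO fact_water_quality VALUES (?, ?, ?, ?)", [
    (1, 1, "TN", 4.0), (1, 2, "TN", 6.0), (1, 3, "TN", 5.0),
    (2, 1, "TN", 8.0), (2, 3, "TN", 10.0),
])

# Roll-up: aggregate atomic measurements to the subwatershed-by-year level.
ROLLUP = """
SELECT s.subwatershed, d.year, AVG(f.value) AS mean_value, COUNT(*) AS n
FROM fact_water_quality AS f
JOIN dim_station AS s ON f.station_key = s.station_key
JOIN dim_date AS d ON f.date_key = d.date_key
WHERE f.parameter = ?
GROUP BY s.subwatershed, d.year
ORDER BY s.subwatershed, d.year
"""
rows = conn.execute(ROLLUP, ("TN",)).fetchall()

# Slice: fix one member of the date dimension (a single year), then summarize by station.
SLICE = """
SELECT s.name, AVG(f.value)
FROM fact_water_quality AS f
JOIN dim_station AS s ON f.station_key = s.station_key
JOIN dim_date AS d ON f.date_key = d.date_key
WHERE f.parameter = ? AND d.year = ?
GROUP BY s.name
ORDER BY s.name
"""
slice_2005 = conn.execute(SLICE, ("TN", 2005)).fetchall()
```

Because the measurements are stored at their most atomic grain, the same fact table answers both questions; only the grouping and filtering over the dimension tables change.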
The framework was built using a multi-purpose watershed-based analysis system
called Better Assessment Science Integrating Point & Non-point Sources (BASINS). A
watershed model, Hydrological Simulation Program-FORTRAN (HSPF), was used to
simulate the watershed behavior and to develop nutrient export coefficients for detailed
land use types. This continuous watershed simulation model takes into account detailed
land use and long-term simulation. The detailed land use representation applies the
effective imperviousness concept, which accounts for whether an impervious surface is
directly connected to a drainage system. The resulting nutrient export coefficients are
site-specific indicators that incorporate many watershed conditions and variables,
including hydrometeorological data, topographic data, land use management practices
and physical characteristics. These coefficients provide a numerical quantification of
each land use type's contribution, and they would serve as input to the multi-objective
optimization approach introduced later.
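The arithmetic behind export coefficients is simple once annual loads per land use are available; in this study those loads come from the calibrated HSPF simulation. The Python sketch below uses hypothetical land use names, areas (ha) and annual loads (kg/yr). It computes coefficients as load per unit area and then applies them to a changed land use scenario, which is one way such coefficients could feed an optimization.

```python
# Hypothetical illustration of export coefficients (EC): annual load per unit area.
# Land use names, areas (ha) and annual loads (kg/yr) are placeholders, not study results.


def export_coefficients(annual_loads_kg, areas_ha):
    """EC (kg/ha/yr) for each land use = simulated annual load / land use area."""
    return {lu: annual_loads_kg[lu] / areas_ha[lu] for lu in annual_loads_kg}


def total_load(coefficients, areas_ha):
    """Annual load (kg/yr) for a land use scenario = sum over land uses of EC x area."""
    return sum(coefficients[lu] * area for lu, area in areas_ha.items())


loads_kg = {"single_family": 1200.0, "commercial": 900.0, "open_space": 50.0}
current_ha = {"single_family": 400.0, "commercial": 150.0, "open_space": 250.0}
ec = export_coefficients(loads_kg, current_ha)  # 3.0, 6.0 and 0.2 kg/ha/yr

# Scenario: 50 ha of single-family land redeveloped as commercial land.
scenario_ha = {"single_family": 350.0, "commercial": 200.0, "open_space": 250.0}
scenario_load = total_load(ec, scenario_ha)
```

With these placeholder numbers the annual load rises from 2,150 kg/yr to 2,300 kg/yr. This linear EC x area form is what makes the coefficients convenient inputs for an optimization over land use allocations.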

[Figure 1.1: framework diagram. Components: Data Warehouse; Data Analysis & Watershed Assessment; Modeling & Export Coefficient; Optimization. Other box labels: Data Integration, Data Analysis, Data Presentation, Data Mining, Watershed Assessment.]
Figure 1.1. Elements of the proposed framework.

1.5 Overview of the Thesis
This work presents the theoretical background, including a detailed literature review
of the relevant theory and principles, in Chapter 2. Chapter 3 gives an overview of the
study area. Chapter 4 introduces the WDW and the multi-dimensional model, along with
watershed assessment and data mining results. Chapter 5 introduces the data driven
models developed to predict water quality parameters. Chapter 6 presents and discusses
the results of the water quality model. Chapter 7 concludes the dissertation, evaluates the
watershed framework, summarizes the most important findings of the investigation, and
outlines areas for future research, including a multi-objective optimization approach.
CHAPTER 2
LITERATURE REVIEW AND THEORETICAL BACKGROUND
2.1 Introduction
A watershed is a hydrologically connected geographical area in which all the water
drains to a common waterway (EPA, 2011). Water movement in the watershed can be
influenced by factors such as topography, soil composition and water recharge (e.g.,
precipitation) (ILEPA, 2009). Watersheds are important because their pollution sources
affect all downgradient areas, including the common waterway where they converge
(ILEPA, 2009).
In this study, the two backbone concepts are the holistic watershed perspective
and the role of historical data records in the assessment and modeling processes and in
building a watershed framework. The study is composed of four parts: building a WDW
that provides easy access to and management of data records; watershed analysis,
assessment and data mining; a data driven model that predicts water quality and
quantity through data driven algorithms; and water quality simulation using the
Hydrological Simulation Program-FORTRAN (HSPF) to simulate land use effects on
water quality (local export coefficients). A multi-objective optimization approach is
proposed for further investigation.
2.2 Land Use Effect in Urban Watershed
Urban areas contain much of the world's population, yet they cover a relatively
small proportion of the earth's land: just 2.6 percent in the United States (Figure 2.1)
(USDA, 2012). Nevertheless, urban areas can have fundamental ecological impacts on
water quantity and water quality (Donaldson, 2005).
Over the years, land use in the United States has undergone rapid and extreme
changes that altered the surface characteristics of watersheds and affected water quality
and quantity (Allan, 2004). Urban sprawl, inadequate urban planning, population
dynamics, increasing impervious area, and growth of the industrial and agricultural
sectors all endanger the quality and quantity of water (Calderon, 2009).
Knowledge about land use and land cover has always been important to national
plans to overcome problems of uncontrolled development, deteriorating environmental
quality, loss of prime agricultural lands, destruction of important wetlands, and loss of
fish and wildlife habitat (Anderson et al., 1976). Land use classification systems are
needed in the analysis of environmental processes and problems. To obtain information
on the different classes and categories of each land use type, land use can be classified at
more detailed levels, taking into account criteria of capacity, type and needs (Anderson
et al., 1976). For example, urban land use is a Level I category; residential land use is a
Level II category within it; and residential land can be further subdivided into
single-family units, multi-family units and so on (Level III). The following subsections
discuss different aspects of the effect of urban land use on surface water quantity and
quality, and how urban land use modifies land surface characteristics.
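The Level I/II/III hierarchy described above lends itself to aggregation: detailed (Level III) areas can be summed up to coarser levels, much like a roll-up along a land use dimension in a data warehouse. The Python sketch below follows the residential example in the text; the Level III class names, the commercial and industrial branches, and all areas are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical Level III -> (Level II, Level I) mapping following the example in the
# text (residential land subdivided into single-family and multi-family units).
# Level III class names and all areas below are illustrative placeholders.
HIERARCHY = {
    "single_family": ("residential", "urban_or_built_up"),
    "multi_family": ("residential", "urban_or_built_up"),
    "retail": ("commercial", "urban_or_built_up"),
    "light_industry": ("industrial", "urban_or_built_up"),
}


def roll_up(level3_areas, level):
    """Aggregate Level III areas to Level II (level=2) or Level I (level=1)."""
    if level not in (1, 2):
        raise ValueError("level must be 1 or 2")
    totals = defaultdict(float)
    for land_use, area in level3_areas.items():
        level2, level1 = HIERARCHY[land_use]
        totals[level2 if level == 2 else level1] += area
    return dict(totals)


level3_example = {"single_family": 120.0, "multi_family": 30.0,
                  "retail": 25.0, "light_industry": 40.0}
by_level2 = roll_up(level3_example, 2)  # residential 150.0, commercial 25.0, industrial 40.0
by_level1 = roll_up(level3_example, 1)  # urban_or_built_up 215.0
```

Keeping data at Level III and deriving coarser levels on demand preserves the detail needed for land-use-specific analyses while still supporting summaries at Level I or II.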

[Figure 2.1: pie chart of major land use areas in the USA. Forest-use land 28.7%; grassland pasture and range 25.9%; cropland 19.5%; special-use areas 13.1%; miscellaneous other land 10.1%; urban areas 2.6%. Source: USDA Economic Research Service.]
Figure 2.1. Major land use areas in USA (USDA, 2012)

2.2.1 Urban Land Use Effect on Surface Water. The effect of urbanization on
streams differs from one system to another; some systems suffer severely from relatively
minor impacts, while others are less sensitive (Smith, 2005). In urban areas, a large
percentage of the land is covered by impervious surfaces such as buildings, parking lots
and pavements. The impacts of these surfaces on watersheds have been documented for
aspects such as hydrology, climate and ecology (Rose et al., 200; Paul et al., 2001).
The effects of urban land use on the water quality of streams, rivers, lakes and
estuaries have been the basis of many studies over the years (Hanratty et al., 1998; Rai et
al., 1998; Bhaduri et al., 2000; Bhaduri et al., 2001). Streams in urban watersheds are
now recognized as fundamentally different from streams in forested, rural or agricultural
watersheds because of the large amount and rate of surface runoff from impervious cover
(Tong et al., 2009). Runoff volume and flood damage potential are much greater in urban
areas than in areas with other land uses (Weng, 2001). At the subwatershed scale, when
spatial variation in urbanization was considered, impacts on runoff and nitrogen were
high and directly proportional to the level of urbanization (Tang et al., 2005).
Watershed imperviousness has been the subject of many monitoring and modeling
studies, which have consistently shown that urban pollutant loads increase with
imperviousness (Cianfrani et al., 2006; Allan, 2004; Barnes et al., 2002; Beach, 2002;
Cappiella et al., 2001; Finkenbine et al., 2000; Schueler, 1994). Studies show that the
greater the impervious surface, the more significant the degradation observed in aquatic
resources and surface water quality (Tsegaye et al., 2006; Doll et al., 2002; Johnson et al.,
2001; Bhaduri et al., 2000; Arnold et al., 1998).
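A common way to summarize watershed imperviousness is an area-weighted average of the total impervious area (TIA) percentages assigned to each land use. The Python sketch below assumes the per-land-use TIA percentages are already known (for example, from literature values); the numbers in the usage note are hypothetical. It does not compute effective impervious area (EIA), which additionally depends on whether impervious surfaces connect directly to the drainage system.

```python
def weighted_imperviousness(land_uses):
    """Area-weighted total impervious area (TIA) percentage for a watershed.

    land_uses: iterable of (area, tia_percent) pairs, with areas in any consistent unit.
    Returns the impervious fraction of the total area, expressed as a percentage.
    """
    land_uses = list(land_uses)
    total_area = sum(area for area, _ in land_uses)
    if total_area <= 0:
        raise ValueError("total area must be positive")
    # Impervious area contributed by each land use = area x (TIA % / 100).
    impervious_area = sum(area * pct / 100.0 for area, pct in land_uses)
    return 100.0 * impervious_area / total_area
```

For example, `weighted_imperviousness([(100, 40), (50, 80), (50, 10)])` returns 42.5, because 85 of the 200 area units are impervious.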
2.2.2 Pollutants in Urban Streams. Different kinds of pollutants and contaminants
can degrade the quality of runoff from different types of land use. Runoff from highly
developed urban areas may contain sodium and sulfate from road deicers, and even
rubber fragments or heavy metals (Tong et al., 2002). A study in an eastern Illinois
watershed found that, relative to agricultural land use, urban land use was the main
source of nitrogen and phosphorus (Ahearn et al., 2005). Studies of urban land use in
Alabama and Ontario (Canada) reached the same conclusion (Silva et al., 2001; Basnyat
et al., 1999; Ahearn et al., 2005). Concentrations of total phosphorus in urban streams are
generally higher than in agricultural streams (Brett et al., 2005; USGS, 1999; Winger et
al., 2000; Donaldson, 2005). These elevated phosphorus levels were attributed to point
source pollution from wastewater treatment plants in urban areas, compared with
nonpoint source pollution associated with fertilizers in agricultural areas (USGS, 1999;
Robbins et al., 2001; Robbins et al., 2003; Donaldson, 2005).
2.2.3 Modifications Due to Urban Land Use. Changes in land use can modify land
surface characteristics, the water balance and the hydrologic cycle by altering patterns of
evapotranspiration, interception, infiltration, percolation and absorption (Tong et al.,
2002; LeBlanc et al., 1997). As a result, the quantity of water available for streamflow
and groundwater flow changes significantly, and the chemical, physical and biological
processes in the receiving water bodies are modified (Tong et al., 2002). A study that
classified surface water in urban areas found a strong correlation between the proportion
of urban land use, such as residential and industrial, and worsening water quality (Ren et
al., 2003). Although these land uses are unavoidable, as pollutant sources they can
greatly affect the hydrology and water quality of a watershed (Cotter et al., 2003).
2.2.4 Land Use Effect on Water Quality and Quantity. Although many studies have investigated the impacts of land use on water quantity and quality (Wu et al., 1993; Mattikalli et al., 1996; Tsihrintzis et al., 1998; Bouraoui et al., 1998), quantifying water quality in a river watershed on the basis of land use patterns is still under development (Tong et al., 2002; Tong et al., 2009). This is due to the complex relationship between land use patterns and water quality and quantity under different environmental and geographical settings (Tong et al., 2009).
Hydrological models coupled with geographic information systems (GIS) and remote sensing have proved to be powerful tools for conducting these kinds of studies (Conway et al., 2005; Wang et al., 2005). Other integrated approaches combine statistical and spatial analyses with hydrologic modeling to examine the effects of land use on water quality (Tong, 2007; Tong et al., 2002).
Most research relies on field studies and focuses on a local geographic scale and a small range of land use patterns (Wilson et al., 2011; Akhavan et al., 2010; Leon et al., 2010; Tong, 2006). Integrated approaches are needed that take a holistic view of the issue, integrate the different data records available for the area, and use different methods of analysis (Walton et al., 2009).
In order to conserve water resources and formulate sound watershed management plans, it is essential to understand the intrinsic environmental informatics of urban watersheds (Tong et al., 2009). This understanding provides a better assessment of current conditions and a good indication of future conditions if land use development plans are carried out.
2.2.4.1 Impacts on Water Quantity. A Cook County stormwater management study detailed the impacts of urban land use in the Chicago area. The study stated that land development has clearly altered the region's runoff patterns by converting pervious land to impervious land and by considerably changing drainage patterns (MWRDGC, 2007).
As a result, the region's hydrology has shifted from groundwater-dominated to surface-water-dominated (MWRDGC, 2007). This shift has greatly increased the rate and volume of stormwater runoff and considerably reduced groundwater recharge. Changing runoff rates and volumes can create the typical impacts explained below:
Flooding. Flow rates have increased by 100 to 200 percent or more in urbanizing watersheds. Detention basins can help reduce this effect; however, cumulative increases in runoff volume tend to reduce detention effectiveness when the whole watershed is considered (MWRDGC, 2007).

Erosion. As development proceeds in an urbanizing watershed, increased runoff rates produce very high flow velocities in channels. This leads to the scouring and destabilization of stream banks (MWRDGC, 2007).
Destabilization. Altered flows stress aquatic life, both through high flows in the wet season and through low flows in the dry season. High-velocity storm flows tend to flush away natural substrates and organisms. In dry seasons, reduced and extended low flows result in siltation, which reduces stream depth, and in elevated water temperatures during the summer (MWRDGC, 2007).
2.2.4.2 Water Quality Impacts. High-density developments, such as commercial and industrial projects, were found to contribute more pollution to storm runoff than lower-density residential developments (MWRDGC, 2007). Some common water quality impacts of stormwater runoff are as follows:
Sediment Contamination. Runoff sediment may be toxic to some organisms because of high concentrations of heavy metals and organic compounds. Its high organic content may also exert a high oxygen demand as it decomposes in stream waters (MWRDGC, 2007).
Nutrient Contamination. High levels of nitrogen and phosphorus can stimulate excessive growth of algae and other undesirable aquatic plants, impairing the aesthetic and recreational value and the overall quality of the water body (MWRDGC, 2007).
Toxicity. Low dissolved oxygen levels, high pollutant concentrations, and elevated water temperatures aggravate toxicity problems for aquatic life. The decomposition of organic matter washed in by storm runoff tends to lower dissolved oxygen to low levels during the summer (MWRDGC, 2007).
Bacterial Contamination. The water quality standard for fecal coliform bacteria is frequently violated in urban water bodies after storm events. These violations reflect the presence of significant animal or human waste in the water (MWRDGC, 2007).
Salt Contamination. Salinity levels in urban watersheds are elevated by the salt used for road deicing. This may adversely affect certain plant communities and wetland species (MWRDGC, 2007).
Impairment of Recreational Waters. Urban runoff may reduce the recreation
potential of urban water bodies due to contamination problems (MWRDGC, 2007).
Water Temperature Elevation. Watershed urbanization increases water temperatures through the removal of natural shading and the reduction of base flows. Moreover, runoff is warmed as it flows over sun-heated impervious surfaces. Elevated water temperatures stress aquatic life and aggravate water quality problems (MWRDGC, 2007).
2.3 Regulations
United States policies and regulations, such as the Clean Water Act (CWA), were created and are implemented to help maintain the quality of water resources in the United States (IEPA, 2009). U.S. EPA requires each state to develop water quality standards (WQS). WQS are laws or regulations that states adopt in order to enhance water quality and to ensure that the designated uses of waters are not compromised (IEPA, 2009). In general, WQS consist of three elements (IEPA, 2009):
Designated beneficial uses of the water body, such as recreation, protection of aquatic life, aesthetic quality, and public and food processing water supply;
Water quality criteria necessary to support these uses; and
A policy that ensures that water quality improvements are conserved, maintained, and protected (antidegradation policy).
There are currently an estimated 34,000 impaired waters and 58,000 associated impairments officially listed in the U.S., and nutrients and sediments are two of the most common listed pollutants (Borah et al., 2006). Growing public awareness of and concern for controlling water pollution led to the enactment of the CWA in 1972 and its amendment in 1977. The act established the basic structure for regulating discharges of pollutants into the waters of the United States and gave EPA the authority to implement pollution control programs. EPA has used various regulatory and nonregulatory tools to reduce direct pollutant discharges, finance municipal wastewater treatment facilities, and manage polluted runoff, with the aim of restoring and maintaining the chemical, physical, and biological integrity of the nation's waters (USEPA, 2011).
Clean Water Act. For many years following the passage of the CWA in 1972, the focus was mainly on the chemical aspects of the "integrity" goal. Efforts also focused on regulating discharges from traditional "point source" facilities, such as municipal sewage plants and industrial facilities, and little attention was given to runoff from streets, construction sites, farms, and other sources of storm runoff (USEPA, 2011a).
Starting in the late 1980s, more attention has been given to physical and biological integrity and to polluted runoff. For "nonpoint" runoff, voluntary programs such as cost-sharing have been the key tools, whereas regulatory approaches are being employed for wet-weather point sources such as urban storm sewer systems (USEPA, 2011a).
Over the years, CWA programs have evolved from a program-by-program, source-by-source, and pollutant-by-pollutant approach toward more holistic, watershed-based strategies. The watershed approach places equal emphasis on both protecting and restoring waters. It addresses the full range of issues and problems, not only those subject to CWA regulatory authority, and it involves stakeholder groups in the processes used to achieve and maintain state water quality and other environmental goals (USEPA, 2011a).
The major CWA programs are: WQS; antidegradation policy; water body monitoring and assessment; reports on the condition of the nation's waters; Total Maximum Daily Loads (TMDLs); the National Pollutant Discharge Elimination System (NPDES) permit program for point sources; the Section 319 program for nonpoint sources; the Section 404 program regulating the filling of wetlands and other waters; Section 401 state water quality certification; and the state revolving loan fund (SRF) (USEPA, 2011a).
Under Section 303(d) of the CWA, states are required to develop lists of impaired waters, and this program serves as EPA's national tracking system for impaired waters. A state's 303(d) list identifies the waters for which the required pollution controls are not sufficient to attain or maintain applicable WQS. States are then required to prioritize the listed waters and develop TMDLs for them.
A TMDL is a calculation of the maximum amount of a pollutant that a water body can receive and still safely meet WQS, together with an allocation of that load among the various sources of the pollutant and a margin of safety (MOS) that accounts for any lack of knowledge concerning the relationship between effluent limitations and water quality. In equation form, a TMDL may be expressed as follows (IEPA, 2009):
$$\mathrm{TMDL} = \mathrm{WLA} + \mathrm{LA} + \mathrm{MOS} \qquad (2.1)$$
where,
WLA = Waste Load Allocation (i.e., loadings from point sources);
LA = Load Allocation (i.e., loadings from nonpoint sources including natural
background); and
MOS = Margin of Safety.
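As a simple numeric illustration of Equation 2.1, the Python sketch below sets aside an explicit MOS as a fraction of a water body's loading capacity and checks that the point source (WLA) and nonpoint source (LA) allocations fit within what remains. All loads, the 10 percent MOS, and the function names are hypothetical. In practice a MOS may also be implicit, built into conservative modeling assumptions rather than stated as a separate load.

```python
def tmdl(wla, la, mos):
    """Equation 2.1: TMDL = WLA + LA + MOS, with WLA and LA summed over sources."""
    return sum(wla) + sum(la) + mos


def split_capacity(capacity, mos_fraction):
    """Set aside an explicit MOS as a fraction of the loading capacity.

    Returns (load available for WLA + LA, MOS).
    """
    mos = mos_fraction * capacity
    return capacity - mos, mos


# Hypothetical example in lb/day with a 10 percent explicit MOS.
available, mos = split_capacity(1000.0, 0.10)
wla = [300.0, 250.0]  # two hypothetical wastewater treatment plant discharges
la = [350.0]          # hypothetical nonpoint source load, including background
assert sum(wla) + sum(la) <= available, "allocations exceed the loading capacity"
print(tmdl(wla, la, mos))  # 1000.0
```

Treating the MOS as a fixed fraction is only one convention; the allocation check simply confirms that WLA + LA does not exceed the capacity left after the MOS is reserved.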
EPA provides states with a long-term schedule of 8 to 13 years from the first listing of a water body for completing its TMDLs. A water body may be removed from the 303(d) list after a TMDL has been developed or after other measures have been taken to resolve its water quality problems (USEPA, 2011b). Although the CWA has required the development of TMDLs since 1972, EPA and the states have so far developed relatively few.
2.4 Watershed Modeling
Watershed models are useful tools for interpreting, quantifying, and assessing complex natural processes (Borah, 2011). They describe complicated systems through a set of equations that represent the problem, together with a method for solving those equations (Regnier et al., 2002; Miller et al., 2007). They can simulate the generation and movement of pollutants across land and through rivers and other water systems to predict flows, stages, and pollutant concentrations (Barling et al., 1994). In general, they simulate the natural processes governing the flow of water, sediment, chemicals, nutrients, and microbial organisms within watersheds, and they quantify the impact of human activities on these processes (Singh et al., 2004).
Models merely reflect our understanding of watershed systems, and this understanding determines the quality of the results they produce (EPA, 2011). Nevertheless, watershed models are fundamental to water resources assessment, development, and management (Jia et al., 2005). Simulation of these natural processes plays a fundamental role in addressing a range of water resources, environmental, and social problems (Singh et al., 2004), and watershed models are widely used to understand the dynamic interactions between climate and land-surface hydrology (Singh et al., 2004).
The following subsections discuss the development of watershed models and present their general classification. Some of the models currently used in the USA and other parts of the world are then described, and the strengths and deficiencies of watershed models are discussed. Finally, the watershed models selected for this study are presented.
2.4.1 Development of Watershed Models. Before the 1960s, watershed modeling was confined to modeling individual components of the hydrologic cycle because of limitations in both computing capability and available data (Singh et al., 2006). The advent of computers and the rapid growth of computing capability in the following decades made watershed modeling more comprehensive (Singh et al., 2006). The development of the Stanford Watershed Model (SWM), now called Hydrological Simulation Program-Fortran (HSPF), initiated the development of more operational, lumped or 'conceptual' models (Singh et al., 2004). During the 1970s and 1980s, many more mathematical models were developed for simulating watershed hydrology and for applications in other areas, such as environmental and ecosystem management (Singh et al., 2002). Examples of such watershed hydrology models are the Storm Water Management Model (SWMM), the Precipitation-Runoff Modeling System (PRMS), the National Weather Service (NWS) River Forecast System, Streamflow Synthesis and Reservoir Regulation (SSARR), the Système Hydrologique Européen (SHE), TOPMODEL, the Institute of Hydrology Distributed Model (IHDM), and others (Singh et al., 2006). These models described some processes using differential equations based on simplified hydraulic laws and expressed other processes using empirical algebraic equations (Singh et al., 2004). Soil moisture replenishment, depletion, and redistribution were incorporated in more recent conceptual models to simulate the dynamic variation of the areas contributing to direct runoff (Singh et al., 2004). The development of new models, along with the continual improvement of existing ones, continues today (Singh et al., 2002).
2.4.2 Classification of Watershed Models. To select an appropriate model, factors
such as intended use, accuracy, data availability and study area characteristics should be
taken into account (Wang et al., 2005).
The model structure and architecture are determined by the objective for which the model is built. Singh (1995) classified models based on the process description; the time and space scales of the processes; the solution technique; the land use of the modeled area; and the intended use of the model. Components of a typical continuous, deterministic watershed/hydrologic model are shown in Figure 2.2.
In general, watershed models are classified as empirical or physically based (conceptual) computer models (Ahmad, 2010). Empirical models are built on field observations, measurements, experiments, and statistical methods. Their drawback is that they are site-specific and require long-term data, although they perform well in simulating hydrology or soil erosion (Ahmad, 2010). Physically based models are founded on fundamental scientific knowledge of watershed processes and apply fundamental concepts such as the laws of conservation of mass and energy.
Physically based models are generally preferred because they provide a better understanding of watershed processes (Ahmad, 2010). Process-based models are watershed models that represent hydrologic and water quality processes using both empirical and physically based relationships (Arabi et al., 2005).
According to the degree of spatial variability represented, watershed models can be categorized into two types: lumped-parameter models and distributed-parameter models (Wu, 2006). More generally, models are classified by spatial scale as either lumped or distributed, and by temporal scale as either event-based or continuous (Ahmad, 2010). Lumped models treat the watershed as a single unit for computation, and watershed parameters derived for sub-units are averaged over the entire unit, whereas distributed models divide the watershed into small units, each having homogeneous properties (Wu, 2006; Ahmad, 2010). In a lumped model, the physical and hydrologic characteristics of the area are lumped together to represent the watershed as one uniform system (Qi, 2006). Event-based models simulate single storm events and do not account for the full hydrologic cycle (Wu, 2006). Continuous hydrologic models, on the other hand, consider the whole hydrologic cycle and the effects of long-term hydrological changes and watershed management practices (Ahmad, 2010; Wu, 2006). Watershed management practices, especially structural practices, are commonly analyzed with event-based rainfall-runoff models (Nu-Fang et al., 2011; Sheng et al., 2008; Najafi, 2003; Muzik, 2002). Continuous models are used to investigate long-term processes such as the fate and transport of pollutants (Singh et al., 2011; Yu et al., 2009; Jeon et al., 2007; Ramireddygari et al., 2000). Combined models that have both long-term and single-event simulation capabilities are also used (Borah et al., 2003).
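The difference between lumped and distributed representations matters whenever the model is nonlinear. The Python sketch below uses a hypothetical threshold rule (rain in excess of a storage capacity runs off), which is not the formulation of any model named above, applied to a hypothetical watershed that is 30 percent impervious (zero storage capacity) and 70 percent pervious (2 in capacity). Averaging the capacity first, as a lumped model does, hides the impervious-area runoff that the distributed calculation captures.

```python
def excess_runoff(precip, capacity):
    """Hypothetical threshold rule: rain above the storage capacity runs off."""
    return max(precip - capacity, 0.0)


def distributed_runoff(precip, units):
    """Apply the rule to each unit, then area-weight the results.

    units is a list of (area_fraction, capacity) pairs.
    """
    return sum(frac * excess_runoff(precip, cap) for frac, cap in units)


def lumped_runoff(precip, units):
    """Area-average the capacity first, then apply the rule once."""
    mean_capacity = sum(frac * cap for frac, cap in units)
    return excess_runoff(precip, mean_capacity)


# Hypothetical watershed: 30% impervious (no storage), 70% pervious (2 in).
units = [(0.3, 0.0), (0.7, 2.0)]
print(distributed_runoff(1.0, units))  # 0.3 (inches over the watershed)
print(lumped_runoff(1.0, units))       # 0.0
```

For a 1 in storm the distributed version produces runoff from the impervious fraction, while the lumped version produces none; for storms large enough to exceed every unit's capacity the two give the same answer.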
Statistical tools, including regression and correlation analysis, time series analysis, stochastic processes, and probabilistic analysis, are necessary to analyze model output (Tong et al., 2006; Calderon, 2009). Because of uncertainties in model structure, parameter values, precipitation, and other climatic inputs, uncertainty and reliability analyses can be employed to examine the impact of these uncertainties on model results (Calderon, 2009).
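Two of the statistical tools named above, correlation and regression, are commonly applied to pairs of observed and simulated values. The Python sketch below computes the Pearson correlation coefficient and an ordinary least-squares line between hypothetical observed and simulated flows. It is an illustration only, not the calibration or evaluation procedure used in this study, and it assumes the two series have equal length and nonconstant values.

```python
import math


def _means(obs, sim):
    return sum(obs) / len(obs), sum(sim) / len(sim)


def pearson_r(obs, sim):
    """Pearson correlation coefficient between observed and simulated values."""
    mo, ms = _means(obs, sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def least_squares_line(obs, sim):
    """Ordinary least-squares fit of sim = slope * obs + intercept."""
    mo, ms = _means(obs, sim)
    slope = (sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
             / sum((o - mo) ** 2 for o in obs))
    return slope, ms - slope * mo


# Hypothetical daily flows (cfs): observed at a gage vs. simulated by a model.
observed = [120.0, 95.0, 310.0, 180.0, 140.0]
simulated = [110.0, 100.0, 280.0, 200.0, 150.0]
print(round(pearson_r(observed, simulated), 3))
print(least_squares_line(observed, simulated))
```

A correlation near 1 with a slope near 1 and an intercept near 0 indicates close agreement; a high correlation alone does not rule out a systematic bias in the simulated values.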
2.4.3 Currently Used Watershed Models. Several well-known watershed models are currently in use in the U.S. and elsewhere (Singh et al., 2004). Their construction and component processes vary significantly according to the purposes they are meant to fulfill. Some of these models are as follows: the Hydrologic Engineering Center's Hydrologic Modeling System (HEC-HMS) is used in the private sector for designing drainage systems and quantifying the effect of land use change on flooding; the National Weather Service (NWS) model is used for flood forecasting; HSPF and its extended water quality model are the standard models adopted by EPA; the Modular Modeling System (MMS) adopted by USGS is widely used for water resources planning and management; the distributed hydrologic model WATFLOOD is popular in Canada for hydrologic simulation; the RORB and WBN runoff routing models are commonly employed in Australia for flood forecasting, drainage design, and evaluating the effect of land use change; TOPMODEL and SHE are the standard models for hydrologic analysis in many European countries; the HBV model is the standard model for flow forecasting in Scandinavian countries; the ARNO, LCS, and TOPIKAPI models are popular in Italy; TANK models are popular in Japan; and the Xin'anjiang model is commonly used in China (Singh et al., 2004). Many other watershed models can be found in the literature. Table 2.1 shows the characteristics of major watershed models (Heathcote, 1998; Qi, 2006).

[Figure 2.2 diagram: inputs from precipitation; pervious areas (surface storage, soil water, groundwater aquifer; surface runoff, interflow, groundwater (base) flow); impervious areas (surface storage, surface runoff); surface water flows.]
Figure 2.2. Components of a typical continuous, deterministic watershed/hydrologic model (Heathcote, 1998).
Table 2.1. Characteristics of major watershed models (Heathcote, 1998)
Model Name | Primary Application | Mode of Operation
SWMM | Simulation of urban runoff quantity and quality, including processes in storm and combined sewer systems. | Event or continuous; time step can be minutes or hours.
STORM | Simulation of rainfall-runoff-water quality in urban and rural catchments. | Event or continuous; fixed time step of one hour.
HSPF | Comprehensive package for simulation of watershed hydrology and water quality for both urban and non-urban areas. | Dynamic and continuous.
AnnAGNPS | Simulation of agricultural areas with primary emphasis on nutrients and sediments, and comparison of the effects of various pollution control practices. | Event or continuous.
ANSWERS | Prediction of the hydrologic and erosion response of agricultural watersheds. | Event-oriented; a single storm hyetograph drives the model.
SWAT | River basin scale model developed to quantify the impact of land management practices in large, complex watersheds. | Continuous; three computation levels available, depending on user needs.
MIKE-11 | Simulation of unsteady-state one-dimensional flows, transport, and biological and chemical reactions. | Continuous, unsteady-state, one-dimensional.
2.4.4 Strengths and Deficiencies of Watershed Models. Singh (2004) summarized the major strengths of the current generation of models as follows: they are diverse, making it easy to find a specific watershed model to address a practical problem; they are comprehensive and can be applied to a range of issues in a watershed; they simulate the physics of the underlying hydrologic processes in both space and time quite well; they are distributed in space and time; and their integration of ecosystems and ecology, environmental components, bio-systems, geochemistry, atmospheric sciences, and coastal processes with hydrology reflects the increasing role of watershed models in tackling environmental and ecosystem problems.
On the other hand, Singh (2004) pointed out the following deficiencies of watershed models: they are not user-friendly; they require large amounts of input data; they lack measures for quantitatively assessing model reliability; guidance on model applicability is limited and unclear; and they cannot incorporate environmental, social, and political inputs.
2.4.5 Models Used in the Study.
Better Assessment Science Integrating point & Non-point Sources (BASINS).
This section presents a summary of Better Assessment Science Integrating point & Non-point Sources (BASINS). A detailed description of BASINS can be found in the BASINS User's Manual, Version 4.0. BASINS is a multipurpose environmental analysis system that integrates a geographical information system (GIS), national watershed data, and state-of-the-art environmental assessment and modeling tools (such as HSPF, SWAT, and SWMM) into one convenient package (EPA, 2012). The system is designed for use by local, state, and regional agencies in performing watershed and water quality-based studies (EPA, 2012). It was developed by USEPA to facilitate the examination of environmental information, to support the analysis of environmental systems, and to provide a framework for examining management alternatives (EPA, 2012).
The BASINS system promotes better assessment and integration of point and nonpoint sources for watershed and water quality management. It integrates several key environmental data sets with improved analysis techniques, and environmental programs can apply it in various stages of environmental management planning and decision making (EPA, 2007). It was also designed to support the development of TMDLs, which require a watershed-based approach that integrates both point and nonpoint sources (EPA, 2007).
Watershed-based assessments involve many separate steps, such as data preparation, information collection and summarization, development of maps and tables, and model application and interpretation. BASINS facilitates these steps by bringing key data and analytical components under one roof, providing the user with a comprehensive watershed management tool (EPA, 2007).
GIS provides the integrating framework for BASINS, organizing spatial information so that it can be displayed as maps, tables, or graphics. Through the use of GIS, BASINS has the flexibility to display and integrate important information such as land use, point source discharges, and water supply withdrawals (EPA, 2007). BASINS is a widely accepted watershed-based water quality assessment tool and has been adopted to model land use effects on water quality in many watershed studies (Tong et al., 2002; Luzio et al., 2002; Fohrer et al., 2001; Arnold et al., 2005; Singh, 2005; Tong et al., 2007; Tong et al., 2008).
Hydrological Simulation Program-FORTRAN (HSPF). HSPF is a comprehensive, watershed-scale conceptual model. It performs continuous simulation of nonpoint source hydrology and water quality, combines the results with point source contributions, and performs flow and water quality routing in the watershed reaches (Singh, 2005).
HSPF can simulate and predict the impact of land use on nutrient loadings to watershed water bodies. It is a flexible, reliable, and robust hydrologic model with high resolution (Bicknell et al., 1996). HSPF was developed under EPA sponsorship to simulate hydrology and water quality processes on pervious and impervious areas (EPA, 2011). The first version of HSPF was released in 1980. Its functions and processes were initially derived from the following predecessor models (Bicknell et al., 2005):
Hydrocomp Simulation Programming (HSP), 1969
NonPoint Source (NPS) Model, 1976
Agricultural Runoff Management (ARM) Model, 1976
Sediment and Radionuclides Transport (SERATRA), 1979
HSPF consists of a number of modules arranged hierarchically to permit the continuous simulation of hydrologic and water quality processes (Bicknell et al., 2005). The main simulation modules, PERLND, IMPLND, and RCHRES, simulate pervious land segments, impervious land segments, and free-flowing reaches/mixed reservoirs, respectively (Donigian et al., 1995). The subroutines of each module, shown in Figures 2.3, 2.4, and 2.5, are explained in detail in the HSPF Version 12.2 User's Manual (Bicknell et al., 2005).
HSPF also has a number of utility modules that are used to access, manipulate, and analyze time series information stored by the user in HSPF's TSS (Time Series Store) and WDM (Watershed Data Management) files. These time series include data such as hourly precipitation, daily evaporation, and daily streamflow. They provide a valuable resource for analyzing a watershed's characteristics and performing different processes (Bicknell et al., 2005).
HSPF was designed following a top-down approach. Although the various simulation and utility modules are separated according to functionality, they can be invoked conveniently either individually or in tandem (Bicknell et al., 2005). The concept behind HSPF is a comprehensive simulation system with a consistent means of representing the watershed, in which the watershed is viewed as a set of constituents that move through a fixed environment and interact with each other (Bicknell et al., 2005). Water, sediment, and chemicals are all constituents, and their motions and interactions are denoted as processes (Bicknell et al., 2005).
Before the HSPF model is run, the watershed area must be delineated, either manually or automatically, into homogeneous land areas called Hydrologic Response Units (HRUs) (Donigian et al., 1995). The delineation process takes place in BASINS, which divides the watershed into subbasins, each with a combination of weather, soil, land use, topographic, and geologic properties unique to that subbasin (Donigian et al., 1995). HRUs can be impervious or pervious areas, which are modeled independently. Each HRU requires input data such as precipitation, temperature, potential evapotranspiration, and parameters related to land use, soil characteristics, and agricultural practices to simulate hydrology, sediments, nutrients, and pesticides (Donigian et al., 1995).
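Because pervious and impervious land segments are simulated independently, their outputs must be combined to obtain the inflow to a reach. The Python sketch below assumes, as is common in HSPF applications in English units, that each segment's runoff is computed as a depth (inches per interval) and converted to volume by multiplying by segment area (acres) and dividing by 12, since 1 inch of water over 1 acre equals 1/12 acre-ft. The function name, depths, and areas are hypothetical.

```python
def reach_inflow_acre_ft(segments):
    """Sum the runoff volume delivered to a reach by its land segments.

    segments: list of (runoff depth in inches per interval, area in acres).
    1 inch of water over 1 acre is 1/12 acre-ft.
    """
    return sum(depth * area / 12.0 for depth, area in segments)


# Hypothetical subbasin with one pervious and one impervious HRU.
segments = [
    (0.12, 100.0),  # pervious HRU: 0.12 in of runoff from 100 acres
    (0.40, 30.0),   # impervious HRU: 0.40 in of runoff from 30 acres
]
print(reach_inflow_acre_ft(segments))  # 2.0
```

In this example the small impervious HRU delivers as much volume as a pervious HRU more than three times its size, which is why impervious area is tracked separately.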
A flow diagram of the hydrological components of HSPF is shown in Figure 2.6. The diagram shows a reservoir-type model that allows different types of inflow and outflow (Bicknell, 2005; Calderon, 2009); HSPF simulates these inflows and outflows as a water-balance system (Donigian et al., 1995). The pervious land segment simulates processes such as interception, evapotranspiration, surface detention, surface runoff, infiltration, shallow subsurface flow (interflow), base flow, and deep percolation (Donigian et al., 1995; Calderon, 2009). All of these processes are performed by the PERLND module.
HSPF uses both physical and empirical formulations to model the movement of water within each HRU. An interception storage capacity is assigned according to the land cover of the land segment, and interception losses are simulated accordingly. This interception storage must be filled before excess precipitation can reach the land surface, and the intercepted water is subsequently subject to evaporation (Calderon, 2009).
According to Bicknell (2005), the process can be explained as follows. The hydrologic processes are modeled by PWATER, the key subroutine of module PERLND, which simulates the retention, routing, and evaporation of water from pervious land segments. The algorithms used to simulate these lands and related processes are based on the original research for the LANDS subprogram of the Stanford Watershed Model (Bicknell, 2005). The number of time series required by PWATER depends on whether snow accumulation and melt are considered; if they are not, only potential evapotranspiration and precipitation are required. When snow conditions are simulated as well, time series for air temperature, precipitation, snow cover, water yield, and ice content of the snowpack are also required. The water available for infiltration and runoff is the sum of the inflow to surface detention storage and the existing storage. Part of this water infiltrates directly and moves to the lower zone and groundwater storages. Another part moves to the upper zone storage or may be routed as runoff from surface detention or interflow storage. Water that infiltrates through the surface or percolates from the upper zone storage may remain in the lower zone storage, where it is subject to evapotranspiration; it may flow to active groundwater storage; or it may be lost through deep percolation, in which case it is considered lost from the simulated system.
Similarly, Bicknell (2005) stated that IWATER simulates the retention, routing, and evaporation of water from an impervious land segment. IWATER is similar to PWATER of the PERLND module but simpler, because there is no infiltration and hence no subsurface process to consider. Precipitation enters retention storage, from which it is removed by evaporation; when the retention capacity is exceeded, the excess overflows the storage and becomes available for runoff.
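The IWATER retention logic described above can be sketched as a single storage "bucket". In the Python sketch below, precipitation is added to retention storage, any excess over the capacity (called RETSC in the HSPF User's Manual) is released as water available for runoff, and evaporation then draws down the remaining storage. The ordering of overflow and evaporation within an interval, the function name, and the numbers in the usage example are assumptions, and the routing of the released water over the surface, which HSPF handles separately, is omitted.

```python
def iwater_step(precip, retention, retention_capacity, pet):
    """One interval of a simplified impervious-surface water balance (inches).

    Returns (water available for runoff, new retention storage, evaporation).
    """
    retention += precip
    # Water above the retention capacity overflows and becomes available for runoff.
    overflow = max(retention - retention_capacity, 0.0)
    retention -= overflow
    # Evaporation cannot remove more water than is held in retention storage.
    evaporation = min(pet, retention)
    retention -= evaporation
    return overflow, retention, evaporation


# Hypothetical storm interval: 0.5 in of rain, 0.1 in capacity, 0.05 in PET.
supply, storage, evap = iwater_step(0.5, 0.0, 0.1, 0.05)
print(round(supply, 3), round(storage, 3), round(evap, 3))  # 0.4 0.05 0.05
```

The three returned quantities always add up to the precipitation plus the starting storage, so the sketch conserves water within each interval.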
The infiltration algorithms represent the continuous variation of infiltration rate with time as a function of soil moisture. The relevant quantities are calculated with the following relationships. Only a few subroutines from the HSPF User's Manual are summarized here; detailed descriptions of all modules and subroutines used by HSPF can be found in the HSPF Version 12.2 User's Manual (Bicknell et al., 2005):
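$$\mathrm{IBAR} = \frac{\mathrm{INFILT}}{(\mathrm{LZS}/\mathrm{LZSN})^{\mathrm{INFEXP}}}\cdot\mathrm{INFFAC} \qquad (2.2)$$
$$\mathrm{IMAX} = \mathrm{INFILD}\cdot\mathrm{IBAR} \qquad (2.3)$$
$$\mathrm{IMIN} = \mathrm{IBAR} - (\mathrm{IMAX} - \mathrm{IBAR}) \qquad (2.4)$$
$$\mathrm{RATIO} = \mathrm{INTFW}\cdot 2.0^{\,\mathrm{LZS}/\mathrm{LZSN}} \qquad (2.5)$$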
Where:
IBAR = mean infiltration capacity over the land segment (in/interval)
INFILT = infiltration parameter (in/interval)
LZS = lower zone storage (inches)
LZSN = parameter for lower zone nominal storage (inches)
INFEXP = exponent parameter greater than one
INFFAC = factor to account for frozen ground effects, if applicable
IMAX = maximum infiltration capacity (in/interval)
INFILD = parameter giving the ratio of maximum to mean infiltration capacity
over the land segment
IMIN = minimum infiltration capacity (in/interval)
RATIO = ratio of the ordinates of line II to line I (see the figure "Determination of infiltration and interflow inflow" for subroutine SURFAC in Bicknell et al., 2005)
INTFW = interflow inflow parameter
The factor INFFAC, which reduces both infiltration and upper zone percolation to account for freezing of the ground surface, is calculated as follows:
$$\mathrm{INFFAC} = 1.0 - \mathrm{FZG}\cdot\mathrm{PACKI} \qquad (2.6)$$
Where:
FZG = parameter indicating how much icing reduces infiltration (/inches)
PACKI = water equivalent of ice in snowpack (inches
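To make the dependence of infiltration on soil moisture concrete, Eqs. 2.2 to 2.6 can be evaluated directly. The sketch below implements only the relationships as written above. It is not the SURFAC subroutine, and any further bounds the HSPF code may apply (for example, lower limits on INFFAC or RATIO) are not reproduced. The function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InfiltrationCapacities:
    ibar: float    # mean infiltration capacity (in/interval), Eq. 2.2
    imax: float    # maximum infiltration capacity (in/interval), Eq. 2.3
    imin: float    # minimum infiltration capacity (in/interval), Eq. 2.4
    ratio: float   # ratio of line II to line I ordinates, Eq. 2.5
    inffac: float  # frozen-ground reduction factor, Eq. 2.6


def infiltration_capacities(infilt, lzs, lzsn, infexp, infild, intfw,
                            fzg=0.0, packi=0.0):
    """Evaluate Eqs. 2.2-2.6 as written (hypothetical helper, no extra bounds)."""
    inffac = 1.0 - fzg * packi                  # Eq. 2.6: icing reduces infiltration
    lzrat = lzs / lzsn                          # relative lower zone wetness
    ibar = (infilt / lzrat ** infexp) * inffac  # Eq. 2.2: wetter soil -> lower IBAR
    imax = infild * ibar                        # Eq. 2.3
    imin = ibar - (imax - ibar)                 # Eq. 2.4
    ratio = intfw * (2.0 ** lzrat)              # Eq. 2.5
    return InfiltrationCapacities(ibar, imax, imin, ratio, inffac)
```

With LZS equal to LZSN, IBAR equals INFILT (times INFFAC). Doubling LZS with INFEXP = 2 cuts IBAR to a quarter, which shows how strongly the mean infiltration capacity responds to lower zone moisture.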

The fraction of potential direct runoff that becomes inflow to the upper zone storage is computed as follows:
$$\mathrm{FRAC} = 1 - \frac{\mathrm{UZRAT}}{2}\left(\frac{1}{4 - \mathrm{UZRAT}}\right)^{3 - \mathrm{UZRAT}}, \quad \mathrm{UZRAT} \le 2 \tag{2.7}$$
$$\mathrm{FRAC} = \left(\frac{0.5}{\mathrm{UZRAT} - 1}\right)^{2\,\mathrm{UZRAT} - 3}, \quad \mathrm{UZRAT} > 2 \tag{2.8}$$
Where:
FRAC = fraction of potential direct runoff retained by the upper zone storage
UZRAT = UZS/UZSN
UZS = upper zone storage (inches)
UZSN = upper zone nominal storage capacity (inches)
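A direct evaluation of Eqs. 2.7 and 2.8 shows how the retained fraction falls as the upper zone fills. It also shows that the two branches meet at UZRAT = 2, where both give 0.5. This is a minimal sketch; the function name is hypothetical.

```python
def uz_fraction(uzs: float, uzsn: float) -> float:
    """Fraction of potential direct runoff retained by the upper zone (Eqs. 2.7-2.8)."""
    uzrat = uzs / uzsn
    if uzrat <= 2.0:
        # Eq. 2.7: dry-to-moderate upper zone
        return 1.0 - (uzrat / 2.0) * (1.0 / (4.0 - uzrat)) ** (3.0 - uzrat)
    # Eq. 2.8: upper zone storage more than twice its nominal capacity
    return (0.5 / (uzrat - 1.0)) ** (2.0 * uzrat - 3.0)
```

An empty upper zone (UZRAT = 0) retains all potential direct runoff. At UZRAT = 3 only 1/64 of it is retained.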
PROUTE, the surface runoff subroutine, determines how much of the potential surface detention runs off in one simulation interval. Overland flow is treated as a turbulent flow process. It is simulated using the Chezy-Manning equation together with an empirical expression that relates outflow depth to detention storage. The rate of overland flow discharge is computed as follows:
$$\mathrm{SURO} = \mathrm{DELT60}\cdot\mathrm{SRC}\left[\mathrm{SURSM}\left(1.0 + 0.6\left(\frac{\mathrm{SURSM}}{\mathrm{SURSE}}\right)^{3}\right)\right]^{1.67}, \quad \mathrm{SURSM} < \mathrm{SURSE} \tag{2.9}$$
$$\mathrm{SURO} = \mathrm{DELT60}\cdot\mathrm{SRC}\,(1.6\,\mathrm{SURSM})^{1.67}, \quad \mathrm{SURSM} \ge \mathrm{SURSE} \tag{2.10}$$
Where:
SURO = surface outflow (in/interval)
DELT60 = DELT/60.0 (hr/interval)
DELT = simulation time step (min/interval)
SRC = routing variable
SURSM = mean surface detention storage over the time interval (in)
SURSE = equilibrium surface detention storage (in) for the current supply rate
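The two branches of the overland flow relationship (Eqs. 2.9 and 2.10) can be coded directly. Both give the same value at SURSM = SURSE because 1.0 + 0.6 = 1.6. In this sketch the routing variable SRC is treated as a given input rather than computed from slope, length, and roughness. The function name is hypothetical.

```python
def surface_outflow(delt_min: float, src: float, sursm: float, surse: float) -> float:
    """Overland flow SURO (in/interval) from Eqs. 2.9-2.10 (hypothetical helper)."""
    delt60 = delt_min / 60.0  # DELT60 = DELT/60.0 (hr/interval)
    if sursm < surse:
        # Eq. 2.9: detention below equilibrium for the current supply rate
        depth_term = sursm * (1.0 + 0.6 * (sursm / surse) ** 3)
    else:
        # Eq. 2.10: detention at or above equilibrium
        depth_term = sursm * 1.6
    return delt60 * src * depth_term ** 1.67
```

Because SURO is proportional to DELT60, a 15-minute step yields one quarter of the outflow of a 60-minute step for the same detention state.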
When simulating rivers, HSPF considers only the main channel (Bicknell et al., 2005). The model uses a storage routing technique to route water from one reach to the next (Singh et al., 2004). The hydraulic characteristics of each reach are defined by parameters that represent volume-discharge relations, stored in function tables (FTABLEs) (Singh et al., 2004). A fixed relationship is assumed among water level, surface area, volume, and discharge for each reach.
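The idea of routing water through a reach with a fixed volume-discharge table can be sketched as follows. This is an illustration only, not HSPF's HYDR solution scheme. It assumes an FTABLE reduced to (volume, discharge) rows with strictly increasing volumes, linear interpolation between rows, clamping beyond the table, and a simple explicit water balance dV/dt = inflow - outflow(V). The units (ft³, ft³/s, s) and all names are hypothetical.

```python
from bisect import bisect_right


def ftable_discharge(ftable, volume):
    """Linearly interpolate discharge for a volume from (volume, discharge) rows.

    Rows must be sorted by strictly increasing volume. Volumes outside the
    table are clamped to the first or last row (an assumption of this sketch).
    """
    volumes = [v for v, _ in ftable]
    if volume <= volumes[0]:
        return ftable[0][1]
    if volume >= volumes[-1]:
        return ftable[-1][1]
    i = bisect_right(volumes, volume)
    (v0, q0), (v1, q1) = ftable[i - 1], ftable[i]
    return q0 + (q1 - q0) * (volume - v0) / (v1 - v0)


def route_reach(ftable, inflows, dt, volume=0.0):
    """Explicit storage routing through one reach; returns (outflows, final volume)."""
    outflows = []
    for q_in in inflows:
        q_out = ftable_discharge(ftable, volume)
        # Never release more water than is stored plus what arrives this step.
        q_out = min(q_out, volume / dt + q_in)
        volume += (q_in - q_out) * dt
        outflows.append(q_out)
    return outflows, volume
```

An explicit step like this can oscillate if the time step is long relative to the reach's storage time. That is one reason production models use more careful solution schemes.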
Parameters such as the percentage of impervious area, the average length of overland flow, and the average overland flow slope can be determined from Geographic Information System (GIS) databases, including Digital Elevation Models (DEMs) (Singh et al., 2005; Calderon, 2009). Other parameters, pertaining to infiltration, soil-moisture zones, and interflow, are determined by calibration or comparison with observed hydrographs (Linsley et al., 1988; Calderon, 2009). Values of the remaining parameters needed by HSPF cannot be obtained from field data and must be determined through iterative model calibration (Linsley et al., 1988; Bicknell et al., 2005; Calderon, 2009).
Water quality constituents or pollutants in the outflows from an impervious land segment are simulated by the IQUAL module using simple relationships. One approach simulates a constituent through its association with solids removal. The other approach uses atmospheric deposition and/or basic accumulation and depletion rates, together with depletion by washoff, to simulate constituent outflow. A combination of the two methods may also be used. Up to 10 quality constituents can be simulated by IQUAL at a time. Removal of a solids-associated constituent by solids washoff is simulated as follows:
$$\mathrm{SOQS} = \mathrm{SOSLD}\cdot\mathrm{POTFW} \tag{2.11}$$
Where:
SOQS = flux of constituent associated with solids washoff (quantity/ac per interval)
SOSLD = washoff of detached solids (tons/ac per interval)
POTFW = washoff potency factor (quantity/ton)
If atmospheric deposition data are input, the surface storage is updated as follows:
$$\mathrm{SQO} \leftarrow \mathrm{SQO} + \mathrm{ADFX} + \mathrm{PREC}\cdot\mathrm{ADCN} \tag{2.12}$$
Where:
SQO = storage of available quality constituent on the surface (mass/area)
ADFX = dry or total atmospheric deposition flux (mass/area per interval)
PREC = precipitation depth
ADCN = concentration for wet atmospheric deposition (mass/volume)
If there is surface outflow and some quality constituent is in storage, washoff is simulated as follows:
$$\mathrm{SOQO} = \mathrm{SQO}\left(1.0 - \exp(-\mathrm{SURO}\cdot\mathrm{WSFAC})\right) \tag{2.13}$$
Where:
SOQO = washoff of the quality constituent from the land surface (quantity/ac per interval)
SQO = storage of the quality constituent on the surface (quantity/ac)
SURO = surface outflow of water (in/interval)
WSFAC = susceptibility of the quality constituent to washoff (per inch)
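The three IQUAL relationships (Eqs. 2.11 to 2.13) can be combined into one interval update. The sketch uses a negative exponent in the washoff term, as in the HSPF formulation, so washoff always stays between zero and the stored amount. Two choices here are assumptions for illustration: washoff is subtracted from storage after it is computed, and deposition is added before washoff. The basic accumulation and depletion rates mentioned above are omitted, and the function name is hypothetical.

```python
import math


def iqual_step(sqo, suro, wsfac, adfx=0.0, prec=0.0, adcn=0.0,
               sosld=0.0, potfw=0.0):
    """One interval of the IQUAL relationships in Eqs. 2.11-2.13 (hypothetical helper).

    Returns (soqs, soqo, remaining storage) in quantity/ac (per interval).
    """
    soqs = sosld * potfw                          # Eq. 2.11: solids-associated flux
    sqo = sqo + adfx + prec * adcn                # Eq. 2.12: dry plus wet deposition
    soqo = sqo * (1.0 - math.exp(-suro * wsfac))  # Eq. 2.13: washoff by surface outflow
    return soqs, soqo, sqo - soqo                 # assumed: washoff leaves storage
```

With WSFAC = ln 2 per inch, one inch of surface outflow washes off half of the stored constituent. With no surface outflow, nothing is washed off.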
For this study, several components of the BASINS 4.0 system were used in the model development process: WinHSPF and WDMUtil for pre-processing and GenScn for post-processing.
HSPF has been used extensively to model urbanized watersheds (Brun et al., 2000; Tong, 2006; Im et al., 2003; Shirinian-Orlando, 2007; Wicklein et al., 2008), but less often in highly urbanized watersheds such as the Chicago River watershed. HSPF lacks the capability to simulate storm sewer networks (Mohamoud et al., 2010). Nevertheless, studies that reviewed models for simulating stormwater quantity and quality in urban environments found HSPF to be the most comprehensive and flexible hydrology and water quality model available (Zoppou, 2003; Bergman et al., 2002; Mohamoud et al., 2010). However, other studies suggested that treating urban land use as a nonpoint source of nutrients can give invalid results. This is because of the impervious cover in urban areas and because drainage is frequently routed to wastewater treatment plants (which may or may not be in the same basin) and then discharged to local rivers as point sources (PS) (Ahearn et al., 2005).
Because accurate estimates of runoff volume are important for estimating pollutant loads, the effective impervious area (EIA), as a portion of the total impervious area (TIA), should be determined for use in hydrologic models (Sutherland, 2000; Smith, 2005; Brabec et al., 2010). Impervious area is a rough indication of the portion of the watershed utilized by human activities. The EIA is considered one of the most important parameters and one of the most difficult to determine (Sutherland, 2005). It is the portion of the TIA within a watershed that is partially or totally connected to the drainage collection system. Street surfaces, parking lots, paved driveways and sidewalks, and rooftops that are directly connected to the storm sewer system are all included in the EIA (Sutherland, 2000). For urban runoff modeling or hydrologic analysis, the EIA of a given basin is usually less than the TIA. In highly urbanized basins, however, EIA values can approach or equal TIA (Smith, 2005). Field measurements, empirical equations, and calibrated computer models are some of the ways to determine EIA (Brabec et al., 2010; Sutherland, 2000; Alley et al., 1983; Laenen, 1983).
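Empirical EIA equations of the kind cited above typically express EIA as a function of TIA. The sketch below uses a generic power law, EIA = a·TIA^b, with both quantities in percent. The power-law form, the coefficients `a` and `b`, and the function name are illustrative assumptions, not values taken from Sutherland (2000) or Alley et al. (1983). The result is capped at TIA because the connected area cannot exceed the total impervious area.

```python
def effective_impervious(tia_pct: float, a: float, b: float) -> float:
    """EIA (%) from TIA (%) with an illustrative power law EIA = a * TIA**b.

    a and b are hypothetical placeholders that must be fitted or taken from a
    published relation. EIA is capped at TIA, since the directly connected
    area cannot exceed the total impervious area.
    """
    if not 0.0 <= tia_pct <= 100.0:
        raise ValueError("TIA must be a percentage between 0 and 100")
    return min(tia_pct, a * tia_pct ** b)
```

Coefficients that push EIA up to the cap at high TIA reproduce the observation that, in highly urbanized basins, EIA can approach or equal TIA.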

[Figure: structure chart of the PERLND module, which performs computations on a segment of pervious land. Sections: ATEMP (correct air temperature for elevation difference), SNOW (simulate accumulation and melting of snow and ice), PWATER (simulate water budget for a pervious land segment), SEDMNT (produce and remove sediment), PSTEMP (estimate soil temperature), PWTGAS (estimate water temperature and dissolved gas concentrations), and PQUAL (simulate quality constituents using simple relationships with sediment and water yield). Agri-chemical sections: MSTLAY (estimate the moisture and the fractions of solutes being transported in the soil layers), PEST, NITR, and PHOS (simulate pesticide, nitrogen, and phosphorus behavior in detail), and TRACER (simulate the movement of a conservative tracer). Output sections place point-valued and bar-valued output in INPAD and produce printed output.]
Figure 2.3. Structure chart for PERLND module (Bicknell et al., 2005)
[Figure: structure chart of the IMPLND module, which performs computations on a segment of impervious land. Sections: ATEMP and SNOW (see module PERLND), IWATER (simulate water budget for an impervious land segment), SOLIDS (accumulate and remove solids), IWTGAS (estimate water temperatures and dissolved gas concentrations), IQUAL (simulate quality constituents using simple relationships with solids and/or water yield), IPTOT and IBAROT (place point-valued and bar-valued output in INPAD), and a section that produces printed output.]
Figure 2.4. Structure chart for IMPLND module (Bicknell et al., 2005)
[Figure: structure chart of the RCHRES module, which performs computations for a reach or mixed reservoir. Sections: HYDR (simulate hydraulic behavior), ADCALC (prepare to simulate advection of entrained constituents), CONS (simulate behavior of conservative constituents), HTRCH (simulate heat exchange and water temperature), SEDTRN (simulate behavior of inorganic sediment), GQUAL (simulate behavior of a generalized quality constituent), RQUAL (simulate behavior of constituents involved in biochemical transformations), RPTOT and RBAROT (put current values of point-valued and bar-valued time series in INPAD), and RPRINT (produce printed output). Utility subroutines: ADVECT (simulate advection of a constituent totally entrained in water) and SINK (calculate the quantity of material settling out of the control volume).]
Figure 2.5. Structure chart for RCHRES module (Bicknell et al., 2005)
[Figure: flow diagram of the hydrologic components of HSPF; legible labels include interception storage and lower zone storage.]
Figure 2.6. A flow diagram of the hydrological components of HSPF (Bicknell et al., 2005)
2.4.6 Previous Watershed Studies in the Study Area. A number of studies have been conducted in the area, but generally as part of investigations of flow and water quality in the Upper Illinois River Basin system (Bartosova et al., 2007; Demissie et al., 2007; Bartosova et al., 2005; Knapp et al., 2004). These studies did not address individual watersheds. In addition, the limited land use categorization they used could not explain the more detailed behavior of a highly urbanized watershed such as the Chicago River Watershed.
2.5 Data Integration and Data Warehouse
Historical data records will always be the key to understanding nutrient fate and transport (Boynton et al., 1995; Vanclooster et al., 2004). Any evaluation and analysis of a watershed should include historical changes and variations, present conditions, and potential future conditions (Tong et al., 2002; Randhir et al., 2009). Data sources play a major role in making this possible, but the real challenge is the heterogeneity of data obtained from different sources. Integrating data from these sources so that they are useful for assessment, analysis, or model application can be a difficult task. It involves thorough investigation of the data pages and the metadata they contain (Beran et al., 2009; Horsburgh et al., 2009).
Many organizations in the USA monitor important hydrologic variables such as water quality and quantity, groundwater levels, and precipitation, but the resulting data are managed by different agencies. This division of responsibilities has created barriers between watershed data users and watershed data managers. Many believe that managing water resource systems in a fully integrated fashion would alleviate these problems (Rooy et al., 1993).
A number of national data collection and publication systems operated by government agencies have formed over the years. These include:
- the USGS water data storage and retrieval system (WATSTORE), which has been replaced by the National Water Information System (NWIS);
- the USEPA storage and retrieval system (STORET);
- systems operated and maintained by the Natural Resources Conservation Service (NRCS), such as the Soil Climate Analysis Network (SCAN) and SNOwpack TELemetry (SNOTEL);
- the NOAA National Climatic Data Center (NCDC);
- and others (Horsburgh et al., 2009).

These national data systems are huge data stores, but they use different data storage, retrieval, and publication formats and systems (Beran et al., 2009; Horsburgh et al., 2008; McGuire et al., 2008). Synthesizing data sets from these sources into a single analysis has proved difficult because each system must be navigated through the pages of metadata it contains (Raskin et al., 2005; Horsburgh et al., 2008; Horsburgh et al., 2009). Moreover, all of these systems are traditional database management systems. They lack the ability to integrate data in a way that provides decision support and delivers actionable information (Maidment, 2005; Teuteberg et al., 2009; Beran et al., 2009).
During the past decade, initiatives by the U.S. National Science Foundation (NSF), the American Geophysical Union (AGU), the American Meteorological Society (AMS), and the International Association of Hydrological Sciences (IAHS) have drawn attention to the value of long-term hydrologic data. Such data are needed to investigate long-term, watershed-scale hydrologic and climatic impacts (Marks et al., 2007). Ongoing research on long-term impacts on natural resources draws on hydrologic data collected from experimental watersheds for more than thirty years. This research has stored these data and made them available for retrieval on public websites (Marks et al., 2007; Bosh et al., 2007; Moran et al., 2008; Nicholas et al., 2008). The Long Term Ecological Research (LTER) network has also made the long-term climatic and hydrologic data collected for its research available on a public website. These experimental watershed data will help in understanding long-term impacts. However, the synthesized data mainly benefit the specific experimental watersheds and will not give similar benefits to other watersheds.
The Hydrologic Information System (HIS) project introduced the concept of integrating data from the sources of different agencies. HIS was developed by the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) and sponsored by NSF. It is designed to optimize data retrieval by providing a standard data format that allows effective sharing of information from existing national databases such as NWIS, NCDC, and STORET (Maidment, 2005; Horsburgh et al., 2008; Horsburgh et al., 2011; CUAHSI, 2012). Within HIS, observations data and their associated metadata are stored and managed using an Observations Data Model (ODM). ODM is a relational database model that provides a framework in which data of different types and from disparate sources can be integrated (CUAHSI, 2012). Another system, an ontology-aided search engine named Hydroseek, has also been introduced. It allows users to query multiple hydrologic repositories simultaneously through a single interface, regardless of the heterogeneity between the sources (Beran et al., 2009; Hydroseek, 2012). These efforts represent considerable progress in integrating heterogeneous data records and sources on a watershed scale. However, they are solely data storage or retrieval systems, and none of them provides an integration system that supports decision making.
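The core idea behind a common observations model is to map records from sources with different layouts, variable names, and units into one shared schema. The sketch below illustrates that mapping in plain Python. It is not the CUAHSI ODM schema. Both source layouts, the field names, the vocabulary entries, and the unit table are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    """One harmonized observation in a hypothetical common schema."""
    site: str
    variable: str
    time: datetime
    value: float
    unit: str
    source: str


# Hypothetical controlled vocabulary: source-specific names -> common name.
VARIABLE_VOCAB = {"TP": "total phosphorus", "Phosphorus, total": "total phosphorus"}
# Hypothetical conversion factors to the common unit (mg/L).
TO_MG_PER_L = {"mg/L": 1.0, "ug/L": 0.001}


def from_source_a(rec: dict) -> Observation:
    # Source A layout (hypothetical): station, param, date (YYYY-MM-DD), result, units.
    return Observation(
        site=rec["station"],
        variable=VARIABLE_VOCAB[rec["param"]],
        time=datetime.strptime(rec["date"], "%Y-%m-%d"),
        value=rec["result"] * TO_MG_PER_L[rec["units"]],
        unit="mg/L",
        source="A",
    )


def from_source_b(rec: dict) -> Observation:
    # Source B layout (hypothetical): site_no, characteristic, ISO timestamp, val, unit_cd.
    return Observation(
        site=rec["site_no"],
        variable=VARIABLE_VOCAB[rec["characteristic"]],
        time=datetime.fromisoformat(rec["sample_dt"]),
        value=rec["val"] * TO_MG_PER_L[rec["unit_cd"]],
        unit="mg/L",
        source="B",
    )


def harmonize(records_a, records_b):
    """Merge both sources into one list sorted by site and time."""
    merged = [from_source_a(r) for r in records_a] + [from_source_b(r) for r in records_b]
    return sorted(merged, key=lambda o: (o.site, o.time))
```

Once every record shares the same site, variable, time, value, and unit fields, the harmonized list can be queried or loaded into a warehouse without per-source logic.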
Data warehouse (DW) technology was introduced as an integrated way to manage and analyze monitoring data (Rob et al., 2008). "A DW is a collection of consistent, subject-oriented, integrated, time-variant, non-volatile data and processes on them, which are based on available information and enable people to make decisions and predictions about the future" (Inmon, 2005). A DW is an in-advance approach to integrating data from multiple, huge, heterogeneous, and distributed databases and other information sources (Widom, 1995).
A DW environment includes components such as an extraction, transformation, and loading (ETL) component, an online analytical processing (OLAP) engine, and a client analysis component (Ahmed et al., 2010). It enables business decision makers to improve various processes creatively (Bernardino, 2002; Rainardi, 2007; Ahmed, 2010). This includes support for complex querying (Bernardino, 2002) and discovery of trends and patterns in data (Tjoa et al., 2005; Han et al., 2006). DWs store and maintain data in a multidimensional format that supports aggregation, drill-down, and slicing/dicing of data (Han et al., 2006; Sen et al., 2005; Kimball et al., 2002).
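The multidimensional operations named above can be illustrated with a tiny star schema: a fact table of measurements keyed to a station dimension and a year. The sketch is a plain-Python stand-in for what an OLAP engine would do in SQL. The stations, subwatersheds, land uses, and concentration values are hypothetical, and the measure is assumed to be a concentration averaged over each group.

```python
from collections import defaultdict

# Hypothetical station dimension: station key -> descriptive attributes.
STATION_DIM = {
    "ST1": {"subwatershed": "North Branch", "land_use": "residential"},
    "ST2": {"subwatershed": "North Branch", "land_use": "commercial"},
    "ST3": {"subwatershed": "South Branch", "land_use": "industrial"},
}

# Hypothetical fact rows: (station key, year, month, measured concentration in mg/L).
FACTS = [
    ("ST1", 2005, 6, 0.30),
    ("ST1", 2005, 7, 0.50),
    ("ST2", 2005, 6, 0.20),
    ("ST3", 2005, 6, 1.00),
    ("ST3", 2006, 6, 0.80),
]


def aggregate(facts, level, where=None):
    """Mean measure grouped by (level, year).

    level: 'station' (drill-down) or a station attribute such as
    'subwatershed' (roll-up). where: optional {attribute: value} filter that
    slices the cube on station attributes.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for station, year, _month, value in facts:
        attrs = STATION_DIM[station]
        if where and any(attrs.get(k) != v for k, v in where.items()):
            continue  # slice: keep only members matching the filter
        key = (station if level == "station" else attrs[level], year)
        sums[key] += value
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}
```

Calling `aggregate(FACTS, "subwatershed")` rolls stations up to subwatersheds. `aggregate(FACTS, "station")` drills back down, and adding `where={"land_use": "industrial"}` slices the cube to one land use category.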
The management of huge amounts of data and their complex analysis during queries are the most important concerns in developing a DW (Bonifati et al., 2001; Chen et al., 2003; Kambayashi et al., 2004; Rai et al., 2007). A DW processes applications efficiently because most of them are decision-support-oriented applications that summarize huge amounts of data and deliver actionable information (Ahmad, 2010; Rai et al., 2007). Furthermore, DWs keep historical records and are historically consistent, which supports a better understanding of business processes (Lane, 2007; Ahmed, 2010).
DW technology has been introduced in the civil engineering sector for organizations that generate large amounts of operational data. These data are distributed across various functional systems that support daily operations such as construction management, site selection, and energy-efficient building operation (Chau et al., 2002; Ahmad et al., 2004; Rujirayanyong et al., 2005; Ahmed et al., 2009).
It has also been introduced into environmental management, sustainability, and ecology (Burmann et al., 2007; Teuteberg et al., 2009; Freundlieb et al., 2009). In these fields there is a growing need for decision support based on ecological criteria such as electricity consumption or pollutant content. The concept has been used to determine relationships between site characteristics, water quality variables, and fish community health (McGuire et al., 2006). It has also been used to develop an integrated approach to decision making in the agricultural sector (Rai et al., 2007).
Much remains to be done in developing data warehousing for the environmental and water resources sectors. The existing literature identifies ways to incorporate spatial dimensions in a DW. However, research is lacking on the process of identifying the dimensions, facts, and hierarchies for spatial data warehousing in environmental and water resources areas (McGuire et al., 2008). Given the nature of environmental and water resources data and their sources, developing an integrated information system and DW has great potential in these areas (Burmann et al., 2007).
2.5.1 Data Driven Models. Monitoring systems collect huge amounts of data daily, and information systems have grown and advanced exponentially. Together, these trends have directed attention to data mining as a way to generate models that can explain physical systems. Data mining analyzes all the data characterizing a system and models the system on the basis of connections between its state variables, with only a limited number of assumptions about its physical behavior (UNESCO-IHE, 2012). Data-driven modeling is the study of mathematical algorithms that improve automatically through experience and training (Preis et al., 2007). It has developed with contributions from areas such as artificial intelligence, machine learning, data mining, knowledge discovery, and pattern recognition. The most commonly used models are artificial neural networks, fuzzy rule-based systems, and statistical methods.
Data-driven modeling has gained considerable attention in recent decades in both hydrology and water resources research. Physically based models require a description of the system's inputs, the governing physical laws, and boundary and initial conditions. A data-driven model, by contrast, simply extracts knowledge from large amounts of data with only a limited number of assumptions about the physical behavior of the system. A data-driven modeling approach can therefore be considered only if sufficient data are available.
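Statistical methods are among the model types named above. The simplest data-driven model learns a relation directly from paired records, with no physical description of the system. The sketch below fits an ordinary least squares line, for example runoff against rainfall. The choice of a linear model and the function names are illustrative assumptions. Practical studies often use neural networks or other nonlinear learners instead.

```python
def fit_linear(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(coeffs, x_new):
    """Apply fitted (a, b) coefficients to new inputs."""
    a, b = coeffs
    return [a + b * xi for xi in x_new]
```

The model's only knowledge comes from the training pairs. It is trustworthy only when enough data are available and the system does not change over the period covered, which matches the conditions listed in the next paragraphs.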
Data-driven modeling has been applied in several areas:
- rainfall-runoff modeling (Minns et al., 1996; Dawson et al., 1998; Tokar et al., 2000; Solomatine et al., 2003; Abedini et al., 2004; Muttil et al., 2004; Lin et al., 2007);
- flood forecasting (Sahoo et al., 2006; Chen et al., 2007; Chiang et al., 2007);
- streamflow prediction (Imrie et al., 2000; Asefa et al., 2006; Preis et al., 2007).

Water quality constituents have also been predicted using data-driven models in a number of studies (Markel et al., 2002; Preis et al., 2007; Shrestha et al., 2007).
Data-driven models have proven applicable to various water-related problems. They are useful for solving a practical problem or modeling a system or process if (1) a sufficient amount of data is available and (2) there are no considerable changes to the modeled system during the period covered by the model (Solomatine et al., 2004; Solomatine et al., 2007). They are effective when building knowledge-driven simulation models is difficult because the underlying physical processes are not well understood (Preis et al., 2007; Shrestha et al., 2007). They are also effective when the available models are not adequate (Solomatine et al., 2007). It is always useful to have modeling alternatives and to validate the simulation results of physically based models with data-driven ones, or vice versa (Solomatine et al., 2003; Preis et al., 2007).
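Cross-validating a physically based model against a data-driven one requires a common score. The text does not name a specific goodness-of-fit measure. The sketch below assumes the Nash-Sutcliffe efficiency (NSE), a measure widely used in hydrology. NSE equals 1 for a perfect match and 0 when the simulation is no better than the observed mean. The function names are hypothetical.

```python
def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    denom = sum((o - mean_obs) ** 2 for o in observed)
    if denom == 0:
        raise ValueError("reference series has zero variance")
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return 1.0 - num / denom


def compare_models(observed, physical, data_driven):
    """Score both models against observations and against each other."""
    return {
        "physically_based": nash_sutcliffe(observed, physical),
        "data_driven": nash_sutcliffe(observed, data_driven),
        # Treat the physically based output as the reference for a cross-check.
        "cross_check": nash_sutcliffe(physical, data_driven),
    }
```

Scoring both models against the same observations, and against each other, shows where they agree. Disagreement points to periods where either the physical parameterization or the training data deserve a closer look.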
2.6 Conclusion
Investigating land use effects on water quality in a highly urbanized watershed such as the Chicago River Watershed requires a thorough understanding of the spatial and temporal aspects of the different attributes of water resources. These include, especially, quantity and quality, and how they are interlinked. Finding comprehensive ways to relate and assess these attributes is the key to sound and successful watershed management. This can be achieved through sufficient integration between watershed elements (water quality, quantity, climate, and land use) and watershed problems, conflicts, needs, and targets, while improving domain knowledge and decision-making ability at the same time.
Methodologies for analyzing and assessing watersheds using data warehouse and data mining technologies have proved successful. Relative to existing systems, they are receiving considerable attention in the water resources field. The watershed perspective has also been accepted by water resource managers and policy makers as an effective methodology for addressing the full range of concerns in a watershed. Thus, the key elements are incorporating detailed land use and historical data records to develop tools that quantify the impact on water quality, using both physically based and data-driven modeling techniques.
CHAPTER 3
STUDY AREA
3.1 Introduction
The Chicago River Basin (hydrologic unit 07120003) is the smallest part of the Upper Illinois River Basin (UIRB), comprising 6 percent of the whole basin. The UIRB is part of the Mississippi River Basin, one of the world's largest drainage basins, which drains more than 40% of the land area of the contiguous United States. The significance of the Chicago River Basin lies in its navigable system. The Chicago Sanitary and Ship Canal, along with the Illinois River and the lower reaches of the Des Plaines River, provides a navigable link between Lake Michigan and the Mississippi River.
3.2 Watershed Characteristics
3.2.1 Location and Drainage Area. The Chicago River watershed is located in northern Illinois, within latitudes 41°11' and 42°20' N and longitudes 87°32' and 88°46' W. It drains approximately 645 mi². The upper river is the North Branch Chicago River, which originates in Lake County as three tributary streams: the West Fork, the Middle Fork, and the Skokie River (Figure 3.1). The three tributaries flow south into Cook County. The Skokie River joins the Middle Fork, which then joins the West Fork. The North Branch Chicago River begins at the junction of the combined Middle and West Forks and extends to the junction of the North Branch and the North Shore Channel. The North Branch then joins the South Branch of the river in downtown Chicago. The South Branch flows into the Chicago Sanitary and Ship Canal, which flows westward to join the Des Plaines River, a tributary of the Illinois River. The Illinois River in turn flows southwest across the state to join the Mississippi River system.
3.2.2 Topography. The uppermost bedrock of the Chicago River Basin is mainly undifferentiated Silurian-Devonian dolomite and limestone, and Ordovician shale (USGS, 1999). The Chicago River and Des Plaines River basins are naturally separated by a drainage divide in northern Cook County, Illinois. The origin of the fault has been explained as resulting from either volcanic activity or meteorite impact (USGS, 1999). Mean elevation in the watershed is 443 ft above sea level, and the study area has a mean basin slope of 0.001.
3.2.3 Population Growth. The Chicago River basin is a densely populated area. Population in the basin grew steadily over the years, driving urban and industrial growth. As a result, major changes took place in the region that have significantly affected the quality of surface waters: the construction of navigable waterways, the diversion of Lake Michigan water, and the construction of wastewater treatment plants (USGS, 1999). Wastewater disposal and storm runoff became serious issues in the watershed.
Before 1900, the Chicago River and the Calumet River flowed into Lake Michigan, and the Chicago River served as the sewage system. As the population grew, the river became badly polluted, with human and industrial wastes dumped directly into the river and then into Lake Michigan. Two problems followed: the need to provide clean drinking water from the lake, and the contamination of the river, which caused disease in the area. Together they led to the decision to reverse the Chicago River by creating a canal from the Chicago River to the Des Plaines River. A cut was made through the natural subcontinental divide that separates the Chicago River and Calumet River basins from the Des Plaines River basin. The Chicago River now flows from north to south through Lake and Cook Counties. The population has declined slightly over the last two decades, but current issues in the area stem from the development and redevelopment of urban areas.
3.2.4 Soils. Mollisols with low to very low permeability cover the entire watershed (USGS, 1999). Poorly drained soils predominate in the north, especially along the rivers. The hydrologic soil group classification identifies soil groups with similar infiltration and runoff characteristics. Typically, clay soils are poorly drained and have very low infiltration rates, while sandy soils are well drained and have higher infiltration rates. The United States Department of Agriculture has defined four hydrologic soil groups (A, B, C, and D) (USDA, 2007; USDA, 2012). Group A soils have high infiltration rates, while group D soils have very low infiltration rates. Generally, the Chicago River watershed has a moderately slow infiltration rate along Lake Michigan (hydrologic group C), with very poorly drained areas along the western border of the watershed. The rest of the watershed is highly altered and mainly impervious (ILEPA, 2009).
3.2.5 Climate. The climate of the watershed is classified as humid continental, with cool, dry winters and warm, humid summers. The interaction of cool, dry air and warm, moist air is the source of most precipitation in the basin. This combination can produce large daily fluctuations in temperature and precipitation (USGS, 1999). Average annual temperature ranges from 46°F to 51°F. The average winter low temperature is 4°F, and the average summer temperature is 77°F to 82°F. Average annual precipitation is approximately 16 to 18 in., and average snowfall (including snow, ice, sleet, and hail) is approximately 50 in./yr. Evapotranspiration (evaporation plus moisture released from plants) returns an estimated 70 percent of the average annual precipitation to the atmosphere.
3.2.6 Land Use. Human factors that affect the hydrologic characteristics of the watershed include land use, urbanization, and population change. Population in the basin grew steadily and created urban and industrial growth areas, owing to the construction of the navigable system that links Lake Michigan and the Mississippi River. Numerous inputs of contaminants and nutrients from manmade sources have become a serious issue (USGS, 1999). These sources include municipal and industrial releases, urban runoff, and atmospheric deposition.
The Chicago River watershed is approximately 82% urban land use. Land use percentages for the Chicago metropolitan area were extracted from the Chicago Metropolitan Agency for Planning (CMAP) 2005 Land Use Inventory. The inventory was created using digital aerial photography supplemented with data from numerous government and private-sector sources (CMAP, 2012). Figure 3.2 shows urban land use proportions by sub-region.

Figure 3.1. Study area (www.chicagoriver.org).

[Figure: stacked bars titled "Urbanized Land Use Proportions by Sub-Region, 2005" (0 to 100%). Each bar shows residential, commercial, institutional, industrial, transportation/communication/utilities, and under-construction shares for the sub-regions Chicago, Suburban Cook, DuPage, Kane, Kendall, Lake, McHenry, and Will.]
Figure 3.2. Urban land use in Chicago (CMAP, 2012)
3.2.7 Surface Water Issues. Surface-water issues related to urbanization include point and nonpoint sources of sediment, nutrients, trace elements, and organic compounds; streamflow alterations; and the health and community structure of aquatic biota (USGS, 1999). In the early part of the 20th century, the Metropolitan Water Reclamation District of Greater Chicago (MWRDGC) built large intercepting sewers. These sewers redirect sewage to wastewater treatment plants, where it is treated before being discharged as effluent. Today, the MWRDGC reclaims approximately 1.4 billion gallons of wastewater each day.
The two main water reclamation plants (WRPs) that discharge into the Chicago River watershed are the North Shore WRP and the Calumet WRP. The water in the Chicago Area Waterway System (CAWS) is 70% treated effluent; the rest comes from Lake Michigan and stormwater. Combined sewers that carry both sewage and stormwater serve much of the area around the CAWS. The Tunnel and Reservoir Project (TARP) is the MWRDGC's long-term plan to reduce combined sewer overflows (CSOs). TARP works by capturing the flow from CSOs before it reaches the waterways and diverting it to a system of tunnels and reservoirs (MWRDGC, 2011).
3.2.7.1 Water Quantity Issues. Development alters runoff patterns by converting pervious land to impervious land and by changing the lay of the land and drainage patterns. The result is a dramatic increase in the rate and volume of stormwater runoff and a reduction in groundwater recharge (MWRDGC, 2007). Several factors convey greater volumes of runoff downstream at much faster rates (MWRDGC, 2007): changes in land cover, construction activities that compact soils and smooth natural grades, diminished native vegetation, storm sewer systems, and lined channels. These changes have led to increased flooding, stream channel erosion, and hydrologic destabilization of streams (MWRDGC, 2007).
3.2.7.2 Water Quality Issues. Much of the pollutant load in runoff originates from
impervious surfaces, particularly roadways and parking lots. Higher density
developments such as commercial, industrial and highway projects tend to contribute
higher pollutant loads than lower-density residential developments (MWRDGC, 2007).
Some common water quality impacts of stormwater runoff are sediment contamination, nutrient enrichment, toxicity to aquatic life, bacterial contamination, salt contamination, impaired aesthetic conditions, and elevated water temperatures. In general, nutrient loads (nitrogen and phosphorus) were greatest from the urban center of the Chicago metropolitan area, reflecting the effect of wastewater return flows to the Chicago River and the Chicago Sanitary and Ship Canal (USGS, 1999). About 30 percent of the total nitrogen load in the upper Illinois River Basin was measured in the Chicago Sanitary and Ship Canal at Romeoville, and this load results primarily from wastewater-treatment-plant effluents.
The Chicago Sanitary and Ship Canal was also observed to carry the majority of the ammonia and phosphorus loads during low-flow conditions (USGS, 1995). It is considered the main nutrient contributor to the Illinois River and hence to the Gulf of Mexico dead zone, one of the largest hypoxic zones measured. Hypoxia is a condition of low dissolved oxygen, defined as concentrations below 2 mg/L, that occurs when an overabundance of nutrients leads to excessive algal blooms (eutrophication). Prolonged hypoxia can kill the biota in the water. Table 3.1 lists common pollutants and their potential sources in the Cook County watersheds, within which most of the Chicago River watershed lies (MWRDGC, 2007).
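The 2 mg/L hypoxia threshold given above can be applied directly to monitoring records. The following is a minimal sketch that flags dissolved oxygen (DO) readings below the threshold and reports the fraction of hypoxic readings at a station. The function name and the readings are hypothetical and do not come from the study data.

```python
# Hypoxia threshold from the text: dissolved oxygen (DO) below 2 mg/L.
HYPOXIA_THRESHOLD_MG_L = 2.0


def hypoxic_fraction(do_readings_mg_l):
    """Return the fraction of DO readings (mg/L) that fall below the hypoxia threshold."""
    if not do_readings_mg_l:
        raise ValueError("no readings supplied")
    hypoxic = [do for do in do_readings_mg_l if do < HYPOXIA_THRESHOLD_MG_L]
    return len(hypoxic) / len(do_readings_mg_l)


# Hypothetical DO readings (mg/L) at a single monitoring station.
readings = [6.5, 4.1, 1.8, 2.0, 0.9]
print(hypoxic_fraction(readings))
```

A reading of exactly 2.0 mg/L is not counted here because the text defines hypoxia as concentrations less than 2 mg/L.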
3.3 Watershed Data Used in the Study
The review of available historical data records is an essential step in the analysis
of the watershed system. The analysis and assessment of data will help to pinpoint the
problem areas in the watershed. Figure 3.3 depicts the location of data sources and major
point sources within the watershed.
3.3.1 Data sources and types. For this study, different types of data were compiled from several source agencies to build the WDW and to support watershed assessment and watershed modeling. These agencies include the U.S. Geological Survey (USGS), the Metropolitan Water Reclamation District of Greater Chicago (MWRDGC), the Chicago Metropolitan Agency for Planning (CMAP), the U.S. Army Corps of Engineers, Chicago District (USACE), and the Better Assessment Science Integrating Point & Non-Point Sources (BASINS) data store. Table 3.2 shows the source agency, station ID, data type, and years of data used.
Table 3.1. Sources and types of potential pollutants in the study area (MWRDGC, 2007).
Pollutant: Potential Source
Total Dissolved Solids: Highway/road/bridge runoff (non-construction related), urban runoff/storm sewers, combined sewer overflows, municipal point source discharges, sanitary sewer overflows
Total Suspended Solids: Combined sewer overflows, sanitary sewer overflows, site clearance (land development or redevelopment), urban runoff/storm sewers
Sedimentation/Siltation: Combined sewer overflows, sanitary sewer overflows, site clearance (land development or redevelopment), urban runoff/storm sewers
Dissolved Oxygen: Channelization, combined sewer overflows, upstream impoundments, impacts from hydrostructure flow regulation, sanitary sewer overflows
Total Nitrogen: Combined sewer overflows, municipal point source discharges, sanitary sewer overflows
Total Phosphorus: Combined sewer overflows, sanitary sewer overflows, municipal point source discharges, urban runoff/storm sewers
Chlorine: Combined sewer overflows, highway/road/bridge runoff (non-construction related), municipal point source discharges, urban runoff/storm sewers
Iron: Combined sewer overflows, industrial point source discharges, municipal point source discharges, urban runoff/storm sewers
Silver: Combined sewer overflows, municipal point source discharges, urban runoff/storm sewers, contaminated sediments
DDT: Contaminated sediments
Heptachlor: Contaminated sediments
Hexachlorobenzene: Contaminated sediments
Aldrin: Contaminated sediments
[Map of the study area showing Lake Michigan, the Calumet WRP and other WRPs, and USGS and MWRD monitoring locations; scale in kilometers]
Figure 3.3. Locations of data sources
Table 3.2. Sources' data description
USGS
  Station ID/description: 05536290, 05536118, 05536121, 05536123, 05536179, 05536190, 05536195, 05536255, 05536275, 05536340, 05536500, 05536105, 05535000, 05535070, 05536290, 05535500, 05536000, 05536235
  Data type: Discharge, gage heights
  Years: 1970-2010
MWRDGC
  Station ID/description: WW 31, WW 32, WW 34, WW 35, WW 36, WW 37, WW 39, WW 40, WW 41, WW 42, WW 43, WW 46, WW 48, WW 49, WW 50, WW 52, WW 54, WW 55, WW 56, WW 57, WW 58, WW 59, WW 73, WW 74, WW 75, WW 76, WW 77, WW 78, WW 86, WW 92, WW 96, WW 97, WW 99, WW 100, WW 101, WW 102, WW 103, WW 104, WW 105, WW 106, WW 108
  Data type: Effluents, water quality parameters
  Years: 1970-2008
CMAP
  Station ID/description: Shapefiles
  Data type: Land use inventory
  Years: 2001, 2005
USACE
  Station ID/description: Station no. 10 (Cook County Precipitation Network)
  Data type: Precipitation
  Years: 1999-2006
BASINS
  Station ID/description: BASINS Data Store
  Data type: Climate
3.3.2 Point Sources. Point sources are direct discharges of pollutants to a waterbody through a discrete conveyance such as a pipe or channel. A number of point sources actively discharge within the Chicago River Watershed under National Pollutant Discharge Elimination System (NPDES) permits (ILEPA, 2009). These include facilities, treatment plants, and combined sewer overflows (CSOs).
NPDES discharges were included in the HSPF water quality model as direct inputs to the main reaches in the watershed. The pollutant species considered are total nitrite plus nitrate as nitrogen (NO2+NO3), total ammonia (NH3+NH4+), and total phosphorus (TP) as phosphorus. For this study, only the North Side WRP is considered. Table 3.3 shows the average values of some parameters of its effluent.

Table 3.3. Average annual North Side WRP effluent
Year  Flow (MGD)  TKN (mg/L)  NH3-N (mg/L)  NO2-N (mg/L)  NO3-N (mg/L)  TP (mg/L)  TN (lb)  TP (lb)
1990 294 2.0 1.2 0.5 6.0 1.0 7677968.2 896094.9
1991 291 1.8 0.7 0.3 6.1 1.0 7350200.0 892093.4
1992 276 2.2 0.8 0.4 6.1 1.0 7307889.6 872292.7
1993 299 2.3 0.9 0.3 6.0 0.9 7784686.9 862227.7
1994 268 2.9 1.3 0.4 5.8 0.9 7391447.9 752196.0
1995 265 2.7 1.0 0.4 5.9 1.0 7248471.8 836775.6
1996 265 2.7 1.0 0.4 6.2 1.2 7540693.1 973123.9
1997 253 2.3 0.6 0.4 6.7 1.4 7254368.3 1095403.9
1998 265 2.0 0.4 0.3 7.0 1.4 7487739.7 1162582.2
1999 268 2.2 0.6 0.4 7.1 1.3 7938693.6 1030273.6
2000 252 2.0 0.4 0.4 7.7 1.6 7778360.2 1216608.0
2001 280 2.2 0.8 0.5 6.8 1.2 8080259.0 988723.7
2002 250 2.0 0.7 0.5 6.8 1.4 7069922.3 1057824.8
2003 238 2.4 1.0 0.5 7.1 1.4 7215978.2 999804.2
2004 243 2.6 1.1 0.6 6.8 1.3 7397163.0 969028.4
2005 234 2.3 1.0 0.5 6.9 1.1 6909498.2 804920.9
2006 244 1.86 0.5 0.3 8.3 1.4 7748476.5 1039864.6
2007 241 1.69 0.5 0.3 8.3 1.3 7510150.9 983061.7
2008 245 1.30 0.2 0.2 8.8 1.4 7638529.7 1044126.3
2009 245 1.36 0.2 0.3 8.7 1.3 7644496.1 999378.0
2010 226 1.4 0.3 0.2 8.9 1.4 7236032.7 956273.6
2011 244 1.8 0.5 0.4 8.8 1.3 8129512.6 973016.1
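The load columns can be approximated from the annual average flow and concentration with the standard conversion factor of 8.34 lb per million gallons per mg/L. The sketch below makes three assumptions: annual averages are used, total nitrogen is taken as TKN plus nitrite and nitrate nitrogen, and the year has 365 days. The tabulated loads may have been computed from finer-resolution (e.g., daily) records, so the results differ slightly from the table. For 1990, the sketch gives about 895,000 lb of TP against the tabulated 896,094.9 lb.

```python
# 1 mg/L of a substance in 1 million gallons of water weighs about 8.34 lb.
LB_PER_MG_L_PER_MGAL = 8.34
DAYS_PER_YEAR = 365


def annual_load_lb(flow_mgd, conc_mg_l, days=DAYS_PER_YEAR):
    """Approximate annual load (lb) from average flow (MGD) and concentration (mg/L)."""
    return flow_mgd * conc_mg_l * LB_PER_MG_L_PER_MGAL * days


# 1990 annual averages taken from Table 3.3.
flow = 294.0
tkn, no2_n, no3_n, tp = 2.0, 0.5, 6.0, 1.0
tn = tkn + no2_n + no3_n  # total nitrogen = TKN + nitrite-N + nitrate-N (assumed definition)

print(round(annual_load_lb(flow, tn)))  # approximate TN load, lb/yr
print(round(annual_load_lb(flow, tp)))  # approximate TP load, lb/yr
```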

3.4 Watershed elements
The basic watershed elements are water quality and quantity, climate, land use, and any other characteristics that define a watershed, such as watershed size, shape, slope, soil type, drainage area, hydraulic roughness, and population. Interactions among these elements and their attributes can give rise to the distinct problems, conflicts, targets, and needs that a watershed experiences, as shown in Figure 3.3.
For the Chicago River watershed, these watershed elements are further defined as
follows:
Water Quantity:
Stormwater runoff
Sewer system discharges: outfalls, combined sewer systems, etc.
Water Treatment facilities
Receiving waters: Chicago waterway, Chicago River and tributaries
Water Quality:
Sedimentation and sediment contamination
Nutrients
Toxics
Bacterial contamination
Salt contamination
Impaired recreation conditions
Elevated water temperature
Impaired habitat for aquatic life

Land Use:
82% urban:
~ 56% residential
~ 10% commercial
~ 10% industrial
~ 10% institutional
~ 15% Transportation/utilities
21% open space, agriculture, vegetation, wetland, and water
Climate:
Wide temperature fluctuations
Urban heat island (due to the thermal admittance of building materials and the geometry of structures)
High levels of air pollution (promoting cloud formation)
Increased water vapor (promoting cloud formation)
Altered wind patterns (micro-advection)
Increased precipitation
Watershed Characteristics:
Size
Shape
Drainage area
Soil type
Average slope
Hydraulic roughness (land cover)

Population
Urbanization degree
Problems:
Increased volume and rate of runoff
Flooding
Pollutants
Excess nutrients
Excess fecal coliform bacteria
Excess erosion
Increased water temperature and pH, and decreased DO
Alteration of physical stream habitats
Loss of biodiversity/habitat
Toxins in water and sediments
Conflicts:
Urban development and urban sprawl alter natural land use
Treatment facilities pollute receiving waters
Storm sewers, drainage systems, rooftops, driveways, roads, highways, and parking lots increase the rate and volume of runoff and pollute receiving waters
Targets:
Healthy river and good water quality
Better recreation
Environmental education and awareness
Environmentally sustainable economic development
Healthy wildlife habitats
Reduced flooding and flood damages
Needs:
Integrated Watershed approach
Comprehensive Watershed assessment
Decision models
Optimization approaches to resolve conflicts
TMDLs
WQ standards
BMP
[Diagram: water quality, water quantity, land use, climate, and watershed characteristics feed into the watershed, which gives rise to problems, conflicts, needs, and targets]
Figure 3.3. Basic watershed elements
3.5 Conclusion
Given the study area conditions and the watershed elements described above, the scope of this study is to utilize the available data and to incorporate these elements. The WDW will make it easy to access, retrieve, fill gaps in, analyze, and manage the available historical data records. The data are then used to develop watershed models: a data-driven model that predicts water quality and quantity using data-driven algorithms, and a physical watershed model that simulates the effect of land use on water quality and produces local export coefficients for the Chicago River Watershed. An optimization approach for land use tradeoffs is also introduced. Given the needs of the Chicago River watershed, the study provides the following: BASINS provides the integrated watershed platform; the data warehouse and HSPF provide the decision models and comprehensive watershed assessment; and the optimization approach provides a way to resolve the conflicts listed in section 3.4.

CHAPTER 4
WATERSHED DATA WAREHOUSE
4.1 Introduction
Decision making in watersheds always involves processing information on multiple attributes of water resources, especially quantity and quality. Understanding how these attributes interact, and how to assess them, is key to sound and successful watershed management. This chapter considers the development of an effective and comprehensive tool that holistically integrates some of the watershed attributes and assesses them from a watershed perspective.
For watershed assessment, it is important to have a thorough understanding of the spatial and temporal aspects of the watershed and of the available historical data records. Many organizations and individuals monitor important hydrologic variables that help in assessing watersheds; however, the different data storage systems and formats they use make it hard to integrate the data.
Moreover, these systems are traditional database management systems that lack the ability to aggregate data and to provide decision support that analyzes data and delivers actionable information. This chapter therefore addresses the problem by proposing the design and implementation of a multidimensional data analysis concept for the available watershed data.
The objective of this chapter is to demonstrate how to integrate and analyze data from different data sources. A local DW that aggregates the different data types available from various agencies in the watershed will be presented. Historical records of surface water quality, water quantity, land use, and climate will be investigated and shown as an example for this study, but more attributes can easily be added and utilized following the same procedures. The DW will make it easy to access, retrieve, fill gaps in, analyze, and manage records of water quantity and quality, climate, land use, etc. in the watershed, and to integrate and provide the data for different requirements such as watershed assessment, physical modeling, or simply pinpointing problem and impairment locations in the watershed.
The overall objectives of this chapter are, first, the development of a multidimensional watershed data model based on DW technology; second, the introduction of a graphical user interface that brings the benefits of the multidimensional model to different stakeholders; and finally, the demonstration of the advantages of multidimensional watershed data through the assessment of the Chicago River Watershed.
4.2 Data Warehouse Technology
A DW is a repository of integrated information that is made accessible for queries and analysis and can be used as the foundation of a decision support system (Chau et al, 2002). Behind DW technology is the multidimensional data modeling concept. An object-oriented multidimensional model is denoted by $F(D_1, D_2, D_3, \ldots, D_n)$, which consists of a fact name and a list of dimensions. Each dimension $D$ is made up of a list of category attributes $A_1, A_2, A_3, \ldots, A_n$ together with a generic top element $\mathrm{Top}_D$, ordered by the functional dependency relation $\rightarrow$, as shown (Ahmed, 2010):

$$D = \left(\{A_1, A_2, \ldots, A_n, \mathrm{Top}_D\};\ \rightarrow\right) \qquad (4.1)$$

Each dimension is organized into a hierarchy that can be composed of numerous levels, each allowing data aggregation at the desired level of abstraction (Ahmed, 2010). Each level in a dimension can have additional attributes that provide descriptive characteristics about the facts, to narrow the search and classify the fact data (Rob, 2008). These descriptive attributes and the dimension hierarchy attributes are called dimensional data. $D_i$ is the domain of $A_i$, and $\mathrm{Top}_D$ is a generic maximum element that is functionally determined by every other attribute (Gosain et al., 2010; Ahmed, 2010), as shown in equation 4.2.

$$\forall i\,(1 \le i \le n):\ A_i \rightarrow \mathrm{Top}_D \qquad (4.2)$$

Only one $A_i$ determines all other category attributes and thus defines the finest granularity (Gosain et al., 2010; Ahmed, 2010); see equation 4.3.

$$\exists i\,(1 \le i \le n)\ \forall j\,(1 \le j \le n,\ i \ne j):\ A_i \rightarrow A_j \qquad (4.3)$$
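To make equations 4.2 and 4.3 concrete, the sketch below checks them on a small land use type dimension that uses the Level III, Level II, and Level I hierarchy of this chapter. The check relies on the usual meaning of a functional dependency: attribute a determines attribute b when each value of a maps to exactly one value of b. The codes are hypothetical placeholders, not CMAP's actual land use codes, and the helper names are illustrative.

```python
TOP = "ALL"  # generic maximum element Top_D: every member rolls up to it

# Hypothetical land use type dimension rows (category attribute -> value).
rows = [
    {"level3": "1111", "level2": "1100", "level1": "1000", "top": TOP},
    {"level3": "1112", "level2": "1100", "level1": "1000", "top": TOP},
    {"level3": "1210", "level2": "1200", "level1": "1000", "top": TOP},
    {"level3": "3110", "level2": "3100", "level1": "3000", "top": TOP},
]


def determines(rows, a, b):
    """True if attribute a functionally determines attribute b (a -> b) in the rows."""
    mapping = {}
    for row in rows:
        # setdefault keeps the first b-value seen for each a-value;
        # a different b-value for the same a-value breaks the dependency.
        if mapping.setdefault(row[a], row[b]) != row[b]:
            return False
    return True


def finest_attributes(rows, attrs):
    """Attributes that determine every other category attribute (equation 4.3)."""
    return [a for a in attrs if all(determines(rows, a, b) for b in attrs if b != a)]


attrs = ["level3", "level2", "level1"]
# Equation 4.2: every category attribute determines Top_D.
print(all(determines(rows, a, "top") for a in attrs))
# Equation 4.3: the finest-grained attribute determines all the others.
print(finest_attributes(rows, attrs))
```

For these rows, only the Level III code determines the other two levels, so it defines the finest granularity of the dimension.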
To create a complete warehousing environment, four separate and distinct
components need to be considered (Kimball et al., 2002), see Figure 4.1.

[Diagram: Data Source Area (flat files, relational databases) → Data Staging Area (processing; data stores: flat files and tables) → Data Presentation Area (data marts, dimensional model) → Data Access Tools (query tools, analytical applications, data mining)]
Figure 4.1. Data Warehouse components
4.2.1 Data Source Area. The data source area includes heterogeneous databases that
supply data to the warehouse (Rai et al, 2007). This includes flat files and operational
spatial databases. The source systems should be thought of as outside the DW because
there is little or no control over the content or format of the data (Kimball et al., 2002).
The main priorities of the data source area are processing performance and availability
(Kimball et al., 2002; Inmon, 2005). Homogeneity and consistency among different
sources would be preferred but not required since data will be processed in the staging
area (Kimball et al., 2002).
4.2.2 Data Staging Area. The data staging area is an intermediate database where both
data storage and extract-transform-load (ETL) processes take place. It includes the
identification of relevant information; the extraction of this information; the integration
of the information from multiple sources into a common format; the cleansing of these
data sets; and the propagation of the data to the DW (Kimball et al., 2002; Sapsford et al.,
2006; Simitsis et al., 2005). The data staging area is dominated by the simple activities of
sorting and sequential processing and does not provide query and presentation services.
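The staging steps described above are extraction, integration into a common format, cleansing, and propagation. The following is a minimal sketch of those steps. It parses two differently formatted extracts, converts them to one record layout with ISO dates, drops records with missing values, and loads the rest. Both input layouts (a tab-delimited discharge file and a comma-delimited water quality file), the field names, and the values are assumptions for illustration. They are not the actual USGS or MWRDGC export formats.

```python
import csv
import io
from datetime import datetime

# Hypothetical extract 1: tab-delimited daily discharge (cfs); blank value = missing.
usgs_text = "site_no\tdate\tdischarge_cfs\n05536118\t2005-06-01\t120\n05536118\t2005-06-02\t\n"
# Hypothetical extract 2: comma-delimited water quality samples with MM/DD/YYYY dates.
mwrd_text = "station,sample_date,parameter,value,unit\nWW 36,06/01/2005,Total Phosphorus,0.42,mg/L\n"


def extract_usgs(text):
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        yield {"agency": "USGS", "station_id": row["site_no"], "date": row["date"],
               "parameter": "Discharge", "value": row["discharge_cfs"], "unit": "cfs"}


def extract_mwrd(text):
    for row in csv.DictReader(io.StringIO(text)):
        # Integrate: convert MM/DD/YYYY to ISO dates so both sources share one format.
        iso = datetime.strptime(row["sample_date"], "%m/%d/%Y").date().isoformat()
        yield {"agency": "MWRDGC", "station_id": row["station"], "date": iso,
               "parameter": row["parameter"], "value": row["value"], "unit": row["unit"]}


def transform(records):
    """Cleanse: skip records with missing values and cast values to float."""
    for rec in records:
        if rec["value"].strip() == "":
            continue
        yield {**rec, "value": float(rec["value"])}


def load(records, staging):
    staging.extend(records)  # stand-in for inserting rows into the warehouse


staging = []
load(transform(extract_usgs(usgs_text)), staging)
load(transform(extract_mwrd(mwrd_text)), staging)
for rec in staging:
    print(rec)
```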
4.2.3 Data Presentation Area. The data presentation area (or multi-dimensional data
model) is considered the core of the DW. It is the area where integrated data marts are
organized, stored, and made available for direct querying by users (Kimball et al., 2002).
All data are presented, stored, and accessed through a dimensional model. If the presentation area is based on a relational database, the model is referred to as a star schema; if it is based on a multidimensional database or online analytical processing (OLAP) technology, the data are stored in cubes (Kimball et al., 2002). Data must be atomic and must adhere to the DW bus architecture, in which the overall data architecture of the warehouse is identified in order to deliver granular data in dimensional form. The bus architecture provides a rational approach and framework for decomposing the DW planning task.
4.2.4 Data Access Tools. The final major DW component is the data access tool area.
This area provides an interface for end users to retrieve, process, organize, analyze, and
export data to external environments as appropriate. It can be a simple tool such as an ad
hoc query tool or a complex one such as a sophisticated data mining or modeling
application (Kimball et al., 2002).
4.3 Watershed Data Warehouse
The basic watershed elements are water quality and quantity, climate, land use,
and any other characteristics that define a watershed such as watershed size, shape, slope,
soil type, drainage area, hydraulic roughness and population. Interactions among these
elements and their attributes can give rise to the distinct problems, conflicts, targets, and needs that a watershed experiences (see Figure 3.3 and section 3.4).
Information regarding interactions and relationships among different watershed
elements at a watershed scale is an important step in developing an effective decision
support system and a sound watershed management plan. It is known that factors such as
changes in climate and land use alter the hydrologic cycle and affect the quantity of water available for runoff, streamflow, and groundwater flow (Changnon et al., 1996) as well as water quality (Tong et al., 2009). It is also well established that watershed hydrology is intimately related to land use, soil type, and climate (Chow et al., 1988). In spite of this, these relationships are not always assessed in policy design (Randhir et al, 2009).
The focus of this study is to develop an effective way to facilitate the evaluation of interactions among watershed attributes by utilizing a WDW. Watershed attributes such as precipitation, nutrients, and surface flow, which stem from basic watershed elements such as climate, water quality, and water quantity, can be evaluated either by gaining more information about a single attribute or by retrieving information across multiple attributes.
[Diagram: water quality, water quantity, land use, climate, and watershed characteristics feed into the watershed, which gives rise to problems, conflicts, needs, and targets]
Figure 3.3. Basic watershed elements (shown before in section 3.4)
The interactions among attributes and the difficulty in assessing them play a vital
role in resource management (Randhir et al, 2009; Randhir et al, 1997). Recognizing the
right relationship is an important step to achieve the potential mix of products and
services that could be provided by a watershed (Randhir et al, 2009; Lovejoy et al. 1997).
The complexity of interactions among different watershed elements and the difficulty in assessing them are major reasons for adopting evaluation plans that focus on a single element or attribute.
Data on the basic watershed elements are segregated among the different operational systems and the data sources that support them. This segregation causes many problems for watershed-scale data analysis, including difficult data sharing; redundancy, where multiple entries for the same data may exist at various locations; a slower decision-making process; and a lack of support for the advanced analyses that are important for holistic watershed-scale decisions.
In watershed-scale analysis and assessment, all data can be associated according to a specific purpose. The WDW will be capable of providing information based on the interactions among the basic resources. Collecting and analyzing data in this fashion is practical and logical.
4.4 The Development of Watershed Data Warehouse
In designing a DW, the first challenge is to determine how to integrate data
sources in a DW. Two distinct approaches may be used to determine the corresponding
strategy (Rujirayanyong et al, 2006): need-based (top down) and availability-based
(bottom up) approaches. The need-based approach addresses the data that will be needed in the future based on the watershed needs, so that these data can be acquired and added to the warehouse.
The availability-based approach determines which data are currently available in the source systems, and the available data are added to the warehouse. In this case, some uploaded data may have no immediate use but may become useful in the future. For the WDW, a hybrid approach is adopted that takes into account both the watershed needs and the realities of the data sources.
This study classifies watershed data into five categories:
Water quality data such as water temperature, nutrients concentration, DO, pH
etc;
Water quantity data such as stream flow, groundwater flow, surface runoff etc;
Climate data such as precipitation, air temperature, evaporation, cloud cover etc;
Land use such as urban, agricultural, etc;
Watershed characteristics such as slope, hydraulic roughness, population, soil
type etc.
These data may exist in a large variety of formats, but they are standardized in the DW staging area.
4.4.1 Data Sources. Within the United States, many hydrologic variables such as streamflow, water quality, groundwater levels, soil moisture, and precipitation are monitored by agencies such as the Environmental Protection Agency (EPA), the U.S. Geological Survey (USGS), the NOAA National Climatic Data Center (NCDC), and others. A number of national data collection and publication systems have been formed to bring these data under one roof (e.g., STORET).
These systems contain huge amounts of data, but their different storage systems, formats, and data retrieval systems remain an obstacle to accessing and utilizing the data (Horsburgh et al., 2009).

4.4.2 Dimensional Modeling. A dimensional model contains the logical design of a
DW, preferably for the most atomic data collected. Data at its lowest grain level provides
maximum analytic flexibility because it can be constrained and rolled up in many
different ways (Kimball et al., 2002).
In a DW, data are regarded as either fact data or dimensional data (Rob et al., 2008). Fact tables consist of numeric measurements and are joined to a set of dimension tables that are filled with descriptive attributes. The fact table is the primary table in the dimensional model; it is where numerical performance measurements are stored. A row in a fact table corresponds to a measurement, and all measurements in a fact table must be at the same grain (Kimball et al., 2002). One example of a fact measure is a specific watershed reading, e.g., flow.
Dimensions are described as discrete attributes that determine the minimum granularity adopted to represent facts. They are the entry points into the fact table and hence the users' interface to the whole DW. The dimension attributes are the primary source for querying and reporting (Kimball et al., 2002). The power of the DW is directly proportional to the quality and depth of the dimension attributes (Kimball et al., 2002). In the flow reading example, typical dimensions for the watershed reading data would be flow type, flow location, or flow date.
Each dimension table is defined with a primary key field, while the fact table uses foreign key fields to reference its dimension tables. The fact and dimension tables are simply joined in a star join schema. The resulting dimensional schema is scalable, allowing new fact and dimension tables to be added as needed, and extensible to accommodate change (Kimball et al., 2002; Rujirayanyong et al, 2006).
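A star join of this kind can be illustrated with Python's built-in sqlite3 module. The sketch below creates a water quantity fact table whose foreign keys reference date, location, and measurement-details dimension tables. It then constrains on dimension attributes and rolls daily flow readings up to monthly means. The table and column names and the readings are hypothetical and simplified; they are not the WDW's actual schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT,
                       month INTEGER, year INTEGER);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, station_id TEXT,
                           agency TEXT);
CREATE TABLE dim_measurement (measurement_key INTEGER PRIMARY KEY, name TEXT,
                              unit TEXT);
-- Fact table: one row per reading (the grain), with a foreign key to each dimension.
CREATE TABLE fact_water_quantity (
    date_key INTEGER REFERENCES dim_date(date_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    measurement_key INTEGER REFERENCES dim_measurement(measurement_key),
    reading_value REAL);
""")
cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)",
                [(20050601, "2005-06-01", 6, 2005), (20050602, "2005-06-02", 6, 2005),
                 (20050701, "2005-07-01", 7, 2005)])
cur.execute("INSERT INTO dim_location VALUES (1, '05536118', 'USGS')")
cur.execute("INSERT INTO dim_measurement VALUES (1, 'Flow', 'cfs')")
cur.executemany("INSERT INTO fact_water_quantity VALUES (?, 1, 1, ?)",
                [(20050601, 120.0), (20050602, 100.0), (20050701, 80.0)])

# Star join: filter on dimension attributes, aggregate the fact measure by month.
query = """
SELECT d.year, d.month, l.station_id, AVG(f.reading_value)
FROM fact_water_quantity f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_location l ON f.location_key = l.location_key
JOIN dim_measurement m ON f.measurement_key = m.measurement_key
WHERE m.name = 'Flow'
GROUP BY d.year, d.month, l.station_id
ORDER BY d.year, d.month
"""
for row in cur.execute(query):
    print(row)
```

Adding another fact table (for example, water quality readings) only requires a new table that reuses the same dimension keys, which is what makes the schema scalable.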

To build a DW for a watershed, a hybrid top-down/bottom-up approach was followed. All possible facts and dimensions were identified, and possible linkages between them were established through a Bus Architecture Matrix (BAM) (Kimball et al., 2002) (see Table 4.1). By defining a standard bus interface for the DW environment, separate fact and dimensional models that share a comprehensive set of common, conformed dimensions can be implemented. In Table 4.1, the watershed processes are laid out as matrix rows, which translate into facts based on the primary watershed activities. The rows of the BAM are facts (data marts), the columns are possible dimensions, and the intersections of data marts and dimensions are marked. This watershed BAM maps all the processes that need to be considered so that all data marts conform to each other on a common definition of each dimension.
The watershed processes, or fact tables, proposed for this study are watershed water quality, watershed water quantity, watershed climate, watershed land use, and watershed characteristics. The BAM can be expanded as needed by adding new watershed processes (data marts) or by detailing existing processes further, along with their corresponding dimensions.
Table 4.1. The Bus Architecture Matrix for WDW
Processes (rows); dimensions (columns): Date, Location, Source agency, Measurement details, Land use type, Watershed characteristics type
Watershed water quality: XXX X X
Watershed water quantity: XXX X X
Watershed climate: XXX X
Watershed land use: XXX X
Watershed characteristics: XX X
A grain level for each entity (fact table and dimension) is determined according to the watershed requirements and data availability. Table 4.2 provides definitions of the entities used in the proposed WDW model, giving the type, description, and grain of the fact and dimension tables. Two types of slowly changing dimensions are used: fixed, meaning that the information about the dimension never changes; and type 1, meaning that the information about the dimension can be updated, with new values overwriting the old ones because the change is too insignificant to be tracked. The grain level describes the individual record in each fact table, making it easy to choose the appropriate dimensions to associate with the fact table (Rai et al, 2007).
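The difference between the two slowly changing dimension types can be sketched as follows. A type 1 dimension, such as Location, overwrites an attribute in place so that no history is kept. A fixed dimension, such as Date, rejects updates once loaded. The class name, its methods, and the station attributes are hypothetical and serve only to illustrate the two behaviors.

```python
class DimensionTable:
    """Minimal in-memory dimension table keyed by a surrogate key."""

    def __init__(self, name, scd_type):
        if scd_type not in ("fixed", "type1"):
            raise ValueError("scd_type must be 'fixed' or 'type1'")
        self.name = name
        self.scd_type = scd_type
        self.rows = {}

    def insert(self, key, attributes):
        self.rows[key] = dict(attributes)

    def update(self, key, attribute, new_value):
        if self.scd_type == "fixed":
            # Fixed dimensions never change once loaded.
            raise PermissionError(f"{self.name} is a fixed dimension")
        # Type 1: overwrite the old value; the previous value is not retained.
        self.rows[key][attribute] = new_value


location = DimensionTable("Location", "type1")
location.insert(1, {"station_id": "05536118", "description": "old description"})
location.update(1, "description", "corrected description")
print(location.rows[1]["description"])  # corrected description

date_dim = DimensionTable("Date", "fixed")
date_dim.insert(20050601, {"full_date": "2005-06-01", "month": 6})
try:
    date_dim.update(20050601, "month", 7)
except PermissionError as err:
    print(err)  # Date is a fixed dimension
```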

Table 4.2. Entity definition
Watershed water quality
  Entity type: Fact
  Description: Contains water quality readings (e.g., nitrates, DO) at different monitoring stations
  Grain: A reading
Watershed water quantity
  Entity type: Fact
  Description: Contains water quantity readings (e.g., surface water flow) at different monitoring stations
  Grain: A reading
Watershed climate
  Entity type: Fact
  Description: Contains climate readings (e.g., precipitation, air temperature) at different monitoring stations
  Grain: A reading
Watershed land use
  Entity type: Fact
  Description: Contains pervious, impervious, and total areas of different land use types at different monitoring stations
  Grain: A land use area
Watershed characteristics
  Entity type: Fact
  Description: Contains different parameters that describe a watershed
  Grain: A watershed parameter value
Date
  Entity type: Dimension (fixed)
  Description: Provides hierarchies for analyzing monitoring data for different dates or date ranges (e.g., days, weeks, months, seasons, years)
  Grain: A day
Location
  Entity type: Dimension (type 1)
  Description: Provides information about the monitoring stations (e.g., station ID, location description, longitude, latitude, monitoring agency)
  Grain: A monitoring station
Source agency
  Entity type: Dimension (fixed)
  Description: Provides a description of the monitoring agency (e.g., name, type)
  Grain: A monitoring agency
Measurement details
  Entity type: Dimension (type 1)
  Description: Provides detailed information that describes the water quality, water quantity, and climate readings (e.g., name, unit, category, subcategory)
  Grain: A measurement
Land use type
  Entity type: Dimension (fixed)
  Description: Provides hierarchies of land use type (e.g., land use level, land use code, and description)
  Grain: Level III land use type
Watershed characteristics type
  Entity type: Dimension (fixed)
  Description: Provides information about different watershed characteristics (e.g., drainage area, soil characteristics, population)
  Grain: A watershed characteristics parameter
Dimension tables represent hierarchical relationships (Kimball et al., 2002). Each dimension is structured so that fact measures from the fact table can be filtered or aggregated at a desired level of the hierarchy; for instance, the Date dimension allows data to be aggregated at the day level, week level, month level, etc. Each level in a dimension can have more attributes that provide descriptive characteristics about the facts, to filter the search and classify the fact data (Rob et al, 2008). The basic dimensions that show the explicit grain proposed for this study are the Date dimension, Location dimension, Source agency dimension, Measurement details dimension, Land use type dimension, and Watershed characteristics type dimension.
The six dimensions of the multidimensional model are described in detail below:

1. Date Dimension: This dimension specifies the daily grained measurements. It is
the structure of time providing access to the watershed's historical records. This
structure aggregates data from the day level, week level, month level, season level
and to the year level, in a single standard calendar year hierarchy.
2. Location Dimension: This dimension specifies the locations of monitoring stations, land use, and watershed characteristics. It structures the physical
locations of a monitoring station or a location where land use data or specific
watershed characteristics can be related. It can facilitate aggregation of data
based on location that is specified by the monitoring station ID, available location
description, source agency, longitude, and latitude.
3. Source Agency Dimension: This dimension specifies data sources. It aggregates data based on the agency's name (e.g., USGS, EPA) and type (e.g., federal, regional, local) or the type of measurements (e.g., water quality, water quantity).
4. The Measurement Details Dimension: This dimension specifies details about the measurements to be aggregated, such as name (e.g., total phosphorus, flow), unit (e.g., mg/L, cfs), category (e.g., water quality, water quantity), and subcategory (e.g., chemical, physical).
5. The Land Use Type Dimension: This dimension specifies the land use level, whether Level I (e.g., urban land use), Level II (e.g., residential urban land use), or Level III (e.g., single-family residential land use), along with the corresponding code and description for the land use.
6. The Watershed Characteristics Dimension: This dimension specifies the different types of elements that characterize a watershed, such as hydrologic units, watershed shape, watershed length, watershed slope, drainage area, surface roughness, soil characteristics, and watershed population.
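To illustrate the Date dimension hierarchy in item 1, the sketch below generates one dimension row per day with week, month, season, and year attributes. It then rolls hypothetical daily readings up from the day level to the season level. Two choices here are assumptions: meteorological seasons (December to February as winter, and so on) and ISO week numbering. The text does not state the season boundaries the WDW actually uses.

```python
from collections import defaultdict
from datetime import date, timedelta

# Assumed meteorological seasons; the WDW's actual definition may differ.
SEASONS = {12: "Winter", 1: "Winter", 2: "Winter", 3: "Spring", 4: "Spring", 5: "Spring",
           6: "Summer", 7: "Summer", 8: "Summer", 9: "Fall", 10: "Fall", 11: "Fall"}


def date_dimension(start, end):
    """Yield one dimension row per day (the grain) from start to end, inclusive."""
    day = start
    while day <= end:
        yield {"date_key": int(day.strftime("%Y%m%d")), "full_date": day.isoformat(),
               "iso_week": day.isocalendar()[1], "month": day.month,
               "season": SEASONS[day.month], "year": day.year}
        day += timedelta(days=1)


dim = {row["date_key"]: row for row in date_dimension(date(2005, 5, 30), date(2005, 6, 2))}

# Hypothetical daily flow readings keyed by date_key.
readings = {20050530: 90.0, 20050531: 110.0, 20050601: 120.0, 20050602: 100.0}

# Roll up from the day level to the (year, season) level.
totals = defaultdict(list)
for key, value in readings.items():
    row = dim[key]
    totals[(row["year"], row["season"])].append(value)
for level, values in sorted(totals.items()):
    print(level, sum(values) / len(values))
```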
Figure 4.2 shows the roll-up for the land use type dimension as an example of the hierarchical relationships represented by dimension tables. All data regarding the dimensions are stored in the corresponding dimension tables, and all fact measures are stored in separate tables. Each fact table (watershed water quality, watershed water quantity, watershed climate, watershed land use, and watershed characteristics) is individually related to the dimensional data. Since the presentation area is based on a relational database, these dimensionally modeled tables are referred to as a star schema (Kimball et al., 2002).
Using the star schema as a data modeling technique provides an efficient query environment. It makes multidimensional data analysis easy to implement while keeping the relational structure of the dimensional and fact data (Rob et al, 2008). Figure 4.3 shows the star model for one of the proposed watershed processes, the watershed water quality data mart, and its corresponding dimensions. In the star schema model, the watershed water quality fact table (numeric measurements) is joined to a set of dimension tables (date, location, source agency, and land use) that are filled with descriptive attributes.
Each fact table can be shown individually, as in Figure 4.3, with the dimension tables displayed radially around it, or all fact tables can be shown collectively. The proposed WDW is designed as a multidimensional model, shown in Figure 4.4. The figure shows the five central fact tables for the five watershed processes, which consist of measurements and keys to a set of six smaller dimension tables detailing the dimensional and hierarchical attributes.

87
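To make the star join concrete, the following is a minimal sketch using Python's built-in sqlite3 module instead of Oracle. The table names follow Table 4.3, but the column names, the reduced set of three dimensions, and the sample readings are hypothetical.

```python
import sqlite3

# In-memory database standing in for the Oracle 11g presentation area.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE date_dim (
    date_key INTEGER PRIMARY KEY, full_date TEXT, season TEXT);
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY, station_id TEXT);
CREATE TABLE measurement_details_dim (
    measurement_details_key INTEGER PRIMARY KEY, measurement_name TEXT, unit TEXT);
CREATE TABLE watershed_water_quality_fact (
    date_key INTEGER NOT NULL REFERENCES date_dim(date_key),
    location_key INTEGER NOT NULL REFERENCES location_dim(location_key),
    measurement_details_key INTEGER NOT NULL
        REFERENCES measurement_details_dim(measurement_details_key),
    reading_value REAL);
""")

# Hypothetical dimension rows and fact measurements.
conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?)", [
    (1, "2005-07-12", "Summer"),
    (2, "2005-08-09", "Summer"),
    (3, "2005-12-06", "Winter"),
])
conn.executemany("INSERT INTO location_dim VALUES (?, ?)", [(1, "WW_32"), (2, "WW_46")])
conn.execute("INSERT INTO measurement_details_dim VALUES (1, 'Total Phosphorus', 'mg/L')")
conn.executemany("INSERT INTO watershed_water_quality_fact VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 0.20), (2, 1, 1, 0.30), (3, 1, 1, 0.90),
    (1, 2, 1, 1.00), (2, 2, 1, 1.50),
])

# Star join: constrain dimension attributes, then aggregate the fact measure.
rows = conn.execute("""
    SELECT l.station_id, AVG(f.reading_value), MAX(f.reading_value)
    FROM watershed_water_quality_fact AS f
    JOIN date_dim AS d ON f.date_key = d.date_key
    JOIN location_dim AS l ON f.location_key = l.location_key
    JOIN measurement_details_dim AS m
        ON f.measurement_details_key = m.measurement_details_key
    WHERE m.measurement_name = 'Total Phosphorus' AND d.season = 'Summer'
    GROUP BY l.station_id
    ORDER BY l.station_id
""").fetchall()
print(rows)  # [('WW_32', 0.25, 0.3), ('WW_46', 1.25, 1.5)]
```

All descriptive filtering happens on the small dimension tables, while the large fact table only contributes keys and numeric measures, which is what keeps star-schema queries simple.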
[Figure 4.2: Land Use Level III Code/Desc rolls up to Land Use Level II Code/Desc, which rolls up to Land Use Level I Code/Desc]
Figure 4.2. Roll-up for the land use type dimension and related attributes
[Figure 4.3: diagram labels include Land Use Type, Location, Watershed Water Quality, Source Agency, Date, Measurement Details, and Watershed Characteristics Type]
Figure 4.3. Star schema model for watershed water quality data mart
[Figure 4.4: fact tables and dimension tables of the WDW, with foreign keys marked (FK)]
Figure 4.4. Multi-dimensional model for watershed
Table 4.3 lists statistics for the dimension, fact, and stage tables. Table 4.4 shows the
watershed water quality fact table resulting from the star schema, as an example of the
watershed process fact tables. It contains the watershed process reading measures and
the keys of all dimensions related to the fact table via the dimension primary keys. All
fact tables have three or more foreign keys, designated by the FK notation in Figure 4.4,
that connect to the primary keys of the dimension tables (Kimball et al., 2002). For
example, a date key in any of the fact tables always matches a specific date key in the
Date dimension table. When all the keys in the fact tables correctly match their
respective primary keys in the corresponding dimension tables, the tables satisfy
referential integrity, and the fact tables can be accessed via the dimension tables joined
to them (Kimball et al., 2002).
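The referential integrity condition described above can also be checked before loading. The following Python sketch, with hypothetical function and column names, lists the fact rows whose foreign keys have no matching primary key in their dimension; in the database itself the same rule is normally enforced by foreign-key constraints.

```python
def find_orphans(fact_rows, dimension_keys):
    """Return (row, missing_columns) for every fact row that breaks referential integrity.

    fact_rows: list of dicts mapping foreign-key column names (and measures) to values.
    dimension_keys: dict mapping each foreign-key column name to the set of primary
        keys present in the corresponding dimension table.
    """
    orphans = []
    for row in fact_rows:
        missing = [col for col, keys in dimension_keys.items() if row.get(col) not in keys]
        if missing:
            orphans.append((row, missing))
    return orphans


# Hypothetical keys and readings for illustration only.
dimension_keys = {"date_key": {1, 2, 3}, "location_key": {10, 11}}
fact_rows = [
    {"date_key": 1, "location_key": 10, "reading_value": 7.5},
    {"date_key": 4, "location_key": 10, "reading_value": 6.9},  # unknown date
    {"date_key": 2, "location_key": 12, "reading_value": 8.1},  # unknown location
]
print([missing for _, missing in find_orphans(fact_rows, dimension_keys)])
# [['date_key'], ['location_key']]
```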
Table 4.3. WDW tables' statistics

Table Name                     Table Type   Number of Rows   Average Row Length
DATE_DIM                       DIM          29950            89
LAND_USE_TYPE_DIM              DIM          136              86
LOCATION_DIM                   DIM          77               82
MEASUREMENT_DETAILS_DIM        DIM          159              29
SOURCE_AGENCY_DIM              DIM          4                72
WATERSHED_CLIMATE_FACT         FACT         33878            28
WATERSHED_LAND_USE_FACT        FACT         199              51
WATERSHED_WATER_QUALITY_FACT   FACT         824736           28
WATERSHED_WATER_QUANTITY_FACT  FACT         151692           29
MWRD_READINGS_STAGE            STAGE        1377409          33
NWS_AIR_TEMP_STAGE             STAGE        17593            18
NWS_DAILY_PREC_STAGE           STAGE        16285            10
USGS_READINGS_STAGE            STAGE        233446           30
Table 4.4. Watershed water quality fact data table

Name                            Description
Date Key (FK)                   Foreign key from the date dimension
Measurement Details Key (FK)    Foreign key from the measurement details dimension
Land Use Type Key (FK)          Foreign key from the land use type dimension
Location Key (FK)               Foreign key from the location dimension
Source Agency Key (FK)          Foreign key from the source agency dimension
Reading Value                   The value of a reading (e.g. water temperature)
4.5 Graphical User Interfaces
Reviewing the available aggregated historical data records is an important step toward
more detailed and better assessment and analysis of watershed data. To facilitate access
to the WDW, a tailored graphical user interface (GUI) dashboard was built. A dashboard
is a multilayered performance management system built on top of a business
intelligence and data integration system; it facilitates the different tasks of the
stakeholders and helps them monitor, measure, and manage a business activity.
The GUI is a web-based browser applet implemented in Java that can be accessed
from a standard internet browser. The distinctive feature of this dashboard is that it
consists of two view layers of information: a monitoring layer that shows abstracted data
as graphs, symbols, and charts; and an analysis layer that lets users explore summarized
dimensional data and hierarchies and slice and dice the data through an ad hoc analysis
tool (Eckerson, 2006).
The purpose of the monitoring layer is to convey information visually through
elements such as graphs, dials, gauges, symbols, alerts, charts, and tables with
specific formats. The analysis layer adds dimensional time series analysis and
segmentation, together with visual analysis, reporting, and predictive statistics and
modeling tools that can reveal the root cause of a problem. These successive layers
provide the necessary details, views, and perspectives that enable users to understand a
problem and identify the steps they must take to address it (Eckerson, 2006). The
dashboard gives users access to the WDW wherever internet access is available. An
example of the watershed dashboard is shown in Figure 4.5.
The GUI (Figure 4.5) allows users to track different parameters at different water
quantity and water quality stations, climate data, or any selected watershed process over
any desired time period. Its main purpose is to present watershed data in a complete view
that includes location and date selection. This enables users to view the watershed
conditions for the selected location and dates and to build up information and
knowledge about them.
The graphical representation shows the user the sensitivity of the parameter selected
for assessment. If the user is interested only in a particular station over a selected
period, it is possible to assess whether sufficient data are available for that station in
that period. From the time selection panel, the user can scan through a number of
successive water quality and quantity monitoring stations in different locations and at
different date levels, ranging from the day, week, month, and year levels to the seasonal
level. The graphical representation is updated with the selected information.
The dimensional data can be further analyzed with ad hoc analysis tools, in which
data can be sliced and diced to find patterns or pinpoint problem areas. Figure 4.6
shows a sample ad hoc analysis of the average, maximum, and minimum total
phosphorus values during summer for all the water quality stations within the Chicago
River Watershed in the period 1970-2010.
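The slicing and dicing behind an analysis like Figure 4.6 amounts to filtering on one dimension and grouping on another. This Python sketch assumes that summer means June through August and uses hypothetical readings and function names; the dashboard's own season definition may differ.

```python
from collections import defaultdict
from datetime import date

SUMMER_MONTHS = {6, 7, 8}  # assumption: summer = June, July, August


def summer_tp_stats(readings):
    """Average, maximum, and minimum summer TP per station.

    readings: iterable of (station_id, sample_date, tp_mg_per_l) tuples.
    Returns {station_id: (average, maximum, minimum)} ordered by station.
    """
    by_station = defaultdict(list)
    for station, sample_date, value in readings:
        if sample_date.month in SUMMER_MONTHS:  # slice on the date dimension
            by_station[station].append(value)   # dice by the location dimension
    return {
        station: (sum(values) / len(values), max(values), min(values))
        for station, values in sorted(by_station.items())
    }


# Hypothetical readings for illustration only.
readings = [
    ("WW_32", date(1995, 7, 5), 0.25),
    ("WW_32", date(1995, 8, 2), 0.75),
    ("WW_32", date(1995, 1, 10), 0.90),  # winter sample, excluded
    ("WW_46", date(1995, 6, 20), 1.00),
    ("WW_46", date(2001, 7, 3), 2.00),
]
print(summer_tp_stats(readings))
# {'WW_32': (0.5, 0.75, 0.25), 'WW_46': (1.5, 2.0, 1.0)}
```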
All analyzed data, graphs, and tables can be exported in several formats (such as
Excel or PDF) and used in other tools for data mining, modeling, or PowerPoint presentations.
[Figure 4.5: WDW dashboard screenshot]
Figure 4.5. Graphical user interface for the WDW.
[Figure 4.6: ad hoc analysis screenshot]
Figure 4.6. An ad hoc analysis example for the WDW.
4.6 Chicago River Watershed Data Warehouse
4.6.1 Watershed Condition and Data. The WDW concept was demonstrated for the
Chicago River Watershed.
The Chicago River basin is a densely populated area. Population in the basin
grew steadily over the years, driving urban and industrial growth. As a result of this
growth, major changes have taken place in the region that have significantly affected the
quality of surface waters. These changes include the construction of navigable
waterways, the diversion of Lake Michigan water, and the construction of
wastewater-treatment plants (USGS, 1999). Numerous inputs of contaminants and
nutrients from manmade sources, including municipal and industrial releases, urban
runoff, and atmospheric deposition, have become a serious issue (USGS, 1999).
Although the population has declined slightly over the last two decades, the current
issues in the watershed stem from the development and redevelopment of available
urban areas. The watershed is highly urbanized, with almost 82% urban land use.
Increasing water quality and quantity problems, along with uncontrolled invasive species
from the Mississippi River that threaten the ecology of the Great Lakes, have prompted
calls for extreme measures to resolve these issues.
Before drastic measures are taken to solve the problems in the watershed, however, a
thorough understanding of the watershed elements is essential. Historical records of
water quality and quantity, climate, land use, and other watershed characteristics offer
a better understanding, assessment, and analysis of the watershed. Details of these
elements were given in Chapter 3.
To provide better assessment and analysis and a comprehensive data repository for the
watershed, a WDW for the Chicago River Watershed is proposed. The WDW is an
in-advance approach to integrating data from multiple, possibly very large, distributed,
heterogeneous databases and other information sources (Widom, 1995). It manages and
analyzes monitoring data in an integrated way, providing an effective means of evaluating
the interacting watershed processes, as explained in Sections 4.3 and 4.4.
Analysis of the historical data record gives insight into past and existing watershed
conditions and their sensitivity to different parameters, and it makes it easy to
concentrate either on the whole watershed or on a specific subwatershed. This will help
develop a deep understanding of the watershed, lead to powerful watershed management
decision-making and analytical capabilities, and facilitate more meaningful stakeholder
interactions.
As shown previously in Table 3.2, extensive water quality, water quantity, climate, and
land use data were obtained for the watershed. Water quantity data were obtained from
the USGS; there are 18 active USGS stations in the watershed that measure daily flow and
gage height. Water quantity data for the period 1970-2010 were compiled. Water quality
data were obtained from the MWRDGC; there are 41 stations within the watershed that
measure up to 65 different water quality parameters once or twice a month, or three times
a month at some stations. Water quality data for the period 1970-2008 were compiled.
Land use data were compiled from CMAP, using the land use inventories for 2001 and
2005. The climate data compiled were precipitation and air temperature: hourly data from
the Chicago O'Hare Airport meteorological station for the period 1970-2006 for
precipitation and for the period 1994-2006 for air temperature.
4.6.2 Watershed Data Warehouse Architecture. The data were extracted from their
originating sources and saved in Excel files. Staging area tables, dimension tables, and
fact tables were created and stored in an Oracle Database 11g system to establish the
DW. The data were loaded into the DW's staging area using SQL*Loader, an
Oracle-supplied utility that loads data from flat files into one or more database tables. A
control file was created to provide SQL*Loader with information such as the name and
location of the input data file, the format of the records in that file, the names of the
tables to be loaded, and the correspondence between the fields in the input file and the
columns of the destination tables (Gennick et al., 2001). The staging area is where the
data are cleansed, manipulated, and prepared for delivery to the multi-dimensional model
(presentation area).
The four staging steps of a DW are extracting, cleaning, conforming, and delivering
(Kimball et al., 2004):
1. Extracting: this step was simple and fast. The original data were extracted from the
different sources and loaded into their designated stage tables. The USGS and MWRDGC
tables were restructured and cleaned of the symbols and notations used by the sources
before being loaded into the staging area, and the CMAP shapefile areas were
transformed into numerical areas linked to monitoring locations.
2. Cleaning: this step involved checking for valid values and for consistency across
values, and removing duplicates. Null cells were either populated with mean values or
removed, and very high readings and unreasonable negative readings were removed. For
stations whose IDs had changed over the years, the data were matched by location.
3. Conforming: data conformation is required whenever two or more data sources are
merged into the DW. Standardized domains and measures were used so that separate data
sources can be queried with identical textual and numerical labels.
4. Delivering: to make the data ready for querying, the data were physically structured
into a set of simple, symmetric schemas, known as star schemas or dimensional models,
as discussed earlier.
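Several of the cleaning rules in step 2 can be sketched in Python as follows. The function name, the tuple layout, the use of a per-station mean for imputation, and the plausibility ceiling are assumptions for illustration, not the exact rules applied to the USGS and MWRDGC data.

```python
from statistics import mean


def clean_readings(rows, upper_limit):
    """Apply simplified cleaning rules to (station_id, date_str, value) rows.

    value may be None for a null cell; upper_limit is a hypothetical
    plausibility ceiling for the parameter.
    """
    # 1. Remove exact duplicate rows, keeping the first occurrence.
    seen, unique = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)

    # 2. Remove unreasonable readings: negative values or values above the ceiling.
    valid = [r for r in unique if r[2] is None or 0 <= r[2] <= upper_limit]

    # 3. Fill nulls with the station mean; drop them if the station has no valid values.
    values_by_station = {}
    for station, _, value in valid:
        if value is not None:
            values_by_station.setdefault(station, []).append(value)
    cleaned = []
    for station, day, value in valid:
        if value is None:
            if station not in values_by_station:
                continue
            value = mean(values_by_station[station])
        cleaned.append((station, day, value))
    return cleaned


# Hypothetical raw readings for illustration only.
rows = [
    ("WW_32", "2005-07-01", 6.0),
    ("WW_32", "2005-07-01", 6.0),    # duplicate
    ("WW_32", "2005-07-15", None),   # null cell
    ("WW_32", "2005-08-01", 8.0),
    ("WW_32", "2005-08-15", -3.0),   # unreasonable negative reading
    ("WW_46", "2005-07-01", None),   # null with no valid values to average
    ("WW_46", "2005-07-15", 950.0),  # implausibly high reading
]
print(clean_readings(rows, upper_limit=20.0))
# [('WW_32', '2005-07-01', 6.0), ('WW_32', '2005-07-15', 7.0), ('WW_32', '2005-08-01', 8.0)]
```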
The measurements and dimensional data in the staging area were mapped to the DW,
loaded into the designated fact and dimension tables, and completed by mapping the
correct foreign keys. All logical definitions and their physical implementation comply
with Oracle Corporation specifications for Oracle Database 11g Release 2. See Appendix
A for the design and development of the WDW.
4.6.3 Watershed Assessment. The analysis and assessment of Chicago River Watershed
data is used as an example of applying DW technology for different stakeholders. Data
analysis and assessment of the spatial and temporal aspects of the watershed give an
overview of the system and its needs and can help identify the major issues and problems
in the study area. This section presents a multi-year assessment of some water quality
parameters obtained by using the Chicago River Watershed dashboard and running ad hoc
analyses on the Chicago River WDW. Figure 4.7 shows the locations of the stations
selected for the assessment. They were selected to show the behavior of the watershed
upstream and downstream along sections of the system. The parameters chosen for the
assessment were total Kjeldahl nitrogen (TKN), total nitrates (NO2+NO3), total
phosphorus (TP), dissolved oxygen (DO), and water temperature. Assessments of other
parameters, such as flow, or ammonia to assess stream toxicity, can be carried out in the
same way.
[Figure 4.7: map labels include MWRDGC stations WW_32, WW_106, WW31, WW_37, and WW_46; USGS stations 05535070, 05535500, 05534500, 05536105, and 05536118; Lake Michigan; DuPage County; WRP]
Figure 4.7. Water quality and quantity stations used in the watershed assessment
4.6.3.1 Assessment of TKN and Total Nitrates (NO2+NO3). TKN is the sum of organic
nitrogen, ammonia (NH3), and ammonium (NH4+); total nitrogen (TN) is obtained by
adding the concentration of total nitrates (NO2+NO3) to TKN. Figures 4.8 and 4.9 show
the historical TKN and total nitrates data at the MWRDGC stations included in this
assessment (see Figure 4.7 for locations).
No water quality standards (WQS) are currently available for these two parameters in the
Chicago River Watershed. If they were, it would be easy to apply the WQS values to detect
where and when the standards were exceeded and then to analyze those locations
thoroughly.
A visual inspection of Figures 4.8 and 4.9 reveals that the upstream station WW_32
showed lower and more stable concentrations through most of the years, while the
downstream station WW_46 showed much higher values, with an apparently decreasing
trend in TKN and an increasing trend in total nitrates. This is due to the North Side WRP
effluent: because of stringent ammonia permits, the plant converted more of the
ammonium into nitrates. These findings suggest that looking only at TKN at the
downstream station would have indicated an improvement in that constituent; however,
that is not the case, since the assessment shows that the TKN was actually transformed
into total nitrates.
4.6.3.2 Assessment of TP. Figure 4.10 shows historical total phosphorus data for the
MWRDGC stations included in this assessment (see Figure 4.7 for locations). The
majority of the TP data for all stations fall in the range of 0-2 mg/L. WQS would have
helped to identify the locations and periods in which TP exceeded the standard, for
further analysis and assessment. The downstream TP remained almost constant or
increased very slightly over the years, suggesting that little had been done to decrease
the constituent.
4.6.3.3 Assessment of N/P Ratio. Nutrients such as nitrogen and phosphorus are
essential for a healthy and diverse aquatic environment. Excessive amounts of nutrients,
however, can have undesirable effects on water quality, resulting in changes in the
biological community (USEPA, 2000). High nutrient concentrations can also pose
potential human health risks associated with the growth of harmful algal blooms (Harned
et al., 2004), leading to the phenomenon known as eutrophication, which in turn can
result in hypoxia. Hypoxia is a condition of low oxygen in the water caused by an
overabundance of nutrients; it refers to DO concentrations of less than 2 mg/L. In this
section, N/P ratios are evaluated to determine the limiting nutrient in the aquatic system.
The limiting nutrient is a chemical needed for plant growth that is available in smaller
quantities than algae need to increase their abundance (Calderon, 2009). To identify the
limiting nutrient, Chapra (1997) specified a rule of thumb for the N/P ratio in rivers and
streams: a ratio of 7.2 or less indicates that nitrogen is the limiting factor for algal
growth, and a ratio higher than 7.2 indicates that phosphorus is the limiting factor
(Calderon, 2009).
Figures 4.11 and 4.12 show the N/P ratio assessment for the upstream station WW_32
and the downstream station WW_46 for the period 1976-2008. The upstream station
shows higher N/P ratios, which indicate high concentrations of nitrogen relative to
phosphorus and make phosphorus the limiting factor. On their own, the downstream N/P
ratios in Figure 4.12 would suggest low concentrations of both phosphorus and nitrogen
and hence lower N/P ratios. However, the assessment of TKN, total nitrates, and total
phosphorus suggests that the lower N/P ratio is due to added nitrogen and phosphorus,
probably from the North Side WRP and other point sources.
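The TN definition in Section 4.6.3.1 and the N/P rule of thumb above combine into a short calculation. This Python sketch assumes that the N/P ratio is formed from TN and TP mass concentrations in the same units (mg/L); the function names and sample values are hypothetical.

```python
def total_nitrogen(tkn, nox):
    """TN = TKN + (NO2 + NO3); inputs assumed to be mg/L as N."""
    return tkn + nox


def limiting_nutrient(tn, tp, threshold=7.2):
    """Classify the limiting nutrient with the N/P rule of thumb cited above.

    Assumes N/P is the TN/TP ratio of mass concentrations in the same units.
    Returns (ratio, "nitrogen" or "phosphorus").
    """
    if tp <= 0:
        raise ValueError("TP must be positive to form an N/P ratio")
    ratio = tn / tp
    return ratio, ("nitrogen" if ratio <= threshold else "phosphorus")


# Hypothetical sample: TKN = 1.5 mg/L and NO2+NO3 = 4.5 mg/L, so TN = 6.0 mg/L.
tn = total_nitrogen(1.5, 4.5)
print(limiting_nutrient(tn, 0.5))  # (12.0, 'phosphorus')
print(limiting_nutrient(tn, 1.0))  # (6.0, 'nitrogen')
```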
4.6.3.4 Assessment of DO. Figure 4.13 shows dissolved oxygen concentrations over the
years at the stations selected for the assessment. Almost all of the measured values are
above 2 mg/L, indicating sufficient DO in the water. This result was expected, despite
the high nutrient levels in the streams, because of the aeration facilities along the streams.
The DO concentrations were further analyzed against water temperature for both
stations, as shown in Figures 4.14 and 4.15. The figures clearly show that DO
concentrations drop as water temperature rises, which is probably associated with warm
air temperatures. Figure 4.16 shows the relationship between water temperature and air
temperature in the watershed.
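The hypoxia threshold and the DO-temperature relationship discussed above can be quantified with two small helpers. This Python sketch uses hypothetical paired samples and function names; an ordinary least-squares line is one simple way to express the trend in Figures 4.14 and 4.15, not necessarily the method used to draw them.

```python
HYPOXIA_DO_MG_L = 2.0  # hypoxia threshold stated in Section 4.6.3.3


def hypoxic_fraction(do_values):
    """Fraction of DO readings (mg/L) below the hypoxia threshold."""
    return sum(1 for v in do_values if v < HYPOXIA_DO_MG_L) / len(do_values)


def least_squares_line(x, y):
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical paired samples: water temperature (deg C) and DO (mg/L).
water_temp = [5, 10, 15, 20, 25]
dissolved_oxygen = [11.0, 10.0, 9.0, 8.0, 7.0]
print(hypoxic_fraction(dissolved_oxygen))  # 0.0
slope, intercept = least_squares_line(water_temp, dissolved_oxygen)
print(round(slope, 3), round(intercept, 3))  # -0.2 12.0
```

A negative slope corresponds to the drop in DO with rising water temperature described above.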

[Figure 4.8: TKN vs. year, 1975-2010; upstream station WW_32 and downstream station WW_46]
Figure 4.8. TKN historical data - MWRDGC stations (1975-2008)
[Figure 4.9: total nitrates vs. year, 1970-2010; stations WW_32 and WW_46]
Figure 4.9. Total nitrates historical data - MWRDGC stations (1970-2008)
[Figure 4.10: total phosphorus vs. year, 1970-2010; stations WW_32 and WW_46]
Figure 4.10. Total phosphorus historical data - MWRDGC stations (1970-2008)
[Figure 4.11: N/P ratio vs. year, 1975-2005]
Figure 4.11. N/P ratio for upstream station.
[Figure 4.12: N/P ratio vs. year, 1975-2005]
Figure 4.12. N/P ratio for downstream station.
[Figure 4.13: DO vs. year, 1970-2010; stations WW_46 and WW_32]
Figure 4.13. Dissolved oxygen historical data - MWRD stations (1970-2008)
[Figure 4.14: DO vs. water temperature (deg C)]
Figure 4.14. DO vs. water temperature for upstream station (1970-2008)
[Figure 4.15: DO vs. water temperature (deg C)]
Figure 4.15. DO vs. water temperature for downstream station (1970-2008)
[Figure 4.16: scatter plot with fitted line y = 2.3525x + 20.99, R² = 0.8103]
Figure 4.16. Water temp. vs. air temp.
4.7 Conclusion
The multi-dimensional watershed model presented in this chapter is the basis of the
framework proposed to investigate land use effects on water quality in highly urbanized
watersheds. It provides readily integrated watershed data that offer a holistic view of the
watershed elements across heterogeneous data sources. The DW concept described here
is used to study and assess the Chicago River Watershed. It allows data from different
sources, such as the USGS, MWRDGC, CMAP, and NWS, to be combined in a single
repository. Implementing multi-dimensional modeling using DW techniques facilitates
the integration and aggregation of information at all desired levels for the monitored
watershed locations.
The web-based dashboard and reporting tools allow watershed stakeholders to focus their
efforts on monitoring and understanding the watershed and on taking proactive action in
managing it. The GUI introduced here illustrates how easily the DW dimensional concept
can be mapped to a graphical user interface design to create a tool that facilitates the
different tasks of its users, whether a watershed assessment task or the integration of data
for a physical model application. The ad hoc analysis tools further allow data to be sliced
and diced to find patterns or pinpoint problem areas, and they provide the necessary
details, views, and perspectives that enable users to understand a problem and identify
the steps they must take to address it. This makes analyzing and assessing a watershed
more efficient than using traditional databases.
Although the model and methodology were implemented for a highly urbanized
watershed, they are not restricted to that setting and can be used without modification
for any watershed.
CHAPTER 5
DATA DRIVEN MODEL TO PREDICT WATER QUALITY
5.1 Introduction
Estimates of nutrient concentrations, loads, and yields are useful for evaluating a water
body and help identify source areas for developing mitigation strategies (USGS, 2012).
Generally, to determine nutrient concentrations in a stream, samples are collected
manually once or twice a month, or even less frequently, and later analyzed in a
laboratory. This procedure is time consuming and inefficient when immediate
information is needed. The nutrient loads transported by a stream during a given period
are particularly important when considering the amount of nutrients entering a lake or
reservoir (USGS, 2012). Load estimates are also important to the establishment and
monitoring of the TMDLs mandated by the CWA (USGS, 2012). Yield estimates can be
used by resource and regulatory authorities to help prioritize efforts with regard to land
use management and best practices (USGS, 2012).
This chapter investigates the development of data-driven models that can estimate
water quality constituents from historical data records in the Chicago River Watershed,
making use of the WDW repository introduced in Chapter 4.
5.2 Methodology
This research uses data mining (DM), drawn from the artificial intelligence field, to
estimate water quality parameters such as total nitrates for the Chicago River Watershed.
DM models consist of a set of mathematical relationships. DM tasks are divided into two
major categories: predictive and descriptive tasks. In predictive tasks, a particular
attribute is predicted from the values of other attributes. The attribute to be predicted is
the dependent variable, and the attributes used to make the prediction are the
independent variables. In descriptive tasks, the objective is to derive patterns
(correlations, trends, etc.) that summarize the relationships in the data; these tasks are
often exploratory in nature and usually require post-processing techniques to validate
and explain the results (Tan et al., 2006).
Predictive models are divided into classification models, which are used for discrete
target variables, and regression models, which are used for continuous target variables
(Tan et al., 2006). There are many methods for constructing prediction and classification
models, such as naive Bayesian classifiers, support vector machines, decision trees,
neural networks, and k-nearest neighbor classifiers.
Regression is the statistical methodology most often used for numeric prediction. Both
prediction and classification are supervised learning problems in which there is an input
X and an output Y, and the model learns the mapping from input to output (Alpaydin,
2010). The DM approach assumes a model defined up to a set of parameters:
$y = g(x \mid \theta)$     (5.1)
where g is the model and $\theta$ are its parameters. The output y is a number in
prediction or regression and a class code in classification.
The DM program optimizes these parameters so that the approximation error is
minimized and the estimates are as close as possible to the correct values given in the
training set (Alpaydin, 2010). For the Chicago River Watershed, data-driven models to
estimate nutrient concentrations from watershed parameters such as stream flow,
precipitation, air temperature, water temperature, dissolved oxygen, turbidity, areas of
different land use types, month of the year, and others were developed using different
data mining techniques.
5.3 DM Methodology
DM is part of the knowledge discovery in databases (KDD) process. It consists of a
series of steps, as shown in Figure 5.1.
[Figure 5.1: flow diagram labeled Data Mining, with the stages Input, Pre-processing, Model Building and Evaluation, Model Deployment, and Output]
Figure 5.1. DM methodology
5.3.1 Data Pre-processing. This step includes tracking incomplete data that lack
certain attributes or attribute values, filling in missing or incomplete values, removing
errors and outliers, and resolving inconsistencies in the data (Han et al., 2006). This
process ensures quality data, which in turn ensures quality mining results and quality
decisions, since duplicate or missing data may produce incorrect or even misleading
statistics (Han et al., 2006).
To better understand the data to be mined, descriptive data summarization provides the
analytical foundation for data pre-processing. The basic statistical measures for data
summarization include measures of central tendency, such as the mean, weighted mean,
median, and mode, and measures of data dispersion, such as the range, quartiles,
variance, and standard deviation (Han et al., 2006). Graphical representations such as
histograms, boxplots, quantile plots, and scatter plots facilitate visual inspection of the
data and are useful for data pre-processing and data mining as well (Han et al., 2006).
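The summary measures listed above are available in Python's standard statistics module. The sketch below, with hypothetical function names and sample values, computes them for one attribute. Note that statistics.quantiles uses its "exclusive" method by default, so quartiles can differ slightly from those reported by other software.

```python
import statistics


def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def summarize(values):
    """Central tendency and dispersion measures for one numeric attribute."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # requires Python 3.8+
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "q1": q1,
        "q3": q3,
        "variance": statistics.variance(values),  # sample variance
        "stdev": statistics.stdev(values),        # sample standard deviation
    }


# Hypothetical readings of one attribute, for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]
summary = summarize(values)
print(summary["median"], summary["q1"], summary["q3"])  # 4.5 4.0 6.5
print(weighted_mean([1.0, 3.0], [1, 3]))  # 2.5
```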
Examples of data pre-processing are data cleansing, data integration, and data
transformation, all of which help assure high data quality. The majority of these
pre-processing steps were carried out when the DW was built, as discussed in Chapter 4.
Data transformation routines convert the data into forms suitable for mining; for
example, an attribute may be normalized to fall within a small range such as 0 to 1 (Han
et al., 2006). Data reduction techniques such as data cube aggregation, attribute subset
selection, dimensionality reduction, numerosity reduction, and discretization can be used
to obtain a reduced representation of the data without losing its information content
(Han et al., 2006). For numerical data, techniques such as binning, histogram analysis,
entropy-based discretization, and cluster analysis can be used (Han et al., 2006).
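Min-max normalization to the 0 to 1 range and equal-width binning, two of the transformations mentioned above, can be sketched as follows. The function names are hypothetical, and equal-width bins are only one of several possible binning schemes.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values to [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant attribute: map everything to new_min
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def equal_width_bins(values, n_bins):
    """Bin index (0 .. n_bins - 1) of each value for equal-width binning."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    # The maximum value would land at index n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


print(min_max_normalize([10, 15, 20, 30]))        # [0.0, 0.25, 0.5, 1.0]
print(equal_width_bins([0, 1, 2, 5, 9, 10], 5))  # [0, 0, 1, 2, 4, 4]
```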
Histograms are highly effective at approximating both sparse and dense data, as well as
highly skewed and uniform data, and can capture dependencies between attributes (Han
et al., 2006). They use binning to approximate data distributions. Data sets for analysis
may contain hundreds of attributes, many of which may be irrelevant to the mining task
or redundant; such attributes can slow down the mining process and lead to discovered
patterns of poor quality. Various statistical significance tests and techniques, which
assume that the attributes are independent of one another, can be performed to select the
best attribute subsets.
5.3.2 Model Building and Evaluation. This step involves selecting and applying various
models developed with comparable analytical techniques and adjusting the model
parameters until optimal values are reached. In the holdout method, the input data are
randomly partitioned into two independent sets, a training set and a test set; the training
set is used to derive the model, and the model's accuracy is estimated using the test set
(Han et al., 2006). Random subsampling is a variation of the holdout method in which the
holdout method is repeated k times and the average accuracy is taken (Han et al., 2006).
In k-fold cross validation, the input data are randomly partitioned into k subsets, or folds,
of approximately equal size. Training and testing are then performed k times, so that
each sample is used the same number of times for training and exactly once for testing
(see Figure 5.2), and the error is calculated as the average of the error rates from all k
iterations (Han et al., 2006). The 10-fold cross validation method is adopted for building
all the models in this study.
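The k-fold procedure can be sketched in Python as follows. The helper names, the fixed random seed, and the mean-predictor baseline in the usage example are hypothetical; the sketch only shows how the folds are formed and how the fold errors are averaged.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Randomly partition sample indices into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, error, k=10, seed=0):
    """Average test error over k train/test iterations (k-fold cross validation).

    fit(train_x, train_y) -> model; error(model, test_x, test_y) -> float.
    Assumes len(xs) >= k so that no fold is empty.
    """
    folds = k_fold_indices(len(xs), k, seed)
    fold_errors = []
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in test_set]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        fold_errors.append(error(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return sum(fold_errors) / k


# Hypothetical usage: a baseline "model" that always predicts the training mean,
# scored with the mean absolute error.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def mean_abs_error(model, test_x, test_y):
    return sum(abs(y - model) for y in test_y) / len(test_y)


xs = list(range(20))
ys = [2.0 * x for x in xs]
print(cross_validate(xs, ys, fit_mean, mean_abs_error, k=10))
```

Any of the prediction models described below can be plugged in through the fit and error arguments.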
[Figure 5.2: the total set of samples divided into Folds 1-4; in each iteration one fold is the testing sample and the remaining folds form the training sample]
Figure 5.2. k-fold cross validation method where k = 4
5.3.2.1 Prediction Models. This section describes the different regression and
classification approaches used in this chapter. Eight different algorithms were
investigated and built as regression or classification models where applicable, and their
merits were compared through performance analysis. The prediction models are
multiple linear regression, artificial neural networks, model trees, support vector
machines, lazy learners, and Gaussian processes. The classification models are artificial
neural networks, model trees, support vector machines, naive Bayes, lazy learners, and
logistic regression. A brief, general description of each algorithm is given below.
Multiple Linear Regression is based on the assumption of a linear relationship between
the dependent variable Y and its predictors $X_1, X_2, \dots, X_n$:
$Y = w_0 + w_1 X_1 + w_2 X_2 + \dots + w_n X_n$     (5.2)
The method of least squares can be used to solve for $w_0, w_1, \dots, w_n$, estimating the
functional relationship between Y and its predictors by minimizing the residual sum of
squares. Linear regression offers a simple and easily interpretable type of model.
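Equation 5.2 can be fitted by least squares in a few lines. This pure-Python sketch, with hypothetical function names and sample data, solves the normal equations by Gaussian elimination; in practice a library routine such as numpy.linalg.lstsq is numerically more robust.

```python
def fit_mlr(X, y):
    """Least-squares weights [w0, w1, ..., wn] for Y = w0 + w1*X1 + ... + wn*Xn.

    X: list of rows, each a list of n predictor values; y: list of targets.
    Solves the normal equations (A^T A) w = A^T y by Gaussian elimination.
    """
    A = [[1.0] + list(row) for row in X]  # prepend the intercept column
    p = len(A[0])
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p):
                M[r][c] -= f * M[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    w = [0.0] * p
    for i in reversed(range(p)):
        w[i] = (b[i] - sum(M[i][j] * w[j] for j in range(i + 1, p))) / M[i][i]
    return w


def predict(w, row):
    """Evaluate Equation 5.2 for one row of predictor values."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))


# Hypothetical data generated from Y = 1 + 2*X1 + 3*X2.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
w = fit_mlr(X, y)
print([round(wi, 6) for wi in w])  # [1.0, 2.0, 3.0]
```

For these noise-free sample data the recovered weights match the coefficients used to generate y.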
Artificial Neural Network (ANN): this algorithm was inspired by attempts to simulate
biological neural systems. Backpropagation is a learning algorithm that operates on a
multilayer feed-forward neural network; during the learning phase, the network adjusts
its weights to predict the correct class label of the input tuples (Han et al., 2006). A
multilayer feed-forward neural network comprises a number of neurons organized into an
input layer, an output layer, and one or more hidden layers. The units in the input layer
take the information to be processed (the values of the predictors) as inputs, while the
output layer produces the prediction result. Each hidden layer receives the outputs of the
units in the preceding layer and passes its results as inputs to the units in the next layer
(Tan et al., 2006; Han et al., 2006; Ould-Ahmed-Vall et al., 2007).
The process, as outlined by Han et al. (2006), is as follows: a set of training tuples is processed iteratively, and the network's prediction for each tuple is compared with the actual known target value; for each training tuple, the weights are modified to minimize the mean squared error between the network's prediction and the actual target value; the weight modifications are made in the "backwards" direction, from the output layer through each hidden layer down to the first hidden layer. That is why the term backpropagation is used.
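The backpropagation steps above can be sketched for a network with one hidden layer of tanh units and a linear output unit, as used for regression. This is an illustrative sketch with assumed settings (8 hidden units, learning rate 0.05, 2000 epochs) and hypothetical function names; unlike the per-tuple updating described by Han et al. (2006), it updates the weights once per pass over all tuples (batch gradient descent), and it is not WEKA's MultilayerPerceptron.

```python
import numpy as np


def train_mlp(X, y, n_hidden=8, lr=0.05, epochs=2000, seed=0):
    """Hypothetical helper: one-hidden-layer feed-forward network trained by
    backpropagation on the mean squared error (batch updates, an assumption)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass: input layer -> hidden layer (tanh) -> linear output unit.
        h = np.tanh(X @ W1 + b1)
        out = h @ W2 + b2
        err = out - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: the error gradient flows from the output layer back to the hidden layer.
        g_out = 2.0 * err / len(X)
        g_W2, g_b2 = h.T @ g_out, g_out.sum(axis=0)
        g_h = (g_out @ W2.T) * (1.0 - h ** 2)        # derivative of tanh
        g_W1, g_b1 = X.T @ g_h, g_h.sum(axis=0)
        # Weight modifications that reduce the mean squared error.
        W2 -= lr * g_W2
        b2 -= lr * g_b2
        W1 -= lr * g_W1
        b1 -= lr * g_b1
    return (W1, b1, W2, b2), losses


def predict_mlp(params, X):
    W1, b1, W2, b2 = params
    return (np.tanh(np.asarray(X, dtype=float) @ W1 + b1) @ W2 + b2).ravel()
```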
The ANN algorithm has two benefits: high prediction accuracy, and no requirement for prior knowledge of the physical relationship between the dependent and the independent variables (Tan et al., 2006). However, the black-box nature of ANN makes it difficult to understand and analyze the learned function (Han et al., 2006; Ould-Ahmed-Vall et al., 2007).
Support Vector Machines (SVM) is a classification method for both linear and nonlinear data. It uses an appropriate nonlinear mapping to transform the original training data into a new, higher-dimensional space, where it searches for the optimal linear separating hyperplane (i.e., the decision boundary); with a sufficiently high dimension, data from two classes can always be separated by such a hyperplane. SVM finds this hyperplane using support vectors (essential training tuples) and margins (defined by the support vectors) (Han et al., 2006). The technique used in this study is the Sequential Minimal Optimization (SMO) algorithm.
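SMO itself is too long for a short example. The sketch below instead trains a linear soft-margin SVM by stochastic subgradient descent on the hinge loss (the Pegasos scheme). This is a different optimizer from the SMO algorithm used in the study, and it omits the kernel mapping to a higher dimension; it only illustrates what the margin-based objective does. Labels are assumed to be -1/+1, and the helper names and settings are hypothetical.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Hypothetical helper: linear SVM via Pegasos-style subgradient descent
    on the hinge loss (a stand-in for SMO). Labels y must be -1 or +1."""
    rng = np.random.default_rng(seed)
    Xa = np.column_stack([np.asarray(X, dtype=float), np.ones(len(X))])  # last weight acts as bias
    y = np.asarray(y, dtype=float)
    w = np.zeros(Xa.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xa)):
            t += 1
            eta = 1.0 / (lam * t)                    # decreasing step size
            inside_margin = y[i] * (Xa[i] @ w) < 1   # hinge loss is active for this tuple
            w *= 1.0 - eta * lam                     # shrink: gradient of (lam/2)||w||^2
            if inside_margin:
                w += eta * y[i] * Xa[i]              # move the hyperplane to widen the margin
    return w


def predict_svm(w, X):
    Xa = np.column_stack([np.asarray(X, dtype=float), np.ones(len(X))])
    return np.where(Xa @ w >= 0, 1, -1)
```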
Model Tree is a tree-like structure in which each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node holds a prediction: a class label in a classification tree (Han et al., 2006), or a linear regression model in a model tree. Trees extract predictive information in the form of "if-then-else" expressions that are clear and understandable to humans (Ahmad et al., 2010). This is an explainable approach, in contrast with other machine learning approaches such as neural networks (Alpaydin, 2010; Ould-Ahmed-Vall et al., 2007). A tree can explain the decisions that lead to a certain prediction, and its rules can easily be used within a database to identify a set of records. The input space is partitioned recursively until the data at the leaf nodes are relatively homogeneous; a linear model at each leaf then explains the remaining variability. The model tree algorithm used in this work is the classical M5 algorithm.
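M5 chooses each split by standard deviation reduction (SDR), the drop in the standard deviation of the target when the instances T at a node are divided into subsets T_i: SDR = sd(T) - Σ (|T_i|/|T|) sd(T_i). The sketch below searches one numeric attribute for the best binary split. It shows only the split criterion (not the pruning, smoothing or leaf linear models of M5), and the function names are hypothetical.

```python
import numpy as np


def sdr(y, left_mask):
    """Standard deviation reduction for splitting targets y into left/right subsets."""
    y = np.asarray(y, dtype=float)
    parts = [y[left_mask], y[~left_mask]]
    return np.std(y) - sum(len(p) / len(y) * np.std(p) for p in parts)


def best_split(x, y):
    """Hypothetical helper: return (threshold, SDR) of the best split x <= threshold."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    values = np.unique(x)
    best = (None, -np.inf)
    for lo, hi in zip(values[:-1], values[1:]):
        thr = (lo + hi) / 2                     # candidate threshold between adjacent values
        gain = sdr(y, x <= thr)
        if gain > best[1]:
            best = (thr, gain)
    return best
```

Splitting stops once the subsets are homogeneous enough, which is the point at which M5 fits a linear model at the leaf.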
Naive Bayes is a probabilistic classifier based on Bayes' theorem. It simplifies the learning process by assuming that the inputs are independent given the class. Bayes' theorem expresses the idea that the probability of an outcome can be inferred from evidence that can be observed (Ahmad, 2010). Naive Bayes computes conditional probabilities for the target values from historical records by observing the frequency of attribute values and of combinations of attribute values (Alpaydin, 2010).
The advantages of the algorithm are its ease of implementation and its good results; its main disadvantage is the assumption that the inputs are independent, which can result in a loss of accuracy (Han et al., 2006).
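A minimal sketch of frequency-based naive Bayes for nominal attributes follows. Laplace (add-one) smoothing is an assumption added here so that an attribute value never seen with a class does not force a probability of zero; the toy attributes in the test and the helper names are hypothetical.

```python
from collections import Counter


def train_nb(rows, labels):
    """Hypothetical helper: count class frequencies and (attribute, value, class) frequencies."""
    class_counts = Counter(labels)
    value_counts = Counter()
    n_values = [len({row[j] for row in rows}) for j in range(len(rows[0]))]
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(j, v, c)] += 1
    return class_counts, value_counts, n_values


def predict_nb(model, row):
    """Pick the class maximizing P(c) * prod_j P(x_j | c) (independence assumption)."""
    class_counts, value_counts, n_values = model
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / total                                                   # prior P(c)
        for j, v in enumerate(row):
            score *= (value_counts[(j, v, c)] + 1) / (n_c + n_values[j])      # smoothed P(x_j = v | c)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```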
Lazy Learner: in contrast to the algorithms above, which are eager learners, a lazy learner is an instance-based method that stores the training data and waits until it is given a test tuple before starting any processing (Han et al., 2006). The algorithm therefore takes less time in training but more time in predicting. It effectively uses a richer hypothesis space, since it uses many local linear functions to form its implicit global approximation to the target function, whereas eager learners commit to a single hypothesis (Han et al., 2006). Typical approaches include k-nearest neighbor, locally weighted regression and case-based reasoning (Han et al., 2006).
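k-nearest neighbor regression, one of the lazy approaches listed above, can be sketched as follows: nothing is learned in advance, and all the work (distance computation and averaging) happens when a query arrives. Euclidean distance, k = 3, an unweighted average and the function name are assumptions; the LWL learner used in the study weights neighbors instead.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Hypothetical helper: k-nearest-neighbor regression (a lazy learner).
    No model is built in advance; the stored training tuples are searched per query."""
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)  # Euclidean distance
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)
```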
Gaussian Process: a Gaussian process is a collection of random variables $\{X_t\}_{t \in T}$, indexed for example by time, such that every finite linear combination of the $X_t$ is normally distributed. Gaussian processes are considered attractive because of their flexible non-parametric nature and computational simplicity (Seeger, 2004).
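For regression, a Gaussian process prior combined with Gaussian noise gives a closed-form predictive mean and variance. The sketch below assumes a one-dimensional input, a squared-exponential (RBF) kernel with unit length-scale and variance, and a small noise term; these choices and the function names are illustrative assumptions, not the settings of the WEKA Gaussian process learner used in the study.

```python
import numpy as np


def rbf_kernel(a, b, length=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b (assumed kernel)."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length ** 2)


def gp_predict(x_train, y_train, x_test, noise=1e-6, length=1.0):
    """Hypothetical helper: Gaussian process regression predictive mean and variance."""
    x_train = np.asarray(x_train, dtype=float)
    x_test = np.asarray(x_test, dtype=float)
    K = rbf_kernel(x_train, x_train, length) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_test, x_train, length)            # test-train covariances
    mean = K_s @ np.linalg.solve(K, np.asarray(y_train, dtype=float))
    v = np.linalg.solve(K, K_s.T)
    var = np.diag(rbf_kernel(x_test, x_test, length)) - np.sum(K_s * v.T, axis=1)
    return mean, var
```

Near the training inputs the predictive variance shrinks toward the noise level; far from them it returns to the prior variance.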
Logistic Regression models the probability that an event occurs as a function of a set of predictor variables: the log-odds of that probability is a linear function of the predictors.
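A minimal binary logistic regression sketch, fitted by gradient descent on the mean log-loss, makes the linear log-odds form explicit. The learning rate, number of epochs and helper names are assumptions; WEKA's Logistic learner (a multinomial model with a ridge estimator) differs in detail.

```python
import numpy as np


def _with_intercept(X):
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X])  # column of ones -> intercept


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Hypothetical helper: binary logistic regression by batch gradient descent."""
    A = _with_intercept(X)
    y = np.asarray(y, dtype=float)                # 0/1 event indicator
    w = np.zeros(A.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(A @ w)))        # P(event) = sigmoid(linear log-odds)
        w -= lr * A.T @ (p - y) / len(y)          # gradient of the mean log-loss
    return w


def predict_proba(w, X):
    return 1.0 / (1.0 + np.exp(-(_with_intercept(X) @ w)))
```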

5.3.2.2 Model Evaluation. Different criteria were used to evaluate the regression and
classification models:
Regression Models. This section discusses the criteria to evaluate the prediction
accuracy of the different algorithms used in the study. As stated in section 5.3.2, 10-fold
cross validation was used. This technique consists of dividing the overall data samples
into 10 subsets, or folds. Each model is trained using 9 of the subsets and evaluated using
the tenth subset. The process is iterated 10 times (Figure 5.2) and each time, a different
subset is used for testing and the remaining 9 subsets are used for training the model. The
model is evaluated by averaging the prediction evaluation criteria over the 10 iterations. The regression evaluation criteria used in this study are the following (Alpaydin, 2010; Ould-Ahmed-Vall et al., 2007):
The Correlation Coefficient: This criterion is based on the standard correlation coefficient and measures the extent of the linear relationship between the predicted (P) and actual (A) values. It is a dimensionless index that ranges from -1 to 1, with 1 corresponding to ideal correlation. The correlation coefficient C is given by:
$$C = \frac{\mathrm{Cov}(P, A)}{\sigma_P \, \sigma_A} \qquad (5.3)$$
where $\mathrm{Cov}(P, A)$ is the covariance between the predicted and the actual values, while $\sigma_P$ and $\sigma_A$ are their respective standard deviations.
Root Mean Squared Error (RMSE): This error measure is used in the determination of confidence intervals. It ranges from 0 to ∞, with 0 corresponding to the ideal situation. It is computed as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (p_i - a_i)^2} \qquad (5.4)$$
where $p_i$ and $a_i$ are the predicted and actual values of the attribute for the $i$th test instance, and $N$ is the number of instances.
Mean Absolute Error (MAE): This error measure is similar to RMSE, except that it uses absolute error values instead of squared errors. It is computed as:
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \lvert p_i - a_i \rvert \qquad (5.5)$$
Root Relative Squared Error (RRSE): The relative squared error is taken relative to the simple predictor, which is the mean of the actual values. It is computed by normalizing the total squared error, dividing it by the total squared error of the simple predictor, and taking the square root. It is given by:
$$\mathrm{RRSE} = \sqrt{\frac{\sum_{i=1}^{N} (p_i - a_i)^2}{\sum_{i=1}^{N} (a_i - \bar{a})^2}} \qquad (5.6)$$
where $\bar{a}$ is the mean of the actual values.
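Relative Absolute Error (RAE): This error is defined in a similar way to RRSE. The relative absolute error takes the total absolute error and normalizes it by dividing by the total absolute error of the simple predictor. It is expressed as a percentage, with 0% being the ideal situation and 100% corresponding to the performance of the simple predictor. It is given by:
$$\mathrm{RAE} = \frac{\sum_{i=1}^{N} \lvert p_i - a_i \rvert}{\sum_{i=1}^{N} \lvert a_i - \bar{a} \rvert} \qquad (5.7)$$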
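Equations 5.3 to 5.7 can be computed together as sketched below. The function name is hypothetical; RRSE and RAE use the mean of the actual values as the simple predictor, as in Equations 5.6 and 5.7, and are returned as percentages. The correlation is undefined (returned as NaN) when the predictions are constant.

```python
import math


def regression_metrics(pred, actual):
    """Hypothetical helper: C, RMSE, MAE, RRSE (%) and RAE (%) as in Eqs. 5.3-5.7."""
    n = len(actual)
    mean_p = sum(pred) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(pred, actual)) / n
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred) / n)
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    sq_err = sum((p - a) ** 2 for p, a in zip(pred, actual))
    abs_err = sum(abs(p - a) for p, a in zip(pred, actual))
    return {
        "C": cov / (sd_p * sd_a) if sd_p > 0 and sd_a > 0 else float("nan"),
        "RMSE": math.sqrt(sq_err / n),
        "MAE": abs_err / n,
        # The simple predictor is the mean of the actual values.
        "RRSE": 100.0 * math.sqrt(sq_err / sum((a - mean_a) ** 2 for a in actual)),
        "RAE": 100.0 * abs_err / sum(abs(a - mean_a) for a in actual),
    }
```

A model that always predicts the mean scores exactly 100% on both relative errors, which is why values well below 100% (as in Table 5.2) indicate useful models.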
Classification Models. Classification models are assessed based on their accuracy. Typically, the confusion matrix, per-class and overall precision and recall, and the receiver operating characteristic are calculated:
Model accuracy is a criterion that measures how well the model's predictions agree with the actual classes. It refers to the percentage of correct predictions made by the model when compared with the actual classifications in the test data, as displayed in a confusion matrix (Ahmad et al., 2011; Han et al., 2006). Accuracy is the proportion of true results among the total results. It is given by:
$$\mathrm{Accuracy} = \frac{T_P + T_N}{T_P + T_N + F_P + F_N} \qquad (5.8)$$
where $T_P$ and $F_P$ are the numbers of true and false positives, respectively, and $T_N$ and $F_N$ are the numbers of true and false negatives, respectively.
Confusion Matrix is an n-by-n matrix, where n is the number of classes. Rows represent the actual classifications in the data, while columns represent the classifications predicted by the model.
Precision is the proportion of records predicted as positive that are actually positive (i.e., relevant to the positive class). It is given by:
$$\mathrm{Precision} = \frac{T_P}{T_P + F_P} \qquad (5.9)$$
Recall is the proportion of actual positive records that are correctly predicted as positive by the classifier. It is given by:
$$\mathrm{Recall} = \frac{T_P}{T_P + F_N} \qquad (5.10)$$
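F-measure captures the trade-off between precision and recall. It is their harmonic mean, a measure that discourages a model from excessively sacrificing one for the other. It is given by:
$$F\text{-measure} = \frac{\mathrm{Recall} \times \mathrm{Precision}}{(\mathrm{Recall} + \mathrm{Precision})/2} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (5.11)$$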
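Equations 5.8 to 5.11 can be evaluated per class from a confusion matrix laid out as described above (rows actual, columns predicted), treating each class in turn as the positive class. The helper name is hypothetical. Fed with the ANN confusion matrix from Table 5.4, it reproduces the reported accuracy (754 of 905 records, 83.315%) and, for the low class, precision 0.888, recall 0.932 and F-measure 0.910.

```python
def per_class_metrics(cm, labels):
    """Hypothetical helper. cm[i][j] = records of actual class i predicted as class j."""
    k = len(labels)
    total = sum(sum(row) for row in cm)
    result = {"accuracy": sum(cm[i][i] for i in range(k)) / total}   # Eq. 5.8, multi-class form
    for i, name in enumerate(labels):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(k)) - tp    # predicted as class i, actually another class
        fn = sum(cm[i]) - tp                         # actually class i, predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0                                    # Eq. 5.9
        recall = tp / (tp + fn) if tp + fn else 0.0                                       # Eq. 5.10
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0  # Eq. 5.11
        result[name] = (precision, recall, f)
    return result


ann_cm = [[594, 41, 2], [48, 151, 5], [27, 28, 9]]   # Table 5.4 (Low, Medium, High)
metrics = per_class_metrics(ann_cm, ["Low", "Medium", "High"])
```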
Receiver Operating Characteristic (ROC) is a plot of the true positive rate versus the false positive rate, comparing predicted and actual values. It provides insight into the decision-making ability of a model (sensitivity), i.e., how likely the model is to predict the negative or the positive class accurately. It is a useful tool for evaluating how a model behaves at different probability thresholds (Flach, 2003; Ahmad et al., 2011).
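The area under the ROC curve equals the probability that a randomly chosen positive record receives a higher score than a randomly chosen negative one. The sketch below computes it that way for a binary problem, with ties counting one half; the function name is hypothetical and the quadratic pairwise loop is for clarity only. For three classes, a one-vs-rest area per class, averaged with class-size weights, is assumed to correspond to the weighted averages reported in Tables 5.4 to 5.9.

```python
def roc_auc(scores, labels):
    """Hypothetical helper: ROC area for a binary problem (labels 1 = positive, 0 = negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))
```

An area of 0.5 corresponds to random guessing and 1.0 to a perfect ranking.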
5.3.3 Model Deployment. The insights offered by data mining results can be integrated with policy and decision-making tools so that effective watershed management and optimum land use can be achieved. Such integration requires a post-processing step that ensures that only valid and useful results are incorporated into the decision support system. An example of post-processing is the preparation of model inputs based on "what-if" scenarios in order to predict the future behavior that results from changes in any of the watershed elements, such as population, water quality regulations, land use or climate.
5.4 Case Study
The capability of data-driven models to predict water quality parameters was demonstrated for the Chicago River Watershed. The WDW repository introduced in Chapter 4 was used to develop the models. The goal of this research is to investigate simplified procedures for continuously predicting watershed water quality parameters from other watershed parameters that are available, continuous and easily obtained.
The attributes were selected based on their physical nature and on whether they are frequently measured, real-time data such as daily flow, air temperature and hourly precipitation; measurements of specific conductance such as pH, water temperature, dissolved oxygen, turbidity and total chlorophyll; chemical or biological measurements that are not time-consuming to obtain, such as BOD and COD; or attributes related to the land use of the source area.
The choice of these attributes for data-driven models to predict total nitrates was assumed to give relevant and useful information, and hence good discovered patterns.
Table 5.1 shows the properties and descriptive summary statistics of the predictors. The land use attributes are represented in the table by just one type (TOT_1001, which is single-family residential land use); the rest of the land use attributes are described in Appendix A.
For the Chicago River Watershed, most of the pre-processing steps required for data mining were performed when building the DW. Histogram analysis was used to visualize the attribute data and detect outliers. Figures 5.3 and 5.4 show histograms and a scatter plot matrix of the attributes selected for the data mining analysis. Histograms partition the values of an attribute into equal-sized partitions, or ranges. The top and bottom 2% of the data were removed, and missing values were replaced by mean values. The k-fold cross-validation method, with 10 folds, was used to partition the training and testing data sets for all the predictive models in this study. The total number of samples is 905, and 154 attributes were investigated.
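The outlier trimming and mean imputation described above can be sketched with NumPy. How the top and bottom 2% were removed is not specified in detail, so this sketch assumes per-attribute percentile cut-offs and treats a trimmed value as missing (to be mean-filled) rather than dropping the whole record; the function name is hypothetical.

```python
import numpy as np


def preprocess(X, trim=0.02):
    """Hypothetical helper: mask values below the `trim` quantile and above the
    1 - `trim` quantile of each attribute (assumed per-attribute cut-offs), then
    replace masked and missing (NaN) values with the attribute mean."""
    X = np.array(X, dtype=float)                       # copy; NaN marks missing values
    lo = np.nanpercentile(X, 100 * trim, axis=0)
    hi = np.nanpercentile(X, 100 * (1 - trim), axis=0)
    X[(X < lo) | (X > hi)] = np.nan                    # comparisons with NaN are False
    col_mean = np.nanmean(X, axis=0)                   # mean of the retained values
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_mean[cols]
    return X
```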
Table 5.1. Predictors' properties

Attribute     Description                     Unit    Mean        Min       Max       Stdev
MONTH_NUM     Number of the month                                 1         12
DO            Dissolved oxygen                mg/L    7.198       0         15        2.670
NITRATE       Total nitrate                   mg/L    2.686       0         11.98     2.903
TOTP          Total phosphorus                mg/L    0.966       0         74        4.128
TKN           Total Kjeldahl nitrogen         mg/L    1.979       0.2       88        3.741
TURB          Turbidity                       NTU     21.280      2.8       312       32.119
TEMP          Water temperature               deg C   13.407      -4        33.7      7.674
CHLOROPH      Chlorophyll-A                           9.054       0         118.4     13.177
BOD           Biochemical oxygen demand       mg/L    4.155       0         46        3.386
COD           Chemical oxygen demand          mg/L    44.466      2         305       37.649
CBOD          Carbonaceous BOD                mg/L    1.653       0         6         1.782
PH            Water pH                                7.481       0         9.2       0.661
VSS           Volatile suspended solids       mg/L    137.697     0         916       194.80
ELEV          Elevation                       ft      270.976     0.00007   513.776   137.29
INORG_SS      Inorganic suspended solids      mg/L    29.769      0         428       42.913
MIN_AIR_TEMP  Min. air temperature            deg F   43.035      -5.8      79        17.301
AVG_AIR_TEMP  Avg. air temperature            deg F   52.608      -0.16     86.16     18.284
MAX_AIR_TEMP  Max. air temperature            deg F   61.943      8.1       99        19.942
DAILY_PERC    Daily precipitation             in      0.093       0         1.82      0.244
FLOW          Daily flow                      cfs     67.708      0.02      1450      145.64
TOT_1001      Single-family residential area  acre    25878.139   13161.3   58746.6   19001.1
Figure 5.3 Histograms of attributes

Figure 5.4 Scatter plot matrix of attributes
5.5 Implementation and Results
The open-source Waikato Environment for Knowledge Analysis (WEKA) software package was used for this study. It provides a comprehensive collection of DM algorithms and data preprocessing tools, offering a framework to compare the different algorithms described in Section 5.3.2 (Hall et al., 2009). WEKA has several graphical user interfaces that enable easy access to the underlying processes. The main graphical user interface is the "Explorer". It has a panel-based interface in which different panels correspond to different data mining tasks, such as Preprocess, where data can be loaded from various sources including files and databases, and Classify, which gives access to WEKA's classification and regression algorithms. This panel also provides graphical representations of the models' prediction errors as scatter plots, and allows evaluation via ROC curves and other "threshold curves" (Hall et al., 2009).
5.5.1 Regression Models' Results. The prediction accuracy of the regression models, i.e., multiple linear regression, ANN, decision tree, SVM, lazy learner and Gaussian process, using the predictors shown in Table 5.1, is shown in Table 5.2. Appendix A gives detailed results for all the models.
Among the six regression models built, only the multiple linear regression model and the model tree gave interpretable models. The multiple linear regression model is given by Equation 5.12:
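$$
\begin{aligned}
\text{Total Nitrate} = {}& 5.6452 - 0.0534\,\mathrm{MONTH\_NUM} + 0.0714\,\mathrm{DO} + 0.0961\,\mathrm{TEMP} - 0.1304\,\mathrm{BOD} + 0.006\,\mathrm{COD} \\
&- 0.3908\,\mathrm{PH} - 0.0022\,\mathrm{VSS} - 0.0037\,\mathrm{INORG\_SS} + 0.0152\,\mathrm{MIN\_AIR\_TEMP} - 0.1395\,\mathrm{AVG\_AIR\_TEMP} \\
&+ 0.0719\,\mathrm{MAX\_AIR\_TEMP} + 0.5953\,\mathrm{DAILY\_PERC} - 0.0046\,\mathrm{FLOW} + 0.0001\,\mathrm{TOT\_1002} - 0.0025\,\mathrm{TOT\_1005} \\
&+ 0.0006\,\mathrm{TOT\_1009} + 0.0006\,\mathrm{TOT\_1010} + 0.0002\,\mathrm{TOT\_1011} + 0.0002\,\mathrm{TOT\_1013} + 0.0001\,\mathrm{TOT\_1015} \\
&+ 0.0003\,\mathrm{TOT\_1016} + 0.0005\,\mathrm{TOT\_1027} + 0.0001\,\mathrm{TOT\_1032} + 0.017\,\mathrm{TOT\_1033} + 0.0005\,\mathrm{TOT\_1037} \\
&+ 0.0002\,\mathrm{TOT\_1040} + 0.0001\,\mathrm{TOT\_1045} - 0.0163\,\mathrm{TOT\_1092} + 0.0124\,\mathrm{TOT\_1095} + 0.0003\,\mathrm{TOT\_1096}
\end{aligned}
\qquad (5.12)
$$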
Equation 5.12 indicates that attributes such as DO, water temperature, air temperature, precipitation and a few land uses can be used to predict total nitrates.
Figure 5.5 shows the decision tree model, in which a number of "if-then-else" rules partition the tree; the rules for each node are shown in Appendix A. Each leaf node represents a rule to predict the total nitrate. The first number in parentheses indicates the number of instances that fall into the corresponding leaf, and the percentage indicates the misclassified instances. Examples of the tree model rules are as follows:
TOT_1001 <= 14404.2 :
|   INORG_SS <= 15.5 :
|   |   DO <= 7.199 : LM1 (42/15.604%)
|   |   DO > 7.199 : LM2 (27/87.84%)
|   INORG_SS > 15.5 :
|   |   VSS <= 40.5 :
|   |   |   FLOW <= 10.15 : LM3 (37/4.438%)
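The linear model defined by rule LM1 is given by:
$$
\begin{aligned}
\mathrm{NITRATE} = {}& 0.0363\,\mathrm{DO} + 0.0057\,\mathrm{TEMP} - 0.0068\,\mathrm{BOD} + 0.0003\,\mathrm{COD} - 0.0192\,\mathrm{PH} - 0.0001\,\mathrm{VSS} \\
&- 0.0002\,\mathrm{INORG\_SS} - 0.001\,\mathrm{MIN\_AIR\_TEMP} - 0.0017\,\mathrm{AVG\_AIR\_TEMP} + 0.1426\,\mathrm{DAILY\_PERC} \\
&- 0.0001\,\mathrm{FLOW} + 0\,\mathrm{TOT\_1001} + 0.7193
\end{aligned}
\qquad (5.13)
$$
The linear model defined by rule LM2 is given by:
$$
\begin{aligned}
\mathrm{NITRATE} = {}& 0.0438\,\mathrm{DO} + 0.0057\,\mathrm{TEMP} - 0.0068\,\mathrm{BOD} + 0.0003\,\mathrm{COD} - 0.0192\,\mathrm{PH} - 0.0001\,\mathrm{VSS} \\
&- 0.0002\,\mathrm{INORG\_SS} - 0.001\,\mathrm{MIN\_AIR\_TEMP} - 0.0017\,\mathrm{AVG\_AIR\_TEMP} + 0.1426\,\mathrm{DAILY\_PERC} \\
&- 0.0001\,\mathrm{FLOW} + 0\,\mathrm{TOT\_1001} + 1.282
\end{aligned}
\qquad (5.14)
$$
The linear model defined by rule LM3 is given by:
$$
\begin{aligned}
\mathrm{NITRATE} = {}& 0.0094\,\mathrm{DO} + 0.004\,\mathrm{TEMP} - 0.0068\,\mathrm{BOD} + 0.0003\,\mathrm{COD} - 0.0192\,\mathrm{PH} - 0.0001\,\mathrm{VSS} \\
&- 0.0002\,\mathrm{INORG\_SS} - 0.001\,\mathrm{MIN\_AIR\_TEMP} - 0.0017\,\mathrm{AVG\_AIR\_TEMP} + 0.0634\,\mathrm{DAILY\_PERC} \\
&+ 0.0013\,\mathrm{FLOW} + 0\,\mathrm{TOT\_1001} + 0.5277
\end{aligned}
\qquad (5.15)
$$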
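The rule excerpt above and the leaf models in Equations 5.13 to 5.15 can be combined into a small prediction function. Only the three leaves shown in the excerpt are encoded, so a record that falls into any other branch returns None; the dictionary layout and function names are hypothetical, and the TOT_1001 term is dropped because its coefficient is 0.

```python
# Hypothetical layout: leaf models copied from Equations 5.13-5.15 (TOT_1001 term, coefficient 0, omitted).
SHARED = {"BOD": -0.0068, "COD": 0.0003, "PH": -0.0192, "VSS": -0.0001,
          "INORG_SS": -0.0002, "MIN_AIR_TEMP": -0.001, "AVG_AIR_TEMP": -0.0017}
LEAF_MODELS = {
    "LM1": {**SHARED, "DO": 0.0363, "TEMP": 0.0057, "DAILY_PERC": 0.1426, "FLOW": -0.0001, "const": 0.7193},
    "LM2": {**SHARED, "DO": 0.0438, "TEMP": 0.0057, "DAILY_PERC": 0.1426, "FLOW": -0.0001, "const": 1.282},
    "LM3": {**SHARED, "DO": 0.0094, "TEMP": 0.004, "DAILY_PERC": 0.0634, "FLOW": 0.0013, "const": 0.5277},
}


def route(record):
    """Follow the rule excerpt; return the leaf name, or None for branches not shown."""
    if record["TOT_1001"] <= 14404.2:
        if record["INORG_SS"] <= 15.5:
            return "LM1" if record["DO"] <= 7.199 else "LM2"
        if record["VSS"] <= 40.5 and record["FLOW"] <= 10.15:
            return "LM3"
    return None


def predict_nitrate(record):
    """Hypothetical helper: apply the leaf's linear model to the record."""
    leaf = route(record)
    if leaf is None:
        return None
    coefs = LEAF_MODELS[leaf]
    return coefs["const"] + sum(c * record[name] for name, c in coefs.items() if name != "const")
```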
The other models do not provide a similar representation; nevertheless, they can be used to predict total nitrates provided they show good prediction performance. Table 5.2 compares the prediction accuracy of the six regression models. It shows that ANN, decision tree and Gaussian process performed better than multiple linear regression, SVM and lazy learner. These three models showed similar performance, with very close RMSE and MAE values and correlation coefficients of 0.7449, 0.7448 and 0.7441, respectively.
To further assess the quality of the top three algorithms, i.e., ANN, decision tree and Gaussian process, the predicted total nitrate was plotted against the actual total nitrate for all the instances. Figure 5.6 shows that the three models perform well for total nitrate values lower than 8 mg/L. This is due to the insufficient number of high total nitrate values in the training data, which did not allow the models to gain sufficient "learning". The plot indicates different levels of performance for different ranges of total nitrate: good for low values (0 to 4 mg/L), acceptable for medium values (4 mg/L to 8 mg/L) and poor for high values (8 mg/L and above). Nevertheless, given the results of the assessment of upstream and downstream total nitrate historical records (Figure 4.9), the total nitrate values always fall below the 8 mg/L line. This allows the given regression models to be used to identify and quantify total nitrates in the Chicago River watershed.
Figure 5.5. Decision tree regression model
Table 5.2. Prediction accuracy of regression models

Criterion     Multiple regression  ANN      Decision tree  SVM      Lazy learner  Gaussian process
Correlation   0.6759               0.7449   0.7448         0.6331   0.6295        0.7441
RMSE          2.1306               1.9469   1.9279         2.3431   2.245         1.9368
MAE           1.4842               1.2686   1.2217         1.3042   1.5583        1.2731
RRSE          73.68%               67.32%   66.65%         81.02%   77.63%        66.97%
RAE           60.73%               51.91%   49.99%         53.36%   63.76%        52.09%
Figure 5.6. Actual vs. predicted total nitrates (series: Actual; GP, ANN and M5P predicted; horizontal axis: actual nitrate)
5.5.2 Classification Models' Results. To use classification models to predict total nitrates, the continuous values were transformed into three nominal classes, defined as low, medium and high (Table 5.3). The classification models selected are ANN, logistic regression, SVM, decision tree, lazy learner (LWL) and naive Bayes. The prediction accuracy of these models is shown in Tables 5.4, 5.5, 5.6, 5.7, 5.8 and 5.9, respectively. Appendix A shows the detailed results of the models.
Table 5.3. Total nitrates classes

Class     Range (mg/L)
Low       0 < (NO2 + NO3) < 3.99
Medium    3.99 < (NO2 + NO3) < 7.99
High      7.99 < (NO2 + NO3) < +∞
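Converting the continuous values into the classes of Table 5.3 takes one comparison per class boundary. Because the table uses strict inequalities at both ends of each range, values exactly equal to 3.99 or 7.99 are not assigned there; this sketch assumes they belong to the lower class, and the function name is hypothetical.

```python
def nitrate_class(total_nitrate_mg_l):
    """Hypothetical helper mapping NO2 + NO3 (mg/L) to the Table 5.3 classes.
    Assumption: values exactly at 3.99 or 7.99 go to the lower class."""
    if total_nitrate_mg_l <= 3.99:
        return "Low"
    if total_nitrate_mg_l <= 7.99:
        return "Medium"
    return "High"
```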
As with the regression models, the decision tree is the only model here that presents its logic in an explicit form, while all the other models act as black-box models. The prediction accuracy results indicate that all the models performed well, with model accuracies of 83.3149%, 82.3204%, 81.989%, 81.6575%, 81.547% and 80.7735% for the ANN, decision tree, logistic regression, lazy learner, SVM and naive Bayes models, respectively.
Comparing the performance based on the confusion matrix results, ANN, decision tree and logistic regression were able to predict all three classes. ANN was the best at predicting the low class, with a true positive (TP) rate of 93.2%, followed by logistic regression and then decision tree (92.5% and 91.8%, respectively). For the medium class, the models showed TP rates of 74%, 69.1% and 66.2% for ANN, decision tree and logistic regression, respectively. For the high class, the decision tree showed the best performance, albeit with a low TP rate, followed by logistic regression and then ANN (29.7%, 28.1% and 14.1%). The other three models, SVM, lazy learner and naive Bayes, correctly predicted only the low and medium total nitrate classes. For the low class, the TP rates of these models are 91.7%, 91.7% and 90.7% for lazy learner, SVM and naive Bayes, respectively. The rates for the medium class are 76%, 75.5% and 75% for lazy learner, SVM and naive Bayes, respectively.
Other evaluation criteria for the classification models are precision, recall and F-measure (weighted averages over the three classes). The precision rates for the models, in descending order, are 81.9%, 81.7%, 80.8%, 76.1%, 76% and 75.9% for ANN, decision tree, logistic regression, lazy learner, SVM and naive Bayes, respectively. Similarly, the recall rates are 83.3%, 82.3%, 82%, 81.7%, 81.5% and 80.8% for ANN, decision tree, logistic regression, lazy learner, SVM and naive Bayes, respectively. The values of the F-measure, as given by Equation 5.11, are 82%, 81.7%, 80.9%, 78.8%, 78.7% and 78.2% for decision tree, ANN, logistic regression, lazy learner, SVM and naive Bayes, respectively.
The last criterion considered is the ROC plot, which measures the decision-making ability and sensitivity of the model. The ROC plots for the six models together and for the ANN model alone are shown in Appendix A. The weighted average ROC areas are 91.8%, 90.4%, 86.2%, 85.4%, 82.3% and 77.5% for ANN, logistic regression, naive Bayes, lazy learner, decision tree and SVM, respectively. The ROC curves of all the models rise steeply toward the top-left corner of the plot, indicating a high true positive rate and a low false positive rate, and hence good performance.
All the measures given above suggest that ANN is the best classification model for predicting total nitrates, followed by decision tree, while naive Bayes is the worst. However, the decision tree provides a clear logical model that can be easily understood.
Table 5.4. Prediction accuracy of ANN model

                           Low      Medium   High     Weighted avg.
Confusion matrix  Low      594      41       2
                  Medium   48       151      5
                  High     27       28       9
Model accuracy    83.315%
Precision                  0.888    0.686    0.563    0.819
Recall                     0.932    0.740    0.141    0.833
F-measure                  0.910    0.712    0.225    0.817
ROC area                   0.929    0.912    0.833    0.918
Table 5.5. Prediction accuracy of logistic regression model

                           Low      Medium   High     Weighted avg.
Confusion matrix  Medium   60       135      9
                  High     32       14       18
Precision                  0.865    0.696    0.6      0.808
Recall                     0.925    0.662    0.281    0.82
F-measure                  0.894    0.678    0.383    0.809
ROC area                   0.915    0.894    0.831    0.904
Table 5.6. Prediction accuracy of SVM model

                           Low      Medium   High     Weighted avg.
Confusion matrix  Medium   50       154      0
                  High     41       23       0
Precision                  0.865    0.67     0        0.76
Recall                     0.917    0.755    0        0.815
F-measure                  0.89     0.71     0        0.787
ROC area                   0.791    0.811    0.495    0.775
Table 5.7. Prediction accuracy of decision tree model

                           Low      Medium   High     Weighted avg.
Confusion matrix  Medium   42       141      21
                  High     18       27       19
Precision                  0.907    0.678    0.365    0.817
Recall                     0.918    0.691    0.297    0.823
F-measure                  0.913    0.684    0.328    0.82
ROC area                   0.863    0.775    0.581    0.823
Table 5.8. Prediction accuracy of lazy learner model

                           Low      Medium   High     Weighted avg.
Confusion matrix  Medium   49       155      0
                  High     41       23       0
Precision                  0.866    0.671    0        0.761
Recall                     0.917    0.76     0        0.817
F-measure                  0.891    0.713    0        0.788
ROC area                   0.869    0.87     0.647    0.854
Table 5.9. Prediction accuracy of naive Bayes model

                           Low      Medium   High     Weighted avg.
Confusion matrix  Medium   50       153      1
                  High     41       23       0
Precision                  0.864    0.668    0        0.759
Recall                     0.907    0.75     0        0.808
F-measure                  0.885    0.707    0        0.782
ROC area                   0.879    0.866    0.679    0.862
5.6 Conclusion
Results show that, given sufficient data with proper variables, DM methods are capable of predicting water quality parameters, in this case total nitrates. Among the regression models used in this study, ANN and decision tree showed better performance, with very close RMSE and MAE values and correlation coefficients of 0.7449 and 0.7448, respectively. For the classification models, the prediction accuracy results indicate that all the models performed well, with ANN and decision tree showing the best performance (model accuracies of 83.3149% and 82.3204%, respectively). Although the ANN model generally shows the best performance, further development of decision tree models would be more sensible, since they expose their reasoning process as rules that are understandable to humans. These rules can assist policy making in watershed management plans, whereas the other models do not provide such features to enhance watershed management.
To support better prediction results and a robust forecasting system for policy makers, it is common practice to combine the outcomes of several mining models. It would therefore be reasonable to use a combination of the top predicting models for the prediction of water quality parameters.
The success of the data mining methodology relies heavily on the quality and quantity of the data used in the prediction process. Even though this study used a sufficient amount of data with a logical set of predictors, more data and more watershed characteristics could be incorporated to enhance the efficiency and performance of the predictive models.
The techniques presented in this chapter are intended to integrate some watershed parameters as indicators for predicting the water quality parameter in question, and hence to simplify the modeling procedure. By adopting advanced analytical techniques, this approach makes it possible to use data on basic watershed elements, and the relationships among them, without explicitly modeling the physical processes that link them.
The derived data-driven models would be useful for solving a practical problem or modeling a system or process if (1) a sufficient amount of data is available and (2) there are no considerable changes to the modeled system during the period covered by the model (Solomatine et al., 2004; Solomatine et al., 2007). They are effective when building knowledge-driven simulation models is difficult owing to a lack of understanding of the underlying physical processes (Preis et al., 2007; Shrestha et al., 2007), or when the available models are not adequate (Solomatine et al., 2007). It is always useful to have modeling alternatives and to validate the simulation results of physically based models with data-driven ones.

CHAPTER 6
WATER QUALITY MODELING USING BASINS/HSPF
6.1 Introduction
In this chapter, a water quality model of the Chicago River Watershed was developed using BASINS/HSPF. The model simulates and quantifies the effect of level (III) land use on nutrient loading into the water bodies in the watershed. From the calibrated and validated water quality model, nutrient export coefficients that relate the detailed land uses to water quality were obtained.
To assess the relationships between land use and water quality in the watershed, the BASINS 4.0 model was selected. BASINS' built-in delineation tools, DEM reclassification, water quality management tools for observed data and other features allow water quality to be assessed for a specific stream site or for a whole watershed. HSPF version 12 was used as the embedded water quality model. HSPF is incorporated in BASINS 4.0, and its interface is known as WinHSPF (Singh et al., 2006). With WinHSPF, users are able to run HSPF in a user-friendly Windows environment.
6.2 Methodology
This section outlines the steps carried out to fulfill the objectives of the simulation process. It explains how the hydrologic and water quality models were constructed and used in the BASINS/HSPF modeling environment.
6.2.1 Watershed Modeling in a BASINS Environment, Version 4.0. BASINS is a multi-purpose environmental analysis system that integrates a geographical information system (GIS), national watershed data, and state-of-the-art environmental assessment and modeling tools (such as HSPF, SWAT and SWMM) into one convenient package (EPA, 2012). It provides a framework to integrate several key environmental data sets with improved analysis techniques (EPA, 2012). It was used in this study to characterize hydrology and water quality processes and how they relate to detailed (level III) land use in the Chicago River Basin.
BASINS data layers that can be provided to HSPF include: Digital Elevation Model (DEM) grid data, to determine watershed boundaries; National Land Cover Data (NLCD) or GIRAS land use data, to calculate the land use distribution within the watershed; Reach files, to determine stream networks; Permit Compliance System (PCS) data, to provide loading information in the watershed; meteorological data, to meet the meteorological data requirements; and STORET and USGS data, to provide water quality and quantity data (Aqua Terra Consultants, 2012).
The BASINS package contains several important modeling tools. In order to run HSPF, the observed meteorological data, water quality data and flow data must be converted to the Watershed Data Management (WDM) format using another program, WDMUtil, which is also included in the BASINS package. The WDM files contain the time series data required by HSPF, such as meteorological data, HSPF program inputs and outputs, and the model time series used in the calibration and validation processes. All input data other than time series are contained in the User's Control Input (UCI) file, which holds all the parameter values and control specifications needed to run the HSPF model. For the evaluation of the model, all the calibration and validation analyses were done using the GenScenario tool in the BASINS package.

6.2.1.1 Meteorological Data in the WDM Format. Meteorological data were available as daily data, whereas hourly meteorological data are required to run both the BASINS and HSPF models. The meteorological station selected for this study is Chicago O'Hare Airport. This station was chosen among the other available stations in the study area because it had all the meteorological constituents required by HSPF. Table 6.1 presents the minimum input data required to run HSPF, as provided by the station.
Precipitation data is used to find surface runoff, sediment and pollutant transport,
and hydrological processes. Potential evapotranspiration data is used in computation of
runoff or direct evaporation from land and water surfaces. Air temperature data is used to
determine water and soil temperature and to model snow and rain in the watershed. Wind
speed data is needed to model heat exchange, oxygen reaeration rates and chemical
volatilization rates. Solar radiation data is used to find heat balance in water bodies and
plankton growth rate. Dew point temperature is used to determine the kind of
precipitation and to model heat balance in streams. Finally, cloud cover is used to model
heat balance and photolysis.
Daily time series data must be disaggregated into hourly time series in the WDMUtil program, which contains a function that performs this step. For this study, all the meteorological time series were readily available from BASINS as disaggregated hourly data for the selected station.
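Disaggregating a daily total into hourly values can be sketched as distributing the total according to a 24-hour weight pattern, so that the hourly values sum back to the daily value. This illustrates the idea only and is not WDMUtil's disaggregation algorithm; the uniform default pattern and the function name are assumptions, and quantities such as air temperature would need a diurnal pattern rather than a split of a total.

```python
def disaggregate_daily(daily_total, hourly_weights=None):
    """Hypothetical helper: split one daily total into 24 hourly values that sum to it.
    The uniform default pattern is an assumption, not WDMUtil's method."""
    if hourly_weights is None:
        hourly_weights = [1.0] * 24
    if len(hourly_weights) != 24:
        raise ValueError("expected 24 hourly weights")
    total_weight = sum(hourly_weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return [daily_total * w / total_weight for w in hourly_weights]
```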
Table 6.1. Meteorological data required for HSPF, Chicago O'Hare Airport Meteorological Station.
Hydro-meteorological data      Data time step    Data period
Precipitation                  Hourly            1962/06/01 to 2006/12/31
Potential evapotranspiration   Hourly            1958/11/01 to 2006/12/31
Air temperature                Hourly            1958/11/01 to 2006/12/31
Wind speed                     Hourly            1994/12/31 to 2006/12/31
Solar radiation                Hourly            1995/01/01 to 2006/12/31
Dew point temperature          Hourly            1994/12/31 to 2006/12/31
Cloud cover                    Hourly            1994/12/31 to 2006/12/31
6.2.1.2 GIS Data. Once the project was built in BASINS for Hydrologic Unit 07120003, GIS data layers were imported into the project in shapefile format, and each layer was projected to UTM 1983 Zone 16. The GIS data layers loaded into the BASINS 4.0 window were: stream network data (the National Hydrography Dataset (NHD), used because it provides more complete hydrography than the core Reach File Version 1 (RF1) layer supplied with BASINS); Chicago O'Hare Airport meteorological station data; GIRAS land use data (from the 1970s) and National Land Cover Data (NLCD) for 1992 and 2001; BASINS Digital Elevation Model (DEM) grids; water quality and quantity monitoring station data (USGS and STORET); and contour and soil type layers. Time series data for the imported shapefiles were later downloaded from the BASINS window and saved as WDM files.
The land use data available through BASINS are either Level I or Level II land use types. To fulfill the objectives of the study, Level III land use data were acquired from the Chicago Metropolitan Agency for Planning (CMAP). CMAP's 2005 land use inventory, in shapefile format, was added to the BASINS project. The inventory was created using 2005 digital aerial photography and supplemented with data from numerous government and private-sector sources (CMAP, 2012). It covers Cook, DuPage, Kane, Kendall, Lake, McHenry, and Will counties and identifies areas as small as one acre using a 49-category classification scheme (CMAP, 2012). Because BASINS has limitations in processing large land use classifications, the CMAP land use data were clipped to the watershed study area using the ArcGIS ArcMap 10 clipping tools. The land use classifications used in the study are shown in the Appendix.
6.2.1.3 Watershed Delineation using BASINS 4.0. The watershed delineation tool within BASINS 4.0 was used to delineate the Chicago River Watershed. Watershed delineation is the process by which the watershed boundary and stream network are determined from the watershed topography and the similarity of physical processes. It is used to determine the contributing area for a specific outlet or to divide the watershed into subbasins. Delineation is part of the segmentation process required by HSPF, in which the watershed is divided into segments for analysis. Delineation is performed either automatically, using DEM grids, or manually, by selecting existing streams and basins to define the watershed. Automatic delineation was used in this study. The delineation process produced the three GIS layers required to run HSPF: Streams, Subbasins, and Outlets.
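Grid-based automatic delineation generally starts by giving each DEM cell a flow direction toward its steepest downslope neighbour (the D8 method) and then accumulating the number of upslope cells; streams are cells whose accumulated area exceeds a threshold, and subbasins are traced upstream of the chosen outlets. The Python sketch below illustrates only the first two steps on a hypothetical 3 × 3 elevation grid. It is not the BASINS delineator, which also handles sink filling, flat areas, and real raster formats.

```python
import math

# Row/column offsets of the eight neighbours of a grid cell.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_directions(dem):
    """Map each cell to its steepest downslope neighbour (None for pits and outlets)."""
    rows, cols = len(dem), len(dem[0])
    receivers = {}
    for r in range(rows):
        for c in range(cols):
            best, best_slope = None, 0.0
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Diagonal neighbours are sqrt(2) cell widths away.
                    slope = (dem[r][c] - dem[nr][nc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best, best_slope = (nr, nc), slope
            receivers[(r, c)] = best
    return receivers


def flow_accumulation(dem):
    """Count the cells (including itself) that drain through each cell."""
    receivers = d8_directions(dem)
    # Every donor is strictly higher than its receiver, so visiting cells from
    # highest to lowest passes each donor's total on before the receiver is visited.
    order = sorted(receivers, key=lambda cell: dem[cell[0]][cell[1]], reverse=True)
    accumulation = {cell: 1 for cell in receivers}
    for cell in order:
        target = receivers[cell]
        if target is not None:
            accumulation[target] += accumulation[cell]
    return accumulation


# Hypothetical 3 x 3 elevation grid (m) draining to the lower-right corner.
dem = [
    [9, 8, 7],
    [8, 6, 5],
    [7, 5, 1],
]
acc = flow_accumulation(dem)
print(acc[(2, 2)])  # 9: every cell drains to the outlet cell
```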
For this study, the delineation process divided the watershed into two subbasins: the Upper Chicago River subbasin and the Calumet River subbasin. The two subbasins were naturally separated before the Chicago Area Waterway System (CAWS) was built, and both originally drained into Lake Michigan. They are not hydrologically connected (i.e., no stream connects the two subbasins within the watershed boundary), and hence they were represented as two subbasins at the end of the delineation process. Figure (6.1) shows the results of the delineation of the Chicago River watershed. The three GIS layers required by HSPF were determined for each subwatershed.
The automatic delineation also estimated stream network parameters within each subbasin from the digital elevation and stream network layers, including the average slope, length, drainage area, and elevation of each stream segment.
The only way to treat the two subbasins as one would have been to choose an outlet lying outside the boundary of the Chicago River watershed. However, this would require including part of the Des Plaines River watershed, together with data for the Des Plaines River and Salt Creek, which was beyond the scope of investigating the Chicago River watershed from a watershed perspective. In addition, the complex stream behavior of the Calumet River subbasin did not allow it to be analyzed within the watershed boundary for the same reason. Therefore, only the Upper Chicago River subbasin was investigated in a watershed context for the physical simulation part of this study.
[Figure: BASINS 4.0 map window with the delineated subbasins and the outlet, stream, NHD, point source, observed data station, weather station, and hydrologic unit boundary layers]
Figure 6.1. The Chicago River Watershed delineation process using BASINS 4.0 (the red lines represent the delineated subbasins).
6.3 Watershed Simulation
The WinHSPF interface was launched by selecting the HSPF model from the Models menu in the BASINS main window. The Subbasin, Stream, and Outlet shapefiles produced by the delineation process, together with the land use and meteorological station shapefiles, were supplied to initialize WinHSPF. Once WinHSPF was launched, an HSPF User Control Input (UCI) file and a Watershed Data Management (WDM) file were created.
WinHSPF divided the Upper Chicago River subbasin into homogeneous land areas known as Hydrologic Response Units (HRUs). The HRUs were used to define 6 reaches and 7 subwatersheds. The reach elements specify the rivers, lakes, and channels represented in HSPF's RCHRES (reach-reservoir) module.
The hydraulic characteristics of each reach were defined by parameters in function tables (FTABLES) that represent the volume-discharge relationship of each reach. A fixed relationship was assumed among water level, surface area, volume, and discharge (Singh et al., 2005). HRUs can be impervious or pervious areas, and once determined, each is modeled independently. Each HRU requires input data such as meteorological data and parameters related to land use and soil characteristics to simulate hydrology, sediments, and nutrients (Donigian et al., 1995).
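An FTABLE is a lookup table of depth, surface area, volume, and outflow for a reach. Assuming simple linear interpolation between rows, the fixed volume-discharge relationship can be sketched in Python as below. The FTABLE values and the `outflow_from_volume` helper are hypothetical illustrations, not the tables generated for the Upper Chicago River reaches or HSPF's internal HYDR routine.

```python
from bisect import bisect_right

# Hypothetical FTABLE for one reach: (depth ft, surface area ac, volume ac-ft, outflow cfs).
FTABLE = [
    (0.0, 0.0, 0.0, 0.0),
    (2.0, 10.0, 15.0, 40.0),
    (5.0, 14.0, 52.0, 220.0),
    (10.0, 18.0, 130.0, 900.0),
]


def outflow_from_volume(volume, ftable=FTABLE):
    """Linearly interpolate reach outflow (cfs) from stored volume (ac-ft)."""
    volumes = [row[2] for row in ftable]
    if volume <= volumes[0]:
        return ftable[0][3]
    if volume > volumes[-1]:
        raise ValueError("volume is outside the FTABLE range")
    if volume == volumes[-1]:
        return ftable[-1][3]
    i = bisect_right(volumes, volume) - 1  # last row with volume <= given volume
    v0, q0 = ftable[i][2], ftable[i][3]
    v1, q1 = ftable[i + 1][2], ftable[i + 1][3]
    return q0 + (q1 - q0) * (volume - v0) / (v1 - v0)


print(outflow_from_volume(33.5))  # 130.0, halfway between the 15 and 52 ac-ft rows
```

Raising an error above the last row, rather than extrapolating, reflects the usual practice of extending an FTABLE until it covers the largest expected volume.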
The main simulation modules are PERLND, IMPLND, and RCHRES, which simulate pervious land segments, impervious land segments, and free-flowing reaches or mixed reservoirs, respectively (Donigian et al., 1995). Figure (6.2) shows the schematic created by WinHSPF to represent the Upper Chicago River subbasin; it includes all the model elements, such as streams and subbasins.

Figure 6.2. Schematic created by WinHSPF for the Upper Chicago River subbasin.
6.3.1 Impervious Area Assumptions. One of the important parameters that must be estimated for accurate hydrologic analyses is the effective impervious area (EIA) of the watershed (Sutherland, 2000). Studies suggest that treating urban land use as a nonpoint source of nutrients can give unrealistic results, because urban land cover is largely impervious and drainage is frequently routed to wastewater treatment plants (WWTPs), which may or may not be in the same basin, and then discharged to streams as point sources (Ahearn et al., 2005).
Since accurate estimates of runoff volume are essential for estimating pollutant loads, the effective impervious area (EIA), that is, the portion of the total impervious area (TIA) that is directly connected to the drainage system, should be determined (Sutherland, 2000). EIA includes impervious areas such as paved driveways connected to the street, sidewalks, rooftops connected to the curb or storm sewer system, and parking lots (Sutherland, 2000). For urban runoff modeling or hydrologic analysis, EIA is usually less than TIA; however, in highly urbanized basins EIA can approach or equal TIA (Sutherland, 2000).
TIA is commonly determined using two methods: land use or zoning maps, and aerial photography (Jones et al., 2003). The scientific basis for the relationship between land use and the amount of impervious surface was developed in the field of urban hydrology during the 1970s (Brabec et al., 2010). In the early research, imperviousness was evaluated in four ways: (1) measuring each area on aerial photographs with a planimeter, (2) overlaying grids on aerial photographs and counting the grid intersections that fell on various land uses or impervious features, (3) classifying remotely sensed images, and (4) equating the percentage of urbanization in a region with the percentage of imperviousness (Brabec et al., 2010).
The majority of current impervious surface studies rely on the methods of these original studies and of subsequent studies that correlated percentage impervious surface with land use, largely by estimating the proportion of imperviousness within each land use class (see Appendix B) (Brabec et al., 2010). TIA values from the literature, determined using aerial or satellite photography and adopted for this study, are shown in Table (6.2) (Brabec et al., 2010; Sutherland, 2000).
The three methods most commonly used in recent work to determine EIA are field measurements, empirical equations, and calibrated computer rainfall-runoff models (Jones et al., 2003).
Empirical equations were used in this study to determine EIA. One relationship was proposed by Alley et al. (1983), based on work completed for highly urbanized drainage areas in Denver, Colorado (Sutherland, 2000):
$$\mathrm{EIA} = 0.15 \times \mathrm{TIA}^{1.41} \tag{6.1}$$
Another relationship was developed by Laenen (1983) for the USGS, based on work completed on more than 40 watersheds throughout the metropolitan areas of Portland and Salem, Oregon (Sutherland, 2000). The following empirical equation, based on this database, was proposed to estimate EIA as a function of TIA:
$$\mathrm{EIA} = 3.6 + 0.43 \times \mathrm{TIA} \tag{6.2}$$
Based on the USGS-calibrated EIA values for all basins, Sutherland reanalyzed Equation (6.2) and developed a series of equations that provide EIA estimates for various generalized subbasin conditions, for use as input to hydrologic models (see Appendix B) (Sutherland, 2000). These equations are summarized as follows:
1. Extremely disconnected basins, with either extensive infiltration measures or basins serviced predominantly by ditches/swales:
$$\mathrm{EIA} = 0.01 \times \mathrm{TIA}^{2.0} \tag{6.3}$$
2. Somewhat disconnected basins, with either 50% of urban areas serviced by ditches or swales and roofs disconnected, or an average basin with some infiltration measures:
$$\mathrm{EIA} = 0.04 \times \mathrm{TIA}^{1.7} \tag{6.4}$$
3. Average basins, no infiltration measures, roofs disconnected:
$$\mathrm{EIA} = 0.1 \times \mathrm{TIA}^{1.5} \tag{6.5}$$
4. Highly connected basins, no infiltration measures, roofs connected:
$$\mathrm{EIA} = 0.4 \times \mathrm{TIA}^{1.2} \tag{6.6}$$
5. Totally connected basins, no infiltration measures, roofs connected:
$$\mathrm{EIA} = \mathrm{TIA} \tag{6.7}$$
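Equations 6.1 to 6.7 can be compared side by side with a short Python sketch. TIA and EIA are both taken in percent, and the function names are hypothetical.

```python
def eia_alley(tia):
    """Eq. 6.1, Alley et al. (1983); TIA and EIA in percent."""
    return 0.15 * tia ** 1.41


def eia_laenen(tia):
    """Eq. 6.2, Laenen (1983)."""
    return 3.6 + 0.43 * tia


# Eqs. 6.3-6.7, Sutherland (2000): EIA = coefficient * TIA ** exponent.
SUTHERLAND = {
    "extremely disconnected": (0.01, 2.0),
    "somewhat disconnected": (0.04, 1.7),
    "average": (0.1, 1.5),
    "highly connected": (0.4, 1.2),
    "totally connected": (1.0, 1.0),
}


def eia_sutherland(tia, condition):
    coefficient, exponent = SUTHERLAND[condition]
    return coefficient * tia ** exponent


# Compare the equations for the TIA values of the urban classes in Table 6.2.
print("TIA  Alley  Laenen  " + "  ".join(SUTHERLAND))
for tia in (35, 50, 85):
    values = [eia_alley(tia), eia_laenen(tia)]
    values += [eia_sutherland(tia, c) for c in SUTHERLAND]
    print(tia, " ".join(f"{v:6.1f}" for v in values))
```

All five Sutherland curves return roughly 100% at TIA = 100%, which is a quick consistency check on the coefficients and exponents.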

Table 6.2. TIA percentages from the literature adopted for this study
Land Use Category            TIA (%)
Agricultural                 0
Commercial                   85
Forest                       0
Industrial                   85
Multi-Family Residential     50
Single-Family Residential    35
Public Open Space            0
Roads                        85
Schools                      50
Vacant                       0
Water                        100
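The Table 6.2 percentages can be combined with an EIA equation to split each land use area into effective impervious and pervious parts for IMPLND and PERLND. The Python sketch below uses Eq. 6.2 as an example; the land use areas are hypothetical, and capping EIA at TIA (so that classes with TIA = 0 remain fully pervious) is an assumption of the sketch, not a step stated in the text.

```python
# TIA percentages transcribed from Table 6.2.
TIA = {
    "Agricultural": 0, "Commercial": 85, "Forest": 0, "Industrial": 85,
    "Multi-Family Residential": 50, "Single-Family Residential": 35,
    "Public Open Space": 0, "Roads": 85, "Schools": 50, "Vacant": 0, "Water": 100,
}


def eia_laenen(tia):
    """Eq. 6.2 (Laenen, 1983): EIA (%) as a function of TIA (%)."""
    return 3.6 + 0.43 * tia


def split_area(land_use, area_ac):
    """Return (effective impervious acres, pervious acres) for one land use.

    Assumption of this sketch: EIA is capped at TIA, so land uses with
    TIA = 0 remain fully pervious instead of receiving the 3.6% intercept.
    """
    tia = TIA[land_use]
    eia_pct = min(eia_laenen(tia), tia)
    impervious = area_ac * eia_pct / 100.0
    return impervious, area_ac - impervious


# Hypothetical land use areas (acres) for one subwatershed.
areas = {"Single-Family Residential": 1200.0, "Commercial": 300.0, "Forest": 150.0}
impervious = sum(split_area(lu, a)[0] for lu, a in areas.items())
pervious = sum(split_area(lu, a)[1] for lu, a in areas.items())
print(f"impervious {impervious:.2f} ac, pervious {pervious:.2f} ac")
```

With these hypothetical areas, about 344 acres are treated as effective impervious and about 1,306 acres as pervious.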
6.3.2 Flow simulation. Flow is the first component to be simulated. PWATER and IWATER are the modules used for flow simulation. PWATER calculates the components of the water budget and predicts the total runoff from pervious land segments. IWATER simulates the retention, routing, and evaporation of water from impervious land segments. In-stream hydraulic behavior is simulated by the HYDR module.
For each reach, a fixed relationship is assumed among water level, surface area, volume, and discharge. In-stream simulation assumes a completely mixed system with unidirectional, longitudinal flow. The hydraulic characteristics of the reaches are defined by parameters in the function tables (FTABLES) that represent volume-discharge relations for the reaches (Singh et al., 2005). Parameters needed for the simulation, such as nominal upper zone storage, nominal lower zone storage, soil infiltration rate, percent vegetation cover of each land use type, and groundwater recession rate, were populated with BASINS default values or literature values and later adjusted during hydrologic calibration.
6.3.3 Water quality simulation. Nutrient loadings from nonpoint sources on the different land uses were simulated using the HSPF modules PQUAL and IQUAL. These modules use a simplified approach that simulates each water quality constituent independently, based on simple relationships with water or sediment. The species modeled were total ammonium (NH3+NH4) as N, total nitrate (NO3+NO2) as N, and orthophosphate (PO4) for both pervious and impervious land segments.
PQUAL and IQUAL simulate pollutants using one of two methods: direct washoff by overland flow, in which the constituent is simulated from accumulation and depletion rates, or washoff associated with detached sediment, in which the constituent is simulated as a function of sediment removal. The first approach was adopted for all species, since the study area is largely impervious and the nutrients are mainly washed off by overland flow.
Washoff is simulated using the following commonly used relationship (Bicknell et al., 2001):
$$\mathrm{SOQO} = \mathrm{SQO}\left(1.0 - e^{-\mathrm{SURO}\times\mathrm{WSFAC}}\right) \tag{6.8}$$
where:
SOQO = washoff of the quality constituent from the land surface (lb/ac/day)
SQO = storage of the quality constituent on the surface (lb/ac)
SURO = surface outflow of water (in/day)
WSFAC = susceptibility of the quality constituent to washoff (/in)
The storage of the constituent on the land surface is calculated using Equation 6.9, which accounts for accumulation and removal processes (Bicknell et al., 2001):
$$\mathrm{SQO} = \mathrm{ACQOP} + \mathrm{SQOS}\,(1.0 - \mathrm{REMQOP}) \tag{6.9}$$
where:
SQO = storage of available quality constituent on the land surface (lb/ac)
ACQOP = accumulation rate of the constituent on the land surface (lb/ac/day)
SQOS = SQO at the start of the interval
REMQOP = unit removal rate of the stored constituent (/day)
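Equations 6.8 and 6.9 together form a buildup/washoff cycle. A minimal daily Python sketch, assuming that on each day washoff is taken from the current storage first and the remainder is then updated with Eq. 6.9 (a simplification of HSPF's interval handling), looks like this; all parameter values and function names are hypothetical.

```python
import math


def washoff(sqo, suro, wsfac):
    """Eq. 6.8: washoff SOQO (lb/ac/day) from storage SQO (lb/ac) and surface outflow SURO (in/day)."""
    return sqo * (1.0 - math.exp(-suro * wsfac))


def accumulate(sqos, acqop, remqop):
    """Eq. 6.9: storage after one day of accumulation (ACQOP) and removal (REMQOP)."""
    return acqop + sqos * (1.0 - remqop)


def simulate(suro_series, sqo0, acqop, remqop, wsfac):
    """Daily buildup/washoff loop; returns the daily washoff loads and the final storage."""
    sqo, loads = sqo0, []
    for suro in suro_series:
        soqo = washoff(sqo, suro, wsfac)  # washoff from the current storage
        loads.append(soqo)
        sqo = accumulate(sqo - soqo, acqop, remqop)  # then buildup and removal
    return loads, sqo


# Hypothetical parameters: ACQOP = 0.1 lb/ac/day, REMQOP = 0.05 /day, WSFAC = 1.5 /in.
runoff = [0.0, 0.0, 0.5, 0.0, 1.2]  # daily surface outflow, in/day
loads, storage = simulate(runoff, sqo0=1.0, acqop=0.1, remqop=0.05, wsfac=1.5)
print([round(x, 3) for x in loads], round(storage, 3))
```

During a long dry spell the storage approaches ACQOP/REMQOP (2.0 lb/ac here), the fixed point of Eq. 6.9. This is the limiting storage that is adjusted monthly during water quality calibration.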

HSPF simulates physical, chemical, and biological processes within a stream reach using the RCHRES module, assuming that each reach is completely mixed and that flow is unidirectional. Point sources were added to the HSPF simulation; the two known NPDES-permitted dischargers that could be added to the watershed are the North Side WRP and the Calumet WRP. In WinHSPF, after the nonpoint source loadings were calculated for each land use, they were added to their corresponding reaches along with the identified point sources. For each channel reach, WinHSPF simulates the fate, transport, and delivery of the nutrient loads using the reach quality module (RQUAL).
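The bookkeeping of adding land-use washoff and point-source loads to a reach can be sketched in Python as below. This only illustrates the arithmetic under stated assumptions: the unit-area loads, areas, and plant flow and concentration are hypothetical, and the 8.34 lb/day factor is the usual approximation for 1 million gallons per day at 1 mg/L. WinHSPF performs the actual linkage internally.

```python
# Approximate conversion: 1 million gallons/day at 1 mg/L is about 8.34 lb/day.
LB_PER_DAY_PER_MGD_MG_L = 8.34


def point_source_load(flow_mgd, conc_mg_l):
    """Daily load (lb/day) from a plant discharging flow_mgd at conc_mg_l."""
    return flow_mgd * conc_mg_l * LB_PER_DAY_PER_MGD_MG_L


def reach_inflow_load(land_loads, point_loads):
    """Total daily load (lb/day) entering one reach.

    land_loads: {land use: (unit-area washoff in lb/ac/day, area in ac)}
    point_loads: iterable of point-source loads in lb/day
    """
    nonpoint = sum(rate * area for rate, area in land_loads.values())
    return nonpoint + sum(point_loads)


# Hypothetical numbers for one reach on one day.
land = {"Commercial": (0.02, 300.0), "Single-Family Residential": (0.005, 1200.0)}
plant = point_source_load(flow_mgd=250.0, conc_mg_l=1.5)
print(round(reach_inflow_load(land, [plant]), 1))
```

With these hypothetical values the point-source term dominates the nonpoint washoff, which is why the treatment plant discharges matter so much in a heavily urbanized reach.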
6.3.4 Model Calibration and Validation. Hydrologists need to evaluate model performance for the following reasons: (1) to provide a quantitative estimate of the model's performance and predictive ability; (2) to provide a measure for evaluating improvements to the modeling approach; and (3) to compare the results of different modeling efforts with previous results (Krause et al., 2005).
Calibration is an iterative procedure of parameter adjustment based on comparing simulated and observed values (Donigian, 2002). An initial set of parameter values is chosen from literature recommendations and then refined until the differences between the simulated and observed data series are reasonably small (Donigian, 2002). Validation is the procedure that ensures the calibrated model properly represents the watershed variables and conditions that can affect model results, and demonstrates the model's ability to predict observations for periods separate from the calibration period (Donigian, 2002).
No commonly accepted modeling guidance has yet been established, although the American Society of Civil Engineers (ASCE) has emphasized the need to clearly define model evaluation criteria since 1993 (Donigian, 2002). However, specific statistics and performance ratings have been developed and used for model evaluation (Calderon, 2009). A number of 'basic truths' are evident and are likely to be accepted by most modelers of natural systems (Donigian, 2002):
1. Models are only approximations of reality and cannot exactly represent natural systems.
2. No single statistic or test is sufficient to determine whether or not a model is validated.
3. Both graphical comparisons and statistical tests are required to evaluate model calibration and validation performance.
4. Models cannot be expected to be more accurate than the errors in the input and observed data.
A 'weight of evidence' approach is accepted and used to examine and assess model performance; for this purpose, multiple model comparisons, both graphical and statistical, are preferred (Donigian, 2002).
For this study, model performance during calibration and validation was evaluated through qualitative and quantitative measures involving both graphical comparisons and statistical tests. The calibration/validation process is hierarchical: it starts with developing parameters, continues with hydrology calibration/validation, and ends with water quality calibration/validation. Graphical comparisons include observed vs. simulated scatter plots with a 45° line and a linear regression line; statistical comparisons include error statistics (e.g., mean error and mean absolute error) and correlation tests. Among the standard regression statistics, Pearson's correlation coefficient (r) and the coefficient of determination (r²) were used. These coefficients describe the degree of collinearity between simulated and observed data. The correlation coefficient is given by the following equation, and r² is its square:
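$$r = \frac{\sum_{i=1}^{n}(O_i-\bar{O})(S_i-\bar{S})}{\sqrt{\sum_{i=1}^{n}(O_i-\bar{O})^2}\,\sqrt{\sum_{i=1}^{n}(S_i-\bar{S})^2}} \tag{6.10}$$
where $O_i$ and $S_i$ are the observed and simulated values, respectively, $\bar{O}$ and $\bar{S}$ are the means of the observed and simulated values, respectively, and $n$ is the number of records.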
For model performance, r ranges from −1 to 1, and values closer to 1 indicate better performance. r² ranges from 0 to 1, with higher values indicating less error variance and better performance; values above 0.5 are generally considered acceptable (Donigian, 2002; Calderon, 2009). A major drawback of r² considered alone is that it quantifies only dispersion (Krause et al., 2005): a model that systematically over- or under-predicts can still produce r² values close to 1.0 even if all predictions are wrong (Krause et al., 2005).
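The drawback of r² noted above is easy to demonstrate. The Python sketch below implements Eq. 6.10 and applies it to a hypothetical simulation that halves every observed value and adds a constant offset: every prediction is wrong, yet r and r² are both 1.

```python
import math


def pearson_r(obs, sim):
    """Pearson correlation coefficient (Eq. 6.10)."""
    n = len(obs)
    mean_o, mean_s = sum(obs) / n, sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    ss_o = sum((o - mean_o) ** 2 for o in obs)
    ss_s = sum((s - mean_s) ** 2 for s in sim)
    return cov / math.sqrt(ss_o * ss_s)


# Hypothetical observed flows and a systematically distorted "simulation".
obs = [120.0, 340.0, 560.0, 900.0, 410.0]
biased = [0.5 * o + 300.0 for o in obs]
r = pearson_r(obs, biased)
print(round(r, 6), round(r ** 2, 6))  # 1.0 1.0
```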
Another model evaluation criterion is the Nash-Sutcliffe efficiency (NSE) coefficient. Proposed in 1970, NSE is defined as one minus the sum of the squared differences between the observed and simulated values, normalized by the variance of the observed values during the period under investigation (Krause et al., 2005). It is calculated as:
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}(O_i - S_i)^2}{\sum_{i=1}^{n}(O_i - \bar{O})^2} \tag{6.11}$$
where $O_i$ and $S_i$ are the observed and simulated values, respectively, and $\bar{O}$ is the mean of the observed values.
NSE ranges from 1 (perfect fit) to −∞. An efficiency lower than zero indicates that the mean of the observed time series would have been a better predictor than the model. The largest disadvantage of the Nash-Sutcliffe efficiency is that the differences between observed and simulated values are squared. As a result, errors at large values in a time series are strongly overweighted, whereas errors at low values are largely neglected (Krause et al., 2005).
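A short Python sketch of Eq. 6.11 shows the two reference points of the NSE scale and how it penalizes the systematic bias that r² misses. The flow values are hypothetical.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency (Eq. 6.11)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance_term = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / variance_term


# Hypothetical observed flows.
obs = [120.0, 340.0, 560.0, 900.0, 410.0]
mean_obs = sum(obs) / len(obs)
print(nse(obs, obs))                     # 1.0: perfect fit
print(nse(obs, [mean_obs] * len(obs)))   # 0.0: no better than the observed mean
biased = [0.5 * o + 300.0 for o in obs]  # r = 1, but systematically distorted
print(round(nse(obs, biased), 3))        # 0.683
```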
Root Mean Square Error (RMSE), Normalized Root Mean Square Error (NRMSE), and Mean Absolute Error (MAE) are other statistical indices that can be used to evaluate model performance. They are given by the following equations:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(O_i - S_i)^2} \tag{6.12}$$
$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{O_{\max} - O_{\min}} \tag{6.13}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert S_i - O_i\rvert = \frac{1}{n}\sum_{i=1}^{n}\lvert e_i\rvert \tag{6.14}$$
where $O_i$ and $S_i$ are the observed and simulated values, $e_i = S_i - O_i$ is the error of record $i$, $n$ is the number of records, and $O_{\max}$ and $O_{\min}$ are the maximum and minimum observed values. RMSE and MAE measure the aggregate difference between simulated and observed values; values close to zero indicate better performance.
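Equations 6.12 to 6.14 translate directly into Python; the observed and simulated values below are hypothetical.

```python
import math


def rmse(obs, sim):
    """Root mean square error (Eq. 6.12)."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nrmse(obs, sim):
    """RMSE normalized by the observed range (Eq. 6.13)."""
    return rmse(obs, sim) / (max(obs) - min(obs))


def mae(obs, sim):
    """Mean absolute error (Eq. 6.14)."""
    return sum(abs(s - o) for o, s in zip(obs, sim)) / len(obs)


# Hypothetical observed and simulated concentrations.
obs = [2.0, 4.0, 6.0]
sim = [3.0, 4.0, 4.0]
print(mae(obs, sim))              # 1.0
print(round(rmse(obs, sim), 3))   # 1.291
print(round(nrmse(obs, sim), 3))  # 0.323
```

Because RMSE squares the errors, the single 2.0 miss pushes RMSE above MAE, which mirrors the sensitivity to large errors discussed for NSE.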
Percent Mean Error (PME) is a general calibration/validation measure that has been provided to HSPF model users for evaluating model performance (Donigian, 2000). The values in Table 6.3 provide general guidance, in terms of the percent mean errors or differences between simulated and observed values, so that users can judge the level of agreement or accuracy (i.e., very good, good, or fair) that might be expected from a model application (Donigian, 2000). Table 6.3 lists these PME targets for the different modeled processes.
Table 6.3. General Calibration/Validation Targets or Tolerances for HSPF Applications (Donigian, 2000)
                            % Difference Between Simulated and Recorded Values
Process                     Very Good    Good       Fair
Hydrology/Flow              <10          10-15      15-25
Sediment                    <20          20-30      30-45
Water Temperature           <7           8-12       13-18
Water Quality/Nutrients     <15          15-25      25-35
Pesticides/Toxics           <20          20-30      30-40
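The text does not give an explicit PME formula. Assuming PME is the absolute difference between the simulated and observed means expressed as a percentage of the observed mean, the Python sketch below computes it and rates it against Table 6.3. The band boundaries are treated as contiguous (so the small gaps in the water temperature row are closed), and the helper names are hypothetical.

```python
# Upper bounds (%) of the "very good", "good", and "fair" bands in Table 6.3.
TARGETS = {
    "hydrology": (10, 15, 25),
    "sediment": (20, 30, 45),
    "water temperature": (7, 12, 18),
    "nutrients": (15, 25, 35),
    "pesticides/toxics": (20, 30, 40),
}


def percent_mean_error(obs_mean, sim_mean):
    """Assumed PME definition: |simulated mean - observed mean| as a percentage of the observed mean."""
    return abs(sim_mean - obs_mean) / obs_mean * 100.0


def rate(pme, process):
    very_good, good, fair = TARGETS[process]
    if pme < very_good:
        return "very good"
    if pme <= good:
        return "good"
    if pme <= fair:
        return "fair"
    return "outside the Table 6.3 targets"


pme = percent_mean_error(470.13, 335.30)  # hydrology calibration means (Table 6.5)
print(round(pme, 1), rate(pme, "hydrology"))  # 28.7 outside the Table 6.3 targets
```

Under this assumed definition the calibration means give 28.7%, close to the 28.6% reported in Table 6.5, so the reported value may have been computed from unrounded means or a slightly different formula.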
6.4 HSPF Simulation Results.
For the Upper Chicago River subbasin, simulation results were evaluated at the North Branch Chicago River at Grand Ave, Chicago, which was chosen to represent the outlet of the subbasin. Two factors limited the time period available for calibration and validation of the model. First, observed flow data were available only for 2002 to 2010 (with some missing data in 2003-2004), while the available meteorological data end in 2006; therefore, only the period 2002 to 2006 could be used for flow simulation, calibration, and validation.
Second, the land use data applied were for the year 2005, so a simulation period around that year would give more realistic results with respect to land use. Thus, the calibration and validation period for flow was restricted to 2002-2005. For water quality, a somewhat longer simulation period was considered, since observed nutrient data were available for 1970-2010 and all required meteorological data were available for 1995-2006, as shown in Table 6.1; however, a period close to the flow simulation period, 2000-2005, was chosen for water quality calibration and validation. USGS flow data at station 05536118 and MWRDGC nutrient data at station WW 46, both located on the North Branch Chicago River at Grand Ave in Chicago, were downloaded as observed data. Figure (6.3) shows the GenScn window in which calibration and validation were performed.
[Figure: GenScn main window listing locations, scenarios, constituents (e.g., FLOW), time series, and analysis dates]
Figure 6.3. GenScn window in which model performance was evaluated.
6.4.1 Hydrology Calibration and Validation. The streamflow simulation was carried out using meteorological data from the Chicago O'Hare Airport station for the period 1 October 2002 to 31 May 2003, together with the detailed Level III 2005 land use data obtained from the Chicago Metropolitan Agency for Planning, as described above. To calibrate flow and measure its sensitivity to impervious land segments, the equations proposed by Alley, Laenen, and Sutherland (Alley et al., 1983; Laenen, 1983; Sutherland, 2000) were used to compute the percentages of pervious and impervious land segments.
The computed land uses were then used in an iterative flow simulation procedure. Because data and guidelines for the model input parameters of pervious and impervious land segments were limited, the BASINS default input parameters were used at first. The parameters were lower zone nominal storage (LZSN), upper zone nominal storage (UZSN), the soil infiltration capacity index (INFILT), the lower zone evapotranspiration parameter (LZETP), the interflow inflow parameter (INTFW), and the interflow recession coefficient (IRC). LZSN, UZSN, and INFILT affect the total annual flow volume, and adjusting them alters the total annual simulated flow volume. LZETP, INTFW, and IRC affect the base flow conditions of the river, the hydrograph shape, and peak flow conditions. All of these were calibration parameters that could be estimated and adjusted during model calibration.
The module parameters were repeatedly adjusted, the model was rerun, and simulated and observed values were compared until reasonable correlation and determination coefficients were obtained. WinHSPF's Input Data Editor tool was used to adjust these parameters manually. The model was run and calibrated with each of the proposed equations for computing effective impervious area (EIA), and the results were compared with observed values to choose which equation to adopt. Table 6.4 shows each EIA equation used and the correlation and determination coefficients associated with it.
The Laenen equation (Equation 6.2, entry 2 in Table 6.4) showed the best performance (r = 0.714, r² = 0.51) and was adopted for calculating the effective impervious area and determining the percentages of pervious and impervious land segments in the watershed.
The calibration period selected for hydrology calibration was October 2002 to May 2003. Figures 6.4 to 6.6 show the results of the hydrology calibration and graphical comparisons between observed and simulated values.

Table 6.4. Calibration/sensitivity analysis of the EIA equations for the study area
EIA equation                                                         r       r²
1. Alley et al., 1983 (EIA = 0.15 × TIA^1.41)                        0.670   0.45
2. Laenen, 1983 (EIA = 3.6 + 0.43 × TIA)                             0.714   0.51
3. Sutherland, 2000, highly connected basins, no infiltration
   measures, roofs connected (EIA = 0.4 × TIA^1.2)                   0.670   0.45
4. Sutherland, 2000, totally connected basins, no infiltration
   measures, roofs connected (EIA = TIA)                             0.640   0.41
[Plot: simulated flow (NB_CMAP_RCH10) and observed flow (USGS 05536118), October 2002 to May 2003]
Figure 6.4. Simulation of flow for calibration period
[Plot: flow duration curves (flow vs. percent chance flow exceeded) for simulated (NB_CMAP_RCH10) and observed (USGS 05536118) flow]
Figure 6.5. Duration curve for calibration period

[Plot: observed (USGS 05536118) vs. simulated (NB_CMAP_RCH10) flow; regression line Y = 0.949 X + 152.615, correlation coefficient = 0.714]
Figure 6.6. Observed vs. simulated flow scatter plot for calibration period (red scatter points and line represent the simulated data)
Figure 6.4 shows that the simulated flow is slightly lower than the observed flow; it closely follows the flow pattern in the high-flow season but not the low-flow pattern. The duration curves of simulated and observed flow (Figure 6.5) reveal the same behavior: there is a slight and almost constant difference between simulated and observed flow, and the two curves follow the same pattern for about 95 percent of flows. These results suggest that the proposed percentages of pervious and impervious areas reflect the pattern of flow in the watershed, but not its exact magnitude.
Figure 6.6 presents a graphical comparison of observed and simulated flow as a scatter plot with a 45° line and a linear regression line. With a correlation coefficient of 0.714, the plot shows that the two data sets are reasonably well matched.
Table 6.5. Statistical results of hydrology calibration
Observed mean flow   Simulated mean flow   PME (%)   r       r²     NSE     RMSE   NRMSE
470.13               335.30                28.6      0.714   0.51   -0.16   76     0.04
As shown in Table 6.5, the model calibration is acceptable based on the statistical indicators and the acceptable ranges published in the literature for hydrologic simulation. The correlation (r) and determination (r²) coefficients showed acceptable values and acceptable model performance. The percent mean error (PME) of 28.6% is slightly above the 25% upper limit of the 'fair' range for hydrology in Table 6.3. Taking all the criteria together, the overall model performance can be considered acceptable.
The hydrology validation period chosen was October 2004 to April 2005. Figures 6.7, 6.8, and 6.9 below show the results of the hydrology validation and graphical comparisons between observed and simulated values. Table 6.6 shows the statistical results of the hydrology validation.
Table 6.6. Statistical results of hydrology validation
Observed mean flow   Simulated mean flow   PME (%)   r       r²     NSE     RMSE    NRMSE
484.18               358.33                25.9      0.608   0.37   -0.10   38.35   0.06
Results from the hydrologic validation show that some of the statistical indicators are fair, based on the graphical comparisons and the guidelines given by Donigian (Donigian et al., 2000). The model performed better in the validation period than in the calibration period (lower PME and RMSE and a slightly higher NSE), except for r = 0.608 and r² = 0.37, the latter falling below the commonly cited acceptability threshold of 0.5. Taking these criteria together, the overall model performance will be considered acceptable.

[Plot: simulated flow (NB_CMAP_RCH10) and observed flow (USGS 05536118), October 2004 to April 2005]
Figure 6.7. Simulation of flow for validation period
[Plot: flow duration curves (flow vs. percent chance flow exceeded) for simulated (NB_CMAP_RCH10) and observed (USGS 05536118) flow]
Figure 6.8. Duration curve for validation period

[Plot: observed (USGS 05536118) vs. simulated (NB_CMAP_RCH10) flow; regression line Y = 0.752 X + 213.494, correlation coefficient = 0.608]
Figure 6.9. Observed vs. simulated flow scatter plot for validation period (red scatter points and line represent the simulated data)
6.4.2 Water quality calibration and validation. Calibration and validation in HSPF follow a hierarchical methodology that begins with hydrology and ends with water quality constituents (Donigian, 2000; Calderon, 2009). After flow calibration, nutrient constituents were added to the list of parameters to be modeled in WinHSPF's Pollution Selection window. For this study, the nutrient constituents simulated were total nitrate (NO3+NO2) as N, total ammonia (NH4+NH3) as N, and orthophosphate (PO4). HSPF uses the PQUAL and IQUAL modules to simulate each nutrient constituent individually. Total nitrogen and phosphorus loads were calculated afterwards using scripts provided by HSPF. Various nutrient modeling parameters were specified for both pervious and impervious land segments, including the constituent washoff factor, the monthly constituent accumulation factor, and the initial storage of each constituent. These were calibration parameters, adjusted until reasonable model behavior was reached.
The nutrient simulation results were examined and compared with the observed values. In the initial simulation trials, ammonia and nitrate were consistently overpredicted, mostly during the wet season, while orthophosphate was overpredicted throughout the year. The calibration parameters adjusted included the monthly accumulation factors and the monthly limiting storage values of each constituent for both pervious and impervious land segments. These adjustments continued until reasonable model performance was obtained. In-stream process parameters were also adjusted: the nitrification and denitrification parameters, namely the nitrate denitrification rate (KNO320) and the ammonia oxidation rate (KTAM20), as well as the algal growth rate parameters.
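The in-stream rates named above (KTAM20, KNO320) are specified at a 20 °C reference temperature. A common way such reference rates are adjusted to water temperature is the Arrhenius-type correction k_T = k_20 * theta^(T - 20), combined with first-order loss of the constituent. The sketch below shows only that generic form; the theta value, rate values, and function names are hypothetical, and HSPF's own numerical treatment of these reactions may differ.

```python
import math


def rate_at_temperature(k20, theta, temp_c):
    """Adjust a first-order rate given at 20 deg C to water temperature temp_c."""
    return k20 * theta ** (temp_c - 20.0)


def first_order_remaining(c0, k_per_day, days):
    """Concentration left after first-order loss: C = C0 * exp(-k * t)."""
    return c0 * math.exp(-k_per_day * days)


# Hypothetical example: oxidation rate of 0.5 /day at 20 deg C, theta = 1.07
k_summer = rate_at_temperature(0.5, 1.07, 25.0)
k_winter = rate_at_temperature(0.5, 1.07, 5.0)
nh3_after_one_day_summer = first_order_remaining(2.0, k_summer, 1.0)
```

Because the correction is exponential in temperature, the same 20 °C rate produces noticeably faster transformation in summer than in winter, which is one reason seasonal over- or underprediction can respond to these parameters.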
Figures 6.10, 6.11, and 6.12 show the calibration results for total nitrates, total ammonia, and orthophosphate, respectively. Table 6.7 summarizes the calibration statistics for the simulated nutrients.
[Plot: observed vs. simulated NO3 at RCH10, September 2002 to September 2004]
Figure 6.10. Simulation of total nitrates for calibration period

[Plot: observed vs. simulated NH4 at RCH10, September 2002 to September 2004]
Figure 6.11. Simulation of total ammonium for calibration period
[Plot: observed vs. simulated PO4 at RCH10, 2000 to 2003]
Figure 6.12. Simulation of orthophosphate for calibration period

Table 6.7. Statistical results of water quality calibration
Constituent   Mean observed value   Mean simulated value   ME      PME    MAE    RMSE   NSE
Nitrate-N     5.81                  5.50                   0.228   3.93   1.74   2.21    0.13
Ammonia-N     2.29                  2.09                   0.048   2.11   0.96   1.23   -0.35
OrthoP        0.91                  0.96                   0.087   9.52   0.71   0.99   -0.18
The calibration results show acceptable agreement between the observed and simulated data. Based on the percent mean error (PME) between the simulated and observed values, model performance was very good for all constituents according to the tolerances suggested by Donigian (Table 6.3). The other statistics could be considered acceptable, although the NSE values for ammonia-N and orthophosphate are below zero.
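For readers reproducing statistics like those in Table 6.7, the sketch below computes commonly used goodness-of-fit measures for paired observed and simulated series. The exact definitions used in this study are those of Table 6.3, which is not reproduced here, so the formulas below (in particular the sign of ME and the form of PME) are assumptions; the function name is hypothetical.

```python
import math


def fit_statistics(observed, simulated):
    """Goodness-of-fit statistics for paired observed/simulated values.

    Assumed definitions (may differ from Table 6.3):
      ME   = mean(obs - sim)
      PME  = 100 * |ME| / mean(obs)
      MAE  = mean(|obs - sim|)
      RMSE = sqrt(mean((obs - sim)^2))
      NSE  = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)
    """
    if not observed or len(observed) != len(simulated):
        raise ValueError("need two non-empty series of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    errors = [o - s for o, s in zip(observed, simulated)]
    me = sum(errors) / n
    sse = sum(e * e for e in errors)                      # sum of squared errors
    sst = sum((o - mean_obs) ** 2 for o in observed)      # spread of observations
    return {
        "ME": me,
        "PME": 100.0 * abs(me) / mean_obs,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sse / n),
        "NSE": 1.0 - sse / sst,
    }
```

An NSE of 0 means the simulation predicts individual observations no better than the observed mean does, and a negative NSE means it predicts them worse; PME, by contrast, only compares averages, which is why the two can disagree for the same constituent.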
The validation was conducted with water quality data for November 2004 to December 2005 for total nitrates and total ammonium, and for January 2004 to December 2005 for PO4. The purpose of validation is to confirm that the calibrated model, with its adjusted parameters, adequately represents the watershed conditions that affect the model results. Once the model was calibrated and its parameters optimized, it was run for the specified validation period and the results were analyzed statistically. Figures 6.13, 6.14, and 6.15 show the validation results for total nitrates, total ammonia, and orthophosphate, respectively.
[Plot: observed vs. simulated NO3 at RCH10, November 2004 to December 2005]
Figure 6.13. Simulation of total nitrates for validation period
[Plot: observed vs. simulated NH4 at RCH10, November 2004 to December 2005]
Figure 6.14. Simulation of total ammonium for validation period
[Plot: observed vs. simulated PO4 at RCH10, January 2004 to December 2005]
Figure 6.15. Simulation of orthophosphate for validation period
Table 6.8. Statistical results of water quality validation
Constituent   Mean observed value   Mean simulated value   ME      PME     MAE     RMSE   NSE
Nitrate-N     5.25                  5.01                   0.399   7.61    1.86    2.16   -0.64
Ammonia-N     2.62                  2.16                   0.501   19.14   1.488   1.98   -3.35
OrthoP        1.02                  0.88                   0.179   17.54   0.625   0.76   -0.36
Based on the PME values and the performance criteria suggested by Donigian (Donigian et al., 2000; Table 6.3), model performance during the validation period is very good for total nitrates and good for total ammonium and orthophosphate.
6.4.3 Comparing Data Driven and Physical Models. Both data-driven and physical models were developed for the proposed Chicago River Watershed framework. Table 6.9 compares the performance of the two approaches. The data-driven models performed better: relative to the physical model, they reduced RMSE by up to 10.7% (from 2.160 for the physical model to 1.9279 for the decision tree). Although data-driven approaches to modeling complex physical systems are receiving increasing interest as data become more widely available, it is not easy to link a data-driven technique precisely to the most important physical variables that govern the natural processes of the watershed system (Preis et al., 2008). The physical model, in contrast, is built on these variables, which makes it useful for analyzing scenarios the watershed may face, such as climate change, population change, or the inclusion or removal of certain physical variables; it thus provides a planning tool that regulatory environmental agencies in the Chicago River Watershed can use to develop better management programs. In addition, as discussed in section 5.5.1, the data-driven models showed weaker predictive performance for high total nitrate values. On the other hand, the data-driven models require fewer inputs and can be deployed anywhere in the watershed, whereas the physical model requires extensive data inputs and can be applied only at the specific watershed outlets selected in the simulation. These considerations suggest that both physical and data-driven models are essential to the proposed framework. The physical model can serve as a planning tool whenever a significant physical change takes place in the watershed, while the data-driven models can serve as an operating tool for periodically checking the watershed's water quality parameters, especially if TMDL and WQS are established for the watershed.
Table 6.9. Comparing physical and data-driven models
        Physical Model   ANN      Gaussian Process   Decision Tree
RMSE    2.160            1.9469   1.9368             1.9279
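The "up to 10.7%" figure can be checked by expressing each data-driven RMSE as a percent reduction relative to the physical model's RMSE. The sketch below does this with the Table 6.9 values; the dictionary and function names are illustrative. The physical-model value of 2.160 coincides with the nitrate-N validation RMSE in Table 6.8, which suggests the comparison refers to total nitrates.

```python
# RMSE values transcribed from Table 6.9
RMSE = {
    "Physical model": 2.160,
    "ANN": 1.9469,
    "Gaussian process": 1.9368,
    "Decision tree": 1.9279,
}


def percent_rmse_reduction(baseline, candidate):
    """Percent by which the candidate RMSE is lower than the baseline RMSE."""
    return 100.0 * (baseline - candidate) / baseline


reductions = {
    name: percent_rmse_reduction(RMSE["Physical model"], value)
    for name, value in RMSE.items()
    if name != "Physical model"
}
for name, pct in sorted(reductions.items(), key=lambda kv: kv[1]):
    print(f"{name}: {pct:.1f}% lower RMSE than the physical model")
```

This gives roughly 9.9% for ANN, 10.3% for the Gaussian process, and 10.7% for the decision tree, so the three data-driven models differ from one another far less than they differ from the physical model.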
6.5 Total annual loads of nutrients
HSPF, specifically its PQUAL and IQUAL modules, was used to estimate annual loadings of total nitrogen and total phosphorus from forty-four different land use types in the Upper Chicago River Basin. The total annual loads from the Upper North Chicago River subbasin were computed from the results of the calibrated and validated water quality model.
Average nutrient loads from selected individual land use segments for 2000 to 2005 are listed in Tables 6.10 and 6.11 for total nitrogen and total phosphorus, respectively. The average total nitrogen and total phosphorus loadings for all land use types, along with the pervious and impervious nutrient yield values for the watershed, are given in Appendix B. Figure 6.16 shows the total nitrogen and total phosphorus loads from point and non-point sources, and Figures 6.17, 6.18, and 6.19 show the percentages of land use area, total nitrogen, and total phosphorus associated with each land use type.
The simulation results show that from 2000 to 2005 the residential single family land use segment produced the highest total nitrogen and total phosphorus loads in the Upper Chicago River subbasin. This is expected, since residential single family is the dominant land use type in the basin. During this study, no information was found relating detailed (Level III) land use to total nitrogen and total phosphorus loads in the Chicago River watershed or in any similar highly urbanized watershed. It was therefore difficult to determine how well the simulated loads match the actual loads. However, based on the nutrient calibration and validation results presented in section 6.4.2, it can reasonably be assumed that the model produced acceptable estimates of total nitrogen and total phosphorus loads at this detailed land use level, for which no comparable estimates were previously available.

Table 6.10. Simulated annual loads of total nitrogen
Land Use Type                Combined EC      Area      Total Annual   % Loads
                             (lbs/acre/yr)    (acres)   Loads (lbs)
Residential Single Family    2.8288           61776     174743         46.28
Residential Multi Family     3.2094           9595      30794          8.26
Urban Mix W/ Parking Lot     4.4022           5924      26077          7.10
Industrial W/ Parking Lot    4.397            5403      23755          6.48
Education                    3.2098           3722      11946          3.22
Interstate/ Toll             4.4022           2315      10193          2.78
Open Space Cons              0.8778           11554     10140          2.42
Lake/ Reservoirs/ Lagoon     4.9788           1470      7318           2.02
Business W/ Parking Lot      4.4022           1603      7056           1.92
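Each Total Annual Loads entry in Table 6.10 is consistent with the combined export coefficient multiplied by the land use area (load = EC x A), to within rounding of the coefficients. The sketch below checks this for the rows shown; the names are illustrative. It reproduces the loads only, since the basis used to normalize the % Loads column is not stated in this section.

```python
# Rows transcribed from Table 6.10:
# land use -> (combined EC in lb/acre/yr, area in acres, reported load in lb)
TABLE_6_10 = {
    "Residential Single Family": (2.8288, 61776, 174743),
    "Residential Multi Family": (3.2094, 9595, 30794),
    "Urban Mix W/ Parking Lot": (4.4022, 5924, 26077),
    "Industrial W/ Parking Lot": (4.397, 5403, 23755),
    "Education": (3.2098, 3722, 11946),
    "Interstate/ Toll": (4.4022, 2315, 10193),
    "Open Space Cons": (0.8778, 11554, 10140),
    "Lake/ Reservoirs/ Lagoon": (4.9788, 1470, 7318),
    "Business W/ Parking Lot": (4.4022, 1603, 7056),
}


def annual_load(ec_lb_per_acre_yr, area_acres):
    """Annual load (lb/yr) implied by an export coefficient and an area."""
    return ec_lb_per_acre_yr * area_acres


for land_use, (ec, area, reported) in TABLE_6_10.items():
    estimate = annual_load(ec, area)
    rel_diff = abs(estimate - reported) / reported
    print(f"{land_use:28s} {estimate:10.0f} {reported:8d}  ({rel_diff:.2%})")
```

All rows agree to within a few hundredths of a percent, which is what rounding the export coefficients to four or five significant figures would produce.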
Table 6.11. Simulated annual loads of total phosphorus
Land Use Type                Combined EC      Area      Total Annual   % Loads
                             (lbs/acre/yr)    (acres)   Loads (lbs)
Residential Single Family    0.1876           61776     11583          47.66
Residential Multi Family     0.1964           9595      1885           8.00
Urban Mix W/ Parking Lot     0.2244           5924      1330           6.14
Open Space Cons              0.1496           11554     1731           5.80
Industrial W/ Parking Lot    0.2244           5403      1212           5.60
Education                    0.1964           3722      731            3.12
Interstate/ Toll             0.2244           2315      520            2.40
Golf Course                  0.0578           9455      549            1.82
Business W/ Parking Lot      0.2244           1603      360            1.64
[Chart: annual point-source (PS) and non-point-source (NPS) loads of nitrogen (N) and phosphorus (P), 2001 to 2004, in lb]
Figure 6.16. Point and Non-Point Nutrients' Loadings (lb)

[Chart: percentage of basin area by land use type]
Figure 6.17. Land Use Area in Upper Chicago River Basin

[Chart: percentage of total nitrogen load by land use type]
Figure 6.18. Total Nitrogen loads in Upper Chicago River Basin

[Chart: percentage of total phosphorus load by land use type; largest share Residential Single Family, 47.66%]
Figure 6.19. Phosphorus loads in Upper Chicago River Basin

6.6 Detailed Land Use Export Coefficients
Export coefficients are generally used to calculate runoff pollutant loads for different land use types. The pollutants for which export coefficients are most commonly generated are total nitrogen (TN) and total phosphorus (TP) (Lin, 2004). The export coefficients presented in this section are the first attempt to measure and model nutrients using detailed land use types for the Chicago River Watershed, or for any similar highly urbanized watershed, using a continuous simulation approach and watershed-perspective analysis. Previous studies estimated ranges of export coefficients, but only for a limited number of land uses (Lin, 2004; Line et al., 2002; McFarland et al., 2001; Smullen et al., 1999; Baldys et al., 1998; Frink, 1991; Loehr et al., 1989; Clesceri et al., 1986; Driver et al., 1985; Rast et al., 1983; Beaulac et al., 1982; Reckhow et al., 1980). For highly urbanized areas, storm event mean concentrations (EMCs) are generally used to calculate runoff pollutant loads for urban land use types (Smullen et al., 1999; Brezonik et al., 2001).
Several water quality models used to estimate non-point source pollution in watersheds require as input either export coefficients (typically for rural areas) or event mean concentrations (typically for urban areas); an EMC represents the concentration of a specific pollutant in stormwater runoff from a particular land use type within a watershed (Lin, 2004). Export coefficients represent the average total amount of a pollutant loaded annually into a system from a defined area and are reported as mass of pollutant per unit area per year (e.g., lb/ac/yr), whereas EMCs are reported as mass of pollutant per unit volume of water (usually mg/L) (Lin, 2004). These values are generally calculated from local stormwater monitoring data. Because collecting the data needed to calculate site-specific EMCs or export coefficients can be cost-prohibitive, researchers and regulators often use values already available in the literature (Lin, 2004).
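The two kinds of values are linked through runoff volume: an EMC (mg/L) multiplied by the runoff volume from an area gives a pollutant mass, and dividing an annual mass by the area gives an export coefficient (lb/ac/yr). The sketch below shows that conversion; the function names and the EMC and runoff depth in the example are hypothetical, not values from this study.

```python
# Unit conversions (exact definitions of the acre, inch and pound)
SQ_M_PER_ACRE = 4046.8564224
M_PER_INCH = 0.0254
MG_PER_LB = 453_592.37
LITERS_PER_ACRE_INCH = SQ_M_PER_ACRE * M_PER_INCH * 1000.0   # ~102,790 L


def load_from_emc(emc_mg_per_l, runoff_in, area_acres):
    """Runoff load (lb) = EMC x runoff volume for a runoff depth over an area."""
    volume_l = runoff_in * area_acres * LITERS_PER_ACRE_INCH
    return emc_mg_per_l * volume_l / MG_PER_LB


def export_coefficient_from_emc(emc_mg_per_l, annual_runoff_in):
    """Equivalent export coefficient (lb/acre/yr) for a given annual runoff depth."""
    return load_from_emc(emc_mg_per_l, annual_runoff_in, 1.0)


# Hypothetical example: TN EMC of 2.0 mg/L with 10 inches of annual runoff
ec_tn = export_coefficient_from_emc(2.0, 10.0)
```

One mg/L carried by one inch of runoff over one acre is about 0.227 lb, so the hypothetical example corresponds to roughly 4.5 lb/ac/yr. The conversion also shows why EMC-based and export-coefficient-based estimates diverge when annual runoff depth differs between sites.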
Export coefficients are very useful indicators for predicting the likely yield of nutrients reaching receiving water bodies. They integrate many site-specific conditions and variables at the watershed level, including hydrometeorological data, topographic data, land use management practices, and physical characteristics (Lin, 2004; McFarland et al., 2001; Calderon, 2009). If site-specific values are not available, regional or national averages can be used, although their accuracy is questionable, because the specific meteorological and physiographic characteristics of individual watersheds, and agricultural and urban land uses, can exhibit a wide range of variability in nutrient export (Beaulac et al., 1982; Lin, 2004).
Figures 6.20 and 6.21 show the export coefficients obtained for total nitrogen and total phosphorus, respectively. Detailed export coefficient values are presented in Appendix B.
Figure 6.20. Average Export Coefficients (EC) for different land use types for TN
[Chart: average TP export coefficient (EC) by land use type]
Figure 6.21. Average Export Coefficients (EC) for different land use types for TP
6.7 Conclusion
A water quality model based on hydrologic simulation was developed for the Chicago River Watershed. The model forms the basis for determining the effects of detailed land use on water quality in the area. Moreover, because it is based on state-of-the-art modeling procedures, the watershed simulation methodology presented here can support local and federal agencies in developing TMDLs for the watershed. HSPF, the selected water quality model, is designed to support watershed-based analysis and TMDL development, and it can be applied successfully to a highly urbanized watershed with appropriate consideration given to EIA. The five-year water quality simulation yielded nutrient loadings from both point and non-point sources, and export coefficients were developed for forty-four different land uses. Export coefficients can be used as input to a multi-objective optimization approach to resolve land use conflicts. The physical model also supports the analysis of different scenarios in the watershed and allows the behavior of the watershed to be evaluated under possible future conditions, thus providing a planning tool for regulatory environmental agencies. The data-driven models developed in Chapter 5 can be used as an operating tool to maintain water quality parameters, especially if TMDL and WQS are developed for the Chicago River Watershed.

CHAPTER 7
CONCLUSIONS
7.1 Summary
This research is an attempt to propose a holistic framework in which a watershed perspective and historical data records are used as tools to investigate land use effects on water quality in a highly urbanized watershed, the Chicago River Watershed. The research recognizes the importance of a thorough understanding of the spatial and temporal aspects of different attributes of water resources, especially quantity and quality, and of how they are interlinked. Finding comprehensive ways to relate and assess those attributes is key to sound and successful watershed management. This thesis makes a unique contribution toward achieving sufficient integration among watershed elements such as water quality, quantity, climate, and land use, and among watershed problems, conflicts, needs, and targets, while improving domain knowledge and decision-making ability at the same time.
The thesis introduced an approach to integrating watershed data in a single repository and presented methodologies for analyzing and assessing the watershed using Data Warehouse (DW) and Data Mining (DM) technologies. The DW makes it easy to access, retrieve, fill gaps in, analyze, and manage records of water quantity and quality, climate, land use, etc. from different source agencies such as USGS, MWRDGC, NWS, and CMAP, and it facilitates data interaction and decision making.
Current data storage systems are managed by independent and disparate sources, which creates obstacles to synthesizing data from these sources into a single analysis. Although systems have been developed to fill that gap, such as EPA's older STORET system or the more recent enhanced observatory system HIS introduced by CUAHSI, they have proved deficient in their ability to integrate and process different monitoring data to generate actionable information that facilitates assessing and understanding the watershed.
This research identified the need for a DW built around watershed needs to improve various watershed processes, including complex querying of watershed data and discovery of trends and patterns, by incorporating 40 years' worth of watershed data from different source agencies into a central repository. By storing and maintaining watershed data in multidimensional format, the WDW supports the decision-support queries that users typically need, which involve analytics such as aggregation, drill-down, and slicing/dicing of data.
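As a rough illustration of the aggregation and slicing that a multidimensional store supports, the sketch below builds a tiny in-memory star schema with Python's built-in sqlite3 module and rolls the same facts up along two date attributes. The table and column names loosely follow the Appendix A dimension and fact tables, but the schema is heavily simplified, the function name is hypothetical, and all data values are made up; it is not the Oracle WDW itself.

```python
import sqlite3

# Tiny in-memory star schema (simplified; values are made up for illustration)
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, year INTEGER, season TEXT);
CREATE TABLE measurement_details_dim (
    measurement_details_key INTEGER PRIMARY KEY, measurement_name TEXT);
CREATE TABLE water_quality_fact (
    date_key INTEGER REFERENCES date_dim(date_key),
    measurement_details_key INTEGER
        REFERENCES measurement_details_dim(measurement_details_key),
    location_key INTEGER,
    reading_value REAL,
    PRIMARY KEY (date_key, measurement_details_key, location_key));
""")
cur.executemany("INSERT INTO date_dim VALUES (?, ?, ?)",
                [(1, 2004, "Summer"), (2, 2004, "Winter"), (3, 2005, "Summer")])
cur.executemany("INSERT INTO measurement_details_dim VALUES (?, ?)",
                [(10, "NO3-N"), (11, "NH3-N")])
cur.executemany("INSERT INTO water_quality_fact VALUES (?, ?, ?, ?)",
                [(1, 10, 100, 5.0), (2, 10, 100, 7.0), (3, 10, 100, 4.0),
                 (1, 11, 100, 2.0), (3, 11, 100, 3.0)])


def roll_up(cur, level, measurement=None):
    """Average readings per measurement at the given date level ('year' or 'season').

    Passing `measurement` slices the cube to a single measurement.
    """
    if level not in ("year", "season"):          # whitelist: level is spliced into SQL
        raise ValueError("level must be 'year' or 'season'")
    sql = f"""
        SELECT m.measurement_name, d.{level}, AVG(f.reading_value)
        FROM water_quality_fact AS f
        JOIN date_dim AS d ON f.date_key = d.date_key
        JOIN measurement_details_dim AS m
          ON f.measurement_details_key = m.measurement_details_key
        WHERE (? IS NULL OR m.measurement_name = ?)
        GROUP BY m.measurement_name, d.{level}
        ORDER BY m.measurement_name, d.{level}"""
    return cur.execute(sql, (measurement, measurement)).fetchall()


print(roll_up(cur, "year"))                 # aggregate every measurement by year
print(roll_up(cur, "season", "NO3-N"))      # slice to NO3-N, group by season
```

The design choice that matters is that facts hold only keys and readings, while descriptive attributes such as year, season, or measurement name live in dimensions; any new grouping or filter is then a different join and GROUP BY rather than a new table.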
To facilitate access to the WDW, a tailored graphical user interface (GUI) dashboard was built. Its distinctive feature is that it consists of two layers of information: a monitoring layer that conveys information visually, and an analysis layer that provides summarized dimensional data, hierarchies, and slicing and dicing of data through an ad hoc analysis tool.
The multi-dimensional watershed model presented in this study is the base for the
framework proposed to investigate land use effects on water quality in highly urbanized
watersheds. It provides readily integrated watershed data that offer a holistic view of the
watershed elements across heterogeneous data sources. The DW concept described
allows combining data from different sources, such as USGS, MWRDGC, CMAP, and
NWS in a single repository. Implementing multi-dimensional modeling using DW
techniques facilitates the integration and aggregation of information at all desired levels
concerning watershed monitored locations.

The web-based dashboard and reporting tools allow watershed stakeholders to focus their efforts on monitoring and understanding the watershed and on taking proactive management actions. The GUI illustrates how easily the DW dimensional concept can be mapped to a graphical user interface design to create a tool that facilitates the users' different intended tasks, whether a watershed assessment task or integrating data for a physical model application. The ad hoc analysis tools are further used to slice and dice data to find patterns or pinpoint problem areas, and to provide the details, views, or perspectives that enable users to understand a problem and identify the steps they must take to address it. This makes analyzing and assessing a watershed more efficient than using traditional databases.
Although the model and methodology were implemented for a highly urbanized watershed, they are not restricted to such watersheds and can be used without modification for any watershed.
Moreover, this thesis introduced data-driven modeling for the Chicago River watershed using the WDW repository. Several regression and classification algorithms, including multiple linear regression, artificial neural networks, model trees, support vector machines, lazy learners, naive Bayes, logistic regression, and Gaussian processes, were presented and assessed for their suitability for predicting total nitrates from a few watershed attributes. The results show acceptable prediction accuracy and interpretability for a number of algorithms despite the limited amount of data used. The resulting models could be deployed in scenarios built around changes in any of the watershed elements, such as population, water quality regulations, land use, or climate, to predict future outcomes. Thus, insights from site-specific data mining results can be integrated with policy and decision-making tools to manage the watershed effectively and use its land optimally. In particular, the decision tree approach is worth investigating for prioritizing courses of action, for instance when addressing a particular water quality parameter.
The success of a data mining methodology relies heavily on the quality and quantity of the data used in the prediction process. Although this study used a sufficient amount of data with a logical set of predictors, more data and more watershed characteristics could be incorporated to improve the efficiency and performance of the predictive models. Although the ANN model generally showed better performance, further training of decision tree models would be more worthwhile, since they express their reasoning as rules that are understandable to humans. These rules can assist policy making in watershed management plans, whereas the other models do not provide such interpretable rules.
The data mining techniques presented in this study use selected watershed parameters as indicators to predict the water quality parameter in question, thereby simplifying the modeling procedure. By adopting advanced analytical techniques, this approach exploits data on the basic watershed elements and the relationships among them without explicitly modeling the physical behavior that links them.
Because 82% of the Chicago River watershed is urban land use, i.e., it is a highly urbanized area, examining the effect of land use on water quality requires a detailed level of land use classification. The export coefficients presented in this thesis are the first attempt to measure and model nutrients using detailed land use types with a continuous simulation approach and watershed-perspective analysis rather than a storm event methodology. Five years of water quality simulation using the multipurpose environmental analysis system BASINS, coupled with HSPF, a comprehensive, conceptual, continuous-simulation watershed-scale model, yielded export coefficients for detailed (Level III) land use in the Chicago River watershed. Export coefficients are very useful indicators for predicting the likely yield of nutrients reaching receiving water bodies. In this sense, the water quality simulation approach used in this research to generate the coefficients constitutes a new contribution for the Chicago River watershed and other highly urbanized watersheds.
The watershed simulation methodology presented can support local and federal agencies in developing TMDLs for the watershed, since it is based on state-of-the-art modeling procedures. HSPF, the selected water quality model, is designed to support watershed-based analysis and TMDL development, and it can be applied successfully to a highly urbanized watershed with appropriate consideration given to EIA. The five-year water quality simulation yielded nutrient loadings from both point and non-point sources, and export coefficients were developed for forty-four different land uses. Export coefficients can be used as input to a multi-objective optimization approach to resolve land use conflicts, as discussed in section 7.2.1. The physical model also supports the analysis of different scenarios in the watershed and allows the behavior of the watershed to be evaluated under possible future conditions, thus providing a planning tool for regulatory environmental agencies. The data-driven models developed in Chapter 5 can be used as an operating tool to maintain water quality parameters, especially if TMDL and WQS are developed for the Chicago River Watershed. With its integration, planning, and operating techniques and tools, the proposed framework can therefore be considered robust. In addition, an optimization tool is introduced in the future work section (Section 7.2).
7.2 Future Research Work
The framework presented in this study is not a solution to the watershed's problems but a collection of innovative tools that can help investigate and solve them. More sophisticated tools could be used to fulfill the goals of the framework. Although this research clearly advocates a holistic approach to watershed management that includes a watershed perspective and historical data records, it has some limitations regarding the tools used.
7.2.1 Multi-objective optimization approach for future work. Watershed-scale simulation models offer effective watershed management tools for estimating nutrient yields for a wide spectrum of problems involving surface waters (Arabi, 2005; Qi, 2006). In addition, advances in mathematical optimization techniques open new paths for exploring alternative scenarios in water resources management, which enhances the quality of decision making (Qi, 2006). Coupling watershed simulation results with optimization techniques will provide a better planning tool.
Multi-objective optimization is the task of finding one or more optimal solutions when more than one objective function is involved and different solutions may produce trade-offs (conflicting scenarios) among the objectives (Deb, 2001; Calderon, 2009). Pareto-optimal solutions form a set in which moving from any one solution to another improves at least one objective function and worsens at least one other; no solution in the set dominates another, which gives decision makers good, flexible options (Yee et al., 2003; Coello, 1999; Calderon, 2009).
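The dominance relation that defines a Pareto-optimal set can be made concrete with a brute-force filter. The sketch below assumes every objective is to be minimized and uses made-up scenario scores with hypothetical function names; practical multi-objective work on many land use scenarios would typically rely on search methods such as evolutionary algorithms rather than exhaustive pairwise comparison.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one.

    All objectives are assumed to be minimized (e.g. nutrient load, cost).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the non-dominated points, preserving input order (O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical scenarios scored as (relative TN load, relative cost)
scenarios = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 5.0)]
print(pareto_front(scenarios))   # [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
```

Here (3.0, 4.0) is dropped because (2.0, 3.0) is better in both objectives, while the three survivors each trade lower load for higher cost or vice versa, which is exactly the set of options handed to decision makers.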
The range of land use export coefficients obtained from long-term continuous simulation reflects the different watershed conditions and the different meteorological and physical variables included in the simulation, and hence provides a well-suited input for a multi-objective optimization approach that evaluates multiple scenarios to find the optimal land use change and distribution in a highly urbanized, developed watershed. Based on the different detailed land use types, scenarios that consider different combinations of pervious and impervious land use segments and the trade-offs between them (e.g., converting an impervious parking lot land use to a pervious one), along with environmental, social, and economic factors, can be investigated as part of a planning and decision-making tool. The multi-objective optimization approach will allow independent objectives to be optimized to find the best land use combination, while the high-priority goal is to meet specific water quality standards for total nitrogen (TN) and total phosphorus (TP) loadings.
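Once export coefficients are available for each land use, the nutrient side of a land use scenario reduces to simple arithmetic that an optimizer could evaluate many times. The sketch below, with hypothetical function names and a made-up 100-acre conversion, estimates the load change from converting land between two Level III types using the combined coefficients in Tables 6.10 and 6.11. It assumes the converted land takes on the basin-average coefficient of its new use and ignores changes in runoff routing or in-stream processing, so it is only a first screening estimate.

```python
# Combined export coefficients (lb/acre/yr) transcribed from Tables 6.10 and 6.11
EC_TN = {"Industrial W/ Parking Lot": 4.397, "Open Space Cons": 0.8778,
         "Residential Single Family": 2.8288}
EC_TP = {"Industrial W/ Parking Lot": 0.2244, "Open Space Cons": 0.1496,
         "Residential Single Family": 0.1876}


def conversion_load_change(ec, from_use, to_use, acres):
    """Change in annual load (lb/yr) if `acres` move from one land use to another.

    Negative values are reductions. Assumes the converted land takes on the
    basin-average combined EC of its new land use.
    """
    return acres * (ec[to_use] - ec[from_use])


# Hypothetical scenario: 100 acres of industrial land with parking -> conservation open space
d_tn = conversion_load_change(EC_TN, "Industrial W/ Parking Lot", "Open Space Cons", 100)
d_tp = conversion_load_change(EC_TP, "Industrial W/ Parking Lot", "Open Space Cons", 100)
print(f"TN change: {d_tn:.1f} lb/yr, TP change: {d_tp:.2f} lb/yr")
```

In this made-up case TN falls by about 352 lb/yr and TP by about 7.5 lb/yr; pairing such load changes with a cost or social objective for each candidate conversion produces the objective vectors a Pareto analysis would compare.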

APPENDIX A
DATA WAREHOUSE & DATA MINING

A.1 Database Size: 1.1 GB
A.2 Tables' Data Definition SQL Statements

A.2.1 DATE_DIM
CREATE TABLE "CHICAGORW"."DATE_DIM"
(
"DATE_KEY" NUMBER(30,0) NOT NULL ENABLE,
"SYS_MODIFICATIO_DATE" DATE DEFAULT sysdate NOT NULL ENABLE,
"FULL_DATE" DATE,
"DAY_OF_WEEK" NUMBER(38,0),
"DAY_NUM_IN_MONTH" NUMBER(38,0),
"DAY_NUM_OVERALL" NUMBER(38,0),
"DAY_NAME" VARCHAR2(30 BYTE),
"DAY_ABBREV" VARCHAR2(10 BYTE),
"WEEK_NUM_IN_YEAR" NUMBER(38,0),
"WEEK_NUM_OVERALL" NUMBER(38,0),
"MONTH" NUMBER(38,0),
"MONTH_NUM_OVERALL" NUMBER(38,0),
"MONTH_NAME" VARCHAR2(30 BYTE),
"MONTH_ABBREV" VARCHAR2(10 BYTE),
"SEASON" VARCHAR2(30 BYTE),
"YEAR" NUMBER(5,0),
"SAME_DAY_YEAR_AGO" DATE,
CONSTRAINT "PK5" PRIMARY KEY ("DATE_KEY") USING INDEX PCTFREE
10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS STORAGE(INITIAL
65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE
0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE
DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "USERS" ENABLE
)
SEGMENT CREATION IMMEDIATE PCTFREE 10 PCTUSED 40 INITRANS 1
MAXTRANS 255 NOCOMPRESS LOGGING STORAGE
(
INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT
FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT
)
TABLESPACE "USERS" ;
A.2.2 LAND_USE_TYPE_DIM
CREATE TABLE "CHICAGORW"."LAND_USE_TYPE_DIM"
(
"LAND_USE_TYPE_KEY" NUMBER(30,0) NOT NULL ENABLE,
"SYS_MODIFICATIO_DATE" DATE DEFAULT sysdate NOT NULL ENABLE,
"LAND_USE_LEVEL_I_CODE" NUMBER(38,0),
"LAND_USE_LEVEL_I_DESC" VARCHAR2(75 BYTE),
"LAND_USE_LEVEL_II_CODE" NUMBER(38,0),
"LAND_USE_LEVEL_II_DESC" VARCHAR2(75 BYTE),
"LAND_USE_LEVEL_III_CODE" NUMBER(38,0),
"LAND_USE_LEVEL_III_DESC" VARCHAR2(75 BYTE),
CONSTRAINT "PK4" PRIMARY KEY ("LAND_USE_TYPE_KEY") USING
INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS
2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL
DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "USERS" ENABLE
)
TABLESPACE "USERS" ;
A.2.3 LOCATION_DIM
CREATE TABLE "CHICAGORW"."LOCATION_DIM"
(
"LOCATION_KEY" NUMBER(30,0) NOT NULL ENABLE,
"STATION_ID" VARCHAR2(30 BYTE),
"STATION_DESC" VARCHAR2(100 BYTE),
"STATION_MONITORING_AGENCY" VARCHAR2(60 BYTE),
"LONGITUDE" VARCHAR2(30 BYTE),
"LATITUDE" VARCHAR2(30 BYTE),
CONSTRAINT "PK1" PRIMARY KEY ("LOCATION_KEY") USING INDEX
PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS
2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL
DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "USERS" ENABLE
)
TABLESPACE "USERS" ;
A.2.4 MEASUREMENT_DETAILS_DIM
CREATE TABLE "CHICAGORW"."MEASUREMENT_DETAILS_DIM"
(
"MEASUREMENT_DETAILS_KEY" NUMBER(30,0) NOT NULL ENABLE,
"MEASUREMENT_NAME" VARCHAR2(30 BYTE),
"CONFORMED_MEASUREMENT_NAME" VARCHAR2(30 BYTE),
"MEASUREMENT_UNIT" VARCHAR2(60 BYTE),
"MEASUREMENT_CATEGORY" VARCHAR2(60 BYTE),
"MEASUREMENT_SUBCATEGORY" VARCHAR2(60 BYTE),
"MEASUREMENT_DESC" VARCHAR2(120 BYTE),
CONSTRAINT "PK3" PRIMARY KEY ("MEASUREMENT_DETAILS_KEY")
USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE
STATISTICS STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1
MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE
DEFAULT) TABLESPACE "USERS" ENABLE
)
TABLESPACE "USERS" ;
A.2.5 SOURCE_AGENCY_DIM
CREATE TABLE "CHICAGORW"."SOURCE_AGENCY_DIM"
(
"SOURCE_AGENCY_KEY" CHAR(10 BYTE) NOT NULL ENABLE,
"AGENCY_NAME" VARCHAR2(60 BYTE),
"AGENCY_NAME_ABBREV" VARCHAR2(60 BYTE),
"AGENCY_TYPE" VARCHAR2(60 BYTE),
CONSTRAINT "PK2" PRIMARY KEY ("SOURCE_AGENCY_KEY") USING
INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS
2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL
DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "USERS" ENABLE
)
TABLESPACE "USERS" ;
A.2.6 WATERSHED_CLIMATE_FACT
CREATE TABLE "CHICAGORW"."WATERSHED_CLIMATE_FACT"
(
"DATE_KEY" NUMBER(30,0) NOT NULL ENABLE,
"MEASUREMENT_DETAILS_KEY" NUMBER(30,0) NOT NULL ENABLE,
"LOCATION_KEY" NUMBER(30,0) NOT NULL ENABLE,
"SOURCE_AGENCY_KEY" CHAR(10 BYTE) NOT NULL ENABLE,
"READING_VALUE" NUMBER(30,5),
CONSTRAINT "PK9" PRIMARY KEY ("DATE_KEY",
"MEASUREMENT_DETAILS_KEY", "LOCATION_KEY",
"SOURCE_AGENCY_KEY") USING INDEX PCTFREE 10 INITRANS 2
MAXTRANS 255 COMPUTE STATISTICS NOCOMPRESS LOGGING
)
SEGMENT CREATION DEFERRED PCTFREE 10 PCTUSED 40 INITRANS 1
MAXTRANS 255 NOCOMPRESS LOGGING TABLESPACE "USERS" ;
A.2.7 WATERSHED_LAND_USE_FACT

CREATE TABLE "CHICAGORW"."WATERSHED_LAND_USE_FACT"
(
"DATE_KEY" NUMBER(30,0) NOT NULL ENABLE,
"LAND_USE_TYPE_KEY" NUMBER(30,0) NOT NULL ENABLE,
"LOCATION_KEY" NUMBER(30,0) NOT NULL ENABLE,
"SOURCE_AGENCY_KEY" CHAR(10 BYTE) NOT NULL ENABLE,
"SOURCE_LAND_USE_CODE" NUMBER(10,0) NOT NULL ENABLE,
"LAND_USE_AREA_TOTAL" NUMBER(10,2),
"PER_LAND_USE_AREA_TOTAL" NUMBER(10,2),
"IMP_LAND_USE_AREA_TOTAL" NUMBER(10,2),
CONSTRAINT "PK10" PRIMARY KEY ("DATE_KEY",
"LAND_USE_TYPE_KEY", "LOCATION_KEY", "SOURCE_AGENCY_KEY",
"SOURCE_LAND_USE_CODE") USING INDEX PCTFREE 10 INITRANS 2
MAXTRANS 255 COMPUTE STATISTICS STORAGE(INITIAL 65536 NEXT
1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0
FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE
DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "USERS" ENABLE
)
STORAGE(FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT);
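A.2.8 WATERSHED_WATER_QUALITY_FACT

CREATE TABLE "CHICAGORW"."WATERSHED_WATER_QUALITY_FACT"
(
"DATE_KEY" NUMBER(30,0) NOT NULL ENABLE,
"MEASUREMENT_DETAILS_KEY" NUMBER(30,0) NOT NULL ENABLE,
"LOCATION_KEY" NUMBER(30,0) NOT NULL ENABLE,
"SOURCE_AGENCY_KEY" NUMBER(30,0) NOT NULL ENABLE,
"READING_VALUE" NUMBER(35,5),
PRIMARY KEY ("DATE_KEY",
"MEASUREMENT_DETAILS_KEY", "LOCATION_KEY",
"SOURCE_AGENCY_KEY") USING INDEX PCTFREE 10 INITRANS 2
MAXTRANS 255 COMPUTE STATISTICS STORAGE(INITIAL 65536 NEXT
1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0
FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE
DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "USERS" ENABLE
);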
A.2.9 WATERSHED_WATER_QUANTITY_FACT
CREATE TABLE "CHICAGORW"."WATERSHED_WATER_QUANTITY_FACT"
(
"DATE_KEY" NUMBER(30,0) NOT NULL ENABLE,
"MEASUREMENT_DETAILS_KEY" NUMBER(30,0) NOT NULL ENABLE,
"LOCATION_KEY" NUMBER(30,0) NOT NULL ENABLE,
"SOURCE_AGENCY_KEY" NUMBER(30,0) NOT NULL ENABLE,
"READING_VALUE" NUMBER(30,5),
PRIMARY KEY ("DATE_KEY",
"MEASUREMENT_DETAILS_KEY", "LOCATION_KEY",
"SOURCE_AGENCY_KEY") USING INDEX PCTFREE 10 INITRANS 2
MAXTRANS 255 COMPUTE STATISTICS STORAGE(INITIAL 65536 NEXT
1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0
FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE
DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "USERS" ENABLE
)
STORAGE(FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT);
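The fact tables above are linked to the dimension tables through their multi-column primary keys. The following minimal sketch illustrates that pattern using Python's built-in sqlite3 module instead of Oracle. It keeps only a hypothetical subset of the columns from LOCATION_DIM, MEASUREMENT_DETAILS_DIM and WATERSHED_WATER_QUALITY_FACT and drops all storage clauses. The station IDs STN_A and STN_B and the readings are made-up placeholders, not data from this study. The query joins the fact table to two dimensions to average one measurement per station.

```python
import sqlite3

# Hypothetical, simplified subset of the CHICAGORW star schema in SQLite.
# Table and column names follow the Oracle DDL; types and storage do not.
SCHEMA = """
CREATE TABLE LOCATION_DIM (
    LOCATION_KEY INTEGER PRIMARY KEY,
    STATION_ID   TEXT
);
CREATE TABLE MEASUREMENT_DETAILS_DIM (
    MEASUREMENT_DETAILS_KEY INTEGER PRIMARY KEY,
    MEASUREMENT_NAME        TEXT,
    MEASUREMENT_UNIT        TEXT
);
CREATE TABLE WATERSHED_WATER_QUALITY_FACT (
    DATE_KEY                INTEGER NOT NULL,
    MEASUREMENT_DETAILS_KEY INTEGER NOT NULL,
    LOCATION_KEY            INTEGER NOT NULL,
    SOURCE_AGENCY_KEY       INTEGER NOT NULL,
    READING_VALUE           REAL,
    -- one reading per date, measurement, station and agency
    PRIMARY KEY (DATE_KEY, MEASUREMENT_DETAILS_KEY, LOCATION_KEY, SOURCE_AGENCY_KEY)
);
"""


def build_demo_db():
    """Create the schema in memory and load a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO LOCATION_DIM VALUES (?, ?)",
                     [(1, "STN_A"), (2, "STN_B")])
    conn.executemany("INSERT INTO MEASUREMENT_DETAILS_DIM VALUES (?, ?, ?)",
                     [(10, "NITRATE", "mg/L"), (11, "DO", "mg/L")])
    # Illustrative readings only; these are not data from the study.
    conn.executemany(
        "INSERT INTO WATERSHED_WATER_QUALITY_FACT VALUES (?, ?, ?, ?, ?)",
        [(20100101, 10, 1, 1, 2.0),
         (20100102, 10, 1, 1, 4.0),
         (20100101, 10, 2, 1, 6.0),
         (20100101, 11, 1, 1, 8.5)])
    return conn


def station_averages(conn, measurement_name):
    """Average reading per station for one measurement (fact joined to two dimensions)."""
    query = """
        SELECT l.STATION_ID, AVG(f.READING_VALUE)
        FROM WATERSHED_WATER_QUALITY_FACT AS f
        JOIN LOCATION_DIM AS l ON l.LOCATION_KEY = f.LOCATION_KEY
        JOIN MEASUREMENT_DETAILS_DIM AS m
          ON m.MEASUREMENT_DETAILS_KEY = f.MEASUREMENT_DETAILS_KEY
        WHERE m.MEASUREMENT_NAME = ?
        GROUP BY l.STATION_ID
        ORDER BY l.STATION_ID
    """
    return conn.execute(query, (measurement_name,)).fetchall()


conn = build_demo_db()
print(station_averages(conn, "NITRATE"))
```

If a second reading arrives for the same date, measurement, station and agency, the composite key rejects it with an IntegrityError. This mirrors the role of the multi-column PRIMARY KEY constraints in the Oracle DDL.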
A.2.10 MWRD_READINGS_STAGE
CREATE TABLE "CHICAGORW"."MWRD_READINGS_STAGE"
(
"READING_DATE" DATE,
"LOCATION_ID" VARCHAR2(20 BYTE),
"MEASURMENT" VARCHAR2(20 BYTE),
"UNIT" VARCHAR2(20 BYTE),
"VALUE" VARCHAR2(20 BYTE),
"INSERT_DATE" DATE
);
A.2.11 NWS_AIR_TEMP_STAGE
CREATE TABLE "CHICAGORW"."NWS_AIR_TEMP_STAGE"
(
"AVG_AIR_TEMP" NUMBER(10,2),
"MAX_AIR_TEMP" NUMBER(10,2),
"MIN_AIR_TEMP" NUMBER(10,2)
);
A.2.12 NWS_DAILY_PREC_STAGE
CREATE TABLE "CHICAGORW"."NWS_DAILY_PREC_STAGE"
(
"READING_DATE" DATE,
"DAILY_PERC" NUMBER(10,3)
)
STORAGE(FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT);
A.2.13 USGS_READINGS_STAGE
CREATE TABLE "CHICAGORW"."USGS_READINGS_STAGE"
(
"GAGE_HEIGHT" NUMBER(15,3),
"DISCHARGE" NUMBER(15,3),
"LOCATION_ID" VARCHAR2(20 BYTE),
"INSERT_DATE" DATE DEFAULT sysdate
)
STORAGE(FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT);
A.3 Nitrate Regression
A.3.1 Multiple Linear regression (LinearRegression)

SYNOPSIS
Class for using linear regression for prediction. Uses the Akaike criterion for
model selection, and is able to deal with weighted instances.
=== Run information ===
Scheme:weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8
Relation: Chi_NB_data_mining_total_area_weka-
weka.filters.unsupervised.attribute.Remove-R4-5-
weka.filters.unsupervised.attribute.ReplaceMissingValues
Instances: 905
Attributes: 154
[list of attributes omitted]
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
Linear Regression Model
NITRATE = -0.0534 * MONTH_NUM + 0.0714 * DO + 0.0961 * TEMP - 0.1304 * BOD
+ 0.006 * COD - 0.3908 * PH - 0.0022 * VSS - 0.0037 * INORG_SS
+ 0.0152 * MIN_AIR_TEMP - 0.1395 * AVG_AIR_TEMP + 0.0719 * MAX_AIR_TEMP
+ 0.5953 * DAILY_PERC - 0.0046 * FLOW + 0.0001 * TOT_1002 - 0.0025 * TOT_1005
+ 0.0006 * TOT_1009 + 0.0006 * TOT_1010 + 0.0002 * TOT_1011 + 0.0002 * TOT_1013
+ 0.0001 * TOT_1015 + 0.0003 * TOT_1016 + 0.0005 * TOT_1027 + 0.0001 * TOT_1032
+ 0.017 * TOT_1033 + 0.0005 * TOT_1037 + 0.0002 * TOT_1040 + 0.0001 * TOT_1045
+ 0 * TOT_1049 - 0.0163 * TOT_1092 + 0.0124 * TOT_1095 + 0.0003 * TOT_1096
+ 5.6452
Time taken to build model: 0.04 seconds
=== Cross-validation ===
=== Summary ===
Correlation coefficient 0.6759
Mean absolute error 1.4842
Root mean squared error 2.1306
Relative absolute error 60.7279 %
Root relative squared error 73.6747 %
Total Number of Instances 905
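The five error figures in the summary can be reproduced from paired actual and predicted values. The sketch below uses the usual definitions: relative absolute error and root relative squared error compare the model's errors with those of a baseline that always predicts the mean. Weka takes that mean from the training data (per fold under cross-validation). This sketch uses the mean of the supplied actual values instead, so it is an approximation and will not match Weka digit for digit.

The definitions also give a quick consistency check. RMSE / RRSE = 2.1306 / 0.7367 ≈ 2.89, which is the root-mean-square deviation of nitrate about that baseline mean.

```python
import math


def regression_summary(actual, predicted):
    """Correlation, MAE, RMSE, RAE (%) and RRSE (%) for paired values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Errors of the baseline that always predicts the mean of the actual values.
    abs_base = sum(abs(a - mean_a) for a in actual)
    sq_base = sum((a - mean_a) ** 2 for a in actual)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "correlation": cov / math.sqrt(sq_base * var_p),
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_pct": 100.0 * abs_err / abs_base,
        "rrse_pct": 100.0 * math.sqrt(sq_err / sq_base),
    }


print(regression_summary([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.5]))
```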
A.3.2 Artificial neural network (MultilayerPerceptron)

SYNOPSIS:
A Classifier that uses backpropagation to classify instances.
This network can be built by hand, created by an algorithm or both. The network
can also be monitored and modified during training time. The nodes in this network are
all sigmoid (except for when the class is numeric, in which case the output nodes
become unthresholded linear units).
=== Run information ===
Scheme:weka.classifiers.functions.MultilayerPerceptron -L 0.01 -M 0.2 -N 1000 -V 0 -S 0 -E 20 -H a
Instances: 905
Attributes: 154
Test mode: 10-fold cross-validation
=== Summary ===
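Several run options map directly onto the training procedure:

- -L 0.01 is the learning rate.
- -M 0.2 is the momentum.
- -N 1000 is the number of training epochs.
- -H a requests one hidden layer of (attributes + classes) / 2 units.

The sketch below shows the underlying idea in pure Python. It uses sigmoid hidden units, a single linear output unit for the numeric class, and per-instance backpropagation with momentum. It is a simplified illustration under these assumptions, not Weka's implementation. For example, Weka normalizes the attributes and the numeric class by default, and this sketch does not. The function name train_mlp and its defaults are illustrative.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_mlp(xs, ys, hidden=3, lr=0.1, momentum=0.2, epochs=1000, seed=0):
    """One hidden layer of sigmoid units and a linear output unit, trained by
    per-instance backpropagation with momentum (cf. Weka's -L, -M, -N)."""
    rng = random.Random(seed)
    n_in = len(xs[0])
    # Each weight vector carries a trailing bias weight.
    w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]
    step_hid = [[0.0] * (n_in + 1) for _ in range(hidden)]
    step_out = [0.0] * (hidden + 1)

    def forward(x):
        xb = list(x) + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(wj, xb))) for wj in w_hid]
        hb = h + [1.0]
        return xb, h, hb, sum(w * v for w, v in zip(w_out, hb))

    for _ in range(epochs):
        for x, y in zip(xs, ys):
            xb, h, hb, out = forward(x)
            err = y - out  # linear output unit: its delta is the error itself
            # Hidden deltas use the output weights before this update.
            d_hid = [err * w_out[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            for j in range(hidden + 1):
                step_out[j] = lr * err * hb[j] + momentum * step_out[j]
                w_out[j] += step_out[j]
            for j in range(hidden):
                for i in range(n_in + 1):
                    step_hid[j][i] = lr * d_hid[j] * xb[i] + momentum * step_hid[j][i]
                    w_hid[j][i] += step_hid[j][i]

    return lambda x: forward(x)[3]
```

For example, fitting y = 2x + 1 on eleven points in [0, 1] with epochs=2000 should bring the mean squared error well below the variance of y.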
A.3.3 Support vector machines (SMOreg)

SYNOPSIS
SMOreg implements the support vector machine for regression. The parameters
can be learned using various algorithms. The algorithm is selected by setting the
RegOptimizer. The most popular algorithm (RegSMOImproved) is due to Shevade,
Keerthi et al., and this is the default RegOptimizer.
=== Run information ===
Scheme:weka.classifiers.functions.SMOreg -C 1.0 -N 0 -I
"weka.classifiers.functions.supportVector.RegSMOImproved -L 0.0010 -W 1 -P 1.0E-12
-T 0.0010 -V" -K "weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E 1.0"
Instances: 905
Attributes: 154
=== Summary ===
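With -E 1.0, the PolyKernel in this run reduces to the dot product, so the fitted SMOreg model is a linear function of the (normalized) attributes. The hypothetical sketch below is an illustration, not Weka's code. It shows three pieces:

- the kernel itself;
- the epsilon-insensitive loss that support vector regression minimizes;
- how a prediction is formed from support vectors and their dual coefficients.

We assume that the -L 0.0010 option of RegSMOImproved is the epsilon of this loss. The -C 250007 inside the kernel options is Weka's kernel cache size, not the complexity constant -C 1.0.

```python
def poly_kernel(x, y, exponent=1.0):
    """Polynomial kernel K(x, y) = <x, y> ** exponent (no lower-order term)."""
    dot = sum(a * b for a, b in zip(x, y))
    # With exponent 1.0, as in the -E 1.0 run above, this is the linear kernel.
    return dot if exponent == 1.0 else dot ** exponent


def epsilon_insensitive_loss(actual, predicted, epsilon=0.001):
    """Zero inside the epsilon tube, growing linearly outside it."""
    return max(0.0, abs(actual - predicted) - epsilon)


def svr_predict(support_vectors, dual_coefs, bias, x, exponent=1.0):
    """f(x) = sum_i c_i * K(sv_i, x) + b, where c_i = alpha_i - alpha_i*."""
    return sum(c * poly_kernel(sv, x, exponent)
               for sv, c in zip(support_vectors, dual_coefs)) + bias


print(svr_predict([[1.0], [2.0]], [0.5, -0.25], 1.0, [4.0]))
```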
A.3.4 Model tree

SYNOPSIS
M5Base. Implements base routines for generating M5 Model trees and rules.
The original algorithm M5 was invented by R. Quinlan, and Yong Wang made
improvements.
Scheme:weka.classifiers.trees.M5P -N -M 50.0
Relation: Chi_NB_data_mining_total_area_weka-
Instances: 905
Attributes: 154
M5 unpruned model tree:
(using smoothed linear models)
TOT_1001 <= 14404.2 :
|   INORG_SS <= 15.5 :
|   |   DO <= 7.199 : LM1 (42/15.604%)
|   |   DO > 7.199 : LM2 (27/87.84%)
|   INORG_SS > 15.5 :
|   |   VSS <= 40.5 :
|   |   |   FLOW <= 10.15 : LM3 (37/4.438%)
|   |   |   FLOW > 10.15 : LM4 (22/7.834%)
|   |   VSS > 40.5 :
|   |   |   VSS <= 225.75 :
|   |   |   |   PH <= 7.81 :
|   |   |   |   |   DAILY_PERC <= 0.16 :
|   |   |   |   |   |   PH <= 7.165 : LM5 (26/12.942%)
|   |   |   |   |   |   PH > 7.165 :
|   |   |   |   |   |   |   TEMP <= 18.5 : LM6 (47/26.796%)
|   |   |   |   |   |   |   TEMP > 18.5 : LM7 (14/11.77%)
|   |   |   |   |   DAILY_PERC > 0.16 : LM8 (22/8.948%)
|   |   |   |   PH > 7.81 : LM9 (33/9.853%)
|   |   |   VSS > 225.75 : LM10 (46/7.96%)
TOT_1001 > 14404.2 :
|   TOT_1001 <= 37181.95 :
|   |   INORG_SS <= 21.5 :
|   |   |   BOD <= 4.078 :
|   |   |   |   PH <= 7.55 :
|   |   |   |   |   FLOW <= 9.05 : LM11 (28/99.734%)
|   |   |   |   |   FLOW > 9.05 : LM12 (30/101.282%)
|   |   |   |   PH > 7.55 : LM13 (27/109.655%)
|   |   |   BOD > 4.078 : LM14 (36/88.755%)
|   |   INORG_SS > 21.5 :
|   |   |   VSS <= 195 :
|   |   |   |   FLOW <= 14.5 :
|   |   |   |   |   AVG_AIR_TEMP <= 42.485 : LM15 (24/150.814%)
|   |   |   |   |   AVG_AIR_TEMP > 42.485 :
|   |   |   |   |   |   DO <= 5.05 : LM16 (27/22.25%)
|   |   |   |   |   |   DO > 5.05 : LM17 (48/94.117%)
|   |   |   |   FLOW > 14.5 :
|   |   |   |   |   PH <= 7.72 :
|   |   |   |   |   |   DO <= 10.05 : LM18 (42/59.963%)
|   |   |   |   |   |   DO > 10.05 : LM19 (17/23.703%)
|   |   |   |   |   PH > 7.72 : LM20 (28/26.416%)
|   |   |   VSS > 195 :
|   |   |   |   MIN_AIR_TEMP <= 57.1 : LM21 (39/8.132%)
|   |   |   |   MIN_AIR_TEMP > 57.1 : LM22 (12/21.375%)
|   TOT_1001 > 37181.95 :
|   |   COD <= 43.233 :
|   |   |   FLOW <= 126.5 : LM23 (45/41.532%)
|   |   |   FLOW > 126.5 : LM24 (15/24.793%)
|   |   COD > 43.233 :
|   |   |   FLOW <= 98.5 :
|   |   |   |   TEMP <= 19.2 :
|   |   |   |   |   TEMP <= 10.15 : LM25 (28/66.049%)
|   |   |   |   |   TEMP > 10.15 : LM26 (25/27.298%)
|   |   |   |   TEMP > 19.2 : LM27 (36/45.643%)
|   |   |   FLOW > 98.5 :
|   |   |   |   TOT_1001 <= 58422.45 : LM28 (32/41.327%)
|   |   |   |   TOT_1001 > 58422.45 :
|   |   |   |   |   FLOW <= 382 : LM29 (13/40.185%)
|   |   |   |   |   FLOW > 382 : LM30 (37/36.961%)
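Every split in the unpruned model tree is printed, so the tree can be turned directly into a routing function that returns the leaf model (LM1 to LM30) for a record. The sketch below copies the thresholds from the tree above. The record is a plain dictionary keyed by the attribute names in the printout, and the function name m5p_leaf is our own. TOT_1001 is split twice on the right-hand side: first at 14404.2 at the root, then again at 37181.95 within the TOT_1001 > 14404.2 branch.

```python
def m5p_leaf(r):
    """Return the leaf linear-model number (1-30) that record r reaches."""
    if r["TOT_1001"] <= 14404.2:
        if r["INORG_SS"] <= 15.5:
            return 1 if r["DO"] <= 7.199 else 2
        if r["VSS"] <= 40.5:
            return 3 if r["FLOW"] <= 10.15 else 4
        if r["VSS"] > 225.75:
            return 10
        if r["PH"] > 7.81:
            return 9
        if r["DAILY_PERC"] > 0.16:
            return 8
        if r["PH"] <= 7.165:
            return 5
        return 6 if r["TEMP"] <= 18.5 else 7
    if r["TOT_1001"] <= 37181.95:
        if r["INORG_SS"] <= 21.5:
            if r["BOD"] > 4.078:
                return 14
            if r["PH"] > 7.55:
                return 13
            return 11 if r["FLOW"] <= 9.05 else 12
        if r["VSS"] > 195:
            return 21 if r["MIN_AIR_TEMP"] <= 57.1 else 22
        if r["FLOW"] <= 14.5:
            if r["AVG_AIR_TEMP"] <= 42.485:
                return 15
            return 16 if r["DO"] <= 5.05 else 17
        if r["PH"] > 7.72:
            return 20
        return 18 if r["DO"] <= 10.05 else 19
    # TOT_1001 > 37181.95
    if r["COD"] <= 43.233:
        return 23 if r["FLOW"] <= 126.5 else 24
    if r["FLOW"] <= 98.5:
        if r["TEMP"] > 19.2:
            return 27
        return 25 if r["TEMP"] <= 10.15 else 26
    if r["TOT_1001"] <= 58422.45:
        return 28
    return 29 if r["FLOW"] <= 382 else 30


print(m5p_leaf({"TOT_1001": 60000, "COD": 50, "FLOW": 400}))
```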
LM num: 1
NITRATE =
0.0363 * DO + 0.0057 * TEMP - 0.0068 * BOD + 0.0003 * COD - 0.0192
* PH - 0.0001 * VSS - 0.0002 * INORG_SS - 0.001 * MIN_AIR_TEMP - 0.0017 *
AVG_AIR_TEMP + 0.1426 * DAILY_PERC - 0.0001 * FLOW + 0 * TOT_1001 +
0.7193
LM num: 2
NITRATE = 0.0438 * DO + 0.0057 * TEMP - 0.0068 * BOD + 0.0003 * COD -
0.0192 * PH - 0.0001 * VSS - 0.0002 * INORG_SS - 0.001 * MIN_AIR_TEMP - 0.0017
* AVG_AIR_TEMP + 0.1426 * DAILY_PERC - 0.0001 * FLOW + 0 * TOT_1001 +
1.282
LM num: 3
* AVG_AIR_TEMP + 0.0634 * DAILY_PERC + 0.0013 * FLOW + 0 * TOT_1001 +
0.5277
LM num: 4
* AVG_AIR_TEMP + 0.0634 * DAILY_PERC + 0.0016 * FLOW + 0 * TOT_1001 +
0.6461
LM num: 5
NITRATE = 0.0094 * DO - 0.0017 * TEMP - 0.0068 * BOD + 0.0003 * COD +
0.2946
LM num: 6
0.7249
LM num: 7
0.0446 * PH - 0.0002 * VSS - 0.0002 * INORG_SS - 0.001 * MIN_AIR_TEMP - 0.0017
0.6722
LM num: 8
* AVG_AIR_TEMP + 0.0634 * DAILY_PERC + 0.0001 * FLOW + 0 * TOT_1001 +
0.5162
LM num: 9
* AVG_AIR_TEMP + 0.0634 * DAILY_PERC + 0.0001 * FLOW + 0 * TOT_1001 +
1.1756
LM num: 10
* AVG_AIR_TEMP + 0.0634 * DAILY_PERC + 0.0001 * FLOW + 0 * TOT_1001 +
0.7755
LM num: 11
0.7943 * PH - 0.0006 * VSS - 0.0016 * INORG_SS - 0.005 * MIN_AIR_TEMP - 0.0025
12.6819
LM num: 12
11.7097
LM num: 13
1.1519 * PH - 0.0006 * VSS - 0.0016 * INORG_SS - 0.005 * MIN_AIR_TEMP - 0.0025
13.4822
LM num: 14
0.5138 * PH - 0.0006 * VSS - 0.0016 * INORG_SS - 0.005 * MIN_AIR_TEMP - 0.0025
7.275
LM num: 15
7.3986
LM num: 16
4.0342
LM num: 17
4.7406
LM num: 18
* AVG_AIR_TEMP + 0.0143 * DAILY_PERC - 0.0019 * FLOW + 0 * TOT_1001 +
4.9023
LM num: 19
NITRATE = 0.0074 * DO + 0.008 * TEMP - 0.0305 * BOD + 0.0002 * COD -
0.2774 * PH - 0.0007 * VSS - 0.001 * INORG_SS - 0.0034 * MIN_AIR_TEMP - 0.0115
* AVG_AIR_TEMP + 0.0143 * DAILY_PERC - 0.0019 * FLOW + 0 * TOT_1001 +
4.7358
LM num: 20
* AVG_AIR_TEMP + 0.0143 * DAILY_PERC - 0.0019 * FLOW + 0 * TOT_1001 +
4.798
LM num: 21
* AVG_AIR_TEMP + 0.0143 * DAILY_PERC - 0.0018 * FLOW + 0 * TOT_1001 +
2.922
LM num: 22
0.1337 * PH - 0.0012 * VSS - 0.001 * INORG_SS - 0.0034 * MIN_AIR_TEMP - 0.0114
* AVG_AIR_TEMP + 0.0143 * DAILY_PERC - 0.0018 * FLOW + 0 * TOT_1001 +
2.9925
LM num: 23
0.0426 * PH - 0.0003 * VSS - 0.0006 * INORG_SS - 0.0022 * MIN_AIR_TEMP -
0.0033 * AVG_AIR_TEMP + 0.0143 * DAILY_PERC - 0.0017 * FLOW + 0.0006 *
TOT_1001 - 30.9523
LM num: 24
TOT_1001 - 31.5529
LM num: 25
NITRATE = 0.0094 * DO - 0.0061 * TEMP - 0.0172 * BOD + 0.0034 * COD -
0.0426 * PH - 0.0003 * VSS - 0.0006 * INORG_SS - 0.0022 * MIN_AIR_TEMP -
TOT_1001 - 27.9014
LM num: 26
TOT_1001 - 27.7684
LM num: 27
0.0426 * PH - 0.0003 * VSS - 0.0006 * INORG_SS - 0.0022 * MIN_AIR_TEMP -
0.0033 * AVG_AIR_TEMP + 0.0143 * DAILY_PERC - 0.0012 * FLOW + 0.0006 *
TOT_1001 - 28.538
LM num: 28
TOT_1001 - 80.2254
LM num: 29
TOT_1001 - 64.941
LM num: 30
TOT_1001 - 65.8262
Number of Rules : 30
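At a leaf, the predicted nitrate value is the printed linear model evaluated on the record's attribute values. Only LM 1, LM 2 and LM 19 are reproduced in full in this copy, so the sketch below includes just those three. The dictionary layout and the function name predict_nitrate are illustrative choices. A missing attribute raises a KeyError rather than being silently treated as zero.

```python
# Coefficients copied from LM 1 in the listing above.
LM1 = {"DO": 0.0363, "TEMP": 0.0057, "BOD": -0.0068, "COD": 0.0003,
       "PH": -0.0192, "VSS": -0.0001, "INORG_SS": -0.0002,
       "MIN_AIR_TEMP": -0.001, "AVG_AIR_TEMP": -0.0017,
       "DAILY_PERC": 0.1426, "FLOW": -0.0001, "TOT_1001": 0.0}

LEAF_MODELS = {
    1: (LM1, 0.7193),
    # LM 2 differs from LM 1 only in the DO coefficient and the intercept.
    2: ({**LM1, "DO": 0.0438}, 1.282),
    19: ({"DO": 0.0074, "TEMP": 0.008, "BOD": -0.0305, "COD": 0.0002,
          "PH": -0.2774, "VSS": -0.0007, "INORG_SS": -0.001,
          "MIN_AIR_TEMP": -0.0034, "AVG_AIR_TEMP": -0.0115,
          "DAILY_PERC": 0.0143, "FLOW": -0.0019, "TOT_1001": 0.0}, 4.7358),
}


def predict_nitrate(leaf, record):
    """Evaluate the linear model stored for `leaf` on one record."""
    coefs, intercept = LEAF_MODELS[leaf]
    # record[name] raises KeyError if an attribute is missing.
    return intercept + sum(c * record[name] for name, c in coefs.items())


zeros = {name: 0.0 for name in LM1}
print(predict_nitrate(1, dict(zeros, DO=10.0)))
```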

=== Summary ===
A.3.5 Lazy learner (LWL)

SYNOPSIS
Locally weighted learning. Uses an instance-based algorithm to assign instance
weights which are then used by a specified WeightedInstancesHandler.
Can do classification (e.g. using naive Bayes) or regression (e.g. using linear
regression).
Scheme:weka.classifiers.lazy.LWL -U 0 -K -1 -A
"weka.core.neighboursearch.LinearNNSearch -A \"weka.core.EuclideanDistance -R first-last\"" -W weka.classifiers.trees.DecisionStump
Relation: Chi_NB_data_mining_total_area_weka-
Instances: 905
Attributes: 154
=== Classifier model (full training set) ===

Locally weighted learning
Using classifier: weka.classifiers.trees.DecisionStump

Using linear weighting kernels
Using all neighbours
Time taken to build model: 0 seconds
=== Summary ===
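The options -U 0 and -K -1 select the linear weighting kernel and all neighbours, as the model printout confirms. The sketch below illustrates only the weighting step. Distances to every training instance are scaled by the largest distance and turned into linear weights. A weighted mean then stands in for the DecisionStump base learner that Weka actually fits.

Two details are assumptions. The offset 1.0001, which keeps the farthest instance at a tiny positive weight, reflects our understanding of Weka's linear kernel. Weka's EuclideanDistance also rescales attributes to [0, 1], which is omitted here. The function name lwl_predict is illustrative.

```python
import math


def lwl_predict(train_x, train_y, query):
    """Locally weighted estimate at `query` using every training instance.

    Simplification: a weighted mean replaces Weka's DecisionStump base learner.
    """
    dists = [math.dist(x, query) for x in train_x]
    bandwidth = max(dists) or 1.0  # "all neighbours": scale by the farthest one
    # Linear kernel: weight falls from about 1 at the query to about 0 at the bandwidth.
    weights = [1.0001 - d / bandwidth for d in dists]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)


xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [0.0, 10.0, 20.0, 30.0]
print(lwl_predict(xs, ys, [0.0]), lwl_predict(xs, ys, [3.0]))
```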

A.3.6 Gaussian process (GaussianProcesses)

SYNOPSIS
Implements Gaussian Processes for regression without hyperparameter-tuning.
Scheme:weka.classifiers.functions.GaussianProcesses -L 1.0 -N 0 -K
"weka.classifiers.functions.supportVector.RBFKernel -C 250007 -G 1.0"
Instances: 905
Attributes: 154
=== Classifier model (full training set) ===
Gaussian Processes
Kernel used:
RBF kernel: K(x,y) = e^-(1.0* <x-y,x-y>^2)
Average Target Value : 2.685958751393536
Inverted Covariance Matrix:
Lowest Value = -0.21889501888682303
Highest Value = 0.9798981897805298
Inverted Covariance Matrix * Target-value Vector:
Lowest Value = -5.116699435695602
Highest Value = 8.93518362362874

=== Summary ===
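The printout suggests how a prediction is formed. The targets are presumably centred on the average target value (2.686), and the inverted covariance matrix times the target vector gives one weight per training instance. A minimal pure-Python sketch of that predictive mean follows. It rests on two assumptions:

- The RBF kernel is exp(-gamma * ||x - y||^2) with gamma = 1.0, set by the -G option.
- The noise level from -L 1.0 enters the diagonal as its square.

The sketch also omits Weka's attribute normalization (-N 0). It uses a small Gaussian-elimination solver instead of forming an explicit inverse. The names gp_fit and solve are illustrative.

```python
import math


def rbf(x, y, gamma=1.0):
    """RBF kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gp_fit(xs, ys, noise_level=1.0, gamma=1.0):
    """Return the GP predictive-mean function for training data (xs, ys)."""
    mean = sum(ys) / len(ys)  # plays the role of "Average Target Value"
    n = len(xs)
    cov = [[rbf(xs[i], xs[j], gamma) + (noise_level ** 2 if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    # weights = (K + noise^2 I)^-1 (y - mean), found by solving, not inverting
    weights = solve(cov, [y - mean for y in ys])
    return lambda x: mean + sum(w * rbf(xi, x, gamma) for w, xi in zip(weights, xs))


predict = gp_fit([[0.0], [1.0]], [1.0, 3.0])
print(predict([0.0]))
```

With a very large noise level the weights shrink towards zero and every prediction collapses to the average target, which is why the centring step matters.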
A.4 Nitrate Classification
Histograms:
Figure A.1 Histograms of the classification model attributes
Figure A.2 Scatter plots of the classification model attributes
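The three class labels used throughout A.4 come from the unsupervised Discretize filter in each run (Discretize-B3-M-1.0-R3). That filter splits the third attribute, NITRATE, into three equal-width bins whose intervals are closed on the right. The two printed cut points differ by 3.993334, almost exactly the first cut point. This suggests a minimum near 0 and a maximum near 3 × 3.993333 ≈ 11.98. That range is our inference, not a value reported in this appendix. The sketch below reproduces the cut points under that assumption. The function names are illustrative.

```python
def equal_width_cut_points(values, bins=3):
    """Interior cut points that split [min, max] into `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + k * width for k in range(1, bins)]


def bin_index(value, cut_points):
    """0-based bin; intervals are closed on the right, as in '(-inf-3.993333]'."""
    for i, cut in enumerate(cut_points):
        if value <= cut:
            return i
    return len(cut_points)


# Assumed nitrate range 0 to 11.98, back-calculated from the printed cut points.
cuts = equal_width_cut_points([0.0, 11.98])
print([round(c, 6) for c in cuts])
```

Bins 0, 1 and 2 correspond to classes a, b and c in the confusion matrices.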

A.4.1 Logistic regression (Logistic)

SYNOPSIS
Class for building and using a multinomial logistic regression model with a ridge
estimator.
Scheme:weka.classifiers.functions.Logistic -R 1.0E-8 -M -1
weka.filters.unsupervised.attribute.ReplaceMissingValues-
weka.filters.unsupervised.attribute.Discretize-B3-M-1.0-R3
Instances: 905
Attributes: 154
=== Summary ===
Correctly Classified Instances 742 81.989 %
Incorrectly Classified Instances 163 18.011 %
Kappa statistic 0.5708
=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0.925 0.343 0.865 0.925 0.894 0.915 '(-inf-3.993333]'
0.662 0.084 0.696 0.662 0.678 0.894 '(3.993333-7.986667]'
0.281 0.014 0.6 0.281 0.383 0.831 '(7.986667-inf)'
Weighted Avg. 0.82 0.262 0.808 0.82 0.809 0.904
=== Confusion Matrix ===

a b c <-- classified as
589 45 3 | a = '(-inf-3.993333]'
60 135 9 | b ='(3.993333-7.986667]'
32 14 18 | c ='(7.986667-inf)'
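Every per-class figure above, and the kappa statistic, can be recomputed from the confusion matrix. Its rows are the actual classes and its columns are the predicted classes. The sketch below does that in plain Python; the function names are illustrative. Applied to the logistic-regression matrix, it reproduces the TP rate of 0.925, FP rate of 0.343 and precision of 0.865 for class a, the precision of 0.6 for class c, and a kappa of 0.5708.

```python
def class_metrics(cm):
    """Per-class TP rate, FP rate, precision and F-measure from a confusion
    matrix whose rows are actual classes and columns predicted classes."""
    n = sum(map(sum, cm))
    k = len(cm)
    col_totals = [sum(cm[r][c] for r in range(k)) for c in range(k)]
    out = []
    for c in range(k):
        tp = cm[c][c]
        actual = sum(cm[c])
        predicted = col_totals[c]
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        fp_rate = (predicted - tp) / (n - actual)
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        out.append({"tp_rate": recall, "fp_rate": fp_rate,
                    "precision": precision, "f_measure": f})
    return out


def kappa(cm):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = sum(map(sum, cm))
    k = len(cm)
    observed = sum(cm[i][i] for i in range(k)) / n
    expected = sum(sum(cm[i]) * sum(cm[r][i] for r in range(k)) for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)


LOGISTIC_CM = [[589, 45, 3], [60, 135, 9], [32, 14, 18]]
print(round(kappa(LOGISTIC_CM), 4), class_metrics(LOGISTIC_CM)[0])
```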
A.4.2 Artificial neural network (MultilayerPerceptron)

SYNOPSIS:
A Classifier that uses backpropagation to classify instances.
This network can be built by hand, created by an algorithm or both. The network
can also be monitored and modified during training time. The nodes in this network are
all sigmoid (except for when the class is numeric, in which case the output nodes
become unthresholded linear units).
=== Run information ===
Scheme:weka.classifiers.functions.MultilayerPerceptron -L 0.01 -M 0.2 -N 1000 -V 0 -S 0 -E 20 -H a
weka.filters.unsupervised.attribute.Discretize-B3-M-1.0-R3
Instances: 905
Attributes: 154
=== Summary ===
Correctly Classified Instances 754 83.3149%
=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0.932 0.28 0.888 0.932 0.91 0.929 '(-inf-3.993333]'
0.74 0.098 0.686 0.74 0.712 0.912 '(3.993333-7.986667]'
0.141 0.008 0.563 0.141 0.225 0.833 '(7.986667-inf)'
Weighted Avg. 0.833 0.22 0.819 0.833 0.817 0.918
=== Confusion Matrix ===

a b c <-- classified as
594 41 2 | a = '(-inf-3.993333]'
48 151 5 | b = '(3.993333-7.986667]'
27 28 9 | c = '(7.986667-inf)'
A.4.3 Support vector machines (SMO)

SYNOPSIS
Implements John Platt's sequential minimal optimization algorithm for training a
support vector classifier. Multi-class problems are solved using pairwise
classification.
=== Run information ===
Scheme:weka.classifiers.functions.SMO -C 1.0 -L 0.0010 -P 1.0E-12 -N 0 -V -1
-W 1 -K "weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E 1.0"
Instances: 905
Attributes: 154
=== Summary ===
=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0.917 0.34 0.865 0.917 0.89 0.791 '(-inf-3.993333]'
0.755 0.108 0.67 0.755 0.71 0.811 '(3.993333-7.986667]'
0 0 0 0 0 0.495 '(7.986667-inf)'
Weighted Avg. 0.815 0.263 0.76 0.815 0.787 0.775
=== Confusion Matrix ===

a b c <-- classified as
584 53 0 | a = '(-inf-3.993333]'
50 154 0 | b = '(3.993333-7.986667]'
41 23 0 | c = '(7.986667-inf)'
A.4.4 Decision tree (J48)

SYNOPSIS
Class for generating a pruned or unpruned C4.5 decision tree.
Scheme:weka.classifiers.trees.J48 -C 0.25 -M 2
Instances: 905
Attributes: 154

J48 pruned tree
TOT_1004 <= 129.033005
|   TOT_1001 <= 13161.3: '(-inf-3.993333]' (316.0/3.0)
|   TOT_1001 > 13161.3
|   |   AVG_AIR_TEMP <= 21.68
|   |   |   TEMP <= 3.4: '(-inf-3.993333]' (5.0/1.0)
|   |   |   TEMP > 3.4: '(7.986667-inf)' (9.0)
|   |   AVG_AIR_TEMP > 21.68
|   |   |   BOD <= 2
|   |   |   |   COD <= 21
|   |   |   |   |   DO <= 10.3
|   |   |   |   |   |   INORG_SS <= 15
|   |   |   |   |   |   |   PH <= 7.25: '(7.986667-inf)' (4.0)
|   |   |   |   |   |   |   PH > 7.25
|   |   |   |   |   |   |   |   DAILY_PERC <= 0.03
|   |   |   |   |   |   |   |   |   AVG_AIR_TEMP <= 70.86: '(3.993333-7.986667]' (4.0/1.0)
|   |   |   |   |   |   |   |   |   AVG_AIR_TEMP > 70.86: '(7.986667-inf)' (5.0)
|   |   |   |   |   |   |   |   DAILY_PERC > 0.03: '(3.993333-7.986667]' (3.0/1.0)
|   |   |   |   |   |   INORG_SS > 15: '(3.993333-7.986667]' (4.0/1.0)
|   |   |   |   |   DO > 10.3: '(3.993333-7.986667]' (7.0)
|   |   |   |   COD > 21
|   |   |   |   |   PH <= 6.95: '(3.993333-7.986667]' (2.0)
|   |   |   |   |   PH > 6.95
|   |   |   |   |   |   MONTH_NUM <= 8
|   |   |   |   |   |   |   INORG_SS <= 16
|   |   |   |   |   |   |   |   COD <= 25: '(-inf-3.993333]' (3.0/1.0)
|   |   |   |   |   |   |   |   COD > 25: '(7.986667-inf)' (5.0)
|   |   |   |   |   |   |   INORG_SS > 16
|   |   |   |   |   |   |   |   BOD <= 1: '(3.993333-7.986667]' (3.0/1.0)
|   |   |   |   |   |   |   |   BOD > 1: '(-inf-3.993333]' (10.0/1.0)
|   |   |   |   |   |   MONTH_NUM > 8: '(-inf-3.993333]' (12.0)
|   |   |   BOD > 2
|   |   |   |   TOT_1001 <= 15647.1
|   |   |   |   |   FLOW <= 14
|   |   |   |   |   |   FLOW <= 5.9: '(7.986667-inf)' (3.0)
|   |   |   |   |   |   FLOW > 5.9: '(3.993333-7.986667]' (8.0/1.0)
|   |   |   |   |   FLOW > 14: '(-inf-3.993333]' (14.0/1.0)
|   |   |   |   TOT_1001 > 15647.1
|   |   |   |   |   INORG_SS <= 28
|   |   |   |   |   |   PH <= 7.65
|   |   |   |   |   |   |   PH <= 7.04
|   |   |   |   |   |   |   |   INORG_SS <= 7: '(7.986667-inf)' (3.0/1.0)
|   |   |   |   |   |   |   |   INORG_SS > 7: '(-inf-3.993333]' (9.0)
|   |   |   |   |   |   |   PH > 7.04
|   |   |   |   |   |   |   |   BOD <= 4
|   |   |   |   |   |   |   |   |   MONTH_NUM <= 2: '(7.986667-inf)' (4.0/1.0)
|   |   |   |   |   |   |   |   |   MONTH_NUM > 2
|   |   |   |   |   |   |   |   |   |   BOD <= 3
|   |   |   |   |   |   |   |   |   |   |   MIN_AIR_TEMP <= 30: '(3.993333-7.986667]' (2.0)
|   |   |   |   |   |   |   |   |   |   |   MIN_AIR_TEMP > 30
|   |   |   |   |   |   |   |   |   |   |   |   TEMP <= 13.6: '(-inf-3.993333]' (7.0/1.0)
|   |   |   |   |   |   |   |   |   |   |   |   TEMP > 13.6
|   |   |   |   |   |   |   |   |   |   |   |   |   MONTH_NUM <= 6: '(3.993333-7.986667]' (2.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   MONTH_NUM > 6: '(-inf-3.993333]' (2.0/1.0)
|   |   |   |   |   |   |   |   |   |   BOD > 3
|   |   |   |   |   |   |   |   |   |   |   INORG_SS <= 17
|   |   |   |   |   |   |   |   |   |   |   |   PH <= 7.35: '(3.993333-7.986667]' (4.0)
|   |   |   |   |   |   |   |   |   |   |   |   PH > 7.35: '(-inf-3.993333]' (3.0/1.0)
|   |   |   |   |   |   |   |   |   |   |   INORG_SS > 17: '(-inf-3.993333]' (2.0)
|   |   |   |   |   |   |   |   BOD > 4
|   |   |   |   |   |   |   |   |   INORG_SS <= 19: '(-inf-3.993333]' (20.0/1.0)
|   |   |   |   |   |   |   |   |   INORG_SS > 19: '(3.993333-7.986667]' (5.0/1.0)
|   |   |   |   |   |   PH > 7.65: '(-inf-3.993333]' (19.0/1.0)
|   |   |   |   |   INORG_SS > 28
|   |   |   |   |   |   TURB <= 7.25
|   |   |   |   |   |   |   DO <= 6.5: '(-inf-3.993333]' (3.0)
|   |   |   |   |   |   |   DO > 6.5: '(7.986667-inf)' (3.0/1.0)
|   |   |   |   |   |   TURB > 7.25: '(-inf-3.993333]' (169.0/5.0)
TOT_1004 > 129.033005
|   INORG_SS <= 26
|   |   FLOW <= 64
|   |   |   BOD <= 3
|   |   |   |   PH <= 7.43: '(3.993333-7.986667]' (3.0)
|   |   |   |   PH > 7.43: '(-inf-3.993333]' (5.0)
|   |   |   BOD > 3
|   |   |   |   MONTH_NUM <= 10: '(3.993333-7.986667]' (11.0)
|   |   |   |   MONTH_NUM > 10
|   |   |   |   |   TEMP <= 9.9: '(-inf-3.993333]' (2.0)
|   |   |   |   |   TEMP > 9.9: '(3.993333-7.986667]' (3.0)
|   |   FLOW > 64: '(-inf-3.993333]' (22.0/1.0)
|   INORG_SS > 26
|   |   MONTH_NUM <= 8
|   |   |   PH <= 6.82
|   |   |   |   TURB <= 14: '(7.986667-inf)' (3.0)
|   |   |   |   TURB > 14: '(-inf-3.993333]' (4.0/1.0)
|   |   |   PH > 6.82
|   |   |   |   CBOD <= 3
|   |   |   |   |   TOT_1001 <= 58098.3
|   |   |   |   |   |   CHLOROPH <= 12.6
|   |   |   |   |   |   |   FLOW <= 197: '(3.993333-7.986667]' (64.0/9.0)
|   |   |   |   |   |   |   FLOW > 197
|   |   |   |   |   |   |   |   DO <= 7.3: '(3.993333-7.986667]' (2.0)
|   |   |   |   |   |   |   |   DO > 7.3: '(-inf-3.993333]' (6.0/1.0)
|   |   |   |   |   |   CHLOROPH > 12.6: '(-inf-3.993333]' (2.0)
|   |   |   |   |   TOT_1001 > 58098.3: '(3.993333-7.986667]' (30.0/4.0)
|   |   |   |   CBOD > 3
|   |   |   |   |   FLOW <= 280
|   |   |   |   |   |   PH <= 7.51: '(3.993333-7.986667]' (5.0/1.0)
|   |   |   |   |   |   PH > 7.51: '(7.986667-inf)' (2.0)
|   |   |   |   |   FLOW > 280: '(-inf-3.993333]' (3.0)
|   |   MONTH_NUM > 8
|   |   |   AVG_AIR_TEMP <= 24.74
|   |   |   |   PH <= 7.51: '(7.986667-inf)' (5.0)
|   |   |   |   PH > 7.51: '(3.993333-7.986667]' (3.0/1.0)
|   |   |   AVG_AIR_TEMP > 24.74: '(3.993333-7.986667]' (56.0/10.0)
Number of Leaves : 53
Size of the tree : 105
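J48 chooses each split, such as TOT_1004 <= 129.033005 at the root, by C4.5's gain ratio: the information gain of the binary split divided by its split information. The sketch below computes that ratio for a single numeric threshold. It is a simplified illustration. It leaves out C4.5's correction for the number of candidate thresholds and its confidence-based pruning (the -C 0.25 option). In the leaf notation of the tree above, such as (316.0/3.0), the first number is the instances reaching the leaf and the second is how many of them are misclassified.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split value <= threshold vs value > threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    p_left = len(left) / n
    remainder = p_left * entropy(left) + (1 - p_left) * entropy(right)
    gain = entropy(labels) - remainder
    split_info = -(p_left * math.log2(p_left) + (1 - p_left) * math.log2(1 - p_left))
    return gain / split_info


print(gain_ratio([1, 2, 3, 4], ["a", "a", "b", "b"], 2))
```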
=== Summary ===

Incorrectly Classified Instances 160 17.6796%
=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0.918 0.224 0.907 0.918 0.913 0.863 '(-inf-3.993333]'
0.691 0.096 0.678 0.691 0.684 0.775 '(3.993333-7.986667]'
0.297 0.039 0.365 0.297 0.328 0.581 '(7.986667-inf)'
Weighted Avg. 0.823 0.182 0.817 0.823 0.82 0.823
=== Confusion Matrix ===

a b c <-- classified as
585 40 12 | a = '(-inf-3.993333]'
42 141 21 | b = '(3.993333-7.986667]'
18 27 19 | c = '(7.986667-inf)'
Figure A.3. Decision tree for nitrate classification
A.4.5 Lazy learner (LWL)

SYNOPSIS
Locally weighted learning. Uses an instance-based algorithm to assign instance
weights which are then used by a specified WeightedInstancesHandler.
Can do classification (e.g. using naive Bayes) or regression (e.g. using linear
regression).
Scheme:weka.classifiers.lazy.LWL -U 0 -K -1 -A
"weka.core.neighboursearch.LinearNNSearch -A \"weka.core.EuclideanDistance -R first-last\"" -W weka.classifiers.trees.DecisionStump
weka.filters.unsupervised.attribute.Discretize-B3-M-1.0-R3
Instances: 905
Attributes: 154
=== Summary ===

Correctly Classified Instances 739 81.6575%
Incorrectly Classified Instances 166 18.3425%

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0.917 0.336 0.866 0.917 0.891 0.869 '(-inf-3.993333]'
0.76 0.108 0.671 0.76 0.713 0.87 '(3.993333-7.986667]'
0 0 0 0 0 0.647 '(7.986667-inf)'
Weighted Avg. 0.817 0.261 0.761 0.817 0.788 0.854
=== Confusion Matrix ===

a b c <-- classified as
584 53 0 | a = '(-inf-3.993333]'
49 155 0 | b = '(3.993333-7.986667]'
41 23 0 | c = '(7.986667-inf)'
A.4.6 NaiveBayes
SYNOPSIS
Class for a Naive Bayes classifier using estimator classes. Numeric estimator
precision values are chosen based on analysis of the training data. For this reason, the
classifier is not an UpdateableClassifier (which in typical usage are initialized with zero
training instances). If you need the UpdateableClassifier functionality, use the
NaiveBayesUpdateable classifier. The NaiveBayesUpdateable classifier will use a
default precision of 0.1 for numeric attributes when buildClassifier is called with zero
training instances.
Scheme:weka.classifiers.bayes.NaiveBayes
Instances: 905
Attributes: 154
=== Summary ===
=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0.907 0.34 0.864 0.907 0.885 0.879 '(-inf-3.993333]'
0.75 0.108 0.668 0.75 0.707 0.866 '(3.993333-7.986667]'
0 0.008 0 0 0 0.679 '(7.986667-inf)'
Weighted Avg. 0.808 0.264 0.759 0.808 0.782 0.862
=== Confusion Matrix ===

a b c <-- classified as
578 53 6 | a = '(-inf-3.993333]'
50 153 1 | b = '(3.993333-7.986667]'
41 23 0 | c = '(7.986667-inf)'
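For numeric attributes, Weka's NaiveBayes models each attribute within each class with a normal distribution and combines these likelihoods with the class prior. The synopsis notes that the numeric precision is chosen from the training data. A minimal Gaussian naive Bayes sketch in log space follows. It uses plain maximum-likelihood means and variances and a small variance floor. These are simplifications of Weka's estimator rather than a reproduction of it. The function names are illustrative.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Per-class log prior plus per-attribute (mean, variance) pairs."""
    by_class = defaultdict(list)
    for x, y in zip(rows, labels):
        by_class[y].append(x)
    n = len(rows)
    model = {}
    for cls, xs in by_class.items():
        stats = []
        for j in range(len(xs[0])):
            col = [x[j] for x in xs]
            mu = sum(col) / len(col)
            var = sum((v - mu) ** 2 for v in col) / len(col)
            stats.append((mu, max(var, 1e-9)))  # floor avoids division by zero
        model[cls] = (math.log(len(xs) / n), stats)
    return model


def predict_nb(model, x):
    """Class with the largest log prior + sum of Gaussian log-likelihoods."""
    def log_posterior(cls):
        log_prior, stats = model[cls]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
            for v, (mu, var) in zip(x, stats))
    return max(model, key=log_posterior)


nb = fit_gaussian_nb([[1.0], [1.2], [0.8], [5.0], [5.2], [4.8]], ["a", "a", "a", "b", "b", "b"])
print(predict_nb(nb, [1.1]), predict_nb(nb, [4.9]))
```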
Figure A.4. ROC for the six classification models
Figure A.5. ROC for the ANN model

APPENDIX B
BASINS/HSPF
[Figure: effective impervious area, EIA (%), versus mapped impervious area, MIA (%), with curves for a totally connected basin (EIA = MIA), average basins (EIA = 0.1(MIA)^1.5), an extremely disconnected basin (EIA = 0.01(MIA)^2), and the USGS (Laenen) relation.]
Figure B.1. Plot of Sutherland Equations and USGS (Laenen, 1983) equation that
illustrate relationships between TIA and EIA for a range of watersheds (Sutherland,
2000).
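Three of the curve labels that survive in the scan of Figure B.1 correspond to the Sutherland (2000) relations for totally connected, average and extremely disconnected basins. The sketch below encodes just those three, with TIA and EIA both in percent. The function names are illustrative. The remaining curves in the figure are not reproduced because their coefficients are not legible here.

```python
def eia_totally_connected(tia_pct):
    """Totally connected basin: all impervious area is effective."""
    return tia_pct


def eia_average(tia_pct):
    """Average basin: EIA = 0.1 * TIA^1.5 (percent)."""
    return 0.1 * tia_pct ** 1.5


def eia_extremely_disconnected(tia_pct):
    """Extremely disconnected basin: EIA = 0.01 * TIA^2 (percent)."""
    return 0.01 * tia_pct ** 2


for tia in (10, 25, 50, 100):
    print(tia, eia_totally_connected(tia), round(eia_average(tia), 2),
          round(eia_extremely_disconnected(tia), 2))
```

For a basin with 25 % TIA, these give EIA of 25 %, 12.5 % and 6.25 % respectively. All three curves meet at EIA = TIA = 100 %.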
Land Cover Class Notes Mean Range Reference
Single-family residential < 0.25 acre lots 30 30-4 Alley and Veenhuis (1983)
0.25-0.5 acre lots 26 22-31 Alley and Veenhuis (1983)
0.5-1.0 acre lots 15 13-16 Alley and Veenhuis (1983)
Includes multi-family residential 30 22-44 Sullivan et al. (1978)
Multiple-family residential 66 53-64 Alley and Veenhuis (1983)
Commercial ss 66-08 Alley and Veenhuis (1983)
81 52-00 Sullivan et al. (1978)
Industrial Ml Alley and Veenhuis (1983)
40 11-57 Sullivan et al. (1978)
Open 5 1-14 Sullivan et al. (1978)
Figure B.2. Percentage Imperviousness for Various Land Cover Classes as Calculated
Directly from Aerial Photo and Map Analysis (Brabec et al., 2010)
[Table image: percentage TIA and percentage EIA reported for land use categories (agricultural land/open space, public and quasi-public, parks, golf courses, low-, medium-, "suburban"- and high-density single-family residential, mobile homes, multifamily, commercial, industrial, highways, and construction sites) by studies including Cooper, Taylor, Alley and Veenhuis, the City of Olympia, Griffin et al., Stankowski et al., USDA, King County, and the Rouge Program Office; the individual values are not legible in this copy.]
NOTE: The number of land use classes varies considerably between studies. USDA = U.S. Department of Agriculture.
a. Abstracted from Alley and Veenhuis (1983), IMch and Fbbert (10%), Taylor (loo^j), Beyerlein 0006)
b. From King County Surface Water Management Division (1990), Department of Public Works, and Fi:i/fterrett Consulting Group (io<JO), Snoqualmie Ridge Draft Master Drainage Plan.
c. Based on direct measurement from aerial photos and field inspection from nineteen basins in the Denver area.
d. Total and effective impervious area percentages compiled from County Surface Water Management (1990), PFf/Barrett Consulting Group (1^1), Snoqualmie Ridge Draft Master Drainage Plan; Alley and Veenhuis (1983); and, for the open land/agricultural land category, estimated based on similar land uses.
e. No discussion of methodology for determining impervious figures.
f. The source for the percentage imperviousness figures is not indicated in the report.
g. Based on general field observations and studies by Carter (1961), Felton and Lull (1963), Antoine (1964), and Stall et al. (WTO). These reference studies are not New Jersey specific.
h. Measured from aerial photographs and a field survey of three sample areas per land use category in each watershed.
i. Measured from topographic maps.
Figure B.3. The Percentage Impervious Area Ascribed to Various Land Use Categories,
Showing the Relationship of Total Impervious Area (TIA) to Effective Impervious Area
(EIA) Used in Various Studies (Brabec et al., 2010)
Table B.1 Simulated annual loads of total nitrogen from different land use segments in the
Upper Chicago River subbasin
Land Use Type | Perv. Loads (lbs/acre/yr) | Imperv. Loads (lbs/acre/yr) | % EIA | Combined Loads (lbs/acre/yr) | Area (acres) | Total Annual Loads (lbs) | % Loads
Residential Single Family | 1.2216 | 9.1722 | 20% | 2.8288 | 61776 | 174742.6 | 46.28%
Residential Multi Family | 1.2216 | 9.1722 | 25% | 3.2094 | 9595 | 30794.2 | 8.26%
Urban Mix W/ Parking Lot | 1.2222 | 9.1722 | 40% | 4.4022 | 5924 | 26076.8 | 7.10%
Industrial W/ Parking Lot | 1.2134 | 9.1722 | 40% | 4.397 | 5403 | 23754.6 | 6.48%
Education | 1.2222 | 9.1722 | 25% | 3.2098 | 3722 | 11945.6 | 3.22%
Interstate/Toll | 1.2222 | 9.1722 | 40% | 4.4022 | 2315 | 10192.8 | 2.78%
Open Space Cons | 0.8778 | 9.1722 | 0% | 0.8778 | 11554 | 10140 | 2.42%
Lake/Reservoirs/Lagoon | 0.7852 | 9.1722 | 50% | 4.9788 | 1470 | 7317.6 | 2.02%
Business W/ Parking Lot | 1.2222 | 9.1722 | 40% | 4.4022 | 1603 | 7055.8 | 1.92%
Golf Course | 0.7856 | 9.1722 | 0% | 0.7856 | 9455 | 7428 | 1.78%
Government | 1.2222 | 9.1722 | 40% | 4.402 | 1441 | 6343.2 | 1.74%
Manufacturing/Production | 1.2222 | 9.1722 | 39% | 4.316 | 1098 | 4740.4 | 1.28%
Office Cmps | 1.2222 | 9.1722 | 37% | 4.1568 | 1111 | 4618.4 | 1.26%
Utilities/Waste | 1.2222 | 9.1722 | 40% | 4.4022 | 1042 | 4588.8 | 1.26%
Vacant/Grass | 0.7856 | 9.1722 | 0% | 0.7856 | 5489 | 4312.4 | 1.04%
Transportation | 1.2222 | 9.1722 | 40% | 4.4022 | 841 | 3701.4 | 1.02%
Religious | 1.2222 | 9.1722 | 25% | 3.2096 | 1104 | 3544.6 | 0.92%
Single Office | 1.2222 | 9.1722 | 22% | 2.9696 | 973 | 2889 | 0.80%
Warehouse/Distribution/Wholesale | 1.2222 | 9.1722 | 40% | 4.4022 | 626 | 2756.4 | 0.76%
Open Space Recreational | 0.7914 | 9.1722 | 0% | 0.7914 | 3832 | 3032.6 | 0.72%
Retail Center | 1.2222 | 9.1722 | 25% | 3.2098 | 851 | 2729.6 | 0.72%
Urban Mix No Parking Lot | 1.2222 | 9.1722 | 25% | 3.2096 | 845 | 2713.6 | 0.72%
Medical | 1.2222 | 9.1722 | 33% | 3.8188 | 709 | 2706 | 0.72%
Other Roadway | 1.2222 | 9.1722 | 40% | 4.4022 | 532 | 2339.8 | 0.64%
Cultural/Entertainment | 1.2222 | 9.1722 | 25% | 3.2096 | 684 | 2196.4 | 0.60%
Crops/Grain/Graze | 1.7074 | 9.1722 | 0% | 1.7074 | 1218 | 2080 | 0.48%
Construction Residential | 1.2222 | 9.1722 | 25% | 3.2098 | 507 | 1626.4 | 0.42%
Construction Non-Residential | 1.2222 | 9.1722 | 25% | 3.2098 | 497 | 1595.4 | 0.42%
Cemetery | 1.1272 | 9.1722 | 0% | 1.1272 | 1396 | 1574.2 | 0.36%
Mall | 1.2222 | 9.1722 | 40% | 4.4022 | 300 | 1322.4 | 0.34%
Rivers/Canals | 0.7852 | 9.1722 | 50% | 4.9788 | 248 | 1236.2 | 0.34%
Residential Mobile Home | 1.2222 | 9.1722 | 25% | 3.2098 | 404 | 1295 | 0.32%
Hotel/Motel | 1.2222 | 9.1722 | 25% | 3.2094 | 217 | 697.4 | 0.20%
Wetland | 0.7852 | 9.1722 | 0% | 0.7852 | 1122 | 881.2 | 0.18%
Institutional/Other | 1.2222 | 9.1722 | 40% | 4.4022 | 91 | 402.4 | 0.10%
Nursery/Greenhouse/Orch | 1.708 | 9.1722 | 0% | 1.708 | 231 | 394.6 | 0.10%
Other vacant | 0.8778 | 9.1722 | 0% | 0.8778 | 297 | 260.4 | 0.08%
Independent Auto Parking | 1.2222 | 9.1722 | 39% | 4.3452 | 42 | 182.2 | 0.02%
Communication | 1.2222 | 9.1722 | 25% | 3.2104 | 51 | 163.4 | 0.00%
Open Space Private | 0.7856 | 9.1722 | 0% | 0.7856 | 110 | 86.6 | 0.00%
Water | 0.7852 | 9.1722 | 0% | 0.7852 | 84 | 65.6 | 0.00%
Open Space Linear | 0.7856 | 9.1722 | 0% | 0.7856 | 63 | 49.2 | 0.00%
Open Space Other | 0.7856 | 9.1722 | 0% | 0.7856 | 39 | 30.8 | 0.00%
Residential Farm | 1.2222 | 9.1722 | 7% | 1.7772 | 11 | 18.8 | 0.00%
Total / Average | 1.1271 | 9.1722 | 23% | 130.48 | 140923 | 376622.8 | 100%
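In Tables B.1 and B.2, the Combined column appears to be the pervious and impervious unit loads weighted by the effective impervious fraction. Total Annual Loads appears to be that combined unit load multiplied by the segment area. Take Urban Mix W/ Parking Lot in Table B.1 as an example:

- Combined load: 1.2222 × 0.60 + 9.1722 × 0.40 = 4.4022 lbs/acre/yr.
- Annual load: 4.4022 × 5924 acres ≈ 26,079 lbs, close to the tabulated 26,076.8 lbs.

The small gap presumably comes from rounding of the EIA fraction or the area. The sketch below encodes that reading of the arithmetic. It is our reconstruction, not the HSPF computation itself, and the function names are illustrative.

```python
def combined_load(perv_load, imperv_load, eia_fraction):
    """EIA-weighted unit load (lbs/acre/yr) for one land use segment."""
    return perv_load * (1.0 - eia_fraction) + imperv_load * eia_fraction


def annual_load(perv_load, imperv_load, eia_fraction, area_acres):
    """Total annual load (lbs/yr) = combined unit load x segment area."""
    return combined_load(perv_load, imperv_load, eia_fraction) * area_acres


# Urban Mix W/ Parking Lot row of Table B.1 (total nitrogen).
unit = combined_load(1.2222, 9.1722, 0.40)
total = annual_load(1.2222, 9.1722, 0.40, 5924)
print(round(unit, 4), round(total, 1))
```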
Table B.2 Simulated annual loads of total Phosphorus from different land use segment in
the Upper Chicago River subbasin
Land Use Type Perv. Imperv. % EIA Combined Area Total % Loads
Loads Loads (lbs/acre/ (acres) Annual
(lbs/acre (lbs/acre/ yr) Loads (lbs)
/yr) yr)
Residential 0.1496 0.3362 20% 0.1876 61776 11582.4 47.66%

Single Family
Residential Multi Family 0.1496 0.3362 25% 0.1964 9595 1884.6 8.00%
Urban Mix W/ Parking Lot 0.1496 0.3362 40% 0.2244 5924 1329.4 6.14%
Open Space Cons 0.1496 0.3362 0% 0.1496 11554 1730.8 5.80%
Industrial W/ Parking Lot 0.1496 0.3362 40% 0.2244 5403 1212.4 5.60%
Education 0.1496 0.3362 25% 0.1964 3722 731 3.12%
Interstate/Toll 0.1496 0.3362 40% 0.2244 2315 519.6 2.40%
Golf Course 0.0578 0.3362 0% 0.0578 9455 546.8 1.82%
Business W/ Parking Lot 0.1496 0.3362 40% 0.2244 1603 359.6 1.64%
Government 0.1496 0.3362 40% 0.2244 1441 323.4 1.50%
Lake/ Reservoirs/ Lagoon 0.0688 0.3362 50% 0.2026 1470 298 1.50%
Manufacturing/ Production 0.1496 0.3362 39% 0.2224 1098 244.2 1.10%
Office Cmps 0.1496 0.3362 37% 0.2186 1111 243 1.10%
Utilities/Waste 0.1496 0.3362 40% 0.2244 1042 234 1.08%
Vacant/ Grass 0.0578 0.3362 0% 0.0578 5489 317.6 1.06%
Religious 0.1496 0.3362 25% 0.1964 1104 217 0.92%
Transportation 0.1496 0.3362 40% 0.2244 841 188.8 0.88%
Single Office 0.1496 0.3362 22% 0.1908 973 185.8 0.80%
Open Space Recreational 0.0578 0.3362 0% 0.0578 3832 221.6 0.74%
Retail Center 0.1496 0.3362 25% 0.1964 851 167 0.72%
Urban Mix No Parking Lot 0.1496 0.3362 25% 0.1964 845 166.2 0.72%
Medical 0.1496 0.3362 33% 0.2106 709 149.4 0.66%
Warehouse/ Distribution/ Wholesale 0.1496 0.3362 40% 0.2244 626 140.4 0.64%
Cultural/ Entertainment 0.1496 0.3362 25% 0.1964 684 134.4 0.56%
Other Roadway 0.1496 0.3362 40% 0.2244 532 119.2 0.54%
Construction Non-Residential 0.1496 0.3362 25% 0.1964 497 97.8 0.42%
Construction Residential 0.1496 0.3362 25% 0.1964 507 99.6 0.42%
Crops/ Grain/ Graze 0.089 0.3362 0% 0.089 1218 108.6 0.34%
Mall 0.1496 0.3362 40% 0.2244 300 67.4 0.32%
Residential Mobile Home 0.1496 0.3362 25% 0.1964 404 79.2 0.32%
Rivers/ Canals 0.0688 0.3362 50% 0.2026 248 50 0.26%
Wetland 0.0688 0.3362 0% 0.0688 1122 77.2 0.26%
Cemetery 0.055 0.3362 0% 0.055 1396 76.6 0.24%
Hotel/ Motel 0.1496 0.3362 25% 0.1964 217 42.4 0.20%
Other vacant 0.1496 0.3362 0% 0.1496 297 44.6 0.14%
Institutional/ Other 0.1496 0.3362 40% 0.2244 91 20.6 0.10%
Nursery/ Greenhouse/ Orchard 0.09 0.3362 0% 0.09 231 20.8 0.06%
Independent Auto Parking 0.1496 0.3362 39% 0.223 42 9.6 0.02%
Communication 0.1496 0.3362 25% 0.1964 51 10 0.00%
Open Space Linear 0.0578 0.3362 0% 0.0578 63 3.6 0.00%
Open Space Other 0.0578 0.3362 0% 0.0578 39 2.2 0.00%
Open Space Private 0.0578 0.3362 0% 0.0578 110 6.4 0.00%
Residential Farm 0.1496 0.3362 7% 0.1628 11 1.6 0.00%
Water 0.069 0.3362 0% 0.069 84 5.6 0.00%
Total / Average 0.1249 0.3362 23% 7.47 140923.0 24070.4 100%
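The fourth and sixth numeric columns of the table above appear to follow from the others by simple arithmetic. Because the table header lies outside this excerpt, the column meanings in the sketch below are assumptions inferred from the numbers: the first coefficient is treated as applying to the pervious share of a land use, the second (0.3362 in every row) to the impervious share, and the load as the composite coefficient times the area. Under those assumptions the recomputed values agree with the tabulated ones to within a few tenths of a percent; for a 25% impervious land use, 0.1496 × 0.75 + 0.3362 × 0.25 ≈ 0.196, against 0.1964 in the table. The field and function names are hypothetical.

```python
# Sketch: recompute the composite coefficient and load columns of Table B.2.
# ASSUMPTION: column meanings are inferred from the numbers because the table
# header is not part of this excerpt; all field names are hypothetical.
from dataclasses import dataclass


@dataclass
class LandUseRow:
    name: str
    coef_pervious: float    # assumed: coefficient for the pervious share
    coef_impervious: float  # assumed: coefficient for the impervious share
    impervious: float       # impervious fraction (0.25 for 25%)
    area: float             # assumed: area, in the table's unstated units


def composite_coefficient(row: LandUseRow) -> float:
    """Blend the two coefficients, weighted by the impervious fraction."""
    return (row.coef_pervious * (1.0 - row.impervious)
            + row.coef_impervious * row.impervious)


def load(row: LandUseRow) -> float:
    """Assumed load column: composite coefficient times area."""
    return composite_coefficient(row) * row.area


# Four rows transcribed from Table B.2 (composite and load columns omitted).
rows = [
    LandUseRow("Residential Multi Family", 0.1496, 0.3362, 0.25, 9595),
    LandUseRow("Urban Mix W/ Parking Lot", 0.1496, 0.3362, 0.40, 5924),
    LandUseRow("Lake/ Reservoirs/ Lagoon", 0.0688, 0.3362, 0.50, 1470),
    LandUseRow("Golf Course", 0.0578, 0.3362, 0.00, 9455),
]

for r in rows:
    print(f"{r.name:<26} composite={composite_coefficient(r):.4f} "
          f"load={load(r):.1f}")
```

A check of this kind only shows that the columns are arithmetically consistent; it says nothing about how the coefficients themselves were derived.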
Table B.3 Land use codes (as used in physical and data driven models)
code code code

1110 RES/SF 1 11 111
1120 RES/FARM 1 16 169
1130 RES/MF 1 11 112
1140 RES/MOBILE HM 1 11 115
1211 MALL 1 15 153
1212 RETAIL CNTR 1 12 121
1221 OFFICE CM PS 1 15 152
1222 SINGL OFFICE 1 12 123
1223 BUS. PARK 1 15 152
1231 URB MX W/PRKNG 1 16 169
1232 URB MX NO PRKNG 1 16 169
1240 CULT/ENT 1 12 128
1250 HOTEL/MOTEL 1 11 114
1310 MEDICAL 1 14 149
1320 EDUCATION 1 12 127
1330 GOVT 1 12 125
1340 PRISON 1 12 126
1350 RELIGOUS 1 12 128
1360 CEMETERY 1 17 174
1370 INST/OTHER 1 12 129
1410 MINERAL EXT 1 13 137
1420 MANUF/PROC 1 13 139
1430 WAREH/DIST/WHOL 1 12 122
1440 INDUSTPK 1 15 151
1511 INTERSTATE/TOLL 1 14 144
1512 OTHER ROADWY 1 14 144
1520 OTH LINEAR TRAN 1 14 144
1530 AIRTRANSPORT 1 14 141
1540 INDEP AUTO PRK 1 15 152
1550 COMMUNICATION 1 14 145
1560 UTILITIES/WASTE 1 14 147
2100 CROP/GRAIN/GRAZ 2 21 213
2200 NRSRY/GRNHS/ORC 2 22 221
2300 AG/OTHER 2 24 249
3100 OPENSP REC 1 17 173
3200 GOLF COURSE 1 17 173
3300 OPENSP CONS 1 17 179
3400 OPENSP PRIVATE 1 17 179
3500 OPENSP LINEAR 1 17 179
3600 OPENSP OTHER 1 17 179

4110 VAC FOR/GRASS 2 24 249
4120 WETLAND 6 6 6
4210 CONST RES 1 11 117
4220 CONST NONRES 1 12 129
4300 OTHER VACANT 1 17 179
5100 RIVERS/CANALS 5 51 512
5200 LAKE/RES/LAGOON 5 51 513
5300 LAKE MICHIGAN 5 51 513
9999 OUT OF REGION
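Table B.3 maps each four-digit land use code to three coarser grouping codes used by the models. The extracted header does not label those three columns. Their nested values (for example 1, 11, 111 for single-family residential and 5, 51, 513 for lakes) look consistent with levels I to III of the Anderson et al. (1976) classification cited in the bibliography, but that reading is an assumption. The sketch below, with hypothetical function names, shows how per-code areas could be rolled up to any of the three levels. Only a few rows are transcribed, and code 9999 (out of region), which carries no grouping codes, is left out.

```python
# Sketch: roll the four-digit codes of Table B.3 up to a coarser grouping level.
# ASSUMPTION: the three right-hand columns are nested grouping levels; the
# function names here are hypothetical, not taken from the dissertation.

# A few rows transcribed from Table B.3: code -> (label, (level1, level2, level3))
LAND_USE_CODES: dict[int, tuple[str, tuple[int, int, int]]] = {
    1110: ("RES/SF", (1, 11, 111)),
    1130: ("RES/MF", (1, 11, 112)),
    1212: ("RETAIL CNTR", (1, 12, 121)),
    1420: ("MANUF/PROC", (1, 13, 139)),
    1511: ("INTERSTATE/TOLL", (1, 14, 144)),
    2100: ("CROP/GRAIN/GRAZ", (2, 21, 213)),
    3200: ("GOLF COURSE", (1, 17, 173)),
    4120: ("WETLAND", (6, 6, 6)),
    5100: ("RIVERS/CANALS", (5, 51, 512)),
    5200: ("LAKE/RES/LAGOON", (5, 51, 513)),
}


def group_code(code: int, level: int) -> int:
    """Return the grouping code (level 1, 2 or 3) for a four-digit code."""
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    _label, groups = LAND_USE_CODES[code]
    return groups[level - 1]


def aggregate_by_level(areas: dict[int, float], level: int) -> dict[int, float]:
    """Sum per-code areas into the grouping codes at the requested level."""
    totals: dict[int, float] = {}
    for code, area in areas.items():
        key = group_code(code, level)
        totals[key] = totals.get(key, 0.0) + area
    return totals


# Hypothetical areas, for illustration only.
example_areas = {1110: 10.0, 1130: 5.0, 2100: 2.0}
print(aggregate_by_level(example_areas, 1))
print(aggregate_by_level(example_areas, 2))
```

With these hypothetical areas, the level-1 roll-up is {1: 15.0, 2: 2.0} and the level-2 roll-up is {11: 15.0, 21: 2.0}, since both residential codes share the groups 1 and 11.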
BIBLIOGRAPHY
Abedini, M.J., & Nasseri, M. (2004). Spatiotemporal rainfall forecasting via ANNs
coupled with GA. In: Liong, Phoon, Babovic (Eds.), Sixth International
Conference on Hydroinformatics.
Ahmed, A., Ploennigs, J., Menzel, K., & Cahill, B. (2010). Multi-dimensional building
performance data management for continuous commissioning. Advanced
Engineering Informatics, 24, 466-475.
Ahmed, A., Korres, N., Ploennigs, J., Elhadi, H., & Menzel, K. (2011). Mining building
performance data for energy-efficient operation. Advanced Engineering
Informatics, 25(2), 341-354.
Ahmed, I., Azhar, S., & Lukauskis, P. (2004). Development of a decision support system
using data warehousing to assist builders/developers in site selection. Automation
in Construction, 13(4), 525-542.
Akhavan, S., Abedi-Koupai, J., Mousavi, S.-F., Afyuni, M., Eslamian, S.S., &
Abbaspour, K.C. (2010). Application of SWAT model to investigate nitrate
leaching in Hamadan-Bahar Watershed, Iran. Agriculture, Ecosystems and
Environment, 139(4), 675-688.
Allan, J. D. (2004). Landscapes and Riverscapes: The influence of land use on stream
ecosystems. Annual Review of Ecology, Evolution, and Systematics, 35, 257-284.
Alley, W.M., & Veenhuis, J.E. (1983). Effective impervious area in urban runoff
modeling. Journal of Hydraulic Engineering, 109(2), 313-319.
Anderson, J.R., Hardy, E.E., Roach, J.T., & Witmer, R.E. (1976). A land use and land
cover classification system for use with remote sensor data (U.S. Geological
Survey Professional Paper 964). Retrieved from
http://landcover.usgs.gov/pdf/anderson.pdf
Arabi, M. (2005). A modeling framework for evaluation of watershed management
practices for sediment and nutrient control (Doctoral thesis). Available from
ProQuest database.
Arabi, M., Govindaraju, R.S., Hantush, M. M., & Engel, B. A. (2006). Role of watershed
subdivision on modeling the effectiveness of best management practices with
SWAT. Journal of the American Water Resources Association, 42(2), 513-528.
Arnold, J. G., Srinivasan, R., Muttiah, R. S., & Williams, J. R. (1998). Large area
hydrologic modeling and assessment - Part 1: Model development. Journal of the
American Water Resources Association, 34(1), 73-89.
Arnold, J. G., Potter, K.N., King, K.W., & Allen, P.M. (2005). Estimation of soil
cracking and the effect on surface runoff in a Texas Blackland Prairie watershed.
Hydrological Processes, 19(3), 589-603.
Ahmad, H. M. N. (2010). Modeling hydrology and nitrogen export for the Thomas
Brook watershed with SWAT (Master of applied science thesis). ISBN: 978-0-494-68078-0.
Alpaydin, E. (2010). Introduction to machine learning (2nd ed.). The MIT Press.
Asefa, T., Kemblowski, M., McKee, M., & Khalil, A. (2006). Multi-time scale stream flow
predictions: the support vector machines approach. Journal of Hydrology, 318, 7-16.
Ahearn, D.S., Sheibley, R.W., Dahlgren, R.A., Anderson, M., Johnson, J., & Tate, K.W.
(2005). Land use and land cover influence on water quality in the last free-flowing
river draining the western Sierra Nevada, California. Journal of Hydrology, 313,
234-247.
Baldys, S., Raines, T. H., Mansfield, B. L., & Sandlin, J. T. (1998). Urban stormwater
quality, event-mean concentrations, and estimates of stormwater pollutant loads.
U.S. Geological Survey Water-Resources Investigation Report 98-4158.
Barling, R.D., & Moore, I.D. (1994). Role of buffer strips in management of waterway
pollution: A review. Environmental Management, 18(4), 543-558.
Barnes, K. B., Morgan, J. M., & Roberge, M. C. (2002). Impervious surfaces and the
quality of natural and built environments. Department of Geography and
Environmental Planning, Towson University. Retrieved from
http://pages.towson.edu/morgan/files/Impervious.pdf
Bartosova, A., Singh, J., Slowikowski, J., Machesky, M., & McConkey, S. (2005).
Overview of recommended phase III water quality monitoring: Fox River
investigation. Illinois State Water Survey, ISWS CR 2005-13.
Bartosova, A., Singh, J., Rahim, M., & McConkey, S. (2007). Fox River Watershed
investigation: Stratton Dam to the Illinois River, phase II: hydrologic and water
quality simulation models, part 3: validation of hydrologic model parameters,
Brewster Creek, Ferson Creek, Flint Creek, Mill Creek, and Tyler Creek
Watersheds. Illinois State Water Survey, ISWS CR 2007-07.
Basnyat, P., Teeter, L.D., Flynn, K.M., & Lockaby, B.G. (1999). Relationships between
landscape characteristics and nonpoint source pollution inputs to coastal estuaries.
Environmental Management, 23(4), 539-549.
Beach, D. (2002). Coastal sprawl: the effects of urban design on aquatic ecosystems in
the United States. Pew Oceans Commission, Arlington. Retrieved from
http://www.pewtrusts.org/uploadedFiles/wwwpewtrustsorg/Reports/Protecting_ocean_life/env_pew_oceans_sprawl.pdf
Beaulac, M. N., & Reckhow, K. H. (1982). An examination of land use-nutrient export
relationships. Water Resources Bulletin, 18(6), 1013-1024.
Beran, B., & Piasecki, M. (2009). Engineering new paths to water data. Computers &
Geosciences, 35(4), 753-760.
Bergman, M. J., Green, W., & Donnangelo, L. J. (2002). Calibration of storm loads in the
South Prong watershed, Florida, using BASINS/HSPF. Journal of the American
Water Resources Association, 38, 1423-1436.
Bernardino, J.R. (2002). Approximate Query Answering Using Data Warehouse
Striping. Journal of Intelligent Information Systems, 19(2), 145-167.
Bhaduri, B., Harbor, J., Engel, B. A., & Grove, M. (2000). Assessing watershed-scale,
long-term hydrologic impacts of land-use change using a GIS-NPS model.
Environmental Management, 26(6), 643-658.
Bhaduri, B., Minner, M., Tatalovich, S., & Harbor, J. (2001). Long-term hydrologic
impact of land use change: a tale of two models. Journal of Water Resources
Planning and Management, 127(1), 13-19.
Bian, B., Juan Cheng, X., & Li, L. (2011). Investigation of urban water quality using
simulated rainfall in a medium size city of China. Environmental Monitoring and
Assessment, 753(1-4), 217-229.
Bicknell, R., Imhoff, J., Kittle, L. Jr, Donigian, S. Jr, & Johanson, C. (1996).
Hydrological Simulation Program-Fortran User's Manual. U.S. Environmental
Protection Agency. Retrieved from
http://eng.odu.edu/cee/resources/model/mbin/hspf/dos/hspf vl 1 entiretv.pdf
Bicknell, B. R., Imhoff, J. C., Kittle, Jr, J. L., Jobes, T. H., & Donigian, Jr., A. S. (2005).
HSPF Version 12.2 User's Manual. U.S. Environmental Protection Agency.
Retrieved from
http://water.epa.gov/scitech/datait/models/basins/bsnsdocs.cfm#hspf
Bonifati, A., Cattaneo, F., Ceri, S., Fuggetta, A., & Paraboschi, S. (2001). Designing data
marts for data warehouses. ACM Transactions on Software
Engineering and Methodology, 10(4), 452-483.
Borah, D.K., & Bera, M. (2003). Watershed-scale hydrology and nonpoint source pollution
models: Review of mathematical bases. American Society of Agricultural
Engineers, 46(6), 1553-1566.
Borah, D. K., Yagow, G., Saleh, A., Barnes, P. L., Rosenthal, W., Krug, E. C., & Hauck,
L. M. (2006). Sediment and nutrient modeling for TMDL development and
implementation. American Society of Agricultural and Biological Engineers,
49(4), 967-986.
Borah, D. K. (2011). Hydrologic procedures of storm event watershed models: a
comprehensive review and comparison. Hydrological Processes, 25(22), 3412-3489.
Bosch, D.D., Sheridan, J.M., Lowrance, R.R., Hubbard, R.K, Strickland, T.C.,
Feyereisen, G.W., & Sullivan, D.G. (2007). Little river experimental watershed
database. Water Resources Research 43 (W09470), doi:10.1029/2006WR005844.
Bouraoui, F., Vachaud, G., & Chen, T. (1998). Prediction of the effect of climatic
changes and land use management on water resources. Physics and Chemistry of
the Earth, 23(4), 379-384.
Boynton, W. R., Garber, J.H., Summers, R., & Kemp, W. M. (1995). Inputs,
transformations, and transport of nitrogen and phosphorus in Chesapeake Bay and
selected tributaries. Estuaries, 18(1B), 285-314.
Brabec, E., Schulte, S., & Richards, P.L. (2002). Impervious surfaces and water quality: A
review of current literature and its implications for watershed planning.
Journal of Planning Literature, 16, 499.
Brett, M.T., Arhonditsis, G.B., Mueller, S.E., Hartley, D.M., Frodge, J.D., & Funke, D.E.
(2005). Non point source impacts on stream nutrient concentrations along a forest
to urban gradient. Environmental Management, 35(3), 330-42.
Brezonik, P. L., & Stadelmann, T. H. (2002). Analysis and predictive models of storm
water runoff volumes, loads, and pollution concentration from watersheds in the
Twin Cities metropolitan area, Minnesota, USA. Water Research, 36, 1743-1757.
Brun, S.E., & Band, L.E. (2000). Simulating runoff behavior in an urbanizing watershed.
Computers, Environment and Urban Systems, 24(1), 5-22.
Burmann, A., & Marx Gomez, J. (2007). Data Warehousing with Environmental Data.
Information Technologies in Environmental Engineering ITEE 3rd international
ICSC symposium, 153-160.
Calderon, C. V. (2009). Multi-objective optimization approach for land use allocation
based on water quality (Doctoral dissertation). Available from ProQuest database.
(UMI Number: 3401413).
Cappiella, K., & Brown, K. (2001). Derivations of Impervious Cover for Suburban Land
Uses in the Chesapeake Bay Watershed. Prepared for the U.S. EPA Chesapeake
Bay Program. Center for Watershed Protection, Ellicott City, MD, 51.
Carpenter, S., Caraco, N., Correll, D., Howarth, R., Sharpley, A., & Smith, V. (1998).
Nonpoint pollution of surface waters with phosphorus and nitrogen. Ecological
Applications, 8(3), 559-568.
Center for Watershed Protection (2003). Impacts of impervious cover on aquatic systems.
Center for Watershed Protection, Ellicott City, MD, 141 p.
Chang, H. (2004). Water quality impacts of climate and land use changes in southeastern
Pennsylvania. The Professional Geographer, 56(2), 240-257.
Chapra, S.C. (1997). Surface Water Quality Modeling. New York: McGraw-Hill Book
Company.
Chau, K.W., Cao, Y., Anson, M., & Zhang, J. (2002). Application of Data Warehouse
and Decision Support System in Construction Management. Automation in
Construction, 12(2), 213-224.
Chen, R., Chen, C., & Cheng, C. (2003). A Web-based ERP data mining system for
decision making. International Journal of Computer Applications in
Technology, 17(3), 156-158.
Chen, S.T., & Yu, P.S. (2007). Real-time probabilistic forecasting of flood stages.
Journal of Hydrology, 340, 63-77.
Chiang, Y.M., Hsu, K.L., Chang, F.J., Hong, Y., & Sorooshian, S. (2007). Merging
multiple precipitation sources for flash flood forecasting. Journal of Hydrology,
340, 183-196.
Choi, W., & Deal, B. M. (2008). Assessing hydrological impact of potential land use
change through hydrological and land use change modeling for the Kishwaukee
River Basin (USA). Journal of Environmental Management, 88, 1119-1130.
Chow, V.T., Maidment, D., & Mays, L. W. (1988). Applied Hydrology. McGraw Hill.
Cianfrani, C. M., Hession, W. C., & Rizzo, D. M. (2006). Watershed imperviousness
impacts on stream channel condition in southeastern Pennsylvania. Journal
of the American Water Resources Association (JAWRA), 42, 941-956.
Clesceri, N. L., Curran, S. J., & Sedlak, R. I. (1986). Nutrient loads to Wisconsin lakes:
Part I. Nitrogen and phosphorus export coefficients. Water Resources Bulletin,
22(6), 983-989.
Consortium of Universities for the Advancement of Hydrologic Science. (2012,
January). CUAHSI. Retrieved from http://his.cuahsi.org
Conway, T.M. & Lathrop, L.G. (2005). Alternative land use regulations and
environmental impacts: assessing future land use in an urbanizing watershed.
Landscape and Urban Planning, 70(1), 1-15.
Cotter, A. S., Chaubey I., Costello T. A., Soerens T.S., & Nelson, M. A. (2003). Water
quality model output uncertainty as affected by spatial resolution of input data.
Journal of the American Water Resources Association, 39(4), 977-986.
Dawson, C.W., & Wilby, R. (1998). An artificial neural network approach to rainfall
runoff modelling. Hydrological Sciences Journal, 43(1), 47-66.
Deb, K., (2001). Multi-objective optimization using evolutionary algorithms. Wiley.
Demissie, M., Singh, J., Knapp, H. V., Saco, P., & Lian, Y. (2007). Hydrologic model
development for the Illinois River Basin using BASINS 3.0. Illinois State Water
Survey, ISWS CR 2007-03.
Doll, B.A., Wise-Frederick, D. E., Buckner, C. M., Wilkerson, S. D., Harman, W. A.,
Smith, R. E., & Spooner, J. (2002). Hydraulic geometry relationships for urban
streams throughout the piedmont of North Carolina. Journal of the American
Water Resources Association (JAWRA), 38(3), 641-651.
Donigian, A.S., Imhoff, J.C., & Bicknell, B.R. (1983). Predicting water quality resulting
from agricultural nonpoint source pollution via simulation - HSPF. In
Agricultural Management and Water Quality. Ames, Iowa: Iowa State University
Press, 200-249.
Donigian, A.S., Bicknell, B.R., & Imhoff, J.C. (1995). Hydrological Simulation
Program-Fortran (HSPF). In: V. P. Singh (Editor), Computer Models of
Watershed Hydrology, Chapter 12. Water Resources Publications, Littleton, CO.
395-442.
Donigian, A.S. (2002). HSPF Watershed Model Calibration and Validation.
AQUA TERRA Consultants, Mountain View, California.
Driver, N. E., Mustard, M. H., Rhinesmith, R. B., & Middelburg, R. F. (1985). U.S.
Geological Survey urban-stormwater data base for 22 metropolitan areas
throughout the United States. United States Geological Survey, Open-File Report
85-337.
Environmental Protection Agency (2007). Better Assessment Science Integrating Point
and Nonpoint Sources (BASINS) version 4.0. EPA-823-C-07-001.
Environmental Protection Agency (2012). BASINS 4 lectures, data sets, and exercises.
Retrieved from http://water.epa.gov/scitech/datait/models/basins/training.cfm
Finkenbine, J.K., Atwater, J.W., & Mavinic, D. S. (2000). Stream health after
urbanization. Journal of the American Water Resources Association (JAWRA),
36(5), 1149-1160.
Fohrer, N., Haverkamp, S., Eckhardt, K., & Frede, H.G. (2001). Hydrologic response to
land use changes on the catchment scale. Physics and Chemistry of the Earth (B),
26(7-8), 577-582.
Freundlieb, M., & Teuteberg, F. (2009). Towards a Reference Model of an
Environmental Management Information System for Compliance Management.
Environmental Informatics and Industrial Environmental Protection: Concepts,
Methods and Tools. ISBN: 978-3-8322-8397-1.
Frink, C. R. (1991). Estimating nutrient exports to estuaries. Journal of Environmental
Quality, 20, 717-724.
Gburek, W. J., & Folmar, G. J. (1999). Flow and chemical contributions to streamflow in
an upland watershed: a baseflow survey. Journal of Hydrology, 217, 1-18.
Gosain, A., & Mann, S. (2010). Object Oriented Multidimensional Model for a Data
Warehouse with Operators. International Journal of Database Theory and
Application, 3(4), 35-40.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. (2009). The
WEKA data mining software: An update. SIGKDD Explorations, 11(1), 10-18.
Han, J., & Kamber, M. (2006). Data Mining: Concepts and Techniques (2nd ed.). Morgan
Kaufmann Publishers.
Hanratty, M.P., & Stefan, H.G. (1998). Simulating climate change effects in a Minnesota
agricultural watershed. Journal of Environmental Quality, 27(6), 1524-1532.
Harned, D. A., Atkins, J. B., & Harvill, J. S. (2004). Nutrient mass balance and trends,
Mobile River Basin, Alabama, Georgia, and Mississippi. Journal of the American
Water Resources Association (JAWRA), 40(3), 765-793.
Heathcote, I. W. (1998). Integrated watershed management: Principles and practice.
John Wiley & Sons Inc.
Horsburgh, J.S., Tarboton, D.G., Maidment, D.R., & Zaslavsky, I. (2008). A relational
model for environmental and water resources data. Water Resources Research, 44
(W05406), doi:10.1029/2007WR006392.
Horsburgh, J.S., Tarboton, D.G., Piasecki, M., Maidment, D.R., Zaslavsky, I., Valentine,
D., & Whitenack, T. (2009). An integrated system for publishing environmental
observations data. Environmental Modelling & Software, 24, 879-888.
Horsburgh, J.S., Tarboton, D.G., Maidment, D.R., & Zaslavsky, I. (2011).
Components of an environmental observatory information system. Computers &
Geosciences, 37(2), 207-218.
Hydroseek (2012). Retrieved from http://www.hydroseek.org/
Illinois Environmental Protection Agency (2009). Upper North Branch Chicago River
Watershed TMDL Stage 1 Report. Environmental Protection Agency. Retrieved
from http://www.epa.state.il.us/water/tmdl/report/chicago-river/stage-1-report.pdf
Im, S., Brannan, K.M., & Mostaghimi, S. (2003). Simulating hydrologic and water
quality impacts in an urbanizing watershed. Journal of the American Water
Resources Association, 39(6), 1465-1479.
Imrie, C.E., Durucan, S., & Korre, A. (2000). River flow prediction using artificial neural
networks: generalisation beyond the calibration range. Journal of Hydrology, 233,
138-153.
Inmon, B. (2005). Building the Data Warehouse. New York: John Wiley.
Jeon, J., Yoon, C. G., Donigian Jr., A. S., & Jung W. (2007). Development of the HSPF
Paddy model to estimate watershed pollutant loads in paddy farming regions.
Agricultural Water Management, 90(1-2), 75-86.
Jia, Y., Kinouchi, T., & Yoshitani, J. (2005). Distributed hydrologic modeling in a
partially urbanized agricultural watershed using water and energy transfer process
model. Journal of Hydrologic Engineering, 10(4), 253-264.
Johnson, M.P. (2001). Environmental impacts of urban sprawl: a survey of the literature
and proposed research agenda. Environment and Planning A, 33, 717-735.
Jones, T., Johnston, C., & Kipkie, C. (2003). Using annual hydrographs to determine
effective impervious area. Practical Modeling of Urban Water Systems, 11, 291-306.
Kambayashi, Y., Kumar, V., Mohania, M., & Samtani, S. (2004). Recent Advances and
Research Problems in Data Warehouse. Lecture Notes in Computer Science,
1552, 81-92.
Kimball, R. (2002). The Data Warehouse Toolkit: The Complete Guide to Dimensional
Modeling. Wiley publishing.
Krause, P., Boyle, D., & Bäse, F. (2005). Comparison of different efficiency criteria for
hydrological model assessment. Advances in Geosciences, 5, 89-97.
Knapp, H.V., Singh, J., & Andrew, K. (2004). Hydrologic Modeling of Climate
Scenarios for Two Illinois Watersheds. Illinois State Water Survey, ISWS CR
2004-07.
Laenen, A. (1983). Storm runoff as related to urbanization based on data collected in
Salem and Portland, and generalized for the Willamette Valley, Oregon. U.S.
Geological Survey Water Resources Investigations Report 83-4143. Retrieved
from http://or.water.usgs.gov/pubs_dir/orrpts.html
Lane, P. (2007). Data Warehousing Guide 11g, Oracle Database. Oracle.
LeBlanc, R. T., Brown, R. D. & FitzGibbon, J. E. (1997). Modeling the effects of land
use change on the water temperature in unregulated urban streams. Journal of
Environmental Management, 49, 445-469.
Leon, L.F., Booty, W., Wong, I., McCrimmon, C., Melles, S., Benoy, G., & Vanrobaeys,
J. (2010). Advances in the integration of watershed and lake modeling in the Lake
Winnipeg basin. Modelling for Environment's Sake: Proceedings of the 5th
Biennial Conference of the International Environmental Modelling and Software
Society, 860-867.
Lin, J. (2004). Review of published export coefficient and event mean concentration data.
US Army Corps of Engineers, Wetlands Regulatory Assistance Program ERDC
TN-WRAP-04-3. Retrieved from
http://el.erdc.usace.army.mil/elpubs/pdf/tnwrap04-3.pdf
Lin, G.F., & Wang, C.M. (2007). A nonlinear rainfall-runoff model embedded with an
automated calibration method - Part 1: The model. Journal of Hydrology, 341,
186-195.
Line, D. E., White, N. M., Osmond, D. L., Jennings, G. D., & Mojonnier, C. B. (2002).
Pollutant export from various land uses in the Upper Neuse River Basin. Water
Environment Research, 74(1), 100-108.
Linsley, R.K., Kohler, M.A., & Paulhus, J.L. H. (1988). Hydrology for engineers. New
York, NY: McGraw-Hill.
Loehr, R. C., Ryding, S. O., & Sonzogni, W. C. (1989). Estimating the nutrient load to a
waterbody. The Control of Eutrophication of Lakes and Reservoirs, 1, 115-146.
Luzio, M. D., Srinivasan, R., & Arnold, J. G. (2002). Integration of watershed tools and
SWAT model into BASINS. Journal of the American Water Resources Association, 38(4),
1127-1142.
McFarland, A. M. S., & Hauck, L. M. (2001). Determining nutrient export coefficients
and source loading uncertainty using in-stream monitoring data. Journal of the
American Water Resources Association, 37(1), 223-236.
McGuire, M., & Gangopadhyay, A. (2006). Modeling, visualizing, and mining
hydrologic spatial hierarchies for water quality management. ASPRS Annual
Conference, Reno, Nevada. Retrieved from
http://www.asprs.org/a/publications/proceedings/reno2006/0094.pdf
McGuire, M., Gangopadhyay, A., Komlodi, A., & Swan, C. (2008). A user-centered
design for a spatial data warehouse for data exploration in environmental
research. Ecological Informatics, 3(4-5), 273-285.
Markel, D., & Shamir, U. (2002). Monitoring Lake Kinneret and its watershed: forming the
basis for management of a water supply lake. In: Rubin, H., Nachtnebel, P.,
Fuerst, J., Shamir, U. (Eds.), Water Resources Quality Preserving the Quality of
our Water Resources. Springer-Verlag, pp. 177-190.
Marks, D., Seyfried, M., Flerchinger, G., & Winstral, A. (2007). Research Data Collection
at the Reynolds Creek Experimental Watershed. Journal of Service Climatology,
7(4), 1-12.
Mattikalli, N. M., & Richards, K. S. (1996). Estimation of surface water quality changes
in response to land use change: application of the export coefficient model using
remote sensing and geographical information system. Journal of Environmental
Management, 48, 263-282.
Melching, C. S., Alp, E., Shrestha, R.L., & Lanyon, R. (2002). Simulation of water
quality during unsteady flow in the Chicago waterway system. Marquette
University. Retrieved from http://www.mu.edu/environment/Dearborn.pdf
Metropolitan Water Reclamation District of Greater Chicago (2007). Cook County
stormwater management plan. Metropolitan Water Reclamation District of
Greater Chicago. Retrieved from
http://www.mwrd.org/pv_obj_cache/pv_obj_id_036DA479F4B3B5B6D01E253FF79937856366100/filename/Final_CCSMP_021507.pdf
Metropolitan Water Reclamation District of Greater Chicago (2011). Information
retrieved from http://www.mwrd.org/
Miller, S. N., Semmens, D.J., Goodrich, D.C., Hernandez, M., Miller, R.C.,
Kepner, W.G., & Guertin, D.P. (2007). The automated geospatial watershed
assessment tool. Environmental Modelling and Software, 22(3), 365-377.
Minns, A.W., & Hall, M.J. (1996). Artificial neural networks as rainfall-runoff models.
Hydrological Sciences Journal, 41(3), 399-417.
Mohamoud, Y. M., Parmar, R., & Wolfe, K. (2010). Modeling Best Management
Practices (BMPs) with HSPF. ASCE Conf. Proc. Watershed Management
Conference 2010: Innovations in Watershed Management under Land Use and
Climate Change, doi:10.1061/41143(394)81.
Moran, S.M., Emmerich, W.E., Goodrich, D.C., Heilman, P., Holifield Collins, C.D.,
Keefer, T.O., Nearing, M.A., Nichols, M.H., Renard, K.G., Scott, R.L., Smith,
J.R., Stone, J.J., Unkrich, C.L., & Wong, J. (2008). Preface to special section on
Fifty Years of Research and Data Collection: U.S. Department of Agriculture
Walnut Gulch Experimental Watershed. Water Resources Research, 44
(W05S01), doi:10.1029/2007WR006083.
Muttil, N., & Liong, S.Y. (2004). Physically interpretable rainfall- runoff models using
genetic programming. In: Liong, Phoon, Babovic (Eds.), Sixth International
Conference on Hydroinformatics.
Muzik, I. (2002). A first-order analysis of the climate change effect on flood
frequencies in a subalpine watershed by means of a hydrological rainfall-runoff
model. Journal of Hydrology, 267(1-2), 65-73.
Najafi, M. Z. (2003). Watershed modeling of rainfall excess transformation into runoff.
Journal of Hydrology, 270(3-4), 273-281.
National Water Information System. (2012, January). NWIS. Retrieved from USGS
website http://waterdata.usgs.gov/nwis/
National Climatic Data Center. (2012, January). NCDC and NOAA. Retrieved from
NCDC website http://www.ncdc.noaa.gov
Nichols, M.H., & Anson, E. (2008). Southwest Watershed Research Center Data Access
Project. Water Resources Research, 44 (W05S03), doi:10.1029/2006WR005665.
Nu-Fang, F., Zhi-Hua, S., Lu, L., & Cheng, J. (2011). Rainfall, runoff, and suspended
sediment delivery relationships in a small agricultural watershed of the
Three Gorges area, China. Geomorphology, 135(1-2), 158-166.
Ould-Ahmed-Vall, E., Woodlee, J., Yount, C., Doshi, K., & Abraham, S. (2007). Using
model trees for computer architecture performance analysis of software
applications. IEEE International Symposium on Performance Analysis of Systems
and Software (ISPASS), 116-125.
Paul, M.J., & Meyer, J.L. (2001). Streams in the urban landscape. Annual Review of
Ecology and Systematics, 32, 333-365.
Preis, A., & Ostfeld, A. (2008). A coupled model tree-genetic algorithm scheme for flow
and water quality predictions in watersheds. Journal of Hydrology, 349, 364-375.
Qi, H. (2006). Integrated watershed management and land use optimization under
uncertainty (Doctoral thesis). Available from ProQuest database. (UMI Number:
3358529).
Rai, A., Malhotra, P.K., Sharma, S.D., & Chaturvedi, K.K. (2007). Data warehousing for
agricultural research- an integrated approach for decision making. Journal of the
Indian Society of Agricultural Statistics, 61(2), 264-273.
Rai, S.C. & Sharma, E. (1998). Comparative assessment of runoff characteristics under
different land use patterns within a Himalayan watershed. Hydrological Processes,
12, 2235-2248.
Rainardi, V. (2007). Building a Data Warehouse: With Examples in SQL Server.
Springer.
Ramireddygari, S. R., Sophocleous, M. A., Koelliker, J. K., Perkins, S. P., & Govindaraju,
R. S. (2000). Development and application of a comprehensive
simulation model to evaluate impacts of watershed structures and irrigation water
use on streamflow and groundwater: the case of Wet Walnut Creek Watershed,
Kansas, USA. Journal of Hydrology, 236(3-4), 223-246.
Rast, W., & Lee, G. F. (1983). Nutrient loading estimates for lakes. Journal of
Environmental Engineering, 109(2), 502-517.
Reckhow, K. H., Beaulac, M. N., & Simpson, J. T. (1980). Modeling phosphorus loading
and lake response under uncertainty: A manual and compilation of export
coefficients. U.S. EPA Report No. EPA-440/5-80-011, Office of Water
Regulations, Criteria and Standards Division. Retrieved from
http://nepis.epa.gov/
Regnier, P., O'Kane, J.P., Steefel, C.I., & Vanderborght, J.P. (2002). Modeling complex
multi-component reactive-transport systems: towards a simulation environment
based on the concept of a Knowledge Base. Applied Mathematical Modelling,
26(9), 913-927.
Ren, W.W., Zhong, Y., Meligrana, J., Anderson, B., Watt, W. E., Chen, J. K., & Leung,
H. L. (2003). Urbanization, land use, and water quality in Shanghai 1947-1996.
Environment International, 29(5), 649-659.
Rob, P., Coronel, C., & Crockett, K. (2008). Database Systems: Design,
Implementation and Management. Cengage Learning EMEA.
Robbins, P., & Birkenholtz, T. (2001). Lawns and toxins: an ecology of the city. Cities:
The International Journal of Urban Policy and Planning, 18(6), 369-380.
Robbins, P., & Birkenholtz, T. (2003). Turfgrass revolution: measuring the expanse of the
American lawn. Land Use Policy, 20, 181-194.
Rooy, P., Anderson, D., & Verstraelen, P. (1993). Integrated water management considers
whole water system. Water Environment and Technology, 5(4), 38-40.
Rose, S., & Peters, N.E. (2001). Effects of urbanization on streamflow in the Atlanta area
(Georgia, USA): a comparative hydrological approach. Hydrological Processes,
15, 1441-1457.
Rujirayanyong, T., & Shi, J.J. (2006). A project-oriented data warehouse for construction.
Automation in Construction, 15, 800-807.
Sahoo, G.B., Ray, C., & De Carlo, E.H. (2006). Use of neural network to predict flash
flood and attendant water qualities of a mountainous stream on Oahu, Hawaii.
Journal of Hydrology, 327, 525-538.
Sapsford, R., & Jupp, V. (2006). Data Collection and Analysis, 2nd ed. SAGE.
Schueler, T.R., & Holland, H. K. (1994). The importance of imperviousness.
Watershed Protection Techniques, 1(3), 100-111.
Schueler, T.R. (1995). Environmental Land Planning Series: Site Planning for Urban
Stream Protection. Prepared by the Metropolitan Washington Council of
Governments and the Center for Watershed Protection, Silver Spring, Maryland.
Retrieved from http://www.mwcog.org/
Schueler, T.R. (2000). The importance of imperviousness. The Practice of Watershed
Protection, 7, 7-18.
Seeger, M. (2004). Gaussian processes for machine learning. International
Journal of Neural Systems, 14(2), 69-106.
Sen, A., & Sinha, A.P. (2005). A Comparison of Data Warehousing Methodologies.
Communications of the ACM, 48(3), 80-84.
Sheng, Y., Ying, G., & Sansalone, J. (2008). Differentiation of transport for
particulate and dissolved water chemistry load indices in rainfall-runoff from
urban source area watersheds. Journal of Hydrology, 567(1-2), 144-158.
Shirinian-Orlando, A. A., & Uchrin, C. G. (2007). Modeling the hydrology and water
quality using BASINS/HSPF for the upper Maurice River watershed, New
Jersey. Journal of Environmental Science & Health, Part A: Toxic/Hazardous
Substances & Environmental Engineering, 42(3), 289-303.
Shrestha, R.R., Bárdossy, A., & Rode, M. (2007). A hybrid deterministic-fuzzy rule
based model for catchment scale nitrate dynamics. Journal of Hydrology, 342,
143-156.
Sliva, L., & Williams, D.D. (2001). Buffer zone versus whole catchment approaches to
studying land use impact on river water quality. Water Research, 35, 3462-3472.
Simitsis, A., Vassiliadis, P., & Sellis, T. (2005). Optimizing ETL processes in data
warehouses. Data Engineering: ICDE Proceedings 21st International
Conference, 564-575.
Singh, V.P. (1995). Watershed modeling: Computer models of watershed hydrology.
Littleton, CO: Water Resources Publications.
Singh, V.P., & Woolhiser, D.A. (2002). Mathematical modeling of watershed hydrology.
Journal of Hydrologic Engineering, American Society of Civil Engineers, 7(4),
270-292.
Singh, V. P., & Frevert, D.K. (2004). Watershed Modeling. ASCE Conf. Proc.
doi:10.1061/40685.
Singh, J., Knapp, H. V., & Demissie, M. (2004). Hydrologic modeling of the Iroquois
River watershed using HSPF and SWAT. Illinois State Water Survey, ISWS CR
2004-08.
Singh, V. P., & Frevert, D. K. (2006). Watershed Models: CRC Press.
Singh R K., Panda, r. K., Satapathy, K. K., & Ngachan, S. V. (2011). Simulation of
runoff and sediment yield from a hilly watershed in the eastern Himalaya, India
using the WEPP model. Journal of Hydrology, 405(3-4), 261-276.
Smith, T.E., Deacon, J.R., & Soule, S.A. (2005). Effects of urbanization on stream quality
at selected sites in the Seacoast region in New Hampshire. U.S. Geological Survey
Scientific Investigations Report, 5103, 18.
Smullen, J. T., Shallcross, A. L., & Cave, K. A. (1999). Updating the U.S. nationwide
urban runoff quality database. Water Science and Technology, 39(12), 9-16.
Soil Climate Analysis Network. (2012, January). NRCS. Retrieved from http://www.wcc.nrcs.usda.gov
Solomatine, D.P., & Dulal, K.N. (2003). Model trees as an alternative to neural networks in
rainfall-runoff modeling. Hydrological Sciences Journal, 48(3), 399-411.
Solomatine, D.P., & Xue, Y. (2004). M5 model trees and neural networks: application to
flood forecasting in the upper reach of the Huai River in China. Journal of
Hydrologic Engineering, 9(6), 491-501.
Solomatine, D.P., Maskey, M., & Shrestha, D.L. (2007). Instance-based learning
compared to other data-driven methods in hydrologic forecasting. Hydrological
Processes, 21, doi: 10.1002/hyp.6592.
Sutherland, R.C. (2000). Methods for estimating the effective impervious area of urban
watersheds. The Practice of Watershed Protection, 32, 193-195.
Tan, P.N., Steinbach, M., & Kumar, V. (2006). Introduction to Data Mining. Boston:
Addison-Wesley.
Tang, Z., Engel, B. A., Pijanowski, B. C., & Lim, K. J. (2005). Forecasting land use
change and its environmental impact at a watershed scale. Journal of
Environmental Management, 76, 35-45.
Teuteberg, F., & Straßenburg, J. (2009). State of the Art and Future Research in
Environmental Management Information Systems: Information Technologies in
Environmental Engineering. Environmental Science and Engineering Part 2, 64-77.
Tjoa, A.M., & Trujillo, J. (Eds.). (2005). Data Warehousing and Knowledge Discovery
(DaWaK 2005, Copenhagen). Springer.
Tokar, A.S., & Markus, M. (2000). Precipitation-runoff modeling using artificial neural
networks and conceptual models. Journal of Hydrologic Engineering, 5(2), 156-161.
Tong, S. T. Y., & Chen, W. (2002). Modeling the relationship between land use and
surface water quality. Journal of Environmental Management, 66(4), 377-393.
Tong, S. T. Y., & Liu, A.J. (2006). Modelling the hydrologic effects of land-use
and climate changes. Int. J. Risk Assessment and Management, 6(4/5/6).
Tong, S. T. Y., Liu, A.J., & Goodrich, J. A. (2007). Climate change impacts on nutrient
and sediment loads in a Midwestern agricultural watershed. Journal of
Environmental Informatics, 9(1), 18-28.
Tong, S. T.Y., Liu, A. J., & Goodrich, J. A. (2009). Assessing the water quality impacts
of future land-use changes in an urbanising watershed. Civil Engineering and
Environmental Systems, 26(1), 3-18.
Tsegaye, T., Sheppard, D., Islam, K.R., Johnson, A., Tadesse, W., Atalay, A., & Marzen, L.
(2006). Development of chemical index as a measure of in-stream water
quality in response to land use and land cover changes. Water, Air, and Soil
Pollution, 174, 161-179.
Tsihrintzis, V. A., & Hamid, R. (1998). Runoff quality prediction from small urban
catchments using SWMM. Hydrological Processes, 12, 311-329.
United States Department of Agriculture. (2012, January). Major land uses in USA.
Retrieved from Economic Research Service http://www.ers.usda.gov/
United States Environmental Protection Agency (2000). Ambient water quality criteria
recommendations: Information supporting the development of state and tribal
nutrient criteria for rivers and streams in nutrient ecoregion XI. Office of Science
and Technology, Office of Water, EPA 822-B-00-017. Retrieved from
http://water.epa.gov/scitech/swguidance/standards/criteria/nutrients/upload/2007_09_27_criteria_nutrient_ecoregions_lakes_lakes_2.pdf
United States Environmental Protection Agency (2001). Overview to watershed
assessment: Tools for local stakeholders. Office of Water, EPA 832-B-01-004.
Retrieved from http://water.epa.gov/scitech/wastetech/upload/overview-to-
watershed-assessments-tools-for-stakeholders.pdf
United States Environmental Protection Agency (2011a). Clean Water Act. USEPA.
Retrieved from http://www.epa.gov/lawsregs/laws/cwa.html
United States Environmental Protection Agency (2011b). Regulations. USEPA. Retrieved
from http://www.epa.gov/lawsregs/
USEPA Storage and Retrieval System. (2012, January). STORET. Retrieved from EPA
website http://www.epa.gov/storet/
U.S. Geological Survey (1995). Water-Quality Assessment of the Upper Illinois River
Basin in Illinois, Indiana, and Wisconsin: Nutrients, Dissolved Oxygen, and
Fecal-indicator Bacteria in Surface Water, April 1987 through August 1990.
Water-Resources Investigations Report 95-4005. Retrieved from
http://pubs.usgs.gov/wri/1995/4005/report.pdf
U.S. Geological Survey (1999). Environmental Setting of the Upper Illinois River Basin
and Implications for Water Quality. Water-Resources Investigations Report 98-
4268. Retrieved from http://il.water.usgs.gov/nawQa/uirb/pubs/reports/WRIR 98-
4268.pdf
U.S. Geological Survey (1999). The quality of our nation's waters: Nutrients and
pesticides. National Water-Quality Assessment Program. Retrieved from
http://pubs.usgs.gov/circ/circ1225/pdf/front.pdf
U.S. Geological Survey (2012). Real-time water quality monitoring and regression
analysis to estimate nutrient and bacteria concentrations in Kansas streams. USGS.
Retrieved from http://ks.water.usgs.gov/pubs/reports/vgc.06I0.html#HDR01
Vanclooster, M., Boesten, J., Tiktak, A., Jarvis, N., & Kroes, J. (2004). On the use of
unsaturated flow and transport models in nutrient and pesticide management. In:
Unsaturated-Zone Modeling: Progress, Challenges and Applications (eds R.A.
Feddes, G.H. de Rooij & J.C. van Dam), 331-361.
Walton, R.S., & Hunter, H.M. (2009). Isolating the water quality responses of multiple
land uses from stream monitoring data through model calibration. Journal of
Hydrology, 375(1-2), 29-45.
Wang, X., Sheng, Y., & Huang, G.H. (2004). Land allocation based on integrated GIS
optimization modeling at a watershed level. Landscape and Urban Planning,
66(2), 61-74.
Wang, S.H., Huggins, D.G., Frees, L., Volkman, C.G., Lim, N.C., Baker, D.S., Smith, V.,
& deNoyelles, F., Jr. (2005). An integrated modeling approach to total
watershed management: Water quality and watershed management of Cheney
Reservoir. Water, Air, and Soil Pollution, 164, 1-19.
Wang, Y., & Witten, I. (1997). Inducing model trees for continuous classes. Proceedings
of the 9th European Conf. on Machine Learning, 128-137.
Weng, Q. (2001). Modeling urban growth effects on surface runoff with the integration of
remote sensing and GIS. Environmental Management, 28(6), 737-748.
Wicklein, S.M., & Schiffer, D.M. (2002). Simulation of runoff and water quality for 1990
and 2008 land-use conditions in the Reedy Creek Watershed, East-Central
Florida. Water-Resources Investigations Report 02-4018; U.S. Geological Survey.
Retrieved from http://pubs.usgs.gov/wri/
Widom, J. (1995). Research problems in data warehousing. In Proc. CIKM.
Wilson, C.O., & Weng, Q. (2011). Simulating the impacts of future land use and climate
changes on surface water quality in the Des Plaines River watershed, Chicago
Metropolitan Statistical Area, Illinois. Science of the Total Environment, 409(20),
4387-4405.
Winter, J.G., & Duthie, H.C. (2000). Export coefficient modeling to assess phosphorus
loading in an urban watershed. Journal of the American Water Resources
Association, 36, 1053-106.
Wu, R.S., & Haith, D.A. (1993). Land use, climate, and water supply. Journal of Water
Resources Planning and Management, 119(6), 685-704.
Wu, Q., Li, H., Wang, R., Paulussen, J., He, Y., Wang, M. (2006). Monitoring and
predicting land use change in Beijing using remote sensing and GIS. Landscape and
Urban Planning, 78, 322-33.
Yee, K.Y., Ray, A.K., & Rangaiah, G.P. (2003). Multi-objective optimization of an industrial
styrene reactor. Computers and Chemical Engineering, 27, 111-130.
Yu, X., Zhang, X., & Niu, L. (2009). Simulated multi-scale watershed runoff and
sediment production based on GeoWEPP model. International Journal of
Sediment Research, 24(4), 465-478.
Zhu, W., Bian, B., & Li, L. (2008). Heavy metal contamination of road-deposited
sediments in a medium size city of China. Environmental Monitoring and
Assessment, 147, 171-181.
Zoppou, C. (2001). Review of urban storm water models. Environmental Modelling &
Software, 16(3), 195-231.

