BY
DEPARTMENT OF
CIVIL, ARCHITECTURAL, AND ENVIRONMENTAL ENGINEERING
Approved
Adviser
Chicago, Illinois
May 2012
UMI Number: 3529157
Published by ProQuest LLC 2012. Copyright in the Dissertation held by the Author.
Microform Edition ProQuest LLC.
All rights reserved. This work is protected against
unauthorized copying under Title 17, United States Code.
ProQuest LLC
789 East Eisenhower Parkway
P.O. Box 1346
Ann Arbor, MI 48106-1346
Copyright by
May 2012
ACKNOWLEDGEMENT
support. Without his help this work would not have been possible. I would also like to thank the
members of my committee for their input. Special thanks go to Dr. Tzuoh-Ying Su of the
U.S. Army Corps of Engineers (USACE), Chicago District, for providing information and
data.
I am greatly indebted to my dear husband Haithum Elhadi for his huge support
and assistance. I dedicate this thesis to him and to our wonderful children Sapheya,
TABLE OF CONTENTS
ACKNOWLEDGEMENT
LIST OF TABLES
LIST OF SYMBOLS
ABSTRACT
CHAPTER
1. INTRODUCTION
1.1 Introduction
1.2 Statement of the Problem
1.3 Goals of the Study
1.4 Objectives of the Study
1.5 Overview of the Thesis
2.1 Introduction
2.2 Land Use Effect in Urban Watershed
2.3 Regulations
2.4 Watershed Modeling
2.5 Data Integration and Data Warehouse
2.6 Conclusion
3. STUDY AREA
3.1 Introduction
3.2 Watershed Characteristics
3.3 Watershed Data Used in the Study
3.4 Watershed Elements
3.5 Conclusion
4.1 Introduction
4.2 Data Warehouse Technology
4.3 Watershed Data Warehouse
4.4 The Development of Watershed Data Warehouse
4.5 Graphical User Interfaces
4.6 Chicago River Watershed Data Warehouse
4.7 Conclusion
7. CONCLUSIONS
APPENDIX
A. DATA WAREHOUSE AND DATA MINING
B. BASINS/HSPF
BIBLIOGRAPHY
LIST OF TABLES
Table
2.1 Characteristics of major watershed models
6.2 Some of the TIA percentages adopted for this study based on the literature
6.4 Calibration/sensitivity analysis for EIA equations for this study
6.5 Statistical results of hydrology calibration
LIST OF FIGURES
Figure
1.1 Elements of the research topic
4.2 Roll-up for the land use type dimension and related attributes
4.3 Star schema model for watershed water quality data mart
4.7 Water quality and quantity stations used in the watershed assessment
4.11 N/P ratio for upstream station
6.1 The Chicago River Watershed delineation process using BASINS
6.2 Schematic created by WinHSPF for the upper Chicago River subbasins
6.6 Observed vs. simulated flow scatter plot for calibration period
6.9 Observed vs. simulated flow scatter plot for validation period
6.12 Simulation of ortho phosphate for calibration period
LIST OF SYMBOLS
Symbol Definition
ANN Artificial Neural Network
BAM Bus Architecture Matrix
BASINS Better Assessment Science Integrating Point & Nonpoint Sources
CMAP Chicago Metropolitan Agency for Planning
CWA Clean Water Act
DM Data Mining
DO Dissolved Oxygen
DW Data Warehouse
EC Export Coefficient
EIA Effective Impervious Area
EPA Environmental Protection Agency
GIS Geographical Information System
GUI Graphical User Interface
HSPF Hydrological Simulation Program-FORTRAN
IEPA Illinois Environmental Protection Agency
MAE Mean Absolute Error
MWRDGC Metropolitan Water Reclamation District of Greater Chicago
NPS Non Point Sources
NSE Nash-Sutcliffe Efficiency
NWS National Weather Service
PME Percent Mean Error
PS Point Sources
RAE Relative Absolute Error
RMSE Root Mean Square Error
RRSE Root Relative Squared Error
ROC Receiver Operating Characteristic
SQL Structured Query Language
SVM Support Vector Machines
TIA Total Impervious Area
TKN Total Kjeldahl Nitrogen
TMDL Total Maximum Daily Loads
TN Total Nitrogen
TP Total Phosphorus
USEPA US Environmental Protection Agency
USGS U.S. Geological Survey
WDW Watershed Data Warehouse
WEKA The Waikato Environment for Knowledge Analysis
WQS Water Quality Standards
ABSTRACT
watershed approach that models the dynamics of water quality and land use in a highly
urbanized watershed.
The land use-water quality relationship is complex and has not been
adequately addressed for highly urbanized watersheds. Factors such as inadequate urban
planning, increasing impervious area, and population growth dynamics are among
the reasons for this complexity. In addition, point sources are generally much easier to
identify and control than nonpoint sources such as urban storm runoff. Both the
quantities and the transport pathways of pollutant inputs are affected by land use in the
watershed. Examining the factors that govern the relationship between different land
uses and water quality within a watershed can therefore give insights and important information
The two backbone concepts in this study are the holistic watershed perspective
and the role of historical data records as part of assessment, modeling and integration
tools of the watershed framework. Analysis of the records will explain watershed
conditions, identify the major problem areas, and justify the modeling and post-analysis
procedures. Data sources are often important but data availability, heterogeneity and
repository and methodologies for analyzing and assessing the watershed using data
complex querying of watershed data and discovery of trends and patterns in data by
incorporating 40 years' worth of watershed data from different source agencies in a
The discipline of data-driven modeling was also introduced in this thesis using
the developed central repository. Several regression and classification algorithms were
presented and assessed for their suitability for predicting total nitrates using few
analysis system BASINS coupled with the comprehensive, conceptual, and continuous
watershed-scale simulation model HSPF produced export coefficients for detailed (Level III)
land use in the Chicago River watershed. The water quality simulation approach
utilized in this research to generate the coefficients constitutes a new contribution to the
The calibrated and validated continuous model can be used to investigate
and analyze different scenarios and possible future conditions, thus providing a
planning tool for regulatory environmental agencies. The data-driven models developed
can be used as an operational tool to maintain water quality parameters, especially if
TMDLs and WQS are developed for the Chicago River Watershed. The framework
proposed in this study can therefore be considered robust, with its integration, planning,
and operating techniques and tools. Furthermore, an optimization tool is introduced in the
CHAPTER 1
INTRODUCTION
1.1 Introduction
The pollution of urban watersheds has become a serious problem that threatens
the urban ecological environment. Surface water quality issues in highly urbanized
watersheds are increasing in the Chicago metropolitan area, as in most urban
watersheds in the United States. The sources of pollution and their contributions are
highly dependent on the type of land use and land cover in the watershed. Although
identifying, quantifying, and controlling contributions from point sources is
achievable, the same cannot be said for nonpoint source contributions.
urbanized watersheds (Bian et al., 2011; Brezonik et al., 2002). Its effect is a change in
the quantities of water available for direct runoff, stream flow, and groundwater flow.
Moreover, it considerably affects the chemical, physical, and biological processes in the
receiving water bodies. The complexity of the factors that govern these processes and the
random patterns of precipitation make it difficult to control storm water runoff
Nutrients, such as nitrogen and phosphorus, are essential for a healthy and diverse
undesirable effects on water quality, resulting in adverse changes in the biological and
aquatic life (USEPA, 2000). Potential risks to human health are also associated with the
growth of harmful algal blooms (Hamed et al., 2004). In the 1998 list of impaired waters, the
Runoff from different types of land use carries different kinds of contaminants
and pollutants. For example, runoff from agricultural land carries high amounts of
nutrients and sediments, while runoff from developed urban areas may carry sodium and
sulfate from road salt treatments along with other materials such as rubber and
Moreover, different types of land cover can modify the hydrologic cycle, water
balance, water temperature and other surface land and water characteristics due to the
percolation, sedimentation, erosion etc. (Tong et al., 2002; LeBlanc et al., 1997). Thus,
the land use type will not only affect the amount of runoff and pollutant inputs but will also
Typically, forest land uses contribute small amounts of nutrients, while
land uses that involve fertilization and soil disturbance contribute large amounts
(Calderon, 2009). The strong relationship between land use types and the quantity and
Examining the factors that govern the relationship between different land uses and
water quality within a watershed can give insights and important information about
existing and potential sources of contamination. Also, for future planning, development,
and decision-making purposes, reliable analysis and assessment tools are needed
that can predict future water quality conditions under various scenarios.
managers and policy makers as an effective methodology to effectively address the full
source contamination, reduce polluted runoff, and protect drinking water sources
2009).
Previous studies that aggregated watershed elements to evaluate land use effects
on water quality are deficient in considering the detailed spatial and temporal aspects of
the urban watershed. Incorporating detailed land use and historical data records to
develop tools that quantify the impact on water quality is the key element in the tools
especially those related to impacts of land use will provide a better assessment of current
conditions and a good indication of what the future will hold if there are any
future land use development plans. Going through historical data records for basic
watershed elements such as water quality, quantity, land use, climate, and watershed
characteristics, and the interactions among them, will allow a thorough understanding
of the past and present conditions of the watershed and better decisions for
the future.
planning and policy making tools to assess, analyze, and quantify detailed land use
methodologies and components for data integration, analysis and assessment such as data
optimization approach that utilizes the watershed modeling outputs will be introduced
later. Tools such as data mining techniques and watershed models are used to analyze,
describe and predict the behavior of the watershed and how it is impacted by highly
The purpose of this study is to understand and model the dynamics of nutrients in
a highly urbanized watershed. The effect of detailed urban land use on nutrient runoff to
water bodies in the Chicago River Watershed is investigated. Different tools and different
data about water quality, water quantity, point and nonpoint sources, geospatial,
meteorological, and land use data in a holistic watershed approach to examine nutrient
pollution.
approximately 645 mi². It is 82% urban land use. The highly urbanized watershed has
recently faced issues such as the invasion of Asian carp and other water quality issues,
which prompted serious discussion of drastic decisions and actions, such as
hydrological separation of the Great Lakes and Mississippi River basins, or even re-
United States policies and regulations, such as the Clean Water Act (CWA), were
created and are implemented to help maintain the quality of our water resources in the
United States (IEPA, 2009). Under section 303(d) of the CWA, states are required to
EPA's national tracking system for impaired waters. A state's 303(d) impaired waters list
identifies where the required pollution controls are not sufficient to attain or maintain
applicable WQS. The states are required to establish and develop prioritized Total
Maximum Daily Loads (TMDLs) for lakes and rivers listed as impaired waters
(303(d) list) by EPA. Little has been done for the watershed; only a "Stage 1" TMDL report
Agency (IEPA) and the United States Environmental Protection Agency (USEPA).
The purpose of the proposed project was to develop TMDLs for impaired water
bodies in a portion of the watershed, the Upper North Branch of the Chicago River Watershed.
The potential causes of impairment for those segments identified in the report were
chloride, dissolved oxygen, fecal coliform, pH, water temperature, and total phosphorus.
The framework proposed in the study provides tools to assess the watershed,
predict water quality parameters, quantify detailed land use effect on water quality and
could be implemented to maintain any developed TMDL and WQS for the watershed.
1.4.1 Strategy. Elements of the proposed framework are shown in Figure 1.1. They
consist of a watershed data warehouse component; a data analysis and watershed assessment
A local watershed data warehouse (WDW) that integrates and aggregates different
available data types from various agencies will be constructed. This DW will make it
easy to access, retrieve, and manage data records, resolve missing data issues, and integrate,
analyze, and assess historical watershed data. Water quantity and quality, climate, land
use, and other watershed data can be integrated to provide watershed
assessments or data requirements for modeling, for this study and for any similar studies
in the area. The local WDW will help: 1) Develop a deeper understanding of the
Existing data integration methods are deficient in their ability to easily access and
provide synthesized data for the watershed. This is because monitoring records are
analysis from depends mainly on users' ability to navigate through these data sources.
Even the systems that were built to alleviate the issue proved to be deficient in their
ability to provide a decision making tool and interfaces that allow navigation through the
data records.
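To make the multi-dimensional idea concrete, the following is a minimal sketch of a water quality star schema built with Python's standard sqlite3 module. The table and column names (fact_wq, dim_station, dim_date, dim_parameter), the station IDs, and the sample values are all hypothetical; they are not the schema or data of the Chicago River WDW. The query slices the data on one parameter (TP) and rolls monthly records up to subwatershed-by-year averages.

```python
import sqlite3

# Hypothetical star schema: one fact table of water quality samples keyed to
# station, date, and parameter dimensions. Names are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE dim_station (station_key INTEGER PRIMARY KEY,
                          station_id TEXT, subwatershed TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY,
                       year INTEGER, month INTEGER);
CREATE TABLE dim_parameter (parameter_key INTEGER PRIMARY KEY,
                            name TEXT, units TEXT);
CREATE TABLE fact_wq (
    station_key   INTEGER REFERENCES dim_station(station_key),
    date_key      INTEGER REFERENCES dim_date(date_key),
    parameter_key INTEGER REFERENCES dim_parameter(parameter_key),
    value REAL
);
""")
cur.executemany("INSERT INTO dim_station VALUES (?, ?, ?)",
                [(1, "UP-01", "North Branch"), (2, "DN-01", "Main Stem")])
cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                [(1, 2000, 6), (2, 2000, 7), (3, 2001, 6)])
cur.executemany("INSERT INTO dim_parameter VALUES (?, ?, ?)",
                [(1, "TP", "mg/L"), (2, "TN", "mg/L")])
# Synthetic measurements (not observed data).
cur.executemany("INSERT INTO fact_wq VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 0.20), (1, 2, 1, 0.30), (1, 3, 1, 0.40),
    (2, 1, 1, 0.50), (2, 2, 1, 0.70), (2, 3, 1, 0.90),
    (1, 1, 2, 2.0), (2, 1, 2, 4.0),
])

# Slice on one parameter, then roll the monthly facts up to
# subwatershed x year averages by grouping on dimension attributes.
ROLLUP_SQL = """
SELECT s.subwatershed, d.year, ROUND(AVG(f.value), 3) AS mean_value
FROM fact_wq f
JOIN dim_station s   ON f.station_key = s.station_key
JOIN dim_date d      ON f.date_key = d.date_key
JOIN dim_parameter p ON f.parameter_key = p.parameter_key
WHERE p.name = ?
GROUP BY s.subwatershed, d.year
ORDER BY s.subwatershed, d.year
"""
tp_by_year = cur.execute(ROLLUP_SQL, ("TP",)).fetchall()
for row in tp_by_year:
    print(row)
```

Because each dimension lives in its own table, a new grouping level (for example, land use type or season) can be added as a dimension attribute without restructuring the fact table.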
integration model. Using this model will make it easy to investigate data at its most
atomic level and hence make it flexible to be accessed, retrieved, and integrated across
many different spatial and temporal levels. Analysis of the historical data record will give
insight into the previous and existing watershed conditions and its sensitivity toward
different parameters, making it easy to concentrate either on the whole watershed or
on a specific subwatershed. A graphical user interface that is specifically tailored for the
watershed is introduced to facilitate access to the WDW and bring the benefits of the
allows users to summarize data, perform analysis, and slice and dice data to assess the
simulate the watershed behavior and to develop the nutrient export coefficients for
detailed land use types. The continuous watershed simulation model takes into
consideration detailed land use and long-term simulation. The detailed land use considers
the effective imperviousness concept, which takes into account whether the impervious
surface is directly connected to a drainage system or not. The resulting nutrient export
coefficients are site-specific indicators that incorporate many of the watershed conditions
and variables at the watershed level, including hydrometeorological data, topographic
data, land use management practices, and physical characteristics. These coefficients
provide a numerical quantification for each land use type. They would be the input
[Figure 1.1 Elements of the research topic. Recoverable labels: Framework; Data Integration & Data Analysis; Data Presentation; Watershed Assessment; Data Mining; Modeling; Export Coefficient; Optimization.]
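Effective imperviousness and per-land-use export coefficients can be combined in a simple area-weighted loading calculation. The sketch below is illustrative only: it assumes a generic power-law relation EIA = a·TIA^b with placeholder coefficients, assumes that disconnected impervious area behaves like pervious area, and uses made-up loading rates. None of these are the calibrated EIA equations or the HSPF-derived export coefficients of this study, and the function names are hypothetical.

```python
# Placeholder coefficients for a generic power-law EIA relation; NOT the
# calibrated EIA equations from this study.
EIA_A, EIA_B = 0.1, 1.5


def effective_impervious_pct(tia_pct, a=EIA_A, b=EIA_B):
    """Estimate effective impervious area (EIA, %) from total impervious area (TIA, %).

    Uses EIA = a * TIA**b, capped at TIA because the directly connected part of
    the impervious surface cannot exceed the total impervious surface.
    """
    if not 0.0 <= tia_pct <= 100.0:
        raise ValueError("TIA must be a percentage between 0 and 100")
    return min(tia_pct, a * tia_pct ** b)


def annual_load_kg(land_uses):
    """Area-weighted annual nutrient load (kg/yr).

    land_uses: iterable of (area_ha, tia_pct, ec_connected, ec_other), with
    export coefficients in kg/ha/yr. Assumption: the effective (directly
    connected) impervious fraction exports at ec_connected, while pervious and
    disconnected impervious area together export at ec_other.
    """
    total = 0.0
    for area_ha, tia_pct, ec_connected, ec_other in land_uses:
        f_eia = effective_impervious_pct(tia_pct) / 100.0
        total += area_ha * (f_eia * ec_connected + (1.0 - f_eia) * ec_other)
    return total


# Hypothetical Level III land uses: (area ha, TIA %, EC connected, EC other).
watershed = {
    "single-family residential": (100.0, 25.0, 2.0, 0.5),
    "commercial": (20.0, 80.0, 2.0, 0.5),
}
for name, (area, tia, ec_c, ec_o) in watershed.items():
    print(f"{name}: EIA = {effective_impervious_pct(tia):.1f}%")
print(f"total load = {annual_load_kg(watershed.values()):.1f} kg/yr")
```

The split matters because two land uses with the same total imperviousness can export different loads if one routes its roofs and pavements directly to storm sewers and the other drains them onto lawns.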
of the theory and important principles in Chapter 2. Chapter 3 gives an overview of the
study area. Chapter 4 introduces the WDW and multi-dimensional model, watershed
assessment, and data mining results. Chapter 5 introduces the data-driven models
developed to predict water quality parameters. Chapter 6 presents and discusses the
results from the water quality model. Chapter 7 concludes the dissertation and evaluates
the watershed framework, summarizing the most important findings of the investigation,
and outlines areas for future research, including the introduction of a multi-objective
optimization approach.
CHAPTER 2
2.1 Introduction
within that area drains to a common waterway (EPA, 2011). Water movement in the
watershed can be influenced by factors such as topography, soil composition and water
emphasized by the impacts of its pollution sources on all down gradient areas including
In this study, the two backbone concepts are the holistic watershed perspective
and the role of historical data records as part of assessment, modeling processes and
building of a watershed framework. The proposed study is composed of four parts: building a
WDW that provides easy access to and management of data records; watershed analysis,
assessment, and data mining; a data-driven model that predicts water quality and
quantity through data-driven algorithms; and water quality simulation using the
Hydrological Simulation Program FORTRAN (HSPF) that simulates land use effects on
Urban areas contain much of the world's population, yet they cover
a relatively small proportion of the earth: just 2.6 percent in the United States
(Figure 2.1) (USDA, 2012). However, urban areas can have fundamental ecological
Over the years, land uses have seen rapid and extreme changes in the United
States that altered the surface characteristics of watersheds and impacted water quality
and quantity (Allan, 2004). Urban sprawl, inadequate urban planning, population
dynamics, increase of impervious areas, and increase of industrial and agricultural sectors
are all factors that are endangering the quality and quantity of water (Calderon, 2009).
The knowledge about land use and land cover has always been an important
loss of fish and wildlife habitat (Anderson et al., 1976). Land use classification systems
are needed in the analysis of environmental processes and problems. To gain information
about the different classes and categories of each land use type, land use can be classified at
more detailed levels, taking criteria of capacity, type, and needs into
account (Anderson et al., 1976). For example, urban land use (Level I)
includes residential land use (Level II), which can be further subdivided into
single-family units, multi-family units, and so on (Level III). The following sub-sections will further
discuss different aspects of the effect of urban land use on surface water quantity and
[Figure 2.1: U.S. land use shares (USDA, 2012). Labels recoverable from the chart: urban areas (2.6%), cropland, special-use areas, miscellaneous other land.]
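The Level I/II/III hierarchy described above is also what makes land use a natural roll-up dimension in a data warehouse. A minimal sketch, assuming a hypothetical mapping from Level III classes to Level II and Level I classes; the category names loosely follow Anderson et al. (1976) but are illustrative and are not the classification used for the Chicago River data.

```python
from collections import defaultdict

# Hypothetical Level III -> (Level II, Level I) mapping; category names are
# illustrative only and loosely follow the Anderson et al. (1976) hierarchy.
HIERARCHY = {
    "single-family": ("residential", "urban"),
    "multi-family": ("residential", "urban"),
    "retail": ("commercial", "urban"),
    "office": ("commercial", "urban"),
    "row crops": ("cropland", "agriculture"),
}


def roll_up(level3_areas, level):
    """Sum Level III areas up to Level II (level=2) or Level I (level=1)."""
    if level not in (1, 2):
        raise ValueError("level must be 1 or 2")
    totals = defaultdict(float)
    for category, area in level3_areas.items():
        level2, level1 = HIERARCHY[category]  # KeyError flags an unmapped class
        totals[level2 if level == 2 else level1] += area
    return dict(totals)


# Hypothetical Level III areas (e.g., km^2) for one subwatershed.
areas = {"single-family": 120.0, "multi-family": 40.0, "retail": 25.0,
         "office": 15.0, "row crops": 60.0}
print(roll_up(areas, 2))  # {'residential': 160.0, 'commercial': 40.0, 'cropland': 60.0}
print(roll_up(areas, 1))  # {'urban': 200.0, 'agriculture': 60.0}
```

Total area is conserved at every level, so a Level I summary can always be drilled back down to the Level III records it came from.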
2.2.1 Urban Land Use Effect on Surface Water. The effect of urbanization on
streams differs from one system to another; some systems suffer radically from relatively
minor impacts, while others show less sensitivity (Smith, 2005). In urban land use areas,
parking lots and pavements. The impacts of those areas on watersheds have been
assessed in terms of hydrology, climate, and ecology (Rose et al., 200; Paul et al.,
2001).
The effects urban land use can have on the water quality of streams, rivers, lakes, and
estuaries have been the basis of many studies over the years (Hanratty et
al., 1998; Rai et al., 1998; Bhaduri et al., 2000; and Bhaduri et al., 2001). Even streams in
urban watersheds are now characterized by having fundamental differences from streams
surface runoff due to impervious cover (Tong et al., 2009). The volume of runoff and flood
damage potential are much greater in urban areas than in areas of other land uses (Weng,
2001). Also, studies at the subwatershed scale that considered spatial variation in
urbanization showed a high impact on runoff and nitrogen that is directly proportional to
modeling studies over the years which have consistently shown that urban pollutant loads
increase with increase in imperviousness (Cianfrani et al., 2006; Allan, 2004; Barnes et
al., 2002; Beach, 2002; Cappiella et al., 2001; Finkenbine et al., 2000; Schueler, 1994).
Studies show that the greater the increase in impervious surfaces, the more significant
the degradation observed in the quality of aquatic resources and surface waters
(Tsegaye et al., 2006; Doll et al., 2002; Johnson et al., 2001; Bhaduri et al., 2000; Arnold et
al., 1998).
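The direction of this relationship can be illustrated with the Simple Method, an empirical screening approach commonly attributed to Schueler, in which the volumetric runoff coefficient rises linearly with percent imperviousness. It is shown here only as an illustration; it is not the modeling approach used in this study, the function names are hypothetical, and the rainfall, concentration, and area inputs are made up.

```python
def runoff_coefficient(impervious_pct):
    """Simple Method volumetric runoff coefficient: Rv = 0.05 + 0.009 * I (I in %)."""
    if not 0.0 <= impervious_pct <= 100.0:
        raise ValueError("imperviousness must be between 0 and 100 percent")
    return 0.05 + 0.009 * impervious_pct


def simple_method_load_lb(precip_in, impervious_pct, conc_mg_l, area_ac, pj=0.9):
    """Annual pollutant load in pounds: L = 0.226 * R * C * A, with R = P * Pj * Rv.

    precip_in: annual rainfall (inches); conc_mg_l: flow-weighted runoff
    concentration (mg/L); area_ac: drainage area (acres); pj: fraction of
    rainfall events that produce runoff. 0.226 converts in*acre*mg/L to lb.
    """
    runoff_in = precip_in * pj * runoff_coefficient(impervious_pct)
    return 0.226 * runoff_in * conc_mg_l * area_ac


# Hypothetical inputs: 36 in/yr of rain, 0.3 mg/L total phosphorus, 100 acres.
for imp in (10, 40, 70):
    print(imp, round(simple_method_load_lb(36, imp, 0.3, 100), 1))
```

With everything else held fixed, the computed load scales directly with Rv, which is why screening methods of this kind reproduce the monotonic load-imperviousness trend reported in the studies above.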
could degrade runoff water quality from different types of land use. Runoff from highly
developed urban areas may contain sodium and sulfate from road deicers and even
rubber fragments or heavy metals (Tong et al., 2002). A study in an eastern Illinois
watershed found that urban land use was the main cause of nitrogen and phosphorus
relative to agricultural land use (Ahearn et al., 2005). The same conclusion was reached
for urban land use in studies in Alabama and Ontario (Canada) (Silva et al., 2001;
Basnyat et al., 1999; Ahearn et al., 2005). Concentrations of total phosphorus in urban
area streams are generally higher than the concentrations in agricultural area streams
(Brett et al., 2005; USGS, 1999; Winger et al., 2000; Donaldson, 2005). These elevated
phosphorus levels were attributed to point source pollution from wastewater
treatment plants in urban areas, as opposed to nonpoint source pollution associated with
fertilizers in agricultural areas (USGS, 1999; Robbins et al., 2001; Robbins et al.,
2.2.3 Modifications Due to Urban Land Use. Land surface characteristics along with
water balance and hydrologic cycle can be modified by changing land use and the
absorption (Tong et al., 2002; LeBlanc et al., 1997). As a result, significant changes occur
in the quantity of water available for stream and ground water flow, and the different
chemical, physical, and biological processes in the receiving water bodies are modified
(Tong et al., 2002). A study that classified surface water in urban areas found a strong
correlation between the proportion of urban land use area, such as residential and industrial,
and worsening water quality (Ren et al., 2003). Although these land uses,
considered pollutant sources, are inevitable, they can greatly affect the hydrology and
2.2.4 Land Use Effect on Water Quality and Quantity. Although many studies
investigated the impacts of land use on water quantity and quality (Wu et al., 1993;
Mattikalli et al., 1996; Tsihrintzis et al., 1998; and Bouraoui et al., 1998), quantifying
water quality in a river watershed based on land use patterns is still at a developmental stage (Tong
et al., 2002; Tong et al., 2009). This is due to the complex relationship between different
land use patterns and water quality and quantity under different environmental and
Tools such as hydrological models that are coupled with geographic information
systems (GIS) and remote sensing proved to be powerful techniques in conducting these
kinds of studies (Conway et al., 2005; Wang et al., 2005). Other integrated approaches
involve the use of statistical and spatial analyses, as well as hydrologic modeling to
examine the effects of land use on water quality (Tong, 2007; Tong et al., 2002).
Most research depends on field studies and focuses on a local geographical scale and a
small range of land use patterns (Wilson et al., 2011; Akhavan et al.,
2010; Leon et al., 2010; Tong, 2006). Integrated approaches that involve a holistic view of
the issue, integrate different data records in the area, and utilize different methods of
watersheds (Tong et al., 2009). This understanding will provide a better assessment of
current conditions and a good indication of what the future will hold if there
management, the impacts of urban land use were detailed for the Chicago area. The study
stated that land developments clearly altered the region's runoff patterns by converting
pervious land to impervious land, and by considerably changing the drainage patterns
(MWRDGC, 2007).
dominated hydrology had occurred (MWRDGC, 2007). That led to a huge increase in the
recharge. Changing runoff rates and volumes can create the typical impacts that
Flooding. The rates of flow have increased by 100 to 200 percent or even more in
urbanizing watersheds. Detention basins can help reduce this effect; however, cumulative
increases in runoff volumes tend to decrease detention effectiveness when the whole
rate of runoff tends to reach very high velocities in channels. This leads to the scouring and
Destabilization. Storm flows tend to stress aquatic life, whether as high flows in the
wet season or low flows in the dry season. High-velocity flows tend to flush away natural
substrates and organisms. In dry seasons, reduced and extended low flows result in
siltation, reduced stream depth, and elevated water temperatures during the summer
(MWRDGC, 2007).
2.2.4.2 Water Quality Impacts. High density developments such as commercial and
industrial land use projects were found to contribute more to the pollution of storm runoff
to the high concentrations of heavy metals and organic compounds. The high organic
content may result in high oxygen demand when it decomposes in stream waters
(MWRDGC, 2007).
excessive growth of algae and other undesirable aquatic plants. The aesthetics,
recreational value, and quality of the water body can deteriorate (MWRDGC, 2007).
elevated water temperatures aggravate toxicity problems for aquatic life. Decomposing
organic matter washed in by storm runoff tends to lower the dissolved oxygen to low
Bacterial Contamination. For storm runoff, it was found that the water quality
standard for fecal coliform bacteria is frequently violated in urban water bodies after a
storm event. This violation reflects the presence of significant animal or human waste in
Salt Contamination. Salinity levels in urban watersheds are elevated due
to the salt used for deicing roads. This may adversely impact certain plant
water temperatures due to the removal of natural shading and the reduction of base flows.
Moreover, impervious surfaces heated by the sun raise the temperature of the runoff
flowing over them. Elevated water temperatures stress aquatic life and aggravate water quality
2.3 Regulations
United States policies and regulations, such as the Clean Water Act (CWA), were
created and are implemented to help maintain the quality of our water resources in the
United States (IEPA, 2009). Each state is charged by U.S. EPA with developing water quality
standards (WQS). WQS are laws or regulations that states authorize in order to enhance
water quality and to ensure that designated uses of waters are not compromised (IEPA,
life, aesthetic quality, and public and food processing water supply;
A policy that ensures water quality improvements are conserved, maintained and
There are now an estimated 34,000 impaired waters and 58,000 associated
impairments officially listed in the U.S., with nutrients and sediments among the
most common listed pollutants (Borah et al., 2006). Public awareness of and
concern about water pollution led to the enactment of the CWA in 1972 and its
amendment in 1977. The act established the basic structure for regulating
discharges of pollutants into the waters of the United States and gave EPA the authority to
implement pollution control programs. EPA has used various regulatory and nonregulatory
tools to reduce direct pollutant discharges in an effort to restore and maintain the integrity
Clean Water Act. For many years following the passage of the CWA in 1972, the
focus was mainly on the chemical aspects of the "integrity" goal of the Act. Efforts also
focused on regulating discharges from traditional "point source" facilities, such as
municipal sewage plants and industrial facilities, while little attention was given to runoff
from streets, construction sites, farms, and other sources of storm runoff (USEPA, 2011a).
Starting in the late 1980s, more attention has been given to physical and
biological integrity and polluted runoff. For "nonpoint" runoff, voluntary programs such
as cost-sharing have been key tools. For urban point sources, regulatory approaches are being
strategies. The watershed approach ensures equal emphasis on both protecting and
restoring waters. A full range of issues and problems are addressed and not only those
groups, the different processes to achieve and maintain state water quality and other
The major CWA programs are: WQS; Anti-degradation policy; Water body
monitoring and assessment; Reports on condition of the nation's waters; Total Maximum
Daily Loads (TMDLs); NPDES permit program for point sources; Section 319 program
for nonpoint sources; Section 404 program regulating filling of wetlands and other
waters; Section 401 state water quality certification; and state revolving loan fund (SRF)
(USEPA, 2011a).
Under section 303(d) of the CWA, states are required to develop lists of impaired
waters. This program is EPA's national tracking system for impaired waters. A state's
303(d) impaired waters list identifies where the required pollution controls are not
sufficient to attain or maintain applicable WQS. The states are required to establish and
develop prioritized Total Maximum Daily Loads (TMDLs) for the identified waters.
can receive and still safely meet WQS, and an allocation of that load among the various
sources of the pollutant and a margin of safety (MOS) which takes into account any lack
of knowledge concerning the relationship between effluent limitations and water quality.
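$$\mathrm{TMDL} = \sum \mathrm{WLA} + \sum \mathrm{LA} + \mathrm{MOS}$$
where,
WLA = wasteload allocation (point sources);
LA = load allocation (nonpoint sources and natural background); and
MOS = margin of safety.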
EPA provides states with long-term schedules (8 to 13 years from the first listing of a
water body) for completing TMDLs. Water bodies may be removed from their state's
303(d) list after a TMDL has been developed or other changes to solve water
quality issues have been made (USEPA, 2011b). Although the CWA has required TMDL
development since 1972, EPA and the states have so far developed relatively few.
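A TMDL splits the allowable load into wasteload allocations (WLA) for point sources, load allocations (LA) for nonpoint sources and natural background, and a margin of safety (MOS). The sketch below backs out the load allocation left for nonpoint sources once point-source WLAs and an explicit MOS are fixed. Expressing the MOS as a fraction of the TMDL is an assumption here (an implicit MOS built into conservative modeling assumptions is also used in practice), the function name is hypothetical, and the numbers are made up.

```python
def load_allocation(tmdl, wasteload_allocations, mos_fraction):
    """Nonpoint/background load allocation: LA = TMDL - sum(WLA) - MOS.

    Assumes an explicit MOS expressed as a fraction of the TMDL. All loads
    must share one unit (e.g., kg/day).
    """
    if not 0.0 <= mos_fraction < 1.0:
        raise ValueError("mos_fraction must be in [0, 1)")
    mos = mos_fraction * tmdl
    la = tmdl - sum(wasteload_allocations) - mos
    if la < 0:
        raise ValueError("point-source allocations plus MOS exceed the TMDL")
    return la


# Hypothetical total phosphorus TMDL of 100 kg/day with two treatment plants
# (WLA = 30 and 20 kg/day) and a 10% explicit margin of safety.
print(load_allocation(100.0, [30.0, 20.0], 0.10))
```

Raising an error when the WLAs and MOS alone exceed the TMDL mirrors the practical situation in which point-source permits must be tightened before any load can be allocated to runoff.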
Watershed models are useful tools that enable interpretation, quantification, and
systems through a set of equations that explain the problems and develop a method to solve
them (Regnier et al., 2002; Miller et al., 2007). They can simulate pollutants' generation
and movement across land and through rivers and other water systems to predict flows,
stages and pollutant concentrations (Barling et al., 1994). In general they simulate natural
processes for the flow of water, sediment, chemicals, nutrients, and microbial organisms
within watersheds, as well as quantify the impact of human activities on these processes
Models merely reflect our understanding of watershed systems, and this
understanding defines the quality of the results they produce (EPA, 2011). However,
management (Jia et al., 2005). Simulation of these natural processes plays a fundamental
role in addressing a range of water resources, environmental, and social problems (Singh
et al., 2004). They are widely used to understand the dynamic interactions between climate
Also the general classification of models will be shown. Some of the currently used
models in the USA and other parts of the world will be mentioned. The strengths and
limitations in both computing capabilities and available data (Singh et al., 2006). The
advance of computers and the subsequent rapid growth in computing capability
made watershed modeling more comprehensive (Singh et al., 2006).
The development of the Stanford Watershed Model (SWM), now called Hydrological
lumped or 'conceptual' models (Singh et al., 2004). During the 1970s and
1980s, more mathematical models were developed for simulating watershed hydrology
and for applications in other areas, such as environmental and ecosystem management
(Singh et al., 2002). Examples of such watershed hydrology models are Storm Water
Weather Service (NWS) River Forecast System, Streamflow Synthesis and Reservoir
Hydrology Distributed Model (IHDM), and others (Singh et al., 2006). These models
laws, and expressed other processes using empirical algebraic equations (Singh et al.,
more recent conceptual models to simulate the dynamic variation in areas contributing to
direct runoff (Singh et al., 2004). The development of new models along with constant
such as intended use, accuracy, data availability, and study area characteristics should be
The model structure and architecture are determined by the objective for which
the model is built. Singh (1995) classified models based on the process description; the
time and space scales of the process; the solution technique; the land use of the modeled area; and the
based computer models (Ahmad, 2010). Empirical models are built from field
observations, measurements, experiments, and statistical methods. Their drawback, however, is
that they are site specific and require long-term data. They perform well when used to
simulate hydrology or soil erosion (Ahmad, 2010).
Physically based models are founded on scientific principles and fundamental knowledge
Physically based models are generally preferred because they provide a better
watershed models that represent hydrologic and water quality processes using both
In terms of spatial scale, models are further classified as lumped or distributed, and in terms
of temporal scale, as event-based or continuous (Ahmad, 2010). Lumped models treat the
watershed as a single unit for computation: watershed parameters are determined for each
sub-unit and averaged over the entire unit. Distributed models, in contrast, divide the
watershed into small units, each with homogeneous properties (Wu, 2006; Ahmad, 2010).
In a lumped model, the physical and hydrologic characteristics of the area are combined to
represent the watershed as one uniform system (Qi, 2006). Event-based models simulate
single storm events and do not take into account the complete hydrologic cycle (Wu, 2006).
Continuous hydrologic models, on the other hand, consider the whole hydrologic cycle and the
effects of long-term hydrological changes and watershed management practices (Ahmad, 2010;
Wu, 2006). Watershed
rainfall-runoff models (Nu-Fang et al., 2011; Sheng et al., 2008; Najafi, 2003; Muzik,
2002). Continuous models are used to investigate long-term processes such as the fate and
transport of pollutants (Singh et al., 2011; Yu et al., 2009; Jeon et al., 2007;
Ramireddygari et al., 2000). Combined models that have both long-term and single-event
analysis, stochastic processes, and probabilistic analysis are necessary to analyze the
output of models (Tong et al., 2006; Calderon, 2009). Because of uncertainties in model
structure, parameter values, precipitation, and other climatic inputs, uncertainty and
reliability analyses can be employed to examine their impact (Calderon, 2009).
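To make the lumped approach described above concrete, the Python sketch below collapses sub-unit values of a single parameter into one watershed value by area-weighted averaging, which is one common way of lumping. The `SubUnit` structure, the areas, and the parameter values are hypothetical. A distributed model would instead keep the per-unit values and perform the computation for each unit separately.

```python
from dataclasses import dataclass


@dataclass
class SubUnit:
    area_km2: float
    value: float  # any spatially varying parameter (hypothetical)


def lumped_value(units):
    """Area-weighted average of a parameter over all sub-units."""
    total_area = sum(u.area_km2 for u in units)
    if total_area <= 0.0:
        raise ValueError("total area must be positive")
    return sum(u.area_km2 * u.value for u in units) / total_area


units = [SubUnit(10.0, 0.2), SubUnit(30.0, 0.6)]
print(lumped_value(units))  # one value now represents the whole watershed
```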
2.4.3 Currently used watershed models. Several well-known watershed models are
currently in use in the U.S. and elsewhere (Singh et al., 2004). The construction and
component processes of these models vary significantly according to the purposes they are
intended to fulfill. Some of these models are as follows: the Hydrologic Engineering Center's
Hydrologic Modeling System (HEC-HMS) is used in the private sector for designing
drainage systems and quantifying the effect of land use change on flooding; the National
Weather Service (NWS) model is used for flood forecasting; HSPF and its extended water
quality model are the standard models adopted by EPA; the Modular Modeling System
(MMS) adopted by USGS is widely used for water resources planning and management;
the distributed hydrologic model WATFLOOD is popular in Canada for hydrologic
simulation; RORB and WBN are runoff routing models commonly employed in Australia
for flood forecasting, drainage design, and evaluating the effect of land use change;
TOPMODEL and SHE are the standard models for hydrologic analysis in many European
countries; the HBV model is the standard model for flow forecasting in Scandinavian
countries; the ARNO, LCS, and TOPKAPI models are popular in Italy; TANK models are
popular in Japan; and the Xin'anjiang model is commonly used in China (Singh et al., 2004).
Many other watershed models can be found in the literature. Table 2.1 shows characteristics of major watershed
[Figure: schematic of hydrologic components, showing inputs from precipitation to pervious and impervious areas; surface storage and surface runoff; soil water and interflow; and groundwater aquifer and groundwater (base) flow]
2.4.4 Strengths and Deficiencies of Watershed Models. Singh (2004) summarized the
major strengths of the current generation of models as follows: they are diverse, making
it easy to find a specific watershed model to address a practical problem; they are
comprehensive and can be applied to a range of issues in a watershed; they can simulate
the physics of the underlying hydrologic processes in both space and time quite well; they
are distributed in space and time; and the attempt to integrate ecosystem and ecological
processes with hydrology successfully reflects the increasing role of watershed models in
On the other hand, Singh (2004) pointed out the following deficiencies of watershed models:
they are not user-friendly tools; they require large data inputs; they lack measures that can
quantitatively assess model reliability; there is limited and unclear guidance on model
applicability; and they cannot be supplied with environmental,
This section presents a summary of the Better Assessment Science Integrating point &
User's manual,
integrates a geographical information system (GIS), national watershed data, and
state-of-the-art environmental assessment and modeling tools (such as HSPF, SWAT, and
SWMM) into one convenient package (EPA, 2012). The system is designed for local, state,
and regional agencies to perform watershed and water quality-based studies (EPA, 2012). It was
The BASINS system promotes better assessment and integration of point and
nonpoint sources for watershed and water quality management. It integrates several key
environmental data sets with improved analysis techniques. Environmental programs can
apply the integrated system in various stages of environmental management planning and
decision making (EPA, 2007). It was also conceived to support TMDL development, since
TMDLs require a watershed-based approach that integrates both point and nonpoint sources
(EPA, 2007).
and model application and interpretation. BASINS facilitates these steps by bringing key
data and analytical components under one roof, providing the user with a fully
the use of GIS, BASINS has the flexibility to display and integrate important
information such as land use, point source discharges, and water supply withdrawals
tool, and it has been adopted to model land use effects on water quality in many watershed
studies (Tong et al., 2002; Luzio et al., 2002; Fohrer et al., 2001; Arnold et al., 2005;
nonpoint source hydrology and water quality, combines it with point source
contributions, and performs flow and water quality routing in the watershed reaches
(Singh, 2005).
HSPF can simulate and predict the impact of land use on nutrient loadings into
watershed water bodies. It is a flexible and reliable hydrologic model that is robust
and offers high resolution (Bicknell et al., 1996). The HSPF model was developed under EPA
areas (EPA, 2011). The first version of HSPF was released in 1980. The functions and
processes in the initial development were derived from the following group of
HSPF consists of a number of modules that are arranged hierarchically to permit the
pervious land segments, impervious land segments, and free-flowing reaches/mixed
each module (Figures 2.3, 2.4, and 2.5) are explained in detail in the HSPF Version 12.2
HSPF also has a number of utility modules that are used to access, manipulate, and
analyze time series information stored by the user in HSPF's TSS (Time Series Store) and
WDM (Watershed Data Management) files. The time series include data such as
hourly precipitation, daily evaporation, and daily streamflow. They provide valuable
HSPF was designed following a top-down approach. Although the simulation and utility
modules are separated according to functionality, they can be invoked conveniently either
individually or in tandem (Bicknell et al., 2005).
The concept behind the design of HSPF is that the comprehensive simulation system with
through a fixed environment and interact with each other (Bicknell et al., 2005). Water,
sediment, and chemicals are all constituents, and their motions and interactions are denoted as
Before the HSPF model is run, the watershed area must be delineated, either manually or
automatically, into homogeneous land areas called Hydrologic Response Units (HRUs)
(Donigian et al., 1995). The delineation process takes place in BASINS, which divides the
watershed into subbasins, each having a combination of weather, soil, land use, topographic,
and geologic properties that is unique to that subbasin (Donigian et al., 1995). HRUs can be
impervious or pervious areas, which are modeled independently. Each HRU requires input
data such as precipitation, temperature,
potential evapotranspiration, and parameters related to land use, soil characteristics, and
This diagram shows a reservoir-type model that allows different types of inflow and
outflow (Bicknell, 2005; Calderon, 2009). Inflows and outflows are simulated as a water-
balance system in HSPF (Donigian et al., 1995). The pervious land segment simulates
infiltration, shallow subsurface flow (interflow), base flow, and deep percolation
(Donigian et al., 1995; Calderon, 2009). All these processes are performed by the
PERLND module.
HSPF uses physical and empirical formulations to model the movement of
water within each HRU. According to land cover on the land segment, interception
interception storage must be filled before excess precipitation can reach the land surface;
hydrologic processes are modeled by PWATER, the key subroutine of the PERLND module.
The subroutine simulates the retention, routing, and evaporation of water from
pervious land segments. The algorithms used to simulate these processes
are based on the original research for the LANDS subprogram of the Stanford Watershed
Model (Bicknell, 2005). The number of time series required by PWATER depends on
whether snow accumulation and melt are considered; if they are not, only potential
evapotranspiration and precipitation are required. When snow conditions need
to be simulated as well, time series for air temperature, precipitation, snow cover, water
yield, and ice content of the snowpack are also required. The water available for infiltration
and runoff is the sum of the inflow to surface detention storage and the existing storage.
Part of this water infiltrates directly and moves to the lower zone and groundwater
storages. Another part moves to the upper zone storage and may be routed as
runoff from surface detention or interflow storage. Water that has infiltrated through the
surface or percolated from the upper zone storage may stay within the lower zone storage where it
lost by deep percolation where it is considered lost from the simulated system.
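The partitioning of available water between infiltration and surface water can be illustrated with the idea behind the "line I" construction in Stanford Watershed Model-type algorithms: infiltration capacity is assumed to vary linearly (uniformly) over the land segment between a minimum and a maximum value. The Python sketch below is a simplified illustration, not HSPF's PWATER code. It computes the areal mean infiltration for a given moisture supply under that assumption. The names `supply`, `cap_min`, and `cap_max` are hypothetical and must share one depth unit per interval.

```python
def mean_infiltration(supply, cap_min, cap_max):
    """Areal mean infiltration when capacity is uniformly distributed
    between cap_min and cap_max over the land segment.

    Where local capacity exceeds the supply, all of the supply infiltrates;
    elsewhere infiltration is limited to the local capacity.
    """
    if supply < 0.0 or cap_min < 0.0 or cap_max < cap_min:
        raise ValueError("invalid inputs")
    if supply <= cap_min:
        return supply                        # capacity >= supply everywhere
    if supply >= cap_max or cap_max == cap_min:
        return 0.5 * (cap_min + cap_max)     # capacity-limited everywhere
    # Capacity-limited fraction: integral of c dc from cap_min to supply
    limited = (supply ** 2 - cap_min ** 2) / 2.0
    # Supply-limited fraction: integral of supply dc from supply to cap_max
    unlimited = supply * (cap_max - supply)
    return (limited + unlimited) / (cap_max - cap_min)


supply = 0.5
infil = mean_infiltration(supply, 0.0, 1.0)
surface_excess = supply - infil   # water left for surface detention/upper zone
print(infil, surface_excess)      # 0.375 0.125
```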
Similarly, Bicknell (2005) stated that IWATER simulates the retention, routing,
is available for retention storage and is removed by evaporation, but when the retention
infiltration rate with time as a function of soil moisture. They are calculated by the
following relationships (a few subroutines from the HSPF User's Manual are summarized here;
detailed descriptions of all modules and subroutines used by HSPF can be found in
RATIO = ratio of the ordinates of line II to line I (see Bicknell et al. (2005) -
The factor that reduces both infiltration and upper zone percolation that account
Where:
The fraction of runoff that becomes inflow to the upper zone storage is
computed as follows:
Where:
FRAC = fraction of potential of direct runoff retained by the upper zone storage
UZRAT = UZS/UZSN
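A Python sketch of the upper zone retention fraction follows. It assumes the piecewise form that, to our understanding, the HSPF User's Manual (Bicknell et al., 2005) gives for FRAC as a function of UZRAT; the exact expressions should be verified against the manual before use. The two branches meet at FRAC = 0.5 when UZRAT = 2, and FRAC decreases from 1 as the upper zone fills.

```python
def upper_zone_fraction(uzs, uzsn):
    """Fraction of potential direct runoff retained by the upper zone (FRAC).

    Assumed piecewise form (check against the HSPF User's Manual):
      UZRAT < 2:  K1 = 3 - UZRAT,   FRAC = 1 - (UZRAT/2) * (1/(1+K1))**K1
      UZRAT >= 2: K2 = 2*UZRAT - 3, FRAC = (1/(1+K2))**K2
    where UZRAT = UZS/UZSN.
    """
    if uzsn <= 0.0:
        raise ValueError("UZSN must be positive")
    uzrat = uzs / uzsn
    if uzrat < 2.0:
        k1 = 3.0 - uzrat
        return 1.0 - (uzrat / 2.0) * (1.0 / (1.0 + k1)) ** k1
    k2 = 2.0 * uzrat - 3.0
    return (1.0 / (1.0 + k2)) ** k2


for uzs in (0.0, 0.5, 1.0, 2.0, 3.0):   # hypothetical storages with UZSN = 1.0
    print(uzs, round(upper_zone_fraction(uzs, 1.0), 4))
```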
PROUTE, the surface runoff subroutine, determines how much potential surface
detention runs off in one simulation interval. The process of overland flow is considered
relates outflow depth to detention storage are used for the simulation. The rate of
Where:
SURSM = mean surface detention storage over the time interval (in)
SURSE = equilibrium surface detention storage (in) for the current supply rate
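The following Python sketch shows the surface runoff calculation with SURSM and SURSE as defined above. It assumes the Chezy-Manning-based form for PROUTE in English units that, as we recall it, appears in the HSPF User's Manual. The coefficients 1020, 0.00982, 0.6, and 1.67, the parameter names NSUR (Manning's n), LSUR (overland flow length, ft), SLSUR (overland flow slope), and SSUPR (moisture supply rate), and the unit of the supply rate are all assumptions to be checked against the manual. HSPF also caps the result at the water actually available, which is omitted here.

```python
import math


def surface_runoff(sursm, ssupr, nsur, lsur, slsur, delt60=1.0):
    """Surface runoff (in) over one interval, assumed PROUTE-style form.

    sursm  : mean surface detention storage over the interval (in)
    ssupr  : moisture supply rate to the overland flow plane (assumed in/hr)
    nsur, lsur, slsur : Manning's n, overland flow length (ft), slope (ft/ft)
    delt60 : interval length in hours
    """
    if sursm <= 0.0:
        return 0.0
    dec = 0.00982 * (nsur * lsur / math.sqrt(slsur)) ** 0.6   # assumed coefficient
    src = 1020.0 * math.sqrt(slsur) / (nsur * lsur)           # assumed coefficient
    surse = dec * ssupr ** 0.6 if ssupr > 0.0 else 0.0        # equilibrium detention (in)
    # Below equilibrium, storage is scaled by (1 + 0.6*(SURSM/SURSE)**3);
    # at or above equilibrium (or with no supply) the ratio is capped at 1,
    # giving a factor of 1.6.
    ratio = min(sursm / surse, 1.0) if surse > 0.0 else 1.0
    return delt60 * src * (sursm * (1.0 + 0.6 * ratio ** 3)) ** 1.67


# Hypothetical overland flow plane: n = 0.2, 300 ft long, 1% slope
print(surface_runoff(0.05, 0.5, 0.2, 300.0, 0.01))
```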
Only flow in the main river channel is considered when simulating
rivers (Bicknell et al., 2005). The model uses a storage routing technique to route
water from one reach to the next (Singh et al., 2004). The
discharge relations for reaches in specific function tables (FTABLES) (Singh et al.,
2004). A fixed relationship is assumed among water level, surface area, volume, and
average slope of overland flow can be determined from the Geographic Information
System (GIS) database, including Digital Elevation Models (DEMs) (Singh et al., 2005;
(Linsley et al., 1988; Calderon, 2009). Values of other parameters needed by HSPF cannot
be obtained from field data and must be determined through model calibration.
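Reach routing in HSPF relies on an FTABLE that relates depth, surface area, volume, and outflow. The Python sketch below is not HSPF's RCHRES algorithm. It is a generic storage-indication (modified Puls) routing step against a hypothetical volume-outflow table, included only to show how a fixed volume-discharge relationship drives storage routing. Each step solves the continuity equation V2 + O2·Δt/2 = V1 − O1·Δt/2 + (I1 + I2)·Δt/2 by interpolating in the table. `FTABLE` here is just an illustrative Python list, not the HSPF input block.

```python
from bisect import bisect_right

# Hypothetical table for one reach: (volume m^3, outflow m^3/s), strictly increasing.
FTABLE = [(0.0, 0.0), (1.0e4, 1.0), (3.0e4, 4.0), (6.0e4, 10.0), (1.0e5, 20.0)]


def _interp(x, xs, ys):
    """Linear interpolation; extrapolates linearly beyond the table ends."""
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    x0, x1, y0, y1 = xs[i], xs[i + 1], ys[i], ys[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def route(inflows, dt, table=FTABLE, v0=0.0):
    """Route an inflow series (m^3/s at each time step) through the reach."""
    vols = [v for v, _ in table]
    outs = [q for _, q in table]
    # Storage-indication value 2V/dt + O for each table row (increasing).
    si = [2.0 * v / dt + q for v, q in zip(vols, outs)]
    v = v0
    o = _interp(v0, vols, outs)
    results = []
    for i1, i2 in zip(inflows, inflows[1:]):
        rhs = (i1 + i2) + (2.0 * v / dt - o)      # equals 2*V2/dt + O2
        o = max(_interp(rhs, si, outs), 0.0)
        v = max((rhs - o) * dt / 2.0, 0.0)
        results.append(o)
    return results


outflow = route([0.0, 8.0, 12.0, 6.0, 2.0, 0.0, 0.0, 0.0], dt=3600.0)
print([round(q, 2) for q in outflow])   # attenuated, delayed hydrograph
```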
segment are simulated by the IQUAL module using simple relationships. One approach is to
simulate the constituents by association with solids removal. The other approach uses
atmospheric deposition and/or basic accumulation and depletion rates together with
Where:
interval)
Where:
If there is surface outflow and some quality constituent is in storage, then washoff
is simulated as follows:
Where:
SOQO = washoff of the quality constituent from the land surface (quantity/ac/interval)
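The accumulation and washoff relationships can be illustrated with a short Python sketch. It assumes the first-order forms that, to our understanding, HSPF uses for IQUAL-type constituents, which should be checked against the HSPF User's Manual: daily accumulation toward a storage limit, SQO = ACQOP + SQOS·(1 − ACQOP/SQOLIM), and washoff SOQO = SQO·(1 − exp(−SURO·WSFAC)) with WSFAC = 2.30/WSQOP. Here WSQOP is the runoff rate that removes 90% of storage in one hour. The names ACQOP, SQOLIM, and WSQOP, the hourly interval, and the example values are assumptions.

```python
import math


def accumulate(sqos, acqop, sqolim):
    """One day of buildup toward the storage limit SQOLIM (assumed form)."""
    remqop = acqop / sqolim              # daily removal fraction
    return acqop + sqos * (1.0 - remqop)


def washoff(sqo, suro, wsqop):
    """Washoff SOQO over one hourly interval (assumed first-order form).

    suro  : surface outflow during the interval (in)
    wsqop : runoff rate (in/hr) that removes 90% of storage in one hour
    """
    wsfac = 2.30 / wsqop                 # 2.30 ~ ln(10), hence ~90% removal
    soqo = sqo * (1.0 - math.exp(-suro * wsfac))
    return soqo, sqo - soqo              # (washoff, remaining storage)


# Hypothetical values: 30 dry days of buildup, then one hour with 0.5 in of runoff
sqo = 0.0
for _ in range(30):
    sqo = accumulate(sqo, acqop=0.02, sqolim=0.2)
load, sqo_left = washoff(sqo, suro=0.5, wsqop=0.5)
print(round(sqo, 4), round(load, 4), round(sqo_left, 4))
```

With this form, storage approaches SQOLIM during long dry periods, because at SQO = SQOLIM the daily accumulation exactly balances the daily removal.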
For this study, many components of the BASINS 4.0 system were used in the model
development process, namely WinHSPF and WDMUtil for pre-processing and
Tong, 2006; Im et al., 2003; Shirinian-Orlando, 2007; Wicklein et al., 2008), but less so
in highly urbanized watersheds such as the Chicago River watershed. HSPF lacks the
capability to simulate storm sewer networks (Mohamoud et al., 2010). Nevertheless, some
studies show that, among the reviewed models that simulate stormwater quantity and
quality in urban environments, HSPF is the most comprehensive and flexible hydrology
and water quality model available (Zoppou, 2003; Bergman et al., 2002; Mohamoud et
al., 2010). However, other studies suggested that treating urban land use as a nonpoint
source of nutrients can give invalid results because of the impervious cover in urban
areas and because drainage is frequently routed to wastewater treatment plants (which
may or may not be in the same basin) and then discharged to local rivers as point source (PS)
pollutant loads, the effective impervious area (EIA) as a portion of the total impervious
Smith, 2005; Brabec et al., 2010). Impervious area is a rough indicator of the portion of the
watershed used by human activities. The EIA is considered one of the most important
and most difficult to determine parameters (Sutherland, 2005). It is the portion of the TIA
within a watershed that is partially or totally connected to the drainage collection system.
Street surfaces, parking lots, paved driveways and sidewalks, and rooftops that are directly
connected to the storm sewer system are all included in the EIA (Sutherland, 2000). For
urban runoff modeling or hydrologic analysis, the EIA for a given basin is usually less
than the TIA; however, in highly urbanized basins, EIA values can approach or equal the
TIA (Smith, 2005). Field measurements, empirical equations, and calibrated computer
models are among the ways to determine the effective impervious area (Brabec et al., 2010;
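As a simple illustration of the distinction between total and effective impervious area, the Python sketch below computes both fractions from a hypothetical inventory of impervious surfaces. A surface counts toward the EIA only when it is flagged as hydraulically connected to the drainage system. The `Surface` structure, labels, and areas are invented for illustration. In practice, the connection status would come from field measurements, empirical equations, or calibration.

```python
from dataclasses import dataclass


@dataclass
class Surface:
    kind: str          # e.g. "street", "rooftop" (hypothetical labels)
    area_ac: float
    connected: bool    # drains directly to the storm sewer system


def impervious_fractions(surfaces, basin_area_ac):
    """Return (TIA, EIA) as fractions of the basin area."""
    tia = sum(s.area_ac for s in surfaces)
    eia = sum(s.area_ac for s in surfaces if s.connected)
    return tia / basin_area_ac, eia / basin_area_ac


inventory = [
    Surface("street", 20.0, True),
    Surface("parking lot", 10.0, True),
    Surface("rooftop", 15.0, False),   # downspouts disconnected onto lawns
    Surface("rooftop", 5.0, True),
]
tia, eia = impervious_fractions(inventory, basin_area_ac=100.0)
print(tia, eia)  # 0.5 0.35
```

Disconnecting the first group of rooftops is what makes the EIA smaller than the TIA here. If every surface were connected, as in highly urbanized basins, the two values would coincide.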
Figure 2.3. Structure chart for the PERLND module (Bicknell et al., 2005)
Figure 2.4. Structure chart for the IMPLND module (Bicknell et al., 2005)
Figure 2.5. Structure chart for the RCHRES module (Bicknell et al., 2005)
Figure 2.6. A flow diagram of the hydrological components of HSPF (Bicknell, 2005)
2.4.6 Previous Watershed Studies in the Study Area. A number of studies have been
conducted in the area, but generally as part of investigations of the flow and water
quality of the Upper Illinois River Basin system (Bartosova et al., 2007; Demissie et al.,
2007; Bartosova et al., 2005; Knapp et al., 2004). These studies did not address the
individual watershed, and the limited land use categorization they used could not explain
the more detailed behavior of a highly urbanized watershed such as the Chicago River
Watershed.
The key to understanding nutrient fate and transport will always lie in historical data
records (Boynton et al., 1995; Vanclooster et al., 2004). Any evaluation and analysis of a
watershed should include historical changes and variations, present conditions, and
potential future conditions (Tong et al., 2002; Randhir et al., 2009). Data sources play a
major role in making this possible. The real challenge, however, is the heterogeneity of
data drawn from different sources. Integrating data from these sources into a form that is
useful for assessment, analysis, or model application can be a difficult task because it
involves a thorough investigation of the data pages and the metadata they contain (Beran
et al., 2009; Horsburgh et al., 2009).
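The integration problem can be made concrete with a small Python sketch. Two hypothetical sources report the same kind of observation with different field names, date formats, and units, and each record is mapped into one common schema before analysis. The record layouts below are invented for illustration and do not reproduce the actual formats of NWIS, STORET, or any other system.

```python
from dataclasses import dataclass
from datetime import date, datetime

CFS_TO_CMS = 0.0283168  # cubic feet per second -> cubic meters per second


@dataclass(frozen=True)
class Observation:
    site: str
    day: date
    variable: str
    value: float
    unit: str


def from_source_a(rec):
    # Hypothetical source A: ISO dates, discharge in cfs
    return Observation(rec["site_no"], date.fromisoformat(rec["date"]),
                       "discharge", rec["flow_cfs"] * CFS_TO_CMS, "m3/s")


def from_source_b(rec):
    # Hypothetical source B: month/day/year dates, discharge already in m3/s
    day = datetime.strptime(rec["ActivityDate"], "%m/%d/%Y").date()
    return Observation(rec["StationID"], day, "discharge", rec["Q"], "m3/s")


raw_a = [{"site_no": "X1", "date": "2010-06-01", "flow_cfs": 100.0}]
raw_b = [{"StationID": "X2", "ActivityDate": "06/01/2010", "Q": 3.0}]
records = [from_source_a(r) for r in raw_a] + [from_source_b(r) for r in raw_b]
for r in records:
    print(r)
```

Keeping one adapter function per source isolates each format's quirks, so adding a new source does not change the common schema or the analysis code that consumes it.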
water quality and quantity, groundwater levels, and precipitation, but are managed by
different agencies. This division of responsibilities has created some barriers between
watershed data users and watershed data managers. Many believe that managing water
resource systems in a fully integrated fashion would alleviate these problems (Rooy et al.,
1993).
A number of national data collection and publication systems operated by government
agencies have been established over the years. These include the USGS water data storage
and retrieval system (WATSTORE), which has been replaced by the National Water
Information System (NWIS); the USEPA storage and retrieval system (STORET); systems
operated and maintained by the Natural Resources Conservation Service (NRCS), such as
the Soil Climate Analysis Network (SCAN) and SNOwpack TELemetry (SNOTEL); the
NOAA National Climatic Data Center (NCDC); and others (Horsburgh et al., 2009). These
national data systems are huge data stores, but they use different data storage, retrieval,
and publication formats and systems (Beran et al., 2009; Horsburgh et al., 2008; McGuire
et al., 2008). Synthesizing data sets from these different sources into a single analysis has
proved difficult because each system must be navigated through the pages of metadata it
contains (Raskin et al., 2005; Horsburgh et al., 2008; Horsburgh et al., 2009). Moreover,
all of these systems are traditional database management systems that lack the ability to
integrate data in a way that supports decision making and delivers actionable information (Maidment, 2005;
During the past decade, initiatives by the U.S. National Science Foundation
(NSF), the American Geophysical Union (AGU), the American Meteorological Society
(AMS), and the International Association of Hydrological Sciences (IAHS) have drawn
attention to the value of long-term hydrologic data for investigating long-term,
watershed-scale hydrologic and climatic impacts (Marks et al., 2007). Ongoing
hydrological data collected from experimental watersheds for more than thirty years
collected and stored data and made it available for retrieval on public websites (Marks et
al., 2007; Bosh et al., 2007; Moran et al., 2008; Nicholas et al., 2008). The Long Term
Ecological Research (LTER) network has also made the long-term climatic and hydrologic
data collected for its research available on a public website. Although the data provided
by these experimental watersheds will help in understanding long-term impacts, these
efforts to provide synthesized data for watershed assessment and analysis are mainly of
local benefit to the specific experimental watersheds and will not give similar benefits to
other watersheds.
providing a standard data format that allows effective sharing of information from existing
national databases such as NWIS, NCDC, and STORET (Maidment, 2005; Horsburgh et
al., 2008; Horsburgh et al., 2011; CUAHSI, 2012). Within the HIS, storage and
using an Observations Data Model (ODM), a relational database model that
provides a framework in which data of different types and from disparate sources can be
integrated (CUAHSI, 2012). Another system, an ontology-aided search engine, has also
been introduced. This system, named Hydroseek, allows users to query multiple hydrologic
exist between the sources (Beran et al., 2009; Hydroseek, 2012). Although these efforts
watershed scale, they are solely data storage or retrieval systems, and none of them
Data warehouse (DW) technology is an integrated approach introduced to manage and
analyze monitoring data (Rob et al., 2008). "A DW is a collection of consistent, subject-
oriented, integrated, time-variant, non-volatile data and processes on them, which are
based on available information and enable people to make decisions and predictions
about the future" (Inmon, 2005). A DW is an in-advance approach to integrating data
from multiple, large, heterogeneous, and distributed databases and other information
and client analysis component (Ahmed et al., 2010). It enables business decision makers
2010), including support for complex querying (Bernardino, 2002) and discovery of trends
and patterns in data (Tjoa et al., 2005; Han et al., 2006). A DW stores and maintains data in
The management of huge amounts of data and their complex analysis during queries
are the most important aspects of DW development (Bonifati et al., 2001; Chen et al., 2003;
Kambayashi et al., 2004; Rai et al., 2007). The property that makes a DW an efficient
application processor is that most of its applications are decision-support-oriented
applications that can summarize huge amounts of data and deliver actionable
information (Ahmad, 2010; Rai et al., 2007). Furthermore, DWs have the benefit of
keeping historical records and are historically consistent to achieve better understanding
organizations that generate a great amount of operational data that are distributed across
management, site selection, and energy-efficient building operation (Chau et al., 2002;
sustainability and ecology (Burmann et al., 2007; Teuteberg et al., 2009; Freundlieb et al.,
2009), where decision support processes based on ecological criteria such as electricity
consumption or pollutant content are increasingly needed. The concept was
and fish community health (McGuire et al., 2006). It has also been introduced in the
al., 2007).
environmental and water resources sectors. The existing literature identifies ways to
identifying the dimensions, facts, and hierarchies in spatial data warehousing for
environmental and water resources areas (McGuire et al., 2008). Given the nature of
environmental and water resources data and their sources, the development of an
integrated information system and DW would have great potential in these areas.
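To illustrate the dimensions, facts, and hierarchies mentioned above, the Python sketch below builds a tiny in-memory SQLite warehouse organized as a star schema. It has one fact table of water quality measurements and dimension tables for station, parameter, and date. It then rolls the facts up the date hierarchy to annual means per station, the kind of summary query a DW is built to answer. Table names, columns, and values are hypothetical and do not represent the warehouse developed in this study.

```python
import sqlite3


def build_demo_warehouse():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dim_station (station_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_parameter (param_id INTEGER PRIMARY KEY, name TEXT, unit TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT, year INTEGER);
        CREATE TABLE fact_measurement (
            station_id INTEGER REFERENCES dim_station,
            param_id   INTEGER REFERENCES dim_parameter,
            date_id    INTEGER REFERENCES dim_date,
            value      REAL);
    """)
    con.executemany("INSERT INTO dim_station VALUES (?, ?)",
                    [(1, "Site A"), (2, "Site B")])
    con.execute("INSERT INTO dim_parameter VALUES (1, 'Total phosphorus', 'mg/L')")
    con.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                    [(1, "2009-07-01", 2009), (2, "2010-07-01", 2010),
                     (3, "2010-08-01", 2010)])
    con.executemany("INSERT INTO fact_measurement VALUES (?, ?, ?, ?)",
                    [(1, 1, 1, 0.40), (1, 1, 2, 0.60), (1, 1, 3, 0.80),
                     (2, 1, 2, 0.30)])
    return con


# Roll daily facts up to annual means per station for one parameter.
ROLLUP = """
    SELECT s.name, d.year, AVG(f.value) AS mean_value
    FROM fact_measurement f
    JOIN dim_station s   ON s.station_id = f.station_id
    JOIN dim_date d      ON d.date_id = f.date_id
    JOIN dim_parameter p ON p.param_id = f.param_id
    WHERE p.name = 'Total phosphorus'
    GROUP BY s.name, d.year
    ORDER BY s.name, d.year
"""

con = build_demo_warehouse()
rows = con.execute(ROLLUP).fetchall()
for name, year, mean_value in rows:
    print(name, year, round(mean_value, 3))
```

Keeping descriptive attributes in small dimension tables and only keys and measurements in the fact table is what lets the same facts be re-aggregated by station, parameter, or time period without restructuring the data.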
2.5.1 Data Driven Models. The huge amount of data collected daily by monitoring
systems, together with the exponential growth of information systems, has directed
attention to data mining as a way to generate models that can explain physical systems.
Data mining analyzes all the data characterizing a system and models the system on the
basis of connections between its state variables, with only a limited number of
assumptions about its physical behavior (UNESCO-IHE, 2012). Data-driven modeling is
the study of mathematical algorithms that improve automatically through experience and
training (Preis et al., 2007). It has developed with contributions from areas such as
artificial intelligence, machine learning, data mining, knowledge discovery, and pattern
recognition. The most commonly used models are
Data-driven modeling has gained considerable attention in recent decades in both
hydrology and water resources research. While physically based models require a
description of the system's inputs, the governing physical laws, and boundary and initial
conditions, a data-driven model simply extracts knowledge from a large amount of data
with only a limited number of assumptions about the physical behavior of the system.
A data-driven
Data-driven modeling has been applied in areas such as rainfall-runoff modeling
(Minns et al., 1996; Dawson et al., 1998; Tokar et al., 2000; Solomatine et al., 2003;
Abedini et al., 2004; Muttil et al., 2004; Lin et al., 2007); flood forecasting (Sahoo et al.,
2006; Chen et al., 2007; Chiang et al., 2007); and streamflow prediction (Imrie et al., 2000;
Asefa et al., 2006; Preis et al., 2007). Water quality constituents have also been predicted
using data-driven models in a number of studies (Markel et al., 2002; Preis et al., 2007;
Shrestha et al., 2007).
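The Python sketch below gives a minimal example of the data-driven idea: a linear model that predicts runoff from current and previous-step rainfall is fitted by ordinary least squares, with no physical process description at all. The rainfall and runoff series are synthetic. A linear model is only the simplest stand-in for the neural network and other machine learning models cited above, and the function names are hypothetical.

```python
def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    X is a list of feature rows; an intercept column is added here.
    Solved by Gauss-Jordan elimination with partial pivoting, which is
    adequate for a handful of features.
    """
    A = [[1.0] + list(row) for row in X]
    n = len(A[0])
    # Augmented normal-equation matrix [X^T X | X^T y]
    M = [[sum(a[i] * a[j] for a in A) for j in range(n)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))   # partial pivoting
        M[c], M[p] = M[p], M[c]
        if abs(M[c][c]) < 1e-12:
            raise ValueError("singular normal equations")
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def lagged_features(rain):
    """Rows of (rain_t, rain_{t-1}) for t = 1 .. len(rain) - 1."""
    return [(rain[t], rain[t - 1]) for t in range(1, len(rain))]


# Synthetic data generated from runoff = 0.1 + 0.5*r_t + 0.3*r_{t-1}
rain = [0.0, 1.2, 0.4, 0.0, 2.1, 0.9, 0.3, 0.0, 1.5, 0.7]
X = lagged_features(rain)
runoff = [0.1 + 0.5 * a + 0.3 * b for a, b in X]
coef = fit_ols(X, runoff)
print([round(c, 6) for c in coef])   # recovers the generating coefficients
```

Because the synthetic runoff is an exact linear function of the lagged rainfall, the fit recovers the generating coefficients. With real data, the same procedure would only be trustworthy under the conditions discussed in this section: enough data and no considerable change to the system over the record.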
process if (1) a sufficient amount of data is available; and (2) there are no considerable changes
to the modeled system during the period covered by the model (Solomatine et al., 2004;
models is needed because of a lack of understanding of the underlying physical processes (Preis
et al., 2007; Shrestha et al., 2007) or because the available models are not adequate
validate the simulation results of physically based models with data driven ones, or vice
2.6 Conclusion
of the spatial and temporal aspects of different attributes of water resources, especially
quantity and quality, and how they are interlinked. Finding comprehensive ways to
integrate and assess these attributes is the key to sound and successful watershed
elements such as water quality, quantity, climate, and land use; and watershed problems,
conflicts, needs, and targets; and improving domain knowledge and decision making
Methodologies for analyzing and assessing watersheds using data warehouse and data
mining technologies have proved successful and are attracting considerable attention in the
water resources field relative to existing systems. In addition, the watershed perspective
has been accepted by water resource managers and policy makers as an effective
methodology for addressing the full range of concerns in a watershed. Therefore,
incorporating detailed land use and historical data records into tools that quantify the
impact on water quality, using both physically based and data-driven modeling
techniques, is the key element.
CHAPTER 3
STUDY AREA
3.1 Introduction
The Chicago River Basin (hydrologic unit 07120003) is the smallest part of the
Upper Illinois River Basin (UIRB), comprising 6 percent of the whole basin. The UIRB is
part of the Mississippi River Basin, which is one of the world's largest drainage basins and
covers more than 40% of the land area of the contiguous USA. The significance of
the Chicago River Basin lies in its navigable system. The Chicago Sanitary and Ship Canal,
along with the Illinois River and the lower reaches of the Des Plaines River, provides a
3.2.1 Location and Drainage Area. The Chicago River watershed is located in
northern Illinois, within latitudes 41°11' and 42°20' N and longitudes 87°32'
and 88°46' W. It drains approximately 645 mi². The upper river is the North Branch
Chicago River, which originates in Lake County as three tributary streams, the West Fork,
the Middle Fork, and the Skokie River (Figure 3.1). The three tributaries flow south into
Cook County. The Skokie River joins the Middle Fork, which then joins the West Fork.
The North Branch Chicago River begins at the junction of the combined Middle and West
Forks, and this upper reach ends at the junction of the North Branch and the North Shore
Channel. The North Branch Chicago River then joins the South Branch of the river in
downtown Chicago. The South Branch flows into the Chicago Sanitary and Ship Canal,
which flows westward and joins the Des Plaines River as a tributary of the Illinois
River, which flows southwest across the state and joins the Mississippi River system.
3.2.2 Topography. The uppermost bedrock of the Chicago River Basin is mainly
undifferentiated Silurian-Devonian dolomite and limestone, and Ordovician shale (USGS,
1999). The Chicago River and Des Plaines River basins are naturally separated by a
drainage divide in northern Cook County, Illinois. The origin of the fault has been
explained as either volcanic activity or a meteorite impact (USGS, 1999). Mean
elevation in the watershed is 443 ft above sea level. The study area has a mean basin
slope of 0.001.
3.2.3 Population Growth. The Chicago River basin is a densely populated area.
The population of the basin grew steadily over the years, driving urban and industrial
growth. As a result of this growth, major changes took place in the region and
significantly affected the quality of surface waters. These changes include the construction of
treatment plants (USGS, 1999). Wastewater disposal and storm runoff became a serious
Before the 1900s, the Chicago River and the Calumet River flowed into Lake Michigan.
At that time, the Chicago River served as the city's sewage system. As the population
grew, the river became badly polluted, with human and industrial wastes dumped directly
into the river and then carried into Lake Michigan. The difficulty of providing clean
drinking water from the lake, together with river contamination that caused disease in
the area, led to the decision to reverse the Chicago River by constructing a canal from the
Chicago River to the Des Plaines River. A cut was made through the natural
subcontinental divide that separates the Chicago River and Calumet River basins from the
Des Plaines River basin. The Chicago River now flows from north to south through Lake
and Cook Counties. Although the population has declined slightly over the last two decades,
current issues in the area stem from the development and redevelopment of
urban areas.
3.2.4 Soils. Mollisols with low to very low permeability cover the entire
watershed (USGS, 1999). Poorly drained soils predominate in the north,
especially along the rivers. The hydrologic soil group classification identifies soil groups
with similar infiltration and runoff characteristics. Typically, clay soils are poorly drained
and have very low infiltration rates, while sandy soils are well drained and have higher
infiltration rates. The United States Department of Agriculture (USDA) has defined
four hydrologic soil groups (A, B, C, and D) (USDA, 2007; USDA, 2012). Group A soils
have high infiltration rates, while group D soils have very low infiltration rates. In general,
the Chicago River watershed has a moderately slow infiltration rate along Lake Michigan
(hydrologic group C), with very poorly drained areas along the western border of the
watershed; the rest of the watershed is highly altered and mainly impervious (ILEPA, 2009).
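The hydrologic soil group definitions can be expressed as a simple lookup. The Python sketch below assigns a group from a soil's minimum infiltration rate using the ranges given in USDA NRCS Technical Release 55 (TR-55, 1986): greater than 0.30 in/hr for group A, 0.15 to 0.30 for B, 0.05 to 0.15 for C, and less than 0.05 for D. How the exact boundary values are assigned is an assumption. Current NRCS guidance (National Engineering Handbook Part 630, Chapter 7) classifies soils by saturated hydraulic conductivity and depth to restrictive layers instead, so this is a simplified illustration only.

```python
def hydrologic_soil_group(min_infiltration_in_per_hr):
    """Assign a hydrologic soil group (A-D) from the minimum infiltration rate.

    Ranges follow TR-55 (1986). Assigning the boundary values 0.15 and
    0.05 in/hr to groups B and C is an assumption.
    """
    rate = min_infiltration_in_per_hr
    if rate < 0.0:
        raise ValueError("infiltration rate cannot be negative")
    if rate > 0.30:
        return "A"
    if rate >= 0.15:
        return "B"
    if rate >= 0.05:
        return "C"
    return "D"


for rate in (0.45, 0.20, 0.08, 0.02):   # hypothetical measured rates (in/hr)
    print(rate, hydrologic_soil_group(rate))
```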
3.2.5 Climate. The climate of the watershed is classified as humid continental because
of its cool, dry winters and warm, humid summers. The combination of cool, dry air and
warm, moist air is the source of most precipitation in the basin. This combination can also
result in large daily fluctuations in temperature and precipitation (USGS, 1999). The
approximately 16 to 18 in., and average snowfall (including snow, ice, sleet, and hail) is
3.2.6 Land Use. Human factors that affect the hydrologic characteristics of the
watershed include land use, urbanization, and population change. The population in the basin
grew steadily, creating urban and industrial growth areas, owing to the
construction of the navigable system that links Lake Michigan and the Mississippi River.
Numerous inputs of contaminants and nutrients from manmade sources, including
municipal and industrial releases, urban runoff, and atmospheric deposition, become a
The Chicago River watershed is approximately 82% urban land use. Figure 3.2 shows
land use percentages for the Chicago metropolitan area, extracted from the Chicago
Metropolitan Agency for Planning (CMAP) 2005 Land Use Inventory, which was created
using digital aerial photography and supplemented with data from
[Figure 3.2: land use percentages (0-100%) by category: residential, commercial, institutional, industrial, transportation/communication/utilities, and under construction]
3.2.7 Surface Water Issues. Surface-water issues related to urbanization include point
and nonpoint sources of sediment, nutrients, trace elements, and organic compounds;
streamflow alterations; and the health and community structure of aquatic biota (USGS,
1999). In the early part of the 20th century, MWRDGC built large intercepting sewers to
discharged as effluent. Today, the MWRDGC reclaims approximately 1.4 billion gallons
The two main water reclamation plants (WRPs) that discharge into the Chicago
River watershed are the North Shore WRP and the Calumet WRP. About 70% of the
water in the CAWS is treated effluent; the rest comes from Lake Michigan and
stormwater. Combined sewers, which carry both sewage and stormwater, serve
much of the area around the CAWS. The Tunnel and Reservoir Project (TARP) is the
MWRD's long-term plan to reduce combined sewer overflows (CSOs). TARP works by
capturing the flow from CSOs before it reaches the waterways and diverting it to a system
pervious land to impervious land, as well as by changing the lay of the land and drainage
patterns, resulting in a dramatic increase in the rate and volume of stormwater runoff and
a reduction in groundwater recharge (MWRDGC, 2007). Changes in land cover,
increased construction activity that compacts soils and smooths natural grades,
diminished native vegetation, and storm sewer systems and lined channels all
convey greater volumes of runoff downstream at much faster rates (MWRDGC, 2007).
All this has led to increased flooding, stream channel
3.2.7.2 Water Quality Issues. Much of the pollutant load in runoff originates from
Some common water quality impacts of stormwater runoff are sediment contamination,
loads, nitrogen and phosphorus, were greatest from the urban center of the Chicago
metropolitan area, reflecting the effect of wastewater return flows to the Chicago River
and Chicago Sanitary and Ship Canal (USGS, 1999). About 30 percent of the total
nitrogen load in the upper Illinois River Basin was measured in the Chicago Sanitary and
effluents.
The Chicago Sanitary and Ship Canal was also observed to carry the majority of
considered the main nutrient contributor to the Illinois River and hence to the Gulf of Mexico dead
zone, the largest hypoxic zone measured. Hypoxia is the condition of low dissolved
oxygen in the water that occurs due to an overabundance of nutrients that leads to excess
than 2 mg/L. Prolonged hypoxic conditions can lead to the death of aquatic biota. Table
3.1 lists common pollutants and their potential sources found in Cook County
watersheds, within which most of the Chicago River watershed lies (MWRDGC, 2007).
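The 2 mg/L dissolved oxygen threshold above lends itself to a simple screening check on a station's DO record. The sketch below is illustrative only: it assumes equally spaced readings, treats missing readings as non-hypoxic, and uses a hypothetical minimum run of three consecutive readings to stand in for "prolonged" hypoxia; none of these choices comes from the source.

```python
from typing import List, Optional, Tuple

HYPOXIA_THRESHOLD_MG_L = 2.0  # dissolved oxygen below this value is treated as hypoxic


def hypoxic_flags(do_mg_l: List[Optional[float]]) -> List[bool]:
    """Flag each DO reading as hypoxic; missing readings (None) are not flagged."""
    return [v is not None and v < HYPOXIA_THRESHOLD_MG_L for v in do_mg_l]


def prolonged_hypoxia(do_mg_l: List[Optional[float]],
                      min_consecutive: int = 3) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of runs of at least
    min_consecutive hypoxic readings (min_consecutive is an assumption)."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    # A trailing False acts as a sentinel so that a run at the end is closed.
    for i, flagged in enumerate(hypoxic_flags(do_mg_l) + [False]):
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            if i - start >= min_consecutive:
                runs.append((start, i - 1))
            start = None
    return runs


# Toy DO series in mg/L (illustrative values, not station data).
readings = [6.1, 1.8, 1.5, 1.9, 4.0, 1.2, None, 0.9]
print(prolonged_hypoxia(readings))  # [(1, 3)]
```

Isolated low readings (indices 5 and 7 above) are flagged by `hypoxic_flags` but are not reported as prolonged episodes.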
The review of available historical data records is an essential step in the analysis
of the watershed system. The analysis and assessment of data will help to pinpoint the
problem areas in the watershed. Figure 3.3 depicts the location of data sources and major
3.3.1 Data sources and types. For this study, different types of data were compiled from
several source agencies for the purposes of building the WDW, watershed
assessment, and watershed modeling. These agencies include the U.S. Geological Survey
District USACE, and the Better Assessment Science Integrating Point & Non-Point Sources
(BASINS) data store. Table 3.2 shows the source agency, station ID, data type, and years of
data used.
Table 3.1. Sources and types of potential pollutants in the study area (MWRDGC, 2007).
[Map: Calumet WRP and other WRPs, USGS stations, and MWRD stations; scale in kilometers]
sources discharge actively within the Chicago River watershed. They are permitted under
National Pollutant Discharge Elimination System (NPDES) permits (ILEPA, 2009). These
NPDES discharges were included in the HSPF water quality model as direct inputs to the
main reaches in the watershed. The pollutant species considered are total nitrates as
nitrogen (NO2+NO3), total ammonia (NH3+NH4+), and total phosphorus (TP) as phosphorus. For this study,
only the North Shore WRP and will be considered. Table 3.3 shows the average values of
The basic watershed elements are water quality and quantity, climate, land use,
and any other characteristics that define a watershed such as watershed size, shape, slope,
soil type, drainage area, hydraulic roughness and population. Interactions among these
elements and their attributes can result in different problems, conflicts, targets,
and needs that a watershed would experience, as shown in Figure 3.3, and also a list
For the Chicago River watershed, these watershed elements are further defined as
follows:
Water Quantity:
Stormwater runoff
Water Quality:
Nutrients
Toxics
Bacterial contamination
Salt contamination
Land Use:
82% urban:
~ 56% residential
~ 10% commercial
~ 10% industrial
~ 10% institutional
~ 15% Transportation/utilities
Climate:
Urban heat island (due to the thermal admittance of building materials and the
geometry of structures)
Increased precipitation
Watershed Characteristics:
Size
Shape
Drainage area
Soil type
Average slope
Urbanization degree
Problems:
Flooding
Pollutants
Excess nutrients
Excess erosion
Conflicts:
Parking lots increase the rate and volume of runoff and pollute receiving waters
Targets:
Better recreation
Needs:
Decision models
TMDLs
WQ standards
BMP
[Figure: watershed elements (water quality, water quantity, land use, climate, and watershed characteristics) linked to the watershed]
3.5 Conclusion
Given the study area conditions and the watershed elements, the scope of the
study is to utilize these data and incorporate these elements. The WDW will
make it easy to access, retrieve, analyze, and manage available historical data records
and to fill data gaps. The data are then used to develop watershed models: a data-driven model to
predict water quality and quantity using data-driven algorithms, and a physical watershed
model to simulate the effect of land use on water quality, producing local export coefficients for
the Chicago River watershed. An optimization approach for land use tradeoffs is introduced.
Given the Chicago River watershed needs, the study provides the following: BASINS
provides the integrated watershed platform; the Data Warehouse and HSPF provide the
CHAPTER 4
4.1 Introduction
multiple attributes of water resources, especially quantity and quality. Understanding how these
attributes interact and how to assess them is the key to sound and successful watershed management.
This chapter considers the development of an effective and comprehensive tool that will
holistically integrate some of the watershed attributes and assess them from a watershed
perspective.
spatial and temporal aspects of the watershed and available historical data records. Many
organizations and individuals monitor important hydrologic variables that would help to
assess watersheds; however, the different data storage systems and formats they have
Moreover, all these systems are traditional database management systems that
lack the ability to aggregate data and to provide decision support that analyzes data
and delivers actionable information. Therefore, this chapter addresses this problem by
The objective of this chapter is to demonstrate how to integrate and analyze data
from different data sources. A local DW that aggregates different available data types
from various agencies in the watershed will be presented. Historical records of surface
water quality and quantity, land use, and climate will be investigated and shown as an
example for this study, but more attributes can easily be added and utilized following the
same procedures. The DW will make it easy to access, retrieve, analyze,
and manage data records of water quantity and quality, climate, land use, etc. in the
watershed, to fill data gaps, and to integrate and provide the data for different requirements such as
The overall objectives of this chapter are: Firstly, the development of a multi
the introduction of a graphical user interface that brings the benefits of the multi
Watershed.
and analysis and can be used as a foundation of a decision support system (Chau et al.,
levels, each allowing data aggregation at a desired level of abstraction (Ahmed, 2010).
Each level in a dimension can have additional attributes that provide descriptive
characteristics about the facts, to narrow the search and classify the fact data (Rob,
2008). These descriptive attributes and the dimension hierarchy attributes are called
dimensional data. $D_i$ is the domain of $A_i$, and $\mathrm{Top}_D$ is a specific generic maximum
element that is functionally determined by all other attributes (Gosain et al., 2010;
Only one $A_i$ determines all other category attributes and thus defines the finest
4.2.1 Data Source Area. The data source area includes heterogeneous databases that
supply data to the warehouse (Rai et al., 2007), including flat files and operational
spatial databases. The source systems should be thought of as outside the DW because
there is little or no control over the content or format of the data (Kimball et al., 2002).
The main priorities of the data source area are processing performance and availability
(Kimball et al., 2002; Inmon, 2005). Homogeneity and consistency among different
sources would be preferred but are not required, since the data will be processed in the staging
4.2.2 Data Staging Area. The data staging area is an intermediate database where both
data storage and extract-transform-load (ETL) processes take place. It includes the
of the information from multiple sources into a common format; the cleansing of these
data sets; and the propagation of the data to the DW (Kimball et al., 2002; Sapsford et al.,
2006; Simitsis et al., 2005). The data staging area is dominated by the simple activities of
sorting and sequential processing and does not provide query and presentation services.
4.2.3 Data Presentation Area. The data presentation area (or multi-dimensional data
model) is considered the core of the DW. It is the area where integrated data marts are
organized, stored, and made available for direct querying by users (Kimball et al., 2002).
All data are presented, stored, and accessed through a dimensional model. If the
presentation area is based on a relational database, then the model is referred to as a star schema, and if
technology, then the data are stored in cubes (Kimball et al., 2002). Data must be atomic
and must adhere to the DW bus architecture, in which the overall data architecture for the
warehouse is identified in order to deliver the granular data in a dimensional form. The
planning task.
4.2.4 Data Access Tools. The final major DW component is the data access tool area.
This area provides an interface for end users to retrieve, process, organize, analyze, and
hoc query tool or a complex one such as a sophisticated data mining or modeling
The basic watershed elements are water quality and quantity, climate, land use,
and any other characteristics that define a watershed such as watershed size, shape, slope,
soil type, drainage area, hydraulic roughness and population. Interactions among these
elements and their attributes can result in different problems, conflicts, targets,
and needs that a watershed would experience (see Figure 3.3 and section 3.4).
support system and a sound watershed management plan. It is known that changes in factors such as
climate and land use alter the hydrologic cycle and affect the quantity
of water available for runoff, streamflow, and groundwater flow (Changnon et al., 1996)
and water quality (Tong et al., 2009). It is also well established that watershed hydrology is
intimately related to land use, soil type, and climate (Chow et al., 1988). In spite of this,
these relationships are not always assessed in policy design (Randhir et al.,
2009).
The focus of this study is to develop an effective way to facilitate the evaluation
watershed attributes such as precipitation, nutrients, and surface flow, which stem from basic
watershed elements such as climate, water quality, and water quantity, can be evaluated
by gaining more information about a single attribute or by retrieving information across
multiple attributes.
The interactions among attributes and the difficulty in assessing them play a vital
role in resource management (Randhir et al., 2009; Randhir et al., 1997). Recognizing the
right relationships is an important step toward achieving the potential mix of products and
services that a watershed could provide (Randhir et al., 2009; Lovejoy et al., 1997).
The complexity of interactions among different watershed elements and the difficulty in
assessing them are major reasons for adopting evaluation plans that focus on a single
element or attribute.
Data on the basic watershed elements are segregated among the different operational
systems and data sources that support them. This segregation causes many problems for
watershed-scale data analysis, including difficult data sharing; redundancy, since multiple
entries for the same data may exist at various locations; a slower decision-making
process; and lack of support for the advanced analyses that are important for supporting holistic
In watershed scale analysis and assessment, all data can be associated according
to a specific purpose. The WDW will be capable of providing information based on the
interaction among the basic resources. Collecting and analyzing data in this fashion
sources in a DW. Two distinct approaches may be used to determine the corresponding
(bottom up) approaches. The need-based approach takes care of data that will be needed
in the future based on the watershed needs, so that these data will be acquired and be
in the source systems; and the available data will be added to the warehouse. In this case,
some uploaded data may not have any immediate use but may become useful in the
future. For the WDW, a hybrid approach is adopted, taking into account the watershed
etc;
Water quantity data such as stream flow, groundwater flow, surface runoff etc;
Climate data such as precipitation, air temperature, evaporation, cloud cover etc;
type etc.
All these data may exist in a wide variety of formats, but they will be standardized
4.4.1 Data Sources. Within the United States, many hydrologic variables such as
streamflow, water quality, groundwater levels, soil moisture, and precipitation are
Geological Survey (USGS), the NOAA National Climatic Data Center (NCDC), and others.
A number of national data collection and publication systems have been formed to collect these
These systems contain huge amounts of data, but their different storage
systems and formats, along with different data retrieval systems, have remained an obstacle to
DW, preferably for the most atomic data collected. Data at its lowest grain level provides
In a DW, data are regarded as either fact data or dimensional data (Rob et al., 2008).
Fact tables consist of numeric measurements and are joined to a set of dimension
tables that are filled with descriptive attributes. The fact table is the primary table in the
in a fact table corresponds to a measurement, and all measurements in a fact table must be
at the same grain (Kimball et al., 2002). One example of a fact measure is a specific
granularity adopted to represent facts. They are the entry points into the fact table and
hence the user's interface to the whole DW. The dimension attributes are the primary
source of querying and reporting (Kimball et al., 2002). The power of the DW is directly
proportional to the quality and depth of the dimension attributes (Kimball et al., 2002).
Given the watershed flow reading example, typical dimensions for the watershed reading
Each dimension table is defined with a primary key field, while the fact table uses
foreign key fields to reference its dimension tables. The fact and dimension tables
are simply joined in a star join schema. The resulting dimensional schema is scalable,
allowing new fact and dimension tables to be added as needed, and extensible to
followed. All possible facts and dimensions were identified and possible linkages
between them were established through a Bus Architecture Matrix (BAM) (Kimball et al.,
2002) (see Table 4.1). By defining a standard bus interface for the DW environment,
separate fact and dimensional models that share a comprehensive set of common,
conformed dimensions can be implemented. In Table 4.1, the watershed processes are
laid out as matrix rows, which translate into facts based on the primary watershed
activities. The rows of the BAM are facts (data marts), the columns are possible
dimensions, and the intersections of data marts and dimensions are marked. This watershed
BAM mapped all the processes which need to be considered to get all data marts to
The watershed processes, or fact tables, proposed for this study are Watershed
water quality, Watershed water quantity, Watershed climate, Watershed land use, and
Watershed characteristics. The BAM can be expanded by adding either new watershed
processes (data marts) or more detailed existing processes, along with their corresponding
dimensions, as needed.
Watershed water quality       XXX X X
Watershed water quantity      XXX X X
Watershed climate             XXX X
Watershed land use            XXX X
Watershed characteristics     XX X
A grain level for each entity (fact table and dimension) will be determined
according to watershed requirements and data availability. Table 4.2 defines
the entities used in the proposed WDW model, giving the type, description, and grain
of the fact and dimension tables. The two types of slowly changing dimensions used are
fixed, meaning that the dimension information never changes, and type 1, meaning that
the dimension information can be updated by overwriting the old values when the change
is not significant enough to be tracked. The grain level describes the level of an individual
record in each fact table, making it easy to choose appropriate dimensions to be associated with the fact table
monitoring stations
value
seasons, years)
Land use type    Dimension - Fixed    Provides hierarchies of land use type (e.g. land use level, land use code and type description)    Level III land use
population etc.)
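The two slowly changing dimension behaviors described above (fixed and type 1) can be contrasted with a minimal in-memory sketch. The class, its method names, and the choice of making the source agency dimension type 1 are hypothetical, not part of the WDW; only the land use type dimension is listed as fixed in Table 4.2. The point is that a fixed dimension rejects updates, while a type 1 dimension overwrites the old attribute values without keeping history.

```python
from typing import Dict


class FixedDimensionError(Exception):
    """Raised when an update is attempted on a fixed dimension."""


class Dimension:
    """Hypothetical in-memory dimension table keyed by its primary key."""

    def __init__(self, name: str, scd_type: str) -> None:
        if scd_type not in ("fixed", "type1"):
            raise ValueError(f"unsupported dimension type: {scd_type}")
        self.name = name
        self.scd_type = scd_type
        self.rows: Dict[int, Dict[str, str]] = {}

    def insert(self, key: int, attrs: Dict[str, str]) -> None:
        if key in self.rows:
            raise KeyError(f"duplicate primary key {key} in {self.name}")
        self.rows[key] = dict(attrs)

    def update(self, key: int, changes: Dict[str, str]) -> None:
        if self.scd_type == "fixed":
            # Fixed: the dimension information never changes once loaded.
            raise FixedDimensionError(f"{self.name} is a fixed dimension")
        # Type 1: overwrite in place; the previous values are not tracked.
        self.rows[key].update(changes)


land_use = Dimension("land_use_type_dim", "fixed")
land_use.insert(1, {"level": "III", "description": "Single family residential"})

agency = Dimension("source_agency_dim", "type1")  # type 1 here is an assumption
agency.insert(1, {"name": "USGS", "type": "Federal"})
agency.update(1, {"name": "U.S. Geological Survey"})
print(agency.rows[1]["name"])  # U.S. Geological Survey
```

A type 2 dimension, which would add a new row per change to preserve history, is deliberately left out because the text uses only the fixed and type 1 behaviors.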
dimension is structured in a way that allows filtering or aggregating fact measures from
the fact table at a desired level of the hierarchy; for instance, the Date dimension allows aggregation
of data at the day level, week level, month level, etc. Each level in a dimension can have
more attributes that provide descriptive characteristics about the facts, to filter the search
and classify the fact data (Rob et al., 2008). The basic dimensions that show the
explicit grain proposed for this study are the Date dimension, Location dimension, Source
Agency dimension, Measurement Details dimension, Land Use Type dimension, and
the structure of time providing access to the watershed's historical records. This
structure aggregates data from the day level, week level, month level, season level
based on location that is specified by the monitoring station ID, available location
data based on agency's name (e.g. USGS, EPA) and type (e.g. Federal, regional,
4. The Measurement Details Dimension: This dimension specifies details about the
(e.g. mg/L, cfs), category (e.g. water quality, water quantity), and subcategory (e.g.
chemical, physical).
5. The Land Use Type Dimension: this dimension specifies the land use level, whether
level I (e.g. urban land use), level II (e.g. residential urban land use), or level III
(e.g. single family residential land use), and the specified code and description for
Figure 4.2 shows the roll-up for the land use type dimension as an example of
dimensions is stored in corresponding dimension tables and all fact measures are stored
in separate tables. Each set of fact data, such as watershed water quality, watershed water quantity,
related to the dimensional data. Since the presentation area is based on a relational
database, these dimensionally modeled tables are referred to as star schemas (Kimball et
al., 2002).
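The star join described above can be illustrated with a trimmed-down water quality star in SQLite, used here only as a stand-in for the Oracle DW. The table and column names are hypothetical simplifications of the dimensions in the text, and the rows are toy values. The query constrains on dimension attributes (station, parameter, season) and aggregates the fact measure; the foreign key pragma makes SQLite reject fact rows whose keys have no matching dimension row.

```python
import sqlite3

# Hypothetical, simplified water quality star: one fact table whose foreign
# keys reference the primary keys of three dimension tables.
SCHEMA = """
CREATE TABLE date_dim (
    date_key INTEGER PRIMARY KEY, full_date TEXT, month_name TEXT, season TEXT);
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY, station_id TEXT);
CREATE TABLE measurement_details_dim (
    measurement_key INTEGER PRIMARY KEY, measurement_name TEXT, unit TEXT);
CREATE TABLE watershed_water_quality_fact (
    date_key INTEGER NOT NULL REFERENCES date_dim (date_key),
    location_key INTEGER NOT NULL REFERENCES location_dim (location_key),
    measurement_key INTEGER NOT NULL
        REFERENCES measurement_details_dim (measurement_key),
    reading_value REAL);
"""


def seasonal_average(conn, station_id, measurement_name, season):
    """Star join: constrain on dimension attributes, aggregate the fact measure."""
    (avg,) = conn.execute(
        """
        SELECT AVG(f.reading_value)
        FROM watershed_water_quality_fact AS f
        JOIN date_dim AS d ON f.date_key = d.date_key
        JOIN location_dim AS l ON f.location_key = l.location_key
        JOIN measurement_details_dim AS m ON f.measurement_key = m.measurement_key
        WHERE l.station_id = ? AND m.measurement_name = ? AND d.season = ?
        """,
        (station_id, measurement_name, season),
    ).fetchone()
    return avg


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # reject facts with unknown dimension keys
conn.executescript(SCHEMA)
# Toy rows for illustration only (not WDW data).
conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?, ?)", [
    (1, "2005-07-15", "July", "Summer"),
    (2, "2005-08-10", "August", "Summer"),
    (3, "2005-01-20", "January", "Winter"),
])
conn.execute("INSERT INTO location_dim VALUES (1, 'WW_32')")
conn.execute("INSERT INTO measurement_details_dim VALUES (1, 'Total phosphorus', 'mg/L')")
conn.executemany("INSERT INTO watershed_water_quality_fact VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 0.4), (2, 1, 1, 0.6), (3, 1, 1, 0.2)])
conn.commit()
print(seasonal_average(conn, "WW_32", "Total phosphorus", "Summer"))  # 0.5
```

Adding another fact table (for example, a climate fact) would reuse the same conformed `date_dim` and `location_dim` tables, which is what makes the schema extensible.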
Using a star schema as the data modeling technique will provide an efficient query
keeping the relational structure of the dimensional and fact data (Rob et al., 2008). Figure
4.3 shows the star model for one of the proposed watershed processes, the watershed water
quality data mart, and the corresponding dimensions. In the star schema model, the
(date, location, source agency, and land use) that are filled with descriptive attributes.
Each fact table can be shown individually, as in Figure 4.3, with its dimension tables
displayed radially around it, or all fact tables can be shown collectively. The
proposed WDW is designed as a multidimensional model, shown in Figure 4.4. The
figure shows the five central fact tables for the five different watershed processes,
which consist of measurements and dimension keys to a set of six smaller dimension
[Land Use Level III Code rolls up to Land Use Level II Code, which rolls up to Land Use Level I Code; each level carries its description (Land Use Level III Desc, Land Use Level II Desc, Land Use Level I Desc)]
Figure 4.2. Roll-up for the land use type dimension and related attributes
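The roll-up in Figure 4.2 amounts to grouping level III records by their level II or level I parent and summing the measure. In this sketch the dimension keys, the non-residential rows, and the areas are hypothetical (they are not CMAP land use codes or study data); only the single family residential, residential, and urban labels echo the examples given for the Land Use Type dimension.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical land use type dimension rows: each level III entry carries its
# level II and level I parents, mirroring the hierarchy in Figure 4.2.
LAND_USE_TYPE_DIM: Dict[str, Dict[str, str]] = {
    "LU-01": {"level_iii": "Single family residential",
              "level_ii": "Residential", "level_i": "Urban"},
    "LU-02": {"level_iii": "Multi family residential",
              "level_ii": "Residential", "level_i": "Urban"},
    "LU-03": {"level_iii": "Office",
              "level_ii": "Commercial", "level_i": "Urban"},
    "LU-04": {"level_iii": "Forest preserve",
              "level_ii": "Open space", "level_i": "Non-urban"},
}


def roll_up(area_by_key: Dict[str, float], level: str) -> Dict[str, float]:
    """Sum level III areas up to the requested level ('level_ii' or 'level_i')."""
    totals: Dict[str, float] = defaultdict(float)
    for key, area in area_by_key.items():
        totals[LAND_USE_TYPE_DIM[key][level]] += area
    return dict(totals)


areas = {"LU-01": 120.0, "LU-02": 30.0, "LU-03": 50.0, "LU-04": 25.0}  # toy areas
print(roll_up(areas, "level_ii"))  # {'Residential': 150.0, 'Commercial': 50.0, 'Open space': 25.0}
print(roll_up(areas, "level_i"))   # {'Urban': 200.0, 'Non-urban': 25.0}
```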
[Star schema diagram: Watershed Water Quality fact table joined to the Source Agency, Date, and other dimension tables]
Figure 4.3. Star schema model for watershed water quality data mart
[Figure 4.4: multidimensional model of the proposed WDW; atomic fact tables holding reading values (and land use area percentages) with foreign keys (FK) to the Date, Location, Source Agency, Measurement Details, and Land Use Type dimensions, whose dimension types are fixed or type 1; the Date dimension carries attributes such as day of week, month name, and season]
Table 4.3 shows the dimension, fact, and stage table statistics. Table 4.4 shows the
watershed water quality fact table resulting from the star schema as an example of the
watershed process fact tables. It shows the watershed process reading measures and
all dimensions related to the fact table via the dimension primary keys. All fact tables
have three or more foreign keys, designated by the FK notation in Figure 4.4, that
connect to the dimension tables' primary keys (Kimball et al., 2002). For example, a date
key in any of the fact tables will always match a specific date key in the Date dimension
table. When all the keys in the fact tables correctly match their respective primary keys
in the corresponding dimension tables, the tables satisfy referential
integrity, and the fact tables can be accessed via the dimension tables joined to them.
DATE_DIM                         DIM       29950    89
LAND_USE_TYPE_DIM                DIM         136    86
LOCATION_DIM                     DIM          77    82
MEASUREMENT_DETAILS_DIM          DIM         159    29
SOURCE_AGENCY_DIM                DIM           4    72
WATERSHED_CLIMATE_FACT           FACT      33878    28
WATERSHED_WATER_QUANTITY_FACT    FACT     151692    29
MWRD_READINGS_STAGE              STAGE   1377409    33
NWS_AIR_TEMP_STAGE               STAGE     17593    18
NWS_DAILY_PREC_STAGE             STAGE     16285    10
Measurement Details Key (FK)    Foreign key from the measurement details dimension
Land Use Type Key (FK)          Foreign key from the land use type dimension
Source Agency Key (FK)          Foreign key from the source agency dimension
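The referential integrity condition described above (every foreign key in a fact table must match a primary key in its dimension table) can be checked with a short routine before facts are delivered. The function name and sample keys are hypothetical; a relational database would normally enforce the same rule through foreign key constraints.

```python
from typing import Dict, Iterable, List, Set, Tuple


def orphan_keys(fact_rows: Iterable[Dict[str, object]],
                dimension_keys: Dict[str, Set[int]]) -> List[Tuple[int, str, object]]:
    """Return (row index, foreign key column, value) for each fact foreign key
    that has no matching primary key in its dimension table."""
    orphans: List[Tuple[int, str, object]] = []
    for i, row in enumerate(fact_rows):
        for fk_column, valid_keys in dimension_keys.items():
            value = row.get(fk_column)
            if value not in valid_keys:
                orphans.append((i, fk_column, value))
    return orphans


# Primary keys present in each (hypothetical) dimension table.
dims = {"date_key": {1, 2, 3}, "location_key": {1, 2}, "measurement_key": {1}}
facts = [
    {"date_key": 1, "location_key": 1, "measurement_key": 1, "reading_value": 0.4},
    {"date_key": 4, "location_key": 2, "measurement_key": 1, "reading_value": 0.6},
]
problems = orphan_keys(facts, dims)
print(problems)      # [(1, 'date_key', 4)]
print(not problems)  # False: referential integrity is not satisfied
```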
The review of available aggregated historical data records is an important step toward
more detailed and better assessment and analysis of watershed data. To facilitate access
to the WDW, a tailored graphical user interface (GUI) dashboard was built. By definition,
business intelligence and data integration system to facilitate the different tasks of the
The GUI is a web-based browser applet implemented in Java that can be accessed
through ordinary internet browsers. The distinctive feature of this dashboard is that it consists of
two view layers of information: a monitoring layer that shows graphically abstracted data,
graphs, symbols, and charts; and an analysis layer that provides summarized dimensional
data, hierarchies, and slicing and dicing of data through an ad hoc analysis tool (Eckerson,
2006).
The purpose of the monitoring layer is to convey information visually through
elements such as graphs, dials, gauges, symbols, alerts, charts, and tables with
specific formats, or any other visual elements that convey information. For the analysis layer,
aspects such as dimensional time series analysis and segmentation are considered, along
with visual analysis, reporting, and predictive statistics and modeling tools that could
give information about the root cause of a problem. These successive layers provide the
necessary details, views, and perspectives that enable users to understand a problem and
identify the steps they must take to address it (Eckerson, 2006). The dashboard gives users
access to the WDW wherever internet access is available. Example of
The GUI (Figure 4.5) allows tracking of different parameters for different water
quantity, water quality, and climate stations, or any selected watershed process, over any
desired time period. The main purpose is to show watershed data in a complete view
that includes location and date selection. This enables the user to view the watershed
conditions for the selected location and date to build up information and
A graphical representation provides the user with the sensitivity level of the selected
parameter they want to assess. If the user is only interested in obtaining information
relating to a particular station for a selected period of time, it is possible to assess
whether data for this station are sufficiently available for the selected period. The user can
scan through a number of successive water quality and quantity monitoring stations in
different locations and at different date levels, ranging from the day, week,
month, or year level to the seasonal level, from the time selection panel. The
The dimensional data can be further analyzed through ad hoc analysis tools, where
data can be sliced and diced to find patterns or pinpoint problem areas. Figure 4.6
shows a sample ad hoc analysis of the average, maximum, and minimum values of total
phosphorus during summer for all the water quality stations within the Chicago River
All the analyzed data, graphs, and tables can be exported in several formats (such
as Excel or PDF) and used in other tools for data mining, modeling, and presentation (e.g., PowerPoint).
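The ad hoc analysis behind Figure 4.6 (average, maximum, and minimum summer total phosphorus by station) is essentially a slice on parameter and season followed by a grouping on station. The sketch below assumes meteorological seasons (June to August as summer) because the text does not define the season boundaries of the Date dimension; the function names and readings are hypothetical toy values.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

Reading = Tuple[str, date, str, float]  # (station ID, sample date, parameter, value)


def season_of(day: date) -> str:
    """Assumed meteorological seasons; the WDW's own season definition may differ."""
    if day.month in (12, 1, 2):
        return "Winter"
    if day.month in (3, 4, 5):
        return "Spring"
    if day.month in (6, 7, 8):
        return "Summer"
    return "Fall"


def slice_and_dice(readings: Iterable[Reading], parameter: str,
                   season: str) -> Dict[str, Tuple[float, float, float]]:
    """Slice on parameter and season, dice by station: (average, maximum, minimum)."""
    by_station: Dict[str, List[float]] = defaultdict(list)
    for station, day, param, value in readings:
        if param == parameter and season_of(day) == season:
            by_station[station].append(value)
    return {s: (sum(v) / len(v), max(v), min(v)) for s, v in by_station.items()}


readings = [  # toy values, not WDW data
    ("WW_32", date(2005, 7, 1), "TP", 0.4),
    ("WW_32", date(2005, 8, 1), "TP", 0.6),
    ("WW_32", date(2005, 1, 5), "TP", 2.0),
    ("WW_46", date(2005, 6, 15), "TP", 1.0),
    ("WW_46", date(2005, 7, 15), "DO", 5.0),
]
print(slice_and_dice(readings, "TP", "Summer"))
# {'WW_32': (0.5, 0.6, 0.4), 'WW_46': (1.0, 1.0, 1.0)}
```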
4.6.1 Watershed Condition and Data. The WDW concept was demonstrated for the
The Chicago River basin is a densely populated area. The population in the basin
grew steadily over the years, creating urban and industrial growth. As a result of this
growth, major changes in the region took place and have significantly affected the
quality of surface waters. These changes are the construction of navigable waterways,
(USGS, 1999). Numerous inputs of contaminants and nutrients from manmade sources
that include municipal and industrial releases, urban runoff, and atmospheric deposition
Now, the population has declined slightly over the last two decades, but the issues in the
available urban areas. The watershed is considered a highly urbanized area, with almost
82% urban land use. Increasing water quality and quantity issues, along with
uncontrolled invasive species from the Mississippi River that threaten Great Lakes
ecology, have raised calls for extreme measures to resolve these issues.
But before drastic measures are taken to solve problems in the watershed, a thorough
quality and quantity, climate, land use, and other watershed characteristics data will offer
a better understanding, assessment, and analysis of the watershed. Details of these
repository for the watershed, a WDW for the Chicago River watershed is proposed. The
WDW is an in-advance approach to the integration of data from multiple, possibly very
1995). It will manage and analyze monitoring data in an integrated way that will develop
Analysis of the historical data record will give insight into the previous and existing
watershed conditions and their sensitivity to different parameters, making it easy to
concentrate either on the whole watershed or on a specific subwatershed. This will
help in developing a deep understanding of the watershed and lead to the establishment of
As shown previously in Table 3.2, numerous water quality, quantity,
climate, and land use data were obtained for the watershed. Water quantity data were obtained
from the USGS, which has 18 active stations that measure daily flow and gage height in the
watershed; data for the period 1970-2010 were compiled for water quantity. Water
quality data were obtained from the MWRDGC, which has 41 stations within the
watershed that measure up to 65 different water quality parameters once or twice a month,
or three times a month at some stations; data for the period 1970-2008 were compiled for
water quality. Land use data were compiled from CMAP; the land use inventories for 2001 and
2005 were utilized. Climate data compiled were precipitation and air temperature:
hourly data from the Chicago O'Hare Airport meteorological station for the period 1970-2006 for
precipitation and 1994-2006 for air temperature.
4.6.2 Watershed Data Warehouse Architecture. The data were extracted from their
original data sources and saved in Excel files. Staging area tables, dimension tables,
and fact tables were created and stored in an Oracle Database 11g system, launching the DW.
The data were loaded into the DW's staging area using SQL*Loader, an
Oracle-supplied utility that allows users to load data from a flat file into one or more
database tables. A control file was created to provide SQL*Loader with information such as the
name and location of the input data file, the format of the records in the input data file, the names of the
tables to be loaded, and the correspondence between the fields in the input files and the columns
in the destination database tables (Gennick et al., 2001). The staging area is
where the data are cleansed, manipulated, and prepared to be delivered to the multi
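The control-file idea above (input file format, target table, and field-to-column correspondence) can be mimicked outside Oracle to show what the loader does. The sketch below is not SQL*Loader: it uses Python's `csv` and `sqlite3` modules as stand-ins, and the `CONTROL` dictionary, column names, and converters are hypothetical. Only the staging table name MWRD_READINGS_STAGE is taken from Table 4.3.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for a SQL*Loader control file: the input format, the
# target staging table, and the field-to-column correspondence (with converters).
CONTROL = {
    "table": "mwrd_readings_stage",
    "delimiter": ",",
    "fields": [  # (input field, destination column, converter)
        ("Station", "station_id", str),
        ("Date", "sample_date", str),
        ("Parameter", "parameter", str),
        ("Value", "reading_value", float),
    ],
}


def load_flat_file(conn, text, control):
    """Load delimited text into the staging table; return (loaded count, rejected rows)."""
    columns = [column for _, column, _ in control["fields"]]
    # Table and column names come from the trusted control dictionary, not user input.
    conn.execute(f"CREATE TABLE IF NOT EXISTS {control['table']} ({', '.join(columns)})")
    loaded, rejected = [], []
    for record in csv.DictReader(io.StringIO(text), delimiter=control["delimiter"]):
        try:
            loaded.append(tuple(convert(record[field])
                                for field, _, convert in control["fields"]))
        except (KeyError, TypeError, ValueError):
            rejected.append(record)  # set aside, much like SQL*Loader's bad file
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {control['table']} ({', '.join(columns)}) VALUES ({placeholders})",
        loaded)
    conn.commit()
    return len(loaded), rejected


flat_file = "Station,Date,Parameter,Value\nWW_32,2005-07-01,TP,0.4\nWW_32,2005-07-02,TP,n/a\n"
conn = sqlite3.connect(":memory:")
count, bad = load_flat_file(conn, flat_file, CONTROL)
print(count, len(bad))  # 1 1
```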
The four staging steps of a DW are extracting, cleaning, conforming and delivering
(Kimball et al., 2004). The extracting was simple and fast: the original data was
extracted from the different sources and loaded into its designated stage tables. In the case of the
USGS and MWRDGC data, the extracted tables were restructured and cleaned of the
different symbols and notations used by the source before they were loaded into the
staging area, and the CMAP shapefile areas were transformed into numerical areas that
values, consistency across values, and removing duplicates; null cells were either
populated with mean values or removed, very high readings and unreasonable
negative readings were removed, and data were matched by location for some stations
whose station ID had changed over the years. Data conformation is required
whenever two or more data sources are merged into the DW; standardized domains and
measures were used so that separate data sources can be queried on identical
textual and numerical labels. Finally, to make the data ready for querying, the data
was physically structured into a set of simple, symmetric schemas, discussed earlier, and
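The cleaning rules listed above can be illustrated with a short Python sketch. It is not the procedure used for the Chicago River WDW (which ran on Oracle with SQL*Loader); the qualifier symbols in `QUALIFIERS`, the `upper_limit` value and the record layout are hypothetical. The fill value is the mean of the series passed in, so the function is meant to be called once per station and parameter.

```python
from statistics import mean

# Hypothetical qualifier symbols a source might attach to values, e.g. "<0.5" or "1.5E".
QUALIFIERS = "<>*E"


def parse_value(raw):
    """Strip qualifier symbols; return a float, or None for an empty (null) cell."""
    if raw is None:
        return None
    text = raw.strip().strip(QUALIFIERS).strip()
    return float(text) if text else None


def clean_series(records, upper_limit):
    """Clean one station/parameter series of (station_id, date, raw_value) tuples.

    - rows repeating a (station_id, date) pair are dropped (the first is kept);
    - negative readings and readings above upper_limit are removed;
    - remaining null cells are filled with the mean of the observed values.
    """
    seen = set()
    kept = []
    for station, date, raw in records:
        if (station, date) in seen:
            continue
        seen.add((station, date))
        value = parse_value(raw)
        if value is not None and (value < 0 or value > upper_limit):
            continue
        kept.append((station, date, value))
    observed = [v for _, _, v in kept if v is not None]
    fill = mean(observed) if observed else None
    return [(s, d, fill if v is None else v) for s, d, v in kept]


raw = [
    ("WW_32", "2000-01-05", "<0.5"),
    ("WW_32", "2000-01-05", "<0.5"),   # duplicate row
    ("WW_32", "2000-02-02", ""),       # null cell
    ("WW_32", "2000-03-01", "-1.0"),   # unreasonable negative reading
    ("WW_32", "2000-04-04", "1.5E"),
    ("WW_32", "2000-05-03", "999"),    # implausibly high reading
]
print(clean_series(raw, upper_limit=50.0))
```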
The measurements and dimensional data contained in the staging area were
mapped to the DW and loaded into the designated fact and dimensional tables,
with the correct foreign keys mapped. All logical definitions and their
stakeholders. Data analysis and assessment of the spatial and temporal aspects
of the watershed give an overview of the system and its needs and can help identify the
major issues and problems in the study area. This section presents a multi-year assessment
of some of the water quality parameters, obtained by using the Chicago
River watershed dashboard and running ad hoc analyses on the Chicago River
WDW. Figure 4.7 shows the locations of the stations selected for the assessment.
They were selected to show the behavior of the watershed upstream and downstream of
sections of the system. The parameters chosen for the assessment were total Kjeldahl
nitrogen (TKN), total nitrates (NO2+NO3), total phosphorus (TP), dissolved oxygen
(DO), and water temperature. Other watershed assessments for different parameters such as
Figure 4.7. Water quality and quantity stations used in the watershed assessment [map labels: MWRDGC stations WW_32, WW_106, WW_31, WW_37 and WW_46; USGS stations 05535070, 05535500, 05534500, 05536105 and 05536118; Lake Michigan; DuPage County; MWRD WRP]
4.6.3.1 Assessment of TKN and Total Nitrates (NO2+NO3). By definition, TKN is the
sum of organic nitrogen, ammonia (NH3), and ammonium (NH4+), and to calculate total
Figures 4.8 and 4.9 show the TKN and total nitrates historical data in the MWRDGC
No WQS are currently available for these two parameters in the Chicago River
Watershed. If they were available, it would be easy to apply the WQS values to detect
where and when the standards were exceeded, and a thorough analysis of those locations could
then be done.
A visual inspection of Figures 4.8 and 4.9 reveals that the upstream station
WW_32 showed lower and more stable concentrations through most of the years, while
the downstream station WW_46 showed much higher values, with an apparently decreasing trendline
for TKN and an increasing trendline for total nitrates. This is due to effluent from the North Side WRP,
which, because of stringent ammonia permit limits, converts more of the ammonium
into nitrates. These findings suggest that looking only at TKN at the downstream station
would have indicated an improvement in that constituent; however, this is not the
case, since the assessment shows that the TKN was actually transformed into nitrates.
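The trendlines referred to above are, in the usual spreadsheet sense, ordinary least-squares lines fitted to concentration against time, and the sign of the slope gives the direction of change. A minimal sketch under that assumption follows; the annual values are hypothetical, not MWRDGC records.

```python
def ols_trend(years, values):
    """Return (slope, intercept) of the least-squares line value = slope * year + intercept."""
    n = len(years)
    if n < 2:
        raise ValueError("need at least two points")
    x_bar = sum(years) / n
    y_bar = sum(values) / n
    sxx = sum((x - x_bar) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("all years are identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical annual means (mg/l) at a downstream station.
years = [1990, 1995, 2000, 2005]
tkn = [4.0, 3.5, 3.0, 2.5]
nitrates = [3.0, 3.6, 4.2, 4.8]
print(ols_trend(years, tkn)[0])       # negative slope: decreasing TKN
print(ols_trend(years, nitrates)[0])  # positive slope: increasing nitrates
```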
4.6.3.2 Assessment of TP. Figure 4.10 shows the total phosphorus historical data for the
MWRDGC stations included in this assessment (see Figure 4.7 for locations). For all
stations, the majority of the total phosphorus data fall in the range 0-2 mg/l.
Had WQS for TP been available, they would have helped identify the locations and periods
in which the standard was exceeded, for further analysis and assessment. Downstream TP
remained almost constant or increased very slightly over the years, suggesting that little had been done to reduce this
constituent.
4.6.3.3 Assessment of N/P Ratio. Nutrients, such as nitrogen and phosphorus, are
essential for a healthy and diverse aquatic environment. Excessive amounts of nutrients,
however, can have undesirable effects on water quality, resulting in changes in the
biological community (USEPA, 2000). High concentrations of nutrients can also result in
potential human health risks associated with the growth of harmful algal blooms (Harned
et al., 2004), contributing to the phenomenon known as eutrophication, which in turn results in
hypoxia. Hypoxia is the condition of low oxygen in the water that occurs due to
section, the N/P ratios are evaluated in terms of identifying the limiting nutrient in the
aquatic system. The limiting nutrient is a chemical needed for plant
growth that is available in smaller quantities than algae need to increase their
abundance (Calderon, 2009). To identify the limiting nutrient, Chapra (1997) specified a
rule of thumb for the N/P ratio in rivers and streams: a ratio of 7.2 or
less indicates that the limiting factor for algal growth is nitrogen, and a ratio
higher than 7.2 indicates that the limiting factor is phosphorus (Calderon, 2009).
Figures 4.11 and 4.12 show the N/P ratio assessment for an upstream station, WW_32, and a
shows higher N/P ratios, which suggest high concentrations of nitrogen relative to
phosphorus, making phosphorus the limiting factor. Looking only at the downstream
N/P ratios in Figure 4.12 would suggest low concentrations of both phosphorus and
nitrogen, and hence lowered N/P ratios. However, the assessment done for TKN,
total nitrates and total phosphorus suggests that the lowered N/P ratio is due to
added nitrogen and phosphorus, probably from the North Side treatment plant and other point sources.
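The rule of thumb above reduces to a single comparison. The sketch below assumes that N is total nitrogen computed as TKN plus NO2+NO3 (a common definition; the exact form used for Figures 4.11 and 4.12 is an assumption here) and that both concentrations are in mg/l; the input values are hypothetical.

```python
# Chapra's (1997) rule of thumb as cited above: N/P <= 7.2 -> nitrogen limits growth.
NP_THRESHOLD = 7.2


def total_nitrogen(tkn, nitrite_nitrate):
    """Assumed definition: TN = TKN + (NO2 + NO3), all in mg/l as N."""
    return tkn + nitrite_nitrate


def limiting_nutrient(total_n, total_p, threshold=NP_THRESHOLD):
    """Return 'nitrogen' if N/P <= threshold, otherwise 'phosphorus'."""
    if total_p <= 0:
        raise ValueError("total phosphorus must be positive")
    return "nitrogen" if total_n / total_p <= threshold else "phosphorus"


# Hypothetical samples: a high N/P ratio and a low N/P ratio.
print(limiting_nutrient(total_nitrogen(1.0, 2.0), 0.2))  # ratio about 15
print(limiting_nutrient(total_nitrogen(1.5, 1.5), 1.0))  # ratio 3
```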
4.6.3.4 Assessment of DO. Figure 4.13 shows the dissolved oxygen concentrations over the years
for the stations selected for the assessment. The figure shows that almost all of the concentrations
measured are above 2 mg/l, indicating sufficient DO in the water. This result was
expected, in spite of the high nutrient concentrations in the streams, because of the
aeration facilities along the stream. The dissolved oxygen concentrations were further
analyzed against water temperature for both stations, as shown in Figures 4.14 and 4.15.
The figures clearly show that dissolved oxygen drops as water temperature rises,
probably during periods of warm air temperatures, consistent with the lower solubility of oxygen in warmer water. Figure 4.16 shows the relationship between
[Plots not reproduced: time series for upstream station WW_32 and downstream station WW_46, 1970-2010; plots against water temperature (deg C), one showing the fitted line y = 2.3525x + 20.99, R² = 0.8103.]
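The two DO observations above (nearly all readings exceed 2 mg/l, and DO decreases as water temperature rises) can be quantified with a share-above-threshold count and a Pearson correlation coefficient. This is an illustrative sketch with hypothetical paired values; a negative coefficient corresponds to the downward pattern described for Figures 4.14 and 4.15.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def share_above(values, threshold=2.0):
    """Fraction of DO observations above a threshold (mg/l)."""
    return sum(v > threshold for v in values) / len(values)


# Hypothetical paired observations, not MWRDGC data.
water_temp_c = [5, 10, 15, 20, 25]
do_mg_l = [11, 10, 9, 8, 7]
print(share_above(do_mg_l))              # every observation is above 2 mg/l
print(pearson_r(water_temp_c, do_mg_l))  # negative: DO falls as temperature rises
```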
4.7 Conclusion
The multi-dimensional watershed model presented in this chapter is the base for
the framework proposed to investigate land use effects on water quality in highly
urbanized watersheds. It provides readily integrated watershed data that offer a holistic
view of the watershed elements across heterogeneous data sources. The DW concept
described here is used to study and assess the Chicago River Watershed. It allows
combining data from different sources, such as USGS, MWRDGC, CMAP, and NWS in
facilitates the integration and aggregation of information at all desired levels concerning
The web-based dashboard and reporting tools allow the watershed stakeholders to
management the watershed. The GUI introduced here illustrates how readily the DW
dimensional concept can be mapped to graphical user interface design to create a tool that
facilitates the different tasks intended by the users, whether a watershed assessment
task or the integration of data for a physical model application. The ad hoc analysis tools
are further used to slice and dice the data to find patterns or pinpoint particular
problem areas, and to provide the details, views, or perspectives that enable users to
understand a problem and identify the steps they must take to address it. This makes
analyzing and assessing a watershed more efficient than using traditional databases.
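Slicing and dicing in a dimensional warehouse amounts to filtering the fact table through its dimension tables and aggregating the result. The sketch below uses Python's built-in `sqlite3` as a stand-in for the Oracle DW; the table names (`station_dim`, `parameter_dim`, `date_dim`, `measurement_fact`), their columns and all values are hypothetical and only mirror the star-schema idea of one fact table joined to dimension tables through foreign keys.

```python
import sqlite3


def build_demo_warehouse():
    """In-memory star schema: one fact table referencing three dimension tables."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE station_dim (station_key INTEGER PRIMARY KEY, station_id TEXT, reach TEXT);
        CREATE TABLE parameter_dim (parameter_key INTEGER PRIMARY KEY, name TEXT, unit TEXT);
        CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE measurement_fact (
            station_key INTEGER REFERENCES station_dim(station_key),
            parameter_key INTEGER REFERENCES parameter_dim(parameter_key),
            date_key INTEGER REFERENCES date_dim(date_key),
            value REAL
        );
    """)
    con.executemany("INSERT INTO station_dim VALUES (?, ?, ?)",
                    [(1, "WW_32", "upstream"), (2, "WW_46", "downstream")])
    con.executemany("INSERT INTO parameter_dim VALUES (?, ?, ?)",
                    [(1, "TKN", "mg/l"), (2, "NO2+NO3", "mg/l")])
    con.executemany("INSERT INTO date_dim VALUES (?, ?, ?)",
                    [(1, 2000, 6), (2, 2000, 7), (3, 2001, 6)])
    # Hypothetical measurements: (station_key, parameter_key, date_key, value).
    con.executemany("INSERT INTO measurement_fact VALUES (?, ?, ?, ?)",
                    [(1, 2, 1, 1.0), (1, 2, 2, 2.0), (2, 2, 1, 5.0),
                     (2, 2, 2, 7.0), (2, 2, 3, 6.0), (2, 1, 1, 3.0)])
    return con


def slice_yearly_mean(con, parameter, reach):
    """Slice on one parameter and one reach, then roll the month level up to yearly means."""
    return con.execute("""
        SELECT d.year, AVG(f.value)
        FROM measurement_fact AS f
        JOIN station_dim AS s ON f.station_key = s.station_key
        JOIN parameter_dim AS p ON f.parameter_key = p.parameter_key
        JOIN date_dim AS d ON f.date_key = d.date_key
        WHERE p.name = ? AND s.reach = ?
        GROUP BY d.year
        ORDER BY d.year
    """, (parameter, reach)).fetchall()


con = build_demo_warehouse()
print(slice_yearly_mean(con, "NO2+NO3", "downstream"))
```

Changing the `WHERE` filters (dicing) or the `GROUP BY` level (rolling up to year or drilling down to month) gives the other views without touching the stored facts.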
Although the model and the methodology were implemented for a highly
urbanized watershed, they are not restricted to it and can be used without modification for any
watershed.
CHAPTER 5
5.1 Introduction
Estimates of nutrient concentrations, loads, and yields are useful for evaluating a
water body and help to identify source areas to develop mitigation strategies (USGS,
collected manually once or twice a month, or perhaps even less frequently, and later analyzed
in a laboratory. This procedure is time consuming and not efficient when immediate
time, are particularly important when considering the amount of nutrients entering a lake
or reservoir (USGS, 2012). Load estimates are also important to the establishment and
monitoring of TMDLs mandated by the CWA (USGS, 2012). The yield estimates may be
used by resource and regulatory authorities to help prioritize efforts with regard to land
This chapter investigates the development of data driven models that can estimate
water quality constituents from historical data records in Chicago River watershed
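The loads and yields mentioned above follow from concentration, flow and drainage area. As a worked example under the assumption of SI units (concentration in mg/l, which equals g/m3, and flow in m3/s), the daily load is C × Q × 86,400 s/day ÷ 1,000 g/kg, and the yield divides that load by the drainage area. The function names and inputs are hypothetical.

```python
SECONDS_PER_DAY = 86_400
GRAMS_PER_KG = 1_000


def daily_load_kg(conc_mg_per_l, flow_m3_per_s):
    """Load in kg/day: mg/l equals g/m3, so C * Q is in g/s; scale to kg/day."""
    return conc_mg_per_l * flow_m3_per_s * SECONDS_PER_DAY / GRAMS_PER_KG


def daily_yield_kg_per_km2(load_kg_per_day, drainage_area_km2):
    """Yield: load per unit drainage area (kg/km2/day)."""
    return load_kg_per_day / drainage_area_km2


# Hypothetical example: 2 mg/l of total nitrates at 10 m3/s from a 100 km2 drainage area.
load = daily_load_kg(2.0, 10.0)
print(load, daily_yield_kg_per_km2(load, 100.0))
```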
5.2 Methodology
This research uses data mining (DM) from the artificial intelligence field to
estimate water quality parameters such as total nitrates for the Chicago River Watershed.
DM models consist of a set of mathematical relationships. DM tasks are divided into two
major divisions: predictive and descriptive tasks. In predictive tasks, a particular
attribute is predicted based on the values of other attributes. The attribute to be predicted is
the dependent variable, while the attributes used for making the prediction are
independent variables. In descriptive tasks, the objective is to derive patterns
(correlations, trends, etc.) that summarize the relationships in the data; these tasks are often
exploratory in nature and usually require post-processing techniques to validate
The predictive models are divided into classification models, which are used for
discrete target variables, and regression models, which are used for continuous target
variables (Tan et al., 2006). There are many methods to construct prediction and
classification models, such as naive Bayesian, support vector machines, decision tree,
Regression is the statistical methodology most often used for numeric
predictions. Both prediction and classification are supervised learning problems in which
there is an input X and an output Y, and the model learns the mapping from the input to
parameters, is assumed:
$$y = g(x \mid \theta) \qquad (5.1)$$
minimized and the estimates are close to the correct values given in the training set
(Alpaydin, 2010). For the Chicago River Watershed, data driven models to estimate
different land use types, month of year and others, were developed using different data
mining techniques.
5.3 DM Methodology
[Figure: data mining methodology: Input → Pre-processing → Model Building and Evaluation → Model Deployment → Output]
5.3.1 Data Pre-processing. This includes the tracking of incomplete data that lack
remove errors and outliers, and resolve inconsistencies in the data (Han et al., 2006). This
process ensures quality data, which in turn ensures quality mining results and
quality decisions, since duplicate or missing data may result in incorrect or even
To better understand the data to be mined, descriptive data summarization provides the
analytical foundation for data pre-processing. The basic statistical measures for data
weighted mean, median, and mode; and measures of data dispersion such as range,
quartiles, variance and standard deviation (Han et al., 2006). Graphical
representations such as histograms, boxplots, quantile plots, and scatter plots facilitate
visual inspection of the data and are useful for data pre-processing and data mining as
Examples of data pre-processing are data cleansing, data integration, and data
data to assure high quality. The majority of these pre-processing steps were done when
Data transformation routines are used to convert the data into forms that are
suitable for mining; for example, an attribute may be normalized to fall within a
small range such as 0 to 1 (Han et al., 2006). Different data reduction techniques such as
reduction and discretization can be used to obtain a reduced representation of the data
without losing information content (Han et al., 2006). For numerical data
Histograms are highly effective at approximating both sparse and dense data, as
well as highly skewed and uniform data, and can capture dependencies between attributes
(Han et al., 2006). They use binning to approximate data distributions. Data sets for
analysis may contain hundreds of attributes, many of which may be irrelevant to the
mining task or redundant; such attributes may slow down the mining process and result in discovered
patterns of poor quality. Various statistical significance tests and techniques that assume
that the attributes are independent of one another can be performed to select the best
attribute subsets.
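Two of the transformations above, min-max normalization to the range 0 to 1 and equal-width histogram binning, can be sketched as follows. This is a generic illustration of the techniques described by Han et al. (2006), not the preprocessing code used in this study; the sample values are hypothetical.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so that min -> new_min and max -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


def equal_width_histogram(values, n_bins):
    """Return (bin edges, counts) for n_bins bins of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against zero width when all values are equal
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # the maximum falls into the last bin
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


print(min_max([2, 4, 6]))
print(equal_width_histogram(list(range(11)), 5))
```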
5.3.2 Model Building and Evaluation. This involves the selection and application of
various models that are developed using comparable analytical techniques, and the adjustment
of model parameters until optimal values are reached. In the holdout method, the input data are randomly partitioned
into two independent sets, a training set and a test set; the training set is used to derive
the model, and the model's accuracy is estimated using the test set
(Han et al., 2006). Random subsampling is a variation of the holdout method in
which the holdout method is repeated k times and the average accuracy is taken (Han et al.,
2006). In k-fold cross-validation, the input data are randomly partitioned into k subsets, or folds,
of approximately equal size. Training and testing are then performed k times:
in iteration i, fold i is used for testing and the remaining folds for training, so each sample is used
the same number of times for training and once for testing (Figure 5.2). The error is calculated as the
average of the error rates from all k iterations
(Han et al., 2006). The 10-fold cross-validation method was adopted for building all the models
in this study.
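The k-fold procedure can be sketched in a few lines. This is a hypothetical illustration, not the WEKA implementation used in the study: `fit` is any function that returns a predictor from training data, `error` is any error measure such as MAE, and the folds are formed by shuffling the indices and taking every k-th one.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Randomly partition range(n) into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(xs, ys, k, fit, error, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the k errors."""
    folds = k_fold_indices(len(xs), k, seed)
    errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        preds = [model(xs[i]) for i in test_idx]
        errors.append(error(preds, [ys[i] for i in test_idx]))
    return sum(errors) / k


def fit_mean(train_x, train_y):
    """Trivial baseline model: always predict the training mean."""
    m = sum(train_y) / len(train_y)
    return lambda x: m


def mae(pred, actual):
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


# 905 samples split into 10 folds, as in this study (the data here are hypothetical).
print(sorted(len(f) for f in k_fold_indices(905, 10)))
```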
[Figure 5.2: k-fold cross-validation; in each iteration, one fold serves as the testing sample and the remaining folds as the training sample.]
classification approaches used in this chapter. In this study, eight different algorithms
were investigated and built as regression or classification models where applicable, and
their merits were compared in terms of performance. The prediction
models are multiple linear regression, artificial neural networks, model trees, support
vector machines, lazy learners and Gaussian processes. The classification models are
artificial neural networks, model trees, support vector machines, naive Bayes, lazy
learners and logistic regression. A brief general description of each algorithm is given
below:
The method of least squares can be used to solve for w0, w1, ..., wn, where the
residual sum of squares. Linear regression offers a simple and easily interpretable type of
model.
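A least-squares fit of w0, w1, ..., wn can be written directly from the normal equations. The sketch below solves them by Gaussian elimination with partial pivoting; it is illustrative only (for many or highly correlated predictors a QR-based solver is numerically safer), and the sample data are hypothetical.

```python
def fit_linear_regression(X, y):
    """Least-squares weights [w0, w1, ..., wn] for y = w0 + w1*x1 + ... + wn*xn.

    Solves the normal equations (A^T A) w = A^T y, where A is X with a leading
    column of ones, by Gaussian elimination with partial pivoting.
    """
    A = [[1.0] + list(row) for row in X]
    m = len(A[0])
    # Augmented system [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in A) for j in range(m)]
         + [sum(r[i] * t for r, t in zip(A, y))] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    w = [0.0] * m
    for i in reversed(range(m)):
        w[i] = (M[i][m] - sum(M[i][j] * w[j] for j in range(i + 1, m))) / M[i][i]
    return w


def predict(w, x):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


# Hypothetical data generated from y = 1 + 2*x1 - 3*x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, -2, 0, 2]
print(fit_linear_regression(X, y))
```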
phase, the network adjusts the weights to predict the correct class label of the input tuples
(Han et al., 2006). The multilayer feed-forward neural network comprises a number of
neurons organized into an input layer, an output layer and one or more hidden layers.
The units in the input layer take the information to be processed (the values of the predictors)
as inputs, while the output layer produces the prediction result. Each hidden layer
receives the outputs of the units in the preceding layer and passes its results as
inputs to the units in the next layer (Tan et al., 2006; Han et al., 2006; Ould-Ahmed-Vall
et al., 2007).
The process, as outlined by Han et al. (2006), is as follows: a set of training tuples is
iteratively processed and the network's prediction is compared with the actual known target value; for each training
tuple, the weights are modified to minimize the mean squared error between the
network's prediction and the actual target value; the weight modifications are made in the
"backwards" direction, from the output layer through each hidden layer down to the first hidden layer.
The ANN algorithm has two benefits: high prediction accuracy and no requirement for prior
knowledge of the physical relationship between the dependent and the
independent variables (Tan et al., 2006). However, the black-box nature of ANN makes it
difficult to understand and analyze the learned function (Han et al., 2006; Ould-Ahmed-Vall et al., 2007).
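The backpropagation steps outlined above can be sketched for a network with one hidden layer of sigmoid units and a single linear output unit, trained tuple by tuple to reduce the squared error. The layer size, learning rate, epoch count and initialization range are hypothetical choices, and this is not the WEKA configuration used in the study.

```python
import math
import random


def train_mlp(data, n_hidden=4, lr=0.1, epochs=2000, seed=1):
    """Train a one-hidden-layer network by backpropagation; return a predict(inputs) function.

    data: list of (inputs, target) pairs. Hidden units are sigmoid, the output unit is linear.
    """
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # w_h[j] holds the input weights of hidden unit j; its last entry is the bias.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]  # last entry: output bias

    def forward(x):
        h = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(wj, x)) + wj[-1])))
             for wj in w_h]
        out = sum(w * hj for w, hj in zip(w_o, h)) + w_o[-1]
        return h, out

    for _ in range(epochs):
        for x, target in data:  # weights are updated after each training tuple
            h, out = forward(x)
            err_o = target - out  # error of the linear output unit
            # Propagate the error backwards using the output weights before they change.
            err_h = [hj * (1.0 - hj) * err_o * w_o[j] for j, hj in enumerate(h)]
            for j, hj in enumerate(h):
                w_o[j] += lr * err_o * hj
            w_o[-1] += lr * err_o
            for j, wj in enumerate(w_h):
                for i, xi in enumerate(x):
                    wj[i] += lr * err_h[j] * xi
                wj[-1] += lr * err_h[j]

    return lambda x: forward(x)[1]


# Hypothetical training tuples for a one-input problem.
data = [([0.0], 0.0), ([0.5], 0.25), ([1.0], 0.5)]
model = train_mlp(data)
print([round(model(x), 3) for x, _ in data])
```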
Support Vector Machines (SVM) is a classification method for both linear and
nonlinear data. It uses an appropriate mapping to transform the original training data into
a new, higher dimension, where it searches for the optimal linear separating hyperplane
(i.e., decision boundary) by which data from two classes can always be separated. SVM finds
this hyperplane using support vectors (essential training tuples) and margins (defined by
the support vectors) (Han et al., 2006). The technique used in this study is the Sequential
A model tree is a tree-like structure in which each internal node denotes a test on an
attribute, each branch represents an outcome of the test, and each leaf node holds a class label
(Han et al., 2006). Trees extract predictive information in the form of an "if-then-else"
expression that is clear and understandable to humans (Ahmad et al., 2010). This is an
explainable approach, in contrast with other machine learning approaches such as neural
networks (Alpaydin, 2010; Ould-Ahmed-Vall et al., 2007). A tree can explain the decisions
that lead to a certain prediction, and these decisions can easily be used within a database to identify a set of
records. The input space is partitioned until the data at the leaf nodes are
relatively homogeneous; a linear model then explains the remaining variability. The
the learning process by assuming that the inputs are independent. Bayes' theorem is
based on the idea that the outcome of an event can be predicted from evidence
that can be observed (Ahmad, 2010). Naive Bayes
computes conditional probabilities for the target values based on historical records by
(Alpaydin, 2010).
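A naive Bayes classifier for numeric attributes can be sketched by modelling each attribute within each class as a normal distribution and combining the per-attribute likelihoods with the class prior; adding the log-likelihoods attribute by attribute is exactly where the independence assumption enters. The normal-distribution choice and the small variance floor are assumptions of this sketch, and the data are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate the class priors and a (mean, variance) pair per attribute per class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        model[label] = (
            len(rows) / len(X),  # prior P(class)
            [(mean(c), max(pvariance(c), var_floor)) for c in columns],
        )
    return model


def predict_nb(model, x):
    """Pick the class maximising log P(class) + sum_i log P(x_i | class)."""
    def log_posterior(label):
        prior, params = model[label]
        total = math.log(prior)
        for xi, (mu, var) in zip(x, params):  # attributes treated as independent
            total += -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


# Hypothetical one-attribute training data with two classes.
X = [[1.0], [1.2], [0.8], [5.0], [5.2], [4.8]]
y = ["low", "low", "low", "high", "high", "high"]
nb = fit_gaussian_nb(X, y)
print(predict_nb(nb, [1.1]), predict_nb(nb, [4.9]))
```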
Advantages of the algorithm are its ease of implementation and its good results;
however, the disadvantages include the assumption that the inputs are independent, which
The lazy learner algorithm (in contrast to the algorithms above, which are eager
learners) is an instance-based learner that stores the training data and waits until
it is given a test tuple before it starts processing (Han et al., 2006). The algorithm takes less time
in training but more time in predicting. It effectively uses a richer hypothesis space, since it uses many
local linear functions to form its implicit global approximation to the target function,
as opposed to eager learners, which commit to a single hypothesis (Han et al.,
2006). Typical approaches include k-nearest neighbor, locally weighted regression, and
variables that generates samples over time, {X_t} for t ∈ T, such that every finite linear
combination of the X_t is normally distributed.
They are considered attractive because of their flexible non-parametric nature and
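The k-nearest-neighbor approach named among the lazy learners above shows the lazy strategy plainly: nothing is fitted in advance, and all the work happens when a query arrives. The sketch below is plain k-NN regression with Euclidean distance and an unweighted mean of the k nearest targets; it is not the locally weighted learning (LWL) scheme used later in this chapter, and the data are hypothetical.

```python
import math


def knn_regress(train_X, train_y, query, k=3):
    """Lazy learner: the stored tuples are only searched when a query arrives."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical stored training tuples.
train_X = [[0], [1], [2], [10], [11]]
train_y = [0, 1, 2, 10, 11]
print(knn_regress(train_X, train_y, [1], k=3))
print(knn_regress(train_X, train_y, [10.4], k=2))
```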
5.3.2.2 Model Evaluation. Different criteria were used to evaluate the regression and
classification models:
Regression Models. This section discusses the criteria used to evaluate the prediction
accuracy of the different algorithms used in the study. As stated in Section 5.3.2, 10-fold
cross-validation was used. This technique consists of dividing the overall data samples
into 10 subsets, or folds. Each model is trained using 9 of the subsets and evaluated using
the tenth subset. The process is iterated 10 times (Figure 5.2); each time, a different
subset is used for testing and the remaining 9 subsets are used for training the model. The
model is evaluated by averaging the prediction evaluation criteria from the 10 different
iterations. Regression evaluation criteria used for this study are (Alpaydin, 2010; Ould-
coefficient and measures the extent of linear relationship between predicted (P) and
$$C = \frac{\mathrm{Cov}(P,A)}{\sigma_P\,\sigma_A} \qquad (5.3)$$
where Cov(P,A) is the covariance between the predicted and the actual values
Root Mean Squared Error (RMSE): This error measure is used in the
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(p_i - a_i)^2} \qquad (5.4)$$
where p_i and a_i are the predicted and actual values of the attribute for the ith test
Mean Absolute Error (MAE): This error measure is similar to RMSE, except that
it uses absolute error values instead of squared errors. It is computed as:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert p_i - a_i\rvert \qquad (5.5)$$
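Root Relative Squared Error (RRSE): The relative squared error is relative to
what would have been obtained by a simple predictor, namely the mean of the actual values. It is
computed by normalizing the total squared error, dividing it by the total squared error
of the simple predictor, whose prediction is the mean of the actual values, ā:
$$\mathrm{RRSE} = \sqrt{\frac{\sum_{i=1}^{n}(p_i - a_i)^2}{\sum_{i=1}^{n}(a_i - \bar{a})^2}} \qquad (5.6)$$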
Relative Absolute Error (RAE): This error is defined in a similar way to RRSE. The relative
absolute error takes the total absolute error and normalizes it by dividing by the total
absolute error of the simple predictor. The value of this error ranges from 0% to 100%
$$\mathrm{RAE} = \frac{\sum_{i=1}^{n}\lvert p_i - a_i\rvert}{\sum_{i=1}^{n}\lvert a_i - \bar{a}\rvert} \qquad (5.7)$$
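Equations 5.3 to 5.7 translate directly into code. In this sketch the mean of the actual values serves as the simple predictor for RRSE and RAE, and sample standard deviations are used in the correlation coefficient (the 1/(n-1) factors cancel); the inputs are hypothetical.

```python
import math
from statistics import mean


def regression_metrics(pred, actual):
    """Correlation coefficient C, RMSE, MAE, RRSE and RAE for paired predictions."""
    n = len(actual)
    if n < 2:
        raise ValueError("need at least two test instances")
    p_bar, a_bar = mean(pred), mean(actual)
    cov = sum((p - p_bar) * (a - a_bar) for p, a in zip(pred, actual)) / (n - 1)
    sd_p = math.sqrt(sum((p - p_bar) ** 2 for p in pred) / (n - 1))
    sd_a = math.sqrt(sum((a - a_bar) ** 2 for a in actual) / (n - 1))
    sq_err = sum((p - a) ** 2 for p, a in zip(pred, actual))
    abs_err = sum(abs(p - a) for p, a in zip(pred, actual))
    return {
        "C": cov / (sd_p * sd_a),
        "RMSE": math.sqrt(sq_err / n),
        "MAE": abs_err / n,
        # Relative errors: normalised by the errors of the mean predictor a_bar.
        "RRSE": math.sqrt(sq_err / sum((a - a_bar) ** 2 for a in actual)),
        "RAE": abs_err / sum(abs(a - a_bar) for a in actual),
    }


print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
```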
accuracy. Typically, confusion matrix, per-class and overall precision and recall and
correlation. It refers to the percentage of correct predictions made by the model when
compared with the actual classifications in the test data displayed in a confusion matrix
(Ahmad et al., 2011; Han et al., 2006). Accuracy is the proportion of total true results to
Where Tp and Fp are the number of true and false positives respectively. Tn and
classes. Rows represent actual classifications in data, while columns represent number of
Precision is the percentage of records that are correct responses and are actually
$$\text{Precision} = \frac{T_p}{T_p + F_p} \qquad (5.9)$$
Recall is the percentage of positive records that are predicted among all the
$$\text{Recall} = \frac{T_p}{T_p + F_n} \qquad (5.10)$$
F-measure expresses the trade-off between precision and recall. It discourages a
system from sacrificing one measure excessively for the other. It is given by:
$$F\text{-measure} = \frac{\text{recall} \times \text{precision}}{(\text{recall} + \text{precision})/2} \qquad (5.11)$$
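For a multi-class problem such as the low, medium and high nitrate classes, precision, recall and F-measure are computed per class from the confusion matrix, treating one class as positive and the rest as negative. The sketch below follows Equations 5.9 to 5.11 with rows as actual classes and columns as predicted classes; the matrix is hypothetical.

```python
def per_class_scores(cm, labels):
    """cm[i][j] = number of records of actual class i predicted as class j.

    Returns ({label: (precision, recall, F-measure)}, overall accuracy).
    """
    scores = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(len(labels))) - tp  # predicted k, actually another class
        fn = sum(cm[k]) - tp                                  # actually k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = (recall * precision) / ((recall + precision) / 2) if recall + precision else 0.0
        scores[label] = (precision, recall, f)
    accuracy = sum(cm[k][k] for k in range(len(labels))) / sum(map(sum, cm))
    return scores, accuracy


# Hypothetical confusion matrix (rows: actual low, medium, high).
cm = [[8, 2, 0],
      [1, 6, 1],
      [0, 2, 4]]
scores, accuracy = per_class_scores(cm, ["low", "medium", "high"])
print(scores, accuracy)
```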
The Receiver Operating Characteristic (ROC) is a plot of the true positive rate vs. the false
positive rate that compares predicted and actual values. It provides insight into the
decision-making ability of a model (sensitivity), i.e., how likely the model is to accurately
predict the negative or the positive class. It is a useful metric for evaluating how a model
behaves at different probability thresholds (Flach, 2003; Ahmad et al., 2011).
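An ROC curve is traced by sweeping the decision threshold over the model's scores for the positive class and recording the false and true positive rates at each step; the area under the curve summarizes it. The sketch below handles one class against the rest with trapezoidal integration; a multi-class summary, such as the weighted averages reported later in this chapter, would repeat it per class. The scores are hypothetical.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR) obtained by lowering the threshold over the scores.

    scores: model probability of the positive class; labels: 1 = positive, 0 = negative.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative examples")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Tuples tied at this score cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


print(auc(roc_points([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])))  # perfectly separated
print(auc(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])))  # partly mixed
```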
5.3.3 Model Deployment. The insights offered by data mining results can be integrated
with policy and decision-making tools so that effective watershed management and
optimum land use can be achieved. Such integration requires a post-processing
step that ensures that only valid and useful results are incorporated into the decision support
system. An example of post-processing is the preparation of model inputs based on "what if"
scenarios in order to predict future behavior resulting from changes in any of the
watershed elements, such as population, water quality regulations, land use, and climate.
The capability of data driven models to predict water quality parameters was
demonstrated for the Chicago River Watershed. The WDW repository introduced in
Chapter 4 was used to develop the models. The goal of this research is to
parameters by utilizing other watershed parameters that are available, continuous and
easily obtained.
The attributes were picked based on their physical nature and on whether they are
real-time, frequently measured data such as daily flow, air temperature and hourly
temperature, dissolved oxygen, turbidity, and total chlorophyll; or they are not time
The choice of these attributes for data driven models to predict total nitrates was
assumed to give relevant and useful information and hence good discovered patterns.
Table 5.1 shows the properties and descriptive summary of the predictors; the
land use attributes are represented in the table by just one type (TOT_1001, which is
single-family land use), and the rest of the land use attributes are described in Appendix A.
For the Chicago River Watershed, most of the pre-processing steps required for
data mining were performed when building the DW. Histogram analysis was
used to inspect the attribute data for outliers. Figures 5.3 and 5.4 show histograms and a
matrix of scatter plots of the Chicago River Watershed attributes selected for the data
mining analysis. Histograms partition the values of an attribute into equal-sized partitions
or ranges. The top and bottom 2% of the data were removed, and missing values were
replaced by mean values. The k-fold cross-validation method, with k = 10, was used for partitioning
the training and testing data sets for all the predictive models in this study.
The total number of samples is 905, and 154 attributes were investigated.
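The two cleaning steps above, removing the top and bottom 2% and filling missing values with the mean, can be sketched as follows. The sketch assumes that trimming is applied per attribute, by rank on the non-missing values, and that the mean is computed after trimming; these details are assumptions rather than the exact procedure used here, and the data are hypothetical.

```python
from statistics import mean


def trim_and_impute(values, fraction=0.02):
    """Drop the top and bottom `fraction` of observed values, then fill None with the mean.

    Trimming uses value cut-offs taken at the `fraction` ranks of the sorted observations,
    so tied values at a cut-off are all kept.
    """
    observed = sorted(v for v in values if v is not None)
    cut = int(len(observed) * fraction)
    lo, hi = (observed[cut], observed[-cut - 1]) if cut else (observed[0], observed[-1])
    kept = [v for v in values if v is None or lo <= v <= hi]
    fill = mean(v for v in kept if v is not None)
    return [fill if v is None else v for v in kept]


# Hypothetical attribute: 100 observations plus one missing value.
cleaned = trim_and_impute(list(range(100)) + [None])
print(len(cleaned), min(cleaned), max(cleaned), cleaned[-1])
```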
TOT_1001 | Single-family residential area | acre | 25878.139 | 13161.3 | 58746.6 | 19001.1
[Figures 5.3 and 5.4 (not reproduced): histograms and matrix of scatter plots of the attributes selected for data mining.]
software package was used for this study. It provides a comprehensive collection of DM
algorithms and data preprocessing tools that offers a framework for comparing the different
algorithms described in Section 5.3.2 (Hall et al., 2009). WEKA has several graphical
user interfaces that enable easy access to the underlying processes. The main graphical
user interface is the "Explorer", which has a panel-based interface in which different panels
correspond to different data mining tasks, such as Preprocess, where data can be loaded
from various sources including files and databases, and Classify, which gives access to
WEKA's different classification and regression algorithms. The Classify panel also provides
access to graphical representations of model prediction errors in scatter plots, and
allows evaluation via ROC curves and other "threshold curves" (Hall et al., 2009).
multiple linear regression, ANN, decision tree, SVM, lazy learner and Gaussian process
using the predictors (shown in Table 5.1) are shown in Table 5.2. Appendix A shows the
Among the six regression models built, only the multiple linear regression model and the
model tree gave interpretable models. The multiple linear regression model is given
by Equation 5.12.
MAX_AIR_TEMP + 0.5953 * DAILY_PERC - 0.0046 * FLOW + 0.0001 * TOT_1002
- 0.0025 * TOT_1005 + 0.0006 * TOT_1009 + 0.0006 * TOT_1010 + 0.0002 * TOT_1011
TOT_1027 + 0.0001 * TOT_1032 + 0.017 * TOT_1033 + 0.0005 * TOT_1037 + 0.0002 *
TOT_1040 + 0.0001 * TOT_1045 - 0.0163 * TOT_1092 + 0.0124 * TOT_1095 + 0.0003
Equation 5.12 indicates that attributes such as DO, water temperature, air
temperature, precipitation and a few land uses can predict total nitrates.
Figure 5.5 shows the decision tree model, in which a number of "if-then-else"
rules partition the tree; the rules for each node are shown in Appendix A. Each leaf node
represents a rule to predict the total nitrate. The first number in parentheses indicates the
number of instances that fall into the corresponding leaf, and the percentage indicates the
misclassified instances. Examples of the tree model rules are as follows:
The linear model (rule class) defined by rule LM1 is given by:
... 0.7193    (5.13)
... * PH - 0.0001 * VSS - 0.0002 * INORG_SS - 0.001 * MIN_AIR_TEMP - 0.0017 ...
... 1.282    (5.14)
... 0.5277    (5.15)
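The structure described above, with if-then-else tests routing an instance to a leaf that holds a linear model such as LM1, can be illustrated with a deliberately small model tree: one split on a single attribute, a least-squares line in each leaf, and the threshold chosen to give the smallest total squared error. This is a simplified sketch, not WEKA's M5P, which grows deeper trees over many attributes using a standard-deviation-reduction criterion and then prunes and smooths them; the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line (slope, intercept); slope 0 if all x values coincide."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = 0.0 if sxx == 0 else sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return slope, y_bar - slope * x_bar


def sse(xs, ys, line):
    m, b = line
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


def fit_stump_model_tree(xs, ys, min_leaf=2):
    """One-split model tree: 'if x <= t then LM_left else LM_right'."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [(x, y) for x, y in zip(xs, ys) if x <= t]
        right = [(x, y) for x, y in zip(xs, ys) if x > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        lm_left, lm_right = fit_line(*zip(*left)), fit_line(*zip(*right))
        err = sse(*zip(*left), lm_left) + sse(*zip(*right), lm_right)
        if best is None or err < best[0]:
            best = (err, t, lm_left, lm_right)
    if best is None:
        raise ValueError("not enough data for a split")
    _, t, (m1, b1), (m2, b2) = best
    return lambda x: m1 * x + b1 if x <= t else m2 * x + b2


# Hypothetical data: rising below x = 4, falling from x = 4 onwards.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 1, 2, 3, 10, 9, 8, 7]
tree = fit_stump_model_tree(xs, ys)
print(tree(2), tree(6))
```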
The other models do not provide a similar representation; nevertheless, they can be
used to predict total nitrates if they show good prediction performance. Table
5.2 compares the prediction accuracy of the six regression models. It shows that ANN,
decision tree and Gaussian processes performed better than SVM and lazy
learner. They showed similar performance, with very close values for RMSE, MAE, and
To further assess the quality of the top three models, i.e., ANN,
decision tree and Gaussian process, the predicted total nitrate was plotted against the actual total
nitrate for all the instances. Figure 5.6 shows that the three models
perform well for total nitrate values lower than 8 mg/l. This is due to the
small number of high total nitrate values in the training data, which did not allow the
models to learn that range sufficiently. The plot shows different levels of
performance for different total nitrate values: good for low values (0
to 4 mg/l), acceptable for medium values (4 to 8 mg/l) and poor for high values
(8 mg/l and above). Nevertheless, given the assessment of the upstream and
downstream total nitrate historical records (Figure 4.9), the total nitrate values always
fall below the 8 mg/l line. This allows the regression models to be used to
identify and quantify total nitrates in the Chicago River watershed.
[Figure 5.5 (not reproduced): decision (model) tree for predicting total nitrates.]
[Figure 5.6 (not reproduced): actual versus predicted total nitrates for the GP, ANN and M5P regression models.]
nitrates, the values were transformed from continuous values into three nominal classes. The
classes were defined as low, medium and high (Table 5.3). The classification models
selected are ANN, logistic regression, SVM, decision tree, lazy learner (LWL) and naive
Bayes. The prediction accuracy of the models is shown in Tables 5.4 to 5.9.
Class | Range
Low | 0 < (NO2 + NO3) < 3.99
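The transformation from continuous to nominal classes is a simple thresholding step. In the sketch below, the low-class boundary follows Table 5.3; the medium/high boundary (8.0 mg/l) is a hypothetical placeholder borrowed from the performance ranges discussed for the regression models, because the remaining rows of Table 5.3 are not available in this section.

```python
LOW_UPPER = 3.99     # from Table 5.3: low is 0 < NO2+NO3 < 3.99
MEDIUM_UPPER = 8.0   # hypothetical placeholder for the medium/high boundary


def nitrate_class(value, low_upper=LOW_UPPER, medium_upper=MEDIUM_UPPER):
    """Map a total nitrate concentration to 'low', 'medium' or 'high'."""
    if value < low_upper:
        return "low"
    if value < medium_upper:
        return "medium"
    return "high"


print([nitrate_class(v) for v in (1.0, 5.0, 9.0)])
```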
As with the regression models, the only classification model that shows a mathematical form is
the decision tree, while all the other models act as black-box models. The prediction
accuracy results indicate that all models showed good performance, with model accuracy
decision tree, logistic regression, lazy learner, SVM, and naive Bayes models
respectively.
Comparing performance on the basis of the confusion matrix results, ANN, decision
tree, and logistic regression were able to predict all three classes. ANN was
the best at predicting the low class, with a 93.2% true positive (TP) rate, followed by logistic
regression and then decision tree (92.5% and 91.8%, respectively). For the medium class,
the models showed TP rates of 74%, 69.1%, and 66.2% for ANN, decision tree and
logistic regression, respectively. For the high class, the decision tree showed the best
performance, albeit with a low TP rate, followed by logistic regression and then ANN
(29.7%, 28.1%, and 14.1%). The other three models, SVM, lazy learner, and naive Bayes,
predicted only the low and medium total nitrate classes. For the low class, the TP rates of the
models are 91.7%, 91.7%, and 90.7% for lazy learner, SVM, and naive Bayes,
respectively. The rates for the medium class are 76%, 75.5%, and 75% for lazy learner,
Other evaluation criteria for the classification models are precision, recall, and
F-measure. The precision rates for the models, in descending order, are 81.9%, 81.7%,
80.8%, 76%, 76%, and 75.9% for ANN, decision tree, logistic regression, SVM, lazy
learner, and naive Bayes, respectively. Similarly, the recall rates are 85%, 83.3%, 82.3%,
82%, 81.7%, and 80.8% for SVM, ANN, decision tree, logistic regression, lazy learner,
and naive Bayes, respectively. The values for F-measure as given by Equation 5.11 are
82%, 81.7%, 80.9%, 78.8%, 78.7%, and 78.2% for decision tree, ANN, logistic
The last criterion considered is the ROC plot, which measures the decision-making
ability and sensitivity of the model. The ROC plot for the six models collectively and the
ROC plot for the ANN model are shown in Appendix A. The weighted
average ROC values are 91.8%, 90.4%, 86.2%, 85.4%, 82.3%, and 77.5% for ANN,
logistic regression, naive Bayes, lazy learner, decision tree, and SVM, respectively. For
all the models the ROC curves rise steeply toward the top left corner of the plot, indicating
a high true positive rate at a low false positive rate, and hence good performance.
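The weighted average ROC values above can be read as areas under one-vs-rest ROC curves, averaged with each class weighted by its number of instances; that weighting is an assumption here, since the text does not state how the averages were formed. A minimal sketch of that computation, using the rank-based (Mann-Whitney) form of the area under the curve and hypothetical class probabilities for four instances:

```python
from typing import Dict, List, Sequence


def auc_one_vs_rest(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """Area under the ROC curve for one class versus the rest.

    Mann-Whitney form: the fraction of (positive, negative) pairs in which the
    positive instance receives the higher score, counting ties as one half.
    """
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def weighted_average_auc(class_scores: Dict[str, List[float]], labels: List[str]) -> float:
    """Average the one-vs-rest AUCs, weighting each class by its share of instances."""
    n = len(labels)
    total = 0.0
    for cls, scores in class_scores.items():
        positives = [label == cls for label in labels]
        total += (sum(positives) / n) * auc_one_vs_rest(scores, positives)
    return total


# Hypothetical predicted probabilities of each class for four instances.
labels = ["low", "low", "medium", "high"]
class_scores = {
    "low": [0.9, 0.8, 0.3, 0.1],
    "medium": [0.05, 0.1, 0.6, 0.3],
    "high": [0.05, 0.1, 0.1, 0.6],
}
print(weighted_average_auc(class_scores, labels))
```

The pairwise loop grows with the product of positive and negative counts, which is fine for illustration; a sort-based ranking would be preferable for large data sets.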
All the measures given above suggest that ANN is the best classification
model for predicting total nitrates, followed by decision tree; the worst is naive Bayes.
However, the decision tree provides a clear logical model that can be easily understood.
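Confusion-matrix rows for the actual High class, one row per model (columns: predicted Low, predicted Medium, predicted High):
High | 27 | 28 | 9
High | 32 | 14 | 18
High | 41 | 23 | 0
High | 18 | 27 | 19
High | 41 | 23 | 0
High | 41 | 23 | 0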
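The TP rate, precision, recall, and F-measure used above all derive from a confusion matrix with actual classes as rows and predicted classes as columns. The sketch below assumes that layout, the harmonic-mean form of the F-measure (assumed to be what equation 5.11 denotes), and class sizes as weights for the weighted averages. The Low and Medium rows are hypothetical; the High row is the first "High 27 28 9" row above, read as predicted Low, Medium, High, which gives a High TP rate of 9/64 ≈ 14.1%, the value quoted for ANN.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def per_class_metrics(cm: Matrix) -> List[Tuple[float, float, float]]:
    """Return (precision, recall, F-measure) for each class.

    cm[i][j] counts instances whose actual class is i and predicted class is j,
    so the recall of class i is its true positive (TP) rate.
    """
    k = len(cm)
    results = []
    for i in range(k):
        tp = cm[i][i]
        actual = sum(cm[i])                          # row total
        predicted = sum(cm[r][i] for r in range(k))  # column total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        results.append((precision, recall, f_measure))
    return results


def weighted_average(cm: Matrix, metrics: List[Tuple[float, float, float]]) -> Tuple[float, ...]:
    """Weight each class's (precision, recall, F) by its number of actual instances."""
    sizes = [sum(row) for row in cm]
    n = sum(sizes)
    return tuple(
        sum(size * m[idx] for size, m in zip(sizes, metrics)) / n
        for idx in range(3)
    )


# Rows and columns ordered Low, Medium, High. Low and Medium rows are
# hypothetical; the High row is the "High 27 28 9" row from the text.
cm = [
    [120, 10, 0],
    [20, 80, 5],
    [27, 28, 9],
]
metrics = per_class_metrics(cm)
print("High TP rate:", round(metrics[2][1], 3))
print("Weighted precision, recall, F:", weighted_average(cm, metrics))
```

Because each class is weighted by its size, the weighted recall equals the overall accuracy; for this matrix it is 209/299.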
5.6 Conclusion
Results show that, given sufficient data with proper variables, DM methods are
capable of predicting water quality parameters, total nitrates in this case. Among the
prediction models used in this study, ANN and decision tree showed better performance,
with very close values of RMSE and MAE and correlation coefficients of 74.49% and
74.48%, respectively. Also, for the classification models the prediction accuracy results
indicate that all models showed good performance, with ANN, decision tree showing the
Although the ANN model consistently shows better performance, further training of
decision tree models would be more logical, since they expose their reasoning process as rules
that are understandable to humans. These rules can assist policy making in watershed
management plans. On the other hand, the other models do not provide such features to
To support better prediction results and a robust forecasting system for policy
makers, it is common practice to combine the outcomes of several mining
models. It would be reasonable to combine the top predicting models for the
The success of the data mining methodology relies heavily on the quality and quantity
of the data used in the prediction process. Even though this study used a sufficient amount of
data, with a logical set of predictors, more data and more watershed characteristics can be
parameters as indicators to predict the water quality parameter in question, and hence
simplifying the modeling procedures. This allows the use of data on basic watershed
elements and the relationships among them without giving attention to the physical
The data-driven models derived would be useful in solving a practical problem or
modeling a system or process if (1) a sufficient amount of data is available; (2) there are no
considerable changes to the modeled system during the period covered by the model
(Solomatine et al., 2004; Solomatine et al., 2007). They are effective if building
underlying physical processes (Preis et al., 2007; Shrestha et al., 2007) or the available
models are not adequate (Solomatine et al., 2007). It is always useful to have
modeling alternatives and to validate the simulation results of physically based models
CHAPTER 6
6.1 Introduction
In this chapter, a water quality model of the Chicago River Watershed was
developed using BASINS/HSPF. The model simulates and quantifies the effect
of Level (III) land use on nutrient loading into the water bodies in the watershed. From
the calibrated and validated water quality model, nutrient export coefficients that relate
To assess the relationships between land use and water quality in the watershed,
the BASINS 4.0 model was selected. BASINS built-in delineation tools, DEM
reclassification and water quality management tools for observed data, and other features
allow water quality to be assessed for a specific stream site or for a whole watershed.
HSPF version 12 was used as the embedded water quality model. HSPF is incorporated
in BASINS 4.0, and its interface is known as WinHSPF (Singh et al., 2006). With
6.2 Methodology
This section outlines the steps carried out to fulfill the objectives of the simulation
process. It explains how the hydrologic and water quality models were constructed and
system (GIS), national watershed data, and state-of-the-art environmental assessment and
modeling tools (such as HSPF, SWAT, SWMM, etc.) into one convenient package (EPA,
2012). It provides a framework to integrate several key environmental data sets with
improved analysis techniques (EPA, 2012). It was used in this study to characterize
hydrology and water quality processes and how they are related to detailed land use (level
BASINS data layers that can be provided to HSPF include: Digital Elevation
Model (DEM) grid data, to determine watershed boundaries; land use data (National Land Cover
Data (NLCD) or GIRAS), to calculate the land use distribution within the watershed;
Reach files, to determine stream networks; Permit Compliance System (PCS) data, to provide
requirements; STORET data and USGS data, to provide water quality and quantity data
The BASINS package contains several important modeling tools. In order to run HSPF,
the observed meteorological, water quality, and flow data must be converted to
Watershed Data Management (WDM) format using WDMUtil, another program
also included in the BASINS package. The WDM files contain the time series required
by HSPF, such as meteorological data, HSPF program inputs and outputs, and the model
time series used in the calibration and validation processes. All input data except
time series are contained in the User's Control Input (UCI) file, which holds all the
parameter values and control specifications needed to run the HSPF model. For the
evaluation of the model, all the calibration and validation analyses were done using the
6.2.1.1 Meteorological Data in the WDM Format. Meteorological data were available
as daily data, whereas hourly meteorological data are required to run both BASINS and
HSPF. The meteorological station selected for this study is Chicago O'Hare Airport.
This station was chosen among the other available stations in the study area
because it had all the meteorological constituents required by HSPF. Table 6.1
presents the minimum input data required to run HSPF, as provided by the station.
Precipitation data are used to find surface runoff, sediment and pollutant transport,
runoff or direct evaporation from land and water surfaces. Air temperature data are used to
determine water and soil temperature and to model snow and rain in the watershed. Wind
speed data are needed to model heat exchange, oxygen reaeration rates, and chemical
volatilization rates. Solar radiation data are used to find the heat balance in water bodies and
the plankton growth rate. Dew point temperature is used to determine the kind of
precipitation and to model the heat balance in streams. Finally, cloud cover is used to model
Daily time series data must be disaggregated into hourly time series in the WDMUtil
program, which contains a function that performs this step. For this study, all the meteorological
time series were readily available from BASINS as disaggregated hourly data for the
selected station.
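WDMUtil's own disaggregation routines are not described here. As a rough illustration of the idea only, the sketch below splits a daily total into 24 hourly values in proportion to a hypothetical hourly weight pattern (for example, one taken from a nearby station that records hourly data), so that the hourly values sum back to the daily total. The function name and the weighting scheme are assumptions, not WDMUtil's algorithm.

```python
from typing import List, Sequence


def disaggregate_daily(daily_total: float, hourly_pattern: Sequence[float]) -> List[float]:
    """Split one daily total into 24 hourly values that preserve the daily sum.

    hourly_pattern holds 24 non-negative weights (hypothetical, e.g. from a
    nearby hourly station). If all weights are zero, the total is spread evenly.
    """
    if len(hourly_pattern) != 24:
        raise ValueError("expected 24 hourly weights")
    if any(w < 0 for w in hourly_pattern):
        raise ValueError("weights must be non-negative")
    weight_sum = sum(hourly_pattern)
    if weight_sum == 0:
        return [daily_total / 24.0] * 24
    return [daily_total * w / weight_sum for w in hourly_pattern]


# Hypothetical hourly weights: a storm concentrated in hours 14 to 17.
pattern = [0.0] * 24
for hour, weight in zip(range(14, 18), [1.0, 3.0, 2.0, 1.0]):
    pattern[hour] = weight
hourly = disaggregate_daily(0.7, pattern)  # 0.7 in of daily precipitation
print([round(v, 3) for v in hourly[14:18]])
```

This sum-preserving approach suits accumulated variables such as precipitation; state variables such as air temperature are usually disaggregated differently (for example, from daily maximum and minimum values), which this sketch does not cover.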
Table 6.1. Meteorological data required for HSPF, Chicago O'Hare Airport meteorological station.
Hydro-meteorological data | Data time step | Data period
Precipitation | Hourly | 1962/06/01 to 2006/12/31
Potential evapotranspiration | Hourly | 1958/11/01 to 2006/12/31
Air temperature | Hourly | 1958/11/01 to 2006/12/31
Wind speed | Hourly | 1994/12/31 to 2006/12/31
Solar radiation | Hourly | 1995/01/01 to 2006/12/31
Dew point temperature | Hourly | 1994/12/31 to 2006/12/31
Cloud cover | Hourly | 1994/12/31 to 2006/12/31
6.2.1.2 GIS Data. Once the project was built in BASINS for hydrologic unit 07120003,
GIS data layers were imported into the project in shapefile format. Each GIS data layer
was projected to UTM 1983 Zone 16. The GIS data layers loaded into the BASINS
4.0 window were: stream network data (the National Hydrography Dataset (NHD), used
because it has more complete hydrography layers than the core Reach File, V1
(RF1) layer provided by BASINS); Chicago O'Hare Airport meteorological station data;
GIRAS land use data (from the 1970s) and National Land Cover Data (NLCD) for 1992
and 2001; BASINS Digital Elevation Model (DEM) grids; water quality and quantity
monitoring station data (USGS and STORET); and contour and soil type layers. Time series
data for the imported shapefiles were later downloaded from the BASINS window and saved
as WDM files.
The land use data available through BASINS are either Level (I) or Level (II) land
use types. In order to fulfill the objective of the study, Level (III) land use data were
acquired from the Chicago Metropolitan Agency for Planning (CMAP). CMAP's 2005 land use
inventory, in shapefile format, was added to the BASINS project. The inventory was
created using 2005 digital aerial photography and supplemented with data from
numerous government and private-sector sources (CMAP, 2012). The inventory covers
Cook, DuPage, Kane, Kendall, Lake, McHenry, and Will counties, identifying areas as
small as one acre using a 49-category classification scheme (CMAP, 2012). Because of
limitations in processing large land use classifications in BASINS, the CMAP land use data
were further clipped to a smaller shapefile fitting the watershed study area, using the ArcGIS
ArcMap 10 clipping tools. The land use classifications used for the study are shown in the
Appendix.
6.2.1.3 Watershed Delineation using BASINS 4.0. The watershed delineation tool
within BASINS 4.0 was used to delineate the Chicago River Watershed. Watershed
delineation is the process by which the watershed boundary and stream network are
is used to determine a contributing watershed area for a specific outlet or to divide the
required by HSPF, where the watershed is divided into segments to analyze them. The
existing streams and basins are manually selected and used to determine the watershed.
For this study, automatic delineation was used. The delineation process produced
the three GIS layers required to run HSPF: Streams, Subbasins,
and Outlets.
For this study the delineation process divided the watershed into two subbasins:
the Upper Chicago River subbasin and the Calumet River subbasin. The two subbasins
were naturally separated before the construction of the Chicago Area Waterway
System (CAWS), and both used to drain into Lake Michigan. They are not hydrologically
connected, i.e., no stream connects the two subbasins within the watershed boundary,
and hence they were represented as two subbasins at the end of the delineation process.
Figure (6.1) shows the results of the delineation of the Chicago River watershed. The
three GIS layers required by HSPF were determined for each subwatershed.
The automatic delineation also estimated stream network parameters within each
subbasin from the digital elevation and stream network layers provided: the average
slope, length, drainage area, and elevation of each stream segment were
estimated as well.
The only way to treat the two subbasins as one would be to choose an outlet lying
outside the boundary of the Chicago River watershed. However, this would mean that part of
the Des Plaines River watershed, as well as data for the Des Plaines River and Salt
Creek, would have to be included, which would be beyond the scope of this study of
the Chicago River watershed from a watershed perspective. Also, the complex
stream behavior of the Calumet River subbasin would not make it possible for that
subbasin to be analyzed within the boundary of the watershed, for the same reason. Therefore, only the
Upper Chicago River subbasin was investigated in a watershed context for the physical
Figure 6.1. The Chicago River Watershed delineation process using BASINS 4.0 (the red
lines represent the delineated subbasins).
The WinHSPF interface was launched by selecting the HSPF model from the Models
menu in the BASINS main window. Shapefiles of the study area's subbasins,
streams, and outlets resulting from the delineation process, along with the land use and
meteorological station shapefiles, were supplied in order to initiate WinHSPF. Once
WinHSPF was launched, an HSPF User Control Input (UCI) file and watershed data
WinHSPF divided the Upper Chicago River subbasin into homogeneous land
areas known as Hydrologic Response Units (HRUs). The HRUs were used to define 6
reaches and 7 subwatersheds. The reaches element specifies the rivers, lakes and
function tables (FTABLES) that represent the volume-discharge relationship for each reach.
A fixed relationship was assumed among water level, surface area, volume, and discharge
(Singh et al., 2005). HRUs can be impervious or pervious areas, which, once determined,
are modeled independently. Each HRU requires input data such as meteorological
data and parameters related to land use and soil characteristics to simulate hydrology,
The main simulation modules are PERLND, IMPLND, and RCHRES, which
simulate pervious land segments, impervious land segments, and free-flowing reaches, respectively
(Donigian et al., 1995). Figure (6.2) shows the schematic created by WinHSPF to
represent the Upper Chicago River subbasin. The schematic shows all the elements such
Figure 6.2. Schematic created by WinHSPF for the Upper Chicago River subbasin.
estimated for accurate hydrologic analyses is the effective impervious area (EIA) of the
watershed (Sutherland, 2000). Studies suggest that treating urban land use as a nonpoint
source of nutrients can give unrealistic results, because the cover in urban
areas is impervious and drainage is frequently routed to wastewater treatment plants
(WWTPs), which may or may not be in the same basin, and then discharged into the streams as
pollutant loads, the effective impervious area (EIA) as a percentage of the total
impervious area (TIA) should be determined for basins that are directly connected to the
drainage systems (Sutherland, 2000). EIA includes impervious areas such as paved
driveways connected to the street, sidewalks, rooftops connected to the curb or
storm sewer system, and parking lots (Sutherland, 2000). For urban runoff modeling or
hydrologic analysis, the EIA is usually less than the TIA. However, in highly urbanized
basins EIA values can approach or equal the TIA (Sutherland, 2000).
TIA is commonly determined by one of two methods: land use or zoning maps, or
aerial photography (Jones et al., 2003). The scientific basis for the relationship between
land use and the amount of impervious surface was developed in the field of urban
hydrology during the 1970s (Brabec et al., 2010). In the early research, imperviousness
was evaluated in four ways: (1) using aerial photography and then a planimeter
to measure each area, (2) counting the number of intersections that overlaid a variety of
classification of remotely sensed images, and (4) equating the percentage of urbanization
The majority of current impervious surface studies rely on the methods of these
original studies and of subsequent studies that correlated percentage impervious surface to
land use, largely by estimating the proportion of imperviousness within each class;
see Appendix B (Brabec et al., 2010). Some of the TIA values found in the literature, determined
using aerial or satellite photography and adopted for this study, are shown in Table (6.2).
The three recent methods most commonly used to determine EIA are field
was proposed by Alley et al. (1983) based on work completed for highly urbanized
drainage areas in Denver, Colorado (Sutherland, 2000). They proposed the equation:

$$\mathrm{EIA} = 0.15 \times \mathrm{TIA}^{1.41} \qquad (6.1)$$

Another relationship, developed by Laenen (1983) for the USGS, was based on
Portland and Salem, Oregon (Sutherland, 2000). An empirical equation based on this
analyzed equation (6.2) and developed a series of equations that provide estimates of EIA
summarized as follows:

$$\mathrm{EIA} = 0.01 \times \mathrm{TIA}^{2.0} \qquad (6.3)$$

measures:

$$\mathrm{EIA} = 0.04 \times \mathrm{TIA}^{1.7} \qquad (6.4)$$

$$\mathrm{EIA} = 0.1 \times \mathrm{TIA}^{1.5} \qquad (6.5)$$

$$\mathrm{EIA} = 0.4 \times \mathrm{TIA}^{1.2} \qquad (6.6)$$
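The power-law EIA relationships above (equations 6.1 and 6.3 to 6.6) can be evaluated for any TIA as in the sketch below. It assumes TIA and EIA are both in percent, as in Sutherland (2000), and leaves out equation 6.2, which is not reproduced in this section. Capping EIA at TIA is an added assumption: equation 6.6 slightly exceeds TIA near 100 percent, and a connected impervious area cannot be larger than the total impervious area. The dictionary keys are hypothetical labels.

```python
# Coefficient a and exponent b of EIA = a * TIA**b (TIA and EIA in percent).
EIA_EQUATIONS = {
    "6.1": (0.15, 1.41),  # Alley et al. (1983)
    "6.3": (0.01, 2.0),
    "6.4": (0.04, 1.7),
    "6.5": (0.10, 1.5),
    "6.6": (0.40, 1.2),
}


def effective_impervious(tia_percent: float, equation: str) -> float:
    """Estimate EIA (%) from TIA (%) with one of the power-law equations.

    The result is capped at TIA (an assumption of this sketch), because the
    directly connected impervious area cannot exceed the total impervious area.
    """
    if not 0.0 <= tia_percent <= 100.0:
        raise ValueError("TIA must be a percentage between 0 and 100")
    a, b = EIA_EQUATIONS[equation]
    return min(a * tia_percent ** b, tia_percent)


# EIA estimates from each equation for some TIA values used in Table 6.2.
for tia in (35, 50, 85):
    row = {eq: round(effective_impervious(tia, eq), 1) for eq in EIA_EQUATIONS}
    print(tia, row)
```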
Table 6.2. Some of the TIA percentages adopted for this study, based on the literature
Land use | TIA (%)
Agricultural | 0
Commercial | 85
Forest | 0
Industrial | 85
Multi-Family Residential | 50
Single-Family Residential | 35
Roads | 85
Schools | 50
Vacant | 0
Water | 100
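With the land use approach, the TIA of a subbasin is an area-weighted average of per-class percentages such as those in Table 6.2. The sketch below applies the Table 6.2 values to hypothetical land use areas and splits the subbasin into impervious and pervious acreage, the two kinds of land segment HSPF models separately. The function name and the areas are illustrative only.

```python
from typing import Dict, Tuple

# TIA percentages from Table 6.2.
TIA_PERCENT: Dict[str, float] = {
    "Agricultural": 0, "Commercial": 85, "Forest": 0, "Industrial": 85,
    "Multi-Family Residential": 50, "Single-Family Residential": 35,
    "Roads": 85, "Schools": 50, "Vacant": 0, "Water": 100,
}


def impervious_split(areas_acres: Dict[str, float]) -> Tuple[float, float, float]:
    """Return (impervious acres, pervious acres, area-weighted TIA in percent)."""
    total = sum(areas_acres.values())
    if total <= 0:
        raise ValueError("total area must be positive")
    impervious = sum(area * TIA_PERCENT[land_use] / 100.0
                     for land_use, area in areas_acres.items())
    return impervious, total - impervious, 100.0 * impervious / total


# Hypothetical land use areas (acres) for one subbasin.
areas = {"Commercial": 100.0, "Single-Family Residential": 200.0, "Forest": 100.0}
print(impervious_split(areas))  # prints (155.0, 245.0, 38.75)
```

An EIA equation such as 6.1 or 6.5 could then be applied to the weighted TIA to decide how much of the impervious area is treated as directly connected.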
6.3.2 Flow simulation. Flow is the first component to be simulated. PWATER and
IWATER are the modules used for flow simulation. PWATER calculates the components
of the water budget and predicts the total runoff from pervious land segments. The IWATER
module simulates the retention, routing, and evaporation of water from impervious land
For each reach, a fixed relationship is assumed among water level, surface area,
characteristics of reaches in the model are defined by parameters in the function tables
(FTABLES) that represent the volume-discharge relationships for reaches (Singh et al., 2005).
Parameters needed for the simulation, such as nominal upper zone storage, nominal lower
zone storage, soil moisture infiltration rate, percent vegetation cover of each land use
type, and groundwater recession rate, were populated with BASINS default values or
6.3.3 Water quality simulation. Nutrient loadings from the nonpoint sources of the
different land uses were simulated using the HSPF modules PQUAL and IQUAL. These
modules use a simplified approach that simulates each water quality constituent independently,
based on simple relationships with water or sediment. The species
orthophosphorus (PO4) for both pervious and impervious land segments.
PQUAL and IQUAL simulate the pollutants using one of two methods: either
by direct washoff by overland flow, where the constituent is simulated based on basic
accumulation and depletion rates, or by washoff associated with detached sediments, where
the constituent is simulated as a function of sediment removal. The first approach was
adopted for all the species, since the study area is largely impervious and the nutrients will
Washoff is simulated using the commonly used relationship (Bicknell et al.,
2001):
Where:
SOQO = washoff of the quality constituent from the land surface (lb/ac/day)
The storage of constituents on the land surface is calculated using equation
6.9, which accounts for the accumulation and removal processes (Bicknell et al., 2001):
Where,
stream reach using the RCHRES module. It is assumed that the reaches are completely
mixed and that the flow is unidirectional. Point sources were added to the HSPF simulation.
The two known NPDES point sources that could be added to the watershed are the North Side WRP and
the Calumet WRP. In WinHSPF, after the nonpoint source loadings were calculated
for each land use, they were added to their corresponding reaches along with the
identified point sources. For each channel reach, WinHSPF simulates the fate, transport, and
delivery of the nutrient loads using the reach quality module (RQUAL).
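The washoff and storage equations cited above from Bicknell et al. (2001) are not reproduced in this section. Assuming they take the standard HSPF PQUAL/IQUAL form, with daily accumulation toward a storage limit, SQO = ACQOP + SQOS(1 − ACQOP/SQOLIM), and washoff SOQO = SQO[1 − exp(−SURO·WSFAC)] with WSFAC = 2.30/WSQOP, the sketch below steps build-up over dry days and washoff over one storm hour. It further assumes an hourly time step so that SURO is in in/hr. Parameter names follow HSPF's conventions; the function names and values are hypothetical.

```python
import math
from typing import Tuple


def daily_buildup(sqos: float, acqop: float, sqolim: float) -> float:
    """Storage SQO after one day of accumulation.

    ACQOP is added each day, and the previous storage SQOS is reduced by the
    removal fraction REMQOP = ACQOP / SQOLIM, so storage tends toward SQOLIM.
    """
    remqop = acqop / sqolim
    return acqop + sqos * (1.0 - remqop)


def washoff(sqo: float, suro: float, wsqop: float) -> Tuple[float, float]:
    """Washoff SOQO during one hourly interval and the storage left afterwards.

    suro: surface runoff during the hour (in/hr).
    wsqop: runoff rate (in/hr) that removes 90 percent of storage in an hour.
    """
    wsfac = 2.30 / wsqop  # exp(-2.30) is about 0.10, hence 90 percent removal
    soqo = sqo * (1.0 - math.exp(-suro * wsfac))
    return soqo, sqo - soqo


# Hypothetical parameters (lb/ac): 10 dry days of build-up, then one storm hour.
storage = 0.0
for _ in range(10):
    storage = daily_buildup(storage, acqop=0.02, sqolim=0.4)
soqo, storage = washoff(storage, suro=0.25, wsqop=0.5)
print(round(soqo, 4), round(storage, 4))
```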
performance for the following reasons: (1) to provide a quantitative estimate of the
model's performance and predictive ability; (2) to provide a measure to evaluate any
comparing simulated and observed parameter values (Donigian, 2002). An initial set of
values for all parameters is chosen based on literature recommendations and then refined
until the differences between the simulated and observed data series are reasonable
(Donigian, 2002). Validation is the procedure that ensures that the calibrated
model can properly assess the watershed variables and conditions that can affect model
results, and it demonstrates the ability of the model to predict observations for periods separate
No commonly accepted modeling guidance has yet been established, although the
American Society of Civil Engineers (ASCE) has emphasized the need to clearly define
model evaluation criteria since 1993 (Donigian, 2002). However, specific statistics and
performance ratings for model use have been developed and used for evaluation
(Calderon, 2009). A number of 'basic truths' are evident and are likely to be accepted by
Models are solely approximations of reality and cannot exactly represent natural
systems.
model is validated
Graphical comparisons and statistical tests are both required to evaluate model
Models cannot be expected to be more accurate than the errors in the input and
observed data.
model performance, for these purposes multiple model comparisons, both graphical and
For this study, model performance and calibration/validation are evaluated through
statistical tests. The calibration/validation process is a hierarchical process that starts with
plots, with a 45° linear regression line, and statistical comparisons using error statistics, e.g., mean
error, absolute mean error, or correlation tests. Among the standard regression statistics, Pearson's
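correlation (r) and determination (r²) coefficients were used. These coefficients describe
the degree of collinearity between simulated and observed data. The regression

$$r = \frac{\sum_{i=1}^{n}(O_i - \bar{O})(S_i - \bar{S})}{\sqrt{\sum_{i=1}^{n}(O_i - \bar{O})^2}\,\sqrt{\sum_{i=1}^{n}(S_i - \bar{S})^2}} \qquad (6.10)$$

where O_i and S_i are the observed and simulated values, respectively, and Ō and S̄ are
the mean observed and simulated values.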
For model performance, r ranges from −1 to 1; a value closer to 1 means better
performance. For r², the values range from 0 to 1, and higher values mean less variance and
2002; Calderon, 2009). The fact that only the dispersion is quantified is one of the major
systematically over- or under-predicts will still result in good r² values close to 1.0 even
efficiency (NSE), proposed in 1970, is defined as one minus the sum of the absolute
squared differences between the predicted and observed values, normalized by the
variance of the observed values during the period under investigation (Krause et al.,

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}(O_i - S_i)^2}{\sum_{i=1}^{n}(O_i - \bar{O})^2} \qquad (6.11)$$

where O_i and S_i are the observed and simulated values, respectively, and Ō is the mean
of the observed values.
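Equations 6.10 and 6.11 translate directly into code. The sketch below also illustrates the weakness of r² noted above: a hypothetical simulation that is exactly twice the observations has r = r² = 1, yet its NSE is strongly negative because every value is over-predicted. The series are illustrative only.

```python
import math
from typing import Sequence


def pearson_r(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Pearson correlation coefficient r (equation 6.10)."""
    n = len(obs)
    if n != len(sim) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def nash_sutcliffe(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency NSE (equation 6.11)."""
    if len(obs) != len(sim) or len(obs) == 0:
        raise ValueError("need two non-empty series of equal length")
    mean_o = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1.0 - sse / sum((o - mean_o) ** 2 for o in obs)


# Hypothetical series: the simulation is exactly twice the observation.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(observed, simulated)
print(r, r ** 2, nash_sutcliffe(observed, simulated))  # prints 1.0 1.0 -5.0
```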
The range of NSE lies between 1 (perfect fit) and −∞. An efficiency lower than
zero indicates that the mean value of the observed time series would have been a better
predictor than the model. The largest disadvantage of the Nash-Sutcliffe efficiency is the
fact that the differences between the observed and simulated values are calculated as
squared values. As a result, larger values in a time series are strongly overestimated
Root Mean Square Error (RMSE), Normalized Root Mean Square Error (NRMSE),
and Mean Absolute Error (MAE) are other statistical indices that can be used to evaluate
where O_i and S_i are the observed and simulated values and n is the number of records.
O_max and O_min are the maximum and minimum observed values. RMSE and MAE
measure the aggregated difference between simulated and observed values. Values
(Donigian, 2000). The values in the table provide general guidance, in terms of the
percent mean errors or differences between simulated and observed values, so that users
can determine the level of agreement or accuracy (i.e., very good, good, fair) that might
be expected from the model application (Donigian, 2000). Table 6.3 shows Percent
Constituent | Very good (%) | Good (%) | Fair (%)
Sediment | <20 | 20-30 | 30-45
Pesticides/Toxics | <20 | 20-30 | 30-40
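The equations for RMSE, NRMSE, and MAE are not reproduced in this section. The sketch below uses the usual definitions consistent with the symbols above (NRMSE normalized by the observed range O_max − O_min). It takes the percent mean error (PME) as the percent difference between the mean simulated and mean observed values, which is an assumption about how the PME compared against Table 6.3 is defined, and maps its magnitude onto the Table 6.3 ratings; labelling results beyond "fair" as "poor" is also an assumption.

```python
import math
from typing import Dict, Sequence, Tuple


def error_stats(obs: Sequence[float], sim: Sequence[float]) -> Dict[str, float]:
    """RMSE, NRMSE (normalized by the observed range), MAE, and percent mean error."""
    n = len(obs)
    if n != len(sim) or n == 0:
        raise ValueError("need two non-empty series of equal length")
    rmse = math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / n)
    obs_range = max(obs) - min(obs)
    mae = sum(abs(o - s) for o, s in zip(obs, sim)) / n
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    return {
        "RMSE": rmse,
        "NRMSE": rmse / obs_range if obs_range else float("nan"),
        "MAE": mae,
        "PME": 100.0 * (mean_s - mean_o) / mean_o,  # assumed definition
    }


def rate(pme: float, limits: Tuple[float, float, float]) -> str:
    """Map |PME| to a rating; limits are the upper bounds of very good, good, fair."""
    very_good, good, fair = limits
    error = abs(pme)
    if error < very_good:
        return "very good"
    if error <= good:
        return "good"
    if error <= fair:
        return "fair"
    return "poor"  # beyond the "fair" range of Table 6.3


SEDIMENT_LIMITS = (20.0, 30.0, 45.0)  # Table 6.3, sediment row
stats = error_stats([2.0, 4.0, 6.0], [3.0, 5.0, 7.0])  # hypothetical series
print(stats, rate(stats["PME"], SEDIMENT_LIMITS))
```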
For the Upper Chicago River subbasin, the simulation results were evaluated at
North Branch Chicago River at Grand Ave, Chicago. This location was chosen to
represent the outlet of the subbasin. Two factors limited the time period
for the calibration and validation of the model. First, the observed flow was limited to the
period 2002 to 2010 (with some missing data in 2003-2004), but the
available meteorological data end in 2006, so only the period 2002 to 2006 was allowed for
The other factor was that the land use data applied were for the year 2005, so a
simulation period around that year would give more realistic results for land use. Thus, the
calibration and validation period for flow was restricted to the years 2002 to
2005. For water quality, a slightly longer simulation period was considered, since the
observed nutrient data were available for the period 1970-2010 and all the
meteorological data needed were available for the period 1995-2006, as shown in Table 6.1;
however, a period closer to the range of the flow simulation, 2000-2005, was chosen for water quality
calibration and validation. USGS flow information at station
Branch Chicago River at Grand Ave at Chicago were downloaded as observed data.
Figure (6.3) shows the GenScn window where the calibration and validation were
performed.
6.4.1 Hydrology Calibration and Validation. The streamflow simulation was carried
out using meteorological data from the Chicago O'Hare Airport station, covering the period
from 1 October 2002 to 31 May 2003, with detailed Level (III) 2005 land use data
obtained from the Chicago Metropolitan Agency for Planning, as mentioned above. To
calibrate the flow and measure its sensitivity to impervious land segments,
the equations proposed by Alley, Laenen, and Sutherland (Alley et al., 1983; Laenen,
1983; Sutherland, 2000) were adopted to find the percentages of pervious and impervious
land segments.
The computed land uses were then used in the flow simulation, following an iterative
procedure. Data and guidelines for the model input parameters of pervious and
impervious land segments were limited, so BASINS's default input parameters were used
at first. The parameters were lower zone nominal storage (LZSN), upper zone nominal
storage (UZSN), mean soil infiltration rate
(INTFW), and interflow recession coefficient (IRC). The LZSN, UZSN, and INFILT parameters
affect the total annual flow volume, and adjusting them can alter the total annual
simulated flow volume. LZETP, INTFW, and IRC affect the base flow conditions of the
river, the hydrograph shape, and peak flow conditions. All these parameters are
calibration parameters that can be estimated and adjusted during the model calibration
process.
The various module parameters were repeatedly adjusted, the model was rerun, and the
simulated and observed values were compared until reasonable correlation and
determination coefficients were obtained. WinHSPF's 'Input Data Editor Tool' was used
to manually adjust these parameters. The model was run and calibrated using each proposed
equation to compute the effective impervious area (EIA), and the results were compared to
the observed values in order to choose which equation to adopt. Table 6.4 shows each
EIA equation used and the correlation and determination coefficients associated with each
Equation 2 showed acceptable performance and it was the one adopted for
The calibration period selected for hydrology calibration was October 2002 to
May 2003. Figures 6.4 to 6.6 show the results obtained from the hydrology calibration
Table 6.4. Calibration/sensitivity analysis for EIA equations for the study area
EIA equation r r2
connected basins, no
connected
connected basins, no
connected
(EIA = TIA)
[Plot residue for Figures 6.4 to 6.6: simulated flow (NB_CMAP_RCH10) and observed flow (05536118) for the calibration period; Figure 6.6 fitted line Y = 0.949X + 152.615]
Figure 6.6. Observed vs. simulated flow scatter plot for calibration period (red scatter
points and line represent the simulated data)
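The fitted lines in the observed-versus-simulated scatter plots, such as Y = 0.949X + 152.615 for the calibration period, are presumably ordinary least-squares fits. Which series is plotted on the X axis is not stated, so the sketch below simply fits y on x for whatever pair it is given. A slope near 1 with an intercept near 0 would indicate agreement with the 45° line. The flow values are hypothetical.

```python
from typing import Sequence, Tuple


def least_squares_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical observed (x) and simulated (y) flows, cfs.
observed_flow = [400.0, 800.0, 1200.0, 2000.0]
simulated_flow = [520.0, 900.0, 1250.0, 2050.0]
print(least_squares_line(observed_flow, simulated_flow))
```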
Inspecting Figure 6.4, the simulated flow is slightly lower than the
observed flow, but it closely mimics the flow pattern in the high flow season, though not the low
flow pattern. The flow duration curves of the simulated and observed flow (Figure 6.5) reveal
the same: there is a small and almost constant difference between simulated and observed
flow. The duration curves also show that the two series mostly follow the same pattern for
95 percent of flows. These results may suggest that the proposed percentages of pervious and
impervious areas were able to reflect the pattern of flow but not the exact value
scatter plots, with a 45° linear regression line. With a correlation coefficient of 0.714, the plot
As shown in Table 6.5, the model showed acceptable calibration performance
based on statistical indicators and acceptable ranges published in the literature
for hydrologic simulation. The correlation (r) and determination (r²) coefficients showed
acceptable values and acceptable model performance. The percent mean error (PME) is
slightly above 25%. The overall performance of the model can be considered acceptable
The hydrology validation period chosen was October 2004 to April 2005. Figures
6.7, 6.8, and 6.9 show the results obtained from the hydrology validation and
graphical comparisons between observed and simulated values. Table 6.6 shows
Results from the hydrologic validation analysis show that some of the statistical
indicators are fair based on the graphical representation and according to the guidelines given
by Donigian (Donigian et al., 2000). The model showed better performance in the
validation period relative to the calibration period, except for the poor r and the fair to
acceptable r². Again, the overall model performance will be considered acceptable based
[Plot residue for Figures 6.7 to 6.9: simulated flow (NB_CMAP_RCH10) and observed flow (05536118), 2004-2005; Figure 6.9 fitted line Y = 0.752X + 213.494]
Figure 6.9. Observed vs. simulated flow scatter plot for validation period (red scatter
points and line represent the simulated data)
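The flow duration curves compared in this section rank the flows and plot each against the percentage of time it is equaled or exceeded. A minimal sketch, assuming the Weibull plotting position 100·m/(n + 1) with m the rank in descending order; the plotting position actually used by GenScn is not stated in the text, and the flows in the usage example are hypothetical.

```python
from typing import List, Sequence, Tuple


def flow_duration_curve(flows: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (percent of time equaled or exceeded, flow) pairs.

    Flows are ranked from largest (m = 1) to smallest, and each is assigned
    the Weibull plotting position 100 * m / (n + 1).
    """
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    return [(100.0 * m / (n + 1), q) for m, q in enumerate(ranked, start=1)]


# Hypothetical daily flows (cfs).
print(flow_duration_curve([10.0, 30.0, 20.0]))
# prints [(25.0, 30.0), (50.0, 20.0), (75.0, 10.0)]
```

Computing the curve separately for observed and simulated flows and comparing them at the same exceedance percentages reproduces the kind of comparison described for Figure 6.5.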
6.4.2 Water quality calibration and validation. The calibration and validation
process in HSPF is a hierarchical methodology that begins with hydrology and ends
with water quality constituents (Donigian, 2000; Calderon, 2009). After the flow
modeled in WinHSPF's Pollution Selection Window. For this study, the nutrient
constituents simulated were total nitrates (NO3 + NO2) as N, total ammonia (NH4 + NH3)
as N, and orthophosphate (PO4). HSPF uses the PQUAL and IQUAL modules to simulate
the nutrient constituents individually. Total nitrogen and phosphorus loads were
calculated later using scripts provided by HSPF. Various nutrient modeling parameters
were added for both pervious and impervious land segments. These parameters include
the constituent washoff factor, the monthly constituent accumulation factor, and the initial
storage for each constituent. These parameters were calibration parameters that were
The results of the nutrient simulations were examined and compared with the
observed values. In the initial simulation trials, ammonia and nitrate values were
consistently over-predicted, mostly during the wet season, while orthophosphate
was over-predicted throughout the year. The calibration parameters adjusted include
the monthly accumulation factors and the monthly values of limiting storage for each
constituent, for both pervious and impervious land segments. The adjustments were carried out
until reasonable model performance was seen. Instream process parameters were also
Figures 6.10, 6.11, and 6.12 show graphical results of the calibration for total
nitrates, total ammonia, and orthophosphorus, respectively. Table 6.7 summarizes the
[Plot residue for Figures 6.10 to 6.12: observed vs. simulated NO3, NH4, and PO4 for the calibration period; Table 6.7 column headings: observed value, simulated value]
The results of the calibration show acceptable agreement between
the observed and simulated data. Statistical results for best-fit calibration of total nitrates
and the percent mean error between the simulated and observed data for nitrate show that
the model performance criterion PME was very good for all the constituents, according to the
accepted tolerances suggested by Donigian (Table 6.3). The other statistical values could be
considered acceptable.
The validation process was conducted with water quality data for November 2004
to December 2005 for total nitrates and total ammonium, and for January 2004 to
December 2005 for PO4. The purpose of validation is to confirm that the calibrated model
and its adjusted parameters adequately represent the watershed conditions that affect the
model results. Once the model was calibrated and its parameters optimized, it was run for
the specified validation period and the results were statistically analyzed. Figures 6.13,
6.14, and 6.15 show the validation results for total nitrates, total ammonia, and
orthophosphate, respectively.
[Figures 6.13, 6.14, and 6.15: analysis plots for RCH10 showing observed and simulated NO3, NH4, and PO4 over the 2004-2005 validation period; plot data not recoverable]
[Table: observed value and simulated value columns for the validation period; table body not recoverable]
Based on the PME values obtained for the validation period, model performance is
considered very good for total nitrates and good for total ammonium and phosphate,
according to the PME values (Table 6.3) for accepted performance
6.4.3 Comparing Data Driven and Physical Models. For the proposed Chicago River
Watershed framework, both data-driven and physical models were developed. Table 6.9
compares the performance of the two modeling approaches. The comparison suggests that
the data-driven models perform better, RMSE for regression
Although data-driven approaches to modeling complex physical systems are receiving
increasing interest owing to the growing availability of data, it is not easy to link a
data-driven technique precisely to the most important physical variables that govern the
natural processes of the watershed system (Preis et al., 2008). The physical model does
have this link. This benefits the analysis of different scenarios the watershed may face,
such as climate change, population change, or the inclusion or removal of certain physical
variables, and so provides a planning tool for regulatory management programs. Also, as
discussed in Section 5.5.1, the data-driven models showed weaker predictive performance
for high total nitrate values. However, the data-driven models require fewer inputs and
can be deployed anywhere in the watershed. The physical model, by contrast, requires
extensive data inputs and can only be applied at the specific watershed outlets selected
in the simulation. These arguments suggest that using both physical and data-driven models
is essential for the proposed framework. The physical model can serve as a planning tool
whenever a significant physical change takes place in the watershed. The data-driven model
can serve as an operating tool, used periodically to inspect the watershed water quality
parameters, especially
HSPF, specifically the PQUAL and IQUAL modules, was used to estimate annual
loadings of total nitrogen and total phosphorus from forty-four different land use types in
the Upper Chicago River Basin. Based on the results of the calibrated and validated water
quality model, the total annual loads from the Upper North Chicago River subbasin were
computed.
Average nutrient loads from selected individual land use segments from 2000 to
2005 are presented in Tables 6.10 and 6.11 for total nitrogen and total phosphorus,
respectively. The average nutrient loadings of total nitrogen and total phosphorus for all
land use types are given in Appendix B. Appendix B also gives the pervious and impervious
nutrient yields for the watershed. Figure 6.16 shows total nitrogen and total phosphorus
from point and nonpoint sources. Figures 6.17, 6.18, and 6.19 show the percentages of the
different land use areas and of the total nitrogen and total phosphorus associated with
each land use type.
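Land-use loads and the percentage shares plotted in Figures 6.17 to 6.19 follow from area-weighting the pervious and impervious yields. The sketch below is a minimal version of that arithmetic. It assumes yields in lb/ac/yr and areas in acres for each land use. The land use names and numbers are hypothetical, not values from Appendix B.

```python
def land_use_loads(segments):
    """Annual load per land use and its percentage share of the total.

    segments maps a land use name to a tuple
    (pervious_acres, pervious_yield, impervious_acres, impervious_yield),
    with yields in lb/ac/yr. Loads are returned in lb/yr.
    """
    loads = {
        name: perv_ac * perv_y + imp_ac * imp_y
        for name, (perv_ac, perv_y, imp_ac, imp_y) in segments.items()
    }
    total = sum(loads.values())
    shares = {name: 100.0 * load / total for name, load in loads.items()}
    return loads, shares

# Hypothetical inputs for two land uses.
loads, shares = land_use_loads({
    "Residential Single Family": (100.0, 2.0, 50.0, 4.0),
    "Open Space": (200.0, 1.0, 0.0, 3.0),
})
```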
The simulation results show that, from 2000 to 2005, the residential single family
land use segment produced the highest total nitrogen and total phosphorus loads in the
Upper Chicago River subbasin. This is expected, since residential single family is the
dominant land use type in the basin. During this study, no information was found that
relates the contribution of detailed (level III) land uses to total nitrogen and total
phosphorus loads in the Chicago River watershed or in any similar highly urbanized
watershed. It was therefore difficult to determine how well the simulated loads match the
actual loads. However, based on the nutrient model calibration and validation results
presented in Section 6.4.2, it can be assumed that the model provided an acceptable and
novel estimate of total nitrogen and
[Figure 6.16: annual total nitrogen loads from point sources (PS(N)) and nonpoint sources (NPS(N)); Figures 6.17, 6.18, and 6.19: percentage of land use area, total nitrogen, and total phosphorus by land use type. Legend entries include Residential Single Family, Residential Multi Family, Urban Mix W/ Parking Lot, Urban Mix No Parking Lot, Industrial W/ Parking Lot, Open Space Cons, Open Space Recreational, Open Space Linear, Golf Course, Vacant/Grass, Medical, Warehouse/Distribution/Wholesale, Cultural/Entertainment, Independent Auto Parking, and Communication. Plot data not recoverable]
Export coefficients are generally used to calculate runoff pollutant loads for
different land use types. The pollutants for which export coefficients are most commonly
generated are total nitrogen (TN) and total phosphorus (TP) (Lin, 2004). The export
coefficients presented in this section are the first attempt to measure and model nutrients
using detailed land use types in the Chicago River Watershed with a watershed perspective
analysis. Previous studies estimated ranges of export coefficients, but only for a limited
number of land uses (Lin, 2004; Line et al., 2002; McFarland et al., 2001; Smullen et al.,
1999; Baldys et al., 1998; Frink, 1991; Loehr et al., 1989; Clesceri et al., 1986; Driver et
al., 1985; Rast et al., 1983; Beaulac et al., 1982; Reckhow et al., 1980).
For highly urbanized areas, storm event mean concentrations (EMCs) are generally used
to calculate runoff pollutant loads for urban land use types (Smullen et al., 1999;
Several water quality models used to estimate nonpoint source pollution in
watersheds require as input either export coefficients (typically for rural areas) or
event mean concentrations (EMCs, typically for urban areas). EMCs represent the
concentration of a specific pollutant in stormwater runoff from a particular land use type
within a watershed (Lin, 2004). Export coefficients represent the average total amount of
pollutant loaded annually into a system from a defined area. They are reported as mass of
pollutant per unit area per year (e.g., lb/ac/yr), whereas EMCs are reported as mass of
pollutant per unit volume of water (usually mg/L) (Lin, 2004). Ideally, these numbers are
calculated from local stormwater monitoring data. However, collecting the data necessary
to calculate site-specific EMCs or export coefficients can be cost-prohibitive, so
researchers and regulators often use values that are already
Export coefficients are very useful indicators for predicting the likely yield of
nutrients reaching receiving water bodies. These values reflect a combination of many
site-specific conditions and variables at the watershed level, including hydrometeorological
data, topographic data, land use management practices, and physical
numbers are not available, regional or national averages can be used, although the
accuracy of these numbers is questionable because of the specific
and urban land uses that can exhibit a wide range of variability in nutrient export
Figures 6.20 and 6.21 show the resulting export coefficients for total nitrogen and
Appendix B.
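The two unit conventions described above can be related with a short worked sketch. The first function derives an export coefficient from simulated annual loads, in the spirit of the continuous-simulation approach here. It assumes the coefficient is the average annual load divided by the contributing area. The second converts an EMC to an export coefficient, given an annual runoff depth. It uses the fact that 1 mg/L in one acre-inch of runoff (about 102,790 L) is about 0.2266 lb. This is a simple-method style conversion shown for comparison, not the procedure used in this study. All inputs are hypothetical.

```python
# 1 mg/L of pollutant in 1 acre-inch of runoff (about 102,790 L) is about 0.2266 lb.
LB_PER_MG_L_ACRE_INCH = 0.2266

def ec_from_simulated_loads(annual_loads_lb, area_acres):
    """Export coefficient (lb/ac/yr): average annual load divided by area."""
    return sum(annual_loads_lb) / len(annual_loads_lb) / area_acres

def ec_from_emc(emc_mg_per_l, annual_runoff_in):
    """Export coefficient (lb/ac/yr) implied by an EMC (mg/L) and an
    annual runoff depth (inches)."""
    return LB_PER_MG_L_ACRE_INCH * emc_mg_per_l * annual_runoff_in

# Hypothetical values: three simulated years on a 50-acre segment,
# and a 2 mg/L EMC with 10 inches of annual runoff.
ec_sim = ec_from_simulated_loads([100.0, 200.0, 300.0], 50.0)
ec_emc = ec_from_emc(2.0, 10.0)
```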
Figure 6.20. Average Export Coefficients (EC) for different land use types for TN
Figure 6.21. Average Export Coefficients (EC) for different land use types for TP
6.7 Conclusion
A water quality model based on hydrologic simulation was developed for the Chicago
River Watershed. The model forms the basis for identifying detailed land use effects on
water quality in the area. Moreover, because it is based on state-of-the-art modeling
procedures, the watershed simulation methodology presented can support local and federal
agencies in developing TMDLs for the watershed. HSPF, the selected water quality model, is
designed to support watershed-based analysis and TMDL
with appropriate consideration given to EIA. The five-year water quality simulation
quantified nutrient loadings from both point and nonpoint sources. Land use export
coefficients were also developed for forty-four different land uses.
The continuous, calibrated, and validated model can be used to investigate and
analyze different scenarios in the watershed and to evaluate the behavior of the watershed
under possible future conditions, thus providing a planning tool for regulatory
environmental agencies. The data-driven models developed in Chapter 5 can be used as an
operational tool to maintain water quality parameters, especially if
CHAPTER 7
CONCLUSIONS
7.1 Summary
perspective and historical data records are used as tools to investigate land use effects on
water quality in a highly urbanized watershed, the Chicago River Watershed. The study
recognizes the attributes of water resources, especially quantity and quality, and how they
are interlinked. Finding comprehensive ways to relate and assess those attributes is key to
sound and successful watershed management. This thesis makes a unique contribution
quality, quantity, climate, and land use; and watershed problems, conflicts, needs, and
targets; while improving domain knowledge and decision-making ability at the same time.
repository and presented methodologies for analyzing and assessing the watershed using
Data Warehouse (DW) and Data Mining (DM) technologies. The DW makes it easy to access,
retrieve, gap-fill, analyze, and manage records of water quantity and quality, climate,
land use, and other data from different source agencies such as USGS, MWRDGC, NWS, and
CMAP, and it facilitates data interaction and decision making.
Current data storage systems are managed by independent and disparate sources,
which creates obstacles to synthesizing data from these sources into a single analysis.
Some systems have been developed to fill that gap, such as the older STORET system
introduced by EPA and the more recent enhanced observatory system HIS introduced by
CUAHSI. These systems, however, have proved deficient in their ability to integrate and
process different monitoring data to generate actionable information that
This research recognized the need for a DW designed around watershed needs to creatively
watershed data and discovery of trends and patterns in data by incorporating 40 years'
worth of watershed data from different source agencies into a central repository. The WDW
support decision-support queries that users typically need to address and that involve
dashboard was built. Its distinctive feature is that it consists of two layers of
information: a monitoring layer that conveys information visually, and an analysis layer
that supports summarized dimensional data, hierarchies, and slicing and dicing of
The multi-dimensional watershed model presented in this study is the basis for the
framework proposed to investigate land use effects on water quality in highly urbanized
watersheds. It provides readily integrated watershed data that offer a holistic view of the
watershed elements across heterogeneous data sources. The DW concept described
allows combining data from different sources, such as USGS, MWRDGC, CMAP, and
techniques facilitate the integration and aggregation of information at all desired levels
The web-based dashboard and reporting tools allow the watershed stakeholders to
management the watershed. The GUI introduced here illustrates how easily the DW
dimensional concept can be mapped to graphical user interface design. The result is a
tool that supports the users' different intended tasks, whether a watershed assessment
task or a data integration task for a physical model application. The ad hoc analysis
tools are further used to slice and dice data to find patterns or pinpoint problem areas.
They also provide the details, views, or perspectives that enable users to understand a
problem and identify the steps they must take to address it. This makes analyzing and
assessing a watershed more efficient than using traditional databases.
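The slice-and-dice operations described above correspond to joining a fact table to its dimensions, filtering on dimension attributes, and aggregating. The sketch below is a minimal illustration of that pattern using SQLite. It assumes a hypothetical, simplified star schema loosely modeled on the Appendix A DDL. The table names, columns, and readings are invented for illustration, and this is not the WDW implementation, which used Oracle.

```python
import sqlite3

# Hypothetical, simplified star schema: three dimensions around one fact table.
SCHEMA = """
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, station TEXT);
CREATE TABLE measurement_dim (measurement_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE reading_fact (
    date_key INTEGER REFERENCES date_dim,
    location_key INTEGER REFERENCES location_dim,
    measurement_key INTEGER REFERENCES measurement_dim,
    reading_value REAL,
    PRIMARY KEY (date_key, location_key, measurement_key)
);
"""

def build_demo_warehouse():
    """Create an in-memory warehouse populated with made-up readings."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO date_dim VALUES (?, ?, ?)",
                    [(1, 2004, 6), (2, 2005, 6), (3, 2005, 7)])
    con.executemany("INSERT INTO location_dim VALUES (?, ?)", [(1, "S1"), (2, "S2")])
    con.executemany("INSERT INTO measurement_dim VALUES (?, ?)", [(1, "NO3"), (2, "NH4")])
    con.executemany("INSERT INTO reading_fact VALUES (?, ?, ?, ?)",
                    [(1, 1, 1, 2.0), (2, 1, 1, 3.0), (3, 1, 1, 5.0),
                     (2, 2, 1, 1.0), (3, 2, 1, 3.0), (2, 1, 2, 0.5)])
    return con

def slice_and_roll_up(con, measurement, year):
    """Slice on one measurement and one year, then roll up to station level."""
    sql = """
        SELECT l.station, AVG(f.reading_value)
        FROM reading_fact f
        JOIN date_dim d ON d.date_key = f.date_key
        JOIN location_dim l ON l.location_key = f.location_key
        JOIN measurement_dim m ON m.measurement_key = f.measurement_key
        WHERE m.name = ? AND d.year = ?
        GROUP BY l.station
        ORDER BY l.station
    """
    return con.execute(sql, (measurement, year)).fetchall()

warehouse = build_demo_warehouse()
no3_2005 = slice_and_roll_up(warehouse, "NO3", 2005)
```

Changing the WHERE clause "dices" the cube along other dimensions. Grouping by month instead of station drills down in time.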
Although the model and the methodology were implemented for a highly urbanized
watershed, they are not restricted to one and can be used without modification for any
watershed.
Moreover, this thesis introduced data-driven modeling for the Chicago River
watershed using the WDW repository. Several regression and classification algorithms were
presented and assessed for their suitability for predicting total nitrates from a few
watershed attributes. These included multiple linear regression, artificial neural
networks, model trees, support vector machines, lazy learners, naive Bayes, logistic
regression, and Gaussian processes. The results show acceptable prediction accuracy and
interpretability for a number of algorithms, despite the limited amount of data used. The
resulting models could be deployed in scenarios built around changes in any of the
watershed elements, such as population, water quality regulations, land use, or climate,
to predict future outcomes. Thus, the insights offered by site-specific data mining results
can be integrated with policy and decision-making tools to manage the watershed
effectively and use its land optimally. In particular, the decision tree approach is worth
investigating for prioritizing action steps, for instance when
The success of a data mining methodology relies heavily on the quality and quantity
of the data used in the prediction process. Even though this study used a sufficient amount
of data with a logical set of predictors, more data and more watershed characteristics
could be incorporated to enhance the efficiency and performance of the predictive models.
Although the ANN model consistently showed better performance, further training of decision
tree models would be more logical, since they expose their reasoning process as rules that are
management plans. The other models, on the other hand, do not provide such features to
The data mining techniques presented in this study are intended to integrate some
watershed parameters as indicators for predicting the water quality parameter in question,
thereby simplifying the modeling procedures. This allows the data on basic watershed
elements, and the relationships among them, to be used without attention to the physical
Since the Chicago River watershed is 82% urban land use, i.e., a highly urbanized
area, examining the effect of land use on water quality requires a detailed land use level.
The export coefficients presented in this thesis are the first attempt to measure and model
nutrients using detailed land use types with a continuous simulation approach and a
watershed perspective analysis rather than a storm event methodology. Five years of water
quality simulation produced export coefficients for detailed (level III) land uses in the
Chicago River watershed. The simulation used the multipurpose environmental analysis
system BASINS, coupled with HSPF, a comprehensive, conceptual, continuous-simulation,
watershed-scale model. Export coefficients are very useful indicators for predicting the
likely yield of nutrients reaching receiving water bodies. In this sense, the water quality
simulation approach used in this research to generate the coefficients constitutes a new
contribution for the Chicago River watershed and other highly urbanized watersheds.
Because it is based on state-of-the-art modeling procedures, the watershed
simulation methodology presented can support local and federal agencies in developing
TMDLs for the watershed. HSPF, the selected water quality model, is designed to support
watershed-based analysis and TMDL development. The model can be
given to EIA. The five-year water quality simulation quantified nutrient loadings from both
point and nonpoint sources. Land use export coefficients were also developed for
forty-four different land uses. Export coefficients can be used as input to a
multi-objective optimization approach to resolve land use conflicts
The continuous, calibrated, and validated model can be used to investigate and
analyze different scenarios in the watershed and to evaluate the behavior of the watershed
under possible future conditions, thus providing a planning tool for regulatory
environmental agencies. The data-driven models developed in Chapter 5 can be used as an
operational tool to maintain water quality parameters, especially if TMDLs and WQS are
developed for the Chicago River Watershed. The framework proposed in this study can
therefore be considered robust, given its integration, planning, and operating techniques
and tools. Furthermore, an optimization tool is introduced in the
The framework presented in this study is not a solution to the watershed's
problems but a collection of innovative tools that can help to investigate and solve them.
More sophisticated tools can be used to fulfill the goals of the framework.
Although this research clearly advocates a holistic approach to watershed management
by including a watershed perspective and historical data records, it has some
the watershed scale offer effective watershed management tools for estimating nutrient
yields for a wide spectrum of problems involving surface waters (Arabi, 2005; Qi,
explore alternative scenarios in water resources management, which enhances the quality
of decision making (Qi, 2006). Coupling the watershed model simulation results with
when more than one objective function is involved and different solutions may produce
2009). Pareto-optimal solutions are a set of solutions in which moving from any one point
to another in the set improves at least one objective function and worsens at least one
other. None of the solutions dominates another, and the set provides good, flexible options for
The range of land use export coefficients obtained from long-term continuous
simulation reflects the different watershed conditions and the different meteorological and
physical variables included in the simulation, and hence provides a suitable input for a
optimal land use change and distribution in a highly urbanized, developed watershed. Based
on different detailed land use types, scenarios that account for different combinations of
pervious and impervious land use segments and the tradeoffs between them (e.g., converting
an impervious parking lot land use to a pervious one) along with factors
planning and decision-making tool. The multi-objective optimization approach would allow
independent objectives to be optimized to find the best land use combination, with the
high-priority goal of meeting water quality standards for nutrient loadings of
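The Pareto-optimality idea described above can be made concrete with a minimal non-dominated filter. It assumes every objective is to be minimized, for example a scenario's annual TN load and its conversion cost. Scenario A dominates B when A is no worse in every objective and strictly better in at least one. The objective pairs below are hypothetical and do not represent results of this study.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Return the non-dominated solutions, preserving input order.
    A solution never dominates itself, so no self-exclusion is needed."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]

# Hypothetical land use scenarios as (annual TN load, conversion cost) pairs.
scenarios = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.0, 2.0)]
front = pareto_front(scenarios)
```

Every point on the returned front is a defensible land use choice. Moving between points trades lower nutrient load for higher cost, which matches the flexibility described in the text.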
A.2.2 LAND_USE_TYPE_DIM
CREATE TABLE "CHICAGORW"."LAND_USE_TYPE_DIM"
(
"LAND_USE_TYPE_KEY" NUMBER(30,0) NOT NULL ENABLE,
"SYS_MODIFICATIO_DATE" DATE DEFAULT sysdate NOT NULL ENABLE,
"LAND_USE_LEVEL_I_CODE" NUMBER(38,0),
A.2.4 MEASUREMENT_DETAILS_DIM
CREATE TABLE "CHICAGORW"."MEASUREMENT_DETAILS_DIM"
(
"MEASUREMENT_DETAILS_KEY" NUMBER(30,0) NOT NULL ENABLE,
"SYS_MODIFICATIO_DATE" DATE DEFAULT sysdate NOT NULL ENABLE,
"MEASUREMENT_NAME" VARCHAR2(30 BYTE),
"CONFORMED_MEASUREMENT_NAME" VARCHAR2(30 BYTE),
"MEASUREMENT_UNIT" VARCHAR2(60 BYTE),
"MEASUREMENT_CATEGORY" VARCHAR2(60 BYTE),
"MEASUREMENT_SUBCATEGORY" VARCHAR2(60 BYTE),
"MEASUREMENT_DESC" VARCHAR2(120 BYTE),
CONSTRAINT "PK3" PRIMARY KEY ("MEASUREMENT_DETAILS_KEY")
USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE
STATISTICS STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1
MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE
DEFAULT) TABLESPACE "USERS" ENABLE
)
SEGMENT CREATION IMMEDIATE PCTFREE 10 PCTUSED 40 INITRANS 1
MAXTRANS 255 NOCOMPRESS LOGGING STORAGE
(
INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT
FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT
)
TABLESPACE "USERS" ;
A.2.5 SOURCE_AGENCY_DIM
CREATE TABLE "CHICAGORW"."SOURCE_AGENCY_DIM"
(
"SOURCE_AGENCY_KEY" CHAR(10 BYTE) NOT NULL ENABLE,
"SYS_MODIFICATIO_DATE" DATE DEFAULT sysdate NOT NULL ENABLE,
"AGENCY_NAME" VARCHAR2(60 BYTE),
"AGENCY_NAME_ABBREV" VARCHAR2(60 BYTE),
"AGENCY_TYPE" VARCHAR2(60 BYTE),
CONSTRAINT "PK2" PRIMARY KEY ("SOURCE_AGENCY_KEY") USING
INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS
2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL
DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "USERS" ENABLE
)
SEGMENT CREATION IMMEDIATE PCTFREE 10 PCTUSED 40 INITRANS 1
MAXTRANS 255 NOCOMPRESS LOGGING STORAGE
(
A.2.6 WATERSHED_CLIMATE_FACT
CREATE TABLE "CHICAGORW"."WATERSHED_CLIMATE_FACT"
(
"DATE_KEY" NUMBER(30,0) NOT NULL ENABLE,
"MEASUREMENT_DETAILS_KEY" NUMBER(30,0) NOT NULL ENABLE,
"LOCATION_KEY" NUMBER(30,0) NOT NULL ENABLE,
"SOURCE_AGENCY_KEY" CHAR(10 BYTE) NOT NULL ENABLE,
"SYS_MODIFICATIO_DATE" DATE DEFAULT sysdate NOT NULL ENABLE,
"READING_VALUE" NUMBER(30,5),
CONSTRAINT "PK9" PRIMARY KEY ("DATE_KEY",
"MEASUREMENT_DETAILS_KEY", "LOCATION_KEY",
"SOURCE_AGENCY_KEY") USING INDEX PCTFREE 10 INITRANS 2
MAXTRANS 255 COMPUTE STATISTICS NOCOMPRESS LOGGING
TABLESPACE "USERS" ENABLE
)
SEGMENT CREATION DEFERRED PCTFREE 10 PCTUSED 40 INITRANS 1
MAXTRANS 255 NOCOMPRESS LOGGING TABLESPACE "USERS" ;
(
INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT
FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT
)
TABLESPACE "USERS" ;
A.2.9 WATERSHED_WATER_QUANTITY_FACT
CREATE TABLE "CHICAGORW"."WATERSHED_WATER_QUANTITY_FACT"
(
"DATE_KEY" NUMBER(30,0) NOT NULL ENABLE,
"MEASUREMENT_DETAILS_KEY" NUMBER(30,0) NOT NULL ENABLE,
"LOCATION_KEY" NUMBER(30,0) NOT NULL ENABLE,
"SOURCE_AGENCY_KEY" NUMBER(30,0) NOT NULL ENABLE,
"SYS_MODIFICATIO_DATE" DATE DEFAULT sysdate NOT NULL ENABLE,
"READING_VALUE" NUMBER(30,5),
CONSTRAINT "PK8" PRIMARY KEY ("DATE_KEY",
"MEASUREMENT_DETAILS_KEY", "LOCATION_KEY",
"SOURCE_AGENCY_KEY") USING INDEX PCTFREE 10 INITRANS 2
MAXTRANS 255 COMPUTE STATISTICS STORAGE(INITIAL 65536 NEXT
A.2.10 MWRD_READINGS_STAGE
CREATE TABLE "CHICAGORW"."MWRD_READINGS_STAGE"
(
"READING_DATE" DATE,
"LOCATION_ID" VARCHAR2(20 BYTE),
"MEASURMENT" VARCHAR2(20 BYTE),
"UNIT" VARCHAR2(20 BYTE),
"VALUE" VARCHAR2(20 BYTE),
"INSERT_DATE" DATE
)
SEGMENT CREATION IMMEDIATE PCTFREE 10 PCTUSED 40 INITRANS 1
MAXTRANS 255 NOCOMPRESS LOGGING STORAGE
(
INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT
FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT
)
TABLESPACE "USERS" ;
A.2.11 NWS_AIR_TEMP_STAGE
CREATE TABLE "CHICAGORW"."NWS_AIR_TEMP_STAGE"
(
"READING_DATE" DATE,
"AVG_AIR_TEMP" NUMBER(10,2),
"MAX_AIR_TEMP" NUMBER(10,2),
"MIN_AIR_TEMP" NUMBER(10,2)
)
SEGMENT CREATION IMMEDIATE PCTFREE 10 PCTUSED 40 INITRANS 1
MAXTRANS 255 NOCOMPRESS LOGGING STORAGE
(
INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT
FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT
)
TABLESPACE "USERS" ;
A.2.12 NWS_DAILY_PREC_STAGE
CREATE TABLE "CHICAGORW"."NWS_DAILY_PREC_STAGE"
(
"READING_DATE" DATE,
"DAILY_PERC" NUMBER(10,3)
)
SEGMENT CREATION IMMEDIATE PCTFREE 10 PCTUSED 40 INITRANS 1
MAXTRANS 255 NOCOMPRESS LOGGING STORAGE
(
INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT
FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT
)
TABLESPACE "USERS" ;
A.2.13 USGS_READINGS_STAGE
CREATE TABLE "CHICAGORW"."USGS_READINGS_STAGE"
(
"READING_DATE" DATE,
"GAGEHEIGHT" NUMBER(15,3),
"DISCHARGE" NUMBER(15,3),
"LOCATION_ID" VARCHAR2(20 BYTE),
"INSERT_DATE" DATE DEFAULT sysdate
)
SEGMENT CREATION IMMEDIATE PCTFREE 10 PCTUSED 40 INITRANS 1
MAXTRANS 255 NOCOMPRESS LOGGING STORAGE
(
INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT
FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT
)
TABLESPACE "USERS" ;
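The staging tables above hold source readings in a wide layout, one column per quantity. The fact tables instead store one row per measurement. The sketch below shows a minimal unpivot load from a USGS-style staging table into a fact-like table using SQLite. It is a simplification under stated assumptions: the appendix fact tables use surrogate keys (DATE_KEY, LOCATION_KEY, MEASUREMENT_DETAILS_KEY), whereas this sketch keeps natural keys. The table quantity_fact_demo, its columns, and the sample rows are hypothetical. This is not the ETL used to build the WDW.

```python
import sqlite3

# Hypothetical, simplified tables; natural keys replace the surrogate keys
# used by the appendix fact tables.
DEMO_SCHEMA = """
CREATE TABLE usgs_readings_stage (
    reading_date TEXT, gageheight REAL, discharge REAL, location_id TEXT);
CREATE TABLE quantity_fact_demo (
    reading_date TEXT, location_id TEXT, measurement TEXT, reading_value REAL,
    PRIMARY KEY (reading_date, location_id, measurement));
"""

# Each wide staging row becomes one fact row per non-null measurement.
UNPIVOT_LOAD = """
INSERT INTO quantity_fact_demo (reading_date, location_id, measurement, reading_value)
SELECT reading_date, location_id, 'GAGEHEIGHT', gageheight
  FROM usgs_readings_stage WHERE gageheight IS NOT NULL
UNION ALL
SELECT reading_date, location_id, 'DISCHARGE', discharge
  FROM usgs_readings_stage WHERE discharge IS NOT NULL
"""

def load_stage(con):
    """Run the unpivot load in one transaction and return the fact row count."""
    with con:  # commits on success, rolls back on error
        con.execute(UNPIVOT_LOAD)
    return con.execute("SELECT COUNT(*) FROM quantity_fact_demo").fetchone()[0]

con = sqlite3.connect(":memory:")
con.executescript(DEMO_SCHEMA)
con.executemany("INSERT INTO usgs_readings_stage VALUES (?, ?, ?, ?)",
                [("2005-01-01", 2.5, 100.0, "LOC1"),   # made-up values
                 ("2005-01-02", None, 120.0, "LOC1")])
print(load_stage(con))  # prints 3
```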
LM num: 1
NITRATE =
0.0363 * DO + 0.0057 * TEMP - 0.0068 * BOD + 0.0003 * COD - 0.0192
* PH - 0.0001 * VSS - 0.0002 * INORG_SS - 0.001 * MINAIRTEMP - 0.0017 *
AVGAIRTEMP + 0.1426 * DAILYPERC - 0.0001 * FLOW + 0 * TOTJOOl +
0.7193
LM num: 2
NITRATE = 0.0438 * DO + 0.0057 * TEMP - 0.0068 * BOD + 0.0003 * COD -
0.0192 * PH - 0.0001 * VSS - 0.0002 * INORG_SS - 0.001 * MIN AIR TEMP - 0.0017
* AVG AIR TEMP + 0.1426 * DAILY PERC - 0.0001 * FLOW + 0 * TOTJOOl +
1.282
LM num: 3
NITRATE = 0.0094 * DO + 0.004 * TEMP - 0.0068 * BOD + 0.0003 * COD -
0.0192 * PH - 0.0001 * VSS - 0.0002 * INORG_SS - 0.001 * MIN AIR TEMP - 0.0017
* AVG AIR TEMP + 0.0634 * DAILY PERC + 0.0013 * FLOW + 0 * TOT lOOl +
0.5277
LM num: 4
NITRATE = 0.0094 * DO + 0.004 * TEMP - 0.0068 * BOD + 0.0003 * COD -
0.0192 * PH - 0.0001 * VSS - 0.0002 * INORG_SS - 0.001 * MIN AIR TEMP - 0.0017
* AVG AIR TEMP + 0.0634 * DAILY PERC + 0.0016 * FLOW + 0 * TOTJOOl +
0.6461
LM num: 5
NITRATE = 0.0094 * DO - 0.0017 * TEMP - 0.0068 * BOD + 0.0003 * COD +
0.0851 * PH - 0.0002 * VSS - 0.0002 * INORG_SS - 0.001 * MIN AIR TEMP - 0.0017
* AVG AIR TEMP + 0.0634 * DAILY PERC + 0.0001 * FLOW + 0 * TOTJOOl +
0.2946
LM num: 6
NITRATE = 0.0094 * DO - 0.0003 * TEMP - 0.0068 * BOD + 0.0003 * COD +
0.0446 * PH - 0.0002 * VSS - 0.0002 * INORG_SS - 0.001 * MIN AIR TEMP - 0.0017
* AVG AIR TEMP + 0.0634 * DAILY PERC + 0.0001 * FLOW + 0 * TOTJOOl +
0.7249
LM num: 7
NITRATE = 0.0094 * DO - 0.0003 * TEMP - 0.0068 * BOD + 0.0003 * COD +
0.0446 * PH - 0.0002* VSS- 0.0002 * INORG_SS - 0.001 * MIN AIR TEMP - 0.0017
* AVG AIR TEMP + 0.0634 * DAILY PERC + 0.0001 * FLOW + 0 * TOTJOOl +
0.6722
LM num: 8
NITRATE = 0.0094 * DO - 0.0011 * TEMP - 0.0068 * BOD + 0.0003 * COD +
0.0547 * PH - 0.0002 * VSS - 0.0002 * INORG_SS - 0.001 * MIN AIR TEMP - 0.0017
* AVG_AIR TEMP + 0.0634 * DAILY PERC + 0.0001 * FLOW + 0 * TOTJOOl +
0.5162
LM num: 9
NITRATE = 0.0094 * DO + 0.0009 * TEMP - 0.0068 * BOD + 0.0003 * COD -
0.0617 * PH - 0.0002 * VSS - 0.0002 * INORG_SS - 0.001 * MIN AIR TEMP - 0.0017
215
LM num: 16
NITRATE = 0.0766 * DO + 0.008 * TEMP - 0.0305 * BOD + 0.0002 * COD -
0.1977 * PH - 0.0007 * VSS - 0.001 * INORG_SS - 0.0034 * MIN AIR TEMP - 0.0216
* AVG AIR TEMP + 0.0143 * DAILY PERC - 0.0018 * FLOW + 0 * TOTJOOl +
4.0342
LM num: 17
NITRATE = 0.0535 * DO + 0.008 * TEMP - 0.0305 * BOD + 0.0002 * COD -
0.1977 * PH - 0.0007 * VSS - 0.001 * INORG_SS - 0.0034 * MIN AIR TEMP - 0.0216
* AVG AIR TEMP + 0.0143 * DAILY PERC - 0.0018 * FLOW + 0 * TOTJOOl +
4.7406
LM num: 18
NITRATE = 0.0074 * DO + 0.008 * TEMP - 0.0305 * BOD + 0.0002 * COD -
0.2774 * PH - 0.0007 * VSS - 0.001 * INORG SS - 0.0034 * MIN AIR TEMP - 0.0115
216
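Each "LM num" above is the linear model attached to one leaf of the M5 model tree. The tree's split conditions decide which leaf applies to a given input, and they are not reproduced in this excerpt. The sketch below therefore only evaluates a single leaf equation, LM num: 1, with coefficients copied from the listing. The TOTJOOl term (attribute name garbled in the source) has a zero coefficient and is omitted. The dictionary keys are simplified attribute names and may differ from the names in the original Weka data set.

```python
# Coefficients of "LM num: 1" from the listing above; TOTJOOl (coefficient 0) omitted.
LM1_COEFFICIENTS = {
    "DO": 0.0363, "TEMP": 0.0057, "BOD": -0.0068, "COD": 0.0003,
    "PH": -0.0192, "VSS": -0.0001, "INORG_SS": -0.0002,
    "MIN_AIR_TEMP": -0.001, "AVG_AIR_TEMP": -0.0017,
    "DAILY_PERC": 0.1426, "FLOW": -0.0001,
}
LM1_INTERCEPT = 0.7193

def evaluate_leaf_model(coefficients, intercept, attributes):
    """Evaluate one model-tree leaf: intercept + sum(coefficient * value).
    Raises KeyError if an attribute used by the leaf is missing."""
    return intercept + sum(c * attributes[name] for name, c in coefficients.items())

# With every attribute at zero the prediction reduces to the intercept.
baseline = evaluate_leaf_model(LM1_COEFFICIENTS, LM1_INTERCEPT,
                               dict.fromkeys(LM1_COEFFICIENTS, 0.0))
```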
Scheme:weka.classifiers.functions.GaussianProcesses -L 1.0 -N 0 -K
"weka.classifiers.functions.supportVector.RBFKernel -C 250007 -G 1.0"
Relation: Chi_NB_data_mining_total_area_weka-
weka.filters.unsupervised.attribute.Remove-R4-5-
weka.filters.unsupervised.attribute.ReplaceMissingValues
Instances: 905
Attributes: 154
[list of attributes omitted]
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
Gaussian Processes
Kernel used:
RBF kernel: K(x,y) = e^-(1.0* <x-y,x-y>^2)
Average Target Value : 2.685958751393536
Inverted Covariance Matrix:
Lowest Value = -0.21889501888682303
Highest Value = 0.9798981897805298
Inverted Covariance Matrix * Target-value Vector:
Lowest Value = -5.116699435695602
Highest Value = 8.93518362362874
Histograms:
[histogram plots not recoverable]
=== Summary ===
Correctly Classified Instances 754 83.3149 %
Incorrectly Classified Instances 151 16.6851 %
Kappa statistic 0.6061
Mean absolute error 0.16
Root mean squared error 0.2863
Relative absolute error 53.3972 %
Root relative squared error 74.0196 %
Total Number of Instances 905
=== Confusion Matrix ===
   a   b   c   <-- classified as
 584  53   0 |   a = '(-inf-3.993333]'
  50 154   0 |   b = '(3.993333-7.986667]'
  41  23   0 |   c = '(7.986667-inf)'
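The accuracy and Kappa statistic in these Weka summaries are both functions of the confusion matrix. In Weka's confusion matrix, rows are actual classes and columns are predicted classes. Kappa compares the observed agreement p_o with the agreement p_e expected by chance from the row and column totals: kappa = (p_o - p_e) / (1 - p_e). The sketch below implements that standard definition. It uses a small hypothetical matrix, not one of the matrices reported here.

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix with
    matrix[i][j] = instances of actual class i predicted as class j."""
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement: sum over classes of P(actual = k) * P(predicted = k).
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return observed, (observed - expected) / (1.0 - expected)

# Hypothetical two-class matrix: 35 of 50 instances on the diagonal.
accuracy, kappa = accuracy_and_kappa([[20, 5], [10, 15]])
```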
Number of Leaves : 53
Size of the tree : 105
Time taken to build model: 0.15 seconds
A.4.6 NaiveBayes
SYNOPSIS
Class for a Naive Bayes classifier using estimator classes. Numeric estimator
precision values are chosen based on analysis of the training data. For this reason, the
classifier is not an UpdateableClassifier (which in typical usage are initialized with zero
training instances) -- if you need the UpdateableClassifier functionality, use the
NaiveBayesUpdateable classifier. The NaiveBayesUpdateable classifier will use a
default precision of 0.1 for numeric attributes when buildClassifier is called with zero
training instances.
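As background for the NaiveBayes run below, the following is a minimal sketch of a Gaussian naive Bayes classifier for numeric attributes. Each class is scored by its log prior plus the sum of per-attribute log normal likelihoods, treating attributes as independent given the class. This is a simplified swapped-in illustration. It does not reproduce Weka's estimator classes or the training-data-based precision rounding described in the synopsis. The toy data are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, min_var=1e-9):
    """Per class: prior probability, per-attribute mean and variance."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # Variance floor keeps the likelihood finite for constant attributes.
        variances = [max(sum((v - m) ** 2 for v in col) / len(col), min_var)
                     for col, m in zip(columns, means)]
        model[label] = (len(rows) / len(y), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Class with the highest log prior + sum of log Gaussian likelihoods."""
    def log_posterior(params):
        prior, means, variances = params
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, means, variances))
    return max(model, key=lambda label: log_posterior(model[label]))

# Hypothetical one-attribute data with two nitrate classes.
nb = fit_gaussian_nb([[1.0], [1.2], [0.8], [5.0], [5.2], [4.8]],
                     ["low", "low", "low", "high", "high", "high"])
```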
=== Run information ===
Scheme:weka.classifiers.bayes.NaiveBayes
Relation: Chi_NB_data_mining_total_area_weka-
weka.filters.unsupervised.attribute.Remove-R4-5-
weka.filters.unsupervised.attribute.ReplaceMissingValues-
weka.filters.unsupervised.attribute.Discretize-B3-M-1.0-R3
Instances: 905
Attributes: 154
[list of attributes omitted]
Test mode: 10-fold cross-validation
=== Summary ===
Correctly Classified Instances 731 80.7735 %
Incorrectly Classified Instances 174 19.2265 %
Kappa statistic 0.5445
Mean absolute error 0.1278
Root mean squared error 0.3569
Relative absolute error 42.6404 %
Root relative squared error 92.2884 %
Total Number of Instances 905
APPENDIX B
BASINS/HSPF
[Figure B.1 plot: effective impervious area (EIA, %) versus mapped impervious area (%), showing Sutherland's curves and the USGS (Laenen, 1983) relationship; axis values and curve labels not recoverable]
Figure B.1. Plot of Sutherland equations and the USGS (Laenen, 1983) equation illustrating
relationships between TIA and EIA for a range of watersheds (Sutherland,
2000).
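The TIA-EIA relationships in Figure B.1 are power-law curves. The sketch below implements only Sutherland's "average" curve, EIA = 0.1 * TIA^1.5 (percent, for TIA of at least 1%), as it is commonly cited from Sutherland (2000). The coefficient and exponent should be checked against Figure B.1. The other Sutherland curves and the USGS (Laenen, 1983) equation are not reproduced because their forms cannot be recovered from the figure here.

```python
def sutherland_average_eia(tia_percent):
    """Effective impervious area (%) from total impervious area (%) using
    Sutherland's 'average' curve, EIA = 0.1 * TIA**1.5, stated for TIA >= 1%.
    Coefficients as commonly cited; verify against Figure B.1."""
    if not 1.0 <= tia_percent <= 100.0:
        raise ValueError("TIA must be between 1 and 100 percent")
    return 0.1 * tia_percent ** 1.5

# At 25% TIA the average curve gives 12.5% EIA; at 100% the curve reaches EIA = TIA.
eia_25 = sutherland_average_eia(25.0)
```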
Land cover class                                 Imperviousness (%)   Range (%)     Source
Single-family residential, < 0.25-acre lots      30                   [illegible]   Alley and Veenhuis (1983)
Single-family residential, 0.25-0.5-acre lots    26                   22-31         Alley and Veenhuis (1983)
Single-family residential, 0.5-1.0-acre lots     15                   13-16         Alley and Veenhuis (1983)
Residential (includes multi-family)              30                   22-44         Sullivan et al. (1978)
Multiple-family residential                      [illegible]          [illegible]   Alley and Veenhuis (1983)
Commercial                                       [illegible]          [illegible]   Alley and Veenhuis (1983)
Commercial                                       81                   [illegible]   Sullivan et al. (1978)
Industrial                                       [illegible]          [illegible]   Alley and Veenhuis (1983)
Industrial                                       40                   11-57         Sullivan et al. (1978)
Open                                             5                    1-14          Sullivan et al. (1978)
Figure B.2. Percentage Imperviousness for Various Land Cover Classes as Calculated
Directly from Aerial Photo and Map Analysis (Brabec et al., 2010)
[Figure B.3 table: percentage impervious area by land use category as reported by several sources, with dwelling-unit densities (u/ac) where given. Categories: agricultural land/open space, public and quasi-public, parks, golf courses, low-density, medium-density, "suburban" density, and high-density single-family residential, mobile homes, multifamily, commercial, industrial, highways, and construction sites. Cell values not recoverable]
NRTTIt rhe number of land use class*** varies considerably between studio* USDA - U.S. Department of Agriculture
a. Abstracted from Alley and Vconhuis (IQR3). IMch and Fbbert (10%), Taylor (loo^j, Beyerlein 0006)
b I rom KingCounty hkirface Water ManagementDivision <1000), Departmentof Public Wurks. and Fi:i/fterrettCon.suitingCn>up<io<JO).SnoquaimieKidge Draft Mas
ter Drainage Plan
c. Based on direct measuiemmt from aerial photos and field inspection from nineteen basins in the Denver area.
d. Total and effective impervious area percentagescompiled from CountySurface Water Management < 1000), PFf/Barrett Consulting Croup (1^1).Snoqualmie Ridge
Draft Master Drainage Han; Alley and Veenhuis (1083), and for the open land/agricultural land category, estimated based on similar land uses
e. No discussion of methodology for determining impervious figures
f The source for the percentage imperviousness figures is not indicated in the report.
g. Based on general field observations and studies by Carter (1061>, Feltun and Lull 11063), Antoine<10f>4), and Stall et al (WTO). These reference studiesare not New Jer
sey specific.
h. Measured from aerial photographs and a field survey of three sample areas per land use category in each watershed.
i. Measured from topographic maps.
Figure B.3. The Percentage Impervious Area Ascribed to Various Land Use Categories,
Showing the Relationship of Total Impervious Area (TIA) to Effective Impervious Area
(EIA) Used in Various Studies (Brabec et al., 2002)
Table B.1 Simulated annual loads of total nitrogen from different land use segments in the
Upper Chicago River subbasin
(Pervious, impervious, and combined loads in lbs/acre/yr; area in acres; total annual loads in lbs)

Land Use Type                       Perv.  Imperv.  % EIA  Combined     Area Total Annual  % Loads
                                    Loads    Loads            Loads  (acres)  Loads (lbs)
Residential Single Family          1.2216   9.1722    20%    2.8288    61776     174742.6   46.28%
Residential Multi Family           1.2216   9.1722    25%    3.2094     9595      30794.2    8.26%
Urban Mix W/ Parking Lot           1.2222   9.1722    40%    4.4022     5924      26076.8    7.10%
Industrial W/ Parking Lot          1.2134   9.1722    40%     4.397     5403      23754.6    6.48%
Education                          1.2222   9.1722    25%    3.2098     3722      11945.6    3.22%
Interstate/Toll                    1.2222   9.1722    40%    4.4022     2315      10192.8    2.78%
Open Space Cons                    0.8778   9.1722     0%    0.8778    11554        10140    2.42%
Lake/Reservoirs/Lagoon             0.7852   9.1722    50%    4.9788     1470       7317.6    2.02%
Business W/ Parking Lot            1.2222   9.1722    40%    4.4022     1603       7055.8    1.92%
Golf Course                        0.7856   9.1722     0%    0.7856     9455         7428    1.78%
Government                         1.2222   9.1722    40%     4.402     1441       6343.2    1.74%
Manufacturing/Production           1.2222   9.1722    39%     4.316     1098       4740.4    1.28%
Office Camps                       1.2222   9.1722    37%    4.1568     1111       4618.4    1.26%
Utilities/Waste                    1.2222   9.1722    40%    4.4022     1042       4588.8    1.26%
Vacant/Grass                       0.7856   9.1722     0%    0.7856     5489       4312.4    1.04%
Transportation                     1.2222   9.1722    40%    4.4022      841       3701.4    1.02%
Religious                          1.2222   9.1722    25%    3.2096     1104       3544.6    0.92%
Single Office                      1.2222   9.1722    22%    2.9696      973         2889    0.80%
Warehouse/Distribution/Wholesale   1.2222   9.1722    40%    4.4022      626       2756.4    0.76%
Open Space Recreational            0.7914   9.1722     0%    0.7914     3832       3032.6    0.72%
Retail Center                      1.2222   9.1722    25%    3.2098      851       2729.6    0.72%
Urban Mix No Parking Lot           1.2222   9.1722    25%    3.2096      845       2713.6    0.72%
Medical                            1.2222   9.1722    33%    3.8188      709         2706    0.72%
Other Roadway                      1.2222   9.1722    40%    4.4022      532       2339.8    0.64%
Cultural/Entertainment             1.2222   9.1722    25%    3.2096      684       2196.4    0.60%
Crops/Grain/Graze                  1.7074   9.1722     0%    1.7074     1218         2080    0.48%
Construction Residential           1.2222   9.1722    25%    3.2098      507       1626.4    0.42%
Construction Non-Residential       1.2222   9.1722    25%    3.2098      497       1595.4    0.42%
Cemetery                           1.1272   9.1722     0%    1.1272     1396       1574.2    0.36%
Mall                               1.2222   9.1722    40%    4.4022      300       1322.4    0.34%
Rivers/Canals                      0.7852   9.1722    50%    4.9788      248       1236.2    0.34%
Residential Mobile Home            1.2222   9.1722    25%    3.2098      404         1295    0.32%
Hotel/Motel                        1.2222   9.1722    25%    3.2094      217        697.4    0.20%
Wetland                            0.7852   9.1722     0%    0.7852     1122        881.2    0.18%
Institutional/Other                1.2222   9.1722    40%    4.4022       91        402.4    0.10%
Nursery/Greenhouse/Orch             1.708   9.1722     0%     1.708      231        394.6    0.10%
Other vacant                       0.8778   9.1722     0%    0.8778      297        260.4    0.08%
Independent Auto Parking           1.2222   9.1722    39%    4.3452       42        182.2    0.02%
Communication                      1.2222   9.1722    25%    3.2104       51        163.4    0.00%
Open Space Private                 0.7856   9.1722     0%    0.7856      110         86.6    0.00%
Water                              0.7852   9.1722     0%    0.7852       84         65.6    0.00%
Open Space Linear                  0.7856   9.1722     0%    0.7856       63         49.2    0.00%
Open Space Other                   0.7856   9.1722     0%    0.7856       39         30.8    0.00%
Residential Farm                   1.2222   9.1722     7%    1.7772       11         18.8    0.00%
Total / Average                    1.1271   9.1722    23%    130.48   140923     376622.8     100%
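The combined unit loads in Table B.1 are consistent with weighting the pervious and impervious unit loads by the effective impervious fraction f, Combined = Perv × (1 - f) + Imperv × f. For Urban Mix W/ Parking Lot, 1.2222 × 0.60 + 9.1722 × 0.40 = 4.4022 lbs/acre/yr, and multiplying by 5924 acres gives about 26,079 lbs, close to the tabulated 26,076.8 lbs. This relationship is inferred from the tabulated numbers rather than stated in the text, and rows whose % EIA is shown rounded (for example Residential Single Family) do not reproduce exactly. The class and method names in this minimal sketch are hypothetical.

```python
"""Sketch: unit-area and annual loads for one land use segment (relationship inferred from Table B.1)."""
from dataclasses import dataclass


@dataclass
class LandUseSegment:
    name: str
    pervious_load: float    # lbs/acre/yr from the pervious land segment
    impervious_load: float  # lbs/acre/yr from the impervious land segment
    eia_fraction: float     # effective impervious area as a fraction (0-1)
    area_acres: float

    def combined_load(self) -> float:
        """Unit-area load (lbs/acre/yr) weighted by pervious and effective impervious fractions."""
        f = self.eia_fraction
        return self.pervious_load * (1.0 - f) + self.impervious_load * f

    def annual_load(self) -> float:
        """Total annual load (lbs/yr) for the whole segment area."""
        return self.combined_load() * self.area_acres


if __name__ == "__main__":
    # Three rows from Table B.1, entered with the tabulated values.
    rows = [
        LandUseSegment("Urban Mix W/ Parking Lot", 1.2222, 9.1722, 0.40, 5924),
        LandUseSegment("Education", 1.2222, 9.1722, 0.25, 3722),
        LandUseSegment("Lake/Reservoirs/Lagoon", 0.7852, 9.1722, 0.50, 1470),
    ]
    for seg in rows:
        print(f"{seg.name:<26} {seg.combined_load():.4f} lbs/acre/yr  {seg.annual_load():,.1f} lbs/yr")
```

Because the impervious unit load (9.1722 lbs/acre/yr) is the same for every row, differences in combined load between land uses come almost entirely from % EIA and, for 0 % EIA rows, from the pervious unit load.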
Table B.2 Simulated annual loads of total phosphorus from different land use segments in
the Upper Chicago River subbasin
(Pervious, impervious, and combined loads in lbs/acre/yr; area in acres; total annual loads in lbs)

Land Use Type                       Perv.  Imperv.  % EIA  Combined     Area Total Annual  % Loads
                                    Loads    Loads            Loads  (acres)  Loads (lbs)
Table B.3 Land use codes (as used in physical and data-driven models)
BIBLIOGRAPHY
Abedini, M.J., & Nasseri, M. (2004). Spatiotemporal rainfall forecasting via ANNs
coupled with GA. In: Liong, Phoon, Babovic (Eds.), Sixth International
Conference on Hydroinformatics.
Ahmed, A., Ploennigs, J., Menzel, K., & Cahill, B. (2010). Multi-dimensional building
performance data management for continuous commissioning. Advanced
Engineering Informatics, 24, 466-475.
Ahmed, A., Korres, N., Ploennigs, J., Elhadi, H., & Menzel, K. (2011). Mining building
performance data for energy-efficient operation. Advanced Engineering
Informatics, 25(2), 341-354.
Ahmed, I., Azhar, S., & Lukauskis, P. (2004). Development of a decision support system
using data warehousing to assist builders/developers in site selection. Automation
in Construction, 13 (4), 525-542.
Akhavan, S., Abedi-Koupai, J., Mousavi, S.-F., Afyuni, M., Eslamian, S.S., &
Abbaspour, K.C. (2010). Application of SWAT model to investigate nitrate
leaching in Hamadan-Bahar Watershed, Iran. Agriculture, Ecosystems and
Environment, 139 (4), 675-688.
Allan, J. D. (2004). Landscapes and Riverscapes: The influence of land use on stream
ecosystems. Annual Review of Ecology, Evolution, and Systematics, 35, 257-284.
Alley, W.M., & Veenhuis, J.E. (1983). Effective impervious area in urban runoff
modeling. Journal of Hydraulic Engineering, 109(2), 313-319.
Anderson, J.R., Hardy, E.E., Roach, J.T., & Witmer, R.E. (1976). A land use and land
cover classification system for use with remote sensor data. U.S. Geological
Survey Professional Paper 964. Retrieved from
http://landcover.usgs.gov/pdf/anderson.pdf
Arabi, M., Govindaraju, R.S., Hantush, M. M., & Engel, B. A. (2006). Role of watershed
subdivision on modeling the effectiveness of best management practices with
SWAT. Journal of the American Water Resources Association, 42(2), 513-528.
Arnold, J. G., Srinivasan, R., Muttiah, R. S., & Williams, J. R. (1998). Large area
hydrologic modeling and assessment - Part 1: Model development. Journal of the
American Water Resources Association, 34(1), 73-89.
Arnold, J. G., Potter, K.N., King, K.W., & Allen, P.M. (2005). Estimation of soil
cracking and the effect on surface runoff in a Texas Blackland Prairie watershed.
Hydrological Processes, 19(3), 589-603.
Ahmad, H. M. N. (2010). Modeling hydrology and nitrogen export for the Thomas
Brook watershed with SWAT (Master of applied science thesis). ISBN: 978-0-494-68078-0.
Alpaydin, E. (2010). Introduction to machine learning, 2nd ed. The MIT Press.
Asefa, T., Kemblowski, M., McKee, M., & Khalil, A. (2006). Multi-time scale stream flow
predictions: the support vector machines approach. Journal of Hydrology, 318, 7-16.
Ahearn, D.S., Sheibley, R.W., Dahlgren, R.A., Anderson, M., Johnson, J., & Tate, K.W.
(2005). Land use and land cover influence on water quality in the last free-flowing
river draining the western Sierra Nevada, California. Journal of Hydrology, 313,
234-247.
Baldys, S., Raines, T. H., Mansfield, B. L., & Sandlin, J. T. (1998). Urban stormwater
quality, event-mean concentrations, and estimates of stormwater pollutant loads.
U.S. Geological Survey Water-Resources Investigations Report 98-4158.
Barling, R.D., & Moore, I.D. (1994). Role of buffer strips in management of waterway
pollution: A review. Environmental Management, 18(4), 543-558.
Barnes, K. B., Morgan, J. M., & Roberge, M. C. (2002). Impervious surfaces and the
quality of natural and built environments. Department of Geography and
Environmental Planning, Towson University. Retrieved from
http://pages.towson.edu/morgan/files/Impervious.pdf
Bartosova, A., Singh, J., Slowikowski, J., Machesky, M., & McConkey, S. (2005).
Overview of recommended phase III water quality monitoring: Fox River
investigation. Illinois State Water Survey, ISWS CR 2005-13.
Bartosova, A., Singh, J., Rahim, M., & McConkey, S. (2007). Fox River Watershed
investigation: Stratton Dam to the Illinois River, phase II: hydrologic and water
quality simulation models, part 3: validation of hydrologic model parameters,
Brewster Creek, Ferson Creek, Flint Creek, Mill Creek, and Tyler Creek
Watersheds. Illinois State Water Survey, ISWS CR 2007-07.
Basnyat, P., Teeter, L.D., Flynn, K.M., & Lockaby, B.G. (1999). Relationships between
landscape characteristics and nonpoint source pollution inputs to coastal estuaries.
Environmental Management, 23(4), 539-549.
Beach, D. (2002). Coastal sprawl: the effects of urban design on aquatic ecosystems in
the United States. Pew Oceans Commission, Arlington. Retrieved from
http://www.pewtrusts.org/uploadedFiles/wwwpewtrustsorg/Reports/Protecting_ocean_life/env_pew_oceans_sprawl.pdf
Beran, B., & Piasecki, M. (2009). Engineering new paths to water data. Computers &
Geosciences, 35(4), 753-760.
Bergman, M. J., Green, W., & Donnangelo, L. J. (2002). Calibration of storm loads in the
South Prong watershed, Florida, using BASINS/HSPF. Journal of the American
Water Resources Association, 38, 1423-1436.
Bhaduri, B., Harbor, J., Engel, B. A., & Grove, M. (2000). Assessing watershed-scale,
long-term hydrologic impacts of land-use change using a GIS-NPS model.
Environmental Management, 26(6), 643-658.
Bhaduri, B., Minner, M., Tatalovich, S., & Harbor, J. (2001). Long-term hydrologic
impact of land use change: a tale of two models. Journal of Water Resources
Planning and Management, 127(1), 13-19.
Bian, B., Juan Cheng, X., & Li, L. (2011). Investigation of urban water quality using
simulated rainfall in a medium size city of China. Environmental Monitoring and
Assessment, 173(1-4), 217-229.
Bicknell, R., Imhoff, J., Kittle, L. Jr, Donigian, S. Jr, & Johanson, C. (1996).
Hydrological Simulation Program-Fortran User's Manual. U.S. Environmental
Protection Agency. Retrieved from
http://eng.odu.edu/cee/resources/model/mbin/hspf/dos/hspf vl 1 entiretv.pdf
Bicknell, B. R., Imhoff, J. C., Kittle, Jr, J. L., Jobes, T. H., & Donigian, Jr., A. S. (2005).
HSPF Version 12.2 User's Manual. U.S. Environmental Protection Agency.
Retrieved from
http://water.epa.gov/scitech/datait/models/basins/bsnsdocs.cfm#hspf
Bonifati, A., Cattaneo, E., Ceri, S., Fuggetta, A., & Paraboschi, S. (2001). Designing data
marts for data warehouses. ACM Transactions on Software
Engineering and Methodology, 10(4), 452-483.
Borah, D.K., & Bera, M. (2003). Watershed scale hydrology and nonpoint source pollution
models: Review of mathematical bases. American Society of Agricultural
Borah, D. K., Yagow, G., Saleh, A., Barnes, P. L., Rosenthal, W., Krug, E. C., & Hauck,
L. M. (2006). Sediment and nutrient modeling for TMDL development and
implementation. American Society of Agricultural and Biological Engineers,
49(4), 967-986.
Bosch, D.D., Sheridan, J.M., Lowrance, R.R., Hubbard, R.K., Strickland, T.C.,
Feyereisen, G.W., & Sullivan, D.G. (2007). Little River experimental watershed
database. Water Resources Research, 43 (W09470), doi:10.1029/2006WR005844.
Bouraoui, F., Vachaud, G., & Chen, T. (1998). Prediction of the effect of climatic
changes and land use management on water resources. Physics and Chemistry of
the Earth, 23(4), 379-384.
Boynton, W. R., Garber, J.H., Summers, R., & Kemp, W. M. (1995). Inputs,
transformations, and transport of nitrogen and phosphorus in Chesapeake Bay and
selected tributaries. Estuaries, 18(1B), 285-314.
Brabec, E., Schulte, S., & Richards, P.L. (2002). Impervious surfaces and water quality: A
review of current literature and its implications for watershed planning.
Journal of Planning Literature, 16, 499.
Brett, M.T., Arhonditsis, G.B., Mueller, S.E., Hartley, D.M., Frodge, J.D., & Funke, D.E.
(2005). Non point source impacts on stream nutrient concentrations along a forest
to urban gradient. Environmental Management, 35(3), 330-342.
Brezonik, P. L., & Stadelmann, T. H. (2002). Analysis and predictive models of storm
water runoff volumes, loads, and pollution concentration from watersheds in the
Twin Cities metropolitan area, Minnesota, USA. Water Research, 36, 1743-1757.
Brun, S.E., & Band, L.E. (2000). Simulating runoff behavior in an urbanizing watershed.
Computers, Environment and Urban Systems, 24(1), 5-22.
Burmann, A., & Marx Gomez, J. (2007). Data Warehousing with Environmental Data.
Information Technologies in Environmental Engineering ITEE 3rd international
ICSC symposium, 153-160.
Cappiella, K., & Brown, K. (2001). Derivations of Impervious Cover for Suburban Land
Uses in the Chesapeake Bay Watershed. Prepared for the U.S. EPA Chesapeake
Bay Program. Center for Watershed Protection, Ellicott City, MD, 51.
Carpenter, S., Caraco, N., Correll, D., Howarth, R., Sharpley, A., & Smith, V. (1998).
Nonpoint pollution of surface waters with phosphorus and nitrogen. Ecological
Applications, 8(3), 559-568.
Center for Watershed Protection (2003). Impacts of impervious cover on aquatic systems.
Center for Watershed Protection, Ellicott City, MD, 141 p.
Chang, H. (2004). Water quality impacts of climate and land use changes in southeastern
Pennsylvania. The Professional Geographer, 56(2), 240-257.
Chapra, S.C. (1997). Surface Water Quality Modeling. New York: McGraw-Hill Book
Company.
Chau, K.W., Cao, Y., Anson, M., & Zhang, J. (2002). Application of Data Warehouse
and Decision Support System in Construction Management. Automation in
Construction, 11(2), 213-224.
Chen, R., Chen, C., & Cheng, C. (2003). A Web-based ERP data mining system for
decision making. International Journal of Computer Applications in
Technology, 17(3), 156-158.
Chen, S.T., & Yu, P.S. (2007). Real-time probabilistic forecasting of flood stages.
Journal of Hydrology, 340, 63-77.
Chiang, Y.M., Hsu, K.L., Chang, F.J., Hong, Y., & Sorooshian, S. (2007). Merging
multiple precipitation sources for flash flood forecasting. Journal of Hydrology,
340, 183-196.
Choi, W., & Deal, B. M. (2008). Assessing hydrological impact of potential land use
change through hydrological and land use change modeling for the Kishwaukee
River Basin (USA). Journal of Environmental Management, 88, 1119-1130.
Chow, V.T., Maidment, D., & Mays, L. W. (1988). Applied Hydrology. McGraw Hill.
Clesceri, N. L., Curran, S. J., & Sedlak, R. I. (1986). Nutrient loads to Wisconsin lakes:
Part I. Nitrogen and phosphorus export coefficients. Water Resources Bulletin,
22(6), 983-989.
Conway, T.M. & Lathrop, L.G. (2005). Alternative land use regulations and
environmental impacts: assessing future land use in an urbanizing watershed.
Landscape and Urban Planning, 70(1), 1-15.
Cotter, A. S., Chaubey I., Costello T. A., Soerens T.S., & Nelson, M. A. (2003). Water
quality model output uncertainty as affected by spatial resolution of input data.
Journal of the American Water Resources Association, 39(4), 977-986.
Dawson, C.W., & Wilby, R. (1998). An artificial neural network approach to rainfall
runoff modelling. Hydrological Sciences Journal, 43(1), 47-66.
Demissie, M., Singh, J., Knapp, H. V., Saco, P., & Lian, Y. (2007). Hydrologic model
development for the Illinois River Basin using BASINS 3.0. Illinois State Water
Survey, ISWS CR 2007-03.
Doll, B.A., Wise-Frederick, D. E., Buckner, C. M., Wilkerson, S. D., Harman, W. A.,
Smith, R. E., & Spooner, J. (2002). Hydraulic geometry relationships for urban
streams throughout the piedmont of North Carolina. Journal of the American
Water Resources Association (JAWRA), 38(3), 641-651.
Donigian, A.S., Imhoff, J.C., & Bicknell, B.R. (1983). Predicting water quality resulting
from agricultural nonpoint source pollution via simulation - HSPF. In
Agricultural Management and Water Quality. Ames, Iowa: Iowa State University
Press, 200-249.
Donigian, A.S., Bicknell, B.R., & Imhoff, J.C. (1995). Hydrological Simulation
Program-Fortran (HSPF). In: V. P. Singh (Ed.), Computer Models of
Watershed Hydrology, Chapter 12. Water Resources Publications, Littleton, CO.
395-442.
Driver, N. E., Mustard, M. H., Rhinesmith, R. B., & Middelburg, R. F. (1985). U.S.
Geological Survey urban-stormwater data base for 22 metropolitan areas
throughout the United States. United States Geological Survey, Open-File Report
85-337.
Environmental Protection Agency (2012). BASINS 4 lectures, data sets, and exercises.
Retrieved from http://water.epa.gov/scitech/datait/models/basins/training.cfm
Finkenbine, J.K., Atwater, J.W., & Mavinic, D. S. (2000). Stream health after
urbanization. Journal of the American Water Resources Association (JAWRA),
36(5), 1149-1160.
Fohrer, N., Haverkamp, S., Eckhardt, K., & Frede, H.G. (2001). Hydrologic response to
land use changes on the catchment scale. Physics and Chemistry of the Earth (B),
26(7-8), 577-582.
Gburek, W. J., & Folmar, G. J. (1999). Flow and chemical contributions to streamflow in
an upland watershed: a baseflow survey. Journal of Hydrology, 217, 1-18.
Gosain, A., & Mann, S. (2010). Object Oriented Multidimensional Model for a Data
Warehouse with Operators. International Journal of Database Theory and
Application, 3(4), 35-40.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. (2009). The
WEKA data mining software: An update. SIGKDD Explorations, 11(1), 10-18.
Han, J., & Kamber, M. (2006). Data Mining: Concepts and Techniques, 2nd ed. Morgan
Kaufmann Publishers.
Hanratty, M.P., & Stefan, H.G. (1998). Simulating climate change effects in a Minnesota
agricultural watershed. Journal of Environmental Quality, 27(6), 1524-1532.
Harned, D. A., Atkins, J. B., & Harvill, J. S. (2004). Nutrient mass balance and trends,
Mobile River Basin, Alabama, Georgia, and Mississippi. Journal of the American
Water Resources Association (JAWRA), 40(3), 765-793.
Horsburgh, J.S., Tarboton, D.G., Maidment, D.R., & Zaslavsky, I. (2008). A relational
model for environmental and water resources data. Water Resources Research, 44
(W05406), doi:10.1029/2007WR006392.
Horsburgh, J.S., Tarboton, D.G., Piasecki, M., Maidment, D.R., Zaslavsky, I., Valentine,
Illinois Environmental Protection Agency (2009). Upper North Branch Chicago River
Watershed TMDL Stage 1 Report. Environmental Protection Agency. Retrieved
from http://www.epa.state.il.us/water/tmdl/report/chicago-river/stage-1-report.pdf
Im, S., Brannan, K.M., & Mostaghimi, S. (2003). Simulating hydrologic and water
quality impacts in an urbanizing watershed. Journal of the American Water
Resources Association, 39(6), 1465-1479.
Imrie, C.E., Durucan, S., & Korre, A. (2000). River flow prediction using artificial neural
networks: generalisation beyond the calibration range. Journal of Hydrology, 233,
138-153.
Inmon, B. (2005). Building the Data Warehouse. New York: John Wiley.
Jeon, J., Yoon, C. G., Donigian Jr., A. S., & Jung, W. (2007). Development of the HSPF
Paddy model to estimate watershed pollutant loads in paddy farming regions.
Agricultural Water Management, 90(1-2), 75-86.
Jia, Y., Kinouchi, T., & Yoshitani, J. (2005). Distributed hydrologic modeling in a
partially urbanized agricultural watershed using water and energy transfer process
model. Journal of Hydrologic Engineering, 10(4), 253-264.
Johnson, M.P. (2001). Environmental impacts of urban sprawl: a survey of the literature
and proposed research agenda. Environment and Planning A, 33, 717-735.
Jones, T., Johnston, C., & Kipkie, C. (2003). Using annual hydrographs to determine
effective impervious area. Practical Modeling of Urban Water Systems, 11, 291-306.
Kambayashi, Y., Kumar, V., Mohania, M., & Samtania, S. (2004). Recent Advances and
Research Problems in Data Warehouse. Lecture Notes in Computer Science,
7552, 81-92.
Kimball, R. (2002). The Data Warehouse Toolkit: The Complete Guide to Dimensional
Modeling. Wiley Publishing.
Krause, P., Boyle, D., & Base, F. (2005). Comparison of different efficiency criteria for
Knapp, H.V., Singh, J., & Andrew, K. (2004). Hydrologic Modeling of Climate
Scenarios for Two Illinois Watersheds. Illinois State Water Survey, ISWS CR
2004-07
Laenen, A., (1983). Storm runoff as related to urbanization based on data collected in
Salem and Portland, and generalized for the Willamette Valley, Oregon. U.S.
Geological Survey Water Resources Investigations Report 83-4143. Retrieved
from http://or.water.usgs.gov/pubs_dir/orrpts.html
Lane, P. (2007). Oracle Database Data Warehousing Guide, 11g. Oracle.
LeBlanc, R. T., Brown, R. D., & FitzGibbon, J. E. (1997). Modeling the effects of land
use change on the water temperature in unregulated urban streams. Journal of
Environmental Management, 49, 445-469.
Leon, L.F., Booty, W., Wong, I., McCrimmon, C., Melles, S., Benoy, G., & Vanrobaeys,
J. (2010). Advances in the integration of watershed and lake modeling in the Lake
Winnipeg basin. Modelling for Environment's Sake: Proceedings of the 5th
Biennial Conference of the International Environmental Modelling and Software
Society, 860-867.
Lin, J. (2004). Review of published export coefficient and event mean concentration data.
US Army Corps of Engineers, Wetlands Regulatory Assistance Program ERDC
TN-WRAP-04-3. Retrieved from
http://el.erdc.usace.army.mil/elpubs/pdf/tnwrap04-3.pdf
Lin, G.F., & Wang, C.M. (2007). A nonlinear rainfall-runoff model embedded with an
automated calibration method - Part 1: The model. Journal of Hydrology, 341,
186-195.
Line, D. E., White, N. M., Osmond, D. L., Jennings, G. D., & Mojonnier, C. B. (2002).
Pollutant export from various land uses in the Upper Neuse River Basin. Water
Environment Research, 74(1), 100-108.
Linsley, R.K., Kohler, M.A., & Paulhus, J.L. H. (1988). Hydrology for engineers. New
York, NY: McGraw-Hill.
Loehr, R. C., Ryding, S. O., & Sonzogni, W. C. (1989). Estimating the nutrient load to a
waterbody. The Control of Eutrophication of Lakes and Reservoirs, 1, 115-146.
Di Luzio, M., Srinivasan, R., & Arnold, J. G. (2002). Integration of watershed tools and
SWAT model into BASINS. Journal of the American Water Resources Association, 38(4),
1127-1142.
McGuire, M., Gangopadhyay, A., Komlodi, A., & Swan, C. (2008). A user-centered
design for a spatial data warehouse for data exploration in environmental
research. Ecological Informatics, 3(4-5), 273-285.
Markel, D., & Shamir, U. (2002). Monitoring Lake Kinneret and its watershed: forming the
basis for management of a water supply lake. In: Rubin, H., Nachtnebel, P.,
Fuerst, J., Shamir, U. (Eds.), Water Resources Quality Preserving the Quality of
our Water Resources. Springer-Verlag, pp. 177-190.
Marks, D., Seyfried, M., Flerchinger, G., & Winstral, A. (2007). Research data collection
at the Reynolds Creek Experimental Watershed. Journal of Service Climatology,
1(4), 1-12.
Mattikalli, N. M., & Richards, K. S. (1996). Estimation of surface water quality changes
in response to land use change: application of the export coefficient model using
remote sensing and geographical information system. Journal of Environmental
Management, 48, 263-282.
Melching, C. S., Alp, E., Shrestha, R.L., & Lanyon, R. (2002). Simulation of water
quality during unsteady flow in the Chicago waterway system. Marquette
University. Retrieved from http://www.mu.edu/environment/Dearborn.pdf
Miller, S. N., Semmens, D.J., Goodrich, D.C., Hernandez, M., Miller, R.C.,
Kepner, W.G., & Guertin, D.P. (2007). The automated geospatial watershed
assessment tool. Environmental Modelling and Software, 22(3), 365-377.
Minns, A.W., & Hall, M.J. (1996). Artificial neural network as rainfall-runoff model.
Hydrological Sciences Journal, 41(3), 399-417.
Mohamoud, Y. M., Parmar, R., & Wolfe, K. (2010). Modeling Best Management
Practices (BMPs) with HSPF. ASCE Conf. Proc. Watershed Management
Conference 2010: Innovations in Watershed Management under Land Use and
Climate Change, doi:10.1061/41143(394)81.
Moran, S.M., Emmerich, W.E., Goodrich, D.C., Heilman, P., Holifield Collins, C.D.,
Keefer, T.O., Nearing, M.A., Nichols, M.H., Renard, K.G., Scott, R.L., Smith,
J.R., Stone, J.J., Unkrich, C.L., & Wong, J. (2008). Preface to special section on
Fifty Years of Research and Data Collection: U.S. Department of Agriculture
Walnut Gulch Experimental Watershed. Water Resources Research, 44
(W05S01), doi:10.1029/2007WR006083.
Muttil, N., & Liong, S.Y. (2004). Physically interpretable rainfall- runoff models using
genetic programming. In: Liong, Phoon, Babovic (Eds.), Sixth International
Conference on Hydroinformatics.
Najafi, M. Z. (2003). Watershed modeling of rainfall excess transformation into runoff.
Journal of Hydrology, 270(3-4), 273-281.
National Water Information System. (2012, January). NWIS. Retrieved from USGS
website http://waterdata.usgs.gov/nwis/
National Climatic Data Center. (2012, January). NCDC and NOAA. Retrieved from
NCDC website http://www.ncdc.noaa.gov
Nichols, M.H., & Anson, E. (2008). Southwest Watershed Research Center Data Access
Project. Water Resources Research, 44 (W05S03), doi:10.1029/2006WR005665.
Nu-Fang, F., Zhi-Hua, S., Lu, L., & Cheng, J. (2011). Rainfall, runoff, and suspended
sediment delivery relationships in a small agricultural watershed of the
Three Gorges area, China. Geomorphology, 135(1-2), 158-166.
Ould-Ahmed-Vall, E., Woodlee, J., Yount, C., Doshi, K., & Abraham, S. (2007). Using
model trees for computer architecture performance analysis of software
applications. IEEE International Symposium on Performance Analysis of Systems
and Software (ISPASS), 116-125.
Paul, M.J., & Meyer, J.L. (2001). Streams in the urban landscape. Annual Review of
Ecology and Systematics, 32, 333-365.
Preis, A., & Ostfeld, A. (2008). A coupled model tree-genetic algorithm scheme for flow
and water quality predictions in watersheds. Journal of Hydrology, 349, 364-375.
Qi, H. (2006). Integrated watershed management and land use optimization under
uncertainty (Doctoral thesis). Available from ProQuest database. (UMI Number:
3358529).
Rai, A., Malhotra, P.K., Sharma, S.D., & Chaturvedi, K.K. (2007). Data warehousing for
agricultural research- an integrated approach for decision making. Journal of the
Indian Society of Agricultural Statistics, 61(2), 264-273.
Rai, S.C. & Sharma, E. (1998). Comparative assessment of runoff characteristics under
different land use patterns within a Himalayan watershed. Hydrological Processes,
12, 2235-2248.
Ramireddygari, S. R., Sophocleous, M. A., Koelliker, J. K., Perkins, S. P., & Govindaraju,
R. S. (2000). Development and application of a comprehensive
simulation model to evaluate impacts of watershed structures and irrigation water
use on streamflow and groundwater: the case of Wet Walnut Creek Watershed,
Kansas, USA. Journal of Hydrology, 236(3-4), 223-246.
Rast, W., & Lee, G. F. (1983). Nutrient loading estimates for lakes. Journal of
Environmental Engineering, 109(2), 502-517.
Reckhow, K. H., Beaulac, M. N., & Simpson, J. T. (1980). Modeling phosphorus loading
and lake response under uncertainty: A manual and compilation of export
coefficients. U.S. EPA Report No. EPA-440/5-80-011, Office of Water
Regulations, Criteria and Standards Division. Retrieved from
http://nepis.epa.gov/
Regnier, P., O'Kane, J.P., Steefel, C.I., & Vanderborght, J.P. (2002). Modeling complex
multi-component reactive-transport systems: towards a simulation environment
based on the concept of a Knowledge Base. Applied Mathematical Modelling,
26(9), 913-927.
Ren, W.W., Zhong, Y., Meligrana, J., Anderson, B., Watt, W. E., Chen, J. K., & Leung,
H. L. (2003). Urbanization, land use, and water quality in Shanghai 1947-1996.
Environment International, 29(5), 649-659.
Rob, P., Coronel, C., & Crockett, K. (2008). Database Systems: Design,
Implementation and Management. Cengage Learning EMEA.
Robbins, P., & Birkenholtz, T. (2001). Lawns and toxins: an ecology of the city. Cities:
The International Journal of Urban Policy and Planning, 18(6), 369-380.
Robbins, P., & Birkenholtz, T. (2003). Turfgrass revolution: measuring the expanse of the
American lawn. Land Use Policy, 20, 181-194.
Rooy, P., Anderson, D., & Verstraelen, P. (1993). Integrated water management considers
whole water system. Water Environment and Technology, 5(4), 38-40.
Rose, S., & Peters, N.E. (2001). Effects of urbanization on streamflow in the Atlanta area
(Georgia, USA): a comparative hydrological approach. Hydrological Processes,
15, 1441-1457.
Rujirayanyong, T., & Shi, J.J. (2006). A project-oriented data warehouse for construction.
Automation in Construction, 15, 800-807.
Sahoo, G.B., Ray, C., & De Carlo, E.H. (2006). Use of neural network to predict flash
flood and attendant water qualities of a mountainous stream on Oahu, Hawaii.
Journal of Hydrology, 327, 525-538.
Sapsford, R., & Jupp, V. (2006). Data Collection and Analysis, 2nd ed. SAGE.
Schueler, T.R., & Holland, H. K. (1994). The importance of imperviousness.
Watershed Protection Techniques, 1(3), 100-111.
Schueler, T.R. (1995). Environmental Land Planning Series: Site Planning for Urban
Stream Protection. Prepared by the Metropolitan Washington Council of
Governments and the Center for Watershed Protection, Silver Spring, Maryland.
Retrieved from http://www.mwcog.org/
Sheng, Y., Ying, G., & Sansalone, J. (2008). Differentiation of transport for
particulate and dissolved water chemistry load indices in rainfall-runoff from
urban source area watersheds. Journal of Hydrology, 361(1-2), 144-158.
Shirinian-Orlando, A. A., & Uchrin, C. G. (2007). Modeling the hydrology and water
quality using BASINS/HSPF for the upper Maurice River watershed, New
Jersey. Journal of Environmental Science & Health, Part A: Toxic/Hazardous
Substances & Environmental Engineering, 42(3), 289-303.
based model for catchment scale nitrate dynamics. Journal of Hydrology, 342,
143-156.
Sliva, L., & Williams, D.D. (2001). Buffer zone versus whole catchment approaches to
studying land use impact on river water quality. Water Research, 35, 3462-3472.
Simitsis, A., Vassiliadis, P., & Sellis, T. (2005). Optimizing ETL processes in data
warehouses. Data Engineering: ICDE Proceedings 21st International
Conference, 564-575.
Singh, V.P., & Woolhiser, D.A. (2002). Mathematical modeling of watershed hydrology.
Journal of Hydrologic Engineering, American Society of Civil Engineers, 7(4),
270-292.
Singh, V. P., & Frevert, D.K. (2004). Watershed Modeling. ASCE Conf. Proc.
doi:10.1061/40685.
Singh, J., Knapp, H. V., & Demissie, M. (2004). Hydrologic modeling of the Iroquois
River watershed using HSPF and SWAT. Illinois State Water Survey, ISWS CR
2004-08.
Singh, R. K., Panda, R. K., Satapathy, K. K., & Ngachan, S. V. (2011). Simulation of
runoff and sediment yield from a hilly watershed in the eastern Himalaya, India
using the WEPP model. Journal of Hydrology, 405(3-4), 261-276.
Smith, T.E., Deacon, J.R., & Soule, S.A. (2005). Effects of urbanization on stream quality
at selected sites in the Seacoast region in New Hampshire. U.S. Geological Survey
Scientific Investigations Report, 5103, 18.
Smullen, J. T., Shallcross, A. L., & Cave, K. A. (1999). Updating the U.S. nationwide
urban runoff quality database. Water Science and Technology, 39(12), 9-16.
Solomatine, D.P., & Dulal, K.N. (2003). Model trees as an alternative to neural networks in
rainfall-runoff modeling. Hydrological Sciences Journal, 48(3), 399-411.
Solomatine, D.P., & Xue, Y. (2004). M5 model trees and neural networks: application to
flood forecasting in the upper reach of the Huai River in China. ASCE J.
Hydrologic Engineering, 9(6), 491-501.
Solomatine, D.P., Maskey, M., & Shrestha, D.L. (2007). Instance-based learning
compared to other data-driven methods in hydrologic forecasting. Hydrological
Processes, 21, doi: 10.1002/hyp.6592.
Sutherland, R.C. (2000). Methods for estimating the effective impervious area of urban
watersheds. The Practice of Watershed Protection, 32, 193-195.
Tan, P.N., Steinbach, M., & Kumar, V. (2006). Introduction to Data Mining. Boston:
Addison Wesley.
Tang, Z., Engel, B. A., Pijanowski, B. C., & Lim, K. J. (2005). Forecasting land use
change and its environmental impact at a watershed scale. Journal of
Environmental Management, 76, 35-45.
Teuteberg, F., & Straßenburg, J. (2009). State of the Art and Future Research in
Environmental Management Information Systems: Information Technologies in
Environmental Engineering. Environmental Science and Engineering Part 2,
64-77.
Tjoa, A.M., & Trujillo, J. (2005). Data Warehousing and Knowledge Discovery.
Copenhagen: Springer.
Tokar, A.S., & Markus, M. (2000). Precipitation-runoff modeling using artificial neural
networks and conceptual models. Journal of Hydrologic Engineering, 5(2), 156-
161.
Tong, S. T. Y., & Chen, W. (2002). Modeling the relationship between land use and
surface water quality. Journal of Environmental Management, 66(4), 377-393.
Tong, S. T. Y., & Liu, A.J. (2006). Modelling the hydrologic effects of land-use
and climate changes. Int. J. Risk Assessment and Management, 6(4/5/6).
Tong, S. T. Y., Liu, A.J., & Goodrich, J. A. (2007). Climate change impacts on nutrient
and sediment loads in a Midwestern agricultural watershed. Journal of
Environmental Informatics, 9(1), 18-28.
Tong, S. T. Y., Liu, A. J., & Goodrich, J. A. (2009). Assessing the water quality impacts
of future land-use changes in an urbanising watershed. Civil Engineering and
Environmental Systems, 26(1), 3-18.
Tsihrintzis, V. A., & Hamid, R. (1998). Runoff quality prediction from small urban
catchments using SWMM. Hydrological Processes, 12, 311-329.
Tsegaye, T., Sheppard, D., Islam, K.R., Johnson, A., Tadesse, W., Atalay, A., & Marzen,
United States Department of Agriculture. (2012, January). Major land uses in USA.
Retrieved from Economic Research Service http://www.ers.usda.gov/
United States Environmental Protection Agency (2000). Ambient water quality criteria
recommendations: Information supporting the development of state and tribal
nutrient criteria for rivers and streams in nutrient ecoregion XI. Office of Science
and Technology, Office of Water, EPA 822-B-00-017. Retrieved from
http://water.epa.gov/scitech/swguidance/standards/criteria/nutrients/upload/2007_09_27_criteria_nutrient_ecoregions_lakes_lakes_2.pdf
United States Environmental Protection Agency (2011a). Clean Water Act. USEPA.
Retrieved from http://www.epa.gov/lawsregs/laws/cwa.html
United States Environmental Protection Agency (2011b). Regulations. USEPA. Retrieved
from http://www.epa.gov/lawsregs/
USEPA Storage and Retrieval System. (2012, January). STORET. Retrieved from EPA
website http://www.epa.gov/storet/
U.S. Geological Survey (1995). Water-Quality Assessment of the Upper Illinois River
Basin in Illinois, Indiana, and Wisconsin: Nutrients, Dissolved Oxygen, and
Fecal-indicator Bacteria in Surface Water, April 1987 through August 1990.
Water-Resources Investigations Report 95-4005. Retrieved from
http://pubs.usgs.gov/wri/1995/4005/report.pdf
U.S. Geological Survey (1999). Environmental Setting of the Upper Illinois River Basin
and Implications for Water Quality. Water-Resources Investigations Report
98-4268. Retrieved from
http://il.water.usgs.gov/nawqa/uirb/pubs/reports/WRIR_98-4268.pdf
U.S. Geological Survey (1999). The quality of our nation's waters: Nutrients and
pesticides. National Water-Quality Assessment Program. Retrieved from
http://pubs.usgs.gov/circ/circ1225/pdf/front.pdf
U.S. Geological Survey (2012). Real-time water quality monitoring and regression
analysis to estimate nutrient and bacteria concentrations in Kansas streams. USGS.
Retrieved from http://ks.water.usgs.gov/pubs/reports/vgc.06I0.html#HDR01
Vanclooster, M., Boesten, J., Tiktak, A., Jarvis, N., & Kroes, J. (2004). On the use of
unsaturated flow and transport models in nutrient and pesticide management. In:
Unsaturated-Zone Modeling: Progress, Challenges and Applications (eds R.A.
Feddes, G.H. de Rooij & J.C. van Dam), 331-361.
Walton, R.S., & Hunter, H.M. (2009). Isolating the water quality responses of multiple
land uses from stream monitoring data through model calibration. Journal of
Hydrology, 375(1-2), 29-45.
Wang, X., Sheng, Y., & Huang, G.H. (2004). Land allocation based on integrated GIS
optimization modeling at a watershed level. Landscape and Urban Planning,
66(2), 61-74.
Wang, S.H., Huggins, D.G., Frees, L., Volkman, C.G., Lim, N.C., Baker, D.S., Smith, V.,
& deNoyelles, Jr., F. (2005). An integrated modeling approach to total
watershed management: water quality and watershed management of Cheney
Reservoir. Water, Air, and Soil Pollution, 164, 1-19.
Wang, Y., & Witten, I. (1997). Inducing model trees for continuous classes. Proceedings
of the 9th European Conf. on Machine Learning, 128-137.
Weng, Q. (2001). Modeling urban growth effects on surface runoff with the integration of
remote sensing and GIS. Environmental Management, 28(6), 737-748.
Wicklein, S.M., & Schiffer, D.M. (2002). Simulation of runoff and water quality for 1990
and 2008 land-use conditions in the Reedy Creek Watershed, East-Central
Florida. Water-Resources Investigations Report 02-4018; U.S. Geological Survey.
Retrieved from http://pubs.usgs.gov/wri/
Wilson, C.O., & Weng, Q. (2011). Simulating the impacts of future land use and climate
changes on surface water quality in the Des Plaines River watershed, Chicago
Metropolitan Statistical Area, Illinois. Science of the Total Environment, 409(20),
4387-4405.
Winter, J.G., & Duthie, H.C. (2000). Export coefficient modeling to assess phosphorus
loading in an urban watershed. Journal of the American Water Resources
Association, 36, 1053-1061.
Wu, R.S., & Haith, D.A. (1993). Land use, climate, and water supply. Journal of Water
Resources Planning and Management, 119(6), 685-704.
Wu, Q., Li, H., Wang, R., Paulussen, J., He, Y., & Wang, M. (2006). Monitoring and
predicting land use change in Beijing using remote sensing and GIS. Landscape
and Urban Planning, 78, 322-333.
Yee, K.Y., Ray, A.K., & Rangaiah, G.P. (2003). Multi-objective optimization of an industrial
styrene reactor. Computers and Chemical Engineering, 27, 111-130.
Yu, X., Zhang, X., & Niu, L. (2009). Simulated multi-scale watershed runoff and
sediment production based on GeoWEPP model. International Journal of
Sediment Research, 24(4), 465-478.
Zoppou, C. (2001). Review of urban storm water models. Environmental Modelling &
Software, 16(3), 195-231.