A Hydrological Neighbourhood Approach To Predicting Streamflow in The Mackenzie Valley

Prediction in Ungauged Basins: Approaches for Canada's Cold Regions

C. Spence and P. Saso
Hydrometeorology and Arctic Laboratory, Environment Canada, Saskatoon, SK S7N 3H5
ABSTRACT
Streamflow or runoff data almost invariably must be extrapolated across space and scale when designing infrastructure. This exercise is particularly difficult in Canada's northern areas because of the low-density hydrometric network. There have been considerable changes to this network since the last evaluation of its ability to provide regional hydrometric data was attempted twenty years ago, and a new assessment is overdue. The objective of the study summarized in this paper is to determine, using statistical techniques, how well the hydrometric network measures the high streamflow regime of the Mackenzie Valley. A combined canonical correlation and multiple regression analysis was used to extrapolate data from 34 hydrometric stations deemed suitable for the study. A jack-knife approach was used to measure the accuracy of the extrapolation procedure. Data were also extrapolated to several poorly gauged streams for comparison. The results show that, overall, the performance of the Mackenzie Valley's hydrometric network was fair. However, there was a great deal of variability in the accuracy of any extrapolation in basins smaller than 50,000 km². There were notable problems extrapolating the regimes of smaller basins. The great improvement in recent years in the availability of physio-climatic data contrasts with the contraction of the hydrometric network in northern Canada. This decrease in the number of gauged basins will make it even more difficult to statistically extrapolate streamflow in this region in upcoming years. This likely has profound economic implications for any impending infrastructure development in the Valley.
Spence and Saso
RÉSUMÉ
Streamflow and runoff data must almost invariably be extrapolated across space when designing infrastructure. This exercise is particularly difficult in Canada's northern regions because of the low density of the hydrometric network. Significant changes have been made to this network since the last evaluation of its ability to provide regional hydrometric data was attempted twenty years ago, and a new evaluation is overdue. The objective of the study summarized in this article is to determine, using statistical techniques, how effectively the hydrometric network measures the high streamflow regimes of the Mackenzie Valley. A combination of multiple regression analysis and canonical correlation analysis was used to extrapolate the data collected at the 34 hydrometric stations deemed suitable for the study. A jack-knife approach was used to measure the accuracy of the extrapolation procedure. For comparison, data were also extrapolated to several poorly gauged streams. The results indicate that, overall, the performance of the Mackenzie Valley hydrometric network is good. However, great variability was noted in the accuracy of the extrapolation for basins smaller than 50,000 km². Significant problems were encountered when extrapolating the regimes of the smallest basins. The great progress made in recent years in the availability of physio-climatic data contrasts with the contraction of the hydrometric network in northern Canada. This decrease in the number of gauged basins will make the statistical extrapolation of streamflow in this region even more difficult in the coming years, which will probably have significant economic consequences for any future infrastructure development in the Valley.
INTRODUCTION
The task of providing representative streamflow and lake level data for an area as large and remote as the Northwest Territories is a daunting one. Logistical and fiscal constraints have resulted in a relatively low-density hydrometric observation network, often forcing industry to design infrastructure based on inadequate, spatially extrapolated data. Some industrial development, notably precious metal, base metal and diamond mining, requires data for basins smaller than most of the gauged basins. This necessitates extrapolation of data across not only space, but spatial scale. Regionalization is one procedure by which hydrometric data can be statistically extrapolated to ungauged basins. The first step in regionalization is the definition of homogenous groups of basins. One of many methodologies by which this can be done is to delimit spatially contiguous regions, within which all basins form a homogenous group (Golder, 2000). Regions can be based upon either climatological or physiographic traits (Shawinigan, 1982) or statistically similar streamflow regimes (Golder, 2000). A second methodology defines similar areas as hydrological neighbourhoods. A basin is determined a priori, and basins hydrologically similar to that location are defined as being within a given radius across either real world (De Beers, 2002) or statistical space (Cavadias et al., 2001; Ouarda et al., 2000). The second step is to extrapolate streamflow within the homogenous region or neighbourhood using one of two common methods. The first is the index flood method (Flood Studies Report, 1975). The second identifies relationships via regression between physiographic and hydrological characteristics of gauged basins in the neighbourhood or region, which are subsequently applied to ungauged areas (Ribeiro-Corréa et al., 1995; Kreuder and Leith, 1988).
It is crucial to define regions or neighbourhoods correctly, as delimiting too small or too large an area will increase any error or bias the extrapolation, respectively. In a New Brunswick study, Davar and Brimley (1990) found that fewer statistically defined regions than physio-climatically defined regions are needed to extrapolate hydrometric data to the same degree of accuracy. Cavadias et al. (2001) reported on the use of canonical correlation analysis in the determination of homogenous regions in Ontario. Interestingly, they found that several Canadian Shield basins in northern Ontario were hydrologically statistically similar to the small southern Ontario Great Lakes Lowlands basin under study. Many studies have concentrated on comparing statistical techniques for identifying regions, including Cavadias (1989) in Newfoundland, Ouarda et al.
(2000) in Quebec and Burn (1990) in Manitoba. The earliest attempts to rigorously define hydrological regions for the Northwest Territories, which only included the mainland, were by Shawinigan (1982). The study sought to develop relationships between physiography, precipitation, temperature and hydrological regimes using a square grid technique. The high levels of uncertainty with independent variables and hydrometric data in 1982, especially in the central Arctic and Mackenzie Mountains, rendered the method unsuitable. There have been significant advances in large-scale data acquisition and assimilation since the Shawinigan (1982) study. Remote sensing technologies and advances in computing power, in particular, now make sound physiographic and extensive climate data readily available for all parts of the Northwest Territories. Widespread regionalization studies are now possible for Canada's North. In this paper we discuss the application of statistics to the prediction of streamflow, using the application of canonical correlation analysis and multiple regression to one portion of the Northwest Territories, the Mackenzie Valley (Figure 1), as an example. We will also discuss how these methodologies can be applied in the context of hydrometric network planning to improve the prediction of streamflow in ungauged basins.
METHODOLOGY
Defining Hydrological Neighbourhoods Using Canonical Correlation Analysis
Canonical Correlation Analysis (CCA) was introduced by Hotelling (1936) and has since been used for many applications including estimation of peak streamflows (Cavadias et al. 2001; Ouarda et al., 2000). Recent studies suggest canonical correlation analysis to be the most effective statistical method for streamflow estimation (GREHYS, 1996). The CCA approach to streamflow estimation finds groups or neighbourhoods of hydrologically homogenous basins by correlating the streamflow record with physiographic and climatic characteristics of each basin. Essentially, CCA takes a complicated dataset made up of many variables and simplifies it so that all of the original variables are represented by new, canonical, variables. The canonical variables are made from linear combinations of the original normalized variables such that the correlation of the canonical variables is maximized. In the case of streamflow analysis, CCA takes two sets of variables and creates new canonical variables which represent the physiographic and climatic variables (i.e., V1, V2), and the hydrological or streamflow variables (i.e., W1, W2). Once the V and W canonical variables are calculated, the data are examined on X-Y plots with W1 and W2 on one graph and
V1 and V2 on another. For CCA to be effective, the data on each graph should show a similar pattern. This indicates that the physiographic and climatic variables are highly correlated with the streamflow variables. This correlation is represented by the canonical correlation coefficients (λ1, λ2). If the point patterns are sufficiently similar and the λ values are high, the V variables will be useful for estimating the W variables and vice versa (Cavadias et al., 2001). A detailed description of the mathematics employed by CCA can be found in most textbooks on multivariate statistical analysis (e.g., Muirhead, 1982). Many statistical software packages perform canonical correlation analysis, including SAS, SPSS and Statgraphics.

Figure 1. Location of the study area.
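As an illustration of the idea described above, the following minimal sketch computes canonical correlations and canonical variables with NumPy. It is not the software used in the study. The function name `canonical_correlation`, the QR and singular value decomposition route, and the synthetic data are all assumptions made for the example, and the inputs are assumed to be already log-transformed, with more basins than variables.

```python
import numpy as np


def canonical_correlation(X, Y):
    """Return canonical correlations and unit-variance canonical scores.

    X: (n, p) physio-climatic variables, Y: (n, q) streamflow variables.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    # Normalize each variable to zero mean and unit sample variance.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    # Orthonormal bases for the two sets of normalized variables.
    Qx, _ = np.linalg.qr(Xs)
    Qy, _ = np.linalg.qr(Ys)
    # The singular values of Qx^T Qy are the canonical correlations.
    U, lam, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    # Canonical scores, rescaled so that each has unit sample variance.
    V = Qx @ U * np.sqrt(n - 1)
    W = Qy @ Vt.T * np.sqrt(n - 1)
    return lam, V, W


# Synthetic stand-in data (not the study data): 34 basins, 3 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(34, 3))
Y = X[:, :2] @ np.array([[1.0, 0.5], [0.3, 1.0]]) + 0.2 * rng.normal(size=(34, 2))
lam, V, W = canonical_correlation(X, Y)
```

Here `lam` plays the role of (λ1, λ2), and the columns of `V` and `W` correspond to (V1, V2) and (W1, W2). Plotting the two sets of scores side by side reproduces the kind of visual comparison described in the text.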
In this application a jack-knife procedure was used to find the hydrological neighbourhood for each of the gauged basins. The jack-knife approach involved removing one basin from the dataset and then performing CCA on the remaining stations. The process was then repeated for each of the gauged basins. Each removed basin thus acted as an ungauged basin, thereby allowing us to test the effectiveness of this estimation method for ungauged basins. The jack-knife procedure also allowed for the assessment of the stability of the canonical correlation coefficients as each basin was in turn removed from the dataset. The canonical variables generated by the CCA for each basin were then plotted and the Euclidean distances between the (V1, V2) values for the ungauged basin and the surrounding basins were measured. Assuming the distribution of distances between basins in canonical space adheres to a chi-squared distribution (Ouarda et al., 2000), we estimated a series of neighbourhoods, each centred on the removed basin, using different confidence levels (α).
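The neighbourhood test can be sketched as follows. This is a hedged illustration only: it assumes two unit-variance canonical variables, so that squared Euclidean distance is compared against a chi-squared quantile with 2 degrees of freedom, which has a closed form. The function names and coordinates are hypothetical, and the mapping between the paper's confidence level and the probability `prob` is left as a parameter because it is not specified here.

```python
import math


def chi2_quantile_2df(prob):
    """Quantile of the chi-squared distribution with 2 degrees of freedom.

    Its CDF is 1 - exp(-x / 2), so the inverse has a closed form.
    """
    return -2.0 * math.log(1.0 - prob)


def neighbourhood(target, gauged, prob):
    """Names of gauged basins whose squared Euclidean distance to `target`
    in (V1, V2) space does not exceed the chi-squared quantile at `prob`."""
    limit = chi2_quantile_2df(prob)
    members = []
    for name, (v1, v2) in gauged.items():
        d2 = (v1 - target[0]) ** 2 + (v2 - target[1]) ** 2
        if d2 <= limit:
            members.append(name)
    return members


# Hypothetical canonical coordinates, for illustration only.
gauged = {"basin_a": (0.1, 0.2), "basin_b": (1.5, -0.5), "basin_c": (-2.0, 2.0)}
near = neighbourhood((0.0, 0.0), gauged, prob=0.3)
wide = neighbourhood((0.0, 0.0), gauged, prob=0.9)
```

A smaller threshold admits fewer neighbours, which is the trade-off examined later when the neighbourhood size is selected.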
Flood Prediction Using Multiple Regression
The authors identified the most influential physio-climatic variables as those with the highest correlation coefficients with the canonical variables and the best p-values from an initial multiple regression procedure, which included all the physio-climatic and hydrological variables and neighbourhoods defined with a confidence level of 0.7. This step, and each multiple regression after it, generated specific relationships for the neighbourhood about each jack-knifed basin. Using only the most influential physio-climatic variables, a second series of multiple regression analyses was conducted to determine the confidence level to use to define neighbourhood size. The neighbourhood confidence level that resulted in the lowest overall percentage error for the entire sample set was subsequently used to estimate the hydrological variables. Once the suite of physio-climatic variables had been picked and the proper neighbourhood confidence level had been defined, a final suite of multiple regression analyses was performed. These generated the final equations detailing the relationship between physio-climatic and hydrological variables for each neighbourhood.
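A neighbourhood regression of this kind might be sketched as below. The log-log form of the model is an assumption made for the example, since the regression equation itself is not given here. The function names and the synthetic neighbourhood, which obeys an exact power law, are hypothetical.

```python
import numpy as np


def fit_log_regression(predictors, flows):
    """Least-squares fit of log10(Q) = b0 + sum_i b_i * log10(x_i)."""
    log_x = np.log10(np.asarray(predictors, dtype=float))
    design = np.column_stack([np.ones(len(log_x)), log_x])
    log_q = np.log10(np.asarray(flows, dtype=float))
    coeffs, *_ = np.linalg.lstsq(design, log_q, rcond=None)
    return coeffs


def predict_flow(coeffs, predictors):
    """Apply the fitted relationship to one (ungauged) basin."""
    log_x = np.log10(np.asarray(predictors, dtype=float))
    return 10.0 ** (coeffs[0] + log_x @ coeffs[1:])


# Synthetic neighbourhood obeying Q = 0.05 * A^0.8 * P^0.5 (illustration only).
area = np.array([500.0, 2000.0, 9000.0, 30000.0, 120000.0])
precip = np.array([300.0, 450.0, 520.0, 610.0, 400.0])
flows = 0.05 * area**0.8 * precip**0.5
coeffs = fit_log_regression(np.column_stack([area, precip]), flows)
estimate = predict_flow(coeffs, [15000.0, 500.0])
```

In the jack-knife setting, the fit would use only the basins in the removed basin's neighbourhood, and `predict_flow` would then be applied to the removed basin.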
APPLICATION
Hydrometric Data and Filters
Hydrometric records from existing and historic Water Survey of Canada gauges in the Mackenzie Valley were screened against the following criteria:
- The record should contain at least 17 years of data between 1971 and 2002.
- The record should have very few estimated values.
- Data gaps should not exceed two days in any calendar year.
Several recent studies have examined Mackenzie Valley hydrometric gauge records for detectable trends between 1971 and 2002 (Zhang et al., 2001; Burn and Hag Elnur, 2002; Spence, 2002; Woo and Thorne, 2003). Because none detected any trends in annual peak flows, the hydrometric data from the Mackenzie Valley were considered appropriate for this study. Once basins were selected, the 1:100 year (Q100) and 1:2 year, or mean annual (Q2), flood quantiles were calculated. For consistency, this was done by fitting a Log Pearson type III distribution (Bras, 1990) to all the peak flow records from 1971 to 2002. This distribution provided the best visual fit to most of the time series. To ensure that misestimated values did not skew the results, estimated values were removed from the streamflow records and peak flow values were generated. These were then compared to the peak flow values generated with the estimated values included. Records for which the two sets of peak flow values differed by more than 4% were removed. Although there were many potential basins in the Mackenzie Valley for the study, a number of the datasets did not have enough reliable or timely data to estimate the Q100 and Q2 floods. After screening the hydrometric data, 34 basins remained (Table 1 and Figure 2).
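The quantile calculation can be illustrated with a short sketch. It fits a Log Pearson type III distribution by the method of moments on the log-transformed peaks and uses the Wilson-Hilferty approximation for the frequency factor. Both choices are assumptions made for the example and are not necessarily the fitting procedure used in the study. The function name and the peak flow series are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def log_pearson3_quantile(peaks, return_period):
    """Flood quantile for a given return period (years), from annual peaks."""
    logs = [math.log10(q) for q in peaks]
    n = len(logs)
    m = mean(logs)
    s = stdev(logs)
    # Sample skewness of the log-transformed peaks.
    g = n * sum((x - m) ** 3 for x in logs) / ((n - 1) * (n - 2) * s ** 3)
    # Standard normal deviate for the non-exceedance probability.
    z = NormalDist().inv_cdf(1.0 - 1.0 / return_period)
    if abs(g) < 1e-6:
        k = z  # zero skew reduces to the log-normal case
    else:
        # Wilson-Hilferty approximation to the Pearson III frequency factor.
        k = (2.0 / g) * ((1.0 + g * z / 6.0 - g ** 2 / 36.0) ** 3 - 1.0)
    return 10.0 ** (m + k * s)


# Hypothetical annual peak flows in m3/s, for illustration only.
peaks = [812.0, 1040.0, 655.0, 1310.0, 905.0, 760.0, 1175.0, 980.0, 1490.0, 720.0]
q2 = log_pearson3_quantile(peaks, 2)
q100 = log_pearson3_quantile(peaks, 100)
```

Running the function once with and once without the estimated values in a record gives the two sets of peak flow values that the 4% screening criterion compares.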
Physiographic and Climatic Data
Physiographic and climatic parameters were gathered for each of the selected basins. Parameters were selected based upon data availability and how significantly they were expected to affect the hydrological regime. Drainage basin area (A) is the most common variable used for defining hydrological neighbourhoods and extrapolating streamflow (Ouarda et al., 2000; Ribeiro-Corréa et al., 1995). These data were obtained from the Water Survey of Canada's HYDAT archives.
Table 1. Selected basins with the most influential basin parameters.

| Name | Station | Q100 (m³/s) | Q2 (m³/s) | A (km²) | %LR | %VLV | P (mm) | DSF (m/km² × 10⁻⁵) | Period of record | n (yrs) |
|---|---|---|---|---|---|---|---|---|---|---|
| Hay | 07OB001 | 1187 | 851 | 47900 | 0.53 | 0.16 | 398 | 1.1 | 1963-date | 32 |
| Chinchaga | 07OC001 | 830 | 314 | 10400 | 0.02 | 0.01 | 415 | 1.8 | 1969-date | 29 |
| Buffalo | 07PA001 | 432 | 175 | 18500 | 4.79 | 2.59 | 354 | 1.3 | 1969-1990 | 20 |
| Emile | 07TB001 | 76 | 24 | 4850 | 14.57 | 23.40 | 311 | 4.9 | 1978-1997 | 18 |
| Kakisa | 07UC001 | 274 | 116 | 15600 | 6.65 | 0.70 | 355 | 1.8 | 1962-1990 | 20 |
| Dease | 10AC002 | 1028 | 589 | 6940 | 0.96 | 15.40 | 530 | 2.4 | 1957-1993 | 23 |
| Blue | 10AC004 | 286 | 113 | 1700 | 0.12 | 28.40 | 456 | 3.9 | 1963-1995 | 25 |
| Cottonwood | 10AC005 | 196 | 127 | 888 | 0.01 | 29.20 | 549 | 7.0 | 1964-date | 32 |
| Hyland | 10AD001 | 1341 | 787 | 9450 | 0.76 | 26.70 | 592 | 2.6 | 1946-1993 | 22 |
| Turnagain | 10BA001 | 902 | 514 | 6580 | 0.13 | 27.70 | 610 | 2.4 | 1967-1993 | 22 |
| Kechika (at mouth) | 10BB001 | 1705 | 1256 | 22700 | 0.37 | 22.60 | 652 | 1.8 | 1962-1995 | 19 |
| Kechika (above Boya) | 10BB002 | 928 | 681 | 11200 | 0.49 | 28.30 | 698 | 2.3 | 1967-1994 | 22 |
| Liard (Lower Crossing) | 10BE001 | 7925 | 5180 | 104000 | 0.84 | 17.89 | 553 | 0.6 | 1944-date | 32 |
| Toad | 10BE004 | 686 | 235 | 2570 | 0.12 | 48.30 | 808 | 3.2 | 1961-date | 32 |
| Liard (above Kechika) | 10BE006 | 5802 | 3439 | 61600 | 1.20 | 18.71 | 525 | 0.8 | 1696-1995 | 24 |
| Teeter | 10BE009 | 9 | 3 | 211 | 0.01 | 1.14 | 510 | 7.2 | 1979-date | 23 |
| Sikanni | 10CB001 | 693 | 197 | 2160 | 0.24 | 11.95 | 637 | 3.9 | 1944-date | 24 |
| Muskwa | 10CD001 | 3933 | 2049 | 20300 | 0.36 | 12.61 | 597 | 1.2 | 1944-date | 26 |
| Flat | 10EA003 | 1383 | 572 | 8560 | 1.76 | 21.31 | 673 | 2.2 | 1960-date | 27 |
| South Nahanni (Virginia Falls) | 10EB001 | 2467 | 1436 | 14600 | 0.76 | 47.07 | 737 | 1.5 | 1962-date | 29 |
| South Nahanni (Clausen Creek) | 10EC001 | 3548 | 2188 | 31100 | 1.18 | 36.65 | 721 | 1.0 | 1969-1995 | 23 |
| Liard (Ft. Liard) | 10ED001 | 16016 | 9176 | 222000 | 0.94 | 11.29 | 505 | 0.5 | 1942-date | 29 |
| Liard (at mouth) | 10ED002 | 17807 | 10911 | 275000 | 1.12 | 13.62 | 554 | 0.5 | 1972-date | 30 |
| Birch | 10ED003 | 264 | 28 | 542 | 0.01 | 0.01 | 414 | 6.4 | 1974-date | 27 |
| Jean Marie | 10FB005 | 220 | 28 | 1310 | 0.01 | 0.01 | 408 | 5.5 | 1972-date | 21 |
| Willowlake | 10GB006 | 2443 | 664 | 20200 | 4.76 | 13.92 | 368 | 2.0 | 1975-date | 21 |
| Martin | 10GC003 | 762 | 78 | 2050 | 1.59 | 0.06 | 424 | 3.6 | 1972-date | 30 |
| Camsell | 10JA002 | 162 | 99 | 32100 | 18.02 | 14.40 | 310 | 1.2 | 1933-date | 31 |
| Johnny Hoe | 10JB001 | 793 | 309 | 17300 | 9.22 | 10.50 | 352 | 2.2 | 1969-1992 | 19 |
| Great Bear | 10JC003 | 685 | 563 | 146400 | 28.08 | 18.50 | 306 | 0.5 | 1961-date | 17 |
| Arctic Red | 10LA002 | 3176 | 1341 | 18750 | 0.15 | 28.10 | 531 | 1.9 | 1968-date | 24 |
| Rengleng | 10LC003 | 164 | 37 | 1310 | 4.35 | 1.13 | 302 | 0.7 | 1973-date | 21 |
| Caribou | 10LC007 | 98 | 25 | 625 | 11.70 | 8.00 | 297 | 0.6 | 1975-date | 23 |
| Peel | 10MC002 | 8914 | 5362 | 70600 | 0.77 | 36.20 | 509 | 0.7 | 1969-date | 22 |

Precipitation also plays an influential role in streamflow generation. Mean annual precipitation values were obtained from the CANGRID dataset (Louie et al., 2002). This experimental Meteorological Service of Canada dataset provides precipitation values across Canada at a 50-km resolution. In order to obtain annual precipitation values for each basin, a GIS overlay procedure was used. The National Scale Frameworks drainage areas (The Atlas of Canada, 2000) were overlaid with the CANGRID dataset. A query was then performed to find the CANGRID points that existed within each basin and the values were averaged to estimate the mean annual precipitation (P) value for each basin. A dimensionless shape factor (DSF) variable was used by GREHYS (1996) to account for the variation in basin hydrological response that can be influenced by its shape. For instance, a very wide basin is likely to have less flashy runoff than a narrower basin since it takes longer for water from the outskirts of the basin to reach the main channel. This variable was calculated for each basin following:
$$\mathrm{DSF} = \frac{0.28\,B_p}{A} \tag{1}$$

where Bp is the basin perimeter.

Figure 2. Selected basins for analysis.

The slope of the main channel (S) was estimated in order to differentiate between steeper and lower sloped basins. Elevation data were derived from the GTOPO30 database. The following equation was then applied to the elevations obtained from the dataset:
$$S = \frac{E_1 - E_2}{L} \tag{2}$$
where E1 and E2 are the elevations at the top and bottom of the channel, respectively, and L is the length of the channel. The channel was defined using the 1:1,000,000 National Scale Frameworks drainage network (The Atlas of Canada, 2000).

Incorporating land cover into the statistical model allowed us to account for the influence of land cover on the hydrological response. For instance, basins are expected to produce contrasting runoff regimes if there are notable differences in the distribution of land cover types with different evaporative regimes, such as deciduous forests (Blanken et al., 1997) and exposed bedrock (Spence and Woo, 2002). Land cover distribution was assessed for each of the basins using the Canada Centre for Remote Sensing 1998 land cover map dataset (Cihlar et al., 2002). The original dataset divided land cover into 31 different categories. These categories were merged into nine broad categories based on similarities in hydrological properties. These include percentages of lakes and rivers (%LR), evergreen forests (%EF), deciduous forests (%DF), mixed forests (%MF), low vegetation (%LV), wetlands (%W), very low vegetation (%VLV), exposed bedrock (%R), and permanent ice cover (%I). When measured percentages were equal to zero, a value of 0.01% was used during the canonical correlation and multiple regression analyses.
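Two of the variable preparation steps described above can be written as a minimal sketch: the channel slope of Eq. (2) and the substitution of 0.01% for zero land cover percentages before a logarithm is taken. The function names and the input values are hypothetical, and the use of base-10 logarithms is an assumption.

```python
import math


def channel_slope(e_top, e_bottom, length):
    """Main-channel slope S = (E1 - E2) / L, with all inputs in the same unit."""
    return (e_top - e_bottom) / length


def log_percent(percent, floor=0.01):
    """log10 of a land cover percentage, with zero replaced by `floor`."""
    return math.log10(percent if percent > 0.0 else floor)


# Hypothetical inputs, for illustration only (elevations and length in metres).
slope = channel_slope(1250.0, 250.0, 200_000.0)
log_ice = log_percent(0.0)     # a basin with no permanent ice cover
log_lakes = log_percent(4.79)  # a basin with 4.79% lakes and rivers
```

The floor keeps the logarithm defined for basins in which a land cover class is absent.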
Canonical Correlation Results
With thirteen physio-climatic variables and two streamflow variables, two pairs of canonical variables were defined. They can be represented as:
$$W_1 = -0.74\log(Q_{100}) + 1.71\log(Q_2) \tag{3}$$

$$W_2 = 3.85\log(Q_{100}) - 3.53\log(Q_2) \tag{4}$$

$$
\begin{aligned}
V_1 ={}& 0.70\log(A) - 0.22\log(\%\mathrm{LR}) + 0.002\log(\%\mathrm{EF}) + 0.06\log(\%\mathrm{DF}) \\
& + 0.0004\log(\%\mathrm{MF}) + 0.17\log(\%\mathrm{LV}) + 0.03\log(\%\mathrm{W}) + 0.23\log(\%\mathrm{VLV}) \\
& + 0.1\log(\%\mathrm{R}) - 0.05\log(\%\mathrm{I}) + 0.29\log(P) - 0.04\log(S) - 0.19\log(\mathrm{DSF})
\end{aligned} \tag{5}
$$

$$
\begin{aligned}
V_2 ={}& -0.08\log(A) - 0.22\log(\%\mathrm{LR}) + 0.48\log(\%\mathrm{EF}) + 0.59\log(\%\mathrm{DF}) \\
& - 0.16\log(\%\mathrm{MF}) + 0.82\log(\%\mathrm{LV}) - 0.32\log(\%\mathrm{W}) - 0.83\log(\%\mathrm{VLV}) \\
& + 0.72\log(\%\mathrm{R}) + 0.76\log(\%\mathrm{I}) - 0.06\log(P) + 0.14\log(S) - 0.15\log(\mathrm{DSF})
\end{aligned} \tag{6}
$$
DSF, A, and P exhibit the strongest relationships with the physio-climatic canonical variable V1 (Table 2). Most of the original variables are more strongly associated with V1 than with V2, but some notable basin characteristics, such as percentage of lakes and rivers (%LR), percentage of evergreen forest (%EF) and percentage of wetlands (%W), are better correlated with V2. The similar correlation coefficients between W1 and both Q100 and Q2 suggest that the two quantiles vary together and that their variation is well captured by W1.
Table 2. Correlations (r) between the physio-climatic parameters and the canonical variables. ** denotes those terms used in the multiple regression analyses.

| Basin variables | V1 | V2 |
|---|---|---|
| A** | 0.64 | 0.04 |
| %LR** | -0.13 | -0.35 |
| %EF | -0.25 | 0.44 |
| %DF | -0.36 | -0.27 |
| %MF | -0.17 | -0.03 |
| %LV | -0.25 | 0.16 |
| %W | 0.27 | -0.38 |
| %VLV** | 0.46 | -0.31 |
| %R | 0.40 | 0.07 |
| %I | 0.48 | 0.08 |
| P** | 0.47 | 0.04 |
| S | -0.13 | -0.07 |
| DSF** | -0.78 | -0.15 |

| Flood variables | W1 | W2 |
|---|---|---|
| Q100 | 0.67 | 0.32 |
| Q2 | 0.68 | 0.28 |

| | Pair 1 | Pair 2 |
|---|---|---|
| Canonical λ | 0.98 | 0.83 |

The similar basin distribution in both physio-climatic and streamflow canonical space implies that the selected physio-climatic variables have an influence on the hydrological regimes of the selected streams (Figure 3). This is corroborated by the high canonical lambda coefficients, 0.979 (λ1) and 0.831 (λ2), relating the physio-climatic and streamflow variables. The largest and smallest basins are in the upper right and upper left quadrants of the canonical space, respectively. Lake-dominated basins, such as the Camsell, Great Bear, Kakisa and Emile, are in the lower quadrants. Basins typical of the Taiga Plains ecozone (Ecological Stratification Working Group, 1996), such as the Jean Marie and Willowlake, tend to be in the upper quadrants. The mountainous basins, including the South Nahanni, Arctic Red, Hyland, Toad and Kechika, tend to be grouped together near the centre of the space, with the larger basins tending to be in the right quadrants.
Figure 3. Distribution of study basins in canonical space. (Two panels: V2 versus V1 for the physio-climatic canonical variables, and W2 versus W1 for the streamflow canonical variables.)
Identification of Optimal Hydrological Neighbourhood Size
Neighbourhood size had a profound influence on the median percentage error of predicted Q100 values for all 34 basins (Figure 4). When a high confidence level (i.e., α = 0.1) was used to define a basin's neighbourhood, there were sometimes none, or very few, gauged basins present. The average number of basins in each neighbourhood at the 90% confidence level was only three. Sparse neighbourhoods were especially common for the smaller basins in the left quadrants of the canonical space. This made it impossible to make predictions for some basins and led to significant error, often in the range of 200% or more, in predicted Q100. Results were vastly improved by reducing the confidence level to 80%. The lower confidence levels of 60% and 65% included more dissimilar basins in the neighbourhood of the study basin, thereby misestimating the peak flood values. The best results were at the 70% confidence level, similar to the findings of Ouarda et al. (2000) and Ribeiro-Corréa et al. (1995). Neighbourhood size at the 70% confidence level ranged from one to twenty-four basins and averaged fifteen. The 70% confidence level was applied in all subsequent exercises to define neighbourhoods.
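The selection rule described above, choosing the confidence level with the lowest median percentage error, can be sketched as follows. The function names, the data layout, and all numbers are hypothetical and are not study results. A basin with an empty neighbourhood is represented by `None` and is skipped.

```python
from statistics import median


def percentage_error(estimated, calculated):
    """Absolute error of an estimate, as a percentage of the calculated value."""
    return abs(estimated - calculated) / calculated * 100.0


def best_confidence_level(estimates_by_level, calculated):
    """Return the level with the lowest median percentage error, and all scores.

    estimates_by_level maps level -> {basin: estimate, or None if no estimate}.
    """
    scores = {}
    for level, estimates in estimates_by_level.items():
        errors = [
            percentage_error(est, calculated[basin])
            for basin, est in estimates.items()
            if est is not None
        ]
        if errors:
            scores[level] = median(errors)
    return min(scores, key=scores.get), scores


# Hypothetical Q100 values in m3/s, for illustration only.
calculated = {"a": 100.0, "b": 1000.0, "c": 5000.0}
estimates_by_level = {
    0.6: {"a": 180.0, "b": 1500.0, "c": 6000.0},
    0.7: {"a": 130.0, "b": 1200.0, "c": 5500.0},
    0.9: {"a": None, "b": 3000.0, "c": 5200.0},
}
best, scores = best_confidence_level(estimates_by_level, calculated)
```

In practice the number of basins left without any estimate at each level would also need to be tracked, since the median alone hides it.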
Figure 4. Median percentage error of calculated Q100 versus confidence level defining neighbourhood size. (Axes: median percentage error versus 1 - α.)
Figure 5. Estimated values of Q100 and Q2 versus those calculated using the observed record. (Two panels: Q100 estimated versus Q100 calculated, and Q2 estimated versus Q2 calculated, both in m³/s.)
Jack-knife Methodology Results
Not unexpectedly, the selection of neighbours was a function of physio-climatic similarity as defined by the key independent variables in Table 2. Neighbours were not necessarily in geographic proximity, but they tended to be basins with similar areas and shapes, as indicated by the dimensionless shape factor. There were good extrapolated estimates of Q100 and Q2 (Figure 5) for the Liard and Peel Rivers, with percentage errors of 5%. Median observed error was 37%. Both the error and the range in error decreased with increasing basin size (Figures 5 and 6). Average errors in basins of 10,000 to 49,999 km² were 42%, but values ranged widely from 4% to 81%. The average error in basins smaller than 10,000 km² equalled the maximum error (81%) in the larger basin class. Minimum error did not increase at nearly the same rate as maximum error with decreasing basin size. The only identifiable relationship between error and physiography or climate was a general tendency towards peak flow overestimates in basins with high %LR. There were eleven basins for which no reasonable estimate could be made because there were too few members in the 70% confidence level neighbourhood. These were usually small basins (e.g., Birch and Rengleng Rivers). There was only one reasonable estimate provided for basins smaller than 1,000 km². Other basins for which estimates could not be made include the largest basin (Liard River at the mouth) and basins distinctly different from the rest of the study set (e.g., Camsell and Great Bear Rivers).
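The leave-one-out error calculation behind these results can be sketched generically. In the sketch below the estimator is a deliberately simple stand-in, scaling the Q100 of the basin nearest in area by the area ratio. It is not the CCA and multiple regression procedure used in the study, and its errors say nothing about the study's results. The function names are hypothetical, and the four (A, Q100) pairs are taken from Table 1.

```python
from statistics import median


def jackknife_errors(basins, estimator):
    """Percentage error for each basin when it is withheld in turn.

    basins maps name -> (area, calculated Q100).
    estimator(others, area) returns an estimate, or None if none can be made.
    """
    errors = {}
    for name, (area, q_calc) in basins.items():
        others = {k: v for k, v in basins.items() if k != name}
        q_est = estimator(others, area)
        if q_est is not None:
            errors[name] = abs(q_est - q_calc) / q_calc * 100.0
    return errors


def nearest_area_estimator(others, area):
    """Stand-in estimator: area-ratio scaling from the basin closest in area."""
    if not others:
        return None
    ref_area, ref_q = min(others.values(), key=lambda v: abs(v[0] - area))
    return ref_q * area / ref_area


# (A in km2, Q100 in m3/s) for four basins from Table 1.
basins = {
    "Hay": (47900.0, 1187.0),
    "Chinchaga": (10400.0, 830.0),
    "Buffalo": (18500.0, 432.0),
    "Emile": (4850.0, 76.0),
}
errors = jackknife_errors(basins, nearest_area_estimator)
median_error = median(errors.values())
```

Grouping the entries of `errors` by basin area class would give the kind of summary plotted in Figure 6.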
Figure 6. Influence of basin area on percentage error of estimated Q100. (Axes: error in % versus basin area class in km²: 0-4999, 5000-9999, 10000-49999, 50000-99999 and >100000.)
Tests With Poorly Gauged Basins
Eight basins in the Mackenzie Valley which did not meet the original hydrometric data standards were also evaluated to measure the ability to estimate the hydrological regime of these poorly gauged basins (Figure 7). Results varied widely amongst the basins (Table 3). For instance, the extrapolated estimate of the Carcajou's Q100 flood lies within the 90% confidence bounds of the calculated flood (Figure 8). Extrapolated floods overestimated calculated floods on Big Smith Creek, but underestimated them on the neighbouring Blackwater (Figure 8). The magnitude of error in the flood estimates of the smaller Big Smith basin was comparable to those associated with the small basins examined with the jack-knife approach. This provides further evidence that we are unable to predict the peak flows of many streams to the same degree of confidence using statistical extrapolation as we can by gauging them. Reasonable flood estimates could not be determined for the Harris. Flood estimates could not be determined at all for either the Twitya or Mountain, as there were no similar basins within the bounds of their neighbourhoods.
Table 3. Calculated and estimated flood quantiles for the five poorly gauged basins for which reasonable estimates could be determined.

| Basin | Q2 calculated (m³/s) | Q2 estimated (m³/s) | Q100 calculated (m³/s) | Q100 estimated (m³/s) |
|---|---|---|---|---|
| Big Smith | 101 ± 10% | 171 | 185 ± 17% | 462 |
| Blackwater | 328 ± 29% | 204 | 871 ± 47% | 476 |
| Carcajou | 761 ± 16% | 1501 | 1921 ± 27% | 2518 |
| Ramparts | 548 ± 11% | 228 | 925 ± 19% | 643 |
| Carnwath | 801 ± 27% | 373 | 1928 ± 38% | 610 |

Figure 7. Eight poorly gauged basins in the Mackenzie Valley. (Map labels: 10NA001 Carnwath, 10KD004 Ramparts, 10KC001 Mountain, 10KB001 Carcajou, 10HC003 Big Smith, 10HA003 Twitya, 10HC006 Blackwater, 10GC002; place names: Norman Wells, Fort Simpson, Great Bear Lake.)
Figure 8. Flood frequency analysis results. White boxes are calculated results using observed data, bounded by 90% confidence intervals. The extrapolated estimates of Q100 and Q2 are illustrated by the black circles. (Three panels: Carcajou, Big Smith and Blackwater; axes: streamflow in m³/s versus flood return period in years.)
Small Basin Error
As shown in Figure 6, errors were very large when extrapolating peak flows to the small basins included in this study. There are at least three potential explanations for the trend of increasing average error with decreasing basin size. First, statistical regularity would suggest that the smaller neighbourhoods associated with the smaller basins would account for some of the observed error. Neighbourhoods of small basins would often include relatively larger basins with much higher peak flows, resulting in overestimation of the peak flow for the small basin. Second, the error in the measure of the independent variables may be more pronounced in smaller basins. The smaller the basin, the fewer the number of CANGRID data points, and the poorer the correlation between Q100 and P (Figure 9). In addition, the northern portions of the CANGRID dataset are recognized as being the most likely to be in error (E. Mekis, 2003, personal communication). The broad-based land cover classification did not differentiate between different types of wetlands, which have been documented to have different hydrological roles in the northern interior plains (Quinton and Hayashi, 2005). Third, there is error associated with the correlation between the independent variables and the flood regime. The best independent variables of A, P, %LR, %VLV and DSF applied in the multiple regression analysis were selected based
on results from all 34 basins. The land cover percentage terms, especially %LR (Figure 9), appear particularly subject to changes in influence with scale. At small scales, different patterns of variability will result in different runoff responses, even if spatial distribution is the same (Beven, 1991; Wood et al., 1988). This theoretical interpretation is corroborated by field results from northern environments that suggest the topology of land cover influences runoff response in small basins (Quinton et al., 2003; Spence and Woo, 2003).
IMPLICATIONS FOR THE HYDROMETRIC NETWORK
Hydrometric networks are systems that provide data and information with both observations and analytical studies. A hydrometric network generates numerous data and information products, including those used in the operational management of water resources, the inventory of the resource, and the characterization of streamflow regimes. This last group of products has been the hardest for network planners to justify to network administrators even though recent network evaluations in southern Canada show that it is likely the most important economically (Azar et al., 2003). Once the spatial extent of a regime
Figure 9. The influence of basin area on correlation coefficients between Q100 and mean annual precipitation (P) and between Q2 and percentage of lakes and rivers (%LR). Vertical axis: correlation coefficient r, from -1 to 1. Horizontal axis: basin area (km2) in classes 0 - 999, 1000 - 9999, 10000 - 99999 and > 100000.
has been characterized, the information about that regime can be used for engineering design within that geographic space. The problem, especially in northern Canada, is that the streamflow regimes are not well defined, delimited or characterized, which increases the uncertainty and cost of engineering design. The key to network planning is to design an efficient observational program that can be used to characterize all the streamflow regimes in a given region with a minimum of uncertainty.
The hydrometric station density in the Mackenzie Valley, calculated using the 34 basins selected for this study, is approximately 1 station per 25,000 km2. With such low coverage, there are bound to be physio-climatic landscapes and streamflow regimes that are not captured by the hydrometric network. The stations with sufficient data to be used for this study were concentrated in basins larger than 5,000 km2 in the northern Rocky and southern Mackenzie Mountains and basins larger than 10,000 km2 in the southern interior plains. Eleven of the selected basins (32%) were smaller than 5,000 km2, but they were widely spread across canonical space because of variations in both %LR and %VLV. The high percentage error found when estimating streamflow in small basins can be attributed to a lack of hydrological neighbours and high variability between small basins. The network must expand to include additional basins smaller than 5,000 km2 in order to increase the accuracy of statistical streamflow predictions at this scale.
Canonical space scatterplots may be a valuable hydrometric network planning tool. The usefulness of candidate gauged basins for reducing uncertainty can be estimated by determining their location in physio-climatic canonical space relative to existing gauged basins.
If the candidate basin falls in an area of canonical space where many gauged basins already exist, the new gauge's data would not significantly increase the accuracy of prediction, and the candidate should therefore be rejected. Basins should be selected such that their addition will fill gaps in the canonical space and hence improve our understanding of different physio-climatic regions. Similarly, the location of each existing gauged basin in canonical space can be examined to determine which stations may be redundant. It is not a trivial exercise to decide how such tools should be used in network planning. There are some key cautionary points that network planners must be aware of prior to adopting such tools for planning purposes. The distribution illustrated in Figure 3 represents how well the high-flow regimes of the 34 basins relate. An evaluation of the mean or low-flow regimes against the same suite of physio-climatic variables would result in different forms of Eqs. (3) to (6),
correlation values in Table 2, and basin distribution in canonical space. The neighbourhoods associated with each basin could change. Similarly, the prominent gaps in which landscapes or basin sizes are measured by the hydrometric network may change. It is important to evaluate the network's ability to characterize the entire streamflow regime. Only then can changes to the network that will maximize information production be recommended with confidence. The accuracy and scale issues associated with physio-climatic variables have been highlighted earlier. Eqs. (3) to (6) are, in essence, streamflow models derived specifically from the unique correlation between the selected physio-climatic and hydrological variables as measured and/or estimated. This makes the initial choice of physio-climatic variables and their sources critical to the success of the application of canonical correlation and multiple regression in network planning. Careful evaluation of all landscape variables is necessary to ensure that the data are of high quality and are representative of the basin(s) and streamflow generation processes of interest.
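The gap-filling idea described in this section can be sketched as a simple screening of candidate basins by how crowded their surroundings are in canonical space. This is an illustration under stated assumptions rather than the study's own neighbourhood definition: it uses plain Euclidean distance between canonical scores and an arbitrary search radius, and the function names, station names and coordinates are all hypothetical.

```python
import math


def neighbours_within(point, gauged, radius, exclude=None):
    """Names of gauged basins whose canonical scores lie within radius of point.

    gauged maps station name -> tuple of canonical scores.
    Euclidean distance is an assumption made for this sketch.
    """
    return [
        name
        for name, scores in gauged.items()
        if name != exclude and math.dist(point, scores) <= radius
    ]


def rank_candidates(candidates, gauged, radius):
    """Rank candidate basins so those filling the emptiest gaps come first.

    Returns a list of (name, neighbour_count, nearest_distance) tuples.
    Requires at least one gauged basin.
    """
    scored = []
    for name, point in candidates.items():
        count = len(neighbours_within(point, gauged, radius))
        nearest = min(math.dist(point, scores) for scores in gauged.values())
        scored.append((name, count, nearest))
    # Fewest existing neighbours first; ties broken by the larger gap.
    scored.sort(key=lambda item: (item[1], -item[2]))
    return scored


def possibly_redundant(gauged, radius, min_neighbours):
    """Gauged basins that have at least min_neighbours other stations nearby."""
    return sorted(
        name
        for name, scores in gauged.items()
        if len(neighbours_within(scores, gauged, radius, exclude=name))
        >= min_neighbours
    )


if __name__ == "__main__":
    # Hypothetical canonical scores, for illustration only.
    gauged = {
        "station_1": (0.0, 0.0),
        "station_2": (0.1, 0.0),
        "station_3": (0.0, 0.1),
        "station_4": (2.0, 2.0),
    }
    candidates = {"candidate_A": (0.05, 0.05), "candidate_B": (-2.0, -2.0)}
    print(rank_candidates(candidates, gauged, radius=0.5))
    print(possibly_redundant(gauged, radius=0.5, min_neighbours=2))
```

As the cautionary points above imply, any such ranking depends on the flow regime and the physio-climatic variables used to build the canonical space, so a screening based on high flows would need to be repeated for mean and low flows before stations are added or withdrawn.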
CONCLUSIONS
It is a fallacy that, in a region as physiographically diverse as the Mackenzie Valley, basins near to one another would have similar hydrological regimes. Neighbours identified by this study were not necessarily in geographic proximity to one another. Neighbours tended to have similar areas, precipitation, shapes and land cover distribution. As such, it is recommended that similarity in basins' physio-climatic characteristics be considered at least as important as geographic proximity in future extrapolation exercises in northern Canada. Median percentage error associated with Q100 and Q2 extrapolation was 37%. The results show that, overall, the performance of the Mackenzie Valley's hydrometric network is fair. However, there is a great deal of variability in the accuracy of any extrapolation in basins smaller than 50,000 km2. There are notable problems when extrapolating the regimes of smaller basins. Error tended to increase with decreasing basin size. Error in basins smaller than 5,000 km2 averaged 81% with much variation, ranging from 6% to 233%. Only one reasonable estimate of both the Q100 and Q2 floods could be identified in basins smaller than 1,000 km2. This pattern is attributed to the fewer neighbours in small basins' neighbourhoods, more pronounced error in the independent variables at smaller scales, and decreasing influence by the selected independent variables on the hydrological regime. In order to generate accurate
estimates of peak flows in small basins, it may be necessary to study them independently of large basins. Unfortunately, the number of small gauged basins presently in the network likely prohibits this option. The authors recommend an increase in the number of gauged basins smaller than 5,000 km2. Canonical correlation analysis may be a tool that network planners can use to decide which basins to gauge to best reduce the uncertainty associated with the prediction of streamflow. This study represents the first comprehensive regionalization exercise in the Mackenzie Valley in over two decades. It is the first application of an extensive statistical approach because inclusive physiographic and climatic data became available for this area only in recent years. It should be stressed that such a regionalization study may be feasible now, but not in the near future, despite recent development and future expected improvements in physio-climatic data. Fiscal restraint measures in the 1990s reduced the hydrometric network by one-third between 1990 and 2000. Thirteen of the thirty-four stations used in this exercise (38%) are no longer operating. These closed stations will no longer be useful for extrapolating extreme flows after the next extensive extreme event occurs in the Mackenzie Valley. It will be even more difficult to accurately extrapolate flows across all the different hydrological regimes of the Mackenzie Valley with only 21 stations. At the very least, smaller neighbourhoods and an associated increase in error would be expected. Because of the precautionary principle used in environmental assessment today and expensive construction costs in Canada's north, any increase in the uncertainty of extrapolated streamflow data in this region could have profound economic implications for future infrastructure development in the Mackenzie Valley.
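The network figures quoted in this paper can be combined into a rough estimate of what the station closures mean for coverage. The arithmetic below is only a sketch: it assumes (this is not stated in the text) that the density of approximately 1 station per 25,000 km2 refers to a fixed reference area, written here as the hypothetical symbol A_ref, and that the 21 remaining stations serve that same area.

```latex
\begin{align*}
A_{\mathrm{ref}} &\approx 34 \times 25{,}000\ \mathrm{km^2} = 850{,}000\ \mathrm{km^2} \\
\text{fraction of stations closed} &= \frac{13}{34} \approx 0.38 \\
\text{area per remaining station} &\approx \frac{850{,}000\ \mathrm{km^2}}{21} \approx 40{,}500\ \mathrm{km^2}
\end{align*}
```

Under these assumptions, the area represented by each station would rise by roughly 60%, which is consistent with the expectation of smaller neighbourhoods and larger extrapolation error.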
ACKNOWLEDGEMENTS
Dale Ross and others in the NWT/Nunavut District of the Water Survey of Canada have strongly supported the need for this project. Monies were provided by Environment Canada's Prairie and Northern Region Environmental Assessment Research and Development fund. The authors thank Craig Machtans of the Canadian Wildlife Service and Dr. Don Burn of the University of Waterloo for sharing their statistical expertise. Thank you to Eva Mekis of Environment Canada for her expertise with CANGRID and Jocelyn Ledger of the Canadian Wildlife Service for sharing her GIS knowledge.
REFERENCES
Azar, J., D. Sellars and D. Schroeter. 2003. Hydrometric Business Review. Report for the British Columbia Ministry of Sustainable Resource Development, 64 pp.
Beven, K. 1991. Scale considerations. Recent Advances in the Modelling of Hydrologic Systems, Boules, D. and P. O'Connell (eds.), 357-371.
Blanken, P.D., T.A. Black, P.C. Yang, H.H. Neumann, X. Nesic, R. Staebler, D. den Hartog, M.D. Novak and X. Lee. 1997. Energy Balance and Canopy Conductance of a Boreal Aspen Forest: Partitioning Overstory and Understory Components. Journal of Geophysical Research, 102(D24): 28915-28927.
Bras, R.L. 1990. Hydrology: An Introduction to Hydrologic Science. Addison Wesley, Reading, 643 pp.
Burn, D.H. 1990. Evaluation of Regional Flood Frequency Analysis with a Region of Influence Approach. Water Resources Research, 26: 2257-2265.
Burn, D.H. and M.A. Hag Elnur. 2002. Detection of Hydrologic Trends and Variability. Journal of Hydrology, 255: 107-122.
Cavadias, G.S. 1989. Regional Flood Estimation by Canonical Correlation. Proceedings 1989 Conference of the Canadian Society of Civil Engineers, 11A. 212-231.
Cavadias, G., T.B.M.J. Ouarda, B. Bobée and C. Girard. 2001. A Canonical Correlation Approach to the Determination of Homogeneous Regions for Regional Flood Estimation of Ungauged Basins. Hydrological Sciences Journal, 46(4): 499-512.
Cihlar, J., J. Beaubien and R. Latifovic. 2002. Land Cover of Canada 1998. Special Publication, NBIOME Project. Produced by the Canada Centre for Remote Sensing and the Canadian Forest Service, Natural Resources Canada. Available from the Canada Centre for Remote Sensing, Ottawa, Ontario.
Davar, Z.K. and W.A. Brimley. 1990. Hydrometric Network Evaluation: Audit Approach. Journal of Water Resources Planning and Management, 116: 134-146.
De Beers. 2002. Snap Lake Diamond Project: Environmental Assessment Report. Submitted to the Mackenzie Valley Environmental Impact Review Board. 123 pp.
Ecological Stratification Working Group. 1996. A National Ecological Framework for Canada. Agriculture and Agri-Food Canada, Research Branch, Centre for Land and Biological Resources Research and Environment Canada, State of Environment Directorate, Ottawa/Hull. 125 pp.
Flood Studies Report. 1975. Vol. I - Hydrological Studies. Natural Environment Research Council, London, 570 pp.
Golder. 2000. Homogenous Regionalization of Canada's Reference Hydrometric Basin Network. Report 001-1330 to Environment Canada, 27 pp.
GREHYS. 1996. Intercomparison of Flood Frequency Procedures for Canadian Rivers. Journal of Hydrology, 186: 85-103.
Hotelling, H. 1936. Relations between Two Sets of Variates. Biometrika, 28: 321-377.
Kreuder, W.L. and R.M. Leith. 1988. Hydrometric Network Planning and Evaluation in the Pacific and Yukon Region. Environment Canada. 17 pp.
Louie, P.Y.T., W.D. Hogg, M.D. Mackay, X. Zhang and R.F. Hopkinson. 2002. The Water Balance Climatology of the Mackenzie Basin with Reference to the 1994/95 Water Year. Atmosphere-Ocean, 40: 159-180.
Muirhead, R.J. 1982. Aspects of Multivariate Statistical Theory. Wiley, New York, 673 pp.
Ouarda, T.B.M.J., M. Haché, P. Bruneau and B. Bobée. 2000. Regional Flood Peak and Volume Estimation in a Northern Canadian Basin. ASCE, Journal of Cold Regions Engineering, 14(4): 176-191.
Quinton, W.L., M. Hayashi and A. Pietroniro. 2003. Connectivity and Storage Functions of Channel Fens and Flat Bogs in Northern Basins. Hydrological Processes, 17: 3665-3684.
Quinton, W.L. and M. Hayashi. 2005. The Flow and Storage of Water in the Wetland-Dominated Central Mackenzie River Basin: Recent Advances and Future Directions. In: Prediction in Ungauged Basins: Approaches for Canada's Cold Regions. Spence, C., J.W. Pomeroy and A. Pietroniro (eds.), Canadian Water Resources Association, 45-66.
Ribeiro-Correa, B., G.S. Cavadias, B. Clement and J. Rousselle. 1995. Identification of Hydrological Neighbourhoods Using Canonical Correlation Analysis. Journal of Hydrology, 173: 71-89.
Shawinigan. 1982. Northwest Territories Water Resources Inventory Study. Environment Canada, 71 pp.
Spence, C. 2002. Streamflow Variability (1965 to 1998) in Five Northwest Territories and Nunavut Rivers. Canadian Water Resources Journal, 27: 135-154.
Spence, C. and M.K. Woo. 2002. Hydrology of Subarctic Canadian Shield: Bedrock Upland. Journal of Hydrology, 262: 111-127.
Spence, C. and M.K. Woo. 2003. Hydrology of Subarctic Canadian Shield: Soil-filled Valleys. Journal of Hydrology, 279: 151-166.
The Atlas of Canada. 2000. National Scale Frameworks Hydrology. Atlas Frameworks V5.0, Government of Canada. Ottawa.
Woo, M.K. and R. Thorne. 2003. Streamflow in the Mackenzie Basin, Canada. Arctic, 56: 328-340.
Wood, E., M. Sivapalan, K. Beven and L. Band. 1988. Effects of Spatial Variability and Scale with Implications to Hydrologic Modelling. Journal of Hydrology, 102: 29-47.
Zhang, X., K.D. Harvey, W.D. Hogg and T.R. Yuzyk. 2001. Trends in Canadian Streamflow. Water Resources Research, 37: 987-998.