
Sarhad J. Agric. Vol.25, No.4, 2009

COLLECTIVE AND INDIVIDUAL MONTH-WISE DATA MANAGEMENT APPROACH ON THE DATA COLLECTED IN KALAM (SWAT) THROUGH MULTIPLE REGRESSION ANALYSIS
AMJAD MASOOD*, SYED MUHAMMAD SAEED SHAH**, MANZOOR AHMAD MALIK*, GUL DARAZ KHAN***, SAMAR GUL* and IKRAMUL HAQ****
* Pakistan Council of Research in Water Resources, Peshawar, Pakistan.
** Center of Excellence in Water Resources Engineering, University of Engineering and Technology, Lahore, Pakistan.
*** Department of Water Management, NWFP Agricultural University, Peshawar, Pakistan.
**** Department of Agricultural Extension Education and Communication, NWFP Agricultural University, Peshawar, Pakistan.

ABSTRACT
An understanding of the hydrological regimes of mountain rivers is essential for water resources management in Pakistan. Because there are no proper estimates of the relationships between river flow and climatic variables, especially for snowmelt stream flow, there is always a chance of floods, which cause serious damage to crops, human beings and infrastructure. A proper study is therefore required to understand and analyze the runoff regimes and their relation to the climatic variables in order to forecast river flow. In this study, the hydrological regimes of the River Swat Basin at Kalam are investigated using the inter-relationships of runoff with rainfall and temperature. Two data management schemes are used in the regression analysis. The first is Individual Month-wise Regression, in which 30 years of monthly values of each parameter are regressed for each month individually. The second is Collective Month-wise Regression, in which the normal value of each month is tabulated against each parameter and the flow values are then collectively regressed on precipitation, temperature and relative humidity. The collective month-wise technique gave encouraging results, which can be used for future prediction of flow. Thirty years of data were used to find the linkages between river flows and climatic variables (temperature, precipitation and relative humidity). The linkages obtained with the collective month-wise approach were considerably better, especially between flow and temperature. The weaker linkages of the individual approach may be due to the absence of gauging stations at upper elevations, or to the station not being representative of the whole catchment. The study finds that the Collective Month-wise Technique is useful for predicting flow in the River Swat at Kalam.
Key Words: Water resources management, river flow, climatic variable, snowmelt, runoff regimes, regression analysis

Citation: Masood, A., S.M.S. Shah, M.A. Malik, G.D. Khan, S. Gul and I. Haq. 2009. Collective and individual month-wise data management approach on the data collected in Kalam (Swat) through multiple regression analysis. Sarhad J. Agric. 25(4): 557-561.
INTRODUCTION
Many river catchments lie in the northernmost part of Pakistan. The climate in Pakistan is mainly arid and semiarid. Because of their high altitudes, these catchments receive a considerable amount of snowfall during the winter season, and the stream flow is mainly due to melting of the snow. The snowmelt stream flow is valuable because it occurs in April, May and June, before the monsoon rainfall. This early runoff is therefore available for irrigation, power generation and water supply at a time of extreme drought. Snowmelt and glacier melt continue in July and August, but by then there is abundant water from the monsoon, which can bring damaging floods. Because of the lack of proper estimates of the relationships between river flow and climatic variables, including snowmelt stream flow, there is always a chance of floods, which cause serious damage to crops, human beings and infrastructure. A proper study is therefore required to understand and analyze the runoff regimes and their relation to the climatic variables in order to forecast the river flow at the proposed site. One strategy for fitting a "best" line through the data would be to minimize the sum of the residual errors e_i. This is an inadequate criterion. For two points, for example, the best fit is clearly the line connecting them, yet any straight line passing through the midpoint of the connecting line (except a perfectly vertical line) also gives a minimum value of the sum of e_i, equal to zero, because the errors cancel (Chapra and Canale, 1984). After fitting the best line, the second step in regression analysis is to check whether the data can be adequately described by the regression line (Haan, 1973). Linear regression provides a powerful technique for fitting a "best" line to data.
It is predicated on the fact that the relationship between dependent and independent variables is linear (Chapra and Canale, 1984). Models that are not linear can be transformed to fit linear regression so that the coefficients can be evaluated as a best fit; they can then be transformed back to their original state and used for predictive purposes. To test the accuracy of models, three criteria (Haan, 1973; Pergram and Stretch, 1982; Chapra and Canale, 1984) can be used. A useful extension of linear regression is the case where the dependent variable Y is a linear function of two or more independent variables X1, X2, X3, etc.:
$$\hat{Y} = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3 + \dots$$ [a]
Usually n observations are available, and n equations are formed, one for each observation. Because these n equations must be solved for the p unknown parameters (regression coefficients), n must be greater than p (Haan, 1973).
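The remark above about the sum of residual errors can be checked numerically. The following is a minimal sketch with two made-up points (the values are illustrative assumptions, not data from this study): every non-vertical line through the midpoint gives a residual sum of zero, whereas the sum of squared residuals singles out the line that joins the points.

```python
# Two hypothetical data points; the obvious best fit is the line joining them.
points = [(1.0, 2.0), (3.0, 6.0)]

# Midpoint of the segment connecting the two points.
mid_x = sum(x for x, _ in points) / len(points)
mid_y = sum(y for _, y in points) / len(points)


def residual_sums(slope):
    """Return (sum of residuals, sum of squared residuals) for the line
    with the given slope that passes through the midpoint."""
    intercept = mid_y - slope * mid_x
    residuals = [y - (intercept + slope * x) for x, y in points]
    return sum(residuals), sum(e * e for e in residuals)


for slope in (-1.0, 0.0, 2.0, 5.0):
    total, total_sq = residual_sums(slope)
    print(f"slope={slope:5.1f}  sum(e)={total:6.2f}  sum(e^2)={total_sq:6.2f}")
```

Only the slope of the connecting line (2.0 here) drives the squared sum to zero, which is why least squares is used as the fitting criterion instead.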

Seasonal snow accumulation, assessed from surface measurements (De Scally, 1994) or from remotely sensed snow-covered area (Rango et al. 1977; Dey et al. 1989), plays a role in estimating runoff. De Scally (1994) studied the Jehlum Basin and obtained high correlation coefficients between annual maximum snowpack water storage or total winter precipitation and annual runoff, whilst summer precipitation was of little use in estimating annual flow. He concluded that in the Kunhar basin low-elevation snow courses were as useful for forecasting as data from remote high-elevation sites. Salomonson et al. (1997) used low-resolution meteorological satellite data and simple photo-interpretation techniques to map snow-covered areas during early April over the Indus River and Kabul River basins in Pakistan. Stream flow was estimated by regression analysis for each watershed (Indus River, 1969-1973, R² = 0.82; Kabul River, 1967-1973, R² = 0.89). Predictions of the 1974 seasonal stream flow using the regression equations were within 7% of the actual 1974 flow. Singh and Jain (2003) conducted studies on daily stream flow simulation for the Sutluj River basin in the western Himalayan region. The model was calibrated using a data set of three years (1985/86-1987/88) and the model parameters were optimized. Using these optimized parameters, daily stream flow was simulated for a period of six years (1988/89-1990/91 and 1996/97-1998/99). Modeling of stream flow involves the physical features of the basin, including its total area, its altitudinal distribution through elevation zones and the areas of these zones, and the altitude of the precipitation and temperature stations.
MATERIALS AND METHODS
Regression Analysis

Two data management schemes (individual and collective month-wise) have been used for the regression analysis in this study. Regression is a statistical technique that may be used to evaluate correlations inferred from knowledge of the physical environment. The resulting equations are of the form:

$$Y = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3 + \dots + B_n X_n$$ [1b] (Acreman, 1985)


Evaluating the Regression

After fitting the best line, the second step in regression analysis is to check whether the data can be adequately described by the regression line (Haan, 1973). One approach is to determine how much of the variability in the dependent variable is explained by the regression:
$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$$ [2]

The larger the sum of squares due to regression, the better the data are explained by the regression equation. The ratio of the sum of squares due to regression to the total sum of squares corrected for the mean can be used as a measure of the ability of the regression line to explain variations in the dependent variable. This ratio is commonly denoted by r² (Haan, 1973): r² = sum of squares due to regression / sum of squares corrected for the mean. Then:
$$r^2 = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}$$ [3]

r² is called the "coefficient of determination". Equation (3) can also be written as:
$$r^2 = \frac{B_0 \sum Y_i + B_1 \sum X_i Y_i - (\sum Y_i)^2 / n}{\sum Y_i^2 - (\sum Y_i)^2 / n}$$ [4]
If the regression equation perfectly predicts every value of Y_i, then every residual $(Y_i - \hat{Y}_i)$ would be zero. Therefore, Equation (2) would become:


$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2$$ [5]

Under these conditions, the ratio of the two sides of the equation would be one. On the other hand, if the regression equation explains none of the variation in Y, then the sum of squares due to regression would be zero, which makes the ratio zero as well. Thus the coefficient of determination ranges from zero to one. The closer it is to one, the better the regression equation fits the data.
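The decomposition in Equation (2) and the two forms of the coefficient of determination in Equations (3) and (4) can be illustrated for a single predictor. This is a minimal sketch on made-up numbers; the data values and the function names (`fit_line`, `sums_of_squares`, `r_squared`, `r_squared_shortcut`) are assumptions for illustration and are not taken from the study.

```python
def fit_line(xs, ys):
    """Least-squares intercept B0 and slope B1 for Y = B0 + B1*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def sums_of_squares(xs, ys):
    """Return (total, due to regression, residual) sums of squares, Eq. (2)."""
    b0, b1 = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    fitted = [b0 + b1 * x for x in xs]
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_reg = sum((f - mean_y) ** 2 for f in fitted)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return ss_total, ss_reg, ss_res


def r_squared(xs, ys):
    """Coefficient of determination as the ratio in Eq. (3)."""
    ss_total, ss_reg, _ = sums_of_squares(xs, ys)
    return ss_reg / ss_total


def r_squared_shortcut(xs, ys):
    """Coefficient of determination from the raw sums in Eq. (4)."""
    b0, b1 = fit_line(xs, ys)
    n = len(ys)
    sum_y = sum(ys)
    correction = sum_y ** 2 / n
    numerator = b0 * sum_y + b1 * sum(x * y for x, y in zip(xs, ys)) - correction
    denominator = sum(y * y for y in ys) - correction
    return numerator / denominator


# Hypothetical observations, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

ss_total, ss_reg, ss_res = sums_of_squares(xs, ys)
print("total =", ss_total, "regression + residual =", ss_reg + ss_res)
print("r2 (Eq. 3) =", r_squared(xs, ys))
print("r2 (Eq. 4) =", r_squared_shortcut(xs, ys))
```

For a least-squares fit the two forms agree up to rounding, and a perfectly linear data set gives a value of one.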
Application of Linear Regression and Linearization of Non-Linear Relationship

Linear regression provides a powerful technique for fitting a "best" line to data. It is predicated on the fact that the relationship between dependent and independent variables is linear (Chapra and Canale, 1984). This is not always the case; in hydrology the parameters sometimes have a non-linear relationship with each other. In such cases a transformation (linearization) is essential to express the data in a form suitable for linear regression. By linearizing a non-linear relationship, we may be able to evaluate the constant and the coefficients. The transformed models are fitted by linear regression so that the coefficients can be evaluated as a best fit; they can then be transformed back to their original state and used for predictive purposes. To test the accuracy of models, three criteria (Haan, 1973; Pergram and Stretch, 1982; Chapra and Canale, 1984) can be used.
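As an illustration of linearization, the sketch below fits a power-law relationship y = a·x^b by regressing ln(y) on ln(x) and then transforming back. The power-law form, the synthetic data and the function names are assumptions chosen for the example; the text does not state which transformation, if any, was applied to the Kalam data.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_power_law(xs, ys):
    """Fit y = a * x**b by linear regression of ln(y) on ln(x)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    intercept, slope = fit_line(log_x, log_y)
    # Transform back: ln(a) is the intercept, b is the slope.
    return math.exp(intercept), slope


# Synthetic data generated from y = 2 * x**1.5 (illustrative only).
xs = [1.0, 2.0, 4.0, 8.0]
ys = [2.0 * x ** 1.5 for x in xs]

a, b = fit_power_law(xs, ys)


def predict(x):
    """Prediction in the original (untransformed) units."""
    return a * x ** b


print("a =", a, "b =", b, "prediction at x = 16:", predict(16.0))
```

The regression itself is linear in the transformed variables; only the final prediction step returns to the original scale.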
Multiple Linear Regression Method

A useful extension of linear regression is the case where the dependent variable Y is a linear function of two or more independent variables X1, X2, X3:
$$\hat{Y} = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3 + \dots$$ [6]

Where B0 is the constant coefficient or intercept and B1, B2 and B3 are the coefficients for the variables X1, X2 and X3, etc. Usually n observations are available, and n equations are formed, one for each observation. Because these n equations must be solved for the p unknown parameters (regression coefficients), n must be greater than p (Haan, 1973). The n equations can be written as:
$$Y_1 = B_0 + B_1 X_{1,1} + B_2 X_{1,2} + B_3 X_{1,3} + \dots + B_p X_{1,p}$$ [7]
$$Y_2 = B_0 + B_1 X_{2,1} + B_2 X_{2,2} + B_3 X_{2,3} + \dots + B_p X_{2,p}$$ [8]
$$Y_3 = B_0 + B_1 X_{3,1} + B_2 X_{3,2} + B_3 X_{3,3} + \dots + B_p X_{3,p}$$ [9]
$$Y_n = B_0 + B_1 X_{n,1} + B_2 X_{n,2} + B_3 X_{n,3} + \dots + B_p X_{n,p}$$ [10]
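One way to solve such a system of n equations in the least-squares sense is through the normal equations. The sketch below is a minimal, assumption-laden illustration in plain Python: the function names, the Gaussian elimination solver and the synthetic data are all hypothetical, and it is not the procedure used by MINITAB. Here p counts every coefficient, including the intercept.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_multiple_regression(rows, ys):
    """Return [B0, B1, ...] for Y = B0 + B1*X1 + B2*X2 + ...

    rows holds one list of predictor values per observation.
    """
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    n, p = len(design), len(design[0])
    if n <= p:
        raise ValueError("need more observations (n) than parameters (p)")
    # Normal equations: (X^T X) B = X^T Y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Synthetic observations generated from Y = 1 + 2*X1 - 3*X2 (illustrative only).
rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
ys = [1.0 + 2.0 * x1 - 3.0 * x2 for x1, x2 in rows]

coefficients = fit_multiple_regression(rows, ys)
print("B0, B1, B2 =", coefficients)
```

The explicit check on n and p mirrors the requirement stated above that the number of observations exceed the number of unknown coefficients.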

Individual Month-wise Approach (Multiple Regression)

In this Approach Multiple Regression Analysis has been performed for each month between river flow and climatic variables (temperature, precipitation and relative humidity) to estimate the flows. The general regression equation is as follows:
$$Q = B_0 + B_1 P + B_2 T_{max} + B_3 T_{min} + B_4 RH$$ [11]

Where
Q = River flow in m³/s
B0 = Constant coefficient or intercept
B1 = Coefficient for the rainfall
B2 = Coefficient for the maximum temperature
B3 = Coefficient for the minimum temperature
B4 = Coefficient for the relative humidity
Collective Month-wise Approach (Multiple Regression)

In this approach Multiple Regression Analysis has been performed for the whole normal year (30 years average) between river flow and climatic variables (temperature, precipitation and relative humidity) to estimate the flows.
$$Q = B_0 + B_1 P + B'_2 T_{mean} + B_4 RH$$ [12]

Where
Q = River flow in m³/s
B0 = Constant coefficient or intercept
B1 = Coefficient for the rainfall
B2' = Coefficient for the maximum and minimum temperatures combined (say B2 + B3), applied to the mean temperature
B4 = Coefficient for the relative humidity
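The difference between the two data management schemes lies in how the monthly records are arranged before regression. The sketch below illustrates this on a tiny, made-up record set; the values, the tuple layout and the function names are assumptions for illustration, and a single temperature column is used for brevity. In the individual scheme each month gets its own data set with one row per year, while in the collective scheme each month contributes a single row of normals (multi-year means).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (year, month, flow, precipitation, temperature, humidity).
records = [
    (1971, "May", 120.0, 80.0, 14.0, 55.0),
    (1972, "May", 140.0, 60.0, 16.0, 50.0),
    (1971, "Jun", 210.0, 40.0, 19.0, 48.0),
    (1972, "Jun", 190.0, 50.0, 18.0, 52.0),
]


def individual_monthwise(records):
    """One data set per month: every year's values for that month."""
    by_month = defaultdict(list)
    for _year, month, *values in records:
        by_month[month].append(tuple(values))
    return dict(by_month)


def collective_monthwise(records):
    """One data set in total: the normal (multi-year mean) of each month."""
    normals = {}
    for month, rows in individual_monthwise(records).items():
        # zip(*rows) turns the yearly rows into columns, one per variable.
        normals[month] = tuple(mean(column) for column in zip(*rows))
    return normals


print(individual_monthwise(records)["May"])
print(collective_monthwise(records))
```

With 30 years of data, the individual scheme would give twelve regressions of 30 observations each, whereas the collective scheme gives one regression over the twelve monthly normals (or over a subset of months, such as April to October).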


Model for Regression Analysis

For regression analysis MINITAB 11 software was used. MINITAB 11 for Windows is a powerful statistical package that provides a wide range of basic and advanced data analysis capabilities. Minitab Inc. has long been recognized as a leading developer of easy-to-use statistical software. MINITAB's well-designed user interface makes it accessible to users with a wide variety of background and experience.
RESULTS AND DISCUSSION
Individual Month-wise Approach (Multiple Regression)

The equations developed with the Individual Month-wise Approach (Multiple Regression) for the different months of the year are given in Table I.
Table I    Individual month-wise regression coefficients for different months of the year
                Coefficients of different variables used in regression model
S. No  Month    B1         B2        B3          B4          B0          R^2 (%)  Remarks
1      Nov      0.0252     -0.613    0.000194    0.000031    0.000222    6.8      Weak
2      Dec      0.033      0.048     -0.000289   0.000562    0.000048    30.1     Medium
3      Jan      0.0055     -0.117    0.000185    0.000304    0.000164    11.2     Weak
4      Feb      0.0019     0.114     0.000197    0.000448    0.000092    21.3     Weak
5      Mar      -0.0186    0.151     0.000487    0.000588    0.000107    28       Medium
6      Apr      -0.296     2.47      0.00445     0.00207     0.000162    46.3     Fair
7      May      -0.986     1.74      0.0127      0.00572     0.00131     63.3     Good
8      Jun      -2.24      -12.9     0.0402      -0.0115     0.00422     57.9     Good
9      Jul      -1.31      3.5       0.0207      0.00052     0.00292     12.6     Weak
10     Aug      -1.18      14.9      0.0126      0.00172     0.00641     30.5     Medium
11     Sep      -0.580     13.6      0.00730     0.00455     -0.00058    40.7     Fair
12     Oct      0.0466     2.10      0.00104     -0.00160    0.000326    16.5     Weak

With reference to the above equations, the linkage between river flow and climatic variables (temperature, precipitation and relative humidity) is generally not good. Only in two months, May and June, are the linkages reasonable. This suggests that the station does not represent the whole catchment. The R² value for November is 6.8%, which is very low and shows that the linkage between flow and climatic variables is quite weak. Similarly, from December to April and from July to October the relationships are weak to fair, with R² values between 11.2% and 46.3%. Only in May and June are the coefficients of determination reasonable, at 63.3% and 57.9% respectively.
Collective Month-wise Approach (Multiple Regression)

The developed regression equations for collective month-wise approach for different groups of months are as follows:
Table II    Collective month-wise regression coefficients for different groups of months
                   Coefficients of different variables used in regression model
S. No.  Duration   B1       B2'      B4       B0        R^2 (%)  Remarks
1       Nov-Oct    -0.697   -5.9     0.0132   0.00355   77.7     Very Good
2       April-Oct  -1.35    23.8     0.0357   -0.0105   95.1     Excellent
3       Nov-May    -1.17    -10.3    0.0159   0.00972   93.9     Excellent

With this approach the regression results are much improved: the R² values are 77.7%, 95.1% and 93.9%, which are comparatively very good. The equations obtained can therefore be used for prediction of flow.
CONCLUSION

With the individual month-wise approach, the linear relationship obtained from the available data gives a reasonable linkage in only two months, May and June. The main cause of the poor correlation in the remaining months is the limited number of data collection points, which do not uniformly represent all parts of the catchment area. The relationships obtained through the individual month-wise approach therefore cannot be used for prediction of flow. The collective month-wise multiple regression, in contrast, provides an established relationship that can be used as a tool for predicting flow in the Swat Basin at the Kalam site.


REFERENCES

Acreman, M.C. 1985. Predicting the mean annual flood from basin characteristics in Scotland. J. Hydrol. 30(1): 37-49.
Chapra, C.S. and R.P. Canale. 1984. Numerical methods for engineers. McGraw-Hill Co, London. pp. 286-309.
De Scally, F.A. 1994. Relative importance of snow accumulation and monsoon rainfall for estimating the annual runoff, Jehlum basin, Pakistan. Hydrol. Sci. J. 39: 199-216.
Dey, B., V.K. Sharma and A. Rango. 1989. A test of snowmelt-runoff model for a major river basin in the western Himalayas. Nordic Hydrol. 20: 167-178.
Haan, C.T. 1973. Statistical method in hydrology. McGraw-Hill, New York. pp. 180-221.
Pergram, G.G.S. and D.D. Stretch. 1982. Recursive integrated estimation of effective precipitation and continuous stream flow model. Intl. Symp. Mississippi, USA. pp. 191-228.
Rango, A., V.V. Salomonson and J.L. Foster. 1977. Seasonal stream-flow estimation in the Himalayan region employing meteorological snow cover observations. Water Resources Res. 13: 109-122.
Salomonson, V.V., A. Rango and J.L. Foster. 1997. Seasonal stream flow estimation in the Himalayan region employing meteorological satellite snow cover observations. Water Resources Res. 27(7): 1541-1552.
Singh, P. and S.K. Jain. 2003. Modeling of stream flow and its components for a large Himalayan basin with predominant snowmelt yields. IAHS. 48(2): 257-276.
Vehvilainen, B. and J. Lohvansuu. 1991. The effects of climatic change on discharge and snow cover in Finland. IAHS. 36(2): 109-121.
Yevjevich, Y. 1972. Probability and Statistics in Hydrology. Water Resource Pub., Colorado, USA. pp. 232-275.
