
Cotton Demand Estimation and Forecasting using Regression Analysis

Submitted by:
Mayank Singh (11125028)


Objective: To estimate and forecast the demand for cotton from historical data on the factors that affect it.

Model Specification: The demand function for cotton can be stated as:

Q = f (P, G, Q1)

Where,
Q    Per capita consumption of cotton (kg)
Q1   Per capita consumption of polyester staple fibre (kg)
G    Per capita gross domestic product (Rs. Crores)
P    Price of cotton (Rs./kg)

So the regression model can be written as:

Q = a + a1*P + a2*G + a3*Q1 + e

Where a, a1, a2 and a3 are the parameters to be estimated by the ordinary least-squares (OLS) method, and e is a disturbance term. The disturbance term is included because the data points are unlikely to fall exactly on the regression line. The regression analysis assumes that the number of explanatory (independent) variables (P, G, Q1) is smaller than the number of observations and that there is no perfect correlation among them.

Data Description: For the analysis of cotton consumption in India, time series data for Q, Q1, G and P were collected for the period 1995-96 to 2010-11. The price of cotton and GDP are adjusted for inflation using the wholesale price index, taking 1993-94 as the base year.

Regression results and its interpretations:

Regression statistics:
Multiple R           0.92
R Square             0.85
Adjusted R Square    0.81
Standard Error       0.16E-07
Observations         15

ANOVA table:
              df    Sum of Squares    Mean Square    F
Regression     3    1.62E-12          5.42E-13       21.14
Residual      11    2.82E-13          2.56E-14
Total         14    1.91E-12
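As a rough cross-check, the regression statistics can be recomputed from the ANOVA sums of squares. The sketch below only reuses the rounded figures from the ANOVA table above, so small differences from the reported values are to be expected; the variable names are illustrative and not part of the original analysis.

```python
# Values copied from the ANOVA table above (rounded as reported).
ss_regression = 1.62e-12   # sum of squares explained by the regression
ss_residual = 2.82e-13     # unexplained (residual) sum of squares
df_regression = 3          # number of explanatory variables
df_residual = 11           # observations minus estimated coefficients (15 - 4)
n_obs = 15

# Total variation, share explained, and its degrees-of-freedom adjusted version.
ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual

# F is the ratio of the regression mean square to the residual mean square.
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)

print(f"R Square          = {r_squared:.2f}")
print(f"Adjusted R Square = {adj_r_squared:.2f}")
print(f"F                 = {f_stat:.1f}")
```

With these rounded inputs the script gives an R Square of about 0.85, an adjusted R Square of about 0.81 and an F of about 21.1, in line with the tables.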


Coefficient table:
                                                          Coefficients   Standard Error   t Stat   P-value
Intercept                                                 2.42E-04       5.34E-05          4.54    0.0008
Per capita GDP in Rs. Crores                              .00089         0.000228          3.91    0.002
Price in Rs./kg                                           -2.3E-05       1.95E-06         -1.18    0.26
Per capita consumption of polyester staple fiber in kg    -1.7E-03       1.17E-04         -1.47    0.17

The regression relationship obtained is as follows:

Q = 2.42E-04 + .00089 * G - 2.3E-05 * P - 1.7E-03 * Q1

1. Test of goodness of fit and correlation:

The goodness of fit of the regression estimates should be evaluated before interpreting the regression coefficients. From the regression table, the multiple correlation coefficient is R = .92, which is very high. This is not too surprising when time series data are employed. R square, or the coefficient of determination, captures the proportion of the total variation of the dependent variable Q that is explained by the full set of independent variables. The closer R2 is to 1, the larger the share of variation explained by the model. If R2 = 1, all the sample points lie on the regression line. In this case R2 = .85, meaning that 85% of the variation in the dependent variable (Q) is explained by the regression. This is a good fit, as only 15% of the variation in the dependent variable is unexplained. A downward-adjusted version of R2, called adjusted R2, accounts for the degrees of freedom (the number of observations beyond the minimum needed to calculate the regression statistics). Adjusted R2 = .81 is close to R2, which indicates that the penalty linked to the degrees of freedom is not large.

2. The standard error of the regression: The standard error (SE) measures the typical distance of the observed values of the dependent variable (Q) from the regression line. The very small value of SE = 0.16E-07 indicates a close fit of the regression line to the observation points. Regression is based on the assumption that the error term is normally distributed, so about 68 percent of the actual values of the dependent variable (consumption of cotton) should lie within one standard error (0.16E-07 in our example) of their fitted values, and about 95 percent within two standard errors. The SE of the regression can therefore be used to build an interval estimate, or forecast, of Q around its point estimate, which is easily calculated from the estimated regression.

3. Test of F statistic: The F ratio is used to tell whether the explanatory variables as a group explain a statistically significant share of the variation in the dependent variable.
It can be calculated from the following relation:

F = (R2/(k-1)) / ((1-R2)/(df))

Where n is the number of observations, k is the number of parameters or coefficients and df is the degrees of freedom (n-k). Using the values R2 = .85, n = 15 and k = 4, we obtain F of about 20.8, which agrees with the value of 21.14 in the analysis of variance (ANOVA) table up to the rounding of R2. To conduct the F test we compare the calculated value of the F statistic with the critical value from the table of the F distribution, which is 3.59 for 3 and 11 degrees of freedom at the 5 percent level of significance. As the calculated F exceeds the critical value, the null hypothesis that all the slope coefficients are equal to zero is rejected for this dataset. At the 5 percent level of significance we therefore accept the alternative hypothesis that not all coefficients are equal to zero.

4. Test of t-statistics of the coefficients: The coefficient table shown above indicates an inverse relationship between per capita consumption of cotton and the price of cotton. The same holds between Q and Q1, which is consistent with a substitution effect between cotton and polyester staple fibre. These relations follow from the negative coefficients of P and Q1. Per capita GDP, in contrast, has a direct relationship with Q, as its coefficient is positive (.00089). The p-value of the t statistic is below 5% only for the intercept and per capita GDP, which means that only these two are significantly different from 0 at that significance level. To check the significance of the relationship between Q and each independent variable, first find the critical value t* at 11 df for the 5% level of significance, which is 2.201. The more degrees of freedom there are (that is, the more observation points), the smaller the t* value at any chosen level of significance. Comparing t* with the t statistics of the coefficients, the relationship with the price of cotton (t = -1.18) is the weakest, and that with Q1 (t = -1.47) also falls short of t* in absolute value. Taken alone, this would imply that price should be dropped from further regression analysis. But let us first look at the correlation statistics.

Correlation statistics:
                                                               Q       G       P       Q1
Per capita consumption of cotton in kg (Q)                     1
Per capita GDP in Rs. Crores (G)                               0.90    1
Price in Rs./kg (P)                                            0.77    0.87    1
Per capita consumption of polyester staple fiber in kg (Q1)    0.59    0.74    0.49    1
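The F and t comparisons described in sections 3 and 4 can be sketched in a few lines. This is only an illustration: the critical values are the ones quoted in the text rather than computed, the t statistics are copied from the coefficient table, and the function and variable names are hypothetical.

```python
def f_from_r2(r2, n, k):
    """Overall F statistic from R squared, n observations and k coefficients."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

F_CRITICAL = 3.59    # 5% critical value quoted in the text for (3, 11) df
T_CRITICAL = 2.201   # 5% critical value quoted in the text for 11 df

# t statistics copied from the coefficient table.
t_stats = {
    "Intercept": 4.54,
    "Per capita GDP": 3.91,
    "Price": -1.18,
    "PSF consumption": -1.47,
}

f_value = f_from_r2(0.85, n=15, k=4)
model_significant = f_value > F_CRITICAL

# A coefficient is individually significant when |t| exceeds the critical value.
significant = [name for name, t in t_stats.items() if abs(t) > T_CRITICAL]

print(round(f_value, 2), model_significant)
print(significant)
```

With the rounded R2 of .85 the formula gives an F of about 20.8, and only the intercept and per capita GDP pass the t test.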


As the above correlation table makes quite clear, per capita GDP is highly correlated with the other explanatory variables (0.87 with price and 0.74 with consumption of polyester staple fibre). From this it may be inferred that the low t values of the other two explanatory variables are due to multicollinearity with per capita GDP. One way to solve this problem is to drop the highly collinear variable. Applying this method, we get the following regression results, in which the reported R2 is higher and the t statistics of the estimated coefficients all exceed t* in absolute value, as shown in the following two tables.

Regression Statistics:
Multiple R           0.94
R Square             0.88
Adjusted R Square    0.86
Standard Error       0.0002
Observations         15

                                                          Coefficients   Standard Error   t Stat   P-value
Intercept                                                 0.0016         0.00048           3.40    0.0051
Price in Rs./kg                                           -7.49E-05      1.18E-05         -6.34    3.66E-05
Per capita consumption of polyester staple fiber in kg    -0.0037        0.001            -3.79    0.0025
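The screening step that singles out per capita GDP can be sketched as follows. The correlations are copied from the correlation table; the 0.7 cut-off is an assumption made for illustration only and does not come from the report, and the variable names are hypothetical.

```python
from collections import Counter

# Pairwise correlations among the explanatory variables (from the correlation table).
regressor_corr = {
    ("G", "P"): 0.87,    # per capita GDP vs price
    ("G", "Q1"): 0.74,   # per capita GDP vs polyester staple fibre consumption
    ("P", "Q1"): 0.49,   # price vs polyester staple fibre consumption
}

THRESHOLD = 0.7  # assumed cut-off for "highly correlated"; not from the report

# Keep the pairs whose correlation is at or above the cut-off.
flagged = [pair for pair, r in regressor_corr.items() if abs(r) >= THRESHOLD]

# The variable that appears in the most flagged pairs is the natural candidate to drop.
counts = Counter(var for pair in flagged for var in pair)
drop_candidate = counts.most_common(1)[0][0] if counts else None

print(flagged)
print(drop_candidate)
```

Under this cut-off both flagged pairs involve G, which matches the choice made above to drop per capita GDP.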

Demand forecasting using Trend Projection (Quantitative method)

Regression analysis, as described above, can be used to quantify relationships between variables. However, data collection can be a problem if the regression model includes a large number of independent variables. When changes in a variable show discernible patterns over time, time-series analysis is an alternative method for forecasting future values. The focus of time-series analysis is to identify the components of change in the data. These components may be trend, seasonality, cyclic pattern or random fluctuations. A trend is a long-term increase or decrease in the variable. For example, in the data used here, the time series of India's population and of the price of cotton both exhibit an upward trend. Trend projection is one of the most commonly used forecasting techniques. It is based on the assumption that there is an identifiable trend in the time series data. The simplest form of time-series analysis projects the past trend by fitting a straight line to the data by regression analysis. The linear regression model takes the following form:

Tt = b0 + b1*t

Where:
Tt = trend value in period t
b0 = intercept of the trend line
b1 = slope of the trend line
t  = time
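For reference, the ordinary least-squares estimates of the trend-line parameters have a standard closed form. In the expressions below, Q_t denotes the observed value in period t, and the bars denote sample means over the n periods; this notation is introduced here for illustration and is not used elsewhere in the report.

```latex
\[
b_1 = \frac{\sum_{t=1}^{n} (t - \bar{t})\,(Q_t - \bar{Q})}{\sum_{t=1}^{n} (t - \bar{t})^2},
\qquad
b_0 = \bar{Q} - b_1\,\bar{t}
\]
```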

Using the regression analysis in our case we get,

Qt = 2173 + 120.2 * t,  R2 = .84

Putting the values of t in this relationship, we get the forecasted values for different years as follows:

Q16 = 2173 + 120.2 * 16 = 4096.2 million kg
Q17 = 2173 + 120.2 * 17 = 4216.4 million kg
Q18 = 2173 + 120.2 * 18 = 4336.6 million kg
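The trend line can be re-estimated with a few lines of plain Python. This is a minimal sketch, assuming the fifteen annual consumption figures (1996-97 to 2010-11) listed in the data table later in this report and t = 1 for 1996-97; the helper name fit_trend is hypothetical.

```python
# Annual cotton consumption (million kg), 1996-97 to 2010-11, from the data table.
consumption = [2691.1, 2546.26, 2580.09, 2702.49, 2725.61, 2697.9, 2618.85,
               2777.63, 3069.35, 3383, 3674.55, 3701.75, 3570, 3910, 4386]
periods = list(range(1, len(consumption) + 1))  # t = 1 .. 15

def fit_trend(t_values, y_values):
    """Least-squares intercept and slope of a straight line y = b0 + b1 * t."""
    n = len(t_values)
    t_mean = sum(t_values) / n
    y_mean = sum(y_values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(t_values, y_values))
    sxx = sum((t - t_mean) ** 2 for t in t_values)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope

b0, b1 = fit_trend(periods, consumption)

# Project the fitted line three periods beyond the sample.
forecasts = {t: b0 + b1 * t for t in (16, 17, 18)}

print(f"Qt = {b0:.1f} + {b1:.2f} * t")
for t, q in forecasts.items():
    print(t, round(q, 1))
```

The fitted intercept and slope come out close to 2173 and 120.2. Because the script keeps the unrounded coefficients, its forecasts can differ by about one million kg from those computed with the rounded equation.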

[Figure: consumption of cotton (million kg) plotted against time, with the fitted linear trend Q = 120.2t + 2173]

Figure 1 Forecasting by trend projection

The first forecast of cotton consumption takes into consideration only the long-run trend in the data. By incorporating seasonal variations, we can try to improve the forecast of the consumption of cotton in India. For this we include dummy variables in the above linear regression model. Taking D1, D2 and D3 as dummy variables, we get the following regression relationship:

Qt = 2105.99 + 119.33 * t + 72.72 * D1 + 73.67 * D2 + 134.92 * D3


Adjusting the trend forecasts of cotton consumption for seasonal variations by using seasonal dummies

Year       Consumption of cotton (million kg)    time (t)    D1    D2    D3
1996-97    2691.1                                 1           1     0     0
1997-98    2546.26                                2           0     1     0
1998-99    2580.09                                3           0     0     1
1999-00    2702.49                                4           0     0     0
2000-01    2725.61                                5           1     0     0
2001-02    2697.9                                 6           0     1     0
2002-03    2618.85                                7           0     0     1
2003-04    2777.63                                8           0     0     0
2004-05    3069.35                                9           1     0     0
2005-06    3383                                   10          0     1     0
2006-07    3674.55                                11          0     0     1
2007-08    3701.75                                12          0     0     0
2008-09    3570                                   13          1     0     0
2009-10    3910                                   14          0     1     0
2010-11    4386                                   15          0     0     1
2011-12                                           16          0     0     0
2012-13                                           17          1     0     0
2013-14                                           18          0     1     0

Using the above relation we get the following predicted values:

Q16 = 2105.99 + 119.33 * 16 + 72.72 * 0 + 73.67 * 0 + 134.92 * 0 = 4015.27 million kg
Q17 = 2105.99 + 119.33 * 17 + 72.72 * 1 + 73.67 * 0 + 134.92 * 0 = 4207.32 million kg
Q18 = 2105.99 + 119.33 * 18 + 72.72 * 0 + 73.67 * 1 + 134.92 * 0 = 4327.60 million kg

Regression Statistics: The following two tables show the regression statistics when the dummy variables are included in the forecast of cotton consumption. The coefficients of the independent variables are used to forecast Q for a particular period, as shown in the above relations.

Multiple R           0.91
R Square             0.84
Adjusted R Square    0.78
Standard Error       273.26
Observations         15

               Coefficients    Standard Error    t Stat     P-value
Intercept      2105.99         206.06            10.2201    1.3E-06
time period    119.32          16.56             7.20       2.92E-05
D1             72.72           209.36            0.34       0.73
D2             73.66           208.71            0.35       0.73
D3             134.92          209.36            0.64       0.53
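The seasonally adjusted forecasts can be reproduced with a short script. The sketch assumes the dummy pattern repeats every four periods, as it does in the data table above (D1 = 1 for t = 1, 5, 9, ...; D2 = 1 for t = 2, 6, 10, ...; D3 = 1 for t = 3, 7, 11, ...), and uses the rounded coefficients from the regression relationship; the function names are hypothetical.

```python
def seasonal_dummies(t):
    """Return (D1, D2, D3) for period t, following the four-period cycle in the table."""
    phase = t % 4
    return (int(phase == 1), int(phase == 2), int(phase == 3))

def forecast(t):
    """Trend plus seasonal-dummy forecast of cotton consumption (million kg)."""
    d1, d2, d3 = seasonal_dummies(t)
    return 2105.99 + 119.33 * t + 72.72 * d1 + 73.67 * d2 + 134.92 * d3

for t in (16, 17, 18):
    print(t, seasonal_dummies(t), round(forecast(t), 2))
```

For t = 16, 17 and 18 this gives approximately 4015.27, 4207.32 and 4327.60 million kg, matching the predicted values above.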

Evaluation of results: Analysing both sets of results, we can infer that the two methods produce almost the same values. The reason seems to be that seasonal variation has little effect in the current data: none of the dummy coefficients is statistically significant (p-values of 0.73, 0.73 and 0.53). This indicates that cotton consumption, measured annually, is little affected by seasonal variation. Also, here we are concerned with yearly consumption of cotton. Had we used quarterly consumption, the results would probably have differed more, because seasonal variation would then be more prominent in the consumption data.
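How close the two methods are can be checked by evaluating both estimated equations for the forecast periods. The sketch below uses the rounded coefficients of the two relationships given earlier and assumes the four-period dummy cycle from the data table; the function names are hypothetical.

```python
def trend_forecast(t):
    """Simple trend projection: Qt = 2173 + 120.2 * t."""
    return 2173 + 120.2 * t

def dummy_forecast(t):
    """Trend projection with the three seasonal dummies."""
    d1, d2, d3 = int(t % 4 == 1), int(t % 4 == 2), int(t % 4 == 3)
    return 2105.99 + 119.33 * t + 72.72 * d1 + 73.67 * d2 + 134.92 * d3

# Percentage gap between the two forecasts for each forecast period.
gaps = {}
for t in (16, 17, 18):
    plain, adjusted = trend_forecast(t), dummy_forecast(t)
    gaps[t] = 100 * abs(plain - adjusted) / adjusted
    print(t, round(plain, 1), round(adjusted, 1), f"{gaps[t]:.2f}%")
```

The gap is about 2 percent for t = 16 and well under 1 percent for t = 17 and 18.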

References:

Cotton Corporation of India Limited, Government of India. Accessed Oct 2011 from http://www.cotcorp.gov.in/statistics.asp

Handbook of Statistics on Indian Economy 2010-11, Reserve Bank of India. Accessed Oct 2011 from http://www.rbi.org.in/scripts/PublicationsView.aspx?id=12730

Man made fibres (MMF), Ministry of Textiles. Accessed Oct 2011 from http://www.texmin.nic.in/policy/Fibre_Policy_Sub_%20Groups_Report_dir_mg_d_20100608_2.pdf
