CHAPTER 5

STATISTICAL ANALYSIS

5.1 INTRODUCTION

data, compiling, analyzing and interpreting the observed data in order to draw reasonable and
valid conclusions. Data on quantitative and qualitative characters are an essential component
of experimental work and need statistical analysis, interpretation and conclusions.

5.1.1 Models

Models are powerful tools by which the designers of waste treatment systems can
investigate the performance of a number of potential alternatives under a variety of
conditions. The aim of this part of the thesis is to develop a mathematical model for the
treatment of paper and pulp mill wastewater in combination with domestic wastewater and
effective microorganisms, using regression analysis and artificial neural network (ANN)
analysis. The data obtained from the study on parameters such as hydraulic retention time,
organic loading rate, sludge loading rate, influent pH, VSS/TS ratio, influent COD and
effluent COD were taken as the explanatory variables, and the percentage of COD removal and
methane production were taken as the response variables. Using these data, a multiple
regression model and an ANN model were prepared.

5.2 REGRESSION

a very useful and widely employed tool of data analysis. It leads to simple, yet often

powerful descriptions of the main features of the relationships among variables.

between two or more variables in terms of the original units of the data. Regression helps
to estimate one variable (the dependent variable) from the other variables (the independent
variables). In other words, the value of one variable can be estimated provided the values
of the other variables are given. The statistical method that estimates the unknown value of
one variable from the known values of the related variables is called regression.

economics and business research, to find a relation between two or more variables that
are causally related is regression analysis.

According to Wallis and Robert (2001), it is often more important to find out
what the relation actually is, in order to estimate or predict one variable (the dependent
variable), and the statistical technique appropriate in such a case is called regression
analysis.

With the help of regression analysis, the unknown values of one variable can be estimated
or predicted from the known values of another variable. In regression analysis, the
independent variable is also known as the regressor, predictor or explanator, and the
dependent variable is known as the regressed or explained variable.
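As a minimal sketch of this idea, the Python snippet below fits a least-squares line to a few made-up pairs of values and then estimates the dependent variable for a new value of the independent variable. The data and the function names `fit_line` and `predict` are illustrative assumptions, not results or tools from this study.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Estimate the dependent variable for a known value of the regressor."""
    return intercept + slope * x


# Hypothetical data: xs is the independent variable, ys the dependent variable.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

a, b = fit_line(xs, ys)
estimate = predict(a, b, 6.0)  # estimate y at a value of x not in the data
```

The estimate is only as trustworthy as the fitted relationship, so in practice it would be used within the range of the observed data.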

parameters such as pH, electrical conductivity, total hardness, calcium, magnesium,
total alkalinity, carbonate, bicarbonate, sodium, potassium, chloride, phosphate, fluoride
and nitrate. Significant linear relationships among some water quality parameters were
obtained; the strongest were between dissolved solids and electrical conductivity (0.9999)
and between total hardness and magnesium (1.000), which can be used for rapid monitoring of
water quality parameters. There was also a positive correlation of electrical conductivity
and total dissolved solids with chloride (0.909) and, to some extent, with sodium.

Edokapayi and Clement Aghatise (2008) reported the physical and chemical
characteristics of five stations along a 5 km stretch of the Benin River, southern Nigeria.
Multiple regression analysis was carried out for each station using conductivity,
dissolved oxygen, calcium, phosphate and nitrate-nitrogen as dependent variables.
Conductivity was significantly influenced by other environmental conditions at the study
stations (P < 0.0001, 0.0003). Dissolved oxygen was significantly influenced by other
environmental factors at stations I (P < 0.0007) and V (P < 0.002), and nitrate-nitrogen was
significantly influenced at stations N (P < 0.026) and V (P < 0.0007).

Venkatesh et al. (2009) reported that the r-value varied in the range 0.0608 to
0.9969, depending on the set of parameters considered for analysis. Correlation values
above 0.94 were selected for analysis. The highest correlation was between EC and TDS.
High positive correlations between turbidity and TSS, BOD and COD, and EC and chlorides
were also observed.
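The r-values quoted in these studies are Pearson correlation coefficients. As an illustration of how such a value is computed, the sketch below evaluates r for two short series. The EC and TDS readings are invented for the example and are not taken from any of the cited studies; `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical readings: EC in uS/cm and TDS in mg/L (invented values).
ec = [250.0, 480.0, 730.0, 990.0, 1240.0]
tds = [160.0, 310.0, 465.0, 640.0, 790.0]

r = pearson_r(ec, tds)
# Apply the 0.94 cut-off mentioned above to decide whether to keep the pair.
keep_pair = r > 0.94
```

A value of r close to +1 or -1 indicates a nearly linear relationship, which is what makes one parameter usable as a quick proxy for the other.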

regression line. It can be classified into regression equation, regression coefficient,

individual observation and group distribution.

relationship among them is established. ANOVA indicates the various parameters of the
regression and their relationship. The regression coefficients $\beta_0, \beta_1, \ldots, \beta_p$
(parameters), as well as the variance of the errors, $\operatorname{var}(\varepsilon_i) = \sigma^2$,
are usually unknown and have to be estimated from the observations
$(Y_i, X_{i1}, \ldots, X_{ip})$, $i = 1, 2, \ldots, n$. As in the simple linear regression
model, the least squares criterion can be used to determine the estimates of the
regression coefficients.
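Written out, the multiple linear regression model and its least-squares estimate take the standard textbook form below. The matrix notation ($\mathbf{X}$ for the $n \times (p+1)$ design matrix with a leading column of ones, $\mathbf{y}$ for the vector of observations) is introduced here only for compactness and is an assumption of this sketch; it presumes $\mathbf{X}$ has full column rank.

```latex
% Model for the i-th observation
Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip} + \varepsilon_i,
\qquad \operatorname{var}(\varepsilon_i) = \sigma^2,
\qquad i = 1, 2, \ldots, n

% Least squares criterion: choose the coefficients that minimise
S(\beta_0, \ldots, \beta_p)
  = \sum_{i=1}^{n} \bigl( Y_i - \beta_0 - \beta_1 X_{i1} - \cdots - \beta_p X_{ip} \bigr)^2

% Resulting estimates of the coefficients and of the error variance
\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\mathsf{T}} \mathbf{X})^{-1} \mathbf{X}^{\mathsf{T}} \mathbf{y},
\qquad
\hat{\sigma}^2 = \frac{1}{n - p - 1} \sum_{i=1}^{n} \bigl( Y_i - \hat{Y}_i \bigr)^2
```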

between the independent variable X and the dependent variable Y is modeled as an nth-order
polynomial. Polynomial regression fits a non-linear relationship between the value of X and
the corresponding conditional mean of Y, denoted E(Y|X), and has been used to describe
non-linear phenomena.
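In symbols, an nth-order polynomial regression can be sketched as follows. The coefficient names $\beta_0, \ldots, \beta_n$ follow the usual convention and are an assumption of this illustration. Although the curve is non-linear in X, the model is linear in the coefficients, so the same least squares criterion applies; with n = 3 it gives the cubic form of the equations listed in Table 5.6.

```latex
E(Y \mid X) = \beta_0 + \beta_1 X + \beta_2 X^2 + \cdots + \beta_n X^n
```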

statistical criteria. It selects the variables which will be included in the final regression

equation. Stepwise regression is classified into backward stepwise regression and forward

stepwise regression.
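The forward variant can be sketched in a few lines of Python. This is an illustrative sketch only: the stopping rule used here (a minimum gain in R2, `min_gain`) is an assumption chosen for simplicity, whereas statistical packages typically use an F-test or p-value threshold. The predictor names and all data values are invented, and the helper names `solve`, `r_squared` and `forward_stepwise` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(columns, y):
    """R2 of the least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Normal equations (X'X) beta = X'y.
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_stepwise(predictors, y, min_gain=0.01):
    """Add one predictor at a time, each time the one that raises R2 the most."""
    selected, best_r2 = [], 0.0
    remaining = list(predictors)
    while remaining:
        scores = {name: r_squared([predictors[s] for s in selected + [name]], y)
                  for name in remaining}
        name = max(scores, key=scores.get)
        if scores[name] - best_r2 < min_gain:  # assumed stopping rule
            break
        selected.append(name)
        remaining.remove(name)
        best_r2 = scores[name]
    return selected, best_r2


# Hypothetical candidate predictors and response (invented values).
predictors = {
    "olr": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "ph": [7.1, 6.8, 7.3, 6.9, 7.2, 7.0, 6.7, 7.4],
    "vfa": [0.30, 0.28, 0.33, 0.31, 0.29, 0.32, 0.30, 0.27],
}
cod_removal = [46.0, 48.0, 58.0, 59.0, 67.0, 70.0, 72.0, 84.0]

selected, r2 = forward_stepwise(predictors, cod_removal)
```

With these invented numbers the procedure keeps `olr` and `ph` and leaves out `vfa`, because adding `vfa` no longer improves the fit. Backward stepwise regression works the other way round: it starts from the full set of predictors and removes the least useful one at each step.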

has certain performance characteristics in common with biological neural networks. A
multilayer feed-forward ANN, trained using the error back-propagation learning algorithm,
is employed for this purpose. The ANN is a flexible mathematical structure capable of
identifying complex non-linear relationships between input and output data sets. ANN
models have been found useful and efficient, particularly in problems for which the
characteristics of the processes are difficult to describe using physical equations
(Kuo et al., 1993). Figure 5.1 shows the ANN model, comprising the various parameters and
the hidden layers.

ANNs have proved successful in solving many problems in civil and structural
engineering, wastewater treatment and rainfall-runoff modeling. The ANN structure is
designed to mimic the information-processing functions of a network of neurons, similar to
the brain (Guru Prasad, 2007).
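To make the idea of a feed-forward network trained by error back-propagation concrete, the sketch below implements a very small network in plain Python: one hidden layer of sigmoid units and a single linear output. It is a toy illustration under stated assumptions and not the network used in this study; the layer sizes, learning rate, number of epochs and the scaled input/target values are all invented.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyANN:
    """One hidden layer of sigmoid units and a single linear output unit."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Feed the inputs forward; return hidden activations and the output."""
        hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
                  for ws, b in zip(self.w1, self.b1)]
        output = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, output

    def train_step(self, x, target, lr):
        """One gradient-descent update for the loss 0.5 * (output - target)^2."""
        hidden, output = self.forward(x)
        err = output - target
        for j, h in enumerate(hidden):
            # Error propagated back through output weight j and the sigmoid.
            grad_hidden = err * self.w2[j] * h * (1.0 - h)
            self.w2[j] -= lr * err * h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_hidden * xi
            self.b1[j] -= lr * grad_hidden
        self.b2 -= lr * err


def mse(net, data):
    """Mean squared error of the network over (inputs, target) pairs."""
    return sum((net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


# Hypothetical samples: two inputs scaled to [0, 1] and a scaled target.
data = [([0.1, 0.9], 0.55), ([0.3, 0.7], 0.65), ([0.5, 0.5], 0.72),
        ([0.7, 0.3], 0.80), ([0.9, 0.1], 0.85)]

net = TinyANN(n_in=2, n_hidden=3)
loss_before = mse(net, data)
for _ in range(2000):
    for x, t in data:
        net.train_step(x, t, lr=0.1)
loss_after = mse(net, data)
```

Comparing `loss_before` with `loss_after` shows how repeated back-propagation updates reduce the prediction error. A real application would scale the measured reactor parameters in the same way and hold back part of the data to check the trained network.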

parameters of the HUASB reactor such as HRT (hydraulic retention time), OLR (organic
loading rate), influent pH, effluent pH, VFA/alkalinity ratio, influent COD and effluent
COD act as input parameters, and the output parameters are the percentage removal of COD,
biogas production and methane content. Figures 5.2 to 5.5 show the steps of the statistical
analysis: the architecture layers and algorithms, the performance, and the correlation (R2)
between estimated and measured values for the HUB1 reactor.

Figure 5.4 Correlation between estimated value and measured value for HUB1

Figure 5.5 Predicted value and observed value for Regression Coefficient for HUB 1

Reactor   TCOD, observed vs predicted,       Production, observed vs predicted,   Remarks
          Model I (M1)                       Model II (M2)
HUB2      Y = 0.98x + 1.904 (R2 = 0.994)     Y = 0.547x + 0.08 (R2 = 0.51)        Significant for M1
HUB3      Y = 0.99x + 0.374 (R2 = 0.990)     Y = 0.94x + 0.01 (R2 = 0.964)        Significant for M1
HUB4      Y = 1.09x - 0.565 (R2 = 0.987)     Y = 1.02x - 0.001 (R2 = 0.95)        Significant for M1

Table 5.1 shows that the ANN model predicts the percentage removal of TCOD well for the
reactors, as the R2 values for Model I are above 0.9. HUB3 and HUB4, i.e. the PPW and DW
combinations with and without EM, are found to be efficient in this ANN model study, with
R2 values of about 0.99 for M1 and of at least 0.95 for M2.

(i) Start-up phase, post-start-up conditions and optimization of the HUASB reactor

(ii) Efficiency of the HUASB reactor at varying HRT.

The main aim of this study is to create a multiple linear regression model for the
dependent variables, COD removal (%) and methane production, on several independent
variables such as organic loading rate, influent pH, influent VFA and alkalinity. The
multiple regression equation for the dependent variable Y on the above independent
variables was fitted using the SPSS software through the forward stepwise regression method.

The multiple regression equations for all the reactors, obtained using EXCEL software, are
furnished below. The forward stepwise regression method was followed to find the
significance of the various factors (independent variables) for COD removal (dependent
variable).

based on influent TCOD (X1) HUB 1.

Figure 5.7 Comparison of observed and predicted TCOD removal (Y1) based on OLR

(X2) HUB 2

*: Significant at 5% level.

Figure 5.8 Comparison of observed and predicted methane production rate (Y2)

based on OLR (X2) HUB 3.

Figure 5.9 Comparison of observed and predicted methane production rate (Y2) based

on influent TCOD (X1) HUB 4.

The consolidated regression equations obtained from Tables 4.9 to 4.12 and from
Tables 5.2 to 5.5, with the best R2 values for the models of each reactor, are tabulated in
Table 5.6 for ready reference.

Figures 5.6 to 5.9 depict the comparison of observed and predicted values
obtained from the regression model.

From Table 5.6, it can be seen that the HUB 4 reactor, i.e. PPW and DWW without EM,
appears to be significant for both influent TCOD vs % TCOD removal and OLR vs % TCOD
removal, as the R2 values are above 0.92. Similarly, HUB 1 shows consistent R2 values of
0.875 to 0.894. HUB 3 shows improvement in % TCOD removal with respect to influent TCOD as
well as in methane production. HUB 2 fails to maintain consistency, with a wide variation
between the R2 values for TCOD removal (0.963) and methane production rate (0.142).

By and large, the multiple regression analysis leads to the conclusion that the treatment
of pulp and paper mill wastewater is efficient when EM alone is added, as in the HUB 1
reactor, and that treatment together with DWW is efficient without EM, as in HUB 4.

Table 5.6 Consolidated regression statement for all the reactors using the regression method

Reactor  Equation                                                    R2 value  Remarks
HUB 1    Y1 = 49.178 + 0.029(x1) - 6.3E-06(x1)^2 + 3.34E-10(x1)^3    0.875     Influent TCOD (X1) and TCOD Removal Efficiency (Y1).
HUB 1    Y2 = 0.155 + 5.05E-05(x1) - 1.3E-09(x1)^2 - 1E-12(x1)^3     0.894     Influent TCOD (X1) and Methane Production rate (Y2).
HUB 1    Y1 = 49.162 + 14.632(x2) - 1.588(x2)^2 + 0.042(x2)^3        0.875     Removal Efficiency (Y1).
HUB 1                                                                0.894     Production rate (Y2).
HUB 2                                                                0.963     TCOD % Removal (Y1).
HUB 2    Y2 = 0.538 - 0.000(x1) + 1.41E-07(x1)^2 - 1.4E-11(x1)^3     0.142     Methane Production rate (Y2).
HUB 2                                                                0.963     Removal (Y1).
HUB 2                                                                0.142     Production Rate (Y2).
HUB 3    Y1 = 49.423 + 0.032(x1) - 1E-05(x1)^2 + 9.43E-10(x1)^3      0.454     TCOD Removal Efficiency (Y1).
HUB 3    Y2 = 0.033 - 6.2E-08(x1)^2 + 5E-12(x1)^3                    0.514     Regression Analysis of Influent TCOD (X1) and Methane Production rate (Y2).
HUB 3    Y1 = 42.049 + 15.400(x2) - 1.784(x2)^2 + 0.056(x2)^3        0.883     OLR (X2) and TCOD Removal Efficiency (Y1).
HUB 3    Y2 = 0.011 + 0.083(x2) - 0.008(x2)^2                        0.897     OLR (X2) and Methane Production rate (Y2).
HUB 4    Y1 = 33.780 - 0.014(x1) + 1.26E-05(x1)^2 - 1.7E-09(x1)^3    0.929     TCOD % Removal (Y1).
HUB 4    Y2 = 0.113 - 1.4E-05(x1) + 3.58E-08(x1)^2 - 5.8E-12(x1)^3   0.943     Methane Production Rate (Y2).
HUB 4    Y1 = 33.776 - 6.031(x2) + 2.184(x2)^2 - 0.124(x2)^3         0.929     Regression Analysis OLR (X2) and TCOD % Removal (Y1).
HUB 4    Y2 = 0.113 - 0.005(x2) + 0.006(x2)^2                        0.943     Regression Analysis OLR (X2) and Methane Production Rate (Y2).
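As an illustration of how an equation from Table 5.6 can be used, the sketch below evaluates the HUB 1 cubic for TCOD removal efficiency (Y1) against influent TCOD (X1). The coefficients are copied from the table; the influent TCOD value of 1000 is a hypothetical input chosen only for the example, in whatever units were used in the study, and `eval_poly` is an illustrative helper name.

```python
def eval_poly(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# HUB 1, Y1 versus X1 (Table 5.6): constant, linear, quadratic, cubic terms.
hub1_y1 = [49.178, 0.029, -6.3e-06, 3.34e-10]

x1 = 1000.0  # hypothetical influent TCOD value, for illustration only
y1 = eval_poly(hub1_y1, x1)  # predicted TCOD removal efficiency (%)
```

A fitted polynomial of this kind should only be evaluated within the range of influent TCOD actually observed for the reactor, since cubic terms can behave erratically outside the fitted range.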


