VADE MECUM
2. STATISTICS
Rev. 2002
CEMENT PROCESS ENGINEERING SECTION 2 – STATISTICS
VADE-MECUM
1. Descriptive Statistics
1.1 Definitions
• Statistics is the science of drawing conclusions about a population based on an analysis of sample data from
that population.
• Population: values that can be taken by a variable.
• Sample: drawing of n values of the variable taken from the population.
• Random Variable = $X = (x_i)$.
• Probability Distribution = $P(x_i)$. It gives the probability of occurrence of each value of the random variable and is characterized by its parameters. (Example: the Normal distribution is described by $\mu$ and $\sigma$, see below).
• Statistic = Any function of the sample data.
• Estimator = An estimator of a parameter is a statistic that corresponds to the parameter. For instance:
- The sample mean ($\bar{x}$) is the estimator of the actual population mean $\mu$
- The sample variance ($S^2$) is the estimator of the actual population variance $\sigma^2$
• Interval Estimation: An interval estimation of a parameter is the interval between 2 statistics that includes the
true value of the parameter with a given probability ($1-\alpha$).
1.2 Basic
• Arithmetical Mean = $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
• Standard Deviation = $S_X = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
• Variance = $S_X^2$
- $S_{X+Y}^2 = S_X^2 + S_Y^2$ and $S_{aX}^2 = a^2 \cdot S_X^2$
- $a$: Coefficient, $X = (x_i)$, $Y = (y_i)$: two series of independent values.
• Covariance = Average of the products of paired deviations: $COV(X,Y) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
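The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original handbook: the function names and the two data series are hypothetical. The covariance divides by n as written above, while the variance divides by n-1.

```python
import math

def mean(values):
    """Arithmetical mean of a series."""
    return sum(values) / len(values)

def sample_variance(values):
    """Sample variance S^2 with the n-1 denominator."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def sample_std(values):
    """Sample standard deviation S."""
    return math.sqrt(sample_variance(values))

def covariance(xs, ys):
    """Average of the products of paired deviations (denominator n, as in the text)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data, chosen only to illustrate the calls.
x = [2.0, 4.0, 4.0, 6.0]
y = [1.0, 3.0, 2.0, 6.0]

print(mean(x), sample_variance(x), sample_std(x), covariance(x, y))
```

Scaling a series by a coefficient a multiplies its variance by a squared, which can be checked by comparing `sample_variance([3 * v for v in x])` with `9 * sample_variance(x)`.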
1.3 Normal Probability Distribution
• The most often used probability distribution is the Normal probability distribution:
$\frac{dZ}{dx} = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - \bar{x}}{\sigma}\right)^2}$
Central Limit Theorem
• For a group of n independent sampling units drawn from a population of mean $\mu$ and variance $\sigma^2$, the
sampling distribution of $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is approximately Normal with mean $\mu$ and variance $\frac{\sigma^2}{n}$. In short:
$\bar{x} \to Z\left(\mu, \frac{\sigma^2}{n}\right)$.
2.2 Test for the Equality of Two Variances ($\sigma_1$, $\sigma_2$) of Two Normal Populations of Random
Size ($n_1$, $n_2$)
(Excel Function FTEST)
Test Description
• $H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \neq \sigma_2^2$
• We compute the statistic $F_0 = \frac{S_1^2}{S_2^2}$, where F = Fisher Distribution
• We reject $H_0$ if $F_0 > F_{\alpha/2,\, n_1-1,\, n_2-1}$ or if $F_0 < F_{1-\alpha/2,\, n_1-1,\, n_2-1}$
• Where $F_{\alpha/2,\, n_1-1,\, n_2-1}$ and $F_{1-\alpha/2,\, n_1-1,\, n_2-1}$ denote the upper and lower $\alpha/2$ percentage points of the
F distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom, respectively.
• As the F table gives only the upper tail points of the F distribution, to find $F_{1-\alpha/2,\, n_1-1,\, n_2-1}$ we must
use: $F_{1-\alpha/2,\, n_1-1,\, n_2-1} = \frac{1}{F_{\alpha/2,\, n_2-1,\, n_1-1}}$ (be careful about $n_1$ and $n_2$, which are inverted).
After computing: $\bar{x}_1 = 4586$, $\bar{x}_2 = 4454$, $S_1^2 = \frac{\sum_{i=1}^{8} (x_i - 4586)^2}{8-1} = 34796$, $S_2^2 = 26598$
• $F_0 = \frac{34796}{26598} = 1.308 < F_{0.05/2,\, 8-1,\, 8-1} = 4.99$
and $F_{0.975,7,7} = (F_{0.025,7,7})^{-1} = (4.99)^{-1} = 0.20 < 1.308$
The test does not reject $H_0$: the measurements do not allow us to conclude, at the 5% significance level, that sampling
method #1 differs significantly from #2 (even if $S_1 > S_2$). The Excel function for the critical value is FINV(0.025,7,7).
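The decision rule can be sketched as follows, using the numbers of the example. This is an illustrative sketch only: the function name is hypothetical, and the critical values are assumed to be read from an F table (or FINV) and passed in by the caller rather than computed.

```python
def two_sided_f_test(s1_sq, s2_sq, f_upper_12, f_upper_21):
    """Two-sided test for equality of two variances.

    f_upper_12: upper alpha/2 point of F with (n1-1, n2-1) degrees of freedom.
    f_upper_21: upper alpha/2 point of F with (n2-1, n1-1) degrees of freedom,
                used to derive the lower point by inversion.
    Returns (F0, lower critical value, reject H0?).
    """
    f0 = s1_sq / s2_sq
    f_lower = 1.0 / f_upper_21  # lower tail point from the upper tail table
    reject = f0 > f_upper_12 or f0 < f_lower
    return f0, f_lower, reject

# Values from the example: n1 = n2 = 8, alpha = 0.05, F(0.025, 7, 7) = 4.99
f0, f_lower, reject = two_sided_f_test(34796, 26598, 4.99, 4.99)
print(round(f0, 3), round(f_lower, 2), reject)
```

Because both samples have the same size here, the two table values coincide; with unequal sizes the degrees of freedom must be swapped for the second one.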
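[Figure: regression line $Y_{Est} = B_0 + B_1 X$ through the data points, showing the intercept $B_0$, the slope $B_1$ and the error $E = Y - Y_{Est}$]
- SST = total sum of square of the variable of interest = $\sum_{i=1}^{n} (y_i - \bar{y})^2$
- SSE = sum of square of errors = $\sum_{i=1}^{n} (E_i - \bar{E})^2$
- SSR = sum of square explained by the regression line: SST = SSR+SSE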
• We want to maximize SSR/SSE. Thus we test the hypothesis that the slope B1 equals 0:
$H_0: B_1 = 0$, $H_1: B_1 \neq 0$.
• Under $H_0$, the ratio (SSR/p)/(SSE/(n-p-1)), where p is the number of explanatory variables, follows a Fisher distribution with p and n-p-1 degrees of freedom
(Excel function FINV(α, p, n-p-1)).
• If the computed ratio exceeds the critical value $F_\alpha$, then $H_0$ is rejected and, at significance level $\alpha$, we consider the regression significant.
Coefficient of Determination R2
• The coefficient of determination R2=SSR/SST gives the proportion of variation in the dependent variable
($Y = (y_i)_{i=1 \ldots n}$) explained by the regression line.
• The coefficient of correlation is defined by: r =sqrt (R2).
Example
H0: there is no correlation
n=5, p=1, SSR=0.051, SSE=0.019, SST=0.051+0.019=0.070, MSR=0.051/1=0.051, MSE=0.019/3=0.0063, F=0.051/0.0063=8.05
R2 = 0.051 / (0.051 + 0.019) = 0.73, r = 0.85
Critical F value (α = 0.025), F1,3,0.025 = 17.44 > 8.05
The ratio belongs to the F distribution.
We cannot reject H0, the regression is not significant.
[Figure: SO3 versus CaO scatter plot with regression line Y = 2.077 - 0.032 * X; R^2 = 0.727]
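The arithmetic of this example can be reproduced with a short sketch. The function names are hypothetical, and the critical value 17.44 is taken from the example rather than computed.

```python
def regression_f_statistic(ssr, sse, n, p):
    """Ratio (SSR/p) / (SSE/(n-p-1)) used to test H0: slope = 0."""
    msr = ssr / p            # mean square explained by the regression
    mse = sse / (n - p - 1)  # mean square of the errors
    return msr / mse

def r_squared(ssr, sse):
    """Coefficient of determination R2 = SSR / SST, with SST = SSR + SSE."""
    return ssr / (ssr + sse)

# Values from the example: n = 5 points, p = 1 explanatory variable
f_stat = regression_f_statistic(0.051, 0.019, n=5, p=1)
r2 = r_squared(0.051, 0.019)
f_critical = 17.44  # F(0.025; 1, 3), as quoted in the example
significant = f_stat > f_critical

print(round(f_stat, 2), round(r2, 2), significant)
```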
4.2 Variogram
a) Variogram Construction
• A variogram is a plot of the average squared difference of a selected variable (C3S for example) between pairs of units,
as a function of the time or distance h separating the two units of a pair, where h is chosen in whole-number multiples
of the sampling interval (e.g. every minute, 2 minutes, 1 meter, 2 meters, …).
$\gamma_X(h) = \frac{\sum_{j=1}^{N} (x_j - x_{j+h})^2}{2 \cdot (N - 1)}$
with:
- j: numbering of the sample's value
- N: number of pairs of samples with a specific time or spatial distance (= h) between values of a pair.
Example:
The C3S values of kiln feed samples are:
Sample#    1     2     3     4     5     6     7     8     9     10
Time       1:00  2:00  3:00  4:00  5:00  6:00  7:00  8:00  9:00  10:00
C3S (%)    54.2  57.8  59.8  61.2  60.0  56.0  52.0  52.0  52.4  57.0
Then we can calculate the one-hour pair difference:
Pair#         1      2     3     4     5     6     7     8     9      Sum
Diff in pair  3.6    2     1.4   -1.2  -4    -4    0     0.4   4.6
Square diff   12.96  4     1.96  1.44  16    16    0     0.16  21.16  73.7
Then $\gamma_{C3S}(1\ hour) = \frac{73.7}{2 \cdot (9 - 1)} = 4.6$
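The same calculation can be sketched in Python for any lag h, using the C3S series of the example. The function name is hypothetical. The denominator 2·(N-1) follows the formula given in this section; other texts often divide by 2·N.

```python
def variogram(values, lag):
    """Variogram value for a lag expressed in number of sampling intervals.

    Uses the denominator 2 * (N - 1), where N is the number of pairs,
    as in the formula of this section.
    """
    diffs = [values[j + lag] - values[j] for j in range(len(values) - lag)]
    n_pairs = len(diffs)
    if n_pairs < 2:
        raise ValueError("at least two pairs are needed for this lag")
    return sum(d * d for d in diffs) / (2 * (n_pairs - 1))

# Hourly C3S (%) values of the kiln feed samples from the example
c3s = [54.2, 57.8, 59.8, 61.2, 60.0, 56.0, 52.0, 52.0, 52.4, 57.0]

gamma_1h = variogram(c3s, 1)
print(round(gamma_1h, 1))

# Variogram points for lags up to half the series length
points = [(h, variogram(c3s, h)) for h in range(1, len(c3s) // 2 + 1)]
```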
Two rules for variogram construction
• Collect enough units (N) to get a statistical population (at least 30 samples for a short term experiment and 60
samples for a long term); the short term intends to define very precisely the random heterogeneity term (nugget
effect, refer below).
• The number N should reach half the total amount of samples collected (N>n/2).
b) Variogram Interpretation
Interpretation of the limit of variogram $\gamma_X(h)$ when h increases
• Whatever the variable is, beyond a certain value of h, the variable ceases to be
correlated with itself. It is because the phenomenon taking place has no longer
any memory of a past long gone (see case 2 and case 3 where the variable levels
off at a sill generally equal to the variance of the variable).
• This is true for all raw mix analyses, which are limited in terms of the values they
can take.
• However, over a short period of time (a few hours), the signal may well drift.
(See graph below). In such a case, the variogram will tend to increase instead of
stabilizing itself around $\sigma_x^2$.
[Figure: drifting signal X(t) and its variogram $\gamma_X(h)$]
The "Nugget Effect"
• Many variables, especially those obtained from data
measured with a dispersive method (analytical,
sampling errors, etc.), present a slight or marked
degree of strictly random variations from one value
to the next.
• As a rule, a variable presenting a "smooth" graph (#
[Figure: case #1, signal X(t) and its variogram $\gamma_x(h)$ showing the nugget effect, with $\sigma_x^2 = \sigma_{xn}^2$; second case, signal X(t) and its variogram $\gamma_x(h)$ with sill $\sigma_x^2$]
Limitations in h value
• If N values of X are available, shifts of more than N/2 should not be considered.
Regionalization and prediction
• A very frequent pattern of variogram is
shown as below:
[Figure: variogram $\gamma_X(h)$ rising from the nugget $\sigma_{xn}^2$ to the sill $\sigma_x^2$]
• The span of values of ho for which $\gamma_x(h)$ is below $\sigma_x^2$ is
called the "area of regionalization" or the range.
• The value of the signal at time t + ho is in fact dependent
on all values taken by X between t and t + ho.
• If all values $x_i, x_{i+1}, \ldots, x_{i+h-1}$ are known, then $x_{i+h}$ can
be predicted much better than by saying that it is
randomly distributed with a variance $\sigma_x^2$.
Pseudo-periodicity
• The periodic variations can be self-sustained (control
cycle, oscillator, etc.) or induced by a periodic
phenomenon (buckets of elevator are unevenly distributed,
correction interval of raw meal).
• Even if the periodicity is blurred on the graph of the signal
[Figure: pseudo-periodic signal X(t) and its variogram $\gamma_x(h)$, with the level $\sigma_x^2$ marked]
“Correctogram” is a simple statistical tool which can be used to determine whether over-control or under-control is
occurring in a control loop. For spot checking, a plot of the correctogram can be used.
Slope            Interpretation
= 0              Perfectly tuned control. All off-target values for the control parameter are due to random
                 variations (materials, feeder accuracy, etc.)
1 > slope > 0    Undercontrolling. Multiply gain by (1 + slope).
= 1              No control taking place.
> 1              Divergent control: gain value has wrong sign.
0 > slope > -1   Overcontrolling. Divide gain by (1 - slope).
= -1             Overcontrolling is inducing a cycle with period = 2 x sampling interval. Divide gain by 2.
< -1             Divergent cycling due to severe overcontrolling. Divide gain by (1 - slope).
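The table can be read as a small decision rule. The sketch below is one hypothetical way to encode it. The function name and return convention are assumptions. The source gives no gain correction for slopes of 1 or above, so the sketch returns None in those cases.

```python
def correctogram_advice(slope, gain):
    """Return (diagnosis, suggested gain) from the correctogram slope.

    The suggested gain is None when the table gives no numerical correction.
    """
    if slope > 1:
        return "divergent control: gain value has wrong sign", None
    if slope == 1:
        return "no control taking place", None
    if slope > 0:
        return "undercontrolling", gain * (1 + slope)
    if slope == 0:
        return "perfectly tuned control", gain
    # Negative slopes: every row of the table divides the gain by (1 - slope);
    # at slope = -1 this is the "divide gain by 2" case.
    if slope < -1:
        diagnosis = "divergent cycling due to severe overcontrolling"
    elif slope == -1:
        diagnosis = "overcontrolling with a cycle of period 2 x sampling interval"
    else:
        diagnosis = "overcontrolling"
    return diagnosis, gain / (1 - slope)

print(correctogram_advice(0.25, 2.0))
print(correctogram_advice(-1, 2.0))
```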
5. Sampling
5.1 Golden Rules
• The MRW.
• The sampling method must allow every particle the same chance of being collected.
• $C = f \cdot c \cdot l \cdot g$ with
- f = Particle shape factor (= 0.5 usually, ranges between 0 and 1)
= 1 when cubic, = 0.2 when flat, = 0.5 when spheroidal
- l = liberation factor [0 to 1]
= 0 if homogeneous, = 1 if particles completely distinct, = 0.001 for homogeneous raw mix, = 0.2
medium, = 0.3 to 0.8 heterogeneous
- g = factor describing the particle size distribution
• If we call “size range” the ratio $d_M / d_m$ of the upper size limit $d_M$ (about 5% oversize) to the lower size
limit $d_m$ (about 5% undersize):
- Large size range ($d_M / d_m$ > 4): g = 0.25, medium size range (4 to 2): g = 0.50, small size range (< 2): g
= 0.75, uniform size ($d_M / d_m$ = 1): g = 1.00
Usually we take $\rho_i = \rho_{ic}$
Example:
A mix of 75% lime and 25% clay is crushed at 12.5 mm; CaO is the critical component.
Sample weight = 50 kg.
l = 0.3, f = 0.5
CaO lime content = 52%, CaO clay content = 24%
$\rho_{CaO} = 2.7\ g/cm^3$, $\rho_{lime} = 2.7$, $\rho_{clay} = 2.7$, g = 0.25
$c = 0.75 \times \frac{1 - 0.52}{0.52} \times 2.7 + 0.25 \times \frac{1 - 0.24}{0.24} \times 2.7 = 1.869 + 2.137 = 4.00\ g/cm^3$
so that $C = f \cdot c \cdot l \cdot g = 0.5 \times 4.00 \times 0.3 \times 0.25 = 0.15$
And: $\sigma(FE) = \sqrt{\frac{(1.25)^3 \times 0.15}{50{,}000}} = 2.4 \cdot 10^{-3}$ is the fundamental error standard deviation.
Then the 95% confidence interval $\pm 2\sigma(FE)$ is $\pm 0.0048$ in relative terms, and the CaO content confidence interval is:
$0.52 \cdot (1 \pm 2\sigma(FE)) = 0.52 \cdot (1 \pm 0.0048)$, i.e. 52% ± 0.25% CaO (a relative precision of ± 0.48%). (Considering that $1 - \tau \approx 1$)
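The numbers of this example can be checked with a short sketch. It is illustrative only: the function names are hypothetical, and the expression used for c (a sum over components of mass fraction × (1 - grade)/grade × density) is an assumption inferred from the arithmetic of the example, not a formula stated in this section.

```python
import math

def mineralogical_factor(fractions, grades, densities):
    """c in g/cm3, assumed form: sum of w * (1 - a) / a * rho over components."""
    return sum(w * (1 - a) / a * rho
               for w, a, rho in zip(fractions, grades, densities))

def sampling_constant(f, c, l, g):
    """Sampling constant C = f * c * l * g."""
    return f * c * l * g

def fundamental_error_std(big_c, d_cm, mass_g):
    """Relative standard deviation of the fundamental error, sqrt(C * d^3 / M)."""
    return math.sqrt(big_c * d_cm ** 3 / mass_g)

# Example data: 75% lime at 52% CaO, 25% clay at 24% CaO, densities 2.7 g/cm3
c = mineralogical_factor([0.75, 0.25], [0.52, 0.24], [2.7, 2.7])
C = sampling_constant(f=0.5, c=c, l=0.3, g=0.25)
sigma_fe = fundamental_error_std(C, d_cm=1.25, mass_g=50_000)

# 95% interval on a 52% CaO content, in absolute % CaO
half_width = 52.0 * 2 * sigma_fe
print(round(c, 2), round(C, 2), round(sigma_fe, 4), round(half_width, 2))
```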
$MRW = \frac{C \cdot d_M^3}{\sigma(FE)^2} = \frac{0.15 \times 4^3}{(0.04)^2} = 6000\ g = 6\ kg$
5.4 Estimation of the Maximum Particle Size
• Assuming we want to take a sample of at most 5 kg with a tolerated standard deviation of $\sigma = 0.04$
Then: $d_M = \sqrt[3]{\frac{M \sigma^2}{C}} = \sqrt[3]{\frac{5000 \times 0.04^2}{0.15}} = 3.8\ cm$
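The two rearrangements of the same relation (minimum weight for a given top size, and top size for a given weight) can be sketched as below. The function names are hypothetical, and masses are assumed to be in grams and sizes in centimetres, as in the examples.

```python
def minimum_representative_weight(big_c, d_max_cm, sigma):
    """MRW in grams: C * dM^3 / sigma^2."""
    return big_c * d_max_cm ** 3 / sigma ** 2

def max_particle_size(big_c, mass_g, sigma):
    """Largest top size dM in cm for a given sample mass: (M * sigma^2 / C)^(1/3)."""
    return (mass_g * sigma ** 2 / big_c) ** (1.0 / 3.0)

# Values from the examples: C = 0.15, sigma = 0.04
mrw_g = minimum_representative_weight(0.15, d_max_cm=4.0, sigma=0.04)
d_max = max_particle_size(0.15, mass_g=5000, sigma=0.04)

print(round(mrw_g / 1000, 1), "kg")
print(round(d_max, 1), "cm")
```

The two functions are inverses of each other: feeding `d_max` back into `minimum_representative_weight` returns the 5000 g sample mass.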
a) Rule of Thumb:
Maximum Particle Size (mm) 10 20 30 40 50 60 75 90
Min sample Coal (ISO1988), kg 0.6 0.8 3
Min Sample Aggregate, ASTM D75, kg 10 25 60 80 100 120 150 175
ASTM for the aggregate industry is very safe.