
EDEXCEL ANALYTICAL METHODS FOR ENGINEERS H1

UNIT 2 - NQF LEVEL 4


OUTCOME 4 - STATISTICS AND PROBABILITY
TUTORIAL 2 STATISTICAL DATA 2
Tabular and graphical form: data collection methods; histograms; bar charts; line diagrams;
cumulative frequency diagrams; scatter plots
Central tendency and dispersion: the concept of central tendency and variance measurement;
mean; median; mode; standard deviation; variance and interquartile range; application to
engineering production
Regression, linear correlation: determine linear correlation coefficients and regression lines and
apply linear regression and product moment correlation to a variety of engineering situations
Probability: interpretation of probability; probabilistic models; empirical variability; events and
sets; mutually exclusive events; independent events; conditional probability; sample space and
probability; addition law; product law; Bayes theorem
Probability distributions: discrete and continuous distributions, introduction to the binomial,
Poisson and normal distributions; use of the Normal distribution to estimate confidence intervals
and use of these confidence intervals to estimate the reliability and quality of appropriate
engineering components and systems

This tutorial continues and extends the work of tutorial 1.
On completion of this tutorial you should be able to do the following.

Explain and find the standard deviation and variance for grouped and ungrouped data.

Explain and use the normal distribution curve.

D.J.Dunn www.freestudy.co.uk

1. STANDARD DEVIATION: AN EXPLANATION OF THE MEANING

Data about things like people's height varies about a mean value. If the variation is random then there would be, for example, roughly equal numbers of short and tall people. We covered the mean value in the last tutorial. The STANDARD DEVIATION is a measurement that tells us whether the data is concentrated close to the mean or spread out over a wide range. Plots of data like this give us the characteristic bell shaped curve.
The STANDARD DEVIATION uses the symbol σ (sigma). If σ is small, the data is concentrated close to the mean. If σ is large, the data is spread out. If we have the same number of samples, then small values of σ produce tall, narrow graphs and large values of σ produce short, wide graphs.

The area under the frequency distribution graph tells us how many samples fall within a given range of x. For a perfect distribution, σ is a value that divides the frequency distribution up in fixed proportions. About 68.26% of all samples would be within a range of ±σ of the mean. About 95.44% would be inside the range ±2σ of the mean.
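As a check on these percentages, the short Python sketch below evaluates the area within ±kσ of the mean for an ideal normal distribution. It is an illustration only: the function name `fraction_within` is made up here, and it assumes the standard result that this area equals erf(k/√2). The exact figures for ±σ and ±2σ are 68.27% and 95.45%, which differ slightly in the last digit from values built up from four-figure tables.

```python
import math

def fraction_within(k: float) -> float:
    """Fraction of an ideal normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within ±{k}σ of the mean: {100 * fraction_within(k):.2f}%")
```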

Consider the manufacture of electronic resistors. If we measured a large set of data from the total population being manufactured, we would want the mean value to be the nominal value of the resistor and all values to fall within the tolerance band required. In manufacturing, control of dimensions and values generally requires a small σ to indicate accuracy and close tolerance.
A small value of σ with the correct mean would indicate accurate values.
A large value of σ would indicate a lot of resistors outside the required range.
The number of samples less than a given value of x is the accumulative (cumulative) frequency.

$$\text{acc.}f = \int_{-\infty}^{x} f\,dx$$
When plotted against x we have the OGIVE. We can use this to find the percentile values and quartiles as described in the last tutorial. The median is the value of x at the centre (50%) of the vertical scale. In the ideal case the mean corresponds to the median but remember that real data does not produce perfect shapes.
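The running total behind an ogive can be sketched in a few lines of Python. The band edges and frequencies below are hypothetical values invented for illustration, and the helper `first_edge_reaching` is not from the tutorial; it simply reports the first band edge at which the accumulative frequency reaches a chosen fraction of the total.

```python
from itertools import accumulate

# Hypothetical grouped data: upper edge of each band and the number of samples in it.
upper_edges = [10, 20, 30, 40, 50]
freqs = [2, 8, 20, 8, 2]

cum_freq = list(accumulate(freqs))  # running total = accumulative frequency
total = cum_freq[-1]

def first_edge_reaching(fraction):
    """Smallest band edge at which the accumulative frequency reaches fraction * total."""
    target = fraction * total
    for edge, cf in zip(upper_edges, cum_freq):
        if cf >= target:
            return edge
    return upper_edges[-1]

print(cum_freq)                   # accumulative frequencies for plotting the ogive
print(first_edge_reaching(0.5))   # band edge by which half the samples are counted
```

Reading the 50% point gives the band containing the median; the 25% and 75% points give the quartiles in the same way.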
The following work shows how to calculate σ and then how we model these distributions with an ideal formula to give the normal distribution curve.


2. CALCULATION OF STANDARD DEVIATION for UNGROUPED DATA

Ungrouped data is presented in a table listing the value of each sample. With a large number of samples the table becomes unwieldy, so this method is best used with small numbers of samples.
DEFINITION of σ

σ = Standard deviation = Square root of the mean of the squared deviations from the mean.

The variance is denoted S = σ²

$$S = \sigma^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

n = number of samples

You might find it better to arrange your tables in columns rather than rows. Let's look at an example.

WORKED EXAMPLE No. 1

The following is a table of the measured resistance of ten resistors. Calculate the mean and the standard deviation.
Sample           Resistance (Ohms)   Difference from mean   Differences squared
1                119                 -1.8                   3.24
2                120                 -0.8                   0.64
3                120                 -0.8                   0.64
4                121                 +0.2                   0.04
5                122                 +1.2                   1.44
6                119                 -1.8                   3.24
7                119                 -1.8                   3.24
8                122                 +1.2                   1.44
9                123                 +2.2                   4.84
10               123                 +2.2                   4.84
Totals (n = 10)  1208                                       23.6

Mean = 1208/10 = 120.8


The sum of the squares of the differences (or deviations) from the mean, 23.6, is now divided by the total number of observations minus one, to give the variance.
VARIANCE

$$S = \sigma^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{23.6}{9} = 2.622$$

Finally, the square root of the variance provides the standard deviation:

$$\sigma = \sqrt{2.622} = 1.619 \text{ Ohms}$$
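The same calculation can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions divide by n - 1 as in the definition above. This is a minimal sketch using the ten values from the table; the variable names are chosen here for illustration.

```python
import statistics

# The ten resistance readings from the table, in Ohms.
resistance = [119, 120, 120, 121, 122, 119, 119, 122, 123, 123]

mean = statistics.mean(resistance)
variance = statistics.variance(resistance)  # sum of squared deviations / (n - 1)
sigma = statistics.stdev(resistance)        # square root of the variance

print(f"mean = {mean:.1f}, variance = {variance:.3f}, sigma = {sigma:.3f}")
```

If the divisor n is wanted instead, `statistics.pvariance` and `statistics.pstdev` do the same job for a whole population.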


SELF ASSESSMENT EXERCISE No. 1

1. The hardness of ten steel samples was measured and the results were as follows.

Sample     1    2    3    4    5    6    7    8    9    10
Hardness   90   92   95   91   98   102  97   92   95   99

Calculate the mean and the standard deviation. (Answer 95.1 and 3.9)

2. The thickness of 10 steel strips was measured in mm and tabulated as shown.

Sample      1     2     3     4     5     6     7     8     9     10
Thickness   19.8  19.9  19.9  20.1  20.1  19.9  20.2  19.7  19.7  19.9

Calculate the mean and the standard deviation. (Answer 19.92 and 0.169)


3. CALCULATION OF STANDARD DEVIATION for GROUPED DATA

We need to discuss what x means. If you were throwing a dice over and over again you would get
a score of exactly 1, 2, 3, 4, 5 or 6. Hence x can only be these exact numbers and if you throw the
dice repeatedly you can measure how many times a particular number comes up.
In the case of continuous variables such as height, weight, size and so on, you can get values of x
with as many decimal places as required to express the accuracy of the measurement. In order to do
anything meaningful, we have to count the number of samples f that fall within a specified range
of each x value used in the plot. Grouped data is presented in tables showing the bands and the
frequency and is more likely to be used with large numbers of samples.
For example, suppose the strength of spot welds is measured and the numbers falling within a band of 10N are plotted to give a graph. (This is a fictional set of data.) x is the variable representing the strength in Newtons at the middle of each band and f is the number in each band.

If the data is truly random and no factors exist to make the results biased to one extreme or the
other, the plots usually compare well with the normal distribution curve.
Consider a bell shaped distribution curve. The mean occurs at or near the middle. The deviation from the mean at any point is d. Next consider the graph of d plotted against f and further the graph of d² plotted against f. On this last graph we find the mean d² as follows.

The mean height of the graph is the variance

$$S = \frac{d_1^2 + d_2^2 + d_3^2 + \dots + d_n^2}{n}$$

In general

$$S = \frac{\sum d^2}{n}$$

For reasons not explained here, n - 1 is often used instead of n on the bottom line. Substitute d = x - x̄, and count each value of x as many times as it occurs (f), so that n = Σf:

$$S = \sigma^2 = \frac{\sum f (x - \bar{x})^2}{n - 1}$$

σ is the standard deviation.


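It can be shown that, with n = Σf on the bottom line, the formula simplifies to

$$\sigma^2 = \frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2$$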
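The simplification can be sketched as follows, assuming the divisor is n = Σf (not n - 1) and writing x̄ = Σfx/Σf so that Σfx = n x̄. This short derivation is added for illustration and is not part of the original tutorial.

```latex
\begin{aligned}
\sum f (x - \bar{x})^2 &= \sum f x^2 - 2\bar{x} \sum f x + \bar{x}^2 \sum f \\
                       &= \sum f x^2 - 2 n \bar{x}^2 + n \bar{x}^2 \\
                       &= \sum f x^2 - n \bar{x}^2 \\[4pt]
\sigma^2 = \frac{\sum f (x - \bar{x})^2}{n}
                       &= \frac{\sum f x^2}{\sum f} - \left( \frac{\sum f x}{\sum f} \right)^2
\end{aligned}
```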

WORKED EXAMPLE No. 2

The following is a grouped set of data for visits made to the doctor by a sample of children.

Visits to doctor   No. of children   Total visits   Cumulative frequency   x²    f x²
x                  f                 f x
0                  2                 0              2                      0     0
1                  8                 8              10                     1     8
2                  27                54             37                     4     108
3                  45                135            82                     9     405
4                  38                152            120                    16    608
5                  15                75             135                    25    375
6                  4                 24             139                    36    144
7                  1                 7              140                    49    49
Totals             Σf = n = 140      Σfx = 455                                   Σfx² = 1697

Mean number of visits = 455/140 = 3.25.

$$\sigma^2 = \frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2 = \frac{1697}{140} - \left(\frac{455}{140}\right)^2 = 1.56$$

σ = 1.25

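Some texts use n - 1 on the bottom line, which gives the formula as

$$\sigma^2 = \frac{\sum f x^2 - \dfrac{\left(\sum f x\right)^2}{\sum f}}{\sum f - 1} = \frac{1697 - \dfrac{455^2}{140}}{139} = 1.57 \qquad \sigma = 1.25$$

This does not make much difference so long as the total number of samples is large.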
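The grouped-data calculation can be sketched in Python as below. The function `grouped_stats` and its `sample` flag are names invented for this illustration; with `sample=False` the sum of squared deviations is divided by n = Σf, and with `sample=True` it is divided by n - 1.

```python
import math

def grouped_stats(xs, fs, sample=False):
    """Return (mean, standard deviation) for values xs occurring with frequencies fs."""
    n = sum(fs)
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    sum_fx2 = sum(f * x * x for x, f in zip(xs, fs))
    mean = sum_fx / n
    squared_devs = sum_fx2 - sum_fx ** 2 / n   # equals the sum of f * (x - mean)^2
    divisor = n - 1 if sample else n
    return mean, math.sqrt(squared_devs / divisor)

# Data from Worked Example No. 2.
visits = [0, 1, 2, 3, 4, 5, 6, 7]
children = [2, 8, 27, 45, 38, 15, 4, 1]

mean_n, sigma_n = grouped_stats(visits, children)
mean_n1, sigma_n1 = grouped_stats(visits, children, sample=True)
print(f"mean = {mean_n:.2f}, sigma (n) = {sigma_n:.3f}, sigma (n - 1) = {sigma_n1:.3f}")
```

With 140 samples the two divisors give standard deviations that agree to two decimal places.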
WORKED EXAMPLE No. 3

The hardness of 143 samples of steel is measured and grouped into bands as shown. Calculate
the mean and standard deviation. The figures of 17.5 and 21.5 result from one sample being
exactly 91 units and so half is allocated to each band.
Range     mid point x   freq. f   f x      acc f   x²      f x²
89-91     90            17.5      1575     17.5    8100    141750
91-93     92            21.5      1978     39      8464    181976
93-95     94            32        3008     71      8836    282752
95-97     96            38        3648     109     9216    350208
97-99     98            17        1666     126     9604    163268
99-101    100           17        1700     143     10000   170000
Totals                  143       13575                    1289954

Mean = 13575/143 = 94.93

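$$\sigma^2 = \frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2 = \frac{1289954}{143} - \left(\frac{13575}{143}\right)^2 = 8.939$$

σ = 2.99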

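It is of interest to note that in this population, with 143 samples, we get almost the same answer using the other formula with n - 1 on the bottom line.

$$\sigma^2 = \frac{\sum f x^2 - \dfrac{\left(\sum f x\right)^2}{\sum f}}{\sum f - 1} = \frac{1289954 - \dfrac{13575^2}{143}}{142} = 9.002$$

σ = 3.00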

SELF ASSESSMENT EXERCISE No. 2

1. The accuracy of 100 instruments was measured as a percentage and the results were grouped into bands of 1% as shown. Calculate the mean and the standard deviation.

Range        Mid   freq
61.5-62.5    62    1
62.5-63.5    63    2
63.5-64.5    64    3
64.5-65.5    65    4
65.5-66.5    66    8
66.5-67.5    67    12
67.5-68.5    68    13
68.5-69.5    69    18
69.5-70.5    70    14
70.5-71.5    71    10
71.5-72.5    72    5
72.5-73.5    73    4
73.5-74.5    74    3
74.5-75.5    75    2
75.5-76.5    76    1

(Answers 68.88% and 2.74%)

2. The breaking strengths of 150 spot welds were measured in Newtons and grouped into bands of 20 N as shown.

Range      f
160-180    2
180-200    6
200-220    10
220-240    28
240-260    50
260-280    31
280-300    15
300-320    8

Calculate the mean and the standard deviation. (Answers 251.47 N and 29.04 N)


4. NORMAL DISTRIBUTION CURVE

Many examples of data distribution produce a bell shaped curve when plotted that is symmetrical
about the mean value. This is a natural event since we expect most things to have a lot of values
near the mean and very few far away from the mean. Mathematicians have produced various
models of the bell shaped curve and the one most widely used is the normal distribution curve given
by the equation.

$$y = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(x - \bar{x})^2}{2\sigma^2}}$$

x̄ is the mean value of x and σ is the standard deviation. These two parameters define the shape of the curve. The smaller the value of σ, the taller the graph becomes. x̄ is at the centre and corresponds to the median.

Without proof it can be shown that the area under the curve between x = -∞ and x = +∞ is always exactly 1.0 whatever the values of x̄ and σ.
Further it can be shown that the area between x = x̄ and x = x̄ + σ is always 34.13% of the total area under the curve.

To make life easier we change the x axis to

$$z = \frac{x - \bar{x}}{\sigma}$$

This produces a curve exactly the same as though σ = 1 and x̄ = 0 and this is called the NORMALISED STANDARD DISTRIBUTION CURVE. In other words, the z value corresponds exactly to the number of standard deviations from the mean.

We are now plotting a curve of

$$y = \frac{1}{\sqrt{2\pi}}\; e^{-\frac{z^2}{2}}$$
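The two curves can be compared numerically with a short Python sketch. The function names `normal_pdf` and `standard_pdf` are chosen here for illustration, and the mean of 100 with σ = 1 and σ = 2 is a made-up example, not data from the tutorial.

```python
import math

def normal_pdf(x, mean, sigma):
    """Height y of the normal distribution curve at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def standard_pdf(z):
    """Height of the standardised curve (mean 0, sigma 1) at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Hypothetical mean of 100 with two different spreads: the smaller sigma gives the taller peak.
for sigma in (1.0, 2.0):
    print(f"sigma = {sigma}: peak height = {normal_pdf(100.0, 100.0, sigma):.4f}")
```

At matching z values the height of the general curve is the standardised height divided by σ, which is why halving σ doubles the peak while the total area stays at 1.0.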

The way the area under the curve is distributed is exactly the same for the non standardised curve when it is measured in terms of σ.
There are several other ways to define the area distribution. PERCENTILES give the area under the curve between -∞ and z, with the area expressed as a %. Hence for the standardised normal distribution curve, the 50th percentile is at z = 0, where the area is 0.5. QUARTILES divide the area into four equal sections.
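The z values at the quartiles can be found by working backwards from area to z. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later); its `inv_cdf` method returns the z value below which a given fraction of the area lies. The variable names are chosen here for illustration.

```python
from statistics import NormalDist

std = NormalDist(mu=0.0, sigma=1.0)   # the standardised normal distribution

lower_quartile = std.inv_cdf(0.25)    # 25% of the area lies below this z
median = std.inv_cdf(0.50)            # 50th percentile
upper_quartile = std.inv_cdf(0.75)    # 75% of the area lies below this z

print(f"{lower_quartile:.4f} {median:.4f} {upper_quartile:.4f}")
```

By symmetry the lower and upper quartiles sit at equal distances either side of z = 0.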


The area under the standardised normal curve between z = -∞ and z can be put into the table form below. This is mostly used for probability problems and so the data in the table below is called the probability content. The tabled values give the blue area of the diagram (the area to the left of z) for any value of z. These areas have important use for solving statistical and probability problems. The total area is 1.0 so the yellow area (to the right of z) = 1 - blue area. Because the graph is symmetrical, the areas between any two values of z are easily found from the table.

Tables of the Normal Distribution

Probability Content from -∞ to z

z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3   0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4   0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5   0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6   0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7   0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1   0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3   0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7   0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8   0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9   0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0   0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
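Where a printed table is not to hand, the probability content can be computed directly. The sketch below assumes the standard identity Φ(z) = ½[1 + erf(z/√2)] for the area from -∞ to z; the function names `phi` and `area_between` are invented for this illustration. It regenerates one row of the table and shows how symmetry gives the area between two z values.

```python
import math

def phi(z):
    """Area under the standardised normal curve from minus infinity to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_between(z1, z2):
    """Area under the standardised normal curve between z1 and z2 (z1 < z2)."""
    return phi(z2) - phi(z1)

# Regenerate the z = 1.3 row of the table (second decimal place 0.00 to 0.09).
row = [round(phi(1.3 + j / 100), 4) for j in range(10)]
print(row)

# Negative z values are not tabulated: by symmetry phi(-z) = 1 - phi(z).
print(round(phi(-1.3), 4), round(1 - phi(1.3), 4))
```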

ACCUMULATIVE DATA

The area under the normal distribution curve up to any value z is

$$A = \int_{-\infty}^{z} y\,dz$$

This is the cumulative distribution. If we plot the data for the standardised normal distribution (i.e. plot the data in the table) we get the Ogive.
APPLICATION TO STATISTICAL DATA

The area under the normal distribution curve between z = 0 and any other value is

$$A = \int_0^z y\,dz$$

If we change the vertical scale by a factor C then

$$A = \int_0^z C\,y\,dz = C\int_0^z y\,dz$$

The area under the curve is directly proportional to the vertical scale but clearly the % ratios shown on the plot would stay the same and the table values are simply multiplied by C. This is important when applying this work to real situations.
If we calculate the standard deviation as shown earlier, we can use the normalised distribution curve to solve problems on the assumption that it is a fair representation of the actual distribution. In order to do this we must find the z value.

$$z = \frac{x - \bar{x}}{\sigma}$$

We can now work out what % of the samples fall within any values of z.


WORKED EXAMPLE No. 4

The strength of spot welds is measured and the numbers falling within a band of 10N are shown. (This is the example introduced in section 3.)

Strength (N)   130  150  170  190  210  230  250  270  290  310  330  350  370
Number         1    1    2    6    18   26   28   24   17   8    2    1    1

Determine the mean and standard deviation. Using the normalised distribution data, find the probability of a spot weld having a strength less than 200 N.
SOLUTION

mid x    freq. f   f x      acc f   x²        f x²
130      1         130      1       16900     16900
150      1         150      2       22500     22500
170      2         340      4       28900     57800
190      6         1140     10      36100     216600
210      18        3780     28      44100     793800
230      26        5980     54      52900     1375400
250      28        7000     82      62500     1750000
270      24        6480     106     72900     1749600
290      17        4930     123     84100     1429700
310      8         2480     131     96100     768800
330      2         660      133     108900    217800
350      1         350      134     122500    122500
370      1         370      135     136900    136900
Totals   135       33790                      8658300

Mean = 33790/135 = 250.3

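$$\sigma^2 = \frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2 = \frac{8658300}{135} - \left(\frac{33790}{135}\right)^2 = 1487$$

σ = 38.57

Now find the z value for 200 N.

$$z = \frac{200 - 250.3}{38.57} = -1.304$$

The table only lists positive values of z, so we use the symmetry of the curve. If we look up the table value for 1.30 we get 0.9032. The answer we need is 1 - 0.9032 = 0.0968 so the probability of getting a spot weld weaker than 200 N is 9.68% or, put another way, 9.68% of the samples are likely to be weaker.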
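The whole of this worked example can be reproduced with a short Python sketch. It assumes, as the example does, that the normal curve is a fair model of the data, and it uses erf in place of the printed table; the variable names are chosen here for illustration. Because z is not rounded to 1.30 before the lookup, the probability comes out slightly below the 9.68% obtained from the table.

```python
import math

# Band mid-points (N) and number of welds in each band, from the worked example.
strength = [130, 150, 170, 190, 210, 230, 250, 270, 290, 310, 330, 350, 370]
count = [1, 1, 2, 6, 18, 26, 28, 24, 17, 8, 2, 1, 1]

n = sum(count)
mean = sum(f * x for x, f in zip(strength, count)) / n
variance = sum(f * x * x for x, f in zip(strength, count)) / n - mean ** 2
sigma = math.sqrt(variance)

z = (200 - mean) / sigma                              # negative: 200 N is below the mean
p_below = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # area from minus infinity to z

print(f"mean = {mean:.1f} N, sigma = {sigma:.2f} N, z = {z:.3f}, P(x < 200) = {100 * p_below:.1f}%")
```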

In manufacturing, the standard deviation is usually monitored continuously by measuring systems.


This saves us having to work it out. The next example is based on this.


WORKED EXAMPLE No. 5

A machine makes electronic resistors with a nominal value of 22 kΩ and tolerance ±5%.
A sample of 1000 produces a mean of 21.8 kΩ and a standard deviation of 0.8 kΩ.
Assuming the normal probability curve applies, determine the percentage that is likely to fall within the required values.
SOLUTION

The resistors must have a range of 22 kΩ ± 5% so they must fall between 20.9 kΩ and 23.1 kΩ.

The z values are

$$z = \frac{23.1 - 21.8}{0.8} = 1.625 \quad\text{and}\quad z = \frac{20.9 - 21.8}{0.8} = -1.125$$

From the table the probability of being less than 23.1 corresponds to z = 1.625 and is 0.9479 (half way between the values for 1.62 and 1.63).

To find the probability of being less than 20.9 look up z = 1.125 and subtract from 1.0. Hence we get 1 - 0.8697 = 0.1303

The quantity or probability of falling within the required limits is 0.9479 - 0.1303 = 0.8176
81.8% are acceptable, the rest are too high or too low.
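The same tolerance check can be sketched with `NormalDist` from Python's standard `statistics` module (Python 3.8 or later), whose `cdf` method gives the area from -∞ up to a value of x directly, so no conversion to z is needed. The variable names are chosen here for illustration, and the normal model is an assumption, as in the example.

```python
from statistics import NormalDist

# Measured mean and standard deviation of the sample, in kilo-ohms.
resistors = NormalDist(mu=21.8, sigma=0.8)

nominal = 22.0
low, high = nominal * 0.95, nominal * 1.05   # tolerance band: 20.9 to 23.1 kilo-ohms

p_too_low = resistors.cdf(low)               # area below the lower limit
p_too_high = 1.0 - resistors.cdf(high)       # area above the upper limit
p_ok = 1.0 - p_too_low - p_too_high

print(f"too low {100 * p_too_low:.1f}%, too high {100 * p_too_high:.1f}%, acceptable {100 * p_ok:.1f}%")
```

Most of the rejects fall below the band because the process mean of 21.8 kΩ sits under the nominal 22 kΩ.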

SELF ASSESSMENT EXERCISE No. 3

1. A machine tool producing ground pins must produce a diameter of 12 mm ± 0.05 mm. Continuous monitoring of the size by gauging equipment shows that the mean is 12.01 mm with a standard deviation of 0.03 mm. Assuming a normal distribution, what is the % rejected? (11.4%)
2. The lifespan in hours of a mass produced light optical device is normally distributed and has a mean of 1400 with a standard deviation of 300.
What is the probability of one taken at random having a lifespan between 1400 and 1850 hours? (43.3%)
What is the percentage that will last longer than 2100 hours? (1%)
If the guarantee is for 1000 hours, what percentage will fail to meet the guarantee? (9.1%)
What lifespan should be guaranteed if 95% of the devices must achieve it? (907 hours)
