
Statistics

Session objectives

At the end of this module, participants will be able to:
Explain the statistical theory and models used for engineering analysis.
Select an appropriate statistical method to describe data.
Use an appropriate statistical method in engineering judgment and decision making.


Topics

Overview of Statistics
Descriptive Statistics
Population and Sample
Inferential Statistics

Overview of Statistics

Statistics is a methodology for collecting, analysing and interpreting data, and for drawing conclusions from it.
It can be divided into two groups:
Descriptive Statistics
Inferential Statistics

Descriptive Statistics

Measures of Central Tendency


Measures of Dispersion
Data Types
Histogram
Probability Distribution Plot
Normal Distribution

Measures of Central Tendency


Descriptive measures that show where the center, or the most typical value, of a collected set of measurements lies are called measures of central tendency. The common ones are the mean, median and mode.

Mean
The mean is the sum of all measurements divided by the number of observations in the data set.

Limitation of the mean: it is influenced considerably by the presence of extreme observations.

Median
The middle value that separates the higher half from
the lower half of the data set. It is uninfluenced by
extreme values or outliers.

Mode
The most frequent value in the data set.

Mean, Median and Mode Application


Mean is used when the data distribution is
symmetric. Median is used when the data

Measures of Dispersion
Measures of dispersion provide information about the spread of observations within a distribution, whereas measures of central tendency summarise a frequency distribution with a single representative value.

Range
The range is the difference between the largest and the smallest observed values of the variable in a data set.

Variance
The variance is a measure of the average squared deviation from the mean. The standard deviation is the square root of the variance.

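Example
The CGPA scores of a class of ten students are 7.7, 6.4, 7.4, 8.5, 8.5, 8.8, 9.1, 9.4, 7.8, 8.2. Determine the mean, median, mode and standard deviation.

Mean
Mean = (7.7 + 6.4 + 7.4 + 8.5 + 8.5 + 8.8 + 9.1 + 9.4 + 7.8 + 8.2) / 10 = 81.8 / 10 = 8.18

Median
6.4, 7.4, 7.7, 7.8, 8.2, 8.5, 8.5, 8.8, 9.1, 9.4 (arranged in ascending order)
The number of observations is even, so take the average of the two middle numbers.
Median = (8.2 + 8.5) / 2 = 8.35

Mode
6.4, 7.4, 7.7, 7.8, 8.2, 8.5, 8.5, 8.8, 9.1, 9.4 (most frequent number)
Mode = 8.5

Variance
Score                7.7     6.4     7.4     8.5     8.5     8.8     9.1     9.4     7.8     8.2
Squared deviation    0.2304  3.1684  0.6084  0.1024  0.1024  0.3844  0.8464  1.4884  0.1444  0.0004
Sum of squared deviations = 7.076
Dividing by n = 10 (population form) gives a variance of 0.7076 and a standard deviation of about 0.84. Dividing by n - 1 = 9 (sample form) gives a variance of about 0.786 and a standard deviation of about 0.89.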
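The CGPA example can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative, and both the population (divide by n) and sample (divide by n - 1) forms are computed because the example does not say which one is intended.

```python
import statistics

# CGPA scores of the ten students in the example
scores = [7.7, 6.4, 7.4, 8.5, 8.5, 8.8, 9.1, 9.4, 7.8, 8.2]

mean = statistics.mean(scores)
median = statistics.median(scores)   # average of the two middle values for even n
mode = statistics.mode(scores)       # most frequent value

# Population forms divide by n; sample forms divide by n - 1.
pop_variance = statistics.pvariance(scores)
sample_variance = statistics.variance(scores)
pop_sd = statistics.pstdev(scores)
sample_sd = statistics.stdev(scores)

print(f"mean={mean:.2f}, median={median:.2f}, mode={mode}")
print(f"population variance={pop_variance:.4f}, sd={pop_sd:.3f}")
print(f"sample variance={sample_variance:.4f}, sd={sample_sd:.3f}")
```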

Data Types
Data is either categorical or numerical.

Categorical (non-numerical)
Categorical data represent types of data which may be divided into groups based on attributes.
Ex: Blood type, Sex, Age group

Numerical (quantifiable data)
Can be ordered or ranked.
Ex: No. of rivets in a plane, height of a plant

Numerical data is either discrete or continuous.

Discrete
Finite in nature
Is countable
Ex: No. of rivets in a plane

Continuous
Infinite in nature
Is not countable
Ex: Height of a plant

Histogram
Histograms are a useful way to illustrate the frequency distribution of continuous data.
Example: Class CGPA

CGPA        Frequency
3.1 - 4     2
4.1 - 5     5
5.1 - 6     8
6.1 - 7     11
7.1 - 8     9
8.1 - 9     4
9.1 - 10    1

The vertical axis represents the frequency.
The horizontal axis represents CGPA and contains the classes of the frequency distribution.

Probability Distribution Plot
Histograms can be converted into a probability distribution plot by converting each frequency into a probability (the frequency divided by the total count).
Example: Class CGPA

CGPA        Frequency   Probability
3.1 - 4     2           0.05
4.1 - 5     5           0.125
5.1 - 6     8           0.2
6.1 - 7     11          0.275
7.1 - 8     9           0.225
8.1 - 9     4           0.1
9.1 - 10    1           0.025
Total       40          1

[Figure: probability distribution plot of class CGPA; vertical axis Probability, horizontal axis CGPA]

The vertical axis represents the probability.
The horizontal axis represents CGPA and contains the classes of the frequency distribution.
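The conversion from frequency to probability can be sketched in a few lines. This is an illustrative example, not the deck's own procedure: the frequencies are an assumption recovered from the listed probabilities and the total of 40 (for instance 0.275 x 40 = 11).

```python
# CGPA classes from the example
classes = ["3.1 - 4", "4.1 - 5", "5.1 - 6", "6.1 - 7", "7.1 - 8", "8.1 - 9", "9.1 - 10"]

# Assumed frequencies: each listed probability multiplied by the total of 40
frequencies = [2, 5, 8, 11, 9, 4, 1]

total = sum(frequencies)
# Probability of a class = class frequency / total number of observations
probabilities = [f / total for f in frequencies]

for label, freq, prob in zip(classes, frequencies, probabilities):
    print(f"{label:<9} frequency={freq:>2}  probability={prob:.3f}")
print(f"total={total}, sum of probabilities={sum(probabilities):.3f}")
```

The probabilities sum to 1, which is what makes the rescaled histogram a probability distribution.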

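Normal Distribution
Many dependent variables are commonly assumed to be normally distributed in the population.
The normal distribution is a symmetrical, bell-shaped curve with the mathematical formula

$$f(X) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(X - \mu)^2}{2\sigma^2}}$$

where $\mu$ is the mean and $\sigma$ is the standard deviation.

Area under the curve
$\mu \pm 1\sigma$ ~ 68%
$\mu \pm 2\sigma$ ~ 95%
$\mu \pm 3\sigma$ ~ 99.7%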
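The areas under the normal curve can be verified numerically. The sketch below relies on the identity that the probability of falling within k standard deviations of the mean equals erf(k / sqrt(2)); the helper name `area_within` is illustrative. The three areas come out near 68.3%, 95.4% and 99.7%.

```python
import math

def area_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {area_within(k) * 100:.2f}%")
```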

Inferential Statistics
Population and Sample
Inferential Statistics Overview
Hypothesis Testing
Hypothesis Testing Types

Population and Sample

Population
All possible measurements or outcomes that are of interest to us in a particular study.

Parameter
A measure of a population.

Sample
A portion of the population that is representative of the population from which it was selected.

Statistic
A measure of a sample.

Inferential Statistics Overview
Inferential statistics is a set of statistical methods used to draw conclusions about a population by using the information from a sample drawn from it.
It uses the characteristics of the sample to make probability statements about the characteristics of the population.

Hypothesis Testing
Hypothesis testing, or significance testing, is a method for testing a claim or hypothesis about a parameter in a population, using data measured in a sample.
Steps involved in hypothesis testing:
1. State the null hypothesis $H_0$ and the alternative hypothesis $H_1$.
2. Choose a fixed significance level $\alpha$.
3. Choose an appropriate test statistic and establish the critical region based on $\alpha$.
4. Reject $H_0$ if the computed test statistic is in the critical region. Otherwise, do not reject.
5. Draw scientific or engineering conclusions.

Step-1: State the null hypothesis $H_0$ and the alternative hypothesis $H_1$.
The null hypothesis is usually the hypothesis that the investigator wants to collect evidence against, or the hypothesis to be verified. $H_0$ carries the symbol $=$; at times it carries $\le$ or $\ge$ instead, which makes $H_0$ composite in nature.
The alternative hypothesis is usually the hypothesis for which the investigator wants to collect supporting evidence from the observations obtained in the sample.
The alternative hypothesis is the opposite of the null hypothesis. For a two-sided test, the symbol used in the statement is $\ne$; for a one-sided test, it is either $<$ or $>$.

Step-2: Choose a fixed significance level $\alpha$.
The level of significance is the criterion of judgment on which a decision is made.
The significance level, denoted by $\alpha$, is the probability of committing a Type I error.
A Type I error is that of rejecting a null hypothesis which is, in actuality, true.
How $\alpha$ is applied depends on whether the test is one-sided or two-sided (one-tailed or two-tailed): a one-sided test places all of $\alpha$ in one tail, while a two-sided test places $\alpha/2$ in each tail.

Step-3: Choose an appropriate test statistic and establish the critical region based on $\alpha$.
The test statistic could be $Z$, $t$, $\chi^2$, $F$, etc., depending on the sampling distribution of the sample statistic to be used.
The critical (rejection) region covers a total area equal to $\alpha$.
The boundaries of the critical region are determined by the significance level (a single critical value for a one-sided test or two critical values for a two-sided test).
The critical value of the test statistic can be obtained from statistical tables.
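For the Z statistic, the table lookup in Step-3 can also be reproduced in code. The sketch below assumes a standard normal test statistic and uses `statistics.NormalDist.inv_cdf` from the Python standard library; the variable names are illustrative. Critical values for t, chi-square and F need other distributions and are not covered here.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# One-sided (upper-tail) test: all of alpha sits in one tail.
z_one_sided = std_normal.inv_cdf(1 - alpha)

# Two-sided test: alpha / 2 sits in each tail, giving +/- the same value.
z_two_sided = std_normal.inv_cdf(1 - alpha / 2)

print(f"one-sided critical value: Z > {z_one_sided:.3f}")
print(f"two-sided critical values: Z < {-z_two_sided:.3f} or Z > {z_two_sided:.3f}")
```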

Step-4: Reject $H_0$ if the computed test statistic is in the critical region. Otherwise, do not reject.

Step-5: Draw scientific or engineering conclusions.

Hypothesis Testing Types


Single sample Z-test
2 Sample Z-test
Single sample t-Test
Two sample t-Test
Paired t-Test
F test
ANOVA

Single sample Z-test
The Z-test is a test of hypothesis on a single population mean.
It is used when the population standard deviation is known.
The test statistic is

$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

Where,
$\bar{x}$ = Sample mean
$\mu$ = Population mean
$\sigma$ = Population standard deviation
$n$ = Sample size

Single sample Z-test Example
A random sample of 100 recorded lives of laptops of a certain model shows an average life of 71.8 months. Assuming a population standard deviation of 8.9 months, does this seem to indicate that the mean life of the laptop is greater than 70 months? Use a 0.05 level of significance.
1. $H_0$: $\mu = 70$ months
   $H_1$: $\mu > 70$ months
2. $\alpha = 0.05$ (level of significance)
3. Critical region: $Z > 1.645$, where $Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
4. $Z = \frac{71.8 - 70}{8.9 / \sqrt{100}} = 2.02$. Reject $H_0$, as $Z = 2.02 > 1.645$.
5. Conclusion: the mean life of the laptop is greater than 70 months.
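The laptop example can be reproduced with the standard library. This is a minimal sketch: the numbers come from the example, the critical value 1.645 is taken from the example rather than computed, and the p-value line is an extra illustration that the slides do not use.

```python
import math
from statistics import NormalDist

sample_mean = 71.8   # average life in the sample, months
mu_0 = 70.0          # mean life under H0, months
sigma = 8.9          # known population standard deviation, months
n = 100              # sample size
z_critical = 1.645   # one-sided critical value at alpha = 0.05

# Z = (sample mean - hypothesised mean) / (sigma / sqrt(n))
z = (sample_mean - mu_0) / (sigma / math.sqrt(n))

# Upper-tail p-value, shown only as a cross-check on the decision
p_value = 1 - NormalDist().cdf(z)

reject_h0 = z > z_critical
print(f"Z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```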

2 Sample Z-test
The 2 sample Z-test is a hypothesis test used to compare two sample groups to determine whether they originated from the same population.
It is performed when the population standard deviations are known.
The test statistic is

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}}$$

Where,
$\bar{x}_1$ = Sample mean from population-1
$\bar{x}_2$ = Sample mean from population-2
$\mu_1$ = Population-1 mean
$\mu_2$ = Population-2 mean
$\sigma_1$ = Population-1 standard deviation
$\sigma_2$ = Population-2 standard deviation
$n_1$ = Sample size from population-1
$n_2$ = Sample size from population-2
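A small function makes the two-sample Z statistic concrete. The deck gives no worked example for this test, so the input numbers below are purely hypothetical and chosen only to make the arithmetic easy to follow; the function name `two_sample_z` is also illustrative.

```python
import math

def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, diff_0=0.0):
    """Z statistic for the difference of two means with known population sigmas.

    diff_0 is the hypothesised value of mu1 - mu2 under H0.
    """
    standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((mean1 - mean2) - diff_0) / standard_error

# Hypothetical inputs, not taken from the slides
z = two_sample_z(mean1=52.0, mean2=50.0, sigma1=4.0, sigma2=3.0, n1=64, n2=36)
print(f"Z = {z:.3f}")
```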

Single sample t-Test
The t-test is a test of hypothesis on a single population mean.
It is used when the population standard deviation is unknown.
The test statistic is

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

with $n - 1$ degrees of freedom.
Where,
$\bar{x}$ = Sample mean
$\mu$ = Population mean
$s$ = Sample standard deviation
$n$ = Sample size

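Single sample t-test Example
It is claimed that a vacuum cleaner uses an average of 46 kilowatt hours per year. If a random sample of 12 homes indicates that vacuum cleaners use an average of 42 kilowatt hours per year with a standard deviation of 11.9 kilowatt hours, does this suggest at the 0.05 level of significance that vacuum cleaners use, on average, less than 46 kilowatt hours annually? Assume the population of kilowatt hours to be normal.
1. $H_0$: $\mu = 46$ kilowatt hours
   $H_1$: $\mu < 46$ kilowatt hours
2. $\alpha = 0.05$
3. Critical region: $t < -1.796$, where $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$ with $v = n - 1 = 11$ degrees of freedom.
4. $t = \frac{42 - 46}{11.9 / \sqrt{12}} = -1.16$. Do not reject $H_0$, as $t = -1.16$ is not less than $-1.796$.
5. Conclusion: the data do not show that the average usage is significantly less than 46 kilowatt hours per year.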
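The vacuum cleaner example can be computed as follows. This is a minimal sketch: the summary statistics and the critical value -1.796 (11 degrees of freedom, one-sided, 0.05 level) are taken from the example, and the variable names are illustrative.

```python
import math

sample_mean = 42.0    # kilowatt hours per year
mu_0 = 46.0           # claimed mean under H0
s = 11.9              # sample standard deviation
n = 12                # number of homes
t_critical = -1.796   # lower-tail critical value from the example, 11 d.f.

# t = (sample mean - hypothesised mean) / (s / sqrt(n))
t = (sample_mean - mu_0) / (s / math.sqrt(n))
degrees_of_freedom = n - 1

# Lower-tailed test: reject H0 only if t falls below the critical value.
reject_h0 = t < t_critical
print(f"t = {t:.2f} with {degrees_of_freedom} d.f., reject H0: {reject_h0}")
```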

Two sample t-Test
The 2 sample t-test is a hypothesis test used to compare two sample groups to determine whether they originated from the same population.
The two sample t-test is performed when the population standard deviations are unknown.
For unknown but equal variances, the statistic is

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{1/n_1 + 1/n_2}}, \qquad s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$$

with $n_1 + n_2 - 2$ degrees of freedom.
Where,
$\bar{x}_1$ = Sample mean from population-1
$\bar{x}_2$ = Sample mean from population-2
$\mu_1$ = Population-1 mean
$\mu_2$ = Population-2 mean
$s_1$ = Sample standard deviation from population-1
$s_2$ = Sample standard deviation from population-2
$s_p$ = Pooled sample standard deviation
$n_1$ = Sample size from population-1
$n_2$ = Sample size from population-2

Two sample t-Test
For unknown and unequal variances, the statistic is

$$t' = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}}$$

Where,
$\bar{x}_1$ = Sample mean from population-1
$\bar{x}_2$ = Sample mean from population-2
$\mu_1$ = Population-1 mean
$\mu_2$ = Population-2 mean
$d_0 = \mu_1 - \mu_2$
$s_1$ = Sample standard deviation from population-1
$s_2$ = Sample standard deviation from population-2
$n_1$ = Sample size from population-1
$n_2$ = Sample size from population-2
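The unequal-variance statistic can be sketched as a function. The degrees of freedom use the Welch-Satterthwaite approximation, which is an addition not stated on the slide. For illustration the function is applied to the material-strength summary statistics from the deck's two sample example, although that example itself assumes equal variances; the function name `welch_t` is illustrative.

```python
import math

def welch_t(mean1, mean2, s1, s2, n1, n2, d0=0.0):
    """Return (t', approximate degrees of freedom) for unequal variances."""
    v1 = s1**2 / n1   # variance contribution of sample 1
    v2 = s2**2 / n2   # variance contribution of sample 2
    t_prime = ((mean1 - mean2) - d0) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (assumption, not from the slides)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t_prime, df

# Material-strength summary statistics, reused here without the equal-variance assumption
t_prime, df = welch_t(mean1=85, mean2=81, s1=4, s2=5, n1=12, n2=10, d0=2)
print(f"t' = {t_prime:.3f}, approximate d.f. = {df:.1f}")
```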

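2 sample t-test Example
An experiment was performed to compare material strength, with 12 pieces of material-1 and 10 pieces of material-2. The average strength from the sample of material-1 is 85 units with $s_1 = 4$, while the sample of material-2 gave an average of 81 units with $s_2 = 5$. Can we conclude at the 0.05 level of significance that the strength of material-1 exceeds that of material-2 by more than 2 units? Assume the populations to be approximately normal with equal variances.
1. $H_0$: $\mu_1 - \mu_2 = 2$
   $H_1$: $\mu_1 - \mu_2 > 2$
2. $\alpha = 0.05$
3. Critical region: $t > 1.725$, where $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{1/n_1 + 1/n_2}}$ with $v = n_1 + n_2 - 2 = 20$ degrees of freedom.
4. $s_p = \sqrt{\frac{(11)(16) + (9)(25)}{20}} = 4.478$ and $t = \frac{(85 - 81) - 2}{4.478 \sqrt{1/12 + 1/10}} = 1.04$. Do not reject $H_0$, as $t = 1.04 < 1.725$.
5. Conclusion: we are unable to conclude that the strength of material-1 exceeds that of material-2 by more than 2 units.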
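The pooled (equal-variance) calculation for the material-strength example can be reproduced as follows. This is a minimal sketch: the summary statistics and the critical value 1.725 (20 degrees of freedom) come from the example, and the variable names are illustrative.

```python
import math

mean1, s1, n1 = 85.0, 4.0, 12   # material-1 sample
mean2, s2, n2 = 81.0, 5.0, 10   # material-2 sample
d0 = 2.0                        # hypothesised difference mu1 - mu2 under H0
t_critical = 1.725              # upper-tail critical value from the example

degrees_of_freedom = n1 + n2 - 2

# Pooled standard deviation, weighting each sample variance by its d.f.
s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / degrees_of_freedom)

t = ((mean1 - mean2) - d0) / (s_pooled * math.sqrt(1 / n1 + 1 / n2))
reject_h0 = t > t_critical

print(f"s_p = {s_pooled:.3f}, t = {t:.2f}, d.f. = {degrees_of_freedom}, reject H0: {reject_h0}")
```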

Paired t-Test
The paired t-test is a special case of the 2 sample t-test. It provides a hypothesis test of the difference between population means for a pair of random samples whose differences are approximately normally distributed.
The statistic is

$$t = \frac{\bar{d} - \mu_D}{s_d / \sqrt{n}}$$

Where,
$\bar{d}$ = Mean of the differences
$\mu_D = \mu_1 - \mu_2$
$s_d$ = Standard deviation of the differences
$n$ = Sample size (number of pairs)

Paired t-Test Example
The marks scored by 9 students before and after training are given in the table below. Can we conclude that the training has made a difference in scores at the 0.05 level of significance?

Student   1    2    3    4    5    6    7    8    9
Before    300  201  232  312  220  256  328  330  231
After     312  242  340  388  296  254  391  402  290

1. $H_0$: $\mu_1 - \mu_2 = 0$ or $\mu_D = 0$
   $H_1$: $\mu_1 - \mu_2 \ne 0$ or $\mu_D \ne 0$
2. $\alpha = 0.05$
3. Critical region: $t < -2.306$ or $t > 2.306$, where $t = \frac{\bar{d} - \mu_D}{s_d / \sqrt{n}}$ with $v = n - 1 = 8$ degrees of freedom.
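The paired test statistic for the training example can be sketched as below. Two assumptions are made here: the "Before" row is read as 300, 201, 232, 312, 220, 256, 328, 330, 231 for students 1 to 9, and the critical value is the tabulated two-sided value 2.306 for 9 - 1 = 8 degrees of freedom at the 0.05 level. Differences are taken as before minus after.

```python
import math
import statistics

# Assumed alignment of the table: one before/after pair per student
before = [300, 201, 232, 312, 220, 256, 328, 330, 231]
after = [312, 242, 340, 388, 296, 254, 391, 402, 290]

differences = [b - a for b, a in zip(before, after)]
n = len(differences)

d_bar = statistics.mean(differences)   # mean of the differences
s_d = statistics.stdev(differences)    # sample standard deviation of the differences

# Under H0 the mean difference is 0.
t = (d_bar - 0) / (s_d / math.sqrt(n))

t_critical = 2.306                     # two-sided, alpha = 0.05, 8 d.f.
reject_h0 = abs(t) > t_critical
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.2f}, t = {t:.2f}, reject H0: {reject_h0}")
```

With this reading of the table the statistic falls well inside the critical region, so the null hypothesis of no difference would be rejected.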

F test
The F-test is designed to test whether two population variances are equal. It does this by comparing the ratio of two variances: if the variances are equal, the ratio of the variances will be 1.
The F-distribution is formed by the ratio of two independent chi-square variables, each divided by its degrees of freedom.
The test statistic is

$$F = \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2}$$

which reduces to $f = s_1^2 / s_2^2$ when the population variances are equal.
Where,
$S_1$ = Sample standard deviation from population-1
$S_2$ = Sample standard deviation from population-2
$\sigma_1$ = Population-1 standard deviation
$\sigma_2$ = Population-2 standard deviation

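F test - Example
An experiment was performed to compare material strength, with 12 pieces of material-1 and 10 pieces of material-2. The average strength from the sample of material-1 is 85 units with $s_1 = 4$, while the sample of material-2 gave an average of 81 units with $s_2 = 5$. Assume the populations to be approximately normal. Investigate whether the variances are equal at the 0.10 level of significance.
1. $H_0$: $\sigma_1^2 = \sigma_2^2$
   $H_1$: $\sigma_1^2 \ne \sigma_2^2$
2. $\alpha = 0.1$
3. Critical region: $f < 0.34$ or $f > 3.11$, where $f = s_1^2 / s_2^2$ with $v_1 = 11$ and $v_2 = 9$ degrees of freedom.
4. $f = 16 / 25 = 0.64$. Do not reject $H_0$, as $0.34 < 0.64 < 3.11$.
5. Conclusion: there is not sufficient evidence that the variances differ.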
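The variance-ratio check for the material example takes only a few lines. This is a minimal sketch: the sample standard deviations and the critical values 0.34 and 3.11 are taken from the example, not computed, and the variable names are illustrative.

```python
s1, n1 = 4.0, 12   # material-1 sample standard deviation and size
s2, n2 = 5.0, 10   # material-2 sample standard deviation and size

# Two-sided critical region from the example at alpha = 0.10
f_lower, f_upper = 0.34, 3.11

# Under H0 (equal population variances) the statistic is the ratio of sample variances.
f = s1**2 / s2**2
degrees_of_freedom = (n1 - 1, n2 - 1)

reject_h0 = f < f_lower or f > f_upper
print(f"f = {f:.2f} with d.f. {degrees_of_freedom}, reject H0: {reject_h0}")
```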

ANOVA
ANOVA stands for Analysis of Variance.
ANOVA is used to compare three or more samples and draw a conclusion about their populations, i.e., to check whether the samples belong to the same population or not.
It uses an F-test that compares the variation between the sample means with the variation within the samples to test whether the population means are equal.
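A one-way ANOVA F statistic can be sketched directly from its definition: the mean square between groups divided by the mean square within groups. The deck has no ANOVA example, so the three small groups below are purely hypothetical, and the function name `one_way_anova_f` is illustrative.

```python
def one_way_anova_f(groups):
    """Return (F, d.f. between, d.f. within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical data, not taken from the slides
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

The computed F is then compared with a critical value from the F table for the two degrees of freedom, following the same five steps as the other tests.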

References

Books, papers
Textbook: Walpole, R. E. et al., Probability & Statistics for Engineers & Scientists, 9th ed.
Agresti, A. & Finlay, B., Statistical Methods for the Social Sciences, 3rd Edition. Prentice Hall, 1997.
Anderson, T. W. & Sclove, S. L., Introductory Statistical Analysis. Houghton Mifflin Company, 1974.
Clarke, G. M. & Cooke, D., A Basic Course in Statistics. Arnold, 1998.

Thank you!
