Session objectives
Statistics
Topics
Overview of Statistics
Descriptive Statistics
Population and Sample
Inferential Statistics
Overview of Statistics
Descriptive Statistics
Measures of Central Tendency
Mean
The mean is the sum of all measurements divided by the number of
observations in the data set.
Median
The middle value that separates the higher half from
the lower half of the data set. It is largely unaffected by
extreme values or outliers.
Mode
The most frequent value in the data set.
Measures of Dispersion
Measures of dispersion provide information about the
spread of observations within the distribution, whereas
measures of central tendency give a single representative
value of a frequency distribution.
Range
The range is the difference between the largest and the
smallest observed value of the variable in a data set.
Variance
Variance is a measure of the average squared
deviation from the mean. The standard deviation is
the square root of the variance.
Example
The CGPA scores of a class of ten students are 7.7, 6.4, 7.4,
8.5, 8.5, 8.8, 9.1, 9.4, 7.8, 8.2. Determine the mean,
median, mode and standard deviation.
Mean
Mean = (7.7 + 6.4 + 7.4 + 8.5 + 8.5 + 8.8 + 9.1 + 9.4 + 7.8 + 8.2) / 10 = 81.8 / 10 = 8.18
Median
6.4, 7.4, 7.7, 7.8, 8.2, 8.5, 8.5, 8.8, 9.1, 9.4 (arranged in ascending order)
The number of observations is even, so take the average of the two middle numbers.
Median = (8.2 + 8.5) / 2 = 8.35
Mode
6.4, 7.4, 7.7, 7.8, 8.2, 8.5, 8.5, 8.8, 9.1, 9.4 (most frequent number)
Mode = 8.5
Variance
Squared deviations from the mean, for the values in ascending order:
3.1684, 0.6084, 0.2304, 0.1444, 0.0004, 0.1024, 0.1024, 0.3844, 0.8464, 1.4884
Sum of squared deviations = 7.076
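The worked example can be checked with a short Python sketch using only the standard `statistics` module. The final variance figure did not survive on the slide, so the divisor it used is unknown; as an assumption, both the population form (divide by n) and the sample form (divide by n - 1) are computed here.

```python
import statistics

# CGPA scores from the example above.
cgpa = [7.7, 6.4, 7.4, 8.5, 8.5, 8.8, 9.1, 9.4, 7.8, 8.2]

mean = statistics.mean(cgpa)
median = statistics.median(cgpa)      # average of the two middle values for even n
mode = statistics.mode(cgpa)          # most frequent value
data_range = max(cgpa) - min(cgpa)    # largest minus smallest observation

# Sum of squared deviations from the mean.
sum_sq = sum((x - mean) ** 2 for x in cgpa)

# Assumption: the slide's divisor is not recoverable, so both forms are shown.
pop_var = statistics.pvariance(cgpa)      # divides by n
sample_var = statistics.variance(cgpa)    # divides by n - 1
pop_sd = statistics.pstdev(cgpa)
sample_sd = statistics.stdev(cgpa)

print(f"mean={mean:.2f} median={median:.2f} mode={mode} range={data_range:.1f}")
print(f"sum of squared deviations={sum_sq:.4f}")
print(f"population variance={pop_var:.4f}, sd={pop_sd:.4f}")
print(f"sample variance={sample_var:.4f}, sd={sample_sd:.4f}")
```

The two standard deviations differ only in the divisor; for a class treated as the whole population the first form applies, while for a class treated as a sample the second is the usual choice.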
Data Types
Data
  Categorical: non-numerical data on attributes
    Ex: Blood type, Sex, Age
  Numerical: quantifiable data
    Discrete: finite in nature, is countable
    Continuous: infinite in nature, is not countable
Histogram
Histograms are a useful way to illustrate the
frequency distribution of continuous data.
Example: Class CGPA
CGPA        Frequency
3.1 - 4     2
4.1 - 5     5
5.1 - 6     8
6.1 - 7     11
7.1 - 8     9
8.1 - 9     4
9.1 - 10    1
Total       40
The vertical axis represents the frequency.
Probability Distribution Plot
Histograms can be converted into a probability
distribution plot by converting each frequency into a
probability (frequency divided by the total number of observations).
Example: Class CGPA
CGPA        Frequency    Probability
3.1 - 4     2            0.05
4.1 - 5     5            0.125
5.1 - 6     8            0.2
6.1 - 7     11           0.275
7.1 - 8     9            0.225
8.1 - 9     4            0.1
9.1 - 10    1            0.025
Total       40           1
The vertical axis represents the probability.
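A minimal Python sketch of the frequency-to-probability conversion follows. The only frequency that survives on the slide is 11 for the 6.1 - 7 class; the others are assumed here to be the listed probabilities multiplied by the total of 40.

```python
# Class intervals and frequencies for the CGPA example.
# Assumption: frequencies other than 11 are recovered as probability * 40.
frequencies = {
    "3.1 - 4": 2,
    "4.1 - 5": 5,
    "5.1 - 6": 8,
    "6.1 - 7": 11,
    "7.1 - 8": 9,
    "8.1 - 9": 4,
    "9.1 - 10": 1,
}

total = sum(frequencies.values())

# Each probability is the class frequency divided by the total count.
probabilities = {cls: freq / total for cls, freq in frequencies.items()}

for cls, prob in probabilities.items():
    print(f"{cls:<10} {frequencies[cls]:>3} {prob:.3f}")
print(f"{'Total':<10} {total:>3} {sum(probabilities.values()):.3f}")
```

The shape of the plot is unchanged by the conversion; only the vertical axis is rescaled so that the bar heights sum to 1.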
Normal Distribution
Many dependent variables are commonly assumed to
be normally distributed in the population.
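Symmetrical, bell-shaped curve with mathematical formula
$f(X) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(X-\mu)^2}{2\sigma^2}}$
where $\mu$ is the population mean and $\sigma$ is the population standard deviation.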
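The normal density can be evaluated directly, as in the sketch below. The function name `normal_pdf` and the choice of the standard normal (mean 0, standard deviation 1) for the printed values are illustrative assumptions, not part of the slides.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))


# Illustrative evaluation on the standard normal (mu = 0, sigma = 1).
for x in (-2, -1, 0, 1, 2):
    print(f"f({x:+d}) = {normal_pdf(x, 0.0, 1.0):.4f}")
```

The values at -1 and +1 (and at -2 and +2) are equal, which reflects the symmetry of the bell-shaped curve around the mean.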
Inferential Statistics
Population and Sample
Inferential Statistics Overview
Hypothesis Testing
Hypothesis Testing Types
Population and Sample
Population
All possible measurements or
outcomes that are of interest
to us in a particular study
Parameter
Measure of a
population
Sample
Portion of the population that
is representative of the
population from which it was
selected
Statistic
Measure of a sample
Hypothesis Testing
Hypothesis testing or significance testing is
a method for testing a claim or hypothesis
about a parameter in a population, using
data measured in a sample.
Steps involved in Hypothesis testing
1. State the null hypothesis H0 and the alternative hypothesis H1.
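The test statistic is
$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
Where,
$\bar{x}$ = Sample mean
$\mu$ = Population mean
$\sigma$ = Population standard deviation
$n$ = Sample size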
1. H0: $\mu$ = 70 months
   H1: $\mu$ > 70 months
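A one-sided, one-sample Z-test for H0: mean = 70 months against H1: mean > 70 months can be sketched as below. The sample figures (sample mean 71.8, population standard deviation 8.9, n = 100) and the 0.05 significance level are hypothetical values chosen for illustration; the slide's own numbers did not survive.

```python
import math


def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """Z statistic for a sample mean when the population sd is known."""
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))


def upper_tail_p(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical sample values (not from the slides).
z = one_sample_z(sample_mean=71.8, pop_mean=70.0, pop_sd=8.9, n=100)
p_value = upper_tail_p(z)
alpha = 0.05  # assumed significance level
reject_h0 = p_value < alpha

print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Comparing the p-value with the significance level is equivalent to checking whether z falls in the critical region of the upper tail.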
2 Sample Z-test
The 2 sample Z-test is a hypothesis test used to
compare two sample groups to determine whether they
originated from the same population.
It is performed when the population standard deviations are known.
The test statistic is
$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}}$
Where,
$\bar{x}_1$ = Sample mean from population-1
$\bar{x}_2$ = Sample mean from population-2
$\mu_1$ = Population-1 mean
$\mu_2$ = Population-2 mean
$\sigma_1$ = Population-1 standard deviation
$\sigma_2$ = Population-2 standard deviation
$n_1$ = Sample size from population-1
$n_2$ = Sample size from population-2
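The 2 sample Z statistic can be computed as in the following sketch. All numeric inputs are hypothetical, and the parameter `d0` (the hypothesized difference between the population means, 0 by default) is a name introduced here for illustration.

```python
import math


def two_sample_z(mean1, mean2, sd1, sd2, n1, n2, d0=0.0):
    """Z statistic for the difference of two sample means with known population sds."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - d0) / standard_error


def two_sided_p(z):
    """P(|Z| > |z|) for a standard normal variable."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical inputs (not from the slides).
z = two_sample_z(mean1=85.0, mean2=81.0, sd1=4.0, sd2=5.0, n1=50, n2=50)
p_value = two_sided_p(z)

print(f"z = {z:.3f}, two-sided p-value = {p_value:.6f}")
```

A small p-value suggests the two samples are unlikely to have come from populations with the same mean.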
1. H0: $\mu$ = 46 kilowatt hours
   H1: $\mu$ < 46 kilowatt hours
2. $\alpha$ = 0.05.
Where,
$\bar{x}_1$ = Sample mean from population-1
$\bar{x}_2$ = Sample mean from population-2
$\mu_1$ = Population-1 mean
$\mu_2$ = Population-2 mean
$d_0 = \mu_1 - \mu_2$
1. H0: $\mu_1 - \mu_2 = 2$
   H1: $\mu_1 - \mu_2 > 2$
2. $\alpha$ = 0.05.
3. Critical region: t > 1.725.
4. Do not reject H0, as t = 1.04 < 1.725.
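The value t = 1.04 can be reproduced with a pooled-variance 2 sample t statistic, sketched below. Two assumptions are made because the slide's formula and data are missing: the sample figures are taken from the material-strength experiment described in the F test example (12 and 10 pieces, means 85 and 81, standard deviations 4 and 5), and the variances are pooled.

```python
import math

# Assumed sample figures, taken from the material-strength experiment
# described in the F test example.
n1, mean1, s1 = 12, 85.0, 4.0
n2, mean2, s2 = 10, 81.0, 5.0
d0 = 2.0  # hypothesized difference under H0: mu1 - mu2 = 2

# Pooled variance (assumes the two population variances are equal).
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df

t = ((mean1 - mean2) - d0) / (math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2))

critical_value = 1.725  # from the critical region stated above
reject_h0 = t > critical_value

print(f"df = {df}, pooled variance = {pooled_var:.2f}, t = {t:.2f}, reject H0: {reject_h0}")
```

With these inputs the statistic rounds to 1.04 on 20 degrees of freedom, which agrees with the decision not to reject H0.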
Paired t-Test
The paired t-test is a special case of the 2 sample t-test.
It provides a hypothesis test of the difference
between population means for a pair of random
samples whose differences are approximately
normally distributed.
The test statistic is
$t = \frac{\bar{d} - \mu_D}{s_d / \sqrt{n}}$
Where,
$\bar{d}$ = mean of the differences
$\mu_D = \mu_1 - \mu_2$
$s_d$ = standard deviation of the differences
$n$ = Sample size
312, 242, 340, 388, 296, 254, 391, 402, 290
1. H0: $\mu_1 - \mu_2 = 0$ or $\mu_D = 0$
   H1: $\mu_1 - \mu_2 \neq 0$ or $\mu_D \neq 0$
2. $\alpha$ = 0.05.
3. Critical region: t < -2.145 or t > 2.145.
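A paired t statistic can be computed from the pairwise differences, as in this sketch. The `before` and `after` measurements are hypothetical (the slide's data table is incomplete), and the critical value 2.776 is the two-sided 0.05 t-table value for 4 degrees of freedom, which matches this small illustration rather than the slide's example.

```python
import math
import statistics

# Hypothetical paired measurements on the same five subjects (not from the slides).
before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [12.0, 14.0, 10.0, 13.0, 14.0]

# Differences within each pair.
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)  # sample standard deviation of the differences

mu_d = 0.0  # hypothesized mean difference under H0
t = (mean_diff - mu_d) / (sd_diff / math.sqrt(n))

critical_value = 2.776  # t table, two-sided alpha = 0.05, n - 1 = 4 degrees of freedom
reject_h0 = abs(t) > critical_value

print(f"mean difference = {mean_diff:.2f}, t = {t:.3f}, reject H0: {reject_h0}")
```

Because the test works on differences within each pair, subject-to-subject variation cancels out, which is why the paired form is preferred when the two samples are not independent.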
F test
The F-test is designed to test if two population
variances are equal. It does this by comparing the
ratio of two variances. So, if the variances are equal,
the ratio of the variances will be 1.
The F-distribution is formed by the ratio of two
independent chi-square variables divided by their
respective degrees of freedom.
The test statistic is
$F = \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2}$
Where,
$S_1$ = Sample standard deviation from Population-1
$S_2$ = Sample standard deviation from Population-2
$\sigma_1$ = Population-1 standard deviation
$\sigma_2$ = Population-2 standard deviation
F test - Example
An experiment was performed to compare material
strength, using 12 pieces of material-1 and 10 pieces
of material-2. The average strength from the sample of
material-1 is 85 units with S1 = 4, while the
sample of material-2 gave an average of 81 units
with S2 = 5. Assume the populations to be
approximately normal. Investigate whether the variances
are equal at the 0.10 level of significance.
1. H0: $\sigma_1^2 = \sigma_2^2$
   H1: $\sigma_1^2 \neq \sigma_2^2$
2. $\alpha$ = 0.1
3. Critical region: f < 0.34 or f > 3.11
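The remaining computation for this example can be sketched in a few lines of Python. Under H0 the population variances are equal, so the statistic reduces to the ratio of the sample variances; the critical values are the ones stated in the critical region above, not computed here.

```python
# Sample standard deviations from the example.
s1 = 4.0  # material-1, 12 pieces
s2 = 5.0  # material-2, 10 pieces

# Under H0 (equal population variances) the F statistic is the variance ratio.
f = s1 ** 2 / s2 ** 2

# Critical values as stated in the example for alpha = 0.10 (two-sided).
lower_critical, upper_critical = 0.34, 3.11
reject_h0 = f < lower_critical or f > upper_critical

print(f"f = {f:.2f}, reject H0: {reject_h0}")
```

Since the computed ratio lies between the two critical values, this sketch does not reject the hypothesis of equal variances.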
ANOVA
ANOVA: Analysis of Variance
ANOVA is used to compare three or more samples
and draw a conclusion about the populations, i.e., to
check whether the samples belong to the same
population or not.
It uses an F-test that compares the variation between
the sample means with the variation within the samples
to test whether the population means are equal.
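A one-way ANOVA F statistic can be sketched as follows. The three groups are hypothetical data chosen so the arithmetic is easy to follow, and the function name `one_way_anova_f` is introduced here for illustration only.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation between the group means, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical samples (not from the slides).
samples = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value, df_b, df_w = one_way_anova_f(samples)
print(f"F = {f_value:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F value relative to the F table for these degrees of freedom indicates that at least one group mean differs from the others.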
References
Books, papers
Textbook: Walpole, R. E., et al., Probability & Statistics for Engineers &
Scientists, 9th Edition.
Agresti, A. & Finlay, B., Statistical Methods for the
Social Sciences, 3rd Edition. Prentice Hall, 1997.
Anderson, T. W. & Sclove, S. L., Introductory
Statistical Analysis. Houghton Mifflin Company,
1974.
Clarke, G. M. & Cooke, D., A Basic Course in
Statistics. Arnold, 1998.
Thank you!