
Introduction to Statistics

Statistics versus Probability


Probability and statistics are related areas of mathematics which concern themselves
with analyzing the relative frequency of events.
Probability deals with predicting the likelihood of future events, while statistics
involves the analysis of the frequency of past events.
Probability is primarily a theoretical branch of mathematics, which studies the
consequences of mathematical definitions. Statistics is primarily an applied
branch of mathematics, which tries to make sense of observations in the real
world.

What is Statistics?
Statistics is a collection of methods for planning experiments, obtaining
data, and then organizing, summarizing, presenting, analyzing, interpreting, and
drawing conclusions from the data.
Keep in mind:
Statistical inferences are no more accurate than the data they are based on
(weakest link).
Statistical results should be interpreted by one who understands the methods
used as well as the subject matter.

Individuals and Variables


Individuals are the people or objects included in the study.
A variable is a characteristic of the individual to be measured or observed.
Example

If we want to do a study about the people who have climbed Mt. Everest, then the
individuals in the study are the actual people who made it to the top. The variables to
measure or observe might be the height, weight, race, gender, income, etc. of the
individuals who made it to the top of Mt. Everest.

Variables: Quantitative vs. Qualitative


A quantitative variable has a value or numerical measurement for which operations
such as addition or averaging make sense.
A qualitative variable is described by placing the individual into a category or group
such as male or female.

Simple Random Sampling

Random Samples


The outcome of a statistical experiment may be recorded either
as a numerical value or as a descriptive representation.
When a pair of dice is tossed and the total is the outcome
of interest, we record a numerical value.

When the students of a certain school are given blood
tests and the type of blood is of interest, a
descriptive representation might be the most useful
(a person's blood can be classified in 8 ways: AB, A, B, or O,
each with a plus or minus sign).

Definitions
1. A population consists of the totality of the observations with
which we are concerned.
2. A sample is a subset of a population.
3. If $X_1, X_2, \ldots, X_n$ represent a random sample of size n, then the
sample mean is defined by the statistic:
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

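4. If $X_1, X_2, \ldots, X_n$ represent a random sample of size n, then the
sample variance is defined by the statistic:
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \frac{1}{n(n-1)}\left[n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2\right]$$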

Definitions
5. The sample standard deviation, denoted by S, is the
positive square root of the sample variance.
6. The mode of a data set is the value that occurs most
frequently.
7. The median is the central value of an ordered distribution.

Order the data from smallest to largest.
For an odd number of data values,
Median = middle data value
For an even number of data values,
Median = (sum of the two middle values)/2
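
The definitions above can be sketched directly in Python. This is a minimal illustration, not a reference implementation: the function names and the small data set are assumptions made for the example. Both forms of the sample variance (the definition and the computational shortcut) are included so they can be compared.

```python
from collections import Counter
from math import sqrt


def sample_mean(xs):
    """Sample mean: sum of the observations divided by n."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sample variance from the definition: sum of squared deviations / (n - 1)."""
    n = len(xs)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def sample_variance_shortcut(xs):
    """Equivalent shortcut: [n * sum(x^2) - (sum x)^2] / [n (n - 1)]."""
    n = len(xs)
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1))


def sample_std(xs):
    """Sample standard deviation: positive square root of the sample variance."""
    return sqrt(sample_variance(xs))


def median(xs):
    """Middle value of the ordered data, or the average of the two middle values."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def modes(xs):
    """All values that occur most frequently (a data set can have several modes)."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


# Illustrative data (hypothetical, chosen only for this sketch).
data = [2, 4, 4, 5, 7, 8]
print(sample_mean(data), sample_variance(data), sample_variance_shortcut(data))
print(sample_std(data), median(data), modes(data))
```

The shortcut form avoids computing the mean first, which is convenient by hand, although the definitional form is usually the safer choice in floating-point arithmetic.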

Example 01:
The lengths of time, in minutes, that 10 patients waited in a
doctor's office before receiving treatment were recorded as
follows: 5, 11, 9, 5, 10, 15, 6, 10, 5, and 10.
Treating the data as a random sample, find:
a) The mean
b) The median
c) The mode
d) The variance
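Sample mean: $\bar{x} = \frac{1}{10}(5 + 11 + 9 + 5 + 10 + 15 + 6 + 10 + 5 + 10) = \frac{86}{10} = 8.6$

Ordered sample: 5, 5, 5, 6, 9, 10, 10, 10, 11, 15

Median: $\frac{9 + 10}{2} = 9.5$

Modes: 5 and 10

Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{10}\left(x_i - \bar{x}\right)^2 = 10.93$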
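
As a quick check, the four answers of Example 01 can be reproduced with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative, and `statistics.multimode` assumes Python 3.8 or later.

```python
import statistics

# Waiting times (minutes) from Example 01.
waits = [5, 11, 9, 5, 10, 15, 6, 10, 5, 10]

mean = statistics.mean(waits)          # 8.6
med = statistics.median(waits)         # 9.5
modes = statistics.multimode(waits)    # [5, 10]: both values occur three times
variance = statistics.variance(waits)  # sample variance (divides by n - 1), about 10.93

print(mean, med, modes, round(variance, 2))
```

Note that `statistics.variance` uses the n - 1 divisor, matching the sample variance defined earlier, whereas `statistics.pvariance` divides by n.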

Example 02:
The following measurements were recorded for the drying time,
in hours, of a certain brand of Latex paint.
3.4 2.5 4.8 2.9 3.6
2.8 3.3 5.6 3.7 2.8
4.4 4.0 5.2 3.0 4.8
a) Calculate the sample mean, sample median, mode, and
sample variance.
Mean: $\bar{x} = \frac{1}{15}(3.4 + 2.5 + \cdots + 4.8) = 3.787$

Median = 3.6
Modes = 2.8 and 4.8

Sample variance: $s^2 = \frac{1}{14}\sum_{i=1}^{15}\left(x_i - \bar{x}\right)^2 = 0.9427$
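
For readers who want to see the arithmetic behind the variance, the computation can be written out with the shortcut form of the sample variance. The two sums below were computed from the 15 drying times listed in the example:

$$\sum_{i=1}^{15} x_i = 56.8, \qquad \sum_{i=1}^{15} x_i^2 = 228.28$$

$$s^2 = \frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n(n-1)} = \frac{15(228.28) - (56.8)^2}{(15)(14)} = \frac{3424.20 - 3226.24}{210} = \frac{197.96}{210} \approx 0.943$$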

Sampling Distribution
The probability distribution of a statistic is called a
sampling distribution.
The sampling distribution of a statistic depends on the size
of the population, the size of the samples, and the method of
choosing the samples.
1. Sampling Distribution of Means
The sampling distribution of $\bar{X}$ with sample size n is the
distribution that results when an experiment is conducted
over and over and the many values of $\bar{X}$ are recorded.
This sampling distribution describes the variability of
sample averages around the population mean $\mu$.

Sampling Distribution of Means


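Suppose that a random sample of n observations is taken
from a normal population with mean $\mu$ and variance $\sigma^2$.
Each observation $X_i$ of the random sample will then have the
same normal distribution as the population being sampled.
Hence we conclude that the mean
$$\bar{X} = \frac{1}{n}\left(X_1 + X_2 + \cdots + X_n\right)$$
has a normal distribution with mean
$$\mu_{\bar{X}} = \frac{1}{n}\left(\mu + \mu + \cdots + \mu\right) = \mu$$
and variance
$$\sigma^2_{\bar{X}} = \frac{1}{n^2}\left(\sigma^2 + \sigma^2 + \cdots + \sigma^2\right) = \frac{\sigma^2}{n}$$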
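
A small simulation can make this result concrete. The sketch below repeatedly draws samples from a normal population and looks at the mean and variance of the resulting sample means. The parameter values (mean 50, standard deviation 10, n = 25), the function name, and the seed are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from N(mu, sigma^2) and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


# Illustrative population parameters (not taken from the text).
mu, sigma, n = 50.0, 10.0, 25
means = simulate_sample_means(mu, sigma, n, reps=20_000)

# The average of the sample means should be close to mu,
# and their variance close to sigma^2 / n = 4.
print("mean of sample means:    ", statistics.fmean(means))
print("variance of sample means:", statistics.variance(means))
```

Increasing n shrinks the spread of the sample means in proportion to 1/n, which is what the variance formula predicts.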


Central Limit Theorem


If $\bar{X}$ is the mean of a random sample of size n taken from a
population with mean $\mu$ and finite variance $\sigma^2$, then the
limiting form of the distribution of
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
as $n \to \infty$ is the standard normal distribution.

If $n \geq 30$, the normal approximation for $\bar{X}$ will generally be good.
If $n < 30$, the approximation is good only if the population is
not too different from a normal distribution.
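
The theorem can be illustrated with a clearly non-normal population. The sketch below assumes an exponential population with rate 1 (so its mean and standard deviation are both 1), standardizes the mean of each sample of size 30, and checks how often Z falls within ±1.96, the range that holds about 95% of a standard normal distribution. The population choice, function name, and seed are assumptions made for this example.

```python
import random
import statistics
from math import sqrt


def standardized_means(n, reps, seed=1):
    """Return Z = (xbar - mu) / (sigma / sqrt(n)) for samples from Exp(rate=1)."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and standard deviation of the Exp(1) population
    zs = []
    for _ in range(reps):
        xbar = statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        zs.append((xbar - mu) / (sigma / sqrt(n)))
    return zs


zs = standardized_means(n=30, reps=20_000)

# Fraction of standardized means within +/-1.96; close to 0.95 if Z is nearly N(0, 1).
inside = sum(1 for z in zs if -1.96 <= z <= 1.96) / len(zs)

print("mean of Z:", statistics.fmean(zs))
print("sd of Z:  ", statistics.stdev(zs))
print("fraction within +/-1.96:", inside)
```

Rerunning with a much smaller n (say 3) should show a visibly worse match, since the exponential population is strongly skewed.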


Example 03:
An electrical firm manufactures light bulbs that have a length of
life that is approximately normally distributed, with mean equal
to 800 hours and a standard deviation of 40 hours. Find the
probability that a random sample of 16 bulbs will have an
average life of less than 775 hours.
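The sampling distribution of $\bar{X}$ will be approximately normal
with
$$\mu_{\bar{X}} = 800 \quad\text{and}\quad \sigma_{\bar{X}} = \frac{40}{\sqrt{16}} = 10$$
$$z = \frac{\bar{x} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{775 - 800}{10} = -2.5$$
Therefore: $P(\bar{X} < 775) = P(Z < -2.5) = 0.0062$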
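
The same probability can be obtained without a normal table. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800, 40, 16  # population mean, population sd, sample size (Example 03)

se = sigma / sqrt(n)     # standard deviation of the sample mean: 40 / 4 = 10
z = (775 - mu) / se      # standardized value: -2.5
p = NormalDist().cdf(z)  # P(Z < -2.5) under the standard normal distribution

print(f"se = {se}, z = {z}, P(Xbar < 775) = {p:.4f}")
```

Equivalently, `NormalDist(mu, se).cdf(775)` gives the probability directly on the original scale.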


Sampling Distribution of the difference between two averages


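If independent samples of size $n_1$ and $n_2$ are drawn at random
from two populations, discrete or continuous, with means $\mu_1$
and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, then the
sampling distribution of the difference of means, $\bar{X}_1 - \bar{X}_2$, is
approximately normally distributed with mean and variance
given by:
$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 \quad\text{and}\quad \sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
Hence
$$Z = \frac{\left(\bar{X}_1 - \bar{X}_2\right) - \left(\mu_1 - \mu_2\right)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$
is approximately a standard normal variable.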


Example 04:
Two independent experiments are being run in which two
different types of paints are compared. Eighteen specimens are
painted using type A and the drying time, in hours, is recorded
on each. The same is done with type B. The population standard
deviations are both known to be 1.0.
Assuming that the mean drying time is equal for the two types
of paint, find $P(\bar{X}_A - \bar{X}_B > 1.0)$, where $\bar{X}_A$ and $\bar{X}_B$ are the average
drying times for samples of size $n_A = n_B = 18$.

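$$\mu_{\bar{X}_A - \bar{X}_B} = \mu_A - \mu_B = 0$$
$$\sigma^2_{\bar{X}_A - \bar{X}_B} = \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B} = \frac{1}{18} + \frac{1}{18} = \frac{1}{9}$$
$$z = \frac{1 - \left(\mu_A - \mu_B\right)}{\sqrt{1/9}} = 3.0; \quad P(Z > 3.0) = 1 - P(Z < 3.0) = 1 - 0.9987 = 0.0013$$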
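
The calculation in Example 04 can be sketched as a small helper that works for any two independent samples with known population standard deviations. The function name and signature are assumptions made for this illustration. The exact tail probability is about 0.00135; the table-based 0.0013 arises from rounding P(Z < 3.0) to 0.9987.

```python
from math import sqrt
from statistics import NormalDist


def prob_diff_exceeds(d, mu_diff, sigma1, sigma2, n1, n2):
    """Return (z, P(Xbar1 - Xbar2 > d)) using the normal approximation.

    mu_diff is mu1 - mu2; sigma1 and sigma2 are the known population sds.
    """
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)  # sd of the difference of means
    z = (d - mu_diff) / se
    return z, 1 - NormalDist().cdf(z)


# Example 04: equal means, both sds equal to 1.0, samples of size 18.
z, p = prob_diff_exceeds(d=1.0, mu_diff=0.0, sigma1=1.0, sigma2=1.0, n1=18, n2=18)
print(f"z = {z:.2f}, P = {p:.5f}")
```

Such a small probability suggests that an observed difference larger than 1.0 hour would cast doubt on the assumption of equal mean drying times.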

2. Sampling Distribution of $S^2$
If $S^2$ is the variance of a random sample of size n taken from
a normal population having the variance $\sigma^2$, then the
statistic
$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n}\frac{\left(X_i - \bar{X}\right)^2}{\sigma^2}$$
has a chi-squared distribution with $\nu = n - 1$ degrees of freedom.
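
One way to see this result empirically is to simulate the statistic and compare it with two known properties of a chi-squared distribution with ν degrees of freedom: its mean is ν and its variance is 2ν. The sketch below assumes a normal population with mean 0 and standard deviation 2 and samples of size 5; those values, the function name, and the seed are illustrative choices only.

```python
import random
import statistics


def chi2_statistics(n, sigma, reps, seed=2):
    """Simulate (n - 1) * S^2 / sigma^2 for samples of size n from N(0, sigma^2)."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        s2 = statistics.variance(sample)  # sample variance with the n - 1 divisor
        values.append((n - 1) * s2 / sigma ** 2)
    return values


n, sigma = 5, 2.0  # illustrative sample size and population sd
values = chi2_statistics(n, sigma, reps=20_000)

# With nu = n - 1 = 4, the simulated mean should be near 4 and the variance near 8.
print("mean:    ", statistics.fmean(values))
print("variance:", statistics.variance(values))
```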


Exactly 95% of a chi-squared distribution lies between $\chi^2_{0.975}$ and
$\chi^2_{0.025}$. A $\chi^2$ value falling to the right of $\chi^2_{0.025}$ is not likely to occur
unless our assumed value of $\sigma^2$ is too small. Similarly, a $\chi^2$ value falling
to the left of $\chi^2_{0.975}$ is unlikely unless our assumed value of $\sigma^2$ is too
large.

Example 05:
A manufacturer of car batteries guarantees that his batteries
will last, on the average, 3 years with a standard deviation of 1
year. If five of these batteries have lifetimes of 1.9, 2.4, 3.0, 3.5
and 4.2 years, should the manufacturer still be convinced that his
batteries have a standard deviation of 1 year? Assume that the
battery lifetime follows a normal distribution.
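$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \frac{1}{n(n-1)}\left[n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2\right] = \frac{(5)(48.26) - (15)^2}{(5)(4)} = 0.815$$

Then
$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \frac{(4)(0.815)}{1} = 3.26$$
is a value from a chi-squared distribution
with 4 degrees of freedom.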


Since 95% of the $\chi^2$ values with 4 degrees of freedom fall
between 0.484 and 11.143, the computed value with $\sigma^2 = 1$ is
reasonable, and therefore the manufacturer has no reason to
suspect that the standard deviation is other than 1 year.
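
The steps of Example 05 can be sketched in a few lines of Python. The critical values 0.484 and 11.143 are taken from the text rather than computed, since the standard library has no chi-squared quantile function; the variable names are illustrative.

```python
import statistics

# Battery lifetimes (years) from Example 05.
lifetimes = [1.9, 2.4, 3.0, 3.5, 4.2]
n = len(lifetimes)
sigma2 = 1.0  # claimed population variance (standard deviation of 1 year)

s2 = statistics.variance(lifetimes)  # sample variance, about 0.815
chi2 = (n - 1) * s2 / sigma2         # test statistic, about 3.26

# Central 95% range of a chi-squared distribution with 4 degrees of freedom,
# as quoted in the text.
lower, upper = 0.484, 11.143
reasonable = lower <= chi2 <= upper

print(f"s^2 = {s2:.3f}, chi^2 = {chi2:.2f}, within 95% range: {reasonable}")
```

If the statistic had fallen outside this range, the claimed standard deviation of 1 year would have been called into question.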
