
BASIC STATISTICS

Two Types of Statistics

Descriptive statistics of a POPULATION
Relevant notation (Greek):
μ mean
N population size
Σ sum

Inferential statistics of SAMPLES from a population.
Assumptions are made that the sample reflects the population in an unbiased form.
Roman notation:
x̄ mean
n sample size
Σ sum

Concept of Population and Sample

[Figure: Population and Sample]

MEASURES OF CENTRAL TENDENCY FOR UNGROUPED DATA
Mean
Median
Mode

Mean
The mean for ungrouped data is obtained by dividing the sum of all values by the number of values in the data set. Thus,
Mean for population data: \( \mu = \frac{\sum x}{N} \)
Mean for sample data: \( \bar{x} = \frac{\sum x}{n} \)

Example 1
Table 3.1 gives the 2002 total payrolls
of five Major League Baseball (MLB)
teams.
Find the mean of the 2002 payrolls of
these five MLB teams.

Table 3.1

MLB Team                 2002 Total Payroll (millions of dollars)
Anaheim Angels           62
Atlanta Braves           93
New York Yankees         126
St. Louis Cardinals      75
Tampa Bay Devil Rays     34

Solution 1
\( \bar{x} = \frac{\sum x}{n} = \frac{62 + 93 + 126 + 75 + 34}{5} = \frac{390}{5} = 78 \), that is, $78 million

Thus, the mean 2002 payroll of these five MLB teams was $78 million.

Example 2
The following are the ages of all eight
employees of a small company:
53 32 61 27 39 44 49 57
Find the mean age of these employees.

Solution 2
\( \mu = \frac{\sum x}{N} = \frac{362}{8} = 45.25 \) years

Thus, the mean age of all eight employees of this company is 45.25 years, or 45 years and 3 months.
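
The mean calculations in Examples 1 and 2 can be checked with a few lines of Python. This is only a minimal sketch: the function name `mean` and the variable names are illustrative and do not come from the slides.

```python
def mean(values):
    """Arithmetic mean: sum of all values divided by the number of values."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)

# Example 1: 2002 payrolls of five MLB teams (millions of dollars), a sample
payrolls = [62, 93, 126, 75, 34]
# Example 2: ages of all eight employees (years), a population
ages = [53, 32, 61, 27, 39, 44, 49, 57]

payroll_mean = mean(payrolls)
age_mean = mean(ages)

print(payroll_mean)  # 78.0
print(age_mean)      # 45.25
```

The arithmetic is the same for a sample and a population; only the notation (x̄ and n versus μ and N) differs.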

Median
Definition
The median is the value of the middle term in a data set that has been ranked in increasing order.

The calculation of the median consists of the following two steps:
1. Rank the data set in increasing order.
2. Find the middle term in a data set with n values. The value of this term is the median.

Value of Median for Ungrouped Data
Median = value of the \( \left( \frac{n + 1}{2} \right) \)th term in a ranked data set

Example 3
The following data give the weight lost
(in pounds) by a sample of five
members of a health club at the end of
two months of membership:
10 5 19 8 3
Find the median.

Solution 3
First, we rank the given data in increasing order as follows:
3 5 8 10 19
There are five observations in the data set. Consequently, n = 5 and
Position of the middle term = \( \frac{n + 1}{2} = \frac{5 + 1}{2} = 3 \)
Therefore, the median is the value of the third term in the ranked data.
3 5 [8] 10 19
Median = 8
The median weight loss for this sample of five members of this health club is 8 pounds.

Example 4
Table 3.3 lists the total revenue for
the 12 top-grossing North American
concert tours of all time.
Find the median revenue for these
data.

Table 3.3

Tour                          Total Revenue (millions of dollars)
Steel Wheels, 1989            98.0
Magic Summer, 1990            74.1
Voodoo Lounge, 1994           121.2
The Division Bell, 1994       103.5
Hell Freezes Over, 1994       79.4
Bridges to Babylon, 1997      89.3
Popmart, 1997                 79.9
Twenty-Four Seven, 2000       80.2
No Strings Attached, 2000     76.4
Elevation, 2001               109.7
Popodyssey, 2001              86.8
(tour name not given)         82.1

Solution 4
First we rank the given data in increasing order, as follows:
74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2
There are 12 values in this data set. Hence, n = 12 and
Position of the middle term = \( \frac{n + 1}{2} = \frac{12 + 1}{2} = 6.5 \)
Therefore, the median is given by the mean of the sixth and the seventh values in the ranked data, 82.1 and 86.8.
Median = \( \frac{82.1 + 86.8}{2} = 84.45 \), that is, $84.45 million
Thus the median revenue for the 12 top-grossing North American concert tours of all time is $84.45 million.
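
The two-step median procedure (rank, then take the middle term or the mean of the two middle terms) can be sketched in Python as follows. The function name `median` is an illustrative choice, and the data are those of Examples 3 and 4.

```python
def median(values):
    """Median of ungrouped data: rank the values, then take the middle term.

    For an odd n the middle term sits at position (n + 1) / 2.
    For an even n that position falls between two terms, so their mean is used.
    """
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2  # zero-based index of the upper middle term
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

# Example 3: weight lost (pounds) by five health club members
weight_loss = [10, 5, 19, 8, 3]
# Example 4: revenue (millions of dollars) of 12 concert tours
revenues = [98.0, 74.1, 121.2, 103.5, 79.4, 89.3,
            79.9, 80.2, 76.4, 109.7, 86.8, 82.1]

median_weight_loss = median(weight_loss)
median_revenue = median(revenues)

print(median_weight_loss)        # 8
print(round(median_revenue, 2))  # 84.45
```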

Mode
Definition
The mode is the value that occurs with
the highest frequency in a data set.

Example 5
The following data give the speeds (in miles per hour) of eight cars that were stopped on I-95 for speeding violations.
77 69 74 81 71 68 74 73
Find the mode.

Solution 5
In this data set, 74 occurs twice and each of the remaining values occurs only once. Because 74 occurs with the highest frequency, it is the mode. Therefore,
Mode = 74 miles per hour

Mode cont.
A data set may have none or many modes, whereas it will have only one mean and only one median.
The data set with only one mode is called unimodal.
The data set with two modes is called bimodal.
The data set with more than two modes is called multimodal.

Example 6
Last year's incomes of five randomly selected families were $36,150, $95,750, $54,985, $77,490, and $23,740. Find the mode.

Solution 6
Because each value in this data set occurs only once, this data set contains no mode.

Example 7
The prices of the same brand of
television set at eight stores are
found to be $495, $486, $503, $495,
$470, $505, $470 and $499. Find the
mode.

Solution 7
In this data set, each of the two values $495 and $470 occurs twice and each of the remaining values occurs only once. Therefore, this data set has two modes: $495 and $470.

Example 8
The ages of 10 randomly selected
students from a class are 21, 19, 27,
22, 29, 19, 25, 21, 22 and 30. Find the
mode.


Solution 8
This data set has three modes: 19, 21
and 22. Each of these three values
occurs with a (highest) frequency of
2.
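
Examples 5 to 8 cover the unimodal, no-mode, bimodal and multimodal cases. The sketch below counts frequencies with `collections.Counter` and returns every value that shares the highest frequency. The function name `modes` is illustrative, and returning an empty list when no value repeats is an assumption made to match the convention used in Solution 6.

```python
from collections import Counter

def modes(values):
    """Return all values that occur with the highest frequency, in sorted order.

    Assumption (matches Solution 6): if every value occurs only once,
    the data set has no mode and an empty list is returned.
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)

speeds = [77, 69, 74, 81, 71, 68, 74, 73]          # Example 5
incomes = [36150, 95750, 54985, 77490, 23740]      # Example 6
prices = [495, 486, 503, 495, 470, 505, 470, 499]  # Example 7
ages = [21, 19, 27, 22, 29, 19, 25, 21, 22, 30]    # Example 8

print(modes(speeds))   # [74]
print(modes(incomes))  # []
print(modes(prices))   # [470, 495]
print(modes(ages))     # [19, 21, 22]
```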

MEASURES OF DISPERSION FOR UNGROUPED DATA
Range
Variance and Standard Deviation
Population Parameters and Sample Statistics

Range
Finding Range for Ungrouped Data
Range = Largest value - Smallest value

Example 9
Table 3.4 gives the total areas in square miles of the four western South-Central states of the United States.
Find the range for this data set.

Table 3.4

State        Total Area (square miles)
Arkansas     53,182
Louisiana    49,651
Oklahoma     69,903
Texas        267,277

Solution 9
Range = Largest value - Smallest value
= 267,277 - 49,651
= 217,626 square miles
Thus, the total areas of these four states are spread over a range of 217,626 square miles.
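
As a quick check of Solution 9, the range is simply the largest value minus the smallest. The dictionary layout and the name `data_range` below are illustrative choices, not part of the slides.

```python
def data_range(values):
    """Range of ungrouped data: largest value minus smallest value."""
    values = list(values)
    if not values:
        raise ValueError("range of an empty data set is undefined")
    return max(values) - min(values)

# Total areas in square miles from Example 9
areas = {
    "Arkansas": 53182,
    "Louisiana": 49651,
    "Oklahoma": 69903,
    "Texas": 267277,
}

area_range = data_range(areas.values())
print(area_range)  # 217626
```

Because the range uses only the two extreme values, a single large observation such as Texas dominates the result.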
Variance and Standard Deviation
The standard deviation is the most used measure of dispersion.
The value of the standard deviation tells how closely the values of a data set are clustered around the mean.

Short-cut Formulas for the Variance and Standard Deviation for Ungrouped Data
\( \sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} \) and \( s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} \)
where \( \sigma^2 \) is the population variance and \( s^2 \) is the sample variance.

The standard deviation is obtained by taking the positive square root of the variance.
Population standard deviation: \( \sigma = \sqrt{\sigma^2} \)
Sample standard deviation: \( s = \sqrt{s^2} \)

Example 10
Refer to data in Table 3.1 on the 2002 total payroll (in millions of dollars) of five MLB teams.
Find the variance and standard deviation of these data.

Solution 10
Table 3.6

x       x²
62      3844
93      8649
126     15,876
75      5625
34      1156

Σx = 390     Σx² = 35,150

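\( s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{35{,}150 - \frac{(390)^2}{5}}{5 - 1} = \frac{35{,}150 - 30{,}420}{4} = 1182.50 \)
\( s = \sqrt{1182.50} = 34.387498 \) million dollars = $34,387,498

Thus, the standard deviation of the 2002 payrolls of these five MLB teams is $34,387,498.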
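
The short-cut formula needs only n, Σx and Σx², which makes it easy to check by program. Below is a minimal sketch for the sample case of Example 10. The function name `shortcut_variance` and its `sample` flag are illustrative and not part of the slides.

```python
import math

def shortcut_variance(values, sample=True):
    """Variance via the short-cut formula (sum of x^2 minus (sum of x)^2 / n).

    Divides by n - 1 for a sample and by n for a population.
    """
    n = len(values)
    if n < 2 and sample:
        raise ValueError("a sample variance needs at least two values")
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    numerator = sum_x2 - sum_x ** 2 / n
    return numerator / (n - 1 if sample else n)

# 2002 payrolls of five MLB teams, in millions of dollars (a sample)
payrolls = [62, 93, 126, 75, 34]

payroll_variance = shortcut_variance(payrolls, sample=True)
payroll_std = math.sqrt(payroll_variance)

print(payroll_variance)       # 1182.5
print(round(payroll_std, 6))  # 34.387498
```

Note that the variance is in squared units (millions of dollars squared), while the standard deviation is back in millions of dollars.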

Example 11
The following data are the 2002
earnings (in thousands of dollars)
before taxes for all six employees of a
small company.
48.50 38.40 65.50
22.60 79.80 54.60
Calculate the variance and standard
deviation for these data.
Solution 11
Table 3.7

x         x²
48.50     2352.25
38.40     1474.56
65.50     4290.25
22.60     510.76
79.80     6368.04
54.60     2981.16

Σx = 309.40     Σx² = 17,977.02

\( \sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} = \frac{17{,}977.02 - \frac{(309.40)^2}{6}}{6} = 337.0489 \)
\( \sigma = \sqrt{337.0489} = 18.359 \) thousand dollars = $18,359

Thus, the standard deviation of the 2002 earnings of all six employees of this company is $18,359.
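
Example 11 uses population formulas because the data cover all six employees. The sketch below computes the population variance with the short-cut formula, then compares it with the definitional form Σ(x − μ)²/N and with `statistics.pvariance` from the Python standard library. The variable names are illustrative.

```python
import math
import statistics

# 2002 earnings of all six employees, in thousands of dollars (a population)
earnings = [48.50, 38.40, 65.50, 22.60, 79.80, 54.60]

N = len(earnings)
sum_x = sum(earnings)
sum_x2 = sum(x * x for x in earnings)

# Short-cut formula: (sum of x^2 - (sum of x)^2 / N) / N
shortcut = (sum_x2 - sum_x ** 2 / N) / N

# Definitional formula: average squared deviation from the population mean
mu = sum_x / N
definitional = sum((x - mu) ** 2 for x in earnings) / N

sigma = math.sqrt(shortcut)

print(round(shortcut, 4))  # 337.0489
print(round(sigma, 3))     # 18.359
print(math.isclose(shortcut, definitional))                    # True
print(math.isclose(shortcut, statistics.pvariance(earnings)))  # True
```

The two formulas are algebraically identical; the short-cut form just avoids computing each deviation from the mean by hand.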
MEAN, VARIANCE AND STANDARD DEVIATION FOR GROUPED DATA
Mean for Grouped Data
Variance and Standard Deviation for Grouped Data

Mean for Grouped Data
Calculating Mean for Grouped Data
Mean for population data: \( \mu = \frac{\sum mf}{N} \)
Mean for sample data: \( \bar{x} = \frac{\sum mf}{n} \)
where m is the midpoint and f is the frequency of a class.

Example 12
Table 3.8 gives the frequency
distribution of the daily commuting
times (in minutes) from home to work
for all 25 employees of a company.
Calculate the mean of the daily
commuting times.

Table 3.8

Daily Commuting Time (minutes)    Number of Employees
0 to less than 10                 4
10 to less than 20                9
20 to less than 30                6
30 to less than 40                4
40 to less than 50                2

Solution 12
Table 3.9

Daily Commuting Time (minutes)    f     m     mf
0 to less than 10                 4     5     20
10 to less than 20                9     15    135
20 to less than 30                6     25    150
30 to less than 40                4     35    140
40 to less than 50                2     45    90

N = 25     Σmf = 535

\( \mu = \frac{\sum mf}{N} = \frac{535}{25} = 21.40 \) minutes

Thus, the employees of this company spend an average of 21.40 minutes a day commuting from home to work.

Example 13
Table 3.10 gives the frequency
distribution of the number of orders
received each day during the past 50
days at the office of a mail-order
company.
Calculate the mean.

Table 3.10

Number of Orders    Number of Days
10-12               4
13-15               12
16-18               20
19-21               14

Solution 13
Table 3.11

Number of Orders    f     m     mf
10-12               4     11    44
13-15               12    14    168
16-18               20    17    340
19-21               14    20    280

n = 50     Σmf = 832

\( \bar{x} = \frac{\sum mf}{n} = \frac{832}{50} = 16.64 \) orders

Thus, this mail-order company received an average of 16.64 orders per day during these 50 days.
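
The grouped-data mean in Examples 12 and 13 can be sketched in Python as below. Each class is represented as a `(lower, upper, frequency)` tuple and the midpoint m is taken as the average of the two class limits. This representation and the name `grouped_mean` are illustrative assumptions, not part of the slides.

```python
def grouped_mean(classes):
    """Mean of grouped data: sum of (midpoint * frequency) over sum of frequencies.

    `classes` is a list of (lower, upper, frequency) tuples.
    """
    total_f = sum(f for _, _, f in classes)
    if total_f == 0:
        raise ValueError("the frequency distribution is empty")
    total_mf = sum((lower + upper) / 2 * f for lower, upper, f in classes)
    return total_mf / total_f

# Example 12: daily commuting times (minutes) for all 25 employees
commuting = [(0, 10, 4), (10, 20, 9), (20, 30, 6), (30, 40, 4), (40, 50, 2)]
# Example 13: number of orders per day over 50 days
orders = [(10, 12, 4), (13, 15, 12), (16, 18, 20), (19, 21, 14)]

commuting_mean = grouped_mean(commuting)
orders_mean = grouped_mean(orders)

print(round(commuting_mean, 2))  # 21.4
print(round(orders_mean, 2))     # 16.64
```

Because every value in a class is treated as if it sat at the class midpoint, the result approximates the mean of the underlying raw data.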
Variance and Standard Deviation for Grouped Data
Short-Cut Formulas for the Variance and Standard Deviation for Grouped Data
\( \sigma^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{N}}{N} \) and \( s^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{n}}{n - 1} \)
where \( \sigma^2 \) is the population variance, \( s^2 \) is the sample variance, and m is the midpoint of a class.

The standard deviation is obtained by taking the positive square root of the variance.
Population standard deviation: \( \sigma = \sqrt{\sigma^2} \)
Sample standard deviation: \( s = \sqrt{s^2} \)

Example 14
Table 3.8 gives the frequency
distribution of the daily commuting
times (in minutes) from home to work
for all 25 employees of a company.
Calculate the variance and standard
deviation.

Table 3.8

Daily Commuting Time (minutes)    Number of Employees
0 to less than 10                 4
10 to less than 20                9
20 to less than 30                6
30 to less than 40                4
40 to less than 50                2

Solution 14
Table 3.12

Daily Commuting Time (minutes)    f     m     mf     m²f
0 to less than 10                 4     5     20     100
10 to less than 20                9     15    135    2025
20 to less than 30                6     25    150    3750
30 to less than 40                4     35    140    4900
40 to less than 50                2     45    90     4050

N = 25     Σmf = 535     Σm²f = 14,825

\( \sigma^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{N}}{N} = \frac{14{,}825 - \frac{(535)^2}{25}}{25} = \frac{3376}{25} = 135.04 \)

Hence, the standard deviation is
\( \sigma = \sqrt{\sigma^2} = \sqrt{135.04} = 11.62 \) minutes

Thus, the standard deviation of the daily commuting times for these employees is 11.62 minutes.
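
The grouped short-cut formula can be checked the same way as the ungrouped one. The sketch below reproduces Solution 14 from the class midpoints and frequencies; the function name `grouped_variance` and its `sample` flag are illustrative.

```python
import math

def grouped_variance(midpoints, freqs, sample=False):
    """Variance of grouped data via the short-cut formula.

    Uses sum(m^2 * f) - (sum(m * f))^2 / n, divided by n for a population
    or by n - 1 for a sample, where n is the sum of the frequencies.
    """
    n = sum(freqs)
    sum_mf = sum(m * f for m, f in zip(midpoints, freqs))
    sum_m2f = sum(m * m * f for m, f in zip(midpoints, freqs))
    return (sum_m2f - sum_mf ** 2 / n) / (n - 1 if sample else n)

# Example 14: commuting times for all 25 employees (a population)
midpoints = [5, 15, 25, 35, 45]
freqs = [4, 9, 6, 4, 2]

commuting_variance = grouped_variance(midpoints, freqs, sample=False)
commuting_std = math.sqrt(commuting_variance)

print(round(commuting_variance, 2))  # 135.04
print(round(commuting_std, 2))       # 11.62
```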
Mean - Grouped Data
When data are already grouped in a frequency distribution:
\( \bar{x} = \frac{\sum_{i=1}^{h} f_i x_i}{n} \)
where
n = \( \sum f_i \) = sum of the frequencies
f_i = frequency in the ith cell
h = number of cells/classes
x_i = midpoint of the ith cell

Mean Example 15
Grouped Data

Cell (i)   Class boundary   Midpoint (xi)   Freq (fi)   fixi    Cumulative fi
1          1 - 20           10              2           20      2
2          21 - 40          30              10          300     12
3          41 - 60          50              20          1000    32
4          61 - 80          70              12          840     44
5          81 - 100         90              6           540     50
Totals                                      50          2700

\( \bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{2700}{50} = 54 \)

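Median - Grouped Data
Find the cell/class that contains the middle value, then interpolate within that cell using
\( x_{0.5} = L_m + \left( \frac{\frac{n}{2} - cf_m}{f_m} \right) i \)
L_m = lower boundary of the cell containing the median
cf_m = cumulative frequency of all cells below L_m
f_m = frequency of the cell/class where the median occurs
i = cell interval
Example
MD = \( 40.5 + \left( \frac{\frac{50}{2} - 12}{20} \right) 20 = 40.5 + 13 = 53.5 \)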
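
The interpolation formula can be sketched in Python as below. The frequencies 2, 10, 20, 12 and 6 are the values implied by the fixi and cumulative columns of Example 15, and the lower cell boundaries 0.5, 20.5, ... with a cell interval of 20 are assumptions consistent with the 40.5 used in the example. The function name `grouped_median` is illustrative.

```python
def grouped_median(lower_boundaries, freqs, interval):
    """Median of grouped data by linear interpolation inside the median cell.

    Implements L_m + ((n / 2 - cf_m) / f_m) * i, where cf_m is the cumulative
    frequency of all cells below the median cell.
    """
    n = sum(freqs)
    cumulative = 0  # cumulative frequency of the cells already passed
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and cumulative + f >= n / 2:
            return lower + (n / 2 - cumulative) / f * interval
        cumulative += f
    raise ValueError("the frequency distribution is empty")

# Example 15 data (frequencies inferred from the fixi and cumulative columns)
lower_boundaries = [0.5, 20.5, 40.5, 60.5, 80.5]
freqs = [2, 10, 20, 12, 6]

md = grouped_median(lower_boundaries, freqs, interval=20)
print(round(md, 1))  # 53.5
```

The median cell is the third one (41 - 60) because the cumulative frequency first reaches n/2 = 25 there: 12 observations lie below it and 32 lie at or below its upper end.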

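Standard deviation - grouped data

Cell (i)   Class boundary   Midpoint (xi)   Freq (fi)   fixi    Cumulative fi
1          1 - 20           10              2           20      2
2          21 - 40          30              10          300     12
3          41 - 60          50              20          1000    32
4          61 - 80          70              12          840     44
5          81 - 100         90              6           540     50

Totals: Σfi = 50     Σfixi = 2700     Σfixi² = 166,600

\( s = \sqrt{\frac{n \sum_{i=1}^{h} f_i x_i^2 - \left( \sum_{i=1}^{h} f_i x_i \right)^2}{n(n - 1)}} = \sqrt{\frac{50(166{,}600) - (2700)^2}{50 \times 49}} = \sqrt{424.49} = 20.6 \)

NOTE: Do not round off Σfixi and Σfixi² before the final step; rounding them affects the accuracy of the result.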
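
The sample standard deviation for the grouped data of Example 15 can be reproduced with the sketch below. The frequencies 2, 10, 20, 12 and 6 are the values implied by the fixi and cumulative columns, and the name `grouped_sample_std` is illustrative. Intermediate sums are kept unrounded, in line with the note above.

```python
import math

def grouped_sample_std(midpoints, freqs):
    """Sample standard deviation of grouped data.

    Implements sqrt((n * sum(f * x^2) - (sum(f * x))^2) / (n * (n - 1))),
    where n is the sum of the frequencies.
    """
    n = sum(freqs)
    if n < 2:
        raise ValueError("at least two observations are required")
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))
    return math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

# Example 15 data (frequencies inferred from the fixi and cumulative columns)
midpoints = [10, 30, 50, 70, 90]
freqs = [2, 10, 20, 12, 6]

sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))
s = grouped_sample_std(midpoints, freqs)

print(sum_fx)       # 2700
print(sum_fx2)      # 166600
print(round(s, 1))  # 20.6
```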
