
Descriptive Statistics

Sheila R. Bonito, DrPH


UP Open University

Learning Objectives
After this study session, you should be able to:
1. Organize data using frequency distributions
2. Describe data using:
   a. Measures of central tendency
   b. Measures of dispersion or variability
3. Present data with the use of:
   a. Tables
   b. Graphs
4. Interpret tables and graphs

Frequency distribution
A frequency distribution is the organization of raw data in table form, using classes and frequencies.

Example raw data (responses to the question: What is your age?)

50 32 29 49 40 49 39 25 24 38 26 34

Types of frequency distribution


1. Ungrouped frequency distribution: raw scores
2. Categorical frequency distribution: used for data that can be placed in specific categories, such as nominal- or ordinal-level data
3. Grouped frequency distribution: used when the range of the data is large; data are grouped into classes that are more than one unit in width

How to construct frequency distribution


How to construct a categorical frequency distribution

1. Identify the classes and put them in the first column (column A)
2. Tally the data and place the results in the second column (column B)
3. Count the tallies and place the results in column C
4. Find the percentage of values in each class using the formula % = f/n * 100, and place the results in column D
5. Find the totals for columns C and D

Class     Tally     Freq     Percent
Male
Female
Total
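The five steps above can be sketched in Python. This is only an illustration: the list `responses` is made-up data (the slides give no raw responses for this table), and the function name `categorical_distribution` is hypothetical.

```python
from collections import Counter

# Hypothetical raw responses; NOT taken from the slides.
responses = ["Male", "Female", "Female", "Male", "Female",
             "Female", "Male", "Female", "Female", "Female"]


def categorical_distribution(values):
    """Return rows of (class, frequency, percent), ending with a Total row."""
    counts = Counter(values)              # steps 2-3: tally and count
    n = sum(counts.values())
    rows = [(cls, f, round(f / n * 100, 1))   # step 4: % = f/n * 100
            for cls, f in sorted(counts.items())]
    total_pct = round(sum(pct for _, _, pct in rows), 1)
    rows.append(("Total", n, total_pct))      # step 5: column totals
    return rows


table = categorical_distribution(responses)
for cls, f, pct in table:
    print(f"{cls:<8}{f:>6}{pct:>9.1f}")
```

The tally column is skipped because `Counter` does the tallying and counting in one pass.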

How to construct a grouped frequency distribution

1. Identify the classes and put them in the first column
   a. Find the highest and lowest values
   b. Find the range
   c. Select the number of classes desired
   d. Find the width (R / number of classes)
   e. Get the lower limits (start from the lowest value and add the width repeatedly)
   f. Get the upper limits
   g. Find the boundaries (LL - 0.5 and UL + 0.5)
2. Tally the data and place the results in the second column
3. Find the numerical frequencies from the tallies
4. Find the cumulative frequencies from the numerical frequencies

Class limits    Class boundaries    Tally    Frequency    Percent
21-30
31-40
41-50
Total
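As a sketch of these steps, the twelve ages from the "What is your age?" example can be grouped into the classes 21-30, 31-40, and 41-50 shown in the table skeleton. The function name `grouped_distribution` and the dictionary keys are illustrative choices, and the 0.5 adjustment for boundaries assumes whole-number data.

```python
ages = [50, 32, 29, 49, 40, 49, 39, 25, 24, 38, 26, 34]
class_limits = [(21, 30), (31, 40), (41, 50)]  # taken from the table skeleton


def grouped_distribution(values, limits):
    """Build one row per class: limits, boundaries, f, percent, cumulative f."""
    n = len(values)
    rows, cumulative = [], 0
    for lower, upper in limits:
        f = sum(lower <= v <= upper for v in values)   # frequency of the class
        cumulative += f                                # running total
        rows.append({
            "limits": f"{lower}-{upper}",
            "boundaries": (lower - 0.5, upper + 0.5),  # LL - 0.5, UL + 0.5
            "freq": f,
            "percent": round(f / n * 100, 1),
            "cum_freq": cumulative,
        })
    return rows


rows = grouped_distribution(ages, class_limits)
for r in rows:
    print(r["limits"], r["boundaries"], r["freq"], r["percent"], r["cum_freq"])
```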

Principles in constructing grouped frequency distribution


1. The classes must be mutually exclusive
2. The classes must be continuous
3. The classes must be exhaustive
4. The classes must be equal in width
5. The class width should be an odd number
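These principles can be checked mechanically. The sketch below assumes whole-number class limits listed in ascending order; `check_classes` is a hypothetical helper name, and the classes and ages are the ones used earlier in the slides.

```python
def check_classes(limits, values):
    """Check a list of (lower, upper) integer class limits against the principles."""
    pairs = list(zip(limits, limits[1:]))            # consecutive classes
    widths = [upper - lower + 1 for lower, upper in limits]
    return {
        # no value can fall into two classes
        "mutually_exclusive": all(a[1] < b[0] for a, b in pairs),
        # no gaps between one class and the next
        "continuous": all(b[0] - a[1] == 1 for a, b in pairs),
        # every data value belongs to some class
        "exhaustive": all(any(lo <= v <= hi for lo, hi in limits) for v in values),
        "equal_width": len(set(widths)) == 1,
        "odd_width": all(w % 2 == 1 for w in widths),
    }


ages = [50, 32, 29, 49, 40, 49, 39, 25, 24, 38, 26, 34]
result = check_classes([(21, 30), (31, 40), (41, 50)], ages)
print(result)
```

The classes 21-30, 31-40, and 41-50 pass the first four checks but have a width of 10, so they do not meet the odd-width principle.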

Uses of frequency distribution


1. Organize data in a meaningful, intelligible way
2. Determine the nature or shape of the distribution
3. Compare different data sets
4. Facilitate computational procedures for measures of central tendency and dispersion
5. Draw charts and graphs for the presentation of data

Applications of frequency distribution


- Present the number of respondents in all categories of a nominal variable
- Compare different groups on the categories of the same variable
- Compute proportions, percentages, ratios, and rates
- Compute simple and cumulative frequencies and percentages
- Calculate percentile ranks
- Produce cross-tabulations showing row, column, and total percents

Example
Civil status           f
Single                20
Married               30
Previously married    10
Total                 60

Example
Civil status          Male    Female
Single                  10        10
Married                 10        20
Previously married       2         8
Total                   22        38

Example
Class interval     f      %     cf     c%
90-99              3     15     20    100
80-89              4     20     17     85
70-79              6     30     13     65
60-69              3     15      7     35
50-59              2     10      4     20
40-49              2     10      2     10
Total             20    100
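The %, cf, and c% columns all follow from the frequency column. A minimal sketch, with the classes listed from lowest to highest so that the cumulative sums run upward (variable names are illustrative):

```python
from itertools import accumulate

intervals = ["40-49", "50-59", "60-69", "70-79", "80-89", "90-99"]
freqs = [2, 2, 3, 6, 4, 3]          # f column, lowest class first

n = sum(freqs)                                       # total number of cases
percents = [f / n * 100 for f in freqs]              # % = f/n * 100
cum_freqs = list(accumulate(freqs))                  # cf: running total of f
cum_percents = [cf / n * 100 for cf in cum_freqs]    # c% = cf/n * 100

for row in zip(intervals, freqs, percents, cum_freqs, cum_percents):
    print("{:<8}{:>4}{:>7.0f}{:>5}{:>7.0f}".format(*row))
```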

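Percentile Rank

PR = c%b + ((X - L) / i) (%)

where c%b is the cumulative percent below the class containing the score X, L is the lower boundary of that class, i is the class width, and % is the percent of cases in that class.

Example
Given a score of 77, find its PR using the class-interval table above. The score falls in the 70-79 class (L = 69.5, i = 10, % = 30), and the cumulative percent below that class is 35.

PR = 35 + ((77 - 69.5) / 10) (30)
   = 35 + (7.5 / 10) (30)
   = 35 + 22.5
   = 57.5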
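The worked example translates directly into a small function. The function and parameter names are illustrative; the symbols match the formula (c%b, X, L, i, %).

```python
def percentile_rank(score, lower_boundary, width, cum_pct_below, class_pct):
    """PR = c%b + ((X - L) / i) * %, for a score inside a given class."""
    return cum_pct_below + (score - lower_boundary) / width * class_pct


# Score of 77 in the 70-79 class: L = 69.5, i = 10, c%b = 35, % = 30
pr = percentile_rank(77, lower_boundary=69.5, width=10,
                     cum_pct_below=35, class_pct=30)
print(pr)
```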

Example
Civil status          Male          Female        Total
Single                10 (16.7%)    10 (16.7%)    20 (33.3%)
Married               10 (16.7%)    20 (33.3%)    30 (50.0%)
Previously married     2 (3.3%)      8 (13.3%)    10 (16.7%)
Total                 22 (36.7%)    38 (63.3%)    60 (100.0%)
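The percentages in this cross-tabulation are total percents: each count divided by the grand total of 60. A sketch of that calculation, with illustrative variable names:

```python
counts = {
    "Single":             {"male": 10, "female": 10},
    "Married":            {"male": 10, "female": 20},
    "Previously married": {"male": 2,  "female": 8},
}

grand_total = sum(sum(row.values()) for row in counts.values())


def total_percent(count):
    """Percent of the grand total, rounded to one decimal place."""
    return round(count / grand_total * 100, 1)


row_totals = {status: sum(row.values()) for status, row in counts.items()}
col_totals = {sex: sum(row[sex] for row in counts.values())
              for sex in ("male", "female")}

for status, row in counts.items():
    cells = [f"{c} ({total_percent(c)}%)" for c in row.values()]
    print(status, *cells, f"{row_totals[status]} ({total_percent(row_totals[status])}%)")
```

Row percents or column percents would divide by `row_totals` or `col_totals` instead of `grand_total`.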

Measures of Central Tendency


Mean (average)
- The sum of the values divided by the total number of values, commonly written x̄ (x-bar)
- For grouped data:
  - Make a table with the columns: Class (A), Freq (B), Midpoint (C), Freq * Midpoint (D)
  - Find the midpoint of each class
  - Multiply the midpoints by the frequencies
  - Find the sum of column D
  - Divide the sum of column D by the sum of the frequencies
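As an illustration of the grouped-data procedure, the sketch below applies it to the class-interval example from earlier in the slides (40-49 through 90-99). The function name `grouped_mean` is an illustrative choice.

```python
# (class limits, frequency) from the class-interval example
classes = [((40, 49), 2), ((50, 59), 2), ((60, 69), 3),
           ((70, 79), 6), ((80, 89), 4), ((90, 99), 3)]


def grouped_mean(classes):
    """Mean for grouped data: sum(f * midpoint) / sum(f)."""
    total_f = sum(f for _, f in classes)                        # sum of Freq
    total_fm = sum(f * (lo + hi) / 2 for (lo, hi), f in classes)  # sum of col D
    return total_fm / total_f


mean = grouped_mean(classes)
print(mean)
```

Here the sum of Freq * Midpoint is 1460 and the sum of the frequencies is 20, giving a mean of 73.0.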

Median
- Divides a distribution into two equal halves
- Not influenced by extreme values
- Steps:
  - Arrange the data in order
  - Select the midpoint

(see formula in separate sheet)
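For ungrouped data the two steps can be sketched as follows, using the twelve ages from the earlier example. With an even number of values, the midpoint is taken as the average of the two middle values, which is the usual convention and the one Python's `statistics.median` follows.

```python
import statistics

ages = [50, 32, 29, 49, 40, 49, 39, 25, 24, 38, 26, 34]

ordered = sorted(ages)                 # step 1: arrange the data in order
n = len(ordered)
if n % 2 == 1:
    median = ordered[n // 2]           # odd n: the single middle value
else:
    # even n: average of the two middle values
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(ordered)
print(median, statistics.median(ages))
```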

Mode
- The value that occurs most often in a data set
- For grouped data, the mode is the modal class (the class with the largest frequency)
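Both cases can be illustrated with data already in the slides: the twelve ages for ungrouped data and the class-interval example for grouped data. Variable names are illustrative.

```python
import statistics

ages = [50, 32, 29, 49, 40, 49, 39, 25, 24, 38, 26, 34]
modes = statistics.multimode(ages)     # list of the most frequent value(s)

# Frequencies from the class-interval example
grouped = {"90-99": 3, "80-89": 4, "70-79": 6,
           "60-69": 3, "50-59": 2, "40-49": 2}
modal_class = max(grouped, key=grouped.get)   # class with the largest frequency

print(modes, modal_class)
```

`multimode` is used rather than `mode` so that a data set with several equally frequent values reports all of them.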

Measures of variability
- Concerned with the spread of the data
- Enable the evaluation of the homogeneity or heterogeneity of a sample

- Range: the simplest but most unstable measure of variability; the difference between the highest and lowest scores
- Semi-quartile range: half the range of the middle 50% of the scores; more stable than the range
- Percentile: represents the percentage of cases a given score exceeds
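A sketch of these three measures on the twelve ages from the earlier example. Two assumptions are made here: the semi-quartile range is computed as half the distance between the first and third quartiles, and the quartiles come from `statistics.quantiles` with its default method (other quartile conventions give slightly different values).

```python
import statistics

ages = [50, 32, 29, 49, 40, 49, 39, 25, 24, 38, 26, 34]

data_range = max(ages) - min(ages)             # highest minus lowest score

q1, q2, q3 = statistics.quantiles(ages, n=4)   # quartile cut points
semi_quartile_range = (q3 - q1) / 2            # half the spread of the middle 50%


def percent_below(score, values):
    """Percentage of cases that a given score exceeds."""
    return sum(v < score for v in values) / len(values) * 100


print(data_range, semi_quartile_range, round(percent_below(40, ages), 1))
```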

- Variance: the average of the squares of the distance each value is from the mean
  (see formula in separate sheet)
- Standard deviation:
  - the most frequently used measure of variability
  - based on the concept of the normal curve
  - a measure of the average deviation of the scores from the mean; always reported with the mean
  - used in the calculation of many inferential statistics
  (see formula in separate sheet)
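The formulas themselves are on a separate sheet that is not part of this text, so the sketch below shows both common versions as an assumption: the population variance (divide by n) and the sample variance (divide by n - 1). The data are the twelve ages from the earlier example.

```python
import math
import statistics

ages = [50, 32, 29, 49, 40, 49, 39, 25, 24, 38, 26, 34]

mean = statistics.mean(ages)
squared_deviations = [(x - mean) ** 2 for x in ages]   # squared distance from mean

population_variance = sum(squared_deviations) / len(ages)        # divide by n
sample_variance = sum(squared_deviations) / (len(ages) - 1)      # divide by n - 1

population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(mean, round(sample_variance, 2), round(sample_sd, 2))
```

Which divisor applies depends on whether the data are treated as a whole population or as a sample from one.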

Variance and standard deviation for grouped data:
- Make a table with the columns: Class (A), Freq (B), Midpoint (C), Freq * Midpoint (D), Freq * Midpoint^2 (E)
- Multiply the midpoints by the frequencies and place the products in column D
- Multiply the frequencies by the squares of the midpoints and place the products in column E
- Find the sums of columns B, D, and E
- Substitute in the formula and solve to get the variance
- Take the square root to get the standard deviation
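The formula referred to is not reproduced in this text. The sketch below assumes the common shortcut formula for the sample variance of grouped data, s^2 = [n * sum(f * Xm^2) - (sum(f * Xm))^2] / [n * (n - 1)], where Xm is the class midpoint and n is the sum of the frequencies. It is applied to the class-interval example from earlier in the slides.

```python
import math

# (class limits, frequency) from the class-interval example
classes = [((40, 49), 2), ((50, 59), 2), ((60, 69), 3),
           ((70, 79), 6), ((80, 89), 4), ((90, 99), 3)]

n = sum(f for _, f in classes)                                        # sum of col B
sum_fm = sum(f * (lo + hi) / 2 for (lo, hi), f in classes)            # sum of col D
sum_fm2 = sum(f * ((lo + hi) / 2) ** 2 for (lo, hi), f in classes)    # sum of col E

# Assumed shortcut formula for the sample variance of grouped data
variance = (n * sum_fm2 - sum_fm ** 2) / (n * (n - 1))
std_dev = math.sqrt(variance)

print(n, sum_fm, sum_fm2, round(variance, 2), round(std_dev, 2))
```

Because every score is replaced by its class midpoint, the result approximates the variance of the raw scores rather than reproducing it exactly.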

Presentation of Data Results


Tabular Presentation
Parts of a Table
- Table number
- Title
- Column headings
- Row headings
- Body
- Footnotes
- Source of data

Example of a table, with its parts labeled

Table number and title:
Table 1. Ten Leading Causes of Mortality (Rate per 100,000 Population), 1986 and 1991

Column headings, row headings, and body:
                                        1986(b)              1991(b)
Causes(a)                           Number     Rate      Number     Rate
Pneumonia                           53,500     95.0      36,705     57.7
Diseases of the heart               25,039     44.77     46,381     72.9
Tuberculosis                        23,926     42.7      22,814     35.9
Diseases of the vascular system     17,721     31.6      32,981     51.8
Accidents                           13,662     24.4      10,961     17.2
Malignant neoplasms                 13,040     23.3      22,384     35.2
Diarrhea                             8,727     15.6       5,497      8.6
Dis. of the circulatory system       4,097      7.3           c        -
Senility                             3,863      6.9           c        -
Avitaminosis                         2,217      4.0           c        -

Footnotes:
a - based on 1986 ranking
b - preliminary estimates
c - not included in the top ten for the year

Source of data:
Source: National Statistics Office, Statistical Handbook of the Philippines, 1988.

Graphical Presentation

Bar graph
  Nature of variable: Qualitative or discrete quantitative
  Function: Comparison of absolute or relative counts, rates, ranks, etc. between categories of a qualitative or discrete quantitative variable

Component bar graph / Pie graph
  Nature of variable: Qualitative
  Function: Shows the breakdown of a group in terms of percentages. Appropriate when the number of categories is not too many

Histogram / Frequency polygon
  Nature of variable: Continuous quantitative
  Function: Shows the frequency distribution of a continuous variable

Line graph
  Nature of variable: Time series
  Function: Shows trend or changes with time or age

Scatterplot
  Nature of variable: Quantitative
  Function: Shows correlation between two quantitative variables
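To make the bar graph idea concrete without a plotting library, the sketch below draws a text-only bar graph of a qualitative variable, using the civil status frequencies from the earlier example. The function name `text_bar_graph` and the `scale` parameter (cases per symbol) are illustrative choices, not part of the slides.

```python
def text_bar_graph(freqs, scale=1):
    """Return one line per category: label, a bar of '#' symbols, and the count."""
    label_width = max(len(label) for label in freqs)
    lines = []
    for label, f in freqs.items():
        bar = "#" * (f // scale)          # one '#' per `scale` cases
        lines.append(f"{label:<{label_width}} | {bar} {f}")
    return lines


civil_status = {"Single": 20, "Married": 30, "Previously married": 10}
lines = text_bar_graph(civil_status, scale=2)
print("\n".join(lines))
```

Bar length is proportional to the frequency, so the comparison between categories can be read at a glance, which is the function the bar graph serves.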

Bar Graph
Ten Leading Causes of Mortality in the Philippines, 1991

[Figure: bar graph. Bars: Diseases of the heart; Pneumonias; Vascular System; Tuberculosis, all forms; Malignant Neoplasm; Accidents; Septicemia; Diarrheal Diseases; Nephritis, etc.; Respiratory Conditions of the Fetus & Newborn. Axis: Rate per 100,000 population, 10 to 80.]

Horizontal Bar Graph


Life Expectancy of Selected Countries, Latest Available Years
[Figure: horizontal bar graph with separate bars for Male and Female. Countries: U.S., Thailand, Singapore, Rep. of Korea, Japan, India, China, Phils. Axis: Life Expectancy in Years, 20 to 100.]

Component Bar Graph


High Burden in Developing Countries
Lost healthy years (000s) from cardiovascular disease in 2000
[Figure: component bar graph. Regions: SEAR D, WPR B (CHN, VTN, PHL), EUR C, EUR A, EUR B, EMR D, AMR A, AMR B, SEAR B, AFR E, AFR D, EMR B, WPR A (JPN), AMR D. Components: Ischaemic heart disease, Stroke, Other cardiovascular disease. Axis: 5,000 to 30,000.]
Source: World Health Report, 2002

[Figures: Vertical Bar Graph; Component Bar Graph; Pie Graph; Histogram; Table for Histogram; Frequency Polygon; Line Graph; Scatterplot]
