
Presentation and Analysis of Statistical Data

Statistics
A science of conducting studies to collect,
organize, summarize, analyze, and draw
conclusions from data; interpreting and
presenting numerical data.
Can refer to the mere tabulation of
numeric information, as in reports of stock
market transactions, or to the body of
techniques used in processing or analyzing
data.

Two Branches of Statistics

Statistics can be organized into two branches:

Descriptive statistics
Concerned with collecting, organizing,
presenting, and analyzing numerical
data.
Inferential statistics
Its main concern is to analyze the
organized data, leading to predictions or
inferences.

Some Important Concepts


Data
Data are the raw material with which the statistician works. Data
can be found through surveys, experiments, numerical
records, and other modes of research.

Population
Refers to the groups or aggregates of people, objects,
materials, events, or things of any form.

Sample
Consists of a few or more members of the population.

Outliers
An outlier is a data point that lies outside the overall pattern
of a distribution.

Variables - types
Statistical data or information can be gathered
in different ways, such as interviewing
people, observing or inspecting items, and using
questionnaires and checklists. The characteristic
that is being studied is called a variable. It
varies from one person or thing to another.
Numerical values are examples of
quantitative variables.
Non-numerical values or attributes are examples
of qualitative variables.

Dependent and Independent Variables
A variable can be dependent or independent
depending on its use.
When one variable is used to predict the value of another,
the independent variable is the predictor, while the
dependent variable is the variable whose value
is being predicted.

Scales of Measurement of Data
Nominal Data
Use numbers only for the purpose of identifying a name or membership in a group or
category.

Ordinal Data
Connote ranking or inequality. In this type of data, numbers represent
greater-than or less-than relationships, such as preferences or rankings.

Interval Data
Indicate an actual amount, with an equal unit of measurement
separating each score (equal intervals). There is no true
zero.

Ratio Data
Similar to interval data but with an absolute zero, so multiples
are meaningful. Includes all the usual measurements of length,
height, weight, area, volume, density, velocity, money, and
duration.

Types of Data
Primary Data - data collected directly
by the researcher. These
are first-hand or original sources.
They can be collected through:
1. Direct observation or measurement (primary
source of information).
2. Interview (questionnaires or rating scales).
3. Mailed questionnaires or recording forms.
4. Experimentation.

Secondary Data - information taken from
published or unpublished materials previously
gathered by other researchers or agencies, such
as books, newspapers, magazines, journals,
and published and unpublished theses and
dissertations.

Methods in Presenting Data
Textual Form - data in paragraph form.
Tabular Form - systematic arrangement of data in rows and columns.
Graphical Form - a graph or chart is a device for showing numerical values in pictorial form.
Semi-Tabular Form - the combination of textual and tabular forms.

Tabular Presentation
Used for summarization and tabulation
of data.
Helps in analysis of relationships, trends,
relative size of data.
Parts of a Table
Title and number
Caption (column heading)
Stub (row heading)
Sources and footnotes (optional)

Graphical Presentation
Types of graphs or Charts:

Line Plots
Line Chart
Bar Chart
Stem and leaf plots
Pie Charts
Box Plots
Pictograms
Scatter Diagrams

Line Plots
Consists of
a horizontal number line of the possible data values;
one X for each element in the data set placed over the
corresponding value on the number line.

Works well when

the data is quantitative (numerical);


there is one group of data (uni-variate);
the data set has fewer than 50 values;
the range of possible values is not too great.

Line Plot Example


Suppose thirty people live in an apartment building. The
ages of the residents are below.
58, 30, 37, 36, 34, 49, 35,
40, 47, 47, 39, 54, 47, 48,
54, 50, 35, 40, 38, 47, 48,
34, 40, 46, 49, 47, 35, 48,
47, 46

The graph is easier to create when the ages are placed in
order from smallest to largest, as the values will
appear on the number line.
30, 34, 34, 35, 35, 35, 36,
37, 38, 39, 40, 40, 40, 46,
46, 47, 47, 47, 47, 47, 47,
48, 48, 48, 49, 49, 50, 54,
54, 58
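The construction described above can be sketched in a few lines of Python. This is only an illustrative text rendering under the assumption that every integer between the smallest and largest age gets a position on the number line; the function name `line_plot` is hypothetical and not part of the original slides.

```python
from collections import Counter

# Ages of the thirty residents, as listed in the example.
ages = [58, 30, 37, 36, 34, 49, 35, 40, 47, 47, 39, 54, 47, 48, 54,
        50, 35, 40, 38, 47, 48, 34, 40, 46, 49, 47, 35, 48, 47, 46]


def line_plot(values):
    """Return a text line plot: one row per number-line value, one X per observation."""
    counts = Counter(values)
    rows = []
    for v in range(min(values), max(values) + 1):
        # Counter returns 0 for values that never occur, which leaves a visible gap.
        rows.append(f"{v:>3} | " + "X" * counts[v])
    return "\n".join(rows)


counts = Counter(ages)
mode = counts.most_common(1)[0][0]    # the tallest stack of X marks
data_range = max(ages) - min(ages)    # largest value minus smallest value

print(line_plot(ages))
print("mode:", mode, "range:", data_range)
```

The rows without any X (for example 41 through 45) make the gap in the data visible, and the longest row marks the mode.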

Advantages of Line Plots


The plot shows all the data.
Line plots allow several features of the data
to become more obvious, including any
outliers, data clusters, or gaps.

The mode is easily visible.


The range can be calculated quite easily
from this data display.

Disadvantages of Line Plots
A line plot may only be used for
quantitative (numerical) data.
A line plot is not efficient when the data set is
large and/or the range is large.

Line Graph
Consists of
paired values graphed as points on a plane
defined by an x- and y-axis;
line segments connecting the graphed points
(much like a dot-to-dot).

Works well when


the data is paired (bi-variate);
the data is continuous.

Line Graph Example

John weighed 68 kg in 1991, 70 kg in 1992, 74 kg in 1993,
74 kg in 1994, and 73 kg in 1995.

Advantages of Line Graphs
A line graph is a way to summarize how
two pieces of information are related and
how they vary depending on one another.

Disadvantages of Line Graphs
Changing the scale of either axis can
dramatically change the visual impression
of the graph.

Bar Graph
Consists of
bars of the same width drawn either horizontally or
vertically;
bars whose length (or height) represents the
frequencies of each value in a data set.

Works well when


the data is numerical or categorical;
the data is discrete;
the data is collected using a frequency table.

Bar Graph Example

Advantages of Bar Graphs
The mode is easily visible.
A bar graph can be used with numerical
or categorical data.

Disadvantages of Bar Graphs
A bar graph shows only the frequencies of
the elements of a data set.

Stem and Leaf Plot


Consists of
Numbers on the left, called the stems, which are the leading
digits of the data values (such as the tens digits);
Numbers on the right, called the leaves, which are the trailing
digits of the data values (such as the ones digits),
so that each leaf represents one of the data elements.

Works well when


the data contains more than 25 elements;
the data is collected in a frequency table;
the data values span many tens of values.

Stem and Leaf Plot Example

The number of points scored by the Vikings basketball team this
season: 78, 96, 88, 74, 63, 86, 92, 66, 72, 88, 83, 90, 67, 81, 85,
94.
Writing the data in numerical order may help to organize the
data, but is NOT a required step.

63, 66, 67, 72, 74, 78, 81, 83,
85, 86, 88, 88, 90, 92, 94, 96

Separate each number into a stem and a leaf. Since these
are two-digit numbers, the tens digit is the stem and the units
digit is the leaf.

The number 63 would be represented as
Stem | Leaf
6    | 3

Group the numbers with the same stems. List the stems in
numerical order. Title the graph.

Points scored by the Vikings
Stem | Leaf
6    | 3 6 7
7    | 2 4 8
8    | 1 3 5 6 8 8
9    | 0 2 4 6
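The grouping step can also be expressed as a short Python sketch. It assumes two-digit values, so that the tens digit is the stem and the units digit is the leaf, as in the Vikings example; the helper name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# Points scored by the Vikings this season, in the order given in the example.
scores = [78, 96, 88, 74, 63, 86, 92, 66, 72, 88, 83, 90, 67, 81, 85, 94]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (units digits) in ascending order."""
    plot = defaultdict(list)
    for v in sorted(values):          # sorting first keeps the leaves ordered
        stem, leaf = divmod(v, 10)    # e.g. 63 -> stem 6, leaf 3
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_and_leaf(scores)
print("Points scored by the Vikings")
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Each leaf still corresponds to exactly one data element, so the number of leaves equals the number of scores.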

Advantages of Stem and Leaf Plots
It can be used to quickly organize a large list of data values.
It is convenient to use in determining the median or mode of a
data set quickly.
Outliers, data clusters, or gaps are easily visible.

Disadvantages of Stem and Leaf Plots
A stem and leaf plot is not very informative for a small set
of data.

Pie Chart
Consists of
a circle divided into sectors (or wedges) that show the
percent of the data elements that are categorized similarly.

Works well when


there is only one set of data (uni-variate);
comparing the composition of each part to the whole set of
data.

Pie Chart Example


Color    Number
White    19
Black    25
Gray     11
Red      18
Blue     7
Other    10
Total    90

A proportion can be used to calculate the angle
measure for each sector. Using white as the example,
19 white cars compare to the total of 90 in the same
way that 76 degrees compares to the total degrees
(360) in a circle.
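The same proportion, count / total = angle / 360, can be applied to every color. The following Python sketch uses the counts from the table above; the function name `sector_angles` is an illustrative choice.

```python
# Number of cars of each color, from the pie chart example.
car_colors = {"White": 19, "Black": 25, "Gray": 11, "Red": 18, "Blue": 7, "Other": 10}


def sector_angles(counts):
    """Convert category counts to sector angles in degrees (count / total = angle / 360)."""
    total = sum(counts.values())
    return {label: count * 360 / total for label, count in counts.items()}


angles = sector_angles(car_colors)
for label, angle in angles.items():
    print(f"{label:<6}{angle:6.1f} degrees")
```

The angles should add up to 360 degrees, which is a quick check that no category was left out.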

Advantages of Pie Charts
A pie chart can be used for either numerical or categorical data.

It shows a part-to-whole relationship.

Disadvantages of Pie Charts
Without technology, a pie chart may be difficult to make.
Each percent must be converted to an angle by
calculating the fraction of 360 degrees. Then the correct
angle must be drawn.
A pie chart does not provide information about
measures of central tendency or spread.

Box and Whisker Plot


Consists of

the five-point summary (the least value, the greatest
value, the median, the first quartile, and the third
quartile);
a box drawn to show the interval from the first (25th
percentile) to the third quartile (75th percentile) with a line
drawn through the box at the median;
line segments, called the whiskers, connecting the box to
the least and greatest values in the data distribution.

Works well when

there is only one set of data (uni-variate);


there are many data values.

Box and Whisker Plot Example
Math test scores: 80, 75, 90, 95, 65, 65, 80, 85, 70,
100.

Write the data in numerical order and find the five-point
summary.
median = 80
first quartile = 70
third quartile = 90
smallest value = 65
largest value = 100

65, 65, 70, 75, 80, 80, 85, 90, 95, 100
Median: 80
Median of lower part (65, 65, 70, 75, 80), first quartile: 70
Median of upper part (80, 85, 90, 95, 100), third quartile: 90

Place a point beneath each of these values on a
number line.

Draw the box and whiskers and median line.

[Figure: box-and-whisker plot on a number line marked from 65 to 100 in steps of 5]

Box and Whisker Plot Example

The following numbers are the number of video games
owned by each boy in the club, arranged from least to
greatest.
18 27 34 52 54 59 61 68
78 82 85 87 91 93 100
68 is the median
The median is the value exactly in the
middle of an ordered set of numbers.

52 is the lower quartile
The lower quartile is the median of
the lower half of the values (18, 27,
34, 52, 54, 59, 61).
87 is the upper quartile
The upper quartile is the median
of the upper half of the values (78,
82, 85, 87, 91, 93, 100).
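Both box plot examples use the "median of each half" rule for the quartiles. A minimal Python sketch of that rule follows; it assumes the middle value is left out of both halves when the number of values is odd, which matches the video game example. Other quartile conventions exist and can give slightly different results, and the function name `five_point_summary` is hypothetical.

```python
from statistics import median

test_scores = [80, 75, 90, 95, 65, 65, 80, 85, 70, 100]
video_games = [18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100]


def five_point_summary(values):
    """Least value, first quartile, median, third quartile, greatest value."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # With an odd count, skip the middle value so it belongs to neither half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return {
        "min": data[0],
        "q1": median(lower),
        "median": median(data),
        "q3": median(upper),
        "max": data[-1],
    }


print(five_point_summary(test_scores))
print(five_point_summary(video_games))
```

The box is drawn from `q1` to `q3` with a line at `median`, and the whiskers extend to `min` and `max`.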

Advantages of Box and Whisker Plots
A box-and-whisker plot gives an immediate visual of the center, the
spread, and the overall range of the distribution.
Box plots are useful for comparing data sets, especially when the data
sets are large or when they have different numbers of data elements.

Disadvantages of Box and Whisker Plots
It shows only certain statistics rather than all the data.
Since the data elements are not displayed, it is impossible to
determine if there are gaps or clusters in the data.

Scatter Plot
Consists of
paired data (bi-variate) displayed on a two-dimensional grid.

Works well when


multiple measurements are made for each
element of a sample.

Additional Notes about Scatter Plots
If the relationship is thought to be a causal one,
then the independent variable is represented
along the x-axis and the dependent variable on
the y-axis.
A scatter plot can show that there is a positive,
negative, constant, or no relationship
(correlation) between the variables.
Positive: As the value of one variable increases, so
does the other.
Negative: As the value of one variable increases, the
other decreases.
Constant: As the value of one variable increases (or
decreases), the other remains constant.
No relationship: There is no pattern to the points.

Scatter plot Example

Advantages of Scatter plots


A scatter plot is one of the best ways to determine if two
characteristics are related.
A scatter plot may be used when there are multiple trials for the
same input variable in an experiment.

Disadvantages of Scatter plots


When a scatter plot shows an association between two variables,
there is not necessarily a cause and effect relationship. Both
variables could be related to some third variable that explains their
variation or there could be some other cause. Alternatively, an
apparent association could simply be a result of chance.

Histogram
Consists of
equal intervals marked on the horizontal axis;
bars of equal width drawn for each interval, with the height
of each bar representing either the number of elements or
the percent of elements in that interval.
(There is no space between the bars.)
Works well when
data elements could assume any value in a range;
there is one set of data (uni-variate);
the data is collected using a frequency table.

Histogram Example

Advantages of Histograms
A histogram provides a way to display the
frequency of occurrences of data along an interval.

Disadvantages of Histograms
The use of intervals prevents the calculation of an
exact measure of central tendency.

Frequency Polygon
Consists of
a line chart constructed by plotting
the frequencies against the class marks and connecting
the plotted points by means of straight lines;
the polygon is closed by considering an
additional class at each end, and each end of the
line is brought down to the horizontal axis at
the midpoint of the additional class.

Frequency Polygon (Data)
C.I.     f    <cf    x      >cf   f/n      sector (degrees)
80-89    1    40     84.5   1     0.0250   9
70-79    4    39     74.5   5     0.1000   36
60-69    13   35     64.5   18    0.3250   117
50-59    13   22     54.5   31    0.3250   117
40-49    4    9      44.5   35    0.1000   36
30-39    3    5      34.5   38    0.0750   27
20-29    1    2      24.5   39    0.0250   9
10-19    1    1      14.5   40    0.0250   9
         n=40                     rf=1
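The columns of this table can be derived from the class limits and frequencies alone. The Python sketch below does that. The class frequencies are an assumption inferred from the cumulative-frequency and relative-frequency columns (they are consistent with n = 40), and the key names `less_cf` and `more_cf` are illustrative stand-ins for the less-than and greater-than cumulative frequencies.

```python
# (lower limit, upper limit, frequency), lowest class first.
# The frequencies are inferred from the cumulative columns of the table (assumption).
classes = [(10, 19, 1), (20, 29, 1), (30, 39, 3), (40, 49, 4),
           (50, 59, 13), (60, 69, 13), (70, 79, 4), (80, 89, 1)]


def frequency_table(classes):
    """Build class mark, cumulative, relative-frequency and sector columns."""
    n = sum(f for _, _, f in classes)
    rows = []
    running = 0
    for lo, hi, f in classes:
        running += f
        rows.append({
            "class": f"{lo}-{hi}",
            "f": f,
            "x": (lo + hi) / 2,           # class mark (midpoint)
            "less_cf": running,           # values in this class or any lower class
            "more_cf": n - running + f,   # values in this class or any higher class
            "rf": f / n,                  # relative frequency f/n
            "sector": f * 360 / n,        # pie-chart angle in degrees
        })
    return rows


rows = frequency_table(classes)
for row in reversed(rows):                # print highest class first, like the table
    print(row)
```

Plotting `f` against `x` gives the frequency polygon, and plotting `less_cf` or `more_cf` against the class boundaries gives the two ogives.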

Advantages of Frequency Polygon
A frequency polygon provides a way to display the
frequency of occurrences of data along an interval.
Each plotted point denotes a class mark.

Disadvantages of Frequency Polygon
The use of intervals prevents the calculation of an
exact measure of central tendency.

Ogive
Consists of
a graph of a cumulative frequency distribution,
sometimes called a cumulative frequency
distribution graph.
Works well when
less-than and more-than cumulative frequencies are required.

Ogive

Advantages of Ogive
An ogive provides a way to display the cumulative
frequency of occurrences of data along an interval.

Disadvantages of Ogive
The use of cumulative intervals prevents the
calculation of an exact frequency and measure of
central tendency.

MEASURES OF CENTRAL TENDENCY

- a single number that represents the given data.
1. Mean - the average value of the given data.
- not an appropriate measure of central
tendency if there are outliers.

2. Median - divides the distribution into two equal
parts (the upper 50% and the lower 50%).
3. Mode - the most frequently occurring value.
- can be used with nominal data.

Mean

UNGROUPED DATA
25 32 41 58 78 95 105 110 112 112 115

$$\bar{X} = \frac{\sum X}{n} = \frac{883}{11} = 80.2727$$

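GROUPED DATA
$$\bar{X} = \frac{\sum fx}{n} = \frac{2250}{40} = 56.25$$

Short Cut Method
$$\bar{X} = AM + \left(\frac{\sum fd}{n}\right) i = 64.5 + \left(\frac{-33}{40}\right)(10) = 64.5 - 8.25 = 56.25$$
where AM is the assumed mean (here the class mark 64.5), d is the deviation of each class mark from AM in units of the class width, and i is the class width.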
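Both grouped-data computations can be checked with a short Python sketch. The class frequencies are an assumption inferred from the cumulative-frequency columns of the frequency table, and the function names are hypothetical. The short-cut version takes the assumed mean AM = 64.5 and class width i = 10 used above.

```python
# Class marks and frequencies, lowest class first.
# The frequencies are inferred from the cumulative columns of the table (assumption).
class_marks = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5]
freqs = [1, 1, 3, 4, 13, 13, 4, 1]


def grouped_mean(marks, freqs):
    """Direct method: sum of f*x divided by n."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(marks, freqs)) / n


def shortcut_mean(marks, freqs, assumed_mean, width):
    """Short-cut method: AM + (sum of f*d / n) * i, with d = (x - AM) / i."""
    n = sum(freqs)
    sum_fd = sum(f * (x - assumed_mean) / width for x, f in zip(marks, freqs))
    return assumed_mean + (sum_fd / n) * width


print(grouped_mean(class_marks, freqs))
print(shortcut_mean(class_marks, freqs, assumed_mean=64.5, width=10))
```

Any class mark can serve as the assumed mean; the result does not depend on the choice, only the size of the intermediate numbers does.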

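Median

$$\begin{aligned}
\text{Median} &= LL + \left(\frac{n/2 - {<}cf}{f}\right) i \\
&= 49.5 + \left(\frac{20 - 9}{13}\right)(10) \\
&= 49.5 + \left(\frac{11}{13}\right)(10) \\
&= 49.5 + 8.4615 \\
&= 57.9615
\end{aligned}$$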

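Mode

$$\text{Mode} = LL + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) h$$
where $\Delta_1$ = difference between the frequency of the modal class and that of the next lower class,
$\Delta_2$ = difference between the frequency of the modal class and that of the next upper class.

$$\text{Mode} = 50 + \left(\frac{13}{13 + 13}\right)(10) = 55$$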

QUARTILE (Q)

Three Quartiles
Q1 - 25% - 1/4 - First Quartile
Q2 - 50% - 1/2 - Second Quartile / Median
Q3 - 75% - 3/4 - Third Quartile

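$$\begin{aligned}
Q_1 &= LL + \left(\frac{n/4 - {<}cf}{f}\right) i \\
&= 49.5 + \left(\frac{10 - 9}{13}\right)(10) \\
&= 49.5 + \left(\frac{1}{13}\right)(10) \\
&= 49.5 + 0.7692 \\
&= 50.2692
\end{aligned}$$

$$\begin{aligned}
Q_3 &= LL + \left(\frac{3n/4 - {<}cf}{f}\right) i \\
&= 59.5 + \left(\frac{30 - 22}{13}\right)(10) \\
&= 59.5 + \left(\frac{8}{13}\right)(10) \\
&= 59.5 + 6.1538 \\
&= 65.6538
\end{aligned}$$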

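PERCENTILE (P)

P1 - 1/100
P2 - 2/100
P3 - 3/100
P4 - 4/100
P5 - 5/100
...
P99 - 99/100

$$\begin{aligned}
P_{23} &= LL + \left(\frac{0.23n - {<}cf}{f}\right) i \\
&= 49.5 + \left(\frac{9.2 - 9}{13}\right)(10) \\
&= 49.5 + \left(\frac{0.2}{13}\right)(10) \\
&= 49.5 + (0.0154)(10) \\
&= 49.5 + 0.1538 \\
&= 49.6538
\end{aligned}$$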
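The median, quartile, and percentile formulas all share one pattern: locate the class in which the cumulative frequency first reaches the target position p·n, then interpolate within that class. A hedged Python sketch of this pattern follows. The class frequencies are an assumption inferred from the cumulative-frequency columns of the frequency table, and `grouped_quantile` is a hypothetical helper name.

```python
# (lower class boundary LL, frequency f), lowest class first; class width i = 10.
# The frequencies are inferred from the cumulative columns of the table (assumption).
classes = [(9.5, 1), (19.5, 1), (29.5, 3), (39.5, 4),
           (49.5, 13), (59.5, 13), (69.5, 4), (79.5, 1)]
WIDTH = 10


def grouped_quantile(classes, p, width):
    """LL + ((p*n - <cf) / f) * i for the first class whose cumulative frequency reaches p*n."""
    n = sum(f for _, f in classes)
    target = p * n
    cf_below = 0                       # <cf: cumulative frequency below the current class
    for lower, f in classes:
        if cf_below + f >= target:
            return lower + (target - cf_below) / f * width
        cf_below += f
    raise ValueError("p must lie between 0 and 1")


print("Median:", round(grouped_quantile(classes, 0.50, WIDTH), 4))
print("Q1:", round(grouped_quantile(classes, 0.25, WIDTH), 4))
print("Q3:", round(grouped_quantile(classes, 0.75, WIDTH), 4))
print("P23:", round(grouped_quantile(classes, 0.23, WIDTH), 4))
```

Using a single function for every fraction p keeps the median (p = 0.50), the quartiles (p = 0.25, 0.75), and any percentile (p = k/100) consistent with one another.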

Measures of Variability
A measure of how scattered the values are
in a given data set.
Expressed as an average of distances.
Used to determine the reliability of average
values and to control variation.

The following are the measures of variation:
Range - the difference between the values of the two extreme
items of a series.
Interquartile Range - the difference between the third and
first quartiles.
Mean Absolute Deviation (MAD) - the arithmetic mean of the
absolute differences of the values from their average.
Standard Deviation - the positive square root of the
average of the squared deviations of the values from their
mean.

Range

$$R = X_L - X_S$$
Example: The ages of a sample of 10 subjects from a
population are as follows:
42, 28, 28, 61, 31, 23, 50, 34, 32, 37.
Range = 61 - 23 = 38.
$$\text{Coeff. of Range} = \frac{X_L - X_S}{X_L + X_S} = \frac{61 - 23}{61 + 23} = 0.452$$

Interquartile Range (IQR)

$$IQR = Q_3 - Q_1$$
$$\text{Semi-Interquartile Range} = \frac{Q_3 - Q_1}{2}$$
$$\text{Coeff. of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

Mean Deviation

For Ungrouped Data:
$$MAD = \frac{\sum |x - \bar{x}|}{n}$$
For Grouped Data:
$$MAD = \frac{\sum f_i |x_i - \bar{x}|}{\sum f_i}$$
Coeff. of MAD:
$$\text{Coeff. of MAD} = \frac{MAD}{\text{Mean}}$$

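Standard Deviation

For Ungrouped Data:
$$\sigma = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n}}$$

For Grouped Data:
$$\sigma = \sqrt{\frac{\sum f_i (X_i - \bar{X})^2}{\sum f_i}}$$
$$\text{Coeff. of Variation} = \frac{\sigma}{\bar{X}} \times 100$$
$$\text{Variance} = \sigma^2$$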
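As an illustration, the ungrouped formulas can be applied to the ten ages from the range example. This Python sketch divides by n, as the formulas here do, rather than by n - 1 as a sample estimate would. The function name `variability` and the dictionary keys are hypothetical.

```python
from math import sqrt

# Ages of the 10 subjects from the range example.
ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]


def variability(values):
    """Range, mean absolute deviation, variance and standard deviation (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    largest, smallest = max(values), min(values)
    mad = sum(abs(x - mean) for x in values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    sd = sqrt(variance)
    return {
        "mean": mean,
        "range": largest - smallest,
        "coeff_range": (largest - smallest) / (largest + smallest),
        "mad": mad,
        "coeff_mad": mad / mean,
        "variance": variance,
        "sd": sd,
        "coeff_variation": sd / mean * 100,
    }


for name, value in variability(ages).items():
    print(f"{name}: {value:.4f}")
```

The range and its coefficient reproduce the values worked out by hand (38 and 0.452). The remaining measures use every observation rather than only the two extremes.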
