and Analysis of
Statistical Data
Statistics
The science of conducting studies to collect,
organize, summarize, analyze, and draw
conclusions from data; interpreting and
presenting numerical data.
Can refer to the mere tabulation of
numeric information, as in reports of stock
market transactions, or to the body of
techniques used in processing or analyzing
data.
Statistics can be organized into two branches:
Descriptive statistics
Concerned with collecting, organizing,
presenting, and analyzing numerical
data.
Inferential statistics
Its main concern is to analyze the
organized data, leading to predictions or
inferences.
Population
Refers to the group or aggregate of people, objects,
materials, events, or things of any form.
Sample
Consists of some members of the population (a subset).
Outliers
An outlier is a data point that lies outside the overall pattern
of a distribution.
Variables - types
Statistical data or information can be gathered
through different ways such as interviewing
people, observing or inspecting items, using
questionnaires and checklists. The characteristic
that is being studied is called a variable. It
varies from one person or thing to another.
Numerical values are examples of
quantitative variables.
Non-numerical values or attributes are examples
of qualitative variables.
Scales of Measurement
of Data
Nominal Data
Use numbers for the purpose of identifying name or membership in a group or
category.
Ordinal Data
Connote ranking or inequality. In this type of data, numbers represent
greater-than or less-than relationships, such as preferences or rankings.
Interval Data
Indicate an actual amount, with an equal unit of measurement
separating each score (equal intervals). There is no true zero;
zero is an arbitrary point on the scale, as with temperature in degrees Celsius.
Ratio Data
Similar to interval data but has an absolute zero, so multiples
are meaningful. It includes all the usual measurements of length,
height, weight, area, volume, density, velocity, money, and
duration.
Types of Data
Primary Data - data collected directly
by the researcher himself. These
are first-hand or original sources.
They can be collected through:
1. Direct observation or measurement (primary
source of info).
2. By interview (questionnaires or rating scales).
3. By mail, using recording forms.
4. Experimentation.
Methods in Presenting
Data
Textual Form - data in paragraph
form.
Tabular Form - systematic
arrangement of data in rows and
columns.
Graphical Form - a graph or chart is
a device for showing numerical
values in pictorial form.
Semi-Textual/Semi-Tabular Form - the combination of textual and
tabular presentation.
Used for summarization and tabulation
of data.
Helps in analysis of relationships, trends,
relative size of data.
Parts of a Table
Title and number
Caption (column heading)
Stub (row heading)
Sources and footnotes (optional)
Graphical Presentation
Types of graphs or Charts:
Line Plots
Line Chart
Bar Chart
Stem and leaf plots
Pie Charts
Box Plots
Pictograms
Scatter Diagrams
Line Plots
Consists of
a horizontal number line of the possible data values;
one X for each element in the data set placed over the
corresponding value on the number line.
Example data values:
30, 47, 50, 40, 46, 37, 47, 35, 46, 36,
39, 40, 49, 34, 54, 38, 47, 49, 47, 47,
35, 35, 48, 48, 48, 34, 38, 47, 48, 58,
34, 39, 47, 48, 35, 40, 47, 49, 35, 40,
47, 49, 35, 40, 47, 50, 36, 46, 47, 54
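The construction described above (a number line with one X per data element) can be sketched in Python. This is a minimal illustration, assuming whole-number data; the function name `line_plot` is hypothetical, and the sample uses only the first ten values listed above.

```python
from collections import Counter

def line_plot(values):
    """Return the rows of a text line plot: one X per data element."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)   # horizontal number line
    width = len(str(max(values))) + 1            # column width per axis value
    rows = []
    # Stack the X marks from the tallest level down to level 1.
    for level in range(max(counts.values()), 0, -1):
        rows.append("".join(
            ("X" if counts[v] >= level else "").rjust(width) for v in axis))
    # Last row: the number line itself.
    rows.append("".join(str(v).rjust(width) for v in axis))
    return rows

scores = [30, 47, 50, 40, 46, 37, 47, 35, 46, 36]  # first ten values from the list
for row in line_plot(scores):
    print(row)
```

Because every integer between the minimum and maximum gets its own column, the output grows wide quickly, which illustrates why a line plot is inefficient when the range is large.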
Disadvantages of Line
Plots
A line plot may only be used for
quantitative (numerical) data.
A line plot is not efficient when the data set is
large and/or the range is large.
Line Graph
Consists of
paired values graphed as points on a plane
defined by an x- and y-axis;
line segments connecting the graphed points
(much like a dot-to-dot).
Advantages of Line
Graphs
A line graph is a way to summarize how
two pieces of information are related and
how they vary depending on one another.
Disadvantages of Line
Graphs
Changing the scale of either axis can
dramatically change the visual impression
of the graph.
Bar Graph
Consists of
bars of the same width drawn either horizontally or
vertically;
bars whose length (or height) represents the
frequencies of each value in a data set.
Advantages of Bar
Graphs
The mode is easily visible.
A bar graph can be used with numerical
or categorical data.
Disadvantages of Bar
Graphs
A bar graph shows only the frequencies of
the elements of a data set.
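As a rough illustration of how bar lengths follow from frequencies, and why the mode is easy to read off, here is a small Python sketch. The function names `bar_chart` and `modes` and the sample responses are hypothetical, not taken from the slides.

```python
from collections import Counter

def bar_chart(values):
    """Return one text bar per distinct value; bar length = frequency."""
    counts = Counter(values)
    label_width = max(len(str(k)) for k in counts)
    return [f"{str(k).ljust(label_width)} | {'#' * n} ({n})"
            for k, n in sorted(counts.items())]

def modes(values):
    """Return the value(s) with the highest frequency (the longest bar)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(k for k, n in counts.items() if n == top)

# Hypothetical categorical responses.
responses = ["bus", "car", "bus", "walk", "bus", "car"]
for line in bar_chart(responses):
    print(line)
print("Mode:", modes(responses))
```

The same functions work for numerical values, since only the frequency of each distinct value is used.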
[Figure: stem and leaf plot. Leaf rows: 3 6 7; 2 4 8; 1 3 5 6 8 8; 0 2 4 6]
Advantages of
Stem and Leaf Plots
It can be used to quickly organize a large list of data values.
It is convenient to use in determining median or mode of a
data set quickly.
Outliers, data clusters, or gaps are easily visible.
Disadvantages of
Stem and Leaf Plots
A stem and leaf plot is not very informative for a small set
of data.
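A stem and leaf plot can be built by splitting each value into a stem (the tens) and a leaf (the units). The sketch below assumes non-negative whole numbers; the function name `stem_and_leaf` and the sample values are illustrative only.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group sorted values by tens digit (stem) and list unit digits (leaves)."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 42 -> stem 4, leaf 2
        groups[stem].append(leaf)
    return [f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
            for stem, leaves in sorted(groups.items())]

# Illustrative two-digit values.
values = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]
for row in stem_and_leaf(values):
    print(row)
```

Stems with no data are simply omitted in this sketch; printing them as empty rows would make gaps in the data more visible.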
Pie Chart
Consists of
a circle divided into sectors (or wedges) that show the
percent of the data elements that are categorized similarly.
Number: 19, 25, 11, 18, 7, 10 (total 90)
Advantages of Pie
Charts
A pie chart can be used for either numerical or categorical data.
Disadvantages of Pie
Charts
Without technology, a pie chart may be difficult to make.
Each percent must be converted to an angle by
calculating the fraction of 360 degrees. Then the correct
angle must be drawn.
A circle graph does not provide information about
measures of central tendency or spread.
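The conversion from counts to sector angles can be sketched as follows, using the six counts from the example (total 90). The function name `sector_angles` is hypothetical; each angle is the count's share of 360 degrees.

```python
def sector_angles(counts):
    """Convert category counts to pie-chart angles in degrees."""
    total = sum(counts)
    return [c * 360 / total for c in counts]

counts = [19, 25, 11, 18, 7, 10]          # counts from the example, total 90
angles = sector_angles(counts)
percents = [round(c * 100 / sum(counts), 1) for c in counts]

for c, p, a in zip(counts, percents, angles):
    print(f"count={c:2d}  percent={p:4.1f}  angle={a:5.1f} degrees")
```

The angles always sum to 360 degrees, which is a quick check when drawing the chart by hand.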
Data: 65, 65, 70, 75, 80, 80, 85, 90, 95, 100
Median
Median of Lower Part = First Quartile
[Figure: box plot of the data on an axis from 65 to 100]
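The five values a box plot draws (minimum, first quartile, median, third quartile, maximum) can be computed with the "median of the lower part" rule named above. This is a sketch under that assumption; other quartile conventions exist and may give slightly different results. The function name `five_number_summary` is hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                   # lower part
    upper = data[half:] if n % 2 == 0 else data[half + 1:]  # upper part
    return data[0], median(lower), median(data), median(upper), data[-1]

scores = [65, 65, 70, 75, 80, 80, 85, 90, 95, 100]
print(five_number_summary(scores))
```

For these ten scores the lower part is 65, 65, 70, 75, 80, so the first quartile is 70, and the upper part gives a third quartile of 90.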
Scatter Plot
Consists of
paired data (bi-variate) displayed on a two-dimensional grid.
Histogram
Consists of
equal intervals marked on the horizontal axis;
bars of equal width drawn for each interval, with the height
of each bar representing either the number of elements or
the percent of elements in that interval.
(There is no space between the bars.)
Works well when
data elements could assume any value in a range;
there is one set of data (uni-variate);
the data is collected using a frequency table.
Histogram Example
Advantages of
Histograms
A histogram provides a way to display the
frequency of occurrences of data along an interval.
Disadvantages of Histograms
The use of intervals prevents the calculation of an
exact measure of central tendency.
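A histogram starts from a frequency table over equal intervals. The following is a minimal sketch, assuming whole-number scores and class intervals such as 10-19; the function name `tally_classes` and the sample scores are hypothetical.

```python
def tally_classes(values, start, width, n_classes):
    """Count how many values fall in each equal-width class interval."""
    counts = [0] * n_classes
    for v in values:
        index = (v - start) // width          # which interval v belongs to
        if 0 <= index < n_classes:
            counts[index] += 1
    # Report each class as ((lower limit, upper limit), frequency).
    return [((start + i * width, start + (i + 1) * width - 1), counts[i])
            for i in range(n_classes)]

scores = [12, 15, 27, 33, 38, 39, 41]         # hypothetical scores
for (low, high), f in tally_classes(scores, start=10, width=10, n_classes=4):
    print(f"{low}-{high}: {'#' * f} ({f})")
```

Once the scores are tallied, only the counts per interval remain, which is why exact measures of central tendency cannot be recovered from the histogram alone.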
Frequency Polygon
Consists of
a line chart constructed by plotting
the frequencies against the class marks and connecting
the plotted points with straight lines;
a closed polygon, formed by adding an
additional class at each end and bringing each end of the
line down to the horizontal axis at
the midpoint of the additional class.
Frequency Polygon
(Data)
C.I.     f    <cf   X      >cf   f/n      sector (degrees)
80-89    1    40    84.5   1     0.0250   9
70-79    4    39    74.5   5     0.1000   36
60-69    13   35    64.5   18    0.3250   117
50-59    13   22    54.5   31    0.3250   117
40-49    4    9     44.5   35    0.1000   36
30-39    3    5     34.5   38    0.0750   27
20-29    1    2     24.5   39    0.0250   9
10-19    1    1     14.5   40    0.0250   9
n = 40                           rf = 1
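The derived columns of the table (class mark, cumulative frequencies, relative frequency, sector angle) can all be computed from the class frequencies. In this sketch the frequencies are an assumption: they are inferred from the cumulative-frequency and f/n columns of the table with n = 40, and the variable names are illustrative.

```python
from itertools import accumulate

# Class intervals in ascending order, with frequencies inferred from the table.
classes = [(10, 19), (20, 29), (30, 39), (40, 49),
           (50, 59), (60, 69), (70, 79), (80, 89)]
freqs = [1, 1, 3, 4, 13, 13, 4, 1]
n = sum(freqs)

class_marks = [(low + high) / 2 for low, high in classes]          # X
less_than_cf = list(accumulate(freqs))                             # <cf
greater_than_cf = list(accumulate(reversed(freqs)))[::-1]          # >cf
relative_freq = [f / n for f in freqs]                             # f/n
sectors = [f * 360 / n for f in freqs]                             # degrees

for row in zip(classes, freqs, less_than_cf, class_marks,
               greater_than_cf, relative_freq, sectors):
    print(row)
```

Plotting `freqs` against `class_marks` gives the frequency polygon, while plotting `less_than_cf` or `greater_than_cf` gives the two ogives.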
Advantages of Frequency
Polygon
A Frequency Polygon provides a way to display the
frequency of occurrences of data along an interval.
Denotes Class Mark
Disadvantages of Frequency
Polygon
The use of intervals prevents the calculation of an
exact measure of central tendency.
Ogive
Consists of
a graph of a cumulative frequency distribution;
sometimes called a cumulative frequency
distribution graph.
Works well when
Less Than And More Than intervals are required.
Ogive
Advantages of Ogive
An ogive provides a way to display the cumulative
frequency of occurrences of data along an interval.
Disadvantages of Ogive
The use of Cumulative intervals prevents the
calculation of an exact frequency and measure of
central tendency.
Mean
UNGROUPED DATA
25 32 41 58 78 95 105 110 112 112 115
$$\bar{X} = \frac{\sum X}{n} = \frac{883}{11} = 80.2727$$
GROUPED DATA
$$\bar{X} = \frac{\sum fx}{n} = \frac{2250}{40} = 56.25$$
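Both mean calculations can be checked with a few lines of Python. Two assumptions are made here: the sixth ungrouped value is read as 95 (the only reading consistent with the stated sum of 883), and the grouped frequencies are inferred from the frequency table with n = 40.

```python
# Ungrouped data: mean = sum of values / number of values.
ungrouped = [25, 32, 41, 58, 78, 95, 105, 110, 112, 112, 115]
ungrouped_mean = sum(ungrouped) / len(ungrouped)

# Grouped data: mean = sum of (frequency * class mark) / n.
class_marks = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5]
freqs = [1, 1, 3, 4, 13, 13, 4, 1]          # inferred frequencies (assumption)
sum_fx = sum(f * x for f, x in zip(freqs, class_marks))
grouped_mean = sum_fx / sum(freqs)

print(round(ungrouped_mean, 4))
print(sum_fx, grouped_mean)
```

The grouped mean treats every observation in a class as if it sat at the class mark, so it is an approximation of the mean of the raw scores.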
Median
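$$\text{Median} = LL + \left(\frac{n/2 - {<}cf}{f}\right) i$$
$$= 49.5 + \left(\frac{20 - 9}{13}\right)(10)$$
$$= 49.5 + \left(\frac{11}{13}\right)(10)$$
$$= 49.5 + 8.4615$$
$$\text{Median} = 57.9615$$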
Mode
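$$\text{Mode} = LL + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) h$$
where $\Delta_1$ = difference between the frequency of the modal class and that of the next lower class,
$\Delta_2$ = difference between the frequency of the modal class and that of the next
upper class.
Classes 50-59 and 60-69 tie at f = 13. Taking 50-59 as the modal class (the class below it, 40-49, has f = 4):
$$\Delta_1 = 13 - 4 = 9, \qquad \Delta_2 = 13 - 13 = 0$$
$$\text{Mode} = 49.5 + \left(\frac{9}{9 + 0}\right)(10) = 59.5$$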
QUARTILE (Q)
Three Quartiles
Q1 - 25% (1/4) First Quartile
Q2 - 50% (1/2) Second Quartile (Median)
Q3 - 75% (3/4) Third Quartile
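$$Q_1 = LL + \left(\frac{n/4 - {<}cf}{f}\right) i$$
$$= 49.5 + \left(\frac{10 - 9}{13}\right)(10)$$
$$= 49.5 + \left(\frac{1}{13}\right)(10)$$
$$= 49.5 + 0.7692$$
$$Q_1 = 50.2692$$
$$Q_3 = LL + \left(\frac{3n/4 - {<}cf}{f}\right) i$$
$$= 59.5 + \left(\frac{30 - 22}{13}\right)(10)$$
$$= 59.5 + \left(\frac{8}{13}\right)(10)$$
$$= 59.5 + 6.1538$$
$$Q_3 = 65.6538$$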
PERCENTILE (P)
P1- 1/100
P2- 2/100
P3- 3/100
P4- 4/100
P5- 5/100
..
P99- 99/100
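The median, quartiles, and percentiles of grouped data all use the same interpolation formula, with n/2, n/4, 3n/4, or kn/100 as the target position. The sketch below assumes whole-number class limits (so the lower boundary LL is the lower limit minus 0.5) and uses frequencies inferred from the frequency table with n = 40; the function name `grouped_fractile` is hypothetical.

```python
def grouped_fractile(classes, freqs, k, parts):
    """k-th of `parts` fractile: LL + ((k*n/parts - <cf) / f) * i."""
    n = sum(freqs)
    target = k * n / parts
    cumulative = 0                      # <cf: cumulative frequency below the class
    for (low, high), f in zip(classes, freqs):
        if f > 0 and cumulative + f >= target:
            lower_boundary = low - 0.5  # LL
            width = high - low + 1      # i
            return lower_boundary + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("fractile position lies beyond the data")

classes = [(10, 19), (20, 29), (30, 39), (40, 49),
           (50, 59), (60, 69), (70, 79), (80, 89)]
freqs = [1, 1, 3, 4, 13, 13, 4, 1]     # inferred frequencies (assumption)

print(round(grouped_fractile(classes, freqs, 1, 4), 4))     # Q1
print(round(grouped_fractile(classes, freqs, 1, 2), 4))     # median (Q2)
print(round(grouped_fractile(classes, freqs, 3, 4), 4))     # Q3
print(round(grouped_fractile(classes, freqs, 90, 100), 4))  # P90
```

Passing `k` and `parts` as whole numbers keeps the target position exact and makes the link between quartiles (parts = 4) and percentiles (parts = 100) explicit.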
Measures of Variability
Measure how scattered the values
in a given data set are.
Average of distance
Used to determine the reliability of average
values and control variation
Range
$$R = X_L - X_S$$
Example: The ages of a sample of 10 subjects from a
population are as follows:
42, 28, 28, 61, 31, 23, 50, 34, 32, 37.
$$\text{Range} = 61 - 23 = 38$$
$$\text{Coefficient of Range} = \frac{X_L - X_S}{X_L + X_S} = \frac{61 - 23}{61 + 23} = 0.452$$
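The range example can be reproduced directly in Python. This is a small sketch using the ten ages given above; the variable names are illustrative.

```python
ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]

largest, smallest = max(ages), min(ages)        # X_L and X_S
value_range = largest - smallest
coefficient_of_range = (largest - smallest) / (largest + smallest)

print(value_range)
print(round(coefficient_of_range, 3))
```

Because the range uses only the two extreme values, a single outlier can change it substantially.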
Interquartile Range (IQR)
Mean Deviation
Standard Deviation