
STATISTICS AND PROBABILITY

What is Statistics?
• “Statistics is a way to get information from data”

[Figure: Data → Statistics → Information]

Data: Facts, especially numerical facts, collected together for reference or information. A body of information or observations being considered by the researchers.

Information: Knowledge communicated concerning some particular fact. Processed data. The basis for decision making.

Statistics is a tool for creating new understanding from a set of numbers.


Importance/Objectives of Statistics
a. It is an efficient and effective tool for research.
b. It evaluates the reliability and validity of information.
c. It considers/estimates the precision of results.
d. It analyzes information extensively for a more meaningful interpretation.
e. It ensures that results bring insights leading to sound judgment, better policy making, and the like.
f. It elicits and establishes further information recommended for future study.
g. It applies to any field or discipline.
• Four Essential Processes in Statistics
a. Collection of Data
- refers to the gathering of related information
- the process involves deciding (a) what information is useful and needed, (b) where to get the information, and (c) how to get it
b. Presentation of Data
- refers to the systematic organization of data
- the process involves (a) collecting, (b) classifying, and (c) arraying the data gathered in preparation for analysis
c. Analysis of Data
- refers to extracting relevant information from the data at hand
- the process involves (a) comparison, (b) description, and (c) statistical measurements to come up with numerical values and/or a qualitative summary as a resulting conclusion
d. Interpretation of Data
- refers to the drawing of logical statements from the analyzed information
- the process involves (a) generalizing, (b) forecasting, and (c) recommending solutions/interventions about the study.
Two Major Fields of Statistics
A. Descriptive Statistics
B. Inferential Statistics
Descriptive Statistics
• It is a group of statistical measurements or methods that aim to describe the data.
• It exposes the basic characteristics or summaries of the data.
• Take note: descriptive statistics provides information only about the collected data. Drawing generalizations beyond that data is not within the scope of descriptive statistics.
Examples:
• percentage distribution table
• frequency table
• bar graph
• measures of central tendency
• measures of position
• measures of variation
Statistical Inference…
• Statistical inference is the process of making
an estimate, prediction, or decision about a
population based on a sample.
[Figure: a sample's statistic is used to make an inference about the population's parameter.]
What can we infer about a Population’s Parameters
based on a Sample’s Statistics?
Key Statistical Concepts…
• Population
— a population is the group of all items of interest to a
statistics practitioner.
— frequently very large; sometimes infinite.
e.g., all 5 million voters in the city

• Sample
— A sample is a set of data drawn from the
population.
— Potentially very large, but smaller than the population.
e.g., a sample of 765 voters exit-polled on election day.
Key Statistical Concepts…
• Parameter
— A descriptive measure of a population.

• Statistic
— A descriptive measure of a sample.
Key Statistical Concepts…
[Figure: a sample is a subset of the population; the population is described by parameters, the sample by statistics.]
• Populations have Parameters,
• Samples have Statistics.
MEASURES OF
CENTRAL TENDENCY
Measures of Central Tendency
• A measure of central tendency is a single value that summarizes a set of data. It is the value around which the data tend to center. The measures of central tendency discussed here are the mean, the median, and the mode.
Measures of Central Location…
The arithmetic mean, a.k.a. the average, shortened to mean, is the most popular and useful measure of central location.
It is computed by simply adding up all the observations and dividing by the total number of observations:
$$\text{Mean} = \frac{\text{Sum of the observations}}{\text{Number of observations}}$$
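A minimal Python sketch of the formula above: add up the observations, then divide by how many there are. The function name `mean` is illustrative; the data are the six job-application counts from the variance application later in these notes.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by the number of observations."""
    if len(values) == 0:
        raise ValueError("the mean needs at least one observation")
    return sum(values) / len(values)


# Number of jobs six randomly selected students applied for
jobs = [17, 15, 23, 7, 9, 13]
print(mean(jobs))  # 14.0
```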
The weighted mean is a special case of the arithmetic mean. It is computed by multiplying each value by an appropriate weight, adding these products, and then dividing the result by the sum of the weights.
Formula for finding the weighted mean
$$\bar{X}_W = \frac{\sum XW}{\sum W}$$
where
X is the individual score
W is the weight of each score
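The weighted mean formula translates directly into code: multiply each X by its W, add the products, and divide by the sum of the W. This is a sketch only; the grades and units below are hypothetical values chosen for illustration, not data from these notes.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of X*W divided by the sum of the weights W."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


# Hypothetical example: three subject grades weighted by their number of units
grades = [85, 90, 78]
units = [3, 2, 1]
print(weighted_mean(grades, units))  # 85.5
```

With equal weights the result reduces to the ordinary arithmetic mean.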
Arithmetic Mean…
Sample mean:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Population mean:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
Statistics is a pattern language…
              Population    Sample
Size          N             n
Mean          μ             x̄
MEDIAN
• The median is the middlemost value in an ordered array of data. It is the value of the observation that divides the data set into two equal parts when the data are arranged in increasing or decreasing order.
• Unlike the mean, the median is not affected by every value in the data set, especially not by extreme values. The median depends on the position of the data in a distribution, which is why it is also called a measure of position.
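Formula for Ungrouped Data
When n is odd:
$$\text{Median (Md)} = \tilde{X} = X_m$$
where X_m is the middlemost value of the data.

When n is even:
$$\text{Median (Md)} = \tilde{X} = \frac{X_{m_1} + X_{m_2}}{2}$$
where X_{m_1} and X_{m_2} are the two middlemost values of the data.

For an odd or even number of observations:
$$\text{Median} = X_{\frac{n+1}{2}}$$
where X represents each value (arranged in order) and n is the number of observations. When n is even, (n+1)/2 falls halfway between two positions, and the median is the average of the two values there.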
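The two cases above (odd n versus even n) can be sketched in Python as follows. The function name is illustrative; the data are the ten student grades from the quantile example and the six job counts from the variance application in these notes.

```python
def median(values):
    """Median of ungrouped data: the middlemost value of the ordered array."""
    data = sorted(values)  # arrange in increasing order
    n = len(data)
    if n == 0:
        raise ValueError("the median needs at least one observation")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the single middle value X_m sits at position (n + 1) / 2
        return data[mid]
    # n even: average the two middlemost values X_m1 and X_m2
    return (data[mid - 1] + data[mid]) / 2


grades = [66, 90, 93, 76, 73, 87, 76, 96, 85, 69]
jobs = [17, 15, 23, 7, 9, 13]
print(median(grades))  # 80.5
print(median(jobs))    # 14.0
```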
How to find the median of grouped data
1. Given a frequency distribution table, determine the less than cumulative frequency (<cf).
2. Compute the value of N/2.
3. Find the smallest less than cumulative frequency (<cf) that is greater than or equal to N/2. The class it belongs to is the median class.
4. From the median class, identify the lower boundary of the median class, the less than cumulative frequency of the class just before the median class, the frequency of the median class, and the class size.
5. Apply the formula for finding the median.
Formula
$$\text{Median (Md)} = \tilde{X} = L_b + \left(\frac{\frac{N}{2} - cf_b}{f}\right) i$$
Where
L_b is the lower boundary of the median class
N is the total frequency
f is the frequency of the median class
cf_b is the less than cumulative frequency of the class just before the median class
i is the class size
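The five steps above can be sketched in Python. This is a minimal version that assumes the classes are listed from the lowest to the highest and all share the same class size. It uses the frequency table of Mathematics test scores of 50 students from the quantile example in these notes, re-listed in increasing order.

```python
def grouped_median(lower_boundaries, frequencies, class_size):
    """Median of grouped data: Lb + ((N/2 - cfb) / f) * i.

    Classes must be listed from the lowest class to the highest.
    """
    total = sum(frequencies)  # N
    if total == 0:
        raise ValueError("the frequency distribution is empty")
    half = total / 2
    cum_before = 0  # <cf of the class just before the current one (cfb)
    for lb, f in zip(lower_boundaries, frequencies):
        # The first class whose <cf reaches N/2 is the median class
        if cum_before + f >= half:
            return lb + (half - cum_before) / f * class_size
        cum_before += f
    raise ValueError("N/2 was not reached; check the frequencies")


# Mathematics test scores of 50 students, classes 21-25 up to 46-50
lower_boundaries = [20.5, 25.5, 30.5, 35.5, 40.5, 45.5]
frequencies = [6, 12, 9, 11, 8, 4]
print(round(grouped_median(lower_boundaries, frequencies, 5), 2))  # 34.39
```

Here N/2 = 25. The median class is 31-35 (<cf = 27), so Md = 30.5 + ((25 − 18) / 9)(5).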
MODE
• The mode is the value that appears most often in a data set.
• The mode of ungrouped data can be found by inspection: the value that occurs most frequently is the mode.
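Finding the mode "by inspection" amounts to counting how often each value occurs and keeping the most frequent one(s). The sketch below uses `collections.Counter` from the Python standard library. The function name `modes` is illustrative, and the grades are the ten values from the quantile example in these notes.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs the most number of times."""
    counts = Counter(values)
    if not counts:
        raise ValueError("the mode needs at least one observation")
    highest = max(counts.values())
    # Keep all values tied for the highest count, in order of first appearance
    return [value for value, count in counts.items() if count == highest]


grades = [66, 90, 93, 76, 73, 87, 76, 96, 85, 69]
print(modes(grades))  # [76]
```

A list is returned because a data set can be bimodal or multimodal. If every value occurs exactly once, this sketch returns all of them, whereas many texts say such data have no mode. Python's `statistics.multimode` behaves the same way as this function.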
How to find the mode of grouped data
1. Given a frequency distribution table, determine the highest frequency. The class interval containing the highest frequency is the modal class.
2. Determine the lower boundary of the modal class.
3. Where the intervals are arranged in increasing order from top to bottom, find the difference between the highest frequency and the frequency just above. The result is the value of Δ1.
4. Compute the difference between the highest frequency and the frequency just below. The difference is the value of Δ2.
5. Determine the class size (i).
6. Apply the formula for finding the mode.
Formula
$$\text{Mode} = \hat{X} = L_b + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) i$$
Where:
L_b is the lower boundary of the modal class
Δ1 is the difference between the highest frequency and the frequency just above
Δ2 is the difference between the highest frequency and the frequency just below
i is the class size
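The mode steps can be sketched in Python as follows, assuming the classes are listed in increasing order. In that order, the class "just above" the modal class is the previous entry and the class "just below" is the next one. Treating a missing neighbour (modal class at either end) as frequency 0 is an assumption of this sketch, not something stated in these notes. The data are the 50 Mathematics test scores from the quantile example, re-listed in increasing order.

```python
def grouped_mode(lower_boundaries, frequencies, class_size):
    """Mode of grouped data: Lb + (d1 / (d1 + d2)) * i.

    Classes must be listed in increasing order. A missing neighbouring class
    is treated as having frequency 0 (an assumption of this sketch).
    """
    if not frequencies:
        raise ValueError("the frequency distribution is empty")
    # Index of the (first) class with the highest frequency: the modal class
    k = max(range(len(frequencies)), key=lambda j: frequencies[j])
    f_modal = frequencies[k]
    f_above = frequencies[k - 1] if k > 0 else 0                    # previous (lower) class
    f_below = frequencies[k + 1] if k + 1 < len(frequencies) else 0  # next (higher) class
    d1 = f_modal - f_above
    d2 = f_modal - f_below
    if d1 + d2 == 0:
        raise ValueError("formula undefined: d1 + d2 is zero")
    return lower_boundaries[k] + d1 / (d1 + d2) * class_size


lower_boundaries = [20.5, 25.5, 30.5, 35.5, 40.5, 45.5]
frequencies = [6, 12, 9, 11, 8, 4]
print(round(grouped_mode(lower_boundaries, frequencies, 5), 2))  # 28.83
```

The modal class is 26-30 (f = 12), so Δ1 = 12 − 6 = 6 and Δ2 = 12 − 9 = 3.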
Quantile
• A quantile is a general descriptive measure used to separate quantitative data into distinct groups. Below are the different kinds of quantiles:
Quartiles
Divide the values into four parts of equal size, each comprising 25% of the observations. The median is the second quartile, below which 50% of the values fall.
Deciles
• Divide the values into ten parts of equal size, each comprising 10% of the observations. The median is the 5th decile.
Percentiles
• Divide the values into 100 parts of equal size, each comprising 1% of the observations. The median is the 50th percentile.
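Ungrouped Data
$$Q_1 = X_{\frac{N}{4}+\frac{3}{4}} \qquad Q_3 = X_{\frac{3N}{4}+\frac{1}{4}} \qquad D_4 = X_{\frac{4N}{10}+\frac{6}{10}}$$
$$D_2 = X_{\frac{2N}{10}+\frac{8}{10}} \qquad P_{23} = X_{\frac{23N}{100}+\frac{77}{100}} \qquad P_{55} = X_{\frac{55N}{100}+\frac{45}{100}}$$
where the subscript gives the position of the value in the data arranged in ascending order.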
Grouped Data
$$Q_1 = L_b + \left(\frac{\frac{N}{4} - cf_b}{f}\right) i$$
$$D_3 = L_b + \left(\frac{\frac{3N}{10} - cf_b}{f}\right) i$$
$$P_{45} = L_b + \left(\frac{\frac{45N}{100} - cf_b}{f}\right) i$$
Where:
L_b is the lower boundary of the Q1, D3, and P45 class
N is the total frequency
cf_b is the less than cumulative frequency of the class just before the Q1, D3, and P45 class
f is the frequency of the quantile class
i is the class size
Use N if the data represent a population and n for a sample.
Sample Interpretation
25% of the data is less than or equal to Q1
50% of the data is less than or equal to Q2, D5, or P50
75% of the data is less than or equal to Q3 or P75
10% of the data is less than or equal to D1 or P10
30% of the data is less than or equal to D3 or P30
32% of the data is less than or equal to P32
87% of the data is less than or equal to P87
Example
Your handsome lecturer in Statistics gave a
test to all grade 11 students under
section A. The students finished the test
in 35 minutes. This is the 2.5th decile of
the allotted time. What does this mean?
Answer
• This means that 25% of the learners finished the test in 35 minutes or less.
Example
• Sir C is a teacher in a science high school in Metro Manila. His salary is at the 7th decile. Should the teacher be glad about his salary or not?
Answer
• 70% of the employees in the school receive a salary that is less than or equal to his salary, and 30% of the employees receive a salary that is greater than his salary. Therefore, the teacher should be pleased with his salary.
Example
• 1. Below are the grades of ten selected students
in a morning class. Determine Q1, D3, and the
80th percentile.
66, 90, 93, 76, 73, 87, 76, 96, 85, 69

• Q1 = ?
• D3 = ?
• P80 = ?
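A sketch for checking this example in Python. It uses the ungrouped position rule kN/parts + (parts − k)/parts (for instance N/4 + 3/4 for Q1). It also assumes that a fractional position is resolved by linear interpolation between the two neighbouring ordered values; that interpolation step is an assumption, since the notes do not state it. `Fraction` keeps the positions exact, so no rounding noise appears.

```python
import math
from fractions import Fraction


def ungrouped_quantile(values, k, parts):
    """k-th quantile when the data are split into `parts` equal parts (4, 10 or 100).

    Position (1-based) = k*N/parts + (parts - k)/parts, e.g. N/4 + 3/4 for Q1.
    A fractional position is linearly interpolated (an assumption of this sketch).
    """
    data = sorted(values)
    n = len(data)
    if n == 0 or not 0 < k < parts:
        raise ValueError("need data and 0 < k < parts")
    pos = Fraction(k * n, parts) + Fraction(parts - k, parts)
    whole = math.floor(pos)
    frac = pos - whole
    lower = data[whole - 1]  # value at the whole-number position
    if frac == 0:
        return float(lower)
    upper = data[whole]      # next value in the ordered data
    return float(lower + frac * (upper - lower))


grades = [66, 90, 93, 76, 73, 87, 76, 96, 85, 69]
print(ungrouped_quantile(grades, 1, 4))     # Q1  -> 73.75
print(ungrouped_quantile(grades, 3, 10))    # D3  -> 75.1
print(ungrouped_quantile(grades, 80, 100))  # P80 -> 90.6
```

As a cross-check, Python's `statistics.quantiles(grades, n=4, method="inclusive")` uses the same positioning (1 + p(N − 1)) and gives 73.75 as its first cut point. Texts that locate quartiles at k(N + 1)/4 instead will report slightly different values.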
Example
• Calculate Q1, D7, and P65 of the Mathematics test scores of 50 students.
Class Interval   Frequency   Lower Boundary   Less Than Cumulative Frequency (<cf)
46-50            4           45.5             50
41-45            8           40.5             46
36-40            11          35.5             38
31-35            9           30.5             27
26-30            12          25.5             18
21-25            6           20.5             6
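The grouped-data quantile formula Lb + ((kN/parts − cfb) / f) i can be sketched in Python for this table. The quantile class is the first class whose <cf reaches kN/parts. The table is re-listed here from the lowest class to the highest, which this sketch assumes. The function name is illustrative.

```python
def grouped_quantile(lower_boundaries, frequencies, class_size, k, parts):
    """k-th quantile of grouped data: Lb + ((k*N/parts - cfb) / f) * i.

    Classes must be listed from the lowest class to the highest.
    """
    total = sum(frequencies)  # N
    if total == 0 or not 0 < k < parts:
        raise ValueError("need a non-empty table and 0 < k < parts")
    target = k * total / parts  # e.g. N/4 for Q1, 7N/10 for D7
    cum_before = 0              # <cf of the class just before the current one (cfb)
    for lb, f in zip(lower_boundaries, frequencies):
        if cum_before + f >= target:  # this is the quantile class
            return lb + (target - cum_before) / f * class_size
        cum_before += f
    raise ValueError("target position not reached")


# Classes 21-25, 26-30, ..., 46-50 with class size 5
lower_boundaries = [20.5, 25.5, 30.5, 35.5, 40.5, 45.5]
frequencies = [6, 12, 9, 11, 8, 4]
q1 = grouped_quantile(lower_boundaries, frequencies, 5, 1, 4)
d7 = grouped_quantile(lower_boundaries, frequencies, 5, 7, 10)
p65 = grouped_quantile(lower_boundaries, frequencies, 5, 65, 100)
print(round(q1, 2), round(d7, 2), round(p65, 2))  # 28.21 39.14 38.0
```

For Q1, N/4 = 12.5 falls in class 26-30 (cfb = 6, f = 12). For D7 (7N/10 = 35) and P65 (65N/100 = 32.5), the quantile class is 36-40 (cfb = 27, f = 11).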
Measures of Dispersion
(Variability)
• Measures the extent to which data are dispersed or spread out.
• Two sets of data may have the same measures of central tendency but different variations, or they may have the same measures of variation but different central tendencies. It is also possible for them to differ in both central tendency and variation.
Some commonly used measures of dispersion are:
1. Range
2. Quartile deviation
3. Mean absolute deviation
4. Variance
5. Standard deviation
Range
The range of a set of data with N observations is the difference between the highest and the lowest values.
How to compute the range
1. Determine the highest and the lowest scores.
2. Find the difference between the highest and the lowest scores.
Quartile Deviation
• The quartile deviation of a set of data with N observations is the amount of spread within the middle half of the items arranged in an array. It is sometimes called the semi-interquartile range.
• It is an improvement over the range because it eliminates the effect of the two extreme values. It is used for ordinal data.
How to compute the quartile deviation
1. Arrange the data in ascending order.
2. Determine the values of the first and the third quartiles.
3. Find the difference between the third and the first quartiles.
4. Divide the difference by 2.
Formula
$$QD = \frac{Q_3 - Q_1}{2}$$
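A short Python sketch comparing the range with the quartile deviation on the ten student grades from the quantile example. The quartiles come from the standard library's `statistics.quantiles` with `method="inclusive"`. That method places Q1 at position N/4 + 3/4 and Q3 at 3N/4 + 1/4, interpolating linearly between neighbours. Other quartile conventions would give a slightly different QD.

```python
from statistics import quantiles


def data_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)


def quartile_deviation(values):
    """QD = (Q3 - Q1) / 2, using the 'inclusive' quartile positions."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / 2


grades = [66, 90, 93, 76, 73, 87, 76, 96, 85, 69]
print(data_range(grades))          # 30
print(quartile_deviation(grades))  # 7.75
```

Here Q1 = 73.75 and Q3 = 89.25. Changing the highest grade from 96 to, say, 100 would change the range but leave the QD untouched, which is the sense in which the QD ignores the extreme values.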
Variance and Standard Deviation
The variance and standard deviation of a set of data with N observations are special forms of average deviation from the mean, which are affected by every individual value in the distribution. They measure the average scatter around the mean.
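Variance…
The variance of a population is:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
where μ is the population mean and N is the population size.
• The variance of a sample is:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
where x̄ is the sample mean.
Note! The denominator is the sample size (n) minus one!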
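The only difference between the two variance formulas is the denominator: N for a population and n − 1 for a sample. The Python sketch below computes both on the six job counts from the application that follows. Treating those six numbers as a "population" is done purely to compare the denominators; the function names are illustrative.

```python
def population_variance(values):
    """sigma^2 = sum((x - mu)^2) / N"""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """s^2 = sum((x - xbar)^2) / (n - 1)"""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


jobs = [17, 15, 23, 7, 9, 13]
print(round(population_variance(jobs), 2))  # 27.67
print(sample_variance(jobs))                # 33.2
```

Python's standard library offers the same pair as `statistics.pvariance` (denominator N) and `statistics.variance` (denominator n − 1).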


Application…
The following sample consists of the number of jobs six randomly selected students applied for: 17, 15, 23, 7, 9, 13.
• Find its mean and variance.
• What are we looking to calculate?
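• The sample mean x̄ and the sample variance s², as opposed to μ or σ².
Mean & Variance…
Sample mean:
$$\bar{x} = \frac{\sum x_i}{n} = \frac{17 + 15 + 23 + 7 + 9 + 13}{6} = \frac{84}{6} = 14$$
Sample variance:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{3^2 + 1^2 + 9^2 + (-7)^2 + (-5)^2 + (-1)^2}{6 - 1} = \frac{166}{5} = 33.2$$
Sample variance (shortcut method):
$$s^2 = \frac{1}{n - 1}\left[\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right] = \frac{1}{5}\left[1342 - \frac{84^2}{6}\right] = \frac{1342 - 1176}{5} = 33.2$$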
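The shortcut method needs only two running totals, Σx and Σx², instead of first computing every deviation from the mean. A minimal Python sketch on the same six job counts (the function name is illustrative):

```python
def sample_variance_shortcut(values):
    """s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)"""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    sum_x = sum(values)                   # 84 for the job data
    sum_x2 = sum(x * x for x in values)   # 1342 for the job data
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


jobs = [17, 15, 23, 7, 9, 13]
print(sample_variance_shortcut(jobs))  # 33.2
```

Both methods give the same value. For data with a very large mean relative to their spread, the subtraction in the shortcut can lose floating-point precision, so the deviation form is usually preferred in software.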


Standard Deviation…
• The standard deviation is simply the square root of the variance, thus:
• Population standard deviation:
$$\sigma = \sqrt{\sigma^2}$$
• Sample standard deviation:
$$s = \sqrt{s^2}$$
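Because the standard deviation is just the square root of the variance, a sketch only needs the sum of squared deviations and the two denominators. Applied to the six job counts, the sample standard deviation is √33.2 ≈ 5.76. The function name is illustrative; `statistics.pstdev` and `statistics.stdev` in Python's standard library compute the same two quantities.

```python
import math


def standard_deviations(values):
    """Return (population SD, sample SD) as square roots of the two variances."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample standard deviation needs at least two observations")
    xbar = sum(values) / n
    squared_deviations = sum((x - xbar) ** 2 for x in values)
    sigma = math.sqrt(squared_deviations / n)    # population: denominator N
    s = math.sqrt(squared_deviations / (n - 1))  # sample: denominator n - 1
    return sigma, s


jobs = [17, 15, 23, 7, 9, 13]
sigma, s = standard_deviations(jobs)
print(round(s, 2))  # 5.76
```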
