STAT 12
COLLEGE OF INFORMATION AND COMPUTER SCIENCE
Second Semester, AY 2011-2012
STAT 12-Statistics and Probability
Course Outline:
I. Statistics
A. Introduction to Statistics
1. Definition of Statistics
2. Areas of Statistics
a. Descriptive Statistics
b. Inferential Statistics
3. Terminologies in Statistics
4. Scales of Measurement
5. Sampling Technique
a. Probability Sampling
b. Non-Probability Sampling
6. Summation Notation
7. Numerical Measures for Population and Samples
a. Measures of Central Tendency
i). Mean, median and mode for grouped and ungrouped data
b. Measures of Variability, Skewness, and Kurtosis ----------------------- PRELIM EXAM (Dec. 12, 2012)
II. Probability
A. Experiments, Sample Spaces and Events
B. Computing / Counting Technique
C. Probability of An Event
D. Laws of Probability
E. Conditional Probability and Independent Events ----------------------------- MIDTERM EXAM (FEB. 07, 2012)
III. Tests of Hypothesis
A. Testing and Errors in Testing
B. One-Tailed and Two-Tailed Tests
C. Steps in Testing Hypothesis
D. Tests Concerning the Mean and Proportion-One Population
E. Tests Concerning Means and Proportions-Two Populations
IV. Regression and Correlation
A. Linear Correlation Analysis
B. Simple Linear Regression Analysis
C. Residuals
------------------------------------------------------------------ FINAL EXAM (MARCH 21, 2012)
Classroom Policies:
1. Be courteous at all times.
2. Be responsible with your duties and obligations as students (e.g., attendance, quizzes,
seatwork, assignments, and problem sets).
3. Be a self-disciplined and self-directed individual.
CSU-CICS/2nd sem/2012-13 2
A qualitative variable has no numerical value.
2. Quantitative Variable: a variable that assumes numerical values.
Classification:
a. Discrete variable: assumes a finite or countable number of values
and is obtained through the process of counting.
b. Continuous variable: assumes infinitely many values that correspond
to the points on a line or interval and is obtained through measuring.
3. Dependent Variable: a variable that is affected or influenced by
another variable.
4. Independent Variable: a variable that affects or influences the
dependent variable.
Data: facts, or set of information or
observations under study.
Scales of Measurement:
1. Nominal Scale: numbers or symbols are used only to classify units
into distinct categories.
2. Ordinal Scale: accounts for order. Data are arranged in some
specified order or rank, with no indication of the distance between
positions.
3. Interval Scale: has equal intervals (a fixed unit of measurement)
but no absolute zero.
4. Ratio Scale: has equal intervals and an absolute zero.
2. Inferential Statistics: consists of methods of analyzing a subset or
part of the population, leading to predictions or inferences about the
population.
Population: consists of the totality of the observations. It is the
complete set of all possible observations or elements to be studied.
Parameter: a numerical value that describes a characteristic of a
population, which is often represented by a Greek letter. It is a
constant value.
Sample: a subset of the population.
Statistic: a numerical value describing a characteristic of a sample.
Variable: an observable characteristic that can be measured or
classified.
Types:
1. Qualitative Variable: assumes values that can be categorized
according to some distinct characteristics or attributes. It has no
numerical value.
Sampling Technique:
a procedure used to determine the individuals
or members of a sample.
Two Types of Sampling Techniques
1. Probability Sampling
a sampling technique wherein each member or element of the population
has an equal chance of being selected as a member of the sample.
a. Random Sampling
- the basic type of probability sampling, wherein each member of the
population has an equal chance of being selected as a member of the
sample, chosen from a complete list of the members of the population.
Types:
i. Lottery Method
ii. Table of Random Numbers
b. Systematic Sampling: a sampling technique wherein the members of the
sample are selected by choosing a random starting point and then
drawing successive elements from the population at a fixed interval.
In other words, it is picking every nth element of the population as a
member of the sample.
c. Stratified Random Sampling: "stratified" comes from the root word
strata, which means groups or categories (singular form: stratum).
- dividing the elements of a population into different categories or
subpopulations, and then drawing the members of the sample
proportionally from each subpopulation.
d. Cluster Sampling: sampling wherein
groups or clusters instead of
individuals are randomly chosen. It is
sometimes called area sampling.
e. Multi-stage Sampling: a combination
of several sampling techniques which
is done by starting the selection of
the members of the sample using
cluster sampling and then dividing
each cluster or group into strata, then
from each stratum individuals are
drawn randomly using simple random
sampling.
2. Non-Probability Sampling: a sampling technique wherein members of
the sample are drawn from the population based on the judgment of the
researchers. This technique lacks objectivity of selection; hence, it
is sometimes called subjective sampling.
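The probability sampling techniques described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the population of 100 ID numbers, the two strata names, and the function names are hypothetical and are not part of the course material.

```python
import random

def simple_random_sample(population, n, rng):
    """Lottery method: every member has the same chance of selection."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Choose a random start among the first k members, then take every k-th member."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n, rng):
    """Draw from each stratum in proportion to its share of the population."""
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation; for other inputs, rounding may make the
        # stratum sizes add up to slightly more or less than n.
        size = round(n * len(members) / total)
        sample[name] = rng.sample(members, size)
    return sample

rng = random.Random(12)            # fixed seed so that the run is repeatable
population = list(range(1, 101))   # hypothetical population of 100 ID numbers
strata = {                         # hypothetical strata (60 and 40 members)
    "first_year": list(range(1, 61)),
    "second_year": list(range(61, 101)),
}

print(simple_random_sample(population, 10, rng))
print(systematic_sample(population, 10, rng))
print(stratified_sample(strata, 10, rng))
```

With these hypothetical strata, a sample of 10 is split 6 and 4, which mirrors the 60:40 split of the population.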
Summation Notation
$\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$
where
$\sum$ - summation notation
$X_i$ - added variables
1 - lower limit of the summation
n - upper limit of the summation
Examples:
1. If x1 = 10, x2 = 5, x3 = 4, x4 = 2, x5 = 7,
find the following expressions:
a. $\sum_{i=2}^{4} x_i$
b. $\sum_{i=1}^{5} (x_i)^2$
c. $\sum_{i=1}^{5} (x_i - 2)$
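Theorems:
1. The summation of the sum of two or more variables is the sum of
their summations:
$\sum_{i=1}^{n} (x_i + y_i + z_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i + \sum_{i=1}^{n} z_i$
2. If c is a constant, then
$\sum_{i=1}^{n} c = nc$
and
$\sum_{i=1}^{n} c x_i = c \sum_{i=1}^{n} x_i$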
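The summation notation and the two theorems can be checked numerically with a short Python sketch. The x values are taken from the first example; the y and z values, the constant c = 3, and the helper name `summation` are assumptions made only for this illustration.

```python
x = [10, 5, 4, 2, 7]   # x1..x5 from the first example

def summation(values, lower, upper):
    """Add values[lower..upper] using 1-based limits, as in the notation."""
    return sum(values[lower - 1:upper])

# Sum of x_i for i = 2 to 4: 5 + 4 + 2
print(summation(x, 2, 4))   # 11

# Theorem 1, using hypothetical y and z values
y = [1, 2, 3, 4, 5]
z = [2, 2, 2, 2, 2]
lhs = sum(xi + yi + zi for xi, yi, zi in zip(x, y, z))
rhs = sum(x) + sum(y) + sum(z)
print(lhs == rhs)   # True

# Theorem 2, using a hypothetical constant c = 3
c = 3
n = len(x)
print(sum(c for _ in range(n)) == n * c)        # True
print(sum(c * xi for xi in x) == c * sum(x))    # True
```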
Non-probability sampling is sometimes called subjective sampling.
Examples:
If x1 = 10, x2 = 3, x3 = 4 and y1 = 7, y2 = 3, y3 = 2,
find the following expressions:
a. $\sum_{i=1}^{3} x_i y_i$
b. $\sum_{i=1}^{3} (x_i)(y_i)^2$
c. $\sum_{i=1}^{3} x_i \sum_{i=1}^{3} y_i$
d. $\sum_{i=1}^{3} y_i$
e. $\sum_{i=1}^{3} 2y_i$
f. $\sum_{i=1}^{3} x_i \sum_{i=1}^{3} y_i + 5$
PRESENTATION OF DATA
Classification of data
1. Ungrouped Data - data that are not arranged, or, if arranged, are
ordered only from highest to lowest or from lowest to highest.
2. Grouped Data - data that are organized and arranged into different
classes or categories.
Different Methods of Presenting Data
1. Textual Method - involves enumerating the important characteristics,
giving emphasis to significant figures, and identifying important
features of the data.
a. Stem-and-leaf Plot = a table which sorts data according to a certain
pattern. It involves separating a number into two parts: in a two-digit
number, the stem consists of the first digit and the leaf consists of
the second digit, while in a three-digit number, the stem consists of
the first two digits and the leaf consists of the last digit. In a
one-digit number, the stem is zero.
b. Dot-plot = a graphical presentation of
a set of data which shows a dot for
each value in a data set along a
number line. If there are multiple
occurrences of a specific value or
values, then the dots will be stacked
vertically.
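The stem-and-leaf rule above (the leaf is the last digit, the stem is whatever precedes it, and one-digit numbers get stem zero) can be sketched in Python. The scores and the function name are hypothetical, and the sketch assumes non-negative whole numbers.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group whole numbers by stem (all digits but the last) and leaf (last digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # one-digit numbers get stem 0
        plot[stem].append(leaf)
    return dict(plot)

scores = [7, 12, 15, 21, 21, 28, 34, 105, 109]   # hypothetical data
for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because the values are sorted first, the stems are listed from lowest to highest and the leaves within each stem are in ascending order.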
Class Boundary (c.b.) - obtained by subtracting half a unit from the
lower limit and adding half a unit to the upper limit. When the limits
are whole numbers, this is expressed to one decimal place.
Relative Frequency - the proportion of the total number of items that
falls in a class. To compute the relative frequency, divide the
frequency of a class interval by the total frequency and then express
the result as a percentage.
Cumulative Frequency - the total frequency of all values which are less
than or greater than a class boundary.
Less than Cumulative Frequency - the total frequency of all values less
than the upper class boundary of a given class interval.
Greater than Cumulative Frequency - the total frequency of all values
greater than the lower class boundary of a given class interval.
Steps in Constructing a Frequency Distribution Table:
1. Decide on the number of classes.
2. Determine the class width (i).
Class width = range / number of classes, where range = highest value - lowest value
- use when the problem states the number of classes to be used.
Data set (40 values):
0.7 2.9 3.5 0.9 2.1 2.4 0.4 3.9 6.3 2.5
3.9 2.6 1.8 3.4 2.3 1.3 2.8 1.1 0.2 2.1
2.8 3.7 3.1 2.3 1.5 2.6 3.5 5.9 2.0 1.2
1.3 2.1 0.3 2.5 4.3 1.8 1.4 2.0 1.9 1.7
a. When the data are whole numbers, i
should be a whole number.
b. When the data are in one-decimal
place, i should also be in one decimal
place.
3. Unless otherwise specified, always start the lowest class with the
lowest value of the raw data, in order to minimize errors.
4. Tally the frequency of each class, until
the highest value is reached.
5. The last class interval can go beyond the
highest value in the observation as long
as the obtained i is followed.
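The steps above can be sketched in Python using the 40 one-decimal values listed in these notes. Several points are assumptions made only for illustration: the choice of 7 classes, the class width taken as the range divided by the number of classes and rounded up to one decimal place, and the function and key names. Values are converted to whole tenths so that class limits stay exact.

```python
import math

data = [0.7, 2.9, 3.5, 0.9, 2.1, 2.4, 0.4, 3.9, 6.3, 2.5,
        3.9, 2.6, 1.8, 3.4, 2.3, 1.3, 2.8, 1.1, 0.2, 2.1,
        2.8, 3.7, 3.1, 2.3, 1.5, 2.6, 3.5, 5.9, 2.0, 1.2,
        1.3, 2.1, 0.3, 2.5, 4.3, 1.8, 1.4, 2.0, 1.9, 1.7]

def frequency_table(values, num_classes):
    """Build a frequency distribution for data given to one decimal place."""
    tenths = [round(v * 10) for v in values]         # work in whole tenths
    low, high = min(tenths), max(tenths)
    width = math.ceil((high - low) / num_classes)    # class width i, rounded up
    rows, lower, n = [], low, len(tenths)
    while lower <= high:                             # continue until the highest value is covered
        upper = lower + width - 1
        freq = sum(lower <= t <= upper for t in tenths)
        rows.append({
            "limits": (lower / 10, upper / 10),
            "boundaries": ((lower - 0.5) / 10, (upper + 0.5) / 10),
            "f": freq,
            "rf_percent": 100 * freq / n,
        })
        lower += width
    running = 0
    for row in rows:
        running += row["f"]
        row["<cf"] = running                         # less-than cumulative frequency
        row[">cf"] = n - running + row["f"]          # greater-than cumulative frequency
    return rows

for row in frequency_table(data, 7):
    print(row)
```

With 7 classes, the range is 6.3 - 0.2 = 6.1, so the width comes out as 0.9 and the lowest class is 0.2 to 1.0. The last class, 5.6 to 6.4, goes slightly beyond the highest value, which Step 5 allows.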
$\bar{x}$ - indicates the sample mean
1. The Mean for Ungrouped Data
$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$
and, for grouped data,
$\bar{x} = \frac{\sum f X_m}{N}$
where
x - score or measurement
f - frequency
$X_m$ - classmark or midpoint
N - total frequency
Steps:
1. Construct the column for Xm. The Classmark is the
midpoint of the lower and upper limits of a given
class interval. The midpoint is found by getting
one-half of the sum of the upper limit and the
lower limit.
2. Multiply each classmark by its corresponding
frequency. This will be entered in the column fXm.
3. Get the sum of the values obtained in Step 2 to get $\sum fX_m$.
4. Substitute the value obtained in Step 3 into the formula to find the
mean.
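The four steps above can be sketched in Python. The frequency table and the function name are hypothetical and serve only to illustrate the computation.

```python
def grouped_mean(classes):
    """classes: list of (lower_limit, upper_limit, frequency) tuples."""
    total_f = 0
    total_fxm = 0.0
    for lower, upper, f in classes:
        xm = (lower + upper) / 2    # Step 1: classmark (midpoint of the limits)
        total_fxm += f * xm         # Steps 2-3: accumulate f * Xm
        total_f += f
    return total_fxm / total_f      # Step 4: sum of f * Xm divided by N

# Hypothetical frequency distribution: (lower limit, upper limit, frequency)
table = [(10, 19, 3), (20, 29, 5), (30, 39, 8), (40, 49, 4)]
print(grouped_mean(table))   # 31.0
```

Here the classmarks are 14.5, 24.5, 34.5 and 44.5, the sum of f * Xm is 620, and N = 20, so the mean is 620 / 20 = 31.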
Characteristics of the Mean
1. The mean is the most appropriate measure of
central tendency when the data are in the interval
or ratio scale.
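$\tilde{X} = lcb + \left( \frac{\frac{N}{2} - {<}cf}{f_m} \right) i$
Where,
$\tilde{X}$ - median
N - total frequency
lcb - lower class boundary of the median class
<cf - less-than cumulative frequency of the class before the median class
i - size of the class interval
$f_m$ - frequency of the median class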
Steps:
1. Construct the <cf column by adding frequencies successively,
starting from the lowest class interval.
2. Determine the median class. This is the class interval containing
one-half of the total frequency, N/2, in the <cf column.
3. Use the formula to find the median.
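The median steps can be sketched in Python, assuming the usual grouped-data formula: lcb + ((N/2 - <cf) / fm) * i, where <cf is the cumulative frequency of the class before the median class. The frequency table and the function name are hypothetical, and the sketch assumes whole-number class limits, so each boundary lies half a unit beyond its limit.

```python
def grouped_median(classes):
    """classes: list of (lower_limit, upper_limit, frequency), lowest class first."""
    n = sum(f for _, _, f in classes)
    cf_before = 0                           # <cf of the class before the current one
    for lower, upper, f in classes:
        if cf_before + f >= n / 2:          # first class whose <cf reaches N/2
            lcb = lower - 0.5               # lower class boundary of the median class
            i = upper - lower + 1           # size of the class interval
            return lcb + (n / 2 - cf_before) / f * i
        cf_before += f

# Hypothetical frequency distribution: (lower limit, upper limit, frequency)
table = [(10, 19, 3), (20, 29, 5), (30, 39, 8), (40, 49, 4)]
print(grouped_median(table))   # 32.0
```

In this table N = 20, so N/2 = 10. The <cf column is 3, 8, 16, 20, which makes 30 to 39 the median class, and the median is 29.5 + ((10 - 8) / 8) * 10 = 32.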
Characteristics of the Mode
1. The mode is the most appropriate measure of central
tendency when the data are nominal in scale.
2. The mode is the least reliable among the three measures
of central tendency because its value is undefined in
some distributions.
3. The mode is used when we want to find the value which
occurs most often.
The median is the value such that 50% of the distribution lies above it and 50% lies below it.
Exercise:
On sheets of short bond paper, to be submitted on Monday.
(Groups of 3)
1. Find the mean, median and mode of the scores (n = 44).
2. Present the table and the solutions.
$Mo = l_{mo} + \left( \frac{f_m - f_a}{2f_m - f_a - f_b} \right) i$
Where,
Mo - Mode
Steps:
1. Find the modal class. This is the class interval with the highest
frequency.
2. Use the formula to find the mode.