
CSU-CICS/2nd sem/2012-13 1

STAT 12
COLLEGE OF INFORMATION AND COMPUTER SCIENCE
Second Semester, AY 2011-2012
STAT 12-Statistics and Probability
Course Outline:
I. Statistics
A. Introduction to Statistics
1. Definition of Statistics
2. Areas of Statistics
a. Descriptive Statistics
b. Inferential Statistics
3. Terminologies in Statistics
4. Scales of Measurement
5. Sampling Technique
a. Probability Sampling
b. Non-Probability Sampling
6. Summation Notation
7. Numerical Measures for Population and Samples
a. Measures of Central Tendency
i). Mean, median and mode for grouped and ungrouped data
b. Measures of Variability, Skewness, and Kurtosis
------------------------------------------------------------------PRELIM EXAM (Dec. 12, 2012)
II. Probability
A. Experiments, Sample Spaces and Events
B. Computing / Counting Technique
C. Probability of An Event
D. Laws of Probability
E. Conditional Probability and Independent Events
------------------------------------------------------------------MIDTERM EXAM (FEB. 07, 2012)
III. Tests of Hypothesis
A. Testing and Errors in Testing
B. One-Tailed and Two-Tailed Tests
C. Steps in Testing Hypothesis
D. Tests Concerning the Mean and Proportion-One Population
E. Tests Concerning Means and Proportions-Two Populations
IV. Regression and Correlation
A. Linear Correlation Analysis
B. Simple Linear Regression Analysis
C. Residuals
------------------------------------------------------------------FINAL EXAM (MARCH 21, 2012)
Classroom Policies:
1. Be courteous at all times.
2. Be responsible with your duties and obligations as students (e.g., attendance, quizzes,
seatwork, assignments, and problem sets).
3. Be a self-disciplined and self-directed individual.


WELCOME AND HAVE A FRUITFUL SEMESTER AHEAD!!!


Statistics: (science)
- orderly arrangement of knowledge and facts based on careful observation and
experimentation.
- a scientific body of knowledge that deals with the collection, organization or
presentation, analysis, and interpretation of data, and with drawing conclusions about
a broader body of data on the basis of a representative sample.
Statistics: (mathematics)
- deals with finding possible solutions to a particular problem with the application
of appropriate formulations.
Collection: refers to the gathering of information or data.
Organization or Presentation: involves summarizing data or information in textual,
graphical, or tabular forms.
Analysis: involves describing the data by using statistical methods and procedures.
Interpretation: refers to the process of making conclusions based on the analyzed data.
Two Areas in Statistics:
1. Descriptive Statistics: consists of methods of collecting and describing a set of
data to yield valid information. The results can be presented in graphical forms, such
as bar graphs, pie graphs, and line graphs, or in numerical tables.
2. Inferential Statistics: consists of methods of analyzing a subset or part of the
population, leading to predictions or inferences about the population.
Population: consists of the totality of the observations. It is the complete set of
all possible observations or elements to be studied.
Parameter: a numerical value that describes a characteristic of a population, which is
often represented by a Greek letter. It is a constant value.
Sample: a subset of the population.
Statistic: a numerical value describing a characteristic of a sample.
Variable: an observable characteristic that can be measured or classified.
Types:
1. Qualitative Variable: assumes values that can be categorized according to some
distinct characteristic or attribute. It has no numerical value.
2. Quantitative Variable: includes variables that assume numerical values.
Classification:
a. Discrete variable: assumes a finite or countable number of values and is obtained
through the process of counting.
b. Continuous variable: assumes infinitely many values that correspond to the points on
a line or interval and is obtained through measuring.
3. Dependent Variable: a variable which is affected or influenced by another variable.
4. Independent Variable: one which affects or influences the dependent variable.
Data: facts, or a set of information or observations under study.
Scales of Measurement:
1. Nominal Scale: numbers or symbols used to classify units into distinct categories.
2. Ordinal Scale: accounts for order. Data are arranged in some specified order or
rank, with no indication of distance between positions.
3. Interval Scale: has equal intervals (a fixed unit of measurement) but no absolute zero.
4. Ratio Scale: has an absolute zero.
Sampling Technique:
a procedure used to determine the individuals or members of a sample.
Two Types of Sampling Technique
1. Probability Sampling
a sampling technique wherein each member or element of the population has an equal
chance of being selected as a member of the sample.
a. Random Sampling: basic type of probability sampling wherein each member of the
population has an equal chance of being selected as a member of the sample chosen from
a complete list of the members of the population.
Types:
i. Lottery Method
ii. Table of Random Numbers
b. Systematic Sampling: sampling technique wherein the selection of members in the
sample is done by choosing a random starting point and then drawing successive elements
from the population at a fixed interval. In other words, it is the picking of every nth
element of the population as a member of the sample.
c. Stratified Random Sampling: "stratified" comes from the root word strata, which
means groups or categories (singular form: stratum). It involves dividing the elements
of a population into different categories or subpopulations; the members of the sample
are then drawn or selected proportionally from each subpopulation.
d. Cluster Sampling: sampling wherein groups or clusters instead of individuals are
randomly chosen. It is sometimes called area sampling.
e. Multi-stage Sampling: a combination of several sampling techniques. The selection of
the members of the sample starts with cluster sampling; each cluster or group is then
divided into strata, and from each stratum individuals are drawn randomly using simple
random sampling.
2. Non-Probability Sampling: a sampling technique wherein members of the sample are
drawn from the population based on the judgment of the researchers. This technique
lacks objectivity of selection.
a. Convenience Sampling: sampling technique that is used by researchers because of the
convenience it offers them.
b. Opportunity Sampling: this sampling selects the samples as they come along,
regardless of whether the elements in the population have an equal chance of being
selected.
c. Quota Sampling: sampling procedure wherein the proportions of the various subgroups
in the population are determined and the sample is drawn to have the same percentages
in it; the selection of the members of the sample is not done randomly.
d. Purposive Sampling: the selection of the elements or members in the sample is
purposeful.
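
A minimal Python sketch of three of the probability sampling techniques described above (simple random, systematic, and stratified). The population of 20 numbered members, the two strata, and the function names are all hypothetical and are used only for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Lottery-style draw: every member has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Pick a random starting point, then take every k-th member."""
    k = len(population) // n          # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)          # random start within the first interval
    return [population[start + j * k] for j in range(n)]

def stratified_sample(strata, n, seed=None):
    """Draw from each stratum in proportion to its share of the population.

    Assumption: the proportional shares round to whole numbers that add up
    to n, which holds for the hypothetical strata below.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        share = round(n * len(members) / total)   # proportional allocation
        sample.extend(rng.sample(members, share))
    return sample

population = list(range(1, 21))                  # hypothetical members 1..20
strata = {
    "stratum_A": list(range(1, 13)),             # 12 members (60%)
    "stratum_B": list(range(13, 21)),            # 8 members (40%)
}

srs = simple_random_sample(population, 5, seed=1)
systematic = systematic_sample(population, 5, seed=1)
stratified = stratified_sample(strata, 5, seed=1)

print(srs)
print(systematic)
print(stratified)
```

With a sample of 5, the stratified draw takes 3 members from the first stratum and 2 from the second, mirroring the 60/40 split of the population.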

Summation Notation

$\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$

where
$\sum$ - summation notation
$X_i$ - added variables
1 - lower limit of the summation
n - upper limit of the summation

Examples:
1. If x1 = 10, x2 = 5, x3 = 4, x4 = 2, x5 = 7,
find the following expressions:
a. $\sum_{i=2}^{4} x_i$
5

(x )
i 1

b.
5

(x
i 1

2)

c.
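Theorems:
1. The summation of the sum of two or more variables is the sum of their
summations:
$\sum_{i=1}^{n} (x_i + y_i + z_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i + \sum_{i=1}^{n} z_i$
2. If c is a constant, then
$\sum_{i=1}^{n} c = nc$ and $\sum_{i=1}^{n} c x_i = c \sum_{i=1}^{n} x_i$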
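
The notation and the two theorems can be checked numerically. The sketch below uses the x values given in Example 1 (x1 = 10, x2 = 5, x3 = 4, x4 = 2, x5 = 7); the y values and the constant c = 3 are made up for illustration. Python lists are indexed from 0, so x_i corresponds to `x[i - 1]`.

```python
x = [10, 5, 4, 2, 7]        # x1..x5 from Example 1
y = [1, 2, 3, 4, 5]         # hypothetical values, for illustration only
c = 3                       # hypothetical constant
n = len(x)

# Sum of x_i for i = 2 to 4: x2 + x3 + x4 = 5 + 4 + 2
partial = sum(x[i - 1] for i in range(2, 5))

# Theorem 1: the summation of a sum equals the sum of the summations
lhs_1 = sum(xi + yi for xi, yi in zip(x, y))
rhs_1 = sum(x) + sum(y)

# Theorem 2: adding a constant n times gives nc, and a constant
# factor can be moved outside the summation
sum_of_constant = sum(c for _ in range(n))
lhs_2 = sum(c * xi for xi in x)
rhs_2 = c * sum(x)

print(partial)                    # 11
print(lhs_1, rhs_1)               # 43 43
print(sum_of_constant, n * c)     # 15 15
print(lhs_2, rhs_2)               # 84 84
```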

Non-probability sampling is sometimes called subjective sampling.

Examples:
If x1 = 10, x2 = 3, x3 = 4 and y1 = 7, y2 = 3, y3 = 2, find the following expressions:
3

x y
i 1

a.
3

(x )
i 1

( yi ) 2

b.

c.


xi
i 1
3

y
i 1

i 1

yi

Frequency Distribution Table - a table which shows the data arranged into different
classes and the number of cases that fall into each class.
Frequency - the number of occurrences of an observation.

d.
3

2y
i 1

e.
3

xi

y
i 1

i 1

f.

2. Tabular Method - shows complete information regarding the data.
Parts:
a. Table number - this is for easy reference to the table.
b. Table title - briefly explains the content of the table.
c. Column header - describes the data in each column.
d. Row classifier - shows the classes or categories.
e. Body - the main part of the table.
f. Source note - placed below the table when the data written are not original.

+5

PRESENTATION OF DATA
Classification of data
1. Ungrouped Data - data that are not arranged or, if arranged, are only ordered from
highest to lowest or from lowest to highest.
2. Grouped Data - data that are organized and arranged into different classes or
categories.
Different Methods of Presenting Data
1. Textual Method - involves enumerating the important characteristics, giving examples
of significant figures, and identifying important features of the data.
a. Stem-and-leaf Plot - a table which sorts data according to a certain pattern. It
involves separating a number into two parts: in a two-digit number, the stem consists
of the first digit and the leaf consists of the second digit; in a three-digit number,
the stem consists of the first two digits and the leaf consists of the last digit; in a
one-digit number, the stem is zero.
b. Dot Plot - a graphical presentation of a set of data which shows a dot for each
value in the data set along a number line. If there are multiple occurrences of a
specific value, the dots are stacked vertically.

Class Interval - the row classifiers of a frequency distribution table.
Suggestions in determining the number of class intervals:
1. Avoid using fewer than 6 or more than 15 classes. The number of classes in a given
situation depends on the nature, magnitude, and range of the data.
2. Make sure that each item goes into one and only one class.
Class Limit - the smallest and largest values of a given interval.
Lower Class Limit or Lower Limit (LL) - the smallest value of a given interval.
Upper Class Limit or Upper Limit (UL) - the largest value of a given interval.
Class Width (i) - the number of test scores contained in a class: $i = (UL - LL) + 1$
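
As a rough illustration of the stem-and-leaf rule described above, the following Python sketch splits each whole number into a stem (all digits except the last) and a leaf (the last digit). The scores and the function name are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative whole numbers by stem; the last digit is the leaf."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # a one-digit number gets stem 0
        plot[stem].append(leaf)
    return dict(plot)

scores = [7, 12, 15, 23, 23, 28, 31, 104, 109]   # hypothetical data
plot = stem_and_leaf(scores)

for stem, leaves in sorted(plot.items()):
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```

Here 7 falls under stem 0, while 104 and 109 share the two-digit stem 10, matching the one-digit and three-digit cases of the rule.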
Class Boundary (c.b.) - obtained by subtracting half a unit from the lower limit and
adding half a unit to the upper limit. For whole-number data, this is expressed to one
decimal place.
Relative Frequency - the proportion of the total number of items that falls in a class.
To compute the relative frequency, divide the frequency of a class interval by the
total frequency and then express the result as a percentage.
Cumulative Frequency - the total frequency of all values which are less than or greater
than a class boundary.
Less than Cumulative Frequency (<cf) - the total frequency of all values less than the
upper class boundary of a given class interval.
Greater than Cumulative Frequency (>cf) - the total frequency of all values greater
than the lower class boundary of a given class interval.
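
The definitions above can be sketched in Python for a small table. The four class intervals and their frequencies below are hypothetical whole-number data, so half a unit is taken as 0.5.

```python
classes = [(10, 14), (15, 19), (20, 24), (25, 29)]   # hypothetical (LL, UL) limits
freqs = [4, 8, 5, 3]                                  # hypothetical frequencies
total = sum(freqs)

# Class boundaries: half a unit below LL and half a unit above UL
boundaries = [(ll - 0.5, ul + 0.5) for ll, ul in classes]

# Relative frequency, expressed as a percentage of the total frequency
rel_freq = [100 * f / total for f in freqs]

# <cf: running total from the lowest class upward
less_than_cf = []
running = 0
for f in freqs:
    running += f
    less_than_cf.append(running)

# >cf: running total from the highest class downward
greater_than_cf = []
running = 0
for f in reversed(freqs):
    running += f
    greater_than_cf.append(running)
greater_than_cf.reverse()

for row in zip(classes, boundaries, freqs, rel_freq, less_than_cf, greater_than_cf):
    print(row)
```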
Steps in Constructing a Frequency Distribution Table:
1. Decide on the number of classes.
2. Determine the class width (i).
$i = \dfrac{\text{highest value} - \text{lowest value}}{\text{number of classes}}$
- use when the problem states the number of classes to be used.
$i = \dfrac{\text{highest value} - \text{lowest value}}{1 + 3.322 \log N}$
- use when there is no number of classes indicated.
a. When the data are whole numbers, i should be a whole number.
b. When the data are in one decimal place, i should also be in one decimal place.
3. Unless otherwise specified, always start the lowest class with the lowest value of
the raw data, in order to minimize errors.
4. Tally the frequency of each class until the highest value is reached.
5. The last class interval can go beyond the highest value in the observations as long
as the obtained i is followed.
Simple Frequency Distribution Table - consists only of the class intervals and
frequencies.
Complete Frequency Distribution Table - also includes the classmark or midpoint
($X_m$, the average of the UL and LL) and the class boundaries (c.b.).
Exercise:
The following data represent the length of life in minutes, measured to the nearest
tenth, of a random sample of 50 black flies subjected to a new spray in a controlled
laboratory experiment.

2.4  0.7  3.9  2.8  1.3
1.6  2.9  2.6  3.7  2.1
3.2  3.5  1.8  3.1  0.3
4.6  0.9  3.4  2.3  2.5
0.4  2.1  2.3  1.5  4.3
1.8  2.4  1.3  2.6  1.8
2.7  0.4  2.8  3.5  1.4
1.7  3.9  1.1  5.9  2.0
5.3  6.3  0.2  2.0  1.9
1.2  2.5  2.1  1.2  1.7

Using 8 class intervals, construct a complete frequency distribution table, including
the relative frequency distribution and the cumulative frequencies.
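
The construction steps above can be sketched in Python. The 20 whole-number scores below are hypothetical (they are not the black-fly data), the function names are illustrative, and rounding the class width to the nearest whole number is an assumption, since the steps only say that i should be a whole number for whole-number data. The denominator 1 + 3.322 log N is commonly known as Sturges' rule.

```python
import math

def class_width(values, num_classes=None):
    """Step 2: range divided by the number of classes.

    When no number of classes is given, 1 + 3.322 log10(N) is used instead.
    """
    data_range = max(values) - min(values)
    if num_classes is None:
        num_classes = 1 + 3.322 * math.log10(len(values))
    # Whole-number data, so i is a whole number (assumed: round to nearest)
    return round(data_range / num_classes)

def frequency_table(values, width):
    """Steps 3 to 5: start at the lowest value, then tally each class."""
    table = []
    lower = min(values)
    while lower <= max(values):
        upper = lower + width - 1            # so that (UL - LL) + 1 = i
        freq = sum(lower <= v <= upper for v in values)
        table.append((lower, upper, freq))
        lower = upper + 1
    return table

# Hypothetical whole-number scores
scores = [12, 15, 17, 20, 21, 22, 22, 25, 26, 28,
          30, 31, 33, 34, 35, 38, 40, 41, 44, 47]

i = class_width(scores, num_classes=6)       # (47 - 12) / 6 = 5.83, so i = 6
table = frequency_table(scores, i)

for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")

print(class_width(scores))                   # width when no class count is given
```

Each score falls into exactly one class, and the frequencies add up to the 20 scores.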

MEASURES OF CENTRAL TENDENCY

- Any numerical descriptive measure indicating the center of a set of data which is
arranged in increasing or decreasing order of magnitude.
- A single number or value which can be considered typical of a set of data as a whole.
A. The MEAN
- The average of a set of measurements, often referred to as the ARITHMETIC MEAN or
simply the mean.
- The sum of all the measurements divided by the number of measurements in the set.
$\mu$ - indicates population mean
$\bar{x}$ - indicates sample mean
1. The Mean for Ungrouped Data
$\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$ and $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
2. The Mean for Grouped Data (Using the Classmark Formula)
$\bar{x} = \dfrac{\sum f X_m}{N}$
where
x - score or measurement
f - frequency
$X_m$ - classmark or midpoint
N - total frequency
Steps:
1. Construct the column for $X_m$. The classmark is the midpoint of the lower and upper
limits of a given class interval. The midpoint is found by getting one-half of the sum
of the upper limit and the lower limit.
2. Multiply each classmark by its corresponding frequency. Enter the products in the
column $fX_m$.
3. Get the sum of the values obtained in Step 2 to get $\sum fX_m$.
4. Substitute the value obtained in Step 3 in the formula to find the mean.
Characteristics of the Mean
1. The mean is the most appropriate measure of central tendency when the data are in
the interval or ratio scale.
2. The mean lies between the largest and smallest values or measurements.
3. There is only one value for the mean for a given set of values or measurements.
4. The mean is easily influenced by extreme values because all values contribute to the
average. If there are extremely high values, the mean tends to be high also. If there
are extremely low values, the mean tends to be low also.
B. The MEDIAN
- the value of x in the middle of a set of n measurements.
- Denoted by $\tilde{X}$, Md, or M.
1. Median for Ungrouped Data
- Arrange the data or measurements from smallest to largest.
- When the number of data (n) is odd, the value of 0.5(n+1) indicates the position of
the median in the ordered data set.
- If the number of data (n) is even, get the average of the two values found at the
middle of the arranged data.
2. Median for Grouped Data
$\tilde{X} = lcb + \left( \dfrac{\frac{N}{2} - cf}{f_m} \right) i$
where
$\tilde{X}$ - median
lcb - lower class boundary of the median class
N - total frequency
cf - less than cumulative frequency of the class just before the median class (the row
above it when the classes are listed from lowest to highest)
i - size of the class interval
$f_m$ - frequency of the median class
Steps:
1. Construct the <cf column by adding the frequencies successively, starting from the
lowest class interval.
2. Determine the median class. This is the class interval containing one-half of the
total frequency ($\frac{N}{2}$) in the <cf column.
3. Use the formula to find the median.
Characteristics of the Median
1. The median is the most appropriate measure of central tendency for ordinal data.
2. The median lies between the highest and the lowest measurements.
3. There is only one value for the median in a given set of measurements.
4. The median is not influenced by extreme values.
5. The median is used when the middle value is desired. It is the value where 50% or
half of the distribution lies above it and 50% lies below it.
Characteristics of the Mode
1. The mode is the most appropriate measure of central tendency when the data are
nominal in scale.
2. The mode is the least reliable among the three measures of central tendency because
its value is undefined in some distributions.
3. The mode is used when we want to find the value which occurs most often.
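
A short Python sketch of the mean and median formulas for ungrouped and grouped data. The raw values, the frequency table, and the function names are hypothetical; the table is assumed to hold whole-number class limits listed from the lowest class upward, so half a unit is 0.5 and i = (UL - LL) + 1.

```python
def mean(values):
    """Sum of all measurements divided by the number of measurements."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the ordered data (average of the two middle values
    when the number of data is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]              # position 0.5(n + 1), counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2

def grouped_mean(table):
    """Classmark formula; table rows are (LL, UL, f)."""
    total = sum(f for _, _, f in table)
    return sum(f * (ll + ul) / 2 for ll, ul, f in table) / total

def grouped_median(table):
    """lcb + ((N/2 - cf) / f_m) * i, using the <cf of the preceding class."""
    total = sum(f for _, _, f in table)
    width = table[0][1] - table[0][0] + 1      # i = (UL - LL) + 1
    cf_before = 0                              # <cf of the preceding class
    for ll, ul, f in table:
        if cf_before + f >= total / 2:         # median class found
            lcb = ll - 0.5
            return lcb + (total / 2 - cf_before) / f * width
        cf_before += f

raw = [3, 5, 8, 10, 12]                        # hypothetical ungrouped data
table = [(10, 14, 4), (15, 19, 8), (20, 24, 5), (25, 29, 3)]   # hypothetical

print(mean(raw), median(raw))                  # 7.6 8
print(median([3, 5, 8, 10]))                   # 6.5
print(grouped_mean(table))                     # 18.75
print(grouped_median(table))                   # 18.25
```

In the grouped table, N = 20, so the median class is 15-19 (its <cf of 12 is the first to reach N/2 = 10), giving 14.5 + ((10 - 4) / 8)(5) = 18.25.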

Exercise:
On sheets of short bond paper, to be submitted on Monday.
(Group of 3)
1. Find the mean, median and mode of the scores (n = 44).
2. Present the table and the solutions.

C. The MODE

- the value which occurs most often in a set of measurements or values.
1. The Mode for Ungrouped Data
- The value or measurement which occurs the greatest number of times.
- The most popular value.
- Denoted by Mo.
Unimodal - a distribution having only one mode.
Bimodal - a distribution having two modes.
Multimodal - a distribution having more than two modes.
No Mode - if all the scores in a set of data occur only once.
2. The Mode for Grouped Data
$Mo = l_m + \left( \dfrac{f_m - f_a}{2f_m - f_a - f_b} \right) i$
where
Mo - mode
$l_m$ - lower class boundary of the modal class
$f_m$ - frequency of the modal class
$f_a$ - frequency of the class above the modal class
$f_b$ - frequency of the class below the modal class
i - size of the class interval

Steps:
1. Find the modal class. This is the class interval with the highest frequency.
2. Use the formula to find the mode.
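
A minimal Python sketch of the mode for ungrouped and grouped data. The data, the table, and the function names are hypothetical. Two assumptions are made for the grouped formula: the classes are listed from the lowest interval upward, so the class "above" the modal class is the one just before it in the list, and a missing neighbouring class is given a frequency of 0.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest count; [] when all occur once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                      # no mode
    return sorted(v for v, c in counts.items() if c == highest)

def grouped_mode(table):
    """l_m + ((f_m - f_a) / (2 f_m - f_a - f_b)) * i; rows are (LL, UL, f)."""
    width = table[0][1] - table[0][0] + 1                    # i = (UL - LL) + 1
    m = max(range(len(table)), key=lambda k: table[k][2])    # modal class index
    f_m = table[m][2]
    f_a = table[m - 1][2] if m > 0 else 0                    # row above (assumed)
    f_b = table[m + 1][2] if m + 1 < len(table) else 0       # row below (assumed)
    l_m = table[m][0] - 0.5                                  # lower class boundary
    return l_m + (f_m - f_a) / (2 * f_m - f_a - f_b) * width

table = [(10, 14, 4), (15, 19, 8), (20, 24, 5), (25, 29, 3)]   # hypothetical

print(modes([2, 3, 3, 5, 7]))      # [3]      unimodal
print(modes([1, 1, 2, 2, 3]))      # [1, 2]   bimodal
print(modes([1, 2, 3]))            # []       no mode
print(round(grouped_mode(table), 2))
```

For the table shown, the modal class is 15-19, so the mode is 14.5 + (4 / 7)(5), or about 17.36.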
