
DESCRIPTIVE AND INFERENTIAL STATISTICS: PART 1

BY DR. MOHD ALI SAMSUDIN

Preparing Data for


Analysis

Scoring procedures
Tabulation and coding

What does scoring data mean?

Scoring data means that the researcher assigns a numeric score (or value) to each response category for each question on the test/instrument used to collect the data

Categorizing data

The choice of statistical test depends on the type of data being collected
It is important to understand the types of data before the scoring procedure is conducted

Types of categorical and quantifiable


data

Data

Categorical

Nominal

Ordinal

Quantifiable

Interval

Ratio

What is categorical data?

Data which cannot be quantified numerically
BUT
can be placed into sets or categories (nominal data) or ranked in some way (ordinal data)

What is quantifiable data?

Data that can be measured numerically
More precise than categorical data
Consists of interval data and ratio data

Four kinds of measurement


scales

Nominal
Ordinal
Interval
Ratio

Nominal data
A name value or category with no
order or ranking
Example: Type of school
Types of teaching method
Gender
Race

Ordinal data

Comprises an ordering or ranking of values
ALTHOUGH
the intervals between the ranks are not intended to be equal (for example, an attitude questionnaire)

Example

How often have you felt like insulting a student? (Please tick one)
Every day
Once a week
Sometimes
Never

Other examples of ordinal


data

Questions that rate the quality of students' performance (for example, very good, good, fair, poor)
Levels of agreement on attitude towards science (Strongly agree, Agree, Disagree, Strongly disagree)

Interval data

Numerical values are assigned along a scale with equal intervals
There is no true zero point at which the trait being measured does not exist

Number of students scoring within various ranges in IQ test

Scores      Frequency
76-80       1
81-85       0
86-90       4
91-95       10
96-100      21
101-105     25
106-110     48
111-115     18
116-120     11

Other examples of interval


data

Temperature

Blood pressure

Ratio data
Same characteristics as interval data
BUT
There is an absolute zero that is meaningful (zero means none of the quantity is present)
Examples: costs, sales, number of students, number of teachers


Example of scoring data

Students should be given an opportunity to select a school of their choice
Strongly agree      _____
Agree               _____
Disagree            _____
Strongly Disagree   _____

A numeric score (or value) to


each response category

Strongly agree      4
Agree               3
Disagree            2
Strongly Disagree   1

Other example of scoring


data

How often have you felt like insulting a student? (Please tick one)
Every day
Once a week
Sometimes
Never

A numeric score (or value) to


each response category

Every day     4
Once a week   3
Sometimes     2
Never         1
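
The scoring key above can be applied mechanically. The following is a minimal sketch in Python, assuming the responses have been typed in exactly as worded on the instrument; the function name and the five sample answers are hypothetical and only illustrate the idea.

```python
# Scoring key from the slide: each response category gets a numeric score.
FREQUENCY_SCORES = {
    "Every day": 4,
    "Once a week": 3,
    "Sometimes": 2,
    "Never": 1,
}


def score_responses(responses, key):
    """Return the numeric score for each response, looked up in the scoring key."""
    return [key[response] for response in responses]


# Hypothetical answers from five respondents (not taken from the slides).
answers = ["Never", "Sometimes", "Every day", "Never", "Once a week"]
scores = score_responses(answers, FREQUENCY_SCORES)
print(scores)  # [1, 2, 4, 1, 3]
```

A response that is not in the key raises a KeyError, which is a useful early warning of a typing mistake during data entry.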

An example of multiple choice question

The quantity of charge which passes through a circuit is measured in
A. Amps
B. Volts
C. Coulombs *
D. Watts

A numeric score (or value) to


each response category

Correct response: 1 mark; incorrect response: 0 marks
A. Amps       0
B. Volts      0
C. Coulombs   1
D. Watts      0
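
Scoring a multiple choice item against an answer key might look like the sketch below. The item label "Q1" and the list of chosen options are hypothetical; only the correct option (C, Coulombs) comes from the slide.

```python
# Answer key: item label -> correct option. "Q1" is a hypothetical label.
ANSWER_KEY = {"Q1": "C"}  # C = Coulombs


def score_item(item, chosen_option, key=ANSWER_KEY):
    """Return 1 mark for the correct option and 0 marks for any other option."""
    return 1 if key[item] == chosen_option else 0


# Hypothetical options chosen by four students for Q1.
choices = ["A", "C", "D", "C"]
marks = [score_item("Q1", choice) for choice in choices]
print(marks)       # [0, 1, 0, 1]
print(sum(marks))  # 2 students answered correctly
```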

Scoring Procedures for Open


Ended items

Each participant's test should be scored in the same way and with one criterion
Scoring is greatly facilitated if a standardized instrument is used
A scoring key should be provided
Recheck the scoring for consistency
Clean the data

Clean the data

When there are a large number of variables and many individual records, it is easy to enter a wrong figure or to miss an entry
Run a frequency analysis on each column of data to reveal any inconsistent/spurious figures
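
A frequency check on one column can be sketched in a few lines of Python. The column values below are hypothetical, and the set of valid codes (1 and 2) is an assumption made for the illustration.

```python
from collections import Counter

# Hypothetical column of coded values; only the codes 1 and 2 are valid here.
gender_column = [1, 2, 2, 1, 3, 2, 1, 22, 1, None]
VALID_CODES = {1, 2}

# Count how often each value occurs in the column.
frequencies = Counter(gender_column)

# Any value outside the valid codes is a wrong figure or a missed entry.
suspect = {value: count for value, count in frequencies.items()
           if value not in VALID_CODES}

print(frequencies[1], frequencies[2])  # 4 3
print(suspect)                         # {3: 1, 22: 1, None: 1}
```

Each suspect value should then be traced back to the original answer sheet and corrected.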

Scoring Procedures for Open Ended Items

Scoring is more complex if open ended questions are involved
Develop a marking scheme
It is advisable to have at least one other person independently score some of the tests
Try out the marking scheme by administering the tests to a population similar to the one in the actual study

Example of open ended


question

Define population and sample


___________________________________
___________________________________
__________________________
(2 marks)

The marking scheme

Precise and complete definition = 2
Precise but incomplete definition = 1
Incorrect definition = 0

Tabulation and coding

After the tests/instruments have been scored
Transfer the scores to a summary data sheet or computer, for example an SPSS data sheet
Organize the data in SPSS to facilitate examination and analysis of the data

Tabulation and Coding

Tabulation is organizing data

Identifying all information relevant to the


analysis
Separating groups and individuals within groups
Listing data in columns

Coding

Assigning names to variables

EX1 for pretest scores


SEX for gender
EX2 for posttest scores
Objectives 2.1, 2.2, & 2.3

Tabulation and Coding

Coding

Assigning identification numbers to


subjects
Assigning codes to the values of non-numerical or categorical variables

Gender: 1=Female and 2=Male


Subjects: 1=English, 2=Math, 3=Science,
etc.
Names: 001=Ahmad, 002=Rahman,
003=Salleh, 256=Karim
Objectives 2.2 & 2.3

Example

A study investigating the


interaction between two types of
instruction and two levels of ability
(A 2 x 2 factorial design)
Four subgroups are involved

               Method A             Method B
High ability   68, 70, 79 marks     78, 90, 60 marks
Low ability    50, 40, 45 marks     60, 65, 55 marks

Four columns involved

Students id
Types of instruction
Level of ability
Total scores

Students id

1 represents Ahmad
2 represents Bakar
3 represents Malik
4 represents Abu
Etc.

Types of instruction

Two types of instruction, namely the cooperative method and the traditional method
1 represents cooperative method
2 represents traditional method

Level of ability

High and low ability


1 represents high ability
2 represents low ability

Total Scores

Example: 50 items/questions
Correct answer: 1 mark
Incorrect answer: 0 marks
Full mark: 50 marks
Example: If 20 items are answered correctly by Ahmad, he will get 20 marks for his total score
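
The four columns can be laid out as one row per student. The sketch below uses the twelve marks from the 2 x 2 example; the sequential student ids and the mapping of Method A to code 1 and Method B to code 2 are assumptions made for illustration.

```python
# Marks from the 2 x 2 example, keyed by (instruction code, ability code).
# Assumption: Method A is coded 1 and Method B is coded 2.
# Ability is coded 1 = high, 2 = low, as on the slide.
cells = {
    (1, 1): [68, 70, 79],
    (2, 1): [78, 90, 60],
    (1, 2): [50, 40, 45],
    (2, 2): [60, 65, 55],
}

# Build one row per student with the four columns described above.
rows = []
student_id = 1  # hypothetical sequential ids
for (instruction, ability), marks in cells.items():
    for total in marks:
        rows.append({"id": student_id, "instruction": instruction,
                     "ability": ability, "total": total})
        student_id += 1

print("id  instruction  ability  total")
for row in rows:
    print(f"{row['id']:<3} {row['instruction']:<12} {row['ability']:<8} {row['total']}")
```

This long layout, with one column per variable, is the same shape a statistics package such as SPSS expects in its data sheet.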

Another example

A study investigating the effect of


school location on learning
motivation among male and
female students

Four columns involved

Students id
School location
Students gender
Learning motivation

Students id

1 represents Ahmad
2 represents Bakar
3 represents Malik
4 represents Abu
Etc.

School location

Urban or rural
1 represents urban
2 represents rural

Students gender

Male and female students


1 represents male
2 represents female

Learning motivation
5 items
Likert scale
Example: I like to study in order to get good marks in the examination
Strongly agree      4
Agree               3
Disagree            2
Strongly Disagree   1

How to calculate scores for items with Likert scale responses

Add up the responses to all the items for each person to get the total score
Divide the total score by the number of items to get the mean learning motivation score for each student

Item 1 = 4
Item 2 = 3
Item 3 = 4
Item 4 = 2
Item 5 = 1
Total score = 4+3+4+2+1 = 14
How many items? 5 items
Mean score of learning motivation = 14/5 = 2.8
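
The same calculation, written as a short Python check using the five item scores listed above:

```python
# Item scores for one student, as listed above (items 1 to 5).
item_scores = [4, 3, 4, 2, 1]

total_score = sum(item_scores)                # 4 + 3 + 4 + 2 + 1
mean_score = total_score / len(item_scores)   # total divided by number of items

print(total_score)  # 14
print(mean_score)   # 2.8
```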

After you have prepared for


data analysis, how do you
analyse the data?

How to analyse the data

Descriptive statistics
Inferential statistics

Descriptive statistics
Describe trends in the data for a single variable on your instrument
Example:
What is the learning motivation of
secondary school students?

Descriptive statistics

What is the learning motivation of


secondary school students?
In order to answer that, we need
descriptive statistics that indicate
general tendencies in data, the
spread of scores, or relative
position

Central Tendency

Purpose: to represent the typical score attained by subjects
Three common measures

Mode
Median
Mean

Objective 4.1

Spread of scores
(variability)

Purpose: to measure the extent to which scores are spread apart
Four measures

Range
Quartile deviation
Variance
Standard deviation
Objective 5.1
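
The three measures of central tendency and the four measures of spread can be computed with Python's standard statistics module. This is a minimal sketch on a hypothetical set of eight scores. It takes the quartile deviation as half the distance between the first and third quartiles, and the exact quartile values depend on the interpolation method the software uses.

```python
import statistics

# Hypothetical test scores for eight students.
scores = [2, 3, 3, 3, 4, 5, 6, 7]

# Central tendency
mode = statistics.mode(scores)      # most frequent score
median = statistics.median(scores)  # middle score
mean = statistics.mean(scores)      # arithmetic average

# Spread (variability)
score_range = max(scores) - min(scores)
q1, _, q3 = statistics.quantiles(scores, n=4)  # quartile cut points
quartile_deviation = (q3 - q1) / 2             # half the interquartile range
variance = statistics.variance(scores)         # sample variance (n - 1)
std_dev = statistics.stdev(scores)             # square root of the variance

print(mode, median, mean)  # 3 3.5 4.125
print(score_range)         # 5
print(quartile_deviation, variance, std_dev)
```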

The normal curve

The Normal Curve

If a sufficient number of subjects are measured, a variable will possibly yield a normal, bell-shaped curve
If a variable is normally distributed, then several things are true

50% of the scores are above


the mean and 50% of the
scores are below the mean

The mean, median and the


mode are the same

The Normal Curve

Third, most scores are near the mean, and the further from the mean a score is, the fewer the number of subjects who attained the score

The Normal Curve

[Figure: bell-shaped curve. Most scores fall in the middle, near the mean; fewer subjects attained the scores toward either end]


The Normal Curve

Fourth, the same number, or percentage, of scores is between the mean and plus one standard deviation (mean + 1 SD) as is between the mean and minus one standard deviation (mean - 1 SD), and similarly for mean ± 2 SD and mean ± 3 SD

If scores are normally


distributed

Mean ± 1.0 SD = approximately 68% of the scores
Mean ± 2.0 SD = approximately 95% of the scores
Mean ± 3.0 SD = approximately 99.7% of the scores
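
These percentages can be checked against the standard normal distribution. The sketch below uses NormalDist from Python's statistics module; the helper name share_within is hypothetical.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of scores lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
# within 1 SD of the mean: 68.3%
# within 2 SD of the mean: 95.4%
# within 3 SD of the mean: 99.7%
```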

Skewed Distributions

Research data usually more or less


approximate a normal curve
When a distribution is not normal, it is
said to be skewed, and the values of the
mean, the median and the mode are
different
In a skewed distribution, there are more
extreme scores at one end than the
other

Skewed Distributions

If the extreme scores are at the lower end of the distribution, the distribution is said to be negatively skewed
If the extreme scores are at the upper, or higher, end of the distribution, the distribution is said to be positively skewed
The mean is pulled in the direction of the extreme scores

Which one is positively skewed and which one is negatively skewed?

Skewed Distributions

For a negatively skewed distribution, the mean is generally lower, or smaller, than the median
For a positively skewed distribution, the mean is generally higher, or greater, than the median


Assessing normality using


SPSS

Click on Analyze
Click on Descriptive Statistics, then Explore
Click the variable(s) you are interested in
Click the arrow button to move them into the Dependent List
Click on the Plots button

Under Descriptive, click Histogram
Click on Normality plots with tests
Click on Continue
Click OK

Interpretation of output from


explore

Skewness and kurtosis values


Test of Normality (Kolmogorov-Smirnov statistic)
Histogram
Normal Probability plots (Normal
Q-Q Plots)

Skewness and kurtosis


values

Skewness and kurtosis values


provide information about the
distribution of scores

Kurtosis

A measure of the peakedness or the flatness of a distribution
A kurtosis value near zero (0) indicates a shape close to normal
A negative value of kurtosis indicates a shape flatter than normal
A positive value of kurtosis indicates a shape more peaked than normal
A kurtosis value between -1.0 and +1.0 is considered excellent, but a value between -2.0 and +2.0 is considered acceptable

Kurtosis

Skewness

Measures the extent to which a distribution of values deviates from symmetry around the mean
A value of zero represents a symmetric or evenly balanced distribution
A positive skewness indicates a greater number of smaller values
A negative skewness indicates a greater number of larger values
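
To make the two statistics concrete, the sketch below computes skewness and kurtosis from their simple moment definitions, with kurtosis reported as excess kurtosis so that a normal shape scores zero. The sample data are hypothetical. SPSS applies small-sample corrections, so its output will differ slightly from these uncorrected values.

```python
import statistics


def central_moment(values, order):
    """Average of (value - mean) raised to the given power."""
    mean = statistics.mean(values)
    return sum((value - mean) ** order for value in values) / len(values)


def skewness(values):
    """Third standardized moment: 0 for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3: 0 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]            # hypothetical, evenly balanced
right_tailed = [1, 1, 1, 2, 2, 3, 10]  # hypothetical, one extreme high score

print(skewness(symmetric))         # 0.0 (symmetric)
print(skewness(right_tailed) > 0)  # True: many small values, positive skew
print(excess_kurtosis(symmetric))  # about -1.3 (flatter than normal)
```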

Skewness

Test of Normality (Kolmogorov-Smirnov statistic)

The Test of Normality (the Kolmogorov-Smirnov statistic) assesses the normality of the distribution of scores
A non-significant result (significance value of more than 0.05) indicates normality
A significant result (significance value of 0.05 or less) suggests violation of the assumption of normality

The actual shape of the distribution can be seen in the histogram
In order to support the claim that the data are normally distributed, refer to the normal Q-Q plot
Normal Q-Q plot: the observed value for each score is plotted against the expected value from the normal distribution
A reasonably straight line suggests a normal distribution

Histogram and Normal Q-Q


Plots

Graphic representation

Bar chart
Histogram
Pie chart

Inferential statistics

What is the purpose of


inferential statistics?

To compare two or more groups on the independent variable in terms of the dependent variable (for example: Is there a significant difference between boys and girls on self esteem?)
Independent variable: gender (boys and girls)
Dependent variable: self esteem

Inferential statistics involves


hypothesis testing

Null hypothesis: There is no significant difference between boys and girls on self esteem
Alternative hypothesis: There is a significant difference between boys and girls on self esteem

Another purpose of inferential statistics

Relate two or more variables (for


example: Does self esteem relate
to academic achievement?)
Null hypothesis: There is no
significant relationship between self
esteem and academic achievement
Alternative hypothesis: There is a
significant relationship between self
esteem and academic achievement

Important Perspectives

Inferential statistics

Allow researchers to generalize to a


population of individuals based on
information obtained from a sample
of those individuals
Assess whether the results obtained
from a sample are the same as those
that would have been calculated for
the entire population

Types of Inferential
Statistics

Two issues discussed

Steps involved in testing for


significance
Types of tests

Steps in Statistical Testing

State the null and alternative


hypotheses
Set alpha level
Identify and compute the test
statistic
Compare the computed test
statistic to the criteria for
significance
Objectives 20.1 20.9

Alpha Level

An established probability level which serves as the criterion to determine whether to reject or fail to reject the null hypothesis
Common levels in education

.01
.05 (the most common)
.10

Reject the null hypothesis


If the probability value (p) is less than or equal to the significance level,
then reject the null hypothesis, and
conclude that the research finding is statistically significant

Objective 20.9

Fail to reject the null


hypothesis
If the probability value (p) is greater than the significance level,
then fail to reject the null hypothesis, and
conclude that the research finding is not statistically significant
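
The decision rule can be written as a small function. This is a sketch only; the function name decide and the default alpha of 0.05 are choices made for the illustration.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject the null hypothesis when p <= alpha."""
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


print(decide(0.01))  # reject the null hypothesis
print(decide(0.11))  # fail to reject the null hypothesis
print(decide(0.05))  # reject the null hypothesis (p equal to alpha)
```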

Inferential Statistics

T-Test

Determine whether two means are


significantly different at a selected
probability level

Independent Samples T-Test

Determine whether there is probably a significant difference between the means of two independent samples

Independent samples

Two samples that are randomly


formed without any type of
matching
The members of one sample are
not related to members of the
other sample in any systematic
way other than they are selected
from the same population

Example
Group 1 Test Scores    Group 2 Test Scores
3                      2
4                      3
5                      3
6                      3
7                      4

Are these two sets of scores significantly


different? They are different, but are they
significantly different?
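
One way to answer the question is to compute the t statistic by hand. The sketch below uses the pooled-variance (equal variances assumed) form of the independent samples t-test on the two sets of scores above. The critical value 2.306 is the standard two-tailed table value for df = 8 at alpha = .05. In practice SPSS reports the exact probability instead.

```python
import math
import statistics

group_1 = [3, 4, 5, 6, 7]
group_2 = [2, 3, 3, 3, 4]


def independent_t(sample_a, sample_b):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1)
    var_b = statistics.variance(sample_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_error
    return t, n_a + n_b - 2


t_value, df = independent_t(group_1, group_2)
CRITICAL_T_DF8 = 2.306  # two-tailed, alpha = .05, df = 8 (t table)

print(round(t_value, 3), df)          # 2.582 8
print(abs(t_value) > CRITICAL_T_DF8)  # True: significant at the .05 level
```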

Presenting the results for


independent samples t-test

An independent samples t-test was conducted to compare the achievement test scores for boys and girls. There was no significant difference in scores for boys (M = 34.02, SD = 4.91) and girls (M = 33.17, SD = 5.71); t(434) = 1.62, p = 0.11.

Non-independent samples t-test
or
Paired samples t-test

Non-independent samples t-test

When samples are not independent, the members of one group are systematically related to the members of a second group
The most familiar example is when the same group takes the test at two different times
In SPSS, it is known as the Paired Samples T-Test

Example
Group 1 Test Scores    Group 1 Test Scores
(Time 1)               (Time 2)
2                      3
3                      4
3                      5
3                      6
4                      7

Do the test scores of Group 1 improve after they have taken the test for the second time? If yes, do the test scores of Group 1 improve significantly after the second time?
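
A paired samples t statistic works on the difference between each student's two scores. The sketch below assumes the scores are listed in the same student order at both times, which the slide does not state explicitly. The critical value 2.776 is the standard two-tailed table value for df = 4 at alpha = .05.

```python
import math
import statistics

# Assumption: the i-th score at Time 1 and at Time 2 belong to the same student.
time_1 = [2, 3, 3, 3, 4]
time_2 = [3, 4, 5, 6, 7]

# Gain for each student from Time 1 to Time 2.
differences = [after - before for before, after in zip(time_1, time_2)]

n = len(differences)
mean_diff = statistics.mean(differences)  # average gain
sd_diff = statistics.stdev(differences)   # sample SD of the gains
t_value = mean_diff / (sd_diff / math.sqrt(n))
df = n - 1

CRITICAL_T_DF4 = 2.776  # two-tailed, alpha = .05, df = 4 (t table)

print(differences)                    # [1, 1, 2, 3, 3]
print(round(t_value, 3), df)          # 4.472 4
print(abs(t_value) > CRITICAL_T_DF4)  # True: significant at the .05 level
```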

Presenting the results for


paired samples t-test

A paired samples t-test was conducted to evaluate the impact of the intervention on students' achievement scores. There was a statistically significant decrease in achievement scores from Time 1 (M = 40.17, SD = 5.16) to Time 2 (M = 37.5, SD = 5.15), t(29) = 5.39, p < 0.005.

One Way Analysis of Variance


(One Way ANOVA)

To determine whether there is a significant difference among more than two means at a selected probability level

Example
Group 1 Test Scores    Group 2 Test Scores    Group 3 Test Scores
1                      2                      4
2                      3                      4
2                      4                      4
2                      5                      5
3                      6                      7

Are these three sets of scores significantly


different? They are different, but are they
significantly different?
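
The F ratio for these three groups can be computed directly from the between-group and within-group sums of squares. This is a minimal hand calculation, not the SPSS procedure. The critical value 3.89 is the standard table value for F(2, 12) at alpha = .05.

```python
import statistics

groups = {
    "Group 1": [1, 2, 2, 2, 3],
    "Group 2": [2, 3, 4, 5, 6],
    "Group 3": [4, 4, 4, 5, 7],
}


def one_way_anova(samples):
    """Return the F ratio with its between-group and within-group degrees of freedom."""
    all_scores = [score for sample in samples for score in sample]
    grand_mean = statistics.mean(all_scores)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(sample) * (statistics.mean(sample) - grand_mean) ** 2
                     for sample in samples)
    # Variation of the scores around their own group mean.
    ss_within = sum((score - statistics.mean(sample)) ** 2
                    for sample in samples for score in sample)
    df_between = len(samples) - 1
    df_within = len(all_scores) - len(samples)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


f_ratio, df_between, df_within = one_way_anova(list(groups.values()))
CRITICAL_F_2_12 = 3.89  # alpha = .05, df = (2, 12) (F table)

print(round(f_ratio, 3), df_between, df_within)  # 6.638 2 12
print(f_ratio > CRITICAL_F_2_12)                 # True: the means differ significantly
```

A significant F ratio only says that at least one mean differs from the others. It does not say which ones.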

Multiple comparison

If the F ratio is determined to be nonsignificant, the party is over
But what if it is significant?
Multiple comparisons are used to determine which means are significantly different from other means

Example
Group 1 Test Scores    Group 2 Test Scores    Group 3 Test Scores
1                      2                      4
2                      3                      4
2                      4                      4
2                      5                      5
3                      6                      7

ANOVA results show that there is a significant difference between the means of the three groups

The use of Multiple


Comparison

A multiple comparison procedure is used to determine whether the mean of:
- group 1 differs from group 2, OR
- group 1 differs from group 3, OR
- group 2 differs from group 3

Examples of multiple comparison techniques

Tukey Test
Scheffe Test
Duncan Test
Bonferroni Test
HSD Test

Presenting the results from one way ANOVA with post hoc test

A one way between groups analysis of variance was conducted to explore the difference in achievement scores between three groups (Group 1, Group 2, Group 3). There was a statistically significant difference at the p < 0.05 level in achievement scores for the three groups [F(2, 432) = 4.6, p = 0.01].
continue..

Presenting the results from one way ANOVA with post hoc test

Post-hoc comparisons using the Tukey test indicated that the mean score for Group 1 (M = 21.36, SD = 4.55) was significantly different from Group 3 (M = 22.96, SD = 4.49). Group 2 (M = 22.10, SD = 4.15) did not differ significantly from either Group 1 or 3.

Two Way ANOVA

Analysis of data from a study that involves a factorial design
What is a factorial design?

Factorial design

When two or more independent variables are involved in a study

Example
               Method A    Method B
High ability
Low ability

2 X 2 Factorial Design

Two Way ANOVA

Determine the main effect of method on achievement (determine whether there is a significant difference between the mean scores of Method A and Method B)

Two Way ANOVA

Determine the main effect of ability on achievement (determine whether there is a significant difference between the mean scores of high and low ability)

Interaction effect

Is there a significant interaction


effect between method and ability
on achievement?

How do we know whether there is an interaction effect between method (Method A and Method B) and students' ability (high and low)?
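
A first, purely descriptive look at interaction is to compare the cell means: if the difference between the methods is not the same at each ability level, the data hint at an interaction. The sketch below reuses the marks from the 2 x 2 tabulation example earlier in these slides. It does not test significance, which is what the two way ANOVA is for.

```python
import statistics

# Marks from the earlier 2 x 2 example (three students per cell).
cell_scores = {
    ("Method A", "High ability"): [68, 70, 79],
    ("Method B", "High ability"): [78, 90, 60],
    ("Method A", "Low ability"): [50, 40, 45],
    ("Method B", "Low ability"): [60, 65, 55],
}

cell_means = {cell: statistics.mean(marks) for cell, marks in cell_scores.items()}

# Effect of method (B minus A) within each ability level.
method_effect_high = (cell_means[("Method B", "High ability")]
                      - cell_means[("Method A", "High ability")])
method_effect_low = (cell_means[("Method B", "Low ability")]
                     - cell_means[("Method A", "Low ability")])

# If the two effects were equal there would be no sign of interaction.
interaction_contrast = method_effect_high - method_effect_low

print(round(method_effect_high, 2))    # 3.67
print(round(method_effect_low, 2))     # 15
print(round(interaction_contrast, 2))  # -11.33
```

Here Method B helps low ability students much more than high ability students, so the lines in a plot of the cell means would not be parallel.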

Multiple Regression

More advanced than correlation and linear regression
Correlation: the relationship between two variables (Ex: relationship between attitude towards learning and academic achievement)
Linear regression: the relationship between a predictor variable and a dependent variable (Ex: Can attitude towards learning predict academic achievement of students?)
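
Correlation and simple linear regression can both be built from the same sums of squares. The sketch below uses hypothetical attitude and achievement scores for five students; the variable and function names are illustrative only.

```python
import math

# Hypothetical scores for five students (not from the slides).
attitude = [1, 2, 3, 4, 5]     # predictor variable
achievement = [2, 4, 5, 4, 5]  # dependent variable


def sums_of_squares(x, y):
    """Return Sxx, Syy and Sxy, the building blocks of r and of the slope."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return s_xx, s_yy, s_xy


s_xx, s_yy, s_xy = sums_of_squares(attitude, achievement)

# Correlation: strength and direction of the relationship.
r = s_xy / math.sqrt(s_xx * s_yy)

# Linear regression: predicted achievement = intercept + slope * attitude.
slope = s_xy / s_xx
intercept = sum(achievement) / len(achievement) - slope * sum(attitude) / len(attitude)

print(round(r, 3))                            # 0.775
print(round(slope, 2), round(intercept, 2))   # 0.6 2.2
```

Multiple regression extends the same idea to two or more predictors, which is normally left to a statistics package.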

Multiple Regression

Multiple regression- a combination of


two or more variables to predict a
dependent variable
(Ex: Can attitude towards learning
and thinking ability predict academic
achievement of students?)
