
Descriptive statistics and inferential statistics

Preparing Data for Analysis


Scoring procedures Tabulation and coding

What does scoring data mean?

Scoring data means that the researcher assigns a numeric score (or value) to each response category for each question on the test or instrument used to collect the data.

Categorizing data

The choice of statistical test depends on the type of data collected. It is important to understand the types of data before the scoring procedure is carried out.

Types of categorical and quantifiable data

Data: Categorical (Nominal, Ordinal) and Quantifiable (Interval, Ratio)

What is categorical data?

Data that cannot be quantified numerically but can be placed into sets or categories (nominal data) or ranked in some way (ordinal data).

What is quantifiable data?

Data that can be measured numerically. It is more precise than categorical data. It consists of interval data and ratio data.

Four kinds of measurement scales


Nominal, Ordinal, Interval, Ratio

Nominal data
A name, value or category with no order or ranking. Examples: type of school, type of teaching method, gender, race.

Ordinal data

Comprises an ordering or ranking of values, although the intervals between the ranks are not assumed to be equal (for example, an attitude questionnaire).

Example
How often have you felt like insulting a student? (Please tick one) Every day / Once a week / Sometimes / Never

Other examples of ordinal data

Questions that rate the quality of students' performance (for example: very good, good, fair, poor). Levels of agreement in attitudes towards science (Strongly agree, Agree, Disagree, Strongly disagree).

Interval data

Numerical values are assigned along an interval scale with equal intervals. There is no true zero point, that is, no point at which the trait being measured does not exist.

Number of students scoring within various ranges in an IQ test

Scores      Frequency
76-80       1
81-85       0
86-90       4
91-95       10
96-100      21
101-105     25
106-110     48
111-115     18
116-120     11

Other examples of interval data

Temperature

Blood pressure

Ratio data
Has the same characteristics as interval data, but there is an absolute zero that is meaningful. Examples: costs, sales, number of students, number of teachers.


Example of the scoring data


Students should be given an opportunity to select a school of their choice.
Strongly agree _____ Agree _____ Disagree _____ Strongly disagree _____

A numeric score (or value) is assigned to each response category

Strongly agree      4
Agree               3
Disagree            2
Strongly disagree   1
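The scoring key above can be sketched in code as a simple lookup. This is only an illustration: the dictionary name, the function name and the three sample responses are assumptions, not part of the original material.

```python
# Score key from the example: each response category gets a numeric value.
LIKERT_SCORES = {
    "Strongly agree": 4,
    "Agree": 3,
    "Disagree": 2,
    "Strongly disagree": 1,
}


def score_response(response: str) -> int:
    """Return the numeric score for one ticked response category."""
    return LIKERT_SCORES[response]


# Hypothetical responses from three participants to the same item.
responses = ["Agree", "Strongly agree", "Disagree"]
scores = [score_response(r) for r in responses]
print(scores)  # [3, 4, 2]
```

Keeping the key in one place means every participant is scored against the same criterion.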

Another example of scoring data

How often have you felt like insulting a student? (Please tick one) Every day / Once a week / Sometimes / Never

A numeric score (or value) is assigned to each response category

Every day      4
Once a week    3
Sometimes      2
Never          1

An example of a multiple choice question

The quantity of charge which passes through a circuit is measured in
A. Amps
B. Volts
C. Coulombs *
D. Watts

A numeric score (or value) is assigned to each response category

Correct response: 1 mark. Incorrect response: 0 marks.
A. Amps       0
B. Volts      0
C. Coulombs   1
D. Watts      0
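A minimal sketch of scoring a multiple choice item against an answer key. The question label "Q1", the function name and the two students' answers are hypothetical; only the correct option (C, Coulombs) and the 1/0 marks come from the example above.

```python
# Answer key: question id -> correct option letter.
ANSWER_KEY = {"Q1": "C"}  # Q1 is the charge question; C = Coulombs


def score_item(question_id: str, chosen_option: str) -> int:
    """Return 1 mark for the correct option and 0 marks otherwise."""
    return 1 if ANSWER_KEY[question_id] == chosen_option else 0


# Hypothetical answers from two students.
answers = {"Ahmad": "C", "Bakar": "A"}
marks = {name: score_item("Q1", option) for name, option in answers.items()}
print(marks)  # {'Ahmad': 1, 'Bakar': 0}
```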

Scoring Procedures for Open Ended items

Each participant's test should be scored in the same way and against one criterion. Scoring is greatly facilitated if a standardized instrument is used. A scoring key should be provided. Recheck the scoring for consistency. Clean the data.

Clean the data

When there are a large number of variables and many individual records, it is easy to enter a wrong figure or to miss an entry. Run a frequency analysis on each column of data to reveal any inconsistent or spurious figures.
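A frequency analysis of one column can be sketched as follows. The column contents and the set of valid codes are made-up values for illustration; in SPSS the same check would be done with a frequency table.

```python
from collections import Counter

# Assumed coding scheme for this column: only codes 1 and 2 are valid.
VALID_CODES = {1, 2}

# Hypothetical column with one mistyped entry (11) and one missing entry (None).
column = [1, 2, 2, 1, 11, 2, None, 1]

# Count how often each value occurs in the column.
frequencies = Counter(column)

# Any value outside the valid codes is flagged for rechecking.
suspect = {value: count for value, count in frequencies.items()
           if value not in VALID_CODES}

print(frequencies[1], frequencies[2])  # 3 3
print(suspect)  # {11: 1, None: 1}
```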

Scoring Procedures for

Scoring is more complex if open-ended questions are involved. Develop a marking scheme. It is advisable to have at least one other person independently score some of the tests. Try out the marking scheme by administering the test to a population similar to the one in the actual study.

Example of open ended question

Define population and sample. (2 marks)
________________________________
________________________________
________________________________

The marking scheme

Precise and complete definition = 2
Precise but incomplete definition = 1
Incorrect definition = 0

Tabulation and coding


After the tests or instruments have been scored, the scores are transferred to a summary data sheet or a computer, for example an SPSS data sheet. Organize the data in SPSS to facilitate examination and analysis of the data.

Tabulation and Coding

Tabulation is organizing data


Identifying all information relevant to the analysis. Separating groups and individuals within groups. Listing data in columns. Assigning names to variables.

Coding

EX1 for pretest scores, SEX for gender, EX2 for posttest scores
Objectives 2.1, 2.2, & 2.3

Tabulation and Coding

Coding

Assigning identification numbers to subjects. Assigning codes to the values of nonnumerical or categorical variables.

Gender: 1 = Female, 2 = Male
Subjects: 1 = English, 2 = Math, 3 = Science, etc.
Names: 001 = Ahmad, 002 = Rahman, 003 = Salleh, 256 = Karim
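As a rough sketch, coding can be expressed as a lookup from category labels to numbers. The gender and subject codes follow the example above; the raw records (including the student "Siti") and the column names ID and SUBJECT are hypothetical.

```python
# Codes for categorical values, as in the example above.
GENDER_CODES = {"Female": 1, "Male": 2}
SUBJECT_CODES = {"English": 1, "Math": 2, "Science": 3}

# Hypothetical raw records: (name, gender, subject).
raw_records = [
    ("Ahmad", "Male", "Math"),
    ("Siti", "Female", "Science"),
]

# Assign a three-digit identification number and replace labels with codes.
coded = [
    {"ID": f"{i:03d}", "SEX": GENDER_CODES[gender], "SUBJECT": SUBJECT_CODES[subject]}
    for i, (name, gender, subject) in enumerate(raw_records, start=1)
]
print(coded[0])  # {'ID': '001', 'SEX': 2, 'SUBJECT': 2}
```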
Objectives 2.2 & 2.3

Example

A study investigating the interaction between two types of instruction and two levels of ability (a 2 x 2 factorial design). Four subgroups are involved.

Method A

Method B

High ability

68 marks 70 marks 79 marks 50 marks 40 marks 45 marks

78 marks 90 marks 60 marks 60 marks 65 marks 55 marks

Low ability

Four columns involved

Student ID, Type of instruction, Level of ability, Total score

Student ID

1 represents Ahmad, 2 represents Bakar, 3 represents Malik, 4 represents Abu, etc.

Types of instruction

Two types of instruction, namely the cooperative method and the traditional method. 1 represents the cooperative method. 2 represents the traditional method.

Level of ability

High and low ability. 1 represents high ability. 2 represents low ability.

Total Scores

Example: 50 items (questions). Correct answer: 1 mark. Incorrect answer: 0 marks. Full marks: 50. If Ahmad answers 20 items correctly, his total score is 20 marks.

Another example

A study investigating the effect of school location on learning motivation among male and female students

Four columns involved


Student ID, School location, Student gender, Learning motivation

Student ID

1 represents Ahmad, 2 represents Bakar, 3 represents Malik, 4 represents Abu, etc.

School location

Urban or rural. 1 represents urban. 2 represents rural.

Student gender

Male and female students. 1 represents male. 2 represents female.

Learning motivation
A five-item Likert scale. Example item: I like to study in order to get good marks in the examination. Strongly agree = 4, Agree = 3, Disagree = 2, Strongly disagree = 1.

How to calculate scores for items with Likert scale responses

Add up the item responses for each person to get the total score. Divide the total score by the number of items to get the mean learning motivation score for each student.

Item 1 = 4, Item 2 = 3, Item 3 = 4, Item 4 = 2, Item 5 = 1. Total score = 4 + 3 + 4 + 2 + 1 = 14. Number of items = 5. Mean score of learning motivation = 14/5 = 2.8.
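The calculation for one student can be checked with a few lines of code. The item scores are the five values from the example; the variable names are illustrative only. The mean works out to 14 / 5 = 2.8.

```python
# Item responses for one student on the five-item learning motivation scale.
item_scores = [4, 3, 4, 2, 1]

total_score = sum(item_scores)               # 4 + 3 + 4 + 2 + 1
mean_score = total_score / len(item_scores)  # total divided by number of items

print(total_score)  # 14
print(mean_score)   # 2.8
```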

After you have prepared for data analysis, how do you analyse the data?

How to analyse the data


Descriptive statistics, Inferential statistics

Descriptive statistics
Describe trends in the data for a single variable on your instrument. Example: What is the learning motivation of secondary school students?

Descriptive statistics
What is the learning motivation of secondary school students? In order to answer that, we need descriptive statistics that indicate general tendencies in data, the spread of scores, or relative position

Central Tendency

Purpose: to represent the typical score attained by subjects. Three common measures:

Mode, Median, Mean
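The three measures can be computed with Python's standard `statistics` module. The score list below is hypothetical and chosen so that the mean differs from the mode and median.

```python
from statistics import mean, median, mode

# Hypothetical test scores for five subjects.
scores = [1, 2, 2, 3, 7]

print(mode(scores))    # 2, the most frequent score
print(median(scores))  # 2, the middle score when sorted
print(mean(scores))    # 3, the sum of scores divided by the number of scores
```

The single high score (7) pulls the mean above the median, which is the pattern described later for positively skewed distributions.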

Objective 4.1

Spread of scores (variability)

Purpose: to measure the extent to which scores are spread apart. Four measures:

Range, Quartile deviation, Variance, Standard deviation
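A sketch of the four measures using the standard `statistics` module and hypothetical scores. Note two assumptions: the sample (n - 1) formulas are used for variance and standard deviation, and the quartiles use the module's default method, so other software may report slightly different quartile values.

```python
from statistics import quantiles, stdev, variance

# Hypothetical test scores for five subjects.
scores = [3, 4, 5, 6, 7]

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)

# Quartile deviation: half the distance between the first and third quartiles.
q1, q2, q3 = quantiles(scores, n=4)
quartile_deviation = (q3 - q1) / 2

# Sample variance and standard deviation (divide by n - 1).
sample_variance = variance(scores)
sample_sd = stdev(scores)

print(score_range)          # 4
print(quartile_deviation)   # 1.5
print(sample_variance)      # 2.5
print(round(sample_sd, 2))  # 1.58
```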


Objective 5.1

The normal curve

The Normal Curve

If a sufficient number of subjects are measured, the scores on a variable may yield a normal, bell-shaped curve. If a variable is normally distributed, then several things are true.

First, 50% of the scores are above the mean and 50% of the scores are below the mean.

Second, the mean, the median and the mode are the same.

Third, most scores are near the mean, and the further a score is from the mean, the fewer the subjects who attained it.

[Figure: the normal curve. Most scores fall in the middle, near the mean; fewer subjects attained the scores toward either tail.]

Fourth, the same number, or percentage, of scores lies between the mean and plus one standard deviation (mean + 1 SD) as between the mean and minus one standard deviation (mean - 1 SD), and similarly for mean ± 2 SD and mean ± 3 SD.

If scores are normally distributed

Mean ± 1.0 SD = approximately 68% of the scores
Mean ± 2.0 SD = approximately 95% of the scores
Mean ± 3.0 SD = approximately 99.7% of the scores
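These percentages can be verified from the standard normal distribution. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `share_within` is an illustrative choice.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k: float) -> float:
    """Proportion of scores lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```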

Skewed Distributions

Research data usually approximate a normal curve more or less closely. When a distribution is not symmetrical, it is said to be skewed, and the values of the mean, the median and the mode are different. In a skewed distribution, there are more extreme scores at one end than at the other.

Skewed Distributions

If the extreme scores are at the lower end of the distribution, the distribution is said to be negatively skewed. If the extreme scores are at the upper, or higher, end of the distribution, the distribution is said to be positively skewed. The mean is pulled in the direction of the extreme scores.

Which one is positively skewed and negatively skewed?

Skewed Distributions

For a negatively skewed distribution, the mean is typically lower, or smaller, than the median. For a positively skewed distribution, the mean is typically higher, or greater, than the median.

Assessing normality using SPSS


Click on Analyze.
Click on Descriptive Statistics, then Explore.
Click the variable(s) you are interested in.
Click the arrow button to move them into the Dependent List.
Click on the Plots button.
Under Descriptive, click Histogram.
Click on Normality Plots with Tests.
Click on Continue.
Click OK.

Interpretation of output from explore


Skewness and kurtosis values. Test of Normality (Kolmogorov-Smirnov statistic). Histogram. Normal Probability Plots (Normal Q-Q Plots).

Skewness and kurtosis values

Skewness and kurtosis values provide information about the distribution of scores

Kurtosis

A measure of the peakedness or flatness of a distribution. A kurtosis value near zero (0) indicates a shape close to normal. A negative value of kurtosis indicates a shape flatter than normal. A positive value of kurtosis indicates a shape more peaked than normal. A kurtosis value between -1.0 and +1.0 is considered excellent, but a value between -2.0 and +2.0 is considered acceptable.


Skewness

Measures the extent to which the values of a distribution deviate from symmetry around the mean. A value of zero represents a symmetric, or evenly balanced, distribution. A positive skewness indicates a greater number of smaller values. A negative skewness indicates a greater number of larger values.
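To make the sign conventions concrete, here is a minimal sketch that computes skewness and excess kurtosis from the central moments of the data. This is the simple moment-based formula, stated here as an assumption; SPSS reports bias-adjusted sample versions, so its values will differ somewhat for small samples. The data sets and function names are hypothetical.

```python
def central_moment(data, order):
    """Average of (x - mean) raised to the given power."""
    m = sum(data) / len(data)
    return sum((x - m) ** order for x in data) / len(data)


def skewness(data):
    """Moment-based skewness: 0 for a symmetric distribution."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal shape is near 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]       # evenly balanced around the mean
right_tailed = [1, 1, 2, 2, 10]   # many small values, one large value

print(skewness(symmetric))            # 0.0
print(skewness(right_tailed) > 0)     # True: positive skewness
print(round(excess_kurtosis(symmetric), 2))  # -1.3: flatter than normal
```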


Test of Normality (Kolmogorov-Smirnov statistic)

The Kolmogorov-Smirnov statistic assesses the normality of the distribution of scores. A non-significant result (significance value of more than 0.05) indicates normality. A significant result (significance value of 0.05 or less) suggests violation of the assumption of normality.

Histogram and Normal Q-Q Plots

The actual shape of the distribution can be seen in the histogram. To support the claim that the data are normally distributed, refer to the normal Q-Q plot. In a normal Q-Q plot, the observed value for each score is plotted against the expected value from the normal distribution. A reasonably straight line suggests a normal distribution.

Graphic representation

Bar chart, Histogram, Pie chart

Inferential statistics

What is the purpose of inferential statistics?

To compare two or more groups, defined by the independent variable, in terms of the dependent variable (for example: Is there a significant difference between boys and girls on self-esteem?). Independent variable: gender (boys and girls). Dependent variable: self-esteem.

Inferential statistics involves hypothesis testing

Null hypothesis: There is no significant difference between boys and girls on self-esteem. Alternative hypothesis: There is a significant difference between boys and girls on self-esteem.

Another purpose of inferential statistics

To relate two or more variables (for example: Does self-esteem relate to academic achievement?). Null hypothesis: There is no significant relationship between self-esteem and academic achievement. Alternative hypothesis: There is a significant relationship between self-esteem and academic achievement.

Important Perspectives

Inferential statistics

Allow researchers to generalize to a population of individuals based on information obtained from a sample of those individuals. Assess whether the results obtained from a sample are the same as those that would have been calculated for the entire population.

Types of Inferential Statistics

Two issues discussed


Steps involved in testing for significance. Types of tests.

Steps in Statistical Testing

State the null and alternative hypotheses. Set the alpha level. Identify and compute the test statistic. Compare the computed test statistic to the criteria for significance.

Objectives 20.1 20.9

Alpha Level

An established probability level which serves as the criterion for deciding whether to reject or fail to reject the null hypothesis. Common levels in education:

.01, .05 (the most common), .10

Reject the null hypothesis

If the probability value is less than or equal to the significance level, then reject the null hypothesis and conclude that the research finding is statistically significant.
Objective 20.9

Fail to reject the null hypothesis

If the probability value is greater than the significance level, then fail to reject the null hypothesis and conclude that the research finding is not statistically significant.
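The decision rule can be written as a small function. The function name and the two sample probability values are illustrative; the default alpha of .05 is the most common level mentioned above.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 when p is at or below alpha."""
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


print(decide(0.01))  # reject the null hypothesis
print(decide(0.11))  # fail to reject the null hypothesis
```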

Inferential Statistics

T-Test

Determine whether two means are significantly different at a selected probability level

Independent Samples T-Test

Determine whether there is probably a significant difference between the means of two independent samples

Independent samples

Two samples that are randomly formed without any type of matching. The members of one sample are not related to the members of the other sample in any systematic way, other than that they are selected from the same population.

Example
Group 1 Test Scores    Group 2 Test Scores
3                      2
4                      3
5                      3
6                      3
7                      4

Are these two sets of scores significantly different? They are different, but are they significantly different?
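One way to answer this by hand is the pooled-variance independent samples t statistic, sketched below for the two groups in the example. Equal variances are assumed, and the critical value 2.306 is the standard two-tailed table value for alpha = .05 with 8 degrees of freedom; SPSS would report an exact p value instead.

```python
from math import sqrt
from statistics import mean, variance

group1 = [3, 4, 5, 6, 7]
group2 = [2, 3, 3, 3, 4]
n1, n2 = len(group1), len(group2)

# Pooled variance: sample variances weighted by their degrees of freedom.
pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)

# Standard error of the difference between the two means.
std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))

t_stat = (mean(group1) - mean(group2)) / std_error
df = n1 + n2 - 2

CRITICAL_T = 2.306  # two-tailed, alpha = .05, df = 8 (from a t table)
significant = abs(t_stat) > CRITICAL_T

print(round(t_stat, 3), df, significant)  # 2.582 8 True
```

Under these assumptions the difference between the group means (5 versus 3) exceeds the criterion, so it would be judged statistically significant at the .05 level.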

Presenting the results for independent samples t-test

An independent samples t-test was conducted to compare the achievement test scores for boys and girls. There was no significant difference in scores for boys (M = 34.02, SD = 4.91) and girls (M = 33.17, SD = 5.71); t(434) = 1.62, p = 0.11.

Non independent sample t-test or Paired samples t-test

Nonindependent sample t-test

When samples are not independent, the members of one group are systematically related to the members of a second group. The most familiar example is when the same group takes the test at two different times. In SPSS, it is known as the Paired Samples T-Test.

Example
Group 1 Test Scores (Time 1)    Group 1 Test Scores (Time 2)
2                               3
3                               4
3                               5
3                               6
4                               7

Do the test scores of Group 1 improve when they take the test a second time? If yes, do the test scores of Group 1 improve significantly?
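A hand calculation of the paired samples t statistic for these scores can be sketched as follows. It assumes the scores are listed in the same subject order at both times; the critical value 2.776 is the standard two-tailed table value for alpha = .05 with 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

time1 = [2, 3, 3, 3, 4]
time2 = [3, 4, 5, 6, 7]

# Difference score for each subject (Time 2 minus Time 1).
differences = [after - before for before, after in zip(time1, time2)]
n = len(differences)

# t = mean difference divided by its standard error.
t_stat = mean(differences) / (stdev(differences) / sqrt(n))
df = n - 1

CRITICAL_T = 2.776  # two-tailed, alpha = .05, df = 4 (from a t table)
significant = abs(t_stat) > CRITICAL_T

print(differences)                        # [1, 1, 2, 3, 3]
print(round(t_stat, 3), df, significant)  # 4.472 4 True
```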

Presenting the results for paired samples t-test

A paired samples t-test was conducted to evaluate the impact of the intervention on students' achievement scores. There was a statistically significant decrease in achievement scores from Time 1 (M = 40.17, SD = 5.16) to Time 2 (M = 37.5, SD = 5.15), t(29) = 5.39, p < 0.005.

One Way Analysis of Variance (One Way ANOVA)

To determine whether there is a significant difference between more than two means at a selected probability level

Example
Group 1 Test Scores    Group 2 Test Scores    Group 3 Test Scores
1                      2                      4
2                      3                      4
2                      4                      4
2                      5                      5
3                      6                      7

Are these three sets of scores significantly different? They are different, but are they significantly different?

Multiple comparison

If the F ratio is determined to be nonsignificant, the party is over. But what if it is significant? Multiple comparisons are used to determine which means are significantly different from other means.

Example
Group 1 Test Scores: 1 2 2 2 3
Group 2 Test Scores: 2 3 4 5 6
Group 3 Test Scores: 4 4 4 5 7

ANOVA results show that there is a significant difference between the means of the three groups
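The F ratio for these three groups can be worked out by hand, as sketched below. The critical value 3.89 is the standard table value for F(2, 12) at alpha = .05; variable names are illustrative, and SPSS would report an exact p value rather than a table comparison.

```python
from statistics import mean

groups = [
    [1, 2, 2, 2, 3],  # Group 1
    [2, 3, 4, 5, 6],  # Group 2
    [4, 4, 4, 5, 7],  # Group 3
]
all_scores = [score for group in groups for score in group]
grand_mean = mean(all_scores)

# Between-groups sum of squares: group means around the grand mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Within-groups sum of squares: scores around their own group mean.
ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)

df_between = len(groups) - 1               # 2
df_within = len(all_scores) - len(groups)  # 12

f_ratio = (ss_between / df_between) / (ss_within / df_within)

CRITICAL_F = 3.89  # F(2, 12) at alpha = .05 (from an F table)
print(round(f_ratio, 2), f_ratio > CRITICAL_F)  # 6.64 True
```

A significant F only says that at least one mean differs; it does not say which one, which is why a multiple comparison procedure follows.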

The use of Multiple Comparison

A multiple comparison procedure is used to determine whether the mean of:
- group 1 differs from group 2, OR
- group 1 differs from group 3, OR
- group 2 differs from group 3

Examples of multiple comparison techniques

Tukey Test, Scheffe Test, Duncan Test, Bonferroni Test, HSD Test

Presenting the results from one way ANOVA with post hoc test

A one-way between-groups analysis of variance was conducted to explore the difference in achievement scores between three groups (Group 1, Group 2, Group 3). There was a statistically significant difference at the p < 0.05 level in achievement scores for the three groups [F(2, 432) = 4.6, p = 0.01].


Post-hoc comparisons using the Tukey test indicated that the mean score for Group 1 (M=21.36, SD= 4.55) was significantly different from Group 3 (M= 22.96; SD= 4.49). Group 2 (M= 22.10, SD= 4.15) did not differ significantly from either Group 1 or 3.

Two Way ANOVA

Analysis of data from a factorial design. What is a factorial design?

Factorial design

A design in which two or more independent variables are involved in a study

Example
                Method A    Method B
High ability
Low ability

2 x 2 Factorial Design

Two-way ANOVA

Determine the main effect of method on achievement (determine whether there is a significant difference between the mean scores of Method A and Method B)

Two-way ANOVA

Determine the main effect of ability on achievement (determine whether there is a significant difference between the mean scores of high and low ability)

Interaction effect

Is there a significant interaction effect between method and ability on achievement?

How can we tell whether there is an interaction effect between method (Method A and Method B) and students' ability (high and low)?
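One descriptive way to look for an interaction is to compare the method effect at each ability level. The cell means below are invented for illustration and are not data from this study; a nonzero contrast only hints at an interaction, and the two-way ANOVA F test is still needed to judge significance.

```python
# Hypothetical cell means for a 2 x 2 design (not data from this document).
cell_means = {
    ("Method A", "High"): 80.0,
    ("Method B", "High"): 70.0,
    ("Method A", "Low"): 50.0,
    ("Method B", "Low"): 65.0,
}

# Effect of method (A minus B) within each ability level.
effect_high = cell_means[("Method A", "High")] - cell_means[("Method B", "High")]
effect_low = cell_means[("Method A", "Low")] - cell_means[("Method B", "Low")]

# If the method effect is the same at both levels, this contrast is zero.
interaction_contrast = effect_high - effect_low

print(effect_high, effect_low, interaction_contrast)  # 10.0 -15.0 25.0
```

Here Method A looks better for high ability students and Method B for low ability students, so the effect of method depends on ability, which is what an interaction means.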

Multiple regression is more advanced than correlation and linear regression. Correlation: the relationship between two variables (Ex: the relationship between attitude towards learning and academic achievement). Linear regression: the relationship between a predictor variable and a dependent variable (Ex: Can attitude towards learning predict the academic achievement of students?)
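As a small illustration of correlation and simple linear regression, the sketch below computes Pearson's r and the least-squares line by hand. The attitude and achievement scores are made-up values, and the variable names are assumptions.

```python
from math import sqrt

# Hypothetical scores for five students.
attitude = [1, 2, 3, 4, 5]       # predictor variable
achievement = [2, 4, 5, 4, 5]    # dependent variable

n = len(attitude)
mean_x = sum(attitude) / n
mean_y = sum(achievement) / n

# Sums of squares and cross-products around the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(attitude, achievement))
sxx = sum((x - mean_x) ** 2 for x in attitude)
syy = sum((y - mean_y) ** 2 for y in achievement)

# Correlation: strength and direction of the linear relationship.
r = sxy / sqrt(sxx * syy)

# Linear regression: achievement predicted as intercept + slope * attitude.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(round(r, 3), round(slope, 2), round(intercept, 2))  # 0.775 0.6 2.2
```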

Multiple Regression

Multiple regression: a combination of two or more predictor variables is used to predict a dependent variable (Ex: Can attitude towards learning and thinking ability predict the academic achievement of students?)
