Descriptive and Inferential Statistics Part 1 2013 2014

Descriptive statistics and inferential statistics
Preparing Data for Analysis

Scoring procedures
Tabulation and coding
What does scoring data mean?
Scoring data means that the researcher assigns a numeric score (or value) to each response category for each question on the test or instrument used to collect the data
Categorizing data
The choice of statistical test depends on the type of data collected
It is important to understand the types of data before the scoring procedure is carried out
Types of categorical and quantifiable data
Data
Categorical: Nominal, Ordinal
Quantifiable: Interval, Ratio
What is categorical data?
Data that cannot be quantified numerically but can be placed into sets or categories (nominal data) or ranked in some way (ordinal data)
What is quantifiable data?

Data that can be measured numerically
More precise than categorical data
Consists of interval data and ratio data
Four kinds of measurement scales

Nominal
Ordinal
Interval
Ratio
Nominal data
A name, value or category with no order or ranking
Examples: type of school, type of teaching method, gender, race
Ordinal data
Comprises an ordering or ranking of values, although the intervals between ranks are not assumed to be equal (for example, an attitude questionnaire)
Example
How often have you felt like insulting a student? (Please tick one)
Every day _____ Once a week _____ Sometimes _____ Never _____
Other examples of ordinal data
Questions that rate the quality of students' performance (for example, very good, good, fair, poor)
Agreement with statements about attitude towards science (Strongly agree, Agree, Disagree, Strongly disagree)
Interval data
Numerical values are assigned along an interval scale with equal intervals
There is no true zero point, that is, no point at which the trait being measured does not exist
Number of students scoring within various ranges in an IQ test

Scores     Frequency
76-80      1
81-85      0
86-90      4
91-95      10
96-100     21
101-105    25
106-110    48
111-115    18
116-120    11
Other examples of interval data
Temperature (in degrees Celsius or Fahrenheit)
Blood pressure
Ratio data
Has the same characteristics as interval data, but there is an absolute zero that is meaningful (zero means none of the quantity is present)
Examples: costs, sales, number of students, number of teachers
Example of the scoring data

Students should be given an opportunity to select a school of their choice Strongly agree _____ Agree _____ Disagree _____ Strongly Disagree _____
A numeric score (or value) to each response category

Strongly agree = 4
Agree = 3
Disagree = 2
Strongly Disagree = 1
Another example of scoring data

How often have you felt like insulting a student? (Please tick one)
Every day _____ Once a week _____ Sometimes _____ Never _____

Every day = 4
Once a week = 3
Sometimes = 2
Never = 1
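The scoring step in these two examples amounts to a lookup from response category to numeric value. A minimal sketch, assuming the responses are recorded as text labels; the function name `score_response` and the three sample responses are hypothetical and only illustrate the idea:

```python
# Scoring key taken from the slide: Strongly agree = 4 ... Strongly Disagree = 1.
AGREEMENT_KEY = {
    "Strongly agree": 4,
    "Agree": 3,
    "Disagree": 2,
    "Strongly Disagree": 1,
}


def score_response(response, key):
    """Return the numeric score assigned to one ticked response category."""
    return key[response]


# Hypothetical responses from three participants to the school-choice item.
responses = ["Agree", "Strongly agree", "Disagree"]
scores = [score_response(r, AGREEMENT_KEY) for r in responses]
print(scores)  # [3, 4, 2]
```

The same function would work for the frequency item by passing a second key (Every day = 4, Once a week = 3, Sometimes = 2, Never = 1).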
An example of a multiple choice question
The quantity of charge which passes through a circuit is measured in
A. Amps
B. Volts
C. Coulombs *
D. Watts

Correct response: 1 mark
Incorrect response: 0 marks
A. Amps = 0
B. Volts = 0
C. Coulombs = 1
D. Watts = 0
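Scoring a multiple choice item is a comparison against the answer key. The sketch below assumes each question is identified by a label; `Q1`, `ANSWER_KEY` and `score_item` are hypothetical names, while the keyed answer (C, Coulombs) comes from the slide:

```python
# Answer key: the slide marks option C (Coulombs) as the correct response.
ANSWER_KEY = {"Q1": "C"}


def score_item(question, chosen, key=ANSWER_KEY):
    """Return 1 mark for the keyed option and 0 marks for any other option."""
    return 1 if key[question] == chosen else 0


# Score every possible option for the item to reproduce the slide's table.
marks = {option: score_item("Q1", option) for option in "ABCD"}
print(marks)  # {'A': 0, 'B': 0, 'C': 1, 'D': 0}
```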
Scoring Procedures for Open Ended items
Each participant's test should be scored in the same way and against the same criteria
Scoring is greatly facilitated if a standardized instrument is used
A scoring key should be provided
Recheck the scoring for consistency
Clean the data
Clean the data
With a large number of variables and many individual records, it is easy to enter a wrong figure or to miss an entry
Run a frequency analysis on each column of data to reveal any inconsistent or spurious figures
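A frequency analysis on one column can be sketched in a few lines. The column below is hypothetical, and the set of valid codes (1 and 2) is an assumption made for the illustration; any value outside that set, including a missing entry, shows up as suspect:

```python
from collections import Counter

# Hypothetical data column with one mistyped code (11) and one missing entry.
gender_column = [1, 2, 2, 1, 11, 2, None, 1]
VALID_CODES = {1, 2}  # assumed coding scheme for this column

# Count how often each value occurs in the column.
frequencies = Counter(gender_column)

# Any value that is not a valid code needs to be checked against the raw data.
suspect = {value: count for value, count in frequencies.items()
           if value not in VALID_CODES}

print(frequencies)
print(suspect)  # {11: 1, None: 1}
```

In SPSS the equivalent check is a Frequencies table for the variable; the logic is the same.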
Scoring Procedures for
Scoring is more complex if open-ended questions are involved
Develop a marking scheme
It is advisable to have at least one other person independently score some of the tests
Try out the scoring procedure by administering the test to a population similar to the one in the actual study
Example of open ended question
Define population and sample ________________________________ ________________________________ ________________________________ (2 marks)
The marking scheme

Precise and complete definition = 2
Precise but incomplete definition = 1
Incorrect definition = 0
Tabulation and coding

After the tests/instruments have been scored, transfer the results to a summary data sheet or a computer (for example, an SPSS data sheet)
Organize the data in SPSS to facilitate examination and analysis of the data
Tabulation and Coding
Tabulation is organizing data

Identifying all information relevant to the analysis
Separating groups and individuals within groups
Listing data in columns
Assigning names to variables, for example:
EX1 for pretest scores
SEX for gender
EX2 for posttest scores
Tabulation and Coding
Coding
Assigning identification numbers to subjects
Assigning codes to the values of non-numerical or categorical variables

Gender: 1 = Female, 2 = Male
Subjects: 1 = English, 2 = Math, 3 = Science, etc.
Names: 001 = Ahmad, 002 = Rahman, 003 = Salleh, 256 = Karim
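Coding can be sketched as two lookups plus a running identification number. The code values follow the slide; the raw records themselves (which gender and subject go with which name) are hypothetical and exist only to show the transformation:

```python
# Codes from the slide.
GENDER_CODES = {"Female": 1, "Male": 2}
SUBJECT_CODES = {"English": 1, "Math": 2, "Science": 3}

# Hypothetical raw records: (name, gender, subject).
raw_records = [
    ("Ahmad", "Male", "Math"),
    ("Rahman", "Male", "Science"),
    ("Salleh", "Male", "English"),
]

# Replace each name with a three-digit ID and each category with its code.
coded_records = []
for subject_id, (name, gender, subject) in enumerate(raw_records, start=1):
    coded_records.append(
        (f"{subject_id:03d}", GENDER_CODES[gender], SUBJECT_CODES[subject])
    )

print(coded_records)
# [('001', 2, 2), ('002', 2, 3), ('003', 2, 1)]
```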
Example
A study investigating the interaction between two types of instruction and two levels of ability (a 2 x 2 factorial design)
Four subgroups are involved
Method A
Method B
High ability
68 marks 70 marks 79 marks 50 marks 40 marks 45 marks
78 marks 90 marks 60 marks 60 marks 65 marks 55 marks
Low ability
Four columns involved

Student ID
Type of instruction
Level of ability
Total score
Student ID

1 represents Ahmad
2 represents Bakar
3 represents Malik
4 represents Abu
etc.
Types of instruction
Two types of instruction: the cooperative method and the traditional method
1 represents the cooperative method
2 represents the traditional method
Level of ability

High and low ability
1 represents high ability
2 represents low ability
Total Scores

Example: 50 items/questions
Correct answer: 1 mark
Incorrect answer: 0 marks
Full marks: 50
Example: If Ahmad answers 20 items correctly, his total score is 20 marks
Another example
A study investigating the effect of school location on learning motivation among male and female students
Four columns involved

Student ID
School location
Student gender
Learning motivation
Student ID

1 represents Ahmad
2 represents Bakar
3 represents Malik
4 represents Abu
etc.
School location

Urban or rural
1 represents urban
2 represents rural
Student gender

Male and female students
1 represents male
2 represents female
Learning motivation
Five Likert-scale items
Example: I like to study in order to get good marks in the examination
Strongly agree = 4
Agree = 3
Disagree = 2
Strongly Disagree = 1
How to calculate a score from Likert-scale items
Add up the item responses for each person to get the total score
Divide the total score by the number of items to get the mean learning motivation score for each student
Item 1 = 4
Item 2 = 3
Item 3 = 4
Item 4 = 2
Item 5 = 1
Total score = 4 + 3 + 4 + 2 + 1 = 14
How many items? 5 items
Mean score of learning motivation = 14/5 = 2.8
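The same calculation in code, using the five item scores from the slide. The variable names are illustrative only:

```python
# Item responses for one student, as listed on the slide.
item_scores = [4, 3, 4, 2, 1]

# Total score, then mean score across the five items.
total_score = sum(item_scores)
mean_score = total_score / len(item_scores)

print(total_score, mean_score)  # 14 2.8
```

The mean stays on the original 1 to 4 response scale, which makes it easier to interpret than the total.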
After you have prepared for data analysis, how do you analyse the data?
How to analyse the data

Descriptive statistics
Inferential statistics
Descriptive statistics
Describe trends in the data for a single variable on your instrument
Example: What is the learning motivation of secondary school students?
To answer that question, we need descriptive statistics that indicate general tendencies in the data, the spread of scores, or relative position
Central Tendency
Purpose: to represent the typical score attained by subjects
Three common measures

Mode
Median
Mean
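The three measures can be computed with Python's standard `statistics` module. The five scores below are a small illustrative sample, chosen so that the mean differs from the mode and median:

```python
import statistics

# Small illustrative set of test scores.
scores = [4, 4, 4, 5, 7]

mode_score = statistics.mode(scores)      # most frequent score
median_score = statistics.median(scores)  # middle score when sorted
mean_score = statistics.mean(scores)      # arithmetic average

print(mode_score, median_score, mean_score)  # 4 4 4.8
```

The one high score (7) pulls the mean above the median, which is the pattern expected when the extreme scores sit at the upper end.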
Spread of scores (variability)
Purpose: to measure the extent to which scores are spread apart
Four measures

Range
Quartile deviation
Variance
Standard deviation
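A sketch of the four measures using the standard library. Two choices here are assumptions rather than anything the slides specify: the variance and standard deviation use the sample (n - 1) formulas, and the quartiles use the default method of `statistics.quantiles`, so other software may report slightly different quartile values for small samples:

```python
import statistics

# Small illustrative set of test scores.
scores = [2, 3, 4, 5, 6]

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)

# Quartile deviation: half the distance between the third and first quartiles.
q1, _median, q3 = statistics.quantiles(scores, n=4)
quartile_deviation = (q3 - q1) / 2

# Sample variance and sample standard deviation (n - 1 in the denominator).
variance = statistics.variance(scores)
standard_deviation = statistics.stdev(scores)

print(score_range, quartile_deviation, variance, round(standard_deviation, 3))
```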

The Normal Curve
If a sufficient number of subjects are measured, a variable may yield a normal, bell-shaped curve
If a variable is normally distributed, then several things are true
First, 50% of the scores are above the mean and 50% of the scores are below the mean
Second, the mean, the median and the mode are the same
Third, most scores are near the mean, and the further a score is from the mean, the fewer the subjects who attained it
[Figure: normal curve, with most scores near the mean and fewer subjects attaining scores towards either tail]
Fourth, the same number, or percentage, of scores lies between the mean and plus one standard deviation (mean + 1 SD) as lies between the mean and minus one standard deviation (mean - 1 SD), and similarly for mean ± 2 SD and mean ± 3 SD
If scores are normally distributed
Mean ± 1.0 SD = approximately 68% of the scores
Mean ± 2.0 SD = approximately 95% of the scores
Mean ± 3.0 SD = approximately 99.7% of the scores
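These percentages can be checked from the normal distribution itself: the proportion of scores within k standard deviations of the mean equals erf(k / √2). A minimal sketch using only the standard library (the function name `proportion_within` is illustrative):

```python
import math


def proportion_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


# Prints roughly 68.3, 95.4 and 99.7 percent for 1, 2 and 3 SDs.
for k in (1, 2, 3):
    print(k, round(100 * proportion_within(k), 1))
```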
Skewed Distributions
Research data usually more or less approximate a normal curve
When a distribution is not normal, it is said to be skewed, and the values of the mean, the median and the mode are different
In a skewed distribution, there are more extreme scores at one end than the other
If the extreme scores are at the lower end of the distribution, the distribution is said to be negatively skewed
If the extreme scores are at the upper, or higher, end of the distribution, the distribution is said to be positively skewed
The mean is pulled in the direction of the extreme scores
[Figure: two skewed distributions. Which one is positively skewed and which is negatively skewed?]
For a negatively skewed distribution, the mean is typically lower than the median
For a positively skewed distribution, the mean is typically higher than the median
Assessing normality using SPSS

Click on Analyze
Click on Descriptive Statistics, then Explore
Click the variable(s) you are interested in
Click the arrow button to move them into the Dependent List
Click on the Plots button
Under Descriptive, click Histogram
Click on Normality plots with tests
Click on Continue
Click OK
Interpretation of output from explore

Skewness and kurtosis values
Test of Normality (Kolmogorov-Smirnov statistic)
Histogram
Normal probability plots (Normal Q-Q Plots)
Skewness and kurtosis values
Skewness and kurtosis values provide information about the distribution of scores
Kurtosis

A measure of the peakedness or flatness of a distribution
A kurtosis value near zero (0) indicates a shape close to normal
A negative kurtosis value indicates a shape flatter than normal
A positive kurtosis value indicates a shape more peaked than normal
A kurtosis value between -1.0 and +1.0 is considered excellent, but a value between -2.0 and +2.0 is considered acceptable
Skewness
Measures the extent to which a distribution of values deviates from symmetry around the mean
A value of zero represents a symmetric, evenly balanced distribution
A positive skewness indicates a greater number of smaller values
A negative skewness indicates a greater number of larger values
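Both values can be sketched from the central moments of the scores. This is a simplified, moment-based version; SPSS applies small-sample adjustments, so its output will not match these figures exactly for small data sets. The function names and the two sample lists are illustrative:

```python
def central_moment(values, order):
    """Average of (value - mean) raised to the given power."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness: 0 for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal curve scores about 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric = [2, 3, 3, 3, 4]      # evenly balanced around 3
right_tailed = [4, 4, 4, 5, 7]   # a few larger values pull the tail right

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_tailed))    # positive: mostly smaller values, long upper tail
```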
Test of Normality (Kolmogorov-Smirnov statistic)
The Kolmogorov-Smirnov statistic assesses the normality of the distribution of scores
A non-significant result (significance value of more than 0.05) indicates normality
A significant result (significance value of 0.05 or less) suggests violation of the assumption of normality
Histogram and Normal Q-Q Plots
The actual shape of the distribution can be seen in the histogram
To support the claim that the data are normally distributed, refer to the normal Q-Q plot
Normal Q-Q plot: the observed value for each score is plotted against the expected value from the normal distribution
A reasonably straight line suggests a normal distribution
Graphic representation

Bar chart
Histogram
Pie chart
Inferential statistics
What is the purpose of inferential statistics?
To compare two or more groups on the independent variable in terms of the dependent variable (for example: Is there a significant difference between boys and girls on self-esteem?)
Independent variable: gender (boys and girls)
Dependent variable: self-esteem
Inferential statistics involves hypothesis testing
Null hypothesis: There is no significant difference between boys and girls on self-esteem
Alternative hypothesis: There is a significant difference between boys and girls on self-esteem
Another purpose of inferential statistics
To relate two or more variables (for example: Does self-esteem relate to academic achievement?)
Null hypothesis: There is no significant relationship between self-esteem and academic achievement
Alternative hypothesis: There is a significant relationship between self-esteem and academic achievement
Important Perspectives
Inferential statistics
Allow researchers to generalize to a population of individuals based on information obtained from a sample of those individuals
Assess whether the results obtained from a sample are the same as those that would have been calculated for the entire population
Types of Inferential Statistics
Two issues discussed

Steps involved in testing for significance
Types of tests
Steps in Statistical Testing
State the null and alternative hypotheses
Set the alpha level
Identify and compute the test statistic
Compare the computed test statistic to the criteria for significance
Alpha Level
An established probability level which serves as the criterion for deciding whether to reject or fail to reject the null hypothesis
Common levels in education
.01
.05 (the most common)
.10
Reject the null hypothesis
If the probability value (p) is less than or equal to the significance level, then reject the null hypothesis and conclude that the research finding is statistically significant
Fail to reject the null hypothesis
If the probability value (p) is greater than the significance level, then fail to reject the null hypothesis and conclude that the research finding is not statistically significant
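The decision rule can be written as a single comparison. A minimal sketch; the function name `decide` and the two p values passed to it are illustrative, and the default alpha of .05 is the level the slides call the most common:

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject the null hypothesis when p <= alpha."""
    if p_value <= alpha:
        return "reject the null hypothesis (statistically significant)"
    return "fail to reject the null hypothesis (not statistically significant)"


print(decide(0.01))  # reject the null hypothesis (statistically significant)
print(decide(0.11))  # fail to reject the null hypothesis (not statistically significant)
```

Note that a p value exactly equal to alpha counts as significant under this rule.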
Inferential Statistics
T-Test
Determines whether two means are significantly different at a selected probability level
Independent Samples T-Test
Determines whether there is a significant difference between the means of two independent samples
Independent samples
Two samples that are randomly formed without any type of matching
The members of one sample are not related to the members of the other sample in any systematic way, other than that they are selected from the same population
Example
Group 1 Test Scores: 3 4 5 6 7
Group 2 Test Scores: 2 3 3 3 4
Are these two sets of scores significantly different? They are different, but are they significantly different?
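The question can be answered by computing the t statistic by hand. The sketch below reads the slide's table as Group 1 = 3 4 5 6 7 and Group 2 = 2 3 3 3 4, and assumes equal variances (the pooled-variance form of the test). The critical value 2.306 is the standard two-tailed table value for alpha = .05 with 8 degrees of freedom:

```python
import math
import statistics

group1 = [3, 4, 5, 6, 7]
group2 = [2, 3, 3, 3, 4]
n1, n2 = len(group1), len(group2)

# Pooled variance: weighted average of the two sample variances.
pooled_variance = (
    (n1 - 1) * statistics.variance(group1)
    + (n2 - 1) * statistics.variance(group2)
) / (n1 + n2 - 2)

# Standard error of the difference between the two means.
standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))

t_value = (statistics.mean(group1) - statistics.mean(group2)) / standard_error
degrees_of_freedom = n1 + n2 - 2

CRITICAL_T = 2.306  # two-tailed, alpha = .05, df = 8 (from a t table)
print(round(t_value, 3), degrees_of_freedom, abs(t_value) > CRITICAL_T)
```

Under these assumptions t is about 2.58 with 8 degrees of freedom, which exceeds the critical value, so the difference between the two means would be judged significant at the .05 level.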
Presenting the results for independent samples t-test
An independent samples t-test was conducted to compare the achievement test scores for boys and girls. There was no significant difference in scores for boys (M = 34.02, SD = 4.91) and girls (M = 33.17, SD = 5.71); t(434) = 1.62, p = 0.11.
Nonindependent samples t-test or paired samples t-test
When samples are not independent, the members of one group are systematically related to the members of a second group
The most familiar example is when the same group takes a test at two different times
In SPSS, it is known as the Paired Samples T-Test
Example
Group 1 Test Scores (Time 1): 2 3 3 3 4
Group 1 Test Scores (Time 2): 3 4 5 6 7
Do the test scores of Group 1 improve when the test is taken a second time? If so, is the improvement significant?
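A paired samples t-test works on the difference scores for each person. The sketch below assumes that the scores are listed in the same student order at both times (the slide does not say so explicitly). The critical value 2.776 is the standard two-tailed table value for alpha = .05 with 4 degrees of freedom:

```python
import math
import statistics

time1 = [2, 3, 3, 3, 4]
time2 = [3, 4, 5, 6, 7]

# Difference score for each student (assumes matching order in both lists).
differences = [after - before for before, after in zip(time1, time2)]
n = len(differences)

mean_difference = statistics.mean(differences)
sd_difference = statistics.stdev(differences)

# t = mean difference divided by its standard error.
t_value = mean_difference / (sd_difference / math.sqrt(n))
degrees_of_freedom = n - 1

CRITICAL_T = 2.776  # two-tailed, alpha = .05, df = 4 (from a t table)
print(differences, round(t_value, 3), abs(t_value) > CRITICAL_T)
```

Under that pairing assumption the mean gain is 2 marks and t is about 4.47 with 4 degrees of freedom, which exceeds the critical value.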
Presenting the results for paired samples t-test
A paired samples t-test was conducted to evaluate the impact of the intervention on students' achievement scores. There was a statistically significant decrease in achievement scores from Time 1 (M = 40.17, SD = 5.16) to Time 2 (M = 37.5, SD = 5.15), t(29) = 5.39, p < 0.005.
One Way Analysis of Variance (One Way ANOVA)
To determine whether there is a significant difference between more than two means at a selected probability level
Example
Group 1 Test Scores: 1 2 2 2 3
Group 2 Test Scores: 2 3 4 5 6
Group 3 Test Scores: 4 4 4 5 7
Are these three sets of scores significantly different? They are different, but are they significantly different?
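The F ratio for these three groups can be computed by hand from the between-group and within-group sums of squares. The sketch reads the slide's table column by column (Group 1 = 1 2 2 2 3, Group 2 = 2 3 4 5 6, Group 3 = 4 4 4 5 7). The critical value 3.89 is the standard table value of F(2, 12) at alpha = .05:

```python
groups = {
    "Group 1": [1, 2, 2, 2, 3],
    "Group 2": [2, 3, 4, 5, 6],
    "Group 3": [4, 4, 4, 5, 7],
}

all_scores = [score for scores in groups.values() for score in scores]
grand_mean = sum(all_scores) / len(all_scores)

# Between-group sum of squares: group means around the grand mean.
ss_between = sum(
    len(scores) * (sum(scores) / len(scores) - grand_mean) ** 2
    for scores in groups.values()
)

# Within-group sum of squares: scores around their own group mean.
ss_within = sum(
    (score - sum(scores) / len(scores)) ** 2
    for scores in groups.values()
    for score in scores
)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)
f_ratio = (ss_between / df_between) / (ss_within / df_within)

CRITICAL_F = 3.89  # F(2, 12) at alpha = .05 (from an F table)
print(round(f_ratio, 2), df_between, df_within, f_ratio > CRITICAL_F)
```

Here F(2, 12) is about 6.64, which exceeds the critical value, so at least one group mean differs significantly from another. The F ratio alone does not say which ones.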
Multiple comparison
If the F ratio is determined to be nonsignificant, the party is over
But what if it is significant?
Multiple comparisons are used to determine which means are significantly different from other means
Example
Group 1 Test Scores: 1 2 2 2 3
Group 2 Test Scores: 2 3 4 5 6
Group 3 Test Scores: 4 4 4 5 7
ANOVA results show that there is a significant difference between the means of the three groups
The use of Multiple Comparison
A multiple comparison procedure is used to determine whether the mean of:
Group 1 differs from Group 2, or
Group 1 differs from Group 3, or
Group 2 differs from Group 3
Example of multiple comparison technique

Tukey Test
Scheffe Test
Duncan Test
Bonferroni Test
HSD Test
Presenting the results from one way ANOVA with post hoc test
A one-way between-groups analysis of variance was conducted to explore the difference in achievement scores between three groups (Group 1, Group 2, Group 3). There was a statistically significant difference at the p < 0.05 level in achievement scores for the three groups [F(2, 432) = 4.6, p = 0.01].
Post-hoc comparisons using the Tukey test indicated that the mean score for Group 1 (M = 21.36, SD = 4.55) was significantly different from that of Group 3 (M = 22.96, SD = 4.49). Group 2 (M = 22.10, SD = 4.15) did not differ significantly from either Group 1 or Group 3.
Two Way ANOVA
Analysis of data from a factorial design
What is a factorial design?
Factorial design
A design in which two or more independent variables are involved in a study
Example
[Table: 2 x 2 factorial design crossing method (Method A, Method B) with ability (High ability, Low ability)]
Two-way ANOVA
Determine the main effect of method on achievement (whether there is a significant difference between the mean scores of Method A and Method B)
Determine the main effect of ability on achievement (whether there is a significant difference between the mean scores of high and low ability students)
Interaction effect
Is there a significant interaction effect between method and ability on achievement?
How can we tell whether there is an interaction effect between method (Method A and Method B) and students' ability (high and low)?
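One way to picture an interaction is to compare the effect of method within each ability level: if the difference between Method B and Method A is not the same for high and low ability students, the two factors interact. The cell means below are hypothetical numbers invented for the illustration, not data from the study, and a two-way ANOVA is still needed to judge whether the interaction is statistically significant:

```python
# Hypothetical mean achievement score for each cell of the 2 x 2 design.
cell_means = {
    ("Method A", "High ability"): 70,
    ("Method B", "High ability"): 80,
    ("Method A", "Low ability"): 50,
    ("Method B", "Low ability"): 75,
}

# Effect of method (B minus A) within each ability level.
effect_high = (cell_means[("Method B", "High ability")]
               - cell_means[("Method A", "High ability")])
effect_low = (cell_means[("Method B", "Low ability")]
              - cell_means[("Method A", "Low ability")])

# A non-zero difference between the two effects points to an interaction.
interaction_contrast = effect_high - effect_low

print(effect_high, effect_low, interaction_contrast)  # 10 25 -15
```

In this made-up case Method B helps low ability students more (25 marks) than high ability students (10 marks), so the lines on an interaction plot would not be parallel.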
Multiple Regression
More advanced than correlation and linear regression
Correlation: the relationship between two variables (Ex: the relationship between attitude towards learning and academic achievement)
Linear regression: the relationship between a predictor variable and a dependent variable (Ex: Can attitude towards learning predict students' academic achievement?)
Multiple regression: a combination of two or more variables used to predict a dependent variable (Ex: Can attitude towards learning and thinking ability predict students' academic achievement?)
