Introduction To Statistics 1

Introduction to statistics I
Sophia King Rm. P24 HWB sk219@le.ac.uk
Using statistics in Psychology

Carrying out psychological research means collecting data. Statistics are a way of making use of these data.
Descriptive statistics: used to describe characteristics of our sample
  Statistics describe samples
Inferential statistics: used to generalise from our sample to our population
  Parameters describe populations
Any samples used should therefore be representative of the target population
Descriptive Statistics
Statistical procedures used to summarise, organise, and simplify data. This should be done in a way that reflects the overall findings.
  Raw data are made more manageable
  Raw data are presented in a logical form
  Patterns can be seen from organised data
Techniques include:
  Frequency tables
  Graphical techniques
  Measures of central tendency
  Measures of spread (variability)
Plotting Data: describing spread of data

A researcher is investigating short-term memory capacity. The number of symbols remembered is recorded for 20 participants:
4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6
We can describe our data using a frequency distribution, which can be presented as a table or a graph. It always presents:
  The set of categories that make up the original measurement scale
  The frequency of each score/category
A distribution has three important characteristics: shape, central tendency, and variability
Frequency Distribution Tables
 X    f    fX
11    1    11
10    2    20
 9    1     9
 8    2    16
 7    2    14
 6    4    24
 5    3    15
 4    3    12
 3    2     6
The highest score is placed at the top
All observed scores are listed
Gives information about distribution, variability, and centrality
X = score value
f = frequency
fX = total value associated with each frequency
$\sum f = N$
$\sum X = \sum fX$
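The table can also be built programmatically. The following is a minimal sketch in Python using the 20 memory scores; the variable names (`scores`, `table`, `n_total`, `sum_fx`) are illustrative choices, not part of the original slides.

```python
from collections import Counter

# The 20 memory scores from the example
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

# Count how often each score value occurs (f for each X)
freq = Counter(scores)

# Rows of (X, f, fX), with the highest score at the top
table = [(x, f, x * f) for x, f in sorted(freq.items(), reverse=True)]

print(f"{'X':>3} {'f':>3} {'fX':>4}")
for x, f, fx in table:
    print(f"{x:>3} {f:>3} {fx:>4}")

n_total = sum(f for _, f, _ in table)    # the sum of f gives N
sum_fx = sum(fx for _, _, fx in table)   # the sum of fX gives the sum of X
print("N =", n_total, "| sum of fX =", sum_fx)
```

The two totals give a quick check on the table: the frequencies should add up to the number of participants, and the fX column should add up to the sum of the raw scores.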
Frequency Table Additions
 X    f    fX    p       %
11    1    11    0.05     5%
10    2    20    0.10    10%
 9    1     9    0.05     5%
 8    2    16    0.10    10%
 7    2    14    0.10    10%
 6    4    24    0.20    20%
 5    3    15    0.15    15%
 4    3    12    0.15    15%
 3    2     6    0.10    10%
Frequency tables can display more detailed information about the distribution: percentages and proportions
p = fraction of the total group associated with each score (relative frequency)
$p = f/N$
As a percentage: $p(100) = 100(f/N)$
What does this tell about this distribution of scores?
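As a rough illustration of the proportion and percentage columns, the sketch below recomputes them from the raw memory scores. The dictionary names are assumptions made for this example.

```python
from collections import Counter

scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6]
n_total = len(scores)
freq = Counter(scores)

# Proportion (relative frequency): p = f / N
proportions = {x: f / n_total for x, f in freq.items()}

# Percentage: 100 * p
percentages = {x: 100 * p for x, p in proportions.items()}

for x in sorted(freq, reverse=True):
    print(f"X={x:>2}  f={freq[x]}  p={proportions[x]:.2f}  %={percentages[x]:.0f}%")
```

Because every participant falls in exactly one row, the proportions should sum to 1 and the percentages to 100.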
Grouped Frequency Distribution Tables

Sometimes the spread of data is too wide to list every score
Grouped tables present scores as class intervals
  About 10 intervals
  The interval width should be a simple round number (2, 5, 10, etc.), and all intervals should have the same width
  The bottom score of each interval should be a multiple of the width
X        f
95-99    1
90-94    1
85-89    0
80-84    1
75-79    2
70-74    4
65-69    7
60-64    0
55-59    6
50-54    3
Class intervals represent a continuous variable X
  E.g. a score of 51 is bounded by the real limits 50.5-51.5
  If X is 8 and f is 3, this does not mean all three have exactly the same score: they all fell somewhere between 7.5 and 8.5
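The grouping rules can be sketched in code. The raw scores behind the grouped table are not given, so this example groups the 20 memory scores instead. The interval width of 2 is an assumption chosen for illustration, and it yields only five intervals because the data set is small.

```python
from collections import Counter

scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6]
width = 2  # assumed class interval width for this small data set

# The bottom score of each interval is the largest multiple of the width
# that does not exceed the score, so every interval starts on a multiple.
interval_freq = Counter((x // width) * width for x in scores)

for low in sorted(interval_freq, reverse=True):
    high = low + width - 1
    # Real limits extend half a unit beyond the apparent limits
    print(f"{low}-{high}  f={interval_freq[low]}  "
          f"real limits {low - 0.5}-{high + 0.5}")
```

Grouping loses detail: the interval 6-7 has a frequency of 6, but the table no longer shows how many of those participants scored 6 and how many scored 7.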
Percentiles and Percentile Ranks
 X    f    cf    c%
11    1    20    100%
10    2    19     95%
 9    1    17     85%
 8    2    16     80%
 7    2    14     70%
 6    4    12     60%
 5    3     8     40%
 4    3     5     25%
 3    2     2     10%
X values are raw scores, which carry little meaning without context
Percentile rank = the percentage of the sample with scores at or below a particular value
This can be represented by a cumulative frequency (cf) column
The cumulative percentage is obtained by:
$c\% = \frac{cf}{N} \times 100$
This gives information about relative position in the data distribution
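A short sketch of how the cumulative columns could be computed from the memory scores, accumulating frequencies from the lowest score upwards. The names `cum_freq` and `cum_pct` are illustrative.

```python
from collections import Counter
from itertools import accumulate

scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6]
n_total = len(scores)
freq = Counter(scores)

values = sorted(freq)                                  # lowest score first
cum_freq = list(accumulate(freq[x] for x in values))   # running total of f
cum_pct = [100 * cf / n_total for cf in cum_freq]      # c% = (cf / N) * 100

# Print with the highest score at the top, as in the table
for x, cf, pct in reversed(list(zip(values, cum_freq, cum_pct))):
    print(f"X={x:>2}  cf={cf:>2}  c%={pct:.0f}%")
```

Read this way, a score of 6 has a percentile rank of 60%: 12 of the 20 participants scored 6 or lower.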
Representing data as graphs

A frequency distribution graph presents all the information available in a frequency table (it can also be fitted to a grouped frequency table)
Uses histograms
  Bar width corresponds to the real limits of the intervals
  Histograms can be modified to include blocks representing individual scores
[Figures: histogram of frequency against memory score; histogram of frequency against score for the grouped data]
Frequency Distribution Polygons

Shows the same information with lines: traces the shape of the distribution
Both histograms and polygons represent continuous data
For non-numerical data, the frequency distribution can be represented by bar graphs
Bar graphs have spaces between adjacent bars to represent distinct categories
[Figure: bar graph with the categories phone numbers, historical dates, and family dates]

Frequencies of Populations and Samples

Population: all the individuals of interest to the study
Sample: the particular group of participants you are testing, selected from the population
Although it is possible to draw graphs of population distributions, exact frequencies are not normally available, unlike for sample distributions. However, you can:
  Display graphs of relative frequencies (categorical data)
  Use smooth curves to indicate relative frequencies (interval or ratio data)
Frequency Distribution: the Normal Distribution

Bell-shaped: a specific shape that can be defined by an equation
Symmetrical around the midpoint, where the greatest frequency of scores occurs
The tails of the perfect curve approach but never quite meet the horizontal axis (the axis is an asymptote)
A normal distribution is an assumption of parametric testing
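For reference, the equation usually given for the normal curve is shown below. It is not stated on the original slide. Here $\mu$ is the mean and $\sigma$ the standard deviation of the distribution, and $f(X)$ is the height of the curve at score $X$.

$$
f(X) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(X - \mu)^2}{2\sigma^2}}
$$

The squared term makes the curve symmetrical around $\mu$, and the exponential is never exactly zero, which is why the tails never reach the horizontal axis.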
Frequency Distribution: Different Distribution shapes
Measures of Central Tendency

A way of summarising the data using a single value that is in some way representative of the entire data set
It is not always possible to follow the same procedure in producing a central representative value: this changes with the shape of the distribution
Mode
  The most frequent value
  Does not take into account exact scores
  Unaffected by extreme scores
  Not useful when several values occur equally often in a set
Median
  The value that falls exactly at the midpoint of a ranked distribution
  Does not take into account exact scores
  Unaffected by extreme scores
  In a small set it can be unrepresentative
Mean (arithmetic average)
  Sample mean: $M = \frac{\sum X}{n}$
  Population mean: $\mu = \frac{\sum X}{N}$
  Takes into account all values
  Easily distorted by extreme values
For our set of memory scores:

4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6
Mode = 6; Median = 6; Mean = 6.35
The mean is the preferred measure of central tendency, except when there are:
  Extreme scores or skewed distributions
  Non-interval data
  Discrete variables
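These three values can be checked with Python's standard `statistics` module. The second half of the sketch replaces the top score with a hypothetical extreme value of 50 (not part of the real data) to show how the mean is distorted while the median is not.

```python
import statistics

scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

mode = statistics.mode(scores)       # most frequent value
median = statistics.median(scores)   # midpoint of the ranked scores
mean = statistics.mean(scores)       # sum of scores divided by n
print(mode, median, mean)

# Hypothetical illustration: swap the highest score (11) for an extreme 50
with_outlier = [50 if x == 11 else x for x in scores]
outlier_median = statistics.median(with_outlier)
outlier_mean = statistics.mean(with_outlier)
print(outlier_median, outlier_mean)
```

With the hypothetical outlier the median stays at 6 but the mean rises from 6.35 to 8.3, which is why the median is often preferred for skewed data.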
Central Tendencies and Distribution Shape
Describing Variability
Describes, as an exact quantitative measure, how spread out or clustered together the scores are
Variability is usually defined in terms of distance:
  How far apart scores are from each other
  How far apart scores are from the mean
  How representative a score is of the data set as a whole
Describing Variability: the Range

Simplest and most obvious way of describing variability
$\text{Range} = X_{\text{highest}} - X_{\text{lowest}}$

The range only takes into account the two extreme scores and ignores any values in between. To counter this, the distribution is divided into quarters (quartiles): Q1 = 25%, Q2 = 50%, Q3 = 75%
The interquartile range: the distance between the first and third quartiles (Q3 - Q1)
The semi-interquartile range: one half of the interquartile range
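The sketch below computes these measures for the memory scores. Textbooks and software use several slightly different conventions for locating quartiles. This example assumes one simple convention, taking Q1 and Q3 as the medians of the lower and upper halves of the ranked scores, so other tools may report slightly different values.

```python
import statistics

scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6]
ordered = sorted(scores)
n = len(ordered)

# Range: highest score minus lowest score
data_range = ordered[-1] - ordered[0]

# Assumed quartile convention: medians of the lower and upper halves
half = n // 2
lower_half = ordered[:half]
upper_half = ordered[-half:]   # leaves out the middle score when n is odd

q1 = statistics.median(lower_half)
q3 = statistics.median(upper_half)
iqr = q3 - q1
semi_iqr = iqr / 2

print("range =", data_range)
print("Q1 =", q1, "Q3 =", q3, "IQR =", iqr, "semi-IQR =", semi_iqr)
```

Unlike the range, the interquartile range describes the spread of the middle half of the scores and so is not driven by the two most extreme values.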
Describing Variability: Deviation
A more sophisticated measure of variability is one that shows how scores cluster around the mean
Deviation is the distance of a score from the mean
$X - \mu$, e.g. $11 - 6.35 = 4.65$, $3 - 6.35 = -3.35$
A measure representative of the variability of all the scores would be the mean of the deviation scores: add all the deviations and divide by n
$\frac{\sum (X - \mu)}{n}$
However, the deviation scores always add up to zero (as the mean serves as the balance point for the scores)
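A quick numerical check of this point, using the memory scores. The tolerance in the final comparison is there only because floating-point arithmetic can leave a tiny rounding residue.

```python
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

mean = sum(scores) / len(scores)

# Deviation of each score from the mean
deviations = [x - mean for x in scores]

# Positive and negative deviations cancel, so the total is (effectively) zero
total_deviation = sum(deviations)
print("sum of deviations is zero:", abs(total_deviation) < 1e-9)
```

Because the total is always zero, the mean deviation is zero for every data set and cannot be used to describe variability.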
Describing Variability: Variance

X      $X - \mu$    $(X - \mu)^2$
3      -3.35        11.22
3      -3.35        11.22
4      -2.35         5.52
4      -2.35         5.52
4      -2.35         5.52
5      -1.35         1.82
5      -1.35         1.82
5      -1.35         1.82
6      -0.35         0.12
6      -0.35         0.12
6      -0.35         0.12
6      -0.35         0.12
7       0.65         0.42
7       0.65         0.42
8       1.65         2.72
8       1.65         2.72
9       2.65         7.02
10      3.65        13.32
10      3.65        13.32
11      4.65        21.62
Sum     0          106.55
To remove the +/- signs we simply square each deviation before finding the average. This is called the variance:
$\frac{\sum (X - \mu)^2}{n} = \frac{106.55}{20} = 5.33$
The numerator is referred to as the Sum of Squares (SS), as it is the sum of the squared deviations around the mean value
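The same calculation can be sketched step by step in Python and compared with `statistics.pvariance`, which divides the sum of squares by the number of scores.

```python
import statistics

scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6]
n = len(scores)
mean = sum(scores) / n

# Square each deviation to remove the signs, then add them up (SS)
squared_deviations = [(x - mean) ** 2 for x in scores]
ss = sum(squared_deviations)

# Variance: the mean of the squared deviations
variance = ss / n

print(f"SS = {ss:.2f}, variance = {variance:.2f}")
print(f"statistics.pvariance gives {statistics.pvariance(scores):.2f}")
```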
Describing Variability: Population Variance

Population variance is designated by $\sigma^2$
$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{SS}{N}$
Sample variance is designated by $s^2$
Samples are less variable than populations: they therefore give biased estimates of population variability
Degrees of freedom (df): the number of independent (free to vary) scores. In a sample, the sample mean must be known before the variance can be calculated, so the final score is determined by the other scores: df = n - 1
$s^2 = \frac{\sum (X - M)^2}{n - 1} = \frac{SS}{n - 1} = \frac{106.55}{20 - 1} = 5.61$
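The bias can be demonstrated by enumeration. In the sketch below the 20 memory scores are treated, purely for illustration, as if they were an entire population. Every possible ordered sample of size 2, drawn with replacement, is then listed. Averaged over all of these samples, SS/n underestimates the population variance, whereas SS/(n - 1) matches it exactly. Exact fractions are used to avoid rounding error, and the helper `sum_of_squares` is a name invented for this example.

```python
from fractions import Fraction
from itertools import product

# Assumption for illustration only: treat the 20 scores as a whole population
population = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6]
N = len(population)
mu = Fraction(sum(population), N)
pop_var = sum((x - mu) ** 2 for x in population) / N


def sum_of_squares(sample):
    """Sum of squared deviations around the sample's own mean."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample)


n = 2
samples = list(product(population, repeat=n))   # all 400 ordered samples

# Average each estimator over every possible sample
mean_biased = sum(sum_of_squares(s) / n for s in samples) / len(samples)
mean_unbiased = sum(sum_of_squares(s) / (n - 1) for s in samples) / len(samples)

print("population variance:      ", float(pop_var))
print("average of SS / n:        ", float(mean_biased))
print("average of SS / (n - 1):  ", float(mean_unbiased))
```

Dividing by n - 1 rather than n inflates each sample variance just enough to remove the underestimate on average.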
Describing Variability: the Standard Deviation

Variance is a measure based on squared distances, so it is not in the original units of measurement
To get around this, we can take the square root of the variance, which gives us the standard deviation
Population ($\sigma$) and sample ($s$) standard deviation:
$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
$s = \sqrt{\frac{\sum (X - M)^2}{n - 1}}$
So for our memory score example we simply take the square root of the sample variance: $s = \sqrt{5.61} = 2.37$
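A brief sketch using the standard `statistics` module, in which `stdev` divides by n - 1 (sample formula) and `pstdev` divides by N (population formula):

```python
import math
import statistics

scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

sample_sd = statistics.stdev(scores)        # square root of SS / (n - 1)
population_sd = statistics.pstdev(scores)   # square root of SS / N

# The standard deviation is the square root of the matching variance
sd_from_variance = math.sqrt(statistics.variance(scores))

print(f"sample SD = {sample_sd:.2f}")
print(f"population SD = {population_sd:.2f}")
```

The standard deviation is in the same units as the scores themselves, so a sample SD of about 2.37 can be read as a typical distance of roughly 2.4 symbols from the mean.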
Describing Variability
The standard deviation is the most common measure of variability, but the others can be used.
A good measure of variability must be stable and reliable: it should not be greatly affected by little details in the data. Factors to consider:
  Extreme scores
  Multiple sampling from the same population
  Open-ended distributions
Both the variance and the SD are related to other statistical techniques
Descriptive statistics
A researcher is investigating short-term memory capacity. The number of symbols remembered is recorded for 20 participants:
4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6
What statistics can we display about these data, and what do they mean?
  Frequency table: shows how often different scores occur
  Frequency graph: gives information about the shape of the distribution
  Measures of central tendency and variability
References and Further Reading
Gravetter & Wallnau: Chapters 2, 3 and 4
