
Introduction to statistics I

Sophia King Rm. P24 HWB sk219@le.ac.uk

Using statistics in Psychology


• Carrying out psychological research involves collecting data. Statistics are a way of making use of this data
• Descriptive statistics: used to describe the characteristics of our sample (statistics describe samples)
• Inferential statistics: used to generalise from our sample to our population (parameters describe populations)
• Any samples used should therefore be representative of the target population

Descriptive Statistics
• Statistical procedures used to summarise, organise, and simplify data. This should be done in a way that reflects the overall findings
  - Raw data is made more manageable
  - Raw data is presented in a logical form
  - Patterns can be seen in organised data
• Techniques: frequency tables, graphical techniques, measures of central tendency, measures of spread (variability)

Plotting Data: describing spread of data


• A researcher is investigating short-term memory capacity: the number of symbols remembered is recorded for each of 20 participants:
4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6

• We can describe our data using a frequency distribution, presented either as a table or as a graph. It always presents:
  - The set of categories that made up the original measurement scale
  - The frequency of each score/category
• Three important characteristics: shape, central tendency, and variability

Frequency Distribution Tables

X     f     fX
11    1     11
10    2     20
9     1     9
8     2     16
7     2     14
6     4     24
5     3     15
4     3     12
3     2     6

• The highest score is placed at the top
• All observed scores are listed
• Gives information about distribution, variability, and centrality
• X = score value; f = frequency; fX = the total of all scores with that value (X multiplied by its frequency)
• $\sum f = N$ (here 20) and $\sum X = \sum fX$ (here 127)
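The frequency table above can be built mechanically from the raw scores. The following is a minimal Python sketch (not part of the original slides), assuming the 20 memory scores listed earlier; `frequency_table` is a hypothetical helper name, and `collections.Counter` does the counting.

```python
from collections import Counter

# The 20 memory scores from the example above
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6]


def frequency_table(data):
    """Return (X, f, fX) rows for every observed score, highest score first."""
    counts = Counter(data)
    return [(x, counts[x], x * counts[x]) for x in sorted(counts, reverse=True)]


table = frequency_table(scores)
for x, f, fx in table:
    print(f"{x:>3} {f:>3} {fx:>4}")

# The totals should agree with the slide: sum of f = N, sum of fX = sum of X
print(sum(f for _, f, _ in table), sum(fx for _, _, fx in table))  # 20 127
```

Only observed scores appear here. Every value from 3 to 11 occurs in this data set, so the rows match the table.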

Frequency Table Additions

X     f     fX    p       %
11    1     11    0.05    5%
10    2     20    0.10    10%
9     1     9     0.05    5%
8     2     16    0.10    10%
7     2     14    0.10    10%
6     4     24    0.20    20%
5     3     15    0.15    15%
4     3     12    0.15    15%
3     2     6     0.10    10%

• Frequency tables can display more detailed information about the distribution, such as percentages and proportions
• p = the fraction of the total group associated with each score (relative frequency): $p = f/N$
• As a percentage: $p(100) = 100(f/N)$

• What does this tell us about the distribution of scores?
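The p and % columns follow directly from p = f/N. Here is a minimal Python sketch (not from the original slides), assuming the same 20 memory scores; `relative_frequencies` is a hypothetical helper.

```python
from collections import Counter

# The 20 memory scores from the example above
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6]


def relative_frequencies(data):
    """Return (X, p, %) rows, highest score first, where p = f / N."""
    n = len(data)
    counts = Counter(data)
    return [(x, counts[x] / n, 100 * counts[x] / n) for x in sorted(counts, reverse=True)]


rows = relative_frequencies(scores)
for x, p, pct in rows:
    print(f"{x:>3}  p = {p:.2f}  {pct:.0f}%")
```

Because every p is f divided by the same N, the proportions always sum to 1 (and the percentages to 100%).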

Grouped Frequency Distribution Tables


• Sometimes the scores cover too wide a range to list every value individually
• Grouped tables present scores as class intervals
  - About 10 intervals
  - The interval width should be a simple round number (2, 5, 10, etc.), and all intervals should have the same width
  - The bottom score of each interval should be a multiple of the width

X        f
95-99    1
90-94    1
85-89    0
80-84    1
75-79    2
70-74    4
65-69    7
60-64    0
55-59    6
50-54    3

• Class intervals represent a continuous variable X:
  - e.g. a score of 51 is bounded by the real limits 50.5 and 51.5
  - If X = 8 and f = 3, this does not mean the three individuals had identical scores: they all fell somewhere between 7.5 and 8.5
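The grouping rules above (equal widths, bottom score a multiple of the width) and real limits can be sketched in Python. This is an illustration only: the raw scores behind the 50-99 table are not given, so it regroups the 20 memory scores into intervals of width 2, and it assumes integer scores measured in whole units. `grouped_frequencies` is a hypothetical helper.

```python
from collections import Counter

# The 20 memory scores from the earlier example (integer scores)
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6]


def grouped_frequencies(data, width):
    """Group integer scores into equal-width class intervals.

    Each interval's bottom score is a multiple of `width`. Rows are returned
    highest interval first as (bottom, top, lower real limit, upper real limit, f).
    Empty intervals inside the range are kept with f = 0.
    """
    counts = Counter((x // width) * width for x in data)
    lowest = (min(data) // width) * width
    highest = (max(data) // width) * width
    rows = []
    for bottom in range(highest, lowest - 1, -width):
        top = bottom + width - 1
        # Real limits sit half a unit beyond the interval's end scores
        rows.append((bottom, top, bottom - 0.5, top + 0.5, counts[bottom]))
    return rows


for bottom, top, lower, upper, f in grouped_frequencies(scores, 2):
    print(f"{bottom}-{top}  real limits {lower}-{upper}  f = {f}")
```

Width 2 gives only five intervals for this small data set; with wider-ranging scores such as the 50-99 example, a width of 5 gives the recommended ten or so.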

Percentiles and Percentile Ranks

X     f     cf    c%
11    1     20    100%
10    2     19    95%
9     1     17    85%
8     2     16    80%
7     2     14    70%
6     4     12    60%
5     3     8     40%
4     3     5     25%
3     2     2     10%

• X values are raw scores, without context
• Percentile rank = the percentage of the sample with scores at or below a particular value
• This can be represented by a cumulative frequency (cf) column
• Cumulative percentage is obtained by: $c\% = \frac{cf}{N}(100)$
• This gives information about a score's relative position in the distribution
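The cf and c% columns can be reproduced by accumulating frequencies from the lowest score upwards. A minimal Python sketch, not from the original slides, assuming the 20 memory scores and the "at or below" definition of percentile rank used here; `cumulative_table` is a hypothetical helper.

```python
from collections import Counter

# The 20 memory scores from the earlier example
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6]


def cumulative_table(data):
    """Return (X, f, cf, c%) rows, highest score first.

    cf counts the scores at or below X; c% = cf / N * 100.
    """
    n = len(data)
    counts = Counter(data)
    rows = []
    cf = 0
    for x in sorted(counts):           # accumulate from the lowest score upwards
        cf += counts[x]
        rows.append((x, counts[x], cf, 100 * cf / n))
    return rows[::-1]                  # list the highest score at the top


table = cumulative_table(scores)
for x, f, cf, cpct in table:
    print(f"{x:>3} {f:>3} {cf:>3} {cpct:>6.0f}%")
```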

Representing data as graphs


• A frequency distribution graph presents all the information available in a frequency table (it can also be drawn for a grouped frequency table)
• Uses histograms
• Bar width corresponds to the real limits of the intervals
• Histograms can be modified to include blocks representing individual scores

[Figure: histogram of the memory scores (x-axis: memory score; y-axis: frequency)]
[Figure: histogram of grouped scores (x-axis: score; y-axis: frequency)]

Frequency Distribution Polygons


• A polygon shows the same information with lines: it traces the shape of the distribution
• Both histograms and polygons represent continuous data
• For non-numerical data (nominal or ordinal scales), a frequency distribution can be represented by a bar graph
• Bar graphs have spaces between adjacent bars to show that the categories are distinct

[Figure: bar graph with categories phone numbers, historical dates, family dates]

Frequencies of Populations and Samples


• Population: all the individuals of interest to the study
• Sample: the particular group of participants you are testing, selected from the population
• Graphs of population distributions are possible, but unlike graphs of sample distributions, exact frequencies are not normally available. However, you can:
  - Display graphs of relative frequencies (categorical data)
  - Use smooth curves to indicate relative frequencies (interval or ratio data)

Frequency Distribution: the Normal Distribution


• Bell-shaped: a specific shape that can be defined by an equation
• Symmetrical around the midpoint, where the greatest frequency of scores occurs

• The tails of the perfect curve are asymptotic: they approach but never quite meet the horizontal axis
• A normal distribution is an assumption of parametric testing
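For reference, the equation that defines the normal curve is the normal probability density function. In the standard form below, μ is the mean and σ the standard deviation of the distribution; this is the general textbook formula rather than something derived in these slides.

```latex
f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X - \mu)^2}{2\sigma^2}}
```

The squared term $(X - \mu)^2$ makes the curve symmetrical about μ, and because the exponential is always positive the curve never reaches zero, which is why the tails never meet the horizontal axis.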

Frequency Distribution: Different Distribution shapes

Measures of Central Tendency


• A way of summarising the data using a single value that is representative of the entire data set
• No single procedure always produces a good representative value: the best choice depends on the shape of the distribution
• Mode
  - The most frequent value
  - Does not take into account the exact values of all the scores
  - Unaffected by extreme scores
  - Not useful when several values occur equally often in a set

Measures of Central Tendency


• Median
  - The value that falls exactly at the midpoint of a ranked distribution
  - Does not take into account the exact values of all the scores
  - Unaffected by extreme scores
  - In a small set it can be unrepresentative
• Mean (arithmetic average)
  - Sample mean: $M = \frac{\sum X}{n}$; population mean: $\mu = \frac{\sum X}{N}$
  - Takes into account all values
  - Easily distorted by extreme values

Measures of Central Tendency

• For our set of memory scores:

4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6

• Mode = 6, median = 6, mean = 6.35
• The mean is the preferred measure of central tendency, except with:
  - Extreme scores or skewed distributions
  - Data not measured on an interval (or ratio) scale
  - Discrete variables
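The three values above can be checked with Python's standard `statistics` module. A minimal sketch, not part of the original slides, assuming the 20 memory scores:

```python
import statistics

# The 20 memory scores from the example above
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

mode = statistics.mode(scores)      # most frequent value
median = statistics.median(scores)  # n = 20 is even, so the mean of the 10th and 11th ranked scores
mean = statistics.mean(scores)      # sum of X divided by n
print(mode, median, mean)           # 6 6.0 6.35

# multimode lists every most-frequent value, which helps when several values tie
print(statistics.multimode([3, 3, 5, 5, 8]))  # [3, 5]
```

`statistics.mode` returns only one value, so `multimode` is the safer choice for the "several values occur equally often" case noted above.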

Central Tendencies and Distribution Shape

Describing Variability
• Variability gives an exact quantitative measure of how spread out or clustered together the scores are
• Variability is usually defined in terms of distance:
  - How far apart scores are from each other
  - How far apart scores are from the mean
  - How representative a score is of the data set as a whole

Describing Variability: the Range


• The simplest and most obvious way of describing variability

$$\text{Range} = X_{\text{highest}} - X_{\text{lowest}}$$


• The range takes into account only the two extreme scores and ignores all the values in between. To counter this, the distribution is divided into quarters (quartiles): Q1 is the 25th percentile, Q2 the 50th (the median) and Q3 the 75th
• The interquartile range is the distance covered by the middle two quarters of the distribution: $Q_3 - Q_1$
• The semi-interquartile range is one half of the interquartile range
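A minimal Python sketch of the range, interquartile range and semi-interquartile range for the memory scores (not from the original slides). It uses `statistics.quantiles`, whose default `method="exclusive"` is only one of several quartile conventions; other conventions (e.g. `method="inclusive"`) can give slightly different Q1 and Q3 values.

```python
import statistics

# The 20 memory scores from the earlier example
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

score_range = max(scores) - min(scores)         # X_highest - X_lowest
# quantiles(n=4) returns the three cut points Q1, Q2, Q3 (default method="exclusive")
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1                                   # spread of the middle 50% of scores
siqr = iqr / 2                                  # semi-interquartile range
print(score_range, q1, q2, q3, iqr, siqr)       # 8 4.25 6.0 8.0 3.75 1.875
```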

Describing Variability: Deviation

• A more sophisticated measure of variability is one that shows how scores cluster around the mean
• Deviation is the distance of a score from the mean: $X - \mu$, e.g. $11 - 6.35 = 4.65$ and $3 - 6.35 = -3.35$

• A measure representative of the variability of all the scores would be the mean of the deviation scores, $\frac{\sum (X - \mu)}{n}$: add all the deviations and divide by n
• However, the deviation scores always add up to zero (the mean serves as the balance point for the scores), so this mean deviation is always zero
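The cancelling-out of deviations is easy to check numerically. A minimal Python sketch (not from the original slides), assuming the 20 memory scores; the sum comes out as zero apart from tiny floating-point rounding error.

```python
# The 20 memory scores from the earlier example
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

mean = sum(scores) / len(scores)          # 6.35
deviations = [x - mean for x in scores]   # deviation score X - mean for each participant
print(round(deviations[scores.index(11)], 2), round(deviations[scores.index(3)], 2))  # 4.65 -3.35

# Positive and negative deviations cancel, so their total (and hence their mean) is zero
print(abs(sum(deviations)) < 1e-9)        # True
```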

Describing Variability: Variance


X       X - μ     (X - μ)²
3       -3.35     11.22
3       -3.35     11.22
4       -2.35     5.52
4       -2.35     5.52
4       -2.35     5.52
5       -1.35     1.82
5       -1.35     1.82
5       -1.35     1.82
6       -0.35     0.12
6       -0.35     0.12
6       -0.35     0.12
6       -0.35     0.12
7       0.65      0.42
7       0.65      0.42
8       1.65      2.72
8       1.65      2.72
9       2.65      7.02
10      3.65      13.32
10      3.65      13.32
11      4.65      21.62
Sum     0         106.55

• To remove the +/- signs, we square each deviation before finding the average. This is called the variance:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{106.55}{20} = 5.33$$

• The numerator is called the Sum of Squares (SS), because it is the sum of the squared deviations from the mean
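The Sum of Squares and the variance above can be reproduced in a few lines of Python. This sketch, not from the original slides, assumes the 20 memory scores and divides by the number of scores, as on this slide. It also cross-checks SS with the computational formula $SS = \sum X^2 - (\sum X)^2 / N$, a standard equivalent form that the slides do not show.

```python
# The 20 memory scores from the earlier example
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

n = len(scores)
mean = sum(scores) / n
ss = sum((x - mean) ** 2 for x in scores)        # Sum of Squares: squared deviations from the mean
variance = ss / n                                # mean squared deviation (dividing by N)
print(round(ss, 2), round(variance, 2))          # 106.55 5.33

# Cross-check with the computational formula SS = sum(X^2) - (sum X)^2 / N
ss_computational = sum(x * x for x in scores) - sum(scores) ** 2 / n
print(round(ss_computational, 2))                # 106.55
```

The computational form avoids subtracting the mean from every score, which is convenient by hand when the mean is not a whole number.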

Describing Variability: Population Variance


• Population variance is designated by $\sigma^2$: $\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{SS}{N}$
• Sample variance is designated by $s^2$
• Samples tend to be less variable than populations, so dividing by n gives a biased estimate that tends to underestimate population variability
• Degrees of freedom (df): the number of independent (free to vary) scores. In a sample, the sample mean must be known before the variance can be calculated, so once M is fixed the final score is determined by the earlier scores: $df = n - 1$

$$s^2 = \frac{\sum (X - M)^2}{n - 1} = \frac{SS}{n - 1} = \frac{106.55}{20 - 1} = 5.61$$
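A minimal Python sketch (not from the original slides) of the sample variance with n - 1 degrees of freedom, assuming the 20 memory scores. Python's `statistics.variance` follows the same n - 1 convention, while `statistics.pvariance` divides by N.

```python
import statistics

# The 20 memory scores from the earlier example
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

n = len(scores)
m = sum(scores) / n
ss = sum((x - m) ** 2 for x in scores)
df = n - 1                                   # one score is fixed once M is known
sample_variance = ss / df
print(round(sample_variance, 2))             # 5.61

# Standard library equivalents: variance() divides by n - 1, pvariance() divides by N
print(round(statistics.variance(scores), 2), round(statistics.pvariance(scores), 2))  # 5.61 5.33
```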

Describing Variability: the Standard Deviation


• Variance is a measure based on squared distances
• To express variability in the original units of measurement, we can take the square root of the variance, which gives us the standard deviation
• Population ($\sigma$) and sample ($s$) standard deviation:

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \qquad s = \sqrt{\frac{\sum (X - M)^2}{n - 1}}$$

• So for our memory score example, we simply take the square root of the sample variance: $s = \sqrt{5.61} = 2.37$
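The standard deviation is just the square root of the variance, which a minimal Python sketch (not from the original slides) can confirm for the memory scores. `statistics.stdev` uses the n - 1 formula for s; `statistics.pstdev` uses N.

```python
import math
import statistics

# The 20 memory scores from the earlier example
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

s = statistics.stdev(scores)        # sample SD: square root of SS / (n - 1)
sigma = statistics.pstdev(scores)   # population-formula SD: square root of SS / N
print(round(s, 2), round(sigma, 2)) # 2.37 2.31

# Same as taking the square root of the sample variance directly
print(math.isclose(s, math.sqrt(statistics.variance(scores))))  # True
```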

Describing Variability

• The standard deviation is the most common measure of variability, but the others can also be used. A good measure of variability must be stable and reliable: it should not be greatly affected by minor details in the data, such as:
  - Extreme scores
  - Repeated sampling from the same population (different samples should give similar values)
  - Open-ended distributions
• Both the variance and the SD are related to other statistical techniques

Descriptive statistics
• A researcher is investigating short-term memory capacity: the number of symbols remembered is recorded for each of 20 participants:
4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6

• What statistics can we display about this data, and what do they mean?
  - Frequency table: shows how often different scores occur
  - Frequency graph: gives information about the shape of the distribution
  - Measures of central tendency and variability

Descriptive statistics
X     f     fX    p       %
11    1     11    0.05    5%
10    2     20    0.10    10%
9     1     9     0.05    5%
8     2     16    0.10    10%
7     2     14    0.10    10%
6     4     24    0.20    20%
5     3     15    0.15    15%
4     3     12    0.15    15%
3     2     6     0.10    10%

[Figure: frequency graph of the memory scores (x-axis: memory score, 0 to 12)]

References and Further Reading

• Gravetter & Wallnau: Chapters 2, 3 and 4
