Computing in Archaeology
Session 11. Correlation and regression analysis
© Richard Haddlesey www.medievalarchitecture.net
Lecture aims
• To introduce correlation and regression techniques
The scattergram
• In correlation we are always dealing with paired scores, so the values of the two variables, taken together, are used to make a scattergram
Example
• Quantities of New Forest pottery recovered from sites at varying distances from the kilns

Site   Distance (km)   Quantity
1      4               98
2      20              60
3      32              41
4      34              47
5      24              62
Negative correlation

Here we can see that the quantity of pottery decreases as distance from the source increases
Positive correlation

Here we see that the taller a pot, the wider the rim
Curvilinear monotonic relation

Again, the further from the source, the smaller the quantity of artefacts; here the relationship follows a curve rather than a straight line
Arched relationship (non-monotonic)

Here we see that the first molar increases with age and is then worn down as the animal gets older
Scattergram
• This shows us that scattergrams are the most important means of studying relationships between two variables
REGRESSION
• Regression differs from the other techniques we have looked at so far in that it is concerned not just with whether a relationship exists, or with the strength of that relationship, but with its nature

• In regression analysis we use an independent variable to estimate (or predict) the values of a dependent variable
Regression equation
y = f(x)

• y = y axis (in this case the dependent variable)

• f = function (of x)

• x = x axis

Example functions: y = x, y = 2x, y = x²
General linear equations
• y = a + bx

• Where y is the dependent variable, x is the independent variable, and the coefficients a and b are constants, i.e. they are fixed for a given data set
Therefore:
• If x = 0 then the equation reduces to y = a, so a represents the point where the regression line crosses the y axis (the intercept)

• The b constant defines the slope or gradient of the regression line

• Thus for the pottery quantity in relation to distance from source, b represents the amount by which pottery quantity decreases for each unit increase in distance from the source
Least-squares
• The least-squares regression line has the form y = a + bx
• For the pottery data: y = 102.64 – 1.8x
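
The fitted line can be checked against the pottery table. The following is a minimal sketch, assuming the usual least-squares formulas for a and b (the slides do not show them); the function name `least_squares` and the 10 km prediction are illustrative only. With unrounded b the intercept comes out close to 102.69; the 102.64 quoted above is what results if b is first rounded to 1.8.

```python
# Pottery data from the table: distance from the kilns (km) and quantity.
distance_km = [4, 20, 32, 34, 24]
quantity = [98, 60, 41, 47, 62]


def least_squares(x, y):
    """Return (a, b) for the line y = a + b*x that minimises the squared residuals."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope: covariance-like term divided by the spread of x.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: the line passes through the point (mean of x, mean of y).
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = least_squares(distance_km, quantity)
print(f"y = {a:.2f} + ({b:.2f})x")

# Predicted quantity at a hypothetical site 10 km from the kilns.
predicted = a + b * 10
print(f"predicted quantity at 10 km: {predicted:.1f}")
```

The negative sign of b matches the negative correlation seen in the scattergram: each extra kilometre from the kilns lowers the expected quantity by roughly 1.8.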
CORRELATION

1 correlation coefficient
• r
• ranges from -1 to +1

2 significance
Levels of measurement:

• nominal – in name only

• ordinal – forming a sequence

• interval – a sequence with fixed distances

• ratio – fixed distances with a datum point


Levels of measurement and the appropriate correlation coefficient:

• nominal

• ordinal – Spearman’s Rank Correlation Coefficient

• interval – Product-Moment Correlation Coefficient

• ratio – Product-Moment Correlation Coefficient
The Product-Moment Correlation Coefficient
sample – 20 bronze spearheads
variables: length (cm) and width (cm)

n = 20

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} = +0.67 $$
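
The formula for r translates directly into code. The spearhead measurements are not reproduced here, so this sketch applies the same formula to the pottery data from the earlier example instead; the function name `pearson_r` is an illustrative choice.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation coefficient, computed from the raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Pottery data from the earlier table (not the spearhead sample).
distance_km = [4, 20, 32, 34, 24]
quantity = [98, 60, 41, 47, 62]

r = pearson_r(distance_km, quantity)
print(f"r = {r:+.2f}")
```

For the pottery data r is strongly negative, as the scattergram suggests; for the spearheads the lecture reports r = +0.67.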
Test of product moment correlation coefficient

H0 : true correlation coefficient = 0

H1 : true correlation coefficient ≠ 0

Assumptions: both variables approximately normally distributed

Sample statistics needed: n and r

Test statistic: TS = r

Table: product moment correlation coefficient table.


For the spearhead sample (length against width): n = 20, r = 0.67, p < 0.01
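
The lecture looks r up directly in a product moment correlation coefficient table. An equivalent route, sketched here as an assumption rather than as the lecture's own method, converts r to a t statistic with n - 2 degrees of freedom and compares it with a tabulated critical value. The value 2.878 is the standard two-tailed 0.01 critical value of t for 18 degrees of freedom; the names `t_from_r` and `T_CRIT_01_DF18` are illustrative.

```python
from math import sqrt


def t_from_r(r, n):
    """Convert a correlation coefficient to a t statistic with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r ** 2))


n, r = 20, 0.67  # values reported for the spearhead sample
t = t_from_r(r, n)

# Two-tailed critical value of t at the 0.01 level for 18 degrees of freedom,
# taken from a standard t table (assumption: the lecture uses an r table instead).
T_CRIT_01_DF18 = 2.878

# The same threshold expressed as a critical value of r.
r_crit = T_CRIT_01_DF18 / sqrt(T_CRIT_01_DF18 ** 2 + (n - 2))

significant = abs(t) > T_CRIT_01_DF18
print(f"t = {t:.2f}, critical r = {r_crit:.3f}, reject H0 at 0.01: {significant}")
```

Because 0.67 exceeds the critical r, H0 is rejected at the 0.01 level, which agrees with the p < 0.01 reported for the spearheads.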
Spearman’s Rank Correlation Coefficient (rs)

H0 : true correlation coefficient = 0

H1 : true correlation coefficient ≠ 0

Assumptions: both variables at least ordinal

Sample statistics needed: n and rs

Test statistic: TS = rs
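
The slides do not show how rs is calculated. A minimal sketch, assuming the standard formula for untied ranks, rs = 1 - 6Σd² / (n(n² - 1)), where d is the difference between the two ranks of each pair. It is applied here to the pottery data from the earlier example; the helper names `ranks` and `spearman_rs` are illustrative.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; this simple version assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """Spearman's rank correlation coefficient for data without tied values."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Pottery data from the earlier table.
distance_km = [4, 20, 32, 34, 24]
quantity = [98, 60, 41, 47, 62]

rs = spearman_rs(distance_km, quantity)
print(f"rs = {rs:+.2f}")
```

Only the rank order of the values is used, which is why the coefficient needs data that are at least ordinal.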
