
6CN010 - Dissertation

Quantitative Data Analysis

Quantitative data analysis


Coding
Descriptive statistics
Measure of distribution
Measure of central tendency
Measure of dispersion

From survey to usable data


Processing the data collected for analysis requires coding.
Coding: converting data into numeric form for analysis.
Knowing how the data is going to be analysed is essential to designing surveys.
This has to be decided at the start, not after the data has been collected!

Coding
Check the level of measurement: make sure it is appropriate for the envisaged analysis.

Variables can be classified into types according to the level of mathematical scaling that can be carried out on the data.

4 types of data or levels of measurement:


1. Nominal

2. Ordinal

3. Interval

4. Ratio

Coding nominal variables


Nominal (categorical) data comprises categories that cannot be rank ordered: each category is just different.
There is no order to the categories, so coding can be in any order, but it is good practice to code in order of appearance.

Question                                  Categories
What is your gender? (please tick)        Male, Female
Did you enjoy the film? (please tick)     Yes, No

These are sometimes coded 0 and 1.
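As a minimal sketch of nominal coding, the snippet below maps the category labels from the two example questions to numbers. The specific codes (1 and 2, in order of appearance), the function name `code_responses` and the sample responses are assumptions made for illustration; they do not come from the original survey.

```python
# Hypothetical codebooks: labels are from the example questions,
# the numeric codes are arbitrary and assigned in order of appearance.
GENDER_CODES = {"Male": 1, "Female": 2}
ENJOYED_FILM_CODES = {"Yes": 1, "No": 2}


def code_responses(responses, codebook):
    """Replace each category label with its numeric code."""
    return [codebook[label] for label in responses]


# Made-up responses, for illustration only.
gender_responses = ["Female", "Male", "Female"]
coded_gender = code_responses(gender_responses, GENDER_CODES)
print(coded_gender)  # [2, 1, 2]
```

Because the data is nominal, the numbers are only labels: swapping the codes would not change any valid analysis.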

Coding ordinal variables


Ordinal data comprises categories that can be rank ordered.
Coding is similar to nominal data, but the categories are coded in rank order.
Ranking can run low to high or high to low.

Question: How satisfied are you with the level of service you have received? (please tick)
Very satisfied
Somewhat satisfied
Neutral
Somewhat dissatisfied
Very dissatisfied

Question: What is your highest level of qualification? (please tick)
Degree or higher
Level 3 or equivalent
Level 2 or equivalent
Level 1 or equivalent
No qualifications
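One way to code an ordinal variable is to list the categories in rank order and use each category's position as its code. The sketch below does this for the satisfaction question, running low to high; the direction, the variable names and the sample responses are assumptions for illustration.

```python
# Categories listed from lowest to highest rank (an assumed direction;
# high to low would work equally well if applied consistently).
SATISFACTION_ORDER = [
    "Very dissatisfied",
    "Somewhat dissatisfied",
    "Neutral",
    "Somewhat satisfied",
    "Very satisfied",
]

# Code each category by its rank: 1 = lowest, 5 = highest.
SATISFACTION_CODES = {
    label: rank for rank, label in enumerate(SATISFACTION_ORDER, start=1)
}

# Made-up responses, for illustration only.
responses = ["Neutral", "Very satisfied", "Somewhat dissatisfied"]
coded = [SATISFACTION_CODES[r] for r in responses]
print(coded)  # [3, 5, 2]
```

Unlike nominal codes, these numbers carry the ranking, so comparisons such as "higher than Neutral" are meaningful, although the gaps between codes are not.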

Coding interval and ratio variables


Scale data (interval and ratio data):
data is in numeric format (50, 100, 150)
can be measured on a continuous scale
the distance between values can be observed and, as a result, measured
data can be placed in rank order.
Coding scale data: just enter the value (age = 52, number of bedrooms in a house = 3).

Descriptive statistics
Descriptive statistics describe what the data is or what the
data shows.
Descriptive statistics are different from inferential statistics.
Inferential statistics are used to infer conclusions from the data and make generalisations to the population.
Descriptive statistics here means conducting analysis on one variable at a time, or univariate analysis.

Measures of distribution

Distribution is a summary, for each variable, of the frequency or number of times a value or range of values occurs.
Examples:
number and percentage of male and female participants
ages of research participants.

Frequency tables

(1/3)

A frequency table is one of the most common methods used to describe a single variable.
It is used to describe nominal or ordinal variables, that is, those with categories (yes and no, strongly agree to strongly disagree, and so on).
It shows the number and/or percentage of occurrences of each category within a variable.
Frequency distributions can be depicted in two ways:
table
graph

Frequency table

(2/3)

Example

Age range       Number    Percentage
Less than 20       150          19.9
20-49              250          33.1
50-64              180          23.8
65-80              100          13.3
Over 80             75           9.9
Total              755         100.0
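The percentage column of a frequency table is each category's count divided by the total, times 100. The sketch below recomputes it from the counts in the age-range example; the variable names are illustrative, and individual figures can differ from the slide by 0.1 depending on how rounding is done.

```python
# Counts taken from the age-range example.
counts = {
    "Less than 20": 150,
    "20-49": 250,
    "50-64": 180,
    "65-80": 100,
    "Over 80": 75,
}

total = sum(counts.values())  # 755

# Percentage of the total for each category, to one decimal place.
percentages = {
    label: round(100 * n / total, 1) for label, n in counts.items()
}

for label, n in counts.items():
    print(f"{label:<14}{n:>6}{percentages[label]:>8.1f}")
print(f"{'Total':<14}{total:>6}")
```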

Frequency table

(3/3)

Example: other things you may see

Age range                Number   Percentage   Valid percentage   Cumulative percentage
Less than 20                150         19.3               19.9                    19.9
20-49                       250         32.2               33.1                    53.0
50-64                       180         23.2               23.8                    76.8
65-80                       100         12.9               13.3                    90.1
Over 80                      75          9.8                9.9                   100.0
Total                       755         97.4              100.0                   100.0
Missing (not recorded)       20          2.6
Total                       775        100.0
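The extra columns differ only in their denominators: percentage uses all cases including missing ones, valid percentage uses only the cases with a recorded answer, and cumulative percentage is a running total of the valid percentage. A minimal sketch using the counts from the example follows; the names are illustrative, and the computed figures may differ from the slide in the last decimal place because of rounding.

```python
# Counts from the example: 755 valid responses plus 20 missing.
valid_counts = {
    "Less than 20": 150,
    "20-49": 250,
    "50-64": 180,
    "65-80": 100,
    "Over 80": 75,
}
missing = 20

n_valid = sum(valid_counts.values())  # 755
n_all = n_valid + missing             # 775

rows = []
running = 0  # running count of valid cases, used for the cumulative column
for label, n in valid_counts.items():
    running += n
    rows.append({
        "label": label,
        "number": n,
        "percent": round(100 * n / n_all, 1),            # of all cases
        "valid_percent": round(100 * n / n_valid, 1),    # of valid cases
        "cumulative_percent": round(100 * running / n_valid, 1),
    })

missing_percent = round(100 * missing / n_all, 1)

for row in rows:
    print(row)
print("Missing:", missing, missing_percent)
```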

Measures of central tendency


Measures of central tendency: quantification of the location of the middle or centre of a data set, that is, what the typical or average score/result of a data set is.
So, identifying a typical value that best summarises the distribution of values in a variable.
There are three main measures:

1 Mean: the average
Values: 180 220 280 320 380
\[ \bar{x} = \frac{\sum x}{n} = \frac{1380}{5} = 276 \]

2 Mode: the most frequently occurring value (280)
Values: 180 220 280 320 280 180 350 280 330 220
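The two calculations above can be checked with Python's standard `statistics` module. This is only a sketch using the values from the examples; the variable names are illustrative.

```python
import statistics

# Values from the mean example.
mean_data = [180, 220, 280, 320, 380]
mean_value = statistics.mean(mean_data)
print(mean_value)  # 276

# Values from the mode example.
mode_data = [180, 220, 280, 320, 280, 180, 350, 280, 330, 220]
mode_value = statistics.mode(mode_data)
print(mode_value)  # 280
```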

Measures of central tendency

(Contd)

2 Mode (contd)
Bi-modal: two most frequently occurring values in a
distribution (two pronounced views or patterns of response).
Multi-modal: where there are more than two modes in a
distribution (potentially several pronounced views or patterns
of response).
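A distribution with more than one mode can be detected with `statistics.multimode` (available in Python 3.8 and later), which returns every value that shares the highest frequency. The responses below are made up for illustration, coded 1 to 5 as an assumed satisfaction scale.

```python
import statistics

# Hypothetical coded responses: opinion is split between 5 and 1.
responses = [5, 5, 5, 1, 1, 1, 3]

# All values tied for most frequent, in order of first appearance.
modes = statistics.multimode(responses)
print(modes)  # [5, 1]

if len(modes) == 2:
    print("bi-modal")
elif len(modes) > 2:
    print("multi-modal")
```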

Measures of central tendency

(Contd)

3 Median
Median is the midpoint in a distribution, when arranged in ascending or descending order.
Values: 180 220 280 320 380
Median = 280

Where there is an even number of observations, the median will be the average of the two middle values.
Values: 180 220 280 300 320 380
Median = (280 + 300) / 2 = 290
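Both median cases, odd and even numbers of observations, can be reproduced with `statistics.median`, which sorts the data and averages the two middle values when needed. The variable names below are illustrative.

```python
import statistics

# Odd number of observations: the middle value is the median.
odd_values = [180, 220, 280, 320, 380]
median_odd = statistics.median(odd_values)
print(median_odd)  # 280

# Even number of observations: average of the two middle values.
even_values = [180, 220, 280, 300, 320, 380]
median_even = statistics.median(even_values)
print(median_even)  # 290.0
```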

Appropriate measure

Level of measurement    Measure of central tendency
Nominal                 Mode
Ordinal                 Median and mode
Interval/Ratio          Mean, median and mode

Measures of dispersion
Measures of dispersion: statistical measures that summarise
the amount of spread or variation in the distribution of values
in a variable.
So, how values are spread within a distribution.
There are a number of different measures (applicable to
interval or ratio data):
Range
Standard deviation
Variance

Measures of dispersion
Type                  Description
Range                 Difference between the highest (maximum) and lowest (minimum) value in the distribution of values.
Variance              The measure of the spread.
Standard deviation    Shows the relation that a set of data has to the mean of the sample data.

Range
Range is the difference between the highest and lowest value
in the distribution of values.
Example:
Weekly income of 10 people:
180 220 280 320 280 180 310 280 330 220

Range is maximum income minus minimum income: 330 - 180 = 150.
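The same calculation as a short sketch, using the ten weekly incomes from the example (the variable names are illustrative):

```python
# Weekly income of 10 people, from the example.
incomes = [180, 220, 280, 320, 280, 180, 310, 280, 330, 220]

# Range = maximum value minus minimum value.
income_range = max(incomes) - min(incomes)
print(income_range)  # 150
```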

Range using ordinal data


Of course, ordinal data can be ordered and so can give information on range.
Example:
Survey question: How useful did you find the book?
Responses: Very useful, Very unuseful, Useful, Unuseful, Very unuseful, Useful, Very useful, Very useful, Useful, Unuseful

Range is from Very useful to Very unuseful.

Inter quartile range


Inter quartile range (IQR) is another range measure, but this time it looks at the data in terms of quarters or percentiles.
The ordered data is divided into four equal quarters, each containing 25% of the observations.
[Figure: a line from Min to Max marking Q1 (25th percentile), Q2 (median, 50th percentile) and Q3 (75th percentile). The IQR spans Q1 to Q3; the range spans Min to Max.]

Inter Quartile Range


IQR is the range of the middle 50% of the data. Therefore, because it uses only the middle 50%, it is not affected by minimum or maximum values (outliers).
Outliers: values at the extreme lower or upper end of the distribution. They are atypical, infrequent observations.
These will influence the (arithmetic) mean. Why?
10 people record their height:
160, 162, 164, 166, 168, 170, 172, 174, 176 and 200 cm tall.
With those values the mean is 171 cm.
(200 cm is the outlier: take it out and the mean is 168 cm.)
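The height example can be used to compare how much one outlier moves the mean, the range and the IQR. The sketch below uses `statistics.quantiles` (Python 3.8 and later) with its default method; other software may use a different quartile convention and give slightly different quartiles, so treat the IQR values as indicative. The helper names `iqr` and `value_range` are illustrative.

```python
import statistics

# Heights in cm from the example; 200 is the outlier.
heights_cm = [160, 162, 164, 166, 168, 170, 172, 174, 176, 200]
without_outlier = heights_cm[:-1]  # drop the 200 cm observation


def iqr(values):
    """Q3 minus Q1, using the default method of statistics.quantiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def value_range(values):
    """Maximum minus minimum."""
    return max(values) - min(values)


print(statistics.mean(heights_cm))       # 171.2
print(statistics.mean(without_outlier))  # 168
print(value_range(heights_cm), value_range(without_outlier))  # 40 16
print(iqr(heights_cm), iqr(without_outlier))
```

Removing the outlier shifts the mean by about 3 cm and cuts the range from 40 to 16, while the IQR changes far less, which is the sense in which it is resistant to outliers.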

Variance
Where the mean is a measure of the centre of a group of
numbers, the variance is the measure of the spread.
It involves measuring the distance between each of the
values and the mean.
To calculate the variance:
1. calculate the mean
2. for each value in the distribution, subtract the mean and then square the result (the squared difference)
3. calculate the average of those squared differences (for sample data, divide their sum by N - 1 rather than N).

Variance

\[ \text{Variance} = \frac{\sum (X - \bar{X})^2}{N - 1} \]

In words: variance = sum of (observed value - mean score)^2 / (total number of scores - 1)

The larger the variance value, the further the observed values of the data set are dispersed from the mean.
A variance value of zero means all observed values are the same as the mean.
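The three steps can be followed literally in code. This sketch reuses the five values from the mean example earlier (an arbitrary choice for illustration) and divides by N - 1, then checks the result against `statistics.variance`, which also uses the N - 1 denominator.

```python
import statistics

values = [180, 220, 280, 320, 380]

# Step 1: calculate the mean.
mean = sum(values) / len(values)  # 276.0

# Step 2: subtract the mean from each value and square the result.
squared_diffs = [(x - mean) ** 2 for x in values]

# Step 3: divide the sum of squared differences by N - 1.
sample_variance = sum(squared_diffs) / (len(values) - 1)
print(sample_variance)  # 6280.0

# Cross-check against the standard library's sample variance.
print(statistics.variance(values))  # 6280
```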

Standard deviation
Standard deviation = the square root of the variance.

\[ \text{Standard deviation} = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} \]

Because the square root is taken, the result is in the original data units. E.g. if the variable is height recorded in cm, then the standard deviation can be interpreted in cm.
Standard deviation: how far on average each value is from the mean.
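As a sketch of the point about units, the snippet below takes the ten heights in cm from the outlier example, computes the sample variance (in cm squared) and then its square root (back in cm). The variable names are illustrative.

```python
import math
import statistics

# Heights in cm, from the earlier outlier example.
heights_cm = [160, 162, 164, 166, 168, 170, 172, 174, 176, 200]

# Sample variance (N - 1 denominator); its units are cm squared.
variance_cm2 = statistics.variance(heights_cm)

# Square root of the variance; its units are cm again.
std_dev_cm = math.sqrt(variance_cm2)
print(round(std_dev_cm, 1))

# statistics.stdev performs the same calculation in one call.
print(round(statistics.stdev(heights_cm), 1))
```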

Appropriate descriptive statistics: summary


Level of measurement: Nominal
Univariate analysis:
Frequency table: count, %, valid %, cumulative %.
Measure of central tendency: mode.
Measure of dispersion: no measure.

Level of measurement: Ordinal
Univariate analysis:
Frequency table: count, %, valid %, cumulative %.
Measure of central tendency: mode, median.
Measure of dispersion: no measure.

Level of measurement: Interval/Ratio
Univariate analysis:
Frequency table: count, %, valid %, cumulative %.
Measure of central tendency: mode, median, mean.
Measure of dispersion: range, variance, standard deviation.

Further Reading
Creswell, J.W. (1994) Research Design: Qualitative and Quantitative Approaches. London: Sage Publications, pp. 116-171.
Holt, G. (1998) A Guide to Successful Dissertation Study for Students of the Built Environment, 2nd edition. Wolverhampton: Built Environment Research Unit. ISBN: 1-902010-01-9, pp. 100-118.
Naoum, S.G. (2007) Dissertation Research and Writing for Construction Students, 2nd edition. Oxford: Butterworth-Heinemann. ISBN: 0 7506 2988 6, pp. 91-131.
