Chapter 4
Elementary Statistics
Descriptive Statistics
- Planning and designing the study
- Collecting data
- Describing and summarising data
- Presenting data and summaries as information
Types of data
- Nominal or categorical (qualitative), e.g. name: cannot rank or measure difference
- Ordinal, e.g. position in FTSE or Dow Jones: can rank but not measure difference
- Cardinal (quantitative), e.g. age of person: can rank and measure difference (e.g. twice as old)
- Discrete (e.g. name of country, price)
- Continuous (e.g. time, weight)
Range of values
Variable        Possible range of values
Colours         [Red, Green, Blue]
…               [1, 31]
…               [0, +∞]
Bank balance    [-∞, +∞]
Sampling
- All possible data is the population (N items)
- A sample is a subset of a population (n < N items)
- A list of items in a population is a sample frame (e.g. electoral register)
- A sample should represent the population (i.e. be unbiased)
Types of sampling
- Judgemental (non-random): e.g. expert opinion
- Quota (non-random): items selected from subsets of population (e.g. males and females)
- Random: items selected at random
- Stratified: quota sampling with random selection
- Cluster/multistage: random samples selected from random clusters (e.g. electoral districts)
- With or without replacement (sampled item is returned or not returned to sample frame)
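The random, stratified and with/without replacement ideas above can be sketched in a few lines of Python. This is only an illustration: the function names, the 100-person sample frame and the 5/5 quotas are hypothetical and do not come from the chapter.

```python
import random

def simple_random_sample(frame, n, replace=False, seed=None):
    """Draw n items at random from the sample frame."""
    rng = random.Random(seed)
    if replace:
        # With replacement: a drawn item is returned to the frame
        # and may be drawn again.
        return [rng.choice(frame) for _ in range(n)]
    # Without replacement: each item can appear at most once.
    return rng.sample(frame, n)

def stratified_sample(frame, stratum_of, quotas, seed=None):
    """Draw a random sample of the given quota size from each stratum."""
    rng = random.Random(seed)
    sample = []
    for stratum, quota in quotas.items():
        members = [item for item in frame if stratum_of(item) == stratum]
        sample.extend(rng.sample(members, quota))
    return sample

# Hypothetical sample frame: 100 people with alternating sex labels.
frame = [{"id": i, "sex": "F" if i % 2 else "M"} for i in range(100)]

srs = simple_random_sample(frame, 10, seed=1)
strat = stratified_sample(frame, lambda p: p["sex"], {"F": 5, "M": 5}, seed=1)
print(len(srs), len(strat))
```

Passing a seed makes the draw repeatable, which is convenient when a sample has to be audited later.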
Sampling methods
- Observation, e.g. traffic survey: 2000 cars pass this road from 8 to 9 AM
- Longitudinal, e.g. social trends
- Experiments, e.g. drug trials
- Surveys, e.g. preferences for products
  - interviews
  - questionnaires
  - panels, e.g. focus groups, Delphi
Survey errors
- Coverage error: the sample frame is inadequate (e.g. with an out-of-date phone directory, people may be excluded from the survey).
- Non-response error: not everyone responds to surveys, so bias may occur. Follow up non-responses with further visits, telephone calls, letters, email, etc.
- Sampling error: cost limits sample size and chance dictates who or which item is included in the sample, so we make statements about the margin of (sampling) error (e.g. the results of a poll will be within 2 percentage points of the actual votes).
- Measurement error: results from poorly designed surveys or questionnaires (e.g. badly worded questions) or from incorrectly calibrated instruments. Measuring devices must be calibrated before use and checked during and after use; surveys and questionnaires should be well designed and structured, and validated by a pilot study (i.e. a small-scale trial to identify problems).
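To make the "within 2 percentage points" statement concrete, the sketch below computes a margin of sampling error for a poll proportion. It rests on assumptions that are not in the chapter: simple random sampling, the normal approximation, a 95% confidence level (z = 1.96), and a hypothetical poll of 2,401 respondents.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion p from n responses.

    Assumes simple random sampling and the normal approximation;
    z = 1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical poll: 2,401 respondents, worst case p = 0.5.
moe = margin_of_error(0.5, 2401)
print(f"margin of error: {100 * moe:.1f} percentage points")
```

The margin shrinks with the square root of the sample size, which is why cost limits how precise a poll can be.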
Survey guidelines
- Ask a series of related questions in a logical sequence (respondents lose interest if questions are presented at random).
- Keep questions brief, simple and unambiguous (if respondents do not understand, they may give convenient or untrue answers).
- Avoid hypothetical and conditional questions (i.e. avoid questions such as "If you won the lottery would you: pay off your mortgage, buy a new car or have a holiday?". Respondents may not have considered the possibilities.)
- Avoid leading questions (i.e. avoid questions such as "Do you agree that broadsheets report news more accurately than tabloids?". Respondents may conform rather than give honest opinions.)
- Avoid vague questions (i.e. avoid questions such as "Do you usually drink more wine or beer?". Respondents may drink neither. Does the "more" refer to glasses, alcoholic content, etc.?)
- Ask positive questions and avoid apologies (i.e. avoid questions such as "I hope you don't mind me asking, but do you usually buy a daily paper?"; just ask "Did you buy a daily paper today?")
Example table
How shopping costs compare (prices in £)

Product                                     UK     France  Germany
Sirloin steak (per 500g)                    4.80   5.03    5.03
Chicken breast (per kg)                     6.99   5.00    6.10
Heineken cans (4 × 440 ml)                  3.38   1.84    1.09
Coca Cola (litre)                           1.18   0.81    0.91
Mars bars 5 pack (5 × 65 g)                 1.09   1.24    0.91
Colgate total (100 ml)                      1.79   1.45    1.22
Gillette Blue 2 (fixed blade) 10 pack       2.67   2.25    2.90
Häagen-Dazs ice-cream (500 ml)              3.69   2.04    2.44
Olive oil (per 500g)                        1.85   1.77    1.37
Instant coffee granules (100g)              1.28   1.42    2.59
Kellogg's cornflakes (750g)                 1.38   2.20    1.40
Tuna (185 g)                                0.47   0.36    0.45
Tropicana orange juice (1 litre)            1.99   1.32    1.22
Total basket                                32.56  26.73   27.63

Source: Sunday Times 8/10/2000
Notes: UK prices based on Tesco (where identical items were not available, the nearest equivalent was chosen). All currencies converted to sterling on day of survey.
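The basket totals in the table can be checked, and the countries compared, with a short script. The prices are copied from the table; the variable names and the choice to express each basket relative to the UK are illustrative only.

```python
# Prices in pounds sterling, in the row order of the table
# (sirloin steak first, Tropicana orange juice last).
prices = {
    "UK":      [4.80, 6.99, 3.38, 1.18, 1.09, 1.79, 2.67,
                3.69, 1.85, 1.28, 1.38, 0.47, 1.99],
    "France":  [5.03, 5.00, 1.84, 0.81, 1.24, 1.45, 2.25,
                2.04, 1.77, 1.42, 2.20, 0.36, 1.32],
    "Germany": [5.03, 6.10, 1.09, 0.91, 0.91, 1.22, 2.90,
                2.44, 1.37, 2.59, 1.40, 0.45, 1.22],
}

# Sum each country's basket, rounding to pence to hide float noise.
totals = {country: round(sum(items), 2) for country, items in prices.items()}

for country, total in totals.items():
    # Express each basket as a percentage of the UK basket.
    relative = 100 * total / totals["UK"]
    print(f"{country:8s} total {total:6.2f}  ({relative:.0f}% of UK)")
```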
Charts
- Scatter-grams
- Graphs
- Pie charts
- Bar charts
- Pictograms
- Histograms
- Ogives
- Frequency polygons
- Lorenz curves
Scatter-grams
Month   Price   Sales
1       16      280
2       18      300
3       20      300
4       25      195
5       28      155
6       30      150
7       28      160
8       24      250
9       24      245
10      22      280
11      25      200
12      25      210
[Figure: scatter-gram of Sales (units) against Price (£'s)]
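A scatter-gram shows visually how sales move with price. One common way to put a number on that relationship, which the chapter itself does not use, is the Pearson correlation coefficient. The sketch below computes it by hand for the twelve monthly observations; the function name is illustrative.

```python
import math

price = [16, 18, 20, 25, 28, 30, 28, 24, 24, 22, 25, 25]
sales = [280, 300, 300, 195, 155, 150, 160, 250, 245, 280, 200, 210]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(price, sales)
print(f"correlation between price and sales: {r:.2f}")
```

A value close to -1 corresponds to points falling near a downward-sloping line, i.e. higher prices going with lower sales.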
Graphs
Data: the monthly price and sales figures listed under Scatter-grams.
[Figure: line graph of Price (£'s) and Sales (pairs) by Month, months 1 to 12]
Pie charts
Sales by Department

Department   % sales
Clothing     35
Hardware     13
Food         52
Total =      100

[Figure: pie chart "Sales by Department", with slices labelled Food 52% and Clothing 35%]
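When a pie chart is drawn, each department's share of sales becomes a slice whose angle is the same share of 360 degrees. A minimal sketch of that arithmetic, using the percentages above (the variable names are illustrative):

```python
# Percentage of sales by department, from the table above.
pct_sales = {"Clothing": 35, "Hardware": 13, "Food": 52}

# Each slice takes the same share of the full circle (360 degrees).
angles = {dept: 360 * pct / 100 for dept, pct in pct_sales.items()}

for dept, angle in angles.items():
    print(f"{dept:9s} {pct_sales[dept]:3d}%  {angle:6.1f} degrees")
```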
Bar charts
[Table: sales by Quarter for Clothing, Hardware and Food]
[Figure: bar chart of Sales (£'s) by Quarter, quarters 1 to 4]
[Figure: bar chart "Sales by department" of Sales (£'s) for Clothing, Hardware and Food, with series Q1 to Q4]
Picto-grams
Millions of hectares under trees in each region of the UK. (Source: UK Forestry Commission 2000)
Grouped data
Daily sales of own brand baked beans in a London store:
42 108 180 102 52 106 102 88 50 94 164 130 44 88 168 105 84 90 152 114 80 82 98 121 74 60 138 150 56 58 60 126 76 62 112 156 90 64 163 47 82 60 120 183 88 64 181 65
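Grouping means tallying each observation into a class interval. The sketch below does this for the values listed above, using classes of width 20 starting at 40 (the classes used in the histogram that follows). The function name is illustrative, and the counts apply only to the values as reproduced here.

```python
from collections import Counter

sales = [
    42, 108, 180, 102, 52, 106, 102, 88, 50, 94, 164, 130,
    44, 88, 168, 105, 84, 90, 152, 114, 80, 82, 98, 121,
    74, 60, 138, 150, 56, 58, 60, 126, 76, 62, 112, 156,
    90, 64, 163, 47, 82, 60, 120, 183, 88, 64, 181, 65,
]

def class_label(value, start=40, width=20):
    """Return the class interval, e.g. '40-59', that contains value."""
    lower = start + ((value - start) // width) * width
    return f"{lower}-{lower + width - 1}"

# Tally how many observations fall in each class.
freq = Counter(class_label(v) for v in sales)

for lower in range(40, 200, 20):
    label = f"{lower}-{lower + 19}"
    print(f"{label:8s} {freq[label]}")
```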
Histogram
[Figure: histogram "Weekly sales of baked beans"; y-axis: Frequency (0 to 14); x-axis: weekly sales in classes 40 - 59, 60 - 79, 80 - 99, 100 - 119, 120 - 139, 140 - 159, 160 - 179, 180 - 199]
Ogives
[Figure: "less than" percent ogive for sales of baked beans in London store; x-axis: Sales (40 to 200); y-axis: cumulative percent (0 to 100)]
Frequency polygon
Class range   Midpoint   Frequency
20 - 39       30         0
40 - 59       50         7
60 - 79       70         9
80 - 99       90         12
100 - 119     110        10
120 - 139     130        8
140 - 159     150        6
160 - 179     170        5
180 - 199     190        3
200 - 219     210        0
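The same frequency table also yields the points of a "less than" percent ogive: accumulate the frequencies and express each running total as a percentage of the overall total. The sketch below assumes each cumulative percentage is plotted against the start of the next class (60, 80, ..., 200); that boundary convention and the variable names are assumptions, not taken from the chapter.

```python
from itertools import accumulate

# Non-empty classes from the frequency table above.
lower_bounds = [40, 60, 80, 100, 120, 140, 160, 180]
frequencies = [7, 9, 12, 10, 8, 6, 5, 3]
width = 20

total = sum(frequencies)
cumulative = list(accumulate(frequencies))

# The ogive starts at 0% at the lowest boundary, then rises to 100%.
ogive = [(lower_bounds[0], 0.0)] + [
    (lower + width, 100 * cum / total)
    for lower, cum in zip(lower_bounds, cumulative)
]

for boundary, pct in ogive:
    print(f"less than {boundary:3d}: {pct:6.2f}%")
```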
Lorenz curves
% of population   % of income   Cumulative % of population   Cumulative % of income
-                 -             0                            0
45                5             45                           5
20                7             65                           12
15                8             80                           20
10                15            90                           35
5                 17            95                           52
3                 23            98                           75
2                 25            100                          100
[Figure: Lorenz Curve; x-axis: Cumulative % of population (0 to 100); y-axis: Cumulative % of income (0 to 100)]
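The points of the Lorenz curve are the running totals of the two percentage columns, starting from the origin. A minimal sketch using the figures in the table (the variable names are illustrative):

```python
from itertools import accumulate

# Percentage shares for each group, from the table above.
pct_population = [45, 20, 15, 10, 5, 3, 2]
pct_income = [5, 7, 8, 15, 17, 23, 25]

# Running totals, with the origin (0, 0) as the first point of the curve.
cum_population = [0] + list(accumulate(pct_population))
cum_income = [0] + list(accumulate(pct_income))

lorenz_points = list(zip(cum_population, cum_income))

for pop, inc in lorenz_points:
    print(f"poorest {pop:3d}% of population receive {inc:3d}% of income")
```

If income were shared equally, every point would lie on the diagonal from (0, 0) to (100, 100); the further the curve sags below that line, the more unequal the distribution.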