• Numerical (quantitative)
  discrete (e.g. parity)
  continuous (e.g. height)
• Categorical (qualitative)
  nominal (e.g. gender)
  ordinal (e.g. disease severity)
• Data entry into Excel for analysis by
Department of Biostatistics
• Data checking before analysis
– see protocol manual
Number  Age  Gender  Date of      Current  Ever    Age started  Length of
                     interview    smoker   smoked  smoking      hospitalization
--------------------------------------------------------------------------------
1       45   M       25-May-07    Y        Y       15           12 days
2       32   F       Jun 15 2007  Y        N       0            2 days
3       35   F       23-May-07    N        Y       31           5 days
4       46           May 24 2007  N        Y       40           2 weeks
5       25   M       22-May-07    Y        Y       28           1 month
6       20   M       13-Jun-07    N        N                    5 days
7       17   F       12-Jun-07    Y        Y       16           23 days
8       20   M       11-Jun-07    Y        Y                    7 days
9       19   M       21-May-07    Y        Y       18           8 days
10      30   F       3-Jun-07     N        N       20           2 weeks
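The table above mixes two date formats and three units for length of stay. A minimal cleaning sketch, assuming the only date formats are the two seen here and that one month counts as 30 days (both are assumptions; the function names are hypothetical), could convert every entry to a single form before analysis:

```python
from datetime import date, datetime

# Date formats seen in the "Date of interview" column: "25-May-07" and "Jun 15 2007".
DATE_FORMATS = ("%d-%b-%y", "%b %d %Y")

# Assumption: one month is counted as 30 days; the protocol may define this differently.
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30}


def parse_interview_date(text: str) -> date:
    """Try each known format in turn; raise ValueError if none matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def length_of_stay_days(text: str) -> int:
    """Convert entries such as '12 days', '2 weeks' or '1 month' to days."""
    amount, unit = text.split()
    unit = unit.lower().rstrip("s")  # 'weeks' -> 'week', 'days' -> 'day'
    return int(amount) * DAYS_PER_UNIT[unit]


for raw_date, raw_stay in [("25-May-07", "12 days"), ("Jun 15 2007", "2 weeks"), ("22-May-07", "1 month")]:
    print(parse_interview_date(raw_date), length_of_stay_days(raw_stay))
```

Raising an error for an unrecognised format, rather than guessing, sends the entry back for checking instead of silently storing a wrong date.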
Number  Age  Gender  Date of      Current  Ever    Age started  Length of
                     interview    smoker   smoked  smoking      hospitalization
--------------------------------------------------------------------------------
1       45   M       25-May-07    Y        Y       15           12 days
2       32   F       Jun 15 2007  Y        N       0            2 days
3       35   F       23-May-07    N        Y       31           5 days
4       46   ?       May 24 2007  N        Y       40           2 weeks
5       25   M       22-May-07    Y        Y       28           1 month
6       20   M       13-Jun-07    N        N                    5 days
7       17   F       12-Jun-07    Y        Y       16           23 days
8       20   M       11-Jun-07    Y        Y       ?            7 days
9       19   M       21-May-07    Y        Y       18           8 days
10      30   F       3-Jun-07     N        N       20           2 weeks
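Besides the two entries marked "?", simple logical rules can flag other records for checking. The sketch below transcribes the smoking columns from the table; the rule set and the name `check_record` are illustrative assumptions, not the rules in the protocol manual:

```python
from typing import Optional

# (number, age, gender, current smoker, ever smoked, age started smoking)
# transcribed from the table above; None marks a missing entry ("?" or blank).
Record = tuple[int, int, Optional[str], str, str, Optional[int]]

RECORDS: list[Record] = [
    (1, 45, "M", "Y", "Y", 15),
    (2, 32, "F", "Y", "N", 0),
    (3, 35, "F", "N", "Y", 31),
    (4, 46, None, "N", "Y", 40),
    (5, 25, "M", "Y", "Y", 28),
    (6, 20, "M", "N", "N", None),
    (7, 17, "F", "Y", "Y", 16),
    (8, 20, "M", "Y", "Y", None),
    (9, 19, "M", "Y", "Y", 18),
    (10, 30, "F", "N", "N", 20),
]


def check_record(rec: Record) -> list[str]:
    """Return the problems found in one record (an empty list if none)."""
    number, age, gender, current, ever, age_started = rec
    problems = []
    if gender is None:
        problems.append("gender missing")
    if current == "Y" and ever == "N":
        problems.append("current smoker but never smoked")
    if ever == "Y" and age_started is None:
        problems.append("ever smoked but age started missing")
    if ever == "N" and age_started:  # None and 0 both mean 'no starting age given'
        problems.append("never smoked but age started recorded")
    if age_started is not None and age_started > age:
        problems.append("started smoking after current age")
    return problems


for rec in RECORDS:
    for problem in check_record(rec):
        print(f"record {rec[0]}: {problem}")
```

Run on these ten records, the rules flag records 2, 4, 5, 8 and 10.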
Summary (descriptive) statistics:
numerical variables
Numerical data: back-to-back stem-and-leaf plot

Symmetric distribution | Stem | Skewed distribution
                       |   14 | 8
                       |   13 |
                       |   12 |
                       |   11 |
                       |   10 | 4
                     4 |    9 | 3
                    71 |    8 | 045
                 95411 |    7 | 146
               9954332 |    6 | 001
                975220 |    5 | 22448
                    66 |    4 | 11116668
                     8 |    3 |
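A stem-and-leaf display can be built by splitting each value into a stem (tens) and a leaf (units). This sketch assumes values are stem × 10 + leaf, since the plot does not state its units, and uses the skewed side read back from the plot; `stem_and_leaf` is a hypothetical helper:

```python
from collections import defaultdict


def stem_and_leaf(values: list[int]) -> list[str]:
    """Return stem-and-leaf lines, largest stem first (stem = tens, leaf = units)."""
    leaves: dict[int, list[int]] = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    top, bottom = max(leaves), min(leaves)
    lines = []
    for stem in range(top, bottom - 1, -1):  # keep empty stems so gaps stay visible
        digits = "".join(str(d) for d in leaves[stem])
        lines.append(f"{stem:>3} | {digits}".rstrip())
    return lines


# Skewed side of the plot, read back as stem*10 + leaf (assumed units).
SKEWED = [41, 41, 41, 41, 46, 46, 46, 48, 52, 52, 54, 54, 58,
          60, 60, 61, 71, 74, 76, 80, 84, 85, 93, 104, 148]

print("\n".join(stem_and_leaf(SKEWED)))
```

Keeping empty stems (13, 12, 11) matters: the gap is what makes the long right tail visible.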
Central location
• mean (average): the sum of all the values divided by the number of
  observations in the group
• median: the middle value when the observations are ranked from smallest
  to largest
• (mode: the value that occurs most often)
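Applying the three measures to both sides of the stem-and-leaf plot shows why the shape matters. The values are read back from the plot assuming stem × 10 + leaf (the units are not given); only Python's standard `statistics` module is used:

```python
import statistics

# Values read back from the stem-and-leaf plot as stem*10 + leaf (assumed units).
SYMMETRIC = [38, 46, 46, 50, 52, 52, 55, 57, 59, 62, 63, 63,
             64, 65, 69, 69, 71, 71, 74, 75, 79, 81, 87, 94]
SKEWED = [41, 41, 41, 41, 46, 46, 46, 48, 52, 52, 54, 54, 58,
          60, 60, 61, 71, 74, 76, 80, 84, 85, 93, 104, 148]

for name, data in [("symmetric", SYMMETRIC), ("skewed", SKEWED)]:
    print(f"{name:9}  mean={statistics.mean(data):.2f}  "
          f"median={statistics.median(data)}  modes={statistics.multimode(data)}")
```

For the symmetric data the mean (64.25) and median (63.5) nearly agree; for the skewed data the long right tail pulls the mean (64.64) well above the median (58).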
Median
• laboratory values:
  1, 2, 3, 4, 5     mean 3, median 3
  1, 2, 3, 4, 50    mean 12, median 3
Use the median rather than the mean if there is:
• a skewed distribution
• a small sample with an outlier
• censored observations
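The laboratory-value example can be checked directly: replacing one value with an outlier moves the mean but not the median.

```python
import statistics

lab_values = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 50]  # the 5 replaced by an extreme value

for data in (lab_values, with_outlier):
    print(data, "mean:", statistics.mean(data), "median:", statistics.median(data))
```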
• Categorise a numerical variable, e.g. blood pressure or body mass index (BMI)
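As an illustration of categorising a numerical variable, BMI can be grouped with the WHO adult cut-offs or with the two-group split used in the hypertension tables later on. The function names are hypothetical and the weights and heights are made-up inputs:

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index = weight (kg) / height (m) squared."""
    return weight_kg / height_m ** 2


def who_bmi_category(value: float) -> str:
    """WHO adult categories: < 18.5, 18.5 to < 25, 25 to < 30, >= 30."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal weight"
    if value < 30:
        return "overweight"
    return "obese"


def overweight_two_groups(value: float) -> str:
    """Two-group split used in the hypertension tables: BMI > 25 vs 25 or less."""
    return "BMI > 25" if value > 25 else "BMI 25 or less"


value = bmi(80, 1.75)  # hypothetical person: 80 kg, 1.75 m
print(round(value, 1), who_bmi_category(value), overweight_two_groups(value))
```

The boundary value matters: a BMI of exactly 25 is "overweight" under the WHO scheme but falls in the "25 or less" group of the two-group split, so the cut-off rule should be stated when a variable is categorised.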
Measures of variability
• range: largest value minus smallest value;
  can be strongly influenced by a single extreme value
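Reusing the laboratory values shows how a single extreme value inflates the range:

```python
lab_values = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 50]


def value_range(data: list[float]) -> float:
    """Range = largest value minus smallest value."""
    return max(data) - min(data)


print(value_range(lab_values), value_range(with_outlier))  # 4 49
```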
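• standard deviation:
  $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
• coefficient of variation (%):
  $CV = \dfrac{s}{\bar{x}} \times 100$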
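The two formulas translate directly into code. This sketch writes the standard deviation out term by term (dividing by n − 1) and checks it against `statistics.stdev`, which uses the same sample formula; the input values are the laboratory example:

```python
import math
import statistics


def sample_sd(data: list[float]) -> float:
    """s = sqrt( sum((x_i - mean)^2) / (n - 1) )."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))


def coefficient_of_variation(data: list[float]) -> float:
    """CV (%) = s / mean * 100."""
    return sample_sd(data) / (sum(data) / len(data)) * 100


values = [1, 2, 3, 4, 5]
print(round(sample_sd(values), 3), round(statistics.stdev(values), 3))  # 1.581 1.581
print(round(coefficient_of_variation(values), 1))  # 52.7
```

The CV has no units, so it can compare variability between variables measured on different scales, but it is only meaningful for variables with a true zero and a positive mean.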
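• 25th percentile: the value that divides the ordered sample so that a
  quarter of the observations lie below it and three quarters above
• 75th percentile: the value that divides the ordered sample so that three
  quarters of the observations lie below it and a quarter above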
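Python's `statistics.quantiles` computes the quartiles; by default it uses the "exclusive" method, and other software may use a slightly different convention and give slightly different values. The data are the skewed side of the stem-and-leaf plot, read back assuming stem × 10 + leaf:

```python
import statistics

# Skewed side of the stem-and-leaf plot, read back as stem*10 + leaf (assumed units).
SKEWED = [41, 41, 41, 41, 46, 46, 46, 48, 52, 52, 54, 54, 58,
          60, 60, 61, 71, 74, 76, 80, 84, 85, 93, 104, 148]

q1, median, q3 = statistics.quantiles(SKEWED, n=4)  # 25th, 50th, 75th percentiles
print(q1, median, q3, "IQR:", q3 - q1)  # 46.0 58.0 78.0 IQR: 32.0
```

Unlike the range (148 − 41 = 107), the quartiles are unaffected by the single value of 148.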
Summary (descriptive) statistics:
categorical variables
6x2 table

                       Lung cancer
Cigarettes per day     Yes     No
50+                     38     12
0                        7     61
Odds ratio 27.6

                       Lung cancer
Cigarettes per day     Yes     No
1-4                     55    129
0                        7     61
Odds ratio 3.7
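Each odds ratio compares the odds of lung cancer in one smoking category with the odds in non-smokers, (a/b) / (c/d) = ad / bc. A short sketch that reproduces both figures from the counts above (`odds_ratio` is a hypothetical helper):

```python
def odds_ratio(exposed_cases: int, exposed_noncases: int,
               unexposed_cases: int, unexposed_noncases: int) -> float:
    """OR = (a/b) / (c/d) = a*d / (b*c) for a 2x2 table."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


print(round(odds_ratio(38, 12, 7, 61), 1))   # 50+ vs 0 per day: 27.6
print(round(odds_ratio(55, 129, 7, 61), 1))  # 1-4 vs 0 per day: 3.7
```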
Stratified analysis: adjusting for the effect of confounders

                    Hypertension
Overweight          Yes     No   Total
BMI > 25            151    108     259
BMI 25 or less       84    132     216
Total               235    240     475
Relative risk 1.5
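The relative risk divides the proportion with hypertension among those with BMI > 25 by the proportion among those with BMI 25 or less. A sketch reproducing the 1.5 from the row totals (`relative_risk` is a hypothetical helper):

```python
def relative_risk(exposed_cases: int, exposed_total: int,
                  unexposed_cases: int, unexposed_total: int) -> float:
    """RR = risk in the exposed group / risk in the unexposed group."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)


rr = relative_risk(151, 259, 84, 216)  # (151/259) / (84/216)
print(round(rr, 2))  # 1.5
```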
                    Males                  Females
                    Hypertension           Hypertension
                    Yes    No   Total      Yes    No   Total
BMI > 25             20    24      44      131    84     215
BMI ≤ 25             36    73     109       48    59     107
Total                56    97     153      179   143     322
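The slides do not say how the stratum results are combined; one standard option is the Mantel-Haenszel pooled relative risk, sketched below with the counts from the sex-specific tables. The helper names are hypothetical:

```python
# Each stratum: (a, b, c, d) =
#   (BMI > 25 with hypertension, BMI > 25 without,
#    BMI <= 25 with hypertension, BMI <= 25 without)
STRATA = {"males": (20, 24, 36, 73), "females": (131, 84, 48, 59)}


def stratum_rr(a: int, b: int, c: int, d: int) -> float:
    """Relative risk within one stratum: (a/(a+b)) / (c/(c+d))."""
    return (a / (a + b)) / (c / (c + d))


def mantel_haenszel_rr(strata) -> float:
    """Pooled RR = sum(a*(c+d)/N) / sum(c*(a+b)/N) over strata."""
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * (c + d) / n
        den += c * (a + b) / n
    return num / den


for name, cells in STRATA.items():
    print(name, round(stratum_rr(*cells), 2))
print("crude", round(stratum_rr(151, 108, 84, 132), 2))
print("Mantel-Haenszel", round(mantel_haenszel_rr(STRATA.values()), 2))
```

Within each sex the relative risk is about 1.36 to 1.38, and the sex-adjusted estimate (about 1.36) is lower than the crude 1.5, suggesting that part of the crude association reflects the different distribution of BMI and hypertension between males and females.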
[Diagram: target population → sample → results]
• Statistics based on sample data are estimates of the corresponding
  parameters in the target population
• Different random samples of the same size from the same target population
  will not give exactly the same estimates (results): there is sampling
  variation
• The precision of the estimates depends on:
  - sample size
  - variability of the variable being measured
• The standard error of an estimate combines these two components; for a
  sample mean, $SE = s/\sqrt{n}$
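For a sample mean the two components enter as s / √n: more variability raises the standard error and a larger sample lowers it. With a hypothetical standard deviation of 10, quadrupling the sample size halves the standard error:

```python
import math


def standard_error_of_mean(sd: float, n: int) -> float:
    """SE = s / sqrt(n)."""
    return sd / math.sqrt(n)


for n in (25, 100, 400):
    print(n, standard_error_of_mean(10.0, n))  # 2.0, 1.0, 0.5
```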
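To calculate a confidence interval, use:
• the sample estimate
• its standard error
• the cut-off value of the appropriate probability distribution
  (e.g. 1.96 from the normal distribution for a 95% interval)
• Example: 4/10 = 40% (exact 95% CI: 12.2% to 73.8%)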
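Combining the three ingredients gives the normal-approximation (Wald) interval, estimate ± 1.96 × SE. For 4/10 this gives about 9.6% to 70.4%, not the 12.2% to 73.8% quoted; the quoted limits agree with the exact (Clopper-Pearson) binomial interval, which is preferred for small samples. A pure-Python sketch of both, finding the exact limits by bisection on the binomial distribution (function names are hypothetical):

```python
import math

Z_95 = 1.959964  # 97.5th percentile of the standard normal distribution


def wald_ci(x: int, n: int, z: float = Z_95) -> tuple[float, float]:
    """Estimate +/- z * SE, with SE = sqrt(p(1-p)/n); clipped to [0, 1]."""
    p = x / n
    se = math.sqrt(p * (1 - p) / n)
    return max(0.0, p - z * se), min(1.0, p + z * se)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(g, lo: float = 0.0, hi: float = 1.0, iters: int = 100) -> float:
    """Find a root of g on [lo, hi]; assumes g(lo) and g(hi) differ in sign."""
    g_lo = g(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        g_mid = g(mid)
        if (g_mid < 0) == (g_lo < 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson_ci(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact interval: P(X >= x | p_lower) = alpha/2 and P(X <= x | p_upper) = alpha/2."""
    lower = 0.0 if x == 0 else _bisect(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    upper = 1.0 if x == n else _bisect(lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper


for label, (lo, hi) in [("Wald", wald_ci(4, 10)), ("Clopper-Pearson", clopper_pearson_ci(4, 10))]:
    print(f"{label}: {lo:.1%} to {hi:.1%}")
```

With only 10 observations the normal approximation is poor, which is why the exact interval is shifted upwards and is not symmetric around 40%.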