Content
Types of Data
Descriptive/Summary Statistics
Graphical Presentations
Types of Data
Variables
Exercise
Consider the following variables and decide if they are Numerical or Categorical; continuous, discrete, nominal or ordinal
- Gender
- Height
- Number of staff in a department
- Length of psychiatric inpatient treatment
- Preferred strength of coffee
- Organisational size
- Types of anxiety disorder
- Levels of anxiety
- Types of medication
Derived data
In the medical field, other types of data may be encountered
- Percentages e.g. % of operational interactions
- Ratios or quotients e.g. Body Mass Index (BMI), kg/m2
- Rates e.g. number of disease events/total number of
Descriptive/Summary Statistics
Measures of location
Measures of location summarise data with a
single number
Mean
The mean (more precisely, the arithmetic mean) is
commonly called the average
$\bar{x} = \frac{\sum x}{n}$
All the values (x) are added together and the sum divided by the number of observations (n)
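The calculation can be sketched in a few lines of Python. The values below are hypothetical and serve only to illustrate the formula.

```python
# Hypothetical sample of five observations (illustration only).
values = [3, 5, 7, 9, 11]

n = len(values)          # number of observations (n)
total = sum(values)      # all the values (x) added together
mean = total / n         # sum divided by the number of observations

print(f"n = {n}, sum = {total}, mean = {mean}")
```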
Mode
The mode represents the most commonly occurring
value within a dataset
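A minimal sketch of finding the mode, assuming a small hypothetical set of responses. A dataset can have more than one mode, so the sketch keeps every value that reaches the highest count.

```python
from collections import Counter

# Hypothetical categorical data: preferred strength of coffee.
responses = ["medium", "strong", "medium", "weak", "medium", "strong"]

counts = Counter(responses)          # frequency of each distinct value
highest = max(counts.values())       # the largest frequency observed

# Keep every value whose frequency equals the largest one.
modes = [value for value, count in counts.items() if count == highest]

print(modes)
```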
Median
Median means middle, and the median is the middle of
a set of data that has been put into rank order
Example (values in rank order): 18, 24, 29, 30, 32. The median is 29.
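The median can be sketched as follows, using the five example values (18, 24, 29, 30, 32). The handling of an even number of observations, taking the mean of the two middle values, is the usual convention and is assumed here; the sixth value in the second sample is hypothetical.

```python
def median(values):
    """Return the middle value of the data once put into rank order."""
    ordered = sorted(values)         # rank order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]          # odd n: the single middle value
    # even n: mean of the two middle values (assumed convention)
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_sample = [30, 18, 32, 24, 29]
even_sample = [30, 18, 32, 24, 29, 40]   # hypothetical extra value

print(median(odd_sample), median(even_sample))
```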
Quartiles
Quartiles are a subset of percentiles
Lower quartile: 25% of the data is below this value
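As a rough sketch, quartiles can be obtained with `statistics.quantiles` from the Python standard library. Several conventions exist for interpolating between data points; the `method="inclusive"` choice and the data below are assumptions made for illustration, and other software may give slightly different values.

```python
import statistics

# Hypothetical data, already in rank order.
data = [2, 4, 6, 8, 10, 12, 14, 16, 18]

# n=4 splits the data into four equal parts and returns three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print(f"lower quartile = {q1}, median = {q2}, upper quartile = {q3}")
```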
Measures of Dispersion
The dispersion in a set of data is the variation among
the set of data values
Range
The range is the difference between the largest and the
smallest values in the dataset
- It is sensitive to extreme values
- The range of a list is 0 if and only if all the data values are equal
[Figure: range of a dataset running from 4 to 16 days]
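A short sketch of the range and its sensitivity to extreme values. The data are hypothetical, chosen so that the smallest and largest values are 4 and 16 days.

```python
def value_range(values):
    """Difference between the largest and the smallest values."""
    return max(values) - min(values)

days = [4, 7, 9, 10, 12, 16]           # hypothetical lengths of stay
with_extreme = days + [60]             # one extreme value added
all_equal = [5, 5, 5, 5]

print(value_range(days))               # range of the original data
print(value_range(with_extreme))       # one extreme value changes it a lot
print(value_range(all_equal))          # all values equal
```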
Inter-quartile Range
Inter-quartile range = Upper Quartile - Lower Quartile
Describes how much the middle 50% of the dataset varies
- example: if all patients at a clinic took more or less the same time to be treated, with only one or two exceptionally quick or long appointments, you would expect the inter-quartile range to be very small
- but if all appointments were either very quick or very long, with few in between, then the inter-quartile range would be larger
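The clinic example can be sketched with two hypothetical sets of appointment times in minutes. The quartile convention (`method="inclusive"`) is an assumption; other conventions give slightly different numbers but the same contrast.

```python
import statistics

def interquartile_range(values):
    """Upper quartile minus lower quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical appointment times (minutes).
similar = [5, 19, 20, 20, 20, 21, 21, 22, 60]     # mostly alike, two exceptions
polarised = [5, 5, 6, 6, 30, 55, 56, 58, 60]      # very quick or very long

print(interquartile_range(similar))
print(interquartile_range(polarised))
```

Both lists have the same range (55 minutes), yet their inter-quartile ranges differ widely, which is the point of the example.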
[Figure: distribution of days with the mean and 1 SD either side of the mean marked]
Sample
Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Standard Deviation: $s = \sqrt{s^2}$
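A worked sketch of the sample variance and standard deviation, using hypothetical data. The result is compared with `statistics.stdev` from the standard library, which also divides by n - 1.

```python
import math
import statistics

days = [4, 8, 10, 12, 16]                  # hypothetical sample

n = len(days)
mean = sum(days) / n                        # x-bar
squared_deviations = [(x - mean) ** 2 for x in days]
variance = sum(squared_deviations) / (n - 1)   # sample variance, s^2
sd = math.sqrt(variance)                    # standard deviation, s

print(f"mean = {mean}, variance = {variance}, sd = {sd:.3f}")
print(f"statistics.stdev gives {statistics.stdev(days):.3f}")
```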
Measures of Distribution
Measures of distribution are
- Skewness - Kurtosis
Skewness
A skewed distribution is characterised by a tail off
towards the high end of the scale (a positive skew) or towards the low end of the scale (a negative skew)
Normal Distribution
Skewness statistic ~ 0
Positive Skew
Skewness statistic > 0
Negative Skew
Skewness statistic < 0
Skewness
If the distribution has no skewness, then the
skewness statistic will be zero
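A minimal sketch of a skewness statistic, assuming the simple moment-based definition (third central moment divided by the 1.5 power of the second). Statistical packages often apply a small-sample adjustment, so their values may differ slightly; the sign behaves the same way. All data are hypothetical.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (one of several definitions)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
tail_high = [1, 1, 2, 2, 3, 10]      # tail towards the high end
tail_low = [-x for x in tail_high]   # mirror image: tail towards the low end

for name, data in [("symmetric", symmetric),
                   ("tail_high", tail_high),
                   ("tail_low", tail_low)]:
    print(name, round(skewness(data), 3))
```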
Kurtosis
A distribution with kurtosis is characterised by the
distribution being too narrow and peaked (a positive kurtosis) or too wide and flat (a negative kurtosis)
Normal Distribution
Kurtosis statistic ~ 0
Positive Kurtosis
Kurtosis statistic > 0
Negative Kurtosis
Kurtosis statistic < 0
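A kurtosis statistic that is approximately 0 for a normal distribution is usually the excess kurtosis, that is, the fourth standardised moment minus 3. The sketch below assumes that definition without any small-sample adjustment, and uses hypothetical data.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (assumed definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5]                     # evenly spread: wide and flat
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 10]      # most values at the centre

print(round(excess_kurtosis(flat), 3))
print(round(excess_kurtosis(peaked), 3))
```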
Contingency Table
A table in which the entries are frequencies
A matrix format that displays the frequency
distribution of the variables
Contingency table: 2 x 2
Characteristic   Group 1   Group 2   Total
Present          a         b         a+b
Absent           c         d         c+d
Total            a+c       b+d       n=a+b+c+d
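The cell frequencies a, b, c and d can be tallied from individual records. The sketch below assumes each record is a hypothetical pair of (characteristic present?, group number).

```python
from collections import Counter

# Hypothetical records: (characteristic present?, group).
observations = [
    (True, 1), (True, 1), (True, 2), (False, 1),
    (False, 2), (False, 2), (True, 1), (False, 2),
]

counts = Counter(observations)
a = counts[(True, 1)]     # present, group 1
b = counts[(True, 2)]     # present, group 2
c = counts[(False, 1)]    # absent, group 1
d = counts[(False, 2)]    # absent, group 2
n = a + b + c + d

print(f"Present  {a:>3} {b:>3} {a + b:>3}")
print(f"Absent   {c:>3} {d:>3} {c + d:>3}")
print(f"Total    {a + c:>3} {b + d:>3} {n:>3}")
```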
True/False Positive/Negative
Of the a + c individuals who have the disease, how
many have positive test results (true positives)?
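Sensitivity: $\frac{a}{a + c}$
Of the b + d individuals who do not have the disease, how many have negative test results (true negatives)?
Specificity: $\frac{d}{b + d}$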
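A small sketch of the two proportions. It assumes the 2 x 2 layout in which Group 1 is the individuals with the disease, Group 2 is those without, and "Present" means a positive test result. The cell counts are hypothetical.

```python
def sensitivity(a, c):
    """Proportion of diseased individuals with a positive test: a / (a + c)."""
    return a / (a + c)

def specificity(d, b):
    """Proportion of non-diseased individuals with a negative test: d / (b + d)."""
    return d / (b + d)

# Hypothetical counts: a true positives, b false positives,
# c false negatives, d true negatives.
a, b, c, d = 90, 20, 10, 80

print(f"sensitivity = {sensitivity(a, c):.2f}")
print(f"specificity = {specificity(d, b):.2f}")
```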
Graphical Presentations
Typical graphs
- Bar Chart
- Pareto Chart
- Pie Chart
- Box Plot
- Histogram

- Useful for getting an initial feel for the data
- Useful for explaining/presenting results to others
- Useful for identifying outliers
Bar (or Column) Chart, Pareto Chart, Pie Chart
Continuous Numerical data (and some Discrete Numerical data) can be displayed visually in a:
- Box Plot
- Histogram
Bar chart
Why use it?
[Figure: bar chart; x-axis: Type of patient, y-axis: Frequency]
Pareto chart
Why use it?
What does it do?
have the largest impact in terms of quality, time or costs
In these situations it may be best to use two Pareto charts:
[Figure: Pareto chart; bars show Frequency by Cause, line shows cumulative Percent]
Cause   Count   Percent   Cum %
A       30      41.7      41.7
B       25      34.7      76.4
C       6       8.3       84.7
D       5       6.9       91.7
E       3       4.2       95.8
Other   3       4.2       100.0
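The Percent and Cum % columns of a Pareto table can be derived from the counts alone. The sketch below uses the counts from the first chart (A 30, B 25, C 6, D 5, E 3, Other 3); the variable names are illustrative.

```python
# Counts taken from the first Pareto chart.
counts = {"A": 30, "B": 25, "C": 6, "D": 5, "E": 3, "Other": 3}

total = sum(counts.values())

# Largest cause first; ties keep their original order (stable sort).
ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)

rows = []
running = 0
for cause, count in ordered:
    running += count
    percent = round(100 * count / total, 1)
    cumulative = round(100 * running / total, 1)
    rows.append((cause, count, percent, cumulative))

print(f"{'Cause':<6} {'Count':>5} {'Percent':>8} {'Cum %':>6}")
for cause, count, percent, cumulative in rows:
    print(f"{cause:<6} {count:>5} {percent:>8} {cumulative:>6}")
```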
No Pareto effect
[Figure: Pareto chart; bars show Frequency by Cause, line shows cumulative Percent]
Cause   Count   Percent   Cum %
A       18      26.5      26.5
B       15      22.1      48.5
C       14      20.6      69.1
D       10      14.7      83.8
E       6       8.8       92.6
Other   5       7.4       100.0
[Figure: Pareto chart of Frequency and cumulative Percent for seven causes, the last being Other]
Count     21     8      6      6      5      5      2
Percent   39.6   15.1   11.3   11.3   9.4    9.4    3.8
Cum %     39.6   54.7   66.0   77.4   86.8   96.2   100.0
Pie chart
Why use it?
Box plot
What does it do?
[Figure: box plots for Group A and Group B, annotated as follows]
- Upper whisker extends to the adjacent value: the highest value within the upper limit
- First Quartile (Q1)
- Lower whisker extends to the adjacent value: the lowest value within the lower limit
- Outliers are marked with *
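The quantities behind a box plot can be sketched as below. The limits are not defined here, so the sketch assumes the common convention of 1.5 times the inter-quartile range beyond each quartile; the quartile method and the data are also assumptions.

```python
import statistics

def box_plot_summary(values, k=1.5):
    """Quartiles, whisker ends and outliers, assuming limits at k * IQR."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    inside = [x for x in ordered if lower_limit <= x <= upper_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],     # lowest value within the lower limit
        "whisker_high": inside[-1],   # highest value within the upper limit
        "outliers": [x for x in ordered if x < lower_limit or x > upper_limit],
    }

group_a = [2, 4, 6, 8, 10, 12, 14, 16, 40]   # hypothetical data
summary = box_plot_summary(group_a)
print(summary)
```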
Histogram
Why use it?
What does it do?
- Displays bars representing the count within different intervals of data
- Allows visualisation of the shape and spread of a data set
- Allows patterns to be identified
- Provides an indication of where the mean lies
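Counting values within intervals can be sketched without any plotting library. The interval start, width and number of intervals below are hypothetical choices, as are the data; each interval includes its lower edge and excludes its upper edge.

```python
def histogram_counts(values, start, width, bins):
    """Count how many values fall in each interval [start + i*width, start + (i+1)*width)."""
    counts = [0] * bins
    for x in values:
        index = int((x - start) // width)
        if 0 <= index < bins:         # values outside all intervals are ignored
            counts[index] += 1
    return counts

data = [12, 15, 21, 22, 25, 27, 28, 31, 34, 45]   # hypothetical data
counts = histogram_counts(data, start=10, width=10, bins=4)

# Crude text histogram: one '#' per observation.
for i, count in enumerate(counts):
    low = 10 + i * 10
    print(f"{low:>3}-{low + 10:<3} {'#' * count}")
```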
Normal Distribution
[Figure: histogram of normally distributed data; x-axis: Data (45.0 to 54.0), y-axis: Frequency]
Bimodal distribution
[Figure: histogram of bimodal data; x-axis: Data (44 to 64), y-axis: Frequency]
Skewed distribution
[Figure: histogram of skewed data; x-axis: Data (40 to 240), y-axis: Frequency]
Histogram: Example 1
Histogram: Example 2
Exercise
Identify situations in your research/work
environment where Bar charts, Pareto charts, Pie charts, Box plots, and Histograms could be used
Summary
Types of Data
Descriptive/Summary Statistics
Graphical Presentations