
Statistics

Nermin Mahmoud Ghith


B.S., PGDip.TQM, C.P.H.Q
Healthcare Quality
Consultant
Nermin_ghith@hotmail.com
Statistics

The handling/representation of numbers;


A collection of numerical facts that are
expressed in terms of summarizing
statements;
A method of dealing with data; a tool
concerned with the collection, organization,
and analysis of numerical facts or
observations.

Page 2
Statistics
Descriptive statistics:
Summarizing and describing characteristics
of collections of numbers precisely and
accurately;
Presenting information in a convenient,
usable, and understandable form.
Inferential statistics (Analytical):
Making calculated interpretations or
judgments about properties of large groups
on the basis of samples, with designated
confidence levels.
Page 3
Types of Data

Quantitative:
Numerical; can be measured; may be continuous
Qualitative:
Categorical or nominal

Page 4
Types of Variables
Continuous
Pulse
Blood Glucose Level
Discrete or Categorical
Ordinal
Order in the Family
Severity of Pain
Nominal
Sex
Race
Binary (dichotomous)
No or Yes
Absent or Present

Page 5
STATISTICAL HANDLING OF NUMBERS

1. Ordering (organizing): Descriptive
2. Averaging (generalization): Descriptive
3. Finding Variability (measuring relationship): Descriptive
4. Comparing (differences and effects): Inferential

Page 6
1. Ordering

Frequency distribution
Relative frequency/percentage
Ratio

Page 7
1. Ordering
Simple Frequency Distribution Tables

CPHQ Test Scores
(ranked highest to lowest)

Score   f
125     1
124     3
123     2
122     2
121     3
120     4
119     0
118     4
117     1
116     2
115     5
        N = 27
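Below is a minimal Python sketch (not part of the original slides) of how such a frequency distribution and the relative frequency/percentage can be tabulated; the raw score list is an assumption, reconstructed to match the table above.

```python
# A minimal sketch: tally raw scores into a simple frequency distribution
# and compute relative frequencies. The raw list is hypothetical but
# consistent with the table above (N = 27).
from collections import Counter

scores = [125, 124, 124, 124, 123, 123, 122, 122, 121, 121, 121,
          120, 120, 120, 120, 118, 118, 118, 118, 117, 116, 116,
          115, 115, 115, 115, 115]

freq = Counter(scores)                     # frequency of each score
n = len(scores)

print("Score  f   Relative f (%)")
for score in sorted(freq, reverse=True):   # ranked highest to lowest
    f = freq[score]
    print(f"{score:>5}  {f:<3} {100 * f / n:5.1f}")
print(f"N = {n}")
```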

Page 8
1. Ordering
Length of Stay (LOS) for Stroke Patients, N = 50

(Dot plot: each X represents one patient; LOS in days, 1 to 15, on the horizontal axis)

Page 9
1. Ordering
Frequency Terms

Number of cases (N): the total number of
cases in a group.
Class interval: the width of a class of grouped
data, including both the high and low values. The
width "i" of the class 116-125 is 10.
Midpoint: the middle score value in any
interval.
Class limits: the highest and the lowest scores
in the interval, e.g., 125 and 116 in the class
interval 116-125.

Page 10
1. Ordering
Frequency Distribution Table

Test Scores    f
56-65          42
66-75          70
76-85          99
86-95          74
96-105         52
106-115        40
116-125        22
               ___
i = 10         N = 399

Page 11
1. Ordering
Frequency Distribution Table

Test Scores    f     Cum f
56-65          42    42
66-75          70    112
76-85          99    211
86-95          74    285
96-105         52    337
106-115        40    377
116-125        22    399
               ___
i = 10         N = 399

Page 12
2. AVERAGING:
Measuring Central Tendency

Arithmetic mean (M): the "average" or "mean"
Mode (Mo.)
Median (Mdn.)

Page 13
2. AVERAGING:
Measuring Central Tendency

These three tools give you an average value for a
given set of data, which can be useful when
measuring tendencies or trends:
The Mean
The Median
The Mode

Page 14
2. AVERAGING:
Measuring Central Tendency

Arithmetic mean (M): the "average" or "mean"
What it is
How to calculate:
M = (X1 + X2 + X3 + ... + Xn) / N = ΣX / N
When to use it
Limitations
Page 15
2. AVERAGING:
Measuring Central Tendency
The Mean
The mean is the average of a set of numbers. To
calculate it, add up all the numbers and divide by
the total number of numbers in the set.
You use it to:
Calculate M from a simple frequency
distribution.
Calculate M from a grouped frequency
distribution: find the midpoint of each class
interval, multiply it by the class frequency, sum
the products, and divide by the total number of
cases (the sum of the frequencies), as in the
sketch below.
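A minimal Python sketch of both calculations: the simple data set is hypothetical, while the grouped example reuses the class midpoints and frequencies from the test-score table on Page 11.

```python
# A minimal sketch of the two mean calculations described above.
simple = [75, 90, 95, 100, 100, 90, 95, 90, 95, 100, 105, 110]  # hypothetical
mean_simple = sum(simple) / len(simple)            # M = ΣX / N
print(f"Mean of simple data: {mean_simple:.2f}")   # 95.42

# Grouped frequency distribution: (class midpoint, frequency) pairs
# taken from the test-score table (56-65 -> midpoint 60.5, etc.).
grouped = [(60.5, 42), (70.5, 70), (80.5, 99), (90.5, 74),
           (100.5, 52), (110.5, 40), (120.5, 22)]
total_cases = sum(f for _, f in grouped)
mean_grouped = sum(mid * f for mid, f in grouped) / total_cases
print(f"Mean of grouped data: {mean_grouped:.2f} (N = {total_cases})")
```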
Page 16
2. AVERAGING:
Measuring Central Tendency
The Mean

Some things to remember about the mean:


Every item in the set is used to calculate the
mean.
A mean may take on a value that is not
realistic (e.g., 2.4 children per family).
The mean is affected by extreme, or
outlying, values.

Page 17
2. AVERAGING:
Measuring Central Tendency

Mode (Mo.)
What it is
When to use it
Limitations

Page 18
2. AVERAGING:
Measuring Central Tendency
The Mode
The mode is simply the most frequently occurring
value in a data set.
With data grouped into class intervals, the
Mo. is the midpoint of the interval with the greatest
frequency.
Some things to remember about the mode:
The mode is the least utilized measure of
central tendency
For some types of data it is the most useful.
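A minimal Python sketch of finding the mode with the standard library; the pain-score data set is hypothetical.

```python
# A minimal sketch: statistics.multimode returns all most-frequent values,
# so ties (more than one mode) remain visible.
from statistics import multimode

pain_scores = [3, 5, 5, 6, 7, 5, 4, 6, 5, 3]   # hypothetical ordinal data
print(multimode(pain_scores))                   # [5] -> the mode is 5
```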
Page 19
2. AVERAGING:
Measuring Central Tendency

Median (Mdn.)
What it is
How to calculate (see the sketch below):
Odd n: the value at position (n + 1) / 2
Even n: the mean of the values at positions n / 2 and (n / 2) + 1
When to use it
Limitations
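A minimal Python sketch of the odd/even rule above, checked against the standard library's statistics.median; both data sets are hypothetical.

```python
# A minimal sketch of the median rule for odd and even n.
from statistics import median

def median_by_rule(values):
    xs = sorted(values)                  # rank the numbers
    n = len(xs)
    if n % 2 == 1:                       # odd n: value at position (n + 1) / 2
        return xs[(n + 1) // 2 - 1]      # -1 converts to 0-based indexing
    lo, hi = xs[n // 2 - 1], xs[n // 2]  # even n: positions n/2 and n/2 + 1
    return (lo + hi) / 2

odd_set = [7, 1, 5, 9, 3]
even_set = [7, 1, 5, 9, 3, 11]
print(median_by_rule(odd_set), median(odd_set))    # 5 5
print(median_by_rule(even_set), median(even_set))  # 6.0 6.0
```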

Page 20
2. AVERAGING:
Measuring Central Tendency
The Median
The median is the middle item in a set of numbers.
To calculate the Median:
Rank the numbers from largest to smallest.
Locate the middle item.
Use with ranked order measures.
Some things to remember about the median:
The median is not affected by extreme, or
outlying, numbers.
The median usually takes on a realistic,
meaningful value.

Page 21
Comparison between Mean, Mode, and Median

In a unimodal symmetrical distribution, all
three values are the same.
In an asymmetrical (skewed) distribution or curve:
Mo. falls at the highest point
M falls someplace towards the tail
Mdn. lies between the Mo. and M
M is the most stable value from sample to sample,
and Mo. is the least consistent.

Page 22
Exercise

Pulse (Beats/Min)
Group (1)
60, 65, 70, 75, 75, 60, 80, 90,
65, 60, 85
Group (2)
55, 65, 60, 70, 85, 100, 75, 90,
95, 105
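A minimal Python sketch that works the exercise with the standard library (mean and median for each group; Group 1's mode is 60, while Group 2 has no repeated value and therefore no single mode).

```python
# A minimal sketch: central tendency for the two pulse groups above.
from statistics import mean, median

group1 = [60, 65, 70, 75, 75, 60, 80, 90, 65, 60, 85]
group2 = [55, 65, 60, 70, 85, 100, 75, 90, 95, 105]

for name, data in [("Group 1", group1), ("Group 2", group2)]:
    print(f"{name}: mean = {mean(data):.1f}, median = {median(data)}")
```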

Page 23
3. Variability Indices
Measures of Dispersion

Definition:
The variation or scattering of data around
an average value, usually the arithmetic
mean ( M ).
The greater the spread of a distribution,
the greater the dispersion or variability (the
more heterogeneous).

Page 24
3. Variability Indices
Measures of Dispersion

Range
Average Deviation
Standard Deviation (SD)
Percentiles

Page 25
3. Variability Indices
Measures of Dispersion
Range (R)

The "distance" from the lowest (smallest) to the


highest (largest) value in a distribution.
The highest value minus the lowest value = R.
It is useful only as a gross descriptive statistic;
its value is totally dependent on the two extreme
scores.
For the data set 3, 3, 5, 6, and 8, the range is
R = 8 - 3 = 5.

Page 26
3. Variability Indices
Measures of Dispersion
Average Deviation (AD); Mean (M) = 95.41667 ≈ 95.42

X      Mean - X    |Mean - X|
75      20.42       20.42
90       5.42        5.42
95       0.42        0.42
100     -4.58        4.58
100     -4.58        4.58
90       5.42        5.42
95       0.42        0.42
90       5.42        5.42
95       0.42        0.42
100     -4.58        4.58
105     -9.58        9.58
110    -14.58       14.58

Sum of (Mean - X) = 0.00;  AD = average of |Mean - X| = 6.32
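A minimal Python sketch that reproduces the average deviation in the table above (the mean of the absolute deviations from the mean).

```python
# A minimal sketch: average deviation = mean of |Mean - X|.
data = [75, 90, 95, 100, 100, 90, 95, 90, 95, 100, 105, 110]
m = sum(data) / len(data)                       # 95.42
abs_devs = [abs(m - x) for x in data]           # |Mean - X|
avg_dev = sum(abs_devs) / len(data)
print(f"Mean = {m:.2f}, Average Deviation = {avg_dev:.2f}")   # 95.42, 6.32
```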
Page 27
3. Variability Indices
Measures of Dispersion

Standard Deviation (SD); Mean (M) = 95.41667 ≈ 95.42

X      Mean - X    (Mean - X)²
75      20.42       416.84
90       5.42        29.34
95       0.42         0.17
100     -4.58        21.01
100     -4.58        21.01
90       5.42        29.34
95       0.42         0.17
90       5.42        29.34
95       0.42         0.17
100     -4.58        21.01
105     -9.58        91.84
110    -14.58       212.67

Sum of (Mean - X) = 0.00;  SD = 8.908202 ≈ 8.91

Page 28
3. Variability Indices
Measures of Dispersion

Standard Deviation (SD)

SD = √( Σ(X - M)² / N )

Page 29
3. Variability Indices
Measures of Dispersion

Standard Deviation (SD)

Empirical Rule (for a normal curve):
Mean ± 1 SD covers about 68% of the values
Mean ± 2 SD covers about 95% of the values
Mean ± 3 SD covers about 99.7% of the values

(Figure: bell-shaped curve illustrating the empirical rule)
Page 30
3. Variability Indices
Measures of Dispersion
Standard Deviation(SD)
A measure of the spread of a distribution;
a computed value describing the amount of
variability in a particular distribution.
The more the values cluster around the mean, the
smaller the amount of variability or deviation.
A way to recognize deviation from the normal range
of variation.

Page 31
3. Variability Indices
Measures of Dispersion
Standard Deviation(SD)

It is a standard measure of the proportion of
variability such that, in a normal distribution:
within ±1 SD of the mean, about 68% of the values fall;
within ±2 SDs, about 95% of the values fall; and
within ±3 SDs, about 99.7% of the values fall.
Page 32
How To Calculate Standard Deviation(SD)

Standard Deviation is the square root of a measure


called the variance.
Before the SD can be determined, the mean ( M )
is found;
Then the deviation, or distance, of each score (X)
from M must be calculated.

Page 33
How To Calculate Standard Deviation(SD)

Each deviation ("x") is obtained by subtracting M
from each score (x = X - M). A low "x" means
little deviation; a high "x" means more dispersion
or variance.

The variance (SD²) is found by squaring each "x",
then finding the sum of the squares and dividing by the
total number of scores (N). The SD is the square root
of the variance.
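A minimal Python sketch of these steps, using the same data as the dispersion tables earlier in the deck; dividing by N gives the population SD, while the 8.91 shown in the earlier table corresponds to dividing by N - 1 (the sample SD), included here for comparison.

```python
# A minimal sketch: mean -> deviations -> variance -> SD.
import math

data = [75, 90, 95, 100, 100, 90, 95, 90, 95, 100, 105, 110]
n = len(data)
m = sum(data) / n                                  # step 1: the mean
devs = [x - m for x in data]                       # step 2: deviations x = X - M
variance_pop = sum(d * d for d in devs) / n        # step 3: divide by N
variance_smp = sum(d * d for d in devs) / (n - 1)  # sample variance (N - 1)
print(f"Population SD = {math.sqrt(variance_pop):.2f}")  # 8.53
print(f"Sample SD     = {math.sqrt(variance_smp):.2f}")  # 8.91
```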

Page 34
4. Data Comparison Techniques
Tests of Statistical Significance

In statistics, a result is called statistically


significant if it is unlikely to have occurred by
chance.
"A statistically significant difference" simply
means there is statistical evidence that there is
a difference; it does not mean the difference is
necessarily large, important, or significant in the
common meaning of the word.

p-value Page 35
4. Data Comparison Techniques
Tests of Statistical Significance

In statistics, a null hypothesis (H0) is a


hypothesis set up to be nullified or refuted in
order to support an alternative hypothesis.
When used, the null hypothesis is presumed
true until statistical evidence, in the form of a
hypothesis test, indicates otherwise; that is,
when the researcher has a certain degree of
confidence, usually 95% to 99%, that the data
do not support the null hypothesis.

p-value Page 36
4. Data Comparison Techniques
Tests of Statistical Significance
In scientific and medical applications, the null
hypothesis plays a major role in testing the
significance of differences in treatment and
control groups.
The assumption at the outset of the experiment
is that no difference exists between the two
groups (for the variable being compared): this
is the null hypothesis in this instance.

p-value Page 37
4. Data Comparison Techniques
Tests of Statistical Significance

Concepts related to tests of Significance

Confidence interval

Level of significance:

p-value Page 38
Concepts related to tests of Significance
Confidence Level for Change
The confidence level ranges from 0% to
100%. The higher the confidence level, the
greater the certainty that a change took place.

By default, 90% confidence is required to state that


the change is significant.

The confidence level is based on what statisticians
call the significance level, alpha level, or p-value.

The p-value is the probability that the observed
effect could have been due to natural variation
in the data, assuming no change took place.
Page 39
p-value
Concepts related to tests of Significance
Confidence Level for Change

A p-value of 0.05 would indicate that the
chance of the observed effect being due to
variation alone is low, 1 in 20. This is good
evidence that a change took place.

A p-value of 0.5 would indicate that there is a
50-50 chance of the observed effect even if no
change took place. This offers little or no
evidence of a change.

The smaller the p-value, the greater the


evidence of a change. (CPHQ)
Page 40
p-value
Concepts related to tests of Significance
Confidence Level for Change

The confidence level is 100 × (1 - p-value);
to find it, subtract the p-value from one and
multiply by 100. Therefore:

Confidence Level    p-value
99%                 0.01
95%                 0.05
90%                 0.1
80%                 0.2
50%                 0.5
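A minimal Python sketch of the conversion in the table above.

```python
# A minimal sketch: confidence level = 100 * (1 - p-value).
for p in (0.01, 0.05, 0.1, 0.2, 0.5):
    print(f"p-value = {p:<5} -> confidence level = {100 * (1 - p):.0f}%")
```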
p-value Page 41
4. Data Comparison Techniques
Tests of Statistical Significance
Parametric Tests:
Statistical methods that depend on the parameters of
populations or probability distributions are referred to as
parametric methods.
These tests are only meaningful for continuous data which is
sampled from a population with an underlying normal
distribution or whose distribution can be rendered normal by
mathematical transformation.
t test
Regression analysis
Non parametric tests:
Nonparametric methods require fewer assumptions about a
population or probability distribution and are applicable in a
wider range of situations:
Chi-square tests

Page 42
4. Data Comparison Techniques
Tests of Statistical Significance

Chi-Square (χ²) Test:
Compares ratios or rates using raw (tally) data.
Measures the difference between the groups or
conditions being compared in the counts or rates
of a particular occurrence, event, or outcome.
Example:
Comparing surgical facilities A and B on the rates
of postoperative infections.
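A minimal Python sketch of this comparison using scipy.stats.chi2_contingency; the infection tallies for facilities A and B are made-up numbers used only to illustrate the call.

```python
# A minimal sketch: chi-square test on a 2x2 table of tally data.
from scipy.stats import chi2_contingency

#           infected, not infected
observed = [[12, 188],    # Facility A (hypothetical tallies)
            [25, 175]]    # Facility B (hypothetical tallies)

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.3f}")
if p_value < 0.05:
    print("The infection rates differ significantly at the 95% confidence level.")
else:
    print("No statistically significant difference detected.")
```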

Page 43
4. Data Comparison Techniques
Tests of Statistical Significance

The t-Test
Comparing Means Based on Variances
The t-Test compares two sets of like things, using
averages to see if they indicate a real difference or
a difference likely to have occurred by chance.
Ex.: Comparing the average number of packs of
cigarettes smoked per month by smokers who had
a heart attack (MI) before age 50 vs. smokers who
didn't have a heart attack.
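A minimal Python sketch of this comparison using scipy.stats.ttest_ind (an independent two-sample t-test); the packs-per-month figures are made-up numbers used only to illustrate the call.

```python
# A minimal sketch: compare the means of two independent groups.
from scipy.stats import ttest_ind

mi_before_50 = [40, 55, 60, 45, 50, 65, 70, 55]   # hypothetical smokers with MI
no_mi        = [30, 25, 40, 35, 45, 30, 20, 35]   # hypothetical smokers without MI

t_stat, p_value = ttest_ind(mi_before_50, no_mi)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```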
Page 44
4. Data Comparison Techniques
Tests of Statistical Significance

Regression Analysis
Regression analysis is a statistical technique that
allows one to compare the entire distribution of
observations of one measurement (or variable) with
the entire distribution of another measure, in order
to determine how strongly the two variables are
interrelated (correlated).
Regression analysis is a way of evaluating the
kinds of data found in scatter diagrams.

Page 45
4. Data Comparison Techniques
Tests of Statistical Significance

A correlation coefficient ( r ) is the computed


value in regression analysis that expresses the
strength of the relationship between the two
measures.
The value of r ranges from -1 through 0 to +1.

Regression Analysis
Page 46
4. Data Comparison Techniques
Tests of Statistical Significance
An r approaching +1.0 indicates a strong positive
relationship between the measures, with both sets
of measure numbers increasing or both decreasing
together.
An r approaching -1.0 indicates a strong negative
relationship, with the numbers of one of the
measures increasing, and the numbers of the other
measure decreasing.
Measures with no significant relationship will have
an r of approximately zero (0).
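A minimal Python sketch of computing r with scipy.stats.pearsonr; the paired measurements are made-up numbers used only to illustrate the call.

```python
# A minimal sketch: correlation coefficient r for two paired measures.
from scipy.stats import pearsonr

x = [1, 2, 3, 4, 5, 6, 7, 8]      # e.g., years of exposure (hypothetical)
y = [2, 5, 4, 7, 9, 8, 12, 13]    # e.g., an outcome measure (hypothetical)

r, p_value = pearsonr(x, y)
print(f"r = {r:.2f} (p = {p_value:.4f})")   # r near +1 -> strong positive relationship
```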

Regression Analysis
Page 47
4. Data Comparison Techniques
Tests of Statistical Significance

Correlation Examples

Regression Analysis
Page 48
(Pages 49-52: scatterplot examples of positive, negative, and no correlation)
4. Data Comparison Techniques
Tests of Statistical Significance

Regression Analysis
Issues to remember
Comparing Two Distributions
Correlation
+ve Correlation
-ve Correlation
No Correlation
Correlation Coefficient r
Regression Analysis

Page 53
Important Notes

You CANNOT use tests developed for


continuous data with ordinal or nominal
variables.

Page 54
Graphs and Curves
Bar graphs usually present categorical and
numeric variables grouped in class intervals.

Page 56
Pictographs
A pictograph uses picture symbols to convey the meaning of
statistical information. Pictographs should be used carefully
because the graphs may, either accidentally or deliberately,
misrepresent the data. This is why a graph should be visually
accurate.

Page 57
A pie chart is a way of summarizing a set of
categorical data or displaying the different values of
a given variable (e.g., percentage distribution).

Page 58
Line graphs
Line graphs compare two variables: one is plotted along the
x-axis (horizontal) and the other along the y-axis (vertical).
Line graphs can also depict multiple series and are usually
the best choice for time series data and frequency
distributions.

Page 59
Scatterplots
In science, the scatterplot is widely used to present
measurements of two or more related variables. It is
particularly useful when the variable on the y-axis is
thought to be dependent on the values of the variable on
the x-axis (usually an independent variable).

Page 60
Normal Distribution curves
Symmetrical curves:
The two sides of the curve are identical if the polygon is
folded in half perpendicular to the baseline (bell-shaped or
rectangular).

Page 61
Skew. The skew of a distribution refers to how the curve
leans. When a curve has extreme scores on the right-hand side of
the distribution, it is said to be positively skewed. In other words,
when high numbers are added to an otherwise normal distribution,
the tail of the curve is pulled to the right (the positive direction). When
the tail is pulled to the left by extreme low scores, the curve is said to be
negatively skewed (J shape). The more skewed a distribution is,
the more difficult it is to interpret.

Page 62
Histograms
The histogram is a popular graphing tool. It is used
to summarize discrete or continuous data that are
measured on an interval scale.
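A minimal Python sketch of building a histogram with matplotlib (the library choice is an assumption; the slides do not name a tool), using hypothetical length-of-stay values.

```python
# A minimal sketch: histogram of continuous data grouped into class intervals.
import matplotlib.pyplot as plt

los_days = [2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 8, 9, 10, 12, 14]  # hypothetical LOS data

plt.hist(los_days, bins=range(1, 16))   # class intervals of 1 day
plt.xlabel("LOS (days)")
plt.ylabel("Frequency")
plt.title("Length of Stay for Stroke Patients")
plt.show()
```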

Page 63
Histographs
A histograph, or frequency polygon, is a graph
formed by joining the midpoints of histogram column
tops. These graphs are used only when depicting
data from the continuous variables shown on a
histogram.

Page 64
Graphic representation of Relationship

Scatter Diagram
Flowchart
Cause and Effect Diagram

Page 65
