
Statistics

Nermin Mahmoud Ghith


B.S., PGDip.TQM, C.P.H.Q
Healthcare Quality
Consultant
Nermin_ghith@hotmail.com
Statistics

The handling/representation of numbers;


A collection of numerical facts that are
expressed in terms of summarizing
statements;
A method of dealing with data; a tool
concerned with the collection, organization,
and analysis of numerical facts or
observations.

Page 2
Statistics
Descriptive statistics:
Summarizing and describing characteristics
of collections of numbers precisely and
accurately;
Presenting information in a convenient,
usable, and understandable form.
Inferential statistics (Analytical):
Making calculated interpretations or
judgments about properties of large groups
on the basis of samples, with designated
confidence levels.
Page 3
Types of Data

Quantitative:
Numerical; can be measured; may be continuous
Qualitative:
Categorical or nominal

Page 4
Types of Variables
Continuous
Pulse
Blood Glucose Level
Discrete or Categorical
Ordinal
Order in the Family
Severity of Pain
Nominal
Sex
Race
Binary (dichotomous)
No or Yes
Absent or Present

Page 5
STATISTICAL HANDLING OF NUMBERS

1. Ordering (organizing): Descriptive
2. Averaging (generalization): Descriptive
3. Finding Variability (measuring relationship): Descriptive
4. Comparing (differences and effects): Inferential

Page 6
1. Ordering

Frequency distribution
Relative frequency/percentage
Ratio

Page 7
1. Ordering
Simple Frequency Distribution Tables

CPHQ Test Scores
(ranked highest to lowest)

Score   f
125     1
124     3
123     2
122     2
121     3
120     4
119     0
118     4
117     1
116     2
115     5
        N = 27
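Below is a minimal Python sketch (not part of the original slides) of how such a frequency distribution and the relative frequency/percentage can be tabulated; the raw score list is an assumption, reconstructed to match the table above.

```python
# A minimal sketch: tally raw scores into a simple frequency distribution
# and compute relative frequencies. The raw list is hypothetical but
# consistent with the table above (N = 27).
from collections import Counter

scores = [125, 124, 124, 124, 123, 123, 122, 122, 121, 121, 121,
          120, 120, 120, 120, 118, 118, 118, 118, 117, 116, 116,
          115, 115, 115, 115, 115]

freq = Counter(scores)                     # frequency of each score
n = len(scores)

print("Score  f   Relative f (%)")
for score in sorted(freq, reverse=True):   # ranked highest to lowest
    f = freq[score]
    print(f"{score:>5}  {f:<3} {100 * f / n:5.1f}")
print(f"N = {n}")
```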

Page 8
1. Ordering
Length of Stay (LOS) for Stroke Patients, N = 50

(Dot plot: each X represents one patient; LOS in days, 1 to 15, on the horizontal axis)

Page 9
1. Ordering
Frequency Terms

Number of cases (N): the total number of
cases in a group.
Class interval: the width of a class of grouped
data, including both the high and low values. The
width "i" of the class 116-125 is 10.
Midpoint: the middle score value in any
interval.
Class limits: the highest and the lowest scores
in the interval, e.g., 125 and 116 in the class
interval 116-125.

Page 10
1. Ordering
Frequency Distribution Table

Test Scores    f
56-65          42
66-75          70
76-85          99
86-95          74
96-105         52
106-115        40
116-125        22
               ___
i = 10         N = 399

Page 11
1. Ordering
Frequency Distribution Table

Test Scores    f     Cum f
56-65          42    42
66-75          70    112
76-85          99    211
86-95          74    285
96-105         52    337
106-115        40    377
116-125        22    399
               ___
i = 10         N = 399

Page 12
2. AVERAGING:
Measuring Central Tendency

Arithmetic mean (M): the "average" or "mean"
Mode (Mo.)
Median (Mdn.)

Page 13
2. AVERAGING:
Measuring Central Tendency

These three tools give you an average value for a
given set of data, which can be useful when
measuring tendencies or trends:
The Mean
The Median
The Mode

Page 14
2. AVERAGING:
Measuring Central Tendency

Arithmetic mean (M): the "average" or "mean"
What it is
How to calculate:
M = (X1 + X2 + X3 + ... + Xn) / N = ΣX / N
When to use it
Limitations
Page 15
2. AVERAGING:
Measuring Central Tendency
The Mean
The mean is the average of a set of numbers. To
calculate it, add up all the numbers and divide by
the total number of numbers in the set.
You use it to:
Calculate M from a simple frequency
distribution.
Calculate M from a grouped frequency
distribution: find the midpoint of each class
interval, multiply it by the class frequency, sum
the products, and divide by the total number of
cases (the sum of the frequencies), as in the
sketch below.
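A minimal Python sketch of both calculations: the simple data set is hypothetical, while the grouped example reuses the class midpoints and frequencies from the test-score table on Page 11.

```python
# A minimal sketch of the two mean calculations described above.
simple = [75, 90, 95, 100, 100, 90, 95, 90, 95, 100, 105, 110]  # hypothetical
mean_simple = sum(simple) / len(simple)            # M = ΣX / N
print(f"Mean of simple data: {mean_simple:.2f}")   # 95.42

# Grouped frequency distribution: (class midpoint, frequency) pairs
# taken from the test-score table (56-65 -> midpoint 60.5, etc.).
grouped = [(60.5, 42), (70.5, 70), (80.5, 99), (90.5, 74),
           (100.5, 52), (110.5, 40), (120.5, 22)]
total_cases = sum(f for _, f in grouped)
mean_grouped = sum(mid * f for mid, f in grouped) / total_cases
print(f"Mean of grouped data: {mean_grouped:.2f} (N = {total_cases})")
```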
Page 16
2. AVERAGING:
Measuring Central Tendency
The Mean

Some things to remember about the mean:


Every item in the set is used to calculate the
mean.
A mean may take on a value that is not
realistic (e.g., 2.4 children per family).
The mean is affected by extreme, or
outlying, values.

Page 17
2. AVERAGING:
Measuring Central Tendency

Mode (Mo.)
What it is
When to use it
Limitations

Page 18
2. AVERAGING:
Measuring Central Tendency
The Mode
The mode is simply the most frequently occurring
value in a data set.
With data grouped into class intervals, the
Mo. is the midpoint of the interval with the greatest
frequency.
Some things to remember about the mode:
The mode is the least utilized measure of
central tendency
For some types of data it is the most useful.
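A minimal Python sketch of finding the mode with the standard library; the pain-score data set is hypothetical.

```python
# A minimal sketch: statistics.multimode returns all most-frequent values,
# so ties (more than one mode) remain visible.
from statistics import multimode

pain_scores = [3, 5, 5, 6, 7, 5, 4, 6, 5, 3]   # hypothetical ordinal data
print(multimode(pain_scores))                   # [5] -> the mode is 5
```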
Page 19
2. AVERAGING:
Measuring Central Tendency

Median (Mdn.)
What it is
How to calculate (see the sketch below):
Odd n: the value at position (n + 1) / 2
Even n: the mean of the values at positions n / 2 and (n / 2) + 1
When to use it
Limitations
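A minimal Python sketch of the odd/even rule above, checked against the standard library's statistics.median; both data sets are hypothetical.

```python
# A minimal sketch of the median rule for odd and even n.
from statistics import median

def median_by_rule(values):
    xs = sorted(values)                  # rank the numbers
    n = len(xs)
    if n % 2 == 1:                       # odd n: value at position (n + 1) / 2
        return xs[(n + 1) // 2 - 1]      # -1 converts to 0-based indexing
    lo, hi = xs[n // 2 - 1], xs[n // 2]  # even n: positions n/2 and n/2 + 1
    return (lo + hi) / 2

odd_set = [7, 1, 5, 9, 3]
even_set = [7, 1, 5, 9, 3, 11]
print(median_by_rule(odd_set), median(odd_set))    # 5 5
print(median_by_rule(even_set), median(even_set))  # 6.0 6.0
```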

Page 20
2. AVERAGING:
Measuring Central Tendency
The Median
The median is the middle item in a set of numbers.
To calculate the Median:
Rank the numbers from largest to smallest.
Locate the middle item.
Use with ranked order measures.
Some things to remember about the median:
The median is not affected by extreme, or
outlying, numbers.
The median usually takes on a realistic,
meaningful value.

Page 21
Comparison between Mean, Mode, and Median

In a unimodal symmetrical distribution, all
three values are the same.
In an asymmetrical (skewed) distribution or curve:
Mo. falls at the highest point
M falls someplace towards the tail
Mdn. lies between the Mo. and M
M is the most stable value from sample to sample,
and Mo. is the least consistent.

Page 22
Exercise

Pulse (Beats/Min)
Group (1)
60, 65, 70, 75, 75, 60, 80, 90,
65, 60, 85
Group (2)
55, 65, 60, 70, 85, 100, 75, 90,
95, 105
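A minimal Python sketch that works the exercise with the standard library (mean and median for each group; Group 1's mode is 60, while Group 2 has no repeated value and therefore no single mode).

```python
# A minimal sketch: central tendency for the two pulse groups above.
from statistics import mean, median

group1 = [60, 65, 70, 75, 75, 60, 80, 90, 65, 60, 85]
group2 = [55, 65, 60, 70, 85, 100, 75, 90, 95, 105]

for name, data in [("Group 1", group1), ("Group 2", group2)]:
    print(f"{name}: mean = {mean(data):.1f}, median = {median(data)}")
```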

Page 23
3. Variability Indices
Measures of Dispersion

Definition:
The variation or scattering of data around
an average value, usually the arithmetic
mean ( M ).
The greater the spread of a distribution,
the greater the dispersion or variability (the
more heterogeneous).

Page 24
3. Variability Indices
Measures of Dispersion

Range
Average Deviation
Standard Deviation (SD)
Percentiles

Page 25
3. Variability Indices
Measures of Dispersion
Range (R)

The "distance" from the lowest (smallest) to the


highest (largest) value in a distribution.
The highest value minus the lowest value = R.
It is useful only as a gross descriptive statistic;
its value is totally dependent on the two extreme
scores.
For the data set 3, 3, 5, 6, and 8, the range is
R = 8 - 3 = 5.

Page 26
3. Variability Indices
Measures of Dispersion
Average Deviation (AD); Mean (M) = 95.41667 ≈ 95.42

X      Mean - X    |Mean - X|
75      20.42       20.42
90       5.42        5.42
95       0.42        0.42
100     -4.58        4.58
100     -4.58        4.58
90       5.42        5.42
95       0.42        0.42
90       5.42        5.42
95       0.42        0.42
100     -4.58        4.58
105     -9.58        9.58
110    -14.58       14.58

Sum of (Mean - X) = 0.00;  AD = average of |Mean - X| = 6.32
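A minimal Python sketch that reproduces the average deviation in the table above (the mean of the absolute deviations from the mean).

```python
# A minimal sketch: average deviation = mean of |Mean - X|.
data = [75, 90, 95, 100, 100, 90, 95, 90, 95, 100, 105, 110]
m = sum(data) / len(data)                       # 95.42
abs_devs = [abs(m - x) for x in data]           # |Mean - X|
avg_dev = sum(abs_devs) / len(data)
print(f"Mean = {m:.2f}, Average Deviation = {avg_dev:.2f}")   # 95.42, 6.32
```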
Page 27
3. Variability Indices
Measures of Dispersion

Standard Deviation (SD); Mean (M) = 95.41667 ≈ 95.42

X      Mean - X    (Mean - X)²
75      20.42       416.84
90       5.42        29.34
95       0.42         0.17
100     -4.58        21.01
100     -4.58        21.01
90       5.42        29.34
95       0.42         0.17
90       5.42        29.34
95       0.42         0.17
100     -4.58        21.01
105     -9.58        91.84
110    -14.58       212.67

Sum of (Mean - X) = 0.00;  SD = 8.908202 ≈ 8.91

Page 28
3. Variability Indices
Measures of Dispersion

Standard Deviation (SD)

SD = √( Σ(X - M)² / N )

Page 29
3. Variability Indices
Measures of Dispersion

Standard Deviation (SD)

Empirical Rule (for a normal curve):
Mean ± 1 SD covers about 68% of the values
Mean ± 2 SD covers about 95% of the values
Mean ± 3 SD covers about 99.7% of the values

(Figure: bell-shaped curve illustrating the empirical rule)
Page 30
3. Variability Indices
Measures of Dispersion
Standard Deviation(SD)
A measure of the spread of a distribution;
a computed value describing the amount of
variability in a particular distribution.
The more the values cluster around the mean, the
smaller the amount of variability or deviation.
A way to recognize deviation from the normal range
of variation.

Page 31
3. Variability Indices
Measures of Dispersion
Standard Deviation(SD)

It is a standard measure of the proportion of
variability such that, in a normal distribution:
within ±1 SD of the mean, about 68% of the values fall;
within ±2 SDs, about 95% of the values fall; and
within ±3 SDs, about 99.7% of the values fall.
Page 32
How To Calculate Standard Deviation(SD)

Standard Deviation is the square root of a measure


called the variance.
Before the SD can be determined, the mean ( M )
is found;
Then the deviation, or distance, of each score (X)
from M must be calculated.

Page 33
How To Calculate Standard Deviation(SD)

Each deviation ("x") is obtained by subtracting M
from each score (x = X - M). A low "x" means
little deviation; a high "x" means more dispersion
or variance.

The variance (SD²) is found by squaring each "x",
then finding the sum of the squares and dividing by the
total number of scores (N). The SD is the square root
of the variance.
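A minimal Python sketch of these steps, using the same data as the dispersion tables earlier in the deck; dividing by N gives the population SD, while the 8.91 shown in the earlier table corresponds to dividing by N - 1 (the sample SD), included here for comparison.

```python
# A minimal sketch: mean -> deviations -> variance -> SD.
import math

data = [75, 90, 95, 100, 100, 90, 95, 90, 95, 100, 105, 110]
n = len(data)
m = sum(data) / n                                  # step 1: the mean
devs = [x - m for x in data]                       # step 2: deviations x = X - M
variance_pop = sum(d * d for d in devs) / n        # step 3: divide by N
variance_smp = sum(d * d for d in devs) / (n - 1)  # sample variance (N - 1)
print(f"Population SD = {math.sqrt(variance_pop):.2f}")  # 8.53
print(f"Sample SD     = {math.sqrt(variance_smp):.2f}")  # 8.91
```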

Page 34
4. Data Comparison Techniques
Tests of Statistical Significance

In statistics, a result is called statistically


significant if it is unlikely to have occurred by
chance.
"A statistically significant difference" simply
means there is statistical evidence that there is
a difference; it does not mean the difference is
necessarily large, important, or significant in the
common meaning of the word.

p-value Page 35
4. Data Comparison Techniques
Tests of Statistical Significance

In statistics, a null hypothesis (H0) is a


hypothesis set up to be nullified or refuted in
order to support an alternative hypothesis.
When used, the null hypothesis is presumed
true until statistical evidence, in the form of a
hypothesis test, indicates otherwise; that is,
when the researcher has a certain degree of
confidence, usually 95% to 99%, that the data
do not support the null hypothesis.

p-value Page 36
4. Data Comparison Techniques
Tests of Statistical Significance
In scientific and medical applications, the null
hypothesis plays a major role in testing the
significance of differences in treatment and
control groups.
The assumption at the outset of the experiment
is that no difference exists between the two
groups (for the variable being compared): this
is the null hypothesis in this instance.

p-value Page 37
4. Data Comparison Techniques
Tests of Statistical Significance

Concepts related to tests of Significance

Confidence interval

Level of significance:

p-value Page 38
Concepts related to tests of Significance
Confidence Level for Change
The confidence level ranges from 0% to
100%. The higher the confidence level, the
greater the certainty that a change took place.

By default, 90% confidence is required to state that


the change is significant.

The confidence level is based on what statisticians
call the significance level, alpha level, or p-value.

The p-value is the probability that the observed
effect could have been due to natural variation
in the data, assuming no change took place.
Page 39
p-value
Concepts related to tests of Significance
Confidence Level for Change

A p-value of 0.05 would indicate that the
chance of the observed effect being due to
variation alone is low, 1 in 20. This is good
evidence that a change took place.

A p-value of 0.5 would indicate that there is a
50-50 chance of the observed effect even if no
change took place. This offers little or no
evidence of a change.

The smaller the p-value, the greater the


evidence of a change. (CPHQ)
Page 40
p-value
Concepts related to tests of Significance
Confidence Level for Change

The confidence level is 100 × (1 - p-value);
to find it, subtract the p-value from one and
multiply by 100. Therefore:

Confidence Level    p-value
99%                 0.01
95%                 0.05
90%                 0.1
80%                 0.2
50%                 0.5
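A minimal Python sketch of the conversion in the table above.

```python
# A minimal sketch: confidence level = 100 * (1 - p-value).
for p in (0.01, 0.05, 0.1, 0.2, 0.5):
    print(f"p-value = {p:<5} -> confidence level = {100 * (1 - p):.0f}%")
```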
p-value Page 41
4. Data Comparison Techniques
Tests of Statistical Significance
Parametric Tests:
Statistical methods that depend on the parameters of
populations or probability distributions are referred to as
parametric methods.
These tests are only meaningful for continuous data which is
sampled from a population with an underlying normal
distribution or whose distribution can be rendered normal by
mathematical transformation.
t test
Regression analysis
Non parametric tests:
Nonparametric methods require fewer assumptions about a
population or probability distribution and are applicable in a
wider range of situations:
Chi-square tests

Page 42
4. Data Comparison Techniques
Tests of Statistical Significance

Chi-Square (χ²) Test:
Compares ratios or rates using raw (tally) data.
Measures the difference between the groups or
conditions being compared in the counts or rates
of a particular occurrence, event, or outcome.
Example:
Comparing surgical facilities A and B on the rates
of postoperative infections.
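A minimal Python sketch of this comparison using scipy.stats.chi2_contingency; the infection tallies for facilities A and B are made-up numbers used only to illustrate the call.

```python
# A minimal sketch: chi-square test on a 2x2 table of tally data.
from scipy.stats import chi2_contingency

#           infected, not infected
observed = [[12, 188],    # Facility A (hypothetical tallies)
            [25, 175]]    # Facility B (hypothetical tallies)

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.3f}")
if p_value < 0.05:
    print("The infection rates differ significantly at the 95% confidence level.")
else:
    print("No statistically significant difference detected.")
```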

Page 43
4. Data Comparison Techniques
Tests of Statistical Significance

The t-Test
Comparing Means Based on Variances
The t-Test compares two sets of like things, using
averages to see if they indicate a real difference or
a difference likely to have occurred by chance.
Ex.: Comparing the average number of packs of
cigarettes smoked per month by smokers who had
a heart attack (MI) before age 50 vs. smokers who
didn't have a heart attack.
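A minimal Python sketch of this comparison using scipy.stats.ttest_ind (an independent two-sample t-test); the packs-per-month figures are made-up numbers used only to illustrate the call.

```python
# A minimal sketch: compare the means of two independent groups.
from scipy.stats import ttest_ind

mi_before_50 = [40, 55, 60, 45, 50, 65, 70, 55]   # hypothetical smokers with MI
no_mi        = [30, 25, 40, 35, 45, 30, 20, 35]   # hypothetical smokers without MI

t_stat, p_value = ttest_ind(mi_before_50, no_mi)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```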
Page 44
4. Data Comparison Techniques
Tests of Statistical Significance

Regression Analysis
Regression analysis is a statistical technique that
allows one to compare the entire distribution of
observations of one measurement (or variable) with
the entire distribution of another measure, in order
to determine how strongly the two variables are
interrelated (correlated).
Regression analysis is a way of evaluating the
kinds of data found in scatter diagrams.

Page 45
4. Data Comparison Techniques
Tests of Statistical Significance

A correlation coefficient ( r ) is the computed


value in regression analysis that expresses the
strength of the relationship between the two
measures.
The value of r ranges from -1 through 0 to +1.

Regression Analysis
Page 46
4. Data Comparison Techniques
Tests of Statistical Significance
An r approaching +1.0 indicates a strong positive
relationship between the measures, with both sets
of measure numbers increasing or both decreasing
together.
An r approaching -1.0 indicates a strong negative
relationship, with the numbers of one of the
measures increasing, and the numbers of the other
measure decreasing.
Measures with no significant relationship will have
an r of approximately zero (0).
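A minimal Python sketch of computing r with scipy.stats.pearsonr; the paired measurements are made-up numbers used only to illustrate the call.

```python
# A minimal sketch: correlation coefficient r for two paired measures.
from scipy.stats import pearsonr

x = [1, 2, 3, 4, 5, 6, 7, 8]      # e.g., years of exposure (hypothetical)
y = [2, 5, 4, 7, 9, 8, 12, 13]    # e.g., an outcome measure (hypothetical)

r, p_value = pearsonr(x, y)
print(f"r = {r:.2f} (p = {p_value:.4f})")   # r near +1 -> strong positive relationship
```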

Regression Analysis
Page 47
4. Data Comparison Techniques
Tests of Statistical Significance

Correlation Examples

Regression Analysis
Page 48
(Pages 49-52: scatterplot examples of positive, negative, and no correlation)
4. Data Comparison Techniques
Tests of Statistical Significance

Regression Analysis
Issues to remember
Comparing Two Distributions
Correlation
+ve Correlation
-ve Correlation
No Correlation
Correlation Coefficient r
Regression Analysis

Page 53
Important Notes

You CANNOT use tests developed for


continuous data with ordinal or nominal
variables.

Page 54
Graphs and Curves
Bar graphs usually present categorical and
numeric variables grouped in class intervals.

Page 56
Pictographs
A pictograph uses picture symbols to convey the meaning of
statistical information. Pictographs should be used carefully
because the graphs may, either accidentally or deliberately,
misrepresent the data. This is why a graph should be visually
accurate.

Page 57
A pie chart is a way of summarizing a set of
categorical data or displaying the different values of
a given variable (e.g., percentage distribution).

Page 58
Line graphs
Line graphs compare two variables: one is plotted along the
x-axis (horizontal) and the other along the y-axis (vertical).
Line graphs can also depict multiple series and are usually
the best choice for time series data and frequency
distributions.

Page 59
Scatterplots
In science, the scatterplot is widely used to present
measurements of two or more related variables. It is
particularly useful when the variable on the y-axis is
thought to be dependent on the values of the variable on
the x-axis (usually an independent variable).

Page 60
Normal Distribution curves
Symmetrical curves:
The two sides of the curve are identical if the polygon is
folded in half perpendicular to the baseline (bell-shaped or
rectangular).

Page 61
Skew. The skew of a distribution refers to how the curve
leans. When a curve has extreme scores on the right-hand side of
the distribution, it is said to be positively skewed. In other words,
when high numbers are added to an otherwise normal distribution,
the tail of the curve is pulled to the right (the positive direction). When
the tail is pulled to the left by extreme low scores, the curve is said to be
negatively skewed (J shape). The more skewed a distribution is,
the more difficult it is to interpret.

Page 62
Histograms
The histogram is a popular graphing tool. It is used
to summarize discrete or continuous data that are
measured on an interval scale.
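A minimal Python sketch of building a histogram with matplotlib (the library choice is an assumption; the slides do not name a tool), using hypothetical length-of-stay values.

```python
# A minimal sketch: histogram of continuous data grouped into class intervals.
import matplotlib.pyplot as plt

los_days = [2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 8, 9, 10, 12, 14]  # hypothetical LOS data

plt.hist(los_days, bins=range(1, 16))   # class intervals of 1 day
plt.xlabel("LOS (days)")
plt.ylabel("Frequency")
plt.title("Length of Stay for Stroke Patients")
plt.show()
```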

Page 63
Histographs
A histograph, or frequency polygon, is a graph
formed by joining the midpoints of histogram column
tops. These graphs are used only when depicting
data from the continuous variables shown on a
histogram.

Page 64
Graphic representation of Relationship

Scatter Diagram
Flowchart
Cause and Effect Diagram

Page 65
