
Introduction

Definition
Uses Of Statistics In Dental Sciences
Limitations
History
Population And Sample
Sampling
Data Its Types And Sources
Presentation Of Data
Measures Of Central Tendency
Variability
Distribution
Types Of Distribution
Probability
Statistical Methods
Tests Of Significance
Sensitivity And Specificity
Statistical Software
Conclusion
References
BELIEFS & HYPOTHESES → epidemiological statistical survey analysis → THEORIES & PRINCIPLES


It is the science & art of collection, compilation,
presentation, analysis, & interpretation of numerical data
concerned with biological events which are affected by
multiple factors.

It is the science which applies the theory of probability to the making of estimates & inferences about the characteristics of a population:

Health statistics – public/community health
Medical statistics – medicine
Vital statistics – demography
Dental statistics – dentistry


To assess the state of oral health in the community

To determine the availability and utilization of dental care facilities

To indicate the basic factors underlying the state of oral health by diagnosing the community and finding solutions to such problems

To determine success or failure of specific oral health care programmes, or to evaluate the programme action

To promote health legislation and to create administrative standards for oral health
Statistical laws are not exact laws like mathematical or chemical laws, but are only true in the majority of cases.

Ex: when we say that the average height of an adult Indian is 5'6", this indicates the height not of an individual but of a group of individuals.
A question posed by the Chevalier de Méré to the French mathematician Blaise Pascal (1623–1662) in 1654.

Consider the following bets:
1) One die is thrown four times; bet that at least one 6 appears.
2) Two dice are thrown simultaneously 24 times; bet that at least one throw shows a double 6.

The Chevalier asked whether it is correct that bet 1) wins more often than bet 2). The question led to a correspondence between Blaise Pascal and Pierre de Fermat (1601–1665), who were able to prove this assumption mathematically.
Pierre Simon Marquis de Laplace (1749–1827)
First summary of the whole knowledge in the area of probability theory.
Jacob Bernoulli (1654–1705)
Law of large numbers: relative frequencies converge to the probability of the event (repeated independent events).
Carl Friedrich Gauss (1777 - 1855)

Gauss (Normal distribution)


Karl (Carl) Pearson (1857–1936)
Methods of correlation and regression analysis (Pearson's correlation coefficient)
Chi-square distribution
Chi-square test
William Sealy Gosset (1876–1937) ("Student")
Worked from 1899 as a chemist in the Guinness brewery in Dublin
Analysed the quality of beer in the case of small sample sizes
Student's distribution (= t-distribution) substitutes the Gauss distribution when the variance is unknown
Sir Ronald Aylmer Fisher (1890 - 1962)
The Design of Experiments
variance analysis
F-distribution
population genetics
Egon Sharpe Pearson (1895–1980) and Jerzy Neyman (1894–1981)
Test theory from about 1940: the Neyman–Pearson test


Andrei Nikolaevich Kolmogorov (1903–1987)
Developed the basis of probability theory in 1933


Population – a group of individuals that we would like to know something about

Sample – a subset of a population (hopefully representative)

Parameter – a characteristic of the population in which we have a particular interest

Statistic – a characteristic of the sample

Examples:
The observed proportion of the sample that responds to treatment
The observed association between a risk factor and a disease in this sample
1. Well chosen
2. Sufficiently large (to minimize sampling error)
3. Adequate coverage
Studying the whole population is fully representative but expensive, time-consuming and often impractical. Instead, by looking at the characteristics of the sample (statistics), we infer something about the characteristics of the population (parameters). Time is saved, it is economical, supervision is better, and results are received earlier & applied in a timely manner.

The sample should be as large as possible: the bigger the sample, the higher the precision of the estimates derived from it.
n = Z² p (1 − p) / L²

where Z is the standard normal deviate (1.96 at the 95% confidence level), p the anticipated proportion, and L the allowable error.
Sample size depends on the following factors:

Idea about the estimate of the characteristic under study and its variability – pilot study
Knowledge about the precision of the estimate of the character
Probability level within which the desired precision is to be maintained
Availability of experimental material, resources etc.

1. Difference expected
2. Positive character
3. Degree of variation among subjects
4. Level of significance desired – p value
5. Power of the study desired
6. Drop-out rate
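The sample-size formula above can be sketched as follows (a minimal illustration; the anticipated prevalence and allowable error are assumed example values, not from the slide):

```python
import math

def sample_size(p, allowable_error, z=1.96):
    """Minimum sample size for estimating a proportion:
    n = Z^2 * p * (1 - p) / L^2, with Z = 1.96 for 95% confidence."""
    return math.ceil(z ** 2 * p * (1 - p) / allowable_error ** 2)

# e.g. anticipated prevalence p = 0.5 (most conservative), allowable error L = 0.05
n = sample_size(0.5, 0.05)  # 385 subjects
```

Using p = 0.5 maximizes p(1 − p), so it gives the largest (safest) sample size when the true prevalence is unknown.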


Sampling Methods

Probability:
1. Simple random
2. Systematic/serial
3. Stratified
4. Multiphase
5. Cluster

Non-probability:
1. Accidental/convenience sampling
2. Purposive/non-random/selective
3. Network/snowball
4. Quota sampling
5. Dimensional
Random: every unit of the population has an equal chance of being selected in the sample

Applicable when the population is small, homogeneous and readily available

To ensure randomness: lottery method, or table of random numbers
[Figure: a simple random sample of 20 cases selected from a grid of 200 numbered units]
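The grid illustration above can be reproduced with a short sketch (population size and sample size follow the slide's figure):

```python
import random

random.seed(1)  # fixed seed only so the illustration is reproducible

population = list(range(1, 201))        # 200 numbered units, as in the grid
sample = random.sample(population, 20)  # every unit has an equal chance
```

`random.sample` draws without replacement, so no unit can appear twice.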
Systematic (serial) sampling
Applied to field studies
K = sampling interval
K = total population / sample size desired
Advantages: simple; less time & labor; results fairly accurate
[Figure: a systematic sample – every Kth unit selected from the grid of 200 numbered units]
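The K = N/n rule can be sketched as follows (a random start within the first interval, then every Kth unit; the numbers match the slide's grid of 200):

```python
import random

population = list(range(1, 201))   # 200 numbered units
desired = 20
k = len(population) // desired     # sampling interval K = N / n = 10

random.seed(2)                     # fixed seed for a reproducible illustration
start = random.randint(1, k)       # random start within the first interval
sample = population[start - 1::k]  # then every Kth unit
```

Only the starting unit is random; the rest of the sample is determined by the interval, which is why the method is so quick in the field.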
Stratified sampling
The target population is divided into homogeneous groups or classes called strata
Strata: age, sex, classes, geographical area
More representative sample; greater accuracy; covers a wide area

Example of sampling in phases: school children surveyed for oral disease → those with disease → those requiring treatment

Cluster sampling
A cluster is a randomly selected group
Units of the population occur in natural groups or clusters: villages, slums of a town
Simple method, less time and cost, but higher error
Used where:
(1) no sampling frame is directly available, and/or
(2) simple random sampling would be expensive, complex, time-consuming and/or logistically difficult
Sampling errors – faulty sample design; small sample size

Non-sampling errors – coverage error; observational error; processing error
1. Controls
2. Randomization or random allocation
3. Cross-over design
4. Placebo
5. Blinding technique – single/double blinding


There are three types:
1. Observer – subjective/objective
2. Instrumental
3. Sampling defects or error of bias

This is also called systematic error. It occurs when there is a tendency to produce results that differ in a systematic manner from the true values. A study with small systematic error is said to have high accuracy. Accuracy is not affected by the sample size.
There are as many as 45 types of biases; the important ones are:
1. Selection bias
2. Measurement bias
3. Confounding bias
Dahlberg in 1940 used this formula to calculate the method error:

Method error = √(Σd² / 2n)

where d = the difference between the two measurements of a pair, and n = the number of subjects (pairs).
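Dahlberg's formula can be sketched directly (the duplicate readings below are hypothetical, for illustration only):

```python
import math

def dahlberg_error(first, second):
    """Dahlberg's method error: sqrt(sum(d^2) / 2n), where d is the
    difference between duplicate measurements and n the number of pairs."""
    d = [a - b for a, b in zip(first, second)]
    return math.sqrt(sum(x * x for x in d) / (2 * len(d)))

# hypothetical duplicate measurements on 3 subjects
me = dahlberg_error([21.0, 22.5, 20.0], [21.4, 22.1, 20.2])
```

A small method error relative to the measured differences suggests the measurement technique itself is reproducible.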
Data: a collective recording of observations, either numerical or otherwise.
Data are not necessarily information, and having more data does not necessarily produce better decisions.

The goal is to summarise and present data in useful ways to support prompt and effective decisions.
Types of data

Constant vs variable; grouped vs ungrouped
Quantitative: 1) discrete 2) continuous
Qualitative: 1) nominal 2) ordinal
Discrete: when the variable under observation takes only fixed values, like whole numbers, the data are discrete.

Continuous: the variable can take any value in a given range, decimal or fractional.
Ordinal variables – a set of data is said to be ordinal if the values/observations belonging to it can be ranked (put in order) or have a rating scale attached. You can count and order, but not measure, ordinal data.

Nominal variables have no natural ranking order.

Dichotomous variables have only two categories, such as male/female.
Sources of data
Primary: interview, measurement, questionnaire
Secondary: records
Nominal data (qualitative data)
Frequency distribution of some characteristic – sex, religion etc.
Ordinal data
According to rank or order – no implication of class interval
Interval data
Placed in meaningful intervals or order
Ratio data
Frequency distribution in logical order & meaningful groups; a meaningful ratio exists; it helps in further analysis
Uses: selection of data presentation; selection of test of significance

1. Variation – biological & sampling
2. Mistakes & errors – instrumental, systematic, random
3. Bias – selection bias, measurement bias, confounding bias
Presentation of data

Tables – tabulation
Charts & diagrams – graphs, pictures, diagrams, special curves
Devices for presenting data simply from masses of statistical data

The first step before data is used for statistical analysis

Tables may be simple or complex
Every table should contain a title as to what is depicted in the table
Each row and column should be clearly defined with a heading
Units of measurement should be specified
If the data are not original, the source of the data should be mentioned at the bottom of the table
Every diagram must be given a title which is self-explanatory
It should be simple and consistent with the data
Usually the values of the variable are presented on the horizontal or x-axis and the frequency on the y-axis
The number of lines drawn in any graph should not be many, so that the diagram does not look clumsy
The scale of presentation of the x and y axes should be mentioned at the right-hand top corner of the graph
Line diagram: useful to study changes of values in the variable over time, and the simplest type of diagram
On the x-axis, time such as hours, days, weeks, months or years is represented, and the value of any quantity pertaining to this is represented along the y-axis
Bar diagram: used to compare qualitative data with respect to a single variable, such as time or region
Represents qualitative data
When it is desired to represent both the number of cases in major groups as well as in the subgroups simultaneously, we use a component bar diagram
Histogram: used to depict quantitative data of continuous type
A histogram is a bar diagram without gaps between the bars
It represents a frequency distribution
On the x-axis the size of observation is marked; on the y-axis, the frequencies
Frequency polygon: used to represent the frequency distribution of quantitative data, and to compare two or more frequency distributions
To draw a frequency polygon, a point is marked over the mid-point of each class interval, corresponding to the frequency; these points are then connected by straight lines
Pie chart
The graph looks like a pie; components represent slices cut from the pie
Shows the % breakdown of qualitative data
Cannot be used for two or more data sets
Venn diagrams or set
diagrams are diagrams that
show all hypothetically
possible logical relations
between a finite collection of
sets
Box plot
It is a representation of the quartiles (25%, 50% & 75%) and the range of a continuous and ordered data set
The y-axis can be arithmetic or logarithmic
A box plot can be used to compare different distributions of data values
1. Collection of data
2. Classification
3. Tabulation
4. Presentation by graphs
5. Descriptive statistics
6. Establishment of relationship
7. Interpretation
Refers to the middle of the distribution: a value or parameter which serves as a single estimate of a series of data
Gives a mental picture of the central value and enables comparison
One central value around which all other observations are dispersed
OBJECTIVE
To condense the entire mass of data
To facilitate comparison
TYPES
Arithmetic mean- mathematical estimate
Median - positional estimate
Mode- based on frequency
Should be easy to understand and compute
Should be based on each and every item in the series
Should not be affected by extreme observations (either too small or too large values)
Should be capable of further statistical computation
Should have sampling stability, i.e. if different samples of the same size are picked from the same population and the measure of central tendency is calculated for each, they should not differ from each other markedly
The simplest measure of central tendency

Mean = sum of all the observations of the data / number of observations in the data

X̄ = Σ Xᵢ / n

Σ (sigma) means "the sum of"; Xᵢ is the value of each observation in the data; n is the number of observations in the data.

Example: for the observations 0, 1, 2, …, 10, the mean = 5.
The mean is the most common measure of central tendency, but it is affected by extreme values (outliers).

Median: the middle value in a distribution
One half of the values are smaller than or equal to the median; the other half are higher than or equal to it
A robust measure of central tendency, not affected by extreme values
Observations are arranged in order of magnitude & the middle value of the observations is the median
Odd number of observations: the median is the value at position (n + 1) / 2
Even number of observations: the mean of the two middle values (those at positions n/2 and n/2 + 1)
Mode: the value in a series of observations which occurs with the greatest frequency
Not affected by extreme values
Used for either numerical or categorical data
There may be no mode
There may be several modes

Example (from the slide's figure, values 0–14 with 9 the most frequent): Mode = 9
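The three measures can be sketched with Python's standard library (the data sets are illustrative; the mode example repeats 9 to match the slide's figure):

```python
import statistics

data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
mean = statistics.mean(data)      # arithmetic mean: sum / count
median = statistics.median(data)  # middle value of the ordered data

mode_data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12, 13, 14]
mode = statistics.mode(mode_data)  # most frequent value
```

For this symmetric data set, mean and median coincide, as the normal-distribution slides later point out.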
Synonyms:
Measures of dispersion
Measures of variation or scatter
Dispersion is the degree of spread or variation of
the variable about a central value.
Uses:
Determine reliability of an average
Serve as a basis of control of variability
Comparison of two or more series
Facilitate further statistical analysis

A good measure of dispersion is simple, easy to compute, based on all items, amenable to further analysis, and not affected by extreme values.
Measures of dispersion of individual observations:
Range
Interquartile range
Mean deviation
Standard deviation
Coefficient of variation
Range – a measure of variation: the difference between the value of the smallest item and the value of the largest item
The simplest method
Gives no information about the values that lie between the extreme values
Subject to fluctuation from sample to sample
Ignores the way in which the data are distributed

Example: for the values 7, 8, 9, 10, 11, 12: Range = 12 − 7 = 5
Mean deviation: the average of the absolute deviations from the arithmetic mean

M.D. = Σ|x − x̄| / n
Standard deviation: the most important and widely used measure
It is the square root of the mean of the squared deviations from the arithmetic mean ("root mean square deviation"):

S = √( Σᵢ (Xᵢ − X̄)² / (n − 1) )

Greater the deviation, greater the dispersion; smaller the deviation, higher the degree of uniformity
Calculate the mean, x̄
Find the difference of each observation from the mean: d = xᵢ − x̄
Square these: d²
Total these: Σd²
Divide by the number of observations minus 1 to get the variance: Σd² / (n − 1)
The square root of this variance is the standard deviation:

S.D. = √( Σd² / (n − 1) )
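The six steps above translate line by line into code (the data set is illustrative, reusing the values from the range example):

```python
import math

data = [7, 8, 9, 10, 11, 12]

mean = sum(data) / len(data)         # step 1: the mean
d = [x - mean for x in data]         # step 2: deviations from the mean
sum_sq = sum(v * v for v in d)       # steps 3-4: square each, then total
variance = sum_sq / (len(data) - 1)  # step 5: divide by n - 1
sd = math.sqrt(variance)             # step 6: square root of the variance
```

Dividing by n − 1 rather than n is the sample (unbiased) form, matching the formula on the slide.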
Summarizes the deviations of a large distribution
Indicates whether the variation from the mean is by chance or real
Helps in finding the standard error
Helps in finding the suitable size of sample
Standard deviation is only interpretable as a summary measure for variables having approximately symmetric distributions
Coefficient of variation: used to compare relative variability –
the variation of the same character in two or more series,
the variability of one character in two different groups having different magnitudes of values,
or two characters in the same group, by expressing the SD as a percentage of the mean:

CV = (S.D. / Mean) × 100
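Because CV is unit-free, it lets us compare characters measured on different scales. A minimal sketch (the height and weight values are made up for illustration):

```python
import statistics

def cv(values):
    """Coefficient of variation: (SD / mean) * 100, in percent."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# compare variability of two characters with different units and magnitudes
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [52, 58, 61, 66, 79]
more_variable = "weight" if cv(weights_kg) > cv(heights_cm) else "height"
```

Here weight has the larger CV, so it is relatively more variable even though its SD in raw units is comparable to that of height.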
When you have a collection of points, you begin the initial analysis by plotting them on a graph to see how they are distributed.

1.Normal-Gaussian
2.Binomial
3.Poisson
4.Rectangular or uniform
5.Skewed
6.Log normal
7.Geometric
[Figure: uniform (rectangular) and bimodal distributions]
Described by De Moivre (1733) and later, independently, by Gauss
The height of the bars or curve is greatest in the middle
Values are spread around the mean: maximum values around the mean, few at the extremes
Half the values lie above & half below the mean
The normal distribution / Gaussian distribution curve is bell shaped.

The curve is symmetrical about the middle point
The mean is located at the highest point of the curve
The measures of central tendency coincide
The maximum number of observations is at the value of the variable corresponding to the mean; the number of observations gradually decreases on either side, with very few observations at the extreme points

The area under the curve between any 2 points corresponds to the number of observations between those 2 values of the variate, in terms of the relationship between the mean and the SD:
a) Mean ± 1 S.D. covers 68.3% of the observations
b) Mean ± 2 S.D. covers 95.4% of the observations
c) Mean ± 3 S.D. covers 99.7% of the observations

These give confidence limits and are used for fixing the confidence interval. The normal distribution law forms the basis for various tests of significance.
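The 68.3/95.4/99.7 rule can be checked empirically by simulation (a sketch using Python's standard library; the simulated data are illustrative):

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is reproducible
data = [random.gauss(0, 1) for _ in range(100_000)]  # simulated normal variate

mean = statistics.mean(data)
sd = statistics.stdev(data)

# fraction of observations within mean +/- 1, 2 and 3 SD
within1 = sum(abs(x - mean) <= 1 * sd for x in data) / len(data)
within2 = sum(abs(x - mean) <= 2 * sd for x in data) / len(data)
within3 = sum(abs(x - mean) <= 3 * sd for x in data) / len(data)
```

With 100,000 simulated observations the three fractions come out very close to 0.683, 0.954 and 0.997.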
The shape describes how the data are distributed: symmetric or skewed

Left-skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-skewed: Mode < Median < Mean
The binomial distribution is used for describing DISCRETE, not continuous, data. These values result from an experiment known as a Bernoulli process. It describes populations whose members either have a certain characteristic or do not; the distribution of the occurrence of the characteristic in the population is defined by the binomial distribution.
If, in a binomial distribution, the probability of success of an event becomes indefinitely small and the number of observations becomes very large, the binomial distribution tends to the Poisson distribution.

This is used to describe the occurrence of rare events in a large population.
Probability: the possibility of occurrence of any event, or a permutation/combination of events – e.g. to determine whether the treatment group is different from the control group. If P is less than 0.05, there are fewer than 5 chances in 100 that the difference we observe is due to random chance alone.

Probability scale = 0 to 1
P = 0: absolutely no chance that the observed difference is due to sampling variation
P = 1: absolutely certain that the observed difference is due to sampling variation
In practice P lies between 0 and 1; computing the P value and drawing an inference from it is the essence of any test of significance.
P value:
0.05 or more – difference due to chance (sampling variation): statistically not significant
Less than 0.05 – difference due to the samples themselves (not due to sampling variation): statistically significant

P < 0.001: very highly significant
P < 0.01: highly significant
P < 0.05: significant
P > 0.05: not significant
The Z score indicates how much an observation is bigger or smaller than the mean, in units of SD:

Z = (Observation − Mean) / Standard Deviation

The Z score is the number of SDs by which the sample mean departs from the population mean. As this critical ratio increases, the probability of accepting the null hypothesis decreases.
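The ratio above is a one-liner in code (the DMFT example values are assumed, purely for illustration):

```python
def z_score(observation, mean, sd):
    """Number of SDs an observation lies above (+) or below (-) the mean."""
    return (observation - mean) / sd

# e.g. a score of 7 in a population with mean 4 and SD 2
z = z_score(7, 4, 2)  # 1.5 SDs above the mean
```

A |Z| beyond 1.96 would place the observation outside the central 95% of a normal distribution, which is exactly the rejection zone described later.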
Statistical hypotheses:
Null hypothesis – the hypothesis of no difference
Alternative hypothesis – the hypothesis of a significant difference
STEP 1: Set up the null hypothesis
STEP 2: Set up the alternate hypothesis
STEP 3: Choose the appropriate level of significance
STEP 4: Compute the value of the test statistic:
Z = observed difference / standard error
STEP 5: Obtain the table value at the given level of significance
STEP 6: Compare the value of Z with the table value
STEP 7: Draw the conclusion


Initial assumption: the null hypothesis H₀ (no difference). An experiment is performed, a difference is observed, and a test of significance yields a p-value. If the difference is real, the null hypothesis is rejected and the alternate hypothesis [H₁] is accepted; if there is no real difference, the null hypothesis is accepted. Acceptance or rejection is valid only if a sufficient sample size exists.
Conclusion based on the sample versus the true state of the population:

Null hypothesis true – rejected: TYPE I ERROR; accepted: correct decision
Null hypothesis false – rejected: correct decision; accepted: TYPE II ERROR
To minimize errors, the sampling distribution or area under the normal curve is divided into two regions or zones:

1. ZONE OF ACCEPTANCE: samples in the area of mean ± 1.96 SE – null hypothesis accepted
2. ZONE OF REJECTION: samples in the shaded area beyond mean ± 1.96 SE – null hypothesis rejected
Whenever 2 sets of observations are compared, it becomes essential to find whether the difference in observations between the 2 groups is because of sampling variation or any other factor.

TESTS OF SIGNIFICANCE

Parametric / Non-parametric:
1. Student's paired t test / Wilcoxon signed rank test
2. Student's unpaired t test / Wilcoxon rank sum test
3. One-way ANOVA / Kruskal–Wallis one-way ANOVA
4. Two-way ANOVA / Friedman two-way ANOVA
5. Correlation coefficient / Spearman's rank correlation
6. Regression analysis / Chi-square test


Parametric tests are performed with the following assumptions:
The dependent variable is of continuous type (measured on an interval or ratio scale)
The underlying population has a normal distribution
When differences or measures of association are being tested, the variances of the samples do not vary significantly

For testing difference:
t-test (paired & unpaired)
Analysis of variance (ANOVA)
Test of significance of the difference between 2 means
For testing correlation:
Pearson's product moment correlation coefficient
Non-parametric tests do not start with these assumptions.
1. For ordinal & non-parametric data, to test difference:
1. Mann–Whitney test
2. Wilcoxon's matched pairs test
3. Sign test
4. McNemar's test
5. Kruskal–Wallis test
6. Friedman's test
7. Kendall's test
2. For category-type data, to test difference:
1. Chi-square test
2. Fisher's exact test
3. Cochran's Q test
3. For ordinal & non-parametric data, to test correlation:
1. Spearman's rank correlation
2. Kendall's tau
4. For nominal-type data:
1. Phi coefficient
2. Cramer's V
There are many theoretical distributions, both continuous and discrete. Four of these are used a lot:
z (unit normal) – the sampling distribution of means
t
chi-square – the sampling distribution of variances
F – ANOVA
Uses of the Z test – to compare:
a sample mean with the population mean
the means of two samples
a sample proportion with the population proportion
the proportions of two samples
the association between two attributes

One-sided tests have one rejection region, i.e. you check whether the parameter of interest is larger (or smaller) than a given value.

Two-sided tests are used when we test a parameter for equivalence to a certain value; deviations from that value in both directions are rejected.
Z test: for large samples (n > 30)
The difference observed between the sample estimate and that of the population is expressed in terms of SE
The score or value of the ratio between the observed difference & its SE is called Z:

Z = difference in means / SE of the mean
To compare two methods, it is often important to know whether the variances for both methods are the same. In order to compare two variances V₁ and V₂, calculate the ratio of the two variances. This ratio is called the F statistic:

F = V₁ / V₂
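The variance ratio can be sketched as follows (the two small samples are illustrative; by convention the larger variance is placed in the numerator so F ≥ 1 when looking up a two-tailed table):

```python
import statistics

def f_ratio(sample1, sample2):
    """F statistic: ratio of the two sample variances, larger over smaller."""
    v1 = statistics.variance(sample1)
    v2 = statistics.variance(sample2)
    return max(v1, v2) / min(v1, v2)

f = f_ratio([4, 3, 5, 2, 3, 1], [6, 3, 8, 9, 5, 3])
```

An F near 1 suggests the two variances are similar, which is one of the assumptions the parametric tests above rely on.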
ANOVA: compares more than two samples – variation between the classes as well as within the classes. For such comparisons there is a high chance of error using the t or Z test.

One-way ANOVA compares more than 3 means from independent groups – e.g. is the age different between White, Black and Hispanic patients?
Two-way ANOVA compares 2 or more means by 2 or more factors – e.g. is the age different between males and females, with and without pneumonia?

Variants: one-way ANOVA, two-way ANOVA, factorial ANOVA, mixed-design ANOVA, multivariate ANOVA (MANOVA)
When the degree of linear (straight-line) association between two variables is required, the correlation coefficient is calculated.
Ex: measure the changes in height and the changes that occurred in weight, and plot the determined values on graph paper. A line of best fit is then made to connect the majority of the plotted values. One has to look at a scatter plot of the data before placing any importance on the magnitude of the correlation.

No.  Height (cm)  Weight (kg)
1    182.1        79.5
2    172.5        61.5
3    175.7        68.2
4    172.8        66.4
5    160.3        52.6
6    165.5        54.3
7    172.8        61.1
8    162.4        52.8
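Pearson's r for the height/weight table above can be computed directly from its definition:

```python
import math

heights = [182.1, 172.5, 175.7, 172.8, 160.3, 165.5, 172.8, 162.4]
weights = [79.5, 61.5, 68.2, 66.4, 52.6, 54.3, 61.1, 52.8]

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) *
                           sum((b - my) ** 2 for b in y))

r = pearson_r(heights, weights)  # strong positive correlation
```

As the scatter plot would suggest, taller subjects tend to be heavier, so r comes out close to +1.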
The correlation coefficient (r) may indicate positive correlation, negative correlation, or absolutely no correlation.
Linear regression analysis
Linear regression is related to correlation analysis. It seeks to quantify the linear relationship that may exist between an independent variable x and a dependent variable y:

Y = a + bX
The t test: comparing two means to see if they are significantly different from each other, for small samples. Designed by W. S. Gosset, it is the ratio of the observed difference between the means of two small samples to the SE of that difference. When each individual gives a pair of observations, to test for the difference in the pairs of values, the paired t test is utilized.

Unpaired t test: used to compare the average (mean) in one group with the average in another group. Univariate, unmatched, interval, normal, 2 groups.
E.g. 6 boys on diet A: 4, 3, 5, 2, 3, 1
9 boys on diet B: 6, 3, 8, 9, 5, 3, 4, 2, 5
(n₁ = 6, n₂ = 9; pooled SD ≈ 2.04)
Test the significance of the difference between diets A and B with regard to their effect on increase in weight.
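The unpaired t statistic for the diet example can be worked out step by step; the pooled SD reproduces the slide's value of about 2.04:

```python
import math
import statistics

diet_a = [4, 3, 5, 2, 3, 1]           # weight gain, n1 = 6
diet_b = [6, 3, 8, 9, 5, 3, 4, 2, 5]  # weight gain, n2 = 9

n1, n2 = len(diet_a), len(diet_b)
m1, m2 = statistics.mean(diet_a), statistics.mean(diet_b)

# pooled variance, then the standard error of the difference in means
sp2 = ((n1 - 1) * statistics.variance(diet_a) +
       (n2 - 1) * statistics.variance(diet_b)) / (n1 + n2 - 2)
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))

t = (m2 - m1) / se  # compare against the t table at n1 + n2 - 2 = 13 d.f.
```

Here t ≈ 1.86 at 13 degrees of freedom, which falls short of the 5% two-tailed critical value, so the observed difference could well be due to sampling variation.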
Paired t test: used to compare the averages of measurements made twice within the same person – before vs. after.

For example: did the systolic blood pressure change significantly from the scene of the injury to admission?

Univariate, matched, interval, normal, 2 groups.
The chi-square test: the most commonly used statistical test.
Used for qualitative data, to test whether the difference in the distribution of attributes in different groups is due to sampling variation or otherwise.
Used when data are in frequencies, such as the number of responses in 2 or more categories.

Three important applications:
1. Proportion
2. Association
3. Goodness of fit
Test of proportions: the significance of the difference in 2 or more proportions. With values < 30 it is applied with Yates' correction.

Test of association: tests the association between 2 events in samples – the most important application. It gives the probability of association between 2 discrete attributes: smoking vs. cancer, weight vs. diabetes, BP vs. heart disease.
Two events may be dependent (they influence each other: ASSOCIATED) or independent (they do not influence each other: NOT ASSOCIATED). The test gives p, the relative frequency of the association arising by chance, and tells whether the 2 events are associated or depend on each other.
Test of goodness of fit: to determine if the actual numbers are similar to the expected or theoretical numbers, i.e. goodness of fit to a theory.

Chi-square calculation:

χ² = Σ (observed f − expected f)² / expected f

d.f. = (R − 1) × (C − 1)
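For a 2×2 table the calculation can be sketched as follows; the expected count in each cell is (row total × column total) / grand total, and the counts used here are hypothetical:

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square = sum((O - E)^2 / E) over a 2x2 table.
    d.f. = (2 - 1) * (2 - 1) = 1."""
    n = a + b + c + d
    chi2 = 0.0
    # (observed, row total, column total) for each of the four cells
    for obs, row, col in [(a, a + b, a + c), (b, a + b, b + d),
                          (c, c + d, a + c), (d, c + d, b + d)]:
        exp = row * col / n            # expected count for this cell
        chi2 += (obs - exp) ** 2 / exp
    return chi2

# hypothetical exposure-vs-disease counts
chi2 = chi_square_2x2(30, 20, 10, 40)
```

The result is compared with the chi-square table at 1 degree of freedom; a value above 3.84 is significant at the 5% level.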


These tests are the non-parametric equivalents of Student's t tests: the Wilcoxon signed rank test is used for paired data, and the Wilcoxon rank sum test in the case of unpaired data.

These are similar to the parametric ANOVA tests: Kruskal–Wallis is used for one-way analysis of variance and Friedman for two-way analysis of variance.

Spearman's rank correlation and Kendall's rank correlation are the non-parametric equivalents of the correlation coefficient test.
It is used to classify cases into the values of
a categorical dependent, usually a
dichotomy. If discriminant function
analysis is effective for a set of data, the
classification table of correct and incorrect
estimates will yield a high percentage
correct.
Gene Glass (1976) coined the term meta-analysis. The technique of meta-analysis involves reviewing and combining the results of various previous studies. Provided the studies involved similar treatments, similar samples, and measured similar outcomes, this can be a useful approach.
Use / Parametric / Non-parametric:
To compare two paired samples for equality of means: paired t test / Wilcoxon signed rank test
To compare two independent samples for equality of means: unpaired t test / Mann–Whitney test
To compare more than two samples for equality of means: ANOVA / Kruskal–Wallis test, chi-square
Clinical research can indeed have controls. Provided that studies are conducted on a prospective basis, controlled clinical studies can be quite powerful.

Uncontrolled clinical studies are of questionable validity, whether or not they are subjected to statistical analysis.
The sensitivity of a test is the probability that the test is positive for those subjects who actually have the disease. A perfect test will have a sensitivity of 100%. The sensitivity is also called the true positive rate.

The specificity of a test is the probability that the test is negative for those in whom the disease is absent. A perfect test will have a specificity of 100%. The specificity is also called the true negative rate.
DIAGNOSIS
Test result    Disease present  Disease absent  Total
Positive (+)   a (8)            b (10)          a + b (18)
Negative (−)   c (20)           d (62)          c + d (82)
Total          a + c (28)       b + d (72)      N (100)

Sensitivity = a/(a + c) × 100
Specificity = d/(b + d) × 100
Predictive value of a positive test = a/(a + b) × 100
Predictive value of a negative test = d/(c + d) × 100
% false negatives = c/(a + c) × 100
% false positives = b/(b + d) × 100
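All six indices follow mechanically from the table's four cells; here they are computed for the slide's example counts:

```python
# counts from the 2x2 table: a = true +, b = false +, c = false -, d = true -
a, b, c, d = 8, 10, 20, 62

sensitivity = a / (a + c) * 100  # true positive rate
specificity = d / (b + d) * 100  # true negative rate
ppv = a / (a + b) * 100          # predictive value of a positive test
npv = d / (c + d) * 100          # predictive value of a negative test
false_neg = c / (a + c) * 100    # % false negatives
false_pos = b / (b + d) * 100    # % false positives
```

Note that sensitivity and % false negatives sum to 100, as do specificity and % false positives, since they partition the diseased and non-diseased columns respectively.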
SPSS (PASW) – 1968 – Norman H. Nie and C. Hadlai Hull

MINITAB – Barbara F. Ryan, Thomas A. Ryan, Jr., and Brian L. Joiner, 1972

Epi Info – developed by the Centers for Disease Control and Prevention (CDC), Atlanta

MICROSOFT EXCEL
To conclude, statistics might not seem of much significance in routine clinical practice, but when it comes to pursuing newer techniques and principles, scientific substantiation can be obtained only with statistical verification and interpretation.
References
Park's Textbook of Preventive and Social Medicine, 17th ed.
Essentials of Preventive and Community Dentistry – Soben Peter, 2nd ed.
Fundamentals of Biostatistics – Sanjeev B. Sarmukkadam
Statistical and Methodological Aspects of Oral Health Research – Lesaffre
