
Introduction

Definition
Uses Of Statistics In Dental Sciences
Limitations
History
Population And Sample
Sampling
Data Its Types And Sources
Presentation Of Data
Measures Of Central Tendency
Variability
Distribution
Types Of Distribution
Probability
Statistical Methods
Tests Of Significance
Sensitivity And Specificity
Statistical Software
Conclusion
References
BELIEFS & HYPOTHESES → epidemiological statistical survey analysis → THEORIES & PRINCIPLES


It is the science & art of collection, compilation,
presentation, analysis, & interpretation of numerical data
concerned with biological events which are affected by
multiple factors.

It is the science which applies the theory of probability to the making of estimates & inferences about the characteristics of a population:

Health statistics – public/community health
Medical statistics – medicine
Vital statistics – demography
Dental statistics – dentistry


To assess the state of oral health in the community

To determine the availability and utilization of dental care facilities

To indicate the basic factors underlying the state of oral health by diagnosing the community and finding solutions to such problems

To determine success or failure of specific oral health care programmes, or to evaluate the programme action

To promote health legislation and to create administrative standards for oral health
Statistical laws are not exact laws like mathematical or chemical laws, but are only true in the majority of cases.

Ex: when we say that the average height of an adult Indian is 5'6", this indicates the height not of an individual but of a group of individuals.
A question posed by the Chevalier de Méré to the French mathematician Blaise Pascal (1623–1662) in 1654.

Consider the following bets:
1) One die is thrown four times; bet that at least one 6 appears.
2) Two dice are thrown simultaneously 24 times; bet that at least one throw shows a double 6.

The Chevalier asked whether it is correct that bet 1) wins more often than bet 2). The question led to a correspondence between Blaise Pascal and Pierre de Fermat (1601–1665), who were able to prove this assumption mathematically.
Pierre Simon Marquis de Laplace (1749–1827)
First summary of the whole knowledge in the area of probability theory.
Jacob Bernoulli (1654–1705)
Law of large numbers: relative frequencies converge to the probability of the event (repeated independent events).
Carl Friedrich Gauss (1777 - 1855)

Gauss (Normal distribution)


Karl (Carl) Pearson (1857–1936)
Methods of correlation and regression analysis (Pearson's correlation coefficient)
Chi-square distribution
Chi-square test
William Sealy Gosset (1876–1937) ("Student")
Worked from 1899 as a chemist in the Guinness brewery in Dublin
Analysed the quality of beer in the case of small sample sizes
Student's distribution (= t-distribution) substitutes the Gauss distribution when the variance is unknown
Sir Ronald Aylmer Fisher (1890 - 1962)
The Design of Experiments
variance analysis
F-distribution
population genetics
Egon Sharpe Pearson (1895–1980) and Jerzy Neyman (1894–1981)
Test theory from about 1940: the Neyman–Pearson test


Andrei Nikolaevich Kolmogorov (1903–1987)
Developed the basis of probability theory in 1933


Population – a group of individuals that we would like to know something about

Sample – a subset of a population (hopefully representative)

Parameter – a characteristic of the population in which we have a particular interest

Statistic – a characteristic of the sample

Examples:
The observed proportion of the sample that responds to treatment
The observed association between a risk factor and a disease in this sample
1. Well chosen
2. Sufficiently large (to minimize sampling error)
3. Adequate coverage
Studying the whole population is fully representative but expensive, time-consuming and often impractical. Instead, by looking at the characteristics of the sample (statistics), we infer something about the characteristics of the population (parameters). Time is saved, it is economical, supervision is better, and results are received earlier & applied in a timely manner.

The sample should be as large as possible: the bigger the sample, the higher the precision of the estimates derived from it.
n = Z² p (1 − p) / L²

where Z is the standard normal deviate (1.96 at the 95% confidence level), p the anticipated proportion, and L the allowable error.
Sample size depends on the following factors:

Idea about the estimate of the characteristic under study and its variability – pilot study
Knowledge about the precision of the estimate of the character
Probability level within which the desired precision is to be maintained
Availability of experimental material, resources etc.

1. Difference expected
2. Positive character
3. Degree of variation among subjects
4. Level of significance desired – p value
5. Power of the study desired
6. Drop-out rate
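The sample-size formula above can be sketched as follows (a minimal illustration; the anticipated prevalence and allowable error are assumed example values, not from the slide):

```python
import math

def sample_size(p, allowable_error, z=1.96):
    """Minimum sample size for estimating a proportion:
    n = Z^2 * p * (1 - p) / L^2, with Z = 1.96 for 95% confidence."""
    return math.ceil(z ** 2 * p * (1 - p) / allowable_error ** 2)

# e.g. anticipated prevalence p = 0.5 (most conservative), allowable error L = 0.05
n = sample_size(0.5, 0.05)  # 385 subjects
```

Using p = 0.5 maximizes p(1 − p), so it gives the largest (safest) sample size when the true prevalence is unknown.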


Sampling Methods

Probability:
1. Simple random
2. Systematic/serial
3. Stratified
4. Multiphase
5. Cluster

Non-probability:
1. Accidental/convenience sampling
2. Purposive/non-random/selective
3. Network/snowball
4. Quota sampling
5. Dimensional
Random: every unit of the population has an equal chance of being selected in the sample

Applicable when the population is small, homogeneous and readily available

To ensure randomness: lottery method, or table of random numbers
[Figure: a simple random sample of 20 cases selected from a grid of 200 numbered units]
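The grid illustration above can be reproduced with a short sketch (population size and sample size follow the slide's figure):

```python
import random

random.seed(1)  # fixed seed only so the illustration is reproducible

population = list(range(1, 201))        # 200 numbered units, as in the grid
sample = random.sample(population, 20)  # every unit has an equal chance
```

`random.sample` draws without replacement, so no unit can appear twice.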
Systematic (serial) sampling
Applied to field studies
K = sampling interval
K = total population / sample size desired
Advantages: simple; less time & labor; results fairly accurate
[Figure: a systematic sample – every Kth unit selected from the grid of 200 numbered units]
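The K = N/n rule can be sketched as follows (a random start within the first interval, then every Kth unit; the numbers match the slide's grid of 200):

```python
import random

population = list(range(1, 201))   # 200 numbered units
desired = 20
k = len(population) // desired     # sampling interval K = N / n = 10

random.seed(2)                     # fixed seed for a reproducible illustration
start = random.randint(1, k)       # random start within the first interval
sample = population[start - 1::k]  # then every Kth unit
```

Only the starting unit is random; the rest of the sample is determined by the interval, which is why the method is so quick in the field.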
Stratified sampling
The target population is divided into homogeneous groups or classes called strata
Strata: age, sex, classes, geographical area
More representative sample; greater accuracy; covers a wide area

Example of sampling in phases: school children surveyed for oral disease → those with disease → those requiring treatment

Cluster sampling
A cluster is a randomly selected group
Units of the population occur in natural groups or clusters: villages, slums of a town
Simple method, less time and cost, but higher error
Used where:
(1) no sampling frame is directly available, and/or
(2) simple random sampling would be expensive, complex, time-consuming and/or logistically difficult
Sampling errors – faulty sample design; small sample size

Non-sampling errors – coverage error; observational error; processing error
1. Controls
2. Randomization or random allocation
3. Cross-over design
4. Placebo
5. Blinding technique – single/double blinding


There are three types:
1. Observer – subjective/objective
2. Instrumental
3. Sampling defects or error of bias

This is also called systematic error. It occurs when there is a tendency to produce results that differ in a systematic manner from the true values. A study with small systematic error is said to have high accuracy. Accuracy is not affected by the sample size.
There are as many as 45 types of biases; the important ones are:
1. Selection bias
2. Measurement bias
3. Confounding bias
Dahlberg in 1940 used this formula to calculate the method error:

Method error = √(Σd² / 2n)

where d = the difference between the two measurements of a pair, and n = the number of subjects (pairs).
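Dahlberg's formula can be sketched directly (the duplicate readings below are hypothetical, for illustration only):

```python
import math

def dahlberg_error(first, second):
    """Dahlberg's method error: sqrt(sum(d^2) / 2n), where d is the
    difference between duplicate measurements and n the number of pairs."""
    d = [a - b for a, b in zip(first, second)]
    return math.sqrt(sum(x * x for x in d) / (2 * len(d)))

# hypothetical duplicate measurements on 3 subjects
me = dahlberg_error([21.0, 22.5, 20.0], [21.4, 22.1, 20.2])
```

A small method error relative to the measured differences suggests the measurement technique itself is reproducible.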
Data: a collective recording of observations, either numerical or otherwise.
Data are not necessarily information, and having more data does not necessarily produce better decisions.

The goal is to summarise and present data in useful ways to support prompt and effective decisions.
Types of data

Constant vs variable; grouped vs ungrouped
Quantitative: 1) discrete 2) continuous
Qualitative: 1) nominal 2) ordinal
Discrete: when the variable under observation takes only fixed values, like whole numbers, the data are discrete.

Continuous: the variable can take any value in a given range, decimal or fractional.
Ordinal variables – a set of data is said to be ordinal if the values/observations belonging to it can be ranked (put in order) or have a rating scale attached. You can count and order, but not measure, ordinal data.

Nominal variables have no natural ranking order.

Dichotomous variables have only two categories, such as male/female.
Sources of data
Primary: interview, measurement, questionnaire
Secondary: records
Nominal data (qualitative data)
Frequency distribution of some characteristic – sex, religion etc.
Ordinal data
According to rank or order – no implication of class interval
Interval data
Placed in meaningful intervals or order
Ratio data
Frequency distribution in logical order & meaningful groups; a meaningful ratio exists; it helps in further analysis
Uses: selection of data presentation; selection of test of significance

1. Variation – biological & sampling
2. Mistakes & errors – instrumental, systematic, random
3. Bias – selection bias, measurement bias, confounding bias
Presentation of data

Tables – tabulation
Charts & diagrams – graphs, pictures, diagrams, special curves
Devices for presenting data simply from masses of statistical data

The first step before data is used for statistical analysis

Tables may be simple or complex
Every table should contain a title as to what is depicted in the table
Each row and column should be clearly defined with a heading
Units of measurement should be specified
If the data are not original, the source of the data should be mentioned at the bottom of the table
Every diagram must be given a title which is self-explanatory
It should be simple and consistent with the data
Usually the values of the variable are presented on the horizontal or x-axis and the frequency on the y-axis
The number of lines drawn in any graph should not be many, so that the diagram does not look clumsy
The scale of presentation of the x and y axes should be mentioned at the right-hand top corner of the graph
Line diagram: useful to study changes of values in the variable over time, and the simplest type of diagram
On the x-axis, time such as hours, days, weeks, months or years is represented, and the value of any quantity pertaining to this is represented along the y-axis
Bar diagram: used to compare qualitative data with respect to a single variable, such as time or region
Represents qualitative data
When it is desired to represent both the number of cases in major groups as well as in the subgroups simultaneously, we use a component bar diagram
Histogram: used to depict quantitative data of continuous type
A histogram is a bar diagram without gaps between the bars
It represents a frequency distribution
On the x-axis the size of observation is marked; on the y-axis, the frequencies
Frequency polygon: used to represent the frequency distribution of quantitative data, and to compare two or more frequency distributions
To draw a frequency polygon, a point is marked over the mid-point of each class interval, corresponding to the frequency; these points are then connected by straight lines
Pie chart
The graph looks like a pie; components represent slices cut from the pie
Shows the % breakdown of qualitative data
Cannot be used for two or more data sets
Venn diagrams or set
diagrams are diagrams that
show all hypothetically
possible logical relations
between a finite collection of
sets
Box plot
It is a representation of the quartiles (25%, 50% & 75%) and the range of a continuous and ordered data set
The y-axis can be arithmetic or logarithmic
A box plot can be used to compare different distributions of data values
1. Collection of data
2. Classification
3. Tabulation
4. Presentation by graphs
5. Descriptive statistics
6. Establishment of relationship
7. Interpretation
Refers to the middle of the distribution: a value or parameter which serves as a single estimate of a series of data
Gives a mental picture of the central value and enables comparison
One central value around which all other observations are dispersed
OBJECTIVE
To condense the entire mass of data
To facilitate comparison
TYPES
Arithmetic mean- mathematical estimate
Median - positional estimate
Mode- based on frequency
Should be easy to understand and compute
Should be based on each and every item in the series
Should not be affected by extreme observations (either too small or too large values)
Should be capable of further statistical computation
Should have sampling stability, i.e. if different samples of the same size are picked from the same population and the measure of central tendency is calculated for each, they should not differ from each other markedly
The simplest measure of central tendency

Mean = sum of all the observations of the data / number of observations in the data

X̄ = Σ Xᵢ / n

Σ (sigma) means "the sum of"; Xᵢ is the value of each observation in the data; n is the number of observations in the data.

Example: for the observations 0, 1, 2, …, 10, the mean = 5.
The mean is the most common measure of central tendency, but it is affected by extreme values (outliers).

Median: the middle value in a distribution
One half of the values are smaller than or equal to the median; the other half are higher than or equal to it
A robust measure of central tendency, not affected by extreme values
Observations are arranged in order of magnitude & the middle value of the observations is the median
Odd number of observations: the median is the value at position (n + 1) / 2
Even number of observations: the mean of the two middle values (those at positions n/2 and n/2 + 1)
Mode: the value in a series of observations which occurs with the greatest frequency
Not affected by extreme values
Used for either numerical or categorical data
There may be no mode
There may be several modes

Example (from the slide's figure, values 0–14 with 9 the most frequent): Mode = 9
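The three measures can be sketched with Python's standard library (the data sets are illustrative; the mode example repeats 9 to match the slide's figure):

```python
import statistics

data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
mean = statistics.mean(data)      # arithmetic mean: sum / count
median = statistics.median(data)  # middle value of the ordered data

mode_data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12, 13, 14]
mode = statistics.mode(mode_data)  # most frequent value
```

For this symmetric data set, mean and median coincide, as the normal-distribution slides later point out.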
Synonyms:
Measures of dispersion
Measures of variation or scatter
Dispersion is the degree of spread or variation of
the variable about a central value.
Uses:
Determine reliability of an average
Serve as a basis of control of variability
Comparison of two or more series
Facilitate further statistical analysis

A good measure of dispersion is simple, easy to compute, based on all items, amenable to further analysis, and not affected by extreme values.
Measures of dispersion of individual observations:
Range
Interquartile range
Mean deviation
Standard deviation
Coefficient of variation
Range – a measure of variation: the difference between the value of the smallest item and the value of the largest item
The simplest method
Gives no information about the values that lie between the extreme values
Subject to fluctuation from sample to sample
Ignores the way in which the data are distributed

Example: for the values 7, 8, 9, 10, 11, 12: Range = 12 − 7 = 5
Mean deviation: the average of the absolute deviations from the arithmetic mean

M.D. = Σ|x − x̄| / n
Standard deviation: the most important and widely used measure
It is the square root of the mean of the squared deviations from the arithmetic mean ("root mean square deviation"):

S = √( Σᵢ (Xᵢ − X̄)² / (n − 1) )

Greater the deviation, greater the dispersion; smaller the deviation, higher the degree of uniformity
Calculate the mean, x̄
Find the difference of each observation from the mean: d = xᵢ − x̄
Square these: d²
Total these: Σd²
Divide by the number of observations minus 1 to get the variance: Σd² / (n − 1)
The square root of this variance is the standard deviation:

S.D. = √( Σd² / (n − 1) )
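The six steps above translate line by line into code (the data set is illustrative, reusing the values from the range example):

```python
import math

data = [7, 8, 9, 10, 11, 12]

mean = sum(data) / len(data)         # step 1: the mean
d = [x - mean for x in data]         # step 2: deviations from the mean
sum_sq = sum(v * v for v in d)       # steps 3-4: square each, then total
variance = sum_sq / (len(data) - 1)  # step 5: divide by n - 1
sd = math.sqrt(variance)             # step 6: square root of the variance
```

Dividing by n − 1 rather than n is the sample (unbiased) form, matching the formula on the slide.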
Summarizes the deviations of a large distribution
Indicates whether the variation from the mean is by chance or real
Helps in finding the standard error
Helps in finding the suitable size of sample
Standard deviation is only interpretable as a summary measure for variables having approximately symmetric distributions
Coefficient of variation: used to compare relative variability –
the variation of the same character in two or more series,
the variability of one character in two different groups having different magnitudes of values,
or two characters in the same group, by expressing the SD as a percentage of the mean:

CV = (S.D. / Mean) × 100
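Because CV is unit-free, it lets us compare characters measured on different scales. A minimal sketch (the height and weight values are made up for illustration):

```python
import statistics

def cv(values):
    """Coefficient of variation: (SD / mean) * 100, in percent."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# compare variability of two characters with different units and magnitudes
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [52, 58, 61, 66, 79]
more_variable = "weight" if cv(weights_kg) > cv(heights_cm) else "height"
```

Here weight has the larger CV, so it is relatively more variable even though its SD in raw units is comparable to that of height.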
When you have a collection of points, you begin the initial analysis by plotting them on a graph to see how they are distributed.

1.Normal-Gaussian
2.Binomial
3.Poisson
4.Rectangular or uniform
5.Skewed
6.Log normal
7.Geometric
[Figure: uniform (rectangular) and bimodal distributions]
Described by De Moivre (1733) and later, independently, by Gauss
The height of the bars or curve is greatest in the middle
Values are spread around the mean: maximum values around the mean, few at the extremes
Half the values lie above & half below the mean
The normal distribution / Gaussian distribution curve is bell shaped.

The curve is symmetrical about the middle point
The mean is located at the highest point of the curve
The measures of central tendency coincide
The maximum number of observations is at the value of the variable corresponding to the mean; the number of observations gradually decreases on either side, with very few observations at the extreme points

The area under the curve between any 2 points corresponds to the number of observations between those 2 values of the variate, in terms of the relationship between the mean and the SD:
a) Mean ± 1 S.D. covers 68.3% of the observations
b) Mean ± 2 S.D. covers 95.4% of the observations
c) Mean ± 3 S.D. covers 99.7% of the observations

These give confidence limits and are used for fixing the confidence interval. The normal distribution law forms the basis for various tests of significance.
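The 68.3/95.4/99.7 rule can be checked empirically by simulation (a sketch using Python's standard library; the simulated data are illustrative):

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is reproducible
data = [random.gauss(0, 1) for _ in range(100_000)]  # simulated normal variate

mean = statistics.mean(data)
sd = statistics.stdev(data)

# fraction of observations within mean +/- 1, 2 and 3 SD
within1 = sum(abs(x - mean) <= 1 * sd for x in data) / len(data)
within2 = sum(abs(x - mean) <= 2 * sd for x in data) / len(data)
within3 = sum(abs(x - mean) <= 3 * sd for x in data) / len(data)
```

With 100,000 simulated observations the three fractions come out very close to 0.683, 0.954 and 0.997.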
The shape describes how the data are distributed: symmetric or skewed

Left-skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-skewed: Mode < Median < Mean
The binomial distribution is used for describing DISCRETE, not continuous, data. These values result from an experiment known as a Bernoulli process. It describes populations whose members either have a certain characteristic or do not; the distribution of the occurrence of the characteristic in the population is defined by the binomial distribution.
If, in a binomial distribution, the probability of success of an event becomes indefinitely small and the number of observations becomes very large, the binomial distribution tends to the Poisson distribution.

This is used to describe the occurrence of rare events in a large population.
Probability: the possibility of occurrence of any event, or a permutation/combination of events – e.g. to determine whether the treatment group is different from the control group. If P is less than 0.05, there are fewer than 5 chances in 100 that the difference we observe is due to random chance alone.

Probability scale = 0 to 1
P = 0: absolutely no chance that the observed difference is due to sampling variation
P = 1: absolutely certain that the observed difference is due to sampling variation
In practice P lies between 0 and 1; computing the P value and drawing an inference from it is the essence of any test of significance.
P value:
0.05 or more – difference due to chance (sampling variation): statistically not significant
Less than 0.05 – difference due to the samples themselves (not due to sampling variation): statistically significant

P < 0.001: very highly significant
P < 0.01: highly significant
P < 0.05: significant
P > 0.05: not significant
The Z score indicates how much an observation is bigger or smaller than the mean, in units of SD:

Z = (Observation − Mean) / Standard Deviation

The Z score is the number of SDs by which the sample mean departs from the population mean. As this critical ratio increases, the probability of accepting the null hypothesis decreases.
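The ratio above is a one-liner in code (the DMFT example values are assumed, purely for illustration):

```python
def z_score(observation, mean, sd):
    """Number of SDs an observation lies above (+) or below (-) the mean."""
    return (observation - mean) / sd

# e.g. a score of 7 in a population with mean 4 and SD 2
z = z_score(7, 4, 2)  # 1.5 SDs above the mean
```

A |Z| beyond 1.96 would place the observation outside the central 95% of a normal distribution, which is exactly the rejection zone described later.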
Statistical hypotheses:
Null hypothesis – the hypothesis of no difference
Alternative hypothesis – the hypothesis of a significant difference
STEP 1: Set up the null hypothesis
STEP 2: Set up the alternate hypothesis
STEP 3: Choose the appropriate level of significance
STEP 4: Compute the value of the test statistic:
Z = observed difference / standard error
STEP 5: Obtain the table value at the given level of significance
STEP 6: Compare the value of Z with the table value
STEP 7: Draw the conclusion


Initial assumption: the null hypothesis H₀ (no difference). An experiment is performed, a difference is observed, and a test of significance yields a p-value. If the difference is real, the null hypothesis is rejected and the alternate hypothesis [H₁] is accepted; if there is no real difference, the null hypothesis is accepted. Acceptance or rejection is valid only if a sufficient sample size exists.
Conclusion based on the sample versus the true state of the population:

Null hypothesis true – rejected: TYPE I ERROR; accepted: correct decision
Null hypothesis false – rejected: correct decision; accepted: TYPE II ERROR
To minimize errors, the sampling distribution or area under the normal curve is divided into two regions or zones:

1. ZONE OF ACCEPTANCE: samples in the area of mean ± 1.96 SE – null hypothesis accepted
2. ZONE OF REJECTION: samples in the shaded area beyond mean ± 1.96 SE – null hypothesis rejected
Whenever 2 sets of observations are compared, it becomes essential to find whether the difference in observations between the 2 groups is because of sampling variation or any other factor.

TESTS OF SIGNIFICANCE

Parametric / Non-parametric:
1. Student's paired t test / Wilcoxon signed rank test
2. Student's unpaired t test / Wilcoxon rank sum test
3. One-way ANOVA / Kruskal–Wallis one-way ANOVA
4. Two-way ANOVA / Friedman two-way ANOVA
5. Correlation coefficient / Spearman's rank correlation
6. Regression analysis / Chi-square test


Parametric tests are performed with the following assumptions:
The dependent variable is of continuous type (measured on an interval or ratio scale)
The underlying population has a normal distribution
When differences or measures of association are being tested, the variances of the samples do not vary significantly

For testing difference:
t-test (paired & unpaired)
Analysis of variance (ANOVA)
Test of significance of the difference between 2 means
For testing correlation:
Pearson's product moment correlation coefficient
Non-parametric tests do not start with these assumptions.
1. For ordinal & non-parametric data, to test difference:
1. Mann–Whitney test
2. Wilcoxon's matched pairs test
3. Sign test
4. McNemar's test
5. Kruskal–Wallis test
6. Friedman's test
7. Kendall's test
2. For category-type data, to test difference:
1. Chi-square test
2. Fisher's exact test
3. Cochran's Q test
3. For ordinal & non-parametric data, to test correlation:
1. Spearman's rank correlation
2. Kendall's tau
4. For nominal-type data:
1. Phi coefficient
2. Cramer's V
There are many theoretical distributions, both continuous and discrete. Four of these are used a lot:
z (unit normal) – the sampling distribution of means
t
chi-square – the sampling distribution of variances
F – ANOVA
Uses of the Z test – to compare:
a sample mean with the population mean
the means of two samples
a sample proportion with the population proportion
the proportions of two samples
the association between two attributes

One-sided tests have one rejection region, i.e. you check whether the parameter of interest is larger (or smaller) than a given value.

Two-sided tests are used when we test a parameter for equivalence to a certain value; deviations from that value in both directions are rejected.
Z test: for large samples (n > 30)
The difference observed between the sample estimate and that of the population is expressed in terms of SE
The score or value of the ratio between the observed difference & its SE is called Z:

Z = difference in means / SE of the mean
To compare two methods, it is often important to know whether the variances for both methods are the same. In order to compare two variances V₁ and V₂, calculate the ratio of the two variances. This ratio is called the F statistic:

F = V₁ / V₂
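The variance ratio can be sketched as follows (the two small samples are illustrative; by convention the larger variance is placed in the numerator so F ≥ 1 when looking up a two-tailed table):

```python
import statistics

def f_ratio(sample1, sample2):
    """F statistic: ratio of the two sample variances, larger over smaller."""
    v1 = statistics.variance(sample1)
    v2 = statistics.variance(sample2)
    return max(v1, v2) / min(v1, v2)

f = f_ratio([4, 3, 5, 2, 3, 1], [6, 3, 8, 9, 5, 3])
```

An F near 1 suggests the two variances are similar, which is one of the assumptions the parametric tests above rely on.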
ANOVA: compares more than two samples – variation between the classes as well as within the classes. For such comparisons there is a high chance of error using the t or Z test.

One-way ANOVA compares more than 3 means from independent groups – e.g. is the age different between White, Black and Hispanic patients?
Two-way ANOVA compares 2 or more means by 2 or more factors – e.g. is the age different between males and females, with and without pneumonia?

Variants: one-way ANOVA, two-way ANOVA, factorial ANOVA, mixed-design ANOVA, multivariate ANOVA (MANOVA)
When the degree of linear (straight-line) association between two variables is required, the correlation coefficient is calculated.
Ex: measure the changes in height and the changes that occurred in weight, and plot the determined values on graph paper. A line of best fit is then made to connect the majority of the plotted values. One has to look at a scatter plot of the data before placing any importance on the magnitude of the correlation.

No.  Height (cm)  Weight (kg)
1    182.1        79.5
2    172.5        61.5
3    175.7        68.2
4    172.8        66.4
5    160.3        52.6
6    165.5        54.3
7    172.8        61.1
8    162.4        52.8
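Pearson's r for the height/weight table above can be computed directly from its definition:

```python
import math

heights = [182.1, 172.5, 175.7, 172.8, 160.3, 165.5, 172.8, 162.4]
weights = [79.5, 61.5, 68.2, 66.4, 52.6, 54.3, 61.1, 52.8]

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) *
                           sum((b - my) ** 2 for b in y))

r = pearson_r(heights, weights)  # strong positive correlation
```

As the scatter plot would suggest, taller subjects tend to be heavier, so r comes out close to +1.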
The correlation coefficient (r) may indicate positive correlation, negative correlation, or absolutely no correlation.
Linear regression analysis
Linear regression is related to correlation analysis. It seeks to quantify the linear relationship that may exist between an independent variable x and a dependent variable y:

Y = a + bX
The t test: comparing two means to see if they are significantly different from each other, for small samples. Designed by W. S. Gosset, it is the ratio of the observed difference between the means of two small samples to the SE of that difference. When each individual gives a pair of observations, to test for the difference in the pairs of values, the paired t test is utilized.

Unpaired t test: used to compare the average (mean) in one group with the average in another group. Univariate, unmatched, interval, normal, 2 groups.
E.g. 6 boys on diet A: 4, 3, 5, 2, 3, 1
9 boys on diet B: 6, 3, 8, 9, 5, 3, 4, 2, 5
(n₁ = 6, n₂ = 9; pooled SD ≈ 2.04)
Test the significance of the difference between diets A and B with regard to their effect on increase in weight.
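The unpaired t statistic for the diet example can be worked out step by step; the pooled SD reproduces the slide's value of about 2.04:

```python
import math
import statistics

diet_a = [4, 3, 5, 2, 3, 1]           # weight gain, n1 = 6
diet_b = [6, 3, 8, 9, 5, 3, 4, 2, 5]  # weight gain, n2 = 9

n1, n2 = len(diet_a), len(diet_b)
m1, m2 = statistics.mean(diet_a), statistics.mean(diet_b)

# pooled variance, then the standard error of the difference in means
sp2 = ((n1 - 1) * statistics.variance(diet_a) +
       (n2 - 1) * statistics.variance(diet_b)) / (n1 + n2 - 2)
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))

t = (m2 - m1) / se  # compare against the t table at n1 + n2 - 2 = 13 d.f.
```

Here t ≈ 1.86 at 13 degrees of freedom, which falls short of the 5% two-tailed critical value, so the observed difference could well be due to sampling variation.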
Paired t test: used to compare the averages of measurements made twice within the same person – before vs. after.

For example: did the systolic blood pressure change significantly from the scene of the injury to admission?

Univariate, matched, interval, normal, 2 groups.
The chi-square test: the most commonly used statistical test.
Used for qualitative data, to test whether the difference in the distribution of attributes in different groups is due to sampling variation or otherwise.
Used when data are in frequencies, such as the number of responses in 2 or more categories.

Three important applications:
1. Proportion
2. Association
3. Goodness of fit
Test of proportions: the significance of the difference in 2 or more proportions. With values < 30 it is applied with Yates' correction.

Test of association: tests the association between 2 events in samples – the most important application. It gives the probability of association between 2 discrete attributes: smoking vs. cancer, weight vs. diabetes, BP vs. heart disease.
Two events may be dependent (they influence each other: ASSOCIATED) or independent (they do not influence each other: NOT ASSOCIATED). The test gives p, the relative frequency of the association arising by chance, and tells whether the 2 events are associated or depend on each other.
Test of goodness of fit: to determine if the actual numbers are similar to the expected or theoretical numbers, i.e. goodness of fit to a theory.

Chi-square calculation:

χ² = Σ (observed f − expected f)² / expected f

d.f. = (R − 1) × (C − 1)
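For a 2×2 table the calculation can be sketched as follows; the expected count in each cell is (row total × column total) / grand total, and the counts used here are hypothetical:

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square = sum((O - E)^2 / E) over a 2x2 table.
    d.f. = (2 - 1) * (2 - 1) = 1."""
    n = a + b + c + d
    chi2 = 0.0
    # (observed, row total, column total) for each of the four cells
    for obs, row, col in [(a, a + b, a + c), (b, a + b, b + d),
                          (c, c + d, a + c), (d, c + d, b + d)]:
        exp = row * col / n            # expected count for this cell
        chi2 += (obs - exp) ** 2 / exp
    return chi2

# hypothetical exposure-vs-disease counts
chi2 = chi_square_2x2(30, 20, 10, 40)
```

The result is compared with the chi-square table at 1 degree of freedom; a value above 3.84 is significant at the 5% level.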


These tests are the non-parametric equivalents of Student's t tests: the Wilcoxon signed rank test is used for paired data, and the Wilcoxon rank sum test in the case of unpaired data.

These are similar to the parametric ANOVA tests: Kruskal–Wallis is used for one-way analysis of variance and Friedman for two-way analysis of variance.

Spearman's rank correlation and Kendall's rank correlation are the non-parametric equivalents of the correlation coefficient test.
It is used to classify cases into the values of
a categorical dependent, usually a
dichotomy. If discriminant function
analysis is effective for a set of data, the
classification table of correct and incorrect
estimates will yield a high percentage
correct.
Gene Glass (1976) coined the term meta-analysis. The technique of meta-analysis involves reviewing and combining the results of various previous studies. Provided the studies involved similar treatments, similar samples, and measured similar outcomes, this can be a useful approach.
Use / Parametric / Non-parametric:
To compare two paired samples for equality of means: paired t test / Wilcoxon signed rank test
To compare two independent samples for equality of means: unpaired t test / Mann–Whitney test
To compare more than two samples for equality of means: ANOVA / Kruskal–Wallis test, chi-square
Clinical research can indeed have controls. Provided that studies are conducted on a prospective basis, controlled clinical studies can be quite powerful.

Uncontrolled clinical studies are of questionable validity, whether or not they are subjected to statistical analysis.
The sensitivity of a test is the probability that the test is positive for those subjects who actually have the disease. A perfect test will have a sensitivity of 100%. The sensitivity is also called the true positive rate.

The specificity of a test is the probability that the test is negative for those in whom the disease is absent. A perfect test will have a specificity of 100%. The specificity is also called the true negative rate.
DIAGNOSIS
Test result    Disease present  Disease absent  Total
Positive (+)   a (8)            b (10)          a + b (18)
Negative (−)   c (20)           d (62)          c + d (82)
Total          a + c (28)       b + d (72)      N (100)

Sensitivity = a/(a + c) × 100
Specificity = d/(b + d) × 100
Predictive value of a positive test = a/(a + b) × 100
Predictive value of a negative test = d/(c + d) × 100
% false negatives = c/(a + c) × 100
% false positives = b/(b + d) × 100
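All six indices follow mechanically from the table's four cells; here they are computed for the slide's example counts:

```python
# counts from the 2x2 table: a = true +, b = false +, c = false -, d = true -
a, b, c, d = 8, 10, 20, 62

sensitivity = a / (a + c) * 100  # true positive rate
specificity = d / (b + d) * 100  # true negative rate
ppv = a / (a + b) * 100          # predictive value of a positive test
npv = d / (c + d) * 100          # predictive value of a negative test
false_neg = c / (a + c) * 100    # % false negatives
false_pos = b / (b + d) * 100    # % false positives
```

Note that sensitivity and % false negatives sum to 100, as do specificity and % false positives, since they partition the diseased and non-diseased columns respectively.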
SPSS (PASW) – 1968 – Norman H. Nie and C. Hadlai Hull

MINITAB – Barbara F. Ryan, Thomas A. Ryan, Jr., and Brian L. Joiner, 1972

Epi Info – developed by the Centers for Disease Control and Prevention (CDC), Atlanta

MICROSOFT EXCEL
To conclude, statistics might not seem of much significance in routine clinical practice, but when it comes to pursuing newer techniques and principles, scientific substantiation can be obtained only with statistical verification and interpretation.
References
Park's Textbook of Preventive and Social Medicine, 17th ed.
Essentials of Preventive and Community Dentistry – Soben Peter, 2nd ed.
Fundamentals of Biostatistics – Sanjeev B. Sarmukkadam
Statistical and Methodological Aspects of Oral Health Research – Lesaffre
