
STATISTICS

A word of Latin origin, meaning useful to the state.
Numerical facts systematically arranged.
A scientific subject that deals with the collection, compilation, presentation, analysis and interpretation of data, and with making inferences (conclusions) from data.

BIOSTATISTICS
The application of statistical methods to biological events.

VITAL STATISTICS
Data from vital events such as births, deaths, marriages, divorces and fetal deaths.
It is a major source of information about the health of a population.

USES OF STATISTICS
1. To collect data in the best possible way.
2. To describe the characteristics of a group.
3. To analyze the data and draw conclusions.

SOURCES OF HEALTH STATISTICS


Registration of vital events
Notification of diseases
Records of hospitals
Census
Surveys
Surveillance
HIMS

DATA
Any collected piece of information.
Observations made on individuals.
Individual values that are measured, observed or presented.
Recorded values of the characteristics of individuals in a population or sample.
Data are the basic building blocks of statistics.

TYPES OF DATA
PRIMARY DATA
Data collected for the first time to answer a specific question of interest in a study.

SECONDARY DATA
Data previously gathered for some other purpose.

COLLECTION OF DATA
There are two approaches to data collection:
a. CENSUS: complete enumeration of the whole field; costly and time consuming
b. SAMPLING: partial enumeration; saves money and time

METHODS OF COLLECTION OF PRIMARY DATA
1. Observation
2. Questionnaire
3. Interview
4. Case studies
5. Documentation survey

METHODS OF COLLECTION OF SECONDARY DATA
1. Official publications
2. Journals & newspapers
3. Research organizations

VARIABLE
Any factor that varies.
Any quantity that varies.
Any collected piece of information that varies.
A characteristic of the individuals of a population or sample which varies from individual to individual.

Examples: age, weight, income
The variable "age of a person" can take different values, because a person can be 20 years old, 35 years old and so on.
A variable is the basic unit of research.
All medical research is the study of relationships among variables.
Variables provide the yardstick against which the effects of treatments or experiences are measured: a variable is the characteristic of interest in a study.

TYPES OF VARIABLES
Classified according to the form of the characteristic of interest.

NUMERICAL / QUANTITATIVE VARIABLES
Variables whose values are expressed in numbers.
Examples: age, weight, number of children, monthly income

CATEGORICAL / QUALITATIVE VARIABLES
Variables whose values are expressed in categories.
Examples:
Color: red, blue and green
Outcome of disease: recovery, chronicity and death
Some categorical variables limit the choice of answers to yes or no.

DEPENDENT VARIABLE
The variable that is used to describe the problem under study.
Also called: Effect Variable
Example: in a study of the relationship between mother's education and malnutrition in children, malnutrition is the dependent variable.

INDEPENDENT VARIABLE
The variable that is used to describe the factors that cause or influence the problem under study.
Also called: Cause Variable
Example: in a study of the relationship between smoking and lung cancer, smoking is the independent variable (with values varying from not smoking to smoking more than 3 packets/day).

CONFOUNDING VARIABLE
A variable whose nuisance effect distorts the true relationship between the independent variable (exposure) and the dependent variable (disease/outcome).
Also known as: intervening, background or contaminating variable.
It confuses the research: it shows up in the results but is not the real variable of interest.
Example:
Mother's education: independent variable
Malnutrition: dependent variable
Family income: confounding variable
Common confounding variables: age, sex, socio-economic status

INDICATOR
A variable with characteristics of quality, quantity or time.
It operationalizes (defines) a variable and makes it measurable: it is the measuring tool of the variable.
Example:
Variable: household income
Indicators:
high income (Rs. 5000 and above per month)
middle income (Rs. 2000-4999 per month)
low income (less than Rs. 2000 per month)

HEALTH INDICATOR
An indicator which measures different dimensions of, or changes in, health.
Example: number of deaths due to child bearing and the puerperium among total live births in a year (Maternal mortality rate)
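The household income indicator above can be sketched as a small classification function. This is only an illustration: the function name `income_category` and the sample incomes are hypothetical, and incomes between Rs. 4999 and Rs. 5000 are assumed to fall in the middle band.

```python
def income_category(monthly_income_rs: float) -> str:
    """Map a monthly household income (Rs.) to the indicator bands in the notes."""
    if monthly_income_rs < 0:
        raise ValueError("income cannot be negative")
    if monthly_income_rs >= 5000:
        return "high income"
    if monthly_income_rs >= 2000:
        return "middle income"
    return "low income"


# Hypothetical monthly incomes, chosen to sit on and around the band limits.
households = [1500, 2000, 4999, 5000, 12000]
categories = [income_category(value) for value in households]
print(categories)
```

Once the variable is operationalized this way, every household gets exactly one category, which is what makes the variable measurable.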

Analysis between Demographic and Research variables: Tests of Significance / Hypothesis testing
Analysis between Research variables: Tests of Correlation and Regression

The assessment of the relationship between two research variables depends upon the purpose of the research:
degree of relationship: Correlation
prediction (forecasting): Regression

Correlation and Regression are two statistical techniques used to define the relationship between two different variables measured on the same people in a study.

Correlation
A statistical tool that tells us how close the relationship between variables is.
Example: the age and weight relationship of boys, to study whether a high value in age corresponds to a high value in weight.

Regression
A statistical method that uses the relationship between 2 or more variables so that the value of one variable can be predicted from the value of the other.
Regression analysis is the methodology used for the purpose of prediction.
One variable is considered the predicted variable; its value varies according to the predictor variable.
Examples:
predicted variable: marks obtained in exam; predictor variable: time spent on study
predicted variable: yield of crops; predictor variable: amount of rainfall
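A minimal sketch of both techniques on the "time spent on study" and "marks obtained" example. The notes do not say which coefficient or fitting method is meant; Pearson's correlation coefficient and least-squares simple linear regression are assumed here, and the five data points are hypothetical.

```python
import math

# Hypothetical data: hours of study (predictor) and exam marks (predicted).
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 68]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(marks) / n

# Sums of products and squares of deviations from the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, marks))
sxx = sum((x - mean_x) ** 2 for x in hours)
syy = sum((y - mean_y) ** 2 for y in marks)

# Correlation: degree of relationship, between -1 and +1.
r = sxy / math.sqrt(sxx * syy)

# Regression: straight line used for prediction.
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def predict_marks(study_hours: float) -> float:
    """Predict marks from hours of study using the fitted line."""
    return intercept + slope * study_hours


print(f"r = {r:.3f}, marks = {intercept:.1f} + {slope:.1f} * hours")
print(f"predicted marks for 6 hours: {predict_marks(6):.1f}")
```

The correlation coefficient answers "how close is the relationship", while the fitted line answers "what value should we expect", which mirrors the two purposes listed above.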

POPULATION (Universe)
A large collection of items that have a characteristic in common.
The items may be people, animals, plants or things.
It is the entire group we are interested in, which we wish to describe or draw conclusions about.
It is the entire group about which some specific information is required or recorded.
Examples: students in a class, chairs in a class, books in a library, fishes in a lake

SAMPLE
A subset of the population which is chosen for investigation.
For each population, there are many possible samples.
By studying the sample, it is hoped to draw conclusions about the population.
A sample is a window through which the researcher can see the entire population.
Example: a drop of blood (sample) will tell us about body (population) chemistry.

PARAMETER
A value associated with a population.
Any quantity which defines a characteristic of the whole population.
It is assigned a GREEK letter (for example $\mu$).
This value is unknown and therefore has to be estimated.
A parameter is a fixed value which does not vary.

SAMPLE STATISTIC
A value calculated from a sample.
Any quantity which defines a characteristic of a sample.
It is assigned a ROMAN letter (for example $\bar{X}$).
This value is used to give information about the unknown value in the corresponding population (the parameter).

INFERENTIAL STATISTICS
The process of drawing conclusions (inferences) about a population using the information (data) in samples.
There are two approaches:
Estimation of parameter
Hypothesis testing

ESTIMATION OF PARAMETER
A procedure to estimate the unknown value of a parameter by a Point Estimate or an Interval Estimate (Confidence Interval).
Point Estimate: a single value is calculated to estimate the population value (parameter).
Example: $\bar{X}$ of a sample is a point estimate of $\mu$ (population value).
Confidence Interval: a range of values within which the parameter (population value) is likely to occur.

CONFIDENCE INTERVAL
Sample statistic values vary from sample to sample.
The Confidence Interval tells us how good the estimation of the parameter (population value) is, on the basis of the information provided by the sample statistics.
It is a measure of the accuracy with which we can pinpoint the estimate of the parameter.
The CI is calculated on the basis of the Standard Error (SE), which allows us to create a CI at a specified level of probability.
The CI is constructed at Confidence Levels (CL).
The CL tells you how sure you can be.
There are 4 typical Confidence Levels: 99%, 98%, 95% and 90%. Most researchers use the 95% CL.
Example: a 95% Confidence Interval means we can be 95% confident that the parameter lies within the Confidence Limits (the upper and lower limits of the Confidence Interval), with a 5% chance that it lies outside these limits.
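A minimal sketch of a 95% Confidence Interval for a mean. The notes do not give the formula; the usual large-sample form, mean ± 1.96 × SE with SE = SD / √n, is assumed here, and the sample mean, SD and size are hypothetical values. For small samples a multiplier from the t distribution would normally replace 1.96.

```python
import math


def confidence_interval_95(sample_mean: float, sample_sd: float, n: int):
    """Return (lower, upper) 95% confidence limits, assuming a large sample."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    standard_error = sample_sd / math.sqrt(n)
    margin = 1.96 * standard_error  # 1.96: normal multiplier for the 95% level
    return sample_mean - margin, sample_mean + margin


# Hypothetical sample: mean systolic BP 120, SD 15, n = 100.
lower, upper = confidence_interval_95(120.0, 15.0, 100)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

A larger sample shrinks the SE and therefore narrows the interval at the same Confidence Level.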

DATA SUMMARIZATION

Arrangement pattern of data:
CENTRAL: tendency of data points to cluster in the center
SPREAD: tendency of data points to disperse toward the periphery

Summary measurements that express these patterns as a single measure:
measure of central tendency (indicates centrality of data)
measure of dispersion (indicates scattering of data)

MEASURES OF CENTRAL TENDENCY


It is a summary statistic that describes the tendency of observations to cluster in the central part of a data set.
The most common measures: Mean, Median and Mode

MEAN
Arithmetic average of a distribution of values.
Statistically, the mean is the sum of all scores divided by the number of scores.

Mean of sample:
$\bar{X} = \frac{\sum x}{n}$
where
$\bar{X}$ (X bar) = sample mean
$\sum$ (capital sigma) = summation operator
$x$ = each individual score (sample value)
$n$ = total number of scores (sample size)

Example:
Individuals (sample size n = 4): 1, 2, 3, 4
Systolic blood pressure, x (individual observations): 120, 150, 110, 100
Sample mean = (120 + 150 + 110 + 100) / 4 = 480 / 4 = 120

Mean of population:
$\mu = \frac{\sum x}{N}$
where
$\mu$ (mu) = population mean
$x$ = each individual observation (population value)
$N$ = number of population members (population size)

MEDIAN
The middle value when observations are arranged in order.
The ordered data can be in ascending or descending order.
If the data set has an odd number of values, the middle value is the median; if it has an even number of values, the median is the average of the two middle values.
It is useful when the distribution of data is asymmetrical.

MODE
The most frequently occurring value in a data set.
From a French word meaning fashion.
A data set may have no mode or may have many modes.
It is occasionally used for describing a single distribution of data.
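The three measures can be checked with Python's standard `statistics` module. This sketch uses the systolic blood pressure values from the mean example and the children's weights from the standard deviation example in these notes; the variable names are illustrative.

```python
import statistics

systolic_bp = [120, 150, 110, 100]
weights_kg = [13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 20]

# Mean: sum of all scores divided by the number of scores.
bp_mean = statistics.mean(systolic_bp)

# Median with an even number of values: average of the two middle values
# of the ordered data (110 and 120).
bp_median = statistics.median(systolic_bp)

# Median with an odd number of values: the middle value itself.
weight_median = statistics.median(weights_kg)

# Mode: multimode() returns every value tied for the highest frequency.
weight_modes = statistics.multimode(weights_kg)
# When no value repeats, all values tie, so there is no single mode.
bp_modes = statistics.multimode(systolic_bp)

print(bp_mean, bp_median, weight_median, weight_modes, bp_modes)
```

`statistics.multimode` is used instead of `statistics.mode` so that data sets with no single mode, or with several modes, are reported explicitly.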

MEASURES OF VARIATION
Also known as: DISPERSION or SCATTER
Defined as the extent to which values in a sample or population vary about their mean.
The most common measures:
Absolute measures (compare the absolute accuracy of data): Range, Variance, Standard Deviation
Relative measure (compares the relative accuracy of data): Coefficient of Variation

RANGE
It is the difference between the maximum and minimum values in a series.
R = R2 - R1, where R2 is the maximum value and R1 is the minimum value.
Its demerit is that it is based on only 2 values; the other values in the data set are ignored.

VARIANCE
It is a measure of dispersion within a data set.
It is a more representative measure of variability than the Range, as it uses all measurements in the data.
The degree of variation depends on the amount of variation and on the sample size.
Statistically defined as the average squared deviation of the data set from the mean.

Variance of Sample:
Sum of squared deviations of the observations from the sample mean, divided by the sample size minus one.
$S^2 = \frac{\sum (x - \bar{X})^2}{n - 1}$

Variance of Population:
Sum of squared deviations of the observations from the population mean, divided by the population size.
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

Demerit of Variance: it is expressed in squared units of measurement, which limits its usefulness as a descriptive term. Squared values appear odd in certain situations, e.g., when the study population consists of human beings such as doctors or nurses.

STANDARD DEVIATION
It is a measure of dispersion of a data set.
It is the index of variability (spread) of the data about their Mean.
It tells us how much variability can be expected among individual values.
It is expressed in the same units of measurement as the original data, and is thus more meaningful, as the square is eliminated.
The larger the Standard Deviation, the greater the dispersion.
The smaller the Standard Deviation, the closer the values are to the Mean.
Statistically defined as the square root of the Variance.

Formula for sample standard deviation:
$S = \sqrt{S^2}$
or
$S = \sqrt{\frac{\sum (x - \bar{X})^2}{n - 1}}$

Formula for population standard deviation:
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

The steps to calculate the sample standard deviation are:
1. Calculate the mean of all measurements.
2. Calculate the difference between each individual measurement and the mean.
3. Square all these differences.
4. Take the sum of all squared differences.
5. Divide this sum by the number of measurements minus one (n - 1).
6. Finally take the square root of the value obtained.

Example: 11 children of 3 years of age were weighed. Their weights were 13, 14, 14, 15, 16, 16, 16, 17, 17, 18 and 20 kilograms. The number of measurements n is 11.
To calculate the standard deviation:
1. First calculate the mean, which is 16 kg.
2. Next calculate the deviation of each measurement from the mean (ignoring sign): 3, 2, 2, 1, 0, 0, 0, 1, 1, 2, 4.
3. Square these values: 9, 4, 4, 1, 0, 0, 0, 1, 1, 4, 16.
4. The sum of these squared deviations is 40.
5. Divide this sum by the total number of measurements minus one (n - 1): 40 / (11 - 1) = 4.
6. Finally take the square root to obtain the standard deviation: $\sqrt{4}$ = 2 kg.
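The worked example can be reproduced step by step. This is a sketch that follows the manual calculation, with a cross-check against `statistics.stdev` from the Python standard library, which also uses the n - 1 divisor.

```python
import math
import statistics

weights_kg = [13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 20]
n = len(weights_kg)

# Mean of all measurements.
mean = sum(weights_kg) / n

# Deviation of each measurement from the mean, then squared.
deviations = [x - mean for x in weights_kg]
squared_deviations = [d ** 2 for d in deviations]

# Sum of squared deviations, divided by n - 1 to give the sample variance.
sum_of_squares = sum(squared_deviations)
variance = sum_of_squares / (n - 1)

# Square root of the variance gives the standard deviation.
sd = math.sqrt(variance)

print(mean, sum_of_squares, variance, sd)

# Cross-check against the standard library implementation.
assert math.isclose(sd, statistics.stdev(weights_kg))
```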

STANDARD ERROR
Also called: Standard Error of Mean
If you take more than one sample from the same population, the samples will generally yield different Means.
The variation in these sample Means is called the Standard Error.
It is the measure of the extent to which the sample mean deviates from the population mean.
It measures inter-sample variability.
It tells us how much variability can be expected among sample means.
$SE = \frac{SD}{\sqrt{n}}$
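A short sketch of the Standard Error of the Mean, applied to the children's weights from the standard deviation example in these notes (SD = 2 kg, n = 11). The function name is illustrative.

```python
import math
import statistics


def standard_error_of_mean(values):
    """SE = SD / sqrt(n), using the sample standard deviation (n - 1 divisor)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    return statistics.stdev(values) / math.sqrt(n)


weights_kg = [13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 20]
se = standard_error_of_mean(weights_kg)
print(f"SE of mean = {se:.3f} kg")
```

Because n is under the square root, quadrupling the sample size only halves the Standard Error.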

STANDARD ERROR OF PROPORTION
In dealing with qualitative data, the Mean and SD are not applicable, so the SE of Mean cannot be used.
In this situation the SE of PROPORTION is applicable.
$SE\ of\ Proportion = \sqrt{\frac{pq}{n}}$
where
p = proportion
q = 1 - p
n = sample size
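A sketch of the SE of Proportion formula. The proportion (0.3) and sample size (100) are hypothetical values chosen only to illustrate the arithmetic.

```python
import math


def standard_error_of_proportion(p: float, n: int) -> float:
    """SE of proportion = sqrt(p * q / n), with q = 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("sample size must be positive")
    q = 1.0 - p
    return math.sqrt(p * q / n)


# Hypothetical survey: 30 of 100 respondents have the characteristic of interest.
se_proportion = standard_error_of_proportion(0.3, 100)
print(f"SE of proportion = {se_proportion:.4f}")
```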

COEFFICIENT OF VARIATION
It is a relative measure of dispersion.
It is used to overcome the difficulty of comparing the dispersion of data when the units of measurement are different.
Statistically speaking, it is the standard deviation of the distribution expressed as a percentage of the mean of the distribution.
$Coefficient\ of\ variation = \frac{standard\ deviation}{mean} \times 100$
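A sketch comparing dispersion across different units. The weights are those from the standard deviation example in these notes; the heights are hypothetical values added only to show a comparison between kilograms and centimetres.

```python
import statistics


def coefficient_of_variation(values) -> float:
    """Standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


weights_kg = [13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 20]
heights_cm = [90, 92, 95, 97, 101]  # hypothetical data

cv_weight = coefficient_of_variation(weights_kg)
cv_height = coefficient_of_variation(heights_cm)
print(f"CV of weight = {cv_weight:.1f}%, CV of height = {cv_height:.1f}%")
```

Because the units cancel out, the two percentages can be compared directly even though the raw standard deviations cannot.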

DEGREE OF FREEDOM
As most of our statistics are calculated on samples, we cannot be 100% sure; therefore, to make a conservative estimate, we use the divisor n - 1 instead of n for the average deviation.
Defined as:
a measure of variability which expresses the number of options available within a space
a number which tells us how many of the values may be independently chosen

It is used in the calculation of: variance / SD, t-test, chi-square test

PROBABILITY
Probability means the chance of something happening.
It is a quantitative measure of the possible outcomes of a particular event.

Event            Possible outcomes
Rolling a die    1, 2, 3, 4, 5, 6
Tossing a coin   heads, tails
Drawing cards    52 cards

If an outcome is sure to occur: probability 1 (certain event)
If an outcome cannot occur: probability 0 (null event)
Range of probability: 0 to 1
Zero = no chance
One = full certainty

Probability can also be defined as the relative frequency of occurrence of an event.
Frequency = number of times a particular score is achieved
$Relative\ Frequency = \frac{frequency\ of\ score}{total\ number\ of\ scores}$

The concept that all men are sure to die is expressed as 100%, P = 1.0.
All other probabilities are measured against this standard:
1 chance in 100 = 1%, P = 0.01
1 chance in 500 = 0.2%, P = 0.002
1 chance in 1000 = 0.1%, P = 0.001

Example: A treatment for cancer has a 90% success rate; the remaining 10% die. If two patients come for treatment, what is the probability that exactly one will die?
The probability of each patient dying: 0.1 (1/10)
The probability of each patient not dying: 0.9 (9/10)
The probability of both dying: 0.1 x 0.1 = 0.01
The probability of both recovering: 0.9 x 0.9 = 0.81
The balance of probability is 1 minus the probability of the other outcomes: 1 - (0.81 + 0.01) = 1 - 0.82 = 0.18
The probability that one will die = 0.18
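The two-patient example can be verified by listing every possible outcome. This sketch assumes, as the multiplication in the example implies, that the two patients' outcomes are independent.

```python
from itertools import product

p_die = 0.1
p_recover = 1 - p_die

# Probability of 0, 1 or 2 deaths among the two patients.
probability_by_deaths = {0: 0.0, 1: 0.0, 2: 0.0}

# Enumerate the four combinations: (die, die), (die, recover), ...
for outcome in product(("die", "recover"), repeat=2):
    probability = 1.0
    for result in outcome:
        probability *= p_die if result == "die" else p_recover
    probability_by_deaths[outcome.count("die")] += probability

for deaths, probability in probability_by_deaths.items():
    print(f"{deaths} death(s): {probability:.2f}")
```

"Exactly one dies" covers two combinations (first dies and second recovers, or the reverse), each with probability 0.1 x 0.9 = 0.09, which is why the answer is 0.18 and agrees with the balance-of-probability calculation.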

HYPOTHESIS
A statement of prediction.
A Statistical Hypothesis is defined as a statement of belief used in the evaluation of a population parameter, such as the mean of a population.

NULL HYPOTHESIS
It is the hypothesis that the samples or populations being compared in an experiment, study or test are similar.
Any difference that appears is due to chance and not due to any other measurable factor.
It simply means the status quo.
The Null Hypothesis is comparable to the law courts assuming innocence until guilt is demonstrated.

HYPOTHESIS TESTING / SIGNIFICANCE TESTING
Testing the viability of the Null Hypothesis in the light of experimental data.

STATISTICAL SIGNIFICANCE
It means probably true, likely to be real.
Significance testing is a procedure by which sample results are used to decide whether to accept or reject a Null Hypothesis.
When the evidence obtained from the sample is not compatible with the Null Hypothesis, the result is STATISTICALLY SIGNIFICANT.
The decision is based on the p value:
Small p value (less than 0.05): low degree of compatibility between the Null Hypothesis and the observed data; the Null Hypothesis is rejected; the test is statistically significant.
Large p value (greater than 0.05): high degree of compatibility between the Null Hypothesis and the observed data; the Null Hypothesis is not rejected (accepted); the test is not statistically significant.

p-VALUE
It measures the strength of statistical evidence in a scientific study.
It is the probability of obtaining a result at least as extreme as the one observed by chance alone, if the Null Hypothesis is true.
It is a probability statement which measures the strength of evidence against the Null Hypothesis.
If p = 0.05, it means that there is a 5 in 100 (1 in 20) chance that such a result would arise by chance alone.

There are many different statistical tests that give a p-value:
1. Chi-square test
2. Student's t-test
3. Z test
4. ANOVA test

The test used to calculate the p-value usually depends upon the type of data:
for quantitative data: t-test
for qualitative data: chi-square test

t-test
A test to compare the difference in mean between two groups.
Formula:
$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$
where
$\bar{X}$ = sample mean of the random variable X
$\mu$ = population mean
$S$ = sample standard deviation
$n$ = sample size
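A sketch of the t formula above, which compares a sample mean with a population mean. The sample is the set of children's weights from the standard deviation example in these notes; the population mean of 15 kg is a hypothetical value chosen for illustration. The critical value 2.228 is the two-sided 5% value for 10 degrees of freedom from standard t tables.

```python
import math
import statistics

weights_kg = [13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 20]
hypothesised_mean = 15.0  # hypothetical population mean (mu)

n = len(weights_kg)
sample_mean = statistics.mean(weights_kg)
sample_sd = statistics.stdev(weights_kg)  # uses the n - 1 divisor

# t = (sample mean - population mean) / (S / sqrt(n))
t_statistic = (sample_mean - hypothesised_mean) / (sample_sd / math.sqrt(n))
degrees_of_freedom = n - 1

# Two-sided 5% critical value for 10 degrees of freedom (from t tables).
critical_value = 2.228
is_significant = abs(t_statistic) > critical_value

print(f"t = {t_statistic:.3f}, df = {degrees_of_freedom}, significant: {is_significant}")
```

Here the t value is smaller than the critical value, so the Null Hypothesis is not rejected at the 5% level for this hypothetical population mean.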

CHI-SQUARE TEST
A statistical test to determine whether the observed differences between study groups are statistically significant on the basis of the Null Hypothesis.
It is a measure to check for a statistically significant association between two variables.
Formula:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where
$O$ = observed frequency
$E$ = expected frequency
$\chi^2$ = CHI-SQUARE
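A sketch of the chi-square formula for a 2 x 2 table. The observed counts are hypothetical. The notes do not say how E is obtained; the usual rule for a test of association, E = row total x column total / grand total, is assumed here. The critical value 3.841 is the 5% value for 1 degree of freedom from standard chi-square tables.

```python
# Hypothetical observed frequencies: rows = exposed / not exposed,
# columns = disease / no disease.
observed = [
    [20, 30],
    [30, 20],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency for each cell under the Null Hypothesis of no association.
expected = [
    [row_total * column_total / grand_total for column_total in column_totals]
    for row_total in row_totals
]

# Chi-square = sum over all cells of (O - E)^2 / E.
chi_square = sum(
    (o - e) ** 2 / e
    for observed_row, expected_row in zip(observed, expected)
    for o, e in zip(observed_row, expected_row)
)

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 3.841  # 5% level, 1 degree of freedom (from chi-square tables)
is_significant = chi_square > critical_value

print(f"chi-square = {chi_square:.2f}, df = {degrees_of_freedom}, significant: {is_significant}")
```

With these counts every expected frequency is 25, each cell contributes 1.0, and the total of 4.0 exceeds the critical value, so the association would be judged statistically significant at the 5% level.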
