
FUNDAMENTALS OF BIOSTATISTICS FOR HEALTH SCIENCES

HOW STATISTICS ARE USED Define statistics Reasons for studying statistics Descriptive vs inferential statistics Surveys vs experiments Retrospective vs prospective studies Descriptive vs analytical surveys Define bias Purpose and components of a clinical trial

The meaning of statistics......

Recorded data Number of traffic accidents Size of enrollment Number of patients visiting a clinic

The meaning of statistics......

Denote characteristics calculated for a set of data Mean Standard deviation Correlation coefficient

The meaning of statistics......

Body of techniques and procedures dealing with the collection, organization, analysis, interpretation, and presentation of information that can be stated numerically

What do statisticians do?........


A member of a group that works on challenging scientific tasks Frequently engaged in projects that explore the frontiers of human knowledge Primarily concerned with developing and applying methods used in collecting and analyzing data Develop new techniques that provide a unique approach to a particular study to draw valid conclusions

What do statisticians do?........


To guide the design of an experiment or survey To analyze data (examining relationships among several variables, describing and analyzing characteristics such as height and weight, or determining whether a difference in some response is significant) To present and interpret results Used in the decision-making process

Uses of statistics....

2 major categories of statistics Descriptive Inferential

Descriptive statistics deals with Enumeration Organization Graphical presentation of data

Inferential statistics is concerned with reaching conclusions from incomplete information Uses information obtained from a sample to say something about an entire population

Why study statistics......


Essential for both understanding and conducting research in any of the health professions Has a strong emphasis on science and the scientific method for advancing the profession Can help anyone discriminate between fact and fiction Should help you know when, and for what purpose, a statistician should be consulted

Sources of Data
Observational surveys Planned surveys Examples: effects of atomic bomb explosion (no controlled or assigned subjects) Experiments Reduction of blood pressure among veterans with the use of anti-hypertensive and placebo

Categories of surveys
Health researchers conduct surveys on human populations... categories: Retrospective studies (also referred to as case-control studies) Prospective studies (also called cohort studies)

Categories of surveys
Retrospective studies use past data from selected cases and controls to determine differences, if any, in the exposure to a suspected factor.

The researcher identifies individuals w/ a specific disease or condition (CASE) Also, a comparable sample w/o the disease or condition (CONTROL)

Categories of surveys
PURPOSE OF THE COMPARISON: To determine if the 2 groups differ as to their exposure to some specific factor Example: Smoking habits of women who bore premature babies compared with those of women who carried their pregnancies to term The researcher seeks to determine whether there is a statistical relation between the possible STIMULUS VARIABLE, or causative factor (SMOKING), and the OUTCOME VARIABLE (PREMATURITY)

Categories of surveys
DISADVANTAGEs: Data were usually collected for other purposes Incomplete data Surveys frequently fail to include relevant variables that may be essential to determining whether the 2 groups studied are comparable Thus, results may be in a cloud of doubt Unknown biases frequently hinder these studies

Categories of surveys

ADVANTAGEs:
Economical Applicable to the study of rare diseases Strong possibility of obtaining answers quickly because the cases are usually easily identified

Categories of surveys

STEPS in Sample selection: Outcome variable (disease) Stimulus variable (factor)

Categories of surveys
PROSPECTIVE STUDIES The researchers enroll a group of healthy persons (COHORT) and follow them over a certain period to determine the frequency with which a disease develops Example: Framingham study (based on the presence or absence of one or more variables: smoking, diabetes, hypertension, obesity) At the beginning of the study, these are only suspected of being related to CVD or stroke

Categories of surveys
PROSPECTIVE STUDIES Steps: First look at the key variables of interest, because the disease being studied has not yet occurred In the Framingham study, as subjects either developed CVD or had a stroke, the key variables were compared for the individuals who acquired the disease and those who did not

Categories of surveys

PROSPECTIVE STUDIES Steps: The researcher could then begin to determine the degree of importance of each variable in relation to CVD or stroke

Categories of surveys
PROSPECTIVE STUDIES Advantages: Accurate estimation of disease incidence in a population Possibility of including potentially relevant variables (age, gender, ethnicity, occupation) that are related to the outcome variable Data are collected under uniform conditions and for specified reasons Better opportunities to draw appropriate conclusions or comparisons Limited amount of bias (systematic error)

Categories of surveys

PROSPECTIVE STUDIES Disadvantages: Not used to establish or prove a causal relationship (because the variables cannot be randomly assigned or manipulated)

Descriptive vs. Analytical surveys


Retrospective surveys are usually descriptive surveys Provide estimates of a population's characteristics (proportion of individuals who had physical exam during the past 12 mos) Prospective may be descriptive or analytical Analytical surveys seek to determine the degree of association between a variable and a factor in the population Ex: relationship b/w having or not having regular PE and some measure of health status

Clinical Trials
A carefully designed experiment that is generally considered to be the best method for evaluating the effectiveness of a new drug or treatment method Used extensively to test the efficacy of new drugs and treatments Required by FDA before drugs & other medical products receive approval

Clinical Trials

2 key features:
1. randomization 2. blinding ****both help minimize bias

Clinical Trials
BLINDING... ***the study subjects and/or the investigators do not know who is in the control group and who is in the experimental group
Purpose: ***to reduce the likelihood that study assessments will be biased because subject or investigator behavior has been influenced by knowledge of treatment group assignment

Clinical Trials
Single-blind study.... ***the subject does not know if she or he is in the treatment or the control group
Double-blind study... ***neither the subject nor the investigator knows to which group the subject is assigned ***a neutral party keeps track of who is in which group and typically discloses it only at the conclusion of the trial.

Clinical Trials
RANDOMIZATION... ***method most likely to consistently reduce bias by producing equivalent treatment and control groups. Treatment group... ***receives a potentially therapeutic agent ***compared with the control group

Clinical Trials
Control group... ***receives a placebo or the standard therapeutic agent

***In RANDOMIZATION, each subject in the trial is randomly assigned to either the experimental or the control group
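As an illustration of random assignment, here is a minimal Python sketch. It assumes a balanced two-arm trial in which the subject list is shuffled and split in half; that is only one of several randomization schemes, and the function name `randomize`, the subject IDs, and the seed are hypothetical, not taken from the slides.

```python
import random

def randomize(subject_ids, seed=None):
    """Shuffle the subjects and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)          # a seed makes the allocation reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "experimental": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }

# Hypothetical trial with 20 enrolled subjects numbered 1..20
groups = randomize(range(1, 21), seed=42)
print(groups)
```

Because chance alone decides who lands in which group, neither the subject nor the investigator can steer the allocation.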

Clinical Trials

Message: ***Trials that use non-random assignments tend to produce biased overestimates of true therapeutic efficacy***(Schechtman)

POPULATIONS & SAMPLES


SELECTING APPROPRIATE SAMPLES Population... ***a set of persons or objects having a common observable characteristic Sample... ***a subset of a population

POPULATIONS & SAMPLES


Examples: 1. if we want to know how many persons in a community have quit smoking, or have health insurance, or plan to vote for a certain candidate ****information is obtained from an appropriate sample of the community and generalized to the entire population Critical importance: HOW A SUBGROUP IS SELECTED

POPULATIONS & SAMPLES


***the way the sample is selected, not the size, determines whether we may draw appropriate inferences about the population ***whatever variable you are studying, a relatively small sample should very closely approximate the population

POPULATIONS & SAMPLES


Examples: 2. identifying the health issues of college students and then developing programs or providing health care resources depending upon their needs
***all students in the school...SAMPLE OR A POPULATION? It depends.

POPULATIONS & SAMPLES


***if the population of interest is college students in general, the school would be a subset or sample of the population ***if only students at your school were sampled, could the information gathered be used to make inferences about all college students in the Philippines?

POPULATIONS & SAMPLES

***answer could be PROBABLY NOT!!! ***there is likely to be something unique about your school that makes it non-representative of college students in general.

POPULATIONS & SAMPLES


RANDOM SAMPLE... ***every subject has an equal chance of being selected ***seems to be the best route ***selecting a random sample is fairly easy ***does not guarantee a representative sample, though ***yet, it is the technique most likely to yield a representative sample

POPULATIONS & SAMPLES


RANDOM SAMPLE...
primary purpose: to obtain representative sample and then, based on the sample statistics, to make inferences about the population.

***a 50% response rate might be good enough to ensure a representative sample ***but sampling bias cannot be prevented

POPULATIONS & SAMPLES


In the case of clinical trials, would it employ random or non-random sampling? ***the sample would be those patients who met the criteria for testing the new drug, device or procedure
***assumption: whatever results are found at the test site, similar results will be obtained at other sites & with different subjects

POPULATIONS & SAMPLES


WHY SAMPLE? WHY NOT STUDY THE ENTIRE POPULATION? ***We take samples because for most purposes we can obtain suitable accuracy quickly and inexpensively on the basis of the information gained from a sample alone.

HOW SAMPLES ARE SELECTED


Convenience sampling ***group is selected at will or in a particular program or clinic ***self-selected ***problems arise in analysis and in drawing inferences/conclusion ***used when it is impossible to select a random sample example: alcohol consumption among college students in UNEP, researcher picks 1 section of a course as representative

HOW SAMPLES ARE SELECTED


Systematic sampling ***used when a sampling frame (a complete, non-overlapping list of the persons or objects constituting the population) is available ex: select the 1st element, then every 9th after it (from a population of 50) n = desired sample size N = size of entire population
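A minimal Python sketch of systematic sampling follows. The slide does not state how the interval is chosen, so the sketch assumes the common convention of an interval k equal to the integer part of N / n, with a random start inside the first interval. The function name and the frame of 50 numbered subjects are hypothetical, and with n = 5 the interval is 10 rather than the 9 mentioned above.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start within the first k elements, then take every k-th element."""
    N = len(frame)
    k = N // n                       # sampling interval (assumed convention: N / n)
    rng = random.Random(seed)
    start = rng.randrange(k)         # random start, 0 <= start < k
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 51))           # hypothetical sampling frame, N = 50
sample = systematic_sample(frame, n=5, seed=1)
print(sample)
```

The sketch needs n to be no larger than N; otherwise the interval is zero and no random start can be drawn.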

HOW SAMPLES ARE SELECTED


Stratified sampling ***used when we wish the sample to represent the various strata (subgroups) of the population proportionately or to increase the precision of the estimate
***a simple random sample is taken from each stratum
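Proportionate stratified sampling can be sketched as below. The strata (year levels), their sizes, and the 10% sampling fraction are made-up illustration values; the only idea taken from the slide is that a simple random sample is drawn separately from each stratum.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        size = round(len(members) * fraction)   # proportionate allocation
        sample[name] = rng.sample(members, size)
    return sample

# Hypothetical student population split into three strata
strata = {
    "first_year": [f"F{i}" for i in range(200)],
    "second_year": [f"S{i}" for i in range(100)],
    "third_year": [f"T{i}" for i in range(50)],
}
picked = stratified_sample(strata, fraction=0.10, seed=7)
sizes = {name: len(members) for name, members in picked.items()}
print(sizes)   # {'first_year': 20, 'second_year': 10, 'third_year': 5}
```

Each stratum contributes in proportion to its size, so the sample mirrors the make-up of the population on the stratifying variable.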

HOW SAMPLES ARE SELECTED


Cluster sampling ***we select a simple random sample of groups, e.g., certain number of city blocks

***a person is interviewed in each household of the selected blocks ***more economical technique than the random selection of persons throughout the city
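The cluster idea can be sketched in a few lines of Python. The city of 12 blocks with 5 households each is invented for illustration, and the function name is hypothetical; the sketch samples whole blocks at random and then keeps every household in the chosen blocks.

```python
import random

def cluster_sample(blocks, n_blocks, seed=None):
    """Simple random sample of whole blocks; every household in a chosen block is kept."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(blocks), n_blocks)
    return {block: blocks[block] for block in chosen}

# Hypothetical city: 12 blocks, 5 households per block
city = {
    f"block_{b}": [f"block_{b}_house_{h}" for h in range(1, 6)]
    for b in range(1, 13)
}
selected = cluster_sample(city, n_blocks=3, seed=5)
households = [house for houses in selected.values() for house in houses]
print(len(selected), "blocks,", len(households), "households to visit")
```

Interviewers only travel to three blocks instead of households scattered across the whole city, which is where the economy comes from.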

HOW TO SELECT A RANDOM SAMPLE

***Best technique and widely used is the use of a computer program to select random numbers ex: SPSS
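The slide names SPSS; as a hedged alternative, the same draw can be sketched with Python's standard `random` module. The frame of 500 ID numbers, the sample size of 25, and the seed are assumptions made only for the example.

```python
import random

rng = random.Random(2024)                       # fixed seed so the draw is repeatable
sampling_frame = list(range(1, 501))            # hypothetical ID numbers 1..500
sample_ids = rng.sample(sampling_frame, k=25)   # 25 IDs drawn without replacement
print(sorted(sample_ids))
```

`rng.sample` gives every ID the same chance of selection and never picks the same ID twice.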

EFFECTIVENESS OF A RANDOM SAMPLE


EXAMPLE:
Study: Investigation of heart disease among men ages 45-67 ***5 separate samples of 100 each are selected from the population ***the mean ages were compared with the population mean ***Refer to Table 2.1 p.24

EFFECTIVENESS OF A RANDOM SAMPLE


Results: Population parameter (mean age) = 54.36 The 5 sample statistics representing this mean are very close to it. The difference between any sample estimate and the population mean does not exceed 0.5 year Analysis: the sample means are also very close to each other.
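The comparison described above can be imitated with a small simulation. The data of Table 2.1 are not reproduced in these notes, so the sketch builds a synthetic population of 5,000 ages between 45 and 67; its mean will not equal 54.36, and the printed differences are for illustration only.

```python
import random
import statistics

rng = random.Random(0)

# Synthetic population (an assumption): 5,000 men with ages 45..67
population = [rng.randint(45, 67) for _ in range(5000)]
mu = statistics.mean(population)               # population parameter

# Five separate random samples of 100, as in the slide's example
sample_means = [statistics.mean(rng.sample(population, 100)) for _ in range(5)]

print(f"population mean = {mu:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i}: mean = {m:.2f}, difference = {m - mu:+.2f}")
```

Each sample mean is a statistic estimating the same parameter, and with random samples of this size the estimates typically land close to it.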

EFFECTIVENESS OF A RANDOM SAMPLE


Conclusion: ***assessing all individuals in a population may be impossible, impractical, expensive, inaccurate ***instead, study a sample from the original population ***clearly identify the population ***list it in a sampling frame ***utilize an appropriate sampling technique ***random sampling is usually the most desirable technique (easy to apply, limits bias, provides estimates of error, & meets the assumptions necessary for many statistical tests)

EFFECTIVENESS OF A RANDOM SAMPLE


Conclusion: ***missing or incomplete data can also introduce bias ***the effectiveness of random sampling can easily be demonstrated by comparing sample statistics with population parameters ***the statistics obtained from a sample are used as estimates of the unknown parameters of the population

ORGANIZING AND DISPLAYING DATA

THINGS TO LEARN: 1. distinguish between ***qualitative & quantitative variables ***discrete and continuous variables ***symmetrical, bimodal, and skewed distributions ***positively and negatively skewed distributions 2. construction & interpretation of frequency table 3. types of graphs for displaying quantitative & qualitative data

THINGS TO LEARN: 4. construction of histogram, frequency polygon, bar chart, ogive

5. determination and interpretation of percentiles from an ogive

***to successfully explain your data, your first task is to classify and organize the data 3 ways of organizing & presenting data: 1. tables 2. graphs 3. numerical techniques

NOMINAL, ORDINAL, INTERVAL, & RATIO SCALES


***These are 4 commonly used measurement scales. 1. nominal scales used primarily for grouping or categorizing data ***qualitative variables: zip code, hair color, gender, name of college or university, sss number

NOMINAL, ORDINAL, INTERVAL, & RATIO SCALES


***no numerical value associated with these variables ***assign them a numerical value in order for them to be used in some statistical analysis ex: non-smoker vs smoker 0 = nonsmoker 1 = smoker

gender: female vs male 1 = female 2 = male

NOMINAL, ORDINAL, INTERVAL, & RATIO SCALES


examples: hair color black = 1 brown = 2 blonde = 3 gray = 4 ID number: arranged in order as entered into the spreadsheet refer to table 3.1 p.32

NOMINAL, ORDINAL, INTERVAL, & RATIO SCALES


2. ordinal scale ordered series of relationships e.g. 1st, 2nd, 3rd, 4th, 5th...... ***can be used in ranking order of causes of morbidity or mortality example: 5 leading causes of mortality: 1st heart disease 2nd cancer 3rd CVD 4th chronic respiratory disease 5th unintentional injury

NOMINAL, ORDINAL, INTERVAL, & RATIO SCALES


2. ordinal scale ordered series of relationships e.g. 1st, 2nd, 3rd, 4th, 5th...... ***provides useful information but without being able to quantify the difference (e.g., the difference in the number of deaths due to heart disease vs cancer) ***thus, the information is limited

NOMINAL, ORDINAL, INTERVAL, & RATIO SCALES


3. interval and ratio ***any differences are measurable and meaningful ex: temp: 60F is 30 degrees warmer than 30F; $20 is $15 higher than $5 4. discrete (discontinuous) vs. continuous variables ***discrete variables ex: number of children per household, number of times you visit your doctor, number of missing teeth

NOMINAL, ORDINAL, INTERVAL, & RATIO SCALES


4. discrete (discontinuous) vs. continuous variables ***discrete variables ex: number of children per household, number of times you visit your doctor, number of missing teeth
***they must always be integers (whole numbers) ***continuous variables ex: age, height, weight (e.g., 37.8, 138.2, 112.9)

NOMINAL, ORDINAL, INTERVAL, & RATIO SCALES


4. discrete (discontinuous) vs. continuous variables ***a continuous variable significantly improves the accuracy or predictability of the data

ex: 2.4 would mean 240

NOMINAL, ORDINAL, INTERVAL, & RATIO SCALES


HINTS FOR ENTERING DATA INTO A SPREADSHEET 1. for manual entering of data, remember to verify the accuracy of your data input. Any statistical program will correctly analyze the data provided.

***if incorrect numbers are entered, the analysis will be mathematically correct, but the results will be wrong because of the data-entry error
*** VERIFY, VERIFY, VERIFY

FIGURES, TABLES & GRAPHS


FIGURES: ***any type of illustration other than a table ***charts, graphs, photographs, drawings ***graph is labeled as Figure ***table is labeled as Table ***TABLE is used to display quantitative data
Primary purpose: to visually display information in a manner that makes it easy for readers to comprehend.

FIGURES, TABLES & GRAPHS


GRAPH: ***one particular type of figure ***labeled as a figure TABLE: ***used to display quantitative data ***a presentation of raw data

FIGURES, TABLES & GRAPHS


FREQUENCY TABLES: ***usually done thru SPSS (spreadsheet) ***primary purpose is to provide a visual presentation that makes the data clear and understandable ***most convenient way of summarizing or displaying data Refer to Table 3.2 (example of frequency table)

FIGURES, TABLES & GRAPHS


FREQUENCY TABLES: ***data input can be in descending or ascending order FREQUENCY: ***the number of cases with a particular value Ex: From N=30 Systolic BP 110-130 = 5 Systolic BP 140-160 = 2 Systolic BP 80-100 = 3

FIGURES, TABLES & GRAPHS

VALID PERCENT ***the % out of 100, using only those subjects with data CUMULATIVE PERCENT ***the % of all previous cases plus the current interval
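The following Python sketch builds a small frequency table with valid and cumulative percents. The systolic readings, the use of `None` for missing values, and the function name `frequency_table` are assumptions for illustration; a package such as SPSS would produce an equivalent table.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, frequency, valid percent, cumulative percent)."""
    valid = [v for v in values if v is not None]     # only subjects with data
    counts = Counter(valid)
    rows, running = [], 0
    for value in sorted(counts):                      # ascending order
        running += counts[value]
        valid_pct = 100 * counts[value] / len(valid)
        cumulative_pct = 100 * running / len(valid)   # previous cases plus this one
        rows.append((value, counts[value], round(valid_pct, 1), round(cumulative_pct, 1)))
    return rows

# Hypothetical systolic BP readings; None marks a subject with missing data
readings = [110, 120, 120, 130, None, 110, 120, 140, 130, 120, None]
table = frequency_table(readings)
for row in table:
    print(row)
```

The two missing readings are left out of the denominator, so the valid percents are based on 9 subjects and the last cumulative percent is 100.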

FIGURES, TABLES & GRAPHS


Class Intervals: - usually equal in length, aiding the comparisons between any 2 intervals -the number of intervals depends on the number of observations -generally ranges from 5-15 - with too many intervals, data are not sufficiently summarized for a clear visualization of how they are distributed -too few intervals=over-summarized; some details may be lost

FIGURES, TABLES & GRAPHS


Interval width: - number of units between the upper & the lower limits - ex: 91-100 (interval width is 10) refer to Table 3.4 & 3.5 - general rule: use whole numbers & multiples of 5

FIGURES, TABLES & GRAPHS


GRAPHS -another way of displaying data -give a nice overview of the essential features of the data -easier to read than tables but do not give the same detail -self-explanatory having Descriptive title Labeled axes Indication of the units of observation

FIGURES, TABLES & GRAPHS


Types of graphs: Histogram Frequency polygon Cumulative frequency polygons Stem & leaf display Bar charts Pie chart Box & whisker plots

***all can be generated by computer programs

FIGURES, TABLES & GRAPHS


Histogram: -most common -a pictorial presentation of the frequency table -parts: ABSCISSA (horizontal axis), which depicts the class boundaries; ORDINATE (vertical axis), which depicts the frequency of observations

FIGURES, TABLES & GRAPHS


Histogram: -vertical scale should begin at zero -general rule: ***height of the vertical scale must be equal to approximately the length of the horizontal scale
****refer to figure 3.1

FIGURES, TABLES & GRAPHS


Frequency polygon -uses the same axes as the histogram -constructed by marking a point (at the same height as the histogram's bar) at the midpoint of each class interval -these points are then connected, and the ends are joined to the horizontal axis at zero frequency
refer to Fig. 3.3

FIGURES, TABLES & GRAPHS


Frequency polygon:

-various shapes: ***symmetrical ***bimodal ***rectangular ***skewed to the right ***skewed to the left refer to Figure 3.4

FIGURES, TABLES & GRAPHS


Frequency polygon:
-symmetrical (bell-shaped) -bimodal (having two peaks) ***this can represent an overlapping group -rectangular (each class interval is equally represented) -skewed to the right (to the positive side) -skewed to the left (negative side)

FIGURES, TABLES & GRAPHS


Cumulative frequency polygon -also called OGIVE -vertical scale indicates cumulative relative frequency -also connected by points -useful in comparing 2 sets of data
Refer to Fig. 3.5
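To make the ogive concrete, here is a Python sketch that computes its points from grouped data and reads a percentile off it by linear interpolation. The class boundaries and frequencies are invented, both function names are hypothetical, and straight-line interpolation between plotted points is an assumption that matches how an ogive is usually read by eye.

```python
def ogive_points(upper_boundaries, frequencies):
    """Cumulative relative frequency (%) at the upper boundary of each class."""
    total = sum(frequencies)
    points, running = [], 0
    for boundary, f in zip(upper_boundaries, frequencies):
        running += f
        points.append((boundary, 100 * running / total))
    return points

def percentile_from_ogive(lower_boundary, points, pct):
    """Read a percentile off the ogive by interpolating between adjacent points."""
    prev_x, prev_y = lower_boundary, 0.0
    for x, y in points:
        if y > prev_y and pct <= y:
            return prev_x + (pct - prev_y) * (x - prev_x) / (y - prev_y)
        prev_x, prev_y = x, y
    return points[-1][0]

# Hypothetical grouped systolic BP data: classes 80-100, 100-120, ..., 160-180
upper = [100, 120, 140, 160, 180]
freq = [3, 9, 12, 4, 2]
points = ogive_points(upper, freq)
median = percentile_from_ogive(80, points, 50)
print(points)
print("50th percentile:", median)   # 125.0
```

The 50th percentile falls in the 120-140 class because the cumulative percent rises from 40 to 80 across it; a quarter of the way up that segment gives 125.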

FIGURES, TABLES & GRAPHS


Bar charts -useful for displaying nominal or ordinal data -ethnicity, gender, treatment category -arranged alphabetically, by frequency within a category, or on some other basis -but it is often arranged by frequency -all bars must be of equal width and separate from each other

FIGURES, TABLES & GRAPHS


Difference of bar charts from histogram:
***in bar charts, frequencies are shown by the heights of the bars ***in a histogram, frequencies are shown by the areas within the bars -the vertical scale begins at zero Refer to Fig. 3.6

FIGURES, TABLES & GRAPHS


Pie charts - a circle divided into wedges that correspond to the percentage frequencies of the distribution -useful in conveying data that consist of a small number of categories
Refer to Fig. 3.9

FIGURES, TABLES & GRAPHS

Box and whisker plots -graphical examination of data -done by determining the median and the quartile statistics (to be discussed later)

FIGURES, TABLES & GRAPHS


Computerized graphing -www.minitabb.com -www.JMP.com -www.spss.com -spreadsheet program (Microsoft Excel) can generate simple histograms, bar charts, pie charts, & quartiles required for box-and-whisker plots

FIGURES, TABLES & GRAPHS


Key messages:
-graphing & tabulating data are essential -this can give us clear understanding and evaluation of the flood of data with which the researcher/reader is bombarded -data must be presented accurately and lucidly -it is important to know which method of presentation to choose for each type of data -graphs & tables must tell their own story & stand on their own

FIGURES, TABLES & GRAPHS

EXERCISES ON page 51-53

PROBABILITY
**Applies exclusively to a future event, never to a past event
**stated numerically **defined in the range of 0 to 1 (never more, never less) ***a probability of 1.0 = event will happen with certainty

PROBABILITY
**probability of 0 = event will not happen
**probability of 0.5 = event should occur once in every two attempts, on average **probability close to 1.0 = event is more likely to happen **probability close to 0 = event is unlikely to happen

PROBABILITY
**defined as: the ratio of the number of ways the specified event can occur to the total number of equally likely events that can occur **probability of an event = P(E) -----the proportion of times a favorable event will occur in a long series of repeated trials

PROBABILITY
**P(E) = n / N = (number of favorable outcomes) / (number of possible outcomes)
ex: one toss of a fair coin = 2 possible outcomes: head or tail
P(H) = 1 / 2
P(T) = 1 / 2

PROBABILITY
**P(E) = n / N = (number of favorable outcomes) / (number of possible outcomes)
ex: one die = 6 equally likely outcomes N = 6 (1,2,3,4,5,6) P=3/6 P=2/6
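The ratio P(E) = n / N can be checked with a short Python sketch. The slide does not say which events give 3/6 and 2/6, so the two events used here (an even face, and a face greater than 4) are assumptions chosen only because they have those probabilities. The simulated rolls illustrate the "long series of repeated trials" reading of probability.

```python
from fractions import Fraction
import random

def probability(event, outcomes):
    """P(E) = favorable outcomes / possible outcomes, for equally likely outcomes."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

die = [1, 2, 3, 4, 5, 6]
p_even = probability(lambda face: face % 2 == 0, die)   # 3/6, reduced to 1/2
p_gt4 = probability(lambda face: face > 4, die)         # 2/6, reduced to 1/3
print(p_even, p_gt4)

# Relative frequency of an even face over many simulated rolls
rng = random.Random(1)
rolls = [rng.choice(die) for _ in range(10_000)]
rel_freq_even = sum(1 for r in rolls if r % 2 == 0) / len(rolls)
print(rel_freq_even)
```

`Fraction` reduces 3/6 to 1/2 and 2/6 to 1/3 automatically, and the relative frequency from the simulation should settle near 0.5.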

NORMAL DISTRIBUTION
**NORMAL LIMITS ----used to classify levels of patients as being healthy or otherwise Ex: normal level of cholesterol = 200 mg/dl ***a patient with a value above this = significantly increased risk for coronary heart disease

NORMAL DISTRIBUTION
**giving normal limits is quite critical..WHY?

---the given value may have been inaccurate or faulty, which can have tragic consequences: a patient might be given unnecessary treatment, or others might fail to receive a needed treatment

NORMAL DISTRIBUTION
**ex: SERUM ALBUMIN ---normal limits for albumin are calculated by adding and subtracting 2 standard deviations from the mean of a large set of observations obtained from a group of presumably healthy persons ---this will provide the limits that contain the middle 95% (w/c is the normal range)

NORMAL DISTRIBUTION

**the remaining 5% will be excluded

CLINICAL LIMITS **the lower & upper 2.5% points for any distribution, normal or otherwise, of healthy persons
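The "mean plus or minus 2 standard deviations" rule can be sketched in Python as follows. The ten albumin values are invented (a real reference range would use a large group of healthy persons), and the function name is hypothetical. The last lines use `statistics.NormalDist` to show that 2 standard deviations cover slightly more than 95% of a normal distribution, and that the exact multiplier for the middle 95% is about 1.96.

```python
import statistics
from statistics import NormalDist

def normal_limits(values):
    """Lower and upper limits: mean minus and plus 2 standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample standard deviation
    return mean - 2 * sd, mean + 2 * sd

# Hypothetical serum albumin values in g/dL, far fewer than a real study would use
albumin = [4.1, 4.4, 3.9, 4.6, 4.2, 4.0, 4.5, 4.3, 4.2, 4.4]
low, high = normal_limits(albumin)
print(f"normal limits: {low:.2f} to {high:.2f} g/dL")

z = NormalDist()                           # standard normal: mean 0, SD 1
coverage = z.cdf(2) - z.cdf(-2)            # area within 2 SD, about 0.954
exact_z = z.inv_cdf(0.975)                 # about 1.96 for the middle 95%
print(round(coverage, 4), round(exact_z, 2))
```

The limits only match the middle 95% when the values are roughly normally distributed; for other shapes, the 2.5% points would have to be taken from the data directly.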

NORMAL DISTRIBUTION
**Properties of normal distribution
1. bell-shaped curve = symmetrical about the mean 2. all normal distributions have a particular internal distribution for the area under the curve ----the relative area between any 2 designated points is always the same

NORMAL DISTRIBUTION

**Properties of normal distribution 3. exponential equation ---the normal distribution is a theoretical distribution defined by 2 parameters (the mean and the standard deviation)

HYPOTHESIS TESTING
**One of the principal objectives of research is comparison: ----how does one group differ from another? Ex: ---what is the mean serum cholesterol level of a group of middle-aged men? ---how does it differ from that of women of the same age group? ---how does it differ from that of men of other age groups?

HYPOTHESIS TESTING

**Parameter or unknown characteristic of a population ---usually estimated from a statistic computed from sample data

HYPOTHESIS TESTING
Definition of terms:

HYPOTHESIS --- a statement of belief used in the evaluation of population values NULL HYPOTHESIS (Ho) --- a claim that there is no difference between the population mean and the hypothesized value

HYPOTHESIS TESTING
Definition of terms:

ALTERNATIVE HYPOTHESIS (H1) --- a claim that disagrees with the null hypothesis. If the null hypothesis is rejected, we are left with no choice but to fail to reject the alternative hypothesis that the mean is not equal to the hypothesized value.

HYPOTHESIS TESTING
TEST STATISTIC --- a statistic used to determine the relative position of the mean in the hypothesized probability distribution of sample means CRITICAL REGION --- the region on the far end of the distribution ****one-tailed test --- if only one end of the distribution is involved

HYPOTHESIS TESTING
CRITICAL REGION --- the region on the far end of the distribution ****two-tailed test --- if both ends of the distribution are involved CRITICAL VALUE --- the number that divides the normal distribution into the region where we will reject the null hypothesis and the region where we fail to reject the null hypothesis

HYPOTHESIS TESTING
SIGNIFICANCE LEVEL --- the level that corresponds to the area in the critical region ---this area is usually small (means the results are infrequent and deemed unusual) ----this is what "statistically significant" means in the language of statisticians

HYPOTHESIS TESTING
TEST OF SIGNIFICANCE --- a procedure used to establish the validity of a claim by determining whether the test statistic falls in the critical region.

*** if it does, the results are referred to as SIGNIFICANT. ***this test is sometimes called the HYPOTHESIS TEST.
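The terms above can be tied together with a small Python sketch of a two-tailed test. The notes do not name a specific test, so the sketch assumes a one-sample z test with a known population standard deviation. The cholesterol figures (sample mean 215, hypothesized mean 200, SD 40, n = 64) and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample z test; returns (test statistic, critical value, significant?)."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))        # test statistic
    critical = NormalDist().inv_cdf(1 - alpha / 2)     # critical value, about 1.96
    return z, critical, abs(z) > critical              # True if z is in the critical region

# Hypothetical data: H0 says the population mean cholesterol is 200 mg/dl
z, critical, significant = z_test(sample_mean=215, mu0=200, sigma=40, n=64)
print(f"z = {z:.2f}, critical value = {critical:.2f}, significant = {significant}")
```

Here the test statistic is 3.0, which lies beyond the critical value of about 1.96, so it falls in the critical region and the null hypothesis is rejected at the 0.05 significance level.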
