
Uploaded by emersaphire

intro to biostat


Final Exam

You may bring 5 pages of notes

You MUST bring full copies of statistical tables (on Blackboard)

You MUST bring a calculator

Hypothesis Testing for a single mean and proportion, and for two means

One-way ANOVA

Chi-square Tests

Power and Sample size

Regression and Correlation

Logistic Regression

Survival analysis

Single mean μ

Single proportion p

Paired (or matched) data μd

Define null and research hypotheses

Define test statistic, level of significance and decision rule

Compute the test statistic from the sample data.

Use decision rule or p-value to decide whether to reject or not reject the null hypothesis.

For a single mean μ

If n ≥ 30, use z-test statistic

If n < 30, use t-test statistic

For a single proportion p

Use z-test statistic

Check assumptions

For comparing two means μ1 − μ2

If n1 and n2 are both ≥ 30, use z-test statistic

If n1 and/or n2 < 30, use t-test statistic

Use chi-square test

Type I error occurs when we reject the null hypothesis when we shouldn't have.

Pr(Type I error) = α

Type II error occurs when we don't reject the null hypothesis when we should have.

Pr(Type II error) = β

One-Way ANOVA

Used when we want to compare the means of three or more groups from independent populations.

A continuous outcome is measured on each subject.

We set up an analysis of variance table and compare the between-group and within-group variability.

An F-test is used with two degrees of freedom terms: df1 = k − 1 (between groups) and df2 = N − k (error, within groups), where k is the number of groups and N is the total sample size.

Chi-Square Test

Chi-square goodness of fit test

Assess whether responses fit a specified distribution for one sample of people

Chi-square test of independence

Test if two discrete variables are associated in some way for a sample of people

Compare distributions of proportions among two or more independent groups

Need a large enough sample to ensure you have the pre-specified amount of precision in the analysis

Sample size is determined based on the type of planned analysis:

Confidence interval

Hypothesis test

We always round up our calculation.

Need to account for possible dropout from the study. This always increases the required sample size.
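A minimal Python sketch of the two rules above (round up, then inflate for dropout). It assumes the common adjustment N = n / (1 − dropout rate), i.e. enough people are enrolled that the expected number of completers still reaches n; the function name and the numbers in the example are hypothetical.

```python
import math

def enroll_with_dropout(n_required, dropout_rate):
    """Number to enroll so that about n_required participants complete the study.

    Assumes dropouts contribute no data, so we need N * (1 - dropout_rate) >= n_required.
    The result is always rounded up.
    """
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(n_required / (1 - dropout_rate))

# Hypothetical example: 50 completers needed, 20% expected dropout
print(enroll_with_dropout(50, 0.20))  # 63
```

Any positive dropout rate makes the result larger than n_required, which is why accounting for dropout always increases the required sample size.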

Power

Linked to Type II error

Power = 1 − β = P(Reject H0 | H0 false) = probability of correctly rejecting H0 when H0 is false.
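To make Power = 1 − β concrete, the sketch below handles one specific case only: a one-sided, one-sample z-test of H0: μ = μ0 versus H1: μ > μ0 with known σ, using the standard result Power = Φ(Δ√n / σ − z1−α), where Δ = μ − μ0. The function name and the effect size, σ and n in the example are hypothetical; other tests need other power formulas.

```python
from statistics import NormalDist

def power_one_sided_z(delta, sigma, n, alpha=0.05):
    """Power of a one-sided one-sample z-test (H1: mu > mu0), known sigma.

    delta is the true difference mu - mu0; power = P(Z > z_crit | true mean).
    """
    z = NormalDist()                       # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha)          # reject H0 if Z > z_crit
    return z.cdf(delta * n ** 0.5 / sigma - z_crit)

# Hypothetical: true mean is half an SD above mu0, n = 25
print(round(power_one_sided_z(delta=0.5, sigma=1.0, n=25), 3))  # 0.804
```

Increasing n (or the true difference Δ) increases power; with Δ = 0 the function returns α, the Type I error rate.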

Correlation

Correlation measures the nature and strength of the linear association between two variables at a time.

Regression: the equation that best describes the relationship between variables.

Correlation Coefficient

Population correlation is ρ (rho)

Sample correlation is r, where −1 ≤ r ≤ +1

Sign indicates the nature of the relationship (positive or direct, negative or inverse)
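A short Python sketch of the sample correlation r computed from raw data, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² Σ(y − ȳ)²). The data values are made up purely to show one positive and one negative r.

```python
import math

def pearson_r(x, y):
    """Sample correlation r = Cov(x, y) / (s_x * s_y), computed from raw data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # The (n - 1) divisors in Cov, s_x and s_y cancel, so sums of squares suffice
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y rises with x (positive r), z falls with x (negative r)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
z = [9.0, 7.5, 6.1, 3.8, 2.2]
print(round(pearson_r(x, y), 3), round(pearson_r(x, z), 3))
```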

Linear Regression

A very popular method for describing the linear relationship between two variables (usually continuous variables).

We use a scatterplot to display the data graphically

the two variables.

Y = dependent, outcome variable

X = independent, covariate, predictor variable

ŷ = b0 + b1x
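A minimal Python sketch of fitting ŷ = b0 + b1x by least squares, with b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1x̄. The data points are hypothetical and chosen to lie exactly on a line so the result is easy to check.

```python
def least_squares_line(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx        # slope
    b0 = my - b1 * mx     # intercept: the line passes through (x-bar, y-bar)
    return b0, b1

# Hypothetical data lying exactly on y = 1 + 2x
b0, b1 = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```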

Multiple Linear Regression

Useful when we want to jointly examine the effect of several X variables on the outcome Y variable.

Y = continuous outcome variable

X1, X2, …, Xp = set of independent or predictor variables

ŷ = b0 + b1x1 + b2x2 + … + bpxp


Predictors can be continuous, indicator variables (0/1) or a set of dummy variables.

Confounding: the effect of a risk factor on an outcome is changed (distorted) by the effect of another factor.

Effect modification: the relationship between the risk factor and an outcome differs depending on the level of another variable.

Logistic Regression

Used when the outcome is dichotomous (binary), e.g. diseased, not diseased.

Our goals remain the same as for linear regression: is there an association between a variable X and our outcome variable Y? If so, what type?

We model the probability p of having the disease.

$$p = \frac{e^{b_0 + b_1 X}}{1 + e^{b_0 + b_1 X}}$$

$$\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right) = b_0 + b_1 X$$

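Outcome is dichotomous (1 = event, 0 = non-event) and p = P(event)

Outcome is modeled as the log odds:

$$\ln\left(\frac{p}{1-p}\right) = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$$

exp(bi) = OR (the odds ratio for a one-unit increase in xi, holding the other predictors constant)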
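The sketch below, with hypothetical coefficients b0 = −2.0 and b1 = 0.4, checks that the two logistic formulas above are inverses of each other and shows why exp(b1) is an odds ratio: the odds ratio comparing X + 1 with X is the same at every X.

```python
import math

def inv_logit(eta):
    """p = e^eta / (1 + e^eta): probability from the linear predictor eta."""
    return math.exp(eta) / (1 + math.exp(eta))

def logit(p):
    """ln(p / (1 - p)): log odds of probability p."""
    return math.log(p / (1 - p))

def odds(p):
    return p / (1 - p)

# Hypothetical coefficients for a single-predictor model
b0, b1 = -2.0, 0.4
for x in (0, 3, 10):
    p_x = inv_logit(b0 + b1 * x)
    p_next = inv_logit(b0 + b1 * (x + 1))
    # Odds ratio for a one-unit increase in x; equals exp(b1) at every x
    print(x, round(p_x, 3), round(odds(p_next) / odds(p_x), 4), round(math.exp(b1), 4))
```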

Survival Analysis

Outcome is the time to an event.

The event could be a heart attack, cancer remission or death.

For each participant we record whether the event occurs (Yes/No) and, if so, their time to event.

Determine factors associated with longer survival.


Incomplete follow-up information: censoring

Measure follow-up time and not time to event

We know survival time > follow-up time

two or more independent groups

Model:

$$\ln\left(\frac{h(t)}{h_0(t)}\right) = b_1 X_1 + b_2 X_2 + \dots + b_p X_p$$

Model used to jointly assess the effects of independent variables on the outcome (time to an event).

Final Exam

Problem 1.

Suppose a cross-sectional study is conducted to investigate cardiovascular risk factors among a sample of patients seeking medical care at one of three local hospitals. A total of 300 patients are enrolled. Using the following data, test if there is an association between enrollment site (i.e., hospital) and family history of CVD. Run the appropriate test at a 5% level of significance.


Family Hx    Hosp 1    Hosp 2    Hosp 3
Definite       24        14        22
Probable        8        14         8
No             68        72        70
Total         100       100       100


H0: Site and family history are independent

H1: H0 is false

α = 0.05

df = (r − 1)(c − 1) = (3 − 1)(3 − 1) = 4

Reject H0 if χ² > 9.49


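Observed (expected) counts, where expected = (row total × column total) / n, e.g. 60 × 100 / 300 = 20:

Family Hx    Hosp 1     Hosp 2     Hosp 3
Definite     24 (20)    14 (20)    22 (20)
Probable      8 (10)    14 (10)     8 (10)
No           68 (70)    72 (70)    70 (70)
Total          100        100        100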


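$$\chi^2 = \sum \frac{(O-E)^2}{E} = \frac{(24-20)^2}{20} + \frac{(14-20)^2}{20} + \frac{(22-20)^2}{20} + \frac{(8-10)^2}{10} + \frac{(14-10)^2}{10} + \frac{(8-10)^2}{10} + \frac{(68-70)^2}{70} + \frac{(72-70)^2}{70} + \frac{(70-70)^2}{70}$$

$$\chi^2 = 0.8 + 1.8 + 0.2 + 0.4 + 1.6 + 0.4 + 0.06 + 0.06 + 0 = 5.32$$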

Do not reject H0 because 5.32 < 9.49. We do not have significant evidence, α = 0.05, to show that site and family history are not independent.
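The Problem 1 calculation can be checked with a short plain-Python sketch (no statistics library assumed; the function name is hypothetical). The critical value 9.49 is copied from the slide rather than computed. Carrying full precision gives χ² ≈ 5.31; the slide's 5.32 reflects rounding the last terms to 0.06.

```python
def chi_square_independence(table):
    """Chi-square test of independence for an r x c table of observed counts.

    Returns the statistic sum((O - E)^2 / E), its degrees of freedom (r-1)(c-1),
    and the expected counts E = row total * column total / n.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected

# Problem 1: rows = Definite, Probable, No; columns = Hosp 1, Hosp 2, Hosp 3
family_hx = [[24, 14, 22],
             [8, 14, 8],
             [68, 72, 70]]
stat, df, expected = chi_square_independence(family_hx)
critical = 9.49  # chi-square table value for df = 4, alpha = 0.05 (as on the slide)
print(round(stat, 2), df, stat > critical)  # 5.31 4 False
```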

Problem 2.

The following table summarizes data collected in the study described in problem 1. The variable summarized below is body mass index (BMI), computed as the ratio of weight in kilograms to height in meters squared.

BMI        N     Mean    Std Dev
Overall   300    24.8     2.5
Hosp 1    100    21.6     2.1
Hosp 2    100    24.8     1.8
Hosp 3    100    27.9     1.3


Test if there is a significant difference in the mean BMI scores among hospitals. Show all parts of the test and use a 5% level of significance. (HINT: MSE = 3.1.)

H0: μ1 = μ2 = μ3

H1: means not all equal

α = 0.05

$$SS_B = \sum_j n_j (\bar{X}_j - \bar{X})^2 = 100\left((21.6-24.8)^2 + (24.8-24.8)^2 + (27.9-24.8)^2\right) = 100(10.24 + 0 + 9.61) = 1985$$


Source     SS       df     MS       F
Between    1985       2    992.5    320.2
Error      920.7    297    3.1
Total      2905.7   299

F = MS(Between) / MSE = 992.5 / 3.1 = 320.2

Reject H0 since 320.2 > 3.09. We have significant evidence, α = 0.05, to show that the means are not all equal.
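A sketch of the one-way ANOVA computed directly from the group summaries (n, mean, SD), assuming SSE = Σ(nj − 1)sj². It uses the grand mean of the three groups (24.77) and their SDs rather than the rounded overall mean 24.8 and the hint MSE = 3.1, so it gives F ≈ 318.7 instead of 320.2; the conclusion is unchanged. The function name is hypothetical.

```python
def anova_from_summaries(groups):
    """One-way ANOVA from (n, mean, sd) per group.

    Returns (SSB, SSE, df_between, df_error, F).
    """
    N = sum(n for n, _, _ in groups)
    k = len(groups)
    grand_mean = sum(n * m for n, m, _ in groups) / N
    ssb = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)  # between groups
    sse = sum((n - 1) * s ** 2 for n, _, s in groups)           # within groups (error)
    df_b, df_e = k - 1, N - k
    f_stat = (ssb / df_b) / (sse / df_e)                        # MSB / MSE
    return ssb, sse, df_b, df_e, f_stat

# Problem 2 BMI summaries for Hosp 1, Hosp 2, Hosp 3
bmi = [(100, 21.6, 2.1), (100, 24.8, 1.8), (100, 27.9, 1.3)]
ssb, sse, df_b, df_e, f_stat = anova_from_summaries(bmi)
print(round(ssb, 1), round(sse, 2), df_b, df_e, round(f_stat, 1))  # 1984.7 924.66 2 297 318.7
```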

Problem 3.

Suppose each participant in the study described in problem 1 is assigned a cardiovascular risk score (a value between 0 and 100, with higher scores indicating more risk of cardiovascular disease). The mean cardiovascular risk is 21.7 with a standard deviation of 5.6. Suppose that the covariance between BMI and cardiovascular risk is 4.5.


Compute the sample correlation coefficient between BMI and cardiovascular risk.

Var(BMI) = sx² = 2.5²

Var(Risk) = sy² = 5.6²

$$r = \frac{\operatorname{Cov}(X,Y)}{\sqrt{s_x^2 s_y^2}} = \frac{4.5}{\sqrt{(2.5)^2 (5.6)^2}} = \frac{4.5}{14} \approx 0.3$$

Run the appropriate test at a 5% level of significance.

H0: ρ = 0

H1: ρ ≠ 0

α = 0.05

$$Z = r\sqrt{\frac{n-2}{1-r^2}}$$

Reject H0 if Z < −1.96 or if Z > 1.96

$$Z = 0.3\sqrt{\frac{298}{1-(0.3)^2}} = 5.4$$

Reject H0 since 5.4 > 1.96. We have significant evidence, α = 0.05, to show that ρ ≠ 0.
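A Python sketch of the Problem 3 calculations (function names are hypothetical). The slide rounds r to 0.3 before testing; carrying r = 4.5/14 ≈ 0.321 gives Z ≈ 5.86 instead of 5.4, and both are far beyond 1.96.

```python
import math

def correlation_from_cov(cov_xy, s_x, s_y):
    """Sample correlation r = Cov(X, Y) / (s_x * s_y)."""
    return cov_xy / (s_x * s_y)

def z_for_correlation(r, n):
    """Test statistic r * sqrt((n - 2) / (1 - r^2)) for H0: rho = 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r = correlation_from_cov(4.5, 2.5, 5.6)
print(round(r, 3))                            # 0.321
print(round(z_for_correlation(0.3, 300), 2))  # 5.43 (r rounded to 0.3, as on the slide)
print(round(z_for_correlation(r, 300), 2))    # 5.86 (unrounded r)
```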

Problem 4.

Compute the equation of the line that best describes the relationship between BMI and cardiovascular risk (assume that cardiovascular risk is the dependent variable).

$$b_1 = r\frac{s_y}{s_x} = 0.3\left(\frac{5.6}{2.5}\right) = 0.67$$

$$b_0 = \bar{Y} - b_1\bar{X} = 21.7 - 0.67(24.8) = 5.08$$

ŷ = 5.08 + 0.67X
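A sketch of the Problem 4 slope and intercept from summary statistics (function name hypothetical). With r = 0.3 it returns b1 = 0.672 and b0 ≈ 5.03; the slide's 5.08 comes from rounding b1 to 0.67 before computing b0.

```python
def regression_line(r, s_x, s_y, mean_x, mean_y):
    """Least-squares slope b1 = r * s_y / s_x and intercept b0 = mean_y - b1 * mean_x."""
    b1 = r * s_y / s_x
    b0 = mean_y - b1 * mean_x
    return b0, b1

# X = BMI (mean 24.8, SD 2.5), Y = cardiovascular risk (mean 21.7, SD 5.6), r = 0.3
b0, b1 = regression_line(0.3, 2.5, 5.6, 24.8, 21.7)
print(round(b0, 2), round(b1, 3))  # 5.03 0.672
```

Because r·sy/sx = Cov(X,Y)/sx², plugging in the unrounded r = 4.5/14 gives a slope of 4.5/2.5² = 0.72, which shows how sensitive the answer is to rounding r early.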

Problem 5.

Suppose we restrict our attention to the subgroup of patients at high risk for cardiovascular disease (cardiovascular risk score of 30 or more). Using the following data, test if BMI is significantly different in men versus women. Use a 5% level of significance.


BMI        n     Mean    Std Dev
Men        20    31.6     1.7
Women      10    28.1     2.1

H0: μ1 = μ2

H1: μ1 ≠ μ2

α = 0.05

$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

df = 20 + 10 − 2 = 28

Reject H0 if t < −2.048 or if t > 2.048


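$$S_p = \sqrt{\frac{19(1.7)^2 + 9(2.1)^2}{20 + 10 - 2}} = 1.84$$

$$t = \frac{31.6 - 28.1}{1.84\sqrt{\frac{1}{20} + \frac{1}{10}}} = 4.91$$

Reject H0 since 4.91 > 2.048. We have significant evidence, α = 0.05, to show there is a difference in mean BMI between men and women.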
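A sketch of the pooled-variance two-sample t-test from summary statistics (function name hypothetical; the critical value 2.048 is copied from the slide's t table for df = 28). Without rounding Sp to 1.84 first, t ≈ 4.92 rather than 4.91.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t statistic assuming equal population variances (pooled SD)."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    t = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return sp, t, df

sp, t, df = pooled_t(20, 31.6, 1.7, 10, 28.1, 2.1)  # men, then women
critical = 2.048  # t table value for df = 28, two-sided alpha = 0.05 (as on the slide)
print(round(sp, 2), round(t, 2), df, abs(t) > critical)  # 1.84 4.92 28 True
```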

Problem 6.

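How many men and women would be required to estimate a difference in mean BMI with a 95% confidence interval and a margin of error not exceeding 1 unit? (Use data from problem 5 as needed.)

$$n_i = 2\left(\frac{Z s}{E}\right)^2$$

Use Sp from problem 5 (Sp = 1.84):

$$n_i = 2\left(\frac{1.96(1.84)}{1}\right)^2 = 26.01$$

Round up: 27 men and 27 women are required.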
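A sketch of the per-group sample size formula together with the round-up rule; the inputs are the values used above (Z = 1.96, s = Sp = 1.84, E = 1) and the function name is hypothetical.

```python
import math

def n_per_group_diff_means(z, s, e):
    """Per-group n so a CI for mu1 - mu2 has margin of error at most e.

    Returns the unrounded value and the value rounded up to a whole person.
    """
    n = 2 * (z * s / e) ** 2
    return n, math.ceil(n)

raw, n_i = n_per_group_diff_means(z=1.96, s=1.84, e=1)
print(round(raw, 2), n_i)  # 26.01 27
```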

Problem 7.

The following table was constructed based on a comparison of various sociodemographic characteristics between men and women enrolled in the study of cardiovascular risk factors.

Characteristic     Men (n=160)   Women (n=140)   p
                       45             47         0.7256
Race                                             0.0354
  % White              32             38
  % Black              41             37
  % Hispanic           25             19
  % Other
% HS Graduate          78             64         0.0245
                       47             31         0.0001
% No Insurance                                   0.9876

Which, if any, of the characteristics shown above are significantly different between men and women? Justify.

Problem 8.

men and women?

Two sample test for equality of independent means.

What test was used to compare race between men and women?

Chi-square test of independence.

What test was used to compare educational level (% high school graduates) between men and women?

Two sample test for equality of independent proportions or chi-square test of independence.

Problem 9.

Two different scales are used in a particular laboratory. There is some concern that one scale gives different readings than the other. Ten specimens are randomly selected and weighed on each scale. The data are shown below.

Test if there is a significant difference in weights between the two scales at α = 0.05.


Specimen   Scale 1   Scale 2
1            1.2       2.1
2            3.5       3.6
3            1.8       1.9
4            4.0       4.0
5            5.0       4.9
6            1.9       2.0
7            2.7       2.7
8            2.2       2.3
9            2.8       2.9
10           3.5       3.7

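With diff = Scale 2 − Scale 1:

$$\bar{X}_d = \frac{\sum \text{diff}}{n} = \frac{1.5}{10} = 0.15$$

$$s_d = \sqrt{\frac{\sum \text{diff}^2 - \left(\sum \text{diff}\right)^2/n}{n-1}} = \sqrt{\frac{0.91 - (1.5)^2/10}{9}} = 0.276$$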

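H0: μd = 0

H1: μd ≠ 0

α = 0.05

$$t = \frac{\bar{X}_d}{s_d/\sqrt{n}}, \quad df = n - 1 = 9$$

Reject H0 if t < −2.262 or if t > 2.262

$$t = \frac{0.15}{0.276/\sqrt{10}} = 1.72$$

Do not reject H0 since 1.72 < 2.262. We do not have significant evidence at α = 0.05 to show that μd ≠ 0.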
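A paired t-test sketch for Problem 9, computing the differences as Scale 2 − Scale 1 (the direction implied by Σdiff = 1.5). The critical value 2.262 (df = 9, two-sided α = 0.05) is hardcoded from a t table, and the function name is hypothetical.

```python
import math

def paired_t(x1, x2):
    """Paired t statistic on differences d = x2 - x1; returns (mean_d, s_d, t, df)."""
    diffs = [b - a for a, b in zip(x1, x2)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    s_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    t = mean_d / (s_d / math.sqrt(n))
    return mean_d, s_d, t, n - 1

scale1 = [1.2, 3.5, 1.8, 4.0, 5.0, 1.9, 2.7, 2.2, 2.8, 3.5]
scale2 = [2.1, 3.6, 1.9, 4.0, 4.9, 2.0, 2.7, 2.3, 2.9, 3.7]
mean_d, s_d, t, df = paired_t(scale1, scale2)
critical = 2.262  # t table value for df = 9, two-sided alpha = 0.05
print(round(mean_d, 2), round(s_d, 3), round(t, 2), df, abs(t) > critical)  # 0.15 0.276 1.72 9 False
```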

Problem 10.

Patients with hypertension are generally recommended to follow a low-salt diet. Surveys report that approximately 75% of patients adhere to these diets. In a random sample of 100 patients with hypertension, 70% report following a low-salt diet. Are these patients significantly low in terms of adherence? Run the test at α = 0.05.


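H0: p = 0.75

H1: p < 0.75

α = 0.05

$$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$$

Reject H0 if Z < −1.645

$$Z = \frac{0.70 - 0.75}{\sqrt{\frac{0.75(1-0.75)}{100}}} = -1.15$$

Do not reject H0 since −1.15 > −1.645. We do not have significant evidence at α = 0.05 to show that p < 0.75.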
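A sketch of the one-sample z-test for a proportion, using the null standard error √(p0(1 − p0)/n). The lower-tail critical value −1.645 is hardcoded from the z table and the function name is hypothetical.

```python
import math

def one_proportion_z(p_hat, p0, n):
    """Z statistic for H0: p = p0 using the null standard error sqrt(p0(1 - p0)/n)."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

z = one_proportion_z(p_hat=0.70, p0=0.75, n=100)
critical = -1.645  # lower-tail critical value for alpha = 0.05 (from the z table)
print(round(z, 2), z < critical)  # -1.15 False
```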

Problem 11.

The following table was presented in a journal and describes the associations between demographic and clinical risk factors and systolic blood pressure.

Outcome = Systolic Blood Pressure
Risk Factors                    Regression Coefficient    p
Intercept                              105.3            0.0001
Age                                      1.2            0.0042
Male Sex                                 4.5            0.0956
Current Smoker                          −0.5            0.2354
Number of Hrs Exercise/Week             −2.4            0.0003


a) What type of analysis generated the results summarized above?

Multiple linear regression analysis, because the outcome (systolic blood pressure) is continuous.

b) Which of the risk factors are significantly associated with systolic blood pressure?

Age and number of hours of exercise per week are statistically significant at the 5% level (both have p values < 0.05). Male sex is marginally significant with a p value of 0.0956.


c) What is the relative importance of the risk factors?

The most important (statistically significant) risk factor is number of hours of exercise per week, followed by age and then male sex. Current smoking status is not statistically significant.

d) How would you interpret the regression coefficient associated with male sex? With number of hours of exercise per week?

Men's systolic blood pressure is, on average, 4.5 units higher than women's, holding age, smoking status and number of hours of exercise constant. Each additional hour of exercise per week is associated with a reduction of 2.4 units in systolic blood pressure, holding age, sex and current smoking status constant.
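A sketch that turns the Problem 11 coefficients into predicted systolic blood pressure (the function and the two patients are hypothetical). Comparing a man and a woman who share age, smoking status and exercise hours shows that the predicted difference is exactly the male sex coefficient, 4.5, which is what "holding the other variables constant" means.

```python
# Coefficients from the Problem 11 table (outcome: systolic blood pressure)
COEF = {
    "intercept": 105.3,
    "age": 1.2,
    "male": 4.5,
    "smoker": -0.5,
    "exercise_hrs": -2.4,
}

def predicted_sbp(age, male, smoker, exercise_hrs):
    """Predicted SBP; male and smoker are 0/1 indicator variables."""
    return (COEF["intercept"] + COEF["age"] * age + COEF["male"] * male
            + COEF["smoker"] * smoker + COEF["exercise_hrs"] * exercise_hrs)

# Hypothetical 50-year-old non-smokers exercising 3 hours per week
man = predicted_sbp(age=50, male=1, smoker=0, exercise_hrs=3)
woman = predicted_sbp(age=50, male=0, smoker=0, exercise_hrs=3)
print(round(man, 1), round(woman, 1), round(man - woman, 1))  # 162.6 158.1 4.5
```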

Problem 12.

The following table was presented in a journal and describes the associations between demographic and clinical risk factors and hypertension.

Outcome = Hypertension
Risk Factors                    Regression Coefficient    p
Intercept                                3.5            0.0001
Age                                      0.02           0.0357
Male Sex                                 0.27           0.0264
Current Smoker                          −0.005          0.7564
Number of Hrs Exercise/Week             −0.36           0.0111


a) What type of analysis generated the results summarized above?

Multiple logistic regression analysis, because the outcome (hypertension) is dichotomous.

b) Which of the risk factors are significantly associated with hypertension?

Age, male sex and number of hours of exercise are statistically significant at the 5% level (all have p values < 0.05).

c) What is the relative importance of the risk factors?

The most important (statistically significant) risk factor is number of hours of exercise per week, followed by male sex and then age. Current smoking status is not statistically significant.


d) Compute odds ratios for each of the risk factors (odds ratio = exp(regression coefficient), e.g. exp(0.27) = 1.31).

Outcome = Hypertension
Risk Factors                    Regression Coefficient    Odds Ratio
Age                                      0.02               1.02
Male Sex                                 0.27               1.31
Current Smoker                          −0.005              0.995
Number of Hrs Exercise/Week             −0.36               0.70

e) How would you interpret the odds ratio associated with male sex? With number of hours of exercise per week?

Men have 1.31 times the odds of hypertension compared with women, holding age, current smoking status and number of hours of exercise per week constant.

Each additional hour of exercise per week is associated with a 30% reduction in the odds of hypertension (OR = 0.70), holding age, sex and current smoking status constant.
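A sketch reproducing the odds ratio column and the percent change in odds used in the interpretation; it simply applies OR = exp(b) to the coefficients in the Problem 12 table, and the percent change is (OR − 1) × 100.

```python
import math

# Logistic regression coefficients from the Problem 12 table (outcome: hypertension)
coefficients = {
    "Age": 0.02,
    "Male Sex": 0.27,
    "Current Smoker": -0.005,
    "Number of Hrs Exercise/Week": -0.36,
}

for factor, b in coefficients.items():
    odds_ratio = math.exp(b)              # OR for a one-unit increase in the factor
    pct_change = (odds_ratio - 1) * 100   # percent change in the odds of hypertension
    print(f"{factor}: OR = {odds_ratio:.3f}, change in odds = {pct_change:+.1f}%")
```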

Problem 13.

A study is conducted to assess whether there is a difference in physicians' opinions regarding the treatment of early stage throat cancer. Specifically, physicians were asked if they would recommend radiation, surgery or neither upon initial diagnosis. Based on the data below, is there a relationship between treatment recommendations and physicians' age? Run the test at a 5% level of significance.

Age      Radiation   Surgery   Neither   Total
<40         35          15        50      100
40-59       29          30        41      100
60-79       40          43        22      105
Total      104          88       113      305


H0: Age and treatment recommendation are independent

H1: H0 is false

α = 0.05

$$\chi^2 = \sum \frac{(O-E)^2}{E}$$

df = (r − 1)(c − 1) = (3 − 1)(3 − 1) = 4

Reject H0 if χ² > 9.49

$$\chi^2 = \frac{(35-34.1)^2}{34.1} + \frac{(15-28.9)^2}{28.9} + \frac{(50-37)^2}{37} + \frac{(29-34.1)^2}{34.1} + \frac{(30-28.9)^2}{28.9} + \frac{(41-37)^2}{37} + \frac{(40-35.8)^2}{35.8} + \frac{(43-30.3)^2}{30.3} + \frac{(22-38.9)^2}{38.9} = 25.66$$

Observed (expected) counts:

Age      Radiation    Surgery     Neither     Total
<40      35 (34.1)    15 (28.9)   50 (37.0)    100
40-59    29 (34.1)    30 (28.9)   41 (37.0)    100
60-79    40 (35.8)    43 (30.3)   22 (38.9)    105
Total       104          88          113       305

Reject H0 because 25.66 > 9.49. We have significant evidence, α = 0.05, to show that age and treatment recommendation are not independent.

Problem 14.

For each of the following scenarios, indicate which test would be used. Use the letters below to indicate the test in the space provided. Note that the same test might be used for more than one scenario.


a) Compare mean to historical/external control

b) Compare proportion to historical/external control

c) Compare two independent means

d) Compare two matched/paired means

e) Analysis of variance

f) Chi-square goodness of fit test

g) Chi-square test of independence

h) Correlation analysis

i) Linear regression analysis

j) Logistic regression analysis

k) Survival analysis


Scenario

1. We want to test if there is a significant association between BMI (kg/m²) and incident myocardial infarction, adjusting for age, sex, systolic blood pressure and smoking. Test: j

2. We want to test if a new environmental intervention is effective in reducing exposure to second-hand smoke. Each participant in the study has levels of exposure measured before and after the intervention is implemented. Test: d

3. We wish to test if there is a significant association between GRE scores and first year GPA in MPH students who matriculated in fall 2011. Test: h or i

4. We want to determine if there are significant differences in ages of participants enrolled in a study, comparing those with a family history of cardiovascular disease to those without. Test: c

5. A study reports that 15% of college freshmen smoke. We want to test if significantly more BU freshmen smoke. Test: b

6. We want to test if there is a difference in preterm versus term deliveries among women of black, Hispanic and white race. Test: g

7. We want to test if nutritional supplements prolong life (minimize time to death) in persons over 65 years of age, adjusted for sex and other comorbid conditions. Test: k

8. A clinical trial is run to assess the safety of a new drug compared to a standard drug, and the outcome is development of skin rash or not. Test: g or j

9. We want to test if there is a difference in mean time to complete a physical task when comparing 12, 13, 14 and 15 year olds. Test: e

10. We want to test whether smoking in pregnancy increases the risk of infection in newborns. Test: g or k
