2) Sample question: The investigator wanted to know whether the improvement in exercise duration is related to the patient's disease history.
3) In a scatterplot of the 2 variables, the independent variable goes on the horizontal axis and the dependent variable on the vertical axis.
4) The population intercept and slope ($\beta_0$ and $\beta_1$) would be the parameters if you had the entire population to study; instead, we only have the estimates ($b_0$ and $b_1$) that we can compute from the sample.
5) Things to look for:
a. Correlation coefficient: SPSS reports R, which tells how well the line fits the data; the standardized coefficient Beta tells whether the correlation is positive (+) or negative (-).
b. COEFFICIENT OF DETERMINATION: when asked to give it, report R-square as a decimal; when asked to interpret it, express it as a percentage (e.g., interpretation: 58.7% of the variability in the (dep. variable) is accounted for by this model).
Give a PREDICTION, e.g., "We predict this one person's (dep. variable) to be ____ after (indep. variable)." Find the prediction value by plugging a value for X and the coefficients ($b_0$ and $b_1$) from the output into the least-squares equation $\hat{y} = b_0 + b_1 x$.
6) Give an ESTIMATION: for people who did/had (indep. variable), we estimate their MEAN (dep. variable) to be ____.
a. The estimation value is found the same way as the prediction value.
7) INTERPRET THE Y-INTERCEPT: the y-intercept is the predicted value when X (indep. variable) is ZERO. Explain whether or not it is appropriate to interpret it: it IS appropriate only WHEN X CAN actually be ZERO. When X is zero, the predicted Y is $b_0$. Examples of interpreting the y-intercept:
a. If a person did not have (indep. variable) (indep. variable = 0), we predict this person's (dep. variable) to be $b_0$.
b. For people who did not have (indep. variable) (indep. variable = 0), we estimate their MEAN (dep. variable) to be $b_0$.
8) INTERPRET THE SLOPE, e.g.: as the (indep. variable) increases by 1, the predicted (dep. variable) increases/decreases by $|b_1|$; AND as the (indep. variable) increases by 1, the estimated average (dep. variable) increases/decreases by $|b_1|$.
9) RESIDUAL ANALYSIS: check the assumptions needed to perform SLR inference:
1. The random errors are independent.
2. The random errors are normally distributed (skewness/kurtosis tests).
3. The random errors have constant variance.
a. If close to 1 = normal population.
b. Use for kurtosis and skewness testing.
c. The average error = 0.
10) QUESTION OF INTEREST: is there a significant linear relationship between y and x?
11) STATE THE HYPOTHESES: Ho: $\beta_1 = 0$ (model not useful; there is no linear relationship between y and x). Ha: $\beta_1 \neq 0$ (model IS useful; a linear relationship exists between y and x).
12) TEST STATISTIC: F* or t* (where $(t^*)^2 = F^*$), then the P-value and conclusion as always (significant or not).
13) CONFIDENCE INTERVAL FOR THE SLOPE: "We're 95% confident that as X increases by 1, Y increases/decreases by between ____ and ____" (values from the coefficients table, where t* is also found). WATCH out for NEGATIVES: if so, put the smaller value FIRST in the interval.
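The least-squares steps above (slope, intercept, prediction, residuals, t*, F*, and the slope interval) can be sketched in plain Python. The data are made up, and `T_CRIT` is an assumed input read from a t table (df = n - 2 = 3, 95% two-sided) rather than computed, so treat this as an illustration of the arithmetic, not a replacement for the SPSS output.

```python
import math

# Made-up example data (not from the notes).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx            # slope
b0 = y_bar - b1 * x_bar   # y-intercept: predicted y when x = 0

def predict(x0):
    """Least-squares equation y-hat = b0 + b1*x0 (same value for a prediction or a mean estimate)."""
    return b0 + b1 * x0

# Residual analysis: least-squares residuals (with an intercept) always average to 0.
residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
mean_residual = sum(residuals) / n

# Test of Ho: beta1 = 0.
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))  # residual standard error
se_b1 = s / math.sqrt(sxx)
t_star = b1 / se_b1
f_star = t_star ** 2      # in SLR, F* = (t*)^2

# 95% CI for the slope; T_CRIT is read from a t table (df = 3), an assumed input.
T_CRIT = 3.182
ci_slope = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)  # lower bound listed first

print(f"y-hat = {b0:.2f} + {b1:.2f}x; prediction at x = 2: {predict(2):.2f}")
print(f"mean residual = {mean_residual:.2e}")
print(f"t* = {t_star:.3f}, F* = {f_star:.3f}")
print(f"95% CI for slope: ({ci_slope[0]:.3f}, {ci_slope[1]:.3f})")
```

With these made-up numbers the slope interval contains 0, which agrees with t* being smaller than `T_CRIT`: the linear relationship is not significant at the .05 level.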
CONFIDENCE INTERVALS AND PREDICTION INTERVALS FOR THE OUTCOME VARIABLE (FOR A SPECIFIC VALUE OF THE INDEPENDENT VARIABLE):
X | Y
1 | 40
1 | 90
3 | 30
. . . | . . .
**Suppose the researchers would like to predict Y for an individual who has had ? for X years.
Single best prediction: ____. Prediction interval interpretation: For an individual who has had this ? for X years, we predict with 95% confidence that this person's Y will be between ____ and ____.
**Suppose the researchers would like to estimate the average percent improvement for all people having ? for X years.
Single best (mean) estimate: ____. Confidence interval interpretation: We are 95% confident that the mean percent improvement for all people having this disease for 2 years is between ____ and ____.
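Both intervals are centered on the same single best value; the prediction interval for one individual is wider because of the extra "1 +" under the square root. A minimal sketch with made-up data, assuming the usual simple-linear-regression standard-error formulas and a t critical value (`T_CRIT`) read from a table rather than computed:

```python
import math

# Made-up example data (not from the notes); T_CRIT is read from a t table (95%, df = n - 2 = 3).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_CRIT = 3.182

def intervals(x, y, x0, t_crit):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    s = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))

    y_hat = b0 + b1 * x0  # single best prediction AND single best mean estimate
    se_mean = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)      # mean of all people at x0
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)  # one individual at x0
    ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
    pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
    return y_hat, ci, pi

y_hat, ci, pi = intervals(x, y, x0=2, t_crit=T_CRIT)
print(f"Best prediction/estimate at x = 2: {y_hat:.2f}")
print(f"95% CI for the mean: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI for one person: ({pi[0]:.2f}, {pi[1]:.2f})")
```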
MULTIPLE LINEAR REGRESSION (MLR): used to look for LINEAR relationships between 1 dependent variable (outcome) and MULTIPLE independent variables (explanatory).
1) CORRELATION COEFFICIENTS: to determine the correlation coefficient between the OUTCOME variable and EACH INDEPENDENT variable, find the OUTCOME variable in the output and read all of its Pearson Correlation (denoted r) values (largest |r| = strongest correlation, smallest |r| = weakest correlation), then make a statement about them...
LOGISTIC REGRESSION: used when the dependent (outcome) variable is dichotomous (live or die, pass or fail, etc.).
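Age Group | n | CHD Absent | CHD Present | Proportion of 1s | Odds
50-54 | 8 | 3 | 5 | 0.63 | 1.703
55-59 | 17 | 4 | 13 | 0.76 | 3.304
60-69 | 10 | 2 | 8 | 0.80 | 4.000
How to interpret this? See the ODDS column! For someone who is 60-69 years old, the odds of having CHD are 4 to 1: in that group CHD is 4 times as likely to be present as absent (8 present vs. 2 absent).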
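Odds are "present divided by absent", which is the same as p / (1 - p) where p is the proportion of 1s. A minimal sketch using the 60-69 row of the table (8 with CHD, 2 without); the function names are illustrative only.

```python
def odds_from_counts(present, absent):
    """Odds of the event = (# with the event) / (# without the event)."""
    return present / absent

def odds_from_proportion(p):
    """The same odds from the proportion of 1s: p / (1 - p)."""
    return p / (1 - p)

# 60-69 row of the table: 8 with CHD, 2 without.
present, absent = 8, 2
p = present / (present + absent)
print(f"Proportion of 1s: {p:.2f}")
print(f"Odds: {odds_from_counts(present, absent):.3f}")
print(f"Odds via p/(1-p): {odds_from_proportion(p):.3f}")
```

Both routes give 4.000: the odds compare "has CHD" to "does not have CHD" within the same group, which is different from comparing one age group to another (that comparison is an odds ratio).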
1) HYPOTHESIS TESTING: Ho: the indep. variable is NOT a predictor of the dep. variable in this model; Ha: the indep. variable IS a predictor of the dep. variable in this model.
2) TEST STATISTIC: the Wald statistic for the variable; P-VALUE and CONCLUSION same as always.
3) MODEL FOR THE LOGIT: $Z = b_0 + b_1 X$, where Z is the predicted natural log of the odds.
4) $b_0$ = B of the Constant, $b_1$ = B of the variable, and Exp(B) is the ODDS RATIO.
5) MODEL FOR THE PROBABILITY OF THE DEP. VARIABLE: $P = \frac{e^Z}{1 + e^Z}$ (e is the base of the natural log; use the e^x key on the calculator).
6) PREDICTION: FIRST, using the MODEL FOR THE LOGIT, plug in the indep. variable for X and solve for Z; SECOND, using the MODEL FOR THE PROBABILITY, plug in the value of Z you just found. Use this PREDICTION with your CUTOFF to classify individuals into categories.
7) CLASSIFICATION: According to this model, if a probability of .50 or lower classifies someone as CHD = 0, and otherwise as CHD = 1, how would a person 48 years old be classified? CHD = 1.
(PARTIAL) SPSS SPREADSHEET WITH PREDICTED VALUES: pre_1 = the predicted probability that the subject will have CHD; pgr_1 = the predicted group for the subject (1 = CHD, 0 = no CHD).
8) ODDS RATIO (see green highlight above in SPSS output): you can say: a 1-year increase in age multiplies the odds of CHD by 1.118.
9) FOR ALL MODELS! The 95% confidence interval for the odds ratio is found in the LOWER and UPPER columns on the RIGHT of the SPSS output.
This means that we are 95% confident that the odds ratio is between LOWER and UPPER. NOTE: if a 95% confidence interval for the odds ratio contains 1, there is not a statistically significant difference in the odds for the two groups at the .05 level. WATCH THE INTERVAL: IF 1.00 IS INSIDE IT, THE RESULT IS NOT STATISTICALLY SIGNIFICANT!
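The two-step prediction (logit, then probability), the .50 cutoff, and the "does the interval contain 1?" check can be sketched together. Only the odds ratio 1.118 comes from these notes (so $b_1 = \ln 1.118$); `B0` and `SE_B1` are HYPOTHETICAL placeholders, and the interval assumes the Wald-type form exp(B ± 1.96·SE). Read the real values from the SPSS output.

```python
import math

# b1 comes from the notes' odds ratio for age, Exp(B) = 1.118, so b1 = ln(1.118).
# B0 and SE_B1 are HYPOTHETICAL placeholders; take the real values from the SPSS output.
B0 = -5.0
B1 = math.log(1.118)
SE_B1 = 0.04
Z_95 = 1.96  # two-sided 95% normal critical value

def logit(x):
    """Model for the logit: Z = b0 + b1*X (predicted natural log of the odds)."""
    return B0 + B1 * x

def probability(z):
    """Model for the probability: P = e^Z / (1 + e^Z)."""
    return math.exp(z) / (1 + math.exp(z))

def classify(p, cutoff=0.50):
    """A probability of .50 or lower -> CHD = 0, otherwise CHD = 1."""
    return 0 if p <= cutoff else 1

def odds_ratio_ci(b, se, z=Z_95):
    """Assumed Wald-type 95% interval for Exp(B): exp(b - z*se) to exp(b + z*se)."""
    return math.exp(b - z * se), math.exp(b + z * se)

def ci_is_significant(ci):
    """Significant at .05 only if the interval does NOT contain 1."""
    lower, upper = ci
    return not (lower <= 1 <= upper)

age = 48
z = logit(age)
p = probability(z)
ci = odds_ratio_ci(B1, SE_B1)
print(f"Z = {z:.3f}, P(CHD) = {p:.3f}, predicted group = {classify(p)}")
print(f"Odds ratio = {math.exp(B1):.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print("Significant:", ci_is_significant(ci))
```

Because the constant here is made up, the printed probability for a 48-year-old is only illustrative; the point is the order of operations: Z first, then P, then compare P to the cutoff.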
10) SPECIFICITY (top row; probability of a negative result when the condition is actually absent) and SENSITIVITY (bottom row; probability of a positive result when the condition is present) are found in the CLASSIFICATION table under PERCENT CORRECT.
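The PERCENT CORRECT values are just row percentages of the classification table. A minimal sketch with HYPOTHETICAL counts, laid out with observed 0 on the top row and observed 1 on the bottom row:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Row percentages of a 2x2 classification table."""
    sensitivity = tp / (tp + fn)  # bottom row (observed 1): share predicted 1
    specificity = tn / (tn + fp)  # top row (observed 0): share predicted 0
    return sensitivity, specificity

# HYPOTHETICAL classification table counts:
#                      predicted 0   predicted 1
# observed 0 (top)        tn = 40       fp = 10
# observed 1 (bottom)     fn = 15       tp = 35
sens, spec = sensitivity_specificity(tp=35, fn=15, tn=40, fp=10)
print(f"Sensitivity = {sens:.1%}, Specificity = {spec:.1%}")
```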
TWO-WAY ANOVA (NO SIGNIFICANT INTERACTION): used when evaluating 2 factors affecting 1 outcome (2 independent variables, 1 dependent variable).
**If the lines in the graphs titled ESTIMATED MARGINAL MEANS OF CHANGE are PARALLEL to each other, there is NO INTERACTION.
1) CHECK ASSUMPTIONS FOR TWO-WAY ANOVA: 1. all groups come from normal populations, 2. groups are independent, 3. no extreme outliers, 4. homogeneity of variances (Levene's test p-value MORE than 0.05).
2) FIRST, TEST THE OVERALL MODEL: Ho: there is no main effect due to factor 1, no main effect due to factor 2, and no significant interaction between factor 1 and factor 2 on the mean dependent variable. Ha: at least one of these 3 effects affects the mean dependent variable.
3) TEST STATISTIC: F* on the CORRECTED MODEL line, then the P-VALUE and CONCLUSION.
4) SECOND, IF Ha is TRUE, TEST THE INTERACTION TERM: Ho: factor 1 and factor 2 do not interact to affect the mean dep. variable; Ha: F1 and F2 do interact to affect the mean dep. variable. The TEST STATISTIC is F* (of F1*F2), then the p-value and conclusion. (If the lines in the graphs were parallel, you could have anticipated that the INTERACTION would not be significant.)
5) THIRD, since the interaction term is NOT significant, TEST FOR MAIN EFFECTS:
a. 1st main effect (drug): Ho: there is NO main effect due to factor 1 on the mean dep. variable; Ha: there is a main effect due to factor 1 on the mean dep. variable. See the TEST STATISTIC (F*), P-VALUE, and CONCLUSION.
b. IF the 1st main effect is significant (Ha is true), then how much difference? We are 95% confident that the mean change in hemoglobin levels for active-agent drug users (mentioned FIRST) is somewhere between Lower Bound and Upper Bound more/less (POS OR NEG?) than the mean change in hemoglobin levels for non-active-agent drug users (mentioned SECOND).
c. 2nd main effect (type of CA): set up Ho and Ha just as for the 1st main effect, but with type of CA instead of drug. The TEST STATISTIC is F*, then the P-VALUE and CONCLUSION.
d. IF the 2nd main effect is significant (Ha true), then how much difference? Answer each question Yes or No and give the p-value used to make the decision. If the means are different, state which group has the higher mean.
Is the mean change in hemoglobin levels significantly different for (WATCH WHICH GROUP IS MENTIONED FIRST, THEN LOOK IN THE CHART IN THAT ORDER!):
A. the cervical cancer and the prostate cancer patients?
B. the cervical cancer and the colorectal cancer patients?
C. the prostate cancer and the colorectal cancer patients?
We are 95% confident that the mean change in hemoglobin levels for cervical cancer patients (1st) is somewhere between LOWER BOUND and UPPER BOUND more/less (+ or -) than the mean change in hemoglobin levels for prostate cancer patients (2nd) (fill in the blanks with positive numbers; when the bounds are negative, watch to put the SMALLEST one FIRST). Repeat for all other comparisons. WATCH OUT FOR WHICH FACTOR IS SAID FIRST SO YOU CHOOSE FROM THE OUTPUT CORRECTLY!!
So in this case the model is CHANGE = DRUG + TYPE + DRUG*TYPE.
Significant difference? _____ If yes, which has the higher mean? __________
**WATCH OUT FOR WHICH FACTOR IS BEING ASKED FIRST TO SELECT PROPERLY FROM THE SPSS OUTPUT, and watch how you write the negative numbers!!!!
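A quick numeric version of the parallel-lines check for two-way ANOVA: the two drug lines in the estimated marginal means plot are parallel when the vertical gap between them is the same at every cancer type. The cell means are HYPOTHETICAL and the tolerance is an arbitrary choice for illustration; the actual decision still comes from the F* and p-value for DRUG*TYPE.

```python
# HYPOTHETICAL cell means of CHANGE (hemoglobin change) for each DRUG x TYPE combination.
cell_means = {
    "active":     {"cervical": 1.8, "prostate": 1.2, "colorectal": 0.9},
    "non-active": {"cervical": 0.8, "prostate": 0.2, "colorectal": -0.1},
}

def line_gaps(means, level_a, level_b):
    """Vertical gap between the two drug lines at each cancer type."""
    return {t: means[level_a][t] - means[level_b][t] for t in means[level_a]}

def looks_parallel(gaps, tol=0.1):
    """Roughly equal gaps everywhere = roughly parallel lines = no interaction pattern.

    tol is an arbitrary illustrative threshold, not a statistical test.
    """
    values = list(gaps.values())
    return max(values) - min(values) <= tol

gaps = line_gaps(cell_means, "active", "non-active")
print({t: round(g, 2) for t, g in gaps.items()})
print("Roughly parallel (no interaction pattern):", looks_parallel(gaps))
```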