Uploaded by Sergio Boillos


Chapter 8

Model Assumptions

- Independence (the response variables $y_i$ are independent): this is a design issue.
- Normality (the response variables are normally distributed).
- Homoscedasticity (the response variables have the same variance).

The best way to check these assumptions is to check the corresponding assumptions on the random errors $\varepsilon_i$:

- They are independent.
- They are normally distributed.
- They have a constant variance $\sigma^2$ for all settings of the independent variables (homoscedasticity).
- They have a zero mean.

If these assumptions are satisfied, we may use the normal density as the working approximation for the random component, so the random errors are distributed as $\varepsilon_i \sim N(0, \sigma^2)$.

Plotting Residuals

Produce a scatterplot of the standardized residuals against the fitted values. Produce a scatterplot of the standardized residuals against each of the independent variables.

If assumptions are satisfied, residuals should vary randomly around zero and the spread of the residuals should be about the same throughout the plot (no systematic patterns.)

Warning signs in these plots:

- If the residuals increase or decrease in average magnitude with the fitted values, the variance of the residuals is not constant.
- If the points lie on a curve around zero rather than fluctuating randomly, the assumption that the random errors have zero mean is violated (the form of the model may be wrong).
- If a few points lie a long way from the rest of the points, they may be outliers.
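A minimal sketch of the quantities behind these plots, assuming NumPy and an ordinary least-squares fit in which the design matrix `X` already includes a column of ones. The helper names and data are hypothetical. Here the standardized residual is taken as $e_i / s$; Minitab's "standardized residuals" additionally adjust for leverage, so its values will differ somewhat.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit; X is assumed to already contain a column of ones."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return beta, fitted, y - fitted

def standardized_residuals(X, y):
    """Return fitted values and residuals divided by s (the root MSE)."""
    n, p = X.shape
    _, fitted, resid = fit_ols(X, y)
    s = np.sqrt(resid @ resid / (n - p))
    return fitted, resid / s

# Hypothetical data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 6.0])
X = np.column_stack([np.ones_like(x), x])
fitted, z = standardized_residuals(X, y)
for f, r in zip(fitted, z):
    print(f"fitted = {f:5.2f}   standardized residual = {r:6.2f}")
```

To draw the plot, `fitted` (or a column of `X`) and `z` can be passed to any scatterplot routine, for example matplotlib's `plt.scatter`, with a horizontal reference line at zero.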

Heteroscedasticity

Not fatal to an analysis; the analysis is weakened, not invalidated. Detected with scatterplots and rectified through transformation.

http://www.pfc.cfs.nrcan.gc.ca/profiles/wulder/mvstats/transform_e.html http://www.ruf.rice.edu/~lane/stat_sim/transformations/index.html

If the measurement scale is arbitrary, a transformation can be more effective. If the scale is meaningful, a transformation makes the results harder to interpret.
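A small illustration of rectifying heteroscedasticity through a transformation, under an assumed setting where the standard deviation grows in proportion to the mean (multiplicative error). In that setting a log transformation is a common variance-stabilizing choice; other patterns call for other transformations. The data are simulated and the helper name is hypothetical.

```python
import numpy as np

def group_sds(groups):
    """Sample standard deviation (ddof=1) of each group."""
    return np.array([np.std(g, ddof=1) for g in groups])

rng = np.random.default_rng(0)
# Simulated (hypothetical) data: spread proportional to the group mean
means = [10.0, 100.0, 1000.0]
groups = [m * np.exp(rng.normal(0.0, 0.1, size=200)) for m in means]

raw = group_sds(groups)
logged = group_sds([np.log(g) for g in groups])
print("raw SD ratio (max/min):", raw.max() / raw.min())
print("log SD ratio (max/min):", logged.max() / logged.min())
```

On the raw scale the largest group spread is roughly 100 times the smallest; after taking logs the spreads are nearly equal.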

Normality

The random errors can be regarded as a random sample from a $N(0, \sigma^2)$ distribution, so we can check this assumption by checking whether the residuals might have come from a normal distribution. We should look at the standardized residuals. Options for looking at the distribution:

http://statmaster.sdu.dk/courses/st111/module04/index.html

Does the distribution of the residuals approximate a normal distribution? Regression is robust with respect to nonnormal errors: inferences are typically still valid unless the errors come from a highly skewed distribution.

A normal probability plot is made by plotting the ordered residuals of the observed sample against the corresponding quantiles of a standard normal distribution $N(0, 1)$.

If the plot shows a straight line, it is reasonable to assume that the observed sample comes from a normal distribution. If the points deviate a lot from a straight line, there is evidence against the assumption that the random errors are an independent sample from a normal distribution.

http://www.skymark.com/resources/tools/normal_test_plot.asp
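One way to compute the points of a normal probability plot, sketched with Python's standard library: sort the residuals and pair them with standard normal quantiles. The plotting positions $(i - 3/8)/(n + 1/4)$ (Blom's formula) are an assumption; software packages use slightly different positions. The correlation between the two coordinates summarizes how straight the plot is, with values near 1 indicating a nearly straight line. Function names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean

def normal_plot_points(residuals):
    """Pair standard normal quantiles with the sorted residuals (Blom plotting positions)."""
    n = len(residuals)
    std_normal = NormalDist(0.0, 1.0)
    quantiles = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return quantiles, sorted(residuals)

def plot_correlation(residuals):
    """Pearson correlation of the plot coordinates; close to 1 means a straight line."""
    q, r = normal_plot_points(residuals)
    q_bar, r_bar = fmean(q), fmean(r)
    sxy = sum((a - q_bar) * (b - r_bar) for a, b in zip(q, r))
    sxx = sum((a - q_bar) ** 2 for a in q)
    syy = sum((b - r_bar) ** 2 for b in r)
    return sxy / sqrt(sxx * syy)

# Hypothetical residuals: roughly symmetric versus one extreme value
print(plot_correlation([-1.2, -0.7, -0.3, 0.0, 0.2, 0.5, 0.9, 1.4]))
print(plot_correlation([0.0] * 9 + [100.0]))
```

Plotting the pairs returned by `normal_plot_points` gives the probability plot itself; the correlation is only a one-number summary of it.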

For the error variance $\sigma^2$:

- If $\sigma^2 = 0$, all random errors $\varepsilon$ equal 0, and the prediction equation $\hat{y}$ will be equal to the mean value $E(y)$.
- If $\sigma^2$ is large, there are large (absolute) values of $\varepsilon$ and larger deviations between $\hat{y}$ and the mean value $E(y)$.

The larger the value of $\sigma^2$, the greater the error in estimating the model parameters and the error in predicting a value of $y$ for specific values of $x$.

The units of the estimated variance $s^2$ are squared units of the dependent variable $y$. For a more meaningful measure of variability, we use $s$, the root MSE. The interval $\pm 2s$ provides a rough approximation of the accuracy with which the model will predict future values of $y$ for given values of $x$.
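A sketch of computing $s$ (root MSE) and the rough $\pm 2s$ band, assuming NumPy and a design matrix with an intercept column; $p$ counts all $\beta$ parameters, so $s^2 = \text{SSE}/(n - p)$. The data and the helper name are hypothetical.

```python
import numpy as np

def root_mse(X, y):
    """Return s = sqrt(SSE / (n - p)) and the least-squares coefficients."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return np.sqrt(resid @ resid / (n - p)), beta

# Hypothetical data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 6.0])
X = np.column_stack([np.ones_like(x), x])
s, beta = root_mse(X, y)
y_hat = beta[0] + beta[1] * 3.5                # prediction at x = 3.5
print(f"s = {s:.3f}; rough band: {y_hat - 2 * s:.2f} to {y_hat + 2 * s:.2f}")
```

The $\pm 2s$ band ignores the uncertainty in the estimated coefficients, so it is somewhat narrower than a formal prediction interval.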

Following any modeling procedure, it is a good idea to assess the validity of your model. Residuals and diagnostic statistics allow you to identify observations that are poorly fit by the model, that have a strong influence on the estimated parameters, or that have high leverage. It is helpful to interpret these diagnostics jointly to understand any potential problems with the model.

Outliers

What:

An observation with a residual larger than 3s in absolute value, or a standardized residual larger than 3 in absolute value.

Why:

- A data entry or recording error
- Skewness of the distribution
- Chance
- Unassignable causes

Then what?

- Eliminate? Correct? Analyze them?
- How much influence do they have? How do I know?
- Minitab Storage: Diagnostics (Leverages, Cook's Distance, DFFITS)

Leverages- p. 403

Identifies observations with unusual or outlying x-values. Leverages fall between 0 and 1. A value greater than 2(p/n), where p is the number of β parameters in the model (including the intercept) and n is the sample size, is large enough to suggest that you should examine the corresponding observation.

Minitab identifies observations with leverage over 3(p/n) with an X in the table of unusual observations.
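A sketch of computing leverages with NumPy: the leverages are the diagonal of the hat matrix $H = X(X'X)^{-1}X'$, obtained here from a QR decomposition (assuming `X` has full column rank and includes an intercept column). Helper names and data are hypothetical.

```python
import numpy as np

def leverages(X):
    """Diagonal of the hat matrix H = X (X'X)^{-1} X'."""
    Q, _ = np.linalg.qr(X)          # Q spans the column space of X, so H = Q Q'
    return np.sum(Q ** 2, axis=1)

def high_leverage(X, factor=2.0):
    """Indices with leverage above factor * p / n (factor=2 is the rule of thumb above)."""
    n, p = X.shape
    h = leverages(X)
    return h, np.flatnonzero(h > factor * p / n)

# Hypothetical data: the last x-value is far from the others
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
X = np.column_stack([np.ones_like(x), x])
h, idx = high_leverage(X)
print(np.round(h, 3), idx)
```

The leverages always sum to p, and passing `factor=3.0` corresponds to the 3(p/n) cutoff quoted for Minitab.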

Cook's Distance

An overall measure of the combined impact of each observation on the fitted values. It is calculated from the leverage values and the standardized residuals, so it considers whether an observation is unusual with respect to both its x- and y-values. Observations with large D values may be outliers.

Compare D to an F-distribution with (p, n - p) degrees of freedom and determine the corresponding percentile:

- Less than 20%: little influence
- Greater than 50%: major influence
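A sketch of Cook's distance with NumPy, using the standard identity $D_i = \frac{r_i^2}{p} \cdot \frac{h_{ii}}{1 - h_{ii}}$, where $r_i = e_i / (s\sqrt{1 - h_{ii}})$ is the leverage-adjusted standardized residual. This is equivalent to measuring how much all fitted values move when case $i$ is deleted. The helper name and data are hypothetical.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's D for each observation of a least-squares fit of y on X."""
    n, p = X.shape
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q ** 2, axis=1)                      # leverages h_ii
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                                # ordinary residuals
    s2 = e @ e / (n - p)                            # MSE
    r = e / np.sqrt(s2 * (1 - h))                   # leverage-adjusted standardized residuals
    return r ** 2 / p * h / (1 - h)

# Hypothetical data: one remote point whose y does not follow the others' trend
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
y = np.array([1.2, 1.9, 3.1, 4.2, 4.8, 12.0])
X = np.column_stack([np.ones_like(x), x])
print(np.round(cooks_distance(X, y), 3))
```

To turn each D into the percentile described above, one option (assuming SciPy is available) is `scipy.stats.f.cdf(D, p, n - p)`.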

DFFITS- p. 408

DFFITS measures the difference between the predicted value for the ith observation when all observations are included and when the ith observation is deleted, scaled by its standard error. It combines the leverage and the Studentized deleted residual (deleted t residual) into one overall measure of how unusual an observation is. DFFITS values represent roughly the number of standard deviations that the fitted value changes when each case is removed from the data set. Observations with absolute DFFITS values greater than 2 times the square root of p/n are considered large and should be examined.
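A sketch of DFFITS with NumPy, assuming the usual formula $\text{DFFITS}_i = t_i\sqrt{h_{ii}/(1 - h_{ii})}$, where $t_i$ is the deleted t residual. The deleted variance $s_{(i)}^2$ comes from the identity $\text{SSE}_{(i)} = \text{SSE} - e_i^2/(1 - h_{ii})$, so the model is not refit n times. The helper name and data are hypothetical.

```python
import numpy as np

def dffits(X, y):
    """DFFITS for each observation of a least-squares fit of y on X."""
    n, p = X.shape
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q ** 2, axis=1)                      # leverages h_ii
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    # residual variance with case i deleted, without refitting the model
    s2_del = (e @ e - e ** 2 / (1 - h)) / (n - p - 1)
    t = e / np.sqrt(s2_del * (1 - h))               # deleted t residuals
    return t * np.sqrt(h / (1 - h))

# Hypothetical data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
y = np.array([1.2, 1.9, 3.1, 4.2, 4.8, 12.0])
X = np.column_stack([np.ones_like(x), x])
d = dffits(X, y)
n, p = X.shape
print(np.round(d, 3), np.flatnonzero(np.abs(d) > 2 * np.sqrt(p / n)))
```

The second printed array lists the observations exceeding the $2\sqrt{p/n}$ rule of thumb.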

Start with the plot and brush outliers. Look for values that stand out in the diagnostic measures. Rules of thumb:

- Leverages (HI): values greater than 2(p/n)
- Cook's Distance: values greater than the 50th percentile of the comparable F(p, n - p) distribution
- DFFITS: absolute values greater than $2\sqrt{p/n}$

Let's do an example.

Durbin-Watson Statistic

Range: $0 \le d \le 4$

- Uncorrelated residuals: d is close to 2
- Positively correlated residuals: d is closer to 0
- Negatively correlated residuals: d is closer to 4
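A sketch of the Durbin-Watson statistic, $d = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$, assuming the residuals are supplied in time (or collection) order. The helper name and residual vectors are hypothetical.

```python
import numpy as np

def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

print(durbin_watson([1.0] * 10))          # identical successive residuals: d = 0
print(durbin_watson([1.0, -1.0] * 50))    # alternating signs: d close to 4
```

If statsmodels is already in use, `statsmodels.stats.stattools.durbin_watson` computes the same quantity.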
