AP Statistics Tutorial: Transformations to Achieve Linearity


When a residual plot reveals a data set to be nonlinear, it is often possible to
"transform" the raw data to make it more linear. This allows us to use linear
regression techniques more effectively with nonlinear data.

What is a Transformation to Achieve Linearity?

Transforming a variable involves using a mathematical operation to change its measurement scale.
Broadly speaking, there are two kinds of transformations.

Linear transformation. A linear transformation preserves linear relationships between variables.
Therefore, the strength of the correlation between x and y is unchanged after a linear transformation
(multiplying by a negative constant reverses only its sign). Examples of a linear transformation to
variable x would be multiplying x by a constant, dividing x by a constant, or adding a constant to x.

Nonlinear transformation. A nonlinear transformation changes (increases or decreases) linear
relationships between variables and, thus, changes the correlation between variables. Examples
of a nonlinear transformation of variable x would be taking the square root of x or the reciprocal
of x.
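
The difference between the two kinds of transformation can be checked numerically. The following is a minimal sketch, assuming made-up data in which y is exactly x squared; the helper name `pearson_r` and the constants 3 and 10 are illustrative choices, not part of the lesson.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data (an assumption, not from the lesson): y = x squared.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 2 for v in x]

r_raw = pearson_r(x, y)
# Linear transformation of x: multiply by a positive constant, then add a constant.
r_linear = pearson_r([3 * v + 10 for v in x], y)
# Nonlinear transformation of y: take the square root.
r_sqrt = pearson_r(x, [math.sqrt(v) for v in y])

print(round(r_raw, 4), round(r_linear, 4), round(r_sqrt, 4))
```

The first two values agree, because the linear transformation leaves the correlation alone. The third is larger, because the square root happens to straighten this particular curve.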

In regression, a transformation to achieve linearity is a special kind of nonlinear transformation. It is a
nonlinear transformation that increases the linear relationship between two variables.

Methods of Transforming Variables to Achieve Linearity

There are many ways to transform variables to achieve linearity for regression analysis. Some common
methods are summarized below.

Method                      Transformation(s)              Regression equation      Predicted value (y')
Standard linear regression  None                           y = b0 + b1x             y' = b0 + b1x
Exponential model           Dependent variable = log(y)    log(y) = b0 + b1x        y' = 10^(b0 + b1x)
Quadratic model             Dependent variable = sqrt(y)   sqrt(y) = b0 + b1x       y' = (b0 + b1x)^2
Reciprocal model            Dependent variable = 1/y       1/y = b0 + b1x           y' = 1 / (b0 + b1x)
Logarithmic model           Independent variable = log(x)  y = b0 + b1log(x)        y' = b0 + b1log(x)
Power model                 Dependent variable = log(y)    log(y) = b0 + b1log(x)   y' = 10^(b0 + b1log(x))
                            Independent variable = log(x)

Each row shows a different nonlinear transformation method (the first row, standard linear regression,
is included for comparison). The second column shows the specific transformation applied to dependent
and/or independent variables. The third column shows the regression equation used in the analysis. And
the last column shows the "back transformation" equation used to restore the dependent variable to its
original, non-transformed measurement scale. Here, log denotes the base-10 logarithm, which is why the
back transformations use powers of 10.
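
The table can also be written down as a small lookup of transformations and back transformations. This is only a sketch: the names `MODELS` and `predict` are hypothetical, and log is taken to be base 10 because the back transformations in the table use powers of 10.

```python
import math

# Each entry: (transform for x, transform for y, back transformation of the fitted value).
MODELS = {
    "linear":      (lambda x: x,             lambda y: y,             lambda f: f),
    "exponential": (lambda x: x,             lambda y: math.log10(y), lambda f: 10 ** f),
    "quadratic":   (lambda x: x,             lambda y: math.sqrt(y),  lambda f: f ** 2),
    "reciprocal":  (lambda x: x,             lambda y: 1 / y,         lambda f: 1 / f),
    "logarithmic": (lambda x: math.log10(x), lambda y: y,             lambda f: f),
    "power":       (lambda x: math.log10(x), lambda y: math.log10(y), lambda f: 10 ** f),
}

def predict(model, b0, b1, x):
    """Predicted y in original units, given b0 and b1 fitted on the transformed scale."""
    fx, _, back = MODELS[model]
    return back(b0 + b1 * fx(x))

# Example with made-up coefficients: quadratic model, b0 = 1, b1 = 2, x = 3.
print(predict("quadratic", 1.0, 2.0, 3.0))  # (1 + 2*3)^2 = 49.0
```

Keeping the back transformation next to the forward one makes it harder to report predictions on the wrong scale.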

In practice, these methods need to be tested on the data to which they are applied to be sure that they
increase rather than decrease the linearity of the relationship. Testing the effect of a transformation
method involves looking at residual plots and correlation coefficients, as described in the following
sections.

Note: The exponential model, the logarithmic model, and the power model require the ability to work with
logarithms. Use a graphing calculator to obtain the log of a number or to transform back from the
logarithm to the original number. If you need it, the Stat Trek glossary has a brief refresher on logarithms.

How to Perform a Transformation to Achieve Linearity

Transforming a data set to enhance linearity is a multi-step, trial-and-error process.

1. Conduct a standard regression analysis on the raw data.
2. Construct a residual plot.
   If the plot pattern is random, do not transform data.
   If the plot pattern is not random, continue.
3. Compute the coefficient of determination (R^2).
4. Choose a transformation method (see above table).
5. Transform the independent variable, dependent variable, or both.
6. Conduct a regression analysis, using the transformed variables.
7. Compute the coefficient of determination (R^2), based on the transformed variables.
   If the transformed R^2 is greater than the raw-score R^2, the transformation was successful.
   Congratulations!
   If not, try a different transformation method.
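
The R^2 comparison in this procedure can be sketched in Python. This is a rough illustration rather than the lesson's own method: the helper names `r_squared` and `compare_transformations` and the sample data are assumptions, only three of the models that transform y are tried, and the residual-plot check still has to be done by eye.

```python
import math

def r_squared(xs, ys):
    """Coefficient of determination for a simple least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Candidate transformations of y (x is left untransformed in all three).
CANDIDATES = {
    "exponential": math.log10,
    "quadratic": math.sqrt,
    "reciprocal": lambda y: 1 / y,
}

def compare_transformations(xs, ys):
    """Return R^2 for the raw data and for each candidate transformation of y."""
    results = {"raw": r_squared(xs, ys)}
    for name, transform in CANDIDATES.items():
        results[name] = r_squared(xs, [transform(y) for y in ys])
    return results

# Illustrative data (an assumption, not from the lesson): y = x squared.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

results = compare_transformations(xs, ys)
best = max(results, key=results.get)
for name, r2 in results.items():
    print(f"{name:12s} R^2 = {r2:.3f}")
print("best:", best)
```

For this made-up data the quadratic model wins, as expected, since the square root of y is exactly x. With real data the ranking has to be discovered by trying each method.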

The best transformation method (exponential model, quadratic model, reciprocal model, etc.) will depend
on the nature of the original data. The only way to determine which method is best is to try each and
compare the results (i.e., residual plots, correlation coefficients).

A Transformation Example

The table below shows data for the independent and dependent variables, x and y, respectively.
When we apply a linear regression to the untransformed raw data, the residual plot shows a
non-random pattern (a U-shaped curve), which suggests that the data are nonlinear.

x 1 2 3 4 5 6 7 8 9

y 2 1 6 14 15 30 40 74 75
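
The U-shaped residual pattern can be seen without a plot by listing the residuals from a straight-line fit to these data. This is a minimal sketch; the helper name `fit_line` is an assumption, and ordinary least squares is used as the standard regression.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [2, 1, 6, 14, 15, 30, 40, 74, 75]

b0, b1 = fit_line(x, y)
# Residual = observed y minus the value predicted by the fitted line.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
for xi, r in zip(x, residuals):
    print(xi, round(r, 2))
```

The residuals are positive at both ends of the x range and negative in the middle, which is the U shape described above.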

Suppose we repeat the analysis, using a quadratic model to transform the dependent variable. For a
quadratic model, we use the square root of y, rather than y, as the dependent variable. Using the
transformed data, our regression equation is:

y't = b0 + b1x

where

yt = transformed dependent variable, which is equal to the square root of y
y't = predicted value of the transformed dependent variable yt
x = independent variable
b0 = y-intercept of transformation regression line
b1 = slope of transformation regression line

The table below shows the transformed data we analyzed.


x 1 2 3 4 5 6 7 8 9

yt 1.41 1.00 2.45 3.74 3.87 5.48 6.32 8.60 8.66
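
The transformed values can be recomputed directly from the raw y values. A minimal sketch using only Python's standard library, rounding to two decimal places as the table does:

```python
import math

y = [2, 1, 6, 14, 15, 30, 40, 74, 75]

# Quadratic model: the transformed dependent variable is the square root of y.
yt = [round(math.sqrt(v), 2) for v in y]
print(yt)
```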


Since the transformation was based on the quadratic model (yt = the square root of y), the
transformation regression equation can be expressed in terms of the original units of variable y as:

y' = (b0 + b1x)^2

where

y' = predicted value of y in its original units
x = independent variable
b0 = y-intercept of transformation regression line
b1 = slope of transformation regression line

The residual plot for the transformed model shows residuals based on predicted raw scores from the
transformation regression equation. The plot suggests that the transformation to achieve linearity was
successful. The pattern of residuals is random, suggesting that the relationship between the independent
variable (x) and the transformed dependent variable (square root of y) is linear. And the coefficient of
determination was 0.96 with the transformed data versus only 0.88 with the raw data. The transformed
data resulted in a better model.
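
The two coefficients of determination can be reproduced from the data in this example. The sketch below assumes an ordinary least-squares fit; the helper name `fit_line` is hypothetical, and R^2 for the transformed model is computed on the transformed (square root) scale.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept, slope, and R^2 for ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1, sxy ** 2 / (sxx * syy)

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [2, 1, 6, 14, 15, 30, 40, 74, 75]
yt = [math.sqrt(v) for v in y]  # quadratic model: transform y by its square root

_, _, r2_raw = fit_line(x, y)
b0, b1, r2_transformed = fit_line(x, yt)

# Back transformation: square the fitted value to return to the original units of y.
y_pred = [(b0 + b1 * xi) ** 2 for xi in x]

print(round(r2_raw, 2), round(r2_transformed, 2))  # prints 0.88 0.96
```

The list `y_pred` holds the predicted raw scores, so `y[i] - y_pred[i]` gives the residuals on the original scale.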

Test Your Understanding

Problem

In the context of regression analysis, which of the following statements is true?


I. A linear transformation increases the linear relationship between variables.
II. A logarithmic model is the most effective transformation method.
III. A residual plot reveals departures from linearity.

(A) I only
(B) II only
(C) III only
(D) I and II only
(E) I, II, and III

Solution

The correct answer is (C). A linear transformation neither increases nor decreases the linear relationship
between variables; it preserves the relationship. A nonlinear transformation is used to increase the
linear relationship between variables. The most effective transformation method depends on the data being
transformed. In some cases, a logarithmic model may be more effective than other methods; but in other
cases it may be less effective. Non-random patterns in a residual plot suggest a departure from linearity
in the data being plotted.

The contents of this webpage are copyright 2016 StatTrek.com. All Rights Reserved.
