
CHAPTER 6

CORE

Data transformation
• What do we mean by data transformation?
• What is the effect of applying a log, squared or reciprocal transformation to the x variable?
• What is the effect of applying a log, squared or reciprocal transformation to the y variable?
• How do I choose which data transformation to apply?
• How do I carry out a regression analysis with transformed data?

6.1 Data transformation


There are methods for fitting curves to non-linear relationships, using non-linear regression. However, this procedure is mathematically complicated and the results are difficult to interpret. The method of dealing with a non-linear relationship favoured in practice is to apply a mathematical function to one of the variables, so that the relationship between the variables becomes closer to a straight line. By appropriate choice of the function, the scale of measurement is stretched or compressed. There are many functions that can be used to transform the data, but here we will consider only three. These are:
• the squared transformation
• the logarithmic transformation
• the reciprocal transformation

When first confronted with data transformation, many people tend to be suspicious. However, when we think about it from the point of view of analysing a set of data, there is nothing special about the units of measurement used when gathering the data. In general, units are chosen because they are convenient for recording and reporting the data. Natural units tend to be used, for example, seconds when recording time, or metres when recording length. But what is the natural unit for measuring the fuel economy of a car: kilometres per litre (x) or litres per kilometre (x⁻¹)? In measurement, natural often tends to mean familiar. For example, to a chemist, it is natural to measure acidity in terms of pH and the logarithm of hydrogen ion concentration (log x) rather than the hydrogen ion concentration (x).


Chapter 6 Data transformation


Generally, it is only luck if a data set reveals all its hidden information when analysed in the form in which it was initially gathered and/or reported. It is part of the analyst's role to search out different ways of looking at the data in order to enhance our understanding of that data. One of the most powerful tools available to help achieve this task is data transformation.

How do these transformations affect the values to which they are applied? Consider the following table of numbers:

value    (value)²    log(value)    1/value
0.2      0.04        -0.699        5
0.4      0.16        -0.398        2.5
0.6      0.36        -0.222        1.667
1        1           0             1
2        4           0.301         0.5
3        9           0.477         0.333
4        16          0.602         0.25

From the table we can see that the transformations have the following effects on the data values:
• The squared transformation has the effect of decreasing values less than 1, and increasing values greater than 1. Large values are increased the most. For example, 2² = 4 and 20² = 400, so that while the values 2 and 20 are 18 units apart, the values 4 and 400 are 396 units apart. That is, the effect of the squared transformation is to stretch the values.
• The log transformation reduces all values, and values between 0 and 1 become negative. Large values are reduced much more than small values. For example, log 2 = 0.301 and log 20 = 1.301, so that while the values 2 and 20 are 18 units apart, the values 0.301 and 1.301 are only 1 unit apart. That is, the effect of the log transformation is to compress the values. Note that the log function can be applied only to values which are greater than 0.
• The reciprocal transformation again reduces all values greater than 1. Large values are reduced much more than small values. For example, 1/2 = 0.5 and 1/20 = 0.05, so that while the values 2 and 20 are 18 units apart, the values 0.5 and 0.05 are only 0.45 units apart. That is, the effect of the reciprocal transformation is to compress the large values to an even greater extent than the log transformation.

Thus it can be seen that all transformations have a greater effect on the larger values, but this effect varies for each transformation.
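The table of transformed values can be reproduced with a few lines of code. The following is a minimal Python sketch rather than part of the original text: the function name transform_table is an illustrative assumption, and log is taken to be the base-10 logarithm, which agrees with the tabulated values (for example, log 2 = 0.301).

```python
import math

def transform_table(values):
    """Return one row per value: (value, value squared, log10(value), 1/value)."""
    rows = []
    for v in values:
        # The log and reciprocal are only defined here for values greater than 0.
        rows.append((v, round(v ** 2, 3), round(math.log10(v), 3), round(1 / v, 3)))
    return rows

values = [0.2, 0.4, 0.6, 1, 2, 3, 4]
print("{:>6} {:>8} {:>8} {:>8}".format("value", "squared", "log", "1/value"))
for row in transform_table(values):
    print("{:>6} {:>8} {:>8} {:>8}".format(*row))
```

Calling transform_table([2, 20]) gives a quick check of the stretching and compressing described above: the squared values are far further apart than 2 and 20, while the log and reciprocal values are much closer together.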

Exercise 6A
1 a Copy and complete the table.

value         1    2    3    4    5    6    7
(value)²
log(value)
1/value

  b Use the information in the table to complete the following statements by deleting the incorrect term.
    i The squared transformation stretches/compresses the scale.
    ii The log transformation stretches/compresses the larger values.
    iii The reciprocal transformation stretches/compresses the larger values.


Essential Further Mathematics Core

2 a Copy and complete the table.

value         1    2    4    8    16    32    64
(value)²
log(value)
1/value

  b Use the information in the table to complete the following statements by deleting the incorrect term.
    i The squared transformation stretches/compresses the larger values.
    ii The log transformation stretches/compresses the larger values.
    iii The reciprocal transformation stretches/compresses the larger values.

3 a Copy and complete the table.

value      (value)²    log(value)    1/value
1
10
100
1000
10000
100000

  b Use the information in the table to complete the following statements by deleting the incorrect term.
    i The squared transformation stretches/compresses the larger values.
    ii The log transformation stretches/compresses the larger values.
    iii The reciprocal transformation stretches/compresses the larger values.

4 a Copy and complete the table.

value      (value)²    log(value)    1/value
20
10
5
2.5
1.25
0.625

  b Use the information in the table to complete the following statements by deleting the incorrect term.
    i The squared transformation stretches/compresses the larger values.
    ii The log transformation stretches/compresses the larger values.
    iii The reciprocal transformation stretches/compresses the larger values.

5 a Copy and complete the table.

value      (value)²    log(value)    1/value
2
20
200
2000
20000
200000

  b Use the information in the table to complete the following statements by deleting the incorrect term.
    i The squared transformation stretches/compresses the larger values.
    ii The log transformation stretches/compresses the larger values.
    iii The reciprocal transformation stretches/compresses the larger values.


6.2 Transforming the x-axis


We are interested in linearising the relationship between two variables, x and y, and the transformations discussed in the previous section can be applied to either x or y (but not both here). We will examine the effect of transforming the x-axis and the y-axis separately. Transforming the x-axis will have the effect of moving the x values on the plot horizontally, and leave the y values unaltered. The square, log and reciprocal transformations can be applied to the x-axis with the following effects:
Transformation    Outcome
x²                Spreads out the high x values relative to the smaller x values.
log x             Compresses large x values relative to the smaller data values.
1/x               Also compresses large x values relative to the smaller data values, to a greater extent than log x. Note that values of x less than 1 become greater than 1, and values of x greater than 1 become less than 1, so that the order of the data values is reversed.

The following examples show the effect on the relationship between x and y when the squared, log and reciprocal transformations are applied to the x values.


Example 1

Linearising the relationship with a squared transformation

x    0    1    2    3    4
y    2    3    6    11   18

a Plot the data in the table, and comment on the form of the relationship between x and y.
b Apply a squared transformation to the x values (x²), again plot the data, and comment on the form of the relationship between y and x².

Solution
a 1 Plot the values of y against x.
  2 Decide if the form of the relationship is linear or non-linear.
  3 Write down your conclusion.

  [Scatterplot of y against x]

  The relationship between y and x is non-linear.

b 1 Construct a new table of values.

  x²   0    1    4    9    16
  y    2    3    6    11   18

  2 Plot the values of y against x².
  3 Decide if the form of the relationship is linear or non-linear.
  4 Write down your conclusion.

  [Scatterplot of y against x²]

  The relationship between y and x² is linear.
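The conclusion of Example 1 can also be checked numerically. The sketch below is an illustration only and not the method used in the text, which relies on the scatterplot: it assumes that a relationship is a straight line when the slope between each pair of neighbouring points is the same, and the helper name successive_slopes is hypothetical.

```python
def successive_slopes(xs, ys):
    """Slope between each pair of neighbouring points (equal slopes mean a straight line)."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]

# Data from Example 1.
x = [0, 1, 2, 3, 4]
y = [2, 3, 6, 11, 18]

# Squared transformation applied to the x values.
x_sq = [v ** 2 for v in x]

print(successive_slopes(x, y))     # slopes change from point to point: non-linear
print(successive_slopes(x_sq, y))  # slopes are all the same: linear
```

With real data the slopes will rarely be exactly equal, so in practice this check is a rough guide and the scatterplot remains the main tool.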

Data transformation is very conveniently carried out with the aid of a graphics calculator, and in practice, this is how you will do it in future. Note that, throughout this chapter, you will find it useful to enter the data into named lists because you will need to keep track of the various lists of transformed data as you work through the problems.


How to apply the squared transformation using the TI-Nspire CAS

Plot the data presented in the table below.

x    0    1    2    3    4
y    2    3    6    11   18

Apply a squared transformation to the x values (x²) and replot the data.

Steps
1 Start a new document by pressing + .
2 Select Add Lists & Spreadsheet. Enter the data into lists named x and y, as shown.
3 Press + and select Add Data & Statistics. Construct a scatterplot of y against x. Let x be the independent variable and y the dependent variable. The plot is clearly non-linear.
4 Return to the Lists & Spreadsheet application (by pressing + ). To calculate the values of x² and store them in a list named xsq (short for x-squared), do the following:
  a Move the cursor to the top of column C and type xsq. Press .
  b Move the cursor to the grey cell immediately below the xsq heading. We need to enter the expression = x². To do this, press then VAR ( ), highlight the variable x and then press to paste x into the formula line. Finally, type 2 (or press ) to complete the formula. Press to calculate and display the x-squared values.

Note: The dash in front of the x (i.e. x) is automatically added when a list name is pasted from the VAR menu. Note: You can also type in the variable x and then select Variable Reference when prompted. This avoids using the VAR menu.


5 Construct a scatterplot of y against x². Press + to return to the scatterplot created earlier and change the independent variable to xsq as follows:
  a Press e until the list of variables is displayed near the x-axis. Select the variable, xsq. Press to paste the variable to the x-axis.
  b A scatterplot of y against xsq (x²) is then displayed, as shown. The plot is clearly linear.

Note: If you wish to keep the original plot of y against x you can create a new Data & Statistics page to plot the transformed data.

How to apply the squared transformation using the ClassPad

Plot the data presented in the table below.

x    0    1    2    3    4
y    2    3    6    11   18

Apply a squared transformation to the x values (x²) and replot the data.

Steps
1 Open the Statistics application and enter the data into the columns named x and y. Your screen should look like the one shown.
2 Construct a scatterplot of y against x. Let x be the independent variable and y the dependent variable. The plot is clearly non-linear.


3 To calculate the values of x² and store them in a list named xsq (i.e. x-squared):
  a Tap to highlight the cell at the top of the next empty list. Rename by typing xsq and pressing enter .
  b Tap to highlight the cell at the bottom of the newly named xsq column (in the row titled Cal ). Type x² and press to calculate and list the x² values.
4 Construct a scatterplot of y against xsq (i.e. x²). The plot is clearly linear.

Example 2

Linearising the relationship with the log transformation

x    1    10    100    400    600    1000
y    0    10    20     25     28     30

a Plot the data in the table, and comment on the form of the relationship between x and y.
b Apply a log transformation to the x values (log x), again plot the data, and comment on the form of the relationship between y and log x.

Solution
a 1 Plot the values of y against x.
  2 Decide if the form of the relationship is linear or non-linear.
  3 Write down your conclusion.

  [Scatterplot of y against x]

  The relationship between y and x is non-linear.

b 1 Construct a new table of values.

  log x   0    1     2     2.6    2.8    3
  y       0    10    20    25     28     30

  2 Plot the values of y against log x.
  3 Decide if the form of the relationship is linear or non-linear.
  4 Write down your conclusion.

  [Scatterplot of y against log x]

  The relationship between y and log x is linear.
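As a rough numerical companion to the two scatterplots in Example 2, the sketch below compares how close to a straight line the data are before and after the log transformation. It is an illustration under stated assumptions, not the method used in the example: Pearson's correlation coefficient r is used as the measure of straightness, log is taken as the base-10 logarithm, and the function name pearson_r is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

# Data from Example 2.
x = [1, 10, 100, 400, 600, 1000]
y = [0, 10, 20, 25, 28, 30]

# Log transformation applied to the x values.
log_x = [math.log10(v) for v in x]

r_raw = pearson_r(x, y)
r_log = pearson_r(log_x, y)
print("r for y against x:    ", round(r_raw, 3))
print("r for y against log x:", round(r_log, 3))  # much closer to 1
```

A value of r close to 1 or -1 does not prove linearity on its own, so the scatterplot should still be inspected.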

Once again, this transformation is very conveniently carried out with the aid of a graphics calculator.

How to apply the log transformation using the TI-Nspire CAS

Plot the data presented in the table below.

x    1    10    100    400    600    1000
y    0    10    20     25     28     30

Apply a log transformation to the x values (log (x)) and replot the data.

Steps
1 Start a new document by pressing + .
2 Select Add Lists & Spreadsheet. Enter the data into lists named x and y, as shown opposite.
3 Press + and select Add Data & Statistics. Construct a scatterplot of y against x. Let x be the independent variable and y the dependent variable. The plot is clearly non-linear.
4 Return to the Lists & Spreadsheet application (by pressing + ). To calculate the values of log x and store them in a list named lx (short for log x), complete the following:
  a Move the cursor to the top of column C and type lx. Press .
  b Move the cursor to the grey cell immediately below the lx heading and type = log(. Then press VAR ( ), highlight the variable x, press to paste x into the formula line, then type ) to complete the command. Press to calculate and display the log values.
5 Construct a scatterplot of y against log x. Use + to return to the scatterplot created earlier and change the independent variable to lx. A scatterplot of y against lx (i.e. the log of x) is displayed, as shown. The plot is clearly linear.

Note: If your answers are not given as decimals, refer to the Appendix to change Mode settings to APPRX.


How to apply the log transformation using the ClassPad

Plot the data presented in the table below.

x    1    10    100    400    600    1000
y    0    10    20     25     28     30

Apply a log transformation to the x values (log (x)) and replot the data.

Steps
1 Open the Statistics application and enter the data into the columns named x and y. Your screen should look like the one shown opposite.
2 Construct a scatterplot of y against x. Let x be the independent variable and y the dependent variable. The plot is clearly non-linear.
3 To calculate the values of log x and store them in a list named lx (short for log x):
  a Tap to highlight the cell at the top of the next empty list (in this case, list3). Rename by typing lx and pressing enter .
  b Tap to highlight the cell at the bottom of the newly named lx column (in the row titled Cal ). Typing log(x) and pressing calculates and lists the values of log x.

Note: To ensure decimal values are displayed, Decimal should be visible in the status bar (at the bottom). If Standard is visible, tap Standard and it will change to Decimal.

4 Construct a scatterplot of y against lx (i.e. log x). The plot is clearly linear.

Example 3

Linearising the relationship with the 1/x transformation

x    1     2     3     4     5
y    30    15    10    7.5   6

a Plot the data in the table, and comment on the form of the relationship between x and y.
b Apply a reciprocal transformation to the x values (1/x), again plot the data, and comment on the form of the relationship between y and 1/x.

Solution
a 1 Plot the values of y against x.
  2 Decide if the form of the relationship is linear or non-linear.
  3 Write down your conclusion.

  [Scatterplot of y against x]

  The relationship between y and x is non-linear.

b 1 Construct a new table of values.

  1/x   1.0   0.5   0.33   0.25   0.2
  y     30    15    10     7.5    6

  2 Plot the values of y against 1/x.
  3 Decide if the form of the relationship is linear or non-linear.
  4 Write down your conclusion.

  [Scatterplot of y against 1/x]

  The relationship between y and 1/x is linear.
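For Example 3, one way to see why the plot of y against 1/x is a straight line is to fit a least-squares line to the transformed data. The following Python sketch is an illustration only: the function name least_squares is hypothetical, and fitting a line is a step beyond what the example itself asks for.

```python
def least_squares(xs, ys):
    """Slope and intercept of the least-squares line ys = intercept + slope * xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
             / sum((a - mean_x) ** 2 for a in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Data from Example 3.
x = [1, 2, 3, 4, 5]
y = [30, 15, 10, 7.5, 6]

# Reciprocal transformation applied to the x values.
rec_x = [1 / v for v in x]

slope, intercept = least_squares(rec_x, y)
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
```

For these data the fitted line is very nearly y = 30 × (1/x), which reflects the fact that every pair of values in the table satisfies x × y = 30.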

Once again, this transformation is very conveniently carried out with the aid of a graphics calculator.

How to apply the reciprocal transformation using the TI-Nspire CAS

Plot the data presented in the table below.

x    1     2     3     4     5
y    30    15    10    7.5   6

Apply a reciprocal transformation to the x values (1/x) and replot the data.

Steps
1 Start a new document by pressing + .
2 Select Add Lists & Spreadsheet. Enter the data into lists named x and y, as shown opposite.
3 Press + and select Add Data & Statistics. Construct a scatterplot of y against x. Let x be the independent variable and y the dependent variable. The plot is clearly non-linear.
4 Return to the Lists & Spreadsheet application (by pressing + ). To calculate the values of 1/x, complete the following:
  a Move the cursor to the top of column C and type recx (short for the reciprocal of x). Press .
  b Move the cursor to the grey cell immediately below the recx heading and type = 1/, then press VAR ( ) and highlight the variable x and press to paste into the formula line. Press to calculate and display the 1/x values.
5 Construct a scatterplot of y against 1/x (i.e. recx). Use + to return to the scatterplot created earlier and change the independent variable to recx. A scatterplot of y against recx (the reciprocal of x) is displayed as shown. The plot is clearly linear.

Note: If your answers are not presented as decimals, refer to the Appendix to change Mode settings to APPRX.


How to apply the reciprocal transformation using the ClassPad

Plot the data presented in the table below.

x    1     2     3     4     5
y    30    15    10    7.5   6

Apply a reciprocal transformation to the x values (1/x) and replot the data.

Steps
1 Open the Statistics application and enter the data into the columns named x and y. Your screen should look like the one shown opposite.
2 Construct a scatterplot of y against x. Let x be the independent variable and y the dependent variable. The plot is clearly non-linear.
3 To calculate the values of 1/x and store them in a list named recx (short for the reciprocal of x):
  a Tap to highlight the cell at the top of the next empty list (in this case, list3). Rename by typing recx and pressing enter .
  b Tap to highlight the cell at the bottom of the newly named recx column (in the row titled Cal ). Typing 1/x and pressing calculates and lists the 1/x values.
4 Construct a scatterplot of y against recx (i.e. 1/x). The plot is clearly linear.


What sorts of non-linear relationships can we linearise using the x² transformation?
The x² transformation has the effect of stretching out the upper end of the x scale. As a guide, relationships that have scatterplots which look like those shown below can often (but not always) be linearised using the x to x² transformation. Note that for the x² transformation to apply, the scatterplot should peak or bottom around x = 0.

[Graphs: scatterplot shapes that can often be linearised using the x² transformation]

What sorts of non-linear relationships can we linearise using the log x transformation?
The log x transformation has the effect of compressing the upper end of the x scale. As a guide, relationships that have scatterplots which look like those shown below can often (but not always) be linearised using the x to log x transformation.

[Graphs: scatterplot shapes that can often be linearised using the log x transformation]

What sorts of non-linear relationships can we linearise using the 1/x transformation?
As a guide, relationships that have scatterplots which look like those shown below can often (but not always) be linearised using the x to 1/x transformation.

[Graphs: scatterplot shapes that can often be linearised using the 1/x transformation]

Exercise 6B
These exercises are expected to be completed with the aid of a graphics calculator.

1 a Plot the data in the table, and comment on the form of the relationship between y and x.

x    0     1     2     3    4
y    16    15    12    7    0

  b Apply a squared transformation to the x values (x²), again plot the data, and comment on the form of the relationship between y and x².

2 a Plot the data in the table, and comment on the form of the relationship between y and x.

x    1    2    3     4     5
y    3    9    19    33    51

  b Apply a squared transformation to the x values (x²), again plot the data, and comment on the form of the relationship between y and x².

3 a Plot the data in the following table, and comment on the form of the relationship between y and x.

x    1     2     3     4     5
y    30    27    22    15    6

  b Apply a squared transformation to the x values (x²), again plot the data, and comment on the form of the relationship between y and x².

4 a Plot the data in the following table, and comment on the form of the relationship between y and x.

x    1     10    100    400    600    1000
y    30    20    10     5      2      0

  b Apply a log transformation to the x values (log x), again plot the data, and comment on the form of the relationship between y and log x.

5 a Plot the data in the table, and comment on the form of the relationship between y and x.

x    5      10     150    500    1000
y    3.1    4.0    7.5    9.1    10.0

  b Apply a log transformation to the x values (log x), again plot the data, and comment on the form of the relationship between y and log x.

6 a Plot the data in the table, and comment on the form of the relationship between y and x.

x    10      44      132    436    981
y    15.0    11.8    9.4    6.8    5.0

  b Apply a log transformation to the x values (log x), again plot the data, and comment on the form of the relationship between y and log x.

7 a Plot the data in the table, and comment on the form of the relationship between y and x.

x    2     4     6     8     10
y    60    30    20    15    12

  b Apply a reciprocal transformation to the x values (1/x), again plot the data and comment on the form of the relationship between y and 1/x.

8 a Plot the data in the table, and comment on the form of the relationship between y and x.

x    1     2     3     4     5
y    61    31    21    16    13

  b Apply a reciprocal transformation to the x values (1/x), again plot the data and comment on the form of the relationship between y and 1/x.

9 a Plot the data in the following table, and comment on the form of the relationship between y and x.

x    2     4     6     8      10
y    10    70    90    100    106

  b Apply a reciprocal transformation to the x values (1/x), again plot the data and comment on the form of the relationship between y and 1/x.

  c Name an x-axis transformation that should also work for the data. Try it and see.
  d Name an x-axis transformation that should not work for the data. Try it and see.

10 The table below shows the diameter (in cm) of a number of umbrellas, along with the number of people each umbrella is designed to keep dry.

Diameter            50    70    85    100    110
Number of people    1     2     3     4      5

  a Construct a scatterplot showing the relationship between number of people and umbrella diameter, and comment on the form.
  b Apply a squared transformation to the x values (x²), again plot the data, and comment on the form of the relationship between y and x².

11 The table below shows the performance level on a task of a number of people, along with the time spent (in minutes) in practising the task.

Time spent on practice    0.5    1.0    1.5    2.0    3.0    4.0    5.0    6.0    7.0    7.0
Level of performance      1.0    1.5    2.0    3.0    3.0    3.5    4.0    3.5    3.9    3.6

  a Construct a scatterplot showing the relationship between the time spent on practice and level of performance, and comment on the form.
  b Apply a log transformation to the x values (log x), again plot the data, and comment on the form of the relationship between y and log x.

12 The table below shows the horsepower of several cars, along with their fuel consumption in kilometres/litre.

Fuel consumption    5.2    7.3    12.6    7.1    6.3    10.1    10.5    14.6    10.9    7.7
Horsepower          155    125    75      110    138    88      80      70      100     103

  a Construct a scatterplot showing the relationship between horsepower and fuel consumption, and comment on the form.
  b Apply a reciprocal transformation to the x values (1/x), again plot the data and comment on the form of the relationship between y and 1/x.

6.3 Transforming the y-axis


Another way to linearise the relationship between x and y is to apply these transformations to the y-axis. Transforming the y-axis will have the effect of moving the y values on the plot vertically, and leave the x values unaltered. The square, log and reciprocal transformations can be applied to the y-axis with the following effects:

Transformation    Outcome
y²                Spreads out the large y values relative to the smaller data values.
log y             Compresses large y values relative to the smaller data values.
1/y               Also compresses large y values relative to the smaller data values, to a greater extent than log y. Note that values of y less than 1 become greater than 1, and values of y greater than 1 become less than 1, so that the order of the data values is reversed.

The following examples show the effect on the relationship between x and y when the squared, log and reciprocal transformations are applied to the y values. Once again, all these data transformations can be very conveniently carried out with the aid of a graphics calculator.

Example 4

Linearising the relationship with a squared transformation

x    0    1      2      3      4      5
y    0    3.2    4.5    5.5    6.3    7.1

a Plot the data in this table, and comment on the form of the relationship between y and x.
b Apply a squared transformation to the y values (y²), again plot the data, and comment on the form of the relationship between y² and x.

Solution
a 1 Plot the values of y against x.
  2 Decide if the form of the relationship is linear or non-linear.
  3 Write down your conclusion.

  [Scatterplot of y against x]

  The relationship between y and x is non-linear.

b 1 Construct a new table of values.

  x     0    1       2       3       4       5
  y²    0    10.2    20.3    30.3    39.7    50.4

  2 Plot the values of y² against x.
  3 Decide if the form of the relationship is linear or non-linear.
  4 Write down your conclusion.

  [Scatterplot of y² against x]

  The relationship between y² and x is linear.

Example 5

Linearising the relationship with the log transformation

x    0      1     2     3    4    5
y    100    37    14    5    2    1

a Plot the data in this table, and comment on the form of the relationship between y and x.
b Apply a log transformation to the y values (log y), again plot the data, and comment on the form of the relationship between log y and x.

Solution
a 1 Plot the values of y against x.
  2 Decide if the form of the relationship is linear or non-linear.
  3 Write down your conclusion.

  [Scatterplot of y against x]

  The relationship between y and x is non-linear.

b 1 Construct a new table of values.

  x        0       1       2       3       4       5
  log y    2.00    1.57    1.15    0.70    0.30    0.00

  2 Plot the values of log y against x.
  3 Decide if the form of the relationship is linear or non-linear.
  4 Write down your conclusion.

  [Scatterplot of log y against x]

  The relationship between log y and x is linear.


Example 6

Linearising the relationship with the 1/y transformation

x    1       2      3      4      5
y    10.0    5.0    3.3    2.5    2.0

a Plot the data in this table, and comment on the form of the relationship between y and x.
b Apply a reciprocal transformation to the y values (1/y), again plot the data, and comment on the form of the relationship between x and 1/y.

Solution
a 1 Plot the values of y against x.
  2 Decide if the form of the relationship is linear or non-linear.
  3 Write down your conclusion.

  [Scatterplot of y against x]

  The relationship between y and x is non-linear.

b 1 Construct a new table of values.

  x      1      2      3      4      5
  1/y    0.1    0.2    0.3    0.4    0.5

  2 Plot the values of 1/y against x.
  3 Decide if the form of the relationship is linear or non-linear.
  4 Write down your conclusion.

  [Scatterplot of 1/y against x]

  The relationship between 1/y and x is linear.
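The three y-axis examples (Examples 4 to 6) can be summarised in one short calculation. The Python sketch below is an illustration under stated assumptions rather than part of the original solutions: it uses Pearson's correlation coefficient r as a rough measure of straightness before and after each transformation, takes log to be the base-10 logarithm, and the names pearson_r, examples and r_values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

# Each entry: (x values, y values, transformation applied to y).
examples = {
    "Example 4, y squared": ([0, 1, 2, 3, 4, 5], [0, 3.2, 4.5, 5.5, 6.3, 7.1], lambda v: v ** 2),
    "Example 5, log y": ([0, 1, 2, 3, 4, 5], [100, 37, 14, 5, 2, 1], math.log10),
    "Example 6, 1/y": ([1, 2, 3, 4, 5], [10.0, 5.0, 3.3, 2.5, 2.0], lambda v: 1 / v),
}

r_values = {}
for name, (xs, ys, transform) in examples.items():
    before = pearson_r(xs, ys)
    after = pearson_r(xs, [transform(v) for v in ys])
    r_values[name] = (before, after)
    print(f"{name}: r before = {before:.3f}, r after = {after:.3f}")
```

In each case the value of r after the transformation is closer to 1 or -1 than before, which is consistent with the scatterplots becoming linear.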

What sorts of non-linear relationships can we linearise using the y² transformation?
The y² transformation has the effect of stretching out the upper end of the y scale. As a guide, relationships that have scatterplots which look like those shown below can often (but not always) be linearised using the y to y² transformation. Note that for the y² transformation to apply, the scatterplot should peak or bottom around y = 0.

[Graphs: scatterplot shapes that can often be linearised using the y² transformation]

What sorts of non-linear relationships can we linearise using the log y transformation?
The log y transformation has the effect of compressing the upper end of the y scale. As a guide, relationships that have scatterplots which look like those shown below can often (but not always) be linearised using the y to log y transformation.

[Graphs: scatterplot shapes that can often be linearised using the log y transformation]

What sorts of non-linear relationships can we linearise using the 1/y transformation?
As a guide, relationships that have scatterplots which look like those shown below can often (but not always) be linearised using the y to 1/y transformation.

[Graphs: scatterplot shapes that can often be linearised using the 1/y transformation]

Exercise 6C
These exercises are expected to be completed with the aid of a graphics calculator.

1 a Plot the data in the table. Comment on the form of the relationship between y and x.

x    0      2      4      6      8      10
y    1.2    2.8    3.7    4.5    5.1    5.7

  b Apply a squared transformation to the y values (y²). Plot the data, and comment on the form of the relationship between y² and x.

2 a Plot the data in the table. Comment on the form of the relationship between y and x.

x    5       10      15      20      25     30
y    13.2    12.2    11.2    10.0    8.7    7.1

  b Apply a squared transformation to the y values (y²). Plot the data, and comment on the form of the relationship between y² and x.

3 a Plot the data in the table. Comment on the form of the relationship between y and x.

x    2      6      11     12     21     40
y    5.1    6.2    7.3    7.5    9.1    11.8

  b Apply a squared transformation to the y values (y²). Plot the data and comment on the form of the relationship between y² and x.

4 a Plot the data in the table. Comment on the form of the relationship between y and x.

x    0.1     0.2     0.3     0.4     0.5
y    15.8    25.1    39.8    63.1    100.0

  b Apply a log transformation to the y values (log y). Plot the data and comment on the form of the relationship between log y and x.

5 a Plot the data in the table. Comment on the form of the relationship between y and x.

x    2       4       6       8       10
y    7.94    6.31    5.01    3.98    3.16

  b Apply a log transformation to the y values (log y). Plot the data and comment on the form of the relationship between log y and x.

6 a Plot the data in the table. Comment on the form of the relationship between y and x.

x    1    3     5      7      9
y    7    32    147    681    3162

  b Apply a log transformation to the y values (log y). Plot the data, and comment on the form of the relationship between log y and x.

7 a Plot the data in the table. Comment on the form of the relationship between y and x.

x    1    2      3       4       5
y    1    0.5    0.33    0.25    0.20

  b Apply a reciprocal transformation to the y values (1/y). Plot the data and comment on the form of the relationship between 1/y and x.

8 a Plot the data in the table. Comment on the form of the relationship between y and x.

x    0.2     0.4     0.6     0.8     1.0
y    0.71    0.56    0.45    0.38    0.33

  b Apply a reciprocal transformation to the y values (1/y). Plot the data and comment on the form of the relationship between 1/y and x.

9 a Plot the data in the table. Comment on the form of the relationship between y and x.

x    11      14      26      35      41
y    0.43    0.34    0.19    0.14    0.12

  b Apply a reciprocal transformation to the y values (1/y). Plot the data and comment on the form of the relationship between 1/y and x.
  c Name a y-axis transformation that should also work for the data. Try it and see.
  d Name a y-axis transformation that should not work for the data. Try it and see.

10 The time taken for a local anaesthetic to take effect is related to the dose given. To investigate this relationship a researcher collected the data shown.

Dose    0.5     0.6     0.7     0.8     0.9     1.0     1.1     1.2     1.3     1.4     1.5
Time    3.67    3.55    3.42    3.29    3.15    3.00    2.85    2.68    2.51    2.32    2.12

Chapter 6 Data transformation

189

a Construct a scatterplot showing the relationship between the dose of anaesthetic and time taken for it to take effect, and comment on the form. b Apply a squared transformation to the time values (y), again plot the data, and comment on the form of the relationship between time squared (y 2 ) and dose (x). 11 The table below shows the number of internet users signing up with a new internet service provider for each of the rst nine months of their rst year of operation. Number Month 24 1 32 2 35 3 44 4 60 5 61 6 78 7 92 8 118 9

a Construct a scatterplot showing the relationship between number of users signing up and month, and comment on the form. Month is the independent variable.
b Apply a log transformation to the number of users (y), again plot the data, and comment on the form of the relationship between log (number) and month (x).

12 A group of ten students was given an opportunity to practise a complex matching task as often as they liked before they were assessed on the task. The number of times they practised the task and the number of errors made when assessed are given in the table below.

Number   1    2   2    4   5   6   7   7   9   11
Errors   14   9   11   5   4   4   3   3   2   2

a Construct a scatterplot showing the relationship between number of practices (x) and number of errors (y), and comment on the form.
b Apply a reciprocal transformation to the number of errors values (1/y), again plot the data, and comment on the form of the relationship between the reciprocal of the number of errors (1/y) and number of practices (x).

6.4 Choosing and applying the appropriate transformation


Putting together the information in Sections 6.2 and 6.3, we can see that there may be more than one transformation which linearises the scatterplot. The forms of the scatterplots that can be transformed by the squared, log or reciprocal transformations can be largely classified into one of four categories, shown as the circle of transformations.


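The circle of transformations
[Figure: a circle divided into four quadrants, each showing the shape of a scatterplot and its possible transformations. Top left: y², log x, 1/x. Top right: y², x². Bottom left: log y, 1/y, log x, 1/x. Bottom right: log y, 1/y, x².]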

Note that the transformations we have introduced in this chapter are able to linearise only those relationships that are consistently increasing or decreasing. The advantage of having alternatives is that, in practice, we can always try each of them to see which gives us the best result.

How do we decide which transformation is the best? The best transformation is the one that results in the best linear model. To choose the best linear model we will consider, for each transformation applied:
the residual plot, in order to evaluate the linearity of the transformed relationship
the value of the coefficient of determination (r²): a higher value indicates a better fit

This procedure is illustrated in Example 7.
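The two checks described above can be sketched in a few lines of Python. This is only an illustration, not the calculator procedure the text assumes: the function names `fit_line`, `residuals` and `r_squared` are made up for this sketch, and the data are the x and y values from question 4 of the preceding exercise (the one that asks for a log y transformation).

```python
import math

def fit_line(xs, ys):
    """Return the least squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def residuals(xs, ys):
    """Residual = actual value minus the value predicted by the fitted line."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def r_squared(xs, ys):
    """Coefficient of determination: 1 - (residual variation / total variation)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum(e ** 2 for e in residuals(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Data from question 4 of the preceding exercise
x = [0.1, 0.2, 0.3, 0.4, 0.5]
y = [15.8, 25.1, 39.8, 63.1, 100.0]
log_y = [math.log10(v) for v in y]  # log means log to base 10, as in the text

for label, ys in [("y vs x", y), ("log y vs x", log_y)]:
    print(label, "r^2 =", round(r_squared(x, ys), 4))
    print("  residuals:", [round(e, 3) for e in residuals(x, ys)])
```

For the untransformed data the residuals are positive at both ends and negative in the middle, which is the curved pattern a residual plot would show; after the log y transformation they are close to zero with no clear pattern, and r² is higher.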


Example 7
The data in this table gives life expectancy in years and gross national product, GNP, in dollars for 24 countries in 1982. Using an appropriate transformation, find a regression model for the relationship between life expectancy in years and GNP.

Country          GNP      Life expectancy
Nicaragua        950      58
Paraguay         1670     65
Venezuela        4250     68
France           11 520   74
West Germany     12 280   73
Greece           4170     73
Norway           14 300   75
Czechoslovakia   5540     71
Austria          9830     72
Jordan           1680     61
Sri Lanka        320      67
Brunei           22 260   66
Indonesia        550      50
North Korea      930      66
Mongolia         940      64
Taiwan           2670     72
Australia        11 220   74
Congo            1420     48
Ethiopia         150      41
Guinea           330      44
Mauritania       520      44
Nigeria          940      49
Togo             350      48
Zaire            180      48

Source: Modern Data Analysis: A First Course in Applied Statistics, L.C. Hamilton 1990, p. 537
West Germany is now part of Germany; Czechoslovakia is now the Czech Republic and Slovakia.

Solution
1 Decide which of the variables is the independent variable, and which is the dependent variable.
The independent variable is GNP. The dependent variable is Life expectancy.

2 Plot the values of y against x, decide if the form of the relationship is linear or non-linear, and find the value of the coefficient of determination (r²).
[Scatterplot: Life expectancy against GNP]

3 Write down your conclusion.
The relationship between y and x is non-linear: r² = 36.7%.

4 Compare the shape of this plot to those in the circle of transformations (page 190). The scatterplot is similar to the plot in the top left-hand corner. Thus, y², log x and 1/x are the transformations to investigate.
Suitable transformations are y², log x and 1/x.


Try the y² transformation
5 Calculate the values of (Life expectancy)² and plot these against GNP. Comment on the linearity of the plot.
[Scatterplot: Life expectancy squared against GNP]

6 Fit a regression line, and find the value of the coefficient of determination (r²). Produce a residual plot, and use this to comment on the form of the relationship.
r² = 38.4%. The relationship between (Life expectancy)² and x is still non-linear. This is confirmed by the residual plot.
[Residual plot: Residual against GNP]

7 Comment on the effect of the transformation.
The y² transformation has not really helped.

Try the log x transformation
5 Calculate the values of log GNP and plot Life expectancy against log GNP.
[Scatterplot: Life expectancy against log GNP]

6 Fit a regression line, and find the value of r². Produce a residual plot, and use this to comment on the form of the relationship.
r² = 66.0%. The relationship between Life expectancy and log GNP is closer to linear. This is confirmed by the residual plot.


[Residual plot: Residual against log GNP]

7 Comment on the effect of the transformation.
The log x transformation has linearised the relationship quite well.

Try the 1/x transformation
8 Calculate the values of 1/GNP and plot Life expectancy against 1/GNP.
[Scatterplot: Life expectancy against 1/GNP]

9 Fit a regression line, and find the value of r². Produce a residual plot and use this to comment on the form of the relationship.
r² = 51.5%. The relationship between Life expectancy and 1/GNP is reasonably linear. This is confirmed by the residual plot.
[Residual plot: Residual against 1/GNP]

10 Comment on the effect of the transformation.
The 1/x transformation has done a reasonable job in linearising the relationship.

11 Decide which transformation is the most appropriate for this relationship. Choose the transformation which gives the most linear relationship (from the residual plots) and the highest value of r².
The most appropriate transformation to use here is the log x transformation, as the residual plot shows that the relationship between log GNP and Life expectancy is linear, and this model has the highest coefficient of determination, r² = 66.0%.
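The r² comparison in steps 2 to 11 can be reproduced with a short script. The sketch below assumes the (GNP, Life expectancy) pairs are as listed in the table for Example 7; the helper `r_squared` and the model labels are names chosen for this illustration only. It covers only the r² half of the decision, so the residual plots still need to be inspected separately.

```python
import math

# (GNP in dollars, life expectancy in years) for the 24 countries in Example 7
data = [
    (950, 58), (1670, 65), (4250, 68), (11520, 74), (12280, 73), (4170, 73),
    (14300, 75), (5540, 71), (9830, 72), (1680, 61), (320, 67), (22260, 66),
    (550, 50), (930, 66), (940, 64), (2670, 72), (11220, 74), (1420, 48),
    (150, 41), (330, 44), (520, 44), (940, 49), (350, 48), (180, 48),
]
gnp = [g for g, _ in data]
life = [e for _, e in data]

def r_squared(xs, ys):
    """r^2 for a straight-line fit, computed as the squared correlation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

# Each model is a pair (x values, y values) after the transformation
models = {
    "y vs x": (gnp, life),
    "y^2 vs x": (gnp, [e ** 2 for e in life]),
    "y vs log x": ([math.log10(g) for g in gnp], life),
    "y vs 1/x": ([1 / g for g in gnp], life),
}

results = {name: r_squared(xs, ys) for name, (xs, ys) in models.items()}
for name, value in results.items():
    print(f"{name:12s} r^2 = {value:.1%}")

best = max(results, key=results.get)
print("Highest r^2:", best)
```

If the table has been read correctly, the printed percentages should be close to those quoted in the solution, with the log x model giving the highest value.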


12 As the relationship between Life expectancy and log (GNP) appears to be linear and there are no obvious outliers, we can use the least squares method to fit a line to the data. Using a calculator, find the equation of the least squares regression line and write it in terms of the transformed variables.
Note: The independent variable (IV) is now log GNP, and the dependent variable (DV) is Life expectancy.

Using the log x transformation gives a regression model for the relationship:
Life expectancy = 14.3 + 14.5 log (GNP)
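To use this model for prediction, the GNP value must be transformed before it is substituted into the equation. A minimal sketch, assuming log means log to base 10 (as in the table of values at the start of the chapter); the function name and the GNP of $1000 are illustrative choices, not values from the example.

```python
import math

def predict_life_expectancy(gnp):
    """Apply the fitted model: Life expectancy = 14.3 + 14.5 log(GNP)."""
    return 14.3 + 14.5 * math.log10(gnp)

# Hypothetical country with a GNP of $1000: log(1000) = 3
print(round(predict_life_expectancy(1000), 1))  # 57.8
```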

Some comments
It might seem unnatural to talk about the wealth of a country in terms of log (GNP), yet when we are comparing the relative wealth of countries, log (GNP) is probably a more useful measure than GNP. For instance, knowing that the difference in GNP between Australia and Sri Lanka is $10 900 is less informative than knowing that the difference in log (GNP) is 1.5448, which tells us that Australia's GNP is 10^1.5448, or about 35 times, that of Sri Lanka. Natural units of measurement are more often those that are familiar rather than those that are most useful!
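The link between a difference in logs and a ratio follows from log a − log b = log (a/b). A quick check of the figures quoted above, using the GNP values for Australia and Sri Lanka from the table in Example 7 (variable names are for illustration only):

```python
import math

gnp_australia = 11220
gnp_sri_lanka = 320

# Difference on the log (base 10) scale
diff = math.log10(gnp_australia) - math.log10(gnp_sri_lanka)
# Undo the log to recover the ratio of the two GNPs
ratio = 10 ** diff

print(round(diff, 4))   # 1.5448
print(round(ratio, 1))  # 35.1
```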

Exercise 6D
1 The following scatterplots show non-linear relationships. For each scatterplot, state which of the transformations x², log x, 1/x, y², log y, 1/y, if any, you would apply to linearise the relationship.
[Four scatterplots of y (0 to 5) against x (1 to 10), the first two labelled a and b; the plotted points are not reproduced here.]


2 The data below give the yield in kilograms and length in metres of 12 commercial potato plots.

Yield (kilograms)   346    1798   152   86    436    968    686    257   2435   287    1850   1320
Length (metres)     12.1   27.4   8.3   5.5   15.7   21.5   19.3   9.0   34.2   14.7   31.9   25.3

a Construct a scatterplot showing the relationship between yield and length of plot and comment on the form.
b Using an appropriate transformation, find a regression model for the relationship between yield in kilograms and length of plot in metres.

3 A recent study in Canada showed that cigarette consumption (per day) is related to cost per pack. Some data drawn from that study is shown below.

Cost ($)                4.00   4.50   4.80   5.50   6.00   6.50   7.50
Cigarette consumption   8.0    7.4    7.0    6.4    5.9    5.5    5.0

a Construct a scatterplot showing the relationship between the cost of cigarettes and cigarette consumption, and comment on the form.
b Using an appropriate transformation, find a regression model for the relationship between the cost of cigarettes and cigarette consumption.

4 The population of a large town increased over a 13-year period, as shown in the table.

Year   Population
1      58860
2      57770
3      58206
4      59513
5      59983
6      60123
7      59763
8      61726
9      60387
10     61646
11     62347
12     64185
13     67158

a Construct a scatterplot showing the annual population growth of the town, and comment on the form.
b Using an appropriate transformation, find a regression model for the annual population growth of the town.


5 The monthly average exchange rate (to the nearest cent) between the Australian dollar and the US dollar over a period of 18 months in the 1990s is given in the table below.

Month   Exchange rate (US $)
1       0.77
2       0.77
3       0.77
4       0.76
5       0.78
6       0.78
7       0.77
7       0.76
8       0.76
9       0.76
10      0.75
11      0.75
12      0.72
13      0.72
14      0.69
15      0.69
16      0.68
17      0.71
18      0.70

a Construct a scatterplot showing the exchange rate over the 18-month period, and comment on the form.
b Using an appropriate transformation, find a regression model for the exchange rate over that 18-month period.

6 The table below shows the percentage of people who can read (literacy rate) and the gross domestic product (GDP) for a selection of 14 countries.

Country        Literacy rate (%)   Gross domestic product/capita
Botswana       72                  2677
Cambodia       35                  260
Canada         97                  19904
Ethiopia       24                  122
France         99                  18944
Georgia        99                  4500
Germany        99                  17539
Honduras       73                  1030
Japan          99                  19860
Liberia        40                  409
Pakistan       35                  406
Saudi Arabia   62                  6651
Switzerland    99                  22384
Syria          64                  2436

a Construct a scatterplot showing the relationship between literacy rate and GDP, and comment on the form.
b Using an appropriate transformation, find a regression model for the relationship between literacy rate and GDP for this group of countries.


Review

Key ideas and chapter summary


Data transformation: This means changing the scale on either the x- or y-axis. It is performed when a residual plot shows that the underlying relationship in a set of bivariate data is clearly non-linear.

x² or y² transformation: The squared transformation stretches out the upper end of the scale on an axis.

log x or log y transformation: The log transformation compresses the upper end of the scale on an axis.

1/x or 1/y transformation: The reciprocal transformation compresses the upper end of the scale on an axis to a greater extent than the log transformation.

Residual plots: Residual plots are used to assess the effectiveness of each data transformation.

Coefficient of determination (r²): The transformation which results in a linear relationship and which has the highest value of the coefficient of determination is considered to be the best transformation.

The circle of transformations: The circle of transformations provides guidance in choosing the transformations that can be used to linearise various types of scatterplots. See page 190.
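The stretching and compressing described in this summary can be seen by looking at the gaps between consecutive transformed values. The sketch below uses the evenly spaced values 1, 2, 3, 4 from the table at the start of the chapter; the helper name `gaps` is made up for this illustration, and log is taken to base 10.

```python
import math

values = [1, 2, 3, 4]  # evenly spaced: every gap is 1

transforms = {
    "squared": lambda v: v ** 2,
    "log": math.log10,
    "reciprocal": lambda v: 1 / v,
}

def gaps(seq):
    """Size of the gap between each pair of neighbouring values."""
    return [round(abs(b - a), 3) for a, b in zip(seq, seq[1:])]

spacing = {name: gaps([f(v) for v in values]) for name, f in transforms.items()}
for name, g in spacing.items():
    print(name, g)
# squared [3, 5, 7]
# log [0.301, 0.176, 0.125]
# reciprocal [0.5, 0.167, 0.083]
```

The gaps grow under the squared transformation (the upper end is stretched) and shrink under the log and reciprocal transformations (the upper end is compressed). The last reciprocal gap is about one sixth of the first, against roughly two fifths for the log, which is the sense in which the reciprocal compresses more strongly. Note also that the reciprocal reverses the order of the values.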

Skills check
Having completed this chapter you should be able to:
recognise which of the x², log x, 1/x, y², log y or 1/y transformations might be used to linearise a bivariate relationship
apply each of these transformations to a data set
use residual plots and the coefficient of determination, r², to decide which transformation gives the best model for the relationship
use the transformed variable as part of a regression analysis to give a model for the relationship

Multiple-choice questions
1 The missing data values, a and b, in the table are:

value        1   2   3       4
(value)²     a   4   9       16
log(value)   0   b   0.477   0.602

A a = 0, b = 0.5
B a = 1, b = 0.5
C a = 1, b = 0.301
D a = 1, b = 0.602
E a = 1, b = 0.693


2 Select the statement which correctly completes the sentence: The effect of a log transformation is to . . .
A stretch the high values in the data
B maintain the distance between values
C stretch the low values in the data
D compress the high values in the data
E reverse the order of the values in the data

3 The scatterplot opposite shows the relationship between the number of weeks each person has been on a diet program and their weight loss in kilograms for a group of subjects. A least squares regression line has been fitted to the data.
[Scatterplot: Weight loss (0 to 14) against Number of weeks on a diet (2 to 7), with fitted line]

The residual plot for this least squares line would look like:
[Options A to E are five candidate plots, which are not reproduced here.]

4 The relationship between two variables y and x as shown in the scatterplot is non-linear. In an attempt to transform the relationship to linearity, a student would be advised to:
A leave out the first four points
B use a y² transformation
C use a log y transformation
D use a 1/y transformation
E use a least squares regression line
[Scatterplot: y (0 to 5) against x (1 to 10)]


5 The relationship between two variables y and x as shown in the scatterplot is non-linear. Which of the following sets of transformations could possibly linearise this relationship?
A log y, 1/y, log x, 1/x
B y², x²
C y², log x, 1/x
D log y, 1/y, x²
E ax + b
[Scatterplot: y (0 to 5) against x (1 to 10)]

6 The relationship between two variables y and x as shown in the scatterplot is non-linear. Which of the following transformations is most likely to linearise the relationship?
A a 1/x transformation
B a y² transformation
C a log y transformation
D a 1/y transformation
E a log x transformation
[Scatterplot: y (0 to 5) against x (1 to 10)]

7 The relationship between two variables y and x as shown in the scatterplot is clearly non-linear. In an attempt to transform the relationship to linearity, a student would be advised to apply:
A an x² transformation
B a y² transformation
C a log y transformation
D a 1/y transformation
E none of these
[Scatterplot: y (0 to 5) against x (1 to 10)]

8 Brian has determined from a scatterplot of his data that the appropriate transformations for his data are log x, 1/x and y². After applying each of these transformations to the data, he obtains the results shown below.

Model        Residuals   r²
y vs x       Curved      79.6%
y vs log x   Random      80.8%
y vs 1/x     Random      81.9%
y² vs x      Random      88.4%


Based on the information in the table, which transformation would you suggest Brian use?
A an x² transformation
B a y² transformation
C a log x transformation
D a 1/x transformation
E no transformation

9 When investigating the relationship between the weight of the strawberries picked from a strawberry patch, and the width of the patch, Suzie decides that an x² transformation is appropriate. After transforming the data, she fits a least squares regression line to the data and determines that the intercept is 10 and the slope is 5. Based on this information, the model that Suzie has fitted to the data can be written as:
A (weight)² = 10 + 5 × width
B weight = 5 + 10 × (width)²
C weight = 10 + 5 × (width)²
D (weight)² = 10 + 5 × (width)²
E (weight)² = 5 + 10 × width

10 Suppose that the relationship between the hours spent studying for an exam and the mark achieved can be modelled by the equation:
Mark = 20 + 40 log (Hours)
From this model, we would predict that a student who studies for 20 hours would score a mark (to the nearest whole number) of:
A 80
B 78
C 180
D 72
E 140

Extended-response questions
1 Measurements of distance travelled in metres and time taken in seconds were made on a falling body. The data are given in the table below.

Time       0   1     2      3      4      5       6
Distance   0   5.2   18.0   42.0   79.0   128.0   168.0
(Time)²

a Construct a scatterplot of the data and comment on its form.
b Determine the values of (Time)² and complete the table.
c Construct a scatterplot of Distance against (Time)².
d Obtain a residual plot for the new model and comment on the linearity.
e Determine the value of r² for the new model.
f Write down the regression equation for the new model in terms of the variables in the question.
g Use the regression equation to predict the distance travelled in seven seconds.

2 The data in the table below show the marks obtained by students on a test and the amount of time they reported studying for the test:

Mark           62    74     79    80    56    86    92    87     64    88    48    32
Time (hours)   1.5   2.25   3.0   2.5   0.8   3.5   6.0   2.75   1.0   4.5   0.5   0.1


a We want to predict a student's mark from the time they reported studying for the test. In this situation, which is the dependent variable and which is the independent variable?
b Construct a scatterplot and comment on the relationship between test mark and time spent studying in terms of direction, outliers, form and strength.
c i Fit a linear model to the data and record its equation. Interpret the slope in terms of the problem at hand.
ii Calculate the coefficient of determination and interpret.
iii Construct a residual plot and use it to comment on the suitability of modelling the relationship between Mark and Time spent studying with a straight line.
d Apply a log transformation to Time. Then:
i construct a scatterplot for the transformed data
ii find the equation of the least squares regression line for the transformed data
iii use the equation to predict the mark obtained after 5 hours of study
iv calculate the coefficient of determination and interpret
v construct a residual plot and use it to comment on the linearity of the transformed model

3 The following are the testosterone levels and the age at first conviction for violent and aggressive crimes collected on a sample of young male prisoners. It is believed that the higher the testosterone level in a male prisoner, the earlier they are likely to be convicted of a violent and aggressive crime. A correlation and regression analysis is also given.

Testosterone              1305   1000   1175   1495   1060   800   1005   710   1150   605   690   700   625   610   450
Age at first conviction   11     12     13     14     15     16    16     17    18     20    21    23    24    27    30

[Scatterplot: Age (10 to 30) against Testosterone level (400 to 1600), with fitted line y = 31.9 − 0.015x, r² = 0.662]
[Residual plot: Residual (−5 to 5) against Testosterone level (400 to 1600)]


a What is the value of Pearson's correlation coefficient, r?
b Write the equation of the least squares regression line in terms of Testosterone level and Age.
c Interpret the value of r² in terms of Testosterone level and Age.
d Use the residual plot to comment on the linearity of the relationship.
e Construct a scatterplot of Age against log (Testosterone).
f Obtain a residual plot for the new model and comment on the linearity.
g Determine the value of r² for the new model.
h Write down the regression equation for the new model in terms of the variables in the question.

4 Are infant mortality rates in a country related to the number of doctors in a country? The data below give infant mortality rates (deaths per 1000 births) and doctor numbers (per 100 000 people) for 17 countries.

Infant mortality   No. of doctors
12                 192
13                 222
12                 154
14                 294
10                 182
10                 179
7                  204
10                 271
15                 270
85                 9
20                 357
21                 250
54                 79
75                 59
121                27
71                 52
111                61

a Construct a scatterplot of Infant mortality against Number of doctors and comment on the relationship between infant mortality rate and doctor numbers in terms of direction, outliers, form and strength.
b Construct a scatterplot of Infant mortality against log (Number of doctors).
c Obtain a residual plot for the new model and comment on the linearity.
d Determine the value of r² for the new model.
e Write down the regression equation for the new model in terms of the variables in the question.
f Use the regression equation to predict the infant mortality rate when there are 100 doctors (per 100 000).

5 Tree ages can be determined by cutting down a tree and counting the number of rings on the stump of its trunk. This, however, is a destructive process and it would be useful to have a method of working out the approximate age of a tree without having to cut it down. Noting the obvious, that trees tend to get bigger as they get older, we might be able to use some external measurement of size to help us estimate the age of a tree.


The data below show the age (in years) and diameter at chest height (in cm) of a sample of trees of the same species taken from a commercial plantation.

Age (years)   Diameter (centimetres)
4             2.0
5             2.0
8             2.5
8             5.1
8             7.5
10            5.1
10            8.9
12            12.4
13            9.0
14            6.4
16            11.4
18            11.7
22            14.7
25            16.5
29            15.2
30            15.2
34            17.8
38            17.8
40            19.1

a We wish to predict the age of a tree from its diameter at chest height. In this situation, which is the dependent variable and which is the independent variable?
b Construct a scatterplot and comment on the relationship between age and diameter in terms of direction, outliers, form and strength.
c i Fit a linear model to the data and record its equation. Interpret the slope in terms of the problem at hand.
ii Calculate the coefficient of determination and interpret.
iii Form a residual plot and use it to comment on the suitability of modelling the relationship between age and diameter with a straight line.
d Use the x² transformation to linearise the data. Then:
i construct a scatterplot of age against diameter squared
ii find the equation of the least squares regression line for the transformed data
iii calculate the coefficient of determination and interpret
iv form a residual plot and use it to comment on the suitability of modelling the relationship between age and diameter squared with a straight line
