Data Transformation by Andy Field

CHAPTER 5 EXPLORING ASSUMPTIONS 155
TABLE 5.1 Data transformations and their uses

Data transformation: Log transformation (log(X_i))
Can correct for: Positive skew, unequal variances
Taking the logarithm of a set of numbers squashes the right tail of the distribution. As such it’s a good way to reduce positive skew. However, you can’t take the log of zero or of a negative number, so if your data tend to zero or produce negative numbers you need to add a constant to all of the data before you do the transformation. For example, if you have zeros in the data then do log(X_i + 1), or if you have negative numbers add whatever value makes the smallest number in the data set positive.

Data transformation: Square root transformation (√X_i)
Can correct for: Positive skew, unequal variances
Taking the square root of large values has more of an effect than taking the square root of small values. Consequently, taking the square root of each of your scores will bring any large scores closer to the centre, rather like the log transformation. As such, this can be a useful way to reduce positive skew; however, you still have the same problem with negative numbers (negative numbers don’t have a square root).

Data transformation: Reciprocal transformation (1/X_i)
Can correct for: Positive skew, unequal variances
Dividing 1 by each score also reduces the impact of large scores. The transformed variable will have a lower limit of 0 (very large numbers will become close to 0). One thing to bear in mind with this transformation is that it reverses the scores: scores that were originally large in the data set become small (close to zero) after the transformation, but scores that were originally small become big after the transformation. For example, imagine two scores of 1 and 10; after the transformation they become 1/1 = 1 and 1/10 = 0.1: the small score becomes bigger than the large score after the transformation. However, you can avoid this by reversing the scores before the transformation, by finding the highest score and changing each score to the highest score minus the score you’re looking at. So, you do a transformation 1/(X_highest - X_i).

Data transformation: Reverse score transformations
Can correct for: Negative skew
Any one of the above transformations can be used to correct negatively skewed data, but first you have to reverse the scores. To do this, subtract each score from the highest score obtained, or the highest score + 1 (depending on whether you want your lowest score to be 0 or 1). If you do this, either reverse the scores back afterwards or remember that the interpretation of the variable is reversed: big scores have become small and small scores have become big!
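To make the four rows of Table 5.1 concrete, here is a minimal Python sketch of each transformation. It is an illustration only, not the book's own procedure: the function names, the `constant`, `lowest` and `keep_order` parameters, and the choice of base-10 logs are all assumptions made for the example. One further assumption deserves a flag: 1/(X_highest - X_i) is undefined for the highest score itself (it divides by zero), so the order-preserving reciprocal below reverses with the highest score + 1 instead.

```python
import math


def log_transform(scores, constant=0.0):
    """Return log10(x + constant) for every score.

    `constant` (hypothetical parameter) is added first, e.g. 1 when the
    data contain zeros, because the log of 0 or a negative is undefined.
    """
    shifted = [x + constant for x in scores]
    if min(shifted) <= 0:
        raise ValueError("log needs positive values; add a larger constant")
    return [math.log10(x) for x in shifted]


def sqrt_transform(scores, constant=0.0):
    """Return sqrt(x + constant) for every score."""
    shifted = [x + constant for x in scores]
    if min(shifted) < 0:
        raise ValueError("square root needs non-negative values; add a larger constant")
    return [math.sqrt(x) for x in shifted]


def reverse_scores(scores, lowest=0):
    """Subtract each score from the highest score (lowest=0),
    or from the highest score + 1 (lowest=1)."""
    top = max(scores) + lowest
    return [top - x for x in scores]


def reciprocal_transform(scores, keep_order=False):
    """Return 1/x for every score.

    With keep_order=True the scores are reversed first (using highest + 1,
    an assumption that avoids dividing by zero), so originally large
    scores are still the large ones after the transformation.
    """
    if keep_order:
        scores = reverse_scores(scores, lowest=1)
    if any(x == 0 for x in scores):
        raise ValueError("reciprocal is undefined for a score of 0")
    return [1.0 / x for x in scores]


if __name__ == "__main__":
    skewed = [0, 1, 2, 3, 50]  # made-up positively skewed scores
    print(log_transform(skewed, constant=1))
    print(sqrt_transform(skewed))
    print(reciprocal_transform([1, 10]))                   # order flips
    print(reciprocal_transform([1, 10], keep_order=True))  # order kept
    print(reverse_scores([1, 4, 10], lowest=1))
```

For negatively skewed data you would call `reverse_scores` first and then apply `log_transform` or `sqrt_transform` to the result, bearing in mind that the interpretation of the transformed variable is then reversed.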
JANE SUPERBRAIN 5.1
To transform or not to transform, that is the question

Not everyone agrees that transforming data is a good idea; for example, Glass, Peckham, and Sanders (1972) in a very extensive review commented that ‘the payoff of normalizing transformations in terms of more valid probability statements is low, and they are seldom considered to be worth the effort’ (p. 241). In which case, should we bother?

The issue is quite complicated (especially for this early in the book), but essentially we need to know whether the statistical models we apply perform better on transformed data than they do when applied to data that violate the assumption that the transformation corrects. If a statistical model is still accurate even when its assumptions are broken it is said to be a robust test (section 5.7.4). I’m not going to discuss whether particular tests are robust here, but I will discuss the issue for particular tests in their respective chapters. The question of whether to transform is linked to this issue of robustness (which in turn is linked to what test you are performing on your data).

A good case in point is the F-test in ANOVA (see Chapter 10), which is often claimed to be robust (Glass et al., 1972). Early findings suggested that F performed as it should in skewed distributions and that transforming the data helped as often as it hindered the accuracy of F (Games & Lucas, 1966). However, in a lively
