
The Point-Biserial Coefficient of Correlation

The guaranteed fulfillment of the linearity assumption for two groups suggested previously has
considerable theoretical importance within the framework of the general linear model.
Algebraically, this theoretical advantage can be realized by restricting the variability of the
predictor variable by defining it as a binary variable, i.e., a variable that defines two categories,
signified by one and zero. The coefficient of correlation between a (binary) categorical
variable X and a continuous variable Y is called the point biserial coefficient.

Conceptual Definition of the Point Biserial


The dichotomous nature of binary variables allows for the classification of both the X and Y
variables into two categories, with separate ns, means, and variances. Consider the problem of
computing a correlation coefficient between a binary variable X = [0 0 1 1 1], and a continuous
variable Y = [1 2 3 4 5]. The framework for the development of a concise computational
algorithm for this special case is outlined as
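A minimal sketch of that per-category breakdown, using the X and Y vectors from the text. The helper name `category_summary` is hypothetical, and the variance is assumed to use the divisor n (population form), which agrees with the value of 2 used for the variance of Y later in the chapter.

```python
from statistics import mean, pvariance

X = [0, 0, 1, 1, 1]  # binary predictor from the text
Y = [1, 2, 3, 4, 5]  # continuous variable from the text


def category_summary(x, y):
    """Return {category: (n, mean, variance)} for the Y scores in each X category."""
    summary = {}
    for category in (0, 1):
        scores = [yi for xi, yi in zip(x, y) if xi == category]
        # Assumption: variance uses divisor n (population form), not n - 1.
        summary[category] = (len(scores), mean(scores), pvariance(scores))
    return summary


summary = category_summary(X, Y)
for category, (n, m, var) in summary.items():
    print(f"category {category}: n={n}, mean={m}, variance={var:.4f}")
```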

The concept of the point biserial is based on a rendering of the coefficient of correlation as the
slope of a regression line. For each category of a binary predictor variable, the predicted score
can be calculated by the regression-on-categories method as the mean of the scores in the 0 or
the 1 category. The slope of the regression line of the point biserial coefficient of correlation
can be plotted as

The slope of the regression line, B, is the ratio of the opposite and adjacent legs of the triangle.
This slope can be calculated by

$$B = \frac{M_1 - M_0}{1 - 0} = M_1 - M_0$$

where $M_1$ and $M_0$ signify the means of the Y scores corresponding to the 1 and 0 categories,
respectively. The above equation would equal the point biserial coefficient of correlation if the
slope were expressed in standard scores. However, the equation is rendered in obtained scores. To
derive the formula for the point biserial coefficient of correlation we must transform the above
formula from the obtained score form to that of standard score form.
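As an illustrative check on the example data, the difference between the two category means can be compared with an ordinary least-squares slope. The function names below are hypothetical, and the least-squares formula (sum of cross-products over sum of squares of X) is a standard result brought in for comparison rather than something stated in the text.

```python
from statistics import mean

X = [0, 0, 1, 1, 1]
Y = [1, 2, 3, 4, 5]


def slope_from_means(x, y):
    """Slope as rise over run between the two category means."""
    m1 = mean([yi for xi, yi in zip(x, y) if xi == 1])
    m0 = mean([yi for xi, yi in zip(x, y) if xi == 0])
    return (m1 - m0) / (1 - 0)


def least_squares_slope(x, y):
    """Slope of the regression of Y on X in obtained scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


print(slope_from_means(X, Y), least_squares_slope(X, Y))
```

For a binary predictor the two computations agree, which is why the category means can stand in for the predicted scores.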

Derivation of the Point Biserial

To accomplish the necessary modifications of the equation for the slope of the regression line
defined in the above section, let us start with the equation of a regression line in standard score
form

$$\hat{z}_Y = r\,z_X$$

This equation can be expressed in deviation score form, with $x$ and $\hat{y}$ denoting deviations
from the means of X and Y, as

$$\frac{\hat{y}}{\sigma_Y} = r\,\frac{x}{\sigma_X}$$

and simplified to

$$\hat{y} = r\,\frac{\sigma_Y}{\sigma_X}\,x$$

The above equation can be compared with the analytical equation of a line in deviation scores

$$\hat{y} = b\,x$$

Equating the slopes of the analytical and statistical equations of a line

$$b = r\,\frac{\sigma_Y}{\sigma_X}$$

and multiplying both sides of this equation by the standard deviation of the predictor variable X,
the following equation results

$$b\,\sigma_X = r\,\sigma_Y$$

The coefficient of correlation, as isolated from the above equation, can be written as

$$r = \frac{b\,\sigma_X}{\sigma_Y}$$

At this point we can substitute the slope of the regression line $B = M_1 - M_0$, as defined in the
preceding section, for the slope of the regression line b in the above equation, since the slopes of
regression lines in obtained and deviation scores are identical. This results in the equation

$$r = \frac{(M_1 - M_0)\,\sigma_X}{\sigma_Y}$$
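The end point of this derivation, the slope multiplied by the standard deviation of X and divided by the standard deviation of Y, can be checked numerically against the product-moment coefficient. This is a sketch on the example data only; the function names are hypothetical and standard deviations are assumed to use the divisor n.

```python
from math import sqrt
from statistics import mean, pstdev

X = [0, 0, 1, 1, 1]
Y = [1, 2, 3, 4, 5]


def r_from_slope(x, y):
    """Correlation as slope * SD(X) / SD(Y), with the slope taken as M1 - M0."""
    m1 = mean([yi for xi, yi in zip(x, y) if xi == 1])
    m0 = mean([yi for xi, yi in zip(x, y) if xi == 0])
    slope = m1 - m0
    # Assumption: population standard deviations (divisor n).
    return slope * pstdev(x) / pstdev(y)


def pearson_r(x, y):
    """Product-moment correlation computed from sums of squares and cross-products."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


print(round(r_from_slope(X, Y), 4), round(pearson_r(X, Y), 4))
```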

The Point Biserial in the PQ Notation


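We can replace the standard deviation of the predictor variable X, written in sigma notation, with
the variance written in the 'pq' notation, where p and q are the proportions of cases in the 1 and
0 categories, so that $\sigma_X = \sqrt{pq}$. The formula for the point biserial coefficient of
correlation, as derived from Pearson's product-moment coefficient of correlation, is

$$r_{pb} = \frac{(M_1 - M_0)\sqrt{pq}}{\sigma_Y}$$

The point biserial coefficient of correlation can also be written in the form of a coefficient of
determination:

$$r_{pb}^2 = \frac{pq\,(M_1 - M_0)^2}{\sigma_Y^2}$$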
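The 'pq' form can be sketched in a few lines. The function name `point_biserial` is hypothetical, p is taken as the proportion of cases coded 1, and the standard deviation of Y is assumed to use the divisor n. The first check confirms that the variance of a 0/1 variable equals pq.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance

X = [0, 0, 1, 1, 1]
Y = [1, 2, 3, 4, 5]


def point_biserial(x, y):
    """Point biserial correlation in pq notation: (M1 - M0) * sqrt(pq) / SD(Y)."""
    p = sum(x) / len(x)  # proportion of cases coded 1
    q = 1 - p            # proportion of cases coded 0
    m1 = mean([yi for xi, yi in zip(x, y) if xi == 1])
    m0 = mean([yi for xi, yi in zip(x, y) if xi == 0])
    return (m1 - m0) * sqrt(p * q) / pstdev(y)


p = sum(X) / len(X)
variance_matches_pq = abs(pvariance(X) - p * (1 - p)) < 1e-12
r_pb = point_biserial(X, Y)
print(variance_matches_pq, round(r_pb, 4), round(r_pb ** 2, 4))
```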

Computation of the Point Biserial


Let us reconsider the example introduced at the beginning of the chapter

The point biserial coefficient of determination can be computed as $(3/5)(2/5)(4 - 1.5)^2 / 2$,
which equals .75. This result can be verified by standardizing the variance of the predicted scores
as 1.5/2, which, indeed, equals .75.
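Both routes to .75 can be reproduced directly. The sketch below assumes variances with the divisor n and takes the predicted score for each case to be the mean of its category, as described earlier; the variable names are illustrative.

```python
from statistics import mean, pvariance

X = [0, 0, 1, 1, 1]
Y = [1, 2, 3, 4, 5]

p = sum(X) / len(X)  # 3/5, proportion of cases coded 1
q = 1 - p            # 2/5, proportion of cases coded 0
m1 = mean([yi for xi, yi in zip(X, Y) if xi == 1])
m0 = mean([yi for xi, yi in zip(X, Y) if xi == 0])

# Route 1: the pq formula for the coefficient of determination.
r2_formula = p * q * (m1 - m0) ** 2 / pvariance(Y)

# Route 2: variance of the predicted scores (category means) over variance of Y.
predicted = [m1 if xi == 1 else m0 for xi in X]
r2_predicted = pvariance(predicted) / pvariance(Y)

print(round(r2_formula, 4), round(r2_predicted, 4))
```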

Summary
The preferred conceptualization of the point biserial coefficient of correlation is in its
determination form, as

$$r_{pb}^2 = \frac{pq\,(M_1 - M_0)^2}{\sigma_Y^2}$$

The values of the point biserial are numerically equivalent to those that would be obtained
by the product-moment coefficient of correlation computed from the same data.
The point biserial correlation is conceptually important, as it helps to understand the main
principles of the tests of statistical significance, especially how the coefficient of correlation can
be used to measure a difference between two means.
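One way to make this link concrete, offered as an illustration rather than as part of the chapter's derivation: the standard conversion of a correlation to a t statistic, t = r·sqrt(n − 2)/sqrt(1 − r²), can be compared with a pooled-variance two-sample t computed from the two category means. Both formulas and the function names are assumptions introduced here.

```python
from math import sqrt
from statistics import mean

X = [0, 0, 1, 1, 1]
Y = [1, 2, 3, 4, 5]


def pooled_t(x, y):
    """Two-sample t for the difference between category means (pooled variance)."""
    g1 = [yi for xi, yi in zip(x, y) if xi == 1]
    g0 = [yi for xi, yi in zip(x, y) if xi == 0]
    m1, m0 = mean(g1), mean(g0)
    ss1 = sum((v - m1) ** 2 for v in g1)
    ss0 = sum((v - m0) ** 2 for v in g0)
    pooled_var = (ss1 + ss0) / (len(g1) + len(g0) - 2)
    standard_error = sqrt(pooled_var * (1 / len(g1) + 1 / len(g0)))
    return (m1 - m0) / standard_error


def t_from_r(x, y):
    """t statistic obtained from the product-moment correlation of X and Y."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    r = sxy / sqrt(sxx * syy)
    return r * sqrt(len(x) - 2) / sqrt(1 - r ** 2)


print(round(pooled_t(X, Y), 4), round(t_from_r(X, Y), 4))
```

On the example data the two statistics coincide, which is the sense in which the correlation measures a difference between two means.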
