The guaranteed fulfillment of the linearity assumption for two groups suggested previously has
considerable theoretical importance within the framework of the general linear model.
Algebraically, this theoretical advantage can be realized by defining the predictor variable as a
binary variable, i.e., a variable that defines two categories, coded 1 and 0. The coefficient of
correlation between a binary categorical variable X and a continuous variable Y is called the
point biserial coefficient.
The concept of the point biserial is based on a rendering of the coefficient of correlation as the
slope of a regression line. For each category of a binary predictor variable, the predicted score
given by the regression on categories method is the mean of the Y scores in that category (0 or 1).
The regression line therefore passes through the two category means, at X = 0 and X = 1, and can be
plotted as the hypotenuse of a right triangle.
The slope of the regression line, B, is the ratio of the opposite leg of this triangle (the
difference between the two means) to its adjacent leg (the distance between the two categories).
This slope can be calculated by
$$B = \frac{M_1 - M_0}{1 - 0} = M_1 - M_0$$
where $M_1$ and $M_0$ signify the means of the Y scores in the 1 and 0 categories, respectively.
The above equation would equal the point biserial coefficient of correlation if the slope were
expressed in standard scores. However, the equation is rendered in obtained scores. To derive the
formula for the point biserial coefficient of correlation, we must transform the above formula
from obtained score form to standard score form.
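The claim that the slope for a 0/1 predictor is simply the difference between the two category means can be checked numerically. The following is a minimal sketch, not part of the original text: the data set and the function names (`slope_from_means`, `least_squares_slope`) are made up for illustration. It compares the difference of means with the ordinary least-squares slope in obtained scores.

```python
def mean(values):
    return sum(values) / len(values)


def slope_from_means(x, y):
    """Slope B as rise over run: (M_1 - M_0) / (1 - 0)."""
    y1 = [yi for xi, yi in zip(x, y) if xi == 1]
    y0 = [yi for xi, yi in zip(x, y) if xi == 0]
    return (mean(y1) - mean(y0)) / (1 - 0)


def least_squares_slope(x, y):
    """Ordinary least-squares slope of Y on X in obtained scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data: X is the binary predictor, Y the continuous variable.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [2, 4, 3, 3, 6, 8, 7, 7]

B = slope_from_means(x, y)
b = least_squares_slope(x, y)
print(B, b)  # prints: 4.0 4.0
```

For these made-up scores the category means are 3 and 7, so both computations give a slope of 4.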
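To accomplish the necessary modification of the equation for the slope of the regression line
defined in the above section, let us start with the equation of a regression line in standard score
form
$$z_{Y'} = r\, z_X$$
which can be written in deviation scores as
$$\frac{y'}{s_Y} = r\, \frac{x}{s_X}$$
and simplified to
$$y' = r\, \frac{s_Y}{s_X}\, x$$
where $x$ and $y'$ are deviation scores and $s_X$ and $s_Y$ are the standard deviations of X and Y.
The above equation can be compared with the analytical equation of a line in deviation scores
$$y' = b\, x$$
so that $b = r\, s_Y / s_X$. Multiplying both sides of this equation by the standard deviation of
the predictor variable X, the following equation results
$$b\, s_X = r\, s_Y$$
The coefficient of correlation, as isolated from the above equation, can be written as
$$r = \frac{b\, s_X}{s_Y}$$
At this point we can substitute the slope of the regression line B, defined in the preceding
section as the difference $M_1 - M_0$ between the means of the 1 and 0 categories, for the slope of
the regression line b in the above equation, since the slopes of regression lines in obtained and
deviation scores are identical. This results in the equation
$$r_{pb} = \frac{(M_1 - M_0)\, s_X}{s_Y}$$
The point biserial coefficient of correlation can also be written in the form of a coefficient of
determination:
$$r_{pb}^2 = \frac{(M_1 - M_0)^2\, s_X^2}{s_Y^2}$$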
Summary
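The preferred conceptualization of the point biserial coefficient of correlation is in its
determination form, as
$$r_{pb}^2 = \frac{(M_1 - M_0)^2\, s_X^2}{s_Y^2}$$
where $M_1$ and $M_0$ are the means of the Y scores in the 1 and 0 categories, and $s_X$ and $s_Y$
are the standard deviations of X and Y.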
The values of the point biserial are numerically equivalent to those that would be obtained
from the product moment coefficient of correlation computed on the same data.
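This numerical equivalence can be illustrated with a short sketch. It assumes the point biserial takes the form r_pb = (M_1 - M_0) s_X / s_Y, which follows from substituting the difference of means for the slope in r = b s_X / s_Y. The data and the function names are hypothetical, and both standard deviations are computed with N in the denominator; any convention works as long as it is the same for X and Y.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pop_sd(values):
    """Standard deviation with N in the denominator."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def point_biserial(x, y):
    """r_pb = (M_1 - M_0) * s_X / s_Y for a predictor coded 0/1."""
    m1 = mean([yi for xi, yi in zip(x, y) if xi == 1])
    m0 = mean([yi for xi, yi in zip(x, y) if xi == 0])
    return (m1 - m0) * pop_sd(x) / pop_sd(y)


def product_moment(x, y):
    """Product moment coefficient computed from sums of squares and products."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: X is the binary predictor, Y the continuous variable.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [2, 4, 3, 3, 6, 8, 7, 7]

r_pb = point_biserial(x, y)
r_xy = product_moment(x, y)
r_squared = r_pb ** 2  # determination form

print(round(r_pb, 4), round(r_xy, 4), round(r_squared, 4))
```

The two coefficients agree up to floating-point rounding, and squaring either one gives the determination form.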
The point biserial correlation is conceptually important, as it helps to understand the main
principles of the tests of statistical significance, especially how the coefficient of correlation can
be used to measure a difference between two means.
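One way to see the link between the correlation and a difference between two means is to compare two test statistics on the same data. The sketch below is an illustration under stated assumptions, not a procedure taken from the text: it uses the standard conversion t = r sqrt(N - 2) / sqrt(1 - r^2) and the pooled-variance t statistic for two independent groups, with made-up data and hypothetical function names.

```python
import math


def mean(values):
    return sum(values) / len(values)


def product_moment(x, y):
    """Product moment coefficient of correlation between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic for a correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def pooled_t(x, y):
    """Two-group t statistic for M_1 - M_0 using the pooled variance."""
    g1 = [yi for xi, yi in zip(x, y) if xi == 1]
    g0 = [yi for xi, yi in zip(x, y) if xi == 0]
    m1, m0 = mean(g1), mean(g0)
    ss_within = sum((v - m1) ** 2 for v in g1) + sum((v - m0) ** 2 for v in g0)
    pooled_var = ss_within / (len(g1) + len(g0) - 2)
    std_error = math.sqrt(pooled_var * (1 / len(g1) + 1 / len(g0)))
    return (m1 - m0) / std_error


# Hypothetical data: X is the binary predictor, Y the continuous variable.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [2, 4, 3, 3, 6, 8, 7, 7]

t_r = t_from_r(product_moment(x, y), len(x))
t_means = pooled_t(x, y)
print(round(t_r, 4), round(t_means, 4))
```

Both routes give the same t value up to rounding, which is the sense in which testing the point biserial coefficient amounts to testing the difference between the two category means.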