
Job Evaluation and Gender:

The Case of University Faculty

JIM SIDANIUS
Departments of Politics and Psychology, New York University

MARIE CRANE
Department of Sociology, University of Texas at Austin

This study examined the effects of students' and professors' sex on student
evaluations of professors' teaching effectiveness. Ratings of over four hundred
faculty made by over nine thousand students were analyzed. After controlling for a
large number of variables, the main results showed that (a) male faculty were given
significantly higher evaluations on global teacher effectiveness and academic
competence than female faculty; (b) when controlling for extraneous variables,
female faculty were not found to be rated as more sensitive to student needs than
male faculty; and (c) when making overall, global judgments of faculty performance,
students seem to place more weight on academic competence for male faculty than
for female faculty.

Evaluations of employees' performance are critical to employees' success
within work organizations. Favorable evaluations can be viewed as a limited
resource that is essential for the attainment of other resources.
Performance evaluations are used to make decisions about
terminations, promotions, and salary increases.
If there are gender differences in performance evaluations, gender
inequality in the labor force will persist despite other efforts to diminish
discrimination such as affirmative action in hiring. Changes in the structure
of the labor market, and in particular, an increase in the number of women
in occupations previously dominated by men, make it important to
determine whether there are systematic differences in the evaluations made
of men and women who perform the same job.
A number of studies have examined whether the comparable
achievements of women are evaluated less favorably than those of men. The
results appear to be mixed. An early and well-known investigation of this

Requests for reprints should be sent to Professor Jim Sidanius, University of California, Los
Angeles, Department of Psychology, 405 Hilgard Avenue, Los Angeles, CA 90024-1563.


Journal of Applied Social Psychology, 1989, 19, 2, pp. 174-197.


Copyright © 1989 by V. H. Winston & Sons, Inc. All rights reserved.

topic (Goldberg, 1968) showed that college women who judged the quality
of articles (which one half thought were written by women and the other
half thought were written by men) gave higher ratings to the work attributed
to men (see also McKee & Sherriffs, 1957). Similarly, Fidell (1975) reports
that college and university faculty who evaluated descriptions of
hypothetical psychologists rated the men more favorably than the women.
Deaux and Emswiller (1974) showed that male performance on a perceptual
discrimination task was rated as more skillful than the equivalent female
performance. Taynor and Deaux (1973) revealed that raters evaluated the
response to an emergency situation made by a male as more appropriate and
logical than the female's identical response.
Still other research reports more favorable evaluations of women's
performance than men's. Jacobson and Effertz (1974) conducted a study in
which some subjects (the leaders) instructed others how to complete a
task. They conclude that women in a leadership position are perceived, and
perceive themselves, as doing a better job than male leaders, by male and
female subjects, given that the actual performance of the male and female
leaders is equivalent. Hamner, Kim, Baird, and Bigoness (1974) carried out
an investigation in which subjects were asked to take on the role of a grocery
store manager and assess the performance of job applicants stocking
shelves. Female applicants were seen as having a higher overall level of task
performance than male applicants. Abramson, Goldberg, Greenberg, and
Abramson (1977), whose research produced similar findings (women
attorneys and legal assistants were judged to be more vocationally
competent than men), call this effect the talking platypus phenomenon.
The term describes an instance in which an individual achieves a level of
success not anticipated and the achievement is magnified rather than
diminished; it matters little what the platypus says, the wonder is that it can
say anything at all. Two studies of actual employees in actual organizational
settings find evidence of this phenomenon. Although there is evidence in
the literature which suggests that, at least in certain situations, women can
receive higher performance evaluations than men (see e.g., Peters et al.,
1984) there is also evidence to indicate that success for men will tend to be
attributed to competence while success for women will tend to be attributed
to luck (see Etaugh & Brown, 1975; Wiener et al., 1971).
Although some studies have revealed biases favoring males and others
have found biases favoring females, other studies tend to find no effect of
gender on evaluations at all (see e.g., Elmore & LaPointe, 1975; Harris,
1975). Hall and Hall (1976) conducted research in which student subjects
evaluated the performance of a hypothetical manager who effectively solved
a problem. They found that the gender of job incumbents did not affect
performance appraisals. Frank and Drucher (1977) trained undergraduate

industrial or applied psychology students to evaluate an in-basket response.2
Their results indicated that the evaluatee's gender did not influence
performance evaluations (see Bayer & Astin, 1975; Carter & Ruther, 1975;
Dipboye & Wiley, 1977; Ekstrom, 1979; Mischel, 1974; Nevil, Stephenson &
Philbrick, 1983; Peters et al., 1984; Pheterson, Kiesler & Goldberg, 1971;
Tuchman, 1975; Benokratis & Feagin, 1986).
Furthermore, despite earlier evidence indicating that
success for men will tend to be attributed to competence whereas success for
women will tend to be attributed to luck (see Etaugh & Brown, 1975;
Wiener et al., 1971), the results of more recent meta-analyses have begun to
cast some doubt on these conclusions as well (see Frieze, Whitely, Hanusa
& McHugh, 1982; Sohn, 1982; Hyde & Linn, 1986).
There are several possible explanations for these apparently inconsistent
findings. One is that these studies differ in terms of the amount of
information about the evaluatee that is made available to the evaluator.
Several researchers suggest that the probability of finding a gender effect is
negatively related to the amount of information evaluators have about
evaluatees (Hall & Hall, 1976; Terborg and Ilgen, 1985; Locksley, Borgida,
Brekike and Hepburn, 1980). The reason is that tbe salience of gender is
greater in the absence of other information and is therefore more likely to
affect judgments. Rosen and Jerdee (1973) used a design similar to that of
Hamner et al. (1974) but provided less information about the evaluatee.
Rosen and Jerdee found a pro-male bias in evaluation; Hamner et al. report
no gender effect.
A second factor that is neglected in most of this research is the extent to
which evaluatees' behavior violates stereotypes regarding gender-
appropriate behavior. There is an extensive body of research showing that
men and women are viewed as having different traits. Just as we have
stereotypes about individuals, we have stereotypes about jobs as a function
of gender. How are judgments about the quality of performance influenced
by the extent to which the gender of the evaluatee corresponds to
stereotypes about jobs? Although typically people respond more favorably
to individuals whose behaviors correspond with prevailing stereotypes, the
research on performance evaluations reveals that there are exceptions. It
appears that when women perform activities typically performed by men
they are either evaluated more favorably than men (Bigoness, 1976; Hamner
et al., 1974; Taynor & Deaux, 1973, 1975) or the same (Hall & Hall, 1976).
This is particularly true when the women are competent and when

2The in-basket, which contains letters, memoranda, telephone messages, and so forth,
provides a situational test that attempts to simulate the important aspects of the job of a
manager.

evaluators have extensive information about evaluatees. If men are expected
to perform better than women, and they perform equally, men's
achievements are undervalued because of the discrepancy between expected
and actual performance. Alternatively, greater credit may be given to
women for harder work.

The Case of Teaching Evaluations

Teaching evaluations, like other kinds of performance appraisals, are
used to make important decisions about the distribution of occupational
rewards. Several recent studies have examined the effects of teachers' sex
(and, in many cases, the effect of students' sex). The results of this research
are mixed: Some reveal a pro-female bias, some a pro-male bias, and some
no effect at all (see Wilson & Doyle, 1976; Basow & Silverberg, in press;
Centra, 1973a, 1973b; Elliot, 1950; Heilman & Armentrout, 1936; Lovell &
Haner, 1955; Unger, 1979; see also McKeachie, 1979 and Overall and
Marsh, 1982 for excellent reviews). Kaschek (1978) reported laboratory
research revealing that female students gave comparable ratings to
professors of both sexes, but male students evaluated male professors more
favorably than female professors (i.e., male students assigned particularly
negative ratings to female professors thought to be hard graders). Bennett's
(1982) study of 253 students enrolled in nonscience courses at a small liberal
arts college could find little evidence of direct bias in the formal evaluations
of female faculty members. However, she did find evidence that students
expected female faculty members to be warmer than male faculty members.
Martin (1984) compared the evaluations of nine teachers made by 394
students and found no overall difference in evaluations of professors as a
function of teacher's sex. She did, however, find a tendency for students to
rate same-sex teachers more favorably than other-sex professors. Several
other studies also report this effect (Aleamoni, 1981; Centra, 1981; Bray and
Howard, 1980). In subsequent research Kaschek (1981) found that the sex
bias evidenced by male students diminished when all teachers (male and
female) were described as award-winning. Most recently, Basow and Silberg
(1987) compared evaluations made of male and female professors along five
dimensions. Differences revealing a pro-male bias were found on only one
dimension, namely interpersonal interaction.
Much of the research examining the effect of teachers' sex on evaluations
suffers from several problems. The naturalistic studies are limited in scope:
(a) They present comparisons of the ratings of a few male professors to
those of a few female professors, and (b) they fail to consider other factors
that have been shown to influence evaluations. The experimental studies are
somewhat artificial, sometimes asking subjects to rate hypothetical teachers,

or teachers they have read about rather than teachers they have had as
students (see also review by Marsh, 1984).
The aim of our research is to assess the effect of sex on performance
evaluations in a more comprehensive manner than has been done in the
past. We will attempt to examine the effects of teachers' sex by correcting
for the specification errors inherent in earlier research and using several
thousand naturally occurring and formal performance evaluations. We will
ask four specific questions regarding gender effects on evaluations. The first
question addresses the possibility of an effect of teacher's sex on evaluations,
when the effects of other determinants of teaching evaluations are
controlled for and a larger sample of evaluations is considered. The
second, and related, question concerns the possibility of an interaction effect
of student's and teacher's sex. The third question concerns the effect of the
violation of gender stereotypes. This will be done by examining the
evaluations of women in departments which are overwhelmingly dominated
by men. Finally, we will compare the determinants of overall or global
evaluations for male and female professors. We will compare the impact of
control variables, gender-related variables, and specific aspects of teaching
effectiveness on global evaluations of male and female professors.

Method

Procedure

The data were collected at a major public university in the American
Southwest. Each semester the Measurement and Evaluation Center of this
university conducts teaching evaluations. These evaluations are made by
students and are a required part of every teacher's official dossier, to be
considered with other documents in questions of promotion and tenure.
Teacher evaluation data from the spring semester, 1985, were used. The
specific instrument employed is known as the Course-Instructor Survey:
General Questionnaire and is the major evaluation instrument used by this
university's administration.

Subjects

The subjects consisted of 9005 undergraduates from this university. There
were 4662 female students and 4241 male students, with 102 not indicating
their sex. Four hundred and one university instructors participated, 254 of
whom were male and 147 of whom were female.

Dependent Variables

Previous studies of teaching evaluations tend to indicate that there are
approximately four major dimensions of teacher evaluation: (1)
Competence, (2) Sensitivity to Students, (3) Fairness in Assignments and
Grades, and (4) Course Expectations (see Frey, 1978; Overall & Marsh, 1982).
Furthermore, the literature also indicates that overall, global evaluation
tends to be predominantly driven by the instructor's rated Competence and
Sensitivity to Students, or empathy (see Anderson, Alpert & Golden, 1977).
As a result of these empirical findings and the fact that we are particularly
interested in sex differences, we will concentrate on the two dimensions
most closely associated with sex-role stereotypes and which have also been
shown to be most critically related to overall evaluation, namely Competence
and Sensitivity to Students. Altogether, we will examine three dependent
variables: (a) Global Evaluation, (b) Competency, and (c) Sensitivity to
Students.
Following the administrative routine of the University, one of the
primary dependent variables in this study was the student's overall judgment
of the instructor's effectiveness. The university assesses this variable by
asking the following question of students in its survey: "Compared with all
instructors I have had, both in high school and college, this instructor was:
(a) One of the best, (b) Above average, (c) Average, (d) Below average, and
(e) Far below average." This variable will hereafter be referred to as Global
Evaluation.
Inspection of the items in the Course-Instructor Survey: General
Questionnaire disclosed several items which seemed to tap Competency and
Sensitivity to Students. Five items from the scale were used to operationalize
Competency and three were used to operationalize Sensitivity to Students.

Reliability and Validity of Dependent Variable Measures

The validity and reliability of these measures of teacher effectiveness
were ascertained by the use of face and construct validity. The fact that the
global evaluation index is routinely used by this and many other universities
as a standard part of the faculty evaluation process makes it particularly
interesting for study. However, we will also empirically investigate the
index's construct validity by relating it to the two multi-item scales of
competency and sensitivity to students. The reliability and construct validity of
all three teaching effectiveness indices were simultaneously investigated by
submitting them to a LISREL VI measurement and structural equation
model.
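The original analysis was run in LISREL VI. As a rough, hypothetical sketch of the same measurement and structural model in Python, using the semopy package as a stand-in for LISREL (the item names comp1-comp5 and sens1-sens3, the variable global_eval, and the file ratings.csv are placeholders, not the actual names in the data set):

```python
# Hypothetical sketch: two latent exogenous dimensions (Competency,
# Sensitivity to Students) measured by eight items, with the single-item
# Global Evaluation regressed on both latent continua.
import pandas as pd
import semopy

ratings = pd.read_csv("ratings.csv")   # placeholder for the survey data

model_desc = """
Competency  =~ comp1 + comp2 + comp3 + comp4 + comp5
Sensitivity =~ sens1 + sens2 + sens3
global_eval ~ Competency + Sensitivity
"""

model = semopy.Model(model_desc)
model.fit(ratings)                     # maximum likelihood estimation

print(model.inspect())                 # factor loadings and structural paths
print(semopy.calc_stats(model))        # overall fit indices
```

The loadings correspond to the measurement part of the model and the two structural coefficients to the gammas reported below.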

The coefficients of the model were estimated by use of maximum
likelihood criteria. The results of the measurement model aspects of the
analysis disclosed that the dimensions of competency and sensitivity were
measured with a good deal of reliability. The total coefficient of
determination of the exogenous variables was 0.95.3
The construct validity of the single-item measure global evaluation can be
ascertained by examining the results of the structural equation part of the
model. Here, global evaluation was examined as a function of the two latent
exogenous dimensions, competency and sensitivity to students. The results
showed a good deal of construct validity for this index. The two multi-index,
latent continua of teacher effectiveness could together explain more than
62% of the variance of global evaluation (i.e., as measured by the total
coefficient of determination for the structural equations). Of the two
factors, competency was found to be a more important determinant of global
evaluation than sensitivity to students (i.e., gamma = .44 vs. gamma = .29,
respectively).4 The overall goodness of fit of the entire model was relatively
high (goodness of fit = .94; adjusted goodness of fit = .90), indicating that
the model's overall consistency with the empirical data was quite acceptable.
The items defining competency and sensitivity to students, together with their
factor loadings from the LISREL analysis, are found in Table 1.

Results

Simple Analyses of Sex and Performance Evaluations

Most previous studies of sex and performance evaluations have done
simple bivariate analyses of data, or at best have controlled for one or two
exogenous variables, usually the sex of the evaluator. We will begin our
substantive analyses at the same point and ask: "Is the rated Global
3The coefficient of determination is a generalized measure of reliability for the whole
measurement model. It indicates how well the exogenous measured variables serve as
instruments for the measurement of the exogenous latent dimensions of [competency] and
[sensitivity to students]. See Joreskog & Sorbom, 1986, for more details.
4Despite the fact that the global evaluation index is routinely used by the University
administration as a central measure of teacher effectiveness, it is well known that single-item
variables generally have weaker reliabilities than multi-item measures. However, it is also well
known that the correlation of any variable with any other(s) is restricted by that variable's
reliability (i.e., the maximum correlation of a variable is limited by the square root of its
reliability; see Magnusson, 1961; Nunnally, 1978). Therefore, an estimate of the [minimum]
reliability of this single-index variable is simply the percent of its variance which is explained by
any other variable(s). In this case, this is 62%, a value considered quite adequate for research
purposes.
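The bound invoked in footnote 4 can be written out explicitly; a brief restatement in symbols, where $r_{XX}$ is the reliability of the single-item Global Evaluation index and $r_{XY}$ its correlation with any other variable:

$$
r_{XY} \;\le\; \sqrt{r_{XX}\, r_{YY}} \;\le\; \sqrt{r_{XX}}
\quad\Longrightarrow\quad
r_{XX} \;\ge\; r_{XY}^{2} .
$$

The same logic extends to the squared multiple correlation, so with the two latent dimensions jointly explaining $R^{2} = .62$ of Global Evaluation, .62 is a lower bound on the reliability of the single-item index.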
Table 1

Maximum Likelihood Loadings of Eight Instructor Evaluation Indicators


Upon the Latent Continua of Competency and Sensitivity To Students

Indicator Loading

Competency
The instructor showed a scholarly grasp of the course material. .82
The instructor seemed well prepared for lecture or discussion. .78
The instructor showed confidence before the class. .74
The instructor used clear, relevant examples. .66
The instructor kept the lectures and class discussions focused
on the subject of the course. .69

Sensitivity To Students
The instructor seemed to be sensitive to the feelings and
needs of the students. .76
The instructor usually seemed to be aware of whether the class
was following the presentation with understanding. .75
The instructor made me feel free to ask questions, disagree,
and express my ideas. .75

Evaluation, Competency, and Sensitivity to Students of university instructors
significantly related to the instructor's and the student's sex?" To answer
this question, three simple, two-way ANOVAs were performed, one for
each dependent variable.5 For each analysis the instructor's and the student's
sex were used as independent variables.
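A minimal sketch of one such two-way analysis in Python with statsmodels (the column names global_eval, instr_sex, and student_sex and the file ratings.csv are hypothetical placeholders); Type II sums of squares correspond roughly to the classic approach described in footnote 6, each main effect being adjusted for the other:

```python
# Hypothetical sketch of one of the three two-way ANOVAs
# (instructor sex x student sex) on a single dependent variable.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

ratings = pd.read_csv("ratings.csv")            # placeholder data file

# Full factorial model for Global Evaluation.
model = ols("global_eval ~ C(instr_sex) * C(student_sex)", data=ratings).fit()

# Type II SS: each main effect adjusted for the other main effect;
# the interaction is tested over and above both main effects.
print(sm.stats.anova_lm(model, typ=2))
```

The same call, with the competency or sensitivity score as the outcome, covers the other two dependent variables.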
The results showed that the instructor's gender was related to
evaluation in all three cases. Female instructors received significantly lower
Global Evaluation and Competency ratings than male instructors (F = 35.62,
p < .001 and F = 23.27, p < .001, respectively), while male instructors received
significantly lower ratings on Sensitivity to Students than female instructors
(F = 39.12, p < .001).6

5Factor scores for [Competency] and [Sensitivity to Students] were calculated as weighted
sums of the factor loadings found in the measurement model.
6The classic experimental analytic approach was used, in which the main effects of instructor
and student sex were each evaluated while controlling for the effects of the other, and the
interaction term is assessed after the main effects are controlled. No adjustment for interaction
is made while assessing main effects.

There was only one significant instructor-sex by student-sex interaction
effect. This effect showed that, after controlling for instructor and student
sex, female students experienced female instructors as being relatively
sensitive to students while male students experienced female instructors as
being relatively insensitive to students.
However, before one comes to any definitive conclusions about
performance evaluations as a function of sex, there are many other factors to
be considered such as expected grade in the course, faculty rank, the reason
the student is taking the course, etc. It is to these more complicated
questions that we will now turn.

Multivariate Analyses

The multivariate analyses controlled for the following factors: 1) the
student's gender (male coded 1, female coded -1); 2) the student's
expected grade in the course; 3) the student's grade point average; 4)
whether the course was elective or required; 5) the instructor's academic
area (e.g., Life Sciences, Humanities, Social Sciences, or Business); 6) the
instructor's rank; 7) the number of students in the class (in both linear and
quadratic aspects7); and 8) the percentage of women teaching in the
instructor's department.
The data were analyzed by use of hierarchical regression analyses where
the exogenous variables (or covariates) were entered into the regression
equation first, followed by the instructor's sex and subsequently a number of
interaction terms involving the instructor's sex. All of the covariates were
entered into the equation as a single block.
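A compressed, hypothetical sketch of this blockwise strategy with statsmodels (the variable names are invented stand-ins for the covariates listed above, and only the first two entry steps are shown):

```python
# Hypothetical sketch of the hierarchical (blockwise) regression:
# the covariate block is entered first, then instructor sex, and the
# increment in R is tested with an F-test on the nested models.
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

ratings = pd.read_csv("ratings.csv")            # placeholder data file

covariates = ("student_sex + expected_grade + gpa + elective + "
              "C(academic_area) + C(rank) + n_students + I(n_students**2) + "
              "pct_women_faculty")

# Step 1: covariates only.
step1 = ols("global_eval ~ " + covariates, data=ratings).fit()

# Step 2: add instructor sex (coded 1 = male, -1 = female, as in the text).
step2 = ols("global_eval ~ " + covariates + " + instr_sex", data=ratings).fit()

print("R after step 1:", np.sqrt(step1.rsquared))
print("R after step 2:", np.sqrt(step2.rsquared))
print(anova_lm(step1, step2))                   # F-test for the added predictor
```

Subsequent steps simply append the relevant interaction terms to the formula in the same way.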

Global Evaluations

The results showed that, as a group, the covariates were significantly
related to Global Evaluation (R = .274, p < 10⁻⁴). The covariates which were
found to make significant and unique contributions to the prediction of
Global Evaluation were (a) the student's expected grade. The higher the student
expected his/her grade to be, the better the rating given to the instructor; (b)
the student's grade point average. The better the student's GPA, the lower the
rating given the instructor; (c) the instructor's rank. Associate and Full
Professors tended to get relatively high ratings while Lecturers, Teaching
Assistants and Instructors tended to receive relatively low ratings; (d) the
number of students in the class. Instructors of very small and very large

7The quadratic aspects were also considered because it has been found that evaluations are,
in part, a curvilinear function of the number of students in the class.

classes tended to get relatively high global ratings while instructors of
moderate-sized classes tended to get relatively poor ratings; and (e) the
percentage of women faculty in the instructor's department. The larger the
percentage of women teaching in a given department, the better the Global
Evaluation of the instructor (see Table 2). The general trend of these
relationships was consistent with what has been reported in the literature
before (see McKeachie, 1979, and Overall & Marsh, 1982, for reviews).

Table 2

Hierarchical Regression of Global Evaluation on Independent Variables


and Their Interactions

Entry
Step Variables Beta B R

1 Students sex .004 .003


Expected grade .215 .255***
Grade point average -.075 -.063***
Student rank .004 .003
Elective vs. required
course .002 .004
Life Sciences -.010 -.059
Humanities -.062 -.114***
Social Sciences .022 .100
Associate Professor -.083 .119***
Assistant Professor -.007 -.011
Lecturers -.073 -.117***
Instructors -.019 -.031
Teaching assistants -.045 -.105
Number of students -.351 -.002***
Number of students² .418 .000***
Percent women faculty .085 .007***
.270***
2 Instructors sex .050 .051***
.274***
3 Students sex X
Teachers sex .011 .011
.274***
4 Instructors sex X
Percent women faculty .003 .000
.274***

Table 2 (Continued)
Hierarchical Regression of Global Evaluation on Independent Variables
and Their Interactions

Entry
Step Variables Beta B R
5 Teachers sex X
Associate professor .M8 .m*
Teachers sex X
Assistant professor .008 .012
Teachers sex X
Lecturer -.149 -.240***
Teachers sex X
Instructor -.044 -.074**
Teachers sex X
Teaching assistant .046 .110
.293***
6 Competency .420 .349***
Sensitivity to students .392 .241***
.742***
7 Competency
X Students sex -.007 -.001
Sensitivity to students
X Students sex -.004 -.000
.742***
8 Competency
X Teachers sex .446 .053***
Sensitivity to students
X Teachers sex -.088 -.011
.743***

By Entry Beta is meant the standardized beta coefficient at the
time the independent variable is first entered into the regression
equation.
*p ≤ .05; **p ≤ .01; ***p ≤ .001.

The second stage of the hierarchical regression analyses entered the
instructor's gender. The results showed a small yet statistically significant
increase (p < 10⁻⁴) in the multiple correlation coefficient (see Table 2). The
nature of this increase indicated that male faculty were given significantly

higher Global Evaluation ratings than women faculty, even when controlling
for the covariates above.
Continued additions to the equation indicated no significant Instructor
sex x student sex or Instructor sex x Percent women faculty interactions
(stages 3 and 4 respectively). This is to say that there was no evidence that
the differences in the performance evaluations male and female instructors
received depended upon the sex of the student. Likewise, there was also no
evidence to indicate that the differences in the performance evaluations that
male vs. female instructors received depended upon the number of female
instructors within a given department. This factor was tested because it is
conceivable that female instructors teaching in traditional and
predominantly male-dominated departments might be judged significantly
more harshly than their male counterparts. Our results tend not to confirm
this hypothesis.
The fifth stage of the hierarchical analyses did, however, reveal significant
Instructor sex x Instructor rank interactions. The nature of these
interactions revealed that, although male Teaching Assistants and Assistant,
Associate and Full Professors were given about 9% higher global
evaluations (on the average) than their female counterparts, males with the
rank of Instructor were given only marginally higher evaluations than
their female counterparts. Furthermore, males with the rank of Lecturer
were given 9% lower ratings than female Lecturers. Therefore, the sex
effect noted above must be modified somewhat by adding that the Global
Evaluation received by male and female teachers seems, in part, to depend
on the teacher's rank.
A second means of attacking this question would be by examining the
rates at which male and female faculty actually participate in the evaluation
process. If male faculty are avoiding the evaluation process to a larger extent
than female faculty, then the proportion of male faculty in the examined
sample should be significantly smaller than the proportion of male faculty
within the university at large. However, examination of these proportions
revealed that male faculty were not underrepresented in the sample at any rank.
If anything, the male faculty seemed to be slightly, although not
significantly, overrepresented in the sample compared to the university
population.
Finally, one could argue that the results are biased in favor of the male
faculty because the males will have a higher tendency to allow themselves to
be evaluated only in their really good courses whereas females will tend to
allow evaluation across all of their courses. Although this argument is
plausible, it seems somewhat unlikely. Unfortunately, however, the best way
to settle the question, i.e., an empirical test, is not possible here because our
data set does not contain information concerning the number of courses

taught by each instructor at the time of evaluation. Therefore, this question
must await final resolution in a future study.
The sixth step of the analyses of Global Evaluation indicated that,
controlling for everything before, the rating dimensions of Competency and
Sensitivity to Students (in that order) make very strong contributions to the
prediction of rated Global Evaluation (see Table 2).
The seventh and eighth steps of the hierarchical analysis examined the
possible interaction effects of the evaluation dimensions (i.e., Competency
and Sensitivity to Students) and the student's and the instructor's sex.
Although there were no evaluative dimension x student sex interactions,
there was a small yet statistically significant Competency x Instructor sex
interaction (see Table 2). Careful examination of these results indicated
that, while making global judgments concerning the instructor's
performance, the students placed more emphasis upon the evaluative
dimension of Competency for male instructors than for female instructors. In
other words, as far as students were concerned, it was more important that a
male instructor be competent than that a female instructor be competent.
However, investigation of the three-way interaction between evaluative
dimension x Instructor's sex x student's sex revealed no significant effects.
This means that the greater weight given to competency when judging the
performance of male instructors, as opposed to female instructors, was not
affected by the student's sex: Female students placed more weight on
competency when judging male instructors to the same extent that male
students did.
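Steps 7 and 8 amount to adding product terms between the rating dimensions and the sex codes; a minimal, hypothetical illustration of step 8 (variable names are placeholders; competency and sensitivity stand for the factor scores described in footnote 5):

```python
# Hypothetical sketch of step 8: does the weight given to Competency in
# the global judgment depend on the instructor's sex?
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

ratings = pd.read_csv("ratings.csv")            # placeholder data file

base = ols("global_eval ~ competency + sensitivity + instr_sex",
           data=ratings).fit()
inter = ols("global_eval ~ competency + sensitivity + instr_sex"
            " + competency:instr_sex + sensitivity:instr_sex",
            data=ratings).fit()

print(anova_lm(base, inter))                    # increment due to product terms

# With males coded 1 and females -1, a positive competency:instr_sex
# coefficient means competency carries more weight for male instructors.
print(inter.params["competency:instr_sex"])
```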

Competency

Having investigated some of the covariates and possible determinants of
Global Evaluation, we can now turn our attention to some of the possible
determinants of the constituent evaluative dimensions of primary interest to
us, namely Competency and Sensitivity to Students. As before, the evaluative
dimensions were regressed upon the independent variables in a hierarchical
fashion, starting with the variables treated as covariates (see Table 3).
Starting first with the evaluative dimension Competency, the hierarchical
analysis showed that seven of the eight covariates had statistically
significant relationships with the instructor's adjudged competency. These
significant covariates were (a) the student's sex. Male students gave uniformly
lower competency ratings than female students; (b) expected grade. The
higher the student's expected grade in the course, the lower the student
rated the teacher's competency; (c) grade point average. The higher the
student's GPA, the lower the competency ratings; (d) the instructor's academic
area. Instructors within certain academic areas received lower competency

ratings than instructors within other academic areas; (e) the instructor's rank.
Full Professors, Associate Professors, Assistant Professors and Instructors
tended to get relatively high competency ratings while Lecturers and
Teaching Assistants tended to get relatively low competency ratings; (f) the
number of students in the class. Instructors of very small and very large
classes tended to get relatively high competency ratings while instructors of
moderate-sized classes tended to get relatively low competency ratings; and
(g) the percent of women faculty. The greater the percentage of women
faculty in the instructor's department, the higher these instructors'
competency ratings. With the exception of the last result, these relationships
have also been found in other studies (see Table 3).
The critical question for this study, however, is whether or not women
faculty will receive significantly lower competency ratings even when
controlling for all of the above covariates. The hierarchical analyses
disclosed that, as predicted, female faculty were indeed perceived as being
significantly less competent than male faculty, even when controlling for the
covariates. Furthermore, step 3 of the analyses showed no significant
interaction effect between student sex and instructor sex. This is to say that
the adjudged superior competence of male vs. female faculty was not
affected by the student's sex.
The fourth step of these analyses did disclose a significant interaction
between the percentage of female faculty in a given department and the
faculty member's sex. It was hypothesized that female faculty in non-traditional
roles (as operationalized by the percentage of women in a given department)
would be perceived as less competent than their male counterparts.
Although this interaction effect was found to be statistically significant, the
pattern of competency ratings for different combinations of sex and
percentage of female faculty was much more complicated than expected. As
predicted, the relative adjudged competency of males vs. females was greater
in predominantly male-dominated departments as compared to departments
with a higher proportion of female faculty. However, in very
male-dominated departments female faculty were rated as being very
competent, even more so than their male counterparts in those
departments.
Finally, the last stage of these analyses showed a significant interaction
between the faculty member's sex and rank, with junior female
faculty judged as more competent than their male counterparts and
senior female faculty judged as less competent than their male
counterparts. All of the independent variables together showed a relatively
small yet statistically significant relationship with the instructor's judged
competency (R = .267, p < 10⁻⁴).

Table 3

Hierarchical Regression of Competency Regressed on Independent


Variables and Their Interactions

Entry
Step Variables Beta B R
1 Students sex -.049 -.054***
Expected grade .125 .178** *
Grade point average -.035 -.035**
Student rank .017 .015
Elective vs. required
course .004 .012
Life Sciences -.024 -.176
Humanities -.056 -.124***
Social Sciences -.012 -.063
Associate Professor .089 .155***
Assistant Professor .025 .046
Lecturers -.130 -.248***
Instructors .004 .007
Teaching assistants -.044 -.126
Number of students -.147 -.001**
Number of students2 .238 .001**
Percent women faculty .082 .ow** *
.237***
2 Instructors sex .052 .065***
.242***
3 Students sex X
Teachers sex .012 .013
.242***
4 Instructors sex X
Percent women faculty -.035 -.003**
.244***
5 Teachers sex X
Associate professor .016 .027*
Teachers sex X
Assistant professor .041 .075
Teachers sex X
Lecturer -.147 -.283***
Teachers sex X
Instructor -.043 -.085*
Table 3 (Continued)

Hierarchical Regression of Competency Regressed on Independent


Variables and Their Interactions

Entry
Step Variables Beta B R
Teachers sex X
Teaching assistant -.001 -.m
.267***

By Entry Beta is meant the standardized beta coefficient at the
time the independent variable is first entered into the regression
equation.
*p ≤ .05; **p ≤ .01; ***p ≤ .001.

Sensitivity to Students

Despite the fact that the simpler analyses above showed women faculty
being perceived as more sensitive than male faculty, after controlling for all
of the covariates there were no significant differences in the degrees to
which male and female faculty were rated as sensitive to students. There
were, however, certain significant interactions between faculty sex and
faculty rank. These interactions tended to show that female Lecturers were
considered to be more sensitive to students than male Lecturers but that
male Assistant and Full Professors tended to be considered more sensitive
to students than female Assistant and Full Professors.
Altogether there was a moderate relationship between all the
exogenous variables and the students' perceptions of the faculty's sensitivity
to students (R = .368, p < 10⁻⁴). However, the single factor having, by far,
the largest impact upon the students' perceptions of the instructor's
sensitivity to students was the number of students in the class. This
relationship consisted of significant linear and quadratic components.

Summary and Discussion

The purpose of this study has been fourfold: (1) to determine whether
there are differences in job performance evaluations made of men and
women faculty; (2) to determine whether these evaluations are affected by
interactions between the sexes of evaluators and evaluatees; (3) to
determine whether any effects of the possible violation of traditional gender

Table 4

Hierarchical Regression of Sensitivity to Students Regressed on


Independent Variables and Their Interactions

Entry
Step Variables Beta B R

1 Students sex -.027 -.041*


Expected grade .246 .477***
Grade point average -.072 -.098***
Student rank -.040 -.046**
Elective vs. required
course -.001 -.003
Life Sciences -.014 -.137
Humanities -.049 -.147***
Social Sciences .023 .070*
Associate Professor - -039 -.0!31*
Assistant Professor -.008 -.019
Lecturers -.143 -.386***
Instructors -.510 -.005***
Teaching assistants .101 .386***
Number of students -.510 -.005***
Number of students² .364 .000***
Percent women faculty .079 .011***
.346***
2 Instructors sex .018 .030
.346***
3 Students sex X
Teachers sex .006 .009
.346***
4 Instructors sex X
Percent women faculty -.003 -.000
.346***
5 Teachers sex X
Associate professor -.015 -.036
Teachers sex X
Assistant professor .053 .133**
Teachers sex X
Lecturer -.112 -.293***
Teachers sex X
Instructor -.003 -.008
Table 4 (Continued)

Hierarchical Regression of Sensitivity to Students Regressed on


Independent Variables and Their Interactions

Entry
Step Variables Beta B R

Teachers sex X
Teaching assistant .016 .062
.267***

By Entry Beta is meant the standardized beta coefficient at the
time the independent variable is first entered into the regression
equation.
*p ≤ .05; **p ≤ .01; ***p ≤ .001.

stereotypes on evaluations of performance could be discerned; and (4) to
examine whether overall performance evaluations for men and women
faculty are equally functions of the same judgmental dimensions.
First, the results of more fully specified analyses have indicated that
women faculty are, in fact, judged to be less adequate and less competent
than their male counterparts. These results seem to stand even if one
controls for a large number of possible confounding variables such as: (a)
academic rank, (b) student's sex, (c) student GPA, (d) the student's expected
grade, (e) academic discipline, (f) the number of students in the course, and
(g) the percent of women faculty teaching in a given academic department.
However, although women received lower global evaluation and lower
competency ratings than men, women were seen as being more sensitive to
the needs of students before, but not after, controlling for the variables
above.
Second, after controlling for all of the possible confounding variables
above, we found no evidence supporting the notion of an interaction
between instructor and student sex on judgments of instructor performance.
Female instructors were given lower total evaluations and judged to be less
competent by female as well as male students, and to approximately the
same degree.
Third, the results concerning the gender role appropriateness hypotheses
seemed mixed. If one assumes that the number of women actually working
within a given area will be monotonically related to the perceived
appropriateness of women within that area, then the results would seem
consistent with two models. First, there is some evidence to support the

notion that the less appropriate a person's position in a given role, the less
adequate that person's performance will be judged. However, in extreme
circumstances, that is to say when women are found in very male-dominated
environments, the female instructors were judged to be more competent
than their male counterparts. This result is consistent with the talking
platypus hypothesis. This interpretation is also congruent with the recent
findings of Eagley and Steffen (1984). On the other hand, it is also possible
that in order for women to be accepted in very male-dominated roles, they
will in fact have to be much more competent than their male counterparts
and that the higher-perceived competency of these women is not simply a
function of perceptual distortion but rather due to the fact that they really
are more competent. Finally, of course, it is quite possible that all three
factors are operative at the same time. Needless to say, this particular
phenomenon seems quite complicated and will demand more intense
scrutiny using controls which are not possible for us here if it is to be well
understood.
Fourth, and perhaps most interestingly, we did find evidence indicating
that when students make global evaluations of instructor performance,
competency seems to be a more salient judgmental dimension when making
judgments of male instructors than when making judgments of female
instructors. Furthermore, this effect seems unaffected by the student's sex.
Therefore, the overall impression given by these data is that, not only are
male professors judged to be more intellectually competent than female
professors, but that when judging overall teacher performance, it is also
more important for males to be intellectually competent than it is for
females.
There are three factors which we feel allow us to view these findings with
a certain degree of relative confidence: (1) Instead of being generated from
some contrived experimental procedure using a small number of instructors
and students, the data used here were generated as part of the university's
regular and long-standing teacher evaluation process and will actually be
used by this university in making critical personnel decisions; (2) a very large
number of students and teachers were examined, thereby increasing the
precision of parameter estimates; and (3) as far as we know, our study has
controlled for a larger number of possible contaminating variables than
have been considered in the past. Given these circumstances, we think it is
reasonably safe to conclude that gender really does seem to make a
difference in performance evaluation, a difference prejudicial to women
academics as a group.
However, there is also reason to be careful about how these results are to
be interpreted. To begin with, the fact that, in general, men were perceived
as being more competent than women need not be a function of gender

stereotyping or bias; it is quite possible that men are, in fact, more
competent in their teaching roles. In order to draw any solid conclusions
here one would have to distinguish between perceived teaching
competency and real teaching competency. Unfortunately, getting an
objective measure of real teaching competency would be extremely
difficult, if not impossible under any circumstances. Even if one were to use
such objective criteria as publication record, this would still not
adequately measure teaching competence. Nonetheless, until we have really
firm measures of objective competence, we must be somewhat cautious as
to what we call sexism and what we call perception of objective reality.
There is, however, one aspect of our findings which would seem to allow a
relatively unambiguous interpretation. This is due to the fact that the
dimension of competency was found to be more important in the evaluation
of men than of women. It would seem difficult, although not totally
impossible, to interpret this finding as an effect of the behavior of the
teachers themselves. It seems much more plausible that this difference in
evaluation set is more a function of the evaluators than of the evaluatees.
Second, even though we have firm evidence of potential sexism in the job
performance evaluations of male and female faculty, like the results found
in other studies, the sizes of these effects are far from overwhelming (see
Goldberg, 1968; Paludi & Strayer, 1985; Fidell, 1975; Taynor & Deaux,
1973). The size of the sample used here was so large that our statistical tests
were very powerful and we can therefore be fairly certain that these
differences are real, at least within the population sampled. Even if these
differences are a function of gender bias rather than perceptual accuracy,
the differences are too small to play any major role in how men and women
are evaluated. However, we must be careful here: How small is too small?
For some, any statistically reliable gender bias is cause for concern. This is
really a question of values and one which we certainly cannot settle here.
Finally, it must be remembered that this study was conducted in the
American Southwest, a region not generally known for its gender
egalitarianism. The question which naturally comes to mind is whether the
results found here are generalizable to other regions of the United States
and other parts of the Western world. Although there is no definitive
answer to this question as yet, there is at least some circumstantial evidence
in the fact that the studies which have found potential evidence of sexism in
job performance evaluations do not seem to be located in any specific part
of the United States. There is even recent evidence of sexism in the
treatment of male and female faculty in Canada, a nation generally
considered significantly more liberal than the United States (see Schrank,
1977, 1985).

Nonetheless, the fact that evidence of potential sexism was found in
these data, albeit small in magnitude, naturally leads one to wonder about
the possible role that ethnicity may also play in job performance
evaluation within the context of traditionally biased institutions. For
example, is it also the case that, everything else being equal, black academics
will receive lower evaluations than white academics? It is these questions
which we will be addressing in our continued research.

References

Abramson, P., Goldberg, P., Greenberg, J., and Abramson, L. (1977). The
talking platypus phenomenon: Competency ratings as a function of sex
and professional status. Psychology of Women Quarterly, 2, 114-124.
Aleamoni, L. (1981). Student ratings of instruction. In J. Millman (Ed.),
Handbook of Teacher Evaluations (pp. 110-145). Beverly Hills, CA: Sage
Publications.
Anderson, W. T., Jr., Alpert, M. I. & Golden, L. L. (1977). A comparative
analysis of student-teacher interpersonal similarity/dissimilarity and
teaching effectiveness. The Journal of Educational Research, 71, 36-44.
Bayer, A. F. and Astin, H. S. (1975). Sex differentials in the academic
reward system. Science, 188, 796-801.
Basow, S. and Silberg, N. T. (1987). Student evaluations of college
professors: Are female and male professors rated differently? Journal of
Educational Psychology, 79, 308-314.
Bennett, S. K. (1982). Student perceptions of and expectations for male and
female instructors: Evidence relating to the question of gender bias in
teaching evaluations. Journal of Educational Psychology, 74, 170-179.
Benokratis, N. and Feagin, J. (1986). Modern Sexism. Englewood Cliffs, NJ:
Prentice-Hall.
Bigoness, W. J. (1976). Effect of applicant's sex, race and performance on
employers' performance ratings: Some additional findings. Journal of
Applied Psychology, 61, 80-84.
Bray, J. and Howard, G. (1980). Interaction of teacher and student sex and
sex role orientations and student evaluations of college instruction.
Contemporary Educational Psychology, 5, 241-248.
Carter, A. M. and Ruther, W. E. (1975). The Disappearance of Sex
Discrimination in the First Job Placement of New Ph.D.'s. Los Angeles:
Higher Education Research Institute.
Centra, J. A. (1973a). Effectiveness of student feedback in modifying college
instruction. Journal of Educational Psychology, 65, 395-401.
Centra, J. A. (1973b). Self-ratings of college teachers: A comparison with
student ratings. Journal of Educational Measurement, 10, 287-295.

Centra, J. (1981). Determining Faculty Effectiveness. San Francisco:
Jossey-Bass, Inc.
Deaux, K. and Emswiller, T. (1974). Explanations of successful performance
on sex-linked tasks: What is skill for the male is luck for the female.
Journal of Personality and Social Psychology, 29, 80-85.
Dipboye, R. and Wiley, J. (1977). Reactions of college recruiters to
interviewee sex and self-presentation style. Journal of Vocational
Behavior, 10, 1-12.
Eagley, A. H. & Steffen, V. J. (1984). Gender stereotypes stem from the
distribution of women and men into social roles. Journal of Personality
and Social Psychology, 46, 735-754.
Elliot, D. H. (1950). Characteristics and relationships of various criteria of
college and university teaching. Purdue University Studies in Higher
Education, 70, 5-41.
Elmore, P. B. & LaPointe, K. A. (1974). Effects of teacher sex and student
sex on the evaluation of college instructors. Journal of Educational
Psychology, 66, 386-389.
Elmore, P. & LaPointe, K. (1975). Effect of teacher sex, student sex and
teacher warmth on the evaluation of college instructors. Journal of
Educational Psychology, 67, 368-374.
Ekstrom, R. B. (1979). Women faculty: Development, promotion and pay.
Findings, 5, 1-5.
Etaugh, C. & Brown, B. (1975). Perceiving the causes of success and failure
of male and female performers. Developmental Psychology, 11, 103.
Fidell, L. S. (1975). Empirical verifications of sex discrimination in hiring
practices in psychology. American Psychologist, 25, 1094-1098.
Frank, F. and Drucher, J. (1977). The influence of evaluatee's sex on
evaluations of a response on a managerial selection instrument. Sex
Roles, 3, 59-64.
Frey, P. W. (1978). A two-dimensional analysis of student ratings of
instruction. Research in Higher Education, 9, 69-91.
Frieze, I. H., Whitely, B. E., Hanusa, B. H. & McHugh, M. C. (1982).
Assessing the theoretical models for sex differences in causal attribution
for success and failure. Sex Roles, 8, 333-357.
Goldberg, P. (1968). Are women prejudiced against women? Trans-Action,
5, 28-30.
Hall, F. and Hall, D. (1976). Effects of job incumbents' race and sex on
evaluations of managerial performance. Academy of Management Journal,
19, 476-481.
Hamner, W. C., Kim, J., Baird, L., and Bigoness, W. (1974). Race and sex as
determinants of ratings by potential employees in a simulated
work-sampling task. Journal of Applied Psychology, 59, 705-711.

Harris, M. B. (1975). Sex role stereotypes and teacher evaluations. Journal of
Educational Psychology, 67, 751-756.
Heilman, J. K. & Armentrout, W. D. (1936). Ratings of college teachers on
ten traits by their students. Journal of Educational Psychology, 27,
197-216.
Hyde, J. S. & Linn, M. C. (1986). The psychology of gender: Advances through
meta-analyses. Baltimore: Johns Hopkins University Press.
Jacobson, M. and Effertz, J. (1974). Sex roles and leadership: Perceptions of
the leaders and the led. Organizational Behavior and Human Performance,
12, 383-396.
Joreskog, K. and Sorbom, D. (1986). LISREL VI: Analysis of linear
structural relationships by the method of maximum likelihood.
Mooresville, IN: Scientific Software.
Kaschek, E. (1978). Sex bias in student evaluations of college professors.
Psychology of Women Quarterly, 2, 235-243.
Kaschek, E. (1981). Another look at sex bias in students' evaluations of
professors: Do winners get the recognition that they have been given?
Psychology of Women Quarterly, 5, 767-773.
Locksley, A., Borgida, E., Brekke, N., & Hepburn, C. (1980). Sex stereotypes
and social judgment. Journal of Personality and Social Psychology, 39,
821-831.
Lovell, G. D. & Haner, C. E. (1955). Forced choice scales applied to college
faculty rating. Educational and Psychological Measurement, 15, 291-304.
McKeachie, W. (1979). Student ratings of faculty: A reprise. Academe, 65,
384-397.
McKee, J. P. & Sherriffs, A. C. (1957). The differential evaluation of males
and females. Journal of Personality, 25, 356-371.
Magnusson, D. (1961). Testteori. Stockholm: Almquist & Wiksell.
Marsh, H. W. (1984). Students' evaluations of university teaching:
Dimensionality, reliability, validity, potential biases and utility. Journal of
Educational Psychology, 76, 707-754.
Martin, E. (1984). Power and authority in the classroom: Sexist stereotypes
in teaching evaluations. Signs, 9, 483-492.
Mischel, H. (1974). Sex bias in the evaluation of professional achievements.
Journal of Educational Psychology, 66, 157-166.
Nevil, D., Stephenson, B., and Philbrick, J. (1983). Gender effects on
performance evaluation. Journal of Psychology, 115, 165-169.
Nunnally, J. C. (1978). Psychometric Theory. New York: McGraw-Hill.
Overall, J. and Marsh, H. (1982). Student evaluations of teaching: An
update. AAHE Bulletin, 35, 9-13.
Paludi, M. and Strayer, L. (1985). Sex Roles, 12, 353-361.

Peters, L., O'Connor, E., Weekley, J., Pooyan, A., Frank, B., and
Erenkrantz, B. (1984). Sex bias and managerial evaluations: A replication
and extension. Journal of Applied Psychology, 69, 349-352.
Pheterson, G., Kiesler, S. and Goldberg, P. (1971). Evaluation of the
performance of women as a function of their sex, achievement, and
personal history. Journal of Personality and Social Psychology, 19,
114-118.
Rosen, B. & Jerdee, T. H. (1973). The influence of sex-role stereotypes on
evaluations of male and female supervisory behavior. Journal of Applied
Psychology, 57, 44-48.
Schrank, W. E. (1977). Sex discrimination in faculty salaries: A case study.
Canadian Journal of Economics, X, 411-433.
Schrank, W. E. (1985, April). Sex differences in faculty salaries at Memorial
University: A decade later. Report submitted to the President of Memorial
University and the Executive Committee of the Memorial University of
Newfoundland Faculty Association.
Sohn, D. (1982). Sex differences in achievement self-attributions: An
effect-size analysis. Sex Roles, 8, 345-357.
Taynor, J. and Deaux, K. (1973). When women are more deserving than
men: Equity, attribution and perceived sex differences. Journal of
Personality, 28, 360-367.
Taynor, J. and Deaux, K. (1975). Equity and perceived sex differences: Role
behavior as defined by the task, the mode and the actor. Journal of
Personality and Social Psychology, 32, 381-390.
Terborg, J. and Ilgen, D. (1975). A theoretical approach to sex
discrimination in traditionally masculine occupations. Organizational
Behavior and Human Performance, 13, 352-376.
Tuchman, H. P. (1975). Publication, teaching and the academic reward
structure. Lexington, MA: Lexington Books.
Unger, R. (1979). Sexism in teacher evaluation: The comparability of real
life and laboratory analogs. Academic Psychology Bulletin, 1, 163-170.
Wiener, B., Frieze, I., Kukla, A., Reed, L., Rest, S. A., & Rosenbaum, R. M.
(1971). Perceiving the causes of success and failure. New York: General
Learning Press.
Wilson, D. & Doyle, K. O. (1976). Student ratings of instructors. Journal of
Higher Education, 47, 465-470.
