Assessment & Evaluation in Higher Education
Vol. 37, No. 2, March 2012, 227–235

Student evaluation of instruction in higher education: exploring issues of validity and reliability
Jing Zhao* and Dorinda J. Gallant

Department of Educational Policy and Leadership, The Ohio State University, Columbus,
Ohio, USA

Many personnel committees at colleges and universities in the USA use student
evaluation of faculty instruction to make decisions regarding tenure, promotion,
merit pay or faculty professional development. This study examines the construct
validity and internal consistency reliability of the student evaluation of instruction
(SEI) used at a large mid-western university in the USA for both administrative
and instructional purposes. The sample consisted of 73,500 completed SEIs for
undergraduate students who self-reported as freshman, sophomore, junior or
senior. Confirmatory factor analysis via structural equation modelling was used to
explore the construct validity of the SEI instrument. The internal consistency of
students’ ratings was reported to provide reliability evidence. The results of this
study showed that the model fits the data for the sample. The significance of this
study as well as areas for further research are discussed.
Keywords: student evaluation of instruction; validity; reliability; structural
equation modelling (SEM)

Introduction
The use of student opinions to evaluate faculty instruction and investigate the factors
that may influence them dates back to the early 1900s (Algozzine et al. 2004). Now
student evaluation of faculty instruction is widely used in universities around the
world (Wagenaar 1995). Aleamoni (1981, 1987, 1999) recommended student ratings
to evaluate faculty instruction because students can provide information about: (1)
accomplishment of major educational objectives, (2) rapport with the teacher, (3)
elements of a classroom, such as instruction materials, homework and instructional
methods, (4) kind of communication between students and the instructor, and (5)
consumer data for students who have the freedom to choose their instructors.
Researchers agree that student ratings are the most valid source for evaluating
teaching effectiveness, and that there is little support for the validity of any other source.
For example, based on his personal experience, McKeachie (1997) claimed that
student evaluation of instruction (SEI) was almost certainly more valid than many
personnel committees. Machina (1987) acknowledged the importance of students'
evaluations in the teaching–learning interchange by stating that, in general, student
evaluations can be taken to honestly report student perceptions, and that these perceptions
constitute the entirety of the student end of the teaching process.

*Corresponding author. Email: zhao.195@osu.edu

ISSN 0260-2938 print/ISSN 1469-297X online


© 2012 Taylor & Francis
http://dx.doi.org/10.1080/02602938.2010.523819
http://www.tandfonline.com

According to Marsh and Roche (1993), student evaluation of faculty instruction is
commonly used in colleges and universities for the following purposes: (1) to provide
formative feedback to faculty for instructional improvement; (2) as a summary
measure of teaching effectiveness for personnel or administrative decision-making,
such as promotion and tenure; (3) to provide information to students for the selection
of courses and teachers; and (4) as a process description for research on teaching.
Algozzine et al. (2004) added to the list by stating that another major use for student
evaluation of faculty instruction was for decisions on salary.
Many studies have been conducted on student evaluation of faculty instruction.
Aleamoni (1999) addressed 16 of the most common myths related to student ratings
of instruction by summarising studies conducted from 1924 to 1998. The myths
included the relationship between: (1) class size and student rating, (2) gender of the
student and student rating, (3) time of day the course is offered and student rating,
(4) level of the course and student rating, and (5) rank of the instructor and student
rating.
The myths also included reliability and validity issues related to student-rating
forms (e.g. Arubayi 1987; Burdsal and Bardo 1986; Marsh 1984; Ting 2000). Reliability
is concerned with the consistency, stability and dependability of the assessment;
a reliable result is one that shows similar performance at different times
(McMillan 2004). Hobson and Talbot (2001) pointed out that reliability-related
research on student evaluation instruments was mostly specific to a single instrument,
and that reliability studies focused on consistency or inter-rater reliability,
stability and generalisability. Research on SEI has
provided strong support for its reliability and has yielded consistent results (e.g.
Barnes and Barnes 1993; Feldman 1989).
In general, the validity of student ratings refers to the extent to which student evaluations
of faculty instruction actually measure what they are intended to measure.
Construct validity, more specifically, refers to the extent to which an observed measure
reflects the underlying theoretical construct that the investigator intends to measure
(Cronbach and Meehl 1955). Within the context of faculty evaluations, researchers
have acknowledged the difficulty associated with measuring validity because of the
non-existence of a single criterion for effective teaching (Abrami, d’Apollonia, and
Cohen 1990; Abrami and Mizener 1983; Marsh 1987, 1994, 1995). As a result, some
researchers tended to compare the evaluation form to a measure of student learning,
whereas other researchers tended to compare student ratings with other evaluations of
teacher effectiveness that were assumed to be valid (e.g. instructor self-evaluation of
teacher effectiveness, peer/colleague evaluations and alumni evaluations) (Hobson
and Talbot 2001). For example, Blackburn and Clark (1975) found that the correlation
between student ratings and peer ratings was .62. The study of Overall and Marsh
(1980) reported the correlation between SEI and ratings by recently graduated alumni
to be .83, which is relatively strong support for validity.
Other researchers such as Burdsal and Bardo (1986), Ellett et al. (1997) and
Shevlin et al. (2000) employed statistical methods such as factor analysis, hierarchical
linear modelling and structural equation modelling (SEM) to validate the student-
rating instrument. The purpose of Burdsal and Bardo’s (1986) study was to check the
validity of the questionnaire used to assess student perceptions of teaching effective-
ness (SPTE), which was used at Wichita State University. The SPTE questionnaire
consisted of 11 demographic items and 39 questions aimed at evaluating various
aspects of teaching performance of faculty. The sample consisted of 42,019 students
at the university. Principal axis analysis with varimax rotation was used to analyse the
data. As a result, six factors, namely, attitude toward students, work load, course
value to students, course organisation/structure, grading quality and level of material
were extracted. The authors concluded that the logical patterns in which the items fell
gave face validity to student evaluation. Another finding from the study was that
course difficulty was uncorrelated with other aspects of students’ perceptions of
teaching.
The sample in the study by Ellett et al. (1997) was 2190 students in 145 classes
offered through the evening school at Louisiana State University during the 1996
autumn semester. Results of factor analysis in the study showed that the student
assessment of teaching and learning (SALT) measured two distinct constructs of
students’ perceptions of teaching, which were positively and strongly related to each
other. The results also showed that the SALT was positively related to students’
perceptions of personal-learning environment and students’ summative judgements
about the quality of teaching. The authors concluded that the results supported the
criterion-related validity of the measure.
The study conducted at a university in the UK by Shevlin et al. (2000) included
213 undergraduates. The student evaluation of faculty questionnaire consisted of 11
items. Two factors, lecturer ability and module attributes, were estimated using SEM.
In addition, the results showed that another factor, the lecturer’s charisma, should be
included in the instrument. A large portion of the variance in lecturer ability and
module attributes could be accounted for by the lecturer’s charisma factor.
The purpose of the present study was to examine the validity and reliability of the
SEI instrument used at The Ohio State University. The SEI was first administered in
the autumn of 1994, and the data used in the present study came from 2005.
Thus, it is necessary to examine whether the structure of the instrument has changed
now that it is administered to a population quite different from the original population of 11 years earlier.
SEM was employed to investigate the structure of SEI, as well as the related reliability
and validity issues. As in other universities, the SEI is also used to provide feedback
to faculty for instructional improvement and for making personnel decisions such as
promotion and tenure, and merit increases. The research question in this study was: Is
the SEI a valid and reliable instrument for evaluating faculty instruction at The Ohio
State University?

Method
Data source
This study was based on archival records obtained from the university registrar’s
office for autumn 2005. The original sample included 98,777 responses from
undergraduate, graduate, graduate professional and other students enrolled in faculty-taught courses
in the autumn quarter of 2005. The sample of interest in this study consisted of 73,500
completed SEIs for undergraduate students who self-reported as freshman, sopho-
more, junior or senior. Approximately 21% of records represented responses for
freshmen, 20% for sophomores, 24% for juniors and 35% for seniors. The majority
(approximately 73%) of respondents reported having a grade-point average of 3.0 and
above. Furthermore, the majority (approximately 52%) of respondents indicated
that they enrolled in a course because it was required in their major/minor, whereas only
about 8% of respondents reported enrolling in a course because it was an elective
choice. It is important to note that students may have been enrolled in more than one
course in autumn 2005. Hence, the 73,500 records may represent multiple responses
from the same student for different courses.

Measure
The SEI questionnaire was developed based on a literature review in several disci-
plines related to student evaluations of instruction and, in a more general sense, to the
measurement of human attitudes, opinion and behaviour. In addition, the SEI devel-
opment committees conducted many hours of in-depth interviews with faculty, deans
and department chairs to refine the SEI instrument (http://www.ureg.ohio-state.edu/
ourweb/scansurvey/sei/SEI_handbook.pdf).
The SEI consisted of 10 questions to elicit students’ ratings of faculty instruction.
The gathered information was used in important decisions such as promotion, tenure
and merit pay (Office of Academic Affairs 2001). The first nine questions asked about
course materials, atmosphere of the class, etc. The nine questions were rated on a five-
point Likert-type scale, ranging from 1 (disagree strongly) to 5 (agree strongly). The
last question asked for an overall rating of the instructor. It was rated on the five-point
Likert-type scale, ranging from 1 (poor) to 5 (excellent). Demographic information
such as students’ class standing, cumulative grade point average, and reasons why
they took the course were also included in the questionnaire.
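For illustration only, a single SEI record as described above could be represented along the following lines; the class and field names here are hypothetical and are not the university's actual coding scheme.

```python
# Hypothetical sketch of one SEI response record (names are illustrative only).
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SEIResponse:
    items: List[int]                  # nine statements, each rated 1 (disagree strongly) to 5 (agree strongly)
    overall: int                      # overall instructor rating, 1 (poor) to 5 (excellent)
    class_standing: str               # "freshman", "sophomore", "junior" or "senior"
    gpa_band: Optional[str] = None    # e.g. "3.0 and above"
    enrolment_reason: Optional[str] = None  # e.g. "required in major/minor" or "elective"
```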
The SEI questionnaires were administered to students to evaluate all instructors
during the last week of class in the autumn quarter of 2005. They were administered
to students by a volunteer during the normal class hours while the instructor was out
of the classroom. When students were finished completing the questionnaire, the
forms were collected in a sealed envelope and returned to the university’s administra-
tive office. Students’ evaluation information was only shared with the instructor after
the quarter ended.

Data analysis
This study used SEM to explore the validity of the SEI instrument. SEM allows
researchers to explore relationships between variables in the process of validating
and fitting the measurement model. It is a more powerful statistical approach than
linear multiple regression because SEM takes into account the modelling
of interactions among variables, measurement error and multiple latent independent
variables that may each be measured by multiple indicators (Kline 1998). LISREL 8.7
(Jöreskog and Sörbom 2004) was used to analyse the data (i.e., to assess how the
items measured the corresponding construct). Maximum-likelihood estimation was
used to impute the missing data using PRELIS.
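The study itself used LISREL 8.7 with PRELIS; purely as an illustrative sketch, an analogous one-factor CFA could be specified in Python with the open-source semopy package. The input DataFrame and the item column names q1–q10 are assumptions, and semopy is not the software the authors used.

```python
# Illustrative sketch only: a one-factor CFA analogous to the model described in
# the text, fitted with semopy rather than LISREL. Column names q1..q10 and the
# input DataFrame are hypothetical.
import pandas as pd
import semopy

# One latent variable (instructor effectiveness) measured by the ten SEI items.
MODEL_DESC = "effectiveness =~ q1 + q2 + q3 + q4 + q5 + q6 + q7 + q8 + q9 + q10"

def fit_sei_cfa(sei_df: pd.DataFrame) -> pd.DataFrame:
    """Fit the one-factor measurement model and return common fit statistics."""
    model = semopy.Model(MODEL_DESC)
    model.fit(sei_df)                # maximum-likelihood estimation is the default objective
    return semopy.calc_stats(model)  # includes chi-square, df, RMSEA, GFI, CFI, etc.

# Usage (hypothetical file):
# stats = fit_sei_cfa(pd.read_csv("sei_responses.csv"))
# print(stats.T)
```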

Results
There were 10 observed variables and one latent variable in the SEM model. The
observed variables were the nine items related to specific teaching attributes and one
item related to overall rating of instruction. The latent variable was instructor effec-
tiveness. The 10 questions in the survey, along with the mean and standard deviation
for each question are presented in Table 1. As reflected in Table 1, the mean ratings
ranged from 4.09 (SD = 1.11) to 4.51 (SD = .86) on a five-point scale. The reliability
Table 1. Descriptive statistics for the student evaluation of instruction statements.

Item  Statement                                                      n       M     SD
1     The subject matter of this course was well organised.          73,500  4.31   .923
2     This course was intellectually stimulating.                    73,433  4.12  1.036
3     The instructor was genuinely interested in teaching.           73,415  4.51   .860
4     The instructor encouraged students to think for themselves.    73,399  4.30   .953
5     The instructor was well prepared.                              73,427  4.44   .887
6     The instructor was genuinely interested in helping students.   73,378  4.37   .965
7     I learned a great deal from this instructor.                   73,408  4.09  1.105
8     The instructor created an atmosphere conducive to learning.    73,409  4.23  1.030
9     The instructor communicated the subject matter clearly.        73,299  4.15  1.103
10    Overall, I would rate this instructor as…                      72,467  4.27   .941

Note: The internal consistency reliability was .95.

for the questionnaire was .95, indicating that only 5% of the variability in the ratings
was due to error.
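As a sketch of how an internal-consistency (Cronbach's alpha-type) coefficient of this kind is computed, assuming the item responses sit in a DataFrame with one column per item (column names hypothetical):

```python
# Sketch: Cronbach's alpha for the ten SEI items.
# alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Internal-consistency reliability; rows are respondents, columns are items."""
    items = items.dropna()                          # listwise deletion, for this sketch only
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed scale score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# A coefficient of .95 corresponds to the text's reading that roughly
# 1 - .95 = 5% of the rating variability is attributable to error.
```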

Model specification
The measurement model tested in this study was a confirmatory factor analysis (CFA)
model to determine whether each of the 10 observed variables loaded appropriately on
the latent variable (i.e. instructional effectiveness). Several indices, namely the normal theory
weighted least-square chi-square (χ2), root mean square error of approximation
(RMSEA), standardised root mean square residual (SRMR) and goodness-of-fit index
(GFI) were used to assess the model’s fit to the data. Usually a small χ2 and a non-
significant p value (greater than .05) are desired. However, since the χ2 test is sensitive
to the sample size, it is not as reliable as other indices such as RMSEA, SRMR and
GFI (Schumacker and Lomax 2004). Hu and Bentler (1999) suggested RMSEA < .06,
SRMR < .08 and GFI > .95 to be the cut-off values used for determining goodness of
fit. RMSEA is a standardised measure based on χ2 that is relatively insensitive to
sample size and is therefore a better index for assessing model fit.
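For reference, a commonly used expression for the RMSEA under maximum-likelihood estimation is

\[
\mathrm{RMSEA} = \sqrt{\max\!\left(\frac{\chi^{2} - df}{df\,(N - 1)},\; 0\right)},
\]

where N is the sample size; the exact expression a given software package reports may use N rather than N − 1.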
The CFA model was estimated using LISREL. In the model, χ2 = 11,888.32, df =
35, p < .001. The large χ2 value and significant p value indicated a significant
difference between the sample covariance matrix and the covariance matrix implied by the model,
suggesting a poor fit. However, given the large sample in this study, it is not
surprising that the χ2 is significant. The path diagram for the model is presented in
Figure 1. To set the scale of the latent variable, the factor loading of one item (Question 1, well
organised) was fixed to one. With df = 35, the model was over-identified,
meaning that more information was available than was needed to estimate the model parameters.
As observed in Figure 1, all 10 observed variables had significant factor loadings
on instructor effectiveness (ranging from .70 to .76, p < .05). Hence, between 49%
and 58% of the variance in instructor effectiveness can be attributed to the nine
instructional dimensions and the overall rating. Furthermore, in this model, RMSEA
= .068, SRMR = .025 (below the .08 cut-off) and GFI = .97 (above the .95 cut-off).
The RMSEA was slightly higher than the .06 cut-off set by Hu and Bentler (1999);
however, according to Browne and Cudeck (1993), an RMSEA value between .05 and .08 indicates an average
fit. Therefore, taking the three model indices (i.e. RMSEA, SRMR and GFI) into
account, it can be concluded that the model was an acceptable fit: instructor effectiveness
was appropriately and adequately assessed by the 10 observed variables in the
SEI questionnaire.
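As a rough arithmetic check, assuming N = 73,500 and the RMSEA expression given earlier, the reported values are internally consistent:

\[
\mathrm{RMSEA} \approx \sqrt{\frac{11{,}888.32 - 35}{35 \times (73{,}500 - 1)}} \approx \sqrt{0.00461} \approx 0.068.
\]

Likewise, the standardised loadings of .70 to .76 correspond to squared loadings of \(.70^{2} = .49\) and \(.76^{2} \approx .58\), which matches the 49% to 58% variance figures quoted above.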
Figure 1. Path diagram for student evaluation of instruction questionnaire.

Discussion
The purpose of this study was to explore the structure and reliability of the SEI
questionnaire after 11 years of implementation. The results of the study provide evidence
to support the construct validity and reliability of the SEI questionnaire. Hence, the
results of the study provide evidence to support the continued use of the questionnaire
with other sources of information by the university to evaluate faculty instruction.
In exploring the construct validity of the questionnaire, the 10 individual items
correlated highly (r values ranging from .70 to .76) with instructional effectiveness,
accounting for between 49% and 58% of the total variance in instructor effective-
ness. Hence, there is a strong association between the individual items and construct
of interest (i.e. instructional effectiveness). The finding suggests that students’
perception of instructor effectiveness is captured by nine dimensions of instruction
(i.e. organisation of course, intellectual stimulation, instructor interest in teaching,
students encouraged to think for themselves, preparation of instructor, instructor
helping students, learn from instructor, conducive atmosphere for learning and
subject matter communicated clearly) and the overall instructor rating. However,
student ratings should not be the only source of information, nor should they be overinterpreted
(d’Apollonia and Abrami 1997). In the Shevlin et al. (2000) study, the authors found
high correlations (r values ranging from .60 to .82) between the six items related to
lecturer attributes and the lecturer ability factor. Furthermore, the authors found high
correlations (r values ranging from .53 to .85) between the five items related to
aspects of a particular module and the module attributes factor. Hence, the authors
concluded that the items used to assess teaching effectiveness were good indicators
of the lecturer ability factor and the module attributes factor. Both the current study
and the Shevlin et al. (2000) study are consistent with the emergence of correlational
construct-validity designs in the 1980s to examine the validity of student ratings of
instruction (Greenwald 1997) and with the broad construct-validation approach
suggested by Marsh and Roche (1997). In this study, reliability was determined by
the internal-consistency reliability index. The internal-consistency reliability for the
questionnaire was high (i.e., .95). Hence, only about 5% of the variability in the
responses can be attributed to error. Other studies have investigated reliability from a
different perspective. For example, Obenchain, Abernathy, and Wiest (2001) investi-
gated reliability in rating courses by comparing holistic ranks and mean scores on the
attribute rating scale. The authors found, for the overall sample, a significant correla-
tion between the attribute rating scale and the holistic rank; thus, supporting the reli-
ability of student evaluation of faculty teaching. However, at the individual student
level, matches between mean scores on the attribute rating scale and holistic ranks
were inconsistent.
Although the results of this study provide validity and reliability support for the
SEI 11 years after it was first implemented, caution should be exercised in the interpretation
of the findings. A possible ceiling effect occurred in students’ ratings, as
reflected by the high mean ratings. Hence, there is only a small amount of variability
in ratings, considering the diversity in courses and teaching styles of the instructors
(e.g., lecture and discussion). Because the sample was all undergraduate students,
it is possible that factors unrelated to instructional effectiveness contributed
to the high mean ratings. For example, instructor characteristics such as enthusiasm,
humour and warmth have been suggested as positively impacting student evaluations
(e.g., Obenchain, Abernathy, and Wiest 2001). Given the influence of such
factors on students’ evaluation of instruction, issues related to the validity of how
the information is used must be considered (McKeachie 1997). In some instances, personnel
committees make decisions regarding tenure, promotion and merit pay based on
comparisons of ratings in courses that differ in structure, goals or content. To reduce
some of the potential factors that may influence student ratings and to increase the
validity of use of student ratings of instruction, institutions should consider having
students rate their attainment of educational objectives (McKeachie 1997).
In conclusion, the present study has both theoretical and practical significance. In
a theoretical sense, this study contributes to the existing literature related to the
construct validity of student evaluation of faculty instruction. In a practical sense,
the study provides the university administrative staff with important information about
the SEI as it relates to the validity of use of results in making personnel decisions such
as tenure and promotion.
Many future studies related to the SEI are possible. For example, a study could be conducted
to test whether the results are invariant across samples. That is, the current sample
can be split into two subsamples, where the first sample is the calibration sample and
the second sample is the validation sample. The cross validation index can be used to
test whether the results are consistent across the two samples. In addition, studies
could be conducted to investigate what factors influence students’ evaluation of
faculty instruction as well as exploring differences in responses of males and females,
graduate and undergraduate students and students of different ethnicities. Other stud-
ies could focus on determining the extent to which classroom environment or other
related factors affect student evaluation of faculty instruction in various contexts (e.g.,
research institutions and teaching institutions). Given that relatively few studies have
been done in the eastern world (Ting 2000), research could also be conducted to
compare the use of SEI in North American settings and countries such as China.
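As an illustrative sketch of the split-sample idea mentioned above (the DataFrame, column names and one-factor specification are assumptions carried over from the earlier sketch, not part of the authors' analysis):

```python
# Sketch: split the records into calibration and validation halves, fit the same
# one-factor model to each with semopy, and compare fit indices across halves.
import pandas as pd
import semopy

DESC = "effectiveness =~ q1 + q2 + q3 + q4 + q5 + q6 + q7 + q8 + q9 + q10"

def fit_half(df: pd.DataFrame) -> pd.DataFrame:
    model = semopy.Model(DESC)
    model.fit(df)
    return semopy.calc_stats(model)

def split_sample_check(sei_df: pd.DataFrame, seed: int = 42):
    calibration = sei_df.sample(frac=0.5, random_state=seed)  # calibration half
    validation = sei_df.drop(calibration.index)               # validation half
    return fit_half(calibration), fit_half(validation)

# Broadly similar RMSEA and GFI values in the two halves would suggest that the
# factor structure replicates across subsamples; a formal cross-validation index
# (e.g. ECVI) could also be compared.
```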

Notes on contributors
Jing Zhao is a PhD student in the School of Educational Policy and Leadership, College of
Education and Human Ecology at The Ohio State University. Her research interests lie in
applied measurement, the impact of large-scale testing and English as a second language.

Dorinda J. Gallant is an assistant professor in the School of Educational Policy and Leadership,
College of Education and Human Ecology, The Ohio State University. Her research interests
include applied measurement in elementary, secondary and postsecondary education within the
context of program, product and personnel evaluation.

References
Abrami, P.C., S. d’Apollonia, and P.A. Cohen. 1990. The validity of student ratings of instruction:
What we know and what we do not. Journal of Educational Psychology 82: 219–31.
Abrami, P.C., and D.A. Mizener. 1983. Does the attitude similarity of college professors and
their students produce ‘bias’ in course evaluations? American Educational Research
Journal 20: 123–36.
Aleamoni, L.M. 1981. Student ratings of instruction. In Handbook of teacher evaluation, ed.
J. Millman, 110–45. Beverly Hills, CA: Sage.
Aleamoni, L.M. 1987. Student rating myths versus research facts. Journal of Personnel
Evaluation in Education 1: 111–9.
Aleamoni, L.M. 1999. Student rating myths versus research facts from 1924 to 1998. Journal
of Personnel Evaluation in Education 13: 152–66.
Algozzine, B., J. Beattie, M. Bray, C. Flowers, J. Gretes, L. Howley, G. Mohanty, and F. Spooner.
2004. Student evaluation of college teaching: A practice in search of principles. College
Teaching 52, no. 4: 134–41.
Arubayi, E.A. 1987. Improvement of instruction and teacher effectiveness: Are student ratings
reliable and valid? Higher Education 16, no. 3: 267–78.
Barnes, L.L.B., and M.W. Barnes. 1993. Academic discipline and generalizability of student
evaluations of instruction. Research in Higher Education 34: 135–49.
Blackburn, R.T., and M.J. Clark. 1975. An assessment of faculty performance: Some corre-
lates between administrators, colleagues, students and self-ratings. Sociology of Education
48: 242–56.
Browne, M., and R. Cudeck. 1993. Alternative ways of assessing model fit. In Testing struc-
tural equation models, ed. K. Bollen and J. Long, 136–62. Newbury Park, CA: Sage.
Burdsal, C.A., and J.W. Bardo. 1986. Measuring students’ perceptions of teaching: Dimensions
of evaluation. Educational & Psychological Measurement 46: 63–79.
Cronbach, L.J., and P.E. Meehl. 1955. Construct validity in psychological tests. Psychological
Bulletin 52: 281–302.
d’Apollonia, S., and P.C. Abrami. 1997. Navigating student ratings of instruction. American
Psychologist 52: 1198–208.
Ellett, C.D., K.S. Loup, R.R. Culross, J.H. Mcmullen, and J.K. Rugutt. 1997. Assessing
enhancement of learning, personal learning environment, and student efficacy: Alternatives
to traditional faculty evaluation in higher education. Journal of Personnel Evaluation in
Education 11: 167–92.
Feldman, K.A. 1989. Instructional effectiveness of college teachers as judged by teachers
themselves, current and former students, colleagues, administrators, and external (neutral)
observers. Research in Higher Education 30: 137–74.
Greenwald, A.G. 1997. Validity concerns and usefulness of student ratings of instruction.
American Psychologist 52: 1182–6.
Hobson, S.M., and D.M. Talbot. 2001. Understanding student evaluations. College Teaching
49, no. 1: 26–31.
Hu, L.T., and P.M. Bentler. 1999. Cutoff criteria for fit indices in covariance structure analy-
sis: Conventional criteria versus new alternatives. Structural Equation Modeling 6: 1–55.
Jöreskog, K.G., and D. Sörbom. 2004. LISREL 8.7. Lincolnwood, IL: Scientific Software
International.
Kline, R.B. 1998. Principles and practices of structural equation modeling. New York: The
Guilford Press.
Machina, K. 1987. Evaluating student evaluations. Academe 73: 19–22.
Marsh, H.W. 1984. Students’ evaluations of university teaching: Dimensionality, reliability,
validity, potential biases, and utility. Journal of Educational Psychology 76: 707–54.
Marsh, H.W. 1987. Students’ evaluation of university teaching: Research findings, method-
ological issues, and directions for future research. International Journal of Educational
Research 11: 253–388.
Marsh, H.W. 1994. Weighting for the right criteria to validate student evaluations of teaching
in the IDEA system. Journal of Educational Psychology 86: 631–48.
Marsh, H.W. 1995. Still weighting for the right criteria to validate student evaluations of
teaching in the IDEA system. Journal of Educational Psychology 87: 666–79.
Marsh, H.W., and L. Roche. 1993. The use of students’ evaluations and an individually structured
intervention to enhance university teaching effectiveness. American Educational Research Journal
30, no. 1: 217–51.
Marsh, H.W., and L.A. Roche. 1997. Making students’ evaluations of teaching effectiveness
effective. American Psychologist 52: 1187–97.
McKeachie, W.J. 1997. Student ratings: The validity of use. American Psychologist 52, no. 11:
1218–25.
McMillan, J.H. 2004. Classroom assessment: Principles and practices for effective instruction.
3rd ed. Boston: Pearson Education.
Obenchain, K.M., T.V. Abernathy, and L.R. Wiest. 2001. The reliability of students’ ratings
of faculty teaching effectiveness. College Teaching 49, no. 3: 100–4.
Office of Academic Affairs. 2001. The Ohio State University student evaluation of instruction
(SEI) handbook. http://www.oaa.osu.edu/eval_teaching/seihandbook.html
Overall, J.U., and H.W. Marsh. 1980. Students’ evaluations of instruction: A longitudinal
study of their stability. Journal of Educational Psychology 72: 321–5.
Schumacker, R.E., and R.G. Lomax. 2004. A beginner’s guide to structural equation model-
ing. 2nd ed. Mahwah, NJ: Erlbaum.
Shevlin, M., P. Banyard, M. Davies, and M. Griffiths. 2000. The validity of student evaluation
of teaching in higher education: Love me, love my lectures? Assessment & Evaluation in
Higher Education 25, no. 4: 397–405.
Ting, K.F. 2000. A multilevel perspective on student ratings of instruction: Lessons from the
Chinese experience. Research in Higher Education 41: 637–61.
Wagenaar, T.C. 1995. Student evaluation of teaching: Some cautions and suggestions. Teaching
Sociology 23, no. 1: 64–8.
