
Development and Validation of Behaviorally-Anchored Rating Scales

for Student Evaluation of Pharmacy Instruction1


Paul G. Grussing
College of Pharmacy, M/C 871, The University of Illinois at Chicago, 833 South Wood Street, Chicago IL 60612

Robert J. Valuck
Department of Pharmacy Administration, The University of Illinois at Chicago, Chicago IL

Reed G. Williams
Department of Medical Education, The University of Illinois at Chicago, Chicago IL

The study purpose was to improve pharmacy instruction by identifying dimensions of teaching unique to
pharmacy education and developing reliable and valid rating scales for student evaluation of instruction.
Error-producing problems in the use of student ratings of instruction, existing rating methods and dimensions
of effective teaching are reported. Rationale is provided for development of Behaviorally-Anchored Rating
Scales, BARS, and the methods used are described. In a national study, 4,300 descriptions of pharmacy
teaching were collected in nine critical incident writing workshops at four types of schools. Ten dimensions of
pharmacy teaching were identified and validated for classroom, laboratory and experiential teaching.
Scales were developed for each dimension. Measures of scale quality are described including retranslation
data, standard deviations of effectiveness ratings, reliability and validity data and data supporting reduction of
leniency and central tendency effects. Four outcomes of the project are discussed, emphasizing two: use
of the newly-validated dimensions in modification of traditional numerically-anchored scales in local use, and
of BARS in providing clear and convincing performance feedback to pharmacy instructors.

INTRODUCTION AND PURPOSE

From among the traditional faculty roles of teaching, research and service, this study investigated only the evaluation of teaching. Teaching performance may be evaluated using multiple data sources: (i) documented self-evaluation and course improvement; (ii) peer review of instructional methods, instructor-written texts or manuals, and other developed media, syllabi and tests; (iii) gains in student learning; (iv) student ratings of instructor performance; (v) observation or videotaping; and (vi) teaching awards(1,2). This study focused on only one data source: student evaluation of faculty performance. Its purpose was to improve the quality of instruction in U.S. colleges of pharmacy by identifying dimensions of pharmacy instruction and developing new, reliable and valid student measures of effective pharmacy teaching2. Such measures of instructional performance, whether utilized in instructor self-assessment, for periodic performance reviews or in the critical promotion and tenure process, are essential for the continued development of effective teachers. If pharmacy students and instructors are to have confidence in instructional rating systems and to eventually benefit from the rating process, clear dimensions of effective teaching should be identified and rating errors minimized. Problems with the content validity of student ratings of instructor performance introduce rating error when instruments are not sensitive to the unique differences in lecture, laboratory and experiential instruction. Moreover, when instructor rating instruments are developed for use across university colleges and departments or disciplines, without having been validated for use in rating pharmacy instruction in particular, additional questions of validity and rating error arise.

Error in Instructor Ratings

Reduction of measurement error is imperative in evaluation of faculty teaching performance. Eight kinds of error in the administration and use of instructional performance rating scales prompted this study. The research and development methods chosen were intended to minimize most of these common sources of rating error, especially the first five listed: (i) error in instrument content; (ii) error in the interpretation of the meaning of ratings(3-5); (iii) showmanship(6-8); (iv) common rating error effects such as "halo effect"(9), "reverse halo effect"(10), "leniency effect" and "harshness (or strictness) effect"(11), and "central tendency effect"(12,13); (v) error in instrument reliability; (vi) mixed purposes of evaluation3,4(14,15); (vii) inconsistent methods of instrument administration(16-19); and (viii) errors in data implementation(20,21).

1. The research was supported, in part, by a GAPS grant from the SmithKline Beecham Foundation through the American Association of Colleges of Pharmacy.
2. The term "dimension", as used in this article, refers to an axis, or continuum, along which performance descriptors, varying in quality or intensity, may be ordered. The dimension is identified and shown to be independent and non-overlapping in meaning with other clusters of similar behaviors.
3. Formative evaluation refers to evaluation of a process or product to provide feedback for the purpose of making possible mid-process refinements or improvements.
4. Summative evaluation is conducted to examine the quality or impact of a final, completed process or product.



Table I. Dimensions of teaching selected from the education literature

Literature sources(b): 1. Dickinson(c); 2. Wotruba(d); 3. ICES(e); 4. Centra(f); 5. Das(g); 6. Hildebrand(h).

Tentative study dimensions(a), each followed by the related labels appearing across the sources:

D. Course organization: Subject and course organization; Course management, structure; Organization, structure, clarity; Course outlining, structuring; Provides objectives.
A. Teaching ability: Teaching methods (Dickinson); Speaking, interpreting, clarity (Wotruba); Style, skill and methods (ICES); Lecturing ability (Centra); Teaching style and application (Das); Teaching clarification (Hildebrand).
F. Grading and feedback: Testing as learning experience; Fairness in testing; Grading and exams; Grading, examinations; Objective evaluation and feedback.
G. Student-instructor interaction: Student-faculty interaction (Dickinson); Flexible, attitudes re: students (Wotruba); Class climate, warmth, concern (ICES); Interaction, student rapport (Centra); Sensitivity, availability, responsive (Das); Individual interaction, accessibility (Hildebrand).
H. Workload, course difficulty: Work requirements, difficulty; Workload, course difficulty.
I. Enthusiasm/motivation: Enthusiasm, stimulates thought; Enthusiasm, encourages thinking; Dynamic, stimulates thinking; Dynamism, enthusiasm.
J. Knowledge of subject area: Competence; Knowledge of subject; Knowledge.

(a) Dimensions listed by final dimension letters and order. (b) First author only. (c) See reference 8. (d) See reference 23. (e) See reference 25. (f) See reference 1. (g) See reference 28. (h) See reference 7.

Procedures for minimizing the first five types of rating errors were sought. Emphasis was placed on selecting or developing procedures and instruments to rate the most appropriate pharmacy teaching behaviors and to rate them accurately and consistently.

Study Goals

Four goals were set for the study. First, the project would identify dimensions of instructional behavior unique to pharmacy education and to three teaching environments: classroom, laboratory and experiential. Faculty colleagues have reported the belief that effective pharmacy teaching is different from good teaching in other departments and disciplines, and that it varies from one pharmacy teaching environment to another. The researchers sought to apply a method, other than factor analysis, to identify and describe dimensions of pharmacy teaching. The second goal was to develop Behaviorally-Anchored Rating Scales, BARS, for each dimension and teaching environment. Third, the researchers intended to demonstrate concurrent validation of the scales developed, by showing correlations with a known reliable and valid, traditional numerically-anchored scale of parallel content. Finally, the project was designed to demonstrate generalizability of the scales for use in all U.S. colleges of pharmacy.

METHODS

Nine study steps were elaborated to achieve the project goals. First, the study began with identification of tentative dimensions of pharmacy teaching. This initial validation step would be based on the literature. The second step was to select the most appropriate scaling method. The literature supporting this selection decision is described. The third step was to conduct critical incident workshops for the collection of descriptors of effective and ineffective teaching in U.S. colleges of pharmacy. Editing and selection of collected incidents was the fourth step. The fifth was to establish and validate dimensions of pharmacy teaching using the retranslation process to demonstrate independence of the dimensions5(22). Simultaneously, the sixth step of obtaining effectiveness ratings for incidents from study panelists would provide data for establishing scale anchors. The seventh step was to develop scales by selection of meaningful behavioral anchors based on the retranslation process and high rater agreement on the scale anchors. A concurrent validation study would constitute the eighth step, for which traditional, numerically-anchored scales, parallel in content, would be developed. The final step was accomplished through the concurrent validation study, yielding a useful parallel set of traditional, content-parallel numerically-anchored scales.

Identification of Tentative Dimensions

Tentative dimensions of pharmacy instruction were identified and validated based on a review of the pertinent literature. Tables I and II display dimensions mentioned in studies and review articles outside and within pharmacy education. The tentative dimensions so identified were later used for preliminary classification of student- and faculty-generated critical incidents of pharmacy teaching.

5. The Smith and Kendall retranslation process uses an independent group of expert raters who reallocate descriptors of performance to dimensions describing performance qualities. It is analogous to the procedures used by language translators to ensure that all of the meanings of an original text are preserved. Text material is translated into a foreign language, then retranslated to the original by an independent expert.



Table II. Teaching dimensions in the pharmacy education literature, 1975-90

Citations, American Journal of Pharmaceutical Education(b):
1. Carlson, Vol. 39, pp. 446-448
2. Zanowiak, Vol. 39, pp. 450-552
3. Jacoby, Vol. 40, pp. 8-13(c)
4. Sauter, Vol. 40, pp. 165-166
5. Purohit, Vol. 41, pp. 317-325
6. Kotzan, Vol. 42, pp. 114-118(c,d)
7. Peterson, Vol. 44, pp. 428-430
8. Martin, Vol. 47, pp. 102-107
9. Downs, Vol. 50, pp. 193-195

Dimensions(a) and the number of the nine articles mentioning each:
A. Teaching ability: 8
G. Student-instructor interaction: 8
I. Enthusiasm/motivation: 7
J. Knowledge of subject area: 5
D. Course organization: 6
F. Grading and feedback: 4
H. Workload, course difficulty: 3

(a) Listed by frequency of mentions, final dimension letters and order. (b) First author only. (c) Dimensions based on authors' original research. (d) Also Vol. 40, pp. 3-7.

Education, General. Seven dimensions of effective instruction were reported often in the education and psychology literature. Table I summarizes the most frequently mentioned dimensions of teaching in original studies or reviews. In their article describing the development of a teacher rating instrument, Wotruba and Wright reviewed 21 published studies of student evaluation of teaching(23). Of the 40 criteria they listed, the nine most frequently mentioned were also cited in a text chapter on uses and limitations of student ratings(24). Seven are shown in Table I. The text author also summarized dimensions of teaching behavior as identified in factor-analytic studies, four of which are reported in Table I. Brandenberg et al. described development and validation of scales for student evaluation of teaching(25). Their work yielded a comprehensive evaluation system available at the researchers' school(26). Instructors may select traditional, numerically-anchored items from a "catalog" of over 400 items classified by teaching dimensions. Items designed for use in summative evaluation are normed by instructor rank and by required/elective status. Items designed for instructors' formative self-evaluation are not so normed. Hildebrand et al. asked faculty and students to provide descriptions, in observable and behavioral terms, of the "best" and "worst" teaching they had experienced(27). Responses were factor-analyzed into five clusters (dimensions) of teaching performance. In a Canadian study of teaching in the behavioral sciences, Das et al. identified seven dimensions of teaching, and developed BARS for student evaluation of instruction(28). In equivalent-forms comparisons using traditional rating instruments, they reported the BARS to be at least as psychometrically sound in terms of reliability, inter-rater variability and content validity. Dickinson and Zellinger identified six teaching dimensions for veterinary medicine instructors(29).

Education, Pharmacy. After review of the education and psychology literature, evidence of criteria for effective pharmacy teaching was sought. Ten articles from the pharmacy education literature, which described or mentioned dimensions of teaching, are summarized in Table II. Three of the articles reported research on pharmacy instruction. Based on deficiencies in the use of rating instruments designed for use in faculty performance evaluation generally, Kotzan and Mikael and Kotzan and Entreken developed and implemented a factor-analyzed instrument for student evaluation of undergraduate pharmacy instruction(30,31). Jacoby described how modification of an existing instrument for use in student evaluation of pharmacy teaching contributed to improved classroom instruction(32). Based on this research, an instructional consulting service was initiated to provide feedback to faculty. Purohit et al. explored issues of student evaluation of instruction(33). Citing Hildebrand, the authors discuss "components" of effective teaching as perceived by their colleagues and by students(34). Sauter and Walker, in reporting a theoretical model for pharmacy faculty peer evaluation, mentioned basic components of teaching and learning as requisite elements in such evaluation(35). Two authors reported special needs for evaluation of clinical teaching performance. Martin et al. described clinical faculty evaluation and development programs at one college of pharmacy(36). Downs and Troutman identified criteria for the evaluation of clinical pharmacy teaching(37). Three articles, written as reports or invitational articles, mentioned qualities of good pharmacy teaching. As part of a panel devoted to the evaluation of pharmacy teaching, Carlson suggested a comprehensive evaluation program(38). Citing articles by Kulick, and by Brown, the author emphasized three major dimensions students use to judge their teachers, and discussed major functions in the supervision of students by clinical pharmacists(39,40). Peterson, in an ad hoc committee report, mentioned key features of pharmacy teaching performance(41). Zanowiak, in an invitational article, citing Kiker, mentioned characteristics of an effective pharmacy teacher(42,43).

Some of the authors cited in Tables I and II described additional kinds of dimensions and behaviors not shown in the tables. First, problems in attaching meaning to labels assigned to factors in earlier studies could introduce bias in the generation of unobservable behaviors in this study. Specific behaviors might not, in the retranslation process chosen for this study, be assigned to the same dimension as suggested by the factor name coined by authors of previous scales and instruments. A second type of dimension not listed in the tables was based on the notion of self-rated student accomplishment. Behaviors associated with this named factor seemed unlikely to be collected and used as scale anchors in this research, which would focus on teacher, not student, behaviors.



Finally, some instruments contained items describing environmental conditions and curricular features which were beyond the control of a single instructor-ratee. Such behaviors were not expected to result from this study, which would use critical incidents describing observable instructor behaviors only.

Choice of Scaling Method

The Behaviorally-Anchored Rating Scale, BARS, was chosen for development in this study because of its unique measurement properties. First, it relies on critical incidents which may be classified into dimensions of behavior shown to be unique and independent of each other in their meaning. Second, it consists, for each performance dimension identified, of an array of behavioral statements which range from most effective to least effective. Raters are instructed to read the entire continuum of behaviors and then select the one which most closely describes the actual, or expected, behavior of the ratee. Each statement is accompanied by a number on the scale, one of which is recorded to indicate the ratee's performance on that particular dimension.
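To make these mechanics concrete, the minimal Python sketch below models one dimension with three anchors; the dimension name, anchor wording and scale values are invented placeholders, not items from the study instruments:

```python
# Minimal sketch of BARS mechanics: a dimension carries an ordered array
# of behavioral anchor statements, each tied to a point on the
# effectiveness scale; the rater reads the whole continuum, picks the
# statement closest to the observed behavior, and that anchor's scale
# value is recorded. All names and anchor texts are hypothetical.

from dataclasses import dataclass

@dataclass
class Anchor:
    scale_point: float  # position on the 15-point effectiveness scale
    behavior: str       # observable incident serving as the anchor

@dataclass
class BARSDimension:
    name: str
    anchors: list       # ordered from least to most effective

    def rate(self, chosen: Anchor) -> float:
        # The recorded rating is simply the scale point of the anchor
        # judged closest to the ratee's actual (or expected) behavior.
        return chosen.scale_point

lecture = BARSDimension(
    name="Teaching Ability-Lecture",
    anchors=[
        Anchor(2.0, "Reads notes verbatim; cannot be heard past the third row."),
        Anchor(8.0, "Explains most concepts clearly but rarely offers examples."),
        Anchor(14.0, "Illustrates each concept with a vivid example and summarizes main points."),
    ],
)

print(lecture.rate(lecture.anchors[1]))  # rater chose the middle anchor -> 8.0
```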
A review of the literature supports the choice. First, the development of BARS scales for rating performance of pharmacy practitioners and student externs, and as a criterion measure in prediction studies, was previously reported(44,45). Second, BARS have been used for evaluation of performance in a wide variety of other professions and occupations(46,47). They are claimed by some researchers to demonstrate more reliability and validity than numerically-anchored scales because the behaviors serving as scale anchors are clear, unambiguous statements of ratee performance(48-50). This clarity is supported by following the Flanagan critical incident technique in scale item generation, instead of having experts write general descriptors of performance along a continuum, or of using a traditional numerically-anchored scale only(51). Several studies examined the psychometric properties of BARS vs. numerically-anchored scales. Comparable reliability and validity were observed and reported(52-56). However, one early article compared BARS with traditional scales for leniency error and inter-rater agreement and found more favorable scale properties for traditional scales(57). A third reason for selecting the BARS scale type is that the vivid behavioral descriptions used are easy for raters to associate with ratee performance, and are very compelling in pointing out where the ratee may benefit from introspection and performance improvement(58-60). Finally, BARS scales for student evaluation of college instruction have been reported.

Six cited studies have demonstrated the presence of independent dimensions of teaching performance and the feasibility of generating "scaleable" behaviors in construction of reliable and valid rating instruments. Harari and Zedick identified nine dimensions of teaching behavior and developed corresponding BARS scales for evaluating teaching ability of college psychology professors(61). They found that when faculty and student ratings for quality of instructor behavior were correlated, "1.0 or near-1.0 relationships" were found. Das et al. identified performance dimensions associated with teaching behavioral science courses, developed BARS scales based on the dimensions, and then compared scale properties with parallel versions in a numerically-anchored scale format(62). Dickson and Zellinger compared veterinary medicine students' ratings of their instructors using a BARS scale and a "mixed standard scale" in which items were scrambled so that both the dimensions and the ordinal relationships among scale anchors were disguised(8). Green and Sauser, followed by Champion and Green, developed BARS scales for use by psychology students in rating of their instructors, and reported comparisons of the scale properties(63,64). Horn et al. compared properties of a BARS scale with a well-established, content-parallel numerically-anchored scale(58). Business undergraduate students rated their instructors twice and the study compared the effect of mid-course feedback for both types of scales.

Distinctions are made between BARS and other behaviorally-based rating scales. Seeking improvements over traditional graphic, numerically-anchored scales, panels of experts may write broad behavioral descriptions for use as the scale anchors. Examples may be found in adaptations of the goal-attainment scaling process(65). Two examples occur in the pharmacy literature describing methods of rating pharmacy residents, pharmacy students, and pharmacy employees generally(66,67). Such instruments enjoy advantages in ease and cost of development. Their disadvantages include showing less evidence of reliability and validity. Economy in development has also been described in connection with so-called "short-cut" BARS(64).
Critical Incident Collection

Sampling. This study was based upon critical incidents of teaching behavior in a variety of pharmacy teaching environments. In order to ensure generalizability of scales for use across all school types, study schools selected as sources for generating critical incidents were classified and selected using four strata: (i) BS- or PharmD-conferring; (ii) public or private ownership6; (iii) geographic (East or West)8; and (iv) high vs. low graduate-education emphasis7(68). Each of these variables was believed to contribute to the instructional culture of the colleges, possibly impacting upon the methods, styles and quality of teaching.

A three-stage sampling procedure was used, combining systematic and random sampling. In stage one, based on the four strata, names of all U.S. schools were randomly assigned to one of the appropriate 16 cells. Two sets of four cells were systematically eliminated because they did not contain schools in all cells. Then one set of the two remaining complete sets was randomly chosen. In stage two, schools were randomly selected from within each of four cells in the selected stage-one set. Each of the four schools then represented each stratum, and each stratum was represented by two schools.

The third stage of sampling occurred when research collaborators at the four selected schools, following researcher guidelines, arranged for representative types of volunteer students and faculty to attend local critical incident writing workshops. The local collaborators were requested to secure broad representation of differing educational levels in a group of 30 undergraduate professional students, and of all disciplines in a group of 15 faculty representing classroom, laboratory and experiential teaching.

6. Private vs. public ownership of U.S. colleges of pharmacy was confirmed in personal correspondence with Mary Bassler, American Association of Colleges of Pharmacy, May 30, 1990.
7. High vs. low graduate-education emphasis was defined as schools above or below the median number (22) of Ph.D. students enrolled in U.S. college of pharmacy graduate programs in 1989.
8. Eastern schools were defined as those located in AACP-NABP Districts I-IV; Western in Districts V-VIII.
IV; Western in Districts V-VIII.



Critical Incident Workshops. The researchers' school was used as a pilot site for training in the conducting of item-writing workshops. Use of forms and presentation of the writing tasks in a clear, standardized and reliable manner among researchers was checked during the pilot. Incidents collected were reviewed for their quality and content relative to the tentative dimensions of instruction. No modifications in forms or procedures were made after the pilot administration.

Separate workshops for students and for faculty panelists were then conducted at each study school. Panelists were asked to think about effective and ineffective teaching incidents they had actually experienced or observed in the classroom, laboratory or experiential teaching site. Using forms provided, they were asked to write brief "stories" about each incident they could recall, describing the situation and specifically what the instructor said or did. Panelists were reminded that the incidents were to have been personally experienced and described as observable behaviors. Positive feedback was provided to the groups, based on selected good examples of clear, vivid and unidimensional incidents written. Near the end of the workshops, panelists were given a list of seven tentative dimensions of pharmacy teaching to prompt them to recall and write additional incidents (see Tables II and V). In addition to critical incident writing, students were invited to complete a learning styles inventory(69-71). As an incentive, student-participants were promised written feedback on their learning styles and suggestions for adaptation to differing teaching styles and formats. As a second incentive, student-participants were invited to a post-workshop luncheon.

After the workshops, students received a letter including a report providing feedback on their learning style and suggestions for adaptation to differing teaching styles and formats. They were encouraged to visit with the local research collaborators for additional information about how to apply their own learning styles. Productivity in generation of incidents by 138 critical incident writers was broadly based among schools, student educational levels and all faculty disciplines. Students at the pilot plus study schools wrote 3,098 incidents (72 percent, school mean = 620, SD = 239). Faculty members from all disciplines wrote 1,202 incidents (28 percent, school mean = 242, SD = 78). The mean numbers of incidents written per student and faculty member were 22 (SD = 4.8) and 23 (SD = 4.8), respectively.

Editing and Selection of Incidents

New Dimensions. During review of the critical incidents, and their classification into seven tentative dimensions, three additional, distinct clusters of sufficient numbers of scaleable incidents were observed. The new tentative dimensions of pharmacy teaching were: (i) "Selection and use of media;" (ii) "Teaching ability—laboratory;" and (iii) "Teaching ability—experiential." While the selection and use of effective media was frequently subsumed under the effective teaching dimension in previous studies, the media selection and development incidents collected in this study suggested independence from behaviors describing instructor lecture performance. Moreover, a wide variety of incidents reflecting effective, mediocre and ineffective media selection and use were observed. It also became apparent that incidents relating to choice of media might confound ratings based on incidents describing instructor behavior in the classroom.

Similar incident-generation outcomes occurred for the second and third new tentative dimensions. Clusters of incidents describing both laboratory and experiential instruction were observed. The numbers and kinds of incidents were sufficiently rich and varied to enable a useful number of potentially-scaleable items to be used in the retranslation process9.

Incident Selection Criteria

Ten criteria were applied in the selection and editing of incidents for the retranslation process.
1. The incidents must have described instructional behaviors, not environmental ones beyond the control of the instructor (e.g., "This instructor's lecture room is not air-conditioned.").
2. Behaviors must have been observable in the classroom, laboratory or experiential teaching site. Opinions or vague general descriptors of teaching "attributes" were not used, nor were student attitudes or moralistic statements based on students' belief systems.
3. Each incident must have been clear, unambiguous and unidimensional in its meaning.
4. Frequency of mention was a primary selection factor, demonstrating importance of behaviors cited by multiple panelists and occurring across several types of colleges of pharmacy.
5. Only behaviors which related to one of the teaching dimensions were included.
6. Only descriptors of instruction in the professional curriculum were included, not exclusively pre-pharmacy teaching behaviors.
7. Educational jargon or school-specific terminology was avoided.
8. Behaviors describing unusual instructor leniency or lack of rigor were avoided, because some students might perceive them as evidence of poor teaching while others might rate them highly because of perceptions that "easy" behaviors are associated with effective teaching.
9. Incidents describing unprofessional conduct or obviously unethical or criminal activity were also eliminated. Space for low scale anchors was reserved for ineffective, yet frequently-occurring incidents, not for obviously uncommon and aberrant behaviors.
10. Finally, incidents were reviewed to ensure brevity, a uniform format and unidimensional behavioral style.

Importance, Retranslation and Effectiveness Ratings

The incidents were retranslated and rated following the process developed by Smith and Kendall(22) and previously reported in the pharmacy literature(72). Because of the large number of incidents retained after the editing process (N = 402), it was necessary to divide them proportionally into two booklets to be sent to two separate retranslation groups. Each booklet contained approximately an equal number of incidents representing each dimension. Incidents were selected by the researchers to represent high, medium and low quality instructional behaviors.

9. The final newly-identified tentative dimension, "Teaching Ability—Experiential," includes behaviors common to several kinds of community and institutional experiential instruction, not clinical instruction alone.



Care was taken to ensure that student raters were "upperclassmen" with exposure to instruction at all curricular levels. To enhance this, 40 students from the final professional year at the researchers' school were added, increasing the total student retranslator/rater pool to 106. This also enabled the critical incidents to be retranslated/rated by students with and without incident-writing experience. Retranslation booklets were mailed with a letter of explanation. After 10 working days a postcard reminder was sent to non-respondents. Ten working days later, another letter and retranslation booklet were sent to the remaining non-respondents. Fifty-seven students responded, for a 54 percent response rate.

Student raters were given four tasks: two to validate the importance of dimensions to be identified, one to retranslate the incidents into dimensions, and one to assign an effectiveness rating to each incident. First, the importance of each dimension was determined by asking students to study the dimension descriptions in Table V, and then to rate their importance on a seven-point scale. The second task asked students to divide and assign a total of 100 points to the 10 dimensions. These first two tasks were designed to validate the dimensions by showing their relative importance to students. If dimensions would not be valued as being important, they might not be selected for inclusion in the final rating scales. Importance ratings could also be used to assign weights to each dimension in the calculation of an overall teaching performance score (a worked sketch appears below).

The third task, the retranslation step, involved assigning each incident listed to one of the 10 tentative dimensions. This procedure was intended to show the independence of the dimensions. If respondents would not agree that incidents were descriptive of their respective dimensions, the incidents would not be useable as scale anchors. A standard of 80 percent agreement on assignment of incidents to dimensions was used to retain an item for scaling.

The final task was, for each incident, marking of an effectiveness rating on a 15-point scale, with 15 points being the highest (most effective) teaching performance. The purpose was to obtain mean ratings with sufficiently low standard deviations to enable their use as scale anchors. Guidelines furnished to raters for using the 15-point effectiveness scale have been previously reported(73).
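As a worked illustration of the weighting idea from the second task, the sketch below combines hypothetical 15-point dimension ratings with a hypothetical 100-point importance allocation over five dimensions; the study reports no actual weight values:

```python
# Worked example: an importance-weighted overall teaching score.
# Weights are a hypothetical 100-point allocation (task two); ratings are
# hypothetical 15-point BARS ratings. Neither set is study data.

weights = {"Teaching Ability-Lecture": 30,
           "Knowledge of Subject Area": 25,
           "Course Organization": 20,
           "Student-Instructor Interaction": 15,
           "Workload/Course Difficulty": 10}
ratings = {"Teaching Ability-Lecture": 12.0,
           "Knowledge of Subject Area": 13.5,
           "Course Organization": 9.0,
           "Student-Instructor Interaction": 11.0,
           "Workload/Course Difficulty": 8.0}

# Weighted mean: each dimension's rating counts in proportion to the
# importance points students allocated to it.
overall = sum(weights[d] * ratings[d] for d in weights) / sum(weights.values())
print(f"overall score = {overall:.2f} on the 15-point scale")
```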
Scale Construction

Incidents were retained as scale anchors in the respective scales if at least 80 percent of the participants agreed on assignment to the dimension and if the standard deviation about each mean scale point was 2.0 or less.10 After incidents were sorted based on these criteria, a group of 11 critical incidents with standard deviations of less than 2.0, but with respondent assignment of less than 80 percent agreement and assignment divided equally between two dimensions, was reviewed. A panel of five faculty members was asked to review these incidents to determine their suitability as behavioral descriptors for both dimensions. The group assigned the incidents to the dimension for which they felt the best description was provided. After this review, these incidents were added to the scales only if the behavior was different from a behavior at the same, or near to the same, scale point. This process yielded an additional eight incidents as useable scale anchors.

Importance ratings were considered in scale construction. Students responded by indicating that all scales were "important" or "very important," mean = 5.77. Based on the 100-point forced distribution, no dimension received fewer than approximately 5 points or more than approximately 15 points. Responses to the two ratings correlated highly, Rho = 0.93. The two dimensions rated most important were "Teaching Ability—Lecture" and "Knowledge of Subject Area." "Selection and Use of Media" and "Workload/Course Difficulty" were considered by students to be the least important. Although statistically significant differences were shown between dimension importance ratings, practical review of both of these ratings suggested that all 10 dimensions are generally considered to be important by students and that none should be eliminated from the final set of rating scales.

After the scales were constructed using incident ratings with the greatest rater agreement, a possible source of variance in ratings was examined. Using data from the learning style inventories administered to all student incident writers who also participated in the retranslation process, four basic learning styles were identified(69). Kolb has labeled these styles "Converger", "Diverger", "Assimilator", and "Accommodator". Application of the styles to pharmacy students' and pharmacists' learning has been described by Garvey et al. and Riley(70,71). To determine if learning style differences had an impact on overall respondents' ratings, a grand mean of all scale anchor points for all 10 scales was computed. Using one-way analysis of variance, no significant rating differences among the four learning style groups were found, F(3,50) = 0.34, P = 0.80. To determine if learning style differences related to mean scale ratings for individual dimensions, ten one-way analyses of variance were conducted and none yielded significant differences between learning style groups at P = 0.05. The mean P value for these tests was 0.80, with values ranging from P = 0.22 to P = 0.99. Differences in learning style among respondent-raters did not relate to the ratings which established scale anchor points.

The final scales are typical of BARS scales generally in terms of distribution of anchors at the high and low ranges of the scales. Critical incident generation is "easier" for descriptions of extremely ideal or unsatisfactory behaviors, less productive for generation of "average" incidents of professional behavior. The researchers elected not to select items with standard deviations greater than 2.0 in order to provide more mid-scale range anchors.

10. Sixty percent agreement is frequently cited as a selection criterion. In addition, some studies report that incidents with greater variances are selected for mid-scale anchors in scale construction.
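The retention rule just described can be sketched compactly; the incident record below (assignments from ten raters and their effectiveness ratings) is hypothetical:

```python
# Sketch of the anchor-retention rule: keep an incident only if at least
# 80 percent of raters assigned it to the same dimension AND the standard
# deviation of its effectiveness ratings is 2.0 or less. Sample data are
# hypothetical.

from collections import Counter
from statistics import mean, stdev

def retained(assignments, ratings, agree_cut=0.80, sd_cut=2.0):
    """assignments: dimension label chosen by each rater;
       ratings: 15-point effectiveness rating from each rater."""
    dimension, votes = Counter(assignments).most_common(1)[0]
    if votes / len(assignments) >= agree_cut and stdev(ratings) <= sd_cut:
        return dimension, round(mean(ratings), 1)  # dimension and anchor point
    return None  # rejected as a scale anchor

# One hypothetical incident rated by ten students:
print(retained(["G"] * 9 + ["A"], [12, 13, 11, 12, 14, 12, 13, 11, 12, 13]))
```

Applied incident by incident, the same filter yields the anchor counts summarized later in Table VI.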
RESULTS

The project results are reported first in terms of measures of scale quality, before describing the products developed: validated dimensions and scales.

Measures of BARS Quality

Reliability. Measures of inter-rater agreement and of stability were conducted using a limited number of volunteer faculty. Inter-rater reliability is reported in Table IV, based on one lecturer's performance using a sample of pairs of ratings. Test-retest reliability was based on two administrations of three selected BARS scales in one class, with a five-week time interval (see Table IV). The notion of stability of BARS scales is a useful, but not necessary, condition for demonstration of BARS scale reliability. Historical effects in the students' experiencing of instruction are expected during the length of a course. Results show significant test-retest correlations for two of the three scales, and a correlated means "t" test showed significant changes (lower ratings over time) for two of the three scales.



Table III. Numerically-anchored scale(a)
Name of the instructor being rated:

PLEASE COMPLETE THE FOLLOWING RATINGS SHOWING YOUR EVALUATION ON THE FIVE POINT SCALES
BELOW.

EXAMPLE... The Minnesota Twins will win the World Series in 1993. Agree _ _ _√_ _ Disagree
(scale points 1 2 3 4 5)

1. The course objectives were: very clear _____ very unclear


2. The instructor stated clearly what was expected of students: almost always _____ almost never
3. Did the instructor make good use of examples and illustrations? Yes, very often _____ No, seldom
4. It was easy to hear and understand the instructor. Almost always _____ Almost never
5. The instructor summarized material presented in each class. Almost always _____ Almost never
6. The instructor’s clinical demonstrations were clear and concise. Strongly agree _____ Strongly disagree
7. The grading procedures for the course were: very fair _____ very unfair.
8. Was the grading system for the course explained? Yes, very well _____ No, not at all
9. The amount of graded feedback given to me during the course was: quite adequate _____ not enough
10. Were exam questions worded clearly? Yes, very clear _____ No, very unclear
11. How well did examination questions reflect content and emphasis of the course? Well related _____ Poorly related
12. The instructor was sensitive to student needs. Almost always _____ Almost never
13. The instructor listened attentively to what class members had to say. Always _____ Seldom
14. How accessible was the instructor for student conferences about the course? Available regularly _____ Never available
15. The instructor promoted an atmosphere conducive to work and learning. Strongly agree _____ Strongly disagree
16. The instructor attempted to cover too much material. Strongly agree _____ Strongly disagree
17. The instructor was a dynamic teacher. Yes, very dynamic _____ No, very dull
18. The instructor motivated me to do my best work. Almost always _____ Almost never
19. The instructor stimulated my intellectual curiosity. Almost always _____ Almost never
20. The instructor’s knowledge of the subject was: excellent _____ poor
21. How would you characterize the instructor’s command of the subject? Broad and accurate _____ Plainly deficient
22. The instructor seemed well prepared for classes. Yes, always _____ No, seldom
(a) Item source: Instructor Course Evaluation System, ICES, University of Illinois, Champaign-Urbana.

Concurrent Validity. BARS ratings were correlated with corresponding numerically-anchored scales constructed with selected items from the catalog of items available from the university's Office of Instructional Resources. The researchers first identified all catalog items which related to the content described in the ten dimensions. Then, 31 items were selected which most closely matched the behaviors described in the dimensions tested. The final 22-item numerically-anchored scale appears in Table III, and its construction in relationship to the ten dimensions is described in Table IV. A numerically-anchored media scale was not constructed because the two lecturers did not use media other than assigned readings and the chalkboard.

Selected scales were administered to two groups of students at the researchers' school, one which rated two lecturers and one which rated clerkship instructors. Two senior faculty members, with courses in lecture format, volunteered for the study and signed releases which were included with the written rating instructions provided to students. Raters included all students in attendance at one third-professional-year lecture. After team-taught courses and instructors unwilling to volunteer were eliminated, the two available participants received very high ratings with low variance and ranges of ratings. Ratings for the experiential rotations were obtained by asking volunteer students from the final professional year, and recent alumni who were new, first-year members of the clinical faculty, to rate their "second preceptor." This procedure provided sufficient numbers of raters and eliminated ratings of "first-preceptor" student-faculty relationships. Because rotations were systematically scheduled, it also ensured representativeness from the wide variety of required clerkships offered. Faculty members responsible for laboratory instruction declined participation. They noted that items and scale anchors involving quality of laboratory instruments could cause low ratings which, if not kept confidential, might adversely affect their departmental and college-wide performance reviews. All but two of the correlations are positive and significant.
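The concurrent validation computation itself is a simple correlation of paired scores. In the sketch below, each student's BARS rating on one dimension is paired with the mean of the parallel ICES items; all scores are hypothetical, and the ICES items are assumed here to be scored so that 5 is the favorable pole:

```python
# Sketch of the concurrent validity check: correlate students' BARS
# ratings on a dimension with their mean score on the parallel ICES items
# (e.g., items 12-15 for Student-Instructor Interaction). All values are
# hypothetical; ICES items assumed scored with 5 = most favorable.

import numpy as np
from scipy import stats

bars = np.array([13, 9, 11, 14, 8, 12, 10, 13])   # 15-point BARS ratings
ices_items = np.array([                            # four parallel ICES items
    [5, 4, 5, 5], [3, 3, 2, 3], [4, 3, 4, 4], [5, 5, 4, 5],
    [2, 3, 2, 2], [4, 4, 5, 4], [3, 2, 3, 3], [4, 5, 5, 4],
])
ices_scale = ices_items.mean(axis=1)               # combined scale score

r, p = stats.pearsonr(bars, ices_scale)
print(f"concurrent validity r = {r:.2f} (P = {p:.4f})")
```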
Scale Properties and Error Reduction

BARS and numerically-anchored scales were compared for scale properties contributing to leniency, central tendency and halo effects. Evidence for less leniency effect in the use of BARS was provided by comparing the means for both sets of four selected scales: Evaluation, Interaction, Workload and Teaching. All four BARS means were lower. The mean BARS rating for four scales was 1.13 scale points lower, a statistically significant difference. Although these data may suggest that the BARS produce less leniency in ratings, possibly attributable to their unambiguous scale anchors, it is not clear which scale best represents a "true" rating of instructor performance.

Comparison of two scale properties suggests that BARS have produced less central tendency rating effect. The variance in ratings was greater for all BARS scales.



Table IV. Reliabilities and concurrent validation of BARS(a) using parallel ICES(b) scales and items

For each dimension(c): the number and item numbers of the parallel ICES items; the ICES scale reliability(d); the ratee(e); the BARS inter-rater(f) and test-retest(g) reliabilities; and the BARS-ICES correlation r(h) with its N.

A. Teaching ability-lecture (3 items, items 3-5):
  Ratee 1: reliability 0.63; r 0.45, N 87.
  Ratee 2: reliability 0.78; inter-rater 0.31; test-retest 0.63; r 0.43, N 89.
C. Teaching ability-experiential (1 item, item 6):
  Ratee 3: r 0.64, N 30.
D. Course organization (2 items, items 1-2):
  Ratee 2: reliability 0.83; r 0.45, N 90.
F. Student performance evaluation (5 items, items 7-11):
  Ratee 1: reliability 0.66; r 0.32, N 86.
  Ratee 2: reliability 0.72; inter-rater 0.26(ns); test-retest 0.35; r 0.28, N 89.
  Ratee 3: reliability 0.82; r 0.72, N 30.
G. Student-instructor interaction (4 items, items 12-15):
  Ratee 1: reliability 0.78; r 0.50, N 82.
  Ratee 2: reliability 0.77; inter-rater 0.43; test-retest 0.18(ns); r 0.51, N 87.
  Ratee 3: reliability 0.96; r 0.90, N 30.
H. Workload/course difficulty(j) (1 item, item 16):
  Ratee 2: inter-rater 0.19(ns); r 0.24(i), N 80.
  Ratee 3: r 0.14(ns), N 30.
I. Enthusiasm/motivation (3 items, items 17-19):
  Ratee 1: reliability 0.85; r 0.33, N 86.
  Ratee 2: reliability 0.83; inter-rater 0.39; r 0.47, N 89.
J. Knowledge of subject area (3 items, items 20-22):
  Ratee 1: reliability 0.75; r 0.19(ns), N 86.
  Ratee 2: reliability 0.80; r 0.49, N 90.
Total ICES items: 22(k).

(a) Behaviorally-anchored rating scales. (b) Instructor and Course Evaluation System, University of Illinois. (c) Two scales not tested: B, Teaching Ability-Laboratory, and E, Selection and Use of Media. (d) Cronbach's alpha. (e) Ratees 1 and 2 are lecturers (N > 80 raters); ratee 3 is an experiential preceptor (N = 30 raters). (f) Based on 60 randomly-selected pairs of ratings. (g) Correlations between 2 measures at 5-week intervals, N = 50 pairs. (h) Showing concurrent validity, BARS and ICES. (f-h) All correlations positive and significant at P<0.01, except "i" where P=0.05; the preceptor ratees for the Workload/Course Difficulty item, lecturer ratee no. 1 for the Knowledge of Subject Area scale, and three BARS reliability coefficients are non-significant (ns). (j) Low scale reliability, single item used. (k) Nine items deleted from original scale to develop reliable subscales.

Moreover, comparison of the modal ratings for both sets of all four scales shows that the BARS yielded modal ratings which were farther from their respective scale mid-points than their adjusted numerical scale counterparts: total differences of 14.9 vs. 8.3 scale points, respectively.

Halo effect was compared by examining correlations of measures with each other within BARS and within numerical scale types. If scales show a low inter-correlation, their independence is demonstrated, suggesting that raters are less apt to allow performance in one area to affect their ratings in another. Evidence for lower halo effect for BARS was not found. The mean intercorrelation for all four BARS was 0.71, SD = 0.10, and for the numerical scales, 0.58, SD = 0.19.
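The three scale-property comparisons reduce to summary statistics on the rating matrices. The sketch below runs them on random placeholder data (rows are raters, columns the four compared scales), so only the computations, not the printed values, are meaningful; the 5-point numerical ratings are rescaled to the 15-point range before comparison, mirroring the "adjusted" comparison described above:

```python
# Sketch of the leniency, central tendency and halo diagnostics described
# above, on random placeholder matrices (60 raters x 4 scales).

import numpy as np

rng = np.random.default_rng(0)
bars = rng.integers(1, 16, size=(60, 4)).astype(float)   # 15-point scales
ices = rng.integers(1, 6, size=(60, 4)).astype(float)    # 5-point scales
ices_adj = (ices - 1) * 3.5 + 1       # map 1-5 onto 1-15 for comparison

def mean_intercorrelation(m):
    c = np.corrcoef(m, rowvar=False)             # 4 x 4 scale correlations
    return c[~np.eye(4, dtype=bool)].mean()      # mean off-diagonal r

print("leniency (scale means):", bars.mean(axis=0), ices_adj.mean(axis=0))
print("central tendency (variances):", bars.var(axis=0), ices_adj.var(axis=0))
print("halo (mean intercorrelation):",
      round(mean_intercorrelation(bars), 2),
      round(mean_intercorrelation(ices_adj), 2))
```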
Project Products

Dimensions. Ten independent dimensions of pharmacy teaching were identified. They are described in Table V and include three previously-unreported new dimensions: "Selection and Use of Media," "Teaching Ability—Laboratory," and "Teaching Ability—Experiential." Three scales are environment-specific: "Teaching Ability—Lecture," "Teaching Ability—Laboratory," and "Teaching Ability—Experiential." The other seven scales apply to all three teaching environments. By combining the scales as the table suggests, either seven or eight dimensions of teaching may be measured in three pharmacy teaching environments. Laboratory instruction might also include evaluation of selection and use of media. Three sample scales appear in the Appendix.

BARS Scales. A total of 134 critical incidents "survived" the retranslation and effectiveness rating process, with a range of from 10 to 19 incidents used as anchors per scale. The process and results are summarized in Table VI. The mean percentage agreement on assignment of incidents to dimensions, 79.6 percent, nearly met the 80 percent retranslation goal. The mean standard deviation of 1.76 scale points illustrates strong student rater agreement on the level of effectiveness of each scale point.
Table V. Pharmacy instruction dimensions

Dimensions(a)

A. Teaching Ability—Lecture(b)
Audible and clear speaking; Interpretation and explanation of concepts; Use of examples and illustrations; Emphasis and summary of main points; Effective use of chalkboard.

B. Teaching Ability—Laboratory
Availability of equipment, reagents and ingredients; Demonstration before performance; Supervision; Safety; Sufficient time and access; Concise, useful reporting.

C. Teaching Ability—Experiential
Demonstration and supervision of learning experiences; Professional and patient communications; Practice.

D. Course Organization(b)
Clarity of scheduling; Detail of content outline; Clarity of learning objectives, assignments and student expectations; Following the course outline and objectives.

E. Selection and Use of Media
Effective use of slides, overheads, videos, texts, handouts, models.

F. Student Performance Evaluation(b)
Lecture, laboratory and experiential; Relationship to course content/objectives; Clear, unambiguous questions and assignments; Explanation of method, content, administration; Feedback to students; Fair, objective grading; Application, not rote memory.

G. Student-Instructor Interaction(b)
Availability for consultation; Responses to student difficulties; Conveying a helpful and supportive attitude; Concern about student learning; Sensitivity to students' needs; Interest in student outcomes; Availability for help after class; Listening to student questions and concerns; Initiatives to help students; Atmosphere conducive to learning.

H. Workload/Course Difficulty(b)
Scope of content; Length and difficulty of assignments; Coverage of content; Reasonable due dates and project deadlines.

I. Enthusiasm/Motivation(b)
Dynamic in presentation of subject; Stimulation of student thought and interest; Motivation of students to do their best work.

J. Knowledge of Subject Area(b)
Well-prepared; Competent in field; Knows limits of expertise.

(a) Classroom teaching evaluated on dimensions A, D-J. Laboratory teaching evaluated on dimensions B, D-J. Experiential teaching evaluated on dimensions C, D, F-J.
(b) Tentative dimensions identified at onset of study. Dimensions B, C and E added on basis of critical incidents surviving the retranslation/rating process.

DISCUSSION

The project yielded four major outcomes: (i) validated dimensions of teaching performance for use in development or revision of traditional scales; (ii) reliable and valid numerically-anchored I.C.E.S. scales; (iii) reliable and valid BARS for administration; and (iv) BARS for use in faculty development.

Utility of the Dimensions in Local Scale Development or Revision. The kind and quality of instruments in current use for student rating of pharmacy faculty teaching varies considerably. These BARS and parallel traditional scales are the first to be based on the ten new independent dimensions of teaching performance unique to pharmacy education. For colleges of pharmacy which participate in university-wide rating systems, the project offers guidance for the college to work with the central agency responsible for managing the faculty evaluation program. Existing traditional numerically-anchored items of high quality may be combined into scales for the 10 unique pharmacy teaching dimensions. Such scales may be used to report performance ratings with higher reliability than is possible with a series of individual items. If the central service agency does not offer items to rate performance in all of the new pharmacy teaching dimensions identified, item-writing and validating activities are called for to complete the locally-developed scales. With such revised scales in place, development of local, within-pharmacy norms is possible. For schools not required to participate in university-wide teaching evaluation systems, similar possibilities exist for within-school scale modification and improvement.

Use of Equivalent Form ICES Scales. Concurrent validation of the BARS using specially-constructed numerical scales of parallel content has an additional useful outcome. For schools using the I.C.E.S. system, use of the traditional scales developed for this study, augmented by additional I.C.E.S. items or other items descriptive of the ten dimensions, could provide reliable scale scores based on the dimensions. Reliability studies on such expanded scales are recommended.

Administration of BARS Scales. The expected project outcome of reliable and valid scale development was accomplished, and the product is available for use in schools of pharmacy. BARS scales are expensive to develop and maintain. Use and continued research and development of these scales in multiple pharmacy schools would provide additional positive returns on the research and development investment. Care should be taken, however, to systematically select, introduce, administer, and monitor the scales. Use of BARS has been most successful in organizations where persons being rated have had input into the scale development process and where the scales are professionally administered(74). Each administration should be managed by a human resources expert familiar with development and administration of this type of performance rating scale. Unsupervised scale use by students is not recommended, nor is administration by persons untrained in performance assessment. Potential user schools should utilize a designated testing specialist for BARS scale administration.

Use of BARS in Faculty Development. One of the many characteristics of BARS is that, because of the vivid behaviors they portray, faculty ratees are prone to adopt effective teaching behaviors and to abandon those associated with low scale ratings.
Table VI. Summary statistics, retranslation and effectiveness ratings

Number of dimensions: 10.
Useable incidents: 134 total; mean 13.4 per scale; range 10-19 per scale.
Percent agreement on relevant dimension: mean 79.6; range 60.6-100.
Standard deviation of effectiveness ratings: mean 1.76; range 1.1-2.0.

improved faculty performance. This desirable side effect of talent being rated. Such studies should be expanded to
BARS use suggests that their greatest contribution may be include all of the dimensions of teaching in all environments,
in the provision of highly-effective faculty performance feedback, and not in their reliable and valid performance assessment capabilities alone. The utility of BARS in providing performance feedback is well-established(75). Heartfelt introspection about these unforgiving "snapshots" of what students think of their instructors' teaching could result in a re-dedicated commitment to improved teaching.

Study Constraints
Three constraints, two methodological and one philosophical, may have limited the study outcomes. First, the known disadvantages of using study volunteers are evident. Although upperclassmen were sufficiently represented to permit rich incident-writing contributions, a larger number of volunteers from the final professional year could have enhanced the study. Perspectives of additional mature students' writings would have enhanced the pool of incidents. More importantly, participation by a larger proportion of "seniors" from study schools would have enabled their utilization in larger numbers for the retranslation/rating steps, allowing less reliance on senior students from the pilot school. Second, faculty member commitment from study schools for the purpose of concurrent validation of the scales was not sought at the onset. Instead, volunteers were obtained only from the researchers' school. Only two lecturers, both highly experienced, volunteered, and limitations on their available class time required administration of only part of the scales. Both lecturers received very high ratings on both types of scales, thus narrowing the range of responses. The higher correlations for experiential courses were due, in part, to a much wider variance in ratings than for the two volunteer lecturers (the attenuating effect of such range restriction is illustrated at the end of this subsection). Third, the factor-analytic basis for classifying teaching behaviors was not challenged in this study. The foundation for scale construction was the commonality of seven factors established and named in previous studies. Because this study stressed observed behaviors, it did not create global descriptors of instructors' "personality." Moreover, the dimensions were not created or edited by students. Perhaps students, not educational researchers, should be asked to fashion a tentative set of dimensions based on the critical incidents, without prompting of previously-named factors or the dimensions identified in this study. It is possible that students have a discerning and reliable way of "knowing" qualities of instruction and may be able to organize and describe a set of instructional qualities more efficiently than researchers who begin with factor-analyzed groupings of teaching behaviors and who insist on working only with descriptions of observed behavior.
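The attenuation just described can be made concrete with the standard textbook correction for restriction of range; it was not applied in this study and is included here only as an illustration. If $r_o$ is the correlation observed in the range-restricted sample, $s$ the standard deviation of ratings in that sample, and $S$ the standard deviation expected over an unrestricted range of instructors, the estimated unrestricted correlation is

$$r_u = \frac{r_o\,(S/s)}{\sqrt{1 - r_o^2 + r_o^2\,(S/s)^2}}$$

Because $S/s$ exceeds one whenever the observed range is narrowed, $r_u$ exceeds $r_o$: concurrent validity coefficients computed from two uniformly high-rated lecturers understate what a fuller range of instructor performance would likely reveal.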
Topics for Future Research

Reliability. Ongoing reliability studies are planned. Cooperation of additional volunteer instructors, including those with little teaching experience, would broaden the range of ratings, particularly for laboratory teaching. The low concurrent validity correlations for two scales require additional study. Low correlations for the Workload item and the Knowledge scale are attributable, in part, to student differences in perceptions. Review of scale development ratings of critical incidents depicting "Workload and course difficulty" showed that some students approach ratings for this dimension in terms of the relative "ease" of the workload, others in a more normative sense in terms of the perceived "appropriateness" of the amount of work assigned. Thus, both types of scales are subject to students' perceptions of appropriate input and effort vs. their own learning styles and willingness to expend effort. Similarly, for student ratings of "Knowledge," students deal with perceptions rather than facts about the instructor's knowledge. Only vivid examples of lack of preparedness in the classroom, as measured by the BARS scale, served to measure this reliably. Reliability studies will also be conducted on expanded versions of the numerically-anchored I.C.E.S. scales.
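To make concrete what such reliability studies compute, the sketch below estimates one common coefficient, Cronbach's alpha, treating student raters as "items" who each rate several instructors on a single BARS dimension. This is an editorial illustration only: the data are invented, and the planned studies may use other coefficients (e.g., intraclass correlation).

```python
# Illustrative only: invented ratings, not study data.
# Rows = instructors rated, columns = student raters; values on the
# 1-15 BARS effectiveness scale.
import numpy as np

ratings = np.array([
    [12, 11, 13, 12],
    [ 6,  7,  5,  6],
    [ 9, 10,  9,  8],
    [14, 13, 14, 15],
    [ 8,  7,  9,  8],
])

k = ratings.shape[1]                          # number of raters
rater_vars = ratings.var(axis=0, ddof=1)      # each rater's variance
total_var = ratings.sum(axis=1).var(ddof=1)   # variance of summed ratings
alpha = (k / (k - 1)) * (1 - rater_vars.sum() / total_var)
print(f"Cronbach's alpha across raters: {alpha:.2f}")
```

High agreement among raters yields a coefficient near 1.0; recruiting instructors with a wider spread of performance, as suggested above, raises the detectable between-instructor variance and stabilizes the estimate.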
Research on Learning Styles. When students are made aware of their personal learning styles, accommodations to instructional formats and styles may be made. This study demonstrated that the mean BARS ratings for items selected in scale development were not affected by students' learning styles. Research is continuing on the effect of learning styles on all 402 critical incidents which were subjected to retranslation and effectiveness ratings, especially those which were rejected for scale use because of wide rating variance. Significant item variance differences between learning style groups, if discovered, may offer insights to instructors for possible instructional style and performance accommodations based on specific observed teaching behaviors.
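A sketch of that variance comparison follows; the incident ratings and group sizes are fabricated, and Levene's test is only one reasonable choice of procedure for comparing rating variances across Kolb learning-style groups(69).

```python
# Illustrative only: invented effectiveness ratings (1-15 scale) of a
# single critical incident, grouped by the rater's Kolb learning style.
from scipy.stats import levene

ratings_by_style = {
    "converger":    [12, 11, 13, 12, 11, 12],
    "diverger":     [14,  8, 12,  6, 13,  9],
    "assimilator":  [11, 12, 12, 13, 11, 12],
    "accommodator": [10, 12, 11, 13, 12, 11],
}

# H0: rating variances are equal across learning-style groups.
stat, p_value = levene(*ratings_by_style.values())
print(f"Levene W = {stat:.2f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Rating variance for this incident differs by learning style.")
```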

Taxonomical Classification of Incidents with Ethical Implications. Numerous items describing substandard professional behavior were eliminated from the scales. A review of the bank of critical incidents for purposes of classification into available taxonomies of ethical behaviors is planned(76,77).

Personal Dimensions. The emphasis in scale construction and use has been on the advantages of unidimensional observed behaviors as scale anchors. This emphasis enabled identification of ten discrete dimensions of teaching performance. It may also be possible to classify additional teaching behaviors based on personal attributes of the instructor. More "trait-like" than the observable performance-based dimensions identified, such clusters may "cut across" many of the ten validated performance dimensions. Such personal dimensions, e.g., "Independence/Assertiveness" and "Handling/Coping with Detail," have been previously reported for BARS describing pharmacy practice behaviors(78). Identification of such teaching dimensions could supplement these BARS, enhancing the ability to more completely and accurately describe the characteristics of effective teaching.
CONCLUSION

This study has addressed the problem of rater error in several ways. First, in terms of scale content, separate dimensions have been identified and scales have been developed for three pharmacy teaching environments. Scale anchors refer to instructional behaviors only, not to extraneous conditions beyond the instructor's control. Enhanced by broadly-based input, the scales are generalizable and available for use in all types of colleges of pharmacy. Second, global student descriptors of instructors' "personality" have been replaced by measures of two kinds of important, and observable, teaching behaviors: "Student Interaction" and "Enthusiasm/Motivation." Third, problems with rater errors have been reduced. Fourth, approaches to more reliable use of traditional, numerically-anchored scales have been suggested. Finally, however, the greatest impact may be the "mirror" which these BARS have provided into pharmacy teaching styles and behaviors. For better or worse, students, with input from faculty incident writers, have painted their multi-colored picture of the teaching landscape. This vivid painting, as a rating instrument, may, it is hoped, blur and fade. Prompted by faculty review of BARS, the desired outcome of improved teaching could then demand even more sensitive measures and compelling reminders of how the teaching/learning enterprise might continually be enhanced.

Acknowledgments. The assistance of Mikyoung Choi in literature review and of Debra Agard and Trena Magers in data entry, and the helpful consultation and comments of Bruce A. Sevy, Personnel Decisions, Inc., are gratefully acknowledged. Several colleagues collaborated in research at four anonymous colleges of pharmacy, providing essential support in the selection of representative faculty and student groups and in making logistical arrangements for conducting the critical incident writing workshops.

Am. J. Pharm. Educ., 58, 25-37 (1994); received 9/29/93, accepted 1/23/94.

References
(1) Centra, J.A., Determining Faculty Effectiveness, Jossey-Bass, San Francisco CA (1979) pp. 7-11.
(2) Braskamp, L.A., Brandenburg, D.C. and Ory, J.C., Evaluating Teaching Effectiveness, Sage Publications, Newbury Park CA (1984) pp. 29-76.
(3) Centra, J.A. and Creech, F.R., The Relationship between Student, Teacher, and Course Characteristics and Student Ratings of Teacher Effectiveness, Project Report 76-1, Educational Testing Service, Princeton NJ (1976).
(4) Measurement and Research Division, Office of Instructional Resources, "ICES norms," Unpublished Report, University of Illinois, Urbana IL (1977-83).
(5) Op. cit. (2), pp. 48-49.
(6) Ware, J.E. and Williams, R.G., "The Dr. Fox effect: A study of lecture effectiveness and ratings of instruction," J. Med. Educ., 50, 149-156 (1975).
(7) Hildebrand, M., Wilson, R.C. and Dienst, E.R., Evaluating University Teaching, Center for Research and Development in Higher Education, Berkeley CA (1971) pp. 18-20.
(8) Dickinson, T.L. and Zellinger, P.M., "A comparison of the behaviorally anchored rating and mixed standard scale formats," J. Appl. Psychol., 65, 147-154 (1980).
(9) MacMillan Dictionary of Psychology, (edit. Sutherland, S.) MacMillan, London (1989), p. 183.
(10) Ibid., p. 183.
(11) Encyclopedia of Psychology, Vol. 3, (edit. Corsini, R.) John Wiley and Sons, New York NY (1984) p. 205.
(12) Smith, P.C., "Behaviors, results and organizational effectiveness: The problem of criteria," in Handbook of Industrial Psychology, (edit. Dunnette, M.) Wiley and Sons, New York NY (1983) p. 757.
(13) Op. cit. (11), p. 205.
(14) Op. cit. (2), pp. 25-26, 49-50.
(15) Manning, R.C., The Teacher Evaluation Handbook, Prentice-Hall, Englewood Cliffs NJ (1988) pp. 4-5.
(16) Ibid., pp. 6-9.
(17) Op. cit. (1), p. 43.
(18) Op. cit. (2), p. 51.
(19) Op. cit. (2), pp. 51-52.
(20) Op. cit. (2), p. 80.
(21) Ivancevich, J.M., Human Resource Management, 5th ed., Irwin, Homewood IL (1992) p. 327.
(22) Smith, P.C. and Kendall, L.M., "Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales," J. Appl. Psychol., 47, 149-155 (1963).
(23) Wotruba, T.R. and Wright, P.L., "How to develop a teacher rating instrument: A research approach," J. Higher Educ., 46, 653-663 (1975).
(24) Op. cit. (3), pp. 18-19.
(25) Brandenburg, D.C., Braskamp, L.A. and Ory, J.C., "Considerations for an evaluation program of instructional quality," CEDR Quart., 12, 8-12 (1979).
(26) ICES Item Catalog, Newsletter No. 1, Office of Instructional Resources, University of Illinois, Urbana-Champaign IL (1977).
(27) Op. cit. (7), pp. 4-13.
(28) Das, H., Frost, P.J. and Barnowe, J.T., "Behaviorally anchored scales for assessing behavioral science teaching," Can. J. Behav. Sci., 11, 79-88 (1979).
(29) Op. cit. (8), p. 149.
(30) Kotzan, J.A. and Mikeal, R.L., "A factor-analyzed pharmacy-student evaluation of pharmacy faculty," Am. J. Pharm. Educ., 40, 3-7 (1976).
(31) Kotzan, J.A. and Entrekin, D.N., "Development and implementation of a factor-analyzed faculty evaluation instrument for undergraduate pharmacy instruction," ibid., 42, 114-118 (1978).
(32) Jacoby, K.E., "Behavioral prescriptions for faculty based on student evaluations of teaching," ibid., 40, 8-13 (1976).
(33) Purohit, A.A., Manasse, H.R., Jr. and Nelson, A.A., "Critical issues in teacher and student evaluation," ibid., 41, 317-325 (1977).
(34) Op. cit. (7), pp. 16-22.
(35) Sauter, R.C. and Walker, J.D., "A theoretical model for faculty 'peer' evaluation," Am. J. Pharm. Educ., 40, 165-166 (1976).
(36) Martin, R.E., Perrier, D. and Trinca, C.E., "A planned program for evaluation and development of clinical pharmacy faculty," ibid., 47, 102-107 (1983).
(37) Downs, G.E. and Troutman, W.G., "Faculty evaluation and development issues: Clinical faculty evaluation," ibid., 50, 193-195 (1986).
(38) Carlson, P.G., "A panel: The evaluation of teaching in schools and colleges of pharmacy," ibid., 39, 446-448 (1975).
(39) Kulik, J.A., "Evaluation of teaching," Memo to the Faculty, 53, 2 (1974).
(40) Brown, B.F., Education by Appointment, Parker Publishing, West Nyack NJ (1968).
(41) Peterson, R.V., "Chair report of the AACP Council of Faculties Ad Hoc Committee on Promotion and Tenure," Am. J. Pharm. Educ., 44, 428-430 (1980).
(42) Zanowiak, P., "Evaluation of teaching: One faculty member's viewpoint," ibid., 39, 450-452 (1975).
(43) Kiker, M., "Characteristics of the effective teacher," Nursing Outlook, 21, 721-723 (1973).
(44) Grussing, P.G., Silzer, R.F. and Cyrs, T.E., Jr., "Development of behaviorally-anchored rating scales for pharmacy practice," Am. J. Pharm. Educ., 43, 115-120 (1979).
(45) Lipman, A.G. and McMahon, J.D., "Development of program guidelines for community and institutional externships," ibid., 43, 217-222 (1979).
(46) Schwab, D.P., Heneman III, H.G. and DeCotiis, T.A., "Behaviorally anchored rating scales: A review of the literature," Personnel Psychol., 28, 549-562 (1975).
(47) Kingstrom, P.O. and Bass, A.R., "A critical analysis of studies comparing behaviorally anchored rating scales (BARS) and other rating formats," ibid., 34, 263-289 (1981).
(48) Campbell, J.P., Dunnette, M.D., Arvey, R.D. and Hellervik, L.W., "The development and evaluation of behaviorally based rating scales," J. Appl. Psychol., 57, 15-22 (1973).

(49) Borman, W.C. and Dunnette, M.D., "Behavior-based versus trait-oriented performance ratings: An empirical study," ibid., 60, 561-565 (1975).
(50) Harari, O. and Zedeck, S., "Development of behaviorally anchored scales for the evaluation of faculty teaching," ibid., 58, 261-265 (1973).
(51) Flanagan, J.C., "The critical incident technique," Psychol. Bull., 51, 327-358 (1954).
(52) Op. cit. (46), pp. 554-555.
(53) Op. cit. (47), pp. 266-273.
(54) Op. cit. (8), pp. 150-153.
(55) Landy, F.J. and Guion, R.M., "Development of scales for the measurement of work motivation," Org. Behav. Human Perform., 5, 93-103 (1970).
(56) Jacobs, R., Kafry, D. and Zedeck, S., "Expectations of behaviorally anchored rating scales," Personnel Psychol., 33, 595-610 (1980).
(57) Bernardin, H.J., Alvares, K.M. and Cranny, C.J., "A recomparison of behavioral expectation scales to summated scales," J. Appl. Psychol., 61, 564-570 (1976).
(58) Hom, P.W., DeNisi, A.S., Kinicki, A.J. and Bannister, B.D., "Effectiveness of performance feedback from behaviorally anchored rating scales," ibid., 67, 568-576 (1982).
(59) Blood, M.R., "Spin-offs from behavioral expectation scale procedures," ibid., 59, 513-515 (1974).
(60) Zedeck, S., Imparato, N., Krausz, M. and Oleno, T., "Development of behaviorally anchored rating scales as a function of organizational level," ibid., 59, 249-252 (1974).
(61) Op. cit. (50), p. 263.
(62) Op. cit. (28).
(63) Green, S.B., Sauser, W.I., Fagg, J.N. and Champion, C.H., "Shortcut methods for deriving behaviorally anchored rating scales," Educ. Psychol. Meas., 41, 761-775 (1981).
(64) Champion, C.H., Green, S.B. and Sauser, W.I., "Development and evaluation of shortcut-derived behaviorally anchored rating scales," ibid., 48, 29-41 (1988).
(65) Kiresuk, T.J., Smith, A. and Cardillo, J.E., Goal Attainment Scaling: Applications, Theory, and Measurement, Erlbaum, Hillsdale NJ (1993).
(66) Elenbaas, R.M., "Evaluation of students in the clinical setting," Am. J. Pharm. Educ., 40, 410-417 (1976).
(67) Nelson, A.A. and Maddox, R.R., "An assessment of the mastery of entry-level practice competencies using a primary care clerkship training model," ibid., 56, 354-363 (1992).
(68) Penna, R.P. and Sherman, M.S., "Enrollments in schools and colleges of pharmacy, 1988-1989," ibid., 53, 270-302 (1989).
(69) Kolb, D., Learning Style Inventory Interpretation Booklet, McBer and Co., Boston MA (1985).
(70) Garvey, M.G., Bootman, J.L. and McGhan, W.F., "An assessment of learning styles among pharmacy students," Am. J. Pharm. Educ., 48, 134-140 (1984).
(71) Riley, D.A., "Learning styles: A comparison of pharmacy students, graduate students, faculty and practitioners," ibid., 51, 33-36 (1987).
(72) Op. cit. (44), pp. 116-117.
(73) Op. cit. (44), p. 117.
(74) Op. cit. (58), p. 570.
(75) Op. cit. (58), p. 574.
(76) Counelis, J.S., "Toward empirical studies on university ethics," J. Higher Educ., 64, 84-86 (1993).
(77) Fassett, W.E., Doing Right by Students: Professional Ethics for Professors, PhD Dissertation, University of Washington, Seattle WA (1992).
(78) Op. cit. (44), p. 116.


APPENDIX: THREE SAMPLE BARS SCALES FOR STUDENT EVALUATION OF PHARMACY INSTRUCTION

INSTRUCTIONS TO RATER:
1. Carefully read the dimension and supporting examples (in parentheses).
2. Read each performance level on this dimension for your ratee.
3. Consider the typical performance level on this dimension for your ratee. Compare his/her typical performance with each of the performance examples. Circle the scale number (1-15) nearest to the performance example which best shows his/her typical performance in this dimension.
4. Follow the same rating procedure for all 10 dimensions.

(An illustrative sketch of summarizing completed forms appears after the sample scales.)

***

A. TEACHING ABILITY - LECTURE
(Audible and clear speaking; Interpretation and explanation of concepts; Use of examples and illustrations; Emphasis and summary of main points; Effective use of chalkboard.)

Rating   Performance Example

EXCELLENT
15-
14-
13-
12.3  At the beginning of each class period, this instructor briefly summarized the previous lecture and outlined the present lecture.
12.1  This instructor described not only concepts and processes, but also the rationale supporting them.
12-
11.9  This instructor began each class period by asking students if they had any questions from the last class period, or
      This instructor taught several approaches to solving problems, pointing out the rationale for each method.
11.8  When new drug products entered the market, this instructor frequently used them in examples illustrating therapeutic aspects of the active ingredient(s).
11-
10.5  When lecturing from overhead projections, this instructor looked to the class and paused, asking if there were any questions.
10-
9-
8-
7-
6-
5.5   This instructor frequently said "Aahhh" or "Ummm" between phrases and sentences.
5-
4.6   When overhead transparencies were removed before students could complete their notes, this instructor would say "You only need to listen to what I am saying."
4.3   This instructor used new scientific and professional terms freely, assuming that students already knew them.
4-
3.7   This instructor did not speak clearly, saying "sorption" and not conveying whether adsorption or absorption was meant.
3-
2.8   This instructor lectured "over the heads" of the students.
2.6   This instructor did not enunciate clearly, mumbling through lectures, or
      When students would ask this instructor to please repeat a point made in lecture, the instructor would say "Get it from your neighbor," and continue lecturing.
2.5   This instructor wrote notes on the chalkboard faster than students could comprehend and record them, then erased the notes before students completed taking them down.
2-
1-
POOR

D. COURSE ORGANIZATION
(Clarity of scheduling; Detail of content outline; Clarity of learning objectives, assignments and student expectations; Following the course outline and objectives.)

Rating   Performance Example

EXCELLENT
15-
14-
13-
12-
11.3  This instructor's course syllabus contained helpful suggestions on how to take notes, study for exams, and general expectations for student performance.
11-
11.0  This instructor reviewed learning objectives before each examination.
10.4  This clinical preceptor told students "up front" what was expected and followed through with learning situations.
10-
9.5   This instructor provided students with written exam, term project and grading policies.
9-
8-
7-
6-
5.7   This instructor's course included content which was duplicative of previously taught prerequisite "material."
5-
4.1   This instructor wrote a special text for the course, but did not make it available until the third week of the term.
4-
3.8   When this instructor divided a class into recitation sections, the content was not standardized between sections.
3.7   After arriving late for conferences, this clinical preceptor would spend additional time to collect materials and get organized.
3.6   This instructor coordinated a team-taught course in which lecturers had no idea of what other lecturers were teaching.
3.3   This instructor frequently delayed lecture ten minutes while returning to his/her office for forgotten lecture notes.
3.1   This instructor never had sufficient copies of handouts on the first day of class.
3-
2.9   This instructor frequently arrived late to lecture and then would run overtime with lecture.
2.8   This instructor began the course without a syllabus, saying that he would work it up as the term progressed.
2.7   After arriving late to class, this instructor would ask "What are we supposed to lecture about today?"
2.2   Unknown to college administration and students, this instructor arranged for a T.A. to teach the entire course.
2.0   This instructor distributed his/her syllabus two weeks before the end of instruction.
2-
1-
POOR

F. STUDENT PERFORMANCE EVALUATION
(Lecture, Laboratory, and Experiential: Relationship to course content/objectives; Clear, unambiguous questions and assignments; Explanation of method, content, administration; Feedback to students; Fair, objective grading; Application, not rote memory.)

Rating   Performance Example

EXCELLENT
15-
14-
13-
13.0  After exams, this instructor made examinations available via computer where students could see the correct answers, answers missed, plus helpful comments on each question.
12.2  This instructor provided practice quizzes on computer terminals.
12-
11.5  During the next lecture after an exam, this instructor reviewed the questions most frequently missed by students.
11.4  This preceptor conducted weekly performance feedback sessions with all externs.
11-
10.9  This preceptor's constructive feedback included reasons for needed improvement as well as positive outcomes of things the students did well.
10.6  This instructor encouraged students to submit term papers early so that feedback could be provided, enabling revision before the due date.
10-
9.9   This clinical preceptor's exams were patient-oriented in case format.
9-
8-
7-
6-
5-
5.0   This lab instructor based grades on results and not on explanations of the process used to obtain results.
4.5   This instructor did not proofread exams and made corrections on the chalkboard only after students detected errors during the exam.
3.9   This instructor provided only one description of how grades would be computed: "totally bell curve."
3.5   This clinical preceptor was unable to document, with specific student performance behaviors, reasons for the grade assigned.
2.9   This instructor, named "trivial pursuit" by the class, tested on facts which were least emphasized in class.
2.8   This instructor's exams were so long that it was impossible to complete them in the time allowed.
2.6   This instructor administered multiple-choice exams containing not less than twelve responses per question.
2.5   This preceptor did not give student performance feedback, even if asked.
2.2   This clinical preceptor refused to give students their final rotation evaluation until they turned in their evaluation of the preceptor first.
2.1   This instructor did not return midterm exams until one day before the final.
2.0   This instructor had a policy of not assigning "A" grades, saying "No one is perfect."
2-
1-
POOR
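The rating procedure above lends itself to simple aggregation once completed forms are collected. The sketch below is an editorial illustration only: the dimension names are abbreviated from the sample scales, the circled values are invented, and a per-dimension mean with its standard deviation is just one plausible way to assemble the feedback profile returned to an instructor.

```python
# Illustrative only: each completed form maps dimension -> circled value
# (1-15); three invented raters shown.
from statistics import mean, stdev

forms = [
    {"Teaching Ability - Lecture": 12, "Course Organization": 9,
     "Student Performance Evaluation": 11},
    {"Teaching Ability - Lecture": 13, "Course Organization": 8,
     "Student Performance Evaluation": 10},
    {"Teaching Ability - Lecture": 11, "Course Organization": 10,
     "Student Performance Evaluation": 12},
]

# Per-dimension profile: the summary fed back to the instructor.
for dim in forms[0]:
    scores = [form[dim] for form in forms]
    print(f"{dim}: mean {mean(scores):.1f}, "
          f"SD {stdev(scores):.1f}, n = {len(scores)}")
```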
