July 2006
This is the final report for the 2005-2006 school year on the new Unit Assessment for the
Teacher Preparation Program. This is the first year of this process, and this report includes
results from both fall and spring semesters. Some of these results were previously reported in
a preliminary validation study conducted using Fall 2005 data; these results are included in this
summary report for the year, although in a slightly altered form. Although fall and spring
semester data are reported separately [1], the final conclusions are based on all data from both
semesters.
This assessment is based on the criteria developed at the recommendation of our national
accrediting body, the National Council for Accreditation of Teacher Education (NCATE), at
their most recent assessment and re-certification of our program, a process that was concluded
in spring 2005. NCATE granted Rider’s School of Education full re-certification without
conditions, but as part of that process the NCATE Examiners suggested one area in which the
School of Education needed to improve its self-assessment procedures in the future. Self-
assessment in the School of Education has until this year focused primarily on evaluations of
separate programs within the School of Education -- programs such as elementary education,
special education, mathematics education, etc. -- and these assessments have not been
conducted in a manner that allowed their results to be aggregated across the entire School of
Education for summary reporting. This design reflected the need to report to each of the many
Specialty Program Associations (SPAs) that NCATE and Rider’s School of Education use to
evaluate each of our many certification programs.
The Unit Assessment that is the subject of this report was designed, in conjunction with the
NCATE Examiners, to meet NCATE’s recommendation that the School of Education conduct
Footnote 1: Separate analyses were conducted because the data are not completely parallel. There are two reasons for this: (1) only partial data were collected (in particular, only Level 1 and Level 2 assessments were collected and analyzed in the fall 2005 semester, whereas assessments of all students at all three levels were made and analyzed in the spring 2006 semester), and (2) based on the experience of the fall 2005 data collection and analysis, some minor adjustments were made to the process of data collection.
Because this was the first year in which these data were collected, results (especially for the fall 2005 semester) are incomplete, but they nonetheless suggest that we are meeting our principal goals. Most importantly for this initial assessment, they demonstrate good levels of reliability and validity for our rating system, which appears to be sufficiently robust to allow meaningful interpretation of data. This year's effort was designed to test the system and to provide initial results [2].
Footnote 2: To further clarify for those not familiar with the process we have recently adopted, the Unit
Assessment activities described in this report are just the newest aspect of our self-evaluation
process. The evidence reviewed by the NCATE team of examiners was based on a variety of
other quantitative internal criteria (e.g., minimum grade requirements in education courses and
minimum GPA requirements) and external criteria (e.g., scores on a variety of national teacher
certification tests, scores on national tests of competence in fundamental skills such as writing
and mathematics, and scores on national tests of content knowledge in the areas of science,
history, geography, mathematics, literature, and the arts). In these areas, Rider’s Teacher
Preparation Program has set GPA and PRAXIS testing requirements that meet (and in several
areas exceed) state certification requirements.
We also have demonstrated that we meet or exceed the dozens of program-specific
criteria set by the many Specialty Program Associations (SPAs) affiliated with NCATE. These
include the national groups that NCATE trusts to evaluate elementary education programs, early
childhood education programs, special education programs, and secondary subject-area teaching
programs like mathematics education and English/language arts education. Each of these SPAs
sets very specific, performance-based requirements, and our program presented (as part of the
NCATE re-certification process) detailed evidence of our success in meeting these standards.
This evidence was evaluated both by the SPAs and by NCATE as part of our recent (and
successful) evaluation for re-certification. We will continue to collect these data for future
reports to the individual SPAs. The Unit Assessment procedures that are the subject of this
report are an addition to our assessment procedures but do not replace or negate those other,
program-specific evaluations.
The Unit Assessment described here was initially designed under NCATE guidance in the
spring of 2005 as a part of that re-certification process and was further developed by the Teacher
Education Department in the summer and fall of 2005. Its goal is to assess how well our Teacher
Preparation Program is meeting the overarching and unifying goals of our program -- the goals
that all our various programs share. These goals, which are set forth in Rider’s School of
Education’s Conceptual Framework, commit us to fostering committed, knowledgeable,
reflective professionals. It is therefore student performance in those four areas -- commitment,
knowledge, reflection, and professionalism -- that we are assessing.
The School of Education at Rider University strives to prepare students for educational settings
by fostering committed, knowledgeable, reflective professionals. Our plan is to assess
candidates' development in these four areas (commitment, knowledge, reflection, and
professionalism) at three points: upon matriculation (following completion of the first two
required courses in the department); upon completion of methods courses; and upon
completion of student teaching [3]. These assessments are not part of grading students in their
coursework and will not routinely be reported to students. The ratings are done with the primary
goals of program evaluation and alerting the program of the need for intervention where
necessary with students who are making unacceptable progress. These assessments are part of a
larger School of Education Unit Assessment Plan.
We will conduct reliability and validity studies at regular intervals to document that the data
being collected are meaningful. This is the report of the first such validity study. In all cases,
two or more independent assessments of each student will be made at each level. These will be
made by the professors of EDU-106 and EDU-206 at the first level (matriculation), professors
of all required methods courses at the next level (these courses vary depending on the areas in
which students hope to be certified), and the student teaching supervisor, cooperating teacher,
and seminar leader at the final (student teaching) level.
PART II: FALL 2005 SEMESTER DATA, ANALYSIS, AND CONCLUSIONS [4]
A total of 488 ratings are included in this analysis. Of these, 144 were of students in sophomore-level courses (level 1 -- matriculation) and 320 were of students in junior-level courses (level 2 -- methods courses) [5].

Footnote 3: These three levels are being used throughout the School of Education, including Graduate Education programs, and the data may in the future be aggregated for an overall Unit Assessment. No such aggregation of data will be attempted until each program can demonstrate reliability and validity in its assessments, however.
Footnote 4: The data and analyses reported in this section were previously reported, in slightly different form, in February 2006.
Of the level 1 ratings, 67 were from EDU-106 and 84 were from EDU-206. The breakdown for
level 2 ratings was as follows:
Footnote 5: A total of 24 student ratings were excluded from the analyses by level (level 1 = matriculation, or sophomore-level courses, and level 2 = methods courses, or junior-level courses), but they were included in the total ratings. These 24 ratings were for students in the following courses: ECE 322 (8 students), ECE 440 (5 students), ECED 522 (2 students -- graduate students who were mistakenly included in the data collection), ECED 540 (5 students -- graduate students who were mistakenly included in the data collection), and EDU 320 (4 students). These ratings were excluded from level 1 and level 2 analyses because it was unclear which level was appropriate, something that needs to be determined for future assessments. (The handful of graduate student ratings were mistakenly included because they are in dual-listed courses.)
Footnote 6: The students taking this course are primarily sophomores, but because this is a methods course it was included in this analysis as a level 2 course. This designation may be changed in the future.
These inter-rater reliability coefficients are quite adequate for group comparisons. Although they do not reach the .90 level that is optimal for making comparisons among, or high-stakes decisions about, individuals, most do achieve the .80 level that is generally deemed acceptable for individual comparisons (but, of course, the purpose and use of these assessments have nothing to do with high-stakes decisions about individuals or comparisons among individuals). For an inter-rater reliability measure employing a wide variety of raters but just two raters for each student, these correlations are actually rather high (especially because different professors are observing students in different courses and different settings). Of importance here is that they fully meet the inter-rater reliability requirements for the purpose of group comparisons and program assessment.
Footnote 7: The only students who received a single rating were transfer students taking just one course of a pair and students repeating a failed course. Those ratings were not included in inter-rater reliability calculations.
Footnote 8: The "All Ratings" figures include some ratings that are not included in either the Level 1 or Level 2 ratings, as explained in footnote 5 above.
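The paired inter-rater reliability estimates discussed above are ordinary correlations between two raters' scores for the same students. A minimal sketch of that calculation, using hypothetical ratings on the report's 1-4 scale (not the report's actual data):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two raters' scores for the same students."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical ratings of eight students by two independent raters:
rater_a = [3, 4, 2, 3, 4, 3, 2, 4]
rater_b = [3, 4, 2, 3, 3, 3, 2, 4]
print(round(pearson_r(rater_a, rater_b), 2))  # → 0.91
```

A value in this range would, as the text notes, comfortably meet the reliability requirements for group comparisons.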
For our initial assessment we limited our analysis to an investigation of the correlations
between ratings and overall GPA prior to entering the courses in which the ratings were
conducted. In future analyses we will also look at correlations of ratings and grades in
Education courses (as noted above) and end-of-semester GPAs, but the beginning-of-semester
GPA correlations used in this analysis are helpful in demonstrating that the ratings correlate
with other, independent measures of student performance outside the School of Education.
All correlations were significant at the 0.01 level (two-tailed). This analysis, in combination
with the reliability data reported above, supports a preliminary judgment of acceptable validity of
the rating system.
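For readers checking significance claims of this kind: a Pearson correlation can be converted to a t statistic with n - 2 degrees of freedom via t = r * sqrt((n - 2) / (1 - r^2)), and with samples of this size even modest correlations are significant at the 0.01 level. A sketch using an illustrative r (not a value from the report):

```python
import math

def t_from_r(r, n):
    """Convert a Pearson r on n observations to a t statistic (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# With n = 488 ratings, even a modest illustrative r = 0.30 yields a large t;
# for df this large, |t| > roughly 2.59 is significant at the 0.01 level (two-tailed).
print(round(t_from_r(0.30, 488), 1))  # → 6.9
```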
Results: Ratings were done on a four-point scale: 1-Unacceptable at this level, 2-Limited
acceptability at this level, 3-Acceptable at this level, or 4-Exceeds expectations at this level. A
successful student would therefore score 3 (Acceptable) in each domain at each level, and this
would indicate satisfactory progress through the program. Our goal, therefore, is for students
to reach this level (3) in all areas.
Mean Ratings

Domain            Overall Mean   Sophomore Mean   Junior Mean
Commitment        3.19           3.04             3.24
Knowledge         3.06           2.89             3.13
Reflection        3.08           2.94             3.13
Professionalism   3.19           3.11             3.23
[Charts: bar graphs of overall mean ratings and of sophomore vs. junior mean ratings in the four domains (Commitment, Knowledge, Reflection, Professionalism)]
The mean rating in all categories was slightly above Acceptable at this level. The ratings were higher for students at the junior level (level 2) than at the sophomore level (level 1). This was not predicted, because ratings of performance are based on the levels of commitment, knowledge, reflection, and professionalism expected at each level, and these expected levels of performance of course vary, with higher standards for acceptable performance at higher levels. Although unanticipated, it is nonetheless heartening to see that these ratings do show an increase [9]. At the sophomore level, commitment and professionalism reached satisfactory levels, but knowledge and reflection were slightly below the target level of 3 (Acceptable). At the junior level all mean ratings are significantly above the Acceptable level.
Frequency tables and graphs for each of the four domains can be found in the Fall 2005 Appendices. Overall and across domains, approximately 80% of the ratings were in the Acceptable at this level or Exceeds expectations at this level categories. Approximately one-fifth of the ratings were below the acceptable level, mostly in the category of Limited acceptability at this level. The percentage of ratings of Unacceptable at this level was approximately 2-3 percent. As shown in the mean ratings, Level 2 students (juniors in methods courses) received generally higher ratings than Level 1 students.
Footnote 9: ANOVA results indicated statistically significant interaction effects of level and domain for three of the four domains (commitment [p = .024], knowledge [p = .009], and reflection [p = .050]) at the .05 level.
Also of interest is the fact that ratings in these four domains (commitment, knowledge, reflection, and professionalism) are highly intercorrelated, as one would expect. These are clearly not orthogonal variables, but rather interdependent attributes that are all necessary for successful teaching. While conceptually it is easy to distinguish commitment, knowledge, reflection, and professionalism, in practice they are not independent and will both overlap and predict one another. For example, a student who is committed to teacher preparation is likely to be more knowledgeable, professional, and reflective than a student lacking such commitment. In this sense these four scales can be thought of as four parts of a single scale, rather in the way different methods of assessment in a course (tests of various kinds, papers, presentations, etc.), while on the surface quite different, essentially measure the same construct (understanding of course content) in different ways. The correlation matrix appears below [10].
Footnote 10: If viewed as a single test, Cronbach's alpha for these ratings would be 0.931. The inter-rater reliability estimates are only slightly higher than the cross-domain correlations, which suggests that these four areas might be thought of as inter-related parts of a whole rather than as independent constructs. They certainly tend to go together, but this does not mean that they cannot be understood as separate constructs. By way of analogy, skill in multiplying fractions is likely to be very highly correlated with skill in dividing fractions, and knowledge of the appropriate use of quotation marks is likely to be highly correlated with skill in capitalization, but in both cases the two paired skills are conceptually quite different (and though it would be unusual, one could have very different levels of expertise in, say, use of quotation marks and capitalization). Similarly, although one would expect that commitment, knowledge, reflection, and professionalism would be highly intercorrelated, that does not mean that they represent a single construct.
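The Cronbach's alpha cited in footnote 10 treats the four domain ratings as items on a single test: alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores). A minimal sketch of the computation, using hypothetical ratings on the 1-4 scale (not the report's data):

```python
def cronbach_alpha(items):
    """items: list of k lists, each holding one domain's ratings for n students."""
    k = len(items)
    n = len(items[0])

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance

    # Each student's total score across the k domains:
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_var_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

# Four domains x six hypothetical students, rated on the 1-4 scale:
ratings = [
    [3, 4, 2, 3, 4, 3],  # commitment
    [3, 4, 2, 3, 3, 3],  # knowledge
    [3, 3, 2, 3, 4, 3],  # reflection
    [3, 4, 2, 4, 4, 3],  # professionalism
]
print(round(cronbach_alpha(ratings), 3))  # → 0.929
```

As footnote 10 argues, a high alpha like this shows the four domains covary strongly, not that they are a single construct.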
There were no significant differences by gender in any of the four domains (commitment,
knowledge, reflection, and professionalism).
Conclusions from Fall 2005 Assessment: This initial testing of this rating and Unit Assessment
system suggests that our proposed model of assessment based on our Conceptual Framework is
workable. Initial reliability and validity data are quite good. The results also suggest that,
overall, students are achieving acceptable levels of the attributes that are fundamental
constituents of our Conceptual Framework -- commitment, knowledge, reflection, and
professionalism -- and which NCATE has accepted as the foundation of our Unit Assessment
Plan. There is evidence that sophomores are not, on average, demonstrating fully acceptable
levels in the domains of knowledge and commitment, but it appears that performance in these
areas improves as students progress through the program. This is likely due to a combination of genuinely improved individual performance and attrition, both from students choosing to leave the program [11] and from the removal of poorly performing students from the program [12].
The next step will be to integrate level 3 ratings of student teachers into the system (see below).
This is by far the most important level, because these are the final ratings of our students just
before they leave the program and move into their new roles as teachers. The major purpose of
the Unit Assessment is to allow us to identify potential problem areas -- areas of weakness in the
development of students in the areas of commitment, knowledge, reflection, and
professionalism -- so that program adjustments can be made where necessary. Without level 3
ratings of student teachers we cannot do this, but we are pleased that as students prepare to move to that level (that is, as they leave level 2 [junior-level teaching methods courses] and enter level 3 [student teaching]), they appear overall to be making appropriate progress.
Footnote 11: This fits what we know about reasons students commonly drop out of college programs. "[T]he most common reason for dropping out of university is commitment to one's chosen field of study" (Breen & Lindsay, 2002, p. 694; see also Yorke, 1999). Motivation at the college level has been shown to be very domain-specific (Breen & Lindsay, 2002), so that lack of commitment to an education program is not the same as lack of commitment to a business program or a program in one of the liberal arts and sciences. Commitment as measured by Rider Education professors in these ratings naturally focuses on this very domain-specific variety of motivation -- commitment to teaching.
Footnote 12: This is a Unit Assessment program and is not designed for making individual assessments of students, for which we have more elaborate systems. All students making unacceptable progress are noted, however, and either plans are developed to help them improve in areas of weakness or they are removed from the program.
Future Directions and Recommendations: The following additions and changes will
improve our Unit Assessment Plan:
1. Expand data collection to include level 3 (student teachers).
2. For level 3 data collection, add INTASC Standards assessment data.
3. Decide how to use SPE and ECE data (and be sure to exclude ratings of graduate
students)
4. Decide which separate program analyses to run (if any), such as Elementary Education,
Secondary Education, etc.
5. Needed analyses:
• Overall means of Commitment, Knowledge, Reflection, & Professionalism
ratings and means for Commitment, Knowledge, Reflection, & Professionalism
ratings at each level (1, 2 & 3)
• Overall distribution of Commitment, Knowledge, Reflection, & Professionalism
ratings and distribution of Commitment, Knowledge, Reflection, &
Professionalism ratings at each level (1, 2 & 3)
• Inter-rater reliability for Commitment, Knowledge, Reflection, & Professionalism
ratings (2 raters for most students at levels 1 and 2, 3 raters for students at level 3)
• Means of grades in current Education courses
• Correlations among GPA (at end of the semester), means of grades in current
Education courses, and Commitment, Knowledge, Reflection, & Professionalism
ratings -- both overall and by level
• For level 3 only, correlations among 11 INTASC ratings and Commitment,
Knowledge, Reflection, & Professionalism ratings
• ANOVA of mean differences among the 3 levels for Commitment, Knowledge,
Reflection, & Professionalism ratings
[Fall 2005 Appendices: frequency bar graphs of ratings (Unacceptable, Limited acceptability, Acceptable, Exceeds expectations) for each of the four domains, overall and by level]

PART III: SPRING 2006 SEMESTER DATA, ANALYSIS, AND CONCLUSIONS
A total of 503 ratings are included in this analysis. Of these, 90 were of students in sophomore-level courses (level 1 -- matriculation), 256 were of students in junior-level courses (level 2 -- methods courses), and 157 were of students in senior-level courses (level 3 -- student teaching).
Reliability:
Correlations between the ratings of all raters at each level were used to determine the reliability of the ratings. Most students at Levels 1 and 2 had two independent ratings by two professors [13]. Level 3 students had three independent ratings, by their seminar leader, their cooperating teacher, and their student teaching supervisor. The paired inter-rater reliability correlations of those ratings are reported below.
Footnote 13: The only students who received a single rating were transfer students taking just one course of a pair and students repeating a failed course. Those ratings were not included in inter-rater reliability calculations.
These inter-rater reliability coefficients are quite adequate for group comparisons, for which .60 or higher is generally recommended. For an inter-rater reliability measure using just two
raters, these correlations are actually rather high (especially because different professors are
observing students in different courses and different settings). Of importance here is that they
fully meet the inter-rater reliability requirements for the purpose of group comparisons and
program assessment. It is also interesting to note that the highest inter-rater reliabilities were
at Level 3, the ratings of student teachers, which is arguably the most important assessment of
the three because the Level 3 assessments are made as students are completing the Teacher
Preparation Program. These inter-rater reliabilities of Level 3 students are so good that they
actually reach reliability levels sufficient for use in making individual comparisons among
students or decisions about individual students (in which a .90 level is desirable but .80 is
generally considered quite acceptable), although that was not their purpose.
Correlations of ratings with Education course grades: The overall correlations for all levels
between ratings in commitment, knowledge, reflection, and professionalism with course grades
in current education courses were as follows (all statistically significant at the 0.01 level [2-
tailed]):
Correlations of ratings with GPA: The overall correlations between ratings in commitment,
knowledge, reflection, and professionalism with GPA as of the semester the students entered the
education courses when the ratings were made were as follows (all statistically significant at the
0.01 level [2-tailed]):
Level 1
Correlations of ratings with Education course grades: For Level 1 students the correlations
between ratings in commitment, knowledge, reflection, and professionalism with course grades
in current education courses were as follows (all statistically significant at the 0.01 level [2-
tailed]):
Correlations of ratings with GPA: For Level 1 students the correlations between ratings in
commitment, knowledge, reflection, and professionalism with GPA as of the semester the
students entered the education courses when the ratings were made were as follows (all
statistically significant at the 0.05 level [2-tailed] except Commitment, which was significant at
the 0.01 level [2-tailed]):
Level 2
Correlations of ratings with Education course grades: For Level 2 students the correlations
between ratings in commitment, knowledge, reflection, and professionalism with course grades
in current education courses were as follows (all statistically significant at the 0.01 level [2-
tailed]):
Correlations of ratings with GPA: For Level 2 students the correlations between ratings in
commitment, knowledge, reflection, and professionalism with GPA as of the semester the
students entered the education courses when the ratings were made were as follows (all
statistically significant at the 0.01 level [2-tailed]):
Level 3
Correlations of ratings with separate ratings on INTASC Standard criteria: For Level 3
students the correlations between ratings in commitment, knowledge, reflection, and
professionalism with ratings in the 10 INTASC Standards plus an 11th Standard of how well
student teachers help students develop thinking and problem solving skills (which was added at
the suggestion of the NCATE Accreditation Team) were as follows (all statistically significant at the 0.01 level [2-tailed]):
Every one of the many predicted correlations was statistically significant at the 0.05 level (two-tailed), and all but a handful were significant at the 0.01 level (2-tailed). The results are therefore so consistent and so conclusive that a narrative description of these scores of data points would be superfluous, but complete data on all observed correlations are presented above for verification.
As was found in the initial fall 2005 semester evaluation of this rating and Unit Assessment
system, these results suggest that our proposed model of assessment based on our Conceptual
Framework is viable, reliable, and valid. Both overall and at each level the ratings in the areas
of commitment, knowledge, reflection, and professionalism have proven to be highly valid (as
well as reliable, as demonstrated in the previous section). The Teacher Education Department therefore concludes that these ratings can be reported and used to judge the effectiveness of the undergraduate Teacher Preparation Program at Rider University.
Ratings were done on a four-point scale: 1-Unacceptable at this level, 2-Limited acceptability
at this level, 3-Acceptable at this level, or 4-Exceeds expectations at this level. A successful
student would therefore score 3 (Acceptable) in each domain at each level, and this would
indicate satisfactory progress through the program. Our goal, therefore, is for students to reach
this level (3) in all areas.
The mean ratings for all students, and the means by level, were as follows:
Mean Ratings

Domain            Overall   Level 1:         Level 2:      Level 3: Senior
                  Mean      Sophomore Mean   Junior Mean   (Student Teacher) Mean
Commitment        3.43      3.24             3.32          3.60
Knowledge         3.24      3.23             3.06          3.42
Reflection        3.27      3.21             3.07          3.55
Professionalism   3.38      3.11             3.23          3.61
[Charts: bar graphs of mean ratings in the four domains -- overall, at each level (Level 1: Sophomore, Level 2: Junior, Level 3: Senior/Student Teacher), and for each domain plotted across all levels]
Frequency tables and graphs for each of the four domains can be found in the Spring 2006 Appendices below. Overall and across domains, approximately 80% of the ratings were in the Acceptable at this level or Exceeds expectations at this level categories. Approximately one-fifth of the ratings were below the acceptable level, mostly in the category of Limited acceptability at this level. The percentage of ratings of Unacceptable at this level was approximately 1 percent. This low level of Unacceptable ratings (ranging from 0.5% for Reflection to 1.4% for Professionalism) is encouraging. Those students will need to repeat courses or develop a program to improve in areas in which they are deficient before they can be successful as student teachers, or they may simply be asked to leave the program. Of those Unacceptable ratings, only a single one came from Level 3 -- the Student Teaching level. This is heartening. Whether this is a result of improvement prior to student teaching or a weeding out of unacceptable candidates prior to student teaching cannot be ascertained from these data, but the important point is that, with a single exception (one rating of Unacceptable in the area of Commitment), all student teachers received three ratings of at least Limited acceptability at this level in all areas [14].
Of course, Limited acceptability at this level is not adequate, and the Teacher Education Department needs to strive to limit even further the number of students performing at this level. There were an average of 11 such ratings (out of 157) in this category in each of the four areas rated. A total of 7.166% of the ratings of student teachers therefore fell into either the Unacceptable at this level category (1 rating out of 628, or 0.16%) or the Limited acceptability at this level category (44 out of 628, or 7.01%). On the positive side, this means that approximately 93% of all ratings were in either the "Acceptable at this level" or "Exceeds expectations at this level" categories.
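The percentages above follow directly from the rating counts; a quick check of the arithmetic (counts taken from the text):

```python
# Level 3 (student teaching) rating counts: 157 student teachers rated in
# 4 domains by 3 raters each would give 1,884 individual ratings; the report's
# denominator of 628 corresponds to 157 student teachers x 4 domains.
unacceptable = 1   # the single Unacceptable rating (in Commitment)
limited = 44       # Limited acceptability ratings (avg. 11 per domain x 4 domains)
total = 157 * 4    # 628 ratings

print(round(100 * unacceptable / total, 2))              # → 0.16
print(round(100 * limited / total, 2))                   # → 7.01
print(round(100 * (unacceptable + limited) / total, 3))  # → 7.166
```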
Once again, as in the Fall 2005 Study, ratings in these four domains (commitment, knowledge, reflection, and professionalism) were found to be highly intercorrelated, as one would expect. Commitment, knowledge, reflection, and professionalism are interdependent attributes that are all necessary for successful teaching. While conceptually it is easy to distinguish commitment, knowledge, reflection, and professionalism, in practice they are not independent and will both overlap and predict one another. For example, a student who is committed to teacher preparation is likely to be more knowledgeable, professional, and reflective than a student lacking such commitment. In this sense these four scales can be thought of as four parts of a single scale, rather in the way different methods of assessment in a course (tests of various kinds, papers, presentations, etc.), while on the surface quite different, essentially measure the same construct in different ways.
Footnote 14: Careful analysis of the charts and tables in the Appendix will show that students at Level 1 in some ways outperformed those in Level 2 in the Spring 2006 assessment, both in mean ratings and in having fewer "Unacceptable" ratings. This is the opposite of the results in Fall 2005, but one should also note that the Level 1 cohort was unusually small in Spring 2006, which may account for these unexpected (but not problematic) results.
Of the four areas, the one with the lowest overall ratings is knowledge. This is largely because of lower ratings in this area for Level 3 student teachers. The ratings were still quite good -- the mean was about midway between Acceptable at this level and Exceeds expectations at this level -- but if one were to single out one area of least strength, these data suggest it would be the area of knowledge.
Footnote 15: If all four scales were combined and viewed as a single test, Cronbach's alpha for these ratings would be 0.912.
Rating                                  Frequency
1-Unacceptable at this level                    7
2-Limited acceptability at this level          49
3-Acceptable at this level                    201
4-Exceeds expectations at this level          307

Rating                                  Frequency
1-Unacceptable at this level                    6
2-Limited acceptability at this level          61
3-Acceptable at this level                    286
4-Exceeds expectations at this level          211

Rating                                  Frequency
1-Unacceptable at this level                    3
2-Limited acceptability at this level          82
3-Acceptable at this level                    233
4-Exceeds expectations at this level          246

Rating                                  Frequency
1-Unacceptable at this level                    9
2-Limited acceptability at this level          48
3-Acceptable at this level                    222
4-Exceeds expectations at this level          285

Rating                                  Frequency
1-Unacceptable at this level                    6
2-Limited acceptability at this level          34
3-Acceptable at this level                    130
4-Exceeds expectations at this level          147

Rating                                  Frequency
1-Unacceptable at this level                    6
2-Limited acceptability at this level          42
3-Acceptable at this level                    182
4-Exceeds expectations at this level           87

Rating                                  Frequency
1-Unacceptable at this level                    3
2-Limited acceptability at this level          63
3-Acceptable at this level                    148
4-Exceeds expectations at this level          103

Rating                                  Frequency
1-Unacceptable at this level                    8
2-Limited acceptability at this level          33
3-Acceptable at this level                    159
4-Exceeds expectations at this level          117

Rating                                  Frequency
1-Unacceptable at this level                    1
2-Limited acceptability at this level          11
3-Acceptable at this level                     38
4-Exceeds expectations at this level          107

Rating                                  Frequency
1-Unacceptable at this level                    0
2-Limited acceptability at this level          12
3-Acceptable at this level                     67
4-Exceeds expectations at this level           78

Rating                                  Frequency
1-Unacceptable at this level                    0
2-Limited acceptability at this level          10
3-Acceptable at this level                     50
4-Exceeds expectations at this level           97

Rating                                  Frequency
1-Unacceptable at this level                    0
2-Limited acceptability at this level          11
3-Acceptable at this level                     39
4-Exceeds expectations at this level          107
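For reference, the mean rating implied by any of the frequency tables above is simply the frequency-weighted average of the 1-4 scores. A minimal sketch, using the counts from the first table above (7, 49, 201, 307):

```python
# Frequency-weighted mean rating for a 1-4 scale.
# Counts are those from the first frequency table above.
counts = {1: 7, 2: 49, 3: 201, 4: 307}

n = sum(counts.values())                                 # total number of ratings
mean_rating = sum(r * f for r, f in counts.items()) / n  # weighted mean

print(n, round(mean_rating, 2))  # 564 3.43
```

A mean of about 3.4 sits between "Acceptable at this level" (3) and "Exceeds expectations at this level" (4), which is the kind of reading used in the discussion of the knowledge ratings above.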
Regarding the reliability and validity of the Unit Assessment system, the results for the two
semesters overwhelmingly endorse the system as it has been developed and described above.
There is no need to repeat these reliability and validity studies yearly (many major testing
programs do so only once a decade), but it would be prudent to perform another such study at
least every five years to ensure that the high standards achieved this year continue.
The purpose of this Unit Assessment is not to prove reliability and validity, of course. Those are
only tools that allow one to demonstrate the Unit Assessment is indeed doing what it was
designed to do -- to evaluate fairly the success of students at various levels in the undergraduate
Teacher Preparation Program in the four areas that are central to our Conceptual Framework.
These evaluations suggest that, for this year, the Teacher Education Department is generally
meeting its goals. This is not to say that there is no room for improvement. Of the four areas,
the one that consistently received the lowest mean ratings is knowledge. While most students at
all levels achieved ratings of at least "Acceptable at this level" in this area, fewer rose to
"Exceeds expectations at this level." Increasing the percentage of students who exceed
expectations in this area is something the Department of Teacher Education might set as a goal.
Unlike reliability and validity studies, which need be conducted only occasionally, the annual
collection and reporting of ratings of students in these key areas16 needs to be continuous for the
Teacher Education Department to continue to assess (and improve) levels of achievement of its
students. Ongoing assessment will allow the Teacher Education Department to know where it is
now and to set goals for where it hopes to be in the future. It is heartening to find that the new
Unit Assessment procedure has proven to be such a robust and valid system for this kind of assessment.
16
It perhaps goes without saying (but will be said here anyway, as it was also said in the
introduction to this report) that this is not the only self-assessment system in use by the
Department of Teacher Education and that it is not intended in any way to replace any other self-
assessment systems. Its goal is to assess student achievement in the four focal areas of
commitment, knowledge, reflection, and professionalism -- the four pillars of our Conceptual
Framework -- at three key points as students pass through our Teacher Preparation Program,
and in this task it appears to have done a remarkable job.
Acknowledgements
The Teacher Education Department would like to thank Michael Brogan for support and
assistance with all the statistical analyses reported above. His work managing these statistical
analyses has been invaluable and this report would not have been possible without it. Thanks
also to Sue Dintrone, who assembled the multitude of individual assessments by professors,
seminar leaders, cooperating teachers, and student teaching supervisors into a spreadsheet that
would allow Michael to work his statistical magic.