A Guide to Classroom Assessment

By Kim Hartley

A Teacher's Manual: A Guide to Assessment in Society and Environment

Contents
Introduction to Assessment
    Validity
    Reliability
    Bias
Essays
Portfolios
Performance Assessment
Selected Response
    Multiple Choice
    True/False
    Completion
    Matching
Self Assessment
Peer Assessment
Creating Quality Rubrics
Reference List

Introduction
Assessment is the process of collecting and interpreting evidence for some purpose. It is a way in which teachers, educators and students can map progress in learning and development, and it can take many different forms. Whenever possible, students should be involved in the assessment process, as this promotes student engagement and achievement. Assessment can serve different purposes: it can be assessment of learning, assessment for learning, or assessment as learning. Summative assessment is assessment of learning. It is used to report the achievement of individual students at a particular time. Formative assessment is assessment for learning and is intended to contribute directly to the learning process through feedback. It is used to inform decisions about learning experiences and to report on what has been achieved. Peer and self assessments are assessment as learning because they form part of the learning process.

Figure: Variables of a system of assessment: purpose, use, task, agent of judgement, basis of judgement, score (Harlen 2007, p. 15).

Assessment Validity
Validity is the soundness of your interpretations and uses of students' assessment results. It is concerned with the soundness, trustworthiness, or legitimacy of the claims or inferences that are made on the basis of obtained scores (McMillan 2007, p. 64). Validity is always determined by professional judgement; in classrooms this judgement is made by the teacher. The analysis is done by accumulating evidence that suggests whether an inference or use is appropriate and whether the consequences and interpretations are reasonable and fair. Validity comes from three forms of evidence: content-related evidence, criterion-related evidence and construct-related evidence (McMillan 2007, p. 65).

The Four Principles of Validation (Messick, 1994)
1. The interpretations of students' assessment results are valid only to the degree to which you can point to evidence that supports them.
2. The uses you make of your assessment results are valid only to the degree to which you can point to evidence that supports their correctness and appropriateness.
3. The interpretations and uses of your assessment results are valid only when the values implied by them are appropriate.
4. The interpretations and uses you make of your assessment results are valid only when the consequences of these interpretations and uses are consistent with appropriate values.

You can improve assessment validity by asking other teachers or professionals to judge the assessment for clarity and by using different methods to assess the same thing. Appropriate vocabulary, sentence structure and item delivery are necessary to ensure that all students can access the information within the item, and students should be given adequate time. Validity can also be improved by comparing students' scores on similar but different tasks (McMillan 2007, p. 69).

Assessment Reliability
Observed Score = True Score + Error

Reliability is concerned with the consistency, stability and dependability of scores (McMillan 2007, p. 71). Assessment reliability is the degree to which students' results stay the same when the same task is completed on two separate occasions, when another teacher marks the work and gives the same result, or when a different but equivalent task produces the same result (Brookhart 2007, p. 69). Statistical methods can indicate the degree of reliability and the approximate size of the measurement errors in the assessment.

The concept of error is critical to our understanding of reliability. When we assess something, we get an observed score or result. This observed score is the sum of the student's true ability or skill and some degree of error, so reliability is directly related to error. Every assessment contains some degree of error and therefore has low, medium or high reliability. Error is measured by the standard error of measurement (SEM), which is determined by a formula that takes into account the reliability and the standard deviation of the test (McMillan 2007, p. 71).

Assessment reliability can be affected by internal factors such as the student's health and motivation, and whether they are tired, in a bad mood or distracted. These can all create negative error, which underestimates the student's true score. External factors such as the test directions, the room temperature and the atmosphere also affect reliability. To keep assessment reliability high, it is recommended to use shorter assessments more often.
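The link between reliability, standard deviation and the SEM can be illustrated with a short calculation. The sketch below assumes the commonly used formula SEM = SD x sqrt(1 - reliability), which the manual does not spell out. The function names and the sample figures (a standard deviation of 10 marks, reliabilities of 0.90 and 0.50, an observed score of 70) are hypothetical and chosen only for illustration.

```python
import math


def standard_error_of_measurement(std_dev, reliability):
    """Return SEM = SD * sqrt(1 - reliability) (assumed textbook formula)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return std_dev * math.sqrt(1.0 - reliability)


def score_band(observed, sem):
    """Return the band (observed - SEM, observed + SEM) around an observed score."""
    return (observed - sem, observed + sem)


# Hypothetical test with a standard deviation of 10 marks.
sem_high_reliability = standard_error_of_measurement(10.0, 0.90)
sem_low_reliability = standard_error_of_measurement(10.0, 0.50)

print(round(sem_high_reliability, 2))  # 3.16
print(round(sem_low_reliability, 2))   # 7.07

# Band around a hypothetical observed score of 70 on the more reliable test.
low, high = score_band(70.0, sem_high_reliability)
print(round(low, 2), round(high, 2))
```

Under these assumptions, the less reliable test has a much larger SEM, so the same observed score tells the teacher far less about the student's true score.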

Assessment Bias

Figure: Sources of bias: test usage, incorrect test criteria, test takers, test environment, test content.

Bias can be found in an assessment task's content, process or problem. Bias is present if the assessment distorts performance because of a student's ethnicity, gender, race or religion. There are two forms of assessment bias: unfair penalisation and offensiveness. Offensiveness occurs if the content of the assessment offends, upsets, distresses, angers or otherwise creates negative affect for particular students or subgroups of students. This negative affect makes it less likely that the students will perform as well as they otherwise might, lowering the validity of the inferences (McMillan 2007, p. 78). Unfair penalisation is bias that disadvantages a student because of content that makes it more difficult for students from some groups to perform well than for students from other groups (McMillan 2007, p. 79). Assessment bias can be minimised by having others review your work and by becoming aware of bias elements in assessments.

Essays
Essays are a good way to evaluate a student's in-depth knowledge of a topic, but they may be subject to sampling error because only a few questions can be asked. To ensure that this type of assessment is valid, the teacher must structure the question carefully so that the response will demonstrate the student's ability to meet the specified marking criteria. This type of assessment is heavy at the marking end: essays take a lot of time to mark effectively. Teachers use a rubric to improve the reliability of essay scores. Bias can exist in essay questions if the question is not written clearly and concisely.

There are two forms of essay question: restricted response and extended response. Restricted response requires students to produce brief answers or short essays. Restricted response items restrict or limit both the content of students' answers and the form of their written response. This is done by the way the restricted response task is phrased (Brookhart 2007, p. 190). Extended response requires students to write essays in which they are free to express their own ideas and to organise their answers in their own way. Usually no single answer is considered correct, and the student is free to choose how to respond. The degree of correctness or merit of a student's response can be judged only by a skilled teacher who is informed on the subject (Brookhart 2007, p. 190). This type of assessment should be used to assess a student's ability to explain cause-effect relationships, present arguments and formulate valid conclusions. It requires higher-order thinking skills, because students must have a deep understanding of the topic and of complex information (McMillan 2007, p. 312).

Advantages: assesses deep understanding, complex thinking and reasoning skills; gives students flexibility in their response; evaluates students' ability to communicate their reasoning; reduces guessability; encourages students to study content widely and in depth.
Disadvantages: reading and scoring are highly time consuming, especially when giving meaningful feedback; scoring can be unreliable. This type of assessment does not sample content knowledge well because relatively few questions are asked, and validity is therefore also decreased (Designing Test Questions, 2003).

Examples of Essay Questions

Arguably the novel has responded more quickly and fully to new ideas than any other literary genre. It is a social and moral document, as well as an art form. Do you agree? Discuss with detailed reference to two texts.
This is a good essay question because it offers an opinion and asks the student to respond in their chosen way. This provides structure for what needs to be said in the response, but leaves the student free to decide how the response is given and which perspective to take. The question requires the student to use higher-order thinking skills and is directly related to learning targets.

The complex movement from pleasure to understanding in the reading of fiction. What does this mean?
This is another good example of an essay question. The student is required to explain the given statement and in doing so they will be employing high order thinking skills and a deep understanding of literature and its impact on readers. The topic is clearly defined and it is clear what is required of the student.

Explain the importance of religion.


This essay question is too broad and is open to interpretation. It is unclear what the learning objectives would be with this question unless it is to assess a general superficial understanding of religion. It is unclear what sort of response is required from the student.

Portfolio
A portfolio is a limited collection of a student's work used either to present the student's best work or to demonstrate the student's educational growth over time. The works put into the portfolio are limited to those that best serve the portfolio's purpose (Brookhart 2007, p. 249). Portfolios may be used to evaluate students' abilities and improvement. Types of portfolio include best work, showcase and growth. Best-work portfolios are summative assessments, whereas growth and learning portfolios are formative assessments. Portfolios can include videos, tests, group work, reports, drafts, photos and comments. The main concerns are reliability, because of sampling and rater consistency, and validity, because portfolios are a potential source of test bias.

Purpose of Portfolio
- What learning targets and curriculum goals will the portfolio serve?
- Will the portfolio be summative, formative or both?

Organisation
- What types of entries should be included?
- What number of entries will be required?
- What is the timeframe for each entry?

Use in Practice
- When will the students work on their portfolios?
- How will the portfolio mark be weighted?
- How will the portfolio fit into classroom routine?

Evaluation of Portfolio
- Is it necessary to have an overall portfolio score?
- How often will the portfolio need to be scored?
- Will evaluations of every entry count towards the portfolio's grade?

(Brookhart 2007, pp. 291-292)

Advantages: allows reflection and growth; creates a collaborative climate amongst students; provides an opportunity for students to assume responsibility for their own learning; contributes to self-awareness and self-esteem; provides a meaningful picture of student growth; allows for the integration of instruction and assessment; provides concrete and tangible evidence for communication.
Disadvantages: increases the teacher's workload; decreases instructional time, interfering with teaching and learning; there is controversy over the reliability and validity of the data collected, as well as over standardised portfolio content; requires a lot of teacher knowledge about gathering and interpreting data.

Example Rubric: Whole Portfolio (Grades 3-12)
Levels: Strong, Developing, Not there yet, Not Attempted (no descriptor)

Change over time
Strong: The student selects material that clearly demonstrates growth in one or more specific areas.
Developing: The samples show evidence of some growth, but the growth is limited.
Not there yet: The samples in the portfolio do not show evidence of noticeable student growth or change over time. Either noticeable growth has not occurred, or the student has not selected the samples of work that would illustrate that growth clearly.

Diversity
Strong: The portfolio clearly demonstrates that the student has tried a variety of tasks/projects/assignments/challenges. There is great variety in the kinds of work presented or the outcomes/skills demonstrated.
Developing: The portfolio reflects some diversity. Tasks are not all parallel and do not all demonstrate identical outcomes.
Not there yet: The portfolio reflects minimal diversity. All tasks represented are more or less alike, and demonstrate the same outcomes/skills.

Evidence of thinking
Strong: The work in the portfolio provides evidence that the student has identified, analyzed, planned strategies, and worked through the solution to a problem or question.
Developing: The work in the portfolio shows some evidence of thinking, reasoning, analyzing or problem solving, but the student may not have worked all the way through a solution, or may have missed opportunities to pull together interesting conclusions or plot alternate strategies. Still, the student's work overall shows signs of planning and purposeful effort.
Not there yet: The student has not included in the portfolio any work that clearly demonstrates purposeful planning of strategies, problem solving, analysis of a situation, reasoning out a conclusion, or considering alternative solutions.

Self-reflection
Strong: Several (or more) examples of self-reflection show thoughtful consideration of personal strengths and needs based on in-depth understanding of criteria.
Developing: Self-reflections included within the portfolio provide at least a superficial analysis of strengths and needs, which may or may not be tied to specific criteria for judging performance or growth.
Not there yet: Either no self-reflection is included within the portfolio, or the self-reflection is rudimentary.

Structure and Organisation
Strong: The student has formatted and arranged the portfolio in a way that invites the reader inside. Items within the portfolio are clearly labeled and dated; the sequence is purposeful.
Developing: The portfolio is arranged and formatted in a way that enables the reader to make sense of it with a little work. Most items within the portfolio are labeled, dated or both.
Not there yet: Arrangement and formatting of the portfolio make it difficult for the reviewer to determine when and under what circumstances it was assembled. Few items (if any) are clearly labeled or dated.

Source: Arter, J. and Chappuis, J., Creating and Recognising Quality Rubrics, Educational Testing Service, New Jersey, 2006

Performance Assessment
Performance assessment is based on authentic tasks such as activities, exercises or problems that require students to show what they can do. Such tasks have more than one acceptable solution and call for students to create a response to a problem and then explain or defend it. They draw on higher-order thinking skills such as cause-and-effect analysis, experimentation and problem solving. To ensure high reliability, make sure that the tasks and responses tap into the kind of processing required. Performance assessment targets deep understanding of a topic as well as a student's reasoning, presentation and communication skills. It is authentic and engaging, with no single correct answer. However, it has low inter-rater reliability, can be prone to sampling error and is time consuming.

A rubric for performance assessment is very difficult to construct. One suggestion is to:
- identify the overall performance or task to be assessed and perform it yourself
- list the important aspects of the performance or task
- limit the number of performance criteria
- express performance criteria in terms of observable pupil behaviours or characteristics
- avoid ambiguous words
- arrange criteria in the order in which they are likely to be observed.

Two examples of performance assessment are oral presentations and role play. Bias in oral presentations can arise for several reasons. One of the most obvious is the test taker's ability to remain calm and confident during the presentation and to keep their nerves under control. Reliability for role plays is low, as the teacher's impression or interpretation of the presentation will be reflected in the marking. Performance assessment is difficult to construct and grade and time consuming to give and take, but it provides a way to measure skills and abilities which would otherwise be impossible to assess (Designing Test Questions, 2003).

Example Rubric: Oral Presentation (Grades 4-12)
Levels: Strong, Middle, Weak, Not Attempted (no descriptor)

Content (ideas; information accuracy; audience understanding)
Strong: Ideas are focused and supported with relevant details and examples. Content is relevant for the task. Information is accurate. The speaker has chosen the most significant information and stays with the topic.
Middle: Support is attempted, but doesn't go far enough. Ideas are reasonably clear, but there are some problems with accuracy. The speaker generally stays on the topic, but doesn't develop a clear theme. The listener is left with questions. There seem to be some holes in the information.
Weak: There is little controlling idea; the speaker is still in search of a topic. Information is limited, unclear, or incorrect. The presentation may be repetitious or sound like a collection of disconnected thoughts.

Organisation (sequence of events; easy-to-follow ideas; opening and closing)
Strong: The speaker helps the listener understand the sequence of ideas through organizational aids. Listeners can put the ideas in an outline. The opening draws the listener in; the closing leaves a sense of closure and resolution.
Middle: The sequence and relationships are fairly easy to follow, but sometimes you have to make assumptions to connect the ideas. An outline of the ideas requires inferences. The presentation has a recognizable opening and closing, but there is little sense of anticipation or closure.
Weak: Ideas that go together are not put together. Listeners would have trouble putting the ideas into an outline. There is no opening or closing. Details seem to fit where they're placed. Sequencing is confusing.

Delivery (volume; visual aids; pronunciation; disfluencies; pacing)
Strong: Volume is loud enough to be heard and understood. Visual aids are used effectively to support and enhance meaning. Pronunciation and enunciation are clear enough to be understood and are used to emphasize important points. The speaker exhibits very few disfluencies, such as "ah", "um" and "you know". Pacing is right for the audience. The speaker knows when to slow down and when to speed up.
Middle: The speaker can be heard and volume doesn't distract the listener. Visual aids, while understandable, don't add much to the presentation. Pronunciation and/or enunciation are generally clear enough to be understood. While the speaker exhibits disfluencies, they don't detract from the presentation enough to interfere with meaning. Pacing is fairly good, but at times the speaker goes too fast or too slow for the listeners to keep up.
Weak: The speaker can't be heard and/or changes in volume distract the listener. Visual aids are confusing, do not relate to the point being made, or distract the listener. Pronunciation and/or enunciation detract from being able to understand the speaker. Disfluencies, such as "um", "ah" and "you know", detract from understanding what is being said. Pacing is awkward.

Language Use (language and vocabulary; communication)
Strong: Words and phrases are accurate, to the point, create pictures in the listener's head, and/or result in emphasizing the intended points. The speaker consciously uses language techniques such as vivid language, emotional language, humor, imagery, metaphor, and simile.
Middle: The speaker uses bland language that, while not detracting from the message, does little to enhance it. Words and grammar are accurate and communicate, but don't capture the listener's attention.
Weak: Grammar and vocabulary detract from being able to understand the speaker's message. Words and phrases either sound like a thesaurus on the loose or are so nondescript, such as "thing" and "stuff", that the listener loses attention.

Source: Arter, J. and Chappuis, J., Creating and Recognising Quality Rubrics, Educational Testing Service, New Jersey, 2006

Selected Response
This type of assessment is commonly known as closed assessment, meaning that each question has a correct answer which is not open to interpretation. The advantages of this type of assessment are that it can test large groups in a relatively short period of time and that the test has high reliability. It is also very quick to mark and in most cases can be marked by a computerised system.

Multiple Choice
Multiple choice tests are more complicated than they may seem. The first part of a multiple choice item is the stem, or question. It is followed by the response alternatives, which can be used to complete the stem or answer the question it poses. Of these response alternatives, one will be the correct answer and two or more will be incorrect answers, known as distractors. Multiple choice questions can test higher-order thinking skills only if they are constructed well, and students need more time to select their answer to such questions.

Advantages: versatile, reliable, efficient and accurate; objective; samples content widely; can test on a large scale with reliable and unbiased marking, including computerised marking.
Disadvantages: time consuming and difficult to construct; may favour simple recall; depends heavily on students' reading ability. Once a multiple choice question has been used it should not be used in its entirety again, which adds to the time-consuming task of writing questions.

Example (the notes in brackets mark the design flaws in these sample items)

1. The fundamental differences between men and women are what?
   a. Their ability to do two things at once. [Unrealistic options increase guessability.]
   b. Physical and psychological differences. [Make sure the correct answer's position is random.]
   c. The ability to drive properly. [Unrealistic options increase guessability.]
   d. All of the above. [This option only serves as a filler.]

2. The following foods are not considered to be healthy to eat on a daily basis.
   a. Breads and cereals
   b. Butter, ice-cream, jellybeans, biscuits, chocolate, lollies, custard, chicken and cheese sausages and cakes. [Make all choices of equal length to reduce guessability.]
   c. Bananas and apples [Avoid easily guessable items and encourage "best choice" responses.]

3. William Shakespeare is a famous playwright and many of his plays are still studied in schools today. What play did William Shakespeare write? [Eliminate irrelevant information in the stem.]
   a. Follow Thy Fair Sun
   b. Oh Mistress Mine
   c. Redemption
   d. Easter Wings
   e. Queen and Huntress
   f. Sonnet of Black Beauty [Limit responses to 3-4.]

4. In what situation should you phone 000 and what information do you need to provide to the operator? [Only use one problem per stem.]
   a. Only in an emergency and any information they ask you to provide
   b. Only in an emergency and your name, your location and details of the emergency
   c. Only if you need an ambulance and any information they ask you to provide
   d. All of the above [Avoid using "all of the above".]

True/False
The stem, or question, makes a statement and the test taker is required to indicate whether the statement is true or false. True/False items have increased guessability, as there are only two options available to the test taker. They are quick to complete and are frequently used to test low-level skills. This type of assessment works best when you wish to evaluate knowledge-level content, to evaluate students' understanding of popular misconceptions, or to test concepts which have two logical responses (Designing Test Questions, 2003).

Advantages: easy to write and develop; quick to score; easily administered to large numbers of students; less discriminating; can be used as a written or oral assessment; limits bias due to poor writing or reading skills; can sample a large amount of knowledge in a short amount of time; objective nature limits bias in scoring.
Disadvantages: may overestimate learning due to guessability; often leads to testing trivial facts; encourages rote memorisation; a large number of items is needed for reliability; it is difficult to discriminate between students who know the content material and those who don't.

Example (the notes in brackets mark the design flaws in these sample items)

1. Atoms of an element always have the same atomic mass.
   a. True  b. False

2. According to modern research, all atoms of an element are identical.
   a. True  b. False [Avoid absolute language.]

3. According to the Education Department there is a gap between the achievements made by Aboriginal students and non-Aboriginal students and to remedy this the government is implementing programs to help close this gap and these programs are having positive results. Although not all Aboriginal students are benefiting from these initiatives it is generally thought that within twenty years all students whether Aboriginal or non-Aboriginal will be achieving at similar levels.
   a. True  b. False [Do not copy questions straight from the textbook. Avoid long, complicated sentences.]

4. Most of the illegal immigrants who come to Australia are poor and unskilled labourers who will require welfare benefits in order to live in Australia.
   a. True  b. False [Use popular misconceptions to test students' knowledge.]

5. It is not raining outside and there are puddles on the ground.
   a. True  b. False [Only use one central idea for each question.]

6. Fruits are sweet to eat.
   a. True  b. False [Ask questions that are unambiguously true or false.]
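The effect of guessability, and the reason a large number of items is needed for reliability, can be sketched with a binomial calculation. The example below assumes a student who guesses blindly and independently on every item, and a hypothetical pass mark of 60%. The function name, the pass mark and the test lengths are illustrative assumptions, not figures from the sources cited in this manual.

```python
from math import comb


def prob_at_least(correct_needed, n_items, p_guess):
    """Binomial probability of getting at least `correct_needed`
    of `n_items` right when each guess succeeds with `p_guess`."""
    return sum(
        comb(n_items, k) * p_guess**k * (1 - p_guess) ** (n_items - k)
        for k in range(correct_needed, n_items + 1)
    )


# (number of items, correct answers needed for a hypothetical 60% pass mark)
tests = [(5, 3), (10, 6), (20, 12)]

for n_items, needed in tests:
    true_false = prob_at_least(needed, n_items, 0.5)    # two options per item
    four_option = prob_at_least(needed, n_items, 0.25)  # four options per item
    print(n_items, round(true_false, 3), round(four_option, 3))
```

Under these assumptions, a blind guesser passes a five-item True/False test half the time. The chance falls as items are added, and it is lower again when each item offers four options instead of two.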

Completion Items
A partial statement is presented to the student and the student must supply the correct response to complete the statement.

Advantages: provides a wide sampling of content; efficiently measures lower levels of cognitive ability; minimises guessing; provides an objective measure of student achievement.
Disadvantages: difficult to construct; difficult to measure more than recall of basic information; may include irrelevant clues to the answer; time consuming to score compared to True/False and multiple choice questions; more than one answer may be correct.

Example (the notes in brackets mark the design flaws in these sample items)

1. The local council provides _____________________ for local residents to ______________ on a __________________ basis. [Do not omit too many key words or the meaning will be lost.]

2. The house was painted blue and red with green ______________ lining the driveway. These trees __________________ shade for the ________________ who lived in the _______________. [Only omit meaningful words and avoid clues.]

Matching Items
Matching items effectively and efficiently measure the extent to which students know related facts, associations and relationships. Associations include terms with definitions, persons with descriptions, dates with events and symbols with names. By using this type of assessment the teacher can obtain a good sample of a large amount of knowledge and, if the questions and answers are constructed properly, can assess at the knowledge and comprehension levels. This type of assessment is easily and objectively scored. Constructing good matching items is more difficult than creating completion or short-answer items, but not as difficult as preparing multiple choice items. Poor matching items result when there is insufficient material to include in the item and irrelevant information is added (McMillan 2007, p. 172).

Advantages: a space-saving and objective way of assessing a number of important learning targets, such as students' ability to identify relationships and associations between two things.
Disadvantages: students can use rote memorisation to learn the elements in the two lists; limited to the assessment of memorised factual information; does not assess higher-order thinking; time consuming for students.

Example

Match the author to the title of their novel.
1. F. Scott Fitzgerald      a. The Catcher in the Rye
2. Vladimir Nabokov         b. The Grapes of Wrath
3. Mary Durack              c. Lolita
4. John Steinbeck           d. The Great Gatsby
5. J.D. Salinger            e. Lady Chatterley's Lover
6. D.H. Lawrence            f. Kings in Grass Castles

[Use more responses than premises to avoid perfect matching. Put responses into a logical order, e.g. alphabetical or chronological. Place premises and responses on the same page to avoid confusion and distractions.]

Match these significant events to the date when they occurred.
1. Outbreak of WW1
2. Titanic sank
3. Federation of Australia
4. America was discovered
5. End of World War 2
6. America declared war on Iraq
7. Twin Towers terrorism attack
8. Berlin Wall was torn down
9. First man landed on the moon
10. Electricity was invented
11. Women were given the vote in Australia
12. Osama Bin Laden was killed
13. Martin Luther King Jr. was assassinated
14. First colour TV broadcast in Australia
15. First test tube baby born
16. Dolly the sheep is cloned

a. 1497  b. 1884  c. 1901  d. 1902  e. 1914  f. 1912  g. 1929  h. 1945
i. 1961  j. 1968  k. 1969  l. 1978  m. 1996  n. 2001  o. 2003  p. 2011

[Limit items to 15 or fewer; the ideal number is between 4 and 8. Use homogeneous material for each exercise; this example tests knowledge that spans several content areas. To test more than simple recall or basic knowledge]

Match the beginning of each sentence to its other half.
1. All the names of the days of the week...
2. There are twelve months in a
3. English is the language people speak in
4. An elephant has a
5. A baby pig is called a

a. year and four seasons
b. end in "ay"
c. long trunk which it uses like a hand
d. piglet and these piglets are pink
e. Australia, England and America, as well as many other countries, but is the national language for these countries. Other languages are spoken in these countries also.

[Keep responses short and logical. Avoid grammatical clues to correct answers (do not use incomplete sentences as premises).]

Self Assessment
In self assessment, students assess their own performance. There are several reasons to use it: students self-assess naturally; teacher assessment is not always sufficiently transparent; it deepens students' learning experience; it lets students into the assessment culture; it helps students become autonomous learners; and students gain more feedback. It can also reduce the assessment workload of academic staff. The process of assessment is itself an inherently valuable learning experience.

Advantages: encourages students to become reflective and self-critical learners; enables students to take greater responsibility for, and become actively involved in, the assessment for learning process; enhances the quality and depth of the learning that takes place; facilitates self-pacing, enabling students to develop target-setting and time-management skills; provides students with instant feedback; students are aware of what is expected of them; students can learn from their mistakes; can complement other modes of assessment.
Disadvantages: students lack experience; students may look up questions before trying to answer them; it is hard to create good and valid self assessment questions; totally dependent on the students' willingness to participate; not favoured for summative assessments.

Peer Assessment
Peer assessment involves students assessing the performance of other students.

Advantages
Involves students in the assessment process.
Provides students with a greater sense of ownership.
Students are more aware of objectives and learning outcomes.
Enhances the quality of the learning process.
Creates a benchmark which students can use to reflect on the standard of their own work.
Facilitates a sense of partnership in learning within the classroom.
Suits various styles such as formative, summative, informal or formal.

Disadvantages
Student opinion may be overly subjective.
Students lack experience.
Requires careful planning, implementation and monitoring.
Students need to be trained to assess properly.
Some students may not want the responsibility of marking other students' work.
If not monitored sufficiently, fairness and effectiveness will be impacted.
May cause undue stress and anxiety for students.


Creating Quality Rubrics


There are two types of rubrics: holistic rubrics and analytical rubrics. Holistic rubrics require the teacher to score the product or process as a whole without using individual criteria. Analytical rubrics require the teacher to assign a mark to each separate criterion and then add up the individual marks to obtain a whole score.
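The adding-up step for an analytical rubric can be sketched in a few lines of Python. This is only an illustration: the criterion names, the marks awarded and the 5-point maximum per criterion are assumptions made for the example, not part of any prescribed scheme.

```python
# Hypothetical example: criterion names and marks are assumptions for illustration.
def analytic_score(marks):
    """Add the marks given to each separate criterion to obtain a whole score."""
    return sum(marks.values())

# Marks a teacher might assign to one essay, one mark per criterion.
marks = {
    "Writing ability": 3,
    "Use of resources": 5,
    "Expression": 3,
    "Content": 5,
}

total = analytic_score(marks)
maximum = 5 * len(marks)  # assumes every criterion is marked out of 5
print(f"Analytic score: {total} out of {maximum}")
```

A holistic rubric would skip this step, because the teacher gives a single mark for the work as a whole.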

Example Rubric: A Rubric for Designing Quality Rubrics

Covers the right content

1 Weak: You can't tell what learning target(s) the rubric is intended to assess, or you can guess at the learning targets, but they don't seem important. The rubric doesn't seem to align with the content standards/learning targets it is intended to assess. You can think of many important dimensions of a quality performance or product that are not in the rubric, or content focuses on irrelevant features.

3 Medium: Much of the content represents the best thinking in the field, but there are a few places that are questionable. Some features don't align well with the content standards/learning targets it is intended to assess. Much of the content is relevant, but you can easily think of some important things that have been left out or that have been given short shrift, or it contains an irrelevant criterion or descriptor that might lead to an incorrect conclusion about the quality of student performance.

5 Strong: The content of the rubric represents the best thinking in the field about what it means to perform well on the skill or product under consideration. The content of the rubric aligns directly with the content standards/learning targets it is intended to assess. The content has the ring of truth: your experience as a teacher confirms that the content is truly what you do look for when you evaluate the quality of a student performance or product. In fact, the rubric is insightful; it helps you organize your own thinking about what it means to perform well.

Criteria are well organised

1 Weak: The rubric is holistic when an analytic one is better suited to the intended use or learning targets to be assessed; or the rubric is an endless list of everything; there is no organization. The rubric seems mixed up: descriptors that go together don't seem to be placed together. The rubric is out of balance: features of more importance are emphasized the same as features of less importance. Descriptors of quality work are represented redundantly in more than one criterion to the extent that the criteria are really not covering different things.

3 Medium: The number of criteria needs to be adjusted a little: either a single criterion should be made into two criteria, or two criteria should be combined. Some details that are used to describe a criterion are in the wrong criterion, but most are placed correctly. The emphasis on some criteria or descriptors is either too small or too great; others are all right. Although there are instances when the same feature is included in more than one criterion, the criteria structure holds up pretty well.

5 Strong: The rubric is divided into easily understandable criteria as needed. The number of criteria reflects the complexity of the learning target. If a holistic rubric is used, it's because a single criterion adequately describes performance. The details that are used to describe a criterion go together; you can see how they are facets of the same criterion. The relative emphasis on various features of performance is right: things that are more important are stressed more; things that are less important are stressed less. The criteria are independent. Each important feature that contributes to quality work appears in only one place in the rubric.

Number of levels fits targets and uses

1 Weak: The number of levels is not appropriate for the learning target being assessed or intended use. There are so many levels it is impossible to reliably distinguish between them, or too few to make important distinctions. No levels are defined.

3 Medium: Teachers might find it useful to create more levels to make finer distinctions in student progress, or to merge levels to suit the rubric's intended use. The number of levels could be adjusted easily.

5 Strong: The number of levels of quality used in the rating scale makes sense. There are enough levels to be able to show student progress, but not so many levels that it is impossible to distinguish among them.

Levels defined well

1 Weak: Only the top level is defined. The other levels are not defined. Wording of the levels, if present, is vague or confusing. It is unlikely that independent raters could consistently rate work the same, even with practice. Rating is almost totally based on counting the number or frequency of something, even though quality is more important than quantity. Wording tends to be evaluative rather than descriptive of the work.

3 Medium: There is some attempt to define terms and include descriptors, but some key ideas are fuzzy in meaning. You have a question whether independent raters, even with practice, could assign the same rating most of the time. There is some descriptive detail in the form of words, adjectives, and descriptive phrases, but counting the frequency of something or vague quantitative words are also present.

5 Strong: Each score point (level) is defined with indicators and/or descriptors. There is enough descriptive detail in the form of concrete indicators, adjectives, and descriptive phrases to allow you to match a student performance to the right score. Two independent users, with training and practice, assign the same rating most of the time. Wording is descriptive, not evaluative. If counting the number or frequency of something is included as an indicator, changes in such counts really are indicators of changes in quality.

Levels parallel

1 Weak: Levels are not parallel in content and there is no explanation of why, or the explanation doesn't make sense.

3 Medium: The levels are mostly parallel in content, but there are some places where there is an indicator at one level that is not present at the other levels.

5 Strong: The levels of the rubric are parallel in content: if an indicator of quality is discussed in one level, it is discussed in all levels. If the levels are not parallel, there is a good explanation why.

Arter, J., and Chappuis, J., Creating and Recognising Quality Rubrics, Educational Testing Services, New Jersey, 2006
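The rubric above asks whether two independent users assign the same rating most of the time. One simple way to check this is to have two teachers rate the same set of student work and count how often their ratings match. The Python sketch below shows that calculation; the function name and the ratings are hypothetical, and simple percent agreement is only one possible measure, chosen here as an assumption because it is the easiest to compute by hand.

```python
def percent_agreement(rater_a, rater_b):
    """Return the fraction of pieces of work given the same rating by both raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same, non-empty set of work.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Hypothetical ratings (1 Weak, 3 Medium, 5 Strong) for six essays.
teacher_1 = [5, 3, 3, 1, 5, 3]
teacher_2 = [5, 3, 1, 1, 5, 3]

agreement = percent_agreement(teacher_1, teacher_2)
print(f"The two teachers agreed on {agreement:.0%} of the essays")
```

If agreement is low even after training and practice, the level descriptions are probably too vague and need more concrete indicators.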


Examples

Points raised in essay
1 Weak: 0-1 point raised
3 Medium: 2-3 points raised
5 Strong: 4-5 points raised
It should be the quality of the points raised that is assessed, not the number of points.

Writing ability
1 Weak: Lots of spelling mistakes, lots of grammatical errors and misuse of punctuation.
3 Medium: Few spelling mistakes, minimal grammatical errors and punctuation misuse.
5 Strong: No spelling mistakes, grammatical errors or punctuation mistakes.
The gap between adjacent levels should be even.

Use of resources
1 Weak: No significant resources were used. Student does not demonstrate how information from resources ties in with argument.
3 Medium: Several resources used, but not all of them are relevant.
5 Strong: Appropriate use of relevant resources. Student can demonstrate how information from resources ties in with their argument.
Criteria must be consistent across levels.

Expression
1 Weak: Student does not express their own ideas.
3 Medium: Student expresses their ideas OK.
5 Strong: Student expresses their ideas very well.
Ensure that the language used is clearly defined and not subject to interpretation.

Levels
Not attempted, Unsatisfactory, Satisfactory, Very Satisfactory, Good, Very Good, Excellent
Do not use so many levels that it is hard to distinguish between them.

Content
1 Weak: Content is not relevant and does not provide reader with information and examples. There are many spelling mistakes.
3 Medium: Content is relevant and does provide some examples and information. There are a few spelling mistakes.
5 Strong: Content is relevant and provides good examples and information for the reader. There are no spelling mistakes.
Only judge one criterion at a time.


Example Rubric: Essay Scoring Criteria, Secondary Social Studies (0 = no response, 5 = highest understanding)

Levels: 0 No response; 1 No facts; 2 Not there yet; 3 Understanding; 4 High Understanding; 5 Excellent Understanding

Prior knowledge, facts and events
0 and 1: No facts (no facts/events mentioned that are not found in the text of the debates)
2: One to two pieces of information that are not found in the text of the debates
3: Three to four pieces of information that are not found in the text of the debates
4: Five to six pieces of information that are not found in the text of the debates
5: Seven or more pieces of information that are not found in the text of the debates

Number of principles or concepts
0 and 1: No principles/concepts
2: One principle/concept
3: Two principles/concepts
4: Three principles/concepts
5: Four or more principles/concepts

Proportion of text detail
0 and 1: No information from text
2: Material from the text accounts for about 1/4 of the essay
3: Material from the text accounts for about 1/2 of the essay
4: Material from the text accounts for about 3/4 of the essay
5: The essay uses or is based on material from the text only

Argumentation
0: No response
1: Argument is hard to decipher
2: Very weak argument
3: Argument present
4: Supported argument
5: Strong and convincing argument

Misconceptions
0: No response
1: One or more serious misconceptions central to the essay
2: At least one serious misconception
3: Several minor errors and/or a moderate misconception
4: Very minor misconception
5: No misconceptions

Arter, J., and Chappuis, J., Creating and Recognising Quality Rubrics, Educational Testing Services, New Jersey, 2006

This rubric contains several errors.

1. The student could have four principles of poor quality and get a better mark than someone who has three really good principles. The same is true for the information presented by the student.
2. There are too many levels.
3. The gaps between levels are uneven; for example, the gap between "Argument is hard to decipher" and "Very weak argument".
4. The same descriptor is used for a mark of 0 and a mark of 1 for the criteria Proportion of text detail; Prior knowledge, facts and events; and Number of principles or concepts.


References

Arter, J., and Chappuis, J., Creating and Recognising Quality Rubrics, Educational Testing Services, New Jersey, 2006.
Broadfoot, P., An Introduction to Assessment, Continuum International Publishing Group, 2007.
Brookhart, S., and Nitko, A., Educational Assessment of Students, Pearson Education, New Jersey, 2007.
Designing Test Questions, Grayson Walker Teaching Resource Centre, 2003; accessed 21/05/2011, http://www.utc.edu/Administration/WalkerTeachingResourceCenter/FacultyDevelopment/Assessment/assessment.html
Harlen, W., Assessment of Learning, Sage Publications, London, 2007.
McMillan, J., Classroom Assessment: Principles and Practice for Effective Standards-based Instruction, Pearson Education, 2007.
