
EDU 578 ASSESSMENT OF LEARNING
NAME: YUSYAILA BINTI YUNUS
MATRIC NO: 2011307775

THE DEFINITION OF TERMS

1. Evaluation
A systematic determination of the merit of something or someone, using criteria judged against a set of standards. It compares actual impacts against strategic plans, looking at the original objectives, at what was accomplished, and at how it was accomplished. Evaluation can be formative (during the lesson) or summative (after the lesson) and is used to determine quality by formulating a judgement.

2. Assessment
A process of documenting, usually in measurable terms, knowledge, skills, attitudes and beliefs. A good assessment has both validity and reliability. It gives an estimate of how good or knowledgeable one student is compared with others.

Validity and Reliability
Two of the primary criteria of evaluation in any measurement or observation are:
1. Whether we are measuring what we intend to measure.
2. Whether the same measurement process yields the same results.
These two concepts are validity and reliability. Reliability is concerned with questions of stability and consistency: does the same measurement tool yield stable and consistent results when repeated over time? Think about measurement processes in other contexts; in construction or woodworking, a tape measure is a highly reliable measuring instrument. Validity refers to the extent to which we are measuring what we hope to measure (and what we think we are measuring). To continue with the example of measuring a piece of wood, a tape measure that has been made with accurate spacing for inches, feet, etc. should yield valid results as well: measuring the piece of wood with a "good" tape measure should produce a correct measurement of its length. To apply these concepts to social research, we want measurement tools that are both reliable and valid. We want questions that yield consistent responses when asked multiple times (reliability), and we want questions that get accurate responses from respondents (validity).

3. Measurement
The use of assessment and the analysis of data such as scores. It measures the ability and level of efficiency of students.

4. Test

An assessment (an examination) intended to measure a student's knowledge or other abilities. Formal testing often results in a grade or a test score, which may be interpreted with regard to a norm or a criterion.
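To make the norm/criterion distinction concrete (it is defined further in items 7 and 8 below), here is a minimal Python sketch; the student names, marks, and the 60-mark cut score are all invented for illustration. The same set of scores can be ranked against each other (norm-referenced) or compared against a fixed cut score (criterion-referenced).

```python
# A minimal sketch (invented names, marks, and a hypothetical cut score of 60).
scores = {"Aina": 45, "Badrul": 72, "Chong": 58, "Devi": 83, "Farah": 67}

# Norm-referenced interpretation: rank each student against the other students.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for position, (name, mark) in enumerate(ranked, start=1):
    print(f"Rank {position}: {name} ({mark})")

# Criterion-referenced interpretation: compare each mark against a fixed cut score.
CUT_SCORE = 60
for name, mark in scores.items():
    print(f"{name}: {'pass' if mark >= CUT_SCORE else 'fail'}")
```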

5. Formative assessment
Used to improve teaching strategies while the process is still ongoing. It is a range of formal and informal assessments or procedures carried out by teachers during the learning process in order to modify teaching and learning activities and improve students' understanding. It involves qualitative feedback for both students and teachers, focused on the details of performance and gathered through observation, questioning, and peer or self-assessment.

6. Summative assessment
Draws conclusions from a completed unit of teaching. It assesses and summarises the development of students at a particular point in time and can also be described as assessment of learning. It provides information on the efficiency of the product, that is, on whether students know what they are supposed to know. The assessment is usually fixed in terms of time and content (a cumulative evaluation used to measure students' growth after a defined period).

7. Norm-referenced assessment
(Grading the learning of students by ranking them.) A type of test or evaluation that estimates the position of a student among the other students, using the test scores. It tests whether students performed better or worse than one another.

8. Criterion-referenced assessment
Translating a score into a statement about the behaviour to be expected of a person. The objective is simply to see whether the student can perform the task and has learned the material or content. It involves a cut score (cut score = pass/fail); the sketch after item 4 above contrasts this with norm-referenced ranking.

9. Validity
In its primary meaning (Oxford Dictionary), validity is a property of arguments: arguments are valid or invalid according to whether the conclusion follows from the premises. Premises and conclusions themselves are not valid or invalid, but true or false. In model theory, a formula is called valid when it is true in all interpretations. In measurement, to reiterate, validity refers to the extent to which we are measuring what we hope to measure (and what we think we are measuring). How do we assess the validity of a set of measurements? A valid measure should satisfy four criteria.

Face Validity
This criterion is an assessment of whether a measure appears, on the face of it, to measure the concept it is intended to measure. This is a very minimal assessment; if a measure cannot satisfy this criterion, then the other criteria are inconsequential. We can think of observational measures of behavior that would have face validity. For example, striking out at another person would have face validity as an indicator of aggression. Similarly, offering assistance to a stranger would meet the criterion of face validity for helping.

Content Validity
Content validity concerns the extent to which a measure adequately represents all facets of a concept. Consider a series of questions that serve as indicators of depression (not feeling like eating, losing interest in things usually enjoyed, etc.). If there were other kinds of common behaviors that mark a person as depressed but were not included in the index, then the index would have low content validity, since it did not adequately represent all facets of the concept.

Criterion-related Validity
Criterion-related validity applies to instruments that have been developed to be useful as indicators of a specific trait or behavior, either now or in the future. For example, think of the driving test as a social measurement with fairly good predictive validity. That is to say, an individual's performance on a driving test correlates well with his or her driving ability.

Construct Validity
For many of the things we want to measure, however, there is not necessarily a pertinent criterion available. In this case, we turn to construct validity, which concerns the extent to which a measure is related to other measures as specified by theory or previous research. Does a measure stack up with other variables the way we expect it to? A good example of this form of validity comes from early self-esteem studies; self-esteem refers to a person's sense of self-worth or self-respect. Clinical observations in psychology had shown that people who had low self-esteem often had depression. Therefore, to establish the construct validity of the self-esteem measure, the researchers showed that those with higher scores on the self-esteem measure had lower depression scores, while those with low self-esteem had higher rates of depression.
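As a rough illustration of the self-esteem example above, construct validity could be checked by confirming that self-esteem scores and depression scores move in the direction theory predicts (higher self-esteem, lower depression). The scores in this sketch are invented; numpy's corrcoef is used only to compute an ordinary Pearson correlation.

```python
import numpy as np

# Invented paired scores for the same eight respondents.
self_esteem = np.array([32, 45, 28, 50, 39, 25, 47, 36])
depression  = np.array([18, 10, 22,  6, 12, 25,  8, 15])

r = np.corrcoef(self_esteem, depression)[0, 1]
print(f"correlation between self-esteem and depression: r = {r:.2f}")
# A clearly negative r is what theory predicts, which supports construct validity.
```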

10. Reliability
Reliability refers to a condition in which a measurement process yields consistent scores (given an unchanged measured phenomenon) over repeated measurements. Perhaps the most straightforward way to assess reliability is to check that measures meet the following three criteria; measures that are high in reliability should exhibit all three.

Test-retest Reliability
When a researcher administers the same measurement tool multiple times (asks the same question, follows the same research procedures, etc.), does he or she obtain consistent results, assuming that there has been no change in whatever is being measured? This is really the simplest method for assessing reliability: when a researcher asks the same person the same question twice ("What's your name?"), does he or she get back the same answer both times? If so, the measure has test-retest reliability. Measurement of the piece of wood discussed earlier has high test-retest reliability.

Inter-item Reliability
This is a dimension that applies to cases where multiple items are used to measure a single concept. In such cases, answers to a set of questions designed to measure a single concept should be associated with each other.
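As a sketch of how these two criteria might be checked numerically (all scores invented): test-retest reliability can be examined by correlating the same students' scores from two administrations, and inter-item reliability by computing Cronbach's alpha over a set of items meant to measure one concept.

```python
import numpy as np

# Test-retest: six students answer the same instrument on two occasions (invented scores).
time1 = np.array([14, 18, 11, 20, 16, 13])
time2 = np.array([15, 17, 12, 19, 16, 12])
print(f"test-retest correlation: {np.corrcoef(time1, time2)[0, 1]:.2f}")  # near 1.0 = stable scores

# Inter-item: six students answer four items that all target the same concept.
items = np.array([
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
])  # rows = students, columns = items

k = items.shape[1]
item_var = items.var(axis=0, ddof=1).sum()   # sum of the individual item variances
total_var = items.sum(axis=1).var(ddof=1)    # variance of the students' total scores
alpha = (k / (k - 1)) * (1 - item_var / total_var)
print(f"Cronbach's alpha: {alpha:.2f}")      # values near 1.0 mean the items hang together
```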

Interobserver Reliability
Interobserver reliability concerns the extent to which different interviewers or observers using the same measure get equivalent results. If different observers or interviewers use the same instrument to score the same thing, their scores should match. For example, the interobserver reliability of an observational assessment of parent-child interaction is often evaluated by showing two observers a videotape of a parent and child at play. The observers are asked to use an assessment tool to score the interactions between parent and child on the tape. If the instrument has high interobserver reliability, the scores of the two observers should match.
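A simple numerical check of interobserver reliability, sketched below with invented ratings, is to see how often two observers give the same score and how strongly their scores correlate.

```python
import numpy as np

# Two observers rate the same eight taped parent-child interactions on a 1-5 scale (invented ratings).
observer_a = np.array([4, 3, 5, 2, 4, 3, 5, 1])
observer_b = np.array([4, 3, 4, 2, 4, 3, 5, 2])

exact_agreement = np.mean(observer_a == observer_b)       # proportion of identical scores
correlation = np.corrcoef(observer_a, observer_b)[0, 1]   # do the two scorings move together?

print(f"exact agreement: {exact_agreement:.0%}")
print(f"correlation between observers: {correlation:.2f}")
```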
