UNIT-1: INTRODUCTION
UNIT-2: JUDGING THE QUALITY OF THE TEST
UNIT-3: APPRAISING CLASSROOM TESTS (ITEM ANALYSIS)
UNIT-4: INTERPRETING THE TEST SCORES
    4.1 The Percentage Correct Score
    4.2 The Percentile Ranks
    4.3 Standard Scores
    4.4 Profile
UNIT-5: EVALUATING PRODUCTS, PROCEDURES & PERFORMANCE
    5.1 Evaluating Themes and Term Papers
    5.2 Evaluating Group Work & Performance
UNIT-8: SELECTED TESTS OF SIGNIFICANCE
    8.1 t-Test
    8.2 Chi-Square (χ²)
    8.3 Regression
UNIT-1:
INTRODUCTION
1.1
1.1.1 Evaluation:
Literally, the term evaluation means appraisal, judgment or assessment.
Approaches to Evaluation
Evaluation in our schools is essentially concerned with two major approaches
to making judgments:
1. Product Evaluation:
It is the evaluation of a student's performance in a specific learning context. Such
evaluation seeks to determine how well the student has achieved the stated
objectives of the learning situation. In this sense the student's performance is
seen as a product of the educational experience. A school report is an example of
product evaluation.
2. Process Evaluation:
It is the kind of evaluation that seeks to examine the experiences and activities
involved in the learning situation. It makes judgments about the process by which
students acquired learning. In simpler words, it examines the process of the
learning experience before it has concluded. The evaluation of the nature of
student-teacher interaction, instructional methods, school curricula, specific
programmes, etc. are examples of process evaluation.
1. Curriculum Evaluation
Curriculum evaluation, as is clear from the name, is the evaluation of a certain
curriculum, i.e. an instructional programme. It is used to determine the outcomes
of that programme.
2. Programme Evaluation
Programme evaluation is used for judging the effectiveness of a programme or a
special project. This evaluation is used to make decisions about programme
installation and modification. It helps to obtain evidence to support or oppose a
programme. Outside education, programme evaluation is used as a means of
determining the effectiveness, efficiency and acceptability of any form of
programme. Within education, we use the term in a similar way, as in evaluating
the effectiveness of a new writing or reading programme in primary schools. A
curriculum evaluation may qualify as a programme evaluation if the curriculum is
focused on change or improvement. Programme evaluation, however, does not
necessarily involve appraisal of curricula (e.g. the evaluation of a computerized
student record-keeping system).
3. Personnel Evaluation
Personnel evaluation is the assessment of the performance of personnel working
in an organization. That is why it is also called performance appraisal or staff
evaluation. In education, personnel evaluation is very necessary for adopting
appropriate appraisal plans and procedures to achieve the goals of education.
4. Institutional Evaluation:
Institutional evaluation is the evaluation of the total programme of a school,
college, university or other educational institution. The evaluation of an
institution is used to collect information and data on all aspects of the
functioning of that institution. The basic aim of this evaluation is to determine
the degree to which instructional objectives are being met and to identify areas
of strength and weakness in the total programme. An institutional evaluation
involves more than the administration of tests to students; it may require any
combination of questionnaires, interviews, and observations, with data being
collected from all persons in the institution's community, including
administrators, teachers, and counsellors. The major component of institutional
evaluation is institutional testing. The more comprehensive the testing
programme, the more valuable the resulting data. That is why an institutional
testing programme should include measurement of achievement, aptitude,
personality and interest. Tests selected for an institutional evaluation must
match the objectives of the institution and be appropriate for the students to be
tested.
Assessment:
Concept of Assessment
Assessment is a general term that includes the full range of procedures used to
gain information about student learning (observations, ratings of performances or
projects, paper-and-pencil tests) and the formation of value judgments concerning
learning progress. A test is a particular type of assessment that typically consists of a
set of questions administered during a fixed period of time under reasonably
comparable conditions for all students (Linn and Gronlund, 2000). Assessment
may include both quantitative descriptions (measurement) and qualitative descriptions
(non-measurement) of students. In addition, assessment always includes value
judgments concerning the desirability of the results. Assessment may or may not be
based on measurements; when it is, it goes beyond simple quantitative description.
The process of collecting, synthesizing, and interpreting information to aid in
decision making is called assessment. For many people, the words 'classroom
assessment' evoke images of pupils taking paper-and-pencil tests, teachers scoring
them, and grades being assigned to the pupils based upon their performance.
Assessment, as the term is used here, includes the full range of information that
teachers gather in their classrooms: information that helps them understand their
pupils, monitor their instruction, and establish a viable classroom culture. It also
includes the variety of ways teachers gather, synthesize, and interpret that
information. Assessment is a general term that includes all the ways through which
teachers gather information in their classrooms.
Need for Assessment in Education
As long as educators need to make instructional decisions, curricular decisions,
selection decisions, and placement or classification decisions based on the present
or anticipated educational status of the child, there will be a need for assessment
in the educational enterprise. To the modern educator, the ultimate goal of
assessment is to facilitate learning. This can be done in a number of ways, and each
way requires a separate type of decision. The assessment decision also determines
which test is to be used for assessment. Thus there is a close relationship between
the purpose of evaluation, evaluation decisions and the types of tests to be used for them.
The purposes of assessment are as follows:
Selection Decision
Whenever there is a choice, a selection decision has to be made. In our daily
life we see that institutions and organizations need persons for their work; they get
responses from several people but they cannot take all of them. They have to make a
selection out of them. Assessment of these persons is made on the basis of tests
given to them. Tests provide information which helps in the selection decision:
some persons will be acceptable while others will not. Similarly, universities have
to make selection decisions for admitting students to various courses. For courses
in which hundreds of candidates are applicants, the selection decision must be made
on a stronger footing. Naturally, some tests are given to the candidates to help in
the selection decision; aptitude tests, intelligence tests, achievement tests or
prognostic tests are generally given for this purpose. There have been rulings from
the judiciary that the scores on these tests should have a good relationship with
success in the job or the course for which the tests are given. If any selection
test does not fulfill this requirement, it needs to be improved or replaced by a
better one. Although the perfection of such tests cannot be guaranteed, any
institution or organization which is interested in the best students or workers will
continue to make efforts to improve the tests being used for the purpose of
selection.
Placement Decision
Since school education should be provided to all in a welfare state, the schools
must make provision for all; they cannot reject candidates for admission as the
universities or colleges can. How these candidates are placed in different programmes
of school education is determined on the basis of their assessment. Such school
determinations are called placement decisions. These decisions are required not only
in the case of those who are at some disadvantage but also for those who are
gifted and talented. The schools have to find one programme or another for all
school-age children depending upon their weaknesses or strengths. Placement tests
differ from selection tests, and are more useful here, because they inform the
decision to differentially assign students to teaching programmes. Achievement tests
and interviews are generally used for placement decisions.
Classification Decisions
Assessment is also required to help in making decisions about assigning
a person to one of several different categories, jobs or programmes. These decisions
are called classification decisions because within one particular job or programme
there may be several levels or categories. The level or category to which a particular
person or child is assigned depends upon the results of the test. Aptitude tests,
achievement tests, interest inventories, value questionnaires, attitude scales and
personality measures are used for classification decisions. There is a minor
difference between classification, placement and selection: classification refers to
cases where the categories are essentially unordered, placement refers to cases
where the categories represent levels of teaching or treatment, and selection refers
to cases where persons can be selected or rejected.
Diagnosis and Remedial Decisions
Assessment is required to locate the students who need special remedial help and
to decide, for example, what instructional strategies the teacher should use to help
a particular student or group of students so that their opportunities to achieve the
objectives are maximized. Aptitude tests, intelligence tests, diagnostic achievement
tests, diagnostic personality measures, etc. may be used for this purpose.
Feed Back
It is not sufficient to assess students through a test and then do nothing with
the results. A good teacher will use tests for the purpose of providing feedback to
students. Feedback may be effective or ineffective depending upon the circumstances:
it will facilitate learning if it confirms the learner's correct responses or
identifies errors and corrects them. Test results made available to parents may also
serve as a feedback device. It is also to be remembered that feedback is for both
the student and the teacher, because it provides information to both and helps in
knowing how well students have learnt and how well the teacher has taught.
Motivation and guidance of learning: Assessment is also used to motivate students
to study more and to guide their learning. However, this motivational device can be
used positively as well as negatively. Unfortunately, many schoolteachers use it
negatively: the threat of examinations or of refusing annual promotion to the next
class can motivate students, but if they are motivated with evaluation techniques
which give them more confidence in the subject, the effect will be stronger and
more lasting. Aptitude tests, achievement tests, attitude scales, personality
measures, interest inventories and surprise quizzes encourage students towards more
study and understanding.
Assigning Marks to Students:
The instructional programme remains incomplete if it is not followed by
assessment. Although no teacher chooses the teaching profession because he is
interested in evaluating students, no teacher confines his job to teaching only. He
regularly evaluates his students and assigns them marks. In fact, most teachers give
a good deal of their time to this purpose. If teachers do not evaluate their
students and do not assign them marks or grades, how can they check the
effectiveness of their teaching and the learning outcomes of the students?
Role of Assessment in Education Process
Teachers try to assess the students' level of achievement and readiness to learn
prior to beginning instruction. What do the students know already, and what are
their cognitive skills like? How receptive to learning are they? Which ones seem
self-motivated? This component indicates a need for evaluation information before
instruction actually begins.
Once the teacher has decided what will be taught and to whom the teaching is
to be directed, the "How?" must be determined. The Instructional Procedures
component deals with the material and methods of instruction the teacher selects or
develops to facilitate student learning. Does the text need to be supplemented with
illustration? Should small group projects be developed? Is there computer software
available to serve as a refresher for prerequisites? At this point instruction could
begin, and often it does, but unless the teacher makes plans to evaluate students'
performance, the students and teacher will never be sure when learning is complete.
The Performance Assessment component helps to answer the question, "Did we
accomplish what we set out to do?" Tests, quizzes, teacher observations, projects, and
demonstrations are evaluation tools that help to answer this question. Thus evaluation
should be a significant aspect of the teaching process; teaching does not occur,
according to the model, unless evaluation of learner performance occurs.
[Figure: The Basic Teaching Model, showing the Instructional Objectives, Entering
Behaviour, Instructional Procedures, and Performance Assessment components in
sequence, with a Feedback Loop returning to each component.]
Feedback Loop
The model shows a fifth component, the Feedback Loop that can be used by
the teacher as both a management and a diagnostic procedure. If the results of
evaluation indicate that sufficient learning has occurred, the loop takes the teacher
back to the Instructional Objectives component, and each successive component, so
that plans for beginning the next instructional unit can be developed. (New objectives
are needed, entering behavior is different, and methods will need to be reconsidered,)
But when evaluation results are not so positive, the Feedback Loop is a mechanism
for identifying possible explanations. (Note the arrows that return to each
component.) Were the objectives too vaguely specified? Did students lack essential
prerequisite skills or knowledge? Was the film or text relatively ineffective? Was there
insufficient practice opportunity? Such questions need to be asked and frequently are.
However, questions need to be asked about the effectiveness of the performance
assessment procedures also, perhaps more frequently than they are. Were the test
questions appropriate? Were enough observations made? Were directions clear to
students? The Feedback Loop returns to the Performance Assessment component to
indicate that we must review and assess the quality of our evaluation procedures
after the fact, to determine the appropriateness of the procedures and the accuracy
of the information. Unless the tools of evaluation are developed with care, inadequate
learning may go undetected or complete learning may be misinterpreted as deficient.
In sum, good teaching requires planning for and using good evaluation tools.
Furthermore, evaluation does not take place in a vacuum. The BTM shows that other
components of the teaching process provide cues about what to evaluate, when to
evaluate, and how to evaluate. Our purpose is to identify such cues and to take
advantage of them in building tests and other assessment devices that measure
achievement as precisely as possible.
(B) The teacher is an assessment decision maker who is concerned with all aspects
of the educational endeavour. The key point to consider and keep in mind is that
evaluation involves appraisal against particular goals or purposes. Useful
information may be obtained for evaluation procedures by both formal and informal
means, and should include information collected during instruction as well as
end-of-course data. According to Ahmann and Glock (1985), school administrators,
guidance personnel, classroom teachers, and individual students require information
that will allow them to make informed and appropriate decisions regarding their
respective educational activities. Ideally, they should be aware of all the
alternatives open to them, the possible outcomes of each alternative, and the
advantages and disadvantages of the respective outcomes. Educational and
psychological measurement can help individuals with these matters.
(C) Tests are administered to students to monitor their success and to provide them
with relevant feedback. The information is employed less to grade a student than to
make instruction responsive to the student's strengths and weaknesses as identified
by the test. Four types of classroom assessment can be distinguished by their place
in instruction:
1. Placement assessment: to determine student performance at the beginning of instruction.
2. Formative assessment: to monitor learning progress during instruction.
3. Diagnostic assessment: to diagnose learning difficulties during instruction.
4. Summative assessment: to assess achievement at the end of instruction.
Although a single instrument may sometimes be useful for more than one purpose
(e.g., for both formative and summative assessment), each of these types of
classroom assessment typically requires instruments specifically designed for the
intended use.
All these types of assessment are discussed below in detail.
Placement Assessment
1.1.3 Measurement
and weighs only 85 pounds. Similarly, instead of saying that Hamid is more
intelligent than Zahid, we can say that Hamid has a measured IQ of 125 and Zahid
has a measured IQ of 88. In each of the above cases, the numerical statement is more
precise, more objective and less open to interpretation than the corresponding verbal
statement.
Steps of measurement
There are two steps in the process of measurement. The first step is to devise a
set of operations to isolate the attribute and make it apparent to us. Just as a
standard is used for judging the durability of a thing, educators and psychologists
use various methods for testing the behaviour or performance of a student. For this
purpose they often use the Stanford-Binet tests or other tests that include
operations for eliciting behaviour that we take to be indicative of intelligence.
The second step in measurement is to express the results of the operations
established in the first step in numerical or quantitative terms. This involves an
answer to the question: how many, or how much? Just as the millimetre is used as a
unit for indicating the thickness of a thing, educators and psychologists use
numerical units for gauging intelligence, emotional maturity and other attributes.
Thus each step in measurement rests on human-fashioned definitions: we define the
attribute that interests us, we define the set of operations that will allow us to
identify it, and we express the result of those operations in numbers.
Difference between Evaluation and Measurement
Some people use 'evaluation' and 'measurement' with the same meaning. Both terms
are used for the process of assessing the performance of the student and collecting
information about an educational objective. Both tell how effective the school
programme has been, and both refer to the collection of information, the appraisal
of students, and the assessment of programmes. Some recognize that measurement is
one of the essential components of evaluation. But there is a difference between
the two terms. Roughly speaking, measurement is quantitative assessment, whereas
evaluation is quantitative as well as qualitative assessment of the performance of
a student or an educational objective. Measurement is a limited process used for
the assessment of limited and specific educational objectives. Evaluation, on the
other hand, is a much more comprehensive term used for all kinds of educational
objectives. Evaluation is the continuous inspection of all available information
concerning the student, teacher, educational programme and the teaching-learning
process, in order to ascertain the degree of change in students and form valid
judgments about the students and the effectiveness of the programme. Measurement,
by contrast, is the collection of data about the performance of a student, teacher
or curriculum.
However, evaluation and measurement are closely related; we cannot separate one
from the other. Both are used for assessing the effectiveness of a programme or
the appraisal of students. Measurement collects data directly from the objects of
concern, the students; other information is collected from students by non-testing
procedures. Information provided by testing and non-testing procedures is best
thought of as material to be used in the evaluation process.
The Importance of Measurement in Education
Measurement plays a very important role in the teaching-learning process.
Without measurement we cannot assess the effectiveness of an educational
programme, the school or its personnel. For effective teaching, it is necessary for
the teacher to be aware of the strengths and weaknesses of his teaching method.
Similarly, for effective learning, it is necessary for the student to be aware of
the possible outcomes of all the alternatives open to him, and to be informed about
the advantages and disadvantages of the respective outcomes. All this is impossible
without measurement: without it, how can a teacher judge his method of teaching,
and how can a student be informed about the outcomes of the alternatives?
(1) Prediction
After evaluation, it is possible to predict the performance of students in the
future. By evaluation we learn a student's aptitudes and interests, with the help
of which we can guide him to seek admission in an institution that suits them.
So, on the basis of evaluation, we can plan for the future.
(3) Selection
Measurement and evaluation are used during the selection of suitable persons.
Classification
Evaluation is helpful for classification in all educational institutions. At the
end of every year, tests are given to students to check their ability, and
classification is made on the basis of the results obtained from these tests.
Another educational psychologist, Camp, adds that evaluation plays an important
role in making maladjusted students useful members of society by finding out their
interests and attitudes. Students suffering from an inferiority complex can also be
treated after their proper evaluation.
In short, evaluation and measurement have important functions in education.
They serve as guidelines for students, teachers, counsellors and administrators.
1.1.4 Test
Measurement and evaluation are the two processes that are used to collect
information about students; a test is an instrument used in both processes. The
differences among the three terms are summarized below.
Test
- An instrument for measuring a sample of behaviour.
- It is a means of collecting information.
- Answers the question, "How well does the individual perform, as compared to others?"
- Its objective is to find out the facts pertaining to some aspect.
Measurement
- Determines the degree to which an individual possesses a particular characteristic.
- It gives a numerical value to some trait.
- Its objective is to present information objectively.
Evaluation
- Involves qualitative and quantitative assessment and decision-making.
- Its objective is to make decisions about all components of the educational system.
Types of Tests
TESTS
  (A) Ability Tests
      (1) Achievement Tests: Essay Tests and Objective Tests
      (2) Aptitude Tests
      (3) Intelligence Tests
  (B) Personality Tests
      Attitude Tests, Character Tests, etc.
As shown in the diagram above, tests can be classified into two broad
categories according to the behaviour tested: ability tests and personality tests. These
two types are discussed in detail and are further classified into sub-types in the
following lines.
(A) Ability Tests
These tests are used to test the ability of a student. They measure the maximum
performance of a student, i.e. the best that a student can do. Ability tests are
further classified into three types: (1) achievement tests, (2) aptitude tests, and
(3) intelligence tests. These are discussed in the lines below.
(1) Achievement Tests: Achievement tests measure what a student has learnt from
classroom instruction. They measure the attained ability of a student, i.e. what a
student has learnt to do. Achievement tests are further classified into two types,
essay type tests and objective type tests. (These two types of tests will be
discussed in detail in the next question.)
(2) Aptitude Tests: Aptitude tests are those tests that are used to measure the
potential ability of a student, i.e. what a student can learn to do. They measure
the capacity of a student to learn a given content. According to Hull, C. L., "An
aptitude test is a psychological test designed to predict an individual's
potentialities for success or failure in a particular occupation, subject of study,
etc." This shows that an aptitude test is a test designed to discover what
potentiality a given person has for learning some particular vocation or acquiring
some particular skill. Achievement tests and aptitude tests seem to be the same,
but the distinction between the two lies in their use: if a test is used to measure
present attainment, it is called an achievement test, and if a test is used to
predict a future level of performance, it is called an aptitude test.
(3) Intelligence Tests: Intelligence tests are those tests that are used to measure
the native capacity or the overall mental ability of a student. These are also
called scholastic aptitude tests or tests of mental ability. There are many kinds
of intelligence tests, but the most popular is based on the concept of the
intelligence quotient (IQ) introduced by Terman. IQ is computed by dividing the
mental age (MA) of a student by his physical or chronological age (PA or CA), i.e.
the actual age of the student, and multiplying the result by 100.
I.Q. = (M.A. / C.A.) × 100
Where:
I.Q. = Intelligence Quotient
M.A. = Mental Age
C.A. = Chronological Age (Physical Age)
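As a quick illustrative sketch of the formula above (the function name and the sample ages are hypothetical, chosen only for demonstration), the computation can be written in a few lines of Python:

```python
def iq(mental_age: float, chronological_age: float) -> float:
    """Compute the intelligence quotient: IQ = (MA / CA) * 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age / chronological_age * 100

# A child whose mental age (10) is ahead of his chronological age (8)
# scores above 100; equal mental and chronological ages give exactly 100.
print(iq(10, 8))   # 125.0
print(iq(9, 9))    # 100.0
```

Note that a score of 100 marks the point where mental age equals chronological age, which is why IQ is read relative to 100 rather than as a raw count.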
(B) Personality Tests
Tests used for the assessment of the personality of a student are called personality
tests. They measure the typical performance of a student, i.e. what a student will
do. They are widely administered all over the world in various fields, vocations
and institutions, and for the selection of recruits. In Pakistan, too, personality
tests are used for job selection and for the selection of army recruits, as in the
ISSB examinations. Personality tests include attitude tests, interest tests,
adjustment and temperament tests, character tests, and tests of other motivational
and interpersonal characteristics.
Uses of Tests
Tests play an important role in the teaching-learning process. Without tests we
can neither evaluate a student's or a teacher's performance nor collect information
about the effectiveness of an educational programme. That is why tests are very
important in education. They motivate students to learn, and they serve a number of
purposes in a variety of educational activities. The following are the different
uses of tests:
1. All educational activities are performed for the sake of the student. That is
why the use and importance of tests in the process of learning is greater than in
any other activity. Tests help students in knowing their strengths and weaknesses
in a subject. The results obtained from these tests serve as guidelines for
students and motivate them to study.
3. Tests enable the examiner to know how to guide students' educational and
vocational choices. Tests also make parents aware of the aptitudes of their
children so that they can make a plan for their proper guidance. The result of a
test in itself serves as a guideline for the student.
4. Tests provide the administrative department with useful information. In the
light of these tests, administrators can easily decide how to promote students, how
to admit them, and how to modify school objectives, instructional methods and
curricula. They can then easily decide how to make the teaching-learning process
effective.
5. Tests provide data for research and experimentation in the classroom. Research
workers use these data in their genetic or case-study research.
In short, tests are used in almost all educational activities. They are the real
tools with the help of which information about teachers, students, curricula, etc.
is gathered, and in the light of this information the teaching and learning process
is improved.
1.2 Introduction:
The purpose of a test is usually explained when the test is announced, or at the
beginning of the semester when the evaluation procedures are described as part of
the general orientation to the course. Should there be any doubt whether the
purpose of the test is clear to all pupils, however, it should be explained again
at the time of testing. This is usually done orally. The only time a statement of
the purpose of the test needs to be included in the written directions is when the
test is to be administered to several sections taught by different teachers; a
written statement of purpose then ensures greater uniformity. Various types of
tests are applied in educational institutions, because no single test can capture
a child's abilities, interests and personality; one test measures only a specific
ability. That is why school administrators use many different types of tests. Even
in one single area, such as intelligence, more than one test is needed over a
period of years to obtain a reliable estimate of ability. Each test serves its own
purpose; overall, testing and evaluation serve the following purposes.
Types of Testing:
There are four types of testing.
Placement Testing:
Formative Testing:
Formative tests are given periodically during instruction to monitor pupils'
learning progress and to provide ongoing feedback to pupils and teacher. Formative
testing reinforces successful learning and reveals learning weaknesses in need of
correction. A formative test typically covers some predefined segment of
instruction and includes a rather limited sample of learning tasks. The test items
may be easy or difficult, depending on the learning tasks in the segment of
instruction being tested. Formative tests are typically criterion-referenced
mastery tests, but norm-referenced survey tests can also serve this function.
Ideally, the test will be constructed in such a way that corrective prescriptions
can be given for missed test items or sets of test items. Because the main purpose
of the test is to improve learning, the results are seldom used for assigning
grades.
Diagnostic Testing:
Diagnosis of persistent learning difficulties involves much more than
diagnostic testing, but such tests are useful in the total process. The diagnostic test
takes up where the formative test leaves off: if pupils do not respond to the feedback
and corrective prescriptions of formative testing, a more detailed search for the source of
difficulty is needed. Diagnostic testing will need to include a number of test items in each specific area, with
some slight variation from item to item. In diagnosing pupils' difficulties in adding
whole numbers, for example, we would want to include addition problems containing
various number combinations, with some not requiring carrying and some requiring
carrying, to pinpoint the specific types of error each pupil is making. Because our
focus is on the pupils' learning difficulties, the diagnostic test must be constructed in
accordance with the most common sources of error that pupils encounter. Such tests
are typically confined to a limited area of instruction, and the test items tend to have a
relatively low level of difficulty.
Summative Testing:
The summative test is given at the end of a course or unit of instruction, and
the results are used primarily for assigning grades or certifying pupil mastery of the
instructional objectives. The results can also be used for evaluating the effectiveness of
the instruction. The end-of-course test (final examination) is typically a norm-referenced
survey test that is broad in coverage and includes test items with a wide
range of difficulty. The more restricted end-of-unit summative test might be norm-referenced
or criterion-referenced, depending on whether mastery or developmental
outcomes are the focus of instruction.
Purpose of Testing:
1.
on simple observation. These tests give the teacher an objective and comprehensive
picture of each pupil's progress. This is important because all concerned persons
(students themselves, students' parents, teachers, counselors, administrators,
employers, admission officers, and even the community) need to know how students
performed in school and in particular courses.
pupils' progress, so that it can be presented to the parents. These reports form the
foundation for effective cooperation between parents and teachers, which results
in improved learning.
To Report to Administrators:
The results of tests indicate the extent to which the school's objectives are
being achieved. From the results of evaluation, the administrators are able to
identify the weak points and strengths in the teaching programs of their schools and
take necessary action for their improvement.
the teacher to answer the questions like: Do the pupils possess the abilities and skills
needed to proceed with the instruction? What, and to what level have the pupils
already mastered the intended outcomes? This information helps the teacher in
planning his instructional activities.
instructional process. It helps the teacher in changing and adapting the instructional
activities continuously according to the students' needs.
To Furnish Instruction:
Testing functions as an instructional device: it not only increases the self-
knowledge of the students but also the attainment of specific objectives. The practice
of giving tests is common in our institutions. Through these, the students become
aware of their speed of progress, errors, and present status, on the basis of which they
plan their further efforts.
students. These are useful in assisting the students with educational and vocational
decisions, guiding them in the selection of curricular and co-curricular activities, and
helping them solve personal and social adjustment problems.
the pupils achieved the instructional objectives. Testing and evaluation help in this
regard; tests are useful in determining the learning outcomes of classroom instruction.
The teacher can evaluate the success or failure of classroom learning in relation to the
test results. The teacher then accordingly adjusts the level and direction of classroom
instruction.
educational or social adjustment. These include the withdrawn, the unhappy, the
mentally retarded, and others who are not adjusting to the pattern of the school. The
standardized tests help the teachers and counselors to understand and help such
students.
To Conduct Research:
Test and evaluation data are important in research programs. The information
gathered helps in reviewing the curriculum so that it can be changed in accordance with the needs of the society.
student learning and development. In order to make this process effective, the
following principles are taken into consideration.
1)
2)
3)
4)
procedures.
Proper use of assessment procedure requires an awareness of their limitations.
Assessment procedures range from very highly developed measuring
instruments to rather crude assessment devices. Even the best educational and
psychological measuring instruments yield results that are subject to various
types of measurement error.
No test or assessment asks all the questions or poses all the problems that
might appropriately be presented in a comprehensive coverage of the
knowledge, skills and understanding relevant to the content standards or
objectives of a course or instructional sequence. Instead, only a sample of the
relevant problems or questions is presented.
Even in a relatively narrow part of a content domain, such as understanding
photosynthesis or the addition and subtraction of fractions, there are a host of
problems that might be presented, but any given test or assessment samples
but a small fraction of those problems. Limitations of assessment procedures
do not negate the value of tests and other types of assessments. A keen
awareness of the limitations of assessment instruments makes it possible to
use them more effectively. The cruder the instrument, the greater its limitations
and, consequently, the more caution required in its use.
5)
1.4
Programme Evaluation
Formative evaluation
Summative evaluation
Student evaluation
Diagnostic evaluation
iii.
iv.
v.
the content.
Relative stability and survival value in the literature.
Applicability to familiar and novel situations.
No matter how good a programme may be, the maintenance system must be
well facilitated. The school administrators, heads of subject units, supervisors, the
teacher and the pupils must be actively involved if successful implementation of the
programme is to be realized. The teacher, being the main executor of the programme,
must be well trained, not just to be able to teach facts but to select facts that relate to
other facts and principles. The teacher education programmes in the advanced teacher
colleges and the universities must prepare teachers to be able to teach their subjects
effectively.
In order to be implemented, a programme should be designed in such a way
that under favourable conditions certain intended learning outcomes will emerge. The
school teacher, the headmaster and the supervisor must gather information from time to
time in order to determine the success or weakness of the programme. If desirable
outcomes are observed, the focus of all concerned with instruction should be to
improve the programme through an effective maintenance system. If the products
(students) produced are of poor quality, corrective measures are selected and applied
in order to achieve the desired results. If after all these efforts the products are still
found to be poor, the programme is usually abandoned.
Several processes are involved in the input-output process. The teacher is the
most important component of the maintenance process of the programme. He
interacts with the students, the staff, experts and administrators and forms a
bridge between them and the learning materials. Often he acts as the input analyzer and an
identifier as well as the teaching agent of the programme.
i. Formative and
ii. Summative
Formative Evaluation:
Formative evaluation aims at ensuring a healthy acquisition and development
3.
and so on.
To specify the relationship between content and levels of cognitive abilities.
In other words, formative evaluation provides the evaluator with useful
3.
4.
3.
4.
5.
6.
Summative Evaluation
Summative evaluation is primarily concerned with the purposes, progress and
outcomes of the teaching-learning process. It attempts as far as possible to determine to
what extent the broad objectives of a programme have been achieved. It is based on the
following assumptions.
1.
2.
3.
4.
3.
more is to be done.
The summative evaluation limits the use of profiles and records of achievement
4.
evaluation takes the form of a dialogue between the student and teacher in
which both determine the task.
Broad Differences between Formative and Summative

Characteristic    Formative                                    Summative
Purpose
Content focus
Methods           Daily assignments, projects, observations    Projects
Frequency         Daily
1.5
terms of an individual's relative standing in some known group is called a norm-referenced
test. A norm group may be made up of students at the local level, district
level, provincial level or national level.
Types of Norms: There are two types of norms, which are the following.
a)
b)
2.
3.
4.
5.
A norm-referenced test is likely to have items that are very difficult for the grade
level so that students can be ranked.
Test items that are answered correctly by most of the pupils are not included in
these tests because of their inadequate contribution to response variance, even
though they may be the items that deal with the important concepts of the course content.
2.
a)
1.
2.
b)
1.
2.
3.
4.
5.
c)
students interaction.
Limitations of CRTs:
CRTs tell only whether a learner has reached proficiency in a task area but do
not show how good or poor the learner's level of ability is.
Only some areas readily lend themselves to listing the specific tasks for which tests can be
built, and this may be a constraining element for teachers.
1.6
EDUCATIONAL:
Educational assessment can be defined as the process of documenting
What idea?
Q2.
What document?
Q3.
Examples:
If the teacher's goal is that students should learn writing skills or creative writing,
composition and sentence structure, then multiple-choice items will be a poor option for
assessment; the teacher must include story writing, essays, summaries or similar
tasks for improving the writing skills of a child.
A close match between the intended learning goals and the type of assessment is
a must.
Comprehensive assessment requires a variety of procedure:
A variety of procedures are required to assess a person's knowledge of
anything. The things which are to be assessed also play a vital role in choosing
the procedure. Some of the procedures are given below:
Multiple choice
Short answer
Essay test
Written projects
Observational technique
Multiple-choice and short-answer tests of achievement are useful for
measuring knowledge, understanding, and application outcomes, but essay tests and
other written projects are needed to assess the ability to organize and express ideas.
Projects that require students to formulate problems and accumulate
information through library research or collect data (e.g.
through experimental observations or interviews) are needed to measure certain skills
in formulating and solving problems. Observational techniques are needed to assess
performance skills and various aspects of students' behavior, and self-report techniques
are useful for assessing interests and attitudes. A complete picture of students'
achievement and development requires the use of many different assessment
procedures.
UNIT-2:
JUDGING THE QUALITY OF THE TEST
Definition: Test percentile scores are just one type of test scores you will find on your
child's testing reports. Many test reports include several types of scores. Percentile
scores are almost always reported on major achievement tests that are taken by your child's
entire class. Percentile scores will also be found on individual diagnostic test reports.
Understanding test percentile scores is important for you to make decisions about
your child's special education program.
Test percentile scores are commonly reported on most standardized assessments a child
takes in school. Percentile literally means per hundred. Percentage scores on teacher-made
tests and homework assignments are developed by dividing the student's raw
score on her work by the total number of points possible. Converting decimal scores
to percentages is easy. The number is converted by moving the decimal point two
places to the right and adding a percent sign. A score of .98 would equal 98%.
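The conversion just described can be sketched in a couple of lines (an illustrative snippet; the function name is our own):

```python
def percentage_score(raw, total):
    """Divide the raw score by the total points possible, then move the
    decimal point two places to the right and add a percent sign."""
    return f"{raw / total * 100:.0f}%"

print(percentage_score(49, 50))  # a decimal score of .98 equals "98%"
```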
Test percentiles on a commercially produced, norm-referenced or standardized test
are calculated in much the same way, although the calculations are typically included
in test manuals or done with scoring software.
If a student scores at the 75th percentile on a norm-referenced test, it can be said that
she has scored at least as well as, or better than, 75 percent of students her age from the
normative sample of the test. Several other types of standard scores may also appear
on test reports.
Percentile rank
The percentile rank of a score is the percentage of scores in its frequency distribution
that are the same or lower than it. For example, a test score that is greater than or
equal to 75% of the scores of people taking the test is said to be at the 75th percentile
rank.
Percentile ranks are commonly used to clarify the interpretation of scores on
standardized tests. In test theory, the percentile rank of a raw score is interpreted
as the percentage of examinees in the norm group who scored at or below the score
of interest.
Percentile ranks (PRs) are often normally distributed (bell-shaped) while normal
curve equivalents (NCEs) are uniform and rectangular in shape. Percentile ranks are
not on an equal-interval scale; that is, the difference between any two scores is not the
same as between any other two scores whose difference in percentile ranks is the same.
For example, 50 − 25 = 25 is not the same distance as 60 − 35 = 25 because of the
bell-curve shape of the distribution. Some percentile ranks are closer together than
others: percentile rank 30 is closer on the bell curve to 40 than it is to 20.
The mathematical formula is

PR = ((c + 0.5 × f) / N) × 100%
where c is the count of all scores less than the score of interest, f is the frequency of
the score of interest, and N is the number of examinees in the sample. If the
distribution is normally distributed, the percentile rank can be inferred from the
standard score.
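A minimal sketch of this formula (hypothetical scores; `percentile_rank` is our own helper, not a library function):

```python
def percentile_rank(scores, score):
    """PR = (c + 0.5 * f) / N * 100: c counts scores strictly below the
    score of interest, f is the frequency of that score, N the sample size."""
    c = sum(1 for s in scores if s < score)
    f = scores.count(score)
    return (c + 0.5 * f) / len(scores) * 100

scores = [3, 5, 5, 6, 7, 8, 8, 8, 9, 10]
print(percentile_rank(scores, 8))  # c = 5, f = 3, N = 10 -> 65.0
```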
2.1
Introduction:
Tests play a central role in the evaluation of pupil learning. They provide
relevant measures of many important learning outcomes. Tests and other evaluation
instruments serve a variety of uses in the school; for example, tests of achievement
might be used for selection, placement, diagnosis or certification of mastery.
When constructing or selecting tests and other evaluation instruments, the
most important question is: to what extent will the interpretation of the scores be
appropriate, meaningful and useful for the intended application of the results? So
validity is always concerned with the specific use of results.
Factors Influencing Validity:
A careful examination of test items will indicate whether the test appears to
measure the subject-matter content and the mental functions that the teacher is
interested in testing. Following are the factors that prevent the test items from
functioning as intended and thereby lower the validity of the interpretation.
1.
2.
3.
4.
Unclear Direction:
Directions that do not clearly indicate to the pupil how to respond to the items
will reduce validity.
Reading Vocabulary and Sentence Structure too Difficult:
Vocabulary and sentence structure that is too complicated for the pupils will
distort the meaning of the test results.
Inappropriate level of difficulty:
Items that are too easy or too difficult also lower validity.
Poorly constructed items:
Test items that provide clues to the answers will measure the pupils' alertness
in detecting clues as well as those aspects of pupil performance that the test is
intended to measure.
5.
Ambiguity:
Ambiguous statements confuse the pupils and may cause items to discriminate in a
negative direction.
6.
Inadequate time limits:
Time limits that do not provide pupils with enough time to consider the items
will lower validity.
7.
8.
Test items are typically arranged in order of difficulty, with the easiest items first.
Placing difficult items early may cause pupils to spend too much time on them.
9.
Identifiable pattern of answers:
Placing correct answers in some systematic pattern will enable pupils to guess
the answers more easily.
Content Validity:
Content validity is evaluated by showing how well the content of the test
samples the class of situations. It is especially important in the case of
2.
3.
concurrent validity.
Predictive Validity:
It is evaluated by showing how well predictions made from the tests are
confirmed by evidence gathered at some subsequent time, as when the tester
wants to estimate how well a student may be able to do in college courses on
the basis of how well he has done on tests he took in secondary school.
4.
Construct Validity:
It is evaluated by investigating what psychological qualities a test measures. It
is ordinarily used when the tester has no definitive criterion measure of what
he is concerned with and hence must use indirect measures. This type of
validity is usually involved in such tests as those of study habits,
appreciations, understanding and interpretation of data.
Conclusion:
In short, we can say that validity is specific to the purpose and situation for
which a test is used. A test can be reliable without being valid, but the converse is not
true. In other words, it is conceivable that a test can measure some quality with a high
degree of consistency without measuring at all the quality it was actually intended to
measure.
2.2
i.
Content Validity:
Content validity is the degree to which a test measures an intended content
area. In other words the content validity of a test refers to the extent to which
the test content represents a specified universe of content.
For example: if a teacher taught a course in biology and would
like to give a test at the end of the course.
ii.
Construct Validity:
iv.
measure the ability to reason, and give two reasoning tests to his class.
Criterion related Validity:
This type of validity is used to predict future or
current performance, and it correlates the test results with another criterion of
interest (Cozby, 2001).
For Example: If for an educational program, measures are developed to
v.
intelligence test was unreliable, then a student scoring an IQ of 120 today might score
an IQ of 140 tomorrow and 95 the day after tomorrow. On the other hand, if the test
is reliable, then the IQ of a student will remain nearly the same each time the test is
administered. The reliability of a test depends upon the number of questions it
contains. A test will be more reliable if it possesses more questions. In this respect,
objective-type tests are more reliable because their sampling is more extensive.
We can take another expert opinion to understand the meaning of reliability.
If a clinical thermometer, on three successive determinations, yielded
readings of 97°, 103° and 99.6° for the same patient, it would not be considered very
reliable.
Reliability, of course, is a necessary but not a sufficient condition for using a
test. A highly reliable test may be totally invalid or may not measure anything that is
psychologically or educationally significant.
The reliability of a single test score is expressed quantitatively in
terms of the instrument's standard error of measurement. If the standard error of
measurement, for example, is 2.5, we can say that there are approximately two
chances in three (more precisely 68 in 100) that the true score falls between 72.5 and
77.5 when the obtained score is 75. By definition, an unreliable test cannot possibly
be valid. The necessary degree of reliability, however, depends on the use that is made
of the test scores.
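The worked example above (obtained score 75, standard error 2.5) can be reproduced in a short sketch. The SEM formula SD * sqrt(1 - reliability) is the standard one, though the text itself does not state it:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def score_band(obtained, sem_value):
    """One-SEM band around an obtained score: roughly 68-in-100 confidence
    that the true score lies inside it."""
    return (obtained - sem_value, obtained + sem_value)

print(score_band(75, 2.5))  # (72.5, 77.5), as in the example in the text
```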
Methods of Determining Reliability:
For determining reliability, it is necessary that the test should be valid, i.e. it
should measure what it is designed to measure. It should be administered to an
appropriate person or group of persons for whom the test has been developed.
Reliability is a statistical measure and therefore it can be computed by using different
statistical methods, which are described in detail below.
1.
Test-retest method:
When the stability of results over time is to be measured, the test-retest
method is used. In this method the same test is administered to the same group
of students at different periods of time. The scores obtained the first and
second time are then correlated in order to check the stability and consistency of
the test.
In test-retest reliability the time factor counts a lot: with very close retesting the
results are approximately the same, yielding a high correlation. But when the
retest is administered after a year or two, as a result of changes in the
characteristics of the students, there are expected to be large variations in the
outcome, and therefore stability will be low.
Limitation:
i.
ii.
iii.
The test-retest method is not an objective method of ascertaining the reliability of the
test.
2.
Equivalent forms method:
Equivalent forms are used with the same group and in close succession. The
results of both the tests are correlated. The correlation shows the degree to
which both tests are measuring the same content area. Sometimes the
equivalent forms are used with a time interval. Results obtained by this method
provide evidence of both the stability and the reliability of the test. This method is generally
considered to be the best method.
Limitation:
i.
ii.
This process is more time-consuming, and it is not free from carry-over
effects.
iii.
Moreover, establishment of reliability through this method is not feasible for
each and every type of test.
3.
Split-half method:
In this method the test is administered once and split into two halves; the
correlation between the half-test scores is then stepped up to estimate the
reliability of the full test by the Spearman-Brown formula:

Reliability of full test = (2 × reliability of half test) / (1 + reliability of half test)

Like the equivalent forms method, the split-half method helps in determining the
reliability of a test whose items are a representative sample of the content.
Limitation:
i.
ii.
The second criticism concerns item difficulty. Generally the
items of a test are arranged in order of difficulty, but this is not true for
each and every type of test. For example, without knowing the difficulty
level of the items, if one splits all the difficult items into one half and
the simple items into the other half, it will affect the reliability coefficient
adversely.
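The Spearman-Brown step-up used by the split-half method can be sketched as:

```python
def spearman_brown(half_test_reliability):
    """Estimate full-test reliability from the correlation between the two
    halves: r_full = (2 * r_half) / (1 + r_half)."""
    return (2 * half_test_reliability) / (1 + half_test_reliability)

# A half-test correlation of 0.6 steps up to 1.2 / 1.6 = 0.75
print(spearman_brown(0.6))
```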
4.
Kuder-Richardson formula
5.
6.
sets of instruments.
Inter - rater method of Reliability:
The measure of the extent to which different judges agree in their
decisions about an assessment is called the inter-rater method of reliability. Some
answers cannot be effectively interpreted except by human observers, and for that very
purpose inter-rater reliability is of utmost importance.
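A first approximation to inter-rater reliability is simple percent agreement between two judges (illustrative only; chance-corrected indices such as Cohen's kappa are preferred in practice):

```python
def percent_agreement(rater_a, rater_b):
    """Proportion of assessments on which two raters give the same verdict."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

rater_a = ["pass", "fail", "pass", "pass", "fail"]
rater_b = ["pass", "fail", "fail", "pass", "fail"]
print(percent_agreement(rater_a, rater_b))  # 4 of 5 verdicts agree -> 0.8
```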
2.4
Reliability:
The degree or the extent of the similarity among the results obtained on
several occasions; in other words, it can be defined as the degree to which an
assessment instrument elicits stable and consistent results.
Reliability means consistency of measurement. In the words of Ebel & Frisbie,
the ability of a test to measure the same quantity when it is administered to an
individual on two different occasions by two different testers is called reliability.
Reliability is also called dependability or trustworthiness. Reliability is the
degree to which a test consistently measures whatever it measures.
A very important factor influencing test reliability is the number of test items.
That is, the greater the number of items in a test, the more reliable the test.
2.
Other things being equal, the narrower the range of difficulty of the items of a
test, the greater the reliability.
3.
4.
Other things being equal, inter-dependent items tend to decrease the reliability
of a test.
5.
The more objective the scoring of a test the more reliable is the test.
6.
Chance in getting the correct answer to an item is a factor which lowers the
test's reliability.
7.
Other things being equal, the more homogeneous the material of a test, the
greater its reliability.
8.
Other things being equal, the more common the experiences called for in a test
are to the members of the group taking the test, the more reliable the test.
9.
Other things being equal, the same test given late in the school year (i.e. after
covering the unit in class) is more reliable than when given early in the
year (i.e. without teaching the unit).
10.
Other things being equal, each catch question in a test lowers the reliability of the test. A
test answered by the systematic recall or recognition of orderly facts or
experience is more reliable than a test answered by sudden insight, because of
novelty.
11.
Lengthy items lower the reliability because certain factors in the item will be
over- or under-estimated.
12.
13.
14.
15.
Difference in incentive and effort tend to make tests unreliable. The appeal of
a test is stronger with some individuals than with others, and is stronger with
an individual at one time than at another.
16.
17.
The interval between the test and retest is important for reliability estimate.
18.
19.
Illness, worry and excitement, though less important, still influence the
reliability of the test.
References
Katozai, Murad Ali (2013). Measurement and Evaluation (1st ed.).
Nooman, Mohammad & Obaid Ullah (2013). A Manual of Educational & Social
Science and Research Methodologies (1st ed.).
2.5
PRACTICALITY:
Meaning:
The word practicality means feasibility or usability.
A test will be practicable if it is easy to administer, easy to interpret and
economical in operation. A good test is one which has sufficiently simple
instructions so that it can be administered even by a person of low-level intelligence.
Tests having difficult instructions, requiring high-level training for administering
them, and too expensive for wide use in schools, are said to have low usability or
practicability. Practicality refers to the economy of time, effort and money in testing.
In other words, a test should be:
Easy to design
Easy to administer
Easy to interpret
Characteristics of Practicality:
There are many characteristics of practicality; they are:
1.
The test should be free from the drawbacks and limitations of both essay-type and
objective-type tests. It should have the merits and good points of both these
types of test. For this purpose a test should have both essay and objective type
questions so that it may cover, at the same time, the whole course as well
2.
3.
4.
5.
6.
students.
It should be suited to the social and economic conditions of the country.
There should be no choice in the given questions. Students should have to
answer all the questions. This will discourage selective study.
UNIT-3:
APPRAISING CLASSROOM TESTS (ITEMS ANALYSIS)
3.1
3.1.1
Item Analysis
Item analysis is a statistical technique which is used for selecting and rejecting the
items of a test on the basis of their difficulty value and discriminative power. Item
analysis is a general term that refers to the specific methods used in education to
evaluate test items, typically for the purpose of test construction and revision.
Regarded as one of the most important aspects of test construction and increasingly
receiving attention, it is an approach incorporated into item response theory (IRT),
which serves as an alternative to classical measurement theory (CMT) or classical test
theory (CTT). Classical measurement theory considers a score to be the direct result of
a person's true score plus error. It is this error that is of interest, as previous
measurement theories have been unable to specify its source. However, item response
theory uses item analysis to differentiate between types of error in order to gain a
clearer understanding of any existing deficiencies. Particular attention is given to individual
test items, item characteristics, the probability of answering items correctly, the overall
ability of the test taker, and the degrees or levels of knowledge being assessed.
Item analysis is concerned basically with two characteristics of an item: difficulty value and discriminative power.
Need of Item Analysis
Item analysis is a technique by which test items are selected and rejected.
The selection of items serves the purpose of the designer or test constructor
because the selected items have the desired characteristics. The following are the main purposes of
the test:
(a)
(b)
(c)
(d)
(e)
(f)
different characteristics. A selection or entrance test includes items of high
difficulty value as well as high power of discrimination. A promotion or prognostic
test has items of moderate difficulty value. There are various techniques of item
analysis which are used these days.
The Objectives of Item Analysis
The following are the main objectives of the item analysis technique:
(1)
To select the appropriate items for the final draft and reject the poor items
which do not contribute to the functioning of the test. Some items are to be modified.
(2)
Item analysis obtains the difficulty values of all the items of the preliminary draft
of the test. The items are classified as difficult, moderate and easy items.
(3)
(4)
(5)
(6)
It provides the basis for preparing the final draft of a test. In the final draft, items
are arranged in order of difficulty. The easiest items are given at the
beginning and the most difficult items are placed at the end.
(7)
Item analysis is a cyclic technique. The modified items are tried out and their
item analysis is done again to obtain the indexes (difficulty and
discrimination). Empirical evidence is thus obtained for selecting the
modified items for the final draft.
Item difficulty value (D. V.) is the proportion of subjects answering each item
correctly.
(2)
A test must be able to discriminate between high and poor students. In other
words, a test fulfils its purpose with maximum success when each item serves as a
good predictor. Therefore it is essential that each item of the test should be analysed in
terms of its difficulty value and discriminative power. Item
analysis serves the following purposes:
(1)
(2)
To select the best items for a test with regard to its purpose after a proper try-out
on a group of subjects selected from the target population.
(3)
To provide a statistical check on the characteristics of the test items for
the judgment of the test designer.
(4)
To set up parallel forms of a test. Parallel forms of a test should not only
have similar item content or types of items but should also have the
same difficulty value and discriminative power. The item analysis technique
provides the empirical basis on which an exactly parallel test can be developed.
(5)
To modify and reject the poor items of the test. The poor items may not serve
the purpose of the test. The distractors of items are also examined, and poor
distracters are changed.
(6)
Item analysis is usually done on a power test rather than a speed test. In a speed
test all the items are of the same difficulty value. The purpose of a speed test is
to measure speed and accuracy, where speed is acquired through practice.
There is no pure power test, because a time limit is imposed; therefore most
tests are somewhat speeded. The speededness of a test depends on the difficulty values of
its items. Most of the students should reach the last items in the
time allotted for the test. Item analysis is the study of the statistical properties
of test items. The qualities usually of interest are the difficulty of the item and
its ability or power to differentiate between more capable and less capable
examinees, based on actually knowing the correct answer to an item rather than merely answering it
correctly.
In the procedure of item analysis, a correction-for-guessing formula is applied, so that the analysis is based on corrected scores rather than raw right answers. The difficulty value may also be obtained in terms of standard scores or z-scores.
Methods or Techniques of item Analysis
A recent review of the literature on item analysis indicates that there are at least twenty-three different techniques of item analysis. As discussed above, item analysis techniques obtain indexes for the characteristics of an item. The following two methods of item analysis are the most popular and widely used:
1) Davis method of item analysis. It is used for prognostic test items.
2) Stanley method of item analysis. It is used for diagnostic test items. The wrong responses are considered in obtaining the difficulty value and discriminative power, because the wrong responses reveal the causes of students' weaknesses. The proportion of wrong responses on an item is used for this purpose.
There are separate techniques for obtaining difficulty value and discriminative power
of the items.
(a) Techniques of Difficulty Value:
a1. Proportion of right responses on an item. Davis and Haper have also used this technique.
a2. Standard scores or z-scores on the normal probability curve.
(b) Techniques of Discriminative Power:
b1. Proportion of right responses on an item. Davis and Haper have used this technique.
3.2
Many techniques of item analysis have been devised to obtain the difficulty value and discrimination index of an item of a test. It is not possible to describe all the techniques of item analysis in this chapter; therefore, the most popular and widely used techniques are discussed:
Fredrick B. Davis method of Item Analysis of Prognostic test, and
Stanley method of Item Analysis of Diagnostic test.
"The item difficulty value may be defined as the proportion or percentage of a certain sample of subjects that actually know the answer of an item."
-- Frank S. Freeman
The difficulty value depends on actually knowing the answer rather than merely answering correctly (i.e., giving a right response). In an objective-type test, items may be answered correctly by guessing rather than by actually knowing the answer; that is, an item may be answered without knowing its answer. Thus, a correction for guessing is used to obtain scores that reflect the actual correct responses.
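The correction for guessing mentioned here can be sketched numerically. The text does not state which formula it intends; the function below uses the conventional one, S = R - W/(k - 1), where R is the number of right answers, W the number of wrong answers, and k the number of options per item, so treat it as an illustrative assumption rather than the author's own method.

```python
def corrected_score(rights, wrongs, n_options):
    """Conventional correction-for-guessing formula: S = R - W / (k - 1).

    Omitted items are not counted as wrong. This is the standard textbook
    formula; the source does not name a specific variant.
    """
    return rights - wrongs / (n_options - 1)

# A student answers 30 of 40 five-option items correctly and gets 10 wrong.
score = corrected_score(rights=30, wrongs=10, n_options=5)
print(score)  # 27.5
```

The corrected scores, rather than raw right answers, would then feed into the difficulty calculation.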
It is important to note that in the procedure of item analysis, item-wise scoring is done, whereas subject-wise scoring is done in general. Several formulas have been proposed for the correction for guessing.
3.2
Evaluating students is a skill, in part because it presents so many choices. Decisions must be made concerning the method, format, timing, and duration of the evaluative procedures. Once designed,
the evaluative procedure must be administered and then scored, interpreted, and
graded. Afterwards, feedback must be presented to students. Accomplishing these
tasks demands a broad range of cognitive, technical, and interpersonal resources on
the part of faculty. But an even more critical task remains, one that perhaps too few
faculty undertake with sufficient skill and tenacity: investigating the quality of the
evaluative procedure.
Even after an exam, how do we know whether that exam was a good one? It is
obvious that any exam can only be as good as the items it comprises, but then what
constitutes a good exam item? Our students seem to know, or at least believe they
know. But are they correct when they claim that an item was too difficult, too tricky,
or too unfair?
Lewis Aiken (1997), the author of a leading textbook on the subject of
psychological and educational assessment, contends that a "postmortem" evaluation is
just as necessary in classroom testing as it is in medicine. Indeed, just such a
postmortem procedure for exams exists--item analysis, a group of procedures for
assessing the quality of exam items. The purpose of an item analysis is to improve the
quality of an exam by identifying items that are candidates for retention, revision, or
removal. More specifically, not only can the item analysis identify both good and
deficient items, it can also clarify what concepts the examinees have and have not
mastered.
So, what procedures are involved in an item analysis? The specific procedures
involved vary, but generally, they fall into one of two broad categories: qualitative and
quantitative.
The item difficulty index p ranges between 0.00, obtained when no examinees answered the item correctly, and 1.00, obtained when all examinees answered the item correctly. Notice that no test item has just one p value. Not only may the p value vary with each class group that takes the test, an
instructor may gain insight by computing the item difficulty level for a number of
different subgroups within a class, such as those who did well on the exam overall and
those who performed more poorly.
Although the computation of the item difficulty index p is quite straightforward, the interpretation of this statistic is not. To illustrate, consider an item
with a difficulty level of 0.20. We do know that 20% of the examinees answered the
item correctly, but we cannot be certain why they did so. Does this item difficulty
level mean that the item was challenging for all but the best prepared of the
examinees? Does it mean that the instructor failed in his or her attempt to teach the
concept assessed by the item? Does it mean that the students failed to learn the
material? Does it mean that the item was poorly written? To answer these questions,
we must rely on other item analysis procedures, both qualitative and quantitative ones.
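The index p itself is just the proportion correct, and, as suggested above, it can be computed separately for subgroups such as high and low overall scorers. A minimal sketch (the response data are hypothetical):

```python
def p_value(responses):
    """Item difficulty index p: the proportion of examinees answering
    the item correctly. `responses` holds 1 (correct) / 0 (incorrect)."""
    return sum(responses) / len(responses)

# Hypothetical responses of ten examinees to one item.
item = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(p_value(item))       # 0.2 -> only 20% answered correctly
# The same statistic for a subgroup, e.g. the first five examinees:
print(p_value(item[:5]))   # 0.4
```

Comparing p across subgroups is one quantitative way to begin answering the questions raised in the paragraph above.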
Item Discrimination Index (D)
Item discrimination analysis deals with the fact that often different test takers
will answer a test item in different ways. As such, it addresses questions of
considerable interest to most faculty, such as, "does the test item differentiate those
who did well on the exam overall from those who did not?" or "does the test item
differentiate those who know the material from those who do not?" In a more
technical sense then, item discrimination analysis addresses the validity of the items
on a test, that is, the extent to which the items tap the attributes they were intended to
assess. As with item difficulty, item discrimination analysis involves a family of
techniques. Which one to use depends on the type of testing situation and the nature
of the items. I'm going to look at only one of those, the item discrimination index,
symbolized D. The index parallels the difficulty index in that it can be used whenever items are scored as correct or incorrect. Computing D involves the following steps:
1. Divide the group of test takers into two groups, high scoring and low scoring. Ordinarily, this is done by dividing the examinees into those scoring above and those scoring below the median. (Alternatively, one could create groups from the top and bottom thirds of scorers.)
2. For each group, compute the proportion of examinees who answered the item correctly (P upper and P lower).
3. Subtract the lower group's proportion from the upper group's: D = P upper - P lower.
Unlike the difficulty level p, the item discrimination index can take on negative values and can range between -1.00 and 1.00. Consider the following situation: suppose that overall, half of the examinees answered a particular item correctly, that all of the examinees who scored above the median on the exam answered the item correctly, and that all of the examinees who scored below the median answered incorrectly. In such a situation P upper = 1.00 and P lower = 0.00. As such, the value of the item discrimination index D is 1.00 and the item is said to be a perfect positive discriminator. Many would regard this outcome as ideal: it suggests that those who knew the material and were well-prepared passed the item while all others failed it.
Though it's not as unlikely as winning a million-dollar lottery, finding a
perfect positive discriminator on an exam is relatively rare. Most psychometricians
would say that items yielding positive discrimination index values of 0.30 and above
are quite good discriminators and worthy of retention for future exams.
Finally, notice that the difficulty and discrimination are not independent. If all
the students in both the upper and lower levels either pass or fail an item, there's
nothing in the data to indicate whether the item itself was good or not. Indeed, the
value of the item discrimination index will be maximized when only half of the test
takers overall answer an item correctly; that is, when p = 0.50. Once again, the ideal
situation is one in which the half who passed the item were students who all did well
on the exam overall.
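The median-split procedure described above can be sketched directly. The scores below are hypothetical, chosen to produce the perfect positive discriminator discussed in the text:

```python
def discrimination_index(item, totals):
    """D = P_upper - P_lower, using a median split on total scores.

    `item`  : 1/0 correctness on one item, per examinee.
    `totals`: total exam score, per examinee (same order).
    """
    ranked = sorted(zip(totals, item), key=lambda pair: pair[0], reverse=True)
    half = len(ranked) // 2
    upper = [correct for _, correct in ranked[:half]]
    lower = [correct for _, correct in ranked[-half:]]
    return sum(upper) / len(upper) - sum(lower) / len(lower)

# All high scorers pass the item, all low scorers fail it:
item   = [1, 1, 1, 0, 0, 0]
totals = [95, 90, 85, 60, 55, 50]
print(discrimination_index(item, totals))  # 1.0 -> perfect positive discriminator
```

With real data, D values near the 0.30 threshold mentioned below are far more typical than 1.00.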
Does this mean that it is never appropriate to retain items on an exam that are
passed by all examinees, or by none of the examinees? Not at all. There are many
reasons to include at least some such items. Very easy items can reflect the fact that
some relatively straightforward concepts were taught well and mastered by all
students. Similarly, an instructor may choose to include some very difficult items on
an exam to challenge even the best-prepared students. The instructor should simply be
aware that neither of these types of items functions well to make discriminations
among those taking the test.
[material omitted...]
Conclusion
To those concerned about the prospect of extra work involved in item analysis,
take heart: item difficulty and discrimination analysis programs are often included in
the software used in processing exams answered on Scantron or other optically
scannable forms. As such, these analyses can often be performed for you by personnel
in your computer services office. You might consider enlisting the aid of your
departmental student assistants to help with item distractor analysis, thus providing
them with an excellent learning experience. In any case, an item analysis can certainly help determine whether or not the items on your exams were good ones, and which items to retain, revise, or replace.
Understanding Item Analysis Reports
Item analysis is a process which examines student responses to individual test
items (questions) in order to assess the quality of those items and of the test as a
whole. Item analysis is especially valuable in improving items which will be used
again in later tests, but it can also be used to eliminate ambiguous or misleading items
in a single test administration. In addition, item analysis is valuable for increasing
instructors' skills in test construction, and identifying specific areas of course content
which need greater emphasis or clarity. Separate item analyses can be requested for each raw score created during a given ScorePak run.
A basic assumption made by ScorePak is that the test under analysis is
composed of items measuring a single subject area or underlying ability. The quality
of the test as a whole is assessed by estimating its "internal consistency." The quality
of individual items is assessed by comparing students' item responses to their total test
scores.
Following is a description of the various statistics provided on a ScorePak
item analysis report. This report has two parts. The first part assesses the items which
made up the exam. The second part shows statistics summarizing the performance of
the test as a whole.
Item Statistics
Item statistics are used to assess the performance of individual test items on
the assumption that the overall quality of a test derives from the quality of its items.
The ScorePak item analysis report provides the following item information:
Item Number
This is the question number taken from the student answer sheet, and the
ScorePak Key Sheet. Up to 150 items can be scored on the Standard Answer Sheet.
Item Mean
The item mean is obtained by adding up the number of points earned by all students on the item, and dividing that total by the number of students.
The standard deviation, or S.D., is a measure of the dispersion of student
scores on that item. That is, it indicates how "spread out" the responses were. The
item standard deviation is most meaningful when comparing items which have more
than one correct alternative and when scale scoring is used. For this reason it is not
typically used to evaluate classroom tests.
Item Difficulty
For items with one correct alternative worth a single point, the item difficulty
is simply the percentage of students who answer an item correctly. In this case, it is
also equal to the item mean. The item difficulty index ranges from 0 to 100; the higher
the value, the easier the question. When an alternative is worth other than a single
point, or when there is more than one correct alternative per question, the item
difficulty is the average score on that item divided by the highest number of points for
any one alternative. Item difficulty is relevant for determining whether students have
learned the concept being tested. It also plays an important role in the ability of an
item to discriminate between students who know the tested material and those who do
not. The item will have low discrimination if it is so difficult that almost everyone
gets it wrong or guesses, or so easy that almost everyone gets it right.
To maximize item discrimination, desirable difficulty levels are slightly higher
than midway between chance and perfect scores for the item. (The chance score for
five-option questions, for example, is 20 because one-fifth of the students responding
to the question could be expected to choose the correct option by guessing.) Ideal
difficulty levels for multiple-choice items in terms of discrimination potential are:
Format                           Ideal Difficulty
Five-response multiple-choice    70
Four-response multiple-choice    74
Three-response multiple-choice   77
True-false (two-response)        85
(from Lord, F.M. "The Relationship of the Reliability of Multiple-Choice Test to the
Distribution of Item Difficulties," Psychometrika, 1952, 18, 181-194.)
ScorePak arbitrarily classifies item difficulty as "easy" if the index is 85% or
above; "moderate" if it is between 51 and 84%; and "hard" if it is 50% or below.
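ScorePak's three difficulty bands can be expressed as a small helper. This is a sketch of the classification just described, not ScorePak's actual code:

```python
def classify_difficulty(index):
    """ScorePak-style bands for the difficulty index (percent correct, 0-100):
    "easy" at 85 or above, "moderate" from 51 to 84, "hard" at 50 or below."""
    if index >= 85:
        return "easy"
    if index >= 51:
        return "moderate"
    return "hard"

print(classify_difficulty(90))  # easy
print(classify_difficulty(70))  # moderate
print(classify_difficulty(40))  # hard
```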
Item Discrimination
Item discrimination refers to the ability of an item to differentiate among
students on the basis of how well they know the material being tested. Various hand
calculation procedures have traditionally been used to compare item responses to total
test scores using high and low scoring groups of students. Computerized analyses
provide more accurate assessment of the discrimination power of items because they
take into account responses of all students rather than just high and low scoring
groups.
The item discrimination index provided by ScorePak is a Pearson product-moment correlation[2] between student responses to a particular item and total scores on all other items on the test. This index is the equivalent of a point-biserial
coefficient in this application. It provides an estimate of the degree to which an
individual item is measuring the same thing as the rest of the items.
Because the discrimination index reflects the degree to which an item and the
test as a whole are measuring a unitary ability or attribute, values of the coefficient
will tend to be lower for tests measuring a wide range of content areas than for more
homogeneous tests. Item discrimination indices must always be interpreted in the
context of the type of test which is being analyzed. Items with low discrimination
indices are often ambiguously worded and should be examined. Items with negative
indices should be examined to determine why a negative value was obtained. For
example, a negative value may indicate that the item was mis-keyed, so that students
who knew the material tended to choose an unkeyed, but correct, response option.
Tests with high internal consistency consist of items with mostly positive
relationships with total test score. In practice, values of the discrimination index will
seldom exceed .50 because of the differing shapes of item and total score
distributions. ScorePak classifies item discrimination as "good" if the index is above
.30; "fair" if it is between .10 and .30; and "poor" if it is below .10.
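The item-versus-rest correlation described above can be sketched in plain Python. The data are hypothetical, and ScorePak's own computation may differ in detail; this only illustrates the idea of correlating 1/0 item responses with total score minus that item:

```python
def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def item_discrimination(item, total_scores):
    """Correlate 1/0 responses to one item with the 'rest' score
    (total score minus that item), as the report describes."""
    rest = [t - i for t, i in zip(total_scores, item)]
    return pearson(item, rest)

# Hypothetical responses and total scores for eight examinees.
item   = [1, 1, 1, 0, 1, 0, 0, 0]
totals = [9, 8, 8, 6, 7, 5, 4, 3]
print(round(item_discrimination(item, totals), 2))
```

A value above .30 would fall into ScorePak's "good" band.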
Alternate Weight
This column shows the number of points given for each response alternative.
For most tests, there will be one correct answer which will be given one point, but
ScorePak allows multiple correct alternatives, each of which may be assigned a
different weight.
Means
The mean total test score (minus that item) is shown for students who selected each of the possible response alternatives. The number and percentage of students choosing each alternative are also reported. The bar graph on the right shows the percentage choosing each response;
each "#" represents approximately 2.5%. Frequently chosen wrong alternatives may indicate a common misconception among the students.
Difficulty and Discrimination Distributions
At the end of the Item Analysis report, test items are listed according to their degrees of difficulty (easy, medium, hard) and discrimination (good, fair, poor). These
distributions provide a quick overview of the test, and can be used to identify items
which are not performing well and which can perhaps be improved or discarded.
Test Statistics
Two statistics are provided to evaluate the performance of the test as a whole.
Reliability Coefficient
The reliability of a test refers to the extent to which the test is likely to produce
consistent scores. The particular reliability coefficient computed by ScorePak
reflects three characteristics of the test:
The intercorrelations among the items -- the greater the relative number of
positive relationships, and the stronger those relationships are, the greater the
reliability. Item discrimination indices and the test's reliability coefficient are related in this regard.
Reliability coefficients are commonly reported in bands such as .80 - .90, .70 - .80, .60 - .70, .50 - .60, and .50 or below; the higher the coefficient, the more consistent the scores.
Whereas the reliability of a test always varies between 0.00 and 1.00, the
standard error of measurement is expressed in the same scale as the test scores. For
example, multiplying all test scores by a constant will multiply the standard error of
measurement by that same constant, but will leave the reliability coefficient
unchanged.
A general rule of thumb to predict the amount of change which can be
expected in individual test scores is to multiply the standard error of measurement by
1.5. Only rarely would one expect a student's score to increase or decrease by more
than that amount between two such similar tests. The smaller the standard error of
measurement, the more accurate the measurement provided by the test.
(Further discussion of the standard error of measurement can be found in J. C. Nunnally, Psychometric Theory. New York: McGraw-Hill, 1967, pp. 172-235; see especially formula 6-34, p. 201.)
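The report does not reproduce the formula, but the standard error of measurement is conventionally computed as SEM = SD x sqrt(1 - reliability). A sketch of that conventional formula together with the 1.5 x SEM rule of thumb above (the SD and reliability values are hypothetical):

```python
import math

def sem(sd, reliability):
    """Conventional standard error of measurement: SD * sqrt(1 - r).
    (This is the standard textbook formula; the report itself does not
    state which formula its software uses.)"""
    return sd * math.sqrt(1 - reliability)

standard_error = sem(sd=10, reliability=0.91)   # 10 * sqrt(0.09) = 3.0
print(round(standard_error, 2))                 # 3.0
# Rule of thumb from the text: individual scores rarely shift by more
# than 1.5 * SEM between two similar tests.
print(round(1.5 * standard_error, 2))           # 4.5
```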
A Caution in Interpreting Item Analysis Results
Each of the various item statistics provided by ScorePak provides
information which can be used to improve individual test items and to increase the
quality of the test as a whole. Such statistics must always be interpreted in the context
of the type of test given and the individuals being tested. W. A. Mehrens and I. J.
Lehmann provide the following set of cautions in using item analysis results
(Measurement and Evaluation in Education and Psychology. New York: Holt,
Rinehart and Winston, 1973, 333-334):
Item analysis data are not synonymous with item validity. An external criterion is required to accurately judge the validity of test items. By using the internal criterion of total test score, item analyses reflect internal consistency of items rather than validity.
The discrimination index is not always a measure of item quality. There is a variety of reasons an item may have low discriminating power:
a) extremely difficult or easy items will have low ability to discriminate but such
items are often needed to adequately sample course content and objectives;
b) an item may show low discrimination if the test measures many different content areas and cognitive skills. For example, if the majority of the test measures "knowledge of facts," then an item assessing "ability to apply principles" may have a low correlation with total test score, yet both types of items may be needed to adequately sample the course content and objectives.
Note 1: Item analyses are performed on raw scores created by scoring student answer sheets against a ScorePak Key Sheet. Raw score names are EXAM1 through EXAM9, QUIZ1 through QUIZ9, MIDTRM1 through MIDTRM3, and FINAL. ScorePak cannot analyze scores taken from the bonus section of student answer sheets or computed from other scores, because such scores are not derived from individual items which can be accessed by ScorePak. Furthermore, separate analyses must be requested for different versions of the same exam.
Note 2: A correlation is a statistic which indexes the degree of linear relationship between two variables. If the value of one variable is related to the value of another, they are said to be "correlated." In positive relationships, the value of one variable tends to be high when the value of the other is high, and low when the other is low. In negative relationships, the value of one variable tends to be high when the other is low, and vice versa. The possible values of correlation coefficients range from -1.00 to 1.00. The strength of the relationship is shown by the absolute value of the coefficient (that is, how large the number is, whether positive or negative); the sign indicates the direction of the relationship (positive or negative).
1. Assemble or write a relatively large number of items of the type you want on the test.
2. Analyze the items carefully using item format analysis to make sure the items are well written and clear (for guidelines, see Brown, 1996, 1999; Brown & Hudson, 2002).
3. Pilot the items using a group of students similar to the group that will ultimately be taking the test. Under less than ideal conditions, this pilot testing may not be possible.
4. Analyze the piloted items statistically (item analysis).
5. Select the best-performing items for inclusion in the revised test.
Basically, those five steps are followed in any test development or revision project.
Item Analysis Statistics for Norm-Referenced Tests
As indicated above, the fourth step, item analysis, is different for NRTs and
CRTs, and in this column, I will only explain item analysis statistics as they apply to
NRTs. The basic purpose of any NRT is to spread students out along a general
continuum of language abilities, usually for purposes of making aptitude, proficiency,
or placement decisions (for much more on this topic, see Brown, 1996, 1999; Brown
& Hudson, 2002). Two item statistics are typically used in the item analysis of such
norm-referenced tests: item facility and item discrimination.
Item facility (IF) is defined here as the proportion of students who answered a particular item correctly. Thus, if 45 out of 50 students answered a particular item correctly, the proportion would be 45/50 = .90. An IF of .90 means that 90% of the students answered the item correctly, and by extension, that the item is very easy. In Screen 1, you will see one way to calculate IF using the Excel spreadsheet for item 1 (I1) in a small example data set coded 1 for correct and 0 for incorrect answers. Notice the cursor has outlined cell C21 and that the function/formula typed in that cell (shown both in the row above the column labels and in cell B21) is =AVERAGE(C2:C19), which means average the ones and zeros in the range between cells C2 and C19. The result in this case is .94, a very easy item because 94% of the students are answering correctly.
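The spreadsheet calculation above, averaging a column of ones and zeros, is equivalent to this sketch (the responses are hypothetical, built to match the 45-out-of-50 example in the text):

```python
def item_facility(responses):
    """IF: proportion of students answering the item correctly; simply the
    mean of the 1/0 codes, just like =AVERAGE(...) over the item's column."""
    return sum(responses) / len(responses)

# 45 of 50 students answered correctly, as in the text's example:
responses = [1] * 45 + [0] * 5
print(item_facility(responses))  # 0.9
```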
All the other NRT and CRT item analysis techniques that I will discuss here
and in the next column are based on this notion of item facility. For instance, item
discrimination can be calculated by first figuring out who the upper and lower
students are on the test (using their total scores to sort them from the highest score to
the lowest). The upper and lower groups should probably be made up of equal
numbers of students who represent approximately one third of the total group each. In
Screen 1, I have sorted the students from high to low based on their total test scores
from 77 for Hide down to 61 for Hachiko. Then I separated the three groups such that
there are five in the top group, five in the bottom group, and six in the middle group.
Notice that Issaku and Naoyo both had scores of 68 but ended up in different groups
(as did Eriko and Kimi with their scores of 70). The decision as to which group they
were assigned to was made with a coin flip.
To calculate item discrimination (ID), I started by calculating IF for the upper group using the following: =AVERAGE(C2:C6), as shown in row 22. Then, I calculated IF for the lower group using the following: =AVERAGE(C15:C19), as shown in row 23. With IF(upper) and IF(lower) in hand, calculating ID simply required subtracting IF(lower) from IF(upper). I did this by subtracting C23 from C22, or =C22-C23, as shown in row 24, which resulted in an ID of .20 for I1.
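The same upper-third/lower-third procedure can be sketched outside Excel. The scores below are hypothetical, and with real data, ties at the group boundary would still need a tie-breaking rule such as the coin flip mentioned above:

```python
def item_discrimination(item, totals):
    """ID = IF(upper) - IF(lower), using the top and bottom thirds of
    examinees ranked by total test score."""
    ranked = sorted(zip(totals, item), key=lambda pair: pair[0], reverse=True)
    third = len(ranked) // 3
    upper = [correct for _, correct in ranked[:third]]
    lower = [correct for _, correct in ranked[-third:]]
    return sum(upper) / third - sum(lower) / third

# Nine hypothetical examinees: item correctness and total scores.
item   = [1, 1, 1, 1, 0, 1, 0, 0, 1]
totals = [77, 75, 72, 70, 68, 66, 64, 62, 61]
print(round(item_discrimination(item, totals), 2))
```

Here the top three examinees all pass the item while only one of the bottom three does, so ID is positive.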
Once I had calculated the four item analysis statistics shown in Screen 1 for I1, I then simply copied them and pasted them into the spaces below the other items, which resulted in all the other item statistics you see in Screen 1. [Note that the statistics didn't always fit in the available spaces, so I got results that looked like ### in some cells; to fix that, I blocked out all the statistics and typed Alt, O, C, A, which adjusted the column widths to fit the statistics. You may also want to adjust the number of decimal places, which is beyond the scope of this article. You can learn about this by looking in the Help menu or in the Excel manual.]
Ideal items in an NRT should have an average IF of .50. Such items would be well centered, i.e., 50 percent of the students would have answered correctly and, by extension, 50 percent would have answered incorrectly. In reality, however, items rarely have an IF of exactly .50, so those that fall in a range between .30 and .70 are usually considered acceptable for NRT purposes.
Once those items that fall within the .30 to .70 range of IFs are identified, the
items among them that have the highest IDs should be further selected for inclusion in
the revised test. This process would help the test designer to keep only those items
that are well centered and discriminate well between the high and the low scoring students.
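The selection rule just described, keep items with IF between .30 and .70, then prefer the highest IDs, can be sketched as follows. The item names and (IF, ID) statistics below are hypothetical:

```python
# Hypothetical (IF, ID) statistics for five items.
stats = {"I1": (0.94, 0.20), "I2": (0.55, 0.45),
         "I3": (0.40, 0.10), "I4": (0.65, 0.38), "I5": (0.25, 0.30)}

# Step 1: keep only well-centered items (.30 <= IF <= .70).
centered = {name: s for name, s in stats.items() if 0.30 <= s[0] <= 0.70}

# Step 2: among those, rank by ID, best discriminators first.
keep = sorted(centered, key=lambda name: centered[name][1], reverse=True)
print(keep)  # ['I2', 'I4', 'I3']
```

Item I1 is dropped for being far too easy despite its modest ID, and I5 for being too hard, exactly the logic described in the text.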
ITEM DIFFICULTY:
Definition:
Item difficulty is a measure of the proportion of individuals who responded correctly to each test item. The item difficulty of a test item is determined by the proportion of individuals who correctly respond to that particular item.
For example, suppose Question #1 was answered correctly by 24 of 30 students, while Question #2 was answered correctly by only 12 of 30 students, with 13 choosing distracter B.

For Question #1:
P = 24/30 = .80

A rough rule of thumb is that if the item difficulty is more than .75, it is an easy item; if the difficulty is below .25, it is a difficult item. Given these parameters, this item could be regarded as moderately easy, since 80% of students got it correct.
For Question #2:
P = 12/30 = .40
In fact, on Question #2, more students selected an incorrect answer (B) than selected the correct answer (A). This item should be carefully analyzed to ensure that B is an appropriate distracter.
The item difficulty index might therefore better have been named "item easiness": it expresses the proportion or percentage of students who answered the item correctly.
3.4
Introduction
1. Item discrimination is the degree to which students with high overall exam scores also got a particular item correct. It is often referred to as Item Effect, since it is an index of an item's effectiveness at discriminating those who know the content from those who do not.
2. The item discrimination index is a point-biserial correlation coefficient. Its possible range is -1.00 to 1.00. A strong, positive correlation suggests that students who get any one question correct also have a relatively high score on the overall exam. Theoretically, this makes sense: the students who know the content and who perform well on the test overall should be the ones who get a given item correct. There is a problem if students are getting correct answers on a test when they do not know the content.
Measurement of Index of Discrimination
Example 1: If we are using the Item Analysis provided by Scanning Operations, discrimination indices are listed under the column headed DISC.
RESPONSE TABLE - FORM A

ITEM NO   OMIT %   A %   B %   C %   D %   E %   KEY   %    DISC
1         0        0     18    82    0     0     C     82    0.22
2         0        79    0     0     21    0     A     79    0.23
3         0        4     7     89    0     0     C     89   -0.12
1. Item difficulty: Very easy or very difficult items are not good discriminators. If an item is so easy (e.g., difficulty = 98) that nearly everyone gets it correct, it cannot discriminate well.
That does not mean that very easy and very difficult items should be eliminated. In fact, they are fine as long as they are used with the instructor's recognition that they will not discriminate well, and if putting them on the test matches the intention of the instructor either to really challenge students or to confirm that straightforward material was mastered.
Example 2
Another measure, the Discrimination Index, refers to how well an assessment
differentiates between high and low scorers. In other words, you should be able to
expect that the high-performing students would select the correct answer for each
question more often than the low-performing students. If this is true, then the
assessment is said to have a positive discrimination index (between 0 and 1), indicating that students who received a high total score chose the correct answer for a specific item more often than the students who had a lower overall score.
If, however, you find that more of the low-performing students got a specific
item correct, then the item has a negative discrimination index (between -1 and 0).
Let's look at an example.
Table 1 displays the results of ten questions on a quiz. Note that the students are arranged with the top overall scorers at the top of the table.
Table-1:
Student    Total Score
Asif       90
Sam        90
Jill       80
Charlie    80
Sonya      70
Ruben      60
Clay       60
Kelley     50
Justin     50
Tonya      40
1. After the students are arranged with the highest overall scores at the top, count the number of students in the upper and lower group who got each item correct. For Question #1, there were 4 students in the top half and 4 students in the bottom half who got it correct.
2. Determine the Difficulty Index (p) by dividing the number who got the item correct by the total number of students. For Question #1, this would be 8/10, or p = .80.
3. Determine the Discrimination Index (D) by subtracting the number of students in the lower group who got the item correct from the number of students in the upper group who got the item correct. Then, divide by the number of students in each group (in this case, there are five in each group). For Question #1, that means you would subtract 4 from 4, and divide by 5, which results in a Discrimination Index of 0.
The answers for Questions 1-3 are provided in Table 2.

Table-2:
Item         # Correct (upper group)   # Correct (lower group)   Difficulty (p)   Discrimination (D)
Question 1   4                         4                         .80              0.0
Question 2   0                         3                         .30              -0.6
Question 3   5                         1                         .60              0.8
In table 2 we can see that Question #2 had a difficulty index of .30 (meaning it
was quite difficult), and it also had a negative discrimination index of -0.6 (meaning
that the low-performing students were more likely to get this item correct). This
question should be carefully analyzed, and probably deleted or changed. Our "best"
overall question is Question 3, which had a moderate difficulty level (.60), and
discriminated extremely well (0.8).
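Working backwards from the stated p and D values (with ten students overall and five per group), the per-group correct counts must be 4/4 for Question 1, 0/3 for Question 2, and 5/1 for Question 3. The code below checks that arithmetic using the two formulas from the steps above:

```python
def difficulty(upper_correct, lower_correct, n_students=10):
    """p: total number correct divided by total number of students."""
    return (upper_correct + lower_correct) / n_students

def discrimination(upper_correct, lower_correct, group_size=5):
    """D: (upper-group correct - lower-group correct) / group size."""
    return (upper_correct - lower_correct) / group_size

counts = {"Question 1": (4, 4), "Question 2": (0, 3), "Question 3": (5, 1)}
for name, (upper, lower) in counts.items():
    print(name, difficulty(upper, lower), discrimination(upper, lower))
# Question 1 0.8 0.0 / Question 2 0.3 -0.6 / Question 3 0.6 0.8
```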
Recommendations for Determining Index of Discrimination
It is typically recommended that item discrimination be at least .20. It's best to
aim even higher. Items with a negative discrimination are theoretically indicating that
either the students who performed poorly on the test overall got the question correct
or that students with high overall test performance did not get the item correct. Thus,
the index could signal a number of problems, such as a mis-keyed answer or an ambiguously worded question, that need to be addressed. Items with discrimination indices less than .20 (or slightly over, but still relatively low) must be revised or eliminated. Be certain that there is only one
possible answer, that the question is written clearly, and that your answer key is
correct.
UNIT-4:
INTERPRETING THE TEST SCORES
4.1 The Percentage Correct Score:

Grade    Class A (N)   Class A (%)   Class B (N)   Class B (%)
A        10            12.50         8             40
B        25            31.25         6             30
C        30            37.50         4             20
D        15            18.75         2             10
Total    80            100.00        20            100
Ten students from class A and eight students from class B got grade A. It looks, apparently, as if class A is better at getting grade A, but 12.5% of the students from class A and 40% of the students from class B got grade A. It is clear from the percentages that class B is far better at getting grade A than class A.
Interpreting the Scores by Norm Referencing
Interpretation of scores by norm referencing involves ranking the scores and expressing a given score in relation to the other scores. Norm-referenced test interpretation tells us how an individual compares with other persons who have taken the same test. The simplest type of comparison is to rank the scores from highest to lowest and to note where an individual's score falls. The rest of the scores serve as the norm group, and the given score is compared with the other scores by norm referencing. If a student's score is second from the top in a group of 20 students, it is a high score, meaning that the scores of 90% of the students are less than his.
Ordering and Ranking
indicate a total absence of the quantity being measured. We measure height in meters
or feet, weight in kilograms or pounds, temperature in centigrade or Fahrenheit,
income in rupees and time in seconds. Arithmetic operations can be done on this
scale: you can add the income of a wife to that of her husband.
1.
Determine the range. The range is the difference between the highest and
lowest scores.
2.
Decide the appropriate number of class intervals. There is no hard and fast
formula for deciding the number of class intervals; it is usually taken
between 5 and 20 depending on the length of the data.
3.
Determine the approximate length of the class interval by dividing the range
by the number of class intervals.
4.
Determine the limits of the class intervals, taking the smallest scores at the
bottom of the column and the largest scores at the top.
5.
Determine the number of scores falling in each class interval. This is done by
using a tally or score sheet.
Example:
The marks obtained by 120 students of first year class in the subject of
Education are given below-Construct a frequency distribution.
57 86 69 62 75 73 80 78 87 83
77 35 70 68 84 73 81 78 61 72
59 98 95 63 76 73 88 60 52 83
86 45 70 53 85 74 62 78 89 84
60 79 91 64 84 85 81 79 90 78
83 50 71 65 76 58 71 79 51 61
61 89 81 74 76 74 82 91 71 76
80 52 71 66 77 65 44 79 95 74
79 63 83 87 77 75 83 48 70 85
61 70 72 67 61 83 75 79 97 75
66 54 81 68 78 75 83 61 33 76
62 55 72 76 78 75 99 80 83 86
1.
Range = 99 - 33 = 66.
2.
Take the number of class intervals as 7.
3.
I = Range / No. of class intervals = 66 / 7 = 9.4
The length is usually rounded upward to a whole number. Therefore 9.4 is taken
as 10.
4.
The class intervals are:
90 - 99
80 - 89
70 - 79
60 - 69
50 - 59
40 - 49
30 - 39
The lowest class interval is taken so that the minimum score can be included. The
minimum score is 33. It is convenient to start the lowest class interval from a score
to which adding the interval length is easy, so we start from 30. This is called the
lower limit of the class interval. Add i - 1 = 10 - 1 = 9 to the lower limit to get the
upper limit of the first class interval. Then add i = 10 successively to the lower limits
and upper limits to get the remaining class intervals.
5.
Distribute the scores in the class intervals by putting a tally mark in the
relevant class interval and count the number of scores in each class interval.
Class Interval     No. of Students (f)
90 - 99                  8
80 - 89                 30
70 - 79                 45
60 - 69                 22
50 - 59                 10
40 - 49                  3
30 - 39                  2
Total                  120
Frequency
The number of scores lying in a class interval is called the frequency of that
class interval. For Example two scores lie in the class interval 30-39. Therefore 2 is
the frequency of the class interval 30-39.
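Steps 1-5 above can be sketched in Python. For brevity this uses a 10-score subset of the 120 marks (chosen so that the minimum 33, maximum 99, and interval length work out as in the worked example); the variable names are my own.

```python
# Build a frequency distribution from raw marks, following steps 1-5.
import math

scores = [57, 86, 69, 99, 75, 33, 80, 78, 45, 62]

rng = max(scores) - min(scores)            # Step 1: range = 99 - 33 = 66
k = 7                                      # Step 2: chosen number of intervals
i = math.ceil(rng / k)                     # Step 3: 66/7 = 9.4, rounded up to 10

# Steps 4-5: the lowest interval starts at 30; tally scores into each interval.
freq = {}
for lower in range(30, 100, i):
    upper = lower + i - 1                  # upper limit = lower + (i - 1)
    freq[f"{lower}-{upper}"] = sum(lower <= s <= upper for s in scores)

print(rng, i, freq)
```

Every score falls in exactly one interval, so the frequencies sum to the number of scores.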
Mid-Point or Class Mark
The middle of a class interval is called the mid-point or class mark and is usually
denoted by X. It is calculated as
Midpoint = X = (lower limit + upper limit) / 2
For the class interval 30 - 39:
X = (30 + 39) / 2 = 69 / 2 = 34.5
The Mean
For grouped data the mean is calculated as
Mean = X̄ = Σ f X / N
Where X is the mid-point of a class interval, f is its frequency, and N = Σ f.
The Median
For ungrouped data, the median is the (N + 1)/2 th score.
For grouped data,
Median = L + (i / f) (N/2 - C)
Where
L = lower class boundary of the median class interval.
i = length of the median class interval.
f = the frequency of the median class interval.
N = Σf
C = the cumulative frequency of the class interval below the median class interval.
The Mode
The mode is the score that occurs the greatest number of times in a data set. The
mode does not always exist: if each score occurs the same number of times, there is
no mode. There may also be more than one mode: if two or more scores occur the
greatest number of times, then there is more than one mode.
The mode can be calculated for grouped data with the help of the following
formula.
Mode = L + ((fm - f1) / (2fm - f1 - f2)) × i
Where
L = lower class boundary of the modal class interval.
fm = the maximum frequency.
f1 = the frequency preceding the modal class.
f2 = the frequency succeeding the modal class.
i = the length of the modal class interval.
Note: The mode lies in the class interval having the maximum frequency. This class
interval is called the modal class.
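The grouped median and mode formulas can be sketched in Python, using the frequency distribution from the worked example above (class intervals 30-39 through 90-99); variable names are my own.

```python
# Grouped median:  Median = L + (i/f)(N/2 - C)
# Grouped mode:    Mode = L + ((fm - f1) / (2fm - f1 - f2)) * i

classes = [(30, 39, 2), (40, 49, 3), (50, 59, 10), (60, 69, 22),
           (70, 79, 45), (80, 89, 30), (90, 99, 8)]   # (lower, upper, f)
N = sum(f for _, _, f in classes)

# Median: find the class whose cumulative frequency first reaches N/2.
cum = 0
for lower, upper, f in classes:
    if cum + f >= N / 2:
        L = lower - 0.5                      # lower class boundary
        i = upper - lower + 1                # interval length
        C = cum                              # cumulative freq below this class
        median = L + (i / f) * (N / 2 - C)
        break
    cum += f

# Mode: the modal class is the one with the maximum frequency.
idx = max(range(len(classes)), key=lambda j: classes[j][2])
lower, upper, fm = classes[idx]
f1, f2 = classes[idx - 1][2], classes[idx + 1][2]    # preceding, succeeding
mode = (lower - 0.5) + (fm - f1) / (2 * fm - f1 - f2) * (upper - lower + 1)

print(round(median, 2), round(mode, 2))      # 74.61 75.55
```

Both fall in the modal class 70-79, as the note above predicts for the mode.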
Empirical Relationship between Mean, Median and Mode:
For moderately skewed distributions, we have the following empirical
relation:
Mode = 3 Median - 2 Mean
The Quartiles:
The values that divide a set of scores into four equal parts are called quartiles and
are denoted by Q1, Q2 and Q3. For ungrouped data,
Q1 = the (N + 1)/4 th score
Q2 = the 2(N + 1)/4 = (N + 1)/2 th score
Q3 = the 3(N + 1)/4 th score
The Percentiles:
The values that divide a set of scores into a hundred equal parts are called
percentiles and are denoted by P1, P2, P3, ..., P99. P25 is the first quartile,
P75 is the third quartile and P50 is the median.
4.2 The Percentile Ranks (PR):
The procedure for calculating percentile ranks is the reverse of the procedure
for calculating percentiles. Here we have an individual's score and find the percentage
of scores that lies below it. In the example, we calculate P78 = 83.37. It means that
83.37 is the score below which 78% of the scores fall. If a student has a score of
83.37, we can say that his percentile rank (PR) is 78 on a scale of 100.
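The quartile positions and the percentile-rank idea can be sketched in Python on a small ordered data set (the eleven scores below are made up for illustration, chosen so the quartile positions are whole numbers):

```python
# Quartile positions for ungrouped data, and the percentile rank of a score.
scores = sorted([51, 55, 60, 62, 67, 70, 74, 78, 81, 85, 90])
N = len(scores)                      # 11

q1_pos = (N + 1) / 4                 # 3rd score
q2_pos = (N + 1) / 2                 # 6th score (the median)
q3_pos = 3 * (N + 1) / 4             # 9th score
q1 = scores[int(q1_pos) - 1]
q2 = scores[int(q2_pos) - 1]
q3 = scores[int(q3_pos) - 1]

# Percentile rank: the percentage of scores falling below a given score.
def percentile_rank(x):
    return 100 * sum(s < x for s in scores) / N

print(q1, q2, q3, percentile_rank(81))
```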
Positive Correlation
Negative Correlation
No correlation
Rank-difference method:
This method is useful when the number of scores to be correlated is small or the
exact magnitude of the scores cannot be ascertained. The scores are ranked according
to size or some other criterion using the numbers 1, 2, 3, ..., N. The rank-difference
coefficient of correlation can be computed by the following formula.
Rs = 1 - (6 Σ D²) / (N (N² - 1))
Product-Moment Method:
This method is used when the number of scores is large. Thus this method is used
in most research studies. The product-moment coefficient is usually denoted by r.
rxy = (Σ XY - (Σ X)(Σ Y) / N) / √[(Σ X² - (Σ X)² / N) (Σ Y² - (Σ Y)² / N)]
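Both coefficients can be sketched in Python directly from the formulas above; the rank and score data below are made up for illustration.

```python
# Rank-difference (Spearman):  Rs = 1 - 6*sum(D^2) / (N*(N^2 - 1))
import math

ranks_x = [1, 2, 3, 4, 5]
ranks_y = [2, 1, 3, 5, 4]
D2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
N = len(ranks_x)
rs = 1 - 6 * D2 / (N * (N ** 2 - 1))

# Product-moment (Pearson), in the raw-score form given above.
X = [2, 4, 6, 8, 10]
Y = [1, 3, 4, 7, 9]
n = len(X)
sx, sy = sum(X), sum(Y)
sxy = sum(a * b for a, b in zip(X, Y))
sxx = sum(a * a for a in X)
syy = sum(b * b for b in Y)
r = (sxy - sx * sy / n) / math.sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))

print(rs, round(r, 3))
```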
Measures of Variability:
Measures of central tendency locate the centre of a set of scores. However,
two data sets can have the same mean, median and mode and yet be quite different in
other respects. For example, consider the heights (in inches) of the players of two
basketball teams.
Team-1:
72 73 76 76 78
Team-2:
67 72 78 76 84
The two teams have nearly the same mean height (75 and 75.4 inches), but it is
clear that the heights of the players of team 2 vary much more than those of team 1.
If we have information about the centre of the scores and the manner in which they
are spread out, we know much more about a set of scores. The degree to which
scores tend to spread about an average value is called dispersion.
The Range
It is the simplest measure of dispersion. The range of a set of scores is the
difference between the maximum score and the minimum score.
In symbols,
Range = Xm - Xo
Where Xm is the maximum score and Xo is the minimum score.
Quartile Deviation:
The quartile deviation is defined as half of the difference between the third and
the first quartiles.
In symbols,
Q.D. = (Q3 - Q1) / 2
Where Q1 is the first quartile and Q3 is the third quartile.
The Mean Deviation or Average Deviation:
The average deviation is defined as the arithmetic mean of the deviations of
the scores from the mean or median; the deviations are taken as positive. In symbols,
for ungrouped data
M.D. = Σ |X - X̄| / N
and for grouped data
M.D. = Σ f |X - X̄| / Σ f
The Standard Deviation:
S = √[ Σ (X - X̄)² / N ]
or, in computational form,
S = √[ Σ X² / N - (Σ X / N)² ]
The Coefficient of Variation:
C.V. = (S / X̄) × 100
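The variability measures above can be sketched in Python on a small made-up set of scores:

```python
# Range, mean deviation, standard deviation, and coefficient of variation.
import math

X = [4, 6, 8, 10, 12]
N = len(X)
mean = sum(X) / N                                    # 8.0

rng = max(X) - min(X)                                # Range = Xm - Xo = 8
md = sum(abs(x - mean) for x in X) / N               # Mean deviation = 2.4
s = math.sqrt(sum((x - mean) ** 2 for x in X) / N)   # Standard deviation
cv = s / mean * 100                                  # Coefficient of variation

print(rng, md, round(s, 3), round(cv, 2))
```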
Standard Scores:
A frequently used quantity is statistical analysis is the standard score or Zscore. The standard score for a data value in the number of standard deviations that
the data value is away from the mean of the data set.
Z=
X X
S
The normal distribution is symmetric about the mean; the part of the
curve to the left of the mean is the mirror image of the part of the curve to the
right of it.
In a normal distribution,
μ - 0.6745σ to μ + 0.6745σ covers 50% of the area,
μ - σ to μ + σ covers 68.27% of the area,
μ - 2σ to μ + 2σ covers 95.45% of the area, and
μ - 3σ to μ + 3σ covers 99.73% of the area.
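The Z-score formula and the normal-curve areas quoted above can be checked in Python. The mean and standard deviation in the example call are made up; the areas follow from the standard normal distribution via the error function.

```python
# Z-score, and the area under the normal curve within z standard deviations.
import math

def z_score(x, mean, s):
    return (x - mean) / s

def area_within(z):
    """Area under the standard normal curve between -z and +z."""
    return math.erf(z / math.sqrt(2))

print(z_score(85, 75, 5))                  # 2.0
print(round(100 * area_within(1), 2))      # 68.27
print(round(100 * area_within(2), 2))      # 95.45
print(round(100 * area_within(3), 2))      # 99.73
```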
4.3
STANDARD SCORES:
Most educational and psychological tests provide standard scores that are
based on a scale that has a statistical mean (or average score) of 100. If a student earns
a standard score that is less than 100, then that student is said to have performed
below the mean, and if a student earns a standard score that is greater than 100, then
that student is said to have performed above the mean. However, there is a wide range
of average scores, from low average to high average, with most students earning
standard scores on educational and psychological tests that fall in the range of 85-115.
This is the range in which 68% of the general population performs and, therefore, is
considered the normal limits of functioning.
Within these normal limits, performance is typically classified as low average
(80-89), average (90-109), and high average (110-119). These classifications are
used typically by school psychologists and other assessment specialists to describe a
student's ability compared to same-age peers from the general population.
Subtest Scores
Many psychological tests are composed of multiple subtests that have a mean
of 10, 50, or 100. Subtests are relatively short tests that measure specific abilities,
such as. vocabulary, general knowledge, or short-term auditory memory. Two or more
subtest scores that reflect different aspects of the same broad ability (such as broad
Verbal Ability) are usually combined into a composite or index score that has a mean
of 100. For example, a Vocabulary subtest score, a Comprehension subtest score, and
a General Information subtest score (the three subtest scores that reflect different
aspects of Verbal Ability) may be combined to form a broad Verbal Comprehension
Index score. Composite scores, such as IQ scores, Index scores, and Cluster scores,
are more reliable and valid than individual subtest scores. Therefore, when a student's
performance demonstrates relatively uniform ability across subtests that measure
different aspects of the same broad ability (the Vocabulary, Comprehension, and
General Information subtest scores are all average), then the most reliable and valid
score is the composite score (the Verbal Comprehension Index in this example). However,
when a student's performance demonstrates uneven ability across subtests that
measure different aspects of the same broad ability (the Vocabulary score is below
average, the Comprehension score is below average, and the General Information
score is high average), then the Verbal Comprehension Index may not provide an
accurate estimate of verbal ability. In this situation, the student's verbal ability may be
best understood by looking at what each subtest measures. In sum, it is important to
remember that unless performance is relatively uniform on the subtests that make up a
particular broad ability domain (such as Verbal Ability), then the overall score (in this
case the Verbal Comprehension Index) may be a misleading estimate.
4.4
PROFILE:
UNIT-5:
EVALUATING PRODUCT, PROCEDURES &
PERFORMANCE
5.1
EVALUATION THEMES AND TERM PAPERS
There are a number of evaluation criteria which fall under six evaluation themes.
The following are the major themes of evaluation.
1.
Learning Outcome:
The quality of learning outcomes is the first theme identified in the teaching and
learning framework. Under this theme one important sub-theme is identified.
Attainment of Curriculum Objectives:
What opportunities are pupils afforded to use and display their abilities in:
Oral language
Reading
Writing
Digital literacy
If we have not completed a school improvement plan to date, what do we need
to focus on to support learning outcomes and the attainment of curriculum objectives
in each curriculum area?
2.
Learning Experience:
The quality of the learning experience is the second theme identified in teaching
and learning. Under this theme three sub-themes are identified:
Learning environment
Engagement in learning
Learning to learn
Engagement in Learning
Are students interested and enthused by the content and teaching approaches
used? Do we encourage pupil questioning, and have we considered teacher input
versus pupil input? Pupils enjoy learning in the classroom and are eager to find out
more. All students in the classroom are afforded the opportunity to participate in
lessons and engage with learning.
Learning Environment
Involve the students in developing rules which recognize the rights and
responsibilities of the community. Prepare supervision of pupils both within the class
and at break times within the school setting. All resources are well organized,
labelled and clear to all learners. Celebrate pupils' learning and achievements
through a range of displays: concrete and visual materials, centres of interest and
displays of pupil work.
Learning to Learn
Learning to learn is the third sub-theme of learning experiences.
Engage the pupils in monitoring their own progress in learning, and use
learning techniques properly in the classroom to develop the skills of learners
through proper planning of lessons. Allow the learners to communicate and work
with others in the class. How do we enable learners to develop their personal
organization and plan out their own work? What study and revision skills do we
teach? Teach the pupils how to organize and present their work. Make the pupils
creative and give them the opportunity for collaborative work.
3.
Teacher's Practice
The quality of the teacher's practice is the third theme of the teaching and
learning framework. Under this theme four sub-themes are identified.
1.
Learning Outcomes:
Do we provide clear, relevant and differentiated learning outcomes to pupils?
How are pupils made aware of what they are going to learn?
Are pupils familiar with the expected success criteria in learning activities?
Written Plans
Are the long- and short-term plans prepared in accordance with the rules for
primary teachers?
Does the planning clearly indicate expected learning outcomes, teaching
approaches, resources and activities?
Are literacy and numeracy opportunities identified across the curriculum?
How have we identified these opportunities in whole-school plans and individual
planning?
Resources
How satisfied are we with the resources, materials and equipment we have
within our classroom and available within the school? Are the necessary and relevant
materials readily available?
Assessment
Teaching Approach
What provision is made to ensure expected learning outcomes are achieved
during lessons across the curriculum?
Focus of Learning
Is the focus of learning varied, including ICT? Is pupils' learning timely, and
does it happen at regular intervals?
Term Paper:
Definition
A term paper has two purposes: the student should demonstrate an
understanding of the material as well as the ability to communicate that understanding
effectively.
Writing term papers gives students practical experience in writing at length;
communicating thoughts and ideas through the written word is a necessary skill in any
profession.
A term paper is a research paper written by students over an academic term,
accounting for a large part of a grade. Term papers are generally intended to describe
an event or a concept, or to argue a point. A term paper is a written original work
discussing a topic in detail, usually several typed pages in length, and is often due at
the end of a semester. There is much overlap between term papers and research
papers. "Term paper" was originally used to describe a paper (usually research
based) that was due at the end of the "term", either a semester or quarter, depending
on which unit of measure a school used. Common usage has "term paper" and
"research paper" as interchangeable, but this is not completely accurate: not all term
papers involve academic research, and not all research papers are term papers.
Term papers date back to the beginning of the 19th century, when print could
be reproduced cheaply and written text of all types (reports, memoranda,
specifications, and scholarly articles) could be easily produced and disseminated.
Moulton and Holmes (2003) write that during the years from 1870 to 1900 American
education was transformed, as writing became a method of discourse and research
the hallmark of learning.
Importance:
The author has integrated a variety of key pieces of literature on the topic,
thus representing the current state of research as well as covering various
viewpoints.
The author has integrated a variety of key pieces of literature, but gives some
pieces too much prominence.
The contents of individual pieces of literature are largely represented correctly,
although the student may give too much prominence to individual pieces.
The literature review reveals that the student has not fully understood large
parts of the literature. The content of the theoretical part is, as a result,
incorrect to a considerable degree.
The student has represented the views of prominent scholars on the topic and
has developed critical arguments in support of or against the literature
represented. His/her literature review is focused on the research question and
relevant to it.
The student presents the current state of research concerning the topic at hand,
but focuses too much on one particular viewpoint.
The literature review shows a lack of focus. The author presents bits and pieces
that are loosely related to the topic at hand.
Presentation of Results
The tables and figures are legible and easy to grasp at first glance. It is
evident that the author has spent time finding the best visual means of presentation.
The formatting of tables and figures is satisfactory, yet not always easy to
grasp at first glance.
The student has not attempted to use tables and figures to support his/her
argument.
The student explains how he/she arrived at the results presented and indicates
their significance to the topic or field of linguistics in which the paper is written.
The author largely points out the most striking results; at the same time,
however, he/she concentrates too much on discussing aspects that are not entirely
relevant to the research question at hand.
The author mainly lists examples from her data; no comparison of her results
with those of previous studies is offered.
The student uses the academic writing register/style with appropriate
linguistic terminology.
The language used is largely suitable for an academic piece of writing, but the
paper exhibits some, mainly recurring, errors.
Descriptive title
Author
Abstract (NOTE: MAXIMUM 200 WORDS)
Introduction
Hypotheses and predictions (these can be incorporated into the introduction or
stated separately)
Use past tense to describe results found in previous studies; use future tense
"we will..." to describe work that you propose to do (5 points).
5.2
EVALUATING GROUP WORK & PERFORMANCE
Specific Questions:
The evaluator can ask more specific questions about:
The group work process versus the group work product.
The effectiveness of group work in class and/or out of class to enhance
learning.
The appropriateness of group work.
Organizational, planning, management and monitoring issues.
Strengths and weaknesses of group work, and ideas for improvement.
Diversity issues (did some students find it easier or harder, or benefit more
than others, and why; what about issues of power?).
The ways in which the evaluator explained, facilitated, managed and monitored
the groups.
The overall nature of the unit of study.
Timing of Evaluation:
Evaluation can occur at any time during the unit of study or program, but it
usually occurs at the end of the semester or at the end of the task that is being
undertaken and evaluated.
Ideally, students should be given time to reflect upon their experiences prior to
completing any form of evaluation, especially if the evaluator desires some specific
feedback.
1. Questionnaire:
A questionnaire is a common evaluation method that involves having students
complete a survey in class. When designing a questionnaire, ensure that there is an
introduction which explains the purpose of the evaluation, that there are clear
instructions for completion, and that the questions are unambiguous. The questions
posed can be open-ended or closed, or a combination.
Open-Ended Questions:
These questions have the advantage of allowing students to identify the most
important elements of their experience. A disadvantage is that students may not write
much, or may write nothing at all.
Closed Questions:
These are statements that allow students to rate their agreement or
disagreement with a comment or statement by using a Likert scale.
Strongly agree | Agree | Neutral | Disagree | Strongly disagree
Students are usually willing to answer these questions, especially if the questionnaires
are anonymous. A disadvantage is that they do not give detailed responses or answer
"why" or "how" questions.
2. Checklist:
A checklist is another method that can provide basic data.
An example may be a list of provided unit outcomes (knowledge, skills,
attributes, abilities etc.) where students circle or tick the ones that apply.
Alternatively, the evaluator could ask students to generate their own list of
outcomes. For example: group work provided me with......
Autonomy.
Opportunity to get to know my classmates.
Opportunity to work on real-life problems.
Students are usually willing to complete these lists, but again the disadvantage is
that they do not give detailed responses or answer "why" or "how" questions.
Evaluation Hand-Out:
Some academics design their own evaluation hand-outs that can combine a
number of evaluation methods and are anonymous, quick and easy to complete. They
can take any form, and use images, diagrams, comment boxes or questions and lists as
above.
Interview:
Interviews can be done individually or in small groups and provide the
opportunity for the evaluator to probe for deeper analysis of the process and
experience. The disadvantage of this method is that it can be time-consuming for both
the evaluator and the students, and in a larger group some students may be more
vocal than others.
Focus Group:
A focus group uses a facilitative rather than a direct questioning approach and is
a useful way of having students discuss the process of group work. This method
allows students to work off and build upon each other's answers.
The disadvantage is that it is time-consuming for both the evaluator and students,
and there is the added difficulty of arranging a time that will suit everyone.
Practicality of the Evaluation Process:
Before making a choice about an evaluation method, also consider the following
questions:
What resources are available (for example, interview rooms)?
What levels of participation does the evaluator require from the students,
tutors, organizations or any other parties who were involved in the group work
activities?
Uses of Evaluation:
It is important to consider who will use the evaluation and how it will be
used. This is a key part of the planning process which relates to the purpose of the
evaluation.
It is also important to reflect upon and consider the methods that have been
used to gather information about the effectiveness of group work.
5.3
EVALUATING DEMONSTRATION:
1.
2.
assistance.
For the simulated forced landing approach, you should tell students that you will
be simulating an engine failure and that they are to carry out the entire procedure.
3.
NOTE: You would interrupt the student's performance, of course, if safety became a
factor.
4.
Success or failure during the evaluation stage of the lesson will determine
whether you carry on with the next exercise or repeat the lesson.
Demonstration
1.
The explanation and demonstration may be done at the same time, or the
demonstration given first followed by an explanation, or vice versa. The skill
you are required to teach might determine the best approach.
2.
Consider the following: You are teaching a student how to do a forced landing.
Here are your options:
Asking students to describe what they should do will give them an opportunity
to prove they know the procedure, although they have not yet flown it.
c. After completing the forced landing approach, while climbing for altitude,
clear up any misunderstandings the students may have and ask questions.
d. The demonstration and explanation portion of the demonstration-performance
method is now complete and you should proceed to the next part, which is the
student performance and instructor supervision.
Evaluation Matrix for the Demonstration
When assessing the demonstration of teaching skills, attention is given to the
applicant's use of didactic solutions. The following matrix transparently describes the
criteria used to evaluate the demonstration. The matrix is indicative instead of
normative, and is used for support when evaluating the demonstration of teaching
skills. In other words, not all of the aspects listed in the matrix need to be assessed
systematically. The evaluators use the criteria listed below to form an overall appraisal
of the demonstration's standard by assessing the quality of the components that are of
a good or better level. If the demonstration includes a preliminary assignment, all the
individual components are assessed in relation to it. If well grounded, the
demonstration may also be virtual, held in real time and interactively.
Components of the demonstration of teaching skills (each rated Passable,
Satisfactory or Good):
Objectives
Content
Methods
Organization of teaching
Motivation of target group
Suitable use of teaching methods
Suitable use of teaching aids and materials
Interaction skills
Use of voice
Wrap-up
Evaluation of the teaching situation in terms of the objectives set
Consideration given to the target group in solutions related to evaluation
Among the criteria: the content is academic; the applicant takes the target
group into consideration; the applicant evaluates the teaching situation in terms of
the objectives set; the applicant's delivery is clear and understandable; oral and
written communication is coherent; and the preliminary assignment and the
demonstration of teaching skills are well aligned.
5.4
Motor Skills
A motor skill is a function, which involves the precise movement of muscles
with the intent to perform a specific act.
Motor skills are skills that are associated with the activity of the body muscles,
like the skills performed in sport. Fine motor skills are the type that is associated with
small movements of the wrists, hands, feet, fingers and toes.
Motor skills are the ability to make particular bodily movements to achieve
certain tasks. They are a way of controlling muscles to make fluid and accurate
movements. These skills must be learned, practiced and mastered, and over time can
be performed without thought, for example, walking or swimming. Children are
clumsy in comparison to adults, because they have yet to learn many motor skills that
allow them to effectively accomplish tasks.
Motor skills are also learned and refined in adulthood. If a woman takes up
belly dancing, her first movements will not closely resemble that of the teacher.
Over time, however, she will learn how to control her muscles to make the signature
movements that a belly dancer makes.
Genetic factors also affect the development of motor skills, for example, the
children of a professional dancer are far more likely to be good at dancing, with good
coordination and muscular control, than the children of a biochemist. Gross motor
skills are usually learned during childhood and require a large group of muscles to
perform actions, such as balancing or crawling. Fine motor skills involve smaller
groups of muscles and are used for fine tasks, such as threading a needle or playing a
computer game. These skills can be forgotten if disused over time. Similar
relationships can be detected among other fundamental motor skills and specific sport
skills and movements.
Assessment of Motor Skills
A motor skills assessment is an evaluation of a patient to determine the extent
and nature of motor skill dysfunction. Care providers like physical therapists and
neurologists can perform the assessment, which may be ordered for a number of
reasons. It is not invasive, but does require the completion of a number of tasks. The
length of time required can vary, depending on the test or tests used. It may be
necessary to set aside a full day for testing.
One reason for a motor skills assessment may be to establish a child's baseline
level of motor competency. This can provide a reference point for the future. Physical
education teachers, for example, may perform brief assessments with new students to
determine which kinds of activities would be safe and appropriate for them.
Pediatricians also use such testing to assess their patients. If a child appears to have
developmental delays, this may result in a referral for a more extensive examination.
Different Ways to Assess Motor Skills
Motor skills can be evaluated in different ways; some of them are as follows.
1. Test gross motor skills using range of motion. Assess gross motor skills by
asking the individual to perform a series of movements known as range of motion.
Evaluate range of motion by asking the individual to hold an arm out and move it in a
circular direction. The arm should be able to move in a complete circle when fully
extended. Then ask the individual to stand and place one leg out. Have the individual
move the leg up and down, back and forth and left to right. Note any difficulty in
movement, abnormalities or pain experienced by the individual.
2. Assess gross motor skills using games. Gross motor skills can be evaluated using
games and sports. Ask the individual to kick a ball to test gross motor skills of the leg.
Jumping rope is a great way to evaluate motor skills, because it uses both the arms
and legs working together to accomplish the task. Hopscotch, basketball and walking
on a balance beam are also good ways to evaluate gross motor skills. Look for the
fluidity of movement, problems with balance and hand-eye coordination.
3. Evaluate fine motor skills of arms and legs. Ask the individual to put a clothespin
on the edge of a box. Stringing beads on a shoelace is another way to assess fine
motor skills. Using a stapler and placing a paperclip on a sheet of paper are also ways
to assess fine motor skills. Place an item on the floor and ask the individual to pick it
up using his toes only. Watch the individual perform each task, looking at how smooth
the movements are and how easily the task is completed, and note any difficulties.
4. Test fine motor skills using common household items. Give the individual a jar
and ask her to unscrew the lid and screw the lid back on. Ask the individual to place
items, such as coins or blocks, into containers such as a bowl, bucket or cup. Draw a
straight line on a piece of paper and have the individual use a pair of scissors to cut
the line on the paper. Using pencils or pens of different sizes, ask the person to pick
up and grasp each pencil/pen. Then ask the individual to trace items drawn on the
paper. Watch for the completion of each task, looking for any problems during each
movement.
5. Assess fine motor skills while getting dressed. Ask the individual to put on and
button up a shirt. Next, have the individual put on a pair of pants that have a snap
closure and a zipper. Give the individual a pair of shoes, which have shoelaces and
not Velcro closures, and ask him to tie the shoes. Watch the individual perform each
task, looking for difficulties, abnormal movements and the ability to perform each
task completely without help.
Visual-Motor Integration
This sub-test evaluates the child's eye-hand coordination. Aside from controlling
muscles, the test determines the level of the child's visual perception. Some examples
of the activities of this 72-item sub-test include building blocks and copying designs.
5.5
and students need speaking and listening skills that will help them succeed in
future courses and in the workplace. Thus, the assessment of communication skills is
an important issue in general education. Oral assessment is often carried out to look
for students' ability to produce words and phrases by evaluating students' fulfillment
of a variety of tasks, such as asking and answering questions about themselves, doing
role-plays, making up mini-dialogues, and defining or talking about pictures given to
them. The operations in an oral ability test are either informational skills or
interactional skills. The testing of speaking is widely regarded as the most challenging
of all language tests to prepare, administer and score.
Kinds of Oral Communication
Oral communication can also be delivered individually or as part of a team. Therefore, knowing the kind of oral communication act that is expected is a necessary step in being able to give useful feedback and ultimately an accurate evaluation.
Pronunciation
Pronunciation is a basic quality of language learning. Though most second
language learners will never have the pronunciation of a native speaker, poor
pronunciation can obscure communication and prevent a student from making his
meaning known. When evaluating the pronunciation of students, listen for clearly
articulated words, appropriate pronunciations of unusual spellings, and assimilation
and contractions in suitable places. Also listen for intonation. Are students using the
correct inflection for the types of sentences they are saying? Do they know that the
inflection of a question is different from that of a statement? Listen for these
pronunciation skills and determine the level into which each student falls.
Vocabulary
Vocabulary comprehension and vocabulary production are always two
separate banks of words in the mind of a speaker, native as well as second language.
Teachers should encourage students to have a large production vocabulary and an even larger recognition vocabulary. For this reason it is helpful to evaluate students on the level of vocabulary they are able to produce. Are they using the specific vocabulary they were taught in class? Are they using vocabulary appropriate to the contexts in
which they are speaking? Listen for the level of vocabulary students are able to
produce without prompting and then decide how well they are performing in this area.
Accuracy
Grammar has always been and forever will be an important issue in foreign
language study. Writing sentences correctly on a test, though, is not the same as
accurate spoken grammar. As students speak, listen for the grammatical structures and
tools teachers have taught them. Are they able to use multiple tenses? Do they have
agreement? Is word order correct in the sentence? All these and more are important
grammatical issues, and an effective speaker will successfully include them in his or
her language.
Communication
A student may struggle with grammar and pronunciation, but how creative is
she when communicating with the language she knows? Assessing communication in
the students means looking at their creative use of the language they do know to make
their points understood. A student with a low level of vocabulary and grammar may have excellent communication skills if she is able to make others understand her, whereas an advanced student who is tied to manufactured dialogues may not be able to be expressive with language and would therefore have low communication skills. Don't let a lack of language skill keep the students from expressing themselves.
The more creative they can be with language and the more unique ways they can
express themselves, the better their overall communication skills will be.
Interaction
Ask the students questions. Observe how they speak to one another. Are they
able to understand and answer questions? Can they answer when the teacher asks them
questions? Do they give appropriate responses in a conversation? All these are
elements of interaction and are necessary for clear and effective communication in
English. A student with effective interaction skills will be able to answer questions
and follow along with a conversation happening around him. Great oratory skills will
not get anyone very far if he or she cannot listen to other people and respond
appropriately. Encourage your students to listen as they speak and have appropriate
responses to others in the conversation.
Fluency
Fluency may be the easiest quality to judge in your students' speaking. How
comfortable are they when they speak? How easily do the words come out? Are there
great pauses and gaps in the student's speaking? If there are then your student is
struggling with fluency. Fluency does not improve at the same rate as other language
skills. You can have excellent grammar and still fail to be fluent. You want your
students to be at ease when they speak to you or other English speakers. Fluency is a
judgment of this ease of communication and is an important criterion when evaluating
speaking.
feedback and support, and they do want suggestions. Not so useful: "Don't wave your hands when you talk." Better: "Let's figure out what you're going to do with your hands so that you don't distract the audience from what you are saying. What feels more natural to you?"
Present oral communication skills as a set of professional skills that all
professionals learn and practice steadily throughout their lives.
UNIT-6:
PORTFOLIOS
6.1
PURPOSE OF PORTFOLIOS:
Literal Definition:
A) a large, flat, thin case for carrying loose papers or drawings or maps; usually leather
B) a set of pieces of creative work collected to be shown to potential customers or employers; "the artist had put together a portfolio of his work"; "every actor has a portfolio of photographs"
C) a collection of various company shares, fixed interest securities or money-market instruments.
Terminological Definition:
A portfolio is a purposeful collection of student work that exhibits the
student's efforts, progress, and achievements in one or more areas of the curriculum.
The collection must include the following:
one year to the next. Instructors can use them for a variety of specific purposes, including:
Demonstrating progress toward identified outcomes.
Creating an intersection for instruction and assessment.
Providing a way for students to value themselves as learners.
Offering opportunities for peer-supported growth.
Benefits of Portfolio:
critical thinking skills, which result from the need for students to develop evaluation criteria.
Students are pleased to observe their personal growth,
They have better attitudes toward their work, and
They are more likely to think of themselves as writers.
1. First, you must decide the purpose of your portfolio. For example, the portfolios might be used to show student growth, to identify weak spots in student work, and/or to evaluate your own teaching methods.
2. After deciding the purpose of the portfolio, you will need to determine how you are going to grade it. In other words, what would a student need in their portfolio for it to be considered a success and for them to earn a passing grade?
3. The answer to the previous two questions helps form the answer to the third: What should be included in the portfolio? Are you going to have students put in all of their work or only certain assignments? Who gets to choose?
1. Set a Purpose for the Portfolio. First, we need to decide what the purpose of the portfolio is. Is it going to be used to show student growth or identify specific skills? Are we looking for a concrete way to quickly show parents student achievement, or are we looking for a way to evaluate our own teaching methods? Once we have figured out the goal of the portfolio, then we think about how to use it.
2. Decide How You Will Grade It. Next, we will need to establish how we are going to grade the portfolio. There are several ways to grade students' work: we can use a rubric, a letter grade, or, the most efficient way, a rating scale. Is the work completed correctly and completely? Can we comprehend it? We can use a grading scale of 4-1: 4 = Meets all Expectations, 3 = Meets Most Expectations, 2 = Meets Some Expectations, 1 = Meets No Expectations. Determine what skills you will be evaluating, then use the rating scale to establish a grade.
3. What Will Be Included in It. How will we determine what will go into the portfolio? Assessment portfolios usually include specific pieces that students are required to know, for example, work that correlates with the Common Core Learning Standards. Working portfolios include whatever the student is currently working on, and display portfolios showcase only the best work students produce. Keep in mind that we can create a portfolio for one unit and not the next. We get to choose what is included and how it is included. If we want to use it as a long-term project and include various pieces throughout the year, we can. But we can also use it for short-term projects as well.
4. How Much Will You Involve the Students. How much we involve the students in the portfolio depends upon the students' age. It is important that all students should understand the purpose of the portfolio and what is expected of them. Older students should be given a checklist of what is expected and how it will be graded. Younger students may not understand the grading scale, so we can give them the option of what will be included in their portfolio. Ask them questions such as: why did you choose this particular piece, and does it represent your best work? Involving students in the portfolio process will encourage them to reflect on their work.
5. Will You Use a Digital Portfolio. With the fast-paced world of technology, paper portfolios may become a thing of the past. Electronic portfolios (e-portfolios/digital portfolios) are great because they are easily accessible, easy to transport and easy to use. Today's students are tuned into the latest must-have technology, and electronic portfolios are part of that. With students using an abundance of multimedia outlets, digital portfolios seem like a great fit. The uses of these portfolios are the same; students still reflect upon their work, but in a digital way.
The key to designing a student portfolio is to take the time to think about what kind it will be and how we will manage it. Once we do that and follow the steps above, we will find it will be a success.
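The 4-1 rating scale described in step 2 can be sketched as a small grading helper. This is only an illustrative sketch: the sample ratings and the rounded-mean rule for picking an overall label are assumptions, not a method prescribed by the text.

```python
# Sketch of the 4-1 portfolio rating scale from step 2.
# The sample ratings and the rounded-mean rule are illustrative assumptions.
from statistics import mean

SCALE = {
    4: "Meets all Expectations",
    3: "Meets Most Expectations",
    2: "Meets Some Expectations",
    1: "Meets No Expectations",
}

def portfolio_rating(ratings):
    """Combine several per-skill ratings (each 1-4) into one overall label."""
    avg = mean(ratings)
    return avg, SCALE[round(avg)]

# e.g. four assessed skills, each scored on the 4-1 scale
avg, label = portfolio_rating([4, 3, 3, 2])
print(avg, label)
```

A rubric or letter grade could replace the final label; the rating scale is simply the most compact of the options listed in step 2.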
Types of Portfolios
1)
Best Work Portfolios
This type of portfolio highlights and shows evidence of the best work of learners. Frequently, this type of portfolio is called a display or showcase portfolio. For students, best work is often associated with pride and a sense of accomplishment and can result in a desire to share their work with others. Best work can include both product and process. It is often correlated with the amount of effort that the learners have invested in their work. A major advantage of this type of portfolio is that learners can select items that reflect their highest level of learning and can explain why these items represent their best effort and achievement. Best work portfolios are used for the following purposes:
Student Achievement. Students may select a given number of entries (e.g., 10) that
reflect their best effort or achievement (or both) in a course of study. The portfolio can
be presented in a student-led parent conference or at a community open house. As
students publicly share their excellent work, work they have chosen and reflected
upon, the experience may enhance their self-esteem.
Post-Secondary Admissions. The preparation of a post-secondary portfolio targets work samples from high school that can be submitted for consideration in the process
of admission to college or university. This portfolio should show evidence of a range
of knowledge, skills, and attitudes, and may highlight particular qualities relevant to
specific programs. Many colleges and universities are adding portfolios to the initial
admissions process while others are using them to determine particular placements
once students are admitted.
Employability. The audience for this portfolio is an employer. This collection of work needs to be focused on specific knowledge, skills, and attitudes necessary for a particular job or career. The school-to-work movements in North America are influencing an increase in the use of employability portfolios. The Conference Board of Canada (1992), for example, outlines the academic, personal management, and
2)
Growth Portfolio
A growth portfolio demonstrates an individual's development and growth over
school setting that substantiate students' choices and create a holistic view of the
students as learners and people. This type of portfolio may be modified for
employment purposes.
3)
Showcase Portfolios
Showcase portfolios highlight the best products over a particular time period
or course. For example, a showcase portfolio in a composition class may include the best examples of different writing genres, such as an essay, a poem, a short story, a
biographical piece, or a literary analysis. In a business class, the showcase portfolio
may include a resume, sample business letters, a marketing project, and a
collaborative assignment that demonstrates the individual's ability to work in a team.
Students are often allowed to choose what they believe to be their best work, highlighting their achievements and skills. Showcase reflections typically focus on the strengths of selected pieces and discuss how each met or exceeded required standards.
4)
Process Portfolios
Process portfolios, by contrast, concentrate more on the journey of learning rather than the final destination or end products of the learning process. In the composition class, for example, different stages of the process (an outline, first draft, peer and teacher responses, early revisions, and a final edited draft) may be required. A process reflection may discuss why a particular strategy was used, what was useful or ineffective for the individual in the writing process, and how the student went about making progress in the face of difficulty in meeting requirements. A process reflection typically focuses on many aspects of the learning process, including the following: which approaches work best, which are ineffective, information about oneself as a learner, and strategies or approaches to remember in future assignments.
5)
Evaluation Portfolios.
Evaluation portfolios may vary substantially in their content. Their basic
purpose, however, remains to exhibit a series of evaluations over a course and the
learning or accomplishments of the student in regard to previously determined criteria
or goals. Essentially, this type of portfolio documents tests, observations, records, or
other assessment artifacts required for successful completion of the course. A math evaluation portfolio may include tests, quizzes, and written explanations of how one went about solving a problem or determining which formula to use, whereas a science evaluation portfolio might also include laboratory experiments, science project outcomes with photos or other artifacts, and research reports, as well as tests and quizzes. Unlike the showcase portfolio, evaluation portfolios do not simply include the best work, but rather a selection of predetermined evaluations that may also demonstrate students' difficulties and unsuccessful struggles as well as their better work. Students who reflect on why some work was successful and other work was less so continue their learning as they develop their metacognitive skills.
6)
Online or e-portfolios
Online or e-portfolios may be one of the above portfolio types or a
combination of different types, a general requirement being that all information and
artifacts are somehow accessible online. A number of colleges require students to maintain a virtual portfolio that may include digital, video, or Web-based products. The portfolio assessment process may be linked to a specific course or an entire program. As with all portfolios, students are able to visually track and show their accomplishments to a wide audience.
Conclusion: The portfolio process will continue to be refined and efforts made to improve students' perceptions of the process, as it is intended to develop the self-assessment skills they will need to improve their knowledge and professional skills throughout their education careers.
6.3
GUIDELINES OF PORTFOLIO:
Portfolio:
An organized presentation of an individual's education, work samples, and
skills.
Terminologically, a portfolio is a purposeful collection of student work that exhibits the student's efforts, progress, and achievements in one or more areas of the
curriculum.
Guidelines:
Identify purpose
Select objectives.
Think about the kinds of entries that will best match instructional outcomes.
Decide how much to include, how to organize the portfolio, where to keep it and when to access it.
Set the criteria for judging the work (rating scales, rubrics, checklists) and make sure students understand the criteria.
Identify Purpose:
Without purpose, a portfolio is only a collection of student work samples.
Different purposes result in different portfolios. For example, if the student is to be evaluated on the basis of the work in the portfolio for admission to college, then the final version of his best work would probably be included in the portfolio.
Select Objectives:
The objectives to be met by students should be clearly stated. A list of communicative functions can be included for students to check when they feel comfortable with them, and stapled to the inside cover. Students would list the title or the number of the samples which address this function.
Portfolios also can be organized according to the selected objectives, addressing one skill such as writing. The selected objectives will be directly related to the stated purpose for the portfolio. At any rate, teachers must ensure that classroom instruction supports the identified goals.
Decide how much to include & how to Organize:
Teachers may want to spend some time going over the purpose of the portfolio
at regular intervals with students to ensure that the selected pieces do address the
purpose and the objectives. At regular times, ask students to go through their entries,
to choose what should remain in the portfolio, and what could be replaced by another
work which might be more illustrative of the objectives. Other material no longer current and/or not useful to document student progress toward attainment of the objectives should be discarded.
What is the student's role?
The student's participation in the portfolio will be largely responsible for the success of the portfolio. For this reason, students must be actively involved in
the choices of entries and in the rationale for selecting those entries.
i.
Selecting:
The student's first role is in selecting some of the items to be part of the portfolio. Some teachers give students a checklist for making choices. Others give students almost complete freedom in selecting their entries. At any rate, students should include their best and favorite pieces of work along with those showing growth and process.
ii.
Reflecting:
Students should also reflect about their own work. At the beginning, students might not know what to say, so teachers will need to model the kinds of reflection expected from students.
Set the Criteria for Judging the Work:
There are two kinds of criteria needed at this point:
Criteria for individual entries (refer to the section on rubrics for details).
Criteria for the portfolio as a whole.
If the purpose of the portfolio is to show student progress, then it is highly probable that some of the beginning entries may not reflect high quality; however, over several months, the student may now have demonstrated growth toward the stated objectives.
Reference
Katozai, Murad Ali. Measurement & Evaluation. Peshawar: University Publisher, 2013.
6.4
Portfolio:
Literally the word Portfolio is used in the following meanings:
1. A portable, large, thin and flat case, especially of leather, used for carrying papers, pictures, drawings or maps.
2. A list of the financial assets held by an individual or a bank or other financial
institution.
3. The role of the head of a government department e.g. He holds the portfolio
for foreign affairs.
4. An organized presentation of an individual's education, work samples and
skills.
Using portfolios of student work in instruction and communication:
The term portfolio has become a popular buzzword.
Unfortunately, it is not always clear exactly what is meant or implied by the
term especially when used in the context of portfolio assessment. This training
module is intended to clarify the notion of portfolio assessment and help users design
such assessments in a thoughtful manner. We begin with a discussion of the rationale
for assessment alternatives and then discuss portfolio definitions, characteristics and design considerations.
Educators and critics are currently reciting a litany of problems concerning the
use of multiple-choice and other structured format tests for assessing many important student outcomes. This has been accompanied by an explosion of activity searching
for assessment alternatives.
1.
Capture a richer array of what students know and can do than is possible with
multiple-choice tests. Current goals for students go beyond knowledge of facts and include such things as problem solving, critical thinking, lifelong learning
of new information and thinking independently. Goals also include
dispositions such as persistence, flexibility, motivation and self-confidence.
2.
3.
Make our assessment align with what we consider important outcomes for
students in order to communicate the right message to students and others about what we value. For example, if we emphasize higher order thinking in instruction but only test knowledge because testing thinking is difficult, students figure out pretty fast what is really valued.
4.
Have realistic contexts for the production of work, so that we can examine
what students know and can do in real-life situations.
5.
6.
7.
2.
3.
4.
5.
6.
It should give opportunities for students to select pieces they consider most representative of themselves as learners to be placed into their portfolios, and to establish criteria for their selections.
7.
8.
9.
Share the criteria that will be used to assess the work in the portfolio, as well as the ways in which the results are to be used. Teachers should give feedback to students and parents about the use of the portfolio.
In conclusion, in the portfolio-making process some necessary steps are: the assessment of studies should be clearly explained, the process should cover a certain time period, the portfolio should encourage students to learn, and items in the portfolio should be multi-dimensional and should address different learning areas. Besides, it is vitally important that the studies in a portfolio should be designed in order to present students' performance and development in any time period in detail.
Reference
Katozai, Murad Ali. Measurement & Evaluation. Peshawar: University Publisher, 2013.
6.5
ADVANTAGES & DISADVANTAGES OF PORTFOLIOS:
1.
2.
3.
It allows students, parents, teachers and staff to evaluate the students' strengths and weaknesses.
4.
5.
6.
7.
8.
9.
It encourages students to think of creative ways to share what they are learning
10.
11.
12.
13.
14.
15.
16.
It can accommodate diverse learning styles, though they are not suitable for all
learning styles.
17.
18.
Portfolios can assess performance, with practical application of theory, in real-time naturalistic settings (i.e., authentic assessment).
19.
20.
21.
Portfolios have high face validity, content validity, and construct validity.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
individual, each unique with its own characteristics, needs, and strengths.
32.
33.
34.
35.
36.
When portfolios are used for summative assessment, students may be reluctant
to reveal weaknesses.
2.
3.
4.
5.
6.
7.
May be seen as less reliable or fair than more quantitative evaluations such as
test scores.
8.
Can be very time consuming for teachers or program staff to organize and
evaluate the contents, especially if portfolios have to be done in addition to
traditional testing and grading.
9.
10.
If goals and criteria are not clear, the portfolio can be just a miscellaneous
collection of artifacts that don't show patterns of growth or achievement.
11.
Like any other form of qualitative data, data from portfolio assessments can be
difficult to analyze or aggregate to show change.
2.
3.
4.
Providing information that gives meaningful insight into behaviour and related
change. Because portfolio assessment emphasizes the process of change or
growth, at multiple points in time, it may be easier to see patterns.
5.
6.
Allowing for the possibility of assessing some of the more complex and
important aspects of many constructs (rather than just the ones that are easiest
to measure).
Evaluating programs that have very concrete, uniform goals or purposes. For
example, it would be unnecessary to compile a portfolio of individualized
evidence in a program whose sole purpose is full immunization of all
children in a community by the age of five years. The required immunizations
are the same, and the evidence is generally clear and straightforward.
6.6
EVALUATION OF PORTFOLIO:
According to Paulson, Paulson and Meyer, (1991, p. 63): Portfolios offer a
way of assessing student learning that is different than traditional methods. Portfolio
assessment provides the teacher and students an opportunity to observe students in a
broader context: taking risks, developing creative solutions, and learning to make
judgments about their own performances.
In order for thoughtful evaluation to take place, teachers must have multiple scoring strategies to evaluate students' progress. Criteria for a finished portfolio might
include several of the following:
prioritize those criteria that will be used as a basis for assessing and evaluating student
progress, both formatively (i.e., throughout an instructional time period) and
summatively (i.e., as part of a culminating project, activity, or related assessment to
determine the extent to which identified curricular expectancies, indicators, and
standards have been achieved).
As the school year progresses, students and teacher can work together to
identify especially significant or important artifacts and processes to be captured in
the portfolio. Additionally, they can work collaboratively to determine grades or
scores to be assigned. Rubrics, rules, and scoring keys can be designed for a variety of
portfolio components. In addition, letter grades might also be assigned, where
appropriate. Finally, some form of oral discussion or investigation should be included
as part of the summative evaluation process. This component should involve the
student, teacher, and if possible, a panel of reviewers in a thoughtful exploration of
the portfolio components, students' decision-making and evaluation processes related
to artifact selection, and other relevant issues.
UNIT-7:
BASIC CONCEPTS OF INFERENTIAL STATISTICS
7.1
Introduction:
The role and importance of statistics in education cannot be denied. In education we come across measurement, evaluation and research. Similarly, we
proper measurement and present the data quantitatively. Thus without statistics we
cannot make proper measurement. As quoted in different statistics books "Planning is
the order of the day, and planning without statistics is inconceivable". Good statistics
and sound statistical analysis assist in providing the basis for the design of educational
policies, monitor policy implications and evaluate policy impact. To generate reliable
and relevant information the data should be collected using appropriate statistical
methods. The materials one uses for data collection should be well designed. The data
analysis should also be done using appropriate statistical method. All these show that
statistics plays a vital role in education management and educational planning.
Concept of Inferential Statistics
Definition:
The branch of statistics concerned with using sample data to make an
inference about a larger group of data is called inferential statistics.
Example:
For instance, a college teacher decides to use the average grade achieved by
one statistics class to estimate the average grade of all the sections of the same
statistics course. This is a problem of estimation, which falls in the inferential
statistics.
To illustrate this practically, imagine an entire room full of socks. You want to
determine whether there are more white socks than green socks in the room. However,
there are too many socks in the room to count them all, so you want to take a sample
of socks. Based on this sample of socks, you will draw a conclusion about whether
there are more white socks than green socks. After you collect your sample, you will need to calculate inferential statistics to determine whether the colours chosen in your sample likely reflect the colours of socks in the entire room or whether your results were due to chance.
What factors will determine whether the colours in the sample of socks adequately represent the colours of the entire room? Sample size. If you only pick
two socks, they would probably not represent the entire room. The larger the sample
is, the more representative the sample will be of the entire room and the more likely
the inferential statistics will find a significant result. This is why when conducting
experiments, the larger the sample is, the better: with large samples, the results will
more likely reflect the entire population.
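The sock example above can be sketched as a small simulation. The 60/40 colour split, the room size, and the random seed are illustrative assumptions, not figures from the text:

```python
# Simulating the sock example: estimate the share of white socks in the
# "room" (population) from samples of different sizes.
# The 60/40 split, room size and seed are illustrative assumptions.
import random

random.seed(42)
room = ["white"] * 600 + ["green"] * 400   # the whole room: 60% white

def white_share(n):
    """Proportion of white socks in a random sample of n socks."""
    sample = random.sample(room, n)
    return sample.count("white") / n

for n in (2, 30, 300):
    # larger samples tend to land closer to the true proportion of 0.6
    print(n, white_share(n))
```

Running this a few times shows the point in the text: a sample of 2 swings wildly, while a sample of 300 stays close to the room's true 60% share.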
Inferential statistics is the mathematics and logic of how this generalization
from sample to population can be made. The fundamental question is: can we infer the
population's characteristics from the sample's characteristics? Descriptive statistics
remains local to the sample, describing its central tendency and variability, while
inferential statistics focuses on making statements about the population.
Unlike descriptive statistics, inferential statistics provide ways of testing the
reliability of the findings of a study and "inferring" characteristics from a small group
of participants or people (your sample) onto much larger groups of people (the
population). Descriptive statistics just describe the data, but inferential statistics let you say what the data mean.
7.2
SAMPLING ERROR:
In statistics, sampling error is incurred when the statistical characteristics of a
population are estimated from a subset, or sample, of that population. Since the
sample does not include all members of the population, statistics on the sample, such as means and quantiles, generally differ from parameters on the entire population.
For example:
If one measures the height of a thousand individuals from a country of one
million, the average height of the thousand is typically not the same as the average
height of all one million people in the country. Since sampling is typically done to
determine the characteristics of a whole population, the difference between the sample
and population values is considered a sampling error.
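The height example can be mimicked with synthetic data: the sampling error is simply the sample mean minus the population mean. The population size, mean and spread used below are illustrative assumptions:

```python
# Sketch of sampling error with synthetic heights: the difference between
# a sample mean (the statistic) and the population mean (the parameter).
# All numbers here are illustrative assumptions.
import random
from statistics import mean

random.seed(0)
# a "country" of 100,000 people with heights around 170 cm
population = [random.gauss(170, 10) for _ in range(100_000)]
sample = random.sample(population, 1000)   # measure only 1,000 of them

pop_mean = mean(population)       # the parameter
sample_mean = mean(sample)        # the statistic
sampling_error = sample_mean - pop_mean
print(round(pop_mean, 2), round(sample_mean, 2), round(sampling_error, 2))
```

The printed difference is small but almost never exactly zero, which is the definition given above: the sample, however carefully drawn, rarely matches the population value exactly.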
Population and Samples:
A population is the entire group to which we want to generalize our results. A sample is a subset of the population. The population might be all adult humans, but our sample might be a group of 30 friends and relatives.
Types of sampling errors:
1. Random sampling
2. Bias problems
3. Non-sampling error
1. Random Sampling:
In statistics, sampling error is the error caused by observing a sample instead of the whole population. The sampling error can be found by subtracting the value of a parameter from the value of a statistic. In nursing research, a sampling error is the difference between a sample statistic used to estimate a population parameter and the actual but unknown value of the parameter. (Burns and Grove, 2009)
Parameters and statistics:
A numerical summary of a population is called a parameter, while the same
summary computed from a sample is called a statistic.

2. Bias Problems:

3. Non-sampling Error:
Non-sampling error is a catch-all term for the deviations from the true value that are
not a function of the sample chosen, including various systematic errors and any
random errors that are not due to sampling. Non-sampling errors are much
harder to quantify than sampling error.
Example of non-sampling error:
Answers given by respondents may be influenced by the desire to impress an
interviewer.
Characteristics of sampling error:
1. Generally decreases as the sample size increases (but not proportionally).
2. Depends on the size of the population under study.
3. Depends on the variability of the characteristic of interest in the population.
4. Can be accounted for and reduced by an appropriate sampling plan.
5. Can be measured and controlled in probability sample surveys.
7.3
NULL HYPOTHESIS:
Before defining the term null hypothesis, it is necessary that we know how
hypotheses about population parameters are written, for example:
μ > 22
σ² ≠ 25
μ1 = μ2
μ1 − μ2 = 0
Null Hypothesis:
The hypothesis to be tested in a test of hypothesis is called the null hypothesis. It
is a hypothesis which is tested for possible rejection or nullification under the
assumption that it is true. It is denoted by H0 and usually contains an equal sign.
For example, if we want to test that the population mean is 80, then we write:
H0: μ = 80
Another definition of Null-Hypothesis:
Null hypothesis is a type of hypothesis used in statistics that proposes that no
statistical significance exists in a set of given observations.
The null hypothesis attempts to show that no variation exists between
variables, or that a single variable is no different than zero. It is presumed to be true
until statistical evidence nullifies it in favor of an alternative hypothesis.
Examples:
Hypothesis:
The loss of my socks is due to alien burglary. (Alien burglary means
unfamiliar theft).
Null Hypothesis:
The loss of my socks is nothing to do with alien burglary.
Alternative Hypothesis:
The loss of my socks is due to alien burglary. In statistics, the only way of
supporting your hypothesis is to refute the null hypothesis. Rather than trying to prove
your idea (the alternative hypothesis) right, you must show that the null hypothesis is
likely to be wrong: you have to refute or nullify the null hypothesis.
7.4
TESTS OF SIGNIFICANCE:
Once sample data has been gathered through an observational study or
experiment, a test of significance assesses the strength of the evidence against the
null hypothesis. If we conclude "do not reject H0", this does not necessarily mean that
the null hypothesis is true; it only suggests that there is not sufficient evidence against
H0 in favor of Ha. Rejecting the null hypothesis, then, suggests that the alternative
hypothesis may be true.
Example
Suppose a test has been given to all high school students in a certain state. The
mean test score for the entire state is 70, with standard deviation equal to 10.
Members of the school board suspect that female students have a higher mean score
on the test than male students, because the mean score from a random sample of 64
female students is equal to 73. Does this provide strong evidence that the overall
mean for female students is higher?
The null hypothesis H0 claims that there is no difference between the mean
score for female students and the mean for the entire population, so that μ = 70. The
alternative hypothesis claims that the mean for female students is higher than the
entire student population mean, so that μ > 70.
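The arithmetic of this example can be checked with a short computation. This is a minimal sketch using only Python's standard library; the helper name `z_test_upper` is ours, not from any statistics package.

```python
import math

def z_test_upper(sample_mean, pop_mean, pop_sd, n):
    """One-sided (upper-tail) z-test for a sample mean when the
    population standard deviation is known."""
    se = pop_sd / math.sqrt(n)             # standard error of the mean
    z = (sample_mean - pop_mean) / se      # how many SEs above the population mean
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail p-value of the standard normal
    return z, p

# The school-board example: state mean 70, SD 10, 64 female students averaging 73.
z, p = z_test_upper(73, 70, 10, 64)
print(round(z, 2), round(p, 4))  # z = 2.4, p ≈ 0.0082
```

A p-value near 0.008 is well below the conventional 0.05 level, so the sample does provide strong evidence that the overall mean for female students is higher.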
Steps in Testing for Statistical Significance
1. State the research hypothesis.
2. State the null hypothesis.
3. Select a probability of error level (alpha level).
4. Select and compute the test for statistical significance.
5. Interpret the results.
1) State the research hypothesis:
A research hypothesis may be stated in general terms, with a direction, or with a
magnitude. For example:
General: The length of the job training program is related to the rate of job placement
of trainees. Direction: The longer the training program, the higher the rate of job
placement of trainees.
Magnitude: Longer training programs will place twice as many trainees into jobs as
shorter programs.
General: Graduate Assistant pay is influenced by gender.
Direction: Male graduate assistants are paid more than female graduate assistants.
Magnitude: Female graduate assistants are paid less than 75% of what male graduate
assistants are paid.
7.5
LEVELS OF SIGNIFICANCE:
In hypothesis testing, the significance level is the criterion used for rejecting
the null hypothesis. Commonly used levels are the 0.05 level (5% level) and the 0.01
level (1% level), although the choice of levels is largely subjective. The lower the
significance level, the more the data must diverge from the null hypothesis to be
significant. Therefore, the 0.01 level is more conservative than the 0.05 level.
Symbols:
The Greek letter alpha (α) is sometimes used to indicate the significance level.
The above explanation shows that the significance level is a value associated with a
statistical test which indicates the probability of obtaining those or more
extreme results. This value can be interpreted as the probability of obtaining those
results if the null hypothesis were true (when sampling is random), or as the
probability of obtaining those results by chance alone (when sampling is less than
random). The value of this probability (also known as p, the p-value, alpha, or the
Type I error rate) runs between 0 and 1. The closer to 0, the lower the probability of the
results being found if the null hypothesis were true, or the lower the probability of the
result being a chance result. As stated in the beginning, significance levels are used to
reject the null hypothesis that, for example, there is no correlation between variables,
there is no difference between groups, or there is no change between treatments.
A significance level of 0.05 is conventionally used in the social sciences,
although probabilities as high as 0.10 are also used. Probabilities greater than 0.10 are
rarely used. A significance level of 0.05, for example, indicates that there is a 5%
probability that the results are due to chance. A significance level of 0.10 indicates a 10%
probability that the results are due to chance. Thus, using significance levels above
0.10 is rather risky, while using lower significance levels is safer.
History:
The present-day concept of statistical significance originated with Ronald Fisher
when he developed statistical hypothesis testing, which he described as tests of
significance, in his 1925 publication.
7.6
STATISTICAL ERRORS:
Even in the best research project, there is always a possibility that the
researcher will make a mistake regarding the relationship between the two variables.
This mistake is called statistical error.
In statistical test theory the notion of statistical error is an integral part of
hypothesis testing. The test requires an unambiguous statement of a null hypothesis,
which usually corresponds to a default "state of nature", for example "this person is
healthy", "this accused is not guilty" or "this product is not broken". An alternative
hypothesis is the negation of null hypothesis, for example, "this person is not
healthy", "this accused is guilty" or "this product is broken". The result of the test may
be negative relative to the null hypothesis (not healthy, guilty, broken) or positive
(healthy, not guilty, not broken).
Type-I Error:
The first is called a Type I error. This occurs when the researcher assumes that
a relationship exists when in fact the evidence is that it does not. In a Type I error, the
researcher should accept the null hypothesis and reject the research hypothesis, but
the opposite occurs. The probability of committing a Type I error is called alpha (α).
A type I error, also known as an error of the first kind, occurs when the null
hypothesis (H0) is true, but is rejected. It is asserting something that is absent, a false
hit. A type I error may be compared with a so-called false positive (a result that
indicates that a given condition is present when it actually is not present) in tests
where a single condition is tested for. Type I errors are philosophically a focus of
skepticism and Occam's razor. A Type I error occurs when we believe a falsehood. In
terms of folk tales, an investigator may be "crying wolf" without a wolf in sight
(raising a false alarm) (H0: no wolf).
The rate of the Type I error is called the size of the test and denoted by the
Greek letter α (alpha). It usually equals the significance level of a test. In the case of
a simple null hypothesis, α is the probability of a Type I error. If the null hypothesis is
composite, α is the maximum (supremum) of the possible probabilities of a Type I
error.
Explanation:
A Type I Error is also known as a False Positive or Alpha Error. This happens
when you reject the Null Hypothesis even if it is true. The Null Hypothesis is simply a
statement that is the opposite of your hypothesis. For example, you think that boys are
better in arithmetic than girls. Your null hypothesis would be: "Boys are not better
than girls in arithmetic."
You will make a Type I Error if you conclude that boys are better than girls in
arithmetic when in reality, there is no difference in how boys and girls perform. In this
case, you should accept the null hypothesis since there is no real difference between
the two groups when it comes to arithmetic ability. If you reject the null hypothesis
and say that one group is better, then you are making a Type I Error.
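The claim that alpha equals the Type I error rate can be checked by simulation: draw many samples from a population where the null hypothesis really is true, and count how often a test at the 0.05 level rejects it anyway. A rough sketch using only the standard library (the sample size, seed, and trial count are arbitrary choices):

```python
import math
import random

def type_i_error_rate(n=30, trials=20000, seed=1):
    """Draw samples from N(0, 1), where H0 (mu = 0) is TRUE, and count how
    often a two-sided z-test at the 0.05 level rejects it anyway."""
    random.seed(seed)
    rejections = 0
    for _ in range(trials):
        sample = [random.gauss(0, 1) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = sample_mean / (1 / math.sqrt(n))  # known sigma = 1, so SE = 1/sqrt(n)
        if abs(z) > 1.96:                     # 1.96 is the two-sided 0.05 cutoff
            rejections += 1                   # a false positive: a Type I error
    return rejections / trials

rate = type_i_error_rate()
print(rate)  # close to 0.05: alpha is the Type I error rate
```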
Type-II Error
The second is called a Type II error. This occurs when the researcher assumes
that a relationship does not exist when in fact the evidence is that it does. In a Type II
error, the researcher should reject the null hypothesis and accept the research
hypothesis, but the opposite occurs. The probability of committing a Type II error is
called beta.
Generally, reducing the possibility of committing a Type I error increases the
possibility of committing a Type II error and vice versa, reducing the possibility of
committing a Type II error increases the possibility of committing a Type I error.
Researchers generally try to minimize Type I errors, because when a
researcher assumes a relationship exists when one really does not, things may be
worse off than before. In Type II errors, the researcher misses an opportunity to
confirm that a relationship exists, but is no worse off than before.
Type II Error is a statistical term used within the context of hypothesis testing
that describes the error that occurs when one accepts a null hypothesis that is
actually false. The error rejects the alternative hypothesis, even though the observed
result did not occur due to chance.
A type II error accepts the null hypothesis, although the alternative hypothesis
is the true state of nature. It confirms an idea that should have been rejected, claiming
that two observances are the same, even though they are different.
Example:
An example of a type II error would be a pregnancy test that gives a negative
result, even though the woman is in fact pregnant. In this example, the null hypothesis
would be that the woman is not pregnant, and the alternative hypothesis is that she is
pregnant.
In other words, a type II error, also known as an error of the second kind, occurs
when the null hypothesis is false, but erroneously fails to be rejected. It is failing to
assert what is present, a miss. A type II error may be compared with a so-called false
negative (where an actual 'hit' was disregarded by the test and seen as a 'miss') in a
test checking for a single condition with a definitive result of true or false. A Type II
error is committed when we fail to believe a truth. In terms of folk tales, an
investigator may fail to see the wolf ("failing to raise an alarm"). Again, Ho: no wolf.
The rate of the Type II error is denoted by the Greek letter β (beta) and is related
to the power of a test (which equals 1 − β).
What we actually call a Type I or Type II error depends directly on the null
hypothesis. Negation of the null hypothesis causes Type I and Type II errors to switch
roles.
The goal of the test is to determine if the null hypothesis can be rejected. A
statistical test can either reject (prove false) or fail to reject (fail to prove false) a null
hypothesis, but never prove it true (i.e., failing to reject a null hypothesis does not
prove it true).
Explanation:
A Type II Error is also known as a False Negative or Beta Error. This happens
when you accept the Null Hypothesis when you should in fact reject it. The Null
Hypothesis is simply a statement that is the opposite of your hypothesis. For example,
you think that dog owners are friendlier than cat owners. Your null hypothesis would
be: "Dog owners are as friendly as cat owners."
You will make a Type II Error if dog owners are actually friendlier than cat
owners, and yet you conclude that both kinds of pet owners have the same level of
friendliness. In this case, you should reject the null hypothesis since there is a real
difference in friendliness between the two groups. If you accept the null hypothesis
and say that both types of pet owners are equally friendly, then you are making a Type
II Error.
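The Type II error rate can be estimated by the same kind of simulation as for Type I errors: draw samples from a population where the null hypothesis is false, and count how often the test fails to reject it. A sketch with an assumed true mean of 0.3 (all numbers here are illustrative, not from the text):

```python
import math
import random

def type_ii_error_rate(true_mu=0.3, n=30, trials=20000, seed=2):
    """Draw samples from N(true_mu, 1), where H0 (mu = 0) is FALSE, and count
    how often a two-sided z-test at the 0.05 level fails to reject it."""
    random.seed(seed)
    misses = 0
    for _ in range(trials):
        sample = [random.gauss(true_mu, 1) for _ in range(n)]
        z = (sum(sample) / n) / (1 / math.sqrt(n))
        if abs(z) <= 1.96:   # failing to reject a false null: a Type II error
            misses += 1
    return misses / trials

beta = type_ii_error_rate()
print(beta, 1 - beta)  # beta, and the power of the test (1 - beta)
```

Raising the sample size n or the true effect true_mu lowers beta, while lowering the significance level raises it, which is exactly the Type I / Type II tradeoff discussed above.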
7.7
DEGREES OF FREEDOM:
In statistics, the number of degrees of freedom is the number of values in the
final calculation of a statistic that are free to vary. Determining this number
is an often overlooked but crucial detail in both the calculation of confidence
intervals and the workings of hypothesis tests.
There is not a single general formula for the number of degrees of freedom for
every inference problem. Instead, there are specific formulas to be used for each type
of procedure in inferential statistics. In other words, the setting that we are working in
will determine how we calculate the number of degrees of freedom.
Determining Degrees of Freedom: A Few Examples
For a moment, suppose that we know the mean of a data set is 25 and that the values
are 20, 10, 50, and one unknown value. To find the mean of a list of data, we add all of
the data and divide by the total number of values. This gives us the formula (20 + 10
+ 50 + x)/4 = 25, where x denotes the unknown. Despite not knowing this value, we can
use some algebra to determine that x = 20.
Let's alter this scenario slightly. Instead, we suppose that we know the mean of
a data set is 25, with values 20, 10, and two unknown values. These unknowns could
be different, so we use two different variables, x and y, to denote them. The resulting
formula is (20 + 10 + x + y)/4 = 25. With some algebra we obtain y = 70 − x. The
formula is written in this form to show that once we choose a value for x, the value
of y is determined. This shows that there is one degree of freedom.
Now we'll look at a sample size of one hundred. If we know that the mean of
this sample data is 20, but do not know the values of any of the data, then there are 99
degrees of freedom. All values must add up to a total of 20 x 100 = 2000. Once we
have the values of 99 elements in the data set, then the last one has been determined.
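The algebra in these examples is easy to mechanize. A small sketch (the helper `missing_value` is our own, invented for illustration):

```python
def missing_value(known, mean, n):
    """With the mean fixed, the last value in a data set is fully determined
    by the others: solve (sum(known) + x) / n = mean for x."""
    return mean * n - sum(known)

# First example from the text: mean 25, values 20, 10, 50 and one unknown.
x = missing_value([20, 10, 50], mean=25, n=4)
print(x)  # 20: no freedom left once the mean and the other 3 values are fixed

# Second example: two unknowns x and y; choosing x determines y = 70 - x.
for x in (0, 15, 40):
    y = missing_value([20, 10, x], mean=25, n=4)
    print(x, y)  # y always equals 70 - x: exactly one degree of freedom
```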
Example
To compute the variance, I first sum the squared deviations from the mean. The
mean is a parameter: it is a characteristic of the variable under examination as a whole
and is part of describing the overall distribution of values. If you know all the
parameters, you can accurately describe the data. The more parameters you know, that
is to say the more you fix, the fewer samples fit this model of the data. If you know
only the mean, there will be many possible sets of data that are consistent with this
model, but if you know the mean and the standard deviation, fewer possible sets of
data fit this model.
So in computing the variance I had first to calculate the mean. When I have
calculated the mean, I could vary any of the scores in the data except for one. If I
leave one score unexamined, it can always be calculated accurately from the rest of the
data and the mean itself. Maybe an example can make this clearer.
I take the ages of a class of students and find the mean. If I fix the mean, how
many of the other scores (there are N of them, remember) could still vary? The answer
is N−1. There are N−1 independent pieces of information that could vary while the
mean is known. These are the degrees of freedom. One piece of information cannot
vary because its value is fully determined by the parameter (in this case the mean) and
the other scores. Each parameter that is fixed during our computations constitutes the
loss of a degree of freedom.
If we imagine starting with a small number of data points and then fixing a
relatively large number of parameters as we compute some statistic, we see that as
more degrees of freedom are lost, fewer and fewer different situations are accounted
for by our model, since fewer and fewer pieces of information could in principle be
different from what is actually observed.
So, to put it very informally, the interest in our data is determined by the
degrees of freedom: if there is nothing that can vary once our parameter is fixed
(because we have so very few data points, maybe just one), then there is nothing to
investigate. Degrees of freedom can be seen as linking sample size to explanatory
power.
The Standard Deviation is a measure of how spread out numbers are.
Its symbol is σ (the Greek letter sigma).
The formula is easy: it is the square root of the Variance.
To calculate the variance, follow these steps:
Work out the Mean (the simple average of the numbers).
Then for each number: subtract the Mean and square the result (the squared
difference).
Then work out the average of those squared differences.
Let us suppose we have five values, i.e. 600, 470, 170, 430 and 300.
Mean = (600 + 470 + 170 + 430 + 300)/5 = 1970/5 = 394
Variance:
σ² = (206² + 76² + (−224)² + 36² + (−94)²)/5
= (42,436 + 5,776 + 50,176 + 1,296 + 8,836)/5
= 108,520/5
= 21,704
Standard deviation:
σ = √21,704 ≈ 147.32
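The worked example can be verified in a few lines. This sketch follows the three steps literally and uses the population variance (dividing by n, as the text does), not the sample variance (dividing by n − 1):

```python
import math

def population_variance(values):
    """Follow the three steps from the text: mean, squared differences, average."""
    mean = sum(values) / len(values)                   # step 1: the mean
    squared_diffs = [(v - mean) ** 2 for v in values]  # step 2: squared differences
    return sum(squared_diffs) / len(values)            # step 3: their average

values = [600, 470, 170, 430, 300]
variance = population_variance(values)
sd = math.sqrt(variance)
print(variance, round(sd, 2))  # 21704.0 and 147.32
```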
UNIT-8:
SELECTED TESTS OF SIGNIFICANCE
8.1
T-TEST:
Definition:
i) A t-test helps you compare whether two groups have different average
values (for example, whether men and women have different average
heights).
ii) A t-test asks whether a difference between two groups' averages is unlikely to
have occurred because of random chance in sample selection. A difference
is more likely to be meaningful and real if (a) the difference between
the averages is large, (b) the sample size is large, and (c) responses are
consistently close to the average values and not widely spread out.
iii) Statistical significance and effect size are the primary outputs of the t-test.
Statistical significance indicates whether the difference between sample averages is
likely to represent an actual difference between the populations, and the effect size
indicates whether that difference is large enough to be practically meaningful.
The one-sample t-test is similar to the independent-samples t-test, except it
is used to compare one group's average value to a single number x. For practical
purposes you can look at the confidence interval around the average value to gain this
same information.
The paired t-test is used when each observation in one group is paired with a
related observation in the other group. For example, do Kansans spend a different
amount of money on movies in January than in February, where each respondent is
asked about both their January and their February spending? In effect, a paired t-test
subtracts each respondent's January spending from their February spending (yielding
the increase in spending), then takes the average of all those increases in spending and
looks to see whether that average is statistically significantly greater than zero (using
a one-sample t-test).
The ranked independent t-test asks a similar question to the typical unranked
test, but it is more robust to outliers (a few bad outliers can make the results of an
unranked t-test invalid).
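The paired procedure just described can be sketched directly: subtract each respondent's first value from their second, then run a one-sample t-test on the differences. The spending figures below are made up for illustration, and `paired_t` is our own helper, not a library function:

```python
import math

def paired_t(before, after):
    """Paired t-statistic: the mean of the per-respondent differences
    divided by the standard error of that mean (df = n - 1)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    se = math.sqrt(var_d / n)
    return mean_d / se

# Hypothetical January vs February movie spending for 6 respondents.
january = [12, 15, 9, 20, 14, 11]
february = [14, 18, 10, 24, 15, 14]
print(round(paired_t(january, february), 2))
```

A large t value here would indicate that the average increase in spending is unlikely to be zero, which is exactly the one-sample test on differences described above.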
T-test (Independent Samples)
Dollars spent on movies per month: Statwing represents t-test results as
distribution curves. Assuming there is a large enough sample size, the difference
between these samples probably represents a real difference between the populations
from which they were sampled.
Example:
Let's say you are curious about whether New Yorkers and Kansans spend a
different amount of money per month on movies. It is impractical to ask every New
Yorker and Kansan about their movie spending, so instead you ask a sample of each,
maybe 300 New Yorkers and 300 Kansans, and the averages are 14 dollars and 18
dollars. The t-test asks whether that difference is probably representative of a real
difference between Kansans and New Yorkers generally, or whether it is most likely
a meaningless statistical fluke.
The standard error of the difference between the sample means is:

SE(X̄T − X̄C) = √(varT/nT + varC/nC)

Remember that the variance is simply the square of the standard deviation.
The final formula for the t-test is:

t = (X̄T − X̄C) / √(varT/nT + varC/nC)
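As a sketch of how the formula is used on the movie-spending example: the text gives only the two means (18 and 14 dollars) and the sample sizes (300 each), so the variances below (100, i.e. a standard deviation of 10 dollars) are assumed purely for illustration.

```python
import math

def independent_t(mean1, var1, n1, mean2, var2, n2):
    """Unpaired t-statistic: the difference in sample means divided by
    the standard error of that difference."""
    se = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / se

t = independent_t(mean1=18, var1=100, n1=300,
                  mean2=14, var2=100, n2=300)
print(round(t, 2))  # about 4.9
```

A t value near 4.9 is far beyond the roughly 2 needed for significance at the 0.05 level with samples this large, so under these assumed variances the 4-dollar gap would be very unlikely to be a statistical fluke.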
8.2
CHI-SQUARE (X²):
The X²-distribution (X is the Greek letter chi, pronounced "ki") was first
obtained in 1875 by F.R. Helmert, a German scientist. Later, in 1900, Karl Pearson
showed that as n increases to infinity, a discrete multinomial distribution may be
transformed and made to approach a chi-square distribution. This approximation has
broad application, such as a test of goodness of fit, a test of independence, and a test
of homogeneity.
The chi-square distribution contains only one parameter, called the number of
degrees of freedom.
Chi-Square Distribution:
Let Z1, Z2, ..., Zn be normally and independently distributed variables with
zero mean and unit variance, N(0, 1). Then the random variable expressed by the
quantity

X² = Z1² + Z2² + ... + Zn²

has a chi-square distribution with n degrees of freedom.
Some properties of the X² distribution:
1. The X² distribution ranges from 0 to infinity.
2. The mean of the X² distribution is equal to its degrees of freedom, i.e. n.
3. The variance of the X² distribution is equal to twice the degrees of freedom, i.e. 2n.
4. As the degrees of freedom n increase, the X² distribution approaches the normal
distribution.
Uses of the X² Distribution:
1. As a test of goodness of fit.
2. As a test of independence.
3. As a test of homogeneity.
To see whether there is evidence of small or large differences, the test statistic to use
is:

X² = Σ (i = 1 to k) (oi − ei)²/ei,  where ei = npi
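A minimal sketch of this goodness-of-fit statistic, applied to a hypothetical example: 120 rolls of a die, where under H0 (the die is fair) each face is expected 20 times.

```python
def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: the sum over categories of
    (o_i - e_i)^2 / e_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for faces 1..6 in 120 rolls of a die.
observed = [25, 17, 15, 23, 24, 16]
expected = [20] * 6          # e_i = n * p_i = 120 * (1/6)
x2 = chi_square_gof(observed, expected)
print(x2)  # 5.0
```

The result is compared against the chi-square critical value with k − 1 = 5 degrees of freedom; 5.0 is below 11.07 (the 0.05 cutoff at 5 df), so these counts would be consistent with a fair die.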
For testing independence, the observed frequencies in a 2x2 contingency table are
laid out as follows:

          B1      B2      Total
A1        O11     O12     (A1)
A2        O21     O22     (A2)
Total     (B1)    (B2)    n

The expected frequency for each cell is:

eij = (Ai)(Bj)/n
That is, under H0 (the classifications are independent), the expected frequency
in any cell is equal to the product of the marginal totals common to that cell divided by
the total number of observations.
If our hypothesis of independence is true, the differences between observed and
expected frequencies are small and are attributed to sampling error. Large differences
arise if the hypothesis is false. The chi-square statistic provides a means for deciding
whether the differences are large or small overall. Hence the statistic to use is:
X² = Σi Σj (oij − eij)²/eij

with (r − 1)(c − 1) degrees of freedom, where r represents the number of rows and c
the number of columns. A large value of X² indicates that the null hypothesis is false.
The procedure for testing the null hypothesis of independence in a
contingency table is given below:
i) State the hypotheses. H0: the classifications are independent.
H1: the classifications are not independent.
ii) Choose a significance level α.
iii) Compute the test statistic
X² = Σi Σj (oij − eij)²/eij, where eij = (Ai)(Bj)/n.
iv) Decide as below:
(i) Accept H0 if X² is at most the tabulated X² value with (r − 1)(c − 1)
degrees of freedom.
(ii) Reject H0 if X² exceeds the tabulated X² value with (r − 1)(c − 1)
degrees of freedom.
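The whole procedure can be sketched in a few lines. The 2x2 table below is hypothetical, and the helper functions are our own:

```python
def expected_frequencies(table):
    """e_ij = (row total)(column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """X^2 = sum over cells of (o_ij - e_ij)^2 / e_ij,
    with (r - 1)(c - 1) degrees of freedom."""
    expected = expected_frequencies(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

# Hypothetical 2x2 table: rows A1, A2; columns B1, B2.
table = [[30, 20],
         [20, 30]]
x2 = chi_square_independence(table)
print(x2)  # 4.0
```

Here every expected cell is 25, so X² = 4.0, which exceeds 3.84 (the critical value at (2−1)(2−1) = 1 degree of freedom and α = 0.05), so H0 of independence would be rejected.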
8.3
REGRESSION:
In statistics, regression analysis is a statistical technique for estimating the
relationships among variables. There are two basic types of regression:
(i) Linear regression
(ii) Multiple regression.
Linear regression uses one independent variable to explain and/or predict the
outcome of Y, while multiple regression uses two or more independent variables to
predict the outcome. The general form of each type of regression is:
Linear Regression: Y = a + bX + u
Multiple Regression: Y = a + b1X1 + b2X2 + b3X3 + ... + btXt + u
Where:
Y = the variable that we are trying to predict
X = the variable that we are using to predict Y
a = the intercept
b = the slope
u = the regression residual.
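A minimal sketch of fitting the linear form Y = a + bX + u by least squares. The data are made up to lie exactly on Y = 2 + 3X, so the fit recovers the intercept and slope:

```python
def linear_regression(xs, ys):
    """Least-squares estimates of the intercept a and slope b in Y = a + bX + u."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope: covariance of X and Y divided by the variance of X
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x  # the fitted line passes through the point of means
    return a, b

# Hypothetical data lying exactly on Y = 2 + 3X.
xs = [1, 2, 3, 4, 5]
ys = [5, 8, 11, 14, 17]
a, b = linear_regression(xs, ys)
print(a, b)  # intercept 2.0, slope 3.0
```

With real (noisy) data the residuals u would be nonzero, and the same two formulas still give the best-fitting line in the least-squares sense.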