
Professional Education: Assessment and Evaluation of Learning

Basic Concepts
• Test - an instrument designed to measure any characteristic, quality, ability, skill or knowledge
• Measurement - a process of quantifying the degree to which someone or something possesses a given trait (i.e. quality, characteristics,
feature)
• Assessment - a process of gathering and organizing quantitative or qualitative data into an interpretable form to have a basis for judgment
or decision-making
• Evaluation - a process of systematic collection and analysis of both qualitative and quantitative data in order to make some judgment or
decision; involves judgment about the desirability of changes in students

Assessment
• Traditional Assessment – refers to the pen-and-paper mode of assessing any quality, ability, skill or knowledge (Ex. standardized and teacher-made tests)
• Alternative Assessment
◦ Performance-based Assessment - a mode of assessment that requires the students to perform a significant task that is relevant to a
task outside the school (Ex. practical test, oral and aural tests, projects)
◦ Portfolio Assessment - a process of gathering multiple indicators of student progress to support course goals in dynamic, ongoing
and collaborative process
• Authentic Assessment - refers to the use of assessment methods that simulate true-to-life situations

Purposes of Assessment

• Assessment for learning (Ex. placement, formative, diagnostic)


• Assessment of learning (Ex. summative)
• Assessment as learning (Ex. self-assessment; students monitor and reflect on their own learning)

Principles of High Quality Assessment

A. Clear and appropriate learning targets G. Practicality and efficiency


B. Appropriate methods H. Assessment should be a continuous process
C. Balance I. Authenticity
D. Validity J. Communication
E. Reliability K. Positive consequences
F. Fairness L. Ethics

Performance-based Assessment
• A process of gathering information about student’s learning through actual demonstration of essential and observable skills and creation of
products that are grounded in real world contexts and constraints

Types of Performance-based Task

• Demonstration-type - requires no product (Ex. cooking demonstrations, entertaining tourists)


• Creation-type - requires tangible products (Ex. project plan, research paper, project flyers)

Criteria in Selecting a Task

A. Generalizability E. Feasibility
B. Authenticity F. Scorability
C. Multiple foci G. Fairness
D. Teachability

How?

• Identify the competency that has to be demonstrated by the students with or without a product.
• Describe the task to be performed by the students either individually or as a group, the resources needed, time allotment and other
requirements to be able to assess the focused competency.
• Develop a scoring rubric reflecting the criteria, levels of performance and the scores.

Portfolio Assessment
• A purposeful, ongoing, dynamic, and collaborative process of gathering multiple indicators of the learner’s growth and development
• Also performance-based but more authentic than any other performance-based task

Page 1
Principles of Portfolio Assessment

• Content principle – should reflect important subject matter


• Learning principle – should enable students to become more active learners
• Equity principle – should allow students to demonstrate their learning styles and multiple intelligences

Types of Portfolios

Portfolios come in three types:


• Working portfolio – a collection of a student’s day-to-day works which reflect his/her learning
• Show portfolio – a collection of a student’s best works
• Documentary portfolio – a combination of a working and a show portfolio

Steps in Portfolio Development

1. Set Goals
2. Collect Evidences
3. Select
4. Organize
5. Reflect
6. Evaluate
7. Exhibit

Rubrics
• A measuring instrument used in rating performance-based tasks
• Offers a set of guidelines or descriptions in scoring different levels of performance or qualities of products of learning

Similarity of Rubric with Other Scoring Instruments

A rubric is a modified checklist and rating scale.
• Checklist – shows the observed traits of a work or performance
• Rating Scale – shows the degree of quality of a work or performance
• Rubric – combines both: it shows the observed traits of a work or performance and the degree of its quality

Types of Rubrics

• Holistic Rubric – describes the overall quality of a performance or product; only one rating is given to the entire work or performance
• Analytic Rubric – describes the quality of a performance or product in terms of the identified dimensions and/or criteria, which are rated independently to give a better picture of the quality of the work or performance

Important Elements of a Rubric

Whether holistic or analytic, the rubric should have the following information
• Competency to be tested – this should be a behavior that requires either a demonstration or creation of products of learning
• Performance task – the task should be authentic, feasible, and have multiple foci
• Evaluative criteria and their indicators – these should be made clear using observable traits
• Performance levels – these levels could vary in number from 3 or more
• Qualitative and quantitative descriptions of each performance level – these descriptions should be observable to be measurable

Tests
Purposes/Uses of Tests

• Instructional (Ex. grouping learners for instruction within a class, identifying learners who need corrective and enrichment experiences,
assigning grades)
• Guidance (Ex. preparing information/data to guide conferences with parents about their children, determining interests in types of
occupations not previously considered or known by the students)
• Administrative (Ex. determining emphasis to be given to the different learning areas in the curriculum, determining appropriateness of the
school curriculum for students of different levels of ability)

Types of Tests

According to what it measures (Purpose)
• Educational Test – aims to measure the results of instruction; administered after the instructional process (Ex. Achievement Test)
• Psychological Test – aims to measure a student's intelligence or mental ability largely without reference to what the student has learned; measures intangible aspects of an individual; administered before the instructional process (Ex. Aptitude Test, Personality Test, Intelligence Test)

According to how it is interpreted (Interpretation)
• Norm-Referenced Test – the result is interpreted by comparing one student with other students; some will really pass; there is competition for a limited percentage of high scores; describes a student's performance compared to others
• Criterion-Referenced Test – the result is interpreted by comparing a student against a set of criteria; all or none may pass; there is NO competition for a limited percentage of high scores; describes a student's mastery of the course objectives

According to the scope of the test (Scope and Content)
• Survey Test – covers a broad range of objectives; measures general achievement in certain subjects; constructed by trained professionals
• Mastery Test – covers a specific learning objective; measures fundamental skills and abilities; typically constructed by the teacher

According to level of difficulty and time allotment (Time Limit and Level of Difficulty)
• Power Test – consists of items of increasing level of difficulty taken with ample time; measures a student's ability to answer more and more difficult items
• Speed Test – consists of items with the same level of difficulty taken with a time limit; measures a student's speed and accuracy in responding

According to manner of administration
• Individual Test – given to one student at a time; mostly given orally or requires actual demonstration of a skill; offers many opportunities for clinical observation and the chance to follow up the examinee's responses in order to clarify
• Group Test – given to many individuals at the same time; usually a pencil-and-paper test; lacks insights about the examinee; the same amount of time is needed to gather information from each student (i.e. efficient)

According to language mode
• Verbal Test – words are used by students in attaching meaning to or responding to test items
• Non-Verbal Test – pictures or symbols are used by students in attaching meaning to or responding to test items

According to who constructed the test and who can take it (Construction)
• Standardized Test – made by an expert and tried out, so it can be used with a wider group; covers a broad range of content in a subject area; uses mainly multiple choice; items are screened and the best are chosen for the final instrument; can be scored by a machine; interpretation of results is usually norm-referenced
• Informal Test – made by the classroom teacher and not tried out; covers a narrow range of content; various types of items are used; the teacher picks or writes items as needed for the test; scored by the teacher; interpretation of results is usually criterion-referenced

According to the degree of influence of the rater on the outcome (Effect of Biases)
• Objective Test – the scorer's personal biases do not affect scoring; worded so that only one answer satisfies the requirement of the statement; little or no disagreement on what is the correct answer
• Subjective Test – affected by the scorer's personal bias, opinion, or judgment; several answers are possible; there may be disagreement on what is the correct answer

According to format
• Selective Test – there are choices for the answer; can be answered quickly; prone to guessing; time consuming to construct (Ex. Multiple Choice, True-False/Alternative Response, Matching Type)
• Supply Test – there are no choices for the answer; lessens the chance of students guessing the correct answer (Ex. Short Answer, Completion Test)
• Essay Test – preparation of items is relatively easy because only a few questions are needed; bluffing is a problem; time consuming to score (Ex. Restricted Response, Extended Response)

Assessment of Affective and Other Non-Cognitive Learning Outcomes

• Social attitudes – concern for the welfare of others, sensitivity to social issues, desire to work toward social improvement
• Scientific attitude – open-mindedness, risk taking and responsibility, resourcefulness, persistence, humility, curiosity
• Academic self-concept – self-perception as a learner in particular subjects (Math, etc.)
• Interests – feelings toward various educational, mechanical, aesthetic, social, recreational, and vocational activities
• Appreciations – feelings of satisfaction and enjoyment expressed toward nature, music, art, literature, vocational activities
• Adjustments – relationship to peers, reaction to praise and criticism, emotional and social stability, acceptability

Affective Assessment Procedures/Tools

• Observational Techniques
◦ Anecdotal records
◦ Peer appraisal
▪ Guess-Who technique
▪ Sociometric technique
◦ Self-report technique
◦ Attitude scales
• Personality Assessments
◦ Personality inventories
◦ Creativity tests
◦ Interest inventories

Stages in the Development and Validation of an Assessment Instrument


Phase I: Planning Stage
1. Specify the objectives/skills and content area to be measured
2. Prepare the Table of Specifications
3. Decide on the item format (short answer, etc.)

Phase II: Item Writing Stage
1. Write test items based on the Table of Specifications
2. Consult with experts (subject teacher, test expert) for validation (content) and editing

Phase III: Try Out Stage
1. Conduct the first trial run (50 to 100 students)
2. Score
3. Perform the first item analysis
4. Perform the first option analysis
5. Revise the test items based on the results of the item analysis
6. Conduct the second trial run/field testing
7. Perform the second item analysis
8. Perform the second option analysis
9. Write the final form of the test

Phase IV: Evaluation Stage
1. Administer the final form of the test
2. Establish test validity
3. Establish test reliability

The smaller the Difficulty Index, the more difficult the item.
The larger the Discrimination Index, the more discriminating the item.

Interpreting the Difficulty and Discrimination Indices

Difficulty Index     Interpretation
0.00 – 0.20          Very difficult item
0.21 – 0.40          Difficult item
0.41 – 0.60          Moderately difficult item
0.61 – 0.80          Easy item
0.81 and above       Very easy item

Discrimination Index     Interpretation
-1.00 – -0.60            Questionable item
-0.59 – -0.20            Not discriminating item
-0.19 – 0.20             Moderately discriminating item
0.21 – 0.60              Discriminating item
0.61 – 1.00              Very discriminating item
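The two indices above can be computed directly from scored item responses. Below is a minimal sketch in Python; the upper/lower-group method shown is one common convention, and the data and function names are illustrative, not from the source.

```python
# Item analysis sketch: difficulty and discrimination indices for one item.
# Responses are scored 1 (correct) or 0 (wrong).

def difficulty_index(responses):
    """Proportion of examinees who answered the item correctly."""
    return sum(responses) / len(responses)

def discrimination_index(upper_group, lower_group):
    """Difference between the proportions of the upper and lower groups
    (often the top and bottom 27% of total scorers) who got the item right."""
    p_upper = sum(upper_group) / len(upper_group)
    p_lower = sum(lower_group) / len(lower_group)
    return p_upper - p_lower

item = [1, 1, 1, 0, 1, 0, 1, 1, 0, 0]       # 10 examinees' responses
print(round(difficulty_index(item), 2))      # 0.6 -> moderately difficult item

upper = [1, 1, 1, 1, 0]   # responses of the five highest total scorers
lower = [1, 0, 0, 0, 0]   # responses of the five lowest total scorers
print(round(discrimination_index(upper, lower), 2))  # 0.6 -> discriminating item
```

A negative result from `discrimination_index` would mean more low scorers than high scorers got the item right, flagging the item for review.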

General Suggestions in Test Writing

1. Use your test specifications as guide to item writing.


2. Write more items than needed.
3. Write the test items well in advance of the testing date.
4. Write each test item so that the task to be performed is clearly defined.
5. Write each test item in appropriate reading level.
6. Write each test item so that it does not provide help in answering other items in the test.
7. Write each test item so that the answer is one that would be agreed upon by the experts.
8. Write each test item so that it is in the proper level of difficulty.
9. Whenever a test is revised, recheck its relevance.

Suggestions for Writing Selective Type Tests

1. Multiple Choice
a) The stem of the item should be meaningful by itself and should present a definite problem.
b) The item stem should include as much of the item as possible and should be free of irrelevant material.
c) Use a negatively stated item stem only when significant learning outcomes require it.
d) Highlight negative words in the stem for emphasis.
e) All the alternatives should be grammatically consistent with the stem of the item.
f) An item should only have one correct or clearly best answer.
g) Items used to measure understanding should contain novelty, but beware of too much.
h) All distracters should be plausible.
i) Verbal associations between the stem and the correct answer should be avoided.
j) The relative length of the alternatives should not provide a clue to the answer.
k) The alternatives should be arranged logically.
l) The correct answer should appear in each of the alternative positions an approximately equal number of times, but in random order.
m) Use of special alternatives such as “none of the above” or “all of the above” should be done sparingly.
n) Do not use multiple-choice items when other types are more appropriate.
o) Always have the stem and alternatives on the same page.
p) Break any of these rules when you have a good reason for doing so.
2. Alternative Response
a) Avoid broad statements.
b) Avoid trivial statements.
c) Avoid the use of negative statements, especially double negatives.
d) Avoid long and complex sentences.
e) Avoid including two ideas in one statement unless cause-effect relationships are being measured.
f) If opinion is used, attribute it to some source unless the ability to identify opinion is being specifically measured.
g) True statements and false statements should be approximately equal in length.
h) The number of true statements and false statements should be approximately equal.
i) Start with a false statement, since it is a common observation that the first statement in this type of test is usually true.
3. Matching Type
a) Use only homogeneous material in a single matching exercise.
b) Include an unequal number of responses and premises, and instruct the students that responses may be used once, more than once,
or not at all.
c) Keep the list of items to be matched brief, and place the shorter responses at the right.
d) Arrange the list of responses in logical order.
e) Indicate in the directions the basis for matching the responses and premises.
f) Place all the items for one matching exercise on the same page.

Suggestions for Writing Supply Type Tests

1. Word the item(s) so that the required answer is both brief and specific.
2. Do not take statements directly from textbooks as a basis for short answer items.
3. A direct question is generally more desirable than an incomplete statement.
4. If the item is to be expressed in numerical units, indicate the type of answer wanted.
5. Blanks for answers should be equal in length.
6. Answers should be written before the item number for easy checking.
7. When completion items are to be used, do not have too many blanks. Blanks should be within or at the end of the sentence and not at the
beginning.

Suggestions for Writing Essay Type Tests

1. Restrict the use of essay questions to those learning outcomes that cannot be satisfactorily measured by objective items.
2. Avoid the use of optional questions.
3. Indicate the approximate time limit or the number of points for each question.
4. Prepare the scoring guide (rubric) for the essay questions.

Criteria to Consider when Constructing Good Test Items


Validity

Validity is the degree to which the test measures what it intended to measure. It is the usefulness of the test for a given purpose.

Types of Validity

• Face Validity – done by examining the physical appearance of the test


• Content Validity – done through a careful and critical examination of the objectives of the test so that it reflects the curricular objectives
• Criterion-related Validity – established statistically such that a set of scores revealed by a test is correlated with the scores obtained in
another external predictor or measure
◦ Concurrent Validity – describes the present status of the individual by correlating the sets of scores obtained from measures given
concurrently
◦ Predictive Validity – describes the future performance of an individual by correlating the sets of scores obtained from two measures
given at a longer time interval
• Construct Validity – established statistically by comparing psychological traits or factors that influence scores in a test (e.g. verbal, numerical, spatial)
◦ Convergent Validity – established if the instrument correlates with a measure of a similar trait other than the one it is intended to measure (e.g. a Critical Thinking Test may be correlated with a Creative Thinking Test)
◦ Divergent Validity – established if an instrument can describe only the intended trait and no other traits (e.g. Critical Thinking Test may
not be correlated with Reading Comprehension Test)
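Criterion-related validity (both concurrent and predictive) rests on correlating test scores with an external measure. As a sketch of the underlying computation, here is Pearson's r on made-up scores; the data and the `pearson_r` helper are illustrative, not from the source.

```python
# Criterion-related validity sketch: Pearson correlation between test scores
# and an external criterion measure.
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

test_scores = [35, 42, 50, 28, 45, 38, 47, 30]   # scores on the test
criterion   = [70, 80, 92, 60, 85, 75, 88, 63]   # scores on an external measure

r = pearson_r(test_scores, criterion)
print(round(r, 2))  # a high positive r supports criterion-related validity
```

For concurrent validity the two measures are given at about the same time; for predictive validity the criterion scores are collected after a longer interval.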

Factors Influencing the Validity of Tests

A. Appropriateness of test E. Construction of test items


B. Directions F. Length of the test
C. Reading vocabulary and sentence structures G. Arrangement of items
D. Difficulty of items H. Patterns of answers

Reliability

Reliability refers to the consistency of scores obtained by the same person when retested using the same instrument or one that is parallel to it.

Types of Reliability Measure

• Measure of stability (Test-Retest) – give a test twice to the same group, with a time interval between tests of several minutes to several years
• Measure of equivalence (Equivalent/Parallel Forms) – give parallel forms of the test with a close time interval between forms
• Measure of stability and equivalence (Test-Retest with Equivalent Forms) – give parallel forms of the test with an increased time interval between forms
• Measure of internal consistency (Split Half) – give the test once, then score equivalent halves of the test (e.g. odd- and even-numbered items)
• Measure of internal consistency (Kuder-Richardson) – give the test once, then correlate the proportion/percentage of students passing and not passing a given item
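As a worked example of an internal-consistency estimate, here is a sketch of the Kuder-Richardson formula 20 (KR-20), which combines each item's proportion passing (p) and failing (q) with the variance of total scores. The score matrix is illustrative, not from the source.

```python
# KR-20 sketch on a small 0/1 score matrix (rows = examinees, columns = items).
# KR-20 = (k / (k - 1)) * (1 - sum(p*q) / variance of total scores)
from statistics import pvariance

def kr20(score_matrix):
    k = len(score_matrix[0])                     # number of items
    totals = [sum(row) for row in score_matrix]  # each examinee's total score
    var_total = pvariance(totals)
    pq = 0.0
    for j in range(k):                           # per-item p (passing) and q = 1 - p
        p = sum(row[j] for row in score_matrix) / len(score_matrix)
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_total)

scores = [
    [1, 1, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
print(round(kr20(scores), 2))  # 0.87 -> reasonably consistent set of items
```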

Factors Affecting Reliability

A. Length of the test E. Scorability


B. Difficulty of the test F. Economy
C. Objectivity G. Adequacy
D. Administrability H. Authenticity

Data and Data Measures


Measures of Central Location

Any measure indicating the center of a set of data, arranged in an increasing or decreasing order of magnitude, is called a measure of central
location or a measure of central tendency.

• The arithmetic mean is the sum of the data values divided by the total number of values.
• The median of a set of numbers arranged in order of magnitude is the middle value or the arithmetic mean of the two middle values.
• The mode is defined to be the value that occurs most often in a data set. The mode may not exist, and even if it does exist, it may not be
unique.
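The three measures above can be computed with Python's standard statistics module; the score list is illustrative.

```python
# Measures of central tendency sketch using the standard library.
from statistics import mean, median, mode

scores = [10, 12, 12, 15, 18, 20, 12, 15, 17]

print(mean(scores))    # sum of the values divided by the number of values
print(median(scores))  # middle value once the scores are ordered -> 15
print(mode(scores))    # most frequent value -> 12
```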

Interpretation of Measures of Central Tendency

• The mean (or median or mode) is the point on the scale around which scores tend to group
• It is the average or typical score which represents a given group of subjects
• Given two or more values of central tendency, one can define who performed poor, good, better, or best

Measures of Variability

A measure of variation or dispersion describes how large the differences between the individual scores are.
• The larger the measure of variability, the more spread the scores, and the group is said to be heterogeneous.
• The smaller the measure of variability, the less spread the scores, and the group is said to be homogenous.

Range, Standard Deviation, Quartile Deviation

• The range of a set of data is the difference between the largest and smallest number in the set.


• Given the finite population x1, x2, ..., xN, the population standard deviation is σ = sqrt( Σ (xi − μ)² / N ), where the sum runs from i = 1 to N
• Quartile Deviation: QD = (Q3 − Q1) / 2
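The formulas above can be checked numerically: `statistics.pstdev` implements the population standard deviation, and `statistics.quantiles` (here with the "inclusive" method, one of several textbook conventions for quartiles) gives Q1 and Q3 for the quartile deviation. The data are illustrative.

```python
# Variability sketch: population standard deviation and quartile deviation.
from statistics import pstdev, quantiles

scores = [8, 10, 12, 14, 16, 18, 20, 22, 24]

sd = pstdev(scores)                              # sqrt( sum((x - mean)^2) / N )
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
qd = (q3 - q1) / 2                               # quartile deviation

print(round(sd, 2))  # 5.16
print(qd)            # 4.0
```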

Interpretation of Standard Deviation and Quartile Deviation


• Standard Deviation
◦ The computed value indicates how spread the scores are. One SD below and above the mean includes around 68.26% of the cases.
Measuring off two SD units on each side of the mean includes (between the two points) approximately 95% of the cases. And for
three SD units, 99% of the cases are covered between the two points.
◦ Helps in determining how many students performed about the mean or average, above average or below average
• Quartile Deviation
◦ In a normal distribution, getting the median and adding and subtracting 1 QD on each side includes approximately 50% of the cases.
In a large sample, four (4) QDs on each side of the median include practically all the cases.

Measures of Relative Position

Percentiles
• Percentiles divide the distribution into 100 groups.
• Deciles divide the data set into 10 groups. Deciles are denoted by D1, D2, ..., D9, with the corresponding percentiles being P10, P20, ..., P90.
• Quartiles divide the data set into 4 groups. Quartiles are denoted by Q1, Q2, and Q3, with the corresponding percentiles being P25, P50, and P75.
• The interquartile range, IQR = Q3 - Q1.
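A percentile rank answers a different question than "percentage correct": it locates a score relative to the other scores in the distribution. Here is a sketch using the common convention of counting half of the tied scores; the function name and data are illustrative.

```python
# Percentile-rank sketch: the percentage of scores falling below a given value
# (counting half of the ties, one common textbook convention).

def percentile_rank(scores, value):
    below = sum(1 for s in scores if s < value)
    ties = sum(1 for s in scores if s == value)
    return 100 * (below + 0.5 * ties) / len(scores)

scores = [40, 45, 50, 55, 55, 60, 65, 70, 80, 90]
print(percentile_rank(scores, 60))  # 55.0 -> 60 sits at the 55th percentile
```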

Standard Scores
• The standard score or z-score for a value is obtained by subtracting the mean from the value and dividing the result by the standard
deviation. It represents the number of standard deviations a data value falls above or below the mean.

Stanines
• Standard scores that tell the location of a raw score in a specific segment in a normal distribution which is divided into 9 segments,
numbered from a low of 1 through a high of 9
• Scores falling within the boundaries of these segments are assigned one of these 9 numbers (standard nine)

t-Score
• Tells the location of a score in a normal distribution having a mean of 50 and a standard deviation of 10
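The z-score, T-score, and stanine are all re-expressions of the same location in a normal distribution. The sketch below uses the conventional half-SD stanine boundaries; the functions are illustrative.

```python
# Standard-score sketch: z-score, T-score, and a simple stanine lookup.

def z_score(raw, mean, sd):
    """Number of standard deviations the raw score falls above/below the mean."""
    return (raw - mean) / sd

def t_score(raw, mean, sd):
    """Same location re-expressed on a scale with mean 50 and SD 10."""
    return 50 + 10 * z_score(raw, mean, sd)

def stanine(raw, mean, sd):
    """Segment 1 (low) to 9 (high), using the conventional half-SD cut points."""
    z = z_score(raw, mean, sd)
    cuts = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]
    s = 1
    for c in cuts:
        if z > c:
            s += 1
    return s

# A raw score of 16 in a test with mean 20 and SD 8 (cf. Exercise 24 below):
print(z_score(16, 20, 8))   # -0.5 -> half an SD below the mean
print(t_score(16, 20, 8))   # 45.0
print(stanine(16, 20, 8))   # 4
```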

Measures of Shape

Skewness – deviation from normality in the shape of a distribution


• Positively skewed – most scores are low, there are extremely high scores, and the mean is greater than the mode
• Negatively skewed - most scores are high, there are extremely low scores, and the mean is lower than the mode

Kurtosis – the peakedness or flatness of the distribution
• Mesokurtic – moderate peakedness
• Leptokurtic – more peaked or steeper than a normal distribution
• Platykurtic – flatter than a normal distribution

Other Shapes
• Bimodal – curve with two peaks or modes
• Polymodal – curve with three or more modes
• Rectangular – there is no mode

Assigning Grades/Marks/Ratings
Marking/Grading is the process of assigning value to a performance.

Grades could be in:
A. Percent, such as 70%, 75%, 80%
B. Numbers, such as 1.0, 2.0, 3.0, 4.0
C. Letters, such as A, B, C, D or F
D. Descriptive expressions, such as Outstanding (O), Very Satisfactory (VS), Satisfactory (S), Needs Improvement (NI)
Note: Any symbol can be used provided that it has a uniform meaning to all concerned
Grades could represent:
A. How a student is performing in relation to other students (Norm-referenced grading)
B. The extent to which a student has mastered a particular body of knowledge (Criterion-referenced grading)
C. How a student is performing in relation to a teacher's judgment of his or her potential (Grading in relation to the teacher's judgment)
Grades could be for:
A. Certification – gives assurance that a student has mastered a specific content or achieved a certain level of accomplishment
B. Selection – provides a basis for identifying or grouping students for certain educational paths or programs
C. Direction – provides information for diagnosis and planning
D. Motivation – emphasizes specific material or skills to be learned and helps students understand and improve their performance

Grades could be based on:
A. Examination results or test data
B. Observations of student work
C. Group evaluation activities
D. Class discussion and recitation
E. Homework
F. Notebooks and note-taking
G. Reports, themes and research papers
H. Discussions and debates
I. Portfolios
J. Projects
K. Attitudes
Grades could be assigned by using:
A. Criterion-referenced grading
◦ Based on fixed or absolute standards where grade is assigned based on how a student
has met the criteria or the well-defined objectives of a course that were spelled out in
advance
◦ It is then up to the student to earn the grade he/she wants to receive regardless of how
other students in the class have performed.
B. Norm-referenced grading
◦ Based on relative standards where a student’s grade reflects his/her level of achievement
relative to the performance of other students in the class
◦ In this system, the grade is assigned based on the average of test scores.
C. Point or Percentage grading
◦ The teacher identifies points or percentages for various tests and class activities
depending on their importance. The total of these points will be the grade assigned to the
student.
◦ Example: Written Outputs (50%), Oral Outputs (30%), Special project (20%)
D. Contract grading
◦ Each student agrees to work for a particular grade according to agreed upon standards

Grade – Condition
F – Not coming to class regularly or not turning in the required work
D – Coming to class regularly and turning in the required work on time
C – Coming to class regularly, turning in the required work on time, and receiving a check mark on all assignments to indicate they are satisfactory
B – Coming to class regularly, turning in the required work on time, and receiving a check mark on all assignments except at least three that achieve a check-plus, indicating superior achievement
A – As above, plus a written report on one of the books listed for supplementary reading
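Point or percentage grading reduces to a weighted average of component scores. A sketch using the example weights given earlier (Written Outputs 50%, Oral Outputs 30%, Special project 20%); the component scores themselves are illustrative.

```python
# Point/percentage grading sketch: final grade as a weighted average.

weights = {"written": 50, "oral": 30, "project": 20}   # percent weights
scores = {"written": 88, "oral": 90, "project": 85}    # each on a 0-100 scale

final_grade = sum(weights[c] * scores[c] for c in weights) / 100
print(final_grade)  # 88.0
```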

Guidelines in Grading Students

1. Explain your grading system to the students early in the course and remind them of the grading policies regularly.
2. Base grades on a predetermined and reasonable set of standards.
3. Base grades on as much objective evidence as possible.
4. Base grades on the student’s relative standing compared to classmates.
5. Base grades on a variety of sources.
6. As a rule, do not change grades.
7. Become familiar with the grading policy of your school and with your colleagues' standards.
8. When failing a student, closely follow school procedures.
9. Record grades on report cards and cumulative records.
10. Guard against bias in grading.
11. Keep students informed of their standing in the class.

Conducting Parent-Teacher Conferences

1. Make plans for the conference


2. Begin the conference in a positive manner
3. Present the student’s strong points before describing the areas needing improvement
4. Encourage parents to participate and share information
5. Plan a course of action cooperatively
6. End the conference with a positive comment
7. Use good human relation skills during the conference
8. DOs
a) Be friendly and informal
b) Be positive in your approach
c) Be willing to explain in understandable terms
d) Be willing to listen
e) Be willing to accept parents’ feelings
f) Be careful about giving advice
9. DON’Ts
a) Don’t argue or get angry
b) Don’t ask embarrassing questions
c) Don’t talk about other students, teachers or parents
d) Don’t bluff if you do not know an answer
e) Don’t reject parents’ suggestions
f) Don’t be a know-it-all person

Exercises
1. A class is composed of academically poor students. The distribution would be most likely to be ____________.
A. skewed to the right C. a bell curve
B. leptokurtic D. skewed to the left
2. A negative discrimination index means that ___________.
A. the test item has low reliability
B. the test item could not discriminate between the lower and upper groups
C. more from the lower group answered the test item correctly
D. more from the upper group got the item correctly
3. A number of test items are said to be non-discriminating. What conclusion/s can be drawn?
I. Teaching or learning was very good.
II. The item is so easy that anyone could get it right.
III. The item is so difficult that nobody could get it.
A. II only C. III only
B. I and II D. II and III
4. A positive discrimination index means that
A. the test item has low reliability.
B. the test item could not discriminate between the lower and upper groups.
C. more from the upper group got the item correctly.
D. more from the lower group got the item correctly.
5. A quiz is classified as a
A. diagnostic test. C. summative test.
B. formative test. D. placement test.
6. A teacher would use a standardized test ___________.
A. to serve as a unit test C. to compare her students to national norms
B. to engage in easy scoring D. to serve as a final examination
7. A test item has a difficulty index of 0.81 and a discrimination index of 0.13. What should the test constructor do?
A. Make it a bonus item. C. Reject the item.
B. Retain the item. D. Revise the item.
8. An examinee whose score is within ±1 SD of the mean belongs to which of the following groups?
A. Above average C. Needs improvement
B. Below average D. Average
9. Are percentile ranks the same as percentage correct?
A. It cannot be determined unless scores are given.
B. It cannot be determined unless the number of examinees is given.
C. No
D. Yes
10. Assessment is said to be authentic when the teacher ___________.
A. considers students’ suggestions in testing
B. gives valid and reliable paper-pencil test
C. includes parents in the determination of assessment procedures
D. gives students real-life tasks to accomplish
11. Below is a list of methods used to establish the reliability of an instrument. Which method's reliability estimate is questioned because of practice and familiarity effects?
A. Split half
B. Equivalent forms
C. Test retest
D. Kuder Richardson
12. Beth is one-half standard deviation above the mean of her group in arithmetic and one standard deviation above the mean in
spelling. What does this imply?
A. She is better in arithmetic than in spelling when compared to the group.
B. She excels both in spelling and in arithmetic.
C. In comparison to the group, she is better in spelling than in arithmetic.
D. She does not excel in spelling nor in arithmetic.
13. Concurrent validity requires
A. correlation study. C. item difficulty.
B. item analysis. D. peer consultation.
14. For mastery learning which type of testing will be most fit?
A. Norm-referenced testing
B. Criterion-referenced testing
C. Formative testing
D. Aptitude testing
15. For maximum interaction, which type of questions must a teacher avoid?
A. Rhetorical C. Leading
B. Informational D. Divergent
16. “Group the following items according to phylum” is a thought test item on ________________.
A. inferring
B. classifying
C. generalizing
D. comparing
17. HERE IS A COMPLETION TEST ITEM: THE __________ IS OBTAINED BY DIVIDING THE __________ BY THE
__________. The rule in completion test item construction violated is
A. avoid over mutilated statements
B. avoid grammatical clues to the answer
C. avoid infinite statements
D. the required response should be a single word or a brief phrase
18. If all the passers of the 2006 Licensure Examination for Teachers turn out to be the most effective teachers in the Philippine
school system, it can be said that this LET possesses ______________ validity.
A. construct C. predictive
B. content D. concurrent
19. If all your students in your class passed the pretest, what should you do?
A. Administer the posttest.
B. Go through the lesson quickly in order not to skip any.
C. Go on to the next unit.
D. Go through the unit as usual because it is part of the syllabus.
20. If I favor “assessment for learning”, which will I do most likely?
A. Conduct a pretest, formative and summative test.
B. Teach based on pretest results.
C. Give specific feedback to students.
D. Conduct peer tutoring for students in need of help.
21. If a teacher wants to test students’ ability to organize ideas, which type of test should she formulate?
A. Technical problem type
B. Short answer
C. Multiple-Choice type
D. Essay
22. If the computed range is low, this means that ____________.
A. The students performed very well in the test.
B. The difference between the highest and the lowest score is high.
C. The students performed very poorly in the test.
D. The difference between the highest and the lowest score is low.
23. If your Licensure Examination Test (LET) items sample adequately the competencies listed in the syllabi, it can be said that
the LET possesses __________ validity.
A. concurrent C. content
B. construct D. predictive
24. In a 50-item test where the mean is 20 and the standard deviation is 8, Soc obtained a score of 16. What descriptive rating
should his teacher give him?
A. Average C. Poor
B. Below average D. Above average
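The arithmetic behind item 24 is a single z-score. A sketch (note that the descriptive label attached to a given z is a convention that varies across grading schemes):

```python
# Soc's standing in SD units, from the figures given in item 24.
mean, sd, raw = 20, 8, 16
z = (raw - mean) / sd
print(z)  # -0.5: Soc scored half a standard deviation below the mean
```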
25. In a grade distribution, what does the normal curve mean?
A. All students have average grades.
B. A large number of students have high grades and very few with low grades.
C. A large number of more or less average students and very few students receive low and high grades.
D. A large number of students receive low grades and very few students get high grades.
26. In a Science class test, one group had a range of 15 points within the top quarter and another group on the same
measurement had a range of 30 points. Which statement applies?
A. The first group is more varied than the second group.
B. The first group has a variability twice as great as the second group within the top quarter.
C. The second group has a variability twice as great as the first group within the top quarter.
D. The second group does not differ from the first group in variability.
27. In an entrance examination, Student A’s percentile rank is 25 (P25). Based on this percentile rank, which is likely to happen?
A. Student A will be admitted.
B. Student A has 50-50 percent chance to be admitted.
C. Student A will not be admitted.
D. Student A has 75 percent chances to be admitted.
28. In group norming, the percentile rank of the examinee is
A. dependent on his batch of examinees.
B. independent of his batch of examinees.
C. unaffected by skewed distribution.
D. affected by skewed distribution.
29. In her item analysis, Teacher G found out that more from the upper group got test item no. 6 correctly. What conclusion can be
drawn? The test item has a ________.
A. high difficulty index C. positive discrimination index
B. high facility index D. negative discrimination index
30. In his second item analysis, Teacher H found out that more from the lower group got the test item no. 6 correctly. This means
that the test item __________.
A. has a negative discriminating power C. has a positive discriminating power
B. has a lower validity D. has a high reliability
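Items 29 and 30 both turn on the sign of the discrimination index. A minimal sketch of the computation; the counts are assumed for illustration (the upper/lower 27% grouping commonly used in item analysis is taken as given here):

```python
def discrimination_index(upper_correct, lower_correct, group_size):
    """Proportion correct in the upper group minus proportion correct in the lower group."""
    return (upper_correct - lower_correct) / group_size

# Assumed illustrative counts: 10 examinees in each group.
positive = discrimination_index(8, 3, 10)   # more upper-group successes, as in item 29
negative = discrimination_index(3, 8, 10)   # more lower-group successes, as in item 30
print(positive, negative)  # 0.5 -0.5
```

A positive index means the item separates strong from weak examinees in the expected direction; a negative index flags a defective item.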
31. In test construction, what does TOS mean?
A. Table of Specifications C. Table of Specific Test Items
B. Table of Specifics D. Terms of Specification
32. In the context on the theory on multiple intelligences, what is one weakness of the paper-pencil test?
A. It is not easy to administer.
B. It puts the non-linguistically intelligent at a disadvantage
C. It utilizes so much time.
D. It lacks reliability.
33. In the parlance of test construction, what does TOS mean?
A. Team of Specifications C. Table of Specifications
B. Table of Specifics D. Terms of Specifications
34. In which competency did my students find the greatest difficulty? In the item with a difficulty index of ____________.
A. 0.1 C. 0.9
B. 1.0 D. 0.5
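Item 34 rests on reading the difficulty (facility) index as a proportion: the lower the index, the fewer examinees got the item right. A sketch with the item's own choices as assumed index values:

```python
def difficulty_index(num_correct, num_examinees):
    """Proportion of examinees who answered the item correctly; lower means harder."""
    return num_correct / num_examinees

# The four choices of item 34 treated as indices of four hypothetical items.
indices = {"A": 0.1, "B": 1.0, "C": 0.9, "D": 0.5}
hardest = min(indices, key=indices.get)
print(hardest)  # A: the item with difficulty index 0.1 gave the greatest difficulty
```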
35. In which type of grading do teachers evaluate students’ learning not in terms of grade but by evaluating the students in terms
of expected and mastery skills?
A. Point grading system C. Mastery grading
B. Relative grading D. Grade contracting
36. Is it wise practice to orient our students and parents on our grading system?
A. No, this will court a lot of complaints later.
B. Yes, but orientation must be only for our immediate customers, the students.
C. Yes, so that from the very start student and their parents know how grades are derived.
D. No, grades and how they are derived are highly confidential.
37. It is good to give students challenging and creative learning tasks because
A. development is aided by stimulation. C. development is affected by cultural changes.
B. the development of individuals is unique. D. development is the individual’s choice.
38. Marking on a normative basis means that ___________.
A. the normal curve of distribution should be followed
B. the symbols used in grading indicate how a student achieved relative to other students
C. some get high marks
D. some are expected to fail
39. Median is to point as standard deviation is to __________.
A. area C. distance
B. volume D. square
40. Ms. Celine gives a quiz to her class after teaching a lesson. What does she give?
A. Diagnostic test C. Performance test
B. Summative test D. Formative test
41. NSAT and NEAT results are interpreted against a set mastery level. This means that NSAT and NEAT fall under __________.
A. intelligence test C. criterion-referenced test
B. aptitude test D. norm-referenced test
42. On the first day of class, after initial introductions, the teacher administered a Misconception/Preconception Check. She
explained that she wanted to know what the class as a whole already knew about the Philippines before the Spaniards came. On
what assumption is this practice based?
A. Teachers teach a number of erroneous information in history.
B. A Misconception/Preconception check determines students’ readiness for instruction.
C. The greatest obstacle to new learning often is not the students’ lack of prior knowledge but, rather, the existence of
prior knowledge.
D. History books are replete with factual errors.
43. Other than finding out how well the course competencies were met, Teacher K also wants to know his students’ performance
when compared with other students in the country. What is Teacher K interested to do?
A. Authentic evaluation C. Formative evaluation
B. Norm-referenced evaluation D. Criterion-referenced evaluation
44. Other than the numerical grades found in students’ report cards, teachers are asked to give remarks. On which belief is this
practice based?
A. Numerical grades have no meaning.
B. Giving remarks about each child is part of the assessment task of every teacher.
C. Remarks, whether positive or negative, motivate both parents and learner.
D. Grades do not reflect all developments that take place in every learner.
45. Out of the 3 distracters in a multiple-choice test item, namely B, C, and D, no pupil chose D as an answer. This implies that D
is ____________.
A. an ineffective distracter C. a plausible distracter
B. a vague distracter D. an effective distracter
46. Q1 is to the 25th percentile as the median is to ____________.
A. 40th percentile C. 50th percentile
B. 60th percentile D. 75th percentile
47. Quiz is to formative test as periodic test is to __________
A. criterion-reference test C. norm-reference test
B. summative test D. diagnostic test
48. Range is to variability as mean is to _____________.
A. level of facility C. correlation
B. level of difficulty D. central tendency
49. Referring to assessment of learning, which statement on the normal curve is FALSE?
A. The normal curve may not necessarily apply to homogenous class.
B. When all pupils achieve as expected their learning curve may deviate from the normal curve.
C. The normal curve is sacred. Teachers must adhere to it no matter what.
D. The normal curve may not be achieved when every pupil acquires the targeted competencies.
50. Ruben scored 60 on a percentile-ranked test. This means that __________.
A. Ruben got 60% of the question wrong.
B. 60% of the students who took the test scored higher than Ruben.
C. 60% of the students who took the test scored lower than Ruben.
D. Ruben got 60% of the questions right.
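Item 50 hinges on the definition of percentile rank. A sketch of one common convention (percent of scores strictly below the given score; some texts also count half of the ties); the score list is assumed toy data:

```python
def percentile_rank(score, scores):
    """Percent of scores in the group strictly below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# Assumed toy group: a rank of 60 means 60% of examinees scored lower.
group = [10, 12, 15, 18, 20, 25, 28, 30, 35, 40]
print(percentile_rank(28, group))  # 60.0
```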
51. Standard deviation is to variability as mode is to ___________________.
A. correlation C. discrimination
B. level of difficulty D. central tendency
52. Standard deviation is to variability as mean is to __________.
A. coefficient of correlation C. discrimination index
B. central tendency D. level of difficulty
53. Study this group of tests which was administered to a class to which Peter belongs, then answer the question:

SUBJECT    MEAN    SD    PETER’S SCORE
Math       56      10    43
Physics    41      9     31
English    80      16    109

In which subject(s) did Peter perform most poorly in relation to the group’s mean performance?
A. English C. English and Physics
B. Physics D. Math
54. Study this group of tests which was administered with the following results, then answer the question:

SUBJECT    MEAN    SD    RONNEL’S SCORE
Math       40      3     58
Physics    38      4     45
English    75      5     90

In which subject/s were the scores most homogeneous?
A. English C. Math and English
B. Physics D. Math
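The tables in items 53 and 54 can be worked through directly. For item 53, Peter's raw scores are converted to z-scores so subjects with different means and SDs become comparable:

```python
# Data from item 53: subject -> (mean, SD, Peter's raw score).
peter = {"Math": (56, 10, 43), "Physics": (41, 9, 31), "English": (80, 16, 109)}
z = {subj: (raw - mean) / sd for subj, (mean, sd, raw) in peter.items()}
weakest = min(z, key=z.get)
print(weakest)  # Math: z = -1.3, lower than Physics at about -1.11

# For item 54, homogeneity is read straight off the SDs: the smallest SD
# (Math, SD = 3) marks the most homogeneous set of scores.
```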
55. Suppose that in the April 2008 LET the mean in the professional education test for the elementary group was 44.3. What does
this mean?
A. 44.3 is the best single value that represents the performance of the elementary teacher examinees as a whole.
B. Most of the elementary teacher examinees obtained a score of 44.3.
C. 50% of the elementary teacher examinees got a score of at least 44.
D. None of the elementary teacher examinees got a score below 44.
56. Teacher A constructed a matching type test. In her columns of items are a combination of events, people, and circumstances.
Which of the following guidelines in constructing a matching type of test did she VIOLATE?
A. List options in an alphabetical order. C. Make list of items heterogeneous.
B. Make list of items homogeneous. D. Provide three or more options.
57. Teacher A discovered that his pupils are very good in dramatizing. Which tool must have helped him discover his pupils'
strength?
A. Portfolio assessment C. Journal entry
B. Performance test
58. Teacher A wants to make use of the most stable measure of variability. Which one should you recommend?
A. External range C. Standard deviation
B. Quartile range D. External range and quartile range
59. Teacher A wrote of Henry: “When Henry came to class this morning, he seemed very tired and slouched into his seat. He took
no part in class discussion and seemed to have no interest in what was being discussed. This was very unusual, for he has
always been eager to participate and often monopolized the discussion time.” What Teacher A wrote is an example of a (an)
A. personalized report C. observation report
B. anecdotal report D. incidence report
60. Teacher B wants to diagnose in which vowel sound(s) her students have difficulty. Which tool is most appropriate?
A. Portfolio assessment C. Performance test
B. Journal entry D. Paper-and-pencil test
61. Teacher B wanted to teach the pupils the skill of cross stitching. Her check-up quiz was a written test on the steps of cross
stitching. What characteristic of a good test does it lack?
A. Objectivity C. Predictive validity
B. Reliability D. Content validity
62. Teacher C administered only true-false and multiple choice tests during the midterm. The students did well on these tests. He
decides to make the final exam consists of five essay questions. Which of the following is the most likely effect of this
decision?
A. The students will do better than they had in the previous test.
B. The students will not do as well as in the previous test.
C. There will be no correlation between the final and the midterm tests.
D. The students will do as well as in the previous test.
63. Teacher D gave a test in grammar. She found out that one half of the class scored very low. She plans to give another test to
the pupils who scored very low to find out exactly where they are weak. Which type of test is this?
A. Achievement test C. Placement test
B. Diagnostic test D. Intelligent test
64. Teacher Y does norm-referenced interpretation of scores. Which of the following does she do?
A. She describes group performance in relation to a level of mastery set.
B. She uses a specified content as its frame of reference.
C. She compares every individual students' scores with others' scores.
D. She describes what should be their performance.
65. Teacher Z is engaged in a criterion-reference interpretation of scores. Which of the following does she do?
A. She uses a specified population of persons as its interpretative frame of reference.
B. She describes every individual student performance in relation to the clearly-defined learning task.
C. She describes every individual student performance in relation to the performance of the age group of the student.
D. She compares every individual student performance with the performance of the rest.
66. The Analysis of Variance utilizing the F-test is the appropriate significance test for comparing
A. frequencies. C. medians.
B. two means only. D. three or more means.
67. The computed r for English and Math score is -.75. What does this mean?
A. The higher the scores in English, the higher the scores in Math.
B. The scores in Math and English do not have any relationship.
C. The higher the scores in Math, the lower the scores in English.
D. The lower the scores in English, the lower the scores in Math.
68. The computed r for scores in Math and Science is .85. This means that ________.
A. Science scores are slightly related to Math scores.
B. Math scores are not in any way related to Science scores.
C. Math scores are positively related to Science scores.
D. The higher the Math scores the lower the Science scores.
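Items 67 and 68 both turn on the sign of Pearson's r: a negative r (-.75) means high Math tends to go with low English, while a positive r (.85) means high Math tends to go with high Science. A sketch of the coefficient itself, on assumed toy data:

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Assumed toy data with a perfect negative relationship, like the sign in item 67.
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # approximately -1.0
```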
69. The difficulty index of a test item is 1. This means that ___________.
A. The test is very difficult. C. The test item is quality item.
B. The test is very easy. D. Nobody got the item correctly.
70. The difficulty index of a test item is 1. What does this imply? The test item must be _________.
A. moderate in difficulty
B. very difficult because only 1 got the item correctly
C. very easy because everybody got the test item correctly
D. neither difficult nor easy
71. A discrimination index of 1.0 means that
A. 50% of the lower students got the item correctly and 50% of the upper students got it wrongly.
B. there is no difference between the lower and the upper students.
C. all lower students got an item correctly and upper students got it wrong.
D. all students in the upper group got the item correctly, no students in the lower group got it.
72. The distribution of scores in a Chemistry examination is drawn and found to be positively skewed. This means that
A. The scores are all above the mean.
B. There is a lumping of scores at the right side of the curve.
C. The scores are all below the mean.
D. There is a lumping of scores at the left side of the curve.
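A quick numerical check of the idea behind item 72, on assumed right-tailed toy data: in a positively skewed distribution the long tail points to the right, so the scores lump at the left, low end and the mean is pulled above the median:

```python
import statistics

# Assumed positively skewed toy data: a lump of low scores, a long right tail.
scores = [10, 12, 12, 13, 13, 14, 15, 20, 30, 45]
print(statistics.mean(scores) > statistics.median(scores))  # True under positive skew
```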
73. The facility of a test item is .50. This means that the test item is ________________.
A. valid C. reliable
B. very easy D. moderate in difficulty
74. The first thing to do in constructing a periodic test is for a teacher to
A. study the content C. decide on the number of items for the test
B. go back to her instructional objective D. decide on the type of test to construct
75. All of the following demand criterion-referenced tests EXCEPT
A. Outcome-based education C. Mastery learning
B. Collaborative learning D. Personalized System of Instruction
76. The main purpose in administering a pretest and a posttest to students is to __________.
A. Measure the value of the material taught C. Keep adequate records
B. Measure gains in learning D. Accustom the students to frequent testing
77. The mode of a score distribution is 25. This means that
A. twenty-five is the score that occurs least.
B. twenty-five is the score that occurs most.
C. twenty-five is the average of the score distribution.
D. there is no score of 25.
78. The score distributions of Set A and Set B have equal means but different SDs. Set A has an SD of 1.7 while Set B has an
SD of 3.2. Which statement is TRUE of the score distributions?
A. Majority of the scores in Set B are clustered around the mean.
B. Scores in Set A are more widely scattered.
C. Majority of the scores in Set A are clustered around the mean.
D. The scores of Set B have less variability than the scores in Set A.
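The principle in item 78 can be seen with two assumed toy sets sharing a mean of 20: the smaller SD belongs to the set whose scores cluster around the mean.

```python
import statistics

set_a = [18, 19, 20, 21, 22]   # tight cluster around the mean
set_b = [14, 17, 20, 23, 26]   # same mean, wider spread
print(statistics.pstdev(set_a), statistics.pstdev(set_b))
# set_a has the smaller SD, so its scores are clustered around the mean.
```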
79. The sum of all the scores in a distribution always equals
A. the mean times the interval size. C. the mean times N.
B. the mean divided by the interval size. D. the mean divided by N.
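Item 79 is a one-line identity: since mean = sum / N, the sum of all scores equals the mean times N. A check on assumed toy data:

```python
scores = [33, 45, 52, 60, 70]   # assumed toy data
mean = sum(scores) / len(scores)
print(sum(scores) == mean * len(scores))  # True
```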
80. The teacher gives an achievement test to his 25 students. The test consists of 50 items. He wants to classify his students’
performance based on the test results.
What is the appropriate measure of position?
A. Z-value C. Stanine
B. Percentile Rank D. Percentage
81. A test item has a discrimination index of -0.38 and a difficulty index of 1.0. What does this imply for test construction? The
teacher must __________________.
A. recast the item C. reject the item
B. shelve the item for future use D. retain the item
82. The use of a table of specifications assures the teacher and pupils of a test with ____________.
A. Reliability C. Constructive validity
B. Predictive validity D. Content validity
83. The variance, standard deviation, and range are all measures of
A. variability C. grouping
B. central tendency D. partition values
84. To determine student’s entry knowledge and skills which test should be given?
A. Diagnostic C. Placement
B. Aptitude D. Standardized
85. To evaluate teaching skills, which is the most authentic tool?
A. Observation C. Short answer test
B. Non-restricted essay test D. Essay test
86. To have a test with wide coverage and with the power to test analytical thinking plus ease of scoring, which type should the
teacher use?
A. Alternative response C. Completion
B. Short answer D. Multiple choice
87. Tom’s raw score in the Filipino class is 23, which is equal to the 70th percentile. What does this imply?
A. 70% of Tom’s classmates got a score lower than 23.
B. Tom’s score is higher than 23% of his classmates.
C. 70% of Tom’s classmates got a score above 23.
D. Tom’s score is higher than 23 of his classmates.
88. What can be said of student performance in a positively skewed score distribution?
A. Most students performed well. C. A few students performed excellently.
B. Almost all students had average performance. D. Most students performed poorly.
89. What does a negative discrimination index mean?
A. There are more from the upper group that got the item right.
B. The test item is quite difficult.
C. The test item is quite easy.
D. There are more from the lower group who got the item right.
90. What does a negatively skewed score distribution imply?
A. The scores congregate on the left side of the normal distribution curve.
B. The scores are widespread.
C. The students must be academically poor.
D. The score congregate on the right side of the normal distribution curve.
91. What does a percentile rank of 62 mean?
A. It is the student’s score in the test.
B. The student’s score is higher than 62 percent of all students who took the test.
C. The student answered sixty-two percent (62%) of the items correctly.
D. Sixty-two percent (62%) of those who took the test scored higher than the individual.
92. Which is most implied by a negatively skewed score distribution?
A. Several of the pupils are in the middle of the distribution.
B. Most of the scores are high.
C. Most of the scores are low.
D. The scores are evenly distributed from left to right of the normal curve.
93. Which is the first step in planning an achievement test?
A. Make a table of specification.
B. Go back to the instructional objective.
C. Determine the group for whom the test is intended.
D. Select the type of test item to use.
94. Which is true when standard deviation is big?
A. Scores are concentrated. C. Scores are spread apart.
B. Scores are not extremes. D. The bell curve shape is steep.
95. Which item is learned most by my students? The item with a difficulty index of
A. .50 C. .90
B. .10 D. 1.0
96. Which measure(s) of central tendency can be determined by mere inspection?
A. Mode & Median C. Mode
B. Median D. Mean
97. Which measure(s) of central tendency separate(s) the top half of the group from the bottom half?
A. Median C. Median and Mean
B. Mean D. Mode
98. Which must go with self-assessment for it to be effective?
A. Consensus of evaluation results from teachers and students
B. Scoring rubric
C. External monitor
D. Public display of results of self-evaluation
99. Which of the following could produce more than one value?
A. Mean C. Median
B. Mode D. Mean of grouped data
100. Which of the following is considered the most important characteristic of a good test?
A. Administrability C. Validity
B. Reliability D. Usability
101. Which of the following measures is most affected by an extreme score?
A. Semi-interquartile range C. Mode
B. Median D. Mean
102. Which of the following types of test is the least applicable in measuring higher level of achievement?
A. Multiple choice C. True-false
B. Matching D. Completion
103. Which one can enhance the comparability of grades?
A. Using common conversion table for translating test scores in to ratings
B. Formulating tests that vary from one teacher to another
C. Allowing individual teachers to determine factors for rating
D. Individual teachers giving weights to factors considered for rating
104. Which one describes the percentile rank of a given score?
A. The percent of cases of a distribution below and above a given score
B. The percent of cases of a distribution below the given score
C. The percent of cases of a distribution above the given score
D. The percent of cases of a distribution within the given score
105. Which one of the following is NOT a measure of central tendency?
A. Median B. Mean C. Variance D. Mode
106. Which ones can tell a teacher whether the score distributions appear compressed or expanded?
A. Standard scores C. Measures of variability
B. Measures of correlation D. Measures of central tendency
107. Which score distribution shows the scores in a very narrow range?
A. Bimodal B. Platykurtic C. Left-skewed D. Leptokurtic
108. Which statement about performance-based assessment is FALSE?
A. They emphasize process only.
B. They stress on doing, not only knowing.
C. Essay tests are an example of performance-based assessments.
D. They accentuate on process as well as product.
109. Which statement about standard deviation is CORRECT?
A. The lower the standard deviation the more spread the scores are.
B. The higher the standard deviation the less the spread the scores are.
C. The higher the standard deviation the more spread the scores are.
D. It is measure of central tendency.
110. Which statement is true in a bell-shaped curve?
A. There are more high scores than low scores. C. The scores are normally distributed.
B. Most scores are high. D. There are more low scores than high scores.
111. Which type of report refers to an “on-the-spot” description of some incident, episode or occurrence that is being observed
and recorded as possibly significant?
A. Anecdotal record C. Biographical report
B. Autobiographical report D. Sociometry
112. Which type of statistics is meant to draw implications about the population from which the sample is taken?
A. Descriptive and inferential C. Correlational
B. Inferential D. Descriptive
113. Which type(s) of statistics give(s) information about the sample being studied?
A. Inferential C. Inferential and correlational
B. Correlational D. Descriptive
114. Which will be the most authentic assessment tool for an instructional objective on working with and relating to
people?
A. Writing articles on working and relating to people C. Home visitation
B. Organizing a community project D. Conducting mock election
115. While she is in the process of teaching, Teacher J finds out if her students understand what she is teaching. What is
Teacher J engaged in?
A. Criterion-reference evaluation C. Formative evaluation
B. Summative evaluation D. Norm-referenced evaluation
116. Why are test norms established? To have a basis for
A. computing grades C. interpreting test results
B. establishing learning goals D. identifying pupil’s difficulties
117. With the current emphasis on self-assessment and performance assessment, which is indispensable?
A. Numerical grading C. Letter grading
B. Paper-and-pencil test D. Scoring rubric
118. With types of test in mind, which does NOT belong to the group?
A. Restricted response essay C. Multiple choice
B. Completion D. Short answer
119. You give a 100-point test; three students make scores of 95, 92 and 91, respectively; and the other 22 students in the
class make scores ranging from 33 to 67. The measure of central tendency most apt to describe this group of
25 students is
A. the mean. C. average of the median & mode.
B. the mode. D. the median.
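The situation in item 119 can be reproduced numerically. With 22 assumed scores between 33 and 67 plus the three high scores given in the item, the mean is pulled toward the outliers while the median stays inside the bulk of the distribution:

```python
import statistics

# 22 assumed scores between 33 and 67, plus the three outliers from item 119.
scores = [33, 35, 38, 40, 42, 44, 45, 47, 48, 50, 51,
          53, 55, 56, 58, 59, 60, 62, 63, 65, 66, 67, 91, 92, 95]
print(statistics.median(scores), statistics.mean(scores))
# The median (55) sits inside the 33-67 bulk; the mean (56.6) is dragged
# toward the outliers, which is why the median describes this group best.
```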