
Development, Construction, and Validation of a Kindergarten Test on Language

Final Requirement for Completion in Assessment and Evaluation of Learning (Ed235 A)

Submitted by: Jo Anne Michelle S. Go (ID No. 041474)
Submitted to: Dr. Cornelia C. Soto

I. Introduction

I am a kindergarten teacher at Academia de Bellarmino. My students are four (4) to five (5) years old. Language is taught twice a week, on Mondays and Tuesdays, for 35 and 40 minutes per meeting respectively. Teachers are expected to teach in the first session (Mondays) and then give the tests in the next session (Tuesdays). It is undeniable, however, that this time is not enough for valuable learning to occur. Luckily, since I teach all the subjects, I can adjust the schedule: I teach the lessons on Mondays and Tuesdays and then give the tests on Wednesdays. This paper is a study on the development, construction, and validation of a kindergarten test on Language. The coverage of the test is The School and The People in School and Their Duties. It is a two-part exam with 15 items per part. These topics were taught during the 5th and 6th weeks of the 1st Quarter, and the tests, also called Evaluation Sheet 5 and Evaluation Sheet 6 respectively, were administered in the same weeks.

II. Methodology

The development, construction, and validation of these tests went through several steps. The first step was defining the subject, the level, the topic/s, and the objectives. I chose Language as the subject and kindergarten as the level. At first, my chosen topic was The Members of the Family. However, since one topic would not be enough to make long tests of at least 30 items in total, I broadened the coverage of the tests and added more topics: The House, The School, and The People in School and Their Duties. I then defined my general and specific objectives for each topic. These objectives covered three domains: the cognitive, the affective, and the psychomotor. Upon realizing that four topics would be more than enough, I narrowed my tests down to two topics: The School and The People in School and Their Duties. I decided to dedicate 15 items to each topic, for 30 items in total. I then constructed a Table of Specifications based on the objectives I had plotted. After the teacher validated my Table of Specifications, I proceeded to construct my test items.

For the tests, I used only variations of selected-response measurement strategies, as these are best used to measure recall and are therefore most suitable for kindergarten students. After all, what we are trying to promote in the children is mastery of the topics. For the topic The School, five items were multiple-choice questions testing identification of the places in school and their functions, and ten more items were binary-choice questions testing understanding of the functions of things used in school, for a total of 15 items. For the topic The People in School and Their Duties, seven items were a matching-type exercise testing identification of the people in school, and eight more items were binary-choice questions testing understanding of the duties of the persons in school, again for a total of 15 items. The two exams have 30 items all in all.

An expert then critiqued my tests. The comments I received were: that the number of items could be increased; that some pictures might be confusing and unclear to the students; a suggestion to adopt the format of Evaluation 6, Test B for Evaluation 5, Test B; a question of whether the items assume that the students can read; and some suggestions for revising certain parts. I very much appreciated the expert's comments, but by the time I received them, the school (Academia de Bellarmino) had already reproduced my tests. But had I received the comments

earlier, would I have followed the suggestions? My responses to the abovementioned comments are discussed in the next chapter.

It is important to note that the main goal of teaching these topics in kindergarten is mastery; what goes on in the teaching and learning process is ultimately directed toward this. As the places and people in school are taught, pictures are provided for the children to familiarize themselves with these people and places. They also color pictures of these people and places in their books, again for familiarization. As a form of formative assessment, they answer short recall questions in their books. These are the same questions found in their tests, only rearranged and presented in another way. After the development and construction of the tests, they were administered to 18 kindergarten students.

III. Results

Let me first discuss my responses to the critique of the test items. On the comment about increasing the number of items, I think 15 items per test is already enough. With fewer, I would not be able to cover the essentials; with more, the test would be too much for the children, who have only limited time to take it. On the comment that some pictures may be confusing to the students, this should not be a problem, as the students have been introduced to these pictures from the very beginning of instruction. On the suggestion to adopt a different format for Test B of Evaluation 5, I think the format I used is already appropriate. The items are properly spaced and are not confusing to the students. The items also help the pupils form sentences once these are read to them and they mimic what the teacher says. The tests do not assume that the pupils can read, so the items are read to them. For Evaluation 6, imagine if the pictures were placed on the first

column and the words on the second column; it would be difficult to read the words to the students, compared to saying, "Number 1 says janitor; find the picture of the janitor and connect it to the word in number 1." The comments I agree with are the suggestions to reword certain parts of the tests. I appreciate the comments on keeping the vocabulary of the test items as simple and understandable as possible, and I very much agree with them.

Let me now discuss my observations while the students took the test. It took them around 25 to 30 minutes to answer the whole test. This time also included settling and quieting the students down before the test; writing their names took some time as well. The students reacted positively to the test. They were actually excited to take it and were even happier when they saw the familiar pictures on the tests. The items they found difficult were items #7 and #15 of Test 2 (Evaluation 6) and item #4 of Test 1 (Evaluation 5). Item #4 of Test 1 asks the students about the place in school where we clean ourselves; the answer, of course, is the comfort room. They may have found the item difficult because the picture is too small, or because the description could have said "where we urinate and wash our hands." Item #7 of Test 2 asks the students to identify who the principal is. They may have found the item difficult because the pictures are too small or hard to distinguish from one another, although we must note that these exact pictures were introduced to them when the lessons were taught. Item #15 of Test 2 asks which person in school keeps the records and information about the children; the answer, of course, is the registrar. They may have found this item difficult because the concept of the registrar is still unfamiliar to them. Although it was discussed in class, they do not see or hear about the registrar much in their daily lives, so it is easy to forget the word registrar, let alone what the registrar does. As for the directions, they were clear enough, and I did not have to translate or paraphrase them for the students.

Item Analysis and Determining the Reliability of the Test

Long Test 1 or Evaluation Sheet 5

Item No. | pH  | pL  | P   | D   | Decision
1        | 1   | .89 | .95 | .11 | Marginal item, subject to improvement
2        | 1   | .89 | .95 | .11 | Marginal item, subject to improvement
3        | 1   | .89 | .95 | .11 | Marginal item, subject to improvement
4        | 1   | .67 | .84 | .33 | Good item
5        | 1   | 1   | 1   | 0   | Poor item, to be rejected or revised
6        | 1   | 1   | 1   | 0   | Poor item, to be rejected or revised
7        | 1   | 1   | 1   | 0   | Poor item, to be rejected or revised
8        | 1   | .89 | .95 | .11 | Marginal item, subject to improvement
9        | 1   | 1   | 1   | 0   | Poor item, to be rejected or revised
10       | 1   | 1   | 1   | 0   | Poor item, to be rejected or revised
11       | 1   | 1   | 1   | 0   | Poor item, to be rejected or revised
12       | 1   | 1   | 1   | 0   | Poor item, to be rejected or revised
13       | 1   | .78 | .89 | .22 | Reasonably good item
14       | 1   | .78 | .89 | .22 | Reasonably good item
15       | 1   | 1   | 1   | 0   | Poor item, to be rejected or revised
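The P and D columns follow the standard upper-lower group item-analysis formulas: P is the mean of the high- and low-group proportions correct, and D is their difference. A minimal sketch in Python, with decision thresholds inferred from the labels in these tables (they are my reading of the cut-offs used here, not an official scale):

```python
# Item analysis via the upper-lower group method: p_high and p_low are the
# proportions of correct answers in the high and low groups (9 papers each).

def analyze_item(p_high, p_low):
    """Return difficulty index P, discrimination index D, and a decision."""
    p = (p_high + p_low) / 2   # difficulty index: higher means easier
    d = p_high - p_low         # discrimination index
    # Thresholds inferred from the decision labels in the tables (assumption).
    if d <= 0.10:
        decision = "Poor item, to be rejected or revised"
    elif d <= 0.20:
        decision = "Marginal item, subject to improvement"
    elif d <= 0.30:
        decision = "Reasonably good item"
    elif d <= 0.40:
        decision = "Good item"
    else:
        decision = "Very good item"
    return p, d, decision

# Item 4 of Evaluation Sheet 5: pH = 1, pL = .67 -> P ~ .84, D ~ .33, Good item
print(analyze_item(1.0, 0.67))
```

The table rounds P and D to two decimal places; the function returns the raw values.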

Long Test 2 or Evaluation Sheet 6

Item No. | pH  | pL  | P   | D   | Decision
1        | 1   | 1   | 1   | 0   | Poor item, to be rejected or revised
2        | 1   | 1   | 1   | 0   | Poor item, to be rejected or revised
3        | 1   | .89 | .95 | .11 | Marginal item, subject to improvement
4        | 1   | .78 | .89 | .22 | Reasonably good item
5        | 1   | .78 | .89 | .22 | Reasonably good item
6        | 1   | .78 | .89 | .22 | Reasonably good item
7        | 1   | .67 | .84 | .33 | Good item
8        | 1   | 1   | 1   | 0   | Poor item, to be rejected or revised
9        | 1   | 1   | 1   | 0   | Poor item, to be rejected or revised
10       | 1   | .89 | .95 | .11 | Marginal item, subject to improvement
11       | 1   | .89 | .95 | .11 | Marginal item, subject to improvement
12       | 1   | .89 | .95 | .11 | Marginal item, subject to improvement
13       | 1   | 1   | 1   | 0   | Poor item, to be rejected or revised
14       | 1   | 1   | 1   | 0   | Poor item, to be rejected or revised
15       | .89 | .44 | .67 | .45 | Very good item

From the item analysis tables, we see a trend of high difficulty indices, ranging from 0.67 to 1. This means that almost all items are easy. Most items are marginal or poor, and only a few are reasonably good. The results of the item analysis should not be a cause for concern, however; as discussed in class, for preschool long tests there is no problem if the item discrimination index (D) is zero, because the purpose of the test is mastery. There is, therefore, no need to discriminate among students. For Test 1, the reliability of half the test is 0.73 and the reliability of the whole test is 0.84. This means that the test has high reliability: there is consistency in the scores or answers it yields. For Test 2, the reliability of half the test is 0.16 and the reliability of the whole test is 0.28. This means that the test has low reliability: there is little consistency in the scores or answers it yields. However, neither the high nor the low reliability is a major concern because, again, the purpose of these tests is mastery.

*All computations can be found in the Appendix section.
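The whole-test figures reported here (0.84 from 0.73, and 0.28 from 0.16) are consistent with the Spearman-Brown prophecy formula applied to the split-half coefficient r, namely whole-test reliability = 2r / (1 + r). A quick check:

```python
def spearman_brown(r_half):
    """Project a split-half reliability to the reliability of the full-length test."""
    return 2 * r_half / (1 + r_half)

# Test 1 (Evaluation Sheet 5): split-half r = 0.73
print(round(spearman_brown(0.73), 2))  # 0.84
# Test 2 (Evaluation Sheet 6): split-half r = 0.16
print(round(spearman_brown(0.16), 2))  # 0.28
```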

IV. Insights and Conclusion

Again, it is important to note that the goal of these tests is mastery. These are kindergarten students, and the teacher's role is to first develop mastery of knowledge before the students can move on to higher competencies in the hierarchy of educational objectives. This study has shown the very important relationship among the development, construction, and validation of tests. A teacher with no knowledge of these considerations cannot make effective forms of assessment. One may think that constructing tests is easy, that it is all just a matter of putting things together or rearranging sentences into questions. But there is a science behind test construction. A common mistake happens when teachers forget their objectives and make tests that are inappropriate for, or mismatched to, those objectives. That is why defining the objectives is an important step, and so is constructing the Table of Specifications. In constructing tests, it is thus important to always keep in mind the competencies we want to hone in our students. It is also important to seek the opinion of an expert, because an opinion other than our own can pinpoint things we tend to overlook. The process of assessment and evaluation does not stop once the tests are administered. The tests are then subjected to item analysis and a reliability test. Let us not neglect the importance of these steps. If our exams are intended to be reused or used by many people, we should at least make sure that they are reliable; otherwise, they will not

be a sufficient and accurate measure of learning. It is when we have the results of the item analysis and reliability test that we can retrace our steps to see where we went wrong or what we need to improve. All these steps are done not to make teachers' lives more difficult or hectic than they already are, but to make better teachers out of good ones. And once we are better teachers, we produce better learners.

V. Appendix

Appendix 1: General and Specific Objectives
Appendix 2: Revised General and Specific Objectives
Appendix 3: Table of Specifications
Appendix 4: Revised Table of Specifications
Appendix 5: Test Construction of Test 1 (Evaluation Sheet 5) and Critique
Appendix 6: Test Construction of Test 2 (Evaluation Sheet 6) and Critique
Appendix 7: My Critique of Test Items Constructed by Maria Audris Abesamis
Appendix 8: Raw Data - Test 1 High Group (9 test papers)
Appendix 9: Raw Data - Test 1 Low Group (9 test papers)
Appendix 10: Raw Data - Test 2 High Group (9 test papers)
Appendix 11: Raw Data - Test 2 Low Group (9 test papers)
Appendix 12: Item Analysis Computations for Test 1
Appendix 13: Item Analysis Computations for Test 2
Appendix 14: Reliability Test and Split-Half Computations for Test 1
Appendix 15: Reliability Test and Split-Half Computations for Test 2
Appendix 16: Revised Test 1 (Evaluation Sheet 5)
Appendix 17: Revised Test 2 (Evaluation Sheet 6)