Standardized Test

Standardized Test: Design, Advantages and Disadvantages

Syed Zubair Haider Department of Educational Training, The Islamia University of Bahawalpur, Rahim Yar Khan Campus Pakistan
Abstract
A standardized test is any test that is used in the same form across a variety of schools or other settings. Designers of such tests must specify a discrete correct answer for every question. This type of test includes both achievement tests (which measure knowledge already acquired) and aptitude tests (which attempt to predict future performance or potential), ranging from tests given to grade-school students to admission tests such as the GRE and the GAT. These tests will have gone through the psychometric procedure of test standardization: they have established norms and specific administration and scoring instructions, and their reliability and validity have been scientifically demonstrated. Standardized tests generally include at least some multiple-choice and true-false questions. These can be graded by computer, or by humans who do not understand the material in depth, as long as they have a list of the correct answers. One potential defect in such tests is that a test-taker can accidentally skip a line on the answer sheet and then be marked wrong on material to which he or she knew the correct answer. Standardized tests often include written portions as well; these are graded by humans who use rubrics, or guidelines, describing what a good essay on the subject should contain.
Keywords: Psychometric, Test standardization, Norms, Reliability, Validity, Rubrics, Computer-adaptive testing, Benchmark papers, Test score, Norm-referenced, Criterion-referenced, Neuroticism, Innate intelligence.
Introduction
Originally a standardized test was simply a standard test of academic achievement or of knowledge in a specific academic or vocational domain. It has since acquired the meaning of a written test whose scores are interpreted by reference to the scores of a norm group that has taken the test and that is usually considered representative of the population taking the test. For example, standardized tests of academic achievement provide conversion tables showing the percentile rank in the norm group of every possible raw score. Some standardized tests are now analyzed with item response theory.
The earliest evidence of standardized testing based on merit comes from China during the Han dynasty. The concept of a state ruled by men of ability and virtue was an outgrowth of Confucian philosophy. The imperial examinations covered the Six Arts, which included music, archery and horsemanship, arithmetic, writing, and knowledge of the rituals and ceremonies of both public and private life. Later, the Five Studies (military strategies, civil law, revenue and taxation, agriculture and geography) were added to the testing.
Standardized testing has not traditionally been a part of European pedagogy. Drawing on the skeptical and open-ended tradition of debate inherited from Ancient Greece, Western academia favored the essay.
Department of Educational Training and Research M. Ed; Session: 2011-2012 1/12/2012 22:48:00 a1/p1
Standardization
In practice, standardized tests can be composed of multiple-choice, true-false and/or essay questions. Multiple-choice and true-false items can be scored inexpensively and quickly, either by machine-reading special answer sheets or through computer-adaptive testing. Some tests also have short-answer or essay components that are assigned a score by independent evaluators who use rubrics (rules or guidelines) and benchmark papers (examples of papers for each possible score) to determine the grade to be given to a response. Most assessment items, however, are not scored by people; human scorers are used only for items that cannot easily be scored by computer (e.g., essays). For example, the Graduate Record Exam is a computer-adaptive assessment that requires no scoring by people, except for the writing portion. [4]
Administrative procedures
While a test can be administered in either an individual or a group format, the standard procedures for individual administration deserve emphasis. Although these instructions are simple, they should be followed precisely if we intend to compare a child's performance with that of the examinees in the normative sample, to whom the test was administered under exactly these conditions. When we alter the administration for whatever reason, we may be giving a child an unfair advantage or disadvantage that affects his or her behavior and, if so, invalidates his or her performance. [1]
Scoring issues
There can be issues with human scoring, which is one reason computer scoring is preferred. For example, the Seattle Times reported that for Washington State's WASL, temporary employees paid $10 an hour spent as little as 20 seconds on each math problem and 2.5 minutes on each essay item. Because these items might determine whether a student graduates from high school, some regard this as a matter of concern given the high-stakes nature of such tests. Pearson scores many other state tests similarly. [5] Agreement between scorers can vary from 60 to 85 percent, depending on the test and the scoring session. States sometimes pay to have two or more scorers read each paper to improve reliability, though this does not eliminate the problem of the same response receiving different scores. [6] Note, however, that open-ended components often make up only a small proportion of a test.
Score
There are two types of standardized test score interpretation: norm-referenced and criterion-referenced. Norm-referenced score interpretations compare test-takers to a sample of peers. Criterion-referenced score interpretations compare test-takers to a criterion (a formal definition of content), regardless of the scores of other examinees. The latter may also be described as standards-based assessments, as they are aligned with the standards-based education reform movement. [7] Norm-referenced score interpretations are associated with traditional education, which measures success by rank-ordering students using a variety of metrics, including grades and test scores. Standards-based assessments rest on the belief that all students can succeed if they are assessed against high standards that are required of all students regardless of ability or economic background.
Evaluation standards
In the field of evaluation, and in particular educational evaluation, the Joint Committee on Standards for Educational Evaluation [8] has published three sets of standards for evaluations. The Personnel Evaluation Standards [9] was published in 1988, The Program Evaluation Standards (2nd edition) [10] in 1994, and The Student Evaluation Standards [11] in 2003. Each publication presents and elaborates a set of standards for use in a variety of educational settings. The standards provide guidelines for designing, implementing, assessing and improving the identified form of evaluation. Each standard has been placed in one of four fundamental categories to promote educational evaluations that are proper, useful, feasible, and accurate. In these sets of standards, validity and reliability considerations are covered under the accuracy topic. For example, the student accuracy standards help ensure that student evaluations will provide sound, accurate, and credible information about student learning and performance.
Testing standards
In the field of psychometrics, the Standards for Educational and Psychological Testing [12] are organized under three major topics. The first places standards about validity and reliability, along with errors of measurement. The second covers fairness in testing, including the testing of individuals with disabilities. The third and final major topic covers standards related to testing applications, including credentialing and testing in program evaluation and public policy.
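The difference between the norm-referenced and criterion-referenced interpretations described under Score can be illustrated with a short sketch. Everything specific in it is an assumption made for illustration: the norm sample, the cut score of 30, and the function names are hypothetical and do not come from any real test.

```python
from bisect import bisect_left


def percentile_rank(raw_score, norm_scores):
    """Norm-referenced view: percentage of the norm sample scoring below raw_score."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, raw_score)  # count of scores strictly lower
    return 100.0 * below / len(ordered)


def meets_criterion(raw_score, cut_score):
    """Criterion-referenced view: compare with a fixed standard, ignoring peers."""
    return raw_score >= cut_score


# Hypothetical norm sample and cut score, invented for this example.
norm_sample = [12, 15, 18, 20, 22, 25, 27, 30, 33, 38]
CUT_SCORE = 30

for raw in (22, 33):
    print(raw, percentile_rank(raw, norm_sample), meets_criterion(raw, CUT_SCORE))
```

The same raw score therefore receives two different readings: a rank that depends entirely on who else was tested, and a pass or fail decision that does not.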
Statistical Properties
Reliability
In judging the adequacy of a test, our first order of business is to determine how consistently it measures individuals' skills. In other words, does the test give us dependable information, or does performance fluctuate from one testing to another? If performance is highly variable, the test is useless. We ask: if examinees take this test twice with a short interval between testings, will they achieve the same or similar scores? We also need to address another question: if two examiners score the same protocols, will they come up with the same or similar scores for each child? If the scoring guidelines are precise, the examiners should obtain the same or similar scores. If the correlation coefficients in both cases are high (close to 1.0), they indicate good test-retest reliability and high interscorer reliability. [3]
Validity
Once we have established adequate reliability, we look to determine the validity of the test. That is, does it provide useful information for educational decision making? To answer this question, we should administer the test before and after a particular program. If the test proves useful in predicting examinees' performance and actual achievement, we can proceed to develop a normative base so that others can use the system to determine examinees' skill proficiency. [3]
Norms
Establishing a normative base is a tall order. Since norms should closely approximate the types of individuals who are likely to take the test, there should be good representation geographically, socioeconomically, ethnically, and so on. For practical and financial reasons, however, this is not easy. Such a test is best thought of as a screening instrument. A child should not be labeled or placed in a specific program strictly on the basis of his or her performance on the test, especially if he or she comes from a non-core-culture background and thus may have had little experience with paper-and-pencil tasks. If a child does poorly, a more comprehensive assessment should be undertaken. [2]
Test interpretations
Once we have a normative database, we can interpret examinees' performance. To facilitate test interpretation, examinees' raw scores can be converted to standard scores (T scores) and percentile scores. These converted scores should be available to both examinee and scorer. The T score describes how much an examinee's score varies from the average of those in the standardization sample in his or her grade or age group. The percentile score designates the percentage of examinees in his or her grade or age group with lower scores. [2]
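The reliability check and the score conversions described in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure of any particular test: the score lists are invented, the function names are hypothetical, the T score uses the conventional scale with mean 50 and standard deviation 10, and the norm group's population standard deviation is used.

```python
import math
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation between two paired score lists (e.g., test and retest)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def t_score(raw, norm_scores):
    """Standard score on the T scale (mean 50, SD 10) relative to the norm group."""
    z = (raw - mean(norm_scores)) / pstdev(norm_scores)
    return 50 + 10 * z


def percentile_rank(raw, norm_scores):
    """Percentage of the norm group with scores lower than raw."""
    below = sum(1 for s in norm_scores if s < raw)
    return 100.0 * below / len(norm_scores)


# Hypothetical data: five examinees tested twice, a short interval apart.
first_testing = [10, 12, 14, 16, 18]
second_testing = [11, 12, 15, 15, 19]

print("test-retest r:", round(pearson_r(first_testing, second_testing), 2))
print("T score for raw 18:", round(t_score(18, first_testing), 1))
print("percentile for raw 16:", percentile_rank(16, first_testing))
```

The same `pearson_r` function applies to interscorer reliability if the two lists hold the scores that two examiners assigned to the same protocols. A coefficient near 1.0 indicates consistent measurement, while a value near zero indicates that scores fluctuate too much to be useful.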
Advantages
One of the main advantages of standardized testing is that the results can be empirically documented; the test scores can therefore be shown to have a relative degree of validity and reliability, and the results are generalizable and replicable. [13] This is often contrasted with grades on a school transcript, which are assigned by individual teachers. With grades, it may be difficult to account for differences in educational culture across schools, in the difficulty of a given teacher's curriculum, and in teaching style, as well as for techniques and biases that affect grading. This makes standardized tests useful for admissions purposes in higher education, where a school is trying to compare students from across the nation or across the world.
Another advantage is aggregation. A well designed standardized test provides an assessment of an individual's mastery of a domain of knowledge or skill which at some level of aggregation will provide useful information. That is, while individual assessments may not be accurate enough for practical purposes, the mean scores of classes, schools, branches of a company, or other groups may well provide useful information because of the reduction of error accomplished by increasing the sample size.
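The error-reduction argument above rests on a standard statistical result: if individual scores carry independent measurement errors with standard deviation sigma, the error in the mean of n scores has standard deviation sigma / sqrt(n). The sketch below assumes independent errors and uses an illustrative error standard deviation of 10 score points; both the figure and the function name are assumptions, not values from the text.

```python
import math


def standard_error_of_mean(error_sd, n):
    """Standard deviation of the error in a group mean of n independent scores."""
    if n < 1:
        raise ValueError("group size must be at least 1")
    return error_sd / math.sqrt(n)


ERROR_SD = 10.0  # hypothetical measurement error of a single score, in points

for group_size in (1, 25, 100, 400):
    print(group_size, standard_error_of_mean(ERROR_SD, group_size))
```

Under these assumptions the error falls from 10 points for a single examinee to 1 point for a group of 100, which is why group means can be informative even when individual scores are not.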
Disadvantages and criticism
"Standardized tests can't measure initiative, creativity, imagination, conceptual thinking, curiosity, effort, irony, judgment, commitment, nuance, good will, ethical reflection, or a host of other valuable dispositions and attributes. What they can measure and count are isolated skills, specific facts and function, content knowledge, the least interesting and least significant aspects of learning."
Bill Ayers [14]
Though many educators recognize that standardized tests have a place in the arsenal of tools used to assess student achievement, critics feel that overuse and misuse of these tests is having serious negative consequences for teaching and learning. According to the group FairTest, [15] when standardized tests are the primary factor in accountability, the temptation is to use the tests to define curriculum and focus instruction. What is not tested is not taught, and how the subject is tested becomes a model for how to teach the subject. Critics say this disfavors higher-order learning. Of course, the same effect can also be used to focus instruction on desired outcomes [16], such as basic reading and math.
Moreover, Popham [17] points out that standardized test scores are problematic tools for school accountability because examinees' scores are influenced by three things: what children learn in school, what children learn outside of school, and innate intelligence. New value-added models have been proposed to cope with this criticism by statistically controlling for innate ability and out-of-school contextual factors. [18]
While it is possible to use a standardized test without letting its limits control curriculum and instruction, doing so can put a school at risk of producing lower test scores, with negative political consequences. For example, under the federal No Child Left Behind law in the United States, low test scores mean that schools and districts can be labeled "in need of improvement" and punished. If the test is the only method of accountability, parents and the community are less likely to know how well examinees are learning in untested areas.
Supporters of standardized testing respond that these are not reasons to abandon testing, but rather criticisms of poorly designed testing regimes. They argue that testing focuses educational resources on the most important aspect of education, imparting a pre-defined set of knowledge and skills, and that other aspects are either less important or should be added to the testing scheme. If "knowledge and skills" include the ability to write an essay, for example, then that ability clearly lies outside the province of standardized testing.
Some critics say that some examinees do not do well on standardized tests, despite mastery of the material, because of testing anxiety or a lack of time-management or test-taking skills. This reflects the fact that tests cannot directly measure student knowledge, only the ability of students to apply knowledge in a stressful situation. Testing anxiety has been linked to trait Neuroticism, which is related to generalized anxiety.
The growing influence of test preparation is also a concern for some. As the importance of standardized testing rises, many students attempt to prepare themselves for a test, whether through free sample tests and programs, books designed to prepare the student for a test, or private tutoring sessions. Some parents are willing to pay thousands of dollars to prepare their children for tests, [19] a financial barrier that may give the children of wealthier parents an advantage over those from less affluent families. However, this criticism would probably apply even more strongly to testing alternatives such as portfolios or essays. Many studies also show that test coaching has little effect on scores on well-built tests. The ability of wealthy families to pay for higher-quality education is not specifically related to standardized testing.
Conclusion
The findings summarized here concisely explain the inherent harms of a test-driven curriculum, as well as the ways in which teachers, parents, and students can successfully fight against both the ignorant politicians who parade their notions of accountability for media sound bites and the wave of test-obsession that our schools have embraced with something akin to religious fervor.
Measuring What Matters Least
Although test scores are frequently quoted by newspapers and hailed as the most effective way of measuring a student's success, they aren't necessarily valid indicators of achievement.
The Worst Tests
Not all tests are created equal. Although most tests are harmful, some, such as multiple-choice, timed, or norm-referenced examinations, prove even more damaging to children.
Burnt at the High Stakes
Simply put, high-stakes testing is a flawed notion with a vast array of negative consequences. Though talk of 'accountability' and 'raising the bar' might make for good media sound bites, high-stakes testing has led to increased incidents of bribery, coercion, and cheating.
Poor Teaching for Poor Kids
The movement towards high-stakes testing has placed children from lower socio-economic backgrounds at an even greater disadvantage. Not only are tests often biased, but such students also lack equal access to funding for quality teachers and test preparation.
If Not Standardized Tests, Then What?
Contrary to popular belief, there are alternatives to standardized testing. Written evaluations, parent-teacher conferences, and performance assessments such as the 'portfolio' could not only lessen the need for standardized tests, but also lead to increased communication between all involved parties.
Fighting the Tests
Nothing about our educational system need be set in stone. We found ways to incorporate standardized testing as a major component of modern schooling, and likewise, we can find ways to remove such elements when they prove harmful. [20]
Beware of Movements
Making students accountable for test scores works well on a bumper sticker, and it allows many politicians to look good by saying that they will not tolerate failure. But it represents a hollow promise. Far from improving education, high-stakes testing marks a major retreat from fairness, from accuracy, from quality, and from equity. [21]
References and Resources:

1. Gary G. Brannigan (State University of New York, Plattsburgh). The Enlightened Educator. McGraw-Hill, Inc.
2. Schachter, S., Brannigan, G. G., & Tooke, W. (1991). Comparison of two scoring systems for the modified version of the Bender-Gestalt Test. Journal of School Psychology, 25, 265-269.
3. Brannigan, G. G., Aabye, S. M., Baker, L. A., & Ryan, G. T. (1995). Further validation of the qualitative scoring system for the Modified Bender-Gestalt Test. Psychology in the Schools, 32, 24-26.
4. ETS Home. (2012). Educational Testing Service. http://www.ets.org/.
5. Jolayne H. (Sunday, August 27, 2000). Temps spend just minutes to score state test; A WASL math problem may take 20 seconds; an essay 2 1/2 minutes. The Seattle Times.
6. Citizens United for Responsible Education. (April 19, 2010). Why the WASL is so AWful - a brochure. Cure Washington. http://www.curewashington.org/archives/821.
7. American Federation of Teachers, AFL-CIO. (July 2002). Where We Stand: Standards-Based Assessment and Accountability. http://www.aft.org/pdfs/teachers/wwsstandassessaccnt0603.pdf.
8. Paty McDivitt. (Draft 3 for Review). (18/10/11). Joint Committee on Standards for Educational Evaluation. http://www.jcsee.org/standards-development.
9. Joint Committee on Standards for Educational Evaluation. (26/9/2008). The Personnel Evaluation Standards. Newbury Park, CA: Sage Publications. http://www.jcsee.org/personnelevaluation-standards.
10. Donald B. Yarbrough. (2011). The Program Evaluation Standards. Newbury Park, CA: Sage Publications. http://www.sagepub.com/booksProdDesc.nav?prodId=Book230597&_requestid=255617.
11. Joint Committee on Standards for Educational Evaluation, A. R. (21/10/2010). The Student Evaluation Standards. Newbury Park, CA: Sage Publications. http://www.sagepub.com/productSearch.nav?siteId=sageus&prodTypes=any&q=e+student+evaluation+standards.
12. American Psychological Association. The Standards for Educational and Psychological Testing. 1430 K Street NW, Suite 1200, Washington DC 20005: AERA Publications Sales. http://www.apa.org/science/programs/testing/standards.aspx.
13. Kuncel, N. R., & Hezlett, S. A. (2007). Science, 315, 1080-81.
14. Carolyn S. Carr & Connie L. Fulmer, R. (2004). In Educational Leadership: Knowing the Way, Showing the Way, Going the Way (135-136).
15. National Center for Fair and Open Testing. http://www.fairtest.org/.
16. College And Work Readiness Assessment. Council for Aid to Education. http://www.cae.org/content/pro_collegework.htm.
17. Popham, W. J. (1999). Why Standardized Test Scores Don't Measure Educational Quality. Educational Leadership, 56 (6), 8-15.
18. Henry I. Braun. (September 2005). A Primer on Value-Added Models. In Fordham Foundation. http://www.edexcellence.net/commentary/education-gadfly-weekly/old/evaluatingvalue-added-findings-and-recommendations-from-the-nasbe-study-group-on-value-addedassessments-and-using-student-progress-to-evaluate-teachers-a-primer-on-value-addedmodels.html.
19. Associated Press. (August 4, 1998). Tackling the SAT? Test-Prep Help Abounds. The Christian Science Monitor, 90 (175).
20. World Prosperity Ltd. (1994-2003). http://www.world-prosperity.org/testing.htm.
21. Alfie Kohn. Rescuing Our Schools from "Tougher Standards". http://www.alfiekohn.org/stdtest.htm.