
Test Development and Grading

Introduction
The procedures used by the American Petroleum Institute (API) to prepare certification examinations are consistent with the technical guidelines recommended by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education (AERA, APA, & NCME, 1985), and they adhere to relevant sections of the Uniform Guidelines on Employee Selection Procedures adopted by the Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, and Department of Justice (EEOC, CSC, DOL, & DOJ, 1978) and to the PES Guidelines for the Development, Use, and Evaluation of Licensure and Certification Programs (Professional Examination Service, 1995). All API tests are constructed to meet the test specifications in effect for the API certification examination programs. These test specifications are always based on the results of job analysis studies conducted by API with assistance from industry professionals and the Professional Examination Service (PES). The job analysis studies for all inspector certification programs consist of defining the job-related activities, as well as the knowledge and skills necessary to perform those activities. Panels of subject matter experts chosen by API generated the work-related activities and content areas of the job analyses under the guidance of professional staff from PES. The components delineated in the studies were validated by a number of subject matter experts (SMEs), who included owners, engineering supervisors, designers/engineers, supervisors of inspectors, inspectors, professional engineers, and trainers. These individuals were chosen by API and served to demonstrate that the activities, tasks, and knowledge statements developed by the panels of experts were applicable to individuals from a variety of work settings and geographic locations.

The primary objective of the API Individual Certification Programs is to protect the public by ensuring that candidates for certification demonstrate competence in content areas that are relevant to practice as entry-level inspectors. API and PES instituted a number of review procedures to ensure that the API examinations contain items that are relevant to practice and are critical to assessing the competence of entry-level inspectors. The items of the API item bank were classified by content experts from the API Individual Certification Taskforce according to the content areas of the validated test specifications. To be accepted for inclusion in the API item bank, each item must also meet minimum criteria concerning its importance and criticality to measuring the entry-level knowledge needed to practice as an inspector. In addition, the item must assess an aspect of work in the field that is frequently performed at entry level. All new items that fail to meet these standards are automatically rejected from the API item pool. In addition to these rigorous content validity reviews, all API items are evaluated by PES testing experts and editors to make sure that they conform to accepted principles of test construction and to established rules of grammar and style. Items that survive this screening procedure are placed in the API item pool for potential use on a subsequent API examination. Before any API test is administered, however, it must be approved by the API Individual Certification Examination Committee. Members of the Committee consider each item on the test and rate the items according to the validity scales in effect for the API program. The Committee also checks the accuracy of each question during this review session. At the completion of the Committee review process, the test items undergo one additional round of psychometric and grammatical editing before a final form of the test is assembled.

Examination Development
PES staff initiates the process of developing a new test form for the API programs by reviewing the statistical data accrued for the most current test form. Test items with undesirable statistical characteristics (items that are too easy or too difficult for candidates, items that do not distinguish among candidate ability groups, etc.) are flagged during this review process (see the illustrative sketch at the end of this section). In addition, items that have appeared several times on the API test are targeted for replacement. PES staff assembles a draft form of the test by rejecting approximately 30% of the items on the most current test form according to the criteria above. Replacement items from the API item bank are selected to match the content category and, if possible, the difficulty level of the items removed from the test form. The draft form of the test is then duplicated and reviewed by the full API Inspector Certification Examination Committee at an exam construction meeting. While the focus of PES's evaluation of the draft test is on the psychometric properties of the examination, the Committee concentrates on its content. As is standard practice in the development of credentialing examinations, committee participation ensures that a broad range of content expertise is brought to bear during test review activities. Adequacy of content coverage, test item redundancy, and the accuracy of the answer key are among the factors considered by the Committee during this phase of the test development process. The Committee has access to the API item bank during this test evaluation period, in the event that additional item replacement is necessary. In addition, the Committee has the opportunity to edit or refine any item on the examination, so that even items that appear on successive forms of the exam may differ from one form to another.

A critical feature of the test review process for the API program involves the use of item validation scales. The item validation scales are printed below each API test item on the reviewer copies of the test. The Validation Rating Scales relate to the importance, criticality, and accuracy of the test items. Committee members complete the validation scales during the test review session, and only items with adequate ratings are accepted by the Committee. PES maintains the Committee's rating data as part of the permanent documentation of the test items, in the event that an item is challenged as to its validity or accuracy. At the conclusion of the Committee's review of the draft test, PES staff incorporates additional replacement items or item revisions into a second draft version of the new test form. Once the production of the revised draft test has been completed, reviewer copies of the test are evaluated by the Committee Chair and one Committee member. Final item revisions or replacements may be made by the reviewers at this time.
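The item statistics used to flag items at the start of this section are not specified here, but flags of the kind described (items that are too easy or too difficult, or that do not distinguish among ability groups) are commonly based on item difficulty (the proportion of candidates answering correctly) and item discrimination (the corrected item-total correlation). The sketch below is a minimal illustration under those assumptions; the cut-off values and function names are hypothetical and are not part of the API/PES procedure.

import numpy as np

# Minimal illustration only; assumptions are stated in the text above.
# responses: 0/1 matrix of scored answers, rows = candidates, columns = items.

def item_statistics(responses: np.ndarray):
    """Return per-item difficulty and discrimination values."""
    difficulty = responses.mean(axis=0)            # proportion answering correctly
    totals = responses.sum(axis=1)
    discrimination = np.array([
        # correlation of each item with the total score on the remaining items
        np.corrcoef(responses[:, i], totals - responses[:, i])[0, 1]
        for i in range(responses.shape[1])
    ])
    return difficulty, discrimination

def flag_items(difficulty, discrimination,
               too_easy=0.95, too_hard=0.25, min_discrimination=0.15):
    """Indices of items a reviewer might set aside (thresholds are hypothetical)."""
    return [i for i, (p, r) in enumerate(zip(difficulty, discrimination))
            if p > too_easy or p < too_hard or r < min_discrimination]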

Scoring/Key Finalization
After each examination has been administered, and before the exams are scored, the following procedures are followed to assure that the scoring process is fair. An item analysis is performed, and the statistical performance of each item is reviewed by PES testing experts. Items found to be questionable based on their statistical performance are set aside for review. Comment forms from every test site are collected and sent to PES, along with the answer sheets and other test materials. Any item that receives a comment or challenge is set aside for review. This group of items, copies of the comment forms, and the statistical information are sent to members of the Inspector Certification Examination Committee for a final review. This review may result in no change to the scoring of an item, it may result in changing the answer to a question (re-keying), or it may result in giving credit for more than one correct answer (multiple-keying). The recommendations of this review are forwarded to PES so that the answer key to the examination can be finalized. In this way, any changes to the scoring of items as a result of this process are applied to everyone who took the test. Only after the answer key is finalized can the examination be scored. It usually takes about six weeks from the time a test is administered until this process is completed and score reports can be mailed.

Scoring/Setting a Pass Point
The passing point for API Inspector Certification examinations is generated at standard-setting workshops. These workshops are used to establish a passing point on a base form of the examination that represents an absolute standard of knowledge. In the context of a high-stakes credentialing examination, this method is fairer to the candidate than other ways of selecting a pass point. The workshop technique requires content specialists to answer the following question for each test item: "What percent of candidates who are just barely qualified for certification as an API inspector will answer this item correctly?" After considering each element of this question, judges practice with the technique on a subset of items from the examination. After discussing these practice ratings, participants are asked to rate every item on the examination. Passing scores are derived by summing the judges' ratings and then calculating an average score across judges (illustrated below). For subsequent forms of the examination, an equating procedure is used to determine passing scores comparable to the passing score set on the base form of the examination (see below).

Scoring/Equating
We do not use a so-called bell curve to score our examinations. For a high-stakes professional examination, it would not be fair if the passing/not-passing decision for each individual were based on the knowledge of a particular testing group. Instead, a statistical process called equating (see below) is used to adjust the pass mark according to the relative difficulty of each exam form. The equated raw score pass point is computed by PES and provided to the Chairman of the Inspector Certification Task Force for approval. Once the equated pass point is established, the exam can be scored, and score reports are generated and mailed.
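The sketch below is a point of reference only for the two steps described above: each judge's item ratings are summed and the sums are averaged to obtain the base-form pass point (a modified-Angoff calculation), and the base-form pass point is then carried onto a new form. This document does not specify the equating method; the linear adjustment shown (matching means and standard deviations), the function names, and the numbers in the usage lines are assumptions for illustration, not the API/PES implementation.

import numpy as np

# Illustrative sketch only; not the API/PES implementation.
# Assumption: judge ratings are the expected proportion of "just barely
# qualified" candidates answering each item correctly.

def pass_point_from_ratings(ratings: np.ndarray) -> float:
    """ratings: rows = judges, columns = items, values = proportions.
    Each judge's passing score is the sum of his or her item ratings;
    the recommended pass point is the average of those sums."""
    return ratings.sum(axis=1).mean()

def linear_equated_pass_point(base_pass: float,
                              base_form_scores: np.ndarray,
                              new_form_scores: np.ndarray) -> float:
    """Assumption: comparable candidate groups took the two forms.
    The base-form pass point is placed at the same relative position
    (in standard-deviation units) on the new form's raw-score scale."""
    z = (base_pass - base_form_scores.mean()) / base_form_scores.std()
    return new_form_scores.mean() + z * new_form_scores.std()

# Hypothetical usage: three judges rating a five-item base form.
ratings = np.array([[0.70, 0.60, 0.80, 0.55, 0.65],
                    [0.75, 0.55, 0.85, 0.50, 0.70],
                    [0.65, 0.60, 0.75, 0.60, 0.60]])
base_pass = pass_point_from_ratings(ratings)   # about 3.28 of 5 raw-score points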

Why do we use equating?
The purpose of equating is to establish, as nearly as possible, an effective equivalence between raw scores on two test forms. Despite attempts to construct examinations that are very similar in content, format, and difficulty, different forms of an examination will vary in the level and range of difficulty. As a result, raw score comparisons across two forms of an examination would not be fair to candidates who were administered the more difficult form. Equating methods are used to establish a relationship between raw scores on the two forms of the test so that the scores on one form can be expressed in terms of the scores on the other form.

Advantages of using equating
First, differences in test difficulty are controlled statistically, and candidates are not penalized (or rewarded) for taking a more difficult (or easier) form of the test. Second, variations in passing scores from year to year are reduced. Third, limited resources (content expertise, money) can be expended on other test development activities, rather than on passing point workshops.
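As a purely hypothetical illustration of the first advantage (the numbers are invented for this example and do not come from any API examination): if the pass point on the base form were 70 of 100 raw-score points and a new form turned out to be slightly harder, so that comparable candidates scored about three raw points lower on it, an equating adjustment would place the pass point on the new form near 67. A candidate of a given ability would then face the same effective standard on either form, rather than being penalized for having been administered the harder form.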
