Testing and Assessment: Ch. 6 Test Bank

c6
Student: ___________________________________________________________________________
1. In legal terminology, a valid contract is a contract that

A. measures what it purports to measure.
B. has been executed with the proper formalities.
C. is well grounded on principles of evidence.
D. None of these
2. As the term is applied to a test, validity is a judgment or estimate of how well a test
B. measures what it purports to measure in a particular context.
C. satisfies the deductions that could logically be made from inferences about it.
D. All of these
3. A test reviewer comes to the conclusion that a certain test is "a valid test." This means
that the reviewed test has been shown to be valid for
A. a particular use with a particular population for the life of the test.
B. a particular use with a universal population of testtakers for a limited time.
C. universal use with all testtakers for the life of the test.
D. a particular use with a particular population at a particular time.
4. Each of the three approaches to validity assessment in the trinitarian model should
BEST be thought of as
A. mutually exclusive as evidence of a test's validity with any one source necessary and
sufficient for demonstrating a test's validity.
B. one type of evidence that, with others, contributes to a judgment concerning the
validity of a test.
C. insufficient, either by themselves or together with the other two, to demonstrate the
validity of a test.
D. None of these
5. The validation of a test is a process
A. that can be carried out by the test author.
B. that can be carried out by the test user.
C. of gathering evidence of the test's validity.
D. All of these
6. Comedian Rodney Dangerfield was cited in the text to illustrate a point about how
which of the following is viewed?
A. test validation
B. content validity
C. face validity
D. construct validity
7. "It's a measure of validity that is arrived at by a comprehensive analysis of how scores
on the test relate to other test scores." This statement is a reference to:
A. face validity
B. content validity
C. the trinitarian index
8. Messick is to unitarian as __________ is to trinitarian.

A. Cronbach
B. Lawshe
C. Landy
D. Dangerfield
9. As mentioned in Chapter 6 of your text, the measurement of content validity is
particularly important in
A. classroom settings, where tests will form the basis of a grade.
B. employment settings, where tests may be used to promote employees.
C. courtroom settings, where tests may be used to determine competence.
D. screening for the potential of emission of violent or aggressive behavior.
10. Lawshe's method for gauging agreement among raters is used to derive a measure
of
A. face validity.
B. content validity.
C. criterion-related validity.
D. construct validity.
11. In Chapter 6 of your text, Dr. Adam Shoemaker, the featured professional in Meet an
Assessment Professional, described the use of a test with little criterion validity. Dr.
Shoemaker recalled that this test was used for the purpose of
A. gauging inter-item consistency of another test.
B. gaining "buy-in" from the test users.
C. providing a "job preview" of sorts to assessees.
D. hiring candidates for mid-level executive positions.
12. If a test developer has only a "fuzzy" vision of the construct being measured, then
A. the content validity of the test is likely to suffer.
B. the construct validity of the test is likely to suffer.
C. content irrelevant to the targeted construct may be measured.
D. All of these
13. Test blueprinting is applied in the design of

A. an attitude test.
B. a personality test.
C. an aptitude test.
D. All of these
14. In order to remain consistent with a test's blueprint, a test administered on a regular
basis is likely to require
A. item pool management.
B. base rate maintenance.
C. predictive validity certification.
D. None of these
15. Criterion-related validity is to predictive validity as criterion-related validity is to

A. construct validity.
C. concurrent validity.
D. test bias.
16. A team of consumer psychologists is interested in conducting research to test the
palatability of Papa John's Pizza (PJP). A PJP Palatability Test is developed on the basis of
the opinions of a sample of prison inmates sentenced to life in prison. These same
inmates are then used to validate a paper-and-pencil "PJP Palatability Survey." What error
has been committed by the researchers?
A. The researchers used an inappropriate population to test.
B. The test validation was invalid due to criterion contamination.
C. Convergent evidence was confused with discriminant evidence.
D. A Constitutional prohibition against subjecting prisoners to cruel and unusual
punishment was violated.
17. It has to do with the degree to which an additional predictor explains something
about the criterion measure that is not explained by predictors already in use. It is
A. the false positive rate.
B. evidence of construct validity.
C. predictive validity.
D. incremental validity.
18. An expectancy chart is

A. a graphic representation of an expectancy table.
B. a table illustrating the incremental validity of a test.
C. a pictorial image of a hit rate versus a miss rate.
D. All of these
19. "The effect of instituting this remedy for adverse impact is to make equivalent all
scores that fall within a particular range." The remedy for adverse impact referred to
here is technically referred to as
A. within-group norming.
B. differential cut-offs.
C. preference policies.
D. banding.
20. "How can group differences on cognitive ability tests be reduced while retaining
existing high levels of reliability and criterion-related validity?" According to Gottfredson,
the answer to this question
A. lies in the judicious application of affirmative action strategies.
B. must be answered by measurement professionals for themselves.
C. must come from strategies designed to minimize adverse impact.
D. will not come from measurement-related research.
21. A test is considered valid when the test
B. measures whatever it is that it measures consistently.
C. can be administered efficiently and cost-effectively.
D. has little or no error associated with it.
22. Which is NOT a method of evaluating the validity of a test?

A. evaluating scores on the test as compared to scores obtained on other tests
B. evaluating the content of the test
C. evaluating the percentage of passing and failing grades on the test
D. evaluating test scores as they relate to predictions from a particular theory
23. Predictive and concurrent validity can be subsumed under

A. content validity.
B. criterion-related validity.
C. face validity.
D. true score validity.
24. Relating scores obtained on a test to other test scores or data from other assessment
procedures is typically done in an effort to establish the __________ validity of a test.
A. content-related
B. criterion-related
C. face
D. about-face
25. Face validity refers to

A. the most preferred method for determining validity.
B. another name for content validity.
C. the appearance of relevancy of the test items.
D. validity determined by means of face-to-face interviews.
26. Face validity

A. may influence the way the testtaker approaches the situation.
B. relates more to what the test appears to measure than what the test may actually
measure.
C. is given short shrift as compared to other indices of validity.
D. All of these
27. Which assessment technique is the BEST example of a face valid method?
A. a personality test in which testtakers are asked to describe what they see in inkblots
B. administering a word processing test to a person applying to be a word processor
C. asking testtakers to draw a picture of their family to assess family relationships
D. measuring the height of applicants applying for a semi-pro basketball team
28. An instructor announces that an examination will cover the topics of reliability and
validity. Malcolm boasts that he will read and study only the material on reliability. As it
turns out, all of the test questions are only on the topic of reliability. The MOST
reasonable conclusion a student of assessment could draw from this is that
A. the examination lacked criterion-related validity.
B. the examination lacked content validity.
C. the examination lacked face validity.
D. it's worth getting to know Malcolm better.
29. Lawshe devised a method for determining agreement among raters or judges who
rate items on how essential they are. This method provides a way to quantify what type
of validity?
A. content
B. construct
C. criterion-related
D. predictive
30. Before constructing a comprehensive final examination that covers everything you
have studied since Day 1 of your course, your instructor reviews the objectives of the
course, the textbook, and all lecture notes. Your instructor is clearly making a diligent
effort to maximize the __________ validity of the final examination.
A. content
C. predictive
D. internal consistency
31. In calculating the content validity ratio, panelists are asked to determine
A. if the test item has face validity and an acceptable level of reliability.
B. if the test item is too long or too short.
C. if the test item is ambiguous.
D. if the skill or knowledge measured by the item is essential.
32. The minimum value of a content validity ratio necessary to be statistically significant
at the .05 level is dependent on
A. the number of panelists judging the items.
B. the degree of the construct validity of the test.
C. the number of testtakers.
D. the number of items on the test.
33. A standard against which a test or test score is evaluated is known as

A. a facet.
B. a correlation coefficient.
C. a validity coefficient.
D. a criterion.
34. Which of the following are BEST viewed as varieties of criterion-related validity?

A. concurrent validity and face validity
B. content validity and predictive validity
C. concurrent validity and predictive validity
D. concurrent validity and content validity
35. The form of criterion-related validity that reflects the degree to which a test score is
correlated with a criterion measure obtained at the same time that the test score was
obtained is known as:
A. predictive validity.
B. construct validity.
D. content validity.
36. The form of criterion-related validity that reflects the degree to which a test score
correlates with a criterion measure that was obtained some time subsequent to the test
score is known as:
37. A key difference between concurrent and predictive validity has to do with
A. the time frame during which data on the criterion measure is collected.
B. the magnitude of the reliability coefficient that will be considered significant at the .05
level.
C. the magnitude of the validity coefficient that will be considered significant at the .05
level.
D. Both b and c
38. Which is an example of a criterion?

A. achievement test scores
B. success in being able to repair a defective toaster
C. student ratings of teaching effectiveness
D. All of these
39. Criterion contamination occurs when

A. the criterion measure is influenced by the predictor measure.
B. subjects talk to one another about the test.
C. the characteristic being measured occurs with low frequency in the group being
studied.
D. All of these
40. Which BEST represents an unobtrusive measure of marital adjustment?

A. the number of years a couple has been married
B. self-ratings of marital satisfaction by each spouse
C. ratings of marital satisfaction made by trained observers
D. None of these
41. According to the text, face validity may ultimately be more of an issue regarding
__________ than ________.
A. social values/psychometric soundness.
B. psychometric soundness/public relations.
C. public relations/psychometric soundness.
D. social values/public perception.
42. An investigation of a test's construct validity may yield evidence that
A. the test is measuring a single construct.
B. the test does not correlate significantly with another test purporting to measure the
same construct.
C. test scores increase as a function of age.
D. All of these
43. What type of validity evidence BEST sheds light on how a shorter and less expensive
test compares with a longer and more expensive one?
A. predictive criterion-related validity
B. concurrent criterion-related validity
C. content validity
44. What type of validity evidence best sheds light on whether a college admissions test
is valid for selecting students who will complete the program within 4 years?
C. content validity
45. Blueprinting is best associated with

D. architectural validity.
46. The magnitude of a validity coefficient may be affected by

A. attrition of the sample.
B. restriction of range.
C. inflation of range.
D. All of these
47. Which magnitude of validity coefficient is typically acceptable to conclude that a test
is valid?
A. 1.50
B. 1.80
C. above 1.90
D. None of these
48. A coefficient of correlation is calculated between Malcolm's score on a test of
sociopathy and a clinician's rating of Malcolm on the variable of sociopathy. This
coefficient of correlation might also be referred to as
A. an index of reliability.
B. an index of sociopathy.
D. a content-related validity coefficient.
49. Criterion-related validity can be evaluated through the use of

A. expectancy data.
B. reliability coefficients.
C. the Rulon formula.
D. None of these
50. Expectancy tables are used in evaluating

B. factorial validity.
D. None of these
51. The percentages included in expectancy tables refer to the number of

A. tests administered versus tests passed.
B. people obtaining a particular test-score/criterion-score combination.
C. items the test developer expects will be sufficient for the item pool.
D. people who are expected to pass the test but may not be successful at the criterion.
52. Which statement is always TRUE of the criterion in expectancy tables?

A. The criterion is represented as the number of points scored.
B. The criterion is dichotomized.
C. The criterion is listed by score interval.
D. The criterion can be objectively scored.
53. In an expectancy table, the percentage of employees who are currently successful in
a position provides some indication of:
A. the validity of the proposed selection measure as compared to another proposed
selection measure.
B. the percent successful using current methods of selection.
C. the reliability of the proposed selection measure.
D. the base rate of the proposed selection measure.
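As an illustration of the expectancy tables referred to in the items above, the following Python sketch tabulates, for each test-score interval, the percentage of people who met a dichotomized criterion. It is a minimal sketch under stated assumptions: the function name, the score intervals, and the sample records are hypothetical and do not come from the text.

```python
def expectancy_table(records, intervals):
    """Return {(low, high): percent successful} for each score interval.

    records   -- list of (test_score, succeeded) pairs; succeeded is a bool
                 (a dichotomized criterion is assumed for this sketch)
    intervals -- list of inclusive (low, high) test-score intervals
    """
    table = {}
    for low, high in intervals:
        outcomes = [ok for score, ok in records if low <= score <= high]
        # None marks an interval in which nobody scored
        table[(low, high)] = 100 * sum(outcomes) / len(outcomes) if outcomes else None
    return table


# Hypothetical data: (test score, later judged successful on the job?)
sample_records = [
    (35, False), (42, False), (48, True),
    (55, True), (58, False), (63, True),
    (77, True), (81, True), (90, True),
]
score_intervals = [(0, 49), (50, 74), (75, 100)]

table = expectancy_table(sample_records, score_intervals)
for (low, high), pct in table.items():
    print(f"{low:>3}-{high:<3} {pct:.1f}% successful")
```

An expectancy chart would simply graph the percentages this table holds.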
54. Which measures provide statistical evidence for the judgment of criterion-related
validity?
A. reliability coefficient and content validity ratio
B. validity coefficient and expectancy data
C. validity coefficient and content validity ratio
D. reliability coefficient and expectancy data
55. Employment test data suggests that an individual applicant is incapable of
successfully performing a particular job. However, in reality, this individual would be very
successful at the job. This situation exemplifies what is meant by
A. a base rate.
B. a false positive.
C. a false negative.
D. an "E" True Hollywood Story.
56. Which is an example of a false positive?

A. A test identifies a client as schizophrenic when the client is not.
B. A test correctly identifies a client as schizophrenic.
C. A test correctly identifies a client as not having schizophrenia.
D. A test indicates that a client is not schizophrenic when he is.
57. If you were a psychologist working in the field of human resources, which claim for a
new personnel selection test by a test publisher would be MOST persuasive?
A. The test identifies a large number of false positives.
B. The test improves the hit rate.
C. The test identifies a large base rate.
D. The test improves the selection ratio.
58. A construct is
A. unobservable.
B. something that describes behavior.
C. something that is assumed to exist.
D. All of these
59. Which qualifies as a construct?

A. depression
B. intelligence
C. mechanical aptitude
D. All of these
60. All validity evidence can be interpreted as ________ validity.
A. content
C. predictive
D. construct
61. Evidence of the homogeneity of a test can be found in the

A. correlation between a test and some criterion.
B. correlation between test items and total test scores.
C. correlation between subtest scores and total scores.
D. Both b and c
62. Which statistic is appropriate for use in estimating the heterogeneity of a test
composed of multiple-choice items?
A. point-biserial correlation coefficient
B. Pearson product-moment correlation coefficient
C. coefficient alpha
D. chi square
63. Test scores may be affected in pre- and post-testing by

A. therapy.
B. medication.
C. education.
D. All of these
64. If a test is a valid measure of a particular construct, we would expect that

A. groups of people who differ with respect to the construct will obtain different test
scores.
B. groups of people who differ with respect to the construct will obtain similar test
scores.
C. groups of people who obtain similar scores will have similar personalities.
D. None of these
65. A significant, positive relationship exists between scores on a new test of intelligence
and scores on the fourth edition of the Stanford-Binet intelligence scale. These data may
be viewed as supportive of which type of validity evidence for the new test?
A. criterion-related validity
B. content validity
C. convergent evidence of construct validity
D. discriminant evidence of construct validity
66. A statistically insignificant correlation between scores on a new test of depression
and a well-established measure of satisfaction with life may be construed as which type
of validity evidence with regard to the test of depression?
B. convergent evidence of construct validity
C. discriminant evidence of construct validity
D. None of these because there was an insignificant relationship.
67. Which is the MOST useful tool in evaluating convergent and discriminant validity
evidence?
A. the Rulon formula
B. a multitrait-multimethod matrix
C. a Greco-Latin squares design
D. an abacus
68. The names attributed to different factor loadings in a factor analysis are
A. dictated by the factors themselves.
B. subject to change as new analyses occur.
C. thoroughly validated against dictionary definitions.
D. dependent on the researcher's judgment.
69. In the context of test bias, a biased test

A. may be used fairly.
B. may be used unfairly.
C. may be used either fairly or unfairly.
D. is only used by biased test users.
70. A test is considered to contain a bias if
A. 50% of the testtakers fail the test.
B. one group, such as males, consistently performs better than another group, such as
females.
C. a factor inherent in the test systematically prevents accurate measurement.
D. the test developer was found to harbor prejudice against some group.
71. Which is TRUE regarding a rating?

A. It refers only to a numerical judgment that places a person or an attribute along a
continuum.
B. It refers only to a verbal judgment that places a person or an attribute along a
continuum.
C. It tends not to involve a judgment.
D. It refers to either a numerical or a verbal judgment that places a person or an
attribute along a continuum.
72. Which term is used to refer to the tendency of a rater to evaluate ratees higher than
they objectively deserve because of the rater's inability to discriminate between aspects
of the ratee's behavior?
A. halo effect
B. random error
C. generosity error
D. severity error
73. Rating errors

A. may be unintentional.
B. may be intentional.
C. may involve a tendency to be lenient in rating.
D. All of these
74. A supervisor unintentionally rates his supervisees less favorably than they really
deserve. Which type of error is at work here?
A. unconscious error
B. severity error
C. random error
D. vocational error
75. Which type of error has occurred when a music critic's review of Lady GaGa's latest
album is more positive than most people on the planet believe was warranted?
A. fashion error
B. central tendency error
C. severity error
D. halo effect
76. A rater systematically assigns ratings in the middle range, thus avoiding extremely
positive and negative ratings. Which type of error BEST characterizes this rater's
ratings?
A. leniency error
C. severity error
D. halo effect
77. Issues of "fairness" as applied to tests

A. are seldom discussed in the popular media.
B. may be determined through mathematical procedures.
C. are generally agreed on.
D. are rooted in moral and philosophical issues.
78. Quotas may be viewed as one type of remedy for

A. low reliability of selection tests.
B. previously unfair practices.
C. low validity of selection tests.
D. All of these
79. Which of the following is TRUE of test bias as compared to test fairness?
A. Test bias is dependent on statistical analyses while test fairness relates to values.
B. Test bias is dependent on values while test fairness relates to statistical analyses.
C. Whether a test is fair can be answered with certainty while whether a test is biased
cannot.
D. None of these statements are true.
80. Any definition of test fairness as used in a psychometric context would be likely to
include reference to
A. the percent of items answered correctly by members of different groups.
B. the mean scores earned by various groups on a particular test.
C. the degree to which a test is used in an impartial, just and equitable way.
D. All of these
81. If new predictors explain something about a predicted score that was not already
explained by existing predictors, the new predictor might be praised for its
A. test-retest reliability.
B. incremental validity.
C. construct validity.
D. face validity.
82. In psychological testing and assessment, bias refers to

A. random variation in test performance attributable to covert prejudice on the part of
the test developer.
B. systematic variation in test performance that is unrelated to the construct that the
test is intended to measure.
C. a test or testing practice that systematically favors the performance of one group of
testtakers over another.
D. All of these
83. Which of the following is the BEST way to minimize test bias?
A. create separate norm groups for different groups so that any potential bias is reduced.
B. have a panel of experts review the test items at various stages during the test's
development.
C. pre-screen examiners to be used in the test administration for any signs of bias or
prejudice.
D. employ the multitrait-multimethod matrix to screen items for bias.
84. A new test designed to gauge competence to stand trial is found to lack face validity.
Which is the MOST likely consequence of this fact?
A. Judges will urge assessors to use this test.
B. Lawyers will urge assessors not to use this test.
C. Impression management will be less of a factor in the test results.
D. Whether defendants are competent will be less of a factor in the test results.
85. Which BEST describes the concept of validity as applied to tests?
A. It refers to how well a test measures what the test authors intend it to measure.
B. It refers to whether the same results could have occurred by chance less than five
times in a hundred.
C. It refers to how well a specific sample performs on an administration of a test.
D. It refers to whether or not a test is administered under standardized conditions.
86. Relating to Lawshe's Content Validity Ratio (CVR), if half of the panelists rated a
given item on an employment test as representing an essential skill required on the job,
the item's CVR would
A. be negative.
B. be zero.
C. be positive.
D. depend on how many panelists there are.
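For reference, Lawshe's content validity ratio is commonly written as CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists who rate an item "essential" and N is the total number of panelists. The Python sketch below applies that formula; the function name and the sample panel sizes are assumptions chosen for illustration, not values from the text.

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR = (n_e - N/2) / (N/2).

    n_essential -- number of panelists rating the item "essential"
    n_panelists -- total number of panelists (N)
    """
    if n_panelists <= 0:
        raise ValueError("n_panelists must be positive")
    if not 0 <= n_essential <= n_panelists:
        raise ValueError("n_essential must be between 0 and n_panelists")
    half = n_panelists / 2
    return (n_essential - half) / half


# Hypothetical panel of 10 raters with varying levels of agreement
for n_essential in (2, 5, 8, 10):
    cvr = content_validity_ratio(n_essential, 10)
    print(f"{n_essential} of 10 rate 'essential': CVR = {cvr:+.2f}")
```

The ratio is negative when fewer than half of the panelists rate the item essential and positive when more than half do, ranging from -1 to +1.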
87. A psychologist wants to determine the criterion-related validity of an intelligence test
by determining how well it predicts a student's placement in a special class. If the
psychologist used the intelligence test for both diagnosis and special class placement,
that criterion would be said to be
A. irrelevant.
B. contaminated.
C. invalid.
D. negatively skewed.
88. A test developer compares a student's performance on a newly developed math
achievement test to the same student's performance on a well-established math
achievement test for the purpose of exploring the ________ validity of the new test.
A. content
B. concurrent criterion-related
C. predictive criterion-related
D. construct
89. Comparing SAT scores earned in high school with the first semester college GPA of
that same student is a process related to establishing the ________ validity of the SAT.
A. content
D. construct
90. The results of a predictive validity study of a test will likely be affected most by
A. the characteristics of the sample tested, such as attrition and self-selection.
B. the number of items on the test, with longer tests demonstrating higher predictive
validity.
C. the correlation coefficient chosen to measure the validity.
D. the administration time required for the test compared with that of the criterion test
chosen.
91. Which is an example of convergent evidence for the construct validity of a test
measuring fear of cats?
A. a high correlation between the test and an existing validated test measuring fear of
cats
B. a high correlation with an existing validated test measuring more-generalized fear
C. a low correlation between the test and a test to measure fear of dogs
D. Both a and b
92. In contrast to a trinitarian view of validity, a unitary view of validity takes into
account
A. two of the three elements of the trinitarian view.
B. none of the elements of the trinitarian view but a new model based on consequences
of test use.
C. all three elements of the trinitarian view plus additional factors such as cultural
values.
D. None of these
93. If a newly developed test designed to measure happiness correlates with other tests
of happiness but not with tests of sadness, this is referred to as __________________ and
_________________ evidence of validity, respectively.
A. convergent; discriminant
B. discriminant; convergent
C. homogeneous; concurrent
D. concurrent; homogeneous
94. Which is TRUE regarding the concept of test fairness?

A. Fairness is relatively easy to determine compared with bias.
B. Fairness is usually determined statistically.
C. Fairness often involves moral/ethical issues.
D. All of these
95. Which is TRUE regarding the adjustment of test scores as a function of group
membership?
A. It is illegal for purposes of making hiring or promotion decisions according to the Civil
Rights Act of 1991.
B. It is viewed as helping guarantee the proportional representation of various minority
groups in the workplace.
C. It is viewed as allowing the preferential treatment of certain groups.
D. All of these
96. The primary purpose of the correlation matrix in the multitrait-multimethod matrix
technique is
A. to break down variables into a smaller number of factors.
B. to create a large number of factors from a basic set of variables.
C. to determine how well a variable correlates with itself.
D. None of these
97. "Unequal levels of difficulty between two groups" characterizes the definition of a
biased test that would MOST probably be a quote from
A. any random member of the general public.
B. a court.
C. a psychometrician.
D. Kourtney and Kim Kardashian.
98. Which of the following is NOT included in the traditional "trinitarian"
conceptualization of validity?
A. face validity
B. content validity
C. construct validity
D. criterion-related validity
99. In studies that indicate that Attention Deficit Disorder occurs in approximately 2% of
the population, 2% represents the __________ for the disorder.
A. hit rate
B. base rate
C. miss rate
D. sample
100. Which of the following is the best definition of hit rate?
A. the proportion of people the test correctly identifies as possessing a particular trait,
behavior, characteristic, or attribute
B. the proportion of people in the general population who possess the particular trait,
C. the proportion of people the test incorrectly identifies as possessing a particular trait,
D. the degree of validity of a particular test
101. The extent to which a particular factor contributes to a test score is referred to as a
A. true score.
B. base rate.
C. factor loading.
D. hit rate.
102. Factor analysis

A. is a class of mathematical procedures.
B. is a data reduction technique.
C. explains the extent to which a factor or factors explain test scores.
D. All of these
103. Using a test that measures a low base rate trait

A. will likely result in more correct than incorrect classifications.
B. will likely result in more incorrect than correct classifications.
C. will result in an equal number of correct and incorrect classifications.
D. will have results that cannot be determined based on the information presented.
104. This Child Abuse Potential (CAP) Inventory boasts an accuracy rate of approximately
90%. Properly interpreted, this means that
A. 90% of the people who score high on the CAP physically abuse children.
B. 90% of the people who score low on the CAP do not physically abuse children.
C. in groups with a 50% base rate, 90% of those who abuse children are correctly
identified.
D. in groups with a 90% base rate, 50% of those who abuse children are correctly
identified.
105. If the rate of a particular disorder occurring in the population is low, what impact
does this have on the classification of individuals based on the results of a psychological
test?
A. There will be no impact on the accuracy of the classification.
B. More individuals will be incorrectly classified as not having the disorder.
C. More individuals will be incorrectly classified as having the disorder.
D. The impact cannot be determined based on the information provided.
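The arithmetic behind the base-rate items above can be sketched in Python. This is a minimal illustration under stated assumptions: the population size, the 2% and 50% base rates, and the 0.90 sensitivity and specificity figures are hypothetical and do not describe the CAP Inventory or any other real test.

```python
def classification_counts(n, base_rate, sensitivity, specificity):
    """Expected classification counts for a population of n people.

    sensitivity -- proportion of people WITH the trait the test flags
    specificity -- proportion of people WITHOUT the trait the test clears
    """
    with_trait = n * base_rate
    without_trait = n - with_trait
    true_pos = with_trait * sensitivity
    true_neg = without_trait * specificity
    return {
        "true_pos": true_pos,
        "false_neg": with_trait - true_pos,
        "true_neg": true_neg,
        "false_pos": without_trait - true_neg,
    }


def proportion_of_positives_correct(counts):
    """Share of people flagged by the test who actually have the trait."""
    return counts["true_pos"] / (counts["true_pos"] + counts["false_pos"])


# Same hypothetical test applied to a low and an even base rate
low_base = classification_counts(10_000, 0.02, 0.90, 0.90)
even_base = classification_counts(10_000, 0.50, 0.90, 0.90)

for label, counts in (("2% base rate", low_base), ("50% base rate", even_base)):
    share = proportion_of_positives_correct(counts)
    print(f"{label}: false positives = {counts['false_pos']:.0f}, "
          f"flagged cases that are correct = {share:.1%}")
```

Under these assumed figures, the same test flags far more people incorrectly when the trait is rare, even though its sensitivity and specificity are unchanged.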
c6 Key
1. In legal terminology, a valid contract is a contract that

B. has been executed with the proper formalities.
C. is well grounded on principles of evidence.
D. None of these
Cohen - Chapter 06 #1
2. As the term is applied to a test, validity is a judgment or estimate of how well a test
B. measures what it purports to measure in a particular context.
C. satisfies the deductions that could logically be made from inferences about it.
D. All of these
3. A test reviewer comes to the conclusion that a certain test is "a valid test." This means
that the reviewed test has been shown to be valid for
A. a particular use with a particular population for the life of the test.
B. a particular use with a universal population of testtakers for a limited time.
C. universal use with all testtakers for the life of the test.
D. a particular use with a particular population at a particular time.
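4. Each of the three approaches to validity assessment in the trinitarian model should
BEST be thought of as
A. mutually exclusive as evidence of a test's validity with any one source necessary and
sufficient for demonstrating a test's validity.
B. one type of evidence that, with others, contributes to a judgment concerning the
validity of a test.
C. insufficient, either by themselves or together with the other two, to demonstrate the
validity of a test.
D. None of these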
5. The validation of a test is a process

A. that can be carried out by the test author.
B. that can be carried out by the test user.
C. of gathering evidence of the test's validity.
D. All of these
6. Comedian Rodney Dangerfield was cited in the text to illustrate a point about how
which of the following is viewed?
A. test validation
B. content validity
C. face validity
7. "It's a measure of validity that is arrived at by a comprehensive analysis of how scores
on the test relate to other test scores." This statement is a reference to:
A. face validity
B. content validity
C. the trinitarian index
8. Messick is to unitarian as __________ is to trinitarian.
A. Cronbach
B. Lawshe
C. Landy
D. Dangerfield
9. As mentioned in Chapter 6 of your text, the measurement of content validity is
particularly important in
A. classroom settings, where tests will form the basis of a grade.
B. employment settings, where tests may be used to promote employees.
C. courtroom settings, where tests may be used to determine competence.
D. screening for the potential of emission of violent or aggressive behavior.
10. Lawshe's method for gauging agreement among raters is used to derive a measure
of
A. face validity.
D. construct validity.
11. In Chapter 6 of your text, Dr. Adam Shoemaker, the featured professional in Meet an
Assessment Professional, described the use of a test with little criterion validity. Dr.
Shoemaker recalled that this test was used for the purpose of
A. gauging inter-item consistency of another test.
B. gaining "buy-in" from the test users.
C. providing a "job preview" of sorts to assessees.
D. hiring candidates for mid-level executive positions.

12. If a test developer has only a "fuzzy" vision of the construct being measured, then
A. the content validity of the test is likely to suffer.
B. the construct validity of the test is likely to suffer.
C. content irrelevant to the targeted construct may be measured.
D. All of these
13. Test blueprinting is applied in the design of

A. an attitude test.
B. a personality test.
C. an aptitude test.
D. All of these
14. In order to remain consistent with a test's blueprint, a test administered on a regular
basis is likely to require
A. item pool management.
B. base rate maintenance.
C. predictive validity certification.
D. None of these
15. Criterion-related validity is to predictive validity as criterion-related validity is to

D. test bias.

16. A team of consumer psychologists is interested in conducting research to test the
palatability of Papa John's Pizza (PJP). A PJP Palatability Test is developed on the basis of
the opinions of a sample of prison inmates sentenced to life in prison. These same
inmates are then used to validate a paper-and-pencil "PJP Palatability Survey." What error
has been committed by the researchers?
A. The researchers used an inappropriate population to test.
B. The test validation was invalid due to criterion contamination.
C. Convergent evidence was confused with discriminant evidence.
D. A Constitutional prohibition against subjecting prisoners to cruel and unusual
punishment was violated.
17. It has to do with the degree to which an additional predictor explains something
about the criterion measure that is not explained by predictors already in use. It is
A. the false positive rate.
B. evidence of construct validity.
C. predictive validity.
D. incremental validity.
18. An expectancy chart is

A. a graphic representation of an expectancy table.
B. a table illustrating the incremental validity of a test.
C. a pictorial image of a hit rate versus a miss rate.
D. All of these
19. "The effect of instituting this remedy for adverse impact is to make equivalent all
scores that fall within a particular range." The remedy for adverse impact referred to
here is technically referred to as
A. within-group norming.
B. differential cut-offs.
C. preference policies.
D. banding.

20. "How can group differences on cognitive ability tests be reduced while retaining
existing high levels of reliability and criterion-related validity?" According to Gottfredson,
the answer to this question
A. lies in the judicious application of affirmative action strategies.
B. must be answered by measurement professionals for themselves.
C. must come from strategies designed to minimize adverse impact.
D. will not come from measurement-related research.
21. A test is considered valid when the test

B. measures whatever it is that it measures consistently.
C. can be administered efficiently and cost-effectively.
D. has little or no error associated with it.
22. Which is NOT a method of evaluating the validity of a test?

A. evaluating scores on the test as compared to scores obtained on other tests
B. evaluating the content of the test
C. evaluating the percentage of passing and failing grades on the test
D. evaluating test scores as they relate to predictions from a particular theory
23. Predictive and concurrent validity can be subsumed under

B. criterion-related validity.
C. face validity.
D. true score validity.

24. Relating scores obtained on a test to other test scores or data from other assessment
procedures is typically done in an effort to establish the __________ validity of a test.
A. content-related
C. face
D. about-face
25. Face validity refers to

A. the most preferred method for determining validity.
B. another name for content validity.
C. the appearance of relevancy of the test items.
D. validity determined by means of face-to-face interviews.
26. Face validity

A. may influence the way the testtaker approaches the situation.
B. relates more to what the test appears to measure than what the test may actually
measure.
C. is given short shrift as compared to other indices of validity.
D. All of these
27. Which assessment technique is the BEST example of a face valid method?
A. a personality test in which testtakers are asked to describe what they see in inkblots
B. administering a word processing test to a person applying to be a word processor
C. asking testtakers to draw a picture of their family to assess family relationships
D. measuring the height of applicants applying for a semi-pro basketball team

28. An instructor announces that an examination will cover the topics of reliability and
validity. Malcolm boasts that he will read and study only the material on reliability. As it
turns out, all of the test questions are only on the topic of reliability. The MOST
reasonable conclusion a student of assessment could draw from this is that
A. the examination lacked criterion-related validity.
B. the examination lacked content validity.
C. the examination lacked face validity.
D. it's worth getting to know Malcolm better.
29. Lawshe devised a method for determining agreement among raters or judges who
rate items on how essential they are. This method provides a way to quantify what type
of validity?
A. content
B. construct
C. criterion-related
D. predictive
30. Before constructing a comprehensive final examination that covers everything you
have studied since Day 1 of your course, your instructor reviews the objectives of the
course, the textbook, and all lecture notes. Your instructor is clearly making a diligent
effort to maximize the __________ validity of the final examination.
A. content
C. predictive
D. internal consistency
31. In calculating the content validity ratio, panelists are asked to determine
A. if the test item has face validity and an acceptable level of reliability.
B. if the test item is too long or too short.
C. if the test item is ambiguous.
D. if the skill or knowledge measured by the item is essential.

32. The minimum value of a content validity ratio necessary to be statistically significant
at the .05 level is dependent on
A. the number of panelists judging the items.
B. the degree of the construct validity of the test.
C. the number of testtakers.
D. the number of items on the test.
33. A standard against which a test or test score is evaluated is known as

A. a facet.
B. a correlation coefficient.
D. a criterion.
34. Which of the following are BEST viewed as varieties of criterion-related validity?

A. concurrent validity and face validity
B. content validity and predictive validity
C. concurrent validity and predictive validity
D. concurrent validity and content validity
35. The form of criterion-related validity that reflects the degree to which a test score is
correlated with a criterion measure obtained at the same time that the test score was
obtained is known as:

36. The form of criterion-related validity that reflects the degree to which a test score
correlates with a criterion measure that was obtained some time subsequent to the test
score is known as:
37. A key difference between concurrent and predictive validity has to do with
A. the time frame during which data on the criterion measure is collected.
B. the magnitude of the reliability coefficient that will be considered significant at the .05
level.
C. the magnitude of the validity coefficient that will be considered significant at the .05
level.
D. Both b and c
38. Which is an example of a criterion?

A. achievement test scores
B. success in being able to repair a defective toaster
C. student ratings of teaching effectiveness
D. All of these
39. Criterion contamination occurs when

A. the criterion measure is influenced by the predictor measure.
B. subjects talk to one another about the test.
C. the characteristic being measured occurs with low frequency in the group being
studied.
D. All of these

40. Which BEST represents an unobtrusive measure of marital adjustment?
A. the number of years a couple has been married
B. self-ratings of marital satisfaction by each spouse
C. ratings of marital satisfaction made by trained observers
D. None of these
41. According to the text, face validity may ultimately be more of an issue regarding
__________ than ________.
A. social values/psychometric soundness.
B. psychometric soundness/public relations.
C. public relations/psychometric soundness.
D. social values/public perception.
42. An investigation of a test's construct validity may yield evidence that

A. the test is measuring a single construct.
B. the test does not correlate significantly with another test purporting to measure the
same construct.
C. test scores increase as a function of age.
D. All of these
43. What type of validity evidence BEST sheds light on how a shorter and less expensive
test compares with a longer and more expensive one?
C. content validity

44. What type of validity evidence best sheds light on whether a college admissions test
is valid for selecting students who will complete the program within 4 years?
C. content validity
45. Blueprinting is best associated with

D. architectural validity.
46. The magnitude of a validity coefficient may be affected by

A. attrition of the sample.
B. restriction of range.
C. inflation of range.
D. All of these
47. Which magnitude of validity coefficient is typically acceptable to conclude that a test
is valid?
A. 1.50
B. 1.80
C. above 1.90
D. None of these

48. A coefficient of correlation is calculated between Malcolm's score on a test of
sociopathy and a clinician's rating of Malcolm on the variable of sociopathy. This
coefficient of correlation might also be referred to as
A. an index of reliability.
B. an index of sociopathy.
D. a content-related validity coefficient.
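Worked illustration for item 48: the sketch below shows how a coefficient of correlation between test scores and a criterion measure (here, clinician ratings) could be computed. The scores, the ratings, and the function name pearson_r are hypothetical and are not taken from the text.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from the means
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data: test scores and clinician ratings for five testtakers
test_scores = [12, 18, 25, 31, 40]
clinician_ratings = [2, 3, 3, 5, 6]

print(round(pearson_r(test_scores, clinician_ratings), 2))
```

The closer the printed value is to 1 in absolute terms, the more strongly the test scores and the criterion ratings vary together in this hypothetical sample.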
49. Criterion-related validity can be evaluated through the use of

A. expectancy data.
B. reliability coefficients.
C. the Rulon formula.
D. None of these
50. Expectancy tables are used in evaluating

B. factorial validity.
D. None of these
51. The percentages included in expectancy tables refer to the number of

A. tests administered versus tests passed.
B. people obtaining a particular test-score/criterion-score combination.
C. items the test developer expects will be sufficient for the item pool.
D. people who are expected to pass the test but may not be successful at the criterion.

52. Which statement is always TRUE of the criterion in expectancy tables?
A. The criterion is represented as the number of points scored.
B. The criterion is dichotomized.
C. The criterion is listed by score interval.
D. The criterion can be objectively scored.
53. In an expectancy table, the percentage of employees who are currently successful in
a position provides some indication of:
A. the validity of the proposed selection measure as compared to another proposed
selection measure.
B. the percent successful using current methods of selection.
C. the reliability of the proposed selection measure.
D. the base rate of the proposed selection measure.
54. Which measures provide statistical evidence for the judgment of criterion-related
validity?
A. reliability coefficient and content validity ratio
B. validity coefficient and expectancy data
C. validity coefficient and content validity ratio
D. reliability coefficient and expectancy data
55. Employment test data suggests that an individual applicant is incapable of
successfully performing a particular job. However, in reality, this individual would be very
successful at the job. This situation exemplifies what is meant by
A. a base rate.
B. a false positive.
C. a false negative.
D. an "E" True Hollywood Story.

56. Which is an example of a false positive?
A. A test identifies a client as schizophrenic when the client is not.
B. A test correctly identifies a client as schizophrenic.
C. A test correctly identifies a client as not having schizophrenia.
D. A test indicates that a client is not schizophrenic when he is.
57. If you were a psychologist working in the field of human resources, which claim for a
new personnel selection test by a test publisher would be MOST persuasive?
A. The test identifies a large number of false positives.
B. The test improves the hit rate.
C. The test identifies a large base rate.
D. The test improves the selection ratio.
58. A construct is
A. unobservable.
B. something that describes behavior.
C. something that is assumed to exist.
D. All of these
59. Which qualifies as a construct?

A. depression
B. intelligence
C. mechanical aptitude
D. All of these

60. All validity evidence can be interpreted as ________ validity.
A. content
C. predictive
D. construct
61. Evidence of the homogeneity of a test can be found in the

A. correlation between a test and some criterion.
B. correlation between test items and total test scores.
C. correlation between subtest scores and total scores.
D. Both b and c
62. Which statistic is appropriate for use in estimating the heterogeneity of a test
composed of multiple-choice items?
A. point-biserial correlation coefficient
B. Pearson product-moment correlation coefficient
C. coefficient alpha
D. chi square
63. Test scores may be affected in pre- and post-testing by

A. therapy.
B. medication.
C. education.
D. All of these

64. If a test is a valid measure of a particular construct, we would expect that
A. groups of people who differ with respect to the construct will obtain different test
scores.
B. groups of people who differ with respect to the construct will obtain similar test
scores.
C. groups of people who obtain similar scores will have similar personalities.
D. None of these
65. A significant, positive relationship exists between scores on a new test of intelligence
and scores on the fourth edition of the Stanford-Binet intelligence scale. These data may
be viewed as supportive of which type of validity evidence for the new test?
B. content validity
C. convergent evidence of construct validity
D. discriminant evidence of construct validity
66. A statistically insignificant correlation between scores on a new test of depression
and a well-established measure of satisfaction with life may be construed as which type
of validity evidence with regard to the test of depression?
B. convergent evidence of construct validity
C. discriminant evidence of construct validity
D. None of these because there was an insignificant relationship.
67. Which is the MOST useful tool in evaluating convergent and discriminant validity
evidence?
A. the Rulon formula
B. a multitrait-multimethod matrix
C. a Greco-Latin squares design
D. an abacus

68. The names attributed to different factor loadings in a factor analysis are
A. dictated by the factors themselves.
B. subject to change as new analyses occur.
C. thoroughly validated against dictionary definitions.
D. dependent on the researcher's judgment.
69. In the context of test bias, a biased test

A. may be used fairly.
B. may be used unfairly.
C. may be used either fairly or unfairly.
D. is only used by biased test users.
70. A test is considered to contain a bias if

A. 50% of the test-takers fail the test.
B. one group, such as males, consistently performs better than another group, such as
females.
C. a factor inherent in the test systematically prevents accurate measurement.
D. the test developer was found to harbor prejudice against some group.
71. Which is TRUE regarding a rating?

A. It refers only to a numerical judgment that places a person or an attribute along a
continuum.
B. It refers only to a verbal judgment that places a person or an attribute along a
continuum.
C. It tends not to involve a judgment.
D. It refers to either a numerical or a verbal judgment that places a person or an
attribute along a continuum.

72. Which term is used to refer to the tendency of a rater to evaluate ratees higher than
they objectively deserve because of the rater's inability to discriminate between aspects
of the ratee's behavior?
A. halo effect
B. random error
C. generosity error
D. severity error
73. Rating errors

A. may be unintentional.
B. may be intentional.
C. may involve a tendency to be lenient in rating.
D. All of these
74. A supervisor unintentionally rates his supervisees less favorably than they really
deserve. Which type of error is at work here?
A. unconscious error
B. severity error
C. random error
D. vocational error
75. Which type of error has occurred when a music critic's review of Lady Gaga's latest
album is more positive than most people on the planet believe was warranted?
A. fashion error
C. severity error
D. halo effect

76. A rater systematically assigns ratings in the middle range, thus avoiding extremely
positive and negative ratings. Which type of error BEST characterizes this rater's
ratings?
A. leniency error
C. severity error
D. halo effect
77. Issues of "fairness" as applied to tests

A. are seldom discussed in the popular media.
B. may be determined through mathematical procedures.
C. are generally agreed on.
D. are rooted in moral and philosophical issues.
78. Quotas may be viewed as one type of remedy for

A. low reliability of selection tests.
B. previously unfair practices.
C. low validity of selection tests.
D. All of these
79. Which of the following is TRUE of test bias as compared to test fairness?
A. Test bias is dependent on statistical analyses while test fairness relates to values.
B. Test bias is dependent on values while test fairness relates to statistical analyses.
C. Whether a test is fair can be answered with certainty while whether a test is biased
cannot.
D. None of these statements are true.

80. Any definition of test fairness as used in a psychometric context would be likely to
include reference to
A. the percent of items answered correctly by members of different groups.
B. the mean scores earned by various groups on a particular test.
C. the degree to which a test is used in an impartial, just and equitable way.
D. All of these
81. If new predictors explain something about a predicted score that was not already
explained by existing predictors, the new predictor might be praised for its
A. test-retest reliability.
B. incremental validity.
C. construct validity.
D. face validity.
82. In psychological testing and assessment, bias refers to

A. random variation in test performance attributable to covert prejudice on the part of
the test developer.
B. systematic variation in test performance that is unrelated to the construct that the
test is intended to measure.
C. a test or testing practice that systematically favors the performance of one group of
testtakers over another.
D. All of these
83. Which of the following is the BEST way to minimize test bias?
A. create separate norm groups for different groups so that any potential bias is reduced.
B. have a panel of experts review the test items at various stages during the test's
development.
C. pre-screen examiners to be used in the test administration for any signs of bias or
prejudice.
D. employ the multitrait-multimethod matrix to screen items for bias.

84. A new test designed to gauge competence to stand trial is found to lack face validity.
Which is the MOST likely consequence of this fact?
A. Judges will urge assessors to use this test.
B. Lawyers will urge assessors not to use this test.
C. Impression management will be less of a factor in the test results.
D. Whether defendants are competent will be less of a factor in the test results.
85. Which BEST describes the concept of validity as applied to tests?

A. It refers to how well a test measures what the test authors intend it to measure.
B. It refers to whether the same results could have occurred by chance less than five
times in a hundred.
C. It refers to how well a specific sample performs on an administration of a test.
D. It refers to whether or not a test is administered under standardized conditions.
86. Relating to Lawshe's Content Validity Ratio (CVR), if half of the panelists rated a
given item on an employment test as representing an essential skill required on the job,
the item's CVR would
A. be negative.
B. be zero.
C. be positive.
D. depend on how many panelists there are.
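For reference, Lawshe's content validity ratio can be written as CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists rating the item "essential" and N is the total number of panelists. The sketch below is a minimal illustration of that formula; the function name and the panel sizes used in the example are hypothetical.

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR: (n_e - N/2) / (N/2).

    n_essential -- number of panelists who rated the item "essential"
    n_panelists -- total number of panelists rating the item
    """
    if n_panelists <= 0:
        raise ValueError("n_panelists must be positive")
    if not 0 <= n_essential <= n_panelists:
        raise ValueError("n_essential must be between 0 and n_panelists")
    half = n_panelists / 2
    return (n_essential - half) / half


# Hypothetical panel of 10 raters, varying how many call the item essential
for n_essential in (0, 3, 5, 8, 10):
    print(n_essential, content_validity_ratio(n_essential, 10))
```

The ratio ranges from -1 (no panelist rates the item essential) to +1 (every panelist does), and its sign depends on whether more or fewer than half of the panelists rate the item essential.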
87. A psychologist wants to determine the criterion-related validity of an intelligence test
by determining how well it predicts a student's placement in a special class. If the
psychologist used the intelligence test for both diagnosis and special class placement,
that criterion would be said to be
A. irrelevant.
B. contaminated.
C. invalid.
D. negatively skewed.

88. A test developer compares a student's performance on a newly developed math
achievement test to the same student's performance on a well-established math
achievement test for the purpose of exploring the ________ validity of the new test.
A. content
D. construct
89. Comparing SAT scores earned in high school with the first semester college GPA of
that same student is a process related to establishing the ________ validity of the SAT.
A. content
D. construct
90. The results of a predictive validity study of a test will likely be affected most by
A. the characteristics of the sample tested, such as attrition and self-selection.
B. the number of items on the test, with longer tests demonstrating higher predictive
validity.
C. the correlation coefficient chosen to measure the validity.
D. the administration time required for the test compared with that of the criterion test
chosen.
91. Which is an example of convergent evidence for the construct validity of a test
measuring fear of cats?
A. a high correlation between the test and an existing validated test measuring fear of
cats
B. a high correlation with an existing validated test measuring more-generalized fear
C. a low correlation between the test and a test to measure fear of dogs
D. Both a and b

92. In contrast to a trinitarian view of validity, a unitary view of validity takes into
account
A. two of the three elements of the trinitarian view.
B. none of the elements of the trinitarian view but a new model based on consequences
of test use.
C. all three elements of the trinitarian view plus additional factors such as cultural
values.
D. None of these
93. If a newly developed test designed to measure happiness correlates with other tests
of happiness but not with tests of sadness, this is referred to as __________________ and
_________________ evidence of validity, respectively.
A. convergent; discriminant
B. discriminant; convergent
C. homogeneous; concurrent
D. concurrent; homogeneous
94. Which is TRUE regarding the concept of test fairness?

A. Fairness is relatively easy to determine compared with bias.
B. Fairness is usually determined statistically.
C. Fairness often involves moral/ethical issues.
D. All of these
95. Which is TRUE regarding the adjustment of test scores as a function of group
membership?
A. It is illegal for purposes of making hiring or promotion decisions according to the Civil
Rights Act of 1991.
B. It is viewed as helping guarantee the proportional representation of various minority
groups in the workplace.
C. It is viewed as allowing the preferential treatment of certain groups.
D. All of these

96. The primary purpose of the correlation matrix in the multitrait-multimethod matrix
technique is
A. to break down variables into a smaller number of factors.
B. to create a large number of factors from a basic set of variables.
C. to determine how well a variable correlates with itself.
D. None of these
97. "Unequal levels of difficulty between two groups" characterizes the definition of a
biased test that would MOST probably be a quote from
A. any random member of the general public.
B. a court.
C. a psychometrician.
D. Kourtney and Kim Kardashian.
98. Which of the following is NOT included in the traditional "trinitarian"
conceptualization of validity?
A. face validity
B. content validity
C. construct validity
D. criterion-related validity
99. In studies that indicate that Attention Deficit Disorder occurs in approximately 2% of
the population, 2% represents the __________ for the disorder.
A. hit rate
B. base rate
C. miss rate
D. sample

100. Which of the following is the best definition of hit rate?
A. the proportion of people the test correctly identifies as possessing a particular trait.
B. the proportion of people in the general population who possess the particular trait.
C. the proportion of people the test incorrectly identifies as possessing a particular trait.
D. the degree of validity of a particular test.
101. The extent to which a particular factor contributes to a test score is referred to as a
A. true score.
B. base rate.
C. factor loading.
D. hit rate.
102. Factor analysis

A. is a class of mathematical procedures.
B. is a data reduction technique.
C. explains the extent to which a factor or factors explain test scores.
D. All of these
103. Using a test that measures a low base rate trait

A. will likely result in more correct than incorrect classifications.
B. will likely result in more incorrect than correct classifications.
C. will result in an equal number of correct and incorrect classifications.
D. will have results that cannot be determined based on the information presented.

104. This Child Abuse Potential (CAP) Inventory boasts an accuracy rate of approximately
90%. Properly interpreted, this means that
A. 90% of the people who score high on the CAP physically abuse children.
B. 90% of the people who score low on the CAP do not physically abuse children.
C. in groups with a 50% base rate, 90% of those who abuse children are correctly
identified.
D. in groups with a 90% base rate, 50% of those who abuse children are correctly
identified.
105. If the rate of a particular disorder occurring in the population is low, what impact
does this have on the classification of individuals based on the results of a psychological
test?
A. There will be no impact on the accuracy of the classification.
B. More individuals will be incorrectly classified as not having the disorder.
C. More individuals will be incorrectly classified as having the disorder.
D. The impact cannot be determined based on the information provided.
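Items 99 through 105 turn on how a base rate interacts with a test's accuracy. The sketch below tallies classification outcomes for a hypothetical test; the population size, base rate, and the two accuracy figures (the proportion of people with the trait who are correctly flagged, and the proportion without it who are correctly cleared) are assumptions chosen for illustration, not values from the text.

```python
def classification_counts(population, base_rate, sensitivity, specificity):
    """Tally true/false positives and negatives for a hypothetical test.

    base_rate   -- proportion of the population that has the trait
    sensitivity -- proportion of people with the trait the test flags
    specificity -- proportion of people without the trait the test clears
    """
    with_trait = round(population * base_rate)
    without_trait = population - with_trait
    true_pos = round(with_trait * sensitivity)
    false_neg = with_trait - true_pos
    true_neg = round(without_trait * specificity)
    false_pos = without_trait - true_neg
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "false_pos": false_pos,
        "true_neg": true_neg,
    }


# Hypothetical comparison: the same test applied at a low and a high base rate
for rate in (0.02, 0.50):
    counts = classification_counts(10_000, rate, 0.90, 0.90)
    print(rate, counts)
```

Comparing the false_pos and true_pos counts across the two runs shows how the mix of correct and incorrect "positive" classifications shifts as the base rate changes, even though the test itself is unchanged.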

c6 Summary
Category               # of Questions
Cohen - Chapter 06     106
