Hamzeh Dodeen
To cite this article: Hamzeh Dodeen (2008) Assessing test-taking strategies of university students:
developing a scale and estimating its psychometric indices, Assessment & Evaluation in Higher
Education, 33:4, 409-419, DOI: 10.1080/02602930701562874
Assessment & Evaluation in Higher Education
Vol. 33, No. 4, August 2008, 409–419
Test-taking strategies are important cognitive skills that strongly affect students'
performance in tests. Using appropriate test-taking strategies improves students'
achievement and grades, improves students' attitudes toward tests and reduces test
anxiety. This results in improved test accuracy and validity. This study aimed at
developing a scale to assess students' test-taking strategies at university level. The
developed scale passed through several validation procedures that included content,
construct and criterion-related validity. Similarly, scale reliability (internal reliability
and stability over time) was assessed through several procedures. Four samples of
students (50, 828, 553 and 235) participated by responding to different versions of the
scale. The developed scale consists of 31 items distributed into four sub-scales: Before-
test, Time management, During-test and After-test. To the researcher's knowledge, this
is the first comprehensive scale developed to assess test-taking strategies used by
university students.
Introduction
Test scores should reflect the level of students' knowledge of the test content as well as
related skills. This is a critical aspect of tests that are used to make decisions regarding
particular persons. For example, "When test scores become the most important factor determining
who gets included in and excluded from educational opportunities, scores that
accurately reflect students' knowledge and skills become imperative" (Taylor and Walton
1997, 67). However, do test scores reflect only students' knowledge? Are there other variables
that influence test scores? During test-taking, ability is not the only factor that affects
students' performance. There are several cognitive and psychological factors, such as
subject matter, level of test anxiety, attitudes toward the subject of the test, attitudes towards
tests in general and test-taking strategies, that influence test scores (Hambleton et al. 1991).
Therefore, several atypical student responses or behaviors may be observed during tests
(Meijer 1996):
- An examinee having difficulty beginning the test may show "sleeping" behavior.
- "Plodding" behavior results from working slowly and not moving on to the next item.
- An alignment error may occur when a high-ability student skips an item in the test but
forgets to skip it on the answer sheet.
Other unusual responses might be due to poorly managing test time, fatigue, unfamiliarity
with the topic or the test format (Swearingen 1998) and scoring errors (Hulin et al. 1983).
*Email: hdodeen@uaeu.ac.ae
In addition, some students who are prepared for a test do not do well while others perform
better than expected (Vattanapath and Jaiprayoon 1999).
Students who are able to do better than others of the same ability level are called
test-wise. Test-wiseness is "a subject's capacity to utilize the characteristics and formats of
the test and/or the test-taking situation to receive a high score" (Hyde 1981, 3). Test-wise
students have strategies or skills that help them do well in tests independent of their knowledge
of the test content or materials (Sarnacki 1979). Their strategies or skills, usually
called test-taking strategies, are the cognitive abilities that allow them to approach any
testing situation in an appropriate manner and to know what to do before, during and after
the test.
Test-taking strategies used effectively help examinees cope with the problem of test anxiety.
For example, Carraway (1987) investigated the effect of a test-taking strategies seminar on
improving students' scores in tests and on reducing their level of anxiety related to tests.
The results of this study indicated that students who participated in the seminar had lower
test-anxiety levels and higher test scores than their matched peers who did not participate in
the seminar.
Tests are usually designed to assess students' knowledge of particular content or materials.
When other factors affect students' performance, test scores are no longer valid
measures of students' knowledge or ability levels. Test-taking strategies can improve the
overall validity of test scores so that they accurately reflect what students really know.
This could be done by ensuring that students lose points only because they do not know the
information and not for unrelated reasons. Ebel (1965) stated that "more error in measurement
is likely to originate from students who have too little, rather than too much, skill in
test-taking" (3). Assessing the test-taking strategies of university students is useful for studying
and understanding students' behavior in tests. This can be an initial step in understanding several
related phenomena, such as why some students do poorly in exams. The purpose of this study
is to develop a scale to assess test-taking strategies of university students and to estimate its
psychometric indices.
Scale development
Participants
This study was conducted on the students of the United Arab Emirates University (UAEU).
UAEU is a medium-sized four-year public university which has an enrolment of approxi-
mately 15,000 students. Four random samples (50, 828, 553 and 235 students) participated
in this study by responding to different versions of the test-taking strategies scale. These
samples represented the actual percentage of both genders in the university. Sample 1
consisted of 50 students (31 females [62%] and 19 males [38%]). Sample 2 had 828 students
(534 females [64.5%] and 294 males [35.5%]). Sample 3 consisted of 553 students (342
females [61.8%] and 211 males [38.2%]). Finally, Sample 4 had 235 students (160 females
[68%] and 75 males [32%]). All colleges at UAE University were also represented in these
samples. Table 1 shows the number and percentage of students from each college in Samples
2, 3 and 4.
Development steps
The development of the scale to assess students' test-taking strategies and the estimation
of its psychometric indices were conducted in the following manner:
distributed into the four categories as follows: 13 items in Before-test, 11 items in Time
management, 22 items in During-test, and 13 items in After-test.
4. Assessing validity
Validity is an indication of how well an instrument actually measures what it is claimed to
measure and helps to ensure that there are no logical errors in drawing conclusions from
the data (Garson 1998). To validate an instrument, several pieces of evidence of validity
are usually assessed. The most widely used aspects are content, construct and criterion-
related validity (Crocker and Algina 1986). These three types were assessed for the
present scale.
Content validity: this is the degree to which the content of the scale is relevant,
representative and of adequate technical quality for what the scale is intended to measure. Content
validity of an instrument is established if content experts agree that the instrument
items cover the issues to be assessed. To assess content validity of the current scale,
a panel of 15 faculty members with a background in education, measurement and
evaluation, or educational psychology was asked to review the scale. Revisions were
made that addressed rewording, appropriateness, clarity and some technical issues.
Ambiguous items were either removed or rewritten. Following this step a revised
version of the scale was prepared. This version consisted of 48 items that were
distributed into the four categories as follows: eight items in Before-test, nine items in
Time management, 20 items in During-test, and 11 items in After-test.
Construct validity: this refers to the degree to which a scale measures an intended
hypothetical construct (Gay 1996). This evidence of validity can be established by
relating the scale or the instrument of interest to some other measures consistent with
the hypothesis or the construct being assessed. Statistically, construct validity can be
assessed through the use of a factor analysis procedure. The aim of this analysis is to
identify the main components or categories that underlie the scale. Sample 2 (828
students: 534 females and 294 males) was used in this analysis. For a clear interpre-
tation of the extracted components, a factor analysis with Varimax rotation was
applied to the data. The Kaiser–Meyer–Olkin (KMO) value was 0.87, which indicated
that the sample was suitable for factor analysis (a minimum value of KMO = 0.60
is acceptable; Stevens 1996). Four factors were extracted from this analysis. These
factors together explained more than 50% of the total variance in all scale items. Items
with low loading values were identified and deleted. In social sciences, the common
minimum cut-off loading value is 0.30 or 0.35 (Stevens 1996; Tabachnick and Fidel
2001). Using this criterion, only variables with loading values of at least 0.30 were retained.
On that basis, all eight variables in the first factor, Before-test strategies, were
retained. In the second factor, Time-management strategies, one variable was deleted
because of low loading value, and another variable was deleted because it loaded on
two factors. In the third factor, During-test strategies, eight variables were deleted
because of their low loading values, and one variable was deleted because it loaded on
more than one factor. Finally, in the fourth factor, After-test strategies, six variables
were deleted. Table 2 summarizes the loading values of all variables with deleted vari-
ables emboldened. At the end of this analysis, the total number of items in the four
sub-scales was reduced to 31 items.
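The item-retention rule described above is easy to express in code. The following is a minimal numpy sketch (not the author's analysis, and using made-up loading values rather than those in Table 2): an item is kept only if its absolute loading reaches the 0.30 cut-off on exactly one factor, so both low-loading and cross-loading items are dropped.

```python
import numpy as np

def retained_items(loadings, cutoff=0.30):
    """Return indices of items whose absolute loading reaches the
    cutoff on exactly one factor (no cross-loading)."""
    hits = np.abs(loadings) >= cutoff   # boolean matrix: item x factor
    n_hits = hits.sum(axis=1)           # how many factors each item loads on
    return np.where(n_hits == 1)[0]

# Illustrative values only (not taken from Table 2).
L = np.array([
    [0.62, 0.10, 0.05, 0.12],   # clean loading on factor 1 -> keep
    [0.41, 0.35, 0.08, 0.02],   # cross-loads on factors 1 and 2 -> drop
    [0.11, 0.08, 0.22, 0.15],   # no loading reaches 0.30 -> drop
    [0.04, 0.58, 0.09, 0.01],   # clean loading on factor 2 -> keep
    [0.02, 0.06, 0.49, 0.13],   # clean loading on factor 3 -> keep
    [0.09, 0.12, 0.07, 0.44],   # clean loading on factor 4 -> keep
])
print(retained_items(L))   # -> [0 3 4 5]
```

The same filter, applied factor by factor, reproduces the kind of deletions reported above for the four sub-scales.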
Criterion-related validity: this refers to the degree to which the scores on the scale
are related to the scores on another valid criterion available at the same time (Gay
1996).
Table 2. Loading of items with the four extracted components. (The full table is not
recoverable from this copy; sample rows include "I do not attend the last few classes
before the test" and "I spend most of the night before the test studying".)
The first instrument was Attitudes toward Tests. This instrument was designed to assess
students' attitudes toward tests. Examples from the instrument are: "Tests motivate me to
study hard", "For me taking tests is a painful experience" and "I try my best to avoid any
course that requires a lot of tests". A high score on this instrument suggests a positive
attitude toward tests. The correlation between students' scores on the scale and their attitudes
toward tests was used to estimate the convergent validity of the scale. Generally, attitudes
toward the subject matter have a positive relationship with achievement (Schofield 1982;
Wilson 1983). It is assumed that students who have good test-taking strategies develop more
positive attitudes toward tests.
The second instrument was the Test Anxiety Inventory (TAI). This inventory has been
widely used to measure the level of adult test anxiety. It was originally developed by
Spielberger (1980), then used and validated to fit several cultures. Examples from the TAI
questions are: "While taking an examination, I have an uneasy, upset feeling" (Item 2),
"Thoughts of doing poorly interfere with my concentration on tests" (Item 7), and "During
examinations, I get so nervous that I forget facts I really know" (Item 20). The Arabic
version of the TAI, validated by Tayb (1984), was used in this study. The first item ("I feel
confident and comfortable during tests"), which is the only positively stated item on the
scale, was recoded before being added to the others. A high score on this inventory
indicates a high level of test anxiety. In this study, it is assumed that students who have more
appropriate test-taking strategies are less anxious about tests.
The two instruments, Attitudes toward Tests and the Test Anxiety Inventory, were administered
at the same time as the developed scale. Cronbach's alpha values were 0.89 and 0.93
for Attitudes toward Tests and the Test Anxiety Inventory respectively. Based on these results,
the two instruments were judged to have adequate internal reliability. Correlations
between each category and the instruments were calculated and results are summarized
in Table 3.
As shown in Table 3, there was a significant positive correlation between each category
and students' attitudes toward tests. For example, the correlation between students' attitudes
toward tests and the Before-test sub-scale was 0.42, p < 0.01. Similar results were
observed for the other sub-scales. This could be seen as evidence of criterion-related validity
for each of the four categories. On the other hand, the significant negative correlations
between each sub-scale and test-anxiety level could also be seen as evidence of validity
for these sub-scales. For example, the correlation between Before-test and the Test Anxiety
scale was -0.30, p < 0.01.
Table 3. Correlations between each sub-scale and attitudes toward tests and test anxiety.

Scale                    Before-test   Time management   During-test   After-test
Attitudes toward tests   0.42**        0.43**            0.52**        0.44**
Test anxiety             -0.30**       -0.19**           -0.26**       -0.16**

Note: ** Significant at the 0.01 level.
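The pattern of signs behind this kind of criterion-related check can be illustrated with a small simulation. The data below are synthetic, generated only for illustration (they are not the study's responses): sub-scale scores are constructed to co-vary positively with an attitude criterion and negatively with an anxiety criterion, and the Pearson correlations recover those signs.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated standardized sub-scale scores for n students.
subscale = rng.normal(size=n)
# Criteria built with known relationships plus noise:
attitude = 0.5 * subscale + rng.normal(size=n)    # convergent: positive link
anxiety = -0.4 * subscale + rng.normal(size=n)    # divergent: negative link

r_attitude = np.corrcoef(subscale, attitude)[0, 1]
r_anxiety = np.corrcoef(subscale, anxiety)[0, 1]
print(r_attitude > 0, r_anxiety < 0)
```

With real data, the same two `corrcoef` calls per sub-scale yield the entries of Table 3; a significance test for each r would normally accompany them.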
Assessment & Evaluation in Higher Education 417
5. Assessing reliability
Reliability of an instrument refers to "the degree to which the results could be replicated if
the same individuals were tested again under similar circumstances" (Crocker and Algina
1986, 105). Two types of reliability were assessed: stability over time and internal
reliability.
Stability over time: this refers to the degree to which the scale gives similar results
over time. A random sample of 235 students (Sample 4) was used in this analysis.
Students responded to the scale twice within a three-week interval.
Stability was estimated through correlating students responses on the two adminis-
tration times. Results of this analysis showed that there was a high correlation
between the two responses. Correlations for the four categories were 0.89, 0.82, 0.92
and 0.86 for Before-test, Time management, During-test and After-test respectively.
Internal reliability: the internal consistency and homogeneity of the four categories
of the scale were assessed using Cronbach's alpha. The minimum advisable level is
0.70 (Nunnally and Bernstein 1994). Cronbach's alpha values for the four categories
were as follows: Before-test 0.71, Time management 0.75, During-test 0.76, and After-
test 0.81. Based on these results, the four categories were judged to have adequate
internal reliability.
6. Item discrimination
This was used as evidence of item quality. Item discrimination was assessed through calcu-
lating the correlation between each item and its main component category. Correlations
between the eight items of the first category and their scale were 0.52, 0.54, 0.65, 0.31, 0.61,
0.62, 0.46 and 0.34. Correlations between the seven items that make up the second category
and their scale were 0.69, 0.61, 0.41, 0.62, 0.68, 0.56, and 0.69. Correlations between the
11 items that make up the third category and their scale were as follows: 0.51, 0.46, 0.47,
0.42, 0.47, 0.56, 0.42, 0.58, 0.58, 0.45 and 0.53. Finally, correlations between the five items
that make up the fourth category and their scale were 0.63, 0.56, 0.58, 0.55 and 0.68. These
values were higher than the correlations between items and the scales to which they were
unrelated. This indicated that the items have acceptable discrimination values.
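The discrimination index used here is an item-to-category correlation. The numpy sketch below (with made-up responses) computes a common, slightly stricter variant, the corrected item-total correlation: each item is correlated with the sum of the remaining items in its sub-scale, so the item does not inflate its own criterion.

```python
import numpy as np

def item_rest_correlations(items):
    """Corrected item-total correlations for a (n_students, n_items)
    response matrix: each item vs. the sum of the other items."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, j], total - items[:, j])[0, 1]
        for j in range(items.shape[1])
    ])

# Illustrative responses only: 5 students x 3 items of one sub-scale.
X = np.array([[3, 4, 3],
              [5, 5, 4],
              [2, 3, 2],
              [4, 4, 5],
              [1, 2, 2]])
print(np.round(item_rest_correlations(X), 2))
```

Items whose corrected correlation falls well below the others (or below a chosen floor such as 0.30) would be candidates for removal.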
Conclusion
Test-taking strategies are important cognitive skills that strongly affect students' performance
in tests. With the increasing use of tests in different academic and non-academic
contexts, using appropriate test-taking strategies becomes a critical factor in helping
students' test performance better match their preparation and ability levels. This results
in improved test accuracy and validity. In addition, having test-taking strategies improves
students' attitudes toward tests and reduces test anxiety.
A scale for assessing strategies and skills used by university students in test-taking was
developed in this study (see Appendix). The developmental process required several steps
and procedures to ensure a high-quality scale from a psychometric point of view. Develop-
ing the scale was dependent on extensively reviewing related literature on important,
common test strategies that were highly recommended by educators and test specialists. The
developed items went through several validation procedures that included content, construct
and criterion-related validity. Similarly, scale reliability was assessed through several
procedures and using several samples of responses. This included internal reliability as well
as stability of the scale over time. Two panels of faculty members with background in
measurement, education or educational psychology reviewed the scale items and validated
their content. Four samples of students (50, 828, 553 and 235) participated by responding
to different versions of the scale. To the researchers knowledge, this is the first comprehen-
sive scale developed to assess test-taking strategies used by university students. Additional
applications, however, are needed to replicate and validate the scale using different samples
from different educational levels.
Acknowledgements
The author would like to thank the Scientific Research Affairs Sector in the UAE University for fund-
ing this research.
Notes on contributor
Hamzeh M. Dodeen is associate professor in measurement and evaluation at UAE University. His
research interests include item analysis in both Classical Test Theory (CTT) and Item Response
Theory (IRT), person- and item-fit analysis, differential item functioning (DIF), and test-related
characteristics.
References
Carraway, C. 1987. Determining the relationship of nursing test scores and test-anxiety levels before and
after a test-taking strategy seminar. (ERIC Document Reproduction Service No. ED 318 498).
Colosi, L. 1997. The layman's guide to social research methods. Available online at http://
www.socialresearchmethods.net/tutorial/Colosi/lcolosi1.htm
Crocker, L., and J. Algina. 1986. Introduction to classical and modern test theory. Orlando, FL:
Harcourt Brace Jovanovich.
Culler, R.E., and C.J. Holahan. 1980. Test anxiety and academic performance: the effect of study-
related behavior. Journal of Educational Psychology 72: 16–20.
Dolly, J.P., and K.S. Williams. 1986. Using test-taking strategies to maximize multiple-choice test
scores. Educational and Psychological Measurement 46: 619–625.
Dreisbach, M., and B. Keogh. 1982. Testwiseness as a factor in readiness test performance of young
Mexican-American children. Journal of Educational Psychology 72, no. 2: 224–229.
Ebel, R. 1965. Measuring educational achievement. Englewood Cliffs, NJ: Prentice-Hall.
Gallagher, A.M. 1992. Sex differences in problem-solving strategies used by high-scoring examinees on
SAT-Math. (ERIC Document Reproduction Service No. ED 352-420.)
Garson, D. 1998. Quantitative research in public administration. Available online at http://
www2.chass.ncsu.edu/garson/pa765/validity.htm
Gay, L.R. 1996. Educational research. Englewood Cliffs, NJ: Prentice-Hall.
Hambleton, R.K., H. Swaminathan, and H.J. Rogers. 1991. Fundamentals of item response theory.
Newbury Park, CA: Sage Publications.
Hembree, R. 1988. Correlates, causes, effects, and treatment of test anxiety. Review of Educational
Research 58, no. 1: 47–77.
Hulin, C.L., F. Drasgow, and C.K. Parsons. 1983. Item response theory. Homewood, IL: Dow Jones-
Irwin.
Hyde, R.E. 1981. Successful test-taking strategies for nursing students. Paper presented at the
Annual Meeting of the College Reading Association. Louisville, KY.
Kimball, M. 1989. A new perspective on women's math achievement. Psychological Bulletin 105:
198–214.
Langerquist, S. 1982. Nursing examination review: test-taking strategies. Menlo Park, CA: Addison-
Wesley.
McLellan, J., and C. Craig. 1989. Facing the reality of achievement test. Education Canada, 36–40.
Meijer, R.R. 1996. Person-fit research: an introduction. Applied Measurement in Education 9, no. 1: 3–8.
Nunnally, F., and I. Bernstein. 1994. Psychometric theory. New York: McGraw-Hill.
Rocklin, T., and J.M. Thompson. 1985. Interactive effects of test anxiety, test difficulty, and
feedback. Journal of Educational Psychology 77: 368372.
Sarnacki, R.E. 1979. An examination of test-wiseness in the cognitive test domain. Review of
Educational Research 49: 252–279.
Schofield, H. 1982. Sex, grade level, and the relationship between mathematics attitudes and
achievement in children. Journal of Educational Research 75, no. 5: 280–284.
Spielberger, C.D. 1980. Conceptual and methodological issues in anxiety research. In Anxiety:
current trends in theory and research, ed. C.D. Spielberger, Vol. 2. New York: Academic Press.
Stevens, J. 1996. Applied multivariate statistics for social sciences. Mahwah, NJ: Lawrence Erlbaum.
Strnad, K. 2003. Coping with college series: handling test anxiety. Available online at http://
www.counseling.ilstu.edu/files/downloads/articles/coping-test_anxiety.pdf
Swearingen, D.L. 1998. Person-fit and its relationship with other measures of response set. Paper
presented at the Annual Meeting of the American Educational Research Association. San Diego,
CA.
Sweetnam, K.R. 2003. Test-taking strategies and student achievement. Available online at http://
www.cloquet.k12.mn.us/chu/class/fourth/ks/stratigies.htm
Tabachnick, B.G., and L.S. Fidel. 2001. Using multivariate statistics. Boston: Allyn & Bacon.
Tayb, M.A. 1984. The test-anxiety scale. Cairo: Dar Al-Maref.
Taylor, K., and S. Walton. 1997. Co-opting standardized tests in the service of learning. Phi Delta
Kappan, 66–70.
Vattanapath, R., and K. Jaiprayoon. 1999. An assessment of the effectiveness of teaching test-taking
strategies for multiple-choice English reading comprehension test. Occasional Papers 8: 57–71.
Wilson, U. 1983. A meta-analysis of the relationship between science achievement and science atti-
tude: kindergarten through college. Journal of Research in Science Teaching 20, no. 4: 839–850.