The IEA’s Trends in International Mathematics and Science Study (TIMSS) provides a rich framework for studying mathematics and science education. TIMSS tests students in grades four and eight and also gathers a wide range of data from their schools and teachers about curriculum and instruction in mathematics and science. TIMSS findings have been used by many countries around the world in their efforts to develop better methods of teaching science and mathematics. Involving more than 60 countries, TIMSS 2007 is the most recent in this series of achievement assessments. The first TIMSS was conducted in 1995 in 41 countries; the second, in 1999, involved 38 countries; and TIMSS 2003 included more than 50 countries.
TIMSS Advanced assesses students in their final year of secondary school who have studied advanced physics and mathematics. Since the 1995 assessment, however, TIMSS has not assessed students who are nearing the end of high school. Recognizing the strong link between scientific competence and economic productivity, and given the relatively long time period since the 1995 assessments, countries around the world have sought comparative data about the achievement of their students enrolled in advanced courses. Through TIMSS Advanced, countries that participated in 1995 can determine whether the achievement of these students has changed, while countries participating in TIMSS Advanced for the first time can assess the comparative standing of their students, how educational opportunities are provided to students, and the factors that influence how students use these opportunities. To begin the process of defining the topics to be
assessed in TIMSS Advanced, this document built on the frameworks of the 1995 assessment and of TIMSS Advanced 2008. The description of cognitive domains also benefited from the TIMSS 2007 reporting of results according to cognitive domains. The first draft of this document was reviewed by the participating countries, which provided comments about the subject matter incorporated in their advanced mathematics and physics courses and made recommendations about the desirability of particular topics. TIMSS Advanced is a major undertaking of the IEA, and the IEA has taken full responsibility for the management of the project. The TIMSS International Study Center works with the IEA on sampling, and with Educational Testing Service in New Jersey on the psychometric scaling of the data.
How can one evaluate the 56 differing state or state-like education entities in the United States? Based on research, this particular assessment is reliable and valid due to the content domains, which “define the specific mathematics subject matter covered by the assessment, and the cognitive domains which define the sets of behaviors expected of students as they engage with the mathematics content. The cognitive domains of mathematics and science are defined by the same three sets of expected behaviors: knowing, applying, and reasoning”. In other words, although other variables could factor into whether this particular test is valid or reliable, the common factors that span continents and are shared by the tested age groups (fourth and eighth graders) are “the mathematical topics or content that students are expected to learn and the cognitive skills that students are expected to have developed” (2007).
The IEA developed TIMSS to compare educational achievement around the globe. TIMSS began in the 1990s with a desire to conduct international studies of students within the same age or grade bracket. It was believed that math and science education would be essential for economic development in the technological world of the future. The break-up of the Soviet Union brought about new countries wanting to participate in this study so that its data could guide their educational systems.
The assessment included questionnaires for students and teachers. The measurement covered topics in science and math that students should have encountered by grades 4 and 8. The questionnaires were used to collect information on students’ backgrounds, attitudes, and beliefs, and looked at class scheduling of science and math coverage and the policies of the school.
A summated rating scale was used to measure these ideas, as is common practice in the social sciences. The summated rating scale showed validity and reliability on the sample on which it was used. Summated rating scales were derived for each construct. The construct pertained to student self-interest in mathematics and the belief that motivation plays a vital role in predicting the present and future achievement of a student. The opportunity for students in both the fourth and eighth grades across continents to do well is fairly and consistently assessed. Scores by subject and grade are comparable over time (2007). When TIMSS was created, the scale was given a mean of 500 based on the countries that took part in the testing. This testing of both grades four and eight began in 1995 and continued through 2007. “Successive TIMSS assessments in 1995, 1999, 2003, and 2007 have scaled the achievement data so that scores are equivalent from assessment to assessment” (2007). The collected data related to the teachers, the students, and the schools. This information was vital in helping researchers understand the performance of students in their own countries. A summated rating scale was developed because reporting information item by item is almost impossible. TIMSS called this scale a multi-item indicator.
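The multi-item indicator idea can be sketched in a few lines of code. This is only an illustration of how a summated rating scale and a common reliability coefficient for it might be computed; the student responses, item counts, and function names below are invented for the example and are not TIMSS data or TIMSS procedures.

```python
# Sketch of a summated rating scale with a reliability check.
# Hypothetical data: five students answer four Likert items (1-4)
# about self-interest in mathematics. Not actual TIMSS data.

def scale_scores(responses):
    """Summated rating scale: each student's score is the sum of their items."""
    return [sum(student) for student in responses]

def cronbach_alpha(responses):
    """Cronbach's alpha, a common reliability coefficient for such scales."""
    n_items = len(responses[0])

    def var(xs):
        # Sample variance across students
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [var([student[i] for student in responses]) for i in range(n_items)]
    total_var = var(scale_scores(responses))
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

responses = [
    [4, 4, 3, 4],
    [2, 1, 2, 2],
    [3, 3, 3, 2],
    [1, 2, 1, 1],
    [4, 3, 4, 4],
]
print(scale_scores(responses))            # one summated score per student
print(round(cronbach_alpha(responses), 2))  # reliability of the scale
```

A higher alpha (conventionally above about 0.70) indicates the items hang together well enough to justify reporting one summated score instead of item-by-item results.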
With a huge amount of information being reported, a summated rating scale reduces the amount of data and makes it easier for the public to digest. “For the advantages of summated rating scales over single item measures to hold, it is important that the scale can show reasonable score reliability and validity as an indicator of a latent variable for the sample on which the scale is used”. A collaboration of participating countries helped
create TIMSS. These countries created item pools, assessment frameworks, and analyses of school curricula. TIMSS is designed to examine both the intended curriculum and the implemented curriculum. The “intended curriculum” is the science and math that society expects students to learn, along with how the educational system should be organized to reach that goal. The “implemented curriculum” is the content taught in class, how it is taught, and who teaches it. The assessment of 2003, which looked at student achievement in math and science, had ambitious coverage goals, reporting not only overall science and math achievement scores but also scores in important content areas within these subjects. Examples of the mathematical topical or content domains (as they are referred to in TIMSS) covered in the fourth grade are “numbers, geometric shapes, measures, and data display. In the eighth grade, the content domains are numbers, algebra, geometry, data and chance. The cognitive domains in each grade are knowing, applying, and reasoning” (2007). The
five domains in science are “life science, chemistry, physics, earth science, and environmental science”. In 2003 there were four cognitive domains in mathematics (knowing facts and procedures, using concepts, solving routine problems, and reasoning) and three in science; these defined the sets of behaviors expected of students as they engaged with the content.
Boston College, the National Science Foundation, and the National Center for Education Statistics established the TIMSS 1999 benchmarking research. The TIMSS achievement tests were given to students in spring 1999 in conjunction with the administration of TIMSS in other countries. “Participation in TIMSS benchmarking was intended to help states and districts understand their comparative educational standing, assess the rigor and quality of their programs in an international context, and improve the teaching and learning of mathematics and science”.
Reliability coefficients ranged from 0.62 in Morocco to 0.86 in Singapore. The international median, 0.80, is the median of the reliability coefficients for all countries. “Reliability coefficients among benchmarking participants were generally close to the international median, ranging from 0.82 to 0.86 across states, and from 0.77 to 0.85 across districts”. An example of the validity and reliability of TIMSS is the method of assessment for both the fourth and eighth grade levels. The TIMSS mathematics scale runs from 0 to 1,000, with an international mean score of 500.
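The 0-to-1,000 metric with a mean of 500 can be illustrated with a small standardization sketch. This is not the actual TIMSS procedure, which uses item response theory scaling; the raw scores below and the choice of a standard deviation of 100 are assumptions made purely for illustration.

```python
# Illustrative only: place made-up raw scores on a 0-1,000 style scale
# centered at 500. The SD of 100 is an assumed reporting convention here;
# real TIMSS scaling is IRT-based, not this linear transform.

def to_reporting_scale(raw, target_mean=500.0, target_sd=100.0):
    n = len(raw)
    mean = sum(raw) / n
    sd = (sum((x - mean) ** 2 for x in raw) / n) ** 0.5
    scaled = [target_mean + target_sd * (x - mean) / sd for x in raw]
    # Keep scores inside the 0-1,000 reporting range
    return [min(max(s, 0.0), 1000.0) for s in scaled]

raw_scores = [12, 18, 25, 31, 39]  # hypothetical raw test scores
print([round(s) for s in to_reporting_scale(raw_scores)])
```

The point of fixing the mean and spread is the one the text makes: once successive administrations are placed on the same metric, scores are comparable from assessment to assessment.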
Regarding test validity, TIMSS posed several questions about comparative validity. One assumption involves teachers and is treated as part of an unquestionable assertion; another is that concern about the reliability of teachers’ evaluative judgments must be tempered by the analysis. The question often raised in relation to international testing is whether these results have meaning. International testing programs such as TIMSS have been criticized for two reasons: first, “other nations have not tested as large a percentage of their student population causing their scores to be inflated; and second, our best students are among the world’s best, with our average being brought down by a large cohort of low-performing students”. Validity also depends on “the degree to which the test items adequately represent the construct and whether the number of items administered is enough to provide scores with sufficient variance”. In simpler terms, observed scores should indicate the construct being measured and should not be influenced by contributions that are not relevant to the construct.
Another factor is the “degree of effort devoted by examinees to the test”. When an individual takes a test, the examiner assumes the individual will want to get items correct. But there are instances when the test taker does not try his best. As a result, this leads to a false underestimation of what the test taker can do. Low effort results in a negatively biased estimate of the individual’s proficiency. “Whenever test-taking effort varies across examinees, there will be a differential biasing effect, which will introduce construct-irrelevant variance into the test score data”. Low test scores can be caused by the test taker having low proficiency, or the test taker may have higher proficiency but not be trying his best on the test. If personal consequences such as grades were affected by the test, then low effort would not be a major validity threat. If the examinee decides not to give his best effort, the results will be biased. Low-stakes measurements exist where scores have an impact on test givers, but no impact on test takers.
There are two sides to everything; the sword is double edged, and the TIMSS assessment is not exempt from having cons alongside its pros. Prais (2007) explains that “when international tests were first introduced nearly two generations ago, [the aim was to] provide broader insight into the diverse factors leading to success in learning”. While participation shows the level of commitment a country has towards improved global education, the other side of the sword is that these large sums of money and time could arguably be directed elsewhere.
Much research has been completed presenting impressive points to be considered when determining the reliability and validity of the TIMSS assessment. Robertson (2005) examined overall findings from past international studies. This discussion looks at some of the cons of the TIMSS assessment and the extent to which the United States could be affected by the same factors as the European countries. These factors include student age, baseline data, motivation, and curriculum mismatch, including cultural differences and translation of the assessment. One factor that has been found to create problems in interpreting scores with validity is the age of the students when they start school. It is important to understand what age the students were when they began school. Start times for schools within and between countries vary, which can make interpreting the assessment scores difficult. As well as knowing the age of the students, Tymms, Merrell, & Jones (2004) maintain that baseline data are needed to measure progress from the beginning through the middle and the end of a student’s school career. These types of testing practices only measure the student’s level at the particular time of the assessment, rather than growth over time. Even within the United States, there are different age ranges across the country for when a child can start school.
Many states implement Head Start and pre-K programs to create opportunities to get children ready for school. Since these are not mandated, attendance is voluntary, and not all students glean the benefits of attending. This leaves students entering school with widely varying levels of readiness.
What could be considered the primary factor in the validity and reliability of international testing is not limited to, but includes, curriculum subject matter, translation or academic vocabulary, unfamiliar context and/or cultural context, and item formats. A curriculum match has been determined to be the most serious concern for the validity of international testing. How well an assessment measure matches the curriculum of a country will determine the success of the individual countries participating in the assessment. The translation of the TIMSS test has led to poor assessment scores for some students. Even though the test goes through rigorous translation practices, the vocabulary used in the context of the test questions has proved difficult for some students. Item formatting has also posed a problem: while a format might be easily understood by some, it was not comprehensible to others. Cultural differences also play a role, and it is important to understand them. It was found that cultural emphases also carry into test item interpretation and successful answering. These same
assessment considerations can be made in America. While the United States’ historical data could offer some homogeneous research findings, our current classroom demographics and research findings are widely varied. U.S. schools are filled with students from diverse backgrounds. These differences include a wide range of possibilities and limitations for students, such as the varied education backgrounds of second language learners coming to the U.S. The United States has populations represented from all over the world. So while specific countries have their own issues with TIMSS, the U.S. faces all of these problems within its own country. The admirable goal of universal success is implicit in the No Child Left Behind Act; TIMSS, by contrast, is administered for the purpose of collecting data. The students do not get any feedback on their performance on the test, and their scores do not have any impact on their educational experience. The low-stakes nature of TIMSS causes underachievement among its assessment candidates. This lack of motivation and ultimately low achievement could create a bias
(Elk, 2007). This is an area that lacks research; on the TIMSS test, a test taker’s level of motivation will have an effect on their score. If we were to implement a low-stakes test at this stage in the game in the U.S., it would provide some interesting results. With our students now being exposed to high-stakes testing, a low-stakes test would probably create the same anxieties as the state competency tests do. The nature of the test and its ramifications would need to be fully explained to students.
TIMSS is valued for its “rich comparative data about educational systems, curriculums, school characteristics, and instructional practices”. One strength of TIMSS is its attempt to link achievement to these contexts. Even so, there are several concerns about the validity of how the scores are ranked, “because countries differ substantially in such factors as student selectivity and curriculum emphasis”. Research has been produced to justify concerns at the secondary level (Bracey, 2000; Rotberg, 1998). At the primary and middle school levels, researchers monitored the interpretation of TIMSS results. Wang (2001) suggested that “one researcher believed that since TIMSS was not a controlled scientific study and did not measure the effectiveness of one teaching method against another”, the findings could not support certain reforms at a local school. Another concern is that TIMSS failed to examine results across diverse populations.
Several technical problems became apparent in the TIMSS database. Wang (2001) found additional technical problems that can skew the comparative results and undercut the reliability of the TIMSS benchmarking. Wang (2001) identifies several disadvantages of TIMSS: (1) “The Format of TIMSS Test Items Is Not Consistent With the Goals of Some Education Reforms”; (2) the rotation of test booklets could produce invalid results; (3) because of grade differences and content differences among countries, “the TIMSS Tests Might Not Align With What Students Have Learned”; and (4) “Several Problematic Age Outliers in the Database”.
The formats of test items have been found to be inconsistent with the goals of various school reforms. The TIMSS test measures mostly lower learning outcomes by means of a predominantly multiple-choice format (Lange, 1997, p. 3). From the pool of questions, students were tested on subsets of questions that would not reflect outcomes arising from any reform initiative.
Regarding the issuing of booklets for testing, Gonzalez and Smith (1997) expressed how the rotation of booklets arranged in various clusters could produce invalid results for students. Booklet eight and booklets one through seven were structured by clusters: booklet eight focused on the breadth cluster, while booklets one through seven contained a more focused cluster. Due to this structural discrepancy, results may not be comparable across booklets. TIMSS also tested adjacent grades, such as third and fourth graders in the United States, which are considered primary grades. The adjacent grades resulted in grade gaps based on school experience; each grade level comes with different learning levels and school experience. Because of these differences in learning experience, Wang (2001) suggested that it would be unclear whether any testing instrument could measure what students have learned in any grade level. Martin and Kelly (1997) pondered how age relates to student achievement. The authors believed that “age outliers in the TIMSS
database were not adequately explained”. For example, the student population ranged from a 49.9-year-old seventh grader in one country to a 10-year-old eighth grader in the United States. Since a student’s age is a factor in cognitive development, the age outliers are essential to consider when analyzing data in studies such as TIMSS. Wang (2001) maintains that when interpreting TIMSS results, the test score component comes with problems of a technical nature.
TIMSS results impact educational systems throughout the nation. It is important that the science and mathematics assessments measure what they are designed to measure, and that the questionnaires are selected with care, scored, studied closely, and interpreted meaningfully. The “Trends in International Mathematics and Science Study (TIMSS)” has proven to be an assessment that is both valid and reliable among international educational assessments, but it does come with limitations.
References
Gonzalez, E. J., & Smith, T. A. (1997). User’s guide for the TIMSS international database. Chestnut Hill, MA: TIMSS International Study Center.
Hussein, M. G. (1992). What does Kuwait want to learn from TIMSS? Prospects, 22, 275-277.
Lange, J. D. (1997). Looking through the TIMSS mirror from a teaching angle. http://www.enc.org/topics/timss/additional/documents
Martin, M. O., & Kelly, D. L. (1997). Technical report volume II: Implementation and analysis. Chestnut Hill, MA: TIMSS International Study Center.
Martin, M. O. (Ed.) (2003). TIMSS 2003 user guide for the international database. TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.
Martin, M. O., & Mullis, I. V. S. (2006). TIMSS in perspective: Lessons learned from IEA’s four decades of international mathematics assessments. TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.
Rose, L. C. (1998). Who cares? And so what? Responses to the results of the third international mathematics and science study. Phi Delta Kappan, 79(10), 722.
www.minniscoms.com.au/educationtoday/articles.php?articleid=150
www.mackinac.org/article.asps?ID=6998
http://timss.bc.edu/timss1999b/sciencebench_report/t99bscience_A.html
http://nces.ed.gov/tmiss/Results03.asp
http://timss.bc.edu/timss2003.html
http://timss.bc.edu/TIMSS2007/about.html
www.iea.nl/timss2007.html
www.asanet.org/footnotes/jan05/fn10.html
www.ed.gov/inits/Math/silver.html