Studies in Education

EDUCATIONAL ACHIEVEMENTS OF THIRTEEN-YEAR-OLDS IN TWELVE COUNTRIES

Results of an international research project, 1959-61

Reported by:
ARTHUR W. FOSHAY
ROBERT L. THORNDIKE
FERNAND HOTYAT
DOUGLAS A. PIDGEON
DAVID A. WALKER

Unesco Institute for Education, Hamburg
1962
CONTENTS

Foreword

Arthur W. Foshay
THE BACKGROUND AND THE PROCEDURES OF THE TWELVE-COUNTRY STUDY

Robert L. Thorndike
INTERNATIONAL COMPARISON OF THE ACHIEVEMENT OF 13-YEAR-OLDS

Fernand Hotyat
INTERPRETATIONS FROM INTERNATIONAL AND BELGIAN NATIONAL DATA

Douglas A. Pidgeon
A COMPARATIVE STUDY OF THE DISPERSIONS OF TEST SCORES

David A. Walker
AN ANALYSIS OF THE REACTIONS OF SCOTTISH TEACHERS AND PUPILS TO ITEMS IN THE GEOGRAPHY, MATHEMATICS AND SCIENCE TESTS
The national data pooled for purposes of this study derive from the work of various national centres for educational research and are used with their kind permission. The opinions expressed in the various sections of this report, which is sponsored by the Unesco Institute for Education, are those of their authors and do not necessarily represent the views of the Unesco Institute for Education, of Unesco, Paris, or of the research institutions to whose staffs the authors belong.
Foreword
The present study may well be described as an unusual addition to the literature of education.
The results of the project here reported suggest that both empirical educational research and comparative education can gain new dimensions, the one by extending its range over various educational systems, the other by including empirical methods among its instruments. In the minds of its authors the project had the double purpose of throwing light on the possibilities of such research, and of obtaining actual results which would not so much evaluate educational performances under different educational systems in absolute terms as discern patterns of intellectual functioning and attainment in certain basic subjects of the school curriculum under varying conditions. This would be a first step towards bringing into profile the relative merits of various learning processes and procedures.
If the results so far, because of limitations on their validity which the authors freely admit, are little more than suggestive, at least they offer real encouragement for believing that such researches can, in the future, lead to more significant results and begin to supply what Anderson has lamented as the major missing link in comparative education, which in his view is crippled especially by the scarcity of information about the outcomes or products of educational systems*.
It does not detract from the achievement of this exploratory study to say that, in further refining international empirical research methods, it will be essential for the researcher also to explore ways of relating the data more closely to the specific educational principles and objectives which underlie the various national educational systems.
Certainly the international group itself was sufficiently encouraged by the results of its first exploratory study to embark on a more ambitious one, in which, at several key points in the secondary school cycle, samples of schoolchildren as comparable as can be obtained will be given tests closely related to the curricula and educational aims of all the participating countries.
The project here reported could not have been accomplished without substantial contributions from twelve different national centres for educational research (listed on pp. 8 and 9), which were responsible for the technical and scientific aspects of the study, whilst the Unesco Institute contributed from its experience in the international administration and coordination of research. Each national centre bore the expense of providing tests and of the local organisation and administration of field work, whilst the Unesco Institute substantially underwrote the expenses arising at the international level. Beyond this the project depended on the goodwill and cooperation of a large number of teachers in whose schools the tests were administered and the background information collected. I am very glad that the importance of this support, which for unavoidable reasons usually remains anonymous, is brought out clearly in Dr. Walker's final section of this volume.
In conclusion, I should like to express the particular debt of thanks which is owed to Professor Foshay, who directed the project, to Professor Thorndike, who acted as chief test editor and was primarily responsible for analysing the international data, and to the authors who have contributed to this volume. It is hoped that subsequently it may prove possible to add further analyses of the international data to those contained in the present report. Our thanks are due no less to Mr. Cobb who, following his responsibility for day-to-day coordination of the project, has undertaken the compilation and arrangement of the present report.
Hamburg, August 1962
Saul B. Robinsohn

* C. Arnold Anderson: Methodology of Comparative Education, International Review of Education, Vol. VII, 1961, No. 1, pp. 7 and 8.
Arthur W. Foshay
THE BACKGROUND AND THE PROCEDURES OF THE TWELVE-COUNTRY STUDY

In the general orientation with which this report opens, Professor Foshay describes the process of setting up an international project for achievement testing in 12 countries, the tests that were administered, and the characteristics of the samples to which they were given.

… such needs could be met. …
2. To discover the possibilities attending a large-scale international study.
In the papers that follow in this volume, several reports are presented of the results so far
achieved. Here I shall describe the procedures we have used, the populations tested, and the
tests we have employed.
Early planning: the participants
In 1958, the Governing Board of the Unesco Institute for Education accepted the present writer's proposal that an international study of intellectual functioning be undertaken. The officers of the Institute invited several directors of educational research organizations to meet in Hamburg in June, 1959, to consider the proposal further, and to decide whether they wished to participate in such a study. As it happened, the second meeting of representatives of European Centres of Educational Research was scheduled to take place near London a week later, and the proposal was described there also, with the result that some centers not represented at the Hamburg meeting also joined in the project.
The Hamburg meeting of June, 1959 lasted for five days. During this time, the participants considered and adopted the proposal that a study be designed, and then proceeded to make the
design, to prepare preliminary tests, and to arrange a schedule.
Each of the participants was to bear the costs of test administration within his own country. The Unesco Institute paid the cost of travel and maintenance for the European participants at the three meetings finally held, and furnished extensive coordinative services. The participants who finally took part in the study were the following:
Belgium: Fernand Hotyat, Directeur du Centre des Travaux, Institut de Pédagogie, Morlanwelz.
Federal Republic of Germany: Professor Dr. Walter Schultze, Director, and Dr. Rudolf Raasch, Research Assistant, Hochschule für Internationale Pädagogische Forschung, Frankfurt am Main.
Israel: Dr. Moshe Smilansky, Pedagogical Adviser and Director of Research, Ministry of Education and Culture; Director, Henrietta Szold Institute for Child Welfare, Jerusalem.
Poland: Professor Jan Konopnicki, … University of Wroclaw.
Scotland: Dr. D. A. Walker, Director, Scottish Council for Research in Education, Edinburgh.
… Professor of Psychology and Director, …
… National Centre … Foundation for Educational Research …
… of Sciences … and Education, …
… Professor of Educational Psychology, … Research, Teachers College, … University …
Switzerland: … and L.-M. …, University of Geneva.
Yugoslavia: Dr. Vladimir Mužić, Institute of Education, University of Zagreb.
In addition to Mr. D. J. Cobb, who served as the continuing coordinator of the project, the Unesco
Institute for Education, under the direction of Dr. S. B. Robinsohn (and prior to his appointment
the Acting Director, M. R. E. Hennion), put the services of its excellent staff at the disposal of
the project.
Under the supervision of Professor Thorndike, … Burgess were responsible for the tabulations of data.

The procedure as planned
When the group described here met in June, 1959, they reached a number of agreements about
procedure, first having considered
and accepted the general proposal. Before stating them,
however, it is necessary to state the caveat that applies to this study.
This is an exploratory study. The participants in this study were working with no extra funds, no extra allotment of time, and without the benefit of a previously developed set of procedures. It will therefore be apparent that neither the tests nor the sampling procedures meet the standards that might otherwise be required. For these shortcomings we are not apologetic; it was necessary to accept them, and hence to restrict the statements based on the data gathered, if the study was indeed to be undertaken. Since not all of the sample populations are comparable, we shall not report total scores here as if they could be compared. The most interesting analyses involve patterns of responses among items and sub-scores, not comparisons of total scores, and it is this kind of analysis that is reported here.
The following procedural agreements were made and acted on by the participants
in the study:
1. The sample
a. The students to be tested would all be aged from 13 years to 13 years 11 months on the first
day of the school year whatever might be the school level (grade) at which they were found.
b. The sample population in each country would be between 600 and 1000 in number.
c. The sample to be tested would be all the children of both sexes residing in a community
or communities selected to yield a population of the designated size.
d. The community or communities
selected for testing would be as representative
as possible of the total population of the country, according to whatever data were available to the
participant in the study. If (as was true in some countries) no data were available to aid the
participant in his selection, he was to use his own judgment.
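Agreement 1a fixes eligibility by age on the first day of the school year. A minimal sketch of that rule, assuming age is counted in completed calendar months; the helper names `completed_months` and `is_eligible` and the example dates are hypothetical, not part of the study's procedure:

```python
from datetime import date

def completed_months(birth: date, on: date) -> int:
    """Age in completed calendar months on the given date."""
    months = (on.year - birth.year) * 12 + (on.month - birth.month)
    # The current month does not count until the birth day of the month is reached.
    if on.day < birth.day:
        months -= 1
    return months

def is_eligible(birth: date, school_year_start: date) -> bool:
    """True if aged 13 years 0 months to 13 years 11 months on the first
    day of the school year (hypothetical helper)."""
    age = completed_months(birth, school_year_start)
    return 13 * 12 <= age <= 13 * 12 + 11

start = date(1960, 9, 1)  # assumed first day of the school year, for illustration only
print(is_eligible(date(1947, 9, 1), start))  # True: exactly 13 years 0 months
print(is_eligible(date(1946, 9, 2), start))  # True: 13 years 11 months
print(is_eligible(date(1946, 9, 1), start))  # False: already 14
```

Counting completed months keeps the upper bound exact: a child one day short of 14 is still included.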
2. Data about the children to be tested
a. Background data on the children to be tested would be gathered, as follows:
1) birth date
2) sex
3) number of siblings
4) place in birth order
5) home language (if different from school language)
6) location of home (city of 20,000-100,000; 2,000-20,000; …)
7) years in school
8) kindergarten (attended, not attended)
9) size of class (by 10s, from 10 or less to 61 or more)
10) father's education
11) mother's education
12) interest of parent (much, moderate, little or no)
13) father's occupation
14) mother's occupation
15) score on non-verbal intelligence test
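Item 9 records class size in bands of ten, from "10 or less" to "61 or more". The sketch below assumes the intermediate bands are 11-20, 21-30, 31-40, 41-50 and 51-60, which the text implies but does not list; `class_size_band` is a hypothetical helper:

```python
def class_size_band(size: int) -> str:
    """Code a class size into ten-pupil bands (assumed band boundaries)."""
    if size < 1:
        raise ValueError("class size must be positive")
    if size <= 10:
        return "10 or less"
    if size >= 61:
        return "61 or more"
    lower = ((size - 1) // 10) * 10 + 1  # 11, 21, 31, 41 or 51
    return f"{lower}-{lower + 9}"

for n in (8, 10, 11, 35, 60, 61, 74):
    print(n, class_size_band(n))
```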
3. The tests
a. Tests would be administered in the following fields: reading comprehension, mathematics, science, geography.
b. A non-verbal test would be administered as part of the background information.
c. The working languages of the study would be French and English. Translation of the test
items would be done by each participant
into his home language. Copies of the translated
tests would be deposited with the Unesco Institute for Education.
d. Trial forms of the tests (except the non-verbal, which had been developed by the National
Foundation for Educational
Research in England and Wales) would be developed
by the
participants working together at Hamburg. (The items for the tests as finally constructed were,
in the main, taken from existing tests originally developed in England, France, Germany, Israel
and the U.S.A.)
e. The trial forms of the tests would be pre-tested with a small number of children in each country, and criticisms and suggestions sent to a test editor for consideration.
f. The tests as finally approved would be duplicated and circulated to the participants by the Unesco Institute.
g. Alterations
in the substance of items would be permissible, provided they were approved
by the test editor. (A typical alteration involved the change in units of measure to conform
with the custom of the country.)
h. The tests would be held to approximately 30 items, in the hope that each of them could be completed in less than 45 minutes. (Pre-testing and the later administration of the tests confirmed this as an adequate length of time.)
i. No time limit would be imposed on the students.
5. Organization
The administrative center for the project was the Unesco Institute in Hamburg. The participants met there three times, each time for one week: in June, 1959, to plan the project and construct trial forms for the tests; in October, 1960, to take a final look at the project before testing; and in June, 1961, to examine the data and to plan for interpretation and publication.
Certain persons accepted special responsibilities
for the conduct of the project, as follows:
Arthur W. Foshay (U.S.A.), project director; editor of the geography test,
Robert L. Thorndike (U.S.A.), test editor,
A. Harry Passow (U.S.A.), editor of the science test,
Gaston Mialaret (France), editor of the mathematics test,
Walter Schultze (Germany), editor of the reading test,
D. A. Pidgeon (England), editor of the non-verbal test,
D. J. Cobb (Unesco Institute), coordinator of the project.

The procedure as executed
Belgium: pupils tested and retained for analysis

                                        Boys                   Girls
                                  Tested   Retained      Tested   Retained
Vocational schools and 4e degré     145       145          205       170
General secondary schools           214       175          193       150
The sample finally submitted to analysis thus contained 640 subjects, 320 boys and 320 girls.
A check was made to ensure that the eliminating process did not affect the mean scores.
No attempt was made to provide any representation
of the Flemish-speaking
part of Belgium.
Children who were retarded two or more years in school were excluded. These are estimated (by M. Hotyat) to be about 10% of the total. One or two of the sections in the vocational schools included quite retarded children. In the vocational schools about 26% of the pupils were children of foreign workers (primarily miners), and the corresponding percentage in the general course was 6%.
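The Belgian counts can be checked against the stated analysis sample of 640 pupils, 320 boys and 320 girls. The sketch assumes a reading of the printed Belgian figures in which each school type lists boys tested, boys retained, girls tested and girls retained, in that order; that reading (and the row labels) is inferred from the damaged layout, and it is the reading under which the figures reproduce the stated totals:

```python
# Printed Belgian figures. Assumed column order:
# (boys tested, boys retained, girls tested, girls retained).
belgium = {
    "vocational schools and 4e degré": (145, 145, 205, 170),
    "general secondary schools": (214, 175, 193, 150),
}

tested = sum(bt + gt for bt, _, gt, _ in belgium.values())
boys_retained = sum(br for _, br, _, _ in belgium.values())
girls_retained = sum(gr for _, _, _, gr in belgium.values())

print("tested:", tested)                                 # 757
print("retained:", boys_retained + girls_retained)       # 640
print("boys:", boys_retained, "girls:", girls_retained)  # 320 320
```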
England
The sample from England consisted of 1,181 pupils, 607 boys and 574 girls. The pupils were all the 13-year-olds attending school under one Local Education Authority in central England. This particular area had been chosen since other evidence had shown that, on tests given at the age of 11, the authority was quite representative of the whole country both with respect to mean score (100) and standard deviation (15). Although the proportions of children from urban and rural administrative areas were also similar to those in the country as a whole, the authority was predominantly rural in character and contained no large industrial town.
                          Number of                          Both
Type of school             schools      Boys     Girls       sexes
Grammar                        4         115       105     220 (18.6%)
Modern                         6         461       443     904 (76.5%)
Unreorganised all-age          3          31        26      57 (4.8%)
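The England figures can be checked against the stated sample of 1,181 pupils (607 boys, 574 girls). A short tally, with each type of school's share of the total rounded to one decimal place as in the printed percentages:

```python
# England sample by type of school: (boys, girls), as printed.
england = {
    "Grammar": (115, 105),
    "Modern": (461, 443),
    "Unreorganised all-age": (31, 26),
}

total = sum(b + g for b, g in england.values())
boys = sum(b for b, _ in england.values())
girls = sum(g for _, g in england.values())
shares = {name: round(100 * (b + g) / total, 1) for name, (b, g) in england.items()}

print(total, boys, girls)  # 1181 607 574
print(shares)              # {'Grammar': 18.6, 'Modern': 76.5, 'Unreorganised all-age': 4.8}
```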
Finland
The sample from Finland included 727 pupils, 386 boys and 361 girls. These came from about 50 classes in schools widely distributed over Finland. The choice of schools and numbers of pupils are such as to make the sample closely representative of Finland as a whole with respect to grade and type of school and with respect to the percentages of urban and rural pupils. The proportions for the country and for the sample were reported as follows:
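[Table: proportions for the whole country and for the sample, by grade and type of school (including "In primary") and by urban and rural residence; row labels not legible. Values in printed order: 62%, 38%, 68%, 32%, 64%, 36%, 78%, 67%, 33%, 76%, 24%, 45%, 55%, 22%, 48%, 54%]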
No pupils were tested in Finland who were in classes below Grade VII of the primary school. This meant that in primary schools about 2% of children in the age group, most of them retarded, were not included in the sample. In the secondary schools all 13-year-old children were included, irrespective of their grade level (actual proportions: 55% in Grade II, 20% in Grade I, and 25% in Grade III), and the secondary school sample was thus as representative as possible.
No pupils were included from Swedish-speaking districts in Finland (about 8% of the population). Many of the classes in rural areas included mentally retarded children.
France
The sample from France was a relatively small one of 451 pupils, 181 of whom were boys and
270 girls. The small size was accounted for in part by bad weather, which reduced school attendance at the time of testing. The sample was drawn from one small city near Caen, together
with the adjoining rural area. The sample corresponded approximately with the total French population with respect to percent of urban residence, occupation of father and size of family, as these were determined in an earlier extensive survey by Heuyer, Piéron and Sauvy*.
The sample was chosen to represent the different types of schools in the following numbers:
           Primary school   Vocational school   Secondary school
Boys            117                20                  52
Girls           142                20                  98
Total           259                40                 150
Those pupils who were retarded more than one year in school were excluded. This is believed to be about 5% of the total age group.

Federal Republic of Germany
The sample consisted of 811 pupils, 403 boys and 408 girls, who were attending schools in the
city of Darmstadt in Hessen, or in the adjoining rural districts. These three districts are believed
to correspond well with the country as a whole in socio-economic
structure, in distribution
by
types of occupation, and in education. Furthermore, previous experience in setting up test norms
has shown this region to be close to the national average.
Within the region, the sample was chosen so that the proportions corresponded closely with the national average with respect to type of school attended, and within the Volksschule representativeness was sought with respect to grade level reached and, within the eighth grade, with respect to size of school.
The proportions
are shown below for both sexes combined.
* Heuyer, G., Piéron, H., and Sauvy, A. Le Niveau Intellectuel des Enfants d'Âge Scolaire. Presses Universitaires de France, 1950.
                                   Country as a whole    Sample
[school type]                              2.0             2.1
[school type]                             10.7            11.0
[school type]                             18.1            18.5
Volksschule, total                        69.2            68.4
  6th grade                                1.7             1.7
  7th grade                                7.3             7.4
  8th grade                               60.2            59.3
    1-3 classes                           15.0            15.4
    4-7 classes                           18.1            16.2
    8-9 classes                           27.1            27.7
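The German percentages can be checked for internal consistency: all school types should add up to 100, the three grade shares should add up to the Volksschule total, and the three school-size shares within the eighth grade should add up to the eighth-grade share. A minimal check using Python's `decimal` module (to avoid floating-point error in sums of one-decimal figures); the dictionary keys are hypothetical labels, and the country's eighth-grade share is derived rather than read, since both routes give 60.2:

```python
from decimal import Decimal as D

# Percentages of pupils (both sexes), read from the German table.
sample = {
    "other_types": [D("2.1"), D("11.0"), D("18.5")],
    "volksschule": D("68.4"), "grade6": D("1.7"), "grade7": D("7.4"),
    "sizes": [D("15.4"), D("16.2"), D("27.7")],  # 8th grade by size of school
}
country = {
    "other_types": [D("2.0"), D("10.7"), D("18.1")],
    "volksschule": D("69.2"), "grade6": D("1.7"), "grade7": D("7.3"),
    "sizes": [D("15.0"), D("18.1"), D("27.1")],
}

def check(col):
    """Return (sum of all school types, 8th grade from total, 8th grade from sizes)."""
    all_types = sum(col["other_types"]) + col["volksschule"]
    grade8_from_total = col["volksschule"] - col["grade6"] - col["grade7"]
    grade8_from_sizes = sum(col["sizes"])
    return all_types, grade8_from_total, grade8_from_sizes

print(check(sample))   # school types sum to 100.0; both routes give 59.3
print(check(country))  # school types sum to 100.0; both routes give 60.2
```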
Israel
Because of local interest in certain types of sub-groups, Israel tested a relatively large sample, almost 1,900. Data were analyzed for 1,873, 930 coded as girls and 942 coded as boys. These were from a number of different schools in different localities. The basic classification is into town and city schools, schools in Moshavim (collective agricultural settlements), and schools in Kibbutzim (communities with communal housing, dining, and child-rearing). Schools in Moshavim were all located in well-established settlements of early immigrants. Schools in Kibbutzim were chosen in equal proportions to represent three ideological trends, but otherwise at random. Town and city schools were chosen so as to include essentially equal numbers of schools that had scored high, above average, and average on the national eighth-grade survey.
The Israeli data are based on all eighth-grade pupils in the schools which were tested. No 13-year-olds who were not in the eighth grade were included in the Israeli sample. The result is that 177, or 9.5% of the group, were 14 years of age or over at the beginning of the school year, while 514, or 27.4%, were less than 13 years of age. At the same time, all 13-year-olds who were in the 7th or earlier grades were excluded. These are estimated to be 10% of the total age group. Likewise about 4% of the 13-year-olds who had progressed beyond the eighth grade were excluded. Furthermore, the sample was limited to schools in which the pupils were primarily of the early group of immigrants to Israel, and thus largely of European origin. Thus, of the total sample only 193, or 10.3%, had fathers born in an African or Asian country, compared to about 30% in the eighth grade nationally.
(It is to be noted that of the fathers 166, or 8.9%, are classified as professional or managerial, but Dr. Smilansky indicates this is not unusually high for the European section of the country's population.)
(It is also to be noted that the date of testing in Israel was February-March rather than November, so that pupils had an additional three or four months of growth and schooling as compared with most countries.)
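The Israeli percentages are all taken over the 1,873 pupils analyzed. A short check that the reported counts reproduce the reported percentages when rounded to one decimal place:

```python
analyzed = 1873  # Israeli pupils whose data were analyzed

def pct(n: int, total: int = analyzed) -> float:
    """Percentage of the analyzed sample, rounded to one decimal place."""
    return round(100 * n / total, 1)

reported = {
    "14 or over at start of school year": 177,
    "under 13 at start of school year": 514,
    "father born in an African or Asian country": 193,
    "father professional or managerial": 166,
}
for label, n in reported.items():
    print(f"{label}: {n} = {pct(n)}%")
```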
Poland
The total population tested in Poland was 1,000, consisting of 346 boys and 654 girls. Children were selected from five different environments. The rural children were tested in large villages possessing at least the first classes of secondary school. Children from small towns were tested at Milicz, from mixed agricultural-industrial surroundings at Dzierzoniow, from industrial areas at Walbrzych (where there are coal mines and heavy industry), and children from a favored cultural environment in one of the districts of Wroclaw, 80% of whose inhabitants were reported by the Polish research agency as being highly educated people.
Scotland
The Scottish sample consisted of 991 pupils, 515 boys and 476 girls, drawn from two of the educational administrative units in Scotland, the city of Aberdeen and the county of Stirlingshire. These were chosen because each had been found in the past to be representative of educational achievement in Scotland as a whole. In Aberdeen, one-sixth of the 13-year-old pupils were tested, the sampling being based upon the day of the month on which they were born. In Stirlingshire, a sample was drawn from half of the schools in the county in such a way that each school course was represented in the same proportions as in the county as a whole. Thirteen schools were involved in Aberdeen and ten in Stirlingshire.
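In Aberdeen, one-sixth of the 13-year-olds were selected by day of birth. The report does not say which days were used; the sketch below assumes, purely for illustration, that pupils born on the 1st to the 5th of any month were taken (`DAYS_SELECTED` is a hypothetical choice), and checks what share of the days in a year such a rule covers:

```python
from datetime import date, timedelta

# Hypothetical rule: select pupils born on these days of the month.
DAYS_SELECTED = {1, 2, 3, 4, 5}

def selected(birth: date) -> bool:
    return birth.day in DAYS_SELECTED

def share_of_year(year: int) -> float:
    """Fraction of the days in a calendar year that the rule selects."""
    d, end = date(year, 1, 1), date(year + 1, 1, 1)
    hits = total = 0
    while d < end:
        hits += selected(d)
        total += 1
        d += timedelta(days=1)
    return hits / total

print(round(share_of_year(1947), 4), round(1 / 6, 4))  # 0.1644 0.1667
```

Under this assumed rule, 60 of the 365 days in 1947 qualify, slightly under one-sixth.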
The testing in most of the countries took place in November of 1960, it having been agreed that
this was a good time for those countries in which the school term begins early in October or
during the month of September. The school year would have been well started by this time, and
the children would have had a substantially
similar number of days since the school year had
begun. In Scotland, however, the testing was conducted in June of 1960 as certain reorganisations
were due to take place in Scottish secondary schools during the autumn of 1960. It was therefore
necessary to give the test at that time, and to adjust the selection procedure so that the Scottish children were at the appropriate age when they took the tests.
Sweden
The sample in Sweden consisted of 567 pupils in all, 284 boys and 283 girls. These were drawn
from about 30 classes in the middle part of the country. The classes were chosen from schools
which in previous national surveys had given results with an average score and variability close
to the national average. Testing was limited to seventh year classes in the various types of
schools. Those pupils in the classes who were above or below the age limits for the study were
excluded from the sample.
No attempt was made to test those 13-year-olds who were in classes either above or below the seventh. The number of 13-year-old pupils in classes above the seventh is estimated to be about 10% and the number in the sixth or lower classes is estimated to be about 5%.
Switzerland
The Swiss sample consisted of only 314 pupils, 153 boys and 161 girls. These were drawn from
the city of Geneva, no attempt being made to represent a wider geographical
region. Dr. Roller
points out that a national sample in Switzerland
would have had to be drawn from each of the
cantons in the country, there being no other unit that is appropriate to the cultural and population
distribution
in Switzerland.
Since the total 13-year-old population of Geneva is about 2,000, the
sample is considered adequate to represent them.
Within Geneva, the sample was set up so as to include appropriate
numbers in the different
types of schools in grades 7 and 8. The total numbers tested were as follows:
[Table: numbers tested by type of school, grade and sex (boys, girls). 7th grade: École primaire, Collège de Genève. 8th grade: École primaire, Collège de Genève, Collège moderne, École supérieure lat., École supérieure mod., École ménagère. Figures in printed order: 101, 72, 203, 72, 94, 52, 77, 76]
Those pupils among this total group who were 13 years of age were included in the final sample. No attempt was made to test 13-year-olds who fell below the seventh grade. These are estimated to comprise 5% of the age group.
U.S.A.
In the United States the total group tested was 2,254, but in order to reduce the burden of
statistical analysis, only every other pupil was included in the sample finally analyzed, which
comprised 1,127 pupils, 568 boys and 559 girls.
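The reduction from 2,254 pupils tested to 1,127 analyzed is a systematic 1-in-2 subsample. A minimal sketch, assuming the tested pupils are held in a list in some fixed order (the text does not say how they were ordered); the pupil IDs are hypothetical:

```python
# 2,254 hypothetical pupil IDs, in whatever order the test papers were filed.
tested = [f"pupil-{i:04d}" for i in range(1, 2255)]

# Systematic 1-in-2 subsample: keep the 1st, 3rd, 5th, ... pupil.
analyzed = tested[::2]

print(len(tested), len(analyzed))  # 2254 1127
```

Taking every other pupil halves the tabulation work while spreading the subsample across the whole list; it behaves like a random half only if the order of the list is unrelated to the pupils' scores.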
The United States sample consisted of all the 13-year-olds in the public school systems in three
different educational administrative
units. One was an industrial city that is part of the Boston,
Massachusetts
metropolitan
area. A second was an industrial city of about 50,000 in southern
Ohio, not far from Cincinnati. The third was a rural county in south central Illinois. The three
units were chosen because they had been found to give results close to the national average
when they were used in the nationwide standardization
testing for the Metropolitan
Achievement
Test published by the then World Book Company.
Yugoslavia
The Yugoslav sample consisted of 685 children, distributed as indicated in the following table:

                                 Boys    Girls    Combined
Urban                             202      206       408
Rural                             135      142       277
  (Multiple class teaching)      (28)     (31)      (59)
Total                             337      348       685
Some previously determined localities were substituted by others with the same general situation
(general cultural level of the population, distances from principal ways of communications,
etc.).
The localities where testing actually occurred were: Bukevje, Klara, Lomnica, Novo Čiče, Odra, Veleševec, Velika Gorica, Velika Mlaka, Vukovina, Zagreb.
In the localities tested, all 13-year-olds attend school. Only handicapped children educated in special institutions for the handicapped were excluded from the sample. Testing was also carried out with pupils above and below the seventh grade. All classes are co-educational.
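Procedure 1b called for national samples of between 600 and 1,000 pupils. A short tally of the sample sizes reported in the country descriptions (for the U.S.A., the half actually analyzed) shows which countries fell inside that range:

```python
# National sample sizes as reported (U.S.A.: the 1-in-2 subsample analyzed).
samples = {
    "Belgium": 640, "England": 1181, "Finland": 727, "France": 451,
    "Federal Republic of Germany": 811, "Israel": 1873, "Poland": 1000,
    "Scotland": 991, "Sweden": 567, "Switzerland": 314, "U.S.A.": 1127,
    "Yugoslavia": 685,
}
LOW, HIGH = 600, 1000  # range agreed in procedure 1b (inclusive)

inside = sorted(c for c, n in samples.items() if LOW <= n <= HIGH)
outside = sorted(c for c, n in samples.items() if not LOW <= n <= HIGH)

print("within 600-1000:", inside)
print("outside:", outside)
print("total pupils:", sum(samples.values()))  # 10367
```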
The tests
Four tests of academic achievement were given. Since our general purpose was to gather data
that could yield inferences about intellectual functioning, or reasoning, an attempt was made in
each test to include items that called for reasoning, but did not require previous knowledge of
the field. The ideal item was one that presented all the information required for a correct answer.
… of the tests.

Mathematics test:
5 items requiring simple computation; 5 basic …, e. g.: Which … numbers? a) … b) … c) … d) …
2) This table gives readings of maximum and minimum temperature in degrees Fahrenheit, of rainfall in inches, and of sunshine recorded for each month of the year.
a) In which month did the highest temperature occur?
b) In which month was the difference between the maximum and minimum temperatures greatest?
c) Which was the wettest month?
9 problem sequences, e. g.:
We know that the altitude of a triangle is a line drawn from one vertex perpendicular to the opposite side. We are given the triangle in Figure 2.
[Figure 2: triangle ABC, with the altitude drawn from A to H on side BC]
Total: 26 items, some with subdivisions; 29 responses.
Science test:
When you enter a movie theater on a sunny day, you do not see well at first because the
a) pupils of the eyes are still large
b) lenses in the eyes will focus the light in front of the retina
c) pupils of the eyes are still small
d) lenses in the eyes will focus the light behind the retina.
… (true, probably true, impossible to determine, probably …) … of its trunk.
Total: 21 items.
Geography test:
12 multiple-choice items depending on information, e. g.: …
… items requiring drawing inferences from maps.
4 items requiring that generalizations be stated as supported or not supported by a bit of text.
Statements
a) Many mountain stream beds are narrow, steep, and full of rapids.
b) People who live in rugged mountain areas tend to depend for their livelihood more upon animal products than upon the growing of crops.
c) Mountain dwellers are often fine craftsmen.
d) In mountainous areas, water power can be used to produce electricity for manufacturing.
Generalizations
1) … 2) … 3) … 4) …
Total: 32 items.
Reading comprehension test:
5 reading passages, each followed by 6 or 7 comprehension items, e. g.: …

Non-verbal test:
completion of series, … figures, …

… of the tests
The multiple-choice form of testing is more familiar to students in Scandinavia, …
Reliability of the tests
The reliability of the tests is discussed in … chapter of this report. The general estimates of reliability are:
Mathematics   .81
Reading       .81
Geography     .70
Science       .62
A note on translation
The tests were originally prepared in either English, French, or German. They had to be translated
into eight languages: English, Finnish, French, German, Hebrew, Polish, Serbo-Croatian
and
Swedish. The problem of translation was, of course, of great concern. Since the participants did
not mean to make this the main problem of the study (any more than they meant to make sampling
or test construction
the main problems), they agreed to leave the translation of the items into
their own languages to each participant. They did not, for example, test the translation by having
it translated back into its original language in order to compare the re-translation
with the original.
This led to occasional differences in items. A striking example of this, as might be expected, was in the translation of a passage in the reading comprehension test, in which the literary quality of the passage in its original French was its main characteristic.
Elle sort d'une touffe d'herbe qui l'avait cachée pendant la chaleur. Elle traverse l'allée de sable à grandes ondulations.
A caterpillar emerges from a tuft of grass where it has been concealing itself during the warm weather. It crosses the gravel path, moving in a series of large ripples.
Quelle belle chenille, grasse, velue, fourrée, brune, avec des points d'or et ses yeux noirs!
What a beautiful caterpillar: fat, hairy, furry and brown, with golden spots and black eyes!
A different translation problem appeared at one point in the mathematics test. A question in the English original read: "How would omitting the decimal point in 18.52 change the number?" One of the answers from which the examinee could choose read: "Makes it 1/10 as large." The French translation reads: "Il devient 10 fois plus petit" (literally, "It becomes 10 times smaller"), an entirely different problem.
Such difficulties
in translation apparently were so small in number and so scattered as to be
insignificant. There is no evidence that they seriously influenced the national scores.
… we have demonstrated … in technical and philosophical matters … certain assumptions …
Robert L. Thorndike
INTERNATIONAL COMPARISON OF THE ACHIEVEMENT OF 13-YEAR-OLDS

In the second section of this report Professor Thorndike presents and discusses certain aspects of the results. He explains the reason for deciding to restrict comparisons between country and country to patterns of achievement (national profiles), omitting comparisons of levels, and he presents findings on the reliability of the tests. The results are analysed in relationship to sex and certain background variables. Further analyses of variations between countries in the relative difficulty experienced with selected test items, combined with a study of item content, lead the author to investigate a number of hypotheses with results which demonstrate the possibilities of international evaluations of achievement.
Because the present project was a pilot enterprise, carried out with limited resources, it was not practical to try to get a truly representative sample of the 13-year-old population in each country. Sampling procedures varied from country to country, as described by Foshay, but in most instances sampling was limited to one or a few communities or regions that were thought to be representative of the country as a whole. In a few countries (England, Scotland, Sweden) there had previously been fairly complete national testing surveys, and communities or regions could be chosen which had been found on these to correspond to the country as a whole. In some countries (Switzerland, Israel) the sample was intentionally restricted to a place or to a fraction of the population that was fairly clearly not representative of the country as a whole. In most of the countries an attempt was made to achieve representativeness, but the evidence upon which communities or schools were chosen was rather meager and impressionistic.
Because of these limitations on the representativeness
of the national samples, there seems to
be little value in comparing the absolute level of achievement in one country with that in other
countries. For this reason, no country by country tables of mean scores are reported. We will
turn our attention instead to an examination of the magnitude of the differences
between countries, and to the differences in patterns of achievement from country to country.
Statistical characteristics of the tests
The test battery consisted of four short achievement tests and a non-verbal measure of scholastic aptitude. Three of the achievement tests yielded separate part scores as well as a total score. The nature of the several tests and sub-tests is described by Foshay (pp. 16 to 18). At this point we shall merely supplement that description by a brief table (Table 1) showing for each test and sub-test (1) the number of items, (2) a general estimate of the mean, obtained by averaging the means for 11 national groups, (3) a general estimate of the standard deviation, obtained as an average of the 11 standard deviations within countries, and (4) a general estimate of reliability, obtained by Kuder-Richardson Formula No. 20 from the average standard deviation and the average item difficulty over 11 national groups.
As might be expected, the reliabilities of many of the sub-tests, consisting of from 4 to 10 items, are quite low. However, the total-test reliabilities are fairly satisfactory, the estimated values ranging from a low of .62 for the 21-item science test, through .70 for the geography test and .81 for the reading test and the mathematics test, to .89 for the considerably longer non-verbal test.
The estimates are rough, since the assumptions underlying the Kuder-Richardson formulas are not completely met. However, they indicate the general order of magnitude. Though the tests would be of no use for the study of single individuals, they appear adequate for comparisons of groups of several hundred, and these are the comparisons with which this study is primarily concerned.
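The reliability column of Table 1 rests on the Kuder-Richardson formulas. The sketch below illustrates Formula 20, which needs the proportion of pupils answering each item correctly, and the simpler Formula 21, which assumes that all items are equally difficult and so needs only the number of items, the mean and the standard deviation. The function names are our own, and this is an illustration rather than the computation actually used in the study. Applied to the non-verbal figures in Table 1 (75 items, mean 33.66, standard deviation 12.19), Formula 21 gives about .89, the value reported for that test.

```python
def kr20(p, sd):
    """Kuder-Richardson Formula 20.

    p  -- proportion of pupils answering each item correctly (one value per item)
    sd -- standard deviation of the total raw scores
    """
    k = len(p)
    sum_pq = sum(pi * (1 - pi) for pi in p)  # sum of the item variances p*q
    return (k / (k - 1)) * (1 - sum_pq / sd ** 2)


def kr21(k, mean, sd):
    """Kuder-Richardson Formula 21: Formula 20 with every item assumed to have
    difficulty mean/k, so only k, the mean and the SD are required."""
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * sd ** 2))


# Non-verbal test, Table 1: 75 items, average mean 33.66, average SD 12.19
print(round(kr21(75, 33.66, 12.19), 2))  # 0.89
```

When items differ in difficulty, Formula 21 never exceeds Formula 20 for the same data, so it is a conservative shortcut when item statistics are not available.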
Results from Yugoslavia became available too late for inclusion in these and certain other analyses.
TABLE 1
Statistical Parameters of Tests

                          No. of   Average   Average      K-R No. 20
                          Items    Mean      Stand. Dev.  Reliability
Non-verbal Aptitude         75     33.66     12.19        .89*
Mathematics    - Part 1      5      3.63      1.14        .51
               - Part 2      5      4.19      1.03        .58
               - Part 3      7      3.91      1.60        .51
               - Part 4      9      3.05      2.02        .73
               - Total      26     14.98      4.40        .81
Reading Comprehension       33     21.36      5.27        .81
Geography      - Part 1     12      7.01      2.16        .49
               - Part 2     16      7.82      2.77        .57
               - Part 3      4      1.59      1.07        .38
               - Total      32     16.42      4.65        .70
Science        - Part 1     16      7.77      2.85        .59
               - Part 2      5      1.99      1.17        .28
               - Total      21      9.76      3.39        .62

* By K-R Formula 21; inflated because of a speed factor.
From the raw scores on each test, raw-score means and standard deviations were obtained. A crude average of the variances in the 11 separate countries provided an estimate of the average variability of performance of pupils within a single country: the within-countries variance. The mean of the means for the 11 countries was used as a grand mean, and the variance of the 11 national means around this average value provided an estimate of variability from country to country: the between-countries variance. A comparison of the two variance estimates, the one for variability within a country and the other for variability from country to country, provides an index of the magnitude of international differences. Table 2 expresses the variance between countries as a percent of the typical variance within a country. The results are shown for boys and girls separately, and for the total group of all pupils.
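The index in Table 2 can be sketched in a few lines. The sketch assumes that the variance of the national means is taken around their unweighted mean with the number of countries as divisor; the text does not say which divisor was used. The function name and the three-country figures in the usage line are hypothetical.

```python
from statistics import mean, pvariance


def between_as_percent_of_within(national_means, national_sds):
    """Variance of national means as a percent of the average within-country variance."""
    # Crude (unweighted) average of the within-country variances
    within = mean(sd ** 2 for sd in national_sds)
    # Variance of the national means around their unweighted grand mean
    between = pvariance(national_means)
    return 100 * between / within


# Hypothetical: three countries, each with SD 4.0, means 10, 12 and 14
print(round(between_as_percent_of_within([10, 12, 14], [4.0, 4.0, 4.0]), 1))  # 16.7
```

Using the number of countries minus one as divisor instead would scale every entry by the same factor, so the comparisons between tests would be unchanged.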
TABLE 2
Variance Between Countries Expressed as a
Percent of Average Within-Country Variance

                           Boys and   Boys     Girls
                           Girls      Only     Only
Non-verbal Aptitude         12.1%     11.5%    11.7%
Mathematics    - Part 1      9.9      13.9      7.5
               - Part 2      9.4      11.6      8.8
               - Part 3     11.8      14.0     11.7
               - Part 4     14.3      18.4     12.2
               - Total      16.2      21.2     13.4
Reading Comprehension        6.2       8.1      5.6
Geography      - Part 1     35.8      39.3     37.4
               - Part 2      7.1       7.9      7.8
               - Part 3      5.6       4.4      7.1
               - Total      15.4      17.7     15.9
Science        - Part 1      6.1       7.3      8.0
               - Part 2      2.3       1.5      3.3
               - Total       5.2       5.9      7.1
It is clear that the variation between national means is small in relation to the variability of scores within any one country. National differences represent a minor rather than a major component in these results. And they are probably over-estimated rather than underestimated, because the countries that did relatively well on the tests were in several instances those that were known to have tested an up-graded sample of their populations. We suspect that with truly representative national samples, the differences would have been reduced. Of course, the participants in this survey were all countries with a basically European culture and with well-developed educational systems. A greater heterogeneity in national cultures and educational levels would very probably increase the national differences, perhaps substantially.
A comparison of the different tests with respect to the magnitude of international differences brings out some rather dramatic and surprising results. With these tests and samples, the tests that show the smallest variations from country to country are the tests of science and of reading comprehension. The presumably relatively culture-free non-verbal aptitude test shows about twice as much country-to-country variation as the reading and science tests, and the geography and mathematics tests about two-and-a-half times as much. It must be remembered that all the tests had been translated into eight different national languages: English, Finnish, French, German, Hebrew, Polish, Serbo-Croatian and Swedish. The fact that a reading test, which would appear to be especially susceptible to changes in difficulty with translation, remained so uniform is rather unexpected. The findings suggest that the nearest thing we have to a culture-fair test may be a carefully translated reading test, and that level of reading ability is the feature with respect to which different educational programs are most nearly uniform.
Several of the tests had sub-tests, and a comparison of the variability between nations on these is of some interest. In the case of mathematics it is the verbal problems (Part 3) and the inductive series (Part 4) that showed the largest international differences. Geographical information (Part 1) showed by a large margin the widest international variation of any test, whereas in map reading (Part 2) the differences were much smaller, and in drawing generalizations (Part 3) smaller still. Scientific judgment (Part 2) showed very little variation between countries, and scientific information (Part 1) somewhat more. In some measure, these results are an outcome of differences in the reliability of the sub-tests. If the within-group variance is inflated by measurement errors, the between-group variance will necessarily look small in comparison. However, this accounts for only part of the results, and the major differences appear to arise from more genuine factors.
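The point about unreliable sub-tests can be made concrete with a rough sketch. It assumes that measurement error inflates the within-country variance but leaves the variance of national means (each based on several hundred pupils) almost untouched, so that dividing a Table 2 entry by the sub-test reliability estimates what the index would be for error-free scores. This correction and the function name are our own illustration, not an analysis reported in the study. For the two science sub-tests (boys and girls combined: 6.1 with reliability .59, and 2.3 with reliability .28), the gap narrows after correction but does not disappear, in line with the conclusion that unreliability explains only part of the differences.

```python
def corrected_index(percent_between_over_within, reliability):
    """Rough correction of a between/within variance index for unreliability.

    Assumes only the within-country variance is inflated by measurement
    error, i.e. true within-country variance = reliability * observed."""
    return percent_between_over_within / reliability


# Science sub-tests, boys and girls combined (Table 2), with Table 1 reliabilities
part1 = corrected_index(6.1, 0.59)  # scientific information
part2 = corrected_index(2.3, 0.28)  # scientific judgment
print(round(part1, 1), round(part2, 1))  # 10.3 8.2
```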
Generally speaking, the boys varied more from country to country than did the girls. However, the sex differences in this respect were neither large nor entirely consistent.
National profiles
Though variations in the sampling procedure from country to country make comparisons of level of achievement of questionable value, comparisons of patterns of achievement from country to country seem sound and of a good deal of interest. By pattern of achievement we mean a country's achievement on the specific tests and sub-tests, relative to its own over-all level of achievement.
Patterns of achievement were arrived at through the following steps:
(1) For each test a crude average of the national means was computed, and also a crude average of the national standard deviations.
(2) On any one test, such as the test of reading comprehension, each country's average score was converted into a standard score by subtracting from it the average score for all countries and dividing the result by the average standard deviation for that test.
(3) The average standard score for the five tests (i.e., the total scores) was computed for each country.
(4) This average standard score was subtracted from the standard score on each test and sub-test. That is, each specific standard score was expressed as a deviation from the country's average standard score. In this way, each national group was reduced to a common and comparable base line. It is then possible to examine and compare directly the peaks and hollows of achievement in the different countries.
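Steps (1) to (4) translate directly into code. The sketch below handles only the total scores (sub-tests would be expressed relative to the same country average in step 4); the function name, the data layout and the two-country numbers in the usage lines are hypothetical. Results are given in hundredths of a standard deviation, the unit used in Table 3.

```python
from statistics import mean


def national_profiles(means, sds):
    """means[country][test] and sds[country][test] hold raw-score statistics.

    Returns profile[country][test]: the country's standard score on each test
    minus its own average standard score, in hundredths of an SD."""
    countries = list(means)
    tests = list(means[countries[0]])
    # Step 1: crude averages over countries of the means and of the SDs
    avg_mean = {t: mean(means[c][t] for c in countries) for t in tests}
    avg_sd = {t: mean(sds[c][t] for c in countries) for t in tests}
    profiles = {}
    for c in countries:
        # Step 2: convert each national mean to a standard score
        z = {t: (means[c][t] - avg_mean[t]) / avg_sd[t] for t in tests}
        # Step 3: the country's average standard score over all tests
        z_bar = mean(z.values())
        # Step 4: express each standard score as a deviation from that average
        profiles[c] = {t: round(100 * (z[t] - z_bar)) for t in tests}
    return profiles


# Hypothetical two-country, two-test example (every SD = 4)
means = {"A": {"reading": 12, "mathematics": 10}, "B": {"reading": 8, "mathematics": 10}}
sds = {"A": {"reading": 4, "mathematics": 4}, "B": {"reading": 4, "mathematics": 4}}
print(national_profiles(means, sds))
# {'A': {'reading': 25, 'mathematics': -25}, 'B': {'reading': -25, 'mathematics': 25}}
```

Because each country is measured against its own average, every profile sums to roughly zero; the profiles show relative peaks and hollows, not absolute levels.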
TABLE 3
National Patterns of Achievement
Expressed as standard score deviations from national average on all 5 tests
Belgium
Non-verbal
Aptitude
Mathematics
-Part1
-Part2
- Part 3
-Part4
-Total
-16
-36
-18
-6
-17
-31
-16
-16
-19
21
-40
-7
c46
-38
28
32
23
-18
6
-24
-11
- ia
3
-18
Science
-Part1
-Part2
-Total
-16
5
- 14
7
4
28
43
30
-8
11
19
-58
-12
27
-33
ta
11
-7
4
3
23
7
15
25
16
-15
26
47
-a
-29
20
13
23
16
16
4
9
- 29
13
-21
24
-8
24
-14
-24
-14
Aptitude
-Part1
-Part2
-Part3
- Part 4
-Total
-18
25
-9
2
43
30
Scotland
Sweden
Switz.
-a
12
11
33
19
-16
3
0
0
10
-4
16
-25
-26
-27
-58
-23
-43
23
-45
12
20
92
-31
-65
16
-33
-12
33
-16
-20
12
ia
-5
-9
0
24
7
12
20
12
-43
-39
-43
-44
-3
23
-1
-24
-Part1
- Part 2
-Total
Yugosl.
25
Geography
-Part1
-Part2
-Part3
-Total
U.S.A.
- 28
-43
- 29
- 19
-39
Reading Comprehension
Science
Israel
Combined
Poland
Non-verbal
Germany
-51
-42
- 20
-9
-40
-Part1
- Part 2
-Part3
-Total
Mathematics
France
12
Geography
and Girls
Finland
25
40
34
42
44
Reading Comprehension
c = Boys
England
-1
4
4
-3
7
-9
30
-54
15
35
-16
27
10
16
5
28
24
27
28
24
21
The complete set of national profiles is presented in Table 3. These show results for boys and girls combined. All entries in the table are expressed in hundredths of a standard deviation. That is, the entry 12 for Belgium on the non-verbal test means that Belgium's standard score on that test was twelve hundredths of a standard deviation higher than Belgium's average standard score on all five of the tests. Thus, if we look at the results for Belgium, we see that the pupils in Belgium were most outstanding, relative to their over-all level of performance, in mathematics. Here they show a peak of almost half a standard deviation. They are slightly above their own over-all average on the non-verbal aptitude test, and they do relatively least well on the test of reading comprehension. The sub-test scores show only minor deviations from the total scores.
England, by contrast, performs especially well on the non-verbal aptitude test and is especially weak in mathematics and geography. The geography sub-test dealing with geographical information is notably lower than the map-reading or inference tests.
A similar analysis could be made of the pattern for each country, pointing out its areas of relative strength or weakness. Or the results can be examined from the point of view of each test in turn. This has been done in Figure 1, in which the strength or weakness of each country (relative to its own over-all mean) has been plotted on a common scale. We see that on the non-verbal test the country that does especially well is England. Since the test was English in origin, this result may possibly reflect some degree of previous familiarity with the test, and acceptance of the task as a reasonable and sensible one. Scotland also does well on the test, while Germany and Finland perform poorly on it.
On the mathematics test, all the French-speaking countries are superior performers, with Belgium leading the way. Poland also shows up to advantage. The English-speaking countries are consistently poor. One wonders what part of this is contributed by their complex system of denominate numbers (quantities expressed in non-decimal units of measure and money). Yugoslavia also has marked difficulty with this test.
National differences in reading comprehension are relatively small. It is on this test that Yugoslavia shows up to best advantage, followed by Scotland and Finland, while Belgium and Poland do relatively poorly.
On the geography test we find Germany, Israel and Poland leading the way, and their superiority is especially marked in the section of the test dealing with geographic information. The English-speaking countries do notably poorly on the geography test as a whole, and especially on the sub-test dealing with geographical facts and information. This is an area in which the different national curricula appear to have produced distinctly different results.
Science is an area in which the French-speaking countries are relatively weak. Here the leaders in relative achievement are the United States and Germany, followed by Yugoslavia and England, in that order.
Some countries show rather marked peaks and hollows in their profiles. Thus, England is very high on non-verbal aptitude and very low in mathematics and geography. Belgium is high on mathematics and quite low in reading. Others show a notably even pattern of performance. The best example is Sweden, which performs at almost the same level of excellence on all the tests.
The patterns of relative strength and weakness provide a picture of achievement under the different educational systems. They provide no explanation of how the differences come into being; this must be contributed by investigators who are intimately acquainted with the educational systems in the several countries. However, the data presented here must still be considered quite tentative. They are limited by (1) the local and only partially representative character of many of the national samples, (2) the brevity of the tests and especially of the sub-tests, and (3) the limited opportunity to plan test content so as to ensure the most balanced and appropriate representation of content and objectives.
The results reported so far are for boys and girls combined. It is of some interest to look at the results for the sexes taken separately, and this is done for the five total tests in Table 4. Scores for boys and girls are each expressed as deviations from the average of all five tests for that sex in that country. That is, the score of 11 for Belgian boys on the non-verbal test means that the Belgian boys were eleven hundredths of a standard deviation higher on that test than they were on the average of the five tests.
FIGURE 1. Relative Achievement of National Groups on Tests (panels: Non-verbal Aptitude, Mathematics, Reading Comp., Geography, Science; each country's strength or weakness relative to its own over-all mean, plotted on a common scale)
TABLE 4
Sex Differences in Pattern and Level of Achievement
Average
B-G
32
22
24
15
32
18
37
14
6
24
- -11
17
Belgium
England
Finland
France
Germany
Israel
Poland
Scotland
Sweden
Switzerland
U.S.A.
Yugoslavia
Average
19
Non-Verbal
6
G
11
42
-47
-7
-39
3
-22
16
-21
3
12
-12
-5
13
50
- 28
- 22
- 33
17
-17
33
9
21
11
-6
Geography
Science
42
-42
-2
34
-18
-10
22
-44
-4
17
-35
-56
46
-37
13
27
-14
-4
34
-34
6
22
-18
-34
-35
-2
11
-27
-4
-17
-34
14
-8
-8
-8
25
-12
25
27
6
10
-3
-18
33
7
16
20
35
-16
-30
-21
-36
-25
7
5
29
20
15
-11
1
6
-14
8
0
23
23
21
16
-23
-11
9
-18
1
-1
31
31
-5
32
5
20
22
33
-16
47
37
12
Mathematics
-8
Reading Comp.
-8
-1
20
0
-14
-32
16
-32
-15
-11
-9
-68
7
4
-15
The first column of Table 4 shows a different kind of finding. In this column, the average standard score on all five tests for boys and for girls is compared, country by country. Thus, in Belgium, on the average of all five tests, the boys fell 0.32 standard deviation units above the girls. An examination of this column, headed "Average B-G", shows the extent of male superiority in pooled average performance, country by country. There is only one country in which the girls surpassed the boys in total performance: the United States. Differences between the two sexes were small in Sweden and Scotland. The largest differences were in Poland, Germany and Belgium. These results provide some clue as to the comparability of educational opportunity and motivation for the two sexes in different countries. On average, over all countries and tests, the boys fall about a fifth of a standard deviation above the girls.
An examination of results for the different tests shows that girls perform best, relatively speaking, on the reading test, and least well on the test of science. This pattern is a universal one, appearing in each one of the 12 countries. We appear to have here a universal and quite stable sex characteristic. There is also a small but rather consistent tendency for the girls to do relatively better on the non-verbal test (10 countries) and the mathematics test (11 countries). Differences in the geography test were small and inconsistent. All of these differences in relative performance on specific tests appear, of course, after adjusting for the 0.19 standard deviation difference in average performance of boys and girls.
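The overall figures quoted above can be checked against the "Average B-G" column of Table 4. The sketch below takes that column as it can be read from the table (boys' minus girls' average standard score, in hundredths of a standard deviation) and assumes a simple unweighted mean over the 12 countries; the variable names are illustrative.

```python
from statistics import mean

# "Average B-G" column of Table 4: boys' minus girls' average standard score,
# in hundredths of a standard deviation
average_b_minus_g = {
    "Belgium": 32, "England": 22, "Finland": 24, "France": 15,
    "Germany": 32, "Israel": 18, "Poland": 37, "Scotland": 14,
    "Sweden": 6, "Switzerland": 24, "U.S.A.": -11, "Yugoslavia": 17,
}

overall = mean(average_b_minus_g.values()) / 100  # convert to SD units
girls_ahead = [c for c, d in average_b_minus_g.items() if d < 0]
largest = sorted(average_b_minus_g, key=average_b_minus_g.get, reverse=True)[:3]

print(round(overall, 2))  # 0.19
print(girls_ahead)        # ['U.S.A.']
print(largest)            # ['Poland', 'Belgium', 'Germany']
```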
Achievement and parental education or occupation
An attempt was made to get information on parental education or father's occupation for the children in each country. However, there was very real difficulty in getting comparable data for different countries. Pressures and sensitivities differ from country to country, so that in some it is possible to get information about education and in others about occupation, but it is rarely possible to get both. Furthermore, the differences in educational structure in different countries make it difficult to establish classifications that will be comparable from country to country. However, the operation was carried out as well as could be done, and some comparisons are presented in Tables 5-10.
The basic unit is the average standard deviation of boys and girls combined, averaged for 11 countries.
TABLE 5
Percent of Pupils with Fathers at Different
Levels of Education or Occupation
B
Level of fathers educ.
Belgium
Elementary only
Some secondary
Secondary completed
Some college
College
England
45
32
14
*
t
completed
College
33
44
15
*
*
completed
B
Germany
Fathers occupation
Unskilled, semi-skilled
Skilled, farmer
Clerical, sales
Sub-professional
Prof. & managerial
*
77
11
7
24
55
19
37
27
14
.
74
,*
9
78
14
21
60
14
*
22
34
30
11
10
43
27
9
10
Sweden
t
76
12
7
Israel
25
37
9
9
16
Poland
G
Elementary only
Some secondary
Secondary completed
Some college
France
50
26
15
5
+
Yugoslavia
58
21
9
4
9
S
4
78
55
23
11
c
78
10
16
Scotland
U.S.A.
SWit2.
Germany
22
30
18
*
23
26
42
10
5
15
Israel
8
L
Scotland
SVMZ.
8
45
27
49
30
14
21
35
17
10
17
of means.
Table 5 shows the percent of cases with fathers at different levels of education or occupation. It is clear from this table that the different national samples were not comparable with respect to the distribution of education or occupation of fathers. What is not clear is the extent to which these sample differences reflect similar differences in the total national population and the extent to which they reflect biases in the specific sample tested in that country. Thus the Scottish sample showed twice as many fathers in unskilled and semi-skilled work as the German sample, two-and-a-half times as many as the Swiss, and five times as many as the sample from Israel. How shall we understand this? Examination of the sampling procedure brings out that the Israeli sample was limited to that segment of the population who were of European origin, that the Swiss sample was limited to Geneva, and that several fee-paying schools were excluded from the Scottish sample. Thus, in part at least, the differences between nationalities appear to reflect differences in sampling. However, it is also probably true that the differences reflect in part actual national differences, especially in the amount of schooling. Thus, the differences in the educational level of parents in Sweden and in Yugoslavia are certainly at least in part a reflection of the past educational level prevailing in the two countries. And to come from a family in which the father has completed secondary education certainly signifies a less outstanding experience in the United States than in most European countries, where such a level of education is still the exception rather than a typical event. However, in education also, the figures suggest that the sample may be non-representative of the total population in some countries. Thus, a Polish sample in which 40% of fathers have completed secondary education hardly seems representative of the total Polish population in the age range 40-50.
Comparisons of national sub-groups in which the level of parental education or occupation is uniform from country to country are almost certainly more meaningful than comparisons of total national samples, as pointed out in the preceding paragraphs.
TABLE 6
Non-Verbal Test Averages by Level of Father's Education or Occupation
B
Level of fathers educ.
Elementary only
Some secondary
Secondary completed
Some college
College completed
France
Poland
-6
23
50
-
33
96
115
- 24
-6
-
0
23
23
40
-34
-14
9
-
Unskilled, semi-skilled
Skilled, farmer
Clerical, sales
Sub-professional
Prof. & Managerial
-60
-33
18
69
49
B
Fathers occupation
England
G
Elementary only
Some secondary
Secondary completed
Some college
College completed
Belgium
Sweden
-93
-59
-12
23
- 26
24
-
S
- 26
16
-
-65
7
L
Yugoslavia
U.S.A.
-52
-
-25
-19
3
12
-76
- 14
24
-
Germany
Israel
Scotland
SWltZ.
Germany
- 1
-8
15
24
59
- 3
36
48
74
79
-12
28
59
58
-
29
41
77
53
- 28
-34
-18
-13
39
-92
-70
-63
-25
Israel
4
33
55
50
86
Scotland
0
41
37
-
SWlb.
24
60
39
53
29
TABLE 7
Mathematics Test Averages by Level of Father's Education or Occupation
B
Level
of fathers
educ.
Belgium
England
23
56
88
-
-58
38
43
-
Elementary only
Some secondary
Secondary completed
Some college
College completed
France
11
52
-
5
10
55
-
College
completed
-64
-8
-48
occupation
Sweden
U.S.A.
-37
IO
-
-107
-124
22
33
51
-
-31
13
56
Israel
Scotland
Unskilled, semi-skilled
Skilled, farmer
Clerical, sales
Sub-professional
15
11
26
61
-7
25
34
62
-74
-31
-7
15
87
71
Switz.
-148
-86
-43
-9
-70
-30
-143
- 79
- 59
-
-43
8
-
Germany
Yugoslavia
-12
19
32
-
Fathers
46
70
69
71
G
Elementary only
Some secondary
Secondary completed
Some college
Poland
Germany
Israel
Scotland
77
52
72
-
-7
-14
30
3
-22
12
37
49
-71
-28
-23
-
44
61
38
-
88
42
55
36
Switz.
TABLE 8
Reading Test Averages by Level of Father's Education or Occupation
B
Level
of fathers
educ.
Elementary only
Some secondary
Secondary completed
Some college
College completed
Belgium
England
-47
-25
-12
-
France
-53
-25
-
-20
53
91
-
College
completed
-3
39
20
-61
-39
-11
-
S
Sweden
-13
12
17
18
I
occupation
Unskilled, semi-skilled
Skilled, farmer
Clerical, sales
Sub-professional
Prof. & Managerial
30
Germany
26
22
57
90
96
Israel
-11
21
28
55
67
-65
4
38
-45
22
-
40
12
BOYS
Fathers
Yugoslavia
-96
-37
-19
2
7
U.S.A.
-38
-4
-
-30
24
-48
Poland
G
Elementary only
Some secondary
Secondary completed
Some college
-80
1
Switz.
Germany
-13
21
58
78
-
42
28
38
63
19
7
32
47
86
-8
51
6
-
Scotland
-66
-16
Israel
-20
19
35
56
62
57
L
Scotland
-2
48
41
-
S
Swttz.
7
49
53
51
TABLE
of fathers
educ.
Elementary only
Some secondary
Secondary completed
Some college
College completed
Belgium
England
-33
-2
18
-
-44
31
52
-
College
completed
-60
-58
- 21
-
-68
-17
-23
Fathers
occupation
Germany
Unskilled, semi-skilled
Skilled, farmer
Clerical, sales
Sub-professional
Prof. & Managerial
-25
-14
50
15
56
71
97
95
2
12
42
-
-51
-1
-113
31
-40
-3
39
74
TABLE
32
-92
-53
-40
-
-46
7
-
Germany
52
50
69
53
G
Switz.
5
0
-72
-34
-47
Yugoslavia
-132
Scotland
Israel
61
59
83
88
139
U.S.A.
Sweden
45
63
70
14
8
-
-22
18
25
-
Poland
G
Elementary only
Some secondary
Secondary completed
Some college
France
Israel
23
25
64
35
94
-18
Scotland
2
41
54
75
76
Switz.
-53
-22
-18
-
8
46
32
46
10
of fathers
educ.
Elementary only
Some secondary
Secondary completed
Some college
College
completed
Belgium
England
-19
18
12
-
16
95
99
-70
-57
-14
-
-35
-36
16
15
B
Fathers
occupation
Unskilled, semi-skilled
Skilled, farmer
Clerical, sales
Sub-professional
Prof. & Managerial
Germany
Israel
Yugoslavia
4
-
6
50
69
70
-41
-
-71
-38
-34
-16
8
15
70
-73
-80
-56
-15
- 19
5
-
22
-
G
Switz.
-44
-7
23
-
Scotland
U.S.A.
Sweden
56
66
60
-
-66
-38
-44
Y
Poland
G
Elementary only
Some secondary
Secondary completed
Some college
College completed
France
Germany
Israel
Scotland
Switz.
-42
-5
-42
87
61
96
90
1
45
57
73
5
32
58
50
40
17
42
-
25
30
26
25
102
87
40
54
1
32
-6
-
-62
-25
-35
-
-41
31
Tables 7-10 show results for the mathematics, reading, geography and science tests respectively.
The relative performance
of different countries, and of boys and girls, differs on the different
tests, reflecting the national and sex differences in patterns of achievement previously discussed
in connection with Tables 3 and 4. However, the patterns of achievement in relation to education
or occupation of father are much the same from test to test. A crude pooling of results from
different tests and countries yields the following results:
BOYS
Fathers education
Elementary only
Some secondary
Secondary completed
Some college
College completed
Fathers occupation
Unskilled, semi-skilled
Skilled, farmer
Clerical, sales
Sub-professional
Professional
Girls
-36
-2
36
36
41
- 58
-27
1
12
13
14
28
51
66
79
-10
18
25
36
50
Additional tabulations were carried out by size of the community in which the pupil resided. Communities were classified into those of under 2,000, those of 2,000 to 20,000, and those of over 20,000 inhabitants. However, there were certain ambiguities in this coding from country to country. It was not entirely clear whether place of residence meant the community in which the school was located, or the immediate community in which the pupil had his home. Thus, in some countries, farm children living in quite rural areas were apparently coded as coming from communities of two to twenty thousand because that is where they attended school. There was also no systematic attempt to make the rural areas in the sample representative of all rural areas in the country, or the urban areas representative of all urban areas. Thus, in the United States the primarily rural area was a rather prosperous mid-western farming area, whereas the two urban communities were primarily rather undistinguished industrial centers.
The average result over all tests is summarized in Table 11. In general, though with some exceptions, pupils in very small communities did slightly less well (by about one-fifth of a standard deviation, on the average) than those in larger communities, but no differences were found between the two categories of larger communities. A comparison of the different tests, as averaged over all countries, suggests that the differences associated with size of community are greatest for mathematics and reading, and least for the non-verbal test. Results for the separate tests are shown in Table 12.
TABLE 11
Average Score by Size of Community
Pooled Results on All Tests
B
Under
2C0J
2000 - 20 oco
for
G
Over
20 000
England
-15
Finland
France
Germany
Poland
Scotland
Sweden
U.S.A.
Yugoslavia
31
21
-4
-44
- 34
- 94
-22
56
42
43
2
- 20
-37
-
-11
58
58
-8
-13
- 33
- 30
Crude Average
- 20
Under
2000
I
2000
- 20 000
S
Over
20 000
-32
- 20
2
- 23
- 33
-48
-6
-99
-51
18
11
6
2
-18
-15
-50
-11
-34
-13
-14
- 24
22
26
-9
- 29
- 26
-58
system
of categories
used
for
Israel.
TABLE 12
Non-Verbal
Mathematics
Reading
Geography
Science
Under 2000
-17
-44
- 27
-22
15
2000 - 20 000
-2
0
-3
5
42
G
Over
20 000
-17
-8
4
-3
27
Under
2000
- 26
-54
-31
- 28
-31
2000
- 20 000
-16
-20
Over
20 000
-20
-13
0
-14
-15
3
-11
- 25

Item difficulties: resemblance between countries
In addition to analyzing scores on tests and sub-tests, it was also possible to study responses to specific items country by country. The original tabulations showed the frequency with which each wrong option to an item was selected as well as the frequency of correct response. However, most of the analyses reported here deal only with the correct responses. These are studied from two points of view. First, correlations are presented showing the degree of consistency of item difficulty from country to country. Secondly, certain special groups of items are examined to throw more light on elements of content that are especially easy or difficult in different countries.
Tables 13-16 show the correlations of item difficulty among eleven countries (excluding Yugoslavia). The correlations are computed over the population of items. A high correlation signifies that the same items are difficult, and the same ones easy, for the pair of countries in question.
The first thing that impresses one on scanning Tables 13-16 is the generally substantial correlations across countries. The average correlation is .87 for mathematics, .87 for reading, .68 for geography, and .72 for science. The high correlations for mathematics and reading are especially impressive. A difficult item is a difficult item in these two tests, regardless of the school system in which the pupil has been educated or the language in which his schooling has been conducted. The reading test is particularly noteworthy, because the differences between countries are small both in level of average score and in the relative difficulty of different items.
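The correlations in Tables 13-16 are product-moment correlations computed over items, with each item represented in each country by its difficulty. A minimal sketch follows; it assumes that difficulty is measured as the proportion of correct responses and that the "average correlation" is a simple unweighted mean over all pairs of countries (55 pairs for 11 countries). The function names and the small four-item example are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_pairwise_correlation(difficulties):
    """difficulties[country] = list of item difficulties (proportion correct),
    with the items in the same order for every country."""
    pairs = list(combinations(difficulties, 2))
    total = sum(pearson(difficulties[a], difficulties[b]) for a, b in pairs)
    return total / len(pairs)


# Hypothetical proportions correct on four items in three countries
p = {
    "X": [0.90, 0.70, 0.50, 0.20],
    "Y": [0.80, 0.60, 0.45, 0.15],
    "Z": [0.85, 0.50, 0.60, 0.30],
}
print(round(average_pairwise_correlation(p), 2))
```

A correlation over items is unaffected by a country's overall level: a country that finds every item uniformly easier still correlates perfectly with the others, which is why these tables can be compared even though the national samples differ in level.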
TABLE 13
Correlations of Item Difficulties Between Countries
Mathematics Test* (decimal points omitted)

                   1   2   3   4   5   6   7   8   9  10  11
 1. Belgium        -  86  92  95  92  90  84  84  90  97  80
 2. England       86   -  89  78  93  88  69  98  92  89  90
 3. Finland       92  89   -  90  95  96  76  89  90  95  92
 4. France        95  78  90   -  85  86  75  76  84  91  75
 5. Germany       92  93  95  85   -  93  78  94  95  95  91
 6. Israel        90  88  96  86  93   -  75  87  86  93  89
 7. Poland        84  69  76  75  78  75   -  71  76  83  60
 8. Scotland      84  98  89  76  94  87  71   -  93  88  92
 9. Sweden        90  92  90  84  95  86  76  93   -  92  91
10. Switzerland   97  89  95  91  95  93  83  88  92   -  86
11. U.S.A.        80  90  92  75  91  89  60  92  91  86   -

* Average of the correlations for boys and for girls.
TABLE 14
Correlations of Item Difficulties Between Countries
Reading Test* (decimal points omitted)

                   1   2   3   4   5   6   7   8   9  10  11
 1. Belgium        -  88  89  98  83  87  87  85  92  96  89
 2. England       88   -  88  86  86  91  82  98  92  85  96
 3. Finland       89  88   -  84  88  84  86  85  91  85  89
 4. France        98  86  84   -  80  84  85  83  91  94  86
 5. Germany       83  86  88  80   -  84  81  87  89  81  88
 6. Israel        87  91  84  84  84   -  88  90  85  83  90
 7. Poland        87  82  86  85  81  88   -  82  82  82  86
 8. Scotland      85  98  85  83  87  90  82   -  88  84  94
 9. Sweden        92  92  91  91  89  85  82  88   -  89  91
10. Switzerland   96  85  85  94  81  83  82  84  89   -  85
11. U.S.A.        89  96  89  86  88  90  86  94  91  85   -

* Average of the correlations for boys and for girls.
TABLE 15
Correlations of Item Difficulties Between Countries
Geography Test* (decimal points omitted)

                   1   2   3   4   5   6   7   8   9  10  11
 1. Belgium        -  63  69  93  67  62  56  70  67  90  68
 2. England       63   -  67  61  74  54  26  95  82  60  89
 3. Finland       69  67   -  67  84  77  55  74  82  77  72
 4. France        93  61  67   -  64  55  54  67  65  86  70
 5. Germany       67  74  84  64   -  84  54  77  80  76  74
 6. Israel        62  54  77  55  84   -  66  64  71  62  58
 7. Poland        56  26  55  54  54  66   -  35  40  49  31
 8. Scotland      70  95  74  67  77  64  35   -  85  65  88
 9. Sweden        67  82  82  65  80  71  40  85   -  70  84
10. Switzerland   90  60  77  86  76  62  49  65  70   -  66
11. U.S.A.        68  89  72  70  74  58  31  88  84  66   -

* Average of the correlations for boys and for girls.
TABLE 16
Correlations of Item Difficulties Between Countries*
Science Test
(decimal points omitted)

                    1    2    3    4    5    6    7    8    9   10   11
 1. Belgium         -   73   73   94   89   84   46   68   70   90   72
 2. England        73    -   70   68   78   84   37   95   72   75   83
 3. Finland        73   70    -   70   78   81   64   65   79   77   75
 4. France         94   68   70    -   84   79   39   63   67   84   63
 5. Germany        89   78   78   84    -   88   48   72   78   88   80
 6. Israel         84   84   81   79   88    -   51   82   88   79   84
 7. Poland         46   37   84   39   48   51    -   37   57   47   53
 8. Scotland       68   95   65   63   72   82   37    -   71   68   80
 9. Sweden         70   72   79   67   78   88   57   71    -   84   79
10. Switzerland    90   75   77   84   88   79   47   68   64    -   78
11. U.S.A.         72   83   75   83   80   84   53   80   79   78    -

* Average of correlations for boys and girls.
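Each entry in Tables 13-16 summarises, over all the items of one test, how closely the relative difficulty of the items in one country follows their relative difficulty in another. The text does not say which difficulty index or which coefficient was used; the sketch below assumes a Pearson product-moment correlation computed over per-item percent-correct values, and the data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical percent-correct values for the same six items in two countries.
country_a = [82, 65, 47, 30, 71, 55]
country_b = [78, 60, 52, 25, 69, 58]

r = pearson(country_a, country_b)
# The tables print correlations with the decimal point omitted (0.97 -> 97).
print(round(100 * r))
```

A high value means the two countries find the same items hard and the same items easy, whatever their overall levels of performance.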
The correlations in Tables 13-16 appear to show a certain amount of clustering. In order to bring
out the structure more clearly, a factor analysis of the correlation tables was carried out. The
rotated factor loadings (by varimax rotation) are shown in Tables 17-20. Countries have been
rearranged in the tables to bring out the clusters most clearly.
TABLE 17
Mathematics Test
Loadings of Rotated Factors
(decimal points omitted)

                          Factor
                     1     2     3     4     5
Belgium             96   -18    17    07   -11
France              90   -09    27    00   -14
Switzerland         98   -10    12    04   -01
Poland              80   -38    06    02    10
Finland             97    08    12   -11    06
Israel              95    05    08   -21    02
Germany             98    00   -08   -02    07
Sweden              96    03   -09    17    06
England             94    08   -26   -02   -12
Scotland            94    09   -29    04   -03
U.S.A.              91    37   -05    06    11
% of variance     87.8   3.2   2.8   0.9   0.7
TABLE 18
Reading Test
Loadings of Rotated Factors
(decimal points omitted)

                          Factor
                     1     2     3     4     5
Belgium             96   -25    05   -01    04
France              94   -31   -02   -01    08
Switzerland         93   -26   -03    05   -10
Poland              90    00    10   -26    05
Israel              93    09   -10   -19   -02
Germany             90    17    12    01   -12
Finland             93    07    23    00    05
Sweden              95   -01    11    16    04
U.S.A.              96    13   -06    05    09
England             96    16   -14    12    06
Scotland            95    18   -22    05   -07
% of variance     87.9   3.1   1.6   1.4   0.5
TABLE 19
Geography Test
Loadings of Rotated Factors
(decimal points omitted)

                          Factor
                     1     2     3     4     5
Belgium             81   -11   -53   -04   -05
France              79   -06   -53    02    09
Switzerland         83   -05   -38    27   -13
Poland              50   -60   -17   -03    04
Israel              78   -48    16    05   -06
Germany             89   -20    14    12   -18
Finland             88   -19    08    22    06
Scotland            92    16    10   -28   -02
England             89    29    13   -25   -03
U.S.A.              89    25    04   -08    15
% of variance     62.0   7.8   7.4   2.6   0.8
TABLE 20
Science Test
Loadings of Rotated Factors
(decimal points omitted)

                          Factor
                     1     2     3     4     5
Belgium             92    25   -13   -14   -14
France              87    29   -17   -09   -18
Switzerland         91    11    00   -32    06
Germany             94    13   -06   -03    11
Israel              95   -02   -02    21    04
Poland              57    07    48    05   -04
Finland             86    06    33    00    06
Sweden              86   -01    16    37    04
U.S.A.              87   -21    12    05    12
England             89   -40   -12   -06    01
Scotland            85   -44   -09    04   -09
% of variance     75.3   5.3   4.1   2.9   0.9
The most striking feature of Tables 17-20 is the large proportion of variance accounted for by
the first factor. This can be thought of as a general factor of difficulty determined by the content
of the item and independent of country. Loadings on later factors, corresponding to sub-groups
of countries, are quite small and account for only a minor fraction of the variance. In most
instances the later factors seem to be bi-polar, discriminating one sub-group of countries from
another.
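How the "% of variance" rows relate to the loadings can be shown with a minimal sketch. It assumes the conventional definition (the sum of a factor's squared loadings divided by the number of countries) and takes the loadings in hundredths, as they are printed. With the Table 17 mathematics loadings it gives 87.8 for Factor 1 and 3.2 for Factor 2, as in the table.

```python
def percent_of_variance(loadings_x100):
    """Percentage of total variance accounted for by one factor.

    loadings_x100: the factor's loadings as printed (96 means 0.96).
    Assumption: sum of squared loadings / number of variables (countries).
    """
    loadings = [v / 100 for v in loadings_x100]
    return 100 * sum(l * l for l in loadings) / len(loadings)


# Factors 1 and 2 of the mathematics test (Table 17), eleven countries.
math_factor1 = [96, 90, 98, 80, 97, 95, 98, 96, 94, 94, 91]
math_factor2 = [-18, -9, -10, -38, 8, 5, 0, 3, 8, 9, 37]
print(round(percent_of_variance(math_factor1), 1))  # 87.8
print(round(percent_of_variance(math_factor2), 1))  # 3.2
```

Because every loading is squared, a factor can only carry much variance if several countries load on it heavily, which is why the bi-polar later factors stay small.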
In mathematics, Factor 2 primarily involves the difference between Poland and the USA, while
Factor 3 involves the difference between the French-speaking and the English-speaking countries.
The other factors are of no consequence.
The only factor beyond the first that seems to amount to anything in the reading test is Factor 2,
which again discriminates French-speaking from English-speaking countries.
Factors 2, 3 and possibly 4 appear to amount to something in the case of the geography test.
Factor 2 contrasts a cluster composed of Israel, Poland, Germany and Finland with the
English-speaking countries. Factor 3 groups together the French-speaking countries. Factor 4
contrasts England and Scotland with Switzerland and Finland.
On the science test, Factor 2 seems once again to be the French-speaking versus English-speaking
factor. Factor 3 links together Poland and Finland, while Factor 4 separates Switzerland from Sweden.
The language groupings represented in these factors after the first make sense, at least in that
the test remains completely uniform for all countries using the same language. It is also quite
possible that educational patterns are more alike within language groups. A knowledgeable
person may see some rationale underlying the other groupings and polarities that is not apparent
to the author.
Item difficulty:
Examination of item content suggests certain hypotheses concerning groups of items that can
be tested by examining item difficulty in different countries. In this section, several of these
hypotheses will be stated and examined. The basic data in terms of which these hypotheses are
examined are item difficulty deviation values. The procedures for deriving these values are
stated below.
1. The percent right on each item in each country was first transformed into a normal deviate,
using tables of the normal curve.
2. The average scaled value was obtained for each item over all countries, and the scaled value
in any one country was expressed as a deviation from this. This procedure brought all items to
a common base line, so that the deviation value for a given country had comparable meaning
from item to item.
3. The average deviation value over all items on a single test was computed for each country,
and this average deviation value was subtracted from the single item deviation values. This
procedure eliminated differences in average level of performance between countries, and left
a residual deviation value, expressed in standard-score units, that indicated how much harder or
easier that item was for that country than would be expected on the basis of the average
difficulty of the item and the average performance level in the country.
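The three steps can be sketched as below. The sketch assumes the normal deviate is taken so that a larger value means more pupils answered correctly (an easier item); with that convention a positive residual means the item was easier for that country than expected. The data are hypothetical, and Python's `statistics.NormalDist.inv_cdf` stands in for the printed tables of the normal curve.

```python
from statistics import NormalDist


def residual_deviations(pct_right):
    """Residual difficulty values following steps 1-3.

    pct_right[c][i]: percent of pupils in country c answering item i correctly.
    Every value must lie strictly between 0 and 100.
    """
    nd = NormalDist()
    # Step 1: percent right -> normal deviate (larger = easier item).
    z = [[nd.inv_cdf(p / 100) for p in row] for row in pct_right]
    n_c, n_i = len(z), len(z[0])
    # Step 2: deviation from the item's average over all countries.
    item_avg = [sum(z[c][i] for c in range(n_c)) / n_c for i in range(n_i)]
    dev = [[z[c][i] - item_avg[i] for i in range(n_i)] for c in range(n_c)]
    # Step 3: remove each country's average deviation over all items.
    return [[d - sum(row) / n_i for d in row] for row in dev]


# Hypothetical data: 3 countries x 4 items (percent right).
pct = [
    [80, 62, 45, 30],
    [75, 70, 40, 35],
    [90, 55, 60, 20],
]
for row in residual_deviations(pct):
    print([round(100 * v) for v in row])  # hundredths of a standard deviation
```

After step 2 every item's residuals sum to zero over countries, and after step 3 every country's residuals sum to zero over items, so only the item-by-country interaction is left.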
Hypothesis 1
The mathematics test includes four items dealing directly with the manipulation or understanding
of decimal fractions and three dealing with common fractions. The residual difficulty indices for
these were averaged over the items and over boys and girls, and are shown below for eleven
countries. We show the average residual for decimal fraction items, the average residual for
common fraction items, and the difference between them.
Average residuals (in hundredths of a standard deviation)

               Decimals   Fractions   Dec.-Fract.
Belgium            18        -10           28
England           -46         10          -56
Finland            13        -10           23
France             30        -71          101
Germany            01         14          -13
Israel            -11         13          -24
Poland             08        -07           15
Scotland          -42         19          -61
Sweden             18        -03           21
Switzerland        10         03           07
U.S.A.             01         44          -43
From the tabulation, we see that for Belgian children the difference in residuals favors the
decimal items to the extent of about 28/100 of a standard deviation on our normal deviate scale,
on the average. By contrast, for English children, the fraction items are relatively easier by
56/100 of a standard deviation. In general, our hypothesis is supported, because the large
differences favoring fractions are all for the English-speaking countries, and all of the
French-speaking countries show a difference favoring decimals. However, the difference between
France on the one hand and Belgium and Switzerland on the other is also quite striking. French
children of this age appear to be especially weak on items dealing with fractions.
The major differences which we find in these groups of items are explicable in terms of the
systems of weights and measures in the countries involved, and a corresponding curricular
emphasis. The English, Scots and Americans have so many measures that go by 3s, 4s, or 12s
that they must spend instructional time on denominate numbers and the fractions that go with
them. The continental countries, relying almost entirely on a decimal metric system, can
concentrate on decimal fractions and give other types of fractions only limited emphasis. The
results suggest that this is what has taken place.
Hypothesis 2
It was possible to identify in the geography test five items dealing with facts about specific
places and five others dealing with concepts of latitude, longitude, and time. Average residual
values were computed for each of these sets, and are shown below.
               Specific    Latitude &
               Places      Longitude
Belgium          -16            3
England          -31          -18
Finland            9           10
France            -5            6
Germany           -4           -5
Israel            50           -9
Poland            82           54
Scotland         -11          -10
Sweden           -13          -17
Switzerland      -12          -11
U.S.A.           -45           -7
In general, the hypothesis is supported by the data. The largest average residuals tend to relate
to specific place geography. Israel and Poland perform notably well on these items, and England
and the United States relatively poorly. The items on latitude and longitude show generally
smaller residuals, only Poland performing somewhat better on these items than would be expected
in the light of her performance on the total test. The results suggest fairly wide national
differences in emphasis on the teaching of specific geographic facts; at least, such facts are
learned to different degrees.
Hypothesis 3
Reading test items will be easier for pupils of the country and language from which the passage
and items originally come than for other countries, and especially for those speaking a different
language.
The reading test was composed of five passages, two of which had originally been in French
(one from Belgium and one from France), two in German, and one in English (from the United
States). It seems plausible, at least, that these might be easier in the language in which they
had been written and for the national culture for which they had been designed. Therefore, the
residuals were computed separately for each passage for each of the English-speaking,
French-speaking and German-speaking countries. (In Belgium and Switzerland testing was limited
to French-speaking parts of the country.) Results are shown below.
               Passage 1   Passage 2   Passage 3   Passage 4   Passage 5
               (Belgium)   (France)    (Germany)   (USA)       (Germany)
Belgium            00          07          05         -07         -06
England           -06         -04          01         -03          13
France             12          13         -07         -10         -09
Germany           -08         -19          04          10          12
Scotland          -11         -13          04          07          11
Switzerland        35         -09          04         -15         -10
U.S.A.            -13         -06          05          11          02
Examination of the table shows that the hypothesis is in general supported. More generally, the
passages of French-language origin seem to be slightly easier for the French-speaking countries,
and the passages of English or German origin for the English- and German-speaking countries.
Thus, the precaution of choosing original passages from different language sources appears
to have been a sound one. However, even though the differences appear in a fairly consistent
pattern, they are generally very small. The task difficulty seems to transcend language in
considerable measure. Thus, once again, the universality of the reading task is affirmed.
Hypothesis 4
National groups show consistent differences in willingness to express certainty about a
conclusion.
The second part of the science test consists of five statements, for each of which the pupils must
pick one of five choices: Definitely True, Probably True, Impossible to Determine, Probably
False, and Certainly False. The keying of each item was based on the pooled judgment of several
faculty members at Teachers College, Columbia University, New York, where the test was constructed.
Our current interest is in the nature of the erroneous answers. An examinee can be wrong in any
one of these ways:
(1) An examinee can be too sure. That is, he can mark an item definitely true or false when
it is keyed only probably true or false, or he can choose one of the four other alternatives when
he should mark the item indeterminate.
(2) An examinee can be too cautious. That is, he can mark an item as probably true or false
or as indeterminate when he should have marked it definitely true or false, or he can mark the
item indeterminate when he should have marked some other choice.
(3) He can be grossly in error. That is, he can mark an item on the true side when he should
have marked it on the false side, or vice versa.
For the set of 5 items, there were 20 wrong response options. Of these, 6 were of Type 1, 6 of
Type 2, and 8 of Type 3. We have examined the results for the different countries to see what
proportion of the choices fell into each of these types of error each time the opportunity offered.
That is, we have divided the total number of errors of a given type by the total number of
opportunities to make that category of error. We have also determined the ratio of too-sure
errors to too-cautious errors, providing an index of readiness or reluctance to jump to
conclusions. The results are shown by sex for 10 national groups*.
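The three error types and the index can be made concrete with a small sketch. It codes the five choices as +2, +1, 0, -1, -2 and classifies a wrong answer under one reading of the definitions above: a choice on the opposite side of the key counts as a gross error even if it is only "probably", and any committed answer to an indeterminate item counts as too sure. The actual keying of the five statements is not given in the text. Under this reading, the hypothetical keying used here (two "definite", two "probable", one indeterminate statement) is the only mix that yields 6, 6 and 8 opportunities. The index of sureness is taken as the ratio of the two percentages, which reproduces the Belgian boys' entry (21.9 / 10.0 = 2.19).

```python
# Choices coded by side and strength (hypothetical coding):
# +2 Definitely True, +1 Probably True, 0 Impossible to Determine,
# -1 Probably False, -2 Certainly False.
CHOICES = [2, 1, 0, -1, -2]


def error_type(key, answer):
    """Classify a wrong answer as 'too sure', 'too cautious' or 'gross'."""
    if answer == key:
        raise ValueError("answer matches the key; not an error")
    if key * answer < 0:
        return "gross"          # true side chosen for a false item, or vice versa
    if key == 0 or abs(answer) > abs(key):
        return "too sure"       # committed answer to an indeterminate item,
                                # or 'definitely' where only 'probably' is keyed
    return "too cautious"       # weaker answer on the keyed side, or indeterminate


def opportunities(keys):
    """Count, over all wrong options of all items, the chances for each error."""
    counts = {"too sure": 0, "too cautious": 0, "gross": 0}
    for key in keys:
        for answer in CHOICES:
            if answer != key:
                counts[error_type(key, answer)] += 1
    return counts


def sureness_index(pct_too_sure, pct_too_cautious):
    """Ratio of the too-sure error rate to the too-cautious error rate."""
    return pct_too_sure / pct_too_cautious


# Hypothetical keying: two 'definite', two 'probable', one indeterminate item.
print(opportunities([2, -2, 1, -1, 0]))
# {'too sure': 6, 'too cautious': 6, 'gross': 8}
print(round(sureness_index(21.9, 10.0), 2))  # Belgian boys -> 2.19
```

An index above 1 marks a group more ready to commit itself than to hold back; below 1, the reverse.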
                       % Too    % Too       % Gross   Index of
                       Sure     Cautious    Error     Sureness
Belgium       Boys     21.9     10.0        10.8      2.19
              Girls    23.7     14.2        10.6      1.68
England       Boys     13.8     15.0        11.3      0.92
              Girls    15.8     17.3        11.6      0.92
Finland       Boys     13.9     17.2        14.0      0.81
              Girls    15.6     20.5        14.3      0.76
France        Boys     22.3     11.1         9.5      2.00
              Girls    26.7     12.2         7.3      2.19
Germany       Boys     24.0      9.8         7.1      2.46
              Girls    21.1     12.0         9.7      1.75
Israel        Boys     16.9     14.7         9.8      1.15
              Girls    16.3     16.8         9.8      0.97
Poland        Boys     19.9     19.7        14.3      1.01
              Girls    20.6     22.4        14.1      0.92
Scotland      Boys     14.6     14.2        10.2      1.03
              Girls    14.6     18.4        10.9      0.80
Sweden        Boys     14.5     22.6         6.7      0.64
              Girls    13.4     25.9         7.5      0.52
U.S.A.        Boys     14.0     18.7        12.7      0.75
              Girls    12.7     21.5        12.6      0.59
* Frequencies of choice of the separate statements were not available for Switzerland and
Yugoslavia.
The preceding pages have shown some of the kinds of comparisons that are possible when academic
achievement is examined at the same time in a number of countries with the same set of
tests. These results are of some value in themselves, for the international differences and
similarities that were found and described. They are certainly also of interest as exhibiting a
model of international cooperation and of empirical comparative education that may be further
developed in the future.
Fernand Hotyat
BELGIAN INTERPRETATIONS FROM INTERNATIONAL AND NATIONAL DATA
Following Professor Thorndike's overall view of the results, we publish two accounts in which the
authors have used the data mainly for the interpretation of national performances. In the first of
these, M. Hotyat concerns himself particularly with some striking analogies between patterns of
achievement in French-speaking countries, and contrasts these with patterns of achievement on
the same groups of test items in English-speaking countries. M. Hotyat also uses the international
results to throw light on certain aspects of the Belgian situation, and in the second half of his
article analyses the Belgian results according to school type, sex, regularity of pupils' promotion
through the grades, and nationality of parents. In the course of his paper he suggests a number
of interesting methods which can be used to interpret data of this kind.
The Belgian share in the research was carried out by a team from the Centre de Travaux de
l'Institut Supérieur de Pédagogie du Hainaut (Mme. Delepine, MM. Hotyat, Lowyck, Rousseaux,
Manouvrier).
In accordance with the decision taken by the international research group, the analyses made by
the Belgian participants aim at profiles of achievement which provide fruitful opportunities for
interpretation. In order to do this, we have constructed profiles in such a manner that in each
country the average of the scores on the five tests is reduced to zero. The following statistical
approach was adopted with this in mind:
1. We calculated for each test the mean and standard deviation of all the test scores, and
expressed the mean for each country as a standard score*.
2. We established for each country the mean of its 5 standard scores.
3. We subtracted the number thus obtained from each of the 5 scores so that, for each country,
the mean of the marks equals zero.
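Steps 1-3 can be sketched as follows. The pooled means and standard deviations and the country's means are hypothetical, and the function name is illustrative only.

```python
def profile(country_means, pooled):
    """Centred achievement profile of one country, in hundredths of an S.D.

    country_means: {test: the country's mean score}
    pooled:        {test: (mean, standard deviation) of all pupils' scores}
    """
    # Step 1: express the country's mean on each test as a standard score.
    z = {t: 100 * (country_means[t] - pooled[t][0]) / pooled[t][1]
         for t in country_means}
    # Step 2: the mean of the country's standard scores.
    centre = sum(z.values()) / len(z)
    # Step 3: subtract it, so that the profile averages zero.
    return {t: v - centre for t, v in z.items()}


# Hypothetical figures for one country.
pooled = {"Non-verbal": (30.0, 11.0), "Maths": (15.0, 4.5),
          "Reading": (17.0, 4.8), "Geography": (14.0, 4.2),
          "Science": (9.0, 3.0)}
country = {"Non-verbal": 31.0, "Maths": 16.8, "Reading": 16.0,
           "Geography": 14.2, "Science": 8.4}
for test, value in profile(country, pooled).items():
    print(f"{test:<12}{value:+.0f}")
```

Centring removes the country's overall level, so two countries of very different average attainment can still show the same profile shape.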
Here is an example based on the standard scores from one of the countries:
Non-verbal               +28
Mathematics              -30
Reading Comprehension    +25
Science
Geography                 -8

* The standard scores express the positive or negative value of a test score in hundredths of a
standard deviation from the mean.
TABLE
Non-verbal
France (Lisieux)
Switzerland (Geneva)
Belgium (Hainaut)
1
Reading
Comprehension
Maths
- 20
t31
-6
+17
t9
1-21
14
+9
t8
+45
- 21
FIGURE
Non-verbal
0.4
Geography
Maths
Science
- 20
-42
-17
-12
[FIGURE 1: Profiles for Lisieux, Geneva and Hainaut across the five tests]
These three profiles show some interesting analogies, in particular a peak in mathematics and
troughs in reading comprehension and in science which are common to all three.
A detailed analysis of the profiles obtainable from the scores of each of the twelve participating
countries would be beyond the scope of this article, and we shall therefore limit ourselves in
Table 2 and Figure 2 to comparing the profiles of the means of the three French-speaking
samples
with those obtained similarly from the three English-speaking
groups in England, Scotland and
the USA.
The way in which the two profiles appear to complement each other is striking. (It should be
remembered that the two language groups which are here being compared represent only half
of the national samples which took part in the research.)
TABLE 2
                           Non-verbal   Maths   Reading Comprehension   Geography   Science
Fr.-speaking countries         -1        +32             -8                 +3         -25
Engl.-speaking countries      +23        -34            +15                -21         +17
[FIGURE 2: Profiles of the results of the French-speaking and English-speaking samples (non-verbal, maths, reading comprehension, geography, science)]
Of course, even if the research had related to true national samples, it would still be rash to
draw any conclusions
from these findings about the quality of the respective school systems;
other factors would have to be considered, such as curricula, time-tables, and the form of the
tests. But we have now developed a method which will enable us to make comparisons
based
on real national samples when subsequent research has been carried out.
Relative difficulty of items
We asked ourselves, too, to what extent the order of difficulty of items in each of the subjects
tested was the same from country to country when their results were compared.
1. In our preliminary analysis we applied Yule's formula* to the percentage of success on each
item obtained by each national group. The correlation coefficients are spread out in the following
way (220 coefficients for 11 samples and 4 subjects): in mathematics, from 0.75 to 1, with a
mean of 0.94; in reading comprehension, from 0.60 to 1, with a mean of 0.89; in geography, from
0 to 0.92, with a mean of 0.68; in science, from 0.05 to 0.95, with a mean of 0.54.
Assuming that the order of difficulties
within each test were equivalent, these data would mean
that there existed a closer relationship
between the way in which mathematics was taught in
the various countries than there was in the teaching of geography and science.
2. The table of percentages of items correctly answered enables us to make comparisons which
are of definite educational interest. Let us take, for example, on the one hand the mean
percentages for three items involving fractions, and on the other for four items involving decimal
numbers, and compare the results obtained in Belgium with those from the Anglo-Saxon countries,
where the system of weights and measures requires early and intensive teaching of fractions.
The percentages are set out in Table 3.
* If a = the number of items for which the results are higher than the mean in the two samples
E1 and E2; if b = the number of items superior to the mean in E1 and inferior in E2; if c = the
number of items inferior to the mean in E1 and superior in E2; and if d = the number of items
inferior to the mean in both samples, we have
$$Q = \frac{ad - bc}{ad + bc}$$
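The footnote's coefficient can be sketched as below. The item percentages are hypothetical, and counting an item that lies exactly on a sample mean as "inferior" is our assumption, since the footnote does not cover that case.

```python
def yule_q(a, b, c, d):
    """Yule's Q = (ad - bc) / (ad + bc) for the 2x2 table in the footnote."""
    denom = a * d + b * c
    if denom == 0:
        raise ValueError("Q is undefined when ad + bc = 0")
    return (a * d - b * c) / denom


def item_counts(p1, p2):
    """a, b, c, d from per-item percentages of success in samples E1 and E2.

    Assumption: an item exactly at a sample's mean is counted as 'inferior'.
    """
    m1, m2 = sum(p1) / len(p1), sum(p2) / len(p2)
    pairs = [(x > m1, y > m2) for x, y in zip(p1, p2)]
    a = pairs.count((True, True))     # above the mean in both samples
    b = pairs.count((True, False))    # above in E1, below in E2
    c = pairs.count((False, True))    # below in E1, above in E2
    d = pairs.count((False, False))   # below the mean in both samples
    return a, b, c, d


# Hypothetical percentages of success on eight items in two national groups.
e1 = [85, 72, 64, 58, 49, 40, 33, 20]
e2 = [80, 75, 50, 61, 52, 38, 30, 25]
print(yule_q(*item_counts(e1, e2)))  # 0.8
```

Q uses only whether each item falls above or below the national mean, so it is cruder than a product-moment correlation; this may help explain the wide ranges reported for geography and science.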
TABLE 3
Belgium
Fractions (mean %)
Decimal numbers (mean %)
Anglo-Saxon
countries
76
61
78.5
75.5
All other things being equal, it would appear that pupils in the Anglo-Saxon countries are at a
disadvantage because of the need to start learning fractions at an early age.
3. By converting the percentages of items passed into standard deviations from various national
means, we are able to establish national profiles of the relative difficulty of items in each of
the tests, leaving out of account the absolute levels of national achievement.
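One reading of this conversion, sketched below, expresses each item's percentage of success as a deviation from that country's mean percentage, in units of the country's standard deviation across items. The text does not spell out the exact computation, so the use of a population standard deviation here is an assumption, and the percentages are hypothetical.

```python
from statistics import mean, pstdev


def item_profile(pct_passed):
    """Item percentages -> deviations from the national mean, in S.D. units."""
    m, sd = mean(pct_passed), pstdev(pct_passed)
    return [(p - m) / sd for p in pct_passed]


# Hypothetical percentages of success on five items in one country.
sample_country = [76, 61, 84, 47, 70]
print([round(v, 2) for v in item_profile(sample_country)])
```

Each country's profile then has mean 0 and standard deviation 1, so profiles from strong and weak countries can be laid side by side.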
These results, quantified in this way, are more precise and more flexible than those provided by
the ranking of items according to the degree of success achieved on them, and they offer very
interesting possibilities of analysis.
Thus, a comparison of these indices for a particular country enables us to examine whether the
order of results has a close correspondence
with the hierarchy of aims set up by the authors of
school curricula there, and, if this is not so, to study the teaching methods which are being
employed with a view to making whatever improvements seem desirable.
Table 4, for example, shows the standard deviations relating to some of the mathematical items,
obtained from the Belgian sample's scores:
TABLE 4
Boys
Girls
- 0.145
+0.126
+0.25
- 0.21
- 0.28
-0.16
-to.02
+0.355
- 0.32
- 0.20
12 items of information
Interpretation of maps (16 items)
Relating facts to generalised statements
-1.2
-2.1
+15
Country
+19
-11.8
-11.3
Taking into account the age of the pupils who are being tested, do these relative results
correspond to the hierarchy of aims which the education authorities assign to geography teaching?
If not, there are indications that we should study the way in which the curriculum, teaching
methods and procedures have been conceived in a country where, it seems, the results achieved
reveal a more satisfactory balance.
Results according to sex
If we first convert the scores of the boys to a standard score scale with a mean of 0, here for
purposes of comparison is the profile of the mean scores of the group of Belgian girls given as
standard scores:
Non-verbal               -0.115
Maths                    -0.23
Reading Comprehension    +0.015
Geography                -0.24
Science                  -0.54
The table of frequencies of mean plus and minus scores for girls from eleven countries
confirms this profile when compared with the boys' results.
Non-verbal
Belgium
England
Finland
France
Germany
Israel
Poland
Scotland
Sweden
Switzerland
U.S.A.
Total
Maths
TABLE 6
Reading
Comprehension
Geography
Science
+
+
+
8/11
+
+
+
+
9/11
5/11
9/11
11/11
(Table 6)
No. of negative
results
5
4
5
3
5
5
5
3
2
4
1
42/55
Over the whole range of subjects tested we are able to conclude that, with the exception of
reading comprehension, the girls' results are clearly inferior to those of the boys, and the gap is
particularly marked in science. Various hypotheses could be advanced to explain this situation,
which seems all the more surprising considering that common programmes of instruction are
followed by both boys and girls in most of the countries concerned. Is the relative weakness of
the girls' results to be explained by educational factors - for example, teaching which is biased
towards literary subjects - or rather by the way in which the tests themselves have been
conceived, so that they call especially for the types of intellectual functioning which come more
naturally to boys? Since the profiles of scores vary very greatly from country to country, we are
faced with the hypothesis (which needs to be verified experimentally) that a combination of social
and educational factors may lie at the bottom of these differences*.
The most striking feature is the general inferiority of the girls' average scores in science. It
would be very valuable if research could be done on this problem. The question has real social
significance at a time when more room is being found for the sciences in educational programmes,
and when the technical aspects of life require that schoolchildren receive a basic training in which
the experimental sciences play an important part.
Correlations between different types of test
The non-verbal test which we used contains three types of test item requiring the following kinds
of reasoning:
- choosing a fourth figure which has the same relationship to the third as the first two have to
each other;
- extending a series of numbers or of letters in accordance with a pre-set pattern;
- picking out from a group the odd figure which does not conform to the principle governing
the others.
Comparisons have been made of the correlations between this test and those parts of the
mathematics and geography tests which depend on reasoning rather than information for an
answer. The correlations obtained (Bravais-Pearson formula) are as follows:
Mathematics:
TABLE 7
                                           Maths. II                  Maths. III
                                           (arithmetical reasoning)   (geometrical reasoning)
Maths. I (information and calculations)       0.72                       0.81
Maths. II (arithmetical reasoning)             -                         0.72

... the intellectual activity involved depends on a whole ...

The dispersion of test scores
We have taken as the coefficient of variation the ratio of the standard deviation to the mean.
Table 8 shows these coefficients in the form of percentages for each of the tests and
each of the national samples.
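As a check on the definition, the sketch below applies it to the mean and standard deviation of the mathematics scores in the two types of Belgian school reported in Table 12; that table prints the coefficients as whole numbers (28 and 19).

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return 100 * sd / mean


# Mathematics test, Belgian sample (S.D. and mean from Table 12).
print(round(coefficient_of_variation(4.04, 14.4), 1))  # vocational: 28.1
print(round(coefficient_of_variation(3.59, 18.4), 1))  # general secondary: 19.5
```

Dividing by the mean lets tests with very different raw-score scales be compared for relative spread, which is what Table 8 requires.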
* This hypothesis has already been found to have some foundation in experimental studies in the
field of arithmetic. Thus Toivo Vahervuo studied the performance of 30 classes in the 4th school
year in Helsinki, all using the same textbook, and discovered that the results of the boys' classes
were superior to those of the girls' classes, and significant at the 2% level; but he also observed
a slight superiority of the girls in mixed classes, although not at a significant level.
TABLE 8
A B C D E F G H I J K
Countries
Means
Tests
Non-verbal
26
30
36
31
41
37
48
Mathematics
Reading comprehension
Geography
Science
19
19
21
21
29
24
26
20
27
24
26
24
30
29
36
40
46
28.9
26
26
28
26
28
24.1
24
20
24
23
28
30
28
33
35
33
36
28
33
33
29
40
36
34
36
36
35
36
40
35.2
Means
23.7
23.7
26.5
27.5
28.2
28.5
30
31
33.5
33.7
37.5
43
35
34
36
36.9
INTERPRETATIONS OF DIFFERENCES IN PERFORMANCE WITHIN THE BELGIAN SAMPLE
1. General situation
TABLE 9
          Non-verbal   Maths   Reading Comprehension   Geography   Science
Total        +10        +45            -27                -15        -25
Boys         +25        +58            -23                 +3         +2
Girls         -5        +32            -31                -34        -53

... high in mathematics and particularly ...
Going beyond this, we can compare the scores on the various sections of the tests. In
mathematics, the Belgian scores, which are barely satisfactory for arithmetical calculations, are
particularly high for items which require reasoning and the use of concepts. At the other
extreme, in reading comprehension they are especially low for literary, historical and economic
texts. One wonders whether our present teaching methods demand sufficient individual
participation from pupils when texts are being studied, especially in literature and history. It
would be worthwhile to study this problem, since silent reading - which is true reading according
to our official primary school syllabus - plays a highly important cultural role.
In geography, Belgian scores are average on questions which involve the direct use of verbal
information, but they are rather weak on items calling for the interpretation of maps.
An evaluation of the results in science seems inappropriate,
since curricula in this subject differ
too much to permit valid comparisons to be made between the mean scores of different countries.
2. Differences according to types of school
of school
Non-verbal
A. Vocational
schools - boys
B. Vocational
schools - girls
C. General secondary
schools-boys
D. General secondary
schools - girls
10
Reading
Comprehension
Maths
Science
- 26
-32
-44
- 24
-44
-55
- 34
-55
-48
+52
+55
+52
+55
+57
+24
+34
+11
+22
-6
FIGURE
Non-verbal
C. Mean-general,
boys
Geography
Maths
-3
Reading
Compr.
Geography
Science
D. Mean-general,
girls
Mean
of total
sample
Mean
total
A. Mean-vocational,
boys
B. Mean-vocational,
girls
-1
of
sample
(I
a. Assuming that scores on the non-verbal test can be taken as a valid criterion of the pupils'
capabilities, these data lead us to the following conclusions:
- in the sample of boys in vocational schools, the mean performance is rather low in mathematics,
very low in reading comprehension, but high in science, when compared with the result of the
non-verbal test;
- among girls in vocational schools the mean is higher than might have been expected in reading
comprehension, but particularly low in mathematics and geography;
- among boys in general secondary schools, the mean is high in all subjects;
- among girls in general secondary schools, the mean is high in mathematics, a little low in
reading comprehension and very low in science.
b. Both for boys and for girls the means in general secondary schools are superior to those for
either sex in vocational schools.
If the scores for both sexes are combined, the differences are, respectively: 71% of a standard
deviation on the non-verbal test, 93% in mathematics, 72% in reading comprehension, 81% in
geography and 57% in science.
It would, of course, be quite erroneous to judge the relative merits of these two types of schools
merely by comparing these results. In the first place, the study covered only subjects to which
more importance is attached in general secondary schools than in vocational schools; in
addition, the gap between mean scores achieved on the non-verbal test entitles us to assume
that the differences already existed at the time the pupils started their post-primary education*.
The means give us only a rough idea of the differences, but the following table (Table 11) of
quartile distributions provides us with a more differentiated picture. (These distributions have
been obtained by finding the score point at the quartiles for the total group and then calculating
the percentage falling within the different quartile ranges in the general and vocational schools
separately.)
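The procedure in the parenthesis can be sketched as follows: quartile cut points are taken from the pooled total group, and each type of school is then tabulated against them. The scores are hypothetical, `statistics.quantiles` (Python 3.8+) stands in for whatever interpolation rule was used originally, and placing a score that equals a cut point in the higher range is an assumption.

```python
from statistics import quantiles


def quartile_distribution(groups):
    """Percentage of each group's pupils in each quartile range of the total group.

    groups: {group name: list of scores}. Cut points come from all groups pooled.
    """
    pooled = [s for scores in groups.values() for s in scores]
    q1, q2, q3 = quantiles(pooled, n=4)

    def band(score):
        if score < q1:
            return 0  # lowest quartile
        if score < q2:
            return 1  # lower intermediate
        if score < q3:
            return 2  # upper intermediate
        return 3      # highest quartile

    result = {}
    for name, scores in groups.items():
        counts = [0, 0, 0, 0]
        for s in scores:
            counts[band(s)] += 1
        result[name] = [100 * n / len(scores) for n in counts]
    return result


# Hypothetical mathematics scores.
groups = {
    "General": [22, 19, 18, 17, 21, 15, 20, 16],
    "Vocational": [14, 9, 17, 12, 20, 11, 15, 8],
}
for name, pcts in quartile_distribution(groups).items():
    print(name, pcts)
```

Because the cut points come from the combined group, a school type with 25% of its pupils in every range would exactly match the total group; departures from 25% show where that type of school is over- or under-represented.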
TABLE 11
Reading
Comprehension
Maths
Gfl.
Lowest
VOC.
Gen.
Geography
VOC.
Gen.
Science
voc.
GC?.
voc
Quartile
Lower Intermediate
Quartile
l.5
pq
:;::
pj
ii?
pq
Upper Intermediate
Quartile
Highest
Quartile
* These are the mean percentages of success on the arithmetic test taken during entrance
examinations in each type of school, based on samples of N = 150+.

     Boys: General        Girls: General       Boys: Vocational   Girls: Vocational
     Secondary Schools    Secondary Schools    Schools            Schools
       74.7 %               74.7 %               70 %               62 %
       59                   56.4                 38.6               29.8
       74                   73.1                 54.1               50.6
       74.3                 73.4                 56.9               54.1
       54.5                 50.7                 35.5               25.8
       71.3                 70.5                 58.5               44.7
       64.8                 64                   53.2               37.4

(... d'Arithmétique ..., Publications de l'Institut Supérieur de Pédagogie du Hainaut, 1961.)
The differences
proceed, in short, from an unequal distribution of superior and inferior scorers
between the different types of schools, the superior ones being more numerous in general
secondary schools, and inferior ones in vocational schools.
c. The dispersion of scores is wider in vocational schools, as the following figures for the
mathematics test in the two types of school show:
TABLE 12
                                     Vocational   General Secondary
                                     Schools      Schools
Mean                                   14.4          18.4
S. D.                                   4.04          3.59
Coefficient of variation
(in 100ths of an S. D. from the mean)  28            19
The greater homogeneity of results in general secondary schools results from the fact that they
are very selective, whereas in the large vocational schools, the spreading of pupils over different
parallel courses makes it possible for weaker pupils to stay on in classes leading to leaving
certificates
at lower levels of achievement.
3. Differences according to sex
According to Table 10 and Figure 4, the boys' means are higher than those of the girls in general
secondary as well as in vocational schools, except in vocational schools where reading
comprehension is concerned.
Because of the small number of schools covered by our present enquiry, we are not entitled,
however, to draw any general conclusions from the results obtained. It would be interesting to
carry out research on a larger scale into this problem as it exists in Belgium, in order to determine
the causes of these differences, should they appear significant.
4. Differences according to regularity of promotion
The pupils included in the sample were distributed in grades as shown in Table 13:
TABLE 13
                       In grade appropriate to age   Repeated one year
Vocational: boys                   62                        78
Vocational: girls                  84                        86
General: boys                     114                        59
General: girls                     97                        53
Vocational schools clearly contain a higher number of 13- and 14-year-old pupils who have repeated a grade, but this repetition has usually already taken place before entry into the vocational schools, as their intake includes numbers of pupils who have already repeated classes at primary school or who have been diverted to technical schools after failure in general secondary schools.
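The point about repeaters can be made concrete as proportions. This sketch takes the Table 13 counts, read as "in grade appropriate to age" and "repeated one year" for each type of school and sex, and computes the percentage of repeaters in each group (roughly 56 and 51 % in the vocational schools against 34 and 35 % in the general schools).

```python
# Share of pupils who have repeated one year, from the Table 13 counts.
table_13 = {
    # (type of school, sex): (in grade appropriate to age, repeated one year)
    ("vocational", "boys"): (62, 78),
    ("vocational", "girls"): (84, 86),
    ("general", "boys"): (114, 59),
    ("general", "girls"): (97, 53),
}

def percent_repeaters(in_grade: int, repeated: int) -> float:
    """Repeaters as a percentage of all pupils in the group."""
    return 100.0 * repeated / (in_grade + repeated)

for (school, sex), (in_grade, repeated) in table_13.items():
    print(f"{school:10} {sex:5} {percent_repeaters(in_grade, repeated):5.1f} %")
```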
In order to study the extent to which repetition of a grade corresponds to lower levels of ability
or educational performance, we present in Table 14 figures for the degree of significance of the
difference between the means for each of the tests and for each type of school.
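The article does not say which significance test lies behind Table 14. As an assumption, the sketch below uses the large-sample critical ratio commonly used at the time: the difference between two independent means divided by the standard error of that difference, judged against the usual two-tailed normal thresholds. The means, S.D.s and group sizes in the usage line are hypothetical.

```python
import math

def critical_ratio(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Difference between two independent means divided by its standard error."""
    se_diff = math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_1 - mean_2) / se_diff

def significance(ratio):
    """Conventional two-tailed thresholds for a large-sample normal test."""
    if abs(ratio) >= 2.58:
        return "significant at 1 % level"
    if abs(ratio) >= 1.96:
        return "significant at 5 % level"
    return "not significant"

# Hypothetical figures: regularly promoted pupils versus repeaters.
cr = critical_ratio(16.0, 4.0, 100, 15.0, 4.0, 100)
print(round(cr, 2), significance(cr))
```

With larger groups the same one-point difference would become significant, which is why the group sizes matter as much as the means.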
TABLE 14
Regularly promoted pupils and repeaters: means and standard deviations on the non-verbal, mathematics, reading comprehension, geography and science tests, by type of school (vocational, general) and sex. [The cell values could not be recovered from the source scan.]
5. Differences according to nationality of parents
                          Means                 Standard deviations
                     Foreign   Belgian        Foreign   Belgian      Difference
                     parents   parents        parents   parents
Non-verbal            31.1      30.2           12.3      11.9        not significant
Mathematics           14.8      15             5.28      4.17        not significant
Reading comprehension 16.7      18.6           4.82      4.62        significant at 5 % level
Geography             13.8      14.1           4.59      3.97        not significant
Science                8.5       9.1           3.11      2.81        not significant
et situation sociale dans la région du Centre et du Borinage
Figures 5 and 6 below illustrate this situation: they present in diagrammatic form the two groups' results in reading comprehension and mathematics in terms of 5-point normalised scales.
FIGURES 5 AND 6. Mathematics and reading comprehension: children with Belgian parents (B) and children with foreign parents (F) on a 5-point normalised scale, with categories bounded at -3/2σ, -1/2σ, +1/2σ and +3/2σ about the mean.
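The 5-point normalised scale divides pupils by their distance from the mean in S.D. units. Assuming, from the axis labels of the figures, category boundaries at ±½ and ±1½ S.D., the sketch below assigns a score to a category and shows the share of a normal distribution expected in each (about 6.7, 24.2, 38.3, 24.2 and 6.7 %). Scores falling exactly on a boundary go to the higher category, which is an arbitrary choice.

```python
from statistics import NormalDist

# Category boundaries of the 5-point scale, in S.D. units from the mean
# (assumed from the figure labels: -3/2, -1/2, +1/2, +3/2).
CUTS = (-1.5, -0.5, 0.5, 1.5)

def five_point(score, mean, sd):
    """Return the category 1 (lowest) to 5 (highest) for a raw score."""
    z = (score - mean) / sd
    return 1 + sum(z >= cut for cut in CUTS)

def expected_shares():
    """Proportion of a normal distribution falling in each of the 5 categories."""
    nd = NormalDist()
    points = [0.0] + [nd.cdf(cut) for cut in CUTS] + [1.0]
    return [upper - lower for lower, upper in zip(points, points[1:])]

print([round(100 * share, 1) for share in expected_shares()])
print(five_point(20.0, mean=15.0, sd=4.0))  # z = 1.25, so category 4
```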
D. A. Pidgeon
A COMPARATIVE STUDY OF THE DISPERSIONS OF TEST SCORES
In the second of the two articles which serve to demonstrate the usefulness of international comparative studies in shedding further light on problems occurring in particular educational systems, Mr. Pidgeon compares the standard deviations on the five tests in the twelve countries and draws conclusions about the effect which streaming (the form of class organisation extensively practised in England and Scotland) has on the dispersion of test scores. The author discusses the different approach to teaching which streaming may encourage and which could account for the phenomenon he has noted.
Introduction
Results
The obvious statistic for measuring the spread of scores on a test in any sample is the standard deviation. Using the raw S.D. for each country on a particular test, comparisons can legitimately be made between countries on that test, provided it can be shown that there is no direct relationship between S.D. and mean. As an indication of how successful the tests were in each country, in no case was the mean score either too high or too low to be seriously influenced by a ceiling or floor effect, and in only one test (science) was the rank order correlation between S.D. and mean positive. Hence the standard deviation has been used for making comparisons in this study. However, since five different tests were used, each with a different number of items, the raw S.D.s for each country have been converted into standard scores so that comparisons could also be made between tests. In Table 1, which gives the relevant figures, a high positive value indicates a standard deviation well above the average for all countries on that test, and a negative value a standard deviation below the average for all countries.
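Two computations in the paragraph above can be sketched in Python: re-expressing each country's raw S.D. on a test as a standard score across countries, and the rank-order (Spearman) correlation between S.D. and mean. The S.D.s and means below are hypothetical, the use of the population S.D. in the standardisation is an assumption, and the simple rank formula assumes no tied values.

```python
from statistics import mean, pstdev

def standard_scores(values):
    """Re-express each value as a deviation from the mean of all the values,
    in units of their standard deviation (population S.D. assumed)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_rho(x, y):
    """Rank-order correlation 1 - 6*sum(d^2) / (n*(n^2 - 1)), without ties."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical raw S.D.s and means for one test in six countries.
raw_sds = [9.8, 12.4, 10.1, 11.0, 8.7, 10.6]
means = [30.2, 27.5, 31.0, 29.4, 32.1, 28.8]

print([round(z, 2) for z in standard_scores(raw_sds)])
print(round(spearman_rho(raw_sds, means), 2))
```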
Table 1
Standard Deviations for Each Country, Expressed in Standard Score

Country        Non-Verbal  Mathematics  Reading  Geography  Science  Average
Belgium           0.43       -0.26      -0.85     -0.19     -1.34    -0.44
England           1.47        2.23       1.44      1.23      2.61     1.80
Finland           0.29       -0.38       0.48     -0.69     -0.49    -0.16
France            0.78       -0.06      -0.55      0.18     -0.87    -0.10
Germany           0.66        0.14       0.84      0.53      0.27     0.49
Israel            0.23       -0.18      -1.16     -0.47     -0.06    -0.33
Poland           -1.74       -1.21      -1.39     -2.37      0.08    -1.33
Scotland          0.20        1.27       1.21      1.21      0.99     0.97
Sweden            0.81       -0.66       0.11      0.76      0.18     0.24
Switzerland      -1.72       -1.68      -1.49     -0.49     -0.77    -1.23
U.S.A.           -0.06        0.14       0.89      1.03      0.23     0.45
Yugoslavia       -1.34        0.65       0.46     -0.72     -0.82    -0.35
FIGURE 1. Standard deviations of test scores, by country
Some comments on Table 1 and Figure 1 are clearly necessary. Firstly, it must be emphasised that no significant relationship exists between the standard deviations and the mean scores obtained in each country. Secondly, the fact that the pupils tested in each country were not strictly random samples must to some extent detract from the significance to be attached to these findings. Samples of schools selected by subjective judgment to be representative are more likely than not to yield smaller standard deviations than random samples, owing to the human tendency to under-represent the very weak. Also, in some instances a restriction of the sample was acknowledged.
The description provided by Switzerland of its sample clearly indicates that it was hardly representative of the whole country, since it was taken exclusively from a prosperous middle-class town. In Israel the sample excluded recent immigrants from under-developed areas; it was also chosen from 8th grade pupils, so that, although the mean age was 13½, it contained some pupils younger than 13 and older than 14. Both these factors clearly affected the dispersion of test scores. In other countries, however, the sample was chosen by methods similar to those employed in England, namely, the testing of all pupils falling within the stated age range attending all types of secondary school in a selected area, the selection of this area being based on other test evidence which suggested that it was reasonably typical of the whole country both as regards mean and spread of test scores. In Scotland, two areas were chosen, a city and a county, but the sample was deficient in children from the professional and skilled worker classes, probably resulting in some restriction of the dispersion of test scores.
With these reservations, the data in Table 1 must be viewed with caution. Nevertheless, there would seem to be fairly clear indications that the dispersion of test scores in England is large compared with that in other countries, thus supporting the previous evidence cited. It would not seem an idle occupation, therefore, to speculate upon the reasons for this.
Discussion
All countries concerned in this study, apart from England and Scotland, employ some variant of what might be called the grade placement system. In such a system, children are assigned to grades initially according to age, but subsequently according to their ability to assimilate successfully the work covered in the grade in which they were the previous year. It is possible for some children to be accelerated, that is, to miss a grade, and for others to be retarded, that is, to repeat a grade. Thus, at any given point in time, say after five years' schooling, while the majority of pupils will be found in Grade 6, some will be in Grade 7, and others in Grade 5 or possibly, having repeated twice, in Grade 4. The numbers of pupils accelerated or retarded will depend upon the limits accepted as constituting a successful pass in the previous grade's work, and this, in turn, upon the standard of work demanded in each grade.
In England and Scotland, however, yearly promotion is primarily based on age, and if numbers necessitate, pupils within one age group will be divided into separate classes. In both countries, it is the general practice in such circumstances to stream these separate classes by ability and attainment, even in the junior school, i.e. between the ages of 7 and 11. Owing to the relatively larger number of small primary schools in Scotland, the proportion of children in streamed classes there is somewhat less.
These descriptions
do not do justice to the differences
that exist in the various countries; they
are probably sufficient, however, for the present purpose, since it is contended that it is not the
difference between the two systems that is important so much as the general aims and beliefs
of the teachers practising within the systems. It is argued that the major objective of the grade
class teacher is to ensure that as many of his pupils as possible complete the work of the grade
successfully.
His class is, however, heterogeneous
with regard to ability and it is not unreasonable to suppose that the brighter children will require much less effort from the teacher
than the duller ones. Hence, while it is possible that the brighter children within such a group will
not be unduly extended, the duller ones will presumably receive every encouragement
to achieve
the grade pass, and the net result will be a tendency for achievement test scores, if not to cluster
closely around the mean, at least to be relatively under-represented at the extremes. In England,
however, there is no such general acceptance of a similar curriculum for all children in a particular class and, certainly in streamed schools, the work achieved by A stream children at any
given age will be considerably
more advanced than that expected of C stream children.
This introduces the notion of expectancy
or the standard of work expected by a teacher of
his pupils. What is expected will clearly be determined in the first place by any curriculum that
is defined, but secondly, it will be influenced by the philosophical
beliefs of the teacher. In
many countries employing the grade placement system, the curriculum for any grade will be
defined quite clearly even to the provision of state text books; in others a greater degree of
fluidity within a school or class may be found. In England and also Scotland individual schools
have a higher degree of autonomy and what is taught at any age will depend to a greater degree
upon what the head teacher or even the class teacher thinks is right, although external examinations even in the primary school control this to a certain extent. But in all countries what a teacher
expects from individual pupils in his class must also depend upon his own particular beliefs.
Different patterns of achievement would be obtained, for example, by two grade class teachers, one of whom believed that all children in his class were perfectly capable of covering adequately the work of the grade, given sufficient effort on his part with the duller ones, and the other who believed that, since achievement was necessarily limited by innate ability, there must be some children in his class incapable of completing the year's work successfully.
Such differences in beliefs will also have an influence on the work of brighter children. The teacher who strives to
match attainment with ability will also be aware that children of high ability are capable of work
more advanced than that demanded by the grade syllabus, and although in the grade system this
matching will be achieved to some extent by jumping a grade at promotion time, it is clear it
will also influence what the teacher expects from the brighter pupils within his own class.
When considering the effects of these
given age, it would seem clear that the
extent of regarding innate ability as the
hence tend to obtain a wider dispersion
the fact of individual ability differences,
of the teaching situation play an equal if
It is maintained that this belief in the relative importance of innate individual ability differences is predominantly held in England and indeed has led to the general acceptance of the practice of streaming. Burt (1959) has said: "... it is plainly imperative that both teachers and local authorities should take full account of such differences in their efforts to provide an education which will (in the words of the Act)* be adapted to each child's ability and aptitude." To match attainment to innate ability presupposes, of course, that the ability can be measured. Some popular misunderstandings regarding attempts to do this have been described elsewhere (Pidgeon, 1961).
Burt himself has stressed many times the difficulty of ascertaining a child's I.Q. accurately (e.g. Burt, 1959), but nevertheless insists that, because of the wide dispersion of measured intelligence at age 10 or 11, "some kind of provisional selection is absolutely essential: in my view it should start much earlier, ... indeed, as soon as possible after a child has entered school."
The acceptance of this view led not only to the separation of children of greater and less ability into different types of secondary school (Board of Education, 1926) but also to the practice of streaming in junior schools (Board of Education, 1937). However, and Burt himself stresses this, if children are to be separated for differential instruction from an early age, it is essential that "the child is free to swim from one stream to another as his capacities develop or decline" (Burt, 1959) and also, it must be added, to allow for inaccuracies in the original measurement. But, as Daniels (1961a) has shown, there is far less fluidity in streaming than teachers themselves imagine, or, as Vernon (1955) has demonstrated, is necessary.
The concern here is not whether streaming is, in itself, good or bad, but with its effect on the
dispersion of achievement. It is argued that the expectancy of A stream teachers for relatively
high attainment helps in itself to lead to this result being obtained, just as the expectancy of C
stream teachers for relatively low attainment helps to produce this result. Also, of course, the
belief that attainment can and should be matched to ability, made easier perhaps in homogeneous
ability classes, while it would tend to result in the stretching of brighter children, would not have
this effect with duller ones, since, for many at least, the limit of their capacity would apparently
have been reached. The effect this has on increasing the dispersion of achievement scores might,
perhaps, be enhanced where teachers use tests of intelligence
and attainment to help measure
the success of their efforts, for ordinary regression effects will tend to make dull C stream
children, subsequently tested for attainments, appear to be working up to capacity and bright
A stream children appear to have room for further improvement.
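The regression effect mentioned above can be illustrated by simulation. The sketch assumes standardised intelligence and attainment scores correlated at r = 0.7 (a hypothetical value) and forms "streams" by cutting at ±1 S.D. on intelligence alone. Pupils selected for low intelligence score, on average, nearer the mean on attainment, and so look as if they are working up to (or beyond) capacity; pupils selected for high intelligence show the opposite pattern.

```python
import random
from statistics import mean

def simulate_pupils(n=20000, r=0.7, seed=1):
    """Return n (intelligence, attainment) pairs, both standardised,
    whose population correlation is r."""
    rng = random.Random(seed)
    pupils = []
    for _ in range(n):
        intelligence = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        attainment = r * intelligence + (1.0 - r * r) ** 0.5 * noise
        pupils.append((intelligence, attainment))
    return pupils

pupils = simulate_pupils()
# Hypothetical streams: allocation on the intelligence score alone.
c_stream = [p for p in pupils if p[0] < -1.0]
a_stream = [p for p in pupils if p[0] > 1.0]

for name, stream in (("C stream", c_stream), ("A stream", a_stream)):
    iq_mean = mean(p[0] for p in stream)
    att_mean = mean(p[1] for p in stream)
    print(f"{name}: intelligence {iq_mean:+.2f}, attainment {att_mean:+.2f}")
```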
It should perhaps be added here that the more successfully children's attainments are matched to their ability, the more successful any initial streaming will appear to be: the "self-fulfilling prophecy" described by Daniels (1959).
There would appear, therefore, to be a number of factors affecting the dispersion of achievement
at any given age. In the first place, the general aim of the grade class teacher may tend to result
in a relatively smaller dispersion. Perhaps exerting a greater influence, however, is the belief
a teacher may have that innate ability is of paramount importance in determining the level of
* The 1944 Education Act.
Table 2
Percentage of Pupils scoring beyond 1 S.D. from the Mean

                  Below -1 S.D.              Above +1 S.D.
Test              Average of     England     Average of     England
                  12 countries               12 countries
Non-Verbal           20.7          15.9         20.3           28.7
Mathematics          20.9          39.1         22.7           15.5
Reading              19.5          25.1         18.2           20.8
Geography            21.9          34.7         18.3           11.4
Science              18.4          24.2         16.0           18.1
It will be observed from Table 2 that in three of the five tests (non-verbal, reading and science) England has a larger percentage than the average scoring above plus one standard deviation, but that in all four attainment tests it also has a larger percentage than the average scoring below minus one standard deviation. Some concern might be felt for the 39.1 % of pupils obtaining low scores on the mathematics test.
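How spread and level combine to fill the tails can be seen from the normal curve. The sketch below gives the percentages falling below -1 and above +1 reference S.D. for a normal distribution with a chosen mean and S.D. (both in reference units, and both hypothetical). A wider spread enlarges both tails; a lower mean as well enlarges the lower tail further while shrinking the upper one, the pattern shown by England's mathematics scores in Table 2.

```python
from statistics import NormalDist

def percent_beyond(mean_shift, spread):
    """Percentages below -1 and above +1 (reference S.D. units) for a normal
    distribution with the given mean and S.D., both in reference units."""
    nd = NormalDist(mean_shift, spread)
    below = 100 * nd.cdf(-1.0)
    above = 100 * (1 - nd.cdf(1.0))
    return below, above

print(percent_beyond(0.0, 1.0))   # reference curve: about 15.9 in each tail
print(percent_beyond(0.0, 1.2))   # wider spread: both tails larger
print(percent_beyond(-0.3, 1.2))  # wider and lower: lower tail grows, upper shrinks
```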
Bibliographical references
Board of Education, 1926.
Board of Education, 1937.
Burt, C., 1959.
Daniels, J. C., 1959.
Lloyd, F. and Pidgeon, D. A., 1961.
Pidgeon, D. A., 1958.
David A. Walker
AN ANALYSIS OF THE REACTIONS OF SCOTTISH TEACHERS AND PUPILS TO ITEMS IN THE GEOGRAPHY, MATHEMATICS AND SCIENCE TESTS
To establish the relevance of test items to pupils' learning opportunities is important both from the point of view of measuring achievement and from that of maintaining the goodwill of teachers whose pupils undergo the tests. In the following article Dr. Walker shows how this need can be fulfilled and seeks further to establish the extent to which in- and out-of-school learning opportunities, ability and other factors appeared from the international tests to be determinants of success.
When the same test, or series of tests, is administered to pupils of different countries with different educational systems, it is unlikely that the items will be equally acceptable or equally useful in all of the countries concerned. The present inquiry was intended, in the first place, to assess the reactions of the teachers of the classes concerned to the items used in the tests of geography, mathematics and science, and secondly to estimate, if possible, the contributions made by stress on the topics in the curriculum and by the environment to the accuracy with which pupils answered the questions.
Instructions for rating items

(1) The attached test in Geography (Mathematics, Science) was taken by a group of 13-year-old pupils in your school last year. You are asked to rate the items of the test in two ways: (a) in relation to the curriculum used for these pupils; (b) in relation to the environment of the pupils.

(2) Read each test item. Consider the degree to which the knowledge or skills required in the item are typically covered in the instruction of pupils like yours. Give a rating on the following three-point scale:
Rating 1  Stressed: well covered in class and in homework (if any).
Rating 2  Included but not stressed; touched on but not dealt with intensively or repeatedly.
Rating 3  Not included.
Enter the rating 1, 2 or 3 in the answer space provided in the test booklet.

(3) Consider next the extent to which pupils have opportunity to encounter knowledge or skills such as those involved in the test question in the home and in the community. Decide whether there is considerable, some, or little exposure to such experiences. Use the following rating scale:
Rating A  Considerable exposure
Rating B  Some exposure
Rating C  Little or no exposure
Enter the rating A, B or C in the answer space provided in the test booklet. Each item will thus have a double rating, e.g., 1A, 2C.

(4) Indicate clearly on the front of the test booklet the school and course of the classes to which the ratings refer. At the time of the tests these courses were described as five-year, three-year with one foreign language, three-year with no foreign language and three-year modified. In some schools the courses for boys and girls differ, and separate booklets will be required for each sex; these should be labelled accordingly.

(5) Any comment which teachers may wish to make on the tests will be welcomed by the Council.
In mathematics
the position was similar though in this subject there was a higher proportion
given to rating 1. Of the 1,066 curriculum ratings for the 26 items, 42 % were 1, 34 % were 2 and
24 % were 3. The help expected from the environment was even less in this subject, the rating A
occurring in only 6 % of the replies, B in 28 % and C in 65 %.
In science, the curriculum ratings were similar to those in the other subjects. Of 730 ratings on 21 items, 30 % were 1, 30 % were 2 and 40 % were 3. Greater help was thought to be available from the environment in this subject, the rating A being given in 20 % of the cases, B in 40 % and C in 40 %.
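Percentages such as these come from tallying the double ratings returned by teachers. A minimal sketch, assuming each rating is recorded as a two-character code such as 1A or 2C (as in the instructions to teachers); the codes in the usage lines are hypothetical.

```python
from collections import Counter

def split_rating(code):
    """Split a double rating such as '1A' or '2c' into (curriculum, environment)."""
    if len(code) != 2 or code[0] not in "123" or code[1].upper() not in "ABC":
        raise ValueError(f"not a valid double rating: {code!r}")
    return int(code[0]), code[1].upper()

def percentages(values):
    """Percentage of the total falling in each category, rounded to whole numbers."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: round(100 * n / total) for key, n in sorted(counts.items())}

# Hypothetical returns from teachers for one test.
codes = ["1A", "2C", "1B", "3C", "2c", "1C", "2B", "3C"]
ratings = [split_rating(code) for code in codes]
print(percentages(c for c, _ in ratings))  # curriculum ratings 1, 2, 3
print(percentages(e for _, e in ratings))  # environment ratings A, B, C
```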
The ratings differed greatly (see Appendix).
It must not be assumed from the figures quoted above or from the data in the Appendix that the
topics with adverse curriculum ratings are not covered in the school courses. In many cases they
occur at a later stage in the curriculum. The pupils tested were mostly in the second year of a
course which is of three to six years' duration, and different schools have different schemes for
covering the work.
Agreement
It would have been possible to calculate a mean curriculum rating and standard error of the
mean for all teachers, using the values 1, 2 and 3 for the three ratings. This might, however, have
given a misleading picture. For example, an item rated 1 by 20 teachers, 2 by none, and 3 by 20
teachers would then be given a mean rating of 2, which was not actually given by any teacher,
and a standard error of about 0.16, which is relatively small because of the number of teachers
involved. For this reason the table in the Appendix gives the most frequently occurring rating
for each item and not the mean.
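The example in the preceding paragraph can be checked directly. This sketch computes the mean rating and its standard error (using the sample S.D.) for 20 ratings of 1 and 20 of 3, and shows that for so divided an item even the most frequent rating is not unique: `statistics.multimode` returns both modes.

```python
import math
from statistics import mean, multimode, stdev

# The example in the text: 20 teachers give rating 1, none give 2, 20 give 3.
ratings = [1] * 20 + [3] * 20

m = mean(ratings)
se = stdev(ratings) / math.sqrt(len(ratings))
print(m, round(se, 2))       # 2 0.16
print(multimode(ratings))    # [1, 3]
```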
One possible factor causing disagreement
among ratings is the variation in course to suit the
ability and sex of the pupils. In any assessment of the extent of agreement among teachers in
rating the items it is therefore advisable to deal separately with different types of course. The
main types in Scotland are (a) the five-year course for the more gifted pupil, (b) the three-year
course with one foreign language for the pupil a little above the average, (c) the three-year
course with no foreign language for the average pupil and (d) the modified course for the pupil
within the lowest 10 to 20 % of the ability range. In some schools it is also necessary to differentiate between courses for boys and those for girls.
Within each of these types of course we can assess the extent of agreement among the teachers' ratings by calculating their variance. If the curriculum ratings are valued at 1, 2 and 3, as given by the teachers, the variance of the distribution will be zero when all teachers agree, 2/3 when the ratings are distributed evenly over all three values, and 1 when half of the teachers select rating 1 and the other half select rating 3, showing maximum disagreement.
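The three reference values follow from the population variance, that is, dividing by the number of teachers rather than by one less. A short check with hypothetical panels of 12 teachers:

```python
from statistics import pvariance

panels = {
    "all agree": [2] * 12,
    "evenly spread over 1, 2, 3": [1, 2, 3] * 4,
    "half rate 1, half rate 3": [1] * 6 + [3] * 6,
}

for label, ratings in panels.items():
    # Population variance: sum of squared deviations divided by n.
    print(f"{label}: {float(pvariance(ratings)):.3f}")
```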
These variances were calculated for all the items of the three tests, using the categories previously described, and the results are summarised in Table 1.
TABLE 1
Variances of Curriculum Rating Distributions for Different Courses
For each test: course, number of teachers, and the highest, lowest and average variance among the items. Courses: five-year; three-year with one foreign language; three-year with no foreign language (boys, girls, and boys and girls together); modified. [The figures could not be recovered from the source scan.]
It will be observed that even within the main types of course the curriculum ratings were in perfect agreement for some items and in complete disagreement for others. A similar pattern was obtained from the environment ratings. These patterns indicate that the extent of agreement among the teachers, even within a course, was only moderate when averaged over all the items of each test. They throw little or no light on the reliability of each teacher's ratings.
The relation between facility of item, teachers' ratings and ability of class
the ability level of the group. For example, the 49 pupils in the 5-year course in one school gave 35 correct answers to item 2 of the geography test, rated 1B by the teachers in that school. Thus the facility percentage for curriculum rating 1, environment rating B and ability level 2 was 71.4 % for this group, giving a facility probit of 5.57. The 27 groups then provided the data to set up for each item the regression equation
$$\text{facility probit} = b_1 \times \text{curriculum rating} + b_2 \times \text{environment rating} + b_3 \times \text{ability level}$$
As a first approximation
each group was given equal weight, i.e., the differences
in the numbers
of pupils in the groups were ignored.
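The probit transformation and the unweighted fit can be sketched as follows. The probit is taken as 5 plus the normal deviate of the proportion correct, which reproduces the 5.57 of the worked example; a group scoring 0 % or 100 % has no finite probit. The least-squares routine gives every group equal weight, as described above. Two assumptions are made: an intercept term is included, and the environment ratings A, B and C are coded 1, 2 and 3. The data in the usage lines are synthetic.

```python
from statistics import NormalDist

def probit(correct, pupils):
    """Facility as a probit: 5 + the normal deviate of the proportion correct.
    Raises StatisticsError when the proportion is 0 or 1."""
    return 5.0 + NormalDist().inv_cdf(correct / pupils)

def least_squares(rows, ys):
    """Unweighted least squares via the normal equations (X'X) b = X'y,
    solved by Gaussian elimination with partial pivoting."""
    k = len(rows[0])
    a = []
    for i in range(k):
        row = [sum(r[i] * r[j] for r in rows) for j in range(k)]
        row.append(sum(r[i] * y for r, y in zip(rows, ys)))
        a.append(row)
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for i in range(col + 1, k):
            factor = a[i][col] / a[col][col]
            for j in range(col, k + 1):
                a[i][j] -= factor * a[col][j]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b

print(round(probit(35, 49), 2))  # 5.57, as in the worked example

# Synthetic groups: (1, curriculum rating, environment code, ability level).
# The leading 1 adds an intercept, which the printed equation does not show.
rows = [(1, c, e, lvl) for c in (1, 2, 3) for e in (1, 2, 3) for lvl in (1, 2, 3, 4)]
ys = [3.0 + 0.1 * c - 0.05 * e + 0.3 * lvl for _, c, e, lvl in rows]
print([round(b, 3) for b in least_squares(rows, ys)])
```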
This technique was applied to three items in the geography test, three in the mathematics test
and four in the science test. The items were chosen partly for their relevance to the present
inquiry and partly because results from the main inquiry had suggested points of interest.
A summary of the results is shown in Table 2 in which coefficients
which are statistically
significant are marked *. It will be observed that for no item was the regression coefficient
for the
curriculum rating significantly
different from zero, and only for one item was this true for the
regression coefficient for the environment rating. On the other hand, for all items save one the
regression coefficient
for the ability rating was significantly
greater than zero. In other words,
the proficiency
of a group in answering an item is directly related to the ability level of the
group, but appears to have little relation to the amount of stress given by the teacher to the
topic tested by the item. The fraction of the whole variance of the probits accounted for by the
regression equation varied from a non-significant 12 % for Item 9 of the science test to 57 %
for Item 22 of the mathematics test.
Table 2
Regression Coefficients for Selected Items
For each item (geography: three items; mathematics: three items, one of which is treated in the discussion below; science: four items): regression coefficients and their standard errors for curriculum rating, environment rating and ability, and the percentage of variance accounted for. Coefficients which are statistically significant are marked *. [The figures could not be recovered from the source scan.]
culation of the area of a triangle with base 40 yards and altitude 37 yards. This question proved
very difficult for Scottish pupils, twenty-three of the forty groups scoring zero, and the mean score of all groups being only 19 %. As there was so large a number of zero scores, the regression technique was not applied. It was, however, noted that the percentages of correct answers for the three curriculum ratings were 23 %, 18 % and 17 %, while those for the four ability groups were 49 %, 24 %, 12 % and 0 %.
Item 22 of the mathematics test was again a calculation of areas, but in this case an example
was given. Scottish pupils fared better on this item; every group contained pupils giving correct
answers and the mean score of all groups was 59 %. The percentage of the variance accounted
for by the regression equation was 57, the highest of all the items examined.
Items 1, 7 and 9 of the science test were selected partly because sex differences were shown in the percentages of correct answers, boys being superior in all three. The first referred to the force required to push an object up an inclined plane, and the teachers, while not stating that this type of question was stressed more frequently with boys than with girls, appeared to be of the opinion that it was the kind of problem more likely to occur in a boy's environment than in a girl's. This was the only item in which the environment rating was significant.
Boys were also superior in their replies to Item 7, which dealt with the principle of flotation, but
neither curriculum rating nor environment rating appeared to affect the regression equation.
The responses to Item 9, on the principle of the lever, provided some surprises. Not one of the regression coefficients was significant, nor was the contribution of all three together, the percentage of the variance attributable to regression being only 12 %. The percentage correct over all groups was 36 %, and the percentage for the various groups ranged from 9 to 75 %. As the item was a multiple-choice one, with four possible answers, there is a suggestion here that a fair amount of guessing had occurred. This idea is supported by the fact that the curriculum rating for 26 of the 36 groups indicated that the topic had not been referred to by those teachers.
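The guessing argument can be given a rough number with the classical correction for guessing, which is not used in the article and is added here only as an illustration: if pupils who do not know the answer pick at random among k alternatives, the proportion who actually knew it is (p - 1/k)/(1 - 1/k). For Item 9 (36 % correct, four alternatives) this gives about 15 %.

```python
def corrected_for_guessing(p_correct, n_options):
    """Classical correction: estimated proportion who knew the answer, assuming
    everyone else guessed at random among n_options alternatives."""
    chance = 1.0 / n_options
    return (p_correct - chance) / (1.0 - chance)

print(round(corrected_for_guessing(0.36, 4), 3))  # 0.147
```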
Finally, Item 18 of the science test, which referred to the usual method of estimating the age of a
tree, produced a good response from the Scottish pupils, but once again the only factor associated with success was the ability rating, and the proportion of the total variance accounted for
by all three factors was only 23 %.
The results of this analysis may be disappointing
to teachers in that so little difference seems to
be made to the proportion of correct answers by their stressing or not stressing particular topics.
It must be borne in mind, however, that the measures used were comparatively
coarse and that
the analysis has been made as simple as possible. With these reservations,
it would appear that,
at the age at which the tests were administered, ability is a greater determinant of success than
stress by teacher or help from environment,
and that other factors are, in most cases, having
greater effect than all three together.
APPENDIX
ON PAGE 66
APPENDIX
Most frequently occurring rating (curriculum rating 1, 2 or 3 with environment rating A, B or C) for each item of the tests. [The item-by-item ratings could not be recovered from the source scan.]