International Studies in Education

EDUCATIONAL ACHIEVEMENTS OF THIRTEEN-YEAR-OLDS IN TWELVE COUNTRIES

Results of an international research project, 1959-61, reported by:

ARTHUR W. FOSHAY
ROBERT L. THORNDIKE
FERNAND HOTYAT
DOUGLAS A. PIDGEON
DAVID A. WALKER

1962
Unesco Institute for Education, Hamburg

CONTENTS

Foreword

Arthur W. Foshay
THE BACKGROUND AND THE PROCEDURES OF THE TWELVE-COUNTRY STUDY

Robert L. Thorndike
INTERNATIONAL COMPARISON OF THE ACHIEVEMENT OF 13-YEAR-OLDS

Fernand Hotyat
INTERPRETATIONS FROM INTERNATIONAL AND BELGIAN NATIONAL DATA

Douglas A. Pidgeon
A COMPARATIVE STUDY OF THE DISPERSIONS OF TEST SCORES

David A. Walker
AN ANALYSIS OF THE REACTIONS OF SCOTTISH TEACHERS AND PUPILS TO ITEMS IN THE GEOGRAPHY, MATHEMATICS AND SCIENCE TESTS

The national data pooled for purposes of this study derive from the work of various national centres for educational research and are used with their kind permission. The opinions expressed in the various sections of this report, which is sponsored by the Unesco Institute for Education, are those of their authors and do not necessarily represent the views of the Unesco Institute for Education, of Unesco, Paris, or of the research institutions to whose staffs the authors belong.

Foreword
The present study may well be described as an unusual addition to the literature of education. The results of the project here reported suggest that both empirical educational research and comparative education can gain new dimensions, the one by extending its range over various educational systems, the other by including empirical methods among its instruments. In the minds of its authors the project had the double purpose of throwing light on the possibilities of such research, and of obtaining actual results which would not so much evaluate educational performances under different educational systems in absolute terms as discern patterns of intellectual functioning and attainment in certain basic subjects of the school curriculum under varying conditions. This would be a first step towards bringing into profile the relative merits of various learning processes and procedures.
If the results so far, because of limitations on their validity which the authors freely admit, are little more than suggestive, at least they offer real encouragement for believing that such researches can, in the future, lead to more significant results and begin to supply what Anderson has lamented as the major missing link in comparative education, which in his view is crippled especially by the scarcity of information about the outcomes or products of educational systems*.
It does not detract from the achievement of this exploratory study to say that, when further refining international empirical research methods, it will be essential for the researcher also to explore ways of relating the data more closely to the specific educational principles and objectives which underlie the various national educational systems.
Certainly the international group itself was sufficiently encouraged by the results of its first exploratory study to embark on a more ambitious one during which, at several key points in the secondary school cycle, as comparable samples of schoolchildren as can be obtained will be subjected to tests which bear close reference to curricula and educational aims in all the participating countries.
The project here reported could not have been accomplished without substantial contributions from twelve different national centres for educational research (listed on p. 8 and 9), which were responsible for the technical and scientific aspects of the study, whilst the Unesco Institute contributed from its experience in international administration and coordination of research. Each national centre was involved in the expense of providing tests and the local organisation and administration of field work, whilst the Unesco Institute substantially underwrote the expenses arising at the international level. Beyond this the project depended on the goodwill and cooperation of a large number of teachers in whose schools the tests were administered and the background information collected. I am very glad that the importance of this support, which for unavoidable reasons usually remains anonymous, is brought out clearly in Dr. Walker's final section of this volume.
In conclusion, I should like to express the particular debt of thanks which is owed to Professor Foshay, who directed the project, to Professor Thorndike, who acted as chief test editor and was primarily responsible for analysing the international data, and to the authors who have contributed to this volume. It is hoped that subsequently it may prove possible to add further analyses of the international data to those contained in the present report. Our thanks are due no less to Mr. Cobb who, following his responsibility for day-to-day coordination of the project, has undertaken the compilation and arrangement of the present report.
Hamburg, August 1962

Saul B. Robinsohn

* C. Arnold Anderson: Methodology of Comparative Education, International Review of Education, Vol. VII, 1961, No. 1, p. 7 and 8.

Arthur W. Foshay

THE BACKGROUND AND THE PROCEDURES OF THE TWELVE-COUNTRY STUDY

In the general orientation with which this report opens, Professor Foshay describes the process of setting up an international project for achievement testing in 12 countries, the tests that were administered, and the characteristics of the samples to which they were given.

If custom and law define what is educationally allowable within a nation, the educational systems beyond one's national boundaries suggest what is educationally possible. The field of comparative education exists to examine these possibilities.
The present exploratory study, like all studies in comparative education, has its roots in the desire of each of us to know more about educational systems other than our own. It differs from others, however, in that it seeks to introduce prominently an empirical approach into the methodology of comparative education, a field that has in the main relied on cultural analysis as its chief mode of inquiry.
Beginning in June, 1959, and ending in June, 1961, research agencies from twelve countries cooperated in a pilot study of school achievement, under the general sponsorship of the Unesco Institute for Education in Hamburg. The purpose of the present report is to describe this effort and to indicate some of the results.

Background and need


The number of cross-country comparisons of school achievement is small, and the findings must be limited severely. Typically, such studies have involved two populations somewhere in the middle of their respective school careers. Achievement has been compared in one school subject, such as mathematics. Such studies, useful as they have been, have been limited in scope. In some cases, the comparisons have been inappropriate because of differences in curriculum, a difficulty compounded by the mid-career point at which comparison was made.
What has not heretofore been attempted, even on a limited basis, is a comparison that would take the school population near a terminal point, and involve many countries from the same general world culture. Such a large-scale effort would seem difficult to administer and exceedingly expensive to carry out. Such an effort, however, would have advantages: the results could be examined with one's mind on the fact that they arose from many apparently different conceptions of the nature and meaning of education; since the students were near the end of their formal education one might take the test responses as representing the outcome of the educational system as a whole, rather than catching a student in mid-career, before the curriculum had been completed. Since a large-scale attempt has not been made, one could find by trying it whether such a project was in fact feasible. More important than these considerations, however, is the possibility that comparisons could be made more analytically than has so far been attempted.
The function of the academic curriculum is to teach children to think in ways appropriate to the subject matter being learned. It would be very useful if short-answer tests, with their desirable attributes of definiteness and objectivity, could be used to discern patterns of intellectual functioning in several standard school subjects. Such patterns, if they exist, would shed light on an ancient pedagogical problem - the problem created by the fact that we know much more about the measurement of the results of educational effort than we know about the effort itself.

Purposes of the present study


The present exploratory study was intended to show whether such needs could be met. In general terms, the purposes of the study can be stated as follows:

1. To see whether some indications of the intellectual functioning behind responses to short-answer tests could be deduced from an examination of the patterning of such responses from many countries.

2. To discover the possibilities and the difficulties attending a large-scale international study.

In the papers that follow in this volume, several reports are presented of the results so far
achieved. Here I shall describe the procedures we have used, the populations tested, and the
tests we have employed.

Early planning: the participants

In 1958, the Governing Board of the Unesco Institute for Education accepted the present writer's proposal that an international study of intellectual functioning be undertaken. The officers of the Institute invited several directors of educational research organizations to meet in Hamburg in June, 1959, to consider the proposal further, and to decide whether they wished to participate in such a study. As it happened, the second meeting of representatives of European Centres of Educational Research was scheduled to take place near London a week later, and the proposal was described there also, with the result that some centers not represented at the Hamburg meeting also joined in the project.
The Hamburg meeting of June, 1959 lasted for five days. During this time, the participants considered and adopted the proposal that a study be designed, and then proceeded to make the design, to prepare preliminary tests, and to arrange a schedule.
Each of the participants was to bear the costs of test administration within his own country. The Unesco Institute paid the cost of travel and maintenance for the European participants at the three meetings finally held, and furnished extensive coordinative services. The participants who finally took part in the study were the following:
Belgium: Fernand Hotyat, Directeur du Centre des Travaux, Institut de Pédagogie, Morlanwelz.

England: Dr. W. D. Wall, Director, and D. A. Pidgeon, Senior Research Officer, National Foundation for Educational Research in England and Wales, London.

Finland: Professor Martti Takala, Professor of Psychology and Director, Centre for Educational Research, Jyväskylä.

France: Professor Gaston Mialaret, Professor of Psychology and Pedagogy, University of Caen (President of the International Association of Experimental Education of the French-Speaking Countries).

Federal Republic of Germany: Professor Dr. Walter Schultze, Director, and Dr. Rudolf Raasch, Research Assistant, Hochschule für Internationale Pädagogische Forschung, Frankfurt am Main.

Israel: Dr. Moshe Smilansky, Pedagogical Adviser and Director of Research, Ministry of Education and Culture; Director, Henrietta Szold Institute for Child Welfare, Jerusalem.

Poland: Professor Jan Konopnicki, University of Wroclaw.

Scotland: Dr. D. A. Walker, Director, Scottish Council for Research in Education, Edinburgh.

Sweden: Professor Torsten Husén, Research Professor of Educational Psychology, and L.-M. Björkquist, Research Assistant, Institute of Educational Research, Teachers College, University of Stockholm.

Switzerland: Professor Dr. S. Roller, Institute of Sciences and Education, University of Geneva.

USA: Professors Arthur W. Foshay, A. H. Passow, and D. L. Super, Horace Mann-Lincoln Institute, and Robert L. Thorndike, Institute of Psychological Research, Teachers College, Columbia University, New York. Professors Benjamin S. Bloom and C. Arnold Anderson, Center for Comparative Education, University of Chicago.

Yugoslavia: Dr. Vladimir Mužić, Institute of Education, University of Zagreb.

In addition to Mr. D. J. Cobb, who served as the continuing coordinator of the project, the Unesco
Institute for Education, under the direction of Dr. S. B. Robinsohn (and prior to his appointment
the Acting Director, M. R. E. Hennion), put the services of its excellent staff at the disposal of
the project.
Under the supervision of Professor Thorndike, Dr. and Mrs. Leonard Burgess were responsible for the tabulations of data.

The procedure as planned

When the group described here met in June, 1959, they reached a number of agreements about procedure, first having considered and accepted the general proposal. Before stating them, however, it is necessary to state the caveat that applies to this study.
This is an exploratory study. The participants in this study were working with no extra funds, no extra allotment of time, and without the benefit of a previously developed set of procedures. It will therefore be apparent that neither the tests nor the sampling procedures meet the standards that might otherwise be required. For these shortcomings we are not apologetic; it was necessary to accept them, and hence to restrict the statements based on the data gathered, if the study was indeed to be undertaken. Since not all of the sample populations are comparable, we shall not report total scores here as if they could be compared. The most interesting analyses involve patterns of responses among items and sub-scores, not comparisons of total scores, and it is this kind of analysis that is reported here.
The following procedural agreements were made and acted on by the participants in the study:

1. The sample
a. The students to be tested would all be aged from 13 years to 13 years 11 months on the first
day of the school year whatever might be the school level (grade) at which they were found.
b. The sample population in each country would be between 600 and 1000 in number.
c. The sample to be tested would be all the children of both sexes residing in a community
or communities selected to yield a population of the designated size.
d. The community or communities selected for testing would be as representative as possible of the total population of the country, according to whatever data were available to the participant in the study. If (as was true in some countries) no data were available to aid the participant in his selection, he was to use his own judgment.
2. Data about the children to be tested
a. Background data on each of the children to be tested would be gathered, as follows:
1) birth date
2) sex
3) number of siblings
4) place in birth order
5) home language (if different from school language)
6) location of home (city of 20,000-100,000; 2,000-20,000; under 2,000 inhabitants)
7) years in school
8) kindergarten (attended, not attended)
9) size of class (by 10s, from 10 or less to 61 or more)
10) father's education
11) mother's education
12) interest of parent (much, moderate, little or no)
13) father's occupation
14) mother's occupation
15) score on non-verbal intelligence test


3. The tests
a. Tests would be administered in the following fields: reading comprehension, mathematics, science, geography.
b. A non-verbal test would be administered to all of the children, the score to be added to the background information.
c. The working languages of the study would be French and English. Translation of the test items would be done by each participant into his home language. Copies of the translated tests would be deposited with the Unesco Institute for Education.
d. Trial forms of the tests (except the non-verbal, which had been developed by the National Foundation for Educational Research in England and Wales) would be developed by the participants working together at Hamburg. (The items for the tests as finally constructed were, in the main, taken from existing tests originally developed in England, France, Germany, Israel and the U.S.A.)
e. The trial forms of the tests would be pre-tested with a small number of children in each country, and criticisms and suggestions sent to a test editor for consideration.
f. The tests as finally approved would be duplicated and circulated to the participants by the Unesco Institute.
g. Alterations in the substance of items would be permissible, provided they were approved by the test editor. (A typical alteration involved the change in units of measure to conform with the custom of the country.)
h. The tests would be held to approximately 30 items, in the hope that each of them could be completed in less than 45 minutes. (Pre-testing and the later administration of the tests confirmed this as an adequate length of time.)
i. No time limit would be imposed on the students.
j. A practice test would be constructed which included examples of all the kinds of items included in the tests to be scored. During the practice session students would be encouraged to ask any question that occurred to them about the practice test and about the project as a whole. Teachers were requested to answer all questions fully, including giving the answers to the practice test.
4. The schedule
The tests were to be administered in November, 1960, the data sent to New York for processing by February 1, 1961, and the results of the first data processing were to be made available to the participants by June, 1961. (Chiefly because of the excellent coordination by the Unesco Institute, this schedule was met virtually to the minute.)

5. Organization
The administrative center for the project was the Unesco Institute in Hamburg. The participants met there three times, each time for one week: in June, 1959, to plan the project and construct trial forms for the tests; in October, 1960, to take a final look at the project before testing; and in June, 1961, to examine the data and to plan for interpretation and publication.
Certain persons accepted special responsibilities for the conduct of the project, as follows:
Arthur W. Foshay (U.S.A.), project director; editor of the geography test,
Robert L. Thorndike (U.S.A.), test editor,
A. Harry Passow (U.S.A.), editor of the science test,
Gaston Mialaret (France), editor of the mathematics test,
Walter Schultze (Germany), editor of the reading test,
D. A. Pidgeon (England), editor of the non-verbal test,
D. J. Cobb (Unesco Institute), coordinator of the project.

The procedure as executed

The procedure as described above was as uncomplicated and as realistic as the participants could make it. Future planners of such projects as this, however, will be interested in the variations from the plan that developed as it was actually carried out. There were several of these: the sampling procedure varied from the plan in a number of details; the times at which the tests were administered varied somewhat because of differences in the academic calendar; a few test items had to be changed, even after the pre-testing. In order that the limits of the comparisons may be known explicitly, we shall present here descriptions of the population samples furnished by the participants, descriptions of the tests, and some comments on the translation of the tests.
The samples
The plan called for samples of from 600 to 1000 children between 13 and 14 years of age, these being all of the children in a representative community. The samples actually ranged from 300 (Switzerland) to 1,732 (Israel). The total number of children tested in all the countries was 9,918. In the section that follows, descriptions of each sample as provided by the participant are reproduced.
Belgium
The area over which the work of the Institut Supérieur de Pédagogie du Hainaut extends consists of localities with between 500 and 2,500 inhabitants (a small industrial city and its environs). Pupils at the post-primary level are scattered over a number of schools and not concentrated in any one town. Due to this fact, it was necessary to use statistical information relating to the whole of the French-speaking region of Belgium as the basis for forming the sample. According to statistics for the school year 1956-59, the school population aged 13-14 (counting only pupils who were in the grade appropriate to their age or not retarded more than 1 year) was divided up as follows:

Boys:   general secondary schools                                        55 %
        vocational schools or quatrième degré * of the primary school    45 %
Girls:  general secondary schools                                        47 %
        vocational schools or quatrième degré of the primary school      53 %

* 4e degré de l'école primaire: two top classes (7th and 8th years) of the primary school providing a suitable terminal course generally with vocational bias for pupils who have not transferred to general or vocational secondary education at the end of the sixth year. (World Survey of Education, Vol. III, Unesco, Paris, p. 236)

In order to obtain representative groups, complete schools were taken, but these were later reduced by random sampling methods to conform to the percentages given in the official statistics. The procedure adopted is set out in the following table:

                                           BOYS                         GIRLS
                                   Tested   Retained in         Tested   Retained in
                                            samples for                  samples for
                                            analysis                     analysis
Vocational schools and 4e degré      145       145                205       170
General secondary schools            214       175                193       150

The sample finally submitted to analysis thus contained 640 subjects, 320 boys and 320 girls. A check was made to ensure that the eliminating process did not affect the mean scores.
No attempt was made to provide any representation of the Flemish-speaking part of Belgium. Children who were retarded two or more years in school were excluded. These are estimated (by M. Hotyat) to be about 10 % of the total. One or two of the sections in the vocational schools included quite retarded children. In the vocational schools about 26 % of the pupils were children of foreign workers (primarily miners), and the corresponding percentage in the general course was 6 %.
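
As a rough arithmetic check, the retained counts in the Belgian table can be set against the official percentages quoted for the French-speaking region. The short Python sketch below is an illustration added for the reader and not part of the original analysis; the dictionary keys and the function name `retained_shares` are hypothetical, and only the counts and percentages are taken from the report.

```python
# (tested, retained) counts per school type, copied from the Belgian table.
counts = {
    "boys": {"vocational_or_4e_degre": (145, 145), "general_secondary": (214, 175)},
    "girls": {"vocational_or_4e_degre": (205, 170), "general_secondary": (193, 150)},
}

# Official percentages quoted in the report for the French-speaking region.
official = {
    "boys": {"vocational_or_4e_degre": 45, "general_secondary": 55},
    "girls": {"vocational_or_4e_degre": 53, "general_secondary": 47},
}


def retained_shares(groups):
    """Return each school type's share, in percent, of the retained pupils."""
    total = sum(retained for _, retained in groups.values())
    return {name: 100 * retained / total for name, (_, retained) in groups.items()}


shares = {sex: retained_shares(groups) for sex, groups in counts.items()}
retained_total = sum(r for groups in counts.values() for _, r in groups.values())

for sex, by_type in shares.items():
    for school_type, pct in by_type.items():
        print(f"{sex:5s} {school_type:24s} sample {pct:5.1f} %  official {official[sex][school_type]} %")
print("retained in all:", retained_total)
```

Under these figures each retained share rounds to the corresponding official percentage, and the retained counts add up to the 640 subjects reported.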

England
The sample from England consisted of 1,181 pupils, 607 boys and 574 girls. The pupils were all the 13-year-olds attending school under one Local Education Authority in central England. This particular area had been chosen since other evidence had shown that, on tests given at the age of 11, the authority was quite representative of the whole country both with respect to mean score (100) and standard deviation (15). Although the proportions of children from urban and rural administrative areas were also similar to those in the country as a whole, the authority was predominantly rural in character and contained no large industrial town.
The number of schools and pupils were as follows:

                          Schools   Boys   Girls   Both
Grammar                      4       115    105    220 (18.6 %)
Modern                       6       461    443    904 (76.5 %)
Unreorganised all-age        3        31     26     57 ( 4.8 %)

Three of the grammar schools, each containing boys and girls, were controlled by the local authority concerned and the fourth was a Direct Grant boys' school which volunteered to assist in the enquiry. All six secondary modern schools were coeducational, as were the other three small unreorganised rural schools containing pupils of both primary and secondary school age.

Finland
The sample from Finland included 727 pupils, 386 boys and 361 girls. These came from about 50 classes in schools widely distributed over Finland. The choice of schools and numbers of pupils are such as to make the sample closely representative of Finland as a whole with respect to grade and type of school and with respect to the percentages of urban and rural pupils. The proportions for the country and for the sample were reported as follows:

Whole

Sample

country

62%
38%

school (Grades IV-VIII)


In secondary school (Grades I-V)
In rural community (7- 15 yrs.)
In urban community (7-15yrs.)
Rural students in primary school
Rural students in secondary school
Urban students in primary school
Urban students in secondary school

In primary

68 x
32%
64%
36 %
78%

67 %
33%
76%
24 %
45%
55%

22 %
48 7;
54 %

No pupils were tested in Finland who were in classes below Grade VII of the primary school. This meant that in primary schools about 2 % of children in the age group, most of them retarded, were not included in the sample. In the secondary schools all 13-year-old children were included, irrespective of their grade level (actual proportions: 55 % in Grade II, 20 % in Grade I, and 25 % in Grade III) and the secondary school sample was thus as representative as possible.
No pupils were included from Swedish-speaking districts in Finland (about 8 % of the population). Many of the classes in rural areas included mentally retarded children.

France
The sample from France was a relatively small one of 451 pupils, 181 of whom were boys and 270 girls. The small size was accounted for in part by bad weather, which reduced school attendance at the time of testing. The sample was drawn from one small city near Caen, together with the adjoining rural area. The sample corresponded approximately with the total French population with respect to percent of urban residence, occupation of father and size of family, as these were determined in an earlier extensive survey by Heuyer, Piéron and Sauvy*.
The sample was chosen to represent the different types of schools in the following numbers:

                      Boys   Girls   Total
Primary school         117    142     259
Vocational school       20     20      40
Secondary school        52     98     150

Those pupils who were retarded more than one year in school were excluded. This is believed to be about 5 % of the total age group.

Federal Republic of Germany

The sample consisted of 811 pupils, 403 boys and 408 girls, who were attending schools in the city of Darmstadt in Hessen, or in the adjoining rural districts. These three districts are believed to correspond well with the country as a whole in socio-economic structure, in distribution by types of occupation, and in education. Furthermore, previous experience in setting up test norms has shown this region to be close to the national average.
Within the region, the sample was chosen so that the proportions corresponded closely with the national average with respect to type of school attended, and within the Volksschule representativeness was sought with respect to grade level reached and, within the eighth grade, with respect to size of school. The proportions are shown below for both sexes combined.

* Heuyer, G., Piéron, H., and Sauvy, A. Le Niveau Intellectuel des Enfants d'Age Scolaire. Paris, Presses Universitaires de France, 1950.

                            Country as a whole   Sample
E.S.N. 8th grade                    2.0            2.1
Mittelschule 8th grade             10.7           11.0
Gymnasium 8th grade                18.1           18.5

                            Country as a whole   Sample
Volksschule Total                  69.2           68.4
  6th grade                         1.7            1.7
  7th grade                         7.3            7.4
  8th grade                        60.2           59.3
    1-3 classes                    15.0           15.4
    4-7 classes                    18.1           16.2
    8-9 classes                    27.1           27.7

No data were available concerning retardation in the Mittelschule or Gymnasium, and so no cases of this type were included.
As in all other countries, except Scotland, the tests were administered in Germany in November, 1960, to pupils aged 13.0 to 13.11 on the first day of the school year. As Germany was in the unique position of having a school year beginning in April, not September as in other countries, the German children were in fact aged between approximately 13.7 and 14.6 at the actual time of testing, as opposed to approximately 13.2 to 14.1 in the other countries. This difference may, however, have been partly offset by the long summer and other holidays which considerably reduced the German pupils' effective period in school prior to testing.
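
The age ranges just quoted follow from simple month arithmetic on the agreed age band (13 years 0 months to 13 years 11 months on the first day of the school year). The Python sketch below is an illustration only; the function names are hypothetical, ages are written as (years, months) pairs in the report's "13.7" style, and whole months are assumed, which is why the report itself says "approximately".

```python
def add_months(years, months, delta):
    """Add `delta` whole months to an age given as (years, months)."""
    return divmod(years * 12 + months + delta, 12)


def age_range_at_testing(school_year_start_month, testing_month):
    """Age band at testing for pupils aged 13y0m to 13y11m when the school year began.

    Months are calendar numbers (1 = January). The testing month is assumed
    to fall within twelve months after the start of the school year.
    """
    elapsed = (testing_month - school_year_start_month) % 12
    youngest = add_months(13, 0, elapsed)
    oldest = add_months(13, 11, elapsed)
    return youngest, oldest


# School year starting in April, testing in November (the German case).
print(age_range_at_testing(4, 11))
# School year starting in September, testing in November.
print(age_range_at_testing(9, 11))
```

With an April start the band moves forward seven months, and with a September start two months, which reproduces the 13.7 to 14.6 and 13.2 to 14.1 ranges given above.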

Israel
Because of local interest in certain types of sub-groups, Israel tested a relatively large sample, almost 1,900. Data were analyzed for 1,873, 930 coded as girls and 942 coded as boys. These were from a number of different schools in different localities. The basic classification is into town and city schools, schools in Moshavim (collective agricultural settlements), and schools in Kibbutzim (communities with communal housing, dining, and child-rearing). Schools in Moshavim were all located in well-established settlements of early immigrants. Schools in Kibbutzim were chosen in equal proportions to represent three ideological trends, but otherwise at random. Town and city schools were chosen so as to include essentially equal numbers of schools that had scored high, above average, and average on the national eighth grade survey.
The Israeli data are based on all eighth grade pupils in the schools which were tested. No 13-year-olds who were not in the eighth grade were included in the Israeli sample. The result is that 177, or 9.5 % of the group, were 14 years of age or over at the beginning of the school year, while 514, or 27.4 %, were less than 13 years of age. At the same time, all 13-year-olds who were in the 7th or earlier grades were excluded. These are estimated to be 10 % of the total age group. Likewise about 4 % of the 13-year-olds, who had progressed beyond the eighth grade, were excluded. Furthermore, the sample was limited to schools in which the pupils were primarily of the early group of immigrants to Israel, and thus largely of European origin. Thus, of the total sample only 193, or 10.3 %, had fathers born in an African or Asian country compared to about 30 % in the eighth grade nationally.
(It is to be noted that of the fathers 166, or 8.9 %, are classified as professional or managerial, but Dr. Smilansky indicates this is not unusually high for the European section of the country's population.)
(It is also to be noted that the date of testing in Israel was February-March rather than November, so that pupils had an additional three or four months of growth and schooling as compared with most countries.)

Poland
The total population tested in Poland was 1,000, consisting of 346 boys and 654 girls. Children were selected from five different environments. The rural children were tested in large villages possessing at least the first classes of secondary school. Children from small towns were tested at Milicz, from mixed agricultural-industrial surroundings at Dzierzoniow, from industrial areas at Walbrzych (where there are coal mines and heavy industry), and children from a favored cultural environment in one of the districts of Wroclaw, 80 % of whose inhabitants were reported by the Polish research agency as being highly educated people.

Scotland
The Scottish sample consisted of 991 pupils, 515 boys and 476 girls, drawn from two of the educational administrative units in Scotland, the city of Aberdeen and the county of Stirlingshire. These were chosen because each had been found in the past to be representative of educational achievement in Scotland as a whole. In Aberdeen, one-sixth of the 13-year-old pupils were tested, the sampling being based upon the day of the month on which they were born. In Stirlingshire, a sample was drawn from half of the schools in the county in such a way that each school course was represented in the same proportions as in the county as a whole. Thirteen schools were involved in Aberdeen and ten in Stirlingshire.
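
Sampling by day of birth, as used in Aberdeen, can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the Scottish procedure itself: the report does not say which days of the month were used, so the set `SELECTED_DAYS` below is hypothetical, as are the function names and the illustrative birth year.

```python
from datetime import date, timedelta

# Hypothetical choice of five days of the month; the report does not name the days used.
SELECTED_DAYS = frozenset({1, 7, 13, 19, 25})


def is_selected(birth_date):
    """A pupil enters the sample if born on one of the selected days of the month."""
    return birth_date.day in SELECTED_DAYS


def expected_fraction(year=1947):
    """Share of the calendar days in `year` that fall on a selected day of the month.

    Assumes births are spread evenly over the year; the year is illustrative only.
    """
    first = date(year, 1, 1)
    n_days = (date(year + 1, 1, 1) - first).days
    days = [first + timedelta(days=i) for i in range(n_days)]
    return sum(is_selected(d) for d in days) / len(days)


print(round(expected_fraction(), 3))
```

Five days out of each month give a sampling fraction close to one-sixth, and because the day of birth is unrelated to school or course, such a rule spreads the sample across all schools without a separate list of pupils.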
The testing in most of the countries took place in November of 1960, it having been agreed that
this was a good time for those countries in which the school term begins early in October or
during the month of September. The school year would have been well started by this time, and
the children would have had a substantially
similar number of days since the school year had
begun. In Scotland, however, the testing was conducted in June of 1960 as certain reorganisations
were due to take place in Scottish secondary schools during the autumn of 1960. It was therefore
necessary to give the test at that time, and to adjust the selection procedure so that the Scottish children were at the appropriate age when they took the tests.

Sweden
The sample in Sweden consisted of 567 pupils in all, 284 boys and 283 girls. These were drawn
from about 30 classes in the middle part of the country. The classes were chosen from schools
which in previous national surveys had given results with an average score and variability close
to the national average. Testing was limited to seventh year classes in the various types of
schools. Those pupils in the classes who were above or below the age limits for the study were
excluded from the sample.
No attempt was made to test those 13-year-olds who were in classes either above or below the seventh. The number of 13-year-old pupils in classes above the seventh is estimated to be about 10 % and the number in the sixth or lower classes is estimated to be about 5 %.

Switzerland
The Swiss sample consisted of only 314 pupils, 153 boys and 161 girls. These were drawn from the city of Geneva, no attempt being made to represent a wider geographical region. Dr. Roller points out that a national sample in Switzerland would have had to be drawn from each of the cantons in the country, there being no other unit that is appropriate to the cultural and population distribution in Switzerland. Since the total 13-year-old population of Geneva is about 2,000, the sample is considered adequate to represent them.
Within Geneva, the sample was set up so as to include appropriate numbers in the different types of schools in grades 7 and 8. The total numbers tested were as follows:


7th Grade

boys

8th Grade

girls
boys
girls

Ecole primaire
Collège de Genève
Ecole primaire
Collège de Genève
Collège moderne
Ecole supérieure lat.
Ecole supérieure mod.
Ecole ménagère

101
72
203
72
94
52
77
76

Those pupils among this total group who were 13 years of age were included in the final sample. No attempt was made to test 13-year-olds who fell below the seventh grade. These are estimated to comprise 5 % of the age group.

U.S.A.
In the United States the total group tested was 2,254, but in order to reduce the burden of
statistical analysis, only every other pupil was included in the sample finally analyzed, which
comprised 1,127 pupils, 568 boys and 559 girls.
The United States sample consisted of all the 13-year-olds in the public school systems in three different educational administrative units. One was an industrial city that is part of the Boston, Massachusetts metropolitan area. A second was an industrial city of about 50,000 in southern Ohio, not far from Cincinnati. The third was a rural county in south central Illinois. The three units were chosen because they had been found to give results close to the national average when they were used in the nationwide standardization testing for the Metropolitan Achievement Test published by the then World Book Company.

Yugoslavia
The Yugoslav sample consisted of 685 children, distributed as indicated in the following table:

                             Boys    Girls    Combined
Urban                         202      206         408
Rural                         135      142         277
(Multiple class teaching)    (28)     (31)        (59)
Total                         337      348         685

685 subjects in 24 classes, of 13 schools in 10 localities.

Some previously determined localities were replaced by others with the same general situation (general cultural level of the population, distances from principal ways of communication, etc.).
The localities where testing actually occurred were: Bukevje, Klara, Lomnica, Novo Čiče, Odra, Veleševac, Velika Gorica, Velika Mlaka, Vukovina, Zagreb.
In the localities tested, all 13-year-olds attend school. Only handicapped children educated in special institutions for the handicapped were excluded from the sample. Testing was also carried out with pupils above and below the seventh grade. All classes are co-educational.

The tests
Four tests of academic achievement were given. Since our general purpose was to gather data
that could yield inferences about intellectual functioning, or reasoning, an attempt was made in
each test to include items that called for reasoning, but did not require previous knowledge of
the field. The ideal item was one that presented all the information required for a correct answer.

The typical item was multiple-choice. Here are very brief descriptions of the tests.

Mathematics test:

5 items requiring simple computation

5 basic concept items, e. g.:
Which is the largest of the following numbers?
a) 94512
b) 19542
c) 95421
d) 59241

7 verbal problems, e. g.:
1) A train leaves Rome at 1:00 p. m., traveling at 60 miles an hour. A second train leaves along the same line 30 minutes later, traveling at 60 miles an hour. At what time does the second train overtake the first?
2) This table gives readings of maximum and minimum temperature in degrees Fahrenheit, of rainfall in inches, and of sunshine recorded for each month of the year.
a) In which month did the highest temperature occur?
b) In which month was the difference between the maximum and minimum temperatures greatest?
c) Which was the wettest month?
9 problem sequences, e. g.:
We know that the altitude of a triangle is a line drawn from one vertex perpendicular to the opposite side. We are given the triangle in Figure 2.
[FIGURE 2: a triangle with vertices A, B and C]
a) What is the altitude from vertex B?
b) What is the altitude from vertex C?

Total: 26 items, some with subdivisions; 29 responses.

Science test:

16 items of the form:
When you enter a movie theater on a sunny day, you do not see well at first because the
a) pupils of the eyes are still large
b) lenses in the eyes will focus the light in front of the retina
c) pupils of the eyes are still small
d) lenses in the eyes will focus the light behind the retina.

5 items to be marked as either definitely true, probably true, impossible to determine, probably false, or definitely false, e. g.:
One can tell the approximate age of a tree from the rings on a cross-section of its trunk.

Total: 21 items.

Geography test:

12 multiple-choice items depending on information, e. g.:
The Danube flows into the
a) Red Sea
b) Mediterranean Sea
c) Dead Sea
d) Black Sea.

16 items involving drawing inferences from a set of hypothetical maps.

4 items requiring that generalizations be stated as supported or not supported by a bit of text.

Statements
a) Many mountain stream beds are narrow, steep, and full of rapids.
b) People who live in rugged mountain areas tend to depend for their livelihood more upon animal products than upon the growing of crops.
c) Mountain dwellers are often fine craftsmen.
d) In mountainous areas, water power can be used to produce electricity for manufacturing.

Generalizations
1) Beautiful carved leather articles are made in the Himalayan area.
2) In mountainous areas, export trade is restricted to articles of high value and small bulk.
3) Mountain people do not build large boats.
4) Life in many mountainous areas has become considerably more comfortable during the present century.

Total: 32 items.

Reading comprehension:

5 reading passages, each followed by 6 or 7 comprehension items, e. g.:
According to the text, considerate driving means
a) greeting other people in a friendly way
b) giving help if there has been an accident
c) assisting if there has been a breakdown
d) watching out for old and infirm people and children.

Total: 33 items.

Non-verbal test:

74 items, requiring either perception of analogies among abstract figures, completion of series, perception of differences, or perception of relationships.

Validity of the tests

The multiple-choice form of testing is more familiar to students in Scandinavia, the United Kingdom, and Germany than it is in the other participating countries. A practice test which included examples of all the forms of test items was provided for this reason. There is no evidence that the scores actually achieved bore any relationship to previous experience with tests of this kind.

Reliability of the tests

The reliability of the tests is discussed by Professor Thorndike in the second chapter of this report. The general estimates of reliability of the tests are as follows:

Mathematics    .81
Reading        .81
Geography      .70
Science        .62

A note on translation
The tests were originally prepared in either English, French, or German. They had to be translated into eight languages: English, Finnish, French, German, Hebrew, Polish, Serbo-Croatian and Swedish. The problem of translation was, of course, of great concern. Since the participants did not mean to make this the main problem of the study (any more than they meant to make sampling or test construction the main problems), they agreed to leave the translation of the items into their own languages to each participant. They did not, for example, test the translation by having it translated back into its original language in order to compare the re-translation with the original.
This led to occasional differences in items. A striking example of this, as might be expected, was in the translation of a passage in the reading comprehension test, in which the literary quality of the passage in its original French was its main characteristic.
Elle sort d'une touffe d'herbe qui l'avait cachée pendant la chaleur. Elle traverse l'allée de sable à grandes ondulations.
A caterpillar emerges from a tuft of grass where it has been concealing itself during the warm weather. It crosses the gravel path, moving in a series of large ripples.
Quelle belle chenille, grasse, velue, fourrée, brune, avec des points d'or et ses yeux noirs!
What a beautiful caterpillar: fat, hairy, furry and brown, with golden spots and black eyes!
A different translation problem appeared at one point in the mathematics test. A question in the English original read: How would omitting the decimal point in 18.52 change the number?
One of the answers from which the examinee could choose read: Makes it 1/10 as large.
The French translation reads: Il devient 10 fois plus petit (It becomes 10 times smaller), an entirely different problem.
Such difficulties in translation apparently were so small in number and so scattered as to be insignificant. There is no evidence that they seriously influenced the national scores.

Taking this exploratory study as a whole, we think we have demonstrated certain matters of importance in the field of comparative education:
1. A large scale project, which depends on similarities in technical and philosophical assumptions in education and in measurement, can be done.
2. The data obtained, even under the restrictions that inevitably arise, can be analyzed fruitfully.
And, by extension, we think we have shown that it is possible to introduce a large empirical element into comparative education, an element only slightly present in the field until now.


Robert L. Thorndike

INTERNATIONAL COMPARISON OF THE ACHIEVEMENT OF 13-YEAR-OLDS

In the second section of this report Professor Thorndike presents and discusses certain aspects of the results. He explains the reason for deciding to restrict comparisons between country and country to patterns of achievement (national profiles), omitting comparisons of levels, and he presents findings on the reliability of the tests. The results are analysed in relationship to sex and certain background variables. Further analyses of variations between countries in the relative difficulty experienced with selected test items, combined with a study of item content, lead the author to investigate a number of hypotheses with results which demonstrate the possibilities of international evaluations of achievement.

Because the present project was a pilot enterprise, carried out with limited resources, it was not practical to try to get a truly representative sample of the 13-year-old population in each country. Sampling procedures varied from country to country, as described by Foshay, but in most instances sampling was limited to one or a few communities or regions that were thought to be representative of the country as a whole. In a few countries (England, Scotland, Sweden) there had previously been fairly complete national testing surveys, and communities or regions could be chosen which had been found on these to correspond to the country as a whole. In some countries (Switzerland, Israel) the sample was intentionally restricted to a place or to a fraction of the population that was fairly clearly not representative of the country as a whole. In most of the countries an attempt was made to achieve representativeness, but the evidence upon which communities or schools were chosen was rather meager and impressionistic.
Because of these limitations on the representativeness of the national samples, there seems to be little value in comparing the absolute level of achievement in one country with that in other countries. For this reason, no country by country tables of mean scores are reported. We will turn our attention instead to an examination of the magnitude of the differences between countries, and to the differences in patterns of achievement from country to country.

Statistical characteristics of the tests

The test battery consisted of four short achievement tests and a non-verbal measure of scholastic aptitude. Three of the achievement tests yielded separate part scores as well as a total score. The nature of the several tests and sub-tests is described by Foshay (pp. 16 to 18). At this point we shall merely supplement that description by a brief table (Table 1) showing for each test and sub-test (1) the number of items, (2) a general estimate of the mean, obtained by averaging the means for 11 national groups, (3) a general estimate of the standard deviation, obtained as an average of the 11 standard deviations within countries, and (4) a general estimate of reliability, obtained by Kuder-Richardson Formula No. 20 from the average standard deviation and the average item difficulty over 11 national groups.
As might be expected, the reliabilities of many of the sub-tests, consisting of from 4 to 10 items, are quite low. However, the total test reliabilities are fairly satisfactory, the estimated values ranging from a low of .62 for the 21-item science test through .70 for the geography test, .81 for the reading test and the mathematics test, to .89 for the considerably longer non-verbal test. The estimates are rough, since the assumptions underlying the Kuder-Richardson formulas are not completely met. However, the general order of magnitude is indicated. Though the tests would be of no use for the study of single individuals, they appear adequate for comparisons of groups of several hundred, and these are the comparisons with which this study is primarily concerned.
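
The two Kuder-Richardson estimates mentioned in this section can be sketched as follows. This is an illustrative sketch only, not the computing procedure actually used in the study: the function names are hypothetical, and the textbook forms of Formula No. 20 (per-item difficulties) and Formula No. 21 (mean score only, all items assumed equally difficult) are assumed. The example feeds in the number of items, average mean and average standard deviation reported for the non-verbal test in Table 1.

```python
def kr20(item_difficulties, sd):
    """Kuder-Richardson Formula 20.

    item_difficulties: proportion of pupils answering each item correctly.
    sd: standard deviation of total scores.
    """
    k = len(item_difficulties)
    sum_pq = sum(p * (1.0 - p) for p in item_difficulties)
    return (k / (k - 1)) * (1.0 - sum_pq / sd ** 2)


def kr21(k, mean, sd):
    """Kuder-Richardson Formula 21: needs only the number of items, the mean
    and the standard deviation, and assumes equally difficult items."""
    return (k / (k - 1)) * (1.0 - mean * (k - mean) / (k * sd ** 2))


# Non-verbal aptitude test, values as given in Table 1 (75 items).
nonverbal = kr21(75, 33.66, 12.19)
print(round(nonverbal, 2))  # 0.89
```

When every item has the same difficulty the two formulas agree; otherwise Formula 21 gives the lower value, which is why it is usually treated as a conservative shortcut.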

Results from Yugoslavia became available too late for inclusion in these and certain other analyses.


TABLE 1
Statistical Parameters of Tests

                           No. of    Average    Average        K-R No. 20
                           items     Mean       Stand. Dev.    Reliability

Non-verbal Aptitude          75      33.66      12.19          .89*

Mathematics   - Part 1        5       3.63       1.14          .51
              - Part 2        5       4.19       1.03          .58
              - Part 3        7       3.91       1.60          .51
              - Part 4        9       3.05       2.02          .73
              - Total        26      14.98       4.40          .81

Reading Comprehension        33      21.36       5.27          .81

Geography     - Part 1       12       7.01       2.16          .49
              - Part 2       16       7.82       2.77          .57
              - Part 3        4       1.59       1.07          .38
              - Total        32      16.42       4.65          .70

Science       - Part 1       16       7.77       2.85          .59
              - Part 2        5       1.99       1.17          .28
              - Total        21       9.76       3.39          .62

*  By K-R Formula No. 21, but spuriously high because of speed factor.
** Inflated, because items are not independent.

Within and between countries variance

From the raw scores on each test, raw score means and standard deviations were obtained. A crude average of the variances in the 11 separate countries provided an estimate of the average variability of performance of pupils within a single country: the within-countries variance. The mean of the means for 11 countries was used as a grand total mean. Variance of the 11 national means around this average value provided an estimate of variability from country to country: the between-countries variance. A comparison of the two variance estimates, the one for variability within a country and the other for variability from country to country, provides an index of the magnitude of international differences. Table 2 expresses the variance between countries as a percent of typical variance within a country. The results are shown for boys and girls separately, and for the total group of all pupils.
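
The index just described can be sketched in a few lines. This is a minimal illustration with made-up numbers, not the study's data; the function name is hypothetical, and the use of the population variance (divisor n) for the spread of the national means is an assumption, since the text does not say which divisor was used.

```python
from statistics import mean, pvariance


def between_as_percent_of_within(national_means, national_sds):
    """Between-countries variance as a percent of average within-country variance."""
    # "Crude average" of the within-country variances.
    within = mean(sd ** 2 for sd in national_sds)
    # Variance of the national means around the mean of the means
    # (population form, divisor n, assumed here).
    grand_mean = mean(national_means)
    between = pvariance(national_means, mu=grand_mean)
    return 100.0 * between / within


# Hypothetical example: three countries with equal within-country spread.
example = between_as_percent_of_within([10.0, 12.0, 14.0], [4.0, 4.0, 4.0])
print(round(example, 1))  # 16.7
```

A value of about 15, as for the geography total in Table 2, would therefore mean that the spread of national means is small next to the spread of pupils inside any one country.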
TABLE 2
Variance Between Countries Expressed as a Percent of Average Within-Country Variance

                           Boys and    Boys      Girls
                           Girls       Only      Only

Non-verbal Aptitude        12.1 %      11.5 %    11.7 %

Mathematics   - Part 1      9.9        13.9       7.5
              - Part 2      9.4        11.6       8.8
              - Part 3     11.8        14.0      11.7
              - Part 4     14.3        18.4      12.2
              - Total      16.2        21.2      13.4

Reading Comprehension       6.2         8.1       5.6

Geography     - Part 1     35.8        39.3      37.4
              - Part 2      7.1         7.9       7.8
              - Part 3      5.6         4.4       7.1
              - Total      15.4        17.7      15.9

Science       - Part 1      6.1         7.3       8.0
              - Part 2      2.3         1.5       3.3
              - Total       5.2         5.9       7.1


It is clear that the variation between national means is small in relation to the variability of scores within any one country. National differences represent a minor rather than a major component in these results, and the probability is that they are over-estimated rather than underestimated, because the countries that did relatively well on the tests were in several instances those that were known to have tested an up-graded sample of their populations. We suspect that with truly representative national samples, the differences would have been reduced. Of course, the participants in this survey were all countries with a basically European culture, and with well-developed educational systems. A greater heterogeneity in national cultures and educational levels would very probably increase the national differences, perhaps substantially.
A comparison of the different tests with respect to magnitude of international differences brings out some rather dramatic and surprising results. With these tests and samples, the tests that show the smallest variations from country to country are the tests of science and of reading comprehension. The presumably relatively culture-free non-verbal aptitude test shows about twice as much country-to-country variation as the reading and science tests, and the geography and mathematics tests about two-and-a-half times as much. It must be remembered that all the tests had been translated into eight different national languages: English, Finnish, French, German, Hebrew, Polish, Serbo-Croatian, and Swedish. The fact that a reading test, which would appear to be especially susceptible to changes in difficulty with translation, remained so uniform is rather unexpected. The findings suggest that the nearest thing we have to a culture-fair test may be a carefully translated reading test, and that level of reading ability is the feature with respect to which different educational programs are most nearly uniform.
Several of the tests had sub-tests, and a comparison of the variability between nations on these is of some interest. In the case of mathematics it is the verbal problems (Part 3) and the inductive series (Part 4) that showed the largest international differences. Geographical information (Part 1) showed by a large margin the widest international variation of any test, whereas in map reading (Part 2) the differences were much smaller and in drawing generalizations (Part 3) smaller still. Scientific judgment (Part 2) showed very little variation between countries, and scientific information somewhat more. In some measure, the above results are an outcome of differences in reliability of the sub-tests. If the within-group variance is inflated by measurement errors, the between-group variance will necessarily look small in comparison. However, this accounts for only part of the results, and the major differences appear to arise from more genuine factors.
Generally speaking, the boys varied more from country to country than did the girls. However, the sex differences in this respect were neither large nor entirely consistent.

National profiles

Though variations in the sampling procedure from country to country make comparisons of level of achievement of questionable value, comparisons of patterns of achievement from country to country seem sound and of a good deal of interest. By pattern of achievement we mean a country's achievement on the specific tests and sub-tests, relative to its own over-all level of achievement.
Patterns of achievement were arrived at through the following steps:
(1) For each test a crude average of the national means was computed, and also a crude average of the national standard deviations.
(2) On any one test, such as the test of reading comprehension, each country's average score was converted into a standard score, by subtracting from it the average score for all countries and dividing the result by the average standard deviation for that test.
(3) The average standard score for the five tests (i. e., the total scores) was computed for each country.
(4) This average standard score was subtracted from the standard score on each test and sub-test. That is, each specific standard score was expressed as a deviation from the country's average standard score. In this way, each national group was reduced to a common and comparable base line. It is then possible to examine and compare directly the peaks and hollows of achievement in the different countries.
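
The four steps can be sketched as below. The function name, the dictionary layout and the two-country figures are hypothetical and serve only to show the arithmetic; for brevity the sketch handles total test scores only, whereas the study also expressed sub-test scores as deviations from the same national average.

```python
from statistics import mean


def national_profiles(national_means, national_sds):
    """Return profiles[country][test] in hundredths of a standard deviation.

    national_means[test][country] and national_sds[test][country] hold each
    country's raw-score mean and standard deviation on each total test.
    """
    tests = list(national_means)
    countries = list(next(iter(national_means.values())))
    standard = {c: {} for c in countries}
    for t in tests:
        avg_mean = mean(national_means[t].values())  # step 1
        avg_sd = mean(national_sds[t].values())      # step 1
        for c in countries:
            # Step 2: standard score of the national mean on this test.
            standard[c][t] = (national_means[t][c] - avg_mean) / avg_sd
    profiles = {}
    for c in countries:
        level = mean(standard[c].values())           # step 3
        # Step 4: deviation from the country's own average standard score.
        profiles[c] = {t: round(100 * (standard[c][t] - level)) for t in tests}
    return profiles


# Hypothetical data for two countries and two tests.
means = {"reading": {"A": 12.0, "B": 8.0}, "science": {"A": 10.0, "B": 10.0}}
sds = {"reading": {"A": 4.0, "B": 4.0}, "science": {"A": 2.0, "B": 2.0}}
print(national_profiles(means, sds))
```

Because each country's entries are deviations from its own average, they sum to approximately zero, which is what removes differences in over-all level from the comparison.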


TABLE 3
National Patterns of Achievement
Expressed as standard score deviations from national average on all 5 tests
Belgium

Non-verbal

Aptitude

Mathematics

-Part1
-Part2
- Part 3
-Part4
-Total

-16

-36

-18
-6
-17
-31
-16

-16
-19
21
-40
-7

c46

-38
28
32
23
-18
6

-24
-11
- ia
3
-18

Science

-Part1
-Part2
-Total

-16
5
- 14

7
4
28
43
30
-8

11

19

-58
-12
27
-33

ta
11
-7
4

3
23
7
15

25
16
-15
26

47
-a
-29
20

13
23
16

16
4
9

- 29
13
-21

24
-8
24

-14
-24
-14

Aptitude
-Part1
-Part2
-Part3
- Part 4
-Total

-18
25
-9
2
43
30

Scotland

Sweden

Swik.

-a

12

11

33
19
-16
3
0

0
10

-4
16
-25
-26
-27

-58
-23
-43
23
-45

12
20

92
-31
-65
16

-33
-12
33
-16

-20
12
ia
-5

-9

0
24
7

12
20
12

-43
-39
-43

-44
-3

23

-1

-24

-Part1
- Part 2
-Total

Yugosl.

25

Geography

-Part1
-Part2
-Part3
-Total

U.S.A.

- 28
-43
- 29
- 19
-39

Reading Comprehension

Science

Israel

Combined

Poland

Non-verbal

Germany

-51
-42
- 20
-9
-40

-Part1
- Part 2
-Part3
-Total

Mathematics

France

12

Geography

and Girls

Finland

25
40
34
42
44

Reading Comprehension

c = Boys

England

-1

4
4
-3
7

-9

30

-54
15
35
-16

27
10
16
5

28
24
27

28
24
21

The complete set of national profiles is presented in Table 3. These show results for boys and girls combined. All entries in the table are expressed in hundredths of a standard deviation. That is, the entry 12 for Belgium on the non-verbal test means that Belgium's standard score on that test was twelve hundredths of a standard deviation higher than Belgium's average standard score on all five of the tests. Thus, if we look at the results for Belgium, we see that the pupils in Belgium were most outstanding, relative to their over-all level of performance, in mathematics. Here they show a peak of almost half a standard deviation. They are slightly above their own over-all average on the non-verbal aptitude test, and they do relatively least well on the test of reading comprehension. The sub-test scores show only minor deviations from the total scores.
England, by contrast, performs especially well on the non-verbal aptitude test and is especially weak in mathematics and geography. The geography sub-test dealing with geographical information is notably lower than the map-reading or inference tests.


A similar analysis could be made of the pattern for each country, pointing out points of relative strength or weakness in each. Or the results can be examined from the point of view of each test in turn. This has been done in Figure 1, in which the strength or weakness of each country (relative to its own over-all mean) has been plotted on a common scale. We see that on the non-verbal test the country that does especially well is England. Since the test was English in origin, this result may possibly reflect some degree of previous familiarity with the test, and acceptance of the task as a reasonable and sensible one. Scotland also does well on the test, while Germany and Finland perform poorly on it.
On the mathematics test, all the French-speaking countries are superior performers, with Belgium leading the way. Poland also shows up to advantage. The English-speaking countries are consistently poor. One wonders what part of this is contributed by their complex system of denominate numbers. Yugoslavia also has marked difficulty with this test.
National differences in reading comprehension are relatively small. It is on this test that Yugoslavia shows up to best advantage, followed by Scotland and Finland, while Belgium and Poland do relatively poorly.
On the geography test we find Germany, Israel and Poland leading the way, and their superiority is especially marked in that section of the test dealing with geographic information. The English-speaking countries do notably poorly on the geography test as a whole, and especially on the sub-test dealing with geographical facts and information. This is an area in which the different national curricula appear to have produced distinctly different results.
Science is an area in which the French-speaking countries are relatively weak. Here the leaders in relative achievement are the United States and Germany, with Yugoslavia and England following in that order.
Some countries show rather marked peaks and hollows in their profiles. Thus, England is very high on non-verbal aptitude and very low in mathematics and geography. Belgium is high on mathematics and quite low in reading. Others show a notably even pattern of performance. The best example is Sweden, which performs at almost the same level of excellence on all the tests.
The patterns of relative strength and weakness provide a picture of achievement under the different educational systems. They provide no explanation of how the differences come into being. This must be contributed by the investigator who is intimately acquainted with the educational systems in the several countries. However, the data presented here must still be considered quite tentative. They are limited by (1) the local and only partially representative character of many of the national samples, (2) the brevity of the tests and especially the sub-tests, and (3) the limited opportunity to plan test content so as to assure the most balanced and appropriate representation of content and objectives.
The results reported so far are for boys and girls combined. It is of some interest to look at the results for the sexes taken separately, and this is done for the five total tests in Table 4. Scores for boys and girls are each expressed as deviations from the average of all five tests for that sex in that country. That is, the score of 11 for Belgian boys on the non-verbal test means that the Belgian boys were eleven-hundredths of a standard deviation higher on that test than they were on the average of the five tests.


FIGURE 1
Relative Achievement of National Groups on Tests
[Chart with one panel each for Non-verbal Aptitude, Mathematics, Reading Comp., Geography and Science; in each panel the twelve countries are placed on a common scale according to their strength or weakness relative to their own over-all mean.]

TABLE 4
Sex Differences in Pattern and Level of Achievement

               Average    Non-Verbal    Mathematics    Reading Comp.    Geography      Science
               B-G          B     G       B     G        B     G         B     G       B     G

Belgium          32         11    13      42    46      -35   -12       -16   -21      -1   -25
England          22         42    50     -42   -37       -2    25       -30   -36      31     0
Finland          24        -47   -28      -2    13       11    27         7     0      31   -14
France           15         -7   -22      34    27      -27     6         5    23      -5   -32
Germany          32        -39   -33     -18   -14       -4    10        29    23      32    16
Israel           18          3    17     -10    -4      -17    -3        20    21       5   -32
Poland           37        -22   -17      22    34      -34   -18        15    16      20   -15
Scotland         14         16    33     -44   -34       14    33       -11   -23      22   -11
Sweden            6        -21     9      -4     6       -8     7         1   -11      33    -9
Switzerland      24          3    21      17    22       -8    16         6     9     -16   -68
U.S.A.          -11         12    11     -35   -18       -8    20       -14   -18      47     7
Yugoslavia       17        -12    -6     -56   -34       25    35         8     1      37     4

Average          19         -5            -8             -8    12              -1      20   -15


The first column of Table 4 shows a different kind of a finding. In this column, the average standard score on all five tests for boys and for girls is compared, country by country. Thus, in Belgium on the average of all five tests the boys fell 0.32 standard deviation units above the girls. An examination of this column, headed Average B-G, shows the extent of male superiority in a pooled average performance, country by country. Thus, there is only one country in which the girls surpassed the boys in total performance: the United States. Differences between the two sexes were small in Sweden and Scotland. Largest differences were in Poland, Germany and Belgium. These results provide some clue as to the comparability of educational opportunity and motivation for the two sexes in different countries. On average, over all countries and tests, the boys fall about a fifth of a standard deviation above the girls.
An examination of results for the different tests shows that girls perform best, relatively speaking, on the reading test, and least well on the test of science. This pattern is a universal one, appearing in each one of the 12 countries. We appear to have here a universal and quite stable sex characteristic. There is also a small, but rather consistent tendency for the girls to do relatively better on the non-verbal test (10 countries) and the mathematics test (11 countries). Differences in the geography test were small and inconsistent. All of these differences in relative performance on specific tests appear, of course, after adjusting for the 0.19 standard deviation difference in average performance of boys and girls.
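
The country counts quoted above can be checked with a short tally. The figures below are the boys' and girls' relative scores for the twelve countries, in the order Belgium to Yugoslavia, as read from Table 4; that reading of the printed table is an assumption of this sketch, and the names used are hypothetical.

```python
# Relative scores in hundredths of a standard deviation, as read from Table 4
# (an assumption of this sketch). Each entry is (boys, girls); the country
# order is Belgium, England, Finland, France, Germany, Israel, Poland,
# Scotland, Sweden, Switzerland, U.S.A., Yugoslavia.
TABLE_4 = {
    "Non-verbal": ([11, 42, -47, -7, -39, 3, -22, 16, -21, 3, 12, -12],
                   [13, 50, -28, -22, -33, 17, -17, 33, 9, 21, 11, -6]),
    "Mathematics": ([42, -42, -2, 34, -18, -10, 22, -44, -4, 17, -35, -56],
                    [46, -37, 13, 27, -14, -4, 34, -34, 6, 22, -18, -34]),
    "Reading": ([-35, -2, 11, -27, -4, -17, -34, 14, -8, -8, -8, 25],
                [-12, 25, 27, 6, 10, -3, -18, 33, 7, 16, 20, 35]),
    "Geography": ([-16, -30, 7, 5, 29, 20, 15, -11, 1, 6, -14, 8],
                  [-21, -36, 0, 23, 23, 21, 16, -23, -11, 9, -18, 1]),
    "Science": ([-1, 31, 31, -5, 32, 5, 20, 22, 33, -16, 47, 37],
                [-25, 0, -14, -32, 16, -32, -15, -11, -9, -68, 7, 4]),
}


def countries_where_girls_lead(boys, girls):
    """Number of countries in which the girls' relative score exceeds the boys'."""
    return sum(1 for b, g in zip(boys, girls) if g > b)


counts = {test: countries_where_girls_lead(b, g) for test, (b, g) in TABLE_4.items()}
print(counts)
```

On these figures the girls lead in relative terms in 10 countries on the non-verbal test, 11 on mathematics, all 12 on reading and none on science, which agrees with the counts given in the text.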

Achievement in relation to parental education or occupation

An attempt was made to get information on parental education or father's occupation for the children in each country. However, there was very real difficulty in getting comparable data for different countries. Pressures and sensitivities differ from country to country, so that in some it is possible to get information about education and in others it is possible to get information about occupation, but it is rarely possible to get both. Furthermore, the differences in educational structure in different countries make it difficult to establish classifications that will be comparable from country to country. However, the operation was carried out as well as could be done, and some comparisons are presented in Tables 5-10.

The basic unit is the average standard deviation of boys and girls combined, averaged for 11 countries.

TABLE 5
Percent of Pupils with Fathers at Different
Levels of Education or Occupation
B
Level of fathers educ.

Belgium

Elementary only
Some secondary
Secondary completed
Some college
College

England

45
32
14
*
t

completed

College

33
44
15
*
*

completed

B
Germany

Fathers occupation

Unskilled, semi-skilled
Skilled, farmer
Clerical, sales
Sub-professional
Prof. & managerial

*
77
11
7

24
55

19
37
27
14

.
74
,*
9

78
14

Too few to justify computation

21
60
14
*

22
34
30
11

10
43
27
9
10

Sweden

t
76
12
7

Israel

25
37
9
9
16

Poland

G
Elementary only
Some secondary
Secondary completed
Some college

France

50
26
15
5
+

Yugoslavia

58
21
9
4
9

S
4

78

55
23
11
c

78

10

16

Scotland

U.S.A.

SWit2.

Germany

22
30
18
*
23

26
42
10
5
15

Israel

8
L

Scotland

SVMZ.

8
45
27

49
30
14

21
35
17

10

17

of means.

Table 5 shows the percent of cases with fathers at different levels of education or occupation.
It is clear from this table that the different national samples were not comparable with respect
to distribution of education or occupation of fathers. What is not clear is the extent to which
these sample differences reflect similar differences in the total national population and the
extent to which they reflect biases in the specific sample tested in that country. Thus the Scottish
sample showed twice as many unskilled and semi-skilled workers as the German sample,
two-and-a-half times as many as the Swiss, and five times as many as the sample from Israel. How
shall we understand this? Examination of the sampling procedure brings out that the Israeli
sample was limited to that segment of the population who were of European origin, that the
Swiss sample was limited to Geneva, while several fee-paying schools were excluded from the
Scottish sample. Thus, in part at least the differences between nationalities appear to reflect
differences in sampling. However, it is also probably true that differences reflect in part actual
national differences, especially in the amount of schooling. Thus, the differences in educational
level of parents in Sweden and in Yugoslavia are certainly at least in part a reflection of the
past educational level prevailing in the two countries. And to come from a family in which the
father has completed secondary education certainly signifies a less outstanding experience in the
United States than in most European countries, where such a level of education is still the exception
rather than a typical event. However, in education also the figures suggest that the sample may
be non-representative of the total population in some countries. Thus, a Polish sample in
which 40% of fathers have completed secondary education hardly seems representative of the
total Polish population in the age range 40-50.

Comparisons of national sub-groups in which the level of parental education or occupation is
uniform from country to country are almost certainly more meaningful than comparisons of total
national groups, though they have some limitations as pointed out in the preceding paragraphs.

Table 6 shows comparative data for the non-verbal aptitude test. Average score is quite clearly
related to average level of father's education, and to a somewhat lesser degree to father's
occupation. Some fairly substantial differences between countries remain, when comparisons
are restricted to those of a common level of parental education or occupation. However, it should
be noted that the countries that show up least favorably in such a comparison, as of those with
some high school education, are countries like Sweden and the United States in which almost
all parents had received at least that much education - i. e., where education had been most
nearly universal and non-selective.

TABLE 6
Non-Verbal Test Averages by Level of
Fathers Education or Occupation
B
Level of fathers educ.

Elementary only
Some secondary
Secondary completed
Some college
College completed

France

Poland

-6
23
50
-

33
96
115

- 24
-6
-

0
23
23
40

-34
-14
9
-

Unskilled, semi-skilled
Skilled, farmer
Clerical, sales
Sub-professional
Prof. & Managerial

-60
-33

18
69
49
B

Fathers occupation

England

G
Elementary only
Some secondary
Secondary completed
Some college
College completed

Belgium

Sweden

-93
-59
-12
23

- 26
24
-

S
- 26
16
-

-65

7
L

Yugoslavia

U.S.A.

-52
-

-25
-19
3

12

-76
- 14
24
-

Germany

Israel

Scotland

SWltZ.

Germany

- 1
-8
15
24
59

- 3
36
48
74
79

-12
28
59
58
-

29
41
77
53

- 28
-34
-18
-13
39

-92
-70
-63
-25

Israel

4
33
55
50
86

Scotland

0
41
37
-

SWlb.

24
60
39
53

29

TABLE 7
Mathematics Test Averages by Level of
Fathers Education or Occupation
B
Level

of fathers

educ.

Belgium

England

23
56
88
-

-58
38
43
-

Elementary only
Some secondary
Secondary completed
Some college
College completed

France

11
52
-

5
10
55
-

College

completed

-64
-8
-48

occupation

Sweden

U.S.A.

-37
IO
-

-107

-124

22
33
51
-

-31
13

56

Israel

Scotland

Unskilled, semi-skilled
Skilled, farmer
Clerical, sales
Sub-professional

15
11
26
61

-7
25
34
62

-74
-31
-7
15

Prof. & Managerial

87

71

Switz.

-148
-86
-43
-9

-70
-30

-143
- 79
- 59
-

-43
8
-

Germany

Yugoslavia

-12
19
32
-

Fathers

46
70
69
71
G

Elementary only
Some secondary
Secondary completed
Some college

Poland

Germany

Israel

Scotland

77
52
72
-

-7
-14
30
3

-22
12
37
49

-71
-28
-23
-

44
61
38
-

88

42

55

36

Switz.

TABLE 8
Reading Test Averages by Level of Fathers
Education or Occupation
B
Level

of fathers

educ.

Elementary only
Some secondary
Secondary completed
Some college
College completed

Belgium

England

-47
-25
-12
-

France

-53
-25
-

-20
53
91
-

College

completed

-3
39
20

-61
-39
-11
-

S
Sweden

-13
12
17
18
I

occupation

Unskilled, semi-skilled
Skilled, farmer
Clerical, sales
Sub-professional
Prof. & Managerial

30

Germany

26
22
57
90
96

Israel

-11
21
28
55
67

-65
4
38

-45
22
-

40

12

BOYS
Fathers

Yugoslavia

-96

-37
-19

2
7

U.S.A.

-38
-4
-

-30
24

-48

Poland

G
Elementary only
Some secondary
Secondary completed
Some college

-80
1

Switz.

Germany

-13
21
58
78
-

42
28
38
63

19
7
32
47
86

-8
51

6
-

Scotland

-66
-16

Israel

-20
19
35
56
62

57
L

Scotland

-2
48
41
-

S
Swttz.

7
49
53
51

TABLE 9
Geography Test Averages by Level of
Father's Education or Occupation
B
Level

of fathers

educ.

Elementary only
Some secondary
Secondary completed
Some college
College completed

Belgium

England

-33
-2
18
-

-44

31
52
-

College

completed

-60
-58
- 21
-

-68
-17
-23

Fathers

occupation

Germany

Unskilled, semi-skilled
Skilled, farmer
Clerical, sales
Sub-professional
Prof. & Managerial

-25

-14

50

15
56
71
97
95

2
12
42
-

-51
-1

-113

31

-40
-3
39
74
TABLE

32

-92
-53
-40
-

-46
7
-

Germany

52
50
69
53

G
Switz.

5
0

-72
-34

-47

Yugoslavia

-132

Scotland

Israel

61
59
83
88
139

U.S.A.

Sweden

45
63
70

14
8
-

-22
18
25
-

Poland

G
Elementary only
Some secondary
Secondary completed
Some college

France

Israel

23
25
64
35
94

-18

Scotland

2
41
54
75
76

Switz.

-53
-22
-18
-

8
46
32
46

10

TABLE 10
Science Test Averages by Level of
Father's Education or Occupation
B
Level

of fathers

educ.

Elementary only
Some secondary
Secondary completed
Some college
College

completed

Belgium

England

-19
18
12
-

16
95
99

-70
-57
-14
-

-35

-36
16
15
B

Fathers

occupation

Unskilled, semi-skilled
Skilled, farmer
Clerical, sales
Sub-professional
Prof. & Managerial

Germany

Israel

Yugoslavia

4
-

6
50

69

70

-41
-

-71

-38

-34
-16
8
15

70

-73
-80
-56
-15

- 19

5
-

22
-

G
Switz.

-44
-7
23
-

Scotland

U.S.A.

Sweden

56
66
60
-

-66
-38
-44
Y

Poland

G
Elementary only
Some secondary
Secondary completed
Some college
College completed

France

Germany

Israel

Scotland

Switz.

-42
-5

-42

87
61
96
90

1
45
57
73

5
32
58
50

40
17
42
-

25
30
26
25

102

87

40

54

1
32

-6
-

-62
-25
-35
-

-41

31

Tables 7-10 show results for the mathematics, reading, geography and science tests respectively.
The relative performance
of different countries, and of boys and girls, differs on the different
tests, reflecting the national and sex differences in patterns of achievement previously discussed
in connection with Tables 3 and 4. However, the patterns of achievement in relation to education
or occupation of father are much the same from test to test. A crude pooling of results from
different tests and countries yields the following results:

                                Boys    Girls
Father's education
  Elementary only                -36      -58
  Some secondary                  -2      -27
  Secondary completed             36        1
  Some college                    36       12
  College completed               41       13
Father's occupation
  Unskilled, semi-skilled         14      -10
  Skilled, farmer                 28       18
  Clerical, sales                 51       25
  Sub-professional                66       36
  Professional                    79       50

There is a difference of roughly three-fourths of a standard deviation in average score between
the lowest educational category and the highest, a range of about two-thirds between extreme
occupational groupings. Though the gradient seems to be a little less steep for girls than for
boys, the general pattern is quite similar for the two sexes. The gradient is common to most of
the countries studied, but there are one or two exceptions. Poland shows relatively little
difference associated with level of father's education, Switzerland relatively little associated with
father's occupation. One can only speculate on whether this represents some peculiarity in the
specific sample or whether school achievement in these countries is less related to parental
education or occupation.

Additional tabulations were carried out by size of community in which the pupil resided.
Communities were classified into those of under 2,000, those of 2,000 to 20,000, and those of over
20,000 inhabitants. However, there were certain ambiguities in this coding from country to
country. It was not entirely clear whether place of residence meant the community in which the
school was located, or the immediate community in which the pupil had his home. Thus, in some
countries, farm children living in quite rural areas were apparently coded as coming from
communities of two to twenty thousand because that is where they attended school. There was
also no systematic attempt to have the rural areas in the sample be representative of all rural
areas in the country and the urban areas be representative of all urban areas. Thus, in the United
States the primarily rural area was from a rather prosperous mid-western farming area, whereas
the two urban communities were primarily rather undistinguished industrial centers.

The average result over all tests is summarized in Table 11. In general, though with some exceptions, those in very small communities did slightly less well (by about one-fifth of a standard
deviation, on the average) than those in larger communities, but no differences were found between the two categories of larger communities. A comparison of the different tests, as averaged
over all countries, suggests that the differences associated with size of community are greatest
in the case of mathematics and reading, least for the non-verbal test. Results for the separate
tests are shown in Table 12.

Average

TABLE 11
Score by Size of Community
Pooled Results on All Tests

B
Under

2C0J

2000 - 20 oco

for

G
Over

20 000

England

-15

Finland
France
Germany
Poland
Scotland
Sweden
U.S.A.
Yugoslavia

31
21
-4
-44
- 34
- 94

-22
56
42
43
2
- 20
-37
-

-11
58
58
-8
-13
- 33
- 30

Crude Average

- 20

Under

2000

I
2000

- 20 000

S
Over

20 000

-32

- 20

2
- 23
- 33
-48
-6
-99

-51
18
11
6
2
-18
-15
-50

-11

-34

-13

-14

- 24
22
26
-9
- 29
- 26
-58

No data on size of community available for Belgium.


Different

system

of categories

used

for

Israel.

TABLE

12

Average Score by Size of Community for Each Test


Based on Pooled Data from All Countries
B
Under

Non-Verbal
Mathematics
Reading
Geography
Science

Item difficulties:

Zoo0

-17
-44
- 27
-22
15

resemblance

2000 - 20 CM

-2
0
-3
5
42

G
Over

20 000

-17
-8
4
-3
27

Under

2000

- 26
-54
-31
- 28
-31

2000

- 20 000

-16
-20

Over

20 000

-20
-13
0

-14
-15

3
-11
- 25

Item difficulties: resemblance between countries

In addition to analyzing scores on tests and sub-tests, it was also possible to study responses to
specific items country by country. The original tabulations showed the frequency with which each
wrong option to an item was selected as well as the frequency of correct response. However,
most of the analyses to be reported here deal only with the correct responses. These are studied
from two points of view. First, correlations are presented showing the degree of consistency of
item difficulty from country to country. Secondly, certain special groups of items are examined to
throw more light on certain elements of content that are especially easy or difficult in different
countries.

Tables 13-16 show the correlations of item difficulty among eleven countries (excluding
Yugoslavia). The correlations are over the population of items. A high correlation signifies that the
same items are difficult and the same ones easy for the pair of countries in question.

The first thing that impresses one as one scans Tables 13-16 is the generally substantial
correlations across countries. The average correlation is .87 for mathematics, .87 for reading,
.68 for geography, and .72 for science. The high correlations for mathematics and reading are
especially impressive. A difficult item is a difficult item in these two tests, regardless of the
school system in which the pupil has been educated or the language in which his schooling has
been couched. The reading test is particularly noteworthy, because the differences between
countries are small both in level of average score and in the relative difficulty of different items.
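
A correlation "over the population of items" can be illustrated with a short sketch. The report does not say whether the correlations were computed on percent-correct values or on transformed values, so the function below simply assumes a Pearson product-moment correlation on whatever difficulty index is supplied; the two lists of percentages are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical percent-correct values for the same five items in two countries.
country_a = [82.0, 64.0, 45.0, 71.0, 30.0]
country_b = [78.0, 60.0, 50.0, 66.0, 25.0]

# A value near 1 means the same items are hard (and easy) in both countries.
r = pearson(country_a, country_b)
```

Each entry of Tables 13-16 would correspond to one such value for one pair of countries, averaged over the results for boys and for girls.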

TABLE 13
Correlations* of Item Difficulties Between Countries
Mathematics Test

                     1   2   3   4   5   6   7   8   9  10  11
 1. Belgium          -  86  92  95  92  90  84  84  90  97  80
 2. England         86   -  89  78  93  88  69  98  92  89  90
 3. Finland         92  89   -  90  95  96  76  89  90  95  92
 4. France          95  78  90   -  85  86  75  76  84  91  75
 5. Germany         92  93  95  85   -  93  78  94  95  95  91
 6. Israel          90  88  96  86  93   -  75  87  86  93  89
 7. Poland          84  69  76  75  78  75   -  71  76  83  60
 8. Scotland        84  98  89  76  94  87  71   -  93  88  92
 9. Sweden          90  92  90  84  95  86  76  93   -  92  91
10. Switzerland     97  89  95  91  95  93  83  88  92   -  86
11. U.S.A.          80  90  92  75  91  89  60  92  91  86   -

* Average of correlation for boys and for girls.

TABLE 14
Correlations* of Item Difficulties Between Countries
Reading Test

                     1   2   3   4   5   6   7   8   9  10  11
 1. Belgium          -  88  89  98  83  87  87  85  92  96  89
 2. England         88   -  88  86  86  91  82  98  92  85  96
 3. Finland         89  88   -  84  88  84  86  85  91  85  89
 4. France          98  86  84   -  80  84  85  83  91  94  86
 5. Germany         83  86  88  80   -  84  81  87  89  81  88
 6. Israel          87  91  84  84  84   -  88  90  85  83  90
 7. Poland          87  82  86  85  81  88   -  82  82  82  86
 8. Scotland        85  98  85  83  87  90  82   -  88  84  94
 9. Sweden          92  92  91  91  89  85  82  88   -  89  91
10. Switzerland     96  85  85  94  81  83  82  84  89   -  85
11. U.S.A.          89  96  89  86  88  90  86  94  91  85   -

* Average of correlation for boys and for girls.

TABLE 15
Correlations* of Item Difficulties Between Countries
Geography Test

                     1   2   3   4   5   6   7   8   9  10  11
 1. Belgium          -  63  69  93  67  62  56  70  67  90  68
 2. England         63   -  67  61  74  54  26  95  82  60  89
 3. Finland         69  67   -  67  84  77  55  74  82  77  72
 4. France          93  61  67   -  64  55  54  67  65  86  70
 5. Germany         67  74  84  64   -  84  54  77  80  76  74
 6. Israel          62  54  77  55  84   -  66  64  71  62  58
 7. Poland          56  26  55  54  54  66   -  35  40  49  31
 8. Scotland        70  95  74  67  77  64  35   -  85  65  88
 9. Sweden          67  82  82  65  80  71  40  85   -  70  84
10. Switzerland     90  60  77  86  76  62  49  65  70   -  66
11. U.S.A.          68  89  72  70  74  58  31  88  84  66   -

* Average of correlation for boys and for girls.

TABLE 16
Correlations* of Item Difficulties Between Countries
Science Test

                     1   2   3   4   5   6   7   8   9  10  11
 1. Belgium          -  73  73  94  89  84  46  68  70  90  72
 2. England         73   -  70  68  78  84  37  95  72  75  83
 3. Finland         73  70   -  70  78  81  64  65  79  77  75
 4. France          94  68  70   -  84  79  39  63  67  84  63
 5. Germany         89  78  78  84   -  88  48  72  78  88  80
 6. Israel          84  84  81  79  88   -  51  82  88  79  84
 7. Poland          46  37  64  39  48  51   -  37  57  47  53
 8. Scotland        68  95  65  63  72  82  37   -  71  68  80
 9. Sweden          70  72  79  67  78  88  57  71   -  64  79
10. Switzerland     90  75  77  84  88  79  47  68  64   -  78
11. U.S.A.          72  83  75  63  80  84  53  80  79  78   -

* Average of correlation for boys and for girls.

The correlations in Tables 13-16 appear to show a certain amount of clustering. In order to bring
out the structure more clearly, a factor analysis of the correlation tables was carried out. The
rotated factor loadings (by varimax rotation) are shown in Tables 17-20. Countries have been
rearranged in the tables to bring out most clearly the clusters.

TABLE 17
Mathematics Test
Loadings of Rotated Factors

                          Factor
                   1      2      3      4      5
Belgium           96    -18     17     07    -11
France            90    -09     27     00    -14
Switzerland       98    -10     12     04    -01
Poland            80    -38     06     02     10
Finland           97     08     12    -11     06
Israel            95     05     08    -21     02
Germany           98     00    -08    -02     07
Sweden            96     03    -09     17     06
England           94     08    -26    -02    -12
Scotland          94     09    -29     04    -03
U.S.A.            91     37    -05     06     11

% of variance   87.8    3.2    2.8    0.9    0.7

TABLE 18
Reading Test
Loadings of Rotated Factors

                          Factor
                   1      2      3      4      5
Belgium           96    -25     05    -01     04
France            94    -31    -02    -01     08
Switzerland       93    -26    -03     05    -10
Poland            90     00     10    -26     05
Israel            93     09    -10    -19    -02
Germany           90     17     12     01    -12
Finland           93     07     23     00     05
Sweden            95    -01     11     16     04
U.S.A.            96     13    -06     05     09
England           96     16    -14     12     06
Scotland          95     18    -22     05    -07

% of variance   87.9    3.1    1.4    1.6    0.5

TABLE 19
Geography Test
Loadings of Rotated Factors

                          Factor
                   1      2      3      4      5
Belgium           81    -11    -53    -04    -05
France            79    -06    -53     02     09
Switzerland       83    -05    -38     27    -13
Poland            50    -60    -17    -03     04
Israel            78    -48     16     05    -06
Germany           89    -20     14     12    -18
Finland           88    -19     08     22     06
Scotland          92     16     10    -28    -02
England           89     29     13    -25    -03
U.S.A.            89     25     04    -08     15

% of variance   62.0    7.8    7.4    2.6    0.8

TABLE 20
Science Test
Loadings of Rotated Factors

                          Factor
                   1      2      3      4      5
Belgium           92     25    -13    -14    -14
France            87     29    -17    -09    -18
Switzerland       91     11     00    -32     06
Germany           94     13    -06    -03     11
Israel            95    -02    -02     21     04
Poland            57     07     48     05    -04
Finland           86     06     33     00     06
Sweden            86    -01     16     37     04
U.S.A.            87    -21     12     05     12
England           89    -40    -12    -06     01
Scotland          85    -44    -09     04    -09

% of variance   75.3    5.3    4.1    2.9    0.9

The most striking feature of Tables 17-20 is the large proportion of variance accounted for by
the first factor. This can be thought of as a general factor of difficulty determined by the content
of the item and independent of country. Loadings on later factors, corresponding to sub-groups
of country, are quite small and account for only a minor fraction of the variance. The later factors
seem to be bi-polar factors in most instances, discriminating one sub-group of countries from
another.
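
The "% of variance" row in Tables 17-20 can be checked against the loadings themselves. The sketch below assumes the usual convention that a factor's share of the total variance is the sum of its squared loadings divided by the number of variables (here, countries); the report does not state the formula, so this is an interpretation. The loadings are those printed for Factor 3 of the mathematics test in Table 17, with decimal points restored.

```python
def percent_variance(loadings):
    """Share of total variance carried by one factor, in percent.

    Assumes standardized variables, so each variable contributes unit variance.
    """
    return 100.0 * sum(a * a for a in loadings) / len(loadings)


# Table 17, Factor 3 (Belgium ... U.S.A.), decimal points restored.
math_factor_3 = [0.17, 0.27, 0.12, 0.06, 0.12, 0.08,
                 -0.08, -0.09, -0.26, -0.29, -0.05]

share = percent_variance(math_factor_3)
```

Rounded to one decimal, the result agrees with the 2.8 per cent printed under Factor 3 in Table 17.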
In mathematics, Factor 2 involves primarily difference between Poland and the USA, while Factor
3 involves difference between the French-speaking and the English-speaking countries. The other
factors are of no consequence.

The only factor beyond the first that seems to amount to anything in the reading test is Factor 2,
which again discriminates French-speaking from English-speaking countries.

Factors 2, 3 and possibly 4 appear to amount to something in the case of the geography test.
Factor 2 contrasts a cluster composed of Israel, Poland, Germany and Finland with the
English-speaking countries. Factor 3 groups together the French-speaking countries. Factor 4
contrasts England and Scotland with Switzerland and Finland.

On the science test, Factor 2 seems once again to be the French-speaking versus
English-speaking factor. Factor 3 links together Poland and Finland, while Factor 4 separates
Switzerland from Sweden.

The language groupings represented in these factors after the first make sense at least in that
the test remains completely uniform for all countries using the same language. It is also quite
possible that educational patterns are more alike within language groups. A knowledgeable
person may see some rationale underlying the other groupings and polarities that is not apparent
to the author.
Item difficulty: special groups of items

Examination of item content suggests certain hypotheses concerning groups of items that can
be tested by examining item difficulty in different countries. In this section, several of these
hypotheses will be stated and examined. The basic data in terms of which these hypotheses are
examined are item difficulty deviation values. The procedures for deriving these values are
stated below.
1. The percent right on each item in each country was first transformed into a normal deviate,
using tables of the normal curve.
2. The average scaled value was obtained for each item over all countries, and the scaled value
in any one country was expressed as a deviation from this. This procedure brought all items to
a common base line, so that the deviation value for a given country had comparable meaning
from item to item.
3. The average deviation value over all items on a single test was computed for each country,
and this average deviation value was subtracted from the single item deviation values. This
procedure eliminated differences in average level of performance between countries, and left
a residual deviation value, expressed in standard-score units, that indicated how much harder or
easier that item was for that country than would be expected on the basis of average difficulty
of the item and average performance level in the country.
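
The three steps can be sketched in a few lines. This is an illustration under stated assumptions, not the original computation: it uses the inverse of the standard normal distribution function in place of printed tables of the normal curve, it treats a higher deviate as an easier item, and the function name and the input layout (a dictionary of per-country lists of percent-right values) are hypothetical.

```python
from statistics import NormalDist


def residual_difficulties(percent_right):
    """Map {country: [percent right per item]} to residual deviation values.

    Percentages must lie strictly between 0 and 100, since the normal
    deviate is undefined at the extremes.
    """
    inverse_normal = NormalDist().inv_cdf
    countries = list(percent_right)
    n_items = len(percent_right[countries[0]])

    # Step 1: percent right -> normal deviate (higher value = easier item).
    z = {c: [inverse_normal(p / 100.0) for p in percent_right[c]]
         for c in countries}

    # Step 2: express each value as a deviation from the item's mean
    # over all countries.
    item_mean = [sum(z[c][i] for c in countries) / len(countries)
                 for i in range(n_items)]
    deviation = {c: [z[c][i] - item_mean[i] for i in range(n_items)]
                 for c in countries}

    # Step 3: remove each country's average deviation over all items.
    residual = {}
    for c in countries:
        country_mean = sum(deviation[c]) / n_items
        residual[c] = [d - country_mean for d in deviation[c]]
    return residual


# Hypothetical percent-right values for three items in three countries.
sample = {
    "A": [80.0, 55.0, 30.0],
    "B": [70.0, 60.0, 20.0],
    "C": [90.0, 40.0, 35.0],
}
residuals = residual_difficulties(sample)
```

By construction the residuals sum to zero within each country and within each item, which is what allows a single item, or a small group of items, to be read as relatively easy or hard for one country.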
Hypothesis 1

Decimal fractions will be relatively easy in the French-speaking countries, and common fractions
in the English-speaking countries.

The mathematics test includes four items dealing directly with the manipulation or understanding
of decimal fractions and three dealing with common fractions. The residual difficulty indices for
these were averaged over the items and for boys and girls, and are shown below for eleven
countries. We show the average residual for decimal fraction items, the average residual for
common fraction items, and the difference between them.
                Decimals   Fractions   Dec.-Fract.
Belgium             18        -10          28
England            -46         10         -56
Finland             13        -10          23
France              30        -71         101
Germany             01         14         -13
Israel             -11         13         -24
Poland              08        -07          15
Scotland           -42         19         -61
Sweden              18        -03          21
Switzerland         10         03          07
U.S.A.              01         44         -43

From the tabulation, we see that for Belgian children the difference in residuals favors the decimal
items to the extent of about 28/100 of a standard deviation on our normal deviate scale, on the
average. By contrast, for English children, the fractions items are relatively easier by 56/100 of
a standard deviation. In general, our hypothesis is supported because the large differences
favoring fractions are all for the English-speaking countries, and all of the French-speaking
countries show a difference favoring decimals. However, the difference between France on the
one hand and Belgium and Switzerland on the other is also quite striking. French children of
this age appear to be especially weak on items dealing with fractions.

The major differences which we find in these groups of items are explicable in terms of the
systems of weights and measures in the countries involved, and a corresponding curricular
emphasis. The English, Scots and Americans have so many measures that go by 3s, 4s, or 12s
that they must spend instructional time on denominate numbers and the fractions that go with
them. The continental countries, relying almost entirely on a decimal metric system, can
concentrate on decimal fractions, and give other types of fractions only limited emphasis. The results
suggest that this is what has taken place.
Hypothesis 2

National differences will be greater in the relative difficulty of items dealing with the geography
of specific places than in geographical information relating to the world as a totality.

It was possible to identify in the geography test five items dealing with facts about specific
places and five others dealing with concepts of latitude, longitude, and time. Average residual
values were computed for each of these sets, and are shown below.

                Specific Places   Latitude & Longitude
Belgium               -16                  9
England               -31                 -5
Finland                 3                 -4
France                -18                  6
Germany                10                 -5
Israel                 50                 -9
Poland                 82                 54
Scotland              -11                -10
Sweden                -13                -17
Switzerland           -12                -11
U.S.A.                -45                 -7

In general, the hypothesis is supported by the data. The largest average residuals tend to relate
to specific place geography. Israel and Poland perform notably well in these items, and England
and the United States relatively poorly. The items on latitude and longitude show generally
smaller residuals, only Poland performing on these items somewhat better than would be
expected in the light of her performance on the total test. The results suggest fairly wide national
differences in emphasis on the teaching of specific geographic facts; at least they are learned to
different degrees.
Hypothesis 3

Reading test items will be easier for pupils of the country and language from which the passage
and items originally come than for other countries and especially those speaking a different
language.

The reading test was composed of five passages, two of which had originally been in French
(one from Belgium and one from France), two in German, and one in English (from the United
States). It seems plausible, at least, that these might be easier in the language in which they
had been written and for the national culture for which they had been designed. Therefore, the
residuals were computed separately for each passage for each of the English-speaking,
French-speaking and German-speaking countries. (In Belgium and Switzerland testing was limited to
French-speaking parts of the country.) Results are shown below.
                Passage     Passage     Passage     Passage     Passage
                No. 1       No. 2       No. 3       No. 4       No. 5
                (Belgium)   (France)    (Germany)   (USA)       (Germany)
Belgium            00          07          05         -07         -06
England           -06         -04          01         -03          13
France             12          13         -07         -10         -09
Germany           -08         -19          04          10          12
Scotland          -11         -13          04          07          11
Switzerland        35         -09          04         -15         -10
U.S.A.            -13         -06          05          11          02

Examination of the table shows that the hypothesis is in general supported. More generally, the
passages of French language origin seem to be slightly easier for the French-speaking countries
and the passages of either English or German origin for the English- and German-speaking
countries. Thus, the precaution of choosing original passages from different language sources appears
to have been a sound one. However, even though the differences appear in a fairly consistent
pattern, they are generally of very small size. The task difficulty seems to transcend language in
considerable measure. Thus, once again, the universality of the reading task is affirmed.

Hypothesis 4

National groups show consistent differences in willingness to express certainty about a
conclusion.

The second part of the science test consists of five statements, for each of which the pupils must
pick one of the five choices: Definitely True, Probably True, Impossible to Determine, Probably
False, and Certainly False. The keying of each item was based on the pooled judgment of several
faculty members at Teachers College, Columbia University, New York, where the test was
constructed.

Our current interest is in the nature of the erroneous answers. Over the set of items, an answer
can be wrong in any one of these ways:

(1) An examinee can be too sure. That is, he can mark an item definitely true or false when
it is keyed only probably true or false, or he can choose one of the four other alternatives when
he should mark the item indeterminate.
(2) An examinee can be too cautious. That is, he can mark an item as probably true or false
or as indeterminate when he should have marked it definitely true or false, or he can mark the
item indeterminate when he should have marked some other choice.
(3) He can be grossly in error. That is, he can mark an item on the true side when he should
have marked it on the false side, or vice versa.

For the set of 5 items, there were 20 wrong response options. Of these, 6 were of Type 1, 6 of
Type 2, and 8 of Type 3. We have examined the results for the different countries to see what
proportion of the choices fell on each of these types of error each time the opportunity offered.
That is, we have divided the total number of errors of a given type by the total number of
opportunities to make that category of error. We have also determined the ratio of too sure
errors to too cautious errors, providing an index of readiness or reluctance to jump to
conclusions. The results are shown by sex for 10 national groups*.
                     %          %           %          Index of
                     Too Sure   Too         Gross      Sureness
                                Cautious    Error
Belgium    Boys      21.9       10.0        10.8       2.19
           Girls     23.7       14.2        10.6       1.68
England    Boys      13.8       15.0        11.3       0.92
           Girls     15.8       17.3        11.6       0.92
Finland    Boys      13.9       17.2        14.0       0.81
           Girls     15.6       20.5        14.3       0.76
France     Boys      22.3       11.1         9.5       2.00
           Girls     26.7       12.2         7.3       2.19
Germany    Boys      24.0        9.8         7.1       2.46
           Girls     21.1       12.0         9.7       1.75
Israel     Boys      16.9       14.7         9.8       1.15
           Girls     16.3       16.8         9.8       0.97
Poland     Boys      19.9       19.7        14.3       1.01
           Girls     20.6       22.4        14.1       0.92
Scotland   Boys      14.6       14.2        10.2       1.03
           Girls     14.6       18.4        10.9       0.80
Sweden     Boys      14.5       22.6         6.7       0.64
           Girls     13.4       25.9         7.5       0.52
U.S.A.     Boys      14.0       18.7        12.7       0.75
           Girls     12.7       21.5        12.6       0.59
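
The arithmetic behind the error percentages and the index of sureness is simple enough to state as a sketch. The two helper functions below are illustrative names, not part of the original analysis, and the way "opportunities" were counted is not spelled out in the report, so the count is simply passed in as an argument. The worked values are the percentages reported for Belgian boys and Swedish boys.

```python
def error_rate(errors, opportunities):
    """Errors of one type as a percentage of the opportunities to make them."""
    return 100.0 * errors / opportunities


def sureness_index(pct_too_sure, pct_too_cautious):
    """Ratio of too-sure to too-cautious error rates; above 1 means over-sure."""
    return pct_too_sure / pct_too_cautious


# Reported percentages: Belgian boys 21.9 and 10.0, Swedish boys 14.5 and 22.6.
belgium_boys = sureness_index(21.9, 10.0)
sweden_boys = sureness_index(14.5, 22.6)
```

An index above 1 indicates that errors of over-sureness outnumber errors of over-caution relative to the opportunities for each; an index below 1 indicates the reverse.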

Clearly, there are substantial differences between national groups in their tendency to be too
assured or too cautious on these items. When Belgian, French or German children made errors,
they were much more likely to be errors of over-sureness. When Finnish, Swedish or American
pupils made errors, they were likely to be errors of over-caution. We are dealing, of course, with
a very limited sample of items, and the differences we find may be peculiar to this item sample
rather than a more general characteristic of the national educational system or cultural
background. However, the differences are certainly suggestive, and open up a line of inquiry that
might be pursued with profit using a larger and more varied sample of tasks.

* Frequencies of choice of the separate response options were not available for Switzerland and
Yugoslavia.
Concluding statement

The preceding pages have shown some of the kinds of comparisons that are possible when
academic achievement is examined at the same time in a number of countries with the same set of
tests. These results are of some value for themselves for the international differences and
similarities that were found and described. They are certainly also of interest as exhibiting a model
of international cooperation and of empirical comparative education that may be further
developed in the future.

Fernand Hotyat

INTERNATIONAL AND NATIONAL INTERPRETATIONS FROM BELGIAN DATA

Following Professor Thorndike's overall view of the results, we publish two accounts in which the
authors have used the data for the interpretation mainly of national performances. In the first of
these, M. Hotyat concerns himself particularly with some striking analogies between patterns
of achievement in French-speaking countries and contrasts these with patterns of achievement
on the same groups of test items in English-speaking countries. M. Hotyat also uses the
international results to throw light on certain aspects of the Belgian situation, and in the second
half of his article analyses the Belgian results according to school type, sex, regularity of
pupils' promotion through the grades, and nationality of parents. In the course of his paper he
suggests a number of interesting methods which can be used to interpret data of this kind.

The Belgian share in the research was carried out by a team from the Centre de Travaux de l'Institut Supérieur de Pédagogie du Hainaut (Mme. Delepine, MM. Hotyat, Lowyck, Rousseaux, Manouvrier).
In accordance with the decision taken by the international research group, the analyses made by the Belgian participants aim at profiles of achievement which provide fruitful opportunities for interpretation. In order to do this, we have constructed profiles in such a manner that in each country the average of the scores on the five tests is reduced to zero. The following statistical approach was adopted with this in mind:
1. We calculated for each test the mean and standard deviation of all the test scores, and expressed the mean for each country as a standard score*.
2. We established for each country the mean of its 5 standard scores.
3. We subtracted the number thus obtained from each of the 5 scores so that, for each country, the mean of the scores equals zero.
Here is an example based on the standard scores from one of the countries:

Non-verbal               +28
Mathematics              -30
Reading Comprehension    +25
Science                    0
Geography                 -8

Their mean is: (28 - 30 + 25 + 0 - 8) / 5 = +3

By deducting 3 from each of the standard scores we obtain scores with a mean of 0. We thus arrive at the following adjusted scores: non-verbal, +25; mathematics, -33; reading comprehension, +22; science, -3; geography, -11.
This process finally enables us to establish a profile of the scores obtained by each sample in
relation to its own mean.
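
The adjustment described above amounts to centring each country's five standard scores on their own mean. A minimal sketch in Python, using the figures of the worked example; the function name adjust_profile is chosen here for illustration only and is not part of the study:

```python
def adjust_profile(standard_scores):
    """Shift one country's standard scores so that their mean becomes zero."""
    mean = sum(standard_scores.values()) / len(standard_scores)
    return {test: score - mean for test, score in standard_scores.items()}


# Standard scores (hundredths of a standard deviation) from the worked example.
example = {
    "Non-verbal": 28,
    "Mathematics": -30,
    "Reading Comprehension": 25,
    "Science": 0,
    "Geography": -8,
}

profile = adjust_profile(example)
for test, score in profile.items():
    print(f"{test}: {score:+.0f}")
```

Because only the country's own mean is removed, the differences between any two tests within a country are left unchanged; the profile shows the shape of achievement, not its level.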
For example, Table 1 and Figure 1 show the results from the groups in the three French-speaking countries which took part in the study. The samples were taken from Lisieux in France, from Geneva in Switzerland, and from a central eastern region of Hainaut in Belgium.

* The standard scores express the positive or negative value of a test score in hundredths of a standard deviation from the mean.

TABLE 1

                        Non-verbal   Maths   Reading         Geography   Science
                                             Comprehension
France (Lisieux)           -20        +31       -6              +17        -20
Switzerland (Geneva)        +9        +21       +4               +9        -42
Belgium (Hainaut)           +8        +45      -21              -17        -12

FIGURE 1. Profiles of the adjusted scores on the five tests (Non-verbal, Maths, Reading Comprehension, Geography, Science) for Lisieux, Geneva and Hainaut.

These three profiles show some interesting analogies, in particular a peak in mathematics and troughs in reading comprehension and in science which are common to all three.

A detailed analysis of the profiles obtainable from the scores of each of the twelve participating
countries would be beyond the scope of this article, and we shall therefore limit ourselves in
Table 2 and Figure 2 to comparing the profiles of the means of the three French-speaking
samples
with those obtained similarly from the three English-speaking
groups in England, Scotland and
the USA.
The way in which the two profiles appear to complement each other is striking. (It should be
remembered that the two language groups which are here being compared represent only half
of the national samples which took part in the research.)

TABLE 2

                            Non-verbal   Maths   Reading         Geography   Science
                                                 Comprehension
Fr.-speaking countries         -1         +32      -8              +3         -25
Engl.-speaking countries      +23         -34     +15             -21         +17

FIGURE 2. Profiles of the mean adjusted scores on the five tests (Non-verbal, Maths, Reading Comprehension, Geography, Science): results from the English-speaking samples and results from the French-speaking samples.

Of course, even if the research had related to true national samples, it would still be rash to
draw any conclusions
from these findings about the quality of the respective school systems;
other factors would have to be considered, such as curricula, time-tables, and the form of the
tests. But we have now developed a method which will enable us to make comparisons
based
on real national samples when subsequent research has been carried out.
Relative difficulty of items

We asked ourselves, too, to what extent the order of difficulty of items in each of the subjects
tested was the same from country to country when their results were compared.
1. In our preliminary analysis we applied Yule's formula* to the percentage of success on each item obtained by each national group. The correlation coefficients are distributed as follows (220 coefficients for 11 samples and 4 subjects): in mathematics, from 0.75 to 1, with a mean of 0.94; in reading comprehension, from 0.60 to 1, with a mean of 0.89; in geography, from 0 to 0.92, with a mean of 0.68; in science, from 0.05 to 0.95, with a mean of 0.54.
Assuming that the orders of difficulty within each test were equivalent, these data would mean that there existed a closer relationship between the ways in which mathematics was taught in the various countries than there was in the teaching of geography and science.
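
The coefficient used in point 1 can be sketched as follows. Each item is classed as above or not above the mean percentage of success in each of two samples, the four resulting counts a, b, c and d are formed, and Yule's coefficient is (ad - bc) / (ad + bc). The item percentages below are invented for illustration, and counting an item that falls exactly on the mean as "inferior" is an assumption made here, since the text does not say how such ties were treated:

```python
def yule_q(pct_1, pct_2):
    """Yule's coefficient of association between two samples' item results.

    pct_1 and pct_2 hold the percentages of success on the same items,
    in the same order, for samples E1 and E2.
    """
    mean_1 = sum(pct_1) / len(pct_1)
    mean_2 = sum(pct_2) / len(pct_2)
    a = b = c = d = 0
    for p1, p2 in zip(pct_1, pct_2):
        above_1, above_2 = p1 > mean_1, p2 > mean_2  # ties count as "inferior" (assumption)
        if above_1 and above_2:
            a += 1  # above the mean in both samples
        elif above_1:
            b += 1  # above in E1, not in E2
        elif above_2:
            c += 1  # above in E2, not in E1
        else:
            d += 1  # above in neither sample
    denominator = a * d + b * c
    if denominator == 0:
        raise ValueError("coefficient undefined: ad + bc is zero")
    return (a * d - b * c) / denominator


# Hypothetical item percentages for two samples (eight items each).
e1 = [90, 80, 70, 60, 40, 30, 20, 10]
e2 = [85, 75, 45, 65, 35, 55, 25, 15]
q = yule_q(e1, e2)
print(q)
```

With 11 samples there are 55 pairs of samples per subject, which over 4 subjects gives the 220 coefficients mentioned in the text.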
2. The table of percentages of items correctly answered enables us to make comparisons which
are of definite educational
interest. Let us take, for example, on the one hand the mean percentages for three items involving fractions, and on the other for four items involving decimal
numbers, and compare the results obtained in Belgium with those from the Anglo-Saxon
countries
where the system of weights and measures requires early and intensive teaching of fractions.
The percentages are set out in Table 3.
* If a = the number of items for which the results are higher than the mean in the two samples E1 and E2; if b = the number of items superior to the mean in E1 and inferior in E2; if c = the number of items inferior to the mean in E1 and superior in E2; and if d = the number of items inferior to the mean in both samples, we have

\[ Q = \frac{ad - bc}{ad + bc} \]

TABLE 3
Belgium

Fractions (mean %)
Decimal numbers (mean %)

Anglo-Saxon countries

76
61

78.5
75.5

All other things being equal, it would appear that pupils in the Anglo-Saxon countries are at a disadvantage because of the need to start learning fractions at an early age.

3. By converting the percentages of items passed into standard deviations from various national
means, we are able to establish national profiles of the relative difficulty of items in each of
the tests, leaving out of account the absolute levels of national achievement.
These results, quantified in this way, are more precise and more flexible than those provided by the ranking of items according to the degree of success achieved on them, and they offer very interesting possibilities of analysis.
Thus, a comparison of these indices for a particular country enables us to examine whether the
order of results has a close correspondence
with the hierarchy of aims set up by the authors of
school curricula there, and, if this is not so, to study the teaching methods which are being
employed with a view to making whatever improvements seem desirable.
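
The conversion described in point 3 can be illustrated with a short sketch. It takes one plausible reading of the procedure, which is an assumption on our part: within each national group, every item's percentage of success is expressed as a deviation from that group's mean item percentage, in units of the standard deviation of its item percentages. The item names and percentages are invented:

```python
from statistics import mean, pstdev


def item_profile(percent_correct):
    """Express each item's percentage as a deviation from the group's own
    mean item percentage, in standard deviation units."""
    values = list(percent_correct.values())
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        raise ValueError("all items have the same percentage; profile undefined")
    return {item: (pct - centre) / spread for item, pct in percent_correct.items()}


# Two hypothetical groups with different overall levels but the same pattern.
country_a = {"item 1": 80, "item 2": 60, "item 3": 40}
country_b = {"item 1": 60, "item 2": 40, "item 3": 20}

profile_a = item_profile(country_a)
profile_b = item_profile(country_b)
print(profile_a)
print(profile_b)
```

The two invented groups differ by 20 points on every item, yet their profiles coincide: the absolute level of achievement is left out of account and only the relative difficulty of the items remains.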
Table 4, for example, shows the standard deviations relating to some of the mathematical items, obtained from the Belgian sample's scores:

TABLE 4

                                                               Boys      Girls
2 written calculations (whole numbers)                        -0.145    -0.16
2 written calculations (decimal numbers)                      +0.126    +0.02
4 calculations of areas and volumes                           +0.25     +0.355
1 problem regarding meeting point of moving objects           -0.21     -0.32
3 items requiring pupil to extract information from a table   -0.28     -0.20

Is there any close correspondence between these relative percentages of success (which are
independent of the absolute scores in mathematics) and the order of priority in which we would
place these topics? The authorities would be faced with the solution of an educational problem
if they thought, for example, that accuracy in calculating with whole numbers had not been given
a sufficiently prominent ranking in the list of aims set for the teaching of mathematics.
A comparison of these deviations or of the profiles with those deriving from results obtained in
the other countries could also lead to interesting observations.
Thus, the mean deviations (in standard deviation units) for Belgium and Country X on three parts
of the geography test are shown in Table 5:
TABLE 5

                                            Belgium    Country X
12 items of information                       -1.2       +19
Interpretation of maps (16 items)             -2.1       -11.8
Relating facts to generalised statements      +15        -11.3

Taking into account the age of the pupils who are being tested, do these relative results correspond to the hierarchy of aims which the education authorities assign to geography teaching? If not, there are indications that we should study the way in which the curriculum, teaching methods and procedures have been conceived in a country where, it seems, the results achieved reveal a more satisfactory balance.

Results according to sex

If we first convert the boys' scores to a standard score scale with a mean of 0, the profile of the mean scores of the group of Belgian girls, given for purposes of comparison as standard scores on that scale, is as follows:

Non-verbal    Maths    Reading          Geography    Science
                       Comprehension
  -0.115      -0.23     +0.015            -0.24       -0.54

The table of frequencies of mean plus and minus scores for girls from eleven countries (Table 6) confirms this profile when compared with the boys' results.

TABLE 6

Country          No. of negative results (5 tests)
Belgium            5
England            4
Finland            5
France             3
Germany            5
Israel             5
Poland             5
Scotland           3
Sweden             2
Switzerland        4
U.S.A.             1
Total             42/55

Test                       No. of countries with a negative result
Non-verbal                   8/11
Maths                        9/11
Reading Comprehension        5/11
Geography                    9/11
Science                     11/11

Over the whole range of subjects tested we are able to conclude that, with the exception of reading comprehension, the girls' results are clearly inferior to those of the boys and the gap is particularly marked in science. Various hypotheses could be advanced to explain this situation, which seems all the more surprising considering that common programmes of instruction are followed by both boys and girls in most of the countries concerned. Is the relative weakness of the girls' results to be explained by educational factors - for example, teaching which is biased towards literary subjects - or rather by the way in which the tests themselves have been conceived, so that they call especially for the types of intellectual functioning which come more naturally to boys? Since the profiles of scores vary very greatly from country to country, we are faced with the hypothesis (which needs to be verified experimentally) that there is possibly the influence of a combination of social and educational factors at the bottom of these differences*.
The most striking feature is the general inferiority of the girls' average scores in science. It would be very valuable if research could be done on this problem. The question has real social significance at a time when more room is being found for sciences in educational programmes, and when the technical aspects of life require that schoolchildren receive a basic training in which the experimental sciences play an important part.

Correlations between different types of test

The non-verbal test which we used contains three types of test item requiring the following kinds of reasoning:
- choosing a fourth figure which has the same relationship to the third as the first two have to each other;
- extending a series of numbers or of letters in accordance with a pre-set pattern;
- picking out from a group the odd figure which does not conform to the principle governing the others.
Comparisons have been made of the correlations between this test and those parts of the mathematics and geography tests which depend on reasoning rather than information for an answer. The correlations obtained (Bravais-Pearson formula) are as follows:
Mathematics:
Section with items involving information and calculation ................ 0.26
Section with items involving arithmetical reasoning ...................... 0.50
Section with items involving geometrical reasoning ....................... 0.30
Geography:
Section with items involving information ................................. 0.23
Section with items involving interpretation of maps ...................... 0.17
Section with items requiring facts to be related to general statements ... 0.64
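
For readers who wish to see how coefficients of this kind are obtained, the Bravais-Pearson (product-moment) correlation can be sketched as below. The pupils' scores are invented; the function is a plain transcription of the standard formula and not a reconstruction of the authors' working:

```python
from math import sqrt


def pearson_r(x, y):
    """Bravais-Pearson product-moment correlation between two score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation undefined when one list is constant")
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical scores of five pupils on the non-verbal test and on one section.
non_verbal = [1, 2, 3, 4, 5]
section = [2, 1, 4, 3, 5]
r = pearson_r(non_verbal, section)
print(r)
```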
Comparison of these results does not confirm the existence of a close link between the types of
reasoning brought into play in the non-verbal test and those required by the scholastic tests. By
way of contrast, Table 7 shows the correlations
between the three sections of the mathematics
test.

TABLE 7

                                          Maths. II                  Maths. III
                                          (arithmetical reasoning)   (geometrical reasoning)
Maths. I (information and calculations)         0.72                       0.81
Maths. II (arithmetical reasoning)               -                         0.72

To sum up, in a subject like mathematics the intellectual activity involved depends on a whole range of information and skills.

The dispersion of test scores

We have taken as the coefficient of variation the ratio of the standard deviation to the mean. Table 8 shows these coefficients in the form of percentages for each of the tests and each of the national samples.
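
As a small illustration of this definition, the sketch below computes the coefficient as a percentage from a mean and a standard deviation. The figures are those given for the mathematics test in Table 12 later in this article, where the coefficients are reported as 28 and 19; the function name is ours:

```python
def coefficient_of_variation(standard_deviation, mean):
    """Standard deviation expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("coefficient undefined for a mean of zero")
    return 100 * standard_deviation / mean


# Mathematics test, Belgian sample (means and standard deviations from Table 12).
cv_vocational = coefficient_of_variation(4.04, 14.4)
cv_general = coefficient_of_variation(3.59, 18.4)
print(f"vocational: {cv_vocational:.1f}  general: {cv_general:.1f}")
```

Dividing by the mean makes dispersions comparable between tests and samples whose score levels differ, which is why the coefficient rather than the raw standard deviation is tabulated.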
* This hypothesis has already been found to have some foundation in experimental studies in the field of arithmetic. Thus Toivo Vahervuo studied the performance of 30 classes in the 4th school year in Helsinki, all using the same textbook, and discovered that the results of the boys' classes were superior to those of the girls' classes, the difference being significant at the 2 % level; but he also observed a slight superiority of the girls in mixed classes, although not at a significant level.

TABLE 8

                                             Countries
Tests                      A     B     C     D     E     F     G     H     I     J     K    Means
Non-verbal                26    30    36    31    41    37    48    43    35    34    36    36.9
Mathematics               19    21    29    26    27    26    30    29    36    40    46    28.9
Reading comprehension     19    21    24    20    24    24    26    26    28    26    28    24.1
Geography                 24    20    24    23    28    30    28    33    35    33    36    28
Science                   33    33    29    40    36    34    36    36    35    36    40    35.2
Means                   23.7  23.7  26.5  27.5  28.2  28.5    30    31  33.5  33.7  37.5

Two striking observations emerge from this table:
1. The coefficients of variation vary quite noticeably according to subject. In the scholastic tests, they are almost always lowest in reading comprehension and highest in science. Two chief hypotheses suggest themselves to explain these differences. The first to occur to us is that the tests are not all equally discriminating. However, it is also quite possible that the teaching methods at present in use lead to children producing less homogeneous results in natural sciences than in other subjects. Experimental research is needed to determine to what extent the second factor influences the scores.
2. These coefficients vary considerably from country to country (the difference between the extremes is over 50 %). Assuming that the same care had been taken when drawing the samples in every case, this would mean - not taking into account the absolute value of the means - that the national systems produced far from homogeneous results. Some questions of profound importance emerge from this: Is the dispersion of results satisfactory in relation to the general educational aims in a given country? If not, what sort of curricula and methods have led in other school systems to a variability which could be considered more desirable?

INTERPRETATIONS OF DIFFERENCES IN PERFORMANCE WITHIN THE BELGIAN SAMPLE

1. General situation

The reader should be reminded (see description of Belgian sample, p. 10) that the following analysis does not concern all Belgian schools but is only a regional study. Table 9 gives the mean scores obtained by the Belgian sample (scores in hundredths of a standard deviation, after the general mean has been converted to 0).
TABLE 9

          Non-verbal   Maths   Reading         Geography   Science
                               Comprehension
Total        +10        +45      -27              -15        -25
Boys         +25        +58      -23               +3         +2
Girls         -5        +32      -31              -34        -53

These results are particularly high in mathematics and particularly low in reading comprehension.

Going beyond this, we can compare the scores on the various sections of the tests. In mathematics, the Belgian scores, which are barely satisfactory for arithmetical calculations, are particularly high for items which require reasoning and the use of concepts. At the other extreme, in reading comprehension they are especially low for literary, historical and economic texts. One wonders whether our present teaching methods demand sufficient individual participation from pupils when texts are being studied, especially in literature and history. It would be worthwhile to study this problem, since silent reading - which is true reading according to our official primary school syllabus - plays a highly important cultural role.


In geography, Belgian scores are average on questions which involve the direct use of verbal
information, but they are rather weak on items calling for the interpretation of maps.
An evaluation of the results in science seems inappropriate,
since curricula in this subject differ
too much to permit valid comparisons to be made between the mean scores of different countries.

2. Differences according to types of school

Table 10 and Figure 4 below show pupils' achievements expressed in fractions of a standard deviation according to types of school and sex. Since in this case we are concerned with comparisons within one country, the figures relate to the norms of the total Belgian sample.
TABLE 10

Type of school                          Non-verbal   Maths   Reading         Geography   Science
                                                             Comprehension
A. Vocational schools - boys               -26        -32      -44             -24         -3
B. Vocational schools - girls              -44        -55      -34             -55        -48
C. General secondary schools - boys        +52        +55      +52             +55        +57
D. General secondary schools - girls       +24        +34      +11             +22         -6

FIGURE 4. Profiles of the mean scores on the five tests (Non-verbal, Maths, Reading Compr., Geography, Science) for A. vocational, boys; B. vocational, girls; C. general, boys; D. general, girls, plotted against the mean of the total sample.

a. Assuming that scores on the non-verbal test can be taken as a valid criterion of the pupils' capabilities, these data lead us to the following conclusions:
- in the sample of boys in vocational schools, the mean performance is rather low in mathematics, very low in reading comprehension, but high in science, when compared with the result of the non-verbal test;
- among girls in vocational schools the mean is higher than might have been expected in reading comprehension, but particularly low in mathematics and geography;
- among boys in general secondary schools, the mean is high in all subjects;
- among girls in general secondary schools, the mean is high in mathematics, a little low in reading comprehension and very low in science.
b. Both for boys and for girls the means in general secondary schools are superior to those for either sex in vocational schools.
If the scores for both sexes are combined, the differences are, respectively: 71 % of a standard deviation on the non-verbal test, 93 % in mathematics, 72 % in reading comprehension, 81 % in geography and 57 % in science.
It would, of course, be quite erroneous to judge the relative merits of these two types of schools merely by comparing these results. In the first place the study covered only subjects to which more importance is attached in general secondary schools than in vocational schools, and in addition to this, the gap between mean scores achieved on the non-verbal test entitles us to assume that the differences already existed at the time the pupils started their post-primary education*.
The means give us only a rough idea of the differences, but the following table (Table 11) of quartile distributions provides us with a more differentiated picture. (These distributions have been obtained by finding the score point at the quartiles for the total group and then calculating the percentage falling within the different quartile ranges in the general and vocational schools separately.)
TABLE 11

Percentages of pupils in general (Gen.) and vocational (Voc.) schools falling within each quartile range of the total group (Lowest Quartile, Lower Intermediate Quartile, Upper Intermediate Quartile, Highest Quartile), for Maths, Reading Comprehension, Geography and Science.
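
The procedure by which the quartile distributions were obtained can be sketched as follows: the quartile score points are found for the total group, and each school type's scores are then sorted into the four resulting ranges. The scores are invented, and the convention that a score equal to a cut point falls in the lower range is an assumption made for the sketch:

```python
from statistics import quantiles


def quartile_distribution(scores_by_school):
    """Percentage of each school type falling in each quartile range
    defined on the total group."""
    total = [s for scores in scores_by_school.values() for s in scores]
    q1, q2, q3 = quantiles(total, n=4)  # quartile score points of the total group
    distribution = {}
    for school, scores in scores_by_school.items():
        counts = [0, 0, 0, 0]  # lowest, lower intermediate, upper intermediate, highest
        for s in scores:
            # Number of cut points exceeded gives the index of the range.
            counts[(s > q1) + (s > q2) + (s > q3)] += 1
        distribution[school] = [100 * c / len(scores) for c in counts]
    return distribution


# Hypothetical test scores for two school types (eight pupils each).
scores = {
    "general": [12, 14, 15, 16, 17, 18, 19, 20],
    "vocational": [5, 7, 8, 9, 10, 11, 13, 15],
}
result = quartile_distribution(scores)
for school, percentages in result.items():
    print(school, percentages)
```

If the two school types had the same distribution of scores, each would show roughly 25 % in every range; departures from 25 % show where the high and low scorers are concentrated.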

* Here, for example, are the mean percentages of success on the arithmetic test taken during entrance examinations to post-primary schools, based on samples of N=150+ in each type of school.

                                     Boys:       Girls:      Boys:        Girls:
                                     General     General     Vocational   Vocational
                                     Secondary   Secondary   Schools      Schools
                                     Schools     Schools
Whole and decimal numbers            74.7 %      74.7 %      70 %         62 %
Mental arithmetic                    59          56.4        38.6         29.8
Fractions                            74          73.1        54.1         50.6
Metric system                        74.3        73.4        56.9         54.1
Calculation of areas and volumes     54.5        50.7        35.5         25.8
Geometrical figures                  71.3        70.5        58.5         44.7
Problems                             64.8        64          53.2         37.4

From: Epreuves Analytiques d'Arithmétique (Publications de l'Institut Supérieur de Pédagogie du Hainaut, 1961).

The differences proceed, in short, from an unequal distribution of superior and inferior scorers between the different types of schools, the superior ones being more numerous in general secondary schools, and inferior ones in vocational schools.
c. The dispersion of scores is wider in vocational schools. Below, in Table 12, we give the scores on the mathematics test in the two types of school:

TABLE 12

                             Mean    S. D.   Coefficient of variation
                                             (S. D. in hundredths of the mean)
Vocational schools           14.4    4.04            28
General Secondary Schools    18.4    3.59            19

The greater homogeneity of results in general secondary schools results from the fact that they
are very selective, whereas in the large vocational schools, the spreading of pupils over different
parallel courses makes it possible for weaker pupils to stay on in classes leading to leaving
certificates
at lower levels of achievement.

3. Differences according to sex

According to Table 10 and Figure 4 the boys' means are higher than those of the girls in general secondary as well as in vocational schools, except in vocational schools where reading comprehension is concerned.
Because of the small number of schools covered by our present enquiry, we are not entitled,
however, to draw any general conclusions from the results obtained. It would be interesting to
carry out research on a larger scale into this problem as it exists in Belgium in order to determine the causes for these differences, should they appear significant.

4. Differences according to regularity of promotion

The pupils included in the sample were distributed among the various grades as shown in Table 13:

TABLE 13

                               Vocational:   Vocational:   General:   General:
                               boys          girls         boys       girls
In grade appropriate to age       62            84           114         97
Repeated one year                 78            86            59         53

Vocational schools clearly contain a higher number of 13- and 14-year-old pupils who have repeated a grade, but this repetition has usually already taken place before entry into the vocational schools, as their intake includes numbers of pupils who have already repeated classes at primary school or who have been diverted to technical schools after failure in general secondary schools.
In order to study the extent to which repetition of a grade corresponds to lower levels of ability
or educational performance, we present in Table 14 figures for the degree of significance of the
difference between the means for each of the tests and for each type of school.

TABLE 14

VOCATIONAL:
TEST

SCHOOL

MSWl

Non-verbal

Mathematics

SOYS

Regularly promoted
Repeaters
Regularly promoted
Repeaters

32.24
29.65
15.69
14.37

s. D.

13.7
10.5

GIRLS

Regularly promoted
Repeaters

17.80
16.97

Mean

30

s. D.

4.6

1.06

14.32

Science

Regularly promoted
Repeaters
Regularly

26.93

10.5

14.77

3.5

BOYS

13.17

4.1

18.75
16.90

3.8

GENERAL:

12.80

GIRLS

4.1

19.94

9.37

2.7

8.42

3.05

10
2.9

3.8

8.29

2.5

16.84

2.6

23.07

4.3

2.15

37.72

11.3

31.67

9.6

18.82

14.77

16.32

13.17

17.80

4.4

19.44

3.8

16.97

4.05

18.47

17.74

3.8

12.98

3.5

9.11

2.8

7.86

2.6

4.17

2.05

4.73
15.44

3.9

11.50

2.7

7.8

4.78
9.30

2.95

3.47

5.67

4.83
6.55

S. D.

7.2

1
12.23

Meall

5.73
33.93

3.6

11.3

3.24

0.67
13.83

S. D.

2.75

2.63
Repeaters

promoted

4.4

Meall

43.57

3.7
Geography

GENERAL:

1.88

1.85
4.8
4.3

10.8

1.22

3.6
Reading
Comprehension

VOCATIONAL:

HISTORY

4.16

According to this table of Student's t values, the differences are significant in general secondary schools at the 1 % level for all tests, except for reading comprehension in the case of girls, where the level is 5 %; on the other hand, the t values are particularly low for boys in vocational schools and the only significant one is that for science. The same trend is present, but to a less marked extent, in girls' vocational schools, where the degree of significance of the difference is higher in reading comprehension and in science.
These observations confirm that general secondary schools are more severely selective where the subjects examined by the international tests are concerned.

5. Differences according to nationality of parents

The sample of boys in vocational schools included a particularly high percentage - more than 35 % - of children of foreign workers, mostly Italians. A recent study* has shown that these pupils are subject to a great deal of educational failure due to the absence of teaching methods which would help them to adapt progressively to our language and school system. It seemed to us interesting to examine, on the one hand, whether or not the presence of this group had caused a lowering of mean scores for boys in vocational schools, and on the other hand, to study the extent to which children of foreign parents, who had overcome the initial obstacles facing them in elementary school, still remained handicapped in one direction or another.
To this end, we divided the results of the 145 subjects into two sub-groups: children with foreign parents (N=52), and children of Belgian parents (N=93). We then calculated the means and standard deviations separately for each group and estimated the degree of significance of differences between the means.
Table 15 gives the results obtained.
TABLE 15

                          Means                   Standard Deviations       Differences
                          Foreign    Belgian      Foreign    Belgian
                          parents    parents      parents    parents
Non-verbal                 31.1       30.2         12.3       11.9          not significant
Mathematics                14.8       15            5.28       4.17         not significant
Reading Comprehension      16.7       18.6          4.82       4.62         significant at 5 % level
Geography                  13.8       14.1          4.59       3.97         not significant
Science                     8.5        9.1          3.11       2.81         not significant

A study of Table 15 enables us to conclude:
a. that the mean obtained by children with foreign parents on the non-verbal test is slightly superior to that of Belgian children. There are probably two reasons which chiefly explain this. In the first place they have already overcome a more severe selection process than the Belgian children and it is therefore likely that their mental capacities are higher. In the second place, fewer of them have chosen to enter general secondary schools.
b. that in spite of this advantage, their mean scores on the scholastic tests are all inferior to those of Belgian children. These differences are not high, except in reading comprehension where they represent 40 % of a standard deviation and are significant at the 5 % level. This higher difference is explained by the fact that in silent reading knowledge of the language is the essential element, while it is only one of the factors coming into play in the other tests.
c. that the standard deviations are all higher in the group of children with foreign parents. This group is altogether more heterogeneous: while the best pupils remain at a level close to that of the best Belgian-born children, the weaker ones appear to fall further behind their Belgian counterparts.
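
The significance statements in Table 15 can be checked approximately from the published means, standard deviations and group sizes (N=52 and N=93). The sketch below uses the unequal-variance form of the t statistic; this choice is an assumption, since the article does not state which form of the test was applied, and the function name is ours:

```python
from math import sqrt


def t_from_summary(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """t statistic for the difference between two group means,
    computed from summary figures (unequal-variance form)."""
    standard_error = sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_1 - mean_2) / standard_error


# Belgian parents (N=93) compared with foreign parents (N=52), figures from Table 15.
t_reading = t_from_summary(18.6, 4.62, 93, 16.7, 4.82, 52)
t_non_verbal = t_from_summary(30.2, 11.9, 93, 31.1, 12.3, 52)
print(f"reading comprehension: t = {t_reading:.2f}")
print(f"non-verbal:            t = {t_non_verbal:.2f}")
```

With groups of this size the two-sided 5 % critical value of t is close to 2, so a value a little above 2 for reading comprehension and a value well below 1 for the non-verbal test agree with the entries "significant at 5 % level" and "not significant".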
* De Coster, S., and Derume: Retard pédagogique et situation sociale dans la région du Centre et du Borinage - Institut de Sociologie, U. L. B., 1961.

Figures 5 and 6 below illustrate this situation: they present in diagrammatic form the two groups' results in reading comprehension and mathematics in terms of 5-point normalised scales.

FIGURES 5 and 6. Distributions of scores in Mathematics and in Reading Comprehension on 5-point normalised scales (divisions at -3/2 σ, -1/2 σ, +1/2 σ and +3/2 σ about the mean) for children with Belgian parents (B) and children with foreign parents (F).

D. A. Pidgeon

A COMPARATIVE STUDY OF THE DISPERSIONS OF TEST SCORES

In the second of the two articles which serve to demonstrate the usefulness of international comparative studies in shedding further light on problems occurring in particular educational systems, Mr. Pidgeon compares the standard deviation on the five tests in the twelve countries and draws conclusions about the effect which streaming (the form of class organisation extensively practised in England and Scotland) has on the dispersion of test scores. The author discusses the different approach to teaching which streaming may encourage and which could account for the phenomenon he has noted.

Introduction

In an earlier study (Pidgeon, 1958) in which the performances of eleven-year-old children from Queensland, Australia, and from England and Wales were compared on tests of non-verbal ability, reading and arithmetic, it was noted that in each of the tests used, the standard deviation of test scores obtained by the English and Welsh children was considerably greater than that of the Australian sample. It was thought that this finding might, in part, reflect the effects of streaming as practised in England - it being sometimes alleged that one of the results of streaming is that the brighter children make more rapid progress than they would otherwise achieve, and also that the duller children, when assigned to a 'C' stream, become more backward,
partly as a result of poor morale, and partly because there is a tendency in some schools for the
more experienced teachers to be put in charge of the abler streams. It was noted, however, that
while there were larger proportions of high scorers on all tests in the English and Welsh sample,
there were differences
between the tests with respect to the low scorers; on the two arithmetic
tests there were considerably
fewer children in this category from the Queensland sample, while
on the reading and non-verbal tests there were slightly more. It was cautiously concluded, therefore, that the results might be due to differences
in the methods and approach employed in
teaching the different subjects and not to any overall differences
in the organisation
of classes.
Since that study, however, other evidence has been reported which suggests that perhaps the
larger variance of test scores may after all be due in part to the system of class organisation
employed in England. An investigation
(Lloyd and Pidgeon, 1961) in which a non-verbal test,
standardised
in England, was given to groups of African, European and Indian children in Natal,
South Africa, revealed considerably
lower standard deviations in that country. It was observed
that the system of class organisation
employed in Natal meant that no child can proceed from
one class to another unless he can pass the examination
set for that class. Such a system
inevitably has repercussions
on the methods of teaching employed and the concern of a teacher
in Natal is to get as many through the examination as possible, since too many failures might
reflect on his efficiency.
This leads to mass methods of teaching and to a complete lack of
recognition of individual differences.
It might be mentioned here that the very opposite occurs
in England. Teachers are trained to recognise individual differences
and to adjust their teaching
accordingly,
and indeed, the system of streaming is a further aid to this end. The system
employed in Natal is used with all ethnic groups but its effect is more pronounced with the
African and Indian children in view of their larger school classes. The results reported reflected
this, in that the standard deviations tended to be smaller with these groups, particularly
with the
African children.


A further study, particularly relevant to the question of streaming in England, is that of Daniels
(1961). Daniels compared the performances of children in two large primary schools that streamed
by ability, with those of children in two similar schools in the same type of neighbourhood
that
did not stream. Daniels stressed that the non-streaming
in the latter schools was a consistently
thought-out policy or, in other words, the teachers in these schools did not believe in streaming
and in fact felt it was educationally
wrong to do so. His results revealed a consistent trend
towards smaller dispersions of test scores in the un-streamed schools; the standard deviations
in 22 of his 24 separate comparisons
were smaller with un-streamed children although only
five of these reached statistical significance.
There would appear to be some evidence, then, that the dispersion of test scores in England
tends to be rather larger than in other countries and that this larger dispersion may be associated
with the method of class organisation
employed. The number of studies providing this evidence
is, however, fairly small, and it would need to be confirmed before any generalised conclusions
could be drawn.

The present study

It seemed clear that, apart from its other main purpose, the international study sponsored by the
Unesco Institute for Education, Hamburg, would also provide an excellent opportunity for making
further comparisons of the dispersions of test scores in England and other countries. This article,
therefore, is concerned with presenting such results as are relevant for this purpose and with
discussing their implications.
A full description of the investigation has been given by Foshay in the opening chapter of this
report. It is only necessary to note here that attempts were made to make the tests appropriate
for each country, despite the necessity to translate into eight different languages. It can be
reasonably claimed that this was successfully accomplished for, although a translated test is,
of course, a different test, as Foshay points out there is no evidence that difficulties in
translation seriously influenced the national scores.
Since this was in the nature of a pilot study in this field, it had been agreed that it was
unnecessary to procure a strictly random sample of the school population of the stated age in each
country, but that attempts should be made to obtain a representative sample that was as typical
as possible of the total population at least with respect to mean test score and standard deviation.
The detailed descriptions of the sample tested in each country given by Foshay suggest that, with
a few exceptions, this had been achieved. It is, of course, particularly important, if inferences
are to be drawn about the dispersions of test scores in different countries, to ensure that samples
fully representative of the countries concerned are obtained. Further comment, therefore, will be
made on this point in the discussion of the results below.

Results

The obvious statistic for measuring the spread of scores on a test in any sample is the standard
deviation. Using the raw S. D. for each country on a particular test, comparisons can legitimately
be made between countries on that test provided it can be shown that there is no direct
relationship between S. D. and mean. As an indication of how successful the tests were in
each country, in no case was the mean score either too high or too low to be seriously influenced
by a ceiling or floor effect, and in only one test (science) was the rank order correlation
between S. D. and mean positive. Hence the standard deviation has been used for making
comparisons in this study. Since, however, five different tests were used, each with a different
number of items, in order that comparisons could be made between tests, the raw S. D.s for each
country have been converted into standard scores. In Table 1, which gives the relevant figures, a
high positive value indicates a standard deviation well above the average for all countries on that
test, and a negative value a standard deviation below average for all countries.
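
The conversion described above can be illustrated with a minimal sketch. It assumes that each country's raw S. D. on a test is standardised against the mean and standard deviation of the raw S. D.s of all countries on that test; the report does not say whether a population or sample formula was used, so the population form here is an assumption. The function name and the raw figures are hypothetical and are not data from the study.

```python
from statistics import mean, pstdev


def to_standard_scores(raw_sds):
    """Convert one test's raw S.D.s (one per country) into standard scores."""
    values = list(raw_sds.values())
    centre = mean(values)
    spread = pstdev(values)  # assumption: population S.D. across countries
    return {country: (sd - centre) / spread for country, sd in raw_sds.items()}


# Hypothetical raw S.D.s for a single test; NOT figures from the study.
raw = {"Country A": 6.1, "Country B": 8.4, "Country C": 5.7, "Country D": 7.0}
scores = to_standard_scores(raw)

for country, z in scores.items():
    print(f"{country}: {z:+.2f}")
```

Because every test is rescaled in the same way, the standard scores for one country can then be averaged over tests that have different numbers of items.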

Table 1
Standard Deviations, expressed in Standard Score Form, on Five Tests from Twelve Countries

Country        Mathematics   Non-Verbal   Reading   Geography   Science   Average

Belgium            0.43        -0.26       -0.85      -0.19      -1.34     -0.44
England            1.47         2.23        1.44       1.23       2.61      1.80
Finland            0.29        -0.38        0.48      -0.69      -0.49     -0.16
France             0.78        -0.06       -0.55       0.18      -0.87     -0.10
Germany            0.66         0.14        0.84       0.53       0.27      0.49
Israel             0.23        -0.18       -1.16      -0.47      -0.06     -0.33
Poland            -1.74        -1.21       -1.39      -2.37       0.08     -1.33
Scotland           0.20         1.27        1.21       1.21       0.99      0.97
Sweden             0.81        -0.66        0.11       0.76       0.18      0.24
Switzerland       -1.72        -1.68       -1.49      -0.49      -0.77     -1.23
U.S.A.            -0.06         0.14        0.89       1.03       0.23      0.45
Yugoslavia        -1.34         0.65        0.46      -0.72      -0.82     -0.35

It will be seen from Table 1 that, consistently on every test, England has by far the largest
dispersion of test scores. To illustrate these figures graphically, the average value for each
country on all five tests is depicted in Figure 1. (To obviate negative values 2.0 has been added
in each case.)

Figure 1
The Average Value of the Standard Deviation on Five Tests for each of Twelve Countries

Some comments on Table 1 and Figure 1 are clearly necessary. Firstly, it must be emphasised
that no significant relationship exists between the standard deviations and the mean scores
obtained in each country. Secondly, the fact that the pupils tested in each country were not
strictly random samples must to some extent detract from the significance to be attached to
these findings. Samples of schools selected by subjective judgment to be representative are
more likely than not to yield smaller standard deviations than random samples, owing to the
human tendency to under-represent the very bad. Also in some instances a restriction was
acknowledged. The description provided by Switzerland of its sample clearly indicates that it
was hardly representative of the whole country, since it was taken exclusively from a prosperous
middle-class town. In Israel the sample excluded recent immigrants from under-developed areas;
it was also chosen from 8th grade pupils, that is, although the mean age was 13½, it contained
some pupils younger than 13 and older than 14. Both these factors clearly affected the dispersion
of test scores. In other countries, however, the sample was chosen by methods similar to that
employed in England, namely, the testing of all pupils falling within the stated age range attending
all types of secondary school in a selected area - the selection of this area being based on
other test evidence, which suggested that it was reasonably typical of the whole country both
as regards mean and spread of test scores. In Scotland, two areas were chosen - a city and
a county - but the sample was deficient in children from the professional and skilled worker
classes, probably resulting in some restrictions of the dispersion of test scores.
With these reservations the data from Table 1 must be viewed with caution. Nevertheless, there
would seem to be fairly clear indications that the dispersion of test scores in England is large
compared with that in other countries, thus supporting the previous evidence cited. It would not
seem an idle occupation, therefore, to speculate upon the reasons for this.

Discussion
All countries concerned in this study, apart from England and Scotland, employ some variant of
what might be called the grade placement system. In such a system, children are assigned to
grades initially according to age, but subsequently according to their ability to assimilate
successfully the work covered in the grade in which they were the previous year. It is possible
for some children to be accelerated, that is, to miss a grade, and for others to be retarded,
that is, to repeat a grade. Thus, at any given point in time, say after five years' schooling,
while the majority of pupils will be found in Grade 6, some will be in Grade 7, and others in
Grade 5 or possibly, having repeated twice, in Grade 4. The numbers of pupils accelerated or
retarded will depend upon the limits accepted as constituting a successful pass in the previous
grade's work, and this, in turn, upon the standard of work demanded in each grade.
In England and Scotland, however, yearly promotion is primarily based on age, and if numbers
necessitate, pupils within one age group will be divided into separate classes. In both countries,
it is the general practice in such circumstances to stream these separate classes by ability and
attainment, even in the junior school, i.e. between age 7 and 11. Owing to the relatively larger
number of small primary schools in Scotland, the proportion of children in streamed classes is
somewhat less.
These descriptions do not do justice to the differences that exist in the various countries; they
are probably sufficient, however, for the present purpose, since it is contended that it is not the
difference between the two systems that is important so much as the general aims and beliefs
of the teachers practising within the systems. It is argued that the major objective of the grade
class teacher is to ensure that as many of his pupils as possible complete the work of the grade
successfully. His class is, however, heterogeneous with regard to ability and it is not
unreasonable to suppose that the brighter children will require much less effort from the teacher
than the duller ones. Hence, while it is possible that the brighter children within such a group will
not be unduly extended, the duller ones will presumably receive every encouragement to achieve
the grade pass and the net result will be a tendency for achievement test scores, if not to cluster
closely around the mean, at least to be relatively unrepresented at the extremes. In England,
however, there is no such general acceptance of a similar curriculum for all children in a
particular class and, certainly in streamed schools, the work achieved by A stream children at any
given age will be considerably more advanced than that expected of C stream children.
This introduces the notion of expectancy or the standard of work expected by a teacher of
his pupils. What is expected will clearly be determined in the first place by any curriculum that
is defined, but secondly, it will be influenced by the philosophical beliefs of the teacher. In
many countries employing the grade placement system, the curriculum for any grade will be
defined quite clearly even to the provision of state text books; in others a greater degree of
fluidity within a school or class may be found. In England and also Scotland individual schools
have a higher degree of autonomy and what is taught at any age will depend to a greater degree
upon what the head teacher or even the class teacher thinks is right, although external
examinations even in the primary school control this to a certain extent. But in all countries what
a teacher expects from individual pupils in his class must also depend upon his own particular
beliefs. Different patterns of achievement would be obtained, for example, by two grade class
teachers, one of whom believed that all children in his class were perfectly capable of covering
adequately the work of the grade, given sufficient effort on his part with the duller ones, and the
other who believed that, since achievement was necessarily limited by innate ability, there must be
some children in his class incapable of completing the year's work successfully. Such difference
in beliefs will also have an influence on the work of brighter children. The teacher who strives to
match attainment with ability will also be aware that children of high ability are capable of work
more advanced than that demanded by the grade syllabus, and although in the grade system this
matching will be achieved to some extent by jumping a grade at promotion time, it is clear it
will also influence what the teacher expects from the brighter pupils within his own class.
When considering the effects of these different beliefs upon the spread of achievement at any
given age, it would seem clear that the teacher who stresses innate individual differences to the
extent of regarding innate ability as the major factor in determining achievement, will expect and
hence tend to obtain a wider dispersion of attainment than the teacher who, although accepting
the fact of individual ability differences, nevertheless believes that the environmental influences
of the teaching situation play an equal if not more important part.

It is maintained that this belief in the relative importance of innate individual ability differences
is predominantly held in England and indeed has led to the general acceptance of the practice of
streaming. Burt (1959) has said ". . . it is plainly imperative that both teachers and local
authorities should take full account of such differences in their efforts to provide an education
which will (in the words of the Act)* be adapted to each child's ability and aptitude." To match
attainment to innate ability presupposes, of course, that the ability can be measured. Some popular
misunderstandings regarding attempts to do this have been described elsewhere (Pidgeon, 1961).
Burt himself has stressed many times the difficulty of ascertaining a child's I. Q. accurately
(e.g., Burt, 1959), but nevertheless insists that, because of the wide dispersion of measured
intelligence at age 10 or 11, "some kind of provisional selection is absolutely essential: in my
view it should start much earlier, . . . indeed, as soon as possible after a child has entered
school."
The acceptance of this view led not only to the separation of children of greater and less ability
into different types of secondary school (Board of Education, 1926) but also to the practice of
streaming in junior schools (Board of Education, 1937). However, and Burt himself stresses this,
if children are to be separated for differential instruction from an early age, it is essential that
the child is free to swim from one stream to another as his capacities develop or decline
(Burt, 1959) and also, it must be added, to allow for inaccuracies in the original measurement. But,
as Daniels (1961a) has shown, there is far less fluidity in streaming than teachers themselves
imagine, or, as Vernon (1955) has demonstrated, is necessary.
The concern here is not whether streaming is, in itself, good or bad, but with its effect on the
dispersion of achievement. It is argued that the expectancy of A stream teachers for relatively
high attainment helps in itself to lead to this result being obtained, just as the expectancy of C
stream teachers for relatively low attainment helps to produce this result. Also, of course, the
belief that attainment can and should be matched to ability, made easier perhaps in homogeneous
ability classes, while it would tend to result in the stretching of brighter children, would not have
this effect with duller ones, since, for many at least, the limit of their capacity would apparently
have been reached. The effect this has on increasing the dispersion of achievement scores might,
perhaps, be enhanced where teachers use tests of intelligence and attainment to help measure
the success of their efforts, for ordinary regression effects will tend to make dull C stream
children, subsequently tested for attainments, appear to be working up to capacity and bright
A stream children appear to have room for further improvement. It should perhaps be added
here, that the more successfully children's attainments are matched to their ability, the more
successful will any initial streaming appear to be - the self-fulfilling prophecy described by
Daniels (1959).
There would appear, therefore, to be a number of factors affecting the dispersion of achievement
at any given age. In the first place, the general aim of the grade class teacher may tend to result
in a relatively smaller dispersion. Perhaps exerting a greater influence, however, is the belief
a teacher may have that innate ability is of paramount importance in determining the level of
attainment to be expected from a child. Streaming by ability, which is viewed as an administrative
device resulting from the acceptance of this belief, will merely tend to enhance its effects. When
all these factors act in the same direction the effect will clearly be greatest and this is what
happens in England. Here, it is claimed, the aims and, more especially, the beliefs of most teachers
and educational administrators lead them to expect wide differences in performance, and this is
what is therefore achieved. Where, on the other hand, the grade placement system operates and
especially where, within such a system, teachers do not attempt to measure innate ability and
therefore do not expect their pupils' attainments to be matched to it, then the dispersion of
achievement will be much less.

* The 1944 Education Act.
While possible explanations for the relatively wide dispersion of test scores in England can be
offered, clearly a value judgment is involved in answering the question as to whether this is,
educationally, good or bad. Some further relevant evidence can be given from the present study,
however, by examining the proportions of pupils obtaining relatively high and low scores.
Obviously, the average levels of performance will influence these proportions, but the contrast
between England and other countries is shown in Table 2, which gives the percentage of
pupils scoring outside the limits of raw score, approximating to plus and minus one standard
deviation.
TABLE 2
Percentage of Pupils scoring beyond ± 1 S. D. on each of Five Tests

                     Below -1 S. D.                 Above +1 S. D.
Test             Average of                     Average of
                 12 countries    England        12 countries    England

Non-Verbal           20.7          15.9             20.3          28.7
Mathematics          20.9          39.1             22.7          15.5
Reading              19.5          25.1             18.2          20.8
Geography            21.9          34.7             18.3          11.4
Science              18.4          24.2             16.0          18.1

It will be observed from Table 2 that, in three of the five tests (non-verbal, reading and science)
England has a larger percentage than the average scoring above plus one standard deviation,
but that in all four attainment tests, it also has a larger percentage than the average scoring below
minus one standard deviation. Some concern might be felt for the 39.1 % of pupils obtaining
low scores on the mathematics test.
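
As a rough illustration of how percentages of the kind shown in Table 2 can be obtained, the sketch below counts the pupils in one sample whose raw scores fall outside two fixed cut-off scores. How the cut-offs were fixed in the study is not spelled out beyond "approximating to plus and minus one standard deviation", so they are simply passed in as arguments here. The function name and the scores are hypothetical.

```python
def tail_percentages(scores, lower_cut, upper_cut):
    """Return (% of scores below lower_cut, % of scores above upper_cut)."""
    n = len(scores)
    below = sum(1 for s in scores if s < lower_cut)
    above = sum(1 for s in scores if s > upper_cut)
    return 100.0 * below / n, 100.0 * above / n


# Hypothetical raw scores for one country's sample; NOT data from the study.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pct_below, pct_above = tail_percentages(sample, lower_cut=3, upper_cut=8)
print(f"below: {pct_below:.1f} %  above: {pct_above:.1f} %")
```

Applying the same cut-offs to every country is what lets a low national mean and a wide national spread both show up as a large lower-tail percentage.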
Bibliographical references

Board of Education, 1926. Report of the Consultative Committee on the Education of Adolescent
    Children, H. M. S. O., London.
Board of Education, 1937. Handbook of Suggestions for Teachers, H. M. S. O., London.
Burt, C., 1959. General Ability and Special Aptitudes, Educational Research, Vol. I, No. 2.
Daniels, J. C., 1959. Some effects of sex segregation and streaming on the intellectual and
    scholastic development of junior school children. Unpublished thesis, Nottingham University.
Daniels, J. C., 1961a. The effects of streaming in the Primary School, I - What Teachers
    Believe, Brit. Jour. Educ. Psych. 31, 69-78.
Daniels, J. C., 1961b. The effects of streaming in the Primary School, II - A Comparison
    of Streamed and Unstreamed Schools, Brit. Jour. Educ. Psych. 31, 119-127.
Lloyd, F. and Pidgeon, D. A., 1961. An Investigation into the Effects of Coaching on Non-Verbal
    Test Material with European, Indian and African Children, Brit. Jour. Educ. Psych. 31, 145-151.
Pidgeon, D. A., 1958. A Comparative Study of Basic Attainments, Educational Research,
    Vol. I, No. 1.
Pidgeon, D. A., 1961. The interpretation of test scores, Educational Research, Vol. IV, 33-43.

David A. Walker
AN ANALYSIS OF THE REACTIONS OF SCOTTISH TEACHERS AND PUPILS TO ITEMS
IN THE GEOGRAPHY, MATHEMATICS AND SCIENCE TESTS

To establish the relevance of test items to pupils' learning opportunities is important both from
the point of view of measuring achievement and from that of maintaining the goodwill of teachers
whose pupils undergo the tests. In the following article Dr. Walker shows how this need can be
fulfilled and seeks further to establish the extent to which in- and out-of-school learning
opportunities, ability and other factors appeared from the international tests to be determinants
of success.

When the same test, or series of tests, is administered to pupils of different countries with
different educational systems, it is unlikely that the items will be equally acceptable or equally
useful in all of the countries concerned. The present inquiry was intended in the first place to
assess the reactions of the teachers of the classes concerned to the items used in the tests of
geography, mathematics and science, and secondly to estimate, if possible, the contributions made
by stress on the topics in the curriculum and by the environment to the accuracy with which pupils
answered the questions.

Rating the items


To obtain the opinion of the teachers concerned, copies of the test booklets were sent to each
school taking part in the original investigation. The following instructions were issued with the
booklets.

Instructions for rating test items

(1) The attached test in Geography (Mathematics, Science) was taken by a group of 13-year-old
    pupils in your school last year. You are asked to rate the items of the test in two ways:
    (a) in relation to the curriculum used for these pupils;
    (b) in relation to the environment of the pupils.

(2) Read each test item. Consider the degree to which the knowledge and skills required by the
    test question are typically covered in the instruction of pupils in classes like yours. Give
    a rating on the following three point scale.
    Rating 1  Stressed: well covered in class and in homework (if any).
    Rating 2  Included but not stressed; touched on but not dealt with extensively, intensively
              or repeatedly.
    Rating 3  Not included.
    Enter the rating 1, 2 or 3 in the answer space provided in the test booklet.

(3) Consider next the extent to which pupils have opportunity in the home and in the community
    to encounter knowledge or to use skills such as those involved in the test question. Decide
    whether there is considerable, some, or little exposure to such experiences. Use the
    following rating scale.
    Rating A  Considerable exposure
    Rating B  Some exposure
    Rating C  Little or no exposure
    Enter the rating A, B or C in the answer space provided in the test booklet. Each item will
    thus have a double rating, e.g., 1A, 2C.

(4) Indicate clearly on the front of the test booklet the school and course to which the ratings
    refer. At the time of the tests these courses were described as five-year, three-year with
    one foreign language, three-year with no foreign language and three-year modified. In some
    schools the courses for boys and girls differ and separate booklets will be required for
    each sex. The booklets should be labelled accordingly.

(5) Any comment which teachers may wish to make on the tests will be welcomed by the Council.

All schools taking part in the original investigation co-operated in this supplementary inquiry. A
very small part of the data had to be rejected because some teachers did not give a definite rating,
e.g., an item was rated 1 or 2 or B/C.

Of the 864 curriculum ratings given to the 32 items of the geography test 28 % were 1, 44 %
were 2, and 28 % were 3. In the opinion of the Scottish teachers this test had only a moderate
relation to the topics covered in school work. The environment ratings A, B and C occurred in
the proportions 9 %, 34 % and 57 %, i.e., the teachers felt that only a minor amount of help came
from the environment.

In mathematics the position was similar though in this subject there was a higher proportion
given to rating 1. Of the 1,066 curriculum ratings for the 26 items, 42 % were 1, 34 % were 2 and
24 % were 3. The help expected from the environment was even less in this subject, the rating A
occurring in only 6 % of the replies, B in 28 % and C in 65 %.

In science, the curriculum ratings were similar to those in the other subjects. Of 730 ratings on
21 items, 30 % were 1, 30 % were 2 and 40 % were 3. Greater help was thought to be available
from the environment in this subject, the rating A being given in 20 % of the cases, B in 40 % and
C in 40 %.

The ratings differed greatly from item to item in each test as is shown in the list given in the
Appendix.

It must not be assumed from the figures quoted above or from the data in the Appendix that the
topics with adverse curriculum ratings are not covered in the school courses. In many cases they
occur at a later stage in the curriculum. The pupils tested were mostly in the second year of a
course which is of three to six years' duration and different schools have different schemes for
covering the work.

Agreement among the ratings

It would have been possible to calculate a mean curriculum rating and standard error of the
mean for all teachers, using the values 1, 2 and 3 for the three ratings. This might, however, have
given a misleading picture. For example, an item rated 1 by 20 teachers, 2 by none, and 3 by 20
teachers would then be given a mean rating of 2, which was not actually given by any teacher,
and a standard error of about 0.16, which is relatively small because of the number of teachers
involved. For this reason the table in the Appendix gives the most frequently occurring rating
for each item and not the mean.
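
The arithmetic of this example can be checked with a short sketch. It assumes the standard error is the sample standard deviation (n - 1 divisor) divided by the square root of the number of teachers, which reproduces the figure of about 0.16 quoted above; the variable names are illustrative only.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev

# The split described in the text: 20 teachers give rating 1, none 2, 20 give 3.
ratings = [1] * 20 + [3] * 20

mean_rating = mean(ratings)
std_error = stdev(ratings) / sqrt(len(ratings))  # sample S.D. / sqrt(n)
counts = Counter(ratings)

print(f"mean rating     = {mean_rating}")
print(f"standard error  = {std_error:.2f}")
print(f"teachers giving the mean rating = {counts[mean_rating]}")
```

The mean of 2 with a small standard error looks like firm agreement on a rating that no teacher actually chose, which is the misleading picture the text warns against.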
One possible factor causing disagreement among ratings is the variation in course to suit the
ability and sex of the pupils. In any assessment of the extent of agreement among teachers in
rating the items it is therefore advisable to deal separately with different types of course. The
main types in Scotland are (a) the five-year course for the more gifted pupil, (b) the three-year
course with one foreign language for the pupil a little above the average, (c) the three-year
course with no foreign language for the average pupil and (d) the modified course for the pupil
within the lowest 10 to 20 % of the ability range. In some schools it is also necessary to
differentiate between courses for boys and those for girls.
Within each of these types of course we can assess the extent of agreement among the teachers'
ratings by calculating their variance. If the curriculum ratings are valued at 1, 2 and 3, as given
by the teacher, the variance of the distribution will be zero when all teachers agree, 2/3 when
the ratings are distributed evenly over all three values, and 1 when half of the teachers select
rating 1 and the other half select rating 3, showing maximum disagreement.
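
The three reference values can be reproduced with a minimal sketch, assuming the variance meant is the population variance (divisor n) of the ratings coded 1, 2 and 3. The group size of twelve teachers is arbitrary and chosen only so that the ratings divide evenly.

```python
from statistics import pvariance

# Three illustrative rating distributions for a single item (12 teachers each).
all_agree = [2] * 12                      # every teacher gives the same rating
evenly_spread = [1] * 4 + [2] * 4 + [3] * 4
split_extremes = [1] * 6 + [3] * 6        # half choose 1, half choose 3

for label, ratings in [
    ("all agree", all_agree),
    ("evenly spread", evenly_spread),
    ("split between 1 and 3", split_extremes),
]:
    print(f"{label}: variance = {pvariance(ratings):.3f}")
```

With three rating values, 1 is the largest population variance attainable, so the index runs from 0 (perfect agreement) to 1 (complete disagreement).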
These variances were calculated for all the items of the three tests, using the categories
previously described, and the results are summarised in Table 1.

TABLE
Variances

of Curriculum

9
z
E
5
2

Items
Highest
variance

Lowest
variance

Average
variance

0.30

0
0.15
0

0.39
0.36
0.20

1
0.63
0.67

Five-year

6
11
6
4

0.29

0.75

Three-year
one foreign language
no foreign language (boys)
no foreign language (girls)

G
11
12

0.14
0
0

0.39
0.25
0.25

0.80
0.52
0.67

0.18

0.75

0
0.13
0
0

0.43
0.53
0.43
0.36

0.80
0.98
0.86
1

Five-year
Three-year
one foreign language
no foreign language
modified

Five-year
E
z
z
cn

for Different

Number of
teachers

COURX

2
$
m
:
(3

Rating Distributions

Three-year
one foreign language
no foreign language (boys)
no foreign language (girls)
no foreign language (boys and girls)

It will be observed that even within the main types of course the curriculum ratings of particular
items were in perfect agreement for some items and in complete disagreement for others. A
similar pattern was obtained from the environment ratings. These patterns indicate that the extent
of agreement among the teachers, even within a course, was only moderate when averaged over
all the items of each test. They throw little or no light on the reliability of each teacher's ratings.

The relation between facility of item, teachers' ratings and ability of class

Although the reliabilities of the teachers' ratings were not established by the results of the
preceding section, an effort was made to measure the extent to which the stressing of a topic in
the curriculum, as assessed by the rating, or having help from the environment, similarly
assessed, determined the pupils' proficiencies in that topic.
The proficiency of each group (which might comprise all pupils in a particular course in one
school) was measured by the percentage of correct answers to the appropriate item. This
percentage was converted to the corresponding probit (i.e., normal deviate plus five) for
statistical reasons. The curriculum ratings 1, 2 and 3 were replaced by 1, 0 and -1 and the
environment ratings A, B and C were changed to the numerical scale 1, 0 and -1 also.
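
The probit conversion and the recoding of the ratings can be sketched as follows. The figures used (35 correct answers from 49 pupils, rated 1 B) are the report's own worked example for item 2 of the geography test; the function and dictionary names are hypothetical. Note that a group scoring 0 % or 100 % has no finite probit, which the sketch reports as an error.

```python
from statistics import NormalDist

# Recoding of the teachers' ratings onto the numerical scale 1, 0, -1.
CURRICULUM_CODE = {1: 1, 2: 0, 3: -1}
ENVIRONMENT_CODE = {"A": 1, "B": 0, "C": -1}


def facility_probit(correct, pupils):
    """Probit of a group's facility: the normal deviate of the proportion, plus five."""
    proportion = correct / pupils
    if not 0.0 < proportion < 1.0:
        raise ValueError("probit is undefined for a facility of 0 % or 100 %")
    return NormalDist().inv_cdf(proportion) + 5.0


probit = facility_probit(35, 49)
curriculum = CURRICULUM_CODE[1]
environment = ENVIRONMENT_CODE["B"]
print(f"facility = {100 * 35 / 49:.1f} %, probit = {probit:.2f}")
print(f"curriculum rating = {curriculum}, environment rating = {environment}")
```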
It is likely that the proficiency of a group is affected by the general ability of the group, whatever
the stress in the curriculum or the help provided by the environment. A simple and rough measure
of this ability for each group was obtained by rating five-year courses as 2, three-year courses
with one foreign language as 1, three-year courses with no foreign language as 0, and modified
courses as -1. The objection may be raised that it is not the ability of the pupils that is here
being assessed but their exposure to a given type of course. It would have been possible, by
seeking further information from schools, to establish that the average abilities, as measured by
tests of verbal reasoning, are in the descending order of the numbers given. Readers who prefer
to do so may replace the word "ability" in the following discussion by the phrase "type of
course followed".
For the geography test there were then available 27 different groups, each containing at least
seven pupils, and for each of these groups and for each item of the test there were available (a)
the facility probit; (b) the curriculum rating and (c) the environment rating for the item; and (d)
the ability level of the group. For example, the 49 pupils in the 5-year course in one school gave
35 correct answers to item 2 of the geography test, rated 1 B by the teachers in that school.
Thus the facility percentage for curriculum rating 1, environment rating 0 and ability level 2 was
71.4 % for this group, giving a facility probit of 5.57. The 27 groups then provided the data to set
up for each item the regression equation

    facility probit = b1 x curriculum rating + b2 x environment rating + b3 x ability level

As a first approximation each group was given equal weight, i.e., the differences in the numbers
of pupils in the groups were ignored.
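
An equal-weight fit of this kind can be sketched with ordinary least squares. The sketch is not the computation used in the report: it adds a constant term b0, which the equation as printed does not show, and it solves the normal equations with a small hand-written elimination routine. All names and the synthetic group data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_item(groups):
    """Fit probit = b0 + b1*curriculum + b2*environment + b3*ability.

    groups: list of (curriculum, environment, ability, probit) tuples,
    one per group, each given equal weight. Returns [b0, b1, b2, b3].
    """
    rows = [[1.0, c, e, a] for c, e, a, _ in groups]  # 1.0 is the assumed constant term
    y = [p for *_, p in groups]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic, noise-free groups built from known coefficients; NOT data from the study.
true_b = [5.0, 0.10, -0.05, 0.30]
groups = [
    (c, e, a, true_b[0] + true_b[1] * c + true_b[2] * e + true_b[3] * a)
    for c in (1, 0, -1)
    for e in (1, 0, -1)
    for a in (2, 1, 0, -1)
]
coefficients = fit_item(groups)
print([round(b, 3) for b in coefficients])
```

Giving each group its own row regardless of size is what "equal weight" amounts to here; a weighted fit would instead scale each row by the number of pupils in the group.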
This technique was applied to three items in the geography test, three in the mathematics test
and four in the science test. The items were chosen partly for their relevance to the present
inquiry and partly because results from the main inquiry had suggested points of interest.
A summary of the results is shown in Table 2 in which coefficients which are statistically
significant are marked *. It will be observed that for no item was the regression coefficient for
the curriculum rating significantly different from zero, and only for one item was this true for the
regression coefficient for the environment rating. On the other hand, for all items save one the
regression coefficient for the ability rating was significantly greater than zero. In other words,
the proficiency of a group in answering an item is directly related to the ability level of the
group, but appears to have little relation to the amount of stress given by the teacher to the
topic tested by the item. The fraction of the whole variance of the probits accounted for by the
regression equation varied from a non-significant 12 % for Item 9 of the science test to 57 %
for Item 22 of the mathematics test.
Table 2
Regression for Selected Items

                      Regression coefficients      Standard errors          Percentage of
Test          Item   Curric.   Envt.   Ability    Curric.   Envt.   Ability  variance accounted for
Geography       2     -0.12    0.30    0.36*       0.19     0.18    0.11      50
Geography       8      0.06   -0.04    0.34*       0.11     0.14    0.08      46
Geography      14     -0.11    0.26    0.31*       0.16     0.15    0.11      45
Mathematics     5      0.22    0.16    0.30*       0.21     0.21    0.15      24
Mathematics    12     see discussion
Mathematics    22      0.21    0.11    0.54*       0.13     0.13    0.09      57
Science         1     -0.11    0.44*   0.31*       0.19     0.17    0.11      52
Science         7     -0.04    0.07    0.31*       0.10     0.11    0.08      43
Science         9      0.16   -0.02    0.17        0.14     0.17    0.13      12
Science        18      0.11    0.12    0.26*       0.14     0.10    0.11      23

* Statistically significant.


Item 2 of the geography test was of the information type, asking what use was made of Tundra regions. Item 8 was purely factual, asking into which sea the River Danube flowed. In either item it might have been thought that degree of stress in the curriculum would have markedly affected the proportion of correct answers. This was not so, the accuracy of the pupils' answers being related only to their general ability and not to the stress given in the curriculum or to the help of the environment. The position was very similar in response to Item 14, which was an exercise in interpreting a map.
The first mathematics item to be examined was number 5, which was "multiply 9.04 by 0.4". The performances of the groups varied from 0 % to 100 %, but the regression accounted for only 24 % of the variance. One reason may be that 31 of the 40 teachers gave the curriculum rating 1, showing that this type of calculation was stressed in the curriculum. Within that rating, degrees of stress would no doubt vary.
The second mathematics item examined was number 12, which was a problem involving the calculation of the area of a triangle with base 40 yards and altitude 37 yards. This question proved very difficult for Scottish pupils, twenty-three of the forty groups scoring zero, and the mean score of all groups being only 19 %. As there was so large a number of zero scores, the regression technique was not applied. It was, however, noted that the percentages of correct answers for the three curriculum ratings were 23 %, 18 % and 17 %, while those for the four ability groups were 49 %, 24 %, 12 % and 0 %.
Item 22 of the mathematics test was again a calculation of areas, but in this case an example
was given. Scottish pupils fared better on this item; every group contained pupils giving correct
answers and the mean score of all groups was 59 %. The percentage of the variance accounted
for by the regression equation was 57, the highest of all the items examined.
Items 1, 7 and 9 of the science test were selected partly because sex differences were shown in the percentages of correct answers, boys being superior in all three. The first referred to the force required to push an object up an inclined plane and the teachers, while not stating that this type of question was stressed more frequently with boys than girls, appeared to be of opinion that it was the kind of problem more likely to occur in a boy's environment than a girl's. This was the only item in which the environment rating was significant.
Boys were also superior in their replies to Item 7, which dealt with the principle of flotation, but
neither curriculum rating nor environment rating appeared to affect the regression equation.
The responses to Item 9, on the principle of the lever, provided some surprises. Not one of the regression coefficients was significant nor was the contribution of all three together, the percentage of the variance attributable to regression being only 12 %. The percentage correct over all groups was 36 % and the percentage for the various groups ranged from 9 to 75 %. As the item was a multiple choice one, with four possible answers, there is a suggestion here that a fair amount of guessing had occurred. This idea is supported by the fact that the curriculum rating for 26 of the 36 groups indicated that the topic had not been referred to by those teachers.
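The suggestion of guessing can be made concrete by comparing the observed percentages with the level expected from chance alone on a four-choice item. The sketch below uses the classical correction for guessing, which is not applied anywhere in the report and is introduced here purely as an illustration. It assumes that pupils who do not know the answer choose among the four options at random, and the function names are hypothetical.

```python
def chance_level(n_choices):
    """Percentage correct expected if every pupil guessed at random."""
    return 100.0 / n_choices


def corrected_for_guessing(percent_correct, n_choices):
    """Classical correction: estimated percentage who knew the answer.

    Assumption: pupils who do not know the answer guess uniformly at random.
    """
    chance = 1.0 / n_choices
    p = percent_correct / 100.0
    return 100.0 * (p - chance) / (1.0 - chance)


print(chance_level(4))                             # 25.0
print(round(corrected_for_guessing(36.0, 4), 1))   # overall figure of 36 % correct
print(round(corrected_for_guessing(9.0, 4), 1))    # lowest group; negative, i.e. below chance
```

On this assumption the overall figure of 36 % lies only a little above the 25 % to be expected from random choice, and the lowest-scoring groups fall below it.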
Finally, Item 18 of the science test, which referred to the usual method of estimating the age of a
tree, produced a good response from the Scottish pupils, but once again the only factor associated with success was the ability rating, and the proportion of the total variance accounted for
by all three factors was only 23 %.
The results of this analysis may be disappointing to teachers in that so little difference seems to be made to the proportion of correct answers by their stressing or not stressing particular topics. It must be borne in mind, however, that the measures used were comparatively coarse and that the analysis has been made as simple as possible. With these reservations, it would appear that, at the age at which the tests were administered, ability is a greater determinant of success than stress by teacher or help from environment, and that other factors are, in most cases, having greater effect than all three together.


APPENDIX
Most frequently given ratings for each item

Item    Geography    Mathematics    Science

IB

18

3c

28

1c

3c

2c

IC

3c

IC

1c
2B

1c

IC
38

2c

2B

3c

1C

1B

3c

2c

16

IB

1C

IC

3c

10

2c

IC

3c

11

2C

IC

3c

3A
1B

12

2c

13

2c

2c

IB

14

2c

IA

15

2c

3c
2B

16

2c

2c

2c

17

2c

2c

28

18

2c

3c

25

19

2c

3c

2c

20

2c

3c

3B

21

2c

1B

1C

22

2c

1c

23

2c

3c

24

2c

36

25

2c

3c

26

2c

3c

27

2c

28

2c

29

3c

30

3c

31

3c

32

3c

IC
