
Hayward, Stewart, Phillips, Norris, & Lovell 1

Test Review: The Renfrew Bus Story (RBS)

Name of Test: The Renfrew Bus Story Language Screening by Narrative Recall (American Edition)
Author(s): Catherine Renfrew; adapted by Judy Cowley and Cheryl Glasgow
Publisher/Year: 1969; US edition 1994 published by The Centerville School Delaware
Forms: only one
Age Range: 3 years, 6 months to 7 years
Norming Sample

Previously, norming was completed for Australian and U.K. populations. In this adapted edition, the American authors served as
examiners and scorers for a sample of 418 children in Mid-Atlantic states, Florida, and Illinois (Renfrew, Cowley, & Glasgow,
1994, p. 29). In terms of biases, the authors state that the story line "proved regionally, ethnically, and sexually unbiased," as supported by
the means of the Information, Sentence Length, and Complexity Scores across the sample. See Figure 1 (p. 29).

Comment: At this point, I was already having some reservations about the Bus Story. For example, the Figure on page 29 referred to
is simply a tabulation and really does nothing to assure me that biases were eliminated. I will comment more on test bias later in this
review.

Total Number: 418


Number and Age: The test was normed on a sample of 418 children in seven age ranges as follows: 3 years, 6 months to 3 years, 11
months (n=43), 4 years, 0 months to 4 years, 5 months (n=65), 4 years, 6 months to 4 years, 11 months (n=78), 5 years, 0 months to 5
years, 5 months (n=71), 5 years, 6 months to 5 years, 11 months (n=51), 6 years, 0 months to 6 years, 5 months (n=52), and 6 years, 6
months to 6 years, 11 months (n=58).
Location: Testing occurred in public, private, and parochial schools in Florida, Illinois and unspecified Mid-Atlantic states
(Renfrew et al., 1994, p. 29).
Demographics: Figure 2, Subject Distribution Chart, indicates that 228 males and 190 females participated, with males outnumbering
females at every age range except 5 years, 6 months to 5 years, 11 months (males = 25, females = 26). Figure 1 (no
title, p. 29) lists 349 Caucasian, 32 Black, and 13 Hispanic/Others.
Rural/Urban: Figure 1 indicates 281 students from urban settings and 137 students from rural settings were tested. Community sizes
and/or locations are not provided.
SES: The authors provide only the statement, "The children who participated represented a range of socio-economic levels and both
rural and urban backgrounds" (Renfrew et al., 1994, p. 29).

Other: Children with hearing impairment, language delay, learning disability, or otherwise identified by the teacher or school were
excluded.

Comment: Without representation of these clinical groups, the scores of children with disabilities cannot be interpreted.

Comment: Details are lacking on all aspects of the sample, which lends the impression that the authors are either careless or dodgy.
No U.S. census information is provided for comparison. Numbers are small throughout, with no age group reaching the accepted
guideline of at least 100 participants. Overall, without detailed sample information, it is difficult for examiners to decide whether the
children they serve are represented.
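
The counts quoted above can be cross-checked in a few lines. This is a minimal sketch that uses only the figures reported in this review; the variable names are illustrative, and the 100-participant threshold is the guideline mentioned in the comment above, not a value taken from the manual.

```python
# Counts as reported in this review (Renfrew et al., 1994, p. 29).
age_groups = {
    "3;6-3;11": 43,
    "4;0-4;5": 65,
    "4;6-4;11": 78,
    "5;0-5;5": 71,
    "5;6-5;11": 51,
    "6;0-6;5": 52,
    "6;6-6;11": 58,
}
breakdowns = {
    "gender": {"male": 228, "female": 190},
    "race": {"Caucasian": 349, "Black": 32, "Hispanic/Others": 13},
    "setting": {"urban": 281, "rural": 137},
}
TOTAL = 418
MIN_PER_GROUP = 100  # guideline cited in the comment above (an assumption here)

age_total = sum(age_groups.values())
largest_group = max(age_groups.values())
below_guideline = [g for g, n in age_groups.items() if n < MIN_PER_GROUP]

# Children not accounted for in each demographic breakdown.
unaccounted = {k: TOTAL - sum(v.values()) for k, v in breakdowns.items()}

print(age_total, largest_group, len(below_guideline))
print(unaccounted)
```

On these figures the seven age groups sum to 418 and all seven fall below 100. The race breakdown accounts for 394 children, so 24 are left unclassified in the counts as reported here.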

One Buros reviewer noted that excluding children with special learning concerns results in poorer standard scores for these children
than if they had been included. Generally, both Buros reviewers had concerns about the standardization sample, which they felt
limited the test's utility (Bain & Haccoun, 2001).

Summary Prepared By (Name and Date): Eleanor Stewart, 6 and 13 July 2007 and September 2008
Test Description/Overview

The test kit consists of the story booklet, a record form, and the examiner's manual. The scripted story is found on page 6 of the
manual. A tape recorder is required to audio record the child's story retell and responses to questions. The record form contains a
section called "Transcript," which is used to record the child's retell. Scoring for each area of analysis is presented in columns to the
right of the transcribed responses. Scoring for Independence is presented in a separate section at the top left of the record form.

Purpose of Test: This test is a screening tool designed to be used along with other assessment tools to guide further areas of
diagnostic testing. Catherine Renfrew, the originator, is quoted in the manual as stating the purpose of the test as a screening to
ascertain in a simple form the ability to give a coherent description of a continuous series of events (Renfrew et al., 1994). The Bus
Story measures children's ability to retell relevant information about a story (p. 1). The analysis of the children's retell provides
information about the children's integrative language skills using a naturally occurring activity.

Theory: Though a theory is not identified, the authors provide a rationale for using story retell as a means of assessing language
skills by presenting general notes on key research on narratives, noting that narratives in different forms have been used to explore a
range of developmental skills including: psychosocial, metacognitive, strategy use, and vocabulary. Further, they present a brief
review of research that centers on factors they describe as: the integration factor, the natural factor, the support factor, and the
academic connection. On this last factor, they cite the work of Haynes and Naidoo (1991) who found that performance on the Bus
Story was a predictor of reading comprehension (Renfrew et al., 1994, p. 4).

The Integration Factor taps high-level language skills beyond the word and sentence levels commonly used in language testing.
Therefore, the Bus Story is viewed as a more sensitive measure that reflects the classroom demands that students face. The Natural
Factor refers to the storytelling aspect of the Bus Story, which permits the child to engage in a familiar activity that occurs at home and
at school. Storytelling is associated with memory and concept development. Stories function to internalize our experiences and help us to
comprehend them. The Academic Connection highlights the link between early language delays and later academic
difficulties. Specifically, story retell has been demonstrated to predict reading comprehension. Finally, the Support Factor of the Bus
Story incorporates a structural and procedural format that supports the child's comprehension, memory, and formulation along the lines
of developmental expectations. The authors state, "Due to the complex integrative demands of narrative recall, the Renfrew Bus
Story is designed to provide maximum structure in order to elicit a child's best possible performance" (Renfrew et al., 1994, p. 4).

Areas Tested: oral narrative skills


Oral Language Vocabulary Grammar Narratives Other

Who can Administer: Recommendations regarding who would be qualified to administer the test are not specified.

Administration Time: Administration time is not specified; however, it appears to take less than 10 minutes. More time is needed
for transcription and scoring.

Test Administration (General and Subtests):

This test is individually administered. The manual recommends that a quiet setting be chosen. The child is given an opportunity to
look at the test booklet prior to beginning administration. The examiner then instructs the child to listen as the adult reads the scripted
story aloud. There are 12 pictures. The child is then invited to tell the story, with the adult's prompt, "Now you tell me the story. Once
upon a time there was a ..."

Prompting is allowed, with specified prompts outlined in the manual on page 5 as well as on page 25 in the section "Independence
Score." An Inferential question is included because a response is seen as a strong indicator of a child's ability to integrate
information across propositions as is surface narrative recall (Renfrew et al., 1994, p. 27). (The question reads: "Do you think the bus
was happy to be on the road again? Why?")

The children's retells are then transcribed from the tape. A section in the manual, "Transcription" (p. 7), provides general
transcription guidelines as well as definitions and examples for what constitutes a sentence. (Comment: I think that an experienced
clinician can probably transcribe portions of the child's retell in real time, then check the transcript against the tape and add any
missing information.)

Test Interpretation:

Using the transcribed utterances and the test manual, the examiner scores for key information contained in the story (content),
sentence length, linguistic complexity, and level of cueing. Scoring involves the following categories, for which conversions from raw scores to
standard scores and percentiles are provided: a) Information (i.e., content), b) Sentence Length, c) Complexity (i.e., subordinate and
relative clauses), and d) Independence (level of cueing/prompt).

Scoring requires familiarity and practice. In the standardization section called "Test Preparation," the authors state, "Scorers should
familiarize themselves thoroughly with complexity examples before scoring" (Renfrew et al., 1994, p. 30). This statement follows the
report of inter-tester reliability, in which low correlations were reported for Complexity.

Scoring instructions begin on page 9 with general guidelines that include instructions for scoring Information with 1 or 2 points
awarded to a response that captures the semantic sense of the item; that is, if the meaning of the essential component is present
(Renfrew et al., 1994, p. 9). Detailed Examples for Information Scoring follow on pages 11 through 21.

Dialect, colloquialisms, and immaturities are designated as responses that may be given credit. The authors direct examiners to the CELF-P manual
(Semel, Wiig, & Secord, 1992) for information on dialects.

In terms of interpretation, the authors do not discuss test results in terms of implications for reading or academic success. They give only a
surface treatment of the connection between narrative retell and academic outcomes in the opening statements, as I described earlier.

Comment: It is difficult to provide scoring when a test such as this one aims for naturalistic communication. I guess that something
is lost when we try to narrow the field. When larger standardization studies are conducted, a pool of responses can be analyzed in
more detail, thus increasing the likelihood of accurate scoring among examiners. This was done for the qualitative responses in the
CELF-4.

I think that the implication of test performance needs to be clearly outlined. Clinicians using this test probably draw on their
knowledge of the literature and of narrative development.

One Buros reviewer states: "Leaving aside the omissions and technical ambiguities, the interpretation of results may prove
somewhat taxing in practice. Only the Complexity dimension (low scores flag caution) is associated with explicit interpretation
labels. This dimension is the very one that shows the weakest psychometric qualities" (Bain & Haccoun, 2001, p. 1007).

Standardization: Percentiles; Standard scores; Other: confidence intervals (68%, 80%, and 90%) for Information and
Sentence Length (Renfrew et al., 1994, p. 46).

The Independence Score (p. 48) provides the average ranges for Independence scores by age interval. The authors note
that at 6 years, 11 months, the children generally moved independently through the story retelling (p. 31). According to
the chart, these children's scores are in the average range of 48 to 55.

The Complexity Reference Chart (p. 47) provides qualitative ratings (concern, low average, average, above average,
superior) for performance in percentages for each age interval. For example, at 5 years, 0 months to 5 years, 5 months, a
raw score of 0 was obtained by 13% of their sample, which is rated "concern." In the explanation on page 31, the authors
state, "The label is more important than the percentage. The percentage represents an overall view of performance at each
age level" (p. 31).

Comment: I am not entirely sure how the authors developed the Complexity Reference Chart, and therefore I'm not sure
what it means.
Reliability

Internal consistency of items: No information is provided.

Test-retest: Twenty-seven children ranging in age from 4 years, 6 months to 6 years, 11 months were retested after a four-week
interval. The test-retest coefficients were: Information .7918, Sentence Length .7293, and Complexity .5825.
Comment: Was this done by the same examiner? Where did the children come from? Why is the sample so small? Not much
information is provided.
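
One way to quantify the small-sample concern is to put approximate confidence intervals around the reported coefficients. The sketch below assumes the coefficients are Pearson correlations (this review does not say how the manual computed them) and uses the standard Fisher z approximation; the function name `fisher_ci` is illustrative.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson r via Fisher's z."""
    z = math.atanh(r)            # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)  # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

N_RETEST = 27  # retest sample size reported in the manual

# Test-retest coefficients as reported; treating them as Pearson r is an assumption.
reported = {
    "Information": 0.7918,
    "Sentence Length": 0.7293,
    "Complexity": 0.5825,
}

intervals = {name: fisher_ci(r, N_RETEST) for name, r in reported.items()}
for name, (low, high) in intervals.items():
    print(f"{name}: r = {reported[name]:.2f}, 95% CI about {low:.2f} to {high:.2f}")
```

With only 27 children, each interval is wide, so the point estimates alone say little about how stable the scores really are.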

Inter-rater: Inter-rater reliability was assessed with two special education teachers without a formal language background
(Renfrew et al., 1994, p. 30) who scored 25 randomly chosen transcripts. Their scoring was then compared to the two authors' scores,
with correlations calculated. Correlations were reported as:
Information: .92, .72, and .70.
Sentence Length: .70, .83, and .81.
Complexity: .22, .60, and .33.

Other: none

Comment: The low reliability coefficients affect validity, thus making the Bus Story less useful for its intended purpose. Reliability
coefficients should exceed .80 (over .90 is desirable). Given that the scoring introduces subjective judgment, it is disappointing that
higher reliability coefficients were not achieved. Further, details are missing as to how the reliability was co-ordinated with two
authors and two other raters. Shouldn't there have been two sets of data, one for each author?

It seems that the authors had some concerns about the inexperienced teachers (who performed the reliability testing) but they did not
explicitly state what their concerns were except to make a veiled statement about how examiners need to be completely familiar with
scoring procedures (Renfrew et al., 1994, p. 30). Also, given that the Renfrew Bus Story is so well-established in clinical practice, it
seems to me that it would have been fairly easy to do a larger sample for reliability.
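
The guideline stated in the comment above (coefficients should exceed .80, with over .90 desirable) can be applied mechanically to the reported inter-rater correlations. This is a minimal sketch; the threshold constants and the `rate` helper are illustrative names, and the review does not say which rater pairing each of the three values represents.

```python
ACCEPTABLE = 0.80  # minimum suggested in the comment above
DESIRABLE = 0.90   # level described as desirable in the comment above

# Inter-rater correlations as reported (Renfrew et al., 1994, p. 30).
inter_rater = {
    "Information": [0.92, 0.72, 0.70],
    "Sentence Length": [0.70, 0.83, 0.81],
    "Complexity": [0.22, 0.60, 0.33],
}

def rate(r):
    """Label one coefficient against the two thresholds."""
    if r > DESIRABLE:
        return "desirable"
    if r > ACCEPTABLE:
        return "acceptable"
    return "below guideline"

ratings = {name: [rate(r) for r in values] for name, values in inter_rater.items()}
n_below = sum(label == "below guideline" for labels in ratings.values() for label in labels)

for name, labels in ratings.items():
    print(name, labels)
print(f"{n_below} of 9 coefficients fall below the guideline")
```

Applied to the values as reported, six of the nine coefficients fall below .80, including all three for Complexity.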

Validity

Content: The authors cite Paul and Smith (1993) to support their claim as to the validity of the narrative recall task represented by the
Bus Story as a measure of an integrative task (Renfrew et al., 1994, p. 30). Also, an opening section, "History and Description,"
provides an overview of research as previously described. Comment: This information provides slim qualitative evidence of content
validity. No quantitative analyses such as item analysis were offered.

Criterion Prediction Validity: The US and UK versions were correlated as follows: Information = 0.9773 and Sentence Length =
0.9832. No other tests were reported.

Comment: Because the norming sample information is wanting, I am left wondering about the concurrent validity. We don't know
anything about the UK sample from the manual. Also, this line of evidence for validity is generally demonstrated with correlations to
existing gold standard measures. While it might be argued that no corresponding narrative test existed when the Bus Story
was developed, the authors missed an opportunity to at least state the situation.

Construct Identification Validity:

Comment: This line of evidence for validity is not explicitly identified. Although some data is offered in the figures on pages 29 and
30, the reader would have to be familiar with the different ways that construct identification may be expressed.

Differential Item Functioning: The authors state only, "Analysis of the Information, Sentence Length, and Complexity scores have
[sic] indicated that children perform increasingly well as a function of chronological age" (Renfrew et al., 1994, p. 30). They give the
mean scores for each category in support of this statement. For
example, the Information score began at 15.00 and rose smoothly to 31.00 as age increased (p. 30).

Figure 1 (untitled) on page 29 provides the mean scores for Information, Sentence Length, and Complexity for each category of
student (i.e., by gender, race, and urban or rural location). The authors state that this information demonstrates that the
"story-line proved regionally, ethnically, and sexually unbiased" (Renfrew et al., 1994, p. 29).

Comment: Test bias is usually addressed by examining the results of subgroups within the normative sample and by providing
reliability and validity information for each group. In this regard, the Bus Story is not providing the type of information generally
regarded as evidence for bias-free testing. I am not sure how to interpret the data which the authors offer other than to recognize it
as not among the methods generally used.

Other: The authors reported that the British version was shown to be an effective tool in the identification of individuals with
language delays (Renfrew et al., 1994, p. 30). They cite two references for this statement: a Bishop and Edmundson (1987) article
on distinguishing between transient and persistent language impairment and a Howland and Kendall (1991) article about which test
to use in assessing language.

Comment: Their claim regarding identification would be better supported with data showing criterion-related validity and with
sensitivity and specificity information.
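
For readers less familiar with these terms, the sketch below shows how sensitivity and specificity would be computed if the screening results were cross-tabulated against a reference diagnosis. The counts are entirely hypothetical and are not data from the manual or from any study; the class and field names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class ScreeningTable:
    """2x2 table of screening outcome against a reference diagnosis."""
    true_pos: int   # language delay present, flagged by the screen
    false_neg: int  # language delay present, missed by the screen
    true_neg: int   # no delay, passed by the screen
    false_pos: int  # no delay, flagged by the screen

    @property
    def sensitivity(self) -> float:
        # Proportion of children with a delay whom the screen identifies.
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        # Proportion of children without a delay whom the screen passes.
        return self.true_neg / (self.true_neg + self.false_pos)

# HYPOTHETICAL counts for illustration only; not data from the manual.
example = ScreeningTable(true_pos=18, false_neg=6, true_neg=68, false_pos=8)
print(f"sensitivity = {example.sensitivity:.2f}, specificity = {example.specificity:.2f}")
```

Figures of this kind, reported for the American sample against an accepted reference measure, would let examiners judge how often the screen misses or over-identifies children.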

Summary/Conclusions/Observations:

The Bus Story was among the early language tests to offer a means of ecologically sampling and measuring story retell. As such, it
was an important early contribution to the use of narratives in language testing. While it is easy to administer, scoring may be
challenging and interpretation is weak given the concerns with the normative sample and psychometric properties.

Some additional comments from the Buros reviewers are useful:

Buros reviewer (Bain): "A third concern involves caution in scoring and interpreting the Complexity subtest, which exhibits the
weakest reliability data and appears to have an inadequate floor at lower age levels. The test authors do advise caution in using this
subtest. On the other hand, the Complexity subtest might prove useful in screening for characteristics of giftedness, an issue not
discussed in manual, but perhaps worth investigating" (Bain & Haccoun, 2001, p. 1004).

Buros reviewer (Haccoun): "Test users will find no help from the manual in defining the types of interpretations that may be reached
given a particular pattern of scores. In the end, and after much complicated scoring, the evaluator is left pretty much on his or her
own when it comes time to reach a conclusion about a child. They would then need to rely rather strongly on intuitive clinical sense.
From a practical as well as a technical viewpoint, this considerably reduces the potential usefulness of this device" (Bain &
Haccoun, 2001, pp. 1007-1008).

Clinical/Diagnostic Usefulness:

Part of the appeal of a test/procedure like the Bus Story is that it is much like language sampling, which clinicians are used to doing.
It is straightforward in this regard. One only has to learn the coding. I suspect that there are wide variations in how clinicians code
children's responses, as the reported reliability is low. Some children will provide unique responses which, though not awarded points
in the coding system, will be clinically important. The test authors do not provide sufficient information overall to assist with
interpretation.

As an integrative task, the Bus Story points to the elements necessary for story retell. But if a child does poorly, the clinician will still
have to backtrack to locate the level of the problem. Also, I wonder what role memory plays, since this is a retell task and scoring is
based on the exact information the child recalls from the adult's story.

It's not clear to me how the scores were developed and how they compare to a larger population of children or to other tests. A
clearer picture would help me to know how to interpret the results.

As a screening measure, the Bus Story is probably too time-consuming for general use in school settings. However, if the child is
identified as a potential candidate for intervention, I would be inclined to use the Bus Story to sort out elements that are challenging
for the child and to guide further investigation as the authors claim. In this sense, I would be using the test in a qualitative way due
to the caution about the test's overall psychometric properties.

If the Bus Story were proposed as a screening tool, with sufficient time allocated, I still would be reluctant to use it as a basis for
candidacy due to its weak psychometrics, especially since I would be looking for information on accuracy at that point.

I would not use the Bus Story to make funding decisions.

References

Bain, S., & Haccoun, R. (2001). Review of the Renfrew Bus Story. In B. S. Plake & J. C. Impara (Eds.), The fourteenth mental
measurements yearbook (pp. 1003-1008). Lincoln, NE: Buros Institute of Mental Measurements.

Bishop, D., & Edmundson, A. (1987). Language-impaired four-year-olds: distinguishing transient from persistent impairment.
Journal of Speech and Hearing Disorders, 52, 156-173.

Haynes, C., & Naidoo, S. (1991). Children with specific speech and language impairment. Clinics in Developmental Medicine, 119,
MacKeith Press.

Howland, P., & Kendall, L. (1991). Assessing children with language tests - which tests to use? British Journal of Disorders of
Communication, 26, 355-367.

Paul, R., & Smith, R.L. (1993). Narrative skills in four-year-olds with normal, impaired, and late-developing language. Journal of
Speech and Hearing Research, 36, 592-598.

Renfrew, C., Cowley, J., & Glasgow, C. (1994). The Renfrew bus story language screening by narrative recall (American edition). The
Centerville School Delaware.

Semel, E., Wiig, E., & Secord, W. (1992). Clinical evaluation of language fundamentals-preschool. San Antonio, TX: The
Psychological Corp.

To cite this document:

Hayward, D. V., Stewart, G. E., Phillips, L. M., Norris, S. P., & Lovell, M. A. (2008). Test review: The Renfrew Bus Story (RBS).
Language, Phonological Awareness, and Reading Test Directory (pp. 1-10). Edmonton, AB: Canadian Centre for Research on
Literacy. Retrieved [insert date] from http://www.uofaweb.ualberta.ca/elementaryed/ccrl.cfm.
