
Corpus Linguistics: What It Is and How It Can Be Applied to Teaching


Daniel Krieger
dannykrieger99 [at] hotmail.com
Siebold University of Nagasaki (Nagasaki, Japan)

Introduction
In recent years a lot of investigation has been devoted to how computers can
facilitate language learning. One specific area on the computer frontier which
still remains quite open to exploration is corpus linguistics. Having heard a
declaration that corpora will revolutionize language teaching, I became very
curious to find out for myself what corpus studies have to offer the English
language teacher and how feasible such an implementation would be. This
article will address those questions by examining what corpus linguistics is,
how it can be applied to teaching English, and some of the issues involved.
Resources are also included which will assist anyone who is interested in
pursuing this line of study further.

What is Corpus Linguistics?


Corpora, Concordancing, and Usage
In order to conduct a study of language which is corpus-based, it is necessary
to gain access to a corpus and a concordancing program. A corpus consists
of a databank of natural texts, compiled from writing and/or a transcription of
recorded speech. A concordancer is a software program which analyzes
corpora and lists the results. The main focus of corpus linguistics is to
discover patterns of authentic language use through analysis of actual usage.
The aim of a corpus-based analysis is not to generate theories of what is
possible in the language, as Chomsky's phrase structure grammar does: it
can generate an infinite number of sentences but does not account for
the probable choices that speakers actually make. Corpus linguistics is
concerned only with the usage patterns in the empirical data and what they
reveal to us about language behavior.
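For readers curious about what a concordancer does internally, the following is a minimal Python sketch of a keyword-in-context (KWIC) listing. It is an illustration only, not the program used for this article: the function name `concordance`, the `width` parameter, and the sample text are assumptions made for the example (the first two sample sentences are borrowed from Appendix 1; the third is invented to show that "many" and "anyone" are not matched).

```python
import re

def concordance(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of keyword."""
    lines = []
    # \b restricts matches to whole words, so "any" skips "many" and "anyone".
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines

# Hypothetical miniature corpus; a real corpus would be far larger.
sample = (
    "I don't have any problem with that. "
    "You can do it any way you want. "
    "There are many ways, and anyone can try."
)

kwic_lines = concordance(sample, "any")
for line in kwic_lines:
    print(line)
```

Published concordancers add sorting, frequency counts, and register filters on top of this basic idea.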
Register Variation
One frequently overlooked aspect of language use which is difficult to keep
track of without corpus analysis is register. Register consists of varieties of
language which are used for different situations. Language can be divided
into many registers, which range from the general to the highly specific,
depending upon the degree of specificity that is sought. A general register
could include fiction, academic prose, newspapers, or casual conversation,
whereas a specific register would be a sub-register within academic prose,
such as scientific texts, literary criticism, or linguistics studies, each with
its own field-specific characteristics. Corpus analysis reveals that language
often behaves differently according to the register, each register having some
unique patterns and rules.
The Advantages of Doing Corpus-Based Analyses
Corpus linguistics provides a more objective view of language than that of
introspection, intuition and anecdotes. John Sinclair (1998) pointed out that
this is because speakers do not have access to the subliminal patterns which
run through a language. A corpus-based analysis can investigate almost any
language patterns--lexical, structural, lexico-grammatical, discourse,
phonological, morphological--often with very specific agendas such as
discovering male versus female usage of tag questions, children's acquisition
of irregular past participles, or counterfactual statement error patterns of
Japanese students. With the proper analytical tools, an investigator can
discover not only the patterns of language use, but the extent to which they
are used, and the contextual factors that influence variability. For example,
one could examine the past perfect to see how often it is used in speaking
versus writing or newspapers versus fiction. Or one might want to investigate
the use of synonyms like begin and start or big/large/great to determine their
contextual preferences and frequency distribution.
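A comparison of this kind comes down to counting each word in each register and normalizing by the size of the register, so that corpora of different lengths can be compared. The Python sketch below assumes two invented one-sentence "registers" purely for illustration; the names `tokenize`, `rate_per_thousand`, and `registers` are hypothetical, and the resulting numbers say nothing about real English usage.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def rate_per_thousand(text, words):
    """Occurrences of each word per 1,000 tokens of text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    return {word: 1000 * counts[word] / len(tokens) for word in words}

# Invented miniature "registers", for illustration only; a real study
# would load thousands of words of authentic text per register.
registers = {
    "conversation": "shall we start now I want to start early so we can begin soon",
    "academic": "the trials begin in spring and the surveys begin later while interviews start last",
}

rates = {name: rate_per_thousand(text, ["begin", "start"])
         for name, text in registers.items()}

for name, by_word in rates.items():
    for word, rate in by_word.items():
        print(f"{name:<13}{word:<6}{rate:7.1f} per 1,000 words")
```

This sketch counts exact word forms only; a fuller analysis would also group inflected forms such as "begins" or "started" under one headword.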

Applying Corpus Linguistics to Teaching


According to Barlow (2002), three realms in which corpus linguistics can be
applied to teaching are syllabus design, materials development, and
classroom activities.
Syllabus Design
The syllabus organizes the teacher's decisions regarding the focus of a class
with respect to the students’ needs. Frequency and register information could
be quite helpful in course planning choices. By conducting an analysis of a
corpus which is relevant to the purpose of a particular class, the teacher can
determine what language items are linked to the target register.
Materials Development
The development of materials often relies on a developer's intuitive sense of
what students need to learn. With the help of a corpus, a materials developer
could create exercises based on real examples which provide students with
an opportunity to discover features of language use. In this scenario, the
materials developer could conduct the analysis or simply use a published
corpus study as a reference guide.
Classroom Activities
These can consist of hands-on, student-conducted language analyses in which
the students use a concordancing program and a deliberately chosen corpus
to make their own discoveries about language use. The teacher can guide a
predetermined investigation which will lead to predictable results or can have
the students do it on their own, leading to less predictable findings. This
exemplifies data-driven learning, which encourages learner autonomy by
training students to draw their own conclusions about language use.

Teacher/Student Roles and Benefits


The teacher would act as a research facilitator rather than the more traditional
imparter of knowledge. The benefit of such student-centered discovery
learning is that the students are given access to the facts of authentic
language use, which comes from real contexts rather than being constructed
for pedagogical purposes, and are challenged to construct generalizations
and note patterns of language behavior. Even if this kind of study does not
have immediately quantifiable results, studying concordances can make
students more aware of language use. Richard Schmidt (1990), a proponent
of consciousness-raising, argues that "what language learners become
conscious of -- what they pay attention to, what they notice...influences and in
some ways determines the outcome of learning." According to Willis (1998),
students may be able to determine:
• the potential different meanings and uses of common words
• useful phrases and typical collocations they might use themselves
• the structure and nature of both written and spoken discourse
• that certain language features are more typical of some kinds of text
  than others
Barlow (1992) suggests that a corpus and concordancer can be used to:
• compare language use--student/native speaker, standard
  English/scientific English, written/spoken
• analyze the language in books, readers, and course books
• generate exercises and student activities
• analyze usage--when is it appropriate to use obtain rather than get?
• examine word order
• compare similar words--ask vs. request

Problematic Issues Involved


Several challenges are involved in implementing the use of a corpus for the
purpose of teaching. The first is that of corpus selection. For some teaching
purposes, any large corpus will serve. Some corpora are available on-line for
free (see Appendix 2) or on disk. But the teacher needs to make sure that the
corpus is useful for the particular teaching context and is representative of the
target register. Another option is to construct a corpus, especially when the
target register is highly specific. This can be done by using a textbook, course
reader, or a set of articles which the students have to read or which are
representative of what they have to read. A corpus does not need to be large
in order to be effective. The primary consideration is that of relevance to the
students--it ought to be selected with the learning objectives of the class in
mind, matching the corpus to the purpose for learning.
Related to the issue of corpus selection is that of corpus bias, which can
cause frustration for the teacher and student. This is because the data can be
misleading; if one uses a very large general corpus, it may obscure the
register variation which reveals important contextual information about
language use. The pitfall is that a corpus may tell us more about itself than
about language use. Another obstacle to confront is the comprehensibility
issue: if you use concordancing in a class, it can be quite difficult for the
students (or even the teacher) to understand the data that it provides. Lastly,
there is the issue of learning style differences--for some students, discovery
learning is simply not the optimal approach. All of these points reinforce the
caveat that careful consideration is required before a new technology is
introduced in the classroom, especially one which has not been thoroughly
explored and streamlined.

Exploiting a Corpus for a Classroom Activity


Although corpora may sound reasonable in theory, applying them to the
classroom is challenging because the information they provide appears to be so
chaotic. For this reason, it is the teacher's responsibility to harness a corpus
by filtering the data for the students. Although I support having students
conduct their own analyses, at present I see corpora’s greatest potential as a
source for materials development. Susan Conrad (2000) suggests that
materials writers take register-specific corpus studies into account. Biber,
Conrad and Reppen (1998) emphasize the need for materials writers to
acknowledge in their materials design the frequencies of words and
structures which corpus studies reveal. (See Appendix 1 for an example.)

Taking a Closer Look at "Any"


As an English teacher, I have always taught "any" in the following way:
• Interrogatives: Are there any Turkish students in your class?
• Negatives: No, there aren't any Turkish students in my class.
• Affirmatives: *Yes, there are any Turkish students in my class.
A corpus study by Mindt (1997) concluded that 50% of any usage takes place
in affirmative statements, 40% in negative statements, and only 10% in
interrogatives. My own concordance analysis bore his claim out, so I
constructed the following exercise to represent the percentage distribution of
the three structural uses of any, using ten representative examples. The
purpose of this exercise is to get the students to discover three usage
patterns and their relative frequency. These concordance lines can also be
exploited for other purposes such as defining functions and common
language chunks of any. It is assumed that an exercise like this would be part
of a lesson context in which the students were studying quantifiers or
something related.
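As a rough check on how ten concordance lines can mirror a 50/40/10 split, the Python sketch below sorts the ten lines from Appendix 1 into the three structural uses and prints their share. The classification rule is a deliberately crude assumption made for this example (a negation word anywhere in the line means negative, otherwise a question mark means interrogative, otherwise affirmative); it is not the method used by Mindt or in my own analysis, and it would misjudge lines where the negation does not apply to any. The names `NEGATION`, `classify`, and `lines` are hypothetical.

```python
import re
from collections import Counter

# Crude negation detector: "not", "no", "never", or a contracted "n't".
NEGATION = re.compile(r"\b(?:not|no|never)\b|n't", re.IGNORECASE)

def classify(line):
    """Guess the structural use of "any" in one concordance line."""
    if NEGATION.search(line):
        return "negative"
    if "?" in line:
        return "interrogative"
    return "affirmative"

# The ten concordance lines from Appendix 1.
lines = [
    "This is going to be a test like any other test, like, for example",
    "working with you.. If there are any questions about how we're going to",
    "and I didn't receive any materials for the November meeting",
    "and it probably won't make any difference. I mean, that's the next",
    "You can do it any way you want.",
    "Do you want to ask any questions? make any comments?",
    "I don't have any problem with that. I'm just saying",
    "if they make any changes, they would be minor changes.",
    "I think we ought to use any kind of calculator. I think that way",
    "I see it and it doesn't make any sense to me, but I can take that",
]

counts = Counter(classify(line) for line in lines)
for use, count in counts.most_common():
    print(f"{use:<14}{count:>3}  {100 * count / len(lines):.0f}%")
```

With a larger set of lines, a teacher would still need to read the output by eye, since a rule this simple cannot tell whether a negation actually governs any.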

Appendix 1
A Closer Look at "Any"
Part 1
Read through the following lines taken from a concordance of the word any.
• This is going to be a test like any other test, like, for example
• working with you.. If there are any questions about how we're going to
• and I didn't receive any materials for the November meeting
• and it probably won't make any difference. I mean, that's the next
• You can do it any way you want.
• Do you want to ask any questions? make any comments?
• I don't have any problem with that. I'm just saying
• if they make any changes, they would be minor changes.
• I think we ought to use any kind of calculator. I think that way
• I see it and it doesn't make any sense to me, but I can take that
Source: Corpus of Spoken Professional American English

What conclusions can you draw about the use of any?


Part 2
What are the three main uses of any in order of frequency?
Any 1:
Any 2:
Any 3:

Appendix 2
Links to Help You Get Started
• English Corpora and Concordancers for on-line use--jump right in and
  try it.
  ○ http://vlc.polyu.edu.hk/
  ○ http://www.edict.com.hk/concordance/WWWConcappE.htm
• Tim Johns's web page on concordancing (geared toward language
  teachers):
  ○ http://web.bham.ac.uk/johnstf/timconc.htm
• Catherine Ball's web page on concordances and corpora (provides a
  useful tutorial):
  ○ http://www.georgetown.edu/cball/corpora/tutorial.html
• Mike Barlow's corpus linguistics website:
  ○ http://www.ruf.rice.edu/~barlow/corpus.html
• David Lee's bookmarks for corpus-based linguistics:
  ○ http://devoted.to/corpora

References
• Altenberg, Bengt & Granger, Sylviane (2001) The grammatical and
  lexical patterning of make in native and non-native student writing.
  Applied Linguistics Vol. 22, No. 2, pp. 173-194
• Aston, Guy (1997) Enriching the learning environment: corpora in ELT,
  In Gerry Knowles, Tony McEnery, Stephen Fligelstone, Anne Wichmann,
  (Eds.) Teaching and language corpora. Longman pp. 51-66
• Barlow, Michael (1992) Using Concordance Software in Language
  Teaching and Research. In Shinjo, W. et al. Proceedings of the Second
  International Conference on Foreign Language Education and
  Technology. Kasugai, Japan: LLAJ & IALL pp. 365-373
• Barlow, Michael (2002) Corpora, concordancing, and language
  teaching. Proceedings of the 2002 KAMALL International Conference.
  Daejon, Korea
• Biber, Douglas & Conrad, Susan (2001) Corpus based research in
  TESOL. TESOL Quarterly Vol. 35, No. 2, pp. 331-335
• Biber, Douglas & Conrad, Susan & Reppen, Randi (1998) Corpus
  linguistics: investigating language structure and use. Cambridge
• Conrad, Susan (2000) Will corpus linguistics revolutionize grammar
  teaching in the 21st century? TESOL Quarterly Vol. 34, pp. 548-560
• Fox, Gwyneth (1998) Using corpus data in the classroom, In Brian
  Tomlinson (Ed.) Materials development in language teaching,
  Cambridge
• Leech, Geoffrey (1997) Teaching in language corpora: a convergence,
  In Gerry Knowles, Tony McEnery, Stephen Fligelstone, Anne Wichmann,
  (Eds.) Teaching and language corpora. Longman pp. 1-22
• McCarthy, Michael & Carter, Ronald (2001) Size isn't everything:
  spoken English, corpus, and the classroom. TESOL Quarterly Vol. 35,
  No. 2, pp. 337-340
• Mindt, Dieter (1997) Corpora and the teaching of English in Germany,
  In Gerry Knowles, Tony McEnery, Stephen Fligelstone, Anne Wichmann,
  (Eds.) Teaching and language corpora. Longman pp. 40-50
• Nation, I.S.P. (2001) Learning vocabulary in another language.
  Cambridge
• Schmidt, Richard (1990) Input, interaction, attention, and awareness:
  the case for consciousness-raising in second language teaching. Paper
  prepared for presentation at Enpuli Encontro Nacional Professores
  Universitarios de Lengua Inglesa, Rio de Janeiro
• Sinclair, John (1998) Corpus evidence in language description, In
  Gerry Knowles, Tony McEnery, Stephen Fligelstone, Anne Wichmann,
  (Eds.) Teaching and language corpora. Longman pp. 27-39
• Stevens, Vance (1995) Concordancing with language learners: Why?
  When? What? CAELL Journal Vol. 6, No. 2, pp. 2-10
• Stevens, Vance (1991) Classroom concordancing: Vocabulary
  materials derived from relevant, authentic text. English for Specific
  Purposes Vol. 10, pp. 35-46
• Thurstun, Jennifer & Candlin, Christopher (1998) Concordancing and
  the teaching of the vocabulary of academic English. English for
  Specific Purposes Vol. 17, No. 3, pp. 267-280
• Willis, Jane (1998) Concordances in the classroom without a computer,
  In Brian Tomlinson (Ed.) Materials development in language teaching,
  Cambridge

The Internet TESL Journal, Vol. IX, No. 3, March 2003


http://iteslj.org/
