Discourse on the Move: Using Corpus Analysis to Describe Discourse Structure. Douglas Biber

Discourse on the Move
Studies in Corpus Linguistics (SCL)

SCL focuses on the use of corpora throughout language study, the development of a quantitative approach to linguistics, the design and use of new tools for processing language texts, and the theoretical implications of a data-rich discipline.
General Editor
Elena Tognini-Bonelli
The Tuscan Word Center/ The University of Siena
Consulting Editor
Wolfgang Teubert
University of Birmingham
Advisory Board
Michael Barlow, University of Auckland
Douglas Biber, Northern Arizona University
Marina Bondi, University of Modena and Reggio Emilia
Christopher S. Butler, University of Wales, Swansea
Sylviane Granger, University of Louvain
M.A.K. Halliday, University of Sydney
Susan Hunston, University of Birmingham
Stig Johansson, Oslo University
Graeme Kennedy, Victoria University of Wellington
Geoffrey N. Leech, University of Lancaster
Michaela Mahlberg, University of Liverpool
Anna Mauranen, University of Helsinki
Ute Römer, University of Hannover
Jan Svartvik, University of Lund
John M. Swales, University of Michigan
Yang Huizhong, Jiao Tong University, Shanghai
Volume 28 Discourse on the Move. Using corpus analysis to describe discourse structure Douglas Biber, Ulla Connor and Thomas A. Upton

Using corpus analysis to describe discourse structure
Douglas Biber
Northern Arizona University
Ulla Connor Thomas A. Upton

Indiana University Indianapolis
John Benjamins Publishing Company

Amsterdam/Philadelphia
The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.
Library of Congress Cataloging-in-Publication Data
Biber, Douglas.
Discourse on the move : using corpus analysis to describe discourse structure / Douglas Biber, Ulla Connor, Thomas A. Upton.
p. cm. (Studies in Corpus Linguistics, ISSN 1388-0373 ; v. 28)
Includes bibliographical references and index.
1. Discourse analysis--Data processing. I. Connor, Ulla, 1948- II. Upton, Thomas A. (Thomas Albin) III. Title.
P302.3.B53 2007
2007029145
401'.41--dc22
ISBN 978 90 272 2302 9 (Hb; alk. paper)
© 2007 John Benjamins B.V.
No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.
John Benjamins Publishing Co. P.O. Box 36224 1020 ME Amsterdam The Netherlands
John Benjamins North America P.O. Box 27519 Philadelphia PA 19118-0519 USA
Table of contents
Preface
Chapter 1. Discourse analysis and corpus linguistics
1 Discourse and discourse analysis
1.1 Discourse studies of language use
1.2 Discourse studies of linguistic structure beyond the sentence
1.3 Discourse studies of social practices and ideological assumptions associated with communication
1.4 Register and genre perspectives on discourse
1.5 Identifying structural units in discourse
2 Corpus-based investigation of discourse structure
3 Top-down versus bottom-up corpus-based approaches to discourse analysis
3.1 Examples of top-down analyses of discourse
3.2 Example of bottom-up approach
4 Creating a specialized corpus for discourse analysis
5 Overview of the book
Part 1. Top-down analyses of discourse organization
Chapter 2. Introduction to move analysis (with Budsaba Kanoksilapatham)
1 Background
2 Swales' move analysis of research articles
3 Move analysis of research articles applied across genres
3.1 Description and examples
3.2 Summary of previous research on move analysis
4 Overview of the methods for move analysis
4.1 General steps of a move analysis
4.2 Inter-rater reliability
5 Using a corpus-based approach to move analysis
5.1 Corpus-based move analysis
5.2 General advantages of corpus-based approaches to discourse analysis
5.3 Specific advantages of a corpus-based perspective for move analysis
5.3.1 Identifying linguistic features of moves
5.3.2 Move frequencies and lengths
5.3.3 Mapping move use and locations
5.3.4 Genre prototypes
6 Summary
Chapter 3. Identifying and analyzing rhetorical moves in philanthropic discourse
1 Background
2 A specialized corpus of fundraising texts
3 Determining and analyzing discourse moves: Direct mail letters
3.1 Previous analysis of direct mail letters
3.2 A move analysis of fundraising letters: Background and methodology
3.2.1 Move types
3.2.2 Structural elements
3.3 Analysis
3.4 Results
3.5 Discussion
3.6 Letter prototypes
4 Linguistic analysis of moves: Tracking the use of stance structures
4.1 Identifying grammatical stance devices
4.2 Interpreting the use of grammatical stance devices used in moves
5 Final thoughts
Chapter 4. Rhetorical moves in biochemistry research articles (by Budsaba Kanoksilapatham)
1 Background
2 Description of the corpus
3 Determining the move categories in the genre of biochemistry research articles
3.1 The introduction section
3.2 The methods section
3.3 The results section
3.4 The discussion section
4 Coding moves in the corpus of biochemistry research articles
5 Distribution of move types within texts from the biochemistry corpus
6 Linguistic characteristics of rhetorical moves in biochemistry research articles
7 Linguistic variation among move categories in biochemistry research articles
8 Multi-dimensional variation among move types within the same section
Chapter 5. Rhetorical appeals in fundraising (with Molly Anthony & Kostyantyn Gladkov)
1 Elements of persuasion
2 Determining and analyzing rhetorical appeals
2.1 Rational appeals (Logos)
2.2 Credibility appeals (Ethos)
2.3 Affective appeals (Pathos)
3 Analysis, segmentation, and classification
3.1 Results and discussion
4 Linguistic description of appeals
4.1 Wordlists
4.2 Keywords
5 Appeals and discourse structure of letters
6 Conclusion
Part 2. Bottom-up analyses of discourse organization
Chapter 6. Introduction to the identification and analysis of vocabulary-based discourse units (with Eniko Csomay, James K. Jones, & Casey Keck)
1 Conceptual introduction to VBDUs
2 Automatic identification of VBDUs in texts
3 Perceptual correlates of VBDUs
4 Using VBDUs to analyze the discourse structure of texts
5 Going one step further: Identifying generalizable VBDU types
Chapter 7. Vocabulary-based discourse units in biology research articles (with James K. Jones)
1 Constructing the corpus of VBDUs
2 Analyzing the linguistic characteristics of VBDUs: Multi-dimensional analysis
3 Comparing the multi-dimensional characteristics of research article sections
4 The multi-dimensional profile of VBDUs within a research article: Tracking the movement of discourse
5 Identifying and interpreting the multi-dimensional text types of biology research articles
6 Using VBDU text types to describe the discourse organizational patterns of biology research articles
7 Starting and ending research article sections
7.1 Describing the typical discourse organizations of introductions
7.2 Describing the typical discourse organizations of methods sections
7.3 Describing the typical discourse organizations of discussion sections
8 Preferred text type sequences across research article section boundaries
9 Comparing the preferred discourse styles of research journals
10 Conclusion
Chapter 8. Vocabulary-based discourse units in university class sessions (by Eniko Csomay)
1 From constructing a corpus of VBDUs to identifying VBDU text-types
1.1 Constructing a corpus of VBDUs
1.2 Analyzing the linguistic characteristics of VBDUs applying MD analytical techniques
1.3 VBDUs and dimension scores: the multi-dimensional profile of the first three VBDUs of a business management class
2 Dimension scores and VBDU text-types
2.1 Interpreting the clusters as VBDU types based on their linguistic characteristics
2.1.1 Cluster 1: Personalized framing
2.1.2 Cluster 2: Informational monologue
2.1.3 Cluster 3: Contextual interactive
2.1.4 Cluster 4: Unmarked
3 From VBDU text-types to discourse structure
3.1 Functional interpretation of VBDU types
3.2 Text as sequences of VBDU types
4 Summary and conclusion
Chapter 9. Conclusion: Comparing the analytical approaches
1 Overview
2 Comparing the top-down and bottom-up descriptions of biology research articles
2.1 Discourse units in biology research articles
2.2 The dimensions of linguistic variation in biology research articles
2.3 The functional and linguistic characteristics of the discourse types (move types vs VBDU types) in biology research articles
2.4 Description of the typical discourse organization of biology research articles
3 Summary and prospects for future research
Appendix 1. A brief introduction to multi-dimensional analysis
A.1 Conceptual introduction to the multi-dimensional approach to variation
A.2 Overview of methodology in the multi-dimensional approach
Appendix 2. Grammatical and lexico-grammatical features included in the multi-dimensional analyses
References
Index
Preface
The idea for this book evolved slowly, emerging from research taking place at several institutions applying different approaches to a single research problem: can discourse structure and organization be investigated from a corpus perspective? At Northern Arizona University (NAU), research on this topic began in a PhD seminar in 1999. Inspired by the research of Youmans (1991; 1994) on the Vocabulary Management Profile, students in that seminar explored ways in which the discourse structure of a text can be discovered automatically by tracking the text-internal use of vocabulary and other linguistic features. This initial effort resulted in a PhD dissertation by Csomay (2002), followed by several other research studies undertaken at NAU that employed the TextTiling methods originally developed by Hearst (1997).
Over the same period, researchers at Indiana University–Purdue University Indianapolis (IUPUI) and Georgetown University were exploring a completely different approach to this same research problem: applying the framework of rhetorical move analysis, developed by Swales (1981; 1990) for the detailed analysis of texts, to analyze the general rhetorical and linguistic patterns of discourse structure in a corpus. At IUPUI, this research effort focused primarily on philanthropic discourse, especially grant proposals and fundraising letters. At Georgetown University, this research culminated in a PhD dissertation by Kanoksilapatham (2003) on the discourse structure of biochemistry research articles.
The actual idea for the present book came about as colleagues from these different institutions would get together at conferences and discuss their different approaches to the study of discourse structure and organization from a corpus perspective.
We realized that there had been very little previous research done on this topic, and that by combining and comparing our approaches, we could provide a relatively comprehensive overview of this emerging subfield. Because the book grew out of relatively independent research efforts, each author has had different primary responsibilities. At the same time, we have been eager to structure the book as a coherent treatment of this subject: an authored book rather than an edited collection of articles. Thus, the three book authors share equal responsibility for revising and editing all chapters, and ultimately for the content of all chapters. On the other hand, each chapter has different primary authors, including several co-authors in addition to the three book authors for Chapters 1–3, 5–7, and 9. Two chapters are invited, single-authored contributions: Chapter 4 by Kanoksilapatham and Chapter 8 by Csomay. The primary authors for each chapter are as follows:
Chapter 1: Biber, Connor, Upton
Chapter 2: Connor, Upton, Kanoksilapatham
Chapter 3: Upton, Connor
Chapter 4: Kanoksilapatham
Chapter 5: Connor, Anthony, Gladkov, Upton
Chapter 6: Biber, Csomay, Jones, Keck
Chapter 7: Biber, Jones
Chapter 8: Csomay
Chapter 9: Biber, Connor, Upton
We would like to thank the numerous colleagues who have made useful suggestions and criticisms over the years in relation to the various research projects that come together in the present book. We also owe special thanks to Eric Friginal, Bethany Gray, Jack Grieve, Mark Johnson, Erkan Karabacak, YouJin Kim, Poonpon Kornwipa, Jingjing Qin, Angkana Tongpoon, and Faith Young, the students of ENG 707 (Seminar on Discourse) at Northern Arizona University in the fall of 2006, who read the entire book manuscript and made numerous useful comments and suggestions (including the title for our book, suggested by Jack Grieve).
chapter 1
Discourse analysis and corpus linguistics
1 Discourse and discourse analysis
The study of discourse has become a major focus of research in many disciplines of the humanities, social sciences, and information sciences. Because this area of study can be approached from so many different perspectives, the terms "discourse" and "discourse analysis" have come to be used in widely divergent ways. Several introductory treatments survey the range of definitions given to the term "discourse" (e.g., Jaworski & Coupland, 1999, pp. 1–7; Schiffrin, 1994, pp. 23–43). Schiffrin, Tannen, and Hamilton (2001), in their introduction to The Handbook of Discourse Analysis (p. 1), group previous definitions of discourse analysis into three general categories: 1) the study of language use; 2) the study of linguistic structure beyond the sentence; and 3) the study of social practices and ideological assumptions that are associated with language and/or communication.
The object of study for these three approaches to discourse is increasingly removed from the research goals of traditional structural linguistics. The study of language use focuses on traditional linguistic constructs, such as phrase structures and clause structures, but addresses the problem of why languages have structural variants with nearly equivalent meanings (e.g., particle movement, as in "pick up the book" versus "pick the book up"). By considering factors that are not strictly structural, linguists are able to predict when one or another variant is likely to be used. For example, the length of the direct object noun phrase is an important factor predicting the likelihood of particle movement. Aspects of the discourse context are often important for understanding linguistic variation, especially for linguistic constructions that involve word order variation (such as passives, extraposition, clefts, inversions, and existential "there"). For example, writers will choose passive voice rather than active voice depending on the topical relevance of the patient noun phrase.
The study of linguistic structure beyond the sentence focuses on a larger object of study: extended sequences of utterances or sentences, and how those texts are constructed and organized in systematic ways. Although studies of this type are removed from the traditional concerns of structural linguistics (which focuses mostly on phrasal and clause syntax), the two share a primary focus on linguistic form and how language structures are used for communication.
In contrast, the third approach to discourse is socio-cultural in orientation, and generally not concerned with the description of particular texts or the analysis of language structure and use. Socio-cultural approaches to discourse sometimes focus on the actions of participants in particular communication events, and at other times focus on the general characteristics of speech/discourse communities in relation to issues such as power and gender. Although the socio-cultural approaches are obviously important for understanding the broader role of texts in culture, they typically are not concerned with understanding the linguistic forms used in those texts.
Corpus linguistic studies are generally considered to be a type of discourse analysis because they describe the use of linguistic forms in context. For example, words are described in terms of their typical collocates: the words that normally occur in the discourse context. Grammatical variation is also described in terms of the words and other grammatical structures that occur in the context. As such, corpus linguistic research has fallen squarely under the first approach to discourse: the study of language use.
However, it has been much less common to study discourse organization from a corpus perspective. In fact, these two subfields have research goals and methods that might be considered incompatible. The study of discourse organization (linguistic structure beyond the sentence) is usually based on detailed analysis of a single text, resulting in a qualitative linguistic description of the textual organization. In contrast, corpus studies are based on analysis of all texts in a corpus, utilizing quantitative measures to identify the typical distributional patterns that occur across texts.
In fact, individual texts often have no status whatsoever in corpus investigations. Instead, what we find are comparisons of the distributional patterns in one sub-corpus to the patterns in a second sub-corpus. For example, Scott and Tribble (2006) describe how we can compare the keywords of the spoken versus written sub-corpora from the British National Corpus. Nesselhauf (2005, Chapter 3) describes the deviant collocations in a corpus of learner English essays. And Römer (2005, Chapter 4) documents the variants and distributional patterns of progressive verb phrases in the spoken sub-corpora from the British National Corpus. These studies are typical of corpus-based research on discourse: they describe the typical patterns of language use, considering the systematic ways in which aspects of the lexico-grammatical context tend to occur together with different linguistic variants; but such corpus-based studies usually tell us nothing about the discourse structure of particular texts.
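As a rough illustration of what a keyword comparison between two sub-corpora involves, the following Python sketch ranks the words of a target sub-corpus by a simplified two-cell log-likelihood statistic, one commonly used measure of keyness. It is not the procedure of Scott and Tribble (2006): the choice of statistic, the function names, and the toy token lists are all assumptions made for illustration.

```python
import math
from collections import Counter

def log_likelihood(a, b, n1, n2):
    """Simplified log-likelihood (G2) for a word that occurs a times in a
    sub-corpus of n1 tokens and b times in a sub-corpus of n2 tokens."""
    e1 = n1 * (a + b) / (n1 + n2)  # expected frequency in sub-corpus 1
    e2 = n2 * (a + b) / (n1 + n2)  # expected frequency in sub-corpus 2
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2

def keywords(target_tokens, reference_tokens, top=5):
    """Return the words relatively more frequent in the target sub-corpus,
    ranked by log-likelihood (ties broken alphabetically)."""
    t, r = Counter(target_tokens), Counter(reference_tokens)
    n1, n2 = len(target_tokens), len(reference_tokens)
    scored = []
    for word, a in t.items():
        b = r.get(word, 0)
        if a / n1 > b / n2:  # keep only words overused in the target
            scored.append((word, log_likelihood(a, b, n1, n2)))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:top]

# Hypothetical toy data standing in for spoken and written sub-corpora.
spoken = "well you know i mean you know".split()
written = "the analysis of the data shows the result".split()

for word, score in keywords(spoken, written):
    print(word, round(score, 2))
```

Note that the ranking is computed over pooled frequencies, so the individual texts that make up each sub-corpus play no role in the result, which is exactly the limitation described above.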
We thus see this interface as one of the current challenges of corpus linguistics: Is it possible to merge the analytical goals and methods of corpus linguistics with those of discourse analysis that focuses on the structural organization of texts? Can a corpus be analyzed to identify the general patterns of discourse organization that are used to construct texts, and can individual texts be analyzed in terms of the general patterns that result from corpus analysis? These are the central issues that we take up in the present book.
1.1 Discourse studies of language use
The first major approach to discourse identified above, the study of language use, has been carried out from several different perspectives, including research in pragmatics, speech act theory, functional linguistics, variationist studies, and register studies. These subfields all investigate how words and linguistic structures are used in discourse contexts to express a range of meanings. Many of these approaches focus on the study of linguistic variation, showing how linguistic choice is systematic and principled when considered in the larger discourse context. There have been numerous studies of grammar and discourse over the last two decades, as researchers have come to realize that the description of grammatical function is as important as structural analysis. By studying linguistic variation in naturally occurring discourse, researchers have been able to identify systematic differences in the functional use of each variant. An early study of this type is Prince (1978), who compares the discourse functions of WH-clefts and it-clefts. Thompson and Schiffrin have carried out numerous studies in this research tradition: Thompson on detached participial clauses (1983), adverbial purpose clauses (1985), omission of the complementizer "that" (S. Thompson & Mulac, 1991a, 1991b), and relative clauses (Fox & Thompson, 1990); and Schiffrin on verb tense (1981), causal sequences (1985b), and discourse markers (1985a, 1987). Other more recent studies of this type include Ward (1990) on VP preposing, Collins (1995) on dative alternation, and Myhill (1995; 1997) on modal verbs.
Most corpus-based research is discourse analytic in this sense, investigating systematic patterns of language use across discourse contexts, generalized over all the texts in a corpus (see, e.g., Biber, Conrad, & Reppen, 1998; McEnery, Xiao, & Tono, 2006).
The advantages of a corpus approach for the study of discourse, lexis, and grammatical variation include the emphasis on the representativeness of the text sample, and the computational tools for investigating distributional patterns across discourse contexts. The recent edited volumes by Connor and Upton (2004b), Meyer and Leistyna (2003), Lindquist and Mair (2004), and Sampson and McCarthy (2004) provide good introductions to work of this type. There are also a number of book-length treatments reporting corpus-based investigations of grammar and discourse: for example, Aijmer (2002) on discourse particles, Collins (1991) on clefts, Granger (1983) on passives, Mair (1990) on infinitival complement clauses, Meyer (1992) on apposition, Römer (2005) on progressive verbs, Tottie (1991) on negation, and several books on nominal structures (e.g., de Haan, 1989; Geisler, 1995; Johansson, 1995; Varantola, 1984). The Longman Grammar of Spoken and Written English (1999) applies corpus-based analysis to a more comprehensive grammatical description of English, showing how any grammatical feature can be described for both structural characteristics and discourse patterns of use.
The recent book by Partington (2003) is interesting here in that it combines corpus-based study with an analysis of pragmatics, to investigate the discourse features of White House briefings. A corpus of 48 briefings (250,000 words of running texts) was subjected to computerized concordance and keyword analysis. However, the computational analyses were guided by detailed qualitative analysis: a summer reading the corpus briefings and making notes (p. 12). This allowed Partington to check on oddities of computerized collocation analysis, highlighting odd language usage that computerized analysis might not have revealed.
A more specialized corpus-based approach to the study of language use is multi-dimensional (MD) analysis. Unlike most corpus-based research, MD studies investigate language use in individual texts. This approach describes how linguistic features co-occur in each text, resulting in more general patterns of linguistic co-occurrence that hold across all texts of a corpus. The approach can thus be used to show how patterns of linguistic features vary across individual texts, or across registers and genres. MD analysis is used in several chapters in the present book, and so it is introduced more fully in Appendix 1.
1.2 Discourse studies of linguistic structure beyond the sentence
The second major approach to discourse analysis identified above, the study of linguistic structure beyond the sentence, is the primary focus of the present book. Previous research on discourse-level structures has been undertaken from linguistic, cognitive, and computational perspectives.
Linguistic perspectives: Linguistic analyses of discourse structure have focused on lexico-grammatical features that indicate the organization of discourse (see, e.g., the papers in Coulthard, 1994). Focusing on units beyond the sentence level (e.g., paragraphs in written discourse and episodes in oral discourse), these researchers investigate linguistic devices that signal the underlying discourse structure. Much research of this type has described the discourse functions of particular words and phrases, referred to as discourse markers, connectives, discourse particles (Schiffrin, 1994), lexical phrases (Hansen, 1994; Nattinger & DeCarrico, 1992), or cue phrases (Passonneau & Litman, 1996). Other studies discuss the linguistic devices used to mark information structure, topical development, or rhetorical structures in discourse (e.g., Mann, Matthiessen, & Thompson, 1992; Mann & Thompson, 1988; Prince, 1981). Finally, some studies track the use of linguistic devices across a text. For example, discourse maps are used to track verb tense and voice patterns across the sections of research articles (Biber et al., 1998, Chapter 5), while other studies track referential expressions used in anaphoric chains throughout a text (e.g., Biber, 1992; Fox, 1987; Givón, 1983).
A related area of research is the study of textual cohesion: the use of lexical and grammatical devices as the "glue" of a text, holding the text together as discourse rather than an accidental sequence of sentences (see, e.g., Halliday, 1989; Halliday & Hasan, 1976; Hoey, 1991; Phillips, 1985; Tyler, 1995). Linguistic devices used to establish cohesion include anaphoric pronouns, linking adverbials, and the use of lexical repetition and synonymy to establish topical cohesion. Similarly, Tannen (1989) found that repetitions in conversation operate as a kind of theme-setting at the beginning of a topical unit and at the end, forming a kind of coda (p. 69).
Cognitive perspectives: Cognitive investigations of discourse structure study the factors that make a text coherent. Text coherence refers to the linking of ideas within a text to create meaning for readers. Analyses of textual coherence typically identify the propositions expressed in a text, the logical relations among those propositions, and how listeners/readers are able to construct the overall textual meaning in terms of those propositional relations. In contrast to the study of cohesion, which refers to surface-level patterns, coherence entails the study of larger discourse relationships.
Many of these studies describe texts in terms of the coherence relations expressed by clause-level propositions (Bateman & Rondhuis, 1997; Dahlgren, 1996; Hobbs, 1979; Sanders, 1997; Sanders & Noordman, 2000). Related studies also consider other factors that influence coherence, including differences between subject versus presentational matter (Mann & Thompson, 1988), text structural patterns like problem-solution (Connor, 1987) and given-new (theme–rheme) structures (Cooper, 1988), and the semantic and pragmatic relations between units (Polanyi, 1985, 1988; Sanders, 1997). Several researchers have developed analytical frameworks for the study of coherence relations (e.g., Grosz & Sidner, 1986; McNamara & Kintsch, 1996; Tomlin, Forrest, Ming Pu, & Hee Kim, 1997; Van Dijk, 1981, 1997; Van Dijk & Kintsch, 1983). The ongoing flow of information is also central to coherence (Grabe & Kaplan, 1996). Studies have approached information flow from various perspectives, including representations of the flow of thought (Chafe, 1994, 1997) or short-term memory (Tomlin et al., 1997).
Computational perspectives: Computational studies of discourse organization have attempted to model discourse organization for the purposes of information retrieval and natural language processing. Most computational studies of discourse structure have focused on written texts. For example, Morris and Hirst (1991) developed a lexical algorithm to find chains of related terms, which can be used to describe the structure of texts, applying Grosz and Sidner's (1986) attentional/intentional model. Marcu (2000) explores the feasibility of automatic rhetorical parsing, applying Mann and Thompson's (1988) Rhetorical Structure Theory.
One important study for the purposes of the present book is Youmans (1991; 1994), who developed the Vocabulary Management Profile (VMP), a computational method to track the introduction of new vocabulary into a text. Youmans shows that VMPs are quite sensitive indicators of the episodic structure of written literary texts, suggesting that the VMP graph provides a direct visual analogue for constituent structure (p. 113). Youmans compared the results of the VMP to the paragraph boundaries of literary texts and found 80 percent agreement.
Fewer computational studies have focused on the discourse structure of spoken discourse. One of the best known of these, Passonneau and Litman (1996; 1997), attempts to automatically segment spoken texts (spontaneous, narrative monologues) into discourse units, based on the use of referential noun phrases, cue words, and pauses. This study further compares the results of the automatic segmentation to perceptually-identified discourse units.
1.3 Discourse studies of social practices and ideological assumptions associated with communication
Finally, the third approach to discourse, the study of communicative social practices and ideological assumptions, focuses on the social construction of discourse rather than the linguistic description of particular texts. For example, proponents of the New Rhetoric (e.g., Bazerman, 1988, 1994; Berkenkotter & Huckin, 1995; Miller, 1984) have argued for the importance of understanding the knowledge of social context surrounding texts for helping writers select rhetorical strategies that work in a given situation. The focus here is to look not only at the products (texts) but also at the processes surrounding the production and consumption of texts, asking "Why are specific discourse-genres written and used by the specialist communities the way they are?" (Bhatia, 1993a, p. 11).
In an attempt to understand the broader social contexts of the discourse, several recent corpus-based studies have added analyses of interviews and focus group discussions with actual writers and readers of the texts or other academic specialists. For example, Hyland (2000) goes beyond the textual approach to discourse analysis of academic articles by adding focus groups, unstructured interviews, and discourse-based interviews with subject specialists from those disciplines, although the interviewees were not the writers of the articles in Hyland's corpus. The focus groups and the first part of the one-to-one interviews used a semi-structured format and encouraged the informants to speak generally about communication and publication practices in their fields. The second stage used a discourse-based interview which involved detailed discussions about particular pieces of writing. The informants responded as members of the particular discourse community as they interpreted meanings, reconstructed writer motivations, and evaluated rhetorical effectiveness. They were also encouraged to discuss specific points in their own work by referring to a paper they had written.
In another corpus study, Hyland (2004b) analyzed a corpus of 240 dissertations by L2 writers at Hong Kong universities, together with interviews with 24 students. The interviews helped in understanding the use of the analyzed metadiscourse markers: transitions, frame markers, endophoric markers, evidentials, code glosses, hedges, boosters, attitude markers, engagement markers, and self-mentions. Such qualitative analyses can shed light on disciplinary differences as well as differences between MA and PhD level writers, even if the interviewees are not the actual writers.
Unlike many qualitative studies of texts and writing, in which the researcher observes, interviews, and works with the actual writer or writers (see, e.g., Bazerman & Prior, 2004), corpus studies tend to rely on anonymous writers who are members of the particular discourse community. In many cases, corpora are constructed from published resources, rather than being collected from writers personally, making it nearly impossible to obtain information about the writers and the circumstances of writing.
However, like the Hyland studies cited above, it is possible to combine corpus-based analysis with the careful observation of individual writers. For example, Connor & Mauranen (1999) undertook a large-scale corpus analysis of rhetorical moves in grant proposal writing in the sciences and humanities. This study was later complemented by detailed interviews with five scholars in these disciplines (Connor, 2000). These scholars were not the writers of the proposals in the large corpus. However, as specialist informants they were able to comment on the appropriateness of the move definitions and the identification of move boundaries in a small corpus of their own proposals.
1.4 Register and genre perspectives on discourse
The terms register and genre have been central to previous investigations of discourse. Both terms have been used to refer to varieties associated with particular situations of use and particular communicative purposes. Many studies simply adopt one of these terms and disregard the other. In some cases, these authors
might be assuming a theoretical distinction between the two terms, but that distinction is usually not explicitly noted. For example, studies like Bhatia (2002), Samraj (2002), Bunton (2002), Love (2002), and Swales (2004) exclusively use the term genre. In contrast, studies like Ure (1982), Ferguson (1983), Hymes (1984), Heath and Langman (1994), Bruthiaux (1994; 1996), Conrad (2001), and Biber et al. (1999) exclusively use the term register.
A few studies attempt to define a theoretical distinction between the constructs underlying these two terms. For example, Ventola (1984) and Martin (1985) refer to register and genre as different semiotic planes: genre is the content-plane of register, and register is the expression-plane of genre; register is in turn the content-plane of language. Lee (2001) surveys the use of these terms, providing one of the most comprehensive discussions of how they have been used in previous research (as well as terms like text type and style).
When research studies have attempted to distinguish between register and genre (such as Couture, 1986; Ferguson, 1994; Martin, 1985; Swales, 1990; Ventola, 1984), the distinction has been applied at two different levels of analysis: 1) to the object of study; 2) to the characteristics of language and culture that are investigated. Thus, the term register (when it is distinguished from genre) has been used to refer to a general kind of language associated with a domain of use, such as a legal register, scientific register, or bureaucratic register. Register studies have usually focused on lexico-grammatical features, showing how the use of particular words and grammatical features varies systematically in accord with the situation of use (factors such as interactivity, personal involvement, mode, production circumstances, and communicative purpose). As such, the term register has been associated with the first general approach to discourse identified in Section 1 above: the study of language use.
In contrast, the term genre has been used to refer to a culturally recognized message type with a conventional internal structure, such as an affidavit, a biology research article, or a business memo. Genre studies have usually focused on the conventional discourse structure of texts or the expected socio-cultural actions of a discourse community. For example, genres are "how things get done, when language is used to accomplish them" (Martin, 1985, p. 250), and "frames for social action" (Bazerman, 1997b, p. 19). As such, the term genre is often associated with the second general approach to discourse identified in Section 1 above: the study of linguistic structure beyond the sentence.
In his previous work on linguistic variation, Biber has disregarded theoretical distinctions between the terms register and genre, preferring the term genre in earlier studies (e.g., Biber 1986, 1988) and the term register in later research (Biber,
1995, 2006b). In both cases, these were used simply as a general cover term to refer to situationally-defined varieties described for their characteristic lexico-grammatical features, with no implied theoretical distinction between register and genre. However, in the present book we are focused especially on the internal structure and organization of texts from a specific variety (e.g., fundraising letters or biology research articles), a perspective typically associated with the analysis of a genre rather than register. For this reason, we adopt the term genre throughout the book to refer to the linguistic variety being analyzed.
1.5 Identifying structural units in discourse
One specific research emphasis for discourse studies of structure beyond the sentence has been the attempt to segment a text into higher-level structural units. These studies are foundational to the goals of the present book, because the units of analysis in corpus-based studies of discourse structure must be well-defined discourse units: the segments of discourse that provide the building blocks of texts. In studies of written texts, discourse units have generally been identified based on visual as well as textual clues (see, e.g., Hunston, 1994). The smallest unit of analysis has usually been the proposition, followed by the t-unit or sentence, the paragraph, and finally the chapter or the whole text (Meyer, 1985). Such units are identified by written para-linguistic devices (such as sentence punctuation and paragraph indenting), rather than analysis of textual content or function. Other studies have considered the initiation of new topics within a text. Investigating written fiction, Youmans (1991, p. 774) claimed that syntactic function words do not denote new topics, whereas content words do. Similarly, Fox (1987) found that, in expository writing, full noun phrases are more likely than pronouns to indicate the start of a new topic. In spoken discourse (especially conversation) it has proved especially difficult to determine what constitutes a new topic, resulting in a reliance on qualitative or impressionistic findings. As Tannen (1984, p. 38) notes, the boundaries of the shifting topics in conversation are not always clearly and readily identifiable, and the initiation of new topics is often unclear (see also Tannen, 1984, 1989; Van Dijk, 1997). Some research has suggested that prosodic and linguistic cues can be used to determine topical boundaries in oral discourse. 
For example, pauses, hesitations, false starts, change in pitch, discourse particles, preposed adverbials, summary statements, and evaluative comments have all been proposed as linguistic markers that signal a discourse shift in theme or topic (e.g., Brown & Yule, 1983; Gee, 1986; Korolija & Linell, 1996; Polanyi, 1985; Stubbs, 1983; Tannen, 1987; Van Dijk, 1981).
In general, these studies have focused on linguistic devices that signal the transition from one topic to the next, but they have not attempted to rigorously segment complete texts into well-defined discourse units. However, this is exactly the task that must be accomplished for corpus-based analyses of discourse structure: we need comprehensive identification of the structural discourse units within all texts in the corpus. Two general approaches to text segmentation have been employed in previous corpus-based research: top-down and bottom-up methods of segmentation. The following section discusses these two approaches in more detail.
2 Corpus-based investigation of discourse structure
As summarized in the sections above, research on the linguistic characteristics of texts and discourse has been carried out from two major perspectives: one focusing on the distribution and functions of surface linguistic features (corpus studies of language use in discourse, which typically disregard the existence of individual texts), and the second focusing on the internal organization of texts (discourse studies of linguistic structure beyond the sentence in particular texts). Discourse studies of language use have usually been quantitative, and in more recent years, they have been carried out on large text corpora using the techniques of corpus linguistics; these studies often compare the linguistic characteristics of discourse from different spoken and written registers. Studies of the second type have usually been qualitative and based on detailed analysis of a small number of texts; these studies usually focus on the internal structure of a few texts from a single genre, such as scientific research articles.
Römer (2005) is a good example of the first approach. This study describes the use of progressive verb phrases in spoken English, based on analysis of the British National Corpus and the Bank of English. Rather than focusing on the organization of any particular text, the study focuses on the overall patterns of distribution and use, considering factors such as the tendency of progressives to occur with different tenses and aspects; occurrence with different subject types or object types; occurrence with different adverbials; and the tendency to occur with specific verbs and verb classes. In contrast, the chapters in Mann and Thompson (1992) are good examples of the second approach. This book is based on analysis of a single fundraising letter, showing how the discourse structure and organization of that single text can be analyzed from different perspectives.
Surprisingly, few studies have attempted to combine these two research perspectives. On the one hand, most corpus-based studies have focused on the quantitative distribution of lexical and grammatical features, generally disregarding the language used in particular texts and higher-level discourse structures or other
aspects of discourse organization. On the other hand, most qualitative discourse analyses have focused on the analysis of discourse patterns in a few texts from a single genre, but they have not provided tools for empirical analyses that can be applied on a large scale across a number of texts or genres. As a result, we know little at present about the general patterns of discourse organization across a large representative sample of texts from a genre.
One of the major methodological problems to be solved by any corpus-based analysis of discourse structure is deciding on a unit of analysis. That is, the first step in an analysis of discourse structure is to identify the internal discourse segments of a text, corresponding to distinct propositions, topics, or communicative functions; these discourse segments become the basic units of the subsequent discourse analysis. For a corpus study of discourse structure, all texts in the corpus must first be analyzed for their component discourse units.
However, such analyses were not even possible based on early text corpora, because they were composed of text-files rather than complete texts. For example, text files in the Brown, LOB, and London-Lund Corpora were defined by length: 2,000 words long in the case of Brown and LOB, and 5,000 words long in the case of London-Lund. In some cases, a single text file combines multiple texts, while in other cases a text is truncated in a text file when the word limit is reached. This characteristic of early corpora might help to explain why most previous corpus studies have not considered individual texts at all. Rather, the analysis has reported general patterns for the corpus as a whole, or it has compared overall results for various sub-corpora (e.g., the overall frequency of progressive verbs in a conversational sub-corpus compared to the frequency in a sub-corpus of academic writing).
More recently, corpora such as the BNC and T2K-SWAL have been designed to include complete texts, such as complete chapters from a book or complete research articles. It is thus possible, in theory, to analyze the internal discourse structure of each text in the corpus, and to then discover general patterns of discourse organization that hold across all texts in the corpus. To achieve this goal, corpus texts must first be segmented into well-defined discourse units, and then those units can be used to identify the general ways in which the discourse of corpus texts is organized. In the following section, we introduce the two major corpus-based approaches that can be applied to these research goals.
3 Top-down versus bottom-up corpus-based approaches to discourse analysis
To achieve generalizable corpus-based descriptions of discourse structure, seven major analytical steps are required:
1. Determining the types of discourse units, that is, the functional/communicative distinctions that discourse units can serve in these texts (Communicative/Functional Categories)
2. Segmenting all texts in the corpus into well-defined discourse units (Segmentation)
3. Identifying and labeling the type (or category) of each discourse unit in each text of the corpus (Classification)
4. Analyzing the linguistic characteristics of each discourse unit in each text of the corpus (Linguistic analysis of each unit)
5. Describing the typical linguistic characteristics of each discourse unit type, by comparing all discourse units of a given type across the texts of the corpus (Linguistic description of discourse categories)
6. Describing the discourse structures of particular texts as sequences of discourse units, in terms of the general type or category of each of those units (Text structure)
7. Describing general patterns of discourse organization that hold across all texts of the corpus (Discourse organizational tendencies)
These seven steps can be achieved through either a top-down research approach or a bottom-up research approach. The two approaches differ primarily in the order of analytical steps. In a top-down approach, the analytical framework is developed at the outset: the discourse unit types are determined before beginning the corpus analysis, and the entire analysis is then carried out in those terms. In a bottom-up approach, the corpus analysis comes first, and the discourse unit types emerge from the corpus patterns. Tables 1.1 and 1.2 summarize the major differences between these two analytical approaches.
Table 1.1 Top-down corpus-based analyses of discourse organization

Required step in the analysis: Realization in this approach
1. Communicative/Functional Categories: Develop the analytical framework: determine set of possible functional types of discourse units, that is, the major communicative functions that discourse units can serve in corpus
2. Segmentation: Segment each text into discourse units (applying the analytical framework from Step 1)
3. Classification: Identify the functional type of each discourse unit in each text of the corpus (applying the analytical framework from Step 1)
4. Linguistic analysis of each unit: Analyze the lexical/grammatical characteristics of each discourse unit in each text of the corpus
5. Linguistic description of discourse categories: Describe the typical linguistic characteristics of each functional category, based on analysis of all discourse units of a particular functional type in the corpus
6. Text structure: Analyze complete texts as sequences of discourse units shifting among the different functional types
7. Discourse organizational tendencies: Describe the general patterns of discourse organization across all texts in the corpus
In the top-down approach, the first step is to develop the analytical framework, determining the set of possible discourse unit types based on an a priori determination of the major communicative functions that discourse units can serve in these texts. That framework is then applied to the analysis of all texts in a corpus. Thus, when texts are segmented into discourse units, it is done by identifying a stretch of discourse of a particular type, that is, a stretch that serves a particular communicative function. In contrast, in the bottom-up approach, the first step is to automatically segment all texts in the corpus into discourse units (based on linguistic criteria). Those discourse units are then analyzed for many other linguistic features, and grouped into clusters of discourse units that are linguistically similar. Only then, after the discourse units have already been grouped linguistically, are those groupings interpreted as discourse unit types, by determining their typical functions in texts.
Table 1.2 Bottom-up corpus-based analyses of discourse organization

Required step in the analysis: Realization in this approach
1. Segmentation: Segment each text in the corpus into discourse units, based on shifts in vocabulary or other linguistic features
2. Linguistic analysis of each unit: Analyze the full range of lexical/grammatical characteristics of each discourse unit in each text of the corpus
3. Classification: Identify the set of discourse unit types that emerge from the corpus analysis, based on linguistic criteria; that is, group all discourse units in the corpus into linguistically-defined categories or types
4. Linguistic description of discourse categories: Describe the typical linguistic characteristics of each discourse category, based on analysis of all discourse units of a particular type in the corpus
5. Communicative/functional categories: Describe the functional bases of each discourse category, based on post-hoc analysis of the discourse units identified as belonging to a particular type
6. Text structure: Analyze complete texts as sequences of discourse units shifting among the different functional types
7. Discourse organizational tendencies: Describe the general patterns of discourse organization across all texts in the corpus
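The first bottom-up step, segmenting a text on the basis of shifts in vocabulary, can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the book: the vocabulary of two adjacent fixed-size windows is compared by the proportion of shared word types, and a boundary is placed wherever that overlap falls below a threshold. The function names, the window size, the threshold, and the sample text are all hypothetical and chosen only to keep the example small.

```python
import re

def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def overlap(left, right):
    """Proportion of word types shared by two windows (Jaccard index)."""
    a, b = set(left), set(right)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def segment(tokens, window=5, threshold=0.1):
    """Return token indices where a new discourse unit is assumed to start.

    At each window-aligned position, the `window` tokens before it are
    compared with the `window` tokens after it; low overlap is treated
    as a vocabulary shift. Both parameters are illustrative assumptions.
    """
    boundaries = []
    for i in range(window, len(tokens) - window + 1, window):
        if overlap(tokens[i - window:i], tokens[i:i + window]) < threshold:
            boundaries.append(i)
    return boundaries

def split_units(tokens, boundaries):
    """Cut the token list into discourse units at the given boundaries."""
    starts = [0] + boundaries
    ends = boundaries + [len(tokens)]
    return [tokens[s:e] for s, e in zip(starts, ends)]

# Invented sample: ten words on one topic followed by ten on another.
sample = ("the enzyme binds the substrate and the enzyme changes shape "
          "our donors give money because our donors trust this charity")
tokens = tokenize(sample)
units = split_units(tokens, segment(tokens))
print(len(units), "units; second unit starts with:", units[1][:2])
```

In this toy input the only position where adjacent windows share no word types is the point where the topic changes, so the text is cut into two units there. A realistic procedure would need larger windows and a principled way of setting the threshold.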
3.1 Examples of top-down analyses of discourse
Several top-level discourse structure theories were advanced by text linguists in the 1980s and 1990s. Theories of superstructures were developed for different types of texts such as exposition, argumentation, and narration. These superstructures of texts were called macrostructures by Van Dijk (1980), problem-solution patterns by Hoey (1983; 1986), superstructures of arguments by Tirkkonen-Condit (1985), and story grammars by Mandler and Johnson (1977). Story grammar analysis had its start in the work of Labov and Waletzky (1967), who proposed the following structure for analyzing oral narratives: orientation (the major characters are introduced and a setting is established); complication (a series of events unfold, and a crisis develops); resolution (the crisis is solved); and coda (the final stage, in which the writer may express an attitude toward the story or give her perspective on its significance). Although developed for oral texts originally, the story grammar analysis became a popular tool in written discourse analysis. Martin and Rothery (1986) used it effectively as a research and teaching method for school writing in Australia.
There are other approaches to the analysis of text structure that could be classified as being top-down in nature. Mann and Thompson (1992) in their book, Discourse Description: Diverse Linguistic Analyses of a Fund-raising Text, showcase seven different methods for looking at the text organization of a single fundraising letter. One, described by Callow and Callow (1992), is somewhat like the appeals analysis described below, except that the focus is on identifying the kinds of intended meanings (rather than appeals) that reflect the writer's purposes. These different meaning purposes (e.g., informative, expressive, and conative [expressing desires and intentions]) can be used to analyze the meaning-based structure of the text. In their chapter of the book, Mann, Matthiessen, & Thompson (1992) use Rhetorical Structure Theory (RST) to analyze the relational structure of a text. At its most basic level, RST identifies coherence in a text, that is, how different parts of a text relate to each other, or more specifically how one part of a text supports, elaborates, provides background for, offers contrast to, justifies, etc., another part of the text. By looking at these relationships, the rhetorical structure of the texts in a corpus could also be mapped out (see also Fox, 1987, Chapters 4 and 5).
Connor (1996) pointed out that the above kinds of analyses provided a new development in written discourse analysis. Researchers became keenly aware that different textual modes (e.g., narration, exposition, argumentation) used different discourse structures. Unlike the study of cohesion, for example, the analysis of superstructures was specific to a text type. The increased interest in specific genres has further stimulated research on discourse structures of texts. Move analysis (Swales, 1981, 1990) is an example of such a specific genre analysis.
Move analysis was developed as a top-down approach to analyze the discourse structure of texts from a genre; the text is described as a sequence of moves, where each move represents a stretch of text serving a particular communicative function. The analysis begins with the development of an analytical framework, identifying and describing the move types that can occur in this genre: these are the functional/communicative distinctions that moves can serve in the target genre. Subsequently, selected texts are segmented into moves, noting the move type of each move. The overall discourse structure of a text can be described in relation to the sequence of move types. For example, a research article might begin with a move that identifies the topic and reviews previous research, followed by a move that identifies a gap in previous research, followed by a move that outlines the goals of the present study, summarizes the major findings, and outlines the organization of the paper.
Until recently, top-down approaches (including move analysis) have not been applied to an entire corpus of texts, because it is highly labor-intensive to apply a top-down analytical framework to a large corpus of texts. However, this investment of labor pays off by enabling generalizable analyses of discourse structure
across a representative sample of texts from a genre. For example, once a corpus of texts has been coded for moves, we can easily analyze the typical linguistic (lexical and grammatical) characteristics of each move type. It is then possible to identify the sequences of move types that are typical for a genre, and against that background, it is also possible to identify particular texts that use more innovative sequences of move types. In summary, corpus-based move analyses illustrate the top-down approach: the functional analytical framework is developed first; that framework is then applied to segment texts into discourse units (moves); and finally the moves and functional move types are analyzed to describe their linguistic characteristics. Chapters 3 and 4 in the present book illustrate this general approach to discourse structure.
Rhetorical appeals analysis is another top-down approach (see Chapter 5). Instead of describing texts according to their communicative functions (moves), rhetorical appeals analysis divides texts into sections using the three basic means of Aristotelian persuasion: ethos, pathos, and logos. Similar to move analysis, this approach begins with the development of an analytical framework, identifying and defining the appeal types. The texts in a corpus are then analyzed by applying this analytical framework: segmenting texts into appeals, noting the appeal type of each appeal.
In practice, most previous discourse analyses have been top-down. However, there have been few previous top-down studies of discourse applied to an entire corpus of texts, in large part because the analyses are so labor-intensive. In the present book, we illustrate two particular top-down approaches to discourse: move analysis (Chapters 3 and 4) and rhetorical appeals analysis (Chapter 5).
3.2 Example of bottom-up approach
In contrast to the long research tradition applying top-down analyses of discourse, the bottom-up approach was only recently developed, specifically for corpus-based analyses of discourse structure. This approach has not been previously practiced by discourse analysts because it requires advanced computational techniques and does not make sense for the analysis of an individual text. That is, a discourse analyst traditionally begins by considering the communicative-functional context of a text, and relies on those considerations to identify the components of the text, and how a text is organized in those terms. In contrast, the bottom-up approach was developed to address the methodological problem of how discourse patterns could be analyzed in a large corpus, with hundreds or thousands of texts. In theory, top-down analyses can also be applied to large text corpora, but in practice, such analyses are limited by the human resources that are available for manually coding discourse units in texts. The bottom-up
approach has no such limitations, because it incorporates automatic computational techniques which can be easily applied to the analysis of hundreds of texts.
Vocabulary-Based Discourse Unit (VBDU) analysis is the specific bottom-up approach illustrated in the present book. (See Chapter 6 for a detailed description.) The first step is to automatically segment texts into discourse units, the VBDUs. This is done using computational techniques, based on vocabulary repetition. At this stage, we know nothing about the underlying types of discourse units or the communicative functions served by these types. Then, in the second step, we undertake comprehensive linguistic descriptions of each VBDU (again utilizing automatic computational techniques). These linguistic descriptions are used to group VBDUs into categories, so that all the VBDUs in a grouping are similar linguistically. At that point, functional considerations become important, because the linguistic groupings of VBDUs are interpreted as functional VBDU-types. That is, each type represents a grouping of VBDUs that are similar in their lexicogrammatical characteristics, and those groupings are interpreted to identify their typical discourse meanings and functions. Finally, the overall discourse organization of texts is described as sequences of VBDUs, noting the functional discourse type of each VBDU.
One major difference between the two approaches is the role of the functional versus linguistic analyses. In the top-down approach, the functional framework is primary. Thus, the first step in the analysis is to determine the possible discourse unit types (e.g., move types) and provide an operational definition for each one. This functional framework is then used to segment texts into discourse units. Linguistic analysis is secondary in a top-down approach, serving an interpretive role to investigate the extent to which functionally-defined discourse units also have systematic linguistic characteristics.
In contrast, the linguistic description is primary in the bottom-up approach. Texts are automatically segmented into VBDUs based on vocabulary patterns, and then VBDUs are grouped into categories based on the use of a wide range of lexico-grammatical features. Functional analysis is secondary in VBDU analysis, serving an interpretive role to investigate the extent to which linguistically-defined discourse unit categories also have systematic functional characteristics.
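The grouping of discourse units by their linguistic characteristics can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure used in the book: each unit is reduced to rates per 100 words for two hypothetical feature classes (first-person pronouns and modal verbs), and the units are grouped with a very small k-means routine. The feature lists, the choice of k-means, the number of groups, and the sample units are all invented for the example; an actual analysis would draw on a much wider range of lexico-grammatical features.

```python
# Hypothetical feature classes, chosen only for illustration.
FIRST_PERSON = {"i", "we", "me", "us", "my", "our"}
MODALS = {"can", "could", "may", "might", "must", "should", "will", "would"}

def feature_rates(tokens):
    """Rates per 100 words of the two illustrative feature classes."""
    n = len(tokens)
    first_person = sum(t in FIRST_PERSON for t in tokens)
    modals = sum(t in MODALS for t in tokens)
    return (100 * first_person / n, 100 * modals / n)

def distance(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=10):
    """Tiny k-means; the first k points serve as starting centroids."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every unit to its nearest centroid.
        labels = [min(range(k), key=lambda c: distance(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a group is empty
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return labels

# Invented discourse units (already segmented and tokenized).
units = [
    "we thank you for our gift and we hope you will give again".split(),
    "the samples were heated and the results were recorded in the table".split(),
    "our volunteers need us and i know we can count on you".split(),
    "the protein was purified and the activity was measured twice".split(),
]
labels = kmeans([feature_rates(u) for u in units], k=2)
print(labels)
```

Only after the units have been grouped in this purely linguistic way would an analyst inspect the members of each group and interpret what communicative function they share, which mirrors the secondary, interpretive role of functional analysis described above.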
4 Creating a specialized corpus for discourse analysis
One of the central methodological issues for corpus-based research is to ensure that the corpus chosen for analysis actually represents the discourse domain being studied and is thus suitable for the research questions being investigated (see Biber 1993, 2004). This is of course no different than any other quantitative research in
the social sciences, where there is always concern that the sample being studied actually represents the larger target population (one of the potential threats to external validity). Corpus-based studies of discourse structure are potentially problematic in this regard for two related reasons:
1. Corpora are often designed for general use rather than a specific study. As a result, the population being represented can be relatively general, such as newspaper language, or even an entire language.
2. Researchers sometimes choose to use a corpus just because it is publicly available, with little consideration of whether that corpus actually represents the target population being investigated.
However, these problems can be readily addressed. Most corpora have been designed with relatively well-specified sub-corpora that represent particular text categories, such as academic research articles, newspaper editorials, or face-to-face conversation. When corpus studies have been based on particular sub-corpora, the findings have been much more interpretable. In addition, many recent corpora have been designed for more particular research purposes. For example, the T2K-SWAL Corpus, a relatively general corpus, was designed to represent the range of spoken and written genres used in American universities (including sub-corpora for office hours, study groups, textbooks, course syllabi, etc.; see Biber 2006b). The ICIC Fundraising Corpus is somewhat more specialized, designed to represent American fundraising discourse, including sub-corpora for genres like direct mail letters and grant proposals (see Connor & Upton, 2004a, 2004b; Upton, 2002; Upton & Connor, 2001).
In general, more specialized corpora are more appropriate for the study of discourse structure. The corpora used in the present book are all relatively specialized, but they differ in the extent to which they represent a narrowly-defined genre.
At one extreme, the study reported in Chapter 4 of the present book is based on a highly restricted corpus of research articles published in biochemistry academic journals. Prior research was carried out to identify the five most prestigious academic journals in this discipline, and then research articles were collected over a 12-month period from those journals. The study reported in Chapter 7 is based on a corpus of research articles published in biology academic journals, but it deliberately includes a range of sub-disciplines in the sample. The study in Chapter 3 is based on analysis of the direct mail letters included in the ICIC Fundraising Corpus; these include letters from a wide variety of non-profit organizations across a wide variety of non-profit fields (e.g., health and human services, education). Finally, the corpus used in Chapter 8 is probably the least specialized, consisting of transcripts from university-level classroom teaching sessions collected across several different academic disciplines. However, all corpora used here are relatively specialized, restricted to particular genres. Such corpora are required for corpus-based studies of discourse structure: each text has its own discourse organization, and it is reasonable to hypothesize that all texts from a genre will tend to share similar patterns of discourse organization. Our goals in the present book are relatively straightforward: we hope to analyze corpora that represent particular genres, to describe the patterns of discourse organization in those genres and to investigate empirically the variation in discourse patterns across texts within a single genre.
5 Overview of the book
The book is organized into two parts, corresponding to the two major corpus-based approaches to discourse organization introduced in Section 3 above. Part I of the book focuses on top-down analyses of discourse organization. Chapter 2 introduces top-down analysis in greater detail, describing the analytical procedures required for these analyses, with a special focus on genre-based move analysis and the methodological issues that arise during the application of this approach to the analysis of a corpus of texts. Part I of the book then presents three case studies illustrating the top-down approach. The first case study (Chapter 3) describes how fundraising letters are structured in terms of rhetorical moves, focusing on the linguistic expression of stance in the different move types. The second case study (Chapter 4) describes the typical discourse organizations of biochemistry research articles, again using move analysis as the primary analytical framework. Rather than focusing on a restricted set of linguistic features, this second case study undertakes a multi-dimensional analysis (see Appendix One) to describe the typical linguistic characteristics of move types in this genre with respect to a wide range of lexical and grammatical features. Finally, the last chapter in Part I of the book (Chapter 5) introduces a second top-down approach to discourse structure: appeals analysis. This approach is applied to the same corpus of fundraising letters as in Chapter 3, allowing a direct comparison of these two analytical approaches.
Part II of the book (Bottom-up analyses of discourse organization) then deals primarily with Vocabulary-Based Discourse Unit (VBDU) analysis. Chapter 6 introduces this analytical framework in detail, describing both the analytical procedures and experimental research that explores the extent to which the automatically-identified VBDUs correspond to discourse units recognized on a perceptual basis by human raters.
Two case studies based on this approach are then presented: Chapter 7 presents a bottom-up analysis of a corpus of biology research articles, describing how texts from this genre are structured as sequences of VBDUs; Chapter 8 presents a similar analysis of VBDUs in university classroom
teaching sessions. Finally, the concluding chapter (9) provides a synopsis of findings, a more theoretical discussion of the strengths and weaknesses of each approach, and a discussion of future prospects for investigations of this type.
Part 1
Top-down analyses of discourse organization
chapter 2
Introduction to move analysis

WITH Budsaba Kanoksilapatham
In Chapter 1, we introduced two different approaches for using corpora to analyze discourse organization: top-down and bottom-up corpus-based analyses. This chapter focuses primarily on one type of top-down approach: move analysis. We give a detailed description of move analysis, including what it is, what this type of analysis tells you, examples of studies using move analysis, steps to conducting a move analysis, and special considerations for and advantages of using a corpus-based approach. As noted in the previous chapter, there are many top-down approaches to discourse analysis, like the appeals analysis described in Chapter 5; move analysis, however, is the approach that has been most frequently used to date in corpus-based studies. Chapters 3 and 4 provide specific examples of these kinds of studies. The intent of the present chapter is to introduce the goals and methods of corpus-based move analysis (as one common type of top-down discourse analysis), in order to show how generalizable corpus-based descriptions of discourse organizational patterns can be achieved using a top-down approach.
1 Background
Genre analysis using rhetorical moves was originally developed by Swales (1981) to describe the rhetorical organizational patterns of research articles. Its goal is to describe the communicative purposes of a text by categorizing the various discourse units within the text according to their communicative purposes or rhetorical moves. A move thus refers to a section of a text that performs a specific communicative function. Each move not only has its own purpose but also contributes to the overall communicative purposes of the genre. In Swales' words, these purposes together constitute the rationale for the genre, which in turn shapes the schematic structure of the discourse and influences and constrains choice of content and style, with texts in a genre exhibiting various patterns of similarity in terms of structure, style, content and intended audience (1990, p. 58).
Genre analysis was developed in the 1970s and 1980s as part of the wider growth of discourse analyses focusing on the organization of discourse. Bhatia (2004) documents how structural concerns, for example Hoey's (1983) problem-solution structure analysis, directed the analysts' attention away from studying lexico-grammatical features of texts (e.g., passives and nominalizations, use of tenses, coherence). Researchers involved in the analysis of text as genre further related discourse structures to the communicative functions of texts, resulting in the current approach of doing genre analysis using rhetorical moves. In genre analysis, the purposes of the genre are recognized by the expert members of the discourse community, less so by the novice members, and probably not by the nonmembers. These purposes shape the rationale, and the rationale helps develop the constraining conventions. According to Swales (1990), these conventions are constantly changing but still exert influence. As we will see in later chapters, discourse communities are powerful in shaping the conventions of the genre. Research papers in scholarly disciplines are good examples of genres shaped by such discourse communities, where novice writers are indoctrinated into the paper-writing genre in their graduate studies and young publishing lives. There are genres, however, which are not shaped by such strong discourse community rationales. Take fundraising letters as an example. It is fair to say that both writers and readers recognize a fundraising letter as such. However, since readers and potential donors do not typically write them, conventions may not be so strictly adhered to. In fact, deviation from conventions may seem fresh to the reader, who may receive hundreds of such letters a year but does not need to worry about writing any.
In move analysis, the general organizational patterns of texts are typically described as consisting of a series of moves, with moves being functional units in a text which together fulfill the overall communicative purpose of the genre (Connor, Davis, & De Rycker, 1995). Moves can vary in length, but normally contain at least one proposition (Connor & Mauranen, 1999). Some move types occur more frequently than others in a genre and can be described as conventional, whereas moves that occur less frequently can be described as optional. Moves may contain multiple elements that together, or in some combination, realize the move. These elements are referred to as steps by Swales (1990) or strategies by Bhatia (1993a). The steps of a move primarily function to achieve the purpose of the move to which they belong (see, e.g., Crookes, 1986; Dudley-Evans, 1994a; Hopkins & Dudley-Evans, 1988; Swales, 1981, 1984, 1990). In short, moves represent semantic and functional units of texts that have specific communicative purposes; in addition, as the following sections show, moves generally have distinct linguistic boundaries that can be objectively analyzed.
2 Swales' move analysis of research articles
Swales (1981) developed the discourse approach of move analysis within the more general field of English for Specific Purposes (ESP). This approach has been revised and extended by several scholars, including Swales (1990). The original aim of Swales' work on move analysis was to address the needs of advanced non-native English speakers (NNSs) learning to read and write research articles, as well as to help NNS professionals who want to publish their articles in English. His analysis of 48 introduction sections in research articles from a range of disciplines (physics, medicine, and social sciences), written in English, led Swales to propose a series of moves (i.e., specific communicative functions performed by specific sections of the introductions) that defined the rhetorical structure of research article introductions. A closer examination of Swales' move structure, or framework, for these introductions helps elucidate the interaction between moves and steps in performing communicative functions in scientific texts. Swales' three-move schema for article introductions, collectively known as the Create a Research Space (CARS) model, is presented in Table 2.1. The model shows the preferred sequences of move types and steps, which are largely predictable in research article introductions.
Table 2.1 CARS model for research article introductions, adapted from Swales (1990, p. 141)
Move 1: Establishing a territory
    Step 1   Claiming centrality and/or
    Step 2   Making topic generalization(s) and/or
    Step 3   Reviewing items of previous research
Move 2: Establishing a niche
    Step 1A  Counter-claiming or
    Step 1B  Indicating a gap or
    Step 1C  Question raising or
    Step 1D  Continuing a tradition
Move 3: Occupying the niche
    Step 1A  Outlining purposes or
    Step 1B  Announcing present research
    Step 2   Announcing principal findings
    Step 3   Indicating RA structure
Swales' model includes three basic move types in research article introductions. Move 1, Establishing a territory, introduces the general topic of research. Move 2, Establishing a niche, identifies the more specific areas of research that require further investigation. And Move 3, Occupying the niche, introduces the current research study in the context of the previous research described in Moves 1 and 2.
Move 1 can have a maximum of three steps (Step 1, Step 2, and Step 3). In Move 1, Step 1, Claiming centrality, the author can make a centrality claim by claiming interest or importance in referring to the classic, favorite or central perspective, or by claiming that there are many investigators in the area. This step is usually, but not always, at the beginning of the introduction. To illustrate Move 1, Step 1, Swales (1990) presents the following examples:
    The study of ... has become an important aspect of ...
    A central issue in ... is the validity of ...
    (Swales, 1990, p. 144)
Move 1, Step 2, Making topic generalizations, represents a neutral kind of general statement. It usually takes the form of either statements about knowledge or practice, or statements about phenomena. Usually, this step seeks to establish territory by emphasizing the frequency and complexity of the data. Some examples of Move 1, Step 2 are:
    The aetiology and pathology is well known.
    A standard procedure for assessing ... has been ...
    There are many situations where ...
    (Swales, 1990, p. 146)
The last step of this move, Step 3, Reviewing items of previous research, is where the author reviews selected relevant groups of previous research. Here, the author specifies the important findings of the study and situates his/her own current research study. Examples of Move 1, Step 3 are:
    X was found by Sang et al. (1972) to be impaired.
    Chomskyan grammarians have recently ...
    (Swales, 1990, p. 150)
In establishing territory, then, the author convinces the readers about the importance of the area of study by making strong claims with reference to previously published research, which can be done in three ways, as indicated by the three step options.
Move 2 of the CARS model, Establishing a niche for about-to-be presented research, is considered a key move in research article introductions because it connects Move 1 to Move 3 by articulating the need for the research that is being presented. Move 2 is manifested in one of four ways: Step 1A, Counter-claiming; Step 1B, Indicating a gap; Step 1C, Question raising; and Step 1D, Continuing a tradition. The four options for realizing Move 2 are represented by the following examples, taken from Swales (1990, p. 154):
Step 1A, Counter Claiming Step 1B, Indicating a Gap Step 1C, Question Raising Step 1D, Continuing a Tradition Emphasis has been on, with scant attention given to The first group...cannot treat and is limited to Both suffer from the dependency on A question remains whether
The final move type that Swales proposed for research article introductions is Move 3, Occupying the niche. As noted earlier, Move 1 reports on the centrality of the research topic or generalizations about previous research. Move 2 expresses the authors' own opinions about the need for the current research (with reference to the past literature). Importantly, Move 3 is distinct from the other two moves in the Introduction in that the authors assume a more active role in the research conducted, rather than just referring to previous studies or asserting the need for this one. In fact, Move 3 is the only place in the research article introduction where the authors express and enjoy their own accomplishment, pride, and commitment (Swales, 1990). Move 3 introduces new research by first either Stating research purpose(s) (Step 1A) or Describing the main features of the research (Step 1B), then by Announcing the principal findings (Step 2), and then finally by Indicating the research article structure (Step 3). Examples illustrating the steps of Move 3, taken from Swales (1990, p. 160), are:
Step 1A, Outlining Purpose Step 1B, Announcing Present Research Step 2, Announcing Principal Findings Step 3, Indicating Research Article Structure The aim of the present paper is to give This study was designed to evaluate The paper utilizes the notion of This paper is structured as follows
Swales' CARS model for academic research articles has been widely studied and validated since it was first published in 1990. The model has been shown to have a recursive nature (what Swales has called recycling; 1990, p. 140), with moves or steps occurring more than once, as well as varied realizations in research writing across contexts. For example, Bunton (2002) has shown that the genre of Ph.D. thesis introductions, while having the same general CARS structure proposed by Swales, has some alternate ways of realizing the three basic moves. One example of this is in Move 1, Establishing a territory; Bunton proposes that a new step, Defining terms, plays an important part in fulfilling the function of helping to establish the territory to be covered in Ph.D. thesis introductions, while this is not the case for research article introductions. Indeed, subsequent research on the introduction section of research articles in other disciplines (see discussion below) has helped us recognize how different disciplines manipulate a common genre (in this case, research articles) to meet their own communicative needs. Our understanding of one small section of academic research articles (Introductions) has evolved from a one-size-fits-all perspective to a more subtle, discipline-specific understanding of the rhetorical purposes and expectations of research articles. Swales (2004), in response to this subsequent research, modified his model to better reflect the variability in how the three move types are realized in different sub-genres of research article introductions. His revised model, shown in Table 2.2, has a broader description of the communicative purposes of Move 1 and Move 2; it also reflects, particularly in Move 3, the variation that occurs in introductions in different research fields, and recognizes the possibility of cyclical patterns of occurrence of the move types (described further below) within the introduction section.
Table 2.2 Swales' revised model for research article Introductions (2004, pp. 230, 232)
Move 1: Establishing a territory (citations required) via
    Topic generalizations of increasing specificity
Move 2: Establishing a niche (citations possible) via:
    Step 1A: Indicating a gap, or
    Step 1B: Adding to what is known
    Step 2:  Presenting positive justification (optional)
Move 3: Presenting the present work via:
    Step 1:  Announcing present research descriptively and/or purposively (obligatory)
    Step 2:  Presenting research questions or hypotheses* (optional)
    Step 3:  Definitional clarifications* (optional)
    Step 4:  Summarizing methods* (optional)
    Step 5:  Announcing principal outcomes (optional)**
    Step 6:  Stating the value of the present research (optional)**
    Step 7:  Outlining the structure of the paper (optional)**
* Steps 2–4 are less fixed in their order of occurrence than the others.
** Steps 5–7 are probable in some fields, but unlikely in others.
The key point here is that while related genres will certainly share common move types, each will have its own unique structural characteristics that reflect the specific communicative functions of that genre.
3 Move analysis of research articles applied across genres
3.1 Description and examples
While move analysis was originally developed as a tool to teach non-native speakers the rhetorical structures of research articles, Swales' framework has been successfully extended to other areas of English for Specific Purposes (ESP) instruction, including English for Business and Technology (Bhatia, 1993a, 1997a) and English for Professional Communication (Flowerdew, 1993). Swales' framework of move analysis has stimulated substantial research on the rhetorical structures of academic and professional texts. In academic writing, it has been applied to academic disciplines including biochemistry (Kanoksilapatham, 2005; D. Thompson, 1993), biology (Samraj, 2002), computer science (Posteguillo, 1999), and medicine (Nwogu, 1997; Williams, 1999), as well as to a variety of academic genres, including university lectures (S. Thompson, 1994), master of science dissertations (Hopkins & Dudley-Evans, 1988), and textbooks (Nwogu, 1991). Within the genre of scientific research articles (the original focus of move analysis), a number of move-based studies have focused on specific sections of research articles. For example, Crookes (1986) compared Introduction sections of research articles across a variety of fields; Wood (1982) described the moves of Methods sections in chemistry articles; D. Thompson (1993) and Williams (1999) focused on the moves of Results sections in biochemistry and medical research articles, respectively; and Peng (1987) looked at the moves used in the Discussion section of chemical engineering research articles. Posteguillo (1999, computer science) and Nwogu (1997, medicine) both went a step further and explored the use of moves across multiple sections within the genres they investigated, and Kanoksilapatham (2005) has investigated the move structure of complete biochemistry research articles.
A more detailed description of how move analysis was used to describe the structure, and linguistic features, of entire biochemistry research articles is provided by Kanoksilapatham in Chapter 4 of this book. More recently, professional discourse has also been examined through the lens of move analysis, including legal discourse (Bhatia, 1993b), philanthropic discourse focusing on direct mail letters (Upton, 2002; Upton & Connor, 2001) and grant proposals (Connor, 2000; Connor & Mauranen, 1999; Connor & Upton, 2004a), and movie reviews (Pang, 2002).
A brief description of a move analysis done on a corpus of job application letters (Connor, Precht, & Upton, 2002) provides an interesting illustration of how different genres can have quite different move types. The letters in this study were from the Indianapolis Business Learner Corpus (IBLC), which included job application letters written by business students at U.S., Belgian, and Finnish universities between 1990 and 1998. The 99 letters in the corpus were generated by students (all business and/or English majors) as part of a common class assignment. Applying Swales' approach to analyze the genre of job application letters, the following move types were identified:
Move 1: Identify the source of information. (Explain how and where you learned of the position.)
    I recently received word from Blockbuster Recruiting about a management position available at your company.
Move 2: Apply for the position. (State desire for consideration.)
    I am very interested in a temporary job working as a European business student intern in the U.S.A.
Move 3: Provide arguments for the job application.
    Step 1: Implicit arguments based on neutral evidence or information about background and experience. In providing supporting information or arguments, the writers simply list their background experience.
        I received my Associates Degree in General Studies in May 1993. Previously I have received a degree in Office Management from Indiana Business College and I have obtained the Certified Professional Secretary (CPS) certification.
    Step 2: Arguments based on what would be good for the hiring company. In this step, the writer argues explicitly that their experience or education will benefit the company that hires them.
        My intercultural training will be an asset to your international negotiations team.
    Step 3: Arguments based on what would be good for the applicant. In this step, the writer argues how the position would in fact be beneficial to him/herself.
        The opportunity to study abroad the globalised business environment would help me gain the knowledge and experience to grow in the changing business world of today.
Move 4: Indicate desire for an interview or a desire for further contact.
    I hope I got you interested so that I will be selected for an interview. I'm always prepared to participate in an interview.
Move 5: Express pleasantries or appreciation at the end of the letter.
    Thank you in advance for your consideration. Thank you for your time in reviewing this material.
Move 6: Offer to provide more information.
    I will be happy to provide you with any additional information that you may need.
Move 7: Reference attached resume.
    I have enclosed my resume... A resume is enclosed.
The most obvious difference between the move structure of research article introductions and the move structure of letters of application is that the former has only three major move types and the latter has seven. This is all the more interesting to note because research article introductions (with only three major moves) are typically much longer than letters of application.
Three other important points are illustrated by comparing these two move structures. The first is the fact that moves are identified by the communicative purpose that the writer is seeking to accomplish, whether that be done in one sentence or five paragraphs. Consequently, moves can be quite variable in length. The second point is that some genres have a fairly simple move structure, with only three or four basic communicative functions, while other genres may have a fairly complex move structure, with many different communicative functions. The third point is that while some moves may be realized through two or more different steps, other moves may only be expressed in one general functional-semantic way (e.g., Swales' Move 1 has three steps, while Connor et al.'s Move 1 has no step options).
There are two additional characteristics of moves that should be noted. The first is that some move types in a genre may be more common (or obligatory), while other moves may be optional. Lewin, Fine, and Young (2001) and Bhatia (1993a) are among those who underscore this characteristic of moves. Bhatia prefers the term strategy as opposed to step, to reflect the variability among elements within a move: move elements may or may not regularly appear, and they can be used in different sequential order. In Chapter 3 of this volume, for example, we describe the variable move structure of direct mail letters; some of the move types in this genre are clearly optional, and there is a fairly free ordering of the moves within a given text.
Similarly, Kwan (2006) shows that the third move (Occupying the niche) is optional in the literature review of Ph.D. theses of applied linguistics. In addition, it is possible that some move types will recur in a cyclical fashion within a section of text (Swales, 2004). Typically, the cyclical reoccurrence of a move within a section of text has been dealt with by considering each appearance of a particular move as a separate occurrence. For example, if a text starts with, say, Move Type 1, continues with Move Type 2, and then returns to Move Type 1, Move
Type 1 would be counted as having occurred twice. The studies in Chapters 3 and 4 both used this approach to identify and count moves. More rarely, moves can be interrupted by or have inserted into them another move type (Upton, 2002). While this is rather unusual, there can be clear instances where one communicative functional unit (move type) of a text interrupts, often as an aside or a tangential comment, another very different communicative functional unit of text. The study described in Chapter 3 provides an example of this. These cyclical and embedded patterns of move types tend to occur mainly in genres that are less constrained and allow more variability than those that are more prescribed.
3.2 Summary of previous research on move analysis
To highlight key points introduced above, move analysis proposes that genres are composed of definable and, to a great extent, predictable functional components (that is, moves of certain types). For example, article introductions typically have three rhetorical move types: establishing a territory, establishing a niche, and occupying the niche. Letters of application have seven distinguishable move types, as described above. According to Bhatia (1993a), the move structuring of a genre is the property of the genre itself, not something that the reader constructs. This structure is controlled by the communicative purpose(s) of the text, and is the underlying reason that one genre varies from another. The moves of a genre are considered such an inherent part of the genre that they can be used as the building blocks for teaching novice writers how to successfully write texts in that genre (Dudley-Evans, 1995), which, as already noted, was Swales' initial motivation for exploring the structure of research article introductions.
4 Overview of the methods for move analysis
4.1 General steps of a move analysis
Kwan (2006) provides a useful introduction to the functional-semantic methods used for identifying discourse moves. A functional approach to text analysis calls for cognitive judgement, rather than a reliance on linguistic criteria, to identify the intention of a text and the textual boundaries (see also Bhatia, 1993a; Paltridge, 1994). This approach is in line with the theoretical definition of a move; that is, that each move has a local purpose but also contributes to the overall rhetorical purpose of the text.
It is important to note that there are no strict rules for doing a move analysis, nor does every researcher necessarily do each of the steps described below. The intent here is simply to describe common procedures in doing a move analysis. First, in order to identify the move categories for a genre, it is important to get a big-picture understanding of the overall rhetorical purpose of the texts in the genre. The second step is then to look at the function of each text segment and evaluate what its local purpose is. This is the most difficult step. Move categories need to be distinctive. Multiple readings of, and reflection on, the texts are needed before clear categories emerge. The third step is to look for any common functional and/or semantic themes represented by the various text segments that have been identified, especially those that are in relative proximity to each other or often occur in approximately the same location in various texts representing the genre. These functional-semantic themes can then be grouped together, reflecting the various steps (or strategies) of a broader move type, with each move having its own functional-semantic contribution to the overall rhetorical purpose of the text. For example, Swales proposed the first CARS move, Establishing a territory, as it was clear that research article introductions almost always began with a section that functioned to provide a context for the study being introduced, whether this was done by claiming the centrality of the study (Step 1), and/or by making generalizations about the topic being studied (Step 2), and/or by reviewing items of previous research on the topic (Step 3). Not all research article introductions have all the steps, but most have at least one of them, serving the function of establishing the territory for the study to follow. When a researcher is ready to segment a particular text into moves, it is best to begin with a pilot coding, ideally with at least two coders.
Because coders are seeking to understand the functional-semantic purposes of text segments, coding must be done by hand. Initial analyses are then discussed and fine-tuned until there is agreement on the functional and semantic purposes that are being realized by the text segments, resulting in a protocol of move and step features for the genre, with clearly defined purposes and examples. For a corpus-based move analysis, this coding protocol is then applied to the full set of texts. Inter-rater reliability should be checked to confirm that there is agreement on what the move types are and how they are realized by text segments (see Section 4.2 below). At this point, it may be necessary to resolve any discrepancies through further discussion and analysis, and then re-code problematic texts. It is also not uncommon that additional steps or even move types will be discovered during the analysis of the full set of texts. As noted earlier, some move structures can prove more complex than the three-move structure of the CARS model. For example, Bhatia (1998) has noted that fundraising discourse offers a large variety of creative options (p. 100; see
also Chapter 3). In other words, some genres, especially dynamic and persuasion-oriented ones like fundraising letters, may have obligatory, typical, and optional move elements, and move types may not necessarily occur in a fixed order. Nevertheless, a move structure for a genre can still be identified by working through the general process outlined above. Table 2.3 summarizes the typical move analysis process as it is done in a corpus-based approach.
Table 2.3 General steps often used to conduct a corpus-based move analysis
Step 1:  Determine rhetorical purposes of the genre
Step 2:  Determine rhetorical function of each text segment in its local context; identify the possible move types of the genre
Step 3:  Group functional and/or semantic themes that are either in relative proximity to each other or often occur in similar locations in representative texts. These reflect the specific steps that can be used to realize a broader move.
Step 4:  Conduct pilot-coding to test and fine-tune definitions of move purposes.
Step 5:  Develop coding protocol with clear definitions and examples of move types and steps.
Step 6:  Code full set of texts, with inter-rater reliability check to confirm that there is clear understanding of move definitions and how moves/steps are realized in texts.
Step 7:  Add any additional steps and/or moves that are revealed in the full analysis.
Step 8:  Revise coding protocol to resolve any discrepancies revealed by the inter-rater reliability check or by newly discovered moves/steps, and re-code problematic areas.
Step 9:  Conduct linguistic analysis of move features and/or other corpus-facilitated analyses.
Step 10: Describe corpus of texts in terms of typical and alternate move structures and linguistic characteristics
The ten steps outlined in Table 2.3 correspond to the general analytical steps for top-down analyses listed in Table 1.1 (in Chapter 1). For example, the analytical step Communicative/Functional Categories in Table 1.1 corresponds to Steps 1–5 in Table 2.3. The steps Segmentation and Classification from Table 1.1 in practice occur concurrently in Steps 6–8. The steps Linguistic analysis of each unit and Linguistic analysis of discourse categories from Table 1.1 are reflected in Step 9, and the final steps in Table 1.1 and Table 2.3 are the same. While the process described here is not the only way to do a corpus-based move analysis, in the end, the move structure should represent the rhetorical movement (Swales, 1990, p. 140) of the functional-semantic purposes of the text segments that make up the genre, and all texts in the corpus must be coded for these distinctions.
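The result of hand coding can be stored in a simple form that makes later counting straightforward. The following Python sketch is purely illustrative and is not the procedure used in the studies reported in this book: the `Segment` record, the move labels, and the sample data are all hypothetical. It applies the counting convention described in Section 3.1, in which a return to an earlier move type counts as a new occurrence, and it additionally assumes that adjacent segments carrying the same label belong to a single occurrence.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One hand-coded text segment (hypothetical record layout)."""
    text_id: str  # which text in the corpus the segment comes from
    move: str     # move label taken from the coding protocol, e.g. "M1"


def move_occurrences(segments):
    """Count move occurrences per text; segments must be in text order.

    Assumption: adjacent segments with the same label form one occurrence,
    while a later return to that move type counts as a new occurrence.
    """
    counts = Counter()
    previous = {}  # text_id -> label of the preceding segment in that text
    for seg in segments:
        if previous.get(seg.text_id) != seg.move:
            counts[(seg.text_id, seg.move)] += 1
        previous[seg.text_id] = seg.move
    return counts


def share_of_texts(segments):
    """Proportion of texts in which each move type occurs at least once."""
    texts = {seg.text_id for seg in segments}
    pairs = {(seg.text_id, seg.move) for seg in segments}
    containing = Counter(move for _, move in pairs)
    return {move: n / len(texts) for move, n in containing.items()}


# Hypothetical coded corpus: text t1 returns to M1 after M2.
coded = [
    Segment("t1", "M1"), Segment("t1", "M2"), Segment("t1", "M1"),
    Segment("t2", "M1"), Segment("t2", "M3"),
]
occurrences = move_occurrences(coded)
shares = share_of_texts(coded)
print(occurrences[("t1", "M1")])  # M1 is counted twice in t1
print(shares)
```

The share of texts containing each move type is one possible basis for describing a move as conventional or optional; any cut-off used for that distinction would be the analyst's own choice and is not specified here.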
4.2 Inter-rater reliability
For top-down approaches to discourse analysis, the first methodological steps in the analysis involve human judgements to identify and code the discourse components of a text. This kind of analysis requires a detailed coding rubric, which explicitly defines the discourse components (e.g., the move types and steps). A minimal evaluation of this rubric is to determine whether raters can achieve high inter-rater reliability when they apply the coding scheme. That is, do different raters understand the coding definitions in the same way, with the result that they all identify the same discourse components in a text, and they all agree on the classification of those text segments as move types? The simplest method of reporting inter-rater reliability is percent agreement. This statistic merely reflects the number of agreements per total number of coding decisions, but it does not account for chance agreement among raters. A more common statistic for determining inter-rater reliability is Cohen's kappa (κ). Cohen's kappa is a chance-corrected measure of inter-rater reliability that assumes two or more raters, n cases, and m mutually exclusive and exhaustive nominal categories (Capozzoli, McSweeney, & Sinha, 1999). Training is generally done to achieve better and more consistent inter-rater reliability, but more importantly, training encourages evaluators to examine the definitions in the coding rubric, and to arrive at a more explicit description of what each coding category represents. Inter-rater reliability should not be confused with objectivity or validity; it is rather just a measure of consistency and agreement. As noted by Raymond (1982), the degree to which inter-rater reliability is desirable varies with what is being evaluated: It would be possible to achieve near perfect inter-rater reliability by simply counting the number of words produced; but no one would seriously accept this as a measure of quality [of writing].
Because the quality of writing resides not entirely in the text, but in the interactions among the text, its author, and its individual readers, we should not only expect but actually demand a reasonable amount of variation among raters, with an inter-rater reliability of .80 being acceptable (p. 401). Much the same can be said about identifying move boundaries and coding move types. Moves, by definition, perform communicative functions within a text, but raters can differ in their understanding of the purpose of a specific text or portion of a text. Nevertheless, the process of identifying and discussing discrepancies increases inter-rater reliability among researchers and results in a more usable and consistently interpreted move framework for a genre.
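The two statistics described in this section can be illustrated with a minimal Python sketch. It makes several assumptions that are not part of the study: there are exactly two raters, the texts have already been divided into the same segments for both raters (so only the move-type classification is compared, not the placement of move boundaries), and the move labels and codes shown are invented for illustration. The function names are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of coding decisions on which the two raters agree."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of segments")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    return agreements / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement for two raters and nominal categories."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    # Agreement expected by chance: for each category, multiply the two
    # raters' marginal proportions, then sum over all categories.
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        # Both raters used one and the same category throughout; kappa is
        # undefined here, and returning 1.0 is a convention of this sketch.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Invented move-type codes for ten text segments (illustration only).
rater_a = ["M1", "M2", "M2", "M3", "M3", "M4", "M2", "M3", "M5", "M3"]
rater_b = ["M1", "M2", "M3", "M3", "M3", "M4", "M2", "M3", "M5", "M2"]

print(f"percent agreement: {percent_agreement(rater_a, rater_b):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(rater_a, rater_b):.2f}")
```

In this invented example the raters agree on 8 of 10 segments, but because both use Move Types 2 and 3 heavily, some of that agreement would be expected by chance, so kappa comes out lower than the raw percent agreement.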
5 Using a corpus-based approach to move analysis
5.1 Corpus-based move analysis
Much of the previous discussion has focused primarily on describing and discussing the theory behind and the process of doing a move analysis. Discourse analysis in general, and move analysis in particular, has typically been a qualitative approach to analyzing discourse, with studies focusing on only a few texts. This is well illustrated by the collection edited by Mann and Thompson (1992), which includes twelve different analytical approaches to analyzing the discourse of one single letter. In contrast, a corpus-based approach requires analysis of a well-designed representative collection of texts of a particular genre. These texts are encoded electronically, allowing for more complex and generalizable research findings, revealing linguistic patterns and frequency information that would otherwise be too labor-intensive to uncover by hand (Baker, 2006, p. 2). That is not to say that a corpus-based approach is simply a quantitative approach. Corpus-based discourse analysis depends on both quantitative and qualitative techniques. Even with a corpus-based approach, the moves and move types in each text must first be identified and tagged individually by the researchers making qualitative judgments about the communicative purposes of the different parts of a text. And even once the quantitative analyses are run, the results must still be interpreted functionally. As has been noted previously, "Association patterns represent quantitative relations, measuring the extent to which features and variants are associated with contextual factors. However, functional [qualitative] interpretation is also an essential step in any corpus-based analysis" (Biber et al., 1998, p. 4).
To summarize, what makes a corpus-based approach to move analysis different from the traditional approach is the following: a) analyses are done on a relatively large representative collection of texts from a particular genre; b) all texts are electronically encoded to allow for computerized counts and calculations using different programs and software packages; c) once the coding rubric for move types is developed, all texts in the corpus are coded to identify the moves and code the move types; d) analysis of the linguistic characteristics of specific move types can be easily done in order to provide details about how different communicative purposes are realized linguistically; and e) in addition to conducting the traditional move analysis, quantitative counts permit the discussion of general trends, relative frequency of particular move types, and prototypical and alternate patterns of move type usage (this is discussed further below).
5.2 General advantages of corpus-based approaches to discourse analysis
There are several advantages to using a corpus-based approach to top-down analyses of discourse (including move analysis and appeals analysis). Baker (2006), in his book Using Corpora in Discourse Analysis, outlines four advantages of using corpora to analyze discourse. First, a corpus-based approach helps reduce researcher bias. All researchers approach their research from a particular worldview; often we are aware and take account of our biases, but often we are unaware of them. As Baker notes, by using a corpus, "we at least are able to place a number of restrictions on our cognitive biases" (p. 12); overall patterns and trends are more likely to show through when we are looking at dozens of texts rather than just one or two selected texts. In short, corpus-based approaches help put the focus of discourse analysis on interpretation of the data, not the data itself, by reducing the opportunity for manipulation (conscious or unconscious) of the texts selected for analysis. The second advantage of corpus-based discourse analysis identified by Baker (2006) addresses what he calls "the incremental effect of discourse" (p. 13). The primary purpose of discourse analysis is to understand how language is used, often in quite subtle ways. A single text on its own is insignificant; however, corpus analysis allows us to see patterns of words, phrases, structures and/or discourses that permeate our language, often contrary to common sense. A corpus also allows researchers to see patterns that exist but that they might otherwise miss when analyzing a small sample of texts because the patterns are not overwhelmingly frequent. The third advantage Baker (2006) gives for using a corpus-based approach to discourse analysis is that it is much easier to identify counter-examples (resistant discourse) on the one hand, and to less readily mistake them for hegemonic or dominant discourse on the other hand (p. 14).
For example, results of a corpus-based move analysis are much more likely to represent the move and linguistic structures that are in fact typical for the genre as a whole, and much less likely to be skewed by the random selection and analysis of only a handful of texts that may turn out not to be representative of the genre as a whole. Lastly, Baker (2006) suggests that a significant advantage of a corpus-based approach is that it is easily combined with other methodologies to reinforce and strengthen the overall analysis, what is often called triangulation. For example, the approach presented in Chapters 3-4 of the present book combines move analysis with analysis of the linguistic characteristics of the move types to describe how different communicative purposes are linguistically realized. While these four advantages are relevant for all approaches to discourse analysis, a corpus-based perspective offers distinct advantages to move analysis in particular, which are described in the next section.
5.3 Specific advantages of a corpus-based perspective for move analysis
5.3.1 Identifying linguistic features of moves
While one could do a move analysis of a single text, it only becomes possible to describe the typical linguistic characteristics of move types through a corpus-based approach. Before computerized analysis, there were attempts to summarize the occurrence of linguistic features in genre moves. For example, Swales (1990, pp. 131-132) summarized the findings of 40 published studies which described the use of linguistic features in the four major sections of research articles. He concluded that five linguistic features ("that" verb complement, present tense, past tense, passive voice, and authors' comments or hedging) co-occur in particular patterns to convey particular rhetorical functions. The patterns observed, based on the five linguistic features, provide evidence for a two-way distinction between Introduction/Discussion and Methods/Results sections. The Introduction and Discussion sections have the functions, respectively, of providing the background of the current study and interpretation of the results. The features frequently found to be associated with these functions are "that" complements, present tense, and authors' comments. The Methods/Results sections, respectively, provide information regarding experimental procedures and present findings of the current study. Associated with these functions are a high use of past tense and a variable use of passive voice verb forms. The studies cited by Swales usually analyzed selected linguistic features by hand, looking for patterns and differences. With computers, much more interesting and comprehensive linguistic analyses can be undertaken. Analyses which take into account only individual linguistic features will reveal very little about the co-occurrence of linguistic features and how features interact with each other in a move to perform a particular communicative purpose.
It would be more informative and useful to study the distribution and co-occurrence of many features of language at once, rather than considering the distribution and function of individual features singly. Computer-driven, corpus-based approaches allow us to do this. Chapters 3, 4, & 5 in this volume provide examples of how various linguistic structures work together in unique combinations to help realize the rhetorical purposes of the different moves identified for each genre. It needs to be remembered that move types, and their component steps, are identified by the functional and semantic purposes that they have. Nevertheless, because different moves have different functional and semantic purposes, it seems reasonable to expect that move purposes will be realized through variations in linguistic features. This is, in fact, what Swales observed in his early analysis of research articles: "The evidence suggests a differential distribution of linguistic and rhetorical features across the four standard sections of the research article"
(1990, p. 136). Consequently, as noted in Chapter 1 of this volume, once texts have been segmented into moves, it is possible to analyze the linguistic characteristics of each move to determine the typical linguistic characteristics of the different move types. This type of analysis has not generally been done in traditional move analysis studies, and it can be argued that the lack of a description of the typical lexico-grammatical characteristics of these discourse units (i.e., move types) is a significant shortcoming of the non-corpus-based approach.
5.3.2 Move frequencies and lengths
Another advantage of the corpus-based approach to move analysis is that it allows description of the typical distributional and structural characteristics of each move type. That is, once moves in a corpus have been coded, a variety of descriptive counts can be made. The most obvious of these are the overall frequency of occurrence of each move type in the corpus, and the average length in words of each move type. Statistics like these allow us to make a clear determination as to whether a particular move type is obligatory, expected, or merely optional. For example, in the study described in Chapter 3, the third move type can be considered obligatory, as it occurred in over 97% of the texts, while the first move type is clearly optional, as it occurred in only about 15% of the texts. If it were not for the corpus-based approach used to analyze this genre, this optional move might not have even been identified, because it occurs so infrequently; or, if it had been identified, its importance in the genre might have been overstated. Similarly, it is interesting to note that the third move is, on average, 48 words in length, while the second move, which occurs in 93% of the texts, is roughly three times longer at 150 words in length.
By identifying this rather large difference in length between these two obligatory move types, the corpus-based approach invites additional follow-up questions to explore what the source of this difference might be.
5.3.3 Mapping move use and locations
A computer can be used not only to count the presence of each move type in each text but also to keep track of the moves' positions relative to each other (e.g., first, second, third), which other move types each most commonly co-occurs with, how frequently a move is embedded in another move, and how frequently a move occurs in the body of the text as opposed to, say, a P.S. The ability to make these sorts of observations permits us to extend our analysis in several ways. For example, it is possible to look at the relationship that different move types have with each other. Again, looking ahead to the study described in Chapter 3, the text position of two of the moves that are identified for the fundraising letter genre turns out to be quite predictable: although Move 1 and Move 7 are optional moves, when they are present in a direct mail letter, Move 1 occurs as
the initial move in the letter 97% (34/35) of the time and Move 7 occurs as the final move before the complimentary close 100% (33/33) of the time. The positions of Move 2 and Move 3 are also highly predictable. If one ignores the presence of Move 1, Move 2 occurs as the initial move in the direct mail letter 74% (180/242) of the time. And Move 2, regardless of its position in the letter, is immediately followed by Move 3 87% (316/362) of the time.
5.3.4 Genre prototypes
With statistics on move frequencies and lengths, as well as descriptions of where in the genre a move type tends to occur and how one move type typically relates to another, a key advantage of a corpus-based approach can be realized: the ability to develop genre prototypes. Prototypes are particularly valuable in educational and training contexts to help novices learn to understand and produce a genre that is new to them. In the study described in Chapter 3, for example, three different prototypes of the genre are provided. The first includes only the obligatory moves, the second adds the expected moves, and the third is based on all moves (including the optional ones). In these prototypes, not only can the different move types be included, but typical and alternate locations of moves relative to other moves in the text can be described. In addition, if linguistic analysis or other follow-up analyses of the individual moves have been done, the prototypes can represent these features as well. Prototypes such as these are also very useful in understanding better the genre variation that occurs between different disciplines. For example, Kanoksilapatham in Chapter 4 shows that the moves in the introduction sections of biochemistry research articles vary somewhat from the CARS model that Swales has proposed (introduced earlier in this chapter).
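The descriptive counts discussed in Sections 5.3.2 and 5.3.3 (how many texts contain each move type, how long each move type is on average, how often it opens a text, and which move type follows it) can be sketched in a few lines of Python. This is an illustrative sketch, not the procedure used in the studies reported in this book. It assumes each coded text is represented as an ordered list of (move type, word count) pairs, it ignores embedded moves, and the toy corpus, the function name `move_statistics`, and the dictionary keys are all invented for the example.

```python
from collections import Counter, defaultdict


def move_statistics(corpus):
    """Summarize a move-coded corpus.

    corpus: list of texts; each text is an ordered list of
    (move_type, n_words) pairs. Embedded moves are not modeled.
    """
    texts_with_move = Counter()          # number of texts containing each move type
    lengths = defaultdict(list)          # word count of every occurrence
    initial = Counter()                  # number of texts each move type opens
    followed_by = defaultdict(Counter)   # move type -> counts of the next move type

    for text in corpus:
        moves = [move for move, _ in text]
        for move in set(moves):
            texts_with_move[move] += 1
        for move, n_words in text:
            lengths[move].append(n_words)
        if moves:
            initial[moves[0]] += 1
        for current, following in zip(moves, moves[1:]):
            followed_by[current][following] += 1

    stats = {}
    for move, word_counts in lengths.items():
        occurrences = len(word_counts)
        stats[move] = {
            # share of all texts in which the move type occurs at least once
            "text_share": texts_with_move[move] / len(corpus),
            # mean length in words over all occurrences
            "mean_words": sum(word_counts) / occurrences,
            # of the texts containing the move type, share in which it comes first
            "initial_share": initial[move] / texts_with_move[move],
            # of all occurrences, share immediately followed by each other move type
            "next_move_share": {
                nxt: count / occurrences for nxt, count in followed_by[move].items()
            },
        }
    return stats


# Invented toy corpus of four coded letters (illustration only).
corpus = [
    [("M1", 10), ("M2", 120), ("M3", 40)],
    [("M2", 100), ("M3", 50), ("M5", 20)],
    [("M2", 140), ("M4", 30), ("M3", 60)],
    [("M3", 30), ("M2", 120), ("M3", 20)],
]

stats = move_statistics(corpus)
for move in sorted(stats):
    print(move, stats[move])
```

Whether a given share of texts makes a move type obligatory, expected, or optional remains a decision for the analyst; the sketch only produces the counts on which that decision can be based.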
6 Summary
This chapter has introduced the top-down approach used most often by applied linguists for the analysis of discourse structure: move analysis. While discourse analysis has often been concerned with sentence-level features in writing or general modes of writing such as narration, description, and comparison and contrast, move analysis has given researchers and practitioners useful text-focused tools. We first discussed the theoretical and empirical underpinnings of traditional move analysis. We then presented a description of corpus-based move analysis, with steps that followed the guidelines proposed in Chapter 1 for top-down analyses of discourse structure. The chapter concluded with a discussion of the added advantages of a corpus-based approach to move analysis. These include the ease of identifying the linguistic characteristics of the moves, their frequencies and
lengths, and the mapping of their use and location in the overall discourse structure of texts. Chapters 3 and 4 put this model into practice, presenting corpus-based move analyses of fundraising letters (Chapter 3) and biochemistry research articles (Chapter 4).
chapter 3
Identifying and analyzing rhetorical moves in philanthropic discourse

In Chapter 2, we described the general approach that can be used to identify and analyze the moves of a genre. In this chapter, we will describe and expand on a study¹ that uses a move analysis to show the rhetorical structure of direct mail letters, a type of philanthropic discourse. This study illustrates how move analysis is done and provides a model that can be used for the study of other genres, especially genres from professional (e.g., business, legal, medical) contexts. Furthermore, a review of the corpus that was used in this study and its characteristics will provide a useful example of the kind of specialized corpus that is essential to conducting a move analysis of a specific discourse genre.
1 Background
Philanthropic discourse (fundraising texts like direct mail letters or magazine advertisements) seeks to persuade, inform, request, catch one's eye, wrench one's heart, and twist one's arm, all in a tidy, attractive package. The weight upon these texts is, in fact, enormous. Nonprofit organizations depend to a larger or smaller extent on fundraising texts for operating expenses or for funding to accomplish capital goals. And yet, the various genres of philanthropic discourse have not been closely studied. Indeed, Bhatia (1998) claims that the discourse of fundraising represents one of the most dynamic forms of language use. For a relatively limited number of communicative functions, this discourse form offers a large variety of creative options, some rarely used before. It is a category of genre that offers an interesting and challenging profile of linguistic realizations to achieve a limited set of generic objectives (Bhatia, 1998, p. 100).
1. This chapter draws on material previously published in the following two articles: (1) Upton, T. A. (2002). Understanding direct mail letters as a genre. International Journal of Corpus Linguistics 7(1), 65-85; and (2) Connor, U. & Upton, T. A. (2003). Linguistic dimensions of direct mail letters. In C. Meyer & P. Leistyna (Eds.), Corpus Analysis: Language Structure and Language Use (pp. 71-86). Amsterdam: Rodopi Publishers.
The dynamic nature of philanthropic discourse is due to the fact that it is designed to be quite persuasive. In short, its primary purpose is to persuade people to contribute to worthy causes or to underwrite philanthropic programs (Connor, 2000). Because of its persuasive purposes, fundraising has a great deal in common with promotional materials such as sales letters and job applications, in which the purpose is to sell something: in sales letters, a service or product; in letters of application, a person's abilities; in fundraising, a worthy cause (Bhatia, 1993a; Connor & Wagner, 1998). Recent studies of philanthropic discourse, specifically fundraising texts, have for the most part employed a qualitative approach, analyzing characteristics such as communicative functions (Bhatia, 1997b; Connor, 1997), rhetorical patterns (Abelen, Redeker, & Thompson, 1993; Crismore, 1997; Lauer, 1997), social contexts (Bazerman, 1997a; Myers, 1997), metaphors (McCagg, 1997), and cultural differences (Connor & Wagner, 1998; Graves, 1997). Although these studies have contributed to our understanding of the language of fundraising, their qualitative nature left us without an empirical baseline for comparing the general features of fundraising texts with those of other common texts. Of particular interest are the types of rhetorical moves that are used to define the different genres of philanthropic discourse. What was missing was a corpus-based study of fundraising texts to develop such a baseline. The Indiana Center for Intercultural Communication (ICIC), with funding from and in cooperation with the Indiana University Center on Philanthropy, undertook a concerted effort to carefully study the language of fundraising by collecting a large corpus of fundraising material and then studying, among other things, the rhetorical moves in these genres.
The focus of the present chapter is on the direct mail letters used by non-profit agencies to introduce readers to, or remind them about, what the agency does, the clientele or services it is involved with, and/or the needs that the reader is being asked to help meet, usually financially. Specifically, this study will first investigate the discourse structure typical of the letters in the corpus, using move analysis, and then provide a linguistic description of the grammatical stance features that each move most commonly draws on to accomplish its particular function in the genre.
2 A specialized corpus of fundraising texts
The fundraising letters analyzed in this study are part of the ICIC Fundraising Corpus, which includes over 900 fundraising documents from 236 organizations and totals nearly 2 million words. The documents in the corpus include direct mail letters, newsletters, case statements, grant proposals, and annual reports. Table 3.1
shows the total number of organizations, items and words for each text type in the corpus.
Table 3.1 ICIC fundraising corpus document types
Type of Text              Org. n   Item n      Word n
Direct Mail Letter           108      316     191,540
Invitation, Newsletter       172      445     922,212
Case Statement                12       13     121,780
Grant Proposal                27       69     156,021
Annual Report                 51       84     523,770
Total                        370      927   1,915,323
Note: Org. n = the number of organizations represented in this type. Item n = the number of items of this type in the corpus. Word n = the number of words in the documents of this type in the corpus.
The present study focuses on the genre of direct mail letters, and thus uses only that component of the ICIC corpus. Letters were collected from five major types of organization; Table 3.2 shows the number of organizations, number of letters, and words broken down by these organization categories.
Table 3.2 ICIC fundraising corpus direct mail letters by organization type
Type of Organization      Org. n   Item n    Word n
Health/Human Services         33       91    54,187
Environmental                 10       13     8,126
Community Development         10       17    10,875
Education                     27      118    72,583
Arts and Culture              16       63    37,485
Other                         12       14     8,284
Total                        108      316   191,540
The ICIC Fundraising Corpus was designed to represent a specific type of discourse (fundraising texts) and to represent specific genres within that domain. The sub-corpus for the genre of direct mail letters was further designed to represent the range of variation found for this genre. To prevent any skewing of the corpus towards the writing of any one organization or non-profit field, effort was made to collect letters from a wide variety of non-profit organizations across a wide variety of non-profit fields (e.g., health and human services, education).
3 Determining and analyzing discourse moves: Direct mail letters
3.1 Previous analysis of direct mail letters
Direct mail letters for nonprofit fundraising have the general purpose of selling a product: a good cause. It has been noted (Connor & Upton, 2003) that a whole industry has developed around direct mail letters in nonprofits, as experts offer their advice for fundraisers in books and newsletters. It is fair to say, though, that the advice given in many of these materials often comes from the knowledge base of mass marketing rather than a careful analysis of the language actually used. Frequently, a great deal of emphasis is put on the physical appearance of the letter, while an examination of language use, for the most part, does not appear to be an important consideration. For example, even though donor segmentation is frequently recommended, little concrete advice is given about how to appeal to specific audiences. Linguists' interest in the direct mail letter is relatively new. As far as we are aware, there have only been three research studies published by linguists that focus on the fundraising direct mail letter. The edited book by Mann and Thompson (1992) showcased the merits of particular linguistic/rhetorical analyses (such as Rhetorical Structure Theory and topical structure analysis); however, the purpose of their volume was not necessarily to advance knowledge about the fundraising letter as a text type. Abelen, Redeker, and Thompson (1993) offered more valuable linguistic/rhetorical information about direct mail fundraising letters, but their focus was a cross-cultural comparison of fundraising letters written by Dutch and American writers in one type of non-profit (based on analysis of only 8 letters). The third article, by Upton (2002), is most relevant here and will be described in more detail below.
3.2 A move analysis of fundraising letters: Background and methodology
3.2.1 Move types
Upton (2002) conducted a study using the ICIC Fundraising Corpus (ICIC-FC) with the goal of providing a better, and more definitive, understanding of the discourse structure that underlies the persuasive aspect of direct mail letters. This study drew on the work done by Bhatia (1998), who did a preliminary move analysis on a small set of direct mail letters. Using a comprehensive, rigorous, and sustained analysis of data, a research team at ICIC identified a seven-move structure.
Move Type 1, Get Attention: The communicative, functional purpose of this first move type was to get and focus the reader's attention at the start of the letter. This
move type could be realized through one of two steps. Step 1 is to start with a quotation or story of some sort or a shocking or unexpected statement. Step 2 is to start by offering some type of general pleasantries. Examples from letters in the corpus of Move Type 1, as expressed through one or both of its two steps, are given in Table 3.3.
Table 3.3 Examples of Move Type 1, Steps 1 & 2, from corpus
Move Type 1: Get Attention
Optional Steps:
Step 1: Pleasantries
  1996 is off to a fast start!
  What a Summer! And we're just getting started!
Step 2: Quotation, story or shocking/unexpected statement
  I learned about gardening when I was very young from my parents. They always had a garden and now so do I. The garden that I have now is very different from the garden that my parents grew. Dad would start planting about the fifteenth of April. He had two acres to plow so he used a mule and a plow. My garden now is very different from my dad's garden
  Philanthropy is the rent we pay for the joy and privilege we have for our space on this earth. Jerold Panas.
  Cecilia desperately searched for medical care for her unborn child. She would have a better chance of getting help and delivering a healthy baby if she lived in Sweden. But Cecilia lives in central Indiana. Cecilia might even be your neighbor.
Move Type 2, Introduce the Cause and/or Establish Credentials: This move type serves two general functions. It focuses on establishing the credentials of the organization by highlighting what the organization does and the contribution it can make, and/or it serves to introduce the cause/need that the organization seeks to address. For many non-profit organizations, their primary or even sole purpose is to address a particular need; they talk about who they are and what they do in the context of what the cause is. Consequently, these two functions are considered part of one move type: introduce the cause and/or establish credentials of organization. This move type could be expressed by any one or more of the following four steps: 1) indicating a general problem or need, 2) highlighting a specific problem or need, 3) highlighting the successes of past organization efforts, and 4) outlining the mission of the organization. Examples from letters in the corpus of Move Type 2, as expressed through its four steps, are given in Table 3.4.
Table 3.4 Examples of Move Type 2, Steps 1-4, from corpus

Move Type 2: Introduce the cause and/or establish credentials of organization
Optional Steps:
Step 1: Indicate general problem/need
  One of the biggest challenges you face may be to find qualified, educated people to fill positions in your company ... Indy Reads is working to change that!
Step 2: Highlight specific problem/need
  This summer, more than 300 children ages 4 through 14 will attend the YWCA of Indianapolis Everyone belongs Summer Day Camp ... As you can imagine, a summer like this is expensive to provide. And more than 30% of the kids we serve cannot afford the camp fee.
Step 3: Highlight the successes of past organization efforts
  My name is Joe Cooper. Last year I was so proud to be named student of the year that I thought my chest was going to burst when I was on stage. I learned first hand what GILL is all about, giving to others unselfishly.
Step 4: Outline the mission of the organization
  Young women are growing up in an ever-changing society. As a contributor to the Council in past appeals I know that you are aware of our mission--to prepare girls with ethical values, character, a desire to succeed and a commitment to their community.
Move Type 3, Solicit Response: In the pilot study, it was observed that many letters not only requested support but also sought some other type of response, such as volunteering to help or contacting the organization for further information. Consequently, this move type was labeled "solicit response," which was realized by one or both of two steps. Step 1, soliciting financial support, has three options: Step 1A, state benefit of support to the need/problem; Step 1B, ask directly for pledge/donation; and Step 1C, remind of past support to encourage future support. Step 2, soliciting other response, requests a response from the reader other than financial, such as volunteering to help. Examples from letters in the corpus of Move Type 3, as expressed through its two steps, are given in Table 3.5.

Move Type 3: Solicit Response
Optional Steps:
Step 1: Solicit financial support
Step 1A: State benefit of support to the need/problem
  You can help more than 200,000 people with just one gift ... Your one gift to United Way of Central Indiana supports 82 human service agencies... Only if you contribute this year can these agencies continue to provide programs and services that: Strengthen Families; Invest In Our Children; Serve The Elderly And Disabled; Help People Become Self-Sufficient; Promote Health And Well-Being
  And that's why I'm writing to you today. I urge you to continue to make a difference in the lives of individuals like Cecilia and her son. You can literally help save a life.
Step 1B: Ask directly for pledge/donation
  Please send your gift today.
  Please send the largest contribution you can comfortably make.
Step 1C: Remind of past support to encourage future support
  Last year your memorial gift of $5 for hospice care in March gave VNSF, Inc. the ability to address the needs of patients I described above. I am asking that you consider supporting our efforts once again this year with a similar gift.
  You have helped make Goodwill's work possible with your previous support.
Step 2: Solicit other response
  Every year we seek companies, organizations and individuals to sponsor one or more of our families ... If you are interested and would like more information, please contact ... We would like to have families matched with sponsors non later than ...
  I'd be glad to respond to any questions you might have about our work. You may call me at...
Move Type 4, Offer Incentives: In Move Type 4, the writer offers an incentive, or indicates some other benefit of giving. In our analysis, we found that this move type could be realized in one of two ways, either by Step 1, which is the offer of a tangible (e.g., a mug, a matching donation) incentive, or by Step 2, the noting of an intangible (e.g., a good feeling) incentive for giving. Examples from letters in the corpus of Move Type 4, as expressed through its two steps, are given in Table 3.6.

Move Type 4: Offer Incentives
Optional Steps:
Step 1: Offer of Tangible Incentive
  We'll send you our newsletters, invitations and membership cards.
  As an Indiana resident, your Federal tax-deductible contribution also qualifies for a special Indiana State Income Tax credit of 50%.
  Your membership fee assures your receiving notices of exhibition openings, lectures, discounts for Saturday School and the Pre-College Workshop, and invitations to the Janus Ball, artists' dinners and other Friends only events.
Step 2: Offer of Intangible Incentive
  When your gift helps an outstanding student become an outstanding teacher, you will know that you, too, have touched the future.
  I am sure you will feel good about giving.
  If you enjoy reading the stories ... there is an excellent chance that you will enjoy membership in the Indiana Historical Society.
Move Type 5, Reference Insert: Move Type 5 is a simple, straightforward structure that is used to draw attention to material beyond the letter itself that was included in the mailing, such as a brochure, a pledge form, or a return envelope. Two examples of Move Type 5 from the corpus are:
(1) I have enclosed a return envelope for your convenience, as well as an overview of the services we provide.
(2) I have enclosed a brochure which tells you more about the Chancellor's Circle and which includes a reply card. I have also enclosed a reply envelope for your convenience.
Analysis of the direct mail letters in the corpus made it clear that Move Type 4 (Offer Incentives) and Move Type 5 (Reference Insert) were often embedded in other move types. Take, for example, the following sentence: "Please fill out the enclosed card to send in your tax-deductible contribution to help support the boys and girls at Camp X" (emphasis added). The primary function of this sentence is to solicit a financial response (Move Type 3), but it also serves two other functions: offering an incentive for contributing ("tax-deductible"), which is Move Type 4, and drawing attention to the enclosure ("the enclosed card"), which is Move Type 5. We therefore treated this sentence and others like it as containing three move types: the primary move of soliciting support and the embedded moves of referencing an insert and offering an incentive. Consequently, these two moves were seen as capable of either standing alone or being embedded in other moves. A longer example of how these two move types can be embedded in a longer move, often Move Type 3, is the following, with tags included to mark where move types start and stop:
(3) <begin Move Type 3> Let me assure you that we would appreciate receiving one million dollars from you. But let me also assure you that we would appreciate equally well any contribution you are able to make. Whatever you can contribute, you will be helping to support a geology student at (university).
Your <begin Move Type 4> tax-deductible <end Move Type 4> contribution may be sent <begin Move Type 5> in the enclosed postage-paid envelope with the attached return card. <end Move Type 5> <begin Move Type 4> As an Indiana resident, your gift qualifies for a special tax credit of 50% (up to a maximum of $100 for an individual or $200 for a joint return). <end Move Type 4> <begin Move Type 5> For your convenience, I am enclosing a copy of Form CC-40, which should be filed with your Indiana State Income Tax. <end Move Type 5> Please give today. <end Move Type 3>
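The begin/end tags in example (3) lend themselves to automatic processing. The following is a minimal sketch, not the procedure used in this study, of how such tagged text might be read to recover each move and to tell stand-alone moves from embedded ones. The tag format is taken from example (3); the function name `parse_moves`, the shortened sample string, and the use of nesting depth as the marker of embedding are assumptions made for illustration.

```python
import re

# Tag format copied from example (3); the parser itself is an illustrative assumption.
TAG = re.compile(r"<(begin|end) Move Type (\d+)>")


def parse_moves(tagged):
    """Return (move_type, depth, text) tuples, one per closed move.

    depth 0 marks a stand-alone move; depth 1 or more marks an embedded move.
    """
    open_moves = []  # stack of [move_type, text_pieces] for moves not yet closed
    spans = []
    pos = 0
    for match in TAG.finditer(tagged):
        chunk = tagged[pos:match.start()]
        for entry in open_moves:
            entry[1].append(chunk)  # text counts toward every move still open
        pos = match.end()
        kind, move = match.group(1), int(match.group(2))
        if kind == "begin":
            open_moves.append([move, []])
            continue
        if not open_moves or open_moves[-1][0] != move:
            raise ValueError(f"unbalanced end tag for Move Type {move}")
        _, pieces = open_moves.pop()
        text = " ".join("".join(pieces).split())  # collapse whitespace
        spans.append((move, len(open_moves), text))
    if open_moves:
        raise ValueError("missing end tag")
    return spans


# Shortened version of example (3), used only to exercise the sketch.
sample = (
    "<begin Move Type 3> Your <begin Move Type 4> tax-deductible "
    "<end Move Type 4> contribution may be sent <begin Move Type 5> in the "
    "enclosed postage-paid envelope with the attached return card. "
    "<end Move Type 5> Please give today. <end Move Type 3>"
)
spans = parse_moves(sample)
embedded = [move for move, depth, _ in spans if depth > 0]
print(spans)
print("embedded move types:", embedded)
```

Because the text of an embedded move is also credited to the move that contains it, word counts derived this way would count embedded words twice; whether that is desirable depends on how move length is defined.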
Move Type 6, Express Gratitude: This move type, which is used to express thanks, is realized by one or both of two steps. Step 1 offers thanks for past financial or other support, and Step 2 offers thanks for current as well as future financial (or other) support. Examples from letters in the corpus of Move Type 6, as expressed through its two steps, are given in Table 3.7.
Move Type 6 Express Gratitude
Optional Steps:
Step 1 Thanks for Past Financial or Other Support
        Thank you for your past gift to the Girl Scout Capital Campaign.
        I want to thank you for your past support of the Visiting Nurse Service Foundation, Inc.
Step 2 Thanks for Current & Future Financial or Other Support
        Your support is greatly needed and greatly appreciated.
        Their appreciation and enthusiasm for what they are doing will go a long way to thank you for your encouragement and support.
        Thank you again for sharing our hope for a future without cancer.
Move Type 7, Conclude with Pleasantries: Although it occurs less frequently than the other move types, one final move type, conclude with pleasantries, comes at the end of the letter; its communicative function is to bring the letter to a pleasant close. Examples of Move Type 7 include the following:
(4) May you be blessed, today and always.
(5) I hope you have a nice day.
(6) Happy Holidays!
The complete move structure for direct mail letters is given in Table 3.8.
Table 3.8 Move structure of non-profit direct mail fundraising letters
Move Type 1: Get attention
Move Type 2: Introduce the cause and/or establish credentials of org.
    Step 1 General problem/need indicated, and/or
    Step 2 Specific problem/need highlighted, and/or
    Step 3 Successes of past organization efforts highlighted, and/or
    Step 4 Goals of future organization efforts outlined
Move Type 3: Solicit response
    Step 1 Solicit financial support
        Step 1A State benefit of support to the need/problem, and/or
        Step 1B Ask directly for pledge/donation, and/or
        Step 1C Remind of past support to encourage future support, and/or
    Step 2 Solicit other response
Move Type 4: Offer incentives
    Step 1 Offer of Tangible Incentive, and/or
    Step 2 Offer of Intangible Incentive
Move Type 5: Reference insert
Move Type 6: Express gratitude
    Step 1 Thanks for Past Financial or Other Support, and/or
    Step 2 Thanks for Current & Future Financial or Other Support
Move Type 7: Conclude with pleasantries
3.2.2 Structural elements
All of the letters in the direct mail corpus include text that strikes the reader as somehow different from the text in the body of the letter. Elements such as the date, address information, and even the signature and the signature footer have a very different function in the direct mail letter than the communicative functions served by the move types described above. Their functions, while important and in many respects required, are more structural in nature than communicative. These features of the direct mail letters are called structural elements. According to Crossley (2007), discussing the related genre of cover letters, "It appears that while structural elements are important to the framing of a cover letter, their individual meaning is not so dependent upon the writer's intention as much as upon their inclusion by the writer. Structural elements are for the most part standardized patterns that rarely differ from one writer to another" (p. 7). In many respects, move types are to structural elements as lexical words are to function words. Describing the latter relationship, Biber et al. (1999) see lexical words as "the main building blocks of texts," while function words are "the mortar which binds the text together" (p. 55); on a larger, genre level, move types can be seen as the main building blocks of the direct-mail letter, while the structural elements provide the (boilerplate) scaffolding around which the letter is built. The structural elements that are frequently found in direct mail letters were examined to see what role they might play in the persuasive appeal of these letters. Table 3.9 below describes the seven basic structural elements that can appear in direct mail letters. As noted above, these elements are clearly different from the seven move types outlined in Table 3.8. They do not have clear or major communicative functions, and they are for the most part very constrained (e.g., the date or writer's name) or highly formulaic in nature (e.g., the salutation, the complimentary close).
While the study of these elements is tangential to the goals of discourse analysis, many instructional materials designed to train writers specifically stress the importance of using these elements to make direct mail letters more persuasive (e.g., Cone, 1987; Lewis, 1997). Because practitioners view these structural elements as an important part of the direct mail letter, and because they are intended to have an impact on the reader, they seemed worth examining. Structural elements are included here because they appear in virtually all direct mail letters and can in fact be viewed as markers that help identify this text type.
Table 3.9 Direct mail Structural Elements

Element A: Date line
    The date when the letter was written/sent is given.
    Example: January 10, 1998
Element B: Address information
    The address of the addressee is given. This provides a level of formality to the letter.
    Example: Joy Us Donor, 123 Boulevard Road, Here, There 45678
Element C: Salutation
    This is the opening greeting of the letter and is followed either by no punctuation, a comma, or a colon.
    Example: Dear Joy Donor,
Element D: Complimentary Close
    This is the word or phrase that draws the letter to a close and is followed by either a comma or no punctuation.
    Examples: Sincerely yours, / On behalf of our clients,
Element E: Signature
    This is the author's penned signature.
Element F: Signature footer
    This provides the printed name of the letter signer and/or the title of the signer.
    Example: Nahn Prophet, President
Element G: Footnote information
    This is information located after everything else in the letter and indicates that there is other information the reader should be aware of.
    Examples: enclosure / cc
3.3 Analysis
Using the rubrics given in Table 3.8 (rhetorical moves) and Table 3.9 (structural elements), two raters hand-coded the rhetorical moves and structural elements in all 242 letters in the corpus. As noted in Section 3.1 of Chapter 2, individual moves often reappeared throughout a letter, and each appearance was counted as a distinct occurrence; as a result, a single move type could occur multiple times in one letter. Inter-rater reliability was calculated at 84%, with all discrepancies reconciled through discussion. The vast majority of discrepancies between the two raters resulted from initial disagreement as to where one move ended and the next started, not as to the presence of a particular move. This inter-rater reliability is quite good, since, as Bhatia notes, "there are sometimes cases which will pose problems and escape identification or clear discrimination, however fine a net one may use. After all, we are dealing with the rationale underlying linguistic behavior rather than its surface form" (Bhatia, 1993a, p. 93). Once all of the moves were agreed upon and marked, each letter was tagged to indicate the start and stop of each move in each text. The sequence of each move type and structural element for each text was also noted. This allowed for the tracking of the total frequency of each move type in the corpus, their relative locations in each letter (e.g., first, second, third), what other move types a move most commonly occurred with, how frequently a move was embedded in another move, and how frequently a move type occurred in the body of the text as opposed to in a P.S.
3.4 Results
Move Type Frequencies and Lengths: Table 3.10 provides summary information about the moves in this corpus of 242 direct mail letters, including the frequency of each move type, the number of letters that contained each move type, and the average number of words per move type. Not surprisingly, the most common move type in all of these letters was Move Type 3 Solicit Response, which occurs 546 times. This represents 39% of all the moves occurring in this corpus, showing up at the average rate of 2.3 times per letter.
Table 3.10 Move totals, percentages and rates of occurrence
Move     Total number   % of total moves   Letters w/ at least 1 occurrence   % of total letters   Avg. words/move
Move 1   35             2.5%               35                                 15%                  39
Move 2   362            26.0%              226                                93%                  150
Move 3   546            39.3%              236                                97%                  48
Move 4   113            8.1%               85                                 35%                  29
Move 5   153            11.0%              127                                52%                  9
Move 6   148            10.7%              124                                51%                  10
Move 7   33             2.4%               31                                 13%                  10
In fact, of the 242 letters, only six letters did not have at least one Move Type 3 occurring at some point in the letter, with Move Type 3 represented in 97% (236/242) of the letters. The second most common move was Move Type 2 Introduce the cause and/or establish credentials of the organization, which occurred 362 times. At the rate of 1.5 times per letter, this move represents 26% of all the moves in this corpus. Move Type 2, like Move Type 3, also clearly seems to be a required move (that is, one that almost every letter uses) in this genre as it occurs in 93% of the letters. Move Type 4 (Offer Incentive) at 8.1% of the total moves, Move Type 5 (Reference Insert) at 11.0%, and Move Type 6 (Express Gratitude) at 10.7% occurred at relatively similar rates of frequency across the 242 letters. While apparently optional move types within this genre, each occurred fairly regularly in these letters: Move Type 4 was represented at least once in 35% of the letters, Move Type 5 occurred in 52% of the letters, and Move Type 6 occurred in 51% of the 242 letters. Move Type 1 (Get attention) and Move Type 7 (Conclude with pleasantries) were clearly icing-on-the-cake moves that writers of this genre could draw upon when desired but did not do so very frequently. Move Type 1 represented only 2.5% of the moves in this corpus and occurred in only 15% of the letters. Similarly, Move Type 7 represented 2.4% of the moves in this corpus and occurred in only 13% of the letters. It is further possible to compare the lengths of each of these move types. Move Type 2 is by far the longest move in this genre, averaging 150 words per occurrence. Move Type 3, the second longest move, is only one-third the length, at 48 words per occurrence. Move Types 5, 6 and 7 are the shortest, with Move Type 5 averaging 9 words per occurrence, and Move Types 6 and 7 averaging 10 words per occurrence. 
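The percentages and rates discussed above follow directly from the raw counts in Table 3.10. As a hedged illustration (the variable and function names are ours, not part of the original analysis), the arithmetic can be sketched as follows. Recomputed values may differ from the published table in the last digit, since the rounding convention used for the table is not stated.

```python
TOTAL_LETTERS = 242

# move type -> (total occurrences, letters with at least one occurrence), from Table 3.10
MOVE_COUNTS = {
    1: (35, 35),
    2: (362, 226),
    3: (546, 236),
    4: (113, 85),
    5: (153, 127),
    6: (148, 124),
    7: (33, 31),
}


def summarize(move_counts, total_letters):
    """Return the corpus-wide move total and three derived figures per move type."""
    total_moves = sum(occurrences for occurrences, _ in move_counts.values())
    summary = {}
    for move, (occurrences, letters) in move_counts.items():
        summary[move] = {
            "pct_of_moves": 100 * occurrences / total_moves,   # share of all moves
            "per_letter": occurrences / total_letters,         # average rate per letter
            "pct_of_letters": 100 * letters / total_letters,   # letters using the move
        }
    return total_moves, summary


total_moves, summary = summarize(MOVE_COUNTS, TOTAL_LETTERS)
print("total moves:", total_moves)
for move, row in summary.items():
    print(
        f"Move {move}: {row['pct_of_moves']:.1f}% of moves, "
        f"{row['per_letter']:.1f} per letter, "
        f"{row['pct_of_letters']:.1f}% of letters"
    )
```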
Structural Elements: Table 3.11 shows the relative frequency of each of the structural elements of the direct mail letters in this corpus.
Table 3.11 Percentage of letters with each structural element
Structural Elements                   Percent of Letters
Element A: Date Line                  77%
Element B: Address Information        51%
Element C: Salutation                 88%
Element D: Complimentary Close        90%
Element E: Signature                  89%
Element F: Signature Footer           87%
Element G: Footnote Information       7%
The vast majority of the letters in this corpus contained four structural elements: an opening salutation (88%), a complimentary close (90%), a signature (89%), and a typed signature footer (87%). The date line (77%) and address information (51%) were more optional, while footnote information was included relatively infrequently (7%).
3.5 Discussion
Based on the results of the genre analysis of the 242 direct mail letters in this corpus, two observations can be made about how moves are used within the genre. First, some of these moves are nearly obligatory in the genre, while others seem to be merely optional. Second, it seems clear that the juxtaposition of the moves relative to each other shows meaningful patterns. Move Type 2 (Introduce the cause and/or establish credentials of organization) and Move Type 3 (Solicit response) are the most important moves in this genre. The preeminence of these two moves can be seen in the fact that not only do they occur in nearly every direct mail letter in the corpus, but they generally occur more than once, they usually occur as the first and second moves in the letter, they are by far the longest of the moves, and they almost always occur in juxtaposition to each other. That Move Types 2 and 3 are the most prominent in frequency, size, and position in the letter is not surprising. At its most basic level, the purpose of the direct mail letter is to tell the readers what the organization is and/or what the need is, and to request funds to help the cause. These functions are accomplished in these two moves. In contrast, the other five moves serve as optional tools that individual writers in this genre can incorporate in various ways to tailor the effect of the letter on the reader. For example, Move Types 4 (Offer Incentives) and 5 (Reference Insert) clearly play a secondary role in the direct mail letter, as they tend to be quite short and are often embedded in another move, usually Move Type 3 (Solicit Response). Nevertheless, their role appears to be an important one in that they are included in a sizeable percentage of the letters (Move Type 4 in 35%; Move Type 5 in 52%).
Essentially, it seems their function is to serve as a reminder. In the case of Move Type 4, the readers most often are reminded either that contributions to non-profit organizations are tax-deductible, or that they will feel good about the contribution that they make. The function of Move Type 5 is simply to remind the readers to look at other material that has been included with the letter. Move Type 6 (Express Gratitude), occurring in 51% of the letters, also plays an important role in informing the readers how much the organization appreciates their support. Nevertheless, this role is noticeably a secondary one when the frequency, number of occurrences and length of this move are considered in relation to Move Types 2 (Introduce the cause and/or establish credentials of organization) and 3 (Solicit response). Move Types 1 (Get attention) and 7 (Conclude with pleasantries) are clearly optional moves, with both of them occurring in fewer than 15% of the letters. Similar observations can be made about the structural elements that are included; clearly there are some that are considered obligatory, such as the salutation (Element C) and complimentary close (Element D), and others that are more optional, such as address information (Element B). The facts that most of these structural elements occur in most direct mail letters, and that practitioners themselves view these as essential components of the direct mail letter (e.g., Cone, 1987), suggest that more careful analysis of these may be warranted in future studies. Indeed, it could be argued that at least some of these elements should be viewed as moves in themselves, as they are functional units of text serving a specific purpose that adds to the persuasive nature of the letters. Textual choices within these structural elements, for example how to phrase the salutation, are actually quite significant and can be viewed as something beyond a standardized template.
3.6 Letter prototypes
One strength of this type of corpus analysis is that it allows us to develop prototypes of the genre. Three such prototypes suggest themselves from these data. The first prototype represents the most basic form of the direct mail letter, using the moves and structural elements which occur in at least 85% of the letters in the corpus. These include Move Types 2 (introduce the cause and/or establish credentials of organization) and 3 (solicit response), and Structural Elements C (salutation), D (complimentary close), E (signature), and F (signature footer). An example of such a letter is provided in Table 3.12. A second prototype includes all the moves and structural elements that occurred in over 50% of the letters in this corpus. These include Move Types 2, 3, 5 (reference insert) and 6 (express gratitude) as well as Structural Elements A (date line), B (address information), C, D, E and F. An example of such a letter is provided in Table 3.13.
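The first two prototypes amount to applying a cut-off to the letter-level percentages reported in Tables 3.10 and 3.11. A minimal sketch of that selection is given below; the percentages are copied from those tables, while the function name `prototype` and its `strict` flag are illustrative assumptions rather than part of the original analysis.

```python
# Percent of the 242 letters containing each move type (Table 3.10)
# and each structural element (Table 3.11).
COVERAGE = {
    "Move Type 1": 15, "Move Type 2": 93, "Move Type 3": 97, "Move Type 4": 35,
    "Move Type 5": 52, "Move Type 6": 51, "Move Type 7": 13,
    "Element A": 77, "Element B": 51, "Element C": 88, "Element D": 90,
    "Element E": 89, "Element F": 87, "Element G": 7,
}


def prototype(coverage, threshold, strict=False):
    """List the units whose coverage meets (or, if strict, exceeds) the threshold."""
    if strict:
        return [name for name, pct in coverage.items() if pct > threshold]
    return [name for name, pct in coverage.items() if pct >= threshold]


basic = prototype(COVERAGE, 85)                 # "at least 85%" of the letters
fuller = prototype(COVERAGE, 50, strict=True)   # "over 50%" of the letters
print(basic)
print(fuller)
```

With these inputs, the 85% cut-off returns Move Types 2 and 3 with Elements C through F, and the 50% cut-off adds Move Types 5 and 6 and Elements A and B, matching the two prototypes described above.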
Table 3.12 Prototype direct mail letter representing move types and structural elements which occurred in 85% of the corpus.
Structural Element C: Mr./Mrs. Smith
Move Type 2: Now more than ever, inner city girls need your support to help their dreams become a reality. Each generation of girls faces new challenges: new technology, new moral issues, new opportunities. Inner City Girls experience a wide range of real life skills - first aid, resume writing, and managing money. They also reap benefits that are difficult to measure, including enhanced self-esteem, greater confidence in their abilities, and the strength and conviction to take the lead and excel in their endeavors. We start early. As a preventative, informal education program, Inner City Girls helps girls relate to others, develop values, contribute to their society, and develop their own potential. This results in reduced risk of teen pregnancy, suicide, truancy, substance abuse and so many other crises.
Move Type 3: Your gift to the 1997 Inner City Girls Annual Campaign helps to ensure that girls will continue to receive the benefits that Inner City Girls offers. Today's girls will be tomorrow's leaders and they are counting on you.
Structural Element D: Sincerely,
Structural Element E: (Signature)
Structural Element F: Sally Mentor, President, 1997 Inner City Girls Annual Campaign
Table 3.13 Prototype direct mail letter representing move types and structural elements which occurred in 50% of the corpus
Structural Element A: October 26, 2000
Structural Element B: Sam Q. Doe, 123 Street Dr., Somewhere, IN 46202
Structural Element C: Dear Sam,
Move Type 2: For many of the children and seniors that Help Your Neighbor cares for, the Holiday season can be a troubling time. Nearly every day HYN receives a call about a patient or family in need of home care who has limited financial resources. Calls for help from families that need the crisis services HYN provides for their children ring throughout the season. This is not the ringing that you and I traditionally picture during the holiday season.
Move Type 3: But there is something that you can do to help. With your gift of sharing, you are:
    * providing needed home care services to the most needy
    * giving emergency respite to families of children at risk for neglect or abuse
    * helping establish a Golden Touch program to provide companionship and homemaker services to homebound seniors.
Move Type 2: Help Your Neighbor has been a part of this community for over 85 years. Serving the needy has been an important part of our mission. Over the last ten years, HYN has delivered over $1 million worth of free services to the citizens of Somewhere.
Move Type 3: But we cannot do it alone. We need your help. A gift of sharing can bring comfort and hope to those most in need during this holiday season.
Move Type 5: Please use the enclosed envelope to make a contribution to help us ease the suffering and indeed ring in a most joyous holiday season.
Move Type 6: I thank you for your generous support.
Structural Element D: Sincerely,
Structural Element E: (Signature)
Structural Element F: Bob L. Brown, President & CEO
A third prototype might simply show what a direct mail letter would look like if it used each of the possible move types and structural elements that define this genre; Table 3.14 provides an example of such a letter. It should be pointed out, however, that most real-world direct-mail letters do not use all seven possible rhetorical move types and, in fact, only one letter in this corpus did.
Table 3.14 Prototype direct mail letter representing all possible move types and structural elements
Structural Element A: October 26, 2000
Move Type 1: Do all the good you can, by all the means you can, in all the ways you can, in all the places you can, at all the times you can, to all the people you can, as long as ever you can. John Wesley
Structural Element B: Sam Q. Doe, 123 Street Dr., Somewhere, IN 46202
Structural Element C: Dear Sam,
Move Type 2: Ebenhazer cares for at-risk children and families. We do this through a wide range of programs including community-based, therapeutic foster care, group homes and our treatment center. Many of the children are victims of abuse or live in unstable homes.
Move Type 3: This Christmas season we are asking you to take a few minutes to consider making a contribution to Ebenhazer to help the 1,500 children and families that we care for.
Move Type 2: Many of the children have no homes; no memories of joy from past holidays. Others are from families that are struggling to provide a healthy, happy environment but don't have the resources to make it possible.
Move Type 3: Your contribution will make a difference in a child's life. It may help a family stay together. It can certainly make happy holiday memories. A gift to Ebenhazer means the children in our care will have presents to open. A gift means a family will have a holiday meal, cooking utensils to prepare the meal and dishes to serve it on. Your gift will go beyond the holiday season. It can help purchase clothing, school supplies, books and educational tools throughout the year.
Move Type 5, Move Type 4: Please use the enclosed donation card and return envelope and mail your tax-deductible donation to Ebenhazer today.
Move Type 6: Thank you in advance for your gift.
Move Type 7: We wish you and your family a new year full of joy and love.
Structural Element D: Sincerely,
Structural Element E: (Signature)
Structural Element F: Mary Smith, Director
Move Type 3PS: P.S. Let our families and children know you want them to have the same kind of memories of the holidays you will have. Please give generously.
Move Type 6PS: Thank you for thinking of Ebenhazer this Christmas season.
Structural Element G: Enclosures
4 Linguistic analysis of moves: Tracking the use of stance structures
As introduced in Chapter 1, the goal of this book is to move beyond simply segmenting texts into well-defined discourse units (in this case, moves); the goal is also to analyze the linguistic characteristics of each individual discourse unit and each discourse unit type (i.e., the move types), to determine the typical linguistic characteristics of the units. Although they are defined in functional terms, moves are constructed from linguistic devices, including word choice, phrase types, and grammatical features (e.g., tense, aspect, voice). Many of these linguistic devices are used to express stance: "personal feelings, attitudes, value judgments, or assessments" (Biber et al., 1999, p. 966). Linguistic features used for these functions are especially important in direct-mail letters. There have been numerous studies of the linguistic mechanisms used by speakers and writers to convey their personal feelings and assessments, carried out under several different labels, including evaluation (Hunston, 1994; Hunston & Thompson, 2000), intensity (Labov, 1984), affect (Ochs, 1989), evidentiality (Chafe, 1986; Chafe & Nichols, 1986), hedging (Holmes, 1988; Hyland, 1996a, 1996b), persuasion (Hyland, 2004a), and stance (Barton, 1993; Beach & Anson, 1992; Biber, 2004, 2006a, 2006b; Biber & Finegan, 1988, 1989; Biber et al., 1999, Chapter 12; Conrad & Biber, 2000; Hyland, 1999b; Precht, 2000). In the present case, we adopt the framework of stance devices developed in Biber et al. (1999) and Biber (2006a, 2006b) to analyze the ways in which move types in direct-mail letters differ linguistically. Because non-profit direct mail letters are overtly persuasive in nature, there is little question that stance plays an important role in this genre. We are interested in how the use of stance structures (as opposed to other expressions of stance, like word choice) varies from move to move. We believe that identifying stance structures could be important in untangling the language structures used in direct-mail letters and could provide a better basis for describing the function of the different moves in the genre.
4.1 Identifying grammatical stance devices
According to Biber et al. (1999), the five most common grammatical devices used to express stance are: 1) stance adverbials, 2) stance complement clauses (specifically "that" and "to" clauses), 3) modals, 4) premodifying stance adverbs (e.g., "I'm so happy for you."), and 5) stance nouns followed by prepositional phrases. While Biber (2006a, 2006b) has previously analyzed the use of grammatical stance devices in specific registers (comparing spoken and written academic registers), this study seeks to compare and contrast the use of these stance devices across the move types within a single genre. Each move was automatically tagged using a grammatical tagger. While the tagging program, developed by Biber, identifies a wide variety of linguistic features (see Appendix Two), we focused here only on those grammatical devices that express stance. These features are given in Table 3A at the end of the chapter. The rate of occurrence for each stance feature within each move type was calculated. In the following discussion, we focus on only the stance features that occurred at least 3 times per 1,000 words.
4.2 Interpreting the use of grammatical stance devices used in moves
As expected, since each of the moves has a very different rhetorical function within this persuasion-motivated genre, the seven move types all use different combinations of grammatical stance devices. Table 3.15 provides a breakdown of the results by move, showing those stance devices that occurred at a rate of at least 3 per 1,000 words.
Table 3.15 Common grammatical stance devices by move type
Move Type / Stance structure occurring at least 3 times per 1,000 words / Rate per 1,000 words
Move 1: Get attention
    Stance Adverbials of Certainty: 7.9
    Modals of possibility/permission/ability: 7.0
    Modals of prediction/volition: 13.1
Move 2: Introduce cause/establish credentials
    NA
Move 3: Solicit response
    Modals of possibility/permission/ability: 12.1
    Modals of prediction/volition: 14.4
    To-complement clauses controlled by (all) stance verbs: 7.8
Move 4: Offer incentives
    Modals of possibility/permission/ability: 7.2
    Modals of prediction/volition: 19.7
Move 5: Reference insert; Move 6: Express gratitude; Move 7: Conclude w/ pleasantries (entries in source order)
    Modals of necessity/obligation: 3.4
    Modals of prediction/volition: 11.2
    To-complement clauses controlled by desire/intention/decision stance verbs: 4.4
    To-complement clauses controlled by all stance verbs: 5.6
    Pre-modifying stance adverbs: 3.0
    Stance Adverbials of Certainty: 5.6
    To-complement clauses controlled by desire/intention/decision stance verbs: 8.5
    Pre-modifying stance adverbs: 7.2
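The rates in Table 3.15 are normalized to 1,000 words so that moves of very different lengths can be compared. The sketch below shows that normalization together with the cut-off of 3 occurrences per 1,000 words. The raw counts and word total in the usage example are hypothetical, chosen only to exercise the arithmetic, and the function names are assumptions; the grammatical tagger that would supply real counts is not reproduced here.

```python
def rate_per_thousand(count, total_words):
    """Normalize a raw feature count to a rate per 1,000 words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 1000 * count / total_words


def frequent_features(counts, total_words, minimum=3.0):
    """Keep features whose normalized rate is at least `minimum` per 1,000 words."""
    rates = {name: rate_per_thousand(n, total_words) for name, n in counts.items()}
    # The cut-off is applied to the unrounded rate; rounding is for display only.
    return {name: round(rate, 1) for name, rate in rates.items() if rate >= minimum}


# Hypothetical counts for a single move type totalling 3,300 words.
hypothetical_counts = {
    "modals of prediction/volition": 66,
    "stance adverbials of certainty": 4,
}
print(frequent_features(hypothetical_counts, total_words=3300))
```

In this made-up case the first feature is retained at 20.0 per 1,000 words, while the second, at roughly 1.2 per 1,000 words, falls below the cut-off and is dropped.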
Table 3.15 provides the basis for interpreting how the different moves in this genre tend to use the different grammatical structures of stance in order to accomplish their rhetorical purpose. The purpose of Move Type 1 (Get Attention) is to engage the reader and get him/her interested in the cause/need being promoted. The move typically contains a quotation, story, or strong general pleasantries. The fairly strong reliance on modals of possibility/ability and modals of prediction serves to empower the reader and to show that the reader can make a difference. This can be seen in the following examples.
Modals of possibility/ability (italics added to show usage):
(7) You might hear some ugly talk this summer.
(8) YOU can be the one to open the door.
Modals of prediction (italics added to show usage):

(9) The urgency you feel to make changes is just the extent that change will be made.
(10) Until he extends the circle of his compassion to all living things, man will not himself find peace.
Stance adverbials of certainty, the other stance structure frequently used in Move Type 1, contribute to getting the reader's attention by underscoring the need.
Stance adverbials of certainty (italics added to show usage):
(11) Please send a million dollars so we can really support geological activities here at IUPUI in perpetuity.
(12) (quoting Margaret Mead) Never doubt that a small group of thoughtful, committed citizens can change the world, indeed it's the only thing that ever has.
Move Type 2, Introduce the cause and establish credentials, did not show especially frequent use of any specific stance structure. Looking at the letters in the corpus more carefully, it appears that this move is written in a more matter-of-fact manner. Unlike the other moves, the emphasis in this move is on content and facts (what the organizations do and what the needs are) rather than on personal feelings, attitudes, value judgments, or assessments. For example:
(13) The number of companies reporting a shortage of skilled workers almost doubled from 1995 to 1998; from 27 percent to more than 47 percent. Did you know that about 20 percent of America's workers have low basic skills and 75 percent of unemployed adults have reading or writing difficulties? Indy Reads is working to change that!
(14) In 1985 a group of courageous pioneering women established the YWCA of Indianapolis to meet the needs of women, and in 1998 the tradition continues. The YWCA of Indianapolis still focuses on, supports, and gives empowerment to women and their families. Empowerment refers to meeting the needs of girls and women so that they can freely exercise the power to determine and direct their lives.
Move Type 3, Solicit response, can incorporate one of two steps, either soliciting financial support or soliciting other response (a non-financial contribution from readers, such as volunteering to help). The stance structures most commonly used in Move Type 3 are modals of possibility and ability, modals of prediction and volition, and to-complement clauses controlled by stance verbs. Looking again more closely at the letters themselves, Move Type 3 frequently uses modals of possibility and ability in order to state the benefit of support for the reader. The modal can, indicating ability, is by far the most common (occurring 188 times); the modal may is the next most common (occurring 45 times), indicating possibility:
(15) You can help people reach their dreams of reading and learning by making a contribution to Indy Reads. (16) It may help a family stay together.
Modals of prediction and volition, on the other hand, were typically used to ask directly for a pledge or donation:
(17) Will you help them change? (18) We hope you will become a partner of Indy Reads
To-complement clauses controlled by stance verbs most frequently appear at the end of the move and play a role in making clear what it is the organization wants the reader to do in response to the letter.
(19) We are hopeful that you will agree to help. (20) If you have any questions or concerns at any time, please do not hesitate to call me. (21) When you are contacted by your Campus Campaign volunteer, we hope you'll choose to become one of the many partners in the community of IUPUI.
Move Type 4, Offer incentives, makes frequent use of modals of possibility, permission, and ability, but modals of prediction and volition are used at an extremely high rate of nearly 20 times per 1,000 words. These structures parallel those used in Move Types 1 and 3, but support very different rhetorical purposes. Modals of possibility/permission/ability typically are tied to offers of tangible incentives, as illustrated by examples (22) and (23).
(22) I hope we can include your name among the list of inaugural members of the 1994 Black Cane Society. (23) Based on each individual tax situation, your gift may be tax deductible
Modals of prediction and volition are also used to support reciprocal offers, including offers of tangible incentives (24, 25) and offers of intangible incentives (26).
(24) Corporate contributors will be acknowledged in our newsletter, annual report and on the Indy Reads webpage. (25) However, these tax credits are only available for a limited time, so we ask that you act soon if you would like to use them. (26) I am sure you will feel good about giving.
Move Type 5, Reference insert, uses only one grammatical stance structure consistently, but this structure, modals of necessity and obligation, is not used regularly by any other move. Looking at this structure in context, it is clearly used to direct readers' attention to materials included with the letter.
(27) For your convenience, I am enclosing a copy of Form CC-40, which should be filed with your Indiana State Income Tax.
What is most interesting about the regular use of this particular stance structure in this move is that it is very directive, explicitly telling the reader what s/he must, should, or ought to do. This is a rather surprising structure to see in a letter such as this, whose whole purpose is to persuade a reader to make a (usually financial) contribution; telling someone they have to do something (when they really don't) is usually not a successful persuasion tactic. Nevertheless, within Move 5, this stance structure does not come across as inappropriate, primarily because it is not part of the solicitation itself but points the reader to steps that will benefit him/herself, rather than the agency.
Move Type 6, Express gratitude, commonly uses modals of prediction and volition (28 and 29), as well as to-complement clauses controlled by stance verbs (30 and 31) to thank readers in advance for potential donations.
(28) Your check will be greatly appreciated. (29) I would like to thank you for your commitment to dental hygiene education. (30) I want to thank you for your help. (31) I want to express my gratitude to those of you who have already pledged or contributed in 1991.
The level of appreciation for the gift is frequently signaled in this move through the use of pre-modifying stance adverbs, as illustrated by the following two examples.
(32) Thank you so much for your help. (33) I can only hope that you know how appreciative we at the Indianapolis Zoo are of your philanthropy.
Move Type 7, Conclude with pleasantries, is the only move other than Move Type 1 to use stance adverbials of certainty. In this move, this structure always occurs in rather formulaic expressions, as shown by the following examples.
(34) May you be blessed, today and always, as you so generously share your blessing. (35) I am always happy to hear from you about your accomplishments.
Move Type 7 also has the highest rate of to-complement clauses controlled by desire/intention/decision stance verbs, but like the adverbials of certainty, these all, without exception, occur in short, formulaic structures that tend to end the letter.
(36) I hope to see you there. (37) I hope to hear from you soon.
Lastly, just as Move Type 6 uses pre-modifying stance adverbials for emphasis, Move Type 7 uses this structure in the same way. In fact, the only pre-modifying adverb that is commonly used in this move is the adverb so before an adjective: so great, so closely, so generously.
In sum, all the move types in the fundraising letters, with the exception of Move Type 2, frequently use one or more grammatical stance devices, and the combinations of grammatical structures used are distinctive, with no two moves using the same set of structures. Our results suggest that different moves use somewhat different stance structures, which supports the need to teach different strategies for different moves.
Overall, however, the results were unexpected in showing a rather limited use of stance structures. Modals of possibility/ability and prediction were used along with to-complement clauses. Missing, however, were many stance features that are typically considered part of persuasive discourse, e.g., modals of obligation, stance adverbials, and premodifying stance adverbs. The lack of variety in the stance use suggests a discourse that treads carefully, does not take strong positions, and does not put strong demands on the reader. A previous study using Biber's multi-dimensional features (Connor & Upton, 2003) had suggested that fundraising letters as a genre are similar to academic prose, a finding which was unexpected. The current study further supports this general finding: both genres use a limited range of grammatical stance features, restricted primarily to modal verbs and to-complement clauses (compare the findings here to those reported in Biber 2006a,b for academic prose). Thus, despite their apparent differences in communicative purpose, we see here that these two genres are surprisingly similar in the kinds of stance expressed and the particular linguistic devices used for these functions.
5 Final thoughts
One goal of this chapter was to outline a general approach that can be used to identify and analyze the moves of a genre in a corpus of texts, and to provide a specific and detailed example of how this type of analysis can be done. As noted in Chapter 2, a move analysis seeks to identify the components (moves) of a genre by the communicative purposes they serve. These communicative purposes must be identified within the context of the genre as well as the social context in which the genre resides (e.g., fundraising direct mail relationships). The question that we are seeking to answer with this type of analysis is: What are the rhetorical structures that address the specific purposes of the genre, and if these vary, how so? Because we are seeking to understand why a genre is structured the way it is, and because it is important to account for the socio-cultural, institutional, and organizational influences on a specific genre, there is naturally a subjective element to this sort of analysis. However, despite its subjective nature, certain guidelines can be followed that enable an empirical analysis of moves in a corpus of texts (see also Chapter 2). First, extreme care must be taken to collect good data. In the present case, the corpus of fundraising discourse was well planned (involving the input of both fundraising practitioners and linguists), carefully documented, and large enough to provide reliable results. Then, a series of pilot studies was run with a research team, first to develop a working set of genre-specific move types with distinct definitions, and then to confirm the inter-rater reliability of using these moves to analyze the individual texts in the corpus.
Once the move types were clearly defined and all of the documents in the data set coded (and checked by multiple raters), the next step was to look for patterns in how and when the different moves were used in order to help explain the specific role of each move more broadly within the genre. The goal was to have a full understanding of the communicative purposes and functions that different parts of the genre have and how they work together to accomplish the overall communicative aim of the genre. Nevertheless, although a move analysis uses communicative function as the starting point for understanding the rhetorical purposes of a genre, the expectation is that these distinct functions are realized through the use of distinct and consistent linguistic features. Consequently, it should be possible to see variation in linguistic patterns from one move to the next.
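The inter-rater reliability check mentioned above can be illustrated with a short sketch. The chapter does not say which agreement statistic was used, so Cohen's kappa is an assumption here, and the move labels and the two coders' codes are hypothetical data invented for the example.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assigned one move label per segment."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty list of segments")
    n = len(rater_a)
    # Proportion of segments on which the two raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label distribution.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for six text segments (not data from the study).
coder_1 = ["M1", "M2", "M3", "M3", "M6", "M7"]
coder_2 = ["M1", "M2", "M3", "M4", "M6", "M7"]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 2))
```

A value near 1 indicates that the move definitions are being applied consistently; what counts as acceptable agreement is a decision for the research team.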
The second and more important goal of this chapter for the purposes of this book was to show the contribution that a corpus-based approach could make in the analysis of discourse structure (e.g., move structure). By analyzing generic moves in a fairly large specialized corpus of direct mail letters collected from multiple non-profit organizations of various types (e.g., environmental, education), we are able to generalize the findings and develop representative prototypes that can be used for exemplification and training. It then becomes possible to authoritatively compare and contrast the discourse structures of different types of texts in order to gain a clearer understanding of how each uniquely accomplishes its communicative purposes. In addition, using a corpus of texts to analyze discourse structures makes it much easier to identify alternate ways (steps) for accomplishing common functions (moves); such variations can be easily missed or misinterpreted when looking at individual texts. In the same vein, a corpus-based analysis makes it easy to identify which moves (and steps) are more common, even required, and which are optional or idiosyncratic and can be used at the discretion of the writer without the reader feeling the text is non-standard or inappropriate. In a more detailed analysis, a corpus-based approach will even permit generalizations about where different discourse structures occur within a typical text, and where they occur relative to other structures (i.e., before, after, or within). A corpus-based analysis also allows for the detailed analysis of the linguistic characteristics of the discourse units that have been identified. In the discourse analysis done in this chapter, the corpus-based approach allowed us to make detailed observations about the specific grammatical stance devices that each of the different move types used to accomplish their unique functions in the genre. 
Further analysis would likely reveal other linguistic differences among these move types.
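The point made above about identifying which moves are common and which are optional can be sketched as a simple count over coded texts. This is only an illustration: the coded letters are hypothetical, and the 0.6 cut-off separating "conventional" from "optional" moves is an assumed threshold, not one reported in this chapter.

```python
from collections import Counter


def move_coverage(coded_texts):
    """Share of texts in which each move label occurs at least once."""
    counts = Counter()
    for moves in coded_texts:
        counts.update(set(moves))  # count each move once per text
    total = len(coded_texts)
    return {move: counts[move] / total for move in sorted(counts)}


def classify(coverage, threshold=0.6):
    """Label each move by how widely it is used (threshold is an assumption)."""
    return {
        move: "conventional" if share >= threshold else "optional"
        for move, share in coverage.items()
    }


# Hypothetical move sequences for four letters (not data from the study).
letters = [
    ["M1", "M2", "M3", "M6"],
    ["M1", "M3", "M4", "M3"],
    ["M2", "M3", "M7"],
    ["M1", "M3", "M6"],
]
coverage = move_coverage(letters)
labels = classify(coverage)
print(coverage)
print(labels)
```

Counting each move once per text keeps a letter that repeats a move from inflating that move's apparent prevalence.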
Table 3A Grammatical devices used to express stance
1. Stance adverb(ial)s (See Biber et al., 1999, pp. 557-558, 853-874)
Expressing Certainty: actually, always, certainly, definitely, indeed, inevitably, in fact, never, of course, obviously, really, undoubtedly, without doubt, no doubt
Expressing Likelihood: apparently, evidently, kind of, most cases, most instances, perhaps, possibly, predictably, probably, roughly, sort of, maybe
Expressing Attitude: amazingly, astonishingly, conveniently, curiously, disturbingly, hopefully, even worse, fortunately, importantly, ironically, regrettably, rightly, sadly, sensibly, surprisingly, unbelievably, unfortunately, wisely
Expressing Style: accordingly, according to, confidentially, figuratively, frankly, generally, honestly, mainly, strictly, technically, truthfully, typically, reportedly, primarily, usually

2. Complement clauses controlled by stance verbs, adjectives, or nouns

2.1 Stance verb + that-clause (See Biber et al., 1999, pp. 661-670)
Verbs Expressing Certainty: acknowledge, affirm, ascertain, calculate, certify, check, conclude, confirm, decide, deem, demonstrate, determine, discover, find, know, learn, mean, meant, meaning, note, notice, observe, prove, realize, recall, recognize, recollect, record, remember, see, show, signify, submit, testify, understand
Verbs Expressing Likelihood: appear, assume, believe, bet, conceive, consider, deduce, detect, doubt, estimate, figure, gather, guess, hypothesize, imagine, indicate, intend, perceive, postulate, predict, presuppose, presume, reckon, seem, sense, speculate, suppose, suspect, think, wager
Verbs Expressing Attitude: accept, admit, agree, anticipate, boast, complain, concede, cry, dream, ensure, expect, fancy, fear, feel, forget, foresee, guarantee, hope, mind, prefer, pretend, reflect, require, resolve, trust, wish, worry
Verbs Expressing Speech Act (and other communication verbs): add, announce, advise, answer, argue, allege, ask, assert, assure, charge, claim, confide, confess, contend, convey, convince, declare, demand, deny, emphasize, explain, express, forewarn, grant, hear, hint, hold, imply, inform, insist, maintain, mention, mutter, notify, order, persuade, petition, phone, pray, proclaim, promise, propose, protest, reassure, recommend, remark, reply, report, respond, reveal, say, shout, state, stress, suggest, swear, sworn, teach, telephone, tell, urge, vow, warn, whisper, wire, write

2.2 Stance verb + to-clause (See Biber et al., 1999, pp. 693-715)
Verbs Expressing Probability (likelihood): appear, happen, seem, tend
Verbs Expressing Cognition/perception: assume, believe, consider, estimate, expect, felt, find, forget, hear, imagine, judge, know, learn, presume, pretend, remember, see, suppose, take, trust, understand, watch
Verbs Expressing Desire/Intention/Decision: aim, agree, bear, care, choose, consent, dare, decide, design, desire, dread, hate, hesitate, hope, intend, like, look, love, long, mean, need, plan, prefer, prepare, refuse, regret, resolve, schedule, stand, threaten, volunteer, wait, want, wish
Verbs Expressing Causation/Modality/Effort: afford, allow, appoint, arrange, assist, attempt, authorize, bother, cause, counsel, compel, defy, deserve, drive, elect, enable, encourage, endeavor, entitle, fail, forbid, force, get, help, inspire, instruct, lead, leave, manage, oblige, order, permit, persuade, prompt, require, raise, seek, strive, struggle, summon, tempt, try, venture
Verbs Expressing Speech Act (and other communication verbs): ask, advise, beg, beseech, call, claim, challenge, command, convince, decline, heard, invite, offer, pray, promise, prove, remind, report, request, say, said, show, teach, tell, urge, warn

2.3 Stance adjective + that-clause (See Biber et al., 1999, pp. 671-674; many of these occur with extraposed constructions)
Adjectives Expressing Certainty: accepted, apparent, certain, clear, confident, convinced, correct, evident, false, impossible, inevitable, obvious, positive, proved, plain, right, sure, true, well-known
Adjectives Expressing Likelihood: doubtful, likely, possible, probable, unlikely
Adjectives Expressing Attitude/Emotion: adamant, afraid, alarmed, amazed, amused, angry, annoyed, astonished, aware, careful, concerned, curious, depressed, disappointed, dissatisfied, distressed, disturbed, encouraged, frightened, glad, grateful, happy, hopeful, hurt, irritated, mad, pleased, reassured, relieved, sad, satisfied, shocked, surprised, thankful, unaware, uncomfortable, unhappy, unlucky, upset, worried
Adjectives Expressing Evaluation: acceptable, advisable, amazing, annoying, anomalous, appropriate, awful, conceivable, critical, crucial, desirable, dreadful, embarrassing, essential, extraordinary, fitting, fortunate, funny, good, great, horrible, imperative, incidental, inconceivable, incredible, indisputable, interesting, ironic, lucky, natural, neat, necessary, nice, notable, noteworthy, noticeable, obligatory, odd, okay, paradoxical, peculiar, preferable, ridiculous, sensible, shocking, silly, sorry, strange, stupid, sufficient, surprising, tragic, typical, unacceptable, unaware, uncomfortable, understandable, unfair, unfortunate, unthinkable, untypical, unusual, upsetting, vital, wonderful

2.4 Stance adjective + to-clause (See Biber et al., 1999, pp. 716-721; many of these occur with extraposed constructions)
Adjectives Expressing Certainty/Likelihood: apt, certain, due, guaranteed, liable, likely, prone, unlikely, sure
Adjectives Expressing Attitude/Emotion: afraid, amazed, angry, annoyed, ashamed, astonished, concerned, content, curious, delighted, disappointed, disgusted, embarrassed, free, furious, glad, grateful, happy, impatient, indignant, nervous, perturbed, pleased, proud, puzzled, relieved, sorry, surprised, worried
Adjectives Expressing Evaluation: awkward, appropriate, bad, best, better, brave, careless, convenient, crazy, criminal, cumbersome, desirable, dreadful, essential, expensive, foolhardy, fruitless, good, important, improper, inappropriate, interesting, logical, lucky, mad, necessary, nice, reasonable, right, safe, sick, silly, smart, stupid, surprising, useful, useless, unreasonable, unseemly, unwise, vital, wise, wonderful, worse, wrong
Adjectives Expressing Ability/Willingness: able, anxious, bound, careful, competent, determined, disposed, doomed, eager, eligible, fit, greedy, hesitant, inclined, insufficient, keen, loath, obliged, prepared, quick, ready, reluctant, set, slow, sufficient, unable, unwilling, welcome, willing
Adjectives Expressing Ease or Difficulty: difficult, easier, easy, hard, impossible, pleasant, possible, tough, unpleasant

2.5 Stance noun + that-clause (See Biber et al., 1999, pp. 648-651)
Nouns Expressing Certainty: assertion, conclusion, conviction, discovery, doubt, fact, knowledge, observation, principle, realization, result, statement
Nouns Expressing Likelihood: assumption, belief, claim, contention, expectation, feeling, hypothesis, idea, implication, impression, indication, notion, opinion, possibility, presumption, probability, rumor, sign, suggestion, suspicion, thesis
Nouns Expressing Attitude/Perspective: grounds, hope, reason, view, thought
Nouns Expressing Communication: comment, news, proposal, proposition, remark, report, requirement

2.6 Stance noun + to-clause (See Biber et al., 1999, pp. 652-653)
agreement, authority, commitment, confidence, decision, desire, determination, duty, failure, inclination, intention, obligation, opportunity, plan, potential, promise, proposal, readiness, reluctance, responsibility, right, scheme, temptation, tendency, threat, wish, willingness

3. Modal and semi-modal verbs (See Biber et al., 1999, pp. 483ff.)
Modals Expressing Possibility/Permission/Ability: can, could, may, might
Modals Expressing Necessity/Obligation: must, should, (had) better, have to, got to, ought to
Modals Expressing Prediction/Volition: will, would, shall, be going to

4. Premodifying stance adverb (stance adverb + adjective or noun phrase)
Most common premodifying adverbs (See Biber et al., 1999, pp. 544ff.):
Adverbials + adjectives (It was perfectly quiet.): awfully, completely, extremely, how, perfectly, quite, really, slightly, so, totally, very
Adverbials + nouns (It is almost time; It was quite a surprise.): about, almost, completely, quite, really

5. Stance noun + prepositional phrase (of + NP or for + NP) (See 2.5 and 2.6 for list of stance nouns used.)
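As a rough illustration of how word lists like those in section 3 of Table 3A can be turned into the per-1,000-word rates reported in this chapter, the sketch below counts modal forms in a text by simple string matching. It is not the tagging procedure used in the study: the function name, the regular-expression tokenizer, and the sample text are assumptions, and plain matching cannot separate modal may from the month May, nor will it catch contracted forms such as you'll.

```python
import re

# Modal classes from section 3 of Table 3A. "had better" and "going to" are
# simplified search strings standing in for "(had) better" and "be going to".
MODAL_CLASSES = {
    "possibility/permission/ability": ["can", "could", "may", "might"],
    "necessity/obligation": ["must", "should", "had better", "have to", "got to", "ought to"],
    "prediction/volition": ["will", "would", "shall", "going to"],
}


def rates_per_thousand(text):
    """Occurrences of each modal class per 1,000 running words of text."""
    lowered = text.lower()
    n_words = len(re.findall(r"[a-z']+", lowered))
    rates = {}
    for label, forms in MODAL_CLASSES.items():
        hits = sum(
            len(re.findall(r"\b" + re.escape(form) + r"\b", lowered))
            for form in forms
        )
        rates[label] = 1000 * hits / n_words if n_words else 0.0
    return rates


# Sample assembled from fragments of examples (15), (16), and (18).
sample = "You can help. It may help a family. We hope you will become a partner."
rates = rates_per_thousand(sample)
print(rates)
```

Normalizing to a common base of 1,000 words is what allows moves of very different lengths to be compared on the same scale.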
chapter 4
Rhetorical moves in biochemistry research articles

BY Budsaba Kanoksilapatham
The study described in this chapter provides another example of the powerful descriptive nature of a corpus-based, top-down approach to discourse analysis.¹ Unlike previous move-based studies of research articles, this is the first study to undertake a comprehensive coding of all the moves in a fairly large corpus that represents all four sections (introduction, methods, results, discussion), followed by an analysis of the linguistic structures that make up those moves. In keeping with the steps introduced in Table 1.1, the first steps in the study (after the compilation of a representative corpus) were to identify the rhetorical move types used in biochemistry research articles, segment the texts into moves, and then identify the specific move type each represents. Then, following steps 4-5 described in Table 1.1, multidimensional analysis (see Appendix 1) was used to identify the linguistic characteristics of each rhetorical move, and to analyze the typical linguistic characteristics of each move type. Finally, the typical discourse organization of research article sections is analyzed in terms of these move types. The integration of move analysis and multidimensional analysis provides us with a comprehensive communicative and linguistic description of the discourse of biochemistry research articles, underscoring the value of a corpus-based approach.
Background
Previous move-based studies of scientific research articles have provided valuable insights regarding the rhetorical moves conventionally employed in each of the four internal sections (introduction, methods, results, discussion; see Chapter 2). Discipline-specific variations are also discernible (e.g., Anthony, 1999; Brett, 1994; Chu, 1996; Dubois, 1997; Naczi, Reznicek, & Ford, 1998; Swales & Luebs, 2002; D. Thompson, 1993), suggesting that the rhetorical organization of research articles is constrained by conventions of the academic disciplines and by the expectations of the relevant discourse communities.
However, the findings generated by these studies must be treated with caution. First, many of these studies do not analyze a representative corpus of the discipline they studied. Sampling problems include reliance on experts' subjective recommendations of the journals analyzed (e.g., Nwogu, 1997; Posteguillo, 1999), which reflect individual preferences rather than the actual academic prestige of the journals. Other sampling problems include lack of specified criteria for article selection (e.g., Swales & Najjar, 1987; Williams, 1999), mixture of different genres (such as clinical reports and experimental articles) in the same corpus (e.g., Williams, 1999), and non-compatibility of journals (e.g., specialized journals and interdisciplinary journals in the same corpus in Berkenkotter & Huckin, 1995). In addition, most previous studies have focused on a single section of articles, rather than the overall organization of research articles across all four sections (introduction, methods, results, discussion). The unsystematic and subjective selection of research articles investigated, and the mixed and unrepresentative nature of the corpus, preclude valid generalizations about the rhetorical organization of the target genre. Perhaps more importantly, previous research has been limited by its exclusive focus on rhetorical moves, with little or no attention given to the lexico-grammatical characteristics of moves. For this reason, we know little at present about the typical linguistic characteristics of the different move types that comprise research articles.
1. The material presented in this chapter is based upon dissertation research supported by the National Science Foundation, USA, under Grant No. 0213948 and a TOEFL Grant for Doctoral Research. Part of this material has been previously published in English for Specific Purposes, 24(3), 269-292.
The study presented in this chapter is unique in several ways.
First, it analyzes the discourse structure of all four sections (introduction, methods, results, discussion) in scientific research articles. Prior to this study, Nwogu's (1997) study was the only one that described moves in all four sections of research articles, based on analysis of 15 medical articles that had been recommended by medical practitioners. While a useful initial analysis, that study was still quite restricted in scope. In the present study, based on Swales' (1990, 2004) framework,² 60 biochemistry research articles (which were systematically collected as part of a representative corpus) were first analyzed for move structure. This qualitative approach was then complemented by quantitative analysis of specific linguistic characteristics of each move type. Some previous grammatical-rhetorical studies have described the functions of individual linguistic features in research articles (e.g., Salager-Meyer, 1997, on hedging; D. Thompson & Ye, 1991, on reporting verbs). In general, though, these studies do not document linguistic differences across research article sections, and no study to date has attempted to describe systematic linguistic differences among the move types within research article sections. In contrast, the present study undertakes a detailed linguistic description of each move type. This description incorporates analysis of 41 distinct linguistic features, a large set of features made possible by corpus-based techniques (including multidimensional analysis). Combining the strengths of both qualitative and quantitative corpus analysis tools, this study illustrates a novel and successful application of multidimensional analysis for top-down discourse analysis: to systematically identify the linguistic features associated with each move type (representing different communicative purposes) and to provide a more comprehensive description of rhetorical organization in research articles than has been previously feasible.
2. Swales' original 1990 model was revised in 2004 (230, 232) and consists of three moves. Move 1: Establishing a topic is realized by topic generalizations of increasing specificity. Move 2: Preparing for the present study (citations possible) is realized as Step 1A: Indicating a gap or Step 1B: Adding to what is known, and Step 2: Presenting positive justification. Move 3: Presenting the present work is realized by up to seven steps. Step 1: Announcing present research descriptively and/or purposively, Step 2: Presenting research questions or hypotheses, Step 3: Definitional clarifications, Step 4: Summarizing methods, Step 5: Announcing principal outcomes, Step 6: Stating the value of the present research, Step 7: Outlining the structure of the paper.
2 Description of the corpus
The term biochemistry was first introduced in 1903, but this field has mushroomed so that it is now represented by 261 specialized journals published worldwide (Journal Citation Reports, 2004). As a result, it is no easy task to build a representative corpus of research articles from this academic discipline, ensuring that the articles contained in the corpus truly represent the range of research articles in biochemistry. To control for possible differences among national varieties of English and across time, only journals published in the United States in the year 2000 were considered. The corpus was further restricted to research articles from the five most prestigious scientific journals in biochemistry (determined by their impact factors³): Cell (C), Molecular Cell (MC), Molecular and Cellular Biology (MCB), Journal of Biological Chemistry (JBC), and Molecular Biology of the Cell (MBC). From these five journals, 60 articles (12 from each journal) were randomly selected, evenly distributed over all the issues of each journal for the year 2000. These articles all have four distinct sections (Introduction, Methods, Results, and Discussion). The total corpus size is about 320,000 running words.
3. The impact factor is the average number of times that articles published in a specific journal in the two previous years were cited in a particular year. This figure is useful in evaluating a journal's relative importance, especially when a comparison is made to other journals in the same field.
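The sampling design described above (12 articles from each of five journals, 60 in total) can be sketched as a stratified random draw. This is a simplified illustration under stated assumptions: the article identifiers and the catalog of 30 articles per journal are hypothetical, the fixed seed is there only to make the draw repeatable, and the sketch does not reproduce the even distribution over journal issues that the study also applied.

```python
import random

# Journal abbreviations as given in the corpus description.
JOURNALS = ["C", "MC", "MCB", "JBC", "MBC"]


def sample_articles(catalog, per_journal=12, seed=0):
    """Draw the same number of articles at random from each journal."""
    rng = random.Random(seed)  # fixed seed so the selection can be repeated
    sample = {}
    for journal in JOURNALS:
        articles = catalog[journal]
        if len(articles) < per_journal:
            raise ValueError(f"{journal} has fewer than {per_journal} articles")
        sample[journal] = sorted(rng.sample(articles, per_journal))
    return sample


# Hypothetical catalog: 30 placeholder article IDs per journal.
catalog = {j: [f"{j}-{i:03d}" for i in range(1, 31)] for j in JOURNALS}
sample = sample_articles(catalog)
print(sum(len(ids) for ids in sample.values()))
```

Drawing an equal number from each journal keeps any one journal's house style from dominating the corpus.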
3 Determining the move categories in the genre of biochemistry research articles
The first step in the analysis here was to identify the move types that can occur in each section of biochemistry research articles. This task was made easier because I was able to build on the numerous previous studies that have identified move types in research articles from different academic disciplines: Anthony (1999), Chu (1996), Crookes (1986), and Samraj (2002) on the move types in Introductions; Swales & Luebs (2002) and Wood (1982) on the move types in Methods sections; D. Thompson (1993) and Williams (1999) on the move types in Results sections; and Dubois (1997) on the move types in Discussion sections. Considering the findings from these studies, together with my own detailed analyses of biochemistry research articles, I identified 15 move types that can occur in these texts. Several of these move types can consist of multiple sub-parts, referred to as steps. Table 4.1 summarizes the overall framework.
Table 4.1 Model of move structure in biochemistry research articles
INTRODUCTION
Move 1: Establishing a topic
Move 2: Preparing for the present study: Indicating a gap/raising a question
Move 3: Introducing the present study
  Step 1: Stating purpose(s)
  Step 2: Describing procedures
  Step 3: Presenting findings

METHODS
Move 4: Describing materials
  Step 1: Listing materials
  Step 2: Detailing the source of the materials
  Step 3: Providing the background of the materials
Move 5: Describing experimental procedures
  Step 1: Documenting established procedures
  Step 2: Detailing procedures
  Step 3: Providing the background of the procedures
Move 6: Detailing equipment
Move 7: Describing statistical procedures

RESULTS
Move 8: Restating methodological issues
  Step 1: Describing aims and purposes
  Step 2: Stating research questions
  Step 3: Making hypotheses
  Step 4: Listing procedures or methodological techniques
Move 9: Justifying methodological issues
Move 10: Announcing results
  Step 1: Reporting results
  Step 2: Substantiating results
  Step 3: Invalidating results
Move 11: Commenting results
  Step 1: Explaining results
  Step 2: Generalizing/interpreting results
  Step 3: Evaluating results
  Step 4: Stating limitations
  Step 5: Summarizing

DISCUSSION
Move 12: Contextualizing the study
  Step 1: Describing established knowledge
  Step 2: Generalizing, claiming, deducing previous knowledge
Move 13: Consolidating results
  Step 1: Restating methodology (purposes, research questions, hypotheses, and procedures)
  Step 2: Stating selected findings
  Step 3: Referring to previous literature
  Step 4: Explaining differences in findings
  Step 5: Making overt claims/generalizations
  Step 6: Exemplifying
Move 14: Stating limitations of the study
Move 15: Suggesting further research
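For readers who want to work with the framework in Table 4.1 computationally, the move inventory can be held in a small lookup structure. The sketch below is one possible representation, not part of the original study: the function names are hypothetical, and the check simply flags move codes that fall outside the inventory listed for a section so that a coder can review them.

```python
# Move numbers grouped by article section, following Table 4.1.
MOVES_BY_SECTION = {
    "Introduction": [1, 2, 3],
    "Methods": [4, 5, 6, 7],
    "Results": [8, 9, 10, 11],
    "Discussion": [12, 13, 14, 15],
}


def section_of(move):
    """Return the section whose inventory in Table 4.1 lists this move number."""
    for section, moves in MOVES_BY_SECTION.items():
        if move in moves:
            return section
    raise ValueError(f"unknown move number: {move}")


def misplaced_moves(section, coded_moves):
    """Move codes in a coded section that Table 4.1 lists under another section."""
    allowed = set(MOVES_BY_SECTION[section])
    return [move for move in coded_moves if move not in allowed]


# Hypothetical coding of one Methods section, with a stray Results move.
flagged = misplaced_moves("Methods", [4, 5, 10, 5])
print(section_of(10), flagged)
```

A flagged code is not necessarily an error; it only marks a segment whose label does not match the section inventory and may deserve a second look.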
The following sections describe the individual move types in each section and their constituent steps.
3.1 The introduction section
Move 1: Establishing a topic assures that the topic is worth investigating and the field is well established. Move 1 also reports previous research deemed relevant to the topic being discussed. Move 1 usually begins the Introduction section, consisting of topical statements of increasing specificity:
General Move 1 statement:
(1) Cell-cell adhesion is critical for tissues and organs. [C9]
Specific Move 1 statement:

(2) These modifications promote plasma membrane association and facilitate high-affinity protein-protein interactions (REFERENCE). [MBC3]
Move 2: Preparing for the present study focuses on weaknesses in the existing literature and/or unaddressed research questions. Move 2 in biochemistry establishes a niche in previous research by the step of either indicating a gap or raising a question, as shown in (3) and (4).
(3) Although these and other important roles of U2 snRNP are well known, the critical issue of has not yet been determined. [MC5]
(4) , but it is not known whether they associate specifically with AJs. [C1]
Move 3: Introducing the present study is realized by three steps in this genre. Step 1: Stating purpose(s) explicitly announces the purpose(s) of the study:
(5) It was undertaken to examine in detail and to try to understand . [MCB3]
Step 2: Describing procedures focuses on the principal features of the study:

(6) We therefore investigated AJ formation in primary keratinocytes . [C1]
Step 3: Presenting findings announces the major findings of the study:

(7) Our results show that U2snRNP is associated with the E complex . [MC5]
3.2 The methods section
The methods section has four move types. Move 4: Describing materials covers a wide range of materials used in biochemistry experiments, from natural substances, human/animal organs or tissues, to chemicals. Move 4 can be realized as three variations. Step 1: Listing materials explicitly itemizes materials or substances used:
(8) Bacterial strains used in this study are listed in Table 3. [C8]
Step 2: Detailing the source of materials identifies how these items are obtained, such as by purchase, as a gift, etc.:
(9) COS-7 cells were obtained from S.Brandt . [MCB4]
Step 3: Providing the background of the materials includes the description, properties, or characteristics of the materials:
(10) All strains have GAL upstream activating sequence-regulated PGK1pG and MFA2pG genes, (REFERENCE). [MCB11]
Move 5: Describing experimental procedures has three variations or steps. Step 1: Documenting established procedures recounts established experimental processes commonly known to biochemistry researchers:
(11) Chromatin binding assays were performed as previously described (REFERENCE). [MC4]
Step 2: Detailing procedures provides a detailed description of less typical procedures, so that later studies can replicate them:
(12) To obtain polyclonal antibodies , mice and rabbits were immunized . [MBC9]
Step 3: Providing the background of the procedures justifies the choice of technique or procedure:
(13) Complete details of all constructions will be provided upon request. [JBC10]
Move 6: Detailing equipment (14) and Move 7: Describing statistical procedures (15) both occur infrequently in this genre:
(14) Images were recorded through a Hamamatsu C-2400 New vicon camera using a 10 x objective and brightfield optics. Video images were digitized at a rate of 6 frames/min as described above. [MBC8]
(15) The data were fitted to the Michaelis-Menten Equation 1 by using a non-linear least squares approach and the kinetic constants ± S.E. [JBC7]
3.3 The results section
The results section also has four move types: Move 8: Restating methodological issues focuses on how the data of the study have been produced. This move is realized by one or more of four steps. Step 1: Describing aims and purposes:
(16) To examine the kinetics , we first plated keratinocytes . [C1]
Step 2: Stating research questions:

(17) To determine whether these GTPases participate in the phagocytosis of P. aeruginosa, we expressed . [JBC1]
Step 3: Making hypotheses:

(18) Mondo A and Mlx heterodimerize are predicted to bind CACGTG E-box sequences. [MCB12]
Step 4: Listing procedures or methodological techniques:

(19) (To determine whether ,) P19 cytoplasmic extracts were incubated . Retention of MondoA Mlx heterodimers on the DNA beads was determined by Western blotting. [MCB12]
Move 9: Justifying procedures or methodology reveals what determines the scientists' decision to opt for particular experimental methods, procedures, or techniques. This move can be expressed by referring to previous research.
(20) (DKO4 cells were used), in which mutant Ras had been detected homologous recombination (REFERENCE) and a conditionally active Raf allele (EGFPRaf-1: ER) was stably expressed in these cells (REFERENCE). [C10]
Move 10: Announcing results is a crucial move of the Results section and is realized by three steps. The first step reports major findings, whereas the second step persuades the respective discourse community to consider the finding as a part of consensual knowledge. The third step highlights the novelty produced by the study that might be worth further investigation. Step 1: Reporting results:
(21) Data is shown for Pse1ECFP/Nic96EYFP and Pse1ECFP/Nup188EYFP (Figure 3). [MC1]
Step 2: Substantiating results:

(22) Similar results were obtained. [MC1]
Step 3: Invalidating results:

(23) (Full length VASP-GFP localized to adhesion zippers (Figures 6A-6D). This was true in the majority of transfected cells .) In contrast, TD-GFP interfered with formation of adhesion (Figures 6E-6H). [C1]
Move 11: Commenting on the results is one place where scientists not only report but also comment on the results. Excerpts (24–28) illustrate the five steps of Move 11: Step 1: Explaining results:
(24) We presume that the localization of GFP-tagged Ste18p is representative of native Ste18p because the wild-type fusion protein rescues mating in a ste18 strain. [MBC3]
Step 2: Generalizing/interpreting results:

(25) These results suggest that proteolysis of c-Myc is proteasome dependent. [MCB4]
Step 3: Evaluating results:

(26) The strong exacerbation of the phenotype of fun12 (1915).. and the lack of any effect in tif34 support our conclusion that eIF5B and eIF1A functionally interact during translation initiation. Moreover, the toxicity is consistent with the model that release of eIF1A and eIF5B from 80S initiation complexes is coupled. [MCB10]
Step 4: Stating limitations:

(27) The molecular mechanisms are unknown. It is therefore difficult to propose an explicit model to explain why telomeres become longer . [MC10]
Step 5: Summarizing:
(28) Together, these results demonstrate that reg A- cells are capable of assessing the direction of a spatial gradient of cAMP . [MBC8]
3.4 The discussion section
The discussion section also comprises four possible move types: Move 12: Contextualizing the study has two distinct steps. Step 1: Describing established knowledge cites or reports related previous research or established knowledge of the topic that is crucial in understanding what is being presented:
(29) Conventional kinesin has long been suspected of being a vesicle motor. Initially, this stemmed from its discovery in axoplasm (REFERENCE), which is rich in Golgi-derived transport vesicles, and its co-localization with vesicles in cultured cells (REFERENCE). [MBC8]
Step 2: Generalizing, claiming, deducing previous knowledge describes how the findings relate to the results of previous research:
(30) The observation that BAD is inactivated by phosphorylation at Ser-155 has important implications for the understanding of the regulation of Bcl-2 family members. [MC7]
Move 13: Consolidating results highlights the strengths of the study and defends its importance. This move is realized through six steps: Step 1: Restating methodology:
(31) In this study, we exploited primary culture to examine the impact that elevated K16 protein level has on a number of basic properties of skin keratinocytes. [MBC10]
Step 2: Stating selected findings:

(32) We show that the essential Gpi11 and Gpi13 proteins are involved in late stages in the formation of the yeast GPIs, and we identify and characterize three new candidates GPI precursors. [MBC5]
Step 3: Referring to previous literature for comparison:

(33) The experiments presented here confirm the previously reported data (REFERENCE), showing that . [JBC4]
Step 4: Explaining differences in findings:

(34) The advantages the Ku-X4-LIV complex confers upon ligation in vitro can therefore explain why these factors are required for cellular end joining: ligation is fast and efficient, even at low enzyme and in the presence of unbroken DNA. [MBC5]
Step 5: Making overt claims/generalizations:

(35) (Simply changing the CaaX motif to a form recognized by Ftase significantly improved mGBP1 modification.) This result also indicates that the CaaX motif is not likely to be buried within the structure of the protein, . [MBC7]
Step 6: Exemplifying:
(36) (Within the G88R RNase A variants, cytotoxicity correlates well with conformational stability (Fig.2).) For example, A4/G88/V118C Rnase has the highest Tm value of the five enzymes and is the most potent cytotoxin. [JBC6]
Move 14: Stating limitations of the present study makes explicit the scientists' views of the limitations of the study with respect to the methodology, the findings, and/or the claims made based on the findings:
(37) Additionally, some interactions may be too transient for detection by FRET. [MC1]
(38) Our data do not enable us to rule out a requirement for additional, non-PMA-activated pathways in the activation of splicing in primary T cells. [MBC1]
Move 15: Suggesting further research allows the scientists to offer recommendations for the course of future research by pinpointing particular research questions to be addressed or improvements in research methodology:
(39) Further analysis of the molecular basis of motor axon guidance in the limb may help to define two interrelated issues in the patterning of neuronal projections. . [C7]
4 Coding moves in the corpus of biochemistry research articles
As mentioned in Chapter 2, the subjective nature of move identification presents a methodological challenge for corpus-based research, which requires a systematic identification and coding of all moves in the corpus (e.g., Crookes, 1986; Dudley-Evans, 1994a; Paltridge, 1994). Because of this subjectivity, two individuals analyzing the same text type may differ in ascribing move boundaries or in identifying the move type of each move (as in the studies by Nwogu, 1997, and Williams, 1999, on the Results section in medical research articles). Therefore, it was necessary to assess inter-coder reliability of move assignment for the present project, ensuring that move demarcation could be conducted consistently by different individuals, and that the framework for determining move type could be applied reliably.
In the present case, I evaluated the reliability of my own coding in comparison to the coding of an expert in the field of biochemistry: a PhD student at an American university who is also a faculty member in the School of Pharmacy at Silpakorn University in Thailand. Although the expert coder is not a native speaker of English, he clearly possesses extensive experience and expertise in reading academic research articles in the field of biochemistry. A two-hour training session for each section was conducted to explain the purpose of the task and to acquaint the coder with the use of the analytical framework (described in Section 2 and Table 4.1).
Texts were segmented into moves, and the move type of each move was determined. Only one rhetorical move was ascribed to a segment of a text. Texts were not coded for steps. The list of steps constituting each move was used to facilitate the coder's decision in ascribing moves; however, the step distinctions played no role in the subsequent analyses.
In the second stage of training, both raters coded four randomly selected texts representing the four conventional sections. We then went through each text to identify any coding disagreements. Differences in coding led to discussion and clarification of the criteria for coding assignments. Finally, the raters each independently coded 15 research articles (three articles from each of the five journals). Based on the independent coding by the author and the expert coder, inter-coder reliability was measured by percentage agreement and by kappa value. (Percentage agreement does not take into account chance agreement between two coders, whereas the kappa value does; Orwin, 1994.)
Table 4.2 Summarized results of inter-coder reliability analysis

Section        Kappa   Percent
Introduction   .93     97.58
Methods        .81     96.35
Results        .88     93.02
Discussion     .88     93.02
Average        .89     95.03
Table 4.2 shows high overall inter-coder reliability as measured by both agreement rates and kappa⁴ values. Moves in the Introduction section were more consistently and reliably identified than those in the other sections. In contrast, the Methods section displayed more divergence in move identification. However, there seemed to be no systematic pattern regarding divergences in move coding. The findings suggest the psychological reality of a move as a discourse unit that can be empirically investigated further.
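The two reliability measures can be illustrated with a minimal Python sketch. It assumes that each coder's output is reduced to one move-type label per coded unit, aligned across coders; the chapter does not specify the unit of comparison, so this alignment, the function names, and the toy labels are all hypothetical. The interpretation bands follow the Fleiss ranges cited in this chapter's note on kappa; how exact boundary values are treated is an assumption.

```python
from collections import Counter

def percent_agreement(codes_a, codes_b):
    """Share of units (in %) to which both coders assigned the same move type."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must label the same non-empty set of units")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(codes_a)
    p_observed = percent_agreement(codes_a, codes_b) / 100.0
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Chance agreement: product of the two coders' marginal proportions per label.
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n**2
    if p_expected == 1.0:  # both coders used a single identical label throughout
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)

def interpret_kappa(kappa):
    """Fleiss bands as cited in the chapter; boundary handling is an assumption."""
    if kappa < 0.40:
        return "poor"
    if kappa < 0.60:
        return "fair"
    if kappa < 0.75:
        return "good"
    return "excellent"

# Hypothetical move-type labels for ten units (not data from the study).
coder_a = [1, 1, 2, 2, 3, 3, 1, 2, 3, 1]
coder_b = [1, 1, 2, 2, 3, 3, 1, 2, 1, 1]
agreement = percent_agreement(coder_a, coder_b)
kappa = cohens_kappa(coder_a, coder_b)
```

In this toy case the coders disagree on one unit in ten, so raw agreement is 90%, while kappa is lower because part of that agreement would be expected by chance.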
5 Distribution of move types within texts from the biochemistry corpus
One major goal of move analysis is to identify the primary communicative function (the move type) of each statement in a text. Thus, when Introductions in biochemistry research articles are described as being composed of three move types, this means that every statement in the Introduction can be attributed to one of these three types. However, it is not the case that move types necessarily occur sequentially in a text. For example, an Introduction will not necessarily be composed of sentences belonging to Move Type 1 (Establishing a topic), followed by Move Type 2 (Preparing for the present study), followed by Move Type 3 (Introducing the present study). Rather, these three move types, and their associated communicative functions, can be interspersed throughout the Introduction. A move type represents a particular communicative function, and a text often switches from one move type (communicative function) to another and then back again to the first. Each of these text segments is coded as a separate move, so that a single move type can be represented by multiple moves. The following text excerpt illustrates how the language of an Introduction can be attributed to different moves:
4. According to Fleiss (as cited in Orwin, 1994), the interpretation of Cohen's kappa is summarized as follows: k < .40 poor, .40 < k < .59 fair, .60 < k < .74 good, and k > .75 excellent.
(40) Introduction [C6]
(Move Type 1) Small RNAs (sRNAs) in E. coli were first described in 1967. To date, more than ten sRNAs are known to be encoded by the E. coli genome (REFERENCE). These sRNAs act by mechanisms . The sRNAs regulate diverse cellular functions .
(Move Type 2) Interestingly, however, the function of E. coli 6S RNA has been unknown.
(Move Type 1) The 6S RNA was first detected as an abundant RNA (REFERENCE). It is transcribed as part of a message that contains the gene encoding 6S RNA (ssrS) (REFERENCE).
(Move Type 2) The mechanism of processing 6S RNA has not been characterized. The function of the protein also is not known . The lack of a reported phenotype has precluded finding a function for the 6S RNA.
(Move Type 3) Here, we show that the 6S RNA forms a complex with RNA polymerase.
(Move Type 1) Gram-negative bacteria enter stationary phase upon nutrient limitation (REFERENCE)....
(Move Type 3) Here, we show that 6S RNA binds to the 70-holoenzyme form of RNAP .
The beginning of paragraph one contextualizes the study by general statements introducing the topic of sRNAs (e.g., "Small RNAs (sRNAs) in E. coli were first described in 1967", Move Type 1), followed by a relatively more specific statement regarding the role of sRNAs in regulating cellular functions. Then, the excerpt pinpoints the gap in previous research ("the function of E. coli 6S RNA has been unknown", Move Type 2). The second paragraph similarly begins with Move Type 1, by citing a previous study on 6S RNA (e.g., "The 6S RNA was first detected as an abundant RNA"); this is coded as the third move in the Introduction. The following sentence, which is the fourth move in the Introduction, presents a second statement of the gap ("The mechanism has not been characterized", Move Type 2). Then, the excerpt presents a summary of findings from the present study ("Here, we show that the 6S RNA forms a complex with RNA polymerase", Move Type 3). The last paragraph begins by providing background knowledge on another aspect of 6S RNA ("Gram-negative bacteria enter stationary phase upon nutrient limitation (REFERENCE)", Move Type 1) and concludes with another statement of present findings ("Here, we show that 6S RNA binds to the 70-holoenzyme", Move Type 3).
The other article sections are structured in similar ways. The important point to note here is that research articles are not structured as a series of moves. Rather, move analysis allows the statements of a research article to be parceled out among a closed set of communicative functions: the move types. The numbers assigned to the moves in the coding framework (Table 4.1) reflect the order in which the moves often appeared in these research articles. Similarly, the constituent steps of each move are sequenced to reflect the common orders found in the corpus. However, this type of analysis is not intended to directly describe the discourse organization of texts: variation in the order of both moves and steps is possible, and it is common to find multiple statements distributed throughout an article section all belonging to a single move type.
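The distinction between moves and move types can be made concrete with a short Python sketch. It assumes that each statement has already been labeled with a move type and treats every uninterrupted run of the same label as one move. The label sequence below is a hypothetical coding loosely modeled on excerpt (40); because the excerpt is abridged, the per-move statement counts are illustrative only.

```python
from collections import Counter
from itertools import groupby

def segment_into_moves(statement_types):
    """Collapse runs of identically labeled statements into (move_type, n_statements) moves."""
    return [(move_type, len(list(run))) for move_type, run in groupby(statement_types)]

# Hypothetical statement-level move-type labels for an Introduction like (40).
statement_types = [1, 1, 1, 1, 2, 1, 1, 2, 2, 2, 3, 1, 3]

moves = segment_into_moves(statement_types)
# Number of separate moves that realize each move type.
moves_per_type = Counter(move_type for move_type, _ in moves)
```

Here thirteen statements yield seven moves but only three move types, which is the sense in which several moves can represent a single move type.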
Table 4.3 Overall distribution of the 15 move types
Section / Move type                            Frequency of   No. of          No. of          Min/Max
                                               Occurrence     Observations    Words           (4/490)
                                                              (N = 5,617)     (N = 315,667)
Introduction (425 observations, 38,655 words)
  Move 1: Establishing a topic                 100.00%        264             29,243          9/341
  Move 2: Preparing for the present study      66.66%         83              2,463           8/103
  Move 3: Introducing the present study        100.00%        78              6,949           20/490
Methods (657 observations, 62,761 words)
  Move 4: Describing materials                 100.00%        110             4,036           4/132
  Move 5: Describing experimental procedures   100.00%        525             57,694          5/420
  Move 6: Detailing equipment                  10.00%         6               271             11/96
  Move 7: Describing statistical procedures    13.32%         16              760             6/134
Results (3,393 observations, 131,312 words)
  Move 8: Stating procedures                   95.07%         828             29,561          13/174
  Move 9: Justifying methodological issues     71.59%         438             15,668          15/165
  Move 10: Announcing results                  100.00%        1,233           58,982          10/217
  Move 11: Commenting results                  91.01%         894             27,101          15/157
Discussion (1,142 observations, 82,939 words)
  Move 12: Contextualizing the study           89.94%         431             29,730          12/303
  Move 13: Consolidating results               100.00%        602             50,212          8/351
  Move 14: Stating limitations of the study    80.00%         59              1,657           14/114
  Move 15: Suggesting further research         53.33%         50              1,340           6/63
Using the coding framework described above, all moves in the corpus were identified and assigned to one of the 15 move types. Table 4.3 shows the overall distribution of move types across the 60 research articles that comprise the corpus. The table also shows that the move types are not equally well represented in these research articles. The move types differ in that some are obligatory⁵, some are optional but normally present, while others are optional and rare.
The descriptions presented in this section demonstrate that Swales' model for describing the moves in Introductions was successfully extended to other well-defined sections of biochemistry research articles. In this study, the move framework described in Table 4.1 is successfully applied to the entire corpus of texts, yielding a comprehensive description of all communicative purposes employed to construct the research articles, the first goal of this study. In order to accurately and thoroughly describe research articles, the typical linguistic characteristics of each move type need to be identified, the second goal of the study. Multidimensional analysis is used to empirically characterize the move types identified by move analysis; the procedures and results for this stage of the study are described in the following sections.
6 Linguistic characteristics of rhetorical moves in biochemistry research articles
As described in Chapter 2, move analyses have typically had two primary goals: to identify the major communicative purposes found in the texts from a genre (the move types), and to identify the individual moves that comprise particular texts from that genre. Move analyses generally do not describe the linguistic characteristics of move types. In part, this restriction is a consequence of the methods used in previous research, which were based on analysis of a small number of texts from the target genre. Such analyses did not provide the basis for generalizable findings regarding the typical linguistic characteristics of move types. However, by extending this analytical approach to a representative corpus of texts, we are able to identify the typical linguistic patterns of variation among move types.
Multi-dimensional (MD) analysis was used in the present case to provide a comprehensive linguistic description of the biochemistry moves and move types. (MD analysis is introduced in Chapter 1 and described more fully in Appendix One.) In a preliminary step, the corpus texts were further edited to facilitate quantitative linguistic analyses. For example, citations (e.g., Nose et al., 1988) were replaced by "Ref." to avoid artificial inflation of word counts. In addition, all references to tables or figures were replaced by "Pointer".
5. The cut-off frequency of 60% of occurrence was arbitrarily established as a potential measure of move stability for any move posited in this study. A move occurring in at least 60% of the Introduction sections in the corpus was considered an obligatory move. If a move's occurrence was lower than 60%, it was considered optional.
As shown in Table 4.3 above, move analysis was undertaken to segment the 60 original research articles into 5,617 individual moves, which were each coded for their move type. These moves ranged widely from 4 to 490 words, with an average observation length of 56 words. For the MD analysis, move segments shorter than 25 words⁶ were excluded, because it is not possible to obtain reliable counts of linguistic features in shorter segments. In addition, Move Type 6 and Move Type 7 were excluded from the MD analysis, because they were represented by only 4 and 11 observations, respectively. Thus, the corpus used for the MD analysis consisted of 4,009 moves (comprising 287,607 words) with an average length of 71.8 words.
This corpus was tagged by the Biber tagger; the automatic tagging proved to be 98–99% accurate with no systematic tagging errors. A wide range of linguistic features were counted in each move, including the range of lexico-grammatical features used in previous MD analyses (see Appendix Two), as well as more specialized features that have been analyzed in scientific research articles: e.g., pointers, see Brett (1994); reference, see Swales (1990), Hyland (2000); and extraposed it constructions, see Biber et al. (1999), Hewings & Hewings (2002). These features were later collapsed into super-ordinate categories based on their similar functions. For example, demonstrative adjectives and demonstrative pronouns were aggregated into one demonstratives feature. Frequencies of the features in each text were counted and normalized to a rate per 100 words, so that comparisons could be made across texts. The normalized frequencies of linguistic features provide the basis for factor analysis.
MD analysis was used to identify the basic parameters of linguistic variation among moves. In MD analysis, the distribution of many linguistic features is analyzed in each text of a corpus. Then factor analysis is used to identify the systematic co-occurrence patterns among those linguistic features (the dimensions). There are two major quantitative steps in an MD analysis: (1) identifying the salient linguistic co-occurrence patterns in a language; and (2) comparing texts and genres/registers in the linguistic space defined by those co-occurrence patterns. In the present case, once the dimensions of variation are identified, moves and move types can be compared along each dimension. Forty-one linguistic features had strong patterns of variation in this corpus and were thus retained in the final factor analysis, which accounted for 33.5% of the total variance. The solution for 7 factors was selected as optimal; Table 4.4 summarizes the co-occurring linguistic features grouped on each of these factors.
6. Based on a 25-word criterion, about 28% of the observations would be excluded, leaving approximately 71% of the original observations. The retained observations comprise 91% of all words in the original corpus and have an average observation length of 71.8 words.
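The two preparation steps described above, dropping moves shorter than 25 words and normalizing raw feature counts to a rate per 100 words, can be sketched in Python as follows. The record layout, feature names, and counts are hypothetical; in the study itself the counts come from the tagged corpus.

```python
MIN_WORDS = 25  # moves shorter than this are excluded from the MD analysis

def normalize_per_100(counts, n_words):
    """Convert raw feature counts into rates per 100 words."""
    return {feature: 100.0 * count / n_words for feature, count in counts.items()}

def prepare_moves(moves, min_words=MIN_WORDS):
    """Keep moves of at least min_words words and attach normalized feature rates."""
    prepared = []
    for move in moves:
        if move["n_words"] < min_words:
            continue  # too short for reliable feature counts
        prepared.append({
            "id": move["id"],
            "move_type": move["move_type"],
            "rates": normalize_per_100(move["counts"], move["n_words"]),
        })
    return prepared

# Hypothetical moves with raw counts (not data from the study).
raw_moves = [
    {"id": "m1", "move_type": 10, "n_words": 50, "counts": {"passives": 4, "numerals": 1}},
    {"id": "m2", "move_type": 11, "n_words": 12, "counts": {"passives": 1, "numerals": 0}},
    {"id": "m3", "move_type": 4, "n_words": 25, "counts": {"passives": 0, "numerals": 5}},
]
prepared = prepare_moves(raw_moves)
```

Normalizing by move length is what makes a 25-word move and a 490-word move comparable: four passives in 50 words and forty in 500 both come out as 8 per 100 words.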
Table 4.4 Summary of the linguistic features associated with each factor
Factor 1
  Word length                                .750
  All attributive adjectives                 .719
  Common nouns                               .509
  Numerals                                  -.522
  Technical jargon                          -.421
Factor 2
  Passives                                   .530
  Past tense verbs                           .516
  All coordinating conjunctions              .361
  Definite articles                         -.605
  Nominalizations/gerunds                   -.536
  Prepositions                              -.453
  All modals                                -.328
Factor 3
  Extraposed it                              .906
  That clause controlled by adjectives       .857
  Predicative adjectives                     .450
  (To clause controlled by adjectives        .326)*
  No negative features
Factor 4
  All demonstratives                         .899
  Quantifiers                                .886
  That clause controlled by verbs            .342
  No negative features
Factor 5
  All present tense verbs                    .720
  References                                 .508
  Type token ratio or TTR                    .392
  (Common nouns                             -.405)
  (Past tense verbs                         -.378)
  (Pointers                                 -.357)
  (Prepositions                             -.325)
Factor 6
  To infinitives                             .764
  Whether/if                                 .470
  To clause controlled by verbs              .447
  Person 1                                   .431
  To clause controlled by adjectives         .351
  (Prepositions                             -.405)
  (Type token ratio                         -.320)
Factor 7
  Concession                                 .660
  Pointers                                   .557
  Not negation                               .545
  All adverbs                                .538
  No negative features
* Features in parentheses are not used to compute dimension scores.
Each factor comprises a set of linguistic features that tend to co-occur in the moves from the biochemistry corpus. Factors are interpreted as underlying dimensions of variation based on the assumption that linguistic co-occurrence patterns reflect underlying communicative functions. That is, particular sets of linguistic features co-occur frequently in texts because they serve related communicative functions. In the present study, the following interpretive labels are proposed for each dimension:
Dimension 1: Conceptual vs. Specific Reference
Dimension 2: Concrete Action vs. Abstract Discussion
Dimension 3: Evaluative Stance
Dimension 4: Projected Interpretation
Dimension 5: Attributed Knowledge vs. Current Study
Dimension 6: Stated Purpose
Dimension 7: Contradictory Proposition
The following sections describe these interpretations and the multi-dimensional characteristics of each move type.
7 Linguistic variation among move categories in biochemistry research articles
To describe the multi-dimensional characteristics of each move type, it is first necessary to compute factor scores for each move with respect to each factor. It is then possible to compute and compare the mean factor scores for each move type on each dimension. Table 4A at the end of the chapter provides descriptive dimension statistics for all move types, while Table 4.5 below shows the results of ANOVAs that test for significant differences among the mean scores for each move type.
Table 4.5 shows that the differences among move types are statistically significant with respect to all seven dimensions. However, the r² values are not especially large, indicating that there is also considerable linguistic variation among the moves within some of these move types. For example, the mean dimension score of Move Type 4 on Dimension 1 is -8.81, reflecting that the moves in this move type have low frequencies of long words, attributive adjectives, and common nouns (the features with positive loadings on Factor 1) and high frequencies of numerals and technical jargon (the features with negative loadings on Factor 1). However, Move Type 4 actually shows a wide range of linguistic variation in the use of Dimension 1 features, with a standard deviation of 14.5, and a total range of Dimension 1 scores extending from -41.5 to 20.27 (see Table 4A). Thus some move types show a wide range of internal linguistic variation, while other move types are relatively well defined in their linguistic characteristics.
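The computation outlined above can be sketched in Python under explicit assumptions. The sketch follows the procedure conventionally used in MD studies, in which each feature's normalized rate is standardized across all moves and a dimension score is the sum of the standardized positive-loading features minus the sum of the negative-loading ones; this chapter does not spell out the formula, so treat that as an assumption, along with the use of the population standard deviation. The r² value is taken as the between-group share of the total sum of squares from a one-way ANOVA. Feature names, rates, and move-type labels are toy values.

```python
from statistics import mean, pstdev

def standardize(values):
    """z-scores across all moves (population SD; an assumption of this sketch)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]

def dimension_scores(rates, positive, negative):
    """Sum of z-scores for positive-loading features minus those for negative-loading ones."""
    z = {f: standardize([r[f] for r in rates]) for f in positive + negative}
    return [
        sum(z[f][i] for f in positive) - sum(z[f][i] for f in negative)
        for i in range(len(rates))
    ]

def group_means(scores, groups):
    """Mean dimension score per move type."""
    by_group = {}
    for score, group in zip(scores, groups):
        by_group.setdefault(group, []).append(score)
    return {group: mean(vals) for group, vals in by_group.items()}

def r_squared(scores, groups):
    """Share of score variance accounted for by move type (SS between / SS total)."""
    grand = mean(scores)
    means = group_means(scores, groups)
    ss_total = sum((s - grand) ** 2 for s in scores)
    ss_between = sum((means[g] - grand) ** 2 for g in groups)
    return ss_between / ss_total if ss_total else 0.0

# Toy normalized rates for four moves (hypothetical, not corpus data).
rates = [
    {"attributive_adj": 10, "numerals": 0},
    {"attributive_adj": 8, "numerals": 2},
    {"attributive_adj": 2, "numerals": 8},
    {"attributive_adj": 0, "numerals": 10},
]
move_types = [2, 2, 4, 4]

scores = dimension_scores(rates, positive=["attributive_adj"], negative=["numerals"])
means = group_means(scores, move_types)
r2 = r_squared(scores, move_types)
```

A low r² alongside a significant F value, as reported for several dimensions in this chapter, would indicate that move types differ on average while individual moves within a type still vary considerably.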
Table 4.5 ANOVA results on dimension score differences across 13 move types
Factor         F         Sig.   r²
Dimension 1    39.013    .000   45.2%
Dimension 2    94.124    .000   34.5%
Dimension 3    26.440    .000   8.1%
Dimension 4    64.409    .000   15.6%
Dimension 5    114.731   .000   16.0%
Dimension 6    103.316   .000   16.1%
Dimension 7    98.299    .000   17.4%
Overall, though, each of these dimensions identifies significant linguistic differences among the move types. The following subsections describe each dimension in turn, providing a fuller functional interpretation, as well as discussion of how the move types differ with respect to the dimension.
Dimension 1: Conceptual vs. Specific Reference
The positive-loading features on Factor 1 (see Table 4.4) include word length, attributive adjectives, and common nouns. Word length, the highest loading feature on Factor 1, refers to the average length of the words in a text measured in orthographic letters. The higher the average word length of a text is, the higher its informational density (Biber, 1988; Zipf, 1949). Attributive adjectives allow scientists to describe, clarify, and qualify additional information about scientific phenomena or entities, and common nouns are used generally to refer to entities or concepts (Biber et al., 1999). The co-occurrence of these features reflects the dense use of modified noun phrases and long (technical) words, resulting in high information density.
The negative-loading features on Factor 1 are numerals and technical jargon (abbreviations or acronyms used specifically in biochemistry writing). Whereas attributive adjectives provide a more conceptual description of referents, numerals provide a much more specific description, particularly regarding the exact quantity of referents. The complementary distribution of these two features suggests that they serve complementary functions: attributive adjectives provide conceptual elaboration, while numerals add rigorous explicitness for accurate and specific identification of procedures, required for later replication (e.g., 46% of cell, 3 hrs at 4°C). The claim that specific reference is an important function underlying the negative pole of Factor 1 is further supported by the high frequency of technical jargon on this factor. In opposition to long words, technical jargon consists of abbreviated terms used commonly in this discipline. For instance, the noun phrase "posttranscriptional gene" includes a long word (an attributive adjective), identifying a specific technical attribute that is relevant to a particular study. In contrast, the technical jargon term "RNA" is used widely in these articles as an abbreviated way to refer to ribonucleic acid.
Although both are related to referential information in scientific discourse, the two poles of Dimension 1 reflect complementary functions. With the positive-loading features, the description of nominal elements is relatively conceptual, with a high informational density. In contrast, greater precision and specificity of reference can be achieved by using the negative-loading features. Based on these considerations, the interpretive label Conceptual vs. Specific Reference is proposed for the functional dimension underlying this factor.
Figure 4.1 shows the distribution of move types along Dimension 1. Move Type 2 (Preparing for the present study) has the highest Dimension score, characterized by dense conceptual reference. The following example illustrates the linguistic features associated with conceptual reference in this move type, including long words (bolded), attributive adjectives (underlined), and common nouns (italicized).
(41) MOVE 2 (F1 score = 38.69) While the cloned genes and mutant strains provide hints and useful tools for future studies, direct biochemical roles for the genetically identified posttranscriptional gene silencing factors have yet to be assigned.
Move Type 2 prepares readers for the current study by identifying the research gap: in this case, the absence of direct biochemical roles for the genetically identified posttranscriptional gene silencing factors. This example illustrates the dense use of long technical words, especially nouns and attributive adjectives, establishing the need for a study by identifying the research gap. In contrast, the linguistic features that characterize the other end of Dimension 1 reflect more specific and concise identification of referents. Move Types 4 and 5, both from Methods sections, have the largest negative scores here (see Figure 4.1). Example (42) from Move Type 4 (Describing materials) illustrates the relative absence of positive Dimension 1 features and a high occurrence of negative features: numerals (bolded) and technical jargon (underlined).
(42) MOVE 4 (F1 score = -23.93) For each pulldown, glutathione-agarose beads containing approximately 10 g of bound purified GST-Nup501, GST-Nup502, or GST were used.
Example (42), focusing on specific rather than conceptual information, is dense with numerals and technical jargon, describing the methods in a way that permits the validation of the results and future replication. Examples (41) and (42) thus represent the two contrasting communicative functions of this dimension. The linguistic features defining both poles might be associated with informational density; the main difference is that the negative features are associated with the demands of precise reference rather than abstract conceptual information. It is interesting that Move Type 5 (Describing procedures) and Move Type 8 (Restating methodological issues), despite their resemblance in terms of their communicative functions, are linguistically quite different on Dimension 1. Move Type 5, which occurs in Methods sections, has a relatively large negative score on Dimension 1, suggesting its greater precision and specificity of reference. In contrast, Move Type 8 occurs in Results sections and has a moderate positive score along Dimension 1, indicating mixed use of conceptual and specific reference. Overall, the relationships among move types shown in Figure 4.1 confirm the interpretation of Dimension 1 as distinguishing among texts along a continuum of Conceptual vs. Specific Reference.
Dimension 2: Concrete Action vs. Abstract Discussion
Figure 4.2 presents the distribution of move types for Dimension 2, allowing comparison of all moves along a continuous parameter of variation labeled Concrete Action vs. Abstract Discussion. Move Type 4 (Describing materials), which occurs in Methods sections, has the largest positive dimension score. Passives, the highest positive-loading feature on Factor 2, are used to identify where the research materials were obtained. There is no need to identify the agent (obviously the researcher) for these statements, and past tense is usually used to document these procedural activities. Example (43) illustrates this move type, highlighting passives (bolded), past tense verbs (underlined), and coordinating conjunctions (italicized):
(43) MOVE 4 (F2 score = 15.73) Donkey anti-rabbit IgG-peroxidase and sheep anti-mouse IgG-peroxidase were obtained from Amersham Life Science, and mouse anti-goat IgG-peroxidase was from Jackson ImmunoResearch Laboratories.
At the other extreme, Move Type 15 (Suggesting further research) has the largest negative dimension score. The negative-loading Factor 2 features include definite articles, nominalizations/gerunds, prepositions, and modals. The definite article the is used for noun phrases that have been previously evoked or are known to the reader, while nominalizations/gerunds refer to abstract concepts. The co-occurrence of the definite article and nominalizations/gerunds on Factor 2 indicates the authors' focus on abstract information that they construct as given information. Example (44) illustrates the relative absence of positive Dimension 2 features and a high frequency of negative features, particularly definite articles (bolded), nominalizations/gerunds (italicized), prepositions (capitalized), and modals (underlined).
(44) MOVE 15 (F2 score = -23.41) A question arises AS TO how an integral membrane protein may be able to interact WITH p38JAB1 and why this interaction occurs mostly WITH the 68-kDa precursor present IN the endoplasmic reticulum AS opposed TO the 85-kDa mature receptor present IN the plasma membrane. This issue will also have to be addressed experimentally.
It is interesting to note that all move types that occur in Discussion sections (Move Types 12–15) have large negative scores on Dimension 2. In addition, all three move types from the Introduction (Move Types 1–3) have large negative Dimension 2 scores. Thus we see here how these articles begin and end with relatively
abstract discussion, while the more concrete actions are described in the two intervening sections.
Dimension 3: Evaluative Stance
The positive features on Dimension 3 are mostly that complement clauses. In these constructions, the authors' stance is given in the main clause, and the propositional information is given in the that complement clause (e.g., it is possible that we did not detect). The heads of that complement clauses can be of different syntactic categories (e.g., nouns, verbs, and adjectives). On Factor 3, the controlling heads of that complement clauses are predicative adjectives. To be precise, the adjectives controlling that complement clauses on Factor 3 are likelihood adjectives (e.g., probable), attitudinal adjectives (e.g., interesting), and factual/certainty adjectives (e.g., evident). This indicates that the co-occurring features on Factor 3 index the authors' expression of their agreement, opposition, evaluation, and interpretation of propositions. Similarly, to complement clauses are controlled by predicative adjectives such as evaluative adjectives (e.g., appropriate, necessary) and ease/difficulty adjectives (e.g., difficult, easy). Taken together, the positive-loading Factor 3 features express the authors' personal stance towards the propositions in the that/to complement clauses. However, these constructions are impersonal because the stance is not directly attributed to the authors. Based on these interpretations, the interpretive label Evaluative Stance is proposed for the functional dimension.
Figure 4.3 presents the distribution of move types for Dimension 3, allowing comparison of all moves along a continuous parameter of variation labeled Evaluative Stance. Move Type 14 (Stating limitations of the study) has the highest dimension score on Dimension 3, while Move Type 4 (Describing materials) has the lowest mean score.
The Dimension 3 characteristics of Move Type 14 are illustrated by (45) and (46), which contain frequent occurrences of extraposed it constructions (bolded), that clauses controlled by adjectives (underlined), and to clauses controlled by adjectives (capitalized).
(45) MOVE 14 (F3 score = 4.86) It is interesting that the experiments in this paper were all carried out using assays for genetic interference in somatic tissues of the animal in the first generation after injection. It is conceivable that distinct mechanisms might operate in longer term RNAi (REF.) or in specific tissues, such as the germline.
(46) MOVE 14 (F3 score = 4.62) In the absence of atomic structure, it is not possible TO determine which residues are solvent exposed and thus are likely to make physical contact with the microtubule and which ones contribute to the domain's structural organization.
In contrast, the moves at the other end of the continuum of Dimension 3 show no concern for evaluative stance. An example of Move Type 4 (Describing materials) is represented by (47), with a complete absence of features on Dimension 3.
(47) MOVE 4 (F3 score = -1.02) A peptide encoding a conserved region of the C-terminal domain of SMD was used by Sigma Genosys to raise antisera SC1 and SC2, both were used at 1:500 on Westerns. An mSYD2 N-terminal domain fusion protein was used by Lampire Biological Laboratories to raise the antiserum SN1, used at 1:500 on Westerns following depletion of the antiserum with an acetone powder of bacteria expressing a portion of the antigen. -COP antibody was used at 1:5000 (Ref.). KLC 6390 antibody was used at 1:1000 (Ref.). MitoTracker Red CM-H2XRos, and COX-1 antibodies were used as indicated by Molecular Probes. Golgi-58K antibody was used as indicated by Sigma. DIC antibody was used as indicated by Chemicon International.SYN antibody was used as indicated by Boehringer.
The preceding subsections have shown that research article Introductions (Move Types 1–3) are very similar to Discussion sections (Move Types 12–15) with respect to being conceptual (Dimension 1) and abstract (Dimension 2). However, Figure 4.3 shows a different pattern with respect to Dimension 3: the Discussion sections are highly marked for evaluative stance, while the Introductions avoid these stance expressions. This is a strong difference, with one major exception: Move Type 12 (Contextualizing the study) is marked by the absence of the stance features associated with Dimension 3. This move typically begins the Discussion section, functioning as a kind of recap of the article Introduction. As such, it is usually descriptive rather than evaluative, providing the immediate background for the following evaluative moves in the Discussion section. With respect to the larger goals of this book, this finding provides a nice example of how a move approach to discourse organization captures a linguistic pattern that would probably go unnoticed otherwise. That is, while Discussion sections are generally evaluative in stance, a move analysis of this section shows that the first move is usually quite different in both its communicative functions and its associated linguistic characteristics.
Dimension 4: Projected Interpretation
The linguistic features associated with projected interpretations (Dimension 4) are demonstratives, quantifiers, and that clauses controlled by verbs. The frequent use of quantifiers, in conjunction with demonstratives, reflects the concern with the specificity of textual reference in academic discourse. That complement clauses controlled by verbs provide a means to talk about the information in the dependent that complement clause. In this scientific discourse, the subject of these verbs is usually an inanimate entity (e.g., the findings suggest that ...). The frequent use of these verbs represents the authors' expression of the degree of certainty or commitment associated with the claim stated in the that complement clause. For instance, demonstrate denotes a higher degree of certainty than suggest when used in the context of the findings demonstrate/suggest that. The interpretation of the features underlying Factor 4 predicts that the moves with the highest Dimension 4 score will be more characterized by projected interpretations than other moves.
Figure 4.4 presents the distribution of move types along Dimension 4. Move Type 11 (Commenting results) has the highest dimension score on Dimension 4, while Move Type 5 (Describing experimental procedures) has the lowest mean score on this dimension. Example (48) illustrates Move Type 11 at the positive end of Dimension 4. The text shows the concern for expressing claims or generalizations of the texts with a frequent use of demonstratives (bolded), quantifiers (underlined), and that clauses controlled by verbs (italicized).
(48) MOVE 11 (F4 score = 24.56) All these data suggest that recombinant mammalian retromer proteins can form complexes in COS7 cells. These data, however, do not demonstrate whether the complexes are formed in the cytoplasm, on membranes, or both.
In contrast, (49) from Move Type 5 (Describing experimental procedures) reveals an absence of positive-loading features of Dimension 4.
(49) MOVE 5 (F4 score =.40) The MatchmakerTM two-hybrid system 2 was used according to the protocols provided by the manufacturer. Using polymerase chain reaction-based strategies, we subcloned the C-terminal 42 residues of the rLHR into the pAS21 vector to generate a fusion protein with the GAL4 DNA binding domain. This plasmid was used as bait to screen a human kidney 293 cells cDNA library constructed in the pACT2 vector to generate fusion products with the GAL4 activation domain.
As shown, the move in (49) provides an objective description of experimental procedures. No interpretation is involved, and so no claims are framed. Overall, the relationships among moves shown in Figure 4.4 confirm the interpretation of Dimension 4 as distinguishing among texts along a continuum of projected interpretations. Discourse with a focus on expression of generalizations has a high score on this dimension, and discourse with no focus on making generalizations has a markedly low score on this dimension.
Dimension 5: Attributed Knowledge vs. Current Study
The linguistic features associated with attributed knowledge (the positive end of Dimension 5) are present tense verbs, attributed references, and type-token ratio (TTR; see footnote 7). The co-occurrence of references to other studies with present tense verbs indexes generalized background knowledge established by previous research in the field. High type/token ratio in a text indicates that the discourse has a greater variety of word types and integrates a higher amount of information (Biber, 1988). Taken together, the co-occurrence of positive-loading Factor 5 features reflects a focus on attributed knowledge, a crucial requirement in scientific discourse to situate and contextualize the study being reported. In contrast, the linguistic features associated with the current study (the negative pole of Dimension 5) are common nouns, pointers, past tense verbs, and prepositions. Three of the four negative-loading Factor 5 features (common nouns, past tense, and prepositions) load higher on Factors 1 and 2. As discussed earlier, common nouns refer to entities or concepts; past tense verbs report completed actions and do not assume generalization; and prepositions often modify and specify nouns in a discourse. Pointers (e.g., see Figure 3, as shown in Table 6) direct readers to visual representations accompanying the data presented. The co-occurrence of these negative-loading Factor 5 features suggests the focus on reporting the actual findings of the current study.
Figure 4.5 presents the distribution of move types for Dimension 5.
Move Type 1 (Establishing a topic) has the highest positive dimension score, characterized by reference to attributed knowledge; Move Type 10 (Announcing results) has the largest negative mean score, characterized by its focus on current findings. Move Type 1 (Establishing a topic), represented by (50), contains frequent occurrences of present tense verbs (bolded) and references (underlined) and a high type-token ratio, and it has markedly low frequencies of common nouns, pointers, past tense verbs, and prepositions. Thus, (50) is typical of Move Type 1 in being highly attributed in knowledge, providing background information of the field rather than reporting findings from the current study.
7. TTR is the ratio between the number of different lexical items in a text (the types) and the total number of words in that text (the tokens). Specifically, TTR is a percentage = (types/tokens) × 100. Longer texts tend to have more repeated words and thus a much lower TTR. If the TTR in the text is low, it means there are many more repeated words in the text. Conversely, if the type-token ratio is high in the discourse, it means that the text has fewer repeated words and greater lexical density. In this study, all move observations under 25 words were excluded, and the 4,009 move observations analyzed by multidimensional analysis have a range of 25 to 483 words, with a mean of 72 words. In this regard, given the broad range of move observation lengths, TTR is not strictly comparable across observations here, because the ratio tends to decrease as texts get longer, and vice versa.
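The TTR formula given in footnote 7, (types/tokens) × 100, can be sketched as follows. This is a minimal illustration rather than the procedure used in the study: the tokenizer (lowercased runs of letters and digits, allowing internal hyphens and apostrophes) and the function names are assumptions. The second function computes TTR over only the first n tokens, which is one common way of reducing the length effect that the footnote describes; the source does not say that this was done here.

```python
import re

def tokenize(text):
    # Assumed tokenization: lowercase alphanumeric runs, keeping
    # internal hyphens and apostrophes (e.g. "type-token").
    return re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower())

def type_token_ratio(text):
    """TTR as a percentage: (number of types / number of tokens) * 100."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no tokens")
    return 100.0 * len(set(tokens)) / len(tokens)

def ttr_first_n(text, n):
    """TTR over the first n tokens only, so texts of different
    lengths are compared on an equal number of tokens."""
    tokens = tokenize(text)[:n]
    if not tokens:
        raise ValueError("text contains no tokens")
    return 100.0 * len(set(tokens)) / len(tokens)

print(type_token_ratio("the cat and the dog"))  # 4 types / 5 tokens -> 80.0
```

Because repeated function words accumulate as a text grows, the full-text ratio generally falls with length, which is why a fixed token window is often used when texts differ widely in size.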
(50) MOVE 1 (F5 score = 13.34) (TTR= 27.25) Interest in prenylation has stemmed from the discovery that key proteins in multiple signal transduction cascades contain covalently attached isoprenoids (Ref.). Perhaps the most notable examples are the Ras proteins. Mutated forms of Ras proteins are found in 30% of all human tumors (Ref.). However, these mutant Ras proteins are not oncogenic if they cannot be prenylated (Ref.). Prevention of Ras prenylation thus holds promise as a new tactic for cancer chemotherapy (Ref.). To this end, many prenylation inhibitors have been developed, several of which appear to be effective anticancer agents in animal studies and are undergoing clinical trials (Ref.).
Conversely, Move Type 10 (Announcing results), represented by (51), is characterized by frequent occurrences of common nouns (italicized), pointers (underlined), past tense verbs (bolded), and prepositions (capitalized), and relatively infrequent use of present tense verbs, references, and a relatively low type-token ratio.
(51) MOVE 10 (F5 score = -36.66) (TTR = 14) Moreover, these troughs were labeled WITH antibodies against -catenin (Pointer) and were flanked BY desmosomes associated WITH thick bundles OF keratin intermediate filaments (Pointer). AT late times, the undulating cell-cell border had flattened, and the epithelium appeared AS a sheet, with continuous contacts OF alternating desmosomes and Adherens junctions (Pointer).
Similar to Dimension 3, the distribution of move types along Dimension 5 shows the importance of a move-analytical approach. In this case, only one move type (Move 1, which is usually the very first move in an article) is especially marked for the use of attributed knowledge features. In contrast, all other move types, including the other two move types found in the article Introductions, are marked by the focus on current findings (to differing extents). The interesting finding here is that these research article Introductions are not at all homogeneous. Rather, the first move in the Introduction is functionally distinctive, serving the communicative purpose of establishing the topic, and the MD analysis shows that this move type is linguistically distinctive as well.
Dimension 6: Stated Purpose
Dimension 6 is composed mostly of features with positive loadings. Whether/if indirect questions introduce independent yes/no interrogative clauses expressing indirect questions as indirect speech reports. First person pronouns reflect the active role of the authors. The co-occurrence of first person pronouns, infinitive marker to, and whether/if indicates the authors' deliberate purpose of addressing intellectual research questions and constructing relevant strategies to answer those questions. The other two positive-loading Factor 6 features are two types of to complement clauses, controlled by verbs and controlled by adjectives. The controlling verbs of to complement clauses on this factor are modality/causation/effort verbs (e.g., attempt, try, seek), while the controlling adjectives are ability and willingness adjectives (e.g., able, determined, sufficient). Both modality/causation/effort verbs and ability and willingness adjectives index the authors' expression of specific and definite objective(s) of the study. (The two negative-loading Factor 6 features, prepositions and type-token ratio, are not unique to Factor 6 because of their salient loadings on other factors.) All in all, based on the interpretation of the communicative functions represented by positive-loading features on Factor 6, the interpretive label Stated Purpose is proposed for this dimension.
Figure 4.6 presents the distribution of move types for Dimension 6. Move Type 15 (Suggesting further research) has the highest dimension score, while Move Type 5 (Describing experimental procedures) has the lowest mean score. The interpretation of the features comprising this factor predicts that moves with the highest Dimension 6 score will be more characterized by purposive statements than others.
According to Figure 4.6, Move Type 14 (Stating limitations of the study) and Move Type 15 (Suggesting further research), represented by (52–53) and (54) respectively, serve purposive functions, reflected by frequent occurrences of to infinitives (italicized), whether/if (underlined), to complement clauses controlled by verbs, first person pronouns (capitalized), and to complement clauses controlled by adjectives (bolded).
(52) MOVE 14 (F6 score = 1.59) However, because the neural tube becomes disordered in Nup50 mutant animals, it is difficult to determine if the alterations in p27Kip1 expression in Nup50-null animals are the cause or the consequence of the neural tube abnormalities. A mechanistic understanding of these abnormalities thus must await a clearer understanding of Nup50 function in nucleocytoplasmic transport.
(53) MOVE 14 (F6 score = 2.28) However, WE were unable to detect any morphological changes in duct cells consistent with this hypothesis, although WE cannot rule out the possibility that functional changes have occurred that are unapparent morphologically.
(54) MOVE 15 (F6 score = 3.14) Future experiments will be necessary to ascertain whether a similar mechanism is involved at puncta, or whether the physical interaction between vinculin or zyxin and Adheren junction components are alone sufficient to promote VASP/Mena association and activation.
In contrast, the absence of the linguistic features that characterize the positive end of Dimension 6 shows less or no use of purposive statements, as shown in (55) from Move Type 5 (Describing procedures).
(55) MOVE 5 (F6 score = -41.87) Primer extension assays were performed as described previously (Ref.) using 22/44 and 24/44 primer-templates with or without 16- or 14-mer downstream oligonucleotides (Pointer). 200 fmol of primer-template was incubated with pol at 37 C in 10-l reactions containing 500 M dNTPs unless otherwise indicated. Reaction times and enzyme concentrations are indicated in the figures.
Dimension 7: Contradictory Proposition
Finally, Dimension 7 is composed of only four co-occurring linguistic features. Concessive markers and not negation are semantically transparent in their function of negating a proposition. The meta-textual device of pointers (e.g., See Table 1, Figure 3, etc.) directs readers' attention to visual accompaniments of particular findings. And adverbs often provide qualifications of propositions. The functions of these linguistic features taken together contribute to the pragmatic function of expressing contradiction.
Figure 4.7 presents the mean dimension score of each move type along Dimension 7. Move Type 2 (Preparing for the present study) has the highest dimension score on Dimension 7, while Move Type 4 (Materials used) has the lowest mean score. Move Type 2, represented by (56), demonstrates the positive features of Dimension 7: concessive markers (bolded), pointers (capitalized), not negation (underlined), and adverbs (italicized).
(56) MOVE 2 (F7 score = 13.44) However, forms of Ras that are incompletely modified have received little study (Ref.), largely because of the assumption, based on direct physical studies, that prenyl proteins are fully and completely modified (POINTER). It is still not known if all functions of oncogenic Ras require prenylation or if some effector pathways may remain active regardless of prenylation state.
In contrast, (57) representing Move Type 4 shows little or no concern with expressing contradiction, or with the logical comparison of possibilities at all, as reflected by the absence of positive-loading features on Factor 7.
(57) MOVE 4 (F7 score =.55) Human kidney 293T cells are a derivative of 293 cells that express the SV40T antigen (Ref.) and were provided to us by Dr. Marlene Hosey. HeLa cells were obtained from Dr. Dawn Quelle. The 9E10 hybridoma cell line was obtained from the American Type Culture Collection. Purified hCG was kindly provided by the National Hormone and Pituitary Agency. 125I-hCG was prepared using the purified hCG as described elsewhere (Ref.).
Overall, the relationships among moves shown in Figure 4.7 confirm the interpretation of Dimension 7 as distinguishing among texts along a continuum of Contradictory proposition.
Overall Multi-Dimensional Profile of Move Types
Figure 4.8 summarizes the overall relations among four of the move types from biochemistry research articles: Move Type 2: Preparing for the present study, Move Type 5: Describing procedures, Move Type 8: Restating methodological issues, and Move Type 14: Stating limitations of the study (selected moves from the Introduction, Methods, Results, and Discussion sections, respectively; see footnote 8). Move Type 2 (Preparing for the present study) has the highest mean scores on Dimensions 1 and 7, reflecting a relatively marked emphasis on conceptual information (Dimension 1) and a focus on expression of contradiction (Dimension 7). Move Type 2 is unmarked with respect to Dimensions 2–6, suggesting a mixed focus of concrete and abstract discussion (Dimension 2), a moderate use of evaluative stance (Dimension 4), a mixed focus on attributed information and current findings generated by the study (Dimension 5), and a moderate use of purposive statement (Dimension 6).
Move Type 5 (Experimental procedures) stands out as having markedly low scores on Dimensions 4 and 6, showing no concern for framing claims or generalizations (Dimension 4) or expressing purposive statements (Dimension 6). Move Type 5 also has relatively low scores on Dimensions 1, 3, and 7, indicating a moderate use of specific reference (Dimension 1), and little concern for expressing evaluative stance (Dimension 3) and contradiction (Dimension 7). This move type has a moderately high mean score on Dimension 2, showing a relatively high emphasis on scientific activities. Move Type 5 has an intermediate mean score on Dimension 5, suggesting a mixed use of attributed information and current findings.
Move Type 8 (Restating methodological issues) has a markedly low score on Dimension 5 and a moderately low score on Dimension 7, suggesting an emphasis on current findings (Dimension 5) and little concern for contradiction (Dimension 7). Move Type 8 has intermediate scores on Dimensions 1, 2, 3, 4, and 6, indicating that this move has a mixture of both conceptual and specific reference (Dimension 1), both concrete and abstract discussion (Dimension 2), a certain amount of evaluative stance (Dimension 3), generalizations (Dimension 4), and purposive statements (Dimension 6). Move Type 14 (Limitation of the study) has a markedly high score on Dimension 3, suggesting a high density of evaluation (Dimension 3). This move has moderately high mean scores on Dimensions 4, 6, and 7, indicating a substantial amount of generalizations (Dimension 4), purposive statements (Dimension 6), and contradictory statements (Dimension 7). Move Type 14 has a moderately low mean score on Dimension 2, suggesting relative focus on abstract concepts rather than concrete research activities. Move Type 14 has intermediate scores on Dimensions 1 and 5, indicating that this move has a mixed characteristic of both conceptual and specific reference (Dimension 1), and both attributed knowledge and current findings (Dimension 5).
8. In order to facilitate comparisons across the seven dimensions, each of the dimension scores was transformed to a new scale. That is, dimension scores were multiplied/divided by a scaling coefficient so that all dimensions are presented within the same scale from plus to minus 10.
As shown in Figure 4.8, each move type has a different profile across the seven dimensions. As a consequence, the relation between any two moves needs to be based on consideration of all seven dimensions. The multidimensional profile is also useful in differentiating two different moves that are superficially emblematic of similar functions but belong to different sections of research articles. Move Types 2 and 14 illustrate such differences. Move Type 2 (Preparing for the present study) from the Introduction section generally contains the scientists' evaluation of existing research in order to justify the need for the current study.
Move Type 14 (Stating limitations of the study), from the Discussion section, expresses the scientists' evaluation of their own study, acknowledging that their study is not perfect and that limitations are inevitable. Thus, the two moves are functionally similar to a certain extent: both moves involve evaluation. This raises the question of whether these two moves should be labeled as two distinct move types, simply because they belong to different sections and interact with their neighboring moves differently. Or, instead, should these two functionally similar moves be considered the same move type that occurs in two different sections? To address this question, a look at representative text samples and the multidimensional profiles of these moves is informative.
(58) MOVE 2 These findings raise the possibility that the differential expression of genes by lateral motor column and lateral motor column neurons and by ventral and dorsal limb mesenchymal cells coordinates this binary choice in motor axon trajectory.
(59) MOVE 14 It is interesting that the experiments in this paper were all carried out using assays for genetic interference in somatic tissues of the animal in the first generation after injection. It is conceivable that distinct mechanisms might operate in longer term RNAi (REF.) or in specific tissues, such as the germline.
Examples (58) of Move Type 2 and (59) of Move Type 14 show that they are stylistically different, and this stylistic difference can be captured by multidimensional analysis. Based on Figure 4.8, Move Types 2 and 14 are similar on Dimensions 2, 4, 5, 6, and 7. However, they are different to a certain extent on Dimensions 1 and 3. That is, on Dimension 1, Move Type 2 has the highest score, and Move Type 14 has a relatively high mean score (Conceptual vs. Specific Reference), suggesting that Move Type 2 focuses on a conceptual current state of knowledge of the topic, whereas Move Type 14 narrows down the scope of the reference to the current study only, and thus is relatively more specific with regard to referential information. The difference between the two move types is more distinct with respect to Dimension 3 (Evaluative stance): Move Type 14 has the highest mean score, whereas Move Type 2 has a negative mean score, reflecting much more use of evaluative stance features in Move Type 14 than in Move Type 2. This difference in mean scores indicates that in Move 14, when the scientists present evaluations of their own study, they tend to background their identification by using the positive Dimension 3 features: extraposed it, predicative adjectives, and that or to complement clauses controlled by evaluative adjectives. In contrast, the negative mean score of Move Type 2 on Dimension 3 indicates little use of these features when the scientists comment on previous research. Despite their similar function of expressing evaluation as determined by move analysis, Move Types 2 and 14 are indeed stylistically distinct on the parameter of Dimension 3: Evaluative stance.
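The kind of profile comparison made above can be sketched in code. The sketch below first rescales each dimension to a common ±10 range, as footnote 8 describes, and then lists the dimensions on which two move types differ by at least a chosen threshold. Several details are assumptions made for illustration only: the scaling coefficient is taken to be 10 divided by the largest absolute mean score on each dimension (the footnote does not give the coefficients), the function names and the threshold are hypothetical, and the numbers in the usage example are made-up toy values, not scores from the study.

```python
def rescale(mean_scores, limit=10.0):
    """Rescale each dimension so its largest absolute mean score equals `limit`.

    mean_scores maps a move label to a list of mean dimension scores.
    Assumption: coefficient = limit / max |score| on that dimension.
    """
    n_dims = len(next(iter(mean_scores.values())))
    coeffs = []
    for d in range(n_dims):
        peak = max(abs(scores[d]) for scores in mean_scores.values())
        coeffs.append(limit / peak if peak else 1.0)
    return {
        move: [scores[d] * coeffs[d] for d in range(n_dims)]
        for move, scores in mean_scores.items()
    }

def differing_dimensions(profile_a, profile_b, threshold):
    """Return 1-based dimension numbers where two profiles differ by >= threshold."""
    return [
        d + 1
        for d, (a, b) in enumerate(zip(profile_a, profile_b))
        if abs(a - b) >= threshold
    ]

# Toy values for two hypothetical move types on two dimensions.
toy = {"Move A": [20.0, -5.0], "Move B": [-10.0, 5.0]}
scaled = rescale(toy)
print(scaled)
print(differing_dimensions(scaled["Move A"], scaled["Move B"], threshold=16.0))
```

Rescaling per dimension keeps the sign and relative ordering of the move types on each dimension while making the dimensions visually comparable, which is the stated purpose of the transformation behind Figure 4.8.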
8 Multi-dimensional variation among move types within the same section
It is known that a research article typically has a well-defined four-section organization, known as IMRD (Introduction-Methods-Results-Discussion). Each section is clearly marked for its distinct communicative purpose by the names of the section. As shown in the present study, each section also consists of a number of rhetorical moves. For instance, the Introduction section consists of three main moves (Move Types 1–3). The moves from a section tend to pattern together on most dimensions, leading to the conclusion that the sections are linguistically uniform. For example, Figure 4.9 shows that the three move types within Introductions are very similar
in their characterizations with respect to most dimensions. However, there are also differences. For example, Move Type 2 (Preparing for the present study) has a markedly high dimension score on Dimension 7, indicating a focus on expressing contradiction. Similarly, Move Type 1 (Establishing the topic) has a high score on Dimension 5, indicating a high use of attributed knowledge in order to establish the centrality of the topic being presented. These distinctive characteristics of Move Types 1 and 2 demonstrate that Introduction sections are not homogenous. Although these three move types are related in their communicative functions and linguistic MD characterizations, they also each have distinct micro-purposes and linguistic characteristics. Figure 4.10 plots the mean scores for the two main move types within Methods sections, showing that Move Type 4 (Describing materials) and Move Type 5 (Describing procedures) are relatively uniform linguistically. That is, although there are some minor differences between these two move types, they are consistently similar in their multi-dimensional characteristics when compared to the other move types. In this case, we can identify two distinct communicative purposes within Methods sections, but those functions are related to each other, and there are few linguistic differences between these move types. Figure 4.11 similarly shows that the four move types of the Results section (Move Types 8–11) are very similar in their linguistic characteristics. That is, all of the four Results move types have intermediate scores on Dimensions 1, 2, 3, 6, and 7. However, there are also some differences: Move Type 10 (Stating results) has an extremely large negative score on Dimension 5 (Attributed knowledge vs. Current study), indicating a focus on the current study; and Move Type 11 (Commenting on the results) has a markedly large positive score on Dimension 4 (Projected interpretation), indicating a focus on making interpretations.
This same general pattern of linguistic homogeneity holds for the moves within Discussion sections (Move Types 12–15; see Figure 4.12). However, here also there are some differences. For instance, Move Type 14 (Stating limitations) has an extremely large positive score on Dimension 3 (Evaluative stance), indicating a high density of evaluative stance. In contrast, Move Type 12 (Contextualizing the study) has a negative score on this dimension, reflecting the absence of evaluative stance expressions. These patterns demonstrate the value of the novel application of multidimensional analysis in analyzing move types, reflecting the internal discourse organization of research articles. Without this level of fine grain analysis, these linguistic differences might not have been noticed. Thus, this study illustrates the power of integrating these two approaches to discourse analysis: move analysis and multidimensional analysis. Each approach provides a partial analysis of the discourse. Without the initial move analysis, the communicative variation within sections
would have been missed. Similarly, without the multi-dimensional analysis, the linguistic characteristics of rhetorical move types would not have been systematically characterized, and moves with similar functions would not have been differentiated linguistically. The study thus shows the complementary nature of these two steps in the analysis for characterizing textual variation at both macro and micro levels.
Figure 4.1 Mean scores of moves along Dimension 1: Conceptual vs. Specific Reference
Figure 4.2 Mean score of moves along Dimension 2: Concrete Action vs. Abstract Discussion
Figure 4.3 Mean score of Moves along Dimension 3: Evaluative Stance
Figure 4.4 Mean score of Moves along Dimension 4: Projected interpretation
Figure 4.5 Mean score of Moves along Dimension 5: Attributed Knowledge vs. Current Study
Figure 4.6 Mean score of Moves along Dimension 6: Stated Purpose
Figure 4.7 Mean score of Moves along Dimension 7: Contradictory Proposition
Figure 4.8 Multidimensional profiles of moves, highlighting Move 2 (Preparing for the present study), Move 5 (Describing experimental procedures), Move 8 (Restating methodological issues), and Move 14 (Stating limitations of the study)
Figure 4.9 Multidimensional profile of the Introduction moves: Move 1 (Establishing a topic; dashed line), Move 2 (Preparing for the present study; dotted line), and Move 3 (Introducing the present study; solid line)
Figure 4.10 Multidimensional profile of Methods moves: Move 4 (Describing materials; dotted line) and Move 5 (Describing experimental procedures; dashed line)
Figure 4.11 Multidimensional profile of the Results moves: Move 8 (Restating methodological issues), Move 9 (Justifying methodological issues), Move 10 (Announcing results), and Move 11 (Commenting results)
Figure 4.12 Multidimensional profile of the Discussion moves: Move 12 (Contextualizing the study), Move 13 (Consolidating results), Move 14 (Stating limitations of the study), and Move 15 (Suggesting further research)
Table 4A Descriptive dimension statistics for all move types

Dimension 1
Move    Observation   Mean       S.D.       Minimum   Maximum
1       251           9.9100     10.2693    22.60     49.42
2       45            11.8939    13.2279    14.89     46.06
3       65            10.4042    11.3340    14.78     42.95
4       60            8.8137     14.4933    41.47     20.27
5       485           4.1863     11.1414    36.90     32.54
8       550           4.9533     12.9123    39.04     44.58
9       252           7.1886     12.5910    30.68     43.59
10      911           4.8427     12.4696    48.02     41.97
11      449           6.3406     12.2484    29.18     44.83
12      361           6.8747     11.2302    25.00     43.16
13      532           6.0781     11.0062    26.21     38.99
14      27            7.6219     11.5713    21.20     29.69
15      21            9.7121     10.4349    13.32     29.78
Total   4009          4.7541     12.6066    48.02     49.42

Dimension 2
Move    Observation   Mean       S.D.       Minimum   Maximum
1       251           17.5677    7.1427     42.32     4.36
2       45            17.7664    9.3825     38.49     2.09
3       65            18.9329    5.6777     30.47     8.14
4       60            .8460      10.5354    32.48     21.02
5       485           5.8697     8.2884     39.50     18.25
8       550           13.0989    9.3133     40.47     13.25
9       252           16.7269    9.6283     38.50     15.79
10      911           14.0587    9.2742     47.48     15.89
11      449           18.6941    8.9365     42.86     7.21
12      361           18.5847    8.3080     46.99     4.37
13      532           20.1220    7.3091     49.48     4.36
14      27            20.1287    10.4172    48.69     1.35
15      21            21.6619    11.1997    43.78     1.19
Total   4009          15.0587    9.8304     49.48     21.02

Dimension 3
Move    Observation   Mean       S.D.       Minimum   Maximum
1       251           .4584      .9535      1.02      4.99
2       45            .3413      2.2196     1.02      11.88
3       65            .6987      .5571      1.02      .90
4       60            .8928      .4586      1.02      1.15
5       485           .8472      .5474      1.02      3.14
8       550           .4604      1.3653     1.02      10.98
9       252           .1337      1.8041     1.02      10.08
10      911           .3474      1.3442     1.02      9.62
11      449           .5721      2.4746     1.02      12.30
12      361           .2526      1.4341     1.02      10.74
13      532           .1682      1.8192     1.02      11.49
14      27            1.7880     3.5792     1.02      8.67
15      21            .7187      3.2546     1.02      9.80
Total   4009          .2308      1.6322     1.02      12.30

Dimension 4
Move    Observation   Mean       S.D.       Minimum   Maximum
1       251           1.0425     2.5298     2.20      13.19
2       45            3.0223     4.4581     2.20      15.65
3       65            2.2735     2.7794     2.20      12.08
4       60            .6286      3.0781     2.20      15.98
5       485           .8221      1.8644     2.20      9.22
8       550           .4196      3.2853     2.20      13.80
9       252           1.3391     3.5941     2.20      19.23
10      911           1.1888     3.2782     2.20      18.38
11      449           4.4347     4.7652     2.20      23.72
12      361           1.6635     2.8445     2.20      13.80
13      532           2.7587     3.4469     2.20      20.02
14      27            2.7309     4.4515     2.20      13.44
15      21            1.4118     3.7564     2.20      9.22
Total   4009          1.4774     3.6339     2.20      23.72

Dimension 5
Move    Observation   Mean       S.D.       Minimum   Maximum
1       251           2.6428     11.9913    47.97     34.76
2       45            6.2822     11.3331    34.35     16.85
3       65            5.2462     15.6066    57.70     23.41
4       60            9.7019     10.3624    34.76     14.62
5       485           2.6463     11.9192    37.38     32.84
8       550           16.2223    11.1704    52.99     14.05
9       252           9.3463     11.1193    41.88     21.82
10      911           17.5364    11.2713    51.00     15.40
11      449           10.2037    11.8279    53.62     18.11
12      361           1.8498     12.5951    41.16     38.74
13      532           2.5816     12.4715    38.41     34.01
14      27            5.6342     10.6608    27.32     18.33
15      21            8.0764     11.3309    32.50     19.45
Total   4009          8.9857     13.6489    57.70     38.74

Dimension 6
Move    Observation   Mean       S.D.       Minimum   Maximum
1       251           26.5783    8.3568     56.88     3.51
2       45            15.0389    6.8015     29.87     10.18
3       65            21.8115    9.8892     44.22     .23
4       60            14.7642    6.5455     26.60     .42
5       485           27.3203    9.9091     56.42     1.12
8       550           14.0081    6.7782     36.09     10.82
9       252           17.3601    6.9605     32.02     5.72
10      911           19.6434    6.5368     38.56     4.43
11      449           16.1855    6.2292     32.61     2.24
12      361           21.3855    8.6880     52.00     .75
13      532           22.2919    8.7855     47.89     4.72
14      27            12.5584    6.8255     24.91     2.28
15      21            11.5088    7.6612     21.12     3.14
Total   4009          20.0308    8.8577     56.88     10.82

Dimension 7
Move    Observation   Mean       S.D.       Minimum   Maximum
1       251           .6756      2.2510     2.35      10.17
2       45            4.2911     4.0254     2.35      15.50
3       65            .1782      2.1534     2.35      6.64
4       60            .8970      2.1400     2.35      5.99
5       485           .8082      1.4866     2.35      6.16
8       550           .6487      2.3168     2.35      11.94
9       252           .4396      3.1371     2.35      15.83
10      911           3.6772     3.8692     2.35      19.47
11      449           1.9431     3.7519     2.35      17.65
12      361           .9993      2.7780     2.35      16.41
13      532           1.7876     2.9378     2.35      15.51
14      27            2.8591     3.3072     2.35      9.19
15      21            1.0211     3.0093     2.35      8.00
Total   4009          1.3258     3.4480     2.35      19.47
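The columns of Table 4A (number of observations, mean, standard deviation, minimum, and maximum per move type, plus a pooled Total row) can be illustrated with a short Python sketch. This is not the procedure used in the study: the function name describe, the input layout, and the sample scores are hypothetical, and the use of the sample standard deviation is an assumption, since the table does not say which formula was applied.

```python
from statistics import mean, stdev


def describe(scores_by_move):
    """Return {move: (n, mean, sd, min, max)} plus a pooled 'Total' row.

    scores_by_move maps a move type to the list of dimension scores of
    its individual moves. Each list needs at least two scores, because
    the sample standard deviation is undefined for a single value.
    """
    rows = {}
    pooled = []
    for move, scores in scores_by_move.items():
        rows[move] = (len(scores), mean(scores), stdev(scores),
                      min(scores), max(scores))
        pooled.extend(scores)
    # The Total row is computed over all moves pooled together.
    rows["Total"] = (len(pooled), mean(pooled), stdev(pooled),
                     min(pooled), max(pooled))
    return rows


# Hypothetical dimension scores for two move types (not corpus data).
example = {2: [1.0, 3.0, 5.0], 14: [2.0, 4.0]}
summary = describe(example)
for move, (n, m, sd, lo, hi) in summary.items():
    print(f"{move!s:>5} {n:4d} {m:8.4f} {sd:8.4f} {lo:7.2f} {hi:7.2f}")
```

Note that the Total row is computed from the pooled scores rather than by averaging the per-move means, which matters when move types differ greatly in frequency, as they do in Table 4A.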
chapter 5
Rhetorical appeals in fundraising

WITH Molly Anthony & Kostyantyn Gladkov
This chapter presents another top-down approach to analyzing discourse structures in fundraising letters: rhetorical persuasion analysis. As outlined in Chapter 1 (see Table 1.1), the first step in a top-down corpus-based analysis of discourse organization is to determine the set of possible functional/communicative categories of discourse units. Instead of describing texts according to their semantic/functional purposes, as in move analysis, rhetorical persuasion analysis divides texts into sections using the three basic means of Aristotelian persuasion: ethos, pathos and logos. This chapter gives background about rhetorical persuasion, describes the set of rhetorical appeals, applies the text analysis to the ICIC corpus of fundraising letters, and attempts to describe the sequencing and location of rhetorical appeals in relation to rhetorical moves, explained in Chapter 3.
Elements of persuasion
Persuasion has been a topic of interest to scholars since ancient times. The first and most influential theory of persuasion was developed by the Greek philosopher Aristotle. According to Aristotle, in order to make an argument, one has to study the following categories: the means or sources of persuasion, the language, and the arrangement or organization of the various parts of the treatment. The means or sources of persuasion are strategies for making three appeals, those of ethos, pathos, and logos: "The first kind depends on the personal character of the speaker, the second on putting the audience into a certain frame of mind, the third on the proof, provided by the words of the speech itself" (Aristotle, 1984, p. 2155). The language of the argument had to be carefully crafted; word choice was important, and the use of appropriate topoi, or themes, and metaphors, or tropes, was encouraged. Finally, the arrangement was also important: "A speech has two parts. You must state your case, and you must prove it" (Aristotle, 1984, p. 2257). A well-organized or properly arranged speech had three parts: introduction, argument and
counterargument, and epilogue. "What you should do in your introduction is to state your subjects, in order that the point to be judged may be quite plain; in the epilogue you should summarize the argument by which your case has been proved" (Aristotle, 1984, p. 2268). Naturally, the organizational parts (introduction, argument, counterargument, and epilogue) are important in making any formal argument. As moves, they could be expected to be obligatory and appear in a typical sequence. In truly persuasive discourse, however, the three means of persuasion (ethos, pathos, logos) are equally or more important. They can be identified in texts using a top-down, discourse-structural analysis. Let us first examine how Aristotle defined these three categories of persuasion. Firstly, ethos is used in order to create a positive character of the writer. For Aristotle, "the character of the speaker is a cause of persuasion when the speech is so uttered as to make him worthy of belief; for as a rule we trust men of probity more, and more quickly about things in general, while on points outside the realm of exact knowledge, where opinion is divided we trust them absolutely" (p. 8). That is, persuasion can be achieved only by those speakers who appear to be positive characters for the audience. According to Aristotle, the worthiness of the cause presented by speakers depends upon the worthiness and reliability of the speakers themselves; thus, the speaker's image in the eye of the audience is the crucial point against which the audience will test the worthiness of the cause. Therefore, it is the writers' responsibility to produce such an image of themselves that they would be thought of as reliable and unfailing people. Pathos, another of the basic means of persuasion, is used when the audience is set into an emotional state by the speaker.
"Persuasion is effected through the audience when they are brought by the speech into a state of emotion; for we give very different decisions under the sway of pain or joy, liking or hatred" (Aristotle, 1932, p. 9). Only an emotionally unsettled audience can react in the way the speaker intends it to react, because emotions serve as an impulse to take a certain action, and very often the audience will look at the presented case through the prism of their emotions. As Aristotle mentioned, to the audience that is eager and hopeful, the proposed object will seem as a valuable and worthy thing, while to the audience that is pessimistic and distrustful, the same object will seem the opposite (p. 91). Lastly, logos is employed when the speaker appeals to the reasonable side of the audience by utilizing rational arguments. As Aristotle noted, "persuasion is effected by the arguments, when we demonstrate the truth, real or apparent" (Aristotle, 1932, p. xlii). In other words, facts, statistics, and information constitute an essential part of persuasion. Aristotle's rhetorical theory is powerful; it has survived through centuries. Rhetoricians and philosophers of all centuries appreciate it, highly commend it and
use it as the solid basis for their own theories. Chaim Perelman was one of the 20th-century rhetoricians who adapted the ancient rhetorical theory to the contemporary discourse of persuasion. What Perelman did in his theory was the enlargement of understanding of argument by noting numerous subforms of argumentation (Arnold, 1982, p. 10). Exploring rhetorical theories of ancient philosophers, Perelman added some fundamental points to his theory of argumentation. For example, while describing one of the basic types of the arguments, namely, the Argument Based on the Structure of Reality (logos, in Aristotelian theory), he developed a new subtype of argument called Cause and Effect. "It is thus the truth of an idea that can be judged by its effects" (Perelman, 1982, p. 83). This kind of argument allows the audience to appraise the cause the speaker proposes through the consequences this cause will have in the future. Another new subtype of argument developed by Perelman was the argument of Model. According to Perelman, this type of argument can be employed when the speaker tries to persuade the audience through presenting a specific case as a model to be imitated. Thus, Perelman, led by the necessities of contemporary discourse of argumentation, used Aristotelian theory of persuasion to develop his theory of new rhetoric. Aristotle's and Perelman's rhetorics have influenced research in composition and rhetoric in the U.S. (Corbett, 1965; Kinneavy, 1971). However, scholars in applied linguistics have not typically sought answers to persuasion in classical rhetoric. For example, consider the extensive research on academic persuasion by Hyland (1999a; 2001; 2002a; 2002b; 2004a). Although Hyland writes, "In fact, the ways that writers establish their credibility (or create an ethos) and consider readers' potential attitudes to the argument (pathos) date back to Aristotle" (2004a, p. 89), his analyses do not connect with the theories of Aristotle and new rhetoricians.
Instead, his persuasion categories seem to stem directly from the data. Persuasive appeals analysis can be applied to any kind of persuasive text, e.g., essays, sales letters and reports. Specific appeals strategies may vary across genres, but the basic appeal types (logos, ethos and pathos) apply. The appeals analysis, like the other analyses in this book, provides one hundred percent coverage of the text. In other words, every sentence and word in the text can be categorized by one of the appeals. We believe that Aristotle's and Perelman's theories provide an important complementary perspective on the discourse of direct mail letters. While rhetorical moves segment texts according to the communicative functions of texts, the primary role of appeals is not necessarily to communicate information for the reader. Instead, their intent is to make the reader act and, in the case of fundraising appeals, to donate. Even if appeals communicate information and facts about a cause, they do not do so for information's sake but to make the reader do something about the situation, in the form of giving money. In fact, even in logos, the writer
has at his/her disposal a variety of different strategies for the best possible persuasive outcome. The writer can use facts and statistics, cause-effect description, stories, etc. The study of appeals reveals how different strategies are used, not only to communicate but also to affect action. This chapter will apply a rhetorical appeals analysis to the ICIC sub-corpus of fundraising letters, introduced in Chapter 3.
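The idea that an appeals analysis gives one hundred percent coverage of a text can be illustrated with a small Python sketch. It is offered only as an illustration under stated assumptions: the function name appeal_coverage, the representation of a coded text as (unit, label) pairs, and the placeholder units are all hypothetical and are not part of the analysis reported in this chapter.

```python
from collections import Counter

# The three Aristotelian means of persuasion used as top-level labels.
APPEAL_TYPES = {"logos", "ethos", "pathos"}


def appeal_coverage(coded_units):
    """Return (coverage, counts) for a list of (text, label) pairs.

    coverage is the share of units that carry one of the three appeal
    labels; a fully coded text yields 1.0. A label of None (or any
    other value) marks a unit that has not been assigned an appeal.
    """
    counts = Counter(label for _, label in coded_units
                     if label in APPEAL_TYPES)
    coded = sum(counts.values())
    coverage = coded / len(coded_units) if coded_units else 0.0
    return coverage, counts


# Placeholder units with hypothetical labels (not corpus data).
units = [
    ("unit 1", "logos"),
    ("unit 2", "ethos"),
    ("unit 3", "logos"),
]
coverage, counts = appeal_coverage(units)
print(coverage, dict(counts))
```

A coverage value below 1.0 would signal that some stretch of text had been left uncoded, which the approach described here does not allow.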
2 Determining and analyzing rhetorical appeals
In order to evaluate the degree of persuasion in the direct mail letters, a working system of appeals was needed. Such a system is found in Connor and Lauer's (1985) work on persuasive writing. This system of persuasive strategies was designed and successfully used for teaching and evaluating college-level students' argumentative essays. It includes 23 persuasive appeals with 14 rational appeals (logos), 4 credibility appeals (ethos), and 5 affective appeals (pathos). However, for the present analysis of fundraising discourse, only 19 of these appeal types are relevant, summarized in Table 5.1. Table 5A at the end of the chapter includes examples of each appeal.
Table 5.1 Definitions of rhetorical appeals

Rational appeals (Logos)
R1   Descriptive Example: Using a compelling descriptive example from one's own or someone else's experience
R2   Narrative Example: Using a compelling narrative example. Must contain a beginning, middle, and end of the story
R3   Classification: Placing in a class or unit, and describing what that means
R4   Comparison: Using comparison to support one's focus
R5   Contrast: Using contrast to support one's focus
R6   Degree: Arguing that two things are separated by a difference of degree rather than kind, or making an appeal for an incremental change
R7   Authority: Using the authority of a person other than the writer
R8   Cause/effect, Means/end, Consequences: Showing how one event is the cause of another
R9   Model: Proposing a model for action that relies on existing programs
R10  Stage in process: Reviewing previous steps and looking forward to what steps need to be taken
R11  Ideal or Principle
R12  Information: Using supporting facts and statistics

Credibility appeals (Ethos)
C13  First hand experience: Providing information to show first hand experience or some authority on the subject
C14  Showing writer's respect for audience's interests and point of view
C15  Showing writer-audience shared interests and points of view
C16  Showing writer's good character and/or judgment

Affective appeals (Pathos)
A17  Appealing to the audience's views (emotional, attitudinal, moral)
A18  Vivid picture: Creating a thought, a mind's eye vision
A19  Charged language: Using strong language used to arouse emotions
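For readers who wish to apply the scheme computationally, the 19 appeal codes of Table 5.1 can be sketched as a simple lookup structure in Python. The codes and labels follow the table (some labels are shortened); the names APPEALS, PREFIX_TO_MEANS and means_of are hypothetical and chosen only for this illustration.

```python
from collections import Counter

# Appeal codes and (partly shortened) labels as listed in Table 5.1.
APPEALS = {
    "R1": "Descriptive Example",
    "R2": "Narrative Example",
    "R3": "Classification",
    "R4": "Comparison",
    "R5": "Contrast",
    "R6": "Degree",
    "R7": "Authority",
    "R8": "Cause/effect, Means/end, Consequences",
    "R9": "Model",
    "R10": "Stage in process",
    "R11": "Ideal or Principle",
    "R12": "Information",
    "C13": "First hand experience",
    "C14": "Respect for audience's interests and point of view",
    "C15": "Writer-audience shared interests and points of view",
    "C16": "Writer's good character and/or judgment",
    "A17": "Appealing to the audience's views",
    "A18": "Vivid picture",
    "A19": "Charged language",
}

# The letter prefix of each code identifies the means of persuasion.
PREFIX_TO_MEANS = {"R": "logos", "C": "ethos", "A": "pathos"}


def means_of(code):
    """Map an appeal code such as 'R8' to logos, ethos, or pathos."""
    return PREFIX_TO_MEANS[code[0]]


# Tally how many appeal types fall under each means of persuasion.
counts = Counter(means_of(code) for code in APPEALS)
print(len(APPEALS), dict(counts))
```

The tally reproduces the composition of the table: twelve rational, four credibility, and three affective appeals.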
2.1 Rational appeals (Logos)
Rational arguments are designed to appeal to the sensible and rational aspect of the reader's mind. For example, the second type of argument, Narrative Example (R2), contains a beginning, middle, and end of a story.
(1) Ted is a single father with three children under 10. He's never been on welfare and he's always had a job doing manual labor... There was a time when he felt like he had no choice but to tolerate his wife's constant abuse and neglect of their children. Then Ted decided the children deserved a chance to start over in another town, no matter how difficult it might prove to be.
The author of this description is trying to make the reader see the dreadfulness of the situation. Moreover, reading this example describing one family, the reader, according to the logical rule of induction, infers a general conclusion that such an example is true of a number of families. Such a conclusion, that this appalling family situation is true not only of a single family, but of a number of families, makes the reader willing to react to this appeal. Depicting a family of a certain type, the author exemplifies all families of this particular type, thus intensifying the effect of the appeal by implying that the number of unhappy families is actually bigger than just one. Descriptive Example (R1) is another appeal of this type of argument. (Table 5A at the end of the chapter again provides the definitions, with an example of each of the different appeal types.) Another type of argument found in Aristotle's theory is the argument of Classification (R3). This kind of argument places a person or a thing into a certain class and then offers defining features. Our example of Classification would be as follows: "In joining SCS, you join the ranks of those who believe that bringing art and art education to the city makes life better, richer, and more rewarding for the entire community." By making this rational appeal, the writer classifies and then defines the reader as a member of a noble group by making him akin to a limited circle of noble and distinguished individuals. The next two appeals, Comparison (R4) and Contrast (R5), build a logical argument on the relationship of like to like. In a fundraising letter, the appeal of Comparison would sound as follows: "Our faculty-student ratio is 1:27. For law schools in the United States, the range of faculty-student ratios is from 1:13 to 1:35, but well over half of the law schools in the country have better ratios than we do." Comparison supports a conclusion about a subject from a description of a related subject; as in our example, the conclusion about one law school can be made from descriptions of other law schools. Unlike Comparison, the appeal of Contrast supports a conclusion on a subject by describing its counterpart. For instance:
(2) Unfortunately, our view of the importance of philanthropy is not shared by all Americans. Many see philanthropy as no more than the grand gestures of the rich. They do not understand, as you do, that the museums, parks, hospitals and community organizations supported by philanthropy are the cornerstones of our very quality of life.
In this example, the writer's opinion of the donor is raised by denigrating his/her counterparts: people who do not donate. The rational appeal of Degree (R6) is called an argument of More or Less in Aristotle's original theory. The rational principle of this argument, according to Aristotle (1932, p. 161), can be expressed by the following example: if the less frequent thing occurs, then the more frequent thing would occur. In fundraising discourse, one comes across the appeal of Degree in the form of asking for an increase in donations. For instance: "Please consider an increase in your contribution to the Girls Scout Annual Campaign." By employing this appeal, the writer implies that if the donor has already given X amount of money, the next logical step would then be to increase their donation. One type of argument based on Perelman's category of person is the appeal of Authority (R7) (Perelman, 1982). The argument of Authority relies on the consistency between a person and his/her activities. In the argument from authority, prestige is the quality that leads others to imitate acts of authoritative people. The
Authority appeal in fundraising discourse would employ a distinguished name to make the reader act under the influence of someone who is authoritative. For example: "Pat LaCrosse asked me to send this information inviting you to join the Georgia O'Keefe Circle of the Indianapolis Museum of Art's Second Century Society (SCS)." The author used the name of Pat LaCrosse without explaining who this person is. The author assumes that the reader will be acquainted with Pat LaCrosse and will consider his actions authoritative. "The example of the great is a rhetorician of such power that it can persuade people to commit the most infamous acts" (Perelman, 1982, p. 217). An important name brings the flavor of authoritativeness to the discourse and makes it even more persuasive by presenting a model to be imitated by the reader. The appeal of Cause/Effect Means/End Consequences (R8) stems from both Aristotle's and Perelman's (Perelman, 1982, p. 83) theories. According to Aristotle, "Since it commonly happens that a given thing has consequences both good and bad, you may argue from these [to their antecedents] in urging or dissuading, in prosecuting or defending, in praising or blaming" (Aristotle, 1932, p. 166). This appeal helps the writer to urge action on the reader's part by forecasting effects, consequences, or ends. Perelman adds, "Consequences can be observed or foreseen, ascertained or presumed. It is the truth of an idea that can only be judged by its effects" (Perelman, 1982, p. 83). Thus, the writers of direct mail letters often employ the Cause/Effect Means/End Consequences appeal to let the reader evaluate an event through its described outcomes.
For instance: "As one of only a few zoos in the country that receives no local, state, or federal tax support, IZS must depend on donations for general operating funds from corporations like yours..." Here, the reader is urged to contribute in order to supply necessary funds to the organization that "receives no local, state, or federal tax support," and as a consequence "must depend on donations." The appeal of Model (R9), as discussed by Perelman, provides the reader with a description of the way a proposed end can be achieved. A working model reflects and supports the current case by a precedent. For instance:
(3) A group of your colleagues recently volunteered to help set the priorities for this campaign. They surveyed members of the staff and faculty councils, administrators and others and learned that we at IUPUI have a number of vital concerns.
Here, the author gives the reader a precedent ("A group of your colleagues recently volunteered") to make him/her follow this model and take the same actions. Stage in Process (R10) is also an important argument in the theory of persuasion. According to Perelman, this appeal is used when a gap exists between the concept accepted by the audience and the proposal the writer is defending. The
gap is closed by showing how the proposed action can be a stage in a process. "Instead of going from A to D, one offers to lead the interlocutor first to B then to C and finally to D" (Perelman, 1982, p. 87). In other words, when the audience might think that the distance between the initial and final stage or goal of the process is impossible to cover, the writer creates one or more middle stages or transitional goals, which, in the audience's opinion, would be easier to reach. For instance:
(4) Three years ago, the Heritage Trust set aside land for the restoration of the Limberlost Swamp, near Geneva in eastern Indiana. Now, wildlife is returning to the area. Egrets, ducks and geese now gather at waterfowl resting ponds in large numbers; and native prairie grass has been planted to return natural diversity and other wildlife to the area.
Before the author indicates the final step, which in this case would be "to return natural diversity...to the area," he reviews what steps have been taken ("set aside land for restoration...native prairie grass has been planted") in the long process of achieving the final goal: "to return natural diversity...to the area." The rational appeal of Ideal or Principle (R11) also helps persuade readers. "A convincing discourse is one whose premises are universalizable, that is acceptable in principle to all members of the universal audience" (Perelman, 1982, p. 18). While persuading the audience, the writer should show that his/her argument is based on a universal principle that is accepted by all members of the audience. In the fundraising letters, an example of this appeal occurs as follows:
(5) The mission of the Indianapolis Zoological Society is to provide recreational learning experiences for the citizens of Indiana through the exhibition and presentation of natural environment in a way to foster a sense of discovery, stewardship, and the need to preserve the Earth's plants and animals. In short, the Society is about connecting animals, plants and people.
In this example, the writer establishes a specific value ("providing recreational learning experience...through the exhibition of natural environment") under a universal value: "connecting animals, plants and people." If all members of the audience agree on the fact that bringing animals, plants and people together is valuable, then they would more quickly agree on a more determined value of learning the environment through the exhibition. The last rational appeal, Information (R12), also contributes to successful persuasion. "The speaker must, first of all, be provided with a special selection of premises (facts)... The more facts he has at his command, the more easily he will make the point" (Aristotle, 1932, p. 157). The appeal of Information presents facts and statistics and gives definiteness to the writer's argument. The writers of fundraising letters must persuade the audience not by vague generalities, but by providing the reader with accurate and meaningful numbers. For example:
(6) Through the efforts of about 300 volunteers, nearly $89,900 was raised through the IUPUI Campus Campaign. Almost 900 of us made new gifts in support of the things we care about. Together with those who were already donors, there are over 1,350 staff and faculty supporting the work of IUPUI with their gifts.
The numbers in this paragraph show the reader the definiteness of the writer's point on the one hand, and on the other they demonstrate that the writer is knowledgeable on the subject. To conclude this section on rational appeals, it can be stated that rational appeals are used to target the logical and rational side of the audience's mind (logos). These twelve arguments are employed by the writers to demonstrate the truth to the reader in a persuasive way. As Perelman (1982, p. 13) noted, one of the aims of persuasive discourse, and, consequently, of fundraising discourse, is to make the reader admit the truth and to provoke him to take an immediate or eventual action. However, one should remember that, apart from logos, persuasion is also effected through ethos, the character of the writer.
2.2 Credibility appeals (Ethos)
According to Aristotle, the discourse must not only convince through the argument, it must create a trustworthy image of the speaker.
The character of the speaker is a cause of persuasion when the speech is so uttered as to make him worthy of belief; for as a rule we trust men of probity more, and more quickly about things in general, while on points outside the realm of exact knowledge, where opinion is divided we trust them absolutely. (Aristotle, 1932, p. 8)
In fundraising discourse, the writer plays an important role because the goal of direct mail letters is to elicit a response from the audience in the form of giving money to a particular non-profit organization. It is almost always the case in direct mail letters that the organization is represented by the writer. Since the trustworthiness and reliability of the organization can be a crucial factor in the donor's decision whether to give money or not, it is the writer's responsibility to create such an image of him/herself and the institution in the letter that he/she would be thought of as a reliable and unfailing person. The first credibility appeal is the appeal of First Hand Experience (C13). In fundraising discourse, it is used as a technique for providing information directly from the writer's experiences, thus establishing the writer's credibility; it gives the
impression that the writer is knowledgeable and versed on the subject he/she is talking about. An example follows:
(7) Purdue has been a part of my life for as long as I can remember. I was raised in West Lafayette. As I grew older, I realized more and more that Purdue isn't just a state institution; it is a public university. Moreover, it is a world-class university.
This example indicates that the writer is a knowledgeable person who knows and cares about Purdue University. Thus, the author of the letter tries to create an impression of him/herself as an individual of intelligence and virtue through the display of deep respect and gratitude for the place where he/she was educated: Purdue isn't just a state institution; it is a public university ... it is a world-class university. The next appeal centers on the Writer's Respect for Audience's Interests and Point of View (C14) and is employed to create the necessary impression of a good-willed writer in the audience's mind. This appeal often takes the form of the writer's appreciation for what the donors have done for the organization. For example:
(8) In looking back at the last decade, we at the Indianapolis Zoological Society (IZS) wish to express our sincere thanks to all companies who have helped us to achieve so many successes at the Indianapolis Zoo and White River Gardens.
Since he/she is so appreciative of the noble and virtuous deeds of others, the audience would consider the writer as a man of good will. When a writer acknowledges shared values and ideas that are held with the audience, this reflects the appeal of Showing writer-audience shared interests and points of view (C15). Using this appeal, the author builds up solidarity with the audience by making himself a part of it. For example:
(9) Because if you and I truly want to preserve philanthropy as a way of life, we must make certain that Americans everywhere take philanthropy seriously, that they talk about it, debate it, challenge it, and ultimately keep it alive as a cherished tradition.
The last of the credibility appeals is based on the Writer's Good Character and/or Judgment (C16). It implies the same Aristotelian ideas of intelligence, virtue, and good will, but is focused on the creation of the image of the writer. In the case of this appeal, the author may take a subjective stance to make a judgment. For example: Who helps Randy break a cycle of violence and become a better dad? Who helps Michael, who has spina bifida, learn to talk, dress himself, and get around independently? Without you, no one. Such a judgment should work towards contributing to the positive image of the writer in the reader's eye. By making positive comments about the reader, a positive helping character, the writer urges the reader to view him/her as a person of good intentions, because it takes good will to notice and appreciate the good deeds of others. To conclude the discussion about credibility appeals (ethos), it can be stated that persuasion cannot be effective without taking into consideration the role of the writer's image. So far, we have talked about the role of rationality and credibility in the theory of persuasion; however, Aristotle defined a third essential aspect of persuasion theory, namely, emotional or affective appeals (pathos).
2.3 Affective appeals (Pathos)
Persuasion is effected through the audience when they are brought by the speech into a state of emotion; for we give very different decisions under the sway of pain or joy, liking or hatred (Aristotle, 1932, p. 9). Emotions can serve as an impulse to take a certain action, and very often the audience will look at the presented case through the prism of their emotions. As Aristotle mentioned, to the audience that is eager and hopeful, the proposed object will seem a valuable and worthy thing, while to the audience that is pessimistic and distrustful the same object will seem the opposite (p. 91). The following discussion presents the three appeals that are used in fundraising discourse to target the emotional aspect of the audience's mind. Appeal to the Audience's Views (A17) arouses emotions in the reader by addressing his/her attitudinal and moral values. In fundraising discourse, this appeal can take the form of a direct request to donate, for this or that reason. For instance: Please make a tax-deductible gift to Community Centers of Indianapolis in 1999, and know that you are playing an important part in meeting the needs of its community. In this example, the author makes an emotional appeal to donate followed by a reason for the donation. The word tax-deductible also appeals to the audience's values, suggesting that the donor also may profit by way of cutting his/her tax. The next affective appeal, Vivid Picture (A18), is very important to persuasion theory in the sense that it creates the effect of the presence of a reader in a situation depicted by the writer. Consequently, the writer, trying to persuade the audience, needs to bring an object as close to the audience as possible. For example:
(10) Do you remember how wonderful and how proud you felt in 1980 when the young United States Hockey Team beat the powerful Soviet Team 4-3, and then went on to beat Finland 4-2 for the gold or in 1984 when 16-year-old Mary Lou Retton, needing a 9.95 in her final event to tie for first place in the all around Gymnastics competition, vaulted her way to the gold by scoring a perfect 10?
Putting the statement into the form of a question involves the reader and makes him/her look for the answer, and thus makes him/her present at the event that took place long ago. Dwelling on the details creates desired emotions in the reader. Thus, creating a Vivid Picture is an essential appeal to arouse desired emotions in the reader. The last appeal in the system, Charged Language (A19), is the appeal that usually arouses emotions of anger and indignation. The language that is used by the writer to evince those emotions has a negative connotation. As Aristotle (1932, p. 122) said, the writer should heighten the effect of his description with fitting attitudes, tones, and dress. The emotions should be appropriate to the subject, and if the writer wants the audience to experience anger, he needs to be angry in his language. For instance: When it comes to the misuse and destruction of our natural areas, reality is not only harsh, it is deadly. Once they are developed or altered, and their fragile ecosystems are disrupted, we lose them forever. Such words as misuse, destruction, harsh, deadly, and lose ... forever are charged with negative emotions. While employing such an angry description, the writer attempts to make the audience experience the relevant emotion. Consequently, being in that emotional condition, the readers might take the relevant action.
3 Analysis, segmentation, and classification
The system of 19 rhetorical appeals was applied to the 245 fundraising letters in the ICIC corpus. First, a sample of 12 letters was evaluated to ensure a high coefficient of interrater reliability. Three trained researchers worked separately on the identification of appeals in their copies of the sample. After the negotiation of differences in the analysis and finalization of the system, another sample of 50 letters was analyzed to test the level of agreement among the raters.1 After all the discrepancies were negotiated, the other 183 direct mail letters were put to analysis. Each occurrence of a particular appeal in the letters was identified, coded, and then manually counted. A sample tagged letter is shown in Table 5B. The results of the analysis are presented in the following section.
1. The total number of appeals identified was 463. The number of appeals with disagreement was only 38, which resulted in a high reliability coefficient of r = .92. Some of the initial definitions of appeals were further refined in order to better describe fundraising discourse. The following section describes the final system and the theoretical basis for it.
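The agreement figure in note 1 can be checked with simple arithmetic. The sketch below assumes that the coefficient was obtained as the proportion of appeals coded without disagreement; the note does not state the formula, and the variable names are illustrative only.

```python
# Counts reported in note 1.
total_appeals = 463   # appeals identified in the reliability sample
disagreements = 38    # appeals on which the raters disagreed

# Assumption: reliability = proportion of appeals coded without disagreement.
agreement = (total_appeals - disagreements) / total_appeals

print(f"{agreement:.2f}")
```

Under this assumption the proportion rounds to the reported value of .92.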
3.1 Results and discussion
The results of the segmentation and classification are shown in Table 5.2 and Table 5.3. The overall number of appeals in the 245 sample letters was 1,829. Table 5.2 shows the breakdown of numbers and percentages of appeals by appeal type (rational, credibility, and affective) and by non-profit field. The overall percentage of rational appeals in all letters was 48%; the corresponding percentages for credibility and affective appeals were 25% and 28%, respectively. Table 5.2 indicates that the use of rational appeals was quite consistent across all six fields; however, the high amounts of use in the Health and Human Services and Environment fields (55% and 47%) were unexpected, since common wisdom in fundraising suggests more emotional appeals in these fields. Human services letters are typically seen as appealing to readers through human sob stories, while environmental fundraisers are often seen as liberal idealists.
Table 5.2 Fundraising letter appeals counts and percentages by non-profit field
Field                                    Rational Appeals  Credibility Appeals  Affective Appeals  Total Appeals
Health and Human Services Letters (74)   320 (55%)         104 (18%)            153 (27%)          577
Environmental Letters (10)               43 (47%)          21 (23%)             27 (30%)           91
Community Development Letters (10)       34 (49%)          17 (24%)             19 (27%)           70
Education Letters (108)                  316 (44%)         214 (30%)            193 (27%)          723
Arts and Culture Letters (37)            138 (44%)         83 (27%)             91 (29%)           312
Other Letters (6)                        19 (34%)          14 (25%)             23 (41%)           56
All Letters (245)                        870 (48%)         453 (25%)            506 (28%)          1829
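The percentages in Table 5.2 are row percentages: each count is divided by the total number of appeals in that field. A minimal sketch of the calculation follows, using the counts from the table; the function and variable names are illustrative and are not part of the original study.

```python
# Appeal counts from Table 5.2: (rational, credibility, affective) per field.
counts = {
    "Health and Human Services": (320, 104, 153),
    "Environmental": (43, 21, 27),
    "Community Development": (34, 17, 19),
    "Education": (316, 214, 193),
    "Arts and Culture": (138, 83, 91),
    "Other": (19, 14, 23),
}


def row_percentages(row):
    """Return each count as a percentage of the row total, to one decimal."""
    total = sum(row)
    return tuple(round(100 * n / total, 1) for n in row)


# Share of each appeal type within every field.
shares = {field: row_percentages(row) for field, row in counts.items()}

# Column sums reproduce the "All Letters" row.
overall = tuple(sum(column) for column in zip(*counts.values()))
overall_pct = row_percentages(overall)

for field, pct in shares.items():
    print(f"{field:<28}{pct}")
print(f"{'All Letters':<28}{overall} {overall_pct}")
```

The column sums come to 870, 453, and 506 appeals, that is, 47.6%, 24.8%, and 27.7% of the 1,829 appeals, which Table 5.2 reports rounded to whole percentages.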
Concerning the use of credibility appeals, Table 5.2 shows that Education had the highest percentage of these appeals (30%), and Health and Human Services the lowest percentage (18%). The high percentage of credibility appeals in Education reflects the relationship between the writer representing educational agencies and the target audience. Most of the letters in the corpus come from Indiana University schools, such as the School of Dentistry, the School of Law, and the School of Liberal Arts. These letters were addressed to former students of IU and were, for the most part, authored by faculty personally acquainted with the addressees. In the letters, the writers stress the interpersonal connection with students so that the students would find the information in the letter credible and, thus, more appealing.
As indicated in Table 5.2, 41% of the appeals in the letters representing Other Organizations were affective appeals, which is significantly higher than in any other field. The letters in this category represent mainly religious organizations, which address the audience through the extensive use of affective appeals. Table 5.3 shows separate counts and percentages for each specific appeal type. As Table 5.3 indicates, among rational appeals, R8 (Cause/Effect Means/End Consequences) and R12 (Information) had the highest percentages: 10.3% and 17.7%, respectively. The credibility appeal that occurred most often was C14 (Showing Writer's Respect for Audience's Interests and Point of View; 15.4%). Finally, among affective appeals, the highest percentage occurred with A17 (Appealing to the Audience's Views; 21.2%). In summary, the results here show that all three major types of appeals are used: rational, credibility, and affective. However, the extent of the use of these appeals in the letters is not equal. Rather, the writers of fundraising letters choose, for the most part, to persuade the audience through the use of the rational appeal. As a matter of fact, in the Health and Human Services field, it is used about twice as often as affective appeals and about three times as often as credibility appeals (Table 5.2). The prevalence of logos in the letters was a surprising finding. Common wisdom would expect emotion from fundraisers. However, a previous study (Connor & Upton, 2003) also found that fundraising letters resemble academic writing according to Biber's multidimensional analysis. They are carefully constructed and polished. Informal interviews with fundraisers suggest that they want to sound factual and be taken seriously. Also, it should be noted that the narrative and descriptive examples in our rating system count as rational appeals. Since they are often rather lengthy, too, we might be advised to analyze the data after removing them.
It would also be interesting to see whether variation in the expected audience would change the type of appeals used. For example, would a younger audience respond better to emotional appeals than an older one? As far as the use of individual appeals is concerned, we can conclude that the most extensively used rational appeals are the ones that provide the audience with the beneficial results or consequences of a particular philanthropic program (R8), and those appeals that provide the reader with information about the organization (R12). The credibility of the organization is most often achieved when the writers demonstrate appreciation of the donor's past actions (C14) and less often by stressing organization-donor shared interests and goals (C15). Among the affective appeals, the emotional appeal to the donor's views and attitudes, which in fundraising texts takes the form of a direct request for donation, stands out as the most frequently used individual appeal.
Table 5.3 Individual appeals counts and column percentages, by non-profit field
Appeal       Health and      Environment     Community       Education       Arts and        Other           Total
             Human Services                  Development                     Culture
R1           23 (4.0%)       1 (1.1%)        2 (2.9%)        7 (1.0%)        2 (0.7%)        0 (0.0%)        35 (1.9%)
R2           15 (2.6%)       2 (2.2%)        3 (4.3%)        2 (0.3%)        1 (0.3%)        0 (0.0%)        23 (1.3%)
R3           9 (1.6%)        3 (3.3%)        1 (1.4%)        16 (2.2%)       7 (2.2%)        0 (0.0%)        36 (2.0%)
R4           2 (0.4%)        0 (0.0%)        0 (0.0%)        5 (0.7%)        5 (1.6%)        0 (0.0%)        12 (0.7%)
R5           11 (1.9%)       1 (1.1%)        0 (0.0%)        4 (0.6%)        2 (0.6%)        2 (3.6%)        20 (1.1%)
R6           26 (4.5%)       3 (3.3%)        3 (4.3%)        19 (2.6%)       10 (3.2%)       0 (0.0%)        61 (3.3%)
R7           6 (1.0%)        3 (3.3%)        5 (7.1%)        20 (2.8%)       9 (2.9%)        4 (7.1%)        47 (2.6%)
R8           74 (12.8%)      7 (7.7%)        4 (5.7%)        67 (9.3%)       34 (10.9%)      2 (3.6%)        188 (10.3%)
R9           1 (0.2%)        0 (0.0%)        0 (0.0%)        4 (0.6%)        0 (0.0%)        0 (0.0%)        5 (0.3%)
R10          20 (3.5%)       2 (2.2%)        3 (4.3%)        46 (6.4%)       7 (2.2%)        2 (3.6%)        80 (4.4%)
R11          18 (3.1%)       0 (0.0%)        3 (4.3%)        16 (2.2%)       2 (0.6%)        0 (0.0%)        39 (2.1%)
R12          115 (19.9%)     21 (23.0%)      10 (14.3%)      110 (15.2%)     59 (18.9%)      9 (16.1%)       324 (17.7%)
R Total      320 (55.5%)     43 (47.3%)      34 (48.6%)      316 (43.7%)     138 (44.2%)     19 (33.9%)      870 (47.6%)
C13          22 (3.8%)       1 (1.1%)        5 (7.1%)        56 (7.8%)       16 (5.1%)       2 (3.6%)        102 (5.6%)
C14          63 (10.9%)      15 (16.5%)      8 (11.4%)       124 (17.2%)     61 (19.6%)      10 (17.9%)      281 (15.4%)
C15          10 (1.7%)       2 (2.2%)        0 (0.0%)        23 (3.2%)       4 (1.3%)        1 (1.8%)        40 (2.2%)
C16          9 (1.6%)        3 (3.3%)        4 (5.7%)        11 (1.5%)       2 (0.6%)        1 (1.8%)        30 (1.6%)
C Total      104 (18.0%)     21 (23.1%)      17 (24.3%)      214 (29.6%)     83 (26.6%)      14 (25.0%)      453 (24.8%)
A17          107 (18.5%)     18 (19.8%)      13 (18.6%)      161 (22.3%)     73 (23.4%)      15 (26.8%)      387 (21.2%)
A18          35 (6.1%)       7 (7.7%)        4 (5.7%)        30 (4.2%)       18 (5.8%)       6 (10.7%)       100 (5.5%)
A19          11 (1.9%)       2 (2.2%)        2 (2.9%)        2 (0.3%)        0 (0.0%)        2 (3.6%)        19 (1.0%)
A Total      153 (26.3%)     27 (29.7%)      19 (27.1%)      193 (26.7%)     91 (29.2%)      23 (41.0%)      506 (27.7%)
Field Total  577             91              70              723             312             56              1829
4 Linguistic description of appeals
In order to explore ways in which appeals are realized linguistically (in this case, lexically), wordlist, keyword, and concordance analyses were performed for each rhetorical appeal; keyword data for all of the 19 appeals analyzed in this study are included in Table 5C. However, the data from all appeals provided more information than can be adequately discussed in this chapter. Therefore, one frequently used appeal with compelling keyword data was chosen to be presented as an illustration of how linguistic (lexical) variation could be analyzed: appeal A17, appealing to audience's views. Affective appeals represent 27.7% (506/1829) of the total appeal use in the ICIC's fundraising letters (see Table 5.2). These appeals play a vital role in persuading readers by targeting the audience's emotions (Connor & Gladkov, 2004). As such, appeal A17, appealing to audience's views, which accounts for 76% (387/506) of all the affective appeals, attempts to arouse readers' emotions by speaking to their emotional, attitudinal, and moral views. Examples of its use in letters are shown here:
(11) P.S. Many adults in our community don't enjoy reading the way you and I do. Won't you help plant the seed of learning in them through a gift to [Health and Human Services Organization]? (Excerpt taken from letter <102ATL003> <251> of the ICIC corpus)
(12) If you enjoy reading the stories in the enclosed brochure, there is an excellent chance that you will enjoy membership in [Arts and Culture Organization]. There has never been a better time to join than right now. [Arts and Culture Organization] offers more opportunities than ever before to learn about Indiana history in fun and interactive ways. You can play an important role in preserving Indiana's history by being an active supporter of [Arts and Culture Organization's] mission. Start enjoying the benefits of [Arts and Culture Organization] membership today by completing and returning the enclosed reply card. Don't forget to choose your free gift when selecting your preferred membership level. [Arts and Culture Organization] is missing only one important component...you! We hope you will join us. (Excerpt taken from letter <601CZL183> <643> of the ICIC corpus)
These samples show how writers use the readers' own views to persuade them, stressing the intrinsic value of their cause or organization and then urging readers to support it, thus proving the value of their own emotions, attitudes, or morals. In fundraising discourse, appeal A17 (appealing to audience's views) is often used when the writer makes a direct request for a donation while providing a specific reason that explains why this donation should be made (Connor & Gladkov, 2004).
4.1 Wordlists
The Wordsmith program was used to compile wordlists for the entire ICIC corpus of fundraising letters, a second reference corpus, and each individual appeal (Scott, 2004a). Wordlists, which show the frequency of every word used in a corpus, are useful tools for identifying potential differences between reference and specialized corpora that can later be examined in greater detail (Henry & Roseberry, 1996; Hunston, 2002).
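A wordlist of this kind can be approximated in a few lines of code. The sketch below is not the Wordsmith procedure: the tokenization rule, the function name, and the sample sentence are assumptions made for illustration, and the sample text is invented rather than taken from the ICIC corpus. Numerals are collapsed into the symbol #, as in Table 5.4.

```python
import re
from collections import Counter


def wordlist(text, top=25):
    """Return the `top` most frequent tokens, upper-cased, numerals as '#'."""
    # Assumed tokenization: runs of letters/apostrophes, or runs of digits.
    tokens = re.findall(r"[A-Za-z']+|\d+", text)
    normalized = ["#" if tok.isdigit() else tok.upper() for tok in tokens]
    return Counter(normalized).most_common(top)


# Invented sample text, for illustration only.
sample = (
    "Please send in your gift today. Your gift of 25 dollars will help us, "
    "and your support will help 300 children."
)

for word, freq in wordlist(sample, top=5):
    print(word, freq)
```

Running the same function over an appeal subcorpus and over a reference corpus would give frequency lists that can be compared side by side.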
Table 5.4 Word frequency counts for affective appeals and corpora2
Rank  A17 appealing to audience's views   ICIC Fundraising Letter Corpus   British National Corpus
1     THE                                 THE                              THE
2     TO                                  TO                               OF
3     OF                                  AND                              AND
4     YOU                                 OF                               TO
5     #                                   #                                A
6     AND                                 A                                IN
7     YOUR                                IN                               #
8     A                                   FOR                              THAT
9     IN                                  YOU                              IS
10    WILL                                YOUR                             IT
11    FOR                                 OUR                              FOR
12    GIFT                                THAT                             WAS
13    OUR                                 IS                               I
14    PLEASE                              WE                               ON
15    THAT                                ARE                              WITH
16    HELP                                WITH                             AS
17    WE                                  WILL                             BE
18    BE                                  AS                               HE
19    THIS                                THIS                             YOU
20    I                                   I                                AT
21    IS                                  HAVE                             BY
22    WITH                                BE                               ARE
23    CAN                                 AT                               THIS
24    OR                                  SCHOOL                           HAVE
25    SUPPORT                             ON                               BUT
2. The symbol # indicates the use of numerals 0-9.
In the present case, the word use in appeal A17 (appealing to audience's views) was compared to the two other affective appeal categories, as well as two reference corpora: all fundraising letters in the ICIC corpus, and the British National Corpus (BNC). Table 5.4 shows the 25 most frequent words for appeal A17 and in the comparison corpora. A visual comparison of the reference and appeal wordlists reveals word frequency differences that characterize the content and range of each corpus. While the BNC's wordlist is comprised entirely of function words such as pronouns, prepositions, conjunctions, and articles, the wordlist for the ICIC letter corpus shows more specificity. Words such as please appear in the wordlist of A17, reflecting the appeal's purpose of appealing to the audience's views. Pronouns you and your show high frequencies due to their use in addressing readers. However, while some differences in word frequency can be observed in these tables, frequency alone does not indicate that a word is characteristic of the appeal. In other words, wordlists represent only a first step in finding differences between corpora. To determine which words are intrinsic to the language of the appeal, wordlists of specific corpora must be compared to a reference corpus wordlist to perform a keyword analysis.
4.2 Keywords
After wordlists were created, a Wordsmith keyword analysis was completed. A keyword analysis shows the relative frequency of words' usages in a specific group of texts, in this case an appeal type, compared to the relative frequency of those words' usages in a much larger group of texts, in this case the ICIC letter corpus. Thus, words that occur with a higher relative frequency in the appeal than they do in the letter corpus are identified as keywords, in that their use represents the lexical tenor of the appeal (Hunston, 2002; Scott, 2004b). Negative keywords, which occur relatively less frequently in the appeal than in the corpus, also show the lexical tenor of the appeal, through their lack of use. The keyword analysis collected data for frequency, keyness, and significance (p value). The frequency shows how many times a keyword occurs in a specific category of appeals in the corpus. The keyness of a keyword represents the value of log-likelihood or chi-square statistics; in other words, it provides an indicator of a keyword's importance as a content descriptor for the appeal. The significance (p value) represents the probability that this keyness is accidental. Therefore, the higher the keyness value and the lower the p value, the more distinctive a word is for a particular appeal. This shows that the keyword is used more frequently in the selected group of texts than in the general corpus. In the case of negative keywords, which occur less in a certain appeal than in the letter corpus, a strongly negative keyness value and a low p value indicate that the word is used distinctively less in the appeal than in the general corpus. Table 5.5 shows the data collected by a Wordsmith keyword analysis of the A17 appeals compared to the entire ICIC letter corpus, arranged by semantic function (described below).
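To make the notions of keyness and significance concrete, the sketch below computes a signed log-likelihood score for a single word and the corresponding p value. It uses a two-term log-likelihood formula that is common in corpus linguistics; this is an assumption, since the text says only that keyness is a log-likelihood or chi-square value, and the result need not match Wordsmith's own computation. The function names are illustrative, and the counts in the example are hypothetical, not ICIC data.

```python
import math


def log_likelihood_keyness(freq_study, size_study, freq_ref, size_ref):
    """Signed log-likelihood keyness for one word.

    Positive when the word is relatively more frequent in the study
    corpus, negative when it is relatively less frequent.
    """
    total = freq_study + freq_ref
    # Expected frequencies if the word were spread evenly over both corpora.
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    ll = 0.0
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    ll *= 2
    more_frequent = freq_study / size_study >= freq_ref / size_ref
    return ll if more_frequent else -ll


def p_value(keyness):
    """Chi-square (1 degree of freedom) tail probability for |keyness|."""
    return math.erfc(math.sqrt(abs(keyness) / 2))


# Hypothetical counts, not taken from the ICIC corpus.
k = log_likelihood_keyness(freq_study=120, size_study=10_000,
                           freq_ref=400, size_ref=100_000)
print(round(k, 1), p_value(k) < 0.01)
```

With one degree of freedom, a p value below .01 corresponds to an absolute keyness above roughly 6.63, so every value listed in Table 5.5 clears that threshold comfortably.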
Table 5.5 Keywords in Appeal A17 in order of distinctiveness
Semantic Function                                   Key Word      Frequency  Keyness*
Second person pronouns                              your          395        186.9
                                                    you           440        156.1
Solidarity between reader and writer/organization   join          44         37.3
                                                    us            90         27.1
                                                    membership    45         26.4
Description of reader's generosity                  gift          157        110.1
                                                    contribution  82         64.7
                                                    donation      38         47.3
                                                    check         33         36.9
                                                    pledge        33         28.2
Inciting of a response from readers                 please        154        159.7
                                                    consider      58         59.0
                                                    help          135        56.9
                                                    today         70         56.7
                                                    will          185        40.4
                                                    make          86         39.0
                                                    hope          59         32.7
                                                    send          37         30.8
Incentives for donation or ease of giving           tax           76         89.3
                                                    enclosed      75         54.9
                                                    card          50         45.0
                                                    envelope      46         43.8
                                                    return        40         38.7
                                                    deductible    29         28.4
                                                    receive       36         26.7
Negative keywords                                   name          6          -36.0
                                                    sincerely     3          -41.1
Multiple categories                                 to            632        25.3
* P-values for all keywords are less than .01, indicating that there is a less than 1% danger of error in the calculations
As Table 5.5 shows, "your" is the keyword most characteristic of this appeal's word use, with a keyness of 186.9. On the other end of the spectrum, the negative keyword "sincerely" is least characteristic of appeal A17, with a keyness of -41.1. Appeal A17 contains 28 keywords, which can be grouped into seven categories by their semantic function. That is, these are words from particular semantic domains associated with the communicative purposes of this appeal: an appeal to the audience's emotional, attitudinal, and moral views. In A17, second person pronouns are used to address the audience's perspectives and to acknowledge their actions: "Your contribution has been important" and "If you enjoy reading the stories in the enclosed brochure, there is an excellent chance that you will enjoy membership." Besides recognizing readers as individuals, A17 also uses words to stress the writer's or organization's solidarity with the audience, in sentences such as "I urge you to join us as we build better ways to help our students" and "Please join us, by sending in your membership-application today!" These keywords indicate that the writer views the reader as a peer, one who would be a valuable addition to the organization. As Connor and Gladkov (2004) mention, appeal A17 is often used to directly request donations. Three different categories of keywords are used in donation requests: keywords that describe forms of generosity, keywords that incite readers to donate, and keywords that indicate incentives for donation. The first group of keywords depicts the forms that readers' generosity can take, using many synonymous monetary terms. This group of descriptive keywords appears in phrases and sentences such as "By making a contribution or a pledge...", "It only takes three quick, easy steps to obtain a matching gift from your (or your spouse's) employer to enhance your donation", and "Make your check payable to [Education Organization]."
The second group of keywords contains directives like "consider", "make", "send", and "help", along with the use of "please", "today", "hope", and "will", to urge readers to respond to the letter with a donation to the organization. Sentences like "I hope you will send in your renewal gift today", "Please consider sending your donation in today", "You can make the difference today with your check", and "Please send in your gift today and help us reach thousands of Indiana children" directly appeal for donations from readers, using language that requests a response. The third group of keywords is used in sentences that illustrate the ease of donating or offer incentives for donating: "Return the completed card with your check by October 15 to receive an invitation to a special artist dinner on November 8", "Your gift, made through [Education Organization], is tax-deductible", and "Just fill out the enclosed pledge card and send it in the return envelope today." The keywords "return", "enclosed", "card", and "envelope" depict how easy making a donation will be for readers; they can simply fill out and return the paperwork enclosed in the letter. Additionally, "receive", "tax", and "deductible" all provide incentives for donating; donors might receive a gift or tax deductions. Appeal A17 also has two negative keywords: "name" and "sincerely". In other appeal types, the keyword "name" serves as the placeholder at the beginning of the letter for the heading and greeting (e.g. "Dear name,"), which will later be personalized for each recipient. On the other hand, "sincerely" is used as a closing salutation at the end of the letter. The negative keyness of these two words indicates that appeal A17 does not appear at the beginning or end of fundraising letters; it is distinct from the greeting and closing salutations. The last of A17's keywords in Table 5.5, "to", is more difficult to account for, because it can be associated with several semantic and grammatical categories, as it appears in conjunction with pronouns, words that describe readers' generosity, incentives for donation, and expressions of solidarity. The sentence "Please consider a gift to [Arts and Culture Organization] before December 31 to receive full tax deductibility for this year" shows two instances of "to" within appeal A17. Other occurrences of the word can be seen in phrases like "gift to the", "your gift to", "a contribution to", and "you to join", which are used throughout the appeal.
5 Appeals and discourse structure of letters
Appeals and moves represent two complementary top-down approaches to discourse. As such, it is useful to compare the distribution of these two features in the letter discourse. Therefore, a preliminary analysis compared the distribution in 50 randomly selected letters from the corpus. Table 5.6 presents the tabulation of appeals used by each move type. The table includes each occurrence of a specific appeal in the identified move types in these 50 letters. As can be seen, in most cases, each move type consisted of multiple appeals. For example, occurrences of Move Type 3 included 15 different appeal types. The results suggest that the occurrence of certain appeals in certain move types is somewhat predictable. For example, rational appeals tend to be placed in Move Types 2 and 3, towards the beginning of the letter, with the exception of appeals R8 and R12, which occur also in some of the later moves in the letters. It is interesting, and not surprising, that appeals C14 and A17 are sprinkled throughout the letter. The writer shows respect for the audience's interests (expressed often as a thank you for previous contributions) and appeals to his/her views (expressed as a request for a donation) throughout the letter.
Table 5.6 Placement of appeals in each move type in 50 randomly selected letters
Appeal          Move 1   Move 2   Move 3   Move 4   Move 5   Move 6   Move 7
R1              1        9        0        0        0        0        0
R2              0        7        0        0        0        0        0
R3              0        5        3        0        0        0        0
R4              0        1        0        0        0        0        0
R5              0        6        1        0        0        0        0
R6              0        5        7        0        0        0        0
R7              3        6        3        0        0        0        2
R8              0        19       24       2        1        2        0
R9              0        1        1        0        0        0        0
R10             0        15       5        0        0        0        0
R11             0        7        1        0        1        0        0
R12             2        53       23       2        8        0        0
C13             2        18       4        1        1        0        0
C14             0        14       33       1        3        26       2
C15             0        3        4        0        0        0        1
C16             1        4        4        0        0        0        0
A17             2        11       85       23       16       6        1
A18             2        9        1        1        0        0        0
A19             0        0        0        0        0        0        0
Total Appeals   13       193      199      30       30       34       6
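The counts in Table 5.6 can be summarized programmatically, for instance to count how many different appeal types occur in each move type or to list the moves in which a given appeal appears. The sketch below assumes that the counts are read move by move in the order R1 to A19; the variable and function names are illustrative and are not part of the original analysis.

```python
APPEALS = ["R1", "R2", "R3", "R4", "R5", "R6", "R7", "R8", "R9", "R10",
           "R11", "R12", "C13", "C14", "C15", "C16", "A17", "A18", "A19"]

# Counts from Table 5.6: occurrences of each appeal (R1..A19) per move type.
MOVES = {
    1: [1, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 2, 2, 0, 0, 1, 2, 2, 0],
    2: [9, 7, 5, 1, 6, 5, 6, 19, 1, 15, 7, 53, 18, 14, 3, 4, 11, 9, 0],
    3: [0, 0, 3, 0, 1, 7, 3, 24, 1, 5, 1, 23, 4, 33, 4, 4, 85, 1, 0],
    4: [0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 1, 1, 0, 0, 23, 1, 0],
    5: [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 8, 1, 3, 0, 0, 16, 0, 0],
    6: [0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 26, 0, 0, 6, 0, 0],
    7: [0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 2, 1, 0, 1, 0, 0],
}

# Total appeals and number of distinct appeal types per move type.
totals = {move: sum(row) for move, row in MOVES.items()}
distinct = {move: sum(1 for n in row if n > 0) for move, row in MOVES.items()}


def moves_with(appeal):
    """Return the move types in which the given appeal occurs at least once."""
    index = APPEALS.index(appeal)
    return [move for move, row in MOVES.items() if row[index] > 0]


print(totals)
print(distinct)
print(moves_with("A17"), moves_with("C14"))
```

Read this way, Move Type 3 contains 15 different appeal types and Move Type 2 contains every appeal type except A19, while A17 occurs in all seven move types.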
The results also point to the benefits of studying discourse structures from multiple perspectives. For example, Chapter 3 (Table 3.10) shows how only two Move Types (2 and 3) were predominant in the letters, accounting for about 65% of all the moves in these texts. The present analysis, however, shows that appeals reflect different functional considerations from moves; these same two move types consist of many different types of appeals (all appeal types except A19). At the same time, a single appeal type is distributed across multiple move types. All in all, appeals are the ways in which ideas are expressed to persuade the reader. Any of the moves that were identified in the letters can in theory be expressed through any of the three broad types of appeals (rational, emotional, credibility). These broad types of appeals can be expressed through a choice of individual appeals.
6 Conclusion
This chapter has explored the purpose and characteristics of rhetorical appeals in fundraising letters. The characterization and use of appeals can be dated back to Aristotle, and has since been extended and enhanced by rhetoricians and philosophers. Our study approached the study of appeals as a top-down discourse analysis of fundraising letters. After the set of possible appeal types was determined, the letters were segmented and categorized into appeals. Linguistic analyses were conducted to gain a better understanding of the typical linguistic category of each appeal. A closer analysis of one appeal type, A17, served as an example of the characteristics of word usage within individual rhetorical appeals from fundraising letters. Wordlists show how A17's word frequencies diverge from the word frequencies of other affective appeals, and of reference corpora. By examining the presence and functions of keywords, the purpose and tenor of the appeal are revealed through its most characteristic words. Through this investigation, appeals become defined as distinct elements of rhetorical structure. The analysis of rhetorical appeals provides a complementary perspective to the study of fundraising letters. Also a top-down approach to discourse, appeals analysis differs from rhetorical moves analysis in that its function is to reveal the persuasive roles of text sections, not the communicative or informative ones, as is the case with moves. It should be noted that the sequencing of appeals in letters is not as predictable as that of moves. However, preliminary analysis revealed interesting patterns about the placement of the most frequently occurring appeals. Further analyses should continue exploring the placement of all the appeals for a more comprehensive understanding of their discourse organizational tendencies.
Table 5A Definitions and examples of rhetorical appeals

Rational

R1 Descriptive Example
Definition: Using a compelling descriptive example from one's own or someone else's experience
Example: Families are being torn apart, and too often, children are the victims. Kids like Tommie J., made a ward of the court because of repeated beatings by an alcoholic father; Alice, sent to a group home to get help because of severe behavior disorders; and John H., a recovering alcoholic, rebuilding a relationship with his family so they can live together again.

R2 Narrative Example
Definition: Using a compelling narrative example. Must contain a beginning, middle, and end of a story
Example: Ted is a single father with three children under 10. He's never been on welfare and he's always had a job doing manual labor ... There was a time when he felt like he had no choice but to tolerate his wife's constant abuse and neglect of their children. Then Ted decided the children deserved a chance to start over in another town, no matter how difficult it might prove to be.

R3 Classification
Definition: Placing in a class or unit, and describing what that means
Example: In joining SCS, you join the ranks of those who believe that bringing art and art education to the city makes life better, richer, and more rewarding for the entire community.

R4 Comparison
Definition: Using comparison to support one's focus
Example: Our faculty-student ratio is 1:27. For law schools in the United States, the range of faculty-student ratios is from 1:13 to 1:35, but well over half of the law schools in the country have better ratios than we do.

R5 Contrast
Definition: Using contrast to support one's focus
Example: Unfortunately, our view of the importance of philanthropy is not shared by all Americans. Many see philanthropy as no more than the grand gestures of the rich. They do not understand, as you do, that the museums, parks, hospitals and community organizations supported by philanthropy are the cornerstones of our very quality of life.

R6 Degree
Definition: Arguing that two things are separated by a difference of degree rather than kind, or making an appeal for an incremental change
Example: Please consider an increase in your contribution to the Girls Scout Annual Campaign.

R7 Authority
Definition: Using the authority of a person other than the writer
Example: Pat LaCrosse asked me to send this information inviting you to join the Georgia O'Keeffe Circle of the Indianapolis Museum of Art's Second Century Society (SCS).

R8 Cause/Effect, Means/End, Consequences
Definition: Showing how one event is the cause of another
Example: As one of only a few zoos in the country that receives no local, state, or federal tax support IZS must depend on donations for general operating funds from corporations like yours

R9 Model
Definition: Proposing a model for action that relies on existing programs
Example: A group of your colleagues recently volunteered to help set the priorities for this campaign. They surveyed members of the staff and faculty councils, administrators and others and learned that we at IUPUI have a number of vital concerns.

R10 Stage in process
Definition: Reviewing previous steps and looking forward to what steps need to be taken
Example: Three years ago, the Heritage Trust set aside land for the restoration of the Limberlost Swamp, near Geneva in eastern Indiana. Now, wildlife is returning to the area. Egrets, ducks and geese now gather at waterfowl resting ponds in large numbers; and native prairie grass has been planted to return natural diversity and other wildlife to the area.

R11 Ideal or Principle
Example: As our state continues to develop, we must work harder to protect important natural areas for wildlife and recreation.

R12 Information
Definition: Using supporting facts and statistics
Example: Through the efforts of about 300 volunteers, nearly $89,000 was raised through the IUPUI Campus Campaign. Almost 900 of us made new gifts in support of the things we care about. Together with those who were already donors, there are over 1,350 staff and faculty supporting the work of IUPUI with their gifts.

Credibility

C13 First hand experience
Definition: Providing information to show first hand experience or some authority on the subject
Example: Purdue has been a part of my life for as long as I can remember. I was raised in West Lafayette. As I grew older, I realized more and more that Purdue isn't just a state institution; it is a public university. Moreover, it is a world-class university.

C14 Showing writer's respect for audience's interests and point of view
Example: In looking back at the last decade, we at the Indianapolis Zoological Society (IZS) wish to express our sincere thanks to all companies who have helped us to achieve so many successes at the Indianapolis Zoo and White River Gardens.

C15 Showing writer-audience shared interests and points of view
Example: Because if you and I truly want to preserve philanthropy as a way of life, we must make certain that Americans everywhere take philanthropy seriously, that they talk about it, debate it, challenge it, and ultimately keep it alive as a cherished tradition.

C16 Showing writer's good character and/or judgment
Example: Who helps Randy break a cycle of violence and become a better dad? Who helps Michael, who has spina bifida, learn to talk, dress himself, and get around independently? Without you, no one.

Affective

A17 Appealing to the audience's views (emotional, attitudinal, moral)
Example: Please, make a tax-deductible gift to Community Centers of Indianapolis in 1999, and know that <company> is playing an important part in meeting the needs of its community.

A18 Vivid picture
Definition: Creating a thought, a mind's eye vision.
Example: Do you remember how wonderful and how proud you felt in 1980 when the young United States Hockey Team beat the powerful Soviet Team 4-3, and then went on to beat Finland 4-2 for the gold, or in 1984 when 16 year old Mary Lou Retton, needing a 9.95 in her final event to tie for first place in the all around Gymnastics competition, vaulted her way to the gold by scoring a perfect 10?

A19 Charged language
Definition: Using strong language to arouse emotions.
Example: When it comes to the misuse and destruction of our natural areas, reality is not only harsh, it is deadly. Once they are developed or altered, and their fragile ecosystems are disrupted, we lose them forever.
Table 5B Sample Fundraising Letter with Appeals Indicated*

Dear Mrs. Name,
<begin R2> Ted is a single father with three children under 10. He's never been on welfare and he's always had a job doing manual labor. Life isn't easy for Ted, but he's determined to raise his children himself and be a good role model for them. It wasn't always like that. There was a time when he felt like he had no choice but to tolerate his wife's constant abuse and neglect of their children. Then Ted decided the children deserved a chance to start over in another town, no matter how difficult it might prove to be. <end R2>
<begin A18> What happens when people are living on the edge -- barely able to survive -- and an unexpected emergency arises? What happens if they can't pay their heating or electric bill because of extreme temperatures? What happens if their children get sick? What happens if they get sick? <end A18>
<begin R11> Starting over can be very hard -- especially for people who don't have family or financial resources to draw from in an emergency. <end R11>
<begin C14> That's why I'm writing to you today. Through your financial support, you've already demonstrated that you want to help genuinely needy people begin anew. So, I'd like to give you one more way to make a difference: The [Health and Human Services Organization] Card. <end C14>
<begin R12> Although it can't be used to make purchases or withdraw money from an ATM, the value of this Card could be immeasurable -- for the person whose life it might change. Here's how it works: Simply detach the Care Card from the top of this letter and keep it handy. Then, if you know of or come across someone who needs our help, please give the card to him or her. It shows our address and phone number in [City], but through our Service Extension Office, we can put the person in contact with the volunteer representative in [County]. We'll welcome their call and do our best to help them through the difficult time, so they can get on with their lives. <end R12>
<begin A17> And just as important, I invite you to renew your partnership with [Health and Human Services Organization] by sending a contribution, once again, today. Your donation of $25, $50, $125, $250 or more will help us remain a steady source of assistance for people like Ted. <end A17>
<begin C14> Thank you for your continued financial support, and for remaining alert for neighbors who need a helping hand. God bless you for your compassion and kindness. <end C14>
Blessings,
[Writer's Name]
Major
DIVISIONAL COMMANDER
<begin A17> P.S. Your gift today will help us care for struggling families, hungry children, and lonely and needy senior citizens right here in this community. <end A17>
* This sample is from letter <108AXL026> <322> of the ICIC Fundraising Corpus
Table 5C Keyword data for all appeals*

Entries are listed by appeal as: key word (frequency, keyness).

R1: she (19, 52.02); job (17, 46.28); I (44, 34.44); her (16, 34.40); was (22, 28.30); my (18, 27.66); [Health and Human Services Organization] (11, 26.58); his (17, 25.89); you (4, 44.44); # (14, 63.55)

R2: he (58, 175.91); his (43, 97.01); was (48, 86.26); miyares (21, 63.63); job (20, 45.43); him (17, 43.61); her (22, 43.15); she (20, 42.41); Wanda (11, 36.42); blind (10, 36.03); had (18, 34.94); baby (12, 34.70); but (27, 31.92); disabled (15, 31.09); were (19, 29.15); mother (10, 26.25); Wandas (7, 24.86); will (5, 25.32); are (4, 30.55); you (10, 50.92); # (38, 51.89); your (3, 59.51)

R6: # (197, 186.08); or (28, 24.88)

R7: his (16, 28.26); dr (10, 24.93); # (14, 43.85)

R8: provide (47, 34.35); you (76, 42.35); I (12, 62.27)

R10: faculty (43, 40.05); gift (8, 25.55); I (16, 41.03); your (28, 76.03); you (32, 107.08)

R12: us (20, 24.36); please (10, 32.54); contribution (3, 34.27); name (6, 44.63); gift (13, 48.63); your (74, 84.98); you (89, 118.38); I (13, 120.28)

C13: I (136, 214.10); my (64, 164.02); am (20, 34.11); was (31, 31.13); your (12, 35.50)

C14: you (315, 290.51); thank (89, 202.96); your (217, 165.47); call (32, 64.37); support (88, 62.37); if (50, 48.07); questions (22, 44.33); any (32, 42.72); have (85, 40.48); I (86, 34.19); thanks (17, 30.33); ext (11, 28.34); forward (15, 28.14); appreciate (14, 27.85); for (156, 26.88); look (17, 26.11); me (26, 25.04); advance (12, 24.31)

C15: we (32, 33.87); # (4, 33.03)

C16: proud (6, 27.91)

A17: your (395, 186.94); please (154, 159.72); you (440, 156.06); gift (157, 110.11); tax (76, 89.30); contribution (82, 64.65); consider (58, 59.02); help (135, 56.89); today (70, 56.71); enclosed (75, 54.94); donation (38, 47.31); card (50, 44.95); envelope (46, 43.78); will (185, 40.44); make (86, 39.03); return (40, 38.68); join (44, 37.25); check (33, 36.88); hope (59, 32.65); send (37, 30.84); deductible (29, 28.39); pledge (33, 28.24); us (90, 27.07); receive (36, 26.71); membership (45, 26.37); to (632, 25.27); name (6, 36.04); sincerely (3, 41.12)

A18: its (14, 33.88); what (16, 31.90)
* The reference corpus used in this comparison included all fundraising letter texts. Moves A19, R3, R4, R5, R9 and R11 did not have keyword results. P-values for all keywords are less than .01, indicating that there is a less than 1% danger of error in the calculations. Keywords preceded by have negative keyness.
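The table note does not say which statistic underlies the keyness values. As a minimal sketch, assuming a signed log-likelihood score of the kind commonly used in keyword analysis, the calculation might look as follows in Python. The function name `keyness`, its parameters, and the counts in the usage lines are hypothetical illustrations, not values from the fundraising corpus; 6.63 is the chi-square critical value for p < .01 with one degree of freedom.

```python
from math import log

P01_CRITICAL_VALUE = 6.63  # chi-square critical value, df = 1, p < .01


def keyness(freq_target, size_target, freq_ref, size_ref):
    """Signed log-likelihood keyness (an assumed statistic, see lead-in).

    Positive: the word is relatively more frequent in the target texts.
    Negative: the word is relatively less frequent in the target texts.
    """
    total = freq_target + freq_ref
    expected_target = size_target * total / (size_target + size_ref)
    expected_ref = size_ref * total / (size_target + size_ref)
    ll = 0.0
    if freq_target:  # a zero count contributes nothing to the sum
        ll += freq_target * log(freq_target / expected_target)
    if freq_ref:
        ll += freq_ref * log(freq_ref / expected_ref)
    ll *= 2
    underused = freq_target / size_target < freq_ref / size_ref
    return -ll if underused else ll


# Hypothetical counts: 50 hits per 1,000 words versus 10 hits per 1,000 words.
overused = keyness(50, 1000, 10, 1000)
underused = keyness(10, 1000, 50, 1000)
print(overused > P01_CRITICAL_VALUE, underused < 0)
```

Under this assumption, a word would count as a keyword only when the absolute score exceeds the critical value, and the sign would distinguish overused from underused words.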
Part 2
Bottom-up analyses of discourse organization
chapter 6
Introduction to the identification and analysis of vocabulary-based discourse units

WITH Eniko Csomay, James K. Jones, & Casey Keck
As noted in Chapter 1, one major analytical issue for any attempt to combine corpus-linguistic and discourse-analytic research perspectives is to decide on a unit of analysis with a well-defined linguistic basis. In early corpora (such as the Brown Corpus and the LOB Corpus), the unit of analysis was a text file, containing a segment of a fixed length (e.g., 2,000 words) extracted from a text. Corpora of this type have been extremely useful for functional investigations of grammatical features, but they were not suitable for discourse studies. More recently, corpora have been constructed from complete texts, such as chapters, research articles, newspaper articles, or even complete books. However, there is often extensive linguistic variation within a text, associated with internal shifts in communicative task, purpose, and topic. One of the first hurdles for a corpus-based investigation of discourse structure is to identify the units that comprise texts. In some written genres, text-internal discourse units can be readily identified because they are marked by sections (in academic articles) or chapter breaks (in textbooks). However, even a discourse unit like a book chapter is likely to have systematic internal variation, associated with shifts in topic or purpose. Other kinds of texts, like a newspaper editorial or a conversation, have no overt markers of internal discourse units. Thus we need methods to determine the structural units of discourse in each kind of text, and to identify the boundaries of those units. The chapters in Part I of this book addressed this methodological problem by segmenting texts into moves or appeals, following a top-down analytical approach. In Part II of the book, we explore a complementary bottom-up approach. As described in Section 3 of Chapter 1 (see especially Table 1.2), the first step in a bottom-up approach is to automatically segment all texts in the corpus into well-defined discourse units, based on linguistic criteria. 
The specific method that we adopt here relies on analysis of vocabulary patterns within texts, identifying a discourse unit boundary when the text shifts to a new set of words. We thus refer to these units as Vocabulary-Based Discourse Units (VBDUs). Two overall analytical goals govern the bottom-up approach to discourse organization developed here. First, the approach should provide a comprehensive linguistic description of discourse units and the flow of discourse within texts. Second, the approach should describe generalizable patterns of discourse organization that hold across all texts of the target corpus. As noted above, the first step required to achieve these goals is to automatically segment texts into discourse units. In the following chapters, the construct of the VBDU is used for these purposes. The present chapter introduces VBDUs and the analytical techniques required to segment texts into these discourse units. The following two chapters, then, illustrate the application of this approach to the analysis of discourse organization in texts from written research articles and spoken university lectures.
1 Conceptual introduction to VBDUs
Conceptually, a Vocabulary-Based Discourse Unit (VBDU) is a block of discourse defined by its reliance on a particular set of words. The boundary of a VBDU is identified as the place in a text where the author/speaker switches to a new set of words. Because the topic of discourse is expressed through vocabulary, VBDUs can usually be interpreted as topically-coherent units. However, it also often turns out that the author/speaker shifts communicative purpose from one VBDU to the next. An easy way to illustrate the correspondence between shifts in vocabulary and topical discourse units is through consideration of the major sections of a research article. There are almost always major shifts in vocabulary between the Introduction and Methods sections of research articles. For example, Text Excerpt 6.1 compares the end of the Introduction to the beginning of the Methods from a biochemistry research article (taken from the Biochemistry Research Corpus, see Chapter 4).
Text Excerpt 6.1. Introduction and Methods sections from a Bio-chemistry research article. (MBCSep) [Underlined words do not occur in the adjacent discourse unit]
INTRODUCTION [...]
[VBDU 4] Drosophila early embryos undergo a morphologically intermediate mitosis in which pore complexes disassemble during prophase and prometaphase, leaving behind open holes, whereas nuclear membranes remain largely intact and the lamina partially disassembles: some lamins delocalize to the cytoplasm, but a fraction of them remain in place through early-mid anaphase (Ref.). To begin determining the functions of LEM domain proteins in vivo, we chose the genetically tractable nematode C. elegans. We report here the identification and characterization of the LEM domain proteins MAN1 and emerin in C. elegans and the discovery that the timing of nuclear envelope breakdown may be unique in C. elegans relative to other studied eukaryotes.
METHODS
[VBDU 5] To obtain polyclonal antibodies against Ce-MAN1 and Ce-emerin, mice and rabbits were immunized at 3-week intervals with synthetic peptides conjugated to keyhole limpet hemocyanin. Immunizations and serum production were performed by Covance Research Products. The following keyhole limpet hemocyanin-conjugated peptides were used: CAVWKWIGNQSQKRW-COOH, which corresponds to the last 14 residues of Ce-MAN1 plus an N-terminal Cys residue; and CQLKLVAETNPEDTI-COOH, which corresponds to the last 14 residues of emerin plus an N-terminal Cys residue. All peptides were synthesized, purified by reverse-phase HPLC with the use of a C18 analytical column, and conjugated to keyhole limpet hemocyanin by Boston Biomolecules. Rabbit polyclonal antibodies to Ce-lamin were produced against a bacterially expressed polypeptide consisting of residues D-217 to F-550 of lamin and were affinity purified. mAb414, which recognizes a subset of nucleoporins, was purchased from BAbCO. mAb104, which recognizes conserved small nuclear ribonucleoproteins (Ref.), was provided by Dr. Geraldine Seydoux. Cy3-conjugated goat anti-mouse and goat anti-rabbit antibodies, and FITC-conjugated goat anti-rabbit antibodies, were purchased from Jackson Laboratories. mAbs against tubulin were purchased from Sigma Chemical.
The vocabulary used in these two discourse units is dramatically different. There are only three content words shared by both the introduction and methods sections: emerin, lamin, and nuclear. These words are marked in italics in the above extract. All other content words, and many of the function words, are unique to one or the other of these two sections; these words are underscored in the above extract. (Function words common to both sections are in plain text.) This extract illustrates the dramatic way in which vocabulary can shift at the boundary of text-internal discourse units, marking a shift in topic. In this case, the boundary also marks a distinctive shift in communicative purpose: the introduction provides background information and an overview of the study; the methods provide the details of the actual procedures. Interestingly, these different communicative purposes are sometimes associated with different sets of function words in addition to content words. For example, the first discourse unit in Text Excerpt 6.1 relies on the pronoun we and the preposition in, while the second discourse unit uses past tense was/were and the prepositions with, from, and by.
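The comparison of shared and unique vocabulary described above can be sketched with set operations. This is a minimal illustration in Python, not the procedure used in the study: the function names and the simple lowercase tokenizer are assumptions, and the sketch makes no distinction between content words and function words.

```python
import re


def word_types(text):
    """Return the set of lowercased word types in a text (assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower()))


def compare_units(unit_a, unit_b):
    """Split the vocabulary of two adjacent units into shared and unique types."""
    a, b = word_types(unit_a), word_types(unit_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Two short fragments adapted from Text Excerpt 6.1.
result = compare_units(
    "we chose the genetically tractable nematode",
    "the peptides were purified by reverse-phase HPLC",
)
print(sorted(result["shared"]))  # ['the']
```

A small shared set relative to the two unique sets is the pattern the excerpt illustrates at the Introduction/Methods boundary.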
But VBDUs are not necessarily restricted to the segments of a text that are overtly marked by paragraphs or section breaks. That is, because the methodology for identifying VBDU boundaries relies entirely on distinctive vocabulary, these discourse units sometimes reflect relatively subtle shifts in topic within orthographically marked sections. For example, Text Excerpt 6.2 shows the last two VBDUs from the Introduction of this same research article. (Note that the second of these is repeated from Text Excerpt 6.1 above.) Because they are both from the Introduction, these two discourse units share more vocabulary than what we saw for the two VBDUs in Text Excerpt 6.1 above; shared words are marked in italics. However, most words in these two units are unique to one or the other VBDU; the unique words are underscored.
Text Excerpt 6.2. The last two VBDUs from the Introduction of a Bio-chemistry research article. (MBCSep) [underscored words are unique to a VBDU; italicized words are used in both VBDUs]
INTRODUCTION [...]
[VBDU 3] In mammals, the nucleus is completely disassembled during mitosis, a process known as open mitosis (Ref.). The lamina depolymerizes, and nuclear membranes disperse into the endoplasmic reticulum network during prometaphase (Ref.). Physical disruption of the nuclear envelope, caused by spindle microtubules during mid-late prophase (Ref.), may also contribute to the release of intranuclear contents. By metaphase, the vertebrate nuclear envelope is completely disassembled. The envelope reassembles onto chromosomes during late anaphase and telophase (Ref.). Lamina-associated polypeptide 2, lamin B receptor, and lamins have been proposed to help target reforming nuclear membranes to chromosomes or to mediate nuclear envelope assembly or growth (Ref.). The open mitosis of higher eukaryotes contrasts with the closed mitosis of single-celled eukaryotes such as Saccharomyces cerevisiae (Ref.). During closed mitosis, the nucleus remains intact and chromosomes are segregated by an intranuclear spindle apparatus.
[VBDU 4] Drosophila early embryos undergo a morphologically intermediate mitosis in which pore complexes disassemble during prophase and prometaphase, leaving behind open holes, whereas nuclear membranes remain largely intact and the lamina partially disassembles: some lamins delocalize to the cytoplasm, but a fraction of them remain in place through early-mid anaphase (Ref.). To begin determining the functions of LEM domain proteins in vivo, we chose the genetically tractable nematode C. elegans. We report here the identification and characterization of the LEM domain proteins MAN1 and emerin in C. elegans and the discovery that the timing of nuclear envelope breakdown may be unique in C. elegans relative to other studied eukaryotes.
Several of these distinctive words are repeated within a VBDU, but not used in the adjacent VBDU; these words are bold underscored in Text Excerpt 6.2. For example, the words nucleus, spindle, chromosomes, and closed are repeated in VBDU 3 but not used in VBDU 4. In contrast, LEM domain proteins and elegans are used repeatedly in VBDU 4 but not used at all in VBDU 3. These repeated words that are unique to a VBDU give a direct indication of the distinctive topic of the discourse unit in contrast to adjacent VBDUs.
Another example of this type comes from the first two VBDUs in the Methods section from the same research article. Text Excerpt 6.3 shows these two VBDUs, highlighting the words that are unique to one VBDU but used repeatedly within that VBDU. In this case, the VBDU division corresponds to a major division in the procedures used in the study: VBDU 5, the first VBDU in the Methods section, describes the process used to obtain polyclonal antibodies, using peptides conjugated to keyhole limpet hemocyanin. This VBDU identifies procedures carried out by other labs and the materials that were purchased from those labs. In contrast, the ensuing VBDU (VBDU 6) provides a detailed description of the procedures used for immunostaining. These procedures include the preparation of slides, using PBST and PBS to wash, incubate, and/or dilute the preparations, for specified periods of time (hours or minutes). Although these two VBDUs both come from the Methods section, they describe different steps in the procedure and different kinds of methodologies. Here we see how the shift in vocabulary associated with VBDUs corresponds to the textual shift in topic and purpose.
Text Excerpt 6.3. The first two VBDUs from the Methods of a Bio-chemistry research article. (MBCSep) [bold underscored words are unique to a VBDU but used repeatedly within that VBDU]
METHODS
[VBDU 5] To obtain polyclonal antibodies against Ce-MAN1 and Ce-emerin, mice and rabbits were immunized at 3-week intervals with synthetic peptides conjugated to keyhole limpet hemocyanin. Immunizations and serum production were performed by Covance Research Products. The following keyhole limpet hemocyanin-conjugated peptides were used: CAVWKWIGNQSQKRW-COOH, which corresponds to the last 14 residues of Ce-MAN1 plus an N-terminal Cys residue; and CQLKLVAETNPEDTI-COOH, which corresponds to the last 14 residues of emerin plus an N-terminal Cys residue. All peptides were synthesized, purified by reverse-phase HPLC with the use of a C18 analytical column, and conjugated to keyhole limpet hemocyanin by Boston Biomolecules. Rabbit polyclonal antibodies to Ce-lamin were produced against a bacterially expressed polypeptide consisting of residues D-217 to F-550 of lamin and were affinity purified. mAb414, which recognizes a subset of nucleoporins, was purchased from BAbCO. mAb104, which recognizes conserved small nuclear ribonucleoproteins (Ref.), was provided by Dr. Geraldine Seydoux. Cy3-conjugated goat anti-mouse and goat anti-rabbit antibodies, and FITC-conjugated goat anti-rabbit antibodies, were purchased from Jackson Laboratories. mAbs against tubulin were purchased from Sigma Chemical.
[VBDU 6] Immunostaining was performed essentially as described (Ref.). Mixed-stage animals or isolated wild-type adult C. elegans were placed on polylysine-treated slides, and 60-mm coverslips were placed above the nematodes. The slides were placed in liquid N2 or dry ice, and the coverslips were immediately removed. The nematodes were fixed for 4 minute at 20°C in methanol and then incubated for 30 minute at 22–24°C in PBST containing 3.7% formaldehyde. Nematodes were then washed once in PBST, incubated for 10 minute at room temperature in PBST containing 5% nonfat dry milk, washed once again with PBST, and incubated overnight at 4°C with the primary antibody diluted in PBST. Excess primary antibody was removed by washes in PBST: once for 1 minute, once for 10 minute, and twice for 30 minute each. The nematodes were then incubated for 2 hour at 22°C with the Cy3-conjugated goat anti-rabbit antibodies or Cy3-conjugated goat anti-mouse antibodies diluted in PBST. Double-label immunostaining for small nuclear ribonucleoproteins and Ce-lamin was performed as follows. Animals were first stained with antibodies to Ce-lamin, followed by FITC-conjugated anti-mouse secondary antibody, and then washed in PBST; the animals were then incubated for 2 hour at 22°C with mAb104 or anti-tubulin antibodies, rewashed as described above, and incubated for 2 hour with Cy3-conjugated anti-mouse antibodies. For both double- and single-label immunostaining, excess secondary antibody was then removed by washes in PBST: once for 1 minute, once for 10 minute, and twice for 30 minute each. Nematodes were then incubated for 10 minute in PBS containing 1 g/ml Hoechst 33258, washed once with PBS, and mounted in glycerol containing 2% n-propyl gallate.
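The words highlighted in Text Excerpts 6.2 and 6.3 are those repeated within one VBDU and absent from its neighbor. A minimal Python sketch of that selection follows; the function name, the `min_count` threshold of two occurrences, and the pre-tokenized word lists are assumptions made for illustration.

```python
from collections import Counter


def repeated_unique_words(unit, adjacent_unit, min_count=2):
    """Words occurring at least `min_count` times in `unit` and never in `adjacent_unit`.

    Both arguments are lists of already tokenized, lowercased words.
    """
    counts = Counter(unit)
    absent_from = set(adjacent_unit)
    return {w: n for w, n in counts.items() if n >= min_count and w not in absent_from}


# Toy token lists loosely modeled on VBDU 5 and VBDU 6.
vbdu5 = "antibody peptides hemocyanin peptides".split()
vbdu6 = "pbst washed pbst incubated pbst antibody".split()
print(repeated_unique_words(vbdu6, vbdu5))  # {'pbst': 3}
```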
2 Automatic identification of VBDUs in texts
The computational methods used to automatically identify Vocabulary-Based Discourse Units are based on Hearst's (1994; 1997) TextTiling procedure. Conceptually, this is a quantitative procedure that compares the words used in adjacent segments of a text. If the two segments use the same vocabulary to a large extent, they are analyzed as belonging to a single discourse unit. However, when the two segments are maximally different in their vocabulary, they are analyzed as two distinct VBDUs. The TextTiling program processes texts through two 50-word windows. The windows move through the text one word at a time, and at each point, the program compares the 50 words in the first window with the words in the second window. For example, the program begins by comparing the vocabulary in words 1–50 from a text to the vocabulary in words 51–100. The windows then advance one word, comparing words 2–51 to words 52–101. These comparisons continue until the entire text is processed. Each comparison is represented by a similarity score that measures the extent to which the vocabulary in the two 50-word windows is the same or different. The TextTiling similarity score is calculated using the following formula:
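\[
\text{similarity} = \frac{\sum_{i=1}^{\text{totaltypes}} \text{freq}_1(\text{word}_i) \times \text{freq}_2(\text{word}_i)}{\sqrt{\sum_{i=1}^{\text{totaltypes}} \text{freq}_1(\text{word}_i)^2 \times \sum_{i=1}^{\text{totaltypes}} \text{freq}_2(\text{word}_i)^2}}
\]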
For each different word (i.e., each word type) in the two windows, the comparison procedure first multiplies the frequency of that word-type in the first 50-word segment (freq1) times the frequency of the same word-type in the second 50-word segment (freq2). (If a word type occurs in only one of the 50-word segments, then the product of these frequencies is 0.) The multiplied frequencies are then summed up, creating the numerator of the equation. In the denominator, the frequency of each word type in each 50-word segment is squared, and those squared frequencies are then summed up (for each segment); the two summations (for each segment) are then multiplied, and we then compute the square root of the resulting product. This formula produces values between 0 and 1, where values close to 1 indicate that the two windows have many words in common, and values close to zero indicate that the two windows have few words in common. Figure 6.1 shows a plot of the TextTiling similarity scores for the Introduction and Methods sections from the biochemistry research article discussed above. Peaks on the graph represent points where the two adjacent 50-word segments are
maximally similar in their vocabulary, indicating that the two segments belong to the same discourse unit. Valleys represent the point where the two adjacent text segments are maximally different in their vocabulary. Peaks and valleys can be identified automatically by computing slope measures. Any valley that differs by at least 25% from the preceding peak is marked as a VBDU boundary. The exact location of a VBDU boundary is often adjusted slightly to correspond to written sentence or spoken turn boundaries. Figure 6.1 identifies five valleys that represent VBDU boundaries; the peaks between these boundaries correspond to VBDUs 3, 4, 5, and 6 presented in Text Excerpts 6.1–6.3 above. There are several parameters that can be manipulated for the automatic identification of VBDUs, including the size of the text windows (50 words in the present studies), the required difference between peak and valley (25% in the present studies), and the maximum score allowed for a VBDU boundary (i.e., for a valley, which must be less than 0.2 in the present case). Manipulating these parameters would potentially result in fewer or more VBDUs being identified in a text, although the boundaries of those VBDUs should remain relatively constant. The values for these parameters used here were arrived at through a process of trial and error, by considering the automatic boundaries assigned to TextTiling profiles of texts from different genres, together with the interpretability of the resulting VBDUs.
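The similarity score and the sliding-window comparison described in this section can be sketched in a few lines of Python. This is an illustrative reading of the formula, not the TextTiling program itself: the function names are hypothetical, and the input is assumed to be a list of already tokenized words.

```python
from collections import Counter
from math import sqrt


def similarity(window1, window2):
    """Similarity of two word windows, following the formula given above."""
    freq1, freq2 = Counter(window1), Counter(window2)
    # Numerator: products of frequencies, summed over word types.
    # Types found in only one window contribute 0, so only shared types are visited.
    numerator = sum(freq1[w] * freq2[w] for w in freq1.keys() & freq2.keys())
    # Denominator: square root of the product of the two sums of squared frequencies.
    denominator = sqrt(
        sum(f * f for f in freq1.values()) * sum(f * f for f in freq2.values())
    )
    return numerator / denominator if denominator else 0.0


def similarity_profile(words, window=50):
    """Score each pair of adjacent windows, advancing one word at a time."""
    return [
        similarity(words[i:i + window], words[i + window:i + 2 * window])
        for i in range(len(words) - 2 * window + 1)
    ]


# Toy example with 2-word windows instead of 50-word windows.
print(similarity_profile("a b a b c d".split(), window=2))  # [1.0, 0.5, 0.0]
```

In the toy example the score falls from 1.0 to 0.0 as the second window moves onto words that the first window does not contain, which is the kind of drop that produces a valley in the profile.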
Figure 6.1 Profile of TextTiling Scores in a Biochemistry Research Article (MBCSEP), showing the boundaries of four VBDUs
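The boundary rule stated above (a valley at least 25% below the preceding peak, with a score below 0.2) might be implemented along the following lines. This Python sketch rests on two loudly stated assumptions: the 25% difference is read as a drop relative to the preceding peak, and peaks and valleys are found by simple neighbor comparison rather than by the slope measures used in the study. The function name and parameter names are hypothetical, and the later adjustment to sentence or turn boundaries is not modeled.

```python
def find_boundaries(scores, min_drop=0.25, max_valley=0.2):
    """Return indices of valleys in a similarity profile that qualify as boundaries.

    Assumptions: `min_drop` is a drop relative to the preceding peak, and the
    first score serves as the reference peak until a real peak is seen.
    """
    boundaries = []
    last_peak = scores[0] if scores else 0.0
    for i in range(1, len(scores) - 1):
        prev, cur, nxt = scores[i - 1], scores[i], scores[i + 1]
        if cur >= prev and cur > nxt:  # local peak
            last_peak = cur
        elif cur <= prev and cur < nxt:  # local valley
            deep_enough = last_peak > 0 and (last_peak - cur) / last_peak >= min_drop
            if deep_enough and cur < max_valley:
                boundaries.append(i)
    return boundaries


# Hypothetical profile with two deep valleys, at positions 2 and 6.
profile = [0.8, 0.5, 0.1, 0.4, 0.6, 0.3, 0.15, 0.5]
print(find_boundaries(profile))  # [2, 6]
```

Each returned index is a position in the similarity profile; with 50-word windows, position i corresponds to the gap after word i + 50 of the text when positions are counted from zero.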
3 Perceptual correlates of VBDUs
Hearst (1997) evaluates the extent to which human raters agree among themselves on the location of textual boundaries, and the extent to which the boundaries automatically assigned by TextTiling agree with human perceptions. In both cases, acceptable but generally weak levels of agreement were found.
We carried out a further series of experiments on the perceptual salience of VBDU boundaries assigned automatically by our segmentation tool. Seven raters identified the locations in texts where they perceived a topical shift or some other kind of discourse boundary. Raters analyzed 12 text excerpts: four textbooks, four university classroom lectures, and four conversations. Raters were asked to identify points in the text where there was a major shift in topic or rhetorical purpose. The human analyses of all texts followed a single general pattern: there was a very high level of agreement for the placement of a few boundaries, but much less agreement for the placement of other boundaries. Figure 6.2 illustrates this pattern in a passage from a sociology textbook on age. All seven raters agreed on the location of two topical boundaries in this text excerpt: after sentence #21 and sentence #39. Both of these breaks are at paragraph boundaries, but there are no other grammatical or textual signals of topical units. Rather, raters identified these boundaries based on the content of the passage. Text Excerpt 6.4 gives many of the sentences from this textbook passage, showing the overall development:
Text Excerpt 6.4. Sociology Textbook on Aging
The final measure of population aging we will discuss is life expectancy. Life expectancy refers to the average length of time the members of a population can expect to live. It is not the same as life span, which refers to a theoretical biological maximum length of life that could be achieved under ideal conditions. [Paragraph 1 continues] Life expectancy, then, is the average experience of a population. It is calculated from actual mortality data from a single year [Paragraph 2 continues] For a better understanding of life expectancy, Exhibit 3.8 gives a great deal more detail about life expectancy in the United States; it shows the average number of years of life remaining for people of different age, sex, and race categories in the United States in 1990. To use the table, look at the left-hand column to find a target age, then read across to the race and gender category that is of interest to you. [Paragraph 3 continues] As you spend some time calculating life expectancies from this table, you will notice some interesting sources of variation. Average length of life varies depending on age, race, and sex.
[Paragraph 4 continues] [SENTENCE 21]: So, the longer you live, the longer you can expect to live (and you can quote us on that)! [[Major Perceptual Boundary]] The race differential in life expectancy is evident in Exhibit 3.8. African American men of all ages have the lowest life expectancies. African American women have life expectancies lower than European American women [Paragraph 5 continues] These observations suggest two questions: Why is there a race differential in life expectancy at all, and why does it diminish and even reverse itself at the oldest ages? In answer to the first question, most of the race differential in mortality is explained by [Paragraph 6 continues] The second question regarding the convergence in the race differential in life expectancy has received some attention, but no definitive answer. One suggested explanation is [Paragraph 7 continues] [SENTENCE 39]: Another hypothesis for the convergence effect is that because African Americans who make it to the oldest ages do so in spite of many disadvantages and long odds, they may be survivors; that is, they may have some complex set of physiological and social psychological survival advantages. [[Major Perceptual Boundary]] A final variation in life expectancy that is readily apparent in Exhibit 3.8 is the gender difference. At every age, for both races, females have higher life expectancies than do males. [Paragraph 8 continues]
Raters perceived three major topics in this excerpt: 1) a conceptual introduction to life expectancy (definition and how to measure it); 2) comparing/contrasting the life expectancies of African Americans and European Americans; and 3) gender differences in life expectancies. The boundaries between these units are at paragraph breaks, but there are otherwise no other overt grammatical or textual signals for the boundaries. Rather, recognition of the boundaries requires actually reading the preceding and following paragraphs, and recognizing major shifts in the topic. Perceptual boundaries of this type, associated with major shifts in content, can also be associated with a shift in vocabulary, and therefore they often correspond to a VBDU boundary. Figure 6.2 shows that the TextTiling segmentation placed VBDU boundaries after sentence #21 and sentence #40, at essentially the same places as the perceptual boundaries.
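The comparison just described can be sketched in a few lines. This is only an illustration, not the procedure used in the experiments: the function name `matched_boundaries` and the one-sentence tolerance are assumptions, chosen to mirror the observation that the boundaries fall at "essentially the same places". The sentence numbers are the ones reported in the paragraph above.

```python
def matched_boundaries(reference, candidate, tolerance=1):
    """Return the reference boundaries that have a candidate boundary
    within `tolerance` sentences (the tolerance is an assumption)."""
    return [r for r in reference
            if any(abs(r - c) <= tolerance for c in candidate)]

# Sentence numbers reported in the text for the sociology excerpt.
perceptual = [21, 39]   # boundaries marked by all seven raters
automatic = [21, 40]    # the two VBDU boundaries discussed above

hits = matched_boundaries(perceptual, automatic, tolerance=1)
agreement = len(hits) / len(perceptual)
print(hits, agreement)
```

With a tolerance of zero, only the boundary after sentence 21 would count as a match, so the choice of tolerance directly affects any agreement figure computed this way.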
Figure 6.2 Placement of Topical Unit Boundaries by Raters and TextTiling: Sociology Textbook
At the same time, all texts also had many other minor shifts in topic or purpose that were regarded as perceptual boundaries by some raters. Figure 6.2 shows boundaries of this type after sentences 6, 8, 11, 28, 34, and 52. There was much less agreement on the location of these boundaries, with only 1–3 raters agreeing. In many cases, there is also a VBDU boundary located near these less salient perceptual boundaries (e.g., the VBDU boundary after sentence #9). Surprisingly, the human analyses of texts from both spoken and written genres followed this same pattern: some boundaries are clear-cut, with most raters agreeing, while many other boundaries reflect more subtle shifts in topic, and raters show less agreement on those. For example, Figure 6.3 plots the perceptual boundaries that raters identified in a conversation. The topic in this text shifts abruptly, typical of many face-to-face conversations, and it is often difficult to identify discrete topical units. Raters perceived a relatively clear topical break after utterance 13, when the topic shifts from a general discussion of what Speaker A has been doing for the past month to a more specific discussion of Speaker A's plans for the next semester at his university. The TextTiling software located a VBDU boundary at this shift as well.
Figure 6.3 Placement of Topical Unit Boundaries by Raters and TextTiling: Conversation
In contrast, raters found it much more difficult to agree on the location of other topic boundaries in this conversation, as the topic shifts abruptly and sometimes subtly. In rapid succession, the participants note that it's been a hard year, talk about work plans for the next day, admire a drawing, ask about the mail, discuss a school project, lament the absence of money, talk about food and roommates, etc. Raters located nine topic boundaries in this conversation, but often with only one or two of the raters agreeing on the location of the boundary. Not surprisingly, TextTiling seemed to perform a kind of averaging, placing boundaries in between these less clear perceptual boundaries. One important aspect of TextTiling is that it is actually a continuous construct, directly representing the on-going use of vocabulary in a text, and indirectly representing the on-going unfolding of topic. Figure 6.1 (above) illustrates the continuous vocabulary profile of a text, with numerous valleys that could all be potentially interpreted as shifts in sub-topic. After the TextTiling profile has been computed for a text, we perform a separate analytical step to segment the graph into discrete VBDUs, representing discrete topical units (see below). There are several parameters that can be adjusted in the TextTiling software that affect the number of discrete VBDUs that are identified in a text. The most important of these is the difference between the TextTiling peak and valley required to be considered as a VBDU boundary. For example, Figure 6.4 plots the boundaries in an anthropology textbook passage. Similar to the pattern found
with other texts, human raters agreed on the location of a few topic boundaries in this text, and then also identified several other boundaries with lesser agreement. The VBDU segmentation software was run twice on this text: once requiring a 25% difference between the TextTiling peak and valley to be considered a VBDU boundary, and a second time requiring only a 20% difference. With the more strict (25%) requirement, only two VBDU boundaries were identified in the text (after sentences 23 and 36). In contrast, seven VBDU boundaries were identified with the requirement of a 20% difference between TextTiling peak and valley (which included the two boundaries identified by the 25% setting). In this case, the VBDU boundaries are identical to all major perceptual boundaries identified in this text, plus two other intermediate boundaries.1
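The effect of the peak/valley parameter can be illustrated with a small sketch. The exact way the software computes the percentage difference is not specified here, so the definition below is an assumption: a valley is a local minimum of the profile, the reference peak is the lower of the nearest peaks on either side, and the drop is expressed relative to that peak. The function name `vbdu_boundaries` and the profile values are invented for illustration only.

```python
def vbdu_boundaries(profile, min_drop=0.25):
    """Return indices of valleys whose relative drop from the lower of the
    two neighbouring peaks is at least `min_drop` (assumed definition)."""
    boundaries = []
    for i in range(1, len(profile) - 1):
        # A valley is a local minimum of the similarity profile.
        if not (profile[i] < profile[i - 1] and profile[i] <= profile[i + 1]):
            continue
        # Climb to the nearest peak on each side of the valley.
        left = i
        while left > 0 and profile[left - 1] >= profile[left]:
            left -= 1
        right = i
        while right < len(profile) - 1 and profile[right + 1] >= profile[right]:
            right += 1
        peak = min(profile[left], profile[right])
        if peak > 0 and (peak - profile[i]) / peak >= min_drop:
            boundaries.append(i)
    return boundaries

# Invented profile values, one per gap between adjacent text windows.
profile = [0.60, 0.50, 0.62, 0.70, 0.40, 0.66, 0.64, 0.50, 0.65]
strict = vbdu_boundaries(profile, min_drop=0.25)
loose = vbdu_boundaries(profile, min_drop=0.20)
print(strict, loose)
```

Under this definition, any boundary found with the stricter setting is also found with the looser one, which matches the pattern reported for the anthropology passage.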
Figure 6.4 Placement of Topical Unit Boundaries by Raters and TextTiling: Anthropology Textbook
1. Horn (2005) compares the location of VBDU boundaries to the location of step (and move) boundaries in a sub-corpus of 11 biochemistry research articles taken from Kanoksilapatham's study (see Chapter 4). Although the two methods of text segmentation agreed in some instances, in general they reflected different underlying constructs. Steps (and moves) are generally smaller text segments than VBDUs, and they can be discontinuous in a text (so that parts of a single move/step can be found throughout a text, composed of text segments that are not necessarily contiguous). As a result, there was generally low agreement between the scope of the two types of units and the specific location of boundaries.
In sum, the results of our experiments indicate a high level of agreement between the human and automatic VBDU segmentations in cases where there is a high level of agreement among human raters. However, in other cases there was little agreement among human raters, and therefore low agreement with the VBDU segmentation. We would argue that the automatically assigned boundaries are as valid as human-assigned boundaries in such cases. VBDU segmentation has the advantages of being reliable and easily applicable to a large corpus of texts. As such, it is ideal for the investigation of generalizable discourse patterns from a corpus perspective. In the following chapters, texts are segmented into VBDUs as the first major procedural step in the analysis. The focus of the analysis, though, is on the different types of VBDUs, determined by a comprehensive description of lexico-grammatical characteristics, and on the generalizable patterns of discourse organization when texts are considered as sequences of VBDUs from these different types. We argue that VBDUs are valid units of analysis because they prove to be useful for the description of discourse patterns in texts. That is, the following analyses show that there are systematic linguistic differences among VBDU types, and that we can gain useful insights into discourse organization by considering the sequences of VBDUs in texts. Thus, the validation of VBDUs can be supported independently from two sources: they correspond generally to human-identified perceptual discourse units in cases where humans are able to agree with a high degree of reliability, and they prove to be useful and interpretable units of analysis in their own right. We certainly would not argue that VBDUs are the single correct way to segment a text into coherent discourse units, but we hope to demonstrate that this is a highly productive approach for corpus analysis. 
At the same time, we recognize the need for extensive future research on the textual basis of the TextTiling profile. As noted above, TextTiling is a continuous construct, reflecting the continuous evolution of topic in a text. Future research could also explore methods for describing texts as continuous constructs rather than being composed of a sequence of discrete discourse units. Furthermore, we need additional research on the mechanics of segmenting texts into VBDUs based on the TextTiling profile. In particular, we need methods for determining the best settings for the parameters in the TextTiling software (including the window size, required peak/valley difference, minimum VBDU length, and whether different parameters should be used for VBDUs of differing lengths). It seems likely that different kinds of texts will be best segmented with different TextTiling parameter settings, but at present we have not developed procedures for these adjustments. Future research should help to refine the actual VBDU segmentation resulting from TextTiling.
4 Using VBDUs to analyze the discourse structure of texts

As described in Section 2 above, TextTiling is used to segment all texts in a corpus into vocabulary-based discourse units. Each of these VBDUs can then be treated as a unit (or a sub-text) for the purposes of linguistic analysis. This corresponds to Step 2 in the general bottom-up approach to corpus-based discourse analysis (see Table 1.2 in Chapter 1). For the linguistic analysis, each VBDU is automatically tagged to identify a large number of linguistic features. The current version of the tagger used in the present studies incorporates the corpus-based research carried out for the Longman Grammar of Spoken and Written English (Biber et al., 1999). Using dictionaries, probabilities, structural rules, and contextual features, the tagger identifies a wide range of grammatical features, including word classes (e.g., nouns, modal verbs, prepositions), syntactic constructions (e.g., WH relative clauses, conditional adverbial clauses, that-complement clauses controlled by nouns), semantic classes (e.g., activity verbs, likelihood adverbs), and lexico-grammatical classes (e.g., that-complement clauses controlled by mental verbs, to-complement clauses controlled by possibility adjectives). Appendix Two lists the full set of features that are identified by the tagger. Once individual VBDUs are tagged, it is possible to analyze the discourse development of a text by tracking the use of linguistic features across the VBDUs of a text. For example, Figure 6.5 shows the distribution of three linguistic features across the 10 VBDUs of a biology research article: passive verbs, possibility modals (can, could, may, might), and communication verbs (e.g., suggest, report). This plot shows that the different sections of this research article are very different in their use of these linguistic features.
For example, passive verbs are especially common in the Methods section, while possibility modals are especially common in the Introduction. However, the VBDU analysis further shows that there are interesting patterns of variation within these sections. For example, communication verbs are especially common in the last two VBDUs of the Introduction, but considerably less common in the very first VBDU. Possibility modals are especially common in the second VBDU, but considerably less common in the first and last VBDU of the Introduction. At the other end of the article, communication verbs are moderately common in the first VBDU of the Discussion section, but then rare in the final two VBDUs. Patterns like these can be interpreted as reflections of the shifts in communicative purpose within the scope of a text. (See the fuller discussion of this research article, AGFOENT01, in Chapter 7.)

[Plot for Figure 6.5. Series: passive verbs, possibility modals, communication verbs. Vertical axis: Dimension Score (0 to 80). Horizontal axis: Intro (VBDU 1-3), Methods (VBDU 4-5), Results (VBDU 6-7), Discussion (VBDU 8-10).]
Figure 6.5 Distribution of linguistic features across the VBDUs of a biology research article
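Tracking a feature across VBDUs amounts to computing its rate of occurrence in each unit. The sketch below is a simplified stand-in for that step, not the tagger described above: it matches the four possibility modals by word form only (so it cannot tell the modal "may" from the month, as a tagger would), and the normalization to a rate per 1,000 words is an assumption. The function name `rate_per_1000` and the two example sentences are invented placeholders.

```python
import re

POSSIBILITY_MODALS = {"can", "could", "may", "might"}

def rate_per_1000(text, targets):
    """Occurrences of the target word forms per 1,000 running words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in targets)
    return 1000.0 * hits / len(tokens)

# Invented placeholder units; real VBDUs are at least 100 words long.
vbdus = [
    "The mites can spread and may damage buds",
    "Samples were collected and were stored",
]
rates = [rate_per_1000(unit, POSSIBILITY_MODALS) for unit in vbdus]
print(rates)
```

Plotting such rates in VBDU order gives a profile of the kind shown in Figure 6.5.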
5 Going one step further: Identifying generalizable VBDU types

VBDU segmentation of a text, and the linguistic analysis of individual texts, are only preliminary analytical steps required as the basis for the two ultimate goals of this approach: to provide a comprehensive linguistic description of discourse units and the flow of discourse within texts, and to describe generalizable patterns of discourse organization that hold across all texts of the target corpus. One important measure of this second goal is to investigate whether those general patterns can be applied to individual texts to reveal new insights into their discourse structure. To achieve these goals, four analytical steps are required (see also Section 2 in Chapter 1, especially Table 1.2):

(1) Identify all Vocabulary-based Discourse Units (VBDUs) in a large corpus representing a genre, using TextTiling (Segmentation);
(2) Analyze the linguistic characteristics of each VBDU, using a grammatical tagger and multi-dimensional analysis (Linguistic analysis of each unit);
(3) Identify and interpret the basic VBDU types, using Cluster Analysis (Classification and Linguistic description of discourse categories);
(4) Analyze the discourse structure of texts as sequences of VBDU types (Text structure and Discourse organizational tendencies).
In the following chapters, the first major analytical goal is to provide a comprehensive linguistic description of the unfolding discourse within texts. For this purpose, we apply multi-dimensional (MD) analysis, rather than focusing on the distribution of individual linguistic features. That is, individual features will vary in use across VBDUs, reflecting the functional associations of each feature in relation to the communicative goals of each VBDU; Figure 6.5 illustrates such patterns of variation. However, it is further possible to investigate the developing discourse of texts more comprehensively, considering a much wider range of linguistic features. The multi-dimensional analytical approach enables descriptions of that type. As introduced in Appendix One, MD analysis is a methodological approach that applies multivariate statistical techniques (especially factor analysis and cluster analysis) to the investigation of genre/register variation in a language. The approach was originally developed to analyze the range of spoken and written genres (or registers) in English (Biber, 1986, 1988). There are two major quantitative aspects of an MD analysis: (1) identifying the salient linguistic co-occurrence patterns in a language (the dimensions); and (2) comparing texts and genres in the linguistic space defined by those dimensions. The results of the MD analysis of VBDUs can be applied directly to investigate the discourse organization of texts (see, e.g., Csomay, 2005b). That is, each VBDU has a score for each dimension, where each of the dimensions represents a distinct set of co-occurring linguistic features. The dimension scores capture the extent to which the co-occurring linguistic features are used in the VBDU. By tracking changes in the dimension scores across the VBDUs of a text, we are able to track the linguistic development of discourse in the text.
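To make the notion of a dimension score concrete, the following sketch shows one common way such scores can be computed: standardize each feature's rate across all VBDUs, then add the standardized rates of the features with positive loadings and subtract those with negative loadings. The details are assumptions for illustration (including the use of the population standard deviation), the function names are hypothetical, and the rates are invented; only the pairing of possibility modals (positive) with nouns (negative) follows the summary of Dimension 1 given in Chapter 7.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize a list of rates; a constant feature maps to zeros."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s else 0.0 for v in values]

def dimension_scores(rates, positive, negative):
    """rates maps a feature name to its per-VBDU rates (same VBDU order)."""
    z = {feature: zscores(values) for feature, values in rates.items()}
    n_units = len(next(iter(rates.values())))
    return [
        sum(z[f][i] for f in positive) - sum(z[f][i] for f in negative)
        for i in range(n_units)
    ]

# Invented rates per 1,000 words for three VBDUs.
rates = {
    "possibility_modals": [2.0, 4.0, 6.0],
    "nouns": [300.0, 250.0, 200.0],
}
scores = dimension_scores(rates, ["possibility_modals"], ["nouns"])
print(scores)
```

A unit with many possibility modals and few nouns receives a high score, and the sequence of scores traces the movement of a text along that dimension.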
If we imagine a visual representation that plots the dimension scores of each VBDU, we would notice that discourse units are scattered throughout this multidimensional linguistic space. At the same time, there would be dense groupings or clusters of VBDUs, representing linguistic styles that are used in multiple texts. For example, Figure 6.6 shows the distribution of VBDUs extracted from a corpus of conversations, plotted in a two-dimensional space resulting from an MD analysis. This figure shows some distinct clusters of VBDUs, referred to as text types; the VBDUs grouped into each cluster are maximally similar in their multi-dimensional profiles (see Biber, 1989, 1995). Figure 6.6 clearly shows how VBDUs of a specific type in a genre tend to exhibit particular linguistic characteristics, as described by their multi-dimensional linguistic coordinates. For example, all the VBDUs in Text Type 1 (on Figure 6.6) have large positive scores on Dimension 1 and large negative scores on Dimension 2. In contrast, all the VBDUs in Text Type 2 have moderate positive scores on Dimension 1 and positive scores on Dimension 2.
Figure 6.6 Plot of VBDUs along Dimension 1 vs. Dimension 2
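Grouping VBDUs by their dimension scores can be sketched with a basic k-means procedure. This is a stand-in chosen for brevity, not necessarily the clustering algorithm used in the studies reported here. The points are invented (Dimension 1, Dimension 2) scores that merely echo the two text types described above, and the deterministic choice of the first k points as starting centroids is a simplification.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of dimension scores."""
    # Deterministic start: the first k points serve as initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Invented (Dimension 1, Dimension 2) scores for six VBDUs.
points = [(9.0, -8.0), (4.0, 3.0), (10.0, -7.5),
          (3.5, 4.0), (8.5, -9.0), (4.5, 2.5)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Each resulting cluster would then be interpreted functionally by inspecting the dimension scores at its centroid.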
Text types are linguistically well defined; text type distinctions have no necessary relation to genre distinctions. Rather, text types are defined such that the texts within each type are maximally similar in their linguistic characteristics, regardless of their situational/genre characteristics. However, because linguistic features have strong functional associations, text types can be interpreted in functional terms.2 In the methodological approach here, text types are identified quantitatively using cluster analysis, with the dimensions of variation as predictors. Cluster analysis groups VBDUs into clusters on the basis of shared multi-dimensional/linguistic characteristics: the VBDUs grouped in a cluster are maximally similar linguistically, while the different clusters are maximally distinguished (see Biber, 1989, 1995). It is possible to use these text types to achieve the second major analytical goal of this approach: describing generalizable patterns of discourse organization that hold across all texts of a corpus. In particular, we can investigate the distribution of text type sequences across all the VBDUs and texts of a corpus, interpreting the preferred sequences of text types in functional terms. These discourse patterns can further be applied to the description of individual texts, to identify texts with typical versus more specialized discourse organizations. Chapters 7–8 illustrate the application of these methods to spoken and written genres.

2. Text types and genres (or registers) represent complementary ways to dissect the textual space of a language. Text types and genres/registers are similar in that both can be described in linguistic and in situational/functional terms. However, the two constructs differ in their primary bases: genres/registers are defined in terms of their situational characteristics, while text types are defined linguistically.

In previous pilot research of this type, we investigated the vocabulary-based discourse unit types in a large multi-genre corpus, including university classroom teaching sessions, textbooks, and academic research articles (Biber, Csomay, Jones, & Keck, 2004). Because there are striking linguistic differences among these genres, it was relatively easy to identify different discourse unit types with dramatically different linguistic characteristics. Genre proved to be the most important factor in that study, with the VBDU types being constrained by genre distinctions. This was especially the case for the spoken/written opposition represented in the corpus for that study: The VBDU types used for spoken discourse were for the most part distinct from the VBDU types used for the written genres in this corpus. That pilot research suggests that this analytical approach will be more productive if carried out on a restricted corpus representing only a single genre. That is, by minimizing the influence of genre and mode on the macro-level, we are more likely to capture differences associated with the particular communicative purposes that can shift within a text. In this way, the VBDU types can be interpreted as sub-genres on a micro-level. The following two chapters illustrate analyses of this type: Chapter 7 focuses on academic research articles while Chapter 8 focuses on university classroom teaching. The advantages of this approach are that the underlying units of analysis and constructs are identified through automated large-scale corpus analysis, and thus they represent the typical patterns of use found over the scope of the entire corpus.
However, the ultimate application of the analysis takes us back to the individual text, to investigate the extent to which we can describe the discourse organization of a particular text in terms of these generalizable constructs derived from the corpus. The discussion in this section has been relatively abstract, outlining the major analytical steps followed for the study of VBDU types. It is much easier to understand the application of these analytical procedures through concrete examples, and these are provided by the case studies in the following two chapters.
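As a small preview of the final analytical step, the sketch below counts how often one VBDU type is followed by another across the texts of a corpus. It is a minimal illustration under stated assumptions: each text is represented simply as a list of type labels in VBDU order, the function name `transition_counts` is hypothetical, and the three sequences are invented.

```python
from collections import Counter

def transition_counts(texts):
    """Count adjacent pairs of VBDU types over all texts."""
    counts = Counter()
    for sequence in texts:
        # Pair each VBDU type with the type of the following VBDU.
        counts.update(zip(sequence, sequence[1:]))
    return counts

# Invented type sequences for three texts.
texts = [
    [1, 2, 2, 3],
    [1, 2, 3, 3],
    [2, 2, 3],
]
counts = transition_counts(texts)
print(counts.most_common(1))
```

Frequent pairs point to preferred sequences of text types, while a text whose pairs are rare in the corpus would be a candidate for a more specialized discourse organization.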
chapter 7
Vocabulary-based discourse units in biology research articles

WITH James K. Jones
Textual analyses of moves were first proposed by Swales (1981) as a way to understand the internal discourse structure of Introductions in academic research articles. Since that time, there have been numerous move-based investigations of research articles (see the survey of studies in Chapter 2), including the corpus-based study of moves in biochemistry research articles presented in Chapter 4. These studies have extended the original framework developed by Swales, considering the moves found in all sections of research articles (Introduction-Methods-Results-Discussion), and comparing the move structure of research articles from different academic disciplines. The present chapter takes a complementary approach, investigating the discourse organization of research articles from the perspective of Vocabulary-Based Discourse Units (introduced in Chapter 6). The study here focuses specifically on empirical research articles in biology. All of these articles already have their internal discourse structure explicitly marked by four sections: Introduction, Methodology, Results, and Discussion. The research question that we set for ourselves in this study was whether a multidimensional discourse analysis could identify other micro-genres that operate within the scope of these rhetorical sections, thus allowing a more detailed analysis of the internal discourse organization of these academic research articles. The analysis is motivated by the same two overall goals that we described in Chapter 6: to provide a comprehensive linguistic description of discourse units and the flow of discourse within texts; and to describe generalizable patterns of discourse organization that hold across all texts of the target corpus.
1 Constructing the corpus of VBDUs
The first step in the analysis was to construct a corpus representing a broad sampling of empirical research articles in biology. We included articles from 10 major journals from several different subfields:

Agricultural and Forest Entomology
Annals of Human Genetics
Clinical and Experimental Pharmacology and Physiology
Conservation Biology
Functional Ecology
International Journal of Plant Sciences
Journal of Anatomy
Journal of Applied Microbiology
Journal of Avian Biology
Journal of Medical Primatology

Recent issues of these journals that were available on-line (usually from summer or fall 2004) were chosen, with the first 10 empirical research articles selected from each journal. (Survey articles and theoretical articles were excluded from the corpus.) All articles included in the corpus had four sections: Introduction, Methodology, Results, and Discussion (IMRD). At the outset, these sections were treated as separate texts. The article sections provided a high-level representation of the discourse structure, but we were interested in the internal discourse organization within each section. We thus segmented articles into sections before undertaking the VBDU analysis. Thus, our corpus initially consisted of 400 texts: 10 academic journals × 10 articles from each journal × 4 sections in each article.

The next step in the analysis was to segment these texts into Vocabulary-Based Discourse Units (VBDUs). We applied the TextTiling procedure to automatically identify VBDU boundaries (described in Chapter 6). The following text sample from the introduction of a research article illustrates the kind of discourse units identified by the TextTiling tool, showing how VBDU boundaries typically correspond to a shift in topic and/or purpose. Each of these two VBDUs contains many words not found in the adjacent stretch of discourse. The first VBDU introduces the general context of the study, referring to various coniferous hosts, Europe, Northern America, and Mediterranean region.
The first VBDU also introduces the eriophyoid mite, which is then discussed further in the second VBDU. However, the two VBDUs differ in purpose: in the first VBDU, we learn about the general distribution of the eriophyoid mite; for example, the mites are associated with fast growing plant tissues. Then, in the second VBDU, the topic/ purpose shifts to a more specific discussion of why seasonal variation occurs in the
abundance of these mites. This new topic is associated with many new vocabulary items not found in the first VBDU, including: partly, explained, typical, dynamics, attack, colonized, first, swell, enlarge, stop, time, reproducing, etc. Text Excerpt 7.1. Text from the Introduction of a research article (Agricultural and Forest Entomology; AGFOENT01I), showing the location of VBDU boundaries. (The distinctive words in VBDU #2 are shown in bold underlined.)
ARTICLE BEGINS

[VBDU 1] The eriophyoid mite Trisetacus juniperinus has been frequently recorded on various coniferous hosts in Europe, as well as in Northern America, and is responsible for shoot deformation and death of apical cells (Keifer, 1975; Castagnoli, 1996). In the Mediterranean region, it can cause considerable damage to the evergreen cypress, Cupressus sempervirens L., especially in nurseries and young stands (Nuzzaci & Monaco, 1977; Castagnoli & Simoni, 1998; Roques & Battisti, 1999). This cypress is one of the most important tree species for landscape and forestry in the whole Mediterranean region (Teissier du Cros, 1999). The mites appear to be associated with fast growing plant tissues, such as the apical buds of the shoots and the young reproductive organs (male and female cones) (Guido et al., 1995; Castagnoli & Simoni, 2000). Active meristemes in the apical buds are available throughout the year, particularly in nurseries and young stands, whereas cones are produced only when trees are sexually mature (10–15 years) (Teissier du Cros, 1999). A detailed study of the life history of T. juniperinus on young cypress trees showed that great seasonal variation in abundance might occur in this species, with a major peak during the spring growth period (Castagnoli & Simoni, 2000).

[VBDU 2] Such variation was partly explained by the typical dynamics of the mite attack on buds: the colonized buds first swell and enlarge, then stop growing. At that time, the mites are reproducing within the buds and a high number of eggs and juveniles can be found inside (fig. 1a). Subsequently, the mites leave to disperse in the crown, whereas the deformed buds can resume growth to some extent (fig. 1b). New attacks can be detected within the same year but, usually, they are less severe.
This behaviour of the mite and the reaction of the tree can be considered as something in between the formation of a true gall and the defence reaction of the tissues, both phenomena having been described for eriophyoid mites (Westphal & Manson, 1996).
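The vocabulary shift at this boundary can be checked directly from the excerpt. The sketch below is only an illustration of the idea, not the TextTiling computation itself: it lowercases the text, treats every alphabetic string as a word form, and lists the forms in the opening of VBDU 2 that never occur in VBDU 1. The strings reproduce Text Excerpt 7.1 with citations and the parenthetical age range omitted, and only the first two sentences of VBDU 2 are included.

```python
import re

def vocabulary(text):
    """Set of lowercased alphabetic word forms in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))

# Text Excerpt 7.1, VBDU 1 (citations omitted).
VBDU_1 = (
    "The eriophyoid mite Trisetacus juniperinus has been frequently recorded "
    "on various coniferous hosts in Europe, as well as in Northern America, "
    "and is responsible for shoot deformation and death of apical cells. "
    "In the Mediterranean region, it can cause considerable damage to the "
    "evergreen cypress, Cupressus sempervirens L., especially in nurseries "
    "and young stands. This cypress is one of the most important tree species "
    "for landscape and forestry in the whole Mediterranean region. The mites "
    "appear to be associated with fast growing plant tissues, such as the "
    "apical buds of the shoots and the young reproductive organs (male and "
    "female cones). Active meristemes in the apical buds are available "
    "throughout the year, particularly in nurseries and young stands, whereas "
    "cones are produced only when trees are sexually mature. A detailed study "
    "of the life history of T. juniperinus on young cypress trees showed that "
    "great seasonal variation in abundance might occur in this species, with "
    "a major peak during the spring growth period."
)

# First two sentences of VBDU 2.
VBDU_2_OPENING = (
    "Such variation was partly explained by the typical dynamics of the mite "
    "attack on buds: the colonized buds first swell and enlarge, then stop "
    "growing. At that time, the mites are reproducing within the buds and a "
    "high number of eggs and juveniles can be found inside."
)

new_words = vocabulary(VBDU_2_OPENING) - vocabulary(VBDU_1)
print(sorted(new_words))
```

Words such as "mite" and "growing" are shared by both units and so do not appear in the result, whereas the distinctive items noted above (e.g., partly, explained, dynamics, swell) do.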
Using the TextTiling techniques, we segmented all texts in our corpus into VBDUs. Table 7.1 below shows the composition of the original corpus and the number of VBDUs identified in each research article section.
Table 7.1 Corpus used for the analysis: Breakdown by section
                # of texts    # of words    total VBDUs    # of VBDUs ≥ 100 words
Introduction       100          61,000          292              238
Methods            100         109,000          526              426
Results            100         122,000          469              381
Discussion         100          94,000          540              458
Total              400         386,000        1,827            1,503
Table 7.2 presents descriptive statistics for the VBDUs that were extracted from each of the four different article sections. VBDUs are on average around 240 words long in each section, with the longest VBDUs being around 1,000 words. We excluded all VBDUs shorter than 100 words from the quantitative analyses because the quantitative distribution of linguistic features cannot be reliably measured in short texts. Thus, the shortest VBDUs in Table 7.2 are 100 words.
Table 7.2 Descriptive statistics for VBDU length in each register
                              N of VBDUs    VBDU Length (words)
                                            Mean    Std Dev    Min.    Max.
Total research articles         1,503       243      118       100     906
Breakdown by section
  Introduction                    238       242      115       100     770
  Methods                         426       240      120       100     835
  Results                         381       232      105       100     602
  Discussion                      458       254      127       100     906
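As a quick consistency check on Table 7.2, the overall mean can be recovered approximately from the section figures by weighting each section mean by its number of VBDUs. The values below are copied from the table; because the section means are themselves rounded, the result is expected to match the reported overall mean only after rounding.

```python
# (N of VBDUs, mean length in words) per section, copied from Table 7.2.
sections = {
    "Introduction": (238, 242),
    "Methods": (426, 240),
    "Results": (381, 232),
    "Discussion": (458, 254),
}

total_n = sum(n for n, _ in sections.values())
# Weight each section mean by the number of VBDUs it is based on.
pooled_mean = sum(n * m for n, m in sections.values()) / total_n
print(total_n, round(pooled_mean))
```

The totals agree with the first row of the table: 1,503 VBDUs with a mean length of about 243 words.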
2 Analyzing the linguistic characteristics of VBDUs: Multi-dimensional analysis

To achieve the first major analytical goal, providing a comprehensive linguistic description of research article discourse units, we carried out a multi-dimensional (MD) analysis of this corpus. After the biology corpus was segmented, each
discourse unit was automatically tagged for a large number of linguistic features using the Biber grammatical tagger (see Chapter 6, Section 4). Then multi-dimensional analysis was used to identify the major patterns of linguistic variation among discourse units. Tables 7A, 7B and 7C at the end of the chapter give the full factorial structure for the analysis in the present study, while Table 7.3 summarizes the important linguistic features defining each dimension.1 As introduced in earlier chapters, MD analysis requires two major quantitative steps: (1) identifying the salient linguistic co-occurrence patterns, using factor analysis; and (2) comparing texts and genres/registers in the linguistic space defined by those co-occurrence patterns: the dimensions. Appendix One provides a fuller conceptual and methodological introduction to multi-dimensional analysis. Only 38 of the original 120+ linguistic features were retained in the factor analysis for the present study. Features were discarded either because they did not occur frequently enough to be considered important, or because they overlapped to a large extent with other features. For example, the counts for common verbs, nouns, and adjectives overlapped extensively with the semantic categories for those word classes, even though the counts were derived independently. In other cases, features were dropped because they were extremely rare in biology research articles (e.g., 2nd person pronouns, phrasal verbs). Some of these features were combined into a more general class. For example, to-clauses were originally broken down into five lexico-grammatical features, depending on the semantic class of the controlling verbs: communication verbs, mental verbs, verbs of desire, verbs of causation, and epistemic verbs. However, these lexico-grammatical features did not occur frequently enough in this corpus, and so they were all combined into a single feature: verb + to-clause. 
Similarly, all passive constructions were combined into a single feature (including agentless, by-passives, and non-finite passive clauses). In this case, the individual features were all relatively common, but they did not vary sufficiently across the texts of this corpus to figure prominently in the final factor analysis.
1. Principal components analysis, with a promax rotation, was used for the analysis. Features with a communality estimate greater than .1 were retained in the final factor analysis. All features with loadings greater than |.3| are listed in Table 7.3. In addition, features with loadings slightly less than |.3| are listed in parentheses.
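The listing convention described in footnote 1 can be sketched in a few lines of Python. This is only an illustration: the loadings below are invented, not taken from Tables 7A to 7C, and the lower cutoff of .25 for "slightly less than |.3|" is an assumption, since the footnote does not state the exact value.

```python
# Hypothetical loadings for a few features on one factor (illustrative values
# only; they are NOT the loadings reported in Tables 7A-7C).
loadings = {
    "predicative adjectives": 0.62,
    "possibility modals": 0.41,
    "necessity modals": 0.27,
    "nouns": -0.55,
    "place nouns": 0.05,
}

SALIENT = 0.30       # cutoff stated in footnote 1
NEAR_SALIENT = 0.25  # assumed cutoff for "slightly less than |.3|"


def summarize_factor(loadings, salient=SALIENT, near=NEAR_SALIENT):
    """Split features into positive and negative lists for a summary table.

    Features above the salient cutoff are listed by name; features between
    the two cutoffs are listed in parentheses; weaker features are omitted.
    """
    positive, negative = [], []
    # Sort by absolute loading so the strongest features come first.
    for feature, value in sorted(loadings.items(), key=lambda kv: -abs(kv[1])):
        size = abs(value)
        if size > salient:
            label = feature
        elif size >= near:
            label = f"({feature})"
        else:
            continue  # too weak to list
        (positive if value > 0 else negative).append(label)
    return positive, negative


positive, negative = summarize_factor(loadings)
print("Positive:", ", ".join(positive))
print("Negative:", ", ".join(negative))
```

The sketch covers only the reporting step; the factor extraction and Promax rotation themselves would be done with a statistical package.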
Table 7.3 Summary of the four dimensions from the factor analysis of the biology corpus*
Dimension 1: Evaluation of possible explanations
  Features with large positive loadings: predicative adjectives, main verb be, adjective + to-clause, adjective + that-clause, adverbs, prediction modals, possibility modals, linking adverbials, causative adverbial subordination, conditional adverbial subordination, pronoun it, 1st person pronouns, (3rd person pronouns, necessity modals)
  Features with large negative loadings: nouns

Dimension 2: Current state of knowledge versus past events and actions
  Features with large positive loadings: communication verbs, communication verb + that-clause, present tense, perfect aspect, epistemic verb + that-clause, relative clauses, demonstrative pronouns, (verb + to-clause)
  Features with large negative loadings: past tense, clausal coordination, (concrete nouns)

Dimension 3: Procedural presentation of actions / events vs. elaborated description
  Features with large positive loadings: passive voice verbs, activity verbs, past tense, mental verbs, progressive aspect, time adverbials, cognitive nouns
  Features with negative loadings: attributive adjectives

Dimension 4: Abstract / theoretical discussion of concepts
  Features with large positive loadings: nominalizations, long words, abstract nouns, process nouns, cognitive nouns, attributive adjectives, noun + that-clause

* See Appendix Two for examples and Biber et al. (1999) and Quirk et al. (1985) for further description of each of the linguistic features.
The solution for four factors was selected as optimal.2 Each factor comprises a set of linguistic features that tend to co-occur in the discourse units from the biology corpus. Factors are interpreted as underlying dimensions of variation based on the assumption that linguistic co-occurrence patterns reflect underlying communicative functions. That is, particular sets of linguistic features co-occur frequently in texts because they serve related communicative functions.
2. Taken together, these factors account for only 27% of the shared variance (see Table 7A), but they are readily interpretable. Solutions with additional factors accounted for relatively little additional variance, and subsequent factors were represented by few features. The Promax rotation used to conduct this analysis allows for correlated factors, although the inter-factor correlations in the present analysis were all small (see Table 7B).
For example, the positive features on Factor 1 (e.g., predicative adjectives, main verb be, adjective + to-clause, adjective + that-clause, prediction modals, possibility modals, linking adverbials, causative adverbial subordination, conditional adverbial subordination) co-occur in academic discourse that presents logical possibilities and compares the merits of competing explanations. Text Excerpt 7.2, a VBDU from a Discussion section, illustrates the dense use of positive Dimension 1 features:

Text Excerpt 7.2. From a Discussion section (Functional Ecology; FUNCECO04D) (Selected positive Dimension 1 features are shown in bold underlined; Dimension 1 score = 14.4)
It is conceivable that recently created grasslands will maintain a weedy, ruderal character on a long-term basis and not develop into the stable communities typical of old grasslands. The opening of gaps in the sward after drought may potentially make these communities more vulnerable to colonization by invasive species more suited to the changed climatic conditions. However, if the model predictions for wetter winters as well as drier summers prove correct, the impact on species composition and the dangers from invasive species will be smaller than they would be otherwise. The strategy of conversion of arable land to grassland is therefore still desirable on conservation grounds. However, weather as wet as that in the period 1998–2001 is likely to remain relatively rare, and it may be advisable to adjust seed mixes, where sown, to include more deep-rooting perennial forb species which will persist through periods of drought.
Reflecting the functions of these co-occurring features in discourse units, the interpretive label Evaluation of possible explanations can be proposed for Dimension 1. Dimension 2 has features with positive loadings as well as features with negative loadings, representing two distinct co-occurrence sets. These two feature sets comprise a single dimension because they tend to occur in complementary distribution: when a discourse unit has a high frequency of the positive set of features, that same discourse unit will tend to have low frequencies of the negative set of features, and vice versa. The positive features on Dimension 2 include communication verbs (especially controlling a that-clause), present tense, perfect aspect, and epistemic verb + that-clauses. Those features occur in complementary distribution to the features with negative loadings: past tense, clausal coordination, and concrete nouns. On first consideration, this distribution of features is surprising, representing exactly the opposite pattern from that found in the Biber (1988) multi-dimensional analysis of general spoken and written registers. That is, Dimension 2 in the
1988 study was defined by past tense verbs co-occurring with perfect aspect verbs, in complementary distribution to present tense verbs. That dimension was interpreted as representing stereotypical narrative discourse, especially fictional narrative. In contrast, Dimension 2 in the present analysis shows that present tense tends to co-occur with perfect aspect in biology academic discourse units, and that both of those features have a complementary distribution to past tense. In this case, perfect aspect verbs function to report past findings that continue to have current validity; thus they co-occur with present tense verbs that often report timeless facts and the current state of knowledge. An example of this is shown in Text Excerpt 7.3:

Text Excerpt 7.3. From an Introduction section (Agricultural and Forest Entomology; AGFOENT05I) (Present tense and perfect aspect verbs are shown in bold underlined; Dimension 2 score = 0.5)
After hatching, larval clutches develop through five instars before pupation (Elliott & Bashford, 1978). Gregarious newly emerged neonates initially feed in the vicinity of their eggshells, avoiding eucalypt oil glands by skeletonizing the leaf surface. However, from the third instars onwards, larvae feed on all leaf material. Upon cessation of the larval stage, larvae leave the host plant and enter a prepupal stage in the soil, after which pupation occurs in cocoons formed from silk, bodily fluids and surrounding soil particles (Elliott & Bashford, 1978; Mcquillan, 1985). A suite of natural enemies has been recorded to attack the immature stages of M. privata. Predators of larvae include spiders and an unidentified mirid (Hemiptera: Miridae) (Lukacs, 1999). [...] The larvae of M. privata are oligophagous, having been recorded feeding in the field on at least 27 species of eucalypt (Neumann & Collett, 1997). [...] Variations in the level of E. globulus defoliation following outbreaks of M. privata have been recorded in E. globulus genetics trials and plantations (Farrow et al., 1994; Jones et al., 2002).
At the other extreme, the negative features on Dimension 2 (past tense verbs and clausal coordination) are used mostly for a simple reporting of past actions and events, as in Text Excerpt 7.4:

Text Excerpt 7.4. From a Methods section (Agricultural and Forest Entomology; AGFOENT05M) (Past tense verbs are shown in bold underlined; Dimension 2 score = -7.5)
Before being searched each tree was divided into thirds, which represented a north eastern, southern and north-western aspect. The juvenile foliage of each aspect was simultaneously searched by one of three scorers for the
number of egg batches present, with scorers alternating which third of the tree was searched. Searches were conducted for 2 min, or less if the entire tree was searched within the 2 min. The leaves with egg batches were removed and returned to the laboratory, where the number of batches per tree was recorded, as well as the size of each egg batch. The number of eggs within a batch parasitized by Telenomus sp. (see Telenomus sp. egg parasitism, Woolnorth) was also recorded. Prior to analysis a log transformation was used to normalize the count data.
Considering the functions of these complementary sets of co-occurring features, we can propose the interpretive label Current state of knowledge versus past events and actions for Dimension 2.

For the most part, Dimensions 3 and 4 have only positive features. Dimension 3, which is interpreted as Procedural presentation of actions/events (versus elaborated description), consists of passive voice verbs, past tense verbs, progressive aspect verbs, time adverbials, activity verbs, and mental verbs and nouns. Text Excerpt 7.4 above illustrates many of these features co-occurring in a typical procedural discussion from a methods section. Nearly all verb phrases in this discourse unit are in the passive voice and incorporate a past tense activity verb (e.g., was divided, was searched, were conducted, were removed, was recorded, was used). This sample also illustrates the use of (non-finite) progressive verbs (e.g., being searched, alternating) and time adverbials (before, simultaneously, for 2 min., prior to). Only one negative feature occurs on this dimension: attributive adjectives. VBDUs with a dense use of attributive adjectives (coupled with the absence of activity verbs, mental verbs, etc.) tend to have a descriptive focus.

Finally, Dimension 4 includes mostly nominal features: nominalizations, long words, abstract nouns, process nouns, cognitive nouns, noun + that complement clause, and attributive adjectives. At the same time, many noun classes do not co-occur with these features, such as concrete nouns, animate nouns, or place nouns (the last two were dropped from the factor analysis because they did not co-vary significantly with other features in this corpus). The nouns that do co-occur on Dimension 4 refer to abstract concepts and processes, prompting the interpretive label Abstract discussion of concepts. Text Excerpt 7.5 illustrates these characteristics in a discussion section VBDU:

Text Excerpt 7.5. From a Discussion section (Conservation Biology; CONSBIO05D) (Selected Dimension 4 features, namely nominalizations, long words, and abstract/process nouns, are shown in bold underlined; Dimension 4 score = 11.8)
Agricultural intensification had a profound impact on nocturnal and crepuscular aerial insect abundance, and certain insect families, many of which
are host-specific, were less common on conventional farms than on organic farms. In particular, insect families important in bat diets were adversely affected by agricultural intensification. Changes in land use through agricultural intensification have reduced resource abundance for bats and have reduced the stability and predictability of such food resources. Because bat communities are resource-limited (Bonaccorso 1979; Findley 1993), our data support the hypothesis that agricultural intensification has been a factor in the reduction in the numbers of key dietary components for bats and that this reduction has led to reduced bat activity on conventional farms. Significant correlations between the activity of bats and the abundance of their prey support assumptions in the United Kingdom's biodiversity action plans that agricultural intensification has been a significant factor leading to declines in bat populations. Furthermore, our data suggest that managing farms to maximize insect abundance, especially that of key insect families, by maintaining diverse and structurally varied habitats and reducing agrochemical use, would benefit bat populations.
3 Comparing the multi-dimensional characteristics of research article sections

The corpus for this study was designed to represent four registers: the Introduction (I), Methods (M), Results (R), and Discussion (D) sections of biology research articles. Based on this design, it is possible to contrast the multi-dimensional characteristics of I-M-R-D sections from a register perspective (see also Finegan & Biber, 2001). We can use the dimensions to compare VBDUs from different sections by computing a dimension score for each discourse unit.

Dimension scores (or factor scores) are computed by summing the individual scores of the co-occurring linguistic features on a dimension (see Biber, 1988, pp. 93–97). For example, the Dimension 1 score for each VBDU is computed by adding together the frequencies of predicative adjectives, main verb be, adjective+to-clause, adjective+that-clause, etc. (the features with positive loadings on Factor 1, from Table 7.3) and then subtracting the frequencies of nouns (the only feature with a negative loading). The individual feature counts are first standardized so that each feature has a comparable scale, with a mean of 0.0 and a standard deviation of 1. This process converts the feature scores to scales representing standard deviation units, so that all features on a factor have equivalent weights in the computation of dimension scores (see Biber, 1988, pp. 93–97). (The standardization is based on the overall means and standard deviations for each feature in the biology academic corpus.)
Then, dimension scores are computed by summing the standardized frequencies for the features comprising each of the four dimensions. Figure 7.1 plots the mean dimension scores for each of the four article sections, showing that there are important linguistic differences in their preferred styles. For example, VBDUs in the Introduction tend to have large positive scores for Dimension 2 (Current state of knowledge), and large negative scores for Dimension 3 (representing the absence of procedural features, coupled with a dense use of attributive adjectives for elaborated description).
Figure 7.1 Overall mean dimension scores for the research article sections
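The dimension-score computation described above (standardize each feature, sum the standardized features with positive loadings, subtract those with negative loadings) can be sketched as follows. The feature rates are invented for illustration, only three of the Dimension 1 features are used, and the choice of the population standard deviation is an assumption; the chapter does not say which variant was used.

```python
from statistics import mean, pstdev

# Hypothetical feature rates for three VBDUs (illustrative values only).
vbdus = {
    "vbdu_1": {"predicative adjectives": 12.0, "possibility modals": 9.0, "nouns": 250.0},
    "vbdu_2": {"predicative adjectives": 4.0, "possibility modals": 1.0, "nouns": 330.0},
    "vbdu_3": {"predicative adjectives": 8.0, "possibility modals": 5.0, "nouns": 290.0},
}

# A small subset of the Dimension 1 features from Table 7.3.
positive_features = ["predicative adjectives", "possibility modals"]
negative_features = ["nouns"]


def standardize(vbdus, features):
    """Convert each feature to z-scores (mean 0, standard deviation 1)
    using the mean and standard deviation over all VBDUs."""
    z = {name: {} for name in vbdus}
    for feature in features:
        values = [counts[feature] for counts in vbdus.values()]
        mu, sigma = mean(values), pstdev(values)  # population SD: an assumption
        for name, counts in vbdus.items():
            z[name][feature] = (counts[feature] - mu) / sigma if sigma else 0.0
    return z


def dimension_score(z_row, positive, negative):
    """Sum the standardized positive features and subtract the negative ones."""
    return sum(z_row[f] for f in positive) - sum(z_row[f] for f in negative)


z_scores = standardize(vbdus, positive_features + negative_features)
scores = {
    name: dimension_score(row, positive_features, negative_features)
    for name, row in z_scores.items()
}
for name, score in scores.items():
    print(f"{name}: {score:+.2f}")
```

Because every feature is standardized before summing, a frequent feature such as nouns does not dominate the score merely because its raw counts are larger.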
Methods and Discussion sections are the most distinctive in their multi-dimensional characterizations, but the two have almost opposite multi-dimensional profiles. VBDUs from Methods sections have large negative scores on Dimension 2 (a focus on past events) and large positive scores on Dimension 3 (Procedural presentation). The negative scores for Dimensions 1 and 4 reflect the absence of evaluative and abstract features in Methods VBDUs. In contrast, the Discussion section is the only part of these research articles to show a large positive score for Dimension 1 (Evaluation of possible explanations), and this section also shows the largest positive scores for Dimension 2 (Current state of knowledge) and Dimension 4 (Abstract/theoretical discussion).

In sum, the multi-dimensional analysis shows that there are important linguistic differences across the major sections of research articles, reflecting the general communicative purposes of each section (e.g., theoretical discussion and evaluation of explanations versus a procedural description of the actual steps in the analysis). However, as the following section shows, we can use this same general approach to investigate discourse patterns at a more detailed level of analysis.
4 The multi-dimensional profile of VBDUs within a research article: Tracking the movement of discourse

Research article sections provide an overt indication of the author's intended purpose: introducing the topic; describing the methodology; reporting results; discussing the implications. As the previous section has shown, there are important linguistic differences associated with these major shifts in purpose. However, in the present approach, we compute dimension scores for each VBDU in a research article. As a result, it is possible to track the discourse flow of a research article by considering the change in dimension scores across all VBDUs. Such analyses show that the author's purpose can shift in less explicit ways within a research article section. By tracking the MD profile of VBDUs within article sections, we are able to provide a more detailed description of the internal discourse organization of an article.

Figure 7.2 plots the multi-dimensional profile for an article from Agricultural and Forest Entomology. In all, there are 10 VBDUs in this article: three in the Introduction; two in Methods; two in Results; and three in the Discussion. Figure 7.2 plots only scores for Dimensions 1–3. (Dimension 4 is less distinctive in this research article, and it is excluded to avoid clutter in the figure.)

Similar to the general patterns described in the previous section (see Figure 7.1), Figure 7.2 shows that there are important overall differences in the dimension scores across the major sections of this research article. In general, the Introduction is characterized by reference to current knowledge (vs. past events; Dimension 2), a moderate use of evaluation features (Dimension 1), and a mixed profile for procedural description (Dimension 3). The Methods section is quite distinctive, with a marked use of past event features (Dimension 2) and procedural description features (Dimension 3), together with the absence of evaluation features (Dimension 1).
The Results section in this article maintains the focus on past events (Dimension 2), but shifts to a moderate use of evaluation features (Dimension 1). Finally, the Discussion section shows a dramatic shift to current knowledge (Dimension 2) and the use of evaluation features (Dimension 1).

However, these characterizations are not consistent across all VBDUs within a section. Rather, we can track shifts in the multi-dimensional profile across the VBDUs within an article section, reflecting internal shifts in communicative purpose. The two most dramatic examples of this type are in the Introduction and Discussion sections of the article. Figure 7.2 shows that the first VBDU in the Introduction is not evaluative, somewhat focused on current (as opposed to past) knowledge, and markedly nonprocedural. The actual text of this VBDU is given above in Text Excerpt 7.1. Most of this passage is written in the present tense (Dimension 2), with a reliance on existence and simple occurrence verbs (rather than activity or mental verbs). The most notable characteristic of this VBDU, in comparison to other VBDUs from biology research articles, is the dense use of noun phrases modified with descriptive attributive adjectives (e.g., various, coniferous, apical, considerable, young, most, important, whole, growing, reproductive, male, female, active, detailed, great seasonal, major).
Figure 7.2 Multi-dimensional profile of a biology research article (AGFOENT01), showing Dimension 1–3 scores for the 10 VBDUs in the article
In contrast, the second VBDU in this research article shifts to a more evaluative style (Dimension 1), with a greater focus on current knowledge (Dimension 2) and procedural presentation (Dimension 3). This VBDU is given as Text Excerpt 7.6 (repeated from Text Excerpt 7.1 above), with selected Dimension 2 and Dimension 3 features highlighted. These are mostly verbal features (semantic classes of verbs, and tense/aspect/voice features).
Text Excerpt 7.6. VBDU #2 from the Introduction section (Agricultural and Forest Entomology; AGFOENT01I) (Selected Dimension 2 and Dimension 3 features are shown in bold underlined: communication verbs, activity verbs, mental verbs, present tense, perfect aspect, progressive aspect, passive voice, time adverbials)
Such variation was partly explained by the typical dynamics of the mite attack on buds: the colonized buds first swell and enlarge, then stop growing. At that time, the mites are reproducing within the buds and a high number of eggs and juveniles can be found inside (fig. 1a). Subsequently, the mites leave to disperse in the crown, whereas the deformed buds can resume growth to some extent (fig. 1b). New attacks can be detected within the same year but, usually, they are less severe. This behaviour of the mite and the reaction of the tree can be considered as something in between the formation of a true gall and the defence reaction of the tissues, both phenomena having been described for eriophyoid mites (Westphal & Manson, 1996).
The shifts in dimension scores reflect an underlying shift in communicative purpose across these two VBDUs. In the first VBDU of the introduction, the focus is on a description of the state of affairs, providing elaborating details with attributive adjectives; for example:
This cypress is one of the most important tree species for landscape and forestry in the whole Mediterranean region (Teissier du Cros, 1999). The mites appear to be associated with fast growing plant tissues, such as the apical buds of the shoots and the young reproductive organs
In contrast, the second VBDU shifts to a description of events, documenting the process by which mites attack buds; for example:
the colonized buds first swell and enlarge, then stop growing. At that time, the mites are reproducing within the buds and a high number of eggs and juveniles can be found inside (fig. 1a). Subsequently, the mites leave to disperse
A similarly dramatic shift can be observed between the last two VBDUs in the Discussion section. Figure 7.2 shows that all three VBDUs in the Discussion have positive Dimension 2 scores, focusing on current knowledge (in contrast to the VBDUs in the Methods and Results sections, which report past events). However, in the last VBDU we see a notable shift to a highly evaluative style of discourse, marked by the high positive score on Dimension 1. This shift represents a switch from an impersonal summary of findings (VBDU 9) to a first-person discussion of possible implications, alternative factors, and suggestions for future research (VBDU 10). Text Excerpt 7.7 presents selected sentences from these two VBDUs to illustrate this difference.
Text Excerpt 7.7. The last two VBDUs from the Discussion section of a research article (Agricultural and Forest Entomology; AGFOENT01D)
[VBDU 9, with selected Dimension 2 features marked in bold underline]
[...] This implies the existence of long-term effects of mite infestation, [...] It is worth noting that female cones attacked by T. juniperinus exhibit a similar reaction. Scales colonized by mites produce more tissue and assume a typical deformation, [...] Whereas, in a mature tree, the mites could move between the apical buds and the cones, they depend on the availability of suitable buds in seedlings and young trees, which makes survival more difficult. The details of colonization and re-colonization of trees in the field are not known for T. juniperinus, but eriophyoid mites are known to spread by wind currents and, occasionally, they are phoretic on birds and insects (Kifer, 1975; Shvanderov, 1975).

[VBDU 10, with selected Dimension 1 (evaluative) features marked in bold underline: 1st person pronouns, modal verbs, linking adverbials, and causative subordination]
In the greenhouse, as in our study, infestation of seedlings may occur via contact with infested plants because seeds are not infested with mites (Battisti et al., 2000). [...] Infestation of grafted trees via scions can be another way of mite spreading; however, this was not the case in our experiment where mites coming from the rootstocks heavily infested the grafted scions. Other ways of seedling infestation appear unlikely, because a natural re-infestation of trees maintained under outdoor conditions during the experiment was not detected [...] The higher susceptibility of Bolgheri, confirmed in our study, may represent an economic problem because Bolgheri is the most important clone on the market, [...] Later colonization in the field may be likely because trees have a higher number of tips and cones. [...]
The present section has illustrated how the results of multi-dimensional analysis can be used for detailed investigation of the discourse development of particular texts; this is done by tracking the multi-dimensional profile of the VBDUs that constitute the text. As the following sections show, multi-dimensional analysis can also be used to identify underlying text types, allowing description of generalizable discourse patterns across the texts of a corpus.
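As a rough illustration of tracking the multi-dimensional profile across VBDUs, the sketch below flags large changes in a dimension score between consecutive VBDUs and notes whether each change falls inside a section or at a section boundary. The section layout (three Introduction, two Methods, two Results, and three Discussion VBDUs) follows the article described above, but the scores and the threshold of 5.0 are invented for the example and are not read from Figure 7.2.

```python
# Section of each of the 10 VBDUs, in text order (I, M, R, D as in the chapter).
sections = ["I", "I", "I", "M", "M", "R", "R", "D", "D", "D"]

# Hypothetical Dimension 1 scores for those VBDUs (illustrative values only).
dim1 = [-3.0, 2.5, 1.0, -6.0, -5.5, 0.5, 1.5, 2.0, 3.0, 12.0]


def find_shifts(scores, threshold):
    """Return (index, change) for each VBDU whose score differs from the
    preceding VBDU's score by more than `threshold` in absolute value."""
    shifts = []
    for i in range(1, len(scores)):
        change = scores[i] - scores[i - 1]
        if abs(change) > threshold:
            shifts.append((i, change))
    return shifts


# The threshold is an arbitrary, assumed value chosen for the example.
for i, change in find_shifts(dim1, threshold=5.0):
    same_section = sections[i] == sections[i - 1]
    kind = "within section" if same_section else "at section boundary"
    print(f"VBDU {i + 1} ({sections[i]}): change of {change:+.1f}, {kind}")
```

Shifts reported as "within section" are the ones of interest here, since they correspond to changes in communicative purpose that the section headings do not signal.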
5 Identifying and interpreting the multi-dimensional text types of biology research articles

We have noted several times the two major analytical goals of bottom-up approaches to discourse organization: to provide a comprehensive linguistic description of discourse units and the flow of discourse within texts; and to describe generalizable patterns of discourse organization that hold across all texts of the target corpus. The case study presented in Section 4 above showed the application of a comprehensive linguistic analysis to describe the systematic patterns of linguistic variation within the scope of a single research article section. That case study illustrates how multi-dimensional analysis, coupled with a segmentation of a text into VBDUs, can be used to provide a continuous linguistic profile of language use over the course of a text, corresponding to text-internal shifts in communicative purpose. That approach enables a linguistically-motivated description of the discourse structure of individual texts.

However, such descriptions do not produce generalizable results; it is not feasible to directly compare these continuous linguistic profiles across all texts of a corpus. Thus the methodological challenge here is to develop analytical constructs that can be used to compare the discourse structure of multiple texts, and thus to identify discourse patterns that can be generalized for all texts in a corpus. The approach adopted here is based on discrete text types, considering general discourse patterns realized by systematic combinations of those text types.

In previous multi-dimensional studies (e.g., Biber 1989, 1995), text types are identified quantitatively using Cluster Analysis, with the dimensions of variation as predictors.
Cluster analysis groups texts into clusters on the basis of shared multi-dimensional/linguistic characteristics: the texts grouped in a cluster are maximally similar linguistically, while the different clusters are maximally distinguished. For the application here, the four dimensions of variation are used as linguistic predictors in the cluster analysis, which identifies groups of VBDUs that are maximally similar in their linguistic characteristics; these groupings are interpreted as VBDU text types. Cluster analysis is an exploratory statistical technique that groups VBDUs statistically, based on the scores for all four dimensions. The FASTCLUS procedure from SAS was used for the present analysis. Disjoint clusters were analyzed because there was no theoretical reason to expect a hierarchical structure. Peaks in the Cubic Clustering Criterion and the Pseudo-F Statistic (produced by FASTCLUS) were used to determine the number of clusters. These measures are heuristic devices that reflect goodness-of-fit: the extent to which the texts within a cluster are similar, while the clusters are maximally distinguished. In the present case, these measures had peaks for the 6-cluster solution.
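The clustering step can be illustrated with a small sketch. It uses plain k-means (Lloyd's algorithm) as a stand-in for SAS FASTCLUS, which is a k-means-style procedure for disjoint clusters but is not reproduced exactly here. The six four-dimensional points and the choice of seeds are invented for the example. The pseudo-F statistic is computed as the ratio of between-cluster to within-cluster variance; the Cubic Clustering Criterion is omitted.

```python
import math


def kmeans(points, seeds, iterations=50):
    """Plain k-means: assign each point to its nearest centroid, recompute
    the centroids, and repeat until the assignment stops changing."""
    centroids = [list(seed) for seed in seeds]
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def pseudo_f(points, assignment, centroids):
    """Pseudo-F: (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k))."""
    n, k = len(points), len(centroids)
    grand = [sum(col) / n for col in zip(*points)]
    within = sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignment))
    between = sum(
        assignment.count(c) * math.dist(centroids[c], grand) ** 2 for c in range(k)
    )
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical (Dim 1, Dim 2, Dim 3, Dim 4) scores for six VBDUs.
points = [
    (10.0, 2.0, -1.0, 0.0), (11.0, 3.0, 0.0, 1.0), (9.0, 1.0, -2.0, 0.0),
    (-4.0, -6.0, 8.0, -1.0), (-5.0, -7.0, 9.0, -2.0), (-3.0, -5.0, 7.0, 0.0),
]
seeds = [points[0], points[3]]  # assumed starting seeds, chosen by hand

assignment, centroids = kmeans(points, seeds)
print("Cluster of each VBDU:", assignment)
print("Pseudo-F:", round(pseudo_f(points, assignment, centroids), 2))
```

In practice the procedure would be repeated for several candidate numbers of clusters, and the solution at which the fit statistics peak would be retained, as described above for the 6-cluster solution.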
Figure 7.3 shows the distribution of research article VBDUs plotted with respect to Dimension 1 and Dimension 2. This figure shows some distinct clusters of VBDU text types. For example, the VBDUs in Text Type 1 (on Figure 7.3) have large positive scores on Dimension 1. The VBDUs in Text Type 5 tend to have large positive scores on Dimension 2. The other text types are less clearly distinguished with respect to these two dimensions, but some of them have more distinctive characterizations with respect to Dimensions 3 and 4.
Figure 7.3 Scores for research article VBDUs along Dimensions 1 and 2, identifying the text type of each VBDU
Table 7D (at the end of the chapter) provides a descriptive summary of the cluster analysis results, showing the number of VBDUs grouped into each cluster together with other statistics on the dispersion of VBDUs within the cluster and the nearest cluster. The clusters can be interpreted as VBDU Text Types, because each cluster represents a grouping of VBDUs with similar linguistic profiles. Figure 7.4 provides a graphic representation of the linguistic differences among the text types, showing the mean dimension scores for each type. (Table 7E at the end of the chapter provides the actual mean scores and standard deviations for the dimension scores of each cluster.) The clusters differ notably in their distinctiveness: the smaller clusters are more specialized and more sharply distinguished linguistically. For example, Cluster 1 has only 73 VBDUs (see Table 7D); linguistically, Figure 7.4 shows that the VBDUs grouped in Cluster 1 have extremely large positive scores on Dimension 1
(Evaluation) and moderately large positive scores on Dimension 2 (Current state of knowledge). At the other extreme, Cluster 3 is a general MD-Discourse Type: it is large (538 VBDUs) and relatively unmarked in its dimension scores.
Figure 7.4 Mean dimension scores for the six VBDU text types
Table 7F at the end of the chapter and Figure 7.5 show the distribution of VBDUs across discourse types (the clusters) and research article sections. The clusters are not distributed evenly across research article sections. For example, 58 of the 73 VBDUs in Cluster 1 (or 79.45% of all Cluster 1 VBDUs) occur in Discussion sections. Cluster 2 similarly shows a strong association with a single section: 199 of the 265 VBDUs in Cluster 2 (or 75.09% of all Cluster 2 VBDUs) occur in Methods sections. At the same time, Table 7F and Figure 7.5 show that most research article sections can be composed of discourse units from all six clusters. For example, there are a total of 458 VBDUs from Discussion sections, broken down as follows:

Cluster 1: 58 (12.66%)
Cluster 2: 8 (1.75%)
Cluster 3: 66 (14.41%)
Cluster 4: 65 (14.19%)
Cluster 5: 140 (30.57%)
Cluster 6: 121 (26.42%)

Two major patterns are apparent from Figure 7.5. First, the article sections differ greatly in the extent to which they rely on different text types. At one extreme,
Results sections tend to rely on VBDUs from a single text type (Cluster 3, interpreted below as Description of events); almost 60% of the VBDUs in Results sections are from this text type. Methods sections are also highly specialized, relying primarily on only two text types; about 93% of the VBDUs in Methods sections are from Clusters 2 and 3. In contrast, Introductions and Discussion sections use a much wider range of text types. The second major pattern has to do with the particular text types preferred in each article section. For example, VBDUs from Text Type 3 are especially prevalent in Methods and Results sections, while VBDUs from Text Type 2 are prevalent only in the Methods section. VBDUs from Text Type 1 are less common overall and restricted primarily to the Discussion sections of these articles.
Figure 7.5 Distribution of text types across article sections
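The percentage breakdowns reported above follow directly from the counts. The short sketch below recomputes them from the Discussion-section counts and the Cluster 1 total given in the text; the function and variable names are illustrative only.

```python
# Discussion-section VBDU counts per cluster, as reported in the text.
discussion_counts = {1: 58, 2: 8, 3: 66, 4: 65, 5: 140, 6: 121}


def percentages(counts):
    """Convert raw counts to percentages of their total."""
    total = sum(counts.values())
    return {key: 100.0 * n / total for key, n in counts.items()}


shares = percentages(discussion_counts)
print("Total Discussion VBDUs:", sum(discussion_counts.values()))
for cluster, share in shares.items():
    print(f"Cluster {cluster}: {discussion_counts[cluster]} ({share:.2f}%)")

# Share of all Cluster 1 VBDUs (73 in total) that occur in Discussion sections.
cluster1_in_discussion = 100.0 * discussion_counts[1] / 73
print(f"Cluster 1 VBDUs in Discussion sections: {cluster1_in_discussion:.2f}%")
```

The same function applied to the other rows and columns of Table 7F would give the distributions plotted in Figure 7.5.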
Taken together, Figures 7.45 provide the basis for the interpretation of each VBDU Text Type. These interpretations are refined by consideration of individual VBDUs from each type (discussed below). Figure 7.4 shows that the four most distinctive Text Types are defined by especially large scores on one of the four dimensions (these scores are shown in bold on Table 7E). VBDUs from Type 1 have especially large scores on Dimension 1 (Evaluation of possible explanations) plus relatively large scores on Dimension 2 (Current state of knowledge); Text Excerpt 7.2 above illustrates a VBDU of this type. VBDUs from Type 2 have especially large positive scores on Dimension 3 (Procedural discourse), together with large
negative scores on Dimension 2 (Past events); Text Excerpt 7.4 above illustrates a VBDU of this type. VBDUs from Type 5 have especially large positive scores on Dimension 2 (Current state of knowledge); Text Excerpt 7.3 above illustrates a VBDU of this type. And VBDUs from Type 6 have especially large positive scores on Dimension 4 (Abstract/theoretical discussion of concepts); Text Excerpt 7.5 above illustrates a VBDU of this type. Reflecting these linguistic characteristics, we propose the following tentative interpretive labels for the six text types in our study:

Text Type 1: Current evaluation of implications and explanations
Text Type 2: Procedural description of past actions and events
Text Type 3: Report of past events
Text Type 4: Abstract elaborated discussion (not evaluative and not procedural)
Text Type 5: Presentation of the current state of knowledge
Text Type 6: Current abstract/theoretical discussion

As the following sections show, biology research articles tend to use these six text types in systematic combinations, reflecting different underlying patterns of discourse organization.
6 Using VBDU text types to describe the discourse organizational patterns of biology research articles

The sections above have described how research articles can be segmented into discourse units; how factor analysis can be used to identify the underlying dimensions of linguistic variation among these VBDUs; how those dimensions can be used to track the discourse development of an individual research article; and how cluster analysis can be used to identify the text types that are defined by the multidimensional linguistic space. It is further possible to use these constructs to describe the discourse organization of an individual research article. Figure 7.5 above shows that Research Article Sections and Text Types provide two complementary perspectives on the discourse structure of biology research articles, with all six text types occurring in each section. By identifying the text type of each VBDU in a research article, we can describe the internal discourse organization of article sections. To illustrate, Table 7.4 presents an outline of the VBDUs that make up a research article from the journal Agricultural and Forest Entomology. This same article is discussed in Section 4 above, and Figure 7.2 (above) plots the multi-dimensional profile of VBDUs from this article. Table 7.4 is similar in tracking the linguistic characteristics of each VBDU in the article, but in this case, those VBDUs are classified according to their text type (allowing direct comparison to the discourse patterns of other research articles).

We have already presented several text samples from this research article. Text Excerpt 7.1 (above) presents the first two discourse units from the article, and Text Excerpt 7.6 highlights the distinctive linguistic characteristics of VBDU 2. It turns out that these two VBDUs also represent different text types (Types 4 and 5). The first VBDU is Type 4 (Abstract/theoretical discussion), characterized mostly by a large positive score on Dimension 4: Abstract / theoretical discussion of concepts. As such, this VBDU uses many long words, nominalizations, abstract/process nouns, and attributive adjectives (e.g., eriophyoid, Trisetacus, juniperinus, coniferous, Mediterranean, Cupressus, sempervirens, reproductive, meristemes, variation, abundance). This reliance on technical/abstract vocabulary is absent in the second VBDU, which is from Type 5 (Presentation of the current state of knowledge). Instead we find frequent reference to concrete, tangible nouns, such as mite, bud, eggs, juveniles, crown, tree, gall. At the same time, the second VBDU illustrates the use of communication verbs to report previous findings (explained, described), together with the use of present tense verbs to present currently accepted facts that provide the background to the present study (e.g., the colonized buds first swell and enlarge, then stop growing. At that time, the mites are reproducing within the buds [...] Subsequently, the mites leave to disperse in the crown).
Table 7.4 Outline of the VBDUs in a biology research article (AGFOENT01)
Section       VBDU    Text type  Description
Introduction  VBDU1   Type 4     Abstract / theoretical discussion (not evaluative and not procedural)
Introduction  VBDU2   Type 5     Presentation of the current state of knowledge
Introduction  VBDU3   Type 5     Presentation of the current state of knowledge
Methods       VBDU4   Type 3     Simple report of past events
Methods       VBDU5   Type 2     Procedural description of past actions and events
Results       VBDU6   Type 3     Simple report of past events
Results       VBDU7   Type 3     Simple report of past events
Discussion    VBDU8   Type 5     Presentation of the current state of knowledge
Discussion    VBDU9   Type 5     Presentation of the current state of knowledge
Discussion    VBDU10  Type 1     Evaluation of implications and explanations (within the context of current knowledge)
Table 7.4 shows that there is also a text type shift between the last two VBDUs in the Discussion section of this research article; Text Excerpt 7.7 above highlights some of the distinctive linguistic characteristics of those two VBDUs. VBDU 9 is from Type 5, interpreted as presenting the current state of knowledge. In VBDU 9, the linguistic features of this type are used to present generally accepted facts that can be used to interpret the particular findings of the present study (e.g., this implies; it is worth noting that female cones exhibit; Scales colonized by mites produce more tissue and assume a typical deformation; size and seed quality of the cone is lower than that of healthy cones (Battisti et al., 2000); eriophyoid mites are known to spread by wind currents and, occasionally, they are phoretic on birds and insects (Kifer, 1975; Shvanderov, 1975)). The last discourse unit of this article (VBDU 10) shifts to Text Type 1, which relies on the evaluative features associated with Dimension 1. As Text Excerpt 7.7 shows, this VBDU is marked by a notable shift to evaluative/interpretive features, including 1st person pronouns (our), modal verbs (e.g., may, can, should), causative subordination, linking adverbials (e.g., however), main verb be, and the dense use of predicative adjectives, often controlling a complement clause (e.g., optimal, unlikely, present; are able to, be sufficient to). In sum, the analysis of this discussion section in terms of its composite text types helps to identify a shift in author orientation reflecting an underlying shift in purpose.

Overall, the text type description is generally in agreement with the multi-dimensional profile described in Section 4 above. The multi-dimensional profile, as in Figure 7.2, actually provides more detailed information about the internal discourse characteristics of an individual research article, because it represents continuous patterns of use rather than discrete categories.
However, the primary advantage of multi-dimensional profiles (that they capture continuous patterns of use) is a liability for any attempt to document generalizable patterns of discourse organization across research articles. In contrast, the analysis of discourse structure in terms of discrete categories (text types) is ideal for such purposes. The following sections discuss several of the general discourse patterns that emerge from the investigation of biology research articles.
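To make the idea of a discrete text type outline concrete, the article in Table 7.4 can be written down as a sequence of (section, VBDU, text type) records and scanned for type shifts inside each section. This is only an illustrative sketch: the data are taken from Table 7.4, but the function names and the record layout are assumptions made here, not part of the original study.

```python
from itertools import groupby

# (section, VBDU number, text type) for article AGFOENT01, as listed in Table 7.4.
ARTICLE = [
    ("Introduction", 1, 4), ("Introduction", 2, 5), ("Introduction", 3, 5),
    ("Methods", 4, 3), ("Methods", 5, 2),
    ("Results", 6, 3), ("Results", 7, 3),
    ("Discussion", 8, 5), ("Discussion", 9, 5), ("Discussion", 10, 1),
]


def section_sequences(article):
    """Group consecutive VBDUs by section and return {section: [text types]}."""
    return {
        section: [text_type for _, _, text_type in rows]
        for section, rows in groupby(article, key=lambda row: row[0])
    }


def shifts_within_sections(article):
    """Return (section, from_vbdu, to_vbdu, from_type, to_type) for each type change."""
    shifts = []
    for previous, current in zip(article, article[1:]):
        same_section = previous[0] == current[0]
        if same_section and previous[2] != current[2]:
            shifts.append((current[0], previous[1], current[1], previous[2], current[2]))
    return shifts


if __name__ == "__main__":
    print(section_sequences(ARTICLE))
    for shift in shifts_within_sections(ARTICLE):
        print(shift)
```

Because the representation is categorical, the same two functions can be applied unchanged to any other article, which is what makes comparison across articles straightforward.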
7 Starting and ending research article sections

One general research question that can be investigated from a text type perspective is whether there are preferred ways to begin and end a section in a research article. In the present study, there are distinctive patterns of discourse in Introductions, Methods, and Discussion sections. (VBDUs in Results sections tend to be Text Type 3, Report of Events, so there is little internal variation among the VBDUs within Results sections.)
7.1 Describing the typical discourse organizations of introductions
Figure 7.6 compares the preferred text type used to begin an article introduction versus the preferred text type for the final VBDU in the introduction. (Only 76 pairs of VBDUs are considered here, because 14 of the research articles in our corpus had short Introductions, consisting of only a single VBDU.)
Figure 7.6 Preferred text types in Introductions, by position
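The kind of tally behind a first-versus-last comparison can be sketched as follows. The input data here are invented purely for illustration and do not reproduce the corpus counts; the function names are likewise hypothetical. Sections with a single VBDU are skipped, mirroring the exclusion described above.

```python
from collections import Counter


def first_last_counts(sections):
    """Tally text types in first and last position, skipping one-VBDU sections."""
    first, last = Counter(), Counter()
    for text_types in sections:
        if len(text_types) < 2:
            continue  # a single VBDU cannot form a first/last pair
        first[text_types[0]] += 1
        last[text_types[-1]] += 1
    return first, last


def as_percentages(counter):
    """Convert a Counter of raw tallies into percentages rounded to 1 place."""
    total = sum(counter.values())
    return {key: round(100 * value / total, 1) for key, value in counter.items()}


# Hypothetical Introductions, each a list of VBDU text types in order.
toy_introductions = [[5, 6, 4], [5, 4], [4, 5, 5], [5], [3, 5, 4], [6, 6]]

first, last = first_last_counts(toy_introductions)

if __name__ == "__main__":
    print("first:", as_percentages(first))
    print("last:", as_percentages(last))
```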
Figure 7.6 shows that there is considerable variability in the text types used for article introductions. VBDUs from Text Types 3, 4, 5, and 6 are used both to begin an article introduction and as the last VBDU in the introduction. At the same time, there is a difference between the two discourse positions: the preferred text type used to begin article introductions is Text Type 5: Presentation of the current state of knowledge. In contrast, Text Type 4: Abstract elaborated discussion is the preferred type used to end article introductions. That is, the most common discourse pattern for Introductions is to begin with a summary of the current state of knowledge, and then shift to a more technical/abstract introduction of the proposed study (see note 3). Text Excerpt 7.8 illustrates a discourse organization of this type. Notice in particular the reliance on present tense and perfect aspect in VBDU #1, to establish what has been accomplished in this area of research to date. Communication and discovery verbs + that-clause constructions are also typical of this text type (e.g., proposed that, found that). In contrast, the actual overview of the present study in the last VBDU of the Introduction shifts to a greater reliance on technical terminology and elaborated noun phrases. Notice the extremely dense use of long, technical terms, attributive adjectives, nominalizations, and abstract/process nouns generally in VBDU #3.

3. The research article described above in Table 7.4 (see Text Excerpts 7.1 and 7.6) actually illustrates the opposite pattern: the introduction begins with a VBDU of Type 4 (Abstract discussion) and then shifts to Type 5 (Current knowledge).

Text Excerpt 7.8. The first and last VBDUs from the Introduction section (Conservation Biology; CONSBIO06I)
VBDU 1 (the first VBDU in the Introduction) (Text Type 5: Current Knowledge, with selected Dimension 2 features marked in bold underlined: present tense, perfect aspect, communication/epistemic verbs + that-clause)

Habitat fragmentation and edge effects are putative threats to population viability for a variety of wildlife species. Documented declines of some migratory bird species over the past three decades (REFS) have resulted in numerous studies of how fragmentation affects the nesting success of populations. [...] Numerous studies of edge effects, defined as an increased probability of nesting failure near habitat edges, also have been conducted. However, the impact of edges on nesting success remains unclear. [...] Andersen (1995) proposed that landscape composition, the amount of different patch types in the landscape, might explain the variation in results of edge-effects studies and found that edge effects in Europe are more common in forest-farmland mosaics than in forest mosaics characterized by stands of varying ages.

VBDU 3 (the last VBDU in the Introduction) (Text Type 4: Abstract discussion, with selected Dimension 4 features marked in bold underlined: nominalizations, long words, abstract / process nouns, attributive adjectives)

However, artificial nests may be subject to different predation pressures than natural nests (REFS) and may not reflect the nesting success of an actual bird species (REFS). Thus, critical information about how edges affect the success of natural nests of birds in heterogeneous landscapes is still lacking. To address these issues, we evaluated the success of 230 Wood Thrush (Hylocichla mustelina) nests in edge and interior habitats in both fragmented and contiguously forested landscapes in central New York. The Wood Thrush, a Nearctic Neotropical migratory songbird, is an ideal species for separating the effects of fragmentation and edge. [...] Our objectives were to (1) compare the abundance and nesting success of Wood Thrushes in fragmented and contiguous landscapes; (2) compare Wood Thrush abundance and nesting success in edge and interior habitat in each landscape type; and (3) to use actual nest data to test the hypothesis that edge effects are stronger in fragmented landscapes than in contiguous landscapes.
In sum, we have illustrated here the preferred pattern of discourse organization within biology research article Introductions. In this case, many other patterns are possible and in fact commonly occur (as in the case of the shift from an Abstract Text Type to a Current Knowledge Text Type shown in Table 7.4 above). In contrast, the following section shows that Methods sections have a much more strongly preferred pattern of internal discourse organization.

7.2 Describing the typical discourse organizations of methods sections
Figure 7.7 shows that Methods sections in biology articles are quite constrained in their preferred discourse organizations: Text Type 3 (Report of Events) is strongly preferred (66% of the time) as the first VBDU within a Methods section, while Text Type 2 (Procedural Description) is strongly preferred as the last VBDU in this section (67% of the time), as illustrated in Text Excerpt 7.9.
Figure 7.7 Preferred text types in Methods sections, by position
Text Excerpt 7.9. The first and last VBDUs from the Methods section (Functional Ecology; FUNCECO05M)
VBDU 3 (the first VBDU in the Methods) (Text Type 3: Report of Events with active voice verbs marked in bold underlined)

Bicyclus anynana (Satyrinae) is a tropical butterfly distributed from southern Africa to Ethiopia, which feeds on a variety of fallen and decaying fruit, including that from Ficus trees (REF). A laboratory stock population of B. anynana was established at Leiden University in 1988 from over 80 gravid females collected at a single locality in Malawi. [...] Butterflies from this stock population were used for this study. As in earlier studies (REFS), we used natural variation in the 13C content of plants to trace the dietary sources of egg carbon. The 13C content is expressed as the ratio of sample 13C:12C relative to a limestone carbon standard [FORMULA]. A more positive number indicates an increased abundance of the heavy isotope, and is referred to as enriched. [...] By raising butterflies on isotopically contrasting larval and adult diets, we could easily identify the dietary source of carbon in the eggs. Likewise, plants may also differ in 15N content, depending on the sources and pathways used in nitrogen assimilation. [...]

VBDU 5 (the last VBDU in the Methods) (Text Type 2: Procedural with past tense verbs and selected Dimension 3 features marked in bold underlined: passive voice, activity verbs, and mental verbs)

In both laboratories, egg 13C and 15N were measured using continuous flow isotope ratio mass spectrometry. A CE Instruments Elemental Analyser (Milan, Italy) was used to combust and separate sample gases, which were introduced into a Finnigan Delta XL Plus isotope ratio mass spectrometer in a continuous stream of helium, via the Conflo II interface (Thermo Finnigan, Bremen, Germany). C were obtained from the Elemental Analyser for both experiments; % C and % N were determined for Experiment 2 only.

Statistics

Throughout, means are given ± 1 SE. The daily trend in mean egg number was evaluated with linear regression. Differences in fecundity between fed and starved females were evaluated with t tests. Differences in egg 13C per day were evaluated with ANOVA or ANCOVA, and residuals were evaluated for normality using the Shapiro-Wilks test. Non linear fitting was performed using least squares minimization (REF).
Methods sections in these research articles tend to follow the same discourse progression: The first VBDU is a general introduction to the study and the methodology. This VBDU mixes past tense and present tense verbs, and it mixes active voice and passive voice verbs. In many cases, active voice verbs are used with direct reference to the researchers (e.g., we used, we could identify), while in other cases, active voice verbs are used to establish the parameters of the study (a positive number indicates, plants may also differ). In contrast, subsequent VBDUs in Methods sections tend to shift to Text Type 2, characterized by a dense and consistent use of passive voice verbs. The researchers are fully backgrounded in these methodological VBDUs, because they are assumed to be the logical subject of every verb phrase. These verbs generally report the activities that the researchers performed in the study (e.g., measured, introduced, obtained), which also often involve abstract mental processes (e.g., determined, evaluated). Similar to the preferred text type sequences in Introductions, the pattern described here is not an absolute rule for Methods sections. However, the text type analysis shows that there is a very strong preference for this discourse organization within these research articles.

7.3 Describing the typical discourse organizations of discussion sections
Finally, we can use the same approach to consider the preferred sequences of text types in Discussion sections. At a general level, Figure 7.8 shows that Discussions follow a pattern more similar to Introductions than Methods: many different text types are used in Discussions, occurring in both initial and final positions. However, there are certain moderate preferences. First of all, although Text Type 1, Current evaluation of implications and explanations, is not especially common overall, this text type is more likely to occur as the last VBDU in the Discussion (14% of the time) than as the first VBDU (7%). In contrast, Text Type 5, Current Knowledge, is the preferred type used for the first VBDU in Discussion sections. Text Excerpt 7.7 above (see Table 7.4) illustrates this discourse organization (shifting from Type 5 to Type 1). However, it is more common to begin the Discussion with Text Type 5 and then shift to Text Type 6, Current abstract/theoretical discussion, as the final VBDU in the Discussion section (33% of the time).

Text Excerpt 7.10 illustrates this discourse pattern. The first VBDU in the discussion section sets the stage again, reminding readers about the larger context of the study. Present tense and perfect aspect verbs are prevalent in this VBDU, used to state what we currently know about the topic. The following VBDUs in the Discussion then shift to a general summary of the study findings and the broader theoretical implications of those results. In some cases, that discussion can be evaluative, as in Text Excerpt 7.7 above. However, more commonly the discussion is abstract and theoretical, as in VBDU 11 in Text Excerpt 7.10. Notice in particular the use of nominalizations (e.g., correlation, interaction, conclusion, resistance), which are often abstract/process nouns, as well as the stance noun + that-clause constructions (e.g., the fact that, a hypothesis that), and the use of long words generally (e.g., benzalkonium, ciprofloxacin). Taken together, these co-occurring features present a style of discourse that is used for concluding abstract/theoretical discussions of implications.
Figure 7.8 Preferred text types in Discussions, by position
Text Excerpt 7.10. The first and last VBDUs from the Discussion section (Journal of Applied Microbiology; JAPMICR05D)
VBDU 9 (the first VBDU in the Discussion) (Text Type 5: Current Knowledge with selected Dimension 2 features marked in bold underlined: present tense and perfect aspect)

The relationship between antibiotic and antimicrobial biocide (disinfectants) resistance is currently considered a hot topic (REF). In cases where antibiotic-resistant organisms have become a serious problem with respect to nosocomial infection, the widespread use of antimicrobial biocides and the introduction (or enforced compliance) of handwashing and general hygiene measures normally leads to its amelioration (REFS). If, as has been suggested, antimicrobial biocides may be aiding the incidence of the same antibiotic resistant organisms, then there appears to be a dichotomy of reasoning. The study of the mean MIC, MIC50 or the MIC90 values for a population does allow general trends in the change of resistance with time to be highlighted. However, such a study will fail to answer any questions on potential correlations of the antimicrobials. [...]

VBDU 11 (the last VBDU in the Discussion) (Text Type 6: Current Abstract/theoretical discussion with selected Dimension 4 features marked in bold underlined: nominalizations, long words, abstract / process nouns, attributive adjectives, noun+that complement clause)

Table 5 shows that benzalkonium chloride and ciprofloxacin are negatively correlated but no correlation was found with gentamycin. The rotated factor, rpc3, consists of factors from the two QACs and is essentially devoid of any interaction with any other antimicrobial. Although this compares well with the study of Jones et al. (1989) who showed cross resistance between QACs when Ps. aeruginosa was trained to be resistant to a single QAC, the absence of antibiotic factors and the fact that there is zero correlation between rpc1 and rpc3, suggests that, from this data, that there are no correlations between QACs and antibiotics. As this latter result was not expected, a study with a larger data set with more antibiotics is needed to help confirm or change this conclusion. From the analyses performed here it is very difficult to support a hypothesis that increased biocide resistance is a cause of increased antibiotic resistance either in Staph. aureus or in Ps. aeruginosa. [...]
8 Preferred text type sequences across research article section boundaries

The preceding sections have documented the preferred patterns of discourse development within the sections of biology research articles. It also turns out that there are preferred patterns across article sections: the text type of one section can influence the choice of text type in subsequent sections. For example, Figure 7.7 in Section 7.2 above shows that Methods sections typically begin with a Report of Events VBDU (Type 3), followed by Procedural VBDUs (Type 2). However, Figure 7.7 further shows that some Methods sections (about 30% of the time) begin directly with a Type 2 (Procedural) VBDU. We can predict this discourse choice in part by considering the text type of the final VBDU in the Introduction. As Figure 7.9 shows, when the Introduction ends with Text Type 5 (Current Knowledge) or Text Type 6 (Abstract Current Knowledge), it is more likely that the Methods section will begin directly with a Procedural VBDU (Text Type 2). (That is, only about 25% of Methods sections begin with a Procedural VBDU when the Introduction ends with a Report of events or an Abstract VBDU, compared to almost 40% of Methods sections that begin with a Procedural VBDU when the Introduction ends with a Current knowledge VBDU.)

Text Excerpt 7.11 illustrates an article with this discourse pattern. It is interesting to compare this text sample to Text Excerpt 7.9 in Section 7.2 above. Text Excerpt 7.9 illustrates the typical discourse organization for Methods sections in these research articles: the first VBDU in the Methods section provides a general overview of the study and methodology (Text Type 3), followed by the details of the procedures (Text Type 2). In contrast, we see here the pattern where these discourse units have been shifted forward in the article. Thus, in Text Excerpt 7.11, the last VBDU in the Introduction provides a general overview of the study, in the context of the authors' own previous research. Then, the first VBDU in the Methods moves directly into the details of the procedures, using Text Type 2. In this case, we see how the choice of discourse organization in one section influences the discourse patterns in subsequent sections.
Figure 7.9 First VBDU in Methods, following different text types used as the last VBDU in the Introduction
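A comparison of this kind amounts to a conditional distribution: the proportion of each text type in the first Methods VBDU, given the text type of the last Introduction VBDU. The sketch below shows one way to compute it. The pairs are invented for illustration only and do not reproduce the values plotted in Figure 7.9; the function name is hypothetical.

```python
from collections import Counter, defaultdict


def conditional_proportions(pairs):
    """Return {intro_final_type: {methods_first_type: proportion}}."""
    tallies = defaultdict(Counter)
    for intro_final, methods_first in pairs:
        tallies[intro_final][methods_first] += 1
    return {
        given: {
            text_type: count / sum(counts.values())
            for text_type, count in counts.items()
        }
        for given, counts in tallies.items()
    }


# Hypothetical (last Introduction type, first Methods type) pairs, one per article.
toy_pairs = [(5, 2), (5, 3), (4, 3), (4, 3), (4, 3), (4, 2)]

table = conditional_proportions(toy_pairs)

if __name__ == "__main__":
    for given, row in sorted(table.items()):
        print(f"Introduction ends with Type {given}: {row}")
```

Each inner dictionary sums to 1, so rows can be compared directly even when the conditioning text types occur with very different frequencies.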
Text Excerpt 7.11. The last VBDU from the Introduction section, plus first VBDU from the Methods section (Journal of Applied Microbiology; JAPMICR06I,M)
VBDU 3 (the last VBDU in the Introduction) (Text Type 5: Current Knowledge with selected Dimension 2 features marked in bold underlined: present tense, epistemic verb + that-clause, communication verb, verb + to-clause)

In this study, we attempted to determine whether the beer-spoilage ability of Lact. paracollinoides strains is an intrinsic character of this species. Our previous study indicates that Lact. paracollinoides has three distinct ribotypes, represented by three ribopatterns obtained from La2t, La7 and La8 (REFS). The search for these ribotypes in our database, consisting of various Lactobacillus strains, showed that the ribotype of nonspoilage strain, Lact. brevis ATCC8291, is identical with that of Lact. paracollinoides La7. This finding led us to characterize ATCC8291 in an attempt to determine whether Lact. paracollinoides is an intrinsic beer-spoiler. We also discussed the use of ORF5 as a genetic marker for differentiating the beer spoilage ability of Lact. paracollinoides.

VBDU 4 (the first VBDU in the Methods) (Text Type 2: Procedural with past tense verbs and selected Dimension 3 features marked in bold underlined: passive voice and activity verbs)

Bacterial strains and growth conditions

Lactobacillus strains were grown anaerobically at 25 C in MRS broth (REF). Anaerobic conditions were generated by Anaeropack (REF). Cells were stored in MRS broth containing 20% glycerol at 80 C.

Characterization of ATCC8291

PCR assay for identifying Lact. paracollinoides. The nucleic acid was extracted, as described previously (REF) with the modification that 1 l glycogen (REF) was added to facilitate ethanol precipitation of DNA. DNA (100 l) solution was prepared in 10 mmol 1 Tris buffer (pH 8 0 ) from 1 ml cell culture. DNA ( 5 l ) extracts were subjected to PCR assay as templates. [...]
9 Comparing the preferred discourse styles of research journals

Finally, research journals differ in their preferred styles of discourse organization. For example, we showed above how Methods sections typically begin with a Report of Events VBDU (Type 3), followed by Procedural VBDUs (Type 2). However, a relatively large minority of Methods sections begin directly with a Procedural VBDU (about 30%). It turns out that this alternative pattern is strongly associated with particular research journals. Table 7.5 shows that two of the journals included in our corpus show a strong preference for this pattern: 7 of the 10 articles sampled from each of the journals Clinical and Experimental Pharmacology and Physiology and Journal of Applied Microbiology include Methods sections that begin directly with a Procedural VBDU. Text Excerpt 7.11 in the previous section illustrates this pattern. In contrast, the other eight research journals in our corpus rarely adopt this discourse organization.

A second example of this type comes from the Discussion sections. As shown in Section 7.3 above, a notable minority (about 15%) of biology research articles use a Current evaluation VBDU (Type 1) to end the Discussion section (see Figure 7.8). It turns out that this pattern is especially common in one research journal, Agricultural and Forest Entomology (5 of the 10 articles sampled from this journal follow the pattern); Text Excerpt 7.7 in Section 4 above illustrates this discourse style. In contrast, most other journals rarely follow this pattern. It is interesting to note that these marked discourse patterns for Methods and Discussion sections are in complementary distribution in our corpus: journals that rely on a strictly Procedural Methods section rarely end the Discussion with a Current evaluation VBDU (see Table 7.5).

Figure 7.10 shows that the journals also differ generally in their preferred text types in Discussion sections. A comparison of Table 7.5 and Figure 7.10 indicates that these stylistic preferences are more general. At one extreme, the Journal of Avian Biology rarely begins Methods sections with a procedural VBDU (only 10% of the time); about 62% of the VBDUs in Discussion sections from this journal express current evaluation or current abstract discussion.
The journal Clinical and Experimental Pharmacology and Physiology illustrates a quite different preferred discourse style: Methods sections usually begin with a Procedural VBDU (70% of the time), and Discussion sections rely most heavily on Current knowledge VBDUs (Type 5; 53% of the time). Our corpus sample for each of these academic journals is small (only 10 articles per journal), and thus minor differences in the discourse patterns across journals must be interpreted with caution. However, the findings here indicate large and systematic differences in the preferred discourse patterns of different research journals, probably reflecting their more general communicative purposes and priorities. Future research of this type, based on larger corpus samples, is needed to provide details of these patterns.
Table 7.5 Stylistic differences across research journals: Proportion of articles from each journal with a marked discourse preference
Research journal                                        % of Methods sections beginning     % of Discussion sections ending
                                                        w/ Procedural VBDU (Type 2)         w/ Evaluation VBDU (Type 1)
Agricultural and Forest Entomology                       0                                  50
Journal of Avian Biology                                10                                  30
Conservation Biology                                     0                                  10
Functional Ecology                                      10                                  10
Annals of Human Genetics                                20                                  10
International Journal of Plant Sciences                 20                                  10
Journal of Medical Primatology                          30                                  20
Journal of Anatomy                                      30                                   0
Journal of Applied Microbiology                         70                                   0
Clinical and Experimental Pharmacology and Physiology   70                                   0
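The complementary distribution noted in the text can be checked against the values in Table 7.5. The sketch below is an illustration only: the 50% cut-off used to single out journals with a marked Procedural preference and the use of a Pearson correlation as a summary are assumptions made here, not steps reported in the chapter, and with only ten journals any such coefficient should be read with caution.

```python
from math import sqrt

# (journal, % Methods beginning with Type 2, % Discussions ending with Type 1),
# as read from Table 7.5.
TABLE_7_5 = [
    ("Agricultural and Forest Entomology", 0, 50),
    ("Journal of Avian Biology", 10, 30),
    ("Conservation Biology", 0, 10),
    ("Functional Ecology", 10, 10),
    ("Annals of Human Genetics", 20, 10),
    ("International Journal of Plant Sciences", 20, 10),
    ("Journal of Medical Primatology", 30, 20),
    ("Journal of Anatomy", 30, 0),
    ("Journal of Applied Microbiology", 70, 0),
    ("Clinical and Experimental Pharmacology and Physiology", 70, 0),
]

PROCEDURAL_THRESHOLD = 50  # hypothetical cut-off for a "marked" preference


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


methods_pct = [row[1] for row in TABLE_7_5]
discussion_pct = [row[2] for row in TABLE_7_5]

# Journals whose Methods sections mostly begin with a Procedural VBDU.
procedural_first = [row for row in TABLE_7_5 if row[1] >= PROCEDURAL_THRESHOLD]
r = pearson(methods_pct, discussion_pct)

if __name__ == "__main__":
    for name, methods, discussion in procedural_first:
        print(f"{name}: Methods {methods}%, Discussion {discussion}%")
    print(f"Pearson r = {r:.2f}")
```

Under these assumptions, the journals above the cut-off have no Discussion sections ending in an Evaluation VBDU, and the overall correlation between the two columns is negative.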
Figure 7.10 Stylistic variation across four research journals: Preferred text types in Discussion sections
10 Conclusion

The present chapter has illustrated the bottom-up approach to integrating the strengths and goals of corpus analysis and discourse analysis. This approach allows the consideration of the internal discourse structure of individual texts based on generalizable units of analysis identified through empirical analysis of a large corpus. Specifically, we used corpus-based analysis to identify and interpret the types of discourse units commonly found in a corpus of biology research articles. We then showed how that analysis could be applied to describe the internal discourse organization of particular articles and sections, as well as the more generalizable organizational patterns of research articles and different journals.

The research approach described here has several advantages. Most importantly, the approach relies on empirical analysis of a large corpus, and on comprehensive linguistic analysis of the discourse units. The results are therefore replicable, and the findings can be interpreted as generalizable patterns that are representative of this genre. When an individual text is analyzed in terms of these constructs, the discourse patterns are readily comparable to the analyses of other texts from the same genre. As a result, these constructs can be applied to identify the generally preferred discourse patterns of a genre.

At the same time, the approach has disadvantages. Most importantly, the discourse units themselves are identified primarily on the basis of word use, disregarding more subtle signals of a shift in purpose that might be noticed by a human analyst. Thus, we would certainly not argue that corpus-based analysis of discourse unit types should replace more conventional discourse analytic techniques. However, we hope that the present study has shown the usefulness of a bottom-up corpus-based approach to discourse.
Table 7A Statistical output from the factor analysis of biology research articles: Eigenvalues for the first six factors
Factor  Eigenvalue  Difference  Proportion  Cumulative
1       4.81539474  2.82633326  0.1267      0.1267
2       1.98906147  0.05942658  0.0523      0.1791
3       1.92963489  0.29061590  0.0508      0.2298
4       1.63901899  0.13399396  0.0431      0.2730
5       1.50502503  0.18690466  0.0396      0.3126
6       1.31812037  0.01730357  0.0347      0.3473
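The Proportion and Cumulative columns of Table 7A can be recovered from the eigenvalues alone. The sketch below assumes that each proportion is the eigenvalue divided by the number of input variables and that this number is 38 (the count of linguistic features listed in Table 7C); both points are inferences from the printed figures rather than statements made in the chapter.

```python
from itertools import accumulate

# Eigenvalues for the first six factors, as printed in Table 7A.
EIGENVALUES = [
    4.81539474, 1.98906147, 1.92963489,
    1.63901899, 1.50502503, 1.31812037,
]

# Assumption: 38 input variables, inferred from the features listed in Table 7C.
N_VARIABLES = 38

# Share of total variance attributed to each factor.
proportions = [eigenvalue / N_VARIABLES for eigenvalue in EIGENVALUES]

# Running total of the proportions.
cumulative = list(accumulate(proportions))

# Drop from each eigenvalue to the next; the sixth difference would need
# the seventh eigenvalue, which is not listed in the table.
differences = [a - b for a, b in zip(EIGENVALUES, EIGENVALUES[1:])]

if __name__ == "__main__":
    for factor, (p, c) in enumerate(zip(proportions, cumulative), start=1):
        print(f"Factor {factor}: proportion {p:.4f}, cumulative {c:.4f}")
```

Under this assumption, the first four factors retained in the analysis account for roughly 27% of the total variance, in line with the Cumulative column.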
Table 7B Statistical output from the factor analysis of biology research articles: Inter-factor correlations
          Factor 1  Factor 2  Factor 3  Factor 4
Factor 1  1.00000   0.33704   0.28207   0.16770
Factor 2  0.33704   1.00000   0.23946   0.21578
Factor 3  0.28207   0.23946   1.00000   0.12994
Factor 4  0.16770   0.21578   0.12994   1.00000
Table 7C Statistical output from the factor analysis of biology research articles: Rotated factor pattern (Promax rotation)
                                             Factor 1  Factor 2  Factor 3  Factor 4
Features with high loadings on Factor 1:
predicative adjectives                       0.59725   0.21374   0.18694   0.08250
copula be                                    0.53596   0.02230   0.06724   0.18825
adjective + to complement clause             0.53167   0.04448   0.09421   0.04828
adverbs                                      0.49067   0.03961   0.14722   0.15303
prediction modals                            0.47859   0.07542   0.03827   0.02015
conjuncts                                    0.42739   0.16989   0.05834   0.03781
possibility modals                           0.41064   0.20298   0.03595   0.20525
causative adverbial clauses                  0.40932   0.06489   0.04717   0.09738
conditional adverbial clauses                0.36147   0.11082   0.10658   0.02544
adjective + that complement clause           0.33134   0.06686   0.06864   0.01815
pronoun it                                   0.31604   0.31504   0.05425   0.12020
first person pronouns                        0.30642   0.10662   0.13475   0.18895
third person pronouns                        0.26277   0.02551   0.17277   0.08850
necessity modals                             0.25684   0.09208   0.05455   0.18069
nouns                                        0.50186   0.27915   0.03416   0.21725

Features with high loadings on Factor 2:
communication verbs                          0.10369   0.59439   0.28658   0.02054
communication verb + that complement clause  0.06108   0.55623   0.01099   0.03151
present tense                                0.26291   0.52945   0.29089   0.09118
perfect aspect                               0.12455   0.52518   0.05667   0.04949
epistemic verb + that complement clause      0.09052   0.44476   0.07416   0.00740
relative clauses                             0.01880   0.35085   0.14433   0.03247
demonstrative pronouns                       0.12527   0.32147   0.05153   0.08789
verb + to complement clause                  0.09671   0.28786   0.12872   0.09367
past tense                                   0.11086   0.45245   0.53318   0.14306
clausal coordination                         0.23437   0.35013   0.22316   0.09040
concrete nouns                               0.12157   0.19192   0.12353   0.08479

Features with high loadings on Factor 3:
passive voice                                0.11597   0.00543   0.68231   0.14811
activity verbs                               0.05540   0.00949   0.65759   0.05539
mental verbs                                 0.29240   0.08361   0.43665   0.11977
progressive aspect                           0.05395   0.05294   0.42232   0.11338
time adverbials                              0.04528   0.22996   0.37108   0.24231

Features with high loadings on Factor 4:
nominalizations                              0.00750   0.16825   0.05314   0.72610
word length                                  0.01948   0.05385   0.06320   0.62795
abstract nouns                               0.00163   0.04929   0.10624   0.47413
process nouns                                0.13612   0.09724   0.04659   0.45618
cognitive nouns                              0.06819   0.04338   0.33553   0.40226
attributive adjectives                       0.04460   0.01662   0.36044   0.37436
noun + that complement clause                0.06473   0.18228   0.11090   0.24678
Table 7D Cluster summary

Cluster  Freq.  RMS Std    Max. Distance from    Nearest  Distance Between
                Deviation  Seed to Observation   Cluster  Cluster Centroids
1        73     3.1        12.7                  5        7.8
2        265    2.7        14.1                  3        6.9
3        538    2.5        11.1                  2        6.9
4        180    2.5        12.0                  6        6.4
5        249    2.7        13.0                  6        6.6
6        198    2.7        12.6                  4        6.4
Table 7E Descriptive statistics for the dimension scores of clusters. [Especially large dimension scores are shown in bold.]
           Dimension 1    Dimension 2    Dimension 3    Dimension 4
           Mean   SD      Mean   SD      Mean   SD      Mean   SD
Cluster 1  8.18   2.81    3.93   3.47    1.18   2.98    0.58   2.96
Cluster 2  1.91   1.99    4.09   2.70    5.94   3.01    0.59   3.05
Cluster 3  0.97   2.00    2.91   2.94    0.55   2.26    2.26   2.51
Cluster 4  0.45   2.05    0.86   2.91    4.43   2.13    3.03   2.71
Cluster 5  1.21   2.35    7.35   3.32    1.32   2.46    0.55   2.40
Cluster 6  1.07   2.26    3.47   2.64    0.31   2.82    4.66   2.89
Table 7F Distribution of VBDUs across clusters (MD discourse types) & research article sections
                       Research Article Sections*
                       Introduction  Methods  Results  Discussion  Total
Cluster 1  Frequency   6             2        7        58          73
           Percent     0.40          0.13     0.47     3.86        4.86
           Row %       8.22          2.74     9.59     79.45
           Column %    2.52          0.47     1.84     12.66
Cluster 2  Frequency   5             199      53       8           265
           Percent     0.33          13.24    3.53     0.53        17.63
           Row %       1.89          75.09    20.00    3.02
           Column %    2.10          46.71    13.91    1.75
Cluster 3  Frequency   48            199      225      66          538
           Percent     3.19          13.24    14.97    4.39        35.80
           Row %       8.92          36.99    41.82    12.27
           Column %    20.17         46.71    59.06    14.41
Cluster 4  Frequency   59            10       46       65          180
           Percent     3.93          0.67     3.06     4.32        11.98
           Row %       32.78         5.56     25.56    36.11
           Column %    24.79         2.35     12.07    14.19
Cluster 5  Frequency   74            1        34       140         249
           Percent     4.92          0.07     2.26     9.31        16.57
           Row %       29.72         0.40     13.65    56.22
           Column %    31.09         0.23     8.92     30.57
Cluster 6  Frequency   46            15       16       121         198
           Percent     3.06          1.00     1.06     8.05        13.17
           Row %       23.23         7.58     8.08     61.11
           Column %    19.33         3.52     4.20     26.42
Total      Frequency   238           426      381      458         1503
           Percent     15.83         28.34    25.35    30.47       100.00

* Clusters 1-6 = multi-dimensional discourse type
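To make the three percentage rows of Table 7F concrete, the following sketch recomputes them from the cell frequencies alone. It is an illustrative check rather than the original SAS output; the variable and function names are hypothetical. "Percent" is taken relative to all 1,503 VBDUs, "Row %" relative to the cluster total, and "Column %" relative to the section total.

```python
# Cell frequencies from Table 7F: rows are clusters 1-6,
# columns are the four research article sections.
SECTIONS = ["Introduction", "Methods", "Results", "Discussion"]
freq = {
    1: [6, 2, 7, 58],
    2: [5, 199, 53, 8],
    3: [48, 199, 225, 66],
    4: [59, 10, 46, 65],
    5: [74, 1, 34, 140],
    6: [46, 15, 16, 121],
}

grand_total = sum(sum(row) for row in freq.values())
column_totals = [sum(row[j] for row in freq.values())
                 for j in range(len(SECTIONS))]


def cell_percentages(cluster, section):
    """Return (percent of all VBDUs, row %, column %) for one table cell."""
    j = SECTIONS.index(section)
    count = freq[cluster][j]
    overall = 100 * count / grand_total
    row_pct = 100 * count / sum(freq[cluster])
    col_pct = 100 * count / column_totals[j]
    return round(overall, 2), round(row_pct, 2), round(col_pct, 2)


print(grand_total, column_totals)
print(cell_percentages(1, "Discussion"))
print(cell_percentages(2, "Methods"))
```

Read this way, the Row % shows how a discourse type is spread over sections (most Cluster 1 units fall in Discussion sections), while the Column % shows what a section is made of (nearly half of all Methods units belong to Cluster 2).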
Chapter 8
Vocabulary-based discourse units in university class sessions

By Eniko Csomay
During the past four decades, classroom talk has been investigated from multiple perspectives. Several studies have looked at the general structural and interactional patterns of individual class sessions (Cazden, 1986; Chaudron, 1988; Long & Sato, 1983; Mehan, 1979; Sinclair & Coulthard, 1975), including foreign language classrooms. More recently, the socio-cultural aspects of classroom interaction in K-12 settings have also been investigated (Poole, 2005; Wells, 1999).

Linguistic studies of academic lectures reflect the general interest in the lexical, rhetorical, and topical structures of discourse. For example, Nattinger and DeCarrico (1992) identify recurring multi-word sequences (lexical phrases) in lectures, classifying them into global or local discourse organizers depending on their discourse functions. As for rhetorical patterns, disciplinary differences are described by identifying coherent sub-units in lectures (e.g., phases by Young, 1994) or through the discourse organization of lectures in varying disciplines (e.g., Dudley-Evans, 1994b). Hansen (1994) concludes that tracing the discourse markers that signal topic shifts is a most useful method for locating those shifts in lectures.

With the availability of large collections of texts, corpus-based linguistic studies on spoken academic language use have also emerged. Two broad approaches have been pursued. One approach uses corpora to provide comprehensive linguistic descriptions of language use in academic contexts (see, e.g., Biber, 2003). In these studies, the linguistic characteristics of class sessions have been compared to those of other academic and non-academic registers (e.g., textbooks, Biber, Conrad, Reppen, Byrd, & Helt, 2002; face-to-face conversation, Csomay, 2006).
A second approach has been taken by scholars who look at individual linguistic features and discuss their functional variants across a number of contexts, such as university class sessions (e.g., pronouns by Fortanet, 2004; reflexivity by Mauranen, 2001; idioms by Simpson & Mendis, 2003; evaluative adjectives by Swales & Burke, 2003).

Relatively few previous studies have investigated the discourse organization of academic class sessions (e.g., Young, 1994 and Dudley-Evans, 1994b, noted above), and none of these have been based on large-scale corpus analysis. As a result, these previous studies do not address the major research goals of bottom-up corpus-based studies of discourse organization: to provide a comprehensive linguistic description of discourse units and the flow of discourse within texts, and to describe generalizable patterns of discourse organization that hold across all texts of the target corpus (see Chapter 6). The present chapter complements previous studies by adopting a bottom-up corpus-based approach, motivated by these two research goals (see also Csomay, 2002; 2005a; 2005b).
From constructing a corpus of VBDUs to identifying VBDU text-types
The study of classroom discourse reported in the present chapter follows the same methodological steps as in Chapters 6 and 7:
Step 1: Construct a corpus of VBDUs, by segmenting complete classroom teaching sessions into smaller discourse units;
Step 2: Analyze the linguistic characteristics of classroom VBDUs applying multi-dimensional analysis techniques;
Step 3: Identify and interpret VBDU types via cluster analysis;
Step 4: Analyze classroom teaching sessions as sequences of VBDUs and VBDU types.
1.1 Constructing a corpus of VBDUs
The study here is based on analysis of the classroom teaching texts included in the T2K-SWAL Corpus (see Biber et al. 2004, Biber 2006b), supplemented by a few class sessions from the MICASE Corpus. The corpus includes class sessions from six major academic disciplines and three levels of instruction (lower division undergraduate, upper division undergraduate, and graduate). The sub-corpus of 196 class sessions was segmented into Vocabulary-Based Discourse Units (VBDUs) using the techniques described in Chapter 6. Table 8.1 shows the breakdown of VBDUs across academic disciplines. Only VBDUs longer than one hundred words were considered for further linguistic analyses. The average VBDU length was 231 words, ranging from a minimum of 101 words to a maximum of 1031 words. As Table 8.1 shows, a total of 5,847 VBDUs of 100+ words were included in the analysis.
Table 8.1 Breakdown of class sessions and vocabulary-based discourse units by discipline
Discipline        Number of class sessions  Number of words  Number of VBDUs
Business          36                        236,400          1,021
Education         16                        137,200          565
Engineering       35                        210,900          941
Humanities        39                        300,200          1,214
Natural Sciences  31                        219,000          901
Social Sciences   39                        294,400          1,205
Total             196                       1,398,100        5,847
1.2 Analyzing the linguistic characteristics of VBDUs applying MD analytical techniques
Step 2 identified above is to analyze the linguistic characteristics of classroom VBDUs through a multi-dimensional (MD) analysis. That is, as in the preceding chapters, dimensions of variation are identified through a factor analysis, based on the distribution of c. 100 linguistic variables across the 5,847 VBDUs from the classroom teaching corpus.

In preparation for the multi-dimensional analysis, a set of lexico-grammatical and interactional features was identified as potentially important. Besides the linguistic features found to be important in earlier MD studies of academic language use (Biber, 2003; Biber et al., 2004), five interactional features were also included in the analysis (see also Csomay, 2005b): the number of turns, for teachers versus students; and the average turn length, for teachers versus students. Turns were defined adopting Tao's broad definition: "any speaker change will be treated as a new turn" (2003: 189). In addition, the analysis included a count of 1-word turns, such as right or good.

The final factor analysis in this case has three factors, with 41 of these linguistic variables being retained. Table 8.2 shows the linguistic features co-occurring on each factor.
Table 8.2 Dimensions of variation identified in university classroom talk (Csomay 2005b)
Dimension 1: Contextual orientation versus Conceptual, informative focus
  Positive features
    Non-past tense                          .780
    First and second person pronoun         .570
    Modals                                  .502
    Non-passive constructions               .499
    Contractions                            .452
    Third person pronoun (reduced forms)    .410
    Adverbial clauses: conditional          .371
    Verb-initial lexical bundles            .370
    Action verbs in directive forms         .365
    Commonly used vocabulary                .346
    Activity verbs                          .307
  Negative features
    Word length                            -.620
    Nouns                                  -.523
    Past tense                             -.424
    Prepositions                           -.409
    Attributive adjectives                 -.384
    Words used in one lecture only         -.350
    Nominalization                         -.323

Dimension 2: Personalized framing
  Positive features
    That deletion                           .777
    Mental verbs                            .724
    Factual verbs with that                 .621
    I mean                                  .570
    You know                                .544
    (Non-passive constructions              .482)*
    (Past tense                             .422)
    Likelihood verbs with that              .413
    Third person pronoun (he, she, they)    .369
  Negative features
    no negative features

Dimension 3: Interactive dialogue versus Teacher monologue
  Positive features
    Student turns                           .824
    Teacher turns                           .760
    One word turns                          .564
    Discourse particles                     .456
  Negative features
    Turn length (teacher)                  -.611
* The features in parentheses appear on more than one factor with a loading over the set threshold of .3.
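The interactional features described in Section 1.2 (number of turns and average turn length for teachers versus students, plus a count of one-word turns) can be illustrated with a small sketch. It follows the broad definition quoted above, in which any speaker change starts a new turn. The input format, the function name, and the absence of any normalization for unit length are assumptions made for illustration; the chapter does not specify how the counts were actually extracted.

```python
from collections import defaultdict


def turn_features(utterances):
    """Compute simple interactional counts for one discourse unit.

    `utterances` is a list of (speaker_id, role, text) triples in spoken
    order, with role "teacher" or "student" (a hypothetical input format).
    Consecutive utterances by the same speaker are merged, so that every
    speaker change starts a new turn.
    """
    turns = []  # each entry: [speaker_id, role, list_of_words]
    for speaker, role, text in utterances:
        words = text.split()
        if turns and turns[-1][0] == speaker:
            turns[-1][2].extend(words)           # same speaker: same turn
        else:
            turns.append([speaker, role, words])  # speaker change: new turn

    counts = defaultdict(int)
    lengths = defaultdict(int)
    one_word_turns = 0
    for _, role, words in turns:
        counts[role] += 1
        lengths[role] += len(words)
        if len(words) == 1:
            one_word_turns += 1

    features = {"one_word_turns": one_word_turns}
    for role in ("teacher", "student"):
        features[f"{role}_turns"] = counts[role]
        features[f"{role}_mean_turn_length"] = (
            lengths[role] / counts[role] if counts[role] else 0.0
        )
    return features


example = [
    ("T", "teacher", "what do we mean by overt behavior?"),
    ("S1", "student", "obvious"),
    ("T", "teacher", "pardon?"),
    ("S1", "student", "obvious"),
    ("T", "teacher", "obvious or real."),
]
features = turn_features(example)
print(features)
```

In a full analysis these raw counts would presumably be computed per VBDU and entered into the factor analysis alongside the lexico-grammatical rates.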
Dimension 1 is interpreted as Contextual orientation versus Conceptual, informative focus. Most of the features on the positive side of Dimension 1 are associated with a context-dependent interaction, such as first and second person pronouns, contractions, third person pronouns, activity verbs, commonly used words, and conditional clauses. This side of Dimension 1 marks interactive instructional discourse where participants are actively involved and where reference is made to the immediate context. In contrast, the features co-occurring on the negative side of Dimension 1 (e.g., nouns, prepositional phrases, attributive adjectives) are associated with a conceptual, informational focus. Overall, this dimension is consistent with Biber's (1988, 1995) first dimension of variation across speech and writing, Informational versus Involved production circumstances; it also supports previous studies that have found that university classroom talk exhibits features of both face-to-face conversation and academic prose (Csomay 2006).

Dimension 2, Personalized framing, reflects the way in which participants formulate their ideas on-line, overtly expressing their own personal perspective that frames statements. For example, mental verbs (e.g., think, know, guess) express the personal state of mind. Many of these verbs are used with that-clauses, where the complementizer that is often omitted, and the controlling verb expresses the personal framing relative to the information in the that-clause. On Dimension 2, we find especially factual verbs (e.g., know, mean, realize) with that-clauses. These are often used to identify the shared background knowledge of a class (e.g., I mean we know it's going to be X plus one). Likelihood verbs controlling that-clauses also co-occur on Dimension 2, as in:

but I certainly don't think [0] these data indicate that we are more lenient to women uh but I also think [0] they raise a serious question about whether or not we're more punitive to them too.
Finally, the co-occurring features on Dimension 3 are interpreted as reflecting Interactive dialogue versus Teacher monologue. There are few features on this dimension. The positive features reflect intense turn-taking patterns and discourse particles. In contrast, the negative side of this dimension is characterized by long teacher turns, indicating a more monologic style of presentation.

1.3 VBDUs and dimension scores: the multi-dimensional profile of the first three VBDUs of a business management class

By calculating the dimension scores for each VBDU, we are able to determine the extent to which each discourse unit relies on the co-occurring linguistic features associated with each dimension. For example, Figure 8.1 below displays the multi-dimensional profile of the first three VBDUs of an upper division Business Management class.
[Figure 8.1: bar chart of dimension scores (horizontal axis, -10 to 10) on Dimension 1, Dimension 2, and Dimension 3 for each VBDU in sequence (vertical axis).]
Figure 8.1 VBDUs and the three dimensions of linguistic variation in university classroom talk in an upper division Business Management class session
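Dimension scores such as those plotted in Figure 8.1 are conventionally computed in multi-dimensional analysis by standardizing each feature's rate of occurrence across all units and then summing the standardized values of the features on the positive pole of a dimension, minus those on the negative pole. The sketch below illustrates that general procedure only. It assumes that this chapter follows the standard procedure; the feature names, the toy rates, and the use of the population standard deviation are illustrative choices, not details taken from the study.

```python
from statistics import mean, pstdev


def standardize(units):
    """Turn each feature's rate into a z-score across all units.

    `units` is a list of dicts mapping feature name -> normalized rate.
    """
    features = list(units[0])
    stats = {}
    for f in features:
        values = [u[f] for u in units]
        stats[f] = (mean(values), pstdev(values))
    return [
        {f: (u[f] - stats[f][0]) / stats[f][1] if stats[f][1] else 0.0
         for f in features}
        for u in units
    ]


def dimension_score(z_unit, positive, negative=()):
    """Sum z-scores of positive-pole features minus negative-pole features."""
    return sum(z_unit[f] for f in positive) - sum(z_unit[f] for f in negative)


# Hypothetical rates per 1,000 words for three discourse units.
units = [
    {"contractions": 10.0, "nouns": 300.0},
    {"contractions": 20.0, "nouns": 200.0},
    {"contractions": 30.0, "nouns": 100.0},
]
z_units = standardize(units)
# Toy "Dimension 1": contractions on the positive pole, nouns on the negative.
scores = [dimension_score(z, ["contractions"], ["nouns"]) for z in z_units]
print([round(s, 2) for s in scores])
```

A unit with many contractions and few nouns receives a large positive score, and the reverse pattern yields a large negative score, which is the kind of contrast the first and third VBDUs show on Dimension 1.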
The following text examples illustrate how lexico-grammatical and interactional features function in these three VBDUs, representing the opening discourse of a classroom teaching session. Figure 8.1 shows that VBDU 1 is marked by large positive scores on Dimension 1 and Dimension 3, and a large negative score on Dimension 2. In Text 8.1 below, the linguistic features on the positive side of Dimension 1 (e.g., first and second person pronouns, modals, verbs in non-past tense) are bold italicized. The negative Dimension 2 score of this VBDU corresponds to the relative absence of personal framing features; thus there are no Dimension 2 features to highlight in Text 8.1. Finally, the large positive score for Dimension 3 reflects the large number of different turns in this VBDU, by both students and teachers.
Text 8.1: VBDU 1 from a business classroom teaching session (Positive Dimension 1 features marked in bold italics)
Teacher: to summarize and give the key points in the hot stove rule for us on page three twenty one. um somebody volunteer to do that start looking it up so I
Student: hot stove page three twenty one?
Teacher: yeah. you've been real good about doing it but you
Student: oh, oh somebody else can do it.
Teacher: well I'm going to say I don't mind you're doing a nice job but we ought to give some of these other brilliant folks a chance. anybody want to summarize the hot stove rule for us?.. Ok. you do it. I don't want to make anybody do it.
Student: alright. page three twenty one.
Teacher: yeah.... we had a I know one answer marked wrong on my crib sheet I gave you um [xxx] so I'm going to save about six or eight minutes to review that uh so um nobody else give me a paper to correct and we'll [xxx] at the end of the class.
In this first VBDU, the teacher starts off by prompting students to summarize key points from an earlier discussion. Direct reference is made to the context by referring to the speaker and addressee (I and you) and to specific pages in the textbook. The discourse is also highly interactive. This VBDU functions primarily to address class and instructional management issues.

The second VBDU in this class session uses features from both poles of Dimension 1, but few Dimension 2 features. This VBDU has one long teacher turn, followed by several shorter turns by both students and the teacher. In this VBDU, the teacher outlines the structure of the present class session. Then he introduces the first topic for the day, evoking students' background knowledge through a question-answer sequence on what the content of the chapter is.

Text 8.2: VBDU 2 (Positive Dimension 1 features are marked by bold italics; negative Dimension 1 features are marked by bold.)
Teacher: and uh usually I stick pretty close to the text in going through the lesson somewhere along the line some people have said that helps them follow the material and notes but today there's a lot of material in that chapter that sort of clutters it up. it's more cluttered and worthwhile so I'm going to while an abbreviated lecture notes that I have and I will skip quite a few of those what I think are less important items in there we will have a brilliant dissertation on the hot stove rule and then we will take uh six eight minutes to talk about the case and uh which I think has some value uh with that are we about ready to go? ok. what what's our chapter about today?
Student: managing conflict and stress.
Teacher: and the text gives us a definition of conflict which may be a little different than the one that when your big brother used to beat you up you thought about conflict. um what does the text say about conflict, what it is? somebody.
Student: overt behavior that results when an individual or group of individuals thinks they perceive need or needs of the individual or group has been blocked or is about to be blocked.
Teacher: first of all it's overt behavior. What do we mean by overt behavior?
Student: obvious
Teacher: pardon?
Student: obvious
Teacher: obvious or real. it's just real behavior, uh and it happens when an individual or groups of individuals think what do they think or perceive?
Finally, the third VBDU has a large negative score on Dimension 1, a negative score on Dimension 2, and a score near 0.0 on Dimension 3 (see Figure 8.1). Text 8.3 highlights the dense use of negative Dimension 1 features, including nouns, attributive adjectives, and prepositional phrases. In this case the instructor takes several longer turns, presenting course content, interspersed with short responses from students.

Text 8.3: VBDU 3 (Positive Dimension 1 features are marked by bold italics; negative Dimension 1 features are marked by bold; positive Dimension 2 features are CAPITALIZED; positive Dimension 3 features are marked by underlined italics; negative Dimension 3 features are underlined)
Student: things have been blocked from them.
Teacher: that something that they think is important something is blocking them from obtaining that need of what they desire. The text says uh well we just talked about it uh when a perceived need has been blocked made unobtainable apparently unobtainable. and why do conflicts arise? what causes conflicts to occur?
Student: uh disagreements.
Teacher: Ok.... it's real simple. different people have different perceptions of their needs and uh their beliefs and goals and uh these different perceptions generate these conflicts. uh out in the business world and this was particularly true uh according to my observation with my thirty some years working with a large corporation several of them, uh that managers have some beliefs about conflict and they're quite often uh don't true up with what our text describes. what are some of the common beliefs about conflict?
Student: [xxx]
Teacher: that's the main thing. many managers, particularly managers feel like (that) by gosh you just shouldn't have a lot of conflict in your organization, and many of these managers say I'm just not going to put up with a lot of conflict. they try to squash it and keep it uh undercover and keep it down. well, thank goodness the idea about conflict being so bad is in the process of changing. they also think that conflict results from personality problems. you wouldn't have conflicts in your group or your organization if you didn't have a bunch of people uh that are less than uh normal don't have their head screwed on right. and conflict uh produces inappropriate uh reactions of other people involved.
To summarize, this section illustrates how VBDUs can differ in their dimension scores, specifically comparing the first three VBDUs from an upper division Business Management class. In previous research, Csomay (2005b) describes how the first VBDUs in a class session are often similar to one another in their linguistic characteristics. In the present case, though, we see relatively large differences across these VBDUs, as the instructor shifts from class management tasks to actual course content more quickly than in many class sessions.
2 Dimension scores and VBDU text-types

As in the preceding chapters, cluster analysis is used next to identify groupings of VBDUs with similar linguistic profiles: the VBDU Types. Each VBDU has a specific characterization within the three-dimensional space created by the dimension scores. Figure 8.2 illustrates this in a three-dimensional scatter plot. Cluster analysis¹ produces groupings of VBDUs based on two major considerations: 1) the VBDUs grouped into a cluster are maximally similar to one another in their linguistic characteristics in this 3-dimensional space, and 2) the different clusters are maximally different from one another. In the present case, the solution for four clusters best captured the patterns of variation among these VBDUs, and Figure 8.2 shows the 3-dimensional linguistic characterization of those four clusters.

Two of the three dimensions are particularly strong in determining which VBDU falls into which cluster. Dimension 1, Contextual orientation versus conceptual, informational focus, is the most powerful predictor (R squared: 0.7759), while Dimension 2, Personalized framing, is also a strong predictor (R squared: 0.4024). Dimension 3, Interactive dialogue versus Teacher monologue, has the least effect (R squared: 0.0636) in determining cluster membership.

Centroid measures help to determine the typical linguistic characteristics of each of the four clusters. Centroids are mean scores, reflecting the central characterizations of each cluster with respect to each dimension (Biber, 1995, p. 323). Table 8.3 provides descriptive statistics for the centroid measures of each cluster. The clusters are roughly the same size, with c. 1,200-1,700 VBDUs being grouped into each one.
1. The statistical cluster analytical procedures were run in SAS, a statistical computer program. To carry out the cluster analytical procedures, FASTCLUS, a type of partitioning technique that allows the classes to be mutually exclusive, was run on the nearly 6000 VBDUs, with the three dimension scores as predictors. In order to find the optimal number of clusters, researchers normally run a set of analyses with varying numbers of clusters while closely monitoring the relationship between the value of the clustering criterion and the number of variables. In our case, the number of variables is 3, since we have three dimension scores for each discourse unit. As Everitt (1974) explains, "It is generally suggested that a plot of the criterion value against the number of groups will indicate the correct number [of clusters] to consider by showing a sharp increase (or decrease, if the criterion is being minimized), at the correct number of groups" (p. 59). Solutions of 3, 4, and 5 clusters were run and the data were inspected. A sharp decrease was shown in the value of the cubic clustering criterion between solutions 5 and 4 while no major change was shown between solutions 3 and 4. Hence, in the light of Everitt's observation above, a 4 cluster solution was applied in the present study. This solution grouped those VBDUs together that share linguistic characteristics, while the clusters themselves remained maximally distinct.
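The partitioning step and the per-dimension R-squared values reported above can be sketched in a few lines. The code below is a plain k-means (Lloyd's algorithm) written for illustration; it is not SAS FASTCLUS, whose seed selection and options differ. The R-squared function assumes that the reported values express the share of each dimension's variance accounted for by cluster membership. The toy points are hypothetical three-dimensional scores, not data from the study.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Mean score on each dimension for a group of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def k_means(points, k, seed=0, max_iter=100):
    """Partition points into k mutually exclusive clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial seeds: k distinct points
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = centroid(members)
    return labels, centroids


def r_squared(points, labels, centroids, dim):
    """Share of variance on one dimension explained by cluster membership."""
    grand_mean = sum(p[dim] for p in points) / len(points)
    total = sum((p[dim] - grand_mean) ** 2 for p in points)
    within = sum((p[dim] - centroids[lab][dim]) ** 2
                 for p, lab in zip(points, labels))
    return 1 - within / total if total else 0.0


# Hypothetical (Dimension 1, Dimension 2, Dimension 3) scores for six units.
points = [(-5.0, 1.0, 0.0), (-4.0, 2.0, 1.0), (-6.0, 1.5, -1.0),
          (5.0, -1.0, 0.5), (4.0, -2.0, -0.5), (6.0, -1.5, 0.0)]
labels, centroids = k_means(points, k=2)
for dim in range(3):
    print(f"Dimension {dim + 1}: R squared = "
          f"{r_squared(points, labels, centroids, dim):.4f}")
```

In this toy example the two groups differ sharply on the first dimension and not at all on the third, so the first dimension has an R-squared close to 1 and the third an R-squared of 0, mirroring the contrast between Dimension 1 and Dimension 3 in the chapter's results.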
Figure 8.2 Clusters plotted in a three-dimensional space represented by the three dimensions of variation
VBDUs falling into Cluster 1 have the highest use of features related to positive scores on Dimension 2, reflecting personalized, narrative kinds of discourse. At the same time, Cluster 1 has the lowest positive score on both Dimension 1 and Dimension 3, showing a moderate use of contextual features as well as moderately dialogic discourse. VBDUs falling into Cluster 2 have extremely large negative scores on Dimension 1, reflecting a very low number of linguistic features associated with a contextual orientation, and a dense use of features associated with an informational focus. The VBDUs in this cluster also lack linguistic features associated with personalized framing (a negative Dimension 2 score), and have a large negative Dimension 3 score, reflecting long teacher monologues.
Table 8.3 Centroid measures for the four clusters

Variable       N     Mean   Std Dev  Minimum  Maximum
Cluster 1
  Dimension 1  1315  0.90   4.38     11.77    20.08
  Dimension 2  1315  5.63   5.54     2.58     36.31
  Dimension 3  1315  0.06   3.28     4.64     14.89
Cluster 2
  Dimension 1  1178  12.49  4.68     42.14    5.93
  Dimension 2  1178  2.74   3.25     9.16     8.83
  Dimension 3  1178  1.75   2.12     4.75     10.46
Cluster 3
  Dimension 1  1654  10.16  4.66     3.20     37.52
  Dimension 2  1654  0.57   4.79     6.95     35.08
  Dimension 3  1654  0.83   4.06     4.63     23.47
Cluster 4
  Dimension 1  1700  1.88   3.21     10.74    6.63
  Dimension 2  1700  3.03   2.21     7.84     6.89
  Dimension 3  1700  0.37   3.96     4.69     33.12
In contrast, Cluster 3 has the highest positive Dimension 1 score and the highest positive Dimension 3 score. Hence, VBDUs in this cluster exhibit linguistic features associated with a dialogic type of discourse and a strong contextual orientation. Finally, Cluster 4 is characterized by negative features on both Dimension 1 and Dimension 2, and a score near 0.0 on Dimension 3.

2.1 Interpreting the clusters as VBDU types based on their linguistic characteristics
In Section 1.2, it was shown that each VBDU is characterized by all three dimensions, and as described in Table 8.3 above, each cluster can also be characterized by scores on the three dimensions. It is further possible to plot the centroids of each cluster, providing an overall multi-dimensional profile of the VBDU-types, as in Figure 8.3. Based on these linguistic profiles, we are able to propose interpretive labels for each VBDU Type: Personalized framing (Cluster 1), Informational monologue (Cluster 2), Contextual interactive (Cluster 3), and Unmarked (Cluster 4).
Figure 8.3 Four types of VBDUs (Personalized framing, Informational monologue, Contextual interactive, and Unmarked) on three dimensions
2.1.1 Cluster 1: Personalized framing

VBDUs in Cluster 1 typically have extremely large Dimension 2 scores (Personalized framing), but unmarked scores on Dimensions 1 and 3. Hence, this VBDU type exhibits by far the most linguistic features associated with the personal framing of the speaker's ideas, using features like factual verbs and likelihood verbs controlling that-clauses. Text Excerpt 8.4 illustrates this VBDU type.

Text Excerpt 8.4. Example of VBDU Type Personalized framing (Cluster 1) (Positive Dimension 2 linguistic features are in Capitalized italics: that-complement clauses, factual and likelihood verbs, mental verbs, you know, I mean, and third person animate pronouns.)
Student: Yeah, kind of [laughter]
Teacher: you don't you don't believe that he did that?
Student: No I mean I don't really want to take it as a I don't know I mean I take it for what it is which is a further mythologizing of the great Thomas Edison as one of the great men and the great I don't KNOW I mean he says I'm sure the story has been embellished throughout time and I THINK yeah that's the thing I underlined and I MEAN I don't KNOW I'm sorry keep going
Student: yeah
Student: no I
Teacher: but the point is what that story should (hear what...that story's) about communication
Student: right I I KNOW
Teacher: Now not that it shows about Edison per say about Edison's genes,
Student: right
Teacher: you YOU KNOW, or genius but what it says about genius as it was it constructed in these communications networks
Student: But I wanted more of that
Personalized framing VBDUs reflect framing not only from structural and semantic perspectives (e.g., verbs controlling that-complement clauses or mental verbs, respectively) but also from an interactional perspective. In addition to expressing ideas, two specific cognitive verbs (think and know) also show interactional framing functions. In these instances, they appear as fixed expressions such as I mean and you know, or are often used together (I mean you know or you know I mean), constituting frequently occurring four-word combinations (lexical bundles; see Biber et al., 2004). Either together or simply repeated separately, these combinations provide time for the speaker to hold the floor in the interaction while they are (re)formulating their ideas or expressing their opinions on-line. In Text 8.4, the teacher challenges the student's statement on the given topic, prompting the student to think about his interpretation of the text under discussion. Accordingly, the student reformulates his thoughts to restate his position, which is then supported by the teacher.

Besides clarifying ideas and expressing opinions through reformulation, as we have seen in the extract above, further communicative functions are apparent in units that exhibit language associated with Personalized framing. For example, when the frequent use of third person pronouns and past tense co-occur, they are most commonly associated with personal narratives. In the extract below, third person pronouns and past tense co-occur with conditional clauses. These three features together signal hypothetical past, suggesting an interpretative function of this segment. Other features from the two dimensions (e.g., modals) are also marked.

Text Excerpt 8.5. Example of VBDU Type Personalized framing (Cluster 1) showing narrative features (Positive Dimension 1 features are marked by bold italics. Positive Dimension 2 features are marked by CAPITALIZED ITALICS.)
Student: To me if she didn't know, what the radiation was doing to her, then her power that she possessed wouldn't hold true. I mean supposedly why she had power was because she was doing something for the good of mankind. Um, if she truly didn't know what was going on that leads me to think that, she it was more of an accident that she found out these things.
Teacher: Mhm
Student: And if it's an accident you truly don't (exert) a lot of power.
Teacher: Ok.
Student: She did make a sacrifice but she didn't know she was doing it.
Student: Exactly, exactly.
Teacher: The poet seems to think she was denying these things, for some reason. Maybe she was aware but she denied.. refused to accept it.
Student: Yeah. Yeah I agree with that. I mean that's what I was thinking like like she suspected it, but she wasn't gonna admit it to anybody or, profess it to the world. I mean she was gonna go on with what she was doing.
Teacher: Mhm.
Student: I mean she was a scientist after all.
Teacher: Right. She can look at the cause effect relationship.
VBDUs with a narrative tone potentially serve multiple communicative, instructional, and discourse purposes in a classroom context. For example, they are used to raise attention or to create background to an immediately forthcoming point, to expand on a previously made major point, or to provide a niche to interpret a previously introduced text or other instructional materials. This interpretative focus is highlighted in Text Excerpt 8.5.
2.1.2 Cluster 2: Informational monologue
Informational monologue VBDUs typically have large negative scores on all three dimensions, meaning that they are highly informational (Dimension 1), marked by the absence of personal framing features (Dimension 2), and primarily monologic with long turns taken by the teacher (Dimension 3). Text Excerpt 8.6 illustrates this VBDU type.
Text Excerpt 8.6. Example of VBDU Type Informational monologue (Cluster 2) (Negative Dimension 1 features are in bold. Note also the lack of the linguistic features from positive Dimension 2, which shows an impersonal style, and the long teacher turn reflecting monologic discourse, which is a feature of negative Dimension 3.)
Teacher: Okay. All righty um what I want to do is continue with this discussion that we've been trying to show, between the interaction of history and, language change, and again as I state we're using the Romance languages as sort of our test case, because we have an abundant documentation of the situation both of the historical development of the Romance languages the historical background of real world events which occurred from the start of the Roman Empire right up through the fall of the Roman Empire, and the ultimate uh fate of the various provinces of the Roman Empire, and we also have an abundant corpus of linguistic documentation. so to a large extent the Romance languages present an ideal case, uh for studying, the development of language against the background of history. or conversely how history affects language. one of the issues that I want to particularly concentrate on today is the issue of the linguistic, uh impact of language contact. one of the main historical themes that I've been stressing, throughout the last uh couple of classes has been that in the history of (thin) the linguistic history of the Roman Empire, weved movements of peoples. we first of all have the expansion of the Romans. as the Romans left Rome and over the course of several centuries expanded their territorial domain, to what was to become the Roman Empire which at its height, stretched from Ireland all the way in uh through west bowell most of uh central western and eastern Europe and through the Mediterranean basin both the north and the south shore the Mediterranean, and beyond into Asia Minor.
In this segment, the teacher summarizes the major theme of a learning unit for the given class session: the interaction between history and language change. VBDUs from this type serve purposes such as academic reporting, reading written text out loud, listing, and so on.
2.1.3 Cluster 3: Contextual interactive
Contextual interactive VBDUs have the highest positive scores on Dimension 1 (Contextual orientation vs. Conceptual, informational focus) and Dimension 3 (Interactive dialogue vs. Teacher monologue), as well as a small positive score on Dimension 2 (Personalized framing). A Contextual interactive VBDU exhibits linguistic and interactive features that are associated with texts reflecting high participant involvement. As illustrated in Text Excerpt 8.7, references are made to the immediate spatial or mental context as a hypothetical situation is discussed while the participants are involved in a dialogue.
Text Excerpt 8.7. Example of VBDU Type Contextual interactive (Cluster 3) (Positive Dimension 1 features are marked by bold italics. Positive Dimension 2 features are marked by CAPITALIZED ITALICS.)
Teacher: . what else would possibly happen? yes?
Student: they could go insane. like shoot up the building
Teacher: some people get so stressed they come in and shoot up the building. and sometimes they shoot up the people and sometimes they kill people. and that's a certainly a very critical ultimate sign of stress. right? Ok. gosh the post office two or three years ago boy if I saw a post office worker in the post office building I would said I would (bug) out of there because we had two or three didn't we? yeah?
Student: but you know well I notice in my office when people are having conflict or whatever they do start to lay out more and call out sick a little bit more.
Teacher: they do what?
Student: if you know they're having some type of conflict at work they start calling out. they don't come in and work you know
Teacher: oh they report out. absenteeism
Student: mhm
Teacher: or tardiness certainly. um if you dread to go to work, (what's follows up a natural um
Student: it wouldn't take much to
Student: yeah
Student: just a sniffle or
Student: yeah
Teacher: that's right you make an excuse not to come to work (to) put yourself in a situation that's not comfortable. Ok.
In this text excerpt, a concept is discussed through exemplification. First the teacher asks the students to think about a hypothetical scenario. After making a general statement on the given topic, the teacher gives a concrete example of an incident relating to the topic. The students follow up on the concrete example with more examples from their own context, which they characterize even further through a dialogue.
2.1.4 Cluster 4: Unmarked
While the other three VBDU types differ sharply from each other in one or two dimensions, this last type exhibits no specific distinctiveness from the others, having scores near 0.0 on all three dimensions.
3 From VBDU text-types to discourse structure
3.1 Functional interpretation of VBDU types
To this point, the VBDU-types have been interpreted primarily by considering their multi-dimensional profiles (see Figure 8.3). To provide a more detailed functional interpretation of each VBDU type, I carried out a survey of VBDUs from each type, identifying the primary instructional purpose of each one. A sample of fifty VBDUs (approx. 1% of the corpus) was selected for the survey. Each VBDU was analyzed to determine its primary instructional purpose. Eight major categories are distinguished, based on previous research (e.g., Cazden, 1986; Marton & Tsui, 2004; Sinclair & Coulthard, 1975), and supplemented by a detailed consideration of the tasks performed by instructors and students in the current corpus:
1. Academic reporting: presentation of out-of-class research project or in-class group work (students); presenting a case study or the results of past studies (teacher)
2. Exposition: stating a series of facts to explain a concept; transmitting information by restating facts from a book; providing a conceptual framework for a topic; classifying measurements; direct quotation from written text
3. Demonstration: demonstrating a computer program (as the focus of the presentation)
4. Expansion: topic elaboration; exemplifying through personal opinion, narrative, or reformulating the content of written text; contrasting multiple aspects of past events, discussing discipline specific matters, narrating past events including personal and professional experience (facts and reflection), or through solving a problem on-line
5. Elicitation: brainstorming terms and concepts on a particular topic through post reading reflections, eliciting definitions of a concept from the reading
6. Management: three types of management are distinguished:
   a. Class management with activities such as: finding lecture notes; finding past exams on the network; procedures to disseminate exams; collecting exams; procedures for exam evaluation; handing in assignments; deadlines and the sequence of assignments, activities, and exams; roles and procedures to carry out a project.
   b. Instructional management with activities such as: using contextual resources (visuals, e.g., board, graph, chart, computer program); making topic related references to other readings or materials; illustrating a concept by using a computer.
   c. Technical management with activities such as: getting ready for a presentation; changing sites during an IITV distance education class; putting a microphone on.
7. Scaffolding: analyzing a math problem (step by step analysis on board); step by step procedures (e.g., with a computer program or an in-class activity)
8. Summarizing: summarizing previously shared knowledge.
The distribution of the different types of VBDUs across instructional purposes is summarized in Table 8.4 below.
Table 8.4 Instructional purposes and patterns of discourse unit types
Instructional purpose Contextual 1. Academic reporting 2. Exposition 3. Demonstration 4. Expansion 5. Elicitation 6. Management 7. Scaffolding 8. Summarizing Total 1 (11%) 1 (100%) 2 (13%) 14 (100%) 1 (33%) 19 (38%) Discourse unit type Informational Personal Unmarked Total 3 (75%) 1 (25%) 4 (100%) 8 (89%) 9 (100%) 1 (100%) 10 (67%) 3 (20%) 15 (100%) 3 (100%) 3 100%) 14 (100%) 2 (67%) 3 (100%) 1 (100%) 1 (100%) 11 (22%) 12 (24%) 8 (16%) 50 (100%)
Interesting patterns emerge from this sample data. Academic reporting and Exposition instructional functions are usually realized as Informational, monologic VBDUs. Management activities are always realized as Contextual interactive VBDUs in this sample. Expansion activities (e.g., exemplifying, interpreting and clarifying), are usually personal (Personalized framing VBDUs) rather than strictly informational presentations. Overall, three major patterns emerge from this survey. First, there is almost a 1-to-1 correspondence between Informational monologue VBDUs and academic / expository instructional purposes. Second, Personalized framing VBDUs are usually used for expansion activities. And finally, Contextual interactive VBDUs are usually used for management activities. However, it is interesting that these Contextual interactive VBDUs seem to be the most versatile, serving expansion and demonstration functions, and even exposition purposes in one case. In sum, although the present survey is based on a small sample of VBDUs, it identifies striking differences among the instructional purposes typically served by each VBDU Type.
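The row percentages behind these patterns are simple to recompute. The following is a minimal sketch, assuming only the counts for three rows of Table 8.4 (Exposition, Expansion, and Management); the function names `row_percentages` and `dominant_type` are hypothetical helpers introduced here for illustration, not part of the original survey procedure.

```python
# Counts for three rows of Table 8.4 (instructional purpose -> VBDU type -> n).
# Only these three rows are used in this illustration.
ROWS = {
    "Exposition": {"Contextual": 1, "Informational": 8},
    "Expansion": {"Contextual": 2, "Personal": 10, "Unmarked": 3},
    "Management": {"Contextual": 14},
}


def row_percentages(counts):
    """Express each cell as a whole-number percentage of its row total."""
    total = sum(counts.values())
    return {vbdu_type: round(100 * n / total) for vbdu_type, n in counts.items()}


def dominant_type(counts):
    """Return the VBDU type that realizes this purpose most often."""
    return max(counts, key=counts.get)


for purpose, counts in ROWS.items():
    print(purpose, row_percentages(counts), "->", dominant_type(counts))
```

Read this way, each instructional purpose has one clearly dominant VBDU type in the sample, which is the sense in which the survey points to a near 1-to-1 correspondence.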
3.2 Texts as sequences of VBDU types²
One of the central goals of this book is to provide a comprehensive linguistic description of discourse units and the flow of discourse within texts. In Section 1.3 above, I showed how this goal can be addressed by tracking the multi-dimensional profile of VBDUs across all discourse units of a classroom teaching session. I return to this goal here, using the VBDU-Types to track the flow of discourse in these texts. Table 8.5 below itemizes all VBDUs in the same classroom teaching session that was discussed in Section 1.3. For each VBDU, this table lists both the VBDU-type and the primary instructional purpose accomplished in the VBDU. We saw in Section 1.3 that the first three VBDUs in this text were very different in their multi-dimensional linguistic characteristics; in Table 8.5 we see that these three VBDUs belong to three different VBDU-Types: Contextual interactive, Unmarked, and Informational teacher monologue. The linguistic characteristics of these VBDUs are discussed in detail as Text Excerpts 8.1, 8.2, and 8.3 (above). (VBDU 7 from this text was also discussed earlier, as Text Excerpt 8.7.)
In the discussion below, I focus on three other VBDUs from this class session: #26-28. These VBDUs come from three different types, and thus further illustrate how the types exhibit different linguistic profiles, corresponding to differences in communicative and/or instructional purposes. The linguistic profile of VBDU #26 places it into the Personalized framing type. The dominant linguistic features in this VBDU are positive Dimension 2 features, such as that-deletion, mental, factual, and likelihood verbs (e.g., find, think, know), third person personal pronouns, and past tense. This VBDU type also has a moderately positive Dimension 1 score (contextual orientation).
2. It is important to emphasize that although short VBDUs (less than 100 words) are not included in the linguistic analysis, they have important functions in a stretch of discourse. Due to space limitations, however, the present study does not attempt to provide a detailed account of either their linguistic characteristics or their functional purposes. As a result, the study can offer only a restricted description of the overall organization of class sessions. In general though, class sessions include multiple longer VBDUs, making it possible to provide preliminary comparisons of the overall discourse organization of class sessions.
Table 8.5 Instructional purposes of the discourse units in a business management class

VBDU    Cluster  VBDU Type                        Instructional Purpose
Number  Number
1       3        Contextual interactive           Management (class management)
2       4        Unmarked                         Elicitation
3       2        Informational teacher monologue  Exposition
4       4        Unmarked                         Elicitation
5                Short*
6       4        Unmarked                         Elicitation
7       3        Contextual interactive           Expansion (exemplification)
8                Short
9                Short
10      4        Unmarked                         Expansion (explanation)
11               Short
12      1        Personalized framing             Expansion (elaboration)
13               Short
14               Short
15      1        Personalized framing             Expansion (elaboration)
16      4        Unmarked                         Expansion (explanation)
17      4        Unmarked                         Expansion (explanation)
18      4        Unmarked                         Expansion (explanation)
19      2        Informational teacher monologue  Expansion (exemplification)
20      4        Unmarked                         Expansion (exemplification)
21               Short
22               Short
23      2        Informational teacher monologue  Exposition (description)
24      4        Unmarked                         Elicitation
25      3        Contextual interactive           Expansion (exemplification)
26      1        Personalized framing             Expansion (interpreting, clarifying text)
27      3        Contextual interactive           Summarizing (reformulating text)
28      2        Informational teacher monologue  Direct quotation
29      1        Personalized framing             Expansion (interpreting text)
30               Short
31      4        Contextual interactive           Summarizing
32               Short

* Short indicates that the VBDU was shorter than 100 words and so was not included in this analysis.
Text Excerpt 8.8
VBDU #26 Personalized framing (summary and clarification) (Positive Dimension 1 features are marked by bold italics. Negative Dimension 1 features are in bold. Positive Dimension 2 features are marked by CAPITALIZED ITALICS.)
Teacher: got oh we need to find out about the hot stove rules uh and there're three or four really important points in the hot stove rule. and who was the name of the person who wrote this?
Student: Douglas McGreagor.
Teacher: Douglas.
Student: McGreagor.
Teacher: Mcgreagor. Ok. um summarize those four or five points out of DM's hot stove rule.
Student: well first of all what he's talking about is when a manager has to manage conflict at work. he has to act like uh or he's going from the frame of reference if you're the mother or father and your child or whatever touches the stove and how that child reacts, that's kind of the frame of reference he's going from. so you're saying the manager has to be swift uh you have to quickly establish rules and policies so you won't have difficulties and harassments in the future.
Teacher: now why is being swift and stepping in immediately important um rather than sitting there and letting it just go on?
Student: because if you let it fester up YOU KNOW people are going to get that much more upset with each other and YOU KNOW somebody else might even get involved. it's just better to nip it in bud while you have the opportunity.
Teacher: Ok nip (it) in the bud is probably a good term. now what was the second point that D.M. made?
Student: it's relatively intense with the first offense and what he's saying there is like the first time it happens you want to make sure you jump on it with both feet so that both the parties don't THINK it's alright to do? and they need to YOU KNOW cease that kind of activity and you don't want to give the same people a second chance to do the same thing again, same conflict arise same disagreements
Teacher: Ok. what was the third point?
In VBDU #26, the teacher prompts students to summarize and to explain multiple aspects of the concept under discussion. The primary instructional purpose focuses on interpreting and clarifying a written piece of text, with students expressing their own personal interpretations. As a result, we see frequent use of 3rd person pronouns, referring to the author of the source text, and frequent use of the discourse marker you know framing the students' interpretation. Positive Dimension 1 features are also relatively common in this VBDU, including 2nd person pronouns, conditional clauses, and activity verbs. In sum, VBDU #26 displays the linguistic features of a Personalized framing VBDU while reflecting the most typical instructional purpose of this VBDU type: Expansion (interpreting and clarifying; see Table 8.5).
The next segment, VBDU #27, is from the Contextual interactive type; the dominant lexico-grammatical features are first and second person pronouns, contractions, and modals (positive Dimension 1 features), along with interactive features from positive Dimension 3 such as relatively short turns and discourse markers such as well and ok.
Text Excerpt 8.9
VBDU #27 Contextual interactive (elaboration) (Positive Dimension 1 features are marked by bold italics. Negative Dimension 1 features are in bold. Positive Dimension 2 features are marked by CAPITALIZED ITALICS. Positive Dimension 3 features are marked by underlined italics.)
Student: it's impersonal. just YOU KNOW (you) can't take this personally because it's just my job. I MEAN I think THEIR point is that the manager should let the employees know that this is nothing personal no matter what HE or SHE decides.
Teacher: Ok. now quite often we as managers we do make that mistake. we personalize things giving the impression that I don't like you uh you're a screwball uh gosh you're dumb. you want to take the person out of it saying that is a non-acceptable act nothing about the person, it's just a non-acceptable act and you can't do that. Ok. what's the next point?
Student: well the next one in my opinion is just like it's personal. it says it emphasizes behavior not the person? which in my opinion is the same thing as.
Teacher: don't personalize (it). Ok. and the last one?
This VBDU begins by summarizing the previous discussion. The student summarizes the ideas in the text and makes direct references to the authors of the text by using third person pronouns such as their, he, she. Responding to this, the teacher repeats the content of the student's summary but uses 1st and 2nd person pronouns (you and we) to include the actual classroom participants, bringing the hypothetical situation back to the physical context. The instructional purpose of this segment is Summarizing (see Table 8.5), but the summary involves a direct contextualized interaction with students.
Finally, VBDU #28 comprises a quote from the written text and thus represents the Informational monologue type. The dominant linguistic characteristics are negative Dimension 1 features (e.g., nouns, attributive adjectives, past tense features, prepositions) and negative Dimension 3 (long turns).
Text Excerpt 8.10
VBDU #28 Informational monologue (written text that is read aloud) (Positive Dimension 1 features are marked by bold italics. Negative Dimension 1 features are in bold. Positive Dimension 2 features are marked by CAPITALIZED ITALICS.)
Student: it is consistent. uh HE says in HIS example the hot stove will burn you every time therefore the punishment should be consistent and without favoritism and
Teacher: one of the things THEY stress in parenting is to be consistent and particularly with parents um some parents are inconsistent between siblings. uh fathers are notorious for letting THEIR little darling girls get away with what THEY swat the boys about. and mamas have a tendency to let the boys get away with things a little stricter on the girls. that's not in all cases but it's certainly possible. Ok. let us look at the case. the case was about what B? Ok. how about somebody summarizing the reading of the case for us. the man with the mic over there can say it better than anybody.
Student: problems at the hospital. Smith County is a suburban area near a major midwestern city. The county has experienced such a tremendous rate of growth during the past decade that local governments have had difficulty providing adequate service to the citizens. Smith County Hospital has a reputation for being a first class facility but it is inadequate to meet local needs. During certain periods of the year the occupancy rates exceed the licensed capacity. There is no doubt in anyone's mind that the hospital must be expanded immediately. At a recent meeting of the hospital authority, the hospital administrator K. A. presented the group with a proposal to accept the architectural plans of the firm of W and G G. the plan calls for a hundred bed additional addition adjacent to the existing structure. K. announced that after reviewing several alternative plans, SHE believed the W and G plans would provide the most benefit for the expenditure. At this point R.R.L. the board chairperson began questioning the plan. R. made it clear HE would not go along with the W and G plan. HE stated that the board should look for other firms to serve as the architects for the project. The ensuing argument became somewhat heated and a ten minute recess was called to allow those attending to get coffee as well as allow tempers to calm down. K was talking to J. R. another member of the hospital authority board in the hall and said R. seems to fight me on every project. R. who was talking to other members of the board was saying I know that the W and G plan is good but I just can't stand for K. to act like it's HER plan. I wish SHE would leave so we could get a good administrator from the community who whom we can identify with.
Teacher: the text asks the question Is R.'s reaction uncommon? Explain. Somebody want to comment on that?...
Student: I would say it's not with a big change like that there's usually people are change.
Teacher: THEY usually oppose change did you say? Ok. anybody else got a comment?
What is most noticeable in VBDU #28 is the large number of linguistic features from the negative side of Dimension 1, more characteristic of written text than typical speaking. This VBDU also includes some personal framing, by both teacher and student, as an expansion of the written text; but by far the dominant style of this VBDU is determined by the typical linguistic characteristics of informational written prose. The series of three extracts above illustrates how differences in the VBDUs' linguistic features relate to the differences in context and primary instructional purposes. Taken together, an analysis of these shifts provides a linguistic perspective on the discourse structure of a classroom teaching session.
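One way to make these shifts concrete is to treat the VBDU Type column of Table 8.5 as a sequence and count where the type changes. The sketch below is illustrative only: the helper names (`analysed`, `shifts`, `runs`) are hypothetical, and the type labels are transcribed from the VBDU Type column of Table 8.5 rather than recomputed from linguistic features.

```python
from collections import Counter
from itertools import groupby

C = "Contextual interactive"
U = "Unmarked"
I = "Informational teacher monologue"
P = "Personalized framing"
S = "Short"

# VBDU Type column of Table 8.5, in session order (VBDUs 1 to 32).
SESSION = [C, U, I, U, S, U, C, S, S, U, S, P, S, S, P, U,
           U, U, I, U, S, S, I, U, C, P, C, I, P, S, C, S]


def analysed(sequence):
    """Drop Short units, which were excluded from the linguistic analysis."""
    return [t for t in sequence if t != S]


def shifts(sequence):
    """Count transitions between adjacent units of different types."""
    return Counter((a, b) for a, b in zip(sequence, sequence[1:]) if a != b)


def runs(sequence):
    """Collapse consecutive units of the same type into (type, length) runs."""
    return [(t, len(list(group))) for t, group in groupby(sequence)]


units = analysed(SESSION)
print(Counter(units))          # how often each type occurs in the session
print(runs(units))             # stretches of discourse with a stable type
print(SESSION[25:28])          # VBDUs 26-28, the three excerpts discussed above
```

In this session the three excerpts discussed above (VBDUs 26 to 28) each belong to a different type, so that stretch contributes two consecutive shifts to the overall sequence.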
4 Summary and conclusion
The primary goal of this chapter was to demonstrate how the discourse structure of an individual class session can be described based on the intra-textual linguistic changes identified through variation in VBDUs. This chapter examined the discourse patterns of an upper division Business Management class session, relying on patterns of linguistic variation identified from a corpus of university class sessions. The VBDU Types were analyzed in terms of their typical instructional purposes. For example, Informational monologue VBDUs typically relate to instructional purposes such as Academic reporting and Exposition; Contextual interactive VBDUs relate to class management functions; and Personalized framing VBDUs relate to Expansion as an instructional purpose, including activities such as clarification or text interpretation.
Each classroom teaching session is uniquely organized beyond its macro-patterns (Young, 1994). However, describing the structure of classroom discourse beyond its general macro-structure (e.g., openings and closings) has proven to be a challenging task for researchers. The present study contributes to this area of research by applying corpus-based methods to the analysis of classroom teaching discourse. It takes the co-occurring patterns of a large number of linguistic items in a large number of units as the basis for the analysis. The study further describes the relationships between language variation and corresponding communicative functions and instructional purposes in class sessions. As a result, variation within class sessions can be described, allowing the analyses to go beyond the description of simple macro-structure (openings and closings).
chapter 9
Conclusion
Comparing the analytical approaches
1 Overview
In Chapter 1, we noted that there have been three major domains of inquiry associated with discourse analysis: 1) the study of language use; 2) the study of linguistic structure beyond the sentence; and 3) the study of social practices and ideological assumptions that are associated with language and/or communication (see also Schiffrin et al., 2001). Many recent discourse studies focus on the third perspective. For the most part, these studies do not deal with linguistic analysis, and often they do not even provide examples of specific texts. Rather, studies of this type tend to discuss communication practices and cultural norms and relationships, apart from their linguistic instantiation in particular texts. As Scollon and Scollon (2001, p. 538) put it, "In this line of development the primary focus is on society and social practice, with an attenuated or even absent interest in texts or discourse in the narrower linguistic sense." At the other extreme, the field of corpus linguistics has become almost synonymous with the study of discourse as language use (#1 above). That is, the major strength of corpus-based analysis is to document empirically the patterns of language use in some collection of texts. As a result, most recent discourse studies of language use are based on analysis of a corpus, and conversely, most studies in corpus linguistics describe how lexical/grammatical features are used in discourse. For example, well over half of the articles published in recent issues of the International Journal of Corpus Linguistics state that they are studying characteristics of discourse. Recent corpus-based books have also been quite explicit about their discourse basis, with titles like Discourse in the Professions, Strategies in Academic Discourse, Textual Patterns, Using Corpora in Discourse Analysis, and Exploring Discourse through Corpora.
In almost all cases, these corpus investigations are studies of discourse in that they document the patterns of language use in a representative collection of natural texts. Such studies focus on the traditional concerns of corpus linguistics, such as patterns of collocation, the use of particular grammatical features, patterns of grammar in association with lexis, the typical linguistic characteristics of particular genres/registers, etc. However, the second analytical perspective on discourse listed above (the study of discourse structure, text organization, and linguistic structure beyond the sentence) has been neglected in recent years, whether undertaken from qualitative or corpus-based perspectives. This trend represents a dramatic departure from earlier research on discourse, where the description of textual structure and organization was a central concern. For example, the classic textbook on discourse analysis by Brown and Yule (1983) focused almost entirely on linguistic structure beyond the sentence, with sections on discourse topics, topic boundary markers, thematic structure, information structure, cohesion, coherence, frames, scripts, schemata, etc. Every chapter contained detailed discussion and numerous examples of how these aspects of discourse structure are realized linguistically. While such research was popular in the 1970s, 1980s, and early 1990s (e.g., Brown & Yule, 1983; Grimes, 1975; Mann & Thompson, 1992), in recent years it has become much less common to approach discourse as the study of linguistic structure or organization above the sentence-level. Thus, for example, of the 41 chapters in the recent Handbook of Discourse Analysis (Schiffrin, Tannen, & Hamilton, 2001), only seven chapters (by Martin, Ward and Birner, Polanyi, Dubois and Sankoff, Stubbs, Herring, and Chafe) include any discussion or examples of how language features are used to organize discourse above the sentence level. Given this general decline in the study of discourse structure and textual organization, it is not surprising that there have been almost no previous corpus-based studies of discourse organization. Of course, the methodological challenges of such investigations have also been a major deterrent.
The study of discourse organization requires detailed analysis of each individual text, to identify their text-internal structures, and to describe their pragmatic functions. But as noted above, corpus-based research has instead focused on linguistic patterns that exist across hundreds or thousands of texts, necessitating the use of automatic analytical techniques (e.g., concordancing software to analyze the collocations of a target word). Such analyses usually do not even acknowledge the existence of individual texts; rather, the goal of the analysis is to produce frequency counts for the entire corpus, rather than analyses of each text. Thus, studies offer findings like "Pattern A occurs with a certain frequency in this corpus," with no indication of how this pattern is distributed within or across individual texts.¹
1. In a few previous corpus studies, texts are the units of analysis and thus have a more central status. For example, in multi-dimensional studies, the distribution of linguistic features is analyzed separately in each text, providing the basis for the quantitative investigation of how linguistic features tend to co-occur in the texts of a corpus.
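The difference between a pooled corpus count and a text-by-text analysis can be illustrated with a small sketch. Everything in it is hypothetical: the three toy "texts", the target words, and the helper names (`corpus_frequency`, `per_text_rates`, `text_range`) are invented for illustration and are not data or procedures from the studies discussed here.

```python
# Hypothetical toy corpus: text id -> list of tokens (not data from this book).
CORPUS = {
    "text_a": "you know I think you know it works".split(),
    "text_b": "the results show that the model works".split(),
    "text_c": "the data show the same pattern".split(),
}


def corpus_frequency(corpus, target):
    """Pooled count: treats the whole corpus as one undifferentiated text."""
    return sum(tokens.count(target) for tokens in corpus.values())


def per_text_rates(corpus, target, per=1000):
    """Rate of the target in each text, normalised to occurrences per `per` words."""
    return {
        text_id: per * tokens.count(target) / len(tokens)
        for text_id, tokens in corpus.items()
    }


def text_range(corpus, target):
    """Number of texts in which the target occurs at least once."""
    return sum(1 for tokens in corpus.values() if target in tokens)


for word in ("the", "you"):
    print(word, corpus_frequency(CORPUS, word),
          text_range(CORPUS, word), per_text_rates(CORPUS, word))
```

The pooled count alone would report only that "the" occurs four times; the per-text view shows that it is absent from one text and concentrated in the other two, which is the kind of information a text-centered analysis needs.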
In contrast, individual texts are central to the goals of the present book. That is, to truly integrate the study of discourse organization with corpus-based analysis, we needed to develop an approach that includes a detailed analysis of each individual text, in terms that can be generalized across all texts of a corpus. In Chapter 1, we described how these goals can be achieved through two main methodological approaches: top-down and bottom-up. Top-down is the more traditional approach, i.e., applying the discourse-analytic techniques previously developed for the analysis of individual texts to an entire corpus. In fact, the top-down approach is not necessarily corpus-based. Rather, through the intensive efforts of dedicated researchers, these methods can be applied to each of the multiple texts of a corpus. Once that coding is completed, it is possible to apply automatic techniques to describe the typical patterns of discourse organization that hold for the entire corpus. The bottom-up approach, on the other hand, is fundamentally corpus-based. That is, bottom-up approaches are automated so that they can easily be applied to a corpus of any size. Once the techniques are developed, there are no concerns about the effort required to analyze a large corpus of texts. Of course, the major challenge with a bottom-up approach is identifying meaningful discourse structures: producing a kind of structural analysis that is actually useful to discourse analysts. Three specific approaches are illustrated in the present book: two top-down approaches (move analysis and appeals analysis) and one bottom-up approach, based on vocabulary-based discourse units. These specific approaches clearly illustrate the more general characteristics described above: move analysis and appeals analysis are both extremely labor-intensive, and both approaches have been previously used for more traditional discourse analyses of selected texts.
In contrast, the analysis of vocabulary-based discourse units is automated and can easily be applied to a corpus of any size; this approach is specifically designed for corpus analysis and has not been used previously for traditional discourse analysis. As we describe in Chapter 1, there are other important differences between top-down and bottom-up approaches. Perhaps the most important of these is the primary basis of the analysis: functional-qualitative vs. linguistic-quantitative characteristics. Functional analysis is primary in top-down approaches; functional distinctions are determined on a qualitative basis, to determine the set of relevant discourse types and to identify specific discourse units within texts. In contrast, linguistic analysis is primary in bottom-up approaches; a wide range of linguistic distributional patterns are analyzed quantitatively, again being used to determine the set of relevant discourse types and to identify specific discourse units within texts. A related difference is the order of analysis. In top-down approaches, the researcher begins with functional-qualitative methods to develop an analytical framework that describes the types of discourse units in the target genre. That is,
we need to fully describe the (functional) discourse types in this genre before beginning the empirical analysis (segmenting the texts in the corpus into discourse units). In these approaches, linguistic-quantitative analysis comes at an even later step, to facilitate the interpretation of the discourse types. In contrast, bottom-up approaches begin with a kind of linguistic-quantitative analysis: segmenting the texts into discourse units on the basis of vocabulary distributional patterns. Comprehensive linguistic-quantitative (lexical and grammatical) analysis is then undertaken as the basis for identifying the types of discourse units. Functional-qualitative analysis is a later step, to facilitate the interpretation of the discourse types. The two approaches have very different strengths. Top-down approaches apply well-established research methods that are quite familiar to the wider professional community. Further, because the analyses have a primary functional-qualitative basis, they are directly interpretable by discourse analysts. In contrast, bottom-up analyses have a complex quantitative-linguistic basis, analyzing numerous linguistic distributional patterns to determine the discourse units within texts as well as the general discourse types. The strengths of this approach are that it is replicable, can easily be applied to large corpora, and produces generalizable results. Its major weakness is that the discourse description is relatively complex, based on multivariate quantitative-linguistic distributional patterns, and thus does not necessarily represent the kinds of discourse constructs that can be uncovered through the detailed discourse analysis of individual texts. But this weakness might also be considered a strength: by approaching discourse from a radically different perspective, we have the possibility of identifying textual patterns that would otherwise go unnoticed by analysts.
Given these major differences in methods, and their complementary strengths, we would predict that the two approaches will provide different insights into the discourse organizational patterns of a genre. At the same time, it is reasonable to expect that the inherent structure of a genre would be reflected in analyses undertaken from both perspectives. The following section explores these relationships by comparing the top-down analysis of biochemistry research articles (Chapter 4) to the bottom-up analysis of biology research articles (Chapter 7).
2 Comparing the top-down and bottom-up descriptions of biology research articles

Two separate studies in this book have focused on the discourse organization of biology research articles. The two studies were carried out independently, so it is not possible to directly compare the analyses of the exact same texts. Further, the
studies are not exactly comparable: Chapter 4 focuses on a specific sub-discipline (biochemistry) and is based on analysis of research articles from five scientific journals; Chapter 7 focuses on the more general discipline of biology (including sub-disciplines like entomology, human genetics, and microbiology) and is based on analysis of research articles from ten different scientific journals. However, the similarities between these studies are stronger than the differences: both studies investigated published academic research articles in the general discipline of biology. The major difference between the studies is their methodological approach: top-down move analysis in Chapter 4 versus bottom-up VBDU analysis in Chapter 7. Thus it is instructive to compare the nature of the findings from the two studies. The discourse descriptions resulting from these two studies can be compared with respect to four different considerations:
a) the nature of the discourse units (moves vs VBDUs) in biology research articles;
b) the dimensions of linguistic variation among discourse units in biology research articles;
c) the functional and linguistic characteristics of the discourse types (move types vs VBDU types) in biology research articles;
d) the typical discourse structures of biology research articles.

2.1 Discourse units in biology research articles
The first point of comparison is the nature of the discourse units that are investigated in these two studies. We have already discussed the differing methodological bases of moves and vocabulary-based discourse units (VBDUs): that is, each move expresses a distinct communicative function, while each VBDU uses a distinct set of words. This methodological difference results in discourse units that are somewhat different in nature. First of all, moves tend to be considerably shorter than VBDUs. The shortest moves can be 4 or 5 words long, while the longest moves contain 400–500 words. On average, moves are about 56 words long in the biochemistry research articles (see Table 4.3). In contrast, the shortest VBDUs are around 70 words, while the longest VBDUs are around 1,000 words. On average, VBDUs are about 211 words long in the biology research articles (see Table 7.1).² Moves and VBDUs also differ in terms of their textual coverage: moves are not necessarily continuous stretches of text. Rather, a single move includes all portions
2. In both studies, the shortest moves were excluded for the quantitative-linguistic analyses, but this restriction also reflected the different nature of these discourse units: moves shorter than 25 words were excluded in the Chapter 4 study, while VBDUs shorter than 100 words were excluded in the Chapter 7 study.
of a text related to a single communicative function, regardless of whether those text segments are contiguous or not. In contrast, VBDUs represent continuous stretches of text. This difference has practical consequences for the description of discourse organization. It is possible to describe a text as a sequence of VBDUs, because those text segments in fact occur in sequential order. In contrast, moves do not necessarily correspond to the actual sequence of words in a text. To some extent, it is possible to describe a typical sequence of moves (as in Chapter 3, Section 3.6), but this task is complicated by the fact that moves can be interspersed and overlapping in a text. In sum, moves and VBDUs are similar in that they both provide ways to segment a text into smaller discourse units, and thus analyze the internal discourse organization of a text. They differ, however, in their methodological bases, and in the actual extent of text included in the unit.

2.2 The dimensions of linguistic variation in biology research articles
In both Chapter 4 and Chapter 7, multi-dimensional (MD) analyses were carried out. As noted in earlier chapters, the goal of this kind of analysis has usually been to identify the underlying parameters of linguistic variation among texts within the target discourse domain, which usually includes several different registers (e.g., elementary school registers or university registers). In contrast, the MD analyses in the present book describe the patterns of variation within a single written genre. If these analyses had been based on complete research articles, we would expect two general results: 1) that the two MD analyses would be nearly identical, and 2) that only minor patterns of variation would be uncovered. The first anticipated result is the easiest to understand: given that the two corpora were constructed to represent highly similar discourse domains, we would predict that the MD analysis would uncover similar dimensions of variation. The second predicted result requires a little more explanation. Like all correlational statistical techniques, MD analysis requires variation to achieve meaningful results. That is, if linguistic features do not vary in their rate of occurrence across the texts of a corpus, this technique will not succeed in identifying patterns of co-variance (or linguistic co-occurrence). When we consider a corpus of scientific research articles from a single discipline, where each text is a complete research article, we find extremely little linguistic variation across texts. Thus, if the MD analyses in the present book had been based on complete research articles, it is unlikely that we would have discovered interpretable linguistic patterns. However, there is extensive linguistic variation within scientific research articles. That is, the sub-sections and sub-texts within research articles are quite different in their communicative purposes, and those discourse units therefore differ
considerably in their typical linguistic characteristics. One general goal in this book has been to first segment texts into smaller discourse units that are linguistically well-defined, so that we can then track patterns of linguistic variation within texts. Chapters 4 and 7 adopt different approaches to capturing that text-internal variation, segmenting research articles in different ways into fundamentally different kinds of sub-texts (moves versus VBDUs). And the MD analyses in these two chapters were then based on those different kinds of sub-texts. From a statistical point of view, this procedure creates a corpus with linguistic variation. That is, there is extremely little variation among texts in a corpus of research articles. But if those same research articles are segmented into moves or VBDUs, we will discover extensive linguistic differences in a corpus composed of those sub-texts. The question that we take up in the present section is whether we find similar patterns of linguistic variation in a corpus of moves and a corpus of VBDUs. Although these two MD analyses are based on different kinds of sub-texts, they are similar in two crucial respects: 1) the units of analysis cover the full extent of each text, and thus taken together, they cover the full range of linguistic variability found in these research articles; and 2) each of the two multi-dimensional analyses is based on a relatively comprehensive set of linguistic features. As a result, the two MD analyses identify similar parameters of variation. The MD analysis of biochemistry moves identified 7 factors, while the biology VBDU study identified only 4 factors. However, there are close correspondences in both studies for those four factors. Table 9.1 lists the correspondences between the two analyses. 
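The statistical point made above, that segmenting articles into sub-texts creates the variation that a correlational technique needs, can be illustrated with a minimal sketch. All numbers below are invented for illustration and are not taken from either corpus; the feature, the article names, and the use of article sections as stand-ins for moves or VBDUs are assumptions, as is the simplification that all sub-texts are equally long.

```python
from statistics import mean, pvariance

# Hypothetical rates (per 1,000 words) of a single linguistic feature,
# e.g. past tense verbs, in four sub-texts of three invented articles.
# These values are illustrative only; they are NOT data from Chapters 4 or 7.
articles = {
    "article_A": {"Introduction": 20.0, "Methods": 80.0, "Results": 60.0, "Discussion": 24.0},
    "article_B": {"Introduction": 22.0, "Methods": 78.0, "Results": 62.0, "Discussion": 26.0},
    "article_C": {"Introduction": 18.0, "Methods": 82.0, "Results": 58.0, "Discussion": 22.0},
}

# Unit of analysis = complete article: one rate per article
# (a plain average, assuming sub-texts of equal length).
whole_text_rates = [mean(sub_texts.values()) for sub_texts in articles.values()]

# Unit of analysis = sub-text: one rate per sub-text.
sub_text_rates = [rate for sub_texts in articles.values() for rate in sub_texts.values()]

whole_text_variance = pvariance(whole_text_rates)
sub_text_variance = pvariance(sub_text_rates)

print(f"variance across complete articles: {whole_text_variance:.2f}")
print(f"variance across sub-texts:         {sub_text_variance:.2f}")
```

In this toy corpus the complete articles are nearly indistinguishable, while the sub-texts differ sharply, which is the situation described for research articles from a single discipline.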
Table 9.1 shows that the four factors identified in the MD analysis of biology VBDUs all have highly similar corresponding dimensions in the MD analysis of biochemistry moves, based on the sets of co-occurring linguistic features, as well as the functional interpretations assigned to each dimension. Some differences between the two analyses are due to the fact that a larger set of features was included in the biology VBDU study (especially semantic classes of nouns and verbs, such as abstract nouns, process nouns, activity verbs, communication verbs). However, these additional features co-occur with the core features included in both analyses, rather than defining additional dimensions in the VBDU analysis. As a result, the four dimensions identified in the VBDU analysis all have close counterparts in the move analysis. The positive features for Dimension 1 in the move analysis are long words, attributive adjectives, and nouns, nearly identical to the positive features for Dimension 4 in the VBDU analysis. (The major difference is that Dimension 4 in the VBDU analysis includes two specific semantic classes of nouns: abstract nouns and process nouns.) Both analyses interpret this dimension as relating to (abstract) conceptual discussion (as opposed to concrete reference in the case of the move analysis).
Table 9.1 Comparison of the corresponding dimensions in the MD analysis of biochemistry moves (Ch 4) and the MD analysis of biology VBDUs (Ch 7)

Factor analysis of biochemistry moves (see Table 4.4): factor number & interpretation | Linguistic features | Factor analysis of biology VBDUs (see Table 7.3): factor number & interpretation | Linguistic features
1: Conceptual versus Concrete Reference | long words, attributive adjectives, nouns; versus: numerals, acronyms and jargon | 4: Abstract / Theoretical Discussion of Concepts | nominalizations, long words, abstract nouns, process nouns, attributive adjectives
2: Concrete Action versus Abstract Discussion | passive voice verbs, past tense, coordinators; versus: definite articles, nominalizations, prepositions | 3: Procedural Description of Actions / Events | passive voice verbs, activity verbs, past tense, progressive aspect, time adverbials
3: Evaluative Stance | extraposed It, adjective+that-clause, predicative adjectives | 1: Evaluation of Possible Explanations | predicative adjectives, adjective+to-clause, adjective+that-clause, modal verbs, linking adverbials, causative / conditional subordination
5: Attributed Knowledge versus Current Study | present tense, references, type/token ratio; versus: nouns, past tense | 2: Current State of Knowledge versus Past Events and Actions | communication verbs, present tense, perfect aspect, epistemic verb + that-clause; versus: past tense
Dimension 2 in the move analysis corresponds to Dimension 3 in the VBDU analysis. The main features shared by these dimensions are passive voice verbs and past tense. The VBDU analysis additionally has activity verbs and time adverbials loading on this dimension. In both analyses, this dimension was interpreted as a (procedural) description of actions (or events). Dimension 3 in the move analysis corresponds to Dimension 1 in the VBDU analysis. Both dimensions included predicative adjectives, especially controlling a complement clause; this dimension in the VBDU analysis additionally included modal verbs and causative/conditional adverbial subordination. In both analyses, this dimension was interpreted as evaluative stance (comparing the strengths of competing explanations). Finally, Dimension 5 in the move analysis corresponds to Dimension 2 in the VBDU analysis. The basic opposition represented in both of these dimensions is present tense versus past tense. In Dimension 5 of the move analysis, present tense co-occurs with frequent citations to previous research and a high type/token ratio. The VBDU analysis did not include citations as a linguistic feature, but it did include the semantic class of communication verbs, which loads strongly with present tense (and perfect aspect) on Dimension 2. The interpretive labels here are somewhat different, but the prose descriptions in the two chapters show that similar underlying functions are associated with the two dimensions. The positive Dimension 5 features in the move analysis are interpreted as attributed knowledge, while the positive Dimension 2 features in the VBDU analysis are interpreted as current state of knowledge. In both cases, these features are interpreted as presenting the findings of previous research, describing what we already know as the backdrop to the present study. 
Present tense is used for these descriptions to emphasize that this is the current state of our knowledge, even though it is based on previous studies. In contrast, the negative pole of both dimensions uses past tense verbs to describe the actual actions and events of the present study. For Dimension 5 of the move analysis, this pole is labeled current study in opposition to the current state of knowledge (i.e., summarizing previous studies) in the label of VBDU Dimension 2. Taken together, this comparison shows a strong set of correspondences between the two independent MD analyses, based on different corpora, targeting slightly different genres, analyzed with respect to slightly different sets of linguistic features, and each interpreted on its own terms by different researchers. Most importantly for our purposes here, the two MD analyses are based on different kinds of discourse units: In Chapter 4, research articles are segmented into moves, while in Chapter 7, research articles are segmented into VBDUs. As noted above, the MD approach is statistically feasible only after segmenting the corpus of research articles into smaller linguistic sub-texts. It is reasonable to expect that we might
uncover different parameters of variation when comparing the linguistic characteristics of moves to the linguistic characteristics of VBDUs. However, the comparison here suggests that both approaches to text segmentation result in discourse units that capture the range of linguistic variability in this discourse domain, and as a result, the four major dimensions identified in the VBDU MD analysis have close counterparts in the move MD analysis.³

2.3 The functional and linguistic characteristics of the discourse types (move types vs VBDU types) in biology research articles
The two approaches to discourse structure in the present book have fundamentally different bases for determining the discourse types in a genre: functional criteria in the case of move analysis versus linguistic criteria in the case of VBDU analysis. Given this difference, it might be supposed that the discourse types identified in the two analyses would bear no resemblance to one another. There are in fact major differences in the two sets of discourse types. Fifteen different move types were identified in the analysis of biochemistry research articles, and these were further broken down into 29 different steps in addition to the 7 move types that do not have steps: a total of 36 different discourse types, each with a distinct communicative function. In contrast, only 6 different discourse types were identified in the VBDU study. Thus, it is clear that the move analysis produces a much more detailed description of the different kinds of discourse types than the VBDU analysis. The functional basis of move analysis is the main factor that permits this level of detail: The researcher asks at each point in a text what specific communicative goals the author is trying to achieve. There is no a priori limit on the number of goals in a text or genre, allowing for a very fine-grained description. In contrast, VBDU types must be linguistically different. Further, the specific approach illustrated in Chapter 7 is based on only 4 linguistic predictors: the 4 dimensions of variation. VBDU types are further different from move types in that they are intended to represent only the major linguistic groupings of discourse units. That is, the dimensions of variation represent the major parameters of linguistic variation in this discourse domain, and the VBDU types correspondingly represent only the major clusters of discourse units as defined by those linguistic parameters. 
Thus, whereas move types can take into account fine-grained distinctions of communicative purpose, the VBDU types are much more general, in this case accounting for only six major groupings of linguistically distinct discourse units.
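The counts cited above (15 move types, 29 steps, 7 move types without steps, 36 discourse types in total) can be checked against the move inventory listed in Table 9.4. The sketch below is only an arithmetic check; the dictionary name is hypothetical, and the step counts are transcribed from Table 9.4.

```python
# Number of steps listed under each move type in Table 9.4
# (0 means the move type has no steps of its own).
steps_per_move = {
    1: 0, 2: 0, 3: 3,            # Introduction
    4: 3, 5: 3, 6: 0, 7: 0,      # Methods
    8: 4, 9: 0, 10: 3, 11: 5,    # Results
    12: 2, 13: 6, 14: 0, 15: 0,  # Discussion
}

move_types = len(steps_per_move)
total_steps = sum(steps_per_move.values())
moves_without_steps = sum(1 for n in steps_per_move.values() if n == 0)

# Each step counts as one discourse type, as does each move type without steps.
discourse_types = total_steps + moves_without_steps

print(move_types, total_steps, moves_without_steps, discourse_types)
```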
3. The major difference between the two analyses is due to the fact that the factor analysis in the move study extracted seven dimensions while the VBDU factor analysis extracted only four dimensions.
Despite these differences, there are important similarities in the discourse types identified by the two approaches. One way to describe those similarities is to compare the functional interpretations of the VBDU types (see Sections 6–8 of Chapter 7) to the move types identified in Chapter 4 (see Table 4.1). Table 9.2 lists some of the most obvious correspondences, where the functional basis of specific move types seems to clearly correspond to the functional interpretation of a VBDU type. Table 9.3 then lists additional correspondences that are more tentative, usually because the functional interpretation of the VBDU type is more general. As summarized in Table 9.2, several discourse types identified in the two analyses seem to have close correspondences to one another. In general, a single VBDU type corresponds to several different move types (and often to specific steps within a particular move type). This pattern makes sense given that VBDU types are more general than move types (see discussion above). In addition, a single VBDU type corresponds to moves/steps from across the extent of a research article. For example, VBDU Type 2 (Procedural description of past actions) seems to correspond to moves/steps that occur in all four major sections of research articles: the Introduction (Move 3-Step 2), Methods (Move 5-Steps 1, 2, 3; and Move 7), Results (Move 8-Step 4), and Discussion (Move 13-Step 1). Similarly, VBDU Type 5 (Presentation of the current state of knowledge) seems to correspond to moves/steps that occur in both the Introduction (Moves 1, 2) and Discussion (Moves 12, 13). In contrast, the three VBDU Types listed in Table 9.3 have more general functional interpretations, making it more difficult to identify specific corresponding move types. VBDU Type 3 is interpreted as Report of past events, but it is not entirely clear what moves/steps report past events (apart from the procedural moves that correspond to VBDU Type 2).
One likely candidate here is Move Type 4, Step 2 (Detailing the source of materials), which seems to often provide an account of how materials were obtained (a kind of past tense narration). Similarly, it is not clear which move types correspond to VBDU Type 4 (Abstract elaborated discussion) and VBDU Type 6 (Current abstract/theoretical discussion). It seems likely that many of the move types in these articles might present abstract discussion of this type (see the list in Table 9.3), but the functional descriptions of move types in Chapter 4 do not in general distinguish between abstract / theoretical discussion versus more concrete description. (Interestingly, the MD analysis of moves does reflect the distinction between abstract / conceptual discussion versus more concrete description.)
Table 9.2 Strong correspondences between the functional interpretations of VBDU Types and Move Types

VBDU Type 1: Current evaluation of implications and explanations
  Move Type 11: Commenting on results
    Step 1: Explaining results
    Step 2: Generalizing/interpreting results
    Step 3: Evaluating results
  Move Type 13: Consolidating results
    Step 4: Explaining differences in findings
    Step 5: Making overt claims/generalizations

VBDU Type 2: Procedural description of past actions and events
  Move Type 3: Introducing the present study
    Step 2: Describing procedures
  Move Type 5: Describing experimental procedures
    Step 1: Documenting established procedures
    Step 2: Detailing procedures
    Step 3: Providing the background of the procedures
  Move Type 7: Describing statistical procedures
  Move Type 8: Restating methodological issues
    Step 4: Listing procedures or methodological techniques
  Move Type 13: Consolidating results
    Step 1: Restating methodology (purposes, research questions, hypotheses, and procedures)

VBDU Type 5: Presentation of the current state of knowledge
  Move Type 1: Establishing a topic
  Move Type 2: Preparing for the present study: Indicating a gap/raising a question
  Move Type 12: Contextualizing the study
    Step 1: Describing established knowledge
    Step 2: Generalizing, claiming, deducing previous knowledge
  Move Type 13: Consolidating results
    Step 3: Referring to previous literature
Table 9.3 Tentative correspondences between the functional interpretations of VBDU Types and Move Types

VBDU Type 3: Report of past events
  Move Type 1: Establishing a topic [if this move reports what previous studies accomplished]
  Move Type 4: Describing materials
    Step 2: Detailing the source of the materials [if this Step reports where/how materials were obtained]
  Move Type 10: Announcing results [if this move reports what the authors found (in the past tense)]

VBDU Type 4: Abstract elaborated discussion (not evaluative and not procedural)
VBDU Type 6: Current abstract/theoretical discussion
  Move Type 3: Introducing the present study
    Step 1: Stating purpose(s)
    Step 3: Presenting findings
  Move Type 8: Restating methodological issues
    Step 1: Describing aims and purposes
    Step 2: Stating research questions
    Step 3: Making hypotheses
  Move Type 9: Justifying methodological issues
  Move Type 11: Commenting on results
    Step 1: Explaining results
    Step 2: Generalizing/interpreting results
    Step 4: Stating limitations
    Step 5: Summarizing
  Move Type 13: Consolidating results
    Step 4: Explaining differences in findings
    Step 5: Making overt claims/generalizations
    Step 6: Exemplifying
  Move Type 14: Stating limitations of the study
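The observation that a single VBDU type corresponds to moves from across the extent of a research article can be made concrete with a small lookup. This is a sketch only: the variable and function names are hypothetical, the move-to-section assignment follows Table 9.4, and only the strong correspondences of Table 9.2 are encoded (at the level of move types, ignoring steps).

```python
# Article section of each move type, following the grouping in Table 9.4.
SECTION_ORDER = ["Introduction", "Methods", "Results", "Discussion"]
MOVES_BY_SECTION = {
    "Introduction": range(1, 4),   # Moves 1-3
    "Methods": range(4, 8),        # Moves 4-7
    "Results": range(8, 12),       # Moves 8-11
    "Discussion": range(12, 16),   # Moves 12-15
}
SECTION_OF_MOVE = {
    move: section for section, moves in MOVES_BY_SECTION.items() for move in moves
}

# Strong correspondences from Table 9.2: VBDU type -> corresponding move types.
STRONG_CORRESPONDENCES = {
    1: [11, 13],           # Current evaluation of implications and explanations
    2: [3, 5, 7, 8, 13],   # Procedural description of past actions and events
    5: [1, 2, 12, 13],     # Presentation of the current state of knowledge
}


def sections_spanned(vbdu_type):
    """Return the article sections containing moves matched to this VBDU type."""
    found = {SECTION_OF_MOVE[move] for move in STRONG_CORRESPONDENCES[vbdu_type]}
    return [section for section in SECTION_ORDER if section in found]


for vbdu_type in sorted(STRONG_CORRESPONDENCES):
    print(f"VBDU Type {vbdu_type}: {', '.join(sections_spanned(vbdu_type))}")
```

Under these assumptions, VBDU Type 2 is matched to moves in all four sections and VBDU Type 5 to moves in the Introduction and Discussion, as stated in the discussion of Table 9.2.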
In sum, there are surprising correspondences between the move types identified in Chapter 4 and the VBDU types identified in Chapter 7, despite the major differences in the research approaches used to identify the two types of discourse units. In the present section, we matched discourse types based on their functional interpretations in the two approaches. This matching leads to certain predictions, allowing us to test the interpretive bases of the two approaches. For example, we
know that Move Type 5 (Describing experimental procedures) always occurs in Methods sections. If we are correct in claiming that VBDU Type 2 (Procedural description of past actions) corresponds to Move Type 5, then the analysis in Chapter 7 should have shown that this VBDU Type also commonly occurs in Methods sections. We consider predictions of this type in the next section.

2.4 Description of the typical discourse organization of biology research articles
In the last section, we compared the functional bases of discourse types in the move analysis of biochemistry research articles to those identified in the VBDU analysis of biology articles. We identified several correspondences, where discourse types were posited to serve similar functions in the two analyses. In the present section, we shift our attention to the question of how these discourse types are used to construct research articles. In the case of move analysis, this question is relatively straightforward: each section of an article is composed of a unique set of moves, and these are described in the order in which they typically occur. Thus, for example, Move Type 1 (Establishing a topic) usually occurs before Move Type 2 (Preparing for the present study), which in turn usually occurs before Move Type 3 (Introducing the present study); all three of these move types occur only in Introductions. Section 5 in Chapter 4 notes that there can be variation in the order of move types within an article section. In addition, moves are not necessarily continuous stretches of text. However, the order of move types given in Table 4.1 is described as the most common sequence within these articles. In contrast, VBDU types are not identified or defined in any way by reference to their position within the text. Thus, VBDUs can potentially occur in any order within a text, and a given VBDU Type can potentially occur in any section of a research article. Despite these differences, it is informative to compare the results from move analysis and VBDU analysis regarding the typical sequence of discourse types within texts. For move analysis, this is based directly on Table 4.1 (in Chapter 4), which lists the move types within each article section, in their most common order.
For VBDU analysis, this is based on Figure 7.4 (in Chapter 7), which shows the most common VBDU Type in each section, as well as Figures 7.5–7.7, which compare the preferred VBDU Types at the beginning and end of each section. Table 9.4 summarizes the most common sequential organization of these research articles from the two analytical perspectives. This comparison reinforces and further elucidates many of the functional interpretations in Chapters 4 and 7. For example, Table 9.4 (cf. Figure 7.5) shows that VBDU Type 5 (Current state of knowledge) is preferred in the initial position in article Introductions, while VBDU Type 4 (Abstract elaborated discussion) is preferred in the final position of Introductions. This finding agrees well with the results of the move analysis, which shows how the first two move types in Introductions establish a topic and indicate a gap by surveying previous research (i.e. describing the current state of knowledge) while the third move type introduces the present study, which apparently requires the use of abstract elaborated discussion.
Table 9.4 Comparing the sequential organization of research articles: Move types versus VBDU types

INTRODUCTION
  Move Types
    Move 1: Establishing a topic
    Move 2: Preparing for the present study: Indicating a gap/raising a question
    Move 3: Introducing the present study
      Step 1: Stating purpose(s)
      Step 2: Describing procedures
      Step 3: Presenting findings
  VBDU Types
    Beginning: VBDU Type 5: Current state of knowledge (also VBDU Types 3, 4, 6)
    End: VBDU Type 4: Abstract elaborated discussion (also VBDU Types 3, 5, 6)

METHODS
  Move Types
    Move 4: Describing materials
      Step 1: Listing materials
      Step 2: Detailing the source of the materials
      Step 3: Providing the background of the materials
    Move 5: Describing experimental procedures
      Step 1: Documenting established procedures
      Step 2: Detailing procedures
      Step 3: Providing the background of the procedures
    Move 6: Detailing equipment
    Move 7: Describing statistical procedures
  VBDU Types
    Beginning: VBDU Type 3: Report of past events (also VBDU Type 2)
    End: VBDU Type 2: Procedural description of past actions (also VBDU Type 3)

RESULTS
  Move Types
    Move 8: Restating methodological issues
      Step 1: Describing aims and purposes
      Step 2: Stating research questions
      Step 3: Making hypotheses
      Step 4: Listing procedures or methodological techniques
    Move 9: Justifying methodological issues
    Move 10: Announcing results
      Step 1: Reporting results
      Step 2: Substantiating results
      Step 3: Invalidating results
    Move 11: Commenting on results
      Step 1: Explaining results
      Step 2: Generalizing/interpreting results
      Step 3: Evaluating results
      Step 4: Stating limitations
      Step 5: Summarizing
  VBDU Types
    Most common VBDU Type: VBDU Type 3: Report of past events

DISCUSSION
  Move Types
    Move 12: Contextualizing the study
      Step 1: Describing established knowledge
      Step 2: Generalizing, claiming, deducing previous knowledge
    Move 13: Consolidating results
      Step 1: Restating methodology (purposes, research questions, hypotheses, and procedures)
      Step 2: Stating selected findings
      Step 3: Referring to previous literature
      Step 4: Explaining differences in findings
      Step 5: Making overt claims/generalizations
      Step 6: Exemplifying
    Move 14: Stating limitations of the study
    Move 15: Suggesting further research
  VBDU Types
    Beginning: VBDU Type 5: Current state of knowledge (also VBDU Types 4, 6)
    End: VBDU Type 6: Current abstract discussion (VBDU Type 1: Current evaluation of explanations) (also VBDU Types 4, 5)
A second example comes from Methods sections, which have a strong preference for VBDU Type 3 (Report of past events) at the beginning, and VBDU Type 2 (Procedural description of past actions) at the end (see Figure 7.6). This pattern agrees with the sequence of move types identified in Methods sections, with procedural descriptions (experimental and statistical) coming after the description of materials (see Table 9.4). That is, VBDU Type 2 (Procedural description) seems to clearly correspond functionally to Move Types 5 and 7 (Experimental procedures and Statistical procedures).
As noted in Section 2.3 above, it is less clear from the functional interpretations how VBDU Type 3 (Report of past events) corresponds to Move Type 4 (Describing materials). However, the sequential comparison summarized in Table 9.4 indicates that these two do have a strong correspondence: the same discourse units occurring at the beginning of Methods sections describe materials from a move-analysis perspective, and they report past events from a VBDU perspective. Findings like this should lead to useful future research, enhancing both the move descriptions and the VBDU descriptions. In the present case, this correspondence indicates that the description of materials in Methods sections often utilizes a narrative mode of discourse, reporting the past actions and events used to obtain experimental materials, rather than a simple descriptive itemization of materials. The preferred sequence of VBDU Type 3 (Report of past events) before VBDU Type 2 (Procedural description) also raises an additional possibility when compared to the move analysis: that there might be two different linguistic styles for Move Type 5 (Describing experimental procedures). That is, it might be the case that experimental procedures are introduced using simple past tense report (i.e. VBDU Type 3), followed by a more detailed passive voice description of the actual actions performed to carry out the experiment (VBDU Type 2).
The two analytical perspectives are also in general agreement in their description of Discussion sections. VBDU Type 5 (Current state of knowledge) is preferred at the beginning of Discussion sections, which corresponds closely to Move Type 12 (Describing established knowledge and Generalizing ... previous knowledge). VBDU Type 6 (Current abstract discussion) is preferred at the end of Discussion sections, which seems to correspond to Move 13, Steps 4–5 (Explaining differences in findings and Making overt claims/generalizations).
(VBDU Type 1, Current evaluation of explanations, is also strongly preferred at the end of Discussion sections when considered in relative terms.)
At the same time, there are several aspects of the comparison between these two analyses that are problematic, raising useful questions that can help guide future research. The main reason for the discrepancies has to do with the different analytical bases of the two approaches: move types are functional distinctions while VBDU types are linguistic distinctions. Thus, because the two approaches have complementary strengths, a more detailed comparison of the resulting analyses should greatly facilitate our understanding of the typical discourse structure of the target genre.
For example, while Table 9.4 shows that the preferred order of VBDU types in Introductions agrees well with the preferred sequence of move types, other sequences of VBDU types in Introductions are also possible (see Figure 7.5), with VBDU Types 3, 4, 5, and 6 all being relatively common in both the initial and final
position of Introductions. Because VBDUs have a linguistic basis, this finding shows that there is considerable variability in the linguistic styles of the initial and final discourse units in research article introductions. In contrast, the order of move types seems to be relatively fixed in article introductions: Move 1 (Establishing a territory), followed by Move 2 (Establishing a niche), followed by Move 3 (Occupying a niche). That is, the functional progression of article introductions seems to be relatively invariant.
These findings might be reconciled in two ways. First, it is possible that Move Types 1, 2, and 3 do not always occur in this order. This possibility is discussed in Chapter 4, which notes that the sequence of move types is not fixed, and that moves are not necessarily continuous stretches of text. However, previous studies of research article introductions all generally agree that the move types in RA introductions usually occur in a relatively fixed order. This raises the possibility of a second explanation for the two patterns: that a single move type can be realized by multiple linguistic styles. Future research on this possibility might eventually lead to the identification of distinct linguistic subtypes for a given move type. That is, move types are defined on the basis of their communicative function, but the present comparison indicates that there could be fairly extensive linguistic variability in the realization of those functions. More detailed future research on the linguistic styles used to realize a single move type should help to clarify the variability in this article section.
In contrast, we find the opposite pattern for Results sections: Table 9.4 shows that this section contains 13 distinct moves or steps, but these are all usually realized linguistically as the single VBDU Type 3 (Report of past events).
In this case, the move analysis indicates extensive functional variability within this article section, but the VBDU analysis indicates only a single linguistic style of expression. Here again, more detailed comparison of the linguistic characteristics of individual move types should help us to clarify the relationship between function and linguistic expression within this article section. In sum, the comparison here shows that the overall patterns of discourse organization documented by move analysis are generally compatible with those documented by VBDU analysis. At the same time, each approach identifies particular patterns of variation that help to clarify and extend our understanding of discourse resulting from the complementary approach. And most interestingly, there are some apparent discrepancies between the two approaches, which do not have ready explanations. We hope that future research focusing on these areas of apparent discrepancy will contribute new knowledge to our understanding of the discourse of the target genre.
3 Summary and prospects for future research
The present work is one of the first book-length explorations of how corpus-based methods can be applied to the description of discourse organization. As such, we have only been able to scratch the surface of this important research area. Numerous avenues for future research are obvious: applying the methods described here to additional genres; developing these research methods further; and most importantly, developing additional methodological approaches to these research issues.
In our chapters, we have applied top-down and bottom-up analyses to texts from two written genres: one genre from academic discourse (research articles) and another genre from professional discourse (fund-raising letters). These same approaches could be applied to other important written genres, such as newspaper editorials (Pak & Acevedo, Forthcoming; Wang, Forthcoming), grant proposals (Connor & Mauranen, 1999; Feng, Forthcoming), book reviews (Suarez & Moreno, Forthcoming), business letters (Loukianenko, Forthcoming), and websites (McBride, Forthcoming). Previous studies of these genres have used top-down analyses to understand discourse structures such as moves and topic-solution top-level structures. Combining such analyses with bottom-up analyses should result in more linguistically based descriptions of these genres. Such an approach will provide useful information that could lead into (semi-)automated top-down analyses, which will enable the use of larger corpora.
The analyses conducted for the chapters in this book have viewed texts as mono-modal paper-based entities. Yet, technology today increasingly develops texts into multi-modal forms that rely on visuals and digital media. In our study, pictures, headings, white space and other such multimodal textual features were not coded.
Increasingly, however, the multimodality of texts has been found to be important in the comprehension as well as production of texts (Kress & van Leeuwen, 1990, 2001; Ventola, Charles, & Kaltenbacher, 2004). We need to continue working on ways to incorporate the multimodal properties of texts in our corpus design and analysis, as well as build ways to analyze digital texts and hypertext, which often rely on different logics of organization.
Another consideration for corpus-based descriptions of discourse is the broader contexts around texts. Recent methods of text analysis go beyond texts and include observations of writers and readers, interviews with writers and focus groups as well as ethnographic inquiries (Bazerman & Prior, 2004). Some corpus-based research has also been complemented by more detailed contextual analysis, such as oral interviews. For example, Hyland (1998) used specialist informants in the study of hedging devices in a corpus of 80 research articles. Connor (2000) interviewed five experienced academic grant proposal writers to validate the move analysis system used to analyze a corpus of grant proposals. Future corpus research of discourse should more systematically incorporate such descriptions of context obtained through surveys, interviews, observations, and ethnographies.
Thus, we hope that the present work will be the starting point for a new area of research. For discourse analysts, we hope to have shown that there are systematic, generalizable patterns of structure and organization across the texts of a genre, complementing other perspectives such as the detailed description of an individual text and more abstract socio-cultural descriptions of a genre. And for corpus linguists, we hope to have shown that discourse can be studied within particular texts but generalized across a large collection of texts, complementing studies of language use that use a corpus merely as a large body of linguistic forms in context. In a sense, we are advocating a return to the research interests of previous decades, when texts were often analyzed for their internal structure and organization. What we have added here is the corpus perspective, showing how these research goals can be investigated across multiple texts, resulting in generalizable descriptions of discourse organization for a target genre.
Appendix 1
A brief introduction to multi-dimensional analysis

Sections A.1–A.3 of the following appendix are adapted from Chapters 1–2 of Variation in English: Multi-Dimensional Studies, edited by Susan Conrad and Douglas Biber, published by Longman (2001).
A.1 Conceptual introduction to the multi-dimensional approach to variation
Multi-dimensional (MD) analysis was developed as a methodological approach to: (1) identify the salient linguistic co-occurrence patterns in a language, in empirical/quantitative terms; and (2) compare spoken and written genres/registers in the linguistic space defined by those co-occurrence patterns. The approach was first used in Biber (1985, 1986) and then developed more fully in Biber (1988). The salient characteristics of the MD approach are listed below:
The research goal of the approach is the linguistic analysis of texts, genres/registers, and text types, rather than analysis of individual linguistic constructions.
The importance of variationist and comparative perspectives is assumed by the approach. That is, the approach is based on the assumption that different kinds of text differ linguistically and functionally so that analysis of any one or two text varieties is not adequate for conclusions concerning a discourse domain. For example, considering only academic prose and fiction would not give an accurate representation of writing; rather, many other written varieties, such as newspaper reports, editorials, personal letters, etc., also would need to be included.
The approach is explicitly multi-dimensional. That is, it is assumed that multiple parameters of variation will operate in any discourse domain.
The approach is empirical and quantitative. Analyses are based on frequency counts of linguistic features, describing the relative distributions of features across texts. The linguistic co-occurrence patterns that define each dimension are identified empirically using multivariate statistical techniques.
The approach synthesizes quantitative and qualitative/functional methodological techniques. That is, the statistical analyses are interpreted in functional terms, to determine the underlying communicative functions associated with each distributional pattern. The approach is based on the assumption that statistical co-occurrence patterns reflect underlying shared communicative functions.
The notion of linguistic co-occurrence has been given formal status in the MD approach, in that different co-occurrence patterns are analyzed as underlying dimensions of variation. The co-occurrence patterns comprising each dimension are identified quantitatively. That is, based on the actual distributions of linguistic features in a large corpus of texts, statistical techniques (specifically factor analysis) are used to identify the sets of linguistic features that frequently co-occur in texts. It is not the case, though, that quantitative techniques are sufficient in themselves for MD analyses of genre/register variation. Rather, qualitative techniques are required to interpret the functional bases underlying each set of co-occurring linguistic features.
The dimensions of variation have both linguistic and functional content. The linguistic content of a dimension comprises a group of linguistic features (e.g., nominalizations, prepositional phrases, attributive adjectives) that co-occur with a high frequency in texts. Based on the assumption that co-occurrence reflects shared function, these co-occurrence patterns are interpreted in terms of the situational, social, and cognitive functions most widely shared by the linguistic features. That is, linguistic features co-occur in texts because they reflect shared functions. A simple example is the way in which first and second person pronouns, direct questions, and imperatives are all related to interactiveness.
Contractions, false starts, and generalized content words (e.g., thing) are all related to the constraints imposed by real-time production. The functional bases of other co-occurrence patterns are less transparent, so that careful qualitative analyses of particular texts are required to help interpret the underlying functions.
A.2 Overview of methodology in the multi-dimensional approach
All MD analyses, such as those in Chapters 4, 7, and 8 of the present book, follow the same methodological steps. These steps are summarized in Table A.1, while the following paragraphs discuss each step in greater detail.
Table A.1 The eight methodological steps of a complete multi-dimensional analysis

1. An appropriate corpus is designed based on previous research and analysis. Texts are collected, transcribed (in the case of spoken texts), and input into the computer. The situational characteristics of each spoken and written register are noted (e.g., purposes of the register, production circumstances, and other characteristics discussed in Chapter 1).
2. Research is conducted to identify the linguistic features to be included in the analysis, together with functional associations of the linguistic features.
3. Computer programs are developed for automated grammatical analysis, to identify or tag all relevant linguistic features in texts.
4. The entire corpus of texts is tagged automatically by computer, and all texts are edited interactively to insure that the linguistic features are accurately identified.
5. Additional computer programs are developed and run to compute frequency counts of each linguistic feature in each text of the corpus.
6. The co-occurrence patterns among linguistic features are analyzed, using a factor analysis of the frequency counts.
7. The factors from the factor analysis are interpreted functionally as underlying dimensions of variation.
8. Dimension scores for each text with respect to each dimension are computed; the mean dimension scores for each register are then compared to analyze the salient linguistic similarities and differences among the registers being studied.
MD analyses can be conducted to study many different varieties of language from the full range of spoken/written genres/registers in a language to a specific subgenre. The first requirement for any MD analysis, therefore, is to compile a text corpus that represents the variety being studied. Texts must be sampled from all genres/registers included in the target discourse domain (see Section 3 in Chapter 1). The corpora used in the studies described in Chapters 4, 7, and 8 are all examples of the kind of corpus needed to conduct an MD analysis. A second preliminary task in MD analysis is to identify the linguistic features to be used in the analysis. The goal here is to be as inclusive as possible, identifying all linguistic features (including lexical classes, grammatical categories, and syntactic constructions) that might have functional associations. Thus, any feature associated with particular communicative functions, or used to differing extents in different text varieties, is included. Occurrences of these features are counted in each text of the corpus, providing the basis for the subsequent statistical analyses. Computer programs are usually used to tag the words in corpus texts for various lexical, grammatical, and syntactic categories, and to compile frequency counts of linguistic features. The tagger used in previous MD studies (developed by Biber) marks the word classes and syntactic information required to automatically identify the linguistic features listed in the last section. Biber (1988, Appendix II; 1993) provides a description of an early version of the tagging program. Biber, Conrad,
and Reppen (1998, Methodology Boxes 4 and 5) provide a general description of tagging programs and the process of tagging. In recent years, this tagging program has been extended as part of the research for the Longman Grammar of Spoken and Written English (Biber et al., 1999). The full list of linguistic features included in the MD studies for the present book is given in Appendix 2.
After linguistic features have been tagged, additional computer programs tally frequency counts of each feature in each text. These counts are normalized to a common basis, to enable comparison across the texts. Counts are normed (e.g., to their rate of occurrence per 1,000 words of text) before conducting statistical analyses. (The procedure for normalization is further described in Biber, 1988, pp. 75–76, and in Biber, Conrad, & Reppen, 1998, Methodology Box 6.)
As described in Section A.1 above, co-occurrence patterns are central to MD analyses because each dimension represents a different set of co-occurring linguistic features. The statistical technique used for identifying these co-occurrence patterns is known as factor analysis, and each set of co-occurring features is referred to as a factor. In a factor analysis, a large number of original variables (in this case the linguistic features) are reduced to a small set of derived, underlying variables: the factors. When considering a set of linguistic features, each having its own variance, it is possible to analyze the pool of shared variance, that is, the extent to which the features vary in similar ways. Shared variance is directly related to co-occurrence. If two features tend to be frequent in some texts and rare in other texts, then they co-occur and have a high amount of shared variance.
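The normalization step and the notion of shared variance described above can be illustrated with a minimal Python sketch. The function names, the feature counts, and the text lengths are all hypothetical, and the squared Pearson correlation is used here only as a simple two-feature stand-in for shared variance; it is not the factor analysis itself.

```python
from math import sqrt


def rate_per_thousand(raw_count, text_length, basis=1000):
    """Norm a raw feature count to its rate of occurrence per `basis` words."""
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    return raw_count * basis / text_length


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of rates."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical raw counts: two texts of different lengths.
rate_a = rate_per_thousand(raw_count=100, text_length=2500)
rate_b = rate_per_thousand(raw_count=40, text_length=800)

# Hypothetical normed rates of two features across four texts.
feature_x = [10, 20, 30, 40]
feature_y = [5, 9, 16, 20]
r = pearson_r(feature_x, feature_y)
shared_variance = r ** 2  # proportion of variance the two features share
```

Although the first text has more raw occurrences, its normed rate is lower than that of the second text, which is why counts are normed before any comparison. The two features in the second part rise and fall together across the four texts, so their shared variance is close to 1.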
Factor analysis attempts to account for the shared variance among features by extracting multiple factors, where each factor represents the maximum amount of shared variance that can be accounted for out of the pool of variance remaining at that point. Thus, the second factor extracts the maximum amount of shared variance from the variability left over after the first factor has been extracted, and so on.
Each linguistic feature has some relation to each factor, and the strength of that relation is represented by factor loadings. (The factor loading represents the amount of variance that a feature has in common with the total pool of shared variance accounted for by a factor.) Factor loadings can range in absolute value from 0.0, which shows the absence of any relationship, to 1.0, which shows a perfect correlation. The factor loading indicates the extent to which one can generalize from a factor to a particular linguistic feature, or the extent to which a linguistic feature is representative of the dimension underlying a factor. Put another way, the size of the loading reflects the strength of the co-occurrence relationship between the feature in question and the total grouping of co-occurring features represented by the factor.
Each linguistic feature has a loading (or weight) on each factor. However, when interpreting a factor, only features with salient or important loadings are considered. In most MD analyses, features with loadings smaller than .30 are disregarded as unimportant for the interpretation of a factor. The positive or negative sign of a loading is not related to its importance; the sign instead identifies two groupings of features that occur in a complementary pattern as part of the same factor. That is, when the features with positive loadings occur together frequently in a text, the features with negative loadings are markedly less frequent in that text, and vice versa. Table 4.4 in Chapter 4 is an example of a factor analysis with both positive and negative loadings of multiple linguistic features grouped around seven different factors.
Factor interpretations depend on the assumption that linguistic co-occurrence patterns reflect underlying communicative functions. That is, particular sets of linguistic features co-occur frequently in texts because they serve related communicative functions. In the interpretation of a factor, it is important to consider the likely reasons for the complementary distribution between positive and negative feature sets as well as the reasons for the co-occurrence patterns within those sets. The interpretation of a factor as a functional dimension is based on (1) analysis of the communicative function(s) most widely shared by the set of co-occurring features defining a factor, and (2) analysis of the similarities and differences among registers with respect to the factor.
In order to determine the distribution of registers along a dimension, we compute dimension scores for each text and then compare texts and registers with respect to those scores. The frequency counts of individual linguistic features might be considered as scores that can be used to characterize texts (e.g., a noun score, an adjective score, etc.). In a similar way, dimension scores (or factor scores) can be computed for each text by summing the frequencies of the features having salient loadings on that dimension.
For example, the Dimension 1 score for each text in the Biber (1988) MD analysis was computed by adding together the frequencies of private verbs, that-deletions, contractions, present tense verbs, etc. (the features with positive loadings on Factor 1, from Table 5) and then subtracting the frequencies of nouns, word length, prepositions, etc. (the features with negative loadings).
In MD studies, frequencies are standardized to a mean of 0.0 and a standard deviation of 1.0 before the dimension scores are computed. This process translates the scores for all features to scales representing standard deviation units. Thus, regardless of whether a feature is extremely rare or extremely common in absolute terms, a standard score of +1 represents one standard deviation unit above the mean score for the feature in question. That is, standardized scores measure whether a feature is common or rare in a text relative to the overall average occurrence of that feature. The raw frequencies are transformed to standard scores so that all features on a factor will have equivalent weights in the computation of dimension scores. If this process were not followed, extremely common features would have a much greater influence than rare features on the dimension scores.
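The computation of dimension scores described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure used for the analyses in this book: the loadings and the normed rates are invented, the function names are hypothetical, and the sample standard deviation is assumed for standardization.

```python
from statistics import mean, stdev

SALIENT_CUTOFF = 0.30  # loadings smaller than this (in absolute value) are ignored


def salient_features(loadings, cutoff=SALIENT_CUTOFF):
    """Split features into salient positive and salient negative sets."""
    positive = [f for f, w in loadings.items() if w >= cutoff]
    negative = [f for f, w in loadings.items() if w <= -cutoff]
    return positive, negative


def standardize(rates_by_text):
    """Convert each feature's rates to z-scores across all texts.

    Assumes every feature varies across texts (non-zero standard deviation).
    """
    features = list(next(iter(rates_by_text.values())))
    z_scores = {text: {} for text in rates_by_text}
    for feature in features:
        values = [rates_by_text[text][feature] for text in rates_by_text]
        m, s = mean(values), stdev(values)
        for text in rates_by_text:
            z_scores[text][feature] = (rates_by_text[text][feature] - m) / s
    return z_scores


def dimension_scores(rates_by_text, loadings):
    """Sum z-scores of positive features, subtract z-scores of negative ones."""
    positive, negative = salient_features(loadings)
    z = standardize(rates_by_text)
    return {
        text: sum(z[text][f] for f in positive) - sum(z[text][f] for f in negative)
        for text in z
    }


# Invented loadings on one factor (not taken from any published table).
loadings = {"private_verbs": 0.9, "contractions": 0.8,
            "nouns": -0.7, "prepositions": -0.5, "time_adverbials": 0.1}

# Invented rates per 1,000 words for three hypothetical texts.
rates = {
    "conversation": {"private_verbs": 30, "contractions": 40, "nouns": 150,
                     "prepositions": 80, "time_adverbials": 6},
    "letter": {"private_verbs": 20, "contractions": 20, "nouns": 200,
               "prepositions": 100, "time_adverbials": 5},
    "article": {"private_verbs": 10, "contractions": 0, "nouns": 250,
                "prepositions": 120, "time_adverbials": 4},
}

scores = dimension_scores(rates, loadings)
```

Because every feature is standardized first, the very frequent nouns count no more heavily than the much rarer private verbs. The feature time_adverbials has a loading below the cutoff and so plays no part in the score.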
The methodological steps followed to standardize frequency counts and compute dimension scores are described more fully in Biber (1988, pp. 93–97).
Once a dimension score is computed for each text, the mean dimension score for each register can be computed. Plots of these dimension scores then allow linguistic characterization of any given register, comparison of the relations between any two registers, and a fuller functional interpretation of the underlying dimension; standard statistical techniques (such as ANOVA and post-hoc tests like Duncan or Scheffe) can be used to determine whether the differences among mean scores are statistically significant. For example, Figure 7.1 in Chapter 7 uses mean dimension scores to plot the difference in the use of linguistic features between introduction, methods, results and discussion sections of research articles.
The paragraphs above provide a brief introduction to the analytical techniques used in MD analysis. However, much more could be said about the technical aspects of MD methodology, including such matters as rotation techniques in the factor analysis; the specific procedures required to compute and interpret factors; the reliability, validity, and significance of dimensions; and representativeness and sampling in corpus design. Interested readers are referred to Biber (1990, 1993b, 1993c, 1995), Biber, Conrad, and Reppen (1998), and Biber, Conrad, Reppen, Byrd, and Helt (2003).
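The comparison of registers by mean dimension score, mentioned in this section, can be illustrated with a short Python sketch. The dimension scores below are invented, the function names are hypothetical, and only the F statistic of a one-way ANOVA is computed by hand; obtaining a p-value or running post-hoc tests would in practice be left to a statistics package.

```python
from statistics import mean


def register_means(scores_by_register):
    """Mean dimension score for each register."""
    return {reg: mean(scores) for reg, scores in scores_by_register.items()}


def one_way_anova_f(scores_by_register):
    """F statistic: between-register variance over within-register variance."""
    groups = list(scores_by_register.values())
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    # Variation of the register means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual texts around their own register mean.
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Invented dimension scores for three texts in each of three sections.
scores_by_register = {
    "introduction": [2.0, 4.0, 3.0],
    "methods": [-3.0, -1.0, -2.0],
    "results": [0.0, 1.0, -1.0],
}

means = register_means(scores_by_register)
f_statistic = one_way_anova_f(scores_by_register)
```

A large F value indicates that the register means differ by more than would be expected from the variation among texts within each register.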
Appendix 2
Grammatical and lexico-grammatical features included in the multi-dimensional analyses

The following list identifies the major grammatical and lexico-grammatical features identified by the Biber tagger, used for the MD analyses in Chapters 4, 7, and 8.
1. Pronouns and pro-verbs
   first person pronouns
   second person pronouns
   third person pronouns (excluding it)
   pronoun it
   demonstrative pronouns (this, that, these, those as pronouns)
   indefinite pronouns (e.g., anybody, nothing, someone)
   pro-verb do
2. Reduced forms and dispreferred structures
   contractions
   complementizer that deletion (e.g., I think [0] he went)
   stranded prepositions (e.g., the candidate that I was thinking of)
   split auxiliaries (e.g., they were apparently shown to ...)
3. Prepositional phrases
4. Coordination
   phrasal coordination (NOUN and NOUN; ADJ and ADJ; VERB and VERB; ADV and ADV)
   independent clause coordination (clause initial and)
5. WH-Questions
6. Lexical specificity
   type/token ratio
   word length
7. Nouns
   nominalizations (ending in -tion, -ment, -ness, -ity)
   nouns
   7a. Semantic categories of nouns
       animate noun (e.g., teacher, child, person)
       cognitive noun (e.g., fact, knowledge, understanding)
       concrete noun (e.g., rain, sediment, modem)
       technical/concrete noun (e.g., cell, wave, electron)
       quantity noun (e.g., date, energy, minute)
       place noun (e.g., habitat, room, ocean)
       group/institution noun (e.g., committee, bank, congress)
       abstract/process nouns (e.g., application, meeting, balance)
8. Verbs
   8a. Tense and aspect markers
       past tense
       perfect aspect verbs
       non-past tense
   8b. Passives
       agentless passives
       by passives
   8c. Modals
       possibility/permission/ability modals (can, may, might, could)
       necessity/obligation modals (ought, must, should)
       predictive/volition modals (will, would, shall)
   8d. Semantic categories of verbs
       be as main verb
       activity verb (e.g., smile, bring, open)
       communication verb (e.g., suggest, declare, tell)
       mental verb (e.g., know, think, believe)
       causative verb (e.g., let, assist, permit)
       occurrence verb (e.g., increase, grow, become)
       existence verb (e.g., possess, reveal, include)
       aspectual verb (e.g., keep, begin, continue)
   8e. Phrasal verbs
       intransitive activity phrasal verb (e.g., come on, sit down)
       transitive activity phrasal verb (e.g., carry out, set up)
       transitive mental phrasal verb (e.g., find out, give up)
       transitive communication phrasal verb (e.g., point out)
       intransitive occurrence phrasal verb (e.g., come off, run out)
       copular phrasal verb (e.g., turn out)
       aspectual phrasal verb (e.g., go on)
9. Adjectives
   attributive adjectives
   predicative adjectives
   9a. Semantic categories of adjectives
       size attributive adjectives (e.g., big, high, long)
       time attributive adjectives (e.g., new, young, old)
       color attributive adjectives (e.g., white, red, dark)
       evaluative attributive adjectives (e.g., important, best, simple)
       relational attributive adjectives (e.g., general, total, various)
       topical attributive adjectives (e.g., political, economic, physical)
10. Adverbs and adverbials
   place adverbials
   time adverbials
   10a. Adverb classes
       conjuncts (e.g., consequently, furthermore, however)
       downtoners (e.g., barely, nearly, slightly)
       hedges (e.g., at about, something like, almost)
       amplifiers (e.g., absolutely, extremely, perfectly)
       emphatics (e.g., a lot, for sure, really)
       discourse particles (e.g., sentence initial well, now, anyway)
       other adverbs
   10b. Semantic categories of stance adverbs
       non-factive adverbs (e.g., frankly, mainly, truthfully)
       attitudinal adverbs (e.g., surprisingly, hopefully, wisely)
       factive adverbs (e.g., undoubtedly, obviously, certainly)
       likelihood adverbs (e.g., evidently, predictably, roughly)
11. Adverbial subordination
   causative adverbial subordinator (because)
   conditional adverbial subordinator (if, unless)
   other adverbial subordinator (e.g., since, while, whereas)
12. Nominal post-modifying clauses
   that relatives (e.g., the dog that bit me, the dog that I saw)
   WH relatives on object position (e.g., the man who Sally likes)
   WH relatives on subject position (e.g., the man who likes popcorn)
   WH relatives with fronted preposition (e.g., the manner in which he was told)
   past participial postnominal (reduced relative) clauses (e.g., the solution produced by this process)
13. That complement clauses
   13a. That clauses controlled by a verb (e.g., we predict that the water is here)
       non-factive verb (e.g., imply, report, suggest)
       attitudinal verb (e.g., anticipate, expect, prefer)
       factive verb (e.g., demonstrate, realize, show)
       likelihood verb (e.g., appear, hypothesize, predict)
   13b. That clauses controlled by an adjective (e.g., it is strange that he went there)
       attitudinal adjectives (e.g., good, advisable, paradoxical)
       likelihood adjectives (e.g., possible, likely, unlikely)
   13c. That clauses controlled by a noun (e.g., the proposal that he put forward was accepted)
       non-factive noun (e.g., comment, proposal, remark)
       attitudinal noun (e.g., hope, reason, view)
       factive noun (e.g., assertion, observation, statement)
       likelihood noun (e.g., assumption, implication, opinion)
14. WH-clauses
15. To-clauses
   15a. To-clauses controlled by a verb (e.g., He offered to stay)
       speech act verb (e.g., urge, report, convince)
       cognition verb (e.g., believe, learn, pretend)
       desire/intent/decision verb (e.g., aim, hope, prefer)
       modality/cause/effort verb (e.g., allow, leave, order)
       probability/simple fact verb (e.g., appear, happen, seem)
   15b. To-clauses controlled by an adjective
       certainty adjectives (e.g., prone, due, apt)
       ability/willingness adjectives (e.g., competent, hesitant)
       personal affect adjectives (e.g., annoyed, nervous)
       ease/difficulty adjectives (e.g., easy, impossible)
       evaluative adjectives (e.g., convenient, smart)
   15c. To-clauses controlled by a noun (e.g., agreement, authority, intention)
References
Abelen, E., Redecker, G., & Thompson, S. (1993). The rhetorical structure of US-American and Dutch fund-raising letters. Text, 13(3), 323–350.
Aijmer, K. (2002). English Discourse Particles: Evidence from a Corpus. Amsterdam: John Benjamins.
Anthony, L. (1999). Writing research article introductions in software engineering: How accurate is a standard model? IEEE Transactions on Professional Communication, 42, 38–46.
Aristotle. (1932). The Rhetoric of Aristotle (L. D. Cooper, Trans.). New York NY: Appleton and Company.
Aristotle. (1984). Rhetoric. In J. Barnes (ed.), The Complete Works of Aristotle (rev. Oxford ed., Vol. 2, pp. 2152–2269). Princeton NJ: Princeton University Press.
Arnold, C. (1982). Introduction (W. Kluback, Trans.). In C. Perelman (ed.), The Realm of Rhetoric. Notre Dame IN: University of Notre Dame Press.
Baker, P. (2006). Using Corpora in Discourse Analysis. London: Continuum.
Barton, E. (1993). Evidentials, argumentation, and epistemological stance. College English, 55, 745–769.
Bateman, J., & Rondhuis, K. J. (1997). Coherence relations: Towards a general specification. Discourse Processes, 24(1), 3–50.
Bazerman, C. (1988). Shaping Written Knowledge: The Genre and Activity of the Experimental Article in Science. Madison WI: University of Wisconsin Press.
Bazerman, C. (1994). Systems of genres and the enactment of social intentions. In A. Freedman & P. Medway (eds.), Genre and the New Rhetoric (pp. 79–104). London: Taylor & Francis.
Bazerman, C. (1997a). Some information comments on texts mediating fund-raising relationships: Cultural sites of affiliation. In Written Discourse in Philanthropic Fund Raising. Issues of Language and Rhetoric. In Working Papers, 9813 (pp. 17–26).
Bazerman, C. (1997b). The life of genre, the life in the classroom. In W. Bishop & H. Ostrum (eds.), Genre and Writing (pp. 19–26). Portsmouth NH: Boynton/Cook.
Bazerman, C., & Prior, P. (eds.). (2004). What Writing Does and How it Does it: An Introduction to Analyzing Texts and Textual Practices. Mahwah NJ: Lawrence Erlbaum Associates.
Beach, R., & Anson, C. (1992). Stance and intertextuality in written discourse. Linguistics and Education, 4, 335–357.
Berkenkotter, C., & Huckin, T. (1995). Genre Knowledge in Disciplinary Communication: Cognition/Culture/Power. Hillsdale NJ: Lawrence Erlbaum.
Bhatia, V. (1993a). Analyzing Genre: Language Use in Professional Settings. London: Longman.
Bhatia, V. (1993b). Simplification vs. easification: The case of legal texts. Applied Linguistics, 4(1), 42–54.
Bhatia, V. (1997a). Applied genre analysis and ESP. In T. Miller (ed.), Functional Approaches to Written Texts: Classroom Applications (pp. 134–149). Washington, DC: USIA.
Bhatia, V. (1997b). Discourse of philanthropic fund-raising. Paper presented at the Written discourse in philanthropic fund raising. Issues of language and rhetoric, Indianapolis IN.
Bhatia, V. (1998). Generic patterns in fundraising discourse. New Directions for Philanthropic Fundraising, 22, 95–110.
Bhatia, V. (2002). A generic view of academic discourse. In J. Flowerdew (ed.), Academic Discourse (pp. 21–39). New York NY: Longman.
Bhatia, V. (2004). Worlds of Written Discourse: A Genre-based View. New York NY: Continuum.
Biber, D. (1986). Spoken and written textual dimensions in English: Resolving the contradictory findings. Language, 62, 384–414.
Biber, D. (1988). Variation Across Speech and Writing. Cambridge: CUP.
Biber, D. (1989). A Typology of English texts. Linguistics, 27, 3–43.
Biber, D. (1990). Methodological issues regarding corpus-based analyses of linguistic variation. Literary and Linguistic Computing, 5, 257–269.
Biber, D. (1992). Using computer-based text corpora to analyze the referential strategies of spoken and written texts. In J. Svartvik (ed.), Directions in Corpus Linguistics: Proceedings of Nobel Symposium 82, Stockholm, 4–8 August 1991 (pp. 213–252). Berlin: Mouton.
Biber, D. (1993a). Representativeness in corpus design. Literary and Linguistic Computing, 8, 1–15.
Biber, D. (1993b). Using register-diversified corpora for general language studies. Computational Linguistics, 19, 219–241.
Biber, D. (1993c). The multi-dimensional approach to linguistic analyses of genre variation: An overview of methodology and findings. Computers and the Humanities, 26, 331–345.
Biber, D. (1995). Dimensions of Register Variation: A Cross-linguistic Comparison. Cambridge: CUP.
Biber, D. (2003). Variation among spoken and written registers: A new multi-dimensional analysis. In P. Leistyna & C. Meyer (eds.), Corpus Analysis: Language Structure and Language Use. Amsterdam: Rodopi.
Biber, D. (2004).
Historical patterns for the grammatical marking of stance: A cross-register comparison. Journal of Historical Pragmatics, 5, 107135. Biber, D. (2006a). Stance in spoken and written university registers. Journal of English for Academic Purposes, 5, 97116. Biber, D. (2006b). University Language: A Corpus-based Study of Spoken and Written Registers. Amsterdam: John Benjamins. Biber, D., Conrad, S., & Reppen, R. (1998). Corpus Linguistics: Investigating Language Structure and Use. Cambridge: CUP. Biber, D., Conrad, S., Reppen, R., Byrd, P., & Helt, M. (2002). Speaking and writing in the university: A multidimensional analysis. TESOL Quarterly, 36, 948. Biber, D., Conrad, S., Reppen, R., Byrd, P., & Helt, M. (2003). Strengths and goals of multidimensional analysis: A response to Ghadessy. TESOL Quarterly, 37, 151155. Biber, D., Csomay, E., Jones, J., & Keck, C. (2004). A corpus linguistic investigation of vocabulary-based discourse units in university registers. In U. Connor & T. Upton (eds.), Applied Corpus Linguistics: A Multi-dimensional Perspective (pp. 5372). Amsterdam: Rodopi. Biber, D., & Finegan, E. (1988). Adverbial stance types in English. Discourse Processes, 11, 134. Biber, D., & Finegan, E. (1989). Styles of stance in English: lexical and grammatical marking of evidentiality and affect. The Politics of Language Purism, 250.
References Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman Grammar of Spoken and Written English. London: Pearson Education. Brett, P. (1994). A genre analysis of the results section of sociology articles. English for Specific Purposes, 13(1), 4759. Brown, G., & Yule, G. (1983). Discourse Analysis. Cambridge: CUP. Bruthiaux, P. (1994). Me Tarzan, you Jane: Linguistic simplification in personal ads register. In D. Biber & E. Finegan (eds.), Sociolinguistic Perspectives on Register (pp. 136154). New York NY: OUP. Bruthiaux, P. (1996). The Discourse of Classified Advertising: Exploring the Nature of Linguistic Simplicity. New York NY: OUP. Bunton, D. (2002). Generic moves in Ph.D. thesis introductions. In J. Flowerdew (ed.), Academic Discourse (pp. 5775). London: Pearson Education. Callow, K., & Callow, J. (1992). Text as purposive communication: A meaning-based analysis. In W. Mann & S. Thompson (eds.), Discourse Description: Diverse Linguistic Analyses of a Fund-raising Text (pp. 537). Amsterdam: John Benjamins. Capozzoli, M., McSweeney, L., & Sinha, D. (1999). Beyond kappa: A review of interrater agreement measures. The Canadian Journal of Statistics, 27(1), 323. Cazden, C. (1986). Language in the classroom. In R. Kaplan (ed.), Annual Review of Applied Linguistics (Vol. 7). Rowley MA: Newbury House. Chafe, W. (1986). Evidentiality in English conversation and academic writing. In W. Chafe & J. Nichols (eds.), Evidentiality: The Linguistic Coding of Epistemology (pp. 261272). Norwood NJ: Ablex. Chafe, W. (1994). Discourse, Consciousness, and Time. Chicago IL: University of Chicago Press. Chafe, W. (1997). Polyphonic topic development. In T. Givn (ed.), Conversation: Cognitive, Communicative and Social Perspectives. Amsterdam: John Benjamins. Chafe, W., & Nichols, J. (eds.). (1986). Evidentiality: The Linguistic Coding of Epistemology. Norwood NJ: Ablex. Chaudron, C. (1988). Second Language Classrooms: Research on Teaching and Learning. 
Cambridge: CUP. Chu, B. (1996). Introductions in state-of-the-art, argumentative, and teaching tips TESL journal articles: Three possible sub-genres of introductions? Unpublished Research Monograph No. 12. City University of Hong Kong. Collins, P. (1991). Cleft and the Pseudo-cleft Constructions in English. London: Routledge. Collins, P. (1995). The indirect object construction in English: An information approach. Linguistics, 33, 3549. Cone, A. L. (1987). How to Create and Use Solid Gold Fund-raising Letters. Ambler PA: FundRaising Institute. Connor, U. (1987). Argumentative patterns in student essays: Cross-cultural differences. In U. Connor & R. Kaplan (eds.), Writing Across Languages: Analysis of L2 Text (pp. 7387). Reading MA: Addison-Wesley. Connor, U. (1996). Contrastive Rhetoric. Cross-cultural Aspects of Second-language Writing. Cambridge: CUP. Connor, U. (1997). Comparing research and not-for-profit grant proposals. In Written Discourse in Philanthropic Fund Raising: Issues of Language and Rhetoric (Vol. Working Papers, 9813, pp. 4564). Indianapolis IN: Indiana University Center on Philanthropy.
Discourse on the Move Connor, U. (2000). Variation in rhetorical moves in grant proposals of US humanists and scientists. Text, 20(1), 128. Connor, U., Davis, K., & De Rycker, T. (1995). Correctness and clarity in applying for overseas jobs: A cross-cultural analysis of US and Flemish applications. Text, 15(4), 457475. Connor, U., & Gladkov, K. (2004). Rhetorical appeals in fundraising direct mail letters. In U. Connor & T. Upton (eds.), Discourse in the Professions: Perspectives from Corpus Linguistics. (pp. 257286). Amsterdam: John Benjamins. Connor, U., & Lauer, J. (1985). Understanding persuasive essay writing: Linguistic/rhetorical approach. Text, 5(4), 309326. Connor, U., & Mauranen, A. (1999). Linguistic analysis of grant proposals: European Union research grants. English for Specific Purposes, 18(1), 4762. Connor, U., Precht, K., & Upton, T. (2002). Business English: Learner data from Belgium, Finland, and the U.S. In S. Granger, J. Hung & S. Petch-Tyson (eds.), Computer Learner Corpora, Second Language Acquisition, and Foreign Language Teaching (pp. 175194). Amsterdam: John Benjamins. Connor, U., & Upton, T. (2003). Linguistic dimensions of direct mail letters. In C. Meyer & P. Leistyna (eds.), Corpus Analysis: Language Structure and Language Use (pp. 7186). Amsterdam: Rodopi. Connor, U., & Upton, T. (2004a). The genre of grant proposals: A corpus linguistic analysis. In U. Connor & T. Upton (eds.), Discourse in the Professions: Perspectives from Corpus Linguistics (pp. 235256). Amsterdam: John Benjamins. Connor, U., & Upton, T. (eds.). (2004b). Discourse in the Professions: Perspectives from Corpus Linguistics. Amsterdam: John Benjamins. Connor, U., & Wagner, L. (1998). Language use in grant proposals by nonprofits: Spanish and English. New Directions for Philanthropic Fundraising, 22, 5973. Conrad, S. (2001). Variation among disciplinary texts: A comparison of textbooks and journal articles in biology and history. In S. Conrad & D. 
Biber (eds.), Variation in English: MultiDimensional Studies (pp. 94107). London: Longman. Conrad, S., & Biber, D. (2000). Adverbial marking of stance in speech and writing. In S. Hunston & G. Thompson (eds.), Evaluation in Text (pp. 5673). Oxford: OUP. Conrad, S., & Biber, D. (eds.). (2001). Variation in English: Multi-dimensional studies. London: Longman. Cooper, A. (1988). Given-new: Enhancing coherence through cohesiveness. Written Communication, 5, 352367. Corbett, E. (1965). Classical Rhetoric for the Modern Student. New York NY: OUP. Coulthard, M. (ed.). (1994). Advances in Written Text Analysis. London: Routledge. Couture, B. (1986). Effective ideation in written text: A functional approach to clarity and exigence. In B. Couture (ed.), Functional Approaches to Writing: Research Perspectives (pp. 6991). Norwood NJ: Ablex. Crismore, A. (1997). Visual rhetoric in an Indiana University Foundation Annual Report. In Written Discourse in Philanthropic Fund Raising. Issues of Language and Rhetoric. In Working Papers, 9813 (pp. 64100). Indianapolis, IN. Crookes, G. (1986). Towards a validated analysis of scientific text structure. Applied Linguistics, 7(1), 5770. Crossley, S. (2007). A chronotopic approach to genre analysis: An exploratory study. English for Specific Purposes, 26(1), 424.
References Csomay, E. (2002). Episodes in University Classrooms: A Corpus Linguistic Investigation. Unpublished Ph.D. Dissertation. Flagstaff, AZ: Northern Arizona University. Csomay, E. (2005a). Linguistic variation in the lexical episodes of university classroom talk. In A. Tyler, M. Takada, Y. Kim & D. Marinova (eds.), Language in Use: Cognitive and Discourse Perspectives on Language and Language Learning. Georgetown University Round Table on Languages and Linguistics (pp. 150162). Washington DC: Georgetown University Press. Csomay, E. (2005b). Linguistic variation within university classroom talk: A corpus-based perspective. Linguistics and Education, 15, 243274. Csomay, E. (2006). Academic talk in American classrooms: Crossing the boundaries of oralliterate discourse? Journal of English for Academic Purposes, 5, 117135. Dahlgren, K. (1996). Discourse coherence and segmentation. In E. Hovy & D. Scott (eds.), Computational Discourse: Burning Issues An Interdisciplinary Account (NATO ASI Series, Series F: Computer and Systems Sciences, Vol. 151). Heidelberg: Springler-Verlag. de Haan, P. (1989). Postmodifying Clauses in the English Noun Phrase: A Corpus-based Study. Amsterdam: Rodopi. Dubois, B. (1997). The Biomedical Discussion Section in Context. Greenwich CT: Ablex. Dudley-Evans, T. (1994a). Genre analysis: An approach to text analysis for ESP. In M. Coulthard (ed.), Advances in Written Text Analysis (pp. 219228). London: Routledge. Dudley-Evans, T. (1994b). Variation in the discourse patterns favoured by different disciplines and their pedagogical implications. In J. Flowerdew (ed.), Academic Listening: Research Perspectives (pp. 146157). New York NY: CUP. Dudley-Evans, T. (1995). Genre models for the teaching of academic writing to second language speakers: Advantages and disadvantages. The Journal of TESOL France, 2(2), 181193. Everitt, B. (1974). Cluster Analysis. New York NY: Wiley. Feng, H. (Forthcoming). 
A genre-based study of research grant proposals in China. In U. Connor, E. Nagelhout & W. Rozycki (eds.), Contrastive Rhetoric: Reaching to Intercultural Rhetoric. Amsterdam: John Benjamins. Ferguson, C. A. (1983). Sports announcer talk: Syntactic aspects of register variation. Language in Society, 12, 153172. Ferguson, C. A. (1994). Dialect, register, and genre: Working assumptions about conventionalization. In D. Biber & E. Finegan (eds.), Sociolinguistic Perspectives on Register (pp. 1530). New York NY: OUP. Finegan, E., & Biber, D. (2001). Register variation and social dialect variation: The register axiom. In P. Eckert & J. Rickford (eds.), Style and Sociolinguistic Variation (pp. 235267). Cambridge: CUP. Flowerdew, J. (1993). An educational, or process, approach to the teaching of professional genres. ELT Journal, 47(4), 305316. Fortanet, I. (2004). The use of we in university lectures: Reference and function. English for Specific Purposes, 23, 4566. Fox, B. A. (1987). Discourse Structure and Anaphora. Written and Conversational English. Cambridge: CUP. Fox, B. A., & Thompson, S. A. (1990). A discourse explanation of the grammar of relative clauses in English conversation. Language, 66, 297316. Gee, J. P. (1986). Units in the production of narrative discourse. Discourse Processes, 9(4), 391422. Geisler, C. (1995). Relative Infinitives in English. Uppsala: Uppsala University. Givn, T. (Ed.). (1983). Topic Continuity in Discourse. Amsterdam: John Benjamins. Grabe, W., & Kaplan, R. (1996). Theory and Practice of Writing. New York NY: Longman.
Discourse on the Move Granger, S. (1983). The be + past participle Construction in Spoken English, with Special Emphasis on the Passive. Amsterdam: North Holland. Graves, R. (1997). Dear friend (?): Culture and genre in American and Canadian direct marketing letters. Journal of Business Communication, 34(3), 235252. Grimes, J. (1975). The Thread of Discourse. The Hague: Mouton. Grosz, B. J., & Sidner, C. L. (1986). Attention, intentions and the structure of discourse. Computational Linguistics, 12, 175204. Halliday, M. (1989). Spoken and Written Language. Oxford: OUP. Halliday, M., & Hasan, R. (1976). Cohesion in English. London: Longman. Hansen, C. (1994). Topic identification in lecture discourse. In J. Flowerdew (ed.), Academic Listening: Research Perspectives (pp. 131145). New York: CUP. Hearst, M. (1994). Multi-paragraph segmentation of expository texts (Technical Report 94/790, Computer Science Division (EECS)). Berkeley CA: University of California. Hearst, M. (1997). TextTiling: Segmenting text into multi-paragraph subtopic passages. Computational Linguistics, 23(1), 3364. Heath, S. B., & Langman, J. (1994). Shared thinking and the register of coaching. In D. Biber & E. Finegan (eds.), Sociolinguistic Perspectives on Register (pp. 82105). New York: OUP. Henry, A., & Roseberry, R. (1996). Using a small corpus to obtain data for teaching a genre. In M. Ghadessy, A. Henry & R. Roseberry (eds.), Small corpus studies and ELT: Theory and practice (pp. 93133). Amsterdam: John Benjamins. Hewings, M., & Hewings, A. (2002). It is interesting to note that...: A comparative study of anticipatory it in student and published writing. English for Specific Purposes, 21, 367383. Hobbs, J. (1979). Coherence and coreference. Cognitive Science, 10(3), 6790. Hoey, M. (1983). On the Surface of Discourse. London: Allen and Unwin. Hoey, M. (1986). Overlapping patterns of discourse organization and their implications for clause relational analysis in problem-solution text. In C. R. 
Cooper & S. Greenbaum (eds.), Studying Writing: Linguistic Approaches (pp. 187214). Newbury Park, CA: Sage. Hoey, M. (1991). Patterns of Lexis in Text. Oxford: OUP. Holmes, J. (1988). Doubt and certainty in ESL textbooks. Applied Linguistics, 9, 2044. Hopkins, A., & Dudley-Evans, T. (1988). A genre-based investigation of the discussion sections in articles and dissertations. English for Specific Purposes, 7(2), 113122. Horn, B. (2005). Quantitative and qualitative approaches to text structure analysis: A comparison of two methods. Ph.D. seminar research paper, Northern Arizona University. Hunston, S. (1994). Evaluation and organization in a sample of written academic discourse. In M. Coulthard (ed.), Advances in Written Text Analysis (pp. 191218). London: Routledge. Hunston, S., & Thompson, G. (eds.). (2000). Evaluation in Text: Authorial Stance and the Construction of Discourse. New York: OUP. Hunston, S. (2002). Corpora in Applied Linguistics. Cambridge: CUP. Hyland, K. (1996a). Talking to the academy: Forms of hedging in science research articles. Written Communication, 13, 251281. Hyland, K. (1996b). Writing without conviction? Hedging in science research articles. Applied Linguistics, 17, 433454. Hyland, K. (1998). Hedging in Scientific Research Articles. Amsterdam: John Benjamins. Hyland, K. (1999a). Academic attribution: Citation and the construction of disciplinary knowledge. Applied Linguistics, 20(3), 341367.
References Hyland, K. (1999b). Disciplinary discourses: Writer stance in research articles. In C. Candlin & K. Hyland (eds.), Writing: Texts, Processes and Practices (pp. 122142). London: Longman. Hyland, K. (2000). Disciplinary Discourses: Social Interaction in Academic Genres. London: Longman. Hyland, K. (2001). Bringing in the reader: Addressee features in academic articles. Written Communication, 18(4), 549574. Hyland, K. (2002a). Authority and invisibility: Authorial identity in academic writing. Journal of Pragmatics, 34(8), 10911112. Hyland, K. (2002b). Directives: Argument and engagement in academic writing. Applied Linguistics, 23(2), 215239. Hyland, K. (2004a). A convincing argument: Corpus analysis and academic persuasion. In U. Connor & T. Upton (Eds.), Discourse in the Professions: Perspectives from Corpus Linguistics (pp. 87112). Amsterdam: John Benjamins. Hyland, K. (2004b). Disciplinary interactions: Metadiscourse in L2 postgraduate writing. Journal of Second Language Writing, 13(2), 133151. Hymes, D. (1984). Sociolinguistics: Stability and consolidation. International Journal of the Sociology of Language, 45, 3945. Jaworski, A., & Coupland, N. (1999). Introduction: Perspectives on discourse analysis. In A. Jaworski & N. Coupland (eds.), The Discourse Reader (pp. 144). London: Routledge. Johansson, C. (1995). The Relativizers Whose and Of Which in Present-Day English: Description and Theory. Uppsala: Uppsala University. Journal Citation Reports. (2004). Philadelphia PA: Thomson. Kanoksilapatham, B. (2003). A corpus-based investigation of biochemistry research articles: Linking move analysis with multidimensional analysis. Unpublished Ph.D. Dissertation. Georgetown University. Kanoksilapatham, B. (2005). Rhetorical structure of biochemistry research articles. English for Specific Purposes, 24, 269292. Kinneavy, J. (1971). Theory of Discourse. Englewood Cliffs NJ: Prentice-Hall. Korolija, N., & Linell, P. (1996). 
Episodes: Coding and analyzing coherence in multiparty conversation. Linguistics, 34, 799831. Kress, G., & van Leeuwen, T. (1990). Reading Images. The Grammar of Visual Design. London: Routledge. Kress, G., & van Leeuwen, T. (2001). Multimodal Discourse. The Modes and Media of Contemporary Communication. London: Arnold. Kwan, B. (2006). The schematic structure of literature reviews in doctoral theses of applied linguistics. English for Specific Purposes, 25(1), 3055. Labov, W. (1984). Intensity. In D. Schiffrin (ed.), Meaning, Form, and Use in Context: Linguistic Applications (pp. 4370). Washington DC: Georgetown University Press. Labov, W., & Waletsky, J. (1967). Narrative analysis: Oral versions of personal experience. In J. Helm (Ed.), Essays on the Verbal and Visual Arts: Proceedings of the 1966 Annual Spring Meeting of the American Ethnological Society (pp. 1214). Seattle WA: University of Washington Press. Lauer, J. (1997). Fundraising letters. In Written Discourse in Philanthropic Fund Raising. Issues of Language and Rhetoric. In Working Papers, 9813 (pp. 101108). Indianapolis IN. Lee, D. (2001). Genres, registers, text types, domains, and styles: Clarifying the concepts and navigating a path through the BNC jungle. Language Learning and Technology, 5, 3772.
Discourse on the Move Lewin, B., Fine, J., & Young, L. (2001). Expository Discourse: A Genre-based Approach to Social Science Research Texts. New York NY: Continuum. Lewis, H. (1997). Direct mail fund raising tactics. Direct Marketing, 59, 2830. Lindquist, H., & Mair, C. (eds.). (2004). Corpus Approaches to Grammaticalization in English. Amsterdam: John Benjamins. Long, M., & Sato, C. (1983). Classroom foreigner talk discourse: Forms and functions of teachers questions. In H. Seliger & M. Long (eds.), Classroom Oriented Research in Second Language Acquisition. Rowley MA: Newbury House. Loukianenko, M. (Forthcoming). Different cultures different discourses? Rhetorical patterns of business letters by English and Russian speakers. In U. Connor, E. Nagelhout & W. Rozycki (rds.), Contrastive Rhetoric: Reaching to Intercultural Rhetoric. Amsterdam: John Benjamins. Love, A. (2002). Introductory concepts and cutting edge theories: Can the genre of the textbook accommodate both? In J. Flowerdew (ed.), Academic Discourse (pp. 7692). New York NY: Longman. Mair, C. (1990). Infinitival Complement Clauses in English. New York: CUP. Mandler, J. M., & Johnson, N. S. (1977). Remembrance of things parsed: Story structure and recall. Cognitive Psychology 9, 111151. Mann, W., Matthiessen, C., & Thompson, S. (1992). Rhetorical structure theory and text analysis. In W. Mann & S. Thompson (eds.), Discourse Description: Diverse Linguistic Analyses of a Fund-raising Text. Amsterdam: John Benjamins. Mann, W., & Thompson, S. (1988). Rhetorical structure theory: Towards a functional theory of text organization. Text, 8(3), 243282. Mann, W., & Thompson, S. (eds.). (1992). Discourse Description: Diverse Linguistic Analyses of a Fund-raising Text. Amsterdam: John Benjamins. Marcu, D. (2000). The Theory and Practice of Discourse Parsing and Summarization. Cambridge MA: The MIT Press. Martin, J. R. (1985). Process and text: Two aspects of human semiosis. In J. D. Benson & W. S. 
Greaves (eds.), Systemic Perspectives on Discourse (Vol. Vol. 1, pp. 248274). Norwood NJ: Ablex. Martin, J. R., & Rothery, J. (1986). What a functional approach can show teachers. In B. Couture (ed.), Functional Approaches to Writing: Research Perspectives (pp. 241265). Norwood NJ: Ablex. Marton, F., & Tsui, A. (2004). Classroom Discourse and the Space of Learning. Mahwah NJ: Lawrence Erlbaum. Mauranen, A. (2001). Reflexive academic talk. In R. Simpson & J. Swales (Eds.), Corpus linguistics in North America: Selections for the 1999 Symposium. Ann Arbor MI: The University of Michigan Press. McBride, K. (Forthcoming). English web page use in an EFL setting: A contrastive rhetoric view of the development of information literacy. In U. Connor, E. Nagelhout & W. Rozycki (eds.), Contrastive Rhetoric: Reaching to Intercultural Rhetoric. Amsterdam: John Benjamins. McCagg, P. (1997). Metaphorical morality and the discourse of philanthropy. In Written Discourse in Philanthropic Fund Raising. Issues of Language and Rhetoric. In Working Papers, 9813 (pp. 109120). Indianapolis IN. McEnery, T., Xiao, R., & Tono, Y. (2006). Corpus-based Language Studies. London: Routledge. McNamara, D. S., & Kintsch, W. (1996). Learning from texts: Effects of prior knowledge and text coherence. Discourse Processes, 22, 247288.
References
Mehan, H. (1979). Learning Lessons: Social Organization in the Classroom. Cambridge MA: Harvard University Press. Meyer, C. (1985). Prose analysis: Purposes, procedures, and problems. In B. K. Britton & J. B. Black (eds.), Understanding Expository Text: A Theoretical and Practical Handbook for Analyzing Explanatory Text. Hillsdale NJ: Lawrence Erlbaum Associates. Meyer, C. (1992). Apposition in Contemporary English. Cambridge: CUP. Meyer, C., & Leistyna, P. (eds.). (2003). Corpus Analysis: Language Structure and Language Use. Amsterdam: Rodopi. Miller, C. (1984). Genre as a social action. Quarterly Journal of Speech, 70, 157178. Morris, J., & Hirst, G. (1991). Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics, 18, 537544. Myers, G. (1997). Wednesday morning and the millennium: Notes on time in fund-raising texts. In Written Discourse in Philanthropic Fund Raising. Issues of Language and Rhetoric. In Working Papers, 9813 (pp. 121134). Indianapolis IN. Myhill, J. (1995). Change and continuity in the functions of the American English modals. Linguistics, 33, 157211. Myhill, J. (1997). Should and ought: The rise of individually oriented modality in American English. English Language and Linguistics, 1, 323. Naczi, R., Reznicek, A., & Ford, B. (1998). Morphological, geographical, and ecological differentiation in the Carex willdenowii coplex (Cyberaceae). American Journal of Botany, 85, 434447. Nattinger, J., & DeCarrico, J. (1992). Lexical Phrases. New York: CUP. Nesselhauf, N. (2005). Collocations in a Learner Corpus. Amsterdam: John Benjamins. Nwogu, K. (1991). Structure of science popularizations: A genre-analysis approach to the schema of popularized medical texts. English for Specific Purposes, 10(2), 111123. Nwogu, K. (1997). The medical research paper: Structure and functions. English for Specific Purposes, 16(2), 119138. Ochs, E. (Ed.). (1989). The Pragmatics of Affect. 
Special Edition of Text, 9(3). Orwin, R. G. (1994). Evaluating coding decisions. In H. Cooper & L. Hedges (Eds.), The Handbook of Research Synthesis (pp. 139162). New York NY: Russell Sage Foundation. Pak, C., & Acevedo, R. (Forthcoming). Spanish language newspaper editorials from Mexico, Spain, and the U.S. In U. Connor, E. Nagelhout & W. Rozycki (eds.), Contrastive Rhetoric: Reaching to Intercultural Rhetoric. Amsterdam: John Benjamins. Paltridge, B. (1994). Genre analysis and the identification of textual boundaries. Applied Linguistics, 15(3), 288299. Pang, T. (2002). Textual analysis and contextual awareness building: A comparison of two approaches to teaching genre. In A. Johns (ed.), Genre in the Classroom: Multiple Perspectives (pp. 145161). Mahwah NJ: Lawrence Erlbaum. Partington, A. (2003). The Linguistics of Political Argument. The Spin-doctor and the Wolf-pack at the White House. London: Routledge. Passonneau, R., & Litman, D. J. (1996). Empirical analysis of three dimensions of spoken discourse: segmentation, coherence, and linguistic devices. In E. H. Hovy & D. R. Scott (eds.), Computational and Conversational Discourse (NATO ASI Series, Series F Computer and Systems Series, Vol. 151). New York NY: Springer Verlag. Passonneau, R., & Litman, D. J. (1997). Discourse segmentation by human and automated means. Computational Linguistics, 23(1), 103140. Peng, J. (1987). Organisational features in chemical engineering research articles. ELR Journal, 1, 79116.
Discourse on the Move Perelman, C. (1982). The Realm of Rhetoric (W. Kluback, Trans.). Notre Dame IN: University of Notre Dame Press. Phillips, M. K. (1985). Aspects of Text Structure: An Investigation of the Lexical Organization of Text. Amsterdam: North-Holland. Polanyi, L. (1985). Telling the American Story: A Structural and Cultural Analysis of Storytelling. Norwood NJ: Ablex. Polanyi, L. (1988). A formal model of the structure of discourse. Journal of Pragmatics, 12, 601638. Poole, D. (2005). Cross-cultural variation in classroom turn-taking practices. In P. Bruthiaux, D. Atkinson, W. Grabe & V. Ramanathan (eds.), Directions in Applied Linguistics. Buffalo: Multilingual Matters. Posteguillo, S. (1999). The schematic structure of computer science research articles. English for Specific Purposes, 18(2), 139158. Precht, K. (2000). Patterns of stance in English. Ph.D. dissertation, Northern Arizona University. Prince, E. F. (1978). A comparison of wh-clefts and it-clefts in discourse. Language, 54, 883906. Prince, E. F. (1981). Toward a taxonomy of given-new information. In P. Cole (ed.), Radical Pragmatics. New York NY: Academic Press. Quirk, R., Greenbaum, S., Leech, G. & Svartvik, J. (1985). A Comprehensive Grammar of the English Language. London: Longman. Raymond, J. (1982). What we dont know about the evaluation of writing. College Composition and Communication, 33(4), 399403. Rmer, U. (2005). Progressives, Patterns, Pedagogy: A Corpus-driven Approach to English Progressive Forms, Functions, Contexts and Didactics. Amsterdam: John Benjamins. Salager-Meyer, F. (1997). I think that perhaps you should: A study of hedges in written discourse analysis. In T. Miller (ed.), Functional Approaches to Written Text: Classroom Applications (pp. 105117). Washington DC: USIA. Sampson, G., & McCarthy, D. (eds.). (2004). Corpus Linguistics: Readings in a Widening Discipline. London: Continuum. Samraj, B. (2002). 
Introductions in research articles: Variation across disciplines. English for Specific Purposes, 21, 117. Sanders, T. (1997). Semantic and pragmatic sources of coherence: On the categorization of coherence relations in context. Discourse Processes, 24(1), 119148. Sanders, T., & Noordman, L. G. (2000). The role of coherence relations and their linguistic markers. Discourse Processes, 29(1), 3760. Schiffrin, D. (1981). Tense variation in narrative. Language, 57, 4562. Schiffrin, D. (1985a). Conversational coherence: The role of well. Language, 61(640667). Schiffrin, D. (1985b). Multiple constraints on discourse options: A quantitative analysis of causal sequences. Discourse Processes, 8, 281303. Schiffrin, D. (1987). Discourse Markers. Cambridge: CUP. Schiffrin, D. (1994). Approaches to Discourse. Oxford: Blackwell. Schiffrin, D., Tannen, D., & Hamilton, H. (eds.). (2001). The Handbook of Discourse Analysis. Oxford: Blackwell Publishers. Scollon, R., & Scollon, S. W. (2001). Discourse and intercultural communication. In D. Schiffrin, D. Tannen & H. Hamilton (eds.), The Handbook of Discourse Analysis (pp. 538547). Oxford: Blackwell.
References

Scott, M. (2004a). Definition of key-ness [Electronic Version]. Wordsmith Tools online manual. Retrieved November 28, 2005 from http://www.lexically.net/downloads/version4/html/index.html.
Scott, M. (2004b). Wordsmith Tools (Version 4.0) [Computer software]. Oxford: OUP.
Scott, M., & Tribble, C. (2006). Textual Patterns: Key Words and Corpus Analysis in Language Education. Amsterdam: John Benjamins.
Simpson, R., & Mendis, D. (2003). A corpus-based study of idioms in academic speech. TESOL Quarterly, 37, 419–441.
Sinclair, J., & Coulthard, M. (1975). Towards an Analysis of Discourse. Oxford: OUP.
Stubbs, M. (1983). Discourse Analysis: The Sociolinguistic Analysis of Natural Language. Oxford: Blackwell.
Suarez, L., & Moreno, A. (Forthcoming). The rhetorical structure of academic book reviews of literature: An English-Spanish cross-linguistic approach. In U. Connor, E. Nagelhout & W. Rozycki (eds.), Contrastive Rhetoric: Reaching to Intercultural Rhetoric. Amsterdam: John Benjamins.
Swales, J. (1981). Aspects of Article Introductions. Birmingham: University of Aston.
Swales, J. (1984). Research into the structure of introductions to journal articles and its application to the teaching of academic writing. In R. Williams & J. Kirkman (eds.), Common Grounds: Shared Interests in ESP and Communication Studies (pp. 77–86). New York NY: Pergamon Press.
Swales, J. (1990). Genre Analysis: English for Academic and Research Settings. Cambridge: CUP.
Swales, J. (2004). Research Genres: Explorations and Applications. Cambridge: CUP.
Swales, J., & Burke, A. (2003). "It's really fascinating work": Differences in evaluative adjectives across academic registers. In P. Leistyna & C. Meyer (eds.), Corpus Analysis: Language Structure and Language Use. Amsterdam: Rodopi.
Swales, J., & Luebs, M. (2002). Genre analysis and the advanced second language writer. In E. Barton & G. Stygall (eds.), Genre in the Classroom: Multiple Perspectives (pp. 105–119). Mahwah NJ: Lawrence Erlbaum.
Swales, J., & Najjar, H. (1987). The writing of research article introductions. Written Communication, 4, 175–192.
Tannen, D. (1984). Conversational Style: Analyzing Talk among Friends. Norwood NJ: Ablex.
Tannen, D. (1987). Repetition in conversation: Toward a poetic of talk. Language, 63, 574–605.
Tannen, D. (1989). Talking Voices: Repetition, Dialogue, and Imagery in Conversational Discourse. Cambridge: CUP.
Thompson, D. (1993). Arguing for experimental facts in science. Written Communication, 10(1), 106–128.
Thompson, D., & Ye, Y. (1991). Evaluation in the reporting verbs used in academic papers. Applied Linguistics, 12(4), 365–382.
Thompson, S. (1983). Grammar and discourse: The English detached participial clause. In F. Klein-Andreu (ed.), Discourse Perspectives on Syntax. New York NY: Academic Press.
Thompson, S. (1985). Grammar and written discourse: Initial vs final purpose clauses in text. Text, 5(1–2), 55–84.
Thompson, S. (1994). Frameworks and context: A genre based approach to analyzing lecture introductions. English for Specific Purposes, 13(2), 171–186.
Thompson, S., & Mulac, A. (1991a). A quantitative perspective on the grammaticization of epistemic parentheticals in English. In E. Traugott & B. Heine (eds.), Approaches to Grammaticalization (Vol. 2). Amsterdam: John Benjamins.
Thompson, S., & Mulac, A. (1991b). The discourse conditions for the use of the complementizer that in conversational English. Journal of Pragmatics, 15, 237–251.
Tirkkonen-Condit, S. (1985). Argumentative Text Structure and Translation (Vol. 18). Jyväskylä, Finland: Kirjapaino Oy, Sisäsuomi.
Tomlin, R., Forrest, L., Ming Pu, M., & Hee Kim, M. (1997). Discourse semantics. In T. Van Dijk (ed.), Discourse as Structure and Process. Discourse Studies: A Multidisciplinary Introduction (Vol. 1). Thousand Oaks CA: Sage.
Tottie, G. (1991). Negation in English speech and writing: A study in variation. San Diego CA: Academic Press.
Tyler, A. (1995). Patterns of lexis: How much can repetition tell us about discourse coherence? In J. Alatis, C. Straehle, B. Gallenberger & M. Ronkin (eds.), Linguistics and the Education of Language Teachers: Ethnolinguistic, Psycholinguistic, and Sociolinguistic Aspects (Georgetown University Round Table on Languages and Linguistics). Washington DC: Georgetown University Press.
Upton, T. (2002). Understanding direct mail letters as a genre. International Journal of Corpus Linguistics, 7(1), 65–85.
Upton, T., & Connor, U. (2001). Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre. English for Specific Purposes: An International Journal, 20, 313–329.
Ure, J. (1982). Introduction: Approaches to the study of register range. International Journal of the Sociology of Language, 35, 5–23.
Van Dijk, T. (1980). Macrostructures: An Interdisciplinary Study of Global Structures in Discourse, Interaction, and Cognition. Hillsdale NJ: Erlbaum.
Van Dijk, T. (1981). Episodes as units of discourse analysis. In D. Tannen (ed.), Analyzing Discourse: Text and Talk (Georgetown Round Table on Languages and Linguistics). Washington DC: Georgetown University Press.
Van Dijk, T. (Ed.). (1997). Discourse as Structure and Process. Discourse Studies: A Multidisciplinary Introduction (Vol. 1). Thousand Oaks CA: Sage.
Van Dijk, T., & Kintsch, W. (1983). Strategies of Discourse Comprehension. New York NY: Academic Press.
Varantola, K. (1984). On Noun Phrase Structures in Engineering English. Turku: University of Turku.
Ventola, E. (1984). Orientation to social semiotics in foreign language teaching. Applied Linguistics, 5, 275–286.
Ventola, E., Charles, C., & Kaltenbacher, M. (2004). Perspectives on Multimodality. Amsterdam: John Benjamins.
Wang, W. (Forthcoming). Newspaper commentaries on terrorism in China and Australia: A contrastive genre study. In U. Connor, E. Nagelhout & W. Rozycki (eds.), Contrastive Rhetoric: Reaching to Intercultural Rhetoric. Amsterdam: John Benjamins.
Ward, G. (1990). The discourse functions of VP preposing. Language, 66, 742–763.
Wells, G. (1999). Dialogic Inquiry: Towards a Sociocultural Practice and Theory of Education. New York NY: CUP.
Williams, R. (1999). Results section of medical research articles: An analysis of rhetorical categories for pedagogical purpose. English for Specific Purposes, 18(4), 347–366.
Wood, A. (1982). An examination of the rhetorical structures of authentic chemistry texts. Applied Linguistics, 3(2), 121–143.
Youmans, G. (1991). A new tool for discourse analysis: The vocabulary management profile. Language, 67, 763–789.
Youmans, G. (1994). The vocabulary management profile: Two stories by William Faulkner. Empirical Studies of the Arts, 12(2), 113–130.
Young, L. (1994). University lectures: Macro-structures and micro-features. In J. Flowerdew (ed.), Academic Listening: Research Perspectives (pp. 159–176). New York NY: CUP.
Zipf, G.K. (1949). Human Behavior and the Principle of Least Effort. Cambridge MA: Addison-Wesley.
Index

A
affective appeals see rhetorical appeals
Aristotle 121–123

B
Baker 37
Bazerman 6, 7, 8
Bhatia 6, 24, 32, 33, 43
biochemistry research articles 73ff.
  abstract discussion in 93–94
  attributed knowledge in 97–99
  coding moves in 83–84
  conceptual versus specific reference in 91–93
  corpus of 75
  discussion section 81–83, 86
  distribution of move types 84–87
  introductions 77–78, 85–86, 158–159
  linguistic analysis of 87ff.
  methods section 78–79, 86, 159–160
  move categories 76–83
  multi-dimensional (MD) analysis of 87–119, 244–249
  multi-dimensional profile of move types 101–103
  multi-dimensional comparison of move types 104–116
  move categories compared to VBDU types 249–253
  presentation of current findings 97–99
  results section 79–81, 86
  stance in 94–96
  VBDUs in 156–160
  see biology research articles
biology research articles 175ff.
  as sequences of VBDUs 186–189, 194–207, 253–257
  abstract discussion in 183–184
  cluster analysis of 211–212
  comparing top-down and bottom-up analyses 242–257
  corpus of 176
  current state of knowledge in 181–182
  discussion sections in 181, 183–184, 185, 188–189, 201–203
  evaluation in 181
  extracting VBDUs from 176–178
  factor analysis of 209–210
  introductions in 176–177, 182, 187–188, 197–199, 204–205
  MD description of research article sections 184–185
  methods section in 182–183, 185, 199–201, 204–205
  multi-dimensional (MD) analysis of 178–189, 209–210, 244–249
  procedural description in 183
  reporting past events in 182–183
  research journal styles 205–207
  VBDU types compared to move categories 249–253
  VBDU types in 190–194, 195ff.
bottom-up approach to discourse analysis see corpus-based approaches to discourse analysis

C
CARS 25–28
classroom teaching (university) 213ff.
  as sequences of VBDUs 217–221, 232–237
  corpus of 214
  extracting VBDUs from 214–215
  functional interpretation of VBDU types 230–231
  informational monologue in 227–228, 236–237
  multi-dimensional analysis of 215–217
  stance (personalized framing) in 225–227, 234–235
  VBDU types in 222–229, 230–231
cluster analysis 171–172, 190–194, 222–224
corpus-based approaches to discourse analysis 12–17, 240–242
  advantages of 36–40, 74–75
  bottom-up approaches 14, 16–17, 155–173, 241–242
  comparison of approaches 239ff.
  top-down approaches 13, 14–16, 23–41, 241–242
  methodologies 12–14
corpus design 17–19
Coupland 1
credibility appeals see rhetorical appeals

D
discourse
  definitions 1–2, 239–240
  socio-cultural approaches 2, 6–7, 239
  structure beyond the sentence 1–2, 4–6, 9–10, 240
  see language use
  see corpus-based approaches to discourse analysis

E
ethos see rhetorical appeals
evaluation in discourse see stance

F
factor analysis 88, 264–265
  see multi-dimensional analysis
Fox 9
fundraising discourse 43ff.
  ICIC Fundraising Corpus 44–45
fundraising letters 46ff., 121ff.
  affective appeals in 131–132
  corpus-based analysis of 54–61
  credibility appeals in 129–131
  distribution of move types 55–56
  keywords in 137–141, 148–151
  move analysis of 46–54
  prototypes 58–61
  rational appeals in 125–129, 147
  rhetorical appeals in 125–132, 132–135
  stance features in 61–68
  structural elements in 52–53, 56

G
genre 23ff.
  compared to register 7–9
  prototypes 40, 58–61
genre analysis 23–24

H
Hamilton 1
Hearst 161, 163
Hunston 138
Hyland 67, 123

J
Jaworski 1

K
keywords 138–141
Kwan 32

L
language use 1, 3–4, 239
letters
  fundraising 46ff., 121ff.
  job applications 30–31
linguistic analysis of moves 38–39
  grammatical features used for studies 267–271
logos see rhetorical appeals

M
Martin 8
Mauranen 7
modal verbs 72
  in fundraising letters 64, 65, 66
move analysis 15, 23
  compared to rhetorical appeals 141–142
  compared to VBDUs 243–244
  corpus-based 36–40, 84–87
  distribution of move types 55–56, 84–87
  inter-rater reliability 35, 68, 83–84
  linguistic analysis of moves 38–39, 63–68, 87ff.
  methodology 32–35
  of biochemistry research articles 76–83
  of direct mail letters 46–54
  of research article introductions 25–28
  of other genres 29–32
  sequences of moves 39–40
  stance features in moves 63–68
multi-dimensional (MD) analysis 4, 261–266
  methodology 87–88, 263–266
  of moves in biochemistry research articles 87–119
  of VBDUs 171
  of VBDUs in biology research articles 178–189
  of VBDUs in classroom teaching 215–217

P
Partington 4
pathos see rhetorical appeals
Perelman 123ff.
persuasion
  ethos, pathos, logos 122
  see rhetorical appeals
prototypes see genre

R
rational appeals see rhetorical appeals
register 7–9
  see genre
reliability for coding moves 35, 84
research articles
  introductions 25–28
  move analysis of 25–32
  see biochemistry research articles, biology research articles
rhetorical appeals 16, 121ff.
  affective appeals 125, 131–132, 136, 138–141, 146
  compared to moves 141–142
  corpus-based analysis of 132–135
  credibility appeals 125, 129–131, 145–146
  definitions of 144–146
  ethos, pathos, logos 122
  in fundraising letters 121ff.
  keywords in affective appeals 138–141
  keywords in other appeals 148–151
  linguistic characteristics of 136–141
  rational appeals 124, 125–129, 144–145
Rhetorical Structure Theory 5, 6, 15
Römer 2, 4, 10

S
Schiffrin 1, 3
Scott 2, 138
stance
  grammatical devices for stance 69–72
  in biochemistry research articles 94–96
  in biology research articles 181, 188–189
  in classroom teaching 217, 225–227
  in fundraising letters 61–68
  see rhetorical appeals (affective appeals)
stance adverbials 69–70
  in fundraising letters 64, 67
stance verb + complement clause 70–72
  in fundraising letters 65
Swales 15, 23, 24, 25–28, 38

T
Tannen 1, 9
TextTiling 161–162
texts
  status of 2
  see unit of analysis
text types 171–173
Thompson 3, 5, 10, 15
top-down approach to discourse analysis see corpus-based approaches to discourse analysis
Tribble 2

U
unit of analysis 9, 11, 155–156, 243–244

V
Vocabulary-based discourse units (VBDUs)
  cluster analysis of 171–172, 190–194
  compared to moves 243–244
  definition of 156
  exemplified in biochemistry research articles 156–160
  functional interpretation of 230–231
  in biology research articles 176–178, 194–207
  in classroom teaching 214–215
  linguistic analysis of 169–170
  methodology for automatic identification 161–162
  multi-dimensional analysis of 171, 178–189, 215–217
  perceptual correlates of 163–168
  types 170–173, 190–194, 222–229
  VBDU types compared to article sections 192–194

Y
Youmans 6, 9

In the series Studies in Corpus Linguistics (SCL) the following titles have been published thus far or are scheduled for publication:
29 Flowerdew, Lynne: Corpus-based Analyses of the Problem-Solution Pattern. A phraseological approach. ix,173pp.+index. Expected November 2007
28 Biber, Douglas, Ulla Connor and Thomas A. Upton: Discourse on the Move. Using corpus analysis to describe discourse structure. 2007. xii,289pp.
27 Schneider, Stefan: Reduced Parenthetical Clauses as Mitigators. A corpus study of spoken French, Italian and Spanish. 2007. xiv,237pp.
26 Johansson, Stig: Seeing through Multilingual Corpora. On the use of corpora in contrastive studies. 2007. xxii,355pp.
25 Sinclair, John McH. and Anna Mauranen: Linear Unit Grammar. Integrating speech and writing. 2006. xxii,185pp.
24 Ädel, Annelie: Metadiscourse in L1 and L2 English. 2006. x,243pp.
23 Biber, Douglas: University Language. A corpus-based study of spoken and written registers. 2006. viii,261pp.
22 Scott, Mike and Christopher Tribble: Textual Patterns. Key words and corpus analysis in language education. 2006. x,203pp.
21 Gavioli, Laura: Exploring Corpora for ESP Learning. 2005. xi,176pp.
20 Mahlberg, Michaela: English General Nouns. A corpus theoretical approach. 2005. x,206pp.
19 Tognini-Bonelli, Elena and Gabriella Del Lungo Camiciotti (eds.): Strategies in Academic Discourse. 2005. xii,212pp.
18 Römer, Ute: Progressives, Patterns, Pedagogy. A corpus-driven approach to English progressive forms, functions, contexts and didactics. 2005. xiv+328pp.
17 Aston, Guy, Silvia Bernardini and Dominic Stewart (eds.): Corpora and Language Learners. 2004. vi,312pp.
16 Connor, Ulla and Thomas A. Upton (eds.): Discourse in the Professions. Perspectives from corpus linguistics. 2004. vi,334pp.
15 Cresti, Emanuela and Massimo Moneglia (eds.): C-ORAL-ROM. Integrated Reference Corpora for Spoken Romance Languages. 2005. xviii,304pp.(incl.DVD).
14 Nesselhauf, Nadja: Collocations in a Learner Corpus. 2005. xii,332pp.
13 Lindquist, Hans and Christian Mair (eds.): Corpus Approaches to Grammaticalization in English. 2004. xiv,265pp.
12 Sinclair, John McH. (ed.): How to Use Corpora in Language Teaching. 2004. viii,308pp.
11 Barnbrook, Geoff: Defining Language. A local grammar of definition sentences. 2002. xvi,281pp.
10 Aijmer, Karin: English Discourse Particles. Evidence from a corpus. 2002. xvi,299pp.
9 Reppen, Randi, Susan M. Fitzmaurice and Douglas Biber (eds.): Using Corpora to Explore Linguistic Variation. 2002. xii,275pp.
8 Stenström, Anna-Brita, Gisle Andersen and Ingrid Kristine Hasund: Trends in Teenage Talk. Corpus compilation, analysis and findings. 2002. xii,229pp.
7 Altenberg, Bengt and Sylviane Granger (eds.): Lexis in Contrast. Corpus-based approaches. 2002. x,339pp.
6 Tognini-Bonelli, Elena: Corpus Linguistics at Work. 2001. xii,224pp.
5 Ghadessy, Mohsen, Alex Henry and Robert L. Roseberry (eds.): Small Corpus Studies and ELT. Theory and practice. 2001. xxiv,420pp.
4 Hunston, Susan and Gill Francis: Pattern Grammar. A corpus-driven approach to the lexical grammar of English. 2000. xiv,288pp.
3 Botley, Simon Philip and Tony McEnery (eds.): Corpus-based and Computational Approaches to Discourse Anaphora. 2000. vi,258pp.
2 Partington, Alan: Patterns and Meanings. Using corpora for English language research and teaching. 1998. x,158pp.
1 Pearson, Jennifer: Terms in Context. 1998. xii,246pp.
