Human Abstracts, Machine Summaries, Cyborg Solutions?

Ahmad M. Kamal and Victoria L. Rubin

Faculty of Information and Media Studies, University of Western Ontario
North Campus Building, Room 260
London, Ontario, Canada N6A 5B7
akamal8@uwo.ca, vrubin@uwo.ca
ASIST 2010, October 22–27, 2010, Pittsburgh, PA, USA.
Copyright © 2010 Ahmad M. Kamal and Victoria L. Rubin.

ABSTRACT
We present a comparative study of abstracts and machine-generated summaries. This study bridges two hitherto independent lines of research: the descriptive analyses of abstracts as a genre and the testing of summaries produced by automatic text summarization (ATS). A pilot sample of eight articles was gathered from the Library and Information Science Abstracts (LISA) database, with each article including an author-written abstract and one of four types of indexed abstracts. Three ATS systems (Copernic Summarizer, Microsoft AutoSummarize, SweSum) were used to produce three additional summaries per article. The structure, content and style of abstracts and summaries were analyzed by building on genre analysis methods, creating ten functional categories. Summaries and abstracts demonstrate variability in analyzed features and captured concepts, with some consistencies and overlap. Incorporating ATS output can be useful to information seekers: summaries complement abstracts by expanding representativeness of source articles. Yet certain cognitive processes performed by abstractors remain irreplaceable.

Keywords
Automatic summarization, abstracting, textual analysis

INTRODUCTION
Abstracts are compressed representations of larger documents. From the perspective of information behavior, abstracts are important resources for activities such as information retrieval, browsing, and classification (Moens, 2000). Yet there is little consistency in the writing of abstracts, whether between writers or even by the same writer (Rath et al., 1999). Pioneers of automatic text summarization (ATS) saw it as an ‘objective’ and cost-effective alternative to manually produced abstracts (Luhn, 1958/1999). Yet approaches created to evaluate ATS summaries and thereby develop better systems struggle with the lack of a ‘gold standard’ against which to measure ATS summary quality (Hirschman & Mani, 2004).

Linguistic studies, by contrast, approach the variability of abstracts descriptively. Abstracts are treated as a genre of communication; differences in structure, content and style are attributed to the intended audience, intended effect or background of the writer (Swales, 1990). Abstracts are written to fulfill any number of functions, such as attracting a specific reader or promoting one’s work. Differences in goals of authors versus abstracting and indexing services are recognized (Cross & Oppenheim, 2006). Montesi and Owen (2007), for example, compared author abstracts and indexed abstracts (author abstracts amended by staff editors) in Library and Information Science Abstracts (LISA) and noted content and stylistic modifications, suggesting that the value-added contribution of LISA editors improves the accessibility of documents to a wider audience than initially conceived by the article author. Yet LISA increasingly relies on unedited abstracts in its database.

So far, studies have not explored the use of ATS summaries to supplement ‘deficiencies’ in author abstracts, nor has ATS been incorporated into genre studies of abstracts. Though summaries are the product of computational algorithms rather than volitional communicators, analyzing ATS summaries alongside human abstracts can help highlight the linguistic and cognitive mechanisms behind author and professional abstracting. The goal of this study is to develop a descriptive vocabulary and classification system to allow for comparative assessments in the context of real-world information practices. We aim to demonstrate the strengths, weaknesses, and convergences among abstractors and ATS systems, towards synergistically incorporating cognitive and computational processes to improve information seeking services.

METHODS
The pilot data consist of 40 abstracts and summaries for a stratified convenience sample of 8 machine-readable articles, selected from the LISA database, published in English-language journals within the Library Technology section between 2000 and 2003. Each selected article included the author abstract (Fig. 1A) and 1 of the 4 types of indexed LISA abstracts: LISA-written abstracts, amended author abstracts (Fig. 1B), abstracts composed of quotes from the article, or unedited author abstracts. Each article was processed through 3 ATS systems (the commercial stand-alone Copernic Summarizer (Fig. 1C), AutoSummarize within Microsoft Word 2007 (Fig. 1D), and the free online SweSum) to create outputs of comparable length to human abstracts.
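The composition of the pilot corpus described in the Methods can be sketched in a few lines of Python. This is only an illustration of how the figure of 40 texts arises (8 articles, each with 2 human abstracts and 3 ATS summaries); the article identifiers and source labels are hypothetical names, not taken from the study's data.

```python
from itertools import product

# Hypothetical identifiers: the paper does not name the eight sampled articles.
ARTICLES = [f"article_{i}" for i in range(1, 9)]

# Five texts per article: two human abstracts and three ATS summaries.
SOURCES = [
    "author_abstract",
    "indexed_abstract",        # one of the four LISA indexed-abstract types
    "copernic_summarizer",
    "microsoft_autosummarize",
    "swesum",
]

# One (article, source) pair per text in the pilot data.
corpus = list(product(ARTICLES, SOURCES))

print(len(corpus))  # 8 articles x 5 texts each = 40
```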
Author Abstract (A)
(1) Usability testing is an invaluable tool for evaluating the effectiveness and ease of use of academic library Web sites.
(2) This article reviews major usability principles
(3) and explores the application of formal usability testing to an existing site at the University at Buffalo libraries.

Indexed Abstract (B)
(1) Reviews the major principles involved in the usability testing of academic library Web sites
(2) with particular reference to a case study involving Buffalo University Libraries (UB Libraries).
(3) Describes the activities of setting the goals, designing the test, and evaluating the results.

Copernic Summarizer (C)
(1) by Brenda Battleson, Austin Booth, and Jane Weintrop
(2) Usability testing is an invaluable tool for evaluating the effectiveness and ease of use of academic library Web sites.
(3) Clearly defined priorities in terms of the "who" and "what" of a Web site are the bases for assessing whether or not the site provides sufficient task support.

Microsoft AutoSummarize (D)
(1) Usability Testing
(2) Usability testing can be divided into three categories: inquiry, inspection, and formal usability testing.
(3) The term "usability testing" encompasses numerous methods of evaluating site usability.

Figure 1. Excerpts from sample abstracts and summaries, showing the first three moves for the same LIS article.
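One kind of overlap visible in Figure 1 can be checked mechanically. The sketch below transcribes the twelve moves from the figure and looks for moves that recur verbatim in more than one text. Exact string matching is a deliberately naive stand-in chosen for illustration; the study itself relied on manual genre analysis, and the function names here are hypothetical.

```python
from collections import defaultdict

# First three moves of each text, transcribed from Figure 1.
FIGURE_1 = {
    "A": [
        'Usability testing is an invaluable tool for evaluating the effectiveness and ease of use of academic library Web sites.',
        'This article reviews major usability principles',
        'and explores the application of formal usability testing to an existing site at the University at Buffalo libraries.',
    ],
    "B": [
        'Reviews the major principles involved in the usability testing of academic library Web sites',
        'with particular reference to a case study involving Buffalo University Libraries (UB Libraries).',
        'Describes the activities of setting the goals, designing the test, and evaluating the results.',
    ],
    "C": [
        'by Brenda Battleson, Austin Booth, and Jane Weintrop',
        'Usability testing is an invaluable tool for evaluating the effectiveness and ease of use of academic library Web sites.',
        'Clearly defined priorities in terms of the "who" and "what" of a Web site are the bases for assessing whether or not the site provides sufficient task support.',
    ],
    "D": [
        'Usability Testing',
        'Usability testing can be divided into three categories: inquiry, inspection, and formal usability testing.',
        'The term "usability testing" encompasses numerous methods of evaluating site usability.',
    ],
}


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def shared_moves(texts):
    """Map each normalized move found in two or more sources to its (source, move number) pairs."""
    seen = defaultdict(list)
    for source, moves in texts.items():
        for number, move in enumerate(moves, start=1):
            seen[normalize(move)].append((source, number))
    return {
        move: places
        for move, places in seen.items()
        if len({source for source, _ in places}) > 1
    }


for move, places in shared_moves(FIGURE_1).items():
    print(places, "->", move)
```

On these excerpts the only verbatim match is between move 1 of the author abstract (A) and move 2 of the Copernic summary (C), which is what one would expect from an extractive summarizer lifting the article's opening sentence. Paraphrased overlap, such as that between A2 and B1, is not detected by this approach.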
Following genre analysis methodology, all texts were divided into moves, syntactic units that each serve a communicative function. Each move (e.g., Fig. 1, 1-3) was analyzed for its global features (i.e., types of content such as ‘argument/conclusion’, ‘background’, or ‘method/activity’) and local features (i.e., how the content is expressed, such as whether it describes details of the ‘method’ (i.e., is informative) or talks about the ‘method’ indirectly (i.e., is indicative), and what style is used (e.g., mood, voice)). ATS summaries often captured superfluous extracts from texts; hence incidental moves (describing peripheral, incomplete or incoherent text segments) were distinguished from significant moves.

RESULTS AND DISCUSSION
Global and local features vary across summaries and abstracts, while nonetheless demonstrating some consistencies and overlap. Indexed abstracts are the most conservatively structured, varying least across articles. All summaries generated many incidental moves, but also significantly informative units regarding methods, findings, and background. Ten categories of global features were identified (e.g., incidental moves (Fig. 1D1: heading), external moves (Fig. 1C1: authors) or explication moves (Fig. 1B3: examples; Fig. 1D3: definition)). This goes beyond the standard 5 moves of scientific abstracts (following the IMRaD structure of introduction, methods/materials, results, and discussion), and is well suited for LIS literature. Interpreting incidental moves in summaries is challenging since extracted sentences lack the context necessary for proper attribution (e.g., is this claim made by the author or a cited source?).

Moves in abstracts are largely indicative, talking indirectly about the article. This is in keeping with LISA’s guidelines, which instruct that “abstract is not intended to be a replacement for the original article” (LISA, 2006). Summaries are mostly informative, presenting content explicitly, and functioning more like document surrogates. Furthermore, summaries occasionally include segments of text left out of abstracts, such as negative findings or definitions (Fig. 1D3), which are potentially useful to information seekers without full article access. ATS is an imperfect process but can legitimately supplement abstracts. This study confirms and builds upon the findings of Montesi & Owen (2007) by expanding the examination of LISA to all types of indexed abstracts. A larger sample size is necessary to test and refine the observed linguistic patterns that constitute or indicate global and local features.

Findings suggest that ATS summaries could present a useful resource for improved information retrieval systems. In all, we envision combining human cognitive labor with automation to improve the representativeness of documents, what we refer to as a cyborg solution to the plethora of abstracting approaches. This differs from suggestions of semi-automated summarization (Hovy, 2004), which prescribe an editorial role for people to simply improve the readability and coherence of ATS outputs.

CONCLUSIONS
Automatic summaries and human abstracts each bring different perspectives to document representation. This study developed a framework of analysis across summaries and abstracts. It reveals considerable overlaps, differences, and disjunctions among ATS systems, but also between abstractors with different motivations. A merger between abstracting practices and automated summarization offers new horizons for representation that can ultimately benefit information seekers, article writers, and abstracting services.

ACKNOWLEDGMENTS
Thanks to Tyrone Nagai, Senior Supervising Editor of Social Sciences, for his invaluable support and assistance.

REFERENCES
Cross, C. & Oppenheim, C. (2006). A genre analysis of scientific abstracts. J. Doc. 64(4), 428-446.
Hirschman, L. & Mani, I. (2004). Evaluation. In R. Mitkov (Ed.) The Oxford Handbook of Computational Linguistics. Oxford University Press.
Hovy, E. (2004). Text summarization. In R. Mitkov (Ed.) The Oxford Handbook of Computational Linguistics. Oxford University Press.
Library and Information Science Abstracts. (2006). Notes on Abstracting for LISA.
Luhn, H.P. (1958/1999). The automatic creation of literature abstracts. In Mani & Maybury (Eds.) Advances in Automatic Text Summarization. Cambridge: MIT Press.
Moens, M. F. (2000). Automatic indexing and abstracting of document texts. Boston: Kluwer.
Montesi, M. & Owen, J.M. (2007). Revision of author abstracts: How it is carried out by LISA editors. Aslib Proceedings: New Information Perspectives 59(1), 26-45.
Rath, G.J., Resnick, A. & Savage, R.T. (1999). The formation of abstracts by the selection of sentences. In Mani & Maybury (Eds.) Advances in Automatic Text Summarization. Cambridge: MIT Press.
Swales, J.M. (1990). Genre Analysis: English in Academic and Research Settings. NY: Cambridge University Press.
