
The SAGE Handbook of Social Research Methods
Edited by
Pertti Alasuutari,
Leonard Bickman,
Julia Brannen
Editorial arrangement and Introduction © Pertti Alasuutari, Leonard Bickman, Julia Brannen 2008
Chapter 2 © Alan Bryman 2008
Chapter 3 © Marja Alastalo 2008
Chapter 4 © Martyn Hammersley 2008
Chapter 5 © Karen Armstrong 2008
Chapter 6 © Pekka Sulkunen 2008
Chapter 7 © Ann Nilsen 2008
Chapter 8 © Celia B. Fisher and Andrea E. Anushko 2008
Chapter 9 © Howard S. Bloom 2008
Chapter 10 © Thomas D. Cook and Vivian C. Wong 2008
Chapter 11 © Ken Kelley and Scott E. Maxwell 2008
Chapter 12 © Giampietro Gobo 2008
Chapter 13 © Linda Mabry 2008
Chapter 14 © Jane Elliott, Janet Holland and Rachel Thomson 2008
Chapter 15 © David de Vaus 2008
Chapter 16 © James A. Bovaird and Susan E. Embretson 2008
Chapter 17 © Susan A. Speer 2008
Chapter 18 © Edith de Leeuw 2008
Chapter 19 © Andrea Doucet and Natasha Mauthner 2008
Chapter 20 © Joanna Bornat 2008
Chapter 21 © Janet Smithson 2008
Chapter 22 © Suzanne E. Graham, Judith D. Singer and John B. Willett 2008
Chapter 23 © Rick H. Hoyle 2008
Chapter 24 © Stephen G. West and Felix Thoemmes 2008
Chapter 25 © Charles Antaki 2008
Chapter 26 © Matti Hyvärinen 2008
Chapter 27 © Kathy Charmaz 2008
Chapter 28 © Lindsay Prior 2008
Chapter 29 © Christian Heath and Paul Luff 2008
Chapter 30 © Janet Heaton 2008
Chapter 31 © Angela Dale, Jo Wathan and Vanessa Higgins 2008
Chapter 32 © Erika A. Patall and Harris Cooper 2008
Chapter 33 © Jane Fielding and Nigel Fielding 2008
Chapter 34 © Ann Cronin, Victoria D. Alexander, Jane Fielding, Jo Moran-Ellis and Hilary Thomas 2008
Chapter 35 © Manfred Max Bergman 2008
Chapter 36 © Amir Marvasti 2008
First published 2008
Apart from any fair dealing for the purposes of research or private
study, or criticism or review, as permitted under the Copyright,
Designs and Patents Act, 1988, this publication may be reproduced,
stored or transmitted in any form, or by any means, only with the
prior permission in writing of the publishers, or in the case of
reprographic reproduction, in accordance with the terms of licences
issued by the Copyright Licensing Agency. Enquiries concerning
reproduction outside those terms should be sent to the publishers.
SAGE Publications Ltd
1 Oliver’s Yard
55 City Road
London EC1Y 1SP

SAGE Publications Inc.
2455 Teller Road
Thousand Oaks, California 91320

SAGE Publications India Pvt Ltd
B 1/I 1 Mohan Cooperative Industrial Area
Mathura Road
New Delhi 110 044

SAGE Publications Asia-Pacific Pte Ltd
33 Pekin Street #02-01
Far East Square
Singapore 048763
Library of Congress Control Number: 2007929185
British Library Cataloguing in Publication data
A catalogue record for this book is available from the British Library
ISBN 978-1-4129-1992-0
Typeset by CEPHA Imaging Pvt. Ltd., Bangalore, India

Printed in Great Britain by The Cromwell Press Ltd, Trowbridge, Wiltshire
Printed on paper from sustainable resources
Contents
Notes on Contributors
1. Social Research in Changing Social Conditions
PART I: DIRECTIONS IN SOCIAL RESEARCH
2. The End of the Paradigm Wars?
Alan Bryman
3. The History of Social Research Methods
Marja Alastalo
4. Assessing Validity in Social Research
Martyn Hammersley
5. Ethnography and Audience
Karen Armstrong
6. Social Research and Social Practice in Post-Positivist Society
Pekka Sulkunen
7. From Questions of Methods to Epistemological Issues: The Case of Biographical Research
Ann Nilsen
8. Research Ethics in Social Science
Celia B. Fisher and Andrea E. Anushko
PART II: RESEARCH DESIGNS
9. The Core Analytics of Randomized Experiments for Social Research
Howard S. Bloom
10. Better Quasi-Experimental Practice
Thomas D. Cook and Vivian C. Wong
11. Sample Size Planning with Applications to Multiple Regression: Power and Accuracy for Omnibus and Targeted Effects
Ken Kelley and Scott E. Maxwell
12. Re-conceptualizing Generalization: Old Issues in a New Frame
Giampietro Gobo
13. Case Study in Social Research
Linda Mabry
14. Longitudinal and Panel Studies
Jane Elliott, Janet Holland, and Rachel Thomson
15. Comparative and Cross-National Designs
David de Vaus
PART III: DATA COLLECTION AND FIELDWORK
16. Modern Measurement in the Social Sciences
James A. Bovaird and Susan E. Embretson
17. Natural and Contrived Data
Susan A. Speer
18. Self-Administered Questionnaires and Standardized Interviews
Edith de Leeuw
19. Qualitative Interviewing and Feminist Research
Andrea Doucet and Natasha Mauthner
20. Biographical Methods
Joanna Bornat
21. Focus Groups
Janet Smithson
PART IV: TYPES OF ANALYSIS AND INTERPRETATION OF EVIDENCE
22. An Introduction to the Multilevel Model for Change
Suzanne E. Graham, Judith D. Singer and John B. Willett
23. Latent Variable Models of Social Research Data
Rick H. Hoyle
24. Equating Groups
Stephen G. West and Felix Thoemmes
25. Discourse Analysis and Conversation Analysis
Charles Antaki
26. Analyzing Narratives and Story-Telling
Matti Hyvärinen
27. Reconstructing Grounded Theory
Kathy Charmaz
28. Documents and Action
Lindsay Prior
29. Video and the Analysis of Work and Interaction
Christian Heath and Paul Luff
30. Secondary Analysis of Qualitative Data
Janet Heaton
31. Secondary Analysis of Quantitative Data Sources
Angela Dale, Jo Wathan and Vanessa Higgins
32. Conducting a Meta-Analysis
Erika A. Patall and Harris Cooper
33. Synergy and Synthesis: Integrating Qualitative and Quantitative Data
Jane Fielding and Nigel Fielding
34. The Analytic Integration of Qualitative Data Sources
Ann Cronin, Victoria D. Alexander, Jane Fielding, Jo Moran-Ellis and Hilary Thomas
35. Combining Different Types of Data for Quantitative Analysis
Manfred Max Bergman
36. Writing and Presenting Social Research
Amir Marvasti
Index
Notes on Contributors
Marja Alastalo is post-doctoral Research Fellow in the Department of Sociology and Social
Psychology, University of Tampere, Finland. She is interested in history of research methods
and sociology of knowledge and science. Currently she is doing research on the processes of
harmonizing social statistics in the European Union.
Pertti Alasuutari, PhD, is Professor of Sociology and Director of the International School of
Social Sciences at the University of Tampere, Finland. He is editor of the European Journal
of Cultural Studies, and has published widely in the areas of cultural and media studies and
qualitative methods. His books include Desire and Craving: A Cultural Theory of Alcoholism
(SUNY Press, 1992), Researching Culture: Qualitative Method and Cultural Studies (SAGE,
1995), An Invitation to Social Research (SAGE, 1998), Rethinking the Media Audience (SAGE,
1999), and Social Theory and Human Reality (SAGE, 2004).
Victoria D. Alexander is Senior Lecturer in Sociology at the University of Surrey, and is
interested in sociology of the arts, sociology of cultural organizations, and visual methods.
She is author of Museums and Money (Indiana University Press, 1996), Sociology of the Arts
(Blackwell, 2003), and co-author of Art and the State (Palgrave Macmillan, 2005).
Charles Antaki, PhD, is Professor of Language and Social Psychology at the University of
Loughborough, where he is a member of the Discourse and Rhetoric Group. He is Associate
Editor of Research on Language and Social Interaction, and among his books are Identities
in Talk (SAGE, 1998; with Susan Widdicombe) and Conversation Analysis and Psychotherapy
(CUP, 2007; with Anssi Peräkylä, Sanna Vehviläinen, and Ivan Leudar). He has published widely
on language and interaction.
Andrea E. Anushko, MA, is a graduate student in the applied developmental psychology program
at Fordham University and the project coordinator for the Fordham Resident Alcohol Prevention
Program at the Center for Ethics Education. Her research interests include language development
and early education.
Karen Armstrong is Professor of Cultural Anthropology at the University of Helsinki, Finland.
Her research focuses on politics and the narrative construction of national identity. She is the
author of Remembering Karelia (Berghahn, 2004), and is currently doing research on the relation
of American Samoa to the US nation-state.
Manfred Max Bergman is Professor of Sociology at Basel University, Switzerland. His areas
of specialization are political sociology and research methods. His research interests relate to
stratification, identity, and inter-group relations, and his recent publications focus on poverty,
stratification and mobility, mixed methods research, and data quality.
Leonard Bickman, PhD, is Professor of Psychology, Psychiatry and Public Policy. He is
Associate Dean for Research and Director of the Center for Evaluation and Program Improvement,
Peabody College of Vanderbilt University.
Howard S. Bloom, Chief Social Scientist for MDRC, specializes in the design and analysis of
experimental and quasi-experimental studies of causal effects. He has conducted a number of
such studies and has written widely on methodologies for them.
Joanna Bornat is Professor of Oral History in the Faculty of Health and Social Care at the
Open University. She has researched and published in the areas of oral history and ageing for a
number of years. Her current research interests include the secondary analysis of archived data.
James A. Bovaird is Assistant Professor of Quantitative, Qualitative, and Psychometric
Methods (QQPM) in the Department of Educational Psychology at the University of Nebraska-
Lincoln. As a quantitative psychologist (University of Kansas, 2002), his research focuses on
the application of latent variable methodologies to novel substantive contexts and the evaluation
of competing latent variable methodologies in situations of limited inference.
Julia Brannen is Professor of the Sociology of the Family, Institute of Education, University of
London. Her main interests are in research methodology; the family lives of parents, children,
and young people; and the relation between paid work and family life. She is a co-founder
and co-editor of the International Journal of Social Research Methodology. Books include:
Mixing Methods: Qualitative and Quantitative Research (Ashgate, 1992), Connecting Children:
Care and Family Life in Later Childhood (Falmer, 2000), Young Europeans, Work and Family
(Routledge, 2002), Rethinking Children’s Care (OUP, 2003), Working and Caring over the
Twentieth Century (Palgrave, 2004), and Coming to Care (Policy Press, 2007).
Alan Bryman is Professor of Organisational and Social Research, Management Centre,
University of Leicester. His main research interests lie in research methodology, leadership,
organizational analysis, and Disneyization. He is author or co-author of many books, including:
Quantity and Quality in Social Research (Routledge, 1988), Social Research Methods (OUP,
2001, 2004), Business Research Methods (OUP, 2003), and Disneyization of Society (SAGE,
2004). He is co-editor of The SAGE Encyclopedia of Social Science Research (SAGE, 2004),
Handbook of Data Analysis (SAGE, 2004), and the forthcoming Handbook of Organizational
Research Methods (SAGE, 2008).
Kathy Charmaz is Professor of Sociology and Coordinator of the Faculty Writing Program
at Sonoma State University. Her books include Good Days, Bad Days: The Self in Chronic
Illness and Time (Rutgers, 1993) and Constructing Grounded Theory: A Practical Guide through
Qualitative Analysis (SAGE, London), and she has co-edited the forthcoming The SAGE
Handbook of Grounded Theory. She received the 2006 George Herbert Mead award for lifetime
achievement from the Society for the Study of Symbolic Interaction.
Thomas D. Cook has a BA from Oxford and a PhD from Stanford and is a Professor of sociology,
psychology, education and social policy, and Joan and Serepta Harrison Chair in Ethics and
Justice at Northwestern University. His main interests are in social science methodology and
contextual influences on adolescent development.
Harris Cooper is Professor of psychology and Director of the Program in Education at Duke
University. His research interests include research synthesis methodology and applications of
social psychology to education policies and practices.
Ann Cronin, BSc, PhD (Surrey) is Lecturer in Sociology at the University of Surrey. She
teaches a variety of courses relating to social theory, methodology, and the substantive topics
of gender and sexuality. Her research interests lie in the social construction of sexual identities
and qualitative methodologies.
Angela Dale is Professor of Quantitative Social Research at the Centre for Census and
Survey Research, University of Manchester. She is Director of the ESRC’s Research Methods
Programme and heads a team providing support for government datasets as part of the UK’s
Economic and Social Data Service. From 1993 to 2003, she led the academic team responsible for
the development and dissemination of samples of microdata from the UK Census of Population.
Andrea Doucet is Associate Professor in the Department of Sociology and Anthropology,
Carleton University, Ottawa, Canada. She is the author of Do Men Mother? (University of
Toronto Press, 2006) and over two dozen book chapters and articles on mothering and fathering,
gender equality and gender differences, and methodology and epistemology.
Jane Elliott, PhD, is reader of Research Methodology and Principal Investigator of the 1958
and 1970 British Birth Cohort Studies at the Centre for Longitudinal Studies at the Institute
of Education, University of London. She has a long-standing interest in combining qualitative
and quantitative methodologies and has published in the areas of methodology, gender, and
employment. Her book Using Narrative in Social Research: Qualitative and Quantitative
Approaches was published by SAGE in 2005.
Susan E. Embretson is a Professor of psychology at the Georgia Institute of Technology.
Her interests span modern psychometric methods (e.g. item response theory), cognition and
intelligence, and quantitative methods, and her main research program has been to integrate
cognitive theory into psychometric models and test design.
Jane Fielding is Senior Lecturer in Quantitative Sociology, University of Surrey, and teaches
statistics and computing at both undergraduate and postgraduate levels. Recent research projects,
supported by funding from the Environment Agency, include flood warning for vulnerable
groups and the public response to flood warning and, more recently, a study of environmental
inequalities. Her particular interest is in mapping and measuring environmental inequalities using
geographical information techniques. She was also a co-holder on an ESRC Methods Programme
project (2002–2005) exploring the integration of quantitative and qualitative methods in an
investigation of the concept of vulnerability.
Nigel Fielding is Professor of Sociology and co-Director of the Institute of Social Research,
University of Surrey. His research interests are in qualitative research methods, mixed methods
research design, and new technologies for social research. His books include Linking Data
(SAGE, 1986; with Jane Fielding), a study of methodological integration; Using Computers
in Qualitative Research (SAGE, 1991; edited with Raymond M. Lee), an influential book on
qualitative software; Computer Analysis and Qualitative Research (SAGE, 1998; with Raymond
M. Lee), a study of the role of computer technology in qualitative research; and Interviewing
(SAGE, 2002; editor), a four volume set; he is currently co-editing the Handbook of Online
Research Methods (SAGE).
Celia B. Fisher holds the Marie Doty Chair in Psychology at Fordham University where she
also directs the Center for Ethics Education. Her professional interests are in developing ethical
standards for the discipline of psychology and federal guidelines for the protection of vulnerable
populations in research.
Giampietro Gobo, PhD, is Associate Professor of Methodology of Social Research and
Evaluation Methods, and Director of the ICONA (Centre for Innovation and Organizational
Change in Public Administration) at the University of Milan, Italy. He was among the founders
of the Qualitative Methods research network of the ESA (European Sociological Association)
and served as its first chair. He is Associate Editor of the International Journal of
Qualitative Research in Work and Organizations and a member of the editorial boards of
Qualitative Research and International Journal of Social Research Methodology, and he has
published over fifty articles in the areas of qualitative and quantitative methods. His books
include Ethnography into Practice (SAGE, 2007), and he has co-edited Qualitative Research
Practice (SAGE, 2004; with Clive Seale, Jaber F. Gubrium, and David Silverman).
Suzanne E. Graham is an Assistant Professor at the University of New Hampshire. She is
interested in applying methods of longitudinal data analysis to questions about mathematics
course taking and achievement among secondary school and college students.
Martyn Hammersley is Professor of Educational and Social Research at the Open University.
His early research was in the sociology of education. Much of his more recent work has
been concerned with the methodological issues surrounding social and educational enquiry.
His most recent books are Taking Sides in Social Research (Routledge, 2000); Educational
Research, Policymaking and Practice (Paul Chapman, 2002); and Media Bias in Reporting
Social Research? The Case of Reviewing Ethnic Inequalities in Education (Routledge, 2006).
He is currently working on the issue of research ethics.
Christian Heath is Professor at King’s College London, and leads the Work Interaction and
Technology research group. He specializes in video-based studies of social interaction drawing
on ethnomethodology and conversation analysis. He is currently undertaking projects in areas
that include health care, museums and galleries, and auctions.
Janet Heaton, BA (Hons), is Research Fellow at the Social Policy Research Unit, University
of York. She is the author of Reworking Qualitative Data (SAGE, 2004), and has published a
number of articles based on her mainly qualitative research on health and social care services
for patients and their families in the UK.
Vanessa Higgins is based at the Centre for Census and Survey Research, University of
Manchester, where she works for ESDS Government, providing support for research and
teaching using the large-scale government datasets. Prior to this, Vanessa worked at the Office for
National Statistics and also on a number of policy-led research projects within academic settings.
Janet Holland is Professor of Social Research and co-Director of the Families and Social
Capital ESRC research group at London South Bank University. She also co-directs Timescapes:
Changing Relationships and Identities through the Life Course, a multi-university, large-scale
qualitative longitudinal study. Research interests cover youth, education, gender, sexuality and
family life, and methodology, and she has published widely in these areas. Examples are
Sexualities and Society (Polity Press, 2003; edited with Jeffrey Weeks and Matthew Waites);
Feminist Methodology: Challenges and Choices (SAGE, 2002; with Caroline Ramazanoglu);
and Inventing Adulthoods: A Biographical Approach to Youth Transitions (SAGE, 2007; with
Sheila Henderson, Sheena McGrellis, Sue Sharpe, and Rachel Thomson).
Rick H. Hoyle is Research Professor of psychology and neuroscience at Duke University,
where he is Associate Director of the Center for Child and Family Policy and Director
of the Office of Data, Methods, and Research Facilities in the Social Science Research
Institute. His methodological interests include the strategic application of structural equation
modeling to longitudinal and complex cross-sectional data in the social and behavioral
sciences, with a particular focus on statistical strategies for managing the detrimental effects of
measurement error.
Matti Hyvärinen is an Academy of Finland Research Fellow, University of Tampere,
Department of Sociology and Social Psychology. His current project, The Conceptual History
of Narrative, aims to capture the changing and different uses of narrative in literary, social,
and historiographical theory and analysis. He also leads the Politics and the Arts research
team at the Finnish Centre for Excellence in Political Thought and Conceptual Change.
He is a co-editor of the electronic volume The Travelling Concept of Narrative (2006)
at http://www.helsinki.fi/collegium/e-series/volumes/index.htm. Recent work includes Acting,
Thinking, and Telling: Anna Blume’s Dilemma in Paul Auster’s In the Country of Last Things
(Partial Answers 4:2, June 2006). Website: http://www.hyvarinen.info.
Ken Kelley is an Assistant Professor in the Inquiry Methodology Program at Indiana University,
where his research focuses on methodological and statistical issues that arise in the behavioral,
educational, and social sciences. More specifically, Dr. Kelley’s research focuses on the design
of research studies, with an emphasis on sample size planning from the power analytic and
accuracy in parameter estimation approaches, and the analysis of change, with an emphasis on
multilevel change models nonlinear in their parameters.
Edith de Leeuw is an Associate Professor at the University of Utrecht, Department of
Methodology and Statistics and a member and Senior Lecturer of the Interuniversities graduate
school for psychometrics and sociometrics in the Netherlands. Her most recent publications
focus on children as respondents, survey nonresponse, survey data quality, and comparative
research.
Paul Luff is Reader of Organisations and Technology at King’s College, University of London.
His recent publications include Technology in Action (Cambridge University Press, 2000; with
Christian Heath) and numerous articles in journals and books. He is co-editor of Workplace
Studies: Recovering Work Practice and Informing System Design (Cambridge University Press,
2000).
Linda Mabry, Professor of Education at Washington State University Vancouver, specializes
in qualitative research methods in research and evaluation and in the assessment of student
achievement K-12. She has conducted studies for the US Department of Education, National
Science Foundation, National Endowment for the Arts, and others, publishing a number
of articles and books. She is a member of the Board of Trustees for the Center for the
Improvement of Educational Assessments, and a former member of the Board of Directors
for the American Evaluation Association. Her most recent book (co-authored) is RealWorld
Evaluation (SAGE, 2006).
Amir Marvasti is Assistant Professor of Sociology at Penn State Altoona. His research focuses
on social construction and representation of deviant identities in everyday life. He is the author
of Being Homeless: Textual and Narrative Constructions (Lexington Books, 2003), Qualitative
Research in Sociology (SAGE, 2003), and Middle Eastern Lives in America (Rowman &
Littlefield, 2004; with Karyn McKinney). His articles have been published in the Journal of
Contemporary Ethnography, Qualitative Inquiry, and Symbolic Interaction.
Natasha Mauthner is a Senior Lecturer at the University of Aberdeen, where she teaches
courses on qualitative research methods, and gender, work, and organization. She has published
extensively on methodological and epistemological issues in qualitative research. Much of this
work has focused on the links between reflexivity, research practice, and the construction of
knowledge, and the implications for data analysis, data archiving, and the politics of research
management. Her empirical research has focused on issues of gender, work, and family and has
been published in a number of publications including The Darkest Days of My Life: Stories of
Postpartum Depression (Harvard University Press, 2002).
Scott E. Maxwell is Fitzsimons Professor of Psychology at the University of Notre Dame.
He received his Ph.D. from the University of North Carolina at Chapel Hill, and is currently
editor of Psychological Methods.
Jo Moran-Ellis is Senior Lecturer in the Department of Sociology, University of Surrey. Her
research interests are primarily in the areas of childhood studies and research methods, especially
mixed and multiple methods. Her recent projects include a reflexive methodological study
looking at integrating methods (the PPIMs study), public attitudes toward research governance,
and studies of children’s mental health services.
Ann Nilsen is Professor of Sociology at the Department of Sociology, University of Bergen,
Norway. Her areas of interest include biographical and life course methodology, cross-national
research, gender studies, and environmental sociology. In addition to books and articles in
Norwegian and international journals, her publications include a recent co-edited book Young
People, Work and Family: Futures in Transition (Routledge, 2002). She is currently writing a
book on American pragmatist thought and biographical research.
Erika A. Patall is a graduate student in Social Psychology in the Department of Psychology
and Neuroscience at Duke University. Her research interests include research synthesis, as well
as the nature of motivation and the relationship between motivation and academic achievement.
Lindsay Prior is Professor of Sociology at Queen’s University, Belfast. He is the author of
Using Documents in Social Research (SAGE, 2003), and has contributed to various handbooks
and edited collections in the field of social research methods.
Judith D. Singer is the James Bryant Conant Professor of Education at Harvard University and
former academic Dean of the Harvard Graduate School of Education. As one of the nation’s
leading applied statisticians she is primarily known for her contributions to the practice of
multilevel modeling, survival analysis, and individual growth modeling.
Janet Smithson is a post-doctoral Research Fellow in the Schools of Law and Psychology at the
University of Exeter. She has worked on a variety of national- and European-funded research
projects, using both qualitative and quantitative research methods. Her main research interests
are in cross-national comparative research on work–family, youth, transitions to adulthood and
parenthood, gender and discourse, and qualitative methodology. She is currently working on a
Nuffield-funded study ‘The common law marriage myth and cohabitation law revisited’ with
Anne Barlow and Carole Burgoyne, University of Exeter.
Susan A. Speer is a Senior Lecturer in Language and Communication in the School of
Psychological Sciences at The University of Manchester. Her research interests include
conversation analysis, medical interaction, and gender and sexuality (especially transgender).
Her book Gender Talk: Feminism, Discourse and Conversation Analysis was published by
Routledge in 2005. She is Principal Investigator on the project ‘Transsexual Identities:
Constructions of Gender in an NHS Gender Identity Clinic’, which is part of the ESRC’s
Identities and Social Action Research Programme. She is currently working with Elizabeth
Stokoe (Loughborough University) on an edited collection, Conversation and Gender, for
Cambridge University Press.
Pekka Sulkunen, PhD, is Professor of Sociology at the University of Helsinki, Finland. He
has published widely in the areas of alcohol and addiction studies and social theory. His books
include The European New Middle Class (Avebury, 1992) and Constructing the New Consumer
Society (Macmillan, 1997, edited).
Hilary Thomas is Professor of Health Care Research in the Centre for Research in Primary
and Community Care, School of Nursing and Midwifery, University of Hertfordshire. She was
previously Senior Lecturer in the Department of Sociology, University of Surrey. Her substantive
research interests include the sociology of health and illness, particularly reproduction and
women’s health, and recovery from illness and injury. She was convenor of the BSA Medical
Sociology Group (1991–1994) and president of the European Society for Health and Medical
Sociology (1999–2003).
Felix Thoemmes is a graduate student in the Department of Psychology at Arizona State
University with an interest in Latent Class Models, the history of statistics, and some aspects
of evolutionary psychology.
Rachel Thomson is Professor of Social Research in the Faculty of Health and Social
Care at the Open University. Her research interests include youth transitions, gender/sexual
identities, and social change, and she has published widely in these fields. She is part of the
team that conducted a 10-year qualitative longitudinal study of youth transitions (Inventing
Adulthoods) and is currently researching the transition to motherhood. Forthcoming publications
include Researching Social Change: Qualitative Approaches to Personal, Social and Historical
Approaches (with Julie McLeod) published by SAGE in 2008.
David de Vaus is Professor of Sociology and Dean of the Faculty of Humanities and Social
Sciences at La Trobe University, Australia. He is the author of a number of internationally
renowned books on research methods including Surveys in Social Research (Routledge, 2001)
and Research Design in Social Research (SAGE, 2001). His main areas of research are family
sociology, living alone, life course transitions, and the sociology of ageing. Further details are
available at http://www.latrobe.edu.au/humanities/devaus.html.
Jo Wathan is Research Fellow at the Cathie Marsh Centre for Census and Survey Research.
She works as a member of two data support teams for British cross-sectional microdata: ESDS
Government and the Samples of Anonymised Records Support team. She also teaches classes
on statistical software and secondary analysis.
Stephen G. West is currently Professor of psychology at Arizona State University, and was
the editor of Psychological Methods for six years. His research interests are in field research
methods, multiple regression analysis, longitudinal data analysis, and multilevel modeling.
John B. Willett is Charles William Eliot Professor at Harvard University Graduate School of
Education. He is interested in all things quantitative, particularly statistical methods for analyzing
the timing and occurrence of events; methods for modeling change, learning, and development;
and longitudinal research design.
Vivian C. Wong is training to be a Research Methodologist in the field of education. Her interests
include examination of the following areas: recent shifts in methodology choice in education;
empirical tests of quasi-experimental designs such as regression-discontinuity (RD), abbreviated
interrupted time series, and difference-in-differences designs; and issues in implementation and
analysis of regression-discontinuity studies.
1
Social Research in Changing Social Conditions
According to Herbert Blumer (1969), methodology refers to the ‘entire scientific quest’
that has to fit the ‘obdurate character of the social world under study’. Thus methodology
is not some super-ordained set of logical procedures that can be applied haphazardly to
any empirical problem. In short methodology constitutes a whole range of strategies and
procedures that include: developing a picture of an empirical world; asking questions about
that world and turning these into researchable problems; finding the best means of doing
so – that involve choices about methods and the data to be sought, the development and
use of concepts, and the interpretation of findings (Blumer 1969: 23). Methods per se
are therefore only one small part of the methodological endeavor.
In producing this book we address the methodology of social science research and
the appropriate use of different methods. The contributors describe and question different
phases of the research process with many focusing upon one or more methods, often
in combination with others. What unites their contributions is the way they relate
the discussion of method to the broader methodological work in which they were
engaged. Thus, the contributors draw not only upon their own research experiences but
relate their discussions in Blumer’s terms to the larger issue of strategy, that is tailoring
methodological processes to fit the empirical world under study.
Across the social sciences and humanities, there are differences in the development and
popularity of particular methods, differences that are also evident cross-nationally. From
the 1930s onward survey research and statistical methods have assumed a dominant
position, whereas qualitative methods have gained ground more recently. There has also
been a recent resurgence of interest both in the social sciences and humanities in
quantitative methods and in mathematical modes of inquiry, for example, fuzzy logic
(Ragin 2000). Mixing different methods (e.g. Goldthorpe et al. 1968) and the innovative
use of statistical analysis (e.g. Bourdieu 1984) are not, however, recent phenomena. The
growth of explicit interest in mixed-methods research designs dates from the late 1980s,
resulting in a number of specialist texts (Brannen 1992, Bryman 1988, Creswell 2003,
Tashakkori and Teddlie 2003) but the practice has historically been intrinsic to many types of
social science research. In qualitative research, many researchers have incorporated several
quantitative approaches such as cross-tabulation of their data (Alasuutari 1995, Silverman 1985,
2000); and some have adopted a multivariate approach (Clayman and Heritage 2002). In 1987 Charles
Ragin published his text on qualitative comparative methods (Ragin 1987), which lies in between
qualitative and quantitative methods and draws upon logic rather than statistical probability.
Historically there has been a plurality of practices of social research.

What distinguishes the social sciences today is a positive orientation toward engaging in
different types of research practice. Present-day scholars undertaking empirical research view
methods as tools or optics to be applied to several different kinds of research questions that
they and their funders seek to address in carrying out research. Coding observations and
subjecting them to statistical processes is one way of creating and explaining patterns. Case
study and comparative approaches are others: the explication of the logic that brings together
the clues about a case and has an explanatory purpose with reference to other cases. These two
approaches can also be combined as in embedded case studies that employ both a case study design
and a survey design.

Although qualitative and quantitative methods have evolved from very different scientific
traditions as, among others, Charles Ragin (1994) points out, from the viewpoint of how empirical
data are used to validate and defend an interpretation, they form a continuum. It can be argued
that the two concepts, ‘qualitative’ and ‘quantitative’, are not so much terms for two
alternative methods of social research as two social constructs that group together particular
sets of practices (see Chapter 2). For instance, quantitative research draws on many kinds of
statistical approaches and is not necessarily epistemologically positivistic in orientation.
While the social survey is the current dominant, paradigmatic form, there is no uniform
‘quantitative research’. Similarly, there is no uniform ‘qualitative research’ either. Because
much of the craft of empirical social research cannot be classified as either qualitative or
quantitative, an increased permissiveness toward mixing methods and questioning of the binary
system formed by the terms ‘qualitative’ and ‘quantitative’ are welcome trends.

In this new paradigmatic situation many contemporary scholars no longer regard it as reasonable
to divide the field of methodology into opposing camps. On the one hand, researchers are willing
to learn more about the possibilities of applying survey methods and statistics to their data
analysis. On the other hand, what is known as ‘qualitative research’ has gone a long way since
Malinowski’s (1922) principles of ethnography or Glaser and Strauss’ (1967) grounded theory.
Different methods of analyzing talk, texts and social interaction have multiplied the ‘optics’
available to scholars who want to study social reality from different viewpoints.

This book charts the new and evolving terrain of social research methodology in an age of
increasing pluralism. By putting together different approaches to the study of social phenomena
within a single volume, the Handbook serves as an invaluable resource for researchers who wish to
approach research with an open mind and decide which methodological strategies to adopt in
empirical research in order to understand the social world. Given the scope of the field of
social research methodology, this volume concentrates on mapping the field rather than discussing
each and every aspect and method in detail. In this way the Handbook serves not only as a manual
but also as a roadmap. If and when the reader wants to learn more about a particular aspect of
methodology or method, he or she can consult other literature.

CHALLENGING THE PROGRESS NARRATIVE

Why social research seems to be heading toward greater open-mindedness in methodological
strategies can easily be interpreted
as proof of scientific progress. It is tempting to think that after decades of hostility between
different methodological camps, notably between qualitative and quantitative researchers, we have
now finally acquired the wisdom to see that the best results can be achieved by addressing
different ways of framing research questions and by bringing to bear the means to ensure the
validity of data analysis and interpretation. This may imply the use of a mixed method design; in
qualitative research it may mean employing innovative approaches such as hypermedia or, in social
surveys, multi-mode approaches. When researchers adopt new methods they will require the guidance
of methodological texts. The Handbook represents our attempt to provide such guidance.

When discussing developments in social research methodology, it is also common to justify change
through a narrative in which problems and omissions in past research practices and paradigms have
led to new approaches. For instance, in the influential Handbook of Qualitative Research, Denzin
and Lincoln recount the development of qualitative research in terms of a progress narrative
(Denzin and Lincoln 2000). According to them, the history of qualitative research in the social
and behavioral sciences consists of seven moments or periods: the traditional (1900–1950); the
modernist or golden age (1950–1970); blurred genres (1970–1986); the crisis of representation
(1986–1990); the postmodern, a period of experimental and new ethnographies (1990–1995);
post-experimental inquiry (1995–2000); and the future (2000–). As informative as their
description of the development of qualitative research is, their story also testifies to the
problems and dangers of such a narrative. Despite their caveats, their progress narrative
functions implicitly as an enlightenment discourse, suggesting where up-to-date, well-informed
researchers should be heading if they are not already there and likewise identifying exemplary
studies that represent the avant-garde or the cutting edge of present-day qualitative research.
It is hardly a surprise that the researchers in question are a very small band and that
practically all of them are American, because both authors come from the United States. Moreover,
the closer to the present, the more frequently there are new moments, and the narrower the group.

To follow suit in this book, it would be quite easy to find good reasons for arguing that the
methods represented here are a natural outcome of scientific progress in social research
methodology. One such argument may be that scientific progress constitutes the closure of the gap
between qualitative and quantitative methods; that by pursuing a multi-method approach we can
best tackle the tasks of the social sciences in today’s society.

Even though we are not unsympathetic to such a view, there are also problems with that argument.
Unlike natural science, whose development can be described as the vertical accumulation of
knowledge about the laws of nature, human sciences are quite different. They are more like a
running commentary on the cultural turns and political events of different societies,
communities, institutions and groups that change over time. Social science research not only
speaks to particular social conditions; it reflects the social conditions of a society and the
theories that dominate at the time. Because there is no unidirectional progress in social and
societal development, the theoretical and methodological apparatus available to social scientists
change as they too are shaped by historical, structural and cultural contexts. The notion that
eventually methodology may consist in a collectively usable toolbox of methods is illusory.
Methodological traditions vary across societies and they are also subject to fashion with some
more popular at one moment in time and in a particular context than others. In any case it is
rare for a wholly new method to be developed.

METHODOLOGICAL PLURALISM AND EVIDENCE-BASED RESEARCH

From this viewpoint, changes in social research must always be seen in their social
and historical contexts. Thus, our assumption that there is a trend toward greater permissiveness
in methodology stems from our own experience as scholars working in countries that belong to the
Organisation for Economic Co-operation and Development (OECD)¹. In addition, our experience stems
from primarily following the English language literature. According to our analysis, that trend
is due to the position that social research has been required to adopt. During recent decades,
the OECD countries have experienced a climate of increased accountability in public expenditure
and a requirement that research should serve policy ends and ‘user’ interests². In particular the
promotion and dominance of the concept of new public management by the OECD and its member
countries is a key factor. As part of the growing pervasiveness of neoliberal principles, public
policy decisions are required to be grounded in evidence-based, scientifically validated
research. This has also led to developments in social science research: the ‘systematic review’
process, one of the catchwords also promoted by the OECD, has become a major area of
methodological investment in the social sciences.

For instance in the United States, although the emphasis on policy is not as strong, the
tradition of action research and the accountability of research to a diversity of ‘user’ groups
is longstanding. Program evaluation is a significant player in the policy environment. Most
government agencies require that their demonstration programs be evaluated. One research agency,
the Institute of Education Sciences, has in the last few years shifted to rigorous randomized
experiments. There are forces promoting evidence-based treatments in health, mental health and
education. Even though the evidence-based medicine approach originated in Great Britain, the
United States is emphasizing the existence of such evidence in the funding of health and mental
health services. The U.S. Department of Education, through its No Child Left Behind programme, is
requiring quantitative evidence of academic improvement. The establishment of the Campbell
Collaboration, modeled after medicine’s Cochrane Collaboration, focuses on systematic evidence of
the effectiveness of programs in mental health, education and criminal justice. At the federal
level of government the agencies themselves are now responsible for providing formal reviews of
their agency’s performance through the Government Performance and Results Act (GPRA).

The systematic review of social research evidence is widespread in quantitative research whose
quality is seen to be measurable in ‘scientific terms’. Systematic review is also being applied
to qualitative research, a process that is requiring researchers in this genre to develop more
rigorous and convincing arguments for their evidence as well as criteria against which such
studies may be measured.

Social research is also affected by the increasing prevalence of cross-disciplinary pilot or
applied projects that serve as tools to develop solutions to social, economic and environmental
problems. Typically such projects, often developed in co-operation between public, private and
civil society sectors, include a practical research element and the evaluation of results. One of
the aims is to generate ‘best practices’ that are to be promoted worldwide³. Such a model for the
improvement of governance creates new roles and requirements for social research. First, the
close co-operation of researchers with policy-makers and the merging of the roles of project
manager and researcher challenge the ideals of rigorous science, thus creating an increased
interest in action research methodology. Second, the evaluation of pilot or demonstration
projects has contributed to the further development of a whole evaluation research industry.
Additionally, the marketing of such pilot projects as best practice creates an aura of research
as scientifically systematic, although the emphasis is on practical, policy-directed research.

The growing market for policy-directed and practice-oriented social research does not necessarily
or directly affect academic social science the same way in all contexts. In some contexts
universities need to complement
shrinking public funding with money from external sources, while in other countries such as the
UK universities are increasingly being seen and run as businesses, with research income from
external sources sought at ‘full economic cost’. Within Academe, one consequence of the growing
market of policy-directed research is that the position of traditional disciplines is weakened as
a result of the growth of cross-disciplinary theme-based research programmes, which are fishing
in the new funding pools of research and development. This, in turn, affects the field of
methodology. Cross-disciplinary applied research improves the transfer of knowledge between
hitherto bounded disciplines, thus constructing methodology as an arena and area of expertise
that spans disciplines.

In some ways, this has also meant that methodology has become a discipline in itself, or at least
it has assumed part of the role of traditional disciplines. Vocational apprenticeships conducted
within a particular discipline have been overtaken by training courses for the new generation of
researchers who are schooled in a broad repertoire of methods. While it is always useful to
master a large toolbox of methods, the danger is that without a strong link between theory and
practice via a particular discipline, for example sociology, people lack what C. Wright Mills
(1959) called the ‘sociological imagination’. As methodology acquires a higher status across all
the social sciences and more emphasis is placed on displaying methodological rigour, there is the
need to be mindful of Lewis Coser’s admonition to the American Sociological Association in 1975
against producing researchers ‘with superior research skills but with a trained incapacity to
think in theoretically innovative ways’ (Coser 1975).

THE RELEVANCE OF QUALITATIVE RESEARCH

In recent years advanced capitalist societies have indeed witnessed increasing methodological
pluralism and a resurgence of interest in quantitative methods. This development, however, must
be seen against the larger picture in which qualitative research can be placed at the forefront,
because qualitative methods have gained popularity particularly during the past two or three
decades. Despite increasingly pluralist attitudes toward quantitative methods, a major proportion
of British sociologists, for instance, conduct qualitative inquiries. A recent study shows that
only about one in 20 published papers in the mainstream British journals uses quantitative
analysis (Payne et al. 2004). The figures are about the same in Finland (Räsänen et al. 2005),
and the same trend, a forward march of qualitative research particularly from the 1990s onward,
can also be detected in Canada (Platt 2006) and the U.S. (Clark 1999).

The increase in the popularity of qualitative methods has coincided with new theoretical trends
that have many names. One talks, for instance, of a linguistic or cultural turn, or about
interpretive social science. Overall, we could say that constructionist approaches have gained
ground from scientific realism and structural sociology. Along with this paradigm shift, personal
experience, subjectivity and identity have become key concerns for many social researchers. For
instance in British sociology, as Carl May (2005: 522) points out, ‘after the political watershed
of the early 1980s, much explicitly Marxist analysis disappeared, to be subsumed by social
constructionism and postmodern theoretical positions that also privilege subjectivity and
experience over objectification and measurement’. He emphasizes that in different ways,
subjectivity seems to have been one of the central concerns of British sociology since the 1980s,
which according to him also explains the popularity of qualitative investigation. Indeed, as
noted above, only about one in 20 published papers in the mainstream British journals uses
quantitative analysis (Payne et al. 2004).

An interest in cultural studies and constructionist research grew up out of a desire by social
scientists to distance themselves from economistic Marxism and structural sociology, particularly
in the UK. Other political
influences were also important. For example, under the influence of the Women’s Movement in the
1970s feminist social scientists sought to address gender inequality and to focus upon women’s
perspectives in public and private spheres. By the early 1980s qualitative research had
established a foothold, and by the early 1990s qualitative methods had become mainstream in
Finnish sociology (Alastalo 2005) and pervasive in the UK. Theory-wise, different strands of
constructionist thought have gained popularity, and the development has meant an increased
interest in questions of identity.

In the United States qualitative research developed particularly in response to ‘scientistic’
sociology and to research techniques that require a deductive model of hypothesis testing. The
more inductive approach of qualitative research was seen not only as a better way to explain
social phenomena by understanding the meaning of action, but it was also seen as a way to ‘give
voice’ to the underdog, to help see the world from the viewpoint of the oppressed rather than the
oppressor (Becker 1967, Becker and Horowitz 1972). As in European sociology, the rise of
qualitative research has meant a trend ‘away’ from determinism to active agency and to questions
of subjectivity.

It seems that the increased interest in qualitative research is partly due to recent policy
changes, which have foregrounded questions of subjectivity in many ways. For instance, when
public services are marketized or privatized and citizens are turned into customers, there is
demand for expertise on subjectivity (Rose 1996: 151). Sometimes the link between policy changes
and an increasing demand for qualitative research can be quite direct. For instance, when the
deregulation of the Finnish electronic media system started during the first part of the 1980s,
YLE, the national public broadcasting company, quickly launched a fairly big qualitative research
program to study the audiences, their way of life and viewing preferences to fight for its share
of the audience. There appears to be a similar link between media research and changes in media
policy throughout the OECD countries: while the deregulation of public broadcasting, promoted and
reviewed by the OECD (OECD 1993, 1999), was started during the 1980s, reception studies and
qualitative audience research gained in momentum from the 1970s onward⁴. For the most part,
however, the increased interest in subjectivity and identity construction within academic
(qualitative) research is only indirectly related to its policy relevance.

THE IMPORTANCE OF REFLECTIVITY

All in all, social research is being forced to perform a more strategic role in society than
hitherto. Our argument is not that this strategic role is the sole determinant of developments in
social research, or the kinds of research methods that are used. However, we think it is
important for social scientists to be conscious of the social conditions of our profession. In
that way we are likely to be better equipped to meet the changing demands upon us, for instance
the need to argue for the methodological strategies we employ and the way we interpret our data.
On the one hand, we need to retain a sense of integrity about the claims we make for our research
evidence while, on the other, we need to take part in a dialogue with the funders and users of
social research. Reflectivity about the position of social scientists and their public role will
enable them to retain a critical edge toward research.

Under the present conditions in which social research has an increasingly close link with
policy-makers and methodology is assuming higher status in the social sciences, it is more
important than ever to emphasize that methods cannot be seen as separate from the ‘entire
scientific quest’ and should include the inspiration of theory. This is the spirit of this book.
It is meant to be an aid to researchers in their attempt to perform innovative research. As
researchers have always known, one of the keys to good research is to challenge one’s own
assumptions and to carry out the study in
such a way that the data have the possibility of surprising the researcher.

USING THE HANDBOOK

The Handbook is structured around the different phases of the research process: research design,
data collection and fieldwork, and the processes of analyzing and interpreting data. First,
however, it begins with several chapters of more overarching importance that set out some
important current issues and directions in social research, such as the history and present state
of social research, the debate about research paradigms, the issue of judging the credibility of
different types of social science research, and the importance now being placed upon research
ethics.

The contents of the Handbook have several features that are not present in all such texts. As
well as ranging widely across the field of social research methodology, we have been selective in
including a number of chapters that discuss the combining of qualitative and quantitative methods
and integrating different types of data. The book is also particularly strong in its section on
data analysis and includes four chapters on the analysis of quantitative data, five devoted to
qualitative data analysis, and three to the integration of data of different types. It also
covers the secondary analysis of qualitative and quantitative data with one chapter on
meta-analysis, and another on writing up and presentation of social research.

NOTES

1 Originally set up in 1947 with support from the United States and Canada to co-ordinate the
Marshall Plan for the reconstruction of Western Europe after World War II, today the OECD
consists of 30 member countries sharing a commitment to democratic government and the market
economy. It plays a prominent role in fostering good governance in the public service and in
corporate activity and helps governments to ensure the responsiveness of key economic areas with
sectoral monitoring. By deciphering emerging issues and identifying policies that work, it helps
policy-makers adopt strategic orientations. It is well known for its individual country surveys
and reviews.
2 European Union funding requires research that produces ‘impacts’ and addresses the concerns of
the social partners.
3 For this task, there is an international Best Practices database, maintained by the United
Nations, UNESCO and non-profit organizations (http://www.bestpractices.org/index.html).
4 For the development of qualitative audience research, see Alasuutari 1999.

REFERENCES

Alastalo, Marja (2005) Metodisuhdanteiden mahti: Lomaketutkimus suomalaisessa sosiologiassa 1947–2000 [The Power of Methodological Trends: Survey Research in Finnish Sociology 1947–2000]. Tampere: Vastapaino.
Alasuutari, Pertti (1995) Researching Culture: Qualitative Method and Cultural Studies. London: Sage.
Alasuutari, Pertti (1999) ‘Three Phases of Reception Studies.’ Pp. 1–21 in Rethinking the Media Audience: The New Agenda, edited by Alasuutari, Pertti. London: Sage.
Becker, Howard S. (1967) ‘Whose Side Are We On?’ Social Problems 14(3): 239–47.
Becker, Howard S. and Irving Louis Horowitz (1972) ‘Radical Politics and Sociological Research: Observations on Methodology and Ideology.’ American Journal of Sociology 78(1): 48–66.
Blumer, Herbert (1969) Symbolic Interactionism: Perspective and Method. Berkeley, CA: University of California Press.
Bourdieu, Pierre (1984) Distinction: A Social Critique of the Judgement of Taste. London: Routledge & Kegan Paul.
Brannen, Julia (1992) Mixing Methods: Qualitative and Quantitative Research. Aldershot: Avebury.
Bryman, Alan (1988) Quantity and Quality in Social Research. London: Unwin Hyman.
Clark, Roger (1999) ‘Diversity in Sociology: Problem or Solution?’ American Sociologist 30(3): 22–41.
Clayman, Steven E. and John Heritage (2002) ‘Questioning Presidents: Journalistic Deference and Adversarialness in the Press Conferences of U.S. Presidents Eisenhower and Reagan.’ Journal of Communication 52(4): 749–75.
Coser, Lewis A. (1975) ‘Presidential Address: Two Methods in Search of a Substance.’ American Sociological Review 40(6): 691–700.
Creswell, John W. (2003) Research Design: Qualitative, Quantitative, and Mixed Methods Approaches. 2nd ed. London: Sage.
Denzin, Norman K. and Yvonna S. Lincoln (2000) ‘Introduction: The Discipline and Practice of Qualitative Research.’ Pp. 1–28 in Handbook of Qualitative Research, 2nd ed., edited by Denzin, Norman K. and Yvonna S. Lincoln. Thousand Oaks: Sage.
Glaser, Barney G. and Anselm L. Strauss (1967) The Discovery of Grounded Theory: Strategies for Qualitative Research. Chicago: Aldine Transaction.
Goldthorpe, John H., David Lockwood, Frank Bechhofer and Jennifer Platt (1968) The Affluent Worker: Industrial Attitudes and Behaviour. Cambridge: Cambridge University Press.
Malinowski, Bronislaw (1922) Argonauts of the Western Pacific. London: G. Routledge & Sons.
May, Carl (2005) ‘Methodological Pluralism, British Sociology and the Evidence-based State: A Reply to Payne et al.’ Sociology 39(3): 519–28.
Mills, C. Wright (1959) The Sociological Imagination. New York: Oxford University Press.
OECD (1993) ‘Competition Policy and a Changing Broadcast Industry.’
OECD (1999) ‘Regulation and Competition Issues in Broadcasting in the Light of Convergence.’
Payne, Geoff, Malcolm Williams and Suzanne Chamberlain (2004) ‘Methodological Pluralism in British Sociology.’ Sociology 38(1): 153–63.
Platt, Jennifer (2006) ‘How Distinctive Are Canadian Research Methods?’ Canadian Review of Sociology & Anthropology 43(2): 205–31.
Ragin, Charles C. (1987) The Comparative Method: Moving Beyond Qualitative and Quantitative Strategies. Berkeley: University of California Press.
Ragin, Charles C. (1994) Constructing Social Research: The Unity and Diversity of Method. Thousand Oaks: Pine Forge Press.
Ragin, Charles C. (2000) Fuzzy-Set Social Science. Chicago: University of Chicago Press.
Rose, Nikolas (1996) Inventing Our Selves: Psychology, Power, and Personhood. Cambridge, England; New York: Cambridge University Press.
Räsänen, Pekka, Jani Erola and Juho Härkönen (2005) ‘Teoria ja tutkimus Sosiologia-lehdessä [Theory and research in the Sosiologia journal].’ Sosiologia 42(4): 309–14.
Silverman, David (1985) Qualitative Methodology and Sociology: Describing the Social World. Aldershot: Gower.
Silverman, David (2000) Doing Qualitative Research: A Practical Handbook. London: Sage.
Tashakkori, Abbas and Charles Teddlie (2003) Handbook of Mixed Methods in Social and Behavioral Research. London: Sage.
PART I
Directions in Social Research
What is the state of the art of social research? What are its new directions in terms of methods,
credibility, ethical questions, and its relationship to the users of research? As was discussed
in the introduction, to understand better the current trends we need to place them in historical
and societal context. Social research does not only follow its own logic of scientific progress
but rather responds to and at times also influences social change.

Part I of this book discusses the current state of social research and places it in historical
context. The chapters approach the present condition of social research from different angles and
complement each other in producing a picture of the field, in which some of the earlier
controversies or tensions are left behind and new ones emerge.

It is interesting that methodology, as the means of knowing, has become a forum for furious
disputes, generally known as the paradigm wars. More generally, there is a tendency in the field
of social science for researchers to define themselves and the other in terms of differentness.
As Alan Bryman argues in Chapter 2, while these differences are referred to as paradigms or
philosophical positions in practice they often represent technical decisions about the use of
methods – qualitative or quantitative. In a similar vein, Marja Alastalo points out in Chapter 3
that the paradigm wars between qualitative and quantitative methods have contributed to an
exaggerated distinction between two camps, when in fact social researchers using quantitative
methods have always been innovative and pragmatic in applying different approaches. Because of
the focus upon differences between methodologies, we tend to miss the continuing diversity that
exists within qualitative and quantitative research. On the other hand, as Bryman (Chapter 2)
notes, there is a hierarchy of status given to particular research designs within the
quantitative tradition in which experimental methods with their superiority in offering causal
explanations are positioned at the top. In contrast, qualitative research is represented by
diversity rather than hierarchy. The trend is, however, towards an increase in the explicit use
of mixed methods research designs and a growing pragmatism and diversity in the ways in which
such researchers view the integration of qualitative and quantitative data.

Why is it, then, that the self-identity of social researchers is caught up in the idea of
incommensurable paradigms, which tends to exaggerate differences and downplay diversity and a
pragmatic use of methods? One possible explanation is given by Marja Alastalo in Chapter 3, in
which she laments the scarcity of empirical research about the history of social research.
Instead, method textbooks, for instance, contain histories of methodological development that aim
at legitimating the writers’ own approaches.
Such descriptions tend to paint a picture of the field in black and white and ignore details that do not fit nicely into stereotypical representations of the different camps. For instance, in many accounts of the history of social research, the contradiction between case study and statistical methods is presented in terms of differences of tradition in the universities of Chicago and Columbia. Such accounts ignore the fact that the Chicago School, often mentioned as the birthplace of case study, also contributed to quantitative social research, and the Columbia School was a dominant force in the development of qualitative research. All in all it is evident that despite the paradigm wars between case study and statistical research, or qualitative and quantitative approaches, in actual practice many social researchers have always been quite flexible in applying different methods.

Currently methodological pluralism is on the rise, and this development calls for a rethinking of the nature of research, both quantitative and qualitative, and of how it can be assessed. Reflecting the exaggerated contrast drawn between qualitative and quantitative methods, it is often suggested that quantitative research has a clear set of assessment criteria, whereas in the case of qualitative inquiry no agreed validity criteria are available. However, Martyn Hammersley argues in Chapter 4 that the general standards in terms of which both the process and products of research should be judged are the same whichever approach is employed. Hammersley stresses that whether we are talking about quantitative or qualitative inquiry, there cannot be tests that measure validity; there is no substitute for judgement.

In addition to aiming at true findings or conclusions in their inquiries, social researchers also need to think about the questions they pose in their research. From which perspectives are they relevant, and whose interests does the knowledge produced serve? In light of Michel Foucault’s (Foucault 1977, 1980a, 1980b) point about the power-knowledge couplet, it is evident that no neutral observer position exists. Instead, forms of knowledge imply and produce forms and relations of power. However, this does not mean that researchers can select a standpoint and an audience of their own choice and only produce knowledge that serves interests of which they approve. First, as Karen Armstrong (Chapter 5) remarks, researchers are dependent on research funding; this affects the topics they study and often reflects the influence of dominant interests in society. Second, the audiences of ethnography with which Armstrong deals are increasingly global. The text may be written from a perspective of a Western academic – ‘we’ – but as Armstrong points out, the audience may be any number of people with an interest in the place, the topic, or for other reasons. Ethnographers – and other social researchers – are faced, therefore, with the situation in which data are collected from a variety of people who themselves have a variety of interests, while a variety of readers bring their own interests to understanding the text. Thus the work produced will be read for its relevance by readers who assign meaning to it according to their own evaluations.

The observation that social research has an increasingly diverse audience and serves the interests of a diversity of social groups, as reflected in the trend towards participatory methods, is part of the general picture of the changes taking place in the role of social inquiry in advanced capitalist societies. These changes are outlined from different perspectives by Marja Alastalo (Chapter 3), Pekka Sulkunen (Chapter 6) and Ann Nilsen (Chapter 7). As Pekka Sulkunen discusses, there has been a major trend over the last three decades from Mode 1 ‘pure’ science to Mode 2 knowledge production, in which the latter relies on pragmatic criteria of evaluation and is trans-disciplinary (Gibbons et al. 1994).

This change in the role of social science knowledge in society is part of the regime change from Keynesian liberalism to neoliberalism, in which there has been a move from ‘resource steering’ to ‘market steering’ within public administration and in the privatization of many public services. The change has
affected social research in several ways. On the one hand, structural functionalism and other holistic theories of society, which served the interests of Keynesian-planned economy, have been challenged by constructionist approaches, which direct attention to questions of subjectivity and identity. Because the regulation of human beings is increasingly based on one’s own ability to foresee and manage ‘choices’, there is demand for expertise in subjectivity (Rose 1996: 151). Consequently, qualitative research has gained in momentum from the 1970s onwards. On the other hand, the requirement that public policies and practices are grounded in evidence-based, scientifically validated research has also gained in momentum since the early 1990s (Dixon-Woods et al. 2006: 27). That is one reason why there is increased demand for quantitative research skills. Under these conditions it is predictable that along with the attitude of methodological pluralism there continues to be tension between realist and constructionist approaches, as discussed by Ann Nilsen in Chapter 7.

Although the role of social research in society is changing, its importance is not decreasing. As Celia B. Fisher and Andrea E. Anushko (Chapter 8) argue, increased public recognition of the value of social research has been accompanied by a heightened sensitivity to the obligation to conduct social science responsibly. Ensuring ethical competence in social research is a difficult task for social researchers and for institutional review boards. Social scientists are additionally challenged because of the historical biomedical bias in the way in which ethical questions are perceived and handled. More generally they are challenged by increased open access to information (Freedom of Information laws) and increased legal protection of informants.

REFERENCES

Dixon-Woods, Mary, Sheila Bonas, Andrew Booth, et al. (2006) ‘How Can Systematic Reviews Incorporate Qualitative Research? A Critical Perspective.’ Qualitative Research 6(1): 27–44.
Foucault, Michel (1977) Discipline and Punish: The Birth of the Prison. London: Penguin Books.
Foucault, Michel (1980a) The History of Sexuality/Vol. 1. An Introduction. New York: Vintage Books.
Foucault, Michel (1980b) Power/Knowledge: Selected Interviews and Other Writings, 1972–1977. Brighton, Sussex: Harvester Press.
Gibbons, Michael, Camille Limoges, Helga Nowotny, et al. (1994) The New Production of Knowledge: The Dynamics of Science and Research in Contemporary Societies. London: Sage.
Rose, Nikolas (1996) Inventing Our Selves: Psychology, Power, and Personhood. Cambridge, England; New York: Cambridge University Press.
2
The End of the Paradigm Wars?
Alan Bryman
INTRODUCTION

The term ‘the paradigm wars’ is not easy to pin down, in that there is likely to be some debate about which paradigms were involved and the dates signalling the beginning and the end of the conflict (in addition, of course, to the matter of whether there really has been a cessation of hostilities). One of the main meanings of the term in social research and kindred fields is the reference to the debates that have raged about the merits and assumptions of quantitative and qualitative research, although alternative terms are sometimes employed to express these contrasting positions. This is certainly the meaning that can be gleaned from such prominent writers as Hammersley (1992) and Oakley (1999). This was also one of the battle lines in an article by Gage (1989) which was one of the earliest uses of the term, although he employed alternative terms to quantitative and qualitative research.

The paradigm wars in this sense centre on the contrasting epistemological and ontological positions that characterize quantitative and qualitative research and their various synonyms. At the level of epistemology, there is the issue of the desirability of a natural scientific programme for social research, as against one that eschews scientific pretensions and the search for general laws and instead emphasizes humans as engaged in constant interpretation of their environments within specific contexts. This contrast is one that is frequently drawn up in terms of a battle between positivist philosophical principles and interpretivist ones, based on general theoretical and methodological stances, such as phenomenology, symbolic interactionism and a verstehende approach to social action. At the ontological level, there is a contrast between a belief that there is a social realm waiting to be uncovered by the social researcher and which exists externally to actors and, on the other hand, a domain that is in a continuous process of creation and recreation by its participants. This contrast is often drawn up in terms of a contrast between objectivist and constructionist accounts of the nature of society. Quantitative research is typically associated with a positivist and objectivist stance, while qualitative research is associated with an interpretivist and constructionist one. However, the often stark contrasts that are sometimes drawn up in accounts of the differences between quantitative and
qualitative research possibly exaggerate the differences between them.

It is striking that this contrast is drawn up in predominantly philosophical terms. The presence or absence of quantification, as symbolized by the terms quantitative and qualitative research, is not the issue that is the focus of conflict between the warring parties; rather, quantification and its absence act as ciphers for the underlying philosophical issues. Had the issue that divides the parties simply been a technical matter of the desirability or otherwise of quantification, it is likely (or at least possible) that the differences between the proponents of quantitative and qualitative research would not have been as intractable as they have been. It is the fact that debate about quantitative and qualitative research is to do with such fundamental philosophical matters as how humans and their society should be studied and the very nature of ‘the social’ that has contributed towards making the paradigm wars so resistant to mediation, although the parties sometimes alternate between philosophical and technical discourses (Bryman, 1984, 1988). Quite why philosophical issues became entwined with matters of research practice to this degree is unclear. One factor may be that drawing on philosophical ideas provided an intellectual rationale and legitimacy for qualitative research as it emerged from the shadows of quantitative research in the 1970s. Indeed, our understanding of quantitative research and its philosophical bases and biases is largely founded on the account of it provided by qualitative researchers since that time (Brannen, 2006). Quantitative researchers tend to be less reflective than qualitative researchers concerning the fundamental nature of their approach.

THE ISSUE OF INCOMPATIBILITY

The association of the two approaches with the idea of paradigms represented an implicit reference to the influential work of the American historian of science Thomas Kuhn (1970). Kuhn memorably argued that a science proceeds through successive scientific revolutions whereby one paradigm of scientific understanding is replaced by another. A paradigm, then, represents a cluster of beliefs about the proper conduct of science. One further important element in Kuhn’s argument was that paradigms within a field are incompatible. Their fundamental beliefs cannot be reconciled. There is no common ground between paradigms in terms of their underlying tenets.

One of the over-riding implications of construing quantitative and qualitative research as paradigms in Kuhn’s sense, and therefore as incompatible approaches, was that, for many commentators, it was not appropriate to combine them in an investigation. In other words, it denied the legitimacy of conducting a research project in a manner that combined, say, a survey with unstructured interviewing or with any other research method associated with qualitative research. While the term ‘paradigm wars’ may seem a rather dramatic – some might say overly dramatic – way of characterizing the debates that were going on about methodological issues, it does give a sense of the intensity of these debates.

Whether it is justifiable to treat quantitative and qualitative research as paradigms is a separate issue. It is probably the case that it is quite inappropriate to designate them as paradigms because neither of them can be viewed as indicative of the normal science of a discipline, which is how Kuhn employed the term, although it has to be recognized that his use of the term was somewhat slippery. Quantitative and qualitative research are probably closer to being ‘pre-paradigms’. As Kuhn noted: ‘it remains an open question what parts of social science have yet acquired … paradigms at all’ (1970: 15). However, the language of scientific paradigms is deeply ingrained in many discussions of social research methods and even when the term is not used, there is a sense that the ‘paradigmatic mentality’ (Hammersley, 1984) lies behind those discussions. Moreover, the notion of incommensurability is deeply ingrained so that any recourse to the language
of paradigms tends to be associated with a sense of the differentness and incompatibility of approaches.

One of the most influential statements revealing a preoccupation with paradigms is Burrell and Morgan’s (1979) account of the ways in which organization theory could be viewed in terms of four distinct paradigms. Two of the paradigms they identified – the functionalist and interpretive paradigms – correspond closely to quantitative and qualitative research. For these authors, the four paradigms ‘define four views of the social world based upon different meta-theoretical assumptions with regard to the nature of science and society’ (1979: 24) and as such are incompatible.

It is this sense of paradigm incompatibility that lies at the heart of the paradigm wars. The discussions about quantitative and qualitative research tended to be underpinned by a sense of their incompatibility, as long as the debates about them remained at the level of what Burrell and Morgan refer to in the above quotation as ‘meta-theoretical assumptions’. In fact, as I noted in my early discussions of these issues, writers were not consistent about the levels at which they explored quantitative and qualitative research (Bryman, 1984, 1988). While the discussion sometimes operated at an epistemological level, and as such was concerned with ‘meta-theoretical assumptions’, it also sometimes took place at a technical level. At this latter level, the debate about quantitative and qualitative research was fundamentally concerned with the technical merits and limitations of each of the two approaches and the research methods with which they tend to be associated.

The distinction between epistemological (along with ontological) and technical levels of the debate is crucial from the point of view of the paradigm wars and the prospects for their resolution. At the epistemological and ontological levels there is an incompatibility of fundamental assumptions in terms of what should be regarded as acceptable knowledge and how society and its institutions should be characterized (although some positions may be more determinative with regard to research approach than others). At the technical level, the differences are more to do with the character of the data generated by the research methods associated with quantitative and qualitative approaches and their relevance to different kinds of research questions or roles in the overall research process (Bryman, 2004).

THE RISE OF MIXED METHODS RESEARCH

A crucial stage in the paradigm wars, and more particularly in the production of some respite in hostilities, has been the emergence of mixed methods research. By mixed methods research I am referring to research that combines quantitative and qualitative research. This has become the most common meaning of the term (Tashakkori and Teddlie, 2003). Of course, it is possible to mix quantitative research methods and it is also possible to mix qualitative research methods, so that the mixing is within a quantitative or a qualitative strategy. Indeed, each of these is quite a common occurrence, but the term ‘mixed methods research’ tends to be used to represent the mixing of research methods that cross the quantitative-qualitative divide.

Mixed methods research should not be regarded as a new approach, even though some writers are characterizing it as a third way of conducting social research (e.g. Creswell, 2003). For example, Fine and Elsbach (2000) have noted that some of the early classics in social psychology were notable for their employment of both quantitative and qualitative methods. A classic study such as Marienthal (Jahoda et al., 1972), a study of a community with a high level of unemployment that was originally published in German in 1933, is a veritable smorgasbord of data sources, some of which are quantitative and some qualitative.

The early existence of mixed methods studies might seem to be inconsistent with the paradigm wars and their timing, as outlined above. If there are early mixed methods
classics such as these, can it make sense to date the paradigm wars from the 1970s and to associate the hostilities with the rise of qualitative research? The answer resides in large part in the rise of quantitative research as the dominant approach to the collection and analysis of data in the years after the Second World War. While this research strategy was especially dominant in North America, it held sway in many other countries as well, such as the UK. Qualitative research continued to enjoy support and to be practised but it was often regarded as unscientific and as merely occupying a preparatory role for the conduct of quantitative social research.

We can see such a perception if we briefly examine the chapter headings of Methods in Social Research, a key text published in 1952 by William Goode and Paul Hatt. This book was significant for two reasons. First, it was written by two leading figures in the field. Both authors were distinguished American social researchers who also had made significant contributions to social research methodology and to substantive areas. Second, the broad structure formed a kind of template that many other research methods texts would follow over the succeeding years.

Three things are striking about this chapter layout. First, virtually the first third of the book in terms of the number of chapters concerns issues to do with the scientific method. Not only are there references to science and scientific method but we also see key terms often associated with the approach – references to facts, hypotheses, proof, and testing. These activities were seen as the very stuff of scientific method at the time. Second, most of the following chapters are based on the discussion of methods that are associated with the implementation of the scientific method in social research – questionnaires, interviews, probability ideas and sampling, and scaling. Third, there is just one chapter – Chapter 19 – that includes a discussion of methods that stand outside the mainstream methods with their scientific connotations. This chapter covers the discussion of qualitative research and the examination of single cases. However, it is telling that unlike other chapters presenting specific methods, this one is about problems in qualitative and case analysis. In other words, the chapter is not just an exposition of these methods but a critique of them as well. Even the chapter on observation (Chapter 10) was not concerned with observation of the participant observation kind but that associated with structured observation – a quantitative approach to observation. This brief examination of a key text provides a small insight into the marginal status of qualitative research in the past.

An interesting insight into this neglect of qualitative research during these years is provided by Savage’s (2005) examination of the Affluent Worker studies conducted in Luton in England in the 1960s (Goldthorpe et al., 1966). In various reports of their findings, the Affluent Worker researchers emphasized findings that could be expressed in statistical terms. These were findings that reflected a high level of consistency between coders. As a result, the authors tended to ignore:

the more qualitative features of the interview and concentrating on those aspects of the respondent’s testimony which could be quantified … In the process, a huge amount of evocative material was left ‘on the cutting room floor’. Having gathered rich qualitative material, the researchers then effectively stripped out such materials in favour of more formal analytical strategies when they came to write up their findings. (Savage, 2005: 932)

Savage observes that his re-analysis of the qualitative data did not lead him to cast doubt on the broad conclusions Goldthorpe et al. proffered, such as their significant findings concerning the prevalence of instrumentalism among a broad swathe of the work force. However, there is evidence from the transcripts and the field notes that both the respondents and their interviewers thought in different ways about class from the researchers, especially David Lockwood, who was a member of the team and a prominent theorist of social stratification in the 1960s. It is plausible that had the researchers not been so clearly locked into a quantitative research approach, they might have taken the qualitative nuances in their data
more seriously. The general point is that Savage’s exercise sheds light on the relatively low esteem in which qualitative research was held at the time.

It is difficult and probably impossible to chart the point at which qualitative research came out of the shadows and closer to the mainstream, although it is questionable how far it has entered the mainstream in North America. From 1970 onwards, there is evidence of a growing number of books (Filstead, 1970; Schwartz and Jacobs, 1979). Journals with a qualitative research emphasis began to appear: Qualitative Sociology was started in 1978 and Urban Life and Culture (later named Urban Life and then Journal of Contemporary Ethnography) began life in 1972. The reasons probably had a lot to do with a certain amount of disillusionment in some quarters regarding the utility of quantitative research and its outcomes. Critiques of the quantitative research orthodoxy, such as those written by Cicourel (1964) and Phillips (1971, 1973), probably played a significant role in the rise of qualitative research, although qualitative research itself was not immune to their critical gaze. Further, as previously suggested, the growing awareness of theoretical ideas and philosophical positions that offered an alternative viewpoint to the positivist position that was seen as the motor behind quantitative research probably played a significant role and almost certainly accounts for the way in which quantitative and qualitative research became entangled with philosophical issues. Along with a growing awareness of theoretical ideas and philosophical positions that offered an alternative to positivism, it served to legitimate the use of qualitative methods in the face of the hegemony of quantitative research.

Thus, although there is evidence of earlier generations of researchers combining quantitative and qualitative research, the emergence of the paradigm wars was a product of the way in which philosophical issues became attached to research methods and the domination of social research by quantitative research.

There is little doubt, as previously noted, that there has been an increase in interest in and use of mixed methods research. I conducted a content analysis of articles using a mixed methods approach covering the period 1994–2003. This research is described in Bryman (2006a) but one unreported finding relevant to the present discussion is that if we compare the number of articles which combined quantitative and qualitative research in 2003 with the number in 1994, there was a threefold increase. However, it would be wrong to depict the paradigm wars as having totally come to an end. The growth of mixed methods research may give the impression that there has been an abatement in the hostilities but that is not the case.

THE CONTINUED EXISTENCE OF PARADIGM DISPUTES

In the rest of this chapter, I will draw attention to three areas which suggest that there are lingering signs of paradigm hostilities. In other words, although mixed methods research represents a sign that one of the main cleavages in the paradigm wars has been bridged, this is not to say that paradigm disputes have been totally resolved. First, it is important to appreciate that there are fundamental differences within both quantitative and qualitative research. Insofar as quantitative and qualitative research might be described as paradigms, these represent what could be termed ‘intra-paradigmatic differences’. Second, there are some fairly fundamental differences among social and other researchers concerning how mixed methods research should be viewed. Third, there are signs in fields that are very adjacent to social research that the dust has not settled on the paradigm wars and that in fact there are occasional paradigm skirmishes. Each of these three areas will form the basis for the remainder of this chapter.

Intra-paradigmatic differences

Quantitative research is sometimes viewed as though it is a monolithic, undifferentiated approach that is completely imbued
with positivism. However, there is a growing recognition of a post-positivist position that, while it shares many of positivism’s basic tenets, differs in certain respects. Post-positivism differs in its more accommodating stance towards qualitative data, which are given short shrift in traditional positivist conceptions other than in a very limited role. It typically shares with positivism the view that there is a reality that is independent of and external to the researcher but tends to recognize that reality can only be understood in a limited way because that understanding derives from the researcher’s conceptual tools. As such, post-positivism accommodates many of the critiques of the positivist view of science by recognizing that there cannot be theory-neutral observation (Wacquant, 2003).

Further, there are fundamental differences in some areas of social research, such as social psychology, between those who prioritize experiments and those who include non-experimental research methods, such as the sample survey, within their purview. For the former, it is not possible in non-experimental research unambiguously to attribute causality to relationships between variables, whereas the second group accepts that causal impacts can be gleaned through statistical controls. As an example of the former position, an experimentalist writes:

For strict experimentalists, factors that differentiate participants (e.g., sex, gender, religion, IQ, personality factors), and other factors not under the control of the researcher (e.g., homicide rates in Los Angeles), are not considered independent and thus are not interpreted causally. However, in some research traditions, variables under experimental control sometimes are suggested as causes. … Owing to the possibility of … third-variable causes, causal inferences based on correlational studies are best offered tentatively. (Crano, 2004: 484)

It is precisely for this reason that a hierarchy of research methods is sometimes presented which implies that evidence from experimental studies is or should be at the top after systematic reviews of experiments (Becker and Bryman, 2004: 57). Arguably, the ‘research traditions’ (to use Crano’s term) associated with experimentalists and non-experimentalists do not warrant the appellation ‘paradigms’. On the other hand, they do reflect a fundamental difference in the degree to which a strict positivist position should be followed and what value can and cannot be placed on non-experimental investigations. Such considerations also elide with disciplinary contexts, in that a view like Crano’s is more likely to be associated with a discipline like psychology which has a strong inclination towards experiments.

However, there are even more intra-paradigmatic differences within qualitative than within quantitative research. A glance at the latest edition of the Handbook of Qualitative Research (Denzin and Lincoln, 2005b) displays an extraordinary and apparently growing diversity of approaches within the qualitative research community. At one point in the volume, Denzin and Lincoln (2005a: 24) outline a table that presents this diversity. They delineate several paradigms (their term) that share three features – relativist ontologies, interpretivism at the epistemological level, and interpretive and naturalistic methods. They then outline several paradigms that share these three criteria but differ in other fundamental ways, including constructivism, feminism, ethnic, Marxist, cultural studies and queer theory.

Other writers have drawn attention to additional basic differences among qualitative researchers. Charmaz (2000, 2005) discusses a basic difference between objectivist and constructivist stances within expositions of and studies using grounded theory. Whereas the former is founded on the assumption that there is an ‘external world that can be described, analyzed, explained, and predicted’ (2000: 524), a constructivist grounded theory ‘recognizes that the viewer creates the data and ensuing analysis through interaction with the viewed’ (2000: 523). A further fundamental difference between forms of or approaches to qualitative research centres on the approach to the use of language. Much qualitative research treats language as a mechanism for understanding the social world, so that interviewees’ replies are treated as a means
of understanding the topics about which they are asked questions. For researchers working within traditions like conversation analysis and discourse analysis, language is a topic in its own right. It is viewed as constitutive of social reality and is a form of action in its own right, not simply a window on action. Given these different stances on the role of language in social research, it is not too fanciful to suggest that they represent paradigmatic differences in the ways in which social reality should be apprehended. For example, the conversation analyst’s disinclination to take context, as identified by researchers, into account in examinations of talk is in stark contrast to the significance of context for many qualitative researchers (Schegloff, 1997). Likewise, Morse (2001) talks about evidence of a degree of ‘paradigm asynchronicity’ when referring to the rise of a debate within qualitative research implying that approaches like grounded theory and narrative analysis are less rigorous than conversation analysis.

Differences in positions on mixed methods research

Mixed methods research has attracted a variety of positions on its prospects and on what it can and cannot achieve. Some writers have been extremely resistant to the idea that quantitative and qualitative research might be combined. Smith and Heshusius (1986) have provided one of the strongest and clearest statements of such resistance. These authors argue that treating quantitative and qualitative research as compatible and therefore as combinable neglects the fact that they are based on fundamentally different and irreconcilable foundations. Theirs is an example of what I have referred to as the ‘paradigm argument’, which stresses the differences between quantitative and qualitative research in terms of foundational assumptions about the nature of knowledge rather than in terms of technique (Bryman, 2004).

The paradigm argument rests upon another argument which is often employed in such discussions. This is the ‘embedded methods’ argument which depicts research methods as associated with a set of epistemological assumptions. A research method is thus a cipher for underlying philosophical ideas. Smith and Heshusius write:

    This disregard of assumptions and preoccupation with techniques have had the effect of transforming qualitative inquiry into a procedural variation of quantitative inquiry. … That certain individual procedures can be mixed does not mean that there are no differences of consequence. (1986: 8, 9)

This is in reality a re-statement of the bases on which the paradigm wars were waged. It depicts two irreconcilable sides, so that no fraternizing with the enemy is legitimate.

In recent years, this position on mixed methods research has become less frequently voiced and in its place an attitude of pragmatism has permeated the field. Initially, this sense of a pragmatist position was most often in evidence in the more applied fields in the social sciences, such as evaluation research. Indeed, practitioners from such fields have been especially prominent advocates of and writers on mixed methods research (e.g. Greene et al., 1989). Essentially, the pragmatist position either ignores paradigmatic differences between quantitative and qualitative research or recognizes their existence but, in the interests of exploring research questions with as many available tools as possible, shoves them to the side. For example, Maxcy (2003: 79) argues that pragmatism ‘seems to have emerged as both a method of inquiry and a device for the settling of battles between research purists and more practical-minded scientists’. The point about pragmatism is that in place of an emphasis on philosophical issues and debates that were a feature of the paradigm wars and which were the province of the ‘research purists’ to which Maxcy refers, issues to do with the mixing of methods become matters of technical decisions about the appropriateness of those methods for answering research questions. Issues to do with the appropriateness of research methods for answering research questions or ensuring continuing funding in the modern competitive
academic environment became the criteria for judging the desirability or otherwise of mixing methods, rather than philosophical principles.

In 2003, I interviewed 20 UK social scientists who were known to be mixed methods research practitioners. The details of this research can be found in Bryman (2006b). The pragmatist stance was very much in evidence among these researchers. In the words of one of my interviewees: ‘So we’ve taken that pragmatic decision to do it that way because that’ll generate something that either method, standing alone, is not gonna give us’ (quoted in Bryman, 2006b: 117). Another referred to the fact that he/she was located in an entrepreneurial research centre where ‘there’s always been so much more of a pragmatic approach to doing things’ (quoted in Bryman, 2006b: 117). On other occasions, it was striking that although the term ‘pragmatism’ was not employed, it could be clearly discerned in interviewees’ replies. One interviewee replied that the crucial issue was:

    attempting to better understand what it is you’re trying to understand, and in that way, you then have to ask how appropriate are the sorts of methods I’m using and are they going to give me the information to understand what it is I’m researching? (Quoted in Bryman, 2006b: 117)

Further evidence of the sidelining of philosophical issues among many mixed methods researchers is that the previously mentioned content analysis revealed that only 6 percent of the 232 articles examined referred to epistemological or ontological issues or to paradigm conflicts in the combined use of quantitative and qualitative research (Bryman, 2006a). The coding of this dimension required only a mention of these issues; it was not concerned with the way in which the issue was couched. Thus, the coding was neutral about whether paradigm issues were depicted in articles as impeding or irrelevant to the mixing of quantitative and qualitative research. This finding provides further suggestion that mixed methods researchers adopt a pragmatic view of the research process that prioritizes finding out whatever is needed to address the researcher’s objectives.

As such, there would seem to be two distinct stances on mixed methods research: one which emphasizes paradigm differences between quantitative and qualitative research and which stresses their incompatibility, and another which emphasizes a pragmatist position of depicting research as using whichever research methods are most appropriate regardless of the supposed epistemological location. These might usefully be labelled the paradigmatic and pragmatic stances on the prospects of doing mixed methods research, although these do not exhaust the range of possibilities (Greene and Caracelli, 1997).

The growth of mixed methods research has to a significant extent occurred because the pragmatic stance became ascendant in the years after Smith and Heshusius articulated their views, although it is important to appreciate that similar views continued to be expressed (e.g. Buchanan, 1992). However, the very surge of interest in doing mixed methods research has been accompanied by assessments of its prospects and potential. One of the themes that can be discerned among these appraisals is some recourse to paradigmatic arguments. Three examples can be used to illustrate this point. Sale et al. (2002) write that because they represent different paradigms with contrasting epistemological positions, quantitative and qualitative research involve the study of different phenomena and therefore cannot be compared. This means that they cannot be used for exercises like triangulation of findings, but can be employed to study complementary issues. This argument does not represent an outright rejection of mixed methods research at all, but it does imply that there are limits to its use. A second example is Giddings’ (2006) suggestion that mixed methods research ‘is positivism dressed in drag’. As she puts it: ‘mixed methods dwells within positivism; the “thinking” of positivism continues in the “thinking” of mixed methods. … [It] rarely reflects a constructionist or subjectivist view of the world’ (2006: 200). The point here is very consistent with Smith and Heshusius’s
concerns in that Giddings is arguing that in the service of mixing methods, qualitative research becomes what they called in the quotation above a ‘procedural variation’ of quantitative research. The concern here seems to be that by colonizing qualitative research, mixed methods research may marginalize philosophical traditions that have come to the fore in recent years and which have drawn significantly on qualitative methods (e.g. critical approaches, interpretivism). A similar kind of concern has been expressed by Howe (2004) who argues that in mixed methods research, qualitative methods have become adjuncts to quantitative ones. He suggests that such research is founded on the same epistemological principles as quantitative research and argues for mixed methods research that draws explicitly on interpretivism. We see here a clear example of a paradigmatic stance on mixed methods research.

The point of this brief discussion of these views that are critical of the use of mixed methods research is that they imply that paradigmatic views of the approach have not gone into abeyance and indeed may be involved in something of a renaissance in response to its growing prominence. What we see here as well is a suggestion that the paradigm wars are not over or that clashes continue even when a truce has been declared.

Paradigm wars in applied fields

It is very striking that, as previously noted, applied fields like evaluation research and nursing research have been very receptive to mixed methods research, as can be seen when the contents of the Handbook of Mixed Methods in Social and Behavioral Research (Tashakkori and Teddlie, 2003) are examined. However, at the same time, some applied fields continue to provide something of a battleground in which clashes akin to the paradigm wars can be encountered.

One of the most prominent forms of what I am suggesting here is the rise of systematic review in areas that overlap with social research, such as health research, education, social policy research, and organization studies. In these fields, systematic review is sometimes promoted as a yardstick for conducting literature reviews and, as previously noted, is often regarded as occupying the top spot in hierarchies of evidence in fields like social policy research (Becker and Bryman, 2004). It has emerged out of medical research, where it has been used to inform evidence-based medical decision-making. In this field, meta-analyses of trials and other kinds of investigation have become gold standards on which important decisions rest. Systematic review draws on and incorporates many of the insights and procedures with which meta-analysis is associated. Indeed, meta-analysis is to all intents and purposes a form of systematic review. Systematic review has been defined as: ‘a replicable, scientific and transparent process, in other words a detailed technology, that aims to minimize bias through exhaustive literature searches of published and unpublished studies and by providing an audit trail of the reviewer[’]s decisions, procedures and conclusions’ (Tranfield et al., 2003: 209).

Systematic review begins with an explicit statement of the purpose of the review and specifies the criteria by which studies are to be included in the review. The issue of criteria operates on at least two levels. One is that the criteria should specify such things as the limits in terms of geography and time. The other is that the reviewer should specify quality criteria, that is, that only research that meets the pre-set criteria should be included in the review. This has become one of the most contentious areas of systematic review because it has sometimes been viewed as discriminating against the inclusion of qualitative studies within its purview, because they cannot meet the criteria that are specified which presume that the studies derive from quantitative research. Further, qualitative research, until fairly recently, has been viewed as less obviously capable of synthesis than quantitative research. These features have resulted in considerable interest since the late 1990s in the development of quality criteria for qualitative studies to inform their inclusion or exclusion from systematic reviews and of approaches to aggregating
qualitative studies. The issue of synthesizing qualitative studies has been explored in terms of both aggregating qualitative studies with quantitative ones and aggregating qualitative studies in domains where most of the literature draws on qualitative evidence.

Two things are relevant to the discussion of the supposed termination of the paradigm wars. One is that the systematic review approach is very much predicated upon principles that can be traced to a quantitative research stance and its association with positivism. These principles include an emphasis on: transparency, replicability, and the application of apparently neutral procedures. These principles can then be deployed against conventional reviews to suggest that they are lacking in rigour and are biased. For example, Tranfield et al. write: ‘applying specific principles of systematic review methodology used in the medical sciences to management research will help in counteracting bias by making explicit the values and assumptions underpinning a review’ (2003: 208). There is a glimpse in these discussions of the remnants of paradigm war issues or at least the potential for them. For example, Hammersley has argued that systematic review ‘assumes the superiority of what … can be referred to as the positivist model of research’ (2001: 544). Much like in qualitative research, the reviewer is almost seen as a contaminant whose biases and predilections have to be minimized. Hammersley also observes that evidence is not typically presented to suggest that systematic reviews are superior to non-systematic (increasingly called ‘narrative’) reviews. Instead, narrative reviews are condemned by innuendo – they are not systematic, they do not use explicit procedures, etc. Hammersley (2001) also argues that it is not easy to see how qualitative studies fit with a systematic review approach. In fact, one of the most notable aspects of the discussion of systematic reviews in the social sciences since he wrote this article is the growing discussion of ways of making qualitative research amenable to systematic review. As previously noted, this includes developing quality criteria specifically for qualitative studies and mechanisms for synthesizing such studies. However, at the time of writing there has been no agreement about either of these areas. Instead, there has been a proliferation of attempts to specify quality criteria for qualitative research, both within and beyond the context of systematic review (Bryman, 2006b; Dixon-Woods et al., 2004; Spencer et al., 2003). Also, several approaches to synthesis have been promoted but there is little consensus about which to use or when (Sparkes, 2001). The approaches include: meta-ethnography; content analysis; and critical interpretive synthesis (Dixon-Woods et al., 2006; Mays et al., 2005). In itself, the lack of agreement concerning how qualitative studies can best be incorporated into systematic reviews is not a problem. However, it does make it difficult for qualitative researchers to acquire legitimacy beyond the qualitative research community for their literature reviews. This is not unlike the situation that pertained in the early years of the paradigm wars when, from the point of view of many qualitative researchers, quantitative researchers were perceived as defining what constituted an appropriate approach to the research process.

What is not clear is how far the predilection for systematic reviews will diffuse beyond the applied fields where it has been especially promoted. Systematic review works best when research questions are of the ‘what works?’ kind but in less applied fields this kind of research question is uncommon or unlikely. The main point that is being registered at this juncture is that the creation of a contrast between systematic and narrative reviews, along with the problems of incorporating qualitative studies into the former, reveals vestiges of issues that were long associated with the paradigm wars.

A further example of a resurgence of paradigm hostilities can be found in educational research. In this field, there has been a recognition in both the USA and the UK that there have been attempts to restrict the acceptability of empirical research to just studies that conform to what is taken to be scientific research. Feuer et al. (2002) note that in the context of educational research
in the USA, ‘scientifically based research’ has become a watchword for what is to be treated by government departments as valid and acceptable knowledge. The authors counted no fewer than 111 references to the term in the No Child Left Behind Act of 2001. Scientifically based research perhaps unsurprisingly rests on the same or at least similar principles to those that have long been held among quantitative researchers in the social sciences. As Hodkinson (2004) notes, this valorization of a set of epistemological principles means that methodological procedures associated with certain research methods come to be seen as the ones most likely to generate acceptable knowledge.

A similar kind of stance could be discerned in the UK in the Tooley report (Tooley with Darby, 1998). This report provided a critique of much educational research in the UK largely using principles associated with quantitative research to criticize qualitative studies. These discussions have caused considerable consternation among educational researchers and others working within a qualitative research tradition (e.g. Hodkinson, 2004; Lather, 2004; Ryan and Hood, 2004). The subtext of much of this discussion is to argue against the tendency to attach greater value to a set of methodological principles and to carve out some space for qualitative investigations in the face of a perceived hostility.

As Hammersley has acknowledged, the creation in the education field of an orthodoxy around so-called scientific research principles ‘may amount to a new round in the paradigm wars’ (2005: 141). However, Hammersley writes that it is doubtful whether this means that a period of paradigm peace has been shattered because there have been other paradigmatic battles. He mentions the battle over postmodernism as one such area.

This is an important point. It is easy to view the paradigm wars purely in terms of quantitative and qualitative research and their various synonyms. However, these were never the only ways of conceiving of paradigm conflicts. It is worth recalling that in Burrell and Morgan’s (1979) scheme there were four paradigms and only two of these mapped onto the quantitative-qualitative distinction. In many fields, the existence of a critical paradigm, as noted by Denzin and Lincoln (2005b) and mentioned above, has been a constant companion to the quantitative and qualitative ones (see, for example, Deetz, 1996). While critical studies tend to be associated with qualitative approaches, this need not be so (Morrow and Brown, 1994).

CONCLUSION

In this chapter, I have sought to outline the grounds on which it is sometimes claimed that the paradigm wars have come to an end. At a superficial level, there has been something of a lessening of hostilities around the quantitative-qualitative divide. At this level, the rise of mixed methods research and a commitment to pragmatism would seem to act as a high-profile indicator of this détente. However, the evidence that the paradigm wars have come to an end can be countered with some trends that point in the opposite direction. I have mentioned three areas that suggest this: the continued presence of intra-paradigmatic differences; the existence of different stances on mixed methods research; and signs of paradigm wars in applied fields that are adjacent to social research. Thus, even the rise of mixed methods research has not brought the paradigm wars to an end, although it may have lessened the mutual hostility.

The issue then becomes: does the continued presence of paradigm divergences matter? Some social scientists may feel uncomfortable about the lack of resolution to some of the main debates in the area of social research methodology. For others, the existence of competing paradigmatic positions is a cause for celebration and offers the opportunity to examine the social world through different lenses. Such a stance may reflect the way in which although postmodernism is often regarded as having lost its potency as a force within social theory, its influence still lingers in diverse ways (Bloland, 2005). It may be that postmodernism’s commitment to the co-presence of different ways of viewing the
world and the diffusion of constructivist ideas has resulted in a greater tolerance of such paradigm diversity.

ACKNOWLEDGEMENTS

I wish to thank Martyn Hammersley for discussions of some of these issues as well as for his comments on this chapter. His ideas greatly helped to sharpen my thoughts on many of these topics, although I alone am responsible for the deficiencies in this chapter. I also wish to thank the Economic and Social Research Council for funding the research project ‘Integrating quantitative and qualitative research: prospects and limits’ (Award number H333250003) which made possible the research on which parts of this chapter are based.

REFERENCES

Becker, S. and Bryman, A. 2004. Understanding Research for Social Policy and Practice. Bristol: Policy Press.
Bloland, H.G. 2005. ‘Whatever happened to postmodernism in higher education?’ Journal of Higher Education 76: 121–150.
Brannen, J. 2006. ‘Mixed Methods Research: A Discussion Paper’. NCRM Methods Review Papers: ESRC National Centre for Research Methods.
Bryman, A. 1984. ‘The debate about quantitative and qualitative research: a question of method or epistemology?’ British Journal of Sociology 35: 75–92.
Bryman, A. 1988. Quantity and Quality in Social Research. London: Unwin Hyman.
Bryman, A. 2004. Social Research Methods. Oxford: Oxford University Press.
Bryman, A. 2006a. ‘Integrating quantitative and qualitative research: how is it done?’ Qualitative Research 6: 97–113.
Bryman, A. 2006b. ‘Paradigm peace and the implications for quality’. International Journal of Social Research Methodology 9: 111–126.
Buchanan, D.R. 1992. ‘An uneasy alliance: combining qualitative and quantitative research’. Health Education Quarterly 19: 117–135.
Burrell, G. and Morgan, G. 1979. Sociological Paradigms and Organisational Analysis. London: Heinemann.
Charmaz, K. 2000. ‘Constructivist and objectivist grounded theory’ in Denzin, N.K. and Lincoln, Y.S. (eds.) The Sage Handbook of Qualitative Research. Thousand Oaks, CA: Sage.
Charmaz, K. 2005. ‘Grounded theory in the 21st century’ in Denzin, N.K. and Lincoln, Y.S. (eds.) The Sage Handbook of Qualitative Research. Thousand Oaks, CA: Sage.
Cicourel, A.V. 1964. Method and Measurement in Sociology. New York: Free Press.
Crano, W.D. 2004. ‘Independent variable in experimental research’ in Lewis-Beck, M.S., Bryman, A. and Liao, T.F. (eds.) The Sage Encyclopedia of Social Science Research Methods (Vols. 1–3). Thousand Oaks, CA: Sage, pp. 483–4.
Creswell, J.W. 2003. Research Design: Qualitative, Quantitative, and Mixed Methods Approaches. Thousand Oaks, CA: Sage.
Deetz, S. 1996. ‘Describing differences in approaches to organizational science: rethinking Burrell and Morgan and their legacy’. Organization Science 7: 191–207.
Denzin, N.K. and Lincoln, Y.S. 2005a. ‘Introduction: the discipline and practice of qualitative research’ in Denzin, N.K. and Lincoln, Y.S. (eds.) The Sage Handbook of Qualitative Research. Thousand Oaks, CA: Sage.
Denzin, N.K. and Lincoln, Y.S. 2005b. The Sage Handbook of Qualitative Research. Thousand Oaks, CA: Sage.
Dixon-Woods, M., Cavers, D., Agarwal, S., Annandale, E., Arthur, A., Harvey, J., Hsu, R., Katbamna, S., Olsen, R., Smith, L.K. and Sutton, A.J. 2006. ‘Conducting a critical interpretive synthesis of the literature on access to healthcare by vulnerable groups’. BMC Medical Research Methodology 6: 35.
Dixon-Woods, M., Shaw, R.L., Agarwal, S. and Smith, J.A. 2004. ‘The problem of appraising qualitative research’. Quality and Safety in Health and Social Care 13: 223–225.
Feuer, M.J., Towne, L. and Shavelson, R.J. 2002. ‘Scientific culture and educational research’. Educational Researcher 31: 4–14.
Filstead, W.J. 1970. Qualitative Methodology: Firsthand Involvement with the Social World. Chicago: Markham.
Fine, G.A. and Elsbach, K.D. 2000. ‘Ethnography and experiment in social psychological theory building: tactics for integrating qualitative field data with quantitative lab data’. Journal of Experimental Social Psychology 36: 51–76.
Gage, N. 1989. ‘The paradigm wars and their aftermath: a “historical” sketch of research on teaching since 1989’. Educational Researcher 18: 4–10.
Giddings, L.S. 2006. ‘Mixed-methods research: positivism dressed in drag?’ Journal of Research in Nursing 11: 195–203.
Goldthorpe, J.H., Lockwood, D., Bechhofer, F. and Platt, J. 1966. The Affluent Worker: Industrial Attitudes and Behaviour. Cambridge: Cambridge University Press.
Goode, W.J. and Hatt, P.K. 1952. Methods in Social Research. New York: McGraw-Hill.
Greene, J.C. and Caracelli, V.J. 1997. ‘Defining and describing the paradigm issue in mixed-method evaluation’ in Greene, J.C. and Caracelli, V.J. (eds.) Advances in Mixed-Method Evaluation: The Challenges and Benefits of Integrating Diverse Paradigms. San Francisco: Jossey-Bass.
Greene, J.C., Caracelli, V.J. and Graham, W.F. 1989. ‘Toward a conceptual framework for mixed-method evaluation designs’. Educational Evaluation and Policy Analysis 11: 255–274.
Hammersley, M. 1984. ‘The paradigmatic mentality: a diagnosis’ in Barton, L. and Walker, S. (eds.) Social Crisis and Educational Research. London: Croom Helm.
Hammersley, M. 1992. ‘The paradigm wars: reports from the front’. British Journal of Sociology of Education 13: 131–143.
Hammersley, M. 2001. ‘On “systematic” reviews of research literatures: a “narrative” response to Evans & Benefield’. British Educational Research Journal 27: 543–554.
Hammersley, M. 2005. ‘Countering the “new orthodoxy” in educational research: a response to Phil Hodkinson’. British Educational Research Journal 31: 139–155.
Hodkinson, P. 2004. ‘Research as a form of work: expertise, community and methodological objectivity’. British Educational Research Journal 30: 9–26.
Howe, K.R. 2004. ‘A critique of experimentalism’. Qualitative Inquiry 10: 42–61.
Jahoda, M., Lazarsfeld, P.F. and Zeisel, H. 1972. Marienthal: the Sociography of an Unemployed Community. London: Tavistock.
Kuhn, T.S. 1970. The Structure of Scientific Revolutions. Chicago: University of Chicago Press.
Lather, P. 2004. ‘Scientific research in education: a critical perspective’. British Educational Research Journal 30: 759–772.
Maxcy, S.J. 2003. ‘Pragmatic threads in mixed methods research in the social sciences: the search for multiple modes of enquiry and the end of the philosophy of formalism’ in Tashakkori, A. and Teddlie, C. (eds.) Handbook of Mixed Methods in Social and Behavioral Research. Thousand Oaks, CA: Sage.
Mays, N., Pope, C. and Popay, J. 2005. ‘Systematically reviewing qualitative and quantitative evidence to inform management and policy-making in the health field’. Journal of Health Services Research and Policy 10: S6–S20.
Morrow, R.A. and Brown, D.D. 1994. Critical Theory and Methodology. Thousand Oaks, CA: Sage.
Morse, J.M. 2001. ‘A storm in an academic teacup’. Qualitative Health Research 11: 587–588.
Oakley, A. 1999. ‘Paradigm wars: some thoughts on a personal and public trajectory’. International Journal of Social Research Methodology 2: 247–254.
Phillips, D.L. 1971. Knowledge from What? Theories and Methods in Social Research. Chicago: Rand McNally.
Phillips, D.L. 1973. Abandoning Method. San Francisco: Jossey-Bass.
Ryan, K.E. and Hood, L.K. 2004. ‘Guarding the castle and opening the gates’. Qualitative Inquiry 10: 79–95.
Sale, J.E.M., Lohfeld, L.H. and Brazil, K. 2002. ‘Revisiting the quantitative-qualitative debate: implications for mixed-methods research’. Quality and Quantity 36: 43–53.
Savage, M. 2005. ‘Working-Class identities in the 1960s: revisiting the Affluent Worker study’. Sociology 39: 929–946.
Schegloff, E.A. 1997. ‘Whose text? Whose context?’ Discourse and Society 8: 165–187.
Schwartz, H.D. and Jacobs, J. 1979. Qualitative Sociology: A Method to the Madness. New York: Free Press.
Smith, J.K. and Heshusius, L. 1986. ‘Closing down the conversation: the end of the quantitative-qualitative debate among educational researchers’. Educational Researcher 15: 4–12.
Sparkes, A. 2001. ‘Myth 94: qualitative health researchers will agree about validity’. Qualitative Health Research 11: 538–552.
Spencer, L., Ritchie, J., Lewis, J. and Dillon, L. 2003. Quality in Qualitative Evaluation: A Framework for Assessing Research Evidence. London: Government Chief Social Researcher’s Office.
Tashakkori, A. and Teddlie, C. 2003. Handbook of Mixed Methods in Social and Behavioral Research. Thousand Oaks, CA: Sage.
Tooley, J. and Darby, D. 1998. Educational Research: A Critique. London: Ofsted.
Tranfield, D., Denyer, D. and Smart, P. 2003. ‘Towards a methodology for developing evidence-informed management knowledge by systematic review’. British Journal of Management 14: 207–222.
Wacquant, L.J.D. 2003. ‘Positivism’ in Outhwaite, W. (ed.) The Blackwell Dictionary of Modern Social Thought. Oxford: Blackwell.
3
The History of Social
Research Methods
Marja Alastalo
Not only theories but also methods change in the course of history and these changes have had consequences for what is known about societies. However, less attention is paid to the history and formation of research methods than to the history of theoretical ideas and the thinking of key scholars (Platt, 1996: 1). There has also been a related tendency to discuss methods and methodological issues on a rather abstract and philosophical level, instead of studying what has actually been done.

In this chapter my aim is to briefly outline the history of social research methods on the basis of earlier accounts of that history. I try to cover the wide-ranging and incoherent histories of both quantitative and qualitative research methods. The focus is unavoidably but regrettably on the Anglo-American traditions. Anglo-American social research is often a starting point that is taken for granted (Alasuutari, 2004). To compensate for the brevity of this text, an extensive listing of references in the history of social research methods is provided.

In this chapter social research is understood as empirical research on society that can also be conducted in institutions other than universities.¹ By the concept of ‘method’ I refer to techniques of gathering and analyzing data. I also make an analytical distinction between ‘a method of data collection’ and ‘a method of analyzing data’, because changes in the methods of data collection and the methods of analysis have not occurred simultaneously. Textbooks also often focus on either specific methods of gathering data (e.g. Gubrium & Holstein, 2002; Kvale, 1996) or methods of analysis (Hardy & Bryman, 2004) and they may contain different sections for each (Denzin & Lincoln, 2000a). Methodology is often understood and defined as a normative attempt to find and discuss ‘the good and the bad practices’. However, here methodology is understood as research performed on research methods. ‘Sociologists study man in society; methodologists study the sociologist at work’ (see Lazarsfeld, 1993a: 236).

VARIATIONS IN THE HISTORY OF SOCIAL RESEARCH METHODS

The history of social research methods has been told in many ways and with different emphases for different purposes. Basically, two types of histories can be found: in addition to actual studies on the history of methods, method textbooks, for instance, contain histories of methodological development that aim at legitimating the writers’ own approaches. Research on the history of research methods has been rare compared with the numerous brief accounts in method textbooks.

A comprehensive history of social research methods is still unwritten because social research is fractured and exercised in various disciplines and on all continents. Even the most extensive histories have concentrated on one country and on a certain period of time (e.g. Bulmer, 1985; Kent, 1981; Oberschall, 1965; Platt, 1996, 2006b). In most cases, historical research on methods has focused on a limited period prior to the 1960s, which means that very little research has been conducted on the second half of the twentieth century. According to Jennifer Platt there is a shortage of serious historical work on empirical research and its methods since the 1930s (Platt, 1996: 4).

A great deal of the historical research has focused on the rise of statistical thinking and the formation of survey research. The history and formation of qualitative methods have been less studied than the history of survey methods, although a number of articles have also been written on the developments in qualitative methods (e.g. Platt, 1983, 1986, 2002; Vidich & Lyman, 1994)². Historical overviews of some quite prominent subfields of qualitative research – such as ethnography and feminist research – still remain unwritten.

The emergence of qualitative research methods is often told in the textbooks on qualitative methods (e.g. Bogdan & Taylor, 1975; Denzin & Lincoln, 2000). The stories of the emergence are sometimes called origin myths. In an origin myth the method at hand is given a glorious history. These origin myths typically cover a long time-span. Thus, for instance, qualitative methods are often said to stem from Max Weber’s thinking and the Chicago School. This story has been disproved by showing that actually there was no continuity from Weber to the Chicago School (Platt, 1983). Another characteristic feature of an origin myth of qualitative methods is that the writer’s own approach is contrasted with the claimed weaknesses of quantitative methods that are criticized for not being able to tackle the current challenges³.

The histories of social research methods are not written in a vacuum, but always in a specific temporal-spatial context. The concurrent methodological disputes, controversies and conflicts have often guided the choice of focus in the histories of research methods. Despite the good intentions of the author there is always the possibility of a teleological interpretation and overdetermination of the course of history. That is why past events are inevitably seen to lead to the current state of the art: ‘It is observable that much writing about the history of sociology (…) starts from the moving frontier of contemporary, and works forward to it from ancestors chosen for their perceived contemporary relevance’ (Platt, 1996: 3). Another tendency has been to narrate the history of a discipline as a progress narrative, where science is assumed to develop through successive and increasingly comprehensive paradigms (see Alasuutari, 2004)⁴.

AN OUTLINE OF THE HISTORY OF SOCIAL RESEARCH METHODS

The roots of social research methods go back to the seventeenth century, when evidence-based science started to take shape (see e.g. Oakley, 2000: 73–160). It is often claimed that the rise of capitalism with the processes of urbanization and industrialization gave impetus to empirical social research. A special need for knowledge of society is said to have arisen. ‘Almost the very day on which the European feudal order suffered its
first political defeat became the birthday of the first sociographic study’ (Zeisel, 2002: 100).

In the following, the history of social research will be reviewed from the beginning of the twentieth century to the turn of the millennium. My aim is to trace both the continuities and discontinuities and to present an outline of the history, drawing on earlier research.

The methods of social research before the First World War

A prehistory of qualitative methods has not been traced to the same extent as the prehistory of the survey, and especially the formation of ideas that led to the rise of modern statistics and statistical institutions, which have been carefully studied (Höjer, 2001; Lazarsfeld, 1977; Porter, 1986; Stigler, 1986; Zeisel, 2002). The history of empirical social research and the formation of the social survey, in particular from the end of the nineteenth century to the First World War, have also been outlined for several countries (Abrams, 1981; Converse, 1987: 11–53; Kent, 1981, 1985; Marsh, 1982; Oberschall, 1965; Young, 1949). With few exceptions (e.g. Converse, 1987; Young, 1949) these histories discuss the course of events in Europe, as the roots of empirical social research actually lie in Europe, not America:

All European countries have conducted empirical social research for nearly 200 years. As a matter of fact, many of the techniques which are now considered American in origin were developed in Europe 50 or 100 years ago and then they were exported from the United States after they had been refined and made manageable for use on a mass scale. (Lazarsfeld, 1965: v.)

The pioneer surveys in Britain and Germany dealt with poverty and the material and moral living conditions among the working class and agricultural labour (Oberschall, 1965: 3). The aim was to provide information on contemporary social problems. The pioneers of social survey had various backgrounds from non-academics, such as Charles Booth and Seebohm Rowntree, to the classics of sociology such as Max Weber, who also conducted a considerable amount of empirical research during his career (Lazarsfeld, 1993b: 283–298).

The pioneers did not aim at testing theories but at collecting facts and sometimes also changing the state of affairs. At that time even the idea of collecting empirical material on ordinary people for research was novel. The early studies were influenced by various Christian, philanthropic and socialist ideas but also scientific ideas from statistics to national economy. The social reforms suggested by Booth and his successors are often interpreted as early steps taken towards the welfare state. In these interpretations the divergent suggestions – such as the segregation of the casual poor to ‘labour colonies’ and the loafers to detention centres – are forgotten (Kent, 1985: 55).

The early social survey in America was influenced by the European counterpart and at least one part of it has been defined as a social movement ‘dedicated to putting science (…) in the service of social reform’ (Converse, 1987: 21). In addition to the social surveys in the United States, election and opinion polls also started to evolve very early (Hoinville, 1985: 106). So, the new ideas of studying and describing the society were applied and advanced by various actors and for various interests.

Neither the methods of data collection nor the methods of analysis in the pioneer surveys meet the definition of the modern survey. The data collected in the early surveys can be considered miscellaneous because structured questionnaires were not yet an established mode of data collection. For example, Booth, with his assistants, ‘used a variety of methods, consulting existing statistics, conducting interviews with informants, and making countless observations of real conditions’ (Converse, 1987: 15). What was characteristic of Booth and also of Max Weber was that they collected the data from informants instead of relying on the poor people themselves. Weber assumed that direct interviewing was impossible with low-income people because they were not able to describe their own situation. Later Weber changed his mind on this
and became convinced that low-income people, too, are able to speak for themselves and can thus be directly interviewed (Lazarsfeld, 1993: 286, 290).

Catherine Marsh (1985) has noted that even the idea of a respondent who is both a subject of the study and an informant at the same time was slow to develop. Once the ideas of direct data collection and interviewing were invented, researchers started to pay attention also to the questionnaire design and question wording.

These early surveys were not sample surveys, so in this respect too they differed from the modern surveys. Probability sampling was invented in statistics at the turn of the century, but the usefulness of sampling was not yet recognized in social research. The pioneers of survey aimed at covering everyone in the area that was chosen. This led to the encyclopaedic endeavours where huge amounts of data were collected. A.L. Bowley discovered the useful properties of probability sampling for social research. He applied probability sampling for the first time in his study of five English towns in 1915.

The methods of analysis were also elementary before the First World War. The data drawn from various sources were usually counted, classified and presented in percentage tables and sometimes in cross-tabulations. Early surveys have been criticized for being unsophisticated as they did not connect with the developments in correlational techniques that were invented by the turn of the century (Selvin, 1985).

According to Catherine Marsh, major advances in the survey technology were already made before the First World War (Marsh, 1982: 27). By major advances Marsh means the idea of probability sampling, the use of structured questionnaires, and the basic tools of statistical analysis such as correlation and regression coefficients. However, these innovations did not spread overnight. It took a long time before these methodological inventions were refined, made operational and widely accepted as self-evident, established practices. Backward technical conditions are probably one explanation for the slow diffusion of these ideas. For instance, most tabulations were carried out by hand, because machines for sorting and counting punch-cards were rare and mainly used by statistical offices. Random sampling was also technically difficult as it was laborious to compile lists of people suitable for sampling.

The imperfection of the methods used was not the only weakness of the early British social surveys; they were also often both conceptually and theoretically vague. As Raymond Kent has put it:

Investigators did attempt to explain their findings by looking for causes, but the attempt was not very successful. (…) What they failed to realize was that explanation of the facts could never be based on yet more facts. Such an explanation was always a question of interpretation of the facts, and for that they would have needed the kind of theories being proposed by political economists and academic sociologists of the day. (Kent, 1985: 68.)

On the Continent attempts to combine theory and methods in empirical research were made in the field of sociology. The first method textbook, The Rules of Sociological Method by Emile Durkheim, was published in 1895 in French. Later on Max Weber wrote some methodological texts⁵. Because of the language barrier these texts did not influence the Anglo-American tradition before they were translated into English at the end of the 1930s and 1940s (see Platt, 1996: 69–70, 117–119 on the reception of these classics). In the United States the European tradition was seen through the contemporary frame. For example, Emile Durkheim’s Suicide was presented as an early example of quantitative reasoning conducted in the Lazarsfeldian style (Selvin, 1958; also Madge, 1963; Riley, 1963).

The interwar period: A tension between case study and statistical method

Most writings on social research methods from the 1920s onwards deal with the development of methods in the United States. According to Jennifer Platt this emphasis
can be considered justified because after the First World War American sociology ‘became dominant quantitatively and qualitatively’ (Platt, 1996: 2). Platt emphasizes the importance of American development by saying that ‘the directions in which they (national sociologies, MA) have moved cannot be understood without understanding what happened in America, even if they have often reacted strongly against American influence in general, as well as particular American tendencies’ (Platt, 1996: 2). This influence already began during the interwar period, but had the strongest impact after the Second World War.

The studies dealing with the history of social research during the interwar period have concentrated on the Chicago School and also on the research conducted at Columbia University. Otherwise the interwar period has not received much attention (Bulmer, 1984; Harvey, 1987; Platt, 1996: 4)⁶. Despite the scarcity of research on this era, quite remarkable changes happened within both quantitative and qualitative methods. The debate from the 1920s onwards took place between – in terms that were in use then – the case study and statistical method. The front line between case study and statistical methods is typically drawn between the universities of Chicago and Columbia. The terms ‘quantitative’ and ‘qualitative’ were not commonly used at that time, although they have been employed when the developments in the 1920s and 1930s have later been described.

Case study was used to refer to the collection and presentation of detailed, relatively unstructured information from a range of sources. As a scientific enterprise the case study was as much associated with social work as with sociology (Platt, 1992: 19). The concept of case study originates from the case work techniques developed by social workers. The influence of case reports was twofold: first, they provided social researchers with a model for reporting their fieldwork and, second, they were a source of data for social researchers. In addition, case study was rooted in the clinical methods of doctors, the methods of historians and anthropologists and qualitative descriptions provided by basically quantitative researchers such as Le Play (Hammersley, 1989: 93). As a consequence of this diversity there was no shared understanding of case study as a method (Platt, 1992).

The Chicago School is often mentioned as the birthplace of case study and it is strongly identified with the birth of qualitative methods. Following from this emphasis The Polish Peasant in Europe and America by Znaniecki and Thomas has been presented as a foremost landmark of the Chicago School. No doubt it deserves to be praised. It moved academic sociology towards the empirical world and ‘attempted to integrate theory and data in a way no American study had done before’ (Bulmer, 1984: 45). Its importance was already acknowledged in 1937, when members of the American Sociological Association nominated it as the most influential sociological monograph (Hammersley, 1989: 71).

However, the situation in Chicago was more complex in that statistical methods and case study were seen both as complementary and as opposing approaches⁷. There were researchers (such as Ernest W. Burgess) who advocated a research style where, after analyzing the symbolic culture and subjective meanings in a single case, a study was continued with statistical methods to search for more general patterns (Bulmer, 1984: 151). The Chicago School also contributed to quantitative social research, as some prominent figures of survey – for instance William Ogburn, Samuel Stouffer and L.L. Thurstone – worked there⁸. The technique of mapping was especially elaborated in the field of urban studies. Mapping was a simple quantitative technique where any available data were used to make maps of the city to show population density, the distribution of nationalities, land values, businesses and so on. Ernest Burgess contributed to census statistics by formulating the basic principles of modern census tract statistics and is recognized as the father of the idea. L.L. Thurstone made advances in developing attitude measurement scales and in the analysis of such data (Bulmer, 1984: 151–89)⁹.

The Chicagoans’ contributions to statistical methods discussed above show that it is misleading to equate the Chicago School merely with qualitative methods (see also Platt, 1996: 264–65). Considerable advances in statistical methods were also made outside Chicago during the interwar period. Statistical methods were widely practised by social surveyors, social researchers, pollsters and market researchers; all of them made methodological contributions. At that time these fields were not separate but there was interaction as, for instance, some of the academic social researchers worked in community survey programmes and then moved back to the university¹⁰. Also, at least some of the academic departments and research institutes appear to have formed multidisciplinary – before the word was invented – environments, where social scientists, statisticians and psychologists met.

In the interwar years the development of sampling techniques continued, as did the discussion on the use and choice of sampling methods, which were far from being matters of course. By the end of the 1930s probability sampling became customary. Furthermore, advances were made by Louis Guttman and Rensis Likert in the attitude scaling techniques as they both invented scales which still carry their names (for details see Converse, 1987: 54–76).

Not surprisingly, these advances were not mobilized simultaneously in different disciplines and non-academic environments. They were also slow to spread, which can at least partly be explained by the material prerequisites of the time: ‘Tasks now routinely carried out by computer were then done by hand, very laboriously. (…) Quantitative analysis required much more intensive use of manpower than is the case today’ (Bulmer, 1984: 169).

Regarding these developments there is one study from Europe: Marienthal (Jahoda et al., 2002), published in Austria in 1933, which is worth mentioning. This study, which became a classic of social research, dealt with unemployment during the depression in an industrial village. The study combined various data types such as life histories, time sheets, school essays, meal records and statistical data. The authors – Marie Jahoda, Paul Lazarsfeld and Hans Zeisel – crystallized the atmosphere of the moment:

But there is a gap between the bare figures of official statistics and the literary accounts, open as they invariably are to all kinds of accidental impressions. The purpose of our study of the Austrian village, Marienthal, is to bridge this gap. (Jahoda et al., 2002: 1)

The study is said not to be directly influenced by American sociology or German social research (Fleck, 2002: viii). This conclusion is difficult to draw from the book itself because it is unconventional in the sense that there are no references. As an afterword, there is a short history of sociography by Hans Zeisel where he writes about ‘the American survey’. This proves that the authors were at least to some extent aware of American social research and the writings of the Chicago School. However, it can be said with certainty that this trio influenced American social research more thoroughly after their immigration to the United States in the 1930s¹¹.

All in all, it would probably be more apt to refer to both traditions in the plural and speak about case studies and statistical methods. This would also direct more attention to the obvious diversity within the traditions, even though a similarity is found between the sides of the controversy as both of them are said to have adhered to the realistic approach (Hammersley, 1989). In America, the controversy between case study and statistical methods faded away before the Second World War (Platt, 1992). The case study vanished for decades and the conceptual repertoire changed so that the concept of ‘statistical methods’ was replaced by the concept of survey without the epithet ‘social’.

From the 1940s to the end of the 1960s: The rise of survey

The Second World War can be considered as a watershed in the sense that almost
everything written on social research methods in the post-war period has focused on survey methods from different angles. Just as the economic depression of the 1930s in America stimulated social research, the Second World War also fuelled empirical research and especially the diffusion of survey methods. The two volumes of The American Soldier are often recognized as the keystones of modern survey (Stouffer et al., 1949a, 1949b). They belong to the monumental four-volume work entitled Studies in Social Psychology in World War II, which was published in 1949–50. The huge volumes consisted of reanalysis and rewriting of the data collected during the wartime by the Research Branch of the Army.

With data gathered from individuals largely by written questionnaires, Stouffer and his colleagues tried to capture some of the dynamic influence of group membership and context on individual perceptions, attitudes, opinions, morale, adjustment, and behaviours. Though they had few means of measuring group process directly, through tireless replication and imaginative analysis, they were able to cast some light on the interplay between individual and group characteristics. (Converse, 1987: 220)

Most of the reviewers noticed the contributions The American Soldier made to social research. According to Platt the significance of the study was that it established survey as the leading method of data collection (Converse, 1987: 217–24; Madge, 1963: 287–332; Platt, 1996: 60–61).

While methodological advances were made in empirical research, the logic of survey analysis was recorded and established in method textbooks. From the 1940s onwards several influential textbooks were published (Lundberg, 1942; Jahoda et al., 1951a,b; Hyman, 1960) and they spread widely outside America¹². In his textbook Social Research (1942) George Lundberg formulated the steps to be taken at the most advanced level of scientific research: ‘The working hypothesis; the observation and recording data; the classification and organisation of the data collected; generalisation to a scientific law, applicable to all similar phenomena in the universe studied under given conditions’. Lundberg considered his model applicable to the social as well as to the natural sciences (Platt, 1996: 78). Afterwards Lundberg has been labelled as an extreme operationalist and his approach has been criticized for being atheoretical (Platt, 1996: 93)¹³.

These decades are widely recognized as the heyday of survey. However, surprisingly, some of the best-known method textbooks do not focus in a blinkered way only on the collection of survey data (Jahoda et al., 1953a, 1953b; Riley, 1963; Selltiz et al., 1961; Young, 1949). On the contrary, the use of historical and personal documents, statistical data and field observation are also presented extensively, but when the focus turns to the methods of analysis most of the pages are reserved for statistical methods. There were also exceptions to the dominance of survey analysis in the 1940s and 50s. For instance, William Whyte used participant observation and attempted to systematize the case study method (Platt, 1996: 62–63).

After the war a change happened in social research in relation to theory. British interwar sociology has been described in this way: ‘These individuals who conducted survey before 1939 were not for the most part consciously trying to develop or test sociological theory. Their motives lay elsewhere but the end result of their endeavours was often the formulation of ideas and theories’ (Kent, 1985: 52). This statement also appears to apply to the American counterpart. After the war empirical research was often explicitly grasped as an effort to test a theory. However, a slightly different conception of theory is implied by Stouffer and Lazarsfeld, whose main goal, according to Converse, was to keep the scientists shuttling back and forth between theory and data (Converse, 1987: 219).

The controversies within survey are seldom taken into consideration either in origin myths or in the critiques of survey. In reality, in the 1940s and 50s, there were tensions and disagreements on various issues. For example, the usefulness of statistical tests in social sciences was disputed (Morrison & Henkel, 1970) and there was no consensus on whether questionnaires
should be based on open-ended or structured questions. Jean M. Converse claims that the controversy ended up in the structured questionnaires’ favour, but not by evidence (Converse, 1984, 1987). Many of these controversies can be interpreted as consequences of strong departmental traditions, which also influenced the style of analysis that was preferred (Platt, 1996: 133).

As the popularity of survey rose, so did the critique of it. Because of his central position in the field, Paul Lazarsfeld was one of the main targets. ‘Great man theories of history may be unfashionable, but they are hard to avoid here; the whole pattern of publication after the war is marked by Lazarsfeld’s influence’ (Platt, 1996: 61). Altogether his reception, as it has emphasized only his impact on survey methods, is criticized as lopsided compared to his contribution (Platt, 1996: 64). It has not been remembered, for instance, that he insisted that quantitative and qualitative analysis should be combined (Boudon, 1993: 23) and that he promoted research on the history of social research.

Herbert Blumer, the inventor of symbolic interactionism, had criticized statistical methods since the end of the 1920s. In the mid 1950s he targeted his critique especially on ‘variable sociology’ as a method of data collection and analysis and he saw Lazarsfeld as the main proponent of survey research. Blumer defined the process of interpretation as ‘the core of human action’ and considered variable sociology incapable of catching its essence. Blumer saw the potential of ‘variable sociology’ as very restricted. He notes that it is applicable to ‘those areas of social life and formation that are not mediated by an interpretative process’ but gives no examples of what such might be (Blumer, 1956; see also Hammersley, 1989: 113–36). Despite his searing criticism against survey, Blumer did not suggest an alternative way of doing social research as he conducted very little empirical research himself (Platt, 1996: 120)¹⁴.

In 1959 in The Sociological Imagination C. Wright Mills attacked what he called ‘abstracted empiricism’. Again Lazarsfeld was seen as its leading exponent. A few years later in Method and Measurement Aaron Cicourel discussed the problems that come up when sociologists try to measure meaningful action. He did not intend to offer a solution either; if anything he called for clarification of sociological theory (1964: iii). Since the 1950s Howard S. Becker has contributed to the use of qualitative methods and especially to participant observation with his studies on collective action: ‘I conceive of society as collective action and sociology as the study of the forms of collective action’ (Becker, 1970: v). Becker’s methodological writings differed from the ones mentioned above as he did not concentrate on dissecting the weaknesses of the survey method. All these researchers prove that besides the mainstream of survey, there were efforts towards more qualitatively orientated methods of social research. Textbooks on qualitative methods did not appear until the end of the 1960s, when The Discovery of Grounded Theory was published (Glaser & Strauss, 1967).

In the late 1960s and 70s it was common to claim that there is a connection between functionalism and survey method since they were the leading tendencies in post-war social research. These views rested on the assumption that ‘(t)he relationship between method and theory is one of elective affinity, but not symmetrical: theory is more fundamental, and leads to the corresponding method or (…) the epistemological leads to the technical’ (Platt, 1996: 106). Later on Jennifer Platt claims that it was more of a coincidence that functionalism and survey dominated at the same time and there is no causal or logical connection between them (Platt, 1996: 113–17; 2006a).

Treating three post-war decades together necessarily gives a rough-grained picture. It does not do justice to the variety of social research during this period. For instance, the year 1960 has sometimes been considered a watershed, because, first, the pioneers, e.g. Lazarsfeld, Stouffer and Likert, were no longer active in survey work and, second, the modern survey had also been established
in an institutional sense (Converse, 1987: 381). In any case, in the 1960s survey methods were widely exercised and had such a dominant position in the field that they provoked an increasing amount of criticism.

The 1970s and 1980s: The paradigm war

The beginning of the 1970s can be considered as a turning point, when the unexplored era of social research methods begins. Not much is known about the formation of research methods from 1970 onwards. After the turn of the decade, the discussion on research methods became structured by the quantitative-qualitative distinction. The whole period is known for this debate.

In this debate, strong epistemological assumptions were made about methods. Consequently, they were described as rooted in contradictory epistemological traditions. Positivism as an epistemological stance was firmly connected to quantitative methods and to survey. Correspondingly the emerging qualitative methods were related to the traditions of phenomenology and hermeneutics. Following from these epistemological assumptions quantitative methods, especially survey, and qualitative methods were conceived of as incompatible. In these critiques positivism became a new nickname for survey; such a labelling had not been made by the critics of the 1950s and 60s. At this point the concept of a paradigm was also employed to refer to the opposite nature of the qualitative and quantitative traditions.

Positivism is an example of a label that is given to a tradition from the outside. It is well known that there is no shared understanding of ‘positivism’, but numerous contradictory ways of using the term. In fact, there have been several distinct debates on social research and positivism in the course of the twentieth century (Bryant, 1985; Halfpenny, 1982; an insightful summary of survey critiques is presented by Marsh, 1982: 48–68).

In the textbooks of qualitative methods the authors’ own approach since the 1970s was often justified by contrasting it to the weaknesses of survey (Bogdan & Taylor, 1975). As a reaction to the mushrooming survey critiques the textbooks of survey methods also started to go through and to reply to these critiques (e.g. De Vaus, 1995: 7–10; Marsh, 1982).

If the tension between the quantitative and qualitative approaches is seen as characteristic of this period, that is not all; one should also remember that both approaches have transformed. From the survey textbooks and empirical articles it can be inferred that more complicated methods of multivariate analysis were applied to survey data. At the beginning of the decade new opportunities opened up for quantitative analysis along with the development of computers:

The development of electronic computers has led to tremendous advances in survey analysis. Not only has it resulted in great ease in tabulation but, more importantly, it has led to the use and development of high-powered multivariate statistical procedures. Before the advent of computers, the enormous amount of computation required for multivariate statistical analyses in large-scale surveys limited the use of these methods drastically. Multivariate methods were employed by only a few survey researchers, and even they had to restrict their analyses severely. (Moser & Kalton, 1986: 432)

Furthermore the methods of survey data collection have been shaped by the evolution of techniques, which for instance led to the emergence of the new forms of computer-assisted interviewing. In addition, more substantial work has been done to improve the questionnaire design (for a summary of this research see Schaeffer & Presser, 2003). What is often forgotten is the importance of data archives especially for the use of survey. Data archives greatly increased the availability of survey data and made secondary analysis feasible.

In a review on the history of qualitative methods the time-span from 1970 to 1986 has been designated as a period of ‘blurred genres’. This refers to the situation where ‘qualitative researchers had a full complement of paradigms, methods, and strategies to employ in their research’
(Denzin & Lincoln, 2000b: 15). On the list a wide range of theories is mentioned such as symbolic interactionism, ethnomethodology, critical theory, feminism and neo-Marxist theory. Furthermore Denzin and Lincoln remind us that ‘diverse ways of collecting and analysing empirical materials were also available, including qualitative interviewing (…) and observational, visual, personal experience, and documentary methods’ (Denzin & Lincoln, 2000b: 15). Exceptionally, the authors draw attention to computers that were also beginning to influence the methods of qualitative data analysis. Surprisingly, they do not recognize the impact of new technical devices (such as tape recorders and video cameras) on the methods of data collection15.

All in all, during these two decades qualitative methods were established in several method textbooks and journals that certainly do not make up a coherent unity. The naturalistic, postpositivistic and constructionist traditions of thinking have been seen as distinctive to qualitative methods of this period. By the 1980s the linguistic turn started to challenge the more naturalistic lines of thinking. The linguistic turn probably also diverted attention from the qualitative-quantitative divide, for instance to the controversies within qualitative methods.

There is some indication that at this point the American and European methodological traditions diverged at the level of empirical research. In America the success story of survey methods continued and there was serious work done to advance the methods of survey research. In Britain, and maybe more generally in Europe, survey methods gained a bad reputation in academic research and the listings of their failings started to spread (see e.g. Marsh, 1982). In the beginning of this period the quantitative and qualitative traditions were defined as incompatible, but as time went by the juxtaposition was questioned and by the end of the 1980s the possibility of mixing the methods was taken under consideration (e.g. Bryman, 1988). For example David Silverman (1985) ‘radically’ suggested combining quantitative and qualitative analysis of qualitative data.

From the 1990s onwards: Unavoidable fragmentation?

Apparently, the most difficult task for a historian is to try to find current patterns. Every reader can try to figure out the essential trends of contemporary social research after reading this handbook. However, two tendencies in the evolution of social research methods since 1990 will be discussed here with some, but not systematically selected, evidence. The first one is the fragmentation or diffusion of methodological approaches, and the second one is the increasing tolerance between various methods of analysis and data collection.

I claim that the differentiation of methodological approaches has continued to escalate within both qualitative and quantitative methods since the beginning of the 1990s. There are highly specialized approaches within both traditions – one can specialize in conversation or correspondence analysis, choose to construct structural equation or multilevel models or end up with one of the many variations of discursive or narrative analysis, just to mention a few alternatives. The increasing number of analytical approaches can partly be seen as a consequence of interaction between different disciplines and traditions. Simultaneously, numerous narrowly focused textbooks and journals have emerged to institutionalize them.

The abundance of different methodological and theoretical approaches or traditions comes out clearly from the periodization of qualitative methods presented by Norman Denzin and Yvonna Lincoln (2000b). They divide the field of qualitative methods since 1986 into four separate, but partly overlapping, phases that relate to successive waves of epistemological theorizing that have followed a crisis of representation. Each of the ‘moments’, as they are called, covers only a few years and takes a different stance on the crisis of representation.
The four moments are the crisis of representation, the postmodern period of experimental ethnographic writing, the post-experimental moment, and the future. The crisis of representation is associated with some methodological texts (e.g. Clifford & Marcus, 1986; Turner & Bruner, 1986) that made research and writing more reflexive and conscious of questions of gender, class and race. As the crisis of representation meant that researchers were no longer seen as able to capture lived experience, it changed the relations of fieldwork, analysis and scientific writing. This led to the search for new models of truth, method and representation. The postmodern period of experimental ethnographic writing struggled with the triple crisis of representation (i.e. crisis of representation, legitimation and praxis). In this moment an effort was made to search for more local and small-scale theories instead of grand narratives, and writers also looked for new ways of composing ethnography. According to Denzin and Lincoln the post-experimental moment and the future were upon ‘us’ by the turn of the millennium. In the post-experimental phase researchers try ‘to connect their writings to the needs of a free democratic society’ and to answer the demands of a moral qualitative social science (Denzin & Lincoln, 2000a: 16–18; 2000b).

Even though this delineation has been criticized (e.g. Alasuutari, 2004), it proves that the field appears quite complex even to the insiders. The complexity of qualitative methods is also pointed out by Jaber Gubrium and James Holstein (1997). Their overview is also illuminating historically, as it goes to the roots of diverse lines of qualitative methods and takes into account the European tradition. What is still missing is a corresponding study of the ramifications of quantitative methods since the 1970s.

Concurrently with this fragmentation, tolerance between different methodological approaches seems to have slightly increased. A growing number of methodological texts has been published during this period, first exploring and pondering the possibility of mixed methods research (usually asking whether qualitative and quantitative methods can be combined) and later on more confidently proclaiming the use of mixed methods research (Brannen, 1992, 2005; Tashakkori & Teddlie, 1998). The number of textbooks that include chapters on both qualitative and quantitative traditions has recently increased (e.g. Bernard, 2000; May, 2003). The new Journal of Mixed Methods Research is also an indicator of this kind of change. In its very first number, the journal presents an outline of a transition in relation to mixed methods research as well as a detailed analysis of various types of multi-methods research (Morgan, 2007). This tendency has been interpreted as a sign of the increasing popularity of a more pragmatic approach to research methods (Tashakkori & Teddlie, 1998).

These two tendencies raise two questions. First, the motto of the mixed methods approach has proclaimed a ‘dictatorship of the research question’ in the choice of research methods (Tashakkori & Teddlie, 1998: 20–22), but how can one rationally choose the method in a situation where it is impossible even to master the whole spectrum of alternatives by name? Second, is the suggested tolerance between the various methodological traditions only superficial? Are dialogue and deeper understanding between the diverse lines of thinking on research methods possible16?

THE ACTUAL USE OF DIFFERENT METHODS

So far the evolution of social research methods has been the centre of attention and very little has been said about the actual use of research methods. However, there are some empirical studies that have examined the actual use of different research methods, mainly during the post-war decades. They will be briefly discussed to shed more light on some points of the history that were dealt with earlier in this chapter.

These studies are indicative of the proportions of the different research methods at various points in time (Snizek, 1975;
Wells & Picou, 1981; cf. Platt, 1996: 124–25; Bechhofer, 1996; also Platt, 2006b). Most of the studies draw on analyses of journal articles and cover a time-span from the end of the 1930s to the mid 1970s; only one of the studies goes back to the interwar decades.

However, regardless of the differences in the periods and categorizations of the research methods, the main results are parallel. Not surprisingly, the studies show the rise of survey and other methods based on quantification, especially in the leading journals of sociology in America during the post-war decades. But they also show that survey methods never – not even in the 1950s and 60s – were the only ones applied. Other apparently more qualitative approaches such as ‘observation’, ‘the interpretative method’ and ‘the qualitative method’ were always used to some extent, although clear trends can be found in the popularity of the different methods in America. One of the studies also shows that a small amount of experimental research was published around the Second World War (Wells & Picou, 1981). Because the experimental approach never gained success in social research, it is easily forgotten in method histories that it was regarded as a promising, and sometimes even the only rigorously scientific, method, supported for example by Samuel Stouffer.

Quite recently, on the basis of studying journal articles and conference abstracts, the decline of survey and more sophisticated statistical methods has been shown in Britain (Bechhofer, 1996; Payne et al., 2004). This data on the actual use of methods also provides some evidence for the assumption that social research has gone in different directions in America and Europe.

Given the attention that these studies have directed to the quantitative-qualitative divide, they appear to be motivated by contemporary methodological debates. Yet most of the articles have been descriptive, and attempts to explain the changes in the popularity of particular research methods have been rare. Not even sloppy explanations drawing on the concepts of science studies, like ‘paradigm’, can be found.

CONCLUSION AND DISCUSSION

Up to now sociologists have scarcely occupied themselves with the task of characterising and defining the method that they apply to the study of social facts. (Durkheim, 1982: 48)

Since Durkheim’s time social scientists have spared no effort when writing on research methods. Enormous amounts of methodological texts have been written and numerous controversies have also arisen on methodological issues.

The twentieth century has been a period of great expansion and institutionalization for social research and its methods. To summarize, not only the methods as such but also the relationships of different methods and methodological approaches have changed considerably during the period considered here. There have also been numerous methodological debates both within the quantitative (e.g. on probability sampling, questionnaire construction, statistical testing and causality) and qualitative approaches (Denzin & Lincoln, 2000b). Less attention is often paid to these controversies than to the dispute that is now being referred to as the paradigm war and which has drawn most of the attention.

There are some issues that seem to occur frequently in methodological writing. One is the relationship between theories and methods and another is the relationship of qualitative and quantitative methods (whatever they are called). The first one is here passed by with only wonder as to whether there has been a shift in the interrelations between methods and theories during the past decade or two so that methods are more frequently seen as matters of a technical nature, not as theories of reality in themselves.

The controversy between qualitative and quantitative approaches is the most discussed topic; it has come up frequently under different names (case study vs. statistical method, participant observation vs. survey, qualitative vs. quantitative) (cf. Platt, 1996: 45). The divide has not only split methods textbooks and teaching but also the research on social research methods. There are only very few texts that even try to cover both approaches.
This divide has also drawn attention away from the attempts being made to combine the two approaches. The earliest attempts to bridge the gap between qualitative and quantitative methods, a gap that was just about to become important, were made in the 1930s. For instance, Hans Zeisel concluded his history of sociography in a way that still sounds familiar: ‘The task of integration lies still ahead’.

An interesting question is why methods, the means of knowing, have become a subject of furious disputes – or wars. Why have methods in particular been so emotionally loaded for such a long time? Ann Oakley has suggested that the paradigm war will continue as long as there are communities that take sides (Oakley, 2000: 41–42). Another reason may be that research methods have been connected with theoretical approaches. Just as the rise of survey methods was connected with the rise of the structuralist-functionalist approach, the rise of qualitative methods has been concomitant with the expansion of constructionism.

This chapter has largely drawn on studies in the history of methods. From time to time the importance of such research has been noticed. Paul Lazarsfeld was one of the first people who recognized the need for research on research methods. He even wished that ‘perhaps soon a historian of empirical sociology will be an acknowledged specialist of his own, where familiarity with contemporary work, skill in archival inquiry, and creativity in interpretation will be equally required’ (Lazarsfeld, 1972: xv).

One can doubt whether the history of methods will ever be a specialist area or even whether it should be one. Yet research on social research methods is needed to prevent origin myths or other empirically ungrounded narratives from becoming the only versions of the course of history. If any version of history can be considered partial, one can remember Jennifer Platt’s comforting words that ‘(p)robably it is most fruitful to see the attempts to write the history of empirical social research as a necessarily continuing discussion’ (Platt, 1996: 4).

ACKNOWLEDGEMENTS

I wish to thank the editors of this book and the anonymous referee for their valuable comments on this chapter. I also wish to thank the Academy of Finland for funding (project number 114638) which made possible the writing of this chapter.

NOTES

1 Martin Bulmer has discussed the terminological differences between ‘social research’, ‘sociology’ and ‘social science’ in the British context and notes that they are indicative of underlying tensions. According to Bulmer there has been a discontinuity between sociology and empirical research as the latter cannot be treated as a part of the former, because of both intellectual and institutional differences (Bulmer, 1985: 4–5).
2 Here I must draw attention to the importance of Jennifer Platt’s work in this field. She has not only argued for empirical research on the history of methods, but also conducted a significant amount of research in this area.
3 Similarly, the formation of survey is often told in the method textbooks in a way that can be viewed as an origin myth. An origin myth of survey can begin, first, with the ancient censuses; second, with the history of statistics; or third, with the early (British) social surveys. These histories may contain leaps of hundreds of years and they are often quite brief listings of methodological improvements and the most important empirical studies.
4 Paradoxically researchers of the history of methods have seldom paid any attention to the methods of their own research – neither in the sense of data collection nor analysis – or to methods used in this kind of historical research more generally. However, if the methods are explicated, the datasets drawn on typically consist of empirical research (journal articles), method textbooks, interviews and syllabi.
5 Weber’s three methodological writings originally published between 1904 and 1917 were translated into English and edited in 1949 under the title Methodology of the Social Sciences by Edward A. Shils and Henry A. Finch.
6 The notion of ‘school’ is discussed e.g. by Bulmer (1984: 2–3) and Platt (1996: 230–37).
7 An anecdote is told that in the 1930s in Chicago ‘baseball sides at the annual faculty-student picnic were chosen to represent case study vs statistical method’ (Platt, 1992: 19). Martyn Hammersley discusses at length the case study vs statistical method controversy, focusing especially on the argument between Herbert Blumer and George Lundberg
(Hammersley, 1989: 92–112). He notes that in this debate ‘we can see the emergence of many arguments that are used today by the advocates of qualitative and quantitative approaches’ (Hammersley, 1989: 111–12).
8 William Ogburn, trained at Columbia, was appointed to Chicago to strengthen the quantitative side of the department of sociology in 1927. That same year the psychologist L.L. Thurstone was also nominated as associate professor of psychology. Bulmer notes these nominations as signs of the collective commitment to excellence, because they were made despite the diversity in the department’s interests (Bulmer, 1984: 170–72, 176). Ogburn was a spokesman for the use of statistical methods, as he wrote that ‘a body of knowledge ought not to be called science until it can be measured’ (Hammersley, 1989: 95).
9 Charles E. Merriam’s and Harold F. Gosnell’s study Non-Voting (1924) has been celebrated because of its complex and innovative research design and data collection, which was based on personal interviews as well as written questionnaires. Gosnell is usually credited with the methodological expertise. Despite its high quality the study is seldom recognized in the histories of survey (Converse, 1987: 79–83; also Bulmer, 1984: 164–69).
10 The career of Samuel Stouffer can be considered as an example of such interaction. He graduated from Chicago, worked in the Research Branch of the US Army, and ended up at Harvard.
11 Marienthal was translated into English as late as 1972, although it was reviewed in various journals in a number of languages at the time of publication (Fleck, 2002).
12 These textbooks spread in several editions. There is even a legend that modern sociology was founded in Norway when during the Second World War Lundberg’s Social Research was found in the backpack of a member of the resistance movement, who had died in combat (Eskola, 1992: 260).
13 These characterizations were made by social scientists interviewed by Platt in the beginning of the 1980s, so they do not necessarily correspond to the reception of Lundberg’s writing in his own time.
14 Again this may be a statement that is not endorsed by everyone, e.g. Martyn Hammersley has written extensively on Blumer’s alternative (1989: 155–220).
15 One can ponder whether it is apt to refer to this as a period of ‘blurred genres’ or whether the label is due to lack of research on developments in qualitative methods.
16 Frank Bechhofer describes the British situation in the mid 1990s in this way: ‘There is no sign of a move away from two empirical cultures within the discipline, one growing the other static, with little communication between them’ (Bechhofer, 1996: 588). By ‘growing’ Bechhofer refers to qualitative and by ‘static’ to quantitative ‘empirical culture’.

REFERENCES

Abrams, Philip (1981) The Origins of British Sociology: 1834–1914. Chicago: The University of Chicago Press.
Alasuutari, Pertti (2004) The Globalization of Qualitative Research. In Clive Seale et al. (eds.) Qualitative Research Practice. London: Sage, 595–608.
Bechhofer, Frank (1996) Quantitative Research in British Sociology: Has It Changed Since 1981? Sociology, 30(3), 583–591.
Becker, Howard S. (1970) Sociological Work. Method and Substance. New Brunswick: Transaction Books.
Bernard, H. Russell (2000) Social Research Methods: Qualitative and Quantitative Approaches. Thousand Oaks: Sage.
Blumer, Herbert (1956) Sociological Analysis and the ‘Variable’. American Sociological Review, 21(6), 683–690.
Bogdan, Robert & Taylor, Steven J. (1975) Introduction to Qualitative Research Methods. A Phenomenological Approach to Social Sciences. New York: Wiley & Sons.
Boudon, Raymond (1993) Introduction. In Boudon, Raymond (ed.) Paul F. Lazarsfeld. On Social Research and Its Language. Chicago: University of Chicago Press, 1–29.
Brannen, Julia (ed.) (1992) Mixing Methods: Qualitative and Quantitative Research. Aldershot: Avebury.
Brannen, Julia (2005) Mixing Methods: The Entry of Qualitative and Quantitative Approaches into the Research Process. International Journal of Social Research Methodology, 8(3), 173–184.
Bryant, Christopher G.A. (1985) Positivism in Social Theory and Research. New York: St. Martin’s Press.
Bryman, Alan (1988) Quality and Quantity in Social Research. London: Unwin Hyman.
Bulmer, Martin (1984) The Chicago School. Institutionalization, Diversity, and the Rise of Sociological Research. Chicago: The University of Chicago Press.
Bulmer, Martin (ed.) (1985) Essays on the History of British Sociological Research. Cambridge: Cambridge University Press.
Cicourel, Aaron (1964) Method and Measurement. New York: The Free Press.
Clifford, James & Marcus, George E. (eds.) (1986) Writing Culture: The Poetics and Politics of Ethnography. Berkeley: University of California Press.
Converse, Jean M. (1984) Strong Arguments and Weak Evidence: The Open/Closed Questioning Controversy of the 1940s. Public Opinion Quarterly, 48(1B), 267–282.
Converse, Jean M. (1987) Survey Research in the United States. Roots and Emergence 1890–1960. Berkeley: University of California Press.
Denzin, Norman K. & Lincoln, Yvonna S. (eds.) (2000a) Handbook of Qualitative Research (2nd edn). Thousand Oaks: Sage (1st edn 1994).
Denzin, Norman K. & Lincoln, Yvonna S. (2000b) Introduction. The Discipline and Practice of Qualitative Research. In Norman K. Denzin & Yvonna S. Lincoln (eds.) Handbook of Qualitative Research (2nd edn). Thousand Oaks: Sage, 1–28.
De Vaus, D.A. (1995) Surveys in Social Research (4th edn). London: Routledge.
Devine, F. & Heath, S. (1999) Sociological Research Methods in Context. Houndmills: Palgrave.
Durkheim, Emile (1982) The Rules of Sociological Method and Selected Texts on Sociology and Its Method. London: The Macmillan Press Ltd.
Eskola, Antti (1992) Sosiologian uudistuminen 1950-luvulla. In Alapuro, Risto, Alestalo, Matti & Haavio-Mannila, Elina (eds.) Suomalaisen sosiologian historia. Porvoo: WSOY, 241–285.
Fleck, Christian (2002) Introduction to the Transaction Edition. In Marie Jahoda, Paul Lazarsfeld & Hans Zeisel (eds.) Marienthal. The Sociography of an Unemployed Community. New Brunswick: Transaction Publishers, vii–xxx.
Glaser, Barney G. & Strauss, Anselm L. (1967) The Discovery of Grounded Theory. Strategies for Qualitative Research. New York: Aldine.
Gubrium, Jaber F. & Holstein, James A. (1997) The New Language of Qualitative Method. New York: Oxford University Press.
Gubrium, Jaber F. & Holstein, James A. (eds.) (2002) Handbook of Interview Research: Context and Method. Thousand Oaks: Sage.
Halfpenny, Peter (1982) Positivism and Sociology: Explaining Social Life. London: Allen & Unwin.
Hammersley, Martyn (1989) The Dilemma of Qualitative Method. Herbert Blumer and the Chicago Tradition. London: Routledge.
Hardy, Melissa & Bryman, Alan (eds.) (2004) Handbook of Data Analysis. Thousand Oaks: Sage.
Harvey, Lee (1987) The Myths of the Chicago School of Sociology. Aldershot: Avebury.
Hoinville, Gerald (1985) Methodological Research on Sample Surveys: a Review of Developments in Britain. In Martin Bulmer (ed.) Essays on the History of British Sociological Research. Cambridge: Cambridge University Press, 101–120.
Hyman, Herbert H. (1960) Survey Design and Analysis. Principles, Cases and Procedures. Third Printing. Glencoe: The Free Press (Orig. 1955).
Höjer, Henrik (2001) Svenska siffror: nationell integration och identifikation genom statistik 1800–1870. Hedemora: Gidlunds.
Jahoda, Marie, Deutsch, Morton & Cook, Stuart W. (1953a) Research Methods in Social Relations with Especial Reference to Prejudice. Part I: Basic Processes. New York: Dryden Press.
Jahoda, Marie, Deutsch, Morton & Cook, Stuart W. (1953b) Research Methods in Social Relations with Especial Reference to Prejudice. Part II: Selected Techniques. New York: Dryden Press.
Jahoda, Marie, Lazarsfeld, Paul & Zeisel, Hans (2002) Marienthal. The Sociography of an Unemployed Community. New Brunswick: Transaction Publishers (Orig. 1933).
Kent, Raymond (1981) The History of British Empirical Sociology. Aldershot: Gower.
Kent, Raymond (1985) The Emergence of the Sociological Survey, 1887–1939. In Martin Bulmer (ed.) Essays on the History of British Sociological Research. Cambridge: Cambridge University Press, 52–69.
Kvale, Steinar (1996) InterViews. An Introduction to Qualitative Research Interviewing. Thousand Oaks: Sage Publications.
Lazarsfeld, Paul F. (1965) Preface. In Oberschall, Anthony (ed.) Empirical Social Research in Germany 1848–1914. Paris: Mouton & Co, v–viii.
Lazarsfeld, Paul F. (1972) Foreword. In Oberschall, Anthony (ed.) The Establishment of Empirical Sociology. Studies in Continuity, Discontinuity, and Institutionalization. New York: Harper & Row, vi–xvi.
Lazarsfeld, Paul F. (1977) Notes on the History of Quantification in Sociology – Trends, Sources and Problems. In Kendall, Maurice & Plackett, R.L. (eds.) Studies in the History of Statistics and Probability vol. II. London: Charles Griffin & Company Limited, 213–270 (Orig. 1961).
Lazarsfeld, Paul (1993a) Methodological Problems in Empirical Social Research. In Boudon, Raymond (ed.) On Social Research and Its Language. Chicago: University of Chicago Press, 236–254.
Lazarsfeld, Paul (1993b) Max Weber and Empirical Social Research. In Boudon, Raymond (ed.) On Social Research and Its Language. Chicago: University of Chicago Press, 283–298.
Lundberg, George (1942) Social Research: A Study in Methods of Gathering Data. New York: Green & Co.
Madge, John (1963) The Origins of Scientific Sociology. London: Tavistock Publications.
Marsh, Catherine (1982) The Survey Method. The Contribution of Surveys to Sociological Explanation. London: George Allen & Unwin.
Marsh, Catherine (1985) Informants, Respondents and Citizens. In Martin Bulmer (ed.) Essays on the History of British Sociological Research. Cambridge: Cambridge University Press, 206–227.
May, Tim (2003) Social Research: Issues, Methods and Process (3rd edn). Buckingham: Open University Press.
Mills, C. Wright (1977) The Sociological Imagination. Harmondsworth: Pelican Books (Orig. 1959).
Morgan, David L. (2007) Paradigm Lost and Pragmatism Regained. Methodological Implications of Combining Qualitative and Quantitative Methods. Journal of Mixed Methods Research, 1(1), 48–76.
Morrison, Denton & Henkel, Ramon E. (eds.) (1970) The Significance Test Controversy – A Reader. Chicago: Aldine.
Moser, Claus & Kalton, Graham (1986) Survey Methods in Social Investigation (2nd edn). Aldershot: Gower.
Oakley, Ann (2000) Experiments in Knowing. Gender and Method in Social Sciences. Cambridge: Polity Press.
Oberschall, Anthony (1965) Empirical Social Research in Germany 1848–1914. Paris: Mouton & Co.
Payne, G., Williams, M. & Chamberlain, S. (2004) Methodological Pluralism in British Sociology. Sociology, 38(1), 153–163.
Platt, Jennifer (1983) Weber’s verstehen and the History of Qualitative Research: The Missing Link. British Journal of Sociology, 26(3), 448–466.
Platt, Jennifer (1986) Qualitative Research for the State. Quarterly Journal of Social Affairs, 2, 87–108.
Platt, J. (1992) ‘Case Study’ in American Methodological Thought. Current Sociology, 40(1), 17–48.
Platt, Jennifer (1996) A History of Sociological Research Methods in America 1920–1960. Cambridge: Cambridge University Press.
Platt, Jennifer (2002) The History of Interview. In Gubrium, Jaber F. & Holstein, James A. (eds.) Handbook of Interview Research: Context and Method. Thousand Oaks: Sage, 33–53.
Platt, Jennifer (2006a) Functionalism and the Survey: The Relation of Theory and Method. In Williams, M. (ed.) Philosophical Foundations of Social Research Methods. London: Sage, 217–251 (orig. in Sociological Review, 34(3), 501–536).
Platt, Jennifer (2006b) How Distinctive are Canadian Research Methods? Canadian Review of Sociology and Social Anthropology, 43(2), 205–231.
Porter, Theodore M. (1986) The Rise of Statistical Thinking 1820–1900. Princeton: Princeton University Press.
Riley, Matilda White (1963) Sociological Research I. A Case Approach. New York: Harcourt, Brace & World, Inc.
Schaeffer, Nora Cate & Presser, Stanley (2003) The Science of Asking Questions. Annual Review of Sociology, 29, 65–88.
Selltiz, Claire, Jahoda, Marie, Deutsch, Morton & Cook, Stuart W. (1961) Research Methods in Social Relations. New York: Holt, Rinehart and Winston.
Selvin, Hanan C. (1958) Durkheim’s Suicide and Problems of Empirical Research. American Journal of Sociology, 63(6), 607–619.
Selvin, Hanan C. (1985) Durkheim, Booth and Yule: the Non-diffusion of an Intellectual Innovation. In Martin Bulmer (ed.) Essays on the History of British Sociological Research. Cambridge: Cambridge University Press, 70–82.
Silverman, David (1985) Qualitative Methodology and Sociology. Describing the Social World. Aldershot: Gower.
Snizek, W.E. (1975) The Relationship between Theory and Research: A Study in the Sociology of Sociology. Sociological Quarterly, 16, 415–428.
Stigler, Stephen M. (1986) The History of Statistics: The Measurement of Uncertainty Before 1900. Cambridge: Harvard University Press.
Stouffer, S.A., Suchman, E.A., de Vinney, L.C., Star, S.A. & Williams, R.M. (1949a) The American Soldier vol. I. Adjustment during Army Life. Princeton: Princeton University Press.
Stouffer, S.A., Suchman, E.A., de Vinney, L.C., Star, S.A. & Williams, R.M. (1949b) The American Soldier vol. II. Combat and Its Aftermath. Princeton: Princeton University Press.
Tashakkori, Abbas & Teddlie, Charles (1998) Mixed Methodology. Combining Qualitative and Quantitative Approaches. Thousand Oaks: Sage.
Turner, Victor W. & Bruner, Edward (eds.) (1986) The Anthropology of Experience. Urbana: University of Illinois Press.
Vidich, Arthur J. & Lyman, Stanford M. (1994) Qualitative Methods: Their History in Sociology and Social Anthropology. In Norman K. Denzin & Yvonna S. Lincoln (eds.) Handbook of Qualitative Research (2nd edn). Thousand Oaks: Sage, 23–59 (2nd edn 2000).
Wells, R.H. & Picou, J.S. (1981) American Sociology: Theoretical and Methodological Structures. Washington DC: University Press of America.
Young, Pauline V. (1949) Scientific Social Surveys and Research. An Introduction to the Background, Content, Methods, and Analysis of Social Studies (2nd edn). New York: Prentice-Hall (1st edn 1939).
Zeisel, Hans (2002 [1930]) Afterword. Toward a History of Sociography. In Jahoda, Marie, Lazarsfeld, Paul & Zeisel, Hans (eds.) Marienthal. The Sociography of an Unemployed Community. New Brunswick: Transaction Publishers, 99–125 (orig. 1933).
4
Assessing Validity in
Social Research
Martyn Hammersley
Much discussion of how validity should be assessed in social research has been organized around the distinction between quantitative and qualitative approaches, with arguments over whether or not the same criteria apply to both. It is often suggested that quantitative inquiry has a clear set of assessment criteria, so that readers (even those who are not researchers) can judge the quality of such research relatively easily, whereas in the case of qualitative inquiry no agreed or easily applicable set of criteria is available. While this is often presented as a problem, some qualitative researchers deny the possibility or even the desirability of assessment criteria.

In this chapter I will argue that this contrast between the two approaches is, to a large extent, illusory; that it relies on misleading conceptions of the nature of research, both quantitative and qualitative, and of how it can be assessed. I will suggest that the general standards in terms of which both the process and products of research should be judged are the same whichever approach is employed. Furthermore, when it comes to more detailed criteria of assessment these need to vary according to the nature of the conclusions presented, and the characteristics of the specific methods of data collection and analysis used. In the course of the chapter, I will raise questions about both older positivist conceptions of quantitative research, and of how it should be assessed, and those more recent relativist and postmodernist ideas, quite influential among qualitative researchers, which reject epistemic criteria of assessment, and perhaps even all criteria.

In the first section, I will examine the criteria normally associated with quantitative work. This discussion will raise several questions. One of these concerns what is being assessed, and the need to make some differentiation here, notably between assessing findings and assessing the value of particular research techniques. Another issue relates to what is meant by the term ‘criterion’ and what role criteria play in the process of assessment. In the second half of the chapter I will examine some of the arguments in
the qualitative research tradition about how studies ought to be evaluated.

QUANTITATIVE CRITERIA?

If we look at the methodological literature dealing with quantitative research, and indeed at many treatments of the issue of validity in relation to social inquiry more generally, several standard criteria are usually mentioned. These concern three main aspects of the process of research: measurement, generalization, and the control of variables.

In relation to measurement, the requirements usually discussed are that measures must be reliable and valid. Reliability is generally taken to concern the extent to which the same measurement technique or strategy produces the same result on different occasions, for example when used by different researchers. This is held to be important because if researchers are using standard measurement devices, such as attitude scales or observation schedules, they need to be sure that these give consistent results. Furthermore, it is often argued that any measure that is not reliable cannot be valid, on the grounds that, if its results are inconsistent, the measurements it produces cannot be consistently valid. As this argument indicates, validity of measurement is seen as important by quantitative researchers, even though it is usually taken to be more difficult to assess than reliability. Indeed, given the link between the two criteria, reliability tests are often treated as one important means for assessing validity. Nevertheless, separate validity tests may also be used, for instance checking whether different ways of measuring the same property produce the same findings, or whether what is found when measuring the property in a particular set of objects is consistent with the subsequent behaviour of those objects. These tests are often described as assessing different kinds of validity, in this case convergent and predictive validity¹.

On the basis of this initial discussion, we can identify a first key question to be applied in assessing the validity of quantitative research: were the measurement procedures reliable and valid? And it is often suggested that, in evaluating a study, the way to go about answering this question is to ask whether reliability and validity tests were carried out, and whether the scores on these tests were high enough to warrant a positive evaluation. This, then, is one set of commonly used criteria.

The second key area to which well-known criteria of assessment relate concerns the generalizability of the findings. This is an especially prominent issue in the context of survey research, where data from a sample of cases are often used as a basis for drawing conclusions about the characteristics of a larger population. In this context, the issue is relatively clear: are the statements made about the sample also true of the population? Short of investigating the whole population, which would render sampling pointless, there is no direct means of answering this question. However, statistical sampling theory provides a basis for coming to a reasonable conclusion about the likely validity of inferences from sample to population. If the sample was sufficiently large, and was drawn from the population on the basis of some kind of probability sampling, then a statistical measure can be provided of how confident we can be that the findings are generalizable. The criteria involved here then, are the sampling procedures employed and the results of a statistical significance test².

The final area where quantitative criteria are well established concerns whether variables have been controlled in a sufficiently effective manner to allow sound conclusions to be drawn about the validity of causal or predictive hypotheses; this sometimes being referred to as causal validity. Experimental designs employing random allocation of subjects to treatment and control groups are often seen as the strongest means of producing valid conclusions in this sense. However, statistical control, through multivariate analysis, is an alternative strategy that is employed in much social survey research. Moreover, with both forms of control, statistical tests are often applied to assess the chances that the results
were a product of random error rather than of the independent variable. Here, then, the criteria concern whether physical or statistical control was applied, and the confidence we can have in ruling out random error.

Undoubtedly the most influential account of evaluative criteria for quantitative research that draws together these three different aspects into a single framework is that developed by Campbell and his colleagues (Campbell 1957; Campbell and Stanley 1963; Cook and Campbell 1979). This distinguishes between internal and external validity, where the former is usually seen as incorporating measurement and causal validity, while external validity refers to generalizability³. Campbell et al.’s scheme was originally developed for application to quasi-experimental research, but it has subsequently been applied much more widely.

There is no doubt that these three issues are potentially key aspects of any assessment of validity in quantitative research, and perhaps in social inquiry more generally. However, there are a number of important qualifications that need to be made.

First, we must be clear about what we are assessing. There is confusion in much discussion of measurement between a concern with assessing the findings of the measurement process and assessing the measurement technique or strategy employed. Validity relates only to the former, while reliability concerns the latter. We can talk about whether the findings are or are not valid, but it makes no sense to describe a measurement technique as valid or invalid, unless we are adopting a different sense of the term ‘validity’, using it to mean ‘appropriately applied’. It is, of course, true that we should be interested in whether a measurement technique consistently produces accurate results. In fact, as is sometimes done, there would be good reason to define ‘reliability’ of measurement techniques as the capacity to produce consistently valid measurements⁴.

Second, it is misleading to believe that there can be different types of validity. Validity is singular not multiple; it concerns whether the findings or conclusions of a study are true. The three aspects discussed above refer to areas where error can undermine research conclusions. For example, what was referred to as ‘causal validity’ is concerned with threats to valid inferences about causality arising from confounding factors. Furthermore, the distinction between types of measurement validity actually refers to ways in which we can assess whether our measurements are accurate. There is also the problem that the distinction between internal and external validity obscures the fact that ‘causal validity’ implies a general tendency, for the cause to produce the effect, that operates beyond the cases studied (Hammersley 1991). As a result, internal validity is not distinct from external validity.

Rather than differentiating types of validity, we need to distinguish between the different sorts of knowledge claim that studies produce. There are three of these: descriptive, explanatory, and theoretical⁵. Recognizing the particular sort of conclusion a study makes is important because each of the three types of knowledge claim has different requirements, and therefore involves somewhat different threats to validity. This is true even though there is some overlap caused by the way that these types of knowledge are interrelated: descriptive claims are required as subordinate elements in the other two kinds; and explanations always depend upon implicit or explicit theoretical knowledge⁶.

In assessing the validity of descriptions, we must be concerned with whether the features ascribed to the phenomena being described are actually held by those phenomena, and perhaps also with whether they are possessed to the degrees indicated. Also of importance may be whether any specification of changes in those features over time, or any account of sequences of events, are accurate.

In assessing the validity of explanations we first of all need to consider the validity of the subordinate descriptions: those referring both to what is being explained and to the explanatory forces that are cited. Second, we must assess the validity of the theoretical principle that provides the link between
proposed cause(s) and effect(s). Third, we need to consider whether that theoretical principle identifies what was the key causal process in the context being investigated.

Finally, in judging the validity of theoretical conclusions, we will also need to assess the validity of any descriptive claims on which they rely, both about the causal mechanism involved and about what it produces. In addition, we will need to find some means of comparing situations in which it does and does not operate, and of discounting other factors that could generate the same outcome.

There is also variation in the threats to validity operating on different sources of evidence, and this variation must also be taken into account in assessing knowledge claims. What is involved here is partly that some methods have distinctive validity threats associated with them. For example, if we rely on the accounts of informants about some set of events, then we must recognize that there are distinctive potential biases operating on these accounts, in addition to those operating on researchers’ interpretations, for example, to do with whether the informant is able or willing to provide accurate information in relevant respects. By contrast, in the case of direct observation by a researcher only one of these two sources of bias operates. (At the same time, it is perhaps worth underlining that closely associated with many sources of bias are sources of potential insight, for instance, informants may be able to recognize what is going on in ways that are less easily available to an external researcher.)

Equally important is the fact that particular threats to validity vary in degree across methods. Reactivity is little or no threat with some sources of data, such as the use of extant documents or covert observation of public behaviour. By contrast, it is a very significant danger in the case of laboratory experiments, where subjects’ actions may be shaped by the experimental setup and by the appearance and behaviour of the experimenter. At the same time, we should note that what is threatened by reactivity, the extent to which we can safely generalize our findings from the situations studied to other relevant situations in the social world, is an issue that is relevant to all kinds of research, even those that manage to achieve low reactivity (Hammersley and Atkinson 2007: chapter 1).

In summary, then, validity is a crucial standard by which the findings of research should be judged, and it is a single standard that applies across the board. However, what is required for assessing likely validity varies according to the nature of the findings, and also according to the research methods employed. From this point of view, the argument that qualitative and quantitative approaches require different assessment criteria is defective both in drawing a distinction where none exists and in obscuring more specific and essential differences (in relation to types of knowledge claim and specific data sources).

Another important point relates to the notion of assessment criteria. There is sometimes a tendency within the literature of quantitative methodology to imply that there are procedures which can tell us whether or not, for instance, a measure is valid. Thus, reliability and validity tests are often said to measure validity. However, they cannot do that. They can give us evidence on which we can base judgements about the likely validity of the findings, but they cannot eliminate the role of judgement. Similarly, the use of experimental control, and random allocation of subjects to treatment and control groups, does not guarantee the validity of the findings; nor does the absence of these methods mean that the findings are invalid, or even that the studies concerned provide us with no evidence. In fact, there are usually trade-offs such that any research strategy that is more effective in dealing with one threat to validity generally increases the danger of other validity threats. Furthermore, making sound judgements about validity relies on background knowledge, both about the substantive matters being investigated and also about the sources of data and methods of investigation employed. This means that there will be significant differences between people in how well placed they are to assess the validity of particular sets of research findings
effectively. The relevant research community necessarily plays a crucial, but by no means infallible, role here.

For all these reasons, it is misleading to talk about criteria of assessment, if by that is meant a universal and rigorous set of procedures that, if applied properly, can in themselves, and with certainty, tell us whether or not the findings of a study are valid. This notion is a mirage. How we assess research findings must vary according to the nature of the knowledge claims being made and the methods employed. Furthermore, this assessment will always be a matter of judgement that relies on background knowledge and skill.

QUALITATIVE CRITERIA?

Not surprisingly, much thinking about the assessment criteria appropriate to qualitative research has taken the quantitative criteria mentioned in the previous section as a key reference point. Some commentators have attempted to translate these criteria into terms that can be applied to qualitative work (see, for example, Goetz and LeCompte 1984). Others have replaced one or more of them by some new criterion, or have added extra ones to the list (see, for example, Lincoln and Guba 1985 and Lather 1986, 1993). Often, additions have been motivated by a belief that research is not just about producing knowledge but should also be directed towards bringing about some improvement in, or radical transformation of, the world. Sometimes, this is linked to the idea that application of knowledge is the primary means of testing its validity, but this argument is not always present. Indeed, increasingly in recent years, among qualitative researchers, there have been challenges to epistemic criteria, with the proposal that these be replaced by practical, ethical, and/or aesthetic considerations (see Smith and Deemer 2000; Smith and Hodkinson 2005).

Of central importance in these developments have been philosophical arguments about foundationalism, as well as political and ethical arguments about the proper purpose of research. Epistemological foundationalism was a strong influence on the development of ideas about criteria of assessment within social science research in the first half of the twentieth century, and it underpins some discussions of the concepts mentioned in the first part of this chapter. Foundationalism claims that what is distinctive about science, what makes the knowledge it produces superior to that available from other sources, is that it can rely on a foundation of absolutely certain data, from which theoretical conclusions can be logically derived and/or against which they can be rigorously tested. Very often, these data are seen as being produced by experimental method, but what is also often stressed is the requirement that the process of inquiry follows an explicit set of procedures that are replicable by others.

However, by the 1950s, most arguments for the existence of an epistemological foundation had been effectively undermined within the philosophy of science (Suppe 1974), though the impact of this on social research was delayed until the following decades. The claim that there could be perceptual data whose validity is simply given, and the idea that any particular set of data will only validate a single theoretical interpretation, were both challenged. Particularly significant was the account of scientific development presented by Thomas Kuhn, in which the older view of science as involving a gradual accumulation of facts on the basis of solid evidence was overturned. In its place, Kuhn presented a picture of recurrent revolutions within scientific fields, in which one framework of presuppositions, or ‘paradigm’, that had previously guided research was rejected and replaced by a new paradigm that was ‘incommensurable’ with the old one (Kuhn 1970). In other words, Kuhn emphasized discontinuity, rather than continuity, in the history of science, in the fundamental sense that later paradigms reconceptualized the field of phenomena dealt with by earlier paradigms, in such a manner that even translation from one to the other could be impossible. Rather, what was involved, according to Kuhn, was more like conversion to a new way of looking
at the world, or gaining the ability to speak a different language⁷.

These developments led the way for some qualitative researchers to argue that older conceptions of validity, and of validity criteria, are false or outdated⁸. Many commentators claimed that we must recognize that there are simply different interpretations or constructions of any set of phenomena, with these being incommensurable in Kuhn’s sense; they are not open to judgement in terms of a universal set of epistemic criteria. At best, there can only be plural, culturally relative, ways of assessing validity. This argument, variously labelled ‘relativism’ or ‘postmodernism’⁹, was reinforced by claims from feminists, anti-racists, and others. They argued that conventional social science simply reproduces the dominant perspectives in society, that it marginalizes other voices that rely on distinctive, and discrepant, epistemological frameworks. From this point of view, the task of social science should be to counter the hegemony of dominant groups and their discourses, and thereby to make way for marginalized discourses to be heard and their distinctive epistemologies to be recognized. In this way, the original conception of epistemic criteria, and perhaps even the very notion of validity or truth, are rejected as ideological and replaced by a political, ethical or aesthetic concern with valuing, appreciating, or treating fairly, multiple conceptions of or discourses about the world.

These critics of assessment criteria claim then, that since there can be no foundation of evidence that is simply given and therefore absolutely certain in validity from which knowledge can be generated, or against which hypotheses can be tested, then all knowledge, in the traditional sense of that word, is impossible. We are, to quote Smith and Hodkinson (2005: 915) ‘in the era of relativism’. This means that we must recognize that any claims to knowledge, including those of researchers, can only be valid within a particular framework of assumptions; or within a particular socio-cultural context. And, as already noted, some writers have concluded from this that the main requirement is to challenge claims to universal knowledge and to celebrate marginalized and transgressive perspectives, perhaps in the name of freedom and democracy. Here, ethics and politics are foregrounded. Along these lines, Denzin and Lincoln argue that the criteria of assessment for qualitative research should be those of a ‘moral ethic (which) calls for research rooted in the concepts of care, shared governance, neighbourliness, love and kindness’ (Denzin and Lincoln 2005: 911).

Closely related to this line of argument is an insistence on seeing all claims to knowledge as intertwined, if not fused, with attempts to exercise power. Thus, the work of social scientists has often come to be analyzed both in terms of how it may be motivated by their own interests and/or in terms of the wider social functions it is said to serve, in particular the reproduction of dominant social structures. In the context of methodology, this has involved an emphasis on the senses in which researchers exercise power over the people they study; and this has led to calls for collaborative or practitioner research, in which decisions about who or what to research, as well as about research method, are made jointly with people rather than their being simply the focus of study. Indeed, some have argued that outside researchers should do no more than serve as consultants helping people to carry out research for themselves. These ideas have been developed within the action research movement, among feminists, and are also currently very influential in the field of research concerned with the lives of children and young people (see Reason and Bradbury 2001 and MacNaughton and Smith 2005). Almost inevitably, this breaking down of the barriers between researchers and lay people, designed to undermine any claim to authority based on expertise, leads to epistemic judgements being made in ways that diverge from those characteristic of traditional forms of research (qualitative as well as quantitative), and/or to them being mixed in with or subordinated to other considerations.

The problem with much of this criticism of epistemic criteria is that we are presented
with contrasting, old and new, positions as if there were no middle ground. Furthermore, the irony is that the radical critique of foundationalist epistemology inherits the latter’s definition of ‘knowledge’. Foundationalists define ‘knowledge’ as being absolutely certain in validity. The critics show, quite convincingly, that no such knowledge is possible. But why should a belief only be treated as knowledge when its validity is absolutely certain? There is a third influential tradition of philosophical thinking, fallibilism, that is at odds with both foundationalism and relativism/scepticism. This position can be found in the writings of some contemporaries of Descartes, such as Mersenne, in the work of pragmatists like Peirce and Dewey, and in the philosophy of Wittgenstein. From this point of view, while all knowledge claims are fallible – in other words, they could be false even when we are confident that they are true – this does not mean that we should treat them as all equally likely to be false, or judge them solely according to whether or not they are validated by our own cultural communities. While we make judgements about likely validity on the basis of evidence that is itself always fallible, this does not mean either that validity is the same as cultural acceptability or that different cultural modes of epistemic judgement are all equally effective. Furthermore, in the normal course of making sense of, and acting in, the world we do not (and could not) adopt those assumptions¹⁰.

Where the sceptical/relativist position challenges the claims of science to superior knowledge, the fallibilist position does not do this, although it insists on a more modest kind of authority than that implied by foundationalism. It points to both the power of, and the limits to, scientific knowledge (Haack 2003). The normative structure of science is designed to minimize the danger of error, even though it can never eliminate it. Moreover, while science can provide us with knowledge that is less likely to be false than that from other sources, it cannot give us a whole perspective on the world that can serve as a replacement for practical forms of knowledge. Nor, in the event of a clash between the latter and scientific findings, can it be assumed that science must always be trusted. From this point of view, science, including social science, becomes a more modest enterprise than it was under foundationalism. But, at the same time, the specialized pursuit of knowledge is justified as both possible and desirable. By contrast with relativist and postmodernist positions, fallibilism does not reduce the task of social science to challenging dominant claims to knowledge or celebrating diverse discourses. Nor is it turned into a practical or political project directly concerned with ameliorating the world.

From this point of view, then, epistemic assessment of research findings is not only possible but is also the most important form of assessment for research communities. Moreover, while judgements cannot be absolutely certain, they can vary in the extent to which we are justified in giving them credence. In my view, it also follows from this position that the findings from qualitative research should be subjected to exactly the same form of assessment as those from quantitative studies, albeit recognizing any differences in the nature of the particular knowledge claims being made and in the particular methods employed.

OTHER RECENT DEVELOPMENTS

Within the last decade there has been a revival of older, positivist ideas about the function and nature of social research, and about how it should be assessed. With the rise of what is often referred to as the new public management in many Western and other societies (Pollitt 1990; Clarke and Newman 1997), along with the growing influence of ideas about evidence-based policy-making and practice, there have been increasing pressures for the reform of social research so as to make it serve the demands of policy and practice more effectively. These pressures have been particularly strong in the field of education, but are also increasingly to be found elsewhere¹¹. The task of research,
from the viewpoint of many policy-makers today, is to demonstrate which policies and practices ‘work’, and which do not; and this has led to complaints that there is insufficient relevant research, and that much of it is small-scale and does not employ the kind of experimental method that is taken to be essential for identifying the effects of policies and practices. To a large extent, this attitude reflects the fact that evidence-based practice has its origins in the field of medicine, where randomized, controlled trials are common¹².

At the same time, there have been attempts on the part of some qualitative researchers to show how their research can contribute to evidence-based policy and practice, and also to specify the criteria by which qualitative studies can be judged by ‘users’. For example, in the UK two sets of assessment criteria for qualitative research have recently been developed that are specifically designed to demonstrate how it can serve policy-making and practice. The first was commissioned by the Cabinet Office in the UK from the National Centre for Social Research, an independent research organization (Spencer et al. 2003). These authors provide a discussion of the background to qualitative research, and of previous sets of criteria, before outlining a lengthy list of considerations that need to be taken into account in assessing the quality of qualitative research. They take great care in making clear that these should not be treated as a checklist of criteria that can give an immediate assessment of quality. However, perhaps not surprisingly, the authors have been criticized, on one side, for producing too abstract a list of criteria and, on the other, for providing what will in practice be used as a checklist, one which distorts the nature of qualitative research¹³.

Another recent set of criteria for assessing research emerged in the field of education (Furlong and Oancea 2005). While it was not restricted to qualitative research, being concerned with ‘applied and practice-based educational inquiry’ more generally, the authors clearly had qualitative work particularly in mind. This venture had rather different origins from the first, and differs significantly in character. The project was commissioned by the UK Economic and Social Research Council, and the background here was very much recent criticism of educational research for being of poor quality and little practical relevance. At the same time, a prime concern of the authors seems to have been to provide criteria for use in the upcoming Research Assessment Exercise (RAE) in the UK, a process that is used to determine the distribution of research resources across universities. A longstanding complaint on the part of some educational researchers has been that the RAE uses traditional scholarly criteria of assessment that discriminate against applied work directed at practitioner audiences. And there has been much discussion of how this alleged bias can be rectified. In addressing the problem, Furlong and Oancea produce four sets of criteria. The first is epistemic in character, being concerned with issues of validity and knowledge development. More striking, however, are the other three sets of criteria: technical, practical, and economic. Here educational research is to be judged in terms of the extent to which it provides techniques that can be used by policy-makers or practitioners; the ways in which it informs, or could inform reflective practice; and/or the extent to which it offers ‘added value’ efficiently¹⁴.

There is an interesting parallel between the emphasis placed by Furlong and Oancea on non-epistemic criteria and the move, outlined earlier, on the part of some qualitative researchers to abandon epistemic criteria completely. While many of the latter are hostile to the pressure for research to serve evidence-based policy-making and practice (see, for instance, Lather 2004), there is what might be described as a ‘third way’ approach championed by some, notably those associated with the tradition of qualitative action research. This redirects the pressure on research for policy- and practice-relevance away from a positivist emphasis on the need for quantitative methods to demonstrate ‘what works’ towards a broader view of worthwhile forms of research and of the ways in which
it can shape practice. It is seen as playing a much more interactive and collaborative role, at least in relation to practitioners ‘on the ground’. Advocates of this sort of position, such as John Elliott, are as critical of ‘academic’ educational research as the advocates of the new positivism. Where they differ is in the kind of research they believe is needed to inform policy-making and practice (see Elliott 1988, 1990, and 1991; see also Hammersley 2003).

We can see then, that besides divergent philosophical orientations between and among quantitative and qualitative researchers, equally important in shaping ideas about how social research should be assessed are views about its social function. In crude terms, we can distinguish four broad positions. First, there are those who see most social science research, especially that located in universities, as properly concerned exclusively with producing knowledge about human social life whose relevance to policy and practice is indirect, albeit not unimportant. Second, there are those who share the belief that social research must retain its independence, rather than being subordinated to policy-making or professional practice, but who regard the criteria of assessment as properly political, ethical, and/or aesthetic. For example, the task may be viewed as to ‘disturb’ or ‘interrupt’ conventional thinking in a manner that is not dissimilar to Socratic questioning, in its most sceptical form. Third, there are those who, while they see the purpose of social science very much as producing knowledge, insist that for this to be worthwhile it must have direct policy or practice implications: the task is to document what policies and practices ‘work’. Finally, there are those who doubt the capacity of social science to produce knowledge about the social world, in the conventional sense of that term, and who believe the task of social researchers is to work in collaboration with particular groups of social actors to improve or transform the world15. Clearly, which of these stances is adopted has major implications for the question of how research should be evaluated.

Another recent development that has important implications for assessing the validity of research findings is a growing movement among some groups of social scientists towards championing the integration of quantitative and qualitative methods (see Bryman 1988; Tashakkori and Teddlie 2003a). ‘Mixed methods’ research is promoted as capitalizing on the strengths of both approaches. And this movement raises at least two issues of importance in the present context. First, there is the question of what sort of philosophical framework, if any, should underpin mixed methods research, since this has implications for how findings should be assessed. After all, simply combining the various types of validity identified by both quantitative and qualitative researchers produces a formidable list (see Teddlie and Tashakkori 2003: 13). A number of alternative ways of formulating mixed methods research as a ‘third way’ have been proposed, from the idea of an ‘aparadigmatic’ orientation that dismisses the need for reliance on any philosophical assumptions at all to the adoption of one or another alternative research paradigm, such as pragmatism or ‘transformative-emancipatory’ inquiry (see Teddlie and Tashakkori 2003b). It should be noted, though, that the reaction of many qualitative researchers to mixed methodology approaches is that, in practice, they force qualitative work into a framework derived from quantitative method, of a broadly positivist character. And there is some truth in this.

A second issue raised by mixing quantitative and qualitative approaches concerns whether new, distinctive, criteria of assessment are required, for instance relating specifically to the effectiveness with which the different kinds of method have been combined. Here, as elsewhere, there is often insufficient clarity about the difference between assessing research findings, as against assessing the effectiveness with which particular research projects have been pursued, the value of particular methods, the competence of researchers, and so on. Moreover, there is also the question of whether combining quantitative and qualitative methods is always
desirable, and of whether talk about mixing the two approaches does not in effect embalm what is, in fact, too crude and artificial a distinction.

CONCLUSION

Clearly, the assessment of research findings is not a straightforward or an uncontentious matter. In this chapter I began by outlining the criteria usually associated with quantitative research, and noted serious problems with these: that there is often confusion about what is being assessed, and a failure to recognize differences in what is required depending upon the nature of the knowledge claim made and the particular research method used. In addition, I argued that it is not possible to have criteria in the strict sense of that term, as virtually infallible indicators of validity or invalidity. Judgement is always involved, and this necessarily depends upon background knowledge and practical understanding.

In the second half of the chapter, I considered the relativist and postmodernist views that are currently influential among many qualitative researchers. These deny the relevance of epistemic standards of assessment, in favour of an emphasis on political, ethical, or practical ones. I tried to show how this stems from a false response to the epistemological foundationalism that has informed much thinking about quantitative research. Instead, I suggested that what is required is a fallibilist epistemology. This recognizes that absolute certainty is never justified but insists that it does not follow either that we must treat all knowledge claims as equally doubtful or that we should judge them on grounds other than their likely truth.

Of course, discussion of these issues never takes place in a socio-cultural vacuum, and I outlined some recent changes in the external environment of social science research, in the US and the UK and elsewhere, which have increased demands that they demonstrate their value. I examined a couple of the responses to these pressures, in terms of attempts to develop criteria that should be used to assess the quality of qualitative research. Finally, I considered the implications of the growing advocacy of ‘mixed methods’ research, which in some respects is not unrelated to these external pressures.

We are a long way from enjoying any consensus among social scientists on the issue of how social research ought to be assessed. However, the differences in view cannot be mapped onto the distinction between quantitative and qualitative approaches, even though the argument is often formulated in those terms. It is essential to engage with the complexities of this issue if any progress is to be made in resolving the disputes.

NOTES

1 These commitments to reliability and measurement validity, and distinctions between types of validity, are spelled out in many introductions to social research. For a recent example, see Bryman 2001: 70–4. As Bryman indicates, the checking of reliability and validity in much quantitative research is rather limited, sometimes amounting to ‘measurement by fiat’.
2 Of course, there are many other issues that survey researchers take into account, not least non-response.
3 The different accounts produced over several years allocate measurement somewhat differently: see Hammersley 1991.
4 On the considerable variation in definitions of ‘reliability’ and measurement ‘validity’, see Hammersley 1987.
5 There are also value claims: evaluations and prescriptions. I am taking it as given that research cannot validate these on its own: see Hammersley 1997.
6 The last of these claims is controversial: there are those, particularly among commentators on historical explanation, who deny that explanations always appeal to theoretical principles. For a discussion of this issue, see Dray 1964.
7 For valuable recent accounts of Kuhn’s complex, and often misunderstood, position, see Hoyningen-Huene 1993, Bird 2000, and Sharrock and Read 2002.
8 For an extended account of a more moderate position, see Seale 1999.
9 Smith 1997 and 2004 distinguishes between his own relativist position and that of some postmodernists. However, the distinction is not cogent, in my view (Hammersley 1998). At the very least, there is substantial overlap between relativist and postmodernist positions.
10 For a sophisticated recent fallibilist account in epistemology, see Haack 1993.
11 On the history of these developments in the UK, see Hammersley 2002: chapter 1. On parallel changes in the US, see Feuer et al. 2002, Mosteller and Boruch 2002, and Lather 2004.
12 For these arguments, see, for example, Oakley 2000 and Chalmers 2003; see also Hammersley 2005.
13 See Kushner 2004; Murphy and Dingwall 2004; Torrance 2004. One critique has dismissed it as a ‘government-sponsored framework’ (Smith and Hodkinson 2005: 928–9).
14 Hammersley 2006 provides an assessment of the case put forward by Furlong and Oancea for these criteria.
15 These four positions are intended simply to map the field; many researchers adopt positions which combine and/or refine their elements.

REFERENCES

Bird, A. (2000) Thomas Kuhn, Princeton, Princeton University Press.
Bryman, A. (1988) Quantity and Quality in Social Research, London, Allen and Unwin.
Bryman, A. (2001) Social Research Methods, Oxford, Oxford University Press.
Campbell, D. T. (1957) ‘Factors relevant to the validity of experiments in social settings’, Psychological Bulletin, 54, 4, pp. 297–312.
Campbell, D. T. and Stanley, J. (1963) ‘Experimental and quasi-experimental designs for research on teaching’, in N. L. Gage (ed.) Handbook of Research on Teaching, Chicago, Rand McNally.
Chalmers, I. (2003) ‘Trying to do more good than harm in policy and practice: the role of rigorous, transparent, up-to-date evaluations’, Annals of the American Academy of Political and Social Science, 589, pp. 22–40.
Clarke, J. and Newman, J. (1997) The Managerial State, London, Sage.
Cook, T. D. and Campbell, D. T. (1979) Quasi-Experimentation: Design and Analysis Issues for Field Situations, Boston, MA, Houghton-Mifflin.
Denzin, N. K. and Lincoln, Y. S. (2005) ‘The art and practices of interpretation, evaluation, and representation’, in Denzin, N. K. and Lincoln, Y. S. (eds.) Handbook of Qualitative Research, 3rd edition, Thousand Oaks, CA, Sage.
Dray, W. (1964) Philosophy of History, Englewood Cliffs, NJ, Prentice-Hall.
Elliott, J. (1988) ‘Response to Patricia Broadfoot’s presidential address’, British Educational Research Journal, 14, 2, pp. 191–4.
Elliott, J. (1990) ‘Educational research in crisis: performance indicators and the decline in excellence’, British Educational Research Journal, 16, 1, pp. 3–18.
Elliott, J. (1991) Action Research for Educational Change, Milton Keynes, Open University Press.
Feuer, M. J., Towne, L. and Shavelson, R. J. (2002) ‘Scientific culture and educational research’, Educational Researcher, 31, 8, pp. 4–14.
Furlong, J. and Oancea, A. (2005) Assessing Quality in Applied and Practice-focused Educational Research: A Framework for Discussion, Oxford, Oxford University Department of Educational Studies.
Goetz, J. P. and LeCompte, M. D. (1984) Ethnography and Qualitative Design in Educational Research, Orlando, Academic Press.
Haack, S. (1993) Evidence and Inquiry, Oxford, Blackwell.
Haack, S. (2003) Defending Science – Within Reason, Amherst, NY, Prometheus Books.
Hammersley, M. (1987) ‘Some notes on the terms “validity” and “reliability” ’, British Educational Research Journal, 13, 1, pp. 73–81.
Hammersley, M. (1991) ‘A note on Campbell’s distinction between internal and external validity’, Quality and Quantity, 25, pp. 381–7.
Hammersley, M. (1997) Reading Ethnographic Research, 2nd edition, London, Longman.
Hammersley, M. (1998) ‘Telling tales about educational research: a response to John K. Smith’, Educational Researcher, 27, 7, pp. 18–21.
Hammersley, M. (2002) Educational Research, Policy-making and Practice, London, Paul Chapman.
Hammersley, M. (2003) ‘Can and should educational research be educative?’, Oxford Review of Education, 29, 1, pp. 3–25.
Hammersley, M. (2005) ‘Is the evidence-based practice movement doing more good than harm?’, Evidence and Policy, 1, 1, pp. 1–16.
Hammersley, M. ‘Troubling criteria: a critical commentary on Furlong and Oancea’s framework for assessing educational research’, forthcoming British Educational Research Journal, 2008.
Hammersley, M. and Atkinson, P. (2007) Ethnography: Principles in Practice, 3rd edition, London, Routledge.
Hoyningen-Huene, P. (1993) Reconstructing Scientific Revolutions: Thomas S. Kuhn’s Philosophy of Science, Chicago, University of Chicago Press. (First published in German in 1989.)
Kuhn, T. S. (1970) The Structure of Scientific Revolutions, Chicago, University of Chicago Press.
Kushner, S. (2004) ‘Government regulation of qualitative evaluation’, Building Research Capacity, 8, May, pp. 5–8.
Lather, P. (1986) ‘Issues of validity in openly ideological research: between a rock and a soft place’, Interchange, 17, 4, pp. 63–84.
Lather, P. (1993) ‘Fertile obsession: validity after poststructuralism’, Sociological Quarterly, 34, pp. 673–93.
Lather, P. (2004) ‘This is your father’s paradigm: Government intrusion and the case of qualitative research in education’, Qualitative Inquiry, 10, pp. 15–34.
Lincoln, Y. S. and Guba, E. G. (1985) Naturalistic Inquiry, Beverly Hills, Sage.
MacNaughton, G. and Smith, K. (2005) ‘Transforming research ethics: the choices and challenges of researching with children’, in A. Farrell (ed.) Ethical Research with Children, Maidenhead, Open University Press.
Mosteller, F. and Boruch, R. (eds.) (2002) Evidence Matters: Randomized Trials in Education Research, Washington D.C., Brookings Institution Press.
Murphy, E. and Dingwall, R. (2004) ‘A response to “Quality in Qualitative Evaluation: a framework for assessing research evidence” ’, Building Research Capacity, 8, May, pp. 3–4.
Oakley, A. (2000) Experiments in Knowing: Gender and Method in the Social Sciences, Cambridge, Polity Press.
Pollitt, C. (1990) Managerialism and the Public Services, Oxford, Blackwell.
Reason, P. and Bradbury, H. (eds.) (2001) Handbook of Action Research: Participative Inquiry and Practice, London, Sage.
Seale, C. (1999) The Quality of Qualitative Research, London, Sage.
Sharrock, W. and Read, R. (2002) Kuhn: Philosopher of Scientific Revolution, Cambridge, Polity.
Smith, J. K. (1997) ‘The stories educational researchers tell about themselves’, Educational Researcher, 26, 5, pp. 4–11.
Smith, J. K. (2004) ‘Learning to live with relativism’, in H. Piper and I. Stronach (eds.) Educational Research: Diversity and Difference, Aldershot, Ashgate.
Smith, J. K. and Deemer, D. K. (2000) ‘The problem of criteria in the age of relativism’, in N. K. Denzin and Y. S. Lincoln (eds.) Handbook of Qualitative Research, 2nd edition, Thousand Oaks, Sage.
Smith, J. K. and Hodkinson, P. (2005) ‘Relativism, criteria, and politics’, in Denzin, N. K. and Lincoln, Y. S. (eds.) Handbook of Qualitative Research, 3rd edition, Thousand Oaks, CA, Sage.
Spencer, L., Ritchie, J., Lewis, J. and Dillon, L. (2003) Quality in Qualitative Evaluation: A Framework for Assessing Research Evidence, London, Cabinet Office. Available at: http://www.policyhub.gov.uk/docs/qqe_rep.pdf (Accessed 13.02.2006).
Suppe, F. (ed.) (1974) The Structure of Scientific Theories, Chicago, University of Chicago Press.
Tashakkori, A. and Teddlie, C. (eds.) (2003a) Handbook of Mixed Methods in Social and Behavioral Research, Thousand Oaks, CA, Sage.
Teddlie, C. and Tashakkori, A. (2003b) ‘Major issues and controversies in the use of mixed methods in the social and behavioral sciences’, in Tashakkori and Teddlie (eds.) Handbook of Mixed Methods in Social and Behavioral Research, Thousand Oaks, CA, Sage.
Torrance, H. (2004) ‘ “Quality in Qualitative Evaluation” – a (very) critical response’, Building Research Capacity, 8, May, pp. 8–10.
5
Ethnography and Audience
Karen Armstrong
INTRODUCTION

The ethnographic method results in an analysis of society which is built up from small facts and details. It brings the distant near to point out something not realized before, much like poetry does in a different genre (Heidegger 1971). In ethnography, the data are generally gathered from what people say and do in certain situations in order to illuminate broader comparative questions. The contemporary expansion of the audience1 for ethnographic writing affects the contextualization of data by the researcher and raises questions about the relation of theory to audience. I will use some examples from anthropology to explore issues that are widespread in qualitative research to argue that, while the move to ‘critical ethnography’ raised issues about representation, it did not fully address the relation of ethnography to audience.

The problem of audience is apparent already when collecting ethnographic data. Typically, the researcher is a participant in the immediate situation, translating it later into a text. Implicit in this activity – talking, performing, writing – there is an audience; only some things are said to certain audiences, and other things are understood even if not said. In most cases, the audience is expected to do some of the work. What happens between the speaker’s intention and the audience’s understanding is a matter of interpretation. For example, the Xavante of Brazil perform dances for audiences of tourists in Brazil that could be interpreted to be invented tradition (Graham 2005). What the Xavante perform, however, is not intended to be measured as being true or not. They choose to perform as they do because they would insult the ancestors if they performed the full traditional rituals for outsiders. When non-Brazilian audiences appreciate their performance, the Xavante interpret the response to mean that their culture is recognized by outsiders as meaningful. In other situations, fragments of a narrative, or just allusions to a story, circulate among Quechua speakers in Peru, while place names summarize a moral story for the Western Apache (Becker and Mannheim 1995; Basso 1996). The fragments or names provide a cue; the audience does the work of understanding and creating meaning. As can be seen in these examples, being positioned as an insider or outsider affects audience and meaning.
The problem of audience appears again during the writing process. Whenever ethnographers write up their data they engage in an act of recontextualization (Duranti 1986: 244) by setting contextual clues, always selective, for the intended audience to judge the analysis. In anthropology, the ethnography (or monograph) is considered to be the account that pulls together the bits and pieces of data into a single whole. An ethnography is, by definition, comparative; it should address central questions about the nature of human existence through a specific society and its cultural system. Therefore, the audience is assumed to be both specific (academic, place, etc.) and general in the sense that anyone may engage with the broader questions addressed.

No good ethnography is self-contained. Implicitly or explicitly ethnography is an act of comparison. By virtue of comparison ethnographic description becomes objective. Not in the naïve positivist sense of an unmediated perception – just the opposite: it becomes a universal understanding to the extent it brings to bear on the perception of any society the conceptions of all the others (Sahlins 1996: 10).

There have been ongoing debates about the goals of ethnography. These debates are commonly related to changing historical conditions and the need for social scientists to analyze what is going on in the contemporary world. In the past, situations like colonialism generated the need for new theoretical and methodological approaches. It seems appropriate, now that we live in a world connected by the Internet, mobile phones, web cameras, extensive media coverage, and so on, that we should rethink problems of method as related to audience. This is especially true for anthropology since the ‘natives’ are professionals in many fields, including anthropology.

In the most general sense, anthropologically informed ethnography is based on long-term fieldwork, and participant observation in a society other than one’s own has been assumed and prioritized. Participant observation includes the assumption of a measure of fluency in the language of the society being studied, spending enough time among the people in order to know how they live, what they say about what they do, what they actually do, what they believe, and their system of valuation. The fieldworker may include archival and statistical data and discuss the influence of national and international organizations. Apart from these general procedures, it can be said that there is no distinct object of the anthropological fieldwork method (Faubion 2001: 39, my emphasis). What remains constant is the recurring problem of self and other: how do we know what we know, how do we assume to speak for others, and who is the audience being addressed? The first two issues have been addressed as problems of validity and representation; to address the third it is useful to begin by looking at the relation of theory to audience.

THEORY AND AUDIENCE

The sociologist, Arto Noro (2001, 2004), argues that there are three genres of sociological theory, each with an intended audience. One is general theory; theories of this type pose questions about how society in general is constituted and try to answer the questions. General theory is directed toward a scientific audience and aims for an interpretative synthesis by referring to earlier questions, which are readdressed to contemporary events. A second genre is research theory; this level consists of research projects that address or test the propositions of general theory and, in turn, provide material for general theory (2001: 1–2). As Noro points out, there is a significant relationship between these two. They lose their common ground only when research theory turns into administrative research or when general theory becomes philosophy. Research theory supplies material for general theory and is intended for a scientific audience; alternatively, it is directed toward specific social problems and is intended for instrumental use (for example, in forming social policy). Noro calls the third genre ‘Zeitdiagnose’; this is
theory that focuses on a diagnosis of the times we live in. Zeitdiagnose is directed toward a ‘group-We’ audience and intends to encourage ‘us’ to think about our situation and perhaps to change it accordingly.

Noro claims that Zeitdiagnose became popular in sociology in the 1980s and 1990s with books about risk society and modern identity (e.g. Beck 1992, 1994; Giddens 1991, 1992, among others)2. The key characteristic of Zeitdiagnose is that it offers an insight, understanding or vision (Noro 2001: 5) about our own times, something we have an inkling about but cannot name without the synthesis provided by the author. Zeitdiagnose tends to be openly normative and political (ibid.). Such texts are intensely seductive because they tell us who ‘we’ are, although these theories cannot be used in the interpretation of empirical evidence because we would find in the material what the diagnoses have already named. As Noro says, the end result would be poor mimesis (2001: 11).

As the audience for ethnographic research becomes global and less contained, there can be problems with the goals of research theory and Zeitdiagnose theory. These issues were anticipated in early discussions of the object of ethnographic research and the use of analytic concepts.

THE SCIENTIFIC AUDIENCE: EARLY STUDIES

Ethnographies tend to fall into the above classification of theories. An early example of an ethnography framed by general theory is Seasonal Variation of the Eskimo by Marcel Mauss in collaboration with Henri Beuchat (1979[1950]). It is based on field research by Beuchat and others and organized around Emile Durkheim’s concept of social morphology to discuss the influence of seasonal variation on both social and cultural elements in Eskimo society and to propose that there may be similar variation in other societies. As with any good ethnography, details about culture (house styles, naming practices, hunting, etc.) and society (winter and summer social groups) are presented to argue the larger comparative point of social morphology.

The concept of culture was the general theory in American anthropology of the same period where cultural relativism focused on breaking the evolutionary model, a move which was especially relevant in the context of American society. Franz Boas and his students typically made visits to the field to collect cultural data and material artifacts. Much of their work was based on textual material collected from various North American Indian groups in order to record so-called aboriginal culture. This has been labeled ‘salvage ethnography’ because they were aware that most of the groups had been decimated by war with the American government at the end of the nineteenth century and they understood that what they were witnessing had been influenced and broken down by historical events. Nevertheless, they were looking at these groups to identify specific culture traits and their local patterning, not as an evolutionary process or a comparison of the primitive with the civilized.

One student of Boas, Paul Radin, did extensive fieldwork among the Winnebago for nearly 50 years and wrote a book for the method of studying culture (1987[1930]). Radin did not deny history, but he denied comparisons of cultures as being more or less advanced in direct or implicit comparison to ‘us.’ He was critical, therefore, of those who followed Malinowski’s universalistic and functional style of description of ‘primitives’: ‘…whereas I see no necessity for proving that culture is culture, they apparently feel that it is incumbent upon them to laboriously demonstrate that, among primitive people, we are dealing with human beings who think as we do, feel as we do, and act as we do’ (Radin 1987[1930]: 257). Radin’s method argued for a study of culture based on ‘reconstruction from internal evidence.’

The task, let me insist, is always the same: a description of a specific period, and as much of the past and as much of the contacts with other cultures as is necessary for the elucidation of the particular period. No more. This can be
done only by an intensive and continuous study of a particular tribe, a thorough knowledge of the language, and an adequate body of texts; and this can be accomplished only if we realize, once and for all, that we are dealing with specific, not generalized, men and women, and with specific, not generalized, events. (ibid. 184–85)

Radin was critical of the categories imposed by universalistic theory, although he recognized that his method was similar to that of Marcel Mauss: ‘In elucidating culture we must begin with a fixed point, but this point must be one that has been given form by a member of the group described, and not by an alien observer’ (ibid. 186). To demonstrate his method, Radin uses one Winnebago man’s (John Rave’s) account of his conversion to the Peyote Cult. Radin traces themes in the narrative and, along with other native texts and his own observations, Radin shows how Rave’s account is similar to and different from previous Winnebago practices. Radin thus analyzes how Rave could change his beliefs and still remain within the general Winnebago cultural framework. While the analysis remains self-contained (about the Winnebago), the method of eliciting native accounts of specific events and tracing how certain themes are replicated remains valid today.

Boas and his students often commented on issues in American society, especially about race or in their role as experts on Native American society. It has always been the practice of social science research to comment on contemporary issues; however, such comments are not the same as Zeitdiagnose when they are based on empirical research and linked to general theory (Noro 2001). A notable exception, Margaret Mead, came close to Zeitdiagnose in her popular writing and in her widely read ethnography, Coming of Age in Samoa: A Psychological Study of Primitive Youth for Western Civilisation (2001 [1928]). This book – not written strictly for a scientific audience – caused furor inside and outside academic circles. Mead used her ethnographic knowledge about Samoa as a basis for a critique of American culture, and wrote the book for an American general audience3. The book presents the transition from youth to adulthood in Samoan culture as being easy and without the stress and rebellion found in American society. By using the contrast of Samoan culture, Mead proposed that the stress experienced in American adolescence had social and cultural causes which might be altered (see the discussion in Marcus and Fisher 1986; Stocking 1992). A friend of Mead, Edward Sapir, immediately complained that a student of culture cannot use what he knows as medicine for society (Handler 1986). Mead’s book has generated enormous commentary, the most famous being numerous books and articles written by anthropologist Derek Freeman to disclaim the validity of Mead’s ethnographic method and data (e.g. Freeman 1983, 1999, 2001). George Marcus and Michael Fisher argue that Mead failed because cultural juxtapositioning between ‘us’ and ‘them’ requires equal ethnography among ‘us’ (1986: 138). In the same period, Mead also wrote an ethnographic report on Samoa for the Bishop Museum in Hawai’i: Social Organization of Manu’a (1969[1930]). This is a standard research report about social organization (chiefs, titles, land arrangements) that does not attract much attention apart from an audience of anthropologists.

Coming of Age in Samoa reached an audience beyond the US. It remains significant in Samoa today, especially in American Samoa where Mead did fieldwork on the island of Ta’u in Manu’a. And, because texts extend beyond the moment of their production (Ricoeur 1991), Coming of Age continues to frame the meaning of anthropology in Samoa and of Samoa; my presence there in 2005 generated discussions of the book and the purpose of anthropology. Coming of Age in Samoa is cited by the American Samoan representative to Congress, Faleomavaega Eni Hunkin, as an insult to Samoan culture (Tavita 2004). He is upset by Mead’s categorization of Samoa as a primitive society and by her discussion of Samoan sexuality. Perhaps more importantly, Manu’a was at one time the sacred center of an elaborate hierarchical culture and Mead does not recognize this in the popular Coming of
Age (although she does recognize it in Social Organization of Manu’a). Faleomavaega feels
that the world continues to get the wrong image of Samoa because, he claims, the book is
taught in introductory courses of anthropology at American universities. Derek Freeman
does not escape criticism either; he is accused of depicting Samoan culture as being
excessively violent. Both anthropologists are criticized for their reduction of Samoan
culture. Samoan culture is not represented according to Samoan norms; that is, the Samoan
voice is missing. In the research and writing process Samoans were typed by
anthropological categories and Samoans today reject the gloss.

Edward Sapir was a contemporary of Radin and Mead, also a student of Boas, and a linguist.
Sapir noted that all people use general categories as a way of making sense of a huge
amount of personal experience. Because of this general tendency, he was wary of concepts
(such as ‘motivation’) because they are generalizations that are imposed on our perception
of objects and events, useful for talking about, in an analogical sense, the actual
phenomena but removed from the phenomena (Preston 1966: 1115). Concepts tend to become
endowed with what Sapir called a ‘peculiar quality of self-determination’ (ibid. 1115).
Social scientists tend to prefer concepts and categories because they offer precision and
clarity and in fact Sapir was criticized for his lack of theory and refusal of categories
(ibid. 1105). However, when the categories are given prime importance, the researcher
tends to use people selectively and only insofar as they provide new material for the
categories. Sapir insisted that this was wrong, that ‘the categories must be distinctively
meaningful in and therefore derived from, the particular milieu, so that they will
accurately describe the milieu’ (ibid. 1120). For Sapir, the locus of culture is in
individuals and the experience of actual individuals brings the researcher closest to the
inherent structure of culture. He demonstrated this in his analyses of life histories
(Sapir 1922, 1995[1938]). As Sapir defined method, you have to know the overt forms
(census materials, economic flow, geography, language, material culture, etc.) as well as
how the forms are lived by individuals, which Sapir called the analysis of variation
(Preston 1966: 1127). The cultural relativism of Radin and Sapir proposed a method that
was based on internal evidence in order to avoid imposing categories on other cultures,
with a focus on engaged individuals; this method was later criticized as being too
particularistic.

Regarding audience, Radin and Sapir preferred to rely on texts, which turn out to have a
longer ‘shelf life’ than concepts. The present-day Winnebago, or the Tikopians described
so thoroughly by Raymond Firth, do not care about the concepts used by the anthropologists
or their interpretations. The ‘native’ audience today is interested in these old
ethnographies for their descriptive value as historical documents; they give them their
own interpretation.

In another school, methods were developed to break out of the particularistic view and to
address contemporary issues through general theory. Beginning in the 1940s, the so-called
Manchester School of anthropology, headed by Max Gluckman, defined what came to be called
situational analysis (a slightly different version was called social drama by his student,
Victor Turner). Most of these anthropologists were working in Africa and trying to develop
theories and methods appropriate for analyzing colonial relations. Gluckman (1958 [1940])
insisted that Europeans and Africans had to be seen as a total system, not as isolated
groups. This could be done through the analysis of situations or events where problems
would become apparent; the concept of ‘social fields’ was used to recognize the unbounded
nature of social relations. Victor Turner used this method to show the symbolic importance
of events – for example, rituals or conflicts – for individual participants. In four
volumes about the Ndembu (cf. 1957, 1962, 1967, 1968) Turner demonstrates, through the
personal stories of named individuals, how cultural categories sustain a given social
structure through an intermingling of meanings. For Turner, a social drama, which is often
a moment of
conflict, reveals a ‘moment of translucence’ when the positions and conflicts among the
involved individuals become apparent. Turner (1957) concluded that changes brought by
colonial rule exacerbated the internal contradictions in Ndembu social structure. Whereas
the contradictions caused by residence and descent could be tolerated or resolved before,
they often collapsed into unrestrained conflict under colonial rule. Along with his
analysis of conflict, Turner looked for replication in the symbols and concepts used by
individuals in Ndembu society. Key (or root) metaphors were defined by Turner as those
that occur at different times in different situations to structure meaning.

In a review of the Ndembu work, Mary Douglas claimed that Turner solved the problem of
validation once and for all, although she worried about what the named individuals would
think about their stories being public. ‘It should never again be permissible to provide
an analysis of an interlocking system of categories of thought which has no demonstrable
relation to the social life of the people who think in these terms’ (Douglas 1970: 303).
Whereas Sapir and Radin focused on culture as a system, Turner linked culture to practice,
to the concept of society, and to universal questions. All these authors were attempting
to address broader contemporary issues – the decimation of North American indigenous
groups and the disruption caused by colonialism in Africa. The intended audience consisted
of academics and possibly administrators. Mary Douglas’ comment about named Ndembu
individuals seems to anticipate that the audience was not going to be so contained in the
future.

CRITICAL ETHNOGRAPHY

The emergence of a self-consciousness regarding ‘self and other’ in the last quarter of
the twentieth century altered the way anthropologists and sociologists write ethnography
and deal with data and representations of others (see, for example, Gubrium and Holstein
1997). ‘Critical ethnography’ introduced reflexivity about what ‘we’ do and cast a
critical eye on writing practices, fieldwork topics and research sites. In anthropology,
the emphasis has been on the production of ethnography, especially the relation between
the fieldworker and those being researched. Two popular books, Paul Rabinow’s (1977)
Reflections on Fieldwork in Morocco, and The Headman and I by Jean-Paul Dumont (1978),
opened up the reflexive question in the US about the relation between the researcher and
his or her informants⁴. These were followed by Writing Culture (Clifford and Marcus 1986)
and Anthropology as Cultural Critique (Marcus and Fischer 1986). Writing Culture was a
collection of articles that questioned how the process of writing established a self/other
relationship in ethnographic description. It questioned the notion of ethnographic
authority and how the ‘I’ of the anthropologist had fashioned the ‘we’ or the ‘other’ of
the ‘natives.’ Anthropology as Cultural Critique called for a more politically active
engagement of anthropologists in the issues of their times. Following these, anthropology
was challenged to drop the ‘savage slot’ and to undertake critical research about the
contemporary world (Trouillot 1991: 40). These works, and many others, opened an
experimental current that continues today (e.g. Carucci and Dominy 2005).

As a result there have been various efforts in writing, such as teamwork between the
anthropologist and the interlocutor in order to produce ‘dialogue’ or ‘polyphony,’ with
different measures of success (Faubion 2001; Marcus and Mascarenhas 2005). Topics have
broadened to include the contemporary world of elites, corporations, medicine, law and
environmental issues, to name a few. Along with the focus on new topics, George Marcus
(1998) talks about the ‘complicity’ of the fieldworker regarding his or her relation with
the events or people being studied while others talk about ‘emergent practices’ (Mauer
2005: 1). Like Zeitdiagnose, these authors aim to study issues that they are involved in
and to take a political, often a moral,
position in order to describe what these times are like.

The valuation of the sites of research has also been redefined. Akhil Gupta and James
Ferguson (1997) critiqued the place orientation of anthropology and called for research
that was not so place dependent. Since we live in a world of transnational flows,
refugees, and exiles, a researcher should adjust his or her methods appropriately. For
George Marcus (1998), multi-sited ethnography recognizes that individuals in today’s world
are on the move; the anthropologist therefore tracks these individuals and their networks.
James Faubion (2001: 52) notes that, although this type of research is easily justified,
it has remained largely an ideal model since it is hard to find the time or funding to do
it in practice. And even if funded and attempted, Ghassan Hage (2005) found that there are
many pitfalls, primarily the exhaustion of the ethnographer and the unhappy expectations
of reciprocity by individuals who expect him to take them seriously, not just to drop in
for a short visit. Faubion suggests an alternative, that fieldwork might ‘proceed
cross-sectionally and sequentially,’ and ends his review of American anthropology strongly
in the spirit of Zeitdiagnose: ‘modernity is … many things; and it is up to the cultural
(and social) fieldworker to explore, describe and diagnose at once what such a
multi-scalar assemblage of artifacts is, or what it might be’ (Faubion 2001: 52).

The resultant political positioning generally puts the weight on categories or concepts
like ‘power,’ ‘hybridity,’ and ‘race,’ and uses individuals to fill in the story. One
example is a book about multi-sited memory and identity in the border region of Trieste
that ‘breaks with relativism’ because it is ‘not a standard ethnography of empathy’ but
‘an ethnography of complicity’ (Ballinger 2003: 7). The author analyses how average
citizens make sense of history by assimilating the events of their lives into
long-standing narratives that ‘are legitimated or authorized precisely in moral terms’
(2003: 9). Asked to tell about the events of 1943–45, the speakers are said to draw the
listener (anthropologist) in, so that the anthropologist shares their complicity in the
violent events being described, while at the same time the narrators deny their own
complicity (2003: 147). The author reports to a professional audience and determines the
truth of the narratives. But, were life story narratives appropriate for talking about the
tensions of state formation? It is likely that a different genre or domain was being
addressed by her interlocutors and here is where, again, the problem of audience appears.
Despite the attention paid to writing, topics and place, critical ethnography has not
addressed adequately the relation of theory to method or the issues of ethnographic
competence and intended audience.

ETHNOGRAPHIC COMPETENCE

The subject matter for anthropology has always been global but today its institutions,
practitioners and audiences are also global (Lederman 2005: 321). The same can be said for
the other social sciences and this expanded situation has implications for the methods
used as well as for reception. However, with the exception of linguistic anthropology,
very little of the discussion about fieldwork and representation addresses the need for
new methods to interpret the data (e.g. Briggs 1986; Silverstein and Urban 1996). Although
critical ethnography – and taking a political and moral stand – is often the goal, how do
we know – and can we know? – the truth and intentionality intended by our interlocutors?

The move to critical ethnography defined privileged sites and privileged topics with
sometimes unanticipated results. For example, the site of Asale Angel-Ajani’s research
reveals a preference for certain sites, the problem of speaking for someone else, and the
problem of audience. Angel-Ajani’s research with women prisoners in Italy, most of whom
were from Africa, put her in the position of listening to dramatic testimony of chaos and
violence, where what the prisoners say often does not seem to be ‘really real.’ She argues
that in such situations the listener cannot assume to be an expert or authority; instead,
the focus should be on critical reception, on actually listening to what is being said
(2004: 142). Ironically, when she presented her data at a conference the academic audience
rejected the prisoners’ stories as implausible. Angel-Ajani suspects that the stories
would have been accepted if they had been situated in a privileged site, such as a refugee
camp or a women’s shelter (ibid.).

Questions about intentionality raise issues about audience and code that have been
addressed by discourse analysis in productive ways. Charles Briggs reviews the problems
inherent in the interview as a genre. Briggs criticizes the assumption that acceptable
questions are semantically transparent, that the respondent’s assumptions will match those
of the researcher (Briggs 1986: 50). He goes on to demonstrate the significance of
reference, indexicality, code and social relations in interview situations. Responses can
address a given subject from many different points of view (ibid. 54) and with differing
amounts of detail. Radin argued that no one can say everything about the topic in answer
to a question (1957[1927]: 16); there is always a selective process, a partial answer. Add
to this the fact that any researcher exerts a certain amount of control over the situation
so that often one’s interlocutors are trying to give the appropriate answer to what they
understand is the intended question. In other words, there is a large gap between
intention and meaning which allows room for the researcher to impose an external meaning
onto a response. A reliance on interviews, without being aware of the nature of speech
acts, often reinforces our preconceptions rather than raising new questions (Briggs 1986:
119). However, interviews do not have to be thrown into the scrap bin. The aim of Briggs’
book is to show that they require rhetorical competence: an understanding of the modes of
verbal interaction among the group being studied, the context and indexical meanings for
what is being said, and the fact that speech has the possibility to create or transform a
given state of affairs (ibid. 45–46).

Because of the creative/transformative nature of speech, intentionality and truth are not
as straightforward as question and answer sessions might suppose. In spoken language, not
only do we communicate messages, we communicate how to interpret those messages, sometimes
with additional devices (tone, body language, etc.) to modify the meaning. As has been
shown in conversation analysis and linguistic anthropology, acts of speaking and
interpretation are partly constructed by the audience’s response and, in fact, we would
not be able to communicate without others to carry, complete, expand and revise our
messages (Duranti 1993: 226). Even the simplest routine like the opening of a telephone
call is a joint activity between speaker and audience. When the audience is cooperative
this work is hardly recognized. However, when the message is open to dispute speakers use
other techniques such as verbal indirection, where the meaning is not in the text alone,
the speaker avoids full responsibility for what is said, and the audience is actively
involved and compelled to interpret the referent and the meaning (Brenneis 1987: 504).

If a message has multiple goals, how is it to be understood? Sometimes speakers seem to be
exploiting the truth rather than using it as the criterion for interpretation. The goal
might be to make truth irrelevant or to make the audience at least partly responsible for
what is being implied (Duranti 1993: 233). If a researcher does not fully account for the
audience, he or she might suppose that the analysis is immune to the consequences of
interpretation due to the geographical and institutional distance from their subjects
(ibid. 229). As this distance breaks down, however, the notion of being objective is
harder to maintain. According to Duranti, the emphasis must come away from the speaker and
move instead to the ‘coordinated role played by the addressees or audience in any kind of
communicative act’ (ibid. 237). And beyond this, truth has a different meaning across
sociocultural domains such as domestic, political and ritual domains,
so that an ethnographer’s observations must be grounded in a careful analysis of discourse
patterns according to the appropriate domain audience.

The issue of audience turns the attention from ethnographic authority (even a reflexive
one) to that of ethnographic competence. One must be competent in order to understand the
subtle signals and shifts that are occurring during the research. An example of this is
John Haviland’s (1991) recording and analysis of the life story of Roger Hart, an
Australian aboriginal man. Haviland demonstrates how the discourse constitutes – brings
into existence – a coherent view of personal identity that goes to the heart of the
problem of individual participation in a cultural order. Using the actual text (which was
also video-recorded), Haviland demonstrates how others participated in the performance
(including the anthropologist), the immediate context for what Roger Hart says, as well as
the background issues for why he says what he says at this point in time. The multiplicity
of voices and the message of the story replicate the themes of ambivalence which surround
the topic of aboriginality in Australia (Haviland 1991: 347).

Likewise, an analysis of a misunderstanding – a failure of competence – can open up
problems of intention as Johannes Fabian (1995) found in a conversation he had while doing
fieldwork in Zaire. In an interview with a woman leader of a charismatic prayer group, the
woman insists, by coming back to the topic and addressing it openly, that they discuss why
Johannes Fabian is no longer a Catholic. The ensuing conversation is full of evasion and a
certain level of embarrassment on the part of the anthropologist. Both were speaking
Swahili – a language in which Fabian is very competent – so there was not a language
problem. As Fabian realized, it was a problem of intention. He understood the conversation
as an interview, while she was trying to engage him in testimony (1995: 46). It is an
example of how one cannot assume that the context is given; rather, Fabian argues, context
works in a dialectical, not a logical-methodological way (ibid. 48).

The contradictions in narratives about colonial memories encountered by Andrea Smith
(2004) were in fact examples of Bakhtin’s idea of heteroglossia. On the one hand,
informants told her that in colonial Algeria people were ‘all the same, a melting-pot,’
and on the other hand they told stories about an ethnic hierarchy and tensions due to
ethnic intermarriage. Her examples show that memory is narrated contingent on audience and
on distinct voices. For a general audience, Algeria was a melting pot, as the story is
replicated in official histories and discussions. But when talking about personal history
and personal experience, people used the first person ‘I’ and spoke about a colonial
experience marked by ethnic divisions and class. Distinct voices index a distinct
orientation; neither version is intended to be the absolute truth (Smith 2004: 265).

When the focus is on universal concepts such as ‘power’ or ‘race,’ combined with a
Zeitdiagnose purpose to describe ‘what these times are like’ it is easy to read motives
into informants’ answers. Ethnographic competence – the recognition of multiple voices,
intention and heteroglossia – draws attention to the intended meaning and audience of the
speakers as well as to the work being done by the audience.

INSIDERS AND OUTSIDERS

Due to the reflexive turn in ethnography, it is not possible to produce texts without
considering the relation of authors to their subjects. Anthropology has generally
practiced the interpretative project of translating the concerns of the research site and
a certain group for an audience unlikely to encounter them directly (Lederman 2005: 322).
However, an important historical shift has occurred so that it is often no longer clear
who is a cultural insider and who is an outsider. The situation of ‘translation’ is
changing as ‘insider’ anthropologists become more common and because publications
circulate beyond limited audiences. One way around translation can be seen in the recent
collaborative
work between the anthropologist George Marcus and a Portuguese nobleman, Fernando
Mascarenhas. Their email exchanges are reproduced with little editing and no
interpretation as an example of the ‘shifting “politics” between the tradition of letters
in Portugal and the tradition of interviewing in anthropology’ (Marcus and Mascarenhas
2005: xv). The intended audience remains narrow: an academic audience interested in the
study of elites or the general problems of ‘the presentation of ethnography expressed
through the relations that produce it’ (ibid. xvi).

Another possibility is that one’s ethnographic competence will be used by the subjects of
the study for their own purposes. This often occurs when the researcher has worked in an
area over a long period of time. Glenn Petersen has done research in Micronesia for 30
years, but as he notes, ‘since Micronesians know about Micronesia, they have neither need
for nor much interest in my ethnography; they already know about themselves, to put it
simply’ (2005: 312). While his writing gave him ethnographic authority with a scholarly
public (mostly outside Micronesia), the Micronesians put more value on his competence,
that is, how they could make use of his outside experience. Competence means taking into
account all the ‘messy’ parts: disagreements, the tensions that link hierarchy and
equality, and the discrepancies of everyday experience (Petersen 2005: 315). For Petersen,
a good ethnographer knows about life as the individuals in the community experience it,
and therefore knows something about the effect of cultural contradictions on their lives
(ibid. 316). Competence is gained by recognizing the complexities, not glossing them over
with general concepts. This was, after all, the point of practice theory, when Pierre
Bourdieu quoted Jean-Paul Sartre: ‘Words wreak havoc when they find a name for what had up
to then been lived namelessly’ (Bourdieu 1977: 170). Practice brings the contradictions to
the surface, as Gluckman recognized; practice theory recognizes the political implications
of categorization and internal critique (Kapferer 1997: 20). Naming creates authority;
competence is the ability to live according to local systems of significance.

CONCLUSION: ACCOUNTING FOR AUDIENCE

Accounting for an expanded audience is a measure of the goals of the research and whether
it addresses problems – even if unwittingly – defined by interests or categories that
frame the results. If the ethnographic method aims to study every possible group and site,
the goal of the research is a critical issue. One danger is that ethnography becomes a
form of spying; another is that it reproduces dominant interests and discourses. The
question of dominant interests is relevant in Finland, for example, where much of the
research funding comes from the state and where the state often determines (beforehand)
the topics that it will fund. Research theory, as defined earlier, can address two
audiences: a scientific or an administrative audience. In many cases, research questions
are designed for topics about which the state needs information (such as prison
populations, area studies or Islam). The researcher in these cases is defined as an
expert, despite the fact that expert predictions have proven to be unreliable and,
ultimately, unaccountable for their errors (Menand 2005). Even when one avows to be
critical – as in critical ethnography in the US – the research questions and results may
unwittingly replicate central problems in American society (power, race, ethnicity,
gender) in other places if one does not listen carefully to what is being said within the
context of another social setting. This is what Louis Dumont meant when he warned that
anthropology should not be subjected to non-anthropological concerns (Dumont 1986). Dumont
argued that the proper study of society was based on enriching general theoretical
questions through detailed ethnography in order to determine the valuations that
distinguish one research context from another.

Zeitdiagnose has its own pitfalls. Because it is aimed at identity audiences, it is often
based on fieldwork ‘at home’ and written for a defined ‘we.’ Since the content is already
known, the only novelty is in the production, in the way the argument is written (Siikala
2004: 202). It is inevitable that certain identity audiences will have priority over
others depending on the location and interests of the major publishing houses. As
ethnographic texts become accessible – especially through the Internet – they are not tied
to the frame of academic judgment or to a particular ‘we.’ If Zeitdiagnose defines a ‘we’
it runs the risk of excluding others since any ‘we’ implies a ‘not-we’ (Urban 1996).

General theory is written for a global scientific audience. When Marshall Sahlins states
that comparison is at the heart of ethnography he is talking about general theory, not the
comparison of ‘these to those.’ At the level of general theory, broad questions are
addressed concerning the nature of society, the relationship of individuals to social
structures, the way reciprocity creates social relations, the processes of social change,
etc., and are argued with detailed ethnographic data. The questions can be revisited and
revised in all sites as historical changes affect the nature of society and social
relations. Ethnographies like The Fame of Gawa by Nancy Munn (1992[1986]), Marshall
Sahlins’ Anahulu (1992) or Feast of the Sorcerer by Bruce Kapferer (1997) are explorations
of general questions such as (respectively) value and reciprocity, cosmology and contact
between different cultural orders, and sorcery’s relation to the conditions of human
existence. These questions can be explored anew in new sites, with new data, according to
new circumstances, because no one instance of a phenomenon accounts for all its dimensions
(Kapferer 1997: 302).

For example, when Bruce Kapferer writes about sorcery in Sri Lanka he breaks the category
expectations of sorcery, which are ‘deeply engaged in the very aims and methodology of
anthropology,’ while at the same time avoiding the ‘dark cave of methodological
relativism’ (1997: 11, 13). Kapferer makes an ethnographically informed argument about the
general possibilities of sorcery – it is directed to the contradictions and discordances
of life worlds – while acknowledging distinctions in Sri Lankan practices (ibid. 11, 15).
Sri Lankan sorcery is not another example of exotic otherness; sorcery is a practical
discourse about ‘human-generated social and political realities,’ part of the general
problematic of ‘the alienating and constituting forces of power’ (ibid. 7, 303). Kapferer
breaks with the category because categories structure the interpretation, as Sapir warned.
At the same time, Kapferer avoids the particularistic view of cultural relativism and the
moral positioning of Zeitdiagnose. Sorcery is not analyzed to determine the truth about
violence and power; rather, it demonstrates the anguish of human beings in a social and
political world (ibid. 25). When ethnography addresses general questions the audience is
‘human beings’ and there is the possibility to debate and disagree. The intent is
relevance not truth; thus, it allows the possibility for a voice (response) for a global
audience.

The move to critical ethnography opened the question of audience. Since that time,
information has become more widely available, making the problem of audience more
pronounced in all aspects of the research project. So long as the audience was primarily a
scientific one, there were guidelines about how the analysis should be read and judged –
often for the way it addressed problems within an academic discipline. However, it is less
likely today that the audience will be so narrow; in fact, it is quite likely that the
audience will be any number of people with an interest in the place, the topic, or for
many other reasons. It means that one’s writing is read increasingly by ‘an undisciplined
audience’ (Lederman 2005: 323). We are faced, therefore, with the situation where we
collect data from a variety of people who themselves have a variety of interests, and
publish our analyses in a variety of sites for a variety of readers, each of whom brings
his or her own interests to the text. The text always escapes the author. The work
produced will
not be read conclusively; it will be read for its relevance by readers who assign meaning
to it according to their own valuations.

NOTES

1 A caveat is necessary here: the ‘native’ audience is not necessarily a recent
phenomenon. It has been common in Finland for a long time for the general public to read
ethnology and folklore texts, among others, about themselves. However, while the audience
is ‘native,’ the writing is among ‘insiders’ and does not raise the issue of ‘self and
other’ in the same way as when the researcher is from another culture.
2 These works are part of ‘reflexive modernity’ and, in fact, ‘modernity’ is marked by
reflexivity. The implications of how the information flow makes the world ‘modern’ and
‘reflexive’ on an institutional level, and the impact on anthropology, are discussed by
John Knight (1992).
3 There are no footnotes or references, as would be expected in scientific writing,
although there is an explanation of the methodology in an appendix.
4 The reflexive turn in American anthropology happened in the context of Project Camelot
and the Vietnam War. Both events opened debates about the purpose of anthropological
research: was it to supply information for the CIA and the US military? Rabinow refers to
the American political context in the introduction to Reflections on Fieldwork in Morocco.

REFERENCES

Angel-Ajani, Asale 2004 ‘Expert Witness: Notes Toward Revisiting the Politics of Listening.’ Anthropology and Humanism 29(2): 133–144.
Ballinger, Pamela 2003 History in Exile: Memory and Identity at the Borders of the Balkans. Princeton: Princeton University Press.
Basso, Keith 1996 Wisdom Sits in Places: Landscape and Language among the Western Apache. Albuquerque: University of New Mexico Press.
Beck, Ulrich 1992 Risk Society: Towards a New Modernity. London: Sage.
Beck, Ulrich 1994 Ecological Politics in the Age of Risk. Cambridge: Polity.
Becker, Alton and Bruce Mannheim 1995 ‘Culture Troping: Languages, Codes, and Texts.’ In, The Dialogic Emergence of Culture, edited by Bruce Mannheim and Dennis Tedlock. Urbana: University of Illinois Press, pp. 237–252.
Bourdieu, Pierre 1977 Outline of a Theory of Practice. Cambridge: Cambridge University Press.
Brenneis, Donald 1987 ‘Talk and Transformation.’ Man, New Series 22(3): 499–510.
Briggs, Charles 1986 Learning How to Ask. Cambridge: Cambridge University Press.
Carucci, Lawrence and Michèle Dominy 2005 ‘Anthropology in the ‘Savage Slot’: Reflections on the Epistemology of Knowledge.’ Anthropological Forum 15(3): 223–233.
Clifford, James and George Marcus (eds.) 1986 Writing Culture: The Poetics and Politics of Ethnography. Berkeley: University of California Press.
Douglas, Mary 1970 ‘The Healing Rite (review article).’ Man, New Series 5(2): 302–308.
Dumont, Jean-Paul 1978 The Headman and I. Austin: University of Texas Press.
Dumont, Louis 1986 Essays on Individualism: Modern Ideology in Anthropological Perspective. Chicago: University of Chicago Press.
Duranti, Alessandro 1986 ‘The Audience as Co-Author: An Introduction.’ In, Special Issue: The Audience as Co-Author, edited by Alessandro Duranti and Donald Brenneis. Text 6–3.
Duranti, Alessandro 1993 ‘Truth and Intentionality: An Ethnographic Critique.’ Cultural Anthropology 8(2): 214–245.
Fabian, Johannes 1995 ‘Ethnographic Misunderstanding and the Perils of Context.’ American Anthropologist 97(1): 41–50.
Faubion, James 2001 ‘Currents of Cultural Fieldwork.’ In, The Handbook of Ethnography, edited by Paul Atkinson, Amanda Coffey, Sara Delamont, John Lofland and Lyn Lofland. London: Sage Publications, pp. 39–59.
Freeman, Derek 1983 Margaret Mead and Samoa. Cambridge, MA: Harvard University Press.
Freeman, Derek 1999 The Fateful Hoaxing of Margaret Mead: A Historical Analysis of her Samoan Research. Boulder: Westview Press.
Freeman, Derek 2001 ‘Words have no Words for Words that are not True’: A Rejoinder to Serge Tcherkézoff.’ Journal of the Polynesian Society 4: 301–311.
Giddens, Anthony 1991 Modernity and Self-Identity. Stanford: Stanford University Press.
Giddens, Anthony 1992 The Transformation of Intimacy. Stanford: Stanford University Press.
Gluckman, Max 1958[1940] Analysis of a Social Situation in Modern Zululand. Manchester: Manchester University Press.
Graham, Laura 2005 ‘Image and Instrumentality in a Xavante Politics of Existential Recognition: The Public Outreach Work of Eténhiritipa Pimentel Barbosa.’ American Ethnologist 32(4): 622–641.
Gubrium, Jaber and James Holstein 1997 The New Language of Qualitative Method. New York: Oxford University Press.
Gupta, Akhil and James Ferguson (eds.) 1997 Culture, Power, Place: Explorations in Critical Anthropology. Durham: Duke University Press.
Hage, Ghassan 2005 ‘A not so Multi-Sited Ethnography of a not so Imagined Community.’ Anthropological Theory 5(4): 463–475.
Handler, Richard 1986 ‘Vigorous Male and Aspiring Female: Poetry, Personality and Culture in Edward Sapir and Ruth Benedict.’ In, Malinowski, Rivers, Benedict and Others, edited by George Stocking. Madison: University of Wisconsin Press, pp. 127–155.
Haviland, John 1991 ‘“That Was the Last Time I Seen Them, and No More”: Voices Through Time in Australian Aboriginal Autobiography.’ American Ethnologist 18(2): 331–361.
Heidegger, Martin 1971 Poetry, Language, Thought, translated by Albert Hofstadter. New York: Harper and Row.
Kapferer, Bruce 1997 Feast of the Sorcerer: Practices of Consciousness and Power. Chicago: University of Chicago Press.
Knight, John 1992 ‘Globalization and the New Ethnographic Localities: Anthropological Reflections on Giddens’s Modernity and Self-Identity.’ Journal of the Anthropological Society of Oxford 23(3): 239–251.
Lederman, Rena 2005 ‘Challenging Audiences: Critical Ethnography in/for Oceania.’ Anthropological Forum 15(3): 319–328.
Marcus, George and Michael Fisher 1986 Anthropology as Cultural Critique. Chicago: University of Chicago Press.
Marcus, George 1998 Ethnography Through Thick and Thin. Princeton: Princeton University Press.
Marcus, George and Fernando Mascarenhas 2005 Ocasião: The Marquis and the Anthropologist, A Collaboration. Walnut Creek, CA: Alta Mira.
Mauer, Bill 2005 ‘Introduction to “Ethnographic Emergences”.’ American Anthropologist 107(1): 1–4.
Mauss, Marcel (in collaboration with Henri Beuchat) 1979 [1950] Seasonal Variations of the Eskimo: A Study in Social Morphology, Translated with a Foreword, by James J. Fox. London: Routledge and Kegan Paul.
Mead, Margaret 1969 [1930] Social Organization of Manu’a. Bernice B. Bishop Museum Bulletin 76. Honolulu, Hawaii: Bishop Museum Reprints.
Mead, Margaret 2001 [1928] Coming of Age in Samoa: A Psychological Study of Primitive Youth for Western Civilisation. New York: Harper Collins (Perennial Classics).
Menand, Louis 2005 ‘Everybody’s an Expert: Putting Predictions to the Test.’ Book Review in The New Yorker, December 5: 98–101.
Munn, Nancy D. 1992 [1986] The Fame of Gawa: A Symbolic Study of Value Transformation in a Massim (Papua New Guinea) Society. Durham: Duke University Press.
Noro, Arto 2001 ‘Zeitdiagnose’ as the Third Genre of Sociological Theory?’ Paper presented at European Sociological Association Conference, Helsinki, August 28.
Noro, Arto 2004 ‘Sosiologian Kolmio: teoriat, käytöt ja yleisöt’ (‘A Sociological Triangle: Theory, Use and Audience’), unpublished paper.
Petersen, Glenn 2005 ‘Important to Whom? On Ethnographic Usefulness, Competence and Relevance.’ Anthropological Forum 15(3): 307–317.
Preston, Richard J. 1966 ‘Edward Sapir’s Anthropology: Style, Structure, and Method.’ American Anthropologist 68(5): 1105–1128.
Rabinow, Paul 1977 Reflections on Fieldwork in Morocco. Berkeley: University of California Press.
Radin, Paul 1957 [1927] Primitive Man as Philosopher. New York: Dover Publications.
Radin, Paul 1987 [1930] The Method and Theory of Ethnology. South Hadley, MA: Bergin and Garvey.
Ricoeur, Paul 1991 From Text to Action: Essays in Hermeneutics, II, translated by K. Blamey and J. B. Thompson. Chicago: University of Chicago Press.
Sahlins, Marshall 1992 Anahulu: The Anthropology of History in the Kingdom of Hawai’i. Volume One, Historical Ethnography. Chicago: University of Chicago Press.
Sahlins, Marshall 1996 [1993] Waiting for Foucault. Cambridge, UK: Prickly Pear Press.
Sapir, Edward 1922 ‘Sayach’apis, a Nootka Trader.’ In, American Indian Life, edited by Elsie Clews Parsons. New York: Viking.
Sapir, Edward 1995 [1938] Foreword. Left Handed, Son of Old Man Hat, Recorded by Walter Dyk. Lincoln: University of Nebraska Press.
Siikala, Jukka 2004 ‘Theories and Ideologies in Anthropology.’ Social Analysis 48(3): 199–204.
Silverstein, Michael and Greg Urban (eds.) 1996 Natural Histories of Discourse. Chicago: University of Chicago Press.
Smith, Andrea 2004 ‘Heteroglossia, “Common Sense,” and Social Memory.’ American Ethnologist 31(2): 251–269.
Stocking, George 1992 The Ethnographer’s Magic and Other Essays in the History of Anthropology. Madison: University of Wisconsin Press.
Tavita, Terry 2004 ‘Faleomavaega Tackles Mead-Freeman Debate.’ Samoan Observer Online, 25 October.
Trouillot, Michel-Rolphe 1991 ‘Anthropology and the Savage Slot: The Poetics and Politics of Otherness.’ In, Recapturing Anthropology: Working in the Present, edited by Richard G. Fox. Santa Fe: School of American Research Press, pp. 17–44.
Turner, Victor 1957 Schism and Continuity in an African Society: A Study of Ndembu Religious Life. Manchester: Manchester University Press.
Turner, Victor 1962 Chihamba, the White Spirit. Manchester: Manchester University Press.
Turner, Victor 1967 The Forest of Symbols: Aspects of Ndembu Ritual. Ithaca: Cornell University Press.
Turner, Victor 1968 The Drums of Affliction: A Study of Religious Process among the Ndembu of Zambia. Oxford: Clarendon Press.
Urban, Greg 1996 Metaphysical Community. Austin: University of Texas Press.
6
Social Research and Social
Practice in Post-Positivist Society
Pekka Sulkunen
Scientific methods are not tool kits that researchers can select to suit their tastes and preferences to compete with other techniques contending to reach the truth. Research instruments in sociology are no more than in other sciences independent of concepts and problematics from which they emerge, and they in turn structure the questions and theoretical concepts that they can be used to deal with. Instead of a choice of methods it is more appropriate to talk about ‘styles of reasoning’, like Ian Hacking (1990: 6), who has argued that although the social world is constructed differently by different styles of reasoning, this is not to say that the constructions are arbitrary. It simply means that, for example, an explanation or prediction formulated in probabilistic quantitative terms already implies a great deal about the world in its concepts which, in turn, are integrated with a statistical methodology. The same reality represented in another vocabulary and through a biographical or ethnographic methodology would look different but still be no less true.
How should we classify such styles of reasoning in sociology, and how could we explain or understand the reasons for such differences? In this article I argue that a major change in sociological styles of reasoning took place in the late 1970s and early 1980s both in the way sociology began to conceptualise the social world and in the way sociological research was related to social practices or policy-making. One apparent indication of the new style of reasoning was the boost in qualitative research and the accompanying ‘cultural’ or ‘linguistic’ turn in sociology (see Chapter 1). These changes reflect the role that social sciences first had in the three post-war decades and then lost when the welfare state construction period had attained maturity.
REPRESENTATIONAL, EPISTEMIC AND POSITIONAL DIMENSIONS OF KNOWLEDGE
Sociological studies tell about social reality in three different ways. First, they report
knowledge about social realities. This knowledge depends on their conceptual framework and on their instruments of observation such as ethnography, media analysis or the survey technology, but within the constraints of the concepts and instruments, knowledge it is (Bhaskar 1975). This is the representational dimension of knowledge. For example, a study on the relationship between social capital and social exclusion might be made with statistical methods, which require that the abstract categories ‘social capital’ and ‘social exclusion’ are operationalised as measurable indicators that describe individuals or collectivities. Most likely, a fair number of drug users would be found among the most excluded. Another study might compare Western countries and come to the conclusion that most of them apply strict prohibitions on a selection of pharmaceuticals – not all, like alcohol, but many such as opiates, cocaine, amphetamine or MDMA (‘ecstasy’). Possession, distribution, production and import of the prohibited drugs are legal offences with penal consequences. The role of the criminal justice system as the interface between the state and the drug user in many ways operates as a mechanism of exclusion. The term ‘prohibition’ is also an abstract category and describes at least part of the same reality as the quantitative study, but from a completely different angle. Finally, a third study, made with ethnographic methods, could analyse the social relationships in the different types of public social and health services offered to illicit drug users, and find that at the low-threshold needle exchange clinic the (often voluntary) social workers are allies of their clients, trying to help them to get medication and other help, whereas the workers at the substitution treatment clinic require a great deal of ‘motivation’ and effort from their clients, often with the consequence that they are felt to be part of the penalising control system rather than a help. Again, we are observing mechanisms of exclusion, including social capital and the lack of it, but from a completely different angle than the other two studies. All of them report facts that represent the same reality, but within very different styles of reasoning and methods.
Second, the style of reasoning itself tells us about society. The three studies of social exclusion, with their different methods and concepts, involve very different problematics although their subject matter is at least partly the same. The first probably would be built on communitarian hypotheses on how social relationships support people in their self-control, autonomy and integration into educational and work life. The second would raise different kinds of questions concerning the authority of the state, the basis of selecting some pharmaceuticals as legal and others as illegal, and the intended and unintended consequences of prevention efforts. The third would pay attention to the fact that social capital may be of very different kinds, and that it is not entirely an independent variable in the processes of social exclusion but depends, instead, on power relationships in society. All three studies involve moral investments in the way they categorise their observations; they represent not only the reality as facts but also wider frameworks in which they see society, the state, the individual and the interface between citizens and the public powers. In other words, they are motivated by different interests of knowledge.
The interests of knowledge which define the needs and dispositions to explain and understand what happens in society determine the types of questions that can be asked about social reality: the epistème, to use Michel Foucault’s term (Foucault 1966: 197). Let us call this the epistemic dimension of sociological knowledge. Epistèmes themselves are social facts that represent the relations of domination in the given society. The master example is Foucault’s own account of the history of Western science and its ways of relating human culture and nature. It evolved from classifying and representing the natural world, including humans, in the natural history of the seventeenth and eighteenth centuries, to the study of exchange and utility in mercantilist and physiocratic economics, to the focus on work in classical economic theory, and finally to the complete separation
between human and natural sciences towards the end of the nineteenth century. A similar example is Ian Hacking’s analysis of the discovery of probability and stochastic processes in the early nineteenth century. This opened up whole new areas of scientific research concerning populations and mass phenomena. Such grand transformations of the epistème reflect society’s interests in itself and its natural environment in wide philosophical terms, but as the three examples above point out, the kinds of questions society asks of itself are also reflected in research designs on a smaller scale, and the designs and questions themselves tell us something important about society.
Third, sociological studies report through their form and scientific practice quite special facts about society, namely facts about the relationship between sociologists themselves and the object of their study. This we can call the positional, or the sociology of knowledge dimension of sociological facts (Bourdieu 1982). The division of sciences into disciplines in itself is an important fact about the society that engenders it. The fact that social sciences are today separated from natural sciences, and split into sub-disciplines each with their own dominant styles of reasoning, is not simply a consequence of the accumulation of knowledge but also a real factor which has an impact on what new knowledge it can produce. Another division, especially important in sociology, is the way that scientific knowledge is entangled with but sometimes also opposed to practical knowledge about society, held by ordinary people, by policy-makers, by the media and other significant institutions.
All these three dimensions must be accounted for when we discuss the relationship between social science and social practice. Sociological studies should not be read only as reports about their objects, but symptomatically, as manifestations of the power fields of knowledge in which they operate, and of their relationships to these fields. In all three respects the social sciences in advanced capitalist societies have undergone a transformation which we must clearly understand to see precisely what practical role they potentially serve today.
PLANNED ECONOMY AND MODE 1 SOCIAL SCIENCE
When the architect of the British welfare state, Sir William Beveridge, envisioned the state’s role in the post-war society he considered that the ‘spectacular achievements of the war-time planned economy’ (Beveridge 1944: 120) measured by the GNP and employment should be applied to the economy in peace, which also could benefit from state regulation, and not only by means of income redistribution. The state’s aim was no longer to minimise public spending but to optimise all spending in society, in regard to available labour power by means of ‘manpower budgeting’. The state budget should be measured to maintain full employment but not to exceed the national manpower capacity. The Keynesian principle of full employment was translated into income equalisation in social policy and growth was its primary objective. Thus planning was not uniquely a Socialist idea; a plan designed and supervised by the centralised national state was a generally accepted European model of industrial development.
The planning did not only cover infrastructure, regional policy, monetary and fiscal policy, but also the ways in which people should lead their lives. The Swedish Alva and Gunnar Myrdal (1934) had in their famous population policy programme proposed that the state should root out bad habits among its citizens and teach them good manners. People had to be trained to take care of their households and bring up their children, although the important and complicated task of education should primarily be yielded up to professionals in nursery schools and other institutions. The state had to make people conscious of their real interests. Psychological research about happiness was needed to discover what makes life worth living according to people themselves, and the institutions of society should be formed on the basis of these observations.
The sociology associated with the plan was an exemplary case of what Gibbons et al. (1997) call Mode 1 science. Knowledge production in Mode 1 takes place at a distance from the context of application, as ‘pure’ science at the far end of the continuum from research to ‘development’. Mode 1 knowledge production respects rigorous disciplinary boundaries. Its canon of accountability and quality control dictates that only intra-disciplinary expert authority is qualified to judge the validity of knowledge, the merits of the scientists and the value of their work. Mode 1 science is enclosed in the universities, and – the authors claim in a second book (Nowotny et al. 2001) – in fact not accountable at all in practical terms, such as outcomes in welfare or as impact in policy effectiveness.
Nowotny et al. (2001: 63) explain that the positivist virtue of a completely self-controlling, context-free science was cultivated in a context that had an unlimited appetite for meaning and certainty already from the eighteenth century, when Western society was experiencing an enormous wave of modernisation. The same explanation holds even more emphatically for the post-war decades in Western countries where progress, change for the better, lurked in the future biographies of not only the elites but of the great majority of people. Post-war industrialisation was particularly dramatic for Europe which, with the exception of England and Belgium, was still a continent dominated by small-holding agriculture on the eve of the Second World War. Germany, Denmark, the Netherlands and Sweden all had well over one-fifth of their labour force employed in agriculture; Spain and the eastern countries including Finland had well over one-half. Thirty years turned first the west and then the central and eastern part of Europe to economies dominated numerically by the industrial working class, the peaks reaching up to almost half of the total (civilian) labour force (48.5 percent in West Germany in 1970)1.
The post-war industrialisation produced a phenomenal growth in consumption possibilities with no parallel in human history, not relatively speaking and certainly not in absolute terms. The earlier consumer booms of the eighteenth century in England (McKendrick et al. 1982; Mukerji 1983) and still in nineteenth-century Europe (Williams 1982) were limited to small elites, but the new industry-based consumer society was a phenomenon of the masses and encompassed the structural foundations of industrial society.
In retrospect this change was so drastic that it has been given dramatic names, such as the European golden era (Therborn 1995), the golden years of capitalism (Hobsbawm 1994), the glorious thirty years (Fourastié 1979) or even the second French revolution (Mendras 1988). It changed the make-up and technology of everyday life. It reconfigured both social structures and people’s way of thinking about themselves and about their relationships with others. It brought to ordinary people a quantity and diversity of goods, pleasures and uses of time that either had never existed before or had only been accessible to the very privileged. Luxury was democratised and became part of everyday life. The pleasures of consumption and sensuality became publicly presentable, in everyday life as well as in the media and in marketing, whereas they had earlier been excluded from public discourses and left to the private sphere. The Weberian values of industrial society – frugality, industriousness and achievement orientation – were replaced by post-industrial or post-modern values that stress pleasure for its own sake and cherish its public presentation as much as they spurn its public control. The romantic ethos of capitalism seemed to get the upper hand.
At the same time parliamentary institutions were consolidated in all Western countries. Europe only gradually recovered from quasi-totalitarian war-time regimes, the USA from an era of ultra-nationalistic anti-communist suspicion. Value conflicts over religion, nationalism, the family, sexuality and many forms of consumption and culture gained political platforms and turned into protests and counter-protests or moral panics (Cohen 1972).
The appetite for meaning and certainty was not only of a psychological nature. The plan was a central instrument in progressive
national industrial policies, and the plan required reliable and impartial information for its material. Also the moral ambivalences needed to be formulated in a language that and described more systematically than with anecdotal accounts by journalists and writers or movie directors. The appetite was not only for meaning and certainty; it was also for information.
Population statistics had already a solid foundation from the late nineteenth and early twentieth century. To a lesser extent this was true also for economic and labour statistics. However, household consumption data only began to become available in the 1950s. Income and mobility surveys have an even shorter history, and individual data on specific consumption patterns (such as alcohol), sexual behaviour, political opinions and attitudes about this or that aspect of everyday life, which today are routinely provided by Eurostat, European Science Foundation, and national statistical offices, or which are industrially produced and commercialised by private ‘research’ companies, were still in the 1960s a rarity provided by specially funded academic research programmes. All this information required a conceptual portrayal of society – a language to describe its direction of change, and to interpret its relevance.
Even though the epistemic dimension of the sociology associated with the plan was strongly normative – preparing the good life for all – any sociology of knowledge was an alien, if not hostile, idea to Mode 1 knowledge production. Science that speaks with the voice of disciplinary authority does not highlight its subject and the subject’s relationship with the reality it speaks about. To take an example from the natural sciences, the mapping out of the human genome is a collective project which advances at every new step independently of who makes that step and independently of what the consequences of the genome project will be for diagnostic practices, for treatment methods, for the lives of people with known genetic disorders, and for the lives of many other people who live with them. In the same way, one might think that if basic social science research could detect the determining elements in human social conduct, it does not matter who participates in the production of knowledge, and from what point of view.
Instead of engaging in the question of standpoints of knowledge, there was a strange cleavage between ‘Grand Theory’ and ‘Abstracted Empiricism’ (Mills 1959) prevalent in sociological texts of that era. The highly technical vocabulary of the former and the bureaucratic ethos of the latter appear quite distinct from each other, theory representing ‘basic’ or pure science with disinterested motives (beyond the interest in the establishment of the discipline itself) while the empirical researchers apply their measurements and methods to practical social issues of integration, cohesion, equality, crime prevention, youth work, health promotion, etc.
Neither theory nor empiricism left much room for human agency, with understandable aspirations, goals and hopes. For empiricist as well as theoretical sociologists, Mills argued, the object of knowledge is social action – what makes members of society act in a meaningful and orderly way from the point of view of society. According to Mills, it was the task of emancipating social science to help out people who ‘need, and feel they need … a quality of mind that will help them to use information and to develop reason in order to achieve lucid summations of what is going on in the world and of what may be happening within themselves’ (p. 5). That quality of mind, the sociological imagination, is offered to them by the critical sociologist who is capable of using the classical tradition to translate private problems to public issues and vice versa.
THE NEOLIBERAL TURN AND MODE 2 SOCIAL SCIENCE
By the 1970s social research in accordance with Mode 1 knowledge production was criticised increasingly often. One of the objects of critique was the problematic assumption about objective knowledge independent from the viewpoint of the knower. One solution has been to make explicit ‘whose side we are
on’, as Howard Becker, the famous American sociologist of deviant minorities, asked in 1966, and argued that it is the task of the sociologist to side with the ‘underdogs’, the drug users, prostitutes, ethnic minorities or extremely poor people. The voice of such people is not heard in the media; they are not seen in the halls of power, and thus information about their lives must be produced by professional sociologists who are explicitly equipped with methodologies to make that information available (Becker 1970). But as Alvin Gouldner (1970) remarked in a famous and influential debate with Becker, such a position does not solve the problem itself, created by the division of labour between pure academic science and applied research. Being on the side of the underdog is in itself an ambiguous position. What is an underdog? There is always somebody above every overdog, and thus if we study drug users, for example, even the local police officer – an obvious overdog to the addicts – is under the authority of the police headquarters, of the municipal council, the President of the local Lions Club, and many others, not least the legislator who decided that drug use is illegal and thus a police affair. Moreover, Gouldner argued that even when sociologists take the underdog point of view they, knowingly or not, serve a constituency on whose interest their career possibilities depend.
A major blow to Mode 1 social science came from social constructionism, which pointed out that there cannot be any pure social science knowledge independent from ordinary people’s everyday knowledge about society. Anthony Giddens (1979: 245–253) gave this point a famous formulation in his state-of-the-art review of social theory by saying that the twentieth-century trend in social science has been to increasingly account for the fact that people always already, without any interference from social scientists, possess enormous amounts of knowledge about society. A landmark volume to realise this had already appeared in 1966: The Social Construction of Reality by Berger and Luckmann (1987). They had argued that not only do people know a great deal about their society – obviously, in order to go to school, to be employed or be an employee, to be husband and wife, to make one’s way in modern traffic, to be a consumer, a political or a social citizen, one has to know a very complicated set of rules and norms – but that the whole social structure is based on such shared knowledge. Thus the proper approach to the analysis of social structure is not abstract measurement such as statistics on income distributions or class divisions but sociology of knowledge.
Once it was recognised that people know a great deal about social life, and that social scientists’ knowledge is part of the same ‘stock of social knowledge’ in which other people also live, it is easy to dismiss Mode 1 science as an illusion. There is no pure social science, independent of the context of application, because the scientists’ knowledge is itself part of the context: it serves to define situations, to conceptualise social issues and to establish selections of feasible policy options, to exclude others and so on. Social sciences are permanently challenged by everyday thought; they cannot in actual fact justify themselves only with disciplinary canons, and their academic authority is constantly questioned. Such a view stresses the positional, or sociology of knowledge dimension of social science: scientific concepts, methods and language which produce and express facts also reflect the relationship between the scientists and their object, the people they study. Sociology committed to this view always faces what is called ‘the reflexivity problem’. If social reality is significantly influenced by what people think or believe about it, and these beliefs are influenced by the believers’ interests, social scientists contribute to the shaping of this reality in a way that also is infected with their interests. In what way, then, can sociologists claim that their knowledge is superior or somehow less influenced by their situation than other knowledge? Berger and Luckmann said that sociology of knowledge is ‘like trying to push a bus in which one is riding’ (1987: 20). To pretend that disciplinary social
science is somehow neutral and virtuously outside of social reality, even in its basic theoretical part, is to make a fallacious claim of objectivity and a rather dubious attempt to cover up its partiality. Recently this view has been profusely advocated by Michael Burawoy (2005).
When Giddens made his observation that social sciences tend towards a recognition of the importance of everyday knowledge, he was in fact pointing at a major change in the relationships between social science and social practice that was occurring in all its three dimensions: representational, epistemic and sociology of knowledge, in the post-positivist transition. In representational terms, the so-called cultural, semiotic or linguistic turn drew sociologists’ attention to critical analyses of meaning in people’s everyday life, in the media, in cultural products and also in social science itself. In Erik Allardt’s terms (2006), the hermeneutic pole in social science gained dominance vis-à-vis its complementary opposite, the positivist vision. It was observed that beyond what was taken for fact there is a complex web of communication, from statistics collectors’ concepts and classifications, to respondents’ interpretations and responses to them, to statistical analysis and interpretation of results by researchers and by their readers. No part in this web can be taken for granted as evident and obvious. In cultural and media studies the same ambiguity of meaning appeared in many forms. Semioticians talked about the ‘referential fallacy’ (Greimas and Courtès 1979), media researchers focused on the user perspective, i.e. the interaction between the media and the audience (Sulkunen and Törrönen 1997; Alasuutari 1995), and literary criticism followed Roland Barthes (1977: 142–48) in believing that the ‘author is dead’ – the ‘meaning’ of literary texts escapes the intentions of their authors, and in the extreme case it even escapes the text itself. Meaning became a problem, the object of study, the referent, instead of being simply the medium of facts.
Why? It has by now become established that the end of the 1970s marked an end of a historical period in advanced capitalist countries if we look at it from the perspective of the principles of governance. Nikolas Rose and Peter Miller (1992) have associated this change with the Foucauldian idea of governmentality, the internalisation of power by its subjects in modern society, and found its locus in the changing role of the state. Since then, an extensive literature has demonstrated that essential reforms in public management (itself a new term signalling the change) have taken place in advanced capitalist states, at times to a point where the state seemed to be withering away from capitalism altogether. Luc Boltanski and Ève Chiapello (1999), on the other hand, have studied business management doctrines and found that a similar re-organisation has taken place in the private sector even earlier. In fact, the new style of governance has shifted from business to public management with more or less success. Michael Power (1997) has confirmed this phenomenon and used the term The Audit Society to describe the essential change that has occurred to the role of social sciences in the new mode of power: evaluation, of which auditing is one especially important part. Using the term coined by Gibbons and associates (1997), it depicted the change from Mode 1 to Mode 2 knowledge production. In contrast with Mode 1 ‘pure’ science, Mode 2 knowledge production takes place in the context of application; it is transdisciplinary and it is directly accountable also on grounds of its practical usefulness (Nowotny et al. 2001: 220).
Boltanski and Chiapello concluded that by the mid-1970s industrial life had entered a deep management crisis in all OECD (Organisation for Economic Co-operation and Development) countries. The bureaucratic management structures that had been copied from the military were inadequate for performance and unacceptable from the point of view of the increasingly educated labour force. The response was to create more democratic participatory work organisations, flexible employment schemes, subcontracting, autonomous quality circles or teams, outsourcing and competition within companies. The new organisational form was no
longer the hierarchy but the network, and its node was the project: a task-based uniquely funded team with autonomous leadership, targets and a deadline. Control was no longer directed from central management down to the divisions, departments and the shop-floor stewards; from now on it was not only internalised in the employees’ own individual interest but also externalised to peers and to competitive relationships between operational units and profit centres.

The public management doctrines that were adopted in a short time-span in the mid-1980s in the OECD and its member countries applied the same principles to state and local government. Similar problems of bureaucratic management were to be eliminated as in the private sector, but a moral dimension was also important: citizens should no longer be seen as subjects of the state; they were put in the position of clients, and the public service-providing agencies were re-organised to meet requirements that are often called the three Es: Economy (ensuring the best possible terms for endowed resources, implying competition between service producers), Efficiency (producing more value for money) and Effectiveness (ensuring that outcomes conform to intentions) (Power 1997: 50). The central government is no longer authorised to issue norms to local officials and service producers such as hospitals, schools, day care services etc., but only information and advice, and resources now measured to output rather than needs.

FROM THE GOOD LIFE TO GOOD PRACTICES

Governance – or management, borrowing again the language from the business world – by information is often used to describe the new power structure. A better term to highlight the moral dimension of the change would be ‘governance by programmes’ or ‘frameworks which have replaced the plan’. The moral and political authority of the state does not suffice to define what the good society is, what kind of life is good or bad or how to solve problems. There is no willingness to prescribe norms of how and what we should or should not do. Nevertheless, the political responsibility has to be attested and the officials have to be given grounds for decisions about how to direct the state’s money to different purposes, among other things. Frame laws and programmes that define goals, recommendations for programmes and criteria for standards are needed to achieve the purposes mentioned above. In very many areas supra-national bodies define the targets. For example in the European Union framework programmes are formulated on many issues: development of technology, employment, prevention of exclusion, regional development, promotion of health, prevention of drug problems and harmonization of education and many other things. These are again translated to national strategies, policy programmes and eventually to short-term action plans. Local and regional governments insert these to their own objectives and action plans. The formulations of these goals are of very general nature in the programmes and their accentuations usually correspond to those of the general public administration thinking: in alcohol and drug programmes the goals are the responsibility of citizens themselves, initiative, networking and relying on the support of neighbourhood communities, to name just a few.

From the epistemic point of view, governance by programmes and frameworks rather than by plans means that society asks itself different kinds of questions than before. Social sciences that were attached to the plan were expected to say what happens if we do X, and what should be done to make Y happen. Now the questions are: in regard with the three Es, which of the projects A, B, C … N meet best the objectives of the programme? For example, the objective might be to minimise alcohol-related problems. The central government does not have the means at its disposal to reduce alcohol consumption in the country, or is reluctant to use such policy instruments (price increases, permitted hours of sale and other regulations of the market); instead it asks local communities, non-governmental organisations (NGOs),
businesses, labour unions, churches, etc. to establish innovative projects and have them evaluated for economy, efficiency and effectiveness (Sulkunen 2006).

The central concept in goal and framework management, ‘innovation’, has been used in the science and technology policy already for a long time. The administration cannot predetermine the results of the researchers or the direction of the development interests of companies, but it can take a stand on the direction of the development in general and make strategic policy definitions. New ideas come from the ‘grassroots level’, from fieldworkers and citizens themselves. Transferred to social policy, the pattern of ‘innovation thinking’ has assimilated traits of romantic rationalism: people are thought to be creative and the solutions have to be given space to develop and grow upwards from down under. The researchers should evaluate and strengthen these tendencies instead of planning. The primary tasks of evaluation are surveillance of expenses, ensuring quality and supervision of observance of rules and regulations: tasks which used to belong to inspectors and superintendents of state governance. Often they include, though, more ambitious goals of generalisation, which are called recognizing good practices.

The expressions ‘good practice’ and ‘what works’ originate from prison administration (Garland 2001), and from there they have spread to social work and public administration in general. This manner of speech is an application of solution-oriented therapy or pedagogy, which detaches itself from analysing reasons of problematic behaviour and instead concentrates on the recognition of the effects of alternative action models. The search for reasons is, according to this perspective, not only a waste of time but it might also have negative effects. When criminals learn about the causes of their behaviour, those causes become ‘vocabularies of motive’, justifications and rhetoric for escaping responsibility (Sykes and Matza 1957).

The recognition of good and working practices is pragmatic thinking. The behaviour of a person is a sum of such complicated factors, that the practical social work in prisons, for example, cannot commit only to one or a few explanation models and their conclusions concerning clients. It is more useful to observe the effects of the existing methods of social work itself and choose the methods that seem functioning and cost-effective. The innovation thinking is dressed in the rhetoric of good practice, and it leads to a sort of new social Darwinism. Clients and employees are given free hands to invent new kinds of action models, mutations, and eventually the most fit among them are chosen for additional refining on the basis of expert reports. Evaluation is then considered the unbiased and unemotional mechanism of social and natural selection.

The other side of pragmatic thinking is moral neutrality. Assumption that the methods of social work or the alternatives for control policies could be evaluated only in regard of their functionality and effectiveness, presupposes a strong unanimity of goals – the employment, health and security of the population being considered good objectives and repeated offences a bad one, for example. In programme rhetoric neutrality leads to abstracticism and definitional – and at the same time administrative – ambiguity. Promotion of health is a good example of this. Another is management of security. This rhetoric calls the acts of officials with a general name that has a morally neutral flavour. It is easy for everyone to accept, but at the same time it expands the range of goals of the officials and experts and blurs the boundaries of their actions. The other moral points of view related to the matter – the customers’ freedom of choice or the sense of justice of many citizens demanding more severe punishment for criminals, for example – can be forgotten from the standpoint of effectiveness.

THE FICTIONS OF EVALUATION RESEARCH

From the point of view of the sociology of knowledge, governance by programmes positions the sociologist in a new relationship
with social practice, exactly as Nowotny et al. (2001) describe it as characteristic of Mode 2 knowledge production. Social research operates in the context of application; it is not constrained by disciplinary boundaries and the criteria of its accountability are less academic than practical: tell us what works, and we shall be pleased not to know why something else might not work.

If the idea of ‘pure science’ in the positivist Mode 1 knowledge production was an illusion, but an illusion in a real context with real consequences, are the ideals of Mode 2 social science more realistic and convincing? To some extent the answer is positive: social science that operates in a context and is aware of its own vested interests is more honest about itself and potentially also more relevant than social science built on the fiction of basic science and applied research. However, Mode 2 science attached to the programme rather than to the plan also has its illusions, as real as the fiction of Mode 1 science but in a different context and with different consequences. The first illusion arises from the logic of governance by programmes itself: abstract objectives.

Programme and evaluation rhetoric make politics look rational, and hierarchical decision-making just like business management. But what does the state need this rhetoric for? Why is it impossible for example for a ministry to decide on its strategy in alcohol policy and to follow that strategy in financing and other solutions? One reason for this is the pursuit of political neutrality already discussed above. The ministry does not want to decide or it considers itself incapable to dictate how municipalities, organisations, companies – or other ministries – should act in order to decrease problems caused by alcohol consumption. To preserve the autonomy of those actors the policy goals are defined with abstract concepts, of which employment, health and security are the most central ones. It is always possible to reach unanimity concerning those goals, even though the moral or power resources would not always suffice to make concrete policy decisions. The rhetoric of ‘what works’ and ‘best practices’ reflects what we have called the Ethics of Not Taking a Stand, quoting a fieldworker we interviewed on how she advises parents to behave in the drug issue: ‘The most ethical stand is not to take a stand at all, the parents should decide this for themselves’ (Määttä et al. 2003).

Abstraction also has another legitimating function. It protects the sphere of intimacy, which was the historical goal of the welfare state: the self-responsibility of citizens, individual agency and commitment to good choices to promote a person’s own health, security and well-being. This is not limited to rhetoric or ideological speech, but it is part of the everyday life of advanced capitalist society. For example, the health care expert system is relatively helpless if the patient is unwilling to co-operate: ‘only the medication that is taken will help’. But you cannot force anyone to co-operate. You cannot get overweight under control unless consumers eat less. Disciplining consumers’ food choices directly would be felt as unacceptable paternalism. They will have to take responsibility for their own choices.

In programmes with very concrete targets such as weight loss the outcomes are easily measured. However, in many cases standards of performance are more ambiguous, and the audit or evaluation of efficiency and effectiveness is in fact a process of defining and operationalising them, often with perverse effects on the actual operation of the system. A good example is research evaluation. In theory, university departments and research institutes are expected to produce relevant good quality research, but the auditing criterion: articles published in refereed journals, leads to an increase in the number of such journals, with the consequence that fewer people read them and the social relevance of research results declines. Nevertheless, money is invested in them because the effective alternative, such as taxing food or alcohol, is not included in the repertoire of acceptable policies.

Governance by programmes and frameworks thus supports what Nowotny et al. (2001) consider the key features of the
Mode 2 science. Abstract objectives of evaluation research in the context of application encourage transdisciplinarity and pragmatic division of labour. When the interest is not directed at explaining behaviour nor even at the mechanisms of effects of the measures taken, but only at the effectiveness of the alternative action models, there is no need for the research of alcohol problems, youth culture or deviant behaviour but for skilful evaluation researchers who can flexibly move from one substance area to another. Corresponding abstracticism is visible in the training of fieldworkers and their division of work. As the French sociologist Robert Castel (1981: 135–44) has claimed, the professionalisation of social work has not actually led to the often anticipated medicalisation nor specialisation of other kind. Instead there has developed a paraprofessional mixed type, the general task of which is social control.

The abstracticism of goal and framework management has resulted in efficiency and effectiveness becoming passkey concepts that are applied everywhere. Sometimes, however, they misrepresent the reality that they are supposed to evaluate. For example, every society will need to take care of addicts in some way. For the clients’ welfare as well as for the institutions – the police, social offices, penal and medical institutions – the most relevant questions relate not to outcomes in terms of recovery but to the division of labour between controlling and helping professions. This, however, is not an issue of performance but of ethics and values. Constrained to evaluating efficiency and effectiveness, Mode 2 social science may in fact sustain inefficient responses instead of asking pragmatically relevant questions about their rationale.

THE RETURN OF CAUSALITY AND ITS OLD PROBLEMS

The second illusion of the new mode of practical social science arises from the requirements of efficiency and effectiveness. Both are based on the notion of causality. The concept of effect is a part of the equipment of science, as well as of everyday thinking. We light the lamp, roast the ham, start the car, give advice to another person or call a meeting assuming, on the basis of our prior experience, that a certain state of affairs will follow. We do not usually ask why it results from that action. Only when the lamp does not light, the ham does not roast or advice or invitation are not followed, do we start investigating the error. Even then we don’t have to know much about the mechanisms of the causal chain, but we can lean on our prior experience. We routinely change the bulb, check the fuse and the position of the ignition key or whether our advice or invitation has actually been received. Only in very exceptional circumstances do we have to lean on expert support, that is to say we utilise research-based knowledge to explain the mechanism between the cause and the effect and this directs us to look for the error in the different parts of the chain.

In evaluation research the primary interest of knowledge is similar to our everyday causal thinking. The interest of knowledge is not to establish general laws about social life but to verify whether the action causes the desired effect or not. This could be called clinical causal thinking. Its objective is not to explain the mechanisms of effects, but only to test pragmatically if they are there, how much they vary and whether there are possibly some ill effects. Medicine that is based on evidence and the medicine-influenced social policy of the same type are examples of clinical causal thinking.²

Still, clinical causal thinking has similarly limiting logical conditions as the causality tests of the research laboratories. The cause and the effect have to be logically independent and empirically dependent on one another; the cause factor has to be adjustable in an unambiguous and measurable manner; and the effect of other variables has to be eliminated experimentally or statistically. Also there have to exist unambiguous means for measuring the effect, which has to follow the cause temporally.

Some clinical medical research is able to live up to these expectations. The medicament will stay the same in spite of who it is
given to and who hands it out, and the human body is approximately the same in different circumstances. Usually it is possible to control the effect of differences with the reliability that meets the expectations of the practice. In social work and social policy the conditions of clinical research can be met only in exceptional circumstances. As Tom Erik Arnkil and Jaakko Seikkula (2005: 60) have claimed, psychosocial work does not move from a certain place, actor or situation to another remaining the same, as medication does. No ‘method’ or ‘model’ can be independent of the agent who delivers it or of the one who receives it, nor conceptually independent of the effect it aims at.

Evaluation is usually performed in a situation where a test or even comparative configuration of any kind is not possible. Ordinarily the evaluator is contacted when the funding of the project has already been granted, its staff and principal idea are decided, and the fieldwork of the project has already partly started. Some vested interests have already been created, the good-willing mission is an inspirational source for action, and there is no time or resources for comparison presupposed by a real evaluation of effectiveness. The expectation of establishing causality turns into a thin fiction.

CONCLUSION AND DISCUSSION

In this article I have discussed the relationship of social science to social practice, and argued that a radical paradigm shift occurred in the 1980s in all advanced capitalist countries from the positivist mode associated with the idea of the plan to a more context-based science attached to governance by programmes and frameworks. The change reflects the new practices of governance that were introduced at the same historical period in the business world as well as in public management. In social science knowledge production the shift corresponds to what Gibbons et al. (1997) call a transition from Mode 1 to Mode 2 science. This shift has had implications at three levels: referential (what is studied), epistemic (what kinds of questions are asked) and sociology of knowledge in a narrow sense (position of scientists in relation to the object of their research and to those whose knowledge needs they serve).

I have also argued that Mode 1 social science was a deviation rather than a long tradition in modern social science. It was associated with governance by plan in the post-war decades of state-driven industrialisation and construction of the welfare states. It had important functions in providing a conceptual portrayal of society and the theoretical framework for growing needs for monitoring and information, which now are mostly covered by information systems other than the social sciences. However, Mode 1 social science was also an illusion, and many social scientists and critics were aware of this.

The shift to Mode 2 science was a reaction to internal developments within the social sciences but more importantly it reflects the epochal change in the logic of governance in capitalist societies from the plan to programmes and frameworks. This change is deeply rooted in the structure of capitalist societies which stress individuality and autonomy of agents. Fixity on abstract targets, good practices and causal relationships in Mode 2 science are fictions too, but on the other hand, science which is aware of its own context has a greater critical potential and capacity to act as ‘public sociology’ than a discipline that is divided between pure science and applied research.

NOTES

1 Therborn 1995, table 4.4, p. 66, and table 4.6, p. 69.
2 The so-called Cochrane Library collects the results of clinical treatment research, evaluates their validity and draws conclusions on the probabilities of the effects of the methods. Corresponding work has been done in social policy under the name of the Campbell Collaboration.
REFERENCES

Alasuutari, Pertti (1995) Researching Culture. Qualitative Method and Cultural Studies. London: Sage.
Allardt, Erik (2006) ‘The Twofold Nature of Sociology: The Positivistic and Hermeneutic’, International Journal of Contemporary Sociology, 43(2): 248–61.
Arnkil, Tom Erik and Seikkula, Jaakko (2005) Dialoginen verkostotyö. (Dialogical meetings in social networks.) Helsinki: Tammi.
Barthes, Roland (1977) Image, Music, Text. Essays. London: Fontana.
Becker, Howard S. (1970) ‘Whose Side are We On?’, in Douglas, Jack D. (ed.) The Relevance of Sociology. New York: Appleton-Century-Crofts. pp. 99–111. (1st edn, 1966.)
Berger, Peter and Luckmann, Thomas (1987) The Social Construction of Reality. Harmondsworth: Penguin Books. (1st edn, 1966.)
Beveridge, William (1944) Full Employment in a Free Society. London: George Allen & Unwin.
Bhaskar, Roy (1975) A Realist Theory of Science. Leeds: Books.
Boltanski, Luc and Chiapello, Ève (1999) Le nouvel esprit du capitalisme. Paris: Nrf Essais.
Bourdieu, Pierre (1982) La leçon sur la leçon. Paris: Éditions de Minuit.
Burawoy, Michael (2005) ‘American Sociological Association Presidential Address: For Public Sociology’, The British Journal of Sociology, 56(2): 259–94.
Castel, Robert (1981) La gestion des risques. De l’anti-psychiatrie à l’après-psychanalyse. Paris: Éditions de Minuit.
Cohen, Stanley (1972) Folk Devils and Moral Panics. London: MacGibbon and Kee.
Foucault, Michel (1966) Les Mots et les Choses. Paris: Gallimard.
Fourastié, Jean (1979) Les trente glorieuses. Paris: Fayard.
Garland, David (2001) The Culture of Control: Crime and Social Order in Contemporary Society. Oxford: Oxford University Press.
Gibbons, Michael, Limoges, Camille, Nowotny, Helga, Schwartzman, Simon, Scott, Peter and Trow, Martin (1997) The New Production of Knowledge. The Dynamics of Science and Research in Contemporary Societies. London: Sage. (1st edn, 1994.)
Giddens, Anthony (1979) Central Problems in Social Theory: Action, Structure and Contradiction in Social Analysis. London: Macmillan Press.
Gouldner, Alvin W. (1970) ‘Anti-Minotaur: The Myth of a Value-free Sociology’, in Douglas, Jack D. (ed.) The Relevance of Sociology. New York: Appleton-Century-Crofts. pp. 64–84.
Greimas and Courtès (1979) Sémiotique - dictionnaire raisonné de la théorie du langage. Paris: Hachette.
Hacking, Ian (1990) The Taming of Chance. Cambridge: Cambridge University Press.
Hobsbawm, Eric John (1994) Age of Extremes: the Short Twentieth Century 1914–1991. London: Abacus.
McKendrick, Neil, Brewer, John and Plumb, J.H. (1982) The Birth of a Consumer Society. The Commercialization of Eighteenth-Century England. London: Hutchinson.
Mendras, Henri (1988) La seconde révolution française 1965–1984. Paris: Gallimard.
Mills, C. Wright (1959) The Sociological Imagination. New York: Oxford University Press.
Mukerji, Chandra (1983) From Graven Images: Patterns of Modern Materialism. New York: Columbia University Press.
Myrdal, Alva and Myrdal, Gunnar (1934) Kris i befolkningsfrågan (Crisis in the Population Question). Stockholm: Albert Bonniers förlag.
Määttä, Mirja, Rantala, Kati and Sulkunen, Pekka (2003) ‘The Ethics of Not Taking a Stand. Dilemmas of Drug and Alcohol Prevention in a Consumer Society – a Case Study’, International Journal of Drug Policy, 15(5–6): 427–34.
Nowotny, Helga, Scott, Peter and Gibbons, Michael (2001) Re-Thinking Science. Knowledge and the Public in an Age of Uncertainty. Cambridge: Polity Press.
Power, Michael (1997) The Audit Society. Rituals of Verification. Oxford: Oxford University Press.
Rose, Nikolas and Miller, Peter (1992) ‘Political Power Beyond the State: Problematics of Government’, The British Journal of Sociology, 43(2): 173–205.
Sulkunen, Pekka (2006) ‘Projektiyhteiskunta ja uusi yhteiskuntasopimus’, in Rantala, Kati and Sulkunen, Pekka (eds) Projektiyhteiskunnan kääntöpuolia (The Flip-side of the project society). Helsinki: Gaudeamus.
Sulkunen, Pekka and Törrönen, Jukka (1997) ‘Constructing Speaker Images: The Problem of Enunciation in Discourse Analysis’, Semiotica, 115(1–2): 121–46.
Sykes, Gresham and Matza, David (1957) ‘Techniques of Neutralization: A Theory of Juvenile Delinquency’, American Sociological Review, 22(6): 664–73.
Therborn, Göran (1995) European Modernity and Beyond. The Trajectory of European Societies 1945–2000. London: Sage.
Williams, Rosalind H. (1982) Dream Worlds. Mass Consumption in Late Nineteenth-Century France. Berkeley: University of California Press.
7
From Questions of Methods to
Epistemological Issues: The Case
of Biographical Research
Ann Nilsen
INTRODUCTION

There is a long tradition of research from biographical approaches¹ in sociology. Many and varied studies have been carried out from this perspective over time. My concern in this chapter is however not to outline the history of empirical studies in the field. Rather the intention is to focus on some of the methodological debates that have been prominent during different phases from the 1920s to the present. These discussions are of interest in their own right. They are also important because different methodological perspectives invite focus on different aspects of social reality. Biographical research is also a good case² to highlight important features of a wider methodological debate. Even though this chapter has no ambition of addressing the history of these debates in the social sciences in a broader sense, discussions in biographical research cannot be explored without reference to the wider field of methodological questions.

The history of the shifts in topics for debate in biographical research is set within the wider field of qualitative research. A number of different qualitative methods exist. A focus on biographical research highlights issues that have been important in different phases of the development of these methods. It demonstrates very clearly the main change in discussions; from method and methodological concerns in the early days, to more epistemological and ontological questions that have come to dominate the field from the 1980s onwards. These debates form the parameters between which methodological debates are set and are important for understanding the types of discussions that have dominated many areas of the social sciences, sociological biographical research in particular, over the time period. Thus the different sections in the chapter will highlight debates with reference to ontological/epistemological foundations of methodological discussions that were important in different phases.
In order to set the discussion within the wider context of methodological issues, the starting point here is a brief overview of some main lines of questions and concepts associated with the methodological debates.

TERMS AND CONCEPTS IN METHODOLOGICAL DISCUSSIONS: A BRIEF OVERVIEW

The meaning of the term ‘method’ has changed over time. The earliest book on social research methods: Emile Durkheim’s The Rules of Sociological Methods (1972 [1895]), was not widely known in English-speaking academia until its translation into English in 1938. Durkheim’s objective was to write a text to discuss methods explicitly (Durkheim 1972, p. 19).³ As Platt (1996, p. 252) observes in a discussion about changes in interpretation over time, Durkheim’s work in the English-speaking world came to be associated with method and the kind of multivariate analysis advocated by Lazarsfeld and his colleagues in the 1950s because of the use they made of his writings in their discussions on method as technique (Kaplan 1964).

Methodology is a concept often used synonymously with the term method.⁴ Whereas the term ‘method’ in most cases refers to procedures or techniques for gathering evidence, methodology has a wider meaning. For the current purpose a definition of methodology that highlights the wider field of discussions about methods, and the relationship between method and theory, will be referred to. Kaplan (1964) gives the following definition of methodology: ‘I mean by methodology the study – the description, the explanation, and the justification – of methods, and not the methods themselves’ (p. 18). On the aim of methodology he continues: ‘[…] the aim of methodology is to help us to understand, in the broadest possible terms, not the products of scientific inquiry but the process itself’ (Kaplan 1964, p. 23). Harding (1987), discussing these issues along the same lines, broadens the meaning of methodology even more when she observes that, ‘A methodology is a theory and analysis of how research does or should proceed; it includes accounts of how the general structure of theory finds its application in particular scientific disciplines’ (p. 3). The connection between methods, methodology, epistemology⁵ and ontology is complex and is often debated in writings about method. For Harding this relationship can be thought of as concentric circles where method forms the inner circle and ontology the outer (ibid). This could in some instances be thought to imply that choices of methods bring with them certain methodological and epistemological assumptions. However, in discussions about the quantitative-qualitative divide in social science methods, the claim that choice of method implies certain epistemological underpinnings is but one of several standpoints in the debate (see e.g. Platt 1996; Bryman 2004).

Throughout the history of the social sciences one of the most salient debates in the field of research methods has been that concerning the ‘quantitative-qualitative’ divide. Even though the general understanding of the distinction between the two involves techniques for collecting and analysing data, the boundaries between them are not as clear-cut if aspects of methodology and epistemology are brought to bear on the discussion (Brannen 1995; Bryman 2004). As Platt (1996) points out in writing on the history of methods discussions in America, the terms and concepts for describing methods have changed over time. The quantitative-qualitative divide was described in terms of ‘case studies vs. statistical methods’ before World War II. ‘Survey’ was in this period used to describe a method in studies of whole communities, whereas its modern use is associated with large-scale statistical studies. The term ‘case study’ derived from social workers’ cases that were used by sociologists as data at a time when the boundaries between social work and sociology were not clearly defined (Platt 1996; Levin 2000). Life histories were used synonymously with case studies (Platt 1992, 1996). When the focus shifted in the 1950s from what data was about to the way it was collected, the debates changed and
what was earlier known as case studies was now known as ‘qualitative’ research. In circles where quantitative methods were regarded as the only truly objective methods for collecting objective data, qualitative methods were thought of as useful only in initial stages of a study.

Current discussions about the quantitative-qualitative issue are more open to bridging the divide where data and methods are concerned (Brannen 1995; Bryman 2004). Bryman (2004) points out how different ways of approaching the discussion are decisive of whether multi-strategy research is deemed possible or not. If the divide is seen in terms of methods for collecting and analysing data – the technical version – the gap is easy to bridge. However, if the quantitative-qualitative divide is referred to in terms of different epistemologies, multi-method approaches are not easy to apply. This latter point goes to the heart of the discussion in this paper; different epistemological positions invite different standpoints to what data is, and indeed also whether the very term ‘data’ is considered valid.

BIOGRAPHICAL MATERIAL

One definition of a biographical account is a story told in the present about a person’s experiences of events in the past and her or his expectations for the future (Nilsen 1997). The term ‘biographical material’ does however cover a wide range of empirical evidence: personal letters, diaries, photographs, written autobiographical accounts (life stories) and more (Plummer 1983, 2001; Roberts 2002). In this paper the discussion is focused on research material stemming from interviews.

Life stories come in many varieties and one way of classifying is offered by Plummer (2001)6. He makes a distinction between long and short stories, where the first is the full-length story of one person’s life, and the latter is based on more stories. A further distinction is related to the ‘depth’ of the accounts and is between comprehensive, topical or edited life stories. The comprehensive is a story of one person’s life in depth where the subject’s voice is at the centre. A topical story focuses on one particular issue in persons’ lives and is aimed at researching a particular area of life whereas edited stories leave the researcher’s voice at the forefront. Plummer also makes a distinction referring to researcher ‘interference’ with accounts; naturalistic stories are spontaneously given accounts, researchers have not prompted them. Researched accounts are those that researchers have asked informants to provide, and reflexive stories are those where researchers reflect on their own partaking in the creating and constructing of a story (Plummer 2001, pp. 19–35). In current research there might be a focus on single individuals or groups of individuals such as families (Brannen et al. 2004), or as in the case of Bertaux and Thompson’s study of social mobility in families (1997) and Bourdieu’s study of socially excluded people in France, focusing on issues such as social class and try and map out meaning behind statistics (Bourdieu et al. 1999).

Ways of analysing such material vary with the overall approach taken by the researcher, as well as the purpose for having collected it to start with. When using biographical material other sources of data are inevitably drawn on to map out and understand the different layers of context lives are embedded in (Nilsen and Brannen 2005). In spite of this paper focusing mainly on one perspective in particular, it is nevertheless clear, as the following will demonstrate, that there is no such thing as one correct way of approaching biographical research material, and as the method has evolved into multiple ways of collecting biographical material, methods of analysis have also become many and varied.

An important theoretical influence for the discussion in this paper is the tradition from which biographical research originates: American pragmatism as developed by Peirce and Mead, especially with reference to notions of self and the social world as well as the type of ontological perspective that informed the works of these two (Lewis and Smith 1980). This perspective has been influential in most European approaches to biographical
research, although as the following discussion will highlight, other epistemological standpoints and theoretical approaches have become more prominent over time.

THE MAKING OF A SOCIOLOGICAL METHOD: CHICAGO CA. 1920

One of the most comprehensive sociological studies to date is W. I. Thomas and F. Znaniecki’s The Polish Peasant in Europe and America published in five volumes (1918–20)7. It is a study of Polish migrants in Poland and in Chicago, where they settled upon arrival in the USA. Based on a number of sources of data such as official documents and statistics, it also included personal letters, diaries and one autobiographical account: that of the peasant Wladek. For many reasons the study was not fully recognised as the accomplishment it was until nearly 20 years after it was published. In 1938 the American Sociological Association elected the study as one of the greatest works in sociology. The appraisal proceedings were convened by Herbert Blumer and were published in 1939. The book makes fascinating reading for anyone interested in social science methodology as the discussions in the panel are quoted verbatim.

The debates were set in a time period when positivism8 defined the boundaries of what was to be considered science. Dilemmas discussed at the appraisal proceedings included whether ‘subjective factors’ should play a role in social science research, and if so, how was this to be accomplished? Following from this, another question – that of whether and to what extent ‘human documents’ could be considered reliable sources of data – became central. Wladek’s autobiographical account, the first ever to be used as a sociological source of data, was especially scrutinised with reference to whether or not it could be considered reliable.

The underlying ontological premise in positivistic thinking is that reality is fixed and exists independent of human observation and interpretation9. According to this position, the role of any scientific endeavour is to uncover the basic laws that govern any phenomenon under scrutiny. Thus Thomas and Znaniecki, in keeping with their time, sought through the study of Polish immigrant society in Chicago and in Poland, to uncover ‘laws of social becoming’. Such laws were sought by focusing on the objective (values) and the subjective (attitudes) sides of social life. The insistence on including subjective factors in social analysis was new at the time the study was carried out. It had the potential to undermine one of the basic premises of positivist social science: that objective facts alone could constitute the data studies were to be based upon. Their methodological principle was formulated as follows: The cause of a social or individual phenomenon is never another social or individual phenomenon alone, but always a combination of a social and an individual phenomenon (Blumer 1979 [1939], p. 9). Their standpoint was contrary to positivist social science also in that they did not see physics as the paradigmatic science the social sciences should model themselves on:

[…] while the effect of a physical phenomenon depends exclusively on the objective nature of this phenomenon and can be calculated on the ground of the latter’s empirical content, the effect of a social phenomenon depends in addition on the subjective standpoint taken by the individual or group toward this phenomenon. (Thomas and Znaniecki 1918, p. 38 cited in Blumer 1939, p. 11)

The epistemological basis of pragmatist thought as represented by Peirce and Mead (Lewis and Smith 1980) could be thought of as a form of processual realism in that it does indeed presuppose independent reality, but this reality is not fixed as in positivist thinking. Reality itself changes in time and humans as social beings create reality as a collective activity. In contrast to a constructionist position, which highlights the socially constructed nature of reality and rejects any independent qualities of it, the form of realism found in Peirce and Mead defined itself in contrast to their contemporary variety of constructionism, namely idealism (Lewis and Smith 1980). Drawing this parallel is reasonable because what idealism and
constructionism share is a set of questions starting from epistemological foundations rather than ontology; ‘what can be known’ is the idealist/constructionist epistemological question, whereas ‘what is there’ is a realist ontological approach. Starting enquiry from the former blurs the boundaries between ontology and epistemology in that reality is seen only in terms of knowledge as expression of what is known – language.

The transcript of the conference proceedings following Blumer’s critique demonstrates how the discussion and comments centred very much on the value of subjective data, and to what extent these could be regarded as ‘scientific’. Indeed Blumer himself maintains that the human documents used by Thomas and Znaniecki could not be tested in a scientific way, and that their claim to have developed concepts from the empirical material could consequently not be regarded as valid (Blumer 1939, pp. 109–111). The discussion is interesting for several reasons, not least because some of Blumer’s viewpoints anticipate his later writings. During the discussion he refuses to enter into a debate about validity, and concludes that his viewpoints coincide with those of Thomas and Znaniecki in that he believes the ultimate test of theory is whether it makes sense in relation to the data to which it refers (Blumer 1939, p. 115). This position is contrary to a strict positivist standpoint, where validity criteria are related to a theory’s capability to predict in research beyond a single study10. The discussion in the panel thus demonstrates how The Polish Peasant as an example of empirical research challenged mainstream thought in the social sciences of its day. It did so by emphasising the need for research to include subjective accounts in empirical studies in order to make sense of the social world, and in doing so, it also challenged core discussions not only about questions of method, but also by moving the discussion into the realm of epistemological and methodological issues.

Anglo-American social research from the forties onwards entered into a phase where statistical methods became the dominant way of studying social phenomena (Platt 1996). A positivist way of thinking about social science formed the epistemological basis of these methods11. When new technology made it feasible to handle large quantities of data within a shorter time-span, it became easier to focus on sophisticated statistical techniques for analysing data rather than questioning the validity of the data itself, or discussing design of studies or ontological and epistemological foundations for social research more generally. The phase when questions of method as technique (Kaplan 1964) were predominant lasted well into the seventies. As Blumer remarks in his foreword to the 1979 edition of the Appraisal of The Polish Peasant,

It is believed today that generalizations are to be sought and that analyses are to be made in the form of relations and correlations between ‘objective’ variables. Further, even when sociological scholars are sensitive to so-called subjective factors, they are highly unlikely to rely on letters and life histories to catch such factors. (Blumer 1979, p. xi)

The situation was however not as bleak throughout the whole period as this suggests. Herbert Blumer’s own work is but one example of alternative ways of thinking about social science and questions of method. In 1954 he published an article called ‘What is wrong with social theory?’ where he raised issues that had been touched upon in the Appraisal procedures. Some of the questions he did not want to explore during that discussion were developed in this paper. He sought to draw boundaries between the social sciences and the natural sciences by examining notions of theoretical concepts in both. For the social sciences to develop on their own terms and in order to free themselves from the paradigmatic status that classical physics still enjoyed, he suggested that concepts in the social sciences be termed and treated as sensitising concepts in contrast to the definite concepts characteristic of the natural sciences (Blumer 1954). The former are theoretical concepts that indicate a direction in which to look, rather than concepts with strict definitions that tell you precisely what to look for, which is what definite concepts do.
Throughout his career Blumer sought to challenge ‘variable sociology’ by doing empirical studies that were in keeping with his notions that sociology was to be centred on studies of social interaction12. His naming of symbolic interactionism as a strand of sociology that developed the heritage from G.H. Mead’s social behaviourism is evidence of this. Blumer’s symbolic interactionism has been very influential for the development of biographical research, not least in the work of Norman Denzin. This will, however, be the topic of a later section.

Other alternative strands of thought existed, also in American sociology. The most radical critique of the situation in the social sciences came from a scholar who by many was regarded as an outsider but who nevertheless made his mark in a distinctive way.

POSITIVISM CHALLENGED: THE SOCIOLOGICAL IMAGINATION

C. Wright Mills published The Sociological Imagination in 1959, three years before his death in 1962. The ideas presented in this book were developed throughout his career as an empirical researcher and a critic of much of his contemporary researchers’ work. He was especially critical of the dominance of what he on the one hand called ‘The Theory’ which referred to a tendency to seek explanations for social phenomena in large bodies of thought known as ‘Grand Theory’ of the Parsonian variety, and on the other hand what he called ‘The Method’: statistical techniques for analysing huge datasets. The most prominent advocate of the latter was one of Mills’ earlier superiors, Paul Lazarsfeld. Mills’ critique was grounded in an alternative vision of what sociology was to be about. The historical period known as the cold war did not take kindly to a politically radical figure such as Mills. However, he was a productive empirical researcher and studies such as White Collar and The Power Elite received wide acclaim.

Even though Mills himself did not carry out biographical research in the tradition of Thomas and Znaniecki, his thoughts on what empirical material sociology should concern itself with, emphasised the value of such data. In early writings he outlined thoughts about methodological issues (Mills 1940) that were later presented more extensively in The Sociological Imagination. Mills’ work moved the discussion from method as technique, to the realms of methodology and epistemology. Very much influenced by American pragmatist thought and also by Karl Mannheim’s sociology of knowledge, his views on social reality coincided with those of Mead in that he thought of the self as in process in social contexts that were also in continual development, hence his insistence on the proper subject for sociological study to be the intersection between history and biography. Only in studying the actions, thoughts and feelings of individuals and contextualising them in particular moments in history, can sociology fulfil its potential:

[The sociological imagination] is the capacity to range from the most impersonal and remote transformations to the most intimate features of the human self – and to see the relations between the two. Back of its use there is always the urge to know the social and historical meaning of the individual in the society and in the period in which he has his quality and his being. (Mills 1980 [1959], p. 14)

Evident in this are his notions of theory that were closely linked to his thoughts on methodology and his epistemological and ontological beliefs. As to the latter he can be characterised as a realist of the variety found in the pragmatism of Peirce and Mead meaning that he thought of social reality as existing beyond human interpretation; yet interpretation (what Thomas and Znaniecki termed the subjective side of social reality or attitudes) was an inescapable part of empirical data. His processual and double-natured view of social reality lay at the heart of his vision of what the sociological imagination was and what role sociology had in society. It also informed his thoughts on data and methods for analysing them: his methodological viewpoints. In the appendix to The Sociological Imagination he outlines in much detail how social science studies
can be carried out in order to collect and produce empirical material and how to analyse it in ways that shed light on the crucial questions of a particular period in history; identifying how private troubles and public issues are interconnected for people living in particular places at some defined period in history (ibid). Empirical material includes biographical accounts as told and interpreted by individuals themselves, in addition to information of a more factual kind; records and facts about life courses in general and about the society in which the individual lives unfold. He did in other words advocate the use of data from many different sources in order to understand the layers of context that people’s lives are embedded in.

Mills’ writings did not result in any revival of biographical research in his time. It took nearly two decades after the publication of The Sociological Imagination for this research tradition to re-emerge, this time in Europe.

METHOD DISCUSSIONS: CHANGES IN APPROACHES TO THE QUANTITATIVE-QUALITATIVE DIVIDE

As positivism came under close scrutiny and critique from philosophers and social scientists alike throughout the 1960s and 70s, mainstream social science debates were still stuck within the parameters of discussion defined by positivist notions of science: those of methods as techniques. Questions about validity and reliability of data, of generalisations and representativeness were argued over across the borders between qualitative and quantitative research.

The publishing of The Discovery of Grounded Theory (Glaser and Strauss 1967) was important for the development of qualitative research in its own right. The main thesis in this book challenged contemporary notions of theory and method both. Where surveys were analysed to test hypotheses based on theoretical assumptions formulated beforehand, grounded theory suggested a way of carrying out research and analysis starting from data and building concepts and theories from the ground up13. Qualitative data, such as observation and interviews, was the main source of empirical evidence in this approach. It thus helped to develop a logic of method that was said to be particular to qualitative analysis14.

Another approach that emerged in the 1970s was life course research, a quantitative way of analysing data with special attention to life course events seen in light of cohorts and historical periods. Age is of special relevance as social institutions in most societies are organised such that cohorts go through the same events at roughly the same chronological age, for instance the system of education (Elder 1974; Riley 1988; Giele and Elder 1998). This perspective, which is quantitative and owes much to both demographic studies and more macro-oriented social research as found in the classic texts as well as to Mills’ approach to sociology, has been influential also in qualitative approaches in that both see temporal aspects of social processes, and the link between macro and micro, as central to social research (Giele and Elder 1998). Methodologically, life course research with its large datasets that can span generations of individuals is oriented towards debates on statistical analyses and methods as technique. Following Bryman’s (2004) distinction between an approach to the quantitative-qualitative divide as one based on data and methods on the one hand, and the more epistemologically founded one on the other, quantitative life course research and qualitative biographical approaches can easily be combined if the former stance is taken. However, as will be seen in the following, this combination of data is not possible with all types of approaches to biographical material.

THE REVIVAL OF BIOGRAPHICAL APPROACHES

Oral history had by the early 1970s emerged as a tradition to be reckoned with in history (Thompson 1978). Biographical accounts played an important role in this research, and debates in this field to start with often
centred on whether or not such data could be considered reliable sources of knowledge for historians; whether people’s recollections of the past could be considered accurate enough for this to qualify as scientific data, and the retrospective element in such interviews was scrutinised (see, e.g. Gittins 1979). What distinguished oral history from the work of Thomas and Znaniecki was first and foremost the use of interviews. Wladek’s autobiography was a written account, and therefore not the result of a life history interview15.

The most important phase in biographical research started in the late 1970s with the work of Daniel Bertaux in France. The publication of a collection of papers from the first ad hoc workshop on biographical research at the World Congress in Sociology in 1978 in Uppsala, marks a revival of the interest for biographical research in sociology. The book, entitled Biography and Society. The Life History Approach in the Social Sciences contains papers that cover a broad spectrum of topics. Questions arising from written and oral biographical accounts were looked into, and perspectives from the social sciences and humanities were drawn on to explore them. Common to all papers in this volume is a concern with time, and life lived and interpreted in time. Questions about generalisations and representativeness are also explored but unlike earlier discussions they are set within a wider frame of understanding than a mere positivistic frame of reference16.

Another influential work from the early days is Ken Plummer’s Documents of Life (1983). In contrast to the volume edited by Bertaux, this book is a monograph that sets the biographical tradition within the frame of Chicago sociology in general and symbolic interactionism in particular. This book has also become a classic in biographical research because it was a first attempt to map the history of this particular sociological research tradition. Plummer’s epistemological perspective in this book is realist, and he pays much attention to interviewing and analysis of interviews in order to grasp the meaning inherent in biographical accounts. This perspective is a contrast to his publication of a follow-up volume of this book published in 2001. By then what could be called ‘the linguistic turn’ had taken hold in the social sciences, and most discussions related to methodology had taken on a new shape.

Epistemologically the discussions during the first revival phase of biographical research were carried out from a realist ontological position, i.e. underlying the debates was the notion that biographical material was able to give access to some form of truth about social life. When accounts were questioned it was from a perspective of reliability at a methodological level, whether people’s stories could be relied upon; notions of truth itself were not the object of debate during this phase.

‘THE LINGUISTIC TURN’: POST-MODERNISM AND POST-STRUCTURALISM

In Europe hermeneutical approaches have become prominent in discussions that highlight differences between the humanities and the natural sciences17. Husserl’s phenomenology was important for the development of the Heideggerian hermeneutics, but has also been influential in its own right in the social sciences, not least through Garfinkel’s ethnomethodology which was developed in the intersection between Parsonian thought and A. Schutz’s expanding of Husserl’s work (Heritage 1984). Hermeneutics started out as a method to examine texts, and to try and read texts as part of the context they originated in – the hermeneutical circle. As this perspective gained more ground in social science methods debates, aspects of language and narrative structure in biographical accounts were highlighted.

Another important influence for this shift came from linguistics. As the structural linguistics of Lévi-Strauss was criticised by Foucault and Derrida, the grounds were laid for post-structuralism in language theory and social theory. But:

Despite their differences, structuralism and post-structuralism both contributed to the general
displacement of the social in favour of culture viewed as linguistic and representational. Social categories were to be imagined not as preceding consciousness or culture or language, but as depending upon them. Social categories only came into being through their expressions or representations. (Bonnell and Hunt 1999, p. 9)

The semiotics of Roland Barthes, Foucault’s critique of power and Lyotard’s critique of ‘grand narratives’ were all influential for the direction social science research took throughout the 1980s. Methodological questions were replaced by epistemological debates; and these centred on whether there was reality beyond language.

When influence from the humanities became more pronounced throughout the 80s, a shift of focus also occurred in biographical research. From having been concerned with analyses of life stories and biographical accounts as empirical evidence of lived life, gradually more attention was given to the narrative itself, to the told life and to the different phases of interpretation of a biography. Questions about the role of the researcher in the production of the biographical account, whether this had originated as a written autobiography or was the outcome of an interview between an informant and a researcher, became important. Demands that the researcher be self-reflective in the writing up of biographical research material were frequently heard, and in many instances the biographical experiences of the researcher and his or her reactions to the story told by the informants, became topics of interest (Iles 1992). This shift also marked a change in epistemological focus towards a more constructionist standpoint which implies a line of questioning that is premised on knowledge about reality as reality (Lewis and Smith 1980). A belief that reality is a human construction alone can lead to extreme relativism in the approach to any research material. A blurring of the boundaries between fact and fiction, between truth and non-truth, between the factual and the non-factual, implies a very different approach to biographical research from that of the classic studies.

Norman Denzin was one of the most prominent advocates of a shift in biographical research towards narrative approaches and a focus on language. A former student of Blumer’s, he changed the term used for his perspective from symbolic interactionism to interpretive interactionism (Denzin 1989a, 1989b).

The term ‘interpretive interactionism […] signifies an attempt to join traditional symbolic interactionist thought with participant observation and ethnographic research, semiotics and fieldwork, post-modern ethnographic research, naturalistic studies, creative interviewing, the case study method, the interpretive, hermeneutic, phenomenological works of Heidegger and Gadamer, the cultural studies approach of Hall, and recent feminist critiques of positivism. (Denzin 1989a, pp. 7–8)18

From this quote it becomes clear that biographical research epistemologically founded in realist pragmatist thought was no longer centre stage. A blending of many different – and in some instances incompatible – research approaches opened a wider field for biographical research, and also invited collaboration across disciplinary boundaries in ways that had earlier not been common. This was especially true in feminist biographical research19.

Denzin’s changed approach is symptomatic of the debates that occurred in biographical research during this period. From discussions about whether individuals’ accounts could be regarded as reliable in the sense of people telling the truth about their lives, the interest was gradually shifted towards debates on ontological and epistemological issues (Nilsen 1994, 1996). In many instances the underlying epistemological notions were not taken up explicitly but informed research design and choices of methods for data collection and analysis in empirical studies.

In Chicago during the 20s a processual notion of the self as developed in the pragmatist thought of Peirce and Mead, underpinned Thomas and Znaniecki’s research. A notion of self, and of life, as lived in time with access to memories of experiences in the past and the willingness and ability to recount these in some present, is central in classical
biographical research20. In order for this approach to have merit, some form of realist epistemological position must bear upon the theoretical and methodological perspectives employed in research. Experiences21 cannot be recalled if there is no such thing as reality beyond language. Indeed, a strong constructionist position seems to annihilate the notion of time as process and leaves only a present with no relation to past or future as discourse and language replace time and material practice. Where there was earlier a concern with time as process and self as developing in social relationships that changed over time, more attention has been paid to the concept of identity, also in biographical research.

Identity was earlier discussed in relation to development and particularly with reference to the life course phase of youth (Erikson 1980 [1959]). The epistemological shift towards constructionist approaches introduced terms such as ‘fragmented identities’ and identities as matters of choice (Giddens 1991; Plummer 2001). Such notions are more spatial than temporal since identities in this sense bear no relation to development in time but can be regarded as constructed in discourse and markers of life style rather than being related to the development over life course phases (Brannen and Nilsen 2005). Where Erikson saw identity as part of a wider notion of self, identity has in many instances replaced the notion of self as ‘selves’ are thought of in terms of being constructed in discursive fields rather than developed in social relationships (Bonnell and Hunt 1999, p. 22).

the social sciences and the humanities has increased over the past decade, and cross-disciplinary studies have been encouraged22, the debates over biographical and other methods of performing research are many and varied. The influence from hermeneutics and methodological approaches originating in humanistic disciplines, together with the epistemological shift towards constructionist/interpretive perspectives, has led some to subsume biographical material under the term interpretive approaches23. In doing so the story as a told story is put at the forefront of attention. It goes without saying that biographical accounts are told stories. However, whether one believes there is a reality beyond the account and hence some factual experiences informants talk about and make these part of the analysis, is an important distinction between a constructionist and a realist approach. In order to overcome the divide created by the epistemological debates, and for social science in general and biographical research in particular, to maintain its critical potential, a return to agency as a key sociological notion, is by some held as crucial (Bonnell and Hunt 1999; Chamberlayne et al. 2000).

Exploring the way people talk about their lives is important for many reasons. Understanding narrative structure can add immensely to the overall understanding of a biographical account, not only in terms of language used, but also with reference to the social positioning of individuals in society (Reissman 1991; Nilsen 1996). Moreover, it can also give insight into and draw attention to the silences in biographical accounts, and thus make visible the taken-for-granted aspects of people’s lives that are more
METHODOLOGY DISCUSSIONS often than not structurally founded and thus
BEYOND THE important for understanding the informant
QUANTITATIVE-QUALITATIVE, THE in the context that the life unfolds within.
POSITIVIST-INTERPRETIVE AND THE In cross-national comparative research this
REALIST-CONSTRUCTIONIST DIVIDES? aspect of biographical accounts is particularly
important (Nilsen and Brannen 2002; Brannen
Biographical research currently sweeps a and Nilsen 2005).
wide array of approaches and perspectives. As However, approaching biographical
the blurring of boundaries between disciplines accounts from this perspective alone can
within the social sciences and between render the more material structural contexts
that surround and inform the content of the story an individual has to tell less important. It therefore seems significant for biographical research to be equally aware of the questions that were raised early in its history as those that are currently in vogue.
The ontological and epistemological foundations of the ‘cultural turn’ make it difficult to envisage a social science that can produce convincing evidence of, for instance, social disparities between groups of people (Nilsen 1994; Bonnell and Hunt 1999). If the notion of culture replaces that of social structure, and individual narratives about lives become the most important objects of analysis rather than lived experiences as expressions of social and collective being, the question of whether there is a place for social science research that highlights power and systematic differences and inequalities between people may rightfully be posed. Whether there will indeed be room for the potential of social science to provide critical analyses of trends and development at different levels of society is another question that can be asked. As Chamberlayne et al. point out in a critique of cultural studies without agency, ‘“Cultural sociology” rather than “cultural studies” is what is needed’ (p. 9).
To illustrate some implications of these questions a current strand of thought may be taken as an example. It also highlights the importance of discussing methods in relation to theoretical perspectives and ideas that address themselves to particular topics in social research.
The individualisation thesis as formulated by Beck (1992) and Beck and Beck-Gernsheim (1995) is informed by a life course perspective and a biographical approach. Arguing from a life course perspective Beck and Beck-Gernsheim (1995) maintain that a ‘standard biography’ is being replaced by a ‘choice biography’, and that life course phases no longer follow the same pattern they used to since structural characteristics such as age, gender and social class are not as significant for shaping individuals’ lives as they once were. An individualisation is said to take place, where people are forced to make choices to a much larger extent than in ‘high modernity’. Individual choices become centre stage and the characteristics that form opportunity structures that make for systematic disparities in individuals’ life chances are not recognised as such. If social scientists carry out empirical biographical research with this type of theoretical back cloth as the main conceptual apparatus, analyses are taken to a level of abstractions where indeed discourse and narratives are more meaningful starting points than the intersection of history and biography. For the latter to be included in studies, attention to the complex and many layered contexts that people’s lives are embedded is needed. Empirical research has challenged the individualisation thesis on many fronts, especially the fact that it is not sensitive to variation but rather works as another ‘Grand Narrative’ that shapes the outlook on life rather than tells a sociological story about social diversity and inequality (Nilsen and Brannen 2002; Brannen and Nilsen 2005).
For biographical research it is especially important that the tradition which sets the stories informants tell into a multi-layered social framework rather than merely analysing them from a discourse and narrative approach, is upheld. As Daniel Bertaux observes in a paper that highlights biographical research as a tool for comparative analysis,
Whenever [life stories] are used for probing subjectivities, life story interviews prove able to probe deep; perhaps because it is much easier to lie about one’s opinions, values and even behaviour than about one’s own life. […] it takes a sociological eye – some lay persons do possess it – to look through a particular experience and understand what is universal in it; to perceive, beyond described actions and interactions, the implicit sets of rules and norms, the underlying situations, processes and contradictions that have both made actions and interactions possible and that have shaped them in specific ways. It takes some training to hear, behind the solo of a human voice, the music of society and culture in the background. This music is all the more audible if, in conducting the interview, in asking the very first question, in choosing, even earlier, the right persons for interviewing, one has worked with sociological issues and riddles in mind. (Bertaux 1990, pp. 167–168)
This quote echoes Mills’ visions for sociology – what it should be about and the role of sociologists in society. However, it also draws attention to some of the debates that biographical research initiated by the work of Thomas and Znaniecki; can life stories be relied upon? To what extent can this type of material hope to be seen as representative of more than the individual story? Far from being dismissed as mere ‘positivist’ lines of questioning, such issues are real and are routinely faced by researchers working within this tradition.
The paradox is that both the positivist and interpretive sides of the divide question the validity of biographical research founded on a realist pragmatic starting point. From an extreme interpretive side of the divide debates about representativeness are easily rejected as irrelevant since they are considered positivist. Extreme positivism on the other hand would question biographical material because it does not qualify as objective data. This chapter has thus argued that a third position needs focusing on. In order to map out this third position the case has been made for a closer look into the ontological and epistemological standpoints that underpin methodological debates within biographical research. The parameters for the discussion have been the starting point in debates about ‘method as technique’ that highlighted the quantitative-qualitative divide, to the current situation that focuses on epistemological questions and discussions across the boundaries of a realist-constructionist divide.
NOTES
1 Definitions of biographical research will be discussed in a later section. In this chapter the focus will be on overall debates within this field, thus variations in traditions for making use of this perspective will not be the focus here.
2 The terms ‘case’ and ‘case studies’ are referred to in different ways in current sociology. For an overview of themes and topics in debates over case studies, see Gomm et al. 2000 and Yin 2003.
3 In Durkheim’s original text the use of the term ‘method’ also encompasses what is being referred to here as methodology.
4 As pointed out by Platt (1996) the English terms for method and methodology create problems when used as adjectives; both are referred to as ‘methodological’. This chapter is concerned with methodology in the wider sense, not with method in the strict sense of ‘technique’ or ‘procedure’ for studying the social world.
5 As Kaplan (1964) observes, the term ‘methodology’ is often used synonymously with epistemology by philosophers (p. 20). The definition of epistemology referred to in the context of this chapter is ‘theory of knowledge’.
6 For other ways of classifying, see Miller 2000; Roberts 2002.
7 This study was carried out in Chicago where sociology was still very much influenced by American pragmatism. For a further discussion of this see Nilsen, American Pragmatism and Biographical Research (work in progress).
8 See, e.g. Kaplan (1964) for a detailed discussion of different forms of positivism and their relevance for social science studies. Platt (1996) also gives a detailed account of different interpretations of positivism in relation to ‘scientism’: ‘Its meaning overlaps with that now attached to ‘positivism’. It is associated with a commitment to making social science like natural science, and thus with themes such as empiricism, objectivity, observability, operationalism, behaviourism, value neutrality, measurement and quantification’ (Platt 1996, pp. 67–68).
9 This ontological position is in Lewis and Smith’s (1980) terms a ‘materialist social nominalism’ (p. 8).
10 Theory in a strong positivist sense is aimed at building laws through hypothesis testing over time.
11 It should be kept in mind here that the situation in Hitler’s extended Germany was one where positivist ways of doing social science were actually the most effective way to challenge racist beliefs that underpinned the Third Reich’s ideology, and social scientists who advocated such research were persecuted and had to flee the country if they could. Paul Lazarsfeld was but one of these scientists who fled to the USA. The direct impact of the ‘Vienna Circle’ on the development of American and also European social science methods is, however, one that must be seen in view of other simultaneous tendencies within American social science itself (see Platt 1996 for a detailed discussion of this topic).
12 The difference between Blumer and Mead on approach to method, where the latter saw no problems in combining qualitative and quantitative methods, is pointed out by Deegan 2001. Blumer’s approach must be seen in view of the contemporary time of his writing, where the quantitative-qualitative divide was much more prominent than in Mead’s time.
13 In one sense Glaser and Strauss took Blumer’s notion of ‘sensitising concepts’ and developed it in a direction that ‘operationalised’ how to go about making use of sensitising concepts in actual empirical studies.
14 Grounded theory has been criticised for being too positivist and quantitative in its approach to data and method (see, e.g. Christensen et al. 1998). However, at the time it was published it represented a more radical approach than what it is thought of today.
15 This is not to say that life history interviews had not been conducted before the 1960s; in psychology there was much interest in biographical interviewing. However, an account of this falls outside the scope of this paper.
16 See in particular papers by Bertaux, Ferrarotti, Kohli and Thompson in the book.
17 Drawing on Dilthey’s notions of understanding meaning in context and Heidegger’s development of his ideas in Being and Time, Heidegger’s student Gadamer published Truth and Method in 1960 (Gadamer 1989), which has since become a standard reference within hermeneutical approaches. These works are mainly concerned with the interpretation of texts and were subjects for the humanities rather than the social sciences to start with. This was to change as post-structuralism and post-modernism gained more ground in the social sciences in the 1980s.
18 References in Denzin’s text are not included in this quote.
19 See Teresa Iles (1992) for an example of publications from meetings across disciplinary boundaries. Stanley (1992) also voices the need for more cross-disciplinary research in feminist biographical studies.
20 Pragmatist thought does not rest on a notion about truth as fixed, and thus a possibility to arrive at some final account of life. Events and individuals’ experiences of them are recalled at different points in time which can make factual events take on different meanings in a personal life as time passes. This does, however, not mean events did not happen, or did not happen that particular way, rather that they are seen and interpreted in different ways depending on the present a story is told in and the context the interview takes place in (Nilsen 1996). The interview itself and the relationship between the interviewer and the informant, are also decisive of what aspects of factual events informants relate in their accounts. It is important to note here that this way of approaching interpretation does not imply a rejection of something ‘true’ and ‘factual’ in events, in personal lives as well as in historical and structural terms.
21 The notion of experience, for the very reasons mentioned here, came under debate and questions about experience itself were asked. It was not the ‘truth’ of people’s accounts of experiences that was called into question, but the ontological foundation that the notion of experience rests on; whether there is independent reality.
22 This drift towards interdisciplinarity has its critics. As Bonnell and Hunt (1999, p. 14) observe: ‘Dialogue among the disciplines depends in part on a strong sense of their differences from each other: exchange is not needed if everything is the same; interdisciplinarity can only work if there are in fact disciplinary differences’.
23 See, e.g. Plummer 2001.
REFERENCES
Beck, Ulrich 1992. The Risk Society. London: Sage.
Beck, Ulrich and Elisabeth Beck-Gernsheim 1995. The Normal Chaos of Love. Cambridge: Polity Press.
Bertaux, Daniel (ed.) 1981. Biography and Society. London: Sage.
Bertaux, Daniel 1990. ‘Oral History Approaches to an International Social Movement’ in Öyen E. (ed.) Comparative Methodology. London: Sage.
Bertaux, Daniel and Paul Thompson 1997. Pathways to Social Class: A Qualitative Approach to Social Mobility. Oxford: Clarendon Press.
Blumer, Herbert 1979 [1939]. An Appraisal of Thomas and Znaniecki’s ‘The Polish Peasant in Europe and America’. New Brunswick: Transaction Books.
Blumer, Herbert 1954. ‘What is Wrong with Social Theory’, American Sociological Review 19, 3–10.
Bonnell, Victoria E. and Lynn Hunt 1999. ‘Introduction’ in Bonnell V. and Hunt L. (eds) Beyond the Cultural Turn. Berkeley: University of California Press.
Bourdieu, Pierre et al. 1999. The Weight of the World: Social Suffering in Contemporary Society. Cambridge: Polity Press.
Brannen, Julia 1995. Mixing Methods. Qualitative and Quantitative Research. Aldershot: Avebury.
Brannen, Julia, Peter Moss and Ann Mooney 2004. Working and Caring over the Twentieth Century. Change and Continuity in Four-Generation Families. Basingstoke: Palgrave Macmillan.
Brannen, Julia and Ann Nilsen 2005. ‘Individualisation, Choice and Structure: A Discussion of Current Trends in Sociological Analysis’, The Sociological Review 53(3), 412–428.
Bryman, Alan 2004. Social Research Methods. Oxford: Oxford University Press.
Chamberlayne, Prue, Joanna Bornat and Tom Wengraf 2000. ‘Introduction’ in Chamberlayne et al. (eds) The Turn to Biographical Methods in Social Science: Comparative Issues and Examples. London: Routledge.
Christensen, Karen, Else Jerdal, Atle Møen, Per Solvang and Liv J. Syltevik 1998. Prosess og metode (Process and Method). Oslo: Universitetsforlaget.
Deegan, Mary Jo 2001. ‘Introduction: George Herbert Mead’s First Book’ in Mead, George Herbert (ed.) Essays in Social Psychology. New Brunswick: Transaction Publishers.
Denzin, Norman 1989a. Interpretive Interactionism. London: Sage.
Denzin, Norman 1989b. Interpretive Biography. London: Sage.
Durkheim, Emile 1972 [1895]. Den sociologiske metode (Rules of Sociological Methods). København: Fremad.
Elder, Glen 1974. Children of the Great Depression: Social Change in Life Experience. Chicago: University of Chicago Press.
Erikson, Erik 1980 [1959]. Identity and the Life Cycle. New York: Norton.
Gadamer, Hans-Georg 1989. Truth and Method. London: Sheed and Ward.
Giddens, Anthony 1991. Modernity and Self-Identity. Cambridge: Polity Press.
Giele, Janet and Glen Elder 1998. ‘Life Course Research. Development of a Field’ in Giele J. and Elder G. (eds) Methods of Life Course Research. Qualitative and Quantitative Approaches. London: Sage.
Gittins, Diana 1979. ‘Oral History, Reliability, and Recollection’ in Moss and Goldstein (eds) The Recall Method in Social Survey. University of London Institute of Education: Studies in Education 9.
Glaser, Barney G. and Anselm L. Strauss 1967. The Discovery of Grounded Theory: Strategies for Qualitative Research. Chicago: Aldine.
Gomm, Roger, Martyn Hammersley and Peter Foster (eds) 2000. Case Study Method. London: Sage.
Harding, Sandra 1987. ‘Introduction: Is There a Feminist Method’ in Harding S. (ed.) Feminism and Methodology. Milton Keynes: Open University Press.
Heritage, John 1984. Garfinkel and Ethnomethodology. Cambridge: Polity Press.
Iles, Teresa (ed.) 1992. Biography. All Sides of the Subject. London: Pergamon Press.
Kaplan, Abraham 1964. The Conduct of Inquiry. Methodology for Behavioral Science. Scranton: Chandler Publishing Company.
Levin, Irene 2000. ‘Forholdet mellom sosiologi og sosialt arbeid’ (The relationship between sociology and social work), Sosiologisk tidsskrift (Journal of Sociology) 8(1), 61–71.
Lewis, David and Richard Smith 1980. American Sociology and Pragmatism. Mead, Chicago Sociology and Symbolic Interactionism. Chicago: The University of Chicago Press.
Miller, Robert 2000. Researching Life Stories and Family Histories. London: Sage.
Mills, C. Wright 1940. ‘Methodological Consequences of the Sociology of Knowledge’, American Journal of Sociology 46(3), 316–330.
Mills, C. Wright 1980 [1959]. The Sociological Imagination. London: Penguin Books.
Nilsen, Ann 1994. ‘Life Stories in Context. A Discussion of the Linguistic Turn in Contemporary Sociological Life Story Research’, Sosiologisk Tidsskrift 2(2), 139–153.
Nilsen, Ann 1996. ‘Stories of Life – Stories of Living. Women’s Narratives and Feminist Biographies in NORA’, Nordic Journal of Women’s Studies 1(4), 16–31.
Nilsen, Ann 1997. ‘Great Expectations? Exploring Men’s Biographies in Late Modernity’ in Grønmo, Sigmund and Bjørn Henrichsen (eds) Society, University and World Community. Essays for Ørjar Øyen, pp. 111–135. Oslo: Scandinavian University Press.
Nilsen, Ann and Julia Brannen 2002. ‘Theorising the Individual-Structure Dynamic’ in Brannen et al. (eds) Young Europeans, Work and Family: Futures in Transition, pp. 30–48. London: Routledge.
Nilsen, Ann and Julia Brannen 2005. Consolidated Interview Report from the Transitions Research Project for the EU Framework 5 funded study Gender, parenthood and the changing European workplace, printed by the Manchester Metropolitan University: Research Institute for Health and Social Change.
Nilsen, Ann (forthcoming) American Pragmatism and Biographical Research (work in progress).
Platt, Jennifer 1992. ‘“Case Study” in American Methodological Thought’, Current Sociology 40(1), 17–48.
Platt, Jennifer 1996. A History of Sociological Research Methods in America. 1920-1960. Cambridge: Cambridge University Press.
Plummer, Ken 1983. Documents of Life. An Introduction to the Problems and Literature of a Humanistic Method. London: Allen & Unwin.
Plummer, Ken 2001. Documents of Life 2. An Invitation to a Critical Humanism. London: Sage.
Reissman, Catherine Kohler 1991. ‘When Gender is Not Enough. Women Interviewing Women’ in Lorber, Judith and Susan Farrell (eds) The Social Construction of Gender. London: Sage.
Riley, Mathilda W. (ed.) 1988. Social Structures and Human Lives. London: Sage.
Roberts, Brian 2002. Biographical Research. Buckingham: Open University Press.
Stanley, Liz 1992. The Autobiographical I. Manchester: Manchester University Press.
Thomas, William I. and Florian Znaniecki [1918–20] 1927. The Polish Peasant in Europe and America. New York: Knopf.
Thompson, Paul 1978. The Voice of the Past, Oral History. Oxford: Oxford University Press.
Yin, Robert 2003. Case Study Research. Design and Methods. Thousand Oaks: Sage.
8
Research Ethics in Social Science
Celia B. Fisher and Andrea E. Anushko
Unparalleled growth in the social and behavioral sciences in the last half of the twentieth century has made and will continue to make significant contributions to society’s understanding of persons as individuals, as members of familial and non-familial social groups, and as participants within cultural, social, economic and political macrosystems. Increased public recognition of the value of social research has been accompanied by heightened sensitivity to the obligation to conduct social science responsibly. The formidable task of ensuring ethical competence in social research depends upon sensitive and informed planning by ethically informed scientists and careful review by nationally mandated or independent Institutional Review Boards (IRBs) or Research Ethics Committees (RECs). The broad language of national and international regulations and the diversity of expertise and wide latitude in decision-making given to IRBs are often intimidating to social scientists who are required to apply for IRB approval as a condition of conducting their research. Social scientists are additionally challenged because of the historical and biomedical bias in the language and scope of regulations governing IRBs in the United States and RECs in Europe, Latin America, India, Thailand, and Africa among other developing countries (e.g. Council for International Organizations of Medical Sciences, 2002; Indian Council of Medical Research, 2000; National Consensus Conference on Bioethics and Health Research in Uganda, 1997; National Research Council, 2003; Thailand Ministry of Public Health Ethics Committee, 1995; World Medical Association, 2000).
A BRIEF HISTORY OF RESEARCH ETHICS RULES AND REGULATIONS
Biomedical research ethics have a long history formally beginning with the Nuremberg Code (1946), the international response to the atrocities committed in Nazi medical experimentation. However, because the acts committed by the Nazi scientists seemed so far removed from standard medical and social research, the Nuremberg Code had little influence on medical or social science research (Steinbock et al., 2005). Biomedical research ethics continued to evolve slowly in the United States and abroad (Declaration of Helsinki, 1964). In the United States it was not until the 1970s that revelations of subjects’ abuse in the now infamous Tuskegee Syphilis Study (Heller, 1972; Jones, 1993) prompted
U.S. Public Law 93-348 to call for the establishment of the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. The National Commission published recommendations, known as the Belmont Report (DHEW, 1978), that served as the basis for revised federal regulations published in the Federal Register in 1979 with continued revisions through 2001 (DHHS, Code of Federal Regulations Title 45-Part 46 Protection of Human Subjects 45 CFR 46, 2001). At the same time the Council for International Organizations of Medical Sciences (CIOMS) in association with the World Health Organization (WHO) set out to develop guidelines that applied the principles of the Declaration of Helsinki to the conduct of biomedical research, particularly in developing countries. The final product was the 1982 Proposed International Ethical Guidelines for Biomedical Research Involving Human Subjects. Since 1982 two revisions have been made to the CIOMS guidelines: one in 1993 and the most recent in 2002. National and international guidelines base research ethics regulation on three general ethical principles: (1) Beneficence: the obligation to maximize research benefits and minimize research harms; (2) Respect: the responsibility to ensure that research participation is informed, rational, and voluntary; and (3) Justice: the obligation to ensure the fair distribution of research benefits and burdens across populations. While the conceptual and practical frameworks for research ethics in their present form are rooted in and largely dominated by Western culture (Ogundiran, 2004), these principles have retained their fundamental value in guiding the ethical conduct of contemporary research in the West and increasingly in developing countries. In Africa, for example, the Pan African Bioethics Initiative (PABIN) was established in 2001 to foster the development of research ethics with special emphasis on the need to develop the capacity for reviewing the ethics of research conducted in Africa by nationals and internationals (see: http://www.pabin.net/en/index.asp).
Ethics in the social sciences
Problems identified in social science research did not produce the serious harms observed in medical studies during the period of national and international biomedical research regulations. Indeed, in the United States, for example, prior to the National Commission’s report, social science researchers rarely sought informed consent even when punishing stimuli were part of the research design, and the use of deception and invasion of privacy was commonplace (Sieber, 1992). This is somewhat surprising since the American Psychological Association (APA) adopted its first ethics code covering research, teaching, and practice in 1953 (APA, 1953) and the American Anthropological Association officially approved their Statement on Problems of Anthropological Research and Ethics in 1967 (Nolan, 2002). One reason for the lack of ethical awareness within social science research at that time might have been the broad aspirational language of the codes. For example, it was not until 1992 that specifically worded operational standards of conduct for research, teaching and professional practice were included in the APA Ethics Code (Canter et al., 1994) and this model was then adapted by other social sciences including the American Sociological Association and the Canadian Psychological Association (ASA, 1999; CPA, 2000). The most recent revision of the APA Ethics Code (APA, 2003) includes a more protective standard on deception research, prohibiting such research if it leads to pain or substantial stress or discomfort and requiring investigators to respect a participant’s request to withdraw data following debriefing (Fisher, 2003a).
Recognizing the strong biomedical basis for many of the previous guidelines governing research, some countries have shifted their focus to create statements of ethical conduct specific to the social sciences. Australia, for instance, in revising their 1999 National Statement on Ethical Conduct in Research Involving Humans has drafted a new set of guidelines specifically for
social scientists while still building upon the Nuremberg Code and the Declaration of Helsinki, highlighting such principles as research merit and integrity, justice, beneficence, and respect. Others have chosen to order their principles according to the weight they should receive when in conflict specific to the types of dilemmas social scientists often face. For example, Canada prioritizes their four principles for social science researchers in the order of: (1) Respect for the Dignity of Persons; (2) Responsible Caring; (3) Integrity in Relationships; and (4) Responsibility to Society (CPA, 2000).
This chapter now turns to four specific areas of continued and emerging ethical concern in social research: conflicts of interest, informed consent, cultural equivalence, and the use of monetary incentives. The chapter concludes with a call for ethical commitment, ethical awareness and active engagement in the ongoing development of courses of action reflecting the highest ideals of responsible social science.
CONFLICT OF INTERESTS
Social researchers should strive to establish relationships of trust with research participants, the scientific community, and the public. When conflicting professional, personal, financial, legal or other interests impair the objectivity of data collection, analysis or interpretation, such trust and the validity of the research are compromised. Ethical steps to avoid potentially harmful or exploitative conflicts of interest are critical to ensure that data analysis and interpretation are led by the data and not by other interests. Impairment of objectivity can harm participants, the public, institutions, funders, and the integrity of social science as a field.
Several national bodies and organizations have produced guidelines for conflict of interest decision-making relevant to the conduct of social science research. For example, in the United States the National Institutes of Health Office of Extramural Research requires every institution receiving federal research funds to have written guidelines for the avoidance and institutional review of conflict of interest. These guidelines must reflect state and local laws and cover financial interests, gifts, nepotism, political participation, and other issues (see: http://grants.nih.gov/grants/policy/emprograms/overview/ep-coi.htm).
Of relevance to investigators is the U.S. Public Health Service and National Science Foundation (NSF) requirement that any funding application must include a statement on whether there are any significant financial interests that could directly and significantly affect the design, conduct or reporting of the research. Such interests can include consulting fees, honoraria, ownership or equity options, or intellectual property (e.g. patents, copyrights, and royalties) where such values exceed $10,000. Academic institutional salaries and lectures sponsored by non-profit or public entities are exempt from this policy (see: http://www.nsf.gov/policies/conflicts.jsp, http://grants2.nih.gov/grants/policy/nihgps_2001/nihgps_2001.pdf). In addition, many IRBs in the United States are requiring researchers to include a conflict of interest statement in their informed consents and journals are requiring a statement describing the absence or existence of a potential conflict of interest. For example, APA publications require authors to reveal any possible conflict of interest (e.g. financial interests in a test procedure, funding by pharmaceutical companies) in the conduct and reporting of research. According to the International Committee of Medical Journal Editors (ICMJE) (2003) editors may use information disclosed in conflict of interest and financial interest statements as a basis for editorial decisions. Prompted in large part by concerns about conflicts of interest stemming from the relationship between pharmaceutical companies and independent clinical research organizations, India and other developing countries are beginning to call for adoption of international and establishment of national regulations for research conflicts of interest (Editorial, The Hindu, 2005; Pan African Bioethics Initiative, 2001).
Several professional codes of conduct, including the APA Ethics Code standard on conflict of interest (APA, 2002, Standard 3.06), the British Sociological Association Code of Ethics (BSA, 2002, Standard 42), the British Psychological Society Code of Ethics and Conduct (BPS, 2006, Standard 4.2), and the Canadian Code of Ethics for Psychologists (CPA, 2000, Standard III.31) are applicable for all social science researchers. As applied to research they prohibit conflict of interests if another personal, scientific, professional, financial or other interests or relationships could reasonably be expected to impair objectivity, competence or effectiveness of the psychologist to conduct the research, if it would expose his or her organization to harm, or if it would result in the harm or exploitation of research participants or research assistants. The ethics codes of other social science organizations have similar prohibitions against conflicts of interest (e.g. AAA, 1998; ASA, 1999 in the U.S.).

Examples of potentially harmful conflicts of interest

Examples of potential conflicts of interest can occur if: (a) a social scientist takes gifts from or has financial holdings in a company whose product she or he is investigating; (b) the research is sponsored by a company or organization that has a financial investment in the direction of results that might place pressure on the investigator; (c) the investigator or his or her institution will hold the patent for the researched instrument; or (d) scientists are reviewing a grant application or manuscript submission from a competitor.

Conflict of interest and industry sponsored research: Who owns the data?

In traditional academic contexts, social scientists have a responsibility to report on the results of their data, and to ensure that the report accurately represents the findings. Potentially unethical conflicts of interest can emerge when investigators sponsored by private industry or organizations do not consider in advance the implications of data ownership (Fried & Fisher, in press). Investigators working on independent projects funded externally need to ensure that they maintain all access to and ownership of data as well as the right to publish results without prior approval or interference from the sponsor. Sponsors with financial interest in the outcome of the research, if provided the opportunity, may deny investigators access to the final dataset, attempt to dictate analytic strategies, stall dissemination of negative findings, or insist on ghostwriting the scientific report. Failure to anticipate the consequences of acquiescing to, or naively signing, a contract waiving these responsibilities can result in becoming an accomplice to letting a financial agenda rather than the data drive research results. In addition to resulting in a violation of avoidance of unethical conflicts of interest, such decisions can result in other violations within APA. For example, according to the APA Ethics Code (APA, 2002, Standards 1.01 Misuse of Psychologists’ Work and 5.01 Avoidance of False or Deceptive Statements) and the International Sociological Association Code of Ethics (ISA, 2001, Standard 3, Publication and Communication of data), investigators are prohibited from knowingly making public statements that are false, deceptive or fraudulent concerning their research and are responsible for preventing or correcting false statements about their work by others. For social scientists such public statements can include not only false statements in publications and professional presentations, but product endorsements, false statements concerning conflict of interest or delegation of research responsibilities on grant applications, and expert testimony about scientific data in legal proceedings.

Conflicts between ethics and organizational demands

Social scientists who are employees or consultants to an organization face a slightly different set of ethical challenges. In such
contexts, the company or organization may have a priori ownership of any data produced by its employees. In such contexts the investigator’s role is to provide the organization with the results and interpretation of data collected from well-designed studies that were conducted to provide information for organizational decision-making. The choice to make public the findings belongs to the organization. Unethical conflicts of interest can emerge in such settings: for example, if the researcher agrees to a request by the company or organization to design a study that will guarantee results are all biased in a particular direction, to falsify results from previously collected data, or to write a report that provides an incomplete summary of the data or that intentionally misinterprets study results. When entering into an employment or contractual agreement with a company or organization, social scientists should anticipate and educate the company to the conflict of interest issues that may emerge and establish agreements about data collection, interpretation, and dissemination that permit the investigator to act ethically.

Conflicts of interest in social research: Uncharted territory

In summary, as industry and organizations increasingly recognize the value of social research for policy decisions and public relations, social scientists will increasingly be confronted with conflict of interest challenges. Not all conflicts of interest are unethical or avoidable. The ethical challenge for social scientists is to be vigilant in identifying such conflicts and to assure the public that conflicts are eliminated when possible and effectively managed when necessary. As noted by the Office of Human Research Protections, ‘Openness and honesty are indicators of integrity and responsibility, characteristics that promote quality research and can only strengthen the research process’ (http://www.hhs.gov/ohrp/nhrpac/mtg12-00/finguid.htm).

INFORMED CONSENT

The principle of respect reflects a moral concern for the autonomy and privacy rights of those recruited for research participation. In its most fundamental form, it embodies the moral necessity of obtaining consent to participate in research that is informed, rational and voluntary. The informed requirement requires that prospective participants are provided with all information about the study that would be expected to influence their willingness to participate. As embodied in U.S. federal regulations and the APA Ethics Code (APA, 2002, Standards 3.10 and 8.02; DHHS, 2001) as well as the Canadian Code of Ethics (CPA, 2000, Standard 1.24) and the EU Code of Ethics for Socio-Economic Research (Dench, Iphofen, & Huws, 2004, Standard 4.3), such information includes: (1) the purpose, duration, and procedures; (2) the right to decline or withdraw from participation; (3) consequences of declining or withdrawing; (4) risks and potential discomforts or adverse effects; (5) any prospective benefits to participants or society; (6) extent and limits of confidentiality; (7) incentives for participation; (8) who to contact with questions regarding the research (usually the principal investigator) and their research rights (usually the Chair of the IRB); and (9) an opportunity to ask questions. Some forms of social research create consent challenges. Next we discuss informed consent within the context of three of these research methods: qualitative, archival, and deception research.

Qualitative research

The exploratory and open-ended nature of semi-structured interviews, participant observation, or ethnographic work raises questions about whether truly informed consent for such research can be obtained (Marshall, 1992). Several codes, including the Australian National Statement on Ethical Conduct in Human Research (National Health and Medical Research Council (NHMRC), 2007), set out specific guidelines for qualitative research (Standard 3.1). The movement to
view social sciences as ‘hard science’ and IRB unfamiliarity with qualitative research methods have also posed challenges to anthropologists, sociologists, and other social scientists whose research often strays from the classical scientific method because of unique research questions or the nature of their population (Marshall, 2003). Informed consent is also problematic when working with immigrant populations or in international settings for reasons ranging from language barriers and fear of exploitation or deportation to authority to consent resting with an individual other than the participants, e.g. in countries where women are not permitted to consent to research without prior male permission (Marshall, 2003).

In studies where informed consent is obtained, it is often difficult to ensure fully informed consent at the start of a project because researchers may not be able to anticipate the full extent of information that will emerge (Haverkamp, 2005). Risks to privacy and confidentiality emerge when the information leads to unanticipated revelations regarding illegal behaviors (crimes, child or domestic abuse, illegal immigration), health problems (HIV status, genetic disorder) or other information that if revealed could jeopardize participants’ legal or economic status (Fisher & Goodman, in press; Fisher & Ragsdale, 2006). One way to address this issue is to develop in advance a re-consent strategy for situations in which unanticipated and sensitive issues emerge during the course of observation or discussion (Fisher, 2004; Haverkamp, 2005). The strategy can include a set of criteria to help the interviewer: (1) identify when unexpected information may lead to increased participant privacy and confidentiality risk; (2) determine whether the direction of the conversation is relevant to the research question; (3) if not relevant, find ways to divert the discussion; or (4) if relevant, alert the participant to the new nature of information and implement a mutually negotiated re-consent procedure.

Archival research

Similar, but more difficult issues emerge when consent is obtained for social research that will be archived. Social science has a prestigious history of archives (Young & Brooker, 2006). The purpose of archived data is to provide a rich set of data that can be used by future investigators to examine empirical questions about populations that may not be anticipated when information is first collected. Several organizations have begun to unite social science researchers and their data from around the world to create large and secure accessible databases of archived information. For instance, the Inter-University Consortium for Political and Social Research (ICPSR) has over 500 college or university members and has four major operations units, one of which is data security and preservation. The Harvard-MIT data center also archives and protects various social science data to allow access for future generations of social science researchers. Participant identity is protected in these archives through a very detailed process of individual de-identification. However, the racial, ethnic, cultural, health, or other demographic-based populations from which participants were recruited in most instances must remain identifiable for the research questions to be meaningful.

Within the continuously changing social-political context in which science and society evolve, some investigators have begun to question the validity of informed consent to ongoing secondary analysis by unknown third parties with research questions that may be inconsistent with the consent understandings of those who initially agreed to participation and preservation. This becomes of particular concern when secondary analysis of data from historically oppressed or disenfranchised communities is requested (Young & Brooker, 2006) or if the circumstances under which the original data were collected are questionable, as in the 1968 Yanomami research conducted by Neel (http://members.aol.com/archaeodog/darkness_in_el_dorado/documents/0081.htm).

Requiring individual participants to reconsent to the use of archival data can be both harmful and infeasible. First, it would require that records linking responses to individually identifiable information are preserved over decades, where confidentiality protections may be vulnerable over time.
Second, it would require locating individuals after years or decades, which in many cases would be impossible, and the unavailability of segments of the initial population would compromise the validity of the sample. In response to these challenges, the Council of National Psychological Associations for the Advancement of Ethnic Minority Interests (CNPAAEMI, 2000) has recommended that social research archives consider setting up standing community (broadly defined) advisory boards as a means of helping archive administrators determine when newly proposed analyses may violate the intent of the informed consent.

Deception research and the ‘consent paradox’

In research using deceptive methods, the researcher intentionally misinforms participants about the purpose of the study, the procedures, or the role of individuals with whom the participant will be required to interact (Sieber, 1982). The use of deceptive techniques is not prohibited in any national research regulations and is explicitly permitted with stipulations in professional ethics codes including the American Psychological Association (2002), American Sociological Association (1999), Canadian Psychological Association (2000), British Psychological Society (2006), and the International Sociological Association (2001). Baumrind (1979) distinguished between nonintentional deception, in which failure to fully inform cannot be avoided because of the complexity of the information, and intentional deception, which is the withholding of information in order to obtain participation that the subject might otherwise decline. Simply not providing participants with specific hypotheses regarding the relationship among experimental variables does not in itself constitute deception.

Deception most obviously violates the principle of respect, by depriving prospective participants of the opportunity to make an informed choice regarding the true nature of their participation. What Fisher (2005) has termed the ‘consent paradox’ underscores the moral ambiguity surrounding consent for deception research when the investigator intentionally gives participants false information about the purpose and nature of the study. In such contexts consent for deception research distorts the informed consent process, because it leads prospective participants to believe they have autonomy to decide about the type of experimental procedures they will be exposed to, when in fact they do not.

The deception debate

Debate on the ethical justification for deceptive research practices reflects a tension between scientific validity and respect for participants’ right to make a truly informed participation decision (Fisher & Fyrberg, 1994). Arguments for deception emphasize the methodological advantage of keeping participants naïve about the purpose of the study to ensure responses to experimental manipulations are spontaneous and unbiased (Milgram, 1964; Resnick & Schwartz, 1973; Smith & Richardson, 1983). Arguments against deception emphasize the violation of participant autonomy, the potential to create public distrust in social science research in general and the harm resulting from infliction of self-knowledge that was unexpected, unwanted, shameful or distressful (Baumrind, 1964).

Sociologists have been at the center of the deception controversy and have members who are staunch advocates and opponents of the practice. Allen (1997) falls into the latter category, criticizing sociologists for befriending groups of interest without letting on that they were subjects of sociological research, misrepresenting the motives of their research, and adopting a false persona to conduct research. Particularly disturbing to Allen is the defense that personal time and effort prevented the feasibility of other methods; thus, in order to get the research done, deception was necessary.

Ethical options

Bulmer (1982) concludes that completely disguising the intent of research can affect
the quality of the data collected as well as exaggerate the unknown biases of the researcher. Instead he proposes such methods as retrospective participant observation in which a sociologist uses retrospective observations from previous experience when she was a total participant prior to any research interest. He also supports the use of native as stranger, in which an already established member of the group is trained as a sociologist. The covert outsider is another suggested method in which a legitimate role, such as a teacher in a prison, is taken on in order to observe behavior and gain access to an otherwise unreachable population (Bulmer, 1982).

According to U.S. federal guidance (OPRR, 1993), when considering the use of deception, investigators must first decide whether the information to be withheld during consent would, if known, influence the individual’s desire to participate in research. However, how to judge this prospectively is difficult. Some have argued that responses from previous participants during dehoaxing (revealing the true nature of the study at the end of participation) can be used to document the benign effects of different deceptive methodologies. This approach raises its own (debriefing) paradox (Fisher, 2005). Fisher and Fyrberg (1994) found that introductory psychology students (the most commonly recruited participants for deception studies) were likely to believe that the dehoaxing process was either simply a continued extension of the research or that the debriefing information was itself untrue. As a result, students reported they would be unlikely to reveal their true feelings to experimenters during the dehoaxing process; and some were concerned they would be penalized if they were truthful.

The APA Ethics Code (APA, 2002) attempts to balance the principles of beneficence, non-maleficence, and respect. First, the use of deceptive methods must be justified by the study’s prospective value in scientific, educational or applied areas. Second, even if the research is determined to have value, deception is prohibited if it is reasonably expected that the procedures will cause any physical pain or severe emotional distress. Third, the investigator must prove that the same hypotheses cannot be sufficiently explored and tested using non-deceptive designs. This standard thus prohibits the use of deception research if inconvenience or costs of performing non-deceptive research are the only reasons for proposing such methods (Fisher, 2003a). In addition, the true nature of the deception must be revealed to participants at the end of the study unless the debriefing might reasonably be expected to bias future participant responses; investigators may also withhold such information if the debriefing itself would cause participant harm (APA, 2002, Standard 8.08b).

While the APA and other organizations’ ethics codes attempt to increase the ethical rigor of decisions to use deception methodologies, no guidance can erase the threat to participant autonomy that such procedures reflect. Neither debriefing (even when believed to be valid by participants) nor the opportunity to withdraw their data is a panacea for the ethical paradox of deception research. Consent can only be obtained prospectively (OPRR, 1993); subsequent procedures can never be considered an adequate substitute.

FAIR DISTRIBUTION OF THE BENEFITS AND BURDENS OF RESEARCH

The principle of justice is concerned with the fair and equitable distribution of research benefits and burdens. In social research, benefits are defined by the usefulness of data generated to help understand micro and macro social processes within and among different populations. The burdens of social research include exposure to research risks and required time and effort associated with participation. Justice in social research becomes a particular ethical challenge when racial or ethnic minority, disadvantaged, or disenfranchised populations are recruited for participation in research designs that fail to include consideration of unique population characteristics that may reduce the knowledge value of data generated or expose them to
greater risk or financial burden (Fisher, 1999; Trimble & Fisher, 2006).

Population generalizability

The constantly changing U.S. and international demographic landscapes pose the risk that research findings from one participant population will be inappropriately generalized to other populations. This can occur in at least two ways. First, injustices may occur when populations are intentionally or unintentionally excluded from recruitment, but results of the study are inappropriately generalized to apply to their social or psychological characteristics and circumstances. This becomes particularly problematic for social science when ethnic/racial characteristics are vaguely described in journal articles. Typical descriptions that provide inadequate knowledge for assessing the relevance of the data to ethnic minority populations in the United States, for example, are: ‘the majority of participants were non-Hispanic white’; or ‘eighty percent of participants were non-Hispanic white; the remaining 20 percent were African American and Hispanic’ (Fisher & Brennan, 1992).

Defining race, ethnicity, and culture

When participants’ race, ethnicity, or culture are described in greater detail there is often an absence of definition of what these terms mean or how decisions to identify participants by ‘race’ (physical similarities assumed to reflect phenotypic expressions of shared genotypes), ‘ethnicity’ (assumed cultural, linguistic, religious, and historical similarities), or ‘culture’ (group ways of thinking and living based upon shared knowledge, consciousness, skills, values, expressive forms, social institutions, and behaviors that allow individuals to survive in the contexts within which they live) reflect assumptions about the underlying causal mechanisms driving similarities or differences found among populations (Fisher et al., 1997). Further, there is often little recognition that social, economic, and political forces continuously shape and redefine these definitions for both individuals and society at large (Chan & Hume, 1995; Zuckerman, 1990). Investigators need to consider and explicitly describe the theoretical, empirical, and social frameworks driving the definitions of race, ethnicity, or culture used to select participant populations, to ensure the scientific validity of the research question and to allow their research findings to be evaluated within the context of continuously changing scientific and societal conceptions of these definitions (Fisher et al., 2002).

Within-group differences are also an important factor to consider when identifying population characteristics relevant to the study questions. Investigators often ignore the scientific implications of variation among populations described under broad panethnic labels. For example, failure to identify the national origins of participants categorized as ‘Hispanic’ (e.g. Mexico, Puerto Rico, Guatemala, Chile) can produce overgeneralizations that dilute or obscure moderating effects on social behavior resulting from national origin, immigration history, religion, and tradition. In addition, within even these more nationally defined categories, research participants may vary greatly in their identification with the ethnic group of family origin or with the degree to which they are acculturated to majority culture (Fisher et al., 1997).

Cultural equivalence of assessment measures

Investigators need to heed a second risk of producing research injustice: failure to recognize when a measure of a social construct established in one population, when applied to another ethnic/cultural group, may not yield similar psychometric properties nor reflect a social phenomenon that has similar behavioral or psychological patterns of relationships (Hoagwood & Jensen, 1997; Laosa, 1990). The use of such measures risks the over- or under-identification of socially meaningful characteristics, compromising the scientific benefits of the research
and potentially resulting in harmful social labeling or maladaptive self-conceptions of members of the racial or ethnic group studied (Canino & Guarnaccia, 1997; Fisher et al., 2002; Knight & Hill, 1998). Thus, whenever possible, investigators should select surveys, interview techniques or instruments that have been standardized on members of the research participants’ racial or ethnic group. When such measures have not yet been developed or sufficiently evaluated, investigators can evaluate the cultural validity of the measure by evaluating item equivalence and other psychometric properties.

Moving away from comparative and deficit approaches

Injustices in research can also occur when social research involving ethnic minority populations focuses only on population deficits rather than a more comprehensive analysis of both population vulnerabilities and strengths. This ‘deficit’ investigative approach often appears alongside another potential bias in social research design: the assumption that ethnic minority social constructs can only be understood when compared to non-minority standards (Fisher et al., 2002; Heath, 1997). To provide fair and equitable research knowledge benefits, social scientists need to apply the same principles of scientific inquiry to all populations studied (e.g. EU Code of Ethics for Socio-Economic Research, Standard 2.5, Dench et al., 2004). Cultural bias in social science has also been identified in developing countries. In India, for example, the People’s Science Movement (PSM) has drawn attention to the internalization of local cultural gender biases by scientists in developing countries (Varma, 1999).

DUE AND UNDUE RESEARCH INCENTIVES FOR DIVERSE SOCIOECONOMIC POPULATIONS

National guidelines and organizational ethics codes permit compensation for effort, time, and inconvenience of research as long as no ‘undue inducements’ are offered to lure people into participating and incentives are not included as a ‘benefit’ in risk-benefit analyses (APA, 2002, Standard 8.06; BPS, 2006, Standard 3.3.4; CPA, 2000, Standard 1.14; NHMRC, 2007, Standard 2.2.9; National Advisory Council on Drug Abuse, 2000; OHRP, 1993). The science establishment thus recognizes that some inducement is necessary to ensure sufficient sample size and that it is possible for investigators to distinguish between ‘due’ and ‘undue’ inducements (Dickert & Grady, 1991; Macklin, 1999). Selecting non-coercive incentives is critical to ensuring the voluntary nature of participation and that research burdens are not borne unequally by economically disadvantaged populations. Cash payments or other incentives may be considered coercive if they: (1) prompt participants to lie or conceal information that would disqualify them from the research; or (2) lure into participating those who would otherwise choose not to expose themselves to research risks (Macklin, 1999). The extent to which these criteria are met will vary across research populations.

Types of payments

Ethical decisions about the use of cash incentives to secure and retain participation in surveys on illegal and dangerous behaviors must include consideration of how monetary inducements will affect the quality of data as well as the equitable distribution of the benefits and burdens of research participation. Monetary incentives are often used for participant recruitment. Payments to research participants can be ethically justified as: (1) reimbursement for legitimate travel or other expenses accrued because of research participation; (2) fair compensation for time and inconvenience involved in research participation; (3) appreciation payments (e.g. in the form of cash, coupons, or gifts); and (4) incentive payments that offer money or its equivalent beyond those limited to reimbursement, compensation, or appreciation (Wendler et al., 2002).
Payments across research populations differing in financial need create a tension between fair compensation for the time and inconvenience of research participation and coercion.

Ideally, monetary incentives for research participation should strengthen generalizability by providing a balanced representation of individuals from all economic levels appropriate to the research question (Giuffrida & Togerson, 1997; Kamb et al., 1998). However, individuals from different economic circumstances can have different responses to cash inducement as fair or coercive (Levine, 1986). Payments that are unnecessarily low can reduce the generalizability of data through under-recruitment of economically disadvantaged populations. Payments that are too high raise different concerns. For example, large financial incentives can jeopardize the voluntary nature of participation, undermine altruistic motivations for engaging in research, tempt prospective participants to provide false information to become eligible for study participation, or lie in response to experimental questions to comply with investigator expectations (Attkisson et al., 1996; Fisher, 2003b; Saunders et al., 1999). Grady (2001) argues that offering arbitrary or large sums of money to entice participants is poor practice, while modest payments help to minimize possible undue inducement. She proposes that the informed consent process in which participants are reminded of their freedom to refuse participation or withdraw their consent without repercussions is adequate protection against potential coercion (Grady, 2001).

Based on an analysis of compensation practices of a representative sample of biomedical and psychosocial research conducted in 1997 and 1998, Latterman and Merz (2001) reported average research payments of $9.50/hour plus $12.00 for each additional task (U.S. dollars); larger compensation was related to longer participatory time, repeated interaction with the researcher, invasive tasks, and the number of tasks. From their small study these researchers concluded that payments in published studies are related to time and level of activity. In addition, they found no evidence that participants of these studies were being enticed with large monetary inducements.

Payment for participation in illicit drug use research

Cash payment for participation in illicit drug use research can create an ethical paradox if it is used by participants to purchase illegal drugs, encourages them to maintain their drug habits to continue earning research money, or leads them to provide answers to experimental questions that distort evaluation of the social correlates and consequences of drug use (Fisher, 2003b; Koocher, 1991; McGrady & Bux, 1999; Shaner et al., 1995). On the other hand, for those who have difficulty obtaining and holding jobs, the money may be ethically justified as a legal means of obtaining payment for unskilled labor. Policies aimed at addressing this problem include spreading out the payment of full compensation over a period of time, using food coupons or vouchers for other health-related products, making payments to third parties on behalf of the participant, or withholding payment if a participant is intoxicated or in withdrawal (Fisher, 2004; Gorelick et al., 1999). Such alternatives raise their own ethical quandaries. First, there is no evidence that any non-cash substitute for cash incentives deters participants with illicit drug habits from using the monetary value of the incentives to purchase drugs. For example, informal observations by social scientists working in the field suggest that, if need be, vouchers are easily sold by participants for cash. Furthermore, a decision not to pay substance abusers can reinforce economic inequities between drug abusing and non-abusing populations or deny them the right to apply their own value system to life risk decisions (Fisher, 1999).

Ensuring fairness

Social scientists are challenged to determine payments that are perceived by all participants
as equally attractive and legitimate for the time and effort contributed. To ensure fairness, some institutions adopt a standard compensation rate for all research participation. Others have defined fair financial inducements as the amount of money a normal, healthy volunteer would lose in work and travel time or by fair market value for the work involved (Dickert & Grady, 1991; Winslade & Douard, 1992). Obtaining the opinions of community representatives prior to research initiation provides another means of establishing fair and non-coercive research payments (Fisher, 2003b).

DOING GOOD SCIENCE WELL

The conduct of responsible social science depends upon investigators’ commitment and lifelong efforts to act ethically. However, a desire to do the right thing must be accompanied by familiarity with national and international regulations, ethics codes, and laws essential to the identification and resolution of ethics-in-science challenges (Fisher, 2003a). Ethical commitment and consciousness in turn are necessary but not sufficient to anticipate and rightly address the array of ethical challenges that will emerge when social scientists work in diverse contexts with diverse populations. Doing good science well requires flexibility and sensitivity to the research context, the scientist’s fiduciary responsibilities, and participant expectations unique to each study. The evaluation of risks and benefits, the construction of informed consent procedures, and the development of confidentiality and disclosure policies need to reflect a ‘goodness of fit’ between study goals and participant characteristics (Fisher, 2002, 2003c; Fisher & Goodman, in press; Fisher & Masty, 2006; Fisher & Ragsdale, 2006). Framing the responsible conduct of social science as a process that draws upon investigators’ dual commitment to scientific validity and participant protection will nourish activities that reflect the highest ideals of science and merit participant trust.

REFERENCES

Allen, C. (1997). Spies like us: When sociologists deceive their subjects. Lingua franca, 7, 31–39.
American Anthropological Association. (1998). Code of ethics of the American Anthropological Association. Retrieved March 10, 2006 from http://www.aaanet.org/committees/ethics/ethicscode.pdf.
American Psychological Association. (1953). Ethical standards of psychologists. Washington, DC: American Psychological Association.
American Psychological Association. (2002). Ethical principles of psychologists and code of conduct. American Psychologist, 57, 1060–1073.
American Sociological Association. (1999). Code of ethics. Washington, DC: American Sociological Association. Retrieved March 10, 2006 from http://www.asanet.org/galleries/default-file/Code%20of%20Ethics.pdf.
Attkisson, C. C., Rosenblatt, A., & Hoagwood, K. (1996). Research ethics and human subjects protection in child mental health services research and adolescent and parent studies. In K. Hoagwood, P. Jensen, & C. B. Fisher (Eds.), Ethical issues in research with children and adolescents with mental disorders (pp. 43–58). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Baumrind, D. (1964). Some thoughts on ethics of research: after reading Milgram’s ‘Behavioral study of obedience.’ American Psychologist, 26, 887–896.
Baumrind, D. (1979). IRBs and social science research: The cost of deception. IRB: A Review of Human Subjects Research, 1, 1–4.
British Psychological Society. (2006). http://www.bps.org.uk/downloadfile.cfm?file_uuid=5084A882-1143-DFD0-7E6C-F1938A65C242&ext=pdf.
British Sociological Association. (2002). http://www.britsoc.co.uk/user_doc/Statement%20of%20Ethical%20Practice.pdf.
Bulmer, M. (1982). When is disguise justified? Alternatives to covert participant observation. Qualitative Sociology, 5, 251–264.
Canadian Psychological Association. (2000). Canadian code of ethics for psychologists, 3rd edition. Retrieved March 10, 2006 from http://www.cpa.ca/cpasite/userfiles/Documents/Canadian%20Code%20of%20Ethics%20for%20Psycho.pdf.
Canino, G., & Guarnaccia, P. (1997). Methodological challenges in the assessment of Hispanic children and adolescents. Applied Developmental Science, 7, 13–26.
Canter, M. B., Bennett, B. E., Jones, S. E., & Nagy, T. F. (1994). Ethics for psychologists: A commentary on
the APA ethics code. Washington, DC: American Psychological Association.
Chan, K. S., & Hume, S. (1995). Racialization and panethnicity: From Asians in America to Asian Americans. In W. D. Hawley & A. W. Jackson (Eds.), Toward a common destiny: Improving race and ethnic relations in America (pp. 205–236). San Francisco: Jossey-Bass.
Council for International Organizations of Medical Sciences. (2002). International guidelines for ethical review of epidemiological studies. Geneva: Council for International Organizations of Medical Sciences. Retrieved May 11, 2007 from http://www.cioms.ch/frame_guidelines_nov_2002.htm.
Council of National Psychological Associations for the Advancement of Ethnic Minority Interests. (2000). Guidelines for research in ethnic minority communities. Washington, DC: American Psychological Association.
Dench, S., Iphofen, R., & Huws, U. (2004). IES Report 412. An EU code of Ethics for Socio-economic Research. Brighton, UK: The Institute for Employment Studies.
Department of Health, Education, & Welfare (DHEW). (1978). The Belmont report: Ethical principles and guidelines for the protection of human subjects of research. Washington, DC: US Government Printing Office.
Department of Health and Human Services. (2001). Title 45 Public Welfare, Part 46, Code of federal regulations, Protections of human subjects. Washington, DC: Government Printing Office.
Dickert, N., & Grady, C. (1991). What’s the price of a research subject: Approaches to the payment for research participation. New England Journal of Medicine, 341, 198–203.
Fisher, C. B. (1999). Relational ethics and research with vulnerable populations. Reports on research involving persons with mental disorders that may affect decision-making capacity (Vol. II, pp. 29–49). Commissioned Papers by the National Bioethics Advisory Commission, Rockville, MD. Retrieved March 21, 2006 from http://www.georgetown.edu/research/nrcbl/nbac/pubs.html.
Fisher, C. B. (2002). A goodness-of-fit ethic for informed consent. Fordham Urban Law Journal, 30, 159–171.
Fisher, C. B. (2003a). Decoding the ethics code: A practical guide for psychologists. Thousand Oaks, CA: Sage Publications.
Fisher, C. B. (2003b). Adolescent and parent perspectives on ethical issues in youth drug use and suicide survey research. Ethics & Behavior, 13, 302–331.
Fisher, C. B. (2003c). A goodness-of-fit ethic for child assent to nonbeneficial research. The American Journal of Bioethics, 3, 27–28.
Fisher, C. B. (2004). Informed consent and clinical research involving children and adolescents: Implications of the revised APA ethics code and HIPAA. Journal of Clinical Child and Adolescent Psychology, 33, 833–840.
Fisher, C. B. (2005). Deception research involving children: Ethical practices and paradoxes. Ethics & Behavior, 15, 271–287.
Fisher, C. B., & Brennan, M. (1992). Application and ethics in developmental psychology. In D. L. Featherman, R. M. Lerner, & M. Perlmutter (Eds.), Life-span development and behavior (pp. 189–219). Hillsdale, NJ: Lawrence Erlbaum Associates.
Fisher, C. B., & Fyrberg, D. (1994). College students weigh the costs and benefits of deceptive research. American Psychologist, 49, 417–426.
Fisher, C. B., & Goodman, S. J. (in press). Goodness-of-fit ethics for non-intervention research involving dangerous and illegal behavior. In D. Buchanan, C. B. Fisher, & L. Gable (Eds.), Ethical & legal issues in research with high risk populations: Addressing threats of suicide, child abuse, and violence. Washington, DC: APA Press.
Fisher, C. B., Hoagwood, K., Boyce, C., Buster, T., Frank, D. A., Grisso, T., Levine, R. J., Macklin, R., Spencer, M. B., Takanishi, R., Trimble, J. E., & Zayas, L. H. (2002). Research ethics for mental health science involving ethnic minority children and youth. American Psychologist, 57, 1024–1040.
Fisher, C. B., Jackson, J., & Villarruel, F. (1997). The study of African American and Latin American children and youth. In R. M. Lerner (Ed.), Handbook of child psychology (Vol. I, 5th ed., pp. 1145–1207). New York: Wiley.
Fisher, C. B., & Masty, J. K. (2006). A goodness-of-fit ethic for informed consent to pediatric cancer research. In R. T. Brown (Ed.), Comprehensive handbook of childhood cancer and sickle cell disease: A biopsychosocial approach (pp. 205–217). New York: Oxford University Press.
Fisher, C. B., & Ragsdale, K. (2006). A goodness-of-fit ethics for multicultural research. In J. Trimble and C. B. Fisher (Eds.), The handbook of ethical research with ethnocultural populations and communities (pp. 3–26). Thousand Oaks, CA: Sage Publications.
Fried, A. F., & Fisher, C. B. (in press). The ethics of informed consent for research in clinical and abnormal psychology. In D. McKay (Ed.), Handbook of research methods in abnormal and clinical psychology. Thousand Oaks, CA: Sage Publications.
Giuffrida, A., & Torgerson, D. J. (1997). Should we pay the patient? Review of financial incentives to enhance
patient compliance. British Medical Journal, 315, 703–707.
Gorelick, D. A., Pickens, R. W., & Bonkovsky, F. O. (1999). Clinical research in substance abuse: Human subjects issues. In H. A. Pincus, J. A. Lieberman, & S. Ferris (Eds.), Ethics in psychiatric research (pp. 177–192). Washington, DC: American Psychiatric Association.
Grady, C. (2001). Money for research participation: Does it jeopardize informed consent? American Journal of Bioethics, 1, 40–44.
Haverkamp, B. E. (2005). Ethical perspectives on qualitative research in applied psychology. Journal of Counseling Psychology, 52, 146–155.
Heath, S. B. (1997). Culture: Contested realm in research on children and youth. Applied Developmental Science, 1, 113–123.
Heller, J. (1972). Syphilis victims in the U.S. study went untreated for 40 years. New York Times, 26 July 1972, 1, 8.
Hoagwood, K., & Jensen, P. S. (1997). Developmental psychopathology and the notion of culture: Introduction to the special section on ‘The fusion of cultural horizons: Cultural influences on the assessment of psychopathology in children and adolescents.’ Applied Developmental Science, 1, 108–112.
Indian Council of Medical Research. (2000). Ethical guidelines on biomedical research involving human subjects. New Delhi: Indian Council of Medical Research. Retrieved May 11, 2007 from http://www.icmr.nic.in/ethical.pdf.
International Committee of Medical Journal Editors. (2003). Uniform requirements for manuscripts submitted to biomedical journals: Writing and editing for biomedical publication. Retrieved April 2, 2004 from http://www.icmje.org/#conflicts.
International Sociological Association. (2001). http://www.isa-sociology.org/about/isa_code_of_ethics.htm.
Jones, J. H. (1993). Bad blood: The Tuskegee syphilis experiment, new and expanded ed. New York: Free Press.
Kamb, M. L., Rhodes, F., Hoxworth, T., Rogers, J., Lentz, A., Kent, C., MacGowen, R., & Peterman, T. A. (1998). What about money? Effects of small monetary incentives on enrollment, retention, and motivation to change behavior in an HIV/STD prevention counseling intervention. Sexually Transmitted Infections, 74, 253–255.
Knight, G. P., & Hill, N. E. (1998). Measurement equivalence in research involving minority adolescents. In V. C. McLoyd & L. Steinberg (Eds.), Studying minority adolescents: Conceptual, methodological, and theoretical issues (pp. 183–211). Mahwah, NJ: Erlbaum.
Koocher, G. P. (1991). Questionable methods in alcoholism research. Journal of Consulting and Clinical Psychology, 59, 246–248.
Laosa, L. M. (1990). Population generalizability, cultural sensitivity, and ethical dilemmas. In C. B. Fisher & W. W. Tryon (Eds.), Ethics in applied developmental psychology: Emerging issues in an emerging field (pp. 227–252). Norwood, NJ: Ablex.
Latterman, J., & Merz, J. F. (2001). How much are subjects paid to participate in research? The American Journal of Bioethics, 1, 45–46.
Levine, R. (1986). Ethics and regulation of clinical research (2nd ed.). Baltimore: Urban & Schwarzenberg.
Macklin, R. (1999). Moral progress and ethical universalism. In R. Macklin (Ed.), Against relativism: Cultural diversity and the search for ethical universals in medicine (pp. 249–274). New York: Oxford University Press.
Marshall, P. L. (1992). Research ethics in applied anthropology. IRB: A Review of Human Subjects Research, 14, 1–5.
Marshall, P. L. (2003). Human Subjects Protections, Institutional Review Boards, and Cultural Anthropological Research. Anthropological Quarterly, 76, 269–285.
McGrady, B. S., & Bux, D. A. (1999). Ethical issues in informed consent with substance abusers. Journal of Consulting and Clinical Psychology, 67, 186–193.
Milgram, S. (1964). Issues in the study of obedience: A reply to Baumrind. American Psychologist, 19, 848–852.
National Advisory Council on Drug Abuse. (2000). Recommended guidelines for the administration of drugs to human subjects. DA-01-002. NIDA-CAMCODA. Retrieved January 11, 2004 from http://grants.nih.gov/grants/guide/noticefiles/NOT-DA-01-002.html.
National Consensus Conference on Bioethics and Health Research in Uganda. (1997). Guidelines for the Conduct of Health Research Involving Human Subjects in Uganda. Kampala, Uganda.
National Health and Medical Research Council (NHMRC). (2007). National Statement on Ethical Conduct in Human Research. http://www.nhmrc.gov.au/publications/synopses/_files/e72.pdf.
National Research Council. (2003). Protecting participants and facilitating social and behavior sciences research. In C. F. Citro, D. R. Ilgen, & C. B. Marret (Eds.). Washington, D.C.: The National Academies Press.
Nolan, R. W. (2002). Anthropology in practice: building a career outside the academy (directions in applied anthropology). Boulder: Lynne Rienner.
Nuremberg Code. (1946). Journal of the American Medical Association, 132, 1090.
Office for Protection From Research Risks, Department of Health and Human Service, National Institutes of Health. (1993). Protecting human research subjects: Institutional review board guidebook. Washington, DC: Government Printing Office.
Ogundiran, T. O. (2004). Enhancing the African bioethics initiative. BMC Medical Education, 4, 21.
Pan African Bioethics Initiative (PABIN). (2001). Terms of Reference. http://www.pabin.net/enindex.asp.
Resnick, J. H., & Schwartz, T. (1973). Ethical standards as an independent variable in psychological research. American Psychologist, 28, 134–139.
Saunders, C. A., Thompson, P. D., & Weijer, C. (1999). What’s the price of a research subject? New England Journal of Medicine, 341, 1550–1552.
Shaner, A., Eckman, T. A., & Roberts, L. J. (1995). Disability income, cocaine use, and repeated hospitalization among schizophrenic cocaine abusers: A government-sponsored revolving door? New England Journal of Medicine, 333, 777–783.
Sieber, J. E. (1982). Ethical dilemmas in social research. In J. E. Sieber (Ed.), The ethics of social research: Surveys and experiments (pp. 1–30). New York: Springer-Verlag.
Sieber, J. E. (1992). Planning ethically responsible research: A guide for students and internal review boards. Thousand Oaks, CA: Sage Publications.
Smith, S. S., & Richardson, D. (1983). Amelioration of deception and harm in psychological research. Journal of Personality and Social Psychology, 44, 1075–1082.
Steinbock, B., Arras, J. D., & London, A. J. (2005). Ethical issues in modern medicine. New York, NY: McGraw-Hill Higher Education.
Thailand Ministry of Public Health Ethics Committee. (1995). Rule of the medical council on the observance of medical ethics. Thailand: Ministry of Public Health.
The Hindu. (2005). Editorial: A dangerous conflict of interest. The Hindu. Retrieved May 11, 2007 from http://www.hindu.com/2005/11/30/stories/2005113002301000.htm.
Trimble, J. E., & Fisher, C. B. (2006). The handbook of ethical research with ethnocultural populations and communities. Thousand Oaks, CA: Sage Publications.
Varma, R. (1999). Women and people’s science movements in India. Technology & Society: Historical, Societal, and Professional Perspectives Proceedings, 1999 International Symposium, 29-21, 378–382.
Wendler, D., Rackoff, J. E., Emanuel, E. J., & Grady, C. (2002). The ethics of paying for children’s participation in research. Journal of Pediatrics, 141(2), 166–171.
Winslade, W. J., & Douard, J. W. (1992). Ethical issues in psychiatric research. In L. K. G. Hsu & M. Hersen (Eds.), Research in psychiatry: Issues, strategies, and methods (pp. 57–70). New York: Plenum.
World Medical Association. (2000). Declaration of Helsinki: Ethical principles for medical research involving human subjects. Edinburgh: World Medical Association. Retrieved May 11, 2007 from http://www.wma.net/e/policy/pdf/17c.pdf.
Young, C. H., & Brooker, M. (2006). Safeguarding sacred lives: The ethical use of archival data for the study of diverse lives. In J. E. Trimble & C. B. Fisher (Eds.), The handbook of ethical research with ethnocultural populations and communities. Thousand Oaks, CA: Sage Publications.
Zuckerman, M. (1990). Some dubious premises in research and theory on racial differences: Scientific, social and ethical issues. American Psychologist, 45, 1297–1303.
PART II
Research Designs
This section of the handbook provides diverse perspectives on the design of social research. It offers a sample of important issues in the design of qualitative and quantitative research rather than an integrated textbook approach to design. This approach allows a more in-depth exploration of topics that range from a detailed quantitative analysis of sample size planning for studies using multiple regression, to broader overviews of the conduct of qualitative case studies. The creation of randomized and quasi-experimental research designs is discussed in detail in the first two chapters. These chapters provide essential information on how to improve both of these research designs. From the in-depth quantitative perspective on sample size we move to a re-conceptualization of generalizability in qualitative research. The author of this chapter argues that correctly designed qualitative studies are as generalizable as quantitative studies that use representative sampling. An overview of the qualitative case study is provided in the following chapter. In the next chapter the similarities and differences in the design of qualitative and quantitative longitudinal and panel studies are discussed. The final chapter of this section discusses specific issues in the design of comparative and cross-national studies.

The first two chapters of this section deal with social science studies where an intervention is being tested to determine its impact. The priority for these research designs is to enhance the ability to draw valid conclusions about the attribution of cause. Howard Bloom’s chapter on randomized experiments provides a basic framework for understanding the design of experiments as well as a look at future developments and applications. Randomized designs require that individuals or aggregates such as organizations have an equal chance of being assigned to treatment or control groups. The major advantage of this design is that it is the best way to ensure that the groups are equivalent on both measured and unmeasured variables at the start of the study. Properly implemented, this design eliminates most threats to internal validity, i.e. the factors that threaten the ability to demonstrate that the treatment caused the effect and not something else. Familiarity with randomized designs is increasingly important as the number of studies using these designs increases. For example, in the U.S. one federal research agency (Institute of Education Sciences) requires applicants for research grants to use a randomized design or justify why they do not. Randomized designs have been used in almost all substantive areas including such diverse topics as education, policing, and child care.

Bloom explains the five elements that need to be present in a randomized design. The research question must specify what
treatment is being tested and with what condition it will be compared. Typically the comparison group will not be a no-treatment group but a group receiving usual treatment. Second, the unit of randomization needs to be specified. One of the major advances in research has been the application of randomized designs to organizations and other aggregations such as schools or classes. The specification of the measurement methods is the third element. How will outcome and baseline characteristics be measured? The fourth element is of a practical nature. What is the implementation strategy? How will sites or individuals be recruited and randomly assigned, and how will the treatment be delivered? Fifth is the analysis plan that addresses whether randomization was successful and whether the treatment was delivered as planned.

Planning a randomized experiment is more complex than just these five elements. Bloom explains some of these complications and suggests actions that the researcher can take to prevent or deal with potential problems. For example, he discusses the effects of non-compliance with the intervention and how to statistically adjust for it in the analysis. The chapter also suggests future directions for randomized designs.

The chapter by Tom Cook and Vivian Wong provides an excellent overview of experimental and quasi-experimental research designs, with a focus on the latter. The authors stress that while well-executed randomized experiments are the best choice for drawing causal conclusions, there are some quasi-experiments that are excellent alternatives. The first section of the chapter carefully examines two strong designs. Both the regression-discontinuity and the interrupted time series with a control series are good at reducing the plausibility of alternative explanations that threaten the internal validity of non-experimental designs. However, there are significant limitations to both approaches. The regression-discontinuity design requires that the treatment and comparison be assigned by a cut-off score from some assignment variable. This is feasible, for example, where there is screening before getting the intervention but more difficult in other situations. This design is less powerful than a randomized design and thus is less likely to detect an effect if one is there. The interrupted time series design requires several data points before and after the intervention.

The authors also discuss in some detail how to strengthen the most widely used design, the non-equivalent control design. This design compares a treated group with an untreated control group using one pre-test and one post-test. Random assignment to conditions is not used in this design. Cook and Wong note that many dismiss quasi-experiments as being grossly inferior to randomized experiments. However, they describe studies that show that under some circumstances well-executed quasi-experiments’ outcomes are comparable to randomized experiments’ outcomes. One of the most important conditions is how well the groups match before the study on both measured and unmeasured variables.

One of the more recent approaches to matching groups to enhance their equivalency involves the use of propensity scores. These scores are usually constructed from pre-treatment variables that are good predictors of group membership. These scores represent the differences in selection between the two groups. The authors provide excellent examples of other ways to strengthen quasi-experiments such as the use of double pre-tests.

One of the first questions experimental researchers need to consider in planning a study is the sample size. The availability and feasibility of collecting data from the sample is of prime consideration, especially when the sampling units are not individuals but organizations such as schools or clinics. The cost of collecting data and the number of units required will set the outer limit on the sample size. In planning a study two categories need to be considered. The first category is whether the research question is about an overall indicator (i.e. an omnibus test) or a targeted effect. The second category is whether the goal is to determine a point estimate that requires the calculation of statistical power or
if it is a confidence interval that requires the calculation of accuracy.

Within the second category, a power estimate is needed when the goal is to reject the null hypothesis, i.e. to show that a specific value is different from zero. Concern over power is driven by the need to demonstrate statistical significance, that is, how probable it is that the result (a point estimate) is due to chance. An alternate approach to research questions favored by some is the use of confidence intervals. Here the question is concerned with how wide the band of uncertainty or error is. The authors use the term ‘accuracy’ to describe the narrowness of confidence intervals. The narrower the confidence interval, the better the accuracy. Accuracy is a function of precision and bias. Ken Kelley and Scott Maxwell provide an in-depth explanation of these concepts and how research questions can be categorized into a two-by-two table where the goal can be power versus accuracy and the effect can be targeted versus omnibus. They use this approach to help explicate how the determination of the sample size in multiple regression is dependent on these four factors. The chapter can be formidable for persons not well versed in statistical analysis. However, it provides an important way to conceptualize the decisions needed to determine sample size.

In the chapter ‘Re-Conceptualizing Generalization in Qualitative Research’ Giampietro Gobo makes the point that probability sampling cannot be advocated as the only model suited to the generalization of findings. On the other hand Gobo warns against the extreme postmodernist stance, which in fact agrees with and supports the positivist viewpoint that generalizability can only be based on random sampling. Instead, he promotes what he calls an idiographic sampling theory, which is in fact in use in several disciplines outside the human sciences. These are disciplines akin to qualitative research, for they work exclusively on few cases and have learnt to make a virtue out of necessity. Disciplines such as biology, astrophysics, genetics, paleontology, and linguistics work on non-probability samples regarded as being just as representative of their relative populations and therefore as producing generalizable results, because they start from the assumption that their objects of study possess quasi-invariant states on the properties observed. The (statistical) principle of variance is the key concept applied here. Under the variance principle, to determine the sample size, the researcher must first know the range of variance that one intends to measure. If the range of variance is high, the number of cases studied needs to be high, whereas if the range of variance is restricted, the number may be restricted as well. Gobo shows how the way in which representativeness is discussed and sought in many traditions of qualitative research is in line with the variance principle. By applying a theory-driven strategy of choosing additional cases and by defining their units of analysis in a sensible way, researchers are able to assess the variability of the phenomenon and to make sure that extreme cases are taken into account. Thus the explanation given can be argued to be generalizable to the defined population, although probability sampling is not used.

Linda Mabry’s chapter on case studies in social research provides an overview of the ways in which this approach has evolved and is used in the social sciences. Case studies are most useful for identifying and documenting the patterns of ordinary events in their social, cultural, and historical context. The case study is based on the inductive method and is a means to build a theoretical understanding of social phenomena. From this viewpoint, traditional hypothesis testing may restrict the researchers’ vision and may foster a premature conclusion and thus miss a deeper understanding of the object of study. Mabry emphasizes that an attitude of openness should be maintained in conducting a case study.

The particular strength of the chapter by Jane Elliott, Janet Holland, and Rachel Thomson on longitudinal research is that they cover both qualitative and quantitative research traditions, which are both well established and typically discussed separately. The chapter focuses on panel and cohort studies where the same group of individuals is followed through time. Elliott et al. show that in terms of the objectives for carrying
out longitudinal research, there isn’t much difference between qualitative and quantitative researchers. Longitudinal social research is done because it offers unique insights into process, change and continuity over time in phenomena ranging from individuals, families, and institutions to societies.

Elliott et al. point out that both qualitative and quantitative traditions have their strengths, which may be complemented in mixed methods studies. Quantitative methods offer refined techniques to analyze causal relations, whereas qualitative researchers tend to be shy in talking about causal relations even though some argue that because of its attention to detail, process, complexity, and contextuality, qualitative research is particularly valuable for identifying and understanding multi-causal linkages. In quantitative longitudinal research a priority is placed on collecting accurate data from a large representative sample about the nature and timing of life events, circumstances, and behavior. In qualitative longitudinal research the emphasis is far more on individuals’ understanding of their lives and circumstances and how these may change through time. While quantitative longitudinal analytic processes provide a more processual or dynamic understanding of the social world, they do so at the expense of setting up a static view of the individual. Quantitative longitudinal research provides a powerful tool for understanding the multiple factors that may affect individuals’ lives, shaping their experiences and behavior. But there is little scope for understanding how individuals use narrative to construct and maintain a sense of their own identity. Without this element there is a danger that people are merely seen as making decisions and acting within a pre-defined and structurally determined field of social relations rather than as contributing to the maintenance and metamorphosis of themselves, and the culture and community in which they live.

Comparative research, especially when it is conducted cross-nationally, is another important growth area in the social sciences in the context of the globalization of communications, technological progress, and growing internationalization. This is the focus of Chapter 15 by David de Vaus. As de Vaus concludes, such research raises the same methodological issues as other research, at least in abstracto. However, because of the complexity involved in comparative research, especially when applied cross-nationally, there are additional problems of how to deal with inter- and intra-societal differences of language and culture. The chapter explores the nature of comparative research and classifies it according to two broad types: case-based comparative studies and variable-based comparative research. The chapter explores their different logics and the problems that each confronts. The strength of case-based comparative methods lies in their understanding of specificities within the context of the whole case, a feature that is crucial to cross-cultural research. On the other hand, such research raises the problem of how to know the boundaries of the case, issues to do with the small number of cases that are typically involved, and issues around invariant causation. Variable-based comparative studies, notably discussed with reference to cross-national surveys, have their own problems related to equivalences of meanings and the standardization of procedures. However, it is arguable that case-based comparative research also has to contend with these challenges.
9
The Core Analytics of
Randomized Experiments
for Social Research
Howard S. Bloom
INTRODUCTION

This chapter introduces the central analytic principles of randomized experiments for social research. Randomized experiments are lotteries that randomly assign subjects to research groups, each of which is offered a different treatment. When the method is implemented properly, differences in future outcomes for experimental groups provide unbiased estimates of differences in the impacts of the treatments offered. The method is usually attributed to Ronald A. Fisher (1925 and 1935), who developed it during the early 1900s[1]. After World War II, randomized experiments gradually became the method of choice for testing new drugs and medical procedures, and to date over 350,000 randomized clinical trials have been conducted (Cochrane Collaboration, 2002)[2]. Numerous books have been written about randomized experiments as their application has expanded from agricultural and biological research (e.g. Fisher, 1935; Kempthorne, 1952; Cochran and Cox, 1957; Cox, 1958) to research on industrial engineering (e.g. Box et al., 2005), to educational and psychological research (e.g. Lindquist, 1953; Myers, 1972) to social science and social policy research (e.g. Boruch, 1997; Orr, 1999; Bloom, 2005a). In addition, several journals have been established to promote advancement of the method (e.g. the Journal of Experimental Criminology, Clinical Trials and Controlled Clinical Trials).

The use of randomized experiments for social research has greatly increased since the War on Poverty in the 1960s. The method has been used in laboratories and in field settings to randomize individual subjects, such as students, unemployed adults, patients, or welfare recipients, and intact groups, such as schools, firms, hospitals, or neighborhoods[3]. Applications of the method to social research have examined issues such as child nutrition (Teruel and Davis, 2000); child abuse
(Olds et al., 1997); juvenile delinquency (Lipsey, 1988); policing strategies (Sherman and Weisburd, 1995); child care (Bell et al., 2003); public education (Kemple and Snipes, 2000); housing assistance (Orr et al., 2003); health insurance (Newhouse, 1996); income maintenance (Munnell, 1987); neighborhood effects (Kling et al., 2007); job training (Bloom et al., 1997); unemployment insurance (Robins and Spiegelman, 2001); welfare-to-work (Bloom and Michalopoulos, 2001); and electricity pricing (Aigner, 1985)[4].

A successful randomized experiment requires clear specification of five elements.

1 Research questions: What treatment or treatments are being tested? What is the counterfactual state (in the absence of treatment) with which treatments will be compared? What estimates of net impact (the impact of specific treatments versus no such treatments) are desired? What estimates of differential impact (the difference between impacts of two or more treatments) are desired?
2 Experimental design: What is the unit of randomization: individuals or groups? How many individuals or groups should be randomized? What portion of the sample should be randomized to each treatment or to a control group? How, if at all, should covariates, blocking, or matching (explained later) be used to improve the precision of impact estimates?
3 Measurement methods: What outcomes are hypothesized to be affected by the treatments being tested, and how will these outcomes be measured? What baseline characteristics, if any, will serve as covariates, blocking factors, or matching factors, and how will these characteristics be measured? How will differences in treatments be measured?
4 Implementation strategy: How will experimental sites and subjects be recruited, selected, and informed? How will they be randomized? How will treatments be delivered and how will their differences across experimental groups be maintained? What steps will be taken to ensure high-quality data?
5 Statistical analysis: The analysis of treatment effects must reflect how randomization was conducted, how treatment was provided, and what baseline data were collected. Specifically it must account for: (1) whether randomization was conducted or treatment was delivered in groups or individually (explained later); (2) whether simple randomization was conducted or randomization occurred within blocks or matched pairs; and (3) whether baseline covariates were used to improve precision.

This chapter examines the analytic core of randomized experiments, design and analysis, with a primary emphasis on design.

WHY RANDOMIZE?

There are two main reasons why well-implemented randomized experiments are the most rigorous way to measure causal effects.

They eliminate bias: Randomizing subjects to experimental groups eliminates all systematic preexisting group differences, because only chance determines which subjects are assigned to which groups[5]. It is therefore valid to attribute observed differences in future group outcomes to differences in the treatments they were offered. Hence, these causal inferences (impact estimates) are unbiased. Randomization of a given sample may produce experimental groups that differ by chance, however. These differences are random errors, not biases. Hence, the absence of bias is a property of the process of randomization, not a feature of its application to a specific sample. The laws of probability ensure that the larger the experimental sample is, the smaller preexisting group differences are likely to be.

They enable measurement of uncertainty: Experiments randomize all sources of uncertainty about impact estimates for a given sample (their internal validity). Hence, confidence intervals or tests of statistical significance can account for all of this uncertainty. No other method for measuring causal effects has this property. One cannot, however, account for all uncertainty about generalizing an impact estimate beyond a given sample (its external validity) without both randomly sampling subjects from a known population and randomly assigning them to experimental groups (which is rarely possible in social research)[6].
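
The two properties described above (estimates that are unbiased across repeated randomizations, and chance differences that shrink as the sample grows) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the normally distributed outcomes, the constant additive effect of 0.5, the sample sizes, and the function name `simulate_impact_estimates` are all hypothetical choices made for illustration and do not come from the chapter.

```python
import random
import statistics


def simulate_impact_estimates(n, true_effect, reps, seed=0):
    """Return impact estimates (treatment mean minus control mean)
    from `reps` independent randomizations of `n` subjects."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # Hypothetical outcomes each subject would have without treatment.
        baseline = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # The lottery: shuffle subject indices, first half goes to treatment.
        order = list(range(n))
        rng.shuffle(order)
        treated = set(order[: n // 2])
        # Assumed constant additive treatment effect (an illustration only).
        y_t = [baseline[i] + true_effect for i in treated]
        y_c = [baseline[i] for i in range(n) if i not in treated]
        estimates.append(statistics.fmean(y_t) - statistics.fmean(y_c))
    return estimates


small = simulate_impact_estimates(n=20, true_effect=0.5, reps=2000)
large = simulate_impact_estimates(n=200, true_effect=0.5, reps=2000)

# The average estimate should sit near the true effect for both sample sizes,
# while the spread of estimates (random error) should be smaller for n=200.
print("n=20 :", round(statistics.fmean(small), 2), round(statistics.stdev(small), 2))
print("n=200:", round(statistics.fmean(large), 2), round(statistics.stdev(large), 2))
```

Any single randomization can overestimate or underestimate the effect; it is the average over many hypothetical randomizations that centers on the true value, which is the sense in which the absence of bias is a property of the process.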
A SIMPLE EXPERIMENTAL ESTIMATOR OF CAUSAL EFFECTS

Consider an experiment where half of the sample is randomized to a treatment group that is offered an intervention and half is randomized to a control group that is not offered the intervention, and everyone adheres to their assigned treatment. Follow-up data are obtained for all sample members and the treatment effect is estimated by the difference in mean outcomes for the two groups, $\bar{Y}_T - \bar{Y}_C$. This difference provides an unbiased estimate of the average treatment effect (ATE) for the study sample, because the mean outcome for control group members is an unbiased estimate of what the mean outcome would have been for treatment group members had they not been offered the treatment (their counterfactual).

However, any given sample can yield a treatment group and control group with preexisting differences that occur solely by chance and can overestimate or underestimate the ATE. The standard error of the impact estimator, $\mathrm{SE}(\bar{Y}_T - \bar{Y}_C)$, accounts for this random error, where:

$$ \mathrm{SE}(\bar{Y}_T - \bar{Y}_C) = \sqrt{\frac{\sigma^2}{n_T} + \frac{\sigma^2}{n_C}} \tag{1} $$

given:

$n_T$ and $n_C$ = the number of treatment group members and control group members,
$\sigma^2$ = the pooled outcome variance across subjects within experimental groups[7].

The number of treatment group members and control group members are experimental design decisions. The variance of the outcome measure is an empirical parameter that must be ‘guesstimated’ from previous research when planning an experiment and can be estimated from follow-up data when analyzing experimental findings. For the discussion that follows it is useful to restate Equation 1 as:

$$ \mathrm{SE}(\bar{Y}_T - \bar{Y}_C) = \sqrt{\frac{\sigma^2}{nP(1 - P)}} \tag{2} $$

where n equals the total number of experimental sample members ($n_T + n_C$) and P equals the proportion of this sample that is randomized to treatment[8].

CHOOSING A SAMPLE SIZE AND ALLOCATION

The first steps in designing a randomized experiment are to specify its treatment, target group, and setting. The next steps are to choose a sample size and allocation that maximize precision given existing constraints. For this purpose, it is useful to measure precision in terms of minimum detectable effects (Bloom, 1995, 2005b). Intuitively, a minimum detectable effect is the smallest true treatment effect that a research design can detect with confidence. Formally, it is the smallest true treatment effect that has a specified level of statistical power for a particular level of statistical significance, given a specific statistical test.

Figure 9.1 illustrates that the minimum detectable effect of an impact estimator is a multiple of its standard error. The first bell-shaped curve (on the left of the figure) represents a t distribution for a null hypothesis of zero impact. For a positive impact estimate to be statistically significant at the α level with a one-tail test (or at the α/2 level with a two-tail test), the estimate must fall to the right of the critical t-value, $t_{\alpha}$ (or $t_{\alpha/2}$), of the first distribution. The second bell-shaped curve represents a t distribution for an alternative hypothesis that the true impact equals a specific minimum detectable effect. To have a probability (1 − β) of detecting the minimum detectable effect it must lie a distance of $t_{1-\beta}$ to the right of the critical t-value for the null hypothesis. (The probability (1 − β) represents the level of statistical power.) Hence the minimum detectable effect must lie a total distance of $t_{\alpha} + t_{1-\beta}$ (or $t_{\alpha/2} + t_{1-\beta}$) from the null hypothesis. The minimum detectable effect is either $t_{\alpha} + t_{1-\beta}$ (for a one-tail test) or $t_{\alpha/2} + t_{1-\beta}$ (for a two-tail test) times the standard error. These critical t-values depend on the number of degrees of freedom.
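
The step from Equation 1 to Equation 2 substitutes $n_T = nP$ and $n_C = n(1 - P)$, so the two expressions should give the same number. A minimal numerical check is sketched below; the function names and the input values (σ = 1,000, n = 400, P = 0.6) are hypothetical and chosen only for illustration.

```python
import math


def se_from_group_sizes(sigma, n_t, n_c):
    """Equation 1: standard error from the two group sizes."""
    return math.sqrt(sigma**2 / n_t + sigma**2 / n_c)


def se_from_allocation(sigma, n, p):
    """Equation 2: standard error from total sample size n and
    the proportion p randomized to treatment."""
    return math.sqrt(sigma**2 / (n * p * (1 - p)))


# Hypothetical planning values, not taken from the chapter.
sigma, n, p = 1000.0, 400, 0.6
n_t = round(n * p)   # 240 treatment group members
n_c = n - n_t        # 160 control group members

se_eq1 = se_from_group_sizes(sigma, n_t, n_c)
se_eq2 = se_from_allocation(sigma, n, p)
print(round(se_eq1, 2), round(se_eq2, 2))  # the two values agree
```

Writing the standard error in terms of n and P separates the two design decisions, total sample size and allocation, which is convenient when comparing alternative designs.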
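[Figure 9.1: two bell-shaped t distributions on an effect-size axis, the first for a true impact of 0 and the second for a true impact equal to the minimum detectable effect. The critical value $t_{\alpha/2}$ (or $t_{\alpha}$) is measured from the first curve and $t_{1-\beta}$ from the second. One-tail multiplier = $t_{\alpha} + t_{1-\beta}$; two-tail multiplier = $t_{\alpha/2} + t_{1-\beta}$.]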
Figure 9.1 The minimum detectable effect multiplier
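
The multiplier in Figure 9.1 is the sum of two critical values. A rough sketch of that calculation follows. It replaces the t distribution with the standard normal distribution, which is an assumption that is only reasonable when the degrees of freedom are fairly large; the function name `mde_multiplier` and the default significance and power levels are illustrative choices rather than part of the original text.

```python
from statistics import NormalDist


def mde_multiplier(alpha=0.05, power=0.80, two_tail=True):
    """Approximate minimum detectable effect multiplier.

    Uses standard normal quantiles in place of t quantiles, so it is a
    large-sample approximation (an assumption of this sketch).
    """
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    tail_area = alpha / 2 if two_tail else alpha
    critical_value = z.inv_cdf(1 - tail_area)  # stands in for t_alpha or t_alpha/2
    power_value = z.inv_cdf(power)             # stands in for t_(1 - beta)
    return critical_value + power_value


one_tail = mde_multiplier(two_tail=False)  # about 2.49
two_tail = mde_multiplier(two_tail=True)   # about 2.80
print(round(one_tail, 2), round(two_tail, 2))

# Minimum detectable effect for a hypothetical standard error of 500:
print(round(two_tail * 500))
```

With few degrees of freedom the true t-based multiplier is somewhat larger than these values, so the sketch should be read as a lower bound for small samples.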
A common convention for defining minimum detectable effects is to set statistical significance (α) at 0.05 and statistical power (1 − β) at 80 percent. When the number of degrees of freedom exceeds about 30, the multiplier equals roughly 2.5 for a one-tail test and 2.8 for a two-tail test[9]. Thus, if the standard error of an estimator of the average effect of a job-training program on future annual earnings were $500, the minimum detectable effect would be roughly $1,250 for a one-tail test and $1,400 for a two-tail test.

Consider how this applies to the experiment described above. The multiplier, $M_{n-2}$[10], times the standard error, $\mathrm{SE}(\bar{Y}_T - \bar{Y}_C)$, yields the minimum detectable effect:

$$ \mathrm{MDE}(\bar{Y}_T - \bar{Y}_C) = M_{n-2} \sqrt{\frac{\sigma^2}{nP(1 - P)}} \tag{3} $$

Since the multiplier $M_{n-2}$ is the sum of two t-values, determined by the chosen levels of statistical significance and power, the missing value that needs to be determined for the sample design is that for $\sigma^2$. This value will necessarily be a guess, but since it is a central determinant of the minimum detectable effect, it should be based on a careful search of empirical estimates for closely related studies[11].

Sometimes impacts are measured as a standardized mean difference or ‘effect size’ (ES), either because the original units of the outcome measures are not meaningful or because outcomes in different metrics must be combined or compared. (There is no reason to standardize the impact estimate for the preceding job training example.) The standardized mean difference ES equals the difference in mean outcomes for the treatment group and control group, divided by the standard deviation of outcomes across subjects within experimental groups, or:

$$ \mathrm{ES} = \frac{\bar{Y}_T - \bar{Y}_C}{\sigma} \tag{4} $$

Some researchers use the pooled within-group standard deviation to define ESs while others use the control-group standard deviation. Standardized mean ESs are therefore measured in units of standard deviations. For example, an ES of 0.25 implies an impact equal to 0.25 standard deviation. When impacts are reported in ES, precision can
be reported as a minimum detectable ES (MDES), where:

$$ \mathrm{MDES}(\bar{Y}_T - \bar{Y}_C) = M_{n-2} \sqrt{\frac{1}{nP(1 - P)}} \tag{5} $$

Table 9.1 illustrates the implications of Equations 3 and 5 for the relationship between sample size, allocation, and precision. The top panel in the table presents minimum detectable effects for a hypothetical job training program, given a standard deviation for the outcome (annual earnings) of $1,000. The bottom panel presents corresponding MDESs.

The first main observation is that increasing sample size has a diminishing absolute return for precision. For example, the first column in the table illustrates how the minimum detectable effect (or ES) declines with an increase in sample size for a balanced allocation (P = 0.5). Doubling the sample size from 50 individuals to 100 individuals reduces the minimum detectable effect from approximately $810 to $570 or by a factor of $1/\sqrt{2}$. Doubling the sample size again from 100 to 200 individuals reduces the minimum detectable effect by another factor of $1/\sqrt{2}$ from approximately $570 to $400. Thus, quadrupling the sample cuts the minimum detectable effect in half. The same pattern holds for MDESs.

The second main observation is that for a given sample size, precision decreases slowly as the allocation between the treatment and control groups becomes more imbalanced. Equation 5 implies that the MDES is proportional to $1/\sqrt{P(1 - P)}$, which equals 2.00, 2.04, 2.18, 2.50, or 3.33 when P (or its complement) equals 0.5, 0.6, 0.7, 0.8, or 0.9. Thus, for a given sample size, precision is best with a balanced allocation (P = 0.5). Because precision erodes slowly until the degree of imbalance becomes extreme (roughly P ≤ 0.2 or P ≥ 0.8), there is considerable latitude for using an unbalanced allocation. Thus, when political pressures to minimize the number of control group members are especially strong, one could use a relatively small control group. Or when the costs of treatment are particularly high, one could use a relatively large control group[12].

One of the most difficult steps in choosing a sample design is selecting a target minimum detectable effect or ES. From an economic perspective, this target should equal the
Table 9.1 Minimum detectable effect and ES for individual randomization

                 Sample allocation P/(1 − P)
Sample size n    0.5/0.5    0.6/0.4 or 0.4/0.6    0.7/0.3 or 0.3/0.7    0.8/0.2 or 0.2/0.8    0.9/0.1 or 0.1/0.9

Minimum detectable effect given σ = $1,000
50               $810       $830                  $880                  $1,010                $1,350
100              $570       $580                  $620                  $710                  $940
200              $400       $410                  $430                  $500                  $660
400              $280       $290                  $310                  $350                  $470
800              $200       $200                  $220                  $250                  $330
1,600            $140       $140                  $150                  $180                  $230

Minimum detectable effect size
50               0.81       0.83                  0.88                  1.01                  1.35
100              0.57       0.58                  0.62                  0.71                  0.94
200              0.40       0.41                  0.43                  0.50                  0.66
400              0.28       0.29                  0.31                  0.35                  0.47
800              0.20       0.20                  0.22                  0.25                  0.33
1,600            0.14       0.14                  0.15                  0.18                  0.23

Source: Computations by the author.
Note: Minimum detectable effect sizes are for a two-tail hypothesis test with statistical significance of
0.05 and statistical power of 0.80.
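
The pattern in Table 9.1 can be approximated with Equation 5. The sketch below is not the author's computation: it substitutes standard normal quantiles for the t-based multiplier $M_{n-2}$, an assumption that makes the results slightly smaller than the tabled values for the smallest samples (for n = 50 and P = 0.5 it gives about 0.79 rather than 0.81). The function name `mdes` is hypothetical.

```python
import math
from statistics import NormalDist


def mdes(n, p, alpha=0.05, power=0.80):
    """Approximate minimum detectable effect size for a two-tail test.

    Normal quantiles stand in for the t-based multiplier (an assumption
    of this sketch), so small-sample values are slightly understated.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return multiplier * math.sqrt(1.0 / (n * p * (1 - p)))


allocations = (0.5, 0.6, 0.7, 0.8, 0.9)
for n in (50, 100, 200, 400, 800, 1600):
    row = [round(mdes(n, p), 2) for p in allocations]
    print(n, row)

# Multiplying an MDES by a standard deviation of $1,000 gives the
# corresponding minimum detectable effect in dollars.
print(round(mdes(400, 0.5) * 1000))
```

Because n and P enter only through $nP(1 - P)$, quadrupling n halves the result for any allocation, and moving P away from 0.5 raises it slowly at first and then more steeply.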
smallest true impact that would produce benefits that exceed intervention costs. From a political perspective, it should equal the smallest true impact deemed policy-relevant. From a programmatic perspective, the target should equal the smallest true impact that exceeds known impacts from related interventions.

The most popular benchmark for gauging standardized ESs is Cohen’s (1977/1988) prescription (based on little empirical evidence) that values of 0.20, 0.50, and 0.80 be considered small, moderate, and large. Lipsey (1990) subsequently provided empirical support for this prescription from a synthesis of 186 meta-analyses of intervention studies. The bottom third of ESs in Lipsey’s synthesis ranges from 0.00 to 0.32, the middle third ranges from 0.33 to 0.55, and the top third ranges from 0.56 to 1.20. Both authors suggest, however, that their general guidelines do not apply to many situations. For example, recent research suggests that much smaller ESs are policy-relevant for educational interventions. Findings from the Tennessee Class Size Experiment indicate that reducing elementary school class size from 22–26 students to 13–17 students increased performance on standardized reading and math tests by 0.1 to 0.2 standard deviation (Nye et al., 1999). More recently, Kane’s (2004) study of grade-to-grade improvement in math and reading on a nationally normed test suggests that one full year of elementary school attendance increases student achievement by roughly 0.25 standard deviation. These results highlight the importance of basing decisions about needed precision on the best existing evidence for the context being studied[13].

ESTIMATING CAUSAL EFFECTS WITH NONCOMPLIANCE

In most social experiments, some treatment group members (‘no-shows’) do not receive treatment and some control group members (‘crossovers’) do. This noncompliance dilutes the experimental treatment contrast, causing it to understate the average treatment effect. Consequently, it is important to distinguish between the following two impact questions:

1 What is the average effect of offering treatment?
2 What is the average effect of receiving treatment?

The first question asks about the impact of a treatment offer. This impact, which can be estimated experimentally, is often called the average effect of ‘intent to treat’ (ITT). Since voluntary programs can only offer treatment (they cannot require it), the effect of ITT is a relevant consideration for making policy decisions about such programs. Furthermore, since even mandatory programs often have incomplete compliance, the effect of ITT can be an important consideration for judging them.

The second question above asks about the impact of treatment receipt. It is often called the average impact of ‘treatment on the treated’ (TOT) and is typically the question of interest for developers of interventions who want to know what they can achieve by full implementation of their ideas. However, in many instances this impact question may not be as policy-relevant as the first one, because rarely can treatment receipt be mandated.

There is no valid way to estimate the second type of effect experimentally, because there is no way to know which control group members are counterparts to treatment group members who receive treatment. To estimate such impacts, Bloom (1984) developed an extension of the experimental method, which was later expanded by Angrist et al. (1996)[14]. To see how this approach works, it is useful to adopt a framework and notation that is now conventional for presenting it. This framework comprises three variables: Y, the outcome measure; Z, which equals one for subjects randomized to treatment and zero otherwise; and D, which equals one for subjects who receive the treatment and zero otherwise.

Consider an experiment in which some treatment group members do not receive treatment (they become no-shows) but no control group members receive treatment (there are
no crossovers). If no-shows experience no effect from the intervention (because they are not exposed to it) or from randomization per se, the average effect of ITT equals the weighted mean of TOT for treatment recipients and zero for no-shows, with weights equal to the treatment receipt rate, $E(D \mid Z = 1)$, and the no-show rate, $1 - E(D \mid Z = 1)$, such that:

$$ \mathrm{ITT} = E(D \mid Z = 1)\,\mathrm{TOT} + [1 - E(D \mid Z = 1)] \cdot 0 = E(D \mid Z = 1)\,\mathrm{TOT} \tag{6} $$

Equation 6 implies that:

$$ \mathrm{TOT} = \frac{\mathrm{ITT}}{E(D \mid Z = 1)} \tag{7} $$

The effect of TOT thus equals the effect of ITT divided by the expected receipt rate for treatment group members. For example, if the effect of ITT for a job training program were a $1,000 increase in annual earnings, and half of the treatment group members received treatment, then the effect of TOT would be $1,000/0.5 or $2,000. This adjustment allocates all of the treatment effect to only those treatment group members who receive treatment. Equation 7 represents the true effect of TOT for a given population. The corresponding sample estimator, $\widehat{\mathrm{TOT}}$, is:

$$ \widehat{\mathrm{TOT}} = \frac{\bar{Y}_T - \bar{Y}_C}{(\bar{D} \mid Z = 1)} \tag{8} $$

where $(\bar{D} \mid Z = 1)$ equals the observed treatment receipt rate for the treatment group. If no-shows experience no effect, this estimator is statistically consistent and its estimated standard error is approximately:

$$ \mathrm{se}(\widehat{\mathrm{TOT}}) \approx \frac{\mathrm{se}(\bar{Y}_T - \bar{Y}_C)}{(\bar{D} \mid Z = 1)} \tag{9} $$

Hence, both the point estimate and standard error are scaled by the treatment receipt rate.

The preceding approach does not require that no-shows be similar to treatment recipients. It requires only that no-shows experience no effect from treatment or randomization[15]. In addition, because of potential heterogeneity of treatment effects, the effect of TOT generalizes only to experimental treatment recipients and does not necessarily equal the average treatment effect for the full study sample.

Now add crossovers (control group members who receive treatment) to the situation, which further dilutes the experimental treatment contrast. Nonetheless, the difference in mean outcomes for the treatment group and control group provides an unbiased estimate of the effect of ITT. Thus, it addresses the first impact question stated above. To address the second question requires a more complex analytic framework with additional assumptions. This framework, developed by Angrist et al. (1996), is based on four conceptual subgroups, which because of randomization comprise the same proportion of the treatment group and control group, in expectation. Figure 9.2 illustrates the framework and how it relates to the concepts of no-shows and crossovers. The first stacked bar in the figure represents all treatment group members (for whom Z = 1) and the second stacked bar represents all control group members (for whom Z = 0). Treatment group members who do not receive treatment (for whom D = 0) are no-shows, and control group members who do receive treatment (for whom D = 1) are crossovers.

Randomization induces treatment receipt for two of the four subgroups in the Angrist et al. framework: ‘compliers’ and ‘defiers.’ Compliers receive treatment only if they are randomized to the treatment group, and defiers receive treatment only if they are randomized to the control group. Thus, compliers add to the effect of ITT, and defiers subtract from it. Randomization does not influence treatment receipt for the other two groups: ‘always-takers,’ who receive treatment regardless of their randomization status, and ‘never-takers,’ who do not receive treatment regardless of their randomization status. Never-takers experience no treatment effect in the treatment group or control group, and always-takers experience the same effect in both groups, which cancels out
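[Figure 9.2: two stacked bars, one per experimental group, each divided into the same four subgroups.]

                 Treatment Group (Z=1)    Control Group (Z=0)
Compliers        D=1                      D=0
Always-takers    D=1                      D=1
Never-takers     D=0                      D=0
Defiers          D=0                      D=1
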
Figure 9.2 A hypothetical experiment including no-shows and crossovers

Source: Bloom 2005a.
Note: D equals 1 if the treatment would be received and 0 otherwise.
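
The no-show adjustment in Equations 7 to 9 amounts to dividing the ITT estimate and its standard error by the treatment receipt rate. A minimal sketch is given below. The function names are hypothetical, the ITT standard error of $300 is an invented illustration, and the sketch assumes, as the text does for this case, that there are no crossovers and that no-shows experience no effect.

```python
def tot_estimate(itt_estimate, receipt_rate):
    """Equation 8: ITT estimate divided by the observed treatment
    receipt rate of the treatment group (no crossovers assumed)."""
    if not 0 < receipt_rate <= 1:
        raise ValueError("receipt_rate must be in the interval (0, 1]")
    return itt_estimate / receipt_rate


def tot_standard_error(itt_standard_error, receipt_rate):
    """Equation 9: approximate standard error, scaled the same way."""
    if not 0 < receipt_rate <= 1:
        raise ValueError("receipt_rate must be in the interval (0, 1]")
    return itt_standard_error / receipt_rate


# The job training example from the text: a $1,000 ITT effect with
# half of the treatment group receiving treatment.
print(tot_estimate(1000.0, 0.5))        # 2000.0

# Hypothetical ITT standard error of $300 (not from the chapter).
print(tot_standard_error(300.0, 0.5))   # 600.0
```

Because the estimate and its standard error are scaled by the same factor, the ratio of the two, and therefore the test of statistical significance, is approximately unchanged by the adjustment.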
in the overall difference between treatment and control groups. Hence, always-takers and never-takers do not contribute information about treatment effects.

If defiers do not exist[16], which is reasonable to assume in many situations, the effect of treatment for compliers, termed by Angrist et al. (1996) the Local Average Treatment Effect (LATE), is[17]:

$$ \mathrm{LATE} = \frac{\mathrm{ITT}}{E(D \mid Z = 1) - E(D \mid Z = 0)} \tag{10} $$

Thus to estimate the LATE from an experiment, one simply divides the difference in mean outcomes for the treatment and control groups by their difference in treatment receipt rates, or:

$$ \widehat{\mathrm{LATE}} = \frac{\bar{Y}_T - \bar{Y}_C}{(\bar{D} \mid Z = 1) - (\bar{D} \mid Z = 0)} \tag{11} $$

The estimated LATE is the ratio of the estimated impact of randomization on outcomes and the estimated impact of randomization on treatment receipt[18]. Angrist et al. show that this ratio is a simple form of instrumental variables analysis called a Wald estimator (Wald, 1940).

Returning to our previous example, assume that there is a $1,000 difference in mean annual earnings for a treatment group and control group; half of the treatment group receives treatment and one-tenth of the control group receives treatment. The estimated LATE equals the estimated impact on the outcome ($1,000), divided by the estimated impact on treatment receipt rates (0.5 − 0.1). This ratio equals $1,000/0.4 or $2,500[19].

When using this approach to estimate treatment effects, it is important to clearly specify the groups to which it applies, because different groups may experience different effects from the same treatment, and not all groups and treatment effects can be observed without making further
assumptions. The impact of ITT applies to the full treatment group. So both the target group and its treatment effect can be observed. The LATE, which can be observed, applies to compliers, who cannot be observed. The effect of TOT, which cannot be observed, applies to all treatment group members who receive treatment (compliers plus always-takers), who can be observed.

USING COVARIATES AND BLOCKING TO IMPROVE PRECISION

The two main approaches for improving the precision of randomized experiments (covariates and blocking) use the predictive power of past information about sample members to reduce unexplained variation in their future outcomes. This reduces the standard error of the impact estimator and its corresponding minimum detectable effect[20]. To examine these approaches, it is useful to reformulate the impact of ITT as the following bivariate regression:

$$ Y_i = \alpha + \beta_0 T_i + \varepsilon_i \tag{12} $$

where:

$Y_i$ = the outcome for sample member i
$T_i$ = one if sample member i is randomized to the treatment group and zero otherwise
$\varepsilon_i$ = a random error that is independently and identically distributed across sample members within experimental groups, with a mean of 0 and a variance of $\sigma^2$.

α is the expected mean outcome without treatment and $\beta_0$ is the average effect of ITT. Thus $\beta_0$ equals the difference in expected outcomes for the treatment group and control group, and its estimator, $\hat{\beta}_0$, equals the difference in mean outcomes for the treatment and control groups in the experimental sample.

Using $k^*$ baseline characteristics, $X_{ki}$, as covariates to reduce the unexplained variation in $Y_i$ produces the following multiple regression model for estimating intervention effects:

$$ Y_i = \alpha + \beta_0 T_i + \sum_{k=1}^{k^*} B_k X_{ki} + \varepsilon_i \tag{13} $$

Defining $R_A^2$ as the proportion of pooled unexplained variation in the outcome within experimental groups predicted by covariates, the MDES is[21]:

$$ \mathrm{MDES}(\hat{\beta}_0) = M_{n-k^*-2} \sqrt{\frac{1 - R_A^2}{nP(1 - P)}} \tag{14} $$

There are two differences between the MDES in Equation 14 with covariates and Equation 5 without covariates. The first difference involves the multipliers, $M_{n-2}$ and $M_{n-k^*-2}$, where the latter multiplier accounts for the loss of $k^*$ degrees of freedom from estimating coefficients for $k^*$ covariates. With roughly 40 or more sample members and 10 or fewer covariates, this difference is however negligible[22].

The second difference is the term $1 - R_A^2$ with covariates in Equation 14, instead of the value 1 in Equation 5 without covariates. The term $1 - R_A^2$ implies that the MDES decreases as the predictive power of covariates increases for a given sample size and allocation. In this way, covariates can increase effective sample size. For example, an $R_A^2$ of 0.25 yields an effective sample that is one-third larger than that without covariates; an $R_A^2$ of 0.50 yields an effective sample that is twice as large; and an $R_A^2$ of 0.75 yields an effective sample that is four times as large.

Several points are important to note about using covariates with experiments. First, they are not needed to eliminate bias, because randomization has done so already[23]. Thus, values for the term $\beta_0$ in Equations 12 and 13 are identical. Second, it is good practice to specify all covariates in advance of the impact analysis, preferably when an experiment is being designed. This helps to avoid subsequent data mining. Third, the best predictors of future outcomes are
typically past outcomes. For example, past student achievement is usually the best predictor of future student achievement. This is because past outcomes reflect most factors that determine future outcomes. Fourth, some outcomes are more predictable than others, and thus covariates provide greater precision gains for them. For example, the correlation between individual standardized test scores is typically stronger for high school students than for elementary school students (Bloom et al., 2005).

The second approach to improving precision is to block or stratify experimental sample members by some combination of their baseline characteristics, and then randomize within each block or stratum. The extreme case of two sample members per block is an example of matching. Factors used for blocking in social research typically include geographic location, organizational units, demographic characteristics, and past outcomes. To compute an unbiased estimate of the impact of ITT from such designs requires computing impact estimates for each block and pooling estimates across blocks. One way to do this in a single step is to add to the impact regression a series of indicator variables that represent each of the $m^*$ blocks, and suppress the intercept, α, yielding:

$$ Y_i = \beta_0 T_i + \sum_{m=1}^{m^*} \gamma_m S_{mi} + \varepsilon_i \tag{15} $$

where:

$S_{mi}$ = one if sample member i is from block (or stratum) m and zero otherwise.

The estimated value of $\beta_0$ provides an unbiased estimator of the effect of ITT. The MDES of this estimator can be expressed as:

$$ \mathrm{MDES}(\hat{\beta}_0) = M_{n-m^*-1} \sqrt{\frac{1 - R_B^2}{nP(1 - P)}} \tag{16} $$

where:

$R_B^2$ = the proportion of unexplained variation in the outcome within experimental groups (pooled) predicted by the blocks.

There are two differences in the expressions for minimum detectable effects with and without blocking (Equations 16 and 5). The first difference involves the multipliers, $M_{n-m^*-1}$ versus $M_{n-2}$, which account for the loss of one degree of freedom per block and the gain of one degree of freedom from suppressing the intercept. With samples of more than about 40 members in total and 10 or fewer blocks, there is very little difference between these two multipliers.

The second difference is the addition of the term $1 - R_B^2$ in Equation 16 to account for the predictive power of blocking. The more similar sample members are within blocks and the more different blocks are from each other, the higher this predictive power is. This is where precision gains come from. Note, however, that for samples with fewer than about 10 subjects, precision losses due to reducing the number of degrees of freedom by blocking can sometimes outweigh precision gains due to the predictive power of blocking. This is most likely to occur in experiments that randomize small numbers of groups (discussed later).

Another reason to block sample members is to avoid an ‘unhappy’ randomization with embarrassing treatment and control group differences on a salient characteristic. Such differences can reduce the face validity of an experiment, thereby undermining its credibility. Blocking first on the salient characteristic eliminates such a mismatch.

Sometimes researchers wish to assure treatment and control group matches on multiple characteristics. One way to do so is to define blocks in terms of combinations of characteristics (e.g. age, race, and gender). But doing so can become complicated in practice due to uneven distributions of sample members across blocks, and the consequent need to combine blocks, often in ad hoc ways. A second approach is to specify a composite index of baseline characteristics and create blocks based on intervals of this index[24]. Using either approach, the quality of the match on any given characteristic typically declines as the number
of matching variables increases. So it is important to set priorities for which variables to match.²⁵

Regardless of how blocks are defined, one’s impact analysis must account for them if they are used. Failing to do so would bias estimates of standard errors. In addition, it is possible to use blocking in combination with covariates. If so, both features of the experimental design should be represented in the experimental analysis.

RANDOMIZING GROUPS TO ESTIMATE INTERVENTION EFFECTS

This section introduces a type of experimental design that is growing rapidly in popularity: the randomization of intact groups or clusters.²⁶ Randomizing groups makes it possible to measure the effectiveness of interventions that are designed to affect entire groups or are delivered in group settings, such as communities, schools, hospitals, or firms. For example, schools have been randomized to measure the impacts of whole school reforms (Cook et al., 2000; Borman et al., 2005) and school-based risk-prevention campaigns (Flay, 2000); communities have been randomized to measure the impacts of community health campaigns (Murray et al., 1994); small local areas have been randomized to study the impacts of police patrol interventions (Sherman and Weisburd, 1995); villages have been randomized to study the effects of a health, nutrition, and education initiative (Teruel and Davis, 2000); and public housing developments have been randomized to study the effects of a place-based HIV prevention program (Sikkema et al., 2000) and a place-based employment program (Bloom and Riccio, 2005).

Group randomization provides unbiased estimates of intervention effects for the same reasons that individual randomization does. However, the statistical power or precision of group randomization is less than that for individual randomization, often by a lot. To see this, consider the basic regression model for estimating ITT effects with group randomization:

$$Y_{ij} = \alpha + \beta_0 T_j + e_j + \varepsilon_{ij} \tag{17}$$

where:

$Y_{ij}$ = the outcome for individual i from group j
α = the mean outcome without treatment
$\beta_0$ = the average impact of ITT
$T_j$ = 1 for groups randomized to treatment and 0 otherwise
$e_j$ = an error that is independently and identically distributed between groups with a mean of 0 and a variance of $\tau^2$
$\varepsilon_{ij}$ = an error that is independently and identically distributed between individuals within groups with a pooled mean of zero and variance of $\sigma^2$.

Equation 17 for group randomization has an additional random error, $e_j$, relative to Equation 12 for individual randomization. This error reflects how mean outcomes vary across groups, which reduces the precision of group randomization.

To see this, first note that the relationship between group-level variance, $\tau^2$, and individual-level variance, $\sigma^2$, can be expressed as an intra-class correlation, ρ, where:

$$\rho = \frac{\tau^2}{\tau^2 + \sigma^2} \tag{18}$$

ρ equals the proportion of total variation across all individuals in the target population ($\tau^2 + \sigma^2$) that is due to variation between groups ($\tau^2$). If there is no variation in mean outcomes between groups ($\tau^2 = 0$), ρ equals zero. If there is no variation in individual outcomes within groups ($\sigma^2 = 0$), ρ equals one.

Consider a study that randomizes a total of J groups in proportion P to treatment with a harmonic mean value of n individuals per group. The ratio of the standard error of this impact estimator to that for individual
randomization of the same total number of subjects (Jn) is referred to as a design effect (DE), where:

$$DE = \sqrt{1 + (n - 1)\rho} \tag{19}$$

As the intra-class correlation (ρ) increases, the DE increases, implying a larger standard error for group randomization relative to individual randomization. This is because a larger ρ implies greater random variation across groups. The value of ρ varies typically from about 0.01 to 0.20, depending on the nature of the outcome being measured and the type of group being randomized.

For a given total number of individuals, the DE also increases as the number of individuals per group (n) increases. This is because for a given total number of individuals, larger groups imply fewer groups randomized. With fewer groups randomized, larger treatment and control group differences are likely for a given sample.²⁷

The DE has important implications for designing group-randomized studies. For example, with ρ equal to 0.10 and n equal to 100, the standard error for group randomization is 3.3 times that for individual randomization. To achieve the same precision, group randomization would need almost 11 times as many sample members. Note that the DE is independent of J and depends only on the values of n and ρ.

The different standard errors for group randomization and individual randomization also imply a need to account for group randomization during the experimental analysis. This can be done by using a multilevel model that specifies separate variance components for groups and individuals (for example, see Raudenbush and Bryk, 2002). In the preceding example, using an individual-level model, which ignores group-level variation, would estimate standard errors that are one-third as large as they should be. Thus, as Jerome Cornfield (1978: 101) aptly observed: ‘Randomization by group accompanied by an analysis appropriate to randomization by individual is an exercise in self-deception.’

Choosing a sample size and allocation for group-randomized studies means choosing values for J, n, and P. Equation 20 illustrates how these choices influence MDES (Bloom et al., 2005).

$$MDES(\hat{\beta}_0) = M_{J-2} \sqrt{\frac{\rho}{P(1-P)J} + \frac{1-\rho}{P(1-P)Jn}} \tag{20}$$

This equation indicates that the group-level variance (ρ) is divided by the total number of randomized groups, J, whereas the individual-level variance (1 − ρ) is divided by the total number of individuals, Jn.²⁸ Hence, increasing the number of randomized groups reduces both variance components, whereas increasing the number of individuals per group reduces only one component. This result illustrates one of the most important design principles for group-randomized studies: The number of groups randomized influences precision more than the size of the groups randomized.

The top panel of Table 9.2 illustrates this point by presenting MDESs for an intra-class correlation of 0.10, a balanced sample allocation, and no covariates. Reading across each row illustrates that, after group size reaches about 60 individuals, increasing it affects precision very little. For very small randomized groups (with fewer than about 10 individuals each), changing group size can have a more pronounced effect on precision.

Reading down any column in the top panel illustrates that increasing the number of groups randomized can improve precision appreciably. Minimum detectable effects are approximately inversely proportional to the square root of the number of groups randomized once the number of groups exceeds about 20.

Equation 21 illustrates how covariates affect precision with group randomization.²⁹

$$MDES(\hat{\beta}_0) = M_{J-g^*-2} \sqrt{\frac{\rho(1-R_2^2)}{P(1-P)J} + \frac{(1-\rho)(1-R_1^2)}{P(1-P)Jn}} \tag{21}$$
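The design-effect arithmetic discussed earlier in this section (Equations 18 and 19) can be checked with a few lines of Python. This is only a sketch: it assumes the design effect is read as a ratio of standard errors, that is, the square root of 1 + (n − 1)ρ, which is the reading consistent with the figures of 3.3 and almost 11 quoted in the text. The function names and the example variance components (0.10 and 0.90) are illustrative choices, not part of the chapter.

```python
import math


def intraclass_correlation(tau2, sigma2):
    """Equation 18: share of total variance that lies between groups."""
    return tau2 / (tau2 + sigma2)


def design_effect(n, rho):
    """Ratio of the group-randomized standard error to the
    individually randomized one, assumed to be sqrt(1 + (n - 1) * rho)."""
    return math.sqrt(1 + (n - 1) * rho)


# Illustrative variance components chosen so that rho = 0.10.
rho = intraclass_correlation(tau2=0.10, sigma2=0.90)

# Standard-error inflation for groups of 100 individuals.
se_ratio = design_effect(n=100, rho=rho)

# Sample-size inflation needed to restore the same precision:
# standard errors shrink with the square root of the sample size,
# so the required sample grows with the square of the ratio.
sample_ratio = se_ratio ** 2

print(f"rho = {rho:.2f}")
print(f"standard error ratio = {se_ratio:.2f}")
print(f"sample size ratio = {sample_ratio:.1f}")
```

Because J does not appear anywhere in `design_effect`, the sketch also makes visible the point that the DE depends only on n and ρ.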
Table 9.2 Minimum detectable effect size for balanced group randomization with ρ = 0.10

Total groups                     Group size (n)
randomized (J)        10      30      60     120     480

No covariates
      10            0.88    0.73    0.69    0.66    0.65
      30            0.46    0.38    0.36    0.35    0.34
      60            0.32    0.27    0.25    0.24    0.24
     120            0.23    0.19    0.18    0.17    0.17
     480            0.11    0.09    0.09    0.08    0.08

Group-level covariate ($R_2^2$ = 0.6)
      10            0.73    0.54    0.47    0.44    0.41
      30            0.38    0.28    0.25    0.23    0.22
      60            0.27    0.20    0.17    0.16    0.15
     120            0.19    0.14    0.12    0.11    0.11
     480            0.09    0.07    0.06    0.06    0.05

Source: Computations by the author.
Note: Minimum detectable effect sizes are for two-tail hypothesis tests with statistical significance of 0.05 and statistical power of 0.80.
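The entries in Table 9.2 can be approximated directly from Equations 20 and 21. The sketch below is an illustration under stated assumptions rather than the author's own computation. It assumes the multiplier M is the sum of the two-tail critical t value and the t value for the desired power, each with the indicated degrees of freedom, which is the usual definition in the minimum-detectable-effect literature. To stay within the Python standard library, the t quantiles are obtained by numerical integration and bisection. All function and parameter names (`mdes_group`, `r2_group`, `r2_indiv`, `g`) are hypothetical.

```python
import math


def t_cdf(x, df, steps=1000):
    """CDF of Student's t at x >= 0, via Simpson's rule on the density."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    const /= math.sqrt(df * math.pi)

    def pdf(u):
        return const * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Inverse CDF for 0.5 < p < 1, by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:
        hi *= 2
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def multiplier(df, alpha=0.05, power=0.80):
    """Assumed definition: two-tail critical value plus power value."""
    return t_quantile(1 - alpha / 2, df) + t_quantile(power, df)


def mdes_group(J, n, rho, P=0.5, r2_group=0.0, r2_indiv=0.0, g=0):
    """Equation 21; with no covariates (all defaults) it reduces to Equation 20."""
    df = J - g - 2
    group_term = rho * (1 - r2_group) / (P * (1 - P) * J)
    indiv_term = (1 - rho) * (1 - r2_indiv) / (P * (1 - P) * J * n)
    return multiplier(df) * math.sqrt(group_term + indiv_term)


# Three cells of the table, chosen for illustration.
print(round(mdes_group(J=10, n=10, rho=0.10), 2))
print(round(mdes_group(J=30, n=60, rho=0.10), 2))
print(round(mdes_group(J=30, n=60, rho=0.10, r2_group=0.6, g=1), 2))
```

These three calls should land within about 0.01 of the corresponding table entries (0.88, 0.36 and 0.25). Some other cells differ from the published values in the second decimal place, presumably because of rounding or the degrees-of-freedom convention used when the table was computed, so the sketch is best treated as a planning aid rather than an exact replication.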
In Equation 21:

$R_1^2$ = the proportion of individual variance (at level one) predicted by covariates,
$R_2^2$ = the proportion of group variance (at level two) predicted by covariates,
$g^*$ = the number of group covariates used (note: the number of individual-level covariates does not affect the number of degrees of freedom).

With group randomization, multiple levels of predictive power are at play: $R_1^2$ for level one (individuals) and $R_2^2$ for level two (groups).³⁰ Group-level covariates can reduce the unexplained group-level variance ($\tau^2$), whereas individual-level covariates can reduce both the group-level and individual-level variances ($\tau^2$ and $\sigma^2$). However, because group-level variance is typically the binding constraint on precision, its reduction is usually most important. This is analogous to the fact that increasing the number of groups is usually more important than increasing group size. Thus, in some cases group-level covariates, which can be simple and inexpensive to obtain, provide as much gain in precision as do individual covariates (Bloom et al., 1999, 2005).

Because of group-randomization’s large sample size requirements, it is especially important to use covariates to predict group-level variances. The bottom panel of Table 9.2 illustrates this point. It presents the MDES for each sample configuration in the top panel when a covariate that predicts 60 percent of the group-level variance ($R_2^2$ = 0.6) is included. For example, adding this covariate to a design that randomizes 30 groups with 60 individuals each reduces the MDES from 0.36 to 0.25, which is equivalent to doubling the number of groups randomized.

Widespread application of group randomization is only beginning, and much remains to be learned about how to use the approach effectively for social research. One of the most important pieces of information required to do so is a comprehensive inventory of parameter values needed to design such studies: ρ, $R_1^2$ and $R_2^2$. These values vary widely, depending on the type of outcome being measured, the type of group being randomized, and the type of covariate/s being used.³¹

FUTURE FRONTIERS

During the past several decades, randomized experiments have been used to address a
rapidly expanding range of social science questions; experimental designs have become increasingly sophisticated; and statistical methods have become more advanced. So what are the frontiers for future advances?

One frontier involves expanding the geographic scope of randomized experiments in the social sciences. To date, the vast majority of such experiments have been conducted in the United States, although important exceptions exist in both developed and developing countries.³² Given the promise of the approach, much more could be learned by promoting its use throughout the world.

A second frontier involves unpacking the ‘black box’ of social experiments. Experiments are uniquely qualified to address questions like: what did an intervention cause to happen? But they are not well suited to address questions like: why did an intervention have or not have an effect?³³ Two promising approaches to such questions are emerging, which combine nonexperimental statistical methods with experimental designs.

One approach uses instrumental variables analysis to examine the causal paths between randomization and final outcomes by comparing intervention effects on intermediate outcomes (mediating variables) with those on final outcomes.³⁴ The other approach uses methods of research synthesis (meta-analysis or multilevel models that pool primary data) with multiple experiments, multiple experimental sites, or both to estimate how intervention effects vary with treatment implementation, sample characteristics, and local context.³⁵ It is especially important for this latter approach to have high-quality implementation research that is conducted in parallel with randomized experiments.

Perhaps the most important frontier for randomized experiments in the social sciences is the much-needed expansion of organizational and scientific capacity to implement them successfully on a much broader scale. To conduct this type of research well requires high levels of scientific and professional expertise, which at present exist only at a limited number of institutions. It is therefore hoped that this chapter will contribute to a broader application of this approach to social research.

ACKNOWLEDGMENTS

This chapter was supported by the Judith Gueron Fund for Methodological Innovation in Social Policy Research at MDRC, which was created through gifts from the Annie E. Casey, Rockefeller, Jerry Lee, Spencer, William T. Grant and Grable Foundations. Many thanks are due to Richard Dorsett, Carolyn Hill, Rob Hollister, and Charles Michalopoulos for their helpful suggestions.

NOTES

1 References to randomizing subjects to compare treatment effects date back to the seventeenth century (Van Helmont, 1662), although the earliest documented use of the method was in the late nineteenth century for research on sensory perception (Peirce and Jastrow, 1884/1980). There is some evidence that randomized experiments were used for educational research in the early twentieth century (McCall, 1923). But it was not until Fisher (1925 and 1935) combined statistical methods with experimental design that the method we know today emerged.

2 Marks (1997) provides an excellent history of this process.

3 See Bloom (2005a) for an overview of group-randomized experiments; see Donner and Klar (2000) and Murray (1998) for textbooks on the method.

4 For further examples, see Greenberg and Shroder (1997).

5 Absent treatment, the expected values of all past, present, and future characteristics are the same for a randomized treatment group and control group. Hence, the short-term and long-term future experiences of the control group provide valid estimates of what these experiences would have been for the treatment group had it not been offered the treatment.

6 Three studies that used national probability sampling and random assignment are the evaluations of Upward Bound (Myers et al., 2004), Head Start (Puma et al., 2006) and the Job Corps (Schochet, 2006).

7 The present discussion assumes a common outcome variance for the treatment and control groups.

8 Note that Pn equals $n_T$ and (1 − P)n equals $n_C$.

9 When the number of degrees of freedom becomes smaller, the multiplier becomes larger as the t distribution becomes fatter in its tails.

10 The subscript n − 2 equals the number of degrees of freedom for a treatment and control group difference of means, given a common variance for the two groups.

11 When the outcome measure is a one/zero binary variable (e.g. employed = 1 or not employed = 0) the variance estimate is p(1 − p)/n where p is the probability of a value equal to one. The usual conservative practice in this case is to choose p = 0.5, which yields the maximum possible variance.

12 The preceding discussion makes the conventional assumption that $\sigma^2$ is the same for the treatment and control groups. But if the treatment affects different sample members differently, it can create a $\sigma^2$ for the treatment group which differs from that for the control group (Bryk and Raudenbush, 1988). This is a particular instance of heteroscedasticity. Assuming that these two standard deviations are equal to each other can produce a bias in estimates of the standard error of the impact estimator (Gail et al., 1996). Two ways to eliminate this problem are to: (1) use a balanced sample allocation and (2) estimate separate variances for the treatment and control groups (Bloom, 2005b).

13 Weisburd (1993), among others, found that large samples can sometimes provide less statistical power than small samples because large samples may have weaker treatment implementation. Researchers should consider this possibility when designing experiments, although there are no clear quantitative guidelines for doing so.

14 Angrist (2005) and Gennetian et al. (2005) illustrate the approach.

15 This is a specific case of the exclusion principle specified by Angrist et al. (1996).

16 Angrist et al. (1996) refer to this condition as monotonicity.

17 This formulation assumes that the average effect of treatment on always-takers is the same whether they are randomized to treatment or control status.

18 The expression for LATE in Equation 10 simplifies to the expression for TOT in Equation 7 when there are no-shows but no crossovers. Both expressions represent ITT divided by the probability of being a complier. When there are crossovers (but no defiers), the probability of being a complier equals the probability of receiving the treatment if randomized to the treatment group, minus the probability of being an always-taker. When there are no crossovers, there are no always-takers.

19 In the present analysis, treatment receipt is a mediating variable in the causal path between randomization and the outcome. Gennetian et al. (2005) show how the same approach (using instrumental variables with experiments) can be used to study causal effects of other mediating variables.

20 The remainder of this chapter assumes a common variance for treatment and control groups.

21 One way to estimate $R_A^2$ from a dataset would be to first estimate Equation 12 and compute residual outcome values for each sample member. The next step would be to regress the residuals on the covariates. The resulting r-square for the second regression is an estimate of $R_A^2$.

22 See Bloom (2005b) for a discussion of this issue.

23 Covariates can also provide some protection against selection bias due to sample attrition.

24 Such indices include propensity scores (Rosenbaum and Rubin, 1983) and Mahalanobis distance functions (http://en.wikipedia.org/wiki/Mahalanobis_distance).

25 One controversial issue is whether to treat blocks as ‘fixed effects,’ which represent a defined population, or ‘random effects,’ which represent a random sample from a larger population. Equations 15 and 16 treat blocks as fixed effects. Raudenbush et al. (2005) present random-effects estimators for blocking.

26 Bloom et al. (2005), Donner and Klar (2000), and Murray (1998) provide detailed discussions of this approach; Boruch and Foley (2000) review its applications.

27 The statistical properties of group randomization in experimental research are much like those of cluster sampling in survey research (Kish, 1965).

28 When total student variance ($\tau^2 + \sigma^2$) is standardized to a value of one by substituting the intra-class correlation (ρ) into the preceding expressions, ρ represents $\tau^2$ and (1 − ρ) represents $\sigma^2$.

29 Raudenbush (1997) and Bloom et al. (2005) discuss in detail how covariates affect precision with group randomization.

30 The basic principles discussed here extend to situations with more than two levels of clustering.

31 Existing sources of this information include, among others: Bloom et al. (1999, 2005); Hedges and Hedberg (2005); Murray and Blitstein (2003); Murray and Short (1995); Schochet (2005); Siddiqui et al. (1996); and Ukoumunne et al. (1999).

32 Some other countries where randomized social experiments have been conducted include: the UK (Walker et al., 2006); Mexico (Shultz, 2004); Colombia (Angrist et al., 2002); Israel (Angrist and Lavy, 2002); India (Banerjee et al., 2005; Duflo and Hanna, 2005); and Kenya (Miguel and Kremer, 2004). For a review of randomized experiments in developing countries, see Kremer (2003).

33 Two studies that tried to open the black box of treatment effects experimentally are the Riverside, California Welfare Caseload Study, which randomized different caseload sizes to welfare workers (Riccio et al., 1994) and the Columbus, Ohio, comparison of
separate versus integrated job functions for welfare workers (Scrivener and Walter, 2001).

34 For example, Morris and Gennetian (2003), Gibson et al. (2005), Liebman et al. (2004), and Ludwig et al. (2001) used instrumental variables with experiments to measure the effects of mediating variables on final outcomes.

35 Heinrich (2002) and Bloom et al. (2003) used primary data from a series of experiments to address these issues.

REFERENCES

Aigner, Dennis J. 1985. ‘The Residential Time-of-Use Pricing Experiments: What Have We Learned?’ In Jerry A. Hausman and David A. Wise (eds.), Social Experimentation. Chicago: University of Chicago Press.
Angrist, Joshua D. 2005. ‘Instrumental Variables Methods in Experimental Criminology Research: What, Why and How.’ Journal of Experimental Criminology 2: 1–22.
Angrist, Joshua, Eric Bettinger, Erik Bloom, Elizabeth King, and Michael Kremer. 2002. ‘Vouchers for Private Schooling in Colombia: Evidence from a Randomized Natural Experiment.’ The American Economic Review 92(5): 1535–58.
Angrist, Joshua, Guido Imbens, and Don Rubin. 1996. ‘Identification of Causal Effects Using Instrumental Variables.’ JASA Applications invited paper, with comments and authors’ response. Journal of the American Statistical Association 91(434): 444–55.
Angrist, Joshua D., and Victor Lavy. 2002. ‘The Effect of High School Matriculation Awards: Evidence from Randomized Trials.’ Working Paper 9389. New York: National Bureau of Economic Research.
Banerjee, Abhijit, Shawn Cole, Esther Duflo, and Leigh Linden. 2005. ‘Remedying Education: Evidence from Two Randomized Experiments in India.’ Working Paper 11904. Cambridge, MA: National Bureau of Economic Research.
Bell, Stephen, Michael Puma, Gary Shapiro, Ronna Cook, and Michael Lopez. 2003. ‘Random Assignment for Impact Analysis in a Statistically Representative Set of Sites: Issues from the National Head Start Impact Study.’ Proceedings of the August 2003 American Statistical Association Joint Statistical Meetings (CD-ROM). Alexandria, VA: American Statistical Association.
Bloom, Dan, and Charles Michalopoulos. 2001. How Welfare and Work Policies Affect Employment and Income: A Synthesis of Research. New York: MDRC.
Bloom, Howard S. 1984. ‘Accounting for No-Shows in Experimental Evaluation Designs.’ Evaluation Review 8(2): 225–46.
Bloom, Howard S. 1995. ‘Minimum Detectable Effects: A Simple Way to Report the Statistical Power of Experimental Designs.’ Evaluation Review 19(5): 547–56.
Bloom, Howard S. (ed.). 2005a. Learning More from Social Experiments: Evolving Analytic Approaches. New York: Russell Sage Foundation.
Bloom, Howard S. 2005b. ‘Randomizing Groups to Evaluate Place-Based Programs.’ In Howard S. Bloom (ed.), Learning More from Social Experiments: Evolving Analytic Approaches. New York: Russell Sage Foundation.
Bloom, Howard S., Johannes M. Bos, and Suk-Won Lee. 1999. ‘Using Cluster Random Assignment to Measure Program Impacts: Statistical Implications for the Evaluation of Education Programs.’ Evaluation Review 23(4): 445–69.
Bloom, Howard S., Carolyn J. Hill, and James A. Riccio. 2003. ‘Linking Program Implementation and Effectiveness: Lessons from a Pooled Sample of Welfare-to-Work Experiments.’ Journal of Policy Analysis and Management 22(4): 551–75.
Bloom, Howard S., Larry L. Orr, George Cave, Stephen H. Bell, Fred Doolittle, and Winston Lin. 1997. ‘The Benefits and Costs of JTPA Programs: Key Findings from the National JTPA Study.’ The Journal of Human Resources 32(3): 549–576.
Bloom, Howard S., and James A. Riccio. 2005. ‘Using Place-Based Random Assignment and Comparative Interrupted Time-Series Analysis to Evaluate the Jobs-Plus Employment Program for Public Housing Residents.’ Annals of the American Academy of Political and Social Science 599 (May): 19–51.
Bloom, Howard S., Lashawn Richburg-Hayes, and Alison Rebeck Black. 2005. ‘Using Covariates to Improve Precision: Empirical Guidance for Studies that Randomize Schools to Measure the Impacts of Educational Interventions.’ Working Paper. New York: MDRC.
Borman, Geoffrey D., Robert E. Slavin, A. Cheung, Anne Chamberlain, Nancy Madden, and Bette Chambers. 2005. ‘The National Randomized Field Trial of Success for All: Second-Year Outcomes.’ American Educational Research Journal 42: 673–96.
Boruch, Robert F. 1997. Randomized Experiments for Planning and Evaluation. Thousand Oaks, CA: Sage Publications.
Boruch, Robert F., and Ellen Foley. 2000. ‘The Honestly Experimental Society: Sites and Other Entities as the Units of Allocation and Analysis in Randomized Trials.’ In Leonard Bickman (ed.), Validity and Social
Experimentation: Donald Campbell’s Legacy, vol. 1. Thousand Oaks, CA: Sage Publications.
Box, George E.P., J. Stuart Hunter, and William G. Hunter. 2005. 2nd ed. Statistics for Experimenters: Design Innovation and Discovery. New York: John Wiley and Sons.
Bryk, Anthony S., and Stephen W. Raudenbush. 1988. ‘Heterogeneity of Variance in Experimental Studies: A Challenge to Conventional Interpretations.’ Psychological Bulletin 104(3): 396–404.
Cochrane Collaboration. 2002. ‘Cochrane Central Register of Controlled Trials Database.’ Available at the Cochrane Library Web site: www.cochrane.org (accessed September 14, 2004).
Cochran, William G., and Gertrude M. Cox. 1957. Experimental Designs. New York: John Wiley and Sons.
Cohen, Jacob. 1977/1988. Statistical Power Analysis for the Behavioral Sciences. New York: Academic Press.
Cook, Thomas H., David Hunt, and Robert F. Murphy. 2000. ‘Comer’s School Development Program in Chicago: A Theory-Based Evaluation.’ American Educational Research Journal 37(1): 535–97.
Cornfield, Jerome. 1978. ‘Randomization by Group: A Formal Analysis.’ American Journal of Epidemiology 108(2): 100–02.
Cox, D.R. 1958. Planning of Experiments. New York: John Wiley and Sons.
Donner, Allan, and Neil Klar. 2000. Design and Analysis of Cluster Randomization Trials in Health Research. London: Arnold.
Duflo, Esther, and Rema Hanna. 2005. ‘Monitoring Works: Getting Teachers to Come to School.’ Working Paper 11880. Cambridge, MA: National Bureau of Economic Research.
Fisher, Ronald A. 1925. Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.
Fisher, Ronald A. 1935. The Design of Experiments. Edinburgh: Oliver and Boyd.
Flay, Brian R. 2000. ‘Approaches to Substance Use Prevention Utilizing School Curriculum Plus Social Environment Change.’ Addictive Behaviors 25(6): 861–85.
Gail, Mitchell H., Steven D. Mark, Raymond J. Carroll, Sylvan B. Green, and David Pee. 1996. ‘On Design Considerations and Randomization-Based Inference for Community Intervention Trials.’ Statistics in Medicine 15: 1069–92.
Gennetian, Lisa A., Pamela A. Morris, Johannes M. Bos, and Howard S. Bloom. 2005. ‘Constructing Instrumental Variables from Experimental Data to Explore How Treatments Produce Effects.’ In Howard S. Bloom (ed.), Learning More from Social Experiments: Evolving Analytic Approaches. New York: Russell Sage Foundation.
Gibson, C., Katherine Magnusen, Lisa Gennetian, and Greg Duncan. 2005. ‘Employment and Risk of Domestic Abuse among Low-Income Single Mothers.’ Journal of Marriage and the Family 67: 1149–68.
Greenberg, David H., and Mark Shroder. 1997. The Digest of Social Experiments. Washington, DC: Urban Institute Press.
Hedges, Larry V., and Eric C. Hedberg. 2005. ‘Intraclass Correlation Values for Planning Group Randomized Trials in Education.’ Working Paper WP-06-12. Evanston, IL: Northwestern University, Institute for Policy Research.
Heinrich, Carolyn J. 2002. ‘Outcomes-Based Performance Management in the Public Sector: Implications for Government Accountability and Effectiveness.’ Public Administration Review 62(6): 712–25.
Kane, Thomas. 2004. ‘The Impact of After-School Programs: Interpreting the Results of Four Recent Evaluations.’ Working Paper. New York: W.T. Grant Foundation.
Kemple, James J., and Jason Snipes. 2000. Career Academies: Impacts on Students’ Engagement and Performance in High School. New York: MDRC.
Kempthorne, Oscar. 1952. The Design and Analysis of Experiments. Malabar, FL: Robert E. Krieger Publishing Company.
Kish, Leslie. 1965. Survey Sampling. New York: John Wiley.
Kling, Jeffrey R., Jeffrey B. Liebman, and Lawrence F. Katz. 2007. ‘Experimental Analysis of Neighborhood Effects.’ Econometrica 75(1): 83–119.
Kremer, Michael. 2003. ‘Randomized Evaluations of Educational Programs in Developing Countries: Some Lessons.’ American Economic Review 93(2): 102–06.
Liebman, Jeffrey B., Lawrence F. Katz, and Jeffrey R. Kling. 2004. ‘Beyond Treatment Effects: Estimating the Relationship Between Neighborhood Poverty and Individual Outcomes in the MTO Experiment.’ IRS Working Paper 493 (August). Princeton, NJ: Princeton University, Industrial Relations Section.
Lindquist, E.F. 1953. Design and Analysis of Experiments in Psychology and Education. Boston: Houghton Mifflin Company.
Lipsey, Mark W. 1988. ‘Juvenile Delinquency Intervention.’ In Howard S. Bloom, David S. Cordray, and Richard J. Light (eds.), Lesson from Selected Program and Policy Areas. San Francisco: Jossey-Bass.
Lipsey, Mark W. 1990. Design Sensitivity: Statistical Power for Experimental Research. Newbury Park, CA: Sage.
Ludwig, Jens, Greg J. Duncan, and Paul Hirschfield. 2001. ‘Urban Poverty and Juvenile Crime: Evidence from a Randomized Housing-Mobility Experiment.’ The Quarterly Journal of Economics 116(2): 655–80.
McCall, W.A. 1923. How to Experiment in Education. New York: MacMillan.
Marks, Harry M. 1997. The Progress of Experiment: Science and Therapeutic Reform in the United States, 1900–1990. Cambridge: Cambridge University Press.
Miguel, Edward, and Michael Kremer. 2004. ‘Worms: Identifying Impacts on Education and Health in the Presence of Treatment Externalities.’ Econometrica 72(1): 159–217.
Morris, Pamela, and Lisa Gennetian. 2003. ‘Identifying the Effects of Income on Children’s Development: Using Experimental Data.’ Journal of Marriage and the Family 65(3): 716–29.
Munnell, Alicia (ed.). 1987. Lessons from the Income Maintenance Experiments. Boston: Federal Reserve Bank of Boston.
Murray, David M. 1998. Design and Analysis of Group-Randomized Trials. New York: Oxford University Press.
Murray, David M., and Jonathan L. Blitstein. 2003. ‘Methods to Reduce the Impact of Intraclass Correlation in Group-Randomized Trials.’ Evaluation Review 27(1): 79–103.
Murray, David M., Peter J. Hannan, David R. Jacobs, Paul J. McGovern, Linda Schmid, William L. Baker, and Clifton Gray. 1994. ‘Assessing Intervention Efforts in the Minnesota Heart Health Program.’ American Journal of Epidemiology 139(1): 91–103.
Murray, David M., and Brian Short. 1995. ‘Intraclass Correlation among Measures Related to Alcohol Use by Young Adults: Estimates, Correlates and Applications in Intervention Studies.’ Journal of Studies on Alcohol 56(6): 681–94.
Myers, David, Robert Olsen, Neil Seftor, Julie Young, and Christina Tuttle. 2004. The Impacts of Regular Upward Bound: Results from the Third Follow-up Data Collection. Washington, DC: Report prepared by Mathematica Policy Research for the U.S. Department of Education.
Myers, Jerome L. 1972. Fundamentals of Experimental Design. Boston: Allyn and Bacon.
Olds, David L., John Eckenrode, Charles R. Henderson, Jr., Harriet Kitzman, Jane Powers, Robert Cole, Kimberly Sidora, Pamela Morris, Lisa M. Pettitt, and Dennis Luckey. 1997. ‘Long-Term Effects of Home Visitation on Maternal Life Course and Child Abuse and Neglect.’ The Journal of the American Medical Association 278(7): 637–43.
Orr, Larry L. 1999. Social Experiments: Evaluating Public Programs with Experimental Methods. Thousand Oaks, CA: Sage Publications.
Orr, Larry L., Judith D. Feins, Robin Jacob, Erik Beecroft, Lisa Sanbomatsu, Lawrence F. Katz, Jeffrey B. Liebman, and Jeffrey R. Kling. 2003. Moving to Opportunity: Interim Impacts Evaluation. Washington, DC: U.S. Department of Housing and Urban Development.
Peirce, Charles S., and Joseph Jastrow. 1884/1980. ‘On Small Differences of Sensation.’ Reprinted in Stephen M. Stigler (ed.), American Contributions to Mathematical Statistics in the Nineteenth Century, vol. 2. New York: Arno Press.
Puma, Michael, Stephen Bell, Ronna Cook, Camilla Heid, and Michael Lopez. 2006. Head Start Impact Study: First Year Impact Findings. (Prepared by Westat, Chesapeake Research Associates, The Urban Institute, American Institutes for Research, and Decision Information Resources, June.) Washington, DC: U.S. Department of Health and Human Services, Administration for Children and Families, Office of Planning, Research, and Evaluation.
Raudenbush, Stephen W. 1997. ‘Statistical Analysis and Optimal Design for Group Randomized Trials.’ Psychological Methods 2(2): 173–85.
Raudenbush, Stephen W., and Anthony S. Bryk. 2002. Hierarchical Linear Models: Applications and Data Analysis Methods, 2nd ed. Thousand Oaks, CA: Sage Publications.
Raudenbush, Stephen W., Andres Martinez, and Jessaca Spybrook. 2005. Strategies for Improving Precision in Group-Randomized Experiments. New York: William T. Grant Foundation.
Riccio, James, Daniel Friedlander, and Stephen Freedman. 1994. Benefits, Costs, and Three-Year Impacts of a Welfare-to-Work Program. New York: MDRC.
Robins, Philip K., and Robert G. Spiegelman (eds.).
Newhouse, Joseph P. 1996. Free for All? Lessons from 2001. Reemployment Bonuses in the Unemployment
the RAND Health Insurance Experiment. Cambridge, Insurance System: Evidence from Three Field Exper-
MA: Harvard University Press. iments. Kalamazoo, MI: W.E. Upjohn Institute for
Nye, Barbara, Larry V. Hedges, and Spyros Employment Research.
Konstantopoulos. 1999. ‘The Long-Term Effects Rosenbaum, Paul R., and Donald B. Rubin. 1983. ‘The
of Small Classes: A Five-Year Follow-Up of the Central Role of the Propensity Score in Observational
Tennessee Class Size Experiment.’ Education Studies for Causal Effects.’ Biometrika 70(1):
Evaluation and Policy Analysis 21(2): 127–42. 41–55.
Schochet, Peter A. 2005. Statistical Power for Random of a Randomized Community-Level HIV Prevention
Assignment Evaluations of Education Programs. Intervention for Women Living in 18 Low-Income
Princeton, NJ: Mathematica Policy Research. Housing Developments.’ American Journal of Public
Schochet, Peter A. 2006. National Job Corps Study and Health 90(1): 57–63.
Longer-Term Follow-Up Study: Impact and Benefit- Teruel, Graciela M., and Benjamin Davis. 2000. Final
Cost Findings Using Survey and Summary Earnings Report: An Evaluation of the Impact of PROGRESA
Records Data. Princeton, NJ: Mathematica Policy Cash Payments on Private Inter-Household Transfers.
Research. Washington, DC: International Food Policy Research
Scrivener, Susan, and Johanna Walter, with Thomas Institute.
Brock and Gayle Hamilton. 2001. National Evaluation Ukoumunne, O.C., Gulliford, M.C., Chinn, S.,
of Welfare-to-Work Strategies: Evaluating Two Sterne, J.A.C., and Burney, P.F.J. 1999. ‘Methods
Approaches to Case Management: Implementa- for Evaluating Area-Wide and Organisation-Based
tion, Participation Patterns, Costs, and Three-Year Interventions in Health and Health Care: A Systematic
Impacts of the Columbus Welfare-to-Work Program. Review.’ Health Technology Assessment 3(5):
Washington, DC: U.S. Department of Health and 1–99.
Human Services, Administration for Children and Van Helmont, John Baptista. 1662. Oriatrik or,
Families, and Office of the Assistant Secretary for Physick Refined: The Common Errors Therein
Planning and Evaluation; and U.S. Department of Refuted and the Whole Art Reformed and Rectified.
Education, Office of the Under Secretary and Office London: Lodowick-Lloyd. Available at the James
of Vocational and Adult Education. Lind Library Web site: www.jameslindlibrary.org/
Sherman, Lawrence W., and David Weisburd. 1995. trial_records/17th_18th_Century/van_helmont/van_
‘General Deterrent Effects of Police Patrol in Crime helmont_kp.html (accessed January 3, 2005).
‘Hot Spots’: A Randomized Control Trial.’ Justice Wald, Abraham. 1940. ‘The Fitting of Straight Lines
Quarterly 12(4): 625–48. If Both Variables Are Subject to Error.’ Annals of
Shultz, Paul T. 2004. ‘School Subsidies for the Poor: Mathematical Statistics 11(September): 284–300.
Evaluating the Mexican Progresa Poverty Program.’ Walker, Robert, Lesley Hoggart, Gayle Hamilton, and
Journal of Development Economics 74(1): 199–250. Susan Blank. 2006. Making Random Assignment
Siddiqui, Ohidul, Donald Hedeker, Brian R. Flay, and Happen: Evidence from the UK Employment Retention
Frank B. Hu. 1996. ‘Intraclass Correlation Estimates in and Advancement (ERA) Demonstration. Research
a School-Based Smoking Prevention Study: Outcome Report 330. London: Department for Work and
and Mediating Variables, by Sex and Ethnicity.’ Pensions.
American Journal of Epidemiology 144(4): 425–33. Weisburd, David, with Anthony Petrosino and Gail
Sikkema, Kathleen, J., Jeffrey A. Kelly, Richard A. Winett, Mason. 1993. ‘Design Sensitivity in Criminal Justice
Laura J. Solomon, Cargill, V.A., Roffman, R.A., Experiments.’ In Michael Tonry (ed.), Crime and
McAuliffe, T.L., Heckman, T.G., Anderson, E.A., Justice, An Annual Review of Research, vol. 17.
Wagstaff, D.A., Norman, A.D., Perry, M.J., Chicago: University of Chicago Press.
Crumble, D.S., and Mercer, M.B. 2000. ‘Outcomes
10
Better Quasi-Experimental
Practice
Thomas D. Cook and Vivian C. Wong
INTRODUCTION

Recent reviews comparing effect size estimates from randomized experiments and quasi-experiments that share the same treatment group have strengthened the view that randomized experiments provide the best approximation to a gold standard for answering causal questions (Glazerman et al., 2003; Bloom et al., 2005a). This is because the quasi-experiments in these reviews generally failed to attain the same effect sizes as their yoked experiments. The finding has prompted major funding institutions such as the Department of Labor and the Institute of Education Sciences to encourage those who apply to them for grants and contracts to use randomized experiments whenever possible.

We believe that the question of whether to use experiments or quasi-experiments is not closed. This is not just because well-designed randomized experiments are not always possible, but also because there are reasons to question the validity and generality of past research contrasting the effect sizes from experiments and quasi-experiments that share the same treatment group and therefore only vary how the control group is created, randomly or not. Our doubt comes from two sources: the judgment that the majority of such studies are logically and empirically flawed in ways we will point out; and our demonstration that experiments and stronger quasi-experiments do indeed sometimes produce the same causal answers in ways that make theoretical sense.

This chapter is organized in three sections. The first highlights the strongest quasi-experimental designs, regression-discontinuity and interrupted time series, and discusses features that make these designs superior, including results from studies that empirically test their efficacy relative to experiments. The second section examines the difference-in-differences design, the most frequently used quasi-experimental design that contrasts two non-equivalent groups measured at pretest and posttest on the same scale. This section also uses details from recent empirical attacks on the difference-in-differences design to show how certain ways of selecting control groups and
of measuring and analyzing selection can lead to very close approximations of experimental results. The third section offers suggestions for improving quasi-experimental design, not through the use of matching (the current dominant strategy) but through an alternative pattern-matching strategy that depends on generating and testing multiple empirical implications from the same causal hypothesis. We use examples from education and job training to illustrate the specific design attributes we discuss and recommend because the debate over experiment versus quasi-experiment is most heated in these fields. However, the design principles presented here apply elsewhere as well. Finally, it is worth mentioning that our intention is not to present a treatise on analytic methods for quasi-experimental designs, but is rather to showcase the strongest and best quasi-experimental designs and design features and to suggest common areas of weakness in current practice. For more technical and theoretical discussions on analytic methods described in this chapter as well as additional examples, we include a list of suggested readings in Appendix 1.

EFFICACY TESTS OF QUASI-EXPERIMENTS RELATIVE TO EXPERIMENTS: BETWEEN-STUDY VERSUS WITHIN-STUDY APPROACHES

Two approaches have been employed in studies that have assessed the validity of quasi-experimental designs. In the between-study approach, researchers compare estimated effects from the set of experimental studies done on a topic with the estimated effects from whatever quasi-experimental studies were available on the same topic. Aiken et al. (1998) summarized findings from this tradition. Across many domains of application, they concluded that the average effect sizes were sometimes similar across the experiments and quasi-experiments, but that they were also often different. And even when the means did not differ, the variance in effect sizes tended to be greater among the quasi-experiments than experiments (Lipsey & Wilson, 1993; Glazerman et al., 2003). So, the average experiment and quasi-experiment cannot be relied on to generate the same causal conclusion.

In the within-study approach, researchers take the effect size from a randomized experiment and compare it to the effect size from a quasi-experiment that uses the same intervention group data as the experiment but compares it with data from a non-randomly formed control group. Most of the within-study comparisons conducted to date have been in the job training field, though some have involved educational topics. At first glance, the within-study approach seems a stronger empirical test of design type difference. After all, there is variation in whether a study is experimental or not, and settings, people, and treatments are more likely to be held constant by virtue of the shared experimental group.¹ In contrast, the between-study tradition can involve a set of experiments that differs in many ways on average from the comparison set of quasi-experiments, even though logic calls for variation in design types but not in anything else that might be correlated with study outcomes. This makes between-study results inherently ambiguous.

Our goal is to reexamine conclusions from the within-study comparison literature that have been most prominently discussed and cited in the fields of economics, job training, and education. We focus on studies that were included in Glazerman et al.’s (2002, 2003) meta-analysis of within-study comparisons, as well as comparisons that have been more recently published (see Appendix 1 for a list of within-study comparisons found in education, job training, and economics). However, since there are only 20 within-study comparison studies, we acknowledge that the basis for extrapolation is limited. Moreover, we discuss only a subset of these studies in detail: three with an RD design, one with an abbreviated interrupted time series design, and four with a difference-in-differences design. Thus while the conclusions presented here are meant to
spur further debate, discussion, and research, they are not meant to be the final word on the efficacy of quasi-experimental designs as assessed empirically from the results of within-study comparisons.

TYPES OF EXPERIMENTS: RANDOMIZED, NONEXPERIMENTS, AND QUASI-EXPERIMENTS

All experiments seek to test a causal hypothesis by demonstrating that the cause preceded the effect in time, that the two co-vary, and that there are no alternative interpretations of why they vary other than that the cause was responsible for the effect. Experiments in the social and behavioral sciences also have some similar structural attributes. There are always one or more outcome measures, plus groups of units that undergo either a treatment or some contrast experience. This last is often a no-treatment control group experience that seeks to function as a causal counterfactual, that is, as an assessment of what would have happened to units receiving the treatment if they had not in fact received it.

There are different types of experiments. The randomized experiment is characterized by assignment to treatment or control status on the basis of some equivalent of a fair coin toss. It creates two or more groups that are initially comparable within the limits of sampling error. This renders them valid as a no-treatment counterfactual, with the warrant for this judgment stemming from formal probability theory. Nonexperiments, in contrast, do not use random assignment. Quasi-experiments are the special subtype of nonexperiments that attempt to mimic randomized experiments in purpose and structure despite the absence of random assignment. In contrast to quasi-experiments, other nonexperiments do not directly manipulate treatments, nor do they have observations and comparison groups that are deliberately and originally designed to provide a causal counterfactual. Longitudinal observational studies are a common type of nonexperiment used for testing causal hypotheses that do not have these last features and are therefore not quasi-experimental. This chapter is concerned with the efficacy of quasi-experimental designs relative to the randomized experiment.

In quasi-experiments, assignment to treatment or control status may be determined by self-selection or administrator decision, and so initial differences between groups may come to mimic treatment effects, thus confounding population differences between the treatment and control groups with possible effects of the treatment and so creating what is called a ‘selection’ problem. The perfectly implemented randomized experiment rules out selection (and other alternative interpretations of why a potential cause and effect co-vary) by distributing these alternatives equally over the various experimental conditions. They are not removed from the research setting, as though by magic; they are merely removed as alternative interpretations by being equally represented in each of the groups under contrast.

A well-designed quasi-experiment can also rule out alternative explanations, but to do this requires more assumptions and less transparency, and consequently a more uncertain causal answer than the randomized experiment provides. In particular, the use of quasi-experiments requires close attention to three related issues. The first is to identify all plausible alternative interpretations to the hypothesis that the independent and dependent variables are causally related, these alternatives being called threats to internal validity (see Shadish et al. (2002) for extended discussion). While the randomized experiment takes care of these threats by distributing them equally across conditions, the quasi-experiment requires researchers to examine and assess the plausibility of each threat explicitly. The second is the assumption that experimental design principles enjoy a primacy over substantive theory or statistical adjustment procedures when it comes to ruling out validity threats. In practice, this entails reliance on carefully chosen comparison groups and/or pretest measures taken
at multiple times. The third principle for ruling out alternative explanations is the use of coherent pattern matching. This requires that existing substantive theory be specific enough to predict the specific pattern of multivariate results that should result from a given causal hypothesis, a pattern that few alternative explanations can match. We begin by discussing designs that exemplify the best of what quasi-experimental theory has to offer.

REGRESSION-DISCONTINUITY DESIGN

The regression-discontinuity (RD) design is still not widely used despite theoretical and empirical demonstrations of its ability to provide unbiased treatment effect estimates when its assumptions are met. Nonetheless, RD has gained prominence as an abstract alternative to experiments in health, economics, and education (for a history of RD, see Cook, in press). Indeed, a recent request for proposal from the Institute of Education Sciences, a United States Department of Education agency that funds education research, stated that if a randomized experiment was not possible for addressing a causal question, then acceptable alternatives included ‘appropriately structured regression-discontinuity designs’ (Institute of Education Sciences, 2004). In this section, we examine the basics of an RD design, theoretical and empirical reasons for why RD is so special among quasi-experimental designs, and examples of RD in order to highlight practical considerations that are important for implementing the design.

The basics of RD

In an RD design, individuals are assigned to treatment and comparison groups solely on the basis of a cutoff score from some assignment variable. The assignment variable is any measure taken prior to the treatment intervention, and there is no requirement that the measure be reliable. The obtained fallible score suffices. Individuals who score on one side of the cutoff score are assigned to the treatment while individuals who score on the other side are assigned to the comparison. Thus, treatment assignment is completely observed and depends on one’s score on the cutoff variable and on nothing else. Treatment effects then are estimated by examining the displacement of the regression line at the cutoff point determining program receipt.

Figures 10.1 and 10.2 show a hypothetical RD experiment with and without treatment effects. In both cases, the cutoff is a score of 50: those scoring above 50 receive treatment and those scoring below it are the non-equivalent controls. The graphs show scatterplots of assignment scores against posttest scores, each depicting a linear, positive relationship between the two variables. In Figure 10.1, where a treatment effect is present, we see a vertical disruption, or discontinuity, at the cutoff, though treatments can obviously also cause an upward shift. The displacement in Figure 10.1 represents a change in the mean posttest scores, equivalent to a main effect of treatment. It is also possible for treatments to cause a change in slope at the cutoff, this being equivalent to a treatment by assignment statistical interaction, provided that the change in slope can be unambiguously attributed to the intervention rather than to some underlying non-linear relationship between the assignment and outcome. In Figure 10.1, which has linear and parallel regressions, we interpret the effect size to be a negative change of 5 units because there is a vertical displacement of 5 points at the cutoff. In Figure 10.2, there is no displacement at the cutoff and the regression lines are again parallel. So we interpret this as no effect.

For a simple RD design one needs an assignment variable that has ordinal properties or better. Continuous measures such as income, achievement scores, or blood pressure work best, while nominal measurements such as race or gender do not work at all because they cannot lead to correct modeling of the regression line. However, the continuous assignment variable can take on any form. It can be a pretest measure of the
[Figure 10.1 plot: posttest score (Y, axis 40 to 60) against assignment score (X, axis 40 to 60), with the cutoff score separating the control group from the treatment group. Labeled elements: the ‘projected’ regression line, the treatment group regression line, and the ‘discontinuity’ or treatment effect.]

Figure 10.1 Regression-discontinuity with treatment effects

Source: Trochim, 1994.

[Figure 10.2 plot: same axes (X and Y, 40 to 60) and cutoff score, with a single regression line.]

Figure 10.2 Regression-discontinuity without treatment effects

Source: Trochim, 1994.
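As an illustration of the hypothetical design in Figures 10.1 and 10.2, the following is a minimal sketch in Python, not the analysis used in any study cited here. It assumes simulated data, a cutoff of 50, linear and parallel regressions on both sides of the cutoff, and hypothetical function names (`simulate_rd`, `fit_rd`). The treatment effect is read off as the coefficient on the treatment indicator when the posttest is regressed on that indicator and on the assignment score centered at the cutoff, that is, as the vertical displacement of the regression line at the cutoff.

```python
import random


def simulate_rd(effect, n=400, cutoff=50.0, noise_sd=1.0, seed=0):
    """Simulate assignment scores and posttest outcomes for a sharp RD design.

    Hypothetical data-generating process (an assumption for illustration):
    posttest = score + effect * treated + noise, with treatment given to
    everyone scoring above the cutoff and to no one else.
    """
    rng = random.Random(seed)
    scores = [rng.uniform(40.0, 60.0) for _ in range(n)]
    outcomes = [
        s + (effect if s > cutoff else 0.0) + rng.gauss(0.0, noise_sd)
        for s in scores
    ]
    return scores, outcomes


def fit_rd(scores, outcomes, cutoff=50.0):
    """Least-squares fit of: outcome = b0 + b1 * treated + b2 * (score - cutoff).

    Returns [b0, b1, b2]; b1 is the estimated displacement at the cutoff.
    """
    rows = [(1.0, 1.0 if s > cutoff else 0.0, s - cutoff) for s in scores]
    k = 3
    # Build the augmented normal equations [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, outcomes))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


if __name__ == "__main__":
    # A displacement of -5 mirrors the text's reading of Figure 10.1;
    # a displacement of 0 mirrors Figure 10.2.
    for label, effect in [("with effect", -5.0), ("no effect", 0.0)]:
        b0, b1, b2 = fit_rd(*simulate_rd(effect))
        print(f"{label}: displacement at cutoff = {b1:.2f}, slope = {b2:.2f}")
```

Centering the assignment score at the cutoff makes the intercept the predicted untreated posttest at the cutoff, so the coefficient on the treatment indicator is the displacement at exactly that point. The sketch would need extra terms if the slopes differed across the cutoff or the relationship were non-linear.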
dependent variable (Seaver & Quarton, 1976), a three-item composite measuring family material welfare in Mexico (Buddelmeyer & Skoufias, 2003), a composite of 140 variables predicting the likelihood of long-term unemployment (Black et al., 2005), a birth order of participants, or an order of individuals entering a room. When the assignment and outcome variables are not correlated, this is very much like what happens with a randomized experiment where, because of the coin toss, the process of assignment to conditions is not related to the outcome.

Second, a cutoff point must be carefully selected. Many considerations for choosing a cutoff point are beyond the scope of this chapter, but in general one should select a cutoff point that ensures adequate range and sample sizes in both the treatment and comparison groups so that the regression on each side of the cutoff can be reliably estimated. This is potentially problematic if awards are given to the particularly meritorious or compensatory resources to those in the very greatest need, thus creating a minuscule range on one side of the cutoff.

Third, the strict implementation of a cutoff score is what makes RD designs unique, and serves as the basis for this design’s comparative advantage over other quasi-experimental designs. Assignment must be by the specific score on the assignment variable, and nothing else. As with randomized experiments, overriding the selection mechanisms by arbitrarily moving participants from one condition to another introduces the potential for bias because some persons whose scores place them on one side of the cutoff will in fact receive the treatment destined for those on the other side. This makes the assignment process more ‘fuzzy’ in practice than it is supposed to be in theory (Trochim, 1984). So, as with the randomized experiment, treatment assignment must be carefully planned and its implementation controlled, making many retrospective uses of the RD design problematic. The main exception here is when assignment is by birth dates (Angrist et al., 1996; Staw et al., 1974), or some other means that can be easily verified and carefully recorded.

Taken together, the basic RD design can be represented as in Figure 10.3. Here O1 represents the assignment variable, C is the units assigned to conditions on the basis of the cutoff score, X is the treatment, and O2 is the posttest or outcome measure.

O1   C   X   O2
O1   C       O2

Figure 10.3 Regression-discontinuity design

Unique characteristics of this design: Theoretical and empirical reasons

The most common question asked about RD is how the design can yield unbiased estimates when the pretest means of the treatment and control groups do not overlap. In an experiment, treatment effects are usually inferred by comparing treatment group posttest means with control group posttest means under the assumption that they would otherwise have been identical, given the perfect overlap that is initially achieved on both observed and unobserved variables. Similarly, causal inference in quasi-experiments is stronger the less the two group means initially differ (Cook et al., 2005). Yet in RD, the groups must maximally differ on the assignment variable.

In RD, treatment effects are not estimated by comparing posttest means or some form of difference in gains, but rather by extrapolating the relationship between the assignment variable and posttest on the untreated side of the cutoff into the treated side. The counterfactual is therefore a slope, and the simplest null hypothesis is that both treatment and comparison group regression lines have the same intercept at the cutoff. Should there be a difference and all other conditions for causal inference are met, especially the comparability of regression functions on each side of the cutoff, then an inference is drawn that the treatment caused the difference in the intercept.

A better way to think about RD design is that the selection process is perfectly known. It depends only on the obtained
score on the assignment variable and so can be perfectly modeled. In other quasi-experiments, how units came to be assigned to treatment is usually not fully known. We cannot control for all the possible covariates that might discriminate between students who volunteer to participate in a dropout prevention program versus those who do not. Indeed, the methodology literature is replete with mostly unresolved debates about procedures that might control for selection in quasi-experiments other than RD. However, only in RD and the randomized experiment is the selection process completely known and measured. This is why strict adherence to assignment based on the cutoff is essential if RD is to yield unbiased results, just as strict adherence to the ‘coin toss’ allocation is crucial for interpreting a randomized experiment.

Goldberger (1972a, 1972b) proved that generalized treatment estimates obtained from RD are comparable to estimates from randomized-experimental designs. However, unbiased estimates require meeting the following key assumptions: that the cutoff is rigorously followed; that the functional form of the relationship between the assignment and posttest can be fully described; that there are enough assignment values to responsibly estimate the regression line on each side of the cutoff; and that the assignment variable is continuous. Under these conditions, and when the assignment and outcome variables are linearly related, a single regression function or ANCOVA can be used to estimate treatment effects, with the group assignment variable and the cutoff being included as covariates. However, as Goldberger (1972a, 1972b) also showed, the RD analysis will have approximately 2.75 times less statistical power than an experiment with the same sample size when the cutoff is at the midpoint of the assignment variable.

Econometricians have extended the discussion of statistical analysis in RD by devising methods that bypass the questions of which variables are needed to model outcomes and their functional form. Hahn et al. (2001) have shown that treatment effects at the point of discontinuity can be estimated using non-parametric regression techniques, and other economists have attributed the virtues of RD to the near randomness of allocation decisions around the cutoff point itself. So they use analytic methods that give greatest weight to observations closer to the cutoff on grounds that this is where random error is most likely to determine treatment status. The disadvantage of this assumption and its attendant analysis strategy is that treatment effects are identified only at the cutoff, thus limiting external validity over what would be the case when slopes on each side of the cutoff have similar values.

Adding to the theoretical case for the bias-free nature of perfectly implemented RD are the results of three empirical studies that compare effect sizes from RD and experimental benchmarks. Aiken et al. (1998) examined how students enrolled in a college remedial writing class performed in essay writing and on a Test of Standard Written English (TSWE) when compared to students without the remedial course. Before the study began, students at this university were assigned to the remedial class on the basis of a cutoff score either on the ACT or SAT. The RD design used this feature to create the treatment group consisting of all those students scoring below the cutoff, and the comparison group from all those scoring above it. In addition to the RD design, the authors included a randomized experiment that took a sample of volunteers from just below the cutoff and randomly assigned them to the remedial course or Standard English writing class. Despite differences in where treatment effects were estimated for both the experimental and RD studies, the authors found that both designs produced similar patterns of results in significance levels and effect size.

The second experiment/RD contrast was by Buddelmeyer and Skoufias (2003). They reanalyzed data from PROGRESA, a large-scale Mexican program aimed at alleviating poverty through investments in education, nutrition, and health. The authors took advantage of the fact that Mexican villages
were randomly assigned to PROGRESA, but that families within the experimental villages were then assigned into treatment conditions based on their score on a scale of material resources. For the experimental and RD studies, the authors examined whether PROGRESA improved school attendance and reduced labor force participation among girls and boys between the ages of 12 and 16. Overall, the authors found close correspondence in the experimental and RD results. However, there was one round of results where the RD and experimental findings diverged and, after additional analyses, the authors found evidence of spillover effects in the comparison group that produced dissimilar RD findings. This led the authors to conclude that, ‘it is the comparison group rather than the method itself that is primarily responsible for the poor performance of the RD.’

The third direct comparison of experiment/RD results is the most methodologically advanced. Black et al. (2005) reanalyzed data from a job training program in Kentucky that assigned those likely to exhaust unemployment insurance to mandatory reemployment services as a requirement for benefit receipt. The RD was claimants’ assignment into job training programs based on a single score derived from a 140-item test predicting the likelihood of long-term unemployment. For each local employment office in each week, new claimants were ranked by their assigned scores. Reemployment services were given to those with the highest scores, followed by those with the next highest scores until the slots for each office each week were filled. When offices reached their maximum capacity, and if there were two or more claimants with the same profiling scores, then random number generators were used to assign the remaining claimants with the same profiling scores into treatment conditions. Thus, only claimants with marginal profiling scores (the point at which the capacity constraint was reached in a given week and in a given local office) were randomly assigned into experimental groups. This sampling procedure resulted in a true tie-breaking experiment and ensured that the RD causal estimate was at the same average point on the assignment variable as the experiment, creating a more interpretable contrast of the two design types.

The experimental and RD analyses compared results for three outcomes: weeks receiving unemployment insurance (UI) benefits, amount of UI benefits received, and annual earnings. The RD analyses weighted data closer to the cutoff and examined how the correspondence between experimental and RD results varied with proximity to the cutoff. The assignment and outcome variables were not linearly related, but even so a close correspondence was obtained between the experimental and RD results in statistical significance patterns, magnitude of estimates, and in direct tests of differences between the RD and experimental impacts. This was especially true when the RD observations were closest to the cutoff. The implication of all three attempts to check RD results against experimental ones is that the design generates bias-free results, not just in theory, but also in complex research practice.

Black et al.’s (2005) study further illustrates that researchers can handle non-linearity in the relationship between the assignment variable and the outcome. They did this by varying the range of the assignment variable and putting an a priori faith in estimates with the least range. It is also possible to use non-parametric regression or to include a range of models using higher order terms, interactions, and/or transformations of variables in order to probe the stability of results across alternative specifications of functional form. Best of all, though, is to get measures of the outcome variable from a period prior to the intervention. Such a pretest helps describe the functional form of the assignment/outcome relationship independently of the influence of the treatment in order to permit an analysis that, in essence, differences the pre- and post-intervention slopes on each side of the cutoff. This design response to the problem of possible non-linear relationships stands in stark contrast to statistical responses
that are based on non-parametric regression, differential weighting, and willingness to limit the external validity of the causal relationship to just around the cutoff point.

Examples of RDs

Earlier, we suggested that RD is slowly becoming a more popular choice for evaluation in education, where standards-based reforms have allocated funds, resources, and penalties based on students’ or schools’ obtained scores on achievement tests. This section offers more examples of how RD can be used to evaluate treatment effects in education, and it also illustrates another common problem in RD that arises when the cutoff point is not the only criterion for treatment assignment and when other, more social or political, factors also enter into the allocation decision, making it fuzzy rather than sharp as is preferable for RD.

Trochim (1984) analyzed data to determine the effects of compensatory education on student achievement. He examined a second-grade compensatory reading program in Providence, Rhode Island, where all children in the same pool were pre-tested using a reading test. Those who scored below the cutoff were assigned to a reading program while those who scored above the cutoff were not assigned treatment. His analysis of Rhode Island second-graders found that the program significantly improved children’s reading abilities. However, few of the other state compensatory education programs that Trochim (1984) examined yielded similar positive effects.

Jacob and Lefgren (2004a, 2004b) examined the effects of teacher training and summer school participation and retention on student achievement in Chicago Public Schools (CPS). We describe in detail only the design of the teacher training study. In 1996, CPS introduced a reform that placed schools on academic probation if fewer than 15 percent of students met the national norms on standardized reading exams. To improve academic achievement, CPS provided probation schools with funds and resources to buy teacher development services from external organizations. Schools that were below the 15 percent cutoff were assigned to the treatment condition (teacher development) while schools above the cutoff served as the comparison group. The independent variable was resources for teacher training; the assignment variable was the percentage of students who met national norms in reading; and the outcome was math and reading achievement among elementary school students. The results showed that teacher training had no statistically significant effect on students’ math or reading achievement.

However, the cutoff for assignment in Jacob and Lefgren’s (2002) study was not as clean as one would want. First, several schools that scored below the probation cutoff were waived from the policy (15 of the 77). Second, 25 schools originally placed on probation raised student achievement by enough to be removed from probation even before the treatment was completed. On the other hand, 16 schools that missed the probation cutoff in the first year were placed on probation in the next two years. Finally, there was substantial student mobility between schools. Including overrides to the cutoff in the analysis sample is likely to produce bias in treatment effect estimates, as are failure to take up the assigned treatment and attrition from the sample after assignment.

Several statistical procedures have been proposed to address fuzzy discontinuity. In the first approach, suggested by Trochim and Spiegelman (1980), an estimated assignment variable is constructed for each unit. Its distribution resembles not the step function of a sharp discontinuity but an ogive or spline whose slope value depends on how much mis-assignment has occurred. A simulation study by Trochim (1984) and an evaluation of Title I (Trochim, 1984) show that such functions offer an unbiased method for dealing with fuzzy discontinuity. The second approach, employed by Jacob and Lefgren (2004a, 2004b) and others (Angrist & Lavy, 1999; van der Klaauw, 2002), uses an instrumental variable (IV) framework. Here, fuzzy discontinuity is seen as an endogeneity issue, where the assignment variable is believed to
be correlated with unobservables in the error term. An ideal instrument in RD is a variable that affects the outcome only through its association with the endogenous assignment term. In principle, the use of an instrument expunges correlation between the assignment variable and the error term. In practice, it may be difficult to know what a good IV is because one cannot test whether the IV in question is truly uncorrelated with unobservables in the error.² Jacob and Lefgren (2004a) used discontinuities in school test scores for predicting whether teachers received training or not, and then used the predicted term as their instrument for the assignment variable in the parametric RD models. They ran sensitivity tests to explore alternative pathways for how test scores could influence the outcome other than through their relationship with the assignment and found no such evidence. Thus, the authors concluded that they had a valid instrument for addressing fuzziness.³

Finally, it is important with RDs to examine empirically the social dynamics of the cutoff. In the Irish school-leaving examination, it was discovered that scores just below the passing cutoff score were underrepresented in the frequency distribution, presumably because examiners did not want to hurt a student’s chances by assigning them a 38 or 39 when 40 was the passing score. In other RD studies it is not unknown for social workers to misrepresent family income around cutoffs that determine eligibility for services. Researchers should control the assignment process as much as possible and observe the process directly, preferably in a pilot research phase so that potential problems can be addressed. This same advice holds for the experiment also. Its implementation needs to be directly examined and otherwise checked.

ABBREVIATED INTERRUPTED TIME SERIES DESIGN WITH A CONTROL SERIES

When a series of observations is available on the same variable, an interrupted time series (ITS) design can be used to assess whether a treatment administered at a known time during the series leads to a change in intercept or slope at the intervention point. In much social science practice, it is difficult to find studies with enough time points to estimate the error structure and provide responsible analysis at a district, school, class or student level (Box & Jenkins, 1970).⁴ Much more common are abbreviated ITSs with, say, 4 to 20 pretest time points. Indeed, standards-based reform in education has led to the repeated tracking of student test scores, providing many opportunities for abbreviated ITS design and analysis. This section discusses the design, the theory and empirical research supporting its validity, and examples of how it has been used.

The basics of controlled abbreviated ITS design

A time series requires repeated measurements made on the same variable over time. The observations can be made on the same units, as with multiple test scores on the same student, or on different but similar units, as with test scores from multiple cohorts of students within the same school. ITS also requires an intervention that is supposed to generate an interruption in the series at a known point in time corresponding to implementation of the treatment. The design also works better when a rapid response to the intervention is expected (or when the response interval is well known, as with 9 months in the case of the period from intercourse to birth), and when the intervals between observations are short. If a treatment is phased in slowly over time, or if it reaches different sections of the target population at differing times, then implementation is better described as a gradually diffusing process rather than as an abrupt intervention. In these cases of delayed intervention, the chance of other events influencing the outcome increases, making history a plausible threat to internal validity. At a minimum, the diffusion process should be directly observed and, where possible, modeled.
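
The idea of a change in intercept or slope at a known intervention point can be illustrated with a segmented regression. The following is only a minimal sketch under stated assumptions, not a method taken from the studies discussed here: the function name `fit_its`, the made-up series, and the use of ordinary least squares are all illustrative choices. Ordinary least squares ignores autocorrelation in the errors, which is exactly the error-structure problem that short series make hard to address.

```python
import numpy as np

def fit_its(y, t0):
    """Fit y_t = b0 + b1*t + b2*post_t + b3*(t - t0)*post_t by least squares.

    y  : outcome series, one value per equally spaced time point
    t0 : index of the first post-intervention observation (assumed known)
    Returns the array (b0, b1, b2, b3), where b2 is the change in level
    (intercept) at the intervention and b3 is the change in slope after it.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    post = (t >= t0).astype(float)          # 0 before, 1 from t0 onward
    X = np.column_stack([np.ones_like(t), t, post, (t - t0) * post])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical, noise-free series: 8 pretest and 8 posttest points with a
# built-in level drop of 30 and a slope increase of 1.5 at t0 = 8.
t = np.arange(16)
y = 100 + 2.0 * t - 30.0 * (t >= 8) + 1.5 * (t - 8) * (t >= 8)

b0, b1, level_change, slope_change = fit_its(y, t0=8)
# Noise-free data, so the fit recovers about -30.0 and 1.5.
print(round(level_change, 3), round(slope_change, 3))
```

With a control series measured at the same time points, one simple option (again an assumption, not the only approach) is to fit the same model to the difference between the treated and control series, so that shifts shared by both series do not register as level or slope changes.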
Also, the inclusion of an untreated control group with multiple observations can help rule out plausible threats to validity such as history, maturation, and statistical regression that a simple ITS cannot rule out. So a high-quality abbreviated ITS requires both treatment and control groups for which there are multiple and frequent observations before and after the intervention. A simple design with a control group and 10 observations is depicted in Figure 10.4.

O1 O2 O3 O4 O5 X O6 O7 O8 O9 O10
O1 O2 O3 O4 O5   O6 O7 O8 O9 O10
Figure 10.4 Interrupted time series design

Unique characteristics of this design: Theoretical and empirical reasons

There are several potential advantages of the abbreviated ITS for assessing treatment effects. Ashenfelter (1978) examined the effects of participation in a job training program on earnings for Blacks and Whites and for males and females. The treatment group consisted of individuals who began job training under the Manpower Development and Training Act in the first 3 months of 1964. The comparison sample was constructed from the 0.1 percent Work History Sample of the Department of Labor, a random sample of earnings records on American workers. The outcome was earnings at 11 time points for each of the groups. In addition to multiple posttest observations, Ashenfelter had four years of earnings for the treatment and comparison groups prior to the intervention. Posttest results suggested that participation in the job training program increased earnings for all the treatment groups by race and gender. However, Ashenfelter noted that treatment group members had lower earnings than the comparison group in the year before intervention. While comparison group members remained in the labor force, those eligible for job training in 1964 were required as a condition of acceptance into the program to be out of work in 1963. So their reported earnings in 1963 had to be depressed.

Consider the numerous threats to validity if Ashenfelter had used only one pretest measure. First, he would not have been able to eliminate ‘maturation’ as an alternative explanation. Under the maturation hypothesis, training group members’ earnings increased at a faster rate than comparison members’ but had started at a lower point than those of comparison units, even before 1963. With multiple years of earnings data, Ashenfelter was able to examine the data for group differences in maturation. Second, regression to the mean would have been difficult to discount. In this scenario, if unemployment of treatment group members in 1963 was temporary and necessary for program inclusion, then the increase in earnings after 1963 might have occurred even without participation in the treatment. Using multiple years of pre-intervention data, Ashenfelter (1978) found a small decrease in earnings for the treatment group between 1962 and 1963, but not enough that regression could have accounted for all treatment effects. Finally, history would have been another plausible alternative explanation. Under this threat, observed increases in earnings would have been due to upward trends in the economic cycle, and not to treatment effects. Multiple pretest observations allowed Ashenfelter to test for seasonal or cyclical patterns in the data. Note in this example that it is the length, number, and frequency of pre-intervention time points that permit the examination of common threats to validity. Multiple posttest observations help determine the temporal pattern of an effect, but they cannot rule out alternative explanations.

The second unique feature of ITS design is that treatment effects can be assessed along multiple dimensions. The next example demonstrates that treatment effects can be measured by the form of the effect (level, slope, and variance), its permanence (continuous or discontinuous), and its immediacy (immediate or delayed). In March 1974, Cincinnati Bell began charging 20 cents per call to local directory assistance. Figure 10.5 shows an immediate and large drop in local directory assistance calls when this charge began. But treatment effects can be described along dimensions other than their means. A continuous treatment effect persists over time, while a discontinuous effect tends to drift back to pre-intervention level after the initial effect wears off. Figure 10.5 shows a continuous treatment effect because the change in level persisted well into 1976. Effects can also be immediate or delayed. Immediate treatment effects are easier to interpret, while delayed effects are more problematic because plausible alternative explanations may be introduced in the time interval between intervention onset and the recorded response. Therefore, a strong theoretical justification that predicts the length of a delay is helpful when examining delayed effects, such as the expectation of increased births nine months after a citywide electricity blackout, not three months after the event. In the Cincinnati Bell case, the treatment response was immediate, with a large drop in directory assistance calls occurring on intervention day. When interpreting an ITS study, it is helpful to describe effects in terms of changes in level, slope, and variance, thus assessing whether effects are immediate or not and continuous or not.

Figure 10.5 The effects of charging for directory assistance in Cincinnati
(Plot not reproduced: Number of Calls, 0 to 1000, by Year, 1962 to 1976, with the intervention point marked.)
Source: Shadish et al., 2002. Copyright 2002 by Houghton Mifflin Company.

We are aware of only one study testing the validity of an abbreviated ITS design by comparing its results to those achieved from a randomized experiment that had the same intervention group. Bloom et al. (2005a; Michalopoulos et al., 2004) reanalyzed data from the 11-city NEWWS, a component of the Job Opportunity and Basic Skills (JOBS) program that mandated job training services for unemployed individuals. The study involved at least 8 pretest quarterly reports on earnings prior to intervention and 20 quarters of earnings post-intervention. Four cities (Oklahoma City, Riverside, Portland, and Detroit) included welfare recipients in one part of the city who were randomly assigned to treatment or control group, and the non-equivalent comparison group for the ITS study was composed of people from another part of the same city. In fact, the comparison groups comprised individuals who had served as controls in the same experiment. A fifth comparison was in-state rather than within-city, involving treatment and comparison groups from Detroit and Grand Rapids. All the data we report here are at the site mean level, aggregated up from longitudinal individual data collected at the same times and on the same measures for both the experimental and the abbreviated ITS samples. The general logic with empirical
tests of the correspondence in results between an experiment and quasi-experiment is that the randomly formed control group and the non-randomly formed comparison group would have to be identical if they were to produce the same causal effect size, given that both groups would be analyzed with the same treatment data. In the ITS case, though, the logic is slightly different. The means and slopes can differ, but not the behavior of the control or comparison group around the intervention point. Any temporal changes observed there can masquerade as alternative interpretations of an immediate program impact.

Figure 10.6 Mean quarterly earnings by site
(Plots not reproduced. Each panel plots Mean Earnings (1996 $), 0 to 2000, against Quarters from Random Assignment, −8 to 20.)
Oklahoma City: Oklahoma City Rural (control group), Oklahoma City Central (comparison group)
Detroit: Detroit Fullerton (control group), Detroit Hamtramck (comparison group)
Riverside: Riverside City (control group), Riverside County (comparison group)
Grand Rapids and Detroit: Grand Rapids (control group), Detroit (comparison group)
Portland: Portland West Office (control group), Portland East and North Offices (comparison group)
Source: Michalopoulos et al., 2004.

Figure 10.6 displays the means over time for the control and comparison groups at each of the five sites, the intervention point being designated as 0 on the time scale. Visual inspection suggests no shift in the intercept at the intervention point in three sites: Oklahoma City (N controls = 831; N comparisons = 3,184), Detroit (N controls = 955; N comparisons = 1,187), and Riverside (N controls = 1,459; N comparisons = 1,501). There were no reliable differences in slopes either, though the possibility of such is indicated in the later lags in both Detroit and Riverside. However, these small differences had opposite signs and basically cancelled each other out. Indeed, neither the means nor trends reliably differed at any of these three sites and would not differ if they
were aggregated into a single analysis. These results suggest that selecting these samples within cities induced so much comparability that further individual matching would hardly help control bias.

The comparability between control and comparison groups was not replicated in Portland, the smallest site, where the randomized control cases were about a third of the next smallest control group (N controls = 328; N comparisons = 1,019). Figure 10.6 shows that while control and comparison groups were stably different at the series’ very beginning and throughout most of the post-intervention period, one group exhibited a large earnings dip immediately prior to the intervention. Thus, the control and comparison groups did not act similarly pre- and post-intervention, as ITS requires. The same was true when Grand Rapids (N = 1,390) was compared to its within-state comparison, Detroit (N = 2,142). Here, the pre-intervention group means differed, but not the post-intervention ones. This again implies that different causal conclusions would arise between the randomized experiment and the abbreviated ITS quasi-experiment yoked to it. However, design and city differences were confounded in this last analysis. While Detroit and Grand Rapids are in the same state, they are not in the same city and so would likely have different local labor markets with their unique economic pressures at different times.

If we were to sum the four within-city comparisons and weight Portland appropriately less than the other sites, there would be little or no difference between the control and comparison groups around the intervention point, and hence there would be little causal bias. The same would likely be true if all five sites were summed. In this particular case, the abbreviated ITS would not be biased relative to the experiment. However, Bloom et al. (2005a; Michalopoulos et al., 2004) concluded that the within-state control and comparison groups did not closely approximate each other. Their analysis of absolute bias was predicated on computing the difference between the randomly and non-randomly formed comparison groups across all 28 time points irrespective of the sign of these differences, thus capitalizing on random error of whatever source. By contrast, our analysis was of average bias, of the difference between the two types of comparisons across 28 time points per site when account is taken of the signs attached to each difference at each time point. Fortunately, Bloom et al. also compute average bias, reporting that the comparison and control group differed by between 1 percent and −3 percent at two years after the intervention and between 3 percent and −4 percent at five years, even when the less appropriate Grand Rapids/Detroit comparison was included in the calculation. Such a close correspondence between an experimental control group and a nonexperimental comparison group would lead to experimental and quasi-experimental effect sizes that do not differ when each is subsequently yoked to the same treatment group.

Even accepting Bloom et al.’s (2005a; Michalopoulos et al., 2004) analysis of absolute rather than average bias, we would still have reason to be concerned about generalizing the study’s findings to other research domains. Despite the 8 pretest observations, the earnings measures were not highly correlated across a year (by our rough estimate, about 0.42). As a point of comparison, student test scores tend to correlate at about 0.58 to 0.74 in math and 0.60 to 0.74 in reading (Bloom et al., 2005b). The relatively low annual correlations for earnings suggest that the pretests were less useful as selection controls than would be the case when examining academic achievement, for example. Even so, the number of pretest observations still helps, for Bloom et al. report that constructing a pretest covariate out of pretest earnings data from varying numbers of waves led to less bias the more waves there were. The presumption is that creating a single pretest measure out of more waves leads to more reliable estimation of that pretest selection difference. Bloom et al. could not show, though, that constructing an individual level growth model helped reduce the selection threat they claimed to find when
analyzing absolute bias. But even so, the lower correlations among adjacent earnings measures suggest that growth trends were not stably estimated in this project, the more so since quarterly data were analyzed and these are presumably even less stable than the 0.42 correlations for annual data.

To summarize, when we look at the four within-city comparisons, there were no differences between control and comparison groups at three of the sites. The only difference was in the smaller, less stable Portland comparison. For the fifth, within-state comparison, labor markets in Grand Rapids and Detroit were different enough that we would expect these sites to produce inferior matches to those of truly local comparison and control groups. Even so, when results were summed across all sites, the average biases cancelled each other out, and the quasi-experimental and experimental studies yielded estimates with close correspondence. Thus, we disagree with Bloom et al.’s (2005a; Michalopoulos et al., 2004) conclusion about different effects attributable to the experiment and quasi-experiment. Fortunately, it is easy for readers to judge for themselves. Just look at Figure 10.6 and see whether there is a control/comparison difference around the intervention point for most of the within-city cases.

Examples of ITS design

In this section, we use examples of abbreviated ITS to highlight two design features over and above those already mentioned (a longer pretest time-series and a control series selected from non-equivalent but matched units). The two features we emphasize are nonequivalent dependent variables and switching replications. When they are thoughtfully incorporated into quasi-experimental designs, many common threats to validity can be addressed.

In a study that assessed the effects of a 1989 media campaign to reduce alcohol use among students at a university festival, McKillip (1992) added two nonequivalent dependent variables to strengthen inference in a short time series. His main dependent variable was a time series of 10 observations on awareness of alcohol abuse among college students. The two nonequivalent dependent variables, good nutrition and stress reduction, were conceptually related to health, and thus would reflect changes if the treatment effect was due to a general improvement in attitudes toward health. However, since good nutrition and stress reduction were not targeted by the campaign, they would not show improvements if the effect resulted from the treatment alone. As Figure 10.7 shows, awareness of alcohol abuse clearly increased during the media campaign, but awareness of other health-related issues did not.

Figure 10.7 The effects of a media program to increase awareness of alcohol abuse
(Plot not reproduced: Awareness, 2.0 to 3.2, by Date of Observation, 28-Mar to 27-Apr, for three series, Good Nutrition, Stress Reduction, and Responsible Alcohol Use, with the campaign period marked.)
Source: McKillip, 1992. Copyright 1992 by Plenum Press.

McClannahan et al. (1990) employed a switching-replications feature to assess the effects of providing married couples who supervised group homes for autistic children with regular feedback about the daily personal hygiene and appearance of the children in their home. The authors used a short time series (21 observations), with feedback introduced after Session 6 in Home 1, Session 11 in Home 2, and Session 16 in Home 3. Figure 10.8 shows that after each introduction, the personal appearance scores of the children in that home rose above baseline, and the improvement was maintained over time. Both examples, however, demonstrate one limitation of abbreviated time series data: the difficulty in knowing the duration of an effect. For example, Figure 10.7 shows an apparent decrease in alcohol abuse awareness after the two-week intervention.

Figure 10.8 The effects of a parental intervention on the physical appearance of autistic children in three different homes
(Plots not reproduced: three panels, Home 1, Home 2, and Home 3, each showing Mean Personal Appearance Score, 25 to 100, by Sessions, 1 to 21, with the No Feedback and Feedback phases marked.)
Source: McClannahan et al., 1990. Copyright 1990 by The Society for the Experimental Analysis of Behavior.

Two additional features, removing a treatment at a known time and adding multiple replications of a treatment, can strengthen inference in an abbreviated ITS design. In the former, treatment effects can be demonstrated by showing not only that the effects occur with the treatment but also that the effects stop when the treatment is removed later in the time series, making this design akin to having two consecutive ITS. In multiple replications, the treatment is introduced, removed, and then introduced again according to a planned schedule. A treatment effect is suggested if the outcome responds similarly each time the treatment is introduced and removed, with the
direction of responses being different for the introductions compared with the removals.

THE MOST FREQUENTLY USED QUASI-EXPERIMENTAL DESIGN: DIFFERENCE-IN-DIFFERENCES DESIGN

Probably the most widely used nonequivalent comparison group design compares a treated and an untreated comparison group at one pretest time point and at one posttest time, using the same units in each group at each time. Within the quasi-experimental tradition, this design is called the nonequivalent comparison group design (Campbell & Stanley, 1963; Cook & Campbell, 1979; Shadish et al., 2002), but in economics, the design is described as fixed effects or difference-in-differences. We use all three names interchangeably. The design is diagrammed in Figure 10.9.

O1 X O2
O1   O2
Figure 10.9 Nonequivalent comparison group design

In quasi-experiments, because comparison groups are nonequivalent by definition, selection is a concern. Pretest measures can help researchers assess the direction and size of any bias directly observable at pretest. They can also help assess the threat of attrition if participants exit the study before it is finished, for those exiting may be different from those who remain. With a pretest one can directly examine group differences in those exiting and remaining, at least on the pretest variable.

A word of caution, however. A single pretest measure rarely eliminates all plausible threats to validity. The extent to which the pretest rules out selection can also depend on unmeasured variables that are correlated with both treatment receipt and outcome. When pretests show differences between groups, selection can also combine with other threats additively or interactively. Thus, the earlier Ashenfelter example demonstrated how selection can combine with such internal validity threats as maturation, regression, and history.

Evidence for this design’s validity

The validity of the difference-in-differences design depends on how well the comparison group is matched to the treatment group. Ideal matches would not differ between treatment groups other than for participation in the intervention, thus ruling out selection as a threat to validity. Randomized experiments are based on this paradigm. Statisticians have routinely shown that, in a perfectly implemented randomized experiment, no differences are expected between treatment groups prior to intervention, and that this holds for both measured and unmeasured variables. Congruent with this analogy and Mill’s canons (1856), finding better matches dominates much of the thinking about better quasi-experimental design.

Some studies have examined the validity of the difference-in-differences design by comparing estimates from it with those from an experimental benchmark that shares the same treatment group. One is by Shadish et al. (2007), and the second by Aiken et al. (1998). Each study exemplifies the most important requirement for a fair test of design types: that everything is identical between the experiment and quasi-experiment other than how units are assigned to treatment.

Shadish et al. (2007) present one of the best tests of the difference-in-differences design, though it is only one study, in the laboratory, and very short term. The authors looked at the effects of a group coaching intervention on math and vocabulary performances among college students. Participants were randomly assigned to either the experimental or quasi-experimental condition. Both designs were administered so that respondents underwent essentially the same experiences at the same time and in the same setting. Further, the same battery of measurements was used in the experiment and quasi-experiment, with all participants providing details at a pretest time point on personality scales, about their interest in math and reading, and about their
performance on math and vocabulary topics close to those taught. School records were also examined to get their grades in math and language arts courses as well as their SAT scores. For students exposed to math coaching, their posttest math scores functioned as the intervention-relevant outcome while their vocabulary posttest results served as controls. For students who received the reading intervention, vocabulary performance was the intervention-relevant outcome and math performance served as controls. Half of the students were randomly assigned a vocabulary or math treatment, while students in the quasi-experimental design condition were able to choose which treatment they received. The pretest results indicated no treatment and control group differences in the experimental condition, but such differences did appear in the quasi-experimental condition. Those who chose the vocabulary intervention had higher vocabulary pretest scores than those who chose the math intervention, and vice versa. Thus, the quasi-experimental groups were non-equivalent in a way that was originally intended by the authors.
Consider the ways in which Shadish et al.’s (2007) study meets our criterion for a strong test of design types. First, pretest scores for students in the experimental and quasi-experimental conditions indicate that there was variation between design types. Next, as we have already discussed, the random assignment of students into the experimental and quasi-experimental conditions and the uniformity in procedures for both conditions ruled out variation in features that were correlated with both design type and outcome. Third, the laboratory conditions meant that the random assignment process was entirely under experimenter control and its efficacy could be independently checked against the pretest means. The setting also prevented differential attrition from the two design type groups, and treatment contamination from math to vocabulary coaching and vice versa. So we feel confident that results obtained from the experimental design served as a valid standard against which to compare estimates from the quasi-experiment.
We would also expect that a fair test of design types uses a sophisticated quasi-experimental design, and appropriate statistical procedures. At a minimum, a quality quasi-experiment minimizes selection by clearly measuring and modeling it. Note the pool from which comparison group members were drawn: not only were comparison matches ‘local’ to treatment members, but they also attended the same institution, were of similar ages, and were exposed to similar experiences at the institution (a psychology course). In addition, the authors modeled a selection process where individuals’ motivation for choosing a field of coaching was related to their interests and cognitive strengths, measured by pre-intervention, psychometrically sound multi-item questionnaires assessing students’ motivation to learn about math and language arts, their test scores in math and language arts, past grades in math and language arts courses, and content-valid scales specifically constructed to assess math and vocabulary knowledge. This last was also used to measure post-intervention outcomes, with the expectation that pre- and post-intervention test scores would be highly correlated.
In all, access to rich covariates that modeled the selection process, and strong overlap in background characteristics between treatment and comparison group members, enabled the authors to use a statistical procedure called propensity score matching. Like other matching techniques, propensity scores seek to pair treatment and comparison group members on observable characteristics that are stably measured. One problem is that, as the number of matching variables increases, so does the dimensionality of matches, making it exponentially more difficult to find suitable matches for each treated unit. Propensity scores reduce this problem by creating a single index of the propensity to be exposed to the treatment through a first stage in the analysis where potential predictors of selection are used to see which ones are related to treatment exposure understood in binary fashion. A propensity score is the probability of receiving treatment conditional on these pretreatment covariates that are weighted and put into a single index. The advantage of propensity score matching is that it allows researchers to condition on a single
scalar variable rather than a multidimensional space; this single variable is then used to analyze the outcome data in a number of different possible ways.
Because there is some art to the use of propensity scores, Shadish et al. (2007) consulted with one of the method’s developers, Paul Rosenbaum. They then used his recommendations, first, to calculate propensity scores from the array of covariates collected and, then, to achieve good balance across the five equal strata computed from these scores. (Another analysis used the propensity scores as covariates, after testing and adjusting for possible non-linearities; this was not the analysis Rosenbaum recommended.)
Within-study comparison results indicated that, under the conditions built into this study, the experiment and quasi-experiment yielded corresponding post-intervention effect sizes. Bias was not significantly reduced, however, when just the demographic variables (called predictors of convenience by the authors) were used, and in one case their use may even have increased it. Later analyses showed that a measure of the strength of motivation to be exposed to math or language arts was the most important single covariate for reducing bias, particularly for the effects of instruction in mathematics, followed by the measures of math and language arts achievement. The key assumption is that, in this case, the selection process was driven largely, but not exclusively, by individuals selecting themselves into coaching on the subject matter with which they felt more comfortable. It seems plausible to hypothesize, therefore, that the quality of the covariate structure played a role in reducing bias, and that this quality reflects how well the selection process was conceptualized and measured.
The second study we identified as a strong test of the difference-in-differences design is by Aiken et al. (1998). In addition to the RD design discussed earlier, the authors compared their experimental results on the efficacy of remedial English with estimates obtained from a carefully designed basic quasi-experiment. Their comparison group was of students who could not be contacted prior to the quarter they attended the university, or whose decision to enroll was made after information was collected for the RD study. Because the hard-to-reach and late-applying students still had SAT or ACT scores as requirements for admission, the authors were able to create a quasi-experimental comparison group that was restricted to those who scored within the same bandwidth as students in the randomized experiment.
Note that the matching took place within the sampling design and not ex post facto when cases from obviously different populations would have to be individually matched by taking advantage of where they overlap. Indeed, the match at the sampling level was so close, as in Bloom et al. (2005a), that the control and comparison groups did not differ on any observables correlated with the outcomes of interest, viz., on entry-level ACT/SAT scores or pretest essay writing and multiple choice exam scores. Moreover, the experimental and quasi-experimental samples underwent the same treatment and non-treatment experiences and the same measurement schedules in order to rule these out as sources of conceptually irrelevant variance. In all, this was a carefully constructed quasi-experiment despite the modest structure of just two non-equivalent groups and a single pretest measurement wave. The randomized experiment was also carefully managed. The authors demonstrated that pretest means did not differ and differential attrition did not occur. Given the close correspondence in means, the authors used ANCOVA to analyze the quasi-experiment, with each pretest outcome serving as a covariate for itself at a later date.
For the test of English knowledge outcome, effect size results were 0.57 standard deviations for the quasi-experiment and 0.59 for the randomized experiment, both being statistically different from zero. For the essay-writing outcome, the effect sizes were 0.16 and 0.06, neither being reliably different from zero. Thus, by criteria of both effect size magnitude and statistical significance patterns, the experimental and quasi-experimental design produced comparable
results. Note that the close correspondence was achieved largely through careful selection of the non-equivalent comparison cases, prior to statistical adjustment that in this case served more to increase power than to control for selection.
We now briefly review within-study comparisons from the job training literature. These studies have had extraordinary influence over methodology choice in American evaluation policy, with conclusions from these studies suggesting that quasi-experiments fail to replicate benchmark experimental estimates (Glazerman et al., 2003). However, a close reading of these early papers (Cook et al., 2005; Smith & Todd, 2005) suggests that the experiments and quasi-experiments differed in many other ways than just in how treatments were assigned, thus obscuring why the experiments and quasi-experiments differed in obtained effect sizes. Was it due to the mode of treatment assignment, the issue at stake, or was it due to extraneous differences between the two study types, e.g. in how outcomes were measured?
The earliest within-study comparisons in job training took the effect size from a randomized experiment and compared it to the effect size from a quasi-experiment consisting of the same intervention group (Fraker & Maynard, 1987; LaLonde, 1986). Comparison group members were drawn selectively and systematically from large, national datasets, such as the Panel Study of Income Dynamics or the Current Population Survey. Data from the quasi-experiments were then analyzed using various statistical models, including OLS and the Heckman selection models of the day. Here the emphasis was on selection adjustment via statistical manipulation rather than sample selection. When the resulting effect sizes were compared to the effect size from the yoked experiment, the authors concluded that the experimental and nonexperimental effect sizes were generally different whatever the mode of statistical adjustment for selection.
Unfortunately, the design comparisons were almost inevitably confounded with both location and manner of testing. In the quasi-experiments, unlike the experiments, comparison cases came from national registries rather than from the same local venue as the experiments, and earnings were measured at different times in each study, thus confounding the type of design with the state of local labor markets as they varied over time (Smith & Todd, 2005). In addition, selection models were sometimes estimated using few demographic variables that did not even contain pre-intervention earnings assessments, though many later studies did include these measures. However, the pre-intervention variables were not analyzed in the abbreviated time-series fashion detailed earlier, but combined into a single propensity score. In fairness, the early job training studies were conducted by pioneers, many of whom were anxious to investigate matching strategies with databases that had already been collected for non-evaluation purposes and that would be much less expensive than constructing randomly formed control groups as in an experiment. So their interests were pragmatic as well as theoretical.
The early within-study comparisons spawned more studies similar in conception and overall structure but differing in some details. Later studies used newer statistical tools for handling selection, more experimental datasets were added, and different ways evolved for constructing quasi-experimental comparison groups, moving away from the use of national datasets toward comparison cases living close to the treated (Smith & Todd, 2005). This was to unconfound the mode of treatment assignment from differences in location and testing in order to draw clearer conclusions about the effects of random assignment or not.
Overall, the job training literature yielded some important lessons about quasi-experimental design. We learned that ‘technically better’ designs had the following features: (1) pretests and longer pretest time series, especially those with higher pretest-outcome correlations; (2) local control groups, though this never went so far as to use twins, siblings or within-organization comparisons; (3) treatment and comparison groups assessed in exactly the same way at
exactly the same time by exactly the same assessment procedure; (4) testing a causal hypothesis with several implications in the data; and (5) directly and comprehensively measuring and modeling the selection process in its own right.

Examples of difference-in-differences design

In this section, we discuss examples of the nonequivalent comparison group design. Because of the wide variation in quality of quasi-experimental designs that use this design model, we present two studies that Cook and Campbell (1979) would identify as ‘generally uninterpretable,’ and two that we believe are exemplars of the design. We begin with two studies that attempt to strengthen quasi-experiments almost exclusively through the use of statistical matching procedures, in this case matching through propensity scores.
The original intent of the Wilde and Hollister (2007) and Agodini and Dynarski (2004) studies was to compare causal impact estimates from a randomized experiment with those from a quasi-experimental design that used propensity scores to match what were evidently different populations. Using data from 11 schools in the Tennessee Project Star Study, Wilde and Hollister looked at class size effects on student achievement in each of the 11 sites. For their quasi-experimental study, the researchers matched students from treatment classrooms within a school to students from untreated classrooms from all other schools in the Project Star study. Their propensity score matches were constructed using data from multiple levels, including information about the student (especially free-lunch status) and about the teacher and school. No pretest achievement measures were used at the student, classroom, or school levels since none were collected in the original study. The authors concluded that results from the experimental and quasi-experimental designs generally failed to replicate, and that experimental results should be preferred on a priori grounds.
A closer look at the quasi-experimental design, however, suggests several weaknesses. First, because the original study was a randomized experiment with large samples of students, Project Star did not require pretest achievement measures. Most non-time series quasi-experiments are considered to be causally uninterpretable if they are without pretest measures because it is so difficult to rule out selection effects in any transparent fashion (Cook & Campbell, 1979). A second concern is that treatment students were matched with control students who attended schools from all over the state of Tennessee, thus reducing the degree of localness. Yet with little extra effort the researchers could have first selected schools or classrooms in terms of their average student race or free lunch status, and then created their individual student-level matches from within this prior school and/or classroom matching. Better yet, since prior achievement data is routinely available at the school level, and sometimes even at the classroom level, why did the researchers not match on prior aggregate level achievement before then matching on individual level propensity scores? The matching procedure used by Wilde and Hollister created treatment and comparison units from such different aggregate worlds that there was little overlap on measured variables. The alternative sampling design we propose permits propensity scores to be calculated from worlds that overlap much more from the start. We suspect that the lack of pretest achievement measures and weak matching procedure with samples of limited initial comparability led to a design that no sophisticated researcher would use if asked to create a quasi-experiment from scratch. In other words, a good experiment is being compared to a mediocre quasi-experiment in both design and analysis terms, thus confounding design type with quality of design in features other than the mode of treatment assignment.
Agodini and Dynarski (2004) is the second study we analyze. It examined how 16 middle and high school dropout prevention programs affected student dropout, absenteeism, and self-esteem two years later. They provided
volunteer students with targeted services such as mentoring, tutoring, individual counseling, and smaller class sizes in order to reduce student dropout and absenteeism and increase self-esteem and educational aspirations over two years. In the experiment, controls were randomly selected students who had applied to the intervention or were referred to participate in it. Two data sources were used to construct the non-equivalent comparison groups from which propensity scores were computed. For the first, researchers matched treatment group members with students attending four comparison schools in a quasi-experimental study of school restructuring. These were 7th graders in two middle schools, 9th graders in one high school, and 10th graders in another. For the second, Agodini and Dynarski constructed matches from a national dataset, the National Education Longitudinal Study (NELS). The researchers’ original plan was to generate 128 propensity score matches across four outcomes, 16 schools and their matched comparisons, and two types of comparison groups (NELS versus the four comparison middle schools). The number of pre-intervention covariates used in the propensity score calculations varied by data source, but there were never fewer than 13, including prior test scores but not much dropout information since so few students drop out and then return to school.
Agodini and Dynarski (2004) concluded that their quasi-experimental and experimental designs produced different results when the two could be compared, but that they could be compared in only 29 of the 128 planned cases. At first glance, the quasi-experimental design appears strong due to the presence of pretest scores and extensive baseline measures. However, close inspection of both comparison group sources suggests serious limitations in the sampling design. For the first comparison, students were drawn from four schools not in the same school districts as the treatment schools, nor necessarily even in the same part of the country. In addition, treated students were from all the middle and high school grades whereas the comparison students were fewer in number and restricted only to the 7th, 9th and 10th grades. Given the differences between the two groups in location and school age, and probably also in observed characteristics, it is understandable why so few acceptable matches were achieved between treatment students and comparison students from a few middle schools. The NELS comparison was likely no better. National datasets contain relatively few persons at risk for dropping out, and so the pool of potential matches was restricted to start with. Further, measurement specifics and geographic location vary between the treatment group and potential comparison students from NELS, making it all the more difficult to achieve suitable matches when using NELS for matching purposes.
Looking at Wilde and Hollister (2007) and Agodini and Dynarski (2004) together, one is reminded of the adage, ‘You cannot put right by statistics what you have done wrong by design’ (Light & Pillemer, 1984). Shadish et al. (2007) and Aiken et al. (1998) showed that the best nonequivalent group comparisons are from studies where matching was achieved through a careful sampling design and where statistical adjustment is relegated to the role of an auxiliary procedure to control for any remaining differences between groups. It is definitely not the first line of attack on initial group non-comparability. Below, we discuss other design features that improve causal conclusion-drawing from quasi-experiments over and above the careful sampling discussed above that antedates any individual case-matching.
Wortman et al. (1978) examined how a program that provided parents with educational vouchers so that their children could attend a local school of their choice affected students’ reading test scores. The program’s goal was to foster competition between schools in the system, and initial results by others suggested that vouchers decreased academic performance among students. However, Wortman et al. doubted these conclusions and so they followed groups of students from first to third grades in both voucher and non-voucher schools, and further divided voucher schools into those with and without traditional voucher programs.
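
Following the same groups of students over several grades supplies more than one pre-intervention observation per group, and that is what a comparison of growth rates needs. A minimal sketch in Python, assuming three equally spaced measurement waves and linear growth, with invented group means and hypothetical names (none of this is Wortman et al.’s data or procedure):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupMeans:
    """Mean scores for one group at three equally spaced waves (invented)."""
    pretest1: float
    pretest2: float
    posttest: float

    @property
    def pre_growth(self) -> float:
        # Growth observed before any treatment is delivered.
        return self.pretest2 - self.pretest1

    @property
    def post_change(self) -> float:
        # Change observed across the treatment period.
        return self.posttest - self.pretest2


def naive_did(treated: GroupMeans, comparison: GroupMeans) -> float:
    """Single-pretest difference-in-differences; ignores prior growth."""
    return treated.post_change - comparison.post_change


def trend_adjusted_did(treated: GroupMeans, comparison: GroupMeans) -> float:
    """Each group's departure from its own pretreatment growth, differenced.

    Assumption: absent treatment, pretest1-to-pretest2 growth would have
    continued unchanged from pretest2 to posttest.
    """
    treated_dev = treated.post_change - treated.pre_growth
    comparison_dev = comparison.post_change - comparison.pre_growth
    return treated_dev - comparison_dev


# Hypothetical means, chosen only to illustrate the arithmetic.
treated = GroupMeans(pretest1=40.0, pretest2=46.0, posttest=55.0)
comparison = GroupMeans(pretest1=42.0, pretest2=45.0, posttest=48.0)

print(treated.pre_growth - comparison.pre_growth)  # 3.0 (pretreatment growth gap)
print(naive_did(treated, comparison))              # 6.0
print(trend_adjusted_did(treated, comparison))     # 3.0
```

With these invented numbers the treated group was already growing 3 points per wave faster before treatment, so the single-pretest estimate (6 points) is twice the trend-adjusted estimate (3 points).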
The authors also reanalyzed the data using double pretest scores, which allowed them to compare pretreatment growth rates in reading with posttest change in rates. Wortman et al.’s analyses found that the decrease in reading scores previously attributed to voucher schools could actually be attributed to nontraditional voucher programs. Further, traditional voucher and non-voucher groups showed no differential effects that could not be explained by a continuation of the same maturation rates which had previously characterized the traditional voucher and control schools.
Double pretests allowed researchers to assess the threat of selection-maturation on the assumption that the rates between the first two pretests will continue between the second pretest and outcome measure. However, this assumption is testable only for the untreated group, and within-group growth rates will be fallibly estimated given measurement error and possible instrumentation shifts that make measured growth between the two pretests different from that between the second pretest and outcome measure. Thus, while the double pretest design with nonequivalent groups is not perfect, it can help assess the plausibility of selection-maturation by describing pretreatment growth differences. The double pretest design can also assess regression effects by showing whether the second pretest for either group is atypically low or high compared to the first pretest. Finally, the second pretest measure can help with statistical analysis by providing more precise estimates of correlation between observations at different times. Without the extra pretest measure, the correlation in observations without the treatment would be unclear.
Hackman et al. (1978) strengthened the nonequivalent comparison group design by adding a reversed-treatment control group feature to investigate how changes in motivational properties of jobs affect worker attitudes and behaviors. In a reversed-treatment control design, one group receives a treatment (+) to produce an effect in one direction and the other group receives a conceptually opposite treatment (−) to produce the reverse effect. In Hackman et al.’s study, technological innovations in a bank resulted in some clerical jobs becoming more complex and challenging (treatment +) and other jobs becoming less so (treatment −). The job changes were made without telling the employees of their possible motivational consequences, and measures of job characteristics, employee attitudes, and work behaviors were taken before and after the jobs were reconstituted. An effect would be detected if a statistical interaction resulted from improved scores among employees who received treatment (+) and lower scores among those who received treatment (−).
Consider how the reversed-treatment design can strengthen a study’s construct validity. In a design with only treatment (+) and no-treatment controls, a steeper pretest-posttest slope in the enriched condition could be explained by employees’ responding to novelty in their jobs, feelings of special treatment, or guessing the study’s hypothesis. These alternatives are less plausible if the reversed-treatment group exhibits a pretest-posttest decrease in job satisfaction because it is thought that knowledge of being in a study tends to elicit socially desirable responses from participants. Thus, to explain both an increase in the treatment (+) group and a decrease in the reversed group, each set of respondents would have to guess the hypothesis and corroborate it in their own different way. Interpretation of this design then depends on producing two effects with opposite signs, and the design assumes that few historical and/or motivational changes are otherwise taking place.

STRENGTHENING WEAK QUASI-EXPERIMENTAL DESIGNS THROUGH THE USE OF PATTERN-MATCHING

The quasi-experimental designs we believe are weak causal tests should be apparent by now: those without a pretest measure on the same scale as the outcome, those without a
comparison group, and those without baseline covariates that can be combined to create a plausible and well-measured selection model. Without a pretest, it is difficult to know whether a change has occurred and to rule out most threats to internal validity. Without a comparison group, it is difficult to know what would have happened in the treatment group had the intervention not been in place (the desired counterfactual). Finally, without relevant covariates to control for pre-intervention differences between groups, it is difficult to know whether selection is confounded with treatment effects. These factors point to the kinds of quasi-experiment that should be avoided because of the high risk of yielding results that Cook and Campbell (1979) have called ‘generally non-interpretable.’ However, what happens in circumstances where the ideal design conditions that Shadish et al. (2007) and Aiken et al. (1998) created are not possible (i.e. studies where pretest data is not available)?
The superiority of RD and ITS designs is based on an epistemology that is subtly different from the one that validates randomized experiments and most of the other quasi-experimental designs utilized today. There, the counterfactual is a single posttest mean or a form of ‘gain’ in the control group. In RD and ITS, on the other hand, the counterfactual is more complex and depends instead on a pattern match, on a causal hypothesis that predicts multiple implications in the data. Together, they form a multivariate pattern (Corrin & Cook, 1998; Shadish et al., 2007) that few if any alternative interpretations would be expected to create, though this last assertion has to be critically assessed.
Let us illustrate with an example. Minton (1975) examined the effects of Sesame Street by comparing the cognitive performance of children who were exposed to the show in kindergarten with the performance of their own siblings when they were in the same kindergarten one or two years earlier and when they could not have seen the show since it was not yet on the air. To compare just these siblings is a weak design that fails to account for selection and history differences. However, content analysis showed that Sesame Street taught predominantly letter skills in its first year, and so Minton hypothesized that the younger siblings should do better than their older siblings in letter recognition but not in five other cognitive areas that are part of a child’s normal maturation. In other words, the difference between siblings should be greater in letter recognition than in other cognitive skills. She further hypothesized that children who watched the show more frequently would do better than their siblings on letter recognition to a degree that was different from that among lighter viewers and different from what was found for non-letter recognition skills. Thus, the hypothesis was of a difference of differences of differences. OLS analyses showed that heavier viewers did indeed do better than their siblings on letter recognition to an extent not found with the lighter viewers, and that this difference of differences was not as pronounced on the five other cognitive tests as on letter recognition. Few alternative interpretations can be offered for this predicted pattern of difference of differences of differences.
Note that this study’s finding appears valid even without pretests, and even though measurement took place in different years for the treatment and comparison groups. Yet the design seems strong. Why? First, Minton compensated for some design weaknesses by having siblings in the treatment and comparison groups. They are not perfect matches, though, even if they do control for some environmental and family differences better than matches with more distantly related individuals would. Second, the same general causal hypothesis about Sesame Street’s effectiveness was made to have a number of substantive and testable implications in the data, not just a single implication. In particular, effect sizes should vary by the outcome measure and dosage level. This still does not make causal inference ‘automatic.’ A case still has to be made that no other causal hypothesis can explain the predicted and obtained complex data pattern; and one has to develop such designs with one’s eyes wide open to the fact that the hypothesis involves a multi-way statistical analysis that requires
large sample sizes and quality measurement to test well. Nonetheless and following Minton’s example, we would like to see more use of patterned causal hypotheses when experiments or very high-quality quasi-experiments are not possible.

CONCLUSIONS

In some economics contexts, quasi-experiments are lumped together with causal studies that do not have any direct intervention, and the whole is called ‘nonexperiments.’ However, one tradition (Campbell & Stanley, 1963; Cook & Campbell, 1979; Shadish et al., 2002) makes finer distinctions than this, distinguishing between experiments and nonexperiments (based mainly on deliberate intervention into an ongoing activity) and between different kinds (and qualities) of quasi-experiments. Widespread use of the generic ‘nonexperiment’ label loses all this subtlety. At best, it serves as the contrast to experiment; at its worst it lumps together methods that radically vary in their ability to approximate the results of experiments. It should be a concept rarely invoked, though we realize we cannot legislate this.
In this chapter, we have chosen to highlight the best designs that quasi-experimental theory has to offer. Empirical research has shown that RD studies give the same causal answer as experiments on the same topic; abbreviated ITS studies may also do so when there is a control time series; even the lowly workhorse design with two non-equivalent groups and a pretest and posttest may give a close approximation if the treatment and comparison groups are carefully selected initially. Certain design attributes seem particularly important, including: (1) pretests and longer pretest time series, especially when the pretest-outcome correlation is high; (2) local comparison groups, whether these be monozygotic twins, identical twins, same-sex siblings, opposite-sex siblings, within-organization controls, within-city matched controls, and so on; (3) treatment and comparison groups that are assessed in exactly the same way at exactly the same time by exactly the same person; (4) a causal hypothesis with several testable implications in the data that can be addressed with larger samples and quality measurement; and (5) a study component that empirically examines the selection process into treatment, and then measures this process very carefully.
Because of the randomized experiment’s more elegant rationale and transparency of assumptions, no quasi-experiment provides a better warrant for causal inference. However, randomized experiments are not always possible, and so we ask, ‘How can quasi-experiments be crafted and justified because, on empirical grounds, they are likely to produce similar results to an experiment?’ A review of the empirical literature suggests that the best quasi-experiments tend to yield causal estimates close to those of the experiment, while the worst quasi-experiments do not. The time has come for us to move beyond the simplicity of the ‘experiment versus nonexperiment’ debate and to take a closer look at factors affecting the quality of quasi-experiments.

NOTES

1 Cook and Wong (in press) present the following seven criteria for conducting a high quality study of within-study comparisons:
  1 There must be variation in the design types being compared, that is, random assignment in one group of units and a contrasting form of assignment in another group.
  2 The assignment difference between the experiment and nonexperiment should not co-vary with theoretically irrelevant third variables that might be plausibly correlated with study outcome (Smith & Todd, 2005). For instance, in the earliest within-study comparisons, the randomly selected control cases came from the same sites as the intervention cases, but the non-random comparison cases came from national datasets like the Current Population Survey and hence from different physical locations than those in the experiment. The random and systematic controls also differed in many aspects of when and how outcome measurement occurred, thus also confounding the assignment variable of theoretical interest with measurement factors.
   3 The experiment and nonexperiment should also estimate the same average or local average treatment effect. For example, in a RD study the causal impact is assessed at the cutoff point on the assignment variable. Comparability demands that the average treatment effect in the experiment should also be estimated at this point. Otherwise, differences in results might be attributed to differences in design type whereas they are due to differences in where the effect is estimated. This will not matter with linear effects in RD, but it will with non-linear ones.
   4 The randomized experiment should demonstrably meet all the usual criteria for technical adequacy. That is, the treatment and control group should have been properly randomized; the correct randomization procedure should not have resulted in unhappy randomization by chance; there should be no differential attrition; nor should there be treatment crossovers. The importance of these features follows from the role the randomly formed control group is supposed to play as a benchmark of complete internal validity.
   5 The type of nonexperiment under analysis should also meet all of its technical criteria for being a good example of its type. This is a difficult criterion, but necessary for avoiding the situation that results when a good experiment is contrasted with a poor example of a particular type of observational study. The key here is an explicit theory of what constitutes a quality observational study in terms of its design, implementation, and analysis. This is better known for RD than for the difference-in-differences design, largely because the assignment process is more transparent and better modeled in RD, directing major attention to how the functional form is specified and how fuzziness around the cutoff is handled. This is not to argue that unbiased inference is impossible with the difference-in-difference design. However, the requirement is then that assignment processes have to be perfectly modeled or the outcome totally predicted and, in actual research practice, uncertainties always remain about how well these requirements are met. Clues are also offered by the results identifying bias-reducing features in past reviews of the within-study comparison literature in job training. But as we have seen, these are incomplete and have never completely reduced selection bias. At most, common sense can help identify clear cases of poor design and analysis even if it cannot help discriminate among the alternatives currently thought to be better.
   6 A within-study comparison should be explicit about the criteria it uses for inferring correspondence between experimental and nonexperimental results. Identical estimates are not to be expected. Even close replications of the same randomized experiment will not result in identical posttest sample means and variances. Assuming adequate statistical power, the same pattern of statistical significance will result in only 68 percent of comparisons: the probability of two significant findings across experiments is 0.80 × 0.80, and the probability of two non-significant findings is 0.20 × 0.20, that is, 0.64 + 0.04 = 0.68. Better than comparisons of significance test patterns are focused tests of the difference between mean estimates. But these are rare in the literature we review and require careful interpretation, especially when experimental and nonexperimental estimates with the same causal sign reliably differ from zero and are also reliably different from each other. Comparing magnitude estimates without significance tests is another option. But this is complicated by the need to determine what degree of difference is close enough to justify concluding that the experimental and nonexperimental estimates do or do not differ.
   7 The persons analyzing the non-experimental data should be blind to the results of the experiment so as not to bias which non-experimental analyses are conducted or offered for publication.
2 One of the cleanest examples of IV is the use of random assignment as an instrumental variable in order to examine the effects of assignment as it actually occurred as opposed to how it was supposed to occur (see Angrist et al., 1996 for full explanation).
3 Hahn et al. (2001) offer a formal discussion of instrumental variable methods for addressing fuzzy discontinuities, and suggest local linear regression as a non-parametric IV procedure for estimating treatment effects.
4 It is important to note that when doing analysis using an interrupted time series design, one must adjust for possible correlation between observations. For example, ordinary statistical tests (e.g. t-tests) that compare pre- and post-treatment observations assume that observations are taken from independent and identical distributions. However, this assumption is often not met when analyzing time series data (think about autocorrelation of a student’s test score from year to year). Estimating autocorrelation requires a larger number of observations to facilitate correct model identification.

REFERENCES

Agodini, R., & Dynarski, M. (2004). Are experiments the only option? A look at dropout prevention programs. The Review of Economics and Statistics, 86(1), 180–194.
Aiken, L. S., West, S. G., Schwalm, D. E., Carroll, J., & Hsuing, S. (1998). Comparison of a randomized and two quasi-experimental designs in a single
outcome evaluation: Efficacy of a university-level policy? Econometric evaluation of public policies:

remedial writing program. Evaluation Review, 22(4), Methods and applications Retrieved December 15,
207–244. 2005, from www.crest.fr/conference/paper/cook.doc
Angrist, J., Imbens, G. W., & Rubin, D. B. (1996). Identifi- Cook, T. D., & Wong, V. C. (in press). Empirical tests
cation of causal effects using instrumental variables. of the validity of the regression-discontinuity design.
Journal of the American Statistical Association, 91, Annales d’Economie et de Statistique.
444–472. Corrin, W. J., & Cook, T. D. (1998). Design elements of
Angrist, J. D., & Lavy, V. (1999). Using Maimonides’ quasi-experiments. In A. J. Reynolds & H. J. Walberg
rule to estimate the effect of class size on scholastic (Eds.), Advances in educational productivity (Vol. 7).
achievement. Quarterly Journal of Economics, 114, Greenwich, CT: JAI Press, Inc.
533–576. Fraker, T., & Maynard, R. (1987). The adequacy
Ashenfelter, O. (1978). Estimating the effects of training of comparison group designs for evaluations of
programs on earnings. Review of Economics and employment-related programs. Journal of Human
Statistics, 60, 47–57. Resources, 22(2), 194–227.
Black, D., Galdo, J., & Smith, J. C. (2005). Evaluating the Glazerman, S., Levy, D. M., & Myers, D. (2003).
regression discontinuity design using experimental Nonexperimental versus experimental estimates of
data. Working paper. earnings impacts. The Annals of the American
Bloom, H. S., Michalopoulos, C., & Hill, C. J. Academy, 589, 63–93.
(2005a). Using experiments to assess nonexperi- Goldberger, A. S. (1972a). Selection bias in evaluating
mental comparison-group methods for measuring treatment effects: Some formal illustrations. Madison,
program effects. In H. S. Bloom (Ed.), Learning more WI: Institute for Research on Poverty.
from social experiments (pp. 173–235). New York: Goldberger, A. S. (1972b). Selection bias in evaluating
Russell Sage Foundation. treatment effects: The case of interaction. Madison,
Bloom, H. S., Michalopoulos, C., Hill, C. J., & WI: Institute for Research on Poverty.
Lei, Y. (2002). Can nonexperimental comparison Hackman, J. R., Pearce, J. L., & Wolfe, J. C. (1978). Effects
group methods match the findings from a random of changes in job characteristics on work attitudes and
assignment evaluation of mandatory welfare-to-work behaviors: A naturally occurring quasi-experiment.
programs? Washington, DC: Manpower Demonstra- Organizational Behavior and Human Performance,
tion Research Corporation. 21, 289–304.
Bloom, H. S., Richburg-Hayes, L., & Black, A. R. Hahn, J., Todd, P., & Van der Klaauw, W. (2001).
(2005b). Using covariates to improve precision: Identification and estimation of treatment effects
Empirical guidance for studies that randomize schools with a regression-discontinuity design. Econometrica,
to measure the impacts of educational interven- 69(1), 201–209.
tions. Washington, DC: Manpower Demonstration Institute of Education Sciences (2004). Reading comprehen-
Research Corporation. sion and reading scale-up research grants request for
Box, G. E. P., & Jenkins, G. M. (1970). Time series applications. Washington, DC: Department of Education.
analysis: Forecasting and control. San Francisco: Jacob, B., & Lefgren, L. (2004a). The impact of teacher
Holden-Day. training on student achievement: Quasi-experimental
Buddelmeyer, H., & Skoufias, E. (2003). An evaluation evidence from school reform efforts in Chicago.
of the performance of regression discontinuity Journal of Human Resources, 39(1), 50–79.
design on PROGRESA. Bonn, Germany: IZA. Jacob, B., & Lefgren, L. (2004b). Remedial education
Campbell, D. T., & Stanley, J. C. (1963). Experimental and student achievement: A regression-discontinuity
and quasi-experimental designs for research on analysis. Review of Economics and Statistics,
teaching. In N. L. Gage (Ed.), Handbook of research LXXXVI(1), 226–244.
on teaching. Chicago: Rand McNally. Jacob, B. A., & Lefgren, L. (2002). The impact of teacher
Cook, T. D. (in press). ‘Waiting for life to arrive’: A history training on student achievement: Quasi-experimental
of the regression-discontinuity design in psychology, evidence from school reform efforts in Chicago. NBER
statistics and economics. Journal of Econometrics. working paper series #W8916.
Cook, T. D., & Campbell, D. T. (1979). Quasi- LaLonde, R. (1986). Evaluating the econometric
experimentation: Design and analysis for field evaluations of training with experimental data. The
settings. Chicago, IL: Rand McNally. American Economic Review, 76 (4), 604–620.
Cook, T. D., Shadish, W. R., & Wong, V. C. (2005). Light, R. J., & Pillemer, D. (1984). Summing up:
Within-study comparisons of experiments and non- The science of reviewing research. Cambridge, MA:
experiments: Can they help decide on evaluation Harvard University Press.
Lipsey, M. W., & Wilson, D. B. (1993). The efficacy of van der Klaauw, W. (2002). Estimating the effect
psychological, educational, and behavioral treatment. of financial aid offers on college enrollment:
American Psychologist, 48(12), 1181–1209. A regression-discontinuity approach. International
McClannahan, L. E., McGee, G. G., MacDuff, G. S., & Economic Review, 43(4), 1249–1287.
Krantz, P. S. (1990). Assessing and improving child Wilde, E. T., & Hollister, R. (2007). How close
care: A personal appearance index for children with is close enough? Testing nonexperimental esti-
autism. Journal of Applied Behavior Analysis, 23, mates of impact against experimental estimates of
469–482. impact with education test scores as outcomes.
Michalopoulos, C., Bloom, H. S., & Hill, C. J. (2004). Can Journal of Policy Analysis and Management, 26,
propensity-score methods match the findings from a 455–477.
random assignment evaluation of mandatory welfare- Wortman, P. M., Reichardt, C. S., & St. Pierre, R. G.
to-work programs? The Review of Economics and (1978). The first year of the education voucher
Statistics, 86(1), 156–179. demonstration. Evaluation Quarterly, 2, 193–214.
McKillip, J. (1992). Research without control groups:
A control construct design. In F. B. Bryant, J. Edwards,
R. S. Tindale, E. J. Posavac, L. Heath & E. Henderson APPENDIX 1: SUGGESTIONS FOR
(Eds.), Methodological issues in applied psychology
(pp. 159–175). New York: Plenum.
FURTHER READINGS ON TOPICS
Mill, J. S. (1856). A system of logic: Ratiocinative and COVERED IN THIS CHAPTER
inductive. Honolulu, Hawaii: University Press of the
Pacific. Regression discontinuity design
Minton, J. H. (1975). The impact of ‘Sesame Street’ on
reading readiness of kindergarten children. Sociology Aiken, L. S., West, S. G., Schwalm, D. E., Carroll, J., &
of Education, 48, 141–151. Hsuing, S. (1998). Comparison of a randomized
Seaver, W. B., & Quarton, R. J. (1976). Regression- and two quasi-experimental designs in a single
discontinuity analysis of dean’s list effects. Journal outcome evaluation: Efficacy of a university-level
of Educational Psychology, 68, 459–465. remedial writing program. Evaluation Review, 22(4),
Shadish, W. R., Clark, M. H., & Steiner, P. M. (2007). Can 207–244.
nonrandomized experiments yield accurate answers? Angrist, J. D., & Lavy, V. (1999). Using Maimonides’
A randomized experiment comparing random to rule to estimate the effect of class size on scholastic
nonrandom assignment. achievement. Quarterly Journal of Economics, 114,
Shadish, W. R., Luellen, J. K., & Clark, M. H. 533–576.
(2006). Propensity scores and quasi-experiments: Berk, R. A., & de Leeuw, J. (1999). An evaluation
A testimony to the practical side of Lee Sechrest. of California’s inmate classification system using a
In R. R. Bootzin (Ed.), Measurement, methods and generalized regression discontinuity design. Journal
evaluation. Washington, DC: American Psychological of the American Statistical Association, 94(448),
Association Press. 1045–1052.
Smith, J. C., & Todd, P. (2005). Does matching overcome Berk, R. A., & Rauma, D. (1983). Capitalizing on
LaLonde’s critique of nonexperimental estimators. nonrandom assignment to treatments: A regression-
Journal of Econometrics, 125, 305–353. discontinuity evaluation of a crime-control program.
Staw, B. M., Notz, W. W., & Cook, T. D. Journal of the American Statistical Association,
(1974). Vulnerability to the draft and attitudes 78(381), 21–27.
toward troop withdrawal from Indochina: Repli- Black, D., Galdo, J., & Smith, J. C. (2005). Evaluating the
cation and refinement. Psychological Reports, 34, regression discontinuity design using experimental
407–417. data. Working paper.
Trochim, W. (1994). The regression-discontinuity design: Buddelmeyer, H., & Skoufias, E. (2003). An evaluation
An introduction. Chicago, IL: Thresholds National of the performance of regression discontinuity design
Research and Training Center on Rehabilitation and on PROGRESA. Bonn, Germany: IZA.
Mental Illness. Cook, T. D. (in press). ‘Waiting for life to arrive’:
Trochim, W. M. K. (1984). Research design for program A history of the regression-discontinuity design in
evaluation. Beverly Hills, CA: Sage Publications. psychology, statistics and economics. Journal of
Trochim, W. M. K., & Spiegelman, C. (1980). The relative Econometrics.
assignment variable approach to selection bias in Goldberger, A. S. (1972a). Selection bias in evaluating
pretest-posttest designs. Alexandria, VA: American treatment effects: Some formal illustrations. Madison,
Statistical Association. WI: Institute for Research on Poverty.
Goldberger, A. S. (1972b). Selection bias in evaluating group methods match the findings from a random
treatment effects: The case of interaction. Madison, assignment evaluation of mandatory welfare-to-work
WI: Institute for Research on Poverty. programs? Washington, DC: Manpower Demonstra-
Hahn, J., Todd, P., & Van der Klaauw, W. (2001). tion Research Corporation.
Identification and estimation of treatment effects Braver, M. W. (1991). The multigroup interrupted
with a regression-discontinuity design. Econometrica, time-series design and analysis: An application to
69(1), 201–209. career ladder research. Arizona State University,
Jacob, B., & Lefgren, L. (2004a). The impact of teacher Phoenix.
training on student achievement: Quasi-experimental McArdle, J. J., & Wang, L. (2006). Modeling age-
evidence from school reform efforts in Chicago. based turning points in longitudinal life-span growth
Journal of Human Resources, 39(1), 50–79. curves of cognition. In P. Cohen (Ed.), Turning points
Jacob, B., & Lefgren, L. (2004b). Remedial education research. Mahwah, NJ: Erlbaum.
and student achievement: A regression-discontinuity McClannahan, L. E., McGee, G. G., MacDuff, G. S., &
analysis. Review of Economics and Statistics, Krantz, P. S. (1990). Assessing and improving child
LXXXVI(1), 226–244. care: A personal appearance index for children with
Ludwig, J., & Miller, D. L. (2005). Does head start autism. Journal of Applied Behavior Analysis, 23,
improve children’s life chances? Evidence from a 469–482.
regression discontinuity design. Cambridge, MA: McKillip, J. (1992). Research without control groups:
National Bureau of Economic Research. A control construct design. In F. B. Bryant, J. Edwards,
Seaver, W. B., & Quarton, R. J. (1976). Regression- R. S. Tindale, E. J. Posavac, L. Heath & E. Henderson
discontinuity analysis of dean’s list effects. Journal (Eds.), Methodological issues in applied psychology
of Educational Psychology, 68, 459–465. (pp. 159–175). New York: Plenum.
Spiegelman, C. (1977). A technique for analyzing
a pretest-posttest nonrandomized field experiment.
Statistics Report M435.
Thistlethwaite, D. L., & Campbell, D. T. (1960). Regression-discontinuity Difference-in-differences studies
analysis: An alternative to the ex-post that use propensity score methods
facto experiment. Journal of Educational Psychology,
Agodini, R., & Dynarski, M. (2004). Are experiments
51, 309–317.
the only option? A look at dropout prevention
Trochim, W. M. K. (1984). Research design for program
programs. The Review of Economics and Statistics,
evaluation. Beverly Hills, CA: Sage Publications.
86 (1), 180–194.
Trochim, W. M. K., & Spiegelman, C. (1980). The relative
Dehejia, R., & Wahba, S. (1999). Causal effects in
assignment variable approach to selection bias in
nonexperimental studies: Reevaluating the evaluation
pretest-posttest designs. Alexandria, VA: American
of training programs. Journal of the American
Statistical Association.
Statistical Association, 94(448), 1053–1062.
van der Klaauw, W. (2002). Estimating the effect
Heckman, J., Ichimura, H., & Todd, P. E. (1998).
of financial aid offers on college enrollment:
Matching as an econometric evaluation estimator.
A regression-discontinuity approach. International
Review of Economic Studies, 65(2), 261–294.
Economic Review, 43(4), 1249–1287.
Heckman, J., Ichimura, H., & Todd, P. E. (1997).
Matching as an econometric evaluation estimator:
Evidence from evaluating a job training programme.
Interrupted time series design Review of Economic Studies, 64, 605–654.
Heckman, J., & Navarro-Lozano, S. (2004). Using match-
Ashenfelter, O. (1978). Estimating the effects of training ing, instrumental variables, and control functions
programs on earnings. Review of Economics and to estimate economic choice models. Review of
Statistics, 60, 47–57. Economics and Statistics, 86 (1), 30–57.
Bloom, H. S., Michalopoulos, C., & Hill, C. J. Hirano, K., & Imbens, G. W. (2001). Estimation of causal
(2005). Using experiments to assess nonexperimental effects using propensity score weighting: An applica-
comparison-group methods for measuring program tion to data on right heart catheterization. Health
effects. In H. S. Bloom (Ed.), Learning more from social Services and Outcomes Research Methodology, 2,
experiments (pp. 173–235). New York: Russell Sage 259–278.
Foundation. Imbens, G. W. (2000). The role of the propensity score
Bloom, H. S., Michalopoulos, C., Hill, C. J., & in estimating dose-response functions. Biometrika,
Lei, Y. (2002). Can nonexperimental comparison 87 (3), 706–710.
Heckman, J. J., Ichimura, H., Smith, J. C., & Todd, P. Shadish, W. R., Luellen, J. K., & Clark, M. H.
(1998). Characterizing selection bias. Econometrica, (2006). Propensity scores and quasi-experiments:
66 (5), 1017–1098. A testimony to the practical side of Lee Sechrest.
Hotz, V. J., Imbens, G. W., & Klerman, J. (2000). In R. R. Bootzin (Ed.), Measurement, methods and
The long-term gains from GAIN: A re-analysis of evaluation. Washington, DC: American Psychological
the impacts of the California GAIN program. NBER Association Press.
technical working paper #8007. Smith, J. C., & Todd, P. (2005). Does matching overcome
Hotz, V. J., Imbens, G. W., & Mortimer, J. H. (1999). LaLonde’s critique of nonexperimental estimators.
Predicting the efficacy of future training programs Journal of Econometrics, 125, 305–353.
using past experience. NBER technical working Wilde, E. T., & Hollister, R. (2002). How close is
paper #238. close enough? Testing nonexperimental estimates of
LaLonde, R. (1986). Evaluating the econometric impact against experimental estimates of impact with
evaluations of training with experimental data. The education test scores as outcomes, Discussion paper
American Economic Review, 76 (4), 604–620. no. 1242-02. Madison, WI: Institute for Research on
Michalopoulos, C., Bloom, H. S., & Hill, C. J. (2004). Can Poverty.
propensity-score methods match the findings from a
random assignment evaluation of mandatory welfare-
to-work programs? The Review of Economics and Threats to internal validity
Statistics, 86 (1), 156–179.
Olsen, R., & Decker, P. (2001). Testing different methods Shadish, W. R., Cook, T. D., & Campbell, D. T.
of estimating the impacts of worker profiling and (2002). Experimental and quasi-experimental designs for
reemployment services systems. Washington, DC: generalized causal inference. Boston: Houghton
Mathematica Policy Research, Inc. Mifflin Company.
Michalopoulos, C., Bloom, H. S., & Hill, C. J. (2004). Can Bell, S. H., Orr, L. L., Blomquist, J. D., & Cain,
propensity-score methods match the findings from a G. C. (1995). Program applicants as a com-
random assignment evaluation of mandatory welfare- parison group in evaluating training programs.
to-work programs? The Review of Economics and Kalamazoo, MI: Upjohn Institute for Employment
Statistics, 86 (1), 156–179. Research.
Rosenbaum, P. (2002). Observational Studies. Bloom, H. S., Michalopoulos, C., & Hill, C. J.
New York: Springer-Verlag. (2005). Using experiments to assess nonexperi-
Rosenbaum, P., & Rubin, D. B. (1983). The central role mental comparison-group methods for measuring
of the propensity score in observational studies for program effects. In H. S. Bloom (Ed.), Learning more
causal effects. Biometrika, 70(1), 41–55. from social experiments (pp. 173–235). New York:
Rosenbaum, P., & Rubin, D. B. (1984). Reducing bias in Russell Sage Foundation.
observational studies using subclassification on the Bloom, H. S., Michalopoulos, C., Hill, C. J., &
propensity score. Journal of the American Statistical Lei, Y. (2002). Can nonexperimental comparison
Association, 79, 516–524. group methods match the findings from a random
Rosenbaum, P., & Rubin, D. B. (1985). Constructing a assignment evaluation of mandatory welfare-to-work
control group using multivariate matched sampling programs? Washington, DC: Manpower Demonstra-
methods that incorporate the propensity score. The tion Research Corporation.
American Statistician, 39(1), 33–38. Bratberg, E., Grasdal, A., & Risa, A. E. (2002). Evaluating
Rubin, D. B. (1977). Assignment to treatment group social policy by experimental and nonexperimental
on the basis of a covariate. Journal of Educational methods. Scandinavian Journal of Economics, 104(1),
Statistics, 2(1), 1–26. 147–171.
Rubin, D. B., & Thomas, N. (1996). Matching using Buddelmeyer, H., & Skoufias, E. (2003). An evaluation
propensity scores: Relating theory to practice. of the performance of regression discontinuity
Biometrics, 52, 249–264. design on PROGRESA. Bonn, Germany: IZA.
Shadish, W. R., Luellen, J. K., & Clark, M. H. Dehejia, R., & Wahba, S. (1999). Causal effects in non-
(2006). Propensity scores and quasi-experiments: experimental studies: Reevaluating the evaluation of
A testimony to the practical side of Lee Sechrest. training programs. Journal of the American Statistical
In R. R. Bootzin (Ed.), Measurement, methods and Association, 94(448), 1053–1062.
evaluation. Washington, DC: American Psychological Fraker, T., & Maynard, R. (1987). The adequacy
Association Press. of comparison group designs for evaluations of
Wilde, E. T., & Hollister, R. (2002). How close is employment-related programs. Journal of Human
close enough? Testing nonexperimental estimates of Resources, 22(2), 194–227.
impact against experimental estimates of impact with Friedlander, D., & Robins, P. (1995). Evaluating
education test scores as outcomes, Discussion paper program evaluations: New evidence on commonly
no. 1242-02. Madison, WI: Institute for Research on used nonexperimental methods. American Economic
Poverty. Review, 85(4), 923–937.
Zhong, Z. (2004). Using matching to estimate treatment Glazerman, S., Levy, D. M., & Myers, D. (2002).
effects: Data requirements, matching metrics, and Nonexperimental replications of social experiments:
Monte Carlo evidence. The Review of Economics A systematic review. Washington, DC: Mathematica
and Statistics, 86 (1), 156–179. Policy Research, Inc.
Glazerman, S., Levy, D. M., & Myers, D. (2003).
Nonexperimental versus experimental estimates of
earnings impacts. The Annals of the American
Within-study comparison papers
Academy, 589, 63–93.
Agodini, R., & Dynarski, M. (2004). Are experiments Greenberg, D. H., Michalopoulos, C., & Robins, P.
the only option? A look at dropout prevention (2006). Do experimental and nonexperimental
programs. The Review of Economics and Statistics, evaluations give different answers about the effec-
86 (1), 180–194. tiveness of government-funded training programs.
Aiken, L. S., West, S. G., Schwalm, D. E., Carroll, J., & Journal of Public Policy and Management, 25(3),
Hsuing, S. (1998). Comparison of a randomized 523–552.
and two quasi-experimental designs in a single Gritz, M., & Johnson, T. (2001). National Job Corps
outcome evaluation: Efficacy of a university-level Study: Assessing program effects on earnings for
remedial writing program. Evaluation Review, 22(4), students achieving key program milestones. Seattle,
207–244. WA: Battelle Memorial Institute.
11
Sample Size Planning with
Applications to Multiple
Regression: Power and
Accuracy for Omnibus and
Targeted Effects
Ken Kelley and Scott E. Maxwell
ABSTRACT

When designing a research study, sample size planning is one of the key factors to consider. One aspect of sample size planning is whether the primary goal of the research study is to reject a false null hypothesis, the power analytic approach. Another primary goal may be to obtain a confidence interval that is sufficiently narrow, the accuracy in parameter estimation approach. Some questions of interest may pertain to a collection of parameters (i.e. an omnibus effect), whereas other questions may pertain to only a single parameter (i.e. a targeted effect). The issue of power or accuracy and the issue of an omnibus effect or a targeted effect lead to a two-by-two conceptualization for planning sample size. The power analytic and accuracy in parameter estimation approaches are discussed in the context of multiple regression for the squared multiple correlation coefficient (an omnibus effect) and for a specific regression coefficient (a targeted effect). A discussion of statistical significance testing and confidence interval construction for the parameters of interest is provided. Whereas the power analytic approach is largely reviewed from existing literature, developments are made for the accuracy in parameter estimation approach.

At the heart of scientific research is the desire for understanding. Even though many methods exist for attempting to gain a better understanding of the phenomenon or phenomena of interest, statistical methods have proven to be the most useful way of extracting information from data. Given that the use of statistical methods is so
vital to scientific research, ensuring that the statistical methods chosen provide the information of interest is an important step for scientific progress. Even though science is often laborious and slow, by designing a well-planned study researchers can be in the best position to maximize their chances for success, where the ultimate goal is gaining a better understanding of the phenomenon of interest.

Designing research studies is arguably the most important single phase of research. With a poorly designed study, little or no understanding of the phenomenon of interest may be gained. Given the high economic and professional costs of poorly designed research, motivation of the researcher should clearly be on the side of beginning an investigation with a well-designed study. Many facets exist to research design and each one deserves attention. At a minimum, the following points must be considered when designing studies in the behavioral, educational, and social sciences:

(a) the question(s) of interest must be determined;
(b) the population of interest must be identified;
(c) a sampling scheme must be devised;
(d) selection of independent and dependent measures must occur;
(e) a decision regarding experimentation versus observation must be made;
(f) statistical methods must be chosen so that the question(s) of interest can be answered in an appropriate and optimal way;
(g) sample size planning must occur so that an appropriate sample size given the particular scenario, as defined by points a through f, can be used;
(h) the duration of the study and number of measurement occasions need to be considered;
(i) the financial cost (and feasibility) of the proposed study must be calculated.

Sample size planning (Point g) as it relates to the question(s) of interest (Point a) of an investigation is the focus of this chapter. Although sample size planning is an important part of research design, sample size planning cannot occur without some question of interest first being defined. There are multiple ways to plan sample size for a single study. The way in which sample size is planned depends heavily on the question(s) of interest that the investigator has defined. Thus, not defining the question of interest implies that a method for choosing sample size, and thus the sample size itself, cannot adequately be defined¹.

For example, suppose a researcher wishes to examine the relationship between five regressor variables and a criterion variable in a multiple regression context. However, the process of deciding on an appropriate sample size cannot begin until the question of interest has been clearly defined. There are at least four scenarios in which sample size planning can proceed in a multiple regression context:

(a) desired degree of statistical power for the overall fit of the model (i.e. power for the squared multiple correlation coefficient);
(b) desired degree of statistical power for a specific regressor variable (i.e. power for the test of a particular population regression coefficient);
(c) statistical accuracy for the overall fit of the model (i.e. a narrow confidence interval for the population squared multiple correlation coefficient);
(d) statistical accuracy for a specific regressor variable (i.e. a narrow confidence interval for one or more population regression coefficients)².

Thus, an appropriate sample size depends very much on the goals of the researcher. Not surprisingly, given the fundamental differences between power and accuracy for omnibus and targeted effects, necessary sample size can be very different in the four scenarios. More general than the multiple regression example, sample size planning can be conceptualized in a two-by-two table, where the effect of interest, either an omnibus or a targeted effect, is on one dimension and the goal, either power or accuracy, is on the other dimension. Such a conceptualization is given in Table 11.1 for sample size planning
Table 11.1 Two-by-two conceptualization of possible scenarios when statistical power is crossed with statistical accuracy

                   Effect
Goal         Omnibus   Targeted
Power           a          b
Accuracy        c          d

for points a–d, where the effect is represented by the column dimension and the goal is represented by the row dimension.

Even though Table 11.1 has four cells, none of the cells are mutually exclusive, nor is any specific one necessary. That is to say, a researcher could have the goal of achieving power for the omnibus effect (cell a) and a specific effect (cell b). Likewise, a researcher could have the goal of accuracy for the omnibus effect (cell c) and a specific effect (cell d). A researcher most interested in the omnibus effect could desire both its power (cell a) and its accuracy (cell c). A researcher most interested in a specific effect could desire both power (cell b) and accuracy (cell d). Another possibility is for a researcher to desire a high degree of power for the omnibus effect (cell a) and to desire accuracy for a specific effect (cell d). Conversely, a researcher might desire a high degree of accuracy for the omnibus effect (cell c) and a high degree of power for a specific effect (cell b). Any combination of the cells in the table is possible, and given the goals of the researcher, multiple cells in Table 11.1 might be relevant.

What may not be obvious from Table 11.1 is that the sample size necessary to fulfill one of the scenarios of interest might also be large enough to fulfill one or more other scenario(s). We will discuss methods of planning sample size for each of the scenarios in upcoming sections of the chapter in the context of multiple regression. The next three sections provide overviews and rationales of statistical power, statistical accuracy, and multiple regression, respectively. The overview sections are followed by methods for planning sample size given the goals of statistical power and statistical accuracy for omnibus and targeted effects, respectively, in the context of multiple regression analysis. The computer program R (R Development Core Team, 2007) is used throughout the article with the MBESS package (Kelley, 2007). R is a comprehensive statistics environment and language with powerful graphics capabilities. MBESS is an add-on package for R that has, among other things, numerous functions for assisting researchers planning an appropriate sample size. Both R and MBESS are Open Source and freely available.³ The R code used throughout the chapter is distinguished from text by using a non-serif font (such as this). R examples are typeset in a gray box with ‘R >’ denoting an executable R command as follows:

    R > mean(data)

which returns the mean of the values contained in the object ‘data.’

We have synthesized a large amount of work done in the sample size planning literature and packaged it in what we hope is a conceptually appealing and readily comprehensible presentation, complete with easy to use computer commands for planning necessary sample size in each of the four scenarios described.

RATIONALE OF STATISTICAL POWER ANALYSIS

Statistical power is a function of four things: (a) the size of the effect; (b) the model error variance; (c) the Type I error rate ($\alpha$); and (d) sample size ($N$).⁴ Power is defined as one minus the probability of a Type II error.⁵ In most cases the size of the effect, Type I error rate (e.g. $\alpha = 0.01$ or $\alpha = 0.05$), and often the model error variance are considered fixed, leaving only the sample size as a quantity that is in the control of the researcher.⁶ Given that power is in part
a function of sample size, the sample size can be manipulated so that a desired degree of power is reached. Power has been discussed in numerous book length treatments for many statistical tests (e.g. Kraemer & Thiemann, 1987; Cohen, 1988; Lipsey, 1990; Murphy & Myors, 1998).

The use of null hypothesis significance testing has been under fire for some time (e.g. Nickerson, 2000, for a review; the works contained in Rozeboom, 1960; Bakan, 1966; Morrison & Henkel, 1970; Meehl, 1978; Cohen, 1994; Schmidt, 1996). Even though we sympathize with many of the critiques leveled against the use of null hypothesis significance testing, null hypothesis significance testing has its place in science and there is little question that it will continue to be widely used (e.g. Chow, 1996; Hagen, 1997; Harris, 1997; Wainer, 1999; Mogie, 2004). There are two main reasons why null hypothesis significance tests are valuable in research: they help researchers decide if the population value of some effect differs from a specified quantity (generally zero), and for many tests they allow the researcher to decide the direction of the effect. For some questions of interest, the use of null hypothesis significance tests is not especially helpful. In those situations other techniques can be used.

One common alternative to null hypothesis significance testing is the use of effect sizes and their corresponding confidence intervals (e.g. Schmidt, 1996; Thompson, 2002; Smithson, 2003; Hunter & Schmidt, 2004; Steiger, 2004; Grissom & Kim, 2005). Effect sizes and their corresponding confidence intervals can better address issues involving the magnitude of an effect than can null hypothesis significance tests. However, some research questions do not lend themselves to being framed as an effect where the magnitude is meaningful and of interest. This is especially true with some multiparameter and multivariate hypotheses, as such tests are more difficult to transform into an effect size and corresponding confidence interval that is readily interpretable. For example, multivariate analysis of variance and covariance have omnibus effect sizes that are generally not easy to interpret. One option is to reduce such multivariate effects into simpler effects (e.g. pairwise, simple main effects, specific effects, etc.) and then report their corresponding effect sizes and confidence intervals. Even though such effects are readily comprehensible, such simplified hypotheses generally fail to consider the complexity and multivariate nature of the original research question, requiring the questions to be addressed with multivariate techniques that may not have readily interpretable effect sizes. We will discuss the benefits of confidence interval formation in the next section, but we acknowledge that confidence intervals are not adequate for addressing all substantively interesting questions. In cases where a research question is best addressed with a null hypothesis significance test, the a priori power of the test should be as important as the obtained probability value.

Even though the conceptual rationale of power analysis is generally well understood, not often discussed are the implications and importance of mapping a power analysis onto the research question(s) of interest. In a given study, there are often numerous statistical hypotheses evaluated. Given a particular sample size and holding everything else constant, each of the potential statistical tests has a population effect size and model error (or simply a standardized effect size which simultaneously considers both) that must be estimated, and an associated level of statistical power. Sample size can thus be determined so that power is at some desired level for one or several tests. If power is set to a value, such as 0.85, it is likely that a different sample size would be necessary for each of the statistical tests of interest. Depending on the exact question of interest (i.e. for which test is the appropriate sample size determined), necessary sample size to achieve some desired goal will generally be different. Thus, before sample size planning from a power analytic approach can proceed, the exact question of interest must be specified (Point a from the designing research list).
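The point that a single power target implies a different sample size for each test can be sketched in a few lines of R. This is only an illustration under stated assumptions: it uses the base R function power.t.test() from the stats package rather than MBESS, it treats each test as a two-sided independent-groups t-test with a Type I error rate of 0.05, and the three standardized mean differences are hypothetical values chosen for illustration, not values taken from the chapter.

```r
# Hypothetical standardized mean differences for three tests in one study
# (illustrative values only; they are assumptions, not from the chapter).
deltas <- c(test1 = 0.30, test2 = 0.40, test3 = 0.50)

# Per-group sample size needed for power = 0.85 in a two-sided,
# independent-groups t-test with alpha = 0.05 (sd = 1, so that delta
# is on the standardized scale).
n_per_group <- sapply(deltas, function(d) {
  power.t.test(delta = d, sd = 1, sig.level = 0.05, power = 0.85)$n
})

# Round up, because sample sizes must be whole numbers.
ceiling(n_per_group)
```

Holding power at 0.85, the required per-group sample size falls as the assumed effect size grows, so the test with the smallest anticipated effect determines the sample size if adequate power is wanted for all three tests.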
When statistical tests are conducted in situations of low power, the literature of an area can become awash with contradictory results (e.g. Sedlmeier & Gigerenzer, 1989; Rossi, 1990; Hunter & Schmidt, 2004; Maxwell, 2004). For example, suppose several researchers each replicate the same previously reported study using multiple regression with several regressor variables. Further suppose that the power was low for each of the several regressors. It is entirely possible that each of the researchers obtained a different set of statistically significant regression coefficients, none of which mirror the previously reported study! By having low power across multiple parameters, there is often a high probability of obtaining statistical significance somewhere (Kelley et al., 2003), but a small probability of replicating the same set of statistically significant regressors (Maxwell, 2000, 2004). Consistency of research findings is thus difficult if power is low for some or all of the effects examined. Without ensuring that an adequate degree of power is achieved, low-powered studies riddled with Type II errors can permeate the literature and scientific growth can falter because of inconsistencies regarding statistically significant effects across multiple studies that examine the same effects (Rosenthal, 1993; Schmidt, 1996; Kraemer et al., 1998; Hunter & Schmidt, 2004, chapter 1).

Many times when a study has important implications, such as those often conducted in the behavioral, educational, social, and medical sciences, ignoring issues of power is irresponsible and potentially even unethical. This is true, for example, when individuals are subjected to an inferior treatment condition in a study with low power. The individuals in such studies are put at risk with little chance of determining whether some treatments are truly superior to others. A more tangible reason for seriously considering power analysis is that grant funding review boards now generally require explicit consideration of design and power in grant proposals in order to receive funding (e.g. Allison et al., 1997; Kraemer et al., 1998). Thus, not only can ignoring power issues lead to a study with little chance of achieving statistical significance for the parameter(s) of interest, it can prevent the study from even being conducted because funding is not secured.

Power analysis is also an important tool for protecting valuable resources. For example, suppose a study was conducted with a sample size of N = 20. Further suppose that the statistical test on the parameter of interest did not yield a statistically significant result. Such a result might be disappointing, but such a result might have also been avoided. Suppose that a power analysis (e.g. based on an independent group t-test where the population standardized mean difference is thought to be 0.40 with the Type I error rate set to 0.05) would have revealed that a sample size of 100 would be necessary in order for the power to equal 0.80, the researcher’s operational definition of ‘adequate power.’ Had such a power analysis been conducted by the researcher a priori, the researcher would have had at least three choices: (a) perform the study with N = 20 anyway, with the caveat that there would be only a small probability (specifically 0.23 under the anticipated effect size) of achieving statistical significance (i.e. low power); (b) modify the original design so that the sample size was changed to N = 100 in order for the researcher to have an adequate degree of power for detecting the effect of interest; or (c) realize that N = 100 is not practical given the difficulty of collecting data and conclude that the cost/benefit ratio is not worth conducting the study at the present time. Points b and c are both enlightening from a resource standpoint, because it may become apparent that N = 20 is not adequate and thus using a sample size of only 20 may not be a wise use of resources given the low probability of finding statistical significance.

RATIONALE OF ACCURACY IN PARAMETER ESTIMATION

In order for a piece of information to be meaningful, it is generally desirable for that piece of information to be accurate. In the context of parameter estimation, accuracy is
defined in terms of the (square) root of the mean square error (RMSE), and is a function of precision and bias. Formally, the accuracy of an estimate $\hat{\theta}$ is defined as

$$\mathrm{RMSE} = \sqrt{E\left[(\hat{\theta} - \theta)^2\right]} = \sqrt{E\left[(\hat{\theta} - E[\hat{\theta}])^2\right] + \left(E[\hat{\theta} - \theta]\right)^2} = \sqrt{\sigma^2_{\hat{\theta}} + B^2_{\hat{\theta}}}, \tag{1}$$

where $E[\cdot]$ is the expected value of the quantity in brackets, $\theta$ is the parameter of interest with $\hat{\theta}$ as its estimate, $\sigma^2_{\hat{\theta}}$ is the population variance of the estimator (i.e. $E[(\hat{\theta} - E[\hat{\theta}])^2]$), and $B_{\hat{\theta}}$ is the bias of the estimator (i.e. $E[\hat{\theta} - \theta]$) (Rozeboom, 1966, p. 500). Whereas precision reflects the repeatability of measurements and is thus inversely related to the sample-to-sample variability, bias is the systematic (i.e. average) discrepancy between an estimate and the parameter it estimates. Notice that when the bias equals zero, the estimate is unbiased and accuracy and precision are equivalent concepts.⁷ However, precision alone does not imply an accurate estimate.⁸

A narrow confidence interval has a tightly clustered set of plausible parameter values that will contain the parameter of interest with the degree of confidence specified. These plausible parameter values are those that cannot be rejected as the value of the population parameter. In the long run when the assumptions of the model are satisfied for an exact confidence interval procedure, $(1 - \alpha)100\%$ of the confidence intervals formed under the same conditions will contain $\theta$ (Hahn & Meeker, 1991, p. 31). Holding the confidence level constant, the narrower the confidence interval width, the more values can be excluded from the plausible set of parameter values. The effect of this is a homing in on the population parameter. Because an appropriately constructed confidence interval will always contain the observed parameter estimate and will contain the parameter $(1 - \alpha)100\%$ of the time, as the width of the interval decreases the expected accuracy of the estimate improves (i.e. the RMSE is reduced).

Increasing sample size potentially has two effects on accuracy. First, the larger the sample size generally the more precision the estimate will have (i.e. its variance decreases as $N$ increases).⁹ For unbiased estimates, improving the precision necessarily improves accuracy. Estimators that are biased will many times become less biased as sample size increases. Indeed, for consistent estimators, regardless of whether the estimator is biased or unbiased, as sample size tends to infinity the probability that the sample estimate differs from the population quantity by any value tends to zero (Stuart et al., 1994, chapter 17). Thus, above and beyond any effect of precision, decreasing bias also improves accuracy. In fact, even for biased estimates, decreasing the confidence interval width can still be desirable. In such a scenario the point estimate itself might be biased but the range of plausible parameter values sufficiently small.¹⁰

Sample size planning is almost always regarded as being synonymous with power analysis. However, as previously discussed, sample size planning can also proceed with the goal of obtaining a sufficiently narrow confidence interval. We call this method of sample size planning accuracy in parameter estimation (AIPE; Kelley & Rausch, in press; Kelley et al., 2003; Kelley & Maxwell, 2003; Kelley, 2006), because when the width of the $(1 - \alpha)100\%$ confidence interval decreases (implying that there is a smaller range of plausible parameter values at a given confidence level), the expected accuracy of the estimate necessarily increases. Because the accuracy can almost never be calculated for a single estimate, due to the fact that it depends on unknown population values, minimizing the confidence interval width to some acceptable value serves as a way to operationally define the expected accuracy of the estimate. Our usage of the term ‘accuracy in parameter estimation’ is consistent with that
used by Neyman in his seminal work on the theory of confidence intervals: ‘the accuracy of estimation corresponding to a fixed value of $1 - \alpha$ may be measured by the length of the confidence interval’ (1937, p. 358, notation changed to reflect current system).

It can be argued that obtaining an estimate that has a narrow confidence interval is more beneficial scientifically than obtaining an estimate that reaches statistical significance. It has even been recommended that statistical significance tests be banned and replaced with point estimates and their corresponding confidence intervals (Schmidt, 1996, p. 116). In many situations, especially in observational research, it is known a priori that the null hypothesis is almost always false (Bakan, 1966; Meehl, 1967; Cohen, 1994; Schmidt, 1996; Harris, 1997), and in such situations reaching statistical significance is simply a function of having a large enough sample size (of course, the direction of some effects is often of interest and importance; see our discussion in the previous section).¹¹ However, when an effect is of interest, learning as much as possible about the size of the effect is almost always beneficial, and many times it can be more beneficial than learning only the direction and statistical significance of the parameter. Embracing the AIPE approach to sample size planning will help to facilitate the accumulation of scientific knowledge by yielding more accurate information about the parameter. Indeed, as Rosenthal (1993) discusses, there are really two results of interest: (a) the estimate of the magnitude of the effect; and (b) an indication of the accuracy of the effect ‘as in a confidence interval around the estimate’ (p. 521). Thus, rather than simply asking if an effect differs from some specified null value, in most cases it seems better to address the size of the effect, realizing that the more accurate the estimate of the effect the more information is learned.

Suppose there is no treatment effect in a two-group situation (i.e. the null hypothesis is true). Assuming its assumptions are met, the t-test will yield a p-value greater than $\alpha$ on $(1 - \alpha)100\%$ of occasions. The corresponding confidence interval will, on $(1 - \alpha)100\%$ of occasions, have its lower bound less than zero and its upper bound greater than zero (and thus the null value of zero is contained within the interval and cannot be rejected). Further suppose that the confidence interval contains zero, yet is wide relative to the scale of the measurement. Even though the null hypothesis of zero cannot be rejected, a large range of other plausible values (i.e. those values contained in the confidence limits) can also not be rejected. Contrast such a situation with one where zero is contained within the interval and the width of the confidence interval is narrow. In such a situation it is possible to exclude a wide range of values as being plausible (i.e. those not contained within the confidence limits) and thus narrow the range of plausible values.

When one wishes to show support for the null hypothesis (Greenwald, 1975), the accuracy of the obtained estimate as judged by the width of the corresponding confidence interval should be of utmost concern. The ‘good enough’ principle can be used and a corresponding ‘good enough belt’ can be formed for the null value, where the limits of the belt would define what constituted a nontrivial effect (Serlin & Lapsley, 1985, 1993). Suppose that not only is the null value contained within the good enough belt, but so too are the confidence limits. This would be a situation where all of the plausible values would be smaller in magnitude than what has been defined as a trivial effect (i.e. the confidence limits are contained within the good enough belt). In such a situation the limits of the $(1 - \alpha)100\%$ confidence interval would exclude all effects of any ‘meaningful’ size. If the parameter is less in magnitude than what is minimally important, then learning this can be very valuable. This information may or may not support the theory of interest, but what is important is that valuable information about the size of the effect, and thus the phenomenon of interest, has been gained. Illuminating the size of the effect is something a null hypothesis test in and of itself cannot do. Furthermore, in order for future researchers
to incorporate the study into a meta-analysis, the size of the effect is required (e.g. Hunter & Schmidt, 2004).

OVERVIEW OF MULTIPLE REGRESSION

Let $Y_i$ be an observed score on some criterion variable for the $i$th individual ($i = 1, \ldots, N$) and $X_{ij}$ be the observed score for the $j$th regressor variable ($j = 1, \ldots, p$) for the $i$th individual.¹²,¹³ The general univariate linear model can be written as

$$Y_i = \beta_0 + X_{i1}\beta_1 + X_{i2}\beta_2 + \cdots + X_{ip}\beta_p + \varepsilon_i, \tag{2}$$

where $\beta_0$ is the population intercept, $\beta_j$ is the regression coefficient for the $j$th regressor, and $\varepsilon_i$ is the error in prediction for the $i$th individual generally assumed to be normally distributed with mean zero and variance $\sigma^2_{\varepsilon}$.¹⁴ The matrix analog of Equation 2 can be written as

$$\mathbf{y} = \beta_0\mathbf{1} + \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \tag{3}$$

where $\mathbf{y}$ is an $N$ length vector of observed criterion variables, $\beta_0$ is the intercept, $\mathbf{1}$ is an $N$ length column vector of 1s, $\mathbf{X}$ is an $N$ by $p$ matrix of fixed regressor variables, $\boldsymbol{\beta}$ is a $p$ length vector of regression coefficients, and $\boldsymbol{\varepsilon}$ is an $N$ length vector of errors.¹⁵ The $p$ regression coefficients in the vector $\boldsymbol{\beta}$ can be obtained by manipulation of the normal equations as

$$\boldsymbol{\beta} = \boldsymbol{\Sigma}_{XX}^{-1}\boldsymbol{\sigma}_{XY} = \boldsymbol{\Sigma}_{XX}^{-1}\boldsymbol{\sigma}_{YX}^{\prime}, \tag{4}$$

where $\boldsymbol{\Sigma}_{XX}$ is the $p$ by $p$ covariance matrix of the regressor variables with a minus one power representing the inverse of the matrix, $\boldsymbol{\sigma}_{XY}$ is the $p$ length column vector of covariances of the $p$ regressors with $Y$ and $\boldsymbol{\sigma}_{YX}$ is the $p$ length row vector of covariances of $Y$ with the $p$ regressors ($\boldsymbol{\sigma}_{XY} = \boldsymbol{\sigma}_{YX}^{\prime}$, where prime denotes transposition). The intercept is defined as

$$\beta_0 = \mu_Y - \boldsymbol{\mu}_X^{\prime}\boldsymbol{\beta}, \tag{5}$$

where $\mu_Y$ is the population mean of $Y$ and $\boldsymbol{\mu}_X$ is the $p$ length vector of population means for the regressor variables (see, for example, Graybill, 1976; Darlington, 1990; Pedhazur, 1997; Rancher, 2000; Cohen et al., 2003 for comprehensive coverage of multiple regression and the general linear model).

Throughout the chapter we assume that the regressor variables are fixed, which implies that in theoretical replications of the study the same $\mathbf{X}$ matrix would be obtained. This would be the case, for example, when the $\mathbf{X}$ matrix is literally developed as part of the study design. Theoretical replications of the study would then have the same $\mathbf{X}$ matrix and the only variation would be the values of the criterion variables (and thus the error). When the regressors are random, and thus in theoretical repetitions of the study different $\mathbf{X}$ matrices would be obtained, the discussion that follows would need to be modified to take into consideration the increased randomness of the design (e.g. Sampson, 1974; Gatsonis & Sampson, 1989; Rancher, 2000).

Often of interest in a multiple regression context is the squared multiple correlation coefficient, sometimes termed the coefficient of determination. Recall that the squared multiple correlation coefficient is the proportion of variance in $Y$ that is accounted for by the $p$ regressor variables. The population multiple correlation coefficient, denoted with an uppercase Greek rho, squared, is defined as

$$\mathrm{P}^2_{Y \cdot X} = \frac{\boldsymbol{\sigma}_{YX}\boldsymbol{\Sigma}_{XX}^{-1}\boldsymbol{\sigma}_{XY}}{\sigma^2_Y}, \tag{6}$$

which is equivalent to the population squared product moment correlation coefficient between the observed scores ($Y_i$) and the predicted scores ($\hat{Y}_i$; i.e. $\mathrm{P}^2_{Y \cdot X} = \rho^2_{Y\hat{Y}}$).¹⁶

Equations 2–6 have used only population parameters. In practice, of course, only the sample means, variances, and covariances are known. The means and the variance/covariance matrix of the $p + 1$ variables (the outcome variable and the $p$ regressor variables) are estimated with the usual unbiased estimates and substituted into Equations 4–6. The estimate of $\boldsymbol{\beta}$ corresponding to the $p$
regressor variables, $\mathbf{b}$, can be obtained by substituting $\mathbf{s}_{YX}$ or $\mathbf{s}_{XY}$ and $\mathbf{S}_{XX}$ for their population analogs into Equation 4:

$$\mathbf{b} = \mathbf{S}_{XX}^{-1}\mathbf{s}_{XY} = \mathbf{S}_{XX}^{-1}\mathbf{s}_{YX}^{\prime}. \tag{7}$$

Likewise, the estimate of $\beta_0$ can be obtained by substituting the sample means for the population means and the vector of sample regression coefficients in Equation 5:

$$b_0 = \bar{Y} - \bar{\mathbf{X}}^{\prime}\mathbf{b}. \tag{8}$$

The estimate of $\mathrm{P}^2_{Y \cdot X}$, $R^2_{Y \cdot X}$, is obtained by substituting the sample estimates of the parameters into Equation 6:

$$R^2_{Y \cdot X} = \frac{\mathbf{s}_{YX}\mathbf{S}_{XX}^{-1}\mathbf{s}_{XY}}{s^2_Y}. \tag{9}$$

An obtained estimate will almost certainly not equal its population value. What is generally of interest is knowing if the population value differs from some specified null value (generally zero) or determining the plausible values of the parameter (i.e. the values contained within the $(1 - \alpha)100\%$ confidence interval). The next two sections discuss null hypothesis significance testing and confidence interval formation, respectively, first for the squared multiple correlation coefficient and then for regression coefficients. Null hypothesis significance tests and confidence interval formation are briefly discussed for regression parameters in order to form a basis for the methods of sample size planning that will be discussed in later sections of the chapter.

NULL HYPOTHESIS SIGNIFICANCE TESTS FOR REGRESSION PARAMETERS

The idea of a null hypothesis significance test is to infer if values at least as extreme as the observed value are sufficiently unlikely if in fact the population value were equal to the specified null value (usually zero). Of course, when claiming statistical significance, there is always the possibility of a Type I error, but that is the price of rejecting a population value based on a sample value. The next two subsections discuss the two most common null hypotheses that are tested in the context of multiple regression: the test that $\mathrm{P}^2_{Y \cdot X} = 0$ and the test that $\beta_j = 0$.

The test of the null hypothesis that the squared multiple correlation coefficient equals zero

When $\mathrm{P}^2_{Y \cdot X}$ is zero, by implication $\boldsymbol{\beta}$ is a $p$-length vector of zeros (i.e. $\boldsymbol{\beta} = \mathbf{0}_p$). Of course, in any particular sample, $R^2_{Y \cdot X}$ will almost certainly be greater than zero. It is a task of the researcher to evaluate if enough evidence exists to reject the idea that $\mathrm{P}^2_{Y \cdot X}$ is zero. When the null hypothesis that $\mathrm{P}^2_{Y \cdot X} = 0$ is true, a test statistic can be formed from $R^2_{Y \cdot X}$ that follows a central F-distribution. The statistic that is used to test the null hypothesis for the squared multiple correlation coefficient is

$$F = \frac{R^2_{Y \cdot X}/p}{\left(1 - R^2_{Y \cdot X}\right)/(N - p - 1)}, \tag{10}$$

where the F-value has $p$ and $N - p - 1$ degrees of freedom. Of course, this F-statistic has an associated probability value, and if the obtained p-value is less than the adopted Type I error rate (i.e. the $\alpha$ level), then the null hypothesis can be rejected.

When $\mathrm{P}^2_{Y \cdot X}$ is not zero, implying that $\boldsymbol{\beta} \neq \mathbf{0}_p$ (i.e. at least one element of the vector of regression coefficients is nonzero), the distribution of the F-statistic from Equation 10 follows a noncentral F-distribution, whereas the F-statistic when the null hypothesis is true follows a central F-distribution (the central F-distribution is the standard ‘F-distribution’ discussed in introductory and intermediate level statistics books). Rather than having only two parameters, the numerator and denominator degrees of freedom like the central F-distribution, the noncentral F-distribution also has a noncentrality parameter. The noncentrality
As mentioned, the test of a specific regression coefficient is equivalent to the test of no change in $\mathrm{P}^2_{Y \cdot X}$ when the $j$th regressor is removed from the regression equation (i.e. $\mathrm{P}^2_{Y \cdot X} - \mathrm{P}^2_{Y \cdot X_{-j}} = 0$). This is in turn equivalent to the test of the squared semi-partial (part) correlation of $Y$ with the $j$th regressor being zero. Let $\mathrm{P}^2_{Y \cdot (X_j \cdot X_{-j})}$ be the correlation of $Y$ with the independent part of $X_j$ (i.e. the squared semi-partial correlation between $Y$ and $X_j$). The definition of $\mathrm{P}^2_{Y \cdot (X_j \cdot X_{-j})}$ is given as

$$\mathrm{P}^2_{Y \cdot (X_j \cdot X_{-j})} = \mathrm{P}^2_{Y \cdot X} - \mathrm{P}^2_{Y \cdot X_{-j}}. \tag{20}$$

Similar to the test of $\mathrm{P}^2_{Y \cdot X}$ from Equation 10, the test of $\mathrm{P}^2_{Y \cdot (X_j \cdot X_{-j})}$ can be written as an F-statistic with 1 and $N - p - 1$ degrees of freedom:

$$F = \frac{\left(R^2_{Y \cdot X} - R^2_{Y \cdot X_{-j}}\right)/(p - (p - 1))}{\left(1 - R^2_{Y \cdot X}\right)/(N - p - 1)} = \frac{R^2_{Y \cdot (X_j \cdot X_{-j})}}{\left(1 - R^2_{Y \cdot X}\right)/(N - p - 1)}. \tag{21}$$

The F-statistic of Equation 21 is the square of the t-statistic in Equation 13. The reason for rewriting the t-statistic for $\beta_j$ as an F-test for the change in $\mathrm{P}^2_{Y \cdot X}$ when $X_j$ is removed (i.e. $\mathrm{P}^2_{Y \cdot X} - \mathrm{P}^2_{Y \cdot X_{-j}}$) from the prediction equation is to show the relationship between the omnibus F-statistic of Equation 10 and the targeted F-statistic of Equation 21. This relationship will become important later when discussing power.

It should be noted that the noncentrality parameter of the test of a single regression coefficient is very similar to the noncentrality parameter of the test of all regression coefficients tested simultaneously (i.e. the test of $\mathrm{P}^2_{Y \cdot X} = 0$). The signal-to-noise ratio for the change in $\mathrm{P}^2_{Y \cdot X}$ when the $j$th regressor is removed is given as

$$f^2_{-j} = \frac{\mathrm{P}^2_{Y \cdot X} - \mathrm{P}^2_{Y \cdot X_{-j}}}{1 - \mathrm{P}^2_{Y \cdot X}}, \tag{22}$$

implying the noncentrality parameter for the $j$th regressor is

$$\lambda_{-j} = f^2_{-j} N. \tag{23}$$

It should be kept in mind that all derivations have been for the case where the regressors are considered fixed. This and the previous section laid out the formal distributional theory of $R^2_{Y \cdot X}$ and $b_j$. The derivations given in this section allow them to be used in a future section that deals with statistical power for the squared multiple correlation coefficient.

CONFIDENCE INTERVAL FORMATION FOR REGRESSION PARAMETERS

In order to understand how well an observed estimate represents its corresponding parameter, confidence intervals are necessary. Confidence intervals for some effects are simple and involve only the estimate, the standard error of the estimate, and the critical value from the test of the null hypothesis (e.g. the critical $t$, $F$, or $\chi^2$ value). However, in certain cases the confidence interval is more complicated and involves the use of noncentral distributions.

Noncentral distributions, as will be discussed in a future section, are important for determining sample size in a power analytic context. These distributions are also important for confidence interval formation for certain effects, especially those that have been standardized or when the sampling distribution of the statistic does not follow a central distribution or a mean-shifted central distribution.¹⁷ Effects that are standardized will not generally follow a central distribution, because such effects are not pivotal. Stuart et al. (chapter 23, 1999) provide a technical discussion of pivotal quantities, but in the context of effect sizes, a pivotal quantity is one where the confidence interval is a simple rearrangement of the test statistic (Cumming and Finch, 2001). Effects such as the squared multiple correlation coefficient (e.g. Smithson, 2003), the standardized mean difference (e.g. Steiger &
parameter indexes the magnitude of the difference between the null and alternative hypotheses. The larger the difference between the null and alternative hypotheses, the larger is the noncentrality parameter.

It can be shown that the noncentrality parameter of the sampling distribution for the F-statistic of Equation 10 is given as

$$\Lambda = f^2 N, \tag{11}$$

where

$$f^2 = \frac{P^2_{Y\cdot X}}{1 - P^2_{Y\cdot X}} \tag{12}$$

and where $f^2$ has an interpretation as the signal-to-noise ratio (Cohen, 1988; Stuart et al., 1999; Rencher, 2000; Smithson, 2001). As can be seen, $\Lambda$ is a function of $P^2_{Y\cdot X}$ and $N$. As either of these quantities becomes larger, so too does $\Lambda$. The effect of a larger $\Lambda$ is that the sampling distribution of the F-statistic in Equation 10 has a larger mean and for fixed sample size values will be more positively skewed. Thus, a larger proportion of the noncentral distribution will be larger than the critical value under the null hypothesis. This idea will become important in the discussion of power and for confidence interval formation.

The test of the null hypothesis that a regression coefficient equals zero

Let $P^2_{Y\cdot X_{-j}}$ be the population squared multiple correlation coefficient when Y is predicted from $p - 1$ regressor variables with $X_j$ excluded. Researchers are often interested in knowing if a specific regressor variable adds a statistically significant amount to the fit of the model, which translates into a test of $P^2_{Y\cdot X}$ being larger than $P^2_{Y\cdot X_{-j}}$. Such a test is equivalent to the test of the regression coefficient for $X_j$ when all of the p variables are included in the model.

One of the ways to test the hypothesis that $\beta_j$ is non-zero is to conduct a t-test directly on $b_j$ from the full model. A null hypothesis significance test for a regression coefficient, evaluated against a null value of zero, is based on a t-value with $N - p - 1$ degrees of freedom, and is given as

$$t = \frac{b_j}{s_{b_j}}, \tag{13}$$

where $s_{b_j}$ is given as

$$s_{b_j} = \sqrt{\frac{1 - R^2_{Y\cdot X}}{(1 - R^2_{X_j\cdot X_{-j}})(N - p - 1)}}\;\frac{s_Y}{s_{X_j}}, \tag{14}$$

with $R^2_{X_j\cdot X_{-j}}$ being the squared multiple correlation coefficient using the jth regressor as the criterion on the remaining $p - 1$ regressors. $R^2_{X_j\cdot X_{-j}}$ is also indirectly available from $S_{XX}$ as

$$R^2_{X_j\cdot X_{-j}} = 1 - \left(s^2_j c_{jj}\right)^{-1}, \tag{15}$$

where $s^2_j$ is the variance for the jth regressor and $c_{jj}$ is the jth diagonal element of $S^{-1}_{XX}$ (Harris, 2001).

Similar to the situation described previously when the null hypothesis that $P^2 = 0$ is false and the F-statistic of Equation 10 follows a noncentral distribution, so too does the test statistic of Equation 13 when $\beta_j \neq 0$. It can be shown that when the null hypothesis that $\beta_j = 0$ is false, the t-statistic in Equation 13 has a noncentrality parameter which can be written as

$$\lambda_j = f_j \sqrt{N}, \tag{16}$$

where

$$f_j = \beta_j \sqrt{\frac{1 - P^2_{X_j\cdot X_{-j}}}{1 - P^2_{Y\cdot X}}}\;\frac{\sigma_{X_j}}{\sigma_Y}. \tag{17}$$

Because $\beta_j$ can be written (e.g. Hays, 1994) as

$$\beta_j = \sqrt{\frac{P^2_{Y\cdot X} - P^2_{Y\cdot X_{-j}}}{1 - P^2_{X_j\cdot X_{-j}}}}\;\frac{\sigma_Y}{\sigma_{X_j}}, \tag{18}$$

$f_j$ from Equation 17 can be rewritten as

$$f_j = \sqrt{\frac{P^2_{Y\cdot X} - P^2_{Y\cdot X_{-j}}}{1 - P^2_{Y\cdot X}}}. \tag{19}$$
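As a rough numerical illustration of Equations 11–12 and 16–19, the following base R sketch computes both noncentrality parameters and the proportion of each noncentral distribution that lies beyond the central critical value. The population values (0.40 and 0.30), N = 50, p = 5, and the 0.05 level are illustrative assumptions rather than results from the chapter; only base R functions (qf, pf, qt, pt) are used.

```r
# Sketch: noncentrality parameters from Equations 11-12 and 16-19 (base R only).
# All numeric inputs are illustrative assumptions.
Rho2.full    <- 0.40   # assumed population squared multiple correlation, all p regressors
Rho2.reduced <- 0.30   # assumed value when regressor j is removed
N <- 50
p <- 5

# Omnibus effect: f^2 (Equation 12) and Lambda = f^2 * N (Equation 11)
f2     <- Rho2.full / (1 - Rho2.full)
Lambda <- f2 * N

# Targeted effect: f_j (Equation 19) and lambda_j = f_j * sqrt(N) (Equation 16)
f.j      <- sqrt((Rho2.full - Rho2.reduced) / (1 - Rho2.full))
lambda.j <- f.j * sqrt(N)

# Proportion of the noncentral F distribution beyond the central critical value
crit.F <- qf(0.95, df1 = p, df2 = N - p - 1)
prop.F <- pf(crit.F, df1 = p, df2 = N - p - 1, ncp = Lambda, lower.tail = FALSE)

# Same idea for the two-sided t-test of a single coefficient
crit.t <- qt(0.975, df = N - p - 1)
prop.t <- pt(crit.t, df = N - p - 1, ncp = lambda.j, lower.tail = FALSE) +
          pt(-crit.t, df = N - p - 1, ncp = lambda.j)

print(c(Lambda = Lambda, lambda.j = lambda.j, prop.F = prop.F, prop.t = prop.t))
```

Note that lambda.j^2 equals the signal-to-noise ratio for the single regressor multiplied by N, which mirrors the form of Equation 11 for the omnibus effect.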

Fouladi, 1997; Cumming & Finch, 2001; Kelley, 2005), and standardized regression coefficients all require the use of noncentral distributions. The following subsection will discuss methods of forming confidence intervals when noncentral distributions are required.

Forming noncentral confidence intervals: Applications to regression parameters

Confidence intervals based on noncentral distributions are computed in a different manner than typical confidence intervals based on central distributions. Two principles, or their equivalent, are necessary and are described below. The description given here is largely based on Steiger and Fouladi (1997) and Steiger (2004).

The confidence interval transformation principle is beneficial for forming a confidence interval on a parameter that is monotonically related to another parameter, when the latter has a tractable method of obtaining the confidence interval whereas the former might not. Let $f(\theta)$ be a monotonic transformation of $\theta$, some parameter of interest, with $\theta_L$ and $\theta_U$ being the lower and upper $(1 - \alpha)100\%$ ($\alpha = \alpha_L + \alpha_U$; generally $\alpha_L = \alpha_U = \alpha/2$) confidence limits for $\theta$, where $\alpha_L$ and $\alpha_U$ define the lower and upper proportion of the distribution beyond the lower $\theta_L$ and upper $\theta_U$, respectively. The $(1 - \alpha)100\%$ confidence limits for $f(\theta)$ are $f(\theta_L)$ and $f(\theta_U)$,

$$\text{prob.}[f(\theta_L) \le f(\theta) \le f(\theta_U)] = 1 - (\alpha_L + \alpha_U),$$

where prob. represents probability. Thus, for monotonic transformations the confidence interval for the transformed population quantity is obtained by applying the same transformation to the limits of the confidence interval for the population quantity (Steiger & Fouladi, 1997; Steiger, 2004).

The inversion confidence interval principle states that if $\hat{\theta}$ is an estimate of $\theta$ with a cumulative distribution that depends on some $\theta$, the probability of observing an estimate of $\theta$ smaller than that obtained is given as $p(\hat{\theta}\,|\,\theta)$. Calculation of a confidence interval for $\theta$ based on the inversion confidence interval principle involves finding $\theta_L$ such that $p(\hat{\theta}\,|\,\theta_L) = 1 - \alpha_L$ for the lower limit and $\theta_U$ such that $p(\hat{\theta}\,|\,\theta_U) = \alpha_U$ for the upper limit. The confidence interval for $\theta$ has coverage of $1 - (\alpha_L + \alpha_U)$ and is given as

$$\text{prob.}[\theta_L \le \theta \le \theta_U] = 1 - (\alpha_L + \alpha_U).$$

The confidence interval is general and need not have equal rejection regions. For example, a one-sided confidence interval is obtained by setting $\alpha_L$ or $\alpha_U$ (whichever is appropriate for the specific situation) to zero (Steiger & Fouladi, 1997; Steiger, 2004).

The real benefit from the confidence interval transformation and inversion confidence interval principles, is that when the two principles are combined, confidence intervals for quantities that are not pivotal can be determined. In the context of effect sizes, Cumming & Finch (2001) describe pivotal quantities to be those that are of the form

$$\frac{\hat{\theta} - \theta^{*}}{s_{\hat{\theta}}},$$

where $\hat{\theta}$ is the estimate of the population quantity $\theta$, $\theta^{*}$ is the null value of interest (usually zero), and $s_{\hat{\theta}}$ is the standard deviation of the sampling distribution of $\hat{\theta}$ (i.e. its standard error). What can be done in order to form confidence intervals for non-pivotal quantities is to use the inversion confidence interval principle to find a confidence interval for some noncentrality value (i.e. what values of the noncentrality parameter lead to the observed noncentrality parameter being the $1 - \alpha/2$ and $\alpha/2$ quantiles?). When these values are found, the noncentrality parameters (i.e. the confidence bounds of the noncentral value) are transformed into the statistic of interest, which then yields a $(1 - \alpha)100\%$ confidence interval for the parameter of interest. Stated another way, confidence intervals for non-pivotal quantities are found by determining the values of the noncentrality parameter that would lead to the observed noncentral
value having probability $1 - \alpha/2$ and $\alpha/2$ for the lower and upper confidence limits, respectively. The values of the noncentrality parameter that would lead to the observed values occurring with the specified probabilities are then transformed into the quantity of interest. The resultant limits form the $(1 - \alpha)100\%$ confidence interval for the population quantity of interest. Although true for confidence intervals based on central distributions when $\alpha_L = \alpha_U$, there is no requirement that the lower confidence interval width, $\hat{\theta} - \theta_L$, will equal the upper confidence interval width, $\theta_U - \hat{\theta}$, for confidence intervals based on noncentral distributions. Throughout the chapter, 'width' refers to the full confidence interval width, $\theta_U - \theta_L$.

Confidence interval for the squared multiple correlation coefficient

The squared multiple correlation coefficient is one of the most widely used statistics. $R^2_{Y\cdot X}$ is almost always reported in the context of multiple regression, but in its various forms $R^2_{Y\cdot X}$ can be used to describe the proportion of variance accounted for in a wide variety of situations (e.g. between subjects analysis of variance and covariance designs; as a measure of cross validation; as an index of comparison in meta-analyses, etc.). As Steiger states, 'confidence intervals for the squared multiple correlation are very informative yet are not discussed in standard texts, because a single simple formula for the direct calculation of such an interval cannot be obtained in a manner that is analogous to the way one obtains a confidence interval for the population mean' (2004, p. 167). However, confidence intervals for the population squared multiple correlation coefficient are available with certain software (e.g. R2, an MS-DOS program written by Steiger and Fouladi, 1992; MultipleR2, a Mathematica package written by Mendoza and Stafford, 2001; MBESS, an R package written by Kelley (2007); and indirectly with SAS and SPSS, Smithson, 2003). Difficulties arise when forming a confidence interval for $P^2_{Y\cdot X}$ because when $P^2_{Y\cdot X} \neq 0$, the test statistic given in Equation 10 follows a noncentral F-distribution with noncentrality parameter $\Lambda$, as given in Equation 11. In accord with the inversion confidence interval principle, $R^2_{Y\cdot X}$ must be converted into the estimated noncentrality parameter and then noncentral parameters must be found such that

$$p(\hat{\Lambda}\,|\,\Lambda_L) = 1 - \alpha/2 \tag{24}$$

and

$$p(\hat{\Lambda}\,|\,\Lambda_U) = \alpha/2, \tag{25}$$

where $\hat{\Lambda}$ is the observed noncentrality parameter, $\Lambda_L$ and $\Lambda_U$ are the noncentral values that have at their $1 - \alpha/2$ and $\alpha/2$ quantiles $\hat{\Lambda}$ and are thus the lower and upper confidence limits, respectively (e.g. Mendoza and Stafford, 2001; Smithson, 2003; Steiger, 2004).

The MBESS R package includes a function, ci.R2(), for confidence interval formation for $P^2_{Y\cdot X}$, for fixed (or random) regressor variables. Although other options can be specified, a straightforward call to the ci.R2() function for fixed regressor variables would be of the form

R > ci.R2(R2 = $R^2_{Y\cdot X}$, N = $N$, p = $p$, conf.level = $1 - \alpha$,
      Random.Regressors = FALSE)

where $R^2_{Y\cdot X}$, $N$, $p$, and $1 - \alpha$ are defined in the function in the same way as they have been defined previously and Random.Regressors identifies if the regressors are random (TRUE) or fixed (FALSE). For example, suppose a researcher conducts a study with five regressor variables on 145 individuals and obtains a squared multiple correlation of $R^2_{Y\cdot X} = 0.7854$ [note 18]. The ci.R2() function for 95% confidence interval coverage could be specified as

R > ci.R2(R2 = 0.7854, N = 145, p = 5, conf.level = 0.95,
      Random.Regressors = FALSE)

which yields a confidence interval of $CI_{0.95} = [0.7165 \le P^2_{Y\cdot X} \le 0.8206]$, where $CI_{0.95}$ represents a 95% confidence interval with the limits given in the brackets for the parameter of interest. Thus, we can be 95% confident that the population squared multiple correlation coefficient in this situation is somewhere between 0.7165 and 0.8206.

Confidence interval for a regression coefficient

Before forming a confidence interval for a regression coefficient, the distinction has to be made whether or not the regression coefficient will be standardized. An unstandardized regression coefficient is a pivotal quantity, whereas a standardized regression coefficient is a non-pivotal quantity (in an analogous fashion as the difference between two group means is pivotal but the standardized difference between two group means is nonpivotal). Thus, a confidence interval for an unstandardized regression coefficient requires only a critical value from a central distribution whereas a standardized regression coefficient requires the critical values to be obtained from a noncentral distribution (analogous to forming a confidence interval for $P^2_{Y\cdot X}$). The following two sections discuss confidence intervals for unstandardized and standardized regression coefficients.

Confidence intervals for an unstandardized regression coefficient

The t-test for the unstandardized regression coefficient, Equation 13, is a pivotal quantity implying that the test statistic can be manipulated into a confidence interval. The confidence interval for the unstandardized regression coefficient is thus given as

$$\text{prob.}\left[b_j - t_{(1-\alpha/2;\,N-p-1)}\, s_{b_j} \le \beta_j \le b_j + t_{(1-\alpha/2;\,N-p-1)}\, s_{b_j}\right] = 1 - \alpha. \tag{26}$$

The confidence interval given above is the confidence interval given in standard textbooks that discuss multiple regression.

The MBESS R package includes a function, ci.reg.coef(), for confidence interval formation for $\beta_j$. A confidence interval for an unstandardized regression coefficient can be obtained by specifying the standard deviations of the variables (with the arguments s.Y and s.X) and specifying Noncentral = FALSE. In the situation described for the unstandardized regression coefficients ($b_j = 4.4245$), where $s_Y = 150.0734$ and $s_{X_j} = 9.3605$, the ci.reg.coef() function could be specified as

R > ci.reg.coef(b.j = 4.4245, R2.Y_X = 0.7854,
      R2.j_X.without.j = 0.3607, N = 145, p = 5,
      s.Y = 150.0734, s.X = 9.3605,
      conf.level = 0.95, Noncentral = FALSE)

which yields a confidence interval of $CI_{0.95} = [2.8667 \le \beta_j \le 5.9823]$, where b.j is the unstandardized regression coefficient for the jth regressor variable, R2.Y_X is the squared multiple correlation coefficient, R2.j_X.without.j is the squared multiple correlation coefficient when the jth regressor variable is predicted from the remaining $p - 1$ regressor variables, conf.level is the confidence level specified (i.e. $1 - \alpha$), and Noncentral is an indicator of whether or not the noncentral method should be used (FALSE for unstandardized and TRUE for standardized regression coefficients).

Confidence intervals for a standardized regression coefficient

When a regression coefficient is standardized, the unstandardized regression coefficient is multiplied by the quantity $s_{X_j}/s_Y$ in order to remove the scale of $X_j$ and $Y$. Such a quantity is no longer pivotal because of the process of standardization, implying that the confidence interval necessarily depends on a noncentral t-distribution. The difficulties that arise when forming a confidence interval for $\beta^{s}_j$, the population standardized regression coefficient for the jth regressor, arise because
$b_j$ is multiplied by $s_{X_j}/s_Y$ (in order to obtain $b^{s}_j$, the sample standardized regression coefficient for variable j). The distribution of $b^{s}_j$ is not pivotal and it is necessary to form confidence intervals based on noncentral t-distributions. In accord with the inversion confidence interval principle, $b^{s}_j$ must be converted into the observed noncentrality parameter (via Equation 13), and then the noncentral parameters must be found such that

$$p(\hat{\lambda}\,|\,\lambda_L) = 1 - \alpha/2 \tag{27}$$

and

$$p(\hat{\lambda}\,|\,\lambda_U) = \alpha/2, \tag{28}$$

where $\lambda_L$ and $\lambda_U$ are the lower and upper confidence limits for $\beta^{s}_j$ and are noncentrality parameters from t-distributions.

The MBESS R package includes a function, ci.reg.coef(), for confidence interval formation for $\beta^{s}_j$, technically assuming fixed regressor variables. Although other options can be specified, a straightforward call to the ci.reg.coef() function would be of the form

R > ci.reg.coef(b.j = $b^{s}_j$, R2.Y_X = $R^2_{Y\cdot X}$,
      R2.j_X.without.j = $R^2_{X_j\cdot X_{-j}}$, N = $N$, p = $p$,
      conf.level = $1 - \alpha$, Noncentral = TRUE).

For example, in the previous example where $N = 145$ and $R^2_{Y\cdot X} = 0.7854$, suppose that $b^{s}_j = 0.2760$ and $R^2_{X_j\cdot X_{-j}} = 0.3607$. The ci.reg.coef() function for 95% confidence interval coverage could be specified as

R > ci.reg.coef(b.j = 0.2760, R2.Y_X = 0.7854,
      R2.j_X.without.j = 0.3607, N = 145, p = 5,
      conf.level = 0.95, Noncentral = TRUE)

which yields a confidence interval of $CI_{0.95} = [0.1739 \le \beta^{s}_j \le 0.3771]$. Notice the asymmetry between the confidence limits and the estimate for the standardized regression coefficient, whereas it was symmetric for the unstandardized regression coefficient. This asymmetric property about the point estimate generally holds for confidence intervals based on noncentral distributions.

SAMPLE SIZE PLANNING FOR MULTIPLE REGRESSION GIVEN THE GOAL OF STATISTICAL POWER

This section discusses methods to plan sample size for statistical power in multiple regression. We begin with an overview of sample size planning for a desired power for the omnibus effect (i.e. $P^2_{Y\cdot X}$) and then provide an overview of sample size planning for a desired power for a targeted effect (i.e. $\beta_j$ or $\beta^{s}_j$).

Power for omnibus effects in multiple regression: Obtaining statistical significance for the squared multiple correlation coefficient

When interest concerns the omnibus effect of the model, recall that the noncentrality parameter was previously shown (Equations 11–12) to equal

$$\Lambda = \frac{P^2_{Y\cdot X}}{1 - P^2_{Y\cdot X}}\, N. \tag{29}$$

This implies that sample size is given as

$$N = \Lambda\, \frac{1 - P^2_{Y\cdot X}}{P^2_{Y\cdot X}}. \tag{30}$$

Thus, given $P^2_{Y\cdot X}$ and $\Lambda$, sample size can be determined. Once $P^2_{Y\cdot X}$ is specified, $\Lambda$ is the only unknown parameter since $N$ is unknown. If the $\Lambda$ that satisfies a desired degree of power can be determined, then the equation can be solved for necessary sample size.

Power is based on $\Lambda$ and the degrees of freedom, which are in turn based on $N$. Even though $N$ is unknown, it is the value of interest when planning a study with a desired degree of power. The way to plan an appropriate sample size is to use different values of $N$ to update $\Lambda$ and the degrees of freedom until the desired level of power is achieved for the test that $P^2_{Y\cdot X} = 0$. This process of using different values of $N$, which occurs essentially by systematic trial and error, can be implemented using tabled values (e.g. Kraemer & Thiemann, 1987; Cohen, 1988; Murphy & Myors, 1998; Lipsey, 1990) or with a noncentral F computer routine (see also Gatsonis & Sampson, 1989; Green, 1991; Dunlap et al., 2004). The general idea of the power analysis procedure is to determine the sample size so that the proportion of the alternative distribution beyond the critical value under the null distribution is at or greater than the desired degree of power.

The ss.power.R2() function from MBESS can be used to determine sample size for the omnibus effect of the regression model (i.e. $P^2_{Y\cdot X}$). For example, suppose a researcher wishes to determine necessary sample size when it is believed $P^2_{Y\cdot X} = 0.25$ for the test of the null hypothesis that the squared multiple correlation coefficient is zero in order to have power of 0.80 when the Type I error rate is specified at $\alpha = 0.05$. The basic way in which the ss.power.R2() function from MBESS would be used is as follows:

R > ss.power.R2(Population.R2 = 0.25,
      alpha.level = 0.05,
      desired.power = 0.80, p = 5)

where Population.R2 is the (hypothesized) value of $P^2_{Y\cdot X}$, alpha.level is the Type I error rate, desired.power is the desired degree of power, and p is the number of regressor variables. Applying this function to the example yields a necessary sample size of 45.

Power for targeted effects in multiple regression: Obtaining statistical significance for a regression coefficient of interest

When the effect of interest concerns a single regression coefficient, the noncentrality parameter from the noncentral t-distribution was previously shown (Equations 16–19) to equal

$$\lambda_j = \beta_j \sqrt{\frac{1 - P^2_{X_j\cdot X_{-j}}}{1 - P^2_{Y\cdot X}}}\;\frac{\sigma_{X_j}}{\sigma_Y}\sqrt{N} = \sqrt{\frac{P^2_{Y\cdot X} - P^2_{Y\cdot X_{-j}}}{1 - P^2_{Y\cdot X}}}\;\sqrt{N}. \tag{31}$$

This implies that sample size is given as

$$N = \left(\frac{\lambda_j}{\beta_j}\right)^{2} \frac{1 - P^2_{Y\cdot X}}{1 - P^2_{X_j\cdot X_{-j}}}\;\frac{\sigma^2_Y}{\sigma^2_{X_j}} = \lambda^2_j\, \frac{1 - P^2_{Y\cdot X}}{P^2_{Y\cdot X} - P^2_{Y\cdot X_{-j}}}. \tag{32}$$

Thus, given the population parameters and $\lambda_j$, sample size can be determined. However, in order to plan an appropriate sample size, once the population parameters and the desired degree of certainty are specified, $\lambda_j$ is the only unknown parameter because $N$ is unknown. If the $\lambda_j$ that satisfies a desired degree of power can be determined, then the equation can be solved for necessary sample size.

Power is based on $\lambda_j$ and the degrees of freedom, which in turn are based on $N$. Different values of $N$ can be used to update $\lambda_j$ and the degrees of freedom until the desired level of power is achieved for the test that $\beta_j = 0$. As before, this process can be implemented with tabled values (e.g. Kraemer & Thiemann, 1987; Cohen, 1988; Lipsey, 1990; Murphy & Myors, 1998; see also Maxwell, 2000 for a comprehensive review) or with a noncentral t (or F) computer routine.

The ss.power.reg.coef() function from MBESS can be used to determine sample size for a targeted regression coefficient. For example, suppose a researcher believes that $P^2_{Y\cdot X} = 0.40$ and when the regressor of interest is removed $P^2_{Y\cdot X_{-j}} = 0.30$. Thus, the regressor of interest uniquely explains 0.10 of the proportion of variance in the criterion variable. Although several possibilities exist, the basic way that the ss.power.reg.coef()
function from MBESS can be specified is as follows:

R > ss.power.reg.coef(Rho2.Y_X = 0.40,
      Rho2.Y_X.without.j = 0.30, p = 5,
      desired.power = 0.80, alpha.level = 0.05)

where Rho2.Y_X is the population squared multiple correlation coefficient predicting Y from X and Rho2.Y_X.without.j is the population squared multiple correlation coefficient predicting Y from $X_{-j}$. The necessary sample size in this example is 50.

SAMPLE SIZE PLANNING FOR MULTIPLE REGRESSION GIVEN THE GOAL OF STATISTICAL ACCURACY

AIPE for the omnibus effect in multiple regression: Obtaining a narrow confidence interval for the population squared multiple correlation coefficient

The way in which sample size can be determined in order for the expected width of the confidence interval for $P^2_{Y\cdot X}$ to be sufficiently narrow is quite involved. The method is computationally tedious and can only be carried out with the use of an iterative computer routine that uses noncentral F-distributions. As elsewhere in the chapter, we have restricted the discussion to regressors that are fixed. The case of random regressors is fully developed in Kelley (2006) [note 19]. It should be noted that two methods are discussed. The first method discussed provides necessary sample size for the expected confidence interval width. The confidence interval width is a random variable that will vary from sample to sample. A modified approach will also be discussed so that the width will be sufficiently narrow with no less than some specified degree of certainty.

The values that must be specified in order to determine the necessary sample size given an expected confidence interval width that is sufficiently narrow are $P^2_{Y\cdot X}$, $p$, and $\alpha$. The idea is to first use $P^2_{Y\cdot X}$, $p$, and $\alpha$ in order to determine the width of the confidence interval given some minimal sample size. If the width is larger than desired, the current estimate of $N$ is incremented by 1 and then the expected width is determined again. This iterative process continues until the sample size is just large enough so that the expected confidence interval width is sufficiently narrow. Two caveats with such an approach arise: $R^2_{Y\cdot X}$ is a positively biased estimate of $P^2_{Y\cdot X}$ and the sample size calculated is only for the expected width.

Even though $R^2_{Y\cdot X}$ is the sample estimate of $P^2_{Y\cdot X}$, $R^2_{Y\cdot X}$ is positively biased. However, the confidence limits for $P^2_{Y\cdot X}$, and thus its width, are based on $R^2_{Y\cdot X}$. Even though the bias of $R^2_{Y\cdot X}$ decreases as $N$ increases, holding everything else constant, basing the necessary sample size on $P^2_{Y\cdot X}$ directly would lead to inappropriate estimates of necessary sample size because the width of the computed confidence interval in part depends on $R^2_{Y\cdot X}$. The way in which this complication is overcome is by using the expected value of $R^2_{Y\cdot X}$ in place of $P^2_{Y\cdot X}$. The expected value of $R^2_{Y\cdot X}$ given $P^2_{Y\cdot X}$, $N$, and $p$ when regressors are fixed does not have a known derivation. However, the expected value of $R^2_{Y\cdot X}$ given $P^2_{Y\cdot X}$, $N$, and $p$ when regressors are random is known and is used as an approximation to the case where predictors are fixed, which is given as

$$E\left[R^2_{Y\cdot X} \mid P^2_{Y\cdot X}, N, p\right] = 1 - \frac{N - p - 1}{N - 1}\left(1 - P^2_{Y\cdot X}\right) H\!\left(1; 1; \frac{N + 1}{2}; P^2_{Y\cdot X}\right), \tag{33}$$

where $H$ is the hypergeometric function (Stuart et al., 1999, section 28.32; Johnson et al., 1995).

The sample size procedure is based on the expected value of $R^2_{Y\cdot X}$ because it is the value expected to be obtained in the study. For a given $\alpha$, $p$, and $N$, the confidence interval width depends only on $R^2_{Y\cdot X}$. Thus, the expected confidence interval width can be
determined by forming a confidence interval with the expected $R^2_{Y\cdot X}$. The expected confidence interval width can be made sufficiently narrow by increasing sample size, implying that the expected value of $R^2_{Y\cdot X}$ changes, until the expected confidence interval width is equal to or just narrower than the desired width. Once the sample size is found so that the expected confidence interval width is sufficiently narrow, using the sample size in a study will ensure that the expected width of the confidence interval will be sufficiently narrow.

For example, suppose a researcher wishes to determine necessary sample size so that the expected width of a 95% confidence interval for $P^2_{Y\cdot X}$ is 0.20 for 5 regressor variables in a situation where $P^2_{Y\cdot X} = 0.5$. The ss.aipe.R2() function from MBESS would be used as

R > ss.aipe.R2(Population.R2 = 0.50,
      conf.level = 0.95, width = 0.20, p = 5,
      Random.Regressors = FALSE),

which returns a necessary sample size of 152. Thus, using a sample size of 152 would provide an expected width for the confidence interval of 0.20.

Since the width of the confidence interval is a random variable, having a sample size such that the expected width is sufficiently narrow does not ensure that any particular sample will have a confidence interval that is sufficiently narrow (e.g. see Hahn & Meeker, 1991, or Kupper & Hafner, 1989, for a discussion of these issues in simpler situations). What can be done is to specify some desired degree of certainty that the obtained confidence interval will in fact be sufficiently narrow. The way in which this additional step proceeds is by using the sample size obtained from the previously discussed procedure and from two $\gamma 100\%$ one-sided confidence intervals for $P^2_{Y\cdot X}$, where $\gamma$ is the desired degree of certainty that the obtained interval will be sufficiently narrow. The limits from the $\gamma 100\%$ confidence intervals are then used to plan an appropriate sample size as before, but now using the confidence limits in place of $P^2_{Y\cdot X}$ from the first procedure. The rationale of this approach is to base the sample size procedure on the largest and smallest plausible value for the obtained $R^2_{Y\cdot X}$ based on the original sample size and the degree of certainty specified.

The reason the upper and lower confidence limits are used is because, unlike many effects where the larger the noncentrality parameter the wider the confidence interval (holding everything else constant), there is a nonmonotonic relationship between $R^2_{Y\cdot X}$ and the confidence interval width. Depending on the particular situation, a larger sample size may be necessitated by the lower limit or the upper limit from the two $\gamma 100\%$ one-sided confidence limits (or a value in between). The relationship between $R^2_{Y\cdot X}$ and the corresponding confidence interval width is illustrated in Figure 11.1 for 95% confidence intervals where $p = 5$ and $N = 100$.

The lack of monotonicity between the size of $R^2_{Y\cdot X}$ and the confidence interval width implies that, depending on the particular situation, the upper limit, the lower limit, or values in-between the two one-sided $\gamma 100\%$ confidence interval limits will yield wider confidence intervals for $P^2_{Y\cdot X}$. Even though Figure 11.1 is helpful to illustrate why upper and lower limits are required, recall that the procedure always uses the expected value of $R^2_{Y\cdot X}$. Thus, an analog to the figure presented, and what is actually used in the procedure, is one where the values on the ordinate are a function of basing confidence interval width on the expected values of $R^2_{Y\cdot X}$ for corresponding values of $P^2_{Y\cdot X}$.

Two issues arise when basing the sample size procedure on limits from the $\gamma 100\%$ one-sided confidence intervals. First, it is possible that the point estimate itself requires a larger sample size than either of the confidence limits (e.g. suppose the corresponding point estimate is 0.35 from the figure). Second, the maximum confidence interval width could be between the limits (e.g. suppose the corresponding confidence limits are 0.2 and 0.6 from the figure). To ensure that an appropriate sample size is determined, an
[Figure 11.1: plot titled '95% CI Width and SE($R^2$) Given $R^2 = P^2$', with two curves (CI Width and SE($R^2$)); ordinate marked from 0.00 to 0.25, abscissa 'Value of $P^2$' from 0.0 to 1.0.]

Figure 11.1 Relationship between the observed width of the 95% confidence interval for the population squared multiple correlation coefficient ($P^2_{Y\cdot X}$) as a function of the observed squared multiple correlation coefficient ($R^2_{Y\cdot X}$) when the total sample size is 100 and there are five regressors
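The pattern in Figure 11.1 can be explored with a short base R sketch of the inversion and transformation principles (Equations 24–25) for fixed regressors. The helper name ci_R2_fixed(), the search interval passed to uniroot(), and the grid of observed values are assumptions made for illustration; this is not the MBESS implementation, although the first call uses the values from the earlier example (observed squared multiple correlation of 0.7854, N = 145, p = 5) and should land close to the interval reported there.

```r
# Sketch (base R only): noncentral-F confidence interval for the population
# squared multiple correlation with fixed regressors.
# ci_R2_fixed() is a hypothetical helper, not part of MBESS.
ci_R2_fixed <- function(R2, N, p, conf.level = 0.95) {
  alpha <- 1 - conf.level
  df1 <- p
  df2 <- N - p - 1
  F.obs <- (R2 / df1) / ((1 - R2) / df2)   # observed omnibus F-statistic

  # Noncentrality value at which F.obs sits at the requested quantile;
  # returns 0 when the central distribution already puts F.obs below it.
  find_ncp <- function(prob) {
    g <- function(ncp) pf(F.obs, df1, df2, ncp = ncp) - prob
    if (g(0) <= 0) return(0)
    uniroot(g, interval = c(0, 5 * (F.obs * df1 + 10)), tol = 1e-10)$root
  }
  Lambda.L <- find_ncp(1 - alpha / 2)      # Equation 24
  Lambda.U <- find_ncp(alpha / 2)          # Equation 25

  # Transformation principle: Lambda = N * P2 / (1 - P2), so P2 = Lambda / (Lambda + N)
  c(lower = Lambda.L / (Lambda.L + N), upper = Lambda.U / (Lambda.U + N))
}

ci.example <- ci_R2_fixed(R2 = 0.7854, N = 145, p = 5)
print(ci.example)

# Interval width across a grid of observed values, with N = 100 and p = 5
R2.grid <- seq(0.1, 0.9, by = 0.1)
widths <- sapply(R2.grid, function(r) {
  ci <- ci_R2_fixed(r, N = 100, p = 5)
  unname(ci["upper"] - ci["lower"])
})
print(round(cbind(R2 = R2.grid, width = widths), 4))
```

Printing the widths against the grid gives a quick way to see that the interval is widest for intermediate observed values rather than at either extreme, which is the nonmonotonic relationship the figure describes.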
optimization routine is used to determine if there is a value within the confidence limits that leads to a wider confidence interval than either of the limits. If not, the larger of the two sample sizes is used. If so, if the value that leads to the widest confidence interval is the value on which the original sample size is based, then the original sample size is used. If it is some other value between the confidence limits, then the $R^2_{Y\cdot X}$ value that occurs with probability $(1 - \gamma)/2$ less than the value leading to the maximum confidence width and the value that occurs with probability $(1 - \gamma)/2$ more than the maximum confidence width are used. The probabilities are determined from the appropriate noncentral F-distributions. Of the contending sample sizes, the largest one is used. Doing so ensures that no less than $\gamma 100\%$ of the confidence interval widths will be sufficiently narrow (Kelley, 2006, provides more detail on the procedure in the case of random regressors). It is important to remember that at every stage, the expected value of $R^2_{Y\cdot X}$ is used based on the particular population value. Depending on the particular situation, incorporating a degree of certainty parameter can yield only a small or a large increase in necessary sample size.

The method discussed in order to obtain a narrow confidence interval with some degree of certainty can be readily implemented with the ss.aipe.R2() function. Realizing that having only an expected width of 0.20 is not sufficient, further suppose that the researcher incorporates a 99% degree of certainty that the obtained confidence interval will be no wider than 0.20 units. The way in which the
ss.aipe.R2() function is used in order to ensure a desired degree of certainty of 0.99 is given as follows:

R > ss.aipe.R2(Population.R2 = 0.50,
      conf.level = 0.95, width = 0.20, p = 5,
      degree.of.certainty = 0.99,
      Random.Regressors = FALSE),

which yields a necessary sample size of 189.

AIPE for targeted effects in multiple regression: Obtaining a narrow confidence interval for the population regression coefficient

Recall that when regression coefficients are unstandardized, the way in which confidence intervals are obtained is based on the central t-distribution. However, confidence intervals based on standardized regression coefficients require the use of noncentral distributions (since $b^{s}_j$ is not a pivotal quantity). Thus, the appropriate procedures are different for the two scenarios. The first procedure discussed will be for unstandardized regression coefficients followed by a procedure for standardized regression coefficients.

AIPE for unstandardized regression coefficients

Kelley and Maxwell (2003) discussed AIPE for a targeted regression coefficient. We will base the present discussion largely on an updated account of that work in the context of unstandardized regression coefficients. Recall from Equation 26 that the confidence interval for $\beta_j$ is straightforward to calculate given $b_j$, $s_{b_j}$ (which is a function of $N$, $p$, $R^2_{Y\cdot X}$, $R^2_{X_j\cdot X_{-j}}$), $N$, $p$, and $\alpha$. The population variance for the jth regression coefficient is given as

$$\sigma^2_{b_j} = \frac{1 - P^2_{Y\cdot X}}{(1 - P^2_{X_j\cdot X_{-j}})(N - p - 1)}\;\frac{\sigma^2_Y}{\sigma^2_{X_j}}. \tag{34}$$

Given $\sigma^2_{b_j}$, the sample size can be solved for, yielding the necessary sample size in order for the expected width to be sufficiently narrow:

$$N = \left(\frac{t_{(1-\alpha/2;\,N-p-1)}}{\omega/2}\right)^{2} \frac{1 - P^2_{Y\cdot X}}{1 - P^2_{X_j\cdot X_{-j}}}\;\frac{\sigma^2_Y}{\sigma^2_{X_j}} + p + 1, \tag{35}$$

where $\omega$ is the desired full width of the confidence interval. A complication is that the desired $N$ is implicitly involved on the right side of the equation since the degrees of freedom of the t-value depend on $N$. It is thus necessary to solve Equation 35 iteratively.

Because the confidence interval width is itself a random variable, obtained values of $s^2_{b_j}$ larger than the population value used in the calculation of $N$ will lead to confidence intervals wider than desired. In order to avoid obtaining a confidence interval wider than desired, the $\gamma 100\%$ confidence limit for the standard error can be used in place of the population standard error when solving for $N$. The $\gamma 100\%$ upper confidence limit for the population standard error of the jth regression coefficient, based on a chi-square distribution with $N - p - 1$ degrees of freedom, can then be substituted for the population variance from Equation 34. Doing so will ensure that the obtained confidence interval will be sufficiently narrow no less than $\gamma 100\%$ of the time. Since the only way for a confidence interval to be wider than desired is to obtain a standard error larger than the population standard error, using the upper $\gamma 100\%$ confidence limit of the standard error will ensure that the confidence interval will be sufficiently narrow no less than $\gamma 100\%$ of the time.

The way in which the upper limit for the variance of the regression coefficient is determined is given as

$$\sigma^2_{b_j\gamma} = \frac{1 - P^2_{Y\cdot X}}{(1 - P^2_{X_j\cdot X_{-j}})(N - p - 1)}\;\frac{\sigma^2_Y}{\sigma^2_{X_j}} \times \frac{\chi^2_{(\gamma;\,N-1)}}{N - p - 1}, \tag{36}$$
where $\chi^2_{(\gamma;\,N-1)}$ is the $\gamma$th quantile from a $\chi^2$ distribution with $N - 1$ degrees of freedom and $\sigma^2_{b_j;\gamma}$ is the upper limit of the $100\gamma\%$ confidence interval for $\sigma^2_{b_j}$. Substituting $\sigma^2_{b_j;\gamma}$ from Equation 36 for $\sigma^2_{b_j}$ from Equation 34 yields the modified sample size,

$$N_M = \left(\frac{t_{(1-\alpha/2;\,N-p-1)}}{\omega/2}\right)^2 \left(\frac{1 - \mathrm{P}^2_{Y \cdot \mathbf{X}}}{1 - \mathrm{P}^2_{X_j \cdot \mathbf{X}_{-j}}}\right)\frac{\sigma^2_Y}{\sigma^2_{X_j}}\left(\frac{\chi^2_{(\gamma;\,N-1)}}{N - p - 1}\right) + p + 1, \tag{37}$$

where $N_M$ is the modified sample size so that there is $100\gamma\%$ certainty that the obtained confidence interval will be sufficiently narrow.

The methods discussed can be readily implemented with the MBESS R function ss.aipe.reg.coef(). Suppose that $\mathrm{P}^2_{Y \cdot \mathbf{X}} = 0.50$ and $\mathrm{P}^2_{X_j \cdot \mathbf{X}_{-j}} = 0.20$, $\sigma^2_Y = 50$, $\sigma^2_{X_j} = 5$, $p = 5$, and $\beta_j = 3$. Further suppose that the desired width for the 95% confidence interval is 2 for the regressor of primary importance (the estimate plus and minus 1 unit). The way in which the ss.aipe.reg.coef() function can be used is given as

R > ss.aipe.reg.coef(Rho2.Y_X = 0.5, Rho2.j_X.without.j = 0.2, p = 5, b.j = 3,
width = 2, sigma.Y = 50, sigma.X = 5, conf.level = 0.95)

with the result of the function being 250. Further suppose that the researcher would like to be 85% certain that the 95% confidence interval is no larger than 2 units wide. The modified sample size can be obtained by specifying the degree of certainty parameter:

R > ss.aipe.reg.coef(Rho2.Y_X = 0.5, Rho2.j_X.without.j = 0.2, p = 5, b.j = 3,
width = 2, sigma.Y = 50, sigma.X = 5, conf.level = 0.95,
degree.of.certainty = 0.85)

which yields a necessary sample size of 278.

AIPE for standardized regression coefficients

As with the unstandardized regression coefficient, the sample size necessary for the expected width of a noncentral confidence interval for $\beta^s_j$ to be sufficiently narrow can be solved for iteratively. Because the critical value is based on a noncentral t-distribution and cannot be written analytically, the iterative procedure for the standardized regression coefficient must also include a step for determining the expected confidence interval width given the particular sample size. Thus, the iteration needed to determine the expected width is more difficult for standardized regression coefficients than it is for their unstandardized counterparts due to the necessary employment of the noncentral t-distribution. Although this requires a great deal more work in the actual algorithm to determine sample size, there is no conceptual difference compared to the method for the unstandardized regression coefficient.

The method has been implemented in the ss.aipe.reg.coef() function from MBESS when Noncentral=TRUE has been specified. For the situation described in the previous section, sample size for the standardized analog can be obtained as

R > ss.aipe.reg.coef(Rho2.Y_X = 0.5, Rho2.j_X.without.j = 0.2, p = 5, b.j = 0.3,
width = 0.2, sigma.Y = 1, sigma.X = 1, conf.level = 0.95, Noncentral = TRUE)

which yields a necessary sample size of 264.

As in the unstandardized case, the confidence interval width is itself a random variable. At the present time, there has not been a satisfactory method developed for determining necessary sample size for confidence intervals for $\beta^s_j$ that incorporates a desired degree of certainty. The complication in developing such a method stems from the fact that the noncentrality parameter is based on two parameters: $\beta^s_j$ and $\sigma^2_{b^s_j}$.
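To make the iterative logic of Equations 35 and 37 concrete, the following base R sketch searches for the smallest N that is at least as large as the right-hand side of the equation evaluated at that N. It is an illustration under stated assumptions rather than the MBESS implementation: the function name aipe_n_sketch and its arguments are hypothetical, the ratio of the criterion variance to the regressor variance is supplied directly, and the values it returns need not match the output of ss.aipe.reg.coef() exactly, since the package's algorithm may differ in detail.

```r
# Hypothetical helper (not part of MBESS): smallest N satisfying Equation 35,
# or Equation 37 when a degree of certainty (gamma) is supplied.
aipe_n_sketch <- function(rho2_yx, rho2_j, p, width, var_ratio,
                          conf_level = 0.95, certainty = NULL) {
  alpha <- 1 - conf_level
  # Part of the right-hand side that does not depend on N
  base <- (1 - rho2_yx) / (1 - rho2_j) * var_ratio
  # Right-hand side of Equation 35 (or 37) evaluated at a candidate N
  required_n <- function(n) {
    df <- n - p - 1
    t_crit <- qt(1 - alpha / 2, df)
    # Equation 37 multiplies by the chi-square quantile divided by N - p - 1
    inflate <- if (is.null(certainty)) 1 else qchisq(certainty, n - 1) / df
    (t_crit / (width / 2))^2 * base * inflate + p + 1
  }
  # Step upward from the smallest N with positive error degrees of freedom
  n <- p + 2
  while (n < required_n(n)) n <- n + 1
  n
}

# Assumption: 50 and 5 are treated as standard deviations, as the MBESS
# argument names sigma.Y and sigma.X suggest, so the variance ratio is (50/5)^2.
n_expected <- aipe_n_sketch(0.5, 0.2, p = 5, width = 2, var_ratio = (50 / 5)^2)
n_assured  <- aipe_n_sketch(0.5, 0.2, p = 5, width = 2, var_ratio = (50 / 5)^2,
                            certainty = 0.85)
print(c(expected_width = n_expected, with_certainty = n_assured))
```

Stepping upward one unit at a time is slower than a fixed-point update, but it cannot oscillate between two neighboring integers, which is why it is used in this sketch.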
Thus, an analog for the way a desired degree of certainty is incorporated into the unstandardized regression coefficient, where the confidence interval width depends on only one parameter, $\sigma^2_{b_j}$, is necessarily more difficult in the standardized case. Even though we believe that a method can and will be developed, at the present time a brute-force, trial-and-error, simulation-based method can be implemented in order to plan an appropriate sample size. Such an approach would proceed by specifying the population parameters and simulating data based on a particular sample size. From there, confidence intervals could be formed for standardized regression coefficients as previously discussed. The proportion of confidence intervals that are narrower than the desired width can be determined for different sample size values. This could be done until the minimum sample size is found that yields no less than the desired degree of certainty specified.

The function ss.aipe.reg.coef.sensitivity() contained in the MBESS R package can be used to determine the appropriate sample size as well as perform general sensitivity analyses. When an estimated set of population parameters is specified (that differs from the true set), the sample size used is based on the estimated values, but the simulation is conducted based on the properties of the true set of parameter values. This allows one to perform a sensitivity analysis, where the effects of mis-specifying population parameters by varying amounts on the typical width and the percentage of confidence intervals narrower/wider than desired can be evaluated. Alternatively, a specific sample size can be used in order to evaluate the properties of the situation described by the true set of parameter values at the specified value of sample size. Using the specified sample size approach, one can run the simulation with different values of sample size until the percentage of confidence interval widths less than the desired width is equal to the degree of certainty of interest. Although generally more time consuming, the brute force method described works very well when one wants to incorporate a desired degree of certainty parameter into the sample size procedure for standardized regression coefficients and for sensitivity analyses in general.²⁰

DISCUSSION

In the context of multiple regression, the question ‘What size sample should I use?’ does not have a simple answer. As this chapter has demonstrated, the answer is best addressed with the two-by-two conceptualization presented in Table 11.1. Specifically, the sample size that should be used depends on the goals of the study. If the goal is for the overall fit of the model, then interest concerns $\mathrm{P}^2_{Y \cdot \mathbf{X}}$; if the goal is for a targeted effect, then interest concerns $\beta_j$ (or $\beta^s_j$). Of course, both $\mathrm{P}^2_{Y \cdot \mathbf{X}}$ and $\beta_j$ (or $\beta^s_j$) might be of interest, which implies that the larger of the two sample sizes from the situations of interest should be used.

However, identifying only that one is interested in $\mathrm{P}^2_{Y \cdot \mathbf{X}}$ and/or $\beta_j$ (or $\beta^s_j$) is still not enough to determine the necessary sample size. It is also necessary to determine if the goal is to reject the null hypothesis that the effect is zero in the population or if the goal is to obtain an accurate parameter estimate via a narrow confidence interval for the population parameter (possibly both). In multiple regression, although the idea is much more general, choosing an adequate sample size is not generally possible until a particular cell in Table 11.1 has been identified as the scenario of interest. Once the particular scenario from the two-by-two conceptualization has been determined, then and only then can an appropriate sample size be planned (recall Point f from the designing research studies list in the introduction of the chapter).

Even after the scenario has been determined, it is still necessary to use an appropriate value of an effect size parameter. One thing that has been conspicuously absent from the chapter is ways to choose an appropriate value for the effect size parameter so that all the sample size procedures can
be implemented. The effect size has been termed the ‘problematic parameter’ due to the difficulty in estimating this unknown but necessary quantity (Lipsey, 1990). Options include basing population values on values obtained in previous research, possibly using meta-analytic techniques, performing a pilot study to estimate the necessary population quantities, or basing the population values on a reasonable exchangeable correlation structure. An exchangeable correlation structure is one where the correlation between each regressor and the criterion is the same and the correlation among the regressors is the same (but the two correlation values may be different; Maxwell, 2000). Even though this may seem simplistic, it is often a reasonable alternative unless obvious reasons exist for why it should not be used (Maxwell, 2000; see also Green, 1977). Given the difficulty of estimating the effect size parameter, combined with the nonlinear relationship between the necessary sample size and the desired degree of power or accuracy, sensitivity analyses are almost always helpful.

The chapter has made use of the Open Source and freely available computer package MBESS for the R statistical language and environment. We believe that the user-friendly functions contained in this package will be helpful for researchers planning sample size for multiple regression from any of the cells within Table 11.1. Alternatively, when there are multiple goals, choosing the larger of the necessary sample sizes is suggested as a way to achieve the multiple goals.

It is our hope that this chapter has been helpful in synthesizing four very different methods of planning sample size. The correct choice, of course, depends on the goal(s) of the researcher. Before determining sample size, a necessary but not a sufficient task is to clearly identify the particular question of interest that the study would ideally accomplish. Unless the question of interest is clearly identified, sample size cannot be adequately planned. Perhaps the best answer to the question ‘What size sample should I use?’ is, ‘Well, it depends.’

NOTES

1 One important complication not addressed in this chapter is the total financial cost of conducting a study. Some studies may require a necessary sample size so large that the cost of conducting the study with that sample size becomes prohibitively expensive (e.g. Kraemer, 1991; Allison et al., 1997).

2 With regards to the statistical power or accuracy of regression coefficients, we have approached the chapter as if interest is restricted to either the omnibus effect or a single targeted regression coefficient. Of course, a researcher might be interested in more than one regression coefficient or potentially all regression coefficients. When interest includes more than one regression coefficient or all regression coefficients, issues of multiple and simultaneous inference become important. These issues are beyond the scope of the present chapter and are not discussed.

3 R and MBESS, along with their respective manuals, can be downloaded from the following Internet address: http://www.cran.r-project.org/.

4 Some sources state that power is a function of only three things, but in those cases the work generally refers to the standardized effect size, which involves both the (unstandardized) effect size and the model error variance. An example of such a situation is when planning sample size to detect the difference between two independent group means. Either the mean difference and the common variance or the standardized mean difference, which is defined as the mean difference divided by the square root of the common variance, can be specified.

5 A Type I error occurs when the null hypothesis is true but the null hypothesis is rejected (this occurs with probability $\alpha$). A Type II error occurs when the null hypothesis is false but the null hypothesis fails to be rejected.

6 At times the data analysis procedure can be modified so as to reduce the model error variance yet still address the same research question, which potentially increases power and/or accuracy. For example, analysis of covariance can be used instead of an analysis of variance in a randomized design. The same question is addressed (are there differences among the population group means?), yet the model error variance is reduced by an amount related to the squared correlation between the covariate and the dependent variable (e.g. Huitema, 1980; Cox & McCullagh, 1982; Maxwell & Delaney, 2004).

7 It should be noted that the terms accuracy and precision have often been (incorrectly) used synonymously in the literature, which has at times caused confusion (Stallings & Gillmore, 1971). We believe the definition used here is optimal, in the sense that accuracy is clearly a function of precision and bias. The term accuracy in parameter estimation, the term we use for planning sample size with the desire to have a narrow confidence interval, is also thought to be
ideal, as it conveys the goal of achieving a parameter estimate that is close to its population value.

8 As an extreme example, suppose that a researcher always estimates the parameter to be a value that corresponds to an a priori theory, irrespective of any observed data. In such a case there would be a high degree of precision but the accuracy would likely be poor due to the effect of bias in the estimation procedure unless the theory is perfect. Precision is thus a necessary but not a sufficient condition for achieving accurate parameter estimates.

9 A counter example is the Cauchy distribution, where the precision of the location estimate is the same regardless of the sample size used to estimate it (Stuart et al., 1994, pp. 2–3).

10 Some population parameters are typically estimated with biased estimators but have exact confidence interval procedures. Even though the estimator is biased, the point estimate may be necessary for calculation of the (exact) confidence interval, where the values within the interval represent plausible values and will contain the parameter with $(1 - \alpha)100\%$ confidence. Many such population parameters also have unbiased (or more unbiased) estimators. Examples include the standardized mean difference (e.g. Hedges & Olkin, 1985), the squared multiple correlation coefficient (e.g. Algina & Olejnik, 2000), the standard deviation (e.g. Hays, 1994, for the confidence interval method and Holtzman, 1950, for the unbiased estimate), and the coefficient of variation (e.g. Johnson & Welch, 1940, for the confidence interval method and Sokal & Braumann, 1980, for its nearly unbiased estimate). A strategy in such cases is to report the exact confidence interval and the unbiased estimate of the population parameter.

11 The direction of an effect is known if the upper and lower limits of the confidence interval are both in the same direction (i.e. both are positive or both are negative). Furthermore, the confidence limits determine whether or not a particular null hypothesis (such as zero) can be rejected. Confidence limits provide the same information as an infinite set of hypothesis tests. The values within the confidence limits are the values of the null hypothesis that would not be rejected. The values outside of the confidence limits are the values of the null hypothesis that would be rejected.

12 The term ‘regressors’ has been used throughout the chapter as a generic term for the X variables. A regressor variable is termed independent, explanatory, predictor, or concomitant variable in other contexts. The term criterion is used as a generic term for the Y variable. The criterion variable is termed dependent, outcome, or predicted variable in other contexts.

13 Notice that the regressor variables (i.e. the X variables) are not italicized in any of the equations. This is because we will regard the regressors as fixed throughout the chapter. Even though the distinction between fixed and random regressors is not often made in applied work, the sampling distribution of an estimated regression coefficient tends to depend on whether the regressors are fixed or random (e.g. Stuart et al., 1999; Rencher, 2000). Many applications of multiple regression implicitly or explicitly take the view ‘given this X’ so that the X variables can be considered fixed for purposes of the study (e.g. O’Brien & Mueller, 1993, p. 23). O’Brien and Mueller (1993) make the argument that the distinction is not important in the context of sample size planning for power in multiple regression by stating that ‘the practical discrepancy between the two approaches disappears as the sample size increases’ (p. 23). O’Brien and Mueller (1993) go on to say that ‘because the population parameters are conjectures or estimates, strict numerical accuracy of the power computations is usually not critical’ (p. 23). We will say more about the distinction between fixed and random regressors elsewhere in the chapter.

14 We use both standardized and unstandardized regression coefficients in various parts of the chapter. Observed standardized regression coefficients have at times been referred to as ‘beta weights’ in the behavioral and educational sciences. We will use $\beta_j$ to represent the unstandardized population regression coefficient of variable j with $b_j$ as its estimate. We use $\beta^s_j$ to represent the standardized population regression coefficient of variable j with $b^s_j$ as its estimate.

15 Notice that we have not used the standard general linear model equations, where the intercept is contained within $\boldsymbol{\beta}$ and X contains a vector of ones for the intercept. The notation used here is equivalent to the standard general linear model equations, but it is especially helpful for presenting the necessary information for each of the four approaches to sample size planning for multiple regression.

16 Throughout the chapter, multiple correlation coefficients will be denoted with a subscript that identifies the variable being predicted separated by a dot from one or more regressor variables. Thus, the criterion variable is on the left of the dot and the regressor variable(s) are to the right of the dot, where the dot can literally be read as ‘regressed on,’ ‘predicted from’ or ‘explained by.’

17 A mean-shifted central distribution is one that follows a central distribution after subtracting the population value. For example, when comparing two independent group means, if there is a population mean difference between the two groups a priori, then that difference can be subtracted from the observed difference: $\bar{Y}_1 - \bar{Y}_2 - (\mu_1 - \mu_2)$, where $\bar{Y}_1$ and $\bar{Y}_2$ are the observed means for groups 1 and 2, respectively, and $\mu_1$ and $\mu_2$ are the population means for groups 1 and 2, respectively.

18 The illustrative data are from Holzinger and Swineford’s (1939) Grant-White School data (available in MBESS), where the criterion variable, total
score (the sum of all of the 26 measured variables included in the dataset), is modeled as a function of the regressor variables flags, wordm, addition, object, and series. The standardized and unstandardized regression coefficients, presented in the next section, are for the series variable, which was a test that measured students’ ability to complete mathematical/numeric series. Notice that the squared multiple correlation coefficient is quite large by most behavioral, educational, and social science standards. The large squared multiple correlation coefficient is because the dependent variable is a sum of five positively correlated measures, where the zero-order correlations among the measures tended to be large.

19 Even though only fixed regressors are discussed in the chapter, the ss.aipe.R2() function in MBESS can be used for regressors that are fixed or random by specifying Random.Predictors=TRUE (for random predictors) or Random.Predictors=FALSE (for fixed regressors).

20 In addition to the ss.aipe.reg.coef.sensitivity() function described, there is also a ss.power.reg.coef.sensitivity() function that allows the effects of parameter mis-specification or selected sample size to be specified in order to assess empirical power, and other properties, for a targeted regression coefficient. These functions for confidence interval width and power have analogs for the omnibus effect with the ss.aipe.R2.sensitivity() and the ss.power.R2.sensitivity() functions.

REFERENCES

Algina, J., & Olejnik, S. (2000). Determining sample size for accurate estimation of the squared multiple correlation coefficient. Multivariate Behavioral Research, 35, 119–136.
Allison, D. B., Allison, R. L., Faith, M. S., Paultre, F., & Pi-Sunyer, F. X. (1997). Power and money: Designing statistically powerful studies while minimizing financial costs. Psychological Methods, 2, 20–33.
Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66, 423–437.
Chow, S. L. (1996). Statistical significance: Rationale, validity and utility. Newbury Park, CA: Sage Publications.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Cohen, J. (1994, December). The earth is round (p < 0.05). American Psychologist, 49, 997–1003.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Erlbaum.
Cox, D. R., & McCullagh, P. (1982). Some aspects of analysis of covariance. Biometrics, 541–561.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 532–574.
Darlington, R. B. (1990). Regression and linear models. New York, NY: McGraw-Hill.
Dunlap, W. P., Xin, X., & Myers, L. (2004). Computing aspects of power for multiple regression. Behavior Research Methods, Instruments, & Computers, 36, 695–701.
Gatsonis, C., & Sampson, A. R. (1989). Multiple correlation: Exact power and sample size calculations. Psychological Bulletin, 106, 516–524.
Graybill, F. A. (1976). Theory and application of the linear model. Pacific Grove, CA: Brooks/Cole.
Green, B. F. (1977). Parameter sensitivity in multivariate methods. Multivariate Behavioral Research, 12, 263–288.
Green, S. B. (1991). How many subjects does it take to do a regression analysis? Multivariate Behavioral Research, 26, 499–510.
Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82, 1–20.
Grissom, R. J., & Kim, J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ: Lawrence Erlbaum Associates.
Hagen, R. L. (1997). In praise of the null hypothesis statistical test. American Psychologist, 52(1), 15–24.
Hahn, G., & Meeker, W. (1991). Statistical intervals: A guide for practitioners. New York, NY: John Wiley & Sons, Inc.
Harris, R. J. (1997). Significance tests have their place. Psychological Science, 8, 8–11.
Harris, R. J. (2001). A primer of multivariate statistics (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Hays, W. L. (1994). Statistics (5th ed.). Belmont, CA: Wadsworth Publishing.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.
Holtzman, W. H. (1950). The unbiased estimate of the population variance and standard deviation. American Journal of Psychology, 63, 615–617.
Holzinger, K. J., & Swineford, F. (1939). A study in factor analysis: The stability of a bi-factor solution. Chicago, IL: The University of Chicago.
Huitema, B. E. (1980). The analysis of covariance and alternatives. New York, NY: Wiley.
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.
Johnson, N. L., Kotz, S., & Balakrishnan, N. (1995). Continuous univariate distributions (Vol. 2). New York, NY: John Wiley & Sons, Inc.
Johnson, N. L., & Welch, B. L. (1940). Applications of the noncentral t-distribution. Biometrika, 31, 362–389.
Kelley, K. (2005). The effects of nonnormal distributions on confidence intervals for the standardized mean difference: Bootstrapping as an alternative to parametric confidence intervals. Educational and Psychological Measurement, 65(1), 51–69.
Kelley, K. (2006). Sample size planning for the squared multiple correlation coefficient: Accuracy in parameter estimation via narrow confidence intervals. Manuscript under review.
Kelley, K. (2007). MBESS version 0.0.9: An R package [computer software and manual]. Retrievable from http://www.cran.r-project.org/
Kelley, K., & Maxwell, S. E. (2003). Sample size for multiple regression: Obtaining regression coefficients that are accurate, not simply significant. Psychological Methods, 8, 305–321.
Kelley, K., Maxwell, S. E., & Rausch, J. R. (2003). Obtaining power or obtaining precision: Delineating methods of sample size planning. Evaluation and the Health Professions, 26, 258–287.
Kelley, K., & Rausch, J. R. (2006). Sample size planning for the standardized mean difference: Accuracy in parameter estimation via narrow confidence intervals. Psychological Methods, 11, 363–385.
Kraemer, H., Gardner, C., Brooks, J. O., & Yesavage, J. A. (1998). Advantages of excluding underpowered studies in meta-analysis: Inclusionist versus exclusionist viewpoints. Psychological Methods, 3, 23–31.
Kraemer, H. C. (1991). To increase power in randomized clinical trials without increasing sample size. Psychopharmacology Bulletin, 27, 217–224.
Kraemer, H. C., & Thiemann, S. (1987). How many subjects? Beverly Hills, CA: Sage.
Kupper, L. L., & Hafner, K. B. (1989). How appropriate are popular sample size formulas? American Statistician, 43, 101–105.
Lipsey, M. W. (1990). Design sensitivity: Statistical power for experimental research. Newbury Park, CA: Sage.
Maxwell, S. E. (2000). Sample size and multiple regression. Psychological Methods, 5, 434–458.
Maxwell, S. E. (2004). The persistence of underpowered studies in psychological research: Causes, consequences, and remedies. Psychological Methods, 9, 147–163.
Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison perspective (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Meehl, P. E. (1967). Theory testing in psychology and physics: A methodological paradox. Philosophy of Science, 34, 103–115.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834.
Mendoza, J. L., & Stafford, K. L. (2001). Confidence intervals, power calculations, and sample size estimation for the squared multiple correlation coefficient under the fixed and random regression models: A computer program and useful standard tables. Educational and Psychological Measurement, 61, 650–667.
Mogie, M. (2004). In support of null hypothesis significance testing. Proceedings of the Royal Society of London, Series B, Biology Letters, 271, 82–84.
Morrison, D. E., & Henkel, R. E. (1970). The significance test controversy: A reader. Chicago, IL: Aldine Publishing Company.
Murphy, K. R., & Myors, B. (1998). Statistical power analysis: A simple and general model for traditional and modern hypothesis tests. Mahwah, NJ: Erlbaum.
Neyman, J. (1937). Outline of a theory of statistical estimation based on the classical theory of probability. Philosophical Transactions of the Royal Society of London. Series A, Mathematical and Physical Sciences, 236, 333–380.
Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5, 241–301.
O’Brien, R., & Mueller, K. E. (1993). A unified approach to statistical power for t-tests to multivariate models. In L. Edwards (Ed.), Applied analysis of variance in behavioral sciences (pp. 297–344). New York, NY: Marcel Dekker.
Pedhazur, E. J. (1997). Multiple regression in behavioral research: Explanation and prediction (3rd ed.). New York, NY: Harcourt Brace College Publishers.
R Development Core Team. (2007). R version 2.5.0: A language and environment for statistical computing [computer software and manual]. R Foundation for Statistical Computing.
Rencher, A. C. (2000). Linear models in statistics. New York, NY: John Wiley & Sons, Inc.
Rosenthal, R. (1993). Cumulative evidence. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 519–559). Hillsdale, NJ: Lawrence Erlbaum Associates.
Rossi, J. S. (1990). Statistical power of psychological research: What have we gained in 20 years? Journal of Consulting and Clinical Psychology, 58(5), 646–656.
Rozeboom, W. W. (1960). The fallacy of the null-hypothesis significance test. Psychological Bulletin, 57, 416–428.
Rozeboom, W. W. (1966). Foundations of the theory of prediction. Homewood, IL: The Dorsey Press.
Sampson, A. R. (1974). A tale of two regressions. Journal of the American Statistical Association, 69, 682–689.
Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1, 115–129.
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105, 309–316.
Serlin, R., & Lapsley, D. (1985). Rationality in psychological research: The good-enough principle. American Psychologist, 40, 73–83.
Serlin, R. C., & Lapsley, D. K. (1993). Rational appraisal of methodological research and the good-enough principle. In G. Keren & C. Lewis (Eds.), Methodological and quantitative issues in the analysis of psychological data (pp. 199–228). Mahwah, NJ: Lawrence Erlbaum Associates.
Smithson, M. (2001). Correct confidence intervals for various regression effect sizes and parameters: The importance of noncentral distributions in computing intervals. Educational and Psychological Measurement, 61, 605–632.
Smithson, M. (2003). Confidence intervals. Thousand Oaks, CA: Sage Publications.
Sokal, R. R., & Braumann, C. A. (1980). Significance tests for coefficients of variation and variability profiles. Systematic Zoology, 29, 50–66.
Stallings, W. M., & Gillmore, G. M. (1971). A note on ‘accuracy’ and ‘precision’. Journal of Educational Measurement, 8, 127–129.
Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9, 164–182.
Steiger, J. H., & Fouladi, R. T. (1992). R2: A computer program for interval estimation, power calculation, and hypothesis testing for the squared multiple correlation. Behavior Research Methods, Instruments, and Computers, 4, 581–582.
Steiger, J. H., & Fouladi, R. T. (1997). Noncentrality interval estimation and the evaluation of statistical methods. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 221–257). Mahwah, NJ: Lawrence Erlbaum Associates.
Stuart, A., & Ord, J. K. (1994). Kendall’s advanced theory of statistics: Distribution theory (6th ed.). New York, NY: John Wiley & Sons.
Stuart, A., Ord, J. K., & Arnold, S. (1999). Kendall’s advanced theory of statistics: Classical inference and the linear model (6th ed., Vol. 2A). New York, NY: Oxford University Press.
Thompson, B. (2002). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher, 31(3), 25–32.
Wainer, H. (1999). One cheer for null hypothesis significance testing. Psychological Methods, 4(2), 212–213.
12
Re-conceptualizing
Generalization: Old Issues
in a New Frame
Giampietro Gobo
INTRODUCTION

Even though qualitative methods are now recognized in the methodological literature, they are still regarded with skepticism by some methodologists, mainly those with statistical training. One reason for this skepticism concerns whether qualitative research results can be generalized, which is doubted not only because they are derived from only a few cases, but also because even where a larger number is studied these are generally selected without observing the rigorous criteria of statistical sampling theory. In this regard, the methodology textbooks still distinguish samples into two types: probability samples (simple random, systematic, proportional stratified, non-proportional stratified, multistage, cluster, area, and their various combinations), and non-probability ones (haphazard or convenience, quota, purposive, of the emblematic case, snowball, telephone)1. With regard to the latter, it is stated:

the obvious disadvantage of nonprobability sampling is that, since the probability that a person will be chosen is not known, the investigator generally cannot claim that his or her sample is representative of the larger population. This greatly limits the investigator’s ability to generalize his or her findings beyond the specific sample studied (…) A nonprobability sample may prove perfectly adequate if the researcher has no desire to generalize his or her findings beyond the sample. (Bailey, 1978: 92)

This position again tends to relegate qualitative research to the marginal role of furnishing ancillary support for surveys, which is precisely as it was conceived by Barton and Lazarsfeld (1955) and the methodologists of their time.

The aim of this study is to show that this methodological denigration of qualitative research is overly severe and unjustified,
for three reasons. First, because the use of probability samples and statistical inference in social research often proves problematic. Second, because there are numerous disciplines, in both the social and human sciences, whose theories are based exclusively on research conducted on only a few cases. Third, because, pace the methodological orthodoxy, a significant part of sociological knowledge is idiographic. My intention is therefore not to criticize sampling theory or its applications; rather, it is to remedy a situation where statistical inference is deemed the only acceptable method, and idiographic generalization is deemed scientifically ill-founded. Finally, qualitative researchers do not need to throw away the baby of generalization with the bathwater of probability sampling, because we can have generalizations without probability.

THE PROBLEMATIC USE OF PROBABILITY SAMPLES IN SOCIAL RESEARCH

Several authors (among them Goode and Hatt, 1952; Chain, 1963; Galtung, 1967; Capecchi, 1972) have stressed that the application of statistical sampling theory in sociological contexts gives rise to various difficulties. This theory, in fact, requires the researcher to construct a probability sample (one, that is, where each subject’s likelihood of being selected is known and also every item has an equal chance of being selected), and the cases must be selected in rigorously random manner. But these two requirements are not easy to satisfy in social research, because their fulfillment encounters a series of obstacles, not all of which can be overcome.

There is no space to describe in depth the problems and limits of statistical sampling theory (see Gobo, 2004). I will briefly examine three limits only:

1 The difficulty of finding sampling frames (lists of population) for certain population sub-sets, because these frames are often not available. How, for example, can a random sample of the unemployed be extracted if the whole list of unemployed people is not available beforehand? It is true that many unemployed people are enrolled at job placement offices, but it is equally true that not all unemployed people are so enrolled. Consequently, the majority of studies on particular segments of the population cannot make use of population lists: consider studies on blue-collar workers, the unemployed, home-workers, artists, immigrants, housewives, pensioners, football supporters, members of political movements, charity workers, elderly people living alone, and so on.

2 The phenomenon of nonresponse. The concept of random selection is theoretically very simple and, thanks to the ideal-typical image of the box, quite clear to the general public. This clarity is misleading, however, because human beings differ from balls in a ballot box in two respects: they are not immediately accessible to the researcher, and they are free to decide not to answer. In fact, account must be taken of the gap (which varies according to the research project) between the initial sample (all the individuals about whom we want to collect information) and the final sample (the cases about which we have been able to obtain information); the two sets may correspond, but usually some of the objects in the first sample are not surveyed. As Groves and Lyberg (1988: 191) pointed out, nonresponse error threatens the characteristic which makes the survey unique among research methods: its statistical inference from sample to population. If the sample is at odds with the probability model, nothing can be said about its general representativeness; that is, about whether it truly reproduces all the characteristics of the population.

3 Representativeness and generalizability: two sides of the same coin? The social science textbooks usually describe generalizability as the natural outcome of a prior probabilistic procedure. In other words, the necessary condition for carrying out a statistical inference is previous use of a probability sample. It is forgotten, however, that probability/representativeness and generalizability are not two sides of the same coin. The former is a property of the sample, whilst the latter concerns the findings of research. Put otherwise: between construction of a sample and confirmation of a hypothesis there intervenes a complex set of activities which pertain to at least seven different domains: (1) the trustworthiness of operational definitions and operational acts; (2) the reliability of the data collection instrument;
(3) the appropriateness of conceptualizations; (4) the accuracy of the researcher’s descriptions, categorizations, and/or measurements; (5) success in observational (or field) relations; (6) the validity of the data; and (7) the validity of the interpretation. These activities, and their relative errors (called ‘measurement errors’ in the literature), may impair the connection between probability/representativeness and generalizability – a not infrequent occurrence in a complex activity like social research.

These drawbacks do not signify that probability sampling and statistical inference are instruments by their nature unsuited to social research. Rather, according to the research setting, they are instruments with certain practical disadvantages that can sometimes be remedied and sometimes cannot.

In light of these difficulties, probability sampling cannot be propounded as the only model suited to the generalization of findings. As Geertz (1973: 21) points out, it is not only statistical inference that enables the move from ‘local truths to general visions.’ Moreover, as we have seen, not all sociological phenomena can be studied with rigorous application of the principles of sampling theory, the consequence being that the adoption of other forms of generalization has been vital for social research: otherwise, an important part of sociological theory (that based on research conducted on a few cases or even on haphazard or convenience samples as in the cases of, for example, Gouldner, Dalton, Becker, Goffman, Garfinkel, Cicourel) would never have been produced.

GENERALIZATION AS SEEN BY QUALITATIVE METHODOLOGISTS

Qualitative researchers have taken up a variety of positions in reaction to the pronouncement that those who do not use probability samples cannot generalize. The most extreme of them have (paradoxically) on the one hand accepted the verdict but on the other dismissed sampling as ‘a mere positivist worry’ (Lincoln and Guba, 1979; Denzin, 1983). Some have aptly pointed out that ‘most social anthropological and a good deal of sociological theorizing has been founded upon case studies’ (Mitchell, 1983: 188) or has been the product of exclusively theoretical inquiry (without, that is, being grounded on systematic research). The more moderate have complied with the injunction of the statisticians but reconceptualized the problem by claiming that there are two types of generalization (which they have termed in various ways): enumerative (statistical) vs. analytic induction (Znaniecki, 1934: 236; Mitchell, 1983: 191); formalistic/scientific vs. naturalistic generalization (Stake, 1978: 6); distributive vs. theoretical generalization (Hammersley, 1992: 186ff; Williams, 2000: 215; Payne and Williams, 2005: 296–7).

The first type of generalization involves estimating the distribution of particular features within a finite population; the second, eminently theoretical, is concerned with the relations among the variables in any sample of the relevant kind (moreover, the population of relevant cases is potentially infinite). The latter is usually based on identifying causal or essential relations among particular categories, whose character is defined by those relations, so that it is inferred that all instances of those categories are involved in the specified type of relation.

Even though some qualitative researchers may privately agree with Znaniecki (1934: 236–7) that analytical induction is the true method of science and it is the superior method (because it discovers the causal relations of a phenomenon rather than only the probabilistic ones of co-occurrence), the idea that there exist two types of generalization represents acceptance of the statisticians’ diktat. It also represents acceptance of a ‘political’ division into areas of competence: a compromise already envisaged by some members of the Chicago School, like Burgess (1927), who maintained that statistics and case studies were mutually complementary2 with their own criteria of excellence.

The distinction between the two types of generalization has been drawn with exemplary clarity by Alberoni and colleagues, who
wrote in their Introduction to a study on 108 political activists of the Italian Communist Party and the Christian Democrat Party as follows:

if we want to know, for instance, how many activists of both parties in the whole country are from families of Catholic or Communist tradition, (this) study is useless. Conversely, if we want to show that family background is important in determining whether a citizen will be an activist in the Communist rather than Christian Democratic party, this research can give the right answer. If we want to find out what are and have been the percentages of the different ‘types’ of activists […] in both parties, the study is useless, whereas if we want to show that these types exist the study gives a certain answer […]. The study does not aim at giving a quantitative objective description of Italian activism, but it can aid understanding of some of its essential aspects, basic motivations, crucial experiences and typical situations which gave birth to Italian activism and help keep it alive. (1967: 13)

The two generalizations are therefore made in completely different ways3.

This moderate stance has been adopted by the majority of qualitative methodologists, some of whom have sought to underscore the difference between statistical and ‘qualitative’ generalization by coining specific terms for the latter. This endeavor has given rise to a welter of terms: ‘naturalistic generalization’ (Stake, 1978: 6), ‘transferability’ (Lincoln and Guba, 1979), ‘translatability’ (Goetz and LeCompte, 1984), ‘analytic generalization’ (Yin, 1984), ‘extrapolation’ (Mitchell, 1983: 191; Alasuutari, 1995: 196–7), ‘moderatum generalization’ (Williams, 2000; Payne and Williams, 2005), and others.

Five concepts of generalization

At least five different positions on the generalizability of research results can be identified within the qualitative methodological tradition (see Ragin and Becker, 1992; Gomm, Hammersley and Foster, 2000).

The first position, the most radical of them, has been assumed by Lincoln and Guba (1979), Guba (1981), Guba and Lincoln (1982), and Denzin (1983). It adheres to the traditional position that qualitative research is an idiographic account which lays no claim to generalization (see Burrell and Morgan, 1979). Norman K. Denzin is very explicit on the matter:

The interpretivist rejects generalization as a goal and never aims to draw randomly selected samples of human experience. For the interpretivist every instance of social interaction, if thickly described (Geertz, 1973), represents a slice from the life world that is the proper subject matter for interpretative inquiry (…) Every topic (…) must be seen as carrying its own logic, sense or order, structure, and meaning. (Denzin, 1983: 133–4)4

Guba and Lincoln (1981: 62) likewise claim that ‘it is virtually impossible to imagine any human behavior that is not heavily mediated by the context in which it occurs. One can easily conclude that generalizations that are intended to be context free will have little that is useful to say about human behavior.’ However, Guba and Lincoln moderate their position by introducing two novel elements: a new formulation of the concept of working hypothesis proposed by Cronbach (1975), and the new concept of transferability.

According to Cronbach, ‘when we give proper weight to local conditions, any generalization is a working hypothesis, not a conclusion’ (1975: 125). Hence, Lincoln and Guba maintain,

local conditions (…) make it impossible to generalize. If there is a ‘true’ generalization, it is that there can be no generalization. And note that ‘working hypotheses’ are tentative both for the situation in which they first uncovered and for other situations; there are always differences in context from situation to situation, and even the single situation differs over time. (1979, reprinted 2000: 39)

They now make their own proposal, which has become well-known in qualitative research:

How can one tell whether a working hypothesis developed in Context A might be applicable in Context B? We suggest that the answer to that question must be empirical: the degree of transferability is a direct function of the similarity between the two contexts, what we shall call ‘fittingness’. Fittingness is defined as the degree of congruence between sending and receiving contexts. If Context A and Context B are ‘sufficiently’ congruent, then working
hypotheses from the sending originating context may be applicable in the receiving context. (Lincoln and Guba, 1979, reprinted 2000: 40)

However, transferability is not an inferential process performed by the researcher (who cannot know all the other contexts of research). Rather, it is a choice made by the reader, who on the basis of argumentative logic and a thick description (of the case study) produced by the researcher, may decide (on his/her own responsibility – see Gomm, Hammersley and Foster, 2000: 102) to transfer this knowledge to other situations that she/he deems similar (Lincoln and Guba, 1979, reprinted 2000: 40). The reader, basing this on the persuasive power of the arguments used by the researcher, decides on the similarity between the (sending) context of the case studied and the (receiving) contexts to which the reader him/herself intends to apply the results (Guba and Lincoln, 1982: 246).

To conclude, these authors are convinced that ‘generalizations are impossible since phenomena are neither time- nor context-free’; however, ‘some transferability of these hypotheses may be possible from situation to situation, depending on the degree of temporal and contextual similarity’ (Guba and Lincoln, 1982: 238).

A second, more moderate, approach has been proposed by Stake (1978; 1994), who argues that the purpose of case studies is not so much to produce general conclusions as to describe and analyze the principal features of the phenomenon studied. If these features concern an emblematic case of political, social, or economic importance (for example, the decision-making procedures of a large institution like the US Department of Defense), the ‘intrinsic case study’ will per se produce results of indubitable intrinsic relevance5, even though they cannot be generalized in accordance with the canons of scientific induction:

naturalistic generalization, arrived at by recognizing the similarities of objects and issues in and out of context and by sensing the natural covariations of happenings. To generalize this way is to be both intuitive and empirical, and not idiotic. Naturalistic generalizations develop within a person as a product of experience. They derive from the tacit knowledge of how things are, why they are, how people feel about them, and how these things are likely to be later or in other places with which this person is familiar. They seldom take the form of predictions but they lead regularly to expectations (…) These generalizations may become verbalized, passing of course from tacit knowledge to propositional; but they have not yet passed the empirical and logical tests that characterize formal (scholarly, scientific) generalizations. (Stake, 1978: 6)

A third position, which is contiguous to the intrinsic case study, has been put forward by Connolly (1998). It starts from the distinction between extensive vs. intensive studies. The aim of the former (like case studies) is to identify statistically significant and therefore generalizable causal relations; the aim of the latter is to reconstruct in detail the mechanisms that connect cause and effect. Like Stake, Connolly relieves the case study of responsibility for formal generalization, but he gives it a task complementary to such generalization, explaining (via the mechanisms) correlations whose statistical significance has already been documented by other studies.

These three positions have a common basis consisting in the concept of ‘theoretical sampling’ proposed by Glaser and Strauss (1967), Schatzman and Strauss (1973) and Strauss (1987): when we do not possess complete information about the population, cases are selected according to their status on one or more properties identified as the subject matter for research. As Mason writes, ‘theoretical sampling is concerned with constructing a sample which is meaningful theoretically because it builds in certain characteristics or criteria which help to develop and test your theory and explanation’ (1996: 94). And Strauss and Corbin are very explicit on the concept of generalization:

in terms of making generalization to a larger population, we are not attempting to generalize as such but to specify […] the condition under which our phenomena exist, the action/interaction that pertains to them, and the associated outcomes or consequences. This means that our theoretical formulation applies to these situations or circumstances but to no others. (1990: 191, bold in the original text)

In other words the aim is not to generalize to some finite population but to develop theoretical ideas that will have general validity.

More practical are authors who engage in ‘evaluation research’ (Cronbach, 1982; Pawson and Tilley, 1997). These ground their reasoning on the notion of the cumulability of knowledge: case study after case study, in the course of time in a particular sector of research, there accumulates a repertoire or inventory of the possible forms that a particular object of study may assume. As Pawson and Tilley (1997: 119–20) put it, in polemic with Guba and Lincoln, what can be transferred between studies are not ‘lumps of cases’ but ‘sets of ideas’ which enable understanding of general mechanisms. In other words, cumulability is the prelude to qualitative generalizability.

The final position, and perhaps the oldest of them, is represented by Znaniecki’s method of analytic induction. The purpose of analytic induction is to uncover causal relations through identification of the essential characteristics of the phenomenon studied. To this end, the method starts not with a hypothesis but with a limited set of cases from which an initial explanatory hypothesis is then derived. If the initial hypothesis fails to be confirmed by one case, it is revised. Additional cases of the same class of phenomena are then selected. If the hypothesis is not confirmed by these further cases, the conceptual definition of the phenomenon is revised. The process continues until the hypothesis is no longer refuted and further study tells the researcher nothing new (Znaniecki, 1934: 236ff). The inner logic of analytic induction derives from Mill’s ‘method of agreement’ and ‘method of difference.’

There are several variants of Znaniecki’s method of analytic induction. One of them is Mitchell’s (1983) critical case study approach. Analytic induction revisited has also been widely used in comparative studies based on a small number of cases ‘when little more than a handful of nations or organizations – sometimes even fewer – are compared with respect to the forces driving a societal outcome such as a political development or an organizational characteristic’ (Lieberson, 1992, reprinted in 2000: 208).

The unavoidableness of generalization

Sampling and generalizing are unavoidable practices because, even before being scientific, they are everyday life activities deeply rooted in thought, language, and practice (Gobo, 2004). With regard to thought, cognitive psychologists have demonstrated the tendency of people to generalize on the basis of a few observed characteristics or events, a process called the heuristic of representativeness by Kahneman and Tversky (1972) and Tversky and Kahneman (1974). With regard to the world of language, the same function is performed, as Becker has stated, by ‘synecdoche, a rhetorical figure in which we use a part for something to refer the listener or reader to the whole it belongs to’ (1998: 67). Finally, in the world of action, the seller shows a sample of cloth to the customer; in a paint shop the buyer skims through the catalogue of color shades in order to select a paint; the buyer tastes in order to choose a wine or a cheese; the teacher asks a student questions to assess his or her knowledge about the syllabus. In everyday life, social actors constantly sample and generalize. As Gomm, Hammersley and Foster point out, ‘we all engage in naturalistic generalizations routinely in the course of our life, and this may take the form of empirical generalization as well as of informal theoretical inference. Given this, there is no reason in principle why case study research should not provide the basis for empirical generalization’ (2000: 104). This is also because the unavoidability of generalization is epistemologically and reflexively founded. As Gomm, Hammersley and Foster acutely observe:

the very meaning of the word ‘case’ implies that what it refers to is a case [instance or example]
of something. In other words, we necessarily identify cases in terms of general categories (…) the idea that somehow cases can be identified independently of our orientation to them is false. It is misleading to talk of the uniqueness of cases (…) we can only identify their distinctiveness on the basis of a notion of what is typical or representative of some categorial group or population. (Gomm, Hammersley and Foster, 2000: 104)

The unavoidableness of generalizing is such that ‘in practice, much case study research has in fact put forward empirical generalizations’ (Gomm, Hammersley and Foster, 2000: 98) and ‘current qualitative researchers often seem to produce [generalization] unconsciously’ (Payne and Williams, 2005: 297).

FOR AN IDIOGRAPHIC SAMPLING THEORY

The thesis (which I have called ‘moderate’) that there are two types of generalization has had the indubitable merit of cooling the dispute with quantitative methodologists and of legitimating two ways to conduct research. However, this political compromise has also had a number of harmful consequences.

First, it has not stimulated reflection on how to emancipate ‘qualitative’ generalization from its subordination to statistical inference. Traditional methodologists continue to attribute inferior status to qualitative research, on the grounds that although it can produce interesting results, they have a limited extension only. This long-standing positivist prejudice has been recently reinforced by the extreme positions taken up by Lincoln and Guba (1979) and Denzin (1983). Their insistence that generalization in interpretative research is impossible, and that their work is not intended to produce scientific generalizations, paradoxically fits perfectly with the equally intransigent position of quantitative methodologists. As Gomm, Hammersley and Foster observe, ‘to deny the possibility of case studies providing the basis for empirical generalizations is to accept the views of their critics too readily’ (2000: 98). Even though it may be a coincidence, the fact that Egon Guba was a well-known statistician before he became a celebrated qualitative methodologist may have heightened the inflexibility of the debate. An unexpected consequence of this paradox is that interpretivism has been just as positivist on qualitative generalization as quantitative methods have.

Second, the concept of theoretical sampling has failed to address the problem of sample representativeness which Denzin himself (1971) considered so important. Likewise, the concept of transferability provides ‘no guidance for researchers about which case to study – in effect, it implies that any case may be as good as any other in this respect’ (Gomm, Hammersley and Foster, 2000: 101). This omission has been pedagogically harmful because it has permitted several generations of qualitative researchers entirely to neglect – in the belief that ‘anything goes’ – this aspect of the investigative process.

Third, an opportunity has been missed to rediscuss the entire issue, addressing it in more practical (and not solely theoretical) terms with a view to developing a new sampling theory: an idiographic theory, joint and equal with statistical theory, and which remedies a series of ancestral misunderstandings:

denial of the capacity of case study research to support empirical [distributive] generalization often seems to rest on the mistaken assumption that this form of generalization requires statistical sampling. This restricts the idea of representation to its statistical version; it confuses the task of empirical generalization with the use of statistical techniques to achieve that goal. While those techniques are a very effective basis for generalization, they are not essential. (Gomm, Hammersley and Foster, 2000: 104)

Sampling in some contemporary sciences

The first step in this endeavor is to survey certain disciplines – paleontology, archaeology, geology, ethology, biology, astronomy, anthropology, cognitive science, linguistics (which for some scientists are more reputable than sociology) – and see how they have
tackled the problems of representativeness and generalizability. In certain respects, these are disciplines akin to qualitative research, for they work exclusively on few cases and have learnt to make a virtue out of necessity. As Becker writes:

Archeologists and paleontologists have this problem to solve when they uncover the remnants of a now-vanished society. They find some bones, but not a whole skeleton; they find some cooking equipment, but not the whole kitchen; they find some garbage, but not the stuff of which the garbage is the remains. They know that they are lucky to have found the little they have, because the world is not organized to make life easy for archeologists. So they don’t complain about having lousy data. (1998: 70–1)

For reasons of space, it is not possible here to provide an exhaustive account of how these disciplines have dealt with the above issues. But by way of example, consider the following study, which is one of the dozens published on the subject. It appeared in the journal Nature on January 23, 2003.

The scientist Xing Xu and colleagues (2003) of the Institute of Vertebrate Palaeontology, Beijing, had found six fossils in the province of Liaoning, North China. The impression left in the rock was of two pairs of wings and a long feathered tail of what appeared to be a Microraptor gui: a dinosaur less than one meter in length which lived in that region of China around 130 million years ago. According to its discoverers, the fossil was the missing link between terricolous dinosaurs and modern birds, the intermediate evolutionary stage for which scientists had long been searching. The discovery has fuelled the debate among paleontologists on the origin of flight. Whilst the close kinship between birds and dinosaurs is accepted by almost all scientists, there is much disagreement on the evolutionary stages that led to winged flight. The predominant theory is that wings began to develop, not to enable flight but to help the ancestors of birds to run faster. The small dinosaur discovered in China instead appeared to support the opposite hypothesis, namely that the direct ascendants of birds were animals which climbed trees and used wings to glide back to earth. This was the theory propounded, for example, by the American naturalist, William Beebe, who as early as 1915 had predicted the existence of feathered dinosaurs exactly like Microraptor gui. However, the British journal urged caution when evaluating the importance of the discovery: the Microraptor could also be an evolutionary blind alley which had not left descendants.

There are therefore numerous disciplines which work on a limited number of cases, and do so consciously; in fact, there is animated discussion within them on sampling and generalizability. Moreover, this procedure is adopted by other disciplines as well: for instance, biology, astrophysics, history, genetics, anthropology, linguistics, cognitive science, psychology (whose theories are largely based on experiments, and therefore on research conducted on non-probabilistic samples consisting of psychology students). Why, we may ask, is this procedure acceptable for monkeys, rocks, and cells but not for human beings? Why do the majority of disciplines work with/on non-probability samples (regarded as being just as representative of their relative populations and therefore as producing generalizable results) while in sociology this is not possible? Why can a geneticist like Luca Cavalli Sforza of Stanford University argue that the evolution of language has had a direct impact on our genetic heritage, while in sociology a similar claim would require very different methodological support? The majority of these disciplines start from the assumption that their objects of study possess quasi-invariant states on the properties observed: that is, their states with respect to a property (e.g. size of the brain or the physique of a hominid) vary little and slowly among members of the class. Consequently, these disciplines are unconcerned about their use of only a handful of cases to draw inferences and generalizations about thousands of people, animals, plants, and other objects. Moreover, science studies the individual object/phenomenon not
in itself but as a member of a broader class of objects/phenomena with particular characteristics/properties.

FOUR PROPOSALS FOR AN IDIOGRAPHIC SAMPLING THEORY

The above survey of disciplines midway between the natural sciences and the social sciences yields a number of suggestions for the formulation of an idiographic sampling theory. They can be summarized in the following four steps:

(a) abandon the (statistical) principle of probability;
(b) recover the (statistical) principle of variance;
(c) pay renewed attention to the units of analysis;
(d) identify social regularities.

Representativeness without probability

The use of probability samples does not automatically signify the use of representative samples. Random and representative are terms neither synonymous nor necessarily interrelated. ‘Randomness’ concerns a particular procedure used to select the cases to include in a sample, while ‘representativeness’ concerns the outcome of the selection. One may question whether the former is the obligatory path for the latter. Nor do representativeness and probability form a natural pair, since it may be possible to construct a representative sample using other procedures. Qualitative research (or at least a part of it) does not relinquish the aim of working with representative samples; it only rejects the obligatory nexus between probabilistic and representative (on the one hand), or between randomness and representativeness (on the other).

It is therefore not necessarily the case that a researcher must choose between an (approximately) random sample or an entirely subjective one – or between a sample which is (even only) partially probabilistic and one about whose representativeness absolutely nothing can be said. Between the rationalism and the postmodern nihilism underlying these two positions, one may attempt to address the problem in practical terms, doing so by examining the nature of the units of analysis considered, rather than adhering to standard procedural rules. As stressed by Rositi (1993: 198), we may reasonably doubt the generalizability of findings from

    studies of 1,000–2,000 cases which claim to sample the whole population. We have to wonder if we should prefer such samples with such aims […]. Studies with samples of 100–200 conversational interviews, structured to ‘describe’ variables rather than a population are definitely more suitable for a new model of studying society. (1993: 198)

Variance: From (general) principle to (local) practice

The second step is to recover the (statistical) principle of variance, which has received less attention than the probability principle. Contrary to the latter’s standardizing intent and automatist inclination (which are among the reasons for its success), variance is a criterion which requires the researcher to reason, to conduct contextual analysis, and to take local decisions. Under the variance principle,

    in order to determine the sample size, the statistics must first know the range of variance that the researcher intends to measure (at least in sufficiently close terms) because it is likely that, if the range of variance of variable X is high, n [the number of individuals to interview] will be high, whereas if the range of variance is restricted (for example to only two modalities), n may be very restricted as well. (Capecchi, 1972: 50)

Hence, it is more likely that a sample will be a miniature of the population if that population is tendentially homogeneous; and it is less likely to be so if the reference population is tendentially heterogeneous. Consequently, if the variance is high, the researcher will require a large number of cases (in order to include every dimension of the phenomenon studied in his/her sample). If, instead, the variance is low, the researcher will presumably
need only a few cases, and in some instances only one. In other words,

    it is important to recognize that the greater the heterogeneity of a population the more problematic are empirical generalizations based on a single case, or a handful of cases. If we could reasonably assume that the population were composed of more or less identical units, then there would be no problem. (Gomm, Hammersley and Foster, 2000: 104)

As Payne and Williams (2005: 306–7) also point out:

    the breadth of generalization can be extensive or narrow, depending on the nature of the phenomenon under study and our assumptions about the wider social world (…) [hence] the generalization may claim high or lower levels of precision of estimates (…) [and it] will be conditional upon the ontological status of the phenomena in question. We can say more, or make stronger claims about some things than others. A taxonomy of phenomena might look like this: 1° physical objects and their social properties; 2° social structures; 3° cultural features and artefacts; 4° symbols; 5° group relationships; 6° dyadic relationship; 7° psychological dispositions/behaviour (…) This outline taxonomy demonstrates that generalizations depend on what levels of social phenomena are being studied.

The conversation analyst Harvey Sacks (1992, vol. 1: 485, quoted in Silverman, 2000: 109) reminds us of the anthropologist and linguist Benjamin Lee Whorf, who was able to reconstruct Navajo grammar by extensively interviewing only one native Indian speaker. Grammars usually have low variance. However, had Whorf wanted to study how the Navajo educated their children, entertained themselves, etc., he would (perhaps) have found greater variance in the phenomenon and would have needed more cases. On this logic, the formal criteria that guide sampling are more informed by and embedded in sociological (rather than statistical) reasoning based on contingent reflection about the dimensions specific to the phenomenon investigated and the knowledge objectives of the research.

Moreover, as already noted, an authoritative part of sociological theory and a large part of anthropological theory are based on the case study: the quintessence of non-probability sampling. The research studies by Alvin G. Gouldner and Melville Dalton belong to this category. For example, Gouldner (1954) studied a gypsum mine situated close to the university where he taught (a convenience sample, therefore[6]). In his methodological appendix, Gouldner reported that his team conducted 174 interviews – and therefore on almost all the population (precisely 77 percent). One hundred and thirty-two of these 174 interviews were conducted with a ‘representative sample’ of the blue-collar workers at the company, for which purpose Gouldner used quota sampling stratified by age, rank, and tasks. He then constructed another representative sample of 92 blue-collar workers, to whom a questionnaire was administered.

Dalton (1959), who was a company manager at that time, conducted covert observation at Milo and Fruhling, the fictitious names of two American companies for which he worked as a consultant (again a convenience sample, therefore). The ethnologist De Martino (1961) observed 21 people suffering from tarantism; Goffman (1961) stayed for several months at a psychiatric hospital; the anthropologist Geertz (1972) attended 57 cock fights; Sacks and colleagues described the mechanics of conversational interaction by analyzing a few telephone calls; the anthropologist Crapanzano (1980) studied Moroccan social relations through the experience of Tuhami, a tilemaker. The anthropologist Griaule (1948) reconstructed the cosmology of the Dogon, a tribe in Mali, by questioning only a small group of informants; Bourdieu’s book (1993) on professions was based on 50 interviews with policewomen, temporary workers, attorneys, blue-collar workers, civil servants, and unemployed workers.

Why, one may ask, have such circumscribed studies given rise to such wide-ranging theories? In other words, why have they been generalized to other contexts? I shall answer these questions later. For the moment I would stress (and avoid) the danger of the nihilistic or postmodern drift implied
by this approach, where any sample may serve and it is not worth bothering too much about it. Instead, at a certain point of the inquiry, giving clear definition to the units of analysis (an operation performed before the cases are selected, and therefore before the sample is constructed) is of extreme importance if the research is not to be botched and empirically inconsistent. On analyzing a series of Finnish studies on ‘artists,’ Mitchell and Karttunen (1991) found that the results differed according to the definition given to ‘artist’ by the researchers, a definition which then guided construction of the sample. In some studies, the category ‘artist’ included (i) subjects who defined themselves as artists; (ii) those permanently engaged in the production of works of art; (iii) those recognized as artists by society at large; and (iv) those recognized as such by associations of artists. The obvious consequence was that it was subsequently impossible to compare the results of these studies.

Units of analysis

The standard practice in sociology and political science is to choose clearly defined and easily detectable individual or collective units: persons, households, groups, associations, movements, parties, institutions, organizations, regions, or states. The consistency of these collective subjects is vague. In practice, members of these groups are interviewed individually: the head of the family, the human resources manager, the statistics department manager, and so on. This means that the sampling unit (e.g. the family) is different from the observational unit (i.e. the single respondent as a member of the family). Only a focus group can (at least to some extent) preserve the integrity of the collective subject. Instead, choosing individuals implies an atomistic rather than organic conception of society (Burgess, 1927), whose structural elements are taken for granted or reckoned to be mirrored in the individual (Galtung, 1967: 37), while the sociological tradition that gives priority to relations over individuals is neglected. As a consequence, the following more dynamic units are neglected as well:

• beliefs, attitudes, stereotypes, opinions;
• emotions, motivations;
• behaviors, social relations, meetings, interactions, ceremonies, rituals, networks;
• cultural products (such as pictures, paintings, movies, theatre plays, television programs);
• rules and social conventions;
• documents and texts (historical, literary, journalistic);
• situations and events (wars, elections).

Hence, ‘a reliable sampling model that recognizes interaction must be adopted [so that sampling is conducted on] interactive units (such as social relationships, encounters, organizations)’ (Denzin, 1971: 269).

The researcher should focus his/her investigation on these kinds of units, not only because social processes are more easily detectable and observable, but also because these units allow more direct and deeper analysis of the characteristics observed.
Consider the following illustrative example. Assume that we want to study work practices at call
centers, which are technology-intensive workplaces. In Italy, it has been calculated that there were
1350 call centers in 2002. In order to construct a probability and representative sample, we may
proceed in two ways: randomly extract a certain number of cases from the population list (which is
possible because a complete list can be obtained from the Chambers of Commerce), or construct a
proportional stratified sample. In this latter case, we must first classify call centers according to the
properties that interest us:
(a) the ownership of the organization, so that we have private call centers (e.g. Vodafone), public ones (e.g. the 911 emergency helpline), and non-profit ones;
(b) the ‘vocation,’ so that we have call centers that are ‘generalist’ (in the sense that they provide a variety of services) or ‘vertical’ (i.e. dedicated to only one service, e.g. credit recovery);
(c) membership or otherwise of the organization for which the service is provided, so that we have call centers ‘internal’ to the company, or ones to which the work is outsourced;
(d) the classic variables such as size of the organization (small, medium, large), geographical location (north-west, north-east, centre, south, islands), etc.;
(e) the type of service furnished.
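The proportional stratified design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of any study cited here: the breakdown by ownership is invented so that it sums to the 1350 call centers mentioned in the text, ownership is the only stratifying property, and all function names are hypothetical.

```python
import random

# Hypothetical breakdown by ownership; only the total of 1350 comes from the text.
OWNERSHIP_COUNTS = {"private": 1000, "public": 250, "non-profit": 100}


def proportional_allocation(stratum_sizes, sample_size):
    """Share sample_size among strata in proportion to their size.

    Largest-remainder rounding keeps the parts summing to sample_size.
    """
    total = sum(stratum_sizes.values())
    exact = {s: sample_size * n / total for s, n in stratum_sizes.items()}
    alloc = {s: int(q) for s, q in exact.items()}  # round down first
    leftover = sample_size - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_fraction[:leftover]:
        alloc[s] += 1
    return alloc


def build_frame(counts):
    """Build an illustrative population list (a stand-in for a real register)."""
    frame = {}
    for stratum, n in counts.items():
        for i in range(n):
            frame[f"{stratum}-{i}"] = stratum
    return frame


def draw_stratified_sample(frame, sample_size, seed=0):
    """frame maps each unit id to its stratum; returns sampled ids per stratum."""
    rng = random.Random(seed)
    units_by_stratum = {}
    for unit, stratum in frame.items():
        units_by_stratum.setdefault(stratum, []).append(unit)
    sizes = {s: len(units) for s, units in units_by_stratum.items()}
    alloc = proportional_allocation(sizes, sample_size)
    # Simple random selection without replacement inside each stratum.
    return {s: rng.sample(units, alloc[s]) for s, units in units_by_stratum.items()}
```

With these invented counts, a sample of 135 units (10 percent of the list) is split 100/25/10 across private, public, and non-profit call centers. The sketch guarantees proportionality only on the stratifying property, which is the limitation the example goes on to discuss.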
Note that many of these properties are mutually exclusive, so that the sampling decision must be
carefully pondered. In these cases, the usual practice is for the researcher to base the probability
sampling on the first property. However, this may be sociologically inadequate if the researcher’s
interest is in work practices, because these cannot be accessed via the variable ‘ownership.’ For some
authors (e.g. Capecchi, 1972), representativeness does not seem to transfer from one property to
another. Put otherwise: it is not the variance of the ownership of call centers that interests us
here, but the variance of work practices. It might be more satisfactory to choose property (e).
Experience of this sector of inquiry (but also the literature, previous research, interviews with
experts or operators in the sector, etc.) shows that call centers mainly provide the following services:
counseling, credit recovery, marketing, interviewing, and advertising. Constructing a probability
sample on this classification is practically impossible because a population list for each of these
activities does not exist. The only alternative is to use the method outlined in the previous section.
Again on the basis of experience, we note that only the first of these five activities has substantial
variance, while the four latter seem to have low variance. In fact, the counseling provided by call
centers is multiform: it consists of information, technical assistance, psychological help or support,
medical advice, or therapy. Consequently, in order to preserve the representativeness of the sample,
we must sample several cases for the specific work practice of counseling. If we have insufficient
resources to collect the necessary number of cases, we can restrict our research to only some
activities. Other studies in the future will account for the rest.
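The reasoning of this example (more cases where variance is high, fewer where it is low, and a narrower scope when resources run out) can be made explicit in a short Python sketch. It rests on assumptions that are not in the text: ‘one case per known form of a service’ is used as a rough stand-in for variance, the four low-variance services are each given a single form, and the function names and the priority order are hypothetical.

```python
# Known forms of each service. The counseling forms are listed in the text;
# treating every other service as having a single form is an assumption
# standing in for "low variance".
SERVICE_FORMS = {
    "counseling": [
        "information",
        "technical assistance",
        "psychological help or support",
        "medical advice",
        "therapy",
    ],
    "credit recovery": ["credit recovery"],
    "marketing": ["marketing"],
    "interviewing": ["interviewing"],
    "advertising": ["advertising"],
}


def cases_needed(forms_by_service):
    """At least one case per known form: the higher the variance, the more cases."""
    return {service: len(forms) for service, forms in forms_by_service.items()}


def restrict_to_budget(needed, budget, priority):
    """Keep whole services, in the researcher's priority order, while the budget lasts."""
    kept, remaining = {}, budget
    for service in priority:
        if needed[service] <= remaining:
            kept[service] = needed[service]
            remaining -= needed[service]
    return kept
```

Under these assumptions nine cases cover every form. With resources for only six cases and counseling placed first, the sketch keeps counseling and one further service, leaving the others to later studies.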
It is evident that representativeness is not always possessed by the sample when research begins. It is a resource also acquired ex post, progressively and iteratively, research project after research project, with the gradual accumulation of expertise. This definition of representativeness seems somehow to tie this property to the relation between the results obtained by an individual research project and the experience of the researcher who conducts it.

In search of social regularities

I now turn to the final aspect of the entire question. There are three broad criteria which serve to orient the construction of a non-probability sample; and to each of them corresponds a particular form of reasoning alternative to inductive or statistical inference: deductive inference, comparative inference, and the emblematic case.

The three criteria impose different cognitive objectives, and they are used according to the type of generalization that the researcher wants to make. The first two criteria are in some way opposed to each other: comparative inference maximizes the probability of extracting odd cases; deductive inference selects only odd (deviant) cases. Theoretical inference instead concentrates on emblematic cases, focusing on social similarities.

Deductive inference

The first criterion consists of the choice of a critical or deviant case which can be used (à la Popper) to prove the refutability of an accredited or standard theory. An outstanding example of its application is provided by Goldthorpe et al.’s study (1968) of workers in the town of Luton. The distinctive feature of this inferential process is that it starts from a theory of which it intends to prove the implausibility: in this case the embourgeoisement of the working class. The theory is tested against a case comprising the largest number (and the greatest intensity) of the theory’s founding properties or requirements. If, in these optimal conditions, the consequences foreseen by the theory do not ensue, it is
extremely unlikely that the theory will work in all those empirical cases where those requirements are more weakly present. Hence the theory is falsified, and its inadequacy can be legitimately generalized. When the critical case study procedure is used, the cases are selected according to their explanatory power, rather than according to the criteria of probability theory or their typicality (Mitchell, 1983: 207, 209). Moreover, the legitimacy of the generalization (of the scant explanatory capacity of the theory just falsified) depends not only on the cogency of the rhetorical argument but also on the strength of the connections established between theory and observations.

There are many other important studies (which follow in a very broad sense the Popperian approach) which have focused on deviant cases in order to understand standard behavior: Goffman (1961) on ceremonies and rituals in a psychiatric clinic; Cicourel and Boese (1972) on the interpersonal communications of deaf children; Garfinkel (1967) on achievement of sex status in an ‘intersexed’ person; Pollner and McDonald-Wikler (1985) on interactions in a family with a mentally retarded child; and many others.

This criterion can also be used to explore subcultures or emergent or avant-garde phenomena which may become dominant or significant in the future, although at present they are still marginal: see Festinger et al. (1956) on millennial groups after their predicted date for the end of the world had passed; Becker (1953) on marijuana smokers; Hebdige (1979) on style groups like mods, punks, skinheads; Fielding (1981) on right-wing political movements.

The deviant case can also be used to prove the refutability and falsifiability of a well-known and received theory, as in Rosenhan’s (1973) study on the medical-organizational origin of psychiatric illness, or the already-cited study by Goldthorpe et al. (1968) on blue-collar workers in the town of Luton. This criterion (which is widely applied in biology, astrophysics, history, genetics, anthropology, linguistics, paleontology, archaeology, ethology, geology) does not determine the extent to which a phenomenon is widespread in the population. It only directs the scientific community’s attention to the phenomenon’s existence and the need to revise the dominant theory. The generalization to the population comes about by default: that is by virtue of the non-occurrence of the event foreseen by the theory under examination.

Obviously, the generalization must be carefully thought through. Otherwise, the danger arises of lapsing into the determinism to which Popper’s falsificationism is susceptible. As Lieberson (1992: 212) emphasizes:

    it is very difficult to reject a major theory because it appears not to operate in some specific setting. One is wary of concluding that Max Weber was wrong because of a single deviation in some inadequately understood time or place. In the same fashion, we would view an accident caused by a sober driver as failing to disprove the notion that drinking causes automobile accidents.

Comparative inference

The second criterion is used to make generalizations similar to statistical inferences, but without employing probability criteria. This can be done by identifying cases within extreme situations as well as certain characteristics, or cases within a wide range of situations in order to maximize variation, that is, to have all the possible situations in order to capture the heterogeneity of a population. We can choose two elementary schools where, from press reports, previous studies, interviews or personal experiences, we know we can find two extreme situations: in the first school there are severe difficulties of integration between natives and immigrants, while in the second there are virtually none. We can also pick three schools: the first with severe integration difficulties; the second with average difficulties; and the third with rare ones. In the 1930s and 1940s, the American sociologist W. Lloyd Warner (1898–1970) and his team of colleagues and students carried out studies on various communities in the United States. When Warner set about choosing the samples, he decided to select communities whose social structures mirrored important features of American society. He chose four
communities (given assumed names): a city in Massachusetts (Yankee City) ruled by traditions, on which he wrote five volumes; an isolated county in Mississippi (Deep South, 1941); a Chicago black district (Bronzeville, 1945); and a city in the Midwest (Jonesville, 1949).

In comparative inferences, the cases are selected by making careful comparisons: first by seeking to find cases which represent all the forms of heterogeneity in a target population, and then by checking whether they are sufficiently homogeneous with the type that one wants to represent. In this difficult but important analysis,

    it is necessary to compare the characteristics of the case(s) being studied with available information about the population to which generalization is intended (…) we are suggesting that where information about the larger population (or about overlapping populations) is available, it should be used. If it is not available, then the potential risks involved in generalization still need to be noted, preferably via specification of likely types of heterogeneity that could render the findings unrepresentative. (Gomm, Hammersley and Foster, 2000: 105–106)

We are therefore very distant from the concepts of naturalistic generalization and transferability, which are unsatisfactory in various respects, for they ‘do not provide a sound basis for the design, or justification, of case study research’ (Gomm, Hammersley and Foster, 2000: 102). They assign the reader a function which should also be performed by the researcher (assuming responsibility for affirming the generalizability of the study’s findings). They therefore relieve the researcher of responsibility for the careful selection of cases on the basis of the variance principle, and not solely on the basis of the theoretical significance of theoretical sampling and of all research on variables (rather than cases). As Schofield (1990) notes, all too often cases seem to be chosen for reasons of convenience and are therefore atypical in various respects.

The emblematic case

If we bear the variance principle in mind, there emerges a third major criterion for the construction of a sample: the typical or emblematic case.

Gouldner’s case studies (1954) on bureaucratization in medium-sized firms, or that by Cicourel (1968) on the relational construction of the figure of the juvenile delinquent, have been considered amply generalizable (by both researchers and readers), probably because they were typical cases and consequently grasped structural aspects of the social action in the organizations studied. Nor should we forget that the question of generalizability is closely tied to the phenomenon being researched, according to the degree of variance in its states.

This means that it is possible to find cases which on their own can represent a significant feature of a phenomenon. Generalizability thus conceived concerns more general structures and is detached from individual social practices, of which they are only an instance. In other words, the scholar does not generalize the individual case or event, which as Weber stressed is unrepeatable, but the key structural features of which it is made up, and which are to be found in other cases or events belonging to the same species or class. As Becker has recently pointed out:

    in every city there is a body of social practices – forms of marriage, or work, or habitation – which don’t change much, even though the people who perform them are continually replaced through the ordinary demographic process of birth, death, immigration, and emigration. (2000: 6)

On this view, the question of generalizability assumes a different significance: for example in the conclusions to his study on the relationship between a psychotherapist and a patient suffering from AIDS, Peräkylä writes:

    The results were not generalizable as descriptions of what other counselors or other professionals do with their clients; but they were generalizable as descriptions of what any counselor or other professional, with his or her clients, can do, given that he or she has the same array of interactional competencies as the participants of the AIDS counseling session have. (1997: 216, quoted in Silverman, 2000: 109)
Something similar happens in film and radio productions with noise sampling. The squeak of the door (which gives us the shivers when we watch a thriller or a horror film) does not represent all squeaks of doors, but we associate it with them. We do not think about the differences between that squeak and the one made by our front door; we notice the similarities only. These are two different ways of thinking, and most social sciences seek to find patterns of this kind.

While the verbal expressions of an interactive exchange may vary, exchange based on the question-answer pattern features a formal trans-institutional (though not universal) structure. While laying a page of a newspaper on the floor and declaring one’s sovereignty over it (Goffman, 1961) is a behavior observed in one psychiatric clinic only, the need to have a private space and control over a territory has been reported many times, albeit in different forms.

INTERACTIVE, PROGRESSIVE, AND ITERATIVE SAMPLING: SOME TIPS

Having outlined the theoretical premises of an idiographic sampling theory, I shall now describe its procedural aspects. However, there is no precise logical itinerary to set out, because methodological principles and rules do not have to stand on their own – as they are instead required to do in statistical sampling theory – in that they have only a weak relation to practice. It is instead necessary to approach the entire question of sampling sequentially, and it would be misleading to plan the whole strategy beforehand. In order to achieve representativeness, the sampling plan must be set in dialogue with field incidents, contingencies, and discoveries. This is what I mean by ‘interactive, progressive, and iterative sampling.’ An excellent instance of this procedure ‘is given in Glaser and Strauss’s (1964, 1968) studies on dying in the hospital, where hypotheses were developed hand in hand with data collection’ (Denzin, 1971: 269). Another example of changing or adding to the sampling plan on the basis of something the researcher has learnt in the field is provided by Becker:

    Blanche Geer and I were studying college students. At a certain point, we became interested in student ‘leaders,’ students who were heads of major organizations at the university (there were several hundred of them). We wanted to know how they became leaders and how they exercised their powers. So we made a list of the major organizations (which we could do because we had been there for a year and knew what those were, which we would not have known when we began) and interviewed twenty each of men and women student leaders. And got a great result – it turned out that the men got their positions through enterprise and hustling, while the women were typically appointed by someone from the university! (Howard Becker, 13/7/2002, personal communication)

Consistency must be given to the sampling reasoning, but not by mere application of procedural steps. The reasoning could be as follows.

1 The researcher usually starts from his/her research questions. Melville Dalton’s were:

    Why did grievers and managers form cross-cliques? Why were staff personnel ambivalent toward line officers? Why was there disruptive conflict between Maintenance and Operation? If people were awarded posts because of specific fitness, why the disparity between their given and exercised influence? Why among executives on the same formal level, were some distressed and some not? And why were there such sharp differences in viewpoint and moral concern about given events? What was the meaning of double talk about success as dependent on knowing people rather than on possessing administrative skills? Why and how were ‘control’ staffs and official guardians variously compromised? What was behind the contradictory policy and practices associated with the use of company materials and services? Thus the guiding question embracing all others was: what orders the schism and ties between official and unofficial action? (1959: 274)

Research questions comprise the concepts and categories (behaviors, attitudes, and so on) that the researcher intends to study.

2 The researcher conducts primary (or ‘provisional’ and ‘open’[7]: Strauss and Corbin, 1990: 193) sampling in order to collect cases in accordance
with the concepts. As Payne and Williams (2005: 295) suggest, ‘research design should plan for anticipated generalizations, and that generalization should be more explicitly formulated within a context of supporting evidence.’

3 Because not every concept can be directly studied, when the researcher constructs the provisional sample, s/he considers the following aspects:

(a) specificity (focusing on specific social activities with distinctive features, like rituals or ceremonies);
(b) the field’s degree of openness (open or closed places);
(c) intrusiveness (the endeavor to reduce the researcher’s visibility);
(d) institutional accessibility (free-entry versus limited-entry situations within the organization);
(e) significance (frequent and high organizational significance of social activities).

4 It is advisable to sample type of actions or events: ‘not, then, men and their moments. Rather moments and their men’ (Goffman, 1967: 3), ‘not only people but moments of lived life’ (Converse and Schuman, 1974: 1), ‘incidents and not persons per se!’ (Strauss and Corbin, 1990: 177), in contrast with the common practice of sampling bodies, and of seeking information from these bodies about behaviors and events that are never observed directly (Cicourel, 1996). There are two reasons for this important recommendation: first, it serves to prevent the survey sampling mistake concerning the transferability of ideas about representativeness; second, the same person may be engaged in overlapping activities. For example, Dalton (1959), when studying power struggles in companies, found five ‘types of cliques:’ vertical (symbiotic and parasitic), horizontal (defensive and aggressive), and random. If we sample individuals, we find that they belong to more than one clique according to the situation, intention, and so on. If we consider activities, everything becomes simpler.

5 To date, four main types of sampling have been developed in social research: purposive, quota, emblematic, and snowball. When cases are selected, attention should be paid to the variance of concept, so that different voices or cases can be included in the sample.

6 As the research proceeds, the researcher will refine his/her ideas, categories and concepts, or come up with new ones. The important thing is to make connections among them, thus formulating working hypotheses. Even though not every hypothesis is testable (indeed the most interesting ones often are not), if the reader is to be persuaded, they must be formulated in a testable way.

7 When the researcher has formulated hypotheses, s/he restarts sampling in order to collect cases systematically relating to each hypothesis, and seeking to make his/her analysis consistent. Strauss and Corbin call this second sampling ‘relational and variational: is associated with axial coding. It aims to maximize the finding of differences at the dimensional level’ (1990: 176). They depict the research process as funnel-shaped: through three increasingly focused steps (open, axial, and selective) the researcher clarifies his/her statements because ‘consistency here means gathering data systematically on each category’ (Strauss and Corbin, 1990: 178). When the researcher finds an interesting aspect, she/he must always check whether it occurs in other samples.

8 Generalization must be ensured ‘across and within cases (…) [because] the danger of error in drawing general conclusions from a small number of cases must not be underestimated’ (Gomm, Hammersley and Foster, 2000: 98). This concept has been sometimes rubricated as ‘internal generalization,’ and it implies different strategies which take account of diverse dimensions: time, sites, days, and people. The researcher should collect cases of behavior recurring at different moments of time. Because the researcher cannot observe the case-study population twenty-four hours a day, s/he must take a decision on when and where s/he will observe the population (Schatzman and Strauss, 1973: 39–41; Corsaro, 1985: 28–32). Unfortunately,

case study researchers rarely make clear what they take to be the temporal boundaries of the cases they have studied (…) it is not unusual for case studies of schools to focus on one year-group or cohort of students and to assume that the experience of these students is representative of other cohorts, past and future. (Gomm, Hammersley and Foster, 2000: 109)

Social practices always occur in certain places and at certain times of the day. Only if the researcher knows all the rituals of the organization observed can s/he draw a representative sample.

A classic illustration is provided by Berlak et al.’s study of progressive primary school practice in Britain in the 1970s (Berlak and Berlak, 1981;
Berlak et al., 1975). They argued that previous American accounts had been inaccurate because observation had been brief and had tended to take place in the middle of the week, not on Monday or Friday. On the basis of these observations, the inference had been drawn that in progressive classrooms children simply chose what they wanted to do and got on with it. As Berlak et al. document, however, what typically happened was that the teachers set out the week’s work on Mondays, and on Fridays they checked that it had been completed satisfactorily. Thus, earlier studies were based on false temporal generalizations within cases they investigated. (Gomm, Hammersley and Foster, 2000: 109–110)

Qualitative researchers do not seek to know the distribution of such behaviors (how many times); they only seek to know whether they are recurrent and significant in the organization under study. In addition, ‘our concern is with representativeness of concepts’ (Strauss and Corbin, 1990: 190). And finally, in regard to people and sites,

there is also likely to be variation in the behavior of both teachers and pupils across different contexts within a school. While most contact between members of the two groups probably occur in classrooms, they also meet one another in other places as well: in assembly halls, dining rooms, corridors, on game fields, and so on (…) Teacher-pupil relationships are likely to vary across mathematics classrooms, drama studios and science laboratories, for example. (Gomm, Hammersley and Foster, 2000: 111)

9 The researcher can sample new incidents or s/he can review incidents already collected: ‘Theoretical sampling is cumulative. This is because concepts and their relationships also accumulate through the interplay of data collection and analysis […] until theoretical saturation of each category is reached’ (Strauss and Corbin, 1990: 178, 188).

10 This interplay between sampling and hypothesis testing is needed because

(a) representative samples are not predicted in advance but found, constructed, and discovered gradually in the field;
(b) it reflects the researcher’s experience, previous studies, and the literature on the topic. In other words, the researcher will come to know the variance of a phenomenon cumulatively, study by study. As Gomm, Hammersley and Foster (2000: 107) acknowledge:

it is possible for subsequent investigations to build on earlier ones by providing additional cases, so as to construct a sample over time that would allow effective generalization. At the present, this kind of cumulation is unusual (…) the cases are not usually selected in such a way as to complement previous work;

(c) representative samples are used to justify the researcher’s statements.

It is therefore apparent that, although on the one hand ‘generalization is not an issue that can be dismissed as irrelevant by case study researchers’ (Gomm, Hammersley and Foster, 2000: 111), on the other it is not the impossible undertaking that survey researchers have always mocked. Finally, whilst probability sampling has a substantive aim – to construct a sample in order to extend the findings to the population – interactive sampling has a further task: to reflect, through its recursiveness, on the plausibility of generalizations.

CONCLUSION

Statistical inference (survey) and theoretical inference (experiment), as the two legitimate ways to draw general conclusions, continue to be used even though their application is fraught with difficulties; and they in fact end up by deviating from their theoretical principles and assumptions. Hence one fails to understand why it is not possible to resort to other forms of generalization which, though unsatisfactory, are no more unsatisfactory than those deemed superior to them. For that matter, contemporary social scientists do not have to choose between perfect and imperfect forms of generalization, but between forms of inference whose strengths and weaknesses depend on the researcher’s cognitive aims, the research situation, and the nature of the phenomenon under study.

The central idea of this essay lies midway between two highly authoritative
and well-known methodological proposals: Durkheim’s (1912) cas pur (the ‘pure case’), with positivist overtones, and Max Weber’s (1904) theory of ideal types. Durkheim believed that the simplest society of all for study of the elementary forms of religious life was the Australian tribe of the Arunta. The Flemish statistician and sociologist Adolphe Quételet (1796–1874) looked to the crowd for his homme moyen (the average man), who represented the ‘normality’ of the species. He was prompted to do so by the discovery that certain characteristics (physical and biological) of individuals were distributed in the populations which he studied according to the ‘normal’ curve constructed by the mathematician Gauss.

Conversely, Weber maintained that ‘feudal society,’ ‘bureaucracy,’ ‘charisma’ were genetic concepts (developed with a view to a causal explanation) and limiting concepts. They consequently could not be evaluated in terms of their reality-describing adequacy, only in terms of their instrumental efficacy. For Weber (1904), an ideal type was not a representation of the real; rather, it was formed by a one-sided accentuation of one or more points of view and by the connection of a quantity of diffuse, discrete, more or less present and occasionally absent, particular phenomena. Given the conceptual purity of an ideal type, it could never be empirically detected in reality; it was a utopian entity.

The typical or emblematic case suggested as a criterion for the construction of sample stands midway between the claim to have discovered the pure case (the quintessence of the phenomenon studied) and renunciation of the empirical search for cases of interest because of their typicality.

At the end of the 1980s, in a study on the interview, I documented the rituals and rhetorical strategies used by an interviewer as he made telephone calls to 10 adolescents in order to arrange subsequent face-to-face interviews (Gobo, 1990, 2001). The research involved the recording of the telephone calls and subsequent discourse analysis. Some years later, Maynard and Schaeffer (1999) conducted very similar research in the United States. Comparison between the results of the two research studies showed that the three researchers had discovered almost identical patterns of behavior. The reason for this similarity was probably that the survey interviewers had been trained with textbooks widely used on both sides of the Atlantic, and that they had used artifacts – technological (telephone, keyboard), cognitive (questionnaires), and organizational (scripts or interview formats) – which made the social activities very similar.

There are consequently numerous social research settings in which a few cases may suffice to make a generalization. Provided they are chosen carefully.

NOTES

1 To be stressed is that the distinction between probability and non-probability does not mark the boundary between qualitative and quantitative research: in fact, non-probability samples are also used for surveys (quota, telephone, and so on) and for experiments.

2 This compromise centered on the idea of complementarity is still accepted by numerous methodologists: see for instance Payne and Williams (2005: 297).

3 Indeed, there are some who maintain that generalizability is perhaps the wrong word for what qualitative researchers seek to achieve: ‘Generalization is (…) [a] word (…) that should be reserved for surveys only’ (Alasuutari, 1995: 156–7).

4 However, Denzin’s (1971) position was very different at the end of the 1960s: he expressed himself in favor of operationalization (‘this does not mean that operationalization is avoided – it merely suggests that the point of operationalization is delayed until the situated meaning of concepts is discovered,’ p. 268); he believed that the use of indicators was important (‘a series of empirical indicators relevant to each data base and hypothesis must be constructed, and, last, research must progress in a formative manner in which hypotheses and data continually interrelate,’ p. 269), and he argued that ‘it is necessary for researchers to demonstrate the representativeness of those units in the total population of similar events’ (p. 269).

5 Gomm, Hammersley and Foster (2000: 112, endnote 2) acutely point out: ‘there is some ambiguity in Stake’s position. He also recognizes that case studies can be instrumental rather than intrinsic, and in an outline of the ‘major conceptual responsibilities’ of case study inquiry he lists the final one as
‘developing assertions or generalizations about the case (Stake, 1994, 244).’

6 For this reason, apparently too severe and without empirical justification is Payne and Williams’ statement that: ‘opportunistic site selection will normally be incompatible with even moderatum generalization’ (2005: 310).

7 As Strauss and Corbin (1990: 176) explain: ‘open sampling is associated with open coding. Openness rather than specificity guides the sampling choices.’ Open sampling can be performed purposively (e.g. pp. 183–4) or systematically (e.g. p. 184), or it occurs fortuitously (e.g. pp. 182–3). It includes on-site sampling.

REFERENCES

Alasuutari, Pertti 1995 Researching Culture, London: Sage.
Alberoni, Francesco et al. 1967 L’attivista di partito, Bologna: Il Mulino.
Bailey, Kenneth D. 1978 Methods in Social Research, New York: Free Press.
Barton, Allen H. and Lazarsfeld, Paul F. 1955 Some functions of qualitative analysis in social research, Frankfurter Beiträge zur Soziologie, 1: 321–361.
Becker, Howard 1953 Becoming a Marijuana Smoker, American Journal of Sociology, 59: 235–242.
Becker, Howard 1998 Tricks of the Trade, Chicago and London: University of Chicago Press.
Becker, Howard 2000 Italo Calvino as Urbanologist, paper.
Bourdieu, Pierre et al. 1993 La Misère du monde, Paris: Editions du Seuil, transl. The Weight of the World: Social Suffering in Contemporary Society, Cambridge: Polity, 1999.
Burgess, Ernest W. 1927 Statistics and case studies as methods of sociological research, Sociology and Social Research, 12: 103–120.
Burrell, Gibson and Morgan, Gareth 1979 Sociological Paradigms and Organizational Analysis, London: Heinemann.
Capecchi, Vittorio 1972 Struttura e tecniche della ricerca, in Pietro Rossi (ed.), Ricerca sociologica e ruolo del sociologo, Bologna: Il Mulino.
Chein, Isidor 1963 An introduction to sampling, in C. Selltiz and M. Jahoda (eds.), Research Methods in Social Relations, New York: Holt & Rinehart, pp. 509–45.
Cicourel, Aaron V. 1968 The Social Organization of Juvenile Justice, New York: Wiley.
Cicourel, Aaron V. 1996 Ecological Validity and White Room Effects, Pragmatics and Cognition, 4(2): 221–263.
Cicourel, Aaron V. and Boese, R. 1972 Sign language acquisition and the teaching of deaf children, in Courtney B. Cazden, Vera P. John, and Dell Hymes (eds.), Functions of Language in the Classroom, New York: Teachers College Press.
Connolly, Paul 1998 ‘Dancing to the wrong tune’: Ethnography, generalization, and research on racism in schools, in P. Connolly and B. Troyna (eds.), Researching Racism in Education, Buckingham: Open University Press, pp. 122–39.
Converse, Jean M. and Schuman, Howard 1974 Conversations at Random: Survey Research as Interviewers See It, New York: Wiley.
Corsaro, William A. 1985 Friendship and Peer Culture in the Early Years, Norwood, NJ: Ablex Publishing Corporation.
Crapanzano, Vincent 1980 Tuhami. Portrait of a Moroccan, Chicago: University of Chicago Press.
Cronbach, Lee J. 1975 Beyond the two disciplines of scientific psychology, American Psychologist, 30: 116–27.
Cronbach, Lee J. 1982 Designing Evaluations of Educational and Social Programs, San Francisco: Jossey-Bass.
Dalton, Melville 1959 Men Who Manage, New York: Wiley.
De Martino, Ernesto 1961 La terra del rimorso, Milano: Il Saggiatore, transl. The Land of Remorse: A Study of Southern Italian Tarantism, London: Free Association Books, 2005.
Denzin, Norman K. 1971 Symbolic interactionism and ethnomethodology, in J.D. Douglas (ed.), Understanding Everyday Life, London: Routledge and Kegan Paul, pp. 259–284.
Denzin, Norman K. 1983 Interpretive interactionism, in G. Morgan (ed.), Beyond Method: Strategy for Social Research, Beverly Hills, CA: Sage, pp. 129–46.
Durkheim, Emile 1912 Les formes élémentaires de la vie religieuse, Paris: Alcan, transl. The Elementary Forms of the Religious Life, London: G. Allen & Unwin, 1915.
Festinger, Leon, Riecken, Henry W. and Schachter, Stanley 1956 When Prophecy Fails, New York: Harper Torchbooks.
Fielding, Nigel 1981 The National Front, London: Routledge.
Galtung, Johan 1967 Theory and Methods of Social Research, Oslo: Universitetsforlaget.
Garfinkel, Harold 1967 Studies in Ethnomethodology, Englewood Cliffs, NJ: Prentice Hall.
Geertz, Clifford 1972 Deep play: notes on the Balinese Cockfight, Daedalus, 101: 1–37.
Geertz, Clifford 1973 The Interpretation of Cultures, New York: Basic Books.
Glaser, Barney G. and Strauss, Anselm L. 1967 The Discovery of Grounded Theory, Chicago: Aldine.
Gobo, Giampietro 1990 The First Call: Rituals and Rhetorical Strategies in the First Telephone Call with Italian Respondents, paper, Annual Meeting of the A.S.A., Washington D.C., August 11–15.
Gobo, Giampietro 2001 Best practices: rituals and rhetorical strategies in the ‘initial telephone contact,’ Forum Qualitative Social Research, 2(1), http://www.qualitative-research.net/fqs-texte/1-01/1-01gobo-e.htm.
Gobo, Giampietro 2004 Sampling, representativeness and generalizability, in Seale C., Gobo G., Gubrium J.F., Silverman D. (eds.), Qualitative Research Practice, London: Sage, pp. 435–56.
Goetz, J.P. and LeCompte, Margaret D. 1984 Ethnography and Qualitative Design in Education Research, Orlando, FL: Academic Press.
Goffman, Erving 1961 Asylums, New York: Doubleday.
Goffman, Erving 1967 Interaction Ritual, New York: Doubleday Anchor.
Goldthorpe, John H., Lockwood, David, Bechhofer, Frank and Platt, Jennifer 1968 The Affluent Worker: Industrial Attitudes and Behaviour, Cambridge: Cambridge University Press.
Gomm, Roger, Hammersley, Martyn and Foster, Peter (eds.) (2000) Case Study Method, London: Sage.
Goode, William and Hatt, Paul K. 1952 Methods in Social Research, New York: McGraw-Hill.
Gouldner, Alvin G. 1954 Patterns of Industrial Bureaucracy, New York: The Free Press.
Griaule, Marcel 1948 Dieu d’eau: entretiens avec Ogotemmêli, Paris: Éditions du Chêne.
Groves, Robert M. and Lyberg, Lars E. 1988 An overview of nonresponse issues in telephone surveys, in R.M. Groves, P.P. Biemer, L.E. Lyberg, J.T. Massey, W.L. Nicholls II and J. Waksberg (eds.), Telephone Survey Methodology, New York: Wiley.
Guba, Egon G. 1981 Criteria for assessing the trustworthiness of naturalistic enquiries, Educational Communication and Technology Journal, 2(29): 75–92.
Guba, Egon G. and Lincoln, Yvonna S. 1981 Effective Evaluation: Improving the Usefulness of Evaluation Results Through Responsive and Naturalistic Approaches, San Francisco: Jossey-Bass.
Guba, Egon G. and Lincoln, Yvonna S. 1982 Epistemological and methodological bases of naturalistic inquiry, Educational Communication and Technology Journal, 30: 233–252.
Hammersley, Martyn 1992 What’s Wrong with Ethnography?, London: Routledge.
Hebdige, Dick 1979 Subculture: The Meaning of Style, London and New York: Routledge.
Kahneman, D. and Tversky, A. 1972 Subjective probability: A judgment of representativeness, Cognitive Psychology, 3: 430–454.
Lieberson, Stanley 1992 Small N’s and Big Conclusions: An examination of the Reasoning in Comparative Studies Based on Small Number of Cases, reprinted in R. Gomm, M. Hammersley and P. Foster (eds.) (2000), op. cit.
Lincoln, Yvonna S. and Guba, Egon G. 1979 Naturalistic Inquiry, Beverly Hills, CA: Sage. (Reprinted partially in Gomm, Roger, Hammersley, Martyn and Foster, Peter (eds.) 2000 Case Study Method, London: Sage, pp. 27–42.)
Mason, Jennifer 1996 Qualitative Researching, Newbury Park: Sage.
Maynard, Douglas W. and Schaeffer, Nora Cate 1999 Keeping the gate, Sociological Methods & Research, 1: 34–79.
Mitchell, Clyde J. 1983 Case and situation analysis, Sociological Review, 31: 187–211.
Mitchell, R. and Karttunen, S. 1991 Perché e come definire un artista?, Rassegna Italiana di Sociologia, XXXII(3): 349–64.
Pawson, Ray and Tilley, Nick 1997 Realistic Evaluation, London: Sage.
Payne, Geoff and Williams, Malcolm 2005 Generalization in qualitative research, Sociology, 39(2): 295–314.
Peräkylä, Anssi 1997 Reliability and validity in research based upon transcripts, in David Silverman (ed.), Qualitative Research, London: Sage, pp. 201–19.
Pollner, Melvin and McDonald-Wikler, Lynn 1985 The social construction of unreality: a case study of a family’s attribution of competence to a severely retarded child, Family Process, 24: 241–254.
Ragin, Charles C. and Becker, Howard S. (eds.) 1992 What is a Case? Cambridge: Cambridge University Press.
Rosenhan, David L. 1973 On being sane in insane places, Science, 179: 250–8.
Rositi, Franco 1993 Strutture di senso e strutture di dati, Rassegna Italiana di Sociologia, 2: 177–200.
Sacks, Harvey 1992 Lectures on Conversation, Oxford: Blackwell.
Schatzman, Leonard and Strauss, Anselm L. 1973 Field Research, Englewood Cliffs, NJ: Prentice Hall.
Schofield, Janet Ward 1990 Increasing the generalizability of qualitative research, in E.W. Eisner and A. Peshkin (eds.), Qualitative Inquiry in Education: The Continuing Debate, New York: Teachers College Press, pp. 201–232.
Silverman, David 2000 Doing Qualitative Research, London: Sage.
Stake, Robert 1978 The case study method in social enquiry, Educational Researcher, 7: 5–8 (Reprinted in Gomm, Roger, Hammersley, Martyn and Foster, Peter (eds.) 2000 Case Study Method, London: Sage, pp. 19–26).
Strauss, Anselm 1987 Qualitative Analysis for Social Scientists, Cambridge: Cambridge University Press.
Strauss, Anselm and Corbin, Juliet 1990 Basics of Qualitative Research, London: Sage.
Tversky, Amos and Kahneman, Daniel 1974 Judgment under uncertainty: Heuristics and biases, Science, 185: 1123–1131.
Weber, Max 1904 Die ‘Objektivität’ sozialwissenschaftlicher und sozialpolitischer Erkenntnis, Archiv für Sozialwissenschaft und Sozialpolitik, XIX: 22–87, transl. On the methodology of the social sciences, Illinois: The Free Press of Glencoe, 1949.
Williams, Malcolm 2000 Interpretativism and generalization, Sociology, 34(2): 209–24.
Xing Xu, Zhonghe Zhou, Xiaolin Wang, Xuewen Kuang, Fucheng Zhang and Xiangke Du 2003 Four-winged dinosaurs from China, Nature, 421: 335–339.
Yin, Robert K. 1984 Case Study Research, Thousand Oaks: Sage.
Znaniecki, Florian 1934 The Method of Sociology, New York: Farrar & Rinehart.
13
Case Study in Social Research
Linda Mabry
A case study is the empirical investigation of a specified or bounded phenomenon (Smith, 1978). The focus of study – the case – may be as minutely targeted as a single person, such as a clinical case of a patient’s response to medical treatment or an investigation of whether the educational resources provided to a student eligible for special services meet legal requirements. More commonly, case study research in social science concentrates on instances of greater complexity, such as a community’s approach to addressing a prevailing societal issue, a program’s effectiveness, or a policy’s implications. The case may be selected because of the researcher’s interest in a particular instance or site or because of the case’s capacity to be informative about a theory, an issue, or a larger constellation of cases.

This chapter will discuss case study as a research approach, its contribution to understanding social phenomena, its methodology, and related issues. Discussion will be illustrated with examples from social science, especially the field of education.

UNDERSTANDING CASES AND CASE STUDY

The raison d’être of case study is deep understanding of particular instances of phenomena. This overriding goal drives all practical decisions in conducting a case study: which site or sites might prove most revealing, which questions or issues might usefully guide investigation, which data collection methods might be helpful, which participants might be informative, which analyses might be revealing, which reporting style might be most accessible and compelling to interested audiences. Deep understanding is not easily achieved. Take, for example, a single-subject case of interest to each of us, the effort to understand oneself. Such self-study, if it may be called that, often takes a lifetime and, clearly, can go awry, as is evident from our encounters with people who, despite their unlimited access to data, appear to have either unjustifiably humble or inflated views of their own capacities or importance. Case study in social science
involves careful methodology to avoid such error.

Case study researchers in social science commonly scrutinize not only the demographics and other statistics of a case, such as how many persons are involved or affected and how indicators of impact vary over time, but even more closely the experiences and perceptions of participants. Understanding a case almost always requires going beyond countable aspects and trends. Inquiry into the social phenomenon of homelessness, for example, may benefit from counting the number of persons dispossessed, comparing the current with a past census, and identifying the homeless by age group, gender, and location. But this is not enough for deep understanding. Grasping why people live on the streets and such things as whether sufficient resources are available to support any who might choose not to do so, whether there are cross-generational effects, which policies and social structures tend to push people into homelessness and which tend to protect them from it will significantly improve understanding. What do the homeless think their opportunities and barriers are? Do social workers and law enforcement officers agree with them? What do policy-makers think the homeless need, and what do they think their constituents or budgets will support? Because the social reality of homelessness is co-constructed by people who participate in the phenomenon, their experiences, beliefs, and values must be studied in order to understand the phenomenon of homelessness in any place that it occurs, the political and ideological contexts that sustain it, and the capacity of participants to imagine or accept potential solutions.

Historical and epistemological antecedents

Because social reality is created by people and because it is complex, dynamic, and context-dependent, its study required the development of a highly nuanced research approach. Eighteenth-century views of natural science and social science were contrasted by Kant (1781) as the study of phenomena or things-as-they-appear and the measurement of things-as-they-are or noumena. As human beings were distinguished on the basis of their sense-making proclivities, perception emerged as an object of social science – how things appear to a participant in the scene (e.g. the homeless person, the manager of a soup kitchen, the police chief) and how they appear to an observer (e.g. the social scientist). To the extent that case study researchers work to document human perception and experiences, consciously using their own perceptions in the process, they engage in phenomenology.

The clashing motifs of natural science (sometimes referred to as hard science or quantitative or experimental research) and social science (contrastingly referred to as soft science or qualitative, interpretive, or hermeneutic research) may present themselves to case study researchers as a choice or may be resolved in mixed-methods inquiry. Resolution once seemed unlikely, and some still deem the two research paradigms incommensurable (Kuhn, 1962; Lincoln & Guba, 1985). Where case study researchers in social science choose between the two, their methodological choice is typically qualitative.

In this methodological distinction, differentiation has evolved over time. During the century after Kant, the Vienna School moved from positivism’s insistence on the measurability of an objective reality (Comte, 1822) and the notion that the truth of a statement depends upon its being in a one-to-one correspondence with an objective reality (David, 2005) to logical positivism’s less demanding requirement of the verifiability of real entities (Popper, 1935). Across the scientific aisle, simultaneously, Nietzsche (1882), urging subjective judgment, observed, ‘We behold all things through the human head’, and Dilthey (1883) advanced a general theory of understanding, Verstehen, whose research imperative involved subjective meaning-making, Geisteswissenschaften. From there, the Chicago School developed an urban sociology in the 1920s–30s employing ethnographic methods.
At the end of the next century, Erickson, describing qualitative methodology, was still encouraging researchers to ‘put mind back in the picture’ (1986, p. 127, italics original).

Qualitative or interpretivist study implies the constructivist theory that all knowledge is personally constructed (Piaget, 1955; see also Glassman, 2001; and Phillips, 1995). Personal experience, including the vicarious experience promoted in interpretivist case studies, provides the building blocks for the knowledge base constructed by each individual. In articulating a resonant interpretivist research methodology, Lincoln and Guba (1985) promoted an ontology of truth and a subjectivist epistemology in which meaning is personally or socially constructed (see Vygotsky, 1978). Similarly, hermeneutic methodology is marked by search for the meanings people attribute to phenomena (Guba and Lincoln, 1994; see also Guba and Lincoln, 2005; Schutz, 1967; Schwandt, 1994).

Interpretivist distinctions

Three additional contrasts help to distinguish interpretivist case study in the social sciences, with the caution that these characteristics are better understood as complementary than as conflicting with quantitative methodology. First, while large-scale quantitative studies sample from broad populations and produce grand generalizations, case studies provide deep understanding about specific instances. The contrast is one of breadth and depth, both needed for understanding complex social phenomena. For example, it may be helpful to know the correlations between exposure to the sun and the incidence of melanoma, treatment options, and survival rates; it may also be helpful to know how some patients deal with the effects of treatment and whether their personal strategies aided recovery and quality of life. For disaster planning, it may be helpful to know which community services are most frequently accessed during emergencies; it may also be helpful to know how a critical service agency mobilizes and rations access.

Second, the search for broad applicability of findings in quantitative research drives random sampling from large populations and data collection using standardized procedures. The quality of large-scale quantitative research depends largely on careful adherence to a prescriptive research design. In contrast to the preordinate design of quantitative studies, qualitative case studies employ emergent design. Rather than carefully adhering to a design specified at the outset, when relatively little is known about a case, a qualitative case researcher is expected to improve on the original blueprint as information emerges during data collection. For example, if unexpected sources of data become apparent or if unanticipated aspects of the case come to light, the researcher is expected to capitalize on the new opportunities and progressively focus the study on the features of the case which gradually appear to be most significant.

Finally, while large-scale quantitative studies reduce data to numbers for aggregation and statistical analysis, interpretivist case studies tend to expand datasets as new sources are discovered and questions articulated. The contrast is between the reductionism of quantitative studies and the expansionism of interpretivist studies. Reductionism allows quantitative researchers to utilize statistical analysis procedures; expansionism allows interpretivist case study researchers fuller access to a case’s contexts, conditionalities, and meanings.

Contributions to knowledge and understanding

Such characteristics position case studies to contribute substantively to social science by offering intense focus on cases of interest, their contexts, and their complexity.

Selection of cases

Cases abound – micro-lending, public transportation, the westernization of indigenous cultures, consolidation of rural high schools, access of the uninsured to hospitals, a social worker’s case load, an immigrant child’s struggle to learn. The identification of a
case to be studied will largely depend on the researcher’s interest, his or her industry in identifying a case informative enough to be worth studying, and his or her skill in negotiating access to its site.

Where a case is thought to be representative of a larger population, a typical case study may be useful for identifying and documenting patterns of ordinary events, the social and political structures that sustain them, and the underlying perceptions and values of participants (e.g. Fine, 1991; Stake et al., 1991; Tobin et al., 1989). Through studies of typical cases, the status quo of a phenomenon can be revealed and understood.

Atypical cases can be especially enlightening about the conditionalities of a phenomenon, promoting not only understanding but also theory refinement. Often, cases which defy expectations, conflict with the ordinary, illustrate contrasting approaches, or suggest alternatives or possibilities for change prove most illuminating. Recognition of the uniqueness of each case positions case study researchers to appreciate the particularities of outliers, to attend to the negative, discrepant, or deviant cases and to the disconfirming data other researchers often dismiss as trivial or ‘noise’. Studies of exceptional cases often challenge and assist theorizers to account for enigmatic counterexamples at the margins of generalized explanations, offering invaluable opportunities to improve abstracted representations of social phenomena.

For example, in the field of education, a study of drop-outs which documents their resistance to numbing curricula and dehumanizing conformities as well as efforts to expel them can focus attention on the discrepancy between policy goals and actual practices (Fine, 1991) in a way that study of typically performing students cannot. A study that shows a well-meaning teacher to be, in effect, the unwitting cultural enemy of Native American children attending a residential boarding school (Wolcott, 1987) can raise new sensibilities about the role of teachers in diverse communities. Single-subject case studies of students (Mabry, 1991; Spindler, 1997; Wolcott, 1994) can identify pressure points in educational delivery to those students who struggle academically, psycho-emotionally, or socio-economically, and often invisibly.

When more than one instance is to be studied, the scope of the inquiry may include contrasting cases. Contexts, circumstances, and their effects on each case may provide a fuller picture of the larger phenomenon as different cases feature different aspects of interest. For example, cases exhibiting different degrees of success in implementing a statewide model of inclusion for disabled children in regular classrooms can clarify factors that support or hinder local efforts (Peck et al., 1993). Cases from different countries can surface a variety of approaches to early childhood education and generate useful questions regarding teacher-pupil ratios, locus of authority for social norms, and societal support for young children and their families (Tobin et al., 1989).

Complexity and contextuality of cases

Case study exhibits a profound respect for the complexity of social phenomena. Interpretivist methodology encourages the case study researcher to be alert to patterns of activities and the variety of meanings participants ascribe to their experiences. While portrayals sensitive to myriad details and factors may include quantitative data, in general, reducing experiences and perspectives to numbers representing a few preselected dimensions involves too great a loss of meaning for quantitative methods alone to satisfy the expectations of case study.

Contextuality is an aspect of the dynamism and complexity of a case. Case study researchers recognize that cases are shaped by their many contexts – historical, social, political, ideological, organizational, cultural, linguistic, philosophical, and so on. Relationships between contexts and cases (and among contexts) are interdependent and reciprocal. For example, the operations of a social service agency may reduce the neediness of its clients and generate public support, while client neediness and available funding affect how the agency operates.
METHODOLOGICAL APPROACHES AND ISSUES

‘What is happening here, specifically? What do these happenings mean to the people engaged in them?’ – responding to such key questions, qualitative case study addresses a ‘need for specific understanding through documentation of concrete details’ (Erickson, 1986, p. 124, italics original). Detailed data can reveal ‘the invisibility of everyday life’ (p. 121) and exotic othernesses, layers of lived experience, and their implications.

Taking a phenomenological approach (Barritt et al., 1985; Kant, 1781; Schutz, 1967), case studies are generally naturalistic (Lincoln & Guba, 1985), sited in natural settings as undisturbed by the researcher as possible. Interest in cultural contexts typically leads to ‘thick description’ (Geertz, 1973), the recording and analyzing of experiences and meaning-making in detail. Thick descriptions provide understanding of social realities as they are subjectively perceived, experienced, and created by participants. Some case studies are radical or postmodern, revealing power structures, imbalances, and their effects – for example, in critical ethnography (Anderson, 1989).

Methods and trustworthiness

Consistent with constructivist understanding of the slipperiness of human conceptions of reality, interpretivist methods may not seek to resolve social ambiguities into nomothetic findings but may problematize them: to whom is this real or true? According to which notions of reality? An attitude of openness about truth or reality pushes toward depth of understanding, propelling investigation to a profound level.

As noted, case study in social science generally involves qualitative or mixed methods (Chatterji, 2005; Datta, 1997; Greene et al., 1989; Johnson & Onwuegbuzie, 2004; Mertens, 2005). As described in the literature of educational research by Denzin (1989, 1997), Denzin and Lincoln and their colleagues (1994, 2005), Eisner (1991), Erickson (1986), LeCompte and Preissle (1993), Lincoln and Guba (1985), Stake (1978, 2005), and Wolcott (1994, 1995), qualitative methods prominently feature three data collection techniques: observation, interview, and the review and analysis of site-generated or -related documents. These methods have been accepted as legitimate even in program evaluation (e.g. House, 1994; Mabry, 1998, 2003; Shadish et al., 1991; Worthen et al., 1997).

Direct observation and semi-structured interviews, which allow probative follow-up questions and exploration of topics unanticipated by the interviewer, facilitate development of subtle understanding of what happens in the case and why. These techniques facilitate rigorous penetration of the unknowns and depend on the researcher to recognize the importance of new input, to generate pertinent questions, and to maintain curiosity rather than jumping to interpretation – that is, on intuitiveness and on a methodological commitment to emergent design. Rather than searching for data to confirm or disconfirm an a priori theory or hypothesis, interpretivist case study researchers are expected to notice opportunities and to follow data wherever they lead.

The openness and judgment-intensivity which necessitates subjective interpretation in data collection carries into data analysis. Two approaches, each with many variations in different qualitative genres, exemplify this point. Intended to produce grounded theory, constant-comparative method involves continuous comparison of incoming data with emerging interpretation (Glaser & Strauss, 1967; Strauss & Corbin, 1990), new data igniting new realizations and new interpretive possibilities provoking more sensitive data collection. Case study, however, rarely produces grand grounded theory, seeking instead local theory or petite generalizations (Erickson, 1986). Second, thematic analysis involves the identification of emerging patterns and categories from iterative reviews of the dataset, a process which marshals evidence for developing and warranting findings.

The inherent subjectivity of these methods leaves researchers susceptible to challenge regarding validity by those who equate subjectivity more with bias than with sensitivity. Case study researchers generally employ ethnographic techniques which require more intimate (proximal) contact with research subjects than, for example, survey researchers. The issues of validity, generalizability, and proximity call for care and will each be considered later in the text.

Narrative reporting

The development of reports, usually in the form of narrative accounts, is the final step in a long analytic process. Narrative reporting offers at least three important advantages: conveyance of deep meaning, reader accessibility, and opportunity for readers to recognize and consider researcher subjectivity. Narratives carry complex meanings which are comprehensible to readers, narrative portrayals building on natural ways of understanding which have evolved across human history (Carter, 1993), as the endurance of Homerian and other sagas attest.

Story-like representations of cases promote wide accessibility for general and scholarly audiences. Human capacity to grasp nuanced understandings from stories has been confirmed by cognitive psychologists who urge case-based approaches to understanding multi-faceted, ambiguous, ill-structured phenomena (Spiro et al., 1987) to help readers transition from grasping empirical findings to applying them.

First-person case narratives, using ‘I’ judiciously so as not to deflect focus from the case to the researcher, subtly and continuously remind the reader that the narrative is the product of the researcher’s mind. Detailed data presentation in narratives also invites readers to judge whether the data support the findings and to construct their own personal meanings. Readers’ analyses of the data presented may differ – usefully or uncomfortably – from those of the researcher. When experiential details in narratives allow readers to engage in analysis that extends to researcher interpretations, as exemplified by Freudian interpretations of Shakespeare three centuries after the Bard’s death, narratives reveal their special merits and contributions.

Experientiality

Learning from experience is important partly because human capacity to understand exceeds the capacity of language to convey meaning. A company president may choose to spend a day as an entry-level employee, not because of insufficient written information about procedures, profits, or personnel but because such an experience can tell something more. In attempting to encourage public support for famine victims, a journalist may report not only mortality rates but also the harrowing stories of some individuals that instantiate the human understanding of media consumers.

More than other types of empirical research reports, a case study tells the story of the case. Interpretivist case study researchers are expected to stimulate vicarious experience for readers, providing a sense of almost having been present to witness the events documented in case studies. Case studies foster deep understanding not only by presenting analytic details – Geertz’s (1973) ‘thick description’ – but also by offering experiential reports. Recognizing that, for purposes of understanding, experiential knowledge is often superior to declarative knowledge, interpretivist case study researchers attempt to promote their readers’ vicarious experience of the events described. By contrast, the declarative knowledge in statements of research findings may be less memorable and more easily dismissed.

Experiential portrayals enhance tacit knowledge (Polanyi, 1958), unspoken understandings or ‘gut feelings’ that may elude satisfactory expression in language yet be more influential for action. The power of tacit knowledge can be seen in a parent’s – but few others’ – ability to decode a teenager’s ‘Right, Mom’ which may signal approval, compliance, malingering, or sarcasm. Perhaps not even Mom, despite understanding the intent, could explain how to derive the meaning. The experientiality of
narratives intensifies the power of case study reports to deepen understanding because it promotes development of tacit knowledge.

There are trade-offs and drawbacks to case study as a research approach, including proximity, validity, and generalizability.

PROXIMITY

The psycho-socio-emotional distance between researchers and the cases establishes their proximity. Researchers are usually outsiders to the cases they study (but not always), observers rather than true members of a case. On a continuum of possible roles from external observer to participant-observer, a researcher’s stance may be as passive as that of the proverbial fly on the wall or more active, like a participant in the case. In ethnography, researchers are sometimes cautioned against ‘going native’. Contrastingly, in action research or self-study, the case researcher may be a native from the outset – a project manager or a classroom teacher conducting research for the purpose of translating deeper understanding into immediate improvements.

Externality

A case study researcher can promote understanding by collecting and organizing information, focusing attention on meaningful aspects, and providing an external analytic perspective that may be helpful even to insiders intimately familiar with the case. Although researchers choose cases partly out of personal interest, externality suggests an absence of vested interest, one source of bias, which can promote the credibility of findings.

On the other hand, externality implies limited lived experience of the case and the danger that case studies may fail to ‘get it right’ (Geertz, 1973; Wolcott, 1994). The contextuality of cases and the phenomenological impulse of case study research create special burdens for external researchers attempting to grasp and represent multiple insider perspectives and cases’ complexity. Familiarity with the ethos helps outsiders attempt etic¹ representations of insiders’ experiences and meanings, representations which should be accompanied by appropriate qualification.

Cultural competence

Cultures and subcultures develop singular histories and respond to overlapping contexts and unique personalities in highly nuanced ways. For external researchers, the cultural competence needed for grasping local meanings cannot be presumed. Even when external researchers share nationality and language with case participants, they may be unable to detect the subtle or hidden meanings suggested by a pause in conversation, the type of refreshments offered, who is present and who is absent in a gathering, the items found (or not) on a meeting agenda, who gives and who receives gifts, who makes decisions and how. Reliance on knowledgeable participants acting as key informants can help surface local meanings, although debriefings and other discussions for the purpose of cultural translation will inject key informants’ own meanings into datasets and introduce new cautions for interpretation.

Where the researcher does not share the language(s) or dialect(s) indigenous to the case, dependence on translators is unavoidable. The transfer of meaning from speaker to hearer, never assured, is further compromised by introducing this mediating influence. Language structures and idioms are so culture-specific and dynamic that, even with highly competent and motivated translators, inaccuracies are difficult to avoid. These limitations, too, should be acknowledged in case reports.

Ethics

The misunderstandings which externality can generate have an ethical component. Participants have a stake in the accuracy of how they are presented and in whether case accounts are flattering or damning. For example,
participants in Peshkin’s case study of a Christian fundamentalist school (1986) may have thought it not only inaccurate but also unfair for a Jewish researcher to compare them to Nazis. A teacher in a California school, after agreeing to participate in a study about education in a socially stable community, did feel betrayed by a researcher who instead analyzed his classroom’s lack of arts opportunities (Stake, 1991). These illustrations show how participants may suffer at the hands of an external other whom they have allowed into their communities.

Moreover, human subjects may sometimes forget that ever-present researchers are not their colleagues, neighbors, or friends. The close proximity, the access, the rapport case study researchers need in order to develop understanding can create special vulnerabilities for human subjects. Case study researchers are likely to learn a lot about participants, more even than participants may realize, and may anticipate some threats more quickly or clearly than participants might. The challenge here is to be appropriately alert and protective without lapsing into paternalism.

VALIDITY

Validity refers, essentially, to the accuracy of data and the reasonableness and warrantedness of data-based interpretations, to ‘the adequacy and appropriateness of inferences’ (Messick, 1989, p. 13, italics original). Whether the data and interpretations are infused with undue researcher bias, whether the methods of assuring trustworthiness are sufficient, whether limitations are fully explicated in reporting are critical validity considerations.

Important to all types of research, validity in interpretivist social science is complicated by subjectivity, so pervasive in interpretivist practice that some claim the researcher is the method. While subjectivity figures in all research, it is more obvious in qualitative than in quantitative methodology. The issue is complicated by interpretivist acknowledgment that social phenomena are perceived differently not only by different participants but also by different researchers, preempting confirmatory replication of studies. Whether judgment-intensivity is a challenge for validity, an opportunity to seek information on a search-as-needed basis, or an insurmountable obstacle is a matter of paradigmatic controversy. The researcher’s perspective has been described as more virtue than limitation (Peshkin, 1988), and it is researcher interest that compels the determined efforts resulting in deep understanding.

Although an interpretivist approach to case study assumes that each reader and each researcher will construct unique personal understandings of a case, substantial intersubjective agreement is desirable and expected – working agreement rather than absolute consensus on every point. Discussion of various interpretations about a case helps to maintain openness to meaning-making and to sustain the disequilibrium that presses toward ever deeper thinking and understanding.

Thus, while accepting the malleability of truth and the inherence of judgment in perception and interpretation, interpretivists do not approve unbridled subjectivity or absolute relativism. Typologies for understanding and encouraging validity in interpretive research include Lincoln and Guba’s (1985) explication of trustworthiness and Maxwell’s (1992) argument for descriptive, interpretive, theoretical, and evaluative validity – for accurate descriptive data, sufficient support for interpretations, and empirical justification of emergent theories and evaluative judgments.

Validation and triangulation

The validity of a case study would be suspect if participating human subjects rejected the report as completely false or if independent researchers familiar with the case or site did so. Consequently, case study and other interpretivist methods commonly include triangulation and validation (see especially Denzin, 1989) and articulate in reports their efforts to enhance validity. Although validity
and credibility are separate properties, these efforts tend to improve both.

Triangulation

During data collection, triangulation² by data source involves collecting data from different persons or entities. Checking the degree to which each source confirms, elaborates, and disconfirms information from other sources honors case complexity and the perspectives among participants and helps ascertain the accuracy of each datum. Methodological triangulation involves checking data collected via one method with data collected using another, for example, checking whether direct observation can confirm interview testimony. Triangulation by time involves repeated return to the site to track patterns of events and their trends and permutations. Because different observers might see different things or might interpret the same things differently, triangulation by observer can help expand meaning-making, balance interpretations, and guard against undue researcher subjectivity.

Theoretical triangulation in data analysis involves recourse to different abstractions that might explain the data. Various theories, models, typologies, and categorization systems may suggest different meanings. For example, analysis of the productivity of a working unit according to an economic model may suggest an interpretation quite different from one suggested by analysis of the unit’s procedures according to a model of democratic decision-making. Similarly, an analysis of classroom discussion regarding the degree to which the teacher’s questioning prompts student knowledge gains may yield quite different results from an analysis regarding the degree to which ethnically diverse students participate meaningfully.

Validation

The notion that accounts may be ‘made better by good readers and clearer by good opponents’ (Nietzsche, 1879) underlies processes of validation in interpretivist social science. Research subjects can help assure the accuracy of data by member-checking, a procedure in which groups representing those observed and interviewed are asked to confirm, elaborate, and disconfirm write-ups (Lincoln & Guba, 1985). In comprehensive validation, a more thorough approach, each human subject reviews data collected from his or her own interviews or observations prior to further dissemination (Mabry, 1998). Research subjects may be asked to validate interpretations as well as data.

In addition to triangulation and validation, peer review by critical friends, especially colleagues with expertise in the phenomenon or case or methodology, can provide a check on the sufficiency of the evidence, the logic of arguments, overall clarity and experientiality.

GENERALIZABILITY

Generalizability refers to the capacity of the case to be informative about a general phenomenon, to be broadly applicable beyond the specific site, population, time, and circumstances studied. The understanding that a single case studied in depth can offer is different from the generalizable explanation, often via theory or model, more easily provided by large-scale study (von Wright, 1971).

In quantitative research, generalizability, often referred to as external validity, drives design: hypothesis-testing to support a generalizable theory, random sampling to assure representativeness of a larger population, team training for reliability or consistency in administration of data collection instruments to allow aggregability. While researchers schooled in the quantitative tradition have considered case studies problematic in their determined focus on single cases (e.g. Campbell & Stanley, 1963), case study researchers have made different types of arguments regarding generalizing their work.

Acceptable interpretivist generalization

The case-to-population generalizations (Firestone, 1993) important in quantitative
research may be more available from multi-case studies than from single cases, for example, a series of individual lesson studies (Lewis et al., 2006) or scaling-up studies (McDonald et al., 2006) which pay specific attention to the particulars affecting wider application.

More appropriate to single case studies than grand generalizations are petite generalizations (Erickson, 1986) which apply within the case but do not go beyond the strongest possible interpretations warranted by the data from the case. Examples from education are listed below:

• Henson Elementary School’s implementation of a peer mediation program, despite missteps in training, empowered students to regulate and improve their interpersonal behaviors (adapted from Mahoney, 1999).
• Offering schooling in English and in the mother tongue forced parents in the communities of the remote highlands of Papua New Guinea to choose between their children’s future economic prosperity and maintenance of their cultural identities (adapted from Malone, 1997).
• Local teachers’ classroom assessments were more informative about their students’ achievements but were sidelined by state tests (adapted from Mabry et al., 2003).

The temptation to generalize beyond a circumscribed case may be strong; for example, wouldn’t peer mediation likely empower other children to regulate and improve their interpersonal behaviors? Firestone’s (1993) concept of analytic generalizations, in which the ‘theory in question is embedded in a broader web of theories … [used] to link specific study findings to the theory of interest’ (p. 17) hints at the possibility of extension.

More common are case-to-case generalizations (Firestone, 1993), readers’ links between case reports and cases of personal interest. Case study reports which include substantial data, even some relatively uninterpreted observation narratives and interview excerpts, not only convey vicarious experience but also present the evidentiary base. These readerly or open texts invite readers to construct individual interpretations and empower them to generalize to cases of interest to them.

Purposive sampling

Development of deep understanding does indeed take time, so usually only a few cases can be selected even for multi-case studies. The basis for making selections of cases and human subjects is consequently purposeful or purposive, since random selection might easily fail to yield the most informative sites or samples of human subjects, skewing findings because of sampling bias. Cases and subjects may be selected for their representativeness of a larger population but are more likely to be chosen for their informativeness.

Exemplary cases, contrasting cases, deviant cases, or a range of cases illustrating different aspects of a phenomenon may be of interest. Selection may be based on reputation (e.g. the lowest-ranking school on California’s accountability index, a school renowned for its focus on the arts) or location (e.g. a classroom in a juvenile detention center, one school per geographic area) or demographics (e.g. a school with a minority student population, a women’s college) or other selection criteria. Such cases may not produce generalizable theory but are very capable of contributing to it.

Convenience sampling is inevitably a factor in any sampling strategy, reflecting subjects’ willingness to participate or to grant access to a site, and suggesting two caveats. First, for case study or any other research, restricted access can diminish representativeness where typicality is desired. Second, researchers have more often chosen to study, for example, the impoverished than the rich, low-level workers rather than CEOs. This tendency may be related to a greater openness among the have-nots as opposed to the haves. The result is that, across cases, these methods have historically exhibited socio-economic lopsidedness.

Role of theory

Theory can be produced in small-scale studies but, more often, such generalizations
REFERENCES

Anderson, G. A. (1989). Critical ethnography in education: Origins, current status, and new directions. Review of Educational Research, 59 (3), 249–270.
Barritt, L., Beekman, T., Bleeker, H. & Mulderij, K. (1985). Researching educational practice. University of North Dakota: Center for Teaching and Learning.
Campbell, D. T. & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Boston: Houghton-Mifflin.
Carter, K. (1993). The place of story in the study of teaching and teacher education. Educational Researcher, 22 (1), 5–12, 18.
Chatterji, M. (2005). Evidence on ‘what works’: An argument for extended-term mixed-method (ETMM) evaluation designs. Educational Researcher, 33 (9), 3–13.
Comte, A. (1822/1970). Plan des Travaux Scientifiques Nécessaires pour Réorganiser la Société. Paris: Editions Aubier-Montaigne.
Datta, L. (1997). Multimethod evaluations: Using case studies together with other methods. In E. Chelimsky & W. R. Shadish (Eds.), Evaluation for the 21st century: A handbook (pp. 344–359). Thousand Oaks, CA: Sage.
David, M. (2005). The correspondence theory of truth. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. Retrieved May 22, 2006 from http://plato.stanford.edu/archives/fall2005/entries/truth-correspondence/.
Denzin, N. K. (1989). The research act: A theoretical introduction to sociological methods (3rd ed.). Englewood Cliffs, NJ: Prentice Hall.
Denzin, N. K. (1997). Interpretive ethnography: Ethnographic practices for the 21st century. Thousand Oaks, CA: Sage.
Denzin, N. K. & Lincoln, Y. S. (1994). Handbook of qualitative research. Thousand Oaks, CA: Sage.
Denzin, N. K. & Lincoln, Y. S. (2005). Handbook of qualitative research (3rd ed.). Thousand Oaks, CA: Sage.
Dilthey, W. (1883/1976). Einleitung in die Geisteswissenschaften. In H. P. Richman (Ed.), W. Dilthey: Selected writings (pp. 157–263). London: Cambridge University Press.
Eisner, E. W. (1991). The enlightened eye: Qualitative inquiry and the enhancement of educational practice. New York: Macmillan.
Erickson, F. (1986). Qualitative methods in research on teaching. In M. C. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp. 119–161). New York: Macmillan.
Fine, M. (1991). Framing dropouts: Notes on the politics of an urban public high school. Albany, NY: SUNY Press.
Firestone, W. A. (1993). Alternative arguments for generalizing from data as applied to qualitative research. Educational Researcher, 22 (4), 16–23.
Geertz, C. (1973). The interpretation of cultures: Selected essays. New York: Basic Books.
Glaser, B. G. & Strauss, A. I. (1967). The discovery of grounded theory. Chicago, IL: Aldine.
Glassman, M. (2001). Dewey and Vygotsky: Society, experience, and inquiry in educational practice. Educational Researcher, 30 (4), 3–14.
Greene, J. C., Caracelli, V. & Graham, W. F. (1989). Toward a conceptual framework for multimethod evaluation designs. Educational Evaluation and Policy Analysis, 11 (3), 255–274.
Guba, E. G. & Lincoln, Y. S. (1994). Competing paradigms in qualitative research. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (pp. 105–117). Thousand Oaks, CA: Sage.
Guba, E. G. & Lincoln, Y. S. (2005). Paradigmatic controversies, contradictions, and emerging confluences. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (3rd ed., pp. 191–216). Thousand Oaks, CA: Sage.
House, E. R. (1994). Integrating the quantitative and qualitative. In C. S. Reichardt & S. F. Rallis (Eds.), The qualitative-quantitative debate: New perspectives (pp. 13–22). In W. R. Shadish (Ed.), New Directions for Program Evaluation (no. 61). San Francisco: Jossey-Bass.
Johnson, R. B. & Onwuegbuzie, A. J. (2004). Mixed methods research: A research paradigm whose time has come. Educational Researcher, 33 (7), 14–26.
Kant, I. (1781/1996). The critique of pure reason. (trans. by W. S. Pluhar & P. Kitcher). Indianapolis, IN: Hackett.
Kuhn, T. (1962). The structure of scientific revolutions. Princeton, NJ: Princeton University Press.
LeCompte, M. D. & Preissle, J. (1993). Ethnography and qualitative design in educational research (2nd ed.). San Diego: Academic Press.
Lewis, C., Perry, R. & Murata, A. (2006). How should research contribute to instructional improvement? The case of lesson study. Educational Researcher, 35 (3), 3–14.
Lincoln, Y. S. & Guba, E. G. (1985). Naturalistic inquiry. Newbury Park, CA: Sage.
Mabry, L. (1991). Nicole: Seeking attention. In D. B. Strother (Ed.), Learning to fail: Case studies of students at-risk (pp. 1–24). Bloomington, IN: Phi Delta Kappa.
result from large-scale research. Large-scale quantitative research often involves causal analysis for the purpose of prediction and control of future behaviour. For example, physicians may prescribe a drug to patients based on studies suggesting the drug alleviates their disease. Often, an experimental study begins with a hypothesis³ derived from theory and tested empirically, a deductive approach in which theory propels data collection.

The inverse of this approach is more common to case study where theory development (if any) is inductive, following data collection and explaining the dataset. Theory may emerge, perhaps unexpectedly, through constant-comparative method, a dialogic cycle of data collection and interpretation which is incomplete until interpretations encompass all available data (Glaser & Strauss, 1967; Strauss & Corbin, 1990), as noted earlier. Small-scale studies can also refine existing theory, for example, physicians prescribing a drug to all patients except those from a specific ethnic group which tends to react negatively as revealed by prior case studies.

Whether or not research generates theory, personal theory plays a role in all research. Case study researchers who claim their work is ‘merely descriptive’ or ‘atheoretical’ inappropriately deny the effects of their own conceptualizations of the phenomena they study. From the outset of a case study, formation of a research question indicates an underlying personal theory, perhaps implicit, about the nature of the phenomenon. Even word choice signals theory, for example, ‘active classroom’ suggesting a theory that learners are active constructors of their knowledge bases; or ‘chaotic’ suggesting a theory that knowledge is delivered rather than constructed and that learning is passive. As part of the effort to discipline their subjectivities, interpretivist researchers may try to articulate their personal theories, making them explicit for readers as they consider the validity of descriptions and findings.

While theory development is not usually expected in small-scale studies, the use of theory to analyze data may nevertheless be a highly productive part of the interpretive process. Theoretical triangulation, noted earlier, facilitates interpretation by offering views of the data through different explanatory lenses. Different potential interpretations suggested by different theories help the interpretivist case study researcher to think deeply about meaning.

CONCLUSION

With deep understanding of a case as the prime goal of case study, an attitude of openness may be the most fortuitous item in a case study researcher’s dispositional toolkit. There is always more that can be learned about a case, more potential interpretations of existing data, and new events that create alterations in the case. Premature conclusions can foreclose on deeper understanding. Curiosity to know more and to understand better encourages delving deeply into the meaning of the case. Link by link, case by case, construction of meaning by the researcher, by the reader, and by the research community is how case study contributes to social science and to society. As accumulated case studies refine understandings of social phenomena, accumulated practice of case study may continue to refine these methods, resulting in ever more careful and nuanced social science.

NOTES

1 In contrast to a phenomenological emic approach to research is an etic approach in which an outsider’s – rather than an insider’s – perspective is offered to readers (see Seymour-Smith, 1986).
2 Triangulation, a term derived from nautical procedures for locating ships at sea based on three points, does not presume three sources (or methods, observers, data collection events, or theoretical perspectives). More or fewer, as needed and as available, may be consulted.
3 Actually, experimental studies generally begin with null hypotheses, testing to see whether the inverse of the actual hypothesis can be proved false – thus providing indirect evidence that the actual hypothesis is true. Note that this approach is essentially a matter of ruling out rival hypotheses to narrow the range of possible explanations for a phenomenon.
Mabry, L. (1998). Case study methods. In H. J. Walberg & A. J. Reynolds (Eds.), Evaluation research for educational productivity (pp. 155–170). Greenwich, CT: JAI Press.
Mabry, L. (2003). In living color: Qualitative methods in educational evaluation. In T. Kellaghan & D. L. Stufflebeam (Eds.), International handbook of educational evaluation (pp. 167–185). Boston: Kluwer-Nijhoff.
Mabry, L., Poole, J., Redmond, L. & Schultz, A. (2003). Local impact of state-mandated testing. Education Policy Analysis Archives, 11(22). Available at: http://epaa.asu.edu/epaa/v11n22/
Mahoney, K. K. (1999). Peer mediation: An ethnographic investigation of an elementary school’s program. Unpublished doctoral dissertation. Indiana University, Bloomington, IN.
Malone, D. L. (1997). Namel manmeri: Language and culture maintenance and mother tongue education in the highlands of Papua New Guinea. Unpublished doctoral dissertation. Indiana University, Bloomington, IN.
Maxwell, J. A. (1992). Understanding and validity in qualitative research. Harvard Educational Review, 62 (3), 279–300.
McDonald, S.-K., Keesler, V. A., Kauffman, N. J. & Schneider, B. (2006). Scaling-up exemplary interventions. Educational Researcher, 35 (3), 15–32.
Mertens, D. (2005). Research and evaluation in education and psychology: Integrating diversity with quantitative, qualitative, and mixed methods (2nd ed.). Thousand Oaks, CA: Sage.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: American Council on Education, Macmillan.
Nietzsche, F. (1879/1996). Human, all too human. (trans. by R. J. Hollingdale). Cambridge, MA: Cambridge University Press.
Nietzsche, F. (1882/1974). The gay science. (trans. W. Kaufmann). London: Vintage Books.
Peck, C. A., Mabry, L., Curley, J. & Conn-Powers, M. (1993, May). Implementing integration at the preschool and kindergarten level: A follow-along study of Washington’s efforts. Washington Office of the Superintendent of Public Instruction and Early Childhood Development Association of Washington’s Infant and Early Childhood Conference, Seattle, WA.
Peshkin, A. (1986). God’s choice. Chicago: University of Chicago Press.
Peshkin, A. (1988). In search of subjectivity–one’s own. Educational Researcher, 17 (7), 17–22.
Phillips, D. C. (1995). The good, the bad, and the ugly: The many faces of constructivism. Educational Researcher, 24 (7), 5–12.
Piaget, J. (1955). The language and thought of the child. New York: World.
Polanyi, M. (1958). Personal knowledge: Towards a post-critical philosophy. Chicago, IL: University of Chicago Press.
Popper, K. R. (1935). Logik der Forschung. Vienna: Julius Springer Verlag.
Schutz, A. (1967). Collected papers I: The problem of social reality. The Hague: Martinus Nijhoff.
Schwandt, T. A. (1994). Constructivist, interpretivist approaches to human inquiry. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (pp. 118–137). Thousand Oaks, CA: Sage.
Seymour-Smith, C. (1986). Dictionary of anthropology. Boston: G. K. Hall.
Shadish, W. R., Jr., Cook, T. D. & Leviton, L. C. (1991). Foundations of program evaluation: Theories of practice. Newbury Park, CA: Sage.
Smith, L. (1978). An evolving logic of participant observation, educational ethnography and other case studies. In L. Shulman (Ed.), Review of research in education (vol. 6, pp. 316–377). Itasca, IL: Peacock.
Spindler, G. D. (1997). Beth Anne–A case study of culturally defined adjustment and teacher perceptions. In G. D. Spindler (Ed.), Education and cultural process: Anthropological approaches (3rd ed., pp. 246–261). Prospect Heights, IL: Waveland Press.
Spiro, R. J., Vispoel, W. P., Schmitz, J. G., Samarapungavan, A. & Boerger, A. E. (1987). Knowledge acquisition for application: Cognitive flexibility and transfer in complex content domains. In B. C. Britton (Ed.), Executive control processes (pp. 177–199). Hillsdale, NJ: Erlbaum.
Stake, R. E. (1978). The case study method in social inquiry. Educational Researcher, 7 (2), 5–8.
Stake, R. E. (2005). Qualitative case studies. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (3rd ed., pp. 443–466). Thousand Oaks, CA: Sage.
Stake, R., Bresler, L. & Mabry, L. (1991). Custom and cherishing: The arts in elementary schools. Urbana, IL: Council for Research in Music Education, University of Illinois.
Strauss, A. & Corbin, J. (1990). Basics of qualitative research: Grounded theory procedures and techniques. Newbury Park, CA: Sage.
Tobin, J. J., Wu, D. Y. H. & Davidson, D. H. (1989). Preschool in three cultures: Japan, China, and the United States. New Haven, CT: Yale University Press.
von Wright, G. H. (1971). Explanation and understanding. London: Routledge & Kegan Paul.
Vygotsky, L. S. (1978). Mind in society: The development of higher mental process. Cambridge, MA: Harvard University Press.
Wolcott, H. F. (1987). The teacher as an enemy. In G. D. Spindler (Ed.), Education and cultural process: Anthropological approaches (2nd ed., pp. 136–150). Prospect Heights, IL: Waveland Press.
Wolcott, H. F. (1994). Transforming qualitative data: Description, analysis, and interpretation. Thousand Oaks, CA: Sage.
Wolcott, H. F. (1995). The art of fieldwork. Walnut Creek, CA: AltaMira Press.
Worthen, B. R., Sanders, J. R. & Fitzpatrick, J. L. (1997). Program evaluation: Alternative approaches and practical guidelines (2nd ed.). New York: Longman.
14
Longitudinal and Panel Studies
Jane Elliott, Janet Holland and
Rachel Thomson
INTRODUCTION

Longitudinal social research offers unique insights into process, change and continuity over time in phenomena ranging from individuals, families and institutions to societies. It helps to map the social world temporally, enabling us to make sense of changes that take place between generations, within the life course and through history. Longitudinal data can broadly be understood as any information that tells us about what has happened to a set of research cases over a series of time points. The majority of longitudinal data take human subjects as the unit of analysis, and therefore longitudinal data commonly record change at an individual or ‘micro’ level (Ruspini, 2002). They can be contrasted with cross-sectional data, which record the circumstances of individuals (or other research units) at just one particular point in time. Different traditions of longitudinal research seek to combine analyses of quantity (of cases) and the quality (of changes) in different ways, producing different types of data, privileging particular forms of understanding, and pursuing different logics of enquiry.

Both qualitative and quantitative longitudinal research traditions are well established, with quantitative longitudinal research having offered a powerful input into government policy in many societies. We indicate here some of the established quantitative longitudinal studies with their complex, cumulative datasets, which have made, and are making, considerable contributions in these areas, and discuss developments of analysis. The focus is on panel studies and cohort studies where the same group of individuals are followed through time. Trend studies, which focus on change over time by using repeated cross-sectional samples (for example opinion polls which track changes in the popularity of political parties) are beyond the scope of this chapter. Qualitative longitudinal work has been the mainstay of some social science disciplines and subsets of sociology, including anthropology, oral history, community studies, education studies and criminology. It is currently gaining ground in the social sciences more generally, and is also becoming valued by policy-makers concerned with issues where questions about what happens are seen to need the unique
experiential and contextual elaboration of qualitative approaches. In this chapter we examine and review both qualitative and quantitative approaches to longitudinal social research. In some instances the problems and contributions are shared and similar. In others the specificities of quantitative and qualitative research raise particular issues, or create particular inflections on common issues. Commonalities can include overall research design (prospective, retrospective, cohort), and issues of attrition, archiving and ethics; differences include modes of data generation, type of data generated, methods of analysis and conceptualisation of the subject. Where issues are common they are merged in the chapter, with specific inflections indicated; where different they are discussed separately.

COLLECTING LONGITUDINAL DATA

Prospective and retrospective research designs

Longitudinal data are frequently collected using a prospective longitudinal research design, i.e. the participants in a study are contacted by researchers and asked to provide information about themselves and their circumstances on a number of different occasions. This is often referred to as a panel study. It is not necessary, however, to use a longitudinal research design in order to collect longitudinal data and a conceptual distinction between longitudinal data and longitudinal research should be maintained (Featherman, 1980; Scott and Alwin, 1998; Taris, 2000). Indeed, the one-off retrospective collection of longitudinal data is very common in both qualitative and quantitative research traditions. In quantitative approaches it has become an established method for obtaining basic information about the dates of key life course events such as marriages, separations and divorces and the birth of any children (i.e. event history data). This is clearly an efficient way of collecting longitudinal data and obviates the need to re-contact the same group of individuals over a period of time. In qualitative approaches, such as life history research, individuals may be asked to report events spanning a lifetime.

A potential problem in quantitative research is that people may not remember the past accurately enough to provide good quality data. While some authors have argued that recall is not a major problem for collecting information about dates of significant events, other research suggests that individuals may have difficulty remembering dates accurately or may prefer not to remember unfavourable episodes or events in their lives (Dex, 1995; Dex and McCulloch, 1998; Jacobs, 2002; Mott, 2002). The techniques for helping respondents to remember accurately the dates of events of interest to the researcher can be similar to those used by some qualitative researchers. It is by linking together experiences across different life domains that it becomes easier to remember exactly when specific events took place. Qualitative researchers are generally interested in the meaning of events for participants and so might be less interested in the accuracy of descriptions of the past, but regard reflective accounts generated in interviews as reworking the past (Halbwachs, 1992). These reflective versions of self offered at different points in time can be compared to show for example how past events are reworked to validate or conform with current needs and future ambitions (Plumridge and Thomson, 2003).

Large-scale quantitative surveys often combine a number of different data collection strategies so that they do not always fit neatly into the classification of prospective or retrospective designs. In particular longitudinal event-history data are frequently collected retrospectively as part of an ongoing prospective longitudinal study. For example, the British Household Panel Survey (BHPS) is a prospective panel study. However, in addition to the detailed questions asked every year about current living conditions, attitudes and beliefs, in the 1992 and 1993 waves of the BHPS, respondents were asked to provide information about their past employment experiences and their relationship histories. This type of retrospective collection of
information is also common in qualitative longitudinal studies.

A further type of prospective panel study is a linked panel, which uses census data or administrative data (such as information about hospital treatment or benefits records). This is the least intrusive type of quantitative longitudinal research study as individuals may well not be aware that they are members of the panel. Unique personal identifiers are used to link together data that were not initially collected as part of a longitudinal research study. For example a 1 percent sub-sample of records from the 1971 British Census has been linked to records for the same sample of individuals in 1981, 1991 and 2001. This is known as the Longitudinal Study of the British Census. A similar study linking the 1991 and 2001 Census records for 5 percent of the population of Scotland has recently been established.

Cohort studies

A cohort has been defined as an ‘aggregate of individuals who experienced the same event within the same time interval’ (Ryder, 1965: 845). The notion of a group of people bound together by sharing the experience of common historical events was first introduced by Karl Mannheim in the early 1920s. Mannheim argued that people are more sensitive to social phenomena that occur during their formative years and this may shape a cohort’s future values and behaviour. The most straightforward type of cohort used in longitudinal quantitative research is the birth cohort, i.e. a sample of individuals born within a relatively short time period. We might also choose to study samples of a cohort of people who got married, or who were released from prison, in a particular month or year.

One major advantage of having longitudinal data on a series of separate cohorts is that it is possible to distinguish between ‘age effects’ (or lifecycle effects) and cohort effects. For example, we may discover, from a cross-sectional survey carried out in 2006 in Britain, that people over the age of 50 are more likely to vote for the Conservative Party than those under the age of 50. However, it is not clear whether this is an age effect such that as individuals grow older they are more likely to vote Conservative or whether it is a cohort effect so that those born before 1956 are more likely to vote Conservative than those born after 1956. In a longitudinal cohort study we would be able to track the voting intentions of those who reached age 50 in 2006 throughout their adult lives to see whether their political allegiances were stable or whether they became more Conservative as they grew older. These data could then be compared with the information from cohorts born at earlier and later time periods to see whether there were stable cohort differences in political beliefs.

Cohort studies allow an explicit focus on the social and cultural context that frames the experiences, behaviour and decisions of individuals. For example, in the case of the 1958 British Birth Cohort Study (the National Child Development Study), it is important to understand the cohort’s educational experiences in the context of profound changes in the organisation of secondary education during the 1960s and 1970s, and the rapid expansion of higher education, which was well underway by the time cohort members left school in the mid 1970s (Bynner and Fogelman, 1993). In a similar way, qualitative longitudinal studies, in following individuals, groups and institutions over time, can provide information on the impact of dramatic changes of policy on the lives and experiences of participants. Examples here are the 12–18 study, which provides insight into the consequences of changing policies in different kinds of schools and communities in Australia, and Pollard and Filer charting the effects of rapidly changing education policy on children through critical years of their primary and secondary education in the UK (McLeod and Yates, 2006; Pollard and Filer, 1999, 2002). Qualitative studies constructed in this way tend to avoid the danger of producing findings that are disembodied from particular times and places. A similar argument has been made for quantitative approaches, where the use of data from a single cohort coupled with an awareness of how the historical context could
shape the experiences of that generation of individuals, has been argued to lead to a more ‘narrative’ understanding of the patterns of behaviour being investigated (Elliott, 2005). Comparisons between cohorts can also help to clarify how individuals of different ages may respond differently to particular sets of historical circumstances. This emphasis on the importance of understanding individuals’ lives and experiences as arising out of the intersection of individual agency and historical and cultural context has become articulated as the life course paradigm. The term ‘life course’ refers to ‘a sequence of socially defined events and roles that the individual enacts over time’ (Giele and Elder, 1998: 22). Research adopting the life course paradigm tends to use both qualitative and quantitative data (Elder, 1974; Giele, 1998; Laub and Sampson, 1998).

Studies can combine qualitative and quantitative methods in different ways. Although we advocate the need for both, a discussion of mixed methods is beyond the scope of this chapter, other than to note that the combination of methods varies considerably. For example predominantly quantitative studies may have qualitative ‘add-ons’ (for example, see Gorell-Barnes et al., 1998), studies may integrate both approaches (Du Bois-Reymond, 1998), and studies may begin as primarily quantitative and become increasingly qualitative over time as sample size erodes (Dwyer and Wyn, 2001).

Table 14.1 provides a brief summary of a small selection of quantitative and qualitative studies that have used different longitudinal panel designs, focusing on those that are commonly used in Britain, North America and Europe. While some of these are individual research projects others are multipurpose studies that generate datasets that can be used as resources by other researchers.

GENERATING QUALITATIVE LONGITUDINAL DATA

We can see from the examples of qualitative longitudinal studies in Table 14.1 that the type of data generated and methods employed are very different from those in quantitative studies, and that they can vary by social science discipline. Imagine the wealth of detailed data on all aspects of the life and culture of the Isthmus Zapotec generated in an ongoing, 40-year study of their community (Royce, 2005). In general the methods used to generate data in qualitative longitudinal research depend on the research questions, the substantive research area and the perspective of the researcher/discipline.

Anthropology and community studies are the lead social science disciplines employing long-term fieldwork that can be seen as qualitative longitudinal research. The approach is also relatively common in the education field, relevant studies including Pollard and Filer, 2002; Gordon et al., 2000; Walkerdine et al., 2001; Yates et al., 2002; Ball et al., 2000 and Kuhn and Witzel, 2000. Qualitative longitudinal work is particularly apposite in developmental psychology and health; key studies include Cutting and Dunn, 1999; Hughes and Dunn, 2002; Brown and Gilligan, 1992; Gilligan, 1993; Gulbrandsen, 2003 and Woodgate et al., 2003. There is increasing use of this approach in sociology (Du Bois-Reymond, 1998) and policy studies, dealing with policy development and evaluation, impact and process (Molloy et al., 2002; Mumford and Power, 2003). Other sociology sub-disciplines where qualitative longitudinal research is prevalent include criminology, covering criminal, drug use and sex work ‘careers’ (Farrall, 2004; Plumridge, 2001; Smith and McVie, 2003), life course/life history studies (Elder and Conger, 2000; Laub and Sampson, 2003) and childhood and youth studies (Henderson et al., 2007; Neale and Flowerdew, 2003; White and Wyn, 2004). Areas investigated include for example gender, families, parenting, child development, children and young people, changing health status, all manner of transitions in life, sexuality, employment and the impact of new technology.

Two collections of anthropological studies, themselves providing a review of the field over time, yield a fascinating picture of the
Table 14.1 Examples of longitudinal studies

Study | Type | Country | Date started | Frequency of data collection | Main focus | Key reference or website
Panel Study of Income Dynamics | Household | USA | 1968 | Annual | Income | http://psidonline.isr.umich.edu/; McGonagle and Schoeni, 2006
National Longitudinal surveys | Cohort | USA | 1966, 1971 etc. | Annual | A series of cohort studies started at different times and with cohorts of different ages, with a primary focus on employment | http://www.bls.gov/nls/; NLS Handbook, 2005, http://www.bls.gov/nls/handbook/nlshndbk.htm
Survey of Income and Program Participation | Household | USA | 1984 | Every 4 months | Income support | http://www.bls.census.gov/sipp/; SIPP Users’ Guide 2001 available in PDF at http://www.bls.census.gov/sipp/pubs.html
National Longitudinal Study of Children and Youth | Cohort of children aged 0–11 | Canada | 1994 | Every 2 years | Well-being and development of children into early adult life | http://www.statcan.ca/english/sdds/
British Birth Cohort Studies: National Survey of Health and Development; National Child Development Study; British Cohort Study 1970; Millennium Cohort Study | Cohort | Great Britain | 1946, 1958, 1970 and 2000 | Varies, but generally every 2–3 years at early stages of children’s development and every 4 years in adult life | Health and child development with a broader focus in adult life (the 1946 cohort study is more specifically focused on health) | http://www.cls.ioe.ac.uk/; http://www.nshd.mrc.ac.uk/; Dex and Joshi, 2005; Ferri et al., 2003
Longitudinal Study of the Census in England and Wales | Linked panel using census data | England and Wales | 1971 | Links decennial census data | Demographic and employment topics included in the census | http://www.celsius.lshtm.ac.uk/; Blackwell et al., 2003; Akinwale et al., 2005
German Socio-economic Panel | Household study | West Germany and now includes the former GDR | 1984 | Annual | Broad focus on living conditions, social change, education and employment | http://www.diw.de/english/sop/
EU Survey on Income and Living Conditions (EU-SILC), formerly the European Community Household Panel (ECHP) | Household study | European Community | 2003 (ECHP from 1994 to 2001) | Annual | Living conditions, employment, income, health and housing | http://epunet.essex.ac.uk/EU-SILC_UDB.pdf; http://www.iser.essex.ac.uk/epag/dataset.php; Berthoud and Iacovou, 2002
The Isthmus Zapotec | Anthropological, ethnographic; includes dance, photography, art, artefacts, advocacy | Mexico (USA) | 1967 | Varies, but between every 1–3 years | Identity, language, culture, art; change/continuity | Royce, 1977, 1982, 1993, 2002
The Harvard Chiapas Project: Tzotzil and Tzeltal Indians | Anthropological, controlled comparative, ethnographic team approach | Mexico (USA) | 1957 | Continuous annually 1957–1980, more sporadic since | Determinants and processes of cultural change, language, conceptual system | Vogt, 1957, 1969, 1994, 2002
Gwembe Valley Tonga (Northern Rhodesia/Zambia) | Anthropological demographic census, ethnographic team approach | Northern Rhodesia/Zambia (UK) | 1956 | Initially 5-year intervals, then varies | Resettlement post-dramatic environmental change (Kariba Dam); cultural, social, political change | Cliggett, 2002; Scudder and Colson, 1979, 2002
12–18 Project | Sociology/Psychology. Interview study of 4 schools, ethos, effect on young people | Australia | 1993 | 1993–2000 twice-yearly interviews | Gendered subjectivity, identity formation, interaction of institutional + social contexts | McLeod and Yates, 2006
Identity and Learning Programme | Educational ethnography, of 17 children through ages 4–16. Multi-perspective, collaborative approach | UK | 1987 | Annually scheduled activities over the period of research | Identity, learning stance, dynamics of learning careers, differentiation | Pollard and Filer, 1999
Growing up Girl | Multi-method psychosocial study of female subjectivities and transitions to womanhood | UK | 1977 | Revisits ages 4, 10, 16, 21 | Education, families, gender, ethnicity and social class | Walkerdine and Lucey, 1989; Walkerdine et al., 2001
Inventing Adulthoods | Multi-method sociological study of 100 young people’s transitions to adulthood | UK | 1996 | 5/6 waves in 10 years | Values, identities, material & social resources | Henderson et al., 2007; Thomson, 2007
Middletown | First classic US community study. Many others followed up | USA | 1924 | E.g. 1924, 1935, 1979, 1982, 2001 | To study synchronously the interwoven trends that are the life of a small American city | Caccamo, 2000; Caplow and Bahr, 1979; Caplow et al., 1982; Lynd and Lynd, 1929, 1935
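
Several of the studies in Table 14.1 follow more than one birth cohort (the British Birth Cohort Studies, for example, began in 1946, 1958, 1970 and 2000). The short Python sketch below is an illustration only, not an analysis from any of these studies: it revisits the voting example given earlier in the chapter and shows why a single cross-section cannot separate an age effect from a cohort effect, while following a cohort over time can. The two 'worlds', the birth years 1946 and 1966, the survey years and the support levels of 0.3 and 0.5 are all invented assumptions chosen to make the logic visible.

```python
# Illustrative sketch only: the support levels below are invented numbers,
# not estimates from any survey.

COHORTS = (1946, 1966)  # hypothetical birth years on either side of 1956


def age_world(birth_year, survey_year):
    """Support depends only on age: people over 50 are more supportive."""
    age = survey_year - birth_year
    return 0.5 if age > 50 else 0.3


def cohort_world(birth_year, survey_year):
    """Support depends only on cohort: those born before 1956 are more supportive."""
    return 0.5 if birth_year < 1956 else 0.3


def cross_section(world, survey_year):
    """One survey at one time point: support by birth cohort."""
    return {born: world(born, survey_year) for born in COHORTS}


def follow_cohort(world, birth_year, survey_years):
    """Panel view: the same cohort observed at several time points."""
    return [world(birth_year, year) for year in survey_years]


if __name__ == "__main__":
    # The 2006 cross-section looks the same under either explanation.
    print(cross_section(age_world, 2006))
    print(cross_section(cohort_world, 2006))
    # Following the 1946 cohort from 1986 to 2006 tells the two apart.
    print(follow_cohort(age_world, 1946, (1986, 2006)))
    print(follow_cohort(cohort_world, 1946, (1986, 2006)))
```

In this toy setting the 2006 cross-section is identical in both worlds, whereas the 1946 cohort's support rises between 1986 and 2006 only if age is what matters and stays flat if cohort membership is what matters. Real data would of course also have to contend with period effects, sampling error and attrition, none of which is modelled here.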
range and complexity of qualitative longitudinal research undertaken in this field (Foster et al., 1979; Kemper and Royce, 2002). Each provides considerable insight into an established canon of long-term anthropological enterprise. This has involved the development of a necessarily flexible approach, adapting to changes in the nature of the community; in the needs, goals, options and world-views of community members; in the political landscape and in the relationships between researchers and community members. Importantly, it illustrates how projects need to be organised on the basis of personnel and project size. As Kemper and Royce indicate it is impossible to take on issues of time without the research coming into the frame, including practical questions of how to organise and maintain a team, the domestic politics of a research team, funding and job security issues and intellectual fashions. Many of these issues are also relevant for quantitative longitudinal research.

The body of anthropological research and the issues taken into consideration provide models for other disciplines and illustrate some differences in the concerns of different disciplines. An example here is concern about anonymity and confidentiality that emerges for many qualitative researchers, inhibiting the sharing of data. Data sharing and participatory involvement with those studied are well established in anthropology, although perhaps in danger in a constrained funding climate.

Anthropological studies can in some ways be seen as community studies, but the community studies literature tends to straddle disciplinary boundaries, including sociology, anthropology and geography or urban studies, and many of the classic studies were conducted within these fields, often drawing on an ethnographic method in which time, and change through time, were critical elements. Important here are the urban ethnographic tradition of the Chicago School (Lynd and Lynd, 1929, 1935; Whyte, 1943, 1955; Wirth, 1938) and family and community studies in the UK. Examples include Young and Willmott’s studies of the family in Bethnal Green (1957) and Stacey’s Banbury studies (1960 and 1975). The temporal character of community studies has been heightened by the growing trend of researchers to return to the site of earlier research. Follow-up studies have involved the same researcher(s) (Stacey et al., 1975) or others (Warwick and Littlejohn, 1992). In the 1990s for example, Fiona Devine returned to Luton where working-class car workers were first studied by Goldthorpe and his colleagues and described in The affluent worker (1968). Devine (1992) was interested to see what changes had occurred in the intervening period in relation to working-class lifestyles and political and social beliefs. The Lynd and Lynd study of Middletown became a benchmark for community studies that was revisited by the Lynds themselves and by many others up to the present day (Caccamo, 2000, see also Crow, 2002; Crow and Allen, 1994).

The types of methods used to generate data in qualitative longitudinal research are those of qualitative research in general, and can be combined in various ways, including with quantitative methods (for example surveys of varying sizes and types, the collection of baseline descriptive statistical and demographic data to enable assessment of change over time, social mapping of geographical areas). The basic method in anthropology, although now widely used in other disciplines, is ethnography, itself constructed from multiple qualitative methods. Critically, however, ethnography involves social exploration, protracted investigation and the interpretation of local and situated cultures grounded in attention to the singular and concrete (Atkinson and Hammersley, 1994; Atkinson et al., 2001). Amongst specific methods used in qualitative longitudinal research are interviews on a continuum from semi-structured to depth. Increasingly favoured are biographical interviews, which can relate to specific episodes in, or aspects of, a life, or be more holistic as in life history approaches. Also employed are case studies, observation and documents including diaries kept specifically for the research (written, audio-, video- and photo-diaries etc.; Thomson and Holland, 2005). Various standard instruments can also be used, particularly in psychology. Visual, play and
drawing methods have also been developed, the latter for example with children.

Further aspects of research design will also be influenced by the social science discipline or disciplines within which the investigation takes place. This includes the nature of the sample to be selected, the unit of analysis for the research (including individual, group, community, organisation, institution, events, time period, spatial or geographical entities) and the overall timeframe of the study (including time intervals if relevant).

A major value of qualitative longitudinal research is flexibility, with the potential for development and innovation to take place throughout the entire research process. For example, with technological development, types of visual data (photography, video and hypermedia) are becoming increasingly popular in qualitative longitudinal research as in qualitative research in general (Pink, 2004a, 2004b; Qualitative Sociology, 1997). Changing technology is enabling the development and enhancement of ways of storing, accessing and representing data. This flexibility can extend to sampling, methods, units of analysis and theorisation. Sampling in qualitative research tends to follow a theoretical, rather than a statistical logic and so is characteristically conceptually and purposively driven. There is less concern than in quantitative approaches for representativeness, and sample and sampling can change in the process of the research, even more so in the longer-term qualitative longitudinal research. Two major approaches are purposive and theoretical sampling. In the first, cases are chosen because they illustrate some feature or process in which the researcher is interested; in the second, samples are selected on the basis of their relevance to the research questions and theoretical position of the researcher, and characteristics or criteria which help to develop and test the theory underlying the work are built into the sample. In the course of ongoing research and analysis, purposively chosen confirming or negative cases can also be used to enrich the data and its analysis and interpretation (Mason, 2002; Morse, 1994; Patton, 1990).

Qualitative longitudinal research can generate and test theory, and both inductive and deductive approaches can be undertaken, the specific theory again depending on the discipline. Whatever the theoretical perspective of a qualitative longitudinal study, it requires a theorisation of temporal processes. The structure of a qualitative longitudinal study makes it possible to employ an iterative and reflexive approach through which theoretical interpretations can be revisited in subsequent contact with the participants leading to further development of the ideas. A view emerging in the field is that a qualitative longitudinal methodology might itself challenge or expose the static character of existing theoretical frameworks, and in this way might represent a theoretical orientation as much as a methodology (McLeod, 2003; Neale and Flowerdew, 2003; Plumridge and Thomson, 2003).

Vogt, an anthropologist who worked on the Harvard Chiapas Project for many years, notes some advantages of the qualitative longitudinal approach:

The principal advantage of a continuous long-range project over a short-range one, or a series of revisits, is the depth, quality, and variety of understandings achieved – understandings of the basic ethnography and of the trends and processes of change. If the long-range project also involves a sizable team of students and younger colleagues who make one or more revisits and keep abreast of all the publications, then there is the added advantage of having a variety of fieldworkers with varied training and different theoretical biases who are forced to reconcile their findings and their analyses with one another. Vogt (2002: 145)

Problems of attrition

A major methodological issue for both qualitative and quantitative longitudinal studies with the individual as the unit of analysis is the problem of attrition, i.e. the drop-out of participants through successive waves of a prospective study. Each time individuals in a sample are re-contacted there is the risk that some will refuse to remain in the study, some will be untraceable, and some may have emigrated or died¹. In the United States the National Longitudinal Study of Youth (1979)
is regarded as the gold standard for sample retention against which other surveys are evaluated (Olsen, 2005). Olsen reports that in 2002, 23 years after the first data collection, there were 9,964 respondents eligible for interview and of these 7,724 (77.5 percent) were successfully interviewed.

The prospective nature of the majority of longitudinal studies means that information will have been collected in earlier sweeps about members of the sample who are not contacted, or refuse participation, in later sweeps. This makes it possible to correct for possible distortion in results due to missing cases. In quantitative research weights may be applied or models may be constructed explicitly to adjust for missing data. In both qualitative and quantitative studies new members of the panel may be brought in, and/or studies may over-sample particular groups from the outset in anticipation of uneven attrition.

There are a number of ways in which sample retention can be maximised in longitudinal studies. These include: using targeted incentive payments; allowing respondents to choose the mode in which they are interviewed, i.e. by telephone or in a face-to-face interview (Olsen, 2005); collecting ‘stable addresses’ such as the address of parents or other relatives who are less likely to move than the respondent themselves and can subsequently be used to trace the respondent; making regular contact with respondents and asking them to confirm their current address and notify the research group of changes of address. Some of these techniques are used in qualitative longitudinal studies, but an important element in retention here is the relationship that is built up between the researcher(s) and the participants. In studies where the unit of analysis is a group or community rather than an individual, these issues are not so important.

ARCHIVING AND RE-USE OF DATA

Archiving and the secondary analysis of longitudinal data are already well established in the context of quantitative research. In particular the relatively high cost of conducting quantitative longitudinal studies makes it important that the fullest possible use is made of the data resource. In the past, archiving and use of archived qualitative data for substantive and theoretical re-enquiry have been relatively limited, and proposals for such developments provoked mixed reactions, although attitudes are changing (Holland et al., 2004; Parry and Mauthner, 2004: 139). Again there are differences within social science, with anthropology and oral history leading the field in archiving and re-use particularly of longitudinal material (Sheridan, 2000; Webb, 1996). The iterative, processual nature of qualitative research and consequent re-formulation and refinement of research questions over time also makes clear definition of secondary, as opposed to primary, analysis difficult and may, to some extent, explain the relative lack of secondary analysis of qualitative data (Hinds et al., 1997). The literature on the ethical, methodological and epistemological re-use of qualitative data and practical support for its archiving is, however, growing. A recent review of secondary analyses of qualitative data in health and social care research identified 55 studies, mostly North American, and six different types of qualitative secondary analysis based on variations in the purpose of the secondary analysis, the extent to which the primary and secondary research question differed and differences in the number and type of datasets re-used (Heaton, 2000, 2004).

ETHICAL CONSIDERATIONS IN LONGITUDINAL RESEARCH

Many of the ethical issues in longitudinal research are similar to those in cross-sectional research. Major concerns, for both qualitative and quantitative research, are around consent, confidentiality, anonymity and the distortion of life experience through repeated intervention. Concerns around confidentiality and anonymity tend to be amplified in the context of longitudinal research, where typically more
detailed information is held on participants, increasing the possibility of being able to identify individuals, even in large samples. Birth cohort studies where the dates of birth of those participating may be widely known are seen as posing additional risks for disclosure.

Informed consent in the context of longitudinal research is not a one-off event, but a process, with repeated consultation necessary at each new phase of data generation. In the context of qualitative longitudinal research this is frequently extended throughout all phases of the research, including data analysis and final reporting.

Ethical issues have been much more widely discussed in the qualitative literature than in the context of quantitative research. In particular qualitative researchers have highlighted concerns around the potential impact of the research on both researched and researchers, intrusion, dependency, emotional involvement and problems of closure and ownership and control of the data (Kemper and Royce, 2002; Mauthner et al., 1998; Royce, 2005; Ward and Henderson, 2003; Yates and McLeod, 1996). In the case of researching children and young people, or otherwise potentially vulnerable groups, the issues once again are intensified (France et al., 2000; Saldana, 2003).

ANALYSING QUANTITATIVE LONGITUDINAL DATA

There is an extensive literature on the statistical analysis of longitudinal data (Allison, 1984; Cox, 1972; Lancaster, 1990; Yamaguchi, 1991), and while some of the approaches described have their roots in engineering and bio-medical research there are an increasing number of social scientists and applied social statisticians working on methods which are specifically applicable to social data (Blossfeld and Rohwer, 1995; Dale and Davies, 1994; Tuma and Hannan, 1979; Yamaguchi, 1991). Here we can contrast traditional modelling strategies applied to event history data and more innovative approaches to analysis that aim to provide descriptions or classifications of samples of narratives (Abbott, 1992). Given restrictions on space, we do not discuss methods such as OLS regression and logistic regression, which for example are commonly used in cohort studies to examine the links between experiences in early life and outcomes in adulthood (see Savage and Egerton, 1997; Schoon and Parsons, 2002). These methods are also frequently used to explore associations using quantitative cross-sectional data. Rather, the focus here is on methods that can only be used with longitudinal data.

Event history modelling

In many respects event history modelling resembles more widely understood regression techniques, such as ordinary least squares regression and logistic regression (where the dependent variable is dichotomous). The emphasis is on determining the relative importance of a number of independent variables or ‘covariates’ for ‘predicting’ the outcome of a dependent variable. However, event history modelling differs from standard multiple regression in that the dependent variable is not a measurement of an individual attribute (such as income or qualifications), rather it is derived from the occurrence or non-occurrence of an event, which is temporally marked. For example, age at first partnership or length of unemployment. Standard regression techniques are not appropriate in the case of event history data, which focus on the timing of events, for two reasons. First is the problem of what duration value to assign to individuals or cases that have not experienced the event of interest by the time the data is collected – these cases are termed ‘censored cases’. A second problem, once a sample is observed longitudinally, is the potential for the values of some of the independent covariates to change. The issue then arises as to how to incorporate these ‘time-varying’ covariates into the analysis.

These two problems have led to the development of modelling techniques specifically intended for the analysis of event history data. In essence, these techniques allow us
to evaluate the relative importance of a number of different variables, or ‘covariates’ for predicting the chance, or hazard, of an event occurring. The hazard is a key concept in event history analysis, and is sometimes also referred to as the hazard rate or hazard function. It can be interpreted as the probability that an event will occur at a particular point in time, given that the individual is at risk at that time. The group of individuals who are at risk of the event occurring are therefore usually referred to as the risk set.

Approaches to event history modelling

One of the most common approaches within the social sciences is to use Cox’s proportional hazard models or ‘Cox Regression’ (Cox, 1972). This provides a method for modelling time-to-event data and allows the inclusion of predictor variables (covariates). For example, a model could be estimated for duration of marriage based on religiosity, age at marriage and level of education. Cox Regression will handle the censored cases correctly, and it will provide estimated coefficients for each of the covariates, allowing an assessment of the relative importance of multiple covariates and of any interactions between them. Cox regression is known as a continuous time approach because it is assumed that the time that an event occurs is measured accurately.

Even though the Cox model is one of the most popular and widely applied approaches it has two main disadvantages. First, it is relatively inflexible in terms of modelling duration dependence i.e. for specifying exactly how the hazard may change over time, and, second, it makes it difficult to incorporate time-varying covariates. For this reason, many researchers, with an explicit interest in how the probability of an event occurring changes over time, prefer to use a ‘discrete-time’ approach. This requires that the data have a specific format. A separate unit of analysis is created for each discrete time interval. Each record therefore corresponds to a person/month or person/year (depending on the accuracy with which events have been recorded). Once the data has been reconfigured in this way, the unit of analysis is transferred from being the individual case to being a person-year and logistic regression models can be estimated for the dichotomous dependent variable (whether the event occurred or not) using maximum likelihood methods (Allison, 1984). This approach facilitates inclusion of explanatory variables that vary over time because each year, or month, that an individual is at risk is treated as a separate observation. It is also easy to include more than one measure of duration. Discrete time methods are therefore thought to offer a preferable approach when the researcher wants to include several time-varying covariates. A good example is provided by Heaton and Call’s research on the timing of divorce (Heaton and Call, 1995). This analytic approach is also frequently used by those looking at recidivism and wanting to understand the timing and correlates of repeat offending (Baumer, 1997; Benda, 2003; Gainey et al., 2000).

Individual heterogeneity

A major limitation with the simple approach to the analysis of discretized longitudinal data described above, is that it does not take account of the fact that the unit of analysis is the ‘person-year’ and therefore the individual cases are not fully independent (as they should be for a logistic regression) but are clustered at the level of the person. For example, in an analysis modelling duration of marriage, an individual who had been married for 10 years would contribute 10 observations or ‘person-years’ to the dataset. Another way to understand this problem is to consider that there may be additional variables which have a strong association with the dependent variable but which are not included in the model. The existence of such ‘unobserved heterogeneity’ will mean that models are mis-specified and in particular spurious duration effects may be detected. The use of more sophisticated models including fixed or random effects models can overcome these problems and allow the researcher to produce more robust estimates of duration dependence. It is beyond the scope of this
chapter to discuss these models but for a more detailed introductory treatment see Elliott (2002), Davies (1994) and Box-Steffensmeier and Jones (2004).

Repeated measures analysis

In some quantitative longitudinal research the focus is not on the timing of events but rather on change in an individual attribute over time, for example weight, performance score, attitude, voting behaviour, reaction time, depression etc. In particular, psychologists often use repeated measures of traits, dispositions or psychological well-being to examine which factors may promote change or stability for individuals. This approach can also be used to investigate what type of effect a particular life event may have on individual functioning. For example, several studies examining the potential consequences of parental divorce for children have compared behavioural measures and measures of performance in mathematics and reading in addition to other outcomes, before and after a parental divorce (Cherlin et al., 1991; Elliott and Richards, 1991; Ni Bhrolchain et al., 1994).

CAUSALITY IN CROSS-SECTIONAL AND LONGITUDINAL RESEARCH

Information about the temporal ordering of events is generally regarded as essential if we are to make any claims about a causal relationship between those events. Given the importance of establishing the chronology of events in order to be confident about causality it can be seen that longitudinal data is frequently to be preferred over cross-sectional data. In some substantive examples even when data is collected in a cross-sectional survey, it is clear that one event or variable, precedes another. For example, in an analysis that focuses on the impact of school-leaving age on occupational attainment there is unlikely to be confusion about the temporal ordering of the variables. However, there are a number of examples where the use of cross-sectional survey data prevents researchers from determining the causal ordering of variables. For example, there is a considerable body of research that has shown a strong association between unemployment and ill health. This can either be interpreted to imply that unemployment causes poor health or that those who are in poor health are more likely to become unemployed and subsequently find it more difficult to find another job, i.e. there is a selection effect such that ill health might be described as causing unemployment (Bartley, 1991; Blane et al., 1993). In this case, longitudinal data would be needed to follow a sample of employed individuals and determine whether their health deteriorated if they became unemployed, or conversely whether a decline in health led to an increased probability of becoming unemployed (for examples which make use of longitudinal data to untangle this issue see Montgomery et al., 1996 and 1999).

In quantitative studies, longitudinal data is also valuable for overcoming the problems of disentangling maturational effects and generational effects. As Dale and Davies (1994) explain, cross-sectional data that examines the link between age and any dependent variable confounds cohort and life course effects. As was discussed above, using the example of political allegiances, one advantage of having longitudinal data on a number of separate cohorts is that it enables the researcher to disentangle these effects.

Perhaps the major advantage of longitudinal data over cross-sectional data in understanding the possible causal relationships between variables is its ability to take account of omitted variables. Quantitative longitudinal data enables the construction of models that are better able to take account of the complexities of the social world and the myriad influences on individuals’ behaviour.

Qualitative researchers can be reluctant to use the term causality, seeing it as intrinsically part of a quantitative paradigm. Understanding phenomena in time enables a researcher to capture meaning, intention and consequence, rather than findings true for all times and places (Gergen, 1984). But some argue that because of its attention to
detail, process, complexity and contextuality, is not to attempt to model the underlying
qualitative research is particularly valuable processes, but rather to establish a systematic
for identifying and understanding causal and description or typology of the most commonly
multi-causal linkages, especially in relation occurring patterns or sequences within them
to the temporal dimension of the longitu- (Abbott 1990, 1992). This approach has
dinal approach (Mason, 2002; Miles and been termed ‘narrative positivism’. Abbott
Huberman, 1994). Again, whilst this might be introduced the set of techniques known as
the case, qualitative longitudinal researchers Optimal Matching Analysis into sociology
would not necessarily refer to their findings from molecular biology, where it had been
in this way, for example in ethnography used in the study of DNA and other protein
causal theories are common if implicit. The sequences. He has applied the method to
focus on the meaning of experience for a substantive issues including the careers of
participant active in the construction of her/his musicians (Abbott and Hrycak, 1990), and
own identity and reflexive narrative of self the development of the welfare state (Abbott
could lead to explanations that might identify and DeViney, 1992). Following his lead, other
‘causal’ or ‘multicausal’ sequences. Pollard sociologists have also begun to adopt this
and Filer (1999) in a study of primary school approach and in particular have found the
children’s identities and careers eschew a method to be useful for the analysis of careers
focus on the academic and social outcomes (Blair-Loy, 1999; Chan, 1995; Halpin and
usually associated with school achievement, Chan, 1998; Stovel et al., 1996). However the
and what inputs would produce that output. technique is not as well developed or as widely
Taking a holistic approach they highlight the used as the modelling approaches described
dynamic, recursive nature of pupil experience, above (Wu, 2000).
seeing these children as continuously shaping It is perhaps in this approach, which
and maintaining their identity and status aims to provide a detailed description of the
as a pupil as they move through different different types of pathways or trajectories
school settings, in a dynamic, fluctuating followed by individuals, that qualitative
process, open to possibilities for change and quantitative approaches to analysis of
in varying degrees. Many elements are longitudinal data come closest. Abbott’s
identified as contributing in various ways to approach uses large samples and utilises
this reflexive pupil identity – gender, social sophisticated software to construct clusters
class and ethnicity, material, cultural and of cases with similar longitudinal profiles.
linguistic resources, physical and intellectual However, the research question addressed
capability and potential and multiple and using this technique mirrors the type of
various experiences in school observed in the research questions that form the focus of many
study. This is clearly a different understanding qualitative longitudinal studies, although the
of causality than that found in quantitative two approaches provide rather different types
approaches. of data on such trajectories.
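The optimal matching idea described in this section can be illustrated with a minimal Python sketch. It treats each career as a string of yearly states and computes the cheapest set of insertions, deletions and substitutions needed to turn one sequence into another. The state codes (S for study, E for employed, U for unemployed), the three example careers and the cost values are all invented for illustration; they are assumptions, not settings taken from Abbott's studies, and the function name `om_distance` is hypothetical.

```python
from itertools import combinations


def om_distance(seq_a, seq_b, indel=1.0, sub_cost=None, default_sub=2.0):
    """Edit distance between two state sequences (illustrative costs only)."""
    sub_cost = sub_cost or {}
    n, m = len(seq_a), len(seq_b)
    # dist[i][j] holds the cheapest cost of turning seq_a[:i] into seq_b[:j].
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * indel
    for j in range(1, m + 1):
        dist[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = seq_a[i - 1], seq_b[j - 1]
            if a == b:
                sub = 0.0
            else:
                # Substitution costs are looked up per unordered pair of states.
                sub = sub_cost.get(frozenset((a, b)), default_sub)
            dist[i][j] = min(
                dist[i - 1][j] + indel,    # delete a state from seq_a
                dist[i][j - 1] + indel,    # insert a state from seq_b
                dist[i - 1][j - 1] + sub,  # substitute (or match at zero cost)
            )
    return dist[n][m]


# Hypothetical six-year careers: S = study, E = employed, U = unemployed.
careers = {
    "p1": "SSEEEE",
    "p2": "SSEEUE",
    "p3": "UUUUEE",
}

# Pairwise distances; small values indicate similar trajectories.
pairwise = {
    (a, b): om_distance(careers[a], careers[b])
    for a, b in combinations(sorted(careers), 2)
}
```

In an analysis of the kind described here, a matrix of such pairwise distances would then be passed to a clustering procedure to group cases with similar longitudinal profiles; that step is not shown. The choice of substitution and insertion/deletion costs is itself a substantive decision, which is one reason the sketch leaves them as adjustable parameters.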
NARRATIVE POSITIVISM AND EVENT APPROACHES TO ANALYSIS IN

SEQUENCE ANALYSIS: A MORE QUALITATIVE LONGITUDINAL
QUALITATIVE APPROACH? RESEARCH
Even though the event history techniques The analysis of quantitative data largely
described in the section on quantitative analy- involves statistical modelling of large datasets
sis above are powerful and flexible, they still to identify patterns and relationships in the
have the disadvantage that they do not deal data at an aggregated level to be able to
with sequences holistically. An alternative make probabilistic statements about particular
approach to the analysis of event history data populations.As we have just seen, more recent
holistic approaches are attempting to deal with describing and classifying individual trajectories through time, through clustering techniques. Qualitative longitudinal data provides a different type of detailed information about processes through time for individuals or groups of varying sizes, requiring different analytic strategies. These methods of analysis will also vary, depending on the discipline, the theoretical approach and the unit of analysis. A key aspect of qualitative longitudinal analysis in general, however, is that it is theoretically driven, and is characterised by a focus on meaning.

Saldana highlights colourfully the problems of analysis for the qualitative longitudinal researcher:

The challenge for qualitative researchers is to rigorously analyze and interpret primarily language-based data records to describe credibly, vividly, and persuasively for readers through appropriate narrative the processes of participant change through time. This entails the sophisticated transformation and integration of observed human interactions in their multiple social contexts into temporal patterns or structures. (Saldana, 2003: 46)

The analysis of qualitative longitudinal research must then engage with and capture time, process and change. It requires working in two temporal dimensions: diachronically, through time, and synchronically cross-cutting at one point in time, and the articulation of these two through a third, integrative dimension. This is recognised as crucial for analysing change through time (Saldana, 2003). Even though both qualitative and quantitative longitudinal traditions have realised such analyses, this remains a challenging task both to execute and to describe. Here are some of the general approaches mooted.

Wolcott (1994) suggests three stages of increasing abstraction for the analytic process: description, analysis and interpretation. Description involves recording, chronicling and describing what kinds of change occur, in whom or what, at what time and in what context. Analysis accomplishes ‘the identification of essential features and the systematic interrelationships among them’ (Wolcott, 1994: 12). Finally, ‘Explaining the nature and meaning of those changes, or developing a theory with transferability of the study’s findings to other contexts, is the final stage of interpretation’ (Saldana, 2003: 63).

Saldana elaborates the Wolcott schema in his guidebook for qualitative longitudinal research, providing framing, descriptive, analytic and interpretive questions to guide the analytic process. ‘Framing questions’ (p. 63) address and manage the contexts of the particular study’s data, locating them in the process (e.g. what contextual and intervening conditions appear to influence and affect participant changes through time?). Descriptive questions (e.g. what increases or emerges through time? What kinds of surges or epiphanies occur through time?) generate information to help answer the framing questions, and the more complex analytic and interpretive questions. Analytic and interpretive questions integrate the descriptive information to guide the researcher to richer levels of analysis and interpretation (e.g. which changes interrelate through time? What is the through-line of the study?). The through-line is ‘a single word, a phrase, a sentence, or a paragraph with an accompanying narrative that describes, analyzes, and/or interprets the participant’s changes through time by analyzing its thematic flow – its qualitative trajectory’ (Saldana, 2003: 151, see too Saldana, 2005).

Thomson and Holland (2003) provide an example of an analysis attempting two of the dimensions suggested above in their 10-year study of 100 young people’s transitions to and constructions of adulthood, Inventing Adulthoods. The cross-sectional analysis captures a moment in time in the life of the sample (at each interview or data generation point) to identify discourses through which identities are constructed. In this case the data was coded descriptively and conceptually (using NUD.IST²) to enable comparison across the sample on the basis of a range of factors, e.g. age, gender, social class, geographical location. These analyses form a repeat cross-sectional study on the same sample and analyses can be compared for change over
time, and each contextualised in social and historical time. They highlight differences and similarities within the sample, and help identify the relationship between individual narratives and wider social processes. The longitudinal analysis consists of examining the development of a particular narrative for each case over the course of the study, following the complexity and contingency of individual trajectories, and identifying critical moments and change. This individual temporal analysis can also be related to social and historical time and change. More recently Thomson (2007) has described the process of constructing longitudinal case histories.

Drawing on a significant body of policy evaluation research, Lewis (2005) outlines a multi-dimensional approach to qualitative longitudinal data analysis built around the ‘framework’ approach to qualitative analysis developed by the National Centre for Social Research (Ritchie and Lewis, 2003). Changes in evaluation studies are identified as occurring at the individual, service and policy levels. Change is manifest in a literal way through the chronology of the account, yet it is also evident in how this chronology is reinterpreted by a research participant over time. Lewis suggests that qualitative longitudinal data are characterised by ‘discordant data’ where subsequent re-interpretations conflict with original accounts. To complicate matters further, not only does the participant reinterpret their story, but the researcher also reinterprets their analysis in the light of new revelation and the passage of time. Lewis maps each longitudinal case within a two-by-two frame that enables them to plot a series of interviews with a single participant (vertical axis) against themes (horizontal axis). In a

with whole cases: undertaking comparison between cases and between groups of cases, asking questions such as why and how might something that is present in one case (or group) be absent in another?

As we can see, qualitative longitudinal studies produce complex and multi-dimensional datasets, which in turn demand innovative strategies for data analysis and display that operate on more than two dimensions.

CONCLUSIONS: THE CONSTRUCTION OF THE INDIVIDUAL IN QUALITATIVE AND QUANTITATIVE LONGITUDINAL RESEARCH

As we discussed earlier, one of the main advantages of both qualitative and quantitative longitudinal research is the ability to track individual lives through time. In quantitative longitudinal research a priority is placed on collecting accurate data from a large representative sample about the nature and timing of life events, circumstances and behaviour. In qualitative longitudinal research the emphasis is far more on individuals’ understanding of their lives and circumstances and how these may change through time.

Even though both qualitative and quantitative longitudinal research have the potential to provide very detailed information about individuals, what is obscured in the quantitative approach are the narratives that individuals tell about their own lives. While complex biographical case studies can be developed from survey data (Sampson and Laub, 1993; Singer et al., 1998), these accounts are clearly authored by the researcher and allow no access to the reflexivity of the respondents themselves. In contrast with
similar way to that described by Thomson qualitative longitudinal research, the whole
and Holland (2003), the analysis proceeds in emphasis of the study may be on under-
two directions: horizontally across themes and standing the reflexive process of identity
vertically through a case over time, as well as work accomplished by individuals (Pollard
‘zigzagging’ between themes and interviews and Filer, 1999; Thomson and Holland,
within a single case to trace the development 2003). It is important to be clear therefore
of a theme over time. But in order to move that whereas the criticism that quantitative
away from the single case to the wider dataset, research is less detailed than qualitative
Lewis encourages an approach to working research may be misplaced (particularly in
the context of longitudinal research), there is a sense in which quantitative research can never provide access to the reflexive individual. The individual in quantitative research is seen as a unitary subject that has remained relatively impervious to post-modern deconstruction. Even when detailed longitudinal studies are used to construct case histories or biographies, the assumption is that those individuals have a clear, stable and coherent identity. Importantly, in quantitative research the description of the individual is provided by the researcher, and the resources available are variables which apparently allow no scope for ambiguity or inconsistency. The identity of individuals, and the meaning of variables such as gender and social class remain relatively fixed in quantitative research (Elliott, 2005).

While quantitative longitudinal analytic processes provide a more processual or dynamic understanding of the social world, they do so at the expense of setting up an overly static view of the individual. Quantitative longitudinal research provides a powerful tool for understanding the multiple factors that may affect individuals’ lives, shaping their experiences and behaviour. But there is little scope for understanding how individuals use narrative to construct and maintain a sense of their own identity. Without this element there is a danger that people are merely seen as making decisions and acting within a pre-defined and structurally determined field of social relations rather than as contributing to the maintenance and metamorphosis of themselves, and the culture and community in which they live.

In contrast, a more post-modern understanding of the self fits easily within qualitative longitudinal research and, indeed, has engendered qualitative analysis that emphasises the role of narrative in the formation and maintenance of the self (e.g. Gubrium and Holstein, 1995; Ronai and Cross, 1998; Wajcman and Martin, 2002). As has been discussed in more detail elsewhere (Elliott, 2005) this provides a powerful argument for the need to use both quantitative and qualitative approaches to longitudinal research.

NOTES

1 In some cases in both qualitative and quantitative longitudinal research even those who have emigrated might be followed up and included.
2 NUD.IST (Non-numerical Unstructured Data Indexing Searching and Theorizing) is CAQDAS (Computer Assisted Qualitative Data Analysis Software). Others include The Ethnograph (now out of date), NVivo7 (a combination of the earlier NUD.IST6 and NVivo2), ATLAS.ti, HyperQual (CAQDAS for the Apple Mac OS).

REFERENCES

Abbott, A. (1990). ‘Conceptions of Time and Events in Social Science Methods.’ Historical Methods 23,4: 140–150.
Abbott, A. (1992). ‘From Causes to Events: Notes on Narrative Positivism.’ Sociological Methods and Research 20,4: 428–455.
Abbott, A. and DeViney, S. (1992). ‘The Welfare State as Transnational Event.’ Social Science History 16: 245–274.
Abbott, A. and Hrycak, A. (1990). ‘Measuring Resemblance in Sequence Data: An Optimal Matching Analysis of Musicians’ Careers.’ American Journal of Sociology 96,1: 144–185.
Akinwale, B., Antonatos, A., Blackwell, L. and Haskey J. (2005). ‘Opportunities for New Research Using the Post-2001 ONS Longitudinal Study.’ Population Trends 121: 8–16.
Allison, P. D. (1984). Event History Analysis: Regression for longitudinal event data, Beverly Hills: Sage.
Atkinson, P., Coffey, A., Delamont, S., Lofland, J. and Lofland, L. (2001). Handbook of ethnography, London: Sage, pp. 248–261.
Atkinson, P. and Hammersley, M. (1994). ‘Ethnography and Participant Observation,’ in Denzin, N. K. and Lincoln Y. S. (eds) Handbook of qualitative research, London: Sage.
Ball, S. J., Maguire, M. and Macrae, S. (2000). Choice, pathways and transitions post-16: New youth, new economics in the global city, London: RoutledgeFalmer.
Bartley, M. (1991). ‘Health and Labour Force Participation: Stress, Selection and the Reproduction Costs of Labour Power.’ Journal of Social Policy 20,3: 327–364.
Baumer, E. (1997). ‘Levels and Predictors of Recidivism: The Malta Experience.’ Criminology 35: 601–628.
Benda, B. B. (2003). ‘Survival Analysis of Criminal Recidivism of Boot Camp Graduates Using Elements from General and Developmental Explanatory Models.’
International Journal of Offender Therapy and Comparative Criminology 47,1: 89–110.
Berthoud, R. and Iacovou, M. (2002). Diverse Europe: Mapping patterns of social change across the EU, Economic and Social Research Council.
Blackwell, L., Lynch, K., Smith, J. and Goldblatt, P. (2003). ‘Longitudinal Study 1971–2001: Completeness of Census Linkage’ (Series LS No. 10) (PDF 841K), http://www.celsius.lshtm.ac.uk/2001_data.html
Blair-Loy, M. (1999). ‘Career Patterns of Executive Women in Finance.’ American Journal of Sociology 104: 1346–1397.
Blane, D., Smith, G. and Bartley, M. (1993). ‘Social Selection: What Does it Contribute to Social Class Differences in Health.’ Sociology of Health and Illness 15,1: 1–15.
Blossfeld, H.-P. and Rohwer, G. (1995). Techniques of event history modeling: New approaches to causal analysis. Mahwah, NJ: Lawrence Erlbaum Associates.
Box-Steffensmeier, J. and Jones, B. (2004). Event history modeling, Cambridge: Cambridge University Press.
Brown, L.M. and Gilligan, C. (1992). Meeting at the crossroads: Women’s psychology and girls’ development, Cambridge, MA: Harvard University Press.
Bynner, J. and Fogelman, K. (1993). Making the grade: education and training experiences, in Ferri, E. (ed.) Life at 33: The fifth follow-up of the National Child Development Study, London: National Children’s Bureau, pp. 36–59.
Caccamo, R. (2000) Back to Middletown: Three generations of sociological reflections, Stanford: Stanford University Press.
Caplow, T. and Bahr, H. M. (1979) ‘Half a Century of Change in Adolescent Attitudes: Replication of a Middletown Survey by the Lynds.’ Public Opinion Quarterly 43,1: 1–17.
Caplow, T., Bahr, H. M., Chadwick, B. A., Hill, R. and Williamson, M. H. O. (1982). Middletown families: Fifty years of change and continuity, Minneapolis, MN: University of Minnesota Press.
Chan, T.-W. (1995). ‘Optimal Matching Analysis.’ Work and Occupations 22: 467–490.
Cherlin, A. J., Furstenberg, F., Chase-Landsdale, P. L. and Kiernan, K. (1991). ‘Longitudinal Studies of Effects of Divorce on Children in Great Britain and the United States.’ Science Technology & Human Values 252: 1386–1389.
Cliggett, L. (2002). ‘Multigenerations and Multidisciplines: Inheriting Fifty Years of Gwembe Tonga Research,’ in Kemper, R. and Royce, A. P. (eds) Chronicling cultures: Long-term field research in anthropology, Walnut Creek, CA: AltaMira, pp. 239–251.
Cox, D. R. (1972). ‘Regression Models and Life Tables.’ Journal of the Royal Statistical Society B 34: 187–202.
Crow, G. (2002) ‘Community Studies: Fifty Years of theorization.’ Sociological Research Online 7,3, http://www.socresonline.org.uk/7/3/crow.html
Crow, G. and Allen, G. (1994). Community life: An introduction to local social relations, London: Harvester Wheatsheaf.
Cutting, A. L. and Dunn, J. (1999). ‘Theory of Mind, Emotion Understanding, Language and Family Background: Individual Differences and Inter-relations.’ Child Development 70: 853–865.
Dale, A. and Davies, R. (1994). Analyzing social and political change: A casebook of methods, London: Sage.
Davies, R. B. (1994). ‘From cross-sectional to longitudinal analysis,’ in Dale, A. and Davies, R. B. (eds) Analyzing social and political change: A casebook of methods, London: Sage, pp. 20–40.
Devine, F. (1992) Affluent workers revisited: Privatism and the working class, Edinburgh: Edinburgh University Press.
Dex, S. (1995). ‘The Reliability of Recall Data: A Literature Review.’ Bulletin de Methodologie Sociologique 49: 58–80.
Dex, S. and Joshi, H. (2005). Children of the 21st century: from birth to nine months, Bristol: The Policy Press.
Dex, S. and McCulloch, A. (1998). ‘The reliability of retrospective unemployment history data.’ Work Employment and Society 12,3: 497–509.
Du Bois-Reymond, M. (1998). ‘“I don’t want to commit myself yet”: Young people’s life concepts.’ Journal of Youth Studies 1,1: 63–79.
Dwyer, P. J. and Wyn, J. (2001). Youth, education and risk: Facing the future, London: RoutledgeFalmer.
Elder, G. and Conger, R. D. (2000). Children of the land: Adversity and success in rural America, Chicago: University of Chicago Press.
Elder, G. H. (1974). Children of the great depression: social change in life experience, Chicago: University of Chicago Press.
Elliott, B. J. (2002). ‘The Value of Event History Techniques for Understanding Social Processes: Modelling Women’s Employment Behaviour After Motherhood.’ International Journal of Social Research Methodology 5,2: 107–132.
Elliott, B. J. and Richards, M. P. M. (1991). ‘Children and Divorce: Educational Performance and Behaviour Before and After Parental Separation.’ International Journal of Law and the Family 5: 258–276.
Elliott, J. (2005). Using narrative in social research: Qualitative and quantitative approaches, London: Sage.
Farrall, S. (2004). ‘Social Capital and Offender Reintegration: Making Probation Desistance Focussed,’ in Maruna, S. and Immarigeon, R. (eds) After crime and punishment: Ex-offender reintegration and desistance from crime, Cullompton: Willan.
Featherman, D. L. (1980). ‘Retrospective Longitudinal Research: Methodological Considerations.’ Journal of Economics and Business 32: 152–169.
Ferri, E., Bynner, J. and Wadsworth, M. (2003). Changing Britain, changing lives: three generations at the turn of the century, London: Institute of Education.
Foster, G. M., Scudder, T., Colson, E. and Kemper, R. (1979). Long-term field research in social anthropology, New York: Academic Press.
France, A., Bendelow, G. and Williams, S. (2000) ‘A “Risky” Business: Researching the Health Beliefs of Children and Young People,’ in Lewis, A. and Lindsay, G. (eds) Researching children’s perspectives, Buckingham: Open University Press, pp. 231–263.
Gainey, R. R., Payne, B. K. and O’Toole, M. (2000). ‘The Relationship Between Time in Jail, Time on Electronic Monitoring, and Recidivism: an Event History Analysis of a Jail-Based Program.’ Justice Quarterly 17,4: 733–752.
Gergen, K. J. (1984). ‘An Introduction to Historical Social Psychology,’ in Gergen, K. J and Gergen, M. M. (eds) Historical social psychology, London: NJ: Lawrence Erlbaum Associates.
Giele, J. Z. (1998). Innovation in the typical life course. Methods of life course research: qualitative and quantitative approaches. J. Z. Giele and G. H. Elder. London: Sage, pp. 231–263.
Giele, J. Z. and Elder, G. H. (1998). Methods of life course research: qualitative and quantitative approaches, Thousand Oaks, CA: Sage.
Gilligan, C. (1993). In a Different Voice: Psychological Theory and Women’s Development, Cambridge, MA: Harvard University Press.
Goldthorpe, J. H., Lockwood, D., Bechofer, F. and Platt, J. (1968). The affluent worker in the class structure, Cambridge: Cambridge University Press.
Gorell-Barnes, L. G., Thompson, P., Barnes, P., Daniel, G. and Burchardt, N. (1998). Growing up in stepfamilies. Oxford: Oxford University Press.
Gordon, T., Holland, J. and Lahelma, E. (2000). Making spaces: Citizenship and difference in schools, London: Macmillan.
Gubrium, J. F. and Holstein J. A. (1995). ‘Individual Agency, The Ordinary and Postmodern Life.’ Sociological Quarterly 36,3: 555–570.
Gulbrandsen, L. M. (2003). ‘Peer Relations as Arenas for Gender Constructions Among Young Teenagers.’ Pedagogy, Culture and Society 11,1: 113–132.
Halbwachs, Maurice (1992). On collective memory. Translated and edited by Lewis A. Coser. Chicago: University of Chicago Press.
Halpin, B. and Wing Chan, T. (1998). ‘Class Careers as Sequences: an Optimal Matching Analysis of Work-Life Histories.’ European Sociological Review 14,2: 111–130.
Heaton, J. (2000). Secondary analysis of qualitative data: a review of the literature, Full Research report ESRC 1752 (8.00), Social Policy Research Unit, University of York.
Heaton, J. (2004). Re-working qualitative data, London: Sage.
Heaton, T. B. and Call, V. R. A. (1995). ‘Modeling Family Dynamics with Event History Techniques.’ Journal of Marriage and the Family 57: 1078–1090.
Henderson, S., Holland, J., McGrellis, S., Sharpe, S. and Thomson, R. (2007). Inventing adulthood: A biographical approach to youth transitions, London: Sage.
Hinds, P., Vogel, R. and Clarke-Steffen, L. (1997). ‘The Possibilities and Pitfalls of Doing a Secondary Analysis of a Qualitative Data Set.’ Qualitative Health Research 7,3: 408–424.
Holland, J., Thomson, R. and Henderson, S. (2004). Feasibility study for a possible qualitative longitudinal study, Specification and Discussion Paper for Economic and Social Research Council, UK.
Hughes, C. and Dunn, J. (2002). ‘“When I Say a Naughty Word”. A Longitudinal Study of Young Children’s Accounts of Anger and Sadness in Themselves and Close Others.’ British Journal of Developmental Psychology 20, 515–535.
Jacobs, S. C. (2002). ‘Reliability and Recall of Unemployment Events Using Retrospective Data.’ Work, Employment and Society 16,3: 537–548.
Kemper, R. and Royce, A. P. (eds) (2002) Chronicling cultures: Long-term field research in anthropology, Walnut Creek, CA: AltaMira.
Kuhn, T. and Witzel, A. (2000). School-to-work Transition, Career Development and Family Planning – Methodological Challenges and Guidelines of a Qualitative Longitudinal Panel Study. Forum: Qualitative Social Research 1, 2: http://www.qualtative–research.net/fqs-texte/2-00/2-00kuehnwitzel-e.htm
Lancaster, T. (1990). The econometric analysis of transition data, Cambridge: Cambridge University Press.
Laub, J. H. and Sampson, R. J. (1998). ‘Integrating Quantitative and Qualitative Data,’ in Giele, J. Z. and Elder, G. H. (eds) Methods of life course research: qualitative and quantitative approaches, Thousand Oaks, CA: Sage, pp. 213–230.
Laub, J. H. and Sampson, R. J. (2003). Shared beginnings, divergent lives: Delinquent boys to age 70, Cambridge, MA: Harvard University Press.
Lewis, J. (2005). ‘Qualitative Longitudinal Data for Evaluation Studies,’ SPRU (University of York) and CASP (University of Bath), Friends Meeting House, London 11th November 2005.
Lynd, R. and Lynd, H. M. (1929). Middletown. A study in American Culture, New York: Harcourt Brace.
Lynd, R. and Lynd, H. M. (1935). Middletown in transition: A study of cultural conflicts, New York: Harcourt Brace.
Mannheim, Karl (1956). ‘On the Problem of Generations,’ in Essays on the sociology of culture. New York: Oxford University Press.
Mason, J. (2002) (2nd edn.). Qualitative researching, London: Sage.
Mauthner, N., Parry, O. and Backett-Milburn, K. (1998). ‘The Data are Out There, or are They? Implications for Archiving and Revisiting Qualitative Data.’ Sociology 32,4: 733–745.
McGonagle, K. A. and Schoeni, R. F. (2006). ‘The Panel Study of Income Dynamics: Overview & Summary of Scientific Contributions After Nearly 40 Years.’ Retrieved March 2006, from http://psidonline.isr.umich.edu/Publications/Papers/montrealv5.pdf
McLeod, J. (2003). ‘Why We Interview Now – Reflexivity and Perspective in a Longitudinal Study.’ International Journal of Social Research Methodology 6,3: 223–232.
McLeod, J. and Yates, L. (2006). Making modern lives: Subjectivity, schooling and social change, Albany: State University of New York Press.
Miles, M. B. and Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook (2nd edn), London: Sage.
Molloy, D. and Woodfield, K. with Bacon, J. (2002). Longitudinal qualitative research approaches in evaluation studies, Working Paper No. 7, London: HMSO.
Montgomery, S. M., Bartley, M. J., Cook, D. G. and Wadsworth, M. (1996). ‘Health and Social Precursors of Unemployment in Young Men in Great Britain.’ Journal of Epidemiology and Community Health 50, 415–422.
Montgomery, S. M., Cook, D. G., Bartley, M. J. and Wadsworth, M. (1999). ‘Unemployment Pre-dates Symptoms of Depression and Anxiety Resulting in Medical Consultation in Young Men.’ International Journal of Epidemiology 28,1: 95–100.
Morse, J. M. (1994). ‘Designing Funded Qualitative Research,’ in Denzin, N. L. and Lincoln, Y. S. (eds) Handbook of qualitative research, London: Sage.
Mott, F. (2002). ‘Looking Backward: Post hoc Reflections on Longitudinal Surveys,’ in Phelps E., Furstenberg, F. and Colby A. (eds) Looking at lives: American longitudinal studies of the twentieth century, New York: Russell Sage.
Mumford, K. and Power, A. (2003). East Enders: Family and community in East London, Bristol: Policy Press.
Neale, B. and Flowerdew, J. (2003). ‘Time, Texture and Childhood: The Contours of Longitudinal Qualitative Research.’ International Journal of Social Research Methodology: Theory and Practice 6,3: 189–199.
Ni Bhrolchain, M., Chappell, R. and Diamond, I. (1994). ‘Educational and Socio-demographic Outcomes Among Children of Disrupted and Intact Marriages.’ Population 36: 1585–1612.
Olsen, R. J. (2005). ‘The Problem of Respondent Attrition: Survey Methodology is Key.’ Monthly Labor Review 128,2: 63–70.
Parry, O. and Mauthner, N. (2004). ‘Whose Data are They Anyway? Practical, Legal and Ethical Issues in Archiving Qualitative Data.’ Sociology 38,1: 139–152.
Patton, M. Q. (1990). Qualitative evaluation and research methods (2nd ed.), Newbury Park, CA: Sage.
Pink, S. (ed.) (2004a). Visual images, London: Routledge.
Pink, S. (2004b). Home truths: Gender, domestic objects and the home, Oxford: Berg.
Plumridge, L. (2001). ‘Rhetoric, Reality and Risk Outcomes in Sex Work.’ Health, Risk and Society 3,2: 119–215.
Plumridge, L. and Thomson, R. (2003). ‘Longitudinal Qualitative Studies and the Reflexive Self.’ International Journal of Social Research Methodology 6,3: 213–222.
Pollard, A. and Filer, A. (1999). The social world of pupil career: Strategic biographies through primary school, London: Cassell.
Pollard, A. and Filer, A. (2002). Identity and secondary schooling project. Full report to the ESRC.
Qualitative Sociology (Spring 1997) 20 (1) Special Issue: Visual methods in sociological analysis.
Ritchie, J. and Lewis, J. (2003). Qualitative research practice: A guide for social science students and researchers, London: Sage.
Ronai C. R. and Cross, R. (1998). ‘Dancing With Identity: Narrative Resistance Strategies of Male and Female Stripteasers.’ Deviant Behaviour 19: 99–119.
Royce, A. P. (1977). The anthropology of dance, Bloomington: Indiana University Press.
Royce, A. P. (1982). Ethnic identity: strategies of diversity, Bloomington: Indiana University Press.
Royce, A. P. (1993). ‘Ethnicity, Nationalism, and the Role of the Intellectual,’ in Toland, Judith D. (ed.) Ethnicity
and the state, political and legal anthropology, Vol. 9, New Brunswick, NJ: Transaction Press, pp. 103–122.
Royce, A. P. (2002). ‘Learning to See, Learning to Listen: Thirty-five Years of Fieldwork with the Isthmus Zapotec,’ in Kemper, R. V. and Royce, A. P. (eds) Chronicling cultures: Long-term field research in anthropology, Walnut Creek: Altamira Press, pp. 8–33.
Royce, A. P. (2005). ‘The Long and the Short of it: Benefits and Challenges of Long-Term Ethnographic Research.’ Paper presented at Principles of Qualitative Longitudinal Research: An International Seminar, University of Leeds, UK, September 30, 2005.
Ruspini, E. (2002). Introduction to longitudinal research, London: Routledge.
Ryder, N. B. (1965). ‘The Cohort as a Concept in the Study of Social Change.’ American Sociological Review 30: 843–861.
Saldana, J. (2003). Longitudinal qualitative research: Analyzing change through time, Walnut Creek, Lanham, New York, Oxford: Altamira Press.
Saldana, J. (2005). ‘Coding Qualitative Data to Analyze Change.’ Paper presented at Principles of Qualitative Longitudinal Research: An International Seminar, University of Leeds, UK, September 30, 2005.
Sampson, R. J. and Laub, J. H. (1993). Crime in the making: pathways and turning points through life, Cambridge, MA: Harvard University Press.
Savage, M. and Egerton, M. (1997). ‘Social Mobility, Individual Ability and the Inheritance of Class Inequality.’ Sociology 31,4: 465–472.
Schoon, I. and Parsons, S. (2002) ‘Competence in the Face of Adversity: The Impact of Early Family Environment and Long-term Consequence.’ Children & Society 16,4, 260–272.
Scott, J. and Alwin, D. (1998). ‘Retrospective Versus Prospective Measurement of Life Histories in Longitudinal Research,’ in Giele, J. Z. and Elder, G. H. (eds) Methods of life course research: qualitative and quantitative approaches, Thousand Oaks, CA: Sage, pp. 98–127.
Scudder, T. and Colson, E. (1979). ‘Long-term Research in Gwembe Valley, Zambia,’ in Foster G. M., Scudder, T., Colson, E. and Kemper R. V. (eds) Long-term field research in social anthropology, New York: Academic Press, pp. 277–254.
Scudder, T. and Colson, E. (2002) ‘Long-term Research in Gwembe Valley, Zambia,’ in Kemper, R. V. and Royce, A. P. (eds) Chronicling cultures: Long-term field research in Anthropology, Walnut Creek, CA: AltaMira, pp. 197–238.
Sheridan, Dorothy (2000). ‘Reviewing Mass-Observation: The Archive and its Researchers Thirty Years on,’ Forum Qualitative Sozialforschung/Forum: Qualitative Social Research, 1,3. Available at: http://qualitative-research.net/fqs/fqs-eng.htm
Singer, B., C. D. Ryff, D. Carr and Magee, W. J. (1998). ‘Linking Life Histories and Mental Health: A Person Centred Strategy.’ Sociological Methodology 28: 1–51.
Smith, D. J. and McVie, S. (2003). ‘Theory and Method in the Edinburgh Study of Youth Transitions and Crime.’ British Journal of Criminology 43,1: 169–195.
Stacey, M. (1960). Tradition and change: A study of Banbury, Oxford: Oxford University Press.
Stacey, M., Batstone, E., Bell, C. and Murcott, A. (1975). Power, persistence and change: A second study of Banbury, London: Routledge & Kegan Paul.
Stovel, K., Savage, M. and Bearman, P. (1996). ‘Ascription into Achievement: Models of Career Systems at Lloyds Bank, 1890–1970.’ American Journal of Sociology 102,2: 358–399.
Taris, T. W. (2000). A primer in longitudinal data analysis, London: Sage.
Thomson, R. (2007). ‘The QL “Case History”: Practical, Methodological and Ethical Reflections.’ Social Policy and Society 6,4.
Thomson, R. and Holland, J. (2003). ‘Hindsight, Foresight and Insight: The Challenges of Longitudinal Qualitative Research.’ International Journal of Social Research Methodology 6,3: 233–244.
Thomson, R. and Holland, J. (2005). ‘“Thanks for the Memory”: Memory Books as a Methodological Resource in Biographical Research.’ Qualitative Research 5,2: 201–291.
Tuma, N. B. and Hannan, M. T. (1979). ‘Dynamic Analysis of Event Histories.’ American Journal of Sociology 84,4: 820–854.
Vogt, E. Z. (1957). ‘The Acculturation of the American Indians.’ Annals of American Academy of Political and Social Science 311: 137–146.
Vogt, E. Z. (1969) Zinacantan: A Maya community in the Highlands of Chiapas, Cambridge, MA: Belknap Press of Harvard University Press.
Vogt, E. Z. (1994). Fieldwork among the Maya: Reflections on the Harvard Chiapas Project, Albuquerque: University of New Mexico Press.
Vogt, E. Z. (2002). ‘The Harvard Chiapas Project; 1957–2000,’ in Kemper, R. and Royce, A. P. (eds) Chronicling cultures: Long-term field research in anthropology, Walnut Creek, CA: AltaMira, pp. 135–159.
Wajcman J. and Martin B. (2002). ‘Narratives of Identity in Modern Management: the Corrosion of Gender Difference?’ Sociology 36: 985–1002.
Walkerdine, V. and Lucey, H. (1989). Democracy in the kitchen: Regulating mothers and socialising daughters, London: Virago.
Walkerdine, V., Lucey, H. and Melody, J. (2001). Growing up girl: Psychosocial explorations of gender and class, Houndmills: Palgrave.
Ward, J. and Henderson, Z. (2003). ‘Some Practical and Ethical Issues Encountered While Conducting Tracking Research with Young People Leaving the “Care” System.’ International Journal of Social Research Methodology 6,3: 255–259.
Warwick, D. and Littlejohn, G. (1992). Coal, capital and culture: A sociological analysis of mining communities in West Yorkshire, London: Routledge.
Webb, C. (1996) ‘To Digital Heaven? Preserving Oral History Recordings at the National Library of Australia.’ Staff paper, http://www.nla.gov.au/nla/staffpaper/archive/index1996.html
White, R. and Wyn, J. (2004). Youth and society: Exploring the social dynamics of youth experience, Oxford: Oxford University Press.
Whyte, W.F. (1943 2nd edition 1955). Street Corner Society: The social structure of an Italian slum, Chicago: University of Chicago Press.
Wirth, L. (1938). ‘Urbanism as a Way of Life.’ American Journal of Sociology, 44: 1–24.
Wolcott, H. F. (1994). Transforming qualitative data: Description, analysis, and interpretation, Thousand Oaks, CA: Sage.
Woodgate, R., Degner, L. and Yanofsky, R. (2003). ‘A Different Perspective to Approaching Cancer Symptoms in Children.’ Journal of Pain and Symptom Management, 26,3: 800–817.
Wu, L.L. (2000). ‘Some Comments on “Sequence Analysis and Optimal Matching Methods in Sociology: Review and Prospect”.’ Sociological Methods and Research 29,1: 41–64.
Yamaguchi, K. (1991). Event History Analysis. Newbury Park, CA: Sage.
Yates, L. and McLeod, J. (1996). ‘“And How Would You Describe Yourself?” Researchers and Researched in the First Stages of a Qualitative, Longitudinal Research Project.’ Australian Journal of Education 40,1: 88–103.
Yates, L., McLeod, J. and Arrow, M. (2002). Self, school and the future: The 12 to 18 Project, University of Technology, Sydney, Changing Knowledges Changing Identities Research Group.
Young, M. and Willmott, P. (1957). Family and kinship in East London, London: Routledge and Kegan Paul.
15
Comparative and Cross-National Designs
David de Vaus
It can be argued that virtually all social research is comparative in that descriptions and explanations are derived from comparisons of groups, cases, periods or some other unit of analysis (Przeworski and Teune 1966). This chapter focuses on one type of comparative research – that which is based on cross-national comparisons. The discussion concentrates on two main matters.

First, it outlines the nature and purpose of comparative cross-national research designs and how this broad design relates to other major types of research design. The purpose of this discussion is to argue that while most research can be considered comparative, there are quite distinctive elements of comparative cross-national research that deserve special attention.

The second goal of the chapter is to describe and evaluate two broad forms of comparative cross-national research – case based and survey based. Apart from demonstrating that comparative cross-national designs come in two main forms, the purpose of this discussion is to show that most of the problems encountered by researchers engaged in cross-national comparative research are confronted in one way or another by those in other forms of research.

PART 1: WHAT IS COMPARATIVE CROSS-NATIONAL RESEARCH?

While the chapter is restricted to cross-national comparative research, even this focus is not without its definitional problems. As we shall see, one of the purposes of cross-national research is to assess the role of culture in shaping outcomes. The problem in comparing nations is that nations and cultures are not synonymous. On the one hand, many countries consist of quite distinct cultures within the same national border while, on the other, a single culture is not necessarily constrained by national borders (see discussion p. 258).

Types of research design

At its simplest, cross-national comparative research is research in which nations are compared on some dimension (Przeworski
and Teune 1966). The purpose of cross-national comparisons may either be simply to describe national differences or to draw on the logic of comparisons to explain cross-national similarities and differences. This chapter focuses on explanatory forms of comparative cross-national designs.

To understand the place of cross-national comparative designs within social science methods it is useful to review Smelser’s (1972) fourfold classification¹ of methodological approaches.

The first approach is the experimental method, which Smelser, like many others, regards as the gold standard in research. The simplest experimental design involves the comparison of two groups at two time points. Initially these two groups are identical, a condition that is achieved by random allocation of cases to the two groups. Initial measures on an outcome variable are obtained from both groups prior to one of the groups (the experimental group) being exposed to an experimental intervention. The other group (the control group) is not exposed to the intervention. At some point following the intervention both groups are remeasured on the outcome variable. The effect of the intervention is measured by comparing the amount of change in the experimental group with that in the control group. Any significant difference in the amount of change between the two groups is attributed to the effect of the intervention since, ideally, this is the only difference between the two groups.

For ethical and practical reasons, the experimental method cannot be used for most social science research. This has led to many social scientists adopting what Smelser calls the statistical method. The logic of the statistical method is to simulate important aspects

groups. Statistical techniques enable investigators to control or remove these differences to ensure group equivalence on specified characteristics.

Suppose a study was being planned to assess the impact of divorce on the educational performance of children. This would involve comparing comparable children from intact and divorced families. However, since children from certain types of circumstances are more likely than others to experience parental divorce it is necessary to distinguish between the effect of divorce and these other circumstances. This is achieved by statistically removing the effect of these other differences to then assess the impact of divorce – other things being equal. Statistical controls are an attempt to simulate the effect of random allocation to groups that is used in the experimental method.

A third approach outlined by Smelser is the comparative method. This approach can also be understood as simulating some of the features of the experimental method. This approach will be discussed in detail in Part 2.

The fourth approach that Smelser identifies is the case study method. This method can consist of either single cases or multiple cases. Where multiple case studies are used the logic of the case study method can be similar to that of the comparative method as outlined by Smelser.

While it is useful to view comparative cross-national designs within this framework of experimental, statistical, comparative and case study designs this framework does not fully incorporate all the work covered by comparative or cross-national studies. Many studies that involve some comparisons between nations and cultures fit more readily under the heading of the statistical method. I will argue, along with Ragin, that there are at
of the experimental method by ensuring least two different approaches to comparative
that the groups that are comparable are as research – what Ragin (1987) calls the
similar as possible except in relation to the variable-based and the case-based methods.
causal and outcome variables. The statistical The variable-based method is equivalent to
method relies on multivariate analysis to the statistical method outlined by Smelser
compare groups that differ in regard to the and the case-based method is similar to
key independent variables and statistically to Smelser’s description of the comparative
remove other relevant differences between method.
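The logic of the simplest experimental design reviewed in this section (two groups measured at two time points) can be sketched as a small calculation. This is only an illustrative sketch: the scores are invented, and the function names mean_change and intervention_effect are hypothetical labels, not part of Smelser’s account.

```python
from statistics import mean

# Hypothetical outcome scores, invented purely for illustration.
experimental = {"before": [50, 52, 48, 51], "after": [58, 61, 55, 60]}
control = {"before": [49, 53, 50, 49], "after": [51, 54, 52, 50]}


def mean_change(group):
    """Average change in the outcome between the two time points."""
    return mean(group["after"]) - mean(group["before"])


def intervention_effect(experimental_group, control_group):
    """Change in the experimental group over and above change in the control group."""
    return mean_change(experimental_group) - mean_change(control_group)


effect = intervention_effect(experimental, control)
print(effect)
```

Because random allocation makes the groups equivalent at the outset, any change shared by both groups is subtracted out and the remainder is attributed to the intervention. The statistical method tries to approximate this equivalence after the fact.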
Universal and particular
In 1963 Bendix summarized the role of comparative cross-national research as follows:
    Comparative sociological studies represent an attempt to develop concepts and generalizations at a level between what is true of all societies and what is true of one society at one point in time and space. (1963, p. 532)
This distinction is reflected in universalist and culturalist approaches to explanation to which writers such as Hantrais (1999) and Kohn (1987) draw attention.
Universalist approaches are those that search for general laws or uniform patterns (nomothetic explanation) that apply in all situations regardless of the cultural context being investigated. Ragin and Zaret (1983) refer to this approach as one in which the investigator seeks to identify ‘permanent causes’. Using this approach, which was relatively popular in comparative studies in the 1950s and 1960s, the purpose of comparative cross-national research was to identify the commonalities across cultures and countries and thus to establish the universality of particular phenomena. Examples of this approach were those studies that sought to demonstrate principles such as the universality of the nuclear family, the incest taboo or the iron law of oligarchy.
The culturalist approach stands in direct contrast. It stresses the uniqueness of each event and circumstance and emphasizes the particular and unique set of historical and cultural conditions that lead to specific events and outcomes (idiographic explanation). This approach rejects the idea of being able to identify general patterns of behaviour or law-like principles that operate independently of their specific cultural and historical context.
A more useful approach takes a midway position between these extremes. Ragin and Zaret (1983) and Hantrais (1999) argue that this middle position is what makes comparative cross-national research unique and what, in different ways, characterized the vision and work of both Weber and Durkheim as founders of sociology. Both Weber and Durkheim saw comparative sociology as a way of moving beyond the atheoretical focus on detail that characterized traditional history and the sweeping generalizations of the social philosophers (Ragin and Zaret 1983, p. 731). This midway approach argues that particular phenomena in any society can be the outworking of more or less universal principles and of the particular cultural and historical circumstances within which the phenomenon is placed.
Taking this approach, the contribution of comparative cross-national research is to identify the extent to which social phenomena are shaped by universal system factors and the extent to which they are shaped by unique factors intrinsic to the specific time, place and culture in which they occur.
PART 2: CASE-BASED CROSS-NATIONAL COMPARISONS
Ragin (1987) distinguishes between case-based comparative research and variable- (or survey-) based comparative research. While these two manifestations of comparative research may apply to a variety of types of comparative research, the distinction is an apt way of describing the two major forms of cross-national comparative research. This section outlines the nature and logic of case-based cross-national comparative research while Part 3 will discuss survey-based cross-national comparative research.
Case-based cross-national comparative research is closest to the approach described above by Smelser as the comparative method and is similar to the multiple case study approach described by Yin (1989). It should be stressed, however, that the comparative cross-national method does not encompass all comparative cross-national research.
Case-based comparative cross-national research is distinguished by two features: the way in which it seeks to understand cases and the logic of the causal analysis employed.
Understanding
Case-based comparative cross-national designs seek to understand elements of a country (case) within the context of the whole case. They adopt a cultural and interpretive model in that it is taken for granted that any behaviour, attitude, indicator or event can only be understood within its historical, cultural and social context. Thus, rather than having a uniform meaning across all countries, the act of voting, living alone or civil unrest can only be understood within the context of its history, culture and society.
Case-based comparative cross-national research is based on the view that the whole is greater than the sum of the parts and that parts cannot be understood without reference to the whole. Rather than proceeding by isolating and measuring discrete variables in each country, case-based designs seek to build a rounded understanding of each country regarding the phenomenon being investigated. Each case (country) is treated as a unit in its own right that deserves to be understood as a coherent whole rather than simply the site to which variables are somehow attached.
Once each case is understood as a whole, causal analysis proceeds by then comparing the cases.
Causal explanation
The similarity or difference of selected countries lies at the heart of the logic by which comparative cross-national designs identify causes and develop explanations. In some comparative cross-national designs countries are selected for comparison because they are similar to one another in important respects. In other designs countries are selected for comparison specifically because they differ from each other.
Kohn (1987) argues that one of the key contributions of comparative cross-national designs is that they can help distinguish between phenomena that stem from universal principles and those that result from particular historical and cultural contexts. Slomczynski et al. (1981) argue that:
    Insofar as cross-national analysis of social structure and personality yield similar findings in the countries studied, our interpretation can ignore whatever differences there may be in the cultures, political and economic systems, and historical circumstances of the particular countries, to deal instead with social structural universals. But when the relationships between social structure and personality differ from country to country, then we must look to what is idiosyncratic about particular countries for our interpretation. (p. 740, my emphasis)
Method of agreement (different cases)
This form of comparative cross-national design is built around comparing countries using the logic of J.S. Mill’s Method of Agreement. He formulated this method as follows:
    If two or more of the phenomenon [countries] under investigation have only one circumstance in common, the circumstance in which alone all the instances agree, is the cause (or effect) of the given phenomenon. (Mill 1879, Vol. 1, p. 451)
For the current purpose this may be translated as:
    If two or more countries being compared display the same phenomenon (e.g. high rates of solo living) and these countries share only one other characteristic in common (e.g. high levels of prosperity) then that characteristic is the cause of the phenomenon they have in common (i.e. high rates of living alone).
This means that, apart from the phenomenon to be explained, where countries differ in all respects but one, the one factor they have in common is the cause of the phenomenon. This idea is expressed diagrammatically in Table 15.1.

Table 15.1 Method of agreement

Case       Y                 X1           X2                  X3                   X4
           High rate of      Prosperity   High value placed   Low levels of        Ample housing suitable
           solo living                    on privacy          family solidarity    for solo living
Country A  1                 1            1                   0                    1
Country B  1                 1            0                   1                    0

All variables in this example are dichotomous and are coded 0 and 1. 0 = not present; 1 = present.

In this case Countries A and B display similar behaviours (Y). On the basis that ‘a circumstance that is not common to all instances cannot, by definition, be causally related to it’ (Cohen and Nagel 1934), the only causal factor identified above is X1 (Prosperity) because this is the only common factor between the cases. The countries differ on each of the other characteristics, so these characteristics could not be responsible for the common outcome.
When comparative cross-national analysis uses this reasoning it usually proceeds by beginning with the observation of the same behaviour across countries (e.g. that the countries share a high rate of solo living) and then seeks the single characteristic that the countries have in common that could explain this common behaviour.
This form of reasoning has important shortcomings which mean it must be used with care.
First, it is impossible to list and compare every possible characteristic of two countries. The method can, at best, concentrate on comparing relevant characteristics – in this case, characteristics that might affect national rates of solo living. But the selection of such factors is inevitably driven by theory or previous research and therefore risks missing factors not considered by the theories.
Second, the method is biased towards the concept of mono-causation – that an outcome has a single cause. In social life this is by no means true and many phenomena can have both multiple and alternative causes. While the example in Table 15.1 is consistent with prosperity (X1) being a cause of living alone rates, it certainly does not demonstrate that it is the only cause. It may be the only cause identified within a limited set of factors but the method cannot, in reality, exhaustively eliminate all other factors.
Third, the Method of Agreement is completely unable to identify interaction effects or what is called ‘chemical causation’ (Mill 1879, Vol. 8, pp. 204–8). That is, some effects will take place only when two characteristics are present in a particular combination. For example, there is nothing in the example in Table 15.1 to preclude the argument that prosperity plus a high value placed on privacy or prosperity plus low levels of family solidarity result in high levels of solo living.
A final problem is the level of abstraction at which concepts are used. This point can be illustrated by the story of a man who, one evening, drank a great deal of scotch and soda and woke up the next morning with a hangover. The next evening he drank a great deal of brandy and soda and again woke up with a hangover. After drinking gin and soda the next evening and subsequently waking up with a hangover he concluded that the soda was causing the hangover. While this reasoning may appear logical by this method, the reasoning is flawed because of the conceptualization of the variables and the failure to recognize the common element of scotch, brandy and gin. Similarly, conceptualizing characteristics of a country at a highly specific level can cause an investigator to miss more abstract features that countries have in common. Alternatively, conceptualizing country characteristics at too general a level (e.g. democratic) may cause one to overstate the degree of similarity between the countries – a problem described by Ragin as the problem of ‘illusory commonality’.
However, for all its dangers, the Method of Agreement can play a useful role by eliminating possible explanations. If ‘nothing can be the cause of a phenomenon which is not a common circumstance in all instances of a phenomenon’ (Cohen and Nagel 1934), the Method of Agreement can be used to eliminate explanations that do not meet this criterion.
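The eliminative use of the Method of Agreement can be sketched in a few lines of Python using the coding of Table 15.1. This is a minimal illustration rather than an established procedure: the function name method_of_agreement and the dictionary layout are assumptions made here, and ‘common circumstance’ is read narrowly as a characteristic that is present (coded 1) in every country showing the outcome.

```python
# Table 15.1 as data: 1 = present, 0 = not present.
TABLE_15_1 = {
    "Country A": {"Y": 1, "X1": 1, "X2": 1, "X3": 0, "X4": 1},
    "Country B": {"Y": 1, "X1": 1, "X2": 0, "X3": 1, "X4": 0},
}


def method_of_agreement(cases, outcome="Y"):
    """Return the characteristics present in every case that shows the outcome.

    Anything not shared by all such cases is eliminated as a possible cause.
    """
    positive = [row for row in cases.values() if row[outcome] == 1]
    if not positive:
        return []
    characteristics = [name for name in positive[0] if name != outcome]
    return [c for c in characteristics if all(row[c] == 1 for row in positive)]


both_countries = method_of_agreement(TABLE_15_1)
one_country = method_of_agreement({"Country A": TABLE_15_1["Country A"]})

print(both_countries)  # ['X1']
print(one_country)     # ['X1', 'X2', 'X4']
```

With both countries only prosperity (X1) survives elimination, whereas a single case leaves three candidates standing. The sketch inherits the shortcomings listed above: it considers only the characteristics that were coded, and it cannot detect causes that operate only in combination.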
Method of difference (similar cases)
Comparative cross-national studies that rely on the Method of Difference proceed by focusing on countries that differ in regard to the outcome (e.g. rates of solo living) and seek to find the one, and only one, other difference between those countries (Lijphart 1975). Where only one difference between countries can be found (apart from the outcome) and this difference corresponds to country differences on the outcome variable, this characteristic is regarded as the cause or explanation of the outcome. This scenario is represented in Table 15.2 and is exemplified by Lijphart (1971), who argues that comparative design must be based on selecting comparable or similar cases.

Table 15.2 Method of difference

Case       Y                 X1           X2                  X3                   X4
           High rate of      Prosperity   High value placed   Low levels of        Ample housing suitable
           solo living                    on privacy          family solidarity    for solo living
Country X  1                 1            1                   0                    1
Country Y  0                 1            0                   0                    1
Country Z  0                 1            0                   0                    0

All variables in this example are dichotomous and are coded 0 and 1. 0 = not present; 1 = present.

Ideally, when using the Method of Difference, all cases will be identical on each of the potential explanatory variables except for the actual causal variable. Where countries share the same characteristic a potential explanatory variable can be regarded as controlled. For example, in Table 15.2 the countries have different levels of solo living (Y) even when they are equally prosperous (X1) and have equal levels of family solidarity (X3). Therefore, these controlled variables cannot be causes of the variations in solo living.
The only explanatory variable that has the same pattern of variation as the outcome variable in Table 15.2 is the extent to which privacy is valued in the culture (X2). In this example, it would be concluded that X2 is the cause of differences in Y. The availability of suitable housing (X4) would not be regarded as a cause of solo living since its variation between countries does not match variations in the rates of solo living.
This form of reasoning is the analogue of that used in the classic experiment. In experimental designs random allocation or matching are used to ensure that the control and experimental groups are identical on all variables except for the exposure to the experimental intervention (de Vaus 2001).
A similar logic applies to comparative cross-national designs where similar cases are selected. By selecting countries that have similar cultural, political, economic and historical circumstances the aim is to control for these factors. If the countries then differ in relation to the phenomenon under investigation (e.g. rates of solo living) it is argued that the different rates of solo living cannot be attributed to the cultural, political, economic and historical characteristics that the countries have in common. In other words, differences between countries cannot be attributed to characteristics that the countries have in common.
Of course it is impossible to select countries that are identical in all respects but one. In selecting countries the investigator will select countries that are similar in relevant respects – that is, similar in regard to factors that are potentially relevant to the phenomena to be explained.
However, since many unobserved differences will persist, it is impossible to know if there are factors that have been missed that explain the variation in the outcome variable. While this shortcoming is important, it is no more serious than in all survey-based statistical studies where all conclusions are based on models that contain just a small subset of the possibly relevant variables.
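The reasoning applied to Table 15.2 can be sketched in the same way. Again this is only an illustration under stated assumptions: the function name method_of_difference and the three labels it returns are hypothetical, and a characteristic counts as a candidate cause only if its coding matches the outcome in every country (an inverse match is ignored for simplicity).

```python
# Table 15.2 as data: 1 = present, 0 = not present.
TABLE_15_2 = {
    "Country X": {"Y": 1, "X1": 1, "X2": 1, "X3": 0, "X4": 1},
    "Country Y": {"Y": 0, "X1": 1, "X2": 0, "X3": 0, "X4": 1},
    "Country Z": {"Y": 0, "X1": 1, "X2": 0, "X3": 0, "X4": 0},
}


def method_of_difference(cases, outcome="Y"):
    """Sort characteristics into controlled, candidate and mismatched.

    controlled: identical in every country, so cannot explain variation in Y
    candidate:  varies across countries exactly as the outcome does
    mismatched: varies, but not in step with the outcome
    """
    rows = list(cases.values())
    result = {"controlled": [], "candidate": [], "mismatched": []}
    for name in rows[0]:
        if name == outcome:
            continue
        values = [row[name] for row in rows]
        if len(set(values)) == 1:
            result["controlled"].append(name)
        elif all(row[name] == row[outcome] for row in rows):
            result["candidate"].append(name)
        else:
            result["mismatched"].append(name)
    return result


classification = method_of_difference(TABLE_15_2)
print(classification)
```

For Table 15.2 this treats X1 and X3 as controlled, X2 as the only candidate and X4 as mismatched, in line with the discussion above. As with the Method of Agreement, the result is only as good as the list of characteristics that were coded in the first place.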
Difficulties with case-based comparative cross-national designs
Since case-based comparative cross-national designs largely rely on the logic of the Methods of Agreement and Difference, the method encounters the problems inherent in these methods. Some of the problems associated with each of these particular forms of reasoning have been discussed above.
There are additional difficulties with case-based comparative cross-national designs that apply to such studies regardless of whether the Method of Agreement or the Method of Difference is applied.
Reliance on categorical classifications
The Methods of Agreement and Difference are based on simple categorical classifications in which countries are classified as being similar or different. Whether or not countries are classified as similar or different can have a profound influence on the conclusions drawn from a comparative design. But for many variables similarity and difference is a matter of degree. At what point in continuous measures (e.g. rate of solo living) are two countries to be defined as similar? There is a danger that where the design requires that countries are similar the operational definition of ‘similar’ becomes so broad that countries that are quite heterogeneous are nevertheless classified as similar. The reverse can be true when defining countries as different (Lieberson 1992).
Defelice (1986) argues for a more flexible approach to comparative design. Rather than classifying cases as similar or different he argues that Mill’s Method of Concomitant Variation should be used so that the similarity and difference of countries is regarded as a continuum rather than a dichotomy. Instead of selecting countries that are only similar or only different he argues that the full range of countries should be selected and ranked in terms of similarity/difference on the independent variables (e.g. rapidity of economic development) and the dependent variables (e.g. rapidity of fertility decline). The comparative cross-national analysis then assesses the extent to which the rank ordering of the selected countries corresponds to the rank ordering of the countries on the outcome variable.
Assessing similarity and difference
When assessing similarity or difference it is important to interpret the meaning of indicators within their social and cultural context. For example, religion is expressed differently in different cultures. In one culture high levels of attendance at religious services might reflect religiousness while in another religiousness is expressed in high levels of personal and private piety. A culturally alert approach would investigate what constitutes religiousness in different countries and on that basis select countries that were religious or secular.
The reverse may also be true – something that appears the same in two cultures may have different meanings in different cultures. For example, we may observe increasingly high rates of solo living in a number of countries. However, solo living may mean different things in different countries. In one country it may reflect social breakdown, social isolation and loneliness associated with rapid urbanization. In another country, living alone may reflect an achievement that is only attainable because of a person’s prosperity and may reflect the high value placed on privacy and personal autonomy. To treat living alone in such different contexts as though it was really the same thing would lead to serious misunderstanding and misleading explanations.
Type of causal explanation
A strong version of the causal reasoning that underlies the Methods of Agreement and Difference seeks to identify invariant patterns. Accordingly, any exception means that a particular causal explanation must be rejected. However, such a black and white approach should be avoided. Where a deviant case is inconsistent with a strong pattern, it is better to see what is peculiar about the deviant case. Rather than leading to the rejection of an idea the deviant case can be used to refine the understanding by helping specify the types of conditions under which a pattern applies.
Small numbers
Case-based comparative cross-national analysis that seeks to understand elements of the whole within their historical, cultural, social and economic context is time-consuming and difficult. The method limits the number of countries that can be thoroughly studied. In practice this means that case-based comparative designs frequently compare just two or three countries and this in turn results in the problem of too few cases (Lijphart 1971).
Clearly such a small number of cases precludes statistical generalization. But a small number of cases still allows for generalization based on the logic of replication – the same basis that is employed with most experiments. As findings are replicated and the range of conditions under which they apply are specified by repeated experiments (or comparisons between pairs of countries) the investigator becomes more confident about the results and can specify the range of situations to which they apply (de Vaus 2001).
The other problem with using so few cases is that it becomes difficult to apply the logic of the Methods of Agreement or Difference (Lieberson 1991). With a very small number of cases the patterns can be highly ambiguous and indeterminate. For example, the Method of Agreement relies on finding one common factor across cases. But where only three or four countries are included in a comparative cross-national study there may be many characteristics that such a limited number of countries share. Only through the examination of further cases do patterns of agreement begin to come into focus.
PART 3: SURVEY-BASED CROSS-NATIONAL COMPARATIVE RESEARCH
Survey-based comparative cross-national research employs a variable-based method of comparison. While the case-based approach described above uses variables, these variables are placed and interpreted within the context of the whole case. The initial focus of the case-based approach is to understand the whole country so that specific attributes can be interpreted within the context of the whole. A variable-based approach pays little attention to the whole and largely uses variables without paying attention to the meaning of the attributes in particular cases. Attributes are more or less treated as meaning the same thing regardless of the country in which they are measured.
Two types of comparative survey studies
The two most important types of cross-national survey-based research designs are those in which the country is the unit of analysis and those in which individuals are the unit of analysis.
Country as the unit of analysis
With this design, data are collected about the country at an aggregate level. A set of characteristics of a country is delineated and each country is coded on each of these characteristics so that they are characteristics of nations/cultures rather than of the individuals in the nation/culture.
An example of this type of survey is the Human Relations Area File (http://www.yale.edu/hraf/). For each country or culture, codes are created to indicate the country’s or culture’s characteristics. The Human Relations Area File consists of a large number of variables that capture characteristics of each culture (e.g. kinship rules, marriage rules, language characteristics, religious characteristics, ways of thinking etc.). All these variables reflect the characteristics of the country or culture – not the individuals in the country.
Aggregate data of this type are also used widely by economists, criminologists, political scientists and others in comparative cross-national studies. While the nature of the variables differs the aggregate nature of the variables remains.
Individuals as the unit of analysis
In these survey-based comparative designs data are collected about and from individuals in each country. The profile of each country is the profile derived from the responses of each sample member.
A number of international survey programs ask comparable questions of comparable samples in a variety of countries. Examples of such survey programs include the International Social Science Survey Program (http://www.issp.org/data.htm), the Eurobarometer Survey (http://www.esds.ac.uk/International/access/eurobarometer.asp), the World Fertility Survey (http://opr.princeton.edu/archive/wfs/), the World Values Survey (http://www.worldvaluessurvey.org/) and the European Social Survey (http://naticent02.uuhost.uk.uu.net/index.htm). Of these, the European Social Survey provides the best example of an international program that seeks to deal systematically and rigorously with many of the problems that confront cross-national survey designs.
The analysis of survey data collected in cross-national surveys mainly proceeds by pooling the surveys from each country and testing for relationships between variables in the single dataset. Having established overall patterns, the investigator introduces country as a dummy variable into the statistical modelling to assess the extent of cross-national differences.
If the initial overall patterns remain unaffected by introducing country into the model it is reasonable for the investigator to conclude that the initial patterns are more or less international (universal) rather than national and that the patterns reflect regularities that transcend the particularities of time and place. If, on the other hand, the overall pattern changes once country controls are introduced and if different patterns of data are observed in different countries, the investigator will look to national characteristics to help account for the diverse national patterns.
Limitations of survey research in a comparative context
The survey-based approach to comparative research has a number of important strengths. Survey-based approaches provide the means of obtaining a systematic profile of each country and a formal way of evaluating the extent to which country differences exist. However, the approach encounters important challenges which, unless dealt with, limit the validity of cross-national comparisons. These problems fall into two broad categories: limitations related to the survey method itself and limitations due to the difficulty of obtaining equivalent information from each country.
An inherent bias in country-based surveys of individuals is that such studies interpret the sum of the responses of a sample of individuals as representing the country as a whole. This problem does not refer to the representativeness of samples but to two issues.
Individualistic fallacy
Scheuch (1968) draws attention to the problem of the ‘individualistic fallacy’. This problem, to which cross-national comparative research is especially vulnerable, is the opposite of Robinson’s formulation of the ecological fallacy (1950). Essentially, the individualistic fallacy is the error of assuming that the whole is simply the sum of its parts. It is the error of drawing conclusions about a social unit such as a nation-state on the basis of measurements derived from the individuals in that social unit (Lazarsfeld and Menzel 1961).
For example, while a survey may indicate that most individuals value equality and democratic participation, it does not mean that the country exhibits equality or is democratic. Even though the character of a country may be affected by attitudes of individuals and the attitudes may be influenced by characteristics of the culture, one level (the national level) cannot be read off directly from the individual level. A country is also constituted by many factors including its institutions, its history, its physical environment and its location within larger international structures.
Furthermore, not all individuals contribute equally to shaping the national culture or mood. Verba (1993) suggests a variety of ways in which surveys might try to take into account the uneven impact of different types of individuals in shaping the national picture.
Instability of measurements
Survey research that discovers inter-country differences requires reliable measures. Long ago Scheuch (1968) reminded comparativists that many of the so-called differences between countries were in fact differences of only a few percentage points. To interpret these differences in terms of cultural characteristics requires that these inter-country differences are both real and persistent. However, given the many sources of measurement error in comparative research (see later discussion) it is a brave person who can confidently say that the observed differences between countries reflect real differences and are not simply an artefact of measurement error. Certainly one would want to be assured that the same pattern of inter-country differences persists over time and with alternative measures.
What is to be compared?
One of the purposes of cross-national research is to assess the role of culture in determining various outcomes. The problem confronted by cross-national survey research is that nation and culture are not synonymous. While country provides the frame from which survey data are collected (whether it be at the individual or aggregate level) these national boundaries do not necessarily correspond to cultural boundaries. Scheuch (1989) argues that ‘there exists a German culture … [but] this does not, nor ever did, coincide with the political boundaries of any one political entity’. Rokkan (1970) distinguishes between cross-national, cross-cultural and cross-societal comparisons. Dogan and Pelassy (1984) point out that ‘Juan Linz delineated eight Spains, Erik Allardt four Finlands, and Stein Rokkan as many Norways. Anyone knows that there are three Belgiums, four Italys and five or six Frances’.
The lack of cultural homogeneity of most nations means that it is difficult to infer culture from nation. However, most comparative surveys are based on national boundaries and thus identify national rather than cultural differences. Given national heterogeneity, any differences between countries may be due to the impact of a particular part of a nation rather than any national culture. Indeed cultural variations within a country may even be greater than those between nations and cross-country differences may simply be a statistical artefact. Care therefore is required when interpreting cross-national differences. Variations within countries as well as between countries need to be explored if one is to avoid simplistic attributions of between-country differences to cultural differences.
There are, of course, valid reasons for using national rather than cultural boundaries. National boundaries are clearly defined and relate closely to the available statistical data. They also relate to policy and legislative frameworks and provide a means of evaluating the impact of national laws and policies – matters that are frequently of more interest to governments and funding agencies than the unique impact of particular cultures (Hantrais 1999).
The reverse problem, known as Galton’s problem, can also complicate the interpretation of cross-national differences. ‘Galton’s Problem’ is the problem of interpretation due to cultural diffusion whereby the culture of one country spreads to other countries and creates a degree of uniformity between countries. That is, each country is not truly independent of the other. Where this is the case comparative cross-national analysis may discover uniformity across nations (e.g. family forms or taboos) that is due to cultural diffusion rather than to the operation of universal principles.
Equivalence in cross-national comparisons
The goal of any cross-national survey is to collect data in such a way that any cross-national differences in survey findings can
be attributed to real differences between the countries rather than to differences in data collection methods. There are two key sources of what can be called non-equivalence error in cross-national surveys: the adoption of non-equivalent methodologies and the non-equivalence of the meaning of the data that are collected. The issues of equivalence are covered in some detail in Hantrais (1999).

Methodological equivalence
Cross-national differences in survey results can be due to methodological differences such as non-equivalent samples, data collection methods and coding frames in different countries.
Achieving such equivalence is difficult (Harkness 1999). The European Social Survey (ESS) stands out for the diligence with which it minimizes methodological non-equivalence error in cross-national surveys. By adopting a centralized structure the ESS imposes the same methodology on each of the participating countries. This standardization includes such matters as the organization of the survey group in each country, sampling methods, fieldwork, the ways in which response rates are calculated, the level of survey documentation and many other detailed aspects of conducting and reporting the survey in each country. These detailed specifications are available on the ESS website (http://naticent02.uuhost.uk.uu.net/index.htm).
Since sample design and size affect the error of estimates the ESS provides detailed rules about the way in which samples are obtained and on providing information by which sample quality can be assessed (http://naticent02.uuhost.uk.uu.net/methodology/sampling_strategy.htm).
Methods of administration can affect responses to different types of questions and result in quite different levels of non-response and response bias. A good cross-national survey will therefore specify the mode of data collection and the specifics of exactly how that mode will be implemented (e.g. http://naticent02.uuhost.uk.uu.net/fieldwork/index.htm). Ways of evaluating the quality of the data in each country are required. Common coding frameworks and ways of managing non-equivalent responses (e.g. political party supported) need to be specified. The ESS has made considerable advances in specifying the way in which equivalence in these areas can be achieved.
Until the ESS insisted on conformity to detailed survey requirements and established clear documentation standards, the information required to evaluate whether surveys conducted in different countries were actually comparable was frequently unavailable (Harkness 1999). This in turn has meant that we really do not know whether we can safely compare the data from many multi-country surveys. The use of ESS specifications will provide a major improvement in achieving methodological equivalence in comparative cross-national surveys.
However, even with detailed specifications and rules to achieve equivalence the reality remains that it is difficult to achieve equivalence in the implementation of surveys in different countries (Mitchell 1965). Not only are some countries better equipped to conduct quality surveys, countries vary in the types of sampling frames that are available, the methods of administration that are possible and even the level of survey ‘literacy’ of the population (Bulmer 1998; Harkness 1999). Furthermore, cultural differences in matters such as politeness can affect both response rates and the presence of acquiescent response sets (Jones 1963).
All these factors stem from the culture in which the survey is administered, which in turn makes it difficult to standardize across cultures. Considerable work remains to be done to design ways of assessing the impact of these different methods of survey procedure in different contexts. Certainly, when using data from cross-national surveys investigators need to be aware of the survey design in each country and be aware of the way in which cultural practices may affect the way in which the survey is implemented. To use these datasets without this understanding risks confusing observed cross-national differences with real differences and failing to consider
that the differences are simply methodological artefacts.

Equivalence of meaning
It is one thing to enforce the same ways of collecting data in each country in a cross-national survey. It is another thing to ensure that the meaning of the data is equivalent in each country. The problem of the meaning of observations in different countries confronts all cross-national research. However, because survey responses are typically less contextualized than data collected with other methods, the problem of meaning is particularly acute in cross-national survey research.

Validity in different contexts. Problems in assessing the meaning of observations relate to the validity and reliability of survey questions. Cross-national surveys produce special problems for validity since the way in which questions are understood can vary sharply in different cultural contexts. Validity problems in comparative cross-national surveys arise from the difficulty of ensuring that questions mean and measure the same thing in different countries.
The problem of equivalent meaning is obvious when the questionnaire needs to be administered in different languages. Where this is the case the first task is to ensure that the equivalent meaning is contained in the different translations. A common approach to ensuring that the language is equivalent is to use blind back-translation methods (Brislin 1970). This involves beginning with a base language (e.g. English) and then translating the questionnaire into each of the languages used in the survey. To check on the accuracy of the translation, the translation is then independently translated back into the base language and the two versions of the questionnaire in the base language are compared.
However, it is not always possible to achieve a neutral or an accurate translation. Since language is a carrier of culture, the words can reflect culturally specific meanings and concepts that may have no equivalent in another culture. For example, a common question in international surveys has been to ask people to indicate their political orientation on a left wing/right wing continuum. But the concept of left and right does not translate well to countries where the very concept is foreign. A similar problem arises when asking about religious beliefs in different countries where the concept of God does not always translate well (Jowell 1998).
Equivalence is not just a matter of arriving at equivalent language but of achieving equivalent indicators. Even when equivalent words are used the questions do not always work in the same way in different cultures. Most questions are used to tap more abstract concepts but the specific indicators of the concept can differ from one country to the next.
Even questions designed to measure behaviour or personal attributes encounter problems. Here the problem may be less a matter of achieving equivalent wording than of determining how to interpret responses. The same response will not necessarily have the same meaning in different cultures. Educational level is measured in most surveys but in cross-national surveys working out equivalent levels of education is confounded by different systems and qualifications. Even age is problematic (Verba 1993) especially where age is used as a proxy for other concepts such as stage in the life cycle. Depending on the culture and society, knowing that a person is 20 years old indicates different things.
These simple examples highlight the fundamental characteristic of all social measurement. The meaning of the measurement must be derived from the culture. This means that the same responses (e.g. years of education, voting behaviour, occupation or age) may not have the same meaning in different cultures.

Literal and functional equivalence. One of the decisions any comparative survey researcher must make is whether to aim for literal or functional equivalence. Literal equivalence is achieved where identical stimuli are used in all countries and is exemplified in Almond and Verba’s (1963) The Civic
Culture, a classic study in comparative politics. Using this approach, literal translations and the same indicators of concepts are used in each country. The shortcomings of literal equivalence have already been outlined above.
The alternative is to aim for functional equivalence (Przeworski and Teune 1966; Scheuch 1968). Functional equivalence is achieved where the goal is to measure the same construct but the specific means by which the construct is measured can vary from place to place. The notion of functional equivalence is based on Lazarsfeld’s argument that indicators can be interchangeable. In cross-cultural research the argument is that measures must be culturally relevant and that therefore different measures will frequently be required to measure the same concept in different cultures.
The ESS seeks to achieve functional rather than literal equivalence of question wording. Rather than insisting on literal translations with the standard blind back-translation approach a Translation Panel works with the questionnaire design teams. This panel provides detailed annotations to the questionnaire that explain the purpose and meaning behind questions and concepts. The purpose of these annotations is to assist the translators in retaining the meaning of the concepts and to assist them in developing wordings that capture the meaning behind the question while freeing them from a strict literal translation (http://naticent02.uuhost.uk.uu.net/methodology/translation_strategy.htm).
The notion of functional equivalence is the most defensible approach in cross-national research as it recognizes that meaning derives from a context. However, the difficulty is in knowing whether one has achieved functional equivalence. It is one thing to accept that constructs can be measured in different ways in different cultures but it is quite another to demonstrate that the different ways are functionally equivalent. Przeworski and Teune (1966) proposed one method which they call the ‘identity-equivalence’ method for deriving functionally equivalent indices of concepts in different countries. This method involves developing measures of concepts in each country that consist of a mixture of country-specific indicators and indicators that are common to all the countries being compared. In this way there is some capacity to evaluate the extent to which the country-specific indicators capture the same underlying concept as do the common cross-national indicators.

Improving equivalence
Equivalence is a continuum. While the goal in comparative cross-national survey research is to achieve full equivalence this goal is unlikely to be realized. Nevertheless, there are ways in which equivalence can be improved.
At the measurement level equivalence is much more likely to be achieved by aiming for functional rather than literal equivalence. While methods such as identity equivalence techniques can be useful they do not fully resolve the issue of establishing that different sets of indicators are functionally equivalent. Cognitive interviewing, by which means investigators try to access the meanings that respondents attach to questions and their answers, can assist in evaluating whether different questions are functionally equivalent in different countries. Of course the traditional ways of assessing the validity of any measure can be used to improve the functional equivalence of measures in different cultures.
At the level of executing comparable surveys with comparable samples and comparable data collection methodologies there is room for considerable improvement (Lynn 2003). Much more careful specification of standards and requirements for surveys in each participating country is essential. While it will not be possible to implement identical procedures in all countries, some variation could be eliminated by more rigorous specification requirements such as those used in the ESS model. More thorough documentation will assist investigators in interpreting inter-country differences in results and assist in analyzing data so as to minimize the effect of these inter-country survey differences.
Other improvements can be achieved by better standardization in areas such as the way in which particular key questions are worded and coded. International efforts at achieving harmonization of key background variables have gone some way to obtaining consistent measures of key variables. The UK National Statistics Office provides an example of a set of harmonized questions used in government surveys (http://www.statistics.gov.uk/about/data/harmonisation/default.asp).
One of the criticisms of cross-national surveys is that variables are measured without context. One way in which both these dangers can be reduced is to ensure that data are collected about the structural and cultural elements of the country or region from which the individual comes. Thus, if national, regional and local characteristics can be added to individual records in survey datasets then it is much more likely that subsequent analysis can at least give some weight to these characteristics. The development of multi-level modelling techniques enables these macro characteristics to be taken into account in assessing individual data.

CONCLUSION

In 1989, Scheuch stated that ‘in terms of [comparative] methodology in abstracto and on issues of research technology, most of all that needed to be said has already been published’. Nothing since then challenges the accuracy of his assessment. While comparative cross-national research continues, the main approaches and problems have been known for a long time. Little progress has been made in recent years in overcoming the basic problems.
In general, however, the problems faced in comparative cross-national research are encountered in one way or another in all other research designs.
Two main forms of comparative cross-national research have been outlined in this chapter. Case-based methods of comparative cross-national research share the same strengths and weaknesses of any case-based method (de Vaus 2001, 2006). The great strength of case-based comparative methods is that they seek to understand the specific within the context of the whole case. For cross-cultural research this is particularly important.
All case-based approaches encounter problems of knowing when the whole case, rather than just elements, has been understood. All case-based methods rely on interpretation and this in turn leads to some difficulties in replication. The small number of cases that can be studied produces further problems for the methodology. Other difficulties include the shortcomings of the logic of the Methods of Agreement and Difference that have been outlined. The logic of the method is more suited to eliminating explanations than it is to ‘proving’ explanations. Case-based methods also encounter difficulties in that the logic of the method is to seek invariant causes, an aim which may be generally unachievable in social science research.
However, there is nothing unique about case-based methods in cross-national research. While cultural differences may be especially obvious in cross-national research the whole point of case-based approaches is to take the specific context into account. This applies to historical research and case studies as much as to cross-cultural research.
The second main form of cross-cultural comparative research is the cross-national survey. While there have been attempts to improve the comparability, the same problems apply to cross-national surveys as to national surveys. The differences are a matter of degree.
Any national survey faces problems associated with the meaning and equivalence of items. No nation is so homogeneous that these issues are not important. While language may not be as obvious a factor within national surveys (or is conveniently ignored) the understanding of questions and the appropriateness of indicators will vary across a range of sub-cultural groupings within any nation (de Vaus 2002a, 2002b).
The equivalence of survey methodologies and the difficulties that non-equivalence creates for cross-national comparisons is a problem and has been recognized as such. But the problem is not unique to cross-national surveys. Precisely the same issue confronts repeated cross-sectional studies that attempt to track trends within countries. The non-equivalence of question wording, samples and methodologies confronts any survey analyst trying to interpret trend studies (Kulka 1982).
These shortcomings in case-based and survey-based methodologies in cross-national comparative research are not reasons for avoiding cross-national research any more than they are for avoiding these methods in national or sub-national contexts. As the world becomes increasingly globalized we can only anticipate a growth in the need and opportunity for cross-national research. An awareness of the challenges faced in conducting such research is part of the solution to reducing the effect of these problems and for evaluating the claims made on the basis of cross-national comparative research.

NOTES

1 Smelser actually identifies five types but one of these – the method of heuristic assumption – is not particularly relevant to this discussion.

REFERENCES

Almond, G. and S. Verba (1963). The Civic Culture. Princeton, Princeton University Press.
Bendix, R. (1963). ‘Concepts and generalisations in comparative sociological studies’. American Sociological Review 28: 532–539.
Brislin, R. W. (1970). ‘Back-translation for cross-cultural research’. Journal of Cross-Cultural Psychology 1: 185–216.
Bulmer, M. (1998). ‘The problem of exporting social survey research’. American Behavioral Scientist 42(2 Oct.): 153–167.
Cohen, M. R. and E. Nagel (1934). An Introduction to Logic and Scientific Method. New York, Harcourt Brace Inc.
Defelice, E. G. (1986). ‘Causal inference and comparative methods’. Comparative Political Studies 19(3): 415–437.
de Vaus, D. A. (2001). Research Design in Social Research. London, Sage.
de Vaus, D. A. (2002a). Surveys in Social Research, 5th edn. London, Routledge.
de Vaus, D. A. (ed.) (2002b). Social Surveys, 4 volumes. London, Sage.
de Vaus, D. A. (ed.) (2006). Research Design, 4 volumes. London, Sage.
Dogan, M. and D. Pelassy (1984). How to Compare Nations. Chatham NJ, Chatham House.
Hantrais, L. (1999). ‘Contextualization in cross-national comparative research’. International Journal of Social Research Methodology 2(2): 93–108.
Harkness, J. (1999). ‘In pursuit of quality: issues for cross-national survey research’. International Journal of Social Research Methodology 2(2): 125–140.
Jones, E. L. (1963). ‘The courtesy bias in South-East Asian survey’. International Social Science Journal 15(1): 70–76.
Jowell, R. (1998). ‘How comparative is comparative research?’ American Behavioral Scientist 42(2 Oct.): 168–177.
Kohn, M. L. (1987). Cross-National Research as an Analytic Strategy. Cross-National Research in Sociology. Newbury Park, Sage.
Kulka, R. A. (1982). ‘Monitoring social change via survey replication: prospects and pitfalls from a replication survey of social roles and mental health’. Journal of Social Issues 38(1): 17–38.
Lazarsfeld, P. F. and H. Menzel (1961). ‘On the relationship between individual and collective properties’. Complex Organisations. A. Etzioni. New York, Holt, Rinehart and Winston, pp. 422–440.
Lieberson, S. (1991). ‘Small N’s and big conclusions: an examination of the reasoning in comparative studies based on a small number of cases’. Social Forces 70: 307–20.
Lijphart, A. (1971). ‘Comparative politics and the comparative method’. American Political Science Review 65(3): 682–693.
Lijphart, A. (1975). ‘The comparable cases strategy in comparative research’. Comparative Political Studies 8: 158–177.
Lynn, P. (2003). ‘Developing quality standards for cross-national survey research: five approaches’. International Journal of Social Research Methodology 6(4): 323–336.
Mill, J. S. (1879). A System of Logic, 8th edn. London, Longmans Green.
Mitchell, R. E. (1965). ‘Survey materials collected in the developing countries: sampling measurement and interviewing obstacles to intranational and international comparisons’. International Social Science Journal 17(4): 665–685.
Przeworski, A. and H. Teune (1966). ‘Equivalence in cross-national research’. Public Opinion Quarterly 30: 551–568.
Ragin, C. C. (1987). The Comparative Method. Berkeley, University of California Press.
Ragin, C. C. and D. Zaret (1983). ‘Theory and method in comparative research: two strategies’. Social Forces 61(3): 731–754.
Robinson, W. S. (1950). ‘Ecological correlations and the behavior of individuals’. American Sociological Review 15(June): 351–357.
Rokkan, S. (1970). Cross-cultural, Cross-societal and Cross-national Research. Main Trends of Research in the Human and Social Sciences. Paris, UNESCO.
Scheuch, E. K. (1968). ‘The cross-cultural use of sample surveys: problems of comparability’. Comparative Research Across Cultures and Nations. S. Rokkan. Paris, Mouton, pp. 176–209.
Scheuch, E. K. (1989). ‘Theoretical implications of comparative survey research: why the wheel of cross-cultural research keeps on being reinvented’. International Sociology 4: 147–167.
Slomczynski, K. M., J. Miller and M. Kohn (1981). ‘Stratification, work, and values: a Polish-United States comparison’. American Sociological Review 46(6): 720–744.
Smelser, N. J. (1972). ‘The methodology of comparative studies’. Comparative Research Methods. D. P. Warwick and S. Osherson. Englewood Cliffs, Prentice Hall, pp. 41–86.
Verba, S. (1993). ‘The uses of survey research in the study of comparative politics: issues and strategies’. Historical Social Research 18(2): 55–103.
Yin, R. K. (1989). Case Study Research: Design and Methods. Beverly Hills and London, Sage Publications.
PART III
Data Collection and Fieldwork
This section of the handbook delves into several different ways to collect data and conduct fieldwork. Social science methodology is very rich in the choices it provides an investigator in conducting research. Matching the appropriate method to the research question sometimes makes this richness overwhelming. However, the choices provide the tools that are needed to conduct research. A sharper tool should provide a more clearly detailed answer. While most of the world focuses on the answer the trained social scientist is aware that how the question is answered can be as important as the question itself. This handbook section provides a range of such tools that include both introductory and intermediate approaches.
The chapter by Bovaird and Embretson on tests and measurement may be difficult to read but it is worth the effort. The chapter deals with a well-established approach to measure development that has recently received more visibility in the social sciences. Item response theory (IRT) can be applied to survey research, marketing, and health contexts in addition to most substantive areas in education and psychology. The authors argue that classical test theory (CTT), which is the primary social science approach to measurement, not only makes unrealistic assumptions about the characteristics of the data needed but also lacks several important advantages of IRT. The latter is more flexible and has an advanced, model-based framework that relates test items to examinee and item characteristics. IRT analysis produces an equation that describes the relationship between the respondent and item parameters. Scores from CTT are dependent on the characteristics of the respondent and the specific test. These two characteristics cannot be separated in the CTT approach. The IRT model-based approach is not specific to the test or questionnaire used and the sample tested. With IRT different measures of the same trait can be used without expensive test-equating procedures. In CTT the reliability of the test increases with its length, which produces long tests or questionnaires with redundant items. IRT allows the selection of items of varying and non-overlapping difficulty so that tests can be considerably shorter than those developed under CTT.
IRT is most advantageous when computer-based adaptive testing is used. In conventional testing everyone gets the same or parallel versions of a test. With IRT each test can be individualized by selecting items of varying difficulty from a pool of items. This approach provides a more accurate estimate of the person’s ability in much less time. It is expected that the use of IRT will continue to grow and displace much of CTT.
Susan Speer’s chapter provides an overview and critical evaluation of the debate on the relative advantages and disadvantages of
‘natural’ versus ‘contrived’ data, or ‘unobtrusive’ and ‘obtrusive’ methods. She concludes by saying that by adopting a reflexive approach to interviews or other contrived data collection procedures we can obtain rich insights into interactional issues and the workings of normativity in culture. On the other hand she stresses that we can never achieve an unmediated access to participants’ realities, neutralize the context, or disinfect our data entirely of the researcher’s presence, because the knower is always intimately bound up in and partially constitutive of what is known. Finally, what are natural data cannot be decided on the basis of their type and/or the role of the researcher within the data. Rather, the status of pieces of data as natural or not depends largely on what the researcher intends to ‘do’ with them.
Obtrusive questionnaires and interviews form the lion’s share of the social research literature. The chapter by de Leeuw will help researchers plan their study using these approaches. One of the first problems researchers face is that fewer people are willing to answer questions. In many cases the only way to assure an appropriate sample is to offer to pay the respondent. However, de Leeuw provides several suggestions for how to optimize response rates.
Another issue discussed in this chapter is how to write the questions. It seems obvious that the answer to the question needs to reflect what we wanted to know. However, respondents may not understand the question in the way the person who wrote it expects. Education, culture, and experience all shape how we understand what we are being asked. Writing good questions requires pre-testing. The chapter introduces the use of cognitive psychology in question development.
The chapter also reviews several approaches to data collection. In-person or face-to-face interviews are the most flexible and can help and motivate respondents. Telephone interviews are less flexible and do not possess the visual cues that can be used during an in-person interview to determine if the respondent appears to understand the question. Mail surveys have the advantage of being relatively inexpensive but require accurate names and addresses. They are also subject to a low response rate. Internet- or Web-based surveys are also relatively inexpensive and allow complex skip patterns that written questionnaires cannot accommodate, but are clearly limited to those persons who have easy access to the Internet. Finally, group administration of questionnaires, as in a classroom, can be used if appropriate. The author provides an excellent summary of the advantages and disadvantages of each of these techniques that will aid the researcher in making the correct choice of which method to use.
In qualitative research, the most common methods of data collection are in-depth and semi-structured interviews. Feminist researchers have been active in developing these methods in recent years. Doucet and Mauthner discuss qualitative interviewing from the standpoint of feminism and view the research interview as a way of constructing knowledge. They argue that feminists have problematized key issues in the use of interviews as a research tool: who produces knowledge, with what politics, and from which locations. The discussion covers issues around rapport and the relational aspects of interviewer-interviewee relationships. In discussing power differences they show how feminists have come to see researchers as both ‘outsiders’ and ‘insiders’ in the way they relate to their interviewees and invest their identities in the research relationship but also in their relation to the data they produce. In referring to interview dynamics they point to the two-way nature of power between respondents and interviewers in the co-production of interview material. The discussion also moves to the power of researchers to represent the narratives of those they study including the links made with theory, the transcription, interpretation, and writing up.
While qualitative interviews are often directed to understanding the commonalities between those they study, biographical methods focus upon differences and upon the whole case. Biographical methods are
enjoying a resurgence of popularity albeit, as Joanna Bornat shows in Chapter 20, a growing number of approaches have developed under this umbrella. Bornat’s chapter is written from the perspective of an oral historian. Three main approaches are identified that have developed along rather different interdisciplinary lines: biographic-interpretive approach, oral history, and narrative analysis. The biographical interpretive method lends itself to more psychoanalytic interpretations of motivation and meaning; narrative analysis leans more toward sociolinguistics; while oral history draws from both sociology and history. Each gives centrality to the individual account and to individual agency in attempting to explain the changing nature and persistence of social relations and social structures; each makes use of the interview to generate data. They differ, Bornat argues, in three important respects: the dialogic or interactive aspects of the interview; the centrality of memories to their interpretation; and the role of the researcher in the interpretation of the data.
Reflecting her identification as an oral historian, Bornat argues that oral history places more emphasis on the dynamics of the interview process than the other two approaches. It also places more emphasis upon the importance of eliciting memories for their own sake so that the effects of time – a concern with ‘pastness’ is how she puts it – and an interest in change and continuity come to the fore. Oral history, developed through a political concern to capture the unheard ‘voices of the past’, represents a rather more democratic approach to data analysis and interpretation. While narrative analysis and the biographic-interpretive approach provide for a deep analysis of subconscious as well as conscious processes – what may be unspoken or unacknowledged by the interviewee – oral historians maintain a greater interpretive distance and tighter boundaries around their role as interpreters than the two other approaches.
Janet Smithson’s chapter on focus groups discusses practical and theoretical questions related to using focus groups in social research and suggests how to use them and analyse the data most effectively. According to her, the particular strength of the focus group method is that it enables research participants to discuss and develop ideas collectively, and articulate their ideas in their own terms, bringing forward their priorities and perspectives. The limitations of focus group research can be mitigated by awareness of the constraints, informed analysis, and by detailed consideration of the way the conversations are socially constructed in the group context.
16
Modern Measurement in the
Social Sciences
James A. Bovaird and Susan E. Embretson
While item response theory (IRT) is a viable and well-established methodology for educational measures, it is still relatively unused in psychology and the rest of the social sciences. Despite its underutilization in the mainstream of social research, IRT is appropriate for consideration in any context that postulates the presence of a latent construct and involves constructing and/or analyzing a multicomponent instrument designed to measure that construct, including survey research, marketing, and health contexts in addition to most substantive areas of education and psychology. Some attractive features of IRT include the possibility of more flexible construction of alternative test forms, shorter and more efficient tests, equating, and interpretation of scores without norms. This chapter will review and emphasize the benefits of contemporary IRT, including the technical advances of IRT over methods based on classical test theory (CTT), the role of modern measurement methods in computer-based testing (CBT) and computerized adaptive testing (CAT), and the potential of some IRT models to impact test design for targeted aspects of construct validity. We will begin with a brief discussion of what constitutes the area of testing and measurement followed by a direct contrast of IRT and CTT. The shortcomings of CTT will be used to illustrate the benefits of modern measurement techniques in the context of the characteristics of quality measurement. The chapter ends with a discussion of current trends and future directions.

TESTING AND MEASUREMENT

Most social scientists are interested in unobservable human attributes that are often referred to as latent constructs, raising the issue of imparting a clear meaning to the numbers that are assigned to represent levels of a construct, a process called measurement or psychometrics. Testing then refers to sampling the individual behavior that is observable at a given point in time. Unfortunately,
measurement instruments cannot exactly represent the latent construct, so the quality of measurement is defined by the presence of four characteristics: a standardized mode of test administration, a meaningful metric for obtained scores, score reliability, and score validity. These four characteristics contribute to the interpretability (or lack thereof) of scores obtained in testing and will be expanded upon in subsequent sections as a means of distinguishing between CTT and IRT.

While some attributes such as age or weight can be precisely measured with a single measurement, most constructs are much harder to test with single measures. Consequently, most tests or scales contain multiple measures, each representing a single observation of the characteristic. In education, and testing in general, simple measures are often called items, in survey research they may be called questions, and in experimental psychology they may be referred to as stimuli or cues. Consistent with the testing background from which measurement has primarily developed, we will collectively refer to questions, items, and stimuli as items. The number of items required in a scale depends on the complexity of the characteristic. Individual items tend to be poor measures and often partially reflect attributes other than the targeted construct. Thus, the variability among responses to an individual item contains a portion attributable to the targeted construct, or true score variance, and a portion attributable to random error and unrelated systematic sources, or measurement error.

The numerical representation of an observable behavior requires a clear and definitive rule for associating one and only one number with the magnitude of an individual’s construct level. Given a sample $O$ of $N$ distinct participants, any participant can be assigned a true score $t(o_s)$. A procedure is then devised for pairing each participant $o_s$ with its imprecise numerical measurement, $m(o_s)$. Measurement scales can be classified as one of four scales of measurement: nominal, ordinal, interval, or ratio. Nominal and ordinal scales can be further classified as categorical data, and interval and ratio scales are often classified as continuous data.

In general, there are three basic item types in use in the social sciences. The first type is a response set that represents a range of trait levels ordered from low to high. Examples would be rating scales (often called Likert scales; Likert, 1932), physiological measurements, and any other ‘continuous’ measures. While not all of the response sets fully meet the requirements for interval level of measurement, there is an assumption of an underlying continuum. (See Goldstein & Hersen (1984) for a discussion of Likert-type items and interval properties.) The second type of item has a dichotomous (two response options) response format such as true/false questions or checklists (an endorsement constitutes the presence of the behavior, trait, event, etc. while the absence of endorsement indicates the absence of the behavior). The third item type is a dichotomous scoring of a polytomous (more than two response options) response set such as the case with multiple choice formats. Typically, there is a correct answer and a set of distractors and the resulting dichotomous data represents either a correct/incorrect response or a pass/fail decision.

CLASSICAL TEST THEORY AND MODERN MEASUREMENT

Historically, CTT has provided a general framework for the development, administration, and interpretation of assessment tools. Gulliksen (1950) is often referred to as the defining volume for CTT, but much of the work was first formalized by Spearman in the early 1900s, well before Lord and Novick (1968) laid the foundation for IRT. According to McDonald (1999), there are two views on the relationship between CTT and modern measurement. McDonald argues that CTT may be viewed as a reasonable approximation to IRT under certain conditions. Conversely, since the development of CTT occurred prior to the development of IRT, there exists the
accurate impression that IRT represents a significant change in theoretical perspective from CTT. According to Embretson and Reise (2000), CTT can be best described as representing a set of ‘Old Rules of Measurement’ that have served applied psychologists and psychometricians for decades. Developed from common factor theory, CTT provides fairly accurate psychometric information for items resulting in continuous data. However, there are several inherent shortcomings involved when CTT is applied to categorical data that arise from polytomous response formats and dichotomous scoring. While CTT methods may provide reasonable approximations with binary data, it is only a linear approximation to a nonlinear system. As suggested by both the traditional label and the name given to them by Embretson and Reise, the old rules have been improved upon by two modern model-based frameworks for measuring abilities: the extension of biserial and tetrachoric correlation theory with the common factor model referred to as item factor analysis (Bock, Gibbons, & Muraki 1988; Knol & Berger, 1991), and the development of essentially a nonlinear common factor model suitable for conditional probabilities, or item response theory. Item factor analysis is best discussed in the context of structural equation modeling and confirmatory factor analysis and will not be covered further in this chapter. The interested reader is referred to Mislevy (1986), Muthén (1978), or Takane and de Leeuw (1987) for more information. The following sections will present a summary of the classical rules of measurement, contrast them with the ‘new’ rules of measurement, and illustrate how IRT better addresses some of the shortcomings of the classical methods, primarily when applied to binary data.

Historical development

Classical test theory is frequently cited as having its roots in Gulliksen (1950); however, the procedures upon which CTT is based were developed much earlier by Charles Spearman (1927), who described how to recognize that tests measure a common factor and determine the amount of error in test scores. The identification of a common factor gave rise to the concept of a true score and the common factor theory. Spearman’s common factor theory was further developed and elaborated by Thurstone (see Thorndike & Lohman, 1990), Guttman (1957), Lawley (see McDonald, 1999), and Joreskog (see McDonald, 1985). Spearman also showed how a correlation between two alternate forms of a test could be used to estimate the amount of measurement error in test scores, which became the primary purpose of CTT. Guttman (1945) introduced the concept of internal consistency by showing how items within a test could also be used to determine test reliability, and Cronbach (1951) continued the work to the extent that the most common CTT measure of internal consistency reliability is named after him, Cronbach’s coefficient alpha (α).

IRT developed through the work of two traditions spanning both sides of the Atlantic Ocean. In the United States, Lazarsfeld (1950) introduced latent structure analysis, which eventually became known as IRT. IRT combines factor analysis with the phi-gamma hypothesis¹, one of the oldest laws in psychology that can be traced back as early as 1878 (see Guilford, 1954; McDonald, 1999). Another key development was Lord’s (1952) demonstration that Spearman’s single factor theory could be applied to binary items. Lord and Novick (1968) included four chapters from Allan Birnbaum on IRT. Bock and Aitken (1981) provided the elegant marginal maximum likelihood (MML) method for parameter estimation. In Europe, Rasch (1960) proposed what is now known as the Rasch model or 1-parameter logistic (1PL) model. Anderson (1972) elaborated on the MML estimation methods for Rasch item and person parameters. Gerhard Fischer (1973) extended the binary Rasch model to define parameters by incorporating stimulus properties, treatment conditions, etc. using a linear logistic latent trait model (LLTM). Others have progressed the field of IRT since this seminal work, but they are too numerous to name.
Classical test theory

The focus of CTT is to understand and improve the reliability of test scores. CTT is also synonymous with true score theory due to its decomposition of observed scores (X) into true score (T) and error (E), that is, X = T + E. According to CTT, at the examinee level, any observation is a realization of a random variable X with a probability, or propensity, distribution. The examinee’s true score is then the expectation of this propensity distribution. That is, if an examinee were observed an infinite number of times, the true score would be the average of the multiple observations. The difference between the actual observation and the true score is the error in measurement, where error is also a random variable but with an expectation of zero. CTT also assumes that errors are normally distributed and uncorrelated with other variables.

However, CTT is applied at the level of the test rather than the examinee level, so when examinees are randomly sampled, T becomes a random variable also. Reliability then is the ratio of variability in true scores to variability in observed scores, where the square root of reliability is the correlation between true and observed scores. There have been a number of methods developed to estimate CTT reliability, some of which will be discussed in a later section. For a more detailed discussion of CTT see McDonald (1999) or Crocker and Algina (1986) in addition to the classic Gulliksen (1950) and Lord and Novick (1968) texts.

Item response theory

IRT, also referred to as latent trait theory, strong true score theory, or modern mental test theory, represents a more flexible and more sophisticated testing framework than CTT by making CTT hypotheses more explicit. IRT represents a collection of related model-based psychometric theories that relate item responses to examinee and item characteristics. For a more thorough discussion of the principles of IRT than what is presented here, including additional technical details, see the excellent texts by Baker and Kim (2004); De Boeck and Wilson (2004); Embretson and Reise (2000); Hambleton, Swaminathan, and Rogers (1991); and van der Linden and Hambleton (1996).

The purpose of IRT is to provide an equation, called an item response function (IRF), to maximize the relationship between examinee and item parameters and the probability of a discrete response outcome such as endorsing an item or answering an item correctly. While the only explicit assumptions in CTT pertained to the distribution of measurement errors and their relationship with other variables, IRT makes two strong assumptions. The first assumption, local independence, requires that an examinee has a true location on at least one continuous latent dimension (true score) that can explain performance, resulting in responses that are statistically independent. In other words, proper specification of the latent dimension(s) explains any relationship between observed responses. There may be more than one dimension underlying performance, but all dimensions relevant to explaining performance are specified. Secondary factors are assumed to be mutually independent and collectively orthogonal (unrelated). In the event that not all relevant dimensions are specified, research has shown that IRT is robust to minor violations of this assumption as long as there is a strong dominant factor (Drasgow & Parsons, 1983; Tate, 2002).

The second assumption is that the relationship between performance and the underlying dimension has a specific form. In most IRT applications, including the most common models presented here, the item-trait relationship can be adequately described by a monotonically increasing IRF where, as the level of the trait increases, the probability of a correct response or item endorsement increases as well, in accordance with the phi-gamma hypothesis. Also referred to as an item characteristic curve (ICC), item response curve, or trace line, the IRF maps examinees’ locations on the latent continuum across levels of a construct. Item characteristics,
or parameters, determine the shape of the ICC and will be described shortly. IRT models and the corresponding IRFs differ in the mathematical form of the IRF and/or the number of parameters in the model, but all will have at least one examinee trait parameter and one item parameter. The reliance on an adequate model means that IRT models are falsifiable (they may or may not be appropriate for a particular set of test data and are testable); thus, model-to-data goodness of fit testing is essential (see Embretson & Reise, 2000). Evidence of poor model fit may be an indication of a heterogeneous population and will be discussed later in the chapter.

By relating the probability of an individual item response to both examinee and item parameters, the IRT model explicitly states that an examinee’s response to a given item will be a joint function of examinee characteristics (i.e. level of the trait) and the characteristics of the item itself. When the model of examinee behavior is probabilistic, three fundamental problems with CTT exist when applied to categorical data (see McDonald, 1999). First, if the range of the construct is broad enough, CTT will result in a negative probability of response for examinees in the lower tail of the trait distribution and probability greater than 1.0 in the upper tail. Second, the linear common factor model used in CTT assumes that error variance is independent from true score variance, and this cannot be true for binary items. Third, CTT also assumes that measurement error (standard error of measurement) is constant over all levels of the trait, and this too is not realistic.

In order to represent probability, the IRF must be curvilinear since it is bounded by zero and one. The logistic function, $L(Z)$, where $Z$ represents a linear combination of item and person parameters that varies across types of IRT models, is most commonly used as the link function to relate the linear function of the parameters to the nonlinear probability of the keyed response. The logistic link function is appropriate for a binomial dichotomous variable.

The fundamental item response model is the Rasch model (Rasch, 1960), or 1PL model,

$$P(X_{is} = 1 \mid \theta_s, b_i) = \frac{e^{D(\theta_s - b_i)}}{1 + e^{D(\theta_s - b_i)}}, \tag{1}$$

where $X_{is}$ is the response of person $s$ to item $i$ (0 or 1). The linear combination of parameters $Z$ is the simple difference between the trait level for person $s$, $\theta_s$, and the difficulty of item $i$, $b_i$. The person parameter, $\theta_s$, is the person location parameter indicating a person’s level of the trait. When estimating item parameters, a process referred to as calibration, the person parameter is assumed to be normally distributed; however, non-normal distributions may be accommodated using a prior distribution in a Bayesian framework. The Rasch model is called a 1-parameter model since it contains only one item parameter, $b_i$. The difficulty parameter is sometimes referred to as the item location parameter indicating the item’s position relative to the latent trait. Assuming that the latent trait metric is person-anchored, difficulty is interpreted in IRT as the point at which examinees have a 50 percent chance of answering the item correctly or endorsing the item. Thus, if an examinee’s ability level is equal to the difficulty of the item (i.e. $\theta_s - b_i = 0$), they will have a probability that $X_{is} = 1$ (a correct response or item endorsement) of 0.50. An item’s difficulty typically ranges from −2.0 to 2.0, where a negative value indicates an easier, more frequently endorsed item. In the ability context, an item with a negative difficulty parameter would be appropriate for an examinee of below-average ability. In a clinical context using a symptom checklist for depression (assuming that a high depression score indicates a depressed individual), an item with a negative difficulty parameter would indicate that a person who is below the average level of depression has a 50 percent chance of endorsing that item or exhibiting that symptom. The IRT difficulty parameter is comparable to the mean item response in CTT. The 1PL model assumes that all items have the same degree of relationship, or discrimination, with the construct. In CTT, this is referred to as parallel items (McDonald, 1999). The constant multiplier
[Figure 16.1 plot: probability (vertical axis, 0.0 to 1.0) against difficulty/ability (horizontal axis, −3.0 to 3.0) for items A(1, 0, 0), B(1, 1, 0), C(1, −1, 0), D(.5, 1, 0), E(1.5, −1, 0), and F(1, 0, .1)]
Figure 16.1 Item response functions for six hypothetical items. A, B, and C are 1PL models; D and E are 2PL models, and F is a 3PL model. The numbers in parentheses correspond with the discrimination, difficulty, and guessing parameter estimates, respectively
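The six curves in Figure 16.1 can be approximated numerically. The following is a minimal sketch rather than anything taken from the chapter: it assumes the three-parameter logistic form given as equation (3), with the scaling constant D = 1.701 quoted in the text, and treats the 1PL and 2PL items as special cases (guessing set to zero, and for 1PL discrimination fixed at 1). The names `irf` and `ITEMS` are illustrative only.

```python
import math

D = 1.701  # scaling constant as quoted in the chapter text

# (discrimination a, difficulty b, guessing c) as listed in the Figure 16.1 legend
ITEMS = {
    "A": (1.0, 0.0, 0.0),
    "B": (1.0, 1.0, 0.0),
    "C": (1.0, -1.0, 0.0),
    "D": (0.5, 1.0, 0.0),
    "E": (1.5, -1.0, 0.0),
    "F": (1.0, 0.0, 0.1),
}


def irf(theta, a=1.0, b=0.0, c=0.0, d=D):
    """Probability of the keyed response under the 3PL form.

    With c = 0 this reduces to the 2PL form; with a = 1 and c = 0, to the 1PL form.
    """
    z = d * a * (theta - b)
    # 1 / (1 + exp(-z)) is algebraically the same as exp(z) / (1 + exp(z))
    return c + (1.0 - c) / (1.0 + math.exp(-z))


# Tabulate each curve on the same grid as the figure's horizontal axis
grid = [x / 2.0 for x in range(-6, 7)]  # -3.0, -2.5, ..., 3.0
for name, (a, b, c) in ITEMS.items():
    row = " ".join(f"{irf(t, a, b, c):.2f}" for t in grid)
    print(name, row)
```

Under these assumptions each item without guessing passes through a probability of 0.50 at its own difficulty, items C and E cross at −1.0 with E rising more steeply, and item F never falls below its lower asymptote of 0.10.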
D = 1.701 is sometimes added to the logistic function to make it virtually indistinguishable from the cumulative normal-ogive function (McDonald, 1999).

IRFs A, B, and C in Figure 16.1 reflect three items that differ in difficulty or location, but are equal in discrimination. IRF A is appropriate for an examinee of average ability ($b_i = 0$), while IRFs B and C are appropriate for examinees who are above average on the trait ($b_i = 1.0$) and below average on the trait ($b_i = -1.0$), respectively. These items are equal in discrimination because they have the same shape or slope indicating the same relationship with the trait, just offset in location.

The most commonly used IRT model, the 2-parameter logistic (2PL) model allows items to vary in difficulty and in discrimination,

$$P(X_{is} = 1 \mid \theta_s, b_i, a_i) = \frac{e^{D a_i(\theta_s - b_i)}}{1 + e^{D a_i(\theta_s - b_i)}}, \tag{2}$$

where $a_i$ is the discrimination parameter and is proportional to the slope of the IRF where $\theta_s = b_i$. Discrimination parameters typically range from 0.5 to 1.5. The IRF for an item with high discrimination looks like a step function. The IRT discrimination parameter corresponds to the CTT item-total correlation, and a discrimination of 1.0 corresponds to a common factor loading of 0.70. IRFs C and E and IRFs B and D in Figure 16.1 reflect the effect of unequal discrimination on the probability of a correct response or item endorsement. IRFs C and E have the same location ($b_i = -1.0$), but differ in the slope of the IRF at the location parameter with IRF E having a steeper slope indicating a more discriminating item. In CTT, one would say item E has a higher item-total correlation than item C. IRFs B and D also share the same location ($b_i = 1.0$), but IRF D has a lower slope at the location parameter and thus is a less discriminating item.

The 3-parameter logistic (3PL) model is represented as,

$$P(X_{is} = 1 \mid \theta_s, b_i, a_i, c_i) = c_i + (1 - c_i)\,\frac{e^{D a_i(\theta_s - b_i)}}{1 + e^{D a_i(\theta_s - b_i)}}, \tag{3}$$
where $c_i$ represents a lower asymptote, or guessing, parameter for the model to reflect the probability of a correct response by chance alone. IRF F in Figure 16.1 illustrates the impact of a guessing parameter on the IRF. Item F has a lower asymptote of $c_i = 0.10$, indicating that regardless of an examinee’s ability or location on the trait, an examinee always has at least a 10 percent chance of responding correctly or endorsing that item due to chance alone. In comparison, examinees of low ability or trait level have a near 0 percent chance of responding correctly to items A–E. There is no equivalent to the guessing parameter under CTT.

Several extensions of the basic IRT models have been developed. Bock (1972) extended the 2PL model to the nominal response model in order to use all information contained in examinee responses. Thissen and Steinberg (1984, 1986) showed that all other non-ordered polytomous models are special cases of the nominal response model. The partial credit model (PCM; Masters, 1982) and its derivation, the rating scale model (Andrich, 1978), were introduced for the case where partial credit may be necessary as is often the case with math problems. The graded response model (Samejima, 1969) assumes available response categories can be ordered (e.g. Likert scales). The binomial trials model can be used for situations involving the probability that an examinee completes x of n trials such as making 8 of 10 free throws in a basketball game. The Poisson counts model is appropriate for measurement situations in which the number and difficulty of events (e.g. push-ups, sit-ups, etc.) completed per period of time must be considered.

Other examples of IRT models include the multidimensional extensions of the 1PL and 2PL models: the multidimensional Rasch model (Reckase & McKinley, 1982) and the multidimensional 2PL model (Reckase, 1997). Fischer’s LLTM has been extended to the multicomponent latent trait model (MLTM; Whitely, 1980), the general component latent trait model (GLTM; Embretson, 1984), and the multidimensional Rasch model for learning and change (MRMLC; Embretson, 1991). Several models for continuous responses have been developed, such as Mellenbergh (1994), as well as models for exploring the multidimensionality of a scale akin to exploratory factor analysis, the exploratory multidimensional IRT model (Bock, Gibbons, & Muraki, 1988), and confirming the dimensionality of a scale akin to confirmatory factor analysis, the confirmatory IRT models for traits (Embretson, 1991, 1997; DiBello et al., 1995; Adams et al., 1997).

The benefits of a model-based approach

CTT has an advantage over IRT in that most CTT procedures have a closed form² and are computationally simple, with IRT requiring complex estimation procedures (MML, Empirical Bayes, etc.). It is also true that the correlation between IRT person ability and the CTT summed scale score is usually very high, and so an argument can be made that not much is gained through IRT. However, just because two scalings (CTT and IRT) are equivalent (or nearly so) does not mean that they will produce similar experimental and applied results. IRT separates examinees in the extreme ranges of the ability distribution rather than in the middle by providing optimal scaling of individual differences. For instance, in a bivariate scatterplot of CTT and IRT trait estimates, a Loess fit line would take on an ogive form with examinees having a high degree of correspondence around the average trait level and more variability at the extreme ranges of the ability distribution. Several authors have reported problems with using CTT scores as a metric for scaling individual differences or comparing groups (Maxwell & DeLaney, 1985; Yen, 1986; Bond & Fox, 2001), testing moderated effects (Embretson, 1996), and change (Bereiter, 1963; Embretson, 1998b, 2007; Fraley et al., 2000), where these problems were alleviated by IRT scaling. In addition, IRT’s unique properties are necessary to facilitate advanced measurement applications such as CAT (Weiss, 1982), detecting item bias or differential item functioning (DIF; Lord, 1980), and
test linking or equating (Cook & Eignor, 1983, 1989). Despite its historical popularity, CTT has many shortcomings. These shortcomings will be discussed in the context of the four characteristics of quality measurement: a meaningful metric for obtained scores, score reliability, score validity, and a standardized mode of test administration.

A meaningful metric

When constructs are considered latent (e.g. intelligence, depression, attitudes, etc.) and are not directly observable (unlike, e.g., pounds, liters, kilometers, etc.), they have no inherent metric. Under CTT, in many cases, construct scores have little or no meaning unto themselves unless they can be compared to a normative group. Normative information serves as a reference by which to evaluate how an individual compares to others who took the same test. IRT improves on this limitation by providing a sample-free metric for interpretation of performance.

Invariance. Perhaps the most significant characteristics of CTT are the dependency of the true score estimate on the specific test and population and the dependency of item characteristics on the specific sample from which they are derived. This means that examinee and test characteristics cannot be separated. That is, ability estimates apply only to items on a specific test or to items on a parallel test with equivalent item properties, and item characteristics depend on the group of examinees from which responses are obtained. Under CTT, the trait level is estimated by calculating the unit-weighted summed scale score. The meaning of the score is obtained by comparing the individual’s performance to its position in a normative group in order to obtain a ‘true score,’ or the expected value of observed performance on the test of interest (Hambleton, Swaminathan and Rogers 1991). If an item is added or removed, the true score changes, resulting in a unique psychometric scale for every test. If a test is difficult, an examinee of average ability will appear to perform poorly, and if a test is easy, that same examinee will appear to perform at a high level. This is because the difficulty of a test, or individual items for that matter, is defined in CTT as the proportion of examinees in a group of interest who answer the item correctly. Thus, a difficult versus easy distinction depends on the examinees taking the test and performance depends on whether items are hard or easy.

IRT provides person-free item parameter estimation and item-free person parameter estimation that are invariant within a linear transformation, meaning that item parameters from one sample can be linearly transformed to be equal to parameters from a second sample. IRT places person ability and item difficulty on the same scale, explicitly estimating the joint relationship between person and item properties. Therefore, responses from items with known IRFs can be used to estimate trait levels for other samples. In CTT, the model does not include item properties, so the trait level applies only to particular items on that test. In contrast to CTT, the meaning of a trait level applies to any item where item characteristics are known. This is essential for specific objectivity: the case in which comparison of examinees is independent of the specific items or tests administered. In IRT, a number of item properties can be incorporated into the model including item difficulty, discrimination, susceptibility to guessing, the nature of the response alternatives, impact of substantive item features, average response time, etc.

It is important to note that even in IRT, careful consideration must be given when selecting the sample of examinees to be used for item calibration. As noted earlier, item parameter estimation assumes that the trait is person-anchored. If the calibration sample is not a representative sample of the population that an item bank is being developed for, the researcher will have difficulty in interpreting the meaning of the resulting item parameters. However, once the representative calibration sample is selected and IRFs are known, items from a calibrated bank can be used to estimate trait levels for other samples and the resulting trait estimates are comparable across samples, administrations, and studies.
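The contrast between sample-dependent CTT difficulty and IRT parameters that are invariant within a linear transformation can be illustrated numerically. The sketch below is an illustration under stated assumptions, not a procedure from the chapter: it uses the 2PL form with D = 1.701, the helper names (`irf_2pl`, `rescale`, `expected_p_value`) are hypothetical, and the two groups of trait levels are made-up values chosen only to differ in average ability.

```python
import math

D = 1.701  # scaling constant as quoted in the chapter text


def irf_2pl(theta, a, b, d=D):
    """2PL item response function: probability of the keyed response."""
    return 1.0 / (1.0 + math.exp(-d * a * (theta - b)))


def rescale(theta, a, b, k, m):
    """Move person and item parameters to the metric theta* = k * theta + m.

    Difficulty moves with the trait scale and discrimination is divided by k,
    so the product a * (theta - b) is unchanged.
    """
    return k * theta + m, a / k, k * b + m


def expected_p_value(thetas, a, b):
    """CTT-style item difficulty: expected proportion correct in one group."""
    return sum(irf_2pl(t, a, b) for t in thetas) / len(thetas)


# One item with fixed IRT parameters, answered by two made-up groups
a, b = 1.0, 0.0
low_group = [-2.0, -1.5, -1.0, -0.5, 0.0]
high_group = [0.0, 0.5, 1.0, 1.5, 2.0]

# The CTT proportion correct changes with the group taking the item ...
print(round(expected_p_value(low_group, a, b), 3))
print(round(expected_p_value(high_group, a, b), 3))

# ... while the response probability is identical after a linear change of metric
theta = 0.7
theta2, a2, b2 = rescale(theta, a, b, k=15.0, m=100.0)
print(irf_2pl(theta, a, b), irf_2pl(theta2, a2, b2))
```

The same item looks hard for the first group and easy for the second when difficulty is defined as a proportion correct, whereas its IRT parameters describe the item on either metric once the linear transformation is applied to persons and items together.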
Comparing groups. In order to compare the performance of groups of examinees, the items on the test must function the same for all examinees regardless of group membership. That is, the scale items must illustrate measurement invariance (Vandenberg & Lance, 2000). Under the IRT framework, an item exhibits DIF if the IRF is not equivalent when estimated separately for each group. Such an illustration is only possible because of the parameter invariance properties of IRT. Proper identification of DIF is hindered under the CTT framework by the lack of sample-independent item statistics. DIF has increased in prominence, and will continue doing so, along with the increased emphasis on test fairness (see American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999). See Holland and Wainer (1993); Millsap and Everson (1993); Waller et al. (2000); or Reise et al. (2001) for further discussions and illustrations of DIF.

Comparing different measures of the same trait. Historically, when it was necessary to compare or relate test scores from two different administrations or test scores from two different measures of the same construct, test-equating procedures were necessary (see Doran & Holland, 2000; Embretson & Reise, 2000). The development and refinement of IRT procedures allows for a more powerful approach referred to as scale linking (Choi & McCall, 2002). Scale linking through IRT solves two classic problems experienced under CTT: respondent non-response resulting in a different set of items for different examinees, and different measures for different examinees (see Vale, 1986). Under CTT, when non-response occurs, the average item response may be used instead of the unit-weighted summed score, or a missing data procedure such as multiple imputation (MI; Schafer, 1997) may be used, although this is rarely done or recommended. Under IRT, examinee non-response is not a problem because the trait level can be estimated from any set of items with known IRFs. The need to link different measures for different examinees often occurs due to changing content over time (i.e. revisions, short forms, etc.), different measures of a common construct, or the same measure administered in different languages. These situations are also easily remedied due to the invariance property of IRT.

Measurement of change. Under CTT, change scores can only be meaningfully compared when initial score levels are equivalent, as a small deviation from a high initial score on an easy test does not mean the same thing as a small score change from an average score, because an interval scale level of measurement is not achieved (Embretson & Reise, 2000). If an interval scale of measurement is achieved through transformations, then it is specific to that particular test administration. However, in IRT, change scores can be meaningfully compared even when the initial scores are unequal (see Note 3). This is largely due to the interval scale nature of item difficulty parameters and individual trait parameters.

Bereiter (1963) indicated three basic problems with using a simple CTT difference score to indicate change: a paradoxical relationship between the test-retest correlation and the reliability of the change score, the negative correlation of the initial score with the change score, and the aforementioned scaling issue. A fourth problem is whether the change score actually reflects change due to a condition or is simply error (Embretson, 1998a). A special Rasch-family model, the multidimensional Rasch model for learning and change (MRMLC; Embretson, 1991), addressed the four difficulties of CTT by resolving the scaling and reliability problems found with standard ‘change’ scores and removing some of the confounds that occur with initial status. Two of the problems are addressed by IRT in general. First, the Rasch model achieves interval scale properties (see Andrich, 1985; Fischer, 1995a). Second, the MRMLC, as an IRT model, provides individual standard error of measurement estimates. The MRMLC specifically addresses the two change score dilemmas: the issue of paradoxical reliabilities is addressed by modeling individual change directly in a model that explains changing test correlations, and the correlation between the initial score and the change
score is resolved by achieving interval scale properties (Embretson, 1998b).

Reliability

Reliability refers to the accuracy or precision of a measurement instrument. That is, scores must be reliable before they can be valid. It is important to note that tests themselves are not reliable; the resulting scores are. It is possible for a given test to yield highly reliable scores in some circumstances but not others. Responsible reporting of test results should always include the reliability estimate in order to reflect the impact of sample-specific characteristics on score reliability.

Internal consistency. The second CTT shortcoming concerns the definition of reliability and its complement, the standard error of measurement (SEM). Under CTT, a measure is reliable, or consistent, if an individual examinee can hypothetically be measured a large number of times and achieve the same score each time. Reliability quantifies the proportion of true score variance in a set of scores. Even though the CTT model in Equation 1 specifies that there are two independent variables (IVs) per person (T and E), these IVs are not actually separable for an individual score. Instead, communalities (correlations) between items are used to infer population estimates of true and error variance. Reliability is estimated as the correlation between test scores on parallel forms of a test or as a function of inter-correlations among items on a test. As a test-level estimate under CTT, scale reliability, as well as the SEM, applies equally to all individuals in a sample that takes the test or all scores obtained from a particular test administration. Thus, CTT is relevant to reliability only at the population level and not at the individual level.

There are two primary sources of measurement error in observed scores: inconsistency across time and/or test forms (between-test variability), and inconsistency across items within a test (within-test variability). Spearman (1927) illustrated that the correlation between two alternate test forms could be used to estimate test reliability (Lord & Novick, 1968); thus between-test variability is often assessed by either repeated administrations of the same test (test-retest reliability) or by administrations of parallel forms (alternate forms reliability). Within-test consistency can be assessed with a single test administration either by use of split-half reliability or, most commonly, coefficient alpha (Guttman, 1945; Cronbach, 1951).

Coefficient alpha, as a function of the average inter-item correlation, quantifies the internal consistency within a test and is appropriate for multiple-item measures that measure a single common construct (i.e. are unidimensional). Coefficient alpha derivations assume that all items measure the same construct (i.e. the test is unidimensional), and all items are assumed to be equally related to the construct (i.e. parallel measures). For dichotomously scored items, the Kuder-Richardson Formula 20 (KR-20) is identical to coefficient alpha, and if all items have the same degree of difficulty, the Kuder-Richardson Formula 21 (KR-21) may be used. Several factors influence the reliability of test scores under CTT, including the heterogeneity of the sample, the level of the sample on the construct, and the number of items. Numerous other reliability coefficients have been developed for CTT that provide either lower bound estimates or estimates with unknown biases (see Hambleton & van der Linden, 1982).

Under CTT, the ‘quality’ of items or their relationship with the trait is evaluated based on the mean item response and the item-total correlation. The mean item response, or the proportion endorsing the item in the keyed direction, is a measure of the difficulty of the item, and the item-total correlation is an indication of how well the item taps the construct of interest. Such item statistics are not invariant across diverse samples and are thus sample dependent. Item difficulty changes depending on the average trait level of the respondent sample, and the item-total correlation is heavily influenced by the variability of scale scores on a given sample
and changes depending on whether items are added or deleted from the test.

Unfortunately, coefficient alpha and the Kuder-Richardson formulas themselves can misestimate scale reliability. When items are not parallel, regardless of dimensionality, coefficient alpha is actually a lower-bound reliability estimate (i.e. reliability is underestimated; see Lord & Novick, 1968; Raykov, 1997). Conversely, when unidimensionality is violated by the inclusion of subscales, methods factors, or strict time limits (causing the introduction of a speed of processing factor), coefficient alpha can result in an overestimate of the scale precision. Recently, newer methods have been proposed to more accurately estimate scale reliability and allow for establishing a confidence interval around the point estimate (see Raykov, 1997; Raykov & Shrout, 2002).

Information. The most significant difference between IRT and CTT is the conceptualization of measurement error. Under CTT, there is a single index of reliability for all examinees. Instead of item reliability, IRT uses an item information function (IIF), where information reflects how well an item differentiates among respondents who are at different levels of the latent variable. Under IRT, the IIF and a scale information function (SIF) are calculated that allow measurement error to vary across levels of the trait. By allowing for non-uniform precision across the entire range of trait levels, with extreme levels of a trait having more measurement error than the typical levels of the trait, IRT provides a more realistic and valid conceptualization of reliability.

Information is a function of item parameters at any given trait level. For the 1PL model, information is the product of the probability of a correct response, $p_i(\theta)$, and the probability of an incorrect response, $q_i(\theta)$. Item information for the 2PL and 3PL models further incorporates the discrimination and guessing parameters. See Figure 16.2 for example IIFs relative to three of the IRFs presented in Figure 16.1. The IIF appears as a bell-shaped function with the maximum information provided at the location parameter. That is, information is greatest when the item’s difficulty and the person’s ability are matched. The shape of the IIF is indicated by the item discrimination, for models with varying discrimination parameters. Note that the highest IIF is for item E, which also has the highest discrimination ($a_i = 1.5$). Highly discriminating items provide more information over a narrow range, while low discriminating items provide less information over a broader range. For instance, the IIF for item E shows that a lot of information is available over a narrow range of abilities from about −2.0 to 0.0, centered at the item location parameter for that item ($b_i = -1.0$), while the IIF for item D shows that much less information is available for examinees with a much broader range of abilities.

Figure 16.2 Item information functions contrasted with their corresponding item response functions for three of the items in Figure 16.1 differing in discrimination and difficulty. (Plot omitted: horizontal axis Difficulty/Ability, from −3.0 to 3.0; vertical axes Probability and Information; curves are the IRF and IIF for items A, D, and E.)

Due to local independence (the latent variable explains any relationship between items), item information is additive, so test information represented by the SIF is a sum of item information. The SIF can be recomputed just as the CTT statistic of alpha-if-deleted is calculated. The standard error of measurement at a given trait level is then the reciprocal of the square root of the SIF. Information in IRT allows the reliability of a test to be shaped for different ranges of ability. Figure 16.3 includes the SIF and SEM for a hypothetical test that includes the same six items illustrated in Figure 16.1. While the average difficulty of the six items is approximately 0.0, maximal precision is actually obtained for examinees at $\theta_s = -0.80$ on the common difficulty/ability scale because of the additional information provided by item E at $b_i = -1.0$ and the relatively little information provided by item D at $b_i = 1.0$. Thus, this hypothetical test would yield the most precise measurement for individuals who are approximately 0.80 standard deviations below average on the trait of interest.

Figure 16.3 Item information functions, scale information function, and test standard error for a hypothetical test that includes the six items originally presented in Figure 16.1. Note that item E has the highest discrimination and thus the most information. Even though the average difficulty is 0.0, maximum precision is obtained for examinees that are approximately 0.80 standard deviations below ‘average’. (Plot omitted: horizontal axis Difficulty/Ability, from −3.0 to 3.0; vertical axes Information and Standard Error; curves are the IIFs for items A to F, the SIF, and the SE.)

As a parallel to coefficient alpha, an empirical reliability coefficient can be computed as an average reliability across examinees. The empirical reliability coefficient (see du Toit, 2003) may be given as the ratio of the variance in estimated scores for the sample, $\sigma_\theta^2$, to the sum of $\sigma_\theta^2$ and the mean square SEM ($\sigma_E^2$).
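
The quantities described in this section can be illustrated with a short computational sketch. It is not taken from the chapter: it assumes a logistic 2PL IRF without the 1.7 scaling constant, for which item information is $a_i^2 p_i(\theta) q_i(\theta)$ (reducing to the 1PL product $p_i(\theta) q_i(\theta)$ when $a_i = 1$), and it uses hypothetical item parameters rather than the values plotted in Figure 16.1. Only the pair $a_i = 1.5$, $b_i = -1.0$ echoes item E as described in the text. The function names and the use of the population variance in the empirical reliability coefficient are likewise assumptions made for illustration.

```python
import math


def irf_2pl(theta, a, b):
    """Probability of a keyed response under a logistic 2PL IRF (no 1.7 constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Item information a^2 * p * q; with a = 1 this is the 1PL product p * q."""
    p = irf_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def scale_information(theta, items):
    """Scale information: item information summed over (a, b) pairs."""
    return sum(item_information(theta, a, b) for a, b in items)


def standard_error(theta, items):
    """Standard error of measurement: reciprocal of the square root of the SIF."""
    return 1.0 / math.sqrt(scale_information(theta, items))


def empirical_reliability(theta_hats, standard_errors):
    """Variance of trait estimates divided by that variance plus the mean squared SEM."""
    n = len(theta_hats)
    mean = sum(theta_hats) / n
    var_theta = sum((t - mean) ** 2 for t in theta_hats) / n  # population variance (assumed)
    mean_sq_sem = sum(se * se for se in standard_errors) / len(standard_errors)
    return var_theta / (var_theta + mean_sq_sem)


# Hypothetical (a, b) pairs for illustration only; not the Figure 16.1 items.
ITEMS = [(1.0, -2.0), (1.0, 0.0), (0.5, 1.0), (1.5, -1.0)]

if __name__ == "__main__":
    grid = [x / 10.0 for x in range(-30, 31)]
    best = max(grid, key=lambda t: scale_information(t, ITEMS))
    print("theta with maximum scale information:", best)
    print("standard error at that theta:", round(standard_error(best, ITEMS), 3))
```

Under these assumptions, the information for an item with $a_i = 1.5$ peaks at its location parameter with a value of $1.5^2 \times 0.5 \times 0.5 = 0.5625$, and dropping an item from the list and recomputing the sum corresponds to the recomputation of the SIF described above.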

Validity

While reliability is typically defined as consistency, validity is typically presented as accuracy, or the degree to which a test measures the construct it purports to measure. Validity generally involves either demonstrating a pattern of correlations with other variables that is consistent with theoretical expectations or demonstrating that some theoretically supported experimental manipulation of the construct results in the expected changes in the construct. For example, evidence of validity is established for a measure of depression if it positively correlates with other established measures of depression, negatively correlates with measures of positive affect, and does not correlate with measures of a theoretically unrelated construct. A measure of stress should result in higher levels of stress after examinees have been exposed to a stressor. Reliability is a necessary precondition for validity, as test scores can be reliable but not valid. For example, scores from a ‘new’ measure of depression may have high internal consistency or a high correlation with repeated administrations suggesting strong test-retest reliability, yet the same scores may not correlate at all with established measures of depression or show clinically meaningful differences in depression for diagnosed versus non-diagnosed patients.

IRT has a well-established place in testing research and applications, but CTT factor analytic models remain prevalent in most areas of social research despite their close association with IRT (McDonald, 1999): factor loadings are comparable to item discriminations and factor thresholds are comparable to item difficulties, although thresholds are rarely used in factor analysis. IRT use may be more dependent on the underlying theoretical nature of the construct of interest rather than the discipline in which it is employed. For instance, IRT is appropriate for latent or reflective constructs but not formative or emergent constructs like social status or mental health (Reise & Henson, 2003). IRT is better suited for narrower constructs, rather than broad multidimensional constructs. In addition, IRT can be used to determine more precise subscores and then used to combine the subscores into a general score comparable to a second-order factor model (Thissen & Wainer, 2001). Birnbaum (1968) showed that IRT provides a weighted scaling that results in the smallest possible (most precise) SEM, but does IRT make a practical difference? The common answer (e.g. Reise & Henson, 2003) is ‘maybe’ as there is evidence on both sides. Interestingly, the split evidence parallels the two aspects of construct validity.

Construct validity. The traditional view of validity holds that there are three ‘types’ of validity information: how well the test represents the trait (content validity), how well the test predicts performance (criterion-related validity), and what scores on the test mean (construct validity). Currently, validity is viewed as consisting of only one type, construct validity, which has several aspects that apply to all tests (Messick, 1994). These aspects of validity are external validity, substantive validity, content validity, structural validity, generalizability, and consequential validity. External validity most closely reflects Cronbach and Meehl’s (1955) emphasis on the nomological network of relationships to explicate construct validity.

Research to explicate the meaning of the nomological network of relationships examines individual differences relationships between the focal trait and other variables. The focal trait(s) should represent interpretable dimensions, account for the test’s external validity, differentially predict learning under various treatments or in separate content areas, and generalize across tasks. Test scores show little difference in relative standing among examinees when determined by both CTT and IRT. In fact, the CTT raw score is a sufficient statistic for estimating the examinee trait parameter under the 1PL (Rasch) model. CTT raw scores and 2PL examinee trait estimates also tend to correlate in the upper 0.90s (Reise & Henson, 2003). In a well-designed instrument, item discriminations or weights do not differ widely, so a 2PL model would effectively employ unit
weighting and be comparable to the 1PL model. Reise and Henson (2003) report that in personality research, there is no substantial evidence to show that IRT increases the magnitude of the validity coefficient and thus external validity.

External validity, however, is not sufficient to elucidate construct meaning. The relationship between test scores and other variables elaborates the test’s nomological network, but this confounds the meaning of a construct with its significance. Even though construct significance is elaborated by empirically established relationships, construct meaning is not (Bechtoldt, 1959). Establishing substantive validity (Messick, 1994) more directly involves construct meaning. Embretson (1983) suggested that the theory behind a construct must be brought into more of a central role in defining construct meaning by differentiating construct representation from nomothetic span. That is, the construct representation aspect of construct validity (see Embretson, 1983; Messick, 1994) is explained by understanding the cognitive processes and strategies that are involved in items, as well as by understanding the specific knowledge that is required for successful item completion. Even though nomothetic span is supported by individual differences relationships, construct representation is supported by studying the impact of item and task features on item responses. This distinction results in several advantages for test development, including the capacity to design items to reflect specific cognitive constructs and to select items for stimulus features that influence targeted processes.

IRT and CTT differ greatly in their potential for explicating construct representation and for guiding test design. Four general criteria can be applied to evaluate a psychometric methodology for construct validity: relating individual test performance to the characteristics of item stimuli, providing a comparison of alternative theories of the task constructs, establishing specific terms for theoretical construct quantification, and measuring individuals on the constructs involved. CTT does not meet these criteria because it is test-score oriented and does not link item properties directly to the trait. Embretson (1983) noted that all four criteria could be met by multicomponent latent trait modeling, which combines IRT with mathematical modeling. In this approach, task decomposition is applied to test items as a basis for estimating the theoretical parameters. Some early examples of this approach are the linear logistic test model (LLTM; Fischer, 1973), the multicomponent latent trait model (MLTM; Whitely, 1980), and the general component latent trait model (GLTM; Embretson, 1984).

Development of measures from cognitive principles. Even though researchers in psychological and educational measurement have been interested in developing tests based on cognitive principles for quite some time, little has been done to advance that interest. Aptitude and ability tests are frequently described using cognitive terms, but the real utility of cognitive theory has been widely ignored (Pellegrino, 1998). Cognitive psychology principles can be useful for test design because justifiable operational definitions are required for the construct measurement, the field frequently takes advantage of detailed task stimulus property descriptions, and they provide results on how item properties influence the cognitive processes involved in problem solving (Embretson, 1998a). Understanding the sources of cognitive complexity in items can lead to effective means of item generation. The stimulus features that are quantified to represent sources of cognitive demand potentially can be manipulated to develop items with specified sources and levels of cognitive complexity. Since the stimulus features are quantified in the cognitive model, item difficulty is predictable, depending on the strength of the model. This further leads to the possibility of quickly producing a large number of items that may require little or no empirical tryout, due to the priors from the model predictions (Mislevy, 1993). Effective cognitive models have been developed for many non-verbal intelligence tests (Bejar, 1993; Embretson & Gorin, 2001; Embretson, 2002) and several researchers have demonstrated the potential to generate test items based on a specific
cognitive theory (Hornke & Habon, 1986; Bejar & Yocam, 1991; Embretson, 1994). Despite these successes, psychometric tests designed from cognitive theory have been rare. See Embretson (1995, 1998a), Kyllonen (1993), Kyllonen and Christal (1989, 1990), and Draycott and Kline (1994) for examples of cognitive design systems and efforts towards developing psychometric measures from cognitive principles.

Standardization

A measure is standardized if there are uniform procedures to ensure that the measure is administered and scored the same way each time it is used. If so, two individuals who receive the same score can be interpreted to possess the same amount of the attribute. However, there is a great degree of variability in the procedures that are used for standardization. Measures scored through CTT can be easily standardized, but some of the special qualities of IRT allow for major advances in test administration.

Computer-based testing. The development of the computer and its application to testing has brought with it several improvements in test standardization. In contrast to the traditional pencil-and-paper mode of test administration, computer-based testing (CBT) has become a common form of test delivery. Perhaps the most notable advantage of CBT over the pencil-and-paper and interview formats is the level of administrative control given over the testing conditions. CBT simplifies administration, requires fewer resources, provides faster results, may be less prone to testing-related errors, may minimize examinee cheating, and has become more cost-effective as the cost of personal computers decreases and their prevalence increases (Mead & Drasgow, 1993). From a logistical perspective, CBT can reduce testing time, provide immediate scoring, allow more frequent testing, provide the opportunity for walk-in testing, allow individual administration, and increase test security by reducing the possibility that examinees can provide information to one another. More complex item types are available through increased capacity for expanded visualization, audition, and interaction, and the automated nature of computers can be capitalized on to develop or compile, process, and score tests, including complex responses such as open-ended questions. Finally, from a substantive perspective, CBT has the potential to assess new skills, some even better than other testing formats, and can allow access to data that is not readily available from a pencil-and-paper format (e.g. response time).

CBT is not without its unresolved problems, however. Access to computer testing centers or the internet, item security, test-delivery system reliability, and the expense of development are of primary concern. The psychometric quality of the tests, the adequacy of the supporting theoretical models, and the issue of whether test bias occurs due to the effect of access to technology on performance are very active areas of inquiry. See Mills et al. (2002) for the current state of the art in CBT.

Computerized adaptive testing. CBT itself is not an advance attributable to the benefits of IRT over CTT. However, an important issue related to CBT is item selection. In a conventional test, every examinee receives the same (or parallel) test form, with the same item set, in the same (or counterbalanced) presentation order. Conventional tests are usually administered through the pencil-and-paper format, but a computer may be used to administer the test as well. Conventional tests are usually geared towards the average examinee, so they are not the best estimators of ability for examinees at the extremes of the ability continuum (low or high). These tests can be time-intensive for the examinee, but as group tests, they are relatively convenient for the administrator.

Adaptive tests tailor item selection to meet the examinee’s individual ability levels by selecting from a pool of items that are the most appropriate for that particular examinee, so not all examinees will receive the same set of items. Traditional individual intelligence tests or subtests that require the administrator to determine a baseline ability level for the examinee and then administer increasingly more difficult items until a ceiling is reached
are examples of adaptive tests. Individual versions of these tests are time-intensive on the part of the researcher and the examinee. A computer-adaptive test (CAT) rapidly adjusts the difficulty level of the test to match the ability level of the examinee by using a computerized algorithm to select the items that are most appropriate for estimating the examinee’s ability levels from a large pre-calibrated pool of items, an item bank, with pre-determined item characteristics (a set of items with known IRFs). A CAT starts at a difficulty level that is deemed most likely to be accurate for the examinee (usually ‘average’). Depending on the accuracy of the initial response or set of responses, which is immediately scored by the computer, the item (or set of items) presented next is either more or less difficult than the last one. Thus the items administered are appropriate for the examinee’s ability level. Based on the previous responses, a trait estimate and standard error are estimated for the examinee, a new item is selected from the item bank, and so on. The iterative process is repeated until either a pre-specified number of items have been administered or a pre-specified minimum standard error is achieved.

Through an iterative process of testing, updating the ability estimate, and retesting, a CAT can arrive at a more accurate ability estimate than what can be obtained by a non-adaptive test. Potentially, each examinee receives a different set of items that is tailored to provide the most efficient estimate of his or her ability. As a result, some of the notable advantages to a CAT beyond those that are inherent to any CBT are: fewer test items (as few as half as many as pencil-and-paper tests; Wainer, 2000), less time required, enhanced test security because all examinees are potentially administered a different set of items, improved examinee test-taking motivation, and reduced average test score differences across ethnic groups. Current focuses for research on CBT and CAT involve technical problems such as item bank maintenance, pre-testing items to obtain item statistics, and item and test security.

FUTURE DIRECTIONS AND TRENDS

IRT is rapidly being implemented in areas other than ability and achievement testing. A major effort in health measurement, the PROMIS project, aims to provide common scaling for the diverse measures of subjectively reported patient outcomes (e.g. pain, depression, and fatigue). The potential advantages for health-related studies include shorter and more efficient measures; common scaling between long forms, short forms, and similar measures, which leads to greater comparability between clinical studies; and a rigorous comparison of item functioning from different tests of the same construct.

Current theoretical developments in IRT should permit an even greater role of existing substantive theories in the development and interpretation of measures. Perhaps the most important development is De Boeck and Wilson’s (2004) book that introduces a family of explanatory item response models with estimates that are obtainable with commonplace statistical software. The explanatory models, like the LLTM described above, allow construct validity to be elucidated at the item level. Further, these explanatory models are also important for predicting the properties of newly generated items for ability and achievement tests, as described above. However, important applications in other areas with possibly structured item properties, such as personality, attitude, and psychopathology, are feasible (see De Boeck & Wilson, 2004).

Another important theoretical direction is the development of new IRT models that incorporate response time in the estimation of the latent trait. For example, Tuerlinckx and De Boeck (2005) present alternative models to explain how response time impacts other IRT item parameters, such as item discrimination. Other research has been concerned with how to combine response time and accuracy into assessment (e.g. Glickman et al., 2005).

Finally, although computerized testing is rapidly becoming state of the art in many areas, internet testing has become an important variant. If internet testing occurs in a proctored laboratory, like computerized
testing, no new issues emerge, depending on the adequacy of the test administration programming. However, unproctored internet testing is quite controversial, prompting an American Psychological Association committee to outline the various issues involved in internet testing (see Naglieri et al., 2004). Even though unproctored testing cannot be generally recommended, research on the various issues is actively in progress.

SUMMARY AND CONCLUSIONS

This chapter has reviewed the principles of CTT and contrasted them with a modern measurement framework, IRT. IRT represents several important advances over classical methods, including the capacity to model the measurement process at the behavior level rather than at the instrument or person level; provide a meaningful and interpretable metric for comparing individual performance within a sample as well as between unrelated samples; provide a framework for acknowledging the non-uniform precision of measurement across the entire range of the trait; and provide the platform by which advances in computerized testing are possible.

IRT has its roots in educational testing and the general testing of mental abilities, and a vast majority of applications have been in these contexts. While CTT has historically been the dominant paradigm for measurement in the social sciences, and remains the preferred paradigm for a majority of applied researchers in the social sciences, the advances represented by IRT have been made apparent by this chapter, the numerous texts on the subject, and the rapidly expanding literature containing numerous applied examples. For instance, emerging work from a diverse set of applied research contexts is demonstrating the applicability of IRT in the broader social research context. Some emerging research contexts include personality assessment (Reise & Henson, 2003), stroke rehabilitation (Duncan et al., 1999;
cultural linguistic differences (Alderman & Holland, 1981), and physical functioning (McHorney & Cohen, 2000). Through the continued expansion of available and user-friendly software and an increase in approachable references and applications, IRT will continue to become better appreciated and further cemented in its status as the modern measurement framework.

NOTES

1 According to the phi-gamma hypothesis, when a series of stimuli are controlled to range in intensity from zero to high intensity, the probability that an observer can detect the increasing stimuli monotonically increases from zero to unity along a psychometric curve that can be represented by the cumulative normal distribution function. In modern measurement terms, as an examinee’s trait level increases relative to the difficulty of an item, the probability that the examinee will correctly answer the item increases according to the cumulative normal distribution function.

2 An equation is said to have a closed form if it can be expressed in terms of so-called ‘elementary functions’ such as addition, subtraction, multiplication, division, or exponentiation. In other words, it has a finite and exact solution. In contrast, estimation procedures such as MML and empirical Bayes require iterative procedures and result in an approximate solution that maintains a reasonably small amount of error.

3 Classical test theory suffers from the ‘physicalism-subjectivism’ dilemma that equal raw score differences do not necessarily correspond to equal differences in the true latent trait. This is related to the problem of ceiling and floor effects common in raw scores (Bereiter, 1963; Harris, 1963; Lord, 1963). It is considered well known that these dilemmas are solved or are at least less critical when using IRT (see Fischer, 1987, 1989, 1995b).

REFERENCES

Adams, R. A., Wilson, M., & Wang, W. C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1–23.
Alderman, D. L., & Holland, P. W. (1981). Item performance across native language groups on the Test of English as a Foreign Language (Research
Andres et al., 2004), smoking cessation (Noel, Rep. No. 81-16). Princeton, NJ: Educational Testing
1999), attitude measurement (Roberts, 1995), Service.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
Andersen, E. B. (1972). The numerical solution of a set of conditional estimation equations. Journal of the Royal Statistical Society, Series B, 34, 42–54.
Andres, P. L., Black-Schaffer, R. M., Ni, P., & Haley, S. M. (2004). Computer adaptive testing: A strategy for monitoring stroke rehabilitation across settings. Topics in Stroke Rehabilitation, 11, 33–39.
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561–594.
Andrich, D. (1985). Rasch measurement models. Newbury Park, CA: Sage Publishers.
Baker, F. B., & Kim, S. H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). New York: Marcel Dekker.
Bechtoldt, H. (1959). Construct validity: A critique. American Psychologist, 14, 619–629.
Bejar, I. I. (1993). A generative approach to psychological and educational measurement. In N. Frederiksen, R. J. Mislevy, & I. I. Bejar (Eds.), Test theory for a new generation of tests (pp. 323–359). Hillsdale, NJ: Erlbaum.
Bejar, I. I., & Yocam, P. (1991). A generative approach to the modeling of isomorphic hidden-figure items. Applied Psychological Measurement, 15, 129–138.
Bereiter, C. (1963). Some persisting dilemmas in the measurement of change. In C. Harris (Ed.), Problems in measuring change (pp. 3–20). Madison, WI: University of Wisconsin Press.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–424). Reading, MA: Addison-Wesley.
Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29–51.
Bock, R. D., & Aitken, M. (1981). Marginal maximum likelihood estimation of parameters: An application of an EM algorithm. Psychometrika, 45, 443–459.
Bock, R. D., Gibbons, R., & Muraki, E. (1988). Full-information item factor analysis. Applied Psychological Measurement, 12, 261–280.
Bond, T. G., & Fox, C. M. (2001). Applying the Rasch model: Fundamental measurement in the human sciences. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Choi, S. W., & McCall, M. (2002). Linking bilingual mathematics assessments: A monolingual IRT approach. In G. Tindal & T. M. Haladyna (Eds.), Large-scale assessment programs for all students: Validity, technical adequacy, and implementation (pp. 317–338). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Cook, L. L., & Eignor, D. R. (1983). Practical considerations regarding the use of item response theory to equate tests. In R. K. Hambleton (Ed.), Applications of item response theory (pp. 175–195). Vancouver, BC: Educational Research Institute of British Columbia.
Cook, L. L., & Eignor, D. R. (1989). Using item response theory in test score equating. International Journal of Educational Research, 13, 161–173.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt, Rinehart and Winston.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological test. Psychological Bulletin, 52, 281–302.
De Boeck, P., & Wilson, M. (2004). Explanatory item response models. New York: Springer.
DiBello, L. V., Stout, W. F., & Roussos, L. (1995). Unified cognitive psychometric assessment likelihood-based classification techniques. In P. D. Nichols, S. F. Chipman, & R. L. Brennan (Eds.), Cognitively diagnostic assessment (pp. 361–389). Hillsdale, NJ: Erlbaum Publishers.
Doran, N. J., & Holland, P. W. (2000). Population invariance and the equatability of tests: Basic theory and the linear case. Journal of Educational Measurement, 37, 281–306.
Drasgow, F., & Parsons, C. (1983). Applications of unidimensional item response theory models to multidimensional data. Applied Psychological Measurement, 7, 189–199.
Draycott, S. G., & Kline, P. (1994). Speed and ability: A research note. Personality and Individual Differences, 17 (6), 763–768.
Duncan, P. W., Wallace, D., Min Lai, S., Johnson, D., Embretson, S., & Laster, L. J. (1999). The stroke impact scale version 2.0: Evaluation of reliability, validity, and sensitivity to change. Stroke, 30, 2131–2140.
du Toit, M. (Ed.) (2003). IRT from SSI: BILOG-MG, MULTILOG, PARSCALE, TESTFACT. Lincolnwood, IL: Scientific Software International.
Embretson, S. E. (1983). Construct validity: Construct representation versus nomothetic span. Psychological Bulletin, 93, 179–197.
Embretson, S. E. (1984). A general multicomponent latent trait model for response processes. Psychometrika, 49, 175–186.
Embretson, S. E. (1991). A multidimensional latent trait model for measuring learning and change. Psychometrika, 56, 495–516.
Embretson, S. E. (1994). Application of cognitive design systems to test development. In C. R. Reynolds (Ed.), Cognitive assessment: A multidisciplinary perspective (pp. 107–135). New York: Plenum.
Embretson, S. E. (1995). The role of working memory capacity and general control processes in intelligence. Intelligence, 29, 169–189.
Embretson, S. E. (1996). Item response theory models and spurious interaction effects in factorial ANOVA designs. Applied Psychological Measurement, 20, 201–212.
Embretson, S. E. (1997). Structured ability models in tests designed from cognitive theory. In M. Wilson, G. Engelhard, & K. Draney (Eds.), Objective measurement III (pp. 223–236). Norwood, NJ: Ablex.
Embretson, S. E. (1998a). A cognitive design system approach to generating valid tests: Application to abstract reasoning. Psychological Methods, 3, 380–396.
Embretson, S. E. (1998b). Modifiability in lifespan development: Multidimensional Rasch Model for learning and change. Paper presented at the annual meeting of the American Psychological Association, San Francisco, August.
Embretson, S. E. (2002). Generating abstract reasoning items with cognitive theory. In S. Irvine, & P. Kyllonen, (Eds.), Generating items for cognitive tests: Theory and practice (pp. 219–250). Mahwah, NJ: Erlbaum.
Embretson, S. E. (2007). Impact of measurement scale in modeling development processes and ecological factors. In T. D. Little, J. A. Bovaird, & N. A. Card (Eds.), Modeling contextual effects in longitudinal studies. Mahwah, NJ: Erlbaum.
Embretson, S. E., & Gorin, J. (2001). Improving construct validity with cognitive psychology principles. Journal of Educational Measurement, 38, 343–368.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Fischer, G. H. (1973). The linear logistic model as an instrument in educational research. Acta Psychologica, 37, 359–374.
Fischer, G. H. (1987). Applying the principles of specific objectivity and generalizability to the measurement of change. Psychometrika, 52, 565–587.
Fischer, G. H. (1989). An IRT-based model for dichotomous longitudinal data. Psychometrika, 54, 599–624.
Fischer, G. H. (1995a). Derivations of the Rasch model. In G. H. Fischer, & I. W. Molenar (Eds.), Rasch models: Foundations, recent developments and applications. New York: Springer-Verlag.
Fischer, G. H. (1995b). Some neglected problems in IRT. Psychometrika, 60, 459–487.
Fraley, R. C., Waller, N. G., & Brennan, K. A. (2000). An item response theory analysis of self-report measures of adult attachment. Journal of Personality and Social Psychology, 78, 350–365.
Glickman, M. E., Gray, J. R., & Morales, C. J. (2005). Combing speed and accuracy to assess error-free cognitive processes. Psychometrika, 70, 405–425.
Goldstein, G., & Hersen, M. (1984). Handbook of psychological assessment. New York: Pergamon Press.
Guilford, J. P. (1954). Psychometric methods (2nd ed.). New York: McGraw Hill.
Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10, 255–282.
Guttman, L. (1957). Simple proofs of relations between the communality problem and multiple correlation. Psychometrika, 22, 147–157.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage Publishers.
Hambleton, R. K., & van der Linden, W. J. (1982). Advances in item response theory and applications: An introduction. Applied Psychological Measurement, 6, 373–378.
Harris, C. W. (Ed.) (1963). Problems in measuring change. Madison: The University of Wisconsin Press.
Holland, P. W., & Wainer, H. (1993). Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Hornke, L. F., & Habon, M. W. (1986). Rule-based item bank construction and evaluation within the linear logistic framework. Applied Psychological Measurement, 10, 369–380.
Knol, D. L., & Berger, M. P. (1991). Empirical comparison between factor analysis and multidimensional item response models. Multivariate Behavioral Research, 26, 457–477.
Kyllonen, P. C. (1993). Aptitude testing inspired by information processing: A test of the Four-Sources Model. Journal of General Psychology, 120, 375–405.
Kyllonen, P. C., & Christal, R. E. (1989). Cognitive modeling of learning abilities: A status report of LAMP. In R. Dillon, & J. W. Pellegrino (Eds.), Testing: Theoretical and applied issues (pp. 146–173). New York: Freeman.
Kyllonen, P. C., & Christal, R. E. (1990). Reasoning ability is (little more than) working-memory capacity?! Intelligence, 14, 389–433.
Lazarsfeld, P. F. (1950). The logical and mathematical foundation of latent structure analysis. In E. A. Schulman, P. F. Lazarsfeld, S. A. Starr, & J. A. Clausen (Eds.), Studies in social psychology in World War II.
Vol. 4: Measurement and prediction (pp. 362–412). Princeton, NJ: Princeton University Press.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 44–53.
Lord, F. M. (1952). A theory of test scores. Psychometric Monograph, No. 7.
Lord, F. M. (1963). Elementary models for measuring change. In C. W. Harris (Ed.), Problems in measuring change (pp. 21–38). Madison: The University of Wisconsin Press.
Lord, F. M. (1980). Application of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Masters, G. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.
Maxwell, S. E., & DeLaney, H. (1985). Measurement and statistics: An examination of construct validity. Psychological Bulletin, 97, 85–93.
McDonald, R. P. (1985). Factor analysis and related methods. Hillsdale, NJ: Lawrence Erlbaum Associates.
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
McHorney, C. A., & Cohen, A. S. (2000). Equating health status measures with item response theory: Illustrations with functional status items. Medical Care, 38, 43–59.
Mead, A. D., & Drasgow, F. (1993). Equivalence of computerized and paper-and-pencil cognitive ability tests: A meta-analysis. Psychological Bulletin, 114, 449–458.
Mellenbergh, G. J. (1994). A unidimensional latent trait model for continuous item responses. Multivariate Behavioral Research, 29, 223–236.
Messick, S. (1994). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. Research Report RR-94-45. Princeton, NJ: Educational Testing Service.
Mills, C. N., Potenza, M. T., Fremer, J. J., & Ward, W. C. (Eds.) (2002). Computer-based testing: Building the foundation for future assessments. Mahwah, NJ: Erlbaum.
Millsap, R. E., & Everson, H. T. (1993). Methodology review: Statistical approaches for assessing measurement bias. Applied Psychological Measurement, 17, 297–334.
Mislevy, R. (1986). Recent developments in the factor analysis of categorical variables. Journal of Educational Statistics, 11, 3–31.
Mislevy, R. (1993). Foundations of a new test theory. In N. Frederiksen, R. Mislevy, & I. Bejar, (Eds.), Test theory for a new generation of tests (pp. 19–39). Hillsdale, NJ: Lawrence Erlbaum Associates.
Muthén, B. (1978). Contributions to factor analysis of dichotomous variables. Psychometrika, 43, 551–560.
Naglieri, J. A., Drasgow, F., Schmit, M., Handler, L., Prifitera, A., Margolis, A., & Velasquez, R. (2004). Psychological testing on the internet: New problems, old issues. American Psychologist, 99, 150–162.
Noel, Y. (1999). Recovering unimodal latent patterns of change by unfolding analysis: Applications to smoking cessation. Psychological Methods, 4, 173–191.
Pellegrino, J. W. (1998). Mental models and mental tests. In H. Wainer, & H. I. Brown (Eds.), Test validity (pp. 49–59). Hillsdale, NJ: Erlbaum.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Chicago, IL: The University of Chicago Press.
Raykov, T. (1997). Estimation of composite reliability for congeneric measures. Applied Psychological Measurement, 21, 173–184.
Raykov, T., & Shrout, P. E. (2002). Reliability of scales with general structure: Point and interval estimation using structural equation modeling. Structural Equation Modeling, 9, 195–212.
Reckase, M. D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21, 25–36.
Reckase, M. D., & McKinley, R. L. (1982). Some latent trait theory in a multidimensional latent space. In D. J. Weiss (Ed.), Proceedings of the 1982 item response theory and computerized adaptive testing conference (pp. 151–177). Unpublished manuscript, Minneapolis, University of Minnesota, Department of Psychology.
Reise, S. P., & Henson, J. M. (2003). A discussion of modern versus traditional psychometrics as applied to personality assessment scales. Journal of Personality Assessment, 81, 93–103.
Reise, S. P., Smith, L., & Furr, R. M. (2001). Invariance on the NEO PI–R Neuroticism scale. Multivariate Behavioral Research, 36, 83–110.
Roberts, J. S. (1995). Item response theory approaches to attitude measurement. (Doctoral dissertation, University of South Carolina, Columbia, 1995). Dissertation Abstracts International, 56, 7089B.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph, No. 17.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. New York: Chapman & Hall.
Spearman, C. (1927). The abilities of man. New York: Macmillan.
Takane, Y., & de Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52, 393–408.
Tate, R. (2002). Test dimensionality. In G. Tindal & T. M. Haladyna (Eds.), Large-scale assessment programs for all students: Validity, technical adequacy, and implementation (pp. 181–211). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Thissen, D., & Steinberg, L. (1984). A response model for multiple choice items. Psychometrika, 49, 501–519.
Thissen, D., & Steinberg, L. (1986). A taxonomy of item response models. Psychometrika, 51, 567–577.
Thissen, D., & Wainer, H. (2001). Test scoring. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Thorndike, R. M., & Lohman, D. F. (1990). A century of ability testing. Chicago: Riverside Publishers.
Tuerlinckx, F., & De Boeck, P. (2005). Two interpretations of the discrimination parameter. Psychometrika, 70, 629–649.
Vale, D. C. (1986). Linking item parameters onto a common scale. Applied Psychological Measurement, 10, 133–144.
Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3, 4–70.
van der Linden, W. J., & Hambleton, R. K. (Eds.) (1996). Handbook of modern item response theory. New York: Springer-Verlag.
Wainer, H. (2000). Computerized adaptive testing: A primer (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Waller, N. G., Thompson, J., & Wenk, E. (2000). Black-white differences on the MMPI: Using IRT to separate measurement bias from true group differences on homogeneous and heterogeneous scales. Psychological Methods, 5, 125–146.
Weiss, D. J. (1982). Improving measurement quality and efficiency with adaptive testing. Applied Psychological Measurement, 6, 473–492.
Whitely, S. E. (1980). Multicomponent latent trait models for ability tests. Psychometrika, 45, 479–494.
Yen, W. M. (1986). The choice of scale for educational measurement: An IRT perspective. Journal of Educational Measurement, 23, 299–325.
17
Natural and Contrived Data
Susan A. Speer
INTRODUCTION

In recent years there has been considerable debate concerning the relative advantages and disadvantages of ‘natural’ versus ‘contrived’ data or ‘unobtrusive’ versus ‘obtrusive’ methods¹. In this chapter I provide an overview and critical evaluation of these debates, illustrating my argument with analyses drawn from an empirical study conducted as part of my own research on the topic of ‘gender talk’ (Speer, 2002c, 2005). Gender represents a particularly interesting case for an analysis of the relative virtues of natural and contrived data, since most feminist research on gender frequently, if not habitually, studies talk generated using conventional social scientific research methods such as surveys, interviews and focus groups. Many feminists are of the view that since they deal with research topics that are often hidden from view, or are too sensitive or delicate to be accessed in random conversation (e.g. talk about gender identity, sex, infidelity, sexual harassment, rape and incest, for example), that they must artificially elicit talk about such topics from participants, just to render them studiable. In this chapter I highlight the kinds of gender-relevant evidence and insights that close analysis of a relatively contrived dataset provides. Even though I use these analyses to argue for the virtues of analysing naturally occurring data, at the same time I urge caution in applying the ‘natural/contrived’ distinction too rigidly. In particular, I suggest that whether or not a piece of data is natural or contrived depends largely on what one is going to do with it. I consider the implications of this analysis for the way feminists and other researchers derive and analyse gender talk.

NATURAL AND CONTRIVED DATA

For some time now, social scientists have made a distinction between: (i) ‘naturally occurring’, ‘natural’ or ‘naturalistic’ data; and (ii) ‘non-naturally occurring’, ‘researcher-provoked’, ‘artificial’, or ‘contrived’ data, arguing that the former are somehow qualitatively different from, preferable to, and/or ‘better’ (for the purposes of analysis) than the latter (see Ten Have, 1999: 48ff; Heritage, 1984: 234ff; 1988; Heritage and Atkinson, 1984: 2–5; Potter, 2002, 2003: 612ff; 2004; Potter and Hepburn, 2005a; in press; Potter and Wetherell, 1995; Sacks, 1984;
Schegloff, 1996a, 1996b; Silverman, 2006: 201).

Conversation analysts and discursive psychologists are among the chief advocates of this position, expressing a strong preference for working with ‘tapes and transcripts of naturally occurring interactions’ (Schegloff and Sacks, 1973: 291, emphasis added). Indeed, for many, this preference has become a requirement built into definitions of conversation analysis (CA). According to Hutchby and Wooffitt for example, CA is ‘the study of recorded, naturally occurring talk-in-interaction’ (1998: 14, emphasis in original). Similarly, Psathas argues that within CA ‘data may be obtained from any available source, the only requirements being that these should be naturally occurring’ (1995: 45, emphasis added). Others put this ‘requirement’ for natural data even more strongly. For example, Paul ten Have suggests that ‘it is essential for the CA enterprise to study recordings of natural human interaction’ (1999: 47, emphasis added) and that these recordings ‘should catch “natural interaction” as fully and faithfully as is practically possible’ (1999: 48). Likewise, Heritage and Atkinson assert that ‘within conversation analysis there is an insistence on the use of materials collected from naturally occurring occasions of everyday interaction’ (1984: 2, former emphasis added).

A variety of terms have been used alongside, and interchangeably with, references to ‘naturally occurring data’. Researchers work with ‘natural conversation’ (Sacks et al., 1974: 698), ‘natural conversational materials’ (Schegloff and Sacks, 1973: 291), ‘actual utterances in actual ordinary conversations’ (Schegloff, 1988a: 61), ‘actually occurring data’ (Heritage and Atkinson, 1984: 18), and ‘actual, empirical, naturally occurring garden variety actions’ (Schegloff, 1996a: 166). Here, the ‘natural’ or ‘actual’ is implicitly or explicitly contrasted with data that are ‘non-natural’, ‘contrived’ or ‘researcher-provoked’. So, Hutchby and Wooffitt argue that ‘naturally occurring’ refers to recorded interactions ‘situated as far as possible in the ordinary unfolding of people’s lives, as opposed to being prearranged or set up in laboratories’ (1998: 14, emphasis added). While naturally occurring data involve ‘real interests, investments, interactional trajectories’ which ‘are at stake and serve as formative context’ (Schegloff, 1998: 247), non-natural data are data that have been ‘got up’ by the researcher using an interview, an experiment, or a survey questionnaire (Potter, 2004: 205). Such data, then, ‘would not exist apart from the researcher’s intervention’ (Silverman, 2006: 201).

The issue of ‘researcher provocation’ appears central here: According to Schegloff and Sacks (1973: 291), natural interaction is not ‘coproduced with or provoked by the researcher’ (ten Have, 1999: 48), and the materials are ‘as uncontaminated as possible by social scientific intervention’ (Heritage, 1988: 130). Ten Have (1999: 49) argues that ‘the ideal is to (mechanically) observe interactions as they would take place without research observation’, while Drew (1989: 96) goes even further, asserting that the data must not have been ‘produced for the purpose of study’, or collected ‘for any pre-formulated investigative or research purposes’².

In what is still one of the clearest expositions of the ethnomethodological origins of CA, Heritage (1984) argues that CA’s insistence on the use of naturally occurring data is matched by an avoidance of data sources that are deemed ‘unsatisfactory’ (1984: 236). These include data from interviews, where participants’ reports of events are treated as an ‘appropriate substitute’ for a recording of the actual events; experiments and testing, which involve the ‘direction or manipulation of behaviour’; observational methods, where data are recorded in field notes or using pre-coded schemas (and which rely on the researcher’s post-hoc recollection or recall); and invented data (sentences, speech acts or exemplar dialogues) based on intuition or ‘idealizations about how interactions work’ (Heritage, 1984: 236; see also, Heritage and Atkinson, 1984: 2–5, ten Have, 1999: 53–4). In sum, advocates of ‘natural data’ overwhelmingly focus on ‘the details of actual events’ (Sacks, 1984: 26) and avoid the decontextualised kinds of data; the
‘hypotheticalized, proposedly typicalized versions of the world’ (1984: 25) commonly used in linguistic and philosophical approaches to language (see also Schegloff, 1988a).

Underlying this preference for natural data was Harvey Sacks’ desire to produce a stable (and hence reproducible) natural observational science of society (Schegloff, 1995, vol. 1: xxx–xxxii). As part of this, Sacks and his colleagues aimed to produce an inventory of ‘recognizable social actions in this culture … to find it and provide an account of it empirically and precisely, not imaginatively or typically or hypothetically or conjecturally or experimentally, and to use actual, situated occurrences of it in naturally occurring social settings to control its description’ (Schegloff, 1996a: 167). Sacks’ (1987: 54) argument was that if we are serious about producing empirically grounded descriptions of the social organisation of human interaction, then ‘sequences [or talk] are the most natural sorts of objects to be studying’. And yet, according to Sacks (1995, vol. 2: 5), researchers do not ‘have a strong intuition for sequencing in conversation’. Indeed, no matter how rich the researcher’s imagination (Sacks, 1995, vol. 2: 419), if we work with idiosyncratic, invented or hypotheticalised-typicalised data examples, then we risk producing what Schegloff calls a ‘sociology by epitome’ (1988b: 101), overlooking precisely those features of interaction and its sequencing that might tell us something new or surprising about the phenomena we are studying. As Sacks notes, it is only ‘from close looking at the world [that] you can find things that we couldn’t, by imagination, assert were there’ (1995, vol. 2: 419).

Likewise, where the researcher uses ‘written texts, monologues, talk or writing produced under experimental or quasi-experimental conditions’ (Schegloff, 1996b: 468), then the interactional practices which ‘undergird’ our ‘natural phenomena’ of interest may be ‘largely or totally absent … suppressed by specially designed circumstances of production’ (1996b: 468). Experimental control and standardisation ‘of stimuli, conditions, topics, etc.’ (Schegloff, 1996b: 468) suppress ‘the very heart of the phenomena we are trying to understand’ (1996b: 468), and ‘confront participants with quite distinctive, and potentially complicating, interactional exigencies’ (1999: 419)³. And yet, ironically, even interviews and experiments rely in their design on the identification of relevant variables for study taken from the observation of naturally occurring interaction. As Heritage puts it, ‘it is unlikely that an experimenter will be able to identify [control and manipulate] the range of relevant variables without previous exposure to naturally occurring interaction’ (1984: 238, see also Schegloff, 2004).

FEMINIST PERSPECTIVES ON NATURAL AND CONTRIVED DATA

One group of researchers for whom this preference for naturally occurring data has proven especially problematic, is feminist researchers. Indeed, as I note above, most feminist research on gender frequently, if not habitually, studies talk generated using conventional social scientific research methods such as surveys, interviews and focus groups. As C. Kitzinger (2000: 170) observes, very little feminist research is conducted using naturalistic data where gender and sexuality ‘just “happen” to be present’.

There are three main reasons why feminist researchers have been reluctant to stray far from the use of such ‘contrived’ materials: First, it seems to be a widely held, tacit assumption that since gender is, for most ‘ordinary’ members, taken for granted and thus background to interaction, the researcher must artificially elicit talk about gender from participants (i.e. they must ‘topicalize’ gender) just to make it visible. I made precisely this assumption in my early research on masculinity, where I asked my respondents questions like ‘do you ever think you behave in a way that’s not traditionally masculine?’, and ‘do you think the fact you’re male affects your leisure in any way?’ (Speer, 2001; for a discussion of related issues see C. Kitzinger, 2006). For researchers who adopt this approach, far from suppressing
the kinds of ‘natural’ phenomena to which they wish to gain access, by exposing the phenomena to the researcher’s view, contrived materials render them studiable.

Second, many feminist researchers deal with topics that they deem to be too sensitive, private or delicate to be accessed in their ‘home’ environments (e.g. talk about sex, infidelity, sexual harassment, rape, incest, and so on). They commonly assume that it is extremely difficult or impossible to gain access to settings where instances of the phenomena or talk about the phenomena, ‘crop up’ as a matter of course. As Tainio (2003: 173–4) remarks, ‘It is seldom possible to get recordings of actual instances of “sexual harassment”’. Many feminists overcome these problems of access by obtaining members’ retrospective reports on their experiences of the phenomena, using those reports as relatively unproblematic, unreconstructed evidence for the underlying experiential reality. For example, C. Kitzinger and Frith (1999, and for a recent overview of this study see Wilkinson and C. Kitzinger, 2007) collected focus group data of women’s reported accounts of their difficulties in saying no to unwanted sexual advances, in order to help them understand the actual difficulties that women might experience in saying no to sex. As they note ‘We are not aware of any research which has used as data actual naturalistically occurring acceptances – or refusals – of sexual interaction’ (1999: 300). And yet, as Tainio (2003) has since shown, it is both possible and hugely illuminating, to access and analyse actual instances of sexual harassment and refusal in action.

Third, and relatedly, many feminists are of the view that certain topics are mentioned far too infrequently in random conversation to be captured through naturalistic means. For example, gender and language researchers have often found it difficult to obtain examples of ‘sexist talk’ in naturally occurring materials and have resorted to using made-up or remembered examples of talk in their place. In her classic and still much heralded study of gender and language, for example, Robin Lakoff (1973) justified using contrived (hypothetical and anecdotal) examples of sexist talk, by referring to the pragmatics of obtaining and accessing a sizeable enough corpus of such data. She argued that:

    random conversation must go on for quite some time, and the recorder must be exceedingly lucky anyway, in order to produce evidence of any particular hypothesis, e.g. that there is sexism in language. If we are to have a good sample of data to analyze, this will have to be elicited artificially from someone; I submit I am as good an artificial source of data as anyone. (Lakoff, 1973: 47)

Although she was writing more than three decades ago, Lakoff’s views are echoed frequently by contemporary gender and language researchers in order to justify the use of contrived materials. Mary Bucholtz (2004: 123), for example, commends Lakoff’s use of an ‘introspective methodology’ and desire to ‘locate herself so squarely within her text’ as an example of feminist ‘reflexivity’. Similarly, Livia (2003: 147) argues that the use of constructed dialogue or scripts, can ‘allow us to see . . . what expectations speakers have of patterns of speech appropriate for each sex’⁴. Some feminist discourse analysts even suggest that contrived sources like interviews and focus groups may ‘yield richer data’ than naturally occurring talk, ‘simply because the topic has been pre-set’ (Sunderland, 2004: 183).

THE BIOGRAPHY OF A FEMINIST RESEARCH PROJECT

I began my own research career with a similar set of assumptions. My early interest was in the topic of gender and leisure, and specifically, people’s views about men and women’s participation in ‘non-traditional’ activities (such as men’s ballet and women’s rugby, for example). I knew, on the basis of both commonsense and my own experience as a member of this culture, that when confronted with instances of men and women breaking norms and engaging in activities considered ‘inappropriate’ for their sex, many people would express negative (sexist and homophobic) views. Therefore I thought that
if I could contrive it so that my participants would talk about their views on such topics, and do so in a fairly naturalistic and spontaneous manner, then I might be able to say something productive about the conversational and interactional practices that members deploy in order to reproduce and maintain restrictive gender norms within society. In sum, I would be able to access what sexism and heterosexism looks like ‘in action’.
I considered a range of methods through which I might be able to access such talk. Given that I was driven by a feminist political agenda, the issue for me was how to obtain topic-focused information at the same time as giving respondents a degree of control over the research agenda. Indeed, as a feminist, one of my primary methodological concerns was to encourage non-hierarchical research relationships and to avoid imposing my own analytic categories and concepts on my respondents5. Instead, I wanted to ‘give voice’ to my participants (C. Kitzinger, 2003) and have them, as far as possible, ‘assert their own interpretations and agendas’ (Wilkinson, 1999: 233). I was acutely aware that the more keenly my presence and direction was felt by those present, and the more ‘contrived’ the research setting, the less likely it was that the participants would respond ‘in their own terms’. Conversely, if I was to impose no structure or framework on the discussion at all, I might be left with a confusing array of incomparable, irrelevant or ‘off-topic’ responses. So, my ultimate aim was to adopt methodological procedures that would facilitate the collection of relatively naturalistic and spontaneous talk about my phenomena of interest, but in a relatively researcher controlled – and hence contrived, fashion.

USING PROMPTS AS STIMULUS MATERIALS

Though not heralded as specifically feminist, one method that is increasingly recommended by feminists who want to collect spontaneous, topic-focused data from within a non-hierarchical, participant-centred framework, is the use of prompts as ‘stimulus materials’ (C. Kitzinger and Powell, 1995; J. Kitzinger and Barbour, 1999; Wilkinson, 1999). I use the concept of ‘prompt’ broadly to refer to audio and video clips (Schlesinger et al., 1992), photographs, magazine images, advertisements and newspaper clippings (J. Kitzinger, 1990, 1994), objects (Chiu and Knight, 1999), vignettes (Finch, 1987; Hughes, 1998; Sleed et al., 2002), sentence and story completion exercises (C. Kitzinger and Powell, 1995; Pollak and Gilligan, 1982), group exercises, games, and set tasks (Snelling, 1999), concept mapping (Campbell and Salem, 1999), and the sorting and ranking of cards (J. Kitzinger, 1990). While some disadvantages of prompts have been noted: stimulus materials ‘can make people feel uncomfortable (“it’s like being back at school”)’ (J. Kitzinger and Barbour, 1999: 12), they are, nonetheless, overwhelmingly regarded in a positive light. Feminist researchers have argued that prompts are ‘a useful tool for stimulating discussion’ (J. Kitzinger, 1990: 323) and represent ‘a very effective way of exploring people’s understandings’ (1990: 330). This is primarily because they take the focus away from the researcher and allow the participants themselves to set the agenda. As J. Kitzinger and Barbour (1999: 12) note, prompts can ‘engage people in discussion without the researcher providing any vocabulary or terminology’. Prompts, then, seemed to provide the ideal solution to the problems I was facing.
I decided to use prompts in my research, believing that they would provide an interesting, often provocative stimulus around which to generate discussion about gender issues. They would (I thought) encourage respondents to produce ‘gendered’ views in a relatively naturalistic, spontaneous fashion, and with as little obvious direction from myself as possible. However, as is so often the case with the ‘real life’ application of social science methods, things did not turn out quite how I had first anticipated. Indeed, it did not occur to me at the outset that
adopting these procedures might generate its own set of problems, or that my presence in the interactions would have the impact that it did.
I found that while the prompts were certainly useful and provocative, they did, nonetheless, often fail to work in the way I had intended them to work. In practice, the prompts did not seem to minimise my impact, encourage the respondents to set the priorities, or produce spontaneous or naturalistic ‘gender talk’. In fact, it was not always clear to the participants how they were supposed to respond to the prompts, and it often took further work on my part, and follow-up questioning, before I could elicit their (gendered) view.

METHOD AND ANALYSIS

In the remainder of this chapter, I want to revisit some of the data I obtained from this study in order to demonstrate what actually happened when the prompts were shown, and to consider what this might tell us about the relative virtues of natural and contrived materials.
The four excerpts I discuss below derive from a series of prompted one-to-one and group discussions. Research participants were drawn from several ‘naturally occurring’ friendship and family groups, and included a diverse range of men and women ranging in age from 20 to 70+ years. Visual prompts were drawn mainly from newspapers and magazines and showed images of men and women engaging in a variety of ‘non-traditional’ activities (men ballet dancing, and women boxing or playing rugby, for example). Some prompts showing men and women engaging in traditionally gendered activities were also used as a point of comparison (men playing rugby and women shopping, for example). All but one of the interviews and focus groups were moderated by myself. The remaining group was moderated by a second (female) moderator (referred to, respectively, as ‘Mod 1’ and ‘Mod 2’ below). All the data were transcribed verbatim in the first instance. Detailed transcripts were then worked up using conventions developed within CA by Gail Jefferson (2004a). A simplified version of these conventions is included in the Appendix.
In a search of the corpus I identified 58 occasions where a prompt was shown. In just under half of these instances the participants had no problems responding to the prompt, and engaging with the task set (for a discussion of some of these ‘successful’ instances see Speer, 2002c). However, in the remaining instances, the participants seemed to have some trouble identifying the content of the prompt. They sought clarification from the moderator of the grounds on which they were required to respond to the prompt, thus engaging her in work to disambiguate its content.
Consider the following two excerpts. In excerpt 1 (line 1) the moderator introduces a picture of a female football supporter in overlap with Keith and Alice’s discussion of rugby (Donald [line 6] is the first participant to respond to the prompt). In excerpt 2 (line 1), the moderator shows a picture of two women dancing in a club.
(1) SAS 28-12-97 A:22-3 Mealtime Discussion

1 Mod 1: F -> [>Ah this one.<]
2 Keith: [is a different] [ket- (of-)] I [think ru]gby=
3 Alice: [r u g b y ] [ra- pu-]
4 Keith: =is-[is a di]fferent kettle of fish.
5 Alice: [soccer ]
6 Donald: Fins-> I presume that is a (.) women- woman
7 supporter of foot↓ball.
8 Mod 1: Sins-> Yeah.
9 (0.8)
10 Donald: S -> We:::ll if that’s what she likes doing she
11 -> can like it but I don’t like the cigarette

12 -> hanging out of her mouth at a:ll.
(2) SAS 2-12-97 A: 8-9 Focus Group

1 Mod 2: F -> An’ how about (1.2) that one.
2 ?: ( )
3 (1.8)
4 Mod 2: -> °How do you react to that.°
5 (3.8)
6 Sarah: Fins-> °Is that two girls dancing together is
7 -> [it?]°=
8 Carole: Sins-> [Yeah.]=
9 Mod 2: Sins-> =Mm:.
10 (Sarah): ((sniff))
11 (1.2)
12 Carole: S -> >I think that’s prob’ly quite a normal (0.6)
13 -> normal thing [because-] coz of all this (.)=
14 (Sarah)?: [((sniffs))]=
15 Carole: -> =weird (.) kind of lighting eff[ect
16 ?: [heh heh
17 heh [.h h h ]
18 Carole: -> [it makes] it look really quite biza:rre,
In both excerpts, when the moderator shows the prompt, she treats the task that the participants are engaged in as one that is already familiar to them. Additionally, as the ‘first pair part’ of the sequence, the turn that accompanies the showing of the prompt strongly implies that there is something ‘comment-worthy’ or ‘notable’ about the image that the recipients might be able to respond or react to. In other words, the showing of the prompt invites – or makes ‘conditionally relevant’ – an appropriately fitted ‘second pair part’ in which the recipients produce some sort of evaluative commentary on the prompt (for more on adjacency pairs see Schegloff, 2007: 13ff). However, in both instances, the recipients do not, initially at least, produce such an evaluative commentary. Instead, they defer their evaluations until later (excerpt 1, lines 10–12, excerpt 2, lines 12–13, 15, and 18) in order to first check with the moderator whether what they have seen in the prompt is what they are supposed to see.
In each case their checks take the form of a question-answer ‘insertion sequence’ (Schegloff, 2007: 97ff). Insertion sequences are sequences within a sequence: they come after the base first pair part (i.e. the showing of the prompt) and before the base second pair part (i.e. the evaluation of the prompt). These insertion sequences are addressed ‘to contingencies of what is to be done next’ (2007: 100). In other words, they help the participants to establish the information and resources they need in order to appropriately evaluate the prompt and thus ‘to implement the second pair part [the evaluation] which is [still] pending’ (Schegloff, 2007: 106). The different parts of this sequence are marked in the left-hand margin of the transcripts, above.
In the first part of the insertion sequence (excerpt 1, lines 6–7 and excerpt 2, lines 6–7), the recipients ask a question which puts forward a possible candidate interpretation of what it is they see in the prompt and the grounds on which they might evaluate it. Notice that both Donald and Carole treat gender as the ‘relevant thing’ about the prompt (Edwards, 1998; Hopper and LeBaron, 1998). So, for Donald it is not just a supporter of football, but a ‘woman supporter of foot↓ball’ (said with emphasis on the repaired gender category, ‘woman’), whereas for Carole it is not just people dancing, but ‘two girls dancing together’ (said with emphasis on the word ‘girls’). In both cases the moderator (and in excerpt 2, another group member [line 8]) confirms these candidate
interpretations as ‘appropriate grounds’ for response: ‘Yeah.’ (excerpt 1, line 8) and ‘Mm:’ (excerpt 2, line 9). Now they have secured the moderator’s confirmation, the recipients in each case have the resources necessary to go on to produce their (non-gendered) evaluation of the prompt (i.e. the conditionally relevant base second pair part) – an evaluation that has been held in temporary abeyance by the insertion sequence.
The first two excerpts, then, are organised as follows:

[F] First pair part: Moderator shows the prompt and indicates that a response is a relevant next turn.
[Fins] First pair part of insertion sequence: Respondent asks a question which puts forward a possible candidate interpretation of what it is they see in the prompt and the grounds on which they might evaluate it. This candidate interpretation makes gender explicitly relevant.
[Sins] Second pair part of insertion sequence: Moderator confirms candidate.
[S] Second pair part. Respondent offers evaluative commentary on the prompt but does not follow up their prior gender noticing.

Insertion sequences are often found in environments of ‘dispreference’ (Pomerantz, 1984). Dispreference refers not to a psychological state but rather to an interactional one, and it concerns the kinds of alignment that a speaker of a second pair part takes up with respect to the first pair part (Schegloff, 2007: 59). In the data discussed here, the ‘preferred response’ (i.e. the alignment that the recipient should ideally take up with respect to the action that is initiated by the showing of the prompt) may be one in which they immediately identify what it is about the prompt that they are required to respond to, and evaluate it accordingly. However, in these excerpts, since the insertion sequence intervenes between the first and second pair parts of the sequence, it compromises the ‘progressivity of the base sequence’, and projects ‘the possibility of a dispreferred response’ (Schegloff, 2007: 100). Moreover, it indicates that the participants are somehow disaligned in this case, unable (or unwilling) to respond in the interactionally ‘preferred’ way to the showing of the prompt.
Other features of dispreference are evident in the lengthy delays both before the insertion sequence (the delay that coincides with the intervening talk on a separate matter in excerpt 1, lines 2–5, and the gaps in excerpt 2, lines 3 and 5), and after the insertion sequence, just prior to the evaluations proper (excerpt 1, line 9, excerpt 2, line 11). The evaluations themselves (excerpt 1, lines 10–12, and excerpt 2, lines 12–13, 15, and 18) are composed in a characteristically dispreferred format. So in both excerpts, even while the insert expansions serve to supply the information necessary for the respondents to proceed to their evaluative commentaries, those commentaries are nonetheless delayed (excerpt 1, line 9; excerpt 2, lines 10–11). Moreover, in excerpt 1, Donald’s commentary on the prompt is prefaced with the elongated ‘We:::ll’, which indicates that his upcoming response may disaffiliate with, or provide a disfavourable interpretation of, the content of the prompt (for more on the function of turn-initial ‘well’ see Schegloff and Lerner, 2004). Similarly, in excerpt 2, Carole’s evaluation is characterised by hedging (it is ‘prob’ly quite a normal . . . . thing’ [lines 12–13] and ‘quite biza:rre’ [line 18]), disfluency and perturbations: she stops and then re-starts her utterance after the word ‘normal’ pausing, mid-Turn Constructional Unit (TCU) (lines 12–13).
Notice that, in their evaluations of the prompt, neither Donald nor Carole expands on the ‘gendered’ grounds that they initially made relevant in their inserted question about the prompt. Indeed, they do not follow up on this prior gender noticing or mention anything at all about women football supporters or girls who dance. Thus although Donald’s wonderfully circular ‘if that’s what she likes doing she can like it’ (excerpt 1, lines 10–11) has a possibly slightly ‘disgusted’ tone which might be heard as evaluative and as indicative of some distaste for the (gendered) activity in question, the design of this turn is such that he does not so much evaluate the activity of women football supporting as pointedly pass up the opportunity to evaluate it. Likewise his contrastively negative ‘but I don’t like
the cigarette hanging out of her mouth at a:ll’ (lines 11–12) is not so much a negative evaluation of women football supporters as it is a personal view on an aspect of the image (cigarette smoking) that is seemingly unrelated (and built by Donald as unrelated) to the activity in question.
Similarly, in excerpt 2, the participants clearly take the task set by the moderator in which they are required to ‘react’ to the content of the prompt, as indicative of something potentially non-normative or incongruous about it. However, since, for them, they seem unable to find anything non-normative or ‘newsworthy’ about ‘two girls dancing together’, then rather than offer an evaluation which follows up on Sarah’s gender noticing at lines 6–7 (a noticing confirmed by Carole and the moderator at lines 8 and 9), they simply comment on the ‘normality’ of what the prompt depicts (‘>I think that’s prob’ly quite a normal (0.6) normal thing’ [lines 12–13]). Thus, their response is not so much an evaluation or commentary on the activity of ‘girls dancing together’ as it is an account for not having an evaluation. Indeed, the thing that seems most newsworthy about the image is not ‘girls dancing together’, but rather, non-normative features of the semiotics of the picture (‘because- coz of all this (.) weird (.) kind of lighting effect … it makes it look really quite biza:rre’ [lines 12–13, 15, and 18]).
In the two excerpts I’ve discussed so far the moderator responds to the inserted question about the prompt with the relevant second pair part in which she helps the recipients to disambiguate its content, thereby confirming that they have ‘correctly’ identified what the prompt depicts and the grounds on which they might appropriately evaluate it (excerpt 1, line 8, excerpt 2, line 9). One consequence of her participation in the insertion sequence is that the moderator helps progress the course of action toward her required interactional outcome – the respondent evaluations and the giving of (possibly gendered) views. However, at the same time, the help she provides may inadvertently reinforce the respondents’ presumption that she is a social science ‘expert’ with privileged access to, and knowledge about, the prompts. Indeed, in this context, the moderator’s first turn may be hearable by the respondents as an ‘exam’ or ‘test’ question for which there is a ‘right’ or ‘wrong’ answer (Levinson, 1992)6. The trouble and dispreference evident in these excerpts – and the very necessity for the question posed by the recipients in the insertion sequence – displays strongly the respondents’ presumption that the moderator already knows what is going on in the prompt, and that she has an expectation about what kind of reaction might be an ‘appropriate’ or the most ‘correct’ one.
This creates a paradoxical situation for the moderator. On the one hand, she uses picture prompts in order to generate non-hierarchical, participant-led discussion of topics that they draw out from the picture as relevant to them. On the other hand, the occasion is set up as one in which the prompt, and the moderator’s accompanying question, is taken by the respondents to be a ‘test question’, where their answer is actually not a free and unencumbered one, but rather one that is going to be measured against the knowledge that they surmise the moderator may already have about it.
In the next excerpt we see a possible attempt by the moderator to manage this paradox and re-establish a non-hierarchical research relationship. There is a considerable amount of complexity here which would repay a detailed analysis. However, for our present purposes I want simply to note that just as in excerpts 1 and 2, the recipients appear to have some trouble ascertaining the grounds on which they are required to respond to or evaluate the prompt. This trouble appears especially acute in this case because it revolves around the delicate problem of assigning a gender to the person in the image (lines 10–11 and 14–15). This trouble, combined with the moderator’s withholding of assistance at precisely those points where she could legitimately provide it (e.g. at lines 6, 8, 12, 16), provokes Alice to initiate the first pair part of an insertion sequence in which she reports her ‘first thoughts’ (Jefferson 2004b) on the gender of the person in the
image (constructed in a way that indicates she thinks she may well be wrong) (lines 10–11). When the moderator does not respond, she seeks confirmation regarding the correctness of her interpretation: ‘is it a wo↑man ↑Su°san?’ (lines 14–15). However, in this case, the moderator does not answer the inserted question, or confirm or disconfirm Alice’s candidate interpretation of what is going on in the prompt. Instead, she initially resists answering the question by using a conversational ‘counter’ (marked as ‘cnt’ in the left-hand margin of the transcript), which serves to throw Alice’s question about the prompt directly back to her for her to answer: ‘What do you ↑think?’ (line 17):
(3) 26-12-97 A: 36-7 Mealtime Discussion

1 Mod 1: F -> Right. What’s going [on in this one.]
2 Eadie: [((Hearing aid whistles))]
3 Jan: [Oh:!]
4 Alice: What is going on:,=
5 Mat: =Whistles.
6 (0.4)
7 Eadie: .hhm hhh.
8 (0.8)
9 Jan: Oh.
10 Alice: Fins -> >Oh it looks as if-< (.) ↑OOH I thought that
11 -> was a woman to start with.
12 (0.8)
13 Jan: Oh::.
14 Alice: -> I ↑thought it was a woman so it- (.) is it a
15 -> wo↑man ↑Su°san?°
16 (.)
17 Mod 1: Fins cnt -> What do you ↑think?
18 (3.4)
19 Jan: Ah [hah.]
20 Alice: Sins cnt -> [ I ] thought it was a woman in a- playing
21 [rugby.]
22 Mod 1: Sins -> >[No it] is a woman<.
23 (.)
24 Mod 1: -> It’s a woman with a cigarette in her mouth
25 -> and a can o’ lager.
26 Alice: Ye:s.
27 Mod 1: -> It’s a football supporter I think.
28 Alice: Oh: football supporter.
29 (3.4)
30 Mod 1: >Shall I pass it round?<
31 Alice: S -> We::ll. .hhh
32 Jan: Pass [(t h a t)]
33 (Alice): -> [(It’s j’st)]
34 Eadie: [((clears throat))
35 (1.2)
36 Alice: -> I mean probably, .hh (0.2) they dress up more
37 -> (.) nowadays than [they did.]
38 Jan: -> [That’s all] put on though,
39 -> that seems to me as if it’s just a big act.
Where insertion sequences serve to defer the production of a second pair part which is conditionally relevant but temporarily held in abeyance, a counter serves to ‘replace’ the second pair part ‘with a question of their own. They thus reverse the direction of the sequence and its flow; they reverse the direction of constraint’ (Schegloff, 2007: 17, emphasis in original). The interactional effect of this is that it is not the moderator who is now required to respond to the inserted question about the prompt, but the original questioner, Alice. In technical terms, the moderator essentially uses the counter to ‘redistribute the responsibility for producing a base second pair part’ (2007: 99).
Alice responds to the counter by reverting to the same first thoughts that she has already expressed twice (at lines 10–11 and 14) before asking her question about the prompt. This time she expands her candidate interpretation by adding the activity (rugby) to the gender element: ‘I thought it was a woman in a- playing rugby’ (lines 20–21). This response, as it turns out, offers the ‘wrong’ candidate, as the moderator’s subsequent turn – ‘>No it is a woman<’ (line 22), makes clear. Thus, although Alice has, in her reported first thoughts, correctly identified the gender of the person in the image, she has failed to correctly identify the activity that the woman is engaged in: it is not a woman rugby player but ‘It’s a woman with a cigarette in her mouth and a can o’ lager…. It’s a football supporter I think.’ (lines 24–5 and 27).
Note that the moderator’s conversational counter at line 17 does not, at this point in the sequence, project that she will go on to answer Alice’s question and provide the ‘correct’ interpretation of the prompt. Indeed, she could quite reasonably respond at line 22 with a further question: ‘what makes you think that?’, for example. However, as can sometimes happen with counters (see Schegloff, 2007: 17), in this case (and perhaps in part because Alice’s past tense ‘I thought it was’ construction may indicate that she still thinks she may be wrong), the moderator does end up producing the response to the inserted question that she has just thrown back to Alice for Alice to answer7.
So what should we make of the moderator’s use of the conversational counter in this excerpt? It is quite possible that she uses it in order to minimise her control over the research agenda, and to encourage the participants to define what they see in the prompt ‘in their own terms’. Indeed, the immediate effect of the counter is to return the conversational floor – and hence the responsibility for answering the question – directly to the recipients. However, one could argue that, in practice, by reversing the direction of the sequence and redirecting Alice’s question back to her, the counter constrains the recipients still further, putting them ‘on the spot’. Moreover, the positioning of the counter (after the first pair part of an insertion sequence) is doubly consequential in that, as I have already noted, insertion sequences tend to get launched in situations of dispreference. By throwing back a question to someone who asked it because they are already in the midst of trouble answering a just prior question, one strongly risks exacerbating rather than rectifying that trouble8.
Having finally established what it is about the prompt that they are responding to, the recipients turn their attention to providing the evaluation and/or commentary on the prompt that has so far been held in abeyance by the insertion sequence and conversational counter. As before, the respondents’ reactions to and evaluations of the prompt are marked as ‘dispreferred’, and preceded by a lengthy delay (line 29). In response to this delay, the moderator demonstrates that an evaluation (the base second pair part) is still pending, by offering to pass the prompt around the table (line 30). Rather like Donald in excerpt 1, Alice reacts with what looks like the start of a negative evaluation that disaffiliates with the activity shown in the prompt (line 31). However there follows a further delay (e.g. line 35) before Alice unpacks what it is that she is getting at: ‘I mean probably, .hh (0.2) they dress up more (.) nowadays than they did’ (lines 36–7). Just as we have seen with the participants’ reactions in previous excerpts, even though her earlier identification problems revolved around assigning a gender to the person in the image, Alice’s subsequent prompt-related commentary does not follow up on this gender relevance, or evaluate the activity
depicted in the prompt in gendered terms. In fact, her response at lines 36–7 is not so much an evaluation and commentary on the activity of women football supporting, as it is a remark on an aspect of the image (what women football supporters wear nowadays) that is arguably only marginally related to the (gendered) activity in question. Similarly, Jan’s commentary on the prompt, ‘That’s all put on though, that seems to me as if it’s just a big act’ (lines 38–9) is delivered as a qualification of Alice’s evaluation (with the ‘though’ marking the qualification [Pomerantz, 1984: 97]), and instead of evaluating the activity depicted in the image ‘on its own terms’, Jan’s assessment treats it as somehow ‘staged’ or ‘non-genuine’ and thereby as something that is possibly not worthy of her evaluation.
So far I have demonstrated how, when they are shown a prompt the respondents appear to have some significant difficulties both working out the grounds on which they are required to respond to it, and in making their evaluations proper. Instead of responding ‘in their own terms’ (and as I, as a feminist, might have wished), they tend to treat the moderator as an ‘expert’ with privileged access to the prompt, and engage her in additional interactional work in order to disambiguate its content. Even where the moderator works explicitly to avoid answering the respondents’ questions about the prompt, and encourages them, through the use of a conversational counter, to put things in their own terms, she is still engaged in work to disambiguate the content of the prompt and progress the interaction towards her favoured interactional outcome (i.e. the production of [gendered] views or commentaries on the prompt). Finally, where evaluations are eventually elicited by the moderator, the participants do not follow up on the gender noticing made relevant in their own earlier candidate inquiries about the prompt’s content.
So what might account for these interactionally ‘troubled’ responses in which participants do not follow up their own initial indexing of gender? One possible explanation is that the participants may have picked up on what they take to be the researcher’s ‘elusive hypothesis’ (that the showing of prompts will allow her to access the participants’ gendered views) and that their responses to the prompt may therefore be used indirectly to reveal something negative about them as people that is not immediately evident or apparent. In other words, they may have correctly identified that what’s ‘up for grabs’ is not whether the image depicted in the prompt is good or bad, but whether they’re good or bad. It would hardly be surprising given this context, if the recipients were to anticipate and work to avoid producing the kind of ‘identity implicative’ commentary that they assume the moderator is pursuing. In sum, there may be a sense in which the respondents’ apparent trouble with the prompt, the insertion sequences in which they seek clarification from the moderator, the delays, and the inexplicitness of their subsequent commentaries on, and evaluations of, the prompt may be part of resisting giving gendered views. If they do provide such views, then they could be labelled sexist or homophobic – and, as I will show below, this ‘oriented to’ possibility, creates the ideal environment for resistance.
Indeed, I want to propose that, in addition to fulfilling the task made relevant by the showing of the prompt, the design and delivery of participants’ responses to the moderators’ questions can perform resistive ‘identity work’. We can find clear evidence for this resistance in sequence organisational terms.
In a search of the corpus I found 12 instances in which respondents actively resist the production of a gendered view. In these instances, the interactions do not progress through the kinds of sequences identified above. Instead, they are characterised by an extended ‘series’ of (moderator) questions and (respondent) answers concerning the prompt. The moderator is more or less dissatisfied with the response she gets in each case, and doggedly pursues her course of
action until she either elicits some (gendered) or appears satisfied that none will be forth-
commentary on, or evaluation of, the prompt coming9 . Consider excerpt 4, below:
(4) SAS 27-12-97 B: 20-21 Interview

 1  Mod 1:  F          ->  >What do you think o’ that one then.<
 2                         (1.0)
 3  Ben:    S          ->  Lovely.
 4                         (0.8)
 5  Mod 1:  Fpost      ->  Well what’s going on in it.
 6                         (0.6)
 7  Ben:    Spost      ->  It’s a male ballet dancer.
 8                         (.)
 9  Mod 1:  SCT        ->  Ri:ght.
10                         (1.4)
11  Mod 1:  F          ->  Would you do that?
12                         (0.8)
13  Ben:    S          ->  Uh:m,
14                         (1.4)
15  Ben:               ->  No I’ve got dodgy ankles.
16  Mod 1:                 °Hhhh.°
17                         (2.2)
18  Ben:               ->  But if I could, (.) then I prob’ly would.
19                         (1.8)
20  Ben:               ->  If I: (1.0) was interested in it.
21  Mod 1:  Fpost a    ->  Do you think though that it breaks
22                         stereotypes at all.
23                         (.)
24  Ben:    Spost a    ->  No:.
25                         (0.4)
26  Mod 1:  Fpost b    ->  It doesn’t.
27                         (.)
28  Ben:    Spost b    ->  [No,]
29  Mod 1:  Fpost c    ->  [I] mean some people would say that he’s a
30                         ‘poof’ or something.
31                         (0.3)
32  Ben:    Spost c    ->  I think that some people would.
33                         (0.6)
34  Mod 1:  Fpost d    ->  But you wouldn’t.
35                         (.)
36  Ben:    Spost d    ->  ↑No.

As Schegloff (2007: 179–80) notes, ‘generally speaking, preferred second pair parts are “closure relevant” and dispreferred second pair parts are “expansion relevant”’. However, in certain types of sequences, called ‘topic-proffering’ sequences, ‘preferred responses engender expansion and dispreferred responses engender sequence closure’ (2007: 169). Despite some obvious differences (the prompt is an object shown to recipients in part for them to establish its ‘topicality’ or relevance), I want to suggest that the showing of the prompt, accompanied as it is by the moderator’s question, can be understood rather like a topic proffer. Examples of topic proffers include questions such as: ‘How was the races last night?’, ‘So are you dating Keith?’ and ‘So, you’re back?’ (Schegloff, 2007: 170–1). These questions, like the showing of the prompt, are ‘recipient oriented’: they refer to (but do not themselves progress) topics ‘about which the recipient is, or is treated as being, an/the authoritative speaker … or on which their view has special weight or authority’ (2007: 170). Indeed, in showing the prompt
the moderator treats the recipient as having access to, and as able to display a stance toward, it (2007: 171). Finally, within topic proffering sequences, just like the prompted sequences shown here, the recipient ‘is likely to carry the burden of the talking’ (2007: 170).
If we consider the showing of the prompt as akin to the initiation of a topic proffering sequence, then the preferred response for this kind of sequence would be geared toward the expansion, rather than closure, of the sequence. In each case, the recipients would display a stance toward the prompt that accepts, encourages, and embraces the proffered topic (they would literally talk about it) (Schegloff, 2007: 171), and their responses would be oriented toward being ‘more than minimal’. By contrast, a dispreferred response to a prompted topic proffer would be one in which the recipient rejects, declines, or discourages it (2007: 171). Dispreferred responses would therefore be designedly minimal, and the sequence would move toward ‘incipient closure’ (2007: 180).
Right from the start, Ben refuses to embrace the possibility for discussion engendered by the showing of the prompt, or to produce the extended evaluative commentary that it makes procedurally relevant. Instead, he responds with a delayed and starkly minimal, unmitigated, one-word answer to the moderator’s opening question: ‘Lovely’ (line 3). This is said with final intonation, and does not yield to the 0.8 second silence which follows. This silence provides ample opportunity for Ben to resume talking and thereby expand, unpack, or account for his (minimal) response. It is worth noting that his evaluation does respond in a ‘type conforming’ (Raymond, 2003) way to the moderator’s question (the question makes an assessment [what Ben ‘thinks of’ the prompt] relevant, and this is what Ben provides). However, it does not meet the requirement for expansion associated with the aforementioned preference organisation for a topic proffer. Ben’s bald response stands out as resistive here, not only because it is designedly minimal, patently ‘not playing along with’ the task set by the moderator, but also because he clearly does have direct access to the thing he is being asked about (at least he has direct visual access to the image depicted in the prompt).
The moderator shows that she understands Ben’s response to be disaligned with, and resistant of, the topic proffer. Her question ‘Well what’s going on in it’ (line 5) is ‘well’-prefaced – something that, as we saw earlier, can signal disagreement or disaffiliation with the prior (Schegloff and Lerner, 2004). As a post-expansion, it orients to the starkly minimal nature of Ben’s answer, and constitutes a ‘second try’ at the topic proffer. Although addressed to the same ‘target’ (the prompt), this question makes relevant a different class of answer to the prior – not an evaluation of the prompt, but a description of what is going on in the image – something Ben arguably needs to do before he can evaluate it.
Now Ben identifies the content of the prompt, and, like the participants in excerpts 1–3, he does so using a gender-marked term: ‘It’s a male ballet dancer’ (line 7). The moderator’s third position, ‘Ri:ght’ (line 9), shows that he is now on the right lines, grasping the nature of the task she is setting in showing him the prompt, and closes this part of the sequence.
The moderator continues by asking ‘Would you do that?’ (line 11), thus turning the focus away from the picture to Ben’s own relationship to the activity it depicts – ballet dancing. After a lengthy delay (lines 12–14), Ben answers ‘No’, explaining that he would not do ballet because he has ‘got dodgy ankles’ (line 15). The moderator appears to laugh briefly here, and a series of gaps follows (lines 17 and 19) in which she withholds any further response, allowing Ben to incrementally unpack his account for why he would not do ballet (lines 18 and 20).
There is much that could be said about the way Ben crafts this account. However, one of the most interesting features of it is that it seems designed so as to deflect the potential imputation that he would not want to do ballet for reasons of prejudice. Ben presents his reasons for not wanting to do ballet as due to his physical incapacity (his ‘dodgy ankles’ [line 15]) and lack of interest (line 20) rather
than his conscious choice. As he makes clear: ‘if I could, (.) then I prob’ly would’ (line 18). (For more on ‘inability’ accounts see Drew, 1984.)
There follows a series of post-expansions where the moderator makes a concerted effort to elicit (or else initiate repair on) Ben’s view about male ballet dancers (e.g. lines 21–22, 26, 29–30, and 34). However, Ben actively resists responding to each successive intervention on the moderator’s terms, producing only minimal answers (lines 24, 28, 32, 36). Even where the moderator invites him to reconsider his response with the initiation of a disagreement-implicative, other-initiated repair (‘It doesn’t.’ [line 26]; for more on the conversation analytic concept of repair see Schegloff, 2007: 151 and Schegloff et al., 1997), Ben does not work to resolve the misalignment by backing down, expanding his answer, or adjusting it to make it more acceptable to the moderator. Instead, he simply repeats his prior, bald ‘No’ response (line 28). That he does this is further evidence for resistance: he is pointedly refusing to ‘play along’ with the moderator’s agenda.
Ben’s resistance may be due, in part, to his being asked questions that may involve answers that could potentially place him in (what he takes to be) a negative identity category – as someone who is ‘effeminate’, ‘gay’ or ‘homophobic’, for example. His resistance to the latter is most obvious in his response to the moderator’s ‘[I] mean some people would say that he’s a ‘poof’ or something’ (lines 29–30). This observation is clearly designed in continuity with the moderator’s previous line of questioning, and in response to Ben’s failure to repair his minimal answer. In citing others’ hypothetical, prejudiced views, the observation is designedly provocative – placing Ben in a position where he might discuss his views on the normativity (or otherwise) of male ballet dancing. However, instead of treating the observation as something that is designed to elicit his view, or as another attempt by the moderator at a topic proffer, Ben simply agrees with the moderator’s assertion, producing a second pair part to a factual statement about what ‘some people would’ say (line 32). The grammatical construction of Ben’s turn – in particular the repetition of ‘some people would’ – is another way to embody a minimal response (Schegloff, 2007: 171). Indeed, this turn is designedly not adding anything to what the moderator’s prior utterance has done, and does not progress or develop the course of action or prompt-related ‘topic talk’. Finally, when the moderator pursues the question of his view: ‘But you wouldn’t’ (line 34), he simply provides a further bald and final ‘↑No’ response (line 36).
This excerpt neatly highlights some of the interactional contingencies that participants’ responses to picture prompts may be designed to manage. In this instance, the prompt is not treated by Ben as a facilitator of talk in which he is free to set the priorities. Rather, his response is co-constructed within a context of mutual suspicion, in which he exposes and seeks to manage what he takes to be the researcher’s (hidden) agenda. Specifically, Ben orients to the moderator’s questions, and his responses, as things that may reveal something negative about him (he may be effeminate, gay or prejudiced, for example). Instead of ‘playing along with’ the task set by the moderator by engaging in prompt-related topic talk (thus collaborating with the moderator in progressing the interaction toward the successful resolution of the sequence), Ben’s answers seem dedicated to pre-empting, deflecting, and actively resisting inferences that he is a certain sort of person, and which may have negative implications for his identity.

DISCUSSION

I began this chapter by summarising some key issues at the heart of debates about natural and contrived data. I suggested in particular that the strong preference for natural data expressed by conversation analysts and discursive psychologists derives from a concern not to suppress fundamental features of the natural interactional phenomena to which they wish to gain access. I argued that
this preference for natural data is especially problematic for feminist researchers who, for various reasons to do with assumptions about the observability, access to, and frequency of occurrence of the phenomena they wish to study, have tended to work with relatively contrived social science data sources such as surveys, interviews, and focus groups. For them, far from suppressing the kinds of ‘natural’ phenomena to which they wish to gain access, the artificially elicited ‘topic talk’ that they derive from contrived materials renders those phenomena observable – and hence studiable.
In order to explore the kinds of gender-relevant evidence and insights that close analysis of a relatively contrived dataset provides, I revisited some data from my own early research on gender and leisure, in which I used picture prompts in order to access people’s views about men’s and women’s participation in ‘non-traditional’ activities (such as men’s ballet and women’s rugby, for example). I showed that, when we subject the actual use of relatively ‘contrived’ techniques involving prompts to a detailed analysis, such techniques do not always work in the way the researcher might have intended them to work. Thus, in my data, the participants were invited to find something topical in, or ‘comment-worthy’ about, the prompt. In just under half the instances in the corpus, what they were invited to see was obviously and immediately self-evident to them, and they engaged in lively discussion about the (gendered) content of the prompt. In other instances, including the first three excerpts discussed in this chapter, the participants seemed to have trouble seeing what they were supposed to see in the prompt. They sought clarification from the moderator (in the form of an ‘insertion sequence’) of the grounds on which they were required to respond to it, engaging her in work to disambiguate its content. Thus, the participants routinely treated the moderator as ‘expert’ on the prompts and her opening question as a ‘test’ question for which there is a right or wrong answer. Even where the moderator tried to resist answering the participants’ questions about the prompt (through the use of a conversational counter, for example [as in excerpt 3]), she would often quickly re-engage in talk that would progress the course of action toward her favoured interactional outcome (i.e. the production of [gendered] commentary on/evaluations of the prompt). However, while the prompts initially appeared successful in getting the participants to notice gender, these initial gender noticings were rarely followed up in their subsequent evaluative commentaries.
Finally, in a number of instances (depicted here by excerpt 4), the participants strongly resisted seeing what they were supposed to see in the prompt, and it often took considerable constructive work on the moderator’s part, and further follow-up questioning, in order to produce the kind of non-minimal reaction to the prompt that the moderator was after. In these instances, the participants seemed suspicious about (what they took to be) the researcher’s ‘elusive hypothesis’, and oriented to the possibility that their responses might have negative implications for their identity. Far from being naive cultural dopes that passively accepted the doing of social science upon them, then, in these instances, participants would actively strive to subvert such an image. They resisted the potential inferences about their identities that were being imposed on them by researchers.
In sum, the prompts did not seem to minimise the researcher’s impact, generate non-hierarchical research relationships, or encourage the respondents to set the priorities ‘in their own terms’. As we have seen, their evaluations and commentaries on the prompt were rarely delivered in a spontaneous, unencumbered, or naturalistic fashion, and attempts to disguise researcher provocation as free-for-all opinion giving, or manipulation as complete freedom, did not work.
So what might these analyses tell us about the relative virtues of natural and contrived data? The interactional contingencies that I have shown the participants are oriented towards in their responses pose problems for researchers who treat prompts, or other ‘contrived’ techniques involving
researcher intervention, as neutral resources for accessing some ‘truth’ or ‘reality’ beyond or beneath the data.
The data show that, even where the researcher tries to remove herself as far as possible from the data collection process (as in this case through the use of prompts), her presence is still very much in evidence, and that the data collected, just like the data from interviews and other more ‘interventionist’ techniques that involve the manipulation and control of variables, are thereby always collaboratively produced, interactional products. Indeed, the very business of doing research undercut the intended neutrality of the prompts and reinstated normative conversational procedures – procedures that were in this case bound up in the construction of the data and the nature of the ‘gender talk’ obtained.
In many ways prompts and other contrived techniques create an artificial situation in which respondents are asked to comment on things that in more mundane contexts are not typically brought to relevance in such an explicit way. Indeed, this may be one reason why the respondents in my data did not respond immediately to the stimuli in the way I had initially hoped. The situation was fundamentally non-natural for them. Prompts, interview questions, experiments and suchlike are not neutral, non-invasive stimuli that help people formulate their thoughts and opinions on certain topics. Instead, the reality of ‘contrived methods’ is a socially constructed one and their use is embedded in collaborative, meaning-making activities.
Researchers have long since acknowledged that contrived methods cannot be neutral ‘machinery for harvesting data from respondents’ (Potter, 2004: 205). We can never achieve an unmediated access to participants’ realities, neutralise the context, or disinfect our data entirely of the researcher’s presence, because the knower is always intimately bound up in and partially constitutive of what is known. To assume otherwise is to deny the unavoidably social nature of data collection practices. As Holstein and Gubrium (1997: 114; see also 2003) point out, ‘any technical attempts to strip interviews of their interactional ingredients will be futile’ (see also Speer and Hutchby, 2003a, 2003b). Indeed, for feminists, such worries buy into the very illusion of objectivity and value neutrality that they have long since sought to expose and counter.
It is important to remember that views about gender are produced in thoroughly social and interactional contexts and that the use of prompts and/or more traditional social scientific methods does not make those contexts any the less contextual and interactional. It is for this reason that I am now of the view that prompted and other contrived techniques may not be the best way to gain access to talk about gender or to understand how people ‘do gender’ in everyday contexts. Indeed, we need to give serious thought to the extent to which artificially elicited ‘topic talk’ that involves putting members in a situation that explicitly requires them to comment on gender is paradigmatic of, or will necessarily give us access to, how members routinely do gender in other settings. As C. Kitzinger and Wilkinson (2003, emphasis in original) observe, ‘While this approach yields a great deal of talk about a category, it precludes any exploration of how people use categories interactionally in everyday life’.
It was precisely these concerns which encouraged me in my more recent work to collect examples of gender talk from settings where I was not present. I wanted to obtain ‘naturally occurring’ data in which gender crops up routinely as part of the day-to-day business of an institution, and where my own presence would not limit or constrain that gendered activity. In 2004 I began, in collaboration with Richard Green (a consultant psychiatrist and then Head of Charing Cross Hospital Gender Identity Clinic), a large-scale ESRC-funded study on the construction of transsexual identities in medical contexts (Speer and Green, forthcoming). This study involved me collecting more than 150 hours of audio and 20 hours of video-taped assessment sessions between psychiatrists and pre-operative transsexual patients. Unlike my own prior work, and much contemporary research
on gender and language (which tends to ask members to comment on gender, seeks out their retrospective reports on how they do gender, or else is based on the researcher’s own recollections and post hoc reports of gendered events), this new, naturally occurring dataset has allowed me to examine examples of interactions in which members are currently engaged in the act of doing gender with the psychiatrist in the clinic. And once I began to look at this dataset, it became apparent that often the doing of gender (in particular, working to pass as ‘authentically’ male or female in this setting) does not involve its overt topicalisation at all (for more on this see Speer and Green, 2007; Speer and Parsons, 2006; and also C. Kitzinger, 2006, 2007).
Some researchers suggest that in the future, it is likely that the use of ‘naturalistic materials’ will become more common in qualitative research ‘and interviews and focus groups will be mainly an adjunct to those naturalistic studies’ (Potter, 2003: 614). In a recent debate Potter and Hepburn (2005a: 282) ‘challenge the taken-for-granted position of the open-ended interview as the method of choice in modern qualitative psychology’, suggesting that ‘The ideal would be much less interview research, but much better interview research’ (2005a: 282). Indeed, Schegloff (1996b: 471, emphasis added) suggests that ‘investigators should increasingly work with such [naturalistic] materials’.
Even though I would generally subscribe to these recommendations, and have used the data in this chapter to demonstrate the virtues of analysing naturally occurring data, one important caveat needs to be noted: I would not want to imply that existing feminist data collection practices and modes of analysis are wrong or bad, or that we should stop using contrived materials and other ‘non-directive’ techniques altogether. The rhetoric of social science data – just like essentialism – can be a useful tool in certain circumstances (e.g. in a court of law). Nor would I want to imply that we will never obtain naturalistic talk, or gain access to general features of the ‘doing’ of gender in purportedly ‘contrived’ materials. Our inability to strip data of its context should not (necessarily) be adequate justification for abandoning the use of contrived materials altogether. As I have shown here, by adopting a reflexive approach to our data, and by being sensitive to the ways in which the researcher herself is bound up in the production of that data, we can obtain rich insights into respondents’ ways of managing the interactional issues and dilemmas their participation throws up.
By turning what is commonly regarded as a ‘resource’ (albeit an inherently flawed one) into their ‘topic’, an increasing number of researchers using fine-grained analytic methods have been able to show how social science methods get done, identifying features which characterise, say, interview talk as interview talk, and which distinguish it from ‘mundane conversation’ (Drew et al., 2006; Maynard et al., 2002; Mishler, 1986; Suchman and Jordan, 1990). In such studies the researcher is treated not as a potential ‘contaminant’ but rather as much of a ‘member’ as the other participants, and of equal status for the purposes of analysis.
Thus, I want to urge caution in applying the ‘natural-contrived’ distinction too rigidly. As I have argued elsewhere (Speer, 2002a, 2002b), from a discursive and CA perspective, it actually makes little theoretical or practical sense to map the natural/contrived distinction onto discrete ‘types’ of data or to treat the researcher as a potentially contaminating force. In this respect the natural-contrived distinction has been overplayed. What are natural data and what are not is not decidable on the basis of their type and/or the role of the researcher within the data. All data can be natural or contrived depending on what one wants to do with them.
Thus, it follows that it is fine if we, as feminist researchers, want to use contrived materials to explore how gender talk is derived in research contexts, paying close attention to the constructive processes involved (the data
can be ‘naturalised’ – or treated as natural, as it has been in this chapter). However, if one wants to analyse contrived data where participants are asked to comment on gender in order to discover how people routinely do gender in ‘everyday’ settings, then such prompted ‘gender commentary’ may not be the best data for such purposes.
Ultimately, what is needed if we are to make well-informed choices about the data we use, and – perhaps more importantly – if we are to produce theoretically sound, analytically tractable justifications for those choices, is a more sophisticated understanding of the relationship between method, context and data. We need to be clearer and more consistent about what exactly constitutes the object of our analysis, and to establish why a particular research method is chosen over and above others. In sum, we need to have a greater awareness of how our data collection practices shape the phenomena to which we wish to gain access.

APPENDIX

TRANSCRIPTION NOTATION

See Jefferson (2004a) for further information about these transcription symbols.

.             A full stop indicates falling, or stopping intonation.
,             A comma indicates a continuing intonation.
?             A question mark indicates rising intonation.
-             A dash marks a sharp cut-off of the just prior word or sound.
↑             An upward arrow immediately precedes rising pitch.
↓             A downward arrow immediately precedes falling pitch.
LOUD          Capitals mark talk that is noticeably louder than that surrounding it.
°quiet°       Degree signs enclose talk that is noticeably quieter than that surrounding it.
Underline     Underlining marks parts of words that are emphasised by the speaker.
Rea::lly      Colons mark an elongation or stretch of the prior sound. The more colons, the longer the stretch.
huh/hah/heh   Marks full laughter tokens.
(h)           An ‘h’ in brackets indicates laughter particles.
.hhh          A dot before an ‘h’ or series of ‘h’s indicates an inbreath.
hhh           An ‘h’ or series of ‘h’s marks an out-breath.
>faster<      ‘More than’ and ‘less than’ signs enclose speeded up talk.
=             An equals sign indicates immediate latching of successive talk.
(2.0)         The length of a pause or gap, in seconds.
(.)           A pause or gap that is hearable but too short to assign a time to.
[overlap]     Square brackets mark the onset and end of overlapping talk.
()            Single brackets indicate transcriber doubt.
(brackets)    Content of single brackets represents a possible hearing.
((laughs))    Double brackets enclose comments from the transcriber.

ACKNOWLEDGEMENT

I would like to thank Victoria Clarke and Jim Holstein for their helpful comments on an earlier draft of this chapter.

NOTES

1 See, for example, the debates between Speer (2002a, 2002b) and ten Have (2002), Lynch (2002) and Potter (2002) in Discourse Studies; between Potter and Hepburn (2005a, 2005b) and Hollway (2005), Mishler (2005), and Smith (2005) in Qualitative Research in Psychology, and finally, between Griffin
(2007a, 2007b), Henwood (2007) and Potter and Hepburn (2007) in Discourse Studies.
2 Jonathan Potter suggests that, in order to judge whether a piece of data is natural or not, ‘the test is whether the interaction would have taken place, and would have taken place in the form that it did, had the researcher not been born’ (1996: 135; Potter and Wetherell, 1987: 162). The data must, in other words, pass the ‘dead social scientist test’: the interaction must have taken place even ‘if the researcher got run over on the way to the university that morning’ (Potter, 2003: 612). From this perspective, doctor–patient interaction, courtroom trials, calls to the police, business meetings, talk in the classroom, and conversations between friends are all ‘natural’ (Potter, 1997: 148–9; 2003).
3 For example, as Potter (1997: 150) notes, the social science interview ‘is contrived; it is subject to powerful expectations about social science research fielded by participants; and there are particular difficulties in extrapolating from interview talk to activities in other settings’. This is not least because ‘the interaction in interviews and focus groups is flooded by the expectations and categories of social science agendas’ (2003: 613; see also Potter and Hepburn, 2005a).
4 For contemporary examples of the use of fictional data in research on gender and language, see Cameron (1998) and Hopper (2003).
5 For some recent discussions of feminist methodology see Harding and Norberg (2005), Lykke (2005), and Ramazanoglu and Holland (2002).
6 Such ‘known answer’ questions are conventionally associated with instructional or classroom settings, but can also be found in other contexts (Schegloff, 2007: 223–5): for example courtroom cross-examination, psychiatric assessment interviews, and even interactions around the dinner table.
7 Of course, the virtue of this response is that it allows the moderator to progress the interactional project toward sequence closure – eliciting recipients’ commentaries on, and evaluations of, the prompt.
8 Moreover, it is worth noting that the insertion sequence is already deferring the base second pair part (the evaluation of the prompt). The counter – which reverses the sequence – thereby takes the speakers even further away from the resolution of the moderator’s interactional project (i.e. progressing the sequence in the direction of prompt-related commentary and evaluation). As Schegloff (2007: 17) notes, a counter can end up ‘having only deferred the answer, and inserted one question-answer exchange inside another’. In this case, the counter serves simply to delay the moderator’s subsequent provision of the ‘correct’ answer.
9 This style of questioning is not dissimilar to the kinds of cross-examination one finds in legal settings (e.g. Drew, 1992).

REFERENCES

Bucholtz, M. (2004) ‘Changing places: Language and woman’s place in context’. In R. Lakoff. Language and Woman’s Place: Text and Commentaries (revised and expanded edn, ed. M. Bucholtz, pp. 121–8). New York and Oxford: Oxford University Press.
Cameron, D. (1998) ‘Is there any ketchup Vera? Gender, power and pragmatics’, Discourse & Society 9(4): 437–55.
Campbell, R. and Salem, D.A. (1999) ‘Concept mapping as a feminist research method: Examining the community response to rape’, Psychology of Women Quarterly 23: 65–89.
Chiu, L. and Knight, D. (1999) ‘How useful are focus groups for obtaining the views of minority groups?’ In R.S. Barbour and J. Kitzinger (eds) Developing Focus Group Research: Politics, Theory and Practice, pp. 99–112. London: Sage.
Drew, P. (1984) ‘Speakers’ reportings in invitation sequences’. In J.M. Atkinson and J. Heritage (eds) Structures of Social Action: Studies in Conversation Analysis, pp. 129–51. Cambridge: Cambridge University Press.
Drew, P. (1989) ‘Recalling someone from the past’. In D. Roger and P. Bull (eds) Conversation: An Interdisciplinary Perspective, pp. 96–115. Clevedon: Multilingual Matters.
Drew, P. (1992) ‘Contested evidence in courtroom cross-examination: The case of a trial for rape’. In P. Drew and J. Heritage (eds) Talk at Work: Interaction in Institutional Settings, pp. 470–520. Cambridge: Cambridge University Press.
Drew, P., Raymond, G. and Weinberg, D. (eds) (2006). Talk and Interaction in Social Research Methods. London: Sage.
Edwards, D. (1998) ‘The relevant thing about her: Social identity categories in use’. In C. Antaki and S. Widdicombe (eds) Identities in Talk, pp. 15–33. London: Sage.
Finch, J. (1987) ‘Research note: The vignette technique in survey research’, Sociology 21(1): 105–14.
Griffin, C. (2007a) ‘Being dead and being there: Research interviews, sharing hand cream and the preference for analysing “naturally occurring data”’, Discourse Studies 9(4): 246–69.
Griffin, C. (2007b) ‘Different visions: A rejoinder to Henwood, Potter and Hepburn’, Discourse Studies 4(9): 283–87.
Harding, S. and Norberg, K. (eds) (2005) ‘New feminist approaches to social science methodologies’, Special issue, Signs 30(4).
Have, P. ten (1999) Doing Conversation Analysis: A Practical Guide. London: Sage.
Have, P. ten (2002) ‘Ontology or methodology? Comments on Speer’s “natural” and “contrived” data: A sustainable distinction?’ Discourse Studies 4(4): 527–30.
Henwood, K. (2007) ‘Beyond hypercriticality: Taking forward methodological inquiry and debate in discursive and qualitative social psychology’, Discourse Studies 4(9): 270–5.
Heritage, J. (1984) Garfinkel and Ethnomethodology. Cambridge: Polity Press.
Heritage, J. (1988) ‘Explanations as accounts: A conversation analytic perspective’. In C. Antaki (ed.) Analysing Everyday Explanation: A Casebook of Methods, pp. 127–44. London: Sage.
Heritage, J. and Atkinson, J.M. (1984) ‘Introduction’. In J.M. Atkinson and J. Heritage (eds) Structures of Social Action: Studies in Conversation Analysis, pp. 1–15. Cambridge: Cambridge University Press.
Hollway, W. (2005) ‘Commentary 2’, Qualitative Research in Psychology 2(4): 312–14.
Holstein, J.A. and Gubrium, J.F. (1997) ‘Active interviewing’. In D. Silverman (ed.) Qualitative Research: Theory, Method and Practice, pp. 113–29. London: Sage.
Kitzinger, C. (2006) ‘Talking sex and gender’. In P. Drew, G. Raymond and D. Weinberg (eds) Talk and Interaction in Social Research Methods, pp. 155–170. London: Sage.
Kitzinger, C. (2007) ‘Is ‘woman’ always relevantly gendered?’, Gender and Language 1(1): 39–49.
Kitzinger, C. and Frith, H. (1999) ‘Just say no? The use of conversation analysis in developing a feminist perspective on sexual refusal’, Discourse and Society 10(3): 293–316.
Kitzinger, C. and Powell, D. (1995) ‘Engendering infidelity: Essentialist and social constructionist readings of a story completion task’, Feminism & Psychology 5: 345–72.
Kitzinger, C. and Wilkinson, S. (2003) ‘Constructing identities: A feminist conversation analytic approach to positioning in action’. In R. Harré and A. Moghaddam (eds) The Self and Others: Positioning Individuals and Groups in Personal, Political and Cultural Contexts, pp. 157–180. New York: Praeger/Greenwood.
Kitzinger, J. (1990) ‘Audience understandings of AIDS media messages: A discussion of methods’, Sociology of Health and Illness 12: 319–35.
Holstein, J.A. and Gubrium, J.F. (2003) ‘Context: Work- Kitzinger, J. (1994) ‘The methodology of focus
ing it up, down, and across’. In C. Seale, G. Gobo, groups: The importance of interaction between
J. Gubrium and D. Silverman (eds) Qualitative research participants’, Sociology of Health and Illness
Research Practice, pp. 297–311. London: Sage. 16: 103–21.
Hopper, R. (2003) Gendering Talk. East Lansing, MI: Kitzinger, J. and Barbour, R.S. (1999) ‘Introduction:
Michigan State University Press. The challenge and promise of focus groups’. In
Hopper, R. and LeBaron, C. (1998) ‘How gender R.S. Barbour and J. Kitzinger (eds) Developing
creeps into talk’, Research on Language and Social Focus Group Research: Politics, Theory and Practice,
Interaction 31(1): 59–74. pp. 1–20. London: Sage.
Hughes, R. (1998) ‘Considering the vignette technique Lakoff, R. (1973) ‘Language and woman’s place’,
and its application to a study of drug injecting and Language in Society 2: 45–79.
HIV risk and safer behaviour’, Sociology of Health and Levinson, S. C. (1992) ‘Activity Types and Language’. In
Illness 20(3): 381–400. P. Drew and J. Heritage (eds) Talk at Work: Interaction
Hutchby, I. and Wooffitt, R. (1998) Conversation in Institutional Settings, pp. 66–100. Cambridge:
Analysis. Cambridge: Polity. Cambridge University Press.
Jefferson, G. (2004a) ‘Glossary of transcript symbols Livia, A. (2003) ‘“One man in two is a woman”:
with an introduction’. In G.H. Lerner (ed.) Conver- Linguistic approaches to gender in literary texts’. In
sation Analysis: Studies from the First Generation, J. Holmes and M. Meyerhoff (eds) The Handbook
pp. 13–31. Amsterdam: John Benjamins. of Language and Gender, pp. 142–58. Oxford:
Jefferson, G. (2004b) “‘At First I Thought”: A normaliz- Blackwell.
ing device for extraordinary events’. In G.H. Lerner Lykke, A. (ed.) (2005) ‘Transformative methodologies in
(ed). Conversation Analysis: Studies from the First feminist studies’, Special Issue, European Journal of
Generation, pp. 131–67. Amsterdam/Philadelphia: Women’s Studies 12(3).
John Benjamins. Lynch, M. (2002) ‘From naturally occurring data to
Kitzinger, C. (2000) ‘Doing feminist conversation naturally organized ordinary activities: Comment on
analysis’, Feminism & Psychology 10(2): 163–93. Speer’, Discourse Studies 4(4): 531–7.
Kitzinger, C. (2003) ‘Feminist approaches’. In C. Seale, Maynard, D.W., Houtkoop-Steenstra, H., Schaeffer, N.C.
G. Gobo, J. Gubrium and D. Silverman (eds) and van der Zouwen, J. (eds.) (2002) Standardization
Qualitative Research Practice, pp. 125–40. London: and Tacit Knowledge. Interaction and Practice in the
Sage. Survey Interview. John Wiley: New York.
Mishler, E.G. (1986) Research Interviewing: Context and Narrative. Cambridge, MA: Harvard University Press.
Mishler, E. (2005) ‘Commentary 3’, Qualitative Research in Psychology 2(4): 315–18.
Pollak, S. and Gilligan, C. (1982) ‘Images of violence in thematic apperception test stories’, Journal of Personality and Social Psychology 42: 159–67.
Pomerantz, A. (1984) ‘Agreeing and disagreeing with assessments: Some features of preferred/dispreferred turn shapes’. In J.M. Atkinson and J. Heritage (eds) Structures of Social Action: Studies in Conversation Analysis, pp. 57–101. Cambridge: Cambridge University Press.
Potter, J. (1996) ‘Discourse analysis and constructionist approaches: Theoretical background’. In J.T.E. Richardson (ed.) Handbook of Qualitative Research Methods for Psychology and the Social Sciences, pp. 125–40. Leicester: BPS Books.
Potter, J. (2002) ‘Two kinds of natural’, Discourse Studies 4: 539–42.
Potter, J. (2003) ‘Discourse analysis’. In M. Hardy and A. Bryman (eds) Handbook of Data Analysis, pp. 607–24. London: Sage.
Potter, J. (2004) ‘Discourse analysis as a way of analysing naturally occurring talk’. In D. Silverman (ed.) Qualitative Research: Theory, Method and Practice, pp. 200–21. London: Sage.
Potter, J. and Hepburn, A. (2005a) ‘Qualitative interviews in psychology: Problems and prospects’, Qualitative Research in Psychology 2: 281–307.
Potter, J. and Hepburn, A. (2005b) ‘Action, interaction and interviews: Some responses to Hollway, Mishler and Smith’, Qualitative Research in Psychology 2: 319–25.
Potter, J. and Hepburn, A. (2007) ‘Life is out there: A comment on Griffin’, Discourse Studies 9(4): 276–82.
Potter, J. and Wetherell, M. (1987) Discourse and Social Psychology: Beyond Attitudes and Behaviour. London: Sage.
Potter, J. and Wetherell, M. (1995) ‘Natural order: Why social psychologists should study (A constructed version of) natural language, and why they have not done so’, Journal of Language and Social Psychology 14(1–2): 216–22.
Psathas, G. (1995) Conversation Analysis: The Study of Talk-in-Interaction. London: Sage.
Ramazanoglu, C. and Holland, J. (2002) Feminist methodology: Challenges and Choices. London: Sage.
Raymond, G. (2003) ‘Grammar and social organization: Yes/No type interrogatives and the structure of responding’, American Sociological Review 68: 939–67.
Sacks, H. (1984) ‘Notes on methodology’. In J.M. Atkinson and J. Heritage (eds) Structures of Social Action: Studies in Conversation Analysis, pp. 21–7. Cambridge: Cambridge University Press.
Sacks, H. (1987) ‘On the preferences for agreement and contiguity in sequences in conversation’. In G. Button and J.R.E. Lee (eds) Talk and Social Organisation, pp. 54–69. Clevedon: Multilingual Matters.
Sacks, H. (1995) Lectures on Conversation, Vols. 1 & 2, ed. Gail Jefferson. Oxford: Blackwell.
Sacks, H., Schegloff, E.A. and Jefferson, G. (1974) ‘A simplest systematics for the organization of turn-taking for conversation’, Language 50(4): 696–735.
Schegloff, E.A. (1988a) ‘Presequences and indirection: Applying speech act theory to ordinary conversation’, Journal of Pragmatics 12: 55–62.
Schegloff, E.A. (1988b) ‘Goffman and the analysis of conversation’. In P. Drew and A. Wootton (eds) Erving Goffman: Exploring the Interaction Order, pp. 89–135. Cambridge: Polity Press.
Schegloff, E.A. (1995) Introduction (Volume 1). In Sacks, H. (ed.) Lectures on Conversation. 2 vols. Edited by Gail Jefferson. pp. ix–lxii. Oxford: Basil Blackwell.
Schegloff, E.A. (1996a) ‘Confirming allusions: Toward an empirical account of action’, American Journal of Sociology 104(1): 161–216.
Schegloff, E.A. (1996b) ‘Some practices of referring to persons in talk-in-interaction: A partial sketch of a systematics’. In B. Fox (ed.) Studies in Anaphora, pp. 437–85. Amsterdam: Benjamins.
Schegloff, E.A. (1998) ‘Reflections on studying prosody in talk-in-interaction’, Language and Speech 41(3–4): 235–63.
Schegloff, E.A. (1999) ‘Discourse, pragmatics, conversation, analysis’, Discourse Studies 1(4): 405–36.
Schegloff, E.A. (2004) ‘Experimentation or observation? Of the self alone or the natural world?’ Behavioral and Brain Sciences 27(2): 271–2.
Schegloff, E.A. (2007) Sequence Organization in Interaction: A Primer in Conversation Analysis, Vol 1. Cambridge: Cambridge University Press.
Schegloff, E.A., Jefferson, G. and Sacks, H. (1977) ‘The preference for self correction in the organisation of repair in conversation’, Language 53: 361–82.
Schegloff, E.A. and Lerner, G. (2004) ‘Beginning to respond’, Paper presented at the Annual Meeting of the National Communication Association, Chicago, IL, November.
Schegloff, E.A. and Sacks, H. (1973) ‘Opening up closings’, Semiotica 8: 289–327.
Schlesinger, P., Dobash, R.E., Dobash, R.P. and Weaver, C.K. (1992) Women Viewing Violence. London: British Film Institute.
Silverman, D. (2006) Interpreting Qualitative Data: Methods for Analysing Talk, Text and Interaction, 3rd edn. London: Sage.
Sleed, M., Durrheim, K., Kriel, A., Solomon, V. and Baxter, V. (2002) ‘The effectiveness of the vignette methodology: A comparison of written and video vignettes in eliciting responses about date rape’, South African Journal of Psychology 32(3): 21–8.
Smith, J. (2005) ‘Commentary 1: Advocating pluralism’, Qualitative Research in Psychology 2(4): 309–11.
Snelling, S.J. (1999) ‘Women’s perspectives on feminism: A Q-methodological study’, Psychology of Women Quarterly 23: 247–66.
Speer, S.A. (2001) ‘Reconsidering the concept of hegemonic masculinity: Discursive psychology, conversation analysis, and participants’ orientations’, Feminism and Psychology 11(1): 107–35.
Speer, S.A. (2002a) ‘Natural and contrived data: A sustainable distinction?’ Discourse Studies 4(4): 511–25.
Speer, S.A. (2002b) ‘Transcending the natural/contrived distinction: A rejoinder to ten Have, Lynch and Potter’, Discourse Studies 4(4): 543–8.
Speer, S.A. (2002c) ‘What can conversation analysis contribute to feminist methodology? Putting reflexivity into practice’, Discourse and Society 13(6): 801–21.
Speer, S.A. (2005) Gender Talk: Feminism, Discourse and Conversation Analysis. London: Routledge.
Speer, S.A. and Green, R. (2007) ‘On passing: The interactional organization of appearance attributions in the psychiatric assessment of transsexual patients’. In V. Clarke and E. Peel (eds) Out in Psychology: Lesbian, Gay, Bisexual, Trans and Queer Perspectives, pp. 336–68. Chichester: Wiley.
Speer, S.A. and Hutchby, I. (2003a) ‘From ethics to analytics: Aspects of participants’ orientations to the presence and relevance of recording devices’, Sociology 37(2): 315–37.
Speer, S.A. and Hutchby, I. (2003b) ‘Methodology needs analytics: A rejoinder to Martyn Hammersley’, Sociology 37(2): 353–9.
Speer, S.A. and Parsons, C. (2006) ‘Gatekeeping gender: Some features of the use of hypothetical questions in the psychiatric assessment of transsexual patients’, Discourse & Society 17(6): 785–812.
Suchman, L. and Jordan, B. (1990) ‘Interactional troubles in face-to-face survey interviews’, Journal of the American Statistical Association 85(409): 232–41.
Sunderland, J. (2004) Gendered Discourses. Basingstoke: Palgrave Macmillan.
Taino, L. (2003) ‘“When shall we go for a ride?” A case of the sexual harassment of a young girl’, Discourse & Society 14: 173–90.
Wilkinson, S. (1999) ‘Focus groups: A feminist method’, Psychology of Women Quarterly 23: 221–44.
Wilkinson, S. and Kitzinger, C. (2007) ‘Conversation analysis, gender and sexuality: A feminist perspective’. In A. Weatherall, B. Watson and C. Gallois (eds) Language, Discourse and Social Psychology, pp. 206–30. Basingstoke, UK: Palgrave Macmillan.

18
Self-Administered Questionnaires and Standardized Interviews
Edith de Leeuw

INTRODUCTION

In the not too distant past there were only two survey methods to choose from: the face-to-face interview and the postal or mail questionnaire. The first scientific interview goes back to 1912 and Bowley’s study of working-class conditions in five British cities, while the first postal survey is attributed to Sir John Sinclair in 1788 (for a historical overview, see De Heer et al., 1999). In the first part of the twentieth century face-to-face survey interviews were further developed in the United States and evolved from short and simple inquiries into complex and highly flexible research instruments (e.g. Hyman, 1954). At the same time, standardized instruments were developed to measure attitudes and opinions, but also to measure capacities (see also O’Muircheartaigh, 1997, pp. 9–12). Soon self-administered questionnaires and tests became the favourite data collection method in education and psychology. In survey research however, self-administered mail surveys were mainly seen as a fall-back method, until the publication of the 1978 Dillman book, which resulted in a rise in high-quality mail surveys. Around 1970 a third data collection method became a serious option: the telephone interview. This method was quickly adopted and telephone surveys became the predominant mode in the USA around 1980. Since then major advances in computer technology have launched computer-assisted methods for data collection, of which Computer Assisted Telephone Interviewing (CATI) is the oldest form, and the Internet or Web survey is the youngest.

Just as in the past, there are now basically two main forms of data collection: those with and those without an interviewer, or in other words standardized interviews and self-administered questionnaires. But there are many variations possible within each main form. Standardized interviews can
either be in person (face-to-face) or over the phone, and computer-assisted equivalents are available for each version: Computer-Assisted Personal Interviewing (CAPI) and CATI. Self-administered questionnaires can be used in group settings (e.g. educational tests in classrooms), or in individual settings (e.g. postal sample survey). Again computer-assisted equivalents are available for different types of self-administered questionnaires. In educational research, the school computers and computer laboratories are being used to administer tests, in establishment surveys disk-by-mail surveys and Web surveys are becoming popular, and Internet surveys for population surveys and panel research are the latest development (for an introduction and overview of computer-assisted data collection, see De Leeuw et al., 2003). Due to this variety, the choice for the optimal data collection method is far from simple!

SELF-ADMINISTERED QUESTIONNAIRES VERSUS STRUCTURED INTERVIEWS

There are two main differences between self-administered questionnaires and structured interviews. The first is the absence versus presence of the interviewer and its consequences for implementation, non-response and data quality. Interviewers may convince reluctant respondents, motivate respondents, and provide additional instruction or explanations during the data collection. However, at the same time the mere presence of the interviewer can influence responses and cause unwanted interviewer effects, especially when sensitive issues are being discussed. In other words, interviewers are assets and liabilities at the same time.

The second main difference is that in self-administered questionnaires, be it a psychological test, a postal survey or a Web questionnaire, the respondents see the questions, while during structured interviews respondents usually do not, although show material such as flash cards with response categories may be used. As a consequence, the visual presentation of questions and the general layout of the questionnaire are far more important in self-administered questionnaires and also different from the ones used in interviews.

Response and non-response

Response to surveys has been decreasing over the years. This is partly due to an increase in non-contacts and partly due to an increase in refusals (De Leeuw & De Heer, 2002). Besides non-contact and refusals there are also other sources of non-response, such as inability to cooperate (e.g. ill health, absence, language problems). These all influence response rates, and should be clearly defined. For clear definitions of response rates for face-to-face, telephone, mail, and Internet surveys, see the website of the American Association for Public Opinion Research (www.aapor.org), and the section of survey methods standards and best practices.

Depending on the type of survey and the fieldwork organization, different sources of non-response play a more important role. For instance, in telephone surveys, it is relatively easy to keep trying to reach not-contacted respondents during the fieldwork period without raising the costs considerably. As a consequence, non-contacts are a small portion of the total non-response and the major part of non-response is due to refusals. However, in face-to-face interviews, the number of contact attempts is usually more limited, depending on budget and fieldwork procedures, and non-contacts can be a substantial part of the total non-response (De Heer, 1999). In mail surveys the number of non-contacted depends on the reliability of the mail system; usually the number of non-contacts due to non-delivery is small. Finally, for surveys of the general population the ‘other’ category, such as inability, will be small compared to the refusals. However, when special topics and populations are studied, such as health surveys of the elderly, this other category may become very important and special fieldwork measures should be taken to reduce
this source. For an overview of non-response sources and design implications on response propensity, see Dillman et al. (2002).

In general, face-to-face surveys tend to obtain higher response rates than comparable telephone surveys, but both methods show a decrease in response over time. Mail surveys tend to have a lower response rate than comparable face-to-face and telephone surveys. However, there is no evidence for a decrease of response over time in mail surveys. Thus, the differences in response between survey methods have become smaller both in Europe and in the USA and Canada (e.g. Goyder, 1987; Hox and De Leeuw, 1994). In recent years telephone response rates have further decreased, partly due to technological changes, such as call-screening devices which increase the non-contacts, partly due to changes in attitude towards unwanted telephone calls (Curtin et al., 2005; Steeh and Piekarski, 2006). Systematic overviews of response rates in Internet surveys are scarce; studies comparing response rates among Internet, mail, and telephone surveys suggest that response rates are generally lower for online surveys (Matsuo et al., 2004). Empirical comparisons between e-mail and paper mail surveys of the same population indicate that response rates on e-mail surveys are lower than for comparable paper mail surveys (Couper, 2000); similar results are found for list-based Web surveys (Couper, 2001).

To reduce non-response in interview surveys, one has to reduce both the non-contact (e.g. through intensified field work), and the refusals. The fact that response rates in structured interviews are in general higher than in self-administered surveys is mainly due to the role of the interviewer as persuader of reluctant respondents (cf. Groves and Couper, 1998). Interviewers may differ in their individual success rate, but all interviewers can be trained to do a good job of convincing respondents to cooperate, both for face-to-face surveys (National Centre for Social Research, 1999; Snijkers et al., 1999) and for telephone interviews (Groves and McGonagle, 2001). To achieve a high response rate in mail surveys a respondent-friendly questionnaire and cover letter in combination with well-timed reminders is necessary (Dillman, 1978, 2000), while for Internet surveys a well-written invitation, in combination with reminders and a good layout and respondent-friendly Web interface is essential (Dillman, 2000, 2007; Lozar et al., 2008). Two measures are effective in all forms of data collection, that is, both in interviews and in self-administered mail and Internet surveys. Advance letters or prenotifications do have a positive influence on the response for all types of surveys (De Leeuw et al., 2007). The same goes for incentives, which are effective in raising response in both self-administered and interview surveys (e.g. Singer, 2002). It should be noted that, in general, incentives sent in advance, the ‘prepaid’ incentives, work better than ‘promised’ incentives. Furthermore, there is no clear evidence that ‘lotteries’ are effective in increasing response.

Question development

A sound questionnaire is essential for data gathering in both self-administered questionnaires and structured interviews. The questions asked should cover the research objectives in order to avoid specification errors and to get valid answers. Specification error – a term from survey methodology – occurs when the final version of the question, as printed in the questionnaire, fails to collect information that is essential to answer the research question (cf. Biemer and Lyberg, 2003). In the social sciences this is usually referred to as construct validity: does the question measure what it is supposed to measure? Does it measure the intended theoretical construct? (See: Cronbach and Meehl, 1955; see also Embretson and Bovaird, this book, on measurement and scaling).

But a good question needs to do more than cover the construct; it should be understandable and the respondent should be able to answer it. When constructing questionnaires a researcher should start with following the basic rules for general questionnaire
construction as outlined in handbooks such as Fowler (1995). These include the advice to use simple words, avoid ambiguity, ask one question at a time, etc. In the next stage the questions always should be tested. No one, not even the most renowned expert, can write a perfect questionnaire. Pre-testing is the only way of assuring that the questions as written do communicate to respondents as intended and that the respondent will be able to answer the questions. Besides performing checks, pre-tests may also provide valuable pointers on how to improve unsatisfactory questions.

Systematic pre-testing is a recent development. In the last three decades, cognitive psychology has strongly influenced survey methodology and questionnaire development. One of the most profound aspects of the effects of cognitive psychology on surveys are the insights into survey artefacts, for instance why do context effects occur, why do respondents satisfice and give only superficial answers, etc. For an overview, see Tourangeau et al. (2000) and Sudman et al. (1996). Another major contribution has been the development of intensive, small-scale methods for evaluating and testing questions, often called ‘cognitive testing’ or ‘cognitive lab methods’. There is a variety of methods available (cf. Presser et al., 2004), but all have in common that a small group of respondents, who are similar to the intended subjects on important characteristics, like age and education, are studied in depth to determine if they understand the question and are able and willing to answer it. Usually a form of in-depth or ‘cognitive’ interviewing is used. For specific cognitive interview methods see Willis (2004). For a comprehensive overview and general introduction into systematic pre-testing, see Campanelli (2008).

A good guideline for both question writing and question testing is the question-answer process and its four stages: (1) comprehension and interpretation of the question being asked; (2) retrieval of relevant information from memory; (3) integrating this information into a summarized judgement; and (4) reporting this judgement by translating it to offered response options. Respondents first have to understand the question and decide what information the researcher asks for. In the second step, they need to recall all relevant information from memory. When it is simple factual or behavioural information respondents can retrieve this; when it is rare behaviour or refers to a long time ago, this is a difficult task and respondents have to rely on heuristic strategies. When the question asks for a strongly held attitude, it is relatively easy to retrieve, but if the question refers to more superficial opinions, respondents will rarely find a ‘ready-for-use’ answer stored in memory. Instead, they will need to form a judgement on the spot, based on whatever relevant information comes to mind. Once step 3 has been successfully completed and respondents have formed a judgement in their own minds, they have to report it. When an open question is used they can report it in their own words. However, more often a closed question is used and respondents need to format their answer to fit the response alternatives provided by the researcher. In this final reporting stage, respondents may hesitate to communicate their private judgement, due to reasons of social desirability and self-presentation. If so, they will either refuse to answer, offer a ‘do-not-know’ option, or in the case of a closed question may also opt for a more acceptable, but not necessarily true response category. For an in-depth discussion of the psychology of asking questions, see Schwarz et al. (2008) and Tourangeau et al. (2000).

From question to ready-to-use questionnaire or interview schedule

As stated in the section above, careful writing and testing of questions is important both for structured interviews and for self-administered questionnaires. But a questionnaire is more than a collection of questions; it contains instructions and texts to keep the flow of information going and to keep the respondents motivated. It also should be pleasant to use, avoid unnecessary routing errors, and correctly guide from question to
question. Visual design through the use of graphical tools and layout is very important to successfully transform a collection of questions into a well-designed questionnaire (see for instance Redline et al., 2003). It is important to note that structured interviews and self-administered questionnaires differ in necessary layout and in how the final questionnaire has to be constructed. The users are different and have different needs: interview schedules are designed for trained interviewers who have to guide a respondent through the question-answer process, while self-administered questionnaires should be totally self-explanatory to respondents.

Interview schedules constructed for structured interviews, both over the telephone or face-to-face, contain besides the questions also instructions for trained interviewers. As a consequence, a finalized interview schedule contains text to be read aloud by the interviewer, text that should never be read aloud at all, and text that only in certain situations should be read. Examples of texts that always are read out aloud by the interviewer are the questions themselves, texts to make the transition from one group of questions to the next (e.g. ‘now I would like to ask you some questions on …’), and instructions to respondents (e.g. ‘I am going to read you a list of ... statements. For each, please indicate whether you think it is not important, somewhat important, or very important’). Examples of texts that are never read are specific interviewer instructions (e.g. ‘probe if the respondent does not answer’, or, ‘skip to question 13’), or certain response and/or coding categories (e.g. ‘refused, no opinion, does not apply’). An example of a text that is sometimes read aloud is: ‘if you are not sure, please give me your best guess’. To avoid interviewer mistakes and to help interviewers read out aloud the correct information it is advised to use consistent graphical language, such as different fonts. Examples are using bold type for all questions, signalling that all text in bold should be read aloud. For other types of information, other styles should be used; for instance, instructions in italics, categories not to be read in capitals, and placing explicit interviewer instructions in parentheses (Salant and Dillman, 1994, pp. 130–132). This is all for the benefit of interviewers, not for the eyes of the respondent. Exceptions are response cards printed with the major answer categories, which are shown to respondents when long lists of response categories are presented in face-to-face interviews.

In contrast, in a self-administered questionnaire everything must be tailored to the respondent. There is no interviewer to motivate or help out, and the questionnaire itself should do it all. Visual design is here of the utmost importance. Salant and Dillman (1994) and Dillman (2000, 2007) give clear instructions and numerous examples of how to order questions, give instructions, and motivate respondents. Numbers, symbols, and graphical layout (e.g. spacing, location, brightness, contrast, and figure/ground arrangements) all communicate meaning, and should be used to optimize a questionnaire for self-administered use. A good example of how this has been done in a consistent way is described by Dillman et al. (2005). For a theoretical background see Jenkins and Dillman (1997), and Redline et al. (2003).

FACE-TO-FACE INTERVIEWS

Face-to-face interviews are the most flexible form of data collection method. Main advantages of the face-to-face interview are the availability of an interviewer to structure the interview situation and help and motivate respondents. Furthermore, the face-to-face setting allows for optimal communication, as both verbal and non-verbal communication are possible. Structured or partly structured interview schedules with open questions can be used as the interviewer poses the questions, follows up with additional probes, bridges silences, and records answers. The presence of a well-trained interviewer also enables the researcher to use a variety of measurements besides simple question-answer sequences. For instance, respondents can be asked to sort objects or pictures,
perform specific tasks, or the interviewer may in a mixed-mode design (for more details,
even do some physical measurements, e.g. see De Leeuw, 2005). In general, interviewers
in health-related studies. Also, respondents affect respondents and their answers also
can be presented with all kinds of visual when non-sensitive questions are being asked.
stimuli, ranging from simple response cards Respondents that are interviewed by the
listing the answer categories for a question to same interviewer tend to have more similar
pictures, advertisement copy or video clips. answers; this is called the interviewer effect or
Finally, highly complex questionnaires can interviewer variance. There are many reasons
be successfully implemented as a trained for this: interviewers vary in their capabilities
interviewer takes care of navigating through of motivating respondents, they may use
the questionnaire. In computer-assisted face- different probing techniques, or reword badly
to-face interviews (CAPI), the interviewer is worded questions in different ways, etc. (for
guided through the (complex) questionnaire more detail see Japec, 2005). Well-tested
by a computer program. This lowers error questionnaires, standardized procedures, and
rates even more and gives the interviewer thorough interviewer training is necessary to
more opportunities to concentrate on the reduce unwanted interviewer effects.
interviewer-respondent interaction. (For an Face-to-face interviews are the ‘Rolls
overview see De Leeuw, 1992, 2004.) Royce’ of data collection and just like the
When one is interested in studying the car they are extremely costly and take much
general population, the face-to-face survey care and time to get rolling. Interviewers have
also has the greatest potential. Sophisticated to be trained, not only in standard interview
sampling designs for face-to-face surveys techniques, but also in how to implement
have been developed, which do not require sampling and respondent selection rules and
a detailed sampling frame or a list of persons in how to solve various problems that can arise
or households. For instance, area probability when they are working along in the field. In
sampling can be used to select geographically addition, an extensive supervisory network is
defined units (e.g. streets or blocks of houses) needed to maintain quality control. Finally,
as primary units and households within an administrative manager is needed to make
these areas. Therefore, a main advantage of sure that new addresses and interview material
face-to-face interviews is its potential for a are mailed to the interviewers on a regular
high coverage of the intended population. basis.
Elaborate techniques based on household
listings (e.g. inventories of all household
members derived by an interviewer) can then TELEPHONE INTERVIEWS
be used to randomly select one respondent
from those eligible in a household (e.g. Kish, Telephone interviews are less flexible than
1965). face-to-face interviews. Their major draw-
The presence of an interviewer is a great back is the absence of visual cues during
advantage, but it can also be a disadvantage. the interview; telephone is auditory only.
Respondents may feel inhibited to answer This limits interviewers in their tools for
more sensitive questions in the presence of communication. For instance, as no non-
an interviewer, and in general, more socially verbal communication is possible, they have
desirable answers and conventional answers to say explicitly ‘thank-you’ or ‘yes’, instead
are given in interviews than when a self- of nod or smile. The absence of a visual
administered questionnaire is being used. If channel of communication also limits the
some questions have a very sensitive nature, researcher in the type of questions that can be
but a face-to-face interview is preferable for asked. For instance, questions using graphical
other reasons (e.g. coverage, additional ques- techniques, like smiley faces, and ranking and
tions) a good strategy is to combine an inter- sorting techniques are not possible. Semantic
view with a self-administered questionnaire differentials and other rating tasks with many
potential response categories will be difficult to use. As no response cards with lists of answer categories are available in telephone interviews, the interviewer and respondent have to rely solely on the auditory channel of communication. The interviewer has to read out aloud the question along with the available answer categories and the respondent has to try to keep all possibilities in memory. As a consequence, only very familiar scales, such as 0 to 10 scales (‘on a scale of 0 to 10 where …’) or questions with a limited number of response categories can be used. This has led to the development of special question formats in which the answer categories are split up, for questions with seven or more response categories. An example is the two-step or unfolding procedure in which respondents are first asked if they are ‘satisfied’, ‘dissatisfied’ or ‘somewhat in the middle’, and depending on their answer, are asked specific follow-up questions (e.g. ‘is this completely satisfied, mostly satisfied, or somewhat satisfied’). In general, over the telephone questions must be short and easily understandable.

However, just as in face-to-face interviews, well-trained interviewers are an advantage. In telephone surveys the interviewer can assist respondents in understanding questions, can administer questionnaires with a large number of screening questions, control the question sequence, and probe for answers on open questions. Again, as in CAPI, the use of CATI makes these tasks easier for the interviewer.

The personnel requirements for a telephone survey are less demanding than in face-to-face surveys. Usually, telephone interviews are conducted from a central setting where supervisors and quality controllers follow the process closely. Because the interviews are being conducted from a central location over the phone and interviewers do not have to travel to respondents, fewer highly trained interviewers and supervisors are needed. Interviewers should, of course, be well trained in standard interview techniques and in telephone conversations and know how to use this auditory-only medium of communication, but the variety of interviewer skills needed in one person is smaller than in face-to-face interviews. The majority of telephone interviewers no longer have to be prepared for every possible emergency and can concentrate on standard, but high-quality interviewing. Special respondents or problem cases can be dealt with by the available supervisor or can be allocated to specially skilled and trained or bi-lingual interviewers.

Because of the potential for close supervision and quality control, interviewer effects are in general smaller over the phone than in face-to-face interviews (e.g. Groves, 1989, chapter 8). Interviewers can affect the responses given in different ways, by the way they read the question and emphasize certain parts, by deviating from prescribed wording, by reacting in different ways to questions or problems of the respondents, and even by the way they look or sound. As interviewers are only a voice over the phone, many interviewer characteristics (e.g. those connected with appearance) will be less obvious. Furthermore, the close supervision and potential for immediate feedback on inadequate interviewer behaviour will lessen unwanted interviewer influence over the phone.

Telephone interviews are only feasible if telephone coverage is high, in other words if the non-telephone part of the population can be ignored. To be sure that persons with unlisted telephones are also included, one can employ random digit dialling. Random digit dialling techniques, which are based on the sampling frame of all possible telephone numbers, make it feasible to use telephone interviews in investigations of the general population. A new challenge to telephone survey coverage is the increasing popularity of mobile (cell) phones. If mobile phones are additional to fixed landline phones (i.e. a person has a mobile phone, but also a landline phone at home), this will not pose a major under-coverage problem. But there is evidence that certain groups (e.g. the young, lower income, urban, more mobile) are over-represented in the mobile-phone-only proportion of the population. When mobile phones are excluded from telephone
surveys, this may result in serious under-coverage of these groups. Some countries have good listings of all phone numbers, including mobile phones, others have not; customs associated with mobile phone use also differ from country to country. How mobile phones affect the efficacy of telephone surveys is therefore country dependent. For an overview see Nathan (2001) and Steeh (2008).

In telephone interviews, as in face-to-face interviews, the Kish procedure based on a complete household listing can in theory be used to select respondents within a household. However, asking for a complete household listing over the phone is a rather complex and time-consuming procedure and increases the risk of break-offs. A good alternative for the Kish procedure is the last birthday or the next birthday method. In the last birthday method, the interviewer asks to speak with that household member who most recently had a birthday. Even though the birthday methods are very popular and seen as the standard to select a particular respondent from a household in telephone surveys, they are not as precise as the complete Kish method. For an overview see Gaziano, 2005.

One of the main advantages of telephone interviews, besides the close supervision of interviewers for quality control, is the relatively low cost of telephone interviews both for completed interviews and for callbacks to non-respondents. As interviewers do not have to travel, a limited number of interviewers may call a large number of respondents in a relatively short time period. This is especially important in sparsely populated areas or countries.

MAIL SURVEYS

Mail surveys require an explicit sampling frame of names and addresses, and have the advantage when only addresses and no telephone numbers are available. Often, telephone directories or other lists are used for mail surveys of the general population. Using the telephone directory as a sampling frame has the drawback that people without a telephone and people with an unlisted telephone cannot be reached, but the advantage that telephone reminders or follow-ups can easily be implemented. Another reason for the frequent use of the telephone directory as a sampling frame is the relative ease and the low costs associated with this method.

A distinct drawback of mail surveys is the limited control the researcher has over the choice of the specific individual within a household who in fact completes the survey. There is no interviewer available to apply respondent selection techniques within a household and all instructions for respondent selection have to be included in the accompanying letter. As a consequence only simple procedures such as the male/female/youngest/oldest alternation or the last birthday method can be successfully used. The male/female/youngest/oldest alternation asks in a random 25 percent of the accompanying letters for the youngest female in the household to fill in the questionnaire; in a second random 25 percent of the letters the youngest male is requested to fill in the questionnaire, etc. When a complete list of the individual members of the target population is available, which can be the case in surveys of special groups or in countries with good administrative records, a random sample of the target population can be drawn regardless of the data collection method used. In that case, coverage and sampling will be as good as in interview methods.

The absence of an interviewer makes mail surveys the least flexible data collection technique when complexity of the questionnaire is considered. All questions must be presented in a fixed order, and only a limited number of simple skips and branches can be used. For routings special written instructions and graphical aids, such as arrows and colours, have to be provided; for a great example see Dillman et al. (2005). Furthermore, in a mail survey, all respondents receive the same instruction and are presented with the questions without added interviewer probing or help in individual cases. In short, a mail questionnaire must be totally self-explanatory. But a big advantage is
that visual cues and stimuli can be used, and with well-developed instructions fairly complex questions and attitude scales can be implemented. The visual presentation of the questions makes it possible to use all types of graphical questions (e.g. ladder, thermometer), and to use questions with seven or more response categories. Also, information booklets or product samples can be sent by mail with an accompanying questionnaire for their evaluation. However, open-ended questions are difficult to ask, as no interviewer is present to probe for more details.

In general, self-administered questionnaires are less intrusive and allow for more privacy and less time pressure. The absence of an interviewer may in certain situations be a real advantage, especially when sensitive questions or questions prone to socially desirable answers are being asked. Another advantage is that mail surveys can be completed when and where the respondent wants and are not dependent on interviewer time. A respondent may consult records if needed, which may improve accuracy. For an overview see de Leeuw (1992) and Dillman (2000).

From a logistical point of view mail surveys have two drawbacks: questionnaire length and turn-around time. The personal presence of interviewers in face-to-face interviews discourages break-offs and allows for longer questionnaires than in mail surveys, although telephone interviews do not have this advantage. According to Dillman (1978, p. 55) mail questionnaires up to 12 pages, which contain fewer than 125 items, can be used without adverse effects on the response. Turn-around in mail surveys is slower than in most other modes. Mail surveys are locked into a definite time interval of mailing dates with rigidly scheduled follow-ups, and therefore take longer than other modes of data collection, with the exception of large, geographically dispersed face-to-face interviews, which take the longest. When speed of completion is really important and data are needed fast, telephone and Internet surveys are best. If the data are needed in a couple of weeks, mail surveys are a good choice. Dillman (1978, p. 68) gives an example in which a survey unit of 15 telephones can complete roughly 3000 interviews during the 8 weeks it takes to perform a complete mail survey with reminders. Only if the telephone unit is smaller than 15 interviewers, or the number of needed completed interviews is larger than 3000, will a mail survey be faster.

Logistically, mail surveys also have two huge advantages: small staff and low costs. Organizational and personnel requirements for a mail survey are far less demanding than in interviews. Most of the workers are not required to deal directly with respondents, and the necessary skills are mainly generalized clerical skills (e.g. typing, sorting, response administration, and correspondence processing). Of course, a trained person must be available to deal with requests for information, questions, and refusals of respondents, but no interviewers or other field staff are needed. Thus, the number of different persons necessary to conduct a mail survey is far less than that required for interview surveys with equal sample sizes. Requirements for the organization and personnel do influence the cost of data collection; as a consequence mail surveys are among the least expensive and may be the only affordable mode in certain situations.

INTERNET SURVEYS

In Internet or Web surveys, coverage is still a major problem when surveying the general population (Couper, 2000, 2001). Even though Internet access is growing and around 70 percent of the US population has access to the Internet, the picture is diverse, ranging from 75 percent coverage for Sweden to 4 percent in Africa (www.internetworldstats.com). Furthermore, those covered differ from those not covered, with the elderly, lower educated, lower income, and minorities less well-represented online.

As a reaction to the differential coverage and the relatively low response rates of Internet surveys, so-called ‘access panels’ gain in popularity in market research. In access
panels, samples of panel members with Internet access are sent requests to fill in questionnaires at regular intervals. Panel research is not new, and the advantages and disadvantages of panel research have been well described (e.g. Kasprzyk et al., 1989); what is new is the potential of the Internet to select and survey huge panels at low costs. A major quality criterion for Internet panels is how the Internet or access panels were composed. Is the panel based on a probability sample (e.g. RDD telephone invitation), or is it a non-probability sample, in other words is it based on self-selection (e.g. through banners or invitations on a website inviting people to become a panel member)? Only probability-based panels allow for sound statistical analysis. Non-probability panels may result in very large numbers of respondents, but those respondents are a convenience sample. As inferential statistics are based on the assumption of probability sampling, statistics (e.g. margins of error, p-values) computed on non-probability samples, such as self-selected Internet panels, make no sense at all. Recently, propensity score adjustment has been suggested to reduce the biases due to non-coverage, self-selection, and non-response (Lee, 2006). In propensity weighting one ideally has access to a reference sample with high-quality data and low non-response. As in all weighting schemes it is important that good auxiliary variables are available and that the variables used in the adjustment are both highly related to the ‘outcome’ variable and to the self-selection mechanism. It is the researchers’ duty to be transparent on the weighting procedures and the predictive power of the propensity model used.

As in mail surveys, the control of the interview situation is low in Internet surveys. This is often considered a disadvantage: one does not know if the intended respondent is completing the questionnaire. But this can also be seen as an advantage: the respondent is in charge and the interview situation may offer more privacy. Of course, to fully take advantage of this, potential privacy concerns of respondents should be met (see for instance the guideline of the world association of market research (ESOMAR) on conducting market and opinion research using the Internet (http://www.esomar.org/web/show/id=49859)).

Because an interview program determines the order of the questions, more complex questionnaires can be used than in a paper mail survey. In this sense (complexity of questionnaire structure) an Internet or Web survey is equivalent to an interview survey. In addition, Internet surveys share the advantages of mail surveys regarding visual aids, but the Web has far more potential than paper. Dillman (2007) gives a comprehensive overview of visual design and Web surveys. This is based on both theory and empirical studies (see also http://survey.sesrc.wsu.edu/dillman/papers.htm). Compared to mail surveys that are limited to questionnaires of low complexity, Web-based questionnaires allow for very complex questionnaires that on the screen may appear simple and attractive. This, together with the potential for using visual stimuli, the freedom for the respondent to respond at their chosen time or place, and the greater privacy, makes the Internet a new and unique data collection procedure. However, the Internet also has drawbacks: it is a more perfunctory medium and people often just pay a flying visit. Respondents may have a stronger tendency to satisfice and give top-of-the-head answers, or just peek and leave, causing many early break-offs. In general, it is therefore wise to use only short questionnaires on the Web; 10–15 minutes is already a long time for an Internet survey.

Logistically, only a small number of staff is needed to implement and run Internet surveys. But to design and implement an Internet survey highly skilled and specialized personnel are needed, who combine technical knowledge (e.g. operating systems, browsers, etc.) and knowledge on usability and visual design. These requirements for the organization and personnel do influence the cost of data collection. But when a survey is implemented, it can be used for large numbers: a large sample does not cost more than a small sample in running the survey. This is what constitutes the attractiveness
of Internet surveys: it can be used for the fast collection of large numbers of completed questionnaires at low costs. In addition, there are no data entry costs, an advantage Internet surveys share with all computer-assisted interview modes.

OTHER SELF-ADMINISTERED QUESTIONNAIRES

Mail and Internet surveys are only two forms in which self-administered questionnaires can be used. These forms are most often implemented in social science surveys and in polling. In psychology and education, other forms of self-administered questionnaires are frequently used. In educational research, group-wise administration of self-administered questionnaires is common, be it in a paper form in the classroom, or in an electronic form in the school’s computer laboratory (cf. Beebe et al., 1998; Van Hattum & De Leeuw, 1999). In psychological testing, self-administered tests are used either in an individual or a group setting. Again the administration can be either as paper-and-pen or computer-assisted testing (cf. Weisband and Kiesler, 1996).

Examples of individual administration are questionnaires that are handed out by a nurse or health officer in a hospital waiting room, or by a receptionist in a day care centre. Sometimes, self-administered questionnaires are used with an interviewer present. This is usually done when sensitive questions have to be asked and the interviewer hands over a questionnaire for the respondent to fill in privately. When computer-assisted interviewing or CAPI is used, the interviewer hands over the computer to the respondent for a short period. The respondent can answer the specific questions in privacy and the interviewer remains at a respectful distance, but also is available for instructions and assistance.

Just as in mail and Internet surveys, the questionnaires should be well tested and attention should be paid to graphical tools and layout. Just as in mail and Internet surveys, these forms of self-administered questionnaires allow for more privacy and self-disclosure as no interviewer is directly involved in the question-answer process. But there are two main differences. The first is that it is the researcher and not the respondent who decides when and where the questionnaire has to be completed. The researcher also determines how long a session will take, and how much time subjects have to fill in the questionnaire. This may be a disadvantage when well-considered responses are needed, but an advantage when speed tests or first associations are more appropriate. The second difference is that, although no interviewer is directly involved, usually a trained research assistant is present to give instructions, distribute the tests, and answer questions if necessary. Group-administered questionnaires can be seen as a hybrid between interview and mail survey, combining the advantages of both methods: enough privacy for subjects to answer more freely, and available assistance when needed.

SUMMARY

In survey research there are two main forms of data collection: self-administered questionnaires and standardized interviews. These are mainly characterized by the absence versus presence of an interviewer. But there are many variations possible, such as face-to-face and telephone interviews with their computer-assisted equivalents CAPI and CATI, and self-administered mail questionnaires and Internet surveys. Each method has its advantages and disadvantages, which are summarized below.

Deciding which data collection method is best in a certain situation is often complex and depends on many factors, such as population under investigation, topic, types of questions to be asked, available time, and funds. This presents researchers with a difficult choice indeed. It is no wonder that recently multiple modes of data collection or mixed modes have become popular. In mixed-mode surveys, two or more modes of data collection are
Main Advantages and Disadvantages of Questionnaires and Interviews
Face-to-Face Interviews in Sum:

1 Face-to-face interviewing has the highest potential regarding types of questions asked, and complexity
of questionnaires. To realize this potential one needs both well-trained interviewers and well-tested
questionnaires. In addition a highly qualified field staff is necessary to make sure that all logistics are
taken care of. Only then will a face-to-face interview really fulfill its potential. This is very costly and
time-consuming and only worth it in some situations; researchers should carefully consider if all that
potential is really needed to answer the research question.
2 Face-to-face interviewing also has the highest potential regarding coverage and sampling, but it can be
very costly, especially if the country is large and sparsely populated. Cluster sampling may be needed,
and if the sample dispersion is very high, telephone surveys are often employed.
3 The greatest asset of the face-to-face interview – the presence of an interviewer – is also its greatest
weakness. Their presence may influence the answers respondents give, especially when sensitive
questions are being asked, and in general they may contribute to the total survey error, due to variance
in interviewer ability and competence.
Telephone Interviews in Sum:

4 Telephone interviews have less potential regarding types of questions asked than face-to-face interviews,
as no visual communication is possible. But interviewers are available to help and guide the respondent
and complex questionnaires may be used. However, fewer questions can be asked and telephone
interviews must be far shorter than face-to-face interviews.
5 Due to unlisted numbers and cell phones, coverage may be sub-optimal. However, if good lists
are available, telephone interviewing is, from a sampling point of view, comparable to face-to-face
interviewing. If the sample dispersion is very high, telephone surveys are often the only interview mode
feasible.
6 In telephone interviews quality control is high as interviewers can be closely monitored and immediate
feedback is possible.
7 Many interviews can be completed in a relatively short time with a smaller number of interviewers than
face-to-face. Also telephone interviews are less costly than face-to-face interviews.
Mail Surveys in Sum:

8 Mail surveys lack the flexibility and interviewer support of interview surveys, which limits the complexity
of the questionnaire used. However, visual stimuli, such as pictures or graphics can be applied and
examples or show material may be included.
9 Mail surveys are less intrusive than interviews: respondents may answer at leisure in their own time and
there is no interviewer present who may inhibit free answers to more sensitive topics.
10 Lists with addresses of the target population should be available, but telephone numbers are not
necessary.
11 Mail surveys have a longer turn-around than telephone surveys, but face-to-face interviewing usually
takes longer.
12 Mail surveys are far less costly than both face-to-face and telephone interview surveys.
Internet Surveys in Sum:

13 Internet access varies strongly between countries and within countries. Lists with e-mail addresses of
the target population should be available, and depending on the population under investigation large
coverage problems may arise.
14 In Internet surveys complex questionnaires and visual stimuli can be applied, but questionnaires have to
be very short.
15 Like mail surveys Internet surveys are less intrusive.
16 Large numbers of completed questionnaires can be collected in a very short time and at low costs.
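The trade-offs listed in this summary can be written down as a small lookup table, which makes it easy to see which modes survive a given set of requirements. The sketch below is only an illustration: the `Mode` class, the `shortlist` function, and the ordinal cost ranks are hypothetical codings of the numbered points above, not measured costs, and coverage, questionnaire length, and turn-around time are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    name: str
    interviewer_present: bool     # points 3, 4, 9, 15
    visual_stimuli: bool          # points 1, 4, 8, 14
    complex_questionnaire: bool   # points 1, 4, 8, 14
    cost_rank: int                # assumed ordinal: 1 = least costly


# Hypothetical coding of the summary; ranks only order the modes.
MODES = [
    Mode("face-to-face", True, True, True, 3),
    Mode("telephone", True, False, True, 2),
    Mode("mail", False, True, False, 1),
    Mode("internet", False, True, True, 1),
]


def shortlist(need_visual=False, need_complex=False, sensitive_topic=False):
    """Return the names of modes meeting the requirements, least costly first."""
    keep = [
        m for m in MODES
        if (m.visual_stimuli or not need_visual)
        and (m.complex_questionnaire or not need_complex)
        # For sensitive topics, keep only modes without an interviewer present.
        and (not m.interviewer_present or not sensitive_topic)
    ]
    # sorted() is stable, so modes with equal rank keep their listed order.
    return [m.name for m in sorted(keep, key=lambda m: m.cost_rank)]


print(shortlist(need_visual=True, need_complex=True, sensitive_topic=True))
print(shortlist(need_complex=True))
```

Under this coding, a sensitive topic that also needs visual stimuli and a complex questionnaire leaves only the Internet survey, while a requirement for complexity alone leaves the Internet, telephone, and face-to-face modes in order of assumed cost. A real design decision would also weigh coverage of the target population, which this sketch ignores.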
combined in such a way that the disadvantages of one method are counterbalanced by the advantages of another; for instance combining a Web survey with a telephone interview to compensate for under-coverage of the elderly and lower educated on the Internet. Other examples of mixed-mode designs are the use of face-to-face interviews for those who cannot be reached by telephone, or telephone interviews among non-respondents in mail or Internet surveys. In longitudinal surveys, mixed-mode designs are common as data collection methods often vary between waves; for instance (face-to-face) surveys during recruitment and in the base-line survey and less expensive survey methods (e.g. mail, Internet, or telephone) in the subsequent waves. Of course, when mixing modes particular attention should be paid to equivalence of question format, comparability of answers and data integrity. (For extensive overviews see De Leeuw, 2005.)

Which data collection mode or mix of modes is chosen is the result of a careful consideration of quality and costs. But certain survey design steps should always be taken, as they are extremely important for high-quality data. Among these are the careful construction and (pre)-testing of the questionnaire, the implementation of response-inducing features, such as advance letters, reminders, and if the budget allows the use of incentives. Finally, in the case of interviews a thorough training of interviewers is necessary in interview rules and non-response reduction.

SUGGESTED READINGS

On survey quality and data collection

Biemer, P.P. and Lyberg, L.E. (2003). Introduction to Survey Quality. New York: Wiley (especially chapters 5 & 6).

On practical aspects of surveys

Czaja, R. and Blair, J. (2005). Designing Surveys: A Guide to Decisions and Procedures. Thousand Oaks: Sage (Pine Forge Press series in research methods and statistics).
Dillman, D.A. (2007). Mail and Internet Surveys (with 2007 update). New York: Wiley (discusses establishment surveys and mixed mode too).
Fowler, F.J. (1995). Improving Survey Questions: Design and Evaluation (Vol. 38). Thousand Oaks: Sage Applied Social Research Methods Series (on question writing and testing).

For international studies

de Leeuw, E.D., Hox, J., and Dillman, D. (eds) (2008). International Handbook of Survey Methodology. Mahwah, N.J.: Erlbaum (especially chapters 9–14 & 16).

REFERENCES

Beebe, T.J., Harrison, P.A., McRae, J.A., Anderson, R.E., and Fulkerson, J.A. (1998). An evaluation of computer-assisted self-interviews in a school setting. Public Opinion Quarterly, 62: 623–632.
Biemer, P.P. and Lyberg, L.E. (2003). Introduction to Survey Quality. New York: Wiley.
Campanelli, P. (2008). Testing survey questions. In Edith de Leeuw, Joop Hox, and Don Dillman (eds) International Handbook of Survey Methodology. Mahwah, N.J.: Erlbaum.
Couper, M.P. (2000). Web surveys: A review of issues and approaches. Public Opinion Quarterly, 64, 4: 464–494. See also Couper (2000) the Good, the Bad, and the Ugly. University of Michigan, Institute for Social Research, Survey Methodology Program, Working paper series # 077.
Couper, M.P. (2001). The promises and perils of web surveys. Presentation at the ASC-Conference on the Challenge of the Internet. Available at www.asc.org.uk (accessed January, 2006).
Cronbach, L.J. and Meehl, P.E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52: 281–302.
Curtin, R., Presser, S., and Singer, E. (2005). Changes in telephone survey nonresponse over the past quarter century. Public Opinion Quarterly, 69: 87–98.
de Heer, W. (1999). International response trends: results of an international survey. Journal of Official Statistics, JOS, 15, 2: 129–142. Also available on www.jos.nu.
de Heer, W., de Leeuw, E., and van der Zouwen, J. (1999). Methodological issues in survey research: A historical review. BMS, Bulletin de Methodologie Sociologique, 64: 25–48.
de Leeuw, E.D. (1992). Data Quality in Mail Groves, R.M. and Mc Gonagle, K.A. (2001). A
Telephone and Face-to-face Surveys. Amsterdam: theory guided training protocol regarding survey
TT-Publikaties. Available at http://www.xs4all.nl/ participation. Journal of Official Statistics (JOS), 17,
∼edithl/pubs/disseddl.pdf (accessed June 2006). 2: 249–265. Also available at www.jos.nu.
de Leeuw, E.D. (2004). New Technologies in Data Hox, J.J. and de Leeuw, E.D. (1994). A comparison
Collection, Questionnaire Design and Quality. Inter- of nonresponse in mail, telephone, & face to face
national Statistical Seminars Series # 44. San surveys: Applying multilevel modeling to meta-
Sebastian: EUSTAT. Available at http://www.eustat. analysis. Quality & Quantity, 28: 329–344. Reprinted
es/prodserv/datos/sem44.pdf (accessed June 2006). in: David de Vaus (2002) Social Surveys, part eleven,
de Leeuw, E.D. (2005). To mix or not to mix: data nonresponse error. London: Sage, Benchmarks in
collection modes in surveys. Journal of Official Social Research Methods Series.
Statistics, 21, 2: 233–255. Available at www.jos.nu Hyman, H.H. (1954). Interviewing in Social Research.
(accessed June 2007). Chicago: Chicago University Press.
de Leeuw, E.D., Callegaro, M. Hox, J.J., Korendijk, E., and Japec, L. (2005). Quality Issues in Interviewer Surveys:
Lensvelt-Mulders, G. (2007). The influence of advance Some Contributions. Stockholm: Stockholm Univer-
letters on response in telephone surveys: A meta- sity: Department of statistics (ISBN 91-7155-155-7).
analysis. Public Opinion Quarterly, 71, 3: 1–31. Jenkins, C. (now Cleo Redline) and Dillman, D.A. (1997).
de Leeuw, E.D. and de Heer, W. (2002). Trends in Towards a theory of self-administered questionnaire
household survey nonresponse: A longitudinal and design. In Lyberg, L., Biemer, P., Collins, M., de
international comparison. In Dillman, D.A., Eltinge, Leeuw, E., Dippo, C., Schwarz, N., and Trewin, D.
J.L., Groves, R.M., and Little, R.J.A. (eds) Survey (eds) Survey Measurement. New York: John Wiley.
Nonresponse. New York: Wiley. Kasprzyk, D., Duncan, G.J., Kalton, G., and Singh, M.P.
de Leeuw, E., Hox, J., and Kef, S. (2003). Computer- (1989). Panel Surveys. New York: Wiley.
assisted self-interviewing tailored for special popula- Kish, L. (1965). Survey Sampling. New York: Wiley.
tions and topics. Field Methods, 15: 223–251. Lee, S. (2006). Propensity score adjustment as a
Dillman, D.A. (1978). Mail and Telephone Surveys. The weighting scheme for volunteer panel web surveys.
Total Design Method. New York: Wiley. Journal of Official Statistics (JOS), 22, 2: 329–349.
Dillman, D.A. (2000). Mail and Internet Surveys. The Also available at www.jos.nu.
Tailored Design Method. New York: Wiley. Lozar, M.K. and Vehovar, V. (2008). Internet surveys.
Dillman, D.A. (2007). Mail and Internet Surveys. In de Leeuw, E., Hox, J., and Dillman, D. (eds)
The Tailored Design Method (2007 Update with International Handbook of Survey Methodology.
Appendix). New York: Wiley. Mahwah, N.J.: Erlbaum.
Dillman, D.A., Eltinge, J.L., Groves, R.M., and Little, Matsuo, H., McIntyre, K.P., Tomazic, T., and Katz, B.
R.J.A. (2002). Survey nonresponse in design, data The online survey: its contributions and poten-
collection and analysis. In Dillman, D.A., Eltinge, tial problems. American Statitistical Association
J.L., Groves, R.M., and Little, R.J.A. (eds) Survey (ASA). Proceedings, 2004, ASA section on Survey
Nonresponse. New York: Wiley. Research Methods, pp. 3998–4000. Available at
Dillman, D.A., Gertseva, A., and Mahon-Taft, T. (2005). www.amstat.org/sections/srms/proceedings.
Achieving useability in establishment surveys through Nathan, G. (2001). Telesurvey methodologies for
the application of visual design principles. Journal households: A review and some thoughts for the
of Official Statistics (JOS), 21, 2: 183–214. Also future. Survey Methodology, 27: 7–31.
available at www.jos.nu. National Centre for Social Research (1999). How
Fowler, F.J. (1995). Improving Survey Questions: Design to Improve Survey Response Rates: A Guide for
and Evaluation (Vol. 38). Thousand Oaks: Sage Interviewers on the Doorstep. London, Thousand
Applied Social Research Methods Series. Oaks and New Delhi: Sage Publications.
Ganziano, C. (2005). Comparative analysis of within- O’Muircheartaigh, C. (1997). Measurement error in
household respondent selection techniques. Public surveys: A historical perspective. In Lyberg, L., Biemer,
Opinion Quarterly, 69: 124–157. P., Collins, M., de Leeuw, E., Dippo, C., Schwarz, N.,
Goyder, J. (1987). The Silent Minority: Nonrespondents and Trewin, D. (eds) Survey Measurement and Process
on Sample Surveys. Cambridge: Policy Press. Quality. New York: Wiley.
Groves, R.M. (1989). Survey Errors and Survey Costs. Presser, S., Rothgeb, J., M., Couper, M.P., Lessler,
New York: Wiley. J.T., Martin, E., Martin, J., and Singer, E. (2004).
Groves, R.M. and Couper, M.P. (1998). Nonresponse in Methods for Testing and Evaluating Survey Questions.
Household Interview Surveys. New York: Wiley. New York: Wiley.
Redline, C.D., Dillman, D.A., Carley-Baxter, L., and Handbook of Survey Methodology. Mahwah, N.J.:
Creecy, R. (2003). Factors that influence reading Erlbaum.
and comprehension in self-administered question- Steeh, C. and Piekarski, L. (2006). Accommodating
naires. Paper presented at the Workshop on Item- new technologies: the rejuvenation of telephone
Nonresponse and Data Quality, Basel Switzerland, surveys. Paper Presented at the second International
October 10, 2003. Available at http://survey.sesrc. Conference on Telephone Survey Methodology
wsu.edu/dillman/papers.htm (accessed June 2006). (TSMII), Florida.
Salant, P. and Dillman, D.A. (1994). How to Conduct Sudman, S., Bradburn, N.M., and Schwarz, N. (1996).
Your Own Survey. New York: Wiley. Thinking About Answers. The Application of Cognitive
Schwarz, N., Knäuper, B., Oyserman, D., and Stich, Processes to Survey Methodology. San Francisco:
C. (2008). The Psychology of Asking Questions. Jossey-Bass.
In Edith De Leeuw, Joop Hox, and Don Dillman Tourangeau, R., Rips, L.J., and Rasinski, K. (2000).
(eds) International Handbook of Survey Methodology. The Psychology of Survey Response. Cambridge:
Mahwah, N.J.: Erlbaum. Cambridge University Press.
Singer, E. (2002). The use of incentives to reduce Van Hattum, M.J.C. and de Leeuw, E.D. (1999). A disk by
nonresponse in household surveys. In Dillman, D.A., mail survey of pupils in primary schools: Data quality
Eltinge, J.L., Groves, R.M., and Little, R.J.A. (eds) and logistics. Journal of Official Statistics (JOS), 15,
Survey Nonresponse. New York: Wiley. 3: 413–429. Also available at www.jos.nu (accessed
Snijkers, G., Hox, J.J., and de Leeuw, E.D. (1999). June 2006).
Interviewers’ tactics for fighting survey nonresponse. Weisband, S. and Kiesler, S. (1996). Self disclosure
Journal of Official Statistics (JOS), 15, 2: 185–198 on computer forms: Meta analysis and implications.
(available at www.jos.nu). Reprinted in: David de CHI ’96. Available at http://acm.org/sigchi/chi96/
Vaus (2002), Social Surveys, part eleven, nonresponse proceedings/papersWeisband/sw_txt.htm (accessed
error. London: Sage, Benchmarks in Social Research July 2006).
Methods Series. Willis, G.B. (2004). Cognitive Interviewing. A Tool for
Steeh, C. (2008). Telephone surveys. In de Leeuw, E., Improving Questionnaire Design. Thousand Oaks:
Hox, J., and Dillman, D. (eds) International Sage.
19
Qualitative Interviewing and
Feminist Research
Andrea Doucet and Natasha Mauthner
INTRODUCTION

Over the past three decades, there have been multiple intersections between feminism and the fields of methodology and epistemology. While feminist scholars initially claimed the distinctiveness of ‘feminist methods,’ ‘feminist methodologies,’ and ‘feminist epistemologies,’ since the 1990s they have begun to map out significant feminist contributions to these domains rather than separate fields of study per se (see Doucet and Mauthner 2006). Nevertheless, feminist researchers have embraced particular characteristics in their work. First, they have long advocated that feminist research should be not just on women, but for women (DeVault 1990, 1996; Edwards 1990; Fonow and Cook 1991, 2005; Ramazanoglu and Holland 2002; Reinharz 1992; Smith 1987, 1989, 1999; Stanley and Wise 1983, 1993). Second, they have advocated that feminist research should be concerned with issues of broader social change and social justice (Fonow and Cook 1991, 2005). For example, Beverley Skeggs argues that feminist research is distinct because it ‘begins from the premise that the nature of reality in western society is unequal and hierarchical’ (Skeggs 1997, 77) while Ramazanoglu and Holland (2002, 2–3) note that such research ‘is imbued with particular theoretical, political, and ethical concerns that make these varied approaches to social research distinctive.’ Third, feminist researchers have actively engaged with methodological innovation through challenging conventional or mainstream ways of collecting, analyzing, and presenting data (Code 1995; Gelsthorpe 1990; Lather 2001; Lather and Smithies 1997; Mol 2002; Naples 2003; Richardson 1988, 1997).

In the 1970s and 1980s, many feminists questioned whether positivist frameworks and quantitative methods could adequately capture women’s experiences and everyday lives (Graham 1983; Oakley 1974; Reinharz 1979; Stanley and Wise 1990). Early feminist debates tended to draw a marked distinction between qualitative and quantitative approaches with the implication that qualitative methods were quintessentially feminist (Maynard and Purvis 1994). In particular, the
in-depth face-to-face interview came to be seen as ‘the paradigmatic “feminist method”’ (Kelly et al. 1994, 34). The equation of feminist research with qualitative methods was criticized by a number of feminists early on (e.g. Jayaratne 1983). Since then, feminists have increasingly moved away from privileging particular methodological approaches and methods. There has been recognition that research methodologies and methods should reflect the specific research questions under investigation, and that key feminist concerns can usefully be addressed by adopting a range of different approaches and methods (Brannen 1992; Chafetz 2004a, 2004b; Kelly et al. 1994; Maynard 1994; McCall 2005; Oakley 1998; Westmarland 2001).

Whilst recognizing that current feminist research is characterized by the use of multiple and mixed methods and approaches, the focus of this chapter is specifically on the ways in which feminist scholars have sought to transform the classic social science interview in line with feminist aims. Just as feminist thinking around issues of method, methodology, and epistemology has had a profound effect on research practices and theories more generally, contributions that feminist scholars have brought to the interview as a site for knowing from and about women’s lives have been influential in reshaping the practice and theory of qualitative interviewing more broadly.

The aim of this chapter is therefore to examine feminist debates concerning the interview as a particular method of data collection. We begin by sketching out what we regard as some key historical trends in feminist approaches to interviewing, with a particular discussion of Ann Oakley’s (1981) now classic piece on the importance of non-hierarchical interviewing practices. While Oakley’s contribution initially stimulated discussions around the possibilities and limitations of creating rapport and friendliness within interviews, more recent challenges from black feminism, cultural studies, post-structural and postcolonial writing have questioned the extent to which ‘others’ can be known at all through interviews or, indeed, through any other method (Wilkinson and Kitzinger 1996). Our chapter also addresses the increasingly topical and critical question of how one can come to know others who are different from ourselves (such as in cross-cultural interviewing and women interviewing men) and highlights the most recent contributions of feminist scholarship to contemporary understandings of the research interview.

FEMINIST CONTRIBUTIONS TO THE INTERVIEW: 1970s AND 1980s

In the 1970s, feminist researchers began to engage with the intersections between feminist theory and methodologies, and turned their attention to the ways in which the methods available for studying and understanding women’s lives were flawed. As Dorothy Smith (1974, 2) noted, there was within sociology ‘a disjunction between how women find and experience the world beginning (though not necessarily ending up) from their place and the concepts and theoretical schemes available to think about it in.’ Early feminist sociological theory thus pointed to how women’s exclusion mattered both theoretically and methodologically. Turning their gaze to dominant methods used to generate theory, many feminist scholars expressed unease about quantitative data collection methods across the social and natural sciences and, more specifically, gender bias in the collection and interpretation of data on sex differences in behavioral, biological, and bio-behavioral scientific research. Feminist scientists documented, in particular, the exclusive use of male subjects in both experimental and clinical biomedical research, the selection of male activity and concomitant male-dominant animal populations for study, and the blatant invisibility of females in research protocols (Haraway 1988, 1991; Keller 1983, 1985; Keller and Longino 1998; Longino and Doell 1983; Rose 1994).

Whilst feminist scientists made such observations on the basis of experiments conducted
on rats and baboons, similar concerns were raised across the social sciences and humanities on research processes and protocols with human beings. Feminist social scientists noted how masculine bias permeated research, as perhaps best revealed in the valuing and incorporation of traditional masculine characteristics of reason, rationality, autonomy, and disconnection (see Code 1981; Gilligan 1977, 1982; Keller 1985; Lloyd 1983; Miller 1976; Smith 1974). Also within the social sciences and humanities, feminists waged a long and wide epistemological critique of positivism as a philosophical framework and its detached and ‘objective’ scientific approach that objectified research subjects.

Feminist scholars raised three particular concerns within this epistemological critique. First, women’s lives and female-dominated domains were largely absent in much social science research. Thus when Dorothy Smith argued that ‘sociology … has been based on and built up within the male social universe’ (Smith 1974, 7), this was a ‘social universe’ that left unstudied and invisible the female-dominated social sites of domestic work and the care of children, the ill and the elderly (see also Finch and Groves 1983; Graham 1983, 1991). Second, these sentiments were even more profoundly felt by particular groups of women, especially by women of color who watched as feminist movements and feminism within the academy unfolded in ways that did not speak to them or about them. In the United States, this sense was aptly described as one of ‘feelings of craziness’ by the infamous Combahee River Collective’s manifesto entitled: ‘A Black Feminist Statement’ (Combahee River Collective 1977/1986; see also Collins 1990; Hooks 1989, 1990; Lorde 1984). In Britain, women of African and Asian descent spoke to the invisibility of their experiences in public, political, and academic portrayals of women’s lives (see Bryan et al. 1985; Mirza 1998; Wilkinson and Kitzinger 1996). A third concern was over the preferred tool for research within positivist frameworks, namely, the quantitative survey, and the extent to which it could adequately capture the complexity of women’s lives. As Hilary Graham lamented, women’s experiences were being measured within surveys designed on the basis of men’s lives; her provocative question, posed at the beginning of the 1980s, summed up the growing dissatisfaction with surveys for understanding women’s experiences: ‘Do her answers fit his questions?’ (Graham 1983).

It was against this backdrop that feminist social scientists turned their attention to the possibilities and practices of interviewing. During the 1980s feminist researchers, especially those working within sociology, began to engage with the issue of how to interview in ways that would adhere to widely recognized feminist goals of conducting non-hierarchical and egalitarian research. This critique began early in the decade with Ann Oakley’s now highly cited article on ‘non-hierarchical’ relationships between female interviewers and interviewees (Oakley 1981). Her discussion sought to provide an alternative to what were presented as ‘proper interviews’ in sociological textbooks. More broadly, Oakley challenged positivist research methods that emphasized ‘objectivity,’ distance, and ‘hygienic’ research uncontaminated by the researcher’s values or biases. In contrast to an objective, standardized and detached approach to interviewing, Oakley argued that ‘the goal of finding out about people through interviewing was best achieved when the relationship of interviewer and interviewee is non-hierarchical and when the interviewer is prepared to invest his or her own personal identity in the relationship’ (1981, 41). Janet Finch (1984), writing a few years later, echoed Oakley’s concerns in emphasizing the rapport that could easily be struck between two women in an interview situation while others followed suit and argued for the importance of developing mutually reciprocal relationships during the interviewing stage (Mies 1983; Reinharz 1992; Stanley and Wise 1983, 1993).

A central preoccupation for feminist researchers writing in the 1980s was an acute sensitivity to the relations between researcher and researched, and power relations more widely (see Maynard and Purvis 1994;
Ramazanoglu and Holland 2002). In the 1990s, however, feminist social scientists began to challenge the notion of non-hierarchical interviews, the idea that power differentials could be equalized between women, as well as the assumption that reciprocity and mutuality between women necessarily lead to ‘better’ knowing. Indeed, feminists began to display a growing appreciation of the ‘dilemmas’ and tensions involved in coming to know and represent the narratives, experiences, or lives of their interview subjects (e.g. Ribbens and Edwards 1998; Wilkinson and Kitzinger 1996; Wolf 1992).

Western-based social scientists have exhibited profound ‘worry’ over resolving these tensions (Fine and Weis 1996, 251; see also DeVault 1999). However, the ethical dilemmas around coming to know ‘others’ have been particularly clearly articulated by Black feminist scholars (Lewis 2000; Mama 1995; Reynolds 2002a) and by feminists working in contexts where inequalities are especially acute, such as in low-income communities and in Third World countries (Patai 1991; Wolf 1992). One of the most vocal scholars on this issue has been Daphne Patai who has insisted that, due to socio-economic and global inequalities, research relations between First World women interviewing Third World women are not only intrinsically hierarchical, but can be unethical (Patai 1991). Questions of who produces knowledge, with what politics, and from which locations (Mohanty 1988, 1991) have, furthermore, become increasingly critical and urgent in feminist, postmodern, and post-colonial research. Throughout the 1990s, women of color working within western contexts and feminists working in Third World settings have highlighted systemic processes of exclusion, racism, and ethnocentrism in research. Key and much-debated issues have included: intersections of global capitalism and feminist transnational identities (Ferguson 2004; Schutte 1993, 1998, 2000; Shohat 2001); the extent to which feminists in dominant cultures can ever know subaltern cultures (Alexander and Mohanty 1997; Ladson-Billings 2000; Mohanty et al. 1991; Oyewumi 2000; Spivak 1993); the challenges of knowing transnational lesbian and gay identities (Bunch 1987; Stone 1991); and the role and representation of subordinate ‘others’ in the production of knowledge (Bernal 2002; Christian 1996).

A decade after Ann Oakley’s celebration of non-hierarchical woman-to-woman interviewing, and its ability to yield greater insight into knowledge of women’s lives, feminist work took a 180-degree turn and began to highlight the potential dangers associated with trying to pretend that interviews could be friendly or mutually beneficial for both researchers and interviewees. Judith Stacey (1991: 114) argued that the ‘ethnographic method exposes subjects to far greater danger and exploitation than do more positivist, abstract, and “masculinist” research methods. And the greater the intimacy – the greater the apparent mutuality of the researcher/researched relationship – the greater is the danger.’ Pamela Cotterill (1992: 597) similarly drew attention to the ‘potentially damaging effects of a research technique which encourages friendship in order to focus on very private and personal aspects of people’s lives.’ These criticisms have continued into the new millennium, with feminists commenting on the irony that feminist researchers may be reproducing the very practices they have been seeking to challenge:

It is perhaps ironic, then, that scholars are discovering that methodological changes intended to achieve feminist ends – increased collaboration, greater interaction, and more open communication with research participants – may have inadvertently reintroduced some of the ethical dilemmas feminist researchers had hoped to eliminate: participants’ sense of disappointment, alienation, and potential exploitation. (Kirsch 2005, 2163)

Three decades of ardent reflection on the usefulness of interviews as the most appropriate, or even the best, way of gathering knowledge from and for women have paved the way for broader theoretical and epistemological debates about ‘knowing’ others. Beginning in the 1990s, feminists have turned their attention to the difficulties and
challenges involved in creating knowledge from interview accounts.

FEMINIST CONTRIBUTIONS TO THE INTERVIEW: RECENT ISSUES AND CONCERNS (1990s–2000s)

While the issues raised by Oakley have been critiqued and displaced with other key concerns, it remains the case that her reflections on what was important to feminist interviewing still resonate as highly relevant in the new millennium. That is, issues of non-hierarchical relations, power, rapport, and empathy, and the investment of one’s identity in the interview process continue to dominate discussions of feminist research practices. However, these discussions have grown more complex and nuanced, and have incorporated a number of other concerns including: interviews as sites for collaborative meaning-making (the ‘how’ of interviews); the interrogation of ‘what’ constitutes data; and the theoretical assumptions and underpinnings of interviews, and research methods more generally.

Non-hierarchical relations in interviewing

Underlying early discussions of non-hierarchical interviewing was the assumption that differences between women could be muted or eliminated altogether. Decades of scholarship on differences between women, postmodern and post-structural critiques of the stability of a concept and identity such as ‘woman,’ and black feminist contributions to this debate have revealed the naivety and essentialism inherent within this position. Many feminist researchers have shown that structural characteristics other than gender, such as differences in class, ethnicity, age, sexuality, and global location can matter and that the ways in which power imbalances play out in the interview process are not straightforward. Tang, for example, in her interviews with peers – academic mothers in both China and the UK – argues that both the interviewer’s and interviewee’s perceptions of social, cultural, and personal differences have an impact on the power relationship in the interview and that the relational dynamics between the interview pair can matter in what kind of information is divulged (Tang 2002; see also Garg 2004). Others have focused on how other aspects of the research relationship can influence the content and conduct of the interview, including: shared proficiency by both interviewer and interviewee in the language of the interview (Garg 2004; Temple and Edwards 2002); generational differences between interviewers and interviewees (Casey 2003); shared racial position (such as Black women researchers conducting interviews with Black women on topics that are highly sensitive) (Few et al. 2003); and how class relations may influence the ‘telling’ of lesbian stories in research interviews (McDermott 2004).

Power relations in research have been discussed with an overwhelming focus on how interviews affect the researched. Recently, however, feminists have highlighted the ways in which research respondents can exercise power, creating a two-way flow of power relations between the researcher and the researched. Informed by Foucauldian understandings of power, Thapar-Bjorkert and Henry (2004) view power hierarchies in research as ‘shifting, multiple, and intersecting’ (Thapar-Bjorkert and Henry 2004, 364). Drawing on the multiple locations within which both researchers and research participants are located, they argue that their combined locations as ‘non-white/non-western and non-white/western researchers in a non-western setting’ enabled them to ‘closely examine the operation of power as it flows and ebbs in the context of a multiplicity of potential identities of researchers and research participants’ (2004, 363). They note, in particular, how age, generation, national location, and reciprocity during and after the interviews influence how these power relations play out. Similarly, drawing on her research with Black mothers,
Reynolds (2002b) questions the notion of the ‘powerful researcher.’ She notes that ‘the power relations between the mothers and myself, as researcher, involved a dynamic, fluid and two-way interactive process’ (2002b, 303). She found that power relations within her interviews shifted according to structural differences in race, class, age, and gender between researcher and researched. She writes:

‘Where the researcher and research participant share the same racial and gender position, such as Black female researcher interviewing Black women, power between the two groups is primarily negotiated through other facts such as social class and age difference. This interaction between race, class and gender suggests that power in social research is not a fixed and unitary construct, exercised by the researcher over the research participant. Instead … power is multifaceted, relational and interactional and is constantly shifting and renegotiating itself between the researcher and the research participant according to differing contexts and their differing structural locations.’ (2002b, 307–8)

Feminist reflections on the inevitability of hierarchy and power differences in interview settings and relationships do not suggest or imply abandonment of this method but rather invite researchers to be reflexive about their research practices by recognizing, debating, and working with these power differentials.

Empathy, rapport, and reciprocity

Feminists have deepened their reflections on issues of empathy, rapport, and reciprocity in interview situations, with a recent focus on how to navigate differences of social positioning. Questions about how much researchers should reveal about themselves, their situations and their views during interviews have continued to be asked (see Edwards 1993), particularly in cases of research on overtly political issues where researcher and researched may hold divergent perspectives. For example, in her research in the British Serbian community on Serbian liability for atrocities, Pryke (2004) challenges the methodological convention that the interviewer must never disagree with a respondent in qualitative research.

Issues of rapport and empathy in interviewing have tended to be discussed and conceptualized in relation to woman-to-woman interviewing. However, since the 1990s, feminists have increasingly been investigating the lives of men, thus raising questions around creating empathy and rapport with male research subjects. These challenges have emerged from the work of feminist researchers who, for example, have interviewed powerful, authoritative, and uniformed men (e.g. senior police officers) or violent male offenders (Campbell 2003; Presser 2004, 2005; Taylor and Rupp 2005). Researchers of fatherhood have further explored how feminist research relationships can be fostered with men. In recent research on divorced fathers, for example, Canadian feminists have reflected on the tensions in interviewing fathers in political climates where fathers’ rights groups have been gaining momentum. They highlight how fathers’ narratives can be heard as potentially damaging to women’s traditional caregiving interests (see Doucet 2004, 2006; Mandell 2002). Feminist research on men’s experiences demonstrates how the establishment of trustworthy relations in the interviewing setting can nevertheless exist within relations of considerable power inequities and conflict that can ultimately undermine larger feminist research objectives.

Investing one’s identity in the research relationship

In the early work of Ann Oakley (1981), the idea of investing one’s identity in the research relationship was marked by a tendency to frame a binary opposition between the researcher as an ‘insider’ or an ‘outsider’ to the research and to one’s research subjects. Oakley, and many other feminist researchers who followed her, illustrated this tendency in the argument that where the researcher has an area of shared identity with her research subjects, there was a reduced likelihood of unequal, exploitative, or unethical research. In the case of
Oakley, shared motherhood was the entry fixed or static positions; rather they are ever-
point for the researcher to have ‘insider’ shifting and permeable social locations that
status in the research. Other feminists were are differentially experienced or expressed by
quick to contest this notion by underlining community members’ (Naples 2003, 373; see
how differing, as well as shared, structural also Naples 1996). Ongoing reflections on the
characteristics could impede mutuality and complexities of ‘otherness’ have highlighted
reciprocity (Coterill 1992; Edwards 1990, the increasing set of challenges that face
1993; Glucksmann 1994; Ramazanoglu 1989; researchers as they attempt to know others
Reynolds 2002b; Ribbens 1989; Song and who are different from themselves across
Parker 1995). Feminist scholars also noted multiples axes of identities and experiences
that even where researchers and respondents (see Fawcett and Hearn 2004).
shared structural and cultural similarities of Second, the question of who we are,
gender, ethnicity, class, and age, this did not while engaged concretely in the practice
guarantee mutual understanding or ‘better’ of research interviews, is also viewed as
knowing. As Catherine Riessman pointed out, neither unitary nor static. Shlulamit Reinharz,
‘gender and personal involvement may not for example, in a book chapter entitled
be enough for full “knowing”’ (Riessman ‘Who Am I,’ reflects upon how she has
1987, 189; see also Ribbens 1998). Since the ‘approximately 20 different selves’ (Reinharz
early 1990s, feminist discussions of identity 1997, 5) during her interviews and fieldwork.
investment in interviews have, thus, debunked Recent feminist contributions to this debate
the view that any commonality in one’s have highlighted how the interview topics
social positionality, structural location, and as well as the relational dynamics occurring
biographical experience can guarantee that in the research encounter influence how
these axes of shared identification will estab- we present ourselves and which parts of
lish an open or ‘better’ research exchange (see our identity we choose to emphasize. Some
Dyck 1997). researchers may adopt ‘in-between positions’
At the same time, feminists began to as they straddle different identities (Ghorashi
recognize that the identity of being an ‘insider’ 2005) while others have stressed the ‘border-
was riddled with contradictions and that there making process that occurs during the social
were varied degrees of being both an insider constructionist interview’ wherein ‘various
and an outsider in the research relationship pre-assumed roles are created by researchers
(e.g. Narayan 1993; Olesen 1998; Stanley by their respondents’ (Gubrium and Koro-
1994; Zavella 1993). In this vein, Patricia Ljungberg 2005, 690).
Hill Collins has referred to herself as the
‘outsider within’ (Hill Collins 1990, 1998) as
Interviews and an interrogation of ‘what’ constitutes data
a way of describing ‘being on the edge’ of
‘intersecting power relations of race, gender
and social class’ (Hill Collins 1999, 85; Feminist researchers have also interrogated
see also Anzaldua 1987; Braidotti 1994). just ‘what’ emerges out of interview data.
Furthermore, post-structuralist discussions of In the 1970s and 1980s, there was a tendency
the complexity of the theoretical concepts for feminist researchers, particularly those
and empirical constructs of subjectivity and influenced by feminist standpoint theory
identity have further strengthened the prob- (Harding 1987; Hartsock 1983, 1985; Smith
lematization of what it means to be an 1987), to talk and write about seemingly
insider or an outsider, both theoretically and coherent and transparent subjects whose
methodologically. experiences, voices, or subjectivities could
Two key issues have come to the fore be captured by well-formulated research
in these debates. First, there is now fairly questions. Going back to Hilary Graham’s
widespread consensus among feminists that point about ‘her answers’ not fitting ‘his
‘“outsiderness” and “insiderness” are not questions,’ there was an implicit assumption
that if the questions could just be reformulated better, then ‘her answers’ would indeed provide pathways into understanding women’s experiences. In ensuing years, however, the influence of postmodern and post-structural critiques has meant that feminists have begun to strongly challenge this view. Researchers have named this the recurring ‘transparent self problem’ and the ‘transparent account problem’ (Hollway and Jefferson 2000, 3; see also Frith and Kitzinger 1998, 304–307) within interviews and their analysis.
An extensive scholarship on post-structuralist conceptualizations of subjects is now well incorporated into feminist research and feminist approaches to the interview. Most notable has been post-structural theorizing about a non-unitary, constantly changing subject where there is no ‘core self’ (e.g. Weedon 1987). Even feminist scholars who have been critical of post-structuralist approaches have been influenced by such critiques. Sandra Harding, for example, has moved beyond her originally narrow conception of a feminist standpoint to argue that ‘the subjects of knowledge are … multiple, heterogeneous and contradictory or incoherent’ (Harding 1993, 65). Other scholars have remained unconvinced by the linguistic turn and have continued to hold onto some notion of coherent subjectivities, or to ‘knowing subjects’ in their interviewing, as well as knowledge-construction practices (see Code 1993; Smith 1999; Stanley 1994). Dorothy Smith, for example, has argued persuasively that post-structuralism ‘has rejected the unitary subject of modernity only to multiply it as subjects constituted in multiple and fragmented discourses’ (Smith 1999, 108) while Linda Alcoff has maintained: ‘Poststructuralist critiques pertain to the construction of all subjects or they pertain to none’ (Alcoff 1988, 409). These debates on ‘who’ or ‘what’ is being accessed within interviews have continued in discussion of feminist research into the new millennium against a backdrop of larger theoretical work on post-structuralist and materialist/interpretivist conceptions of the subject (see Benhabib 1995; Butler 1995; Fraser and Nicholson 1988; Weeks 1998), debates on theorizing the concept of ‘experience’ (Holt 1994; Scott 1992, 1994) as well as feminist critiques of Foucault’s varied conceptions of the subject (Deveaux 1994; McNay 1993; Sawicki 1991).

Interviews as collaborative meaning-making: The ‘how’ of interviews

Feminists, particularly those influenced by ethnomethodology, have highlighted the importance of the interview not only as a place to collect data, but also as a site where data is co-constructed, where identities are forged through the telling of stories, and where meaning-making begins. Researchers have focused on how the research interview has particularly strong meanings for the research participant (Hiller and DiLuzio 2004; see also Brannen 1988). The research interview can be a site for the construction of one’s ‘moral’ identity (Presser 2004) as well as a potential avenue for resistance and healing when topics are of a sensitive nature (Taylor 2002). In Presser’s qualitative work with men who had committed ‘serious violent crimes, including crimes against women – rape of girls and women and assault and murder of female partners’ (2005, 2067), she examines how the interview itself acted as a context for the creation of men’s narratives and their identities. Reflecting on her role as a researcher in these settings, she highlights how the men she interviewed presented themselves as ‘good and manly’ and ‘decent’ while simultaneously constructing her, the researcher, both as somebody ‘needing strength and guidance concerning relations with men’ as well as ‘an object of fantasies of domination’ (2005, 2086). Presser, thus, argues that feminist researchers need to pay closer attention to how power relations within the interview setting can become part of one’s data and she calls for a ‘close and deep (multilevel) examination of the “how” of talk and not just the “what”’

(2005, 2087).
These issues have also received considerable attention in the expanding literature on focus groups. Focus groups, or group interviews, have come to be viewed as important ways of breaking down hierarchies between the interviewer and the interviewees, of providing insights into group-based discussion, and of allowing an interactive forum for negotiation around concepts and issues (see Doucet 2006; Frith and Kitzinger 1998; Kitzinger 1994; Munday 2006; Warr 2005; Wilkinson 1999). Kitzinger (1994, 119), for example, maintains that the interactive nature of group interviews ‘enables the researcher to … explore how accounts are constructed, expressed, censured, opposed and changed through social interaction.’ Hyams (2004), who utilized a ‘feminist group discussion method’ in her research on adolescent Latina gender identities, noted that ‘(g)roup discussions are seen as potentially empowering in exploring and enabling group members’ social agency and knowledge production while at the same time diminishing the unequal power relations between the researched and researcher.’ A further example of the links between feminist research ideals and group interviews is in Pini’s work (2002) on the Australian sugar industry where she argues that the effectiveness of focus groups for reaching feminist research goals can be demonstrated in at least four ways. These include: making visible to women that which was previously invisible; enabling connections between individual and collective experiences; challenging dominant beliefs; and allowing a space for ample discussion about gender issues (see also Wahab 2003). Others have argued for the complementarities of individual and group-based interviews (Pollack 2003; Wahab 2003). Given that a fundamental aim of feminist research has always been that of social change for women, focus groups have served the function of eliciting a rich dataset which can simultaneously complement individual interviews while also potentially facilitating ‘consciousness raising’ (see Wilkinson 1999).

Research methods as theoretical issues

While early feminist discussions of issues of identity, reciprocity, and power focused on the initial research stages, more recent feminist discussions have highlighted how these issues pervade the entire research endeavor, and particularly the post-interview processes of data analysis, writing up and dissemination. As Harrison and colleagues write: ‘Every stage of the research process relies on our negotiating complex social situations’ (Harrison et al. 2001, 323). For example, feminists have drawn attention to the ways in which race, class, and gender intersect during data analysis (Archer 2002); the influence of biographical and theoretical issues on the analysis and interpretation of interview transcripts (Mauthner and Doucet 1998, 2003); and the diverse ways in which interview stories can be presented and re-told (McCormack 2004; M. Wolf 1992).
These reflections serve to underline the ways in which power relations continue to shape the research process long after interviews have been completed. Feminists have noted that researchers and respondents have a ‘different and unequal relation to knowledge’ (Glucksmann 1994, 150) and that within most research projects, ‘the final shift of power between the researcher and the respondent is balanced in favor of the researcher, for it is she who eventually walks away’ (Cotterill 1992, 604; see also Reinharz 1992; Stacey 1991; Wolf 1992). We have argued that when interview accounts or narratives become ‘transformed’ into theory, the later stages of analysis, interpretation, and writing up are critical to feminist concerns with power, exploitation, knowing and representation (Doucet and Mauthner 2002; Mauthner and Doucet 1998, 2003; see also Glucksmann 1994). Researchers have also reflected on the dilemmas and power issues involved when contradictions arise between interviewer interpretations and interviewee understandings of their own stories (see Andrews 2002; Borland 1991; Ribbens 1994).
The move away from an overwhelming focus on the interview setting, to what happens after the interview is completed, transcribed, analyzed, and written up, has meant that the issue of power in interviewing has shifted from the question of whether there are power inequalities between researchers and respondents to the question of how, when, and where power influences knowledge production and construction processes. These reflections on negotiating research relationships in the post-interview phase of research are part of a larger set of methodological and epistemological conversations on the intricate connections between ‘doing and knowing’ (Lather 2001; Letherby 2003, 2004) and on the critical ways in which methods, methodologies, and epistemologies are linked through all stages of the research process (e.g. Code 1995; Holland and Ramazanoglu 1994; Maynard 1994; Naples 2003; Ramazanoglu and Holland 2002). These feminist debates have highlighted how research methods are imbued with methodological, epistemological, and ontological assumptions that impact on the later interpretive stages of the research in terms of how and what knowledge gets constructed from them. As Jennifer Mason (2002, 225) writes, ‘Asking, listening and interpretation are theoretical projects in the sense that how we ask questions, what we assume is possible from asking questions and from listening to answers, and what kind of knowledge we hear answers to be, are all ways in which we express, pursue and satisfy our theoretical orientations in our research.’

CONCLUSIONS

In 1990, feminist theorists and researchers Liz Stanley and Sue Wise (1990, 37) noted that ‘feminist theorists have moved away from the “reactive” stance of the feminist critiques of social science and into the realms of exploring what “feminist knowledge” could look like.’ Part of this task of generating feminist knowledge, and social science knowledge more generally, relates to the widely acknowledged contributions that feminist researchers have made to the theory and practice of qualitative research (see, for example, DeVault 1999; Hesse-Biber and Yaiser 2004; Olesen 1998, 2005; Stanley and Wise 1983, 1990, 1993). The issue of interviewing as a way of coming to know others and to construct knowledge about them has been a recurrent theme of debate for all qualitative researchers. As discussed in this chapter, it has also been a subject that has had particular salience for feminist scholars. Beginning with Ann Oakley’s classic piece over two decades ago which argued that ‘the goal of finding out about people through interviewing is best achieved when the relationship of interviewer and interviewee is non-hierarchical’ (1981, 41), this chapter has traced some of the key feminist contributions to the theory and practice of interviewing over the past quarter century. While discussion initially focused on the potential and pitfalls of attempting to create rapport and friendliness within interviews, more recent challenges from cultural studies, post-structural sensibilities, and postcolonial writing have unsettled the idea that ‘others’ can be known through interviews or indeed through any method.
This chapter has also highlighted the most recent contributions of feminist scholarship to contemporary understandings of the research interview. These contributions include: attempts to render more complex earlier debates on non-hierarchical interviewing; empathy, rapport, reciprocity, and the investing of one’s identity in the research relationship; interviews as sites for collaborative meaning-making (the ‘how’ of interviews); the interrogation of ‘what’ constitutes data; and the theoretical assumptions and underpinnings of interviews, and research methods more generally. Feminist scholars, due to their overarching focus on issues of power and a quest to dismantle systemic inequalities within social relationships more widely, have made – and will continue to make – important and rich contributions to the practice of interviewing as well as to the field of qualitative methods and methodologies more generally.
REFERENCES

Alcoff, Linda Martin. 1988. ‘Cultural Feminism Versus Post-Structuralism: The Identity Crisis in Feminist Theory.’ Signs: Journal of Women in Culture and Society 13: 405–436.
Alexander, Jackie and Chandra Talpade Mohanty. 1997. Feminist Genealogies, Colonial Legacies, Democratic Futures. New York: Routledge.
Andrews, Molly. 2002. ‘Feminist Research with Non-Feminist and Anti-Feminist Women: Meeting the Challenge.’ Feminism and Psychology 12: 55–77.
Anzaldua, Gloria. 1987. Borderlands/La Frontera: The New Mestiza. San Francisco: Spinsters/Aunt Lute.
Archer, Louise. 2002. ‘“It’s Easier That You’re a Girl and That You’re Asian”: Interactions of “Race” and Gender Between Researchers and Participants.’ Feminist Review 72: 108–132.
Benhabib, Seyla. 1995. ‘Feminism and Postmodernism.’ In Feminist Contentions: A Philosophical Exchange, edited by S. Benhabib, J. Butler, D. Cornell, and N. Fraser. New York and London: Routledge.
Bernal, Dolores Delgado. 2002. ‘Critical Race Theory, Latino Critical Theory, and Critical Raced-Gendered Epistemologies: Recognizing Students of Color as Holders and Creators of Knowledge.’ Qualitative Inquiry 8: 105–126.
Borland, Katherine. 1991. ‘“That’s Not What I Said”: Interpretive Conflict in Oral Narrative Research.’ pp. 63–75 in Women’s Words: The Feminist Practice of Oral History, edited by S. B. Gluck and D. Patai. New York: Routledge.
Braidotti, Rosi. 1994. Nomadic Subjects: Embodiment and Sexual Difference in Contemporary Feminist Theory. New York: Columbia University Press.
Brannen, Julia. 1988. ‘Research Note: The Study of Sensitive Subjects.’ The Sociological Review 36: 552–563.
Brannen, Julia. 1992. ‘Combining Qualitative and Quantitative Approaches: An Overview.’ pp. 3–37 in Mixing Methods: Qualitative and Quantitative Approaches, edited by J. Brannen. Aldershot: Avebury.
Bryan, Beverly, Stella Dadzie, and Suzanne Scafe. 1985. The Heart of the Race: Black Women’s Lives in Britain. London: Virago.
Bunch, Charlotte. 1987. Passionate Politics: Feminist Theory in Action. New York: St. Martin’s Press.
Butler, Judith. 1995. ‘Contingent Foundations.’ In Feminist Contentions: A Philosophical Exchange, edited by S. Benhabib, J. Butler, D. Cornell, and N. Fraser. New York and London: Routledge.
Campbell, Elaine. 2003. ‘Interviewing Men in Uniform: a Feminist Approach?’ International Journal of Social Research Methodology 6: 285–305.
Casey, Emma. 2003. ‘“How do You Get a Ph.D. in That?” Using Feminist Epistemologies to Research the Lives of Working Class Women.’ International Journal of Sociology and Social Policy 23: 107–123.
Chafetz, Janet Saltzman. 2004a. ‘Bridging Feminist Theory and Research Methodology.’ Journal of Family Issues 25: 963–977.
Chafetz, Janet Saltzman. 2004b. ‘Reply to Comments by Walker, Baber, and Allen.’ Journal of Family Issues 25: 995–997.
Christian, Barbara. 1996. ‘The Race for Theory.’ In Contemporary Postcolonial Theory: a Reader, edited by M. Padmini. New York: Arnold.
Code, Lorraine. 1981. ‘Is the Sex of the Knower Epistemologically Significant?’ Metaphilosophy 12: 267–276.
Code, Lorraine. 1993. ‘Taking Subjectivity into Account.’ In Feminist Epistemologies, edited by L. Alcoff and E. Potter. New York and London: Routledge.
Code, Lorraine. 1995. ‘How Do We Know? Questions of Method in Feminist Practice.’ In Changing Methods: Feminists Transforming Practice, edited by S. D. Burt and L. Code. Peterborough: Broadview.
Collins, Patricia Hill. 1990. Black Feminist Thought: Knowledge, Consciousness, and the Politics of Empowerment. London and New York: Routledge.
Collins, Patricia Hill. 1998. Fighting Words: Black Women and the Search for Justice. Minneapolis: University of Minnesota Press.
Collins, Patricia Hill. 1999. ‘Reflections on the Outsider Within.’ Journal of Career Development 26: 85–88.
Combahee River Collective. 1977. ‘A Black Feminist Statement.’ In Capitalist Patriarchy and the Case for Socialist Feminism, edited by Z. Eisenstein.
Cotterill, Pamela. 1992. ‘Interviewing Women: Issues of Friendship, Vulnerability, and Power.’ Women’s Studies International Forum 15: 593–606.
DeVault, Marjorie L. 1990. ‘Talking and Listening from Women’s Standpoint: Feminist Strategies for Interviewing and Analysis.’ Social Problems 37: 96–116.
DeVault, Marjorie L. 1996. ‘Talking Back to Sociology: Distinctive Contributions of Feminist Methodology.’ Annual Review of Sociology 22: 29–50.
DeVault, Marjorie. 1999. Liberating Method: Feminism and Social Research. Philadelphia, PA: Temple University Press.
Deveaux, Monique. 1994. ‘Feminism and Empowerment: A Critical Reading of Foucault.’ Feminist Studies 20: 223–247.
Doucet, Andrea. 2004. ‘Fathers and the Responsibility for Children: A Puzzle and a Tension.’ Atlantis: A Women’s Studies Journal 28: 103–114.
Doucet, Andrea. 2006. Do Men Mother? Fathering, Care and Domestic Responsibility. Toronto: University of Toronto Press.
Doucet, Andrea and Natasha S. Mauthner. 2002. ‘Knowing Responsibly: Linking Ethics, Research Practice and Epistemology.’ In Ethics in Qualitative Research, edited by M. Mauthner, M. Birch, J. Jessop, and T. Miller. London: Sage.
Doucet, Andrea and Natasha S. Mauthner. 2006. ‘Feminist Methodologies and Epistemologies.’ In Handbook of 21st Century Sociology, edited by Clifton D. Bryant and Dennis L. Peck. Thousand Oaks, CA: Sage.
Dyck, Isabel. 1997. ‘Dialogue with Difference: A Tale of Two Studies.’ pp. 183–202 in Thresholds in Feminist Geography: Difference, Methodology, Representation, edited by J. P. I. Jones, H. Nast, and S. M. Roberts. Lanham, MD: Rowman and Littlefield.
Edwards, Rosalind. 1990. ‘Connecting Method and Epistemology: A White Woman Interviewing Black Women.’ Women’s Studies International Forum 13: 477–490.
Edwards, Rosalind. 1993. ‘An Education in Interviewing: Placing the Researcher and the Researched.’ pp. 181–196 in Researching Sensitive Topics, edited by C. M. Renzetti and R. M. Lee. Newbury Park: Sage.
Fawcett, Barbara and Jeff Hearn. 2004. ‘Researching Others: Epistemology, Experience, Standpoints and Participation.’ International Journal of Social Research Methodology 7: 201–218.
Ferguson, Ann. 2004. ‘Symposium: Comments on Ofelia Schutte’s Work on Feminist Philosophy.’ Hypatia 19: 169–181.
Few, April L., Dionne P. Stephens, and Marlo Rouse-Arnett. 2003. ‘Sister-to-Sister Talk: Transcending Boundaries and Challenges in Qualitative Research with Black Women.’ Family Relations 52: 205–215.
Finch, Janet. 1984. ‘“It’s Great to Have Someone to Talk to”: The Ethics and Politics of Interviewing Women.’ In Social Researching: Politics, Problems, Practice, edited by C. Bell and H. Roberts. London: Routledge and Kegan Paul.
Finch, Janet and Dulcie Groves. 1983. A Labour of Love: Women, Work and Caring. London: Routledge.
Fine, Michelle and Lois Weis. 1996. ‘Writing the “Wrongs” of Fieldwork: Confronting our Own Research/Writing Dilemmas.’ Qualitative Inquiry 2: 251–274.
Fonow, Mary M. and Judith A. Cook. 1991. Beyond Methodology: Feminist Scholarship as Lived Research. Bloomington: Indiana University Press.
Fonow, Mary M. and Judith A. Cook. 2005. ‘Feminist Methodology: New Applications in the Academy and Public Policy.’ Signs: Journal of Women in Culture and Society 30: 2211–2236.
Fraser, Nancy and Linda Nicholson. 1988. ‘Social Criticism without Philosophy: An Encounter between Feminism and Postmodernism.’ Theory, Culture and Society 5: 373–394.
Frith, Hannah and Celia Kitzinger. 1998. ‘“Emotion Work” as a Participant Resource: A Feminist Analysis of Young Women’s Talk-in-interaction.’ Sociology 32: 299–320.
Garg, Anupama. 2004. ‘Interview Reflections: A First Generation Migrant Indian Woman Researcher Interviewing a First Generation Migrant Indian Man.’ Journal of Gender Studies 14: 147–152.
Gelsthorpe, Lorraine. 1990. ‘Feminist Methodology in Criminology: A New Approach or Old Wine in New Bottles.’ In Feminist Perspectives in Criminology, edited by L. Gelsthorpe and A. Morris. Milton Keynes: Open University Press.
Ghorashi, Halleh. 2005. ‘When the Boundaries are Blurred: The Significance of Feminist Methods in Research.’ European Journal of Women’s Studies 12: 363–375.
Gilligan, Carol. 1977. ‘In a Different Voice: Psychological Theory and Women’s Development.’ Harvard Educational Review 47: 481–517.
Gilligan, Carol. 1982. In a Different Voice: Psychological Theory and Women’s Development. Cambridge, Mass.: Harvard University Press.
Glucksmann, Miriam. 1994. ‘The Work of Knowledge and the Knowledge of Women’s Work.’ In Researching Women’s Lives from a Feminist Perspective, edited by M. Maynard and J. Purvis. London: Taylor and Francis.
Graham, Hilary. 1983. ‘Do Her Answers Fit His Questions? Women and the Survey Method.’ In The Public and the Private, edited by E. Gamarnikow. London: Tavistock.
Graham, Hilary. 1991. ‘The Concept of Caring in Feminist Research: The Case of Domestic Service.’ Sociology 25: 61–78.
Gubrium, Erika and Mirka Koro-Ljungberg. 2005. ‘Contending with Border Making in the Social Constructionist Interview.’ Qualitative Inquiry 11: 689–715.
Haraway, Donna. 1988. ‘Situated Knowledges: The Science Question in Feminism and the Privilege of Partial Perspective.’ Feminist Studies 14: 575–599.
Haraway, Donna. 1991. Simians, Cyborgs and Women: The Reinvention of Nature. New York: Routledge.
Harding, Sandra. 1987. ‘Conclusion: Epistemological Questions.’ pp. 181–190 in Feminism and Methodology, edited by S. Harding. Bloomington, Indiana and Milton Keynes, UK: Indiana University Press and Open University Press.
Harding, Sandra. 1993. ‘Rethinking Standpoint Epistemologies: What is Strong Objectivity.’ In Feminist Epistemologies, edited by L. Alcoff and E. Potter. London: Routledge.
Harrison, Jane, Lesley MacGibbon, and Missy Morton. 2001. ‘Regimes of Trustworthiness in Qualitative Research: The Rigors of Reciprocity.’ Qualitative Inquiry 7: 323–345.
Hartsock, Nancy. 1983. ‘The Feminist Standpoint: Developing the Ground for a Specifically Feminist Historical Materialism.’ In Discovering Reality: Feminist Perspectives on Epistemology, Metaphysics, Methodology and Philosophy of Science, edited by S. Harding and M. Hintikka. Dordrecht: D. Reidel Publishing.
Hartsock, Nancy. 1985. Money, Sex and Power: Toward a Feminist Historical Materialism. Boston: Northeastern University Press.
Hesse-Biber, Sharlene Nagy, and Michelle L. Yaiser. 2004. Feminist Perspectives on Social Research. New York and London: Oxford University Press.
Hiller, Harry H. and Linda DiLuzio. 2004. ‘The Interviewee and the Research Interview: Analysing a Neglected Dimension in Research.’ The Canadian Review of Sociology and Anthropology 41: 1–26.
Holland, Janet and Caroline Ramazanoglu. 1994. ‘Coming to Conclusions: Power and Interpretation in Researching Young Women’s Sexuality.’ pp. 125–148 in Researching Women’s Lives from a Feminist Perspective, edited by M. Maynard and J. Purvis. London: Taylor and Francis.
Hollway, Wendy and Tony Jefferson. 2000. Doing Qualitative Research Differently: Free Association, Narrative and the Interview Method. London: Sage.
Holt, Thomas C. 1994. ‘Experience and the Politics of Intellectual Inquiry.’ In Questions of Evidence: Proof, Practice and Persuasion across the Disciplines, edited by J. Chandler, A. I. Davidson, and H. Harootunian. Chicago: University of Chicago Press.
Hooks, Bell. 1989. Talking Back: Thinking Feminist, Thinking Black. Boston: South End Press.
Hooks, Bell. 1990. Yearning: Race, Gender and Cultural Politics. Boston: South End Press.
Hyams, Melissa. 2004. ‘Hearing Girls’ Silences: Thoughts on the Politics and Practices of a Feminist Method of Group Discussion.’ Gender, Place and Culture 11: 105–119.
Jayaratne, T. E. 1983. ‘The Value of Quantitative Methodology for Feminist Research.’ In Theories of Women’s Studies, edited by G. Bowles and R. Duelli-Klein. London: Routledge and Kegan Paul.
Keller, Evelyn Fox. 1983. A Feeling for the Organism: The Life and Work of Barbara McClintock. New York: W.H. Freeman.
Keller, Evelyn Fox. 1985. Reflections on Gender and Science. New Haven and London: Yale University Press.
Keller, Evelyn Fox and Helen E. Longino. 1998. ‘Feminism and Science.’ Oxford and New York: Oxford University Press.
Kelly, Liz, Sheila Burton, and Linda Regan. 1994. ‘Researching Women’s Lives or Studying Women’s Oppression? Reflections on what Constitutes Feminist Research.’ pp. 27–48 in Researching Women’s Lives from a Feminist Perspective, edited by M. Maynard and J. Purvis. London: Taylor and Francis.
Kirsch, Gesa E. 2005. ‘Friendship, Friendliness, and Feminist Fieldwork.’ Signs: Journal of Women in Culture and Society 30: 2163–2172.
Kitzinger, Jenny. 1994. ‘The Methodology of Focus Groups: The Importance of Interaction between Research Participants.’ Sociology of Health & Illness 16(1): 103–121.
Ladson-Billings, G. 2000. ‘Racialized Discourses and Ethnic Epistemologies.’ pp. 257–277 in Handbook of Qualitative Research, 2nd edition, edited by N. K. Denzin and Y. S. Lincoln. Thousand Oaks, CA: Sage.
Lather, Patti. 2001. ‘Postbook: Working the Ruins of Feminist Ethnography.’ Signs 27: 199–227.
Lather, Patti and Chris Smithies. 1997. Troubling the Angels: Women Living with HIV/AIDS. Boulder, CO: Westview.
Letherby, Gayle. 2003. Feminist Research in Theory and Practice. Buckingham: Open University Press.
Letherby, Gayle. 2004. ‘Quoting and Counting: An Autobiographical Response to Oakley.’ Sociology 38: 157–189.
Lewis, Gail. 2000. ‘Race,’ Gender and Social Welfare. London: Polity Press.
Lloyd, Genevieve. 1983. Man of Reason. London: Routledge.
Longino, Helen E. and Ruth Doell. 1983. ‘Body, Bias, and Behaviour: A Comparative Analysis of Reasoning in Two Areas of Biological Science.’ Signs: Journal of Women in Culture and Society 9: 206–227.
Lorde, Audre. 1984. Sister Outsider: Essays and Speeches. Berkeley, California: The Crossing Press.
Mama, Amina. 1995. Beyond the Mask: Race, Gender and Subjectivity. London: Routledge.
Mandell, Deena. 2002. Deadbeat Dads: Subjectivity and Social Construction. Toronto: University of Toronto Press.
Mason, Jennifer. 2002. ‘Qualitative Interviewing: Asking, Listening and Interpreting.’ pp. 225–241 in Qualitative Research in Action, edited by T. May. London: Sage Publications.
Mauthner, Natasha S. and Andrea Doucet. 1998. ‘Reflections on a Voice Centred Relational Method
of Data Analysis: Analysing Maternal and Domestic Voices.’ In Feminist Dilemmas in Qualitative Research: Private Lives and Public Texts, edited by J. Ribbens and R. Edwards. London: Sage Publications.
Mauthner, Natasha S. and Andrea Doucet. 2003. ‘Reflexive Accounts and Accounts of Reflexivity in Qualitative Data Analysis.’ Sociology 37: 413–431.
Maynard, Mary. 1994. ‘Methods, Practice and Epistemology: the Debate about Feminism and Research.’ pp. 10–26 in Researching Women’s Lives from a Feminist Perspective, edited by M. Maynard and J. Purvis. London: Taylor and Francis.
Maynard, Mary and June Purvis. 1994. Researching Women’s Lives from a Feminist Perspective. London: Taylor and Francis.
McCall, Leslie. 2005. ‘The Complexity of Intersectionality.’ Signs: Journal of Women in Culture and Society 30: 1771–1799.
McCormack, Coralie. 2004. ‘Storying Stories: A Narrative Approach to In-Depth Interview Conversations.’ International Journal of Social Research Methodology 7(3): 219–236.
McDermott, Elizabeth. 2004. ‘Telling Lesbian Stories: Interviewing and the Class Dynamics of ‘Talk’.’ Women’s Studies International Forum 27: 177–187.
McNay, Lois. 1993. Foucault and Feminism: Power, Gender and the Self. Boston, MA: Northeastern University Press.
Mies, M. 1983. ‘Towards a Methodology for Feminist Research.’ In Theories of Women’s Studies, edited by G. Bowles and R. Duelli Klein. London: Routledge and Kegan Paul.
Miller, Jean Baker. 1976. Towards a New Psychology of Women. London: Penguin Books.
Mirza, Heidi Safia. 1998. Black British Feminism: A Reader. London: Routledge.
Mohanty, Chandra Talpade. 1988. ‘Under Western Eyes: Feminist Scholarship and Colonial Discourses.’ Feminist Review 30: 61–88.
Mohanty, Chandra Talpade. 1991. ‘Under Western Eyes: Feminism and Colonial Discourse.’ In Third World Women and the Politics of Feminism, edited by C. T. Mohanty, A. Russo, and L. Torres. Bloomington: Indiana University Press.
Mohanty, Chandra Talpade, Ann Russo, and Lourdes Torres. 1991. ‘Third World Women and the Politics of Feminism.’ Bloomington: Indiana University Press.
Mol, Annemarie. 2002. The Body Multiple: Ontology in Medical Practice. Durham, NC: Duke University Press.
Munday, Jennie. 2006. ‘Identity in Focus: The Use of Focus Groups to Study the Construction of Collective Identity.’ Sociology 40: 89–105.
Naples, Nancy A. 1996. ‘A Feminist Revisiting of the ‘Insider/Outsider’ Debate: The ‘Outsider Phenomenon’ in Rural Iowa.’ Qualitative Sociology 19: 83–106.
Naples, Nancy A. 2003. Feminism and Method: Ethnography, Discourse Analysis, and Activist Research. New York and London: Routledge.
Narayan, Kirin. 1993. ‘How Native is a ‘Native’ Anthropologist?’ American Anthropologist 95: 671–686.
Oakley, Ann. 1974. Housewife. London: Allen Lane.
Oakley, Ann. 1981. ‘Interviewing Women: A Contradiction in Terms.’ pp. 30–61 in Doing Feminist Research, edited by H. Roberts. London: Routledge and Kegan Paul.
Oakley, Ann. 1998. ‘Gender, Methodology and People’s Ways of Knowing: Some Problems with Feminism and the Paradigm Debate in Social Science.’ Sociology 32: 707–731.
Olesen, Virginia. 1998. ‘Feminism and Models of Qualitative Research.’ In The Landscape of Qualitative Research: Theories and Issues, edited by N. K. Denzin and Y. S. Lincoln. Thousand Oaks, California: Sage.
Olesen, Virginia. 2005. ‘Early Millennial Feminist Qualitative Research.’ pp. 235–278 in The Sage Handbook of Qualitative Research, edited by N. K. Denzin and Y. S. Lincoln. Thousand Oaks, CA: Sage.
Oyewumi, Oyeronke. 2000. ‘Family Bonds/Conceptual Binds: African Notes on Feminist Epistemologies.’ Signs: Journal of Women in Culture and Society 25: 1093–1098.
Patai, Daphne. 1991. ‘U.S. Academics and Third World Women: Is Ethical Research Possible?’ pp. 137–153 in Women’s Words: The Feminist Practice of Oral History, edited by S. B. Gluck and D. Patai. New York: Routledge.
Pini, Barbara. 2002. ‘Focus Groups, Feminist Research and Farm Women: Opportunities for Empowerment in Rural Social Research.’ Journal of Rural Studies 18: 339–351.
Pollack, Shoshana. 2003. ‘Focus-Group Methodology in Research with Incarcerated Women: Race, Power, and Collective Experience.’ Affilia 18: 461–472.
Presser, Lois. 2004. ‘Violent Offenders, Moral Selves: Constructing Identities and Accounts in the Research Interview.’ Social Problems 51: 82–101.
Presser, Lois. 2005. ‘Negotiating Power and Narrative in Research: Implications for Feminist Methodology.’ Signs: Journal of Women in Culture and Society 30: 2067–2090.
Pryke, Sam. 2004. ‘“Some of Our People Can Be the Most Difficult.” Reflections on Difficult Interviews.’ Sociological Research Online 9.
Ramazanoglu, Caroline. 1989. ‘Improving on Sociology: The Problems of Taking a Feminist Standpoint.’ Sociology 23: 427–442.
Ramazanoglu, Caroline and Janet Holland. 2002. Feminist Methodology: Challenges and Choices. London: Sage Publications.
Reinharz, Shulamit. 1979. On Becoming a Social Scientist. San Francisco: Jossey-Bass.
Reinharz, Shulamit. 1992. Feminist Methods in Social Research. Oxford: Oxford University Press.
Reinharz, Shulamit. 1997. ‘Who Am I? The Need for a Variety of Selves in the Field.’ pp. 3–20 in Reflexivity and Voice, edited by R. Hertz. Thousand Oaks, CA: Sage.
Reynolds, Tracey. 2002a. ‘Re-thinking a Black Feminist Standpoint.’ Ethnic and Racial Studies 25: 591–606.
Reynolds, Tracey. 2002b. ‘On Relations Between Black Female Researchers and Participants.’ pp. 300–310 in Qualitative Research in Action, edited by T. May. London: Sage Publications.
Ribbens, Jane. 1989. ‘Interviewing – an Unnatural Situation?’ Women’s Studies International Forum 12: 579–592.
Ribbens, Jane. 1994. Mothers and their Children. London: Sage.
Ribbens, Jane. 1998. ‘Hearing my Feeling Voice? An Autobiographical Discussion of Motherhood.’ pp. 24–38 in Feminist Dilemmas in Qualitative Research: Private Lives and Public Texts, edited by J. Ribbens and R. Edwards. London: Sage.
Ribbens, Jane and Rosalind Edwards. 1998. Feminist Dilemmas in Qualitative Research: Private Lives and Public Texts. London: Sage.
Richardson, Laurel. 1988. ‘The Collective Story: Postmodernism and the Writing of Sociology.’ Sociological Focus 21: 199–208.
Richardson, Laurel. 1997. Fields of Play: Constructing an Academic Life. New Brunswick, NJ: Rutgers University Press.
Riessman, Catherine. 1987. ‘When Gender is not Enough: Women Interviewing Women.’ Gender and Society 1: 172–207.
Rose, Hilary. 1994. Love, Power and Knowledge: Towards a Feminist Transformation of the Sciences. Cambridge: Polity Press.
Sawicki, Jana. 1991. Disciplining Foucault: Feminism, Power and the Body. New York: Routledge.
Schutte, Ofelia. 1993. Cultural Identity and Social Liberation in Latin American Thought. Albany: SUNY Press.
—— 1998. ‘Cultural Alterity: Cross-Cultural Communication and Feminist Thought in North-South Dialogue.’ Hypatia 13: 53–72.
—— 2000. ‘Negotiating Latina Identities.’ In Hispanics/Latinos in the United States: Ethnicity, Race and Rights, edited by J. E. Gracia and P. De Grief. London: Routledge.
Scott, Joan W. 1992. ‘Experience.’ pp. 22–40 in Feminists Theorize the Political, edited by J. Butler and J. W. Scott. London: Routledge.
Scott, Joan W. 1994. ‘A Rejoinder to Thomas C. Holt.’ In Questions of Evidence: Proof, Practice and Persuasion across the Disciplines, edited by J. Chandler, A. I. Davidson, and H. Harootunian. Chicago: University of Chicago Press.
Shohat, Ella. 2001. ‘Area Studies, Transnationalism and the Feminist Production of Knowledge.’ Signs: Journal of Women in Culture and Society 26: 1269–1272.
Skeggs, Beverley. 1997. Formations of Class and Gender. London: Sage.
Smith, Dorothy. 1974. ‘Women’s Perspective as a Radical Critique of Sociology.’ Sociological Inquiry 4: 1–13.
Smith, Dorothy. 1987. The Everyday World as Problematic: A Feminist Sociology. Milton Keynes, UK: Open University Press.
Smith, Dorothy. 1989. ‘Sociological Theory: Methods of Writing Patriarchy.’ In Feminism and Sociological Theory, edited by R. A. Wallace. London: Sage.
Smith, Dorothy. 1999. Writing the Social: Critique, Theory and Investigations. Toronto: University of Toronto Press.
Song, Miriam and Ian Parker. 1995. ‘Commonality, Difference, and the Dynamics of Disclosure in In-depth Interviewing.’ Sociology 29: 241–256.
Spivak, Gayatri Chakravorty. 1993. Outside in the Teaching Machine. New York: Routledge.
Stacey, Judith. 1991. ‘Can There be a Feminist Ethnography?’ pp. 111–120 in Women’s Words: The Feminist Practice of Oral History, edited by S. B. Gluck and D. Patai. New York: Routledge.
Stanley, Liz. 1994. ‘The Knowing Because Experiencing Subject: Narratives, Lives, and Autobiography.’ pp. 132–149 in Knowing the Difference: Feminist Perspectives in Epistemology, edited by K. Lennon and M. Whitford. London: Routledge.
Stanley, Liz and Sue Wise. 1983. Breaking Out. London: Routledge and Kegan Paul.
Stanley, Liz and Sue Wise. 1990. Feminist Praxis: Research, Theory and Epistemology in Qualitative Research. London: Routledge.
Stanley, Liz and Sue Wise. 1993. Breaking Out Again. London: Routledge and Kegan Paul.
Stone, Sandy. 1991. ‘The Empire Strikes Back: A Post-Transsexual Manifesto.’ In Body Guards, edited by J. Epstein and K. Straub. New York: Routledge.
Tang, Ning. 2002. ‘Interviewer and Interviewee Relationships Between Women.’ Sociology 36: 703–721.
Taylor, Janette Y. 2002. ‘Talking Back: Research as an Act of Resistance and Healing for African American Women Survivors of Intimate Male Partner Violence.’ Women and Therapy 25: 145–160.
Taylor, Verta and Leila J. Rupp. 2005. ‘When the Girls are Men: Negotiating Gender and Sexual Dynamics in a Study of Drag Queens.’ Signs: Journal of Women in Culture and Society 30: 2115–2140.
Temple, Bogusia and Rosalind Edwards. 2002. ‘Interpreters/Translators and Cross-Language Research: Reflexivity and Border Crossings.’ International Journal of Qualitative Methods 1(2), Article 1. http://www.ualberta.ca/~ijqm/ Date of access: December 12, 2006.
Thapar-Bjorkert, Suruchi and Marsha Henry. 2004. ‘Reassessing the Research Relationship: Location, Position and Power in Fieldwork Accounts.’ International Journal of Social Research Methodology 7: 363–381.
Wahab, Stephanie. 2003. ‘Creating Knowledge Collaboratively with Female Sex Workers: Insights from a Qualitative, Feminist, and Participatory Study.’ Qualitative Inquiry 9: 625–642.
Warr, Deborah J. 2005. ‘“It Was Fun … But We Don’t Usually Talk About These Things”: Analyzing Sociable Interaction in Focus Groups.’ Qualitative Inquiry 11: 200–225.
Weedon, Chris. 1987. Feminist Practice and Poststructuralist Theory. Oxford: Blackwell Publishers.
Weeks, Kathi. 1998. Constituting Feminist Subjects. Ithaca and London: Cornell University Press.
Westmarland, Nicole. 2001. ‘The Quantitative/Qualitative Debate and Feminist Research: A Subjective View of Objectivity.’ Forum Qualitative Sozialforschung/Forum: Qualitative Social Research 2.
Wilkinson, Sue. 1999. ‘Focus Groups in Feminist Research: Power, Interaction and the Co-Construction of Meaning.’ Psychology of Women Quarterly 23: 221–244.
Wilkinson, Sue and Celia Kitzinger. 1996. Representing the Other: A Feminism and Psychology Reader. London: Sage.
Wolf, Margery. 1992. A Thrice Told Tale: Feminism, Postmodernism and Ethnographic Responsibility. Stanford: Stanford University Press.
Zavella, Patricia. 1993. ‘Feminist Insider Dilemmas: Constructing Ethnic Identity with ‘Chicana’ Informants.’ pp. 42–62 in Situated Lives: Gender and Culture in Everyday Life, edited by L. Lamphere, H. Ragone, and P. Zavella. London: Routledge.
20
Biographical Methods
Joanna Bornat
Had I been writing this chapter only a few years ago I would have had a much easier task. But now, in the first decade of the twenty-first century, containing developments in biographical methods in under eight thousand words borders on the impossible. What was an area of work scarcely acknowledged beyond groups of committed oral historians, occasional sociologists, auto/biographers and ethnographers has become a vast and constantly changing and expanding ferment of creative work, drawing in new as well as career-old researchers. In critical pedagogy, cultural studies, critical race theory, gerontology, decolonising research, social policy, health studies, feminisms, identity theory, studies of sexuality, employment, family and management theory, the range of areas in which biographical methods have been taken up is vast. All reach for meaning and accounts in individual biographies to both confirm and complicate understandings of the working and emergence of social processes and relationships in place and through time. And this is only within academe. Telling your story, the public confessional, the personal account has become a totally pervasive form, as any quick check through the media will show. Simply putting a term such as ‘life story’ into Google brings hundreds and thousands of hits. This is all good news, if difficult to assimilate.

Biographical methods thrive on invention and have changed and adapted to methodological, theoretical and technological change. The arrival of the small portable audio recording machine has undoubtedly played a leading role. Indeed it would be impossible to imagine much of what is now recognised as biographical work without it. Gone are the days when using a machine to record interviews was seen as a form of journalism, to be eschewed by sociologists and anthropologists in the field.¹ Now we have the capability to capture not only sounds but visual expression and to send the information round the world, or next door, in a matter of seconds.

In this chapter, I focus on ways in which individual life experience is generated, analysed and drawn on to explain the social world. However generated, the common denominator is that accounts are solicited and told in the first person. I focus on three very different approaches, briefly outlining each in turn, and finally look at some ways to distinguish each in a final, and unashamedly partisan, argument for the contribution of oral history. These are
the biographical interpretive methods, oral history and narrative analysis.

BIOGRAPHICAL METHODS

‘Biographical methods’ is an umbrella term for an assembly of loosely related, variously titled activities: narrative, life history, oral history, autobiography, biographical interpretive methods, storytelling, auto/biography, ethnography, reminiscence. These activities tend to operate in parallel, often not recognising each other’s existence, some characterised by disciplinary purity with others demonstrating deliberate interdisciplinarity. To explain and present such disparity feels like a demanding intellectual undertaking. History, psychology, sociology, social policy, anthropology, even literature and neurobiology at times, all have a part to play. By their very nature, biographical methods encourage a universalistic and encompassing approach, encouraging understanding and interpretation of experience across national, cultural and traditional boundaries, better to understand individual action and engagement in society. See, for example, Prue Chamberlayne and Annette King’s comparative study of family caring in East and West Germany and Britain drawing on biographical interview data (Chamberlayne & King, 2000), James Hammerton and Alistair Thomson’s life history interviews with UK migrants to Australia in the 1950s and 1960s (Hammerton & Thomson, 2005), and African-American women’s accounts of their professional lives in Gwendolyn Etter-Lewis’s study (1993).

The personal and individual nature of biographical data adds an additional layer of complexity. Biographical researchers work with a range of different types of data including diaries, notebooks, interactive websites, videos, weblogs and written personal narratives, with methods of collection varying from the directly interventionist in, for example, oral history interviewing, to a more detached encouragement and stimulation to write and record, as in the collection of accounts through an archive like Mass Observation or on-line interactive websites.

How best then to give shape and meaning to this task? How to organise and communicate a framework which is an aid to understanding and which provides a manageable and yet inclusive approach to presenting biographical methods? In sorting through the various activities I looked for themes which would bring out the strengths of biographical approaches while highlighting what are for me the most innovative and creative aspects of the contribution they make to social research methods. On that basis the themes I will be working with are: interactivity, subjectivity, structuring and context. I’ll explain briefly what I mean by each of these themes.

By interactivity I mean the generation of data through some kind of direct social interaction. This is likely to be an interview or at least a situation which involves, or has involved, face-to-face verbal exchange. This leads to the inclusion of biographical interpretive methods, oral history, reminiscence, storytelling, life history and narrative, but not autobiography, auto/biography or ethnography. By choosing subjectivity I am highlighting the extent to which the method leads to the expression of the self, a focus on feelings and emotions providing insight into individual perceptions and understandings of situations and experiences. All the activities I have identified could be included under this theme, though some, for example oral history, have at different times and in varying settings shown less attention to the self, while others, for example auto/biography, see the positioning of the self, as generator or reader of the text, as a main focus of attention (Stanley, 1994).

With structuring I intend to convey the idea that biographical methods aim to generate accounts or data which, either by means of direct questioning, or through the nature of individuals’ own responses, have an obvious or implicit structure. Again, this feels all-inclusive, as what account, either told or expressed, does not have some kind of narrative, a beginning or an ending? Or what story is not connected in some way
to the bigger picture, be it childbirth, war, schooling or sexuality? This may indeed be the case; however, by structuring, I mean the idea that the methods used rely on some kind of prior theorising or framework of ideas on the part of the researcher. This is not to rule out informal structuring or the kind of everyday theorising people develop in order to explain their lives but, for my purposes here, to emphasise the contribution which the theorising and methods of particular disciplines, such as psychology, sociology or history, make to the generation of the data. So, I would exclude storytelling and autobiography from this particular category.

Finally, context; by this I mean the ways in which an individual account, or set of accounts, is given meaning by its own framework of time and space and by those of the researcher and interpreter of the data. Context is not only to be seen in terms of setting or the historical time or social and political structures surrounding a particular account; it also includes the agency and agendas of researcher and researched, their biographical time. Autobiography and storytelling fit less well once again. Where the main source is the single-authored account generated independently for an audience, rather than with another, context has fewer dimensions for exploration.

The burgeoning of interest in the perspective of the individual, in what has been described as a more ‘humanistic’ approach in sociological research, has resulted in review articles and books which in their different ways have helpfully sketched out origins and developments in work with biography (Plummer, 2001; Thompson, 2000; Roberts, 2002; Seale et al., 2004; Thomson, 2007).

This is an exciting area in which to work. Biographical work engages with many of the most telling and enduring epistemological and methodological issues in the human sciences, taking in debates on validity, memory, subjectivity, standpoint, ethics, voice and representivity amongst others (Chamberlayne et al., 2000, p. 3).

The three methods I have chosen to concentrate on have shared antecedents in most respects, but with some individual differences which show the distinctiveness of each. In what follows I draw on several of the works cited above where these lineages and identities are drawn out. A familiar starting point is the group of sociologists known as the ‘Chicago School’ and their work in the first 40 years of the twentieth century. The focus on the collection of direct testimony and on observation under realistic conditions led to methodological innovation in a number of areas. Urban society came under scrutiny, with studies of poverty, street gangs, and high life. Alongside this strongly engaged and situated commitment came a new development in social psychology. George Herbert Mead’s idea of ‘the self’ (1934) stressed the significance of language, culture and non-verbal communication, with its focus on social interaction and reflection in the development of the individual’s sense of who they are. His notion of the self as having its own meaning and sense of reality, identifiable and recognisable in relation to social or historical context, provided a challenge to arguments which gave primacy to the investigator’s or commentator’s perspective. Students, teachers and researchers associated with the Chicago School were to generate some of the most influential developments in sociology; amongst these were symbolic interactionism (Plummer, 1991) and grounded theory (Glaser & Strauss, 1968).

It is with this background in mind that I now go on to take a closer look at the first of the three methods I identified under the biographical ‘umbrella’: the biographical interpretive method.

Biographical interpretive method

Fritz Schütze, a sociologist writing in Germany in the 1980s, is usually credited with the originating work which led to the development of the biographical interpretive method. He was greatly influenced by ‘third generation Chicagoans’ such as Anselm Strauss, Howard Becker, Erving Goffman and others (Apitzsch & Inowlocki, 2000, p. 58). The interview method and
its subsequent analysis which he developed and which has been further refined by Gabriele Rosenthal (2004), who followed his theoretical and methodological lead, requires the separating out of the chronological story from the experiences and meanings which interviewees provide. The process depends on an understanding of the biographical interview as a process in which movement between past, present and future is constant and in which the interviewee may not be fully aware of contexts and influences in their life.

Rosenthal and her erstwhile collaborator Wolfgang Fischer developed this approach into what is now usually known as ‘biographical interpretive analysis’ or ‘biographic narrative interpretive analysis’ (Wengraf, 2001). She had been interested in explaining work and life ethics in post-World War II West German society, being convinced that the sense which people made of their lives under the Third Reich played a central role (Rosenthal, 2004, p. 49). Since Rosenthal and Fischer’s early development, the method has been given much more elaborated treatment, using individual case study analysis, based on interview transcripts, by Prue Chamberlayne and colleagues. Their particular interest has been to theorise and explain the impact of social welfare policies through embracing the subjectivity and agency of welfare recipients, linking private and public spheres, as these are experienced, expressed and represented through individual accounts (Chamberlayne & King, 2000; Chamberlayne et al., 2000, 2004).

The systematisation inherent in this approach requires the elaborate codification of the interview in such a way as to identify themes, having separated out the ‘lived life’ from the ‘told story’ in the transcribed interview (Wengraf, 2001, p. 231). This distinction separates the chronological sequence of the events of a life from the way that the story is told. By identifying how someone relates to their story in the telling, labelling text segments as to whether they are descriptive, argumentative, reporting, narrative or evaluative, biographical interpretive analysis addresses the qualitative data with hypotheses which draw on significant segments of text. Wengraf (2001) details the procedure for interpreting biographical data, showing, with a detailed account, how hypotheses are arrived at and then worked through as the life story is explored. Life events, as told by the interviewee, are looked at and hypotheses and counter hypotheses drawn up and explored, preferably by groups of people working together, as to likely effects on someone’s later life.

This phenomenological approach to understanding biographical data focuses on the individual’s perspective within an observable and knowable historical and structural context, and what it is like to be the person describing their lives and the various decisions, turns and patterns of that life (Wengraf, 2001, pp. 305–6).

At one level what Wengraf is describing is a complex process of interpretation, a shared and carefully documented practice of searching for themes in data typical of a grounded theory approach (Wengraf, 2001, p. 280). However, at quite another level the analysis expects a deep level of explanation and interpretation, one which looks for hidden and explicit meanings in the transcript. Just how this differs from the other two approaches I’ve identified is something I will come back to later in this chapter.

Oral history’s distinctive characteristic is its use of sociological approaches to data generation and analysis in what is an historical pursuit. Even though the development of the interview as a tool of investigation has a much longer history, the significance of the Chicago School, as Paul Thompson points out in his seminal text, The Voice of the Past, was its effect on the idea of the life history (2000). The interview became more than simply extraction of information around specific topics; it became an object in itself with shape and totality given by the individual’s told life events.

In an early essay, the Italian oral historian Alessandro Portelli sets out ‘What makes oral history different’. Having identified oral history’s particular qualities as ‘the orality
of oral sources’, arguing for attention to the sounds and turns of speech as opposed to the written transcript, and as ‘narrative’, pointing out variations in narrative forms and styles, he goes on to argue for oral history’s unique qualities. These are, he suggests, ‘that it tells us less about events than about their meaning’ (his emphases) and that ‘the unique and precious element which oral sources possess in equal measure is the speaker’s subjectivity’ (1981, p. 67). From this, he argues that ‘oral sources’ have a ‘different credibility’ (p. 100, his emphasis) and that ‘today’s narrator is not the same person as took part in the distant events he or she is relating’ (p. 102). It follows, therefore, that ‘Oral sources are not objective’; they are ‘artificial, variable and partial’ (p. 103, his emphases).

Portelli’s position has been taken up subsequently in studies of ethnicity, class, gender, colonialism, tradition, displacement, resistance and exclusion by oral historians who see the method as particularly suited to understandings of oppression and marginalisation. With this unashamedly political and partisan approach to history, a contribution to the histories of elites was always going to be less likely, though there have been some exceptions, for example Courtney & Thompson’s study of business elites in the city of London (1997) and Seldon and Pappworth’s case studies of elites in their handbook of elite oral history (1983).

Oral history in its early and subsequent development drew on sociology for methods of structuring data collection. Writing and researching in the context of the sociology department at the University of Essex in the mid 1960s (Thompson & Bornat, 1994), Thompson was familiar with the development of grounded theory as a solution to sampling from a population of survivors (2000, p. 151). While some studies have rested on only a handful of interviewees, for example Alessandro Portelli’s investigation into local memory of a massacre of civilians by German troops occupying Tuscany in 1944 (Portelli, 1997), or Al Thomson’s use of four life histories in his exploration of the legend of Anzac solidarity amongst Australian World War I veterans (Thomson, 1994), oral historians more typically seek ways of representivity through theoretical sampling, with contacts made opportunistically or through snowballing (see for example Thompson, 1975; Bertaux, 1981; Lummis, 1987; Bornat, 2002; Hammerton & Thomson, 2005; Merridale, 2005). As for data analysis, a range of approaches, some more familiar to historians and some to sociologists, are typically followed by oral historians, who tend to take a more eclectic approach methodologically than researchers using the biographical interpretive method. In the main these would be recognisable as thematic in approach, drawing directly or indirectly on the type of constant comparative analysis and theme searching typical in grounded theory (Glaser & Strauss, 1968).

Given oral history’s early commitment to a form of history-making which seeks to give expression to marginalised voices, with emphasis on the importance of language, emotions and oral qualities generally, data analysis presents something of a moral challenge, as Thompson and others have pointed out (Borland, 1991; Portelli, 1997, pp. 64ff; Thompson, 2000, p. 269ff; Bornat & Diamond, 2007). The tension lies in a commitment to the presentation of the actual words of interviewees while seeking a way to generalise from a number of stories without creating too much distance between the original recording or text and the resulting publication, be it hard copy, electronic or sound and vision presentation.

Narrative analysis

The third area of biographical activity I have identified, narrative analysis, also traces its origins back to the Chicago School. The move towards the subject as author and source of evidence, through the telling of their story, became its defining feature in the 1920s. However, where those early sociologists of the city were intent on capturing reality from accounts, narrative theorists see the story as a greater sum
of parts than the particularities of events, atmospheres, environments and relationships described. Catherine Kohler Riessman, a leading narratologist, explains how narratives, interpreted through use of language, symbolic representations and cultural forms, provide access to understanding the workings across and within time of gender, class, culture, ethnicity, place and age, to name but a few social divisions and differences (1993, p. 5). This plurality does, however, mean that, as she also points out, ‘There is considerable disagreement about the precise definition of narrative’ (1993, p. 17).

A focus on story or narrative sees telling, relating and recounting as a central and universal human activity. Lives, it is argued, are constructed and presented to listeners in storied forms. As Widdershoven argues: ‘… a story is never a pure ideal, detached from real life. Life and story are not two separate phenomena. They are part of the same fabric, in that life informs and is formed by stories’ (Widdershoven, 2003, p. 109). For Polkinghorne, narrative has special significance for the human sciences. He argues that it is ‘… the linguistic form uniquely suited for displaying human existence as situated action’. This very generality presents problems of definition, he goes on to admit (1995, pp. 5–7).

Riessman’s solution to the problem of definition is to account for narratives in terms of genre. Narratives are to be recognised to the extent to which they relate to a ‘narrative genre’ with its own ‘persistence of certain elements’. She argues that the conventional idea of a story having characters acting in various ways and moving towards some kind of conclusion is not a sufficiently broad definition. Her narrative genre includes accounts where the same event is described repeatedly – ‘habitual narratives’ – or which are ‘topic-centred’, where particular kinds of events are linked through a common theme or shared characteristic. She also includes ‘hypothetical narratives’ of events which never happened. What is distinctive, she seems to be arguing, is that there is a ‘teller’, an account of ‘a situation’ and an audience: ‘us’ (Riessman, 1993, pp. 18–19).

When it comes to analysing narrative data, Riessman and others point out (Andrews et al., 2004) ‘… there is no one (her emphasis) method’ (1993, p. 5). Indeed the pervasiveness of narrative studies with use in, for example, medicine (Greenhalgh & Hurwitz, 1998), anthropology (Skultans, 1998), psychology (Sarbin, 1986; Crossley, 2000), media studies (Ryan, 2004), feminist studies (Personal Narratives Group, 1989), linguistics (Bamberg, 1997), organisation studies (Denning, 2005), history (Roberts, 2001), and literature (Hawthorn, 1985) suggests a plethora of possible analytical procedures.

As a way to manage this diversity, to pull it within range of some reliable analytical framework which others can respond to and which for her preserves and acknowledges the performative and interactive nature of the interview, Riessman advocates use of poetic and literary forms as analytical tools. These, she argues, enable her to identify how a narrative is put together and to see what are its particularities in terms of characteristics of speech and discourse (1993, pp. 50–51). Seeking to keep ‘the teller’ in the centre of her analysis is ‘starting from the inside’, looking for meanings shown in the way the words are presented, not ignoring issues of power which may determine what is said and how (Riessman, 1993, p. 61). The perspective of the interpreter, their particular theoretical stance and even their personal history, is bound to play a part. As for the oral historians, this presents a dilemma for her, but one which she feels can be resolved through a process of open reflection and questioning, as she puts it: ‘the comfort of a long tradition of interpretive and hermeneutic enquiry’ (1993, p. 61).

In these very brief sketches, I’ve identified what I see as the distinctive features of the biographical interpretive method, oral history and narrative method, focusing mainly on their antecedents and rather different approaches to the interpretation of personal accounts. To begin with, I used four themes
and on the basis of these selected the three

approaches I’ve just been outlining from amongst all those which come under the heading: ‘biographical’. The themes were: interactivity, subjectivity, structuring and context.

Before I go on to look at some differences between the three approaches, with the aid of these themes, I want to consider what are the innovative and creative contributions of the biographical interpretive method, oral history and narrative analysis to social research methods generally. In my view, each approach highlights the interview as an example of social interaction in ways that draw on ideas of reflexivity and with reference to the significance of difference; each foregrounds the subjectivity, expressed feelings and meanings of the respondent, interviewee or subject. Yet for each, the structuring of the dialogue through the disciplinary antecedents of the particular approach is methodologically relevant. Finally, context, remembered, observed, researched, told and immediate, plays a significant role in each of the three methods. All of them, part of the ‘biographical turn’ in social science, are in different ways positioned ‘… within the shifting boundaries between history and sociology … (and there) some of the most telling and stimulating debating issues have emerged’ (Chamberlayne et al., 2000, p. 3).

DRAWING OUT THE DIFFERENCES

In the last part of this chapter I will take the comparison further, emphasising what are, in my opinion, three specific areas of difference using examples from interviews. However, this time I won’t conceal my preference and standpoint. In identifying the interview as interrogative, emphasising the role of memory as a source for ‘pastness’ and by questioning levels of interpretive influence, I will argue that all these issues have been most effectively dealt with by oral historians. I will deal first with the interview as interrogative.

The interview as interrogative

To argue that the interview, the most typical source of biographical data, is interrogative may appear to be a statement of the obvious (Bornat, 1994). After all, an interview involves questioning and the soliciting of answers, most effectively between two people though occasionally more. Why emphasise its obvious interrogative qualities? My reason for doing so is to draw attention to the dialogic qualities of an interview, to the significance of the relationship which develops, and to emphasise the intentions and perspective of the interviewer.

The approach taken in biographical interpretation is to use an initial question, and then to stand back, as it were. Having posed that initial question, where interest in a particular topic is expressed, the interviewee in the biographical interpretive interview is then left to relate a life narrative, if possible without interruption. A second phase then follows in which questions are asked as a means of expanding on themes, clarifying points made or asking for more detail about aspects of the life portrayed in the narrative.

In the oral history interview, in contrast, questioning drives the dialogue along in a quite deliberate way. As Ken Plummer argues, oral history and life history interviews draw on ‘researched and solicited stories … (which) do not naturalistically occur in everyday life; rather they have to be seduced, coaxed and interrogated out of subjects’ (Plummer, 2001, p. 28). The questioning and answering builds on itself, so that the interviewer has the complex task of listening while questioning, holding at least two, sometimes more, foci of interest, as the interviewee pursues their own story, sometimes surprised at what they have remembered or have found themselves saying in response to a question or opportunity to reflect. While the topic of the oral history interview will have been clear initially, it is never possible to be certain how it will turn out as the dialogue develops.

I’ll illustrate this with an excerpt from an interview I carried out in the early 1990s with Pat Hanlon (1915–1998), a well-known
UK cyclist, when I interviewed her and four other women for an edited collection of writing on older women (Bornat, 1993). I invited her to tell me her life story, as a cyclist and businesswoman (unusually for the cycling world she ran her own shop). She began with an unbroken account of her early years as a cyclist, replete with technical terms related to cycle racing and bike parts. I was keen to guide her towards talking more about the social world of cycling and took this opportunity with a question about her first husband:

So was your first husband a cyclist as well?
Yes, he was a cyclist, yes. But he used to go out with another club. We didn’t go out with our club, because there wasn’t any women in that club. I used to go out with the Actonia CC … But I also belonged to the Clarion, which was a union all over the country, the Clarion were. Supposed to be Labour club, but I mean, I didn’t go to it because it was a Labour club. Because they used to threaten to throw me out all the time, because I used to – didn’t agree with what they said. You know, you’re supposed to be Labour, you know, and half of them were communists. They used to go preaching down on the Dorking, on the hills and things like that. And I thought, I mean, wasting my time down there, you know, with that lot! So I used to go out on my own then.
Were they strict then, about that?
They were very strict about whether you were Labour or not, yes. Because if the heads there found you talking about you were – I mean, I wasn’t anything really, but I used to annoy them, you know, when I said, I’m not Labour, I don’t want to be Labour and all this. And they used to get ever so annoyed. And they said, well, we’re going to get you chucked out, you know. I says, I don’t care, you know. But, er, they never did.
I suppose cycling was, it was quite a kind of what you might call a more working-class sort of leisure thing.
It was mostly, oh yes, mostly poor people. I mean, there was never a car on the road when you raced. Only the time-keeper was the only car. I mean if you looked for the car, that was the start of your race …
And they’d all be people who would be, what working all week, like you, and spending all their weekends –
Oh yes, there was, oh, it took years and years for wealthy people to start cycling. Their sons might cycle, and they used to come out in their big cars, you know, and watch their son racing. But that kind of thing didn’t happen for years and years.
Did you feel that it was a sort of – was that a part of the feel of it, do you think, that you were with people who were, you know, you were like a kind of group who were rather the same, or – ?
Well, there wasn’t very many wealthy people around in those days. If there were they were nothing to do with us. You know, they’d be in a different society. There was sort of two societies, wealthy people and poor people. Or moderately poor. But there was never all running into one like they do now these days.
Did it feel like that did it? That you were very separate somehow?
Well yes. Because they never did the things we did. You’d hear about them going to these dinners and things up the town, but it never, you didn’t even know them, half of them. It was a different world. I mean, if we went to a dinner, it was only the one year dinner, our club dinner, that was the only dinner we ever went to. And I hadn’t got any clothes to go out in. I had nothing, only cycle clothes, that was all I had. I worked in them, I did the housework in them. The milkman would knock the door and I was in my shorts, you know …

As she answered my question about her husband I realised that she was beginning to talk about social and political divisions in the cycling world. This was something that interested me very much. Leaving behind, for the moment, the events of her life story, I began on a series of questions which I hoped would lead her into talk about the class politics of cycling between the two world wars in the UK. As is obvious from the transcript, I used various strategies. In the end she comes back to talk about herself as a cyclist, positioning herself as a cyclist first, then as a woman. It seems that for her, class and politics were an irrelevance, or in the case of the socialist Clarion movement, a means to an end: more cycling.

If I had used no prompts I might not have heard this particular account of her life, and the social world of cycling might well not have appeared at all. Biographical purists might argue that I was guilty of distorting Pat’s story. In fact I would argue the opposite, that I was encouraging her to develop it and to reframe it through my interrogative dialogue. She would have told her story differently on another occasion, to another listener or interviewer. Undoubtedly I was bringing my particular ‘cultural habitus’
(Hammersley, 1997) to that interview with all that this entailed. In oral history the idea that somehow it might be possible to render oneself invisible or non-interfering is regarded as mythical and certainly not desirable (Portelli, 1997, chapter 1; Thompson, 2000, p. 227; Bornat, 2004).

I make this point to contrast with both biographical interpretive and narrative approaches. As I have already shown, the preferred approach in the biographical interpretive method is for a contained non-interventionist initial interview to be followed by questioning led by the interviewer. This separation of interviewer and interviewed, through the privileging of the interviewee’s account in the first interview and of the interviewer’s interests in the second, excludes the possibility of a responsive interaction with joint initiative taking on both sides. In a contrasting way, though narrative approaches vary in their attitude to the part played by questions, their focus on the structure of the account in order to draw out the individual’s perspective similarly gives little weight to the dialogic possibilities of the interview. Context is relevant, as Riessman emphasises: ‘The text is not autonomous of its context’ (1993, p. 21), and she rejects the model of a narrativist such as Labov who leaves out the interviewer-interviewee relationship in their analysis (cited in Riessman, 1993, p. 20). However, even in her hands, context, both historical and immediate, is presented more as a framework than as part of the data, and evidence of the interviewer’s presence is typically excised from the text being analysed.

Memory as a source for ‘pastness’

Elizabeth Tonkin, an anthropologist and oral historian, prefers the term ‘representations of pastness’ to ‘history’. She argues that though it is less elegant, it conveys more of a sense of movement between past and present as people speak and others listen (Tonkin, 1992, p. 2). The active role of memory in oral history making again distinguishes it from biographical interpretive and narrative research. However, while memory gives us access to experience before our own time, to experience which might otherwise be unreachable since it may not be recorded in documentary formats, it is not necessarily always accurate. For Portelli this is one of its very strengths. Confronted by old communists whose tales of the past were sometimes partial, even plainly false, he turns the tables in a celebration of oral history’s ability to reveal what really mattered to people, ‘… uncovering the contradiction between reality and desire’ (Portelli, 1991, p. 116).

Memory also plays a function in the present and is as much about future hopes and intentions as it is about telling stories, bearing witness or confessing to past involvements and actions. It draws on and engages with collective representations and can change according to audience, stimuli and time of life (Coleman et al., 1998; Rose, 2003; Draaisma, 2004). Indeed the reliance of oral history on older people’s memories means being aware of the psychological tasks facing older people towards the end of life (Bornat, 2001). ‘Pastness’ for older people therefore needs to be seen as a multidimensional remembering, but none the less valuable for that. I’ll take this point further with an excerpt from an interview carried out for Margot Jefferys’ research into the founders of geriatric medicine (Ogg et al., 1999; Jefferys, 2000). Dr Ronald Dent, one of Jefferys’ interviewees, was in his mid eighties at the time of his interview:

What do you think of the new developments in the National Health Service? Do you have any views about that?
Well, I’m a bit scared that a vulnerable group like the elderly sick might not benefit as much as they should. In fact I think they might be neglected a bit again. And that’s what frightens me. One wouldn’t like to feel that the work that all of us who had been in geriatric medicine, the work we’ve done to make it a good thing to do, might find, find that our work has been let down a little bit because hospitals are so quick, so busy doing routine ops – operations – which they get paid a lot for rather than looking after strokes and other problems of the elderly which take a lot longer and need more resources. One hopes it’s not like that.²
Some of Jefferys’ interviewees had worked since before the NHS and in its very early days. Medical care of older people had been much neglected and was a major challenge for the health service. At the end of their careers these doctors were looking back at success, medically and in policy terms. They had established a specialty and could point to a much better standard of care for older people, in hospital and in the community, than they had witnessed in the ex Poor Law hospitals at the start of their careers. However, they were being interviewed at a time of change for the health service. Many expressed concern at the introduction after 1979 of a market model and business methods into health care. To add another contextual layer, these doctors were now themselves old. Contemplating the possible end to what they had achieved had specific personal resonance for their own healthcare. ‘Pastness’ is thus represented through multiple time frames, in this interview as in other oral history interviews: remembered time; the time of the interview; the ‘time’ of the interviewee and of the interviewer; and our own time in looking back at these particular archived interviews (Bornat, 2005).

Memory as an individual and social practice, and a process with known and observable features and effects, is of central interest to oral historians in ways that it does not appear to be in biographical interpretive and narrative analysis. It enables a perspective which includes the effect of time and the influence of change and continuity while maintaining the agency of the individual as the central focus of interest.

Interpretive influence

The last of the three areas of difference I identify here is interpretive influence. By this I am drawing attention to the ways in which the three approaches I’ve been looking at position the interpreter of the data in relation to its originator, the interviewee. Oral history’s early commitment to a democratic purpose has led to some pointed debates about ownership and partnership (see for example Frisch, 1990). Some feminist oral historians have led the way in questioning assumptions as to any essential understanding or solidarity across the microphone, as I have argued elsewhere (Borland, 1991; Bornat & Diamond, 2007; see also Armitage & Gluck, 2002). The result for many oral historians is a practice which seeks to maintain the integrity of the original interview, and of the interviewee, by maintaining interpretive distance.

To identify the subjectivity of the interviewee, to put oneself in their place, to draw out understandings which are not necessarily articulated in the words of the transcript, are all recognisable and shared interpretive practices. To look and listen for silences, experiences or relationships which are unspoken or unexpressed, is acknowledged as appropriate and rewarding, but to go beyond this and to seek out unconscious motivations, or ways of thinking, is perhaps to be guilty of over-interpretation. The researcher, who may or may not be the original interviewer, has a duty to ask questions of the data, to theorise about it and about the people and experiences represented in it, and to become more deeply embedded in it, but this risks distancing the interviewee from their own words. I’ll use one final example to show where I feel that the line is drawn between oral history and biographical interpretive and narrative approaches.

I spent more than two hours with Pat Hanlon recording her life history. She gave me a detailed account of her progression as a cyclist to becoming one of the best wheel builders in the country, owning a shop and being married twice, once early in her life and then again much later, as she retired. What she didn’t tell me was that she had a son, from whom she was estranged. She didn’t tell me and I didn’t ask her. She only finally told me when I gave her the book chapter in which she appears, to check for accuracy and representation. She then let me know that it might be better to mention her son as otherwise her friends might think it was a little strange.
To be silent about such a defining experience as motherhood could be attributed to some deep personal flaw. I might turn to some psychological explanations for this apparent pathology on her part; I could look back through the transcript for clues as to her mind-set and evidence of suppression of maternal instincts, her predilection for wearing shorts perhaps, or an apparently obsessive interest in mileage. I could hypothesise as to her decision-making and her reflection on her life from the way she accounts for the events in her life. I could counterpose her lived life to her told life, drawing out inferences as to her motivations and tendencies as a mother and a woman. But in the end I find this to be a process of distancing and indeed of subjecting Pat to an over-interpreted reconstruction of her life. She may have actively chosen not to mention her son because to mention him would be upsetting. She may have decided to focus exclusively on her life as a cyclist; indeed she made few references to other aspects of her personal life, and only when prompted by me. She may have retold the narrative of her life for herself so that her son was given no role. She might also have felt, as a public person, that her private life would be of little interest to me. Least possible, she may simply have forgotten to mention her son. Whatever the reason, I can’t know, and though I could speculate and develop a theory relating to some developmental deficiency I can see no advantage in this. To carry out more interviews with older women cyclists might give me a better idea of Pat’s life in context. As it is, I have only her testimony to go on. Perhaps what I can draw out of this experience is a sense of inadequacy as an interviewer. For once my interrogative powers failed me.

But there is also another angle to interpretive influence and this is the question of ethics. How far is it ethical to subject another person’s life to interpretation if the process and outcome are likely to be unrecognisable to them? How acceptable is an interpretation in which there is no possibility of continuing dialogue and discussion, particularly where the data originated in an interview relationship? These are difficult questions to answer, complicated by new debates about the ethics of the secondary analysis of archived data (Bornat, 2005).

CONCLUSION

The three biographical methods I have discussed in this chapter each has a distinctive practice and, though they share origins in the Chicago School of Sociology, they have developed along rather different interdisciplinary lines. Where the biographical interpretive method lends itself to more psychoanalytic interpretations of motivation and meaning, narrative analysis leans more towards sociolinguistics, while oral history draws across both sociology and history. Each gives centrality to the individual account in attempting to explain the changing nature and persistence of social relations and social structures. While each makes use of the interview to generate data, only oral history continues to focus on the dynamics of the interview through the process of interpretation and discussion. I have admitted a partisan position in my relationship with oral history but that is not to ignore the contribution of the other two approaches. In looking for ways to pin down the process of interrogating the data they force us to pay attention to explaining our thinking and analytical procedures, highlighting the detail which a phenomenological approach demands. My only concern is that in doing so we risk an over-interpretation which, rather than emphasising the qualities of the original teller, eclipses them and puts the interpreter in a position of authority and control.

NOTES

1 Fieldwork training for some trainee sociologists in the 1960s involved making notes after the interview or observation. Taping was definitely frowned on as a poor substitute for skills in observation and recall (Graham Fennell, personal communication).
2 Margot Jefferys Interview number 306, deposited at the British Library Sound Archive.
REFERENCES

Andrews, M., Day Sclater, S., Squire, C. & Tamboukou, M. (2004) ‘Narrative research’, in C. Seale, G. Gobo, J. Gubrium eds, Qualitative Research Practice, London, Sage, pp. 109–124.
Apitzsch, U. & Inowlocki, L. (2000) ‘Biographical analysis: a “German” school?’, in P. Chamberlayne, J. Bornat & T. Wengraf eds, The Turn to Biographical Methods in Social Science, London, Routledge, pp. 53–70.
Armitage, S. H. & Berger Gluck, S. (2002) ‘Reflections on women’s oral history: an exchange’, in S. H. Armitage, P. Hart & K. Weathermon eds, Women’s Oral History: the Frontiers Reader, Lincoln, University of Nebraska Press, pp. 75–86.
Bamberg, M., ed. (1997) ‘Oral versions of personal experience – Three decades of narrative analysis: A special issue of the Journal of Narrative and Life History’, Mahwah, USA, Lawrence Erlbaum.
Bertaux, D. (1981) ‘Life stories in the baker’s trade’, in D. Bertaux ed., Biography and Society, London, Sage.
Borland, K. (1991) ‘“That’s not what I said”: Interpretive conflict in oral narrative research’, in S. B. Gluck & D. Patai eds, Women’s Words: the Feminist Practice of Oral History, London, Routledge, pp. 63–75.
Bornat, J. (1993) ‘Life Experience’, in Bernard, M. and Meade, K. eds, Women Come of Age: Perspectives on the Lives of Older Women, London, Edward Arnold, pp. 23–42.
Bornat, J. (1994) ‘Is oral history auto/biography?’ Auto/Biography 3.1/3.2, 17–30.
Bornat, J. (2001) ‘Reminiscence and oral history: Parallel universes or shared endeavour?’ Ageing and Society 219–241.
Bornat, J. (2002) ‘Doing life history research’, in A. Jamieson & C. Victor eds, Researching Ageing and Later Life, Buckingham, Open University Press, pp. 117–134.
Bornat, J. (2004) ‘Oral History’, in C. Seale, G. Gobo & J. Gubrium eds, Qualitative Research Practice, London, Sage, pp. 34–47.
Bornat, J. (2005) ‘Recycling the evidence: different approaches to the reanalysis of gerontological data [37 paragraphs]’. Forum Qualitative Sozialforschung/Forum: Qualitative Social Research [On-line Journal], 6(1), Art. 42. Available at: http://www.qualitative-research.net/fqs-texte/1-05/05-1-42-e.htm.
Bornat, J. & Diamond, H. (2007) ‘Women’s history and oral history: Developments and debates’, Women’s History Review 16(1), 19–39.
Chamberlayne, P., Bornat, J. & Apitzsch, U. eds (2004) Biographical Methods and Professional Practice, Bristol, The Policy Press.
Chamberlayne, P., Bornat, J. & Wengraf, T. eds (2000) The Turn to Biographical Methods in Social Science, London, Routledge.
Chamberlayne, P. & King, A. (2000) Cultures of Care: Biographies of Carers in Britain and the Two Germanies, Bristol, The Policy Press.
Coleman, P. G., Ivani-Chalian, C. and Robinson, M. (1998) ‘The story continues: persistence of life themes in old age’. Ageing and Society 18(4), 389–419.
Courtney, K. & Thompson, P. (1997) Changing Lives: the Changing Voices of British Finance, London, Methuen.
Crossley, M. L. (2000) Introducing Narrative Psychology: Self, Trauma, and the Construction of Meaning, Buckingham, Open University Press.
Denning, S. (2005) The Leader’s Guide to Storytelling: Mastering the Art and Discipline of Business Narrative, San Francisco, Jossey-Bass.
Draaisma, D. (2004) Why Life Speeds up as you Get Older: How Memory Shapes our Past, Cambridge, Cambridge University Press.
Etter-Lewis, G. (1993) My Soul is My Own: Oral Narratives of African American Women in the Professions, New York, Routledge.
Frisch, M. (1990) A Shared Authority: Essays on the Craft and Meaning of Oral and Public History, Albany, State University of New York Press.
Glaser, B. & Strauss, A. (1968) The Discovery of Grounded Theory, London, Weidenfeld & Nicholson.
Greenhalgh, T. & Hurwitz, B. (1998) Narrative Based Medicine: Dialogue and Discourse in Clinical Practice, London, BMJ Books.
Hammersley, M. (1997) ‘Qualitative data archiving: some reflections on its prospects and problems’, Sociology 31(1), 131–142.
Hammerton, A. J. & Thomson, A. (2005) Ten Pound Poms: Australia’s Invisible Migrants, Manchester, Manchester University Press.
Hawthorn, J. (1985) Narrative: From Memory to Motion Pictures, London, Edward Arnold.
Jefferys, M. (2000) ‘Recollections of the pioneers of the geriatric medicine specialty’, in Bornat, J., Perks, R., Thompson, P. & Walmsley, J. eds, Oral History, Health and Welfare, London, Routledge, pp. 75–97.
Lummis, T. (1987) Listening to History, London, Hutchinson.
Mead, G. H. (1934) Mind, Self and Society from the Standpoint of a Social Behaviorist, Chicago, University of Chicago Press.
Merridale, C. (2005) Ivan’s War: the Red Army 1939–45, London, Faber and Faber.
Ogg, J., Evans, G., Jefferys, M. & MacMahon, D. G. (1999) ‘Professional responses to the challenge of old age’, in Bernard, M. & Phillips, J. eds, The Social Policy of Old Age: Moving into the 21st Century, London, Centre for Policy on Ageing, pp. 112–127.
Personal Narratives Group (1989) Interpreting Women’s Lives: Feminist Theory and Personal Narratives, Bloomington, Indiana University Press.
Plummer, K. (1991) Symbolic Interactionism, vols 1&2, Aldershot, Edward Elgar.
Plummer, K. (2001) Documents of Life 2, London, Sage.
Polkinghorne, D. E. (1995) ‘Narrative configuration in qualitative analysis’, in Hatch, J. A. & Wisniewski, R. eds, Life History and Narrative, London, Falmer.
Portelli, A. (1981) ‘What makes oral history different’, History Workshop, 12, 96–107.
Portelli, A. (1991) The Death of Luigi Trastulli and Other Stories: Form and Meaning in Oral History, New York, State University of New York Press.
Portelli, A. (1997) ‘The massacre at Civitella val di Chiani (Tuscany, June 29, 1944): myth and politics, mourning and common sense’, in Portelli, A. ed., The Battle of Valle Giulia: Oral History and the Art of Dialogue, Madison, University of Wisconsin Press, pp. 140–160.
Riessman, C. K. (1993) Narrative Analysis, Newbury Park, Sage.
Roberts, B. (2002) Biographical Research, Buckingham, Open University Press.
Roberts, G. (2001) The History and Narrative Reader, London, Routledge.
Rose, S. (2003) The Making of Memory: from Molecules to Mind, London, Vintage, 2nd edition.
Rosenthal, G. (2004) ‘Biographical research’, in Seale, C., Gobo, G. & Gubrium, J. eds, Qualitative Research Practice, London, Sage, pp. 48–64.
Ryan, M.-L. ed. (2004) Narrative across Media: the Languages of Storytelling, Lincoln, USA, University of Nebraska Press.
Sarbin, T. R. (1986) ‘The Narrative as a Root Metaphor for Psychology’, in T. R. Sarbin ed., Narrative Psychology: The Storied Nature of Human Conduct, New York, Praeger, pp. 3–21.
Seale, C., Gobo, G., Gubrium, J. & Silverman, D. (2004) Qualitative Research Practice, London, Sage.
Seldon, A. & Pappworth, J. (1983) By Word of Mouth: Elite Oral History, London, Methuen.
Skultans, V. (1998) ‘Anthropology and narrative’, in Greenhalgh, T. & Hurwitz, B. eds, Narrative Based Medicine: Dialogue and Discourse in Clinical Practice, London, BMJ Books.
Stanley, L. (1994) ‘Sisters under the skin? Oral histories and auto/biographies’, Oral History, 22(2), 88–89.
Thompson, P. (1975) The Edwardians, London, Weidenfeld & Nicholson.
Thompson, P. (2000) The Voice of the Past, Oxford, Oxford University Press, 3rd edition.
Thompson, P. & Bornat, J. (1994) ‘Myths and memories of an English rising 1968 at Essex’, Oral History 22(2), 44–54.
Thomson, A. (1994) Anzac Memories: Living with the Legend, Melbourne, Oxford University Press.
Thomson, A. (2007) ‘Four paradigm transformations in oral history’, Oral History Review.
Tonkin, E. (1992) Narrating our Pasts: The Social Construction of Oral History, Cambridge, Cambridge University Press.
Wengraf, T. (2001) Qualitative Research Interviewing, London, Sage.
Widdershoven, G. A. M. (2003) ‘The story of life: Hermeneutic perspectives on the relationship between narrative and life history’, in R. Miller ed., Biographical Research Methods, Vol. IV, London, Sage, pp. 108–123.
21
Focus Groups
Janet Smithson
INTRODUCTION

This chapter sets out some of the main issues, both practical and theoretical, of using focus groups in social research, together with suggestions on how to use and analyse the groups most effectively. First the history and reasons for using focus groups in social research are considered, taking note of some of the different epistemological and theoretical positions underpinning focus group research. Then, design and procedure are considered, including sampling and selecting participants, the logistics of recording and managing the data, and ethical considerations. Third, the role of the moderator, including strategies for moderating focus groups, and acknowledgement of the impact of the moderator, is discussed. In the section entitled ‘Analysing focus group data’, some of the specific issues which arise in analysis of focus groups are highlighted, with particular reference to the importance of the group context. Finally, the section entitled ‘Using focus groups in specific contexts’ looks at the use of focus groups in specific contexts: within feminist research, organisational research, in cross-cultural and cross-national research, in action research, and the growing use of online focus groups.

THE HISTORY OF FOCUS GROUPS IN SOCIAL SCIENCE RESEARCH

Focus groups originated in sociology in the 1920s (Merton and Kendall 1946), but were primarily used by market researchers for several decades (Templeton 1987), before regaining popularity in the social sciences in the 1990s (Wilkinson 1998), as well as becoming widely used as a marketing and political tool for gathering ‘opinions’. They are increasingly being used as a research tool throughout the social sciences, as well as in a wide range of other academic fields – for example health studies, education, political science and geography.

Even though focus groups comprise face-to-face interaction of crucial interest to social scientists, and are increasingly being used as a research tool (Wilkinson 1998), there is a significant lack of literature on the analysis of the conversational processes and structures involved in them, although various researchers have called attention to this lack
(Kitzinger 1994, Agar and MacDonald 1995, Myers 1998, Wilkinson 1998), and there have been some recent considerations of interactive patterns within focus groups (e.g. Myers 1998, Kitzinger and Frith 1999, Puchta and Potter 1999). Wilkinson (1998) concludes that ‘there would seem to be considerable potential for developing new – and better – methods of analysing focus group data’ (1998: 197). The regularly occurring lack of theoretical and analytical discussions in the focus group literature, even in academic contexts, is perhaps partially explained by the roots of focus group usage as a market research tool. The perception that focus groups are a quick and useful way of gathering ‘opinions’ still informs mainstream debate on focus groups and focus group manuals, and affects how they are used – for example, they are often viewed as (only) suitable for the initial stages of a research project.

WHAT IS A ‘FOCUS GROUP’?

A focus group is generally understood to be a group of 6–12 participants, with an interviewer, or moderator, asking questions about a particular topic. Some researchers, such as Hughes and DuMont (1993: 776), characterise focus groups as group interviews: ‘Focus groups are in-depth group interviews employing relatively homogenous groups to provide information around topics specified by the researchers’. Others define them as group discussions: ‘a carefully planned discussion designed to obtain perceptions on a defined environment’ (Kreuger 1998: 88) or ‘an informal discussion among selected individuals about specific topics’ (Beck et al., 1986). These definitions show a tension between participant-researcher interaction and interaction between participants, with

set questions, but it has elements of both these forms of talk. The different definitions of focus groups, as well as the origins of focus group methodology in very varied contexts, demonstrate some of the variations within this methodology; even within the social research context, focus groups are used by researchers with very different theoretical and analytical backgrounds, and these have implications for the use and analysis of focus groups.

REASONS FOR USING FOCUS GROUPS IN SOCIAL RESEARCH

A growing literature on the reasons for using focus groups in the social sciences, together with practical advice on how to organise and run them, is now available, for example by Kitzinger (1995), Vaughn et al. (1996), Greenbaum (1998), Morgan and Kreuger (1998) and Bloor et al. (2000). One often-stated advantage of using focus groups lies in the fact that they permit researchers to observe a large amount of interaction on a specific topic in a short time. They are sometimes viewed as a quick and easy way to gather data. However, there are often problems with setting up and organising groups and obtaining the right number and mix of people for groups. In practice, groups tend to be based on availability rather than representativeness of sample. Moderating focus groups can be complex, and the data obtained can be difficult to transcribe and analyse (Pini 2002).

From a practical perspective, the feasibility of arranging focus groups needs to be considered. For example, if interviewing people who are geographically distant, or who have very little time, or who will be interviewed in a second language, then focus groups may prove impossible (though telephone
interactions between participants in the group and online focus group methods are being
being a particularly distinctive characteristic developed, see the section entitled ‘Using
of focus group methodology, although this is focus groups in specific contexts’). Focus
not always apparent from analysis of focus groups have been described as particularly
group data. The data obtained in this method useful at an early stage of research as a
is neither a ‘natural’ discussion of a relevant means of eliciting general viewpoints, which
topic, nor a constrained group interview with can be used to inform design of larger
studies (Vaughn et al., 1996). They are often used in conjunction with another method, such as individual interviews or survey questionnaires. While perceived convenience is a regularly cited reason for using focus groups, from a methodological perspective, the question should rather be whether focus groups will produce the best sort of data for the research question.

One of the perceived strengths of focus group methodology is the possibility for research participants to develop ideas collectively, bringing forward their own priorities and perspectives, ‘to create theory grounded in the actual experience and language of [the participants]’ (Du Bois 1983). Morgan (1988) views the hallmark of a focus group as ‘the explicit use of the group interaction to produce data and insight that would be less accessible without the interaction found in a group’ (Morgan 1988: 12). A central feature of focus groups is that they provide researchers with direct access to the language and concepts participants use to structure their experiences and to think and talk about a designated topic. ‘Within-group homogeneity prompts focus group participants to elaborate stories and themes that help researchers understand how participants structure and organize their social world’ (Hughes and DuMont 1993). Focus groups with children have been shown to be a very effective approach for collecting data in a setting which children feel comfortable with (Ronen et al., 2001).

DESIGN AND PROCEDURE

Sampling and selecting participants

In focus group methodology, the unit of analysis is taken to be the group (Morgan 1988, Kreuger 1998), and groups are typically homogenous – for example, students on a certain course, or a group with a similar medical condition. Participants are chosen to fit in with the group’s demographic. According to the prescriptions about focus group methodology in the literature, there should be relatively homogenous membership (Kreuger 1994, Ritchie and Lewis 2003). Guides to focus group research typically advocate having single sex groups, and several groups with members with comparable characteristics, to permit cross-group comparability. There are many other variables which may need to be taken into consideration, such as nationality, sexuality and ethnic background. Having people at similar life stages, or working in similar jobs, can be particularly relevant. However, heterogeneous groups can produce very interesting discussions. For example, mixed sex groups can challenge the typical male and female discourses on these topics (Smithson 2000). Recruitment of group members has been shown to affect the group dynamics; for example, Agar and MacDonald (1995) point out how the ways in which respondents are recruited come to condition the group talk.

Organisation and dynamics of focus groups

While the literature often (e.g. Vaughn et al. 1996) recommends focus groups of up to 12 participants, there are practical and methodological reasons why many focus groups are smaller. Practically, it can be difficult to get an exact number of participants to turn up to a focus group, especially if trying to get a specific sub-group, for example new parents working in specific jobs, or expectant mothers of a particular age. In larger groups, there is a likelihood that some participants will remain silent or speak very little, while smaller groups (say 4–8 participants) often provide an environment where all participants can play an active part in the discussion. Smaller groups often yield interesting and relevant data, giving more space for all participants to talk and to explore the various themes in detail (Brannen et al., 2002). Ritchie and Lewis (2003) suggest that if groups are smaller than four they can lose some of the qualities of being a group, while they see triads and dyads as an effective hybrid of in-depth interviews.
The practicalities of organising focus groups are covered in various guides, for example Vaughn et al. (1996) and Morgan and Kreuger (1998). Practicalities of setting up focus groups include considering the issues of how you are going to obtain a sufficiently large (but not too large) group of people at a specific place and time. Will childcare, travel expenses or remuneration be provided, and if not will this exclude certain groups of participants? Moreover, as with recruitment, the way in which the focus group is presented and conducted – whether refreshments are offered, whether the group is being paid to participate, the perceived formality of the occasion – will, as with all research methods, have an impact on the participants’ responses and interactions.

The focus group procedure is typically to follow a relatively unstructured interview guide, which generates a list of topics for discussion. The aim is to cover the topics set by the research agenda, but with some flexibility to allow related topics to emerge in this context. The focus group moderator (who may or may not be the researcher) guides the discussion, making sure that all topics are covered, and that all group members are given the chance to speak. Groups will ideally last from 1 to 2 hours. Just as with other forms of semi-structured interview, testing the guide on a pilot group is highly recommended. In social science research, focus groups are usually audio-recorded, video-recorded or both. This contrasts with market research where notes are made during the focus group by the moderator or a colleague.

Morgan (2002) makes a distinction between the more structured approach to focus groups which originated in market research, and a less structured approach which has emerged from social research using focus groups. In marketing research, moderators are usually being paid to find out some specific answers for a client, and there is therefore a need for the moderator to be active and visible in the group, performing for the satisfaction of a paying client. In this context, the moderator of a fairly structured focus group is likely to refocus off-topic discussions, and stick to a structured interview schedule. In this context, most interaction is likely to be between the moderator and the participants, and there is little discussion besides answering the set questions. In contrast, a less structured approach is typical in much social research, where the goal is more typically to understand the participants’ thinking; the moderator is primarily aiming to facilitate discussion rather than direct it, and participants are encouraged to talk to each other rather than just respond to the moderator’s questions. As Morgan (2002) points out, both of these focus group types can be used within social research, depending on the research topic and theoretical approach.

Agar and MacDonald (1995) argued that focus groups are usually too structured and not as useful as more in-depth qualitative ethnographic interviews. However, as described in this chapter, focus groups can be conducted in a less structured way, and have been found useful in postmodernist and feminist research, for example, as a way of uncovering discourses and narratives in a way which can feel less structured to participants. It is vital to remember in focus group research that the data obtained is different to the data which would emerge in a different research context, such as individual interviewing. This can be viewed as hearing different stories in the different research contexts, or as getting both public and private accounts.

Ethical considerations

A particular concern with using focus group methodology is the ethical issues involved in having more than one research participant at a time. This has two implications: first, people may be uncomfortable with talking about their concerns in a group context, whether with strangers or with people they know. Sometimes group members may not respond appropriately to other members’ disclosures. The moderator can try to move the discussion on or change the topic if group members appear uncomfortable with sensitive issues.
Second, the researcher cannot guarantee that all discussion in this context will remain totally confidential. A useful strategy is to start the focus group with a list of ‘dos and don’ts’, including asking participants to respect each other’s confidences and not repeat what was said in the group; however this cannot be enforced. The moderator can guarantee from a personal perspective that the things said in a focus group context will be kept anonymous and confidential, but cannot guarantee that co-participants will not discuss the group, which can be a problem, especially in an institutional setting, such as in a workplace, or health care setting.

When are focus groups not appropriate?

Certain topics are commonly understood to be unsuitable for the focus group context. In particular, topics which participants may view as personal or sensitive are often better left for other methods, for example individual interviews. These may include people’s personal experiences or life histories, their sexuality, and topics such as infertility or financial status. What is viewed as a private issue varies between different cultural groups (and also depends on age, gender and other contexts). In institutional contexts, such as workplaces, or schools, people may be particularly wary of presenting their views or talking about their personal experiences in front of colleagues, managers or peers. Focus groups may also be inappropriate when the aim of the research is to obtain in-depth personal narratives, for example of the experience of illness. The methodology may also be inappropriate for topics where people have strong or hostile views. However, in all these cases, much depends on the questions asked and the group dynamics.

There are perspectives which rarely come out in ‘mainstream’ groups, though these vary in different cultural contexts, and are affected by age, gender and background of the participants, as well as the setting and context of the focus group. Perspectives which rarely come out in focus groups, unless in specifically designed groups, include gay and lesbian views, and other non-standard family set-ups, and also ethnic minority and religious minority perspectives. Separate focus groups can cover some aspects of these perspectives, and for other aspects, more ‘private’ methods such as individual interviews may be more suitable. However, the limitations of what is discussed and what is omitted vary and it is possible to get unexpected and extremely interesting discussions about topics which are not always ‘recommended’ in focus group manuals. Groups may be happy to discuss sensitive topics such as sexual orientation and parenting in a general way, but not to give personal details about their own lives. Sensitive topics can be discussed in a general way in a focus group context, but with the emphasis on general discussion rather than individual experience.

THE ROLE AND IMPACT OF THE MODERATOR

In market research moderators tend to be specifically trained and employed to perform this task, while in the social sciences researchers often moderate the group themselves. Specific issues that the moderator is expected to deal with include handling disagreement and arguments in the groups, including all participants, noticing when participants are uncomfortable with a discussion and dealing with this appropriately, and ensuring that essential topics are covered in the time available. The moderator is expected to strike a balance between generating interest in and discussion about a particular topic and not pushing their own research agenda, which would end in confirming existing expectations (Vaughn et al., 1996, Sim 2002). The moderator should try to ensure that discussion is between participants rather than between participants and the moderator (Sim 2002).

In qualitative social science research, the role and subjectivity of the researcher is a vital part of the research context, and in this paradigm, the role and positioning of the
focus group moderator is understood to make a difference to the group dynamic, as well as to the data obtained. For example, when considering single sex groups, the sex of the moderator also needs to be taken into account. The moderator’s impact as a gendered and embodied being needs to be considered both in the set-up of the groups, and in the analysis. This is not unique to focus group research: surveys, questionnaires and individual interviews have all been shown to sometimes result in respondents giving accounts perceived as acceptable to the researcher (Bradburn and Sudman 1979, Bryman 1988). The problem may be exacerbated in focus group research by fear of peer group disapproval.

While focus group literature may sometimes give the impression that the ideal moderator is a neutral person with the ability to encourage the discussion, and pick up on participants’ responses and narratives, in practice the moderator can never be a neutral bystander, and should instead aim for reflexivity and awareness of the way their characteristics and behaviour may be influencing the group (Wilkinson and Kitzinger 1996, Stokoe and Smithson 2002). Moreover, it is possible for the moderator to make explicit use of their own experience as a way of encouraging the discussion; for example, a moderator with young children, or with experience of a specific life event or illness, may give examples from their own experience as a way of encouraging the group to discuss an issue.

Group dynamics and interaction

The role and impact of focus group participants on each other and on the perspectives which emerge have been relatively little studied. There is wide variation in focus group research in type and size of group, with corresponding effects on the group dynamics. For example, some groups consist of people who have worked together or know each other well; others are made up of complete strangers. While the literature stresses the importance of homogeneity in groups, there is little attention to how groups of strangers, friends or colleagues, respectively, affect each other’s contribution to the research.

Participants’ use of groups

Morgan (1996) highlights the need for focus group organisers to consider more carefully both the concerns and the priorities of the participants. In qualitative social research paradigms, research participants are understood to be active co-researchers or participants rather than passive subjects. An important question for focus group methodology is how do participants use the focus groups? Focus groups are not simply a means of eliciting knowledge from participants, but are often reported to be quite creative experiences for the participants themselves (Madriz 2000, Brannen 2004). People can use the context to become particularly reflective, exploring themselves and their relationships in tentative and thoughtful ways. Groups can become a space for participants to discover new things about their condition or organisation, or to make contact with other people with similar experiences.

ANALYSING FOCUS GROUP DATA

Even though focus groups comprise face-to-face interaction of crucial interest to social scientists, and are increasingly being used as a research tool (Wilkinson 1998), there was, until recently, a significant lack of literature on the analysis of the conversational processes and structures involved in them, although various researchers have called attention to this lack (Kitzinger 1994, Agar and MacDonald 1995, Myers 1998, Wilkinson 1998), and there have been some recent considerations of interactive patterns within focus groups (e.g. Myers 1998, Kitzinger and Frith 1999, Puchta and Potter 1999). Wilkinson (1998) concludes that ‘there would seem to be considerable potential for developing new – and better – methods of analysing focus group data’ (1998: 197).
Groups as the unit of analysis

As mentioned earlier, an important characteristic of focus group data is that groups, rather than individuals within groups, are usually viewed as the unit of analysis. However, the unit of analysis depends on the interpretative framework (and attendant underlying assumptions) that the researcher leans on. Wilkinson (1998) argues that many articles based on focus group research appear to be treating the data as identical to individual interview data, and the unique aspects of focus groups are habitually ignored in the analysis.

The many variables in setting up and conducting focus groups touched on earlier can make systematic analysis tricky. Sample populations in the focus groups are small and non-representative. Topics are not all discussed in equal depth in all groups. Some information is volunteered in some groups and not others, some individuals are more forthcoming than others, and the group interactions will determine the discussion. If a systematic analysis is needed for the research agenda, then there will be a fairly structured approach to the use of focus groups, as described earlier (cf. Morgan 2002), with a strict control on the number and mix of participants, a limited set of questions, and a more guided approach to moderation. The use of systematic coding, or content analysis, which has been historically popular in focus group research (Morgan 1988, Wilkinson 1998) tends to fit with the more structured approach to focus groups found in market research, and often reflects a more positivist epistemological stance.

In contrast, focus group researchers coming from a postmodernist research perspective place less (or no) emphasis on ‘systematic analysis’, as groups are viewed as producing locally situated accounts – ‘collective testimonies’ (Madriz 2000) – which are not necessarily directly comparable. From this approach, size of groups and the exact discussion of set topics may be less essential, and the research agenda may be better met by a fairly unstructured approach which permits a participant-led discussion.

As with all social research, the researcher needs to consider whether the status of the data (for example, realist or postmodernist approach) fits with the methodological approach, and with the analytical techniques employed, as well as fitting the research concerns. The variation in focus group methodologies and uses demonstrates that this methodology is not uniquely tied to one theoretical perspective; focus groups are popular with researchers from a wide range of epistemological positions, as well as across a range of disciplines, but the way they are used and analysed is likely to be very different.

Natural discussion or artificial performance?

The central feature of focus groups, as a site of social interaction, is rarely picked up on in focus group analysis, with some notable exceptions (for example Myers 1998, Puchta and Potter 1999). A key issue for researchers is the complex relation of focus group talk to everyday talk. Agar and MacDonald doubt the ‘lively conversation’ called for in the focus group handbooks – ‘in fact a judgement as to whether a conversation occurred, lively or not, is a delicate matter that calls for some close analysis of transcripts’ (1995: 78). Focus groups can be viewed as performances in which the participants jointly produce accounts about proposed topics in a socially organised situation. Participants and moderator are ‘operating under the shared assumption that the purpose of the discussion is to display opinions to the moderator’ (Myers 1998: 85). However, ‘natural’ discussion is also a performance (Goffman 1981); there is not a ‘simple opposition of the institutional and the everyday, the artificial and the real’ (Myers 1998: 107). Rather, ‘natural’ conversation and various forms of institutional talk, including classroom, courtroom, workplace and research-generated talk, are all part of a range of situations for talk (Drew and Heritage 1992). Silverman argues that ‘neither kind of data [artificial and naturally occurring] is intrinsically better than the other; everything depends on the method
of analysis’ (Silverman 1993: 106). Focus groups, then, should not be analysed as if they are naturally occurring discussions, but as discussions occurring in a specific, controlled context.

There have been numerous critiques of qualitative techniques which appear to offer an ‘authentic gaze’ into participants’ views or lives (Silverman 2000). Focus group researchers have typically extolled the group context as one which limits the role and impact of the moderator, thereby permitting a more ‘natural’ discussion to emerge. This view needs to be treated with caution; the group context does not obliterate the role of the moderator, or the research context of the talk.

Consensus and disagreement

The emergence of dissonant views and opinions between participants – what Kitzinger (1994) calls ‘argumentative interactions’ – is a distinctive feature of the focus group method and often makes an important contribution to the richness of the data obtained (Sim 2002). However, there are limitations to how disagreements are expressed in this peer group context. The group context of this methodology, while appropriate for uncovering group discourses and stories, is, meanwhile, likely to reproduce the socially accepted, normative discourse for that group. People with unpopular views, or less confident group members, may be reluctant to air their views in a group context. People are often (though not always – see below) reluctant to disagree openly with a stated view, especially in groups of strangers. It is important therefore not to assume consensus just because no one has disagreed openly (Sim 2002). If a divergence of views emerges, it is safe to assume that participants do hold different views; however if no divergence appears, this does not indicate consensus.

General questions can often elicit socially acceptable responses when it is likely that in fact the individuals in the group hold stronger views than this. The timing and stage of the focus group can also make a difference – a question asked in the first few minutes of the focus group may elicit a different response if asked later on when people are more comfortable with the group. Overall, a focus group is likely to elicit ‘public’ accounts (Smithson 2000, Sim 2002) in contrast to the private accounts which might emerge in individual interviews or in everyday interactions.

But detailed study of group data suggests the opposite can also happen and they can be a forum for contrasting opinions to emerge and develop (Smithson 2000, Pini 2002). There are various powerful counter-examples to the expected ‘rule’ that focus groups replicate the dominant discourse. Sometimes participants make gentle or overt challenges to the status quo, and there are particular strengths in the challenging of views by other participants, rather than by the moderator. Kitzinger (1994) shows how difference can be examined in the focus group context, and how the method can be used as a way of studying how differences are negotiated and understood.

One of the strengths of the method (Smithson 2000, Pini 2002) is the way focus group discussions often range between discussion of personal experiences, and collective experiences. Kitzinger and Farquhar (1999) contend that focus groups sometimes provide an opportunity for ‘sensitive’ topics to be raised, as there is the space for discussion and reflection and time to explore issues in a more in-depth way than might be the case in more routine dialogue. They argue that focus groups can be used to unpack the social construction of sensitive issues, uncover different layers of discourse, and illuminate group taboos and the routine silencing of certain views and experiences. Through attention to sensitive moments, researchers can identify unspoken assumptions and question the nature of everyday talk. Focus group talk, like everyday talk, can include many contradictions, norms, and both official and unofficial perspectives on a sensitive topic.

One of the claims made in favour of focus groups as a methodology is that they
can be a powerful method for minority groups or groups which are often ignored in other research methods to express their views and experiences (Wilkinson 1998, Smithson 2000). In these cases, the group perspective and concerns can dominate, rather than the interviewer’s pre-set agenda.

Silences and omissions in focus groups

All research methods have in-built omissions – things that a specific methodology is unlikely to pick up on. Inevitably, some participants speak freely in the groups and others remain silent, or need encouragement to speak. It is not necessarily a problem if some people remain silent. Silence is an ‘enduring feature of human interaction’, present in research communicative contexts as elsewhere (Poland and Pederson 1998: 308). Silences and pauses are issues both for focus group moderation, and for analysis. Silences after a specific question can be an indicator to the moderator that the group is not comfortable with talking about a particular issue (Myers 1998).

Emergent themes and discourses

While researchers construct the focus group schedules around their research topics, a particularly interesting feature of focus group methodology is the way in which groups take up these discourses or themes in ways unanticipated by the researchers. It is also common for groups to introduce new themes unanticipated in the research design. Literature on analysing focus groups stresses the key issue that the analytic focus is not on what individuals say in a group context but on the discourses which are constructed within this group context (e.g. Wilkinson 1998, Smithson 2000, Sim 2002). For this reason, analytical approaches which explicitly consider interactive effects and group dynamics are particularly appropriate (Myers 1998, Puchta and Potter 1999, 2002, Stokoe and Smithson 2002). These approaches all focus on how discourses, or themes, are constructed jointly by participants in a group context. Conversational interaction is viewed as the prime locus for the development, or co-construction (Jacoby and Ochs 1995), of sense-making. Disagreements, challenges and resistances are seen as important parts of the construction of collective opinions. From this perspective, social realities and identities are understood to be socially constructed, fluid and context-dependent, so focus groups are a particularly appropriate method. For example, Munday (2006) has argued that the use of focus groups provides a method particularly suited to researching the construction of collective identity. Puchta and Potter (1999) consider the contradiction in focus group methodology between the requirement that the talk should be both highly focused on predefined topics and issues, and at the same time spontaneous and conversational.

USING FOCUS GROUPS IN SPECIFIC CONTEXTS

Using focus groups in cross-cultural and cross-national research

Focus groups are being increasingly suggested as a good method for understanding cultural variations and differences. The involvement of minority community groups through focus groups has been shown to be a powerful tool in developing culturally appropriate methods (Hughes and DuMont 1993, Pollack 2003, Willgerodt 2003), and in including culturally diverse perspectives in research. There are issues in the running of focus groups in different cultural contexts – in some cultures dissent is not expressed in public, some cultures have more subjects which are not discussed in public, and in some cultures variables such as gender will be a bigger concern. There are topics which tend not to work well in the focus group context, though these vary greatly in different contexts and cultures.

As with any cross-national research, there are issues of translation of research tools and data between languages. With qualitative research methodologies cross-national
research also needs to take note of cultural differences in emotional tone, feelings and reflexivity, which are particularly noticeable in focus group research. In some cultures it is not usual to directly disagree in a group situation, or to overtly criticise authority. Ways of interacting are of course cultural as well as responses to a particular method and the result of particular factors such as gender and status. For example, in a cross-European study of new parents’ orientations to work, focus groups in Sweden were described by the national research team as ‘consensual’, with turn taking easily managed. In the same cross-national study, focus groups in the UK were notable for high levels of criticism and outspokenness, while in the Bulgarian focus groups in the same study there was little cross talking or butting in (Brannen 2004).

Using focus groups in feminist research

Focus groups have been widely used in recent feminist research, and feminist social scientists have elaborated on the ways in which the methodology can be used to further feminist aims of giving various minority groups a voice through the research process. For example, Madriz (2000) starts an account of feminist focus group research with a quotation from a Dominican woman telling how she prefers the focus group context as she finds it less intimidating than being alone with an interviewer. Focus groups have been taken up as an appropriate method by both postmodernist and feminist standpoint researchers (Wilkinson 1998, Madriz 2000, Olesen 2000). They are seen as a way of lessening the impact of the researcher and permitting minoritised groups to develop and elaborate

directive and perhaps less intimidating than traditional research methods, there is wide variation in this, as described elsewhere in this chapter. The moderator is still exerting a strong influence over the group, and still retaining a high degree of control, typically, over the recruitment, procedure and subsequent analysis and reporting of the group. Using focus groups does not in itself make the research ‘collectivist’, or empower participants. A postmodernist feminist approach which views accounts gathered in a research process as stories, or narratives, can be well suited to focus group methodology, but the questions of how to represent these stories, which questions to ask and which replies to prioritise in analysis, and how to interpret or analyse these stories, are as pertinent for focus group research as for other feminist qualitative methodologies. A priority for feminist focus group researchers is how to make participants’ voices heard without being exploited or distorted, and taking account of ‘unrealised agendas’ of class, race and sexuality (Olesen 2000). Focus groups are not a ‘solution’ for highlighting the views of oppressed or minority groups, but can, used sensitively, help to facilitate listening to these narratives.

Ethnographic research and focus groups

Ethnographic researchers have made use of small group discussions for many years, although rarely using the term ‘focus groups’. Focus group methodology can fit neatly with certain streams of ethnographic thought, which place the research encounter in a wider social context, and emphasise the social and processual nature of experiences (Tedlock 2000). As with feminist research,
focus groups have been viewed within
their own perspective on a research topic, in a
ethnography as a way of emphasising the
‘safe’ environment. Madriz argues that ‘the
collective nature of experience, and the social
focus group is a collectivist rather than an
context of accounts.
individualistic research method that focuses
on the multivocality of participants’ attitudes,
Focus groups in organisational
experiences and beliefs’ (Madriz 2000: 836).
research
However, other feminist researchers are
more cautious about the use of focus groups. Conducting focus groups in an organisational
In practice, while focus groups can be less context has particular implications. While it
can be an advantage having people from the same departments and work teams, who have shared experiences and are often comfortable talking together, there can be problems with how freely people feel they can express themselves in a workplace situation. Shared workplace experiences such as restructuring, management experiences, enthusiasm or resistance to work-life initiatives, can encourage feelings of solidarity among team members. Groups can share common knowledge about relevant issues in the company even when the people were strangers. For example, in a study of new parents in organisations (das Dores Guerreiro 2004), everyone had a strong view about the change from formal to informal flexi-time, and there had clearly been a great deal of discussion over the past months about it which was continued in lively focus group discussion.

Possible drawbacks of using focus groups in organisational settings include people feeling unable to speak out in front of superiors or people from different parts of the organisation. It is generally not recommended to place managers and employees in the same group, although this will vary with the nature of the organisation. Privacy and ethical issues are of particular importance in an organisational context, where people are encouraged to talk freely in front of colleagues.

Online focus groups

The use of online interviewing, including group interviewing, is being increasingly taken up in social science research. Online focus group research methods are part of this rapid expansion of online methodologies (e.g. Murray 1997, Chappell 2003). There are various reasons for this. It can be a good way of including in research hard-to-reach groups. An online focus group method can bring together geographically distant participants in one, online forum. It can also be used to bring together people with disabilities or illnesses who would not otherwise find it easy to participate in research, especially in group contexts. For example, Kralik et al. (2006) brought together people to explore experiences of chronic illness. It is also a potentially useful way of talking in a group context about sensitive or embarrassing issues, in a relatively anonymous context. Other reasons for the growing popularity of online focus group methods include cost savings, and attracting people who would otherwise have little time to participate (Edmunds 1999).

There are two main discussion options available when running an online focus group: synchronous and asynchronous (Chappell 2003). Synchronous discussions occur in ‘real time’ with the moderator and participants all logged onto a discussion at the same time, posting their comments on a joint board. While this is a close simulation of a face-to-face focus group, one of the advantages of an online method (the ability to participate at one’s own convenience) is no longer available. Additional drawbacks of this method are that the conversation can become hard to follow and participants tend to answer questions with short, ‘I agree’-type responses because they feel pressured to answer quickly. This can also pose problems for the moderator. It can become difficult to keep track of the conversations and responses of group members, as there is often more than one track of conversations running simultaneously (Montoya-Weiss et al., 1998).

The other main online focus group option is asynchronous discussions, which do not occur in real time. Messages are posted in response to the moderator and the group members at the participants’ convenience. Participants do not have to be logged on at the same time and can participate at any point during the day or night.

Edmunds (1999) points out that online groups can lead to greater anonymity for participants, which can lead to greater openness. The downside of this, and a particular issue for online groups, is the possibility of ‘fake’ participants: people joining in with false personas or providing false information (a regular problem on internet chat rooms, for example). While online methods might seem to be particularly susceptible to this sort of misinformation, it is useful to remember
that in ‘real’ focus groups, as with other forms of research, the participant is an actor constructing a performance (Goffman 1981). Newhagen and Rafaeli (1996) pointed out that using electronic media affected how people communicated. While it is important to be aware of the ways in which different media affect people’s communication patterns, this is an issue for all qualitative social research, and all focus group situations, not just for online groups.

There are ways of regulating participation to limit possible misuse, for example making contact individually with the focus group participants before the online group occurs. There is a growing literature on chat room behaviour and discourses, and the use of online methods in social science, which is particularly relevant when considering the use and analysis of online focus groups (Rezabek 2000).

CONCLUSIONS

The diverse nature of focus group research reflects the origins of focus groups, first in social science research before being taken up mainly by market researchers for several decades, and more recently becoming increasingly popular in various social research fields. The method is used by researchers from very varied epistemological and theoretical research traditions, which is reflected in the variations of approaches, and specifically the techniques and approaches to analysing the talk produced in this context.

There are conceptual, methodological and ethical issues in focus group research. As with other qualitative research methods, there are opportunities for consciously or unconsciously manipulating the participants’ responses, and it is perhaps a feature of focus group methodology, with its seeming emphasis on ‘natural discussion’ and ‘collective accounts’, for there to be relatively little explicit awareness of the constructed nature of the discussion, and the salience of the moderator and research agenda throughout the process. The ‘collective stories’ which are produced in this way perhaps mitigate the awareness that the interactions occurring in this formalised research setting will differ in many ways from interactions in other contexts. As well as differing from individual interview data, focus group talk will also be substantially different from ‘natural’ conversation.

Focus groups have specific dilemmas, both ethical and procedural, such as respect for individuals’ privacy, and the difficulties of dealing with inappropriate group behaviour (for example, insensitive comments or reactions to another participant’s contribution), as well as the more ubiquitous dilemmas of qualitative research concerning respect for participants’ voices, and concerns about misrepresenting the experiences and discussions of vulnerable groups.

The focus group method does have particular strengths. It enables research participants to discuss and develop ideas collectively, and articulate their ideas in their own terms, bringing forward their own priorities and perspectives. Not only can a wide variety of opinions be given and considered, but also a wide variety of interactive techniques can be observed. Participants engage in a range of argumentative behaviours, which results in a depth of dialogue not often found in individual interviews. Moreover, some of these limitations can also be viewed as possibilities for the method. Myers suggests that ‘the constraints on talk do not invalidate focus group findings; in fact, it is these constraints that make them practicable and interpretable’ (Myers 1998, p. 107). Focus groups permit some insights into rhetorical processes, or contemporary discourses. Another plus is that participants often report that joining in a focus group has been an enjoyable and creative experience (Wilkinson 1998, Madriz 2000, Smithson 2000, Pini 2002).

The effects of group dynamics in the focus groups can therefore be of benefit in social research for exploring issues from the perspective of the participants, in a way that is culturally sensitive to participants’ priorities and experiences. While there are some limitations of focus group research,
these can be partially overcome by awareness of the constraints, by informed analysis, and by detailed consideration of the way the conversations are socially constructed in the group context, and are narratives produced jointly by the co-participants and also by the moderator.

REFERENCES

Agar, M. and MacDonald, J. (1995) Focus groups and ethnography. Human Organization 54: 78–86.
Beck, L. C., Trombetta, W. L. and Share, S. (1986) Using focus group sessions before decisions are made. North Carolina Medical Journal 47(2): 73–4.
Bloor, M., Frankland, J., Thomas, M. and Robson, K. (2000) Focus groups in social research. London: Sage.
Bradburn, N. M. and Sudman, S. (1979) Improving interview method and questionnaire design. San Francisco: Jossey-Bass.
Brannen, J. (2004) Methodological issues in the consolidated case studies. Research Report #5 for the EU Framework 5 funded study ‘Gender, parenthood and the changing European workplace’. Printed by the Manchester Metropolitan University: Research Institute for Health and Social Change.
Brannen, J., Lewis, S., Nilsen, A. and Smithson, J. (eds) (2002) Young Europeans, work and family: Futures in transition. London: Routledge.
Bryman, A. (1988) Quantity and quality in social research. London: Unwin Hyman.
Chappell, D. (2003) A procedural manual for the online work-family focus group. Centre for Families, Work and Well-being, Guelph, Canada.
Das Dores Guerreiro, M. (2004) Case studies report. Research Report #3 for the EU Framework 5 funded study ‘Gender, parenthood and the changing European workplace’. ISBN 1-900139-46-4. Printed by the Manchester Metropolitan University: Research Institute for Health and Social Change.
Drew, P. and Heritage, J. (eds) (1992) Talk at work. Cambridge: Cambridge University Press.
Du Bois, B. (1983) Passionate scholarship: Notes on values, knowing and method in feminist social science. In G. Bowles and R. D. Klein (eds) Theories of women’s studies. London: Routledge.
Edmunds, H. (1999) The focus group research handbook. Lincolnwood, IL: NTC Business Books/Contemporary Publishing.
Goffman, E. (1981) Forms of talk. Oxford: Blackwell.
Greenbaum, T. (1998) The handbook for focus group research. London: Sage.
Hughes, D. and DuMont, K. (1993) Using focus groups to facilitate culturally anchored research. American Journal of Community Psychology 21(6): 775–806.
Jacoby, S. and Ochs, E. (1995) Co-construction: An introduction. Research on Language and Social Interaction 28(3): 171–183.
Kitzinger, J. (1994) The methodology of focus groups: The importance of interaction between research participants. Sociology of Health and Illness 16(1): 103–121.
Kitzinger, J. (1995) Introducing focus groups. British Medical Journal 311: 299–302.
Kitzinger, J. and Farquhar, C. (1999) The analytical potential of ‘sensitive moments’ in focus group discussions. In R. S. Barbour and J. Kitzinger (eds) Developing focus group research: Politics, theory and practice. London: Sage.
Kitzinger, C. and Frith, H. (1999) Just say no? The use of conversation analysis in developing a feminist perspective on sexual refusal. Discourse and Society 10(3): 293–316.
Kralik, D., Price, K., Warren, J. and Koch, T. (2006) Issues in data generation using email group conversations for nursing research. Journal of Advanced Nursing 53(2): 213–220.
Krueger, R. A. (1994) Focus groups: A practical guide for applied research, 2nd edition. Newbury Park: Sage.
Krueger, R. A. (1998) Analyzing and reporting focus group results. Focus group kit, Volume 6. California: Sage.
Madriz, E. (2000) Focus groups in feminist research. In N. K. Denzin and Y. S. Lincoln (eds) Handbook of qualitative research. California: Sage.
Merton, R. K. and Kendall, P. L. (1946) The focused interview. American Journal of Sociology 51: 541–557.
Montoya-Weiss, M. M., Massey, A. P. and Clapper, D. L. (1998) On-line focus groups: Conceptual issues and a research tool. European Journal of Marketing 32: 713–723.
Morgan, D. L. (1988) Focus groups as qualitative research. Newbury Park, CA: Sage.
Morgan, D. L. (2002) Focus group interviewing. In J. F. Gubrium and J. A. Holstein (eds) Handbook of interviewing research. Context and method. Thousand Oaks, California: Sage.
Morgan, D. L. and Krueger, R. A. (1998) The focus group kit. California: Sage.
Munday, J. (2006) Identity in focus: The use of focus groups to study the construction of collective identity. Sociology 40(1): 89–105.
Murray, P. J. (1997) Using virtual focus groups in qualitative research. Qualitative Health Research 7(4): 542–554.
Myers, G. (1998) Displaying opinions: topics and disagreement in focus groups. Language in Society 27: 85–111.
Newhagen, J. E. and Rafaeli, S. (1996) Why communication researchers should study the internet: a dialogue. Journal of Communication 46(1): 4–13.
Olesen, V. L. (2000) In N. K. Denzin and Y. S. Lincoln (eds) Handbook of qualitative research. California: Sage.
Pini, B. (2002) Focus groups, feminist research and farm women: opportunities for empowerment in rural social research. Journal of Rural Studies 18(3): 339–351.
Poland, B. and Pederson, A. (1998) Reading between the lines: interpreting silences in qualitative research. Qualitative Inquiry 4(2): 293–312.
Pollack, S. (2003) Focus-group methodology in research with incarcerated women: race, power, and collective experience. Affilia 18(4): 461–472.
Puchta, C. and Potter, J. (1999) Asking elaborate questions: focus groups and the management of spontaneity. Journal of Sociolinguistics 3: 314–335.
Puchta, C. and Potter, J. (2002) Manufacturing individual opinions: market research focus groups and the discursive psychology of attitudes. British Journal of Social Psychology 41: 345–363.
Rezabek, R. (January, 2000) Online focus groups: electronic discussions for research. Forum for Qualitative Social Research [On-line Journal], 1(1). Available at: http://qualitative-research.net/fqs [2007, 08, 08].
Ritchie, J. and Lewis, J. (eds) (2003) Qualitative research practice: a guide for social science students and researchers. Thousand Oaks, California: Sage.
Ronen, G. M., Rosenbaum, P., Law, M. and Streiner, D. L. (2001) Health-related quality of life in childhood disorders: a modified focus group technique to involve children. Quality of Life Research 10(1): 71–79.
Silverman, D. (1993) Interpreting qualitative data: methods for analysing talk, text and interaction. London: Sage.
Silverman, D. (2000) Analyzing talk and text. In N. K. Denzin and Y. S. Lincoln (eds) Handbook of qualitative research. California: Sage.
Sim, J. (2002) Collecting and analysing qualitative data: issues raised by the focus group. Journal of Advanced Nursing 28(2): 345–352.
Smithson, J. (2000) Using and analysing focus groups: limitations and possibilities. International Journal of Methodology: Theory and Practice 3(2): 103–119.
Stokoe, E. H. and Smithson, J. (2002) Gender and sexuality in talk-in-interaction: considering conversation analytic perspectives. In P. McIlvenny (ed.) Talking gender and sexuality. Amsterdam: John Benjamins.
Tedlock, B. (2000) Ethnography and ethnographic representation. In N. K. Denzin and Y. S. Lincoln (eds) Handbook of qualitative research. California: Sage.
Templeton, J. F. (1987) A guide for marketing and advertising professionals. Chicago: Probus.
Vaughn, S., Shay Schumm, J. and Sinagub, J. (1996) Focus group interviews in education and psychology. California: Sage.
Wilkinson, S. (1998) Focus group methodology: a review. International Journal of Social Research Methodology, Theory and Practice 1(3): 181–204.
Wilkinson, S. and Kitzinger, C. (1996) Representing the other. London: Sage.
Willgerodt, M. A. (2003) Using focus groups to develop culturally relevant instruments. Western Journal of Nursing Research 25(7): 798–814.
PART IV
Types of Analysis and Interpretation of Evidence
This section inevitably only covers some of the many analytic strategies available. It covers a number of types of analysis available in relation to quantitative and qualitative data and issues that the researcher will encounter. It also has a number of chapters that focus on the analysis of data derived via different methods.

ANALYSIS OF PRIMARY QUANTITATIVE DATA

Three chapters focus on quantitative data: one on the analysis of change; a second on the analysis of latent variables (variables that cannot be measured directly); and a third on the biases that are introduced into analysis when there are no comparison groups or control groups, as in evaluation research.

Analysing change is difficult. Only in the past 35 years have approaches to statistical measures of change been developed. Chapter 22 by Graham, Singer and Willett provides an introduction to one approach to the analysis of quantitative longitudinal data. The chapter goes into enough depth to provide a basic understanding of longitudinal modelling but does not become so technical that it is difficult for a person who is not familiar with the terminology and concepts to follow. One of the problems for the student in this field is the surfeit of terms for similar approaches: individual growth modelling, random coefficient modelling, multilevel modelling, mixed modelling, and hierarchical linear modelling, together with the range of statistical packages that can be used. The term the authors use is multilevel modelling. This approach has several advantages: it can deal with any number of time points; each wave of data can be collected on a different time schedule; and no data need be discarded because of missing observations. The approach can be applied to linear, non-linear and discontinuous trends. The analysis can include both time-invariant predictors such as gender and race and ones that do change with time such as attitudes. Moreover, these predictors can be fixed or randomly varied across persons.

In Chapter 23, Hoyle addresses the analysis of complex quantitative data, focusing on latent variable modelling, which examines the presence or influence of constructs that cannot be measured directly. The chapter discusses the use of linear structural equation modelling (SEM) to evaluate social models, an approach that has many uses in the social sciences: in particular the evaluation of measurement models, mediated effects, moderator effects and longitudinal data using
several approaches including latent growth curve models. In all these cases a predicted model is compared to actually observed data to determine if the predicted model is a good fit with the data. The predicted model describes the relationships among constructs and can be regarded as a hypothesis of the mechanisms that produced the data. The use of latent variables is especially useful in decreasing the number of variables that need to be tested and in increasing the reliability of measurement. SEM’s measurement component is used to test the relationships between latent variables and their indicators. The structural component is concerned with the directional relationships among the latent variables. While the latter appears to be causal, because the path model specifies direction, Hoyle is quick to point out that unless the data are longitudinal then causal conclusions cannot be made. The measurement component can be used to test if the model is consistent across time or samples, which would indicate measurement invariance. He notes that although this is a very valuable function of SEM it is rarely used that way. Hoyle sets out six limitations of SEM, including the requirement of a minimum sample size of 400 in order to obtain stable estimates, but he also predicts that SEM’s use will grow because of its many advantages compared to other techniques.

In Chapter 24 West and Thoemmes address the issue of having appropriate control or comparison groups in research using an intervention or a programme evaluation, especially when the question being addressed is the effectiveness of an intervention. These techniques, even though they have important limitations, provide a safety net for experimental social research. The authors provide valuable advice for research where the design is intended to have non-equivalent groups or where there is a failure of random assignment, as well as research that sets out to have random assignment. They discuss several techniques that can be used in an attempt to deal with groups that are not equivalent at the start of the study. However, even when the design is labelled as random assignment, the implementation of the design may result in obtaining non-comparable groups. Unfortunately, West and Thoemmes conclude from their literature review that it is not clear whether the bias introduced by having non-equivalent groups will make the comparison between the two groups appear smaller or larger. The chapter also deals with the following issues: the importance of lack of bias in the assignment; the importance of delivering the intervention to everyone in the treatment group; issues of attrition; and questions concerning the information given to intervention and non-intervention groups.

ANALYSIS OF PRIMARY QUALITATIVE DATA

Five chapters focus on the analysis of primary qualitative data. Three chapters are devoted to the analysis of talk: a chapter on discourse analysis and conversation analysis; a chapter on the analysis of narrative and storytelling; and a chapter on grounded theory.

Charles Antaki’s chapter (25) on how to analyse discourse covers a lot of ground not only by talking about different varieties of discourse analysis (DA) but also by including conversation analysis (CA). Even though these approaches are often seen as separate or even belonging to opposing camps, both types of analysis address the organization of talk and text as ‘speech acts’, thereby emphasizing their agentic dimension. Among the plethora of methods used for analysing discourse, Antaki also discusses narrative analysis, critical discourse analysis, interactional sociolinguistics, membership category analysis, discursive psychology, and ethnomethodologically inspired DA. Social interaction as revealed through the lens of CA is similar to other ways in which discourse is analysed: it can discover things about interaction and language use that the participant did not suspect, or which have effects or functions which did not figure in the original aims of the encounter or speaker. Such revelations, whatever the method used in teasing them out, are the ultimate criteria for the right to claim to have carried out an analysis. As Antaki stresses, any researcher who claims to be a discourse analyst must
‘add value’ to what can be read or heard in speech, and claims must be backed up by evidence grounded in the words used (or not used). Thus the ‘argumentative steps’ leading to the conclusion must be available to the reader and fellow-scholar.

In discussing the analysis of narrative, Hyvärinen makes a very rich contribution to the Handbook. Chapter 26 starts with a wide-ranging account of the different definitions of narrative, many of which are potentially confusing, ranging from ordinary talk to accounts that are ‘narratives’ and those that ‘possess narrativity’. The chapter goes on to suggest that narrative analysis includes as many genres as the term narrative itself and picks out two developments that have had great impact on social research: grand narratives and the notion of ‘life as narrative’. The discussion then turns to the methods of analysis that have been applied to different genres of narrative, in particular the Proppian model, in which Russian wonder tales were analysed in terms of the basic functions of actions performed by their different characters in the plots, and the textual approach adopted by Labov and Waletzky, who sought to identify the basic elements of narrative. The chapter then moves to recent developments: to the study of narratives as practices and in context, thereby making a distinction between the story and the storying process. The last part of the chapter discusses how narrative practices are transformed into cultural scripts, shape individual action and narration, and lead to breach and discordance.

Grounded theory has been an extremely important development in the analysis of talk although it does not need to be limited to such a form of data. In Chapter 27, Kathy Charmaz provides an illuminating analysis of its development according to its originators, Glaser and Strauss, in their book The Discovery of Grounded Theory published in 1967. She discusses the development of their ideas from her own position as a long-time exponent and developer of the method. Her argument is that its clear appeal lay in the fact that The Discovery of Grounded Theory was the first methodological text to set out explicit systematic procedures for the analysis of qualitative data. Hitherto such strategies were largely learned by researchers in the field. In a context and time in which US research was largely quantitative, or rather status was accorded largely to quantitative research, the systematization of its approach bestowed on qualitative research some legitimacy. However, as Charmaz argues, in their enthusiasm followers of the approach sought to project a rigidity on to it, in particular a belief that disallowed macro social processes or structures that are left untapped at the interactional level, while a second considerable benefit of a grounded approach, namely to generate theory, was rarely exploited. Both developments are ironic, Charmaz notes, given grounded theory’s original openness to methodological innovation and development. On the other hand, this chapter represents an inspiring account of grounded theory and encouragement for its further use, notably for those who wish ‘by interrogating and following content, ... [to] construct form for their inquiry, rather than solely creating content from form used as a recipe for generating research’ (Charmaz Chapter 27).

Two chapters focus on the analysis of qualitative material of a different kind, the first on the analysis of documents and the next on the analysis of visual material.

Documents are a key source of data but methodological guidance on their analysis and use is rare. Typically documents are used by researchers as resources for trawling content. The grounded theorist distinction between form and content is taken up by Lindsay Prior in Chapter 28 on documentary research. Prior argues they can also be seen as a topic in their own right in which the focus is on documents as ‘informants’ that perform functions in social interaction. In arguing in favour of a focus upon discourse (as well as content) Prior gives a striking example of how the scientific discovery of DNA came to be represented in text as something that was endowed with creative action. Without the use of metaphors drawn from communication this would not have been possible to convey and hence for the public to comprehend.
Documents are also read and understood: in Bernstein’s terms (Bernstein 2000)¹ they are the object of recontextualization. They may ‘act’, as in the case of a will, and they may form part of a network of actors, as in the case of a genre of literature, and they are used in social interaction to structure and pattern their readers.

Like documentary methods, visual methods are a relatively ignored field of methodology, with the exception of social anthropology where visual data have been used for some time. In Chapter 29, Christian Heath and Paul Luff set out a case for a particular approach within sociology that draws upon ethnomethodology and conversation analysis and directs analytic attention towards the social and interactional accomplishment of everyday activities and events. In their chapter, they draw upon their own study of auctions and auction houses to provide some practical guidance on using video recordings to address the social and interactional organization of naturally occurring events.

SECONDARY AND META-ANALYSIS

Three chapters are concerned with the secondary analysis of data: the first on qualitative data, the second on quantitative data, while the third is a discussion of meta-analysis.

The re-use of qualitative data is not established practice in social research, as Janet Heaton suggests in Chapter 30. However, it is a developing methodology, and the re-use of qualitative data is becoming more common, partly due to computer technology and partly due to the promotion of data sharing. Social researchers can access qualitative data for secondary analysis in three ways: through data archives, through informal data sharing, and by re-using data from their own previous research. The last is still the most common alternative, despite the increasing availability of qualitative data collected by others. Heaton lists several ways in which qualitative data can be re-used. In supra analysis, the focus of the secondary analysis transcends the primary

or methodological questions are explored. Supplementary analysis involves the in-depth investigation of an issue, or one aspect of the data, that was not addressed, or was only partly covered, in the original research. In contrast, the purpose of re-analysis is to verify and corroborate the findings of previous work. In amplified analysis, two or more datasets are utilized to form a larger dataset, or used to compare different populations. Finally, in assorted analysis, secondary analysis of qualitative data is combined with additional primary research. Despite recent advances in the re-use of qualitative data, Heaton stresses that further work is needed to explore and outline different strategies for re-using qualitative data, and to examine the acceptability of these strategies to research participants and the public.

Angela Dale and colleagues in Chapter 31 provide a mine of useful information about the secondary analysis of quantitative data. They present an excellent overview of the types of data available that are collected by academics, governments and supra-national organizations such as the European Union. These include: administrative datasets, national cohort and panel studies, international and national surveys, pooled samples from several surveys (where no one source provides sufficient numbers of a particular group that is of interest), and micro datasets that link together administrative records for the same individuals. The secondary analysis of large-scale datasets is moreover occurring in a context in which attempts are being made to take a more global view of available datasets. For example, the UK’s Economic and Social Research Council is now taking a strategic approach by providing a national map that will enable researchers to find their way through the myriad resources available. The chapter is highly practical and includes some tips on how to gain access to these datasets, with a particular focus upon data archives. It has the added advantage of covering datasets in a range of countries. It also makes reference to ways in which such datasets may be used in combination with qualitative methods as
data analysis in that new theoretical, empirical part of a mixed-methods strategy. The last
three sections of the chapter offer cautionary advice about using data collected for different purposes to those of the secondary analyst and discuss a variety of good practices. The chapter also raises ethical issues, stressing how secondary data analysts inherit responsibilities at the point of access to these data. A section is importantly devoted to advances in access to data via e-social science (grid technology).

Meta-analysis is the integration of data from similar studies that leads to a quantitative summary of the results of these studies. In Chapter 32 Patall and Cooper provide a comprehensive framework for understanding meta-analysis, which is increasingly used to make literature reviews of quantitative research more systematic, replacing the more traditional narrative review. However, they suggest that informed social scientists need to be aware of both the advantages and disadvantages of meta-analysis, regardless of their own use of this approach. They discuss a range of issues that include: the identification of studies for inclusion; coding frames; calculation of effect sizes; sample weighting; and so on. They also identify the problems to do with testing the same relationship in all the studies under review, issues concerning the independence of findings, and the variable quality of the studies included. This chapter provides an excellent way to obtain competence in addressing these issues.

INTEGRATING ANALYSES OF DATA FROM DIFFERENT SOURCES

Finally, we come to the key issue of how to integrate the analysis of data from different sources. One of the central themes of this section, to which three chapters are devoted, is the combination of different data collected through different methods. In Chapter 33, Jane Fielding and Nigel Fielding discuss the integration of qualitative and quantitative data, which is most commonly described as mixed-methods research. They emphasize that what is important is not the choice of design and use of different data sources per se but the logic that underlies the integration of data within the analysis, and the extent to which the combination of methods strengthens the validity of that analysis. As the authors put it, data integration should act as quality control. This does not in their view mean ignoring the epistemological assumptions underlying each method but recognizing that there are several ways of interpreting a research question, while being open to the benefits and constraints of each type of data.

The authors point to several different possible mixed-method research designs and discuss in some depth their own study, in which both qualitative and quantitative methods were equally important. They show how, in their study of public responses to flood warning, one method (a survey) revealed that many of those identified according to external measures and perspectives as being at risk of flooding were unaware of the risks, while the qualitative method they used explained this lack of awareness. They conclude that, rather than seeing the different methods as generating competing findings, the complex social phenomena under investigation required the coordination of different perspectives and their associated methodologies.

Cronin et al. in Chapter 34 take a similar view about the integration of different types of data. Their concern is to describe the processes involved in analytic integration. Drawing upon their own research, this discussion is about research in which no one method is dominant. Through the use of in-depth interviews, life histories and visual methods they explored the meaning of vulnerability and safety in everyday life. They broadly defined these different data sources as qualitative. The process of analysis they describe is one in which they followed ‘different threads’: using one method they picked out one thread of the analysis, generated either inductively or imported from external theory, that they then pursued in the analysis of data produced by the other methods. The chapter is particularly useful in giving a very detailed account of the steps in the analytic process while at the same time demonstrating close attention to
epistemological and theoretical issues and the intrinsic form of the data. Thus it identifies how the researchers sought to preserve the integrity of the individual narrative accounts and cautions against the translation of one set of data into another (in this study, the translation of visual data into textual data).

In Chapter 35, Max Bergman considers what data ‘are’, the reasons for using more than one dataset for a research question and how these reasons connect differently to various parts of the research process. The chapter reviews issues concerned with the analysis of different sources of largely quantitative data and discusses how data are always contingent and shaped by analytic strategies; analyses of data provide only partial answers to research questions. In making this case a number of arguments are presented for using a number of different (quantitative) data sources: verification, convergence, complementarity and holism, rationales that apply equally in research that combines quantitative and qualitative data. These ways of combining data are played out at different phases of the research process so that data in a qualitative form may be transformed into quantitative format at the point of data collection, for example through CAPI technology. Such processes of transformation Bergman refers to as ‘a form of taming and disciplining’ data for a particular type of analysis. The chapter begins and ends with a reference to Segal’s law that does not propose that it is better to have just one watch instead of two; instead, it may simply be less confusing.

The Handbook’s last chapter is about writing and presenting social research. Amir Marvasti (Chapter 36) suggests alternative ways of writing social science and argues that during the second half of the twentieth century a ‘third culture’ of representation has challenged the necessity of treating science and literature as mutually exclusive realms of knowledge. This means that in the social sciences there is a growing awareness of the rhetorical dimensions of writing and representing facts, so that efforts to inscribe social reality also involve linguistic constructive practice. As a consequence, in recent decades alternative forms of writing have emerged. These Marvasti classifies into six genres: (1) writing with pictures; (2) performative writing; (3) writing factual fiction; (4) poetic representation; (5) writing the author; and (6) post-colonial writing. Marvasti also discusses the ways in which alternative texts have been criticized. The chapter provides the reader with a map of an ever-changing terrain and suggests that many territories are still to be discovered.

NOTES

1 Bernstein, B. (2000) Pedagogy, Symbolic Control and Identity: Theory, Research, Critique, Lanham, Maryland: Rowman and Littlefield.
22
An Introduction to the
Multilevel Model for Change
Suzanne E. Graham, Judith D. Singer and
John B. Willett
Researchers often examine how individual change over time depends on selected predictors by fitting a multilevel model for change. Generations of behavioral scientists have been interested in measuring and investigating individual change, but for decades, the prevailing view was that it was impossible to do well (Cronbach and Furby, 1970). During the 1980s, however, methodologists working within a variety of different disciplines developed a class of appropriate methods (known variously as individual growth modeling, random coefficient modeling, multilevel modeling, mixed modeling, and hierarchical linear modeling) that permit the effective investigation of change. Today we know that it is indeed possible to model change, and to do it well, as long as you have longitudinal data available (Rogosa et al., 1982; Willett, 1988).

A multilevel model for change can be fit successfully to longitudinal data of many different kinds. The research design that generated the data can be either experimental or observational, prospective or retrospective. Time can be measured in whatever units make the most sense to the research question, from seconds to years, sessions to semesters. The data collection schedule can be fixed (everyone has the same periodicity) or flexible (each person has a unique schedule); the number of waves of data collected can be identical or vary from person to person. And don’t let the term ‘growth model’ fool you: these models are also appropriate for outcomes that decrease over time (e.g. weight loss among dieters) or exhibit complex trajectories that include plateaus and reversals.

Furthermore, fitting a multilevel model for change can be used to address research questions posed across many substantive disciplines. In medicine, we study change over time in aspects of health status, such as alcohol consumption among adolescents (Curran et al., 1997). In education, we examine changes in student academic achievement over time, for example, the development of the understanding of mathematical concepts during secondary school (Ai, 2002). In psychology, we investigate changes in behavioral outcomes, such as externalizing
behaviors or depressive episodes, over time (Keiley et al., 2000).

Perhaps the most intuitively appealing way of understanding how a multilevel model for change is postulated is to link its specification to two distinct substantive questions about change, each arising from a particular level in a natural hierarchy:

• At level-1, the ‘within-person’ or intra-individual level, we can ask questions about each person’s individual change trajectory. Does a particular student’s mathematics achievement improve rapidly during secondary school? Does another student’s achievement increase less rapidly? Might yet another student’s mathematics achievement actually decrease over time? Are these changes linear or non-linear? The goal of addressing a level-1 research question is to interrogate the trajectory of each person’s individual growth over time.
• At level-2, the ‘between-person’ or inter-individual level, we can ask how other variables may predict differences among the change trajectories of many individuals. On average, do girls’ and boys’ mathematics achievement trajectories start at the same initial level? Do boys and girls have the same rates of change over time? Do the change trajectories differ systematically by other important individual characteristics, such as a student’s race or socio-economic background? The goal of addressing a level-2 research question is to interrogate any heterogeneity in change among individuals in order to determine the relationship between predictors and the growth trajectories.

These two types of questions are natural precursors of the statistical models that together form an overall multilevel model for change.

In this chapter, we illustrate these ideas using five waves of mathematics achievement data collected as part of the Longitudinal Study of American Youth [LSAY], a national longitudinal study of U.S. secondary school students (Miller et al., 2000). LSAY data were collected from 5,945 students over the course of seven years, beginning in the fall of 1987 when the students were in either 7th or 10th grade. A primary focus of the LSAY investigation was on the measurement of students’ mathematics achievement over time, using items from the National Assessment of Educational Progress. Here, in our example, we present analyses of the mathematics achievement data from a sub-sample of 1,322 White and African-American students between 7th grade and 11th grade. We begin by examining the effects of race on changes in the students’ mathematics achievement over time. Then, we investigate whether individual mathematics achievement growth trajectories differ for students from different socio-economic backgrounds and whether girls’ trajectories differ from those of boys.

Level-1 model for individual change

In the left-hand panel of Figure 22.1, we plot the mathematics achievement (MATHACH) of one African-American girl from our dataset against her grade, between 7th and 11th grade. Notice the upward trend in the empirical growth record, which we have summarized in the figure by superimposing an ordinary least squares (OLS) ‘achievement on grade’ linear regression line, fitted for this girl. With few waves of data, it is difficult to argue that anything except a linear model is suitable for representing change, within-person. Here, with five waves of data, we need not be limited to thinking only in terms of linear trajectories, but for simplicity we begin here by focusing on linear growth over time. Later in the chapter we consider non-linear growth trajectories.

A level-1 statistical model, or individual growth model, can be specified to represent the change that we hypothesize each member of the population will experience during the time period under study. Assuming that true individual change is a linear function of grade, for instance, a reasonable level-1 model may be:

\[
Y_{ij} = \left[\pi_{0i} + \pi_{1i}(\mathrm{GRADE}_{ij} - 7)\right] + \left[\varepsilon_{ij}\right] \tag{1}
\]

This model asserts that, in the population from which this sample was drawn, $Y_{ij}$, the value of MATHACH for student $i$ at time $j$, is constituted from two important parts.
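To make equation (1) concrete, the following minimal sketch fits an ordinary least squares line to one student's scores using the centered predictor (GRADE − 7). The scores are hypothetical illustration values, not LSAY data, and the function name `fit_individual_growth` is an assumption of this sketch. A person-by-person OLS fit of this kind is exploratory only; it is not how the multilevel model itself is estimated.

```python
# Hypothetical scores for one student at grades 7-11 (illustrative only, not LSAY data).
grades = [7, 8, 9, 10, 11]
scores = [50.0, 53.0, 55.0, 58.5, 61.0]


def fit_individual_growth(grades, scores, center=7):
    """OLS estimates of pi_0i and pi_1i in Y_ij = pi_0i + pi_1i * (GRADE_ij - center) + e_ij."""
    time = [g - center for g in grades]          # centered temporal predictor
    n = len(time)
    mean_t = sum(time) / n
    mean_y = sum(scores) / n
    sxx = sum((t - mean_t) ** 2 for t in time)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(time, scores))
    slope = sxy / sxx                            # estimated change per grade
    intercept = mean_y - slope * mean_t          # estimated status when GRADE == center
    return intercept, slope


# Centered at grade 7: the intercept is the fitted 7th-grade achievement.
pi0_hat, pi1_hat = fit_individual_growth(grades, scores)

# Uncentered (center=0): same slope, but the intercept is extrapolated to "grade 0".
uncentered_intercept, uncentered_slope = fit_individual_growth(grades, scores, center=0)

print(pi0_hat, pi1_hat, uncentered_intercept)
```

With these hypothetical scores the centered fit gives an intercept of 50.0 and a slope of 2.75 per grade, while the uncentered fit keeps the same slope but moves the intercept to 30.75, a value extrapolated well outside the observed grades.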
[Figure 22.1: three panels, each plotting mathematics achievement (vertical axis, 25 to 70) against grade (horizontal axis, 7 to 11); the fitted trajectories in the right-hand panel are labeled White and African-American.]
Figure 22.1 Developing a multilevel model for change using data on mathematics
achievement over time. Left-hand panel contains the empirical growth record of one
African-American girl plotted against her grade in school. Middle panel presents exploratory
OLS-fitted trajectories for a random sample of 10 White and 10 African-American students
(coded using dashed lines for White students and solid lines for African-American students).
Right-hand panel presents fitted change trajectories for White and African-American
students, obtained by substituting prototypical predictor values into the fitted multilevel
model for change
The first part, in brackets in equation (1), describes the underlying true change for this individual as a linear function of his (or her) grade in school on that occasion ($\mathrm{GRADE}_{ij}$). In our case, the model implicitly assumes that a straight line adequately represents the student’s true change trajectory over time. The second part of the individual growth model is a random error ($\varepsilon_{ij}$), which is intended to account for the scatter of the observed data around the individual true change trajectory. Even though everyone in our example was assessed on the same five occasions (grades 7, 8, 9, 10, and 11), this basic level-1 model can be used in a wide variety of other datasets, even those in which the timing and spacing of waves varies across people.

The brackets in equation (1) identify the model’s important structural component, which represents our hypotheses about each person’s true trajectory of change in mathematics achievement over time. The model stipulates that this linear trajectory is characterized by two critical individual growth parameters, $\pi_{0i}$ and $\pi_{1i}$, which determine its shape for the $i$th student in the population. If the model is appropriate, these parameters represent the fundamental features of each student’s true growth trajectory, and as such, become the objects of prediction in the linked level-2 model that we specify below.

An important feature of the level-1 specification is that the researcher controls the substantive meaning of these parameters by choosing an appropriate metric for the temporal predictor. For example, in this level-1 model, the intercept, $\pi_{0i}$, represents student $i$’s true mathematics achievement in 7th grade. This interpretation applies because we centered GRADE in the level-1 model by subtracting the constant ‘7’ from it, to provide the level-1 predictor (GRADE − 7). Had we not centered the predictor in this way, the intercept $\pi_{0i}$ would represent individual $i$’s true value of mathematics achievement at grade 0, which, corresponding with kindergarten, predates the onset of data collection! Centering the level-1 time predictor on the first wave of data collection, as we have done here, is a popular approach because it allows us to interpret $\pi_{0i}$ easily: it is student $i$’s true ‘initial’ status, at the beginning of the study.

Perhaps a more important individual growth parameter is the slope, $\pi_{1i}$, which represents the rate at which student $i$’s true mathematics achievement changes over time. Since time is measured in grades, in our example, individual growth parameter $\pi_{1i}$ represents student $i$’s true annual rate of
change in mathematics achievement. During the investigation (from 7th grade to 11th grade), her achievement is hypothesized to change by $\pi_{1i}$ per grade. Because we hypothesize that each individual in the population has his (or her) own rate of true change, this growth parameter has the subscript $i$.

In specifying a level-1 model, we implicitly assume that all the true individual change trajectories in the population have a common algebraic form. But because each person has his or her own value of the individual growth parameters, everyone does not need to follow exactly the same trajectory. Students’ true mathematics achievement levels in 7th grade may vary, as may their rates of true change in achievement. Some students may begin 7th grade with lower mathematics achievement than others, and some students’ mathematics achievement may improve more rapidly over time than others. Yet other students may have mathematics achievement trajectories that actually decrease over time. Specifying the level-1 model appropriately allows us to specify the trajectories of different participants using only the values of their individual growth parameters. This leap is the cornerstone of the growth curve modeling approach to analyzing longitudinal data because it means that we can study inter-individual differences in individual growth trajectories by studying inter-individual variation in growth parameters. Our general questions about predictors of ‘change’ then become questions about the relationship between the individual growth parameters and those predictors.

Level-2 model for inter-individual differences in change

Once the level-1 model has been specified, a level-2 statistical model can then codify the hypothesized relationship between the inter-individual differences in the change trajectories (as embodied in the individual growth parameters) and time-invariant characteristics of individuals, such as race and gender. For instance, we can use a level-2 model to address questions like: On average, do African-American 7th graders have lower mathematics achievement than their White peers, or do they have different rates of change in achievement from 7th grade to 11th grade?

To develop intuition about the level-2 model, examine the middle panel of Figure 22.1, which represents an exploratory analysis in which we plot fitted OLS individual growth trajectories for a random subset of 10 White and 10 African-American students in our example (coded using solid lines to represent African-American students and dashed lines for White students). As noted for the single student in the left panel, mathematics achievement appears generally to increase over time. In addition, African-American students seem to have generally lower mathematics achievement scores in 7th grade than do White students, and their rate of increase in achievement over time may not be as great. In other words, their intercepts may be lower and their slopes shallower. Also note the substantial inter-individual heterogeneity in growth trajectories within groups. Not all African-American students have lower intercepts than do White students; many of them have higher mathematics achievement in 7th grade than many White students. Similarly, not all African-American students have less steep slopes; some of them have very rapid increases in mathematics achievement over time. Furthermore, within both groups there are students whose mathematics achievement actually decreases over time. Our level-2 model must simultaneously account for both these general patterns (the evident between-group differences in intercepts and slopes) and any inter-individual heterogeneity that remains within groups.

This suggests that an appropriate level-2 model would have outcomes that are the level-1 individual growth parameters themselves (the $\pi_{0i}$ and $\pi_{1i}$ parameters from equation (1)). In addition, the level-2 model must specify the relationship between each of the individual growth parameters and the predictor of interest (here, AFAM, which takes on only two values: 0 = White, 1 = African-American). Finally, the level-2 model must
allow even individuals who share common predictor values to differ stochastically in their individual change trajectories, by permitting random variation in the individual growth parameters across people. These considerations suggest that the following level-2 model may be a useful specification for the inter-individual differences in change:

\[
\begin{aligned}
\pi_{0i} &= \gamma_{00} + \gamma_{01}\,\mathrm{AFAM}_i + \zeta_{0i} \\
\pi_{1i} &= \gamma_{10} + \gamma_{11}\,\mathrm{AFAM}_i + \zeta_{1i}
\end{aligned}
\tag{2}
\]

Like all level-2 models, equation (2) has more than one component; but, taken together, they simultaneously treat the intercept ($\pi_{0i}$) and the slope ($\pi_{1i}$) of an individual’s growth trajectory as level-2 outcomes that are associated with predictors (here, AFAM). As in multiple regression analysis, we can modify the level-2 model to include other predictors, adding, for example, socio-economic status and gender. Each component of the level-2 model also has its own residual (here, symbolized by $\zeta_{0i}$ and $\zeta_{1i}$) that permits stochastic variation in the level-1 parameters, after the impact of the predictor has been accounted for. The stochastic part of the level-2 model allows the individual intercepts and slopes to differ across individuals, in the population.

The structural parts of the level-2 model in (2) contain four level-2 parameters (which we have labeled $\gamma_{00}$, $\gamma_{01}$, $\gamma_{10}$, and $\gamma_{11}$) that are known collectively as the fixed effects. These fixed effects capture the systematic inter-individual differences in change trajectories. Later, in our example, we estimate them all. In equation (2), $\gamma_{00}$ and $\gamma_{10}$ are level-2 intercepts; $\gamma_{01}$ and $\gamma_{11}$ are level-2 slopes. As in simple and multiple regression analysis, the level-2 slopes are of greater interest because they represent the effect of predictors (here, AFAM) on the individual growth parameters. We interpret the level-2 parameters much like linear regression coefficients, except that they describe variation in ‘outcomes’ that are themselves the level-1 individual growth parameters. For example, $\gamma_{00}$ represents the average true initial status (mathematics achievement in 7th grade) among White students in the population, while $\gamma_{01}$ represents the hypothesized population difference in average true initial status between African-American and White students. Similarly, $\gamma_{10}$ represents the average true annual rate of change in mathematics achievement for White students, in the population, while $\gamma_{11}$ represents the hypothesized population difference in average true annual rate of change between African-American and White students. The level-2 slopes, $\gamma_{01}$ and $\gamma_{11}$, then jointly capture the effects of AFAM. If $\gamma_{01}$ and $\gamma_{11}$ are non-zero, the average population trajectories in true mathematics achievement differ between the two ethnic groups; on the other hand, if $\gamma_{01}$ and $\gamma_{11}$ are both 0, then the trajectories do not differ by race. These two level-2 slope parameters therefore address the following research question: What is the difference in the average trajectory of true change in mathematics achievement between White students and African-American students?

An important feature of both the level-1 and level-2 models is the presence of requisite stochastic terms: the residuals $\varepsilon_{ij}$ at level-1, and $\zeta_{0i}$ and $\zeta_{1i}$ at level-2. In the level-1 model, residual $\varepsilon_{ij}$ accounts for the difference between individual $i$’s true and observed value of the outcome, on occasion $j$. For our example, each level-1 residual represents that part of student $i$’s value of MATHACH at time $j$ not predicted by his (or her) grade level. The level-2 residuals, $\zeta_{0i}$ and $\zeta_{1i}$, on the other hand, allow each person’s individual growth parameters to deviate from their relevant population averages. They represent those portions of the level-2 outcomes (the individual growth parameters) that remain ‘unexplained’ by the level-2 predictor(s). For our example, $\zeta_{0i}$ represents the difference between student $i$’s true mathematics achievement in 7th grade and the population average true mathematics achievement in 7th grade for this student’s racial group. Similarly, $\zeta_{1i}$ represents the difference between student $i$’s rate of true change in mathematics achievement and the population true slope for her racial group.
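The way equations (1) and (2) fit together can be sketched by generating data from them. In the minimal example below, all numeric values (the four fixed effects and the residual standard deviations) are hypothetical and chosen only for illustration; they are not estimates from the chapter. The function names are likewise assumptions of this sketch, and the two level-2 residuals are drawn independently purely to keep the code short.

```python
import random

# Hypothetical fixed effects (illustrative values, not estimates from the chapter).
GAMMA_00, GAMMA_01 = 52.0, -4.0   # average initial status for AFAM = 0; difference for AFAM = 1
GAMMA_10, GAMMA_11 = 3.0, -0.5    # average annual rate of change for AFAM = 0; difference for AFAM = 1
GRADES = (7, 8, 9, 10, 11)


def individual_parameters(afam, zeta0=0.0, zeta1=0.0):
    """Level-2 model: individual growth parameters from AFAM plus level-2 residuals."""
    pi0 = GAMMA_00 + GAMMA_01 * afam + zeta0
    pi1 = GAMMA_10 + GAMMA_11 * afam + zeta1
    return pi0, pi1


def true_trajectory(pi0, pi1, grades=GRADES):
    """Structural part of the level-1 model, with GRADE centered at 7."""
    return [pi0 + pi1 * (g - 7) for g in grades]


def simulate_student(afam, rng, sd_zeta0=5.0, sd_zeta1=1.0, sd_eps=2.0):
    """One student's observed record: true trajectory plus level-1 residuals.

    Assumption of this sketch: zeta0 and zeta1 are drawn independently.
    """
    zeta0 = rng.gauss(0.0, sd_zeta0)
    zeta1 = rng.gauss(0.0, sd_zeta1)
    pi0, pi1 = individual_parameters(afam, zeta0, zeta1)
    return [y + rng.gauss(0.0, sd_eps) for y in true_trajectory(pi0, pi1)]


# Population average trajectories: set both level-2 residuals to zero.
white_avg = true_trajectory(*individual_parameters(afam=0))
afam_avg = true_trajectory(*individual_parameters(afam=1))

# One simulated student, with a seeded generator for reproducibility.
observed = simulate_student(afam=1, rng=random.Random(0))

print(white_avg)
print(afam_avg)
```

Setting the residuals to zero recovers the two group-average trajectories implied by the fixed effects, while `simulate_student` shows how the level-2 residuals shift an individual's intercept and slope before the level-1 residuals scatter the observed scores around that individual's line.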
As is the case with most residuals, we are usually less interested in their specific values than in their variability. Level-1 residual variance, $\sigma_\varepsilon^2$, for instance, summarizes the scatter of the level-1 residuals around each person’s true change trajectory, in the population. The level-2 residual variances, $\sigma_0^2$ and $\sigma_1^2$, summarize the population inter-individual variation in true individual intercept and slope around their averages that is left over after controlling for the effect(s) of any predictors included in the corresponding level-2 model. Conditional on adjusting for the impact of the level-2 predictors, therefore, $\sigma_0^2$ represents population residual variance in true initial status and $\sigma_1^2$ represents population residual variance in true annual rate of change, across all individuals in the population. The level-2 variance components therefore allow us to address the research question: how much heterogeneity in true initial status and true rate of change remains among students after accounting for the effects of race?

There is a final complication at level-2. In practice, it is entirely possible that there may be an association between initial status and rate of change across individuals in the population. For instance, students who begin 7th grade with higher mathematics achievement may have higher (or lower) rates of change. To permit this possibility, we must permit the level-2 residuals to be correlated. Since $\zeta_{0i}$ and $\zeta_{1i}$ represent the deviations of the individual growth parameters from their population averages, their population covariance, $\sigma_{01}$, summarizes the association between true individual intercept and slope across all members of the population. Again, because of their conditional nature, this population covariance, $\sigma_{01}$, summarizes the association between true initial status and true annual rate of change, controlling for race. This parameter then allows us to address the question: controlling for race, are the true mathematics achievement in 7th grade and the true rate of change in achievement related across students?

To fit any statistical model to data, including the multilevel model for change, we must make appropriate distributional assumptions about the residuals. At level-1, the situation is relatively simple. In the absence of evidence suggesting otherwise, we usually begin by invoking the classical normal-theory assumption that the level-1 residuals are independently and identically distributed with homoscedastic variance, $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$. At level-2, the presence of two (or sometimes more) residuals necessitates that we describe their underlying distribution using a bivariate (or multivariate) assumption, such as:

\[
\begin{bmatrix} \zeta_{0i} \\ \zeta_{1i} \end{bmatrix}
\sim N\!\left(
\begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{10} & \sigma_1^2 \end{bmatrix}
\right)
\tag{3}
\]

This complete set of residual variances and covariances (both the level-1 residual variance, $\sigma_\varepsilon^2$, and the level-2 error variance-covariance matrix) is jointly referred to as the model’s variance components. Later, in our example, we estimate them all.

The composite multilevel model for change

This ‘level-1/level-2’ format is not the only way to specify the multilevel model for change. A more parsimonious representation results if you collapse the level-1 and level-2 models together into a single composite statistical model. The composite representation of the multilevel model for change, while identical to the level-1/level-2 specification mathematically, provides an alternative way of codifying hypotheses about change and is the specification utilized by many dedicated statistical software programs. To derive the composite specification (also known as the reduced form growth curve model), notice that any pair of linked level-1 and level-2 models share terms in common. Specifically, the individual growth parameters specified on the right-hand side of the ‘equals’ sign in the level-1 model become the outcomes on the left-hand side of the ‘equals’ sign in the level-2 model. We can therefore collapse the submodels together by substituting for $\pi_{0i}$ and $\pi_{1i}$ from the level-2 model in
equation (2) into the level-1 model in equation (1), as follows:

\[
\begin{aligned}
Y_{ij} &= \pi_{0i} + \pi_{1i}\,\mathrm{TIME}_{ij} + \varepsilon_{ij} \\
&= (\gamma_{00} + \gamma_{01}\,\mathrm{AFAM}_i + \zeta_{0i}) + (\gamma_{10} + \gamma_{11}\,\mathrm{AFAM}_i + \zeta_{1i})\,\mathrm{TIME}_{ij} + \varepsilon_{ij}
\end{aligned}
\tag{4}
\]

where we have replaced the level-1 predictor, $(\mathrm{GRADE}_{ij} - 7)$, by the generic temporal representation, $\mathrm{TIME}_{ij}$, for simplicity. Multiplying out and rearranging terms yields the composite multilevel model for change:

\[
Y_{ij} = \left[\gamma_{00} + \gamma_{10}\,\mathrm{TIME}_{ij} + \gamma_{01}\,\mathrm{AFAM}_i + \gamma_{11}(\mathrm{AFAM}_i * \mathrm{TIME}_{ij})\right] + \left[\zeta_{0i} + \zeta_{1i}\,\mathrm{TIME}_{ij} + \varepsilon_{ij}\right] \tag{5}
\]

where we once again use brackets to distinguish the model’s structural and stochastic components.

How did this cross-level interaction arise, when the level-1/level-2 specification of the multilevel model for change appears to have no similar term? Its genesis is in the ‘multiplying-out’ procedure used to generate the composite model. When we substitute the level-2 model for individual growth parameter $\pi_{1i}$ into its appropriate position in the level-1 model, level-2 parameter $\gamma_{11}$, previously associated only with level-2 predictor AFAM, gets multiplied by level-1 predictor TIME. In the composite model, then, this parameter becomes associated with the interaction term, AFAM*TIME. This association makes perfect sense if you consider the following logic. When $\gamma_{11}$ is different from zero in the level-1/level-2 specification, the slopes of the true change trajectories differ according to values of AFAM. In other words, the effect of TIME (whose effect is represented by the slopes of the change trajectories) differs by race. However, generically, when the effects
Even though the composite specification of levels of another predictor (here, AFAM),
the multilevel model for change in (5) appears we say that the two predictors interact.
more complex than the level-1/level-2 speci- The cross-level interaction in the composite
fication, the two forms are logically and math- specification codifies this effect, modeling
ematically equivalent. The level-1/level-2 any difference in the average rate of true
specification is more substantively appealing; change in mathematics achievement between
the composite specification is algebraically African-American and White students.
more parsimonious. In addition, the fixed Another distinctive feature of the composite
effects—the γ ’s—capture the patterns of model is its ‘composite residual,’ the three
change in the ways that we have described, terms in the second set of brackets on
but they function in the composite model in the right-hand side of equation (5) that
a different way. Rather than first postulating combine together the effects of the single
how MATHACH is related to TIME and indi- level - 1 residual and the two level-2 residuals
vidual growth parameters, and second how the that appeared in the earlier level-1/level-2
individual growth parameters are related to specification:
AFAM, the composite specification postulates
that MATHACH depends simultaneously on: Composite residual: ζ0i + ζ1i TIMEij + εij
(1) the level-1 predictor, TIME; (2) the
level-2 predictor, AFAM, and (3) their cross- Even though the components that make up
level interaction, AFAM∗TIME. From this the composite residual have the same meaning
perspective, the composite model’s structural under both the level-1/level-2 and composite
portion resembles a multiple regression model specifications of the multilevel model for
with two predictors, TIME and AFAM, that change, the composite residual provides valu-
appear as both main effects (associated with able insight into our assumptions about the
parameters γ10 and γ01 , respectively) and behavior of residuals over time in longitudinal
in a cross-level interaction (associated with data. Instead of being a simple sum, the
parameter γ11 ). second level-2 residual, ζ1i , in the composite
residual is multiplied by level-1 predictor, TIME. Despite this unusual construction, the interpretation of the composite residual is straightforward: it describes the difference between the observed and predicted value of Y for individual i on occasion j. Inspection of the mathematical form of the composite residual, however, reveals two important properties of the occasion-specific residuals not readily apparent in the level-1/level-2 specification for the multilevel model for change: the composite residuals can be both autocorrelated and heteroscedastic within-person. Fortunately, these are exactly the kinds of properties that you would expect among residuals associated with repeated measurements of a changing outcome over time, within-person.

When residuals are heteroscedastic, the unexplained portions of each person’s outcome have unequal variances from occasion to occasion. Even though heteroscedasticity has many roots, one cause is the effects of omitted predictors, that is, the consequences of failing to include variables that are, in fact, related to the outcome. Because their effects have nowhere else to go, they are bundled together, by default, into the residuals. If their impact differs across occasions, the residual’s magnitude may differ as well, creating heteroscedasticity. The composite model allows for heteroscedasticity via the level-2 residual $\zeta_{1i}$. Because $\zeta_{1i}$ is multiplied by TIME in the composite residual, its contribution can differ (linearly, at least, in a linear level-1 submodel) across occasions. If there are systematic differences in the magnitudes of the composite residuals across occasions, there will be accompanying differences in residual variance, and hence heteroscedasticity.

When residuals are autocorrelated, the unexplained portions of each person’s outcome are correlated with each other across repeated occasions. Once again, omitted predictors, whose effects are bundled into the residuals, are a common cause of this phenomenon. Because their effects may be present identically in each residual over time, an individual’s residuals may become linked across occasions. The presence of the time-invariant level-2 residuals, $\zeta_{0i}$ and $\zeta_{1i}$, in each of the composite residuals defined in equation (5) allows them to be autocorrelated. Because they have only an ‘i’ subscript (and no ‘j’), they feature identically in each individual’s composite residual on every occasion, generating the required autocorrelation across time.

Fitting the multilevel model for change to data

Many different statistical software programs can be used to fit the multilevel model for change to data. Some are specialized packages written expressly for this purpose (such as HLM, MLwiN, and MIXREG). Others are part of popular multipurpose software packages including SAS (PROC MIXED and PROC NLMIXED), SPSS (MIXED), Stata (xtmixed, xtreg, and gllamm) and S-PLUS (NLME). At their core, each program does the same job: it fits the hypothesized multilevel model for change to data and generates parameter estimates, measures of precision, diagnostics, and so on. All of the different packages tend to produce the same, or very similar, answers to a given problem, regardless of their method of model-fitting and parameter-estimation (Kreft and De Leeuw, 1998). So, in one sense, it does not matter which computer program you choose for your data analysis. But the packages do differ in many other important ways, including the ‘look and feel’ of their interfaces, their ways of entering and pre-processing data, their approach to model specification (whether they require the multilevel model for change be specified in the level-1/level-2 or composite formats), their estimation methods (e.g. full vs. restricted maximum likelihood methods), their strategies for hypothesis testing, and their provision of diagnostics. It is beyond the scope of this chapter to discuss these details. Instead, we illustrate some of them by turning to the results of fitting the multilevel model for change that we have specified above to data on our example, using SAS
Table 22.1 Results of fitting a multilevel model for change to data (n = 1,322). This model predicts mathematics achievement between grades 7 and 11 as a function of (GRADE-7) at level-1 and race (AFAM) at level-2

                                                     Parameter                 Estimate   (s.e.)
Fixed effects
  Initial status, $\pi_{0i}$       Intercept         $\gamma_{00}$             53.02***   (0.26)
                                   AFAM              $\gamma_{01}$             −5.93***   (0.80)
  Rate of change, $\pi_{1i}$       Intercept         $\gamma_{10}$              2.87***   (0.08)
                                   AFAM              $\gamma_{11}$             −0.48*     (0.23)
Variance components
  Level-1: Within-person, $\varepsilon_{ij}$         $\sigma_\varepsilon^2$    37.17***   (0.86)
  Level-2: In initial status, $\zeta_{0i}$           $\sigma_0^2$              59.05***   (3.23)
           In rate of change, $\zeta_{1i}$           $\sigma_1^2$               3.19***   (0.29)
           Covariance between $\zeta_{0i}$ and $\zeta_{1i}$   $\sigma_{01}$     6.18***   (0.69)

~ p < 0.10; * p < 0.05; *** p < 0.001
Note: Full ML, SAS Proc Mixed. The standard error for $\gamma_{10}$ is given as 0.08, matching the Model A column of Table 22.2.
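The variance components in Table 22.1 can be turned into the occasion-specific variances and correlations implied by the composite residual. The sketch below is illustrative only. It assumes, as is usual for this model, that the level-1 residual is independent of the level-2 residuals and uncorrelated across occasions, so that the variance of the composite residual at time t is $\sigma_0^2 + 2\sigma_{01}t + \sigma_1^2 t^2 + \sigma_\varepsilon^2$. The function names are our own and do not belong to any statistical package.

```python
import math

# Estimated variance components for the fitted model, copied from Table 22.1.
SIGMA2_EPS = 37.17   # level-1 (within-person) residual variance
SIGMA2_0 = 59.05     # level-2 variance in true initial status
SIGMA2_1 = 3.19      # level-2 variance in true rate of change
SIGMA_01 = 6.18      # level-2 covariance between intercept and slope


def composite_cov(t1, t2):
    """Covariance of one person's composite residuals at times t1 and t2.

    Assumption (ours): the level-1 residual is independent of the level-2
    residuals and uncorrelated across occasions.
    """
    cov = SIGMA2_0 + SIGMA_01 * (t1 + t2) + SIGMA2_1 * t1 * t2
    if t1 == t2:
        cov += SIGMA2_EPS  # the level-1 residual adds only to the variances
    return cov


def composite_corr(t1, t2):
    """Correlation between composite residuals at times t1 and t2."""
    return composite_cov(t1, t2) / math.sqrt(
        composite_cov(t1, t1) * composite_cov(t2, t2)
    )


if __name__ == "__main__":
    # TIME = GRADE - 7, so t = 0, ..., 4 covers grades 7 to 11.
    for t in range(5):
        print(f"grade {t + 7}: implied residual variance = {composite_cov(t, t):.2f}")
    print(f"implied correlation, grade 7 with grade 11 = {composite_corr(0, 4):.3f}")
```

With these estimates the implied variance rises from 96.22 in grade 7 to 196.70 in grade 11, which is the heteroscedasticity described above, and residuals from different occasions are positively correlated, which is the autocorrelation.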
PROC MIXED. Estimates are presented in Table 22.1.

INTERPRETING A FITTED MULTILEVEL MODEL FOR CHANGE

In any analysis of change, the fixed effects parameters (the $\gamma$s of equations (2) and (4)) quantify the impact of time-invariant predictors on the individual change trajectories. In our example, for instance, they characterize the relationship between the individual growth parameters and race. We interpret these estimates much as we do any regression coefficient, with one key difference: the level-2 ‘outcomes’ that these fixed effects describe are the level-1 individual growth parameters built into the multilevel model for change. As is usual in any regression analysis, we can conduct a hypothesis test on each fixed effect using a single parameter test (most commonly to examine the null hypothesis $H_0: \gamma = 0$). As shown in Table 22.1, we reject all four such null hypotheses, suggesting that each parameter plays an important role in the story of how race is related to student mathematics achievement in secondary school.

Substituting the estimated fixed effects (the $\hat{\gamma}$s) from Table 22.1 into the hypothesized level-2 model in equation (2), we have the following fitted level-2 model:

$$
\begin{aligned}
\hat{\pi}_{0i} &= 53.02 - 5.93\,\mathit{AFAM}_i \\
\hat{\pi}_{1i} &= 2.87 - 0.48\,\mathit{AFAM}_i
\end{aligned}
\tag{6}
$$

The first part of this fitted model describes the estimated effects of AFAM on true initial status; the second part describes its estimated effects on the annual rates of true change in mathematics achievement. Begin with the first part of the fitted model, for true initial status. In the population from which this sample was drawn, we estimate that the true initial status (MATHACH at grade 7) for the average White student is 53.02; for the average African-American 7th grader, we estimate that initial true mathematics achievement is 5.93 points lower (47.09). In addition, in rejecting (at the 0.001 level) the null hypotheses on $\gamma_{00}$ and $\gamma_{01}$, we conclude that the average White student had non-zero true mathematics achievement in 7th grade (hardly surprising!)
and that there is a statistically significant difference in the average true mathematics achievement of White students compared with their African-American peers.

Next examine the second part of the fitted model, for the annual rate of true change. In the population from which this sample was drawn, we estimate the annual rate of true change in mathematics achievement for the average White student is 2.87 points per year; for the average African-American student, we estimate it to be nearly half a point lower (at 2.39). In rejecting (at the 0.001 level) the null hypothesis on $\gamma_{10}$, we conclude that the average White student experienced a statistically significant increase in true mathematics achievement over time. Because we also reject (at the 0.05 level) the null hypothesis on $\gamma_{11}$, we conclude that differences between African-American and White students in their annual rates of true change are also statistically significant. The estimated mathematics achievement for the average White student increased 11.48 points from 7th grade to 11th grade, while the increase for African-American students was about two points lower (9.56). African-American students begin 7th grade with lower average mathematics achievement than their White counterparts, and the achievement gap increases over time.

Another way of interpreting the estimated fixed effects is to plot fitted trajectories for prototypical individuals. For this particular model, only two prototypes are possible: an African-American student (AFAM = 1) and a White student (AFAM = 0). Substituting these predictor values into equation (6) yields the estimated initial status and annual growth rates for each:

$$
\begin{aligned}
\text{When } \mathit{AFAM} = 0\text{:}\quad \hat{\pi}_{0i} &= 53.02 - 5.93(0) = 53.02 \\
\hat{\pi}_{1i} &= 2.87 - 0.48(0) = 2.87 \\
\text{When } \mathit{AFAM} = 1\text{:}\quad \hat{\pi}_{0i} &= 53.02 - 5.93(1) = 47.09 \\
\hat{\pi}_{1i} &= 2.87 - 0.48(1) = 2.39
\end{aligned}
\tag{7}
$$

We then substitute these estimates into the hypothesized level-1 model in equation (1) to obtain the fitted individual change trajectories:

$$
\begin{aligned}
\text{When } \mathit{AFAM} = 0\text{:}\quad \hat{Y}_{ij} &= 53.02 + 2.87\,(\mathit{GRADE}_{ij} - 7) \\
\text{When } \mathit{AFAM} = 1\text{:}\quad \hat{Y}_{ij} &= 47.09 + 2.39\,(\mathit{GRADE}_{ij} - 7)
\end{aligned}
\tag{8}
$$

These fitted trajectories are plotted in the right-hand panel of Figure 22.1, and reinforce the numeric conclusions articulated above. In comparison to White students, the average African-American student has lower mathematics achievement in 7th grade and a slower rate of increase in mathematics achievement.

The estimated variance components assess the amount of outcome variability left, at either level-1 or level-2, after including the specified predictors. Because the variance components are harder to interpret in absolute terms, many researchers rely on the associated hypothesis tests, for at least they provide some benchmark for comparison. Some caution is necessary, however, because a null hypothesis on a variance necessarily falls at the border of the available parameter space (by definition, variances cannot be negative) and as a result, the asymptotic distributional properties that hold in simpler settings may not apply (Snijders and Bosker, 1999). The level-1 residual variance, $\sigma_\varepsilon^2$, summarizes the population variability in an average person’s outcome values around his or her own true change trajectory. Its estimate here is 37.17. Rejection of the associated null hypothesis test (at the 0.001 level) suggests the existence of additional outcome variation at level-1 (within-person) that may be predictable in subsequent analyses by time-varying predictors other than time itself.

The level-2 variance components, $\sigma_0^2$ and $\sigma_1^2$, summarize the variability in true initial status and rate of true change that remains after controlling for level-2 predictors (here, AFAM). Tests associated with these variance
components evaluate whether there is any remaining residual outcome variation that could potentially be explained by further predictors at level-2. For these data, we reject both of these null hypotheses (at the 0.001 level). Because these are level-2 variance components (describing the residual variation in true initial status and rate of true change), we would consider adding further time-invariant predictors to the multilevel model for change. Finally, let’s turn to the level-2 covariance component, $\sigma_{01}$. Since we reject the null hypothesis on this parameter too, we can conclude that the intercepts and slopes of the individual true change trajectories are indeed correlated in the population, controlling for student race: there is a positive association between true initial status and annual rate of true change, once the effects of AFAM have been removed. On average, African-American and White students who have higher true mathematics achievement in 7th grade also have greater rates of increase in true mathematics achievement between 7th and 11th grade.

Adding further predictors to the multilevel model for change

Our discussion to this point has focused on developing the foundation for understanding the multilevel model for change by comparing the average trajectories of two populations of students, African-American and White. We have seen that true change in both groups is positive, on average, with White students enjoying a more rapid increase in true mathematics achievement over time. However, through the analysis of the associated variance components, we have found that heterogeneity remains at level-1 and in the true intercepts and slopes, even after the effects of time and race have been partialled out. This suggests that it is important to consider the addition of further predictors to the model. Here, as we fit selected additional models, it is important to remain aware of the two complexities involved: (1) multiple level-2 outcomes (the individual growth parameters), each of which can be related to predictors; and (2) multiple kinds of effects, both the fixed effects and the variance components. Hypothesizing a level-1 linear individual growth model has provided two level-2 outcomes; a more complex level-1 submodel specification may provide more. One simple strategy in specifying the level-2 models is to include each level-2 predictor simultaneously in all level-2 submodels. However, as we show below, they need not all remain. Each individual growth parameter can have its own predictors at level-2, and one goal of model specification is to identify which level-2 variables are important predictors of which level-1 individual growth parameters. So, too, although each level-2 submodel may contain both fixed and random effects, both are not necessarily required. Sometimes hypothesizing a model that has fewer random effects will provide a more parsimonious representation of the data and clearer substantive insights into the research questions being posed.

In the data-analytic example that follows, we continue to ask whether race has an impact on change in mathematics achievement between 7th grade and 11th grade, but we now expand our analyses to include socio-economic status and gender as important controls. Model B in Table 22.2 includes SES as a level-2 predictor of both true initial status and rate of true change in mathematics achievement. Model C then removes the effect of race on the rate of true change. In Model D, the effect of FEMALE on both true initial status and rate of true change is included in the level-2 model, and in Model E, the model is again simplified by removing the effect of FEMALE on the rate of true change.

Interpreting the additional fitted models

We have already discussed fitted Model A, which includes AFAM as a predictor of both true initial status and rate of true change. In Model B, we now add SES to the level-2 model, including it as a predictor of both true initial status and rate of true change. There are therefore now six fixed effects
Table 22.2 Results of fitting a taxonomy of multilevel models for change to the mathematics achievement data (n = 1,322). Entries are estimates with standard errors in parentheses; a blank cell means the term is not in that model.

Parameter                                            Model A          Model B          Model C          Model D          Model E
Fixed effects
Initial status  Intercept  $\gamma_{00}$             53.02*** (0.26)  52.81*** (0.25)  52.82*** (0.25)  52.39*** (0.35)  52.40*** (0.35)
                AFAM       $\gamma_{01}$             −5.93*** (0.80)  −4.66*** (0.77)  −4.77*** (0.77)  −4.80*** (0.77)  −4.80*** (0.77)
                SES        $\gamma_{02}$                              3.62*** (0.34)   3.61*** (0.34)   3.62*** (0.34)   3.62*** (0.34)
                FEMALE     $\gamma_{03}$                                                                0.84~ (0.48)     0.82~ (0.48)
Rate of change  Intercept  $\gamma_{10}$             2.87*** (0.08)   2.85*** (0.08)   2.81*** (0.07)   2.85*** (0.11)   2.81*** (0.07)
                AFAM       $\gamma_{11}$             −0.48* (0.23)    −0.35 (0.24)
                SES        $\gamma_{12}$                              0.37*** (0.10)   0.40*** (0.10)   0.40*** (0.10)   0.40*** (0.10)
                FEMALE     $\gamma_{13}$                                                                −0.08 (0.146)
Variance components
Level-1: Within-person     $\sigma_\varepsilon^2$    37.17*** (0.86)  37.17*** (0.86)  37.16*** (0.86)  37.17*** (0.86)  37.16*** (0.86)
Level-2: In initial status $\sigma_0^2$              59.05*** (3.23)  52.46*** (2.98)  52.46*** (2.98)  52.30*** (2.97)  52.30*** (2.97)
         In rate of change $\sigma_1^2$              3.19*** (0.29)   3.13*** (0.29)   3.14*** (0.29)   3.14*** (0.29)   3.14*** (0.29)
         Covariance        $\sigma_{01}$             6.18*** (0.69)   5.50*** (0.66)   5.50*** (0.66)   5.51*** (0.66)   5.51*** (0.66)

~ p < 0.10; * p < 0.05; *** p < 0.001
Note: Full ML, SAS Proc Mixed.
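One way to read the variance components in Table 22.2 is to express each model's level-2 variance as a proportional reduction relative to Model A. The short sketch below shows that arithmetic. The helper name is hypothetical, and the statistic is a simple descriptive comparison that we compute from the table, not a quantity reported by SAS PROC MIXED.

```python
# Level-2 variance components copied from Table 22.2, keyed by model.
SIGMA2_0 = {"A": 59.05, "B": 52.46, "C": 52.46, "D": 52.30, "E": 52.30}
SIGMA2_1 = {"A": 3.19, "B": 3.13, "C": 3.14, "D": 3.14, "E": 3.14}


def proportional_reduction(baseline, comparison):
    """Share of the baseline variance that is no longer left unexplained."""
    return (baseline - comparison) / baseline


if __name__ == "__main__":
    for model in "BCDE":
        r0 = proportional_reduction(SIGMA2_0["A"], SIGMA2_0[model])
        r1 = proportional_reduction(SIGMA2_1["A"], SIGMA2_1[model])
        print(f"Model {model}: initial status {r0:.1%}, rate of change {r1:.1%}")
```

For Model B this reproduces the 11.2 percent decline in the initial-status variance discussed in the text, against a decline of under 2 percent for the variance in rate of change.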
to interpret. We begin by interpreting the parameter estimates in the fitted level-2 submodel for initial status. The estimated intercept for this first part of the level-2 model provides an estimate of true initial mathematics achievement when all predictors in that part of the level-2 model are set to zero. As we know, when AFAM equals 0 we are dealing with White students. SES equals 0 for students of average socio-economic status since this measure was standardized in preliminary analysis to a mean of zero. Therefore, we estimate that the average 7th grade mathematics achievement for White students of average socio-economic status is 52.81. The next parameter, $\gamma_{01}$, represents the effect of race on true initial status, controlling for socio-economic status. Here, we estimate that, controlling for the effects of SES, the true mathematics achievement of the average African-American 7th grader is 4.66 points lower than that of the average White 7th grader (p < 0.001). Therefore, while the effect of AFAM is slightly attenuated by controlling for SES, there remains a statistically significant effect of race on 7th grade true mathematics achievement. The final parameter in the level-2 submodel for true initial status is $\gamma_{02}$, representing the effect of SES, controlling for race. This parameter describes the difference in 7th grade true mathematics achievement for a one-unit difference in SES, for students of either race. We are not surprised to find a positive effect of SES: controlling for race, we estimate that average true mathematics achievement is 3.62 points higher for students whose SES is one point greater (p < 0.001).

Turning now to parameter estimates associated with the rate of true change in Model B, we find that the estimated rate of true change in mathematics achievement for White students of average SES is 2.85 (p < 0.001). While adding SES to the slope submodel has not impacted its estimated intercept, the effect of AFAM, while still negative, is no longer statistically significant. Controlling for SES, the average rate of true change no longer differs for African-American and White students. Our final parameter estimate, $\hat{\gamma}_{12}$, represents the estimated effect of SES itself, controlling for race. Again we are not surprised that this estimate is positive: for students of either race, on average, those with SES one point higher have true growth rates that are 0.37 points per year greater (p < 0.001).

Now examine the variance components associated with Model B. The statistically significant within-person variance component ($\hat{\sigma}_\varepsilon^2$) for Model B is identical to that of Model A, reinforcing the need to explore the potential inclusion of time-varying predictors at level-1. We anticipated stability like this in our estimates because we have added no additional predictors at level-1 between Models A and B (although estimates may vary inadvertently because of uncertainties arising from iterative estimation). The estimated level-2 variance components, however, do differ: $\hat{\sigma}_0^2$ declines by 11.2 percent from Model A (from 59.05 to 52.46). Because it is still statistically significant, however, potentially explainable residual variation in true initial status remains. The estimated variation in rate of true change declines only minimally from 3.19 to 3.13, and also remains statistically significant, suggesting the continued presence of explainable residual variation in rates of true change.

Because the average rate of true change in mathematics achievement does not differ for African-American and White students once SES is controlled, in Model C we remove AFAM as a predictor of rate of true change, while retaining it as a predictor of true initial status. The parameter estimates associated with both the fixed and random effects are essentially unchanged with the removal of AFAM as a predictor of rate of true change. To include the effect of gender, we use a similar approach, first adding FEMALE as a predictor of both true initial status and true slope (Model D); then, because we find no differences in the average rate of true change for girls and boys, we remove FEMALE as a predictor of rate of true change (Model E).

In interpreting Model E, we begin again by interpreting the model’s fixed effects. With FEMALE now a predictor of true initial status, the interpretation of the intercept term
for the initial status submodel has changed yet again. Now $\gamma_{00}$ represents the average true mathematics achievement for White, male (FEMALE = 0) students of average SES. Therefore, we estimate that the average 7th grade mathematics achievement for White male students of average socio-economic status is 52.40. The next parameter, $\gamma_{01}$, models the effect of race on true initial status, controlling for socio-economic status and now gender as well. We estimate that the 7th grade mathematics achievement of an African-American male from an average socio-economic background is 4.80 points lower than that of a comparable White student (p < 0.001). The effect of AFAM is essentially unchanged when we control for FEMALE in addition to SES. Similarly, the effect of SES on true initial status does not change when controlling for FEMALE. The final parameter in the level-2 submodel for true initial status is $\gamma_{03}$, representing the effect of FEMALE, controlling for race and socio-economic status. Average 7th grade mathematics achievement is almost one point higher for girls than boys of comparable race and socio-economic status, but since the p-value is slightly larger than 0.05, the effect is not statistically significant at the conventionally accepted 0.05 level. Nevertheless, we choose to retain FEMALE as a predictor of true initial status in our model, given its substantive importance as a predictor of mathematics achievement and the fact that the p-value is less than 0.10.

In Model E, FEMALE is not a predictor of rate of true change, a substantively interesting finding that suggests that the rate of change in mathematics achievement from 7th to 11th grade does not differ by gender. Our rate of true change submodel now includes only SES as a predictor. We estimate that the rate of true change in mathematics achievement for a student of average SES is 2.81 (p < 0.001), and that students whose SES is one unit higher have rates of change in mathematics achievement that are greater by 0.4 point per year (p < 0.001).

Finally, examining the associated variance components for Model E, we see that the only thing that has changed is the estimated variation in true initial status, which has declined only slightly from 52.46 in Model C to 52.30 in Model E. Because all of the variance components remain statistically significant, potentially explainable residual variation in true initial status and rate of true change remains for future consideration.

Displaying prototypical trajectories of change

For longitudinal analyses, we find that graphs of fitted trajectories for prototypical individuals are more powerful tools than numerical summaries for communicating our findings. In Figure 22.1, we presented plots of fitted individual growth trajectories for prototypical African-American and White students, using the estimates of the fixed effects from Model A to obtain estimates of true initial status and rate of true change for the two populations of students (equation (7)). We can extend these strategies to models with multiple predictors, as we have in Model E.

Figure 22.2 presents fitted trajectories derived from Model E for four prototypical students: African-American and White students of different SES. We have selected prototypical values of SES that correspond to the sample mean plus and minus one standard deviation (0.735 and −0.693, respectively)
Table 22.3 Fitted values of the individual growth parameters from Model E for four prototypical individuals

Race              SES   Initial status ($\hat{\pi}_{0i}$)                                  Rate of change ($\hat{\pi}_{1i}$)
White             Low   52.401 − 4.798(0) + 3.616(−0.693) + 0.818(1) = 50.713             2.808 + 0.395(−0.693) = 2.534
White             High  52.401 − 4.798(0) + 3.616(0.735) + 0.818(1) = 55.877              2.808 + 0.395(0.735) = 3.098
African-American  Low   52.401 − 4.798(1) + 3.616(−0.693) + 0.818(1) = 45.915             2.808 + 0.395(−0.693) = 2.534
African-American  High  52.401 − 4.798(1) + 3.616(0.735) + 0.818(1) = 51.079              2.808 + 0.395(0.735) = 3.098
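The substitutions in Table 22.3 follow a single pattern, which the sketch below makes explicit. It is a minimal illustration that plugs the Model E fixed-effect estimates, at the three-decimal precision used in the table, into the level-2 submodels. The function and constant names are ours, and FEMALE defaults to 1 because the table is computed for female students.

```python
# Fixed-effect estimates for Model E, at the precision used in Table 22.3.
G00, G01, G02, G03 = 52.401, -4.798, 3.616, 0.818   # initial status submodel
G10, G12 = 2.808, 0.395                             # rate of change submodel

# Prototypical SES values quoted in the text (sample mean -/+ one SD).
SES_LOW, SES_HIGH = -0.693, 0.735


def growth_parameters(afam, ses, female=1):
    """Return fitted (initial status, annual rate of change) for a prototype."""
    initial_status = G00 + G01 * afam + G02 * ses + G03 * female
    rate_of_change = G10 + G12 * ses  # Model E has no AFAM or FEMALE slope effect
    return initial_status, rate_of_change


if __name__ == "__main__":
    for race, afam in (("White", 0), ("African-American", 1)):
        for label, ses in (("Low", SES_LOW), ("High", SES_HIGH)):
            pi0, pi1 = growth_parameters(afam, ses)
            print(f"{race:17s} {label:4s} {pi0:7.3f} {pi1:6.3f}")
```

Because the rate-of-change submodel in Model E contains only SES, the two racial groups share the same fitted slope at each SES level; only the fitted initial status differs between them.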
[Figure 22.2 plot: estimated math achievement (vertical axis, 40 to 70) against grade (horizontal axis, 7 to 11), with four fitted lines labeled, from highest to lowest: White, high SES; African-American, high SES; White, low SES; African-American, low SES.]

Figure 22.2 Fitted growth trajectories for prototypical African-American and White students of high and low socio-economic backgrounds
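The values behind a plot such as Figure 22.2 can be generated directly from the fitted growth parameters. The sketch below is illustrative only: it copies the intercepts and slopes from Table 22.3, evaluates each linear trajectory at grades 7 to 11, and compares the high-versus-low SES gap in grades 7 and 11. The dictionary layout and function names are our own choices.

```python
# Fitted growth parameters for the female prototypes, copied from Table 22.3
# as (initial status, annual rate of change).
PROTOTYPES = {
    ("White", "low SES"): (50.713, 2.534),
    ("White", "high SES"): (55.877, 3.098),
    ("African-American", "low SES"): (45.915, 2.534),
    ("African-American", "high SES"): (51.079, 3.098),
}
GRADES = range(7, 12)


def fitted_value(initial_status, rate, grade):
    """Fitted achievement under the linear level-1 model in (GRADE - 7)."""
    return initial_status + rate * (grade - 7)


def ses_gap(race, grade):
    """High-minus-low SES difference in fitted achievement within one race."""
    high = fitted_value(*PROTOTYPES[(race, "high SES")], grade)
    low = fitted_value(*PROTOTYPES[(race, "low SES")], grade)
    return high - low


if __name__ == "__main__":
    for (race, ses), (pi0, pi1) in PROTOTYPES.items():
        values = [round(fitted_value(pi0, pi1, g), 1) for g in GRADES]
        print(race, ses, values)
    ratio = ses_gap("White", 11) / ses_gap("White", 7)
    print(f"SES gap in grade 11 relative to grade 7: {ratio:.2f}")
```

The SES gap grows from 5.164 points in grade 7 to 7.420 points in grade 11, a ratio of about 1.44, while the gap between racial groups at a given SES stays fixed at 4.798 points in every grade.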
and chose to present trajectories for females only. Since the gender effect is small, the plot would be essentially identical for males. We compute the fitted values of the individual growth parameters for these prototypical individuals as shown in Table 22.3.

Notice that the fitted trajectories of mathematics achievement differ by both race and socio-economic status, as anticipated. At each level of SES, the fitted trajectory for White students is consistently elevated above that of African-American students, and the differential in mathematics achievement between White and African-American students of the same SES does not differ over time. The effect of SES is more complex. Within racial groups, the trajectory for students of high socio-economic status is above that of students of low socio-economic status across all grades. Furthermore, the increase in mathematics achievement over time is more rapid for the high socio-economic status students than for their low socio-economic status peers, with the difference in estimated mathematics achievement between high and low SES 11th graders of the same race over 40 percent higher than the difference between these students in 7th grade.

Extensions of the multilevel model for change

While it permits considerable complexity in analysis, as evidenced in Table 22.2, the example that we have presented in this chapter has two structural features that simplify analysis. The example is both balanced and time-structured: all students are assessed on exactly five occasions and these occasions (7th grade to 11th grade) are identical across individuals. Our analyses are also straightforward in that we have used only: (1) time-invariant predictors that describe immutable characteristics of the students (except for TIME itself); and (2) a representation of TIME that forces the level-1 individual growth parameters to represent ‘initial status’ and ‘linear rate of change.’ However, the multilevel model for change is very flexible and can be used to address more complex problems, as we now describe.

Variably spaced measurement occasions

Researchers often collect longitudinal data in which the actual measurement occasions differ across individuals. These differences may result from the realities of fieldwork and
data collection. For example, when studying datasets are described in Singer and Willett
the psychological consequences of unemploy- (2003).
ment, Ginexi et al. (2000) designed a time-
structured study, with interviews scheduled at The impact of time-varying predictors
1, 5, and 11 months after job loss. Once in the A time-varying predictor is a variable whose
field, however, the interview times varied sub- values may differ over time. Some time-
stantially around these targets, so Ginexi and varying predictors have values that change
colleagues chose to use the number of days naturally; others have values that change
since job loss as a metric for the measurement by design. For example, in the mathematics
of time in their study. Each individual in their achievement data, students’ attitudes toward
study, therefore, had a unique data collection mathematics change naturally over time. We
schedule: 31, 150, and 365 days for the first would expect students with more positive
person in the dataset; 23, 162, and 401 days attitudes about mathematics to also have
for the second person; and so on. higher levels of mathematics achievement.
Differences in the actual measurement In specifying a multilevel model for change
occasions across individuals may also occur that includes a time-varying predictor, we
by design. This is the case, for example, add the time-varying predictor to the level-1
in accelerated cohort or accelerated longi- submodel either as a main effect or as
tudinal designs, in which multiple cohorts an interaction with time, or both. Thus,
of different ages are followed longitudinally. conceptually, we may still interpret the effects
Each cohort must have at least one age that of the time-varying predictor in terms of its
overlaps with another cohort and then a single impact on true initial status and/or rate of
growth trajectory is estimated, extending from true change. However, since the time-varying
the youngest age to the oldest (Collins, 2006). predictor is added to our level-1 submodel,
The advantage of an accelerated cohort design we can also specify any additional main
is that change can be modeled over a longer effect and interaction with time as either a
temporal period using fewer waves of data. fixed or a random effect, thereby allowing
The disadvantage is that the researcher must us to investigate whether these effects are
rely more heavily on assumptions about the constant or vary across members of the
shape of the change trajectory. Miyazaki population.
and Raudenbush (2000) discuss important While time-varying predictors offer excit-
assumptions of the analysis of data from ing analytic possibilities to researchers, many
accelerated longitudinal designs. present interpretive difficulties stemming
from the problem of reciprocal causation
Varying numbers of measurement (endogeneity), as in the case of our example
occasions of mathematics achievement and attitudes
A major advantage of the multilevel model for toward mathematics: if X is correlated with
change is that it is easily fit to unbalanced data. Y, can you conclude that X causes Y or
In our mathematics achievement data, the is it possible that Y causes X? To address
analytic sample used included only students this problem it is important to first assess
with five waves of data; however, in the whether inferences are clouded by reciprocal
original dataset there are many additional causation. Second, if your data allow, con-
students with fewer waves of data. It is sider coding time-varying predictors so that
straightforward to fit the multilevel model their values in each record of the person-
for change in the larger unbalanced dataset. period dataset refer to the previous point in
With severely unbalanced datasets, however, chronological time.
there can be problems of convergence in the
iterative methods used by standard computer Modeling discontinuous individual change
packages to fit the models to the data. Practical Not all individual change trajectories are
problems that may arise when analyzing such continuous functions of time. If you believe
that individual change trajectories might suddenly shift in elevation and/or slope, your level-1 model can reflect this hypothesis. Doing so allows you to test ideas about how the trajectory’s shape might be disrupted, with time. To postulate a discontinuous individual change trajectory, it is important to hypothesize not just why the shift might occur, but also when. The level-1 individual growth model can then include one (or more) time-varying predictor(s) that describe whether and, if so, when each person experiences the hypothesized shift. In some studies, the precipitating event occurs at the same exact moment for everyone. In other studies, the precipitating event occurs at different times for different people and some participants may not experience the event at all. Discontinuities can immediately affect a trajectory’s elevation, slope, or both, and may be modeled as either fixed or random effects. Furthermore, each person’s trajectory can be divided into discrete epochs by adding multiple discontinuities, allowing the trajectories to differ in elevation (and perhaps slope) during each epoch.

Modeling nonlinear individual change

In addition to using the multilevel model for change to model discontinuous change, we may also use it to model smooth nonlinear individual change trajectories. The easiest strategy for fitting such models is to transform either the outcome, or TIME, in the level-1 submodel so that a growth model that specifies linear change in the transformed outcome or predictor will suffice. You can also model curvilinear change by including several level-1 predictors to collectively represent a polynomial function of time, which can capture a wide array of complex patterns of change over time. Finally, it is possible to specify and fit individual growth models that are fully nonlinear in the parameters themselves, such as the logistic and hyperbolic trajectories of change. Singer and Willett (2003) provide strategies for selecting optimal transformations, polynomial functions, and fully nonlinear individual growth models.

Modeling change using covariance structure analysis

The multilevel model for change can also be mapped directly onto the general mathematical framework provided by covariance structure analysis, an analytic approach known as latent growth modeling. At its core, a latent growth model is essentially a multilevel model for change. But, not only does the mapping of the multilevel model for change onto the general covariance structure model provide an alternative approach to model specification and estimation, the flexibility of the general covariance structure model permits the modeling of simultaneous change in several dimensions, and other important extensions. See Singer and Willett (2003), Willett and Sayer (1994), and Curran (2003) for detailed descriptions of latent growth modeling.

Concluding comments

The multilevel model for change offers empirical researchers a wealth of data-analytic opportunities with their longitudinal data. The approach can accommodate any number of waves of longitudinal data, the occasions of measurement need not be equally spaced, and different participants can have different data collection schedules. Individual change can be represented by a variety of substantively interesting hypothesized trajectories, not only linear functions presented but also curvilinear and discontinuous functions. In addition to time-invariant predictors of change, we can also estimate the effects of time-varying predictors, whose effects may either be fixed or allowed to vary randomly across individuals in the population. Not only can multiple predictors of change be included in a single analysis, change in multiple domains can be investigated simultaneously. Finally, the multilevel model for change can be used to analyze intensive longitudinal data, where there may be nearly continuous records of outcomes (Collins, 2006). Readers wishing to learn more about the multilevel model for change should consult recent books devoted to the topic,
including Diggle et al. (2002); Fitzmaurice et al. (2004); Hedeker and Gibbons (2006); Raudenbush and Bryk (2002); Singer and Willett (2003); Snijders and Bosker (1999); Verbeke and Molenberghs (2000); Walls and Schafer (2006); and Weiss (2005).

REFERENCES

Ai, X. (2002). Gender differences in growth in mathematics achievement: Three-level longitudinal and multilevel analyses of individual, home, and school influences. Mathematical Thinking and Learning, 4, 1–22.
Collins, L. M. (2006). Analysis of longitudinal data: The integration of theoretical model, temporal design, and statistical model. Annual Review of Psychology, 57, 505–528.
Cronbach, L. J., & Furby, L. (1970). How should we measure ‘change’ - or should we? Psychological Bulletin, 74, 68–80.
Curran, P. J. (2003). Have multilevel models been structural equation models all along? Multivariate Behavioral Research, 38, 529–569.
Curran, P. J., Stice, E., & Chassin, L. (1997). The relation between adolescent and peer alcohol use: A longitudinal random coefficients model. Journal of Consulting and Clinical Psychology, 65, 130–140.
Diggle, P., Liang, K.-Y., & Zeger, S. (2002). Analysis of longitudinal data, 2nd edition. New York, NY: Oxford.
Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2004). Applied longitudinal analysis. New York, NY: Wiley.
Ginexi, E. M., Howe, B. W., & Caplan, R. D. (2000). Depression and control beliefs in relation to reemployment: What are the directions of effect? Journal of Occupational Health Psychology, 5, 323–336.
Hedeker, D., & Gibbons, R. D. (2006). Longitudinal data analysis. New York, NY: Wiley.
Keiley, M. K., Bates, J. E., Dodge, K. A., & Pettit, G. S. (2000). A cross-domain growth analysis: Externalizing and internalizing behavior during 8 years of childhood. Journal of Abnormal Child Psychology, 28, 161–179.
Kreft, I. G. G., & de Leeuw, J. (1998). Introducing multilevel modeling. Thousand Oaks, CA: Sage.
Miller, J. D., Kimmel, L., Hoffer, T. B., & Nelson, C. (2000). Longitudinal study of American youth: User’s manual. Chicago, IL: International Center for the Advancement of Scientific Literacy, Northwestern University.
Miyazaki, Y., & Raudenbush, S. W. (2000). Tests for linkage of multiple cohorts in an accelerated longitudinal design. Psychological Methods, 5, 44–63.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods, 2nd edition. Thousand Oaks, CA: Sage.
Rogosa, D. R., Brandt, D., & Zimowski, M. (1982). A growth curve approach to the measurement of change. Psychological Bulletin, 90, 726–748.
Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. New York, NY: Oxford.
Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel analysis: An introduction to basic and advanced multilevel modeling. London: Sage.
Verbeke, G., & Molenberghs, G. (2000). Linear mixed models for longitudinal data. New York, NY: Springer.
Walls, T. A., & Schafer, J. L. (2006). Models for intensive longitudinal data. New York, NY: Oxford.
Weiss, R. (2005). Modeling longitudinal data. New York, NY: Springer.
Willett, J. B. (1988). Questions and answers in the measurement of change. In E. Rothkopf (Ed.), Review of research in education (1988–1989) (pp. 345–422). Washington, DC: American Education Research Association.
Willett, J. B., & Sayer, A. G. (1994). Using covariance structure analysis to detect correlates and predictors of individual change over time. Psychological Bulletin, 116, 363–381.
23
Latent Variable Models of Social Research Data
Rick H. Hoyle

During the writing of this chapter, Rick Hoyle was supported by grant P20-DA017589 from the National Institute on Drug Abuse.

LATENT VARIABLE MODELS IN SOCIAL RESEARCH

Latent variable models concern the presence, definition, and/or influence of constructs that either cannot be observed or characteristics that are, in principle, observable but that have not been directly observed in a given dataset (Bollen, 2002; MacCallum & Austin, 2000; Sobel, 1994). The focus of this chapter is recent advances in the use of linear structural equation modeling with continuous variables to evaluate such models in social research. Alternative approaches to evaluating latent variable models not covered in the chapter include latent class analysis (Clogg, 1995), latent transition analysis (Collins & Wugalter, 1992), latent profile analysis (Gibson, 1959), latent logit modeling (McCutcheon, 1994), and growth mixture modeling (Muthén & Shedden, 1999). Variants of most of the models described in the chapter could, in principle, be evaluated using one or more of these alternative strategies.

After presenting a brief history of structural equation modeling, I provide an overview of the technique, with a particular focus on the representation of models in diagrams. I do not provide technical details or outline the steps involved in implementing a structural equation modeling analysis. Rather, I use the brief overview as a foundation for presenting, in conceptual terms, specific latent variable models relevant for social research. Even though I touch on basic models, the primary focus is more complex models that take full advantage of the capabilities of structural equation modeling. I conclude the chapter with a brief section on the limitations of the technique.

History

The origin of structural equation modeling typically is traced to the work of population geneticist, Sewall Wright, best known as
a pioneer in the synthesis of genetics and evolutionary theory (e.g. Wright, 1968). Wright invented the statistical method of path analysis, a graphical model in which the linear relations between variables are expressed in terms of coefficients that are derived from the correlations between them (Wright, 1934). Even though Wright’s approach was limited by the availability of suitable estimators of those coefficients in complex models, he foreshadowed many important developments in structural equation modeling that did not come into wide use for another 50 years (Tomer, 2003).

The potential value of Wright’s model for social research was not immediately recognized; it was not until the 1960s that applications of path analysis to social research data were described. The principal figures in early applications of path analysis to social research data were sociologists Blalock (1961, 1964) and Duncan (1966, 1969). Duncan and Goldberger, an econometrician, integrated the sociological approach to path analysis with the simultaneous equations approach in economics (e.g. Goldberger & Duncan, 1973) and the factor analytic approach in psychology (e.g. Duncan, 1975; Goldberger, 1971), yielding the integrated approach to data analysis now known as structural equation modeling.

This general model was formalized and extended in the 1970s by Jöreskog (1973), Keesling (1972), and Wiley (1973), producing what became known as the LISREL (Linear Structural RELations) model. This model includes two parts: one specifying the relations between indicators and latent variables (the measurement model), and the other specifying the relations between latent variables (the structural model) (Anderson & Gerbing, 1988). The LISREL model served as the basis for the LISREL software program, which, by the release of Version 3 in the mid-1970s, allowed substantively oriented social researchers to specify, estimate, and test latent variable models.

Since the mid-1970s, most significant developments in structural equation modeling have involved improvements to or extensions of the framework developed by Jöreskog, Keesling, and Wiley.¹ These include estimators for non-normal and categorical data (e.g. Bentler, 1983; Browne, 1974, 1984; Muthén, 1984), and various approaches to evaluating model fit (e.g. Bentler, 1990; Bentler & Bonett, 1980; Browne & Cudeck, 1993; Steiger & Lind, 1980). Also, various notation systems (e.g. Bentler & Weeks, 1980; Jöreskog, 1973; McArdle & McDonald, 1984) and software programs (e.g. Arbuckle, 2003; Bentler, 1995; Jöreskog & Sörbom, 1999; Muthén & Muthén, 2006) now offer multiple approaches to specifying and communicating about structural equation models.

Current status

The use of structural equation modeling in the social sciences is now widespread. Social researchers have access to a growing number of social science-oriented textbooks and reference volumes targeting beginning (e.g. Hoyle, 1995; Kline, 2005; Maruyama, 1998; Schumacker & Lomax, 2004; Tenko & Marcoulides, 2000), intermediate (e.g. Bollen & Long, 1993; Hancock & Mueller, 2006; Kaplan, 2000; Wansbeek & Meijer, 2000), and advanced users (e.g. Bollen, 1989; Marcoulides & Schumacker, 2001), as well as volumes focused on applications (e.g. Bollen & Curran, 2006; DuToit et al., 2001) and software (e.g. Byrne 2001, 2006; Diamantopoulos & Siguaw, 2000).

OVERVIEW

Because latent variable models often include many variables and parameters, fully specifying and describing them can be a challenge. The principal ‘languages’ of structural equation modeling are path diagrams and statistical equations, the latter often involving matrix equations and extensive use of Greek characters. For our purposes, path diagrams, described in the next section, provide a relatively straightforward and efficient means of presenting latent variable models suitable for estimation using structural equation
modeling. This material is followed by a brief description of estimation and the logic of model fit in applications of structural equation modeling.

Path diagrams

A convenient and informative means of depicting a latent variable model is the path diagram (McArdle & McDonald, 1984; McDonald & Ringo Ho, 2002). An example appears in Figure 23.1. This path diagram includes all the elements necessary for depicting even the most complex models. The ovals represent latent variables, sources of influence not measured directly. The large ovals correspond to substantive latent variables, or factors. The large oval labeled F1 is an independent variable: it is not influenced by other variables in the model. The large ovals labeled F2 and F3 are dependent variables: their variance is, in part, accounted for by other variables in the model. Paths run from each of these latent variables to their indicators, represented by squares labeled x1 to x11. These paths are either labeled ‘1,’ which means the factor loading has been fixed at this value, or *, indicating that the factor loading is to be estimated from the data. Variance in each indicator not attributable to the latent variable is allocated to measurement error, or uniqueness, indicated by the small ovals labeled u1 to u11. Associated with each of these ellipses is a curved, two-headed arrow and an *, which indicates a variance. The three latent variables are connected by directional arrows. Associated with each is a path coefficient, accompanied by a *, indicating the magnitude and direction of influence of one latent variable on another. Small ovals also are associated with the latent dependent variables. These indicate variance in the latent variables, labeled d2 and d3, not accounted for by other latent variables in the model. Finally, there is a variance, indicated by *, associated with the latent independent variable.

Figure 23.1 Path diagram illustrating the specification of latent independent and dependent variables and the designation of free parameters
[Diagram not reproduced. It shows latent variables F1 (indicators x1 to x3), F2 (indicators x4 to x9), and F3 (indicators x10 and x11), uniquenesses u1 to u11, and disturbances d2 and d3; one loading per factor is fixed at 1 and free parameters are marked with *.]

As is true of most models, this model includes a combination of free and fixed parameters. Free parameters are indicated by *s. The location of fixed parameters is less obvious. It is apparent that there is a single fixed loading on each latent variable (Steiger, 2002, provides a clear discussion of the rationale behind this aspect of the specification). The remaining fixed loadings involve paths that could have been included but were not. For instance, there is no path from F1 to x4.² Implicitly, this path has been fixed to zero. Also, there are no covariances between uniquenesses, meaning these parameters are implicitly fixed at zero as well. Fixed parameters in the form of excluded paths are desirable in a model, for they contribute to parsimony. They also can explain the inadequacy of a poor fitting model. Hence, when processing path diagrams, it is important to take note of paths that have been omitted, indicating that the accompanying parameters have been fixed to zero.

The model displayed in Figure 23.1 is a covariance structure model, the most common type of model estimated using structural equation modeling. The focus of such models is accounting for the covariances among variables. As will become evident later in the chapter, it also is possible, and often desirable, to attempt to account for the observed variable means as well. Models with this focus include a structured means component (e.g. Thompson, 2006). Two examples of models with structured means components are presented later in the chapter.

Estimation and testing

The particular construal of observed and latent variables in combination with the array of free and fixed parameters shown in Figure 23.1 constitutes a model specification. The specified model is a hypothesis regarding the mechanisms that produced the data. In this instance, it is a parsimonious account of those mechanisms. Whereas the observed data encompasses 66 parameter estimates (55 covariances and 11 variances), the specified model encompasses only 25 (8 loadings, 11 uniquenesses, 1 variance, 3 directional paths, and 2 disturbances). Other models could be specified that include more or fewer latent variables, a different number and pattern of directional paths, and/or covariances among uniquenesses. It would be important to establish not only that the hypothesized model provides a suitable account of the data, but that it provides a better fit than plausible alternatives (MacCallum, 2003).

Even though a specified model might have a strong grounding in theory and offer a compelling conceptual model of the mechanisms that produced the observed data, there is no guarantee that, statistically speaking, it does. The statistical tenability of a model is evaluated by using observed data to estimate values for free parameters, then evaluating the degree to which the data implied by the model, including these parameter estimates, corresponds to, or fits, the observed data. Parameter estimates typically are obtained using the maximum likelihood estimator, which produces values that maximize the likelihood of the data given the specified model (Myung, 2003).

In most applications of structural equation modeling, the relevant data are elements in the variance-covariance matrix of the observed variables. This matrix is compared to a theoretical matrix produced by substituting the estimated values for the free parameters in the structural equations and solving for the covariances and variances (Bollen, 1989). To the extent that the observed and implied covariance matrices do not, within sampling error, differ, the specified model is said to
fit the data. Because the assumptions of the basic statistical test of whether these matrices differ are rarely met in practice, a host of adjunct fit indices have been developed and informal criteria for applying them proposed (Hu & Bentler, 1995, 1999). When, by well-justified criteria, a model yields an acceptable account of the data, parameter estimates can be interpreted in a manner not unlike the interpretation of regression coefficients or factor loadings.

Before turning to a presentation of specific model specifications of potential interest to social researchers, it is important to establish that not all models that are specified can be estimated. For a model to be estimated, it must be specified in such a way that it is identified. Conceptually speaking, identification concerns the integrity of estimates of free parameters in a model. If a model is identified, a unique estimate for each and every free parameter can be obtained given the criteria of the estimator. If no value, or more than one value, of one or more free parameters can be obtained, then the model is unidentified and estimates of parameters are not valid. Even though most applications typical of social research yield models that are identified, it is wise to evaluate the identification status of a model before estimating. Even though application of a number of relatively straightforward identification rules can provide some assurance that the model is identified, the definitive evaluation of the identification status of individual free parameters and the model as a whole requires solving the structural equations using the variances and covariances (Bollen, 1989).

SPECIFIC MODELS FOR SOCIAL RESEARCH DATA

In this section I present a series of specific latent variable models relevant for social research. The models are presented in three groups: measurement models, which focus on latent variables but not the relations between them; models appropriate for cross-sectional data; and models appropriate for longitudinal data. Even though the focus is on innovative models that are not yet in wide use in social research, the presentation of each group is prefaced by a description of basic models of that type.

Measurement

As noted earlier, model specifications might comprise one or both of two components: measurement and structural (Anderson & Gerbing, 1988). The measurement component concerns the relations between latent variables and their indicators, and the structural component concerns the directional relations between the latent variables. Models need not include both components and, in fact, models that include only the measurement equations are relatively common. A focus strictly on the measurement component typically is motivated either by a desire to test specific hypotheses about the latent structure of a set of indicators or a need to ensure the integrity of a set of latent variables before testing hypotheses about the relations between them. The use of structural equation modeling in this way is referred to as confirmatory factor analysis (Hoyle, 2000).

Basic model

The most basic application of structural equation modeling to matters of measurement is the first-order factor model. In this model, one or more latent variables are predicted to explain the commonality among a set of indicators. Returning to Figure 23.1, if the directional paths between latent variables were replaced by curved arrows indicating covariance, the model would be a basic first-order measurement model. Because there are no directional paths between latent variables, all latent variables are, in effect, independent variables. To illustrate, Funk (1999) used data from the National Election Studies to investigate the latent structure of trait ratings of presidential nominees. The hypothesized three-factor model proved superior to one- and two-factor models and held across all nominees for which data were available.
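
As a rough illustration of the bookkeeping behind these specifications, the sketch below tallies the free parameters reported for the Figure 23.1 model and for its first-order measurement counterpart, in which the directional paths among the latent variables are replaced by covariances. The helper names are hypothetical, and the measurement-model tally (three factor variances and three factor covariances) is an assumption inferred from the description rather than a count given in the chapter. Having no more free parameters than observed variances and covariances is only a necessary condition for identification, not a sufficient one.

```python
# Illustrative bookkeeping only; function and dictionary names are hypothetical.

def observed_moments(n_indicators):
    """Number of distinct variances and covariances among the indicators."""
    return n_indicators * (n_indicators + 1) // 2


def degrees_of_freedom(n_indicators, free_parameters):
    """Observed moments minus the total number of free parameters."""
    return observed_moments(n_indicators) - sum(free_parameters.values())


# Free parameters of the Figure 23.1 specification, as counted in the text.
structural_model = {
    "loadings": 8,           # 11 indicators minus one fixed loading per factor
    "uniquenesses": 11,
    "factor_variances": 1,   # variance of the latent independent variable F1
    "directional_paths": 3,
    "disturbances": 2,       # d2 and d3
}

# Assumed first-order measurement version: the three directional paths become
# three factor covariances, and all three factors have free variances.
measurement_model = {
    "loadings": 8,
    "uniquenesses": 11,
    "factor_variances": 3,
    "factor_covariances": 3,
}

moments = observed_moments(11)                          # 55 covariances + 11 variances
df_structural = degrees_of_freedom(11, structural_model)
df_measurement = degrees_of_freedom(11, measurement_model)

print(moments, df_structural, df_measurement)
```

Under these assumptions both specifications use 25 free parameters against 66 observed variances and covariances, leaving 41 degrees of freedom each; the two versions differ in how the relations among the factors are interpreted, not in how many parameters they spend.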
An advantage of this basic application over traditional methods such as exploratory factor analysis is that competing models can be formally compared, specific aspects of models (e.g. correlations between factors) can be formally evaluated, and adjustments can be made to accommodate covariation among indicators not explained by the latent variables (i.e. correlated uniquenesses) or indicators influenced by more than one latent variable (i.e. cross loadings). Even though this approach to factor analysis is sometimes portrayed as contrasting sharply with exploratory factor analysis, it is possible to relax many of the restrictions on the standard confirmatory factor model (e.g. simple structure) and, in so doing, approximate applications of exploratory factor analysis (Hoyle & Duvall, 2004).

Higher-order factor models

For measurement models that specify four or more first-order factors, it is possible to test hypotheses about sources of commonality that underlie correlations among the factors. In so doing, one, in effect, combines a confirmatory factor analysis of the observed variables with a confirmatory factor analysis of the factors. As would be the case for factors at the first order, factors at the second order are a function of commonality; in this instance, commonality among the first-order factors. Also, as would be the case in terms of observed variables at the first-order level, at least four first-order factors are necessary in order to allow for a test of that portion of the model.³ For example, Hoyle (1991) examined the second-order structure of a 20-item measure of self-esteem designed to yield four first-order factors corresponding to self-esteem domains (e.g. social competence, physical appearance). He reasoned that the correlation among the first-order factors could be attributed to a general self-esteem factor, which would be evidenced by a single second-order factor. The analyses indicated that the second-order model provided a good account of the data and, importantly, provided a better account than a first-order model with correlated factors. In principle and with enough indicators and factors, one could estimate third-order or higher models; however, in practice, such models are rare.

Models of measurement invariance

When relations between latent variables or mean levels on constructs represented by those latent variables are to be compared across samples or within a sample across time, a key concern is whether the meaning of the latent variables is consistent across levels of the dimensions on which they are to be compared such as nationality (e.g. Steenkamp & Baumgartner, 1998), measurement modality (e.g. Deutskens et al., 2006), and age (Pentz & Chou, 1994). To the extent that the measurement model for a latent variable is consistent across samples or time, it is invariant with respect to measurement. Despite the obvious importance of measurement invariance, it is rarely evaluated (Vandenberg & Lance, 2000).

In order to illustrate the various aspects of measurement invariance and how they are evaluated, it is useful to consult a path diagram. Displayed in Figure 23.2 is a single model with two correlated latent variables that is specified for two levels, a and b, of some dimension of interest (e.g. ethnicity, age). Note the presence of paths that are not present in the model shown in Figure 23.1. In the typical application of structural equation modeling, all variables are rescaled as deviations from their means, thereby setting intercepts in the measurement equations and means of the latent variables to zero. In tests of measurement invariance, the estimated values of these constants often are of interest. The triangle in the center of the two-factor model is a constant affecting the indicators and the latent variables. Paths running from the constant to the indicators correspond to intercepts. Paths running from the constant to the latent variables correspond to means.

Note that every parameter in the two-factor model on the left has a corresponding parameter in the two-factor model on the
right. These parameters are estimated from the augmented moment matrix, which adds means to the covariance matrix, and adds the mean structure to the standard covariance structure in the model. If the model were fully invariant for a and b, every pair of parameters would be equivalent. Rather than comparing each pair of parameters individually, invariance analyses usually involve comparing sets of parameters and doing so in a systematic manner (Widaman & Reise, 1997). The result is a determination of whether the observed variables reflect similar constructs, and therefore can be compared, across groups or time (Byrne et al., 1989; Steenkamp & Baumgartner, 1998).

Figure 23.2 Path diagram illustrating parameters that can be compared in studies of measurement invariance
[Diagram not reproduced. It shows the same two-factor model at two levels: factors F1a and F2a with indicators x1a to x6a, and factors F1b and F2b with indicators x1b to x6b, each with a constant (triangle) whose paths represent intercepts and latent means; free parameters are marked with *.]

Multitrait-multimethod models

The measurement models described to this point decompose variance in observed variables into two components: variance shared with other indicators of a single latent variable and uniqueness. It is, however, possible to further decompose variance by accounting for multiple sources of commonality. Such is the case in latent variable models of multitrait-multimethod data (Campbell & Fiske, 1959), which allow for the disentanglement of variance in observed variables attributable to what they represent from variance attributable to how they were measured. The multitrait-multimethod matrix is a covariance matrix comprising data on two or more characteristics obtained using two or more methods. For instance, McPherson and Rotolo (1995), in a study of the composition of voluntary groups, obtained data on four characteristics of such groups (e.g. group size, age composition) provided by three sources (e.g. group member, observer). Using the language of multitrait-multimethod analysis, in this example, group characteristics are ‘traits’ and sources are ‘methods.’ In the prototypic model specification, each observed score is influenced by a trait factor, a method factor, and a uniqueness component (Marsh & Grayson, 1995). Variance in the observed variables is decomposed into a portion attributable to the construct regardless of how it is measured (monotrait-heteromethod), a portion attributable to how it was measured without reference to the constructs (heterotrait-monomethod), and a portion attributable neither to the trait nor the method (uniqueness). Obtaining estimates of parameters in the prototypic model can be difficult, but alternative, more robust specifications have been proposed (Kenny & Kashy, 1992). Latent variable models of multitrait-multimethod data provide useful
information about the reliability and validity of observed variables.

Trait-state-error models

Like multitrait-multimethod models, trait-state-error models posit the influence of two latent variables on each observed variable (Cole et al., 2005; Kenny & Zautra, 1995). The univariate trait-state-error model is a sophisticated measurement model that, in effect, decomposes variance in a construct measured on four or more occasions into three components. The trait component is that part that does not change over time (the autoregressive component in panel and time-series designs). The state component is that portion of the variance that is reliable but variable over time. The error component is that portion of variance that is not reliable over time. For example, Zautra et al. (1995) obtained 10 monthly measures of pain and psychological distress. Their trait-state-error model revealed that 60 percent of the variance in pain and 75 percent of the variance in psychological distress were stable and therefore trait-like over the year of their study. Importantly, however, 35 percent of the variance in pain and 18 percent of the variance in psychological distress could be attributed to reliable variance at each assessment. In a bivariate form of the model, they were able to study the directional influences of pain and distress on each other, focusing only on reliable variance in the variables subject to change over time (i.e. states). The trait-state-error model is valuable both for the information it provides regarding the nature of variability on a construct and as a means of examining the causal influence of components of constructs subject to change over time.

Cross-sectional

I now turn to latent variable models that focus on the relations between latent variables. In such models, we assume that the specification of the relations between indicators and latent variables has been evaluated and deemed adequate, allowing the focus to shift to the structural portion of the model. Referring back to Figure 23.1, this focus concerns the directional relations between the latent variables, estimated by the *s on the directional paths, and the disturbance terms, d1 and d2. In the remainder of this section, I focus on models for data gathered at a single point in time.

Basic model

The most basic structural model includes one latent independent and one latent dependent variable (e.g. F1 and F3 in Figure 23.1). Such a model is equivalent to a simple regression model except that neither the predictor nor the outcome reflects sources of error that vary across the indicators (DeShon, 1998). Thus, latent variable models overcome a critical shortcoming of traditional approaches to modeling directional effects. Latent variable models also provide a means of evaluating the effects of independent variables on multiple dependent variables while also evaluating the effects of dependent variables on each other. For instance, in Figure 23.1, F1 affects both F2 and F3, which are specified as related due to the directional influence of F2 on F3. Importantly, however, latent variable models do not overcome the significant limitation of cross-sectional data for tests of directional effects. For instance, if data on the 11 observed variables in Figure 23.1 were gathered at a single point in time, the arrows between the latent variables could be reversed with no change in model fit (MacCallum et al., 1993). Thus, it is not possible to test in a definitive manner the direction of influence between variables measured at a single point in time (Gollob & Reichardt, 1991). This raises the question of why one would use structural equation modeling on cross-sectional data when more familiar models are available.

Even though the ability to model predictors and outcomes as latent variables cannot address the directionality criterion associated with causal inferences, it provides significant benefits for addressing the two remaining criteria: association and isolation (Bollen, 1989). In terms of association, the removal of some forms of error from constructs before
estimating their association ensures that the association is not underestimated. In terms of isolation, the ability to model extraneous influences as latent variables operating at different points in a model optimizes statistical control when random assignment to levels of causal constructs is not feasible.

With this background, I now describe two useful latent variable models of cross-sectional data.

Mediated effects

Mediators are variables that represent constructs proposed to explain the association between two variables (Hoyle & Robinson, 2003). In social research, mediational hypotheses typically are evaluated using the measurement-of-mediation design (Spencer et al., 2005). In this design, the causal variable is either manipulated or measured and mediators and outcomes are measured. In cross-sectional designs the mediators and outcomes are assessed simultaneously despite the fact that mediators are presumed to exert a causal influence on the outcomes. The evaluation of a mediated effect involves partitioning the effect of a causal variable on an outcome into two portions: the direct effect and the indirect effect. The direct effect is that portion of the effect that is not transmitted through the mediator. Referring back to Figure 23.1, the path from F1 to F3 is the direct effect. In the three-variable case, the remaining portion of the effect is transmitted through the mediator as an indirect effect. In the model shown in Figure 23.1, the mediator is F2 and the magnitude and direction of the indirect effect are expressed in the product of the parameter estimates for the F1-F2 and F2-F3 relations. Statistically speaking, F2 mediates the relation between F1 and F3 if the indirect effect is significant. If the F1-F3 relation remains significant in the presence of the significant indirect effect, then the mediation is only partial; if the F1-F3 relation is nonsignificant, then the mediation is full. For example, using structural equation modeling in this way, Deardorff et al. (2003) found that the effect of stress on depressive symptoms among inner-city youth is mediated by control beliefs.

A key concern in tests of mediated effects is the reliability of the mediator. The more unreliable the mediator, the more the indirect effect is underestimated and the direct effect overestimated (Hoyle & Kenny, 1999). Thus, with an unreliable mediator, it is possible to conclude partial or no mediation of an effect when mediation is, in fact, full. For this reason, it is advisable to always model mediators as latent variables in tests of mediated effects.

Moderated effects

The evaluation of a moderated effect, in conceptual terms, involves an evaluation of the effect (direct or indirect) of an independent variable on an outcome at different levels of a moderator variable. In social research, moderated effects are sometimes referred to as interaction effects and evaluated as a matter of course in research involving factorial designs, from which data typically are analyzed using analysis of variance. When the independent variable and/or moderator variable are measured on a continuum rather than manipulated, the data are best analyzed using techniques that do not evaluate interaction effects as a matter of course (e.g. multiple regression). In such cases, researchers must manually construct interaction terms and evaluate them in strategically specified predictive equations.

Tests of moderated effects involving latent variables are rarer still. This is unfortunate because, as with tests of mediated effects, tests of moderated effects are adversely affected by measurement error (Busemeyer & Jones, 1983; McClelland & Judd, 1993). Even though the adverse effect of measurement error could be overcome by specifying the interaction term as a latent variable, historically this strategy has not been accessible to most social researchers because the loadings and uniqueness terms associated with these latent variables are nonlinear transformations of their counterparts in the latent variables for the independent and moderator variables (Kenny & Judd, 1984). This nonlinearity can
be incorporated into the specification of the latent variable representing the interaction term; however, if the number of indicators of the independent and moderator variables exceeds three, the specification becomes prohibitively complex. Fortunately, ignoring the theoretical nonlinearity in these parameters produces results that, in practical terms, are equivalent to results obtained using the more complex specification (Marsh et al., 2004). Published examples of moderated effects involving latent variables are rare; a substantive example can be found in Ping (1996).

Longitudinal

Well-designed longitudinal studies offer significant inferential advantages over cross-sectional studies, the foremost being the possibility of definitive tests of directionality (Halaby, 2004). In this section, I focus on latent variable models of data from studies involving at least two assessments.

Basic model

In the basic longitudinal latent variable model, variables are positioned in a model according to when they were assessed. Thus, for example, the specific arrangement of the latent variables in the model shown in Figure 23.1 would suggest a three-wave longitudinal design in which F1 was assessed in the first wave, F2 in the second wave, and F3 in the third wave. This rudimentary longitudinal model is an example of the sequential strategy of longitudinal research, in which the temporal order in which constructs are assessed corresponds to the presumed causal order of constructs in the model (Hoyle & Robinson, 2003). Data from this design are an improvement over data from the cross-sectional design because the directional paths can logically only go in the direction they are specified; however, the improvement is modest. This is because, using terminology from the trait-state-error model described earlier, the latent variables include both trait (i.e. stable) and state (i.e. time-specific) components. Thus, for instance, if the F1-F2 relation is significant, the inference regarding directionality is nonetheless ambiguous because it might reflect nothing more than correlation between the stable components of F1 and F2. This inferential ambiguity is overcome through the use of a replicative strategy, in which all variables are assessed and included in the model at each wave. This strategy allows for the evaluation of lagged effects from which temporal stability in constructs has been removed.

Cross-lagged panel models

In the simplest latent variable cross-lagged panel model, two constructs are measured using multiple indicators at two points in time. The name derives from the fact that, in addition to autoregressive effects (the effect of each construct on itself at subsequent waves, i.e. stability), the model specifies an effect of each construct on the other construct at the next wave. These latter effects, which are the focal part of the model, are the cross-lagged paths. As with tests of mediated and moderated effects, controlling for measurement error is vital in cross-lagged panel models. In such models the adverse effect of measurement error extends beyond the attenuation of associations. Because hypotheses about causal priority concern the relative magnitude of the cross-lagged paths, it is critical that the reliability of the variables be equivalent. When the variables are modeled as latent variables, the reliability of each variable is 1.0 and, therefore, differences in the cross-lagged path coefficients cannot be attributed to differences in the reliability of the measures.

In latent variable models of cross-lagged panel data, the primary concern is the absolute and the relative magnitudes of the coefficients associated with the cross-lagged paths. In absolute terms, the concern is whether, after controlling for stability in the constructs, there is evidence of an association between them. In relative terms, the concern is whether one cross-lagged path coefficient is larger than the other over the same span of time. If one cross-lagged path coefficient is larger than the other, particularly if the smaller coefficient is not significantly different from zero, then the evidence supports an inference of a causal relation in the direction of the path associated with the larger coefficient. Sher et al. (1996) used this strategy to investigate the association between alcohol outcome expectancies and alcohol use. In a two-wave study across three years, they found evidence that alcohol expectancies and use are associated and that the direction of prospective influence is from expectancies to use.
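
The logic of comparing cross-lagged paths can be sketched with a small simulation. The sketch below is an observed-variable analogue only: it applies ordinary least squares to simulated scores that contain no measurement error, rather than fitting a latent variable model, so it does not show the correction for unreliability discussed above. The function names, the sample size, and the generating coefficients (0.6, 0.5, 0.4, 0.3) are hypothetical choices made for illustration and are not taken from Sher et al. (1996).

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def two_predictor_slopes(outcome, p1, p2):
    """OLS slopes of outcome on p1 and p2 (intercept included), via the normal equations."""
    s11, s22, s12 = cov(p1, p1), cov(p2, p2), cov(p1, p2)
    s1y, s2y = cov(p1, outcome), cov(p2, outcome)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate_two_wave(n=5000, seed=1):
    """Hypothetical data: X at wave 1 influences Y at wave 2, but not the reverse."""
    rng = random.Random(seed)
    x1, y1, x2, y2 = [], [], [], []
    for _ in range(n):
        a = rng.gauss(0, 1)
        b = 0.3 * a + rng.gauss(0, 1)                    # wave-1 scores are correlated
        x1.append(a)
        y1.append(b)
        x2.append(0.6 * a + rng.gauss(0, 1))             # stability only
        y2.append(0.5 * b + 0.4 * a + rng.gauss(0, 1))   # stability plus cross-lag from X
    return x1, y1, x2, y2


x1, y1, x2, y2 = simulate_two_wave()

# Each wave-2 variable is regressed on its own wave-1 score (stability)
# and on the other variable's wave-1 score (cross-lagged path).
stability_y, cross_x_to_y = two_predictor_slopes(y2, y1, x1)
stability_x, cross_y_to_x = two_predictor_slopes(x2, x1, y1)

print(f"X1 -> Y2 cross-lagged path: {cross_x_to_y:.2f} (stability of Y: {stability_y:.2f})")
print(f"Y1 -> X2 cross-lagged path: {cross_y_to_x:.2f} (stability of X: {stability_x:.2f})")
```

Under these assumed generating values, the X1 to Y2 estimate should fall near 0.4 and the Y1 to X2 estimate near zero, which is the pattern that would support an inference of influence from X to Y. With fallible measures of unequal reliability the same comparison could mislead, which is why the text recommends modeling the constructs as latent variables.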
Latent growth curve models

An alternative strategy for modeling repeated observations of a sample is latent growth curve modeling, by which trajectories of means are modeled and potentially included as predictors or outcomes in structural models (Duncan et al., 1999). Because the focus is on modeling means rather than covariances, these models are fit to the augmented moment matrix (described in the earlier section on measurement invariance). Models of latent growth require at least three, preferably four, repeated observations on a sample. As with trend analysis in repeated measures analysis of variance, one can model up to k-1 trajectory shapes from k observations (e.g. four observations would allow fitting of linear, quadratic, and cubic trajectories). In unconditional models, determining the best-fitting trajectory for one or more samples is the primary focus. The basic specification for an unconditional latent growth model is shown in Figure 23.3. Free parameters are omitted; only the essential fixed parameters are shown. Note that the paths from the intercept latent variable are fixed to 1, and the paths from the linear latent variable are fixed to values that begin with zero and increase linearly (see Note 4). Also note that, as in Figure 23.2, the latent variables are influenced by a constant, which produces estimates of the latent variable means. Thus, this model yields estimates of the mean intercept and linear slope. Variances also are estimated for these latent variables, which can be treated as Level 2 variables in a multilevel model. For instance, if person-level (i.e. Level 2) data on other constructs are available, the influence of those constructs on the intercept and linear growth latent variables can be estimated in a conditional growth model (Willett & Sayer, 1994). An instructive example of latent growth modeling is provided by Reynolds et al. (2005), who modeled linear and quadratic change in cognitive abilities in adult twins from their early fifties into their mid-sixties.

Figure 23.3 Path diagram of an unconditional latent growth model in which linear growth is modeled across four equally spaced assessments

An innovative longitudinal latent variable model combines cross-lagged panel and latent growth models in an autoregressive latent trajectory model (Bollen & Curran, 2004). Even though this model can be estimated with as few as three repeated observations, the ideal design would include at least five. A strength of this model is the simultaneous and integrative approach to modeling associations (i.e. covariances) and trajectories (i.e. means) over time.

LIMITATIONS

Structural equation modeling is a flexible and general approach to latent variable modeling
in social research; however, as with any statistical model, it is not without limitations. These limitations are well documented in the literature (for general discussions, see Breckler, 1990; MacCallum & Austin, 2000), and increasingly are understood and acknowledged by social researchers. Six of these limitations are described in the remainder of this section.

Sample size

The maximum likelihood estimator can be expected to evince theoretical properties in arbitrarily large samples. Not all social research yields large samples, raising the question of how large a sample must be in order to produce valid estimates and tests. Even though there are qualifying factors, and the number is somewhat variable as a function of the particular outcome in question (e.g. parameter estimates, fit indices), simulation studies point to about 400 as the number of observations at which the outcomes of maximum likelihood estimation correspond to expectation (e.g. Bentler, 1990). The stability of parameter estimates is questionable in all but the simplest models (i.e. fewer than 10 variables) with fewer than 200 observations (Loehlin, 1992). This number increases as the distributions of the variables depart from normality and as models become more complex. The minimum number is substantially larger for estimators that do not assume normality and/or continuous measurement.

Measurement scale

Another fundamental assumption of the maximum likelihood estimator is continuous measurement of variables (Jöreskog, 1994). Strictly speaking this assumption is not met by most measures in social research. Typically, research participants are provided a relatively small number of response options arrayed along a continuum defined by the two extreme options. Such response formats produce variables that, at best, evince interval-scale properties. Even though violation of this assumption seems almost certain in the typical application of structural equation modeling in social research, the consequences of violating the assumption in the manner typical of such research (e.g. 5- or 7-point Likert scales) do not appear to be severe (e.g. Tepper & Hoyle, 1996). Nonetheless, the more coarsely categorized a measurement scale, the greater the cause for concern, and estimation from data gathered on response scales with fewer than five options is best carried out using an estimator for ordered categorical variables (e.g. Muthén, 2001).

Model fit

A fundamental concern in applications of structural equation modeling is the determination of whether a given model offers a suitable explanation for a set of data. As noted earlier in the chapter, the determination of whether the covariance matrix implied by a model differs from the observed covariance matrix is not straightforward. The standard hypothesis of no difference between these two matrices is increasingly recognized as unreasonable (Browne & Cudeck, 1993). Moreover, the traditional test of this hypothesis is, in practice, dependent on sample size, ironically favoring models estimated from small samples. This ambiguity in the evaluation of fit is compounded by concerns stemming from how a model was estimated. For instance, it is not unusual for social researchers to specify an initial model that does not meet fit criteria. The model is then re-specified and estimated and new fit statistics produced. These statistics must be interpreted with caution, because they were produced by a model modified after having consulted the data. In such cases, the likelihood of producing a model that would not replicate in a new sample from the same population is unacceptably high (MacCallum et al., 1992).

Equivalent models

The interpretation of results from estimation of a model also must take into account the possibility that other models would provide an equally tenable account of the
data (MacCallum et al., 1993). For instance, one might posit a second-order factor model in which a general factor accounts for the correlations among three first-order factors. Even though suitable absolute fit of this model would be consistent with the hypothesis, the fit of this model would be identical to the fit of a model in which the three factors were simply allowed to correlate. This issue is of greater concern in structural models estimated from cross-sectional data, for which plausible equivalent models often can be generated that reverse the specified direction of effects. For this reason, structural equation modeling alone cannot determine the direction of association between two constructs. The advantages structural equation modeling offers over other statistical approaches in this regard are the capacity to model relations between latent variables and, in quasi- or nonexperimental studies, to isolate putative causes and effects from extraneous variables.

Measurement error correction

As noted throughout the chapter, a significant advantage of latent variable models is the capacity for modeling relations between variables from which the effects of certain sources of measurement error have been removed. It is not uncommon, however, for social researchers to overstate the degree to which latent variables are error free (DeShon, 1998). In the typical case (cf. Bollen & Lennox, 1991) latent variables are a function of the commonality across all their indicators. Variance in indicators not shared with the remaining indicators is termed measurement error, or uniqueness, and potentially includes random and systematic components. Of relevance to measurement error correction is the fact that the measurement errors do not, and therefore the latent variables do, contain variance that is common to all the indicators. As such, if all indicators are subject to the same source of measurement error, the latent variable, in fact, is not free of the influence of that source of error. For instance, if error attributable to self-reports is a concern but all indicators are operationally defined as self-reports, that error is reflected in the latent variable rather than the measurement error terms. Only the influence of those sources of error that vary across indicators, as in multitrait-multimethod models, is removed from latent variables (DeShon, 1998).

Software

Historically, software programs for estimating structural equation models could be described as a limitation because their use assumed familiarity with statistical theory and notation at a level uncommon among social researchers. Ironically, the relative ease with which software programs can now be used for estimating structural equation models has introduced a new concern: that social scientists can specify and estimate models without adequate understanding of what they are doing. Steiger (2001) notes that, because of the ease with which such software can be used (e.g. specification through diagrams), ‘the newcomer is led to believe that there is this impressive, but easy-to-use technique that allows modeling of causality in a kind of flow diagram’ (p. 338). Given the many ways in which a structural equation modeling analysis can go awry, the complexity in evaluating model fit, and the caveats associated with inferences about models and parameters, the likelihood of misuses of structural equation modeling by novice users is higher than ever.

FUTURE DIRECTIONS

As the use of structural equation modeling has become more commonplace across the social sciences, the gap between what can legitimately be accomplished using the technique in its traditional form and the questions social scientists wish to ask of their data has become increasingly apparent. In effect, the direction of influence between statistical methodology and research application has reversed. From the mid-1970s to the late-1990s, as social researchers came to appreciate the potential of latent variable modeling,
they were inspired to address more complex research questions in a more holistic manner. At the dawn of the twenty-first century, with structural equation modeling having become more familiar to social scientists, they began contemplating research questions beyond the reach of standard specification and estimation strategies. Thus, an alternative direction of influence, from social researchers to statistical methodologists, has emerged. Spurred by the increasingly complex demands of social research data and questions, statistical methodologists are extending the boundaries of what traditionally would have been considered appropriate applications of structural equation modeling.

Three primary fronts on which this extension is taking place concern qualities of social research data. As noted earlier, the standard estimator in structural equation modeling, maximum likelihood, assumes multivariate normality and continuous measurement. In practice, these conditions often are not met. Even though the maximum likelihood estimator is reasonably robust to violations of these assumptions, the extent of non-normality or coarseness of measurement in social research data sometimes clearly exceeds the limits of this robustness. Approaches to estimation from non-normal and categorical data that perform well in practice are increasingly available to social researchers (e.g. Muthén, 1984). A third characteristic of data with which social researchers often have to contend is missingness. Considerable progress has been made in the understanding and implementation of strategies for managing missingness that are not specific to a particular statistical strategy (e.g. Schafer & Graham, 2002). In addition, statistical software for estimating latent variable models is increasingly likely to include an estimator that allows for the management of missingness within the context of specific models (Arbuckle, 1996; Enders, 2001). Because meeting minimum sample size recommendations for applications of structural equation modeling is a challenge in some social science literatures, the availability of a strategy for keeping all research participants in the analysis sample is critical.

Advances in the capacity for estimating from non-normal and categorical data have paved the way for advances in terms of the kinds of latent variable models that can be specified and estimated. For instance, a focus by methodologists on estimators and specification strategies for modeling nonlinear effects promises to increase the ease with which such effects can be incorporated into models such as the ones described in this chapter (Schumacker & Marcoulides, 1998). A particularly promising advance concerns the modeling of latent variables that are categorical. These latent variables can reflect latent classes in the traditional sense or reflect distinctive classes of latent growth trajectories (Muthén, 2001). Such applications illustrate the increasing generality of statistical models for estimating latent variable models in social research, potentially including in a single model continuous and categorical indicators, continuous and categorical latent variables (of which some are latent classes), and multiple levels of analysis (of which one might be a latent growth model of individual-level data).

CONCLUSION

Structural equation modeling is a flexible and general statistical approach to specifying and evaluating latent variable models in social research. In this chapter, I described and provided examples of basic and advanced applications of structural equation modeling relevant to social research. Measurement models focus strictly on the relations between observed variables and the latent variables they are assumed to reflect. They can be used to decompose variance in observed variables in ways that both increase understanding of observed variables and produce latent variables that are relatively pure representations of the constructs the observed variables are assumed to reflect. Even though structural equation modeling is not a viable solution to the primary limitation of cross-sectional data (the inability to determine direction of influence), it is nonetheless useful for modeling such data by enabling some control
over the effects of measurement error on directional relations and the inclusion of multiple dependent variables and the relations among them. The ability to eliminate some sources of measurement error is particularly beneficial in ‘third-variable’ models such as mediation and moderation, in which the effects of such error are compounded. The full benefits of structural equation modeling are apparent in latent variable models of longitudinal data. In traditional autoregressive models, structural equation modeling allows for simultaneous estimation of directional effects across waves controlling for measurement error. In latent growth curve models, structural equation modeling allows for the estimation of patterns of change and the prediction of variation in those patterns across individuals. These models are illustrative of the broad range of latent variable models relevant to social research. A burgeoning didactic literature on applied structural equation modeling coupled with software updated frequently to reflect the latest developments in estimation and testing makes these models more appealing than ever.

NOTES

1 An important exception is Muthén’s more general framework, implemented in the Mplus software program (Muthén & Muthén, 2006).

2 Readers familiar with exploratory factor analysis will recognize this specification as corresponding to simple structure, which, in the exploratory case, is sometimes achieved through rotation. By forcing many loadings to zero, confirmatory factor analysis avoids the indeterminacy of parameter estimates in exploratory factor analysis.

3 With only two or three first-order factors, although a second-order factor could be specified, such a model would yield identical fit to a first-order model with correlated factors. Thus, although adequate fit of such models would suggest that a second-order model is consistent with the data, a first-order model with correlated factors would be equally consistent with the data. Nonetheless, if the loadings of a set of first-order factors on a second-order factor are high, the data favor interpretation of the second-order model.

4 As with orthogonal polynomials in analysis of variance or multiple regression analysis, the values corresponding to the various trajectory shapes must take into account the relative time lapses between waves. In Figure 23.3, the use of 0, 1, 2, and 3 for the coefficients corresponding to a linear trajectory indicates an assumption of equal spacing between waves. If, for example, there were six months between the first four waves and a year between the last two waves, the coefficients corresponding to a linear trajectory would be 0, 1, 2, 3, and 5.

REFERENCES

Anderson, J. C., & Gerbing, D. W. (1988). Structural equation modeling in practice: A review and recommended two-step approach. Psychological Bulletin, 103, 411–423.

Arbuckle, J. L. (1996). Full information estimation in the presence of incomplete data. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling techniques (pp. 243–277). Mahwah, NJ: Erlbaum.

Arbuckle, J. L. (2003). AMOS 5.0 update to the AMOS User’s Guide. Chicago, IL: SPSS.

Bentler, P. M. (1983). Simultaneous equation systems as moment structure models. Journal of Econometrics, 22, 13–42.

Bentler, P. M. (1990). Comparative fit indices in structural models. Psychological Bulletin, 107, 238–246.

Bentler, P. M. (1995). EQS structural equations program manual. Encino, CA: Multivariate Software.

Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness-of-fit in the analysis of covariance structures. Psychological Bulletin, 88, 588–606.

Bentler, P. M., & Weeks, D. G. (1980). Linear structural equations with latent variables. Psychometrika, 45, 289–308.

Blalock, H. M. (1961). Correlation and causality: The multivariate case. Social Forces, 39, 246–251.

Blalock, H. M. (1964). Causal inferences in nonexperimental research. Chapel Hill: University of North Carolina Press.

Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.

Bollen, K. A. (2002). Latent variables in psychology and the social sciences. Annual Review of Psychology, 53, 605–634.

Bollen, K. A., & Curran, P. J. (2004). Autoregressive latent trajectory (ALT) models: A synthesis of two traditions. Sociological Methods and Research, 32, 336–383.

Bollen, K. A., & Curran, P. J. (2006). Latent curve models: A structural equation perspective. Hoboken, NJ: Wiley.
Bollen, K. A., & Lennox, R. D. (1991). Conventional DeShon, R. P. (1998). A cautionary note on measure-
wisdom on measurement: A structural equation ment error corrections in structural equation models.
perspective. Psychological Bulletin, 110, 305–314. Psychological Methods, 4, 412–423.
Bollen, K. A., & Long, J. S. (Eds.) (1993). Testing Deutskens, E., de Ruyter, K., & Wetzels, M. (2006).
structural equation models. Thousand Oaks, CA: Sage An assessment of equivalence between online and
Publications. mail surveys in service research. Journal of Service
Breckler, S. J. (1990). Applications of covariance Research, 8, 346–355.
structure modeling in psychology: Cause for concern? Diamantopoulos, A., & Siguaw, J. A. (2000). Introducing
Psychological Bulletin, 107, 260–273. LISREL: A guide for the uninitiated. London: Sage
Browne, M. W. (1974). Generalized least squares Publications.
estimators in the analysis of covariance structures. Duncan, O. D. (1966). Path analysis: Sociological
South African Statistical Journal, 8, 1–24. examples. American Journal of Sociology, 74,
Browne, M. W. (1984). Asymptotic distribution free 119–137.
methods in analysis of covariance structures. British Duncan, O. D. (1969). Some linear models for two-wave,
Journal of Mathematical and Statistical Psychology, two-variable panel analysis. Psychological Bulletin,
37, 62–83. 72, 177–182.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of Duncan, O. D. (1975). Introduction to structural equation
assessing model fit. In K. A. Bollen & J. S. Long (Eds.), models. New York: Academic Press.
Testing structural equation models (pp. 136–162). Duncan, T. E., Duncan, S. C., Strycker, L. A., Li, F., &
Thousand Oaks, CA: Sage Publications. Alpert, A. (1999). An introduction to latent variable
Busemeyer, J. R., & Jones, L. D. (1983). Analysis of growth curve modeling. Mahwah, NJ: Erlbaum.
multiplicative combination rules when the causal DuToit, S., Cudeck, R., & Sörbom, D. (Eds.) (2001).
variables are measured with error. Psychological Structural equation modeling: Present and future.
Bulletin, 93, 549–562. Lincolnwood, IL: Scientific Software International.
Byrne, B. M. (2001). Structural equation modeling Enders, C. K. (2001). A primer on maximum likelihood
with AMOS: Basic concepts, applications, and algorithms available for use with missing data.
programming. Mahwah, NJ: Erlbaum. Structural Equation Modeling, 8, 128–141.
Byrne, B. M. (2006). Structural equation modeling with Funk, C. L. (1999). Bringing the candidate into models
EQS: Basic concepts, applications, and programming. of candidate evaluation. Journal of Politics, 61,
Mahwah, NJ: Erlbaum. 700–720.
Byrne, B. M., Shavelson, R. J., & Muthén, B. Gibson, W. A. (1959). Three multivariate models: Factor
(1989). Testing for the equivalence of factor analysis, latent structure analysis, and latent profile
covariance and mean structures: The issue of partial analysis. Psychometrika, 24, 229–252.
measurement invariance. Psychological Bulletin, 105, Goldberger, A. S. (1971). Econometrics and psychomet-
456–466. rics: A survey of commonalities. Psychometrika, 36,
Campbell, D. T., & Fiske, D. W. (1959). Convergent and 83–107.
discriminant validation by the multitrait-multimethod Goldberger, A. S., & Duncan, O. D. (Eds.) (1973).
matrix. Psychological Bulletin, 56, 81–105. Structural equation models in the social sciences.
Clogg, C. C. (1995). Latent class models. In G. Arminger, New York: Academic Press.
C. C. Clogg, & M. E. Sobel (Eds.), Handbook of Gollob, H. F., & Reichardt, C. S. (1991). Interpreting
statistical modeling for the social and behavioral and estimating indirect effects assuming time lags
sciences (pp. 311–359). New York: Plenum. really matter. In L. M. Collins & J. L. Horn
Cole, D. A., Martin, N. C., & Steiger, J. H. (2005). (Eds.), Best methods for the analysis of change
Empirical and conceptual problems with longitudinal (pp. 243–259). Washington, DC: American Psycho-
trait–state models: Introducing a trait-state-occasion logical Association.
model. Psychological Methods, 10, 3–20. Halaby, C. N. (2004). Panel models in sociological
Collins, L. M., & Wugalter, S. E. (1992). Latent class research: Theory into practice. Annual Review of
models for stage-sequential dynamic latent variables. Sociology, 30, 507–544.
Multivariate Behavioral Research, 27, 131–157. Hancock, G. R., & Mueller, R. O. (Eds.) (2006). Structural
Deardorff, J., Gonzales, N. A., & Sandler, I. N. (2003). equation modeling: A second course. Greenwich, CT:
Control beliefs as a mediator of the relation between Information Age Publishing.
stress and depressive symptoms among inner city Hoyle, R. H. (1991). Evaluating measurement models
adolescents. Journal of Abnormal Child Psychology, in clinical research: Covariance structure analy-
31, 205–217. sis of latent variable models of self-conception.
Journal of Consulting and Clinical Psychology, 59, Kenny, D. A., & Kashy, D. A. (1992). Analysis of the
67–76. multitrait-multimethod matrix by confirmatory factor
Hoyle, R. H. (Ed.) (1995). Structural equation modeling: analysis. Psychological Bulletin, 112, 165–172.
Concepts, issues, and applications. Thousand Oaks, Kenny, D. A., & Zautra, A. (1995). The trait-state-error
CA: Sage Publications. model for multiwave data. Journal of Consulting and
Hoyle, R. H. (2000). Confirmatory factor analysis. In Clinical Psychology, 63, 52–59.
H. E. A. Tinsley & S. D. Brown (Eds.), Handbook Kenny, D. A., & Zautra, A. (1995). The trait-state-error
of applied multivariate statistics and mathematical equation modeling (2nd ed.). New York: Guilford
modeling (pp. 465–497). New York: Academic Press. Press.
Hoyle, R. H., & Duvall, J. L. (2004). Determining the num- Loehlin, J. C. (1992). Genes and environment in
ber of factors in exploratory and confirmatory factor personality development. Thousand Oaks, CA: Sage
analysis. In D. Kaplan (Ed.), Handbook of quantitative Publications.
methodology for the social sciences (pp. 301–315). MacCallum, R. C. (2003). Working with imperfect
Thousand Oaks, CA: Sage Publications. models. Multivariate Behavioral Research, 38,
Hoyle, R. H., & Kenny, D. A. (1999). Sample size, 113–139.
reliability, and tests of statistical mediation. In MacCallum, R. C., & Austin, J. T. (2000). Applications
R. H. Hoyle (Ed.), Statistical strategies for small sample of structural equation modeling in psychological
research (pp. 195–222). Thousand Oaks, CA: Sage research. Annual Review of Psychology, 51, 201–226.
Publications. MacCallum, R. C., Roznowski, M., & Necowitz, L. B.
Hoyle, R. H., & Robinson, J. I. (2003). Mediated and (1992). Model modifications in covariance structure
moderated effects in social psychological research: analysis: The problem of capitalization on chance.
Measurement, design, and analysis issues. In Psychological Bulletin, 111, 490–504.
C. Sansone, C. Morf, & A. T. Panter (Eds.), Handbook MacCallum, R. C., Wegener, D. T., Uchino, B. N., &
of methods in social psychology (pp. 213–233). Fabrigar, L. R. (1993). The problem of equivalent
Thousand Oaks, CA: Sage Publications. models in applications of covariance structure
Hu, L.-T., & Bentler, P. M. (1995). Evaluating model fit. analysis. Psychological Bulletin, 114, 185–199.
In R. H. Hoyle (Ed.), Structural equation modeling: Marcoulides, G. A., & Schumacker, R. E. (Eds.)
Concepts, issues, and applications (pp. 76–99). (2001). Advanced structural equation modeling:
Thousand Oaks, CA: Sage Publications. New developments and techniques. Mahwah, NJ:
Hu, L.-T., & Bentler, P. M. (1999). Cutoff criteria for fit Erlbaum.
indexes in covariance structure analysis: Conventional Marsh, H. W., & Grayson, D. (1995). Latent variable
criteria versus new alternatives. Structural Equation models of multitrait-multimethod data. In R. H. Hoyle
Modeling, 6, 1–55. (Ed.), Structural equation modeling: Concepts, issues,
Jöreskog, K. G. (1973). A general method for and applications (pp. 177–198). Thousand Oaks, CA:
estimating a linear structural equation system. In Sage Publications.
A. S. Goldberger & O. D. Duncan (Eds.), Structural Marsh, H. W., Wen, Z., & Hau, K.-T. (2004).
equation models in the social sciences (pp. 85–112). Structural equation models of latent interactions:
New York: Academic Press. Evaluation of alternative estimation strategies and
Jöreskog, K. G. (1994). On the estimation of polychoric indicator construction. Psychological Methods, 9,
correlations and their asymptotic covariance matrix. 275–300.
Psychometrika, 59, 381–389. Maruyama, G. M. (1998). Basics of structural equation
Jöreskog, K. G., & Sörbom, D. (1999). LISREL 8: modeling. Thousand Oaks, CA: Sage Publications.
Structural equation modeling with the SIMPLIS McArdle, J. J., & McDonald, R. P. (1984). Some
command language. Lincolnwood, IL: Scientific algebraic properties of the reticular action model for
Software International. moment structures. British Journal of Mathematical
Kaplan, D. (2000). Structural equation modeling: and Statistical Psychology, 37, 234–251.
Foundations and extensions. Thousand Oaks, CA: McClelland, G. H., & Judd, C. M. (1993). Statistical
Sage Publications. difficulties of detecting interactions and moderator
Keesling, J. W. (1972). Maximum likelihood approaches effects. Psychological Bulletin, 114, 376–390.
to causal analysis. Unpublished doctoral dissertation, McCutcheon, A. L. (1994). Latent logit models with
University of Chicago. polytomous effects variables. In A. von Eye & C. C.
Kenny, D. A., & Judd, C. M. (1984). Estimating the Clogg (Eds.), Latent variables analysis: Applications
nonlinear and interactive effects of latent variables. for developmental research (pp. 353–372). Thousand
Psychological Bulletin, 96, 201–210. Oaks, CA: Sage Publications.
McDonald, R. P., & Ringo Ho, M.-H. (2002). Principles research (pp. 3–35). Thousand Oaks, CA: Sage
and practice in reporting structural equation analyses. Publications.
Psychological Methods, 7, 64–82. Spencer, S. J., Zanna, M. P., & Fong, G. T. (2005).
McPherson, J. M., & Rotolo, T. (1995). Measuring Establishing a causal chain: Why experiments are
the composition of voluntary groups: A multitrait- often more effective than mediational analyses
multimethod analysis. Social Forces, 73, 1097–1115. in examining psychological processes. Journal of
Muthén, B. O. (1984). A general structural equation Personality and Social Psychology, 89, 845–851.
model with dichotomous, ordered categorical and Steenkamp, J. E. M., & Baumgartner, H. (1998).
continuous latent variable indicators. Psychometrika, Assessing measurement invariance in cross-national
49, 115–132. consumer research. Journal of Consumer Research,
Muthén, B. O. (2001). Second-generation structural 25, 78–90.
equation modeling with a combination of categorical Steiger, J. H. (2001). Driving fast in reverse: The
and continuous latent variables: New opportuni- relationship between software development, theory,
ties for latent class/latent growth modeling. In and education in structural equation modeling.
L. M. Collins & A. Sayer (Eds.), New methods for Journal of the American Statistical Association, 96,
the analysis of change (pp. 291–322). Washington: 331–338.
American Psychological Association. Steiger, J. H. (2002). When constraints interact:
Muthén, L. K., & Muthén, B. O. (2006). Mplus user’s A caution about reference variables, identification
guide (4th ed.). Los Angeles, CA: Muthén & Muthén. constraints, and scale dependencies in structural
Muthén, B. O., & Shedden, K. (1999). Finite mixture equation modeling. Psychological Methods, 7,
modeling with mixture outcomes using the EM 210–227.
algorithm. Biometrics, 55, 463–469. Steiger, J. H., & Lind, J. C. (1980, May). Statistically
Myung, J. (2003). Tutorial on maximum likelihood based tests for the number of common factors. Paper
estimation. Journal of Mathematical Psychology, 47, presented at the Annual Meeting of the Psychometric
90–100. Society, Iowa City, IA.
Pentz, M. A., & Chou, C.-P. (1994). Measurement Tenko, R., & Marcoulides, G. (2000). A first course in
invariance in longitudinal clinical research Assum- structural equation modeling. Mahwah, NJ: Erlbaum.
ing change from development and intervention. Tepper, K., & Hoyle, R. H. (1996). Latent variable models
Journal of Consulting and Clinical Psychology, 62, of need for uniqueness. Multivariate Behavioral
450–462. Research, 31, 467–494.
Ping, R. A., Jr. (1996). Estimating latent variable Thompson, M. S. (2006). Evaluating between-group dif-
interactions and quadratics: The state of this art. ferences in latent variable means. In G. R. Hancock, &
Journal of Management, 22, 163–183. R. O. Mueller (Eds.), Structural equation modeling:
Reynolds, C. A., Finkel, D., McArdle, J. J., Gatz, M., A second course (pp. 119–169). Greenwich, CT:
Berg, S., & Pedersen, N. L. (2005). Quantitative Information Age Publishing.
genetic analysis of latent growth curve models Tomer, A. (2003). A short history of structural equation
of cognitive abilities in adulthood. Developmental models. In B. H. Pugesek, A. Tomer, & A. Von Eye
Psychology, 41, 3–16. (Eds.), Structural equation modeling: Applications in
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our ecological and evolutionary biology (pp. 85–124).
view of the state of the art. Psychological Methods, Cambridge, UK: Cambridge University Press.
7, 147–177. Vandenberg, R. J., & Lance, C. E. (2000). A review and
Schumacker, R. E., & Lomax, R. G. (2004). A beginner’s synthesis of the measurement invariance literature:
guide to structural equation modeling (2nd ed.). Suggestions, practices and recommendations for
Mahwah, NJ: Erlbaum. organizational research. Organizational Research
Schumacker, R. E., & Marcoulides, G. A. (Eds.) Methods, 3, 4–70.
(1998). Interaction and nonlinear effects in structural Wansbeek, T., & Meijer, E. (2000). Measurement error
equation modeling. Mahwah, NJ: Erlbaum. and latent variables in econometrics. Amsterdam:
Sher, K. J., Wood, M. D., Wood, P. K., & Raskin, G. Elsevier Science.
(1996). Alcohol outcome expectancies and alcohol Widaman, K. F., & Reise, S. P. (1997). Explor-
use: A latent variable cross-lagged panel study. ing the measurement invariance of psychological
Journal of Abnormal Psychology, 105, 561–574. instruments: Applications in the substance use
Sobel, M. E. (1994). Causal inference in latent variable domain. In K. J. Bryant, M. Windle, & S. G. West
models. In A. von Eye, & C. C. Clogg (Eds.), Latent (Eds.), The science of prevention: Methodologi-
variables analysis: Applications for developmental cal advances from alcohol and substance abuse
research (pp. 281–323). Washington, DC: American Wright, S. (1934). The method of path coef-
Psychological Association. ficients. Annals of Mathematical Statistics, 5,
Wiley, D. E. (1973). The identification problem 161–215.
for structural equation models with unmeasured Wright, S. (1968). Evolution and the genetics of
variables. In A. S. Goldberger & O. D. Duncan (Eds.), populations (vol. 1). Chicago: University of Chicago
Structural equation models in the social sciences Press.
(pp. 69–83). New York: Academic Press. Zautra, A. J., Marbach, J. J., Raphael, K. G., Dohrenwend,
Willett, J. B., & Sayer, A. G. (1994). Using covariance B. P., Lennon, M. C., & Kenny, D. A. (1995).
structure analysis to detect correlates and predictors The examination of myofascial face pain and
of individual change over time. Psychological Bulletin, its relationship to psychological distress. Health
116, 363–381. Psychology, 14, 223–231.
24
Equating Groups
Stephen G. West and Felix Thoemmes
EQUATING GROUPS

One of the most central tasks of both basic and applied behavioral science is to estimate the size of treatment effects. The basic procedure is conceptually very straightforward. The researcher identifies a treatment (T) of interest such as a new drug treatment or a new cognitive approach to psychotherapy. In our illustration T is designed as a possible means of reducing depression in a clinical population. The researcher then identifies a comparison (C) condition to which the treatment is to be compared. In the case of the new drug treatment, the researcher might choose a placebo which has no pharmaceutical effect on depression or another drug that is the current standard drug prescribed to help relieve depression. Similarly, in the case of the new psychotherapy, the researcher might choose no psychotherapy, psychotherapy without the new cognitive elements, or the standard psychotherapeutic treatment that is commonly delivered (standard of practice). Each patient’s level of depression is then measured following treatment. The difference between the mean level of depression in the treatment and control groups, $\bar{Y}_T - \bar{Y}_C$, is then taken as the estimate of the treatment effect. However, the estimate of the treatment effect will be valid if and only if the two groups have been successfully equated prior to the implementation of the treatment. Otherwise stated, only if the groups are equated will $\bar{Y}_T - \bar{Y}_C$ be an unbiased estimate of the causal effect of the treatment.

This chapter will examine some major methods of equating groups. We will draw from insights in statistics (Holland, 1986; Rosenbaum, 2002; Rubin, 1974, 1978, 2005), psychology (Reichardt, 2006; Shadish et al., 2002; West et al., 2000), public health (Little & Rubin, 2000), and sociology and econometrics (Winship & Morgan, 1999). We will focus on comparisons of a treatment and comparison group in two commonly used research designs: the randomized experiment and the observational study (i.e. nonequivalent control group design). The key feature that distinguishes these two designs is the process through which units are assigned to the T and C groups (Judd & Kenny, 1981). The randomized experiment uses some random process (e.g. flipping a coin, a random number generator) to determine assignment of the units to the T and C groups. The units
are typically individual participants, but they may be larger aggregations such as schools or entire communities. This process implies that the expected mean of the units in the T group will equal the expected mean of the C group on any conceivable measured or unmeasured baseline variable so that $\bar{Y}_T - \bar{Y}_C$ may be taken as an unbiased estimate of the treatment effect. In contrast, the observational study uses an unknown process to assign participants to the groups. Participants may choose to receive the T versus C, or participants may receive the treatment because they are located in a single community, school, hospital, or other larger unit that has agreed to participate in the study. The process through which participants end up in the T versus C groups is unknown, implying that researchers should expect that there are potential mean differences on background variables between the T and C groups at baseline, even before treatment commences. Now $\bar{Y}_T - \bar{Y}_C$ no longer represents an unbiased estimate of the causal effect of the treatment, but rather a confounded estimate reflecting some combination of the true causal effect of treatment and preexisting differences between the groups on measured or unmeasured variables at baseline (Reichardt, 2006). Only by carefully assessing critical participant characteristics at baseline and developing methods to equate the T and C groups prior to the beginning of treatment can the researcher even approximately estimate the desired effect of the treatment.

We begin this chapter by briefly reviewing the randomized experiment. The randomized experiment is often described as the ‘gold standard’ design and it serves as an important benchmark for the observational study. We identify some ways in which even randomized experiments can be enhanced through the use of additional procedures designed to more closely equate the groups at baseline. We then briefly review studies comparing the treatment effect estimates from randomized experiments to those of observational studies studying similar treatments, to provide information about the conditions under which these two designs may lead to different estimates of the treatment effect. We then introduce modern methods of adjusting treatment effects in observational studies for measured differences at baseline. These methods can substantially reduce any bias in the estimate of the treatment effect. Other approaches attempt to bracket the size of the treatment effect so that it represents a reasonable estimate even if there are variations on important unmeasured differences at baseline. Finally, we consider design enhancements that help rule out likely effects of unmeasured variables that may provide alternative explanations for the observed effect of treatment.

RANDOMIZED EXPERIMENTS

Randomization approximately equates the T and C groups at baseline. More formally, randomization produces two important results (Holland, 1986; West et al., 2000). First, as we observed above, the expected mean on any participant characteristic at baseline will be equal in the T and C groups, $E(\bar{Y}_{T,\text{baseline}}) = E(\bar{Y}_{C,\text{baseline}})$, where $E(\,)$ is the expected value of the variable in parentheses. Second, the binary variable X (1 = T; 0 = C), indicating the treatment condition, is expected to be unrelated to all possible participant characteristics at baseline, $E(r_{XY_{\text{baseline}}}) = 0$. These two results imply that $\bar{Y}_T - \bar{Y}_C$ at posttest will be an unbiased estimate of the treatment effect so that no adjustment of this effect is needed. Note, however, that these results are expectations. They will hold exactly only given very large sample sizes or across a large number of exact replications of the same experiment conducted on a single population. In any single experiment using more modest sample sizes, ‘unfortunate randomization’, in which the T and C groups differ at baseline on some subset of important background variables, can be expected to occur with some regularity. For this reason many journals in the public health area formally require that means of the T and C groups on important baseline measures be reported as a check on the success of the randomization in the experiment. Following our presentation of
additional requirements for randomized field experiments, we will discuss procedures that use these baseline measures to equate groups more adequately prior to treatment in order to provide more statistically powerful tests of the treatment effects.

Additional requirements

Randomized experiments involve additional requirements that must be met for valid estimation of the treatment effect (see Chapter 8). These requirements are routinely met in most laboratory experiments, but can be easily violated in community settings. Failure to meet these requirements may necessitate the use of special procedures, the inclusion of additional design features, or the use of special analysis procedures that adjust for the potential bias (Barnard et al., 1998). Four requirements over which the experimenter may only have limited control are of particular importance in randomized field experiments¹.

1 Proper Randomization. The randomization process must be properly carried out and adhered to. Treatment providers must not be permitted to alter the assignment of participants to the T and C conditions. Kopans (1994) presents evidence that reassignment of high-risk women to the treatment condition apparently occurred in a large national randomized trial evaluating the effectiveness of screening mammography. Connor (1977) provides other examples of experiments in which randomization failed or was not maintained by treatment providers. He suggests procedures that potentially minimize the likelihood of such randomization failures. Robins (1989) and Hernán et al. (2001) present methods of adjusting treatment effect estimates in complex longitudinal studies, for example, when participants are reassigned to another treatment, as in certain medical studies in which the patient does not respond to the assigned treatment.
2 Treatment Compliance. The participants must receive the intended treatment. In randomized experiments studying mammography screening, some participants have refused screening (T). Other participants in the C group have sought out mammography screening outside the experiment (Baker, 1998). West and Sagarin (2000; see also Angrist et al., 1996; Jo, 2002) review statistical procedures that can provide proper estimates of the treatment effect when there is treatment noncompliance.
3 Absence of Attrition. All participants who are assigned to T and C conditions must be measured on the outcome variable. Even though randomization serves to equate participants on average at baseline, this equating is potentially lost if some participants are not measured at posttest. Of most concern is differential attrition in which participants with different characteristics drop out of the two groups. For example, in an experiment investigating a new method of mathematics instruction, less mathematically talented students might find the new course too challenging and withdraw prior to the collection of the outcome measure. $\bar{Y}_T$ would only be based on the scores of the more talented students assigned to the T condition, leading to an overestimate of the effectiveness of the course.
Modern missing data techniques (Little & Rubin, 2002; Schafer & Graham, 2002) can improve the estimation of the treatment effect, particularly if variables that are highly related to the outcome (e.g. baseline measures on the outcomes of interest), to missingness, or ideally both are measured at baseline. Full information maximum likelihood estimation (FIML), now available in several statistical packages (e.g. Mplus), combines all of the observed data to produce optimal estimates and standard errors for the treatment effect and other parameters of interest in the statistical model. Multiple imputation (MI), also available in several statistical packages (e.g. SAS), makes multiple copies of the dataset. In each copy, the optimal predicted value for each missing datum is calculated, then random error matching that in the complete data is added. The step of adding random error ensures that the original variability of the observed data is retained in the values that are imputed. The statistical model testing the treatment effect is then estimated in each copy of the dataset. Finally, the estimates of the treatment effects (and other parameters of interest) in each copy of the dataset are recombined. FIML and MI will both produce unbiased estimates of the treatment effect with proper standard errors if missingness is related to measured variables in the dataset, but not if there are other aspects of the missing variables that are not captured by other variables in the dataset. Consider two potential reasons why participants might be missing from a measurement session in a study of health outcomes in a large company. In the first case,
each participant’s baseline measure of health (e.g. number of days of illness the previous year) is the only variable that systematically predicts whether the participant will be present for the session. In the second case, several of the participants in a division of the company are missing because they are suffering health problems from working day and night on an intensive new project. In the first case, either FIML or MI will produce unbiased estimates because the source(s) of missingness were measured at baseline and are present in the dataset. In the second case, both FIML and MI will produce biased estimates of the treatment effect unless information about project participation and the current project-related health problems is present in the dataset. Suppose, however, that the researchers had used available substantive theory and research to select an extensive set of baseline variables that were expected to be related to the outcome variables, missingness, or both. Once again, information about project participation and project-related health problems is not available in the dataset. In this case, the use of FIML or MI will typically lead to estimates of the treatment effect that are less biased, perhaps substantially so, than methods that ignore missing data or that use traditional approaches such as listwise deletion, pairwise deletion, and mean imputation to address missing data.
4 Stable-Unit-Treatment-Value Assumption. The response of the participant should not be affected by the treatments (or the participant’s knowledge thereof) that other participants receive. This condition is known as the stable-unit-treatment-value assumption (SUTVA); its purpose is to ensure that each participant can only have one true response in the treatment condition (see Rubin, 1978, 1980). Otherwise, the outcomes of the participants in the C group are likely to be atypical. For example, if cancer patients learn that other participants have been assigned to a more promising treatment condition, they may give up hope and stop performing their normal health-supportive practices (e.g. proper diet) so that they will have worse outcomes than they would have had in the absence of this knowledge.

Some effects of improving group comparability at baseline

Randomization combined with meeting the four requirements outlined above assures that the estimate of the treatment effect is unbiased. Unfortunately, this only means that the estimate of the treatment effect will be correct on average. There is no guarantee that unfortunate randomization will not occur in a particular experiment. If the T and C groups can be closely equated at baseline on variables thought to be important predictors of the outcome, then the likelihood of unfortunate randomization can be substantially reduced. Equating procedures thus reduce the potential of an incorrect estimate of the treatment effect in a specific experiment. Equating procedures can also have the benefit of increasing the statistical power of the test, that is, the probability that a true treatment effect of a specified size can be detected. Finally, they may help reduce some of the uncertainty associated with statistical methods of correcting treatment estimates when the four additional requirements are not met. The use of equating procedures is particularly important when the number of units to be assigned is small, the units are not homogeneous, or the treatment effect is not constant, but rather differs in magnitude as a function of the variable(s) on which equating is based.

Consider the following example that captures the importance of equating with a small number of non-homogeneous units. Suppose a randomized experiment is conducted in which the units are six different US cities. Each city receives either an intensive mass media campaign of anti-smoking public service announcements (T) or it does not receive any smoking-related messages in the media (C). The cities chosen for study are from three groups: (a) large cities: Chicago, IL, Los Angeles, CA; (b) medium-sized cities: Baltimore, MD, Portland, OR; and (c) small cities: Terre Haute, IN and San Angelo, TX. Three cities are to be assigned to T and three cities to C. Assume that size of the city is known to be strongly related to the effectiveness of mass media campaigns in health. Following Cochran and Cox (1957), when there are equal numbers (n) of units in the T and C groups, there are $\binom{2n}{n}$ possible randomizations. In the present example, there

are $\binom{6}{3} = \frac{6 \times 5 \times 4 \times 3 \times 2 \times 1}{(3 \times 2 \times 1)(3 \times 2 \times 1)}$ or 20 possible of treatment non-compliance and attrition,
particularly in experiments in which sample
randomizations. A randomization that com- sizes are moderate rather than extremely large
pared Chicago, Baltimore, and Terre Haute to and the size of the treatment effect is not
Los Angeles, Portland, and San Angelo would constant, but rather depends on the level
be desirable. In contrast, a randomization of the baseline variable (i.e. a baseline ×
that compared Chicago, Los Angeles, and treatment condition interaction). Conceptu-
Baltimore to Portland, Terre Haute, and San ally, matching followed by randomization
Angelo would be unfortunate. To avoid this may also have other potential advantages in
problem, the researcher could match the two certain contexts as it implicitly identifies a
large cities, the two medium cities, and two specific comparison participant with which
small cities. Within each matched pair, one each treatment recipient may be compared.
city would be randomly assigned to T and For example, many clinicians would ideally
one to C, leading to a randomization in which like to understand the effects of treatments
the T and C groups will be more adequately on single cases rather than the average effect
balanced, particularly on the critical baseline of the treatment on patients in general. The
variable of the size of city. matching and randomization procedure can
This procedure of pair matching followed permit a closer approximation of this ideal
by randomization is very general. For exam- than simple randomization.
ple, in a randomized experiment evaluating a When many measures are collected at
new math instruction program, students could baseline, matching becomes more difficult.
be assessed on a baseline measure of math In some cases the multiple measures can be
ability that is expected to be highly related to combined a priori into a single composite
the outcome variable, here math achievement. variable on which matching can occur.
The students could be ranked based on their For example, in research related to breast
scores and pairs formed (the two highest; the cancer, a set of measures including age at
next two highest; … down to the two lowest). menarche, number of first-degree relatives
Once again, within each pair students would (mother, sister) with breast cancer, number
be randomly assigned to T and C groups. This of previous breast biopsies, and age are
procedure ensures that the T and C groups will combined into a single risk score using
be closely equated on the important baseline a formula based on prior epidemiological
variable of pretest math ability, preventing research (Gail et al., 1989). Alternatively,
any possibility of unfortunate randomization measures can be collected on the entire sample
with respect to this critical variable. A second prior to randomization. The researcher can
advantage of this procedure is that it can generate several thousand different possible
lead to far more statistically powerful tests of randomizations and calculate Hotellings T2
the treatment2 . For example, Student (1931) for each randomization using the key variables
showed that an early randomized experiment measured at baseline. Hotellings T2 describes
on 10,000 children studying the effects of the magnitude of the multivariate difference
pasteurized (T) versus raw (C) milk on height between the groups, here on the baseline
and weight gains could have achieved the variables. The randomizations are sorted
same level of statistical power with 50 pairs from low to high in terms of the values
of identical twins. Matching followed by of Hotellings T2 . From the 5 percent or
randomization may also lead to a third 10 percent of the randomizations with the
benefit, providing a stronger foundation for lowest values of Hotellings T2 , a randomiza-
addressing failures to adequately meet the tion is chosen, thereby minimizing potential
additional requirements of randomized exper- problems of unfortunate randomization. More
iments (presented above). For example, the complicated blocking and randomization pro-
existence of well-matched pairs may provide cedures to achieve these same goals in other
a stronger basis for modeling the effects specialized experimental contexts (e.g. trickle
flow randomization in which participants are recruited over an extended period of time) are described in Friedman et al. (1998) and Matthews (2000).

RESEARCH COMPARING THE RESULTS OF RANDOMIZED EXPERIMENTS AND OBSERVATIONAL STUDIES

As a starting point for studying methods to improve the results of observational studies, it is useful to review literature comparing the results of randomized experiments with those of observational studies. Properly implemented randomized experiments serve as the ‘gold standard’: they typically provide the best, unbiased estimates of the magnitude of the treatment effect. In contrast, the unknown rules through which participants in observational studies are assigned to the T or C conditions lead to far greater uncertainty about the treatment effect estimate. The researcher would like to claim that some aspect of the treatment caused the observed results; however, it may be possible that a failure to successfully equate the groups at the beginning of the experiment provides a strong alternative explanation (Reichardt, 2006). Even when adjustments in the treatment effect can be made on the basis of measures collected at baseline, there may be less than complete certainty that the T and C groups have been properly equated.

Statistical theory clearly identifies failure to equate the T and C groups on important variables at baseline as an important plausible problem that may occur in observational studies. However, it provides little guidance as to the likely frequency of this problem in practice, nor to the contexts in which estimates of treatment effects are most likely to be biased. To gain some insights into this issue, below we briefly review literature comparing the results of randomized experiments with observational studies that employed similar treatments. We then turn to an examination of modern statistical and design solutions that attempt to address these issues.

Two types of comparisons have been made: (a) single investigations of parallel randomized experiments and observational studies using similar (possibly identical) treatments; and (b) extensive meta-analyses of research areas investigating the effect of a treatment. Of note, exact agreement of the estimates of treatment effects in randomized experiments and observational studies should not be expected: given sampling error, even exact replications of a randomized experiment using the same population would not be expected to produce identical treatment effects. In addition, other differences between the studies representing the two designs may exist. For example, the populations sampled in the two designs, the treatment delivery, the research setting, or other methodological features (e.g. a less adequate control condition is constructed in the observational study) may differ in addition to the focal difference of randomized versus non-randomized design (Cook et al., 2006; Reichardt, 2006; West et al., 2007).

Single comparative studies

Studies comparing treatment effect estimates from randomized experiments and observational studies have produced diverse results. A classic example is Meier’s (1972) large-scale evaluation of the effectiveness of the Salk polio vaccine in the US. In some states, a randomized experiment was used; in others, an observational study. Even though both designs led to the conclusion that the Salk vaccine was effective, the effect size in the randomized experiment was substantially larger. Gilbert et al. (1975) suggested that the difference in effect sizes primarily resulted from the different populations on which the polio rates were based in the C conditions. In the randomized experiment, the comparison group included only children who had permission to be vaccinated in contrast to the observational study in which the full population was represented.

Cook et al. (2006) reviewed a unique subset of investigations in which a single randomized treatment group was compared
with both a randomized control group (randomized experiment) and a second non-randomized comparison group (yoked observational study). Those observational studies that created a high-quality comparison group produced comparable results to those of the yoked randomized experiment. Investigations with a poorly selected comparison group, poor statistical adjustment for baseline differences, or which differed in other procedural or design features between the observational study and yoked randomized experiment often produced discrepant findings.

Meta-analyses

Across diverse substantive research areas, such as skill training, organizational development, psychotherapy, and medical interventions, meta-analyses have produced heterogeneous outcomes in which randomized experiments have shown larger, smaller, and no difference in treatment effect estimates relative to observational studies. An early influential meta-analytic investigation by Sacks et al. (1983) identified six medical therapies that had been studied using both randomized experiments and observational studies. Sacks et al. concluded that observational studies produced biased results in comparison to randomized controlled trials. Attempts to adjust treatment effects in observational studies for available prognostic factors did not remove this bias. More recently, Ioannidis et al. (2001) conducted meta-analyses of 45 medical interventions (e.g. vaccines for meningitis; local versus general anesthesia) involving a total of 240 randomized trials and 168 observational studies. Overall, there was no consistent pattern of over- or under-estimation of treatment effects by the observational studies relative to the randomized experiments. Significant differences between the randomized experiments and observational studies were found in only a small proportion of the meta-analyses. Ioannidis et al. provided evidence of smaller between-study variance in the randomized experiments than in the observational studies, an important finding that suggests that the effect size estimates of observational studies may be associated with more uncertainty than randomized experiments.

Reviews of other areas also suggest that the direction of mean bias is by no means certain. Lipsey and Wilson (1993) analyzed 74 meta-analyses of behavioral and educational interventions, finding no difference in the mean effect sizes of randomized experiments and observational studies. Heinsman and Shadish (1996) analyzed four meta-analyses in the areas of drug-use prevention, psychosocial interventions for surgery, coaching for the SAT, and ability grouping in secondary schools. They found a larger effect size for randomized experiments than for observational studies. Taken together, the meta-analytic results suggest that the magnitude of bias resulting from the use of an observational study rather than a randomized experiment is typically not large and its direction is uncertain. They also suggest that area-specific choices of samples and methodological features (e.g. type of comparison group) may be important determinants of any bias that is observed.

Methodological features

Heinsman and Shadish (1996) coded methodological features that might potentially account for the observed difference in effect sizes between randomized experiments and observational studies in four behavioral science research areas (e.g. SAT coaching, drug use prevention). Of importance, they found in a regression analysis that not allowing self-selection into T versus C conditions in observational studies, using a control group from the same population as the treatment group, minimizing the baseline effect size difference between the T and C groups, and minimizing both overall attrition and differential attrition made the treatment effect estimates more comparable in the two designs. Shadish and Ragsdale (1996) found similar results in a meta-analysis of randomized experiments and observational studies of marital or family psychotherapy. Consistent with these findings, Heckman and Robb (1986)
also point to conceptual and statistical reasons why allowing participants to self select into T and C groups is particularly likely to lead to biased estimates. These results suggest that it may be possible to improve estimates of treatment effects in observational studies through the careful use of design and analysis strategies.

Adjustment strategies for equating groups at baseline

Matching

Matching is used in observational studies to identify a set of participants in the T and C groups that are comparable. To illustrate, consider two small school classrooms, labeled A and B, one of which implements an innovative new math curriculum, whereas the other implements a standard math curriculum in 6th grade. Table 24.1 illustrates the basic process of simple 1:1 matching. All students in both classrooms are given an IQ test at the beginning of the school year. For each student in classroom A, an attempt is made to identify a student in classroom B who is closely equated on IQ. This matching process diminishes the mean difference in baseline IQ between the two groups in our example from $M_A - M_B = 5$ in the full unmatched sample to $M_A - M_B = 0.5$ in the reduced, matched sample. A variety of computer algorithms are available that match T and C participants to produce the minimum discrepancy on the pretest variable (see Ming & Rosenbaum, 2001; Rosenbaum, 2002). These computer algorithms are particularly useful when both the T and C groups are large, are of dramatically different sizes, or both. For example, observational studies of initial trials of innovative programs (T) may involve a relatively small number of participants, whereas there are a substantially larger number of participants in the standard program (C) that serve as the comparison. In such cases, the algorithm will select a variable number of optimal matches (e.g. up to 5) for each participant.³ These variable matching procedures lead to more adequate equating of the groups on the matching variable and greater statistical power for the T versus C comparison, given the larger sample size (Ming & Rosenbaum, 2000).

Researchers are encouraged to measure many variables at baseline, particularly those that may be related to treatment group assignment or the outcome variable. Substantive theory and prior research can provide guidance in the selection of a set of measures that will capture as fully as possible potential baseline differences between the T and C groups. However, the availability of a large number of baseline variables makes matching far more complex. In rare cases, a composite variable can be created (e.g. the Gail score for breast cancer risk described earlier). More commonly, propensity scores are used. Propensity scores provide an estimate of the probability that a participant will be assigned to the treatment group (Rosenbaum, 2002; Rosenbaum & Rubin, 1983, 1984; Rubin, 1997; Shadish et al., 2006; Smith, 1997). The researcher uses all baseline variables (or a subset containing the most important ones

Table 24.1 Illustration of simple matching of two small classrooms on baseline IQ scores

Pair   Classroom A   Classroom B
-      130           -
1      125           124
2      120           120
3      119           119
4      119           118
5      117           116
6      115           115
7      109           109
8      107           107
9      107           106
10     104           102
11     101           101
12     96            96
-      -             90
-      -             89

Note: Scores were ordered within units and represent pretest IQ scores of participants. Pairs of participants on the same line represent matched pairs. One person in Classroom A and two persons in Classroom B have no matched pairs. The mean IQ score for all participants in Classroom A is 113; the mean IQ score for all participants in Classroom B is 108. The mean difference ($\bar{Y}_A - \bar{Y}_B$) for the full unmatched sample is 5. The mean for the matched pairs of Classroom A is 111.6 and for Classroom B is 111.1, yielding a mean difference of 0.5. $n_A = 13$ and $n_B = 14$ for the full sample.
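The 1:1 matching summarized in Table 24.1 can be sketched in a few lines of Python. This is only an illustrative sketch and not one of the optimal matching algorithms cited in the text: it uses a simple greedy nearest-neighbor rule, and both the helper name `greedy_match` and the caliper of 2 IQ points are assumptions introduced here. The scores are those listed in Table 24.1.

```python
from statistics import mean

# Pretest IQ scores from Table 24.1.
classroom_a = [130, 125, 120, 119, 119, 117, 115, 109, 107, 107, 104, 101, 96]
classroom_b = [124, 120, 119, 118, 116, 115, 109, 107, 106, 102, 101, 96, 90, 89]


def greedy_match(group_a, group_b, caliper=2):
    """Pair each score in group_a with the nearest unused score in group_b.

    Hypothetical helper: a greedy nearest-neighbor rule with an assumed
    caliper, not an optimal matching algorithm.
    """
    available = sorted(group_b, reverse=True)
    pairs = []
    for score in sorted(group_a, reverse=True):
        if not available:
            break
        nearest = min(available, key=lambda candidate: abs(candidate - score))
        # Keep the pair only if the two scores are close enough.
        if abs(nearest - score) <= caliper:
            pairs.append((score, nearest))
            available.remove(nearest)
    return pairs


pairs = greedy_match(classroom_a, classroom_b)
full_diff = mean(classroom_a) - mean(classroom_b)
matched_diff = mean(a for a, _ in pairs) - mean(b for _, b in pairs)

print(f"matched pairs: {len(pairs)}")
print(f"mean difference, full sample: {full_diff:.1f}")
print(f"mean difference, matched sample: {matched_diff:.1f}")
```

With this caliper, the student scoring 130 in Classroom A and the students scoring 90 and 89 in Classroom B remain unmatched, as in the table. A wider caliper would retain more participants at the cost of less closely equated pairs.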
if this number is very large) and predicts the probability that the participant will be in the T group. This probability is known as the propensity score.

There are two major issues in the creation of propensity scores. The first is to make sure that subject matter expertise in the form of prior research and theory has been used to select baseline measures that will capture as fully as possible important baseline differences between the T and C groups. The second is to choose a statistical model that adequately represents the form of the relationship between the variables and each participant’s propensity score. Rosenbaum and Rubin (1983) used simple linear logistic regression to produce these estimates. Dehejia and Wahba (1999) used more complicated logistic regression models involving specification of interactions and curvilinear effects of baseline variables. McCaffrey et al. (2004) used automated stepwise nonparametric regression tree methods to model possible complex relationships between the variables and the propensity score. In each case the goal is to achieve T and C groups that are balanced on all important baseline variables and for which the error of prediction in the sample has been minimized (Shadish et al., 2006). As an important check on the success of this procedure, the data are divided into five strata and the balance of the baseline variables within each stratum is compared. When balance is achieved, there is a strong basis for comparing the groups. If balance is not achieved within one (or more) stratum, the comparison of the treatment and control groups is carried out only over those strata on which balance has been achieved.

Each participant’s propensity score may then be taken as the best summary of the baseline information. The propensity score is used as the basis for equating the groups. The groups may be equated using the standard 1 to 1 or variable many to 1 matching procedures described above. Alternatively, analysis of covariance or blocking on the strata may be used (but see footnote 3). As an illustration of the matching strategy, Wu et al. (in press) constructed propensity scores for retention in first grade from a large set of baseline variables measured early in the school year. In the full sample (n = 769) of children at risk for grade retention, there were large differences between students on the Woodcock-Johnson reading score at baseline. Students who were later retained in first grade had substantially lower scores than students who were later promoted to second grade, $\bar{Y}_{\text{baseline-retained}} = 420$ versus $\bar{Y}_{\text{baseline-promoted}} = 438$. Optimal 1 to 1 matching on propensity scores yielded 97 matched pairs with $\bar{Y}_{\text{baseline}} = 422.4$ for the retained students and $\bar{Y}_{\text{baseline}} = 423.4$ for the promoted students. Similar reductions in baseline differences were achieved for other variables measured at baseline. Theoretically, propensity scores will provide a proper adjustment for the unknown assignment rule if all important baseline variables have been included and the form of the propensity model has been correctly specified.

Matching has substantial strengths in that it does not require specification of the form of the relationship between the baseline and outcome variables, it clearly delimits the range of the baseline variables over which T and C can be appropriately compared, and it leads to efficient estimates of the treatment effect because of the small number of parameter estimates that are involved. Hypothesized treatment group × baseline level interactions can also be examined within the matched propensity score framework. There are two primary limitations of the matched propensity score framework. First, it does not adjust the treatment effect for measurement error in the baseline variables giving rise to potential regression to the mean effects if very reliable and stable measures of important baseline variables are not available. Second, it does not adjust for other important variables (hidden variables) that are not measured at pretest, again emphasizing the importance of selection of the full range of potential baseline variables based on subject matter expertise.

Statistical adjustment strategies based on measured baseline differences

A variety of statistical models may be developed that attempt to adjust for baseline differences in measured variables. Perhaps,
the simplest is analysis of covariance (Huitema, 1980; Reichardt, 1979) which is used to provide an adjustment of the treatment effect for one or more baseline variables. Typically, a simple linear model is used, $\hat{Y} = b_0 + b_1\,\mathrm{COV} + b_2 X$, where Y is the outcome variable, COV is the covariate measured at baseline and X is the binary treatment indicator. This model can be extended to include multiple covariates, other parametric relationships (e.g. addition of a $b_3\,\mathrm{COV}^2$ term to represent a quadratic relationship between COV and Y), and treatment × covariate interactions (Cohen et al., 2003; Huitema, 1980; Reichardt, 1979). Nonparametric methods can be used to model more complex relationships between COV and Y (see Little et al., 2000). The primary limitation of ANCOVA methods is that their success in equating the T and C groups depends heavily on the correct specification of the adjustment model. For example, if the relationship between COV and Y is nonlinear and a simple linear ANCOVA model is used, the treatment effect estimate will be biased.

The basic ANCOVA approach shares the limitation with matching that baseline variables may be measured with less than perfect reliability. This problem is most serious when the T and C groups are selected from different populations, so that regression to the mean will occur (see Campbell & Kenny, 1999; Shadish et al., 2002). Even if the statistical adjustment model is otherwise correctly specified, measurement error will typically lead to under-adjustment of the treatment effect for baseline differences. Huitema (1980) provides an introduction and Fuller (1987) provides a more advanced treatment of methods for correcting for measurement error in the context of ANCOVA. Alternatively, when multiple indicators are available for each important construct measured at pretest, structural equation models can be used to provide measurement error-free estimates of the treatment effect. Aiken et al. (1994) provide a good discussion of the use of this approach and apply it to the evaluation of a drug treatment program. One limitation of the structural equation modeling approach is that the models to date have specified a linear relationship between the baseline measures and the outcome. Lee et al. (2004), Marsh et al. (2004), and Wall and Amemiya (2007) describe extensions of structural equation models that may account for curvilinear and interactive effects.

Correction for measurement error can also be desirable when treatment participants are selected on the basis of a variable that is unstable over time. For example, if T participants are selected based on high scores on a measure of depression (or because they are seeking treatment because of a severe depressive episode), it is likely that some of the participants are in a temporary state of high depression and would return to their typical level of depression in the absence of any treatment simply given the passage of time. Reliability correction methods that adjust the estimate of the treatment effect for the test-retest reliability for the time interval between the baseline and outcome measures in the absence of treatment can improve the estimate of the treatment effect. If repeated measures are collected on multiple indicators of the outcome variable at baseline and multiple other time points, special structural equation models can be used that partition the variance at each time point into state (temporary) and trait (true score) components (cf. Khoo et al., 2006; Steyer et al., 1992).

Adjusting for unmeasured baseline differences (hidden variables)

The matching and the statistical adjustment strategies described above can provide appropriate correction of the estimate of the treatment effect for variables measured at baseline. However, it is also possible that variables that are not measured at baseline could account for all or part of the estimated treatment effect. Three general strategies exist for addressing this problem.

First, a variety of methods have been proposed for conducting sensitivity analyses of treatment effect estimates (Marcus, 1997; McCaffrey et al., 2004; Rosenbaum, 2002). As an illustration of one simple method, imagine a researcher has found a
0.8 standard deviation difference (large effect size) between the T and C groups on the outcome variable. The researcher would then identify the largest standardized difference between the T and C groups on the set of variables measured at baseline. Suppose the largest baseline difference were d = 0.5 standard deviations. Then the researcher identifies the maximum correlation between any of the baseline measures and the posttest measure of the outcome of interest. Suppose the maximum correlation were r = 0.6. The product of these two quantities, $\text{adjustment} = \frac{\bar{Y}_{\text{baseline},T} - \bar{Y}_{\text{baseline},C}}{SD} \times r_{\text{baseline-outcome}}$, here adjustment = 0.5 × 0.6 = 0.3, provides a rough estimate of the maximum extent that this estimate of the standardized treatment effect would need to be reduced given what is a ‘worst case scenario’ for an important hidden variable. If the standardized treatment effect were reduced by this amount, to 0.8 − 0.3 = 0.5 in our example, we would have a plausible estimate of its lower bound. If this value were still statistically significant, it would provide evidence that the treatment effect is robust. Note that there is no theoretical reason why the actual adjustment required for hidden variables could not exceed this value. However, in practice, if a number of variables are measured at baseline and they can be presumed to be representative of important hidden variables, the adjustment will nearly always be an overestimate of the adjustment needed in practice.

Econometric approaches (e.g. Barnow et al., 1980; Heckman, 1979, 1989, 1990; Muthén & Jöreskog, 1983) have been proposed that adjust for the effects of both measured and unmeasured variables at baseline. Two separate equations are used in these models. The first (selection model) equation uses measured baseline variables to predict the assignment of the participant to the treatment or control group. The second uses this selection probability, an indicator variable (T = 1; C = 0) for treatment condition, and potentially other covariates to estimate the outcome. A key feature of this approach is the requirement of an instrumental variable,⁴ a variable that strongly predicts treatment assignment in the first equation but which has no separate relationship to the outcome (see Figure 24.1). In essence, the instrument can be thought of as a naturally occurring randomization (Heckman, 1996). The instrumental variable can only affect the outcome indirectly through its effect on treatment assignment, an assumption known as the exclusion restriction. If the assumptions of this approach are met, the treatment effect estimate will include proper adjustment for both measured and unmeasured baseline variables. However, in practice, this method is extremely sensitive to violations of its underlying assumptions, particularly the exclusion restriction (Heckman, 1997; Stolzenberg & Relles, 1990; Winship &

Figure 24.1 Illustration of econometric selection bias model
[Path diagram with the elements Instrumental Variable, Treatment Indicator, Outcome, Residual 1, and Residual 2; the path from Treatment Indicator to Outcome is labeled $B_{OT}$.]
Note: The instrumental variable directly affects only the Treatment Indicator (T = 1; C = 0). This condition is known as the exclusion restriction. Residual 1 is the error of the prediction of the Treatment Indicator including error produced by hidden variables. The hidden variables may also be associated with the residual of the Outcome (Residual 2). If the model is correctly specified, an adjustment of the regression coefficient $B_{OT}$ will yield an unbiased estimate of the treatment effect controlling for the hidden variables. If the assumptions of the model are violated (notably the exclusion restriction), the estimate of the treatment effect may be severely biased.
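The role of the instrument in Figure 24.1 can be illustrated with a small simulation. The sketch below is not the two-equation selection model described in the text; it swaps in the simpler Wald (ratio) instrumental-variable estimator for a binary instrument. Every number in it (sample size, coefficients, the true effect of 2.0, the seed) is an assumption chosen purely for illustration. A hidden variable drives both treatment uptake and the outcome, so the naive T versus C difference is biased, while the instrument-based estimate should land near the assumed true effect as long as the exclusion restriction holds.

```python
import random
from statistics import mean

random.seed(24)        # arbitrary seed, for reproducibility only
TRUE_EFFECT = 2.0      # assumed constant treatment effect
N = 50_000             # assumed sample size

z, t, y = [], [], []
for _ in range(N):
    instrument = random.randint(0, 1)    # influences treatment uptake only
    hidden = random.gauss(0, 1)          # unmeasured baseline variable
    # Treatment uptake depends on the instrument and the hidden variable.
    noise_t = random.gauss(0, 1)
    treated = 1 if 1.5 * instrument + hidden + noise_t > 0.75 else 0
    # Outcome depends on treatment and the hidden variable, never directly
    # on the instrument (the exclusion restriction).
    outcome = TRUE_EFFECT * treated + 1.5 * hidden + random.gauss(0, 1)
    z.append(instrument)
    t.append(treated)
    y.append(outcome)


def group_mean(values, groups, level):
    """Mean of `values` over cases whose entry in `groups` equals `level`."""
    return mean(v for v, g in zip(values, groups) if g == level)


# Naive comparison of treated and untreated cases (biased by the hidden variable).
naive = group_mean(y, t, 1) - group_mean(y, t, 0)

# Wald estimator: the instrument's effect on the outcome divided by
# its effect on treatment uptake.
wald = (group_mean(y, z, 1) - group_mean(y, z, 0)) / (
    group_mean(t, z, 1) - group_mean(t, z, 0)
)

print(f"naive difference: {naive:.2f}")
print(f"instrumental-variable (Wald) estimate: {wald:.2f}")
```

Letting the instrument enter the outcome equation directly, or weakening its influence on treatment uptake, is a quick way to see how sensitive the estimate is to the assumptions discussed in the text.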
Mare, 1992). When assumptions are violated, the treatment effect estimates of econometric models can be far more biased than those based on simpler approaches like ANCOVA or matching. In addition, even if the assumptions of the approach are met, the standard errors of the estimate of the treatment effect can be extremely large if the instrument is not very strongly related to treatment assignment. Finally, the econometric approach assumes that the treatment effect is constant across all participants.

A third approach suggested by Manski (1994), Manski and Nagin (1998), and Manski and Pepper (2000) has explored the effects of making weaker assumptions about instrumental variables in econometric selection models. This approach results in the estimation of a plausible range of values for the treatment effect within upper and lower bounds. However, in some cases, the bounds may be very large so that little information is conveyed about the size of the treatment effect.

Adjusting for growth

A final issue occurs when participants show different rates of natural growth (e.g. young children in math skills) or decline (e.g. Alzheimer’s patients in memory) on the outcome variable of interest. With observations taken only at baseline, no measure of the natural growth rate in the absence of treatment is available for the participants. Change score analysis (Judd & Kenny, 1981) can be used to estimate the treatment effect. Participants are measured on the same measure at baseline and outcome. These baseline and outcome measures are then transformed so that their variances are equated (see Huitema, 1980). The mean change in the T group is then compared with the mean change in the C group to provide an estimate of the treatment effect. This approach adequately models special situations in which growth is occurring at a constant rate across all participants or is of the fan spread variety in which growth is occurring at a rate proportional to the participant’s baseline score (e.g. those advantaged at baseline gain more). Treatment effects for other forms of growth are not well represented using this approach. More adequate modeling of growth requires the collection of additional data at multiple time points, ideally both before and after the treatment (Shadish et al., 2002; West et al., 2000). If sufficient additional time points are collected, the natural pattern of growth prior to treatment can be estimated; this pattern can be compared to the pattern of growth following the introduction of treatment in the T group. Singer and Willett (2002) describe multilevel modeling methods that estimate the treatment effect while allowing for differences between participants in growth rates.

Design enhancements

In line with the topic of this chapter, we have focused on methods of equating the T and C groups at baseline. However, we would be remiss if we did not remind readers of an important alternative strategy emphasized by Shadish and Cook (1999) and Shadish et al. (2002). This strategy involves adding design features that address specific threats to validity that arise in observational studies. Shadish and Cook (1999) argue that the use of design enhancements will often be preferable to the use of statistical adjustment strategies. We present three methods of enhancing the design of the basic observational study here (see Shadish & Cook, 1999, for an extensive list).

Multiple control groups

When a treatment and control group are selected in an observational study, they will be similar at baseline in some respects and different in others. This feature gives rise to the possibility that some hidden variable may be accounting for the result. If multiple control groups can be identified and the estimates of the treatment effects are similar when different control groups are used, the researcher’s confidence that the treatment effect is not biased is increased. For example, using a large database, Roos et al. (1978) compared children receiving tonsillectomies (T) with two different comparison groups: (a) children having a matched history of
respiratory illness; and (b) untreated siblings of the T child who were similar in age. Rosenbaum (2002) presents several examples of the use of this strategy.

Nonequivalent dependent variables
Other dependent variables that would be expected to be affected by the same factors as the outcome of interest, but not by the treatment can sometimes be identified. Reynolds and West (1987) studied the effect of a promotional campaign (T) versus no campaign (C) on the sales of state lottery tickets in convenience stores. The sales of lottery tickets increased in the T stores relative to the C stores. However, sales of other classes of items (e.g. groceries, gasoline) did not change appreciably, providing support that the increase in ticket sales resulted from the promotional campaign rather than other factors (e.g. greater increase in customer traffic in T stores).

Multiple pretreatment measures over time
We noted earlier that the collection of multiple measurements over time prior to treatment permits estimation of the pattern of growth or decline in the absence of treatment. In one design reported by Reynolds and West (1987), sales figures were available from each store for each of the 12 weeks of the lottery game. Sales declined each week during the lottery. The sales campaign was introduced into the T stores during the middle of the lottery, permitting a strong basis for estimating the treatment effect despite different rates of decline in the individual participating stores.

SUMMARY AND CONCLUSIONS

In this chapter we have considered methods of equating groups at baseline in randomized experiments and in observational studies. In randomized experiments, groups are equated to avoid unfortunate randomization and to maximize statistical power. Equating groups at baseline can also be helpful in interpreting the results when there is a breakdown of the original randomization, for example through treatment noncompliance or attrition. In observational studies, groups are equated to help assure that the estimate of the treatment effect is unbiased and not the result of baseline differences on measured or unmeasured variables.
Initial attempts to compare the effect sizes of observational studies and randomized experiments studying the same treatment have suggested that the direction of bias, if any, observed in the observational study is not consistent, but rather depends on the research area and features of the design. Research by Shadish and colleagues suggests that three related factors—(a) larger measured baseline differences; (b) self-selection into treatment; and (c) the use of comparison groups selected from a different population than the treatment group—are all associated with bias in treatment effect estimates in observational studies.
A variety of statistical adjustment and design approaches were considered to minimize the influence of these factors. Matching strategies, including matching on propensity scores, provide a strong basis for equating the T and C groups on measured variables. Key determinants of the success of this strategy include the use of content area expertise to select reliable variables that will capture baseline differences between groups as fully as possible and careful checking that the propensity score model leads to balance of the baseline variables within each stratum on the propensity score. Analysis of covariance and structural equation modeling can also properly equate the T and C groups on measured baseline variables and can also provide adjustment for measurement error. The key determinant of the success of these strategies is whether the relationships between the baseline variables and the outcome variable have been properly specified. For example, structural equation models have only recently been extended beyond examination of linear relationships. Econometric approaches provide appropriate adjustment for both measured and unmeasured variables, but the results may be fragile as they are dependent on meeting strong
statistical assumptions. Other econometric methods make weaker assumptions and provide upper and lower bound estimates of treatment effects; however, if the bounds are large, there will be considerable uncertainty as to the true size of the treatment effect. Change score analyses can estimate models for the special case in which there is constant or fan spread growth (or decline) in the absence of treatment. Addressing more complex forms of growth requires the collection of additional measurements over time both pre- and post-treatment.
A complementary and often preferable approach to statistical adjustment is the inclusion of design enhancements that address specific threats to internal validity that arise in observational studies. Potential nonequivalence can be addressed during the design of the study, ensuring that the participants in the T and C conditions are sampled from populations that are as comparable as possible. The use of additional design features that rule out specific threats to internal validity can often increase the confidence with which inferences about treatment effects may be made. These include the use of multiple control groups that address different threats to validity, nonequivalent dependent variables that would be expected to be affected by potential threats to validity, but not the treatment, and multiple pretreatment measures over time which permit estimation of patterns of natural growth and decline.
As researchers move from the ideal randomized experiment to weaker designs such as broken randomized designs involving noncompliance or attrition, to designs in which participants are assigned to T versus C conditions on the basis of a quantitative measure (Reichardt, 2006; see also Cook & Wong, this volume), and finally, to the observational studies that have unknown assignment rules, the estimate of the magnitude of the treatment effect becomes associated with increasing uncertainty. To the extent that researchers can bring substantive knowledge, additional design features that address specific validity threats, and good measurement to bear, this uncertainty can be reduced. Rosenbaum (2002) and Shadish et al. (2002) offer useful advice for planning studies to achieve this end.

NOTES

1 Other threats to internal validity are also possible, as when the experimenter uses different equipment or different observers to measure the outcome variable in the T and C conditions.
2 Blocking or analysis of covariance may also be used to increase statistical power. A priori matching is often preferred because it does not assume a specific form of relationship between the variable(s) on which participants are matched and the outcome variable. Matching can also make it easier to detect unexpected interactions between the matching variable(s) and treatment. Maxwell and Delaney (2004, pp. 448–452) provide a comparative discussion of the conditions under which matching, blocking, and analysis of covariance may be preferred.
3 As the ratio of the number of participants in the C to the T group approaches 5 or 6 to 1, the statistical power of the test approaches an asymptote. Adding additional C participants will lead to only very minimal increases in statistical power.
4 Earlier work within the econometric tradition proved that selection models were identified so that treatment effects could be estimated without an instrument. However, these models require the assumption of a specific distribution of the variables in the population. Theoretical work by Little (1985) showed that these models are extraordinarily sensitive to the specific distributional assumptions that were made. More recent work by Heckman (1997) has emphasized the importance of having a good instrument in producing unbiased estimates of treatment effects.

REFERENCES

Aiken, L. S., Stein, J. A., & Bentler, P. M. (1994). Structural equation analysis of clinical subpopulation differences and comparative treatment outcomes: Characterizing the daily lives of drug addicts. Journal of Consulting and Clinical Psychology, 62, 488–499.
Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of causal effects using instrumental variables (with commentary). Journal of the American Statistical Association, 91, 444–472.
Baker, S. G. (1998). Analysis of survival data from a randomized trial with all-or-none compliance: Estimating
the cost-effectiveness of a cancer screening program. Journal of the American Statistical Association, 93, 929–934.
Barnard, J., Du, J., Hill, J. L., & Rubin, D. B. (1998). A broader template for analyzing broken randomized experiments. Sociological Methods and Research, 27, 285–317.
Barnow, L. S., Cain, G. G., & Goldberger, A. S. (1980). Issues in the analysis of selection bias. In E. S. Stromsdorfer & G. Farkas (Eds), Evaluation studies review annual (Vol. 5, pp. 53–59). Beverly Hills, CA: Sage.
Campbell, D. T., & Kenny, D. A. (1999). A primer on regression artifacts. New York: Guilford.
Cochran, W. G., & Cox, G. M. (1957). Experimental designs (6th ed.). New York: Wiley.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Erlbaum.
Conner, R. F. (1977). Selecting a control group: An analysis of the randomization process in twelve social reform programs. Evaluation Quarterly, 1, 195–244.
Cook, T. D., Shadish, W. R., Jr., & Wong, V. C. (2006). Within-study comparisons of experiments and non-experiments: What the findings imply for the validity of different kinds of observational study. Unpublished Manuscript, Northwestern University. Available at: http://www.metheval.uni-jena.de/projekte/symposium2006/contributions.php
Dehejia, R. H., & Wahba, S. (1999). Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs. Journal of the American Statistical Association, 94, 1053–1062.
Friedman, L. M., Furberg, C. D., & DeMets, D. L. (1998). Fundamentals of clinical trials (3rd ed.). New York: Springer.
Fuller, W. A. (1987). Measurement error models. New York: Wiley.
Gail, M. H., Brinton, L. A., Byar, D. P., Corle, D. K., Green, S. B., Schairer, C., & Mulvihill, J. J. (1989). Projecting individualized probabilities of developing breast cancer for White females who are being examined annually. Journal of the National Cancer Institute, 81, 1879–1886.
Gilbert, J. P., Light, R. J., & Mosteller, F. (1975). Assessing social innovations: An empirical base for policy. In C. A. Bennett & A. A. Lumsdaine (Eds), Evaluation and experiment: Some critical issues in assessing social programs (pp. 39–193). New York: Academic.
Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica, 47, 153–161.
Heckman, J. J. (1989). Causal inference and nonrandom samples. Journal of Educational Statistics, 14, 159–168.
Heckman, J. J. (1990). Varieties of selection bias. American Economic Review, 80, 313–318.
Heckman, J. J. (1996). Randomization as an instrumental variable. Review of Economics and Statistics, 77, 336–341.
Heckman, J. J. (1997). Instrumental variables: A study of implicit behavioral assumptions used in making program evaluations. Journal of Human Resources, 32, 441–462.
Heckman, J. J., & Robb, R. (1986). Alternative methods for solving the problem of selection bias in evaluating the impact of treatments on outcomes. In H. Wainer (Ed.), Drawing inferences from self-selected samples (pp. 63–113). New York: Springer-Verlag.
Heinsman, D. T., & Shadish, W. R. (1996). Assignment methods in experimentation: When do nonrandomized experiments approximate answers from randomized experiments? Psychological Methods, 1, 154–169.
Hernán, M. A., Brumback, B., & Robins, J. M. (2001). Marginal structural models to estimate the joint causal effect of nonrandomized treatments. Journal of the American Statistical Association, 96, 440–448.
Holland, P. W. (1986). Statistics and causal inference (with discussion). Journal of the American Statistical Association, 81, 945–970.
Huitema, B. E. (1980). The analysis of covariance and alternatives. New York: Wiley.
Ioannidis, J. P. A., Haidich, A.-B., Pappa, M., Pantazi, N., Kokori, S. I., Tektonidou, M. G., Contopoulos-Ioannidis, D. G., & Lau, J. (2001). Comparison of evidence of treatment effects in randomized and nonrandomized studies. Journal of the American Medical Association, 286, 821–830.
Jo, B. (2002). Statistical power in randomized intervention studies with noncompliance. Psychological Methods, 7, 178–193.
Judd, C. M., & Kenny, D. A. (1981). Estimating the effects of social interventions. New York: Cambridge.
Khoo, S.-T., West, S. G., Wu, W., & Kwok, O.-M. (2006). Longitudinal methods. In M. Eid and E. Diener (Eds), Handbook of multimethod measurement in psychology (pp. 301–317). Washington, DC: American Psychological Association.
Kopans, D. B. (1994). Screening for breast cancer and mortality reduction among women 40–49 years of age. Cancer, 74 (Supplement), 311–322.
Lee, S. Y., Song, X. Y., & Poon, W. Y. (2004). Comparison of approaches in estimating interaction and quadratic effects of latent variables. Multivariate Behavioral Research, 39, 37–67.
Lipsey, M. W., & Wilson, D. B. (1993). The efficacy of psychological, educational, and behavioral treatment: Confirmation from meta-analysis. American Psychologist, 48, 1181–1209.
Little, R. J. (1985). A note about models for selectivity bias. Econometrica, 53, 1469–1474.
Little, R. J., Hyonggin, J., Johanns, J., & Giordani, B. (2000). A comparison of subset selection and analysis of covariance for the adjustment of confounders. Psychological Methods, 5, 459–476.
Little, R. J., & Rubin, D. B. (2000). Causal effects in epidemiological studies via potential outcomes: Concepts and analytical approaches. Annual Review of Public Health, 21, 121–145.
Little, R. J., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). New York: Wiley.
Manski, C. F. (1994). The selection problem. In C. Sims (Ed.), Advances in econometrics (Vol. 1, pp. 147–170). Cambridge, UK: Cambridge University Press.
Manski, C. F., & Nagin, D. S. (1998). Bounding disagreements about treatment effects. Sociological Methodology, 28, 99–137.
Manski, C. F., & Pepper, J. V. (2000). Monotone instrumental variables: With an application to the returns to schooling. Econometrica, 68, 997–1010.
Marcus, S. (1997). Using omitted variable bias to assess uncertainty in the estimation of an AIDS education treatment effect. Journal of Educational and Behavioral Statistics, 22, 193–202.
Marsh, H. W., Wen, Z. L., & Hau, K. T. (2004). Structural equation models of latent interactions: Evaluation of alternative estimation strategies and indicator construction. Psychological Methods, 9, 275–300.
Matthews, J. N. S. (2000). An introduction to randomized clinical trials. New York: Oxford.
Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison perspective (2nd ed.). Mahwah, NJ: Erlbaum.
McCaffrey, D. F., Ridgeway, G., & Morral, A. R. (2004). Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychological Methods, 9, 403–425.
Meier, P. (1972). The biggest public health experiment ever: The 1954 field trial of the Salk poliomyelitis vaccine. In J. M. Tanur, F. Mosteller, W. H. Kruskal, R. F. Link, R. S. Pieters, & G. R. Rising (Eds), Statistics: A guide to the unknown (pp. 120–129). San Francisco: Holden Day.
Ming, K., & Rosenbaum, P. R. (2000). Substantial gains in bias reduction from matching with a variable number of controls. Biometrics, 56, 118–124.
Ming, K., & Rosenbaum, P. R. (2001). A note on optimal matching with variable controls using the assignment algorithm. Journal of Computational and Graphical Statistics, 10, 455–463.
Muthén, B., & Jöreskog, K. G. (1983). Selectivity problems in quasi-experimental studies. Evaluation Review, 7, 139–174.
Reichardt, C. S. (1979). The statistical analysis of data for nonequivalent group designs. In T. D. Cook and D. T. Campbell (Eds), Quasi-experimentation: Design and analysis issues for field studies (pp. 147–205). Boston: Houghton-Mifflin.
Reichardt, C. S. (2006). The principle of parallelism in the design of studies to estimate treatment effects. Psychological Methods, 11, 1–18.
Reynolds, K. D., & West, S. G. (1987). A multiplist strategy for strengthening nonequivalent control group designs. Evaluation Review, 11, 691–714.
Robins, J. M. (1989). The analysis of randomized and nonrandomized AIDS trials using a new approach to causal inference in longitudinal studies. In L. Sechrest, H. Freeman, & A. Mulley (Eds), Health services research methodology: A focus on AIDS (pp. 113–159). Washington, DC: US Public Health Service.
Roos, L. L., Jr., Roos, N. P., & Henteleff, P. D. (1978). Assessing the impact of tonsillectomies. Medical Care, 16, 502–518.
Rosenbaum, P. R. (2002). Observational studies (2nd ed.). New York: Springer-Verlag.
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41–55.
Rosenbaum, P. R., & Rubin, D. B. (1984). Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association, 79, 516–524.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66, 688–701.
Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. Annals of Statistics, 6, 34–58.
Rubin, D. B. (1980). Discussion of ‘Randomization analysis of experimental data in the Fisher randomization test,’ by D. Basu. Journal of the American Statistical Association, 75, 591–593.
Rubin, D. B. (1997). Estimating causal effects from large data sets using propensity scores. Annals of Internal Medicine, 127, 757–763.
Rubin, D. B. (2005). Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100, 322–331.
Sacks, H. S., Chalmers, T. C., & Smith, H. (1983). Sensitivity and specificity of clinical trials: Randomized
v. historical controls. Archives of Internal Medicine, 143, 753–755.
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147–177.
Shadish, W. R., & Cook, T. D. (1999). Design rules: More steps towards a complete theory of quasi-experimentation. Statistical Science, 14, 294–300.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton-Mifflin.
Shadish, W. R., Luellen, J. K., & Clark, M. H. (2006). Propensity scores and quasi-experiments: A testimony to the practical side of Lee Sechrest. In R. R. Bootzin & P. E. McKnight (Eds), Strengthening research methodology: Psychological measurement and evaluation (pp. 143–157). Washington, DC: American Psychological Association.
Shadish, W. R., & Ragsdale, K. (1996). Random versus nonrandom assignment in controlled experiments: Do you get the same answer? Journal of Consulting and Clinical Psychology, 64, 1290–1305.
Singer, J. D., & Willett, J. B. (2002). Applied longitudinal data analysis: Modeling change and event occurrence. New York: Oxford.
Smith, H. L. (1997). Matching with multiple controls to estimate treatment effects in observational studies. Sociological Methodology, 27, 325–353.
Steyer, R., Ferring, D., & Schmitt, M. J. (1992). States and traits in psychological assessment. European Journal of Psychological Assessment, 8, 79–98.
Stolzenberg, R. M., & Relles, D. A. (1990). Theory testing in a world of constrained research design. Sociological Methods and Research, 18, 395–415.
Student (W. S. Gosset). (1931). The Lancashire milk experiment. Biometrika, 23, 398–406.
Wall, M. M., & Amemiya, Y. (2007). A review of nonlinear factor analysis and nonlinear structural equation modeling. In R. Cudeck & R. C. MacCallum (Eds), Factor analysis at 100: Historical developments and future directions (pp. 337–361). Mahwah, NJ: Erlbaum.
West, S. G., Biesanz, J. C., & Pitts, S. C. (2000). Causal inference and generalization in field settings: Experimental and quasi-experimental designs. In H. T. Reis & C. M. Judd (Eds), Handbook of research methods in social and personality psychology (pp. 40–84). New York: Cambridge.
West, S. G., Duan, N., Pequegnat, W., Gaist, P., DesJarlais, D., Holtgrave, D., Szapocznik, J., Fishbein, M., Rapkin, B., Clatts, C., & Mullen, P. (2007). Alternatives to the randomized controlled trial. Manuscript under review, Arizona State University.
West, S. G., & Sagarin, B. J. (2000). Participant selection and loss in randomized experiments. In L. Bickman (Ed.), Research design: Donald Campbell’s legacy (Vol. 2, pp. 117–154). Thousand Oaks, CA: Sage.
Winship, C., & Mare, R. D. (1992). Models for sample selection bias. Annual Review of Sociology, 18, 327–350.
Winship, C., & Morgan, S. L. (1999). The estimation of causal effects from observational data. Annual Review of Sociology, 25, 659–707.
Wu, W., West, S. G., & Hughes, J. (in press). Short-term effects of grade retention on the growth rate of Woodcock-Johnson III broad math and reading scores. Journal of School Psychology.
25
Discourse Analysis and
Conversation Analysis
Charles Antaki
ANALYSING DISCOURSE

‘Discourse’ means what people say or write. Scholars might want to look into what people say or write for many reasons, and their particular reason will play a large part in deciding just what sort of saying and writing they choose to study, and what methods they use to do so.
Students of history, cultural and media studies and politics, among other disciplines, will want at times to identify a ‘discourse’ as a collection of metaphors, allusions, images, historical references and so on that populate some cultural phenomenon (the discourse of modernity, for example, or the discourse of cyberculture, or the discourse of Human Resource Management; all current scholarly projects). That way of looking at discourse is more static than those I review in this chapter, where discourse is taken to be social action made visible in language. The sort of discourse analyst I talk about in this chapter is a social scientist: she or he sees discourse as an organisation of talk or text that does something, in the broad social world, or in the immediate interaction, or in both.
What is it that discourse is supposed to do? It varies, according to the interests of the analyst. The familiar way of setting out this difference is to range the interests from global to local. As we shall see later on, this distinction is itself a matter of dispute. But for the moment let us keep with it. At the more global end, discourse analysts can be interested in actions at the overarching level of social regulation, expressed through official and unofficial discourses like laws, media coverage or advertising texts; actions that have their effect not just in what is explicitly said, but what the analyst finds left unsaid. At that level, those doing the action (and those suffering it) may be classes of people, or ‘society’ in general. At the local end, the analyst might be interested in discourse that acts at the level of interaction, through conversationalists’ activities, realised in the allocation, organisation and internal design of turns at talk. Here, doers and sufferers are visible in the scene.
METHODS

There is no lack of methods available to discourse analysts once they have decided where their interests lie. Since the ‘linguistic turn’ in the social sciences of the nineteen seventies, qualitative methods textbooks have laid out an increasingly varied menu of discourse analytic methods, which have over the years moved from novel and marginal to familiar and central. Picking a method among these is apparently straightforward, once analysts have a clear idea of what interests them. In Table 25.1, I range interests alongside appropriate methods.
Students of discourse analysis (DA) will recognise that the column headings in Table 25.1 should only be used as a convenience, because I have pretended that one can just start with a simple notion of ‘what actions are to be revealed’, list them, then read off the corresponding theory, method and data. In fact, of course, theory and method have a large say in calling something an ‘action’ in the first place, and what counts as evidence for that action; so these three apparently solid columns are better thought of as fuzzy threads twined around each other. Indeed, not even the rows are discrete; they too are harder to separate than the simple table suggests. All that will become clearer as we see examples of discourse analytic work in practice.

Table 25.1 Discourse analytic methods and data according to researcher’s interests
What actions are to be revealed | Candidate theory/method | Typical data
Personal meaning-making | Narrative Analysis, Interpretative Phenomenological Analysis | Interviews, diaries, autobiographies, stories
Imposing and managing frames of meaning and identities | Interactional Sociolinguistics, Ethnography of speaking | Audio and video recordings, ethnographic observations
Accomplishing interactional life in real time | Conversation Analysis | Audio and video recordings
Displaying and deploying psychological states; describing the world and promoting interests | Discursive Psychology | Audio and video recordings, texts
Constituting and representing culture and society | [Generic] Discourse Analysis | Texts, interviews
Constituting and regulating the social and the political world; the operation of power | Critical Discourse Analysis | Official and unofficial texts, speeches, media accounts and representations, interviews

FOUR CORE FEATURES OF DA

Table 25.1 shows a variety of named discourse analytic methods, but I have reserved an entry for unadorned ‘discourse analysis’. That is useful for two reasons: it prompts us to ask what the core features are that make something recognisable as DA, and reminds us that many scholars are happy to use just these features without committing themselves to one or other specific variant.
The four core features of any DA are these:

• The talk or text is to be naturally found (in the sense of not invented, as it might be in psycholinguistics, pragmatics or linguistic philosophy; some analysts admit interview data into this natural category, while others do not);
• The words are to be understood in their co-text at least, and their more distant context if doing so can be defended;
• The analyst is to be sensitive to the words’ non-literal meaning or force;
• The analyst is to reveal the social actions and consequences achieved by the words’ use – as enjoyed by those responsible for the words, and suffered by their addressees, or the world at large.

Before I give an account of some specific examples of discrete sorts of discourse analysis, it would be as well to recall that many social scientists find a serviceable use for what we might call ‘generic’
discourse analysis. This is work done without a strong commitment to the sorts of epistemologies and ontologies of the schools of analysis we shall see later on: it is a sort of working procedure, inspired by the four basic principles of discourse analysis, and brought off in bespoke ways to make sense of one particular topic or domain of experience. The method of choice in such work is often an inspection of textual material (e.g. news media reports) or interview transcripts (e.g. researchers’ interviews with informants chosen for their particular experiences). The author or speaker is not, however, taken to be a simple informant, reporting unvarnished facts; he or she is seen as producing (or reproducing) themes or representations (sometimes called ‘interpretative repertoires’, after the influential use made of the term, originating in Gilbert and Mulkay (1984), by Potter and Wetherell (1987)). The job of the analyst is to sift carefully among the material to extract these themes or repertoires, and thus uncover the underlying dimensions along which the author or interviewee makes sense of their experiences, or, if the interest is less psychological, to uncover the imprint that society has left on their lives. Generic discourse analysis is, however, difficult to illustrate with a given empirical example, precisely because different studies take a great deal of colouring from their topic of interest (which might be media reports of political events, or people’s experiences of health and illness, or organisational change, or educational practice, to name three typical examples).
We shall be on firmer ground if we turn now to see how particular styles of discourse analysts address the texts in front of them. In what follows, I won’t be able to describe all the varieties of DA that I list in Table 25.1, still less those which haven’t quite yet joined the canon. I have chosen five influential varieties that have been successful (and controversial) in different ways: narrative analysis, critical discourse analysis (CDA), interactional sociolinguistics, conversation analysis and discursive psychology. I have also appended a further example to illustrate the sort of eclectic analysis that borrows from more than one school. I have allocated space to these six according to their influence as I see it, acknowledging that other reviewers may see things differently. Setting them out in series will reveal, I think, that the differences between them are instructive about what is at stake in the discourse analytic project as a whole.

NARRATIVE ANALYSIS

The origins of narrative analysis lie in literary anatomies of folk stories. Since the publication of Vladimir Propp’s The Morphology of the Folktale (1928), folklorists and literary analysts have had an interest in discerning the underlying and possibly universal patterns in what seem to be discrete and individual stories (for example, in one of Propp’s most basic templates, the underlying pattern of ‘the quest’ or ‘the restitution of an object lost at the start of the tale’). Social scientists, as opposed to literary and folklore scholars, have seized on the idea of structure, but shied away from looking for universal primitives as such. Their interest is in finding how the narrator finds a pattern and chronology that makes sense of her or his own unique life and the events in it (see, for example, the work collected in Schiffrin et al., 2006). Such patterns and chronologies might be shared among a like-minded group, but can equally be wholly particular to the individual.
As illustration we may consider the work of Michelle Crossley, whose Introducing narrative psychology: self, trauma and the construction of meaning (2000) crystallised the application of narrative DA to the study of psychology, especially the psychology of health and wellbeing. Crossley analyses, among other kinds of narrative, the self-reflections of people who have undergone traumatic changes in their health. Here is an excerpt from such a reflection, in an autobiography:

Without even realising it, before my diagnosis I had been living in an open, expansive, interior space.
Now the walls and ceilings had moved uncomfortably close. Limits were everywhere I looked … Gone was my sense of feeling protected or secure. Gone, too, was any feeling of certainty about the future. As my treatment progressed, these invisible losses were to become more painful, in some ways, than the outward, physical losses and privations of the disease and its remedies. (Mayer, 1994, p. 54, cited in Crossley, 2000)

Crossley’s analysis points us towards the realisation that in words such as these, we see how psychologically important it is for the individual to have an articulable ‘story-line’ which maintains continuity and integrity: the trauma is destructive insofar as it radically disturbs one’s sense of trajectory and sense of selfhood. As Crossley puts it: ‘This sense is severely disrupted in the face of trauma, which demonstrates a devastating capacity to “unmake the world”’ (Crossley, 2000, p. 541). The promise of this sort of discourse analysis is that it will recast ‘facts’ as constructions, reveal heretofore unsuspected and perhaps marginalised experiences, give voice to those whose experiences are not well understood, and perhaps feed into policy-making in the domains of health and education: two areas where narrative analysis has a strong presence.

CRITICAL DISCOURSE ANALYSIS

The umbrella term ‘Critical Discourse Analysis’ shelters a broad family of analysts, but all have this in common: they approach texts from a certain prior point of departure, often an avowedly political one. That is the ‘critical’ in the term. ‘The way we approach these questions’, says van Dijk, one of the doyens of CDA, ‘is by focussing on the role of discourse in the (re)production and challenge of dominance. Dominance is defined here as the exercise of power by elites, institutions or groups, that results in social inequality, including political, class, ethnic, racial and gender inequality’ (van Dijk, 1993, p. 249; emphasis in the original). To be aware of the exercise of power, and its resulting social inequality, requires a political theory about social life; and to have such a theory is vital. Without such a theory, the CDA argument runs, one risks wasting time on non-problems or trivialities, or telling only part of the story, and missing its political significance. In the worst case, one’s mere technical analysis, by refusing to recognise political forces at work in the data, may implicitly condone or perpetuate them.

Within this broad family of analysts there are those who come from a post-structuralist background, to some degree independent of the linguistics traditions which inform a good deal of critical discourse work. In the post-structuralist tradition much use is made of Michel Foucault’s insights into the operation of power in discourses, and, increasingly, psychoanalytical concepts from the school of Jacques Lacan. An example of this sort of CDA can be found in the work of Ian Parker (see, for example, his programmatic statement, Parker, 2003), and in the narrative analysis of Wendy Hollway (see, for example, Hollway and Jefferson, 2000), among many others. Other critical discourse analysts come from a linguistics background, and bring with them an array of linguistic tools with which to unfold their data.

For an illustration of the more linguistically oriented kind of CDA, consider this exemplary analysis, taken from a joint account of CDA by two of its best-known (but of course not uniquely representative) proponents and theorists, Norman Fairclough and Ruth Wodak (1997). They give a 125-line-long extract from a question-and-answer radio interview with Margaret Thatcher during her time as Britain’s Prime Minister. It is not an event-led news interview; she is being asked generally, if I can offer a rough gloss, about her political beliefs and aspirations. Fairclough and Wodak present their analysis in eight facets, of which I select the two most emblematic examples. Inevitably this will impoverish what they say, but it will give a flavour of these authors’ CDA style, on two central CDA themes: power and ideology. I will quote part of the transcript to help illustrate their analysis.
Extract 1: From Fairclough and Wodak, (1997, pp. 269–270) (MT = Prime Minister Margaret Thatcher.)

61 MT […] then you turn to internal security
62 and yes you HAVE got to be strong on law and order
63 and do things that only governments can do but
64 there it’s part government and part people because
65 you CAN’T have law and order observed unless it’s
66 in partnership with people then you have to be strong
67 to uphold the value of the currency and only
68 governments can do that by sound finance and then
69 you have to create the framework for a good
70 education system and social security and at that point
71 you have to hand over to people people are inventive
72 creative and so you expect PEOPLE to create thriving
73 industries thriving services yes you expect people
74 each and every one from whatever their background
75 to have a chance to rise to whatever level their own
76 abilities can take them […]

Power

Fairclough and Wodak see Thatcher’s display of power in a number of discourse features: her use of longish monologues; her interruption of her interviewer (not illustrated in the extract above); and her use of linguistic devices such as parallel constructions (‘it has to be strong to have defence’… ‘you HAVE got to be strong on law and order’… ‘you have to be strong to uphold the value of the currency’). Such rhetorical devices, the authors claim, are ‘the prerogative of professional politicians’ (ibid., p. 272). CDA’s willingness to use extra-textual claims (in this case, about what generally politicians do) is shared by many, but not all, kinds of DA.

Using their knowledge of the political scene, the authors are able to say that by using such privileged talk, Thatcher not only ‘circumvents and marginalises [the radio presenter’s] power as interviewer’, but also exercises her power over the radio audience. They go on to observe that ‘Thatcherism can … be partly seen as an ongoing hegemonic [power] struggle in discourse and over discourse, with a variety of antagonists – “wets” in the Conservative Party, the other political parties, the trade unions, and so forth’ (p. 273). This is a good illustration of how CDA is able to make the kind of generalisation that allows it to link the immediate data back to the analysts’ prior political commitments.

Ideology

The authors note that, in the extract above, Margaret Thatcher formulates a free-market ideology explicitly; but their analysis aims to add value by showing how she expresses the ideology more subtly. This stretch of her words (and some 20 further lines not shown here), they say, ‘is actually’ (i.e. not as one might first naively think, without analytic help) ‘built around a contrast between government and people which we would see as ideological: it covers the fact that “people” who dominate the creation of “thriving industries” and so forth are mainly the transnational corporations, and it can help to legitimise existing relations of economic and political domination’ (pp. 265–266).

Fairclough and Wodak do not specify exactly where in the extract Thatcher’s failure to mention transnational corporations was significant (that it is a ‘fact’ that her words ‘cover’). This is an important analytic point. Claiming that something is a fact, and that it is significantly absent from a stretch of discourse, is a harder claim to ground than pointing to something that is significantly present (after all, there is an infinity of things that may be facts, and which are absent from any given stretch of talk or text; whereas what is there is at least there). Different DA traditions solve the problem in different ways. CDA notices absence not by working it out from the logical or pragmatic implications of the utterances around it, or from the
reaction of those who are there to hear it, as other schools of analysis do. It works it out by virtue of prior theorising about the political or social nature of the world to which the utterance refers. In this case, Fairclough and Wodak have a prior theory or account of what is happening in the British economy, what ‘thriving industries’ refer to, that these industries are owned by transnationals, and that this ownership is important in the discussion that Thatcher is currently having with her interviewer. They have a further belief, or expectation, that if given an opportunity, a speaker should express the politically relevant facts of the matter (as the analysts see them, and whether they are logically or pragmatically implied or not, or whether the speaker’s local interlocutors hold them to it or not). Margaret Thatcher was given the opportunity, and did not mention transnationals; therefore, it is analytically safe, as well as useful, to claim that she is masking their role in the economy.

If we translate these snippets of analysis back into the four core features of DA (data found naturally; interpreted in co-text; non-literally understood; actions achieved), we see that CDA will insist on a very wide sense of ‘co-text’ in its interpretation, and on drawing out implications which may not be visible to those who do not share the analyst’s prior political commitments, or hesitate to apply them to the data. Its prime candidate for ‘social action’ is the action, taken to be unequally shared in society, of constituting the social world. CDA is attractive to scholars who have the view that DA must ally itself to a social theory, and must be aware of inequalities in society. This is shared, in a more dilute form, in the next influential DA I shall look at.

INTERACTIONAL SOCIOLINGUISTICS

Interactional sociolinguistics emerged from quantitatively minded variation sociolinguistics of the 1960s (and which still continues today) which sought to correlate features of speech (like a glottal stop or a truncated verb form) with demographic factors like geographic location or socioeconomic class, or situational variables like the formality or informality of the speech setting. As interest shifted into what those features of speech might actively be doing in interaction, researchers dropped the survey method in favour of a close qualitative look at what was going on in the scene – what the founders of interactional sociolinguistics, Dell Hymes and John Gumperz, called the ‘ethnography of communication’.

Like CDA, interactional sociolinguistics means to explore the way that social and cultural forces (including power differentials) cash out in the details of talk. Unlike CDA, its proponents do not normally require a specific prior theory of politics or society, beyond a generic belief that society is structured along class, gender and cultural or ethnic lines, and an expectation that this structure will reveal itself in interaction. A further difference is interactional sociolinguistics’ preference for a great deal of ethnographic knowledge of the local scene in which the discourse takes place, and a fairly particular set of codes with which to analyse it.

To the degree that working interactional sociolinguists draw on pioneering work by John Gumperz, they will see people achieving their local goals (or being thwarted from doing so) by offering each other (and taking up, or failing to take up) ‘contextualisation cues’. These are various sorts of hints, codes and signals as to what speakers mean. (The requirement to call such things ‘contextualisation cues’ has been progressively relaxed as interactional linguistics becomes more widespread, but remains important for core proponents of the method.) To get a sense of what these contextualisation cues are doing, the interactional sociolinguist is committed to knowing something about the local ethnography of the speakers’ situation: what jobs they do, what their goals are and so on.

Here is an illustrative analysis, taken from an account meant to show off interactional sociolinguistics against a number of other discourse approaches (Stubbe et al., 2003).
Before turning to the transcribed recording, the authors give us some background:

The discussion takes place between a senior public service manager, Tom, and an analyst, Claire, who is two ranks below him in the organisational hierarchy. From the ethnographic fieldwork that was done at the time of the data collection, we know that Claire is annoyed that she was overlooked for the shared acting manager position she believes she was promised by her own manager, and that she and some of her female colleagues interpret this as another example of gender discrimination within the organisation. We also know that she has expressed the intention to raise the issue with Tom [… continues …]. (Stubbe et al., p. 359).

The authors then invite us to read over the following lines to see how Claire gets across to Tom a way of framing what she is about to say or do in the interaction:

Extract 2: From Stubbe et al., p. 381 (transcription conventions in this extract: ‘+’ is a pause of up to one second; sloping lines indicate overlapping speech).

<#1:CT> yeah um yeah i want to talk to
you about um oh it’s a personal
issue um + well i- the decision to
make um jared acting manager while
//joseph\ is away
<#2:TR> /mm\\

The authors point to certain speech features (the intonation, the ums, the false starts) that suggest that Claire is nervous. The interactional sociolinguist means to ask why this might be so in this local scene, and what it might prefigure for the conduct of the interaction. We can infer, the authors tell us, that one cause of her nervousness is the fact that she is lower in the hierarchy than is her interlocutor (something they have established prior to this recording). Furthermore, she is nervous because she is doing what women do not do: ‘she is behaving in a direct, competitive way which is not stereotypically associated with women. This may help to explain some of the apparent tension in itself, as well as the likelihood that, given that her addressee is a senior male, her utterance may be heard as an implicit accusation of gender bias’ (ibid., p. 360). So the interaction starts off with the ‘contextualisation cues’ of a complaint involving gender bias, and the authors can then proceed to see how these two interlocutors bring it off.

Interactional sociolinguistics’ version of the four core features of DA (data found naturally; interpreted in co-text; non-literally understood; actions achieved) gives generous place to the wider ethnographic context. It is willing to use information from prior scenes to guess at what participants are feeling and intending in this one. It admits into its analysis inferences from prior theories, or common assumptions, about interaction. In the extract above, for example, a speaker was judged to be ‘nervous’, and her nervousness was partly ascribed to a common-sensical fear that a woman risks being heard as making a gender-based complaint. Such theorising is less particular and explicit than is required by CDA, yet still contrasts starkly with conversation analysis’ distaste for what they consider to be ‘going native’.

CONVERSATION ANALYSIS

Conversation analysis (henceforth, CA) is the study of social action as achieved through the medium of talk in interaction. Its genesis was in the dissatisfaction of some sociologists in the late 1960s with the dominant quantitative methodologies of their discipline, which were silent about how people actively realised the social world, in real time. In the 40 years since the pioneering work of the group around Harvey Sacks (whose lectures were published posthumously as Sacks, 1992), CA has attracted a good deal of attention within sociology and outside it, and has developed into a multidisciplinary enterprise (for an account of the history of CA, see Heritage, 1984; for a more recent overview of its methods and style, see Hutchby and Wooffitt, 1998; and for an account of its relation to other modes of DA, see Wooffitt, 2005).

CA abides by the four generic DA criteria of looking for natural data, setting it in its co-text, watching for its non-literal meaning, and identifying the social actions performed.
Perhaps its most obvious departure from this basic platform is its insistence on seeing social actions performed through the very close organisation, as well as the content, of talk. In describing those actions, CA – again unlike generic DA – wants to stay as close as possible to the speakers’ own understandings of the actions without imposing interpretation from above or speculation about motives from below. Its ‘added value’ is teasing out the what and the how, while shying away from the why, and leaving off anything not made ‘live’ by the participants in the scene.

The currency that CA trades in might be structures on a chronologically minute scale (for example, the binding relation between speakers’ adjacent utterances, and the injunction to keep their separation brief) or extensive (the overall shape of a story delivered over many turns), but they are all normative. That is to say, speakers are expected to follow them, or risk (or invite) listeners to draw implications when they do not. We can see an example of such a normative structure in the simple example below, where the second utterance meets the expectation of a prompt acceptance of the first:

Extract 3: Holt: 1988 Undated: Side 2: Call 1 (original transcription much simplified; for full list, see appendix)

1 Les: ((material not shown)) now we’re
2 feeling a bit freer.
3 (.)
4 Arn: [Ye:s.
5 Les: [.hhhhhh So we wondered if you’d
6 like to meet us.hh
7 → Arn: Yes certainly.

To show how strong the normative expectation is that the response be positive and prompt, consider this variant. Here the speaker’s non-normative silence in line 4 invites the listener to draw a significant implication.

Extract 4: From Levinson (1983), p. 320

1 A: So I was wondering would you be
2 in your office on Monday (.) by any
3 chance
4 → (2.0)
5 A: Probably not.

Note that it is A who is responsible for both turns – so why does s/he answer his or her own question, and answer it with a negative? Because B has done the unexpected thing of not answering, and thus allowed the implication that the answer is ‘no’. A then makes plain that this has been understood. The interaction can proceed, with both parties now having disposed of the possibility that A visit B’s office on Monday, without A having had explicitly to say no. The ‘action’ has been achieved by exploiting the regularities of talk.

CA has been applied productively to a variety of institutional activities otherwise accessible only in retrospect (by interviews with participants) or in simulation, or through comparatively coarse contemporary observation. For example, CA has been used in research on how talk in interaction achieves business meetings (Boden, 1994), educational testing (Maynard and Marlaire, 1992) and survey interviewing (Houtkoop-Steenstra, 2000), to take a few notable examples.

What can CA reveal about such working interactions? Peräkylä and Vehviläinen (2003) put it neatly. Members of a trade or profession (they were talking about psychotherapists, but it’s true of anyone who routinely has dealings with clients) may have ‘stocks of interactional knowledge’ – fairly clear ideas of what they do with the people they work with. CA can check these accounts, correct them, or go beyond them. In going beyond lay accounts, CA can discover things about the interaction that the practitioners didn’t suspect, or which have effects or functions which don’t figure in (or indeed may be counter to) the official aims of the encounter.

As an example of CA’s illumination of professional practice, consider Maynard’s work on clinicians’ delivery of a diagnosis. He inductively finds a pattern in which the clinician prefaces the actual diagnosis (you have X) by evidence (from test results, and so on). The typical sequence is like this, in which a doctor in a developmental difficulties
clinic is talking to a mother about her five-year-old son:

Extract 5: From Maynard (2004, p. 63)

1 Dr Y: From the:: test results (0.3)
2 he seems to function (0.6)
3 comfortably (0.2) you know and
4 (achieve) some kind of you
5 know happy and responsive
6 (0.2)
7 Mrs R: Ye [e:s ]
8 Dr Y: [ .h ]hh ON THE LEVEL of
9 about you know three
10 (0.1) and
11 a half year old child
12 Mrs R: mm

The doctor is describing evidence: the boy seems to function comfortably at the level of a three and a half year old. She is not (yet) giving a diagnosis. The next extract follows the first (though some intervening talk has been omitted). But notice how the doctor manages to avoid actually stating the child’s condition even as she makes her recommendation.

Extract 6: From Maynard (2004, p. 63)

1 Dr Y: I feel very strongly that, you
2 know, because he (0.4) tests
3 some kind you know, functions
4 between mildly retarded and
5 borderline level [.hhhhh ] he
6 needs special class placement.
7 Mrs R: [Mm hmm]
8 Dr Y: (Yeah) the (.) class for (0.2)
9 .hh educable mentally retardet
10 (0.2) will be the best (.) for
11 his (0.8) you know?
12 functioning and emotional, he’s
13 still not ready you know
14 enough [to be more- ]
15→ Mrs R: [Are y- are you tr]yin’ ta
16 tell me that you feel he
17 is: s:lightly mentally re
18 [tard]ed?
19 Dr Y: [Yes.]

What the doctor has done is to glide from a statement of the evidence (from the tests) to a recommendation for treatment, passing over actually naming the child’s condition. It falls to the mother (at lines 15–18) to make explicit what has so far been implicit. Maynard has noted this pattern in his work on news delivery in mundane conversation (Maynard, 2003). The news deliverer organises their hints at bad news in such a way that it is the recipient who is prompted actually to pronounce it. In ordinary social life that hinting has a set of implications which we might interpret as being to do with the complexities surrounding death and other taboo issues; in the clinic, it has all those, but also has more prosaic consequences as well. If the patient (or their representative, as in the case above) is the one who comes out with the news, it shows that he or she has been attending to what the doctor said, at least enough to work things out for themselves; it puts patient and doctor on something of an equal footing. Certainly it is more equal (or more equal-looking) than would be the case if the doctor simply pronounces the condition straight off.

CA AND ‘MEMBERSHIP CATEGORIES’

My account of CA so far has focused on sequential analysis. There is another strand of CA, traceable back to Sacks’ work in the early seventies, which, although it is alive to sequence and placement of utterances, is concerned with them insofar as they sustain the speaker’s version of events; and specifically, the speaker’s choice of identity or person categories. This is sometimes called Membership Category Analysis (though many in CA prefer to see it as merely a part of the broader CA project); but in any case, it is very different from other discourse work on identities. A generic DA of identities would look at material which explicitly names a given identity category (say, ‘asylum seeker’), and chart the ways in which that category is constructed. The aim of that sort of analysis would be to draw up a picture of ‘asylum seeker’ as it appears, explicitly and subtly, in the materials. Then a further stage of analysis takes over, and speculation is made about what interests such a picture serves in a general way in society. For CA, there is no need to go to such an abstract level and separate the use of the category from its consequences.
The speaker or writer’s use of (or hint at) an identity category is locally effective. If you call someone an asylum seeker (or hint that she or he is one) then you are doing it for local consumption, and the consequences will be interactionally visible. And this is true for mundane categories (like, say, ‘daughter’) as much as it is for more politically charged ones.

In the case of politically charged identities, consider what is happening here, in this extract from Dennis Day’s (1998) account of ‘ethnification’. Here, some workers in a factory in Sweden are in a coffee break and planning an upcoming works party.

Extract 7: Day, 1998, p. 163 (English translation from the Swedish)

1 L: that one has wine and normal
2 drinks too,
3 right, of course like a party
4 ((writing))
5 → L: that’s what we have at least
6 here in
7 → Sweden one drinks wine, that’s
8 of course
9 what [one wants
10 R: [of course, it’s like
11 different that
12 [to drink
13 L: [what does one drink in what
14 does one drink
15 L: ((points))
16 X: [don’t drink wine but light beer
17 or just (soda)

Speaker ‘X’, Day tells us, is categorisable on sight as not ethnically white-Swedish; she is (or looks) Chinese. But notice that we hardly need even this minimal piece of ethnography (and the reader might compare it with the thick description and inference required by interactional sociolinguistics; see above). See how, in lines 5 to 7, it is one of the participants himself (L) who introduces the notion that Otherness is a live issue. That’s what (drink) we have, he says; at least here in Sweden one drinks wine. It is the ‘we’ and the ‘here in Sweden’ that do the work of setting national or ethnic identities on the table. From the CA point of view, the minimal observation is that L has ‘ethnified’ X to the extent that he has called into question what drinks should be made available at the staff party. But there is more. He has explicitly excluded X from ‘we … here in Sweden’. The effect is to exclude her not only from the fellow-national category but the locally operative category of fellow member of the current social group.

Both Day’s work, and that of Maynard that I described above, are examples of CA’s claim to deliver the substance of large-scale social phenomena. Their claim is that if we want to say that, for example, agreement between patient and clinician is at a premium in US consulting rooms; or that people can exclude fellow-workers from joint ventures by subtly casting them into ethnic categories; then CA will provide the evidence – unaffected, its adherents say, by prior theorising about context or social forces.

DISCURSIVE PSYCHOLOGY

The epistemological commitment of conversation analysis – to begin with what the participants in the scene make visible to each other – is shared by Discursive Psychology. This is a movement, impelled by a number of hands, to make Psychology treat the traditional psychological topics of perception and cognition (seeing, remembering, knowing and so on) not, in the first place, as mental and individual matters, but as resources that people use: a person will avow a belief, challenge another’s veracity, test a third person’s knowledge, admit a faulty memory and so on. This branch of DA, like others we have covered, comes in various versions. I will pick an illustrative example from what has probably been the most empirically productive form, the Discursive Psychology developed by Derek Edwards and Jonathan Potter (for programmatic statements of their project, see Edwards, 1997, Edwards and Potter, 1992, and Potter, 2003).

Consider Edwards’ work on emotions (see, for example, Edwards, 1999). At first sight, emotions find a natural home in traditional Psychology: they are (surely?) subjective,
directly experienced, irrational, stimulated by events in the world, and liable to vary in intensity and character according to classic psychological variables such as social and physical stimuli, mood, age, gender and so on. Yet, Edwards argues, to say all this is to put the cart before the horse. All these things are true not necessarily about emotions-in-the-head, but about emotions-as-traded-in-interaction. People (who, after all, predate psychologists) treat them as all of the above things, and psychologists have fallen into the trap of thinking they are all true. Edwards does not mean we should therefore replace a scientific study of emotions with a study of people’s folk theories about emotion, or by asking them survey questions about what they think emotions are, or by recording their spontaneously offered definitions of emotions in natural talk and so on. Such things are of secondary importance. What is of prime importance is how people bring emotion terms into conversations (which may be mundane chat, or consequential events like police interrogations, marital counselling, psychotherapy, courtroom testimony and so on) actually to achieve their ends. To be sure, such ends will be served by the presumption that an emotion is internal, not rational and so on and so forth (or some distortion of this list, as circumstances demand) but that in no way guarantees the truth of the presumption, still less persuades us to give up the study of emotions in talk in favour of a possibly chimerical survey of emotions in the head.

We can put flesh on that argument by looking at a stretch of talk that Edwards reports from a marital therapy session, where one person’s descriptions of their spouse’s emotions have, of course, a high premium. Early in the session, ‘Mary’ describes what happened when she told her husband ‘Jeff’ of an affair she had had:

Extract 8 (DE-JF:C1:S1:4)

1 Mary: (. . .) so that’s when I decided
2 to (.)
3 you know to tell him. (1.0)
4 U::m (1.0)
5 and then::, (.) obviously
6 you went
7 through your a:ngry stage,
8 didn’t you?
9 (.)
10 Ve:ry upset obviously, .hh
11 an:d uh,
12 (0.6) we: started ar:guing a
13 lot, an:d
14 (0.6) just drifted awa:y.

Edwards invites us to notice how Mary trades on the presumptions of emotion terms to accomplish a number of rhetorically powerful moves. Jeff’s reaction to Mary’s revelation was (according to her account) to be angry; she does not report his state of mind as a matter of reasoned appreciation, but of visceral reaction. Moreover she portrays this anger as ‘your angry stage’. This implies that Jeff is prone to a predictable chain-reaction of emotions that are sparked, then run their course. These two undercurrents, heavily implied but never stated, bear Mary’s narrative into the rhetorically clear waters of inevitable separation. As Edwards puts it:

… while Jeff’s anger is proper in its place, one would not expect it to go on forever, to endure unreasonably, beyond its ‘stage’. Mary has made rhetorical room for something she goes on to develop, which is the notion that Jeff’s reactions are starting to get in the way of progress, starting to become (instead of her infidelity, as Jeff insists) ‘the problem’ they have in their relationship. Indeed, the next thing she says in her narrative (and implicationally, therefore, what not only follows but follows from Jeff’s reactions) is how ‘we started arguing a lot, and just drifted away’ (…) Their problems are now joint ones, arguments, and a kind of non-agentive, non-blaming, ‘just’ drifting apart. (Edwards, 1999, p. 277)

In other words, Mary’s description of events, in just that way and at just that time, has socially important consequences for how her relationship is to be read, how her spouse’s role in proceedings is to be understood, and perhaps how the counselling will proceed. Deploying an emotion term was not a neutral matter of describing the world as it is and was, but a rhetorically charged choice of a term that packed a punch, as any choice of description always does.
Edwards’ analysis here of the emotion term angry is a good example of the respecification that Discursive Psychology intends for the entire realm of ‘the mental’. It reminds psychologists that emotions, like any other ostensibly mental state of mind, may be allegedly owned in private, but are manifestly traded in public. This makes Discursive Psychology especially attractive for application to any discourse in which play is made of psychological terms, and that of course is a wide field. But we should notice that Discursive Psychology is not limited to the study of the use of psychological terms, common though such usage is. Discursive Psychology’s radical anti-cognitivism aligns it with other discourse analyses which take discourse to be constitutive of social (and not just social) reality – see, for example, Potter’s Representing Reality (1996). Were space to permit, it would have been instructive to describe its close, ethnomethodologically – and Conversation Analytically – inspired investigation of people’s interested descriptions and accounts of events, for example in such charged encounters as the police interrogation (Edwards, 2006). In its concern for unpacking descriptions of reality, Discursive Psychology is applicable to discourse in its widest remit.

AN EXAMPLE OF AN ECLECTIC DA

I want to turn for my last example to a DA inspired – if distantly – by ethnomethodology. If ethnomethodology has a place in a survey such as this one, it is an uncomfortable one at best. Most practitioners of ethnomethodology would not describe themselves as doing DA. Their aim – as the term ‘ethno-methodology’ suggests – is to explicate the reasoning practices or rules that ordinary people display in prosecuting their ordinary lives. While some of those practices are made visible in their use of language, many others are embodied in the props and resources which furnish the daily scene; in the temporal organisation of people’s comings and goings; in the artefacts and documents available for people to consult or refer to; and in the affordances of the physical sites they live and work in. Thus if one wanted to find out how people solve the problem of (say) taking turns to be served (Garfinkel, 2002, ch. 8), one would not limit oneself to analysing people’s language, but would analyse the ebb and flow of bodily movement, synchronised occupation of space, gestures, gaze and so on, to see how queues form and are oriented to and policed. Much more than language needs to be mastered by the person who wants competently to join a line for service – as many of us who have tried the experience in unfamiliar places, perhaps when in foreign lands, can testify.

Nevertheless, ethnomethodology has inspired a kind of DA which, while wanting to explicate people’s public reasoning processes, privileges talk in its ethnographic setting. Perhaps the best label for such work is ‘eclectic’, since it combines the four canonical principles of DA with a concern for the physical and temporal location in which the event takes place. For an example of such work, I have chosen a much-anthologised study by Hugh Mehan (1996) on how children are sorted into various categories by educators. This picks up the theme of identities in the section on CA above, and shows how an eclectic discourse analyst can use non-talk elements of the scene.

Mehan follows the career of one nine-year-old boy (‘Shane’). Our first sight of him is when a teacher spots him behaving in a way that concerns her. He then becomes a case for the educational psychologist, who tests him, and the language in which he is described changes from the teacher’s common-sensical, teacherly talk (‘he’s very apprehensive about approaching anything …’, ‘whenever he’s given some new task to do it’s always like, too hard, “no way I can finish it” ’) to technical, quantitative norm-based terms (‘he was given the WISC-R and his IQ was slightly lower, full scale of 93 …’).

Mehan’s set-piece for analysis is a recording of a subsequent meeting of educators (teachers, educational psychologists and so on) and parents. At this point Shane’s fate,
as is that of a list of children who have come to the school’s attention as possibly needing special education, is to be decided. Each case will be decided by talk; and as the outcomes are quite dramatically different (the child might be classified then and there as ‘learning disabled’ and sent to one kind of school, or as ‘educationally handicapped’ and sent to another), the power of discourse is all too visible.

It is up to the Board to hear the various descriptions of Shane available from his teacher, his parents, the school nurse and the psychologist, and meld them into a decision as to just what kind of schoolboy he is. Mehan describes the props (for example, the psychologist’s thick bundle of forms, test scores and reports) or the lack of them (the child’s mother has no notes) as part of the action. The props round out his observations about the talk: that, for example, the psychologist refers to her official notes while delivering her account uninterrupted, while the mother’s unsupported account is drawn out by others’ questioning; or that the psychologist’s document-based story, although freighted with obscure jargon, is not challenged, whereas the mother is asked to explain what she means by her common-sense claims about her son’s behaviour (claims that would pass unremarked in a more mundane setting; for example, that ‘lots of times he comes home and he’ll write or draw’). Mehan ‘adds value’ of a startling kind when he claims that

The psychologist’s report gains its authority by the very nature of its construction. The psychologist’s discourse obtains its privileged status because it is ambiguous, because it is shot full of technical terms, because it is difficult to understand (p. 357; emphasis in original)

Mehan’s point is that the technicality of the psychologist’s claims meant that they could not easily be challenged, so her conclusions were never subject to the sort of test that the mother’s or the teacher’s could be. Because of its permitted obscurity, it is the psychologist’s report that carries the day, and Shane is classified as having a learning disability; he has been set on a career which may have profound consequences (for good or ill). Mehan has not simply noted that different sorts of evidence have been brought forward to reach this decision; by careful note of how descriptions are phrased and received he has offered us the analysis that (as he puts it) ‘these modes of representation are not equal’ (p. 356). It is a DA that delivers the generic promise not merely of describing talk but of explaining social action, and adds specific ethnomethodological value by charting participants’ treatment of each other and the distributions of powers and expertise that they allow themselves.

CONCLUDING COMMENTS: DISCOURSE ANALYSIS MEANS DOING ANALYSIS

A word is in order to remind the reader that this account of DA has been selective. Each example, in the sections above, elbowed its way past a dozen equally significant competitors. Some styles of analysis were crowded out entirely, and a longer chapter may well have found space for interpretative phenomenological analysis (Smith, 2004), psychoanalytically oriented Marxist critical discursive psychology (Parker, 2002), Foucauldian discursive psychology (Wetherell and Edley, 1999), free-association narrative inquiry (Hollway and Jefferson, 2000), and action-implicative discourse analysis (Tracy, 2005), among others. And I ought to say that many working discourse analysts claim no specific rules beyond the four canonical DA features of looking for social action in natural data, non-literally understood in its co-text. Indeed some discourse analysts have made an explicit virtue of keeping their independence from restrictive technicality. An eloquent defence of this way of thinking is Billig’s case in favour of critical scholarship over narrow method (Billig, 1988, 1999). It is better, on his argument, to have the core discourse analytic sentiments in mind, be guided by a critical
spirit, and to avoid particular methodological practices which might miss as much as they catch.

However, whether one flies under the flag of a particular kind of DA or sails alone, it is not the case that ‘anything goes’. The editor of one of the principal, indeed defining, journals of the field sounds a clear warning in his editorial instructions: ‘Articles should provide a detailed, systematic and theoretically based analysis […]. It is insufficient to merely quote, summarise or paraphrase such discourse’ (Teun van Dijk, in the instructions on ‘Preferred Papers for Discourse & Society’ which has appeared in the journal since March 2002). A useful expansion of that injunction can be found in a joint paper by Antaki et al. (2003) who, although as individual authors they vary in their theoretical allegiances, nevertheless insist together that, as they put it, ‘discourse analysis means doing analysis’. Any discourse analyst who claims to be analysing, they argue, must ‘add value’ to what is readable or hearable in the words straight off, beyond simple paraphrasis or glossing; they must be able to back up their claims with some evidence grounded in the words used or warrantably not used; and they must reach their conclusions by argumentative steps available to a fair-minded fellow-scholar.

To use a DA ‘method’, or not, and which method to use, is not a simple matter of bloodless fashion; there are strong forces at work which push new methods onto the agenda (and indeed resist them). I haven’t been able to do justice to such forces in this chapter; for an excellent recent survey of the general ebb and flow in the tides of discourse methods, see de Beaugrande’s useful short account (de Beaugrande, 1997), Wood and Kroger’s book-length overview (Wood and Kroger, 2000) and Denzin and Lincoln’s thoughtful introduction to their recent Handbook of Qualitative Research (2005). DA is a particularly unsettled method of working in the social sciences – probably because, to its adherents, who want to understand (and sometimes unmask) social action, the stakes are high.

TRANSCRIPTION SYMBOLS FOR THE CONVERSATION ANALYSIS EXTRACTS

(.)              Just noticeable pause
(.3), (2.6)      Examples of timed pauses
word [word
     [word       The start of overlapping talk.
.hh, hh          In-breath (note the preceding full stop) and out-breath respectively.
wo(h)rd          (h) shows that the word has ‘laughter’ bubbling within it
wor-             A dash shows a sharp cut-off
wo:rd            Colons show that the speaker has stretched the preceding sound.
(words)          A guess at what might have been said if unclear
( )              Very unclear talk.
word=
=word            No discernible pause between two sounds or turns at talk
word, WORD       Underlined sounds are louder, capitals louder still
°word°           Material between ‘degree signs’ is quiet
>word word<      Faster speech
<word word>      Slower speech
↑word            Upward arrow shows upward intonation
↓word            Downward arrow shows downward intonation
→                Analyst’s signal of a significant line
((sniff))        Attempt at representing something hard, or impossible, to write phonetically

FURTHER READING

The sources cited in the References (at the end of this chapter) will take the reader further along the particular paths sketched out in the text. Those who would like to follow up issues and topics I have only mentioned fleetingly may like to pick among the following further readings.

Interpretative phenomenological analysis

An approach to individual meaning-making through a discursive analysis of interviews.

Smith, J.A. (2004) Reflecting on the development of interpretative phenomenological analysis and its contribution to qualitative research in psychology. Qualitative Research in Psychology, 1, 39–54.

Feminist discourse analysis

For a variety of examples of discourse analytic research projects that offer a specifically feminist approach, see:

Lazar, M. (Ed.) (2005). Feminist Critical Discourse Analysis: Gender, Power and Ideology in Discourse. Basingstoke: Palgrave.

Varieties of critical discourse analysis

There is a broad range within Critical Discourse Analysis. These sources, along with those cited in the text, will give an indication of the variety.

Rogers, R. (Ed.) (2003). An Introduction to Critical Discourse Analysis in Education. Mahwah, NJ: Lawrence Erlbaum.
Toolan, M. (Ed.) (2002). Critical Discourse Analysis: Critical Concepts in Linguistics (Vols 1–4). London: Routledge.
Wodak, R. & Meyer, M. (Eds.) (2001). Methods of Critical Discourse Analysis. London: Sage.
van Dijk, T. (1993) Principles of CDA. Discourse and Society, 4, 249–83.

Debate between conversation analysis and critics

This exchange is often cited as a useful crystallisation of the debate – not always temperate – between Conversation Analysts and their discourse analytically minded critics. I list the papers in their chronological order.

Schegloff, E. A. (1997) Whose text? Whose context? Discourse and Society, 8, 165–87.
Wetherell, M. (1998) Positioning and interpretative repertoires: conversation analysis and post-structuralism in dialogue. Discourse and Society, 9, 387–412.
Schegloff, E. A. (1998) Reply to Wetherell. Discourse and Society, 9, 413–6.
Billig, M. (1999) Whose terms? Whose ordinariness? Rhetoric and ideology in Conversation Analysis. Discourse and Society, 10, 543–558.
Schegloff, E. A. (1999) ‘Schegloff’s texts’ as ‘Billig’s data’: a critical reply. Discourse and Society, 10, 558–72.
Billig, M. (1999) Conversation analysis and the claims of naivety. Discourse and Society, 10 (4), 572–6.
Schegloff, E. A. (1999) Naiveté vs. sophistication or discipline vs. self-indulgence: a rejoinder to Billig. Discourse and Society, 10, 577–82.
Kitzinger, C. (2000) Doing feminist conversation analysis. Feminism and Psychology, 10, 163–93.

REFERENCES

Antaki, Charles, Billig, Michael, Edwards, Derek, and Potter, Jonathan (2003). Discourse analysis means doing analysis: A critique of six analytic shortcomings. Discourse Analysis On Line, 1(1). Available at: <http://www.shu.ac.uk/daol/articles/v1/n1/a1/antaki2002002.html>.
Billig, M. (1988). Methodology and scholarship in understanding ideological explanation. In C. Antaki (Ed.), Analysing Everyday Explanation: A Casebook of Methods. London: Sage.
Billig, M. (1999). Whose terms? Whose ordinariness? Rhetoric and ideology in conversation analysis. Discourse and Society, 10, 543–558.
Boden, D. (1994). The Business of Talk. Oxford: Polity.
Crossley, M. L. (2000). Narrative psychology, trauma and the study of self/identity. Theory & Psychology, 10, 527–546.
Day, D. (1998). Being ascribed, and resisting, membership of an ethnic group. In C. Antaki and S. Widdicombe (Eds.), Identities in Talk. London: Sage, pp. 151–170.
de Beaugrande, R. (1997). The story of discourse analysis. In T. van Dijk (Ed.), Discourse as Structure and Process. London: Sage, pp. 35–62.
Denzin, N. K., and Lincoln, Y. (2005). The discipline and practice of qualitative research. In N. K. Denzin and Y. Lincoln (Eds.), The Sage Handbook of Qualitative Research. London: Sage.
Edwards, D. (1997). Discourse and Cognition. London: Sage.
Edwards, D. (1999). Emotion discourse. Culture and Psychology, 5, 271–291.
Edwards, D. (2006). Discourse, cognition and social practices: The rich surface of language and social interaction. Discourse Studies, 8, 41–49.
Edwards, D., and Potter, J. (1992). Discursive Psychology. London: Sage.
Fairclough, N., and Wodak, R. (1997). Critical discourse analysis. In T. van Dijk (Ed.), Discourse Studies: A Multidisciplinary Introduction, Volume 2: Discourse as Social Interaction. London: Sage.
Garfinkel, Harold (2002). Ethnomethodology’s program: Working out Durkheim’s aphorism. Edited and introduced by Anne Rawls. Lanham, MD: Rowman & Littlefield.
Gilbert, G. N. and Mulkay, M. (1984). Opening Pandora’s Box: A Sociological Analysis of Scientists’ Discourse. Cambridge, UK: CUP.
Heritage, J. (1984). Garfinkel and Ethnomethodology. Cambridge: Polity Press.
Hollway, W. and Jefferson, T. (2000). Doing Qualitative Research Differently: Free Association, Narrative and the Interview Method. London: Sage.
Houtkoop-Steenstra, H. (2000). Interaction and the Standardised Survey Interview: The Living Questionnaire. Cambridge: Cambridge University Press.
Hutchby, I. and Wooffitt, R. (1998). Conversation Analysis. Oxford: Polity Press.
Levinson, S. C. (1983). Pragmatics. Cambridge, UK: Cambridge University Press.
Mayer, M. (1994). Examining Myself: One Woman’s Story of Breast Cancer Treatment and Recovery. Winchester, MA: Faber & Faber.
Maynard, D. W. (2003). Bad News, Good News: Conversational Order in Everyday Talk and Clinical Settings. Chicago & London: University of Chicago Press.
Maynard, D. W. (2004). On predicating a diagnosis as an attribute of a person. Discourse Studies, 6, 53.
Maynard, D. W. and Marlaire, C. (1992). Good reasons for bad testing performance: The interactional substrate of educational testing. Qualitative Sociology, 15, 177–202.
Mehan, H. (1996). The construction of an LD student: A case study in the politics of representation. In M. Silverstein and G. Urban (Eds), Natural Histories of Discourses. Chicago: University of Chicago Press. Reprinted in M. Wetherell, S. Taylor and S. Yates (2001) (Eds), Discourse Theory and Practice. London: Sage Publications.
Parker, I. (2002). Critical Discursive Psychology. London: Palgrave.
Parker, I. (2003). Psychoanalytic narratives: Writing the self into contemporary cultural phenomena. Narrative Inquiry, 13 (2), 301–15.
Peräkylä, A. and Vehviläinen, S. (2003). Conversation analysis and the professional stocks of interactional knowledge. Discourse & Society, 14 (6).
Potter, J. (1996). Representing Reality. London: Sage.
Potter, J. (2003). Discursive psychology: Between method and paradigm. Discourse & Society, 14, 783–794.
Potter, J., and Wetherell, M. (1987). Discourse and Social Psychology. London: Sage.
Sacks, H. (1992). Lectures on Conversation (Vols 1 and 2). Oxford: Basil Blackwell.
Schiffrin, D., De Fina, A., and Bamberg, M. (Eds.) (2006). From Talk to Identity: Methodological and Theoretical Issues in Identity Research. Cambridge University Press.
Stubbe, M., Lane, C., Hilder J., Vine E., Vine B., Marra M., Holmes J., and Weatherall, A. (2003). Multiple discourse analyses of a workplace interaction. Discourse Studies, 5, 351–388.
Tracy, K. (2005). Reconstructing communicative practices: Action-implicative discourse analysis. In K. Fitch and R. Sanders (Eds), Handbook of Language and Social Interaction (pp. 301–319). Mahwah, NJ: Lawrence Erlbaum.
van Dijk, Teun A. (1993). Principles of critical discourse analysis. Discourse & Society, 4, 249–283.
Wetherell, M., and Edley, N. (1999). Negotiating hegemonic masculinity: Imaginary positions and psycho-discursive practices. Feminism & Psychology, 9, 335–356.
Wood, L. A., and Kroger, R. O. (2000). Doing Discourse Analysis. Thousand Oaks: Sage Publications.
Wooffitt, R. (2005). Conversation Analysis and Discourse Analysis. London and New York: Sage.

26
Analyzing Narratives and
Story-Telling
Matti Hyvärinen
Narrative inquiry has established itself as a broad and polymorphous research orientation within the social sciences. The most varied personal, political, institutional, organizational and conversational stories are currently collected and studied, yet the term ‘narrative analysis’ remains replete with innate tensions. Does the research material as such qualify the narrativity of the analysis, or is it also required that these narratives are studied as narratives?

The use of narratives in social research may be characterized by three separate, but by no means straightforwardly successive moments. At the first stage, narratives were used as factual resources. The second moment was characterized by the study of narratives as texts with a particular form. The third moment includes a movement beyond a separate narrative text, into the study of narratives and storytelling as polymorphous phenomena in context.

Narratives bring into the open rich, detailed and often personal perspectives. Therefore, it is easy to misunderstand narrative simply as a method, and narratives as resources with which to investigate the phenomena of which the narratives make an account. A more ambitious version of narrative analysis draws from the social constructionist notion that narratives already always are part of the constitution of the social, cultural and political world (Bruner 1991; Gergen and Gergen 1993). ‘From a hermeneutic point of view’, Guy Widdershoven maintains, ‘human life is a process of narrative interpretation’, quite independently and before any narrative analysis (Widdershoven 1993, 2). These notions motivate theoretical investigation on how narratives are constituted, what their place is in human life, who is entitled to tell them and when, how they are received, and how they work in the social world. Narrative analysis is thus inseparable from concerns of the narrative constitution of selves, identities and social realities.

This chapter first discusses the concept of narrative and then proceeds to outline the use of narratives before what has been termed ‘the narrative turn’. Instead of one narrative
turn, three partly separate turns are discussed. As early versions of narrative analysis, the models of Vladimir Propp (1968) and William Labov and Joshua Waletsky (1997) will be introduced next. The Labovian model will be systematically used as a comparative backdrop for further developments: the move from text to context and the contribution of recent semantic and cognitive studies for the analysis of narratives. The last section suggests expectation analysis as a way to connect the Labovian heritage, contextual orientation and the idea of positioning. The focus of the chapter is on the analytic procedures, not on the interpretive alternatives.

THE NOTION OF NARRATIVE

Social scientists have seldom considered definitions of narrative (cf. Brockmeier and Harré 1991; Riessman 1993, 17–18). Many scholars simply repeat Aristotle’s characterization of a good tragedy having a beginning, middle and end (Aristotle 1968, 1450b). For open, conversational or artistic narratives this is a far too compelling formula, emphasizing the clear sequence of events; on the other hand the terms are far too broad to reveal anything fundamental in the nature of what narratives actually do.

Barbara Herrnstein Smith (1981, 228) offers a useful, rhetorically oriented definition: ‘Someone telling someone else that something happened’. With a slight revision we can also include sensitivity to the context: ‘Somebody telling somebody else on some occasion and for some purpose(s) that something happened’ (Phelan 2005b, 18). The next step taken in this chapter is to suggest that one can also turn the term ‘somebody’ into the plural form, making shared tellership visible (Ochs and Capps 2001).

Cultural studies may be criticized for two confusing ways of discussing narrative. In the first case, all kinds of interview talk are understood as narrative, narration or story. In this manner, the whole term of narrative is itself at risk of becoming redundant. Ordinary talk may as well include different genres of speech such as argumentation, instruction and narration (Linde 1993; Fludernik 2000). In the second case, narrative is a substitute for a general assumption, theory or ideological stance without temporal organization (Rimmon-Kenan 2006). Clive Seale, for example, suggests a far broader notion of narrative:

I understand narratives to be constructed through many things, including acts of consumption, for example, which can be made symbolically to tell stories about tastes, relationships (whether real or desired) or social standing. (Seale 2000, 37)

Seale points out convincingly how narrativity and narrative understanding are not something that only accounts for social action in retrospect. He also rejects, in a useful way, the too narrow textualist ways of understanding narrative and opens new areas for narrative analysis. Narrativity is woven into acting and planning in ways discussed more thoroughly a moment later. Yet, in order to ward off the tendency of ‘narrative imperialism’ (Strawson 2004; Phelan 2005a), the elegant solution suggested by Marie-Laure Ryan might be more sustainable:

The narrative potential of life can be accounted for by making a distinction between ‘being a narrative’, and ‘possessing narrativity’. (Ryan 2005, 347)

Narrativity may be understood as an aspect of texts, experiences and action; an aspect that invites more or less direct narrative responses. Narrativity is a matter of degree, rendering texts and speech more or less narrative. A wish for analytic clarity does not imply that narratives would exist as pure and distinct objects. It would be hopeless and misleading to assume that narratives are formally similar, always complete and always neatly distinct from other kinds of discourse (Ochs and Capps 2001). ‘Narrative is first and foremost a prodigious variety of genres’, asserts Roland Barthes (1966/1977, 79). This means that no definition will fit all narratives and that the desire for a conceptual consensus may be rather counter-productive.

NARRATIVES BEFORE NARRATIVE ANALYSIS

Many kinds of narratives were used as research material long before any narrative analysis. William I. Thomas and Florian Znaniecki (1984) used hundreds of more or less storied letters and other life documents in their classical work The Polish Peasant in Europe and America, originally published 1918–1920. In their analysis, letters and other documents constitute ‘life records’.

The Polish Peasant demonstrates the power of individual story for the sociological imagination. The belief in the factual, referential transparency of these documents of life is tangible while the authors read the letters as illustrations of attitudes, life situations or their own conclusions. While the authors introduced new kinds of material to social research, they were still convinced that their field of study was sociology. No less than 50 years later, when Norman K. Denzin (1970) revisited the heritage of life history method, he shared this sociological point of departure. Denzin points out that ‘the life history presents the experiences and definitions held by one person, one group or one organization as this person, group or organization interprets those experiences’ (ibid, p. 220).

Daniel Bertaux’ anthology Biography and Society (1981b) is an important threshold publication. It can be read as an early example of narrative studies; yet most of its articles discuss biography without any explicit narrative vocabulary. Bertaux himself recommends a far-reaching shift from the study of ‘life history’ to ‘life stories’, believing that the two kinds of data ‘might well involve a distinction between two different approaches’ (Bertaux 1981a, 7).

Martin Kohli (1981) explicitly offers the vision of narrative analysis. Kohli approaches biographical data from the perspective of its terms of production and wants to notice the ‘codes’, or ‘textual schemata which are available for the production of meaningful biographical accounts’ (p. 62). But this is a new research problem, and ‘one has to rely not only on sociological approaches, but also on those of linguistics and literature’ (p. 62). Where ‘life records’ orient the analysis towards registering past events, Kohli already addresses the relevance of the present moment and expectations of the future in the creation of biographical materials. Kohli notices the relevance of literary analysis for sociology by asserting that ‘both literature and sociology are dealing with texts’ (ibid., 67). The tone and point of view of his analysis is explicitly textualist: life stories should be analyzed as texts like literary artefacts.

The use of stories in social research thus has a much longer history than narrative analysis. Erik H. Erikson (1956, 118) had even suggested the systematic study of biographies of ‘ordinary people’. However, narrative was not theorized as such, and it received no entries in the index sections of the early works.

THE NARRATIVE TURNS

Instead of one narrative turn and one new attitude towards narrative, we can rather speak of at least three different turns and attitudes. Within literary studies, the narrative turn began as early as the 1960s and signified a structuralist, scientific and descriptive rhetoric in the study of narrative. In historiography, the turn to narrative theory indicated criticisms of naive narrative historiography and more generally ‘the value of narrative in representing reality’ (Mink 1987; White 1987). The narrative turn in social sciences began later, in the early 1980s and encompassed entirely different issues: positive appraisal of narratives as such, a general anti-positivist and often humanist approach to the study of human psychology and culture (Plummer 1983, 2001; Bruner 1991; Riessman 1993). Several historical accounts of narrative turns are currently available (Fludernik 2005; Herman 2005; Kreiswirth 2005; Riessman 2001; Hyvärinen 2006b), yet the diversity of histories of different disciplines is seldom addressed.

But why is it a turn in the first place? Aristotle wrote on tragedy; epics, biographies
and folktales had been studied for ages. But the new theoretical landscape was neither normative nor Aristotelian. What was new in the 1960s narrative inquiry was what Martin Kreiswirth identifies as ‘the institutional study of narrative for its own sake, as opposed to the examination of individual narratives’ (2005, 377–378). Marie-Laure Ryan (2005, 344) points out the birth of the new concept of narrative: ‘it is only in the past fifty years that the concept of narrative has emerged as an autonomous object of inquiry’. The abstract, theoretically rich, flexible, and thus quickly moving concept of narrative was a new thing even in literature and linguistics in the 1960s. Roland Barthes’s famous passage has been used to characterize the ubiquity of narrative:

Able to be carried by articulated language, spoken or written, fixed or moving images, gestures, and the ordered mixture of all these substances; narrative is present in myth, legend, fable, tale, novella, epic, history, tragedy, drama, comedy, mime, painting […], stained glass windows, cinema, comics, news item, conversation. (Barthes 1977, 79)

Looking from another angle, this passage indicates the existence of a new kind of concept of narrative. Structuralist narratology nurtured scientific ambitions and rhetoric. Its imagery ‘projects the illusion that narrative is knowable and describable, and therefore that its workings can be explained comprehensively. Narratology promised to provide guidelines to interpretation uncontaminated by the subjectivism of traditional literary criticism’ (Fludernik 2005, 38).

In education, psychology and sociology the narrative turn properly took place in the early 1980s, and often implied qualitative, humanistically oriented research – in stark contrast to the scientific, descriptive tenor of structuralist narratology and the growing post-structuralist discourse in cultural studies. The narrative turn signified both a new prospect and a new dilemma: many kinds of research materials were now to be theorized and analyzed as narratives – but often without the smallest consensus on what it actually meant.

Two major theoretical moves had a huge impact on social research. Critical reception of Jean-Francois Lyotard’s (1993[1983]) rejection of grand narratives was emblematic for the gradual rehabilitation of the alternative, small, forgotten and untold stories, often first in feminist studies. If quantitative research foregrounded dominant trends, stories were to theorize the particular. The post-modern suspicion of authoritative professional, scientific and institutional truths legitimated the search for new voices. Second, the new metaphoric discourse on ‘life as narrative’ suggested that narratives should have a unique role in the study of human lives, action and psychology (MacIntyre 1984; Ricoeur 1984; Carr 1986; Sarbin 1986; Bruner 1987; McAdams 1988, 1993; Polkinghorne 1988; Ochberg and Rosenwald 1992; Widdershoven 1993; Brockmeier and Harré 2001; Plummer 2001; Bamberg 2004a; Hyvärinen 2006b).

The new theoretical perspective was not easily reconciled with the inherited structuralist, formal and scientifically oriented methods of reading. In many a case, the adopted way to interpret narratives might duly be characterized as the hermeneutic re-telling of the stories, or narrative ‘criticism’ (e.g. Freeman 1993, 2004; Josselsson 2004). There is always the point to which good stories are informative as such, and able to evoke strong reader responses.

The metaphorical impulse for narrative studies created a huge search for methods. Here the story of the narrative turn is not so much progressive as often explicitly regressive: methods and theories were searched out from earlier decades and from other disciplines. Vladimir Propp (1968) and William Labov and Joshua Waletsky (1997 [1967]), for example, became widely topical in the 1980s. These retroactive moves of reception created substantial inconvenience between dominantly structuralist methods and often post-structuralist, phenomenological and hermeneutic theorizing. Yet, many authors have tried to overcome this tension and have written introductions to narrative analysis, including for example Kohler Riessman (1993, 2001); Lieblich et al. (1998); Clandinin and Connelly (2000); Czarniawska (2004); Daiute and Lightfoot (2004).

The metaphoric understanding of life as narrative sometimes incorporated the idea of one, ideally coherent, and encompassing story of life, as for example in McAdams (1993, 5, italics MH): ‘(I)n the modern world in which we all live, identity is a life story. A life story is a personal myth that an individual begins working on in late adolescence and young adulthood in order to provide his or her life with unity or purpose […]’ (see also Polkinghorne 1988, 150). Narrative is thus adopted as a way to re-theorize too static conceptions of self and identity. However, a kind of sweeping phenomenology and a rush to totalize the narrative aspect of life seem to characterize parts of the early theorizing: it is indeed the undivided and unquestioned ‘we’ who has these narrative identities and narrative selves. Despite considerable diversity among authors regarding how normative the tone was, two major conclusions seemed to appear repeatedly: life as a whole is, or is in search of, a narrative, while narrative implies first and foremost a unity of life (McAdams 1988, 1993). Discursive, post-structuralist analyses of personal narrative of course rejected this unitary vision. ‘From this perspective, the storyteller is not a unitary self, making holistic sense of his/her life in the telling. Instead, the stories that people tell about themselves are about many selves, each situated in particular contexts, and working strategically to resist those contexts’ (Squire 2004, 116).

The metaphoric discussion of life as narrative seems to have four equally important consequences. First of all, it makes the collection and study of life narratives vitally important; second, it privileges the ‘big’ narratives of life (see Bamberg 2004a, 2006; Freeman 2006; Georgakopoulou 2006); third, it gives a strong impetus towards reading life narratives as coherent and unitary; and finally, the emphasis on the expressive nature of life narratives encourages us to envisage them as self-sufficient wholes, waiting for ‘externalization’, and not primarily interactionally occasioned utterances within institutional and cultural contexts.

THE PROPPIAN MODEL

Propp studied one distinctive genre of narratives empirically – the Russian wonder-tales. He found out that ‘in the wonder-tale different characters perform identical actions, or, what is the same thing, that identical actions can be performed in very different ways’ (Propp 1984, 73). Within this formulaic genre, therefore, there are basic functions that can be actualized in different ways but which still occur in the tales in the same order. ‘So, for example, if the hero leaves home in quest of something, and the object of his desires is far away, he can reach it by magic horse, eagle, flying carpet, flying ship, astride the devil’ and so on (Propp 1984, 73).

Propp takes a remarkable variety of actions and condenses them into basic ‘functions’. On the other hand, the number of key actors in fairy tales was also reduced into basic categories. He identified the roles of villain, donor, helper, princess and her father, dispatcher and hero (Propp 1968, 77–83). The power of the model lies in this compression of the seemingly unlimited number of agents and their possible moves into a limited number of alternatives, and in arranging the functions into a sequence.

Propp intended a bottom-up, empirical and strictly inductive approach in the study of wonder-tales. The reception of the book in the French discussion of the early 1960s turned his project upside down (Propp 1984, 69–74). As a consequence, the model was primarily used in a top-down way: trying to fit parts of any narrative into wonder-tale categories. The merit of the model is to suggest that well-established cultural genres may privilege certain categories of agents, repertoires of actions and processes.

THE LABOVIAN PERSONAL NARRATIVE

Few other models of narrative analysis have ever had such a huge impact in social research as the one presented by William Labov and Joshua Waletzky (1997 [1967]).
The formative role of the model was reflected in the 1997 special issue of Journal of Narrative and Life History.

Emerging from the linguistic discourse, the model provided social research with one of the first tools to approach the studied narratives in a detailed way. Textually, the model offered clear criteria to recognize narrative, and recognize its difference from other forms of talk (description, argument or question).

Labov and Waletzky tried to find the smallest, most elementary, oral version of narrative. Following the main trend of the time, their approach is formal, trying to locate the structural model of narrative. But in addition to this, there is a conscious functional element: narratives are for ‘recapitulating experience’, but this is not the only function. A sheer experiential narrative would be pointless, they argue, without the function of ‘evaluation’ (Labov & Waletzky 1997, 4).

The basic element of the model is a ‘narrative clause’. Narrative clauses are ordered sequentially, and a change in their order would change the whole narrative. Thus, ‘I fell in love with Paula. My wife left me’ would be an entirely different story if the order of clauses had been reversed. But still, only very elementary narratives are exclusively built on these narrative clauses; ‘free’ and ‘restricted’ clauses are needed as well. The model is based on sequence, narratives being ‘one method of recapitulating past experience by matching verbal sequence of clauses to the sequence of events that actually occurred’ (Labov and Waletzky 1997, 12). The model has the following parts (Labov 1972, 370):

1. Abstract;
2. Orientation;
3. Complicating action;
4. Evaluation;
5. Result;
6. Coda.

As Hymes (1996, 193) notes, this structure resembles models created earlier in literary studies. In comparison with the very theoretical discussion of life as narrative, it steered interest towards more empirically based problems. The Labovian approach and such influential works as Elliot Mishler’s (1986) Research Interviewing informed narrative studies in practice, and offered the means to approach fairly small stories in a detailed way.

Mishler, however, was among the first to voice a key problem with the Labovian model, when he ‘pointed to its relative inattention to the interview context in the production of narratives’ (Mishler 1997, 71). In a typically structuralist way, the model portrays stories as independent and fully formed texts, and ‘appears to take the story or narrative as already formed, as waiting to be delivered’ (Schegloff 1997, 100). Schegloff points out that nothing is told about the recipients during the telling or afterwards, and that no silences or hesitations are reported (Ibid., 100–101).

The strong emphasis on sequence is another problem. Mishler (1997, 72) conveys a broadly shared experience in noticing how ‘in intensive life history interviews, respondents rarely provided chronological accounts’. In other words, the model, strictly based on clause-level narrative sequence, was too narrow to capture the complex narration so typical in interview situations. This seems to lead to a marginalization in the model of other aspects such as place, by rendering it only as a static element of orientation. But from life stories to fiction, place may have a much more central and constitutive role in the narrative (e.g. Herman 2002; Georgakopoulou 2003).

FROM TEXT TO NARRATIVE PRACTICE

The changing reception of the Labovian model exhibits a more profound change from studying narratives as separate, complete and self-sufficient texts towards a study of narratives in context and interaction and the study of narrative practices (Gubrium and Holstein 2008). Within this emerging understanding, ‘emphasis is on narrative activity as sense-making process rather than as a finished product in which loose ends knit together into a single story-line’ (Ochs and Capps 2001, 15).

The work of Elinor Ochs and Lisa Capps (2001, 3) marks, in various ways, the end
of the dominance of the Labovian form in narrative analysis. Instead of full narratives, proceeding through the six steps, the authors suggest conversational narratives, many of which ‘seem to be launched without knowing where they will lead’ (Ibid., 2). If narrative, as ‘a cognitively and discursively complex genre’, often incorporates the elements of description, chronology, evaluation and explanation, then the conversational storytelling completes and complicates this picture with the respective elements of question, clarification, challenge and speculation (Ibid., 18–19). What seemed to be formal and stable elements are transformed into processes.

Jaber F. Gubrium and James A. Holstein (2008) argue for a similar shift from strictly textual study of stories towards investigating the storying process, or ‘narrative ethnography’, as they call their approach. They recognize the relevance of the conceptual distinction between the story and storying process, which offers ‘grounds for thinking about narrativity as something interesting on its own’ (Ibid., 1). The observation has profound consequences. When the interest moves from narratives as separate texts into storytelling and narrative practice within social institutions, the social functions of narrativity can be theorized in a new way. This move out from the confines of narrative structure invokes a whole new array of questions, and the authors emphatically invoke even larger contexts than Ochs and Capps, seeing them embedded like nested dolls:

Concern with the production, distribution, and circulation of stories in society requires that we step outside of narrative material and consider questions such as who produces particular kinds of stories, where are they likely to be encountered, what are their consequences, under what circumstances are particular narratives more or less accountable, what interests publicize them, how do they gain popularity, and how are they challenged? (Ibid., 19)

Distinctive for the work of Gubrium and Holstein is the recognition of two different layers of control: interactional and institutional (Ibid., 30–41). Within this approach, they welcome the study of ‘narrative environments’, which ‘challenge as well as affirm various stories’ (Ibid., 26) and ‘narrative control’. Arthur W. Frank’s influential study The Wounded Storyteller (1995) portrays ‘restitution narrative’ as one of the three basic models of illness narratives, but as the model that is heavily supported by medical institutions, advertising and media (Frank 1995, 78–79).

Events, states and narrative genres

Catherine Riessman (1990, 75–78) identifies three separate narrative genres in the divorce talk she studied through interviews, calling them ‘proper’ stories, ‘habitual narratives’, and ‘hypothetical narratives’. In her discourse, ‘story’ is reserved for the kind of oral narratives Labov and Waletzky studied. Indeed, how representative is the Labovian narrative?

Paul Ricoeur (1984) discusses ‘the semantics of action’, suggesting a strong relationship between the vocabularies of narrative and action (Hyvärinen 2006a). The narrative theorist David Herman takes this point further and unpacks the key Labovian terminology of ‘complicating action’ in his Story Logic (2002). Drawing on the work of philosophers of language and semanticists, he suggests a far-reaching distinction between states, activities/processes, accomplishments and achievements (Herman 2002, 29–37):

[Zeno] Vendler […] proposed a fourfold distinction between activity terms (e.g. used to describe someone running or pushing a cart), accomplishment terms (used to describe someone running a mile or drawing a perfect circle), achievement terms (used to describe someone reaching the top of a hill), and state terms (used to describe someone as female, North-American, or in debt). (Herman 2002, 30)

Each of these categories presumes a different extension of time. For processes, the implied period of time is not definite, as it is for accomplishments. ‘Growing old takes a certain unspecified amount of time, whereas finishing a peanut butter sandwich entails a sequence of action that falls within a definite
temporal span’ (Ibid., 30). States (being in debt, being pregnant, being ill) apparently hold true over variable stretches of time.

This plurality helps to recognize new kinds of narratives. Frank (1995, 77), for example, briefly summarizes the restitution narrative: ‘Yesterday I was healthy, today I’m sick, but tomorrow I’ll be healthy again’. Does this narrative qualify at all as a story in the Labovian model? One could reasonably argue that states – states of mind, states of illness, states of body – figure more prominently within genres such as illness narratives.

Herman suggests that different narrative genres have different ‘preference-rules’. As an example of different preference-rules, one can take the difference between ‘epic’ and ‘psychological novel’ (Ibid., 37):

Epic                 Accomplishment > achievement > activities > states
Psychological novel  States > activities > accomplishments > achievements

It is easy to see that the Labovian model prefers the ‘epics’ over the ‘psychological novel’. If the original question was about life-threatening situations, this inclination to see adventurous stories as paradigmatic narratives is not surprising. The concept of ‘state’ is obviously of great importance for positional analysis of storytelling. Actions, activities and states can also be either bounded or unbounded – Riessman’s habitual narratives being a good example of the use of unbounded verbal forms.

Herman’s discussion of Halliday’s functional grammar, different verbal processes and semantic roles is of particular interest (Ibid., 140–148). Instead of approaching the whole range of verbal processes in terms of complicating action, Halliday’s grammar offers useful new distinctions. His model portrays six verbal processes:

Process Types (adapted from Halliday 1994 via Herman 2002):

Process type                               Role types
Material (Dispositive, Creative)           Agent, Goal
Mental (Perceptive, Affective, Cognitive)  Senser, Phenomenon
Relational                                 Carrier, Attribute, Identified, Identifier
Behavioural                                Behaver
Verbal                                     Sayer, Receiver, Target
Existential                                Existent

By simplifying Herman’s discussion, this variety of process types may be condensed into three semantic roles of agent, experiencer/witness and patient. In comparison with the Labovian model, the recounted mental processes and the corresponding roles of the experiencers are considered on equal footing with material actions. Perception, affection and cognition may be the actions privileged by particular genres, for example illness narratives. Genres, in turn, are far from exclusively textual phenomena; they are entirely socially conditioned (Bakhtin 1986). Gubrium and Holstein (2008, 34–37), for example, compare the narratives from Alcoholics Anonymous groups and Secular Sobriety Groups (SSG) as examples of different institutionally fostered ways of talking about alcoholism. While the SSG genre privileges the roles of agent and experiencer, the AA narratives in contrast privilege the other end of the continuum, experiencer and patient. The grammar thus provides a basic semantic matrix for the study of narrative positioning.

SCRIPTS, STORIES AND NARRATIVITY

The shift in attention from strictly defined narrative texts and their inner structures to cover broader narrative practices, as Gubrium, Holstein, Ochs, Capps and many others have suggested, invites closer scrutiny of ‘narrativity’ as a theme. Many narratologists have argued for understanding narrativity as a matter of degree (Fludernik 1996; Abbott 2002, 22; Herman 2002). ‘She drove the car to work’ is unequivocally a narrative clause, yet its narrativity is almost nonexistent.

When children begin to narrate experiences at about the age of two, their way of telling is particular, because

…children’s earliest personal narratives depict routine rather than particular, novel events. In addition,
when young children recount routine, scripted events, their narratives tend to be more detailed than those of depicting less common incidents. (Ochs and Capps 2001, 78; Nelson 2003, 28)

It is as if these routines and scripts were still, for children, an open and exciting world to be learned and accounted for. But it does not take many years to learn to focus on the unforeseen and exceptional: the diversions from routine. Mark Turner (1996, 19) calls these routine sequences stories, and argues that ‘most of our actions consist of executing small spatial stories: getting a glass of juice from the refrigerator, dressing, bicycling to the market. Executing these stories, recognizing them, and imagining them are all related because they are all structured by the same image schemas’. Turner is perfectly right in arguing for the relevance of such spatial sequences in organizing and perceiving human action. However, it is argued that these sequences are not yet stories.

Cognitive theorists have discussed scripts, frames and schemata as mental ways of understanding new and old situations (Schank and Abelson 1977). The famous restaurant script informs us about understandings of choosing a table, having a menu, ordering food and paying the bill as relatively permanent parts of the script. Scripts organize shopping, political campaigning and sexual relationships. Scripts, in addition to being cognitive, cultural and normative, also seem to be future oriented. It is possible to think that, both in following such scripts in practice and in telling stories about visiting restaurants, each teller contributes to the construction of a script, or as I suggest, a master narrative on the issue. Michael Bamberg (2004a, 361) expresses a similar thought without explicitly making the connection between master narratives and scripts:

I would like to catch up with the concession that speakers constantly invoke master narratives, and that many, possibly even most, of the master narratives employed remain inaccessible to our conscious recognition and transformation. Master narratives structure how the world is intelligible, and therefore permeate the petit narratives of our everyday talk.

An interesting interplay occurs above, involving slightly different horizons of a cultural script or (at least partly) shared cultural knowledge, master narratives presenting normatively privileged accounts, counter narratives that resist and take distance from such culturally privileged ways of telling, and the high narrativity of good stories that do not simply recount the cultural scripts. Because master narratives are seldom explicitly told by anyone, the more formulaic term ‘script’ is preferred here to refer to the cultural and situational impacts on narration.

As Jens Brockmeier and Rom Harré (2001) argue, very little is known about how exactly cultural scripts impose their models on individual action or narration. There seem to be two different ways to reckon with cultural-cognitive scripts. One is conscious reflection on, resistance to or affirmation of what has been called ‘master narratives’ (Andrews 2004; Bamberg 2004a; Jones 2004).

But what should be said about the master narratives, which ‘remain inaccessible to our conscious recognition and transformation’ (Bamberg 2004a, 361)? One answer is that the human capacity of narrativity processes this scripting level in an automatic way. As children, we start recounting the formulaic, normal course of events but learn step by step – in telling, listening and monitoring responses – to report on the exceptional. Our skill as narrators is established on expert understandings of such cultural scripts as ‘going to a restaurant’. Herman suggests ‘a direct proportion between a sequence’s degree of narrativity’ and the richness of ‘world knowledge’ that it triggers by using scripts. A clear paradox is made manifest here: narratives should invoke a rich density of scripts to provide thick narration, yet narration cannot merely constitute the repetition of these scripts:

Just as there is a lower limit of narrativity, past which certain ‘stories’ activate so few world models that they can no longer be processed as stories at all, refusing to be configured into action structures drawing on pre-storied scripts and frames, so there is an upper limit of narrativity, past which the tellable gives way to stereotypical, and the point of
a narrative, the reason for its being told, gets lost or at least obscured […]. (Herman 2002, 103)

Important conclusions can be drawn from this discussion. Narrativity is based on the processing of numberless cultural scripts. Scripts as such are not stories or narratives, because narrativity requires both ‘canonicity and breach’, as Jerome Bruner (1991) has put it. Scripts and formulaic narratives are used as resources both in living and telling; yet the whole point of narrativity grows out of surprise, betrayal of expectations, the ‘discordance’ of life (Ricoeur 1984). Beyond early childhood, there is no social telling of script-like sequences. But the told narratives can never be entirely individual, devoid of script-like resources. Narratives and narrativity thus move between cultural scripts (‘canonicity’) and totally idiosyncratic babble (breach in every moment).

If scripts and master narratives are vital parts of narrativity, so is the expectation they necessarily carry along. Labov and Waletzky (1997) noticed that recounted experiences are regularly contrasted with expectations. Reading, watching or listening to narratives triggers expectations that the stories either confirm or betray.

EXPECTATION ANALYSIS

Bakhtin (1986) not only understands all language use as response to earlier utterances; he also includes the aspect of expectation in every utterance: ‘As we know, the role of the others for whom the utterance is constructed is extremely great. […] From the very beginning, the speaker expects a response from them, an active responsive understanding. The entire utterance is constructed, as it were, in anticipation of encountering this response’ (Bakhtin 1986, 94).

Expectation analysis presumes that oral life stories essentially recount the story of changing, failing or realized expectations (in other words, they reflect ‘canonicity’). While experiences may be thought of as mainly personal and subjective, expectations are always social, local and conventional. The analysis of expectations focuses on the dialectics of recognizing, following and deviating from scripts. Originally presented by Hyvärinen (1994, 1998), the practice has been further elaborated by Komulainen (1998) and Löyttyniemi (2001).

The detailed way of reading owes much to Labov and Waletzky (1997), who already recognized the cognitive relevance of negative expressions, which paradoxically do not tell what happened, but what did not. On closer examination, there are a good many linguistic expressions that reckon with expectations, not the actual experience. Deborah Tannen (1993) has summarized the following list of what she calls ‘evidence of expectation’:

(1) Repetition; especially repetition of whole utterances; (2) False starts; (3) Backtracks, breaking-down of the temporal order of telling; (4) Hedges that flavour the relation between what was expected and what finally happened; indeed, just, anyway, however; (5) Negatives. As a rule negative is only used when its affirmative is expected (Labov, 1972, 380–381); (6) Contrastives; (7) Modals; (8) Evaluative language; (9) Evaluative verbs; (10) Intensifiers; including laughter. (Löyttyniemi 2001, 181)

The point of the list is to illustrate the way narrative is accounting for and making relevant past futures and past expectations rather than just piecing together action sequences. The claim behind the analysis is that the key turning points of life stories exhibit thickness of expectation and a strong presence of the ‘I’.

The examples below are from a study on the 1970s Socialist Student Union (SOL) in Finland (Hyvärinen 1994, 1998). The female interviewee, ‘Kirsi’, used to be a secretary general in a local university organization and member of the national central government of the SOL at the end of her career as an activist (Hyvärinen 1994, 164–167):

1 I guess it has been the same year when I’ve been in the Central Government that
2 I was totally stuck up
3 that I knew that now everything will go totally wrong
4 but I couldn’t say it in a way that I’d believed
5 and probably the guys of SOL also loathed me […]
6 but I sulked there
7 To me, the visits to the government were horrible. Yuk.
8 But … the reason why I really had the horrible feeling
9 was that I was in a deadlock. In a way there was nothing to do

As a narrator, Kirsi is normally very determined and strongly enacts her identity towards the interviewer. The problem here is that she cannot position herself anymore as an agent within the received horizon of expectations. In the above, she takes the position of affective experiencer who is not able to be a competent reflective experiencer in the situation. This is also a habitual narrative: it is about the state of being stuck, and unbounded emotional processes (sulking, loathing). The whole section is full of intensified, colourful expressions. She hates the situation; it is almost unbearable, but it is against her expectations of being a ‘good comrade’ to withdraw. The conflict of expectations is dramatized on lines (3–4): she sees that everything is going totally wrong but she cannot explain it – that is, she cannot solve the conflict within the frame of enduring expectations, since she cannot take her position as a brave speaker of truths.

A bit later she talks about leaving the position in the organization. The usual dilemma in those days was to find a replacement for the post to achieve a loyal exit:

1 It was a horrible task
2 I just said that in any case I’ll quit
3 because I’d next start to go haywire
4 it was that tough
5 because I was [p]
6 afterwards one learned a lot, in a way, though
7 but it was a high price to pay
8 it was the worst situation I’ve gotten into in my life including my divorce

At last, Kirsi is able to reassume the role of an agent, in the verbal form of speaker. The conflict of expectations and the old structure of expectations as a dutiful activist are broken down on lines (2–3), where her words ‘in any case’ indicate that she no longer cares about the old expectations, whatever happens. There is still the balancing role of a loyal ex-activist and reflecting experiencer on line (6) appreciating the experience as such but quickly counterbalanced again by the price of learning it. Kirsi moves to Helsinki, where no one knows her, and is able to experience a new adolescence with dancing and partying. The exhilaration is contrasted with the old expectation: ‘I really had hobbies no Bolshevik would have ever […] believed’ a secretary general to have. It is easy to see how this play with expectations signifies her re-positioning as regards the organization and the Communist movement.

A SECOND NARRATIVE TURN?

The map of narrative analysis is changing rapidly. Textual and structuralist models of analysis are giving way to more contextual approaches that focus on narrative practices and storytelling. Semantic theories and cognitive narratology offer new tools to connect the vocabularies of action and narrative in productive ways. Recent theories of narrative offer a new sensitivity to stories that are incomplete or foreground mental events (of observation, feeling, and cognition) instead of physical action. Expectation and positioning analysis alike direct attention to the fact that narratives not only account for past experiences but position speakers within networks of social and cultural expectations (Bamberg 2004b). The dialectics of ‘master’ and ‘counter’ narratives highlight the continuous move between cultural canon and individual expression. The rich flow of post-classical literary theory of narrative accentuates the need to realize the original, interdisciplinary ethos of narrative studies. Considering all these new and dynamic elements, it is indeed plausible to argue for a ‘second narrative turn’, as Alexandra Georgakopoulou (2006) does. The key to the realization of this promise, more than ever, seems to reside in realizing the interdisciplinary mission of the narrative turn.
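As a rough illustration of how the ‘evidence of expectation’ categories discussed under expectation analysis might be operationalized, the following Python sketch counts a few surface markers in Kirsi’s second excerpt. It is a minimal sketch under stated assumptions, not the procedure used in the studies cited: the function names are hypothetical, the word lists are illustrative stand-ins (only the four hedges come from the quoted list), and simple keyword matching cannot replace the interpretive reading described above.

```python
import re
from collections import Counter

# Illustrative word lists (assumptions). Only the four hedges are taken from
# the quoted list; every other entry is a hypothetical stand-in.
MARKERS = {
    "hedge": {"indeed", "just", "anyway", "however"},
    "negative": {"not", "no", "never", "nothing", "couldn't"},
    "contrastive": {"but", "though"},
    "modal": {"would", "could", "should", "might", "i'd", "i'll"},
    "evaluative": {"horrible", "tough", "worst", "yuk"},
}


def tag_line(line):
    """Count marker words per category in one transcript line."""
    text = line.lower().replace("\u2019", "'")  # normalize curly apostrophes
    tokens = re.findall(r"[a-z']+", text)
    return Counter({category: sum(token in words for token in tokens)
                    for category, words in MARKERS.items()})


def tag_excerpt(lines):
    """Sum the per-line counts over a whole excerpt."""
    total = Counter()
    for line in lines:
        total.update(tag_line(line))
    return total


# Kirsi's second excerpt as quoted in the chapter (lines 1-8).
EXCERPT = [
    "It was a horrible task",
    "I just said that in any case I'll quit",
    "because I'd next start to go haywire",
    "it was that tough",
    "because I was [p]",
    "afterwards one learned a lot, in a way, though",
    "but it was a high price to pay",
    "it was the worst situation I've gotten into in my life including my divorce",
]

if __name__ == "__main__":
    for category, count in tag_excerpt(EXCERPT).items():
        print(category, count)
```

Counts of this kind could at most point a reader towards passages that are dense in expectation markers; whether a given ‘but’ or ‘I’d’ actually marks a betrayed expectation remains a matter of interpretation in context.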
REFERENCES

Abbott, H. Porter. 2002. The Cambridge Introduction to Narrative. Cambridge: Cambridge University Press.
Andrews, Molly. 2004. Opening to the original contributions. Counter-narratives and the power to oppose. In Considering Counter-Narratives, edited by M. Bamberg and M. Andrews, pp. 1–26. Amsterdam and Philadelphia: John Benjamins.
Aristotle. 1968. Poetics. Oxford: Clarendon Press.
Bakhtin, M.M. 1986. Speech Genres and Other Late Essays. Translated by V. W. McGee. Austin: Texas University Press.
Bamberg, Michael. 2004a. Considering counter narratives. In Considering Counter-Narratives, edited by M. Bamberg and M. Andrews. Amsterdam and Philadelphia: John Benjamins.
Bamberg, Michael. 2004b. Positioning with Davie Hogan. Stories, tellings, and identities. In Narrative Analysis. Studying the Development of Individuals in Society, edited by C. Daiute and C. Lightfoot. Thousand Oaks, London, New Delhi: Sage.
Bamberg, Michael. 2006. Stories: Big or small – Why do we care? Narrative Inquiry 16 (1):139–147.
Barthes, Roland. 1977. Introduction to the structural analysis of narrative. In Image, Music, Text. Roland Barthes, edited by S. Heath. New York: Hill and Wang. Original edition, 1966.
Bertaux, Daniel. 1981a. From the life-history approach to the transformation of sociological practice. In Biography and Society. The Life History Approach in the Social Sciences, edited by D. Bertaux. Beverly Hills and London: Sage.
Bertaux, Daniel. 1981b. Introduction. In Biography and Society. The Life History Approach in the Social Sciences, edited by D. Bertaux. Beverly Hills and London: Sage.
Brockmeier, Jens, and Rom Harré. 2001. Narrative. Problems and promises of an alternative paradigm. In Narrative Identity. Studies in Autobiography, Self and Culture, edited by J. Brockmeier and D. Carbaugh. Amsterdam & Philadelphia: John Benjamins.
Bruner, Jerome. 1987. Life as narrative. Social Research 54 (1):11–32.
Bruner, Jerome. 1991. The narrative construction of reality. Critical Inquiry 18:1–21.
Carr, David. 1986. Time, narrative, and history. In Studies in Phenomenology and Existential Philosophy, edited by J.M. Edie. Bloomington & Indianapolis: Indiana University Press.
Clandinin, D. Jean, and F. Michael Connelly. 2000. Narrative Inquiry. Experience and Story in Qualitative Research. San Francisco: Jossey-Bass Publishers.
Series editor: David Silverman. London, Thousand Oaks & New Delhi: Sage.
Daiute, Colette and Lightfoot, Cynthia. 2004. Narrative Analysis. Studying the Development of Individuals in Society. Thousand Oaks, London and New Delhi: Sage.
Denzin, Norman K. 1970. The research act. A theoretical introduction to sociological methods. In Methodological Perspectives, edited by R. J. Hill. Chicago: Aldine Publishing Company.
Erikson, Erik H. 1994 [1956]. The problem of ego identity. In Identity and the Life Cycle, edited by E.H. Erikson. New York and London: W. W. Norton & Company.
Fludernik, Monika. 1996. Towards a ‘Natural’ Narratology. London and New York: Routledge.
Fludernik, Monika. 2000. Genres, text types, or discourse modes? Narrative modalities and generic categorization. Style 34 (1):274–292.
Fludernik, Monika. 2005. Histories on narrative theory (II): From structuralism to the present. In A Companion to Narrative Theory, edited by J. Phelan and P.J. Rabinowitz. Malden, MA: Blackwell.
Frank, Arthur W. 1995. The Wounded Storyteller. Body, Illness, and Ethics. Chicago & London: The University of Chicago Press.
Freeman, Mark. 1993. Rewriting the self. History, memory, narrative. In Critical Psychology, edited by J. Broughton, D. Ingleby and V. Walkerdine. London and New York: Routledge.
Freeman, Mark. 2004. Data are everywhere: narrative criticism in the literature of experience. In Narrative Analysis. Studying the Development of Individuals in Society, edited by C. Daiute and C. Lightfoot. Thousand Oaks, London and New Delhi: Sage.
Freeman, Mark. 2006. Life ‘on holiday’? In defence of big stories. Narrative Inquiry 16 (1):131–138.
Georgakopoulou, Alexandra. 2003. Plotting the ‘right place’ and the ‘right time’: place and time as interactional resources in narratives. Narrative Inquiry 13:413–423.
Georgakopoulou, Alexandra. 2006. Thinking with small stories in narrative and identity analysis. Narrative Inquiry 16 (1):122–130.
Gergen, Mary M., and Kenneth J. Gergen. 1993. Narratives of gendered body in popular autobiography. In The Narrative Study of Lives, edited by R. Josselson and A. Lieblich. Newbury Park, London & New Delhi: Sage.
Gubrium, Jaber F., and James A. Holstein. 2008. Narrative ethnography. In Handbook of Emergent
Czarniawska, Barbara. 2004. Narratives in Social Methods, edited by S. Hesse-Biber and P. Leavy.
Science Research. Introducing Qualitative Methods. New York: Guilford Press.
Halliday, M.A.K. 1994. An Introduction to Func- Interpretation. Vol. 47, Applied Social Research Meth-
tional Grammar. Second ed. London, Melbourne & ods Series. Thousand Oaks, London & New Delhi:
Auckland: Edward Arnold. Sage.
Herman, David. 2002. Story Logic. Problems and Linde, Charlotte. 1993. Life Stories. The Creation of
Possibilities of Narrative. Lincoln and London: Coherence. New York, Oxford: Oxford University
University of Nebraska Press. Press.
Herman, David. 2005. Histories of narrative theory (I): Lyotard, Jean-Francois. 1993 [1983]. The Postmodern
A genealogy of early developments. In A Companion Condition, edited by W. a. J. S.-S. Godzich, Theory
to Narrative Theory, edited by J. Phelan and and History of Literature. Minneapolis: University of
P.J. Rabinowitz. Malden, MA: Blackwell. Minnesota Press.
Hymes, Dell. 1996. Ethnography, linguistics, narrative Löyttyniemi, Varpu. 2001. The setback of a doctor’s
inequality. In Critical Perspectives on Literacy and career. In Turns in the Road. Narrative Studies
Education, edited by A. Luke and J. Cook. London: of Lives in Transition, edited by D. P. McAdams,
Taylor & Francis. R. Josselsson and A. Lieblich. Washington, DC:
Hyvärinen, Matti. 1994. Viimeiset taistot [The Last American Psychological Association.
Battles]. Tampere: Vastapaino. MacIntyre, Alasdair. 1984. After Virtue. A Study in Moral
Hyvärinen, Matti. 1998. Thick and thin narratives: Theory. Second ed. Notre Dame: University of Notre
Thickness of description, expectation, and causality. Dame Press.
In Cultural Studies: A Research Volume, edited by McAdams, Dan P. 1988. Power, Intimacy, and the
N.K. Denzin. Stamford: JAI Press. Life Story. Personological Inquiries into Identity.
Hyvärinen, Matti. 2006a. Acting, thinking, and telling: New York, London: The Guilford Press.
Anna Blume’s Dilemma in Paul Auster’s In the Country McAdams, Dan P. 1993. The Stories We Live By. Personal
of Last Things. Partial Answers 4 (2):59–77. Myths and the Making of the Self. New York and
Hyvärinen, Matti. 2006b. Towards a conceptual history London: The Guilford Press.
of narrative. In The Travelling Concept of Narrative, Mink, Louis O. 1987. Historical Understanding, edited
edited by M. Hyvärinen, A. Korhonen and J. by Brian Fay, Eugene O. Golob and R. T. Vann. Ithaca
Mykkänen. Helsinki: Helsinki Collegium for Advanced and London: Cornell University Press.
Studies. Mishler, Elliot G. 1986. Research Interviewing. Context
Jones, Rebecca L. 2004. ‘That’s Very Rude, I Shouldn’t and Narrative. Cambridge, MA: Harvard University
be Telling You That’. Older women talking about Press.
sex. In Considering Counter-Narratives. Narrating, Mishler, Elliot G. 1997. A matter of time: when, since,
Resisting, Making Sense, edited by M. Bamberg and after Labov and Waletsky. Journal of Narrative and
M. Andrews. Amsterdam/Philadelphia: John Benjamins. Life History 7 (1–4):61–68.
Josselson, Ruthellen. 2004. The hermeneutics of faith Nelson, Katherine. 2003. Narrative and the emergence
and the hermeneutics of suspicion. Narrative Inquiry of a consciousness of self. In Narrative and
14 (1):1–28. Consciousness. Literature, Psychology, and the Brain,
Kohli, Martin. 1981. Biography: account, text, method. edited by G.D. Fireman and T.E. McVay Jr. Oxford and
In Biography and Society. The Life History Approach New York: Oxford University Press.
in the Social Sciences, edited by D. Bertaux. Beverly Ochberg, Richard L., and George C. Rosenwald.
Hills and London: Sage. 1992. Storied Lives:The Cultural Politics of Self-
Komulainen, Katri. 1998. Kotihiiriä ja ihmisiä. Vol. 35, understanding. New Haven and London: Yale
Joensuun yliopiston yhteiskuntatieteellisiä julkaisuja. University Press.
Joensuu: Joensuun yliopisto. Ochs, Elinor, and Lisa Capps. 2001. Living Narrative.
Kreiswirth, Martin. 2005. Narrative turn in the Creating Lives in Everyday Storytelling. Cambridge,
humanities. In Routledge Encyclopedia of Narrative MA: Harvard University Press.
Theory, edited by D. Herman, M. Jahn and M.-L. Ryan. Phelan, James. 2005a. Editor’s column. Narrative
London and New York: Routledge. 13 (3):205–210.
Labov, William. 1972. Language in the Inner City. Phelan, James. 2005b. Living to Tell about It. A Rhetoric
Oxford: Basil Blackwell. and Ethics of Character Narration. Ithaca: Cornell
Labov, William, and Joshua Waletsky. [1967] 1997. Nar- University Press.
rative analysis: oral versions of personal experience. Plummer, Ken. 1983. Documents of Life. An Introduction
Journal of Narrative and Life History 7 (1–4): 3–38. to the Problems and Literature of a Humanistic
Lieblich, Amia, Rivka Tuval-Mashiach, and Tamar Zilber. Method, Contemporary Social Science Series. London:
1998. Narrative Research. Reading, Analysis, and Allen & Unwin.
Plummer, Ken. 2001. Documents of Life 2. An Invitation Schank, Roger, and Robert Abelson. 1977. Scripts, Plans,
to a Critical Humanism. London, Thousand Oaks, Goals and Understanding. New York: John Wiley &
New Delhi: Sage. Sons.
Polkinghorne, Donald E. 1988. Narrative knowing and Schegloff, Emanuel A. 1997. ‘Narrative Analysis’ thirty
the human sciences. In SUNY Series in Philosophy of years later. Journal of Narrative and Life History
the Social Sciences, edited by L. Langsdorf. Albany: 7 (1–4):97–106.
State University of New York Press. Seale, Clive. 2000. Resurrective practice and narrative.
Propp, Vladimir. 1968. Morphology of the Folktale. In Lines of Narrative. Psychosocial Perspectives,
Translated by L. Scott. 2nd ed. Austin: University of edited by M. Andrews, S.D. Sclater, C. Squire and
Texas Press. A. Treacher. London and New York: Routledge.
Propp, Vladimir. 1984. The structural and historical study Smith, Barbara Herrnstein. 1981. Narrative version,
of the wondertale. In Theory and History of Folklore, and narrative theories. In On Narrative, edited by
edited by A. Liberman. Manchester: Manchester W.J.T. Mitchell. Chicago: University of Chicago Press.
University Press. Squire, Corinne. 2004. Narrative genres. In Qualitative
Ricoeur, Paul. 1984. Time and Narrative 1. Translated by Research Practice, edited by C. Seale, G. Gobo,
K. McLaughlin and D. Pellauer. 3 vols. Vol. 1. Chicago J.F. Gubrium and D. Silverman. London, Thousand
and London: The University of Chicago Press. Oaks, New Delhi: Sage.
Riessman, Catherine Kohler. 1990. Divorce Talk. Women Strawson, Galen. 2004. Against narrativity. Ratio (New
and Men Make Sense of Personal Relationships. New Series) XVII (4):428–452.
Brunswick and London: Rutgers University Press. Tannen, Deborah. 1993. What’s in a frame? In Framing
Riessman, Catherine Kohler. 1993. Narrative Analysis, in Discourse, edited by D. Tannen. Oxford: Oxford
Qualitative Research Methods Volume 30. Newbury University Press. Original edition, Freedle, R.O. (Ed.)
Park, London & New Delhi: Sage. 1979 New Directions in Discourse Processing.
Riessman, Catherine Kohler. 2001. Analysis of personal Thomas, William I., and Florian Znaniecki. 1984.
narratives. In Handbook of Interview Research, edited The Polish Peasant in Europe and America, edited by
by J.F. Gubrium and J.A. Holstein. Thousand Oaks, E. Zaretsky. Urbana and Chicago: University of Illinois
London & New Delhi: Sage. Press. Original edition, 1918–1920.
Rimmon-Kenan, Shlomith. 2006. Concepts of narrative. Turner, Mark. 1996. The Literary Mind. The Origins of
In The Travelling Concept of Narrative, edited by Thought and Language. Oxford and New York: Oxford
M. Hyvärinen, A. Korhonen and J. Mykkänen. University Press.
Helsinki: Helsinki Collegium for Advanced Studies, White, Hayden. 1987 [1981] The value of narrativity
University of Helsinki. in the representation of reality. In The Content
Ryan, Marie-Laure. 2005. Narrative. In Routledge Ency- of the Form. Narrative Discourse and Historical
clopedia of Narrative Theory, edited by D. Herman, Representation, edited by H. White. Baltimore &
M. Jahn and M.-L. Ryan. London and New York: London: The Johns Hopkins University Press.
Routledge. Widdershoven, Guy A.M. 1993. The story of
Sarbin, Theodor. 1986. Narrative as a root metaphor life. Hermeneutic perspectives on the relationship
for psychology. In Narrative Psychology. The Storied between narrative and life history. In The Narrative
Nature of Human Conduct, edited by T. Sarbin. Study of Life, Volume I, edited by R. Josselsson and
New York: Praeger Press. A. Lieblich. Newbury Park and London: Sage.
27
Reconstructing Grounded Theory
Kathy Charmaz
In the 40 years since Barney G. Glaser and Anselm L. Strauss (1967) wrote their pioneering book, grounded theory has become a general qualitative method that cuts across disciplines and professions. The method consists of several distinctive strategies; however, scholars vary in what they adopt and major proponents differ on which strategies they see as integral to the method (see Charmaz, 2006; Clarke, 2005, 2006; Glaser, 1998, 2001; Strauss, 1987; Strauss and Corbin, 1990, 1998). What then is grounded theory? What does it include? The term refers to both a method of theory construction, my focus here, and the product of this construction, a theory that explains or elucidates a particular process or phenomenon.

The grounded theory method provides systematic, successive strategies for developing fresh ideas to collect, study, and analyze empirical data (see also, Atkinson et al., 2003; Clarke, 2005, 2006; Glaser, 1978; Glaser and Strauss, 1967). Grounded theory starts with an inductive logic and emphasizes simultaneous data collection and analysis to construct middle-range theories.

Those who subscribe to grounded theory would accept this definition of the method. Yet which grounded theory strategies to adopt, what they entail and how to put them into practice have undergone change and reconstruction, even by the originators themselves (see Glaser, 1998, 2001; Strauss, 1987; Strauss and Corbin, 1990, 1998). Major differences among proponents arise from varied assumptions about what constitutes theory and from contrasting epistemological allegiances. These allegiances result in different constructions of the research process, the practice of theorizing, and what stands as erosion or evolution of the method (see Baker et al., 1992; Boychuk Duchscher and Morgan, 2004; May, 1996; Mills et al., 2006; Stern, 1994).

Throughout the chapter, I show how grounded theory, and its various iterations, have shifted and changed. I also address the following objectives: (1) to situate the original methodological contribution of grounded theory; (2) to look at the history and development of the method; (3) to outline postmodern challenges to the method and discuss its constructivist reconstructions; and (4) to analyze grounded theory as method and practice. I attend to debates about grounded theory and show how they are
played out as various proponents reconstruct the method and note its potential for creating imaginative interpretations.

SITUATING THE USE AND METHODOLOGICAL IMPORT OF GROUNDED THEORY

To understand why and how scholars, including its originators, have reconstructed grounded theory, one needs to know about the situations surrounding its development and current directions. These situations transcend the method itself as they include its followers and critics. Both followers and critics tend to have limited visions of the method. Followers commonly identify the version of grounded theory they first learned as representing the method in its entirety (Urquhart, 2007). Some followers and critics have scarcely read beyond Glaser and Strauss’ (1967) original exegesis. Critics often conflate the way the originators used the method with inherent characteristics of the method (see, for example, Burawoy, 1991; Layder, 1998). They argue that grounded theory cannot account for macro social processes or structures left untapped at the interactional level.

Grounded theory made its methodological mark by proposing explicit guidelines for theorizing from data. From Glaser and Strauss’ original treatise to recent major statements by Adele E. Clarke (2003, 2005) and Kathy Charmaz (2000, 2006), grounded theorists have emphasized constructing theory from inductive qualitative data through using successive analytic strategies. Grounded theory methods have appealed to diverse researchers from varied disciplines and professions who have claimed allegiance to using them. By now, spokespersons have emerged in a variety of disciplines, such as psychology (see, for example, Charmaz, 2003; Charmaz and Henwood, 2007; Henwood and Pidgeon, 1995, 2003; Pidgeon and Henwood, 1996, 2004; Rennie et al., 1988), management (Goulding, 2002; Locke, 2001), nursing (Benoliel, 1996; Chenitz and Swanson, 1986; Schreiber and Stern, 2001; Stern, 1980; Wilson and Hutchinson, 1996; Wuest, 1995, 2001) and information systems (Bryant, 2002, 2003; Urquhart, 2003). Specialists are beginning to appear within subfields (LaRossa, 2005), and have become established in grounded theory computer applications (see, for example, Fielding and Lee, 1998; Lonkila, 1995; Kelle, 2004).

The logic and explicit strategies of grounded theory have contributed to its wide appeal. Unlike earlier twentieth-century field research, Glaser and Strauss made simultaneous data collection and analysis an integral part of grounded theory. They proposed ways of focusing and integrating data collection while advancing the theoretical analysis of the collected data. The logic of grounded theory relies on starting with inductive data and subjecting them to close scrutiny through specific coding and analytic practices, while collecting data (see Charmaz, 2003, 2006; Glaser, 1978, 1998). Grounded theory coding practices lead to developing analytic categories, and then refining these categories and checking them empirically, as the analysis becomes increasingly theoretical. Thus, the logic of grounded theory means that researchers retain strong empirical foundations in their work and offer abstract, conceptual theories of the studied empirical phenomena.

Glaser and Strauss’ original statement was revolutionary for four reasons. First, they took discussion of qualitative inquiry beyond data collection techniques and field research roles. Instead, they explained how to streamline data collection by asking analytic questions and developing theoretical renderings of the data from the very beginning of the research endeavor. Second, they outlined inductive guidelines for coding data and developing emergent abstract categories. Third, Glaser and Strauss argued that their methodological strategies could advance data analysis to construct middle-level theories. Fourth, they provided powerful legitimation for conducting inductive qualitative research at a time when most social scientists were enamored with the promise of rigorous quantitative inquiry.
This last reason led social scientists to claim that they adopted grounded theory methods when they had conducted some sort of qualitative research or had only followed one or two grounded theory strategies but did not aim for theory development. Other researchers’ claims of adopting grounded theory strategies may have been more consistent with the method but their reductionist, mechanistic application of it undermined its potential for open-ended, creative theorizing. Miller’s (2000) argument still holds: the full potential of grounded theory methods for generating theory remains untapped. Researchers can profit from the flexible, open-ended strategies of grounded theory to conduct systematic, directed inquiry and to engage in imaginative theorizing from empirical data.

HISTORY AND DEVELOPMENT OF GROUNDED THEORY

The emergence of grounded theory

The history and development of grounded theory are intertwined with larger currents in social scientific inquiry, and particularly with tensions between qualitative and quantitative research in sociology in the United States. During the early decades of the twentieth century, sociologists, particularly at the University of Chicago, began building an empirical foundation in life histories and case studies¹. By mid-century this foundation had weakened due to the development of quantitative methods. Unlike strong British and European sociological traditions in critical debate and praxis in theorizing, U.S. sociology advanced quantification of various sorts and abstract macro theories devoid of solid empirical roots. As Jennifer Platt (1996) states, leading quantitative methodologists often borrowed procedures from other disciplines and some sociologists quantified measures to persuade outside audiences, not because they believed quantification to be necessary. At that time, however, the divide between theory and research deepened and the gap between inductive qualitative and deductive quantitative research widened. U.S. sociology steadily adopted more quantitative techniques and the distance between theory and methods grew (Charmaz, 2000, 2006).

Grounded theory methods arose from Glaser and Strauss’ (1967) efforts to explicate the strategies they had followed while conducting their qualitative studies of the social organization of dying in hospitals (Glaser and Strauss, 1965, 1968). Their efforts brought renewed attention to qualitative research at a pivotal point in time. Platt (1996) points out that the development of public opinion research and statistical techniques during World War II and the institution building of Kurt Lewin and Paul Lazarsfeld afterwards established the hegemony of the survey and the dominance of its proponents’ departments. Meanwhile, inductive qualitative inquiry in sociology in the United States had shifted from the case study to participant observation. This methodology had not been theorized, explicated, or codified in accessible ways. Nor, as Platt notes, did proponents talk about field methods. Paul Rock (1979) points out that novices learned Chicago school field research through a combination of mentoring and becoming immersed in field research settings. What researchers actually did while in the field and afterwards remained opaque. Early methodological texts emphasized data gathering and field work roles and relations rather than qualitative analytic strategies (see, for example, Adams and Preiss, 1960; Junker, 1960; Kahn and Cannell, 1957)².

By 1965, quantification with its positivist underpinnings framed methodological discussions in United States sociology³. Methods textbooks of the day outlined methodological objectives and procedures that did not fit qualitative research. Some mid-century quantitative researchers saw qualitative inquiry as a precursor to constructing quantitative instruments but most viewed qualitative studies as impressionistic, anecdotal, and biased. As such, qualitative research could not meet mid-century canons for reliability and validity. The inability of qualitative researchers to replicate their studies further marginalized qualitative research.
The arrival of grounded theory sparked growing interest in qualitative methods beyond Chicago school sociologists and their students and subsequently changed the way American researchers learned these methods. Given the hegemony of quantitative research, the Discovery book probably remained unnoticed by leading quantitative researchers. Yet it commanded enormous symbolic and practical influence among U.S. qualitative researchers and graduate students with qualitative inclinations. Grounded theory methods made qualitative methods accessible. By adopting grounded theory methods, professors could impart specific data collection and analytic strategies to their students. From its beginning, grounded theory spread beyond sociology. Strauss’ doctoral students in nursing brought grounded theory to new graduate students as the nursing profession began to establish its own doctoral programs.

The Discovery book legitimated inductive qualitative research. Glaser and Strauss challenged positivistic proclivities to apply the logic of quantitative research to qualitative studies. In opposition to narrow positivistic ideals, Glaser and Strauss proposed that qualitative inquiry had its own logic and could be conducted systematically. In short, they rejected the frame that quantitative methodologists imposed on research design and practice. Glaser and Strauss refuted quantitative researchers’ claim to own exclusive rights on rigor. They also challenged the established, and growing, division of labor between theory and research. Mid-century theorists and methodologists had pursued different problems. At that time, theorizing emphasized grand theories that explained the social order of whole societies but exhibited scant study of empirical research. Glaser and Strauss saw such theorizing as far removed from worlds of everyday action. Instead of arising from human action, grand theory of the day took a logico-deductive form which reasoned from abstract concepts down to empirical instances. Meanwhile quantitative methodologists increasingly turned to refining instruments, developing statistical measures, and investigating concrete problems despite the rhetoric of operationalizing theoretical concepts into testable concepts through deductive reasoning.

Glaser and Strauss intended to wrest theorizing from the exclusive domain of elite armchair macro theorists and to join theory and methods through proposing arguments and providing methodological strategies. They aimed to have empirical research inform theory construction and advocated an egalitarian approach to it: ordinary researchers could construct useful grounded theories.

The originators’ construction of the method

The Discovery of Grounded Theory stands as a pioneering book that spawned generations of qualitative researchers, many of whom read no further works on grounded theory but claimed to use it. Perhaps most researchers cited the Discovery book to legitimize qualitative inquiry rather than to demonstrate adherence to the method. Glaser’s lesser-known book, Theoretical Sensitivity (1978), provided the most definitive early exegesis of the logic of the method and instructions on how to use it. Nonetheless, the dense writing and the assumption of the reader’s familiarity with the grounded theory method made Theoretical Sensitivity most accessible to those already schooled in this method (but see Melia, 1987, 1996). In this book, Glaser’s concept-indicator approach and inductive reasoning took explicit form and his positivist assumptions became more visible.

Mid-century qualitative research also became an object of Glaser and Strauss’ scrutiny. With the notable exception of Erving Goffman (1959, 1961, 1963), inductive qualitative studies of the time had largely remained descriptive⁴. Goffman wove stunning theoretical insights throughout his descriptions and essays but seldom organized them in explicit theoretical frameworks. From the start, Glaser intended to take description apart and treat it in analytic, abstract, general, and parsimonious concepts. Where Goffman’s work was rich in both detail and context, Glaser (1978), in contrast,
aimed for streamlined, general abstract statements removed from context. Goffman’s metaphor of the drama permitted readers to see social life anew. In keeping with his empirical emphasis, however, Glaser (1978) contended that Goffman relied too heavily on this metaphor. Glaser and Strauss called on qualitative researchers to raise their description to a theoretical level and to develop explicit theoretical statements.

The Discovery of Grounded Theory attacked reigning theoretical and methodological assumptions of the day and led the charge to win a new and renewed place for qualitative inquiry, for everyone. In this sense, Glaser and Strauss democratized qualitative research. For them, it consisted of a set of skills that students beyond elite Chicago circles could learn. Simultaneously, they demystified qualitative analysis by offering flexible guidelines. This combination of democratization and demystification struck a responsive chord among diverse audiences.

Grounded theory combined two competing traditions in mid-century American sociology in an unlikely marriage. Glaser wished to codify qualitative inquiry in a way analogous to how his mentor, Paul Lazarsfeld (Lazarsfeld & Rosenberg, 1955), had codified quantitative research⁵. Glaser’s Columbia University intellectual heritage in structural-functionalism, rigorous quantitative methods, and the quest for middle-range theories gave grounded theory its rigor, language, direction, and objectives. He borrowed terms from quantitative research design but gave them new, often inverted, meanings. Thus qualitative coding became something that emerged from data rather than being applied to it; sampling became a strategy to fill out theoretical categories rather than to seek population representativeness; and core variables arose from tentative categories, not from deduced operations from abstract concepts. The language itself spawned confusion that has lasted until the present. When does grounded theorists’ coding emerge from their study of data rather than serving as codes applied to data? When, if ever, might representational sampling and theoretical sampling become blurred? How does a budding grounded theorist reconcile Glaser’s notion of a single core variable with the search for meanings and actions in a field of inquiry?

For Strauss, the search for meanings and actions formed the core of sociological research. Pragmatists John Dewey, George Herbert Mead, and Charles S. Peirce had left a lifelong imprint on him. During his doctoral studies Strauss’ immediate intellectual influences at the University of Chicago included Herbert Blumer, Everett Hughes, and Robert Park. Thus, Strauss brought symbolic interactionism and ethnographic field research to grounded theory and an emphasis on work to his empirical research. Strauss’ pragmatist heritage gave grounded theory its emphases on agency, emergence, meaning, and action.

Both Glaser and Strauss aimed to study social and social psychological processes. They first planned to generate substantive theories that explicated and explained a fundamental social or social psychological process within a social setting or a particular experience such as dying in hospitals. They argued that the resulting grounded theory could explain the major categories in the studied process, explicate their properties, demonstrate the causes and conditions under which these categories emerged and varied, and delineate their consequences. As Glaser and Strauss (1965, 1968) developed categories such as ‘mutual pretense,’ ‘open awareness,’ ‘closed awareness,’ and ‘time expectations,’ they began to move into formal theorizing because their categories and processes reached across substantive areas and could be further explored in these new areas. Thus, as Glaser and Strauss’ theories reached this level of generality, they advocated refining their emerging theories by seeking relevant data in varied settings that moved across substantive areas. The researchers would then refine the categories of the emerging formal theory, as informed by the new data these categories subsumed.

Glaser and Strauss’ arguments in the Discovery book contributed much to revitalizing qualitative research and to maintaining
and extending Chicago school ethnographic traditions in sociology. They inspired new scholars in diverse fields to pursue qualitative research and trained doctoral sociology and nursing students in grounded theory. They offered innovative strategies to move qualitative inquiry beyond description and into explanatory theory that conceptualized the studied phenomena in theoretical categories and demonstrated abstract relationships between these categories. And they contended that a completed grounded theory was useful, unlike mid-century grand theory. Glaser and Strauss proposed that a finished grounded theory would meet the following criteria: a close fit with the data, usefulness, density, durability, modifiability, and explanatory power (Glaser, 1978, 1992; Glaser and Strauss, 1967).

Procedures versus emergence in the reconstruction of grounded theory

Strauss’ publication of Qualitative Analysis for Social Scientists (1987) sowed the seeds of the first reconstruction of grounded theory. These seeds matured in his co-authored book with Juliet Corbin, Basics of Qualitative Research (1990, 1998) because in significant ways it revised grounded theory and set a new course for it. In his 1987 book, Strauss began to move grounded theory toward verification. His co-authored works with Corbin further this direction. In addition, Strauss and Corbin created several new technical procedures to be applied to the data rather than emerging from analyzing them. Glaser’s (1992) acrimonious response to the first edition of Basics disavows Strauss and Corbin’s innovations and proclaims his version of grounded theory to be the only authentic statement of the method. Glaser argues that Strauss and Corbin’s procedures force data and analysis into preconceived categories and, thus, contradict essential grounded theory guidelines based on comparative analysis and emergent categories. Glaser saw Strauss and Corbin’s innovations as usurping the method and imposing unnecessary complexity on the analytic process. At that time, Glaser remained consistent with his 1978 exegesis of the method, which relied on comparative approaches at each step of the analytic process, avoidance of extant theories, a delayed literature review, and on a direct and, often, narrow empiricism.

To an extent, Strauss and Corbin’s technical applications foster a formulaic approach rather than developing Glaser’s type of emergent analysis. They introduce axial coding as part of a complex ‘coding paradigm’ and the conditional matrix as techniques for viewing data and producing an analysis. In axial coding, researchers (1) treat a category as an axis; (2) specify the properties and dimensions of this category; (3) relate categories to their subcategories; and (4) delineate relationships between them (Strauss and Corbin, 1998, p. 123). Strauss and Corbin argue that axial coding brings the data back together again into a coherent whole after fracturing them during initial line-by-line coding (Charmaz, 2006, p. 186). Glaser (1992, 1998) viewed axial coding not only as forcing data into preconceived frameworks but also as sidestepping the families of theoretical codes that he laid out in Theoretical Sensitivity. Glaser views these codes as supplying the latent links and theoretical explanations that hold a researcher’s inductive categories together. He insists that theoretical codes must earn their way into the analysis; however, whether or not these codes constitute another form of forcing data remains ambiguous. Applying them mechanically would result in forcing data, and forcing one’s categories into a particular configuration, as Glaser (1992) acknowledges. Seeing and pursuing which theoretical directions, issues, and, possibly, concepts the data suggest makes more sense. These theoretical directions may spawn original ideas that move beyond Glaser’s theoretical codes or Strauss and Corbin’s axial coding.

Strauss and Corbin designed their other procedural innovation, the conditional/consequential matrix, to provide a technique for coding to make the intersections of micro and macro conditions/consequences on actions visible and to clarify connections
between them. By creating the conditional/consequential matrix, Strauss and Corbin intended to make connections between levels of analysis more visible.

Kelle (2005) reduces the controversy between Glaser and Strauss and Corbin to whether a researcher follows the coding paradigm systematically (perhaps rigidly?) or adopts ad hoc theoretical codes from Glaser’s coding families. Even though Kelle’s view makes sense, it undermines Glaser’s approach to constructing emergent categories. Kelle sees Glaser’s emphasis on emergence as a problematic methodological concept imbedded in Glaser’s exhortations to study data without adopting a preconceived theoretical frame. True, Glaser views emergence as contingent on not forcing data into extant theories and his resounding ‘Trust in emergence’ has the ring of a slogan. Yet an emphasis on emergence means more than a slogan. An apt approach combines Dey’s (1999) view of bringing an open mind to data with Henwood and Pidgeon’s (2003) notion of theoretical agnosticism. This approach is consistent with an injunction from the abductive logic that has always characterized grounded theory: remain open to all kinds of theoretical possibilities and gather more data to check the most plausible explanation (Peirce, 1938/1958; Rosenthal, 2004). Kelle correctly takes Glaser to task about assuming that facts stand alone, and that a theory-free observer can see them but also notes the conflicting assumptions about possessing ‘theoretical sensitivity’ (Glaser and Strauss, 1967, p. 3; Glaser, 1978).

A major difference between Glaser and Strauss and Corbin may lie in how and when each imports their respective form of coding into the analysis. For Glaser, theoretical coding comes after the grounded theorist has advanced tentative categories; for Strauss and Corbin axial coding is a means of developing categories. Glaser and Strauss and Corbin each contend that their respective forms of coding put the previously fractured data back together in conceptual ways. A second difference lies in their use of comparisons. Glaser sticks to comparing data with data, data with code, code with code and so forth as the researcher moves up levels of abstraction.

The potential tensions between Glaser’s positivism and Strauss’ pragmatism are perhaps greater than their respective grounded theory books indicate. Strauss’ strong pragmatist roots are more evident in his early works (e.g. 1959/1969, 1961; Glaser and Strauss, 1965, 1967, 1968) and in Continual Permutations of Action (1993) than in his co-authored grounded theory texts with Juliet Corbin, Basics of Qualitative Research (1990, 1998), which contain positivist undercurrents. Both Strauss and Corbin’s and Glaser’s versions of grounded theory assume an external reality independent of the observer, a neutral observer, and the discovery of data. Notions about what researchers see, define, and describe as data do not permeate their texts. Glaser ignores the vital roles of perspectives and language for what we define as data and Strauss and Corbin state, ‘Although we do not create data, we create theory out of data’ (1998, p. 56). Such approaches do not acknowledge the position from which the observer sees and speaks much less how grounded theory is an inherently interactive method during every step of the process.

Whether or not researchers use axial coding or adopt the conditional/consequential matrix, Strauss (1987) and Strauss and Corbin (1990, 1998) have made diagramming an integral part of the method for their followers. Diagramming representations of relationships between categories fosters developing analytic complexity with multiple categories. In this sense, Strauss and Corbin’s reconstruction moves beyond Glaser’s variable analysis of one core variable and also provides a foundation for Adele E. Clarke’s (2003, 2006) postmodernist revision of grounded theory and methodological strategy of mapping empirical situations and positions. She creates positional maps that not only chart discourses but also locate silences and paths not taken as well as those taken.

In keeping with his positivist heritage, Glaser assumes an expert observer who makes neutral, unproblematic observations and offers slogans such as ‘All is data’
(2001, p. 145) that gloss what researchers may define as ‘all.’ Glaser explicitly promotes theorizing from outside the studied experience rather than from within it. For years he argued that study participants will tell researchers their main concern about what’s happening in their setting (see, for example, Glaser, 1992). Beyond any intent to focus an observer’s gaze on some issues and away from others, relying on participants’ directives can still result in an outsider’s analysis. Participants often take for granted the fundamental processes and conditions that shape their lives. Following participants’ overt statements may lead to unwitting acceptance of a public relations rhetoric and subsequent analysis from an outsider’s rather than an insider’s viewpoint. Interestingly, in a significant shift, Glaser later (2001, p. 51) acknowledges that the researcher identifies and conceptualizes participants’ main concern.

Overall, Glaser’s epistemology has remained consistent over the years. Yet he, too, has reconstructed grounded theory practice in both major and minor ways. Unlike Strauss and Corbin’s (1990, 1998) reconstructions, Glaser’s shifts are incremental and buried in the dense texts of his self-published books. He also presents his shifts as contributions to an evolving method. But who decides what represents its evolution, reconstruction or erosion? Glaser has disavowed his quest to define and analyze a basic social process or basic social psychological process because he now sees such a quest as forcing the data. This change is fundamental because earlier Glaser built grounded theory practice on the analytic explication of these processes. Similarly, another major change in methodological practice concerns initial coding. Glaser (1992, 2001) disavows his earlier prescription to do line-by-line coding to fracture the data and to see beyond the immediate story during the initial coding. Instead, he advocates seeking a core variable through comparisons of incidents. Minor shifts include adding more families of theoretical codes, changing the rules for memo-making, and narrowing the definition of theorizing to ‘a theory of a core category’ (2001, p. 206).

Since Glaser and Strauss’ (1967) original statement, several major grounded theorists have aimed beyond middle-range theories. Strauss (1987, 1993), independently as well as with co-author Juliet Corbin (Strauss and Corbin, 1990, 1998), began to move from the micro level of analysis to meso and macro levels, an effort that Clarke (2003, 2005, 2006) has extended. Elsewhere I have initiated a discussion of taking grounded theory methods into structural analysis with an explicit emphasis on social justice (Charmaz, 2005).

POSTMODERN CHALLENGES AND CONSTRUCTIVIST RECONSTRUCTIONS OF GROUNDED THEORY

By 1990, publication of Strauss’ Qualitative Analysis for Social Scientists (1987) and Strauss and Corbin’s Basics of Qualitative Research had made the method immensely popular throughout the social sciences and professions. The qualitative revolution had spread widely and Basics gave researchers a way to conduct qualitative research. Simultaneously, however, the positivist residues of early grounded theory statements came under increased scrutiny and postmodern and narrative turns undermined the method. Some scholars (see, for example, Conrad, 1990; Ellis, 1995; Richardson, 1993) viewed grounded theory as clinging to an outdated modernist epistemology. For them, grounded theory fragmented the respondent’s story, relied on the authoritative voice of the researcher, blurred difference, and accepted Enlightenment grand metanarratives about truth, universality, human nature, and world views. Such critiques melded grounded theory strategies with the originators’ early statements and how they used the method.

A reconstructed grounded theory can take into account many of the criticisms that varied critics have raised. Researchers can adopt, and may adapt, the flexible strategies that Glaser and Strauss (1967; Glaser, 1978, 2001) originally delineated. These strategies
remain enormously helpful in producing analyses that offer useful interpretations of studied life. A growing number of scholars, including myself, have sought to loosen key grounded theory strategies from their positivist foundations evident in both Glaser’s and Strauss and Corbin’s versions of the method (see, for example, Bryant, 2002, 2003; Castellani et al., 2003; Charmaz, 2000, 2002, 2005; Clarke, 2003, 2005; Henwood and Pidgeon, 2003; Seale, 1999).

Researchers can use grounded theory strategies without endorsing mid-century assumptions of an objective external reality, a passive, neutral observer, or a detached, narrow empiricism. If, instead, we start with the assumption that social reality is multiple, processual, and constructed, then we must take the researcher’s position, privileges, perspective, and interactions into account as an inherent part of the research reality. It, too, is a construction. As Clarke (2005, 2006) stresses, the research reality arises within a situation and includes what researchers and participants bring to it and do within it. Thus, relativism characterizes the research endeavor rather than objective, unproblematic prescriptions and procedures. Research acts are not given; they are constructed. Viewing the research as constructed rather than discovered fosters researchers’ reflexivity about their actions and decisions.

This perspective shreds notions of a neutral observer and value-free expert. Not only does that mean that researchers must examine rather than erase how their privileges and preconceptions may shape the analysis, but it also means that their values shape the very facts that they can identify. Like the Marxist view of history, this approach treats research as a construction but acknowledges that it occurs under specific conditions, of which we may not be entirely aware and which may not be of our choosing.

Thus, the major reconstruction of grounded theory derives from wresting grounded theory from its earlier objectivist roots and, instead, adopting constructivist epistemologies with their respective implications for research practice (Charmaz, 2000, 2006). The objectivist-constructivist dichotomy between grounded theory approaches juxtaposes their respective assumptions, logics, and objectives (see Figure 27.1). This dichotomy provides a heuristic device for increasing the visibility of starting assumptions and for assessing proponents’ innovations and reconstructions of the original method. This dichotomy also helps researchers to examine their starting assumptions and research actions. In practice, grounded theory inquiry ranges from objectivist to constructivist.

Constructivist grounded theorists refute notions of unproblematic selection, collection, and representation of data. Data and their meanings are neither singular nor self-evident; instead, researchers interpret and categorize data but their potential meanings are multiple. Constructivists look for multiple meanings and complexity and, thus, limit the simplifying, generalizing impulse, and resist decontextualizing the analysis, as advocated in earlier grounded theory statements. Constructivists argue for locating both the grounded theory process and product in time, space, and social conditions. That means a completed grounded theory must be evaluated in light of its specific origins rather than viewed as separate and distant from its construction. Constructivists also favor aiming for abstract understanding rather than pursuing earlier positivist goals of explanation and prediction. In short, grounded theory strategies foster the researcher taking an active stance throughout data collection and analysis and constructivist approaches further this stance and combine it with reflexivity and relativity (Charmaz, 2006).

In her constructivist revision of classical grounded theory, Clarke (2003, 2005, 2006) explicitly builds on its pragmatist foundations and incorporates postmodern perspectives. When emphasizing the compatibility of pragmatism and symbolic interactionism with contemporary epistemological developments including feminist theory, Clarke reminds us that pragmatism’s relativistic view of truth, assumption of a multiplicity of perspectives, and emphasis on partial views,
Objectivist Grounded Theory                 Constructivist Grounded Theory

Assumes an external reality                 Assumes multiple realities
Assumes discovery of data                   Assumes mutual construction of data
Assumes conceptualizations emerge from      Assumes researcher constructs
  data                                        categorizations
Views representation of data as             Views representation of data as
  unproblematic                               problematic, relativistic, situational,
                                              and partial
Assumes the neutrality, passivity, and      Assumes the observer’s values, priorities,
  authority of the observer                   positions, and actions affect views
Views data analysis as an objective         Acknowledges subjectivities in data
  process                                     analysis, recognizes co-construction
                                              of data; engages in reflexivity
Gives priority to researcher’s views        Seeks participants’ views and voices
                                              as integral to the analysis
Aims to achieve context-free                Views generalizations as partial,
  generalizations                             conditional, and situated in time,
                                              space, positions, action, and
                                              interactions
Focuses on developing abstractions          Focuses on constructing interpretations
Aims for parsimonious explanation           Aims for interpretive understanding

Figure 27.1 Comparison of objectivist and constructivist grounded theory*

* See Charmaz, 2000, 2006.
situated actions, and positional knowledge already align it with constructivist grounded theory. Clarke (2006) sees grounded theory and symbolic interactionism as fitting what Star (1989) calls a theory-method package in which ontology and epistemology are co-constitutive and non-fungible. To provide a research practice that builds on her perspective, Clarke (see especially 2003, 2005, 2006) offers situational analysis as a way to map positions, discourses, actions, and to capture silences at meso and macro levels.

GROUNDED THEORY AS METHOD AND PRACTICE

Properties of the method

Many qualitative researchers are familiar with the flexible guidelines constituting the grounded theory method. But what are its fundamental properties? To turn grounded theory logic on itself, which analytic properties distinguish the method and make it distinctive? Grounded theory is an inductive-abductive, comparative, emergent, and interactive method (Charmaz, 2006). These properties take full form in its constructivist versions and shape how researchers invoke its strategies.

From the beginning, Glaser and Strauss (1967) have treated grounded theory as an inductive and fundamentally comparative method. They align grounded theory with its practical applicability as consistent with John Dewey’s pragmatism, but they do not mention Peirce (1938/1958) or abductive reasoning. Strauss (1987), however, acknowledges the debt that grounded theory owed to George Herbert Mead and Charles S. Peirce. In his teaching, Strauss routinely described grounded theory as an abductive method (see note 6). As such, researchers
begin with inductive cases and define an intriguing finding, which they attempt to explain. Abductive reasoning involves imaginatively accounting for this finding by entertaining all possible theoretical interpretations, and then checking these interpretations against experience until arriving at the most plausible theoretical explanation (Hildebrand, 2000/2004; Peirce, 1938/1958; Reichert, 2000/2004; Rosenthal, 2004). Abductive logic builds checks into the research process and, therefore, keeps an emerging theory grounded in the data that it attempts to explain.

For Glaser (1992, 1998, 2001, 2003), the comparative methodology consists of a set of successive strategies for developing theoretical categories and renders these categories objective through abstraction of their properties. For Strauss and Corbin (1998), the comparative method corrects ‘possible distortion of meaning’ (p. 137). Their ‘far-out’ comparisons leap beyond the data but hearken back to Everett Hughes’ (1958) seemingly incongruent comparisons such as the similarities between psychiatrists and prostitutes, a comparison that long entranced American sociologists.

In the early years, both Glaser and Strauss treated grounded theory as an emergent method. It is ironic that Strauss’ methodological texts with Corbin became increasingly procedural. Mead’s (1932) philosophy of time and conception of the emergent present had profoundly affected Strauss’ methodological practice and theoretical perspective. His methodology books do not fully portray the fluidity of his thinking or the creativity enacted in his co-authored research with both Glaser and Corbin (see, for example, Corbin and Strauss, 1988; Glaser and Strauss, 1968).

A procedural approach to grounded theory dampens its emergent strengths and diminishes possibilities for theoretical innovation. Researchers have long associated grounded theory with a particular form, but have not explicated the vital role of content for directing this form. They can become mired in following procedures and subsequently produce description rather than theoretical interpretation. Such an approach adopts a preconceived form for the method without attending to how the content of the research can re-form the form. Form and content shape each other, particularly in constructivist versions of grounded theory. Researchers study and focus data collection and analysis in a dialectical process. Therefore, the method itself becomes constructed and reconstructed throughout the research process. Maintaining this dialectic requires active, reflective researchers, whose reasoning directs their enactment of this method.

The fundamental property of emergence in grounded theory relies on active researchers who interact with their data and interpret these data, and their research practices. The image of neutral, passive researchers who discover data and theory is a mirage. Moving from data to theory requires researchers’ sustained interaction and actions with their data and emerging analyses. In short, grounded theorists study emergent processes, and the method itself is an emergent process.

Grounded theory guidelines

Several basic grounded theory guidelines have become standard fare in qualitative inquiry. Nonetheless, the grounded theory emphasis on action and process, its comparative approach, and its particular coding and sampling strategies make the method unique, and sometimes misunderstood. Because these guidelines have been discussed at length elsewhere (Charmaz, 2003, 2005, 2006; Glaser, 1978, 1992, 1998, 2001, 2003; Glaser and Strauss, 1967; Locke, 2001; Strauss, 1987; Strauss and Corbin, 1990, 1998) I merely outline them here.

Unlike most qualitative approaches, grounded theory provides explicit strategies for defining and studying processes: this method places priority on action. Glaserian versions of grounded theory build action into the analysis from the earliest coding. The comparative study of actions and codes advances an inductive analysis. By invoking comparative methods throughout the analysis grounded theorists define analytic properties
of their codes. Essentially from the start grounded theorists code and analyze to illuminate actions, process, and potential theoretical meaning (Glaser, 1978). In brief, grounded theory guidelines include the following comparative research practices:

• Comparing data with data
• Labeling data with active, specific codes
• Selecting focused codes
• Comparing and sorting data with focused codes
• Raising telling focused codes to tentative analytic categories
• Comparing data and codes with analytic categories
• Constructing theoretical concepts from abstract categories
• Comparing category with concept
• Comparing concept and concept (see note 7)

Objectivist grounded theorists who follow Glaser aim to use comparative methods without preconceptions. Thus they prescribe entering the research setting and analysis uncontaminated by prior theory and disciplinary knowledge. Constructivist grounded theorists use their prior knowledge and disciplinary perspectives to sensitize them to conceptual issues at the beginning but seek new theoretical interpretations as they interrogate their data and emerging analyses.

At least two phases of coding characterize grounded theory: open, or initial, and selective, or focused. During the initial phase, line-by-line coding prompts the researcher’s active involvement in the analysis. To do line-by-line coding at all, researchers must view the data in greater depth than passively perusing it or looking for themes, as qualitative researchers generally do. Even though Glaser has jettisoned line-by-line coding, it remains an excellent heuristic strategy for scrutinizing data and for examining one’s preconceptions about the data as well as becoming aware of tacit alignments or shared assumptions with participants. By constructing active, specific, and short initial codes, the grounded theorist creates handles for making comparisons between data and between codes.

Focused or selective coding follows scrutiny of the initial codes. Focusing on both the most frequent and the most telling codes provides tentative leads to explore and check during subsequent data collection. Researchers use focused codes to sort large amounts of data and to construct tentative categories in their emerging theories.

Memo-writing is the crucial stage of analysis between coding and writing sections of a first draft of the study. In grounded theory practice, researchers write memos from the very beginning of their research and continue to write progressively more focused and analytic memos as they proceed. Memos lend form to fleeting ideas, take codes and categories apart, make comparisons explicit, mine descriptions, stories, and incidents for their analytic import, raise and discuss conjectures, and identify gaps and unanswered questions in the data. Writing memos becomes a means of actively engaging one’s data, codes, and categories. By including data in the memo, researchers build clear links to categories. Much comparative analysis occurs while memo-writing, from comparing data with data and codes early on to comparing category with category as researchers develop their theories. An emergent fit of the categories may then become apparent through writing memos.

Grounded theory builds checks on the analysis throughout the process. Memo-writing fosters checking hunches and keeping the analysis grounded. Theoretical sampling offers another pivotal, but often misunderstood, strategy for grounding the analysis and increasing its incisiveness. Theoretical sampling means sampling to flesh out or refine theoretical categories to increase the precision of the emerging theory. In short, this strategy invokes abductive reasoning because researchers test their tentative ideas. Theoretical sampling arises from researchers’ analyses, not from any representation of population traits or status attributes.

When does the iterative process of moving between collecting and analyzing data end? The standard grounded theory answer is when categories are saturated. That means that the researcher has explicated the properties
of each theoretical category and has sought data that fill each property. The emphasis on categories and properties makes saturation a theoretical concern, not merely a methodological measure indicating redundancy of data as in conventional qualitative research. Yet the concept of theoretical saturation remains problematic in grounded theory. Like the assumption that grounded theorists share definitions of ‘theory,’ the standard answer of saturation does not address what constitutes a category, nor does it explain how one knows that all salient properties and their variations have been defined, much less been given adequate coverage. Grounded theorists usually assert that they have saturated the properties of a category rather than demonstrating it (Morse, 1995).

The last major grounded theory strategy involves integrating the analysis. How does one accomplish it? By this time grounded theorists should have a set of well-developed analytic memos on their categories and concepts. Integrating them becomes part of theorizing and, thus, researchers next engage in theoretical sorting to best present the relationships between categories and concepts. Sorting memos occurs first in service to the emergent grounded theory and then, perhaps later, for presentation to an audience. The explanation of the sorting helps to integrate the theory and makes the analytic argument visible for the written report. Strauss (1987) and Strauss and Corbin (1990, 1998) propose diagramming major ideas and relationships, and Clarke (2003, 2005, 2006) offers a means of making structure and process visible.

ART AND SCIENCE IN GROUNDED THEORY STUDIES

Art and science in the originators’ works

Strauss and Corbin (1998) treat their approach as both science and art but their overlay of technical procedures and objectivist assumptions undermines its interpretive elements and their scientistic language undercuts its potential artfulness. Their dual emphases on science and art are also evident in their shared empirical works. They develop such concepts as ‘biographical body conceptions (or BBC) … [which] represents those three concepts – biographical time, body, and conceptions of self’ (p. 252) and the ‘BBC chain,’ (Corbin and Strauss, 1987, p. 253), ‘the combination of the three working together.’ These concepts provide analytic tools that dissect experience but distance it from how people live it. Within the same paper, however, Corbin and Strauss offer some artful narrative descriptions that bring the experience to life. Below they discuss questions arising when people first receive a diagnosis of chronic illness and describe the properties of this temporal turning point:

… [W]hen past and future come crashing into the undesirable or dreaded present. This identity shock is followed by future images of what the illness will mean in terms of biographical performances such as: ‘I will be crippled.’ ‘I will no longer be able to,’ ‘I might die soon.’ The degree to which identity is jolted depends upon the number of aspects of self lost, their salience, and the possibility of comeback – regaining lost aspects of self. (p. 272)

Glaser’s version of grounded theory sticks to conventional social science. He does not take into account the potential power of artful interpretation and advises against attending to writing (2001). For Glaser, the ‘conceptual grab’ of the analysis trumps the writing of it (2001, p. 80). Not surprisingly, Glaser has expressed disdain for both qualitative researchers who aim to tell the overarching story in their research and the stories that support it. His remarks endorse a unitary treatment of grounded theory reportage, untouched by either the narrative turn in the social sciences or the demands of varied writing genres and publishing venues.

Artful interpretations in grounded theory works

Grounded theorists’ published works range from neutral reports to imaginative interpretations written with style and grace. Much work
conducted under the banner of grounded theory consists of routine description couched in academic conventions. Numerous analytic writings are stilted and mechanical. How might grounded theorists produce artful interpretations?

Typical grounded theory writing fosters making categories explicit in linear form. These categories represent authors’ construction of their respective research participants’ actions. This writing strategy can shrink the substance of a study to a list of mundane, loosely related processes or descriptors. When, however, authors present both the central idea and its major categories in vivid terms, they simultaneously integrate their analyses and engage readers in their theoretical renderings. Geralyn A. Meyer (2002) titles her article, ‘The Art of Watching Out: Vigilance in Women Who Have Migraine Headaches,’ and then posits ‘owning the label’ and ‘making the connections’ as the two major conditions for her core category, ‘watching out,’ to occur. She aims to provide a substantive analysis of the vigilance she finds in her 22 interviews. Meyer breaks down both the conditions and the core category into sub-categories. Watching out included these subcategories: ‘assigning meaning to what is, calculating the risk, staying ready, and monitoring the results’ (p. 1225). The names of the categories and sub-categories alone carry substantial weight and create the form of the analysis. Thus these categories may require less detailing and supporting evidence than more opaque categories because their analytic rendering aims for limited theoretical reach but makes sound intuitive sense. Meyer’s analysis resonates with readers’ experience. She keeps the analysis simple, the categories crisp, and the relationships between them sequential.

In the following passage from a much larger project, Susan Leigh Star (1989) adopts a different and more difficult analytic objective and writing strategy. She sets high analytic stakes by making a major theoretical argument about relationships between scientific work and shifts in scientific theorizing. Her argument challenges Thomas Kuhn’s contention that a critical mass of anomalies eventually cause change in scientific theorizing. In contrast, Star proposes that theoretical shifts in science are continual and routine and argues that brain localizationists’ victory over brain diffusionists in the late nineteenth century provides a case in point. She weaves description throughout the narrative to support her theoretical perspective and argument. For Star, abstract theorizing about scientific reasoning arises from the whole of her analysis rather than the disparate parts. In this sense, she reunites the fragmented data into a coherent (and fascinating) analytic story but she does so in a way that its grounded theory underpinnings recede into the background and her theoretical points emerge in the foreground. In the passage below, Star explores her category ‘the contradictions’ [in the localizationists’ position] and also builds her case about how these scientists reconstruct the exigencies of their work to fit their theoretical proclivities. Star writes:

THE CONTRADICTIONS

Localizationists recognized that material and immaterial realms could not, without serious philosophical difficulties, simply be posited as causing action in one another. They also recognized that in principle ‘correlation is not causation,’ although they sometimes used correlation as proof. The major conceptual difficulties thus caused by parallelism [‘the doctrine that the mind and body operate as two separate but parallel realms’ (Star, 1989, p. 155)] were how the two realms (mind and brain) were brought together and by what mechanisms they were made to operate in tandem. Again, it is not surprising to find that the localizationists’ responses to these problems were neither unified nor consistent. They were facing multiple incommensurate audiences: philosophy, medicine, physiology, antivivisection, and evolutionary biology. In addition their everyday work posed serious technical difficulties and uncertainties.

In order to resolve the conflicting demands of the several audiences, localizationists adopted several general strategies. The first strategy was to refer philosophical problems to an expert within their ranks. This was someone who understood their daily work concerns but who would speak as a philosopher for them. The person elected to do this was John Hughlings Jackson. Because he addressed
many of the contradictions posed by parallelism and the mind/brain relationship, Jackson became a kind of symbolic leader for the localizationists….

The second strategy was to develop theories and concepts that could act as plausible bridges between the realms of the mind and the brain. These explanations were not, strictly speaking, philosophically accurate. However they were good enough as theoretical explanations to allow work to continue respectably.

As a final resort, when problems cannot be resolved, localizationists would simply jettison intractable problems into other lines of work. That is, those difficulties that could not easily be addressed by some physical or medical model were relegated to ‘mind’-related lines of work, such as psychiatry and psychology. In this way, psychophysical parallelism was reinforced on an organizational level. Such a division of labor effectively obscured many of the epistemological problems arising from the mind/brain gap. The contradictions were thus eradicated from immediate concern. (pp. 162–163)

Star crafts a convincing argument. Note how she weaves her evidence through the narrative to support her theoretical argument. She creates smooth transitions between description and her category of ‘contradictions’ that simultaneously directs the reader and builds her case.

CONCLUSION

Researchers have reconstructed grounded theory to fit their work and fulfill their objectives. As in the past, many researchers still claim grounded theory to legitimatize some support of inductive inquiry although it may bear faint resemblance to grounded theory strategies. Those who adopt grounded theory strategies tend to select among them and may remain unaware that their selections represent a partial use of the method. Still, the wide acceptance of versions of grounded theory attests to the usefulness of the method and the current debates about its construction and direction affirm its vibrancy.

How researchers reconstruct grounded theory matters. The strength of the method lies in its recursive practice in which content shapes form. As I have argued above, this content is neither straightforward in its construction nor in its seeming substance. Layers of meaning and action underlie both its construction and substance, which means researchers have rich soil to excavate. Doing grounded theory may simplify methodological decisions but it fosters developing complex and layered analyses, as the excerpt above from Star suggests. Given Glaser and Strauss’ (1967) original openness to methodological innovation and development, it is ironic that grounded theory has become a methodological template – of whichever version – for some researchers who seek mechanical means to stamp out qualitative studies.

Yet by interrogating and following content, grounded theorists can construct form for their inquiry, rather than solely creating content from form used as a recipe for generating research. Grounded theory gives researchers sufficient strategies that they can assume control of their research practice and advance their original ideas. Thus, the present points the way for future reconstruction of grounded theory to open further possibilities for making original theoretical contributions.

NOTES

1 What became known as the ‘Chicago school’ typically includes a symbolic interactionist theoretical perspective and ethnographic field research methodological tradition. As Abbott (1999) points out, consensus on theory and method did not exist at Chicago in the 1940s, when the ‘second Chicago school’ emerged. Some Chicago graduate students were influenced by Herbert Blumer; others saw themselves as field researchers, but not necessarily symbolic interactionists, and, simultaneously, as Bulmer (1984) states, traditional methodologists pursued a vigorous quantitative agenda.
2 Platt (1996, 253) notes that the ‘case-study method’ held sway as a key concept before World War II, but what it meant was often not clear.
3 Platt (1996, 14–17) charts increased numbers of technical works addressing topics such as surveys, sampling, scaling, and measurement between 1945 and 1960 in her table of American methodological monographs. Of the 29 cited volumes, only four address distinctively qualitative methods. Several works focused on interview techniques, which Platt
correctly points out overlap quantitative and qualitative research.
4 Some blurring between theoretical treatises and empirical studies occurs when anything without numbers counts as ‘qualitative.’ Not all macro qualitative works are empirical.
5 Lazarsfeld also pursued qualitative methods but his contribution to quantitative methods became more widely known.
6 My comments here derive from my days as a student of both Glaser and Strauss and a long friendship with Strauss thereafter.
7 This list is congruent with Glaser’s comparative approach. For further details see Charmaz (2006) and Glaser (1978, 1992, 1998).

REFERENCES

Abbott, A. (1999). Department & Discipline: Chicago Sociology at One Hundred. Chicago: University of Chicago Press.
Adams, R. N. & Preiss, J. J. Eds. (1960). Human Organization Research. Homewood, IL: Dorsey Press.
Atkinson, P., Coffey, A., & Delamont, S. (2003). Key Themes in Qualitative Research: Continuities and Changes. New York: Rowman and Littlefield.
Baker, C., Wuest, J. & Stern, P. (1992). Method slurring: The grounded theory, phenomenology example. Journal of Advanced Nursing, 17:1355–1360.
Benoliel, J. Q. (1996). Grounded theory and nursing knowledge. Qualitative Health Research, 6(3):406–428.
Boychuk Duchscher, J. E. & Morgan, D. (2004). Grounded theory: Reflections on the emerging vs. forcing debate. Journal of Advanced Nursing, 48(6):605–612.
Bryant, A. (2002). Re-grounding grounded theory. Journal of Information Technology Theory and Application, 4(1):25–42.
Bryant, A. (2003, January). A constructive/ist response to Glaser. FQS: Forum for Qualitative Social Research, 4(1), www.qualitative-research.net/fqs/-texte/1-03/1-03bryant-e.htm [Accessed 03-14-2003].
Bulmer, M. (1984). The Chicago School of Sociology. Chicago: University of Chicago Press.
Burawoy, M. (1991). The extended case study. In M. Burawoy, A. Burton, A. A. Ferguson, K. Fox, J. Gamson, N. Gartrell, L. Hurst, C. Kurzman, L. Salzinger, J. Schiffman, & S. Ui (Eds.), Ethnography Unbound: Power and Resistance in the Modern Metropolis (pp. 271–290). Berkeley: University of California Press.
Castellani, B., Castellani, J., & Spray, S. L. (2003). Grounded neural networking: Modeling complex quantitative data. Symbolic Interaction, 26(4):577–589.
Charmaz, K. (2000). Constructivist and objectivist grounded theory. In N. K. Denzin & Y. Lincoln (Eds.), Handbook of Qualitative Research, 2nd ed. (pp. 509–535). Thousand Oaks, CA: Sage.
Charmaz, K. (2002). Grounded theory analysis. In J. F. Gubrium & J. A. Holstein (Eds.), Handbook of Interview Research (pp. 675–694). Thousand Oaks, CA: Sage.
Charmaz, K. (2003). Grounded theory. In Jonathan A. Smith (Ed.), Qualitative Psychology: A Practical Guide to Research Methods (pp. 81–110). London: Sage.
Charmaz, K. (2005). Grounded theory in the 21st century: A qualitative method for advancing social justice research. Forthcoming in N. Denzin & Y. Lincoln (Eds.), Handbook of Qualitative Research, 3rd ed. Thousand Oaks, CA: Sage.
Charmaz, K. (2006). Constructing Grounded Theory: A Practical Guide Through Qualitative Analysis. London: Sage.
Charmaz, K. & Henwood, K. (2007). Grounded theory. In C. Willig & W. Stainton-Rogers (Eds.), Handbook of Qualitative Research in Psychology. London: Sage 240–259.
Clarke, A. E. (2003). Situational analyses: Grounded theory mapping after the postmodern turn. Symbolic Interaction, 26, 553–576.
Clarke, A. E. (2005). Situational Analysis: Grounded Theory After the Postmodern Turn. Thousand Oaks, CA: Sage.
Clarke, A. E. (2006). Feminism, grounded theory, and situational analysis. In S. Hesse-Biber & D. Leckenby (Eds.), Handbook of Feminist Research Methods. Thousand Oaks, CA: Sage.
Conrad, P. (1990). Qualitative research on chronic illness: A commentary on method and conceptual development. Social Science & Medicine, 30, 1257–1263.
Corbin, J. & Strauss, A. L. (1987). Accompaniments of chronic illness: Changes in body, self, biography, and biographical time. In J. A. Roth & P. Conrad (Eds.), Research in the Sociology of Health Care, Vol. 6. The Experience and Management of Chronic Illness (pp. 249–281). Greenwich, CT: JAI Press.
Corbin, J. & Strauss, A. L. (1988). Unending Work and Care: Managing Chronic Illness at Home. San Francisco: Jossey-Bass.
Dey, I. (1999). Grounding Grounded Theory. San Diego: Academic Press.
Dey, I. (2004). Grounded theory. In C. Seale, G. Gobo, J. F. Gubrium, & D. Silverman (Eds.), Qualitative Research Practice (pp. 80–93). London: Sage.
Ellis, C. (1995). Emotional and ethical quagmires of returning to the field. Journal of Contemporary Ethnography, 24(1):68–98.
Fielding, N. G. & Lee, R. M. (1998). Computer Analysis and Qualitative Data. London: Sage.
Glaser, B. G. (1978). Theoretical Sensitivity. Mill Valley, CA: The Sociology Press.
Glaser, B. G. (1992). Basics of Grounded Theory Analysis. Mill Valley, CA: The Sociology Press.
Glaser, B. G. (1998). Doing Grounded Theory: Issues and Discussions. Mill Valley, CA: Sociology Press.
Glaser, B. G. (2001). The Grounded Theory Perspective: Conceptualization Contrasted with Description. Mill Valley, CA: The Sociology Press.
Glaser, B. G. (2002). Constructivist grounded theory? Forum qualitative Sozialforschung/ Forum: Qualitative Social Research [On-line Journal], 3. Available at: http://www.qualitative-research.net/fqs-texte/3-02/3-02glaser-e-htm
Glaser, B. G. (2003). Conceptualization Contrasted with Description. Mill Valley, CA: Sociology Press.
Glaser, B. G. & Strauss, A. L. (1965). Awareness of Dying. Chicago: Aldine.
Glaser, B. G. & Strauss, A. L. (1967). The Discovery of Grounded Theory. Chicago: Aldine.
Glaser, B. G. & Strauss, A. L. (1968). Time for Dying. Chicago: Aldine.
Goffman, E. (1959). The Presentation of Self in Everyday Life. Garden City, NY: Doubleday Anchor Books.
Goffman, E. (1961). Asylums. Garden City, NY: Doubleday Anchor Books.
Goffman, E. (1963). Stigma. Englewood Cliffs, NJ: Prentice-Hall.
Goulding, C. (2002). Grounded Theory: A Practical Guide for Management, Business, and Market Researchers. London: Sage.
Henwood, K. & Pidgeon, N. (2003). Grounded theory in psychological research. In P. M. Camic, J. E. Rhodes, & L. Yardley (Eds.), Qualitative Research in Psychology: Expanding Perspectives in Methodology and Design (pp. 131–155). Washington, DC: American Psychological Association.
Hildebrand, Bruno. (2000/2004). Anselm Strauss. In U. Flick, E. Von Kardorff, & I. Steinke (Eds.), A Companion to Qualitative Research (pp. 17–23). London: Sage.
Hughes, E. C. (1958). Men and Their Work. Glencoe, IL: Free Press.
Junker, B. H. (1960). Field work: An introduction to the social sciences. Chicago: University of Chicago Press.
Kahn, R. L. & Cannell, C. F. (1957). The Dynamics of Interviewing. New York: Wiley.
Kelle, U. (2004). Computer assisted qualitative data analysis. In Clive Seale, Giampietro Gobo, Jaber F. Gubrium, & David Silverman (Eds.), Qualitative Research Practice (pp. 479–483). London: Sage.
Kelle, U. (2005, May). ‘Emergence’ vs. ‘forcing’: A crucial problem of ‘grounded theory’ Reconsidered [52 paragraphs]. Forum Qualitative Sozialforschung/ Forum Qualitative Sociology [On-line journal] 6,2 Art. 27. Available at http/www.qualitative-research.net/fqs.texte-2-05/05-2-27-e.htm [Accessed: 05-30-2005].
LaRossa, R. (2005). Grounded theory methods and qualitative family research. Journal of Marriage and Family 67 (November):837–857.
Layder, D. (1998). Sociological practice: Linking theory and social research. London: Sage.
Lazarsfeld, P. & Rosenberg, M. (Eds.). (1955). The Language of Social Research: A Reader in the Methodology of Social Research. Glencoe, IL: Free Press.
Locke, K. (2001). Grounded Theory in Management Research. Thousand Oaks, CA: Sage.
Lofland, Lyn H. (1980). Reminiscences of classic Chicago. Urban Life, 9:251–281.
Lonkila, M. (1995). Grounded theory as an emerging paradigm for computer-assisted qualitative data analysis. In Kelle, U. (Ed.), Computer-aided Qualitative Data Analysis: Theory, Methods and Practice (pp. 41–51). London: Sage.
Maines, David R. (2001). The Faultline of Consciousness: A View of Interactionism in Sociology. New York: Aldine de Gruyter.
May, K. (1996). Diffusion, dilution or distillation? The case of grounded theory method. Qualitative Health Research, 6(3):309–311.
Mead, G. H. (1932). Philosophy of the present. LaSalle, IL: Open Court Press.
Melia, K. M. (1987). Learning and Working: The Occupational Socialization of Nurses. London: Tavistock.
Melia, K. M. (1996). Rediscovering Glaser. Qualitative Health Research, 6(3):368–378.
Meyer, G. A. (2002). The art of watching out: Vigilance in women who have migraine headaches. Qualitative Health Research, 12(9):1220–1234.
Miller, D. E. (2000). Mathematical dimensions of qualitative research. Symbolic Interaction, 23:399–402.
Mills, J., Bonner, A. & Francis, K. (2006). The development of constructivist grounded theory. International Journal of Qualitative Methods, 5(1):1–10.
Morse, J. M. (1995). The significance of saturation. Qualitative Health Research, 5:147–149.
Peirce, C. S. (1938/1958). Collected Papers. Cambridge: Harvard University Press.
Pidgeon, N. F. & Henwood, K. L. (1995). Grounded theory: Practical implementation. In J. T. E. Richardson
(Ed.), Handbook of Qualitative Research Methods for Psychology and the Social Sciences (pp. 86–101), Leicester: British Psychological Society Books.
Pidgeon, N. F. & Henwood, K. L. (2004). Grounded theory. In M. Hardy & A. Bryman (Eds.), Handbook of Data Analysis (pp. 625–648). London: Sage.
Platt, J. (1996). A History of Sociological Research Methods in America, 1920–1960. New York: Cambridge University Press.
Reichert, J. (2000/2004). Abduction, deduction and induction in qualitative research. In U. Flick, E. Von Kardorff, & I. Steinke (Eds.), A Companion to Qualitative Research (pp. 159–164). London: Sage.
Rennie, D., Phillips, J. R., & Quartaro, G. K. (1988). Grounded theory: A promising approach to conceptualisation in Psychology. Canadian Psychology, 29(2):139–150.
Richardson, L. (1993). Interrupting discursive spaces: Consequences for the sociological self. In N. K. Denzin (Ed.), Studies in Symbolic Interaction, Vol. 14 (pp. 77–83). Greenwich, CT: JAI Press.
Rock, P. (1979). The Making of Symbolic Interactionism. London: Macmillan.
Rosenthal, G. (2004). Biographical research. In C. Seale, G. Gobo, J. F. Gubrium, & D. Silverman (Eds.), Qualitative Research Practice (pp. 48–64). London: Sage.
Schreiber, R. S. & Stern, P. N. (Eds.) (2001). Using Grounded Theory in Nursing. New York: Springer Publication Company.
Seale, C. (1999). The Quality of Qualitative Research. London: Sage.
Star, S. L. (1989). Regions of the Mind: Brain Research and the Quest for Scientific Certainty. Stanford, CA: Stanford University Press.
Stern, P. N. (1980). Grounded theory methodology: its uses and processes. Image, 12, 20–23.
Stern, P. N. (1994). Eroding grounded theory. In J. Morse (Ed.), Critical issues in qualitative research methods (pp. 212–223). Thousand Oaks, CA: Sage.
Strauss, A. (1987). Qualitative Analysis for Social Scientists. New York: Cambridge University Press.
Strauss, A. (1993). Continual Permutations of Action. New York: Aldine de Gruyter.
Strauss, A. L. (1959/1969). Mirrors and Masks. Mill Valley, CA: The Sociology Press.
Strauss, A. L. (1961). Images of the American city. Chicago: University of Chicago Press.
Strauss, A. & Corbin, J. (1990). Basics of Qualitative Research: Grounded Theory Procedures and Techniques. Newbury Park, CA: Sage.
Strauss, A. & Corbin, J. (1998). Basics of Qualitative Research: Grounded Theory Procedures and Techniques, 2nd edn. Thousand Oaks, CA: Sage.
Urquhart, C. (2003). Re-grounding grounded theory – or reinforcing old prejudices?: A brief response to Bryant. Journal of Information Technology Theory and Application, 4:43–54.
Urquhart, C. (2007 forthcoming). The evolving nature of the grounded theory method: The case of the information systems discipline. In A. Bryant & K. Charmaz (Eds.), Handbook of Grounded Theory. London: Sage.
Wilson, H. S. & Hutchinson, S. A. (1996). Methodologic mistakes in grounded theory. Nursing Research, 45(2):122–124.
Wuest, J. (1995). Feminist grounded theory: An exploration of the congruency and tensions between two traditions in knowledge discovery. Qualitative Health Research, 5(1):125–137.
Wuest, J. (2001). Precarious ordering: Toward a formal theory of women’s caring. Health Care for Women International, 22(1–2):167–178.
28
Documents and Action
Lindsay Prior
Tis writ, ‘In the beginning was the Word’.
I pause, to wonder what is here inferred. …
The spirit comes to guide me in my need,
I write, ‘In the beginning was the Deed’.
Goethe, Faust, Part One.

The dynamic connection between words, writing, and action that is highlighted in the extract from Goethe’s Faust constitutes the central theme of this chapter. Oddly it is a theme that is rarely taken up with issues relating to social research, despite the fact that writing plays such a large part in everyday culture. Indeed, in our age and our world, writing is more often than not seen as being somewhat divorced from action – as something static, immutable and isolated from human deed – lodged as it is in books, libraries and archives. Yet the plain fact is that writing is itself a form of action and can even serve to structure significant features of interaction.

Writing is not of course co-terminus with documentation; rather it is contained within documentation (along with numerous other human creations such as maps, architectural plans, film, photographs and electronic web pages). However, in this chapter I will not be overly concerned with drawing distinctions between writing, text, records and documentation, but will merely refer to documents in a generic sense – that is, as readable matter.

As someone who has called upon and extensively used documents in social research, it seems to me that they always enter into social affairs in two distinct modes: (a) as receptacles of content; and (b) as agents in networks of action. In what follows I intend to illustrate by the use of examples how a researcher might relate to these two modes. My examples are drawn mostly from my own work and therefore concern matters affecting health, illness and medicine – the areas in which I do my research. However, the discerning reader should not be misled by the specificity of the examples, and should be able to see how an investigator in other fields of inquiry might extend the strategies discussed herein to their own areas of interest.

As far as the social sciences are concerned, most of the research that uses or calls upon documents focuses mainly on the collection and analysis of document content – and that is where our own starting point is to be found. Indeed, a focus on documents as containers for content is well established in the social sciences. Documents in this frame can be approached as sources of
information, and the writing and images that they contain scoured for appropriate data. Thus, letters, texts, photographs, adverts, biographies and autobiographies, as well as documents containing statistical data are typically regarded as a resource for the social science researcher – see, for example, Plummer (2001) and Scott (1990, 2006). Usually, various kinds of content analysis are adopted for such approaches – see Bryman (2004), Krippendorf (2004) and May (2001). Content analysis can also blend into discourse analysis – a form of analysis that examines how objects and relations between objects are represented and structured by means of text and talk (Wood, 2000).

On occasion, these relatively static forms of analysis can be extended so as to study documents as ‘topic’, rather than resource – in which case the focus is, in part, on the ways in which any given document came to assume its actual content and structure. This latter approach is akin to what Foucault (1972) might have called the ‘archaeology of documentation’ – looking, for example, at the first points at which certain objects in the world are mentioned and come into being via documentation, or revealing the ways in which systems of classification of things in the world – birds, flowers, viruses and the like – change at specific points in time. Some implications of this style of research will also be examined in the following section.

Approaching documents as topic rather than resource can, however, open up a further dimension of analysis. It concerns an examination of the ways in which documents are used in social interaction and how they function. Indeed, in this vein it is evident that during recent decades new approaches to the study of documents have emerged. In the field of sociology these new visions may be seen to relate, in part, to developments in actor-network theory or ANT (Law and Hassard, 1999). In history and the history of science they relate to the newly emergent ‘geographies of knowledge’ (Livingstone, 2005). In all cases the key theme involves a consideration of documents as objects and actors in a web of activity. This kind of strategy will be discussed in the section entitled ‘Studying documents in action’.

Examining the role of documents in a network generates questions about what documents ‘do’, rather than what they ‘say’ – though in the messy way of the world such distinctions hold only at a conceptual rather than an empirical level. Yet, by focusing on ‘doing’ we come to see that documents not only enter into human affairs as actors, but can also structure such affairs – often in fine detail. Consequently, in the section entitled ‘Documents in interaction’ I will concentrate on word and deed – showing how documents can influence episodes of human interaction and thereby enter into the research frame as active agents and something other than mere containers of content.

STUDYING CONTENT

Given that documents are normally viewed as little more than containers of content, the study of the material lodged within documents usually takes pride of place in relevant social scientific research strategies. Thus, letters, diaries, wills, biographies, newspaper stories, or whatever, can be scrutinised for their rhetoric, their syntax or even just for ‘themes’. In this respect, Glaser and Strauss (1967: 163), argued that, in matters of sociological research, documents ought to be regarded as akin ‘to an anthropologist’s informant or a sociologist’s interviewee’.

Naturally, the use of documents as ‘informants’ stretches much further back into the social sciences than the 1960s. For example, in one of the earliest sociological studies of the twentieth century Thomas and Znaniecki (1958; orig. 1918) collected together and analysed letters written by Polish immigrants to the USA. The use of immigrant letters as a source of social scientific data was probably not original – even in 1918 when the first volume of the ‘Polish Peasant’ was published – but it was, nevertheless, insightful. W. I. Thomas, in particular, was concerned with individual attitudes – towards possessions, the family, social relationships
and the like. The immigrant letter in this respect was seen to function as a repository of attitudes. For instance, the very fact that such letters were written at all, indicated that Polish immigrants were ready to invest a considerable amount of time and effort in maintaining family links across two continents. On the other hand, the actual content of the letters suggested to Thomas that in many key respects social solidarity was breaking down in the Polish community. Thus, the letters were said to reveal a considerable degree of conflict about such matters as marriage partners and other family relationships. As with many researchers Thomas and Znaniecki can be accused of finding in the data only what they wished to see – a common failing in analyses of content – and it is clear that the theme of ‘social disorganisation’ was already firmly implanted in the sociology of W. I. Thomas well before he had looked at any letters. It is not surprising, therefore, that social disorganisation in the American urban Polish community is what Thomas saw the letters to reveal, but the Polish Peasant nevertheless gave a spur to the use of such documents in the study of contemporary culture and history. In sociology and anthropology during subsequent decades there were a sizeable number of studies that used diaries, letters, biographies and autobiographies as life histories and as important sources of social scientific data (Angrosino, 1989). Plummer (2001) provides an excellent overview of the field and indicates how the use and study of such materials came to be associated with distinct methods of social scientific inquiry (as is the case with ‘biographical’ methods, for example).

Scouring newspapers and other documents for supportive stories or evidence is one way of approaching document content, but a more systematic approach would require both an appreciation of the ‘population’ of documents that may be available for sampling (Hill, 1993), and of the entire content of the documents selected – looking at the segments that fail to fit hypotheses and theories as well as those that support hypotheses and theories. In that respect Glaser and Strauss (1967) were probably among the first to suggest a rigorous approach to the study of documentation as ‘informant’. Insofar as rigour applies to content analysis – whether it is from a newspaper story, a life history, a police report on a crime scene or a social work report on a person with multiple problems – such analysis can take any one of a number of routes.

In my own case, I usually like to begin by identifying all of the words used in a document as well as the number of times that any given word is used. (This can be achieved through the use of simple concordance programmes that are freely available on the WWW.) By implication, content analysis necessitates both enumeration and understanding of the various words lodged within a text. For example, in Table 28.1, I have provided an indication of the number of times that particular words appeared in a patient support group leaflet for people who suffer from chronic fatigue syndrome – CFS (also known in the UK as ‘M.E.’ and in the USA as CFIDS). Given the name of the condition, the appearance in the document of ‘fatigue’ and ‘chronic’ over 50 times apiece is not perhaps surprising. However, it is interesting to note that viruses seem to be associated with whatever is going on in the document (23 citations), as well as an entity referred to as fibromyalgia (18 citations), depression (14), genes (4)

Table 28.1 Occurrence of selected words in a 2315-word patient-support group leaflet on Chronic Fatigue Syndrome

Word or word group                    Occurrences
Fatigue                               55
Chronic                               51
Illness                               50
Syndrome                              46
Research                              29
Virus/Viral/Virology                  23
Disease                               19
Fibromyalgia                          18
Depression                            14
Immune/Immune-related/Immunology      9
Genetic                               4
Psychology/psychological              4
Neurology/neurological                4
Psychoneuroimmunology                 2
Psychiatric/Psychiatrists             2
Mental                                1
Mind                                  1
Source: Prior, 2003.
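As a rough illustration of the word-counting step described in this section (listing every word in a document with its number of occurrences, then tallying word groups of the kind shown in Table 28.1), here is a minimal Python sketch. It is not the concordance programme used for the study. The function names, the prefix-based grouping of word families, and the choice of sample text (a few sentences from the leaflet extract quoted in this chapter) are assumptions made purely for illustration.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count every word in `text`, ignoring case and punctuation."""
    # A "word" here is a run of letters, optionally joined by hyphens or apostrophes.
    words = re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())
    return Counter(words)


def count_word_families(freqs, families):
    """Sum the counts of all words that begin with any prefix of a family.

    `families` maps a label to a tuple of lower-case prefixes (hypothetical
    groupings chosen by the analyst, not taken from the original study).
    """
    return {
        label: sum(n for word, n in freqs.items() if word.startswith(prefixes))
        for label, prefixes in families.items()
    }


# Sample text: the opening sentences of the leaflet extract quoted in the chapter.
sample = (
    "The cause of the illness is not yet known. Current theories are looking at "
    "the possibilities of neuroendocrine dysfunction, viruses, environmental "
    "toxins, genetic predisposition, or a combination of these. For a time it "
    "was thought that Epstein-Barr virus (EBV), the cause of mononucleosis, "
    "might cause CFS but recent research has discounted this idea."
)

freqs = word_frequencies(sample)
families = count_word_families(
    freqs,
    {
        "Virus/Viral/Virology": ("virus", "viral", "virolog"),
        "Genetic": ("genetic",),
        "Illness": ("illness",),
    },
)

# Full concordance-style listing, most frequent words first.
for word, n in freqs.most_common(5):
    print(word, n)
print(families)
```

Counts of this kind show only which words are present and how often; how the words relate to one another, and what their use implies, still has to be read from the text itself.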
and something called psychoneuroimmunology (2). The simple presence of these words is worthy of note and for someone who knows the arguments and debates associated with the diagnosis and treatment of CFS they are all highly significant. In general, however, rather than a focus on individual words, it is usually more important for the researcher to grasp (a) how the words relate to each other and (b) what is being implied by their use. Let us consider a brief example, by moving up a level and looking at sentences and phrases rather than just words. Here is an extract from the aforementioned WWW document.

‘Is CFS genetic?
The cause of the illness is not yet known. Current theories are looking at the possibilities of neuroendocrine dysfunction, viruses, environmental toxins, genetic predisposition, or a combination of these. For a time it was thought that Epstein-Barr virus (EBV), the cause of mononucleosis, might cause CFS but recent research has discounted this idea. The illness seems to prompt a chronic immune reaction in the body, however it is not clear that this is in response to any actual infection – this may only be a dysfunction of the immune system itself.’

A number of things are evident from the passage – such as the cause of the illness being unknown; the possibility of the illness being caused by toxins, viruses, or endocrine disorder; and the fact that the illness might be ‘genetic’, or caused by immune dysfunction. Indeed, the suggestion is that whatever the cause might be, it is likely to be a physiological (possibly neurological) rather than, say, a psychological cause. Indeed later on in the document we get the following statement:

Emerging illnesses such as CFS typically go through a period of many years before they are accepted by the medical community, and during that interim time patients who have these new, unproven illnesses are all too often dismissed as being "psychiatric cases". This has been the experience with CFS as well.

So it is also clear that somebody somewhere has argued that CFS might be related in some way to psychological or psychiatric conditions – but the author of this document rejects such a claim because that would be to suggest that CFS is being ‘dismissed’ or not ‘accepted’ as a real illness simply because it is ‘unproven’. In fact, were I to produce the document in full it would be reasonably easy to see that throughout the text there is a tension between the claims of the writer – who asserts variously that CFS is a ‘real’ and essentially ‘physical disease’ – and some unknown others who have claimed that CFS is related to depression, anxiety and other psychological problems. (Similar tensions are evident in debates concerning the nature of fibromyalgia – also cited above.) By examining such tensions in the chosen text, the analyst is drawn into an examination of a rhetoric of illness – concerning the ways in which a disorder of unknown cause is represented and understood by different parties. It is at that point, however, that content analysis tends to drift into discourse analysis.

Unlike content analysis, discourse analysis is an awkward concept to capture. It is essentially concerned with the ways in which things and our knowledge of things are structured and represented through text and talk. For instance, there is a considerable tradition within social studies of science and technology for examining the role of scientific rhetoric in structuring our notions of ‘nature’ and the place of human beings within nature. The role and structure of scientific rhetoric in text has, for example, figured in the work of Bazerman (1988), Gross (1996), Latour and Woolgar (1979), Myers (1990) and Woolgar (1988); and even been extended beyond text and into the realm of visual representations (Lynch and Woolgar, 1990) and everyday talk (Gilbert and Mulkay, 1984). And in this vein there have been numerous studies examining how the objects of science, medicine and technology have been, and are, structured through discourse. One particularly interesting set of studies have been those that have concentrated attention on the concept of the ‘gene’ and the human genome. For example, Lily Kay (2000) analysed the role of metaphors of the gene and genetics in genetic science between the 1950s and the twenty-first century – indicating how
the image of DNA as a code or text of instructions (recipe) or plan (blueprint) emerged only gradually during the second half of the twentieth century. Thus, she points out how, in the famous April 1953 Nature paper by Crick and Watson on DNA, the authors referred only to the structure of DNA – and she then investigates how the idea of using concepts of grammar and semantics to describe genetic processes emerged during the 1960s – particularly relating to work on ‘messenger’ RNA. Indeed, the first ‘word’ of the genetic code (the UUU of RNA) was not identified until 1961. Kay subsequently argues that the Nobel prize-winning work of Nirenberg and Matthaei (who discovered the first word) would simply not have been possible without calling upon and utilising metaphors of communication and information science such as we have referred to above. Other writers have chosen to focus on genetic discourse in everyday culture (as reflected through news stories and the like) with equally interesting results. Thus Nelkin (2001), for instance, has noted how, in popular culture, DNA is not simply regarded as a ‘code’ – carrying and expressing information – but that it is also endowed with executive action. In short, DNA is represented through text as something that ‘makes things’ (humans, cancers, and so forth), in a deterministic system.

In the following paragraph I present some of my own data (derived from talk between a doctor and a client of a cancer genetics service) to illustrate some possibilities of this kind of approach. Even though the data are derived from talk (rather than text per se), they serve to illustrate how analysis of a discourse can reveal detail about the ways in which, in any given culture, the world and the objects within it are represented and structured.

200 Doctor: And the genes are broken up into sections and so a gene that
201 controls a protein function in a body is not just one long coding
202 instruction it is in fact broken up into sections that then get joined
203 together. And those sections you can think of them as being volumes of
204 an encyclopaedia. Basically between the two genes there are effectively
205 50 volumes. And it takes our laboratory a week to check each one,
206 which you can then work out quite quickly that that is effectively a year
207 to check every single one. That is just the practicality of the time scale.
208 The other problem though, if you are dealing with something as big as
209 something like an encyclopaedia and you are looking for a mistake and
210 effectively what you dealing with is just a code, a series of letters, then
211 you are looking for something like a missing paragraph or sometimes just
212 a missing word, or sometimes just a missing letter. And right down to
213 just a change on one letter can be all that is needed to have disastrous
214 effects.
215 Patient: Yeah.

A number of issues deserve attention here. The first is the extensive use of metaphor in this exchange. In particular, genes are referred to as ‘coding instructions’ (lines 201–02, 210), ‘volumes of an encyclopaedia’ (203–04), a ‘series of letters’ (210), and words and/or paragraphs (211–12). And in accord with such rhetorical forms, mutations are referred to as ‘missing’ words, letters or paragraphs, as ‘mistakes’ possibly brought about by a ‘change in just one letter’ (213). The second issue of interest is in what may be called the actional components of the sentences that link genes to human physiology. Of particular significance is the way in which genes are said to ‘control’ protein functions (line 201), and genetic re-arrangements of DNA sequences (letters) are argued to be capable of having ‘disastrous effects’ (lines 213–14) on the human body. Such attention to the ways in which the use of tropes (such as metaphor) and syntax operate in text leads us to consider how ‘things’ and events in the world are structured through discourse.

It could be said that with both content and discourse analysis, researchers are essentially seeking to use documentation as ‘resource’ – that is as a source of data for social scientific theorising (of varying degrees of complexity).

It is, however, possible to approach document content as ‘topic’. The very useful distinction between resource and topic was
first introduced by Zimmerman and Pollner (1971), and picking up on this distinction can encourage us to ask a different set of questions about documentation. So instead of focusing merely on what documents contain we can begin to ask how the documentation that we elect to examine came to assume the form that it did. This line of inquiry can be especially useful in the examination of the ways in which people ‘sort things out’ (Bowker and Star, 1999). For instance, it is often instructive in matters of social research to ask how things come to be classified in a particular way (and not other ways) and what rules are to be used to allocate objects to one realm rather than another. Thus we might, for example, ask questions concerning the ‘causes’ of death, disease and illness – such as what can one die of? The answer to that question is invariably constrained by the content of a World Health Organization (WHO) manual – namely, The International Classification of Diseases and Related Health Problems (WHO, 1992). It is often referred to in an abbreviated form as the ICD. The current edition of the manual is the tenth, and so the abbreviation is, more accurately, ICD-10. ICD-10 provides a list of all currently accepted causes of death, and they are classified into ‘chapters’. Thus, there are chapters relating to diseases and disorders of the respiratory system, the circulatory system, the nervous system and so on. In different decades different diseases and causes of death are added to and deleted from the manual. HIV/AIDS is an obvious case of an addition and it appears as a cause of death only in ICD-10, whilst ‘old age’ as a cause of death was eliminated in ICD-6. Such taxonomies reflect aspects of human culture and researching the ‘archaeology’ of such documents can be instructive in itself. A related publication – The Diagnostic and Statistical Manual of Mental Disorders (American Psychiatric Association, 2000) or DSM – is available for the classification of psychiatric (mental) conditions. One might say that the DSM provides the conceptual architecture in terms of which western culture comprehends disorders of the mind. Post-Traumatic Stress Disorder is, for instance, recognised as a disorder only in DSM-III (first published in 1980), whilst multiple personality disorder (MPD) has undergone a few transformations and is no longer listed in the 4th-revised edition of the DSM. The inclusion and deletion of such diagnostic categories can be used as key indicators of not merely how professional and technical discourse might have altered, but also how political, legal and socio-economic processes impinge on the affairs of science and medicine (for a detailed example of the relationships between a form of scientific classification and styles of professional practice see Keating and Cambrosio, 2000).

The manufacture and standardisation of taxonomies – as well as the deployment of rules for allocating ‘cases’ to appropriate categories – is important for various reasons, but not least because they are indispensable to generating images of the world. For example, the ways in which events relating to crime, the economy, illness and disease or education are classified and counted are fundamental to our understanding of long-term trends and our image of contemporary happenings. And as numerous analysts of official statistical accounts of the world have demonstrated (see for example, May, 2001; Prior, 2003), for any given society we can have as much or as little illness, crime, ‘success’ and ‘failure’ as we want – depending on how, exactly, we sort things out.

Unfortunately, once we are engaged with the routine messiness of the empirical world many of these distinctions between content and discourse, topic and resource are difficult to hold to. For documents, as with most phenomena, are fluid, messy and somewhat slippery objects for analysis. More importantly, and as I shall demonstrate in the next two sections, documents often appear as active agents in a universe of deeds.

STUDYING DOCUMENTS IN ACTION

A focus on documents in action tends to encourage a focus on how documents are used
(function) and how they are exchanged and circulate in various communities. Naturally, documents carry content – words, images, plans, ideas, patterns and so forth – but the ways in which such content is actually called upon and how it functions cannot be determined (though it may be constrained) by an analysis of its content. Indeed once a text or document is sent out into the world there is simply no predicting how it is going to circulate and how it is going to function in specific social and cultural contexts. For this reason alone, a study of what the author(s) of a given document (text) ‘meant’ or intended can only ever add up to a limited examination of what a document ‘is’. Indeed, as the literary theorist De Certeau (1984: 170) has argued, ‘Whether it is a question of newspapers or Proust, the text has a meaning only through its readers; it changes along with them; it is ordered in accordance with codes of perception that it does not control’. In this regard an interest in the reception and reading of text has formed the focus for recent histories of knowledge that seek to examine how the ‘same’ documents have been received and absorbed quite differently into different cultural and geographical contexts (see, for example, Burke, 2000; Livingstone, 2005).

One possible starting point for inquiries into the dynamics of documentation rests in Latour’s notion of an ‘immutable mobile’ (1987). An immutable mobile is something that can move around, whilst – at the same time – holding its essential shape. Thus a book, or set of instructions, or a recipe, or map, can hold its shape in the ordinary everyday sense of such words, and it can also hold its shape in a relational manner. That is to say, a book has shape in (three-dimensional) space, but it also has shape as a member of a specific type of literature (say a science text, or a work of fiction, or work of science-fiction, or philosophy, or poetry or history of art). Yet for the book to retain its shape in this relational sense, a dynamic network of actors is needed. Such a network might include, for instance, authors and literary critics to identify the book as a work of science-fiction, book catalogues to classify the work, libraries in which to hold the book, librarians to identify the literary genre of the book, readers to search out the book as science-fiction and so forth. It is in such a way that we can begin to see the book as an object within a network. More importantly, however, it is likely that our mysterious book (or text) will not simply be at the mercy of the various ‘actors’ in such a network but will also become an actor itself.

Perhaps the clearest image of a document as an actor arises in the case of a legally constituted ‘last will and testament’, which on the occasion of its final ‘reading’, acts. Or consider the role of various books of the Bible in the history of social and religious controversy – which have also served as actors (as sources of authority, as witness to evidence and so forth). And as with human actors, documents as actors can be recruited, suppressed, enrolled into the service of various interest groups – some examples of which are referred to in Prior (2003). Unfortunately, one of the problems with the concept of immutable mobiles is its emphasis on stasis. For as the objects in a network move they often become mutable and metamorphose into new objects.

A consideration of objects in a network is usually associated with a somewhat amorphous group of writers who favour what is called actor-network-theory or ANT (see, Law and Hassard, 1999). ANT is of concern to us insofar as it opens a new dimension for social research – analysing how documents are positioned in actor-networks and also how they function (act) in such networks. (In terms of ANT, non-human agents are commonly referred to as actants rather than as actors.) From our point of view, the key research questions revolve around the ways in which documents are integrated into networks and how they influence the development of the network. This kind of focus has, in some cases, led to developments in research software to explore the relational aspects of humans and documentation. In what follows I shall outline a few examples. I shall concentrate first on WWW pages as documents and sketch out how they can be approached in a variety of social scientific frameworks.
In the first instance, of course, it is clear that WWW pages can be scoured for their content alone – that is, used and interrogated as informant. For example, in a 2002 study of anti-vaccination web sites, Wolfe et al. (2002) identified 22 such WWW sites and noted that in all cases the documentation asserted that vaccines caused idiopathic illness, in 95 percent of cases that vaccines erode immunity, and in 91 percent of cases that vaccination policy was driven by profit motives rather than cares about health. These and other details concerning document content were acquired by the use of relatively simple coding techniques. The authors also noted that anti-vaccination sites used specific tactics for transmitting their messages. Thus, one favoured strategy involved the use of personal stories – often from parents who served as witnesses to the fact that vaccination caused severe illness in their children. Analysis of story structure would, of course, inveigle us into a specific style of discourse analysis – in this case perhaps one that focused on narrative rather than on rhetoric. However, there remains a further strategy for the examination of anti-vaccination sites and it involves looking at the networks that emerge out of the relations between such sites.

The possibility for examining relations between web sites is, of course, built into web sites ordinarily, for web sites contain hyperlinks (to other web pages), and by concentrating on the outlinks of the web pages it becomes possible to study how internet documents relate one to another. In recent years the task of tracing the links between such sites has been facilitated by the use of web crawlers. However, Richard Rogers, who has designed one such crawler (www.govcom.org), refers to issue networks and issue spaces rather than WWW networks (see Marres and Rogers, 2005). An issue network is a network of pages that acknowledge each other by way of hyperlinks. I have provided a simple example of such a network in Figure 28.1. The figure traces links between web pages of organisations who work with people with HIV/AIDS in Uganda. The starting point for the web crawl necessitated the identification of WWW addresses for two Ugandan non-governmental organisations (NGOs) working with people with HIV/AIDS. The results of this initial crawl indicate a number of features. I have highlighted only a few of these in Figure 28.1. They concern the centrality of international organisations such as the UN, Unicef and the World Health Organisation in the document network. Surrounding those organisations are the pages of various Ugandan government organisations (such as health.go.ug), and on the periphery are the local NGOs, whilst at the very edge is the page for the Ugandan parliament.

The links between such documentation may be considered as data in themselves – and they certainly point to factors such as position (degrees of centrality, for example), density of contact, directions of contact and so forth. The links could also be considered as a map for exploring the relationships between local NGOs, international organisations and the Ugandan government. Naturally, the exploration of such links would need to be supplemented by the use of other methods and techniques (such as interview techniques or a range of ethnographic techniques); nevertheless, the web map provides both a starting point and ground on which hypotheses might be generated concerning notions of, say, ‘partnership’ in the field of HIV/AIDS in Africa. There is, however, a feature of social activity that is only touched upon – rather than confronted – by the use of a web crawler. It involves the fact that actor-networks contain human as well as non-human actors.

By tradition, a focus on relationships between people in a network has been associated with social network analysis. Such analysis concentrates on the number of links between specific individuals, the degree to which an individual is central or peripheral to a given network, the density of interactional or contact nodes and so forth (see, Scott, 1999). However, as actor-network theorists emphasise, social networks cannot be reduced to relations between humans. Consequently, what is usually needed is an analysis of
Figure 28.1 WWW links between organisations in Uganda concerned with HIV/AIDS (generated using Issuecrawler.net). Groups labelled in the figure: Local NGOs, International Organisations, Ugandan Parliament.
relationships between humans, organisations, and ‘things’ (such as documents, machines, germs or whatever). For example, Cambrosio et al. (2004) studied the nature of collaborative research networks and innovation in a specific field of biomedicine. The researchers were interested in how the people in the network collaborated, as well as the role of such things as antigen, antibody reagents (contained in bottles) and antibodies in a research network. One component of their investigation concentrated on the relationships between research workshops and research laboratories in the development of particular (HLDA) antibodies, and Cambrosio et al. sought to trace the relations that linked the institutions and workshops to the antibodies. In doing that they designed a network map – reproduced as Figure 28.2. In the context of this figure the points T, M and B represent different research workshops. The outer points represent the laboratories or research centres and the size of the circles and squares is proportional to the number of antibodies submitted by each laboratory to each workshop. We can see immediately from the map how the relationships fan out, the relative importance of each of the three workshops and which institutions are linked to which antibodies. Antibodies are not documents, of course, but the network map illustrates how documents could be mapped into a scheme of social relations and how it could be the documents that form the focus of attention rather than the human beings. However, such maps require dedicated software that can generate visual traces of actor-networks. In the case discussed the relevant technology was provided by Réseau-Lu (see, Mogoutov et al., 2005).
Figure 28.2 Human leucocyte differentiation antigens (HLDA) workshops, research centres and antigens (legend: Workshops, Research Centres, Connections)
Source: Cambrosio et al., 2004.
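
The link analysis described in this section can be illustrated with a short Python sketch. It is offered as an illustration only and does not reproduce how Issuecrawler or Réseau-Lu work: the page contents, the host names (all under `.example`) and the function names are hypothetical. The sketch extracts the outlinks of each page, keeps those that point to other pages in the same set, and then reports two of the measures mentioned above, namely centrality (here simply the number of inlinks a page receives) and density (the share of possible directed links that are actually present).

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class OutlinkParser(HTMLParser):
    """Collect the host names that a page links out to."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                host = urlparse(value).netloc.lower()
                if host:
                    self.hosts.add(host)


def outlinks(html):
    """Return the set of hosts linked to from one page of HTML."""
    parser = OutlinkParser()
    parser.feed(html)
    return parser.hosts


def link_network(pages):
    """Return directed (source, target) links between the hosts in `pages`."""
    edges = set()
    for source, html in pages.items():
        for target in outlinks(html):
            if target in pages and target != source:
                edges.add((source, target))
    return edges


def in_degree(pages, edges):
    """Count the inlinks received by each host (a simple centrality measure)."""
    counts = {host: 0 for host in pages}
    for _, target in edges:
        counts[target] += 1
    return counts


def density(pages, edges):
    """Share of all possible directed links that are present."""
    n = len(pages)
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0


# Hypothetical toy data: two NGOs, an international agency and a ministry.
PAGES = {
    "ngo-a.example": '<a href="http://agency.example/aids">Agency</a>',
    "ngo-b.example": '<a href="http://agency.example/">Agency</a> '
                     '<a href="http://ngo-a.example/">NGO A</a>',
    "agency.example": '<a href="http://ministry.example/">Ministry</a>',
    "ministry.example": '<a href="http://agency.example/">Agency</a>',
}

if __name__ == "__main__":
    edges = link_network(PAGES)
    for host, count in sorted(in_degree(PAGES, edges).items()):
        print(host, count)
    print("density:", round(density(PAGES, edges), 3))
```

In this toy set the agency page receives the most inlinks, which is the kind of pattern the figure reports for international organisations. A real study would of course start from crawled pages rather than hand-written strings, and would need the supplementary methods noted in the text.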
DOCUMENTS-IN-INTERACTION

A concern with documents-in-action does not, however, necessitate a commitment to any concept of network; whether it be of the ANT variety or otherwise. For it is possible to focus on documents-in-action in terms of traditional interactional frameworks. That is to say, it is plausible and possible to focus the research effort on examining how documents enter into ordinary everyday episodes of social interaction and how the presence of such documents influences such interaction. Sociological and other social scientific studies of schools, workplaces, hospitals, and the like are littered with observations concerning these influences but they are rarely picked up or emphasised in any coherent way. In what follows I shall provide a few simple examples of the manner in which documents can (a) enter into episodes of human interaction and (b) structure the activities of humans.

My first example arises out of consideration of an essay written by George Psathas (1979) on maps. In that essay, Psathas looked at how maps are used in everyday contexts. His specific focus was on the kind of maps that people draw and dispense for and to others so as to find the forthcoming party at ‘our house’ or some such. His sociological interest was on the reasoning that was implicated in the drawing of such maps. For example, he pointed out how direction maps are always drawn with reference to a destination rather than, say, to the topography of a given neighbourhood. More importantly, the use of such maps clearly implicates readers as well as writers (or in this case amateur cartographers). For readers of such maps are invariably inveigled into following the sequences drawn on the map. They are obliged, as it were, to ‘perform’ the route that is drawn on the map. Thus, in reading and using the map, the map reader moves herself or himself from point A to point B in a manner dictated above all by the mapmaker. Such use provides a good example of a process referred to as action-at-a-distance. It also serves to demonstrate how documents in use can structure and pattern their readers – tell the readers how to act. In this respect, documents-in-action often take on qualities similar to those of the broom set in motion by Goethe’s Sorcerer’s Apprentice, or the monster unleashed on the world by Mary Shelley’s Frankenstein – that is to say they take on the qualities of human creations that act back on their creators. Exactly how documentation can influence performances of this nature is a focus that is rarely given any emphasis in qualitative research, yet the detail of my next example underlines how central the role of documentation can be.

My second example is drawn from my own work and illustrates how documentation can form the occasion for talk and interaction; how documentation is drawn into interactions and, again, how it has effects on the performance of the interaction. The data are provided in Figure 28.3. The talk therein was gathered from a study of work in a cancer genetics clinic. In this instance a clinical geneticist (designated CG) and nurse counsellors (designated NC) are discussing their understanding of the degree to which a given patient is at risk of inheriting a certain type of cancer mutation. The episode begins with one of the NCs reading a letter (lines 1–9) of referral to the clinic, and the letter frames the ensuing discussion. A second document enters into the frame at line 11 – it’s a family history, or pedigree as it is known in clinical genetics. The pedigree traces the ancestry of the patient who is the focus of the discussion, and it does so in a drawing that contains symbols for males and females and lines linking those who are related (see lines 12–13 of the data extract). In this case, the drawing has been composed by what Latour (1987) would refer to as an ‘inscription device’ (known here as ‘Cyrillic’). Cyrillic has also calculated the numerical risk (line 19) of inheritance. Both documents are clearly central to the manner in which the interaction is sequenced and structured. Thus documents are read (lines 1–9) whilst others listen; they are referred to as the occasion for the talk (lines 11–19); they are pointed at (line 14); and used as evidence and counter evidence (lines 14–19). What’s more the documents are linked to the speakers in distinct ways
1. NC2: This is ((patient name)) who is 32. ((reading the referral
2. letter)) ‘This lady’s 35 year old sister has just been diagnosed with breast
3. cancer. She herself is 33 and is naturally concerned. There are other
4. sufferers of the disease in the family. An aunt was also diagnosed in her
5. early 30s. (0.6) She realises that the risks are going to be higher than
6. average. (1.0) She has been thinking of the contraceptive pill although I
7. have asked her to put this on hold until she has been seen and then
8. presumably I will be able to give her a progesterone only pill if you feel
9. this is indicated.’ It’s from her GP.
10. NC1: It’s an extremely good GP
11. CG2: Yes, the interesting thing is if you really start to tease it apart
12. there are lots of black lines all over the place, they are all on different
13. sides of the family. This is her grand-maternal’s, er (1.0) niece. 40s
14. NC1: That’s 3rd degree
15. CG2: Well that is 3rd degree, yeah. And then her (0.4) well, her
16. mother’s grandfather’s sister at 67, so I think we can discount that one.
17. This is the one that is of more concern. She has a sister at 35 and then
18. somebody else at 38 over here. So there are two young people and I
19. suspect that puts her into a high–oh!–24.6 percent. (1.0) Mm.
20. NC1: What did you think, because you had some good thoughts
21. about this one?
22. NC2: (2.0) Em.
23. CG2: This is one that I would put into a high risk group. Can you
24. think why I have decided to put her into a high risk group?

Margin annotations in the figure: ‘NC reads a letter’ (beside lines 4–5); ‘Referring to pedigree’ (beside lines 9–10); ‘Pointing’ (beside line 14); ‘Cyrillic suggests low risk (just)’; ‘Division of expertise = division of labour’.
CG = Clinical Geneticist
NC = Nurse Counsellor
Cyrillic – a programme that draws pedigrees and calculates risk
Figure 28.3 Text & documentation underpin the division of labour
and in clear sequences, and finally serve to underline the ways in which the division of labour (between ‘doctors’ and ‘nurses’) is underpinned in both this episode and the clinic at large (lines 23–24).

This second example also raises a number of other important issues that lie beyond the scope of this chapter; namely, how talk is to be transcribed and translated into writing (as has been done in Figure 28.3), and what conventions are to be deployed so as to render active talk into inert text.

CONCLUSIONS

The closing example – as shown in Figure 28.3 – illustrates the multidimensional features of documentation in the social world. Documents have content – words, sentences, phrases – and content can be counted and classified and compared (one document to another). A study of document content can form an excellent starting point for social researchers – illustrating how ‘things’ are described and linked. Social researchers may also be interested in how those same things are represented and structured through language – in which case the researcher is drawn into various forms of discourse analysis. These days of course there are various types of software that can be called upon and used as aids to content and discourse analysis. At the most basic level a researcher can use a simple concordance programme. Such a programme would commonly provide a list and count of words used in a text (together with a facility for locating word
use in sentence context). More sophisticated text analysis programmes also offer ways to recognise and extract ‘concepts’ out of a text and to undertake a conceptual analysis of content. (For some pointers on such programmes, see: http://caqdas.soc.surrey.ac.uk/bibliography.htm.) As with all forms of data analysis, however, the software cannot provide a substitute for thinking or for social scientific insight, and it is clear from some of the aforementioned references that the most imaginative forms of analysis rely on concepts that emerge from the sociological imagination rather than from simple data mining exercises.

These two broad kinds of analysis – content and discourse – in various guises tend to dominate in the collection and analysis of documents and documentary evidence. For the most part, both styles of research tend to treat the text as a static object – as something to be read and understood. Another way of putting this is to say that both styles of analysis use documents as ‘resource’ rather than ‘topic’. In other words, text and documentation are there to be scoured for evidence or for facts. Consideration as to how the text assumes the shape that it does or, indeed, what the text does is left in abeyance. Yet during recent years there has been an emergent emphasis on the relational properties of documentation – in the manner described previously. These interests have been driven by the development of theoretical concerns (such as in ANT) and developments in technology that enable us to examine the traces that documentation produces. The clearest example of this trend relates to the links between WWW pages – which, with the use of web crawlers, can be seen to form a network. One could also extend this kind of analysis to citations of published work (citation networks); to an examination of links between e-mail messages, or possibly to telephone text messages (although the data for the latter would need to be derived from verbal answers to questions about networks rather than from electronic traces). The use of software such as Réseau-Lu also enables us to visualise networks of humans and things – including documents as things. However, as I have demonstrated, a focus on documents as ‘actors’ need not be constrained by thinking about networks, and research into documentation can be allied to a variety of interactional approaches. Indeed, in the modern world, documents enter into almost all episodes of human interaction. Given such omnipresence it remains puzzling why social science relies so heavily on ‘talk’ rather than text as the key source of research data.

REFERENCES

American Psychiatric Association. (2000) Diagnostic and Statistical Manual of Mental Disorders. DSM-IV-TR. Washington, DC: American Psychiatric Association.
Angrosino, M.V. (1989) Documents of Interaction. Biography, Autobiography, and Life History in Social Science Perspective. Gainesville, FL: University of Florida Press.
Bazerman, C. (1988) Shaping Written Knowledge. The Genre and Activity of the Experimental Article in Science. Madison, WI: University of Wisconsin Press.
Bowker, G.C. and Star, S.L. (1999) Sorting Things Out. Classification and its Consequences. Cambridge, MA: MIT Press.
Bryman, A. (2004) Social Research Methods. 2nd Ed. Oxford: Oxford University Press.
Burke, P. (2000) A Social History of Knowledge. From Gutenberg to Diderot. Cambridge: Polity Press.
Cambrosio, A., Keating, P. and Mogoutov, A. (2004) Mapping collaborative work and innovation in biomedicine. Social Studies of Science, 34:3: 325–364.
De Certeau, M. (1984) The Practice of Everyday Life. Tr. S. Rendall. London: University of California Press.
Foucault, M. (1972) The Archaeology of Knowledge. Tr. A. Sheridan. NY: Pantheon.
Gilbert, G.N. and Mulkay, M. (1984) Opening Pandora’s Box. A Sociological Analysis of Scientists’ Discourse. Cambridge: Cambridge University Press.
Glaser, B.G. and Strauss, A.L. (1967) The Discovery of Grounded Theory. Strategies for Qualitative Research. New York: Aldine De Gruyter.
Goethe, J.W. (1949) Faust. Part One. Trans. P. Wayne. Harmondsworth: Penguin.
Gross, A.G. (1996) The Rhetoric of Science. Cambridge, MA: Harvard University Press.
Hill, M. (1993) Archival Strategies and Techniques. London: Sage.
Kay, L.E. (2000) Who Wrote the Book of Life? A History of the Genetic Code. Stanford, CA: Stanford University Press.
Keating, P. and Cambrosio, A. (2000) ‘“Real compared to what?” Diagnosing leukemias and lymphomas’, in M. Lock, A. Young and A. Cambrosio (eds.) Living and Working with the New Medical Technologies. Intersections of Inquiry. Cambridge: Cambridge University Press. pp. 103–134.
Krippendorff, K. (2004) Content Analysis. An Introduction to its Methodology. 2nd Ed. London: Sage.
Latour, B. (1987) Science in Action. How to Follow Scientists and Engineers Through Society. Milton Keynes: Open University Press.
Latour, B. and Woolgar, S. (1979) Laboratory Life. The Social Construction of Scientific Facts. London: Sage.
Law, J. and Hassard, J. (eds.) (1999) Actor-Network Theory and After. Oxford: Blackwell.
Livingstone, D.N. (2005) Text, talk, and testimony: geographical reflections on scientific habits. An afterword. British Journal for the History of Science, 38:1:93–100.
Lynch, M. and Woolgar, S. (eds.) (1990) Representation in Scientific Practice. Cambridge, MA: MIT Press.
Marres, N. and Rogers, R. (2005) Recipe for tracing the fate of issues and their publics on the web, in B. Latour and P. Weibel (eds.) Making Things Public. Atmospheres of Democracy. Cambridge, MA: MIT Press. pp. 922–935.
May, T. (2001) Social Research. Issues, Methods and Process. 3rd Ed. Buckingham: Open University Press.
Mogoutov, A., Cambrosio, A. and Keating, P. (2005) Making collaborative networks visible, in B. Latour and P. Weibel (eds.) Making Things Public. Atmospheres of Democracy. Cambridge, MA: MIT Press. pp. 342–345.
Myers, G. (1990) Writing Biology. Texts in the Construction of Scientific Knowledge. London: University of Wisconsin Press.
Nelkin, D. (2001) Molecular metaphors. The gene in popular discourse. Nature Reviews, 2:555–559.
Plummer, K. (2001) Documents of Life 2. An Invitation to Critical Humanism. London: Sage.
Prior, L. (2003) Using Documents in Social Research. London: Sage.
Psathas, G. (1979) Organizational features of direction maps, in G. Psathas (ed.) Everyday Language. Studies in Ethnomethodology. New York: Irvington Publishers. pp. 203–225.
Scott, J. (1990) A Matter of Record. Documentary Sources in Social Research. Cambridge: Polity Press.
Scott, J. (1999) Social Network Analysis. London: Sage.
Scott, J.P. (ed.) (2006) Documentary Research. 4 Vols. London: Sage.
Thomas, W.I. and Znaniecki, F. (1958) The Polish Peasant in Europe and America. New York: Dover.
Wolfe, R.M., Sharp, L.K. and Lipsky, M.S. (2002) Content and design attributes of anti-vaccination websites. Journal of the American Medical Association, 287:24:3245–3248.
Wood, L.A. (2000) Doing Discourse Analysis. Methods for Studying Action in Talk and Text. London: Sage.
Woolgar, S. (1988) Science: The Very Idea. London: Tavistock.
World Health Organisation. (1992) International Statistical Classification of Diseases and Related Health Problems. 10th Revision. London: HMSO. 3 Vols.
Zimmerman, D.H. and Pollner, M. (1971) The everyday world as a phenomenon, in J.D. Douglas (ed.) Understanding Everyday Life. London: Routledge and Kegan Paul. pp. 80–103.
29
Video and the Analysis of
Work and Interaction
Christian Heath and Paul Luff
If society is conceived as interaction among individuals, the description of the forms of this interaction is the task of the science of society in its strictest and most essential sense. (Simmel, 1950: 21–2)

INTRODUCTION

It has long been recognised that video, and before that film, provide the social sciences with an unprecedented opportunity to analyse human culture and social organisation. As early as the 1880s, A.C. Haddon used film as part of his studies of the Torres Strait Islands, and in a very different vein Edward Muybridge, encouraged by Leland Stanford, used instantaneous photography to explore, amongst other things, the structure of human movement and coordination (Prodger, 2003). Since these early beginnings we have witnessed a burgeoning interest, in particular within social anthropology, in using video in qualitative research (Marks, 1995). There is for example a well-established tradition of ethnographic film that powerfully portrays cultural organisation and everyday practice and a growing range of anthropological, and more recently sociological, research that uses video and more generally visual media, to reflect on, illustrate, and in some cases analyse, the social and institutional forms that arise in contemporary society. In this regard, it is worthwhile differentiating the substantial corpus of research and methodological reflection concerned with the use of visual media in social science research (consider for example Banks and Murphy, 1997; Curry and Clarke, 1978; Emmison and Smith, 2000; Pink, 2001a, 2001b; Rose, 2001; Ruby, 2000), from the relative paucity of material that addresses the ways in which video can be used to analyse everyday activities and social interaction (for instance Goodwin, 1981; Heath, 1986; Heath and Luff, 2000; Kendon, 1982; Knoblauch et al., 2006).

Rather than review the diverse ways in which the visual, and to a lesser extent video, can inform qualitative research, in this chapter we wish to briefly sketch a particular approach, a methodological orientation, that enables the analysis of audio-visual
recordings of everyday activities and events. The approach draws upon methodological developments within sociology, namely ethnomethodology and conversation analysis. It directs analytic attention towards the social and interactional accomplishment of everyday activities and events. Even though this analytic orientation is only one way in which video is used in social science research, it is an approach that has proved highly productive and is of growing significance within various disciplines including sociology, anthropology and linguistics. It is an approach that has begun to throw a new and distinctive light on a variety of long-standing topics and issues in the social sciences and an approach that provides the analytic resources to address the organisation of social action across a broad and complex range of everyday and institutional environments. In recent years for example, we have seen the emergence of studies of scientific practice, surveillance, medical consultations, children’s play, museum visits, the household, computer-mediated communication, conversational interaction, political discourse, surgical operations and architectural practice (see for example Engeström and Middleton, 1996; Goodwin, 1981, 1995; Goodwin, 1990; Goodwin and Goodwin, 1994, 1996; Heath, 1986; Heath and Luff, 2000; Knoblauch et al., 2006; LeBaron and Koschmann, 2003; Luff et al., 2000; Mondada, 2003; Streeck and Kallmeyer, 2001; Suchman, 1987; Whalen, 1995; Whalen et al., 2002). In this chapter, we draw on materials from a study of auctions and auction houses, to provide some practical guidance to using video recordings to address the social and interactional organisation of naturally occurring events.

WORKPLACE ORGANISATION & SOCIAL INTERACTION

An increasing body of video-based, qualitative research is concerned with work; in particular the social and interactional accomplishment of complex forms of organisational activity. This burgeoning corpus of research is very different from more traditional studies of work and occupational practice. However, in various ways it can be seen to evolve from some of the key methodological and analytic concerns that underpinned the emergence of organisational ethnographies. It is perhaps worthwhile providing a little background and raising one or two points that might give a sense of the potential contribution of video and this particular approach.

Work and workplace organisation have formed a pervasive concern for sociology and more generally the social sciences from their significant beginnings in the late nineteenth century. It has long been recognised that social interaction in the workplace produces and reproduces organisational forms and the various rules, procedures and dispositions that inform the daily transactions that arise between people in organisations. Parsons’ (1951) analysis of the ‘situation of medical practice’ is exemplary in this regard, and though commonly known more for its exposition of the sick role rather than the organisational structure of the professional-client consultation, it powerfully demonstrates the ways in which patterned forms of social interaction, governed by expectations and dispositions, underpin medical work. The character of this interaction however, and the practices that enable its concerted and contingent accomplishment, remain largely unexplicated. Indeed, despite the widespread recognition that social interaction forms the foundation to work and occupational practice, there is a long-standing neglect in many forms of organisational analysis, of what Goffman (1983) refers to as the ‘interaction order’. In turn, by neglecting the interactional foundations of organisations, we not infrequently find a disregard for the ways in which work is accomplished by participants themselves (Barley, 1996; Barley and Kunda, 2001; Silverman, 1970, 1997a, 1997b).

There are important exceptions. Since their early beginnings many qualitative studies of work and organisation have placed social interaction at the heart of the analytic agenda. For example, in his insightful discussion of the methodological commitments that
informed what came to be known as the post-war Chicago school, Everett Hughes suggests that the principal aim of the studies is to ‘discover patterns of interaction’ and that ‘the subject matter of sociology is interaction’ (Hughes, 1971). These methodological commitments, and in particular, the recognition that work and occupational performance evolves in, and is sustained through, interaction, gave rise to a rich and insightful body of sociological and in particular ethnographic studies of work and organisation (see for example Becker, 1963; Goffman, 1963; Roth, 1963; Strauss et al., 1964). These studies have had a profound influence on successive generations of workplace ethnography including for example Barley, 1989; Hochschild, 1983; Star, 1996; Strong, 1978; Van Maanen, 1991, and directly and indirectly given rise to parallel developments in cognitive science, anthropology and emerging fields such as Computer Supported Cooperative Work. Despite these methodological commitments, the richness and insightfulness of these ethnographies, the interaction that arises in, and sustains, organisations, the interaction through which work is accomplished in collaboration with others, can remain under-explored and sometimes unexamined. Indeed, many of the concepts that inform this ethnographic tradition: concepts such as negotiation, bargaining, career, and the like, tend to draw attention away from the details of organisational conduct – the talk, visible and material action through which people, in collaboration with others, produce and coordinate their workplace activities. Moreover, the concepts and methodological precepts that pervade qualitative studies of work and related forms of ethnography, whilst powerfully resonating with field studies and naturalistic observation, do not necessarily lend themselves to the analysis of video and in particular to examining the wealth of detail made available through audio-visual recordings of everyday events.

Over the past few decades however the social and interactional foundations of workplace activities have received sustained sociological attention. Perhaps the most significant contributions in this regard are studies that draw upon ethnomethodology and conversation analysis and form ‘part of a programme of work undertaken … to explore the possibility of achieving a naturalistic observation discipline that could deal with the details of social action(s) rigorously, empirically, and formally’ (Schegloff and Sacks, 1973:233). Building on the analysis of conversation, we have witnessed the emergence of a broad range of studies of talk in institutional settings, primarily based on audio-recordings, that address the organisation of a range of workplace activities including legal interrogation, news interviews, political oratory, diagnosis in medical consultations, the delivery of bad news, counselling and therapy and classroom instruction and teaching (see for example Atkinson, 1984; Atkinson and Drew, 1980; Boden, 1994; Boden and Zimmerman, 1991; Clayman and Heritage, 2002; Drew and Heritage, 1992; Heritage and Maynard, 2006; Maynard, 2003; Peräkylä, 1995; Silverman, 1997a, 1997b; Whalen et al., 1988; Zimmerman, 1992). As Heritage (1984, 1997) points out, the sequential and turn organisation of talk has provided a critical resource for these studies as they explicate the ways in which highly specialised forms of activity embody a re-specification of the interactional practices that inform conversational organisation; a re-specification that enables ‘institutional realities and their unique characteristics to be talked into being’.

Notwithstanding the significant contribution of these studies to our understanding of work and organisation, it is recognised that the interactional accomplishment of social actions and activities involves the interplay of talk and visible conduct such as gesture and bodily comportment. It is recognised that objects and artefacts, tools and technologies, play a critical part in many activities and that the use of material resources is a pervasive and integral feature of almost all human activities, not least those that arise in the workplace. In the last decade or so, audio-visual recordings of
naturally occurring events have provided researchers with unprecedented access not just to talk, but to the bodily and material conduct of participants and enabled the detailed, repeated examination of social actions and activities and their situated accomplishment. Ethnomethodology and conversation analysis provide the resources that enable the analysis of video and in particular the detailed examination of the ways in which talk, gesture, the use of tools and artefacts and the like, inform the practical interactional accomplishment of work and organisation.

EXAMINING A FRAGMENT

This approach to the analysis of video recordings of naturally occurring events is driven by three principal methodological commitments that direct analytic attention towards the local, practical accomplishment of social actions and activities. In the first instance, it is concerned with the ‘situated’ character of practical action and in particular the ways in which the accomplishment of social actions and activities is inseparable from, and inextricably part of, the context in which they arise. In other words the sense and significance of social actions or activities is accomplished within the circumstances and context of their production. Second, the concern with the situated character of practical action directs attention to the ways in which social actions and activities are ongoingly and contingently accomplished by participants themselves; how actions and activities are produced moment by moment with regard to emerging circumstances at hand and in particular, the real time contributions of others. The emergent, interactional accomplishment of social action and activity is perhaps most manifest in talk in conversation, in which each next utterance, or a turn at talk, is produced with regard to the immediately preceding action(s) and in turn, implicates, and provides the framework for subsequent action; as Heritage (1984) suggests action in interaction is both ‘context-sensitive and context-renewing’. Third, analysis is directed towards explicating the social organisation, the methods in and through which participants themselves accomplish their actions and activities in concert and collaboration with others, that is, the socially organised practices and reasoning on which people rely to produce their own actions and make sense of the contributions of others – the practices and reasoning that inform the concerted, collaborative accomplishment of practical action.

With the focus on the situated accomplishment of practical action, analysis proceeds therefore on a ‘case-by-case’ basis. It involves the detailed examination of particular events and the ways in which they are accomplished by the participants themselves, within the practical circumstances in which they arise. It addresses the talk, the visible, and the material conduct of participants, their use of objects and artefacts, tools and technologies, and considers the ways in which particular actions and activities are accomplished, in and through interaction.

It is helpful to consider an example. The following fragment is drawn from a corpus of video recordings of auctions of fine art and antiques. The fragment involves the sale of a small nineteenth-century silver porringer. It is one of six hundred or so lots for sale over a couple of days at a leading provincial auction house. The sale of the lot lasts no longer than thirty seconds. It involves a rapid and complex interaction through which the price is systematically escalated and the goods sold on the fall of the hammer to the highest bidder. This type of interaction is repeated numerous times during the auction and can provide some useful insights – not only into the organisation of sales, but into the work and practices of a particular occupational group, namely auctioneers. Here, therefore, is a fragment of organisational activity, involving a particular occupation, where work is accomplished through social interaction; interaction that involves the interplay of talk and visible conduct and a form of interaction that determines the price
and exchange of goods worth some billions of pounds each year.

To simplify matters we use ‘{B1 bids}’ to represent the bidding, the number giving an indication of the order in which different participants enter the bidding. Where the auctioneer (A) bids on behalf of a buyer who cannot attend the sale – what is known as a ‘commission bid’ – we have used ‘{A bids}’. Commission bids are where the buyer leaves a price with the auction house and the auctioneer bids on their behalf until they reach the maximum price of the commission.

FRAGMENT 1: TRANSCRIPT 1

A: Lot number: (0.2) Four Three
Three (.) Four Three Three the lot
number: now. Bidding here at one
hundred pounds now.
(.) {A bids}
A: A hundred pounds I’m bid straight
away for this, at a hundred pounds:,
(.) One hundred pounds (will do it)
One hundred one ten (.) now:? (0.3)
A hundred pounds only. One hundred
pounds, one hundred pounds. One ten
now quickly?
(0.3) {B1 bids, B2 raises hand}
A: One ten is that. One ten I’m bid.
One ten. One twenty on commission now.
One thirty now:? One twenty still
with me, at one twenty.
{B2 bids}
A: One thirty bid there: fresh bid,
one thirty, one thirty. Forty now:?
(0.2)
A: At a hundred an thirty pounds (.)
bids there at one thirty. Do show
if you happen to have an extra bid.
At one thirty over there.
{knock}
A: One thirty that’s yours sir.
The buyer number is?

Talk is transcribed using an orthography developed by Gail Jefferson and commonly used in ethnomethodology, conversation analysis and cognate approaches such as discourse analysis. The transcription system is designed to capture aspects of the articulation of the talk and in particular the interactional position and production of the participants’ utterances. Very briefly: talk is laid out turn by turn, the length of pauses or silences is captured in tenths of a second, for example, ‘(0.3)’. Pauses of less than two tenths of a second are represented by ‘(.)’; words or parts of words that are emphasised by the speaker are underlined, ‘is that’. Sounds that are elongated are captured by colons, the number of colons representing the length of the elongation, ‘number:’; and intonation is captured by punctuation marks, for example, for rising intonation: ‘One thirty now:?’. More detailed versions of the orthography can be found in various books and collections including for example Boden and Zimmerman (1991), Drew and Heritage (1992) and Maynard (2003).

Before considering the visible or nonverbal aspects of the participants’ conduct, we can begin to generate some initial observations concerning the talk that arises in the fragment. In the first place, we can see that the talk is primarily produced by one party, namely the auctioneer. He briefly introduces the lot and then repeatedly announces a series of figures. These figures escalate in terms of increments of ten pounds – beginning at one hundred pounds, with the goods finally being sold at one hundred and thirty pounds. Bidding appears to alternate between the auctioneer, bidding on behalf of a commission buyer (‘bidding here at one hundred’ and ‘one twenty on commission’), and buyers in the room (B1 bids ‘one ten’, B2 bids ‘one thirty bid there:’). In the first instance, the auctioneer appears to take a bid from B1 rather than B2 who also attempts to bid by raising his hand. The auctioneer not only takes bids from particular participants, but displays those bids to all who are present, for example announcing that the bid is ‘here’ at one hundred pounds, ‘there’ at one hundred and thirty, and ‘still with me’ at one twenty. It also appears that the auctioneer goes to some trouble to elicit bids from people in the audience before finally selling the goods, attempting to maximise the opportunities for anyone present to bid.

Whilst the auctioneer does most, if not all, of the speaking during the sale of the lot, the transcript begins to reveal the ways in which
sequences of action are critical to the structure of the activity. For example, the auctioneer’s repetition of a particular increment, such as one hundred pounds, involves an attempt to elicit a bid from a member of the audience. Once the bid is received, in this case by a participant raising his hand, it is acknowledged by the auctioneer with ‘one ten is that’. In turn, the auctioneer produces the next bid, on behalf of his commission buyer, ‘one twenty on commission now’ and invites a subsequent bid from the floor, ‘one thirty now:?’. The participant’s bid, indeed the attempt by both B1 and B2 to bid, is sensitive to the auctioneer’s invitation, ‘one ten now quickly?’, and in turn, the auctioneer accepts a bid from B1 and is able to announce the next bid, namely ‘one twenty’. In turn, the announcement of the commission bid at ‘one twenty still with me’ is followed by the auctioneer looking for a next bid at one hundred and thirty pounds. We can see therefore how particular actions of the auctioneer serve to elicit bids from members of the audience, just as those bids enable the auctioneer to announce the price and produce a subsequent bid. Each action is sensitive to the prior, indeed, may be elicited by the prior action, and in each case forms the basis to subsequent action and activity. These actions are organised with regard to distinct forms of sequential and interactional organisation that underpin the escalation of price. Where no further bids are forthcoming, the auctioneer is able to bring the sale to a successful completion with the fall of the hammer.

Transcribing talk provides the opportunity to become more familiar with the actions that arise within a particular activity and to begin to scrutinise not only what is said and how, but the location of particular utterances or actions and how they are produced with regard to the contributions of others. It enables the researcher to address why specific actions arise at particular moments within the emerging course of the activity. As Schegloff and Sacks suggest:

a pervasively relevant issue (for participants) about utterances in conversation is ‘why that now,’ a question whose […] analysis may also be relevant to find what ‘that’ is. That is to say, some utterances may derive their character as actions entirely from placement considerations. (1974)

For instance, whilst the auctioneer repeatedly reiterates the first bid, one hundred pounds, it is only when he announces the next increment with a rising intonation that participants attempt to bid, in this case two at the same time. Transcription also begins to reveal the complexity of the action that arises even within a very brief fragment such as this, and provides the resources to begin to draw some preliminary observations concerning the structure and arrangement of the actions. In this case, the transcript also points to some more general features of interaction, be it within the workplace or any other environment for that matter – how the event contingently emerges, moment by moment, and the ways in which each contribution is sensitive to the actions of others, or the withholding of particular actions, and oriented to a determinate range of possibilities.

THE VISIBLE AND THE MATERIAL

It is clear that a range of actions that arise within the sale of the lot are not available through inspection of the talk alone and that the talk is accompanied by, and sensitive to, various visible and material actions. For example, at least two people bid using nonverbal or visible actions and these bids are critical to the escalation of the price and the final sale of the goods. How these actions arise with regard to the visible and accompanying talk of the auctioneer is not available using this limited transcript. Moreover, these gestured turns or bids are attributed by the auctioneer to particular individuals in the room, or even an absentee buyer, and yet the ascription of actions to the participants, for example ‘one ten is that’, ‘bids there at one thirty’, ‘bidding here at one hundred pounds now’, remains ambiguous without reference to the visible aspects of the activity. These gestured turns and their revelation are critical to the escalation of price and the sale of the goods
and feature in the sequence of action through which bids are elicited and acknowledged. Various artefacts also play an important role in the event. The fall of the gavel for example finalises the sale of the goods and their transfer of ownership. The auctioneer’s book not only provides information concerning commission bids, reserves and the like, but is referenced and referred to by the auctioneer during the course of the sale. Without taking the visible aspects of the participants seriously, their gestures, bodily orientation, use of artefacts and the like, it is difficult to address the organisation of the activity and the practices upon which the auctioneer relies to conduct the sale.

To examine how the visible, as well as talk, features in the accomplishment of the activity, we need to develop our transcript to enable us to begin to encompass various aspects of the participants’ visible conduct. Unfortunately, but not surprisingly, there is no general or widely accepted transcription system for the visible and material aspects of social interaction. Over some years however, those undertaking video-based studies informed by ethnomethodology and conversation analysis, have developed ways of working with video that enable them to transcribe aspects of the participants’ bodily conduct in particular with regard to the talk (see for example Goodwin, 1981; Heath, 1986). There is some individual variation in how this is done, but it ordinarily includes identifying the onset and completion of particular actions, such as a gesture and demarcating significant aspects of its articulation – such as for example, where it reaches its acme. These transcripts are primarily concerned with delineating the occurrence and position of particular aspects of the participants’ visible conduct. They may include details of head nods, gestures, visual orientation, changes in body position, the use of particular artefacts, and the like; indeed whatever arises within the developing course of a fragment. The transcript provides a resource to begin to discover the geography and organisation of action within a fragment and to document certain features of the participants’ conduct and interaction.

The following is a highly simplified version of a more complex transcript that is included later in the chapter, but it provides a sense of the ways in which we can begin to map out the participants’ conduct and identify some features of actions’ organisation.

Transcribing the visible, as well as the spoken aspects of the fragment, provides an important resource with which to begin to examine the participants’ conduct and to identify the potential relationship between particular actions. For example, in this fragment, we can notice that as he announces the current increment ‘one twenty on commission’ the auctioneer turns and gestures towards the first bidder, B1, inviting him to bid at the next increment, namely one hundred and thirty pounds. However, even as he voices the next increment ‘one thirty now:?’, he turns away from the first bidder and looks for an alternative participant who may be prepared to bid. The auctioneer’s actions reveal that the first bidder has declined the next increment and that ‘one thirty now:?’ serves as a generalised invitation for anyone in the room to bid. As he undertakes the search for a new bidder, he not only announces that the bid is ‘still with me’ but reveals the source of that bid, dramatically pointing first to the book that contains the commission bid and second to himself bidding on behalf of the absentee participant. A new bidder raises his hand and the bid is accepted ‘one thirty’. The bid is produced as the auctioneer announces ‘it’s still with me’ and in particular when the auctioneer’s search around the room arrives at the area where the bidder is sitting. In other words, both the auctioneer’s announcement ‘it’s still with me’ and his visual orientation, serve to encourage the participant to bid and to bid at a particular moment. As he announces the bid, ‘one thirty’, the auctioneer gestures towards the bidder, and displays both to the bidder and all those present, who has the bid of ‘one thirty’.

Such transcripts are far more detailed than the diagram shown above. They are primarily used by the researcher and enable a range of potentially relevant details of conduct
FRAGMENT 1: TRANSCRIPT 2

Auctioneer
Orientation: B1; looks around room; B2
..........____________________,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,....____________________,,
Gesture: open palm at B1; points at book; at self; at book; at B2
Talk: One twenty on commission. One thirty now: It’s still with me. At one twenty. One thirty
to be identified, clarified and documented. which such transcripts are primarily designed
They form the basis for generating notes as a vehicle for the individual researcher to
and ideas about particular fragments and examine and document observations concern-
the organisation of particular actions and ing a fragment. Transcription is an important,
activities. The transcript below is part of the if not the critical resource, for the analysis
original from which our observations of this of particular events with video recording
fragment are drawn and illustrates the ways in remaining the principal source of data.
FRAGMENT 1: TRANSCRIPT 3
Video, coupled with an appropriate scrutinising the video recording of naturally

methodological framework, enables the occurring events such as auctions, we can
researcher to begin to unravel the complex begin to discern the ways in which the
range of action that arises within a seemingly character of the participants’ conduct might
transient activity. In the case at hand, from a be pertinent to our understanding of particular
single fragment, one can begin to understand forms of work and the organisation of markets
a little more of auctions and the practices as well as associated issues such as trust and
through which the auctioneer and participants legitimacy (see for example Heath and Luff,
accomplish the valuation and sale of goods. 2007; Smith, 1989).
We find for example, that the auctioneer Examination of a single case, in this
juxtaposes bids from different members of instance a brief fragment from a sale by auc-
the audience with commission bids from tion, can provide a rich array of observations
the book, and when one of those bidders concerning the organisation of an activity. It
withdraws, initiates a search to discover enables the researcher to scrutinise action,
a new bidder. We also see how, through to consider the ways in which participants
talk and gesture he ascribes values or collaboratively accomplish an activity, and to
increments to particular individuals, even reflect upon the resources, the competencies,
absent individuals, and thereby enables on which they rely. It also enables a researcher
bidders to know where they stand with regard to respect and to recognise the significance
to the escalating price of the goods. It is of seemingly slight, even trivial actions,
also interesting to note, that in eliciting and and to discover how they feature in the
ascribing bids to particular individuals, the activity’s accomplishment. In this way, we
auctioneer enables all those present to see and can use video to address one of the basic
to witness, who has the bid at any point within methodological concerns that underlie much
the developing course of the proceedings. In qualitative social science, that is, to take
various ways therefore, despite the seeming the participant’s perspective seriously and to
slightness of the actions revealed when consider how their conduct serves to produce
particular actions within the practicalities of have to be managed. Moreover, with the
accomplishing everyday, socially organised, interest in exploring the social interactional
activities, in concert, with others. organisation of naturally occurring activities,
it is critical, as far as practically possible,
to encompass the actions of all participants.
DATA COLLECTION In some settings, where there are two or
three participants involved in what Goffman
This particular form of analysis has a significant (1971) refers to as a ‘focused gathering’ it may
bearing on the type of data that needs to well be possible to gather analytically fruitful
be gathered and the ways in which we record data using a single camera with a built-in
and document action and activities within microphone. Settings that involve numerous
particular environments. Every setting poses participants, and in some cases a diverse
its own unique demands on data collection and range of material resources, settings such
can raise particular difficulties for undertaking as classrooms, control rooms and operating
video recording. In almost every setting theatres, may necessitate the use of multiple
it is critical therefore that the researcher cameras and separate microphones placed in a
undertakes a period of field observation before number of locations. It is unlikely, even after
considering the introduction of cameras a period of fieldwork, that the first recordings
and microphones. Fieldwork provides an will provide the necessary quality or access to
opportunity for the researcher to become the action, and in many cases, the researcher
familiar with the setting – the socio-physical will find that it is necessary to gather recorded
environment, the sorts of activities that arise data over a series of occasions before finding
and patterns of interaction and the like. It the most useful and appropriate position and
also enables the researcher to see the ways perspective for recording. Indeed, it is not
in which various material resources feature unusual during the course of a project to gather
in particular activities, be they computers, data from rather different positions to enable
paper documents, or even, as in the previous particular phenomena to be investigated.
fragment, hammers, and to reflect upon the These phenomena, and the decision to collect
ways in which they constrain and of course particular forms of data, may change as the
provide opportunities for particular activities. analysis develops during the course of a study;
Last but not least, a period of fieldwork data collection is an iterative process in which
enables the researcher to engage, where materials may be progressively gathered in
relevant, with the participants themselves, the course of examining, transcribing and
and to establish a relationship that can form analysing data.
the basis for securing their willingness to While the audio-visual recordings are
be video recorded and to clarify the ethical likely to form the principal data on which
requirements that participants themselves see analysis is developed, fieldwork, and in some
as important. cases the fieldwork that accompanies the
There are a number of practical issues that actual recording, remains an important, if not
have to be addressed in undertaking video critical, part of the research. If we take the
recording of naturally occurring activities. workplace for example, there are a range
Each setting poses its own unique demands of practices, conventions and resources that
and it is unusual that one is able to gather bear upon, and inform the accomplishment
quality data on the first occasion that one of particular activities, and it may well be
records. The lighting, the physical arrange- necessary to augment video analysis with field
ment of the space, the position and movement observation and even interviews. In many
of the participants, the ambient noise, the cases it is necessary to gain access to the
location of particular objects and technologies relevant material resources, such as records,
and the necessity to remain, as far as possible, work sheets, diagrams, plans and the like,
unobtrusive, can all raise difficulties that and become familiar with the ways in which
they are used. In this regard, screen-based or field research. The technology, and the
technologies can pose particular difficulties, analytic opportunities it affords, however,
since it may be necessary to record the raises important methodological challenges
contents of the screen. There are a number of for the social sciences and demands distinctive
solutions. In some circumstances it is possible approaches to the study of social action and
to video record the screen with a camera (for interaction.
some screens this may require the frame rate Perhaps the most substantial corpus of
to be appropriately adjusted). The data that video-based, naturalistic studies to emerge
are gathered depend not only on the analytic within sociology over the past couple of
approach that has been adopted but also the decades or so have been informed by
sorts of phenomena that are addressed. Data, ethnomethodology and conversation analysis.
including audio-visual recordings, are always These studies have addressed the social and
constrained by practicalities and resources. interactional organisation of a broad range
It is critical however that the materials that of actions and activities and delineated ways
underpin the research can legitimately serve in which seemingly mundane events are
the insights and phenomena that are addressed accomplished in, and through, the complex,
in the analysis. In particular, we need to yet systematic, interplay of talk, visible and
demonstrate the ways in which participants material conduct. They have revealed the
themselves orient to and rely upon the order and organisation that underlies and
practices that inform the accomplishment of informs the production of everyday activities
the action and activities at hand. and begun to delineate the resources on which
participants rely to make sense of and coordi-
nate the actions in which they engage. In this
SUMMARY regard, the emergence of workplace studies –
studies of work, interaction and technology
Despite numerous calls for the social sciences in complex organisational environments – is
to take the visual seriously, video remains a of particular interest. These studies provide
surprisingly neglected resource, relegated to the resources to address and re-specify some
a marginal role in some qualitative research key concepts and ideas that inform more
and absent from most. When video is used, conventional analyses of occupational practice
it often forms an accompaniment to other and institutional environments. Indeed,
forms of data collection that prioritise in various ways, these video-based studies of
fieldwork, and is used to illustrate events and the workplace draw on, and transform, the
activities that have been primarily identified long-standing recognition that social inter-
and analysed using conventional ethnographic action underpins and preserves institutional
observations. Yet video provides a more arrangements, and enables a reorientation
significant opportunity for social science of studies of organisational practice. They
research, a resource that enables analysts to also provide a vehicle for taking the object
scrutinise social actions and activities in ways seriously, and reshaping the ways in which
that hitherto were not possible, to begin to we address and reveal how the material, the
discover phenomena and aspects of socially environment, technology, artefacts and the
organised practice unavailable to conven- like, feature in the practical accomplishment
tional fieldwork and ethnography. Moreover, of social action and activity. The significance
audio-visual recordings of naturally occurring of video therefore is not simply that it provides
activities and events provide the opportunity another way of gathering data, but rather, with
of building a more cumulative data corpus an appropriate methodological framework,
than is possible within many other forms enables the social sciences to build a rigorous
of qualitative research and to engage in and systematic analysis of the organised
forms of collaborative research and analysis production of social action as it occurs in its
that are unavailable within much ethnography everyday, natural environments.
REFERENCES Goodwin, C., & Goodwin, M. H. (1994). Professional

vision. American Anthropologist, 96 (3), 606–633.
Atkinson, J. M. (1984). Our Master’s Voices: The Goodwin, C., & Goodwin, M. H. (1996). Seeing
Language and Body Language of Politics. London: as a situated activity: Formulating planes, in
Methuen. Y. Engeström, & D. Middleton (eds) Cognition and
Atkinson, J. M., & Drew, P. (1980). Order in Court: The Communication at Work (pp. 61–95). Cambridge:
Organisation of Verbal Interaction in Judicial Settings. Cambridge University Press.
London: Macmillan. Goodwin, M. H. (1990). He-Said-She-Said: Talk as Social
Banks, M., & Murphy, H. (1997). Rethinking Visual Organisation Among Black Children. Bloomington,
Anthropology. New Haven: Yale University Press. IN: Indiana University Press.
Barley, S. (1996). Technicians in the workplace: Heath, C. C. (1986). Body Movement and Speech
ethnographic evidence for bringing work into organi- in Medical Interaction. Cambridge: Cambridge
sational studies. Administrative Science Quarterly, 41, University Press.
404–444. Heath, C. C., & Luff, P. K. (2000). Technology in Action.
Barley, S., & Kunda, G. (2001). Bringing work back in. Cambridge: Cambridge University Press.
Organization Science, 12 (1), 76–95. Heath, C. C., & Luff, P. (2007) Ordering competition: the
Barley, S. R. (1989). Careers, identities and institutions: interactional accomplishment of the sale of art and
The legacy of the Chicago School of Sociology, antiques at auction. British Journal of Sociology, 58
in Arthur, M., Hall, T., & Lawrence, B. (eds) The (1), 63–85.
Handbook of Career Theory (pp. 41–65). Cambridge: Heritage, J. (1997). Conversation analysis and institu-
Cambridge University Press. tional talk: Analysing data, in S. D. Silverman (ed)
Becker, H. (1963). The Outsiders: Studies in the Qualitative Research: Theory, Method and Practice
Sociology of Deviance. New York: The Free Press. (pp. 161–182). London: Sage.
Boden, D. (1994). The Business of Talk: Organizations Heritage, J., & Maynard, D. W. (eds) (2006).
in Action. Oxford and Cambridge, MA: Polity Press. Communication in Medical Care: Interaction between
Boden, D., & Zimmerman, D. H. (eds). (1991). Talk Primary Care Physicians and Patient s. New York and
and Social Structure: Studies in Ethnomethodol- Cambridge: Cambridge University Press.
ogy and Conversation Analysis. Cambridge: Polity Heritage, J. C. (1984). Garfinkel and Ethnomethodology.
Press. Cambridge: Polity Press.
Clayman, S., & Heritage, J. C. (2002). The News Hochschild, A. R. (1983). The Managed Heart: The
Interview: Journalists and Public Figures on the Air. Commercialisation of Feeling. Berkeley and Los
Cambridge: Cambridge University Press. Angeles, California: University of California Press.
Curry, T. J., & Clarke, A. C. (1978). Introducing Visual Hughes, E. C. (1971). The Sociological Eye: Selected
Sociology. Dubuque: Kendall Hunt. Papers on Institution and Race (Part I) and Self
Drew, P., & Heritage, J. C. (eds). (1992). Talk at Work: and the Study of Society (Part II). Chicago: Aldine
Interaction in Institutional Settings. Cambridge: Atherton.
Cambridge University Press. Kendon, A. (1982). The organisation of behaviour in face
Emmison, M., & Smith, P. (2000). Researching the to face interaction: Observation on a development
Visual. London: Sage. of a methodology, in K. Scherer, & P. Ekman
Engeström, Y., & Middleton, D. (eds) (1996). Cognition (eds) Handbook of Methods of Nonverbal Behaviour
and Communication at Work. Cambridge: Cambridge Research. Cambridge: Cambridge University Press.
University Press. Knoblauch, H., Schnettler, B., Raab, J., & Soeffner,
Goffman, E. (1963). Asylums: Essays on the Social G. (eds) (2006) Video Analysis: Methodology and
Situation of Mental Patients and other Inmates. Methods. Berlin: Peter Lang.
New York: Doubleday. LeBaron, C., & Koschmann, T. (2003). Gesture and
Goffman, E. (1971). Relations in Public. Harmondsworth: the transparency of understanding, in P. Glenn,
Penguin. C. LeBaron, & J. Mandelbaum (eds) Studies in
Goffman, E. (1983). The interaction order. American Language and Social Interaction in Honor of Robert
Sociological Review, 48 (February), 1–17. Hopper (pp. 119–132). Mahwah, NJ: Lawrence
Goodwin, C. (1981). Conversational Organisation: Erlbaum Associates.
Interaction between Speakers and Hearers. London: Luff, P., Hindmarsh, J., & Heath, C. C. (eds) (2000).
Academic Press. Workplace Studies: Recovering Work Practice and
Goodwin, C. (1995). Seeing in depth. Social Studies of Informing System’s Design. Cambridge: Cambridge
Science, 25 (2), 237–274. University Press.
Marks, D. (1995) Ethnographic film: from Flaherty to Silverman, D. (1997b). Discourses of Counseling: HIV
Asch and after. American Anthropologist, 97 (2), Counseling as Social Interaction. London: Sage.
337–347. Simmel, G. (1950). The Sociology of Georg Simmel,
Maynard, D. W. (2003). Bad News, Good News: Wolff, K. (ed). Glencoe, Illinois: Free Press.
Conversational Order in Everyday Talk and Clinical Smith, C. W. (1989). Auctions: The Social Construction
Settings. Chicago: University of Chicago Press. of Value. London: Harvester Wheatsheaf.
Mondada, L. (2003). Working with video: how surgeons Star, S. L. (1996). Working together: Symbolic interac-
produce video records of their actions. Visual Studies, tionism, activity theory and information systems, in
18, 58–73. Engeström, Y. & Middleton, D. (eds) Cognition and
Parsons, T. (1951). The Social System. Glencoe: Free Communication at Work (pp. 296–318). Cambridge:
Press. Cambridge University Press.
Peräkylä, A. (1995). Aids Counselling: Institutional Inter- Strauss, A., Schatzman, L., Bucher, R., Ehrlich, D.,
action and Clinical Practice. Cambridge: Cambridge & Sabshin, M. (1964). Psychiatric Ideologies and
University Press. Institutions. London: Free Press.
Pink, S. (2001a). Doing Ethnography: Images, Media and Streeck, J., & Kallmeyer, W. (2001). Interaction by
Representation in Research. London: Sage. inscription. Journal of Pragmatics, 33, 465–490.
Pink, S. (2001b) More visualising, more methodologies Strong, P. (1978). The Ceremonial Order of the Clinic:
on video, reflexivity and qualitative research. Patients, Doctors and Medical Bureaucracies. London:
Sociological Review, 49 (1), 586–599. Routledge Kegan Paul.
Prodger, P. (2003). Time Stands Still: Muybridge and Suchman, L. (1987). Plans and Situated Actions: The
the Instantaneous Photography Movement. Oxford: Problem of Human Machine Interaction. Cambridge:
Oxford University Press. Cambridge University Press.
Rose, G. (2001). Visual Methodologies: An Introduction Van Maanen, J. (1991) The smile factory: Work at
to the Interpretation of Visual Materials. London: Disneyland, in Frost, P. J., Moore, L. F., Louis, M. L.,
Sage. Lundberg, C. C., & Martin, J. (eds) Reframing
Roth, J. A. (1963). Timetables: Structuring and the Organisational Culture (pp. 58–76). London: Sage.
Passage of Time in Hospital Treatment and other Whalen, J. (1995). Expert systems vs. systems for
Careers. Indianapolis: Bobbs Merrill. experts: Computer-aided dispatch as a support
Ruby, J. (2000). Picturing Culture: Explorations of Film system in real-world environments, in P. Thomas (ed)
and Anthropology. Chicago: University of Chicago The Social and Interactional Dimensions of Human-
Press. Computer Interfaces (pp. 161–183). Cambridge:
Schegloff, E. A., & Sacks, H. (1973). Opening up closings. Cambridge University Press.
Semiotica, 7, 289–327. Whalen, J., Whalen, M., & Henderson, K. (2002).
Schegloff, E. A., & Sacks, H. (1974). Opening up closings, Improvisational choreography in a teleservice work.
in R. Turner (ed.) Ethnomethodology (pp. 233–264). British Journal of Sociology, 53 (2), 239–259.
Harmondsworth, U.K. and Baltimore, MD: Whalen, J., Zimmerman, D. & Whalen, M. (1988). When
Penguin. words fail: a single case analysis. Communication
Silverman, D. (1970). The Theory of Organisation. Yearbook, 11, 406–432.
London: Heinemann. Zimmerman, D. H. (1992). The interactional organization
Silverman, D. (1997a). Studying organisational interac- of calls for emergency assistance, in P. Drew &
tion: ethnomethodology’s contribution to the ‘new J. Heritage (eds) Talk at Work: Interaction in
institutionalism’. Administrative Theory and Praxis, Institutional Settings (pp. 418–469). Cambridge:
19 (2), 1. Cambridge University Press.
30
Secondary Analysis of
Qualitative Data
Janet Heaton
INTRODUCTION dedicated for such studies. However, since the

mid-1990s, there has been growing interest
Secondary analysis of qualitative data is an in the methodology, particularly in the UK,
emerging methodology in social research that Europe and North America. This is indicated
involves the re-use of data originally collected by the growing number of studies involving
in primary studies. Such data include field secondary analysis of a wide range of qual-
notes, transcripts of interviews and group itative data, as well as commentaries on the
discussions and observational records. The possibilities and problems of re-using these
analysis of other ‘found’ or more ‘naturalistic’ data (for example, see Corti and Thompson,
types of qualitative data, such as personal 2004; Fielding, 2004; Hammersley, 1997;
diaries, autobiographies, letters, documents Heaton, 1998, 2004; Hinds et al., 1997;
and photographs, is better known as ‘doc- Mauthner et al., 1998; Parry and Mauthner,
umentary analysis’ (Plummer, 1983, 2001; 2004, 2005; Thorne, 1994, 1998).
Scott, 1990). That said, some types of qualita- In the first part of this chapter, I examine the
tive data, notably life stories, may be more or current state of the methodology, describing
less naturalistic, depending on how they were sources of qualitative data available for
produced, and hence the distinction between secondary analysis, ways in which these
‘secondary’ and ‘documentary’ analysis is not could be and have been re-used, and key
always clear-cut. issues emerging from debates on the method-
Unlike secondary analysis of quantitative ology. In the second part, I discuss three
data, the re-use of qualitative data is not questions which have implications for future
established practice in social research. There policy and practice concerning the collection,
are few qualitative multi-purpose or longitu- archiving, and re-use of qualitative data in
dinal datasets for researchers to access, no social research. The chapter draws on and
published manuals on ‘how to do’ qualitative updates previous work exploring epistemo-
secondary analysis, and limited funding logical, methodological and ethical issues in
qualitative secondary analysis (Heaton, 1998, Republic; the Norwegian Social Science
2000, 2004). It focuses on developments in Data Services (NSD); the Swedish Social
the UK, where there has been considerable Data Services (SSD); and the Institut
work to promote the archiving and re-use of für Geschichte und Biographie in Germany.
qualitative data, and describes examples of In the USA, the Murray Research Center
secondary analysis carried out internationally (A Center for the Study of Lives at Harvard
in social research (but not social research of a University) holds over 270 datasets from
more historical nature). Most of the examples research on human development and social
are from health-related research, where the change, including longitudinal datasets con-
vast majority of studies involving the re-use of taining qualitative data (James and Sørensen,
qualitative research data have been published 2000).
to date. Particular advances in qualitative data
archiving have been made in the UK, where
formal sharing of all types of qualitative data
across the social sciences has been heavily
STATE OF THE ART
promoted since the mid-1990s by a major
funder of social research, the Economic and
Accessing qualitative data
Social Research Council (ESRC). In 1994, the
There are three ways in which social ESRC established the world’s first and only
researchers can access qualitative research Qualitative Data Archiving Resource Centre
data for secondary analysis: through data (Qualidata), based at the University of Essex
archives, by informal data sharing and by in England and directed by Paul Thompson.
re-using data from their own previous research The role of this service has evolved over
(Heaton, 2004). These approaches, and some time (Corti, 2000, 2003; Corti and Backhouse,
illustrative examples of studies using different 2005; Corti and Thompson, 2004). Originally,
sources of data, are described below. Qualidata was set up to promote and facilitate
the archiving of qualitative datasets in existing
Data archives repositories across the UK. In 2003, Qualidata
Many countries have national and other data became part of the new Economic and Social
archives which preserve datasets from the Data Service (ESDS), an initiative jointly
social sciences and make them available for funded by the ESRC and Joint Information
further use by other researchers. Archived data Systems Committee (JISC). Renamed ESDS
tends to be quantitative rather than qualitative Qualidata, the service is now based within
in nature, although some longitudinal studies the UKDA. Following a consultation carried
include a qualitative component. Where out for the ESRC on the use of qualitative
archives do hold qualitative data, these tend research resources (Henwood and Lang,
to be collections of life stories retained for 2003), ESDS Qualidata has sought to improve
use in historical research, rather than other the accessibility of archived material by
types of qualitative data often collected in making selected datasets available via the
social research. Information on worldwide web, and by creating web-based samplers of a
archives is available through the Council larger number of datasets so that researchers
of European Social Sciences Data Archives can more easily assess the potential for
(CESSDA) website [1]. In Europe, there are using them in teaching and/or for secondary
a number of archives where qualitative research purposes.
datasets are already deposited, or which are The ESRC has further promoted qualitative
planning to accept this type of data [2]. They data archiving and re-use through a number
include: the UK Data Archive (UKDA); of related policy and funding initiatives.
the Finnish Social Science Data Archive Since 1995, the ESRC has had a Datasets
(FSD); the Danish Data Archives (DDA); Policy making it a condition of its awards
the Sociological Data Archive in The Czech that researchers make available for archiving
qualitative datasets arising from their work; studies published in the international health
in applying for funding researchers also have and social care literature, which has been
to demonstrate that the proposed primary updated over time (Heaton, 1998, 2000,
research cannot be carried out using existing 2004). While this work is limited in that it
archived datasets [3]. In addition, following the focuses on one area of social research, it
aforementioned consultation on qualitative provides an indication of how researchers
research resources, the ESRC funded a have re-used qualitative data in practice,
feasibility study on the possibility of a and I have not found evidence to suggest
qualitative longitudinal study (Holland et al., that numerous secondary studies have been
2004). This, in turn, has been followed published in other areas of social research to
up with funding for a programme of work date [8]. The review found that only nine
intended to develop resources for qualitative (14%) of the 65 secondary studies identified
secondary analysis. This includes funding involved the re-use of datasets collected
for a series of demonstration studies to by other researchers, and were carried out
investigate the value of innovative models independently of the primary researchers
of archiving, sharing and re-using qualitative (Heaton, 2004). Of these, two studies utilised
data, commissioned in 2005 as part of publicly archived datasets. One was a study
the ESRC’s Qualitative Archiving and Data by Bloor (2000) of communal understanding
Sharing Scheme (QUADS) [4]. It also includes of, and responses to, the disease popularly
funding for a major qualitative longitudinal known as ‘Miners’ Lung’, using oral history
study, called Changing Lives and Times, material from the South Wales Miners’ Library
commencing in 2006 [5]. at the University of Wales Swansea. The
As a result of the above strategies, there other was a study by Bevan (2000) of
has been an increase in the availability of the career choices of general practitioners,
archived qualitative datasets in the UK, as well using life histories deposited with the British
as an improvement in the cataloguing of these Library National Sound Archive. Another two
resources. By 2002, Qualidata had facilitated publications were based on data that Julius
archiving of 140 qualitative datasets and Roth had left with Paul Atkinson and which
added details of a further 150 existing were used for teaching and in research. These
collections to its catalogue (Corti, 2003; see data were re-used in a study of the cultural
also Corti and Backhouse, 2005) [6]. However, aspects of tuberculosis (Weaver, 1994), and
there have been difficulties collating figures also to illustrate a book on micro-computing
on usage of these resources (Corti, 2000), and qualitative data analysis (Weaver and
and little is known about the extent to which Atkinson, 1994).
existing datasets have been accessed [7]. Of Notable secondary studies which have been
course, many archived datasets have only just carried out using archived datasets in other
become available and work is ongoing to areas of social research include Fielding
improve the accessibility of some of these, and Fielding’s (2000) secondary analysis of
hence it will take time for researchers to com- Cohen and Taylor’s (1972) research on the
plete work based on these resources and for long-term imprisonment of men in a maxi-
resulting secondary studies to be published. mum security prison (archived at the Institute
Nonetheless, as Parry and Mauthner (2005) of Criminology, Cambridge). And data from
have argued, the ongoing case for qualitative the ‘Affluent Worker’ study (available via
data archiving (and different models for this) Qualidata, at the University of Essex) have
needs to be supported by information on the been re-used in a secondary study by Savage
extent to which these datasets are accessed and (2005a; see also Savage 2005b). Thompson
re-used, by whom and for what purposes. (1998) has also reported that oral histories
In a bid to examine whether and how collected for ‘The Edwardians’ study (held at
researchers have re-used qualitative research the University of Essex) have been re-used in
data, in 1997 I began a review of secondary numerous publications and for teaching.
Informal data sharing
An alternative approach to accessing data for secondary analysis is through informal data sharing. Here, researchers share their data directly with other researchers. One or more of the primary researchers who collected the data can be involved in the secondary analysis (and others may act as advisers). Single or multiple datasets can be shared, and re-used in full or in part, depending on the aims and scope of the secondary research.
While informal data sharing has not been officially promoted in the UK or elsewhere, this source of data has been used in secondary studies carried out in health-related research. In the aforementioned review, 20 (32%) of the secondary studies were by researchers who had informally shared their data with others (Heaton, 2004). These studies were by researchers based in North America. Examples include a secondary study by Yamashita and Forsyth (1998), which came about after the two researchers met at a conference and found that they had both carried out research on families’ reactions to a relative’s mental illness in Canada and the USA. Angst and Deatrick (1996) also drew on data from studies that they had independently carried out, to compare and contrast the involvement of children with different conditions in healthcare decision-making.

Self-collected data
Researchers also have the option of re-using datasets that they personally have collected and retained over the course of their career. This may be data which were not originally analysed, or data which are rich enough to support further analysis – either as a secondary study in its own right, or in conjunction with additional primary research designed to collect more data required to address the new study aims.
In the aforementioned review, over half the studies identified (36, 55%) were by researchers who had re-used their own data (Heaton, 2004). The majority were by authors based in the USA and Canada, while the remainder were from the UK and Sweden.
A more recent example illustrating this approach to qualitative secondary analysis, this time from the field of education, is provided by Nelson et al. (2004). They were part of a larger primary research team that carried out a study of family and professional partnerships in special education in the USA. Feedback on the original research findings highlighted issues regarding boundaries in families’ relationships with professionals. A secondary analysis was carried out to examine this topic in more depth, in which some of the codes developed for the primary analysis (using Ethnograph) were re-used for this purpose.

Secondary uses of qualitative data
Various claims have been made about the ways in which qualitative data could be re-used in social research (for example, see Corti and Thompson, 2004; Hinds et al., 1997; Thorne, 1994). In one of the first articles dedicated to the topic of qualitative secondary analysis, Sally Thorne (1994) outlined five possibilities. In ‘analytic expansion’, she suggested researchers could make use of their own data to answer new or extended questions; in ‘retrospective interpretation’, new questions which were raised by, but not addressed in, the primary research could be examined; in ‘armchair induction’, inductive methods of textual analysis could be applied to data collected by others for purposes of theory development; in ‘amplified analysis’, several distinct and theoretically representative datasets could be compared; and in ‘cross-validation’, data collected by others could be re-analysed and alternative findings and links with other research explored.
Ten years later, Corti and Thompson (2004) were able to provide some examples of secondary studies carried out in the meantime to illustrate their view that archived qualitative data could be used for purposes such as: descriptive work; comparative research; re-study or follow-up study; reanalysis or secondary analysis; research design and methodological advancement;
verification; and for teaching and learning (no example of verification was provided, which the authors acknowledge researchers have not yet pursued, despite the availability of resources).
In my review of the health and social care literature, which looked in detail at how and why researchers had re-used qualitative datasets in published studies, I found that there were five main types of qualitative secondary analysis (Heaton, 2004). These are summarised below, together with a few examples of relevant studies drawn from the review and from a more recent search of the social research literature carried out to update the findings for this chapter.

Supra analysis
In this type of secondary analysis, the focus of the secondary study transcends that of the primary work. New theoretical, empirical or methodological questions are explored that are distinct from the aims of the original research. For example, three of the secondary studies reviewed focused on the use of metaphors in participants’ accounts of medical encounters (Jairath, 1999; Jenny and Logan, 1996; Pascalev, 1996). Another three studies used secondary analysis in methodological work concerning micro-computing and qualitative data analysis (Weaver and Atkinson, 1994), different methods of textual analysis (Atkinson, 1992), and the value of different approaches to biographical analysis (Jones and Rupp, 2000).

Supplementary analysis
Supplementary analysis was the most common type of secondary analysis identified in the review. This approach involves the in-depth investigation of an issue, or aspect of the data, that was not addressed, or was only partly covered, in the original research. The focus may be on a particular issue or theme that emerged from the primary work, or on a sub-set of the data. Unlike supra analysis, the subject of this type of secondary analysis is more closely related to that of the primary work. As a result, in some cases it may be difficult to distinguish where primary research stops and secondary analysis starts, particularly when the supplementary analysis is carried out by the same researchers who carried out the primary research.
An example of supplementary analysis is provided by Brownlie and Howson’s (2005) secondary analysis of two datasets on professional and parental views of the measles, mumps and rubella (MMR) vaccination. These data were collected in studies carried out by an independent research agency for the Health Education Board for Scotland (HEBS, now NHS Health Scotland) in 1999 and 2001. These organisations agreed to provide the secondary researchers with access to the datasets after they had been anonymised by the research agency. The secondary analysis focused on ‘emergent themes of trust and parental anxiety about risk’ (Brownlie and Howson, 2005: 223).

Re-analysis
Whereas the above types of secondary analysis involve the investigation of new questions or emergent issues, the purpose of re-analysis is to verify and corroborate the findings of previous work. Only one example approximating this type of secondary analysis was identified in the review. This was a study by Popkess-Vawter et al. (1998), where alternative methods of analysis were used, in a form of methodological triangulation, to re-examine data originally collected by the first author on women’s experiences of losing and gaining weight after dieting (‘weight-cycling’). Whereas the primary analysis was based on ‘reversal theory’, the secondary analysis was a content analysis performed by two independent coders ‘with no consideration for reversal theory’ (Popkess-Vawter et al., 1998: 71). The authors claim that secondary analysis was carried out to provide ‘a validity check for the primary coding and an accuracy check for complete interpretation’ (Popkess-Vawter et al., 1998: 71), although in reporting their results they do not comment on how the coding and findings from the secondary analysis related to those previously applied and obtained.
Amplified analysis
Secondary studies vary not only in terms of the extent to which their aims diverge from, or converge with, the primary studies from which they are derived, but also according to the number and type of primary studies involved. In amplified analysis, two or more qualitative datasets are utilised. These data may be aggregated to form a larger dataset, or used to compare different populations. An example illustrating this approach (and supra analysis) is Bloor and McIntosh’s (1990) study, in which they re-used two datasets to examine forms of surveillance in professional-client relationships, and associated strategies of resistance, from a Foucauldian perspective. In another, more recent example, data from a series of studies carried out between 1995 and 2001 were re-used to examine how family doctors conceptualised chronic illness and its management in their consultations with patients (May et al., 2004).

Assorted analysis
In assorted analysis, secondary analysis of qualitative data is combined with additional primary research and/or documentary analysis of relevant materials. For example, Thorne (1990a) re-used data from multiple datasets, and carried out additional interviews, in a study of non-compliance with advice in chronic illness. In other studies, re-use of qualitative research data was combined with analysis of more naturalistic data in the form of autobiographies (Cohen, 1995; Thorne, 1988).

Key issues
The development of qualitative secondary analysis has been accompanied by a growing debate over the epistemological, practical, ethical and legal problems connected with the re-use of qualitative data. Some of the key issues in this debate are highlighted below.

Epistemological and practical concerns
A major topic of debate has been whether or not secondary analysis of qualitative data is compatible with some of the basic tenets of qualitative inquiry. For example, one concern is whether research questions can be addressed using data which were originally collected for other purposes (Heaton, 2004; Szabo and Strang, 1997; Thorne, 1994, 1998). This problem of data ‘fit’ is seen as a particular problem in qualitative research where, for instance, data collection can be refined during a study in response to emerging findings. Use of open-ended topic guides in interviews can also result in a rich but relatively unstructured dataset, where a range of topics are covered in varying degrees of depth depending on the direction of the interviews. However, others have argued that secondary analysis allows for unexpected topics that emerge from primary research to be followed up, and that these are worthy topics of investigation precisely because they have emerged spontaneously, without being directly solicited by researchers (Corti and Thompson, 2004). It has also been suggested that secondary analysis allows primary researchers to ‘salvage’ data that could not be used for the original purposes intended (Sandelowski, 1997: 129).
Another matter of concern is whether researchers can effectively re-use qualitative data that other researchers have collected (Corti and Thompson, 2004; Hammersley, 1997; Heaton, 2004; Hinds et al., 1997; Mauthner et al., 1998; Parry and Mauthner, 2005; Thorne, 1994). When re-using other researchers’ data, secondary analysts have the problem of not having ‘been there’ at data collection, which means that they do not have the benefit of personal knowledge and experience of being involved in the fieldwork that produced the data. As a result, they lack the primary researcher’s detailed understanding of the context in which the data were collected, and have a relatively cold and distant relationship to the data (which may be compounded by the dataset having been anonymised and stripped of other identifying features). However, it has been pointed out that this problem is not particular to secondary analysis, as some qualitative studies are carried out by teams of primary researchers whose members are variously involved in
the fieldwork (Heaton, 1998, 2004). Some archivists and researchers have also argued that this problem can be reduced by primary researchers fully documenting their dataset, and by secondary analysts consulting the researchers who collected the data (Corti and Thompson, 2004; Fielding, 2004; Hinds et al., 1997).
Yet another concern is whether one suggested use of secondary analysis – re-analysis in order to confirm or discount previous research findings – is a realistic ambition or accordant with the principles of qualitative inquiry (Hammersley, 1997; Heaton, 2004). However, others support the concept of preserving data for replication in both quantitative and qualitative research (Schneider, 2004).
Discussion of technical issues has tended to focus more on issues of how to archive qualitative data than how to do qualitative secondary analysis. For example, there has been some discussion of how best to anonymise qualitative data while preserving the integrity of datasets (Thomson et al., 2005), and when best to obtain consent for archiving and re-using qualitative data (see below). However, unlike the literature on secondary analysis of quantitative data, there are no textbooks on how to re-use qualitative data and there has been only preliminary discussion of issues such as: how to design secondary studies re-using qualitative data; how to find and select relevant datasets; how to analyse secondary qualitative data; how to assure and assess the quality of secondary studies; and what to include in reports of such studies (Heaton, 2004; Hinds et al., 1997; Thorne, 1994, 1998). There is an urgent need for further research on these topics.

Ethical and legal concerns
Another set of concerns relates to the ethical and legal aspects of re-using qualitative data. These include the issue of whether and, if so, when researchers should seek consent to re-use data in secondary studies (Alderson, 1998; Corti et al., 2000; Heaton, 2004; Hood-Williams and Harrison, 1998; Parry and Mauthner, 2004; Richardson and Godfrey, 2003; Thorne, 1998). This could be done at the time data are collected. However, information on exactly how data will be re-used, by whom and for what purpose, is likely to be scant at this time. Alternatively, consent could be sought retrospectively, as and when particular secondary studies are planned. But this requires that participants’ identity and contact details are known and can be used for this purpose. Re-contacting participants also presents researchers with logistical and ethical difficulties where people have changed address or may have died; being re-contacted may also be unwelcome to some former participants. In addition, whether or not researchers decide to seek fresh consent for a secondary study may depend on who collected the data and on the type of qualitative secondary analysis planned; for example, in the case of a supplementary analysis carried out by the same researchers who collected the primary data, and where the aims of the secondary and primary research are relatively congruent, this may not be required (for example, see Brownlie and Howson, 2005).
From a legal perspective, data may be re-used in research in the UK under the Data Protection Act 1998 providing it has been anonymised. However, copyright law also has to be considered when publicly archiving and re-using qualitative data. Under the Copyright, Designs and Patents Act 1988, copyright of ‘original works’ (which include interview transcripts) is owned by the interviewee. While some use can be made of such material by non-copyright holders, researchers in the UK have been advised to have ownership of copyright of qualitative data transferred in writing from participants to themselves or an archive if the dataset is to be archived for re-use by others (Allen and Overy, 1998).

QUESTIONS FOR FUTURE POLICY AND PRACTICE
Ongoing developments in the secondary analysis of qualitative data raise a number of questions for future policy and practice
concerning the collection, archiving and re-use of qualitative data. Three of the most critical questions are discussed below.

Which qualitative datasets should be archived?
As we have seen, great advances in qualitative data archiving have been made in the UK, driven by policies of a major funder of research in the social sciences, the ESRC. Since 1995, the ESRC has had a Datasets Policy that requires researchers to provide qualitative datasets for archiving and possible use by third parties as a condition of their funding, although applicants may make a case for exemption or request that access to their datasets be made subject to conditions. Qualidata helped inform development of the ESRC’s Datasets Policy and has discussed archiving policies with other funders (Corti and Backhouse, 2005). While commending the ESRC’s policy lead, staff from ESDS Qualidata have recommended that the ESRC improves implementation of its Datasets Policy, to make it more ‘robust, systematic and accountable’ – for example, suggesting that penalties could be introduced for non-compliant researchers (Corti, 2003: 424; see also Corti and Backhouse, 2005). But what is the case for such a mandatory policy of data archiving? And what are the possible alternatives to this model of promoting secondary analysis of qualitative data?
Parry and Mauthner (2005: 338) have argued that, so far as the demand for archived data goes, the ‘jury are still out’. As they point out, there is no clear evidence of support for formal archiving of qualitative datasets. On the one hand, Qualidata carried out a survey of academics and researchers in the UK in 1999, which found that 92% of over 550 respondents wanted access to qualitative datasets (Corti and Thompson, 2004; see note 9). On the other hand, a report of a consultation on ESRC Data Policy and Archiving found mixed support for, and highlighted ‘considerable concerns within the research community’ about, the archiving and re-use of qualitative data (Boddy, 2001). The report recommended that archiving policy should concentrate on retention of ‘classic’ or ‘key’ qualitative datasets and suggested the ESRC explore ‘alternative approaches to the re-use of qualitative data in order to demonstrate the possibilities’ (Boddy, 2001). But, as Parry and Mauthner (2005) point out, this begs the question of how some datasets come to be defined as ‘classic’ and selected for archiving. Furthermore, as we have seen, there is little evidence of the extent to which researchers have made use of qualitative datasets that have been officially archived so far across the UK. So far, reviews have shown that most of the (non-historical) secondary analyses of qualitative data published to date have been by researchers who have informally shared their data or re-used their own data.
Adoption of a blanket mandatory, rather than, say, an elective or invited, policy of formal data archiving would mean that all researchers would have to aim to meet minimum criteria for archiving datasets to a standard that could be used by third parties – regardless of the nature of the study, the potential value of the dataset as a secondary resource (which may be hard to predict in advance), and the associated work and costs involved in meeting this standard. The requirement to archive could also impact upon the conduct of primary qualitative research when consent for archiving data is sought at the time of data collection, adding to the amount of information that needs to be given and explained to potential research participants by primary researchers. While ESDS Qualidata provide guidelines on how to do this (see note 10), it is not known whether prolonging and complicating the process of getting informed consent at this stage affects participants’ agreement to take part. Nor is it known if, having agreed to take part and have their contribution to the dataset deposited in an archive, participants’ disclosure to the primary researcher(s) is affected by the knowledge that the information will be available, albeit anonymously, to unknown third parties. In short, there is little research on this topic to help researchers, peer reviewers of grant applications, funding organisations,
ethics committees, and the public, decide whether or not archiving is, per se, a desirable scientific and personal option in social research.
Different models of quantitative and qualitative archiving have been previously discussed (see Boddy, 2001; Corti, 2000). In contrast to a mandatory qualitative data archiving policy, I would like to propose an alternative fourfold strategy, subject to support by the research community, including research participants and the public. First, this strategy would focus on making widely available datasets that have value for historical and/or contemporary secondary research. Where this differs from current policy and practice is that what counts as an ‘exemplary’ study would be decided retrospectively and by independent peer review, using agreed criteria for selecting studies which demonstrate value for teaching and/or secondary research purposes across social science disciplines. The selection of datasets for archiving would be a mark of prestige for the researchers involved, and include a financial award for their help in documenting and preparing the dataset for deposit.
Second, in support of the requirement of some funding organisations, and many publishers, that datasets are retained for a minimum period of time after work has been completed and/or published, funders would make available resources for the adequate in-house preparation and retention of qualitative datasets. At present, there is little provision in research grants, and limited facilities in university workplaces, for data to be adequately retained even for a limited period and to a lower standard than that required in formal data archiving (that is, where data are not purposely made available for use by third parties). Third, services such as Qualidata and data archives would provide advice and guidelines for researchers on protocols for informal data sharing and researchers’ re-use of their self-held datasets, as well as procedures for depositing and re-using qualitative datasets in official archives. Finally, funds would be dedicated for projects, such as longitudinal studies, designed to supply qualitative data for use in secondary research. These studies would be the equivalent of multi-purpose statistical surveys, and funding would include provision for archiving and associated costs. Both ‘exemplary’ and multi-purpose qualitative datasets would be available to registered users via the web. Whereas the first and last points are being advanced through the aforementioned ESRC initiatives on formal data sharing, less is being done to investigate, support and develop informal data sharing, or the private retention and re-use of qualitative datasets.

Whose qualitative data should be re-used?
As the availability of public and privately retained qualitative datasets grows, researchers will increasingly have a choice of not only whether to do primary or secondary research, but whether to re-use datasets available in dedicated archives, via informal data sharing, or from their own oeuvre. The advantages and limitations of re-using qualitative data vary depending on whose data are re-used.
The main advantages of re-using formally archived datasets are that these will have been specially prepared for use by third parties. Thus, issues of consent, copyright ownership, anonymity of data, meta-documentation of datasets and conditions of access to and use of the material should have been dealt with and be clear to potential secondary users. The main limitations are that these datasets will have been collected by other researchers, which presents two major problems for secondary analysts. One is how they can recapture the context in which the original study was devised and the data collected. As we have seen, some researchers believe that intimate knowledge of ‘being there’ in the field and ‘immersing’ oneself in data processing and analysis are integral and essential to the process of doing qualitative research and making sense of people’s experiences. While the meta-documentation of datasets provides some background and
insight, this can only ever be an approximation (Mauthner et al., 1998). The other problem, which is related to this, concerns the relative distance that secondary analysis of formally archived datasets imposes between the researcher and the researched (Thorne, 1994). Here, the researcher’s relationship to the data is reduced to (most likely) anonymised data, perhaps offset by wider personal experience of doing primary research with similar groups of people, and/or by contact with the primary researcher(s) who collected the data. While there may be some advantages to having this distance in some secondary studies (for instance, where re-analysis is the goal), nevertheless it is a different, less intimate, relationship compared to that of primary researchers and their subjects (see note 11).
Where datasets are informally shared between colleagues, and primary researchers are also involved in the secondary analysis, the secondary research team have the advantage of jointly holding and sharing the tacit, as well as the documented, knowledge of the researcher(s) who collected the data. In this situation, the process of doing secondary analysis is arguably no different to that of doing primary research in teams where interviews may be carried out and analysed by different members (Heaton, 1998, 2004). The co-involvement of the primary researchers means that, compared to re-using archived data, there may be greater awareness of the context of the primary work, and sensitivity to the feelings of the researched (and any other researchers who carried out the primary work). Other advantages are that secondary researchers may be able to gain quicker access to informally shared datasets, rather than have to wait for them to be processed and become available via an archive. They may also have access to and be able to re-use any electronic coding that was employed in the original analysis, carried out using software designed to assist qualitative data analysis. And where primary researchers are involved in the secondary research, they may retain direct control over the re-use of the dataset rather than rely on an archive.
The main disadvantages of informal data sharing are that datasets may not be prepared to as high a standard as in an archive, nor may all the aforementioned protocols be fully satisfied. For example, secondary researchers may have to re-contact participants for consent to re-use data where this is required. In addition, where primary researchers share their data but are not involved in the secondary analysis, the disadvantages of re-using formally archived data apply and may be compounded by the relatively poor documentation of datasets if they have not been prepared for sharing with third parties.
Many of the advantages and limitations of informal data sharing apply to secondary research carried out by researchers who choose to re-use their own data. Additional advantages are that researchers who have worked on related projects in their careers can draw on and utilise material from this work (for example, see Thorne, 1990a, 1990b and 1990c). Researchers may also identify and follow up spontaneous topics of analysis that emerge unexpectedly in the course of research and which otherwise may go unanalysed if they are not germane to the aims of the primary research, or if the data are not shared with others or archived for further use. However, this practice raises new issues. Where does primary research stop and secondary analysis start? At what point is further consent required from participants to re-use data for spontaneous studies, even if these are to be carried out by the same researcher(s) who collected the data? Finally, researchers who re-use their own data may also find that their memory of the original study changes over time, and that their perspective shifts as their own life experiences inform their subsequent analysis of the data (Mauthner et al., 1998).
Of course, the above are just some of the pros and cons of working with qualitative data drawn from different sources. Many other factors, including the accessibility of datasets, preference for data format, quality of the original study, degree of ‘fit’ between the aims of the secondary research and the content of the dataset(s), trust between researchers, compatibility of shared datasets,
and availability of any electronic coding used in the primary research (and secondary analyst’s preferred software), will influence the decision as to whether or not to do secondary analysis and, if so, using which source of data.

(How) do research participants want qualitative data to be re-used?
The third and final important topic for debate concerns the involvement of research participants in helping to shape future policy and practice in the secondary analysis of qualitative data. Some research has been carried out examining public attitudes to consent for secondary use of mainly statistical data collected for administrative or research purposes in health services research (see note 12). This has shown support for data sharing, although with appropriate consent and safeguards in place. However, little is known about participants’ and public views on whether and, if so, how, qualitative data is obtained, anonymised, shared and re-used for secondary purposes in social research (Heaton, 2004). Some related work on methods of anonymisation in primary research has been carried out. For example, Grinyer (2002) has discussed participants’ views on anonymisation and use of pseudonyms in connection with her primary research with families of young people who have cancer. This showed that respondents can have different views on whether or not they would like to be personally identified in research reports. A small study of participants’ views on the use of verbatim quotations in qualitative research, including how speech should be edited, attributed and reported in research reports, showed that anonymisation was important to the people who took part, and also revealed a dislike of being identified as belonging to certain groups or categories that might be perceived negatively by others (Corden and Sainsbury, 2005). Hopefully, the aforementioned QUADS studies will provide an insight into participants’ experiences of being involved in, for example, longitudinal research, where datasets are to be retained for use by third parties, and their views on the appropriateness of approaches used to obtain and preserve data for sharing with others.

CONCLUSION
While an increasing number of secondary studies are being published, together with commentaries debating the pros and cons of re-using different types of qualitative data from different sources, secondary analysis of qualitative data is still an emerging and intricate methodology. There are, however, a number of things that researchers can do to further develop and establish the value of re-using qualitative data. In reports of their work, secondary analysts could usefully describe their methods in more depth and reflect on the strengths and limitations of the particular approach they used. Further work is also needed to explore and outline different strategies for re-using qualitative data, and to examine the acceptability of these strategies to research participants and the public. And primary researchers need to be mindful of the possibility of data being re-used, through data archiving, informal data sharing or secondary analysis of self-preserved datasets, when collecting qualitative data in the course of primary research. Finally, the ongoing development of secondary analysis of qualitative data has implications for the principles and practices in qualitative research generally. Ethical protocols, data processing, data analysis, reporting, and criteria for assessing the quality of qualitative research need to keep pace with these developments so that they are inclusive of the possibilities and practice of secondary analysis of qualitative data.

NOTES
1 The CESSDA website address is: http://www.nsd.uib.no/cessda/ [accessed 28/2/2006].
2 International developments in qualitative data archiving are reported in an issue of Forum: Qualitative Social Research [Online Journal], 2000, 1 (3). Available at: http://www.qualitative-research.net/fqs/fqs-e/inhalt3-00-e.htm [accessed 1/3/2006].
3 ESRC’s Datasets Policy is set out in Annex C of ‘2005 ESRC Research Funding Guide Post fEC’, available at: http://www.esrcsocietytoday.ac.uk/ESRCInfoCentre/opportunities/research%5Ffunding/ [accessed 28/2/2006].
4 Information on QUADS is available at the UKDA website: http://quads.esds.ac.uk/about/introduction.asp [accessed 28/2/2006].
5 ‘Changing Lives and Times qualitative longitudinal initiative’. Call for outline proposals on ESRC website: http://www.esrcsocietytoday.ac.uk/ESRCInfoCentre/opportunities/current_funding_opportunities/index28.aspx [accessed 28/2/2006].
6 The ESDS Qualidata catalogue currently lists 162 datasets (and details of others remain to be transferred from the older Qualicat catalogue). Available at: http://www.data-archive.ac.uk/search/allSearch.asp?q1=qualidata&zoom_page=1&zoom_per_page=10&zoom_cat=-1&zoom_and=1&zoom_sort=1&ct=xmlAll [accessed 28/2/2006].
7 There is some public information on this, produced by JISC on ESDS performance and published on its website: http://www.mu.jisc.ac.uk/servicedata/esds/data/ [accessed 28/2/2006].
8 Provisional searches of ASSIA and selected electronic databases of research on criminology and education carried out by myself and independently by two colleagues (Rachel Pitman and Janette Colclough, University of York) in February 2006 provided little evidence of such studies. However, there are difficulties searching for secondary studies because there are no established key words for classifying such studies, and authors’ own definitions of secondary analysis vary. A renewed search of the health-related literature, using similar search strategies, did result in further studies being identified. In total, over 100 secondary studies in health, criminology and education have been identified to date.
9 A different response figure (99%) and date of the survey (2000) have been reported elsewhere (Corti, 2000). I have quoted the most recently published.
10 Guidelines on creating and depositing qualitative datasets are available on the ESDS Qualidata website: http://www.esds.ac.uk/qualidata/create/ [accessed 28/2/2006].
11 There are parallels here with concerns over the use of computer software in qualitative data analysis (see Gilbert, 2002).
12 See essays published in a special supplement of the Journal of Health Services Research and Policy, 2005, 8 (1).

REFERENCES

Alderson, P. (1998) ‘Confidentiality and consent in qualitative research’, Network – Newsletter of the British Sociological Association, 69: 6–7.
Allen and Overy (1998) ‘Copyright/confidentiality: final report to the Economic and Social Research Council’. Retrieved from: ftp://ftp.esrc.ac.uk/pub/guide.doc [accessed 14/9/1998].
Angst, D.B. and Deatrick, J.A. (1996) ‘Involvement in health care decisions: parents and children with chronic illness’, Journal of Family Nursing, 2 (2): 174–94.
Atkinson, P. (1992) ‘The ethnography of a medical setting: reading, writing and rhetoric’, Qualitative Health Research, 2 (4): 451–74.
Bevan, M. (2000) ‘Family and vocation: career choice and the life histories of general practitioners’, in Bornat, J., Perks, R., Thompson, P. and Walmsley, J. (eds.), Oral History, Health and Welfare. London: Routledge. pp. 21–47.
Bloor, M. (2000) ‘The South Wales Miners Federation, Miners’ Lung and the instrumental use of expertise, 1900–1950’, Social Studies of Science, 30 (1): 125–40.
Bloor, M. and McIntosh, J. (1990) ‘Surveillance and concealment: a comparison of techniques of client resistance in therapeutic communities and health visiting’, in Cunningham-Burley, S. and McKeganey, N.P. (eds.), Readings in Medical Sociology. London: Tavistock/Routledge. pp. 159–81.
Boddy, M. (2001) Data Policy and Data Archiving: Report on Consultation for the ESRC Research Resources Board. Bristol: University of Bristol.
Brownlie, J. and Howson, A. (2005) ‘“Leaps of faith” and MMR: an empirical study of trust’, Sociology, 39 (2): 221–39.
Cohen, M.H. (1995) ‘The triggers of heightened parental uncertainty in chronic, life-threatening childhood illness’, Qualitative Health Research, 5 (1): 63–77.
Cohen, S. and Taylor, L. (1972) Psychological Survival: The Effects of Long-Term Imprisonment. London: Allen Lane.
Corden, A. and Sainsbury, R. (2005) ‘Research participants’ views on use of verbatim quotations’. Final report to ESRC, ref 2094. York: Social Policy Research Unit (SPRU), University of York.
Corti, L. (2000) ‘Progress and problems of preserving and providing access to qualitative data for social research – the international picture of an emerging culture’, Forum: Qualitative Social Research [Online Journal], 1 (3): 58 paragraphs. Available at: http://www.qualitative-research.net/fqs-texte/3-00/3-00corti-e.htm [accessed 1/3/2006].
Corti, L. (2003) ‘Infrastructure services and needs for the provision of enhanced qualitative data resources’, International Social Science Journal, 55 (3): 417–32.
Corti, L. and Backhouse, G. (2005) ‘Acquiring qualitative data for secondary analysis’, Forum: Qualitative Social Research [Online Journal], 6 (2): 31 paragraphs. Available at: http://www.qualitative-research.net/fqs-texte/2-05/05-2-36-e.htm [accessed 1/3/2006].
Corti, L., Day, A. and Backhouse, G. (2000) ‘Confidentiality and informed consent: issues for consideration in the preservation of and provision of access to qualitative data archives’, Forum: Qualitative Social Research [Online Journal], 1 (3): 46 paragraphs. Available at: http://www.qualitative-research.net/fqs-texte/3-00/3-00cortietal-e.htm [accessed 1/3/2006].
Corti, L. and Thompson, P. (2004) ‘Secondary analysis of archived data’, in Seale, C., Gobo, G., Gubrium, J.F. and Silverman, D. (eds.), Qualitative Research Practice. London: Sage. pp. 327–43.
Fielding, N. (2004) ‘Getting the most from archived qualitative data: epistemological, practical and professional obstacles’, International Journal of Social Research Methodology, 7 (1): 97–104.
Fielding, N.G. and Fielding, J.L. (2000) ‘Resistance and adaptation to criminal identity: using secondary analysis to evaluate classic studies of crime and deviance’, Sociology, 34 (4): 671–89.
Gilbert, L.S. (2002) ‘Going the distance: “closeness” in qualitative data analysis software’, International Journal of Social Research Methodology, 5 (3): 215–28.
Grinyer, A. (2002) ‘The anonymity of research participants: assumptions, ethics and practicalities’, Social Research Update, Issue 36, University of Surrey.
Hammersley, M. (1997) ‘Qualitative data archiving: some reflections on its prospects and problems’, Sociology, 31 (1): 131–42.
Heaton, J. (1998) ‘Secondary analysis of qualitative data’, Social Research Update, Issue 22, University of Surrey.
Heaton, J. (2000) ‘Secondary analysis of qualitative data: a review of the literature’. Final report to ESRC, ref R000222918. York: Social Policy Research Unit (SPRU), University of York.
Heaton, J. (2004) Reworking Qualitative Data. London: Sage.
Henwood, K. and Lang, I. (2003) Qualitative Research Resources: A Consultation with UK Social Scientists. Swindon, UK: ESRC.
Hinds, P.S., Vogel, R.J. and Clarke-Steffen, L. (1997) ‘The possibilities and pitfalls of doing a secondary analysis of a qualitative data set’, Qualitative Health Research, 7 (3): 408–24.
Holland, J., Thomson, R. and Henderson, S. (2004) ‘Feasibility study for a possible qualitative longitudinal study: discussion paper’. Available at: http://www.lsbu.ac.uk/inventingadulthoods/feasibility_study.pdf [accessed 23/2/2006].
Hood-Williams, J. and Harrison, W.C. (1998) ‘“It’s all in the small print …”: archiving and qualitative research’, Network – Newsletter of the British Sociological Association, 70: 8–9.
Jairath, N. (1999) ‘Myocardial infarction patients’ use of metaphors to share meaning and communicate underlying frames of experience’, Journal of Advanced Nursing, 29 (2): 283–89.
James, J.B. and Sørensen, A. (2000) ‘Archiving longitudinal data for future research: why qualitative data add to a study’s usefulness’, Forum: Qualitative Social Research [Online Journal], 1 (3): 57 paragraphs. Available at: http://www.qualitative-research.net/fqs-texte/3-00/3-00jamessorensen-e.htm [accessed 1/3/2006].
Jenny, J. and Logan, J. (1996) ‘Caring and comfort metaphors used by patients in critical care’, Image: Journal of Nursing Scholarship, 28 (4): 349–52.
Jones, C. and Rupp, S. (2000) ‘Understanding the carers’ world: a biographical-interpretive case study’, in Chamberlayne, P., Bornat, J. and Wengraf, T. (eds.), The Turn to Biographical Methods in Social Science: Comparative Issues and Examples. London: Routledge. pp. 276–89.
Mauthner, N.S., Parry, O. and Backett-Milburn, K. (1998) ‘The data are out there, or are they? Implications for archiving and revisiting qualitative data’, Sociology, 32 (4): 733–45.
May, C., Allison, G., Chapple, A., Chew-Graham, C., Dixon, C., Gask, L., Graham, R., Rogers, A. and Roland, M. (2004) ‘Framing the doctor-patient relationship in chronic illness: a comparative study of general practitioners’ accounts’, Sociology of Health & Illness, 26 (2): 135–58.
Nelson, L.G.L., Summers, J.A. and Turnbull, A. (2004) ‘Boundaries in family-professional relationships: implications for special education’, Remedial and Special Education, 25 (3): 153–65.
Parry, O. and Mauthner, N.S. (2004) ‘Whose data are they anyway? Practical, legal and ethical issues in archiving qualitative research data’, Sociology, 38 (1): 139–52.
Parry, O. and Mauthner, N.S. (2005) ‘Back to basics: who re-uses qualitative data and why?’, Sociology, 39 (2): 337–42.
Pascalev, A. (1996) ‘Images of death and dying in the intensive care unit’, Journal of Medical Humanities, 17 (4): 219–36.
Plummer, K. (1983) Documents of Life: An Introduction to the Problems and Literature of a Humanistic Method. London: George Allen & Unwin.
Plummer, K. (2001) Documents of Life 2: An Invitation to a Critical Humanism. London: Sage.
Popkess-Vawter, S., Brandau, C. and Straub, J. (1998) ‘Triggers of overeating and related intervention strategies for women who weight cycle’, Applied Nursing Research, 11 (2): 69–76.
Richardson, J.C. and Godfrey, B.S. (2003) ‘Towards ethical practice in the use of archived transcripted interviews’, International Journal of Social Research Methodology, 6 (4): 347–55.
Sandelowski, M. (1997) ‘“To be of use”: enhancing the utility of qualitative research’, Nursing Outlook, 45: 125–32.
Savage, M. (2005a) ‘Working-class identities in the 1960s: revisiting the Affluent Worker study’, Sociology, 39 (5): 929–46.
Savage, M. (2005b) ‘Revisiting classic qualitative studies’, Forum: Qualitative Social Research [Online Journal], 6 (1): 43 paragraphs. Available at: http://www.qualitative-research.net/fqs-texte/1-05/05-1-31-e.htm [accessed 1/3/2006].
Schneider, B. (2004) ‘Building a scientific community: the need for replication’, Teachers College Record, 106 (7): 1471–83.
Scott, J. (1990) A Matter of Record: Documentary Sources in Social Research. Cambridge: Polity Press.
Szabo, V. and Strang, V.R. (1997) ‘Secondary analysis of qualitative data’, Advances in Nursing Science, 20 (2): 66–74.
Thompson, P. (1998) ‘Sharing and reshaping life stories: problems and potential in archiving research narratives’, in Chamberlain, M. and Thompson, P. (eds.), Narrative and Genre. London: Routledge. pp. 167–81.
Thomson, D., Bzdel, L., Golden-Biddle, K., Reay, T. and Estabrooks, C.A. (2005) ‘Central questions of anonymization: a case study of secondary use of qualitative data’, Forum: Qualitative Social Research [Online Journal], 6 (1): 33 paragraphs. Available at: http://www.qualitative-research.net/fqs-texte/1-05/05-1-29-e.htm [accessed 1/3/2006].
Thorne, S.E. (1988) ‘Helpful and unhelpful communications in cancer care: the patient perspective’, Oncology Nursing Forum, 15 (2): 167–72.
Thorne, S.E. (1990a) ‘Constructive noncompliance in chronic illness’, Holistic Nursing Practice, 5 (1): 62–9.
Thorne, S.E. (1990b) ‘Mothers with chronic illness: a predicament of social construction’, Health Care for Women International, 11: 209–21.
Thorne, S.E. (1990c) ‘Navigating troubled waters: chronic illness experience in a health care crisis’. Unpublished thesis, The Union Institute of Advanced Studies: Cincinnati.
Thorne, S.E. (1994) ‘Secondary analysis in qualitative research: issues and implications’, in Morse, J.M. (ed.), Critical Issues in Qualitative Research Methods. London: Sage. pp. 263–79.
Thorne, S.E. (1998) ‘Ethical and representational issues in qualitative secondary analysis’, Qualitative Health Research, 8 (4): 547–55.
Weaver, A. (1994) ‘Deconstructing dirt and disease: the case of TB’, in Bloor, M. and Taraborrell, P. (eds.), Qualitative Studies in Health and Medicine. Aldershot: Avebury. pp. 76–95.
Weaver, A. and Atkinson, P. (1994) Microcomputing and Qualitative Data Analysis. Aldershot: Avebury.
Yamashita, M. and Forsyth, D.M. (1998) ‘Family coping with mental illness: an aggregate from two studies, Canada and United States’, Journal of the American Psychiatric Association, 4 (1): 1–8.
31
Secondary Analysis of Quantitative Data Sources
Angela Dale, Jo Wathan and Vanessa Higgins

INTRODUCTION

Secondary analysis is generally understood as the analysis of data originally collected and analysed for another purpose (Hakim, 1982; Kielcolt and Nathan, 1986; Dale et al, 1988; Firebaugh, 1997). It is a method that has grown in popularity with the increasing availability of high-quality data through national data archives. Secondary analysis enables researchers to analyse datasets that they would not dream of being able to collect themselves. Examples include surveys and census data collected by government, or surveys conducted by academics but then made available for others to use. Here, we also discuss the increasing number of surveys which are collected specifically as a research resource for others. Secondary analysis is sometimes used to refer to the analysis of data sources such as published reports or newspaper articles. This may be better considered as primary analysis of secondary sources and is not discussed here.

The data sources discussed in this chapter are primarily those collected through some kind of survey, with a focus on microdata: typically individual-level data where there is one case for each respondent. However, we also mention data obtained from administrative records (for example vital registration, taxation records or records relating to those who have claimed benefits), as well as aggregate data – for example tables extracted from official sources such as the census of population, or the Organisation for Economic Co-operation and Development (OECD).

The chapter reviews the range of data available for secondary analysis and includes some tips on how to find data sources, with a particular focus on the data archives that play a key role in facilitating data access. We discuss some of the major benefits of secondary analysis and highlight ways in which it can be used to complement other methods, such as qualitative interviews. We then go on to stress the role of informed consent in the re-use of data and the importance of
good practice. Good practice has two aspects – ensuring data are used in a responsible way that maintains the confidentiality of the respondents, and also good practice in terms of analysis. Finally, we review some of the new developments in access that have resulted from web technologies.

DATA AVAILABILITY

In this section we discuss what a data archive does, what types of data are available and provide some generic advice on how to find a dataset.

Data archives

Data archives play a fundamental role in making data available for secondary analysis. A data archive is a storehouse of digitised data. The archive performs a set of related functions which include obtaining data, assessing its suitability for release, checking the data, adding the necessary data description and documentation and preserving the data for future use. All archives have some form of catalogue. Large archives usually have sophisticated search facilities that allow you to browse through major studies, search abstracts for keywords and so forth.

Many countries now have either a national archive, or a small number of major archives. This development can be traced back to the establishment of the Roper Center in the United States in 1957 (Dale et al, 1988) which continues to be a major source of public opinion datasets. The Inter-university Consortium for Political and Social Research (ICPSR) followed in 1962 and houses a broad range of social data, mainly from academic and government sources. The UK Data Archive was formed in 1967 and has been centrally funded by the Economic and Social Research Council throughout this period. By the start of the century archives were widespread as illustrated in Box 31.1. This list, which is far from comprehensive, illustrates the extent to which archives are found world-wide. More extensive lists of national archives can be found from the websites of the International Federation of Data Organisations (IFDO see http://www.ifdo.org/) and Council of European Social Science Data Archives (CESSDA see http://www.cessda.org).

Box 31.1 Some key national, and other, major data archives

Country | Archive | Web address (1)
Australia | Australian Social Science Data Archive | http://assda.anu.edu.au/
Austria | Wiener Institut für Sozialwissenschaftliche Dokumentation und Methodik (WISDOM) | http://www.wisdom.at/
Czech Republic | Sociologický datový archiv (SDA) | http://archiv.soc.cas.cz/
France | Réseau Quetelet | http://www.centre.quetelet.cnrs.fr/
Germany | Zentralarchiv für Empirische Sozialforschung | http://www.gesis.org/ZA/
Ireland | Irish Social Science Data Archive (ISSDA) | http://www.ucd.ie/issda/
Israel | Israeli Social Sciences Data Center (ISDC) | http://isdc.huji.ac.il/
Japan | Information Center for Social Science Research on Japan (SSJ) | http://ssjda.iss.u-tokyo.ac.jp/en/index.html
Norway | Norsk samfunnsvitenskapelig datatjeneste | http://www.nsd.uib.no/
South Africa | South African Data Archive (SADA) | http://www.nrf.ac.za/sada/
United Kingdom | UK Data Archive – a member of the Economic and Social Data Service (ESDS). ESDS is a good entry point for new researchers. | http://www.data-archive.ac.uk and http://www.esds.ac.uk
USA | Inter-university Consortium for Political and Social Research | http://www.icpsr.umich.edu/
USA | Minnesota Population Center | http://www.pop.umn.edu/
USA | Roper Center | http://www.ropercenter.uconn.edu/

(1) Note: URLs given as at 6th February 2007.

The archives listed in Box 31.1 are typically created in an academic environment with academic re-use in mind. However, many data collectors are also involved with data distribution. In the United States a range of microdata are available directly from the website of the Census Bureau, whilst many other statistical offices, such as the UK Office for National Statistics (ONS) and the National Institute for Statistics and Economic Studies (INSEE) in France, make summary statistics available online. The United Nations Statistics Division provides a listing of national statistics offices as well as links to other statistical databases (http://unstats.un.org/unsd/methods/inter-natlinks/sd_natstat.htm, last accessed 06/02/07).

What data are available?

The range of data that is available for a specific country will vary with historical and cultural factors but may include many of the types described in Box 31.2. Data from private sector sources, or business surveys, may also be available.

Locating a dataset

The following points provide some guidance on how to search for a dataset.

1 Searching your local data archive
The most obvious place to search for a dataset is in the archives of your own country. Such archives will almost certainly have a website with a searchable catalogue. The CESSDA (for Europe) or IFDO websites are helpful in locating national archives.
2 Looking for data from data collectors
If you know a dataset exists but cannot find it in your local data archive it is worth finding out who collected or commissioned the data. National statistical organisations and other major social survey organisations may be able to provide you with access to the data or direct you to another organisation that disseminates data on their behalf.
3 Using a dataset from another country
Some datasets are restricted to users within the country of origin. However, the archive website will usually describe access conditions. Often your local data archive will be able to help you to obtain the dataset. CESSDA has an international data browser and search facility which enables users to explore a range of data published by major European national archives.
4 Does a dataset exist?
A literature search on your research topic is a good way to find out about data availability. If a major data source is available it is likely that someone will already have used it. A good knowledge of the literature will help you to identify the sorts of data sources that may be available.
5 Other information sources
You may find that there are other resources (often web-based) that can help with your search. The United Kingdom, for example, has a range of resources that can help you to locate potential data sources. A list of these is given in Box 31.3.

Microdata based on administrative records

There are a growing number of datasets that are constructed by linking together administrative records for the same individuals. However, they may not be listed in a data archive catalogue and will almost certainly only be available under restricted conditions. The use of administrative records for research has been pioneered by countries such as Norway, Denmark, Finland and the Netherlands. In these countries a single identification number which is used across a wide range of official records provides a basis for record linkage. In Denmark a unique ID is allocated to individuals at birth and is used by government departments responsible
for employment, taxation, benefits, education, housing and health. This has enabled the Danish statistical office to create a research database by linking together records for each individual in the country (Smith et al, 2004). This has, for example, been used to model the effect of proposed tax and benefit changes on different sections of the population. Similarly, Sweden has a longitudinal database for education, income and employment that was set up to support research on changes in the Swedish labour market during the 1990s. In Norway, Sweden and Denmark, and also England and Wales, linked records have long been used for analysis of the social determinants of mortality. These studies have all been based on evidence from death records, linked to other information from vital statistics and, in some cases, census data. In the UK there is a growing focus on realising the research benefits of record linkage across a much wider range of topic areas, although the absence of a single reliable ID which is used across all administrative records hampers progress.

In all cases where administrative records are used for research there are major concerns over protecting the anonymity and confidentiality of the individuals in the database. This means that databases are very carefully controlled by the relevant national statistical offices and research access is subject to very tight security measures (see section entitled ‘Advances in access to data and support’).

Box 31.2 Types of data available to secondary analysts

Type of data: Summary statistics for small areas
Example: Neighbourhood Statistics (United Kingdom)
These data are drawn from a mixture of census and administrative record sources. They provide summary statistics (e.g. counts) for small administrative areas and are available directly from the Office for National Statistics.

Type of data: Large cross-survey series collected on behalf of government departments
Example: Enquete Emploi (France)
The European Union required member states to conduct regular labour force surveys. Some countries make the survey microdata available to secondary researchers. In France, the Enquete Emploi is conducted annually in March. A sample containing data on approximately 135,000 individuals is available from Réseau Quetelet.

Type of data: Large longitudinal datasets
Example: The Panel Study of Income Dynamics (United States)
The PSID is one of the longest running longitudinal studies, which started in 1968 with a sample of approximately 4,800 households. It focuses on family income and the determinants of changes in income. Microdata files are available from the ICPSR.

Type of data: Academic studies
Example: Social Change and Economic Life Initiative (United Kingdom)
The SCELI study was a programme funded by the UK’s Economic and Social Research Council. Around 6,000 interviews were conducted in four areas and collected work histories and attitudes to work. The data and its associated follow-up studies are available from the Economic and Social Data Service.

Type of data: International comparative studies
Example: The European Social Survey (Europe)
The European Social Survey is conducted simultaneously in a large number of countries. Data are available from the ESS data website at: http://ess.nsd.uib.no/

Type of data: Census microdata
Example: The Integrated Public Use Microdata Series (USA/International)
Some countries make samples of microdata drawn from census output databases available for reanalysis. These data have small sampling fractions and are anonymised to protect confidentiality. The IPUMS is a major international collection of such files which are held at the University of Minnesota. Individual countries may also make census microdata available through national archives or statistical offices.

Box 31.3 Information sources for UK secondary analysts

Economic and Social Data Service (ESDS)
http://www.esds.ac.uk
This service supports the work done by the UK Data Archive in making data available. There are four specialist functions which support secondary analysis of: government surveys, longitudinal data, international comparative data and qualitative data.

Census of Population Programme
http://www.census.ac.uk
Census data is accessible through a separate service designed for academic users.

Question Bank
http://qb.soc.surrey.ac.uk
This service contains information on survey content and survey questions. There are also links to the Survey Link Scheme which enables researchers to attend a survey briefing, and often to shadow an interviewer in the field.

Office for National Statistics
http://www.statistics.gov.uk
The National Statistics Office of the UK is responsible for collecting many key data series. It provides information on its surveys, summary statistics and published reports.

Intute
http://www.intute.ac.uk
Intute is a general resource which provides links to key websites.

ANALYTICAL AND RESEARCH VALUE

The quantitative data sources available for secondary analysis offer enormous potential for research on a wide range of topics. Whilst tabular data provide an excellent source of material for many purposes (for example, national censuses of population provide essential information on the structure of the population and, in particular, the characteristics of small areas), these aggregate sources do not allow the analyst the flexibility available with microdata. For example, access to microdata provides a much more extensive range of variables, usually in a great deal of detail. This allows the creation of new categorisations and new definitions appropriate to the research question, rather than using those defined by the survey commissioner. It also means that the data can be used in much more sophisticated analyses than is possible with tabular outputs.

Large quantitative surveys tend to be collected by agencies with well-established reputations for quality research, for example, the US Census Bureau or the UK Office for National Statistics. Rigorous methodologies and sophisticated sampling methods are employed and interviewers are trained extensively to ensure good quality data. Usually the survey process is well documented and data are carefully checked and edited. Access to these very expensive resources provides considerable benefit to secondary analysts. In the following paragraphs we briefly review some of the key research benefits from secondary analysis of microdata files.

Large and nationally representative samples

Secondary analysis can provide the basis for making generalisations to the population as a whole. Large government surveys are usually
designed to be nationally representative and may contain weighting factors which gross up the sample to provide population estimates. In the US, large surveys such as the Survey of Income and Program Participation (SIPP) (www.bls.census.gov/sipp) provide comprehensive information about the income and program participation of individuals and households in the US. In the UK, government surveys such as the Labour Force Survey provide detailed information on topics related to employment, education and training and earnings. In both examples sample sizes are very large and designed to be nationally representative; all adult members of the household are interviewed, and the surveys have been repeated, usually annually, over more than two decades.

Samples of microdata drawn from the census also provide large and nationally representative samples. The US Public-Use Microdata Samples (PUMS) are samples of individual records from the US decennial census of population. The files contain records representing 5 percent or 1 percent samples of the occupied and vacant housing units in the US and the people in the occupied units. Similarly, the UK Samples of Anonymised Records (SARs) are samples of individual records from the 1991 and 2001 censuses. The 1991 SARs represent 2 percent of enumerated individuals in the UK and 1 percent of enumerated households, whilst in 2001 the individual-level SAR file increased to 3 percent. The PUMS and SARs sample sizes are much larger than most national surveys thus permitting analysis of small groups and sub-national areas. Both cover the full range of census topics including housing, education, health, transport, employment and ethnicity and, in the US, income.

International comparisons

Secondary analysis plays an important role in supporting international comparisons. Sometimes it is possible to locate data sources from different countries with sufficient similarity in the topics and questions asked to support comparative research. The Luxembourg Income Study (LIS) (see http://www.lisproject.org/) brings together economic, social, demographic, and labour market data from about 30 different countries in Europe, America, Asia and Oceania and is widely used in comparative studies of income inequality and poverty. For example, Rainwater and Smeeding (2003) have used comparative data from LIS to ask what it means to be poor in a prosperous nation, especially for children. They compare the situation of American children in low-income families with their counterparts in 14 other countries – including Western Europe, Australia, and Canada, thus providing a powerful perspective on the dynamics of child poverty in the US. Their book also contains a valuable section on how to use the LIS.

The Luxembourg Employment Study (LES) provides a similar set of data files based on Labour Force Surveys for a range of countries. In both LIS and LES the support teams do a great deal of preparative work to make the studies comparable. Because most of the surveys come from national statistical offices and are not usually distributed internationally, a system of remote access has been devised so that no microdata actually leave the secure LIS/LES setting (see section entitled ‘Advances in access to data and support’).

In other situations comparative analysis may be based on separate analyses of different data sources but asking the same, or similar, questions. Breen (2005) reports results from a large international comparative study of social mobility based on 11 different European countries over more than 20 years (from the mid-1970s to the mid-1990s). Data were coded to common international class and education schemes. An early chapter brings together all the datasets to make a cross-country comparative analysis of social mobility in Europe between 1970 and 2000. Subsequent chapters provide an analysis for each country, by an expert from that country. These country-specific chapters provide the context needed to understand the differences found in the international comparative analyses.
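
As a rough illustration of what coding national data to a common scheme involves, the Python sketch below recodes two small sets of records to a shared classification and then compares the resulting distributions. Everything in it is an assumption made for illustration: the country labels, the national education codes, the three-level scheme and the records themselves are invented, and they are not the schemes used in the studies discussed above.

```python
from collections import Counter

# Hypothetical mappings from national education codes to a shared
# three-level scheme. All labels and codes are invented for illustration.
COMMON_SCHEME = {
    "country_a": {"primary": "low", "lower_secondary": "low",
                  "upper_secondary": "medium", "degree": "high"},
    "country_b": {"1": "low", "2": "medium", "3": "medium", "4": "high"},
}


def harmonise(country, records):
    """Recode each record's national education code to the common scheme."""
    mapping = COMMON_SCHEME[country]
    return [{"country": country, "education": mapping[r["education"]]}
            for r in records]


def distribution(records):
    """Return the share of records falling in each common category."""
    counts = Counter(r["education"] for r in records)
    total = sum(counts.values())
    return {level: counts[level] / total for level in ("low", "medium", "high")}


# Invented microdata: one record per respondent, coded nationally.
country_a = [{"education": "primary"}, {"education": "upper_secondary"},
             {"education": "degree"}, {"education": "degree"}]
country_b = [{"education": "1"}, {"education": "2"},
             {"education": "3"}, {"education": "4"}]

# Once harmonised, the files can be pooled and compared on the same terms.
pooled = harmonise("country_a", country_a) + harmonise("country_b", country_b)
for country in ("country_a", "country_b"):
    subset = [r for r in pooled if r["country"] == country]
    print(country, distribution(subset))
```

Keeping the mapping in an explicit lookup table, rather than burying it in recode statements, makes the harmonisation decisions visible so that they can be documented and checked against country-specific knowledge.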
The European Social Survey (ESS) provides a contrast in that it is explicitly designed to support international comparison. The survey started in 2001 and has been conducted every two years since. It covers over 20 nations and is designed to chart and explain the interaction between Europe’s changing institutions and the attitudes, beliefs and behaviour patterns of its diverse populations. Achieving equivalence across all countries participating in the study is a principle that is applied to sample selection, translation of the questionnaire, and to all methods and processes. All procedures and outcomes are comprehensively documented in a standard way. More information and direct download of data is available from: www.europeansocialsurvey.org.
Clark and Lelkes (2005) used the 2002–2003 ESS to show that religion acts as a buffer between stressful life events and the ensuing economic and social implications. All denominations suffer less psychological harm from unemployment than the non-religious. Catholics and Protestants are less hurt by marital separation than the non-religious but, while Protestants are protected against divorce, Catholics suffer a greater fall in life satisfaction than other groups.

Historical comparisons and change over time
Many of the examples in the earlier section also included a time dimension and secondary analysis may be the only means by which historical comparisons can be made for information that cannot be collected retrospectively. Data archives allow the researcher to go back in time and find sources of information on, for example, what people thought, how they voted and how much they earned. Many surveys, such as the British General Household Survey (GHS), which collects data on a range of topics covering household, family and individual information, have now been running for 30 years or more. These surveys have retained a high degree of consistency in their core questions and therefore support time series analyses (e.g. Dickens et al, 2000, Marmot, 2003) or before-and-after policy analysis (Gregg et al, 2005).
The General Social Survey has been conducted in the US by NORC since 1972 and provides information on the changing attitudes of the US population. In a similar vein, the British Social Attitudes Survey has been conducted annually since 1983 and provides a unique insight into how attitudes in Britain have changed over this time period. Both studies form part of a larger programme: the International Social Survey Programme (ISSP), which provides comparative data for up to 41 countries world-wide (www.issp.org).
Cross-sectional surveys do not follow the same individual over time so they cannot be used to analyse individual level change over time. However, change across aggregated groups can be analysed. For example, Payne and Payne (1994) used the Labour Force Survey for 1979–1989 to model trends in the work chances of unemployed people relative to the chances of people in work. Longitudinal data such as cohort studies or panel studies are required to compare individuals at different points in time.

Cohort studies
In the UK a succession of birth cohorts have studied people born in 1946, 1958, 1970 and, most recently, 2000–2001. These studies have been repeated at intervals since birth and thus grow richer as the respondents grow older. For example, the 1958 cohort study sampled all children born in Great Britain during one week in March 1958 and conducted follow-up surveys of sample individuals at key stages (e.g. ages 7, 11, 16, 23, 33, 42). It is expected that all these cohort studies will continue throughout the lifetime of their members.
Longitudinal birth cohort studies are valuable for investigating the lifetime processes of individuals. For example, using the 1958 cohort, Butler et al (1971) identified the effect of smoking on low birth weight and perinatal mortality; Hobcraft and Kiernan (2001) showed that any experience of childhood poverty is clearly associated with
adverse outcomes in adulthood; and Elias and Blanchflower (1988) demonstrated the impact of early school achievement on occupational attainment.
A single cohort study is clearly limited in its ability to say anything about how outcomes vary between different cohorts. However, the ability to compare a number of cohorts born at time intervals from 1946 to 2000 becomes a very powerful analysis tool. Ferri et al (2003) provide an accessible account of cohort differences based on analysis of the 1946, 1958 and 1970 cohorts. Topics include: family and parenting, qualifications and employment, income and living standards, physical and mental health, lifestyles, health and citizenship. An account of the first findings from the Millennium cohort (births from 2000 to 2001) is given by Dex and Joshi (2005).

Panel studies
Panel studies such as the US Panel Study of Income Dynamics (PSID) and the British Household Panel Survey (BHPS) cover all ages, and are repeated at frequent intervals, usually annually. Whereas cohort studies are primarily suited to understanding developmental processes over a life course, a panel study is able to show the effect of short-term changes in levels of income, household composition and changes in the economy. For example, Jarvis and Jenkins (1999) use the BHPS to show the impact of marital break-up on income whilst Jenkins and Van Kerm (2006) examine trends in income inequality and income mobility. The similarity between PSID and BHPS lends support to comparative analyses between the US and Britain – for example Banks et al’s (2003) comparison of financial wealth inequality between these two countries. Both PSID and BHPS provide a wealth of information to support users and have published collections of papers that demonstrate very fully some of the research strengths of the data. Five Thousand American Families captures the first 13 years of PSID and is now available on-line from: www.psidonline.isr.umich.edu, whilst Berthoud and Gershuny (2000) provide analyses based on the first seven years of BHPS.

Small population sub-groups
Secondary analysis can provide a means of obtaining data on small groups within the population for whom there is no obvious sampling frame. However, a dataset must be large enough to ensure that sufficient numbers of the sub-group can be located, and should also be able to provide a representative sample. Some surveys occasionally have special boost samples for sub-groups; for example, the Health Survey for England contained ethnic minority boosts in 1999 and 2004 (Erens et al, 2001; Sproston and Mindell, 2006). The survey results highlighted some interesting ethnic differences in health outcomes. Bangladeshi and Pakistani men and women, and Black Caribbean women, were more likely than the general population to report that they had bad or very bad health. In relation to the general population (set at 1.0) the risk ratios for bad or very bad health were 3.77 for Bangladeshi men, 4.02 for Bangladeshi women, 2.33 for Pakistani men, 3.54 for Pakistani women, and 1.90 for Black Caribbean women (Sproston et al, 2006).
Additionally, datasets with comparable questions and data collection methods can be pooled to increase sample sizes. For example Ginn and Price (2002) pooled a number of annual GHS datasets to look at the subpopulations of divorcees. Many analysts pool a number of years from the Labour Force Survey to allow analysis of ethnic minorities (Dale et al, 2006). When data are being pooled over successive years it is vitally important to check that there are no changes in sampling design, question wording or categorisation.

Relationships within households
Many datasets collect information about all members in the household, for example most of the UK government surveys, BHPS, PSID and the SIPP. This is valuable for analysing intra-household relationships and
supports research concerned with, for example, the impact of a partner’s characteristics on women’s employment. Other levels of analysis may also be possible, for example it is often possible to identify a family unit or, in the case of the UK Family Resources Survey, a social-security benefit unit.

Combining survey analysis with qualitative research
New data dissemination and analysis tools make it easier than in the past to conduct secondary analysis as part of a mixed methods approach. There are many ways in which this might be undertaken (Bryman, 1988).
• Secondary analysis can provide evidence to help in planning a qualitative study. For example analysis of census data can help to target which geographical areas to use in an interview-based study.
• Secondary data can provide a nationally representative context for a small-scale study, such as a locality-based study or a study of divorcees or lone fathers.
• Qualitative studies are often very important in explaining relationships which are identified by quantitative analysis (for example, the low levels of economic activity amongst some groups of South Asian women in the UK; see Dale et al, 2006).
• Secondary analysis can often be used to test theories generated as the result of qualitative studies.

WHAT ARE THE METHODOLOGICAL ISSUES ASSOCIATED WITH SECONDARY ANALYSIS?
One of the earliest examples of secondary analysis is Durkheim’s classic study of suicide – routinely cited as the archetype of positivistic research in which the administrative records of suicides were treated as ‘social facts’ to be studied as ‘things, that is as realities external to the individual’ (Durkheim, 1952: 38). Whilst Durkheim’s study demonstrated that evidence on suicide rates showed relationships with particular characteristics of the individuals concerned, interpretivists pointed out that these social ‘facts’ were, in themselves, artefacts that resulted from socially mediated processes. For example, whether a suicide is recorded is influenced by legislation and coroner decisions (Atkinson, 1977). This prototype of secondary analysis not only came to be associated with positivism but also with a lack of reflection on data sources.
However, critical secondary analysts should now be aware that survey data are socially constructed artefacts of the processes that produced them. In this sense, secondary analysis is no different from other forms of social research. The results of a qualitative study based on in-depth interviews are, similarly, a product of the relationship between the subject and researcher, the researcher’s interpretation of that interaction, and the choices made over which aspects of the research to report.
However, secondary analysis is usually undertaken by researchers who did not conduct the primary data collection. For this reason they have a more distant relationship to the data and may not, therefore, fully appreciate the processes by which the data were constructed. Therefore it is vital that analysts find out as much detail as possible about how the survey was conducted and the strengths and limitations of the dataset. Axinn and Pearce (2006: 23) make a number of valuable suggestions for ways in which the secondary analyst can learn about the process of collecting the data. These include using a copy of the survey questionnaire to interview someone and then getting them to interview you, and visiting the organisation that collected the data and inspecting fieldwork notes to learn about the problems that occurred during fieldwork.
The secondary analyst is usually trying to answer a rather different research question than the primary data analyst. For example, data may have been collected by a government department to address a particular policy requirement and concepts will, therefore, reflect this. The secondary analyst needs to work through their own conceptual definitions
before starting the study rather than accepting, uncritically, those of the primary data collector. Often it is possible to combine data elements in new ways to construct the desired definitions. Where the data are less than ideal it is valuable to explain the shortcomings and seek evidence of how this may affect the results.
One of the benefits of secondary analysis is that documentation and data are available for others to use. This means that the research results can be critically assessed by other researchers and analyses replicated, perhaps using alternative assumptions or different models.

ETHICS IN SECONDARY ANALYSIS
At first sight secondary analysis may appear to bypass all the ethical issues that arise at the data collection stage of a study. The primary investigators will have been responsible for obtaining appropriate ethical approval for the study and made decisions about their procedures for informed consent and for protecting the confidentiality of the respondent. Data collection agencies take great care to ensure that procedures conform to high ethical standards. Many national statistical offices collect data under statutory requirement and, in these cases, the security of the data is governed by law. However, for all data collection agencies, maintaining the confidentiality of their respondents is of huge importance and a breach of confidentiality may have negative consequences for the respondent as well as a negative impact on the public’s willingness to participate in such studies.
Even though secondary analysts may not face these obligations at the point of data collection, they inherit responsibilities as a result of access to the data and must cooperate in ensuring the confidentiality of the data. In some cases this means that a researcher will not be able to obtain as much detailed data as wished. The relationship between the amount of detail released and the restrictions on access is discussed further in the section entitled ‘Advances in access to data and support’.
A further set of obligations arises with respect to professional conduct. Even though there are no specific guidelines on secondary research, the codes of professional organisations, whose remit covers secondary analysts, share some common features. Table 31.1 gives the common features of the codes of the British Social Research Association (SRA) (e.g. 2003), British Sociological Association (BSA) (e.g. 2002) and the Royal Statistical Society (RSS) (1993). These include maintaining awareness of necessary law and legislation, reporting the limitations of your data and method, respecting privacy and maintaining confidentiality of data.
Table 31.1 A comparison of the ethical codes of the British Sociological Association, Royal Statistical Society and Social Research Association
Conduct RSS BSA SRA
Ensure that you know the relevant law & regulations – abide by these ✓ ✓ ✓
Freely given informed consent wherever possible; be aware of power issues, explain the research fully and uses of data produced ✓ ✓
Do not produce misleading research; honestly & proportionately state problems and limitations of your data and method. Distinguish interpretation of results from opinion. Give readers enough information to assess the quality of work ✓ ✓ ✓
Seek to upgrade your own skills ✓
Only do research work that you are competent to do ✓ ✓
Respect privacy – don’t unnecessarily intrude on subjects ✓ ✓
Consider the effects of your research, including publication; minimise harm to research participants and self ✓ ✓
Maintain confidentiality of data – and inform research participants about the use to which data will be put ✓ ✓ ✓
GOOD PRACTICE
In this section we review issues around good practice in looking after data and using it in a responsible way that will produce research of high quality – both important aspects of the ethical use of data.

Looking after data
The secondary analyst is usually asked to accept the conditions laid down in some kind of agreement or licence that relates to the data to be used. Typically these require that you do not pass the data on to anyone else – unless they have already agreed to these same conditions – and not to try to identify any individual or household from the data. These are basic conditions to protect the interests of the individuals who take part in the study. However, it is also important that data are seen as a valuable commodity that needs to be treated with respect. This means that they should be stored securely and, when a project is finished, disposed of securely. For example, CDs containing data should be physically destroyed. Data files should not be left on your PC where they may be accessible to the next person who uses it. UK guidelines for good practice in storing and deleting data are available from http://www.esds.ac.uk/news/microDataHandlingandSecurity.pdf.

Use of documentation
Good practice also extends to ensuring that data are used in an appropriate way. This entails reading all the relevant documentation so that you know, for example, what population the data refer to, how the information was collected and compiled and what biases and inaccuracies there may be in the data. Good datasets will have extensive documentation.

Analysis issues: Sampling
In the case of a sample survey you need to establish how the sample was drawn; whether some people (e.g. students, the homeless, high earners) were not included in the sampling frame; what the level of response was; and, most importantly, how this varied between different groups in the population.
If a survey has a complex sampling design, information on sampling design needs to inform analysis. For example where the population is stratified and a different sampling fraction is used for different strata, the sample needs to be adjusted before it is proportionate to the population. A sample may be stratified by ethnic group, and a larger sampling fraction used for minority groups than for the majority group. Disproportionate sampling may also occur where households are sampled and only one person in each household selected for interview. In this case people in small households have a disproportionate chance of being sampled by comparison with people in big households. Usually sample design weights will be supplied with the dataset so that weighting can correct for the effects of sampling design and thus the dataset can reflect the target population. If this is not done results will at best be biased and, in the examples given above, meaningless. Most analysis packages (e.g. SPSS version 13 onwards, STATA, SAS) support the use of these kinds of survey weights.

Analysis issues: Non-response
It is becoming increasingly difficult to obtain high response rates to social surveys (Groves and Couper, 1998; Groves et al, 2002; Couper and De Leeuw, 2003). The key concern for the secondary analyst is the fact that non-respondents almost invariably differ from respondents with obvious consequences for the validity of the results of any analysis. However, it is not just the level of non-response that matters but how it is distributed. If, in a survey with a 30 percent response rate, the 70 percent of non-respondents were allocated at random from the set sample then, apart from small numbers, the low response rate would not matter. However, if, in a survey with a response rate of 80 percent, the 20 percent of non-respondents came almost entirely from the top 20 percent of earners, we would have serious concerns about drawing
inferences from the 80 percent of respondents, despite the high response rate. It is therefore important not just to establish the extent of non-response for the overall population but how non-response is distributed across population groups.
In some surveys weights are available to correct for non-response. Non-response weights give a differential weighting to each respondent depending on the likelihood of non-response for people with their characteristics. For example, we know that young single men tend not to respond to surveys and therefore young single men who were in the survey would have a higher weight than, say, older women who have higher response rates.
In longitudinal data, response rates are also of concern not just in the first survey sweep but in all subsequent sweeps. However, unlike cross-sectional surveys, valuable information is available from earlier sweeps on the characteristics of subsequent non-responders.
Generally it is accepted that using non-response weights reduces bias in the estimates and thus provides greater accuracy than not using them. However, Plewis (2004) points out that this may not be the case if the outcome is related to the sources of the non-response and the probability of response is related to the outcome. For example, if the outcome of interest is voting, where young men in inner cities are also known to have very low turnout rates, then weighting may increase the non-response error rather than reduce it.
For many of the UK government surveys post-stratification or population weights are also calculated to allow weighting to the latest Census. Typically weights are based on variables such as age, sex and local authority. Applying these weights will mean that descriptive statistics from the survey correspond with the relevant population figures on the variables used to derive the weights.
Information about the use of weights may be available through various support services. In the UK a guide to weighting has been published by the ESDS. A broader account of the impact of sample design including non-response (see below) is given in the Practical Exemplars in Analysing Surveys web resource (see http://www.napier.ac.uk/depts/fhls/peas/).

Analysis issues: Item non-response
It is often the case that questions have missing information and this may apply particularly to some questions – for example questions about income. It is important that this is not ignored. First, it is valuable to explore the dataset to find out more about those who are missing on some questions – for example, are non-respondents on income more likely to be working or not working? Young or old? These simple analyses can help to indicate whether dropping cases with missing data will bias the analysis. Kenward and Carpenter (2005) give examples where the use of imputation to correct for missing data made a radical difference to the results of an analysis on gender differences in children’s literacy. Generally, it is worthwhile to use imputation methods to deal with missing data. Guidance on the range of options for dealing with item non-response is available at: www.missingdata.org.uk.

Modelling and causality
One of the strengths of survey analysis is the ability to conduct multivariate analysis (e.g. regression analysis) that includes all the variables which theory suggests are influential in producing a given outcome. This allows us to assess the effect of, say, qualifications on outcomes such as earnings whilst controlling for the individual’s age, gender, full-time or part-time working; size of organisation; work experience; and other related factors. Almost always such models have an implicit or explicit assumption of causality. However, Cox and Wermuth (2001: 70) provide a very important reminder that single studies – whether based on cross-sectional or longitudinal data – cannot bear the weight needed for assumptions of causality. They emphasise the need for caution in attributing causality and the importance of having an a priori explanation or ‘causal narrative’, rather than a retrospective explanation. They also warn
against reading too much into small effects, even if statistically significant, and emphasise the importance of replication to see whether the explanatory variable in question is found repeatedly in independent studies.

ADVANCES IN ACCESS TO DATA AND SUPPORT
The technical developments of the web and remote access to data are mirrored in moves towards distributed services: that is, a service that is located at more than one physical site. The UK ESDS is one such service with specialist functions run by teams at two universities which are 200 miles apart. This geographical distance should not be apparent to users who access a single website and who are supported by a joined-up helpdesk. However the ability to run a distributed service means that it can benefit from specialist groups irrespective of their geographical location. This has the potential to add more expertise to the service than would be possible if all staff were required to be in the same institution. In this sense ESDS might be considered to set a new standard in data services.
A second area of development builds on the potential for networked technology to provide linkages without the constraints of geography. Grid technology moves beyond the internet and provides the means for users to benefit from increased data storage facilities and processing power. The Grid offers the potential for researchers to link data from different sources, held at a range of locations (perhaps still within the control of the data collector), with prescribed access conditions, and then to analyse these using data processing power from one or more servers. In the UK, the academic community has made some investment in pilot projects to establish areas of potential development. Grid technology has been used to provide virtual meeting spaces called ‘access grid nodes’ which have been used for meetings between partners in the ESDS distributed data service. From a secondary analysis perspective, the potential of the grid might be to provide a controlled environment within which disclosive data could be analysed without the need to distribute the microdata to users. The processing capacity can also be harnessed for processing very large datasets and conducting very power-hungry analyses (Smith, 2004). These and other functions are being championed by the UK National Centre for E-social Science (www.ncess.ac.uk).
The increased power of the web, including its search facilities, and the increased level of data availability in general, have led to growing concerns over ensuring the confidentiality of data for secondary analysis. This applies particularly to microdata where a great deal of information about an individual may be contained in a single record. By contrast, it is much harder to identify someone using aggregate statistics (e.g. a table from the Census).
We can define two interacting dimensions when considering access to data – the level of safety associated with the dataset and the level of safety associated with the access setting. The level of safety associated with the dataset will depend heavily on the degree of detail in the data; the proportion of the population in the sample and the ease of identifying the data – either through matching or spontaneous recognition. Thus a small sample with very restricted individual detail and little geographical information will be much ‘safer’ than a very large sample containing detailed individual information (e.g. occupation, educational qualification, ethnic group) and also information on the locality of residence. The level of safety associated with the access setting will range from a safe setting within a statistical office at one extreme, to unrestricted access to data by any user, at the other extreme.
The two dimensions interact so that, at one extreme, if the data are judged to be entirely safe, then the access arrangements can be very open. This is exemplified by the Public Use Microdata Files produced by the US Bureau of the Census, which can be downloaded without restriction from the website of the US Bureau of the Census. These files are samples – 1 percent and 5 percent – where
the amount of both individual detail and geographical information has been restricted to preserve confidentiality (Census Bureau, 2005).
By contrast, if the data are very detailed and/or contain information that could be readily used to identify someone, then greater safety needs to be built into the access conditions. An example is the ONS Longitudinal Study that contains data with a great deal of individual and geographical detail drawn from the census and from vital events (e.g. birth and death records). For this dataset access is only available within a secure setting inside the Office for National Statistics. An alternative is a remote access facility such as that used for the Luxembourg Income Study. Here researchers send requests for analysis (in the form of SPSS or STATA programs) which are run and checked before the non-disclosive results are emailed to the researcher.
In practice most data are made available under some kind of licence whereby the user agrees to a set of conditions designed to ensure the confidentiality of the data. However, as researchers need more detailed data, for example including information on locality, or more datasets are produced by linking administrative records, then they will be subject to tighter controls.

NEW DEVELOPMENTS IN METHODS
Developments in statistical analysis now provide more opportunities for building models that reflect some of the complexities of social life – for example, analysis of children’s attainment in school. Multilevel models allow one to define children by the class they are in (and the characteristics of their class teachers); the school they attend (and the characteristics of the school); and also the catchment area of the school. All these different levels are known to affect a child’s attainment and their impact can be modelled. Similarly, multilevel models can improve analyses of unemployment, for example, by allowing information about the local labour market to be included in the model. Software for multilevel modelling is now readily available (e.g. MLwiN, STATA, SAS) and there is abundant provision of training to help the new user (see www.ncrm.ac.uk/database).
Structural equation modelling (SEM) allows much greater flexibility in defining models than standard multiple regression. It introduces the concept of the latent variable which has multiple indicators and can correct for some of the measurement error in standard regression analysis. Models tested may have complex causal pathways, often with a two-way direction of causality. SEM allows specific pathways in the model to be tested as well as an overall test of a model. As for multilevel modelling, software for using SEM is becoming much more widely available and, for both, there are a growing number of courses and on-line resources.
Other developments in secondary analysis relate to the linkage of additional data sources to supplement or augment individual level records collected by a survey. The simplest example is where aggregate information about a respondent’s locality is attached to that individual (for example, area-level statistics from the census) and this can then be used to explain variation at the level of the locality in a multilevel model. In addition, external data may be matched to an individual, for example tax returns may be used to provide accurate information on earnings. This has been introduced in the Canadian Survey of Labour and Income Dynamics (SLID), where respondents can choose between providing detailed information on income or allowing this to be obtained from their tax return. This uses an exact matching method where it is vitally important to have identical keys in both data sources. An alternative that is used where income data cannot be obtained from the respondent is to add an estimated value to each individual. For example, a survey which does contain the required income information may be used to identify a set of explanatory variables that predict income well. If these explanatory variables are also contained in the dataset without income, then they can be used as a basis for predicting the expected income of each individual.
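The estimation approach just described can be illustrated with a minimal sketch. It is not a method taken from any of the studies cited: it assumes the simplest possible predictor, the mean income within each combination of explanatory variables in the survey that records income, and it uses invented records. The variable names (qualification, full_time), the function names and all the figures are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_cell_means(donor_rows, predictors, target):
    """Mean of `target` in the donor survey for each combination of predictor values."""
    cells = defaultdict(list)
    for row in donor_rows:
        key = tuple(row[p] for p in predictors)
        cells[key].append(row[target])
    return {key: mean(values) for key, values in cells.items()}


def predict_income(recipient_rows, predictors, cell_means, fallback=None):
    """Attach an expected income to each recipient record.

    Records whose predictor combination never occurs in the donor survey
    receive `fallback` rather than a guessed value.
    """
    predicted = []
    for row in recipient_rows:
        key = tuple(row[p] for p in predictors)
        predicted.append({**row, "expected_income": cell_means.get(key, fallback)})
    return predicted


# Hypothetical donor survey: contains income and the explanatory variables.
donor = [
    {"qualification": "degree", "full_time": True, "income": 42000},
    {"qualification": "degree", "full_time": True, "income": 38000},
    {"qualification": "degree", "full_time": False, "income": 20000},
    {"qualification": "none", "full_time": True, "income": 22000},
    {"qualification": "none", "full_time": True, "income": 26000},
]

# Hypothetical recipient dataset: same explanatory variables, no income.
recipient = [
    {"id": 1, "qualification": "degree", "full_time": True},
    {"id": 2, "qualification": "none", "full_time": True},
    {"id": 3, "qualification": "none", "full_time": False},
]

PREDICTORS = ["qualification", "full_time"]
cell_means = fit_cell_means(donor, PREDICTORS, "income")
predicted = predict_income(recipient, PREDICTORS, cell_means)

for row in predicted:
    print(row["id"], row["expected_income"])
```

In practice a regression model with several explanatory variables would usually replace the cell means, and the comparability of question wording and categories between the two datasets would need to be checked first. The third record shows why: its combination of characteristics does not occur in the donor survey, so no expected income is assigned.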
CONCLUSION

This chapter has demonstrated the breadth of high-quality data available from secondary sources and some of the exciting research areas that are opened up through secondary analysis. The increased ease of locating and accessing the data (and the accompanying documentation) means that secondary analysis can be readily used in its own right and also to complement other forms of research. It is very cost-effective in terms of both time and money and may therefore be particularly valuable to graduate students or to researchers with very limited funding. However, we have also argued that, despite not having participated in the first-hand collection of the data, the analyst nonetheless has an obligation to ensure that the data are used responsibly and the data subjects’ confidentiality is protected. Finally, exciting new developments in access to data, data support and statistical methods are enhancing opportunities and potential for secondary analysis.

32
Conducting a Meta-Analysis
Erika A. Patall and Harris Cooper
A literature review typically summarizes results of past studies, suggests potential reasons for inconsistencies in past research findings, and directs future investigations. Researchers often use a narrative approach to summarize and integrate research on a specific topic. The traditional narrative reviewer identifies articles relevant to the topic of interest, examines the results of each article to see whether the hypothesis was supported, and provides an overall conclusion.

Traditional narrative reviews have been criticized because, although they can provide a meticulous list of multiple tests of a hypothesis, they often fail to fully and accurately integrate the conclusions contained in them (Hunt, 1997). Narrative reviews are prone to allowing the biases of the reviewer to enter into conclusions, because information in the original studies can be discarded or improperly weighted.

More recently, systematic research syntheses that include meta-analyses have taken the place of purely narrative reviews of empirical literature. Meta-analysis is ‘the statistical synthesis of the data from separate but similar, i.e. comparable studies, leading to a quantitative summary of the pooled results’ (Last, 2001). Even though meta-analysis has the same goals as the traditional narrative review, many limitations of the narrative review can be addressed by using statistical procedures to combine the results of previous studies. For example, one advantage of quantitative synthesis was demonstrated empirically in a study by Cooper and Rosenthal (1980). Faculty members and graduate students were asked to draw summary conclusions using either a meta-analytic or narrative approach about studies that tested whether females showed greater persistence at tasks than males. Results showed that narrative review procedures led to inaccurate or imprecise characterizations of the cumulative research results; in particular, reviewers using a qualitative approach underestimated the size of the effect.

In this chapter we provide a framework for understanding meta-analysis. First, major meta-analytic procedures are described. This is followed by a discussion of the major challenges that face the meta-analyst and some new directions in the development of meta-analytic methods.
PROCEDURES OF META-ANALYSIS

Much like primary research, a rigorous research synthesis involves several stages, including problem formulation, data collection or the literature search, data evaluation, analysis and interpretation, and public presentation (Cooper, 1998). A detailed description of each stage of the research synthesis process is beyond the scope of this chapter. Rather, we focus on the statistical analysis and interpretation stage. For a full discussion of methods involved at each stage of a research synthesis, the interested reader may refer to Cooper’s Synthesizing Research (1998) or Cooper and Hedges’ The Handbook of Research Synthesis (1994).

As we begin, it is important to note three assumptions crucial to the validity of the conclusions of a meta-analysis. First, each finding used in the calculation of average effect sizes and their associated statistics is assumed to be testing the same relationship. Second, individual findings used in a cumulative analysis must be independent from each other. Finally, a meta-analysis is only as good as the primary research it is cumulating. Therefore, the meta-analyst must believe that the primary researchers made valid assumptions when they computed the results of their statistical tests.

To make some of the procedures involved in meta-analysis more concrete, we will use a fictional example of a research synthesis. This synthesis attempts to answer the question, ‘What is the impact of playing pool with friends on well-being?’ We will assume that the (hypothetical) synthesist was able to locate eight studies, each of which randomly assigned participants to play pool with friends or to a control condition and then measured well-being. Information relevant to our fictional example can be found in Table 32.1.

Estimating effect sizes

Often in a meta-analysis, the questions of greatest importance are ‘Does playing pool with friends have an effect on well-being?’ and ‘How much of an effect does playing pool with friends have on well-being?’ To answer these questions, meta-analysts will (a) calculate an effect size for the outcomes of hypothesis tests in every study; (b) average these effect sizes across hypothesis tests to estimate general magnitudes of effect and calculate confidence intervals as a test of the null hypothesis; and (c) compare effect sizes to discover if variations in outcomes exist and, if so, what features of comparisons might account for them.

Cohen (1988) defined an effect size as ‘the degree to which the phenomenon is present in the population, or the degree to which the null hypothesis is false’ (pp. 9–10). There are many different metrics to describe an effect size. Generally, each metric is associated with particular research designs. We discuss three primary metrics to describe effect sizes. Even though there are others, we limit our discussion to those most amenable to single degree of freedom tests involving combinations of variables that are continuous or dichotomous.

Table 32.1 The effect of playing pool with friends on well-being

Study  Sex     Treatment group  Comparison group  Direction of  Effect size  Probability
               sample size      sample size       the effect    (d)          (p)
1      Female  23               25                positive      0.90         0.003
2      Male    42               46                positive      0.51         0.019
3      Male    18               18                positive      0.13         0.595
4      Female  32               45                negative      0.18         0.437
7      Male    66               64                negative      0.35         0.048
8      Male    27               27                positive      0.71         0.011

The d-index

Cohen’s d-index¹ is a scale-free measure of the difference between two group means. It is used when one variable in the relation is dichotomous and the other is continuous. Calculating the basic d-index for any comparison involves dividing the difference between the two group means by either their pooled standard deviation or the standard deviation of the control group. The result is a measure of the difference between the two group means expressed in terms of their common standard deviation. The formula is as follows:

$$d = \frac{\bar{X}_1 - \bar{X}_2}{s_p}$$

where $\bar{X}_1$ and $\bar{X}_2$ represent the two group means and $s_p$ is the pooled standard deviation defined as:

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}}$$

where $n_1$ and $n_2$ represent the number of subjects in each group and $s_1$ and $s_2$ represent the standard deviation of each of the groups. The d-index is scale free: the standard deviation adjustment in the denominator of the formula means that studies using different measurement scales can be compared or combined. Reporting of effect sizes, such as the d-index, in primary research is not yet universal.

To illustrate how the d-index should be interpreted, Figure 32.1 presents three hypothetical d-indexes. Figure 32.1A presents a null relationship; d = 0 and there is no difference between participants randomly assigned to play pool with friends and participants who do not play pool with friends. In Figure 32.1B, the participants who play pool have an outcome score that is four-tenths of a standard deviation above the control group. Here d = 0.40. In Figure 32.1C, d = 0.85, indicating an even greater separation between the two group means.

[Figure 32.1 Examples of differences in standard deviation units. Panels: A. d = 0; B. d = 0.40; C. d = 0.85. Each panel compares the No Pool Group and the Pool Group on Well-Being.]

In many instances, synthesists will find that primary researchers do not report the means and standard deviations of the separate groups. For such cases, Rosenthal (1984, 1994) has provided a computational formula for the d-index that does not require the meta-analyst to have means and standard deviations. The formula is as follows:

$$d = \frac{2t}{\sqrt{df_{error}}}$$

where $t$ represents the value of the t-test for the associated comparison and $df_{error}$ represents the error degrees of freedom associated with the t-test. In fact, the d-index can be computed from a variety of statistical data. For a complete listing of algebraically equivalent formulas that can be used to compute an effect size from various statistical information, the interested reader should see Lipsey and Wilson’s (2001) Practical Meta-Analysis.

The r-index

Another effect size metric is the r-index, or the Pearson product-moment correlation coefficient. Typically, it is used to measure
the degree of linear relation between two variables. The correlation coefficient is familiar to most researchers and is most appropriate when describing the relationship between two continuous variables.

Information such as the variances and covariances necessary to calculate a correlation coefficient is rarely provided in primary research reports. Luckily, most researchers provide r-indexes in cases where they apply. When only the t-value associated with the r-index is given, the r-index can be calculated with the following formula:

$$r = \sqrt{\frac{t^2}{t^2 + df_{error}}}$$

where all terms are defined as before. However, it should be noted that this formula will always produce a positive value. Consequently, the researcher should seek additional information in the primary research report, such as a verbal description of the relationship, which would allow the direction of the relationship to be determined.

The odds ratio

The odds ratio is applicable when both variables are dichotomous and findings are presented as frequencies or proportions. This measure of effect is used most in medical sciences, in which the researcher is often interested in the effect of a treatment on mortality or the appearance or disappearance of disease. It also appears frequently in studies of educational interventions when the outcome of interest is drop-out or retention rates, or in criminal justice studies where the outcome is recidivism. Take, for example, a case in which we are interested in whether playing pool with friends led to subsequent arrest. Suppose that the meta-analyst came across a study in which 200 people either played pool with friends or did not, and evidence of arrests later that night was then examined. The results of the study could have looked like the fictional data presented in Table 32.2.

Table 32.2 An example of odds ratio estimation

              Pool playing  Control
Arrested      a = 75        b = 60
Not arrested  c = 25        d = 40

First, the odds that a participant was arrested must be determined for each condition. When participants played pool, the odds of arrest were 3 to 1 (75 to 25). When participants did something else, the odds of arrest were 1.5 to 1 (60 to 40). The meta-analyst then simply forms the ratio of the playing-pool odds over the control activities odds. In this case, the odds ratio is 2, meaning the odds of arrest are twice as large in the pool-playing condition as in the control condition. The odds ratio can also be calculated by dividing the product of the main diagonal elements by the product of the off-diagonal elements. In this example,

$$OR = \frac{ad}{bc} = \frac{75 \times 40}{60 \times 25} = 2$$

where all terms are defined in Table 32.2.

Identifying independent samples

A statistical problem arises when a single study contains multiple effect size estimates taken on the same sample of participants. There are several approaches meta-analysts use to handle such dependent effect sizes. Some treat each effect size as independent, regardless of the number of effect sizes that come from the same sample of people. The strength of this technique is that it does not lose any of the within-study information regarding potential moderators. However, this strategy violates the assumption that the estimates are independent. This may cause the standard error associated with the overall effect to be underestimated and the robustness of the effect to be exaggerated. Further, the results of studies will not be weighted equally in any overall conclusion about results. Rather, each study will contribute to the overall effect in proportion to the number of statistical tests it contains.

Other meta-analysts use the study as the unit of analysis. They calculate the mean
effect size, or take the median result, or identify a preferred outcome measure, and use this value to represent the study. This strategy ensures that the assumption of independence is not violated and that each study contributes equally to the overall effect. However, some within-study information may be lost in this approach.

Sophisticated statistical models also have been suggested as a solution to the problem of dependent effect size estimates (Gleser & Olkin, 1994; Raudenbush et al., 1988) but due to their complexity they are still rarely found in practice.

A compromise solution is to use a shifting unit of analysis (Cooper, 1998). In this procedure, each effect size is coded into the dataset as if it were an independent estimate. For example, if a study of playing pool used both the Satisfaction with Life Scale (Diener et al., 1985) and the Subjective Happiness Scale (SHS) (Lyubomirsky & Lepper, 1999) to measure well-being, two separate d-indexes would be calculated. In the shifting unit of analysis approach, for estimating the overall relation between playing pool with friends and well-being, statistical independence is maintained by averaging these two d-indexes prior to entry into the analysis, so that the study only contributed one effect size. However, in an analysis that examined the effect of measurement characteristics on effect size, each sample would contribute one estimate to the effect size for life satisfaction measures and one to the effect size for happiness measures. This shifting unit of analysis approach retains as much data as possible from each study while holding to a minimum violations of the assumption that data points are independent.

Averaging effect sizes

The most pivotal outcomes of a meta-analysis are the average effect sizes and measures of dispersion that accompany them. Both unweighted and weighted procedures are typically used to calculate average effect sizes across comparisons. In the unweighted procedure, each effect size is given equal weight in calculating the average effect. In the weighted procedure, each independent effect size is first multiplied by the inverse of its variance and the sum of these products is then divided by the sum of the inverses. The weighting procedure is generally preferred because it gives greater weight to effect sizes based on larger samples and larger samples provide more precise estimates of the population value. Also, confidence intervals are calculated for weighted average d-indexes and used as a test of the null hypothesis that no relation exists in the population. Hedges and Olkin (1985), Shadish and Haddock (1994), and Lipsey and Wilson (2001) provide procedures for calculating the appropriate weights and confidence intervals.

For the d-index this procedure requires the meta-analyst to calculate a weighting factor, $w_i$, which is the inverse of the variance associated with each d-index estimate:

$$w_i = \frac{2(n_{i1} + n_{i2})\,n_{i1} n_{i2}}{2(n_{i1} + n_{i2})^2 + n_{i1} n_{i2} d_i^2}$$

where $n_{i1}$ and $n_{i2}$ represent the number of data points in Group 1 and Group 2 of the comparison and $d_i$ represents the d-index of the comparison under consideration. Table 32.3 presents the group sample sizes, d-indexes, and $w_i$ associated with each comparison from our fictional pool-playing and well-being example. The next step in obtaining a weighted average effect size involves multiplying each d-index by its associated weight and dividing the sum of these products by the sum of the weights. The formula is:

$$d_\bullet = \frac{\sum_{i=1}^{N} d_i w_i}{\sum_{i=1}^{N} w_i}$$

where all terms are defined as before. Table 32.3 shows the average weighted d-index for the eight comparisons was found to be d = 0.21.

Finally, the confidence interval around the average effect size estimate can be calculated.
Table 32.3 An example of d-index estimation and tests of homogeneity

Finding  n_i1  n_i2  d_i    w_i     d_i^2 w_i  d_i w_i  Grouping
1        23    25     0.90   10.88   8.81        9.79   Female
2        42    46     0.51   21.26   5.53       10.84   Male
3        18    18     0.13   15.96   0.28        2.12   Male
4        32    45    −0.18   18.62   0.60       −3.35   Female
5        36    24     0.90   13.12  10.63       11.81   Female
6        48    48     0.16   47.32   1.16        7.40   Female
7        66    64    −0.35   32.00   3.92      −11.20   Male
8        27    27     0.71   12.70   6.40        9.02   Male
Total    292   297    2.78  171.89  37.34       36.44

$$d_\bullet = \frac{36.44}{171.89} = 0.21$$

$$CI_{d_\bullet 95\%} = 0.21 \pm 1.96\sqrt{\frac{1}{171.89}} = 0.21 \pm 0.15$$

$$Q_t = 37.34 - \frac{36.44^2}{171.89} = 29.62$$

$$Q_w = 13.89 + 14.72 = 28.61$$

$$Q_b = 29.62 - 28.61 = 1.01$$

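
The summary rows of Table 32.3 can be approximately reproduced from its printed columns. The sketch below is an illustration under stated assumptions, not the authors' own computation: it takes the d_i and w_i columns exactly as printed (the function name `fixed_effect_summary` is hypothetical), so the results agree with the table only to within rounding.

```python
import math

# d_i and w_i columns copied as printed in Table 32.3 (fictional data).
d = [0.90, 0.51, 0.13, -0.18, 0.90, 0.16, -0.35, 0.71]
w = [10.88, 21.26, 15.96, 18.62, 13.12, 47.32, 32.00, 12.70]


def fixed_effect_summary(effects, weights, z_crit=1.96):
    """Weighted mean effect, confidence-interval half-width, and Q_t."""
    sum_w = sum(weights)
    sum_wd = sum(wi * di for wi, di in zip(weights, effects))
    sum_wd2 = sum(wi * di**2 for wi, di in zip(weights, effects))
    mean = sum_wd / sum_w                       # weighted average effect size
    half_width = z_crit * math.sqrt(1.0 / sum_w)  # 1.96 * sqrt(1 / sum of weights)
    q_total = sum_wd2 - sum_wd**2 / sum_w       # homogeneity statistic Q_t
    return mean, half_width, q_total


mean_d, half_width, q_t = fixed_effect_summary(d, w)
print(f"{mean_d:.2f} +/- {half_width:.2f}, Q_t = {q_t:.1f}")
# prints: 0.21 +/- 0.15, Q_t = 29.6
```

Because the function only needs a list of effects and a list of weights, the same sketch would apply to Fisher-transformed correlations weighted by n_i − 3 or to logged odds ratios with their inverse-variance weights, following the parallel formulas in this chapter.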
First, the inverse of the sum of the $w_i$s is found. Then, the square root of this variance is multiplied by the z score associated with the confidence interval of interest. Thus, the formula for a 95% confidence interval would be:

$$CI_{d_\bullet 95\%} = d_\bullet \pm 1.96\sqrt{\frac{1}{\sum_{i=1}^{N} w_i}}$$

where all terms are defined as before. The 95% confidence interval for the eight pool-playing comparisons includes values of the d-index 0.15 above and below the average d-index. Thus, we expect 95% of estimates of this effect to fall between d = 0.06 and d = 0.36. Note that the interval does not contain the value d = 0. It is this information that can be taken as a test of the null hypothesis that no relation exists in the population. In this example, we would reject the null hypothesis that there is no difference in well-being between people who play pool with friends and those who do not.

A parallel procedure is conducted to find the average weighted r-index and confidence interval. However, because the sampling distribution for r is not symmetrical except when ρ equals 0, each r is first transformed to its corresponding z score, $z_i$ (Hedges & Olkin, 1985; Lipsey & Wilson, 2001; Rosenthal, 1994), using the following formula:

$$z_i = \frac{1}{2}\log_e\left(\frac{1 + r}{1 - r}\right)$$

where $r$ is the correlation coefficient and $\log_e$ is the natural logarithm. Next, the following formula is applied to compute the average weighted z:

$$z_\bullet = \frac{\sum_{i=1}^{N} (n_i - 3) z_i}{\sum_{i=1}^{N} (n_i - 3)}$$

where $n_i$ represents the total sample size for the ith comparison and all other terms are defined as before. For the confidence interval, the formula is:

$$CI_{z_\bullet 95\%} = z_\bullet \pm \frac{1.96}{\sqrt{\sum_{i=1}^{N} (n_i - 3)}}$$

where all terms are defined as before. Finally, to present results, z is transformed back to the original r metric using the inverse of Fisher’s z to r transformation (Lipsey & Wilson, 2001):

$$r = \frac{e^{2z_i} - 1}{e^{2z_i} + 1}$$
where $e$ is the base of the natural logarithm (approximately 2.718) and all other terms are defined as before.

Like the correlation coefficient, the odds ratio must also be transformed by taking the natural logarithm (Haddock et al., 1998; Lipsey & Wilson, 2001):

$$LOR = \log_e(OR)$$

Next, a weighting factor, $w_i$, which is the inverse of the variance associated with each logged odds ratio, is calculated using the following formula:

$$w_i = \frac{abcd}{ab(c + d) + cd(a + b)}$$

where all terms are defined in Table 32.2 illustrating the odds of being arrested after playing pool with friends.

The next step in obtaining a weighted average effect size involves multiplying each logged odds ratio by its associated weight and dividing the sum of these products by the sum of the weights. The formula to calculate the weighted average logged odds ratio is:

$$LOR_\bullet = \frac{\sum_{i=1}^{N} LOR_i w_i}{\sum_{i=1}^{N} w_i}$$

where $LOR_i$ represents the logged odds ratio for the ith comparison and all other terms are defined as before. For the 95% confidence interval, the formula is:

$$CI_{LOR_\bullet 95\%} = LOR_\bullet \pm 1.96\sqrt{\frac{1}{\sum_{i=1}^{N} w_i}}$$

where all terms are as defined before. Finally, these summary statistics can be converted back to the original odds ratio metric by taking the antilogarithms:

$$OR = e^{LOR}$$

It should be noted that if any of the cell frequencies equal zero, 0.5 should be added to every cell. Even though this solution solves the problem of having cell frequencies equal to zero, this strategy will bias the estimate such that the strength of the relationship will be slightly underestimated (Fleiss, 1994). When only a few contingency tables contain zeros, this solution is acceptable. However, if there are many cases in which cell frequencies are equal to zero, the Mantel-Haenszel method of combining odds ratios should be used (Hauck, 1989). The interested reader may refer to Lipsey and Wilson (2001) or Shadish and Haddock (1994) for a full discussion of this method.

Models of error

Another aspect of conducting a meta-analysis that has recently received considerable attention involves the decision about whether a fixed effects or random effects model of error underlies the generation of study outcomes. In a fixed effects model, all studies are assumed to be drawn from a common population and are therefore estimating a common population effect. As such, variance in effect sizes is assumed to reflect only sampling error, that is, error solely due to participant differences. This type of error is the only error taken into account using the procedures just described for weighting effect sizes by sample size. However, sometimes other features of studies can be viewed as random influences. For example, studies that look at the impact of pool playing on well-being might vary in the types of pool halls in which the studies were conducted, in the length of play, and in the game of pool being played. In this case, it may be most appropriate to consider pool halls as randomly sampled from all pool halls and pool games randomly sampled from all games. That is, in a random-effect analysis, study-level variance is assumed to be present as an additional source of random influence.

The question each meta-analyst must ask is whether the effect sizes in a dataset are affected by a large number of these study-level random influences. If it is the case that the meta-analyst suspects a larger number of these additional sources of random error
in effect sizes then a random effects model is most appropriate in order to take these sources of variance into account. If the meta-analyst suspects that the data are most likely little affected by other sources of random variance, then a fixed effects model can be applied. Alternatively, Hedges and Vevea (1998; p. 3) state that fixed-effect models of error are most appropriate when the goal of the research is ‘to make inferences only about the effect size parameters in the set of studies that are observed (or a set of studies identical to the observed studies except for uncertainty associated with the sampling of subjects).’ A further statistical consideration is that in the search for moderators, fixed effect models may seriously underestimate error variance and random effects models may seriously overestimate error variance when their assumptions are violated (Overton, 1998).

In view of these competing sets of concerns, we recommend that the meta-analyst consider applying both models (e.g. Cooper et al., 2006). Specifically, all analyses could be conducted twice, once employing fixed effect assumptions and once using random effect assumptions. Differences in results based on which set of assumptions is used can be incorporated into the interpretation and discussion of findings.

Formulas to calculate random effects estimates of the mean effect size, confidence intervals, and homogeneity statistics are complex and involve a two-stage process. As such, the interested reader should refer to Hedges and Olkin (1985), Raudenbush (1994), and Lipsey and Wilson (2001) for a full discussion of random effects computation. In addition, several statistical packages have recently been developed specifically for meta-analysis that allow the meta-analyst to easily conduct analyses using both fixed and random effects assumptions (e.g. Comprehensive Meta-Analysis; Borenstein et al., 2005). For the remainder of this chapter, random effect estimates will be presented for our running example, although formulas and computations will not be shown.

In our fictional meta-analysis of the effect of playing pool with friends on well-being, the effect size estimate under a fixed effects model was d = 0.21 with a 95% confidence interval from 0.06 to 0.36. However, when a random effects model was used, the estimate was d = 0.31 with a 95% confidence interval from −0.01 to 0.63. Note that the mean estimate of d changes using the random-effect error model, because of a changed (lesser) effect of weighting studies by sample size on the result. Note also that in the random-effects error model, the variance around the mean estimate increases and the combined result of pool-playing studies no longer rejects the null hypothesis. In this case then, caution must be taken when considering the interpretation of the result that playing pool with friends has a positive effect on well-being, given that the effect is statistically different from zero only when a fixed-effects model is assumed.

Homogeneity of effect sizes

In addition to the confidence interval as a measure of dispersion, meta-analysts usually carry out homogeneity analyses. Homogeneity analyses allow the meta-analyst to explore whether effect sizes vary from one study to the next. A homogeneity analysis compares the amount of variance in an observed set of effect sizes with the amount of variance that would be expected by sampling error alone and provides a calculation of how probable it is that the variance exhibited by the effect sizes would be observed if only sampling error was making them different. If there is greater variation in effects than would be expected by chance, then the meta-analyst can begin the process of examining moderators of comparison outcomes. If the observed variance is not significantly different from that expected by sampling error alone, many statisticians advise the meta-analyst to stop the analysis there and not look for moderators. After all, chance is the most parsimonious explanation for the variation in effect sizes. We recommend that the meta-analyst may search for moderators in the absence of a statistically significant homogeneity analysis if there are good theoretical reasons for doing so.
An alternative approach to examining if effect sizes vary across studies also compares the observed variation in obtained effect sizes with the variation expected due to sampling error, that is, the expected variance in effect sizes given that all observed effects are estimating the same underlying population value (Hunter & Schmidt, 2004). However, a formal statistical test of the difference between these two values is typically not carried out. Rather, the meta-analyst adopts a critical value for the ratio of observed-to-expected variance to use as a means for rejecting the null hypothesis. In this approach, the meta-analyst might also adjust effect sizes to account for methodological artifacts such as sampling error, range restrictions, or unreliability of measurements. This method has been applied most often in the areas of industrial and organizational psychology. However, given the more widespread use of the inverse-variance method deriving from Hedges and Olkin (1985), the techniques described here follow this perspective.

To test whether a set of d-indexes is homogeneous, the synthesist must calculate a statistic that Hedges and Olkin (1985) called Qt.

$$Q_t = \sum_{i=1}^{N} w_i d_i^2 - \frac{\left(\sum_{i=1}^{N} w_i d_i\right)^2}{\sum_{i=1}^{N} w_i}$$

The Q statistic has a chi-square distribution with N − 1 degrees of freedom, or one less than the number of d-indexes. If the obtained value of Qt is greater than the critical value for the upper tail of a chi-square at the chosen level of significance, the meta-analyst rejects the hypothesis that the variance in effect sizes was produced by sampling error alone.

In our fictional meta-analysis of the effect of playing pool with friends on well-being, we find a highly significant homogeneity statistic Q(7) = 29.62, p < 0.001 (please see Table 32.3 for calculations). This suggests that we should reject the hypothesis that the d-indexes are all estimating the same underlying population value, or that sampling error alone was responsible for the variation in effects. We would continue our analysis of the effect by looking for variables that may potentially moderate the effect of playing pool with friends on well-being.

An analogous procedure is followed for performing a homogeneity analysis on transformed r-indexes and odds ratios. The following formula illustrates how Qt is calculated using the z transformation of r.

$$Q_t = \sum_{i=1}^{N} (n_i - 3) z_i^2 - \frac{\left[\sum_{i=1}^{N} (n_i - 3) z_i\right]^2}{\sum_{i=1}^{N} (n_i - 3)}$$

The following formula illustrates how Qt is calculated using the transformed log-odds ratio.

$$Q_t = \sum_{i=1}^{N} w_i \,\mathrm{LOR}_i^2 - \frac{\left(\sum_{i=1}^{N} w_i \,\mathrm{LOR}_i\right)^2}{\sum_{i=1}^{N} w_i}$$

All terms are defined as before. Just as with the d-index, these Q statistics are compared to a chi-square distribution with N − 1 degrees of freedom. If the obtained value of Qt is greater than the critical value of a chi-square at the chosen level of significance, the meta-analyst rejects the hypothesis that the variance in effect sizes was produced by sampling error alone.

Testing for moderators of effect sizes

The search for why the outcomes of hypothesis tests differ is often the most interesting and informative part of conducting a meta-analysis. As previously suggested, homogeneity analysis allows the meta-analyst to test whether sampling error alone accounts for variation in effect sizes or whether features of studies, samples, treatment designs, or outcome measures also play a role. The meta-analyst calculates average effect sizes for
subsets of studies, comparing the average effect sizes for different methods, types of programs, outcome measures, and participants and compares these to determine if they provide insight into what influences the strength and/or direction of the relationship. In fact, a major strength of meta-analysis is that the meta-analyst can ask questions about variables that moderate outcomes even if no individual study has included the moderator variable. In our example, we can ask whether the relationship between playing pool with friends and well-being differs for females compared to males, even if no single study has included both groups. The results of such a comparison of average effect sizes can suggest whether gender would be important to look at in future research.

The procedure to test whether a methodological or conceptual distinction between comparisons explains variance in effect sizes involves several steps. First, a Qt statistic is calculated using the formula just presented. Then, a Q statistic is calculated separately for each subgroup of studies. Then the values of these Q statistics are summed to form a value called Qw. This value is then subtracted from Qt to obtain Qb.

$$Q_b = Q_t - Q_w$$

This Qb statistic is used to test whether the average effects from the groupings of studies are homogeneous. It is compared to a chi-square table using degrees of freedom one less than the number of groupings. If Qb exceeds the critical value, then the grouping variable is a significant contributor to variance in effect sizes and remains a plausible moderator of effect. This test is analogous to conducting an analysis of variance in that a significant Qb indicates that at least one group mean differs from the others.

We use our example, illustrated in Table 32.3, to demonstrate how a search for moderators of outcomes might proceed. Let us compare effect sizes calculated from female samples with effect sizes calculated from male samples, given in the last column. First, we find that using a fixed-effect error model the effect of playing pool with friends has a significant impact on well-being for females, d = 0.29 (95% CI = 0.08/0.79) but not males, d = 0.13 (95% CI = −0.09/0.35). As shown in Table 32.3, the Qt statistic for the eight studies was 29.62. The Qw statistic for females was 13.89 and for males was 14.72 and the total Qw for both groups is 28.61. From here, the Qb statistic comparing males to females can be calculated, Qb(1) = 29.62 − 28.61 = 1.01, p = 0.32. This result was not significant with 1 degree of freedom. Using a random-effect error model, playing pool with friends does not have a significant effect on either females, d = 0.41 (95% CI = −0.08/0.89), or males, d = 0.23 (95% CI = −0.26/0.72). Further, the Qb statistic comparing males to females under random effects assumptions indicated that there was not a significant difference in the average weighted d-index between the groups, Qb(1) = 0.26, p = 0.61.

In this way, the meta-analyst employs a formal means for testing whether different features of studies explain variation in their outcomes. This is an extension of the same rules of inference required of primary researchers. If reliable differences do exist, the average effect sizes corresponding to these differences will take on added meaning and will help the meta-analyst to guide future research or make policy recommendations. Further, in meta-analysis, tests of moderation may allow for the examination of certain forms of research bias. For example, moderator tests can be employed to explore whether stronger effects are more likely to come from certain researchers or whether allegiance effects in clinical research are present. Specifically, allegiance effects can be examined by using the preference researchers have for a particular treatment over others as a grouping variable when exploring explanations for the variation in study outcomes.

An alternative strategy for examining whether particular characteristics of studies are related to the sizes of the treatment effect is meta-regression. Unlike the strategy previously discussed, meta-regression allows the meta-analyst to explore the relationship between continuous, as well as categorical,
characteristics and effect size, and allows the effects of multiple factors to be investigated simultaneously (Thompson & Higgins, 2002). In our example, imagine that our studies ranged in the duration of the manipulation of playing pool with friends. One option would be to group studies into several distinct categories of duration of pool playing and continue with subgroup moderator analyses as previously discussed. However, an alternative would be to employ meta-regression, leaving this characteristic continuous. The interested reader may refer to Thompson and Higgins (2002) or Higgins and Thompson (2004) for a full discussion of this method.

Sensitivity analysis

An additional step in meta-analysis is the performance of sensitivity analyses. A sensitivity analysis is used to determine if and how the conclusions of an analysis might differ if it was conducted using different statistical procedures or assumptions. There are numerous points at which a meta-analyst might decide a sensitivity analysis is appropriate. For example, there might be a set of comparisons that fall at the edge of the conceptual definition of what constitutes an acceptably reliable measure of well-being. The effects of playing pool with friends might be tested with and without the inclusion of these comparisons. Or, some evaluations of the relation between playing pool with friends and well-being might have missing data. These comparisons might be omitted from one analysis and included in another analysis that makes conservative assumptions about what those values might be. The calculation of weighted, unweighted, and median effect sizes can be considered a form of sensitivity analysis. Lastly, averaging effect sizes and conducting homogeneity and moderator tests using both fixed and random effects models is another form of sensitivity analysis. In each case, the meta-analyst is seeking to determine whether a particular finding is robust across different sets of assumptions. If the answer is ‘conclusions do not change under different sets of assumptions’ then greater confidence can be placed in the conclusion.

THE ISSUE OF DATA CENSORING

Many meta-analysts go to great lengths to locate as much relevant research as possible. However, even after careful planning, searching, and coding of research reports, missing data can influence the conclusions drawn from the meta-analysis. Just as biases in the selection of study participants threaten the validity of primary research, data censoring threatens the validity of the meta-analysis (Rothstein et al., 2005). When data are systematically missing, not only is the size of the sample gathered for the research synthesis reduced, but the representativeness of the sample and the validity of the results are compromised, regardless of the quality of the meta-analysis in all other respects (Rothstein et al., 2005).

Types of data censoring

Data censoring occurs when primary researchers, journals, or publishers censor what research gets into print or what specific findings or aspects of the research are reported. This data censoring can often cause the research included in a meta-analysis to be systematically unrepresentative of the population of completed studies. As suggested by Pigott (1994), there are three kinds of missing data that can result from data censoring.

First, entire studies may be unavailable to include in a dataset. In particular, unpublished research findings are frequently missing from meta-analyses. The research synthesist can take extra precautions to include unpublished research that may be difficult to locate. For example, search techniques that include contacting professional networks and listservs, using conference programs, or searching databases that include dissertation and masters theses (Dissertation Abstracts) can improve the inclusiveness of the studies in the meta-analysis. However, inevitably, there will be relevant studies left undiscovered. This form of data censoring is problematic because it frequently reflects the bias against the null hypothesis found in
published research. That is, published articles tend to report statistically significant results, whereas unpublished research is less likely to include statistically significant results.

Evidence suggests that bias against the null hypothesis is present in the decisions made by both reviewers and primary researchers (Cooper, 1998). For example, Atkinson et al. (1982) found that significant results were more than twice as likely as non-significant results to be recommended for publication in two APA journals in counseling psychology even when research designs of studies were identical. Greenwald (1975) found that researchers said they were inclined to submit significant results for publication approximately 60% of the time. However, they would submit the study for publication only 6% of the time if the results failed to reject the null hypothesis. When examining actual decisions made by researchers, Cooper et al. (1997) found that approximately 74% of researchers submitted significant results for publication, but only 5% submitted non-significant results.

Second, even if all relevant studies have been uncovered, individual studies may be missing relevant information necessary in order to calculate an effect size. Missing effect sizes will occur when the primary researcher does not report adequate statistics or descriptive information needed to calculate an effect size. The consequence of missing an effect size is similar to missing an entire study. That is, a study with a missing effect size cannot be included in the estimate of the average effect. Consequently, the generalizability of the results may be limited to the sample of studies which had complete data. Further, similar to reasons why entire studies may be missing from a review, effect sizes are frequently unreported in published reports when the relationship was not significant, and thus, the author fails to report the precise values of the means, standard deviations, statistical test, and/or p values (Pigott, 1994).

Finally, information about study characteristics used to examine moderators of an effect may be missing from individual reports. For example, when examining the effect of playing pool with friends on well-being, had particular studies failed to report the gender makeup of their participant sample, those studies could not have been included in the moderator analysis.

Detecting missing data

A number of graphical and statistical tests can be used to assess the possible presence of data censoring and the implications of this threat to the validity of the conclusions drawn from the meta-analysis. One way a meta-analyst can evaluate whether data censoring has affected a distribution of effect sizes is to create a funnel plot (Light & Pillemer, 1984). A funnel plot graphically depicts a measure of the sample size of studies, such as their given weight or precision, against their associated effect sizes (Greenhouse & Iyengar, 1994). If the meta-analyst has captured all the relevant studies, the funnel plots should be symmetric around the mean and approximate the shape of the normal distribution. However, publication biases can restrict the range of the distribution, resulting in overrepresentation of studies in one tail of the distribution (Sterne et al., 2005). In addition to graphical displays, statistical methods such as the Rank Correlation Test (Begg & Mazumdar, 1994) and Egger’s regression test (Egger et al., 1997) can be used to detect whether a bias is present (see Sterne & Egger, 2005 for full discussion of these strategies).

Figure 32.2 presents the funnel plot illustrating the distribution of effect sizes from our example meta-analysis on the effect of playing pool with friends on well-being. Our plot suggests the presence of bias, as the bottom of the plot shows a higher concentration of studies on the right side of the mean compared to the left.

Another way to explore for possible data censoring is by using publication status as a moderator variable in a homogeneity analysis. As previously discussed, homogeneity analysis allows the meta-analyst to test whether sampling error alone accounts for variation in effect sizes or whether features of studies, in this case, publication status,
Figure 32.2 Funnel plot of d indexes for example meta-analysis
[Plot not reproduced. Title: Funnel Plot of Precision by Std diff in means; x-axis: Std diff in means (−2.0 to 2.0); y-axis: Precision (1/Std Err).]
influence the observed effect sizes. In this way, the meta-analyst can use observed studies to assess whether publication status moderates an overall effect. Briefly, the meta-analyst calculates average effect sizes for published and unpublished studies and compares these to determine if there is a significant difference in the strength and/or direction of the relationship. The procedure used to conduct a homogeneity analysis was described in the section ‘Testing for moderators of effect sizes’.

Strategies for imputing missing data

There are a number of strategies that meta-analysts can use to deal with data censoring. Rothstein et al. (2005) provide an in-depth treatment of numerous approaches. One way is to try to estimate the missing value using one of a number of imputation techniques. Vote-counting is one strategy that can be used to generate an effect size estimate (see Bushman, 1994; Hedges & Olkin, 1985 for a discussion of vote-counting techniques). That is, the underlying magnitude of a treatment’s effect can be estimated from the proportions of studies showing positive and negative directional outcomes. However, this approach requires that the vote-counter knows the direction of each test of the treatment and the sample size associated with each condition, treatment and control.

Pigott (1994) outlined several methods of imputing an estimate for missing values. One strategy is to assume that missing values are equivalent to a very conservative estimate, such as zero. Another option is to replace missing values with the mean value calculated from available cases for that variable. Regression techniques can also be used to impute missing values. Complete cases are used to generate a regression equation that can be used to estimate missing values. A final alternative that appears to be promising is multiple imputation (Rubin, 1987). Multiple imputation techniques use information from complete cases in the review to generate multiple estimates for each missing value. The advantage of using multiple imputation is that a range of estimates is provided for each missing observation. Therefore, results using each of the estimates can be compared.

Even though imputing an estimate for missing values allows the meta-analyst to include in the synthesis cases with missing data, data imputation methods force the meta-analyst to make assumptions that may not be accurate and can result in
other types of bias. In particular, when using single-value imputation methods, the assumption that missing values may be either smaller than or similar to observed values may simply be incorrect. Further, using single-value imputation methods can result in an artificially reduced variance for those variables for which values were imputed. This reduced variance is particularly problematic when testing the homogeneity of effect sizes. In fact, one advantage of the regression imputation technique is that an adjustment can be applied to correct for this underestimation of the sampling variance (Little & Rubin, 1987). While all but the zero imputation technique provide a reasonable estimate for the mean when information is missing completely at random, when information is missing for reasons related to the value itself or other observed or unobserved variables, these imputation results fail to generate an unbiased estimate (Sutton & Pigott, 2005). Given the growing awareness of publication bias, imputation techniques seem destined to remain an important area for the development of new meta-analytic techniques.

Regardless of which method is employed, meta-analysts are obligated to discuss how much data was missing from their reports, how they handled it, and why they chose the methods they did. Finally, it is becoming increasingly common practice for meta-analysts with large amounts of missing data to conduct their analyses using more than one strategy and to determine whether their findings are robust across different missing data assumptions (see Greenhouse and Iyengar, 1994).

The Trim-and-Fill procedure

There is an interesting imputation method that is gaining popularity because of its simplicity and ease of use. Duval and Tweedie (2000a, 2000b) have recently developed a Trim-and-Fill method that, through an iterative process, fills in possible values for effect sizes from studies that are not represented in the dataset. The Trim-and-Fill procedure tests whether the distribution of effect sizes used in the analyses is consistent with variation in effect sizes that would be predicted if the estimates were normally distributed. The method first examines whether the distribution of observed effect sizes is skewed, indicating a possible bias created either by the study retrieval procedures or by data censoring on the part of authors. Then it provides a way to estimate the values from missing studies that need to be present to approximate a normal distribution. It imputes these missing values, permitting an examination of an estimate of the impact of data censoring on the observed distribution of effect sizes and the statistics resulting from including the imputed values.

More specifically, the Trim-and-Fill technique uses a nonparametric method that initially removes the asymmetric studies from the right side of the funnel plot (those indicating a positive effect) in order to compute an unbiased estimate of the effect. Missing effect sizes from the left side of the plot (those that would reduce the size of the positive effect) are then estimated based on the normal distribution. Finally, both removed and imputed studies are placed into the funnel plot and a new combined effect that includes these imputed effect sizes is computed. Consequently, the Trim-and-Fill method provides a sensitivity analysis in which the meta-analyst can compare the observed combined effect size to the hypothetical combined effect size when imputed missing effect sizes are included.

Figure 32.3 depicts the asymmetric funnel plot of effect sizes from our fictional meta-analysis with effect sizes imputed using the Trim-and-Fill method included to make the funnel plot symmetric. When looking for missing studies on the left side of the distribution (and based on a fixed-effect model), the Trim-and-Fill technique suggests that there are three missing studies. Recall that the fixed effects observed point estimate and 95% confidence interval for the combined studies is 0.21 (95% CI = 0.06/0.36). Using Trim-and-Fill, the imputed fixed effects point estimate is 0.04 (95% CI = −0.09/0.18). The random effects observed point estimate and 95% confidence interval for the combined
Figure 32.3 Funnel plot of observed and imputed d indexes for example meta-analysis
Note: Black dots represent imputed effect sizes, making the distribution symmetrical.
[Plot not reproduced. Title: Funnel Plot of Precision by Std diff in means; x-axis: Std diff in means (−2.0 to 2.0); y-axis: Precision (1/Std Err).]
studies is 0.31 (95% CI = −0.01/0.63). Using the Trim-and-Fill method the imputed random effects point estimate is 0.05 (95% CI = −0.29/0.38). Thus, this imputation technique changes our finding both in the statistical significance and magnitude of effect. Therefore, we may not be confident that the positive finding of our meta-analysis on the observed eight studies testing the effect of playing pool with friends on well-being is robust against a plausible assumption about data censoring. In such a case, we would certainly discuss the implications of this finding and take care to caution the reader about this important limitation.

NEW DIRECTIONS IN META-ANALYSIS

Alternative indices of heterogeneity

We have seen that studies addressing a common question will generally vary in terms of their design, interventions or other manipulations, sample characteristics, and/or outcomes. And, as previously mentioned, the most common way of assessing heterogeneity in a set of effect sizes is the Q test. A significant Q statistic indicates that the sample of effect sizes is heterogeneous, that the differences between study outcomes exceed that which may have been found by chance alone. In contrast, a non-significant Q statistic indicates that the differences underlying the results of studies can be accounted for by sampling error alone.

However, the Q statistic itself has power characteristics. That is, it may fail to detect meaningful heterogeneity in the case in which just a few studies are being meta-analyzed, or it may detect ‘unimportant’ heterogeneity when a large number of studies are being synthesized (Hardy & Thompson, 1998). Consequently, it may be advisable for the meta-analyst to report another statistic, I², which provides a way to quantify the heterogeneity among effect sizes included in a synthesis. I² describes the percentage of total variation across studies that is due to heterogeneity rather than chance (Higgins & Thompson, 2002; Higgins et al., 2003). It can be derived from the Q test using the following formula:

$$I^2 = 100\% \times \frac{Q - df}{Q}$$

where all terms are as previously defined. Negative values for I² should be assumed
to be equivalent to zero and indicate no heterogeneity. Non-zero I² values represent the extent to which heterogeneity is present in the sample of studies, with 100% being the maximum value. For example, in our fictional meta-analysis examining the effect of playing pool with friends on well-being, I² = 100% × (29.616 − 7)/29.616 = 76.364%, indicating that over 76% of the variability between our eight studies cannot be explained by sampling error alone.

As suggested by Higgins and colleagues (2003), there are several important advantages of I². First, I² overcomes many of the drawbacks of the Q test because it does not depend directly on the number of independent effects included in the meta-analysis. Second, given that I² is a percentage, it can be easily compared across meta-analyses, even when they may differ in the number of studies included, the outcome being assessed, or effect size metric used. Finally, I² is easily computed from statistical tests that are normally conducted in a meta-analysis. Currently, I² is rarely reported in published meta-analyses outside of medicine, but its clear advantages, as well as the ease by which it can be interpreted, suggest it will soon be reported regularly in social science meta-analyses as well.

Combining slopes from multiple regressions

Up to this point, the procedures for combining and comparing study results have generally assumed that the measure of effect is a mean difference, correlation, or odds ratio. However, regression analysis is a commonly used technique in the social sciences, particularly for non-experimental studies. Like the standardized mean difference or correlation coefficient, the regression coefficient, b, and the standardized regression coefficient, β, are also measures of effect size. β will typically be used in meta-analyses because, like the d-index and r-index, it standardizes effect size estimates when different measures are used in different studies. β represents the standardized score change in the criterion variable, controlling for all other predictors, given a one standard deviation change in the predictor variable.

Syntheses of regression analyses are difficult to conduct for a variety of reasons. First, models using multiple regression generally differ from study to study. Each study may include different predictors in the regression model and therefore, the slope for the predictor of interest will represent a different partial relationship in each study (Wu & Becker, 2004). Second, the scale of the predictor of interest and outcome may vary across studies (Wu & Becker, 2004). In some cases, a predictor such as SAT scores or monetary expenditures may have a common scale. However, in most cases the scale of both the predictor and outcome variable will vary, making comparisons across studies difficult. Still, this problem can be overcome by using β, the fully standardized estimate of the slope for a particular predictor when the scaling of both the predictor and outcome variable differ across studies. ‘Half-standardizing’ is an alternative way to create similar slopes when only outcomes are dissimilar (Greenwald et al., 1996).

If slopes are independently and identically distributed, we can apply standard methods for meta-analysis. Slopes will be identically distributed across studies when the outcome and predictor of interest are measured in a similar fashion, the other predictors in the model are the same across studies, and when predictor and outcome scores are similarly distributed (Becker, 2005). If these conditions are met, weighting can be accomplished by multiplying each effect size by the inverse of its variance and then dividing the sum of these products by the sum of the inverses. Standard statistics can then be computed, including the mean effect, confidence intervals, and homogeneity tests.

However, it is rare that datasets meet the assumption of being identically and independently distributed (Becker, 2005). Typically, measures differ across studies and regression models are diverse in terms of which additional variables are included in them. And, because few studies provide descriptive
statistics on the variables measured and included in the regression model, it remains difficult to assess whether the assumption that scores are distributed similarly across studies has been met. Given the current limitations, a common method for summarizing the results of the regression analyses has been to use a vote-count strategy (see Cooper et al., 2006; Hanushek, 1989; or Patall et al., 2007, for examples). What remains clear is that techniques for synthesizing results from multiple regression analyses need to be more extensively developed and studied.

CONCLUSIONS

In this chapter, the major meta-analytic procedures, challenges that face the meta-analyst, and new directions of meta-analysis were discussed. What should be evident is that meta-analysis is a powerful tool that can be used to inform future social science research, as well as social policy decision-making. While meta-analysis is not without limitation, meta-analyses help to meet rigorous standards that allow us to be more confident when drawing conclusions about the cumulative state of evidence on relationships in our social world.

NOTES

1 Hedges (1980) showed that the d-index may slightly overestimate the size of an effect in the entire population. However, the bias is minimal if the sample size is more than 20. If a meta-analyst is calculating d-indexes from primary research based on samples smaller than 20, Hedges’ (1980) correction factor should be applied.

REFERENCES

Atkinson, D. R., Furlong, M. J., & Wampold, B. R. (1982). Statistical significance, reviewer evaluations and scientific process: Is there a (statistically) significant relationship? Journal of Counseling Psychology, 29, 189–194.
Becker, B. J. (2005, November). Synthesizing Slopes in Meta-analysis. Paper presented at the meeting on Research Synthesis and Meta-Analysis: State of the Art and Future Directions, Durham, NC.
Begg, C. B. & Mazumdar, M. (1994). Operating characteristics of a rank correlation test for publication bias. Biometrics, 50, 1088–1101.
Borenstein, M., Hedges, L., Higgins, J., & Rothstein, H. (2005). Comprehensive Meta Analysis (Version 2.1) [Computer software]. Englewood, NJ: BioStat.
Bushman, B. J. (1994). Vote-counting procedures in meta-analysis. In H. Cooper & L. V. Hedges (Eds.). Handbook of Research Synthesis. New York: Russell Sage.
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. Hillsdale, NJ: Erlbaum.
Cooper, H. M. (1998). Synthesizing Research: A Guide for Literature Reviews (3rd ed.). Thousand Oaks, CA: Sage.
Cooper, H., DeNeve, K., & Charlton, K. (1997). Finding the missing science: The fate of studies submitted for review by a human subjects committee. Psychological Methods, 2, 447–452.
Cooper, H. & Hedges, L. V. (1994). Handbook of Research Synthesis. New York: Russell Sage.
Cooper, H., Robinson, J. C., & Patall, E. A. (2006). Does homework improve academic achievement?: A synthesis of research, 1987–2003. Review of Educational Research, 76, 1–62.
Cooper, H. M. & Rosenthal, R. (1980). Statistical versus traditional procedures for summarizing research findings. Psychological Bulletin, 87, 442–449.
Diener, E., Emmons, R. A., Larsen, R. J., & Griffin, S. (1985). The satisfaction with life scale. Journal of Personality Assessment, 49, 71–75.
Duval, S. & Tweedie, R. (2000a). A nonparametric ‘trim and fill’ method of accounting for publication bias in meta-analysis. Journal of the American Statistical Association, 95, 89–98.
Duval, S. & Tweedie, R. (2000b). Trim and fill: A simple funnel plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics, 56, 276–284.
Egger, M., Davey Smith, G., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. British Medical Journal, 315, 629–634.
Fleiss, J. L. (1994). Measures of effect size for categorical data. In H. Cooper & L. V. Hedges (Eds.). Handbook of Research Synthesis. pp. 245–260. New York: Russell Sage.
Gleser, L. J. & Olkin, I. (1994). Stochastically dependent effect sizes. In H. Cooper & L. V. Hedges
(Eds.). Handbook of Research Synthesis. New York: Russell Sage.
Greenhouse, J. B. & Iyengar, S. (1994). Sensitivity analysis and diagnostics. In H. Cooper & L. V. Hedges (Eds.). Handbook of Research Synthesis. pp. 383–398. New York: Russell Sage.
Greenwald, A. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82, 1–20.
Greenwald, R., Hedges, L. V., & Laine, R. D. (1996). The effect of school resources on student achievement. Review of Educational Research, 66, 361–396.
Haddock, C. K., Rindskopf, D., & Shadish, W. R. (1998). Using odds ratios as effect sizes for meta-analysis of dichotomous data: A primer on methods and issues. Psychological Methods, 3, 339–353.
Hanushek, E. A. (1989). The impact of differential expenditures on school performance. Educational Researcher, 18, 45–51.
Hardy, R. J. & Thompson, S. G. (1998). Detecting and describing heterogeneity in meta-analysis. Statistics in Medicine, 17, 841–856.
Hauck, W. W. (1989). Odds ratio inference from stratified samples. Communications in Statistics, 18A, 767–800.
Hedges, L. V. (1980). Unbiased estimation of effect size. Evaluation in Education: An International Review Series, 4, 25–27.
Hedges, L. V. & Olkin, I. (1985). Statistical Methods for Meta-analysis. Orlando, FL: Academic Press.
Hedges, L. V. & Vevea, J. L. (1998). Fixed and random effects models in meta-analysis. Psychological Methods, 3, 486–504.
Higgins, J. P. T. & Thompson, S. G. (2002). Quantifying heterogeneity in a meta-analysis. Statistics in Medicine, 21, 1539–1558.
Higgins, J. P. T. & Thompson, S. G. (2004). Controlling the risk of spurious findings from meta-regression. Statistics in Medicine, 23, 1663–1682.
Higgins, J. P. T., Thompson, S. G., Deeks, J. J., & Altman, D. G. (2003). Measuring inconsistency in meta-analyses. British Medical Journal, 327, 557–560.
Hunt, M. (1997). How Science Takes Stock: the Story of Meta-analysis. New York: Russell Sage Foundation.
Hunter, J. E. & Schmidt, F. L. (2004). Methods of Meta-analysis: Correcting Error and Bias in Research Findings (2nd ed.). Thousand Oaks, CA: Sage.
Last, J. M. (2001). A Dictionary of Epidemiology. Oxford: Oxford University Press.
Light, R. J. & Pillemer, D. B. (1984). Summing Up: The Science of Reviewing Research. Cambridge, MA: Harvard University Press.
Lipsey, M. W. & Wilson, D. B. (2001). Practical Meta-analysis. Thousand Oaks, CA: Sage.
Little, R. J. A. & Rubin, D. B. (1987). Statistical Analysis with Missing Data. New York: Wiley.
Lyubomirsky, S. & Lepper, H. S. (1999). A measure of subjective happiness: Preliminary reliability and construct validation. Social Indicators Research, 46, 137–155.
Overton, R. C. (1998). A comparison of fixed-effects and mixed (random-effects) models for meta-analysis tests of moderator variable effects. Psychological Methods, 3, 354–379.
Patall, E. A., Cooper, H., & Robinson, J. C. (2007). Parent involvement in homework: A research synthesis. Manuscript submitted for publication.
Pigott, T. D. (1994). Methods for handling missing data in research synthesis. In H. Cooper & L. V. Hedges (Eds.). Handbook of Research Synthesis. New York: Russell Sage.
Raudenbush, S. W. (1994). Random effects models. In H. Cooper & L. V. Hedges (Eds.). Handbook of Research Synthesis. pp. 301–322. New York: Russell Sage.
Raudenbush, S. W., Becker, B. J., & Kalaian, H. (1988). Modeling multivariate effect sizes. Psychological Bulletin, 103, 111–120.
Rosenthal, R. (1984). Meta-Analytic Procedures for Social Research. Beverly Hills, CA: Sage.
Rosenthal, R. (1994). Parametric measures of effect size. In H. Cooper & L. V. Hedges (Eds.). Handbook of Research Synthesis. New York: Russell Sage.
Rothstein, H. R., Sutton, A. J., & Borenstein, M. (2005). Publication Bias in Meta-analysis: Prevention, Assessment and Adjustments. Chichester, UK: John Wiley & Sons, Ltd.
Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: Wiley.
Shadish, W. R. & Haddock, C. K. (1994). Combining estimates of effect size. In H. Cooper & L. V. Hedges (Eds.). Handbook of Research Synthesis. pp. 261–282. New York: Russell Sage.
Sterne, J. A. C., Becker, B. J., & Egger, M. (2005). The funnel plot. In H. R. Rothstein, A. J. Sutton, & M. Borenstein (Eds.). Publication Bias in Meta-analysis: Prevention, Assessment and Adjustments. pp. 75–98. Chichester, UK: John Wiley & Sons, Ltd.
Sterne, J. A. C. & Egger, M. (2005). Regression methods to detect publication and other bias in meta-analysis. In H. R. Rothstein, A. J. Sutton, & M. Borenstein (Eds.). Publication Bias in Meta-analysis: Prevention, Assessment and Adjustments. pp. 99–110. Chichester, UK: John Wiley & Sons, Ltd.
Sutton, A. J. & Pigott, T. D. (2005). Bias in meta-analysis induced by incompletely reported studies. In H. R. Rothstein, A. J. Sutton, & M. Borenstein (Eds.). Publication Bias in Meta-analysis: Prevention, Assessment and Adjustments. pp. 223–240. Chichester, UK: John Wiley & Sons, Ltd.
Thompson, S. G. & Higgins, J. P. T. (2002). How should meta-regression analyses be undertaken and interpreted? Statistics in Medicine, 21, 1559–1573.
Wu, M. & Becker, B. J. (2004, April). Synthesizing Results from Regression Studies: What can we Learn from Combining Results from Studies Using Large Data Sets? Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
33
Synergy and Synthesis:
Integrating Qualitative and
Quantitative Data
Jane Fielding and Nigel Fielding
THE DEVELOPMENT OF SOCIAL SCIENCE PERSPECTIVES ON METHODOLOGICAL INTER-RELATION

The origins of multiple-method research

Research designs systematically relating multiple methods originated in the context of mainstream psychology (Campbell and Fiske 1959), initially being termed ‘triangulation’. Multiple method research designs (‘MMRD’) remain prominent amongst mainstream methodological practices (Campbell and Russo 1999). Heuristics for relating results from substantially different methods were a theme from the outset. Campbell wrote that, when he decided to study psychology, while working on a turkey ranch for the summer, ‘my notion of science was already of the experimental physics sort, whereas [a magazine article that inspired his choice of discipline] was solely about humanistic psychology’ (Campbell 1981: 456). Discussing his initial elaboration of triangulation by way of the ‘multitrait-multimethod matrix’ technique, Campbell wrote that it grew from lectures at Berkeley on measurement artefacts in the study of individual differences. Campbell used correlational matrices crossing different methods in his dissertation and thus had found his way to what he dubbed ‘methodological triangulation’ before his collaboration with Fiske.

The original conception was that triangulation would enhance validity, understood as agreement in the outcomes of more than one independent measurement procedure, relative to studies employing a single procedure. The position assumes that there are realities that exist independently of the observer, that have stable properties that can be measured, and that can be mutually related as the basis of internally consistent explanations of social phenomena. These assumptions are necessary because in relating findings from
different methods, triangulation must assume that variations in findings arise from the phenomenon or the particularities of the methods being combined rather than methods haphazardly producing different findings on different occasions, or there being no predictable consistencies in the working of given methods. The latter is especially important in the convergent validation approach to triangulation, as it is premised on the combined methods having different and distinctive biases; if methods are susceptible to the same biases, combining them may simply multiply error. Further implied is that these sources of error can be anticipated and their effects can be traced during analysis. It is in this sense that Levins’ (1966: 423) declaration that ‘our truth is the intersection of independent lies’ is so apt.

The doctrine of convergent validation therefore requires agreement of results from diverse but systematic uses of methods, data sources, theories and investigators (Denzin 1989). Some maintain that combining methods or drawing on different data sources only enhances validity where each is associated with compatible ontological and epistemological perspectives (Blaikie 1991). Post-positivists have somewhat sidestepped the ontological/epistemological critique with the argument that datasets are open to interpretation from a range of theories. Another perspective is that combining different methodologies does not necessarily enhance validity but can extend the scope and depth of understanding (Fielding and Fielding 1986; Denzin and Lincoln 2000; Fielding and Schreier 2001).

Triangulation has also been informed by rationales for the methodological ‘division of labour’ (Sieber 1973). For Sieber, qualitative work can assist quantitative work in providing a theoretical framework, validating survey data, interpreting statistical relationships and deciphering puzzling responses, selecting survey items to construct indices and providing case studies. Quantitative data can identify individuals, groups and settings for qualitative fieldwork and indicate representative and unrepresentative cases. Quantitative data can counteract the ‘holistic fallacy’ that all aspects of a situation are congruent, and can demonstrate the generalisability of limited-sample observations. Qualitative research sometimes succumbs to ‘elite bias’, concentrating on respondents who are articulate, strategically placed and have a status that impresses researchers. Quantitative data can compensate by indicating the full range that should be sampled. Qualitative data can contribute depth to quantitative research, and suggest leads that the more limited kinds of quantitative data cannot address.

As well as combining methods, triangulation can also involve using a number of data sources (self, informants, other commentators), several accounts of events, or several researchers. Denzin’s (1970) original conceptualisation, which was related to Webb et al.’s (1966) work on ‘unobtrusive measures’, not only involved multiple methods (‘data triangulation’) but multiple investigators (‘investigator triangulation’) and multiple methodological and theoretical frameworks (‘theoretical and methodological triangulation’). Each main type has a set of sub-types. Data triangulation may include time triangulation, exploring temporal influences by longitudinal and cross-sectional designs; space triangulation, taking the form of comparative research; and person triangulation, variously at the individual level, the interactive level among groups and the collective level. In investigator triangulation, more than one person examines the same situation. In theory triangulation, situations are examined from different theoretical perspectives. Methodological triangulation has two variants: ‘within-method’, where the same method is used on different occasions (without which one could hardly refer to ‘method’ at all), and ‘between-method’, where different methods are applied to the same subject in explicit relation to each other.

While the classical approach represented by Campbell’s work seeks convergence or confirmation of results across different methods, the triangulation term has accumulated so many renderings that it is now clearer to use the terms ‘convergence’ or
‘confirmation’ when seeking cross-validation between methods. In reality the classic goal of seeking convergence has always been relatively unusual. One reason is the difficulties caused when results fail to converge, but another is the effort required to pursue the goal of producing convergent findings. Morgan (1998) argues that researchers often cannot afford to put so much effort into finding the same thing twice. Moreover, the complex topics of social research make apparent the different strengths of different methods, supporting a more flexible approach to methodological combination than in classic triangulation.

The fact that there are different constructions of triangulation implies there are varying degrees of rigour in operationalising triangulation. We might, for example, regard as relatively weak the idea that validity will be enhanced simply by drawing on data collected by different researchers using the same method, while approaches based on combining different methods might be regarded as more rigorous. For triangulation to be credibly founded and implemented, we must identify in advance the characteristic weaknesses or types of error associated with the chosen methods so that we can discount the danger that they might be susceptible to the same threats to validity. Thus, much depends on the logic by which researchers derive and mesh together data from different methods. ‘What is involved in triangulation is not the combination of different kinds of data per se, but rather an attempt to relate different sorts of data in such a way as to counteract various possible threats to the validity of (their) analysis’ (Hammersley and Atkinson 1995: 199).

Triangulation in itself is no guarantee of internal and external validity. Its real value is not that it guarantees conclusions about which we can be confident but that it prompts in researchers a more critical stance towards their data. Too often, research attracts the criticism that its conclusions simply confirm what everyone already knew. Evaluative criteria for qualitative methods are particularly problematic, with much recourse to ‘ethnographic authority’ (Hammersley and Atkinson 1995), the defence of interpretations not by adherence to systematic, externally tested analytic procedures but because the researcher ‘was there’ and so must have the best sense of what the data mean. Validity queries may be met by reference to the amount of time spent in fieldwork, the rapport achieved and so on. Such criteria contrast sharply with the warrant for inferences from quantitative data, where statistical procedures are used whose steps are standardised, so that adherence to each stage can be checked, and whose criteria for drawing a particular conclusion are not only explicit but precisely define the conditions under which it can be expected to hold. Triangulation enables qualitative researchers to adopt the stance often characteristic of the quantitative researcher, for whom conclusions are always ‘on test’, hold only under specified conditions, and whose relationship to the data is not uncritical ‘immersion’ but measured detachment.

It is not suggested that qualitative researchers should transform their approach to resemble that of quantitative researchers, but we can certainly argue that the value of triangulation lies more in ‘quality control’ than any guarantee of ‘validity’. The approach promotes more complex research designs that oblige researchers to be more clear about what relationships they seek to study, what they will take as indicators of these relationships and so on. Diffusely-focused exploratory research will always have a place, but as qualitative research tackles more precisely-specified topics and becomes more prominent in policy-related research, its audiences want to know how confident in the findings they can be. Even in exploratory work researchers cannot be indifferent to accuracy.

Moreover, when findings from independent methods converge, it is not simply a matter of identifying points of agreement. We also have to identify the conditions under which findings are invariant, explain failures of invariance and determine why given conditions apply. The differences between findings from different knowledge sources can be as
illuminating as their points of agreement. Triangulation helps address the tendency to focus on data that fit preconceptions or that are conspicuous, at the expense of less exotic, but possibly more indicative, data. While the rigidity of quantitative methods helps researchers resist such faults, their work is not immune to such problems either. However, such faults can more readily be traced because quantitative methodologies necessitate clarity about hypotheses, make the researcher’s assumptions more explicit and sediment these assumptions in research instruments that cannot generally be adjusted after they are deployed. Deploying qualitative methods alongside quantitative methods in multiple-method research designs helps qualitative research gain some of these benefits. Similarly, it can bring to quantitative elements of the research more refinement and analytic depth.

From convergent validation to the celebration of diversity

As well as taking a convergent validation perspective, the original literature on combining methods usually involved one method taking precedence (Creswell 2003). Qualitative components rarely held this role and were mostly used for pilot work or follow-up with a sub-sample. More recent approaches suggest more even-handed combinations, as in Caracelli and Green’s (1997) classification of mixed-method research into component designs (such as ‘complementary’ or ‘comparative’ designs) and integrated designs, which include iterative designs, nested designs and holistic designs. Caracelli and Green (1993) identify four different strategies through which qualitative and quantitative data might be integrated. The first, data transformation, requires data of one type being converted into that of another so they may be analysed together. Typological development, the second strategy, involves applying conceptual categories emergent from the analysis of one type of data to the analysis of a contrasting data type. Third, extreme case analysis requires the researcher to focus on exceptional examples found in one type of data and refine their explanation of it via analysis of data of another type. The final strategy, data consolidation, extends the data transformation strategy in that data are converted into another form, but the emphasis is on assimilating multiple forms of data to produce a new dataset.

These strategies enable numerous types of multiple method research design. Green et al. (1989) identified six main dimensions of methodological design. When combining two methods the nature of the relationship between the methods can be categorised along each dimension (see Figure 33.1). Thus, combining a survey with qualitative interviewing – two distinct methods – can be categorised as using different paradigms to explore different aspects of the same phenomenon, in sequence (e.g. first the survey, then the interviews); with the methods being independent but with each method having equal status.

Moreover, the research designs must be distinguished from the reported rationale or practical purpose of the research (see Table 33.1).

Other attempts at definitive typologies arrive at different numbers of main types of methodological combination (Creswell 2003; Niglas 2004; Tashakkori and Teddlie 1998), some of which proliferate to the point of intellectual indigestion (Johnson and Onwuegbuzie 2004). The most exhaustive typology can never capture all potential combinations; the essential thing is having a considered but open stance in deriving a design that captures the research question. Over-concentration on choosing exactly the right permutation at the outset can make for an unhelpfully rigid approach, but this is not to sideline preliminary reflection. Rather, it is to say that precisely specifying the research question is the key thing, and from this a sense of the best methodological combination will emerge, with the proviso that researchers must always be ready to adjust the design in light of what is found. Research design is not a stage; it is a process.

Broadly, strategies for interrelating findings from multiple methods fall into two
[Figure 33.1 labels: Methods (Different / Similar); Paradigms (Different / Same); Phenomenon (Different / Same); Status (Equal / Unequal); Sequencing (Simultaneous / Sequential); Inter-dependence (Interactive / Independent)]

Figure 33.1 Dimensions of methodological design (Original figure, drawing on Green et al., 1989)
Table 33.1 Purpose of mixed-method research designs

Classification                 Purpose
Triangulation                  Convergence, corroboration and correspondence of results from different methods
Complementarity                Elaboration, illustration and clarification from one method with the results of the other
Development                    The results of one method are used to help develop or inform the other (this may include sampling, implementation or measurement issues)
Initiation                     Discovery of paradox – used to recast the questions or results of one method with the results of the other
Expansion (parallel design)    Expand the breadth of study using different methods for different components of the study

Source: Adapted from Green et al. (1989), Table 1, p. 259
types: ‘combination’ and ‘conversion’ (Bazeley 2006). An instance of combination is when categorical or continuous variables are the basis both of statistical analysis and for comparison of coded qualitative data. Textual and numerical data may have been collected together, as where questionnaires mix fixed and open response items, or in sequence, such as where surveys are followed by interviews. Conversion involves changing one type of data to another, such as where the coding applied to qualitative data is used in statistical analysis, or where quantitative data contributes to narrative analyses or a life history (Elliott 2005). Bazeley (2006) notes that strategies involving the consolidation, blending or merging of data tend to involve both conversion and combination.

A well-established case for inter-relating quantitative and qualitative methods is that the qualitative element can suggest types of adaptation or experience for which the quantitative element can then test, thus enabling conclusions concerning the statistical frequency of types in a population. Qualitative research is good at identifying types but is seldom sufficiently comprehensive to indicate for what share of the sample a given type
may account. In combination, qualitative and quantitative methods can reveal more about the extent of regularities and the dimensions of the types. Numerous hybrid techniques interrelate quantitative and qualitative procedures. Where codes derived from qualitative data are recorded separately for each case, the presence/absence of each code can be used to create variables, from which case-by-variable matrices can be derived. Such matrices enable hypothesis testing, predictive modelling and exploratory analyses.

Statistical techniques like cluster analysis, correspondence analysis and multidimensional scaling can be applied to such ‘quantitised’ qualitative data. For example, non-standardised interviews documenting types of adaptation to labour force position can be used as the basis of a probabilistic cluster analysis. The proximity and probability of classification of each respondent towards the centre of the relevant cluster (i.e. type) can thus be visualised and categories reduced to fewer dimensions by multiple correspondence analysis. Kuiken and Miall (2001) used this technique to specify experiential categories derived from interview response in a study comparing different readers’ impressions of the same short story. Having identified attributes qualitatively, categories were specified by a quantitative cluster analysis that systematically varied the presence of individual attributes. Subsequent qualitative inspection of the clusters further differentiated the types. In her study of mixed-methods projects, Niglas (2004) used scales to capture variation amongst them on various characteristics of research design. Cluster analysis of variables from her quantitative content analysis produced eight distinctive groups and identified the characteristics best differentiating them. The findings were compared to discursive notes from her initial reading of the study to produce summary descriptions of each group. The descriptions were used to make the final assignment of studies into categories representing variables for further statistical analysis. These alternating quantitative and qualitative procedures do not challenge the essential integrity of the quantitative and qualitative components of the method. They represent moves to interrelation rather than juxtaposition of different forms of data.

CORE PRINCIPLES OF MULTIPLE-METHOD RESEARCH DESIGN

Epistemology and pragmatism

The advantages of combining methods do not require that we ignore that different approaches are supported by different epistemologies. Accepting the case for interrelating data from different sources is to accept a moderate relativistic epistemology, one that justifies the value of knowledge from many sources, rather than elevating one source. Taking a triangulation or multiple-method approach is to accept the continuity of all data-gathering and analytic efforts. Proponents are likely to regard all methods as both privileged and constrained: the qualities that allow us to access and understand one kind of information close off other kinds. A full understanding flows from tackling the research question in several ways.

Results from different methods founded on different assumptions may then be combined for different purposes than that associated with convergent validation. Theoretical triangulation does not necessarily reduce bias, nor does methodological triangulation necessarily increase validity. Combining results from different analytic perspectives or methods may offer a fuller picture but not necessarily a more ‘objective’ or ‘valid’ one. When we combine theories and methods we do so to add breadth or depth to our analysis, not because we subscribe to a single and ‘objective’ truth. In the social realm it is beyond our capacities to achieve absolute objectivity or axiomatic truth, but this is not the same as rejecting the attempt to be objective or the standard of truth. It is merely to accept that our knowledge is always partial and incomplete. We can make it less so by expanding the sources
of knowledge on which we draw. When we accept an empirically based conclusion with identifiable and defined limits, such as that educational achievement is generally related to social class but the relationship is more pronounced for ethnic minority people (discussed in Becker 1986), we implicitly accept the ‘constant and unevadable necessity for interpretation and change of aspect’ (Needham 1983: 32). That is the ultimate warrant for the triangulation paradigm.

A rounded picture: data in tandem and data in conflict

We comment later on the extent to which MMRD is practised in applied research. Our principal example is taken from applied research for the UK Environment Agency (EA). One project, Flood Warning for Vulnerable Groups (FWVG) (Burningham et al., 2005), was designed to explore the social distribution of flood risk and variation in public awareness and the ability to respond to flood warning, especially for those seen as more ‘vulnerable’. The second project, Public Response to Flood Warning (PRFW) (Fielding et al., 2007), aimed to provide a detailed understanding of the ways in which the ‘at flood risk’ public understood, interpreted, and responded to flood warnings. Both projects consisted of qualitative and quantitative components whose results fed back into the subsequent phases of the project but also provided explanations for anomalies or actions reported in previous phases.

Figures 33.2 and 33.3 outline the projects’ research designs. The vulnerable groups project consisted of two phases. The first involved secondary analysis of existing quantitative data to establish the social distribution of flood risk and identify groups that were particularly at risk. In parallel, qualitative interviews were conducted with key informants. Results from both techniques defined the sample for the second phase: focus groups with vulnerable groups. The public response project consisted of three phases: (i) a secondary analysis of existing data, running in parallel with (ii) a qualitative enquiry using focus groups and individual interviews, and followed by (iii) a primary quantitative survey. Phase 1, the secondary analysis, explored reported actions taken by flood
[Figure 33.2 box labels: Census data; Sample: Respondents post a flood; Sample: Population at risk of flooding; Secondary analysis of existing data to identify vulnerable groups and establish social distribution of flood risk; Mapping social distribution of flood risk; Results inform sampling; Sample: Vulnerable population at risk of flooding; Key informant interviews; Focus groups with vulnerable groups to establish understanding of flood warning; Flood management policy]

Figure 33.2 Research design for the Vulnerable Groups Project

[Figure 33.3 box labels: Sample: Respondents post a flood; Secondary analysis of existing data establishing what flood victims did following flood warning; Sample: Population at risk of flooding; Interviews to discover actions taken following flood warning; Focus groups to investigate public understanding of flood warnings; Results inform questionnaire; Primary survey to discover what people say they would do following flood warning; Baseline results of imputed actions following flood warning; Flood management policy]

Figure 33.3 Research design for Public Response to Flood Warning Project
victims following the Autumn 2001 floods. Phase 2 consisted of two qualitative components: focus group discussions and individual interviews. While the focus groups concentrated on public understanding and interpretation of the Environment Agency’s warning codes, the in-depth interviews explored how individuals said they would act in response to warnings. Another important difference was that while focus groups largely rely on the interaction between group members and a shared experience, the individual interviews were conducted in respondents’ own homes, with the potential to provide situational cues prompting responses. In the final phase, the survey used a questionnaire instrument developed from the responses obtained in phases 1 and 2. This was designed, using hypothetical flood scenarios, to establish how the public would respond to flood warning in the event of an emergency.

Note that the conventional sequence of pilot qualitative work enabling design of a survey instrument is here augmented by preliminary secondary analysis, and that the qualitative components were in two modes chosen because group discussions were thought best able to access people’s thinking about the issue while action was thought most reliably to be accessed by interviewing individuals.

Identification of risky places and risky people

The EA projects had multiple aims and outcomes but centrally depended upon the identification of risky places and risky people. Respondents were defined as those ‘at risk’ from tidal or fluvial flooding but who may never have actually experienced a flood event. The study’s multiple-method design enabled us to negotiate the controversies associated with identifying this population and their understanding of their risk. The ‘at risk’ samples were identified by the use of flood plain maps. It may seem obvious that residents within the flood plains are most at risk from flooding but measuring the extent of the flood plains and quantifying the likelihood of floods is a contentious exercise exacerbated by many factors ranging from climate change to the involvement of the insurance industry.
The EA maps identified the ‘risky places’ but were also used to identify the ‘at risk’ population living within them. Thus the quantitative data was used to define the sample for subsequent qualitative and quantitative analyses, exemplifying a ‘development’ strategy in research design (Green et al., 1989). This ‘at risk’ population was then targeted by the EA ‘awareness campaigns’ designed to educate the vulnerable public about flood facts. A potential five million people and two million homes and businesses were targeted. However, the flood maps were an etic, outsider measure of those at risk and recognition of their risk by those affected was clearly important for appropriate public action in preparation for any future disaster. This dichotomy of meaning and measurement, in terms of outsider (etic) and insider (emic) perspectives, will now be discussed.

Emic and etic conceptualisation of vulnerability

A useful conceptual framework for thinking about vulnerability to flood is in terms of ‘emic’ and ‘etic’ approaches (see Spiers 2000; Fielding and Moran-Ellis 2005). These concepts, re-interpreted from linguistics and anthropology, refer to two complementary perspectives. The etic perspective represents the ‘outsider’ viewpoint and the emic an ‘insider’ viewpoint. Pike (1967) linked emic and etic linguistic analysis to emic and etic perspectives on human behaviour, developing a methodology for cross-cultural comparisons. Pike regards emic and etic perspectives as being like the two images of a matching stereoscopic view. They may initially look alike but on close inspection are different, and, when combined, give a ‘startling’ and ‘tri-dimensional understanding’ of human behavior instead of a ‘flat’ etic one (Pike 1967: 41). The payoff from combination is key: ‘emic and etic data do not constitute a rigid dichotomy of data, but often present the same data from two points of view’ (ibid).

An etic viewpoint defines vulnerable individuals as those at greater risk based either on where they live (in vulnerable places) or on demographic characteristics (vulnerable people). These characteristics are usually seen as those which increase social dependence; i.e. old age, ill-health, disability and ethnicity (due to language barriers). Quantitative methods are nearly always used to identify vulnerable places (measuring the likelihood of an event occurring) and are also often used to identify vulnerable people. One negative consequence of this approach is that individuals may become stereotyped based on their defining functional ‘deficit’. Another problem is that such defined ‘vulnerable groups’ are not homogeneous.

In contrast, an emic viewpoint seeks to identify vulnerability on the basis of meanings held by individuals arising from their lived experience and tends to be aligned with qualitative methodology. Emic vulnerability is founded on a person’s/family’s/community’s sense of their own resilience and ability to respond in the face of a flood. Emic vulnerability can only be determined by the person experiencing it. So, a person who may be defined as belonging to an at-risk group (etic vulnerability) may only feel vulnerable if they consider some threat to their self to exceed their capacity to adequately respond, despite ‘rationally’ acknowledging their possession of vulnerable characteristics. They need to recognise that they are at risk before they can effectively prepare.

Public awareness of risk

Quantitative analysis of the ‘at risk’ population based on a survey administered in 2001 (Fielding et al., 2005) and more recently reported by the EA,¹ where 49 percent of residential respondents (41 percent in 2005) were not aware that their property was in a flood risk area, made it clear that the EA’s message was not getting through. Nearly half those defined as ‘at risk’ were not aware of their risk. Thus, while the quantitative measurement of the extent of the flood plains had been used to identify the ‘at risk’ population, other quantitative analysis identified a differing perception of reality. The imposed, outsider view defining risky places was at odds with the lived experience of those defined ‘at risk’. The fact that
an emic perspective (risk awareness) was captured using an etic measure illustrates that the etic/emic perspectives are not simply questions of method.

Why were those who are vulnerable according to etic measures not aware of their risk? This was initially explored using the survey data relating other variables to ‘explain’ variation in the dependent variable, awareness. However, the other variables chosen, generally those indicating, in line with the literature, a social or financial dependency, drew on etic, or outsider, analysis to explain lack of awareness. This did establish a clear social class gradient, with the lower social classes, the young and the old least aware of their flood risk (see Table 33.2). One use made of the focus groups and interviews was to establish whether these most vulnerable groups feel most at risk, and to see whether there were other explanations for lack of awareness. Thus the qualitative data was used to complement and ‘explain’ the findings from the quantitative analysis: an example of ‘complementarity’ in the Green et al. typology.

Flood researchers regularly encounter respondents who deny that they live within the flood plains identified by the EA. Indeed, some actively campaign against their properties being included (possibly because it affects their insurance premiums and thus house prices). In their experience, and possibly their parents’ experience, they may not have suffered flooding and therefore feel perfectly safe. EA public safety materials, including targeted letters and leaflet drops about the ‘objective’ risk, simply reinforce a belief that the authorities do not know what they are talking about. Analysis of response to flood warnings and of relevant survey data (Fielding et al., 2005) found that the most influential factor on flood awareness and likely action in the event of a flood was previous flood experience. Evidence of scepticism based on local knowledge and experience was found not only in verbatim responses in the survey but in elaborated form in the individual interviews.

In response to why no action was taken upon receiving a flood warning, verbatim responses in the survey included:

‘Lived in [town] all my life and know where it floods and where it doesn’t’.

‘We were not flooded the first time so we did not expect to be flooded again’.

‘I don’t want to be ignorant but it is absolute trash to say that this property is at risk of being flooded. I have lived in [riverside town] all of my life and I am 84 years old, and this area has never been flooded in that time, and I am saying that with 30 years experience in the fire brigade. Whoever put this address on the at risk register was very wrong, if the flooding ever got to this area [town] would not exist’.

(Post Events Survey 2001 verbatim responses)

Table 33.2 Factors that influence awareness of flood risk of own property

Factor   Category   % aware property in flood risk   Total N   Significance (a)
Age      16–24      31%                              49        **
         25–34      43%                              207
         35–44      55%                              193
         45–54      57%                              150
         55–64      56%                              141
         65+        52%                              201
Class    A          86%                              29        ***
         B          62%                              160
         C1         49%                              259
         C2         47%                              175
         D          49%                              144
         E          43%                              175

Source: At Risk 2001 survey
(a) Chi-square test significance: ***p < 0.001; **p < 0.01
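The significance flags reported in Table 33.2 can be checked approximately from the printed figures alone. The sketch below is not the authors' analysis. It assumes that cell counts can be reconstructed by applying each rounded percentage in the table to its group size, so the statistics are only approximate, and the function names `chi_square` and `significance_band` are hypothetical. The critical values are the standard chi-square thresholds for five degrees of freedom.

```python
# Approximate re-computation of the chi-square tests flagged in Table 33.2.
# ASSUMPTION: counts are rebuilt from rounded percentages and group sizes,
# so the resulting statistics are illustrative rather than exact.

# (category, % aware, total N) as printed in Table 33.2
AGE = [("16-24", 31, 49), ("25-34", 43, 207), ("35-44", 55, 193),
       ("45-54", 57, 150), ("55-64", 56, 141), ("65+", 52, 201)]
SOCIAL_CLASS = [("A", 86, 29), ("B", 62, 160), ("C1", 49, 259),
                ("C2", 47, 175), ("D", 49, 144), ("E", 43, 175)]

# Standard chi-square critical values for 5 degrees of freedom
CRITICAL_P01 = 15.086
CRITICAL_P001 = 20.515


def chi_square(rows):
    """Return (Pearson chi-square, degrees of freedom) for a k x 2 table."""
    observed = []
    for _, pct, n in rows:
        aware = round(pct * n / 100)          # reconstructed count (assumption)
        observed.append((aware, n - aware))   # (aware, not aware)
    total = sum(a + b for a, b in observed)
    col_totals = (sum(a for a, _ in observed), sum(b for _, b in observed))
    stat = 0.0
    for row in observed:
        row_total = sum(row)
        for obs, col_total in zip(row, col_totals):
            expected = row_total * col_total / total
            stat += (obs - expected) ** 2 / expected
    return stat, len(rows) - 1


def significance_band(stat):
    """Classify a statistic against the df = 5 critical values above."""
    if stat > CRITICAL_P001:
        return "p < 0.001"
    if stat > CRITICAL_P01:
        return "p < 0.01"
    return "not significant at 0.01"


if __name__ == "__main__":
    for label, rows in (("Age", AGE), ("Class", SOCIAL_CLASS)):
        stat, df = chi_square(rows)
        print(f"{label}: chi-square = {stat:.2f}, df = {df}, {significance_band(stat)}")
```

Under these assumptions the age statistic falls between the two critical values and the class statistic exceeds the larger one, which is in line with the ** and *** flags in the table.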
While interviews yielded similar responses, e.g. TD: No, I’ve lived ‘ere thirteen years and I’ve never felt [at risk], never (Parent Interview, FWVG Project), the finer-grained data also contained indications that ignorance was a factor.

I knew about floodplains but I didn’t imagine for one minute that where we’re located was on a [floodplain], in fact I didn’t even know […] there was a bloody river, that was a surprise, I knew the hump back bridge [I] go over [it] every day but I didn’t know there was a river in that proximity.
(New residents focus group (FWVG Project))

Interviews suggested that experience could negate ‘objective’ awareness:

F: I don’t actually feel at risk. I mean I’m quite kind of aware that I live on [a floodplain because], … we have had leaflets through saying you’re in a blue zone and … knowing environmentally I could see there was a rise and you know floods that happened like … Lewes and Cornwall.
(Owner occupier focus group (PRFW Project))

This respondent was aware of the flood risk but discounted it from lack of experience of flooding.

F: I think that’s it, I think because I haven’t actually experienced anything either.

Several respondents recognised their lack of awareness but blamed it on lack of official warning when they moved into the area, which in turn was blamed on the long time lapse, and therefore reduced risk, since the last flood:

…It’s just ignorance on all of our parts because nobody had told us in the first place you know, if you only get flooded in the last time in 1968 everyone sort of forgets about it and if we’d have probably known that there was a chance that we were going to get flooded you might have done something about it sooner.
F: And [property] searches … you only have to give the last twenty years history.
(Families focus group (FWVG Project))

[Second participant] I looked for it you know because I phoned my solicitor up and gave him a piece of my mind and he said well … it does show up in your search and he told me the page it was on but he said it is 1968, it’s quite a long time ago so he said I never really mentioned it to you because I thought that … perhaps [because] that was a long time ago it’s not worth worrying about … Which I could understand.
(New residents interview (FWVG Project))

There is indeed ‘objective’ cause for scepticism about flood risk information. Flood plain maps underestimate risk in the case of flooding caused by inadequate storm-drains or groundwater and surface water runoff, and overestimate where flood defences or local topography have not been accounted for. In addition, the EA’s own literature concedes the maps ‘… cannot provide detail on individual properties’.² There was evidence of disbelief in the integrity of the maps among ‘at risk’ respondents, who had taken no action when warned:

‘Being on first floor flat didn’t worry’
‘Because property is not in flood area’
(Post Events Survey 2001 verbatim)

Some respondents also hinted at conspiracy between the EA and insurers:

But as soon as you give your postcode they immediately know you’re in a high risk flood area. […]
Participant 1: Even if you’re not, I mean I notice on the list of roads that you gave us one of those was … Hill, well I mean that’s literally up on the Downs, how can you possibly flood up there? [Laughter] […] And yet as far as … the insurance companies are concerned, all they have is your postcode […] The Environment Agency’s stated that you are in that area. […]
Participant 3: And in the harbour there are seven storey blocks … so if you live in the top of the storey […] You’re still going to be penalised.
(Owner occupier focus group (PRFW Project))

Depending on personal circumstances, recognition of vulnerability to flood risk, according to the ‘etic’ flood maps, may either be accepted and acted upon, a situation where the emic and etic perspective coincide, or rejected where etic and emic viewpoints are at variance. In the latter case there are two possibilities. First, the respondent is not actually at risk, due either to an error in the flood maps (the respondent lives on a hill or recent flood defences have not been taken into account) or personal circumstance (the respondent lives above the ground floor). Second, the respondent is at risk but does not
perceive this risk to be significant. Reasons for this are diverse: they may lack information about the risk; through past experience and local knowledge their perception of their coping ability may outweigh perceived risk; acknowledging the risk may have negative impacts (psychological and/or economic); or they may distrust the flood maps.

So, while there is value in identifying those ‘at risk’ to target awareness campaigns or to explore the environmental justice agenda, it must also be recognised that vulnerability is a quality of experience and produces different responses in different individuals. Rather than regard emic and etic perspectives as competing versions, complex social phenomena require coordination of the perspectives and their associated methodologies. The principal social science tool enabling such an approach is a mixed-method design that assigns different roles to different methods.

THE STANDING, USES AND FUTURE OF METHODOLOGICAL COMBINATION

The contemporary practice of multiple-method research

The status of MMRD contrasts in the academic and applied research spheres. MMRD remains controversial in the academic sphere. Since the canonical formulation of ‘triangulation’ in the 1950s, the social sciences have developed a range of considered objections on grounds of epistemology and incommensurability of methods. The situation contrasts with that in applied research, where many regard MMRD as a practical necessity. Bryman (2005) compared planned research design and actual practice in studies claiming MMRD, finding substantial divergence from the kind of planned use of MMRD that we might expect if the concept of MMRD was firmly established as part of the methodological canon. Researchers sometimes employed multiple methods without any rationale for why this was superior to using a single method; other researchers who declared such a rationale did not use multiple methods in the study itself, and yet other researchers who declared both a rationale and followed it through by using multiple methods actually relied on a single method for their analysis. These divergences reflect the fact that MMRD is not a technique, like calculating tests of significance or running a cross tabulation, but an attitude of inquiry, an approach to quality standards and to what constitutes adequate explanations of social phenomena.

The policy community (government, voluntary organisations and interest groups) is a growing consumer of social science research. In the UK and USA those engaged in commissioning research have increasingly construed adequate research as multiple-method research. At root, MMRD is a growing orthodoxy because of the ‘common sense’ appeal of the underlying logic (combined with either a measure of ignorance or indifference to the epistemological differences between methods), but the trend is also related to the increasing promotion of ‘evidence-based policy’, which has engendered significant institutional moves towards standardisation of research methods, manifest in professional reviews of research capacity, such as the Rhind Report in the UK (2003).

To overcome what are regarded as the constraints on the representativeness and generalisability of qualitative research, government has initiated both topic-specific reviews of quality standards for research (such as in health) and generic reviews of quality standards for particular methods, such as qualitative research (e.g. the Spencer Review for the UK’s Cabinet Office; Spencer et al., 2003). Such reviews tend to result in checklists of ingredients for reliable and valid research, and are uncomfortable reading for those who do not construe social research as a matter of following recipes, but there is no doubting the significance of such developments. In particular, qualitative research may have ‘arrived’, but it is welcome at the platform only provided its findings can be associated with findings from research using other methods.
Long before checklists emerged for qualitative research they were already a familiar part of the environment for quantitative researchers. Criteria in that area reflect the tidier characteristics of quantitative methodology and benefit from the benchmark standards that are intrinsic to work with statistical data, such as expected sample sizes, accepted tests of association and standard measures of effect size. So the checklist approach emerged earlier in relation to quantitative research and attracted less controversy. A major application of large-scale quantitative research is to health research and much of the heuristic associated with quality standards for quantitative research was laid down in the context of epidemiological research, which is associated with large samples and experimental/control designs. This approach is sufficiently embedded in the apparatus of policy-making that it has taken institutional form in organisations like the ‘Campbell collaboration’³ in criminal justice and the ‘Cochrane collaboration’⁴ in health. Membership represents a kind of official seal of approval to conduct research in this area and members must produce research that adheres to inflexible quality standards.

Ill-considered multiple-method research can lead to real methodological traps. We might take an example from the health field, concerning the UK controversy over the Measles, Mumps and Rubella (MMR) vaccine, a combined vaccination against common childhood diseases. A small sample study conducted by a medical researcher suggested a link between the vaccine and autism, and received considerable publicity. During the 1990s parental resistance to MMR vaccination grew, and many parents demanded that the National Health Service instead provide single vaccines against the various diseases. Other parents refused all vaccination. Both forms of parental resistance increased the incidence of the diseases. Health policy researchers were asked to address these problems. They wanted to add qualitative understanding to epidemiological and survey data. They proposed a ‘meta-analysis’ of qualitative studies. Initially their idea was to simply add together the samples from a number of qualitative studies of parental resistance until they had what they regarded as a large enough sample size from which to draw inferences. These researchers had no direct expertise in qualitative research. Their background was in epidemiology. It had to be explained that simply ‘adding together’ a cluster of qualitative studies would be to ignore the different modes of eliciting parental views, different analytic techniques, different degrees of experience of vaccination amongst the respondents and so on. ‘Adding together’ would do little more than multiply error.

Technological transformations

While the institutional frames within which multiple-method research is conducted cast a strong influence over what is understood as legitimate methodological practice, social research methodology is also responsive to new techniques, particularly those emergent from the computational field. In this section we consider some current and potential ‘transformative technologies’ for their potential impact on the future of multiple-method research.

A recent means of interrelating qualitative and quantitative data that embraces Caracelli and Green’s integrated approach has emerged largely by stealth. This is the development of quantification routines within computer-assisted qualitative data analysis (‘CAQDAS’). Most qualitative software counts ‘hits’ from specified retrievals (e.g. all single female interviewees who commented on divorce), and encourages triangulation by offering a port to export data to SPSS and import quantitative data tables. Some argue that such facilities represent a hybrid methodology transcending the quantitative/qualitative distinction (Bazeley 1999; Bourdon 2000). These claims relate to software that enables statistical information to be imported into qualitative databases and used to inform coding of text, with coded information then being exported to statistical
software for further quantitative analysis. For example, NUD*IST’s table import and export functions enable manipulation of exported data either as information about codes that have been applied to the text or a matrix built from cross-tabulated coded data. Some packages also have a command language for automating repetitive or large-scale processes, allowing autocoding of data. Quantitative data can be imported to inform interpretation before detailed coding, such as divisions within the sample that emerged from survey response.

Possibilities for interrelating data range from sorting qualitative comments by categorical or scaled criteria to incorporating the results of qualitative coding in correspondence analysis, logistic regression or other multivariate techniques. Categorised response sets exported to a statistics package for analysis are still linked to the qualitative data from which they were developed. For example, a table in N-Vivo provides access to qualitative data from each cell of the matrix produced when a cross-tabulation-type search is performed across data files. This enables users to show any number of socio-demographic characteristics against any number of selected codes. Supplementing counts of hits, colour-graduation of table cells flags the density of coding in each cell. Analytic searches can thus be composed of combinations of interpretive coding and coding representing socio-demographic details.

Since the emergence of Grid and High Performance computing in the late 1990s, a suite of new research tools has become available to social scientists (see Fielding 2003). Large gains in computing resource offer new data-handling capacities and analytic procedures, and new facilities to archive, curate and exploit social science data. A development relevant to methodological integration is in ‘scaling up’ findings from small-scale studies, which often have small sample sizes, non-standardised definitions and non-cumulative patterns of inquiry, in such a way that inquiries by cognate qualitative researchers can build on each other, so that findings from integrated qualitative studies can in turn be related to findings from quantitative research, exploiting meta-analysis strategies. Studies of family formation, the household economy and health-related behaviour are amongst areas where a number of qualitative studies, rich in themselves, have proved unable to ‘talk to each other’ due to varying conceptualisations addressing fundamentally rather similar characteristics.

XML protocols provide the basis of a meta-data model to integrate individual analyses from cognate small-scale studies. In other words, we increasingly have just the tools the medical researchers wanted in the MMR example above. By creating a translation protocol between researchers, data, contexts and interpretations, using an XML data model and wrappers around each individual study, the meta-data model can access and query individual datasets. An ontology is used to specify a common vocabulary for both methodological and substantive facets. The ontology is in effect a practical conciliation of quantitative and qualitative epistemology. Defining it draws out and reconciles different constructions of the features of the same phenomenon. The procedure of matching up the disparate terminologies employed by different researchers in a number of independent studies enables a ‘scaling up’ of findings without the problem of multiplying error. The ontology ‘translates’ between projects (so that what study A calls ‘conflict over shared space’ is matched to ‘kids fight over bathroom rights’ in study B etc.), enabling generalisations and heuristics derived from the different studies to be reliably combined while genuine differences are identified and highlighted.

Another e-Research tool relates to the under-exploitation of archival data, particularly in the qualitative field. The capacity to link data is a key issue in exploiting archived data: linking qualitative and quantitative data, and linking material like personal biographies to census data, maps and so on. ‘Data Grids’ enable researchers to share annotations of data and access multimodal, distributed archival
material with a view to producing multiple, inter-linked analytic narratives. A given data event can be represented by multiple streams and captured using multiple tools (for sound, image, transcript, statistics). ‘Asset management’ software such as ‘Extensis Portfolio’ and ‘iVIEWMEDIA Pro’ enables a range of data types to be held in an integrated environment that supports data collection, analysis and authoring. Such an approach was used in a multimedia ethnographic study of a heritage centre (discussed in Fielding 2003). Grid computing resources were used to distribute large audio and video datasets for collaborative analysis. For example, ‘Hypercam’ software was used to record ‘physical’ interaction within a 3D graphical environment as a way of annotating and modelling different visitor behaviours in heritage centres. The 3D files could be streamed over networks via the Internet, enabling researchers at other centres to comment on and modify the behavioural models in real time. Data Grids also enable researchers to access image, statistical or audio files held in remote archives and to work on them over networks (e.g. collaboratively, or using specialist software not available locally) or download them. Thus, an image database compiled in one study can be systematically compared to those from others.

Technology opens up new types of mode comparison. The oldest ‘research’ technique is pure observation and we still gain much from carefully watching what people do. Multimedia tools like THEME combine multivariate methods to detect behaviour patterns over time (Koch and Zumbach 2002). THEME searches for syntactical real-time patterns based on probability theory. Applying it to digital film, interaction patterns relating to complex behaviours can be found that are not detectable by ‘eyeballing’ the data. Comparisons can then be made between what is found using observation recorded in conventional field notes and using THEME. Since MMRD is all about making connections, technologies that allow researchers to derive comparator datasets, open up their own data to collation with that gathered by others and detect points of disparity have a helpful part to play.

The potential analytic yield of multiple-method research from fully exploiting expensively gathered social science data and drawing on the analytic affordances of computational technologies is very attractive. Such applications interest several disciplines, including anthropologists working with visual archives, linguists with sound archives and humanities and social researchers interested in multimedia work. More significantly, the ability to interrelate a host of data sources offers the potential for multimethod research to address social science ‘grand challenges’, such as the relationship between social exclusion and educational achievement in a mixed economy, in such a way that the kind of predictive capacity and causal explanation associated with the natural sciences comes into frame for the social sciences.

NOTES

1 http://www.environment-agency.gov.uk/news/ Environment Agency launches campaign to tackle flood apathy (12/10/2005). Accessed 20/02/2006.
2 http://www.environment-agency.gov.uk/subjects/flood/826674/829803/858477/862632/?version=1&lang=_e#3
3 http://www.campbellcollaboration.org/index.html
4 http://www.cochrane.org/index.htm

REFERENCES

Bazeley, P. (1999) ‘The bricoleur with a computer’, Qualitative Health Research 9 (2): 279–287.
Bazeley, P. (2006) ‘The contribution of qualitative software to integrating qualitative and quantitative data and analyses’, Research in the Schools 13 (1): 63–73.
Becker, H. (1986) Writing for Social Scientists, Chicago: University of Chicago Press.
Blaikie, N. (1991) ‘A critique of the use of triangulation in social research’, Quality and Quantity 25 (2): 115–136.
Bourdon, S. (2000) ‘QDA software: Enslavement or liberation’, Social Science Methodology in the New Millennium: Proceedings of the Fifth International
Conference on Logic and Methodology, Köln: Zentralarchiv für Empirische Sozialforschung.
Bryman, A. (2005) ‘Why do we need mixed methods?’. Presented at ‘Mixed-methods: Identifying the issues’, Manchester, 26–27 October 2005.
Burningham, K., Fielding, J., Thrush, D. and Gray, K. (2005) Flood Warning for Vulnerable Groups: Technical Summary, Bristol: Environment Agency.
Campbell, D.T. (1981) ‘Comment: another perspective on a scholarly career’, in M. Brewer and H. Collins, eds., Scientific Inquiry and the Social Sciences, San Francisco: Jossey Bass, pp. 454–486.
Campbell, D.T. and Fiske, D.W. (1959) ‘Convergent and discriminant validity by the multi-trait, multi-method matrix’, Psychological Bulletin 56: 81–105.
Campbell, D.T. and Russo, M.J. (1999) Social Experimentation, Thousand Oaks CA: Sage.
Caracelli, V. and Green, J. (1993) ‘Data analysis strategies for mixed-method evaluation designs’, Educational Evaluation and Policy Analysis 15: 195–207.
Caracelli, V. and Green, J. (1997) ‘Crafting mixed method evaluation designs’, in J. Green and V. Caracelli, eds., Advances in Mixed Method Evaluation, San Francisco CA: Jossey Bass.
Creswell, J.W. (2003) Research Designs, Thousand Oaks, CA: Sage. Second edition.
Denzin, N. (1970) The Research Act, Chicago: Aldine.
Denzin, N. (1989) The Research Act, New York: McGraw Hill. Second edition.
Denzin, N. and Lincoln, Y.S. (2000) ‘Introduction: the discipline and practice of qualitative research’, in N. Denzin and Y. Lincoln, eds., Handbook of Qualitative Research, Thousand Oaks, CA: Sage, pp. 1–28.
Elliott, J. (2005) Using Narrative in Social Research, London: Sage.
Fielding, Jane and Jo Moran-Ellis (2005) ‘Synergies and tension in using multiple methods to study vulnerability’. Presented at ‘Mixed-methods: identifying the issues’, Manchester, 26–27 October 2005.
Fielding, J., Burningham, K., Thrush, D. and Catt, R. (2007) Public Response to Flood Warning, Bristol: Environment Agency.
Fielding, J., Gray, K., Burningham, K. and Thrush, D. (2005) Flood Warning for Vulnerable Groups: Secondary Analysis of Flood Data, Bristol: Environment Agency.
Fielding, N. (2003) ‘Qualitative research and E-Social Science: Appraising the potential’, Swindon: ESRC, pp. 43.
Fielding, N. and Schreier, M. (2001, February) Introduction: On the Compatibility between Qualitative and Quantitative Research Methods [54 paragraphs]. Forum Qualitative Sozialforschung/Forum: Qualitative Social Research [On-line Journal], 2(1). Available at: http://www.qualitative-research.net/fqs-texte/1-01/1-01hrsg-e.htm [accessed 6 August 2007].
Green, J., Caracelli, V. and Graham, W. (1989) ‘Towards a conceptual framework for mixed-method evaluation design’, Educational Evaluation and Policy Analysis 11 (3): 255–274.
Hammersley, M. and Atkinson, P. (1995) Ethnography: Principles in Practice, London: Routledge. 2nd edition.
Johnson, R.B. and Onwuegbuzie, A.J. (2004) ‘Mixed methods research’, Educational Researcher 33 (7): 14–26.
Koch, S.C. and Zumbach, J. (2002, May) The Use of Video Analysis Software in Behavior Observation Research: Interaction Patterns in Task-oriented Small Groups [37 paragraphs]. Forum Qualitative Sozialforschung/Forum: Qualitative Social Research [On-line Journal], 3(2). Available at: http://www.qualitative-research.net/fqs-texte/2-02/2-02kochzumbach-e.htm [accessed 6 August 2007].
Kuiken, D. and Miall, D.S. (2001, February) Numerically Aided Phenomenology: Procedures for Investigating Categories of Experience [68 paragraphs]. Forum Qualitative Sozialforschung/Forum: Qualitative Social Research [On-line Journal], 2(1). Available at: http://www.qualitative-research.net/fqs-texte/1-01/1-01kuikenmiall-e.htm [accessed 6 August 2007].
Levins, R. (1966) ‘The strategy of model building in population biology’, American Scientist 54: 420–440.
Morgan, D. (1998) ‘Practical strategies for combining qualitative and quantitative methods’, Qualitative Health Research 8 (3): 362–376.
Needham, R. (1983) The Tranquillity of Axiom, Los Angeles: University of California Press.
Niglas, K. (2004) ‘The combined use of qualitative and quantitative methods in educational research’, Tallinn, Estonia: Tallinn Pedagogical University.
Pike, K.L. (1967) Language in Relation to a Unified Theory of Human Behavior, The Hague: Mouton.
Rhind, D. (2003) Great Expectations, London: Academy of Learned Societies in the Social Sciences.
Sieber, S. (1973) ‘The integration of fieldwork and survey methods’, American Journal of Sociology 78 (6): 1335–1359.
Spencer, L., Ritchie, J., Lewis, J. and Dillon, L. (2003)
Fielding, N. and Fielding, J. (1986) Linking Data, Beverly ‘Quality in qualitative evaluation: a framework for
Hills: Sage. assessing research evidence’, Government Chief
Social Research Office Occasional Paper 2. London, Tashakkori, A. and Teddlie, C. (1998) Mixed Methodol-
Cabinet Office. ogy, Thousand Oaks CA: Sage.
Spiers, J. (2000) ‘New perspectives on vulnerability Webb, E., Campbell, D., Schwartz, R. and Sechrest, L.
using emic and etic approaches’, Journal of (1966) Unobtrusive Measures, Chicago: Rand
Advanced Nursing 31 (3): 715–721. McNally.
34
The Analytic Integration of
Qualitative Data Sources
Ann Cronin, Victoria D. Alexander, Jane
Fielding, Jo Moran-Ellis and Hilary Thomas
INTRODUCTION

In recent times there has been a considerable growth in research projects using more than one method (see for example, Corden and Sainsbury, 2006; Dicks et al, 2006; Mason, 2006). This has led to renewed debate about the issues involved in using multiple methods in a single study, including questions concerning the different ways in which methods and data could or should be brought together (see for example, Caracelli and Greene, 1997; Moran-Ellis et al, 2006; Pawson, 1995). However, within these debates there is a tendency to focus attention on designs which bring together qualitative and quantitative methods, leaving aside research designs which utilise multiple qualitative methods, perhaps on the assumption that ‘qualitative data’ is a homogeneous category. In this chapter we examine the issues involved in integrating different types of qualitative data generated through three qualitative methods: ‘conventional’ in-depth interviews, photo-elicitation interviews and narrative interviews. Drawing on data from the PPIMs project¹ (Practice and Process in Integrating Methodologies project), which explored the methodological issues that arise in multi-method and multi-level approaches to investigating the management of vulnerability in everyday life, we specifically focus on the process of achieving integration across these sets of data at the point of analysis and document an approach we call ‘following a thread’ (Moran-Ellis et al, 2004).

We begin the chapter with a brief overview of the concept of integration before moving on to an outline of our research design. We then set out the framework we developed – ‘following a thread’ – to achieve the integration of linked but separately generated qualitative datasets at the point of analysis.

CONCEPTUALISING INTEGRATION

In our own work (Moran-Ellis et al, 2006) we have argued for the importance of the
conceptualisation of integration as a specific relationship between different methods (and methodologies) which accords equal weight to the findings of all the methods used for answering the research question, does not violate the epistemological or ontological assumptions that underpin them, but does not necessarily lead to any particular knowledge claims concerning validity or complexity. This differs from triangulation approaches which are concerned with the accuracy or interpretive complexity of research findings (see for example Bryman, 2004). Integration of data may be necessary for triangulation, but it is a process of bringing research methods (or datasets) together, whereas triangulation is an epistemological claim. It also differs from other uses of multiple methods: for example research designs where one method is given explanatory precedence and the data from other method(s) are used to support and elaborate on those findings; and those designs where one method is employed to develop the other such as in the use of focus groups to inform questionnaire design. In both these examples the different methods do not contribute equally to the production of explanations of the phenomenon (see Greene et al, 1989 for a more comprehensive review of the different ways in which multiple methods may be used). In effect, our conceptualisation of integration in multiple methods research is analogous to an integrated transport system where buses, trains and perhaps planes are linked together by terminals, connections and timetables. Passengers use different transportation modes for different parts of their journey as appropriate, and each form of transportation retains its own nature whilst also interfacing in a coordinated way with the other means of transport needed for the journey (Moran-Ellis et al, 2006).

Key to this conceptualisation of integration in research is the requirement that each method used retains its own character: different data types are not transformed into one type and then analysed using one analytic method. This retention of methodological character allows the findings of each dataset or method to contribute equally to answering the research question in their own paradigmatic terms, and the methods interface with each other through some kind of designed and systematic juxtaposition.

Integration can be achieved at various points in the research process, from research instrument design to interpretation of findings (see Brannen, 2004; Moran-Ellis et al, 2004). However, it is frequently the case that integration is deferred until the analysis stage either for pragmatic or theoretical reasons. Such ‘analytic integration’ is distinct from integration at other stages of the research process, and it is this that we discuss in this chapter. Using data from our PPIMs project we illustrate how analytic integration might be achieved using the framework of ‘following a thread’.

THE PPIMS PROJECT

The PPIMs project used a number of methods to explore the complex dimensions of vulnerability in the everyday lives of a wide range of people living in Hilltown². The project also examined the methodological issues involved in implementing a mixed-methods research design.

The project consisted of small-scale studies that explored participants’ understandings, experiences and management of everyday vulnerability. Table 34.1 provides an overview of each of the qualitative small-scale studies (the project also included a small-scale study that used secondary quantitative data but this is not discussed in this chapter).

Table 34.1 Overview of the PPIMs qualitative small-scale studies

Households: 21 in-depth individual interviews and 3 paired sibling interviews with each member of 6 households containing children/young people and at least one parent.
Individuals: 28 individual in-depth interviews with 10 people living on their own and 21 people living with at least one other adult.
People with experience of homelessness: 1 focus group discussion with 6 people who had all been homeless at some time. Individual interviews using a life history approach with 7 participants.
Visual follow-on study: Photo elicitation interviews with 13 people, based on photographs they took for the study. Video-recorded neighbourhood journeys with 8 people. Participants had already participated in one of the first three parts of the research.

The concept of vulnerability has been used extensively in both the physical and social sciences to investigate and theorise factors and processes that lead to individuals or groups having raised levels of risk concerning specific negative phenomena or events. Even though recent work has begun to take account of the socially constructed nature of vulnerability, it remains the case that much of the research on vulnerability has been underpinned by a deficit model which assumes that some groups of people are more vulnerable than others because they
lack something. For example, people may be classified as ‘vulnerable’ because they are homeless, children are assumed to be essentially vulnerable and older people are seen as vulnerable when they lack power and capacity. Undoubtedly the uneven distribution of economic, social and political power in society leads to certain groups of people being at greater risk of adverse events such as ill-health, trauma or material loss. However, this one-sided approach tells us very little about the experiential nature either of being a member of such a group or of feeling vulnerable. Furthermore, designating specific groups of people ‘vulnerable’ and implying others are ‘not vulnerable’ leaves us unable to examine how people (regardless of their situation) experience and manage vulnerabilities in everyday life. As Wisner (1991: 128) argues, research on vulnerability needs to ‘create ways of analysing the vulnerability implicit in daily life’, and the coping strategies that people develop to manage these. This conceptualisation of vulnerability points towards the research methods which can capture these experiential aspects.

In the PPIMs project we used three methods to generate qualitative data in respect of the experiential nature of vulnerability: in-depth interviews, life histories, and visual methods. These different methods have the potential to tap into different dimensions of vulnerability. For example, verbal accounts of vulnerability elicited through in-depth interviews allow exploration of meanings of vulnerability whereas accounts generated through photo-elicitation interviews may connect with constructions connected to the visual realm. Similarly, accounts generated in interviews may emerge with different co-constructions of vulnerabilities than those generated through life history interviews. Critical reflection on these possibilities points towards the potentially heterogeneous nature of our qualitative datasets and the implications of this for integration of these particular data. One implication concerned analytic approaches to different sets of data, and the question of how to analyse each dataset using an approach appropriate to the nature of that data, so that its epistemological contribution to understanding the phenomenon is realised, whilst also being able to integrate the analyses to produce explanations and understandings which were greater than the sum of the parts.

The in-depth interviews were based on conventional practices of using a broad schedule of topics to guide the interview, and being responsive to participants’ own accounts of their experiences and meanings with regard to questions asked. On the basis of this, we considered that the most appropriate analytic approach to the dataset generated through in-depth interviews was that of a grounded thematic analysis. For this the researcher typically begins by examining the data line by line, identifying themes and coding these (see Coffey and Atkinson, 1996), then developing these codings to capture multiple meanings, coding convergence and divergence, and the relationship of codes to broader categories. The process is iterative, and involves segmenting the data. Analysis then proceeds through consideration of codes and categories to develop a thematic level of analysis.
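The line-by-line coding just described can be pictured, very roughly, in code. The Python sketch below is an illustration only and is not the procedure used in the PPIMs project: the `Segment` class, the keyword rules and the code-to-category mapping are all hypothetical, and the toy transcript is invented rather than drawn from the study data. Simple keyword matching stands in for the analyst's interpretive judgement, which it cannot replace; the point is only to show segments being coded and codes being related to broader categories.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One line of a transcript, kept with a pointer back to its source."""
    interview_id: str
    line_no: int
    text: str
    codes: set = field(default_factory=set)


# Hypothetical keyword rules standing in for the analyst's judgement.
CODE_RULES = {
    "dark": "time_of_day",
    "night": "time_of_day",
    "alley": "unsafe_place",
    "avoid": "avoidance",
}

# Hypothetical mapping from codes to broader categories.
CATEGORIES = {
    "time_of_day": "context of threat",
    "unsafe_place": "context of threat",
    "avoidance": "managing risk",
}


def segment_transcript(interview_id, transcript):
    """Split a transcript into non-empty lines, numbered from 1."""
    lines = [ln.strip() for ln in transcript.splitlines() if ln.strip()]
    return [Segment(interview_id, i, ln) for i, ln in enumerate(lines, start=1)]


def code_segments(segments, rules):
    """Attach every code whose keyword occurs in the segment text."""
    for seg in segments:
        lowered = seg.text.lower()
        seg.codes = {code for keyword, code in rules.items() if keyword in lowered}
    return segments


def group_by_category(segments, categories):
    """Collect (interview_id, line_no) references under each broader category."""
    grouped = defaultdict(list)
    for seg in segments:
        for category in sorted({categories[c] for c in seg.codes}):
            grouped[category].append((seg.interview_id, seg.line_no))
    return dict(grouped)


# Invented toy transcript, not taken from the study data.
toy = """I never use the alley behind the shops.
After dark I avoid the canal path.
The garden feels fine to me."""

segments = code_segments(segment_transcript("toy-01", toy), CODE_RULES)
by_category = group_by_category(segments, CATEGORIES)
```

Each coded segment keeps its interview identifier and line number, so a reference listed under a category can always be traced back to the place in the transcript where it occurred.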
The practicalities of the process of comparison of segments of data lead to an enduring problem of this type of analysis, namely that the segments are to some extent removed from the contexts of their occurrences within the interview. The development of the thematic analysis requires the researcher to re-connect segments to contexts in order to derive legitimate interpretations of the data.

The visual study component of the project was based on two visually rooted methods: photo-elicitation interviews based on photographs, and video-recorded neighbourhood tours. In this chapter we focus on the verbal dataset generated through the photo-elicitation interviews. Photo-elicitation interviews involve participants discussing photographs with the researcher. In our study, participants themselves generated the photos about which they were interviewed. Collier (1967), an early advocate of this technique, suggests that the use of photographs during interviews helps frame and focus the discussion, sharpen memory, evoke rich descriptions and set the informant at ease. The interview enables participants to discuss their interpretation and meaning of the photographs and to provide an explanation for why they chose to photograph what they did. We felt that for this dataset a thematic approach to the photo-elicitation interviews was also appropriate. However, the presence in the photo-elicitation interview data of references to the photographs, and hence to the visual realm, both by participants and by the researcher, created a framing of participants’ experiences to which the thematic analysis also had to attend.

The third set of qualitative data was generated via interviews with people who were, or had recently been, homeless. Our experience of running a focus group with previously homeless individuals indicated that although participants were willing to engage in interactive discussion about their experiences of homelessness, they were concerned to present their own life accounts, or stories, of homelessness. Taking this into account, subsequent individual interviews specifically used a life history approach which enabled participants to narrate their experiences and thus situate the issue of vulnerability in a broader context. Consequently, these accounts required a different analytic approach. To have analysed such accounts through thematic analysis – which pulls short segments out of the whole interview, fragmenting it – would not have maintained the integrity of the participants’ stories. Accordingly they were analysed using a sociologically informed narrative analysis approach.

Narrative analysis focuses on the social construction of the story and the role that stories play in the construction and presentation of identity (Rosenwald and Ochberg, 1992). Moving beyond the idea that a story is representative of an individual life, attention is focused on the ‘joint actions’ involved in the production of the story. Plummer’s (1995) tri-partite model of the producers (those who tell their story), the coaxers (those who encourage and enable the story to be told) and the consumers (those who read/hear the story) is illustrative of this mode of thinking. Even though the producer (teller), encouraged by the coaxer, draws on real events and experiences to tell the story, the story is only ever an interpretation of the significance of past events and experiences. Finally the consumer will add another layer of meaning and interpretation onto the story. As Riessman (1993) notes, representation is ambiguous and always open to different interpretations. Thus, both the meaning and consequences of a story are always contingent upon first, the social location of those involved in the production and consumption of the story and second, the wider social context in which the story is told. For our purposes here we focus on the producers (the participants) who tell their stories.

In contrast to thematic analysis, narrative analysis begins by identifying the ‘sequence’ of a story. While ‘sequencing’ can take many forms, including chronological, consequential or thematic sequencing, it focuses attention on the socially constructed nature of the story. Thus analysis moves beyond the mere identification of past events and experiences to concentrate on trying to understand the
contemporary significance or meaning they hold for the individual telling the story. Only when a skeleton structure has been completed for each story included in the dataset is it possible to begin to make comparisons between the stories.

To summarise, then, it was epistemologically most appropriate to analyse the in-depth interviews and photo-elicitation studies using a grounded thematic approach, whilst the life history accounts were best analysed through a sociologically informed narrative analysis approach. Even though both the thematic and narrative approaches are located in a social constructionist paradigm, the former focuses on identifying conceptual themes and issues raised by the participants, while the latter attends to the social construction of the story and the role that stories play in the construction and presentation of identity. This represented a potential point of tension for integration in as much as one mode of analysis consists of extracting information from the whole, while the other seeks to maintain the ‘wholeness’ of the story. Where the goal is analytic integration, a means must be found for reconciling this tension without undermining the contribution of each method to understanding the phenomenon being researched.

Achieving an integrated analysis

There has been little written about the practicalities of integrating multiple datasets within the parameters of each achieving an equal contribution (with the exception of Coxon, 2005 and Pawson, 1995). To address this challenge in our own research with respect to the heterogeneity of our three qualitative datasets we developed an approach to enable us to be systematic and rigorous which we called ‘following a thread’ (Moran-Ellis et al, 2004, 2006). This consisted of four steps. The first step entailed each dataset being initially analysed using the analytic method appropriate to that data (as described earlier) resulting in the identification of emergent findings and further analytic questions.

Having undertaken this initial analysis of each dataset, the second step focused on identifying a ‘promising’ finding within a dataset which could be picked up as a thread to be followed through into the other datasets. The identification of a promising emergent finding may be sparked by the relationship between it and the over-arching research question, or by the resonance of it with one or more of the other datasets. This established a lead for further analysis involving an iterative interrogation of all the datasets.

This led to the third step whereby emergent findings, categories, and codes concerning the thread that was followed into each dataset were juxtaposed to create a data ‘repertoire’³. This repertoire was then further analysed to refine and extend the analysis of the relationship between the thread and the over-arching research question.

Finally, in the fourth step, the findings that Step 3 generated for a particular thread were synthesised with other threads that were similarly picked up and followed. This can be undertaken without predetermining whether the phenomenon being researched is multi-faceted, complex or singular, and without prejudicing the contribution each research method can make to the overarching research question.

The following section looks in more detail at this approach in practice. Even though we focus on our qualitative datasets in this chapter, in practice we used this approach to integrate all the PPIMs’ data including the quantitative data.

AN EXEMPLAR: FOLLOWING THE ‘PHYSICAL SAFETY’ THREAD

In team discussions of the initial analytic findings from our qualitative datasets it became apparent that a particular finding from the visual component – that of the significance of physical safety as a vulnerability to be managed in everyday life – resonated strongly with emergent analytic findings in the homeless data and in the sub-set of interviews with the children and
young people in our study. On this basis we moved to Step 2 and took it up as a ‘promising’ thread, systematically identifying and analysing ‘physical safety’ in these and the other datasets we had generated. Through this we identified codes and categories, and generated emergent findings on ‘physical safety’ for each dataset. This led on to Step 3 where we juxtaposed these to create a data repertoire. This repertoire was then analysed further, with particular emphasis on analytic questions such as whether issues concerning vulnerability and physical safety were persistent features of experiences of vulnerability, the different facets that were revealed by different research methods, and the importance of contexts for how this form of vulnerability was experienced.

Whilst it is interesting that two of our sets of participants – people who are homeless, and children and young people – are usually classified as ‘vulnerable’ in policy terms, or categorised as members of a vulnerable group in objective measures and conceptualisations of their social position, they were not selected for particular analytic attention on this basis. Rather their data have been given prominence here because of the strong resonances we found between the emergent findings in the analysis of the photo-elicitation interviews, which were conducted with a range of people, and the initial analyses of the data generated with these two groups of participants. Our orientation to all the participants in the study was to their subjective understandings and constructions of vulnerability in their everyday lives, and their accounts of how they strategically manage these vulnerabilities. This precludes any assumptions being made about essentialised or inevitable vulnerabilities for any group of participants in our studies. In this respect, the use of multiple qualitative methods was particularly valuable as it enabled us to gain an extensive and intensive exploration of vulnerability as a subjective interpretive phenomenon. This allowed for people’s own understandings and agency and moved away from the overarching deterministic discourses which tend to characterise notions of vulnerable groups and individuals at an objective level.

Step 1 – Initial analysis

The photo-elicitation interview data

Thematic analysis of the photo-elicitation interviews suggested that participants associated a threat to physical safety with specific places, groups of people or hazards, with a distinction being made between physical assault and accidents. Photographs of dark alleyways, deserted paths, and graffiti were taken by respondents to represent unsafe places where assaults could occur. Participants often said that they avoided these places, especially at night. Photographs of a fast lorry, a dark street and a blind curve represented potential traffic hazards. Even though participants said that they had to exercise due care, these hazards were constructed as being beyond the control of the individual and responsibility was seen to rest with ‘the Council’. In relationship to a photo (of a blurry lorry), one participant commented:

I hate the lorries using this as a rat run to the industrial estate at the end because they make the house shake. The whole road is up in arms about that. (Jane, 37 years old)

Participants made a further distinction between potential threats (either malicious or accidental) to their own safety and threats to other people. In the latter case, participants talked about the threat to specific groups of people – children, the elderly or the disabled – suggesting they saw vulnerability as being an inherent characteristic of these particular groups. One respondent, for example, photographed an uneven pavement which she saw as a potential tripping hazard. She was not concerned for her own safety, but referenced ‘vulnerable old people’, perhaps with walking sticks, who could easily trip. The same respondent photographed the detritus of drug use but focused her concern on this being found near a primary school:

… it’s literally about 50 yards from the back end of the school field and there is a gate that goes from the junior school, to this. It is literally about 50 yards
and you go down there and they have got, they have made, bits of furniture that have been chucked away, like that was a table and all around there there is paraphernalia, what I call paraphernalia. There is drink cans, there is coke cans where they have made bombs to smoke drugs, there’s even silver foil where they have actually, we did have a look and it looked as though they had been smoking heroin and that is a concern, obviously, to the whole of the neighbourhood because any kids of any age can go down there. (Alice, aged 56-65)

Certain types of public space, represented by photographs of alleyways, overgrown passages between buildings and a subway were considered intrinsically unsafe, particularly at night time, due to the potential for physical assault. The canal had a more ‘fluid’ status as a safe/unsafe place, seen as a recreational amenity during the day but dangerous after dark. In addition, specific groups of people (the homeless, drunk people, local youth gangs) were labelled ‘trouble’ or ‘scary’, generally because they represented a potential threat to an individual’s safety. Even though participants did not take photographs of people whom they feared – participants cited safety reasons for not photographing these threatening people, but also said that they did not feel comfortable invading the privacy of such individuals – other means were used to indicate the sense of threat felt by participants. For one respondent, a photograph of graffiti was emblematic of a gang of youths who were considered unstoppable due to the support they enjoyed from older male relatives. In contrast, participants took images of graffiti to suggest that crime was generally prevalent in the area where it was found.

Photographs of CCTV cameras were presented as either representing the dual sword of security and surveillance or, in one instance, given that an old man had been physically assaulted twice under the photographed camera, used to question the notion of security implicit in the use of CCTV cameras. In contrast to these examples, photographs of personal spaces – homes, gardens or bedroom – were taken to indicate safe, comfortable places.

Whilst experiences and perceptions of vulnerability were represented visually in the photographs as well as elaborated in the accompanying photo-elicitation interviews, it was only in the latter interviews that people talked about how they managed potential threats to their physical safety. From this verbal data we identified three key strategies which participants used to minimise either actual risk or their perception of risk. The first strategy was to avoid places or people categorised as unsafe. The second was related to the degree of familiarity participants felt about their local environment. While a high degree of familiarity could be used to aid decisions about which places or groups of people to avoid, it was also used to ‘offset’ feelings of insecurity or a lack of safety. One woman, for example, claimed that she felt safe living in the neighbourhood despite knowing that other people had been assaulted there. She had lived in the neighbourhood for a long time and was familiar with it, and so felt it was safe. This links to the third strategy of displacing the perception of risk to oneself onto groups of people already designated vulnerable.

Step 2 – Picking up the promising thread of ‘physical safety’

The accounts of children and young people

Picking up the thread of physical safety in the interviews with children and young people, the analysis showed that these participants were often making decisions, and taking actions in relation to their safety, based on other people’s worries and concerns rather than their own. In particular they were subject to the worries and concerns of their parent(s), which varied in terms of what the worry was, and how strongly it was a factor in parental moves to constrain their child’s actions:

I: Are there any […] rules that your parents set [about using the internet]?
P: Not really but they don’t let us have hotmail because of the chat room, my sister had it but I don’t know what she did but then they banned it … so I don’t get the benefit which I think is really unfair as all my friends have it and I’m the only one who doesn’t have it
I: Do you understand the reasons why you can’t have it?
P: Not really, I asked but they wouldn’t tell me.
(Tom, age 13 years)

The children in our study, aged 10–13, indicated they were constrained concerning their actions, the places they could go, and how they got there. In general they accepted these limitations whilst also wishing for, and indeed trying to gain, greater autonomy in their movement in public spaces. Two of the children who had recently started cycling into the town centre on their own identified this as an extension of their usual domains beyond the house and garden. Undertaking this venture was accompanied by an acute awareness that they needed to guard their safety in respect of being in the town unaccompanied by an adult. Thus the threat, and their physical vulnerability, was associated with being in a particular place without the protection of an adult rather than the hazard of cycling on the roads (the latter being a safety issue they did not mention). Another child spoke of his sense of a particular threat to his safety when he was not in the company of a protective adult:

I: What is it about strangers that you worry about?
P: Kidnapped.
(Jack, 13 years old)

Indeed for some children the threat of being kidnapped or murdered framed their reflections on whether there were places in the town that they might not go, or where they had to be careful. These threats were ‘monstrous’ but at the same time the children outlined their strategies for maximising their safety, primarily through being able to identify people who might pose such a threat:

P: If I like see someone who doesn’t, if it’s late or something and I find, if I see someone who doesn’t look like normal than I just walk off with my mates and go somewhere else.
I: […] what kind of things do you look for when you’re trying to decide if someone’s OK or a bit?
P: It’s just like if he doesn’t look right, they’re watching and things.
(Stuart, age 12 years)

Constraints related to safety were also often contingent on time of day. The arrival of ‘the dark’ was a particularly important marker of a shift from a safe time to an unsafe time. In this respect, the temporality of safety and threat resonated with a similar framing by adults as well as children in the visual data accounts. However, in the visual data it was named places that became less safe with the arrival of night time, whilst in the interviews with the children parental fears were understood as being simply about ‘the dark’:

I: What about when you are outside playing? Are there rules about where you can go or what time?
P: Sometimes I am not allowed to go to the park I have to stay right in front [garden]. And we are not allowed to come home really late.
I: What is late, what would be late?
P: Well, when it gets dark. When it gets dark.
(Yasmin, 11 years old)

In contrast young people, generally aged 14–18, felt that these worries about safety belonged really to their parents and did not reflect the safety issues that they actually had to deal with when they were out and about in public spaces. These young people identified having to deal with threats of violence: some of the places they went – the amusement arcade, the town centre – opened up the possibility that they might encounter individuals who wanted to fight, gangs or general violence. Thus, it was important to know when to leave a place and who to avoid. Furthermore, young people often worked to manage their parents so they did not find out about these hazards, for example by withholding information as to their true whereabouts or by presenting themselves in ways designed primarily to reassure their parents:

I go to my friend’s house and we’ll go out, and I’ll just text my parents and say we’ve gone here, there or wherever. If I’m staying at a friend’s house, I will go out with them but won’t tell my parents. (Lucy, 14 years old)

While space does not permit a full discussion here, one of the girls in the study also talked about managing gendered threats to her safety
from men, whilst another indicated that this was a parental worry that she had to negotiate in order to be allowed out with her friends or on her own.
Young people who articulated a definition of vulnerability tended to associate it with the ability or inability to defend oneself physically from attack.
The narrative data: Stories of homelessness
Picking up the thread in the narrative interviews with people who were homeless, the analysis revealed the ways in which the topic of physical safety in the accounts of people who had experienced homelessness was a salient factor in both the construction of identity, and the material practices of daily life. Physical safety – the lack of it, the search for it, the meaning of it – was an integral part of individual stories.
Many participants presented biographical, chronologically structured accounts of their lives which highlighted the lived experience of vulnerability and its links to (a lack of) physical safety. This included, for example, physical, sexual and emotional abuse in childhood, experiences of being street homeless, the physical dangers inherent in alcohol and/or drug abuse or the transient nature of many homeless people’s lives.
At the individual level it was evident that the majority of the stories were structured around the ‘quest’ for physical safety; taken collectively it was possible to chart the different ‘stages’ involved in homelessness, the strategies developed at each stage to deal with the experience, and the subsequent impact on identity.
One young man – David – for example, had become homeless in his home town and had lived for a short period of time in a car, yet had felt safe doing so because of his familiarity with the area and the people. This contrasted sharply with his recent experiences of living in a night shelter. His lack of familiarity with the area, coupled with his perception that local people were actively hostile to homeless people not only led him to reflect on the salience of this new identity for him but also to develop strategies to publicly hide this identity: he avoided mixing with other homeless people in public therefore hoping to ‘pass’ as a general member of the public, thus remaining safe. Another resident of the night shelter – Tom, a man in his late thirties and homeless since the age of 14 – had developed additional strategies to cope with the physical threats that arise from being street homeless. On arriving in a new and unfamiliar environment he applied knowledge gained in previous locations to the new location, in short constructing a ‘universalised’ safety ‘map’. For example, previous experiences had taught him that the chances of being physically attacked were higher if he slept in the centre of a town as opposed to the outskirts, thus he routinely avoided the centre of all towns.
Participants who had been through drug and/or alcohol rehabilitation and were currently living in residential move-on accommodation, from which they hoped to move to individual permanent accommodation, were at a different stage and this was reflected not only in the telling of their stories but also in their reflections on physical safety. While producing in-depth accounts of past threats to physical safety, the majority felt physically safe in the present although recognising that this was contingent upon remaining alcohol and/or drug free. Looking to the future, participants expressed concerns that they might be housed in areas populated by drug users and dealers, which would constitute a new threat to their physical safety.
These participants adopted a number of strategies to reduce threats to their physical safety, including avoidance, ‘invisibility’ and ‘passing’. Additionally, recovery from alcohol and/or drug abuse was often talked about in terms of a long-term strategy to reduce the risk of physical harm, in as much as the ultimate goal is permanent accommodation and reintegration into ‘mainstream’ society. In addition, participants’ explanations for why they left home could be construed as a strategic act of resistance, whereby being homeless was considerably preferable to being subjected to further abuse at home.
In the discussion of the previous two datasets it was possible to use data extracts
from the interviews to illustrate our analysis. Unfortunately, a combination of a lack of space and methodological considerations does not permit the inclusion of data extracts from the homeless accounts – one ‘extract’ ran to some 12 pages of transcription. In order to do justice to the data we would need to present extended data extracts to demonstrate the narrative nature of the accounts and the presentation of identity. One man, for example, began his interview by asking if the interviewer wanted the story of his life and then proceeded to provide a very detailed chronologically ordered account of his life, which attempted to provide a socially situated explanation for his homelessness. Drawing on the notion of ‘discredited identities’ (Goffman, 1963) it is possible to see how the interview provided the homeless participants with the opportunity to provide an alternative account of homelessness from the negative one traditionally portrayed in society.
Step 3 – Creating a data repertoire
The third step of this process of analytic integration involved juxtaposing both the initial analytic findings of the individual datasets and the data segments/elements that had been coded in the initial analysis to create a data repertoire for the theme of ‘physical safety’. This was then subjected to further analysis and interpretation, looking for commonalities and differences, convergences and divergences. Effectively this repeats the process of inductive analysis with the data identified as salient to the thread of physical safety whilst remaining mindful of the implications of the nature of the data and its origin. It is through the development of the analysis of the data repertoire that findings can be integrated to produce a more complex understanding of the thread and its relationship to the overall research question.
In relation to physical safety and vulnerability, further analysis of our data repertoire led us to understand physical safety as both a present and an embedded past feature of the lives of people who were homeless, a present but negotiable hazard for young people not living with violence and a visually locatable phenomenon for all participants. For young children the concept was often related to extraordinary events (kidnap, murder) rather than more ordinary or frequent threats to physical safety such as road traffic accidents, muggings or assault.
Contingent features of vulnerability also emerged out of the analytic integration of the three datasets. These included vulnerabilities associated with physical safety which were contingent on time of day/night as well as being linked to material-spatial-architectural aspects of public spaces. In terms of dealing with situations and locations which increased perceptions or senses of physical vulnerability, all participants identified strategies which they used to manage their (potential) vulnerability.
It became clear that perceptions, constructions and experiences of vulnerability also diverged in different domains for different groups of participants. For people who were homeless, vulnerability was closely tied to the biographies that had led them to be without a home. They identified physical assault as a recurrent feature of their childhood homes, their temporary homes in their adult lives, and of their times living on the street. In addition, threats to their physical safety were encountered, or anticipated, when moving into new areas or new towns and occasioned the need to make decisions about where they would stay and where they would locate themselves. Physical vulnerability was tied into the identity of being homeless in a profoundly biographical and narrative way.
In contrast the physical safety issues that concerned children and young people in our study reflected the ways in which they are positioned between structures which constrain their actions on the basis of their age, and their own desires, opportunities, and abilities to be (relatively) autonomous social actors (see Hutchby and Moran-Ellis, 1998; James and Prout, 1990). In this regard their constructions of physical safety and vulnerability were linked to the relative distributions of power between adults and children/young people, and the ways in which
these distributions intersect with their social worlds. For the children in the study, their sense of vulnerability in physical terms related to extending their usual geographical range from their immediate localities with known adults nearby to being unaccompanied in public spaces at a further distance from home. They managed their vulnerability by adhering to parental rules which they understood to be designed to maximise their safety, and by developing their own readings of other people in their vicinity in terms of whether or not they might present a threat. For the young people in the study their social worlds were already more extended both geographically and temporally, but they sought greater control and autonomy in their movements and activities. With this came an increased likelihood of having to deal with physical safety issues, with threats presented by others in the form of fights, gang actions, violent encounters, and possible sexual harassment or assault. Key for the young people in the study was managing parental concerns so that the young people could exercise physical autonomy in the face of other people’s worries about their vulnerability whilst balancing this with managing the potential risk of actual violence when they were in the public arena. Their perception of their own physical vulnerability was framed in the context of their strength or weakness relevant to their potential assailant.
Physical safety and vulnerability took on a different dimension in the domain of the visual as represented in the photo-elicitation interviews. Here it was the material fabric of places which was invoked visually and verbally as increasing or decreasing vulnerability to physical hazards and assaults. The built environment was taken to be a context in which a person’s vulnerability may be accentuated – for example that of older people who might trip over loose paving, or children who were at secondary risk to the hazards of drug taking near their school. This material context intersected with ideas of time of day and sources of responsibility to produce physical vulnerability as a product of ecology on the one hand and an inherent characteristic for some on the other. Physical vulnerability can be understood in visual terms as readings of present dangers, future dangers, and attributed responsibility for causing the vulnerability to outside agents such as the Council, or a local group of youths. Strategies for managing safety were not manifested in the visual domain, emerging instead as accounts of actions including avoidance of the location.
In summary then, physical safety emerges as a dimension of vulnerability but how it emerges is contextual to the social worlds of the participants. How people experience vulnerability, and how they act on that, varies considerably whilst the environment presents different degrees of threat. For the homeless people in the study physical safety was a key strand in their narratives, interweaving with their identities and biographies. For the young people and children it was a site around which the relationship between their structural position in their families, and in society more generally, and their status as social actors is played out. Visually, the notion of physical safety can be framed by participants as a material, ecologically located phenomenon. Uniting these dimensions brought us to considerations of vulnerability and safety which suggested that, whilst there were commonalities of dimensions across different genres of experience and perception, such as the significance of time and place, this form of vulnerability also intersected with individuals’ notions of their own and others’ identities. This led us towards theorising how this aspect of vulnerability and its intersection with (or contribution to) individual identities fits with other forms of vulnerability. To address this question we returned to the data to identify other ‘promising threads’ and followed those analytically across the datasets. The final goal was to synthesise these findings with other themes, to create multi-faceted understandings of vulnerability and its management in everyday life across a broad range of dimensions that emerged from our research. At the end our theoretical understandings of vulnerability
was a picture woven from these different threads.
CONCLUSION
Challenges in this approach
Our goal in this chapter has been to demonstrate how integration of different qualitative datasets, through an examination of each set of findings relating to safety and vulnerability, increased our understanding of the experience of vulnerability. Each dataset contributed an equal share to the analysis of vulnerability and physical safety, and as the analysis proceeded, we were able to reflect on the complex nature of vulnerability in this regard.
This process for generating analytic integration is time-intensive and entails a number of challenges. The first is identifying ‘promising’ threads. There are a number of strategies used in single dataset analyses which can be drawn on: inductive leads may arise from within the project, through reference to the research question or sensitivity to the content of the data, or they may be sparked externally, so to speak, by the stimulus of theoretical work and other empirical studies. In addition, team discussions about dataset contents, emergent findings, and puzzling questions are essential for establishing resonances between the datasets. Thus it is important that team research includes team members with a range of expertise, allows for appropriate methodological divisions of labour, and includes sufficient opportunities for good communication within the team.
Another key challenge is to allow each dataset its own integrity throughout the integration process. Creating a data repertoire of systematically identified initial analyses, assembled for further analysis to produce an integrated story about a particular aspect of the phenomenon (such as we have done in this chapter with respect to physical safety and vulnerability), might seem to privilege a thematic approach to analysis. If this were the case, it would be problematic for data more appropriately handled by other analytic approaches. In respect to this, the PPIMs team critically examined what happened to the narrative accounts of the homeless people when the data repertoire was created. Our conclusion was that the data repertoire could encompass narratives provided effort is made to preserve their integrity by constantly re-examining the links between the themes and the narratives. In our case, the overarching structural narrative feature, which was paramount to our understanding of the homeless participants’ accounts, draws on notions of identity as a homeless person. The theme of physical safety extended beyond specific instantiations to form a cornerstone of identity. We suggest that the salience of identity in these accounts complements those produced by other participants and resulted in an increased understanding of the theme of physical safety and the overall theme of vulnerability.
Nevertheless, the potential ‘risk’ of some types of data being ‘translated’ into other types remains. We successfully retained the narrative quality of the homeless data; however, we were unable to convey a sense of the story-ness of the data in a short chapter such as this. Similarly, in this chapter we described photographs, thereby translating visual data into verbal, and relied on the transcriptions of the photo-elicitations (textual data) leaving aside actual visual analysis which was also part of the study. The photo-elicitation draws on the visual knowledge of the study participants and is therefore distinct from the other interview data. We believe that visual data itself, as with narrative data, can be part of analytical integration; however, conventional reporting and publishing formats provide challenges in presenting such data in their own terms.
In this chapter we have argued, drawing on our earlier work, that integration should be thought of as a process which creates, and analytically exploits, a particular relationship between different sets of data. We have also argued that since all qualitative data are not alike attention must be paid to the processes by which research generating multiple qualitative datasets will achieve
integration, where that is the purpose of having a multiple methods research design. To this end we have presented a model for the practical accomplishment of integration at the level of analysis – ‘following the thread’ – which focuses on ensuring that the integrity of each type of dataset is preserved in the process of integration, and hence the epistemological contribution of each set of data is maintained. We also argue that this approach offers the opportunity for synergies between datasets in order to achieve one of the goals of multiple-methods research: the generation of an overall analysis which is greater than the sum of the (methodological) parts.
NOTES
1 ESRC Award H333250054 Investigating Practice and Process in Integrating Methodologies (PPIMs). The project is funded by the ESRC under the Research Methods Programme http://www.ccsr.ac.uk/methods/.
2 A pseudonym for a small town in the South of England. All participants are anonymised.
3 This alludes to a repertoire of dance or music pieces, rehearsed and developed, which provide a pool from which a selection is made to create a particular conceptual performance. We use this to capture the assemblage of initial analyses which are not ‘raw’ data, have their own (methodological) integrity, and which can be brought together to produce a coherent ‘story’. We would not, however, wish the metaphor to be taken too far: the intention is to provide some language to describe this part of the process of integrated analysis.
REFERENCES
Brannen, J. (2004). ‘Mixing methods: the entry of qualitative and quantitative approaches into the research process’, International Journal of Social Research Methodology, 8(3): 173–184.
Bryman, A. (2004). Social Research Methods, second edition. Oxford: Oxford University Press.
Caracelli, V.J. and Greene, J. (1997). Advances in Mixed-Method Evaluation: The Challenges and Benefits of Integrating Diverse Paradigms, New Directions for Evaluation, No. 74. San Francisco, CA: Jossey-Bass.
Coffey, A., and Atkinson, P. (1996). Making Sense of Qualitative Data Analysis: Complementary Strategies. Thousand Oaks, CA: Sage.
Collier, J. (1967). Visual Anthropology: Photography as a Research Method. New York: Holt, Rinehart and Winston.
Corden, A., and Sainsbury, R. (2006). ‘Exploring “quality”: research participants’ perspectives on verbatim quotations’, International Journal of Social Research Methodology, 9(2): 97–110.
Coxon, T. (2005). ‘Integrating qualitative and quantitative data: What does the user need?’ FQS (Forum: Qualitative Social Research), 6(2): e-paper. http://www.qualitative-research.net/fqs/fqs-eng.htm. Accessed July 2006.
Dicks, B., Soyinka, B., and Coffey, A. (2006). ‘Multimodal ethnography’, Qualitative Research, 6(1): 77–96.
Goffman, Erving. (1963). Stigma: Notes on the Management of Spoiled Identity. Englewood Cliffs, NJ: Prentice Hall.
Greene, J.C., Caracelli, V.J., and Graham, W.F. (1989). ‘Toward a conceptual framework for mixed-method evaluation designs’, Educational Evaluation and Policy Analysis, 11: 225–274.
Hutchby, I., and Moran-Ellis, J. (eds) (1998). Children and Social Competence: Arenas of Action. London: Falmer Press.
James, A., and Prout, A. (eds) (1990). Constructing and Reconstructing Childhood. London: Falmer Press.
Mason, J. (2006). ‘Mixing methods in a qualitatively driven way’, Qualitative Research, 6(1): 9–25.
Moran-Ellis, J., Alexander, V.D., Cronin, A., Dickinson, M., Fielding, J., Sleney, J., and Thomas, H. (2004). Following a Thread – An Approach to Integrating Multi-method Data Sets, paper given at ESRC Research Methods Programme, Methods Festival Conference, Oxford, July 2004.
Moran-Ellis, J., Alexander, V.D., Cronin, A., Dickinson, M., Fielding, J., and Thomas, H. (2006). ‘Triangulation and integration: Processes, claims and implications’, Qualitative Research, 6(1): 45–59.
Pawson, R. (1995). ‘Quality and quantity, agency and structure, mechanism and context, dons and cons’, BMS, Bulletin de Methodologie Sociologique, 47: 5–48.
Plummer, K. (1995). Telling Sexual Stories: Power, Change and Social Worlds. London: Routledge.
Riessman, C.K. (1993). Narrative Analysis. London: Sage.
Rosenwald, G.C. and Ochberg, R.L. (1992). Storied Lives: The Cultural Politics of Self-understanding. London: Yale University Press.
Wisner, B. (1991). ‘Rural livelihoods in Kenya, 1971–1990: Further reflections on justice and sustainability’. Paper presented to the Association of American Geographers, Miami.
35
Combining Different Types of Data for Quantitative Analysis
Manfred Max Bergman
A man [sic.] with a watch knows what time it is.
A man [sic.] with two watches is never sure.
Segal’s Law
INTRODUCTION
Data do not occur naturally, nor do data ever speak for themselves, nor does there exist an obvious interpretation for a datum. Instead, data are manufactured and interpreted to fit a particular research purpose or line of argumentation. Empirical detection and interpretation of presences or absences, patterns, order, structure, or change, regardless of whether inductively or deductively derived, are the outcome of theoretical models and assumptions underlying analysis, of which data are an integral part. Already in 1964, Coombs wrote: ‘knowledge is the result of theory – we buy information with assumptions – “facts” are inferences, and so also are data and measurements and scales’ (1964: 5). If data production is part of the constitutive process of research, then from where do they come and of what are they made? And if data are indeed thus produced, what are the advantages of using more than one dataset for a particular research purpose?
This chapter is about what and how data are detected and used, and, as a consequence, how certain limitations thus arising may be overcome by using more than one dataset. Of particular interest are different types of data and how they are selected and combined in modern research designs. For this objective, it is necessary, first, to conceptualize data and their integral position within the research process, second, to understand the process of data production, and, third, to explain the possibilities and limits of using more than one dataset for a research project. This chapter will not deal with data analysis issues specifically but will nevertheless cover reasons for which more than one dataset could be used in quantitative research. In addition, while many of these issues could be applicable to qualitative or mixed-methods analysis, the explicit focus here is on quantitatively oriented research. Finally, there exists an excellent literature on validity and reliability, which connects in many ways to the use of
more than one dataset for a particular research purpose. In this text, however, such issues are not covered in detail. The utility of using multiple datasets transcends quality issues relating to classical validity concerns but tends to be under-theorized. This chapter addresses this omission.
DATA AND THE RESEARCH PROCESS
Prompted by various introductory texts and lectures on research methods and methodology, most people understand empirical research as a tripartite process: the conceptualization of a research question, the collection of data, and the analysis of these data, from which the research results emanate.
The conventional view of the research process
The conventional model about the research process connects the four principal research components, i.e. research question, data collection, data analysis, and the research results¹, in a specific way. Figure 35.1 illustrates this conventional view of the research process.
Figure 35.1 The conventional view of the research process (components shown: Research Question, Data, Analysis, Results)
There are three fundamental problems with this research model: chronology, fragmentation, and apparent inevitability:
• Chronology: This conventional model implies a chronological ordering of the different parts of the research process such that researchers appear to have settled on a research question before they collect or select appropriate data, and only then would they consider how these data are to be analyzed. Thus, the model strongly implies a deductive approach to research, while inductive research, including data exploration and visualization, is either ignored or spurned². In practice, researchers often either formulate or at least adjust their research questions according to the characteristics of the available data. This is particularly the case with secondary analysis of existing data, where researchers often create proxies from variables that may be related to, but do not fully connect with, a construct under investigation, or they adjust their research questions or models to create a more adequate fit between the constructs embedded in the research question and the data available. Moreover, few researchers are unclear about what analytic techniques they will use, at least in general terms, before they have collected their data, often selecting the analytic strategies and methods according to their analytic competences and habits. Quite often, specialists in multidimensional scaling, correspondence analysis, latent class analysis, etc. tend to stick with the technique with which they are familiar.
• Fragmentation: Due to the conventional tripartite division of the research process, researchers tend to focus on the details relating to the components of the research process – research question, data collection, and data analysis – while neglecting the intricate relations between them. However, the quality of the research process and its results is at least as dependent on the interconnectedness between the components as it is on the components themselves. Due in part to this fragmented research design, many research results are unconvincing or incommensurable with other research findings, despite the availability of appropriate data and the application of sophisticated analytic techniques. This connects to some extent to John Tukey’s suggestion that there exists an error source far more treacherous than the Type I or Type II error: the greatest threat to validity, the ‘Type III error,’ is asking the wrong questions of the data (cited in Raiffa, 1968).
• Inevitability: This model also implies a certain inevitability of the results that emerge from the research question. The research results are believed to be an inevitable consequence of the research question because the data were collected
or selected based on their suitability for answering a particular research question, and the analytic technique was selected according to the data at hand and in line with the research question. However, this is not necessarily the case. Just because a dataset is analyzed adequately, i.e. the analysis conforms to established standards and its output provides an answer to the research question, it does not mean that no other analyses are equally adequate for this dataset and research question. A different analytical model with the same data or a similar statistical model with different data is likely to produce variations in the results, even if the research question remains the same. What is neglected in this tripartite research process with one dataset and one analytic strategy is the awareness of equally suitable alternatives, i.e. other suitable datasets or analytic strategies, which could have served equally well to answer the research question. Due to the implied causal chain – from research question via a dataset to the research results – variations in results due to alternative data choices or analytic strategies are rarely considered.
An alternative view of the research process
Experienced researchers are not taken in by this traditional model and its implications. They are aware that the components of the research process are far more integrated, that many decisions about data analysis have been taken long before data have been collected, and that there exist many options to answer a particular research question. All research findings are contingent. The intricate interconnectedness between research results, research question, data, and analysis is illustrated in Figure 35.2.
Figure 35.2 Interdependence between the research question, data, analysis, and results (components shown: Research Question, Data, Analysis, Results)
Even though this model is less parsimonious than the research model presented in Figure 35.1, it is more comprehensive, making explicit the complex interactions between different parts of the research process. As a more realistic representation of the research process, it implies that:
• The research question, data collection, and data analysis are interconnected reciprocally and are thus not connected to one another chronologically. For example, experienced researchers formulate precise research questions or hypotheses based in part on the data that can be or have been collected. They furthermore collect data such that they are suitable for a particular set of analyses. This does not only refer to the kind of data necessary to answer research questions but also to the ‘shape’ data need to have in order to be analyzed according to various analytical techniques.
• The intricate relations between the components of the research process make it necessary to consider the research process within a larger research framework, within which the relations between the components are as important as the components themselves. Thus, there may exist
different ways to analyze a particular dataset or there may exist many different datasets relevant for a research question. The criteria for data and analytic selection are not only based on the suitability in relation to the research question, but also on familiarity with the data or analytic technique, contemporary fashions and trends, institutional politics, access and cost, political and economic context, etc.
• The research results are a function of not only the research question, but also of choices relating to the selection and preparation of data and analysis. As different datasets and different analytic techniques respond to different parts of a particular research question, so will the results be a function of not only the research question per se, but also of the selection of the dataset (including how key concepts were operationally defined before data were collected, as well as the context within which these data were collected) and the analytic technique (including how data were prepared for analysis and which analyses were conducted). In other words, regardless of whether researchers frame their work in a materialist-realist or a constructivist paradigm, empirical research always has a constructivist slant to it because no objective manner exists to, for instance, define, measure, or analyze a cultural value, an attitude, a social class, a policy, an education level, or a poverty line. Empirical research results are framed by the way a research question has been phrased and operationally defined, as well as what and how empirical phenomena were selected and prepared as data. They are framed furthermore by how and in what context these data have been collected and prepared for analysis, how they have been analyzed, and how the results from the analysis have been interpreted and qualified.
While both models indicate that data should be collected or selected according to their suitability for the research question, the second model also shows, via the double arrows, that any specific dataset will only partially answer a research question. An example will clarify the arguments above. The European Social Survey (ESS, 2004) includes 21 items of a 56-item scale to measure ten ‘universal cultural values,’ as developed by Schwartz (1999). It should thus be possible to test Schwartz’s hypothesis that the ten values indeed exist within a particular configuration among the countries participating in the survey. However, as only 21 items of the scale were included in the ESS survey, some aspects of the theory cannot be tested fully (Schwartz, 2005). An inclusion of the entire scale or a different subgroup of items, a different sample of individuals within the participating countries, a different set of participating countries, etc. may have changed the results generated by the testing of the hypothesis regarding the universal nature and structure of cultural values. Furthermore, values are studied in many other ways. For example, Schwartz labeled the value associated with prestige and social status ‘Power.’ Two items measure Power on a six-point ordinal scale in the ESS: ‘important to be rich, have money and expensive things’ and ‘important to get respect from others.’ It is debatable whether these two survey items adequately measure prestige and social status and to what extent prestige and social status encapsulate power as a desirable and trans-situational goal for survey respondents. For instance, Treiman (1977), Coxon and Jones (1978), and Ganzeboom and Treiman (1992, 1996) propose markedly different ways to conceptualize and measure prestige. Returning to the research model, it should be clear from this example that cultural value theory can indeed be tested with the ESS data, but always only partially. Other data could be used for the same theory and associated hypothesis, which may not only produce different results, but might also address a different aspect of the research question. For instance, including one of the omitted items relating to Power in the ESS, i.e. authority (‘the right to lead or command,’ Schwartz, 1997) may have not only changed the result in relation to the presence of this value in the participating countries, but also have had implications for the way Power was assessed with this item. Thus, the research question not only has obvious implications for the selection of data but, less obviously, the actual data selected have implications for what part of the research question is being answered. Accordingly, the relationship between the research question and data is reciprocal in nature.
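The point that the items selected shape the answer obtained can be illustrated with a small Python sketch. The responses below are invented for illustration and are not ESS data; the item names `rich`, `respect`, and `authority` and the helper `scale_score` are assumptions introduced here. The sketch only shows that the same respondents can receive a different ordering of Power scores once a third item is included.

```python
from statistics import mean

# Invented six-point responses for three hypothetical respondents.
# These numbers are NOT ESS data; they only illustrate the argument.
responses = [
    {"rich": 2, "respect": 2, "authority": 6},
    {"rich": 3, "respect": 3, "authority": 3},
    {"rich": 4, "respect": 4, "authority": 1},
]


def scale_score(answer, items):
    """Mean of the selected items for one respondent (hypothetical helper)."""
    return mean(answer[item] for item in items)


# Score based on the two items that were fielded.
two_items = [scale_score(r, ("rich", "respect")) for r in responses]

# Score after adding a hypothetical third item.
three_items = [
    scale_score(r, ("rich", "respect", "authority")) for r in responses
]

print(two_items)
print(three_items)
```

With two items the three respondents are ordered from lowest to highest score; with the third item added, the first respondent has the highest score and the other two are tied. The data selected thus determine which part of the construct is being assessed.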
It is a matter of purpose and debate, whether empirical research ought to begin with basic laws or theory, whether it should start with empirical observations from which laws and theory are deduced, or whether research iteratively vacillates between data and theory (e.g. Bryman, 2001). Nevertheless, empirical research is irreducibly connected with both theory and data. Indeed, it is argued here that no datum can be conceived of or understood in the absence of explicit or implicit theoretical assumptions, and that data can be understood and evaluated in terms of their suitability and quality only with regard to their relationship with a research question and how they are to be analyzed. In order to substantiate this argument, it is necessary to examine how the term ‘datum’ is used and in what way assumptions and interpretations are part of this usage when conducting research. To explain how and why different types of data can be used for empirical research, it is necessary to explore: first, what data are made of; second, the reasons for using more than one dataset for a research question; and, third, how these reasons connect differently to various parts of the research process.
WHAT ARE DATA MADE OF?
Etymologically, a datum, past participle of the Latin word dare, i.e. ‘to give,’ implies that a datum is something given or something that exists, and that in some way it reflects or at least is connected with an understanding of what colloquially is referred to as reality. Most students taking an introductory course in statistics may get the impression that social science data originate from spreadsheets, readymade and conveniently organized into rows and columns that correspond to cases and variables, respectively. But data are of course the result of a very long production chain, which includes operationalization, selection, translation, and transmogrification processes (e.g. Marsh, 1982).
By either habit or misconception, only certain kinds of data, usually the rows-and-columns kind, are believed to be suitable for statistical analysis. To pursue this problematic argument and, thus, shed light on why and how multiple datasets could be used for quantitatively oriented research, it is necessary to explore how data are classified more generally.
Qualitative vs. quantitative data
One of the most widespread and misleading classification systems divides data into quantitative and qualitative data. Within this tradition, there are three different practices relating to this nomenclature. First, it is used to differentiate between variables measured on so-called continuous and discrete scales. The age of respondents in months or years, the precise net annual household income, the estimated percentage of time dedicated to specific leisure activities, etc. are habitually presented as continuous and are thus considered quantitative variables, while place of residence, religious affiliation, ethnicity, etc. are often considered discrete and thus qualitative variables. Setting aside a critique of this particular practice, it should be evident that data thus coded have already gone through a theoretical and analytic process such that using the terms ‘qualitative’ and ‘quantitative’ in this narrow sense is most useful for the selection of a particular statistical technique with which these data may be analyzed. Second, the bifurcation of data into qualitative and quantitative data often refers to a more general form in which observations have been recorded, e.g. numbers vs. words or numerical vs. textual data. The problem with this form of classification is that, on the one hand, numbers often stand for words, concepts, or positions on axes of judgment and that, on the other hand, textual data could be easily, and often are, transformed into numerical form. Furthermore and related to this point, numbers and text do not share the same level of abstraction in that numbers often stand for text. Finally, dividing data into numbers and text does not do justice to the tremendous variety of data used in the social sciences, such as visual and audio data. Depending on the research question and
design, such data can be transformed into numerical or other kinds of data. Third, this bifurcation often reflects the way in which data are analyzed. Accordingly, quantitative data are ostensibly analyzed statistically, while qualitative data are not. However, any so-called qualitative data, e.g. texts, audio and video recordings, symbols, photos, drawings, etc. could be transformed into numerical form and then analyzed statistically. Quantitative content analysis is one of the numerous techniques in which non-numeric data are analyzed statistically. As such, one would add to the confusion by proposing that ‘qualitative data’ are analyzed quantitatively.
From these arguments, the terminology ‘qualitative data’ and ‘quantitative data’ should be considered misnomers, and they should be avoided because these three practices are confusing and misleading. The terms ‘qualitative’ and ‘quantitative,’ if they must be used, should be restricted to how data are analyzed, though even this usage is not entirely unproblematic.
Data as a product of the data collection method
Beyond dividing data into qualitative and quantitative data, another typical way to classify data is to associate them with the method with which they were collected. Accordingly, interviews, focus groups, participant observations, indirect measurement, surveys, experiments and quasi-experiments, etc. generate interview data, focus group data, observational data, experimental data, etc. This classification is far less problematic but makes a clear statement neither about data types or their content, nor about how these data will be analyzed. Despite the incorrect assumption that interview data or data from participant observations, for example, will be submitted to some form of qualitative analysis, it is not only conceivable but occasionally of particular interest to statistically analyze data from interviews or participant observations (e.g. Johnson, 1978; Bernard, 2005). Similarly, while it is usually assumed that data collected from surveys or experiments will be subjected to statistical analysis, it may be of interest to explore interactions between researchers and respondents in survey research or (quasi-) experiments non-statistically.
Sense data, objective data, and subjective data
A more elaborate way to classify data connects to their relation to a presumed external reality, dividing data into sense data, objective data, and subjective data. The most obvious way to think about where data come from and what they are made of relates to sense perception, i.e. acquiring and processing sensory information not only through the five senses (vision, audition, gustation, olfaction, and tactition, as proposed by Aristotle in De Anima, Book II), but also thermoception (heat), nociception (pain), equilibrioception (balance), and proprioception (body awareness), etc. (Hurley, 1998). More usual are empirical data that are derived from sense data. Derived data can be based on memory or experience such as attitude or value statements, or data that are inferred from sense or other derived data. Particularly the latter form of data gives rise to the central constructs in the empirical social sciences such as poverty, exclusion, class, networks, identity, family, household, etc. A further distinction of these indirect derivatives is termed ‘objective and subjective data.’
No longer requiring the perceived information to represent faithfully the external objects, objective data nowadays are more likely to refer to data that may be observed and possibly verified by more than one person. Duncan et al. state that ‘[o]bjective phenomena are those that can be known by evidence that is, in principle, directly accessible to an external observer. Often that evidence is actually a matter of record, although the relevant records may not be easily sampled for the population of interest’ (1984: 8). Experimental data or answers to survey questions relating to name, gender, age, commuting distance to work, annual gross
income from work, marriage status, number of unprotected sexual encounters in the past month, etc. are examples of objective data in that the information conveyed by the data could be verified by someone other than the respondent. However, whether confirmation by others would indeed render data truly objective is questionable. On the one hand, convergence between the respondents’ answers and an external observer about the phenomenon under investigation does not guarantee objectivity as the respondent and the external observer may misperceive or misjudge the phenomenon in a similar way. On the other hand, divergent information from verification through alternative sources may not automatically falsify the respondents’ declarations. In contrast, subjective data are data that ostensibly cannot be verified by external observers. In this vein, ‘[s]ubjective phenomena are those that, in principle, can be directly known, if at all, only by persons themselves’ (Duncan et al., 1984: 8). Examples of subjective phenomena are answers to questions relating to attitudes, values, preferences, judgments, etc. However, if this were true, i.e. if attitudes and values exist only in the minds of the persons in question, then they would be of little significance to social science research. Attitudes and values, for example, often have behavioral and symbolic correlates such that they can be inferred by external others. Hence, Duncan et al. add the qualification that ‘a person’s intimate associates or a skilled observer may be able to surmise from indirect evidence what is going on “inside”’ (ibid).
It should have become evident from these typologies that, contrary to habits and frequent misconceptions, all types of data presented above could be analyzed quantitatively, i.e. statistically. Indeed, it is important to differentiate between what data one wants to use and how these data need to be prepared for statistical analysis; the former is connected most closely to the research question, the latter to the statistical technique that will be performed. The typologies presented so far are based on the origin and uses of data but none captures sufficiently how data should be integrated in the research process more generally and in quantitative research more specifically. Another conceptualization of data is needed in order to make a convincing argument about why and how more than one dataset should be used in quantitative analysis. If it is not possible to make a convincing case about the use of data from existing typologies, then it needs to be made with regard to their purpose in the different stages of the research process.
FOUR GENERAL REASONS FOR COMBINING DATASETS IN QUANTITATIVE RESEARCH
There are four general reasons for using more than one dataset in one research project, particularly in quantitatively oriented research: verification, convergence, complementarity, and holism.
• Verification: Using data for the purpose of verification can take a number of different forms. Generally, verification here means to assess some form of fit, whether empirical or theoretical, between an ostensibly established dataset or theory and another, less well-established dataset. Verification is part of what is often referred to as convergent validity. However, this form of ‘validation,’ i.e. convergence, differs from verification, in that using more than one dataset for the purpose of convergence goes beyond a comparison of results with some empirically or theoretically established baseline.
• Convergence: Researchers often consider findings from different datasets and different studies in order to examine how results between different time periods, contexts, or samples converge. Convergence can be of importance with regard to data quality, changes across time periods, regional and situational variations, etc. The idea of convergence connects to convergent (and divergent) validity. Derived from measurement theory and well established in psychometrics, convergent validity relates to the extent to which items or sets of items that should be associated with one another theoretically indeed can be observed to relate to each other statistically (Campbell & Fiske, 1959). When using
unconnected datasets, i.e. when it is not possible to correlate items or sets of items with each other, it is more difficult to assess convergence.
• Complementarity: In essence, complementarity stands for the use of more than one dataset for the purpose of finding additional but directly related aspects that can be discerned only in their combination. There may exist theoretical and empirical reasons for combining different conceptualizations and empirical findings in order to get an additional perspective on a particular theory or research finding. Important here is that researchers do not simply examine additional data that in some way relate to the research topic as practically any additional dataset would provide further insights or qualifications. Instead, the use of an additional dataset should go beyond the desire for an ‘additional perspective.’ It should be either theory driven or at least pursue a specific purpose. As with the first two reasons for using more than one dataset, here too, complementarity is often not mutually exclusive from the other reasons.
• Holism: Holism is an extension of complementarity but goes one step further. It is based on a classical view of empirical research and stands for the aim of studying the phenomenon under investigation as it exists ‘in reality,’ i.e. beyond the limits of research-related errors, biases, and subjectivity. Thus, each dataset and the results associated with it are considered a piece of the puzzle that will eventually, if combined correctly, reveal the true phenomenon and its dynamics (Brewer & Hunter, 2006). The extent to which findings from many different datasets, often collected for different purposes and in different contexts, are able to eliminate all kinds of errors and provide insight into how things really are ‘out there’ is questionable. While holism explicitly aims at depicting reality by piecing together evidence from different data sources, complementarity merely uses different datasets for establishing, expanding, or testing an idea or theory. Nevertheless, the use of multiple datasets in the pursuit of holism has been practiced widely in the past although, with Kuhn (1970, 1983) and Rorty (1991), one wonders whether a belief in the idea of objectivity and convergence of scientific progress toward some external reality is necessary for even classical approaches to science.
The use of more than one dataset in a research project may be justified based on these four general reasons. While these reasons were presented as if they are mutually exclusive, researchers may actually pursue a combination of these reasons within one or more research phases. What remains to be accomplished is to connect these general reasons with the different phases in quantitatively oriented research in order to show that, on the one hand, different reasons could be employed in the same research phase and, on the other hand, that the same reason could be employed fruitfully for very different purposes, depending on the particular research step and research aim.
REASONS FOR USING MULTIPLE DATASETS AND THE FOUR RESEARCH COMPONENTS
With regard to the term ‘data,’ Coombs (1964) distinguishes between recorded observations and that which is analyzed. More precisely, his theory of data, implicitly emphasizing inductive and exploratory approaches to quantitative research, includes three phases: the selection and recording of observations from a universe of potential observations; the production of data by interpreting, classifying, and labeling these observations; and, by applying the data to an analytic model, the identification of relations, order, and structure. Given that the results from an analytic model do not speak for themselves but must be interpreted, one could propose a fourth phase: the transformation from the relations, order, and structure as emergent from the analysis into research results, usually by an interpretive process that links these to theories and research questions. While Coombs’ Data Theory is predominantly concerned with the process of identifying patterns and structures from existing data, this chapter is an extension in three ways: it explores how and why more than one dataset could be used in quantitative analysis; it examines the research process beyond the identification of structures and patterns from existing data; and it emphasizes the non-chronological ordering and interconnectedness of the research components.
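Crossing the four general reasons with the four research phases can be sketched as a simple grid. The following Python snippet is only an illustration: the phase labels are shorthand for Coombs' three phases plus the proposed fourth, and the function name `reason_phase_grid` is an assumption introduced here, not part of Coombs' Data Theory.

```python
from itertools import product

# The four general reasons for using more than one dataset (from the text).
REASONS = ("verification", "convergence", "complementarity", "holism")

# Shorthand labels (assumed wording) for Coombs' three phases and the
# proposed fourth, interpretive phase.
PHASES = (
    "selecting and recording observations",
    "producing data from observations",
    "identifying relations, order, and structure",
    "interpreting results",
)


def reason_phase_grid(reasons=REASONS, phases=PHASES):
    """Return every (reason, phase) pairing as a list of tuples."""
    return list(product(reasons, phases))


if __name__ == "__main__":
    grid = reason_phase_grid()
    print(len(grid))  # 4 reasons x 4 phases = 16 pairings
    for reason, phase in grid:
        print(f"{reason} x {phase}")
```

Because the reasons are not mutually exclusive in practice, a single research project may occupy several cells of this grid at once.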
All four research components are embedded in creative and interpretive processes. In the absence of objective guidelines with regard to how researchers get from the conceptualization of the research question to the interpretation of the statistical results, each step requires creative decisions that are forced upon the researcher. Hence, each phase delimits the research results in a particular way, where different types of data and different types of analysis could be used for different purposes. The four research components and the four general reasons for using more than one dataset allow for 16 possible combinations, i.e. using more than one dataset for verification, convergence, complementarity, and holism in relation to identification/selection of observations, recording/transforming them into data, identifying analytically patterns/structures, and interpreting patterns/structures meaningfully. This section will not cover all 16 possibilities but only provide examples of how more than one dataset can be used for different reasons and different components. For the following, it is also important to realize that it is only possible in principle to separate the four general reasons for using more than one dataset across the different research components. In practice, the line of demarcation between categories can be rather difficult to identify.
Links between data and their patterns
For most of those conducting quantitatively oriented research, the relations within the research process between raw data and their patterns and structures are the most accessible, and so I will begin by outlining reasons for using more than one dataset within this phase. Data are either explored for patterns, or the fit between theory-guided patterns and the data is analyzed. Often, data are also prepared for further analysis, e.g. by creating compound variables or components. For example, variables relating to income, debts, savings, etc. of individuals living in a household may be used to produce a compound variable reflecting household net income, which will then be used for further analysis. Some transformations are so complicated that authors publish conversion tools that allow users to recode certain variables according to such a key (e.g. Ganzeboom & Treiman, 1992). But rather than using existing compound variables or using a conversion tool, one may want to adapt or test these instruments by, e.g. introducing or omitting variables, or by weighting the importance of a variable differently.
With regard to convergence, for instance, researchers may not be satisfied with using only a subset of an established scale to test a theory, such as the example about the universal cultural values as described above, and may therefore collect additional data. Within limits, they may also want to verify whether the ESS data from the subset of Schwartz’s value scale are adequate to assess values as proposed by the full 56-item scale, as discussed earlier. Researchers may also want to verify the universality of Schwartz’s theory on values by collecting and analyzing data for a country not part of the ESS. Verification may also include an examination of the suitability of a shortened scale or the representativeness of a dataset with regard to some demographic indicators, e.g. gender, ethnic composition, age, etc., by comparing them with national census data, for instance. With regard to complementarity, researchers may be interested in testing or qualifying the universality of value systems or Schwartz’s value structure by exploring alternative cross-cultural datasets on values (e.g. Hofstede, 2001).
There are numerous problems associated with comparing the results of an analysis, including the compatibility of the contexts within which data were collected, the compatibility of the sample, the compatibility of the variables, etc. (Kiecolt & Nathan, 1985; Dale et al., 1988).
Beyond these, there are also problems associated with combining micro and macro data. In short, the social sciences often attempt to connect micro-level data such as individual behaviors or family dynamics with
macro-level data such as social norms and power structures. For example, Alexander and Giesen (1987) identified the five main approaches to micro-macro analysis in the social sciences. According to the major strands in social theory, society is created by (a) rational individuals, (b) interpretive individuals, (c) socialized individuals acting as a collective force, (d) socialized individuals who reproduce the existing social environment on a micro-level, and (e) rational individuals who acquiesce due to external forces of social control (cf. Münch & Smelser, 1987). However, it should be noted that there is nothing intrinsic about a level to be identified as micro or macro, i.e. they represent relative points on a continuum. In other words, interactions, families, or neighborhoods could represent the micro or macro level, depending on their integration into a model. What is important, however, is that there are more micro-level units than macro-level units, and that the micro units can be assigned to a macro unit. The computational complexity of assessing the interrelation of systems that are formed concurrently between micro units, between macro units, and between micro-macro units is tremendous (Saam, 1999), but the greater problem is the lack of unity between theoretical and computational models. Consequently, some researchers, particularly those engaged in empirical research, often argue that the levels cannot be combined, i.e. that macro-level data follow a different set of laws and logics than micro-level data. Theorists, on the other hand, have long been involved in conceptualizing the relationship so central to social science (e.g. Mill, 1961; Luhmann, 1982; Giddens, 1991; Collins, 2000), but have difficulties with finding convincing empirical evidence for their sophisticated arguments. While the combination and analysis of micro- and macro-level data, either separately or pooled, may provide important insights into the complex relations between and within the levels, it is likely that the gap between empirical results and theory will remain. Combining micro- and macro-level data for verification, convergence, and complementarity may nevertheless play an important role in advancing theory and empirical support thereof, without necessarily providing (or needing to provide) the ‘real’ model of complex social systems.
Links between patterns and their interpretation
The complex relations between patterns and structures on the one hand, and their interpretation, on the other, are also part of the main focus in statistically oriented research. An analysis of presences or absences, patterns, order, structure, etc. within datasets still needs to be interpreted. A mere statistical description thereof is insufficient because coefficients do not speak for themselves but must be linked meaningfully to the research question and the underlying theories. Convergence can play an important role at this stage. For instance, in our study on intergenerational social mobility in Switzerland, we examined all large-scale data available in Switzerland that contained information about social position between parents and their children (Joye et al., 2003). Convergence was important in three respects: first, with regard to data quality, data collected during approximately the same period, including census data, should converge in order to cross-validate the datasets in relation to their representativeness of the population under investigation. Second, convergence of datasets from different time periods was examined in order to explore how intergenerational social mobility has changed in Switzerland over time. Finally, the chronological trends identified in this study were compared to those of other countries in order to explore how social mobility in Switzerland converged with other European countries.
When using unconnected datasets, i.e. when it is not possible to correlate items or sets of items with each other, it is more difficult to assess convergence. Meta-analysis is an area of research where multiple research findings are compared with each other. There often exist many different studies with their own
data, all pursuing a similar research question. Even though the first meta-analysis was performed merely to increase statistical power (Pearson, 1904), data and findings of related studies can be pooled and compared with each other in relation to a substantive theory. Meta-analysis, the ‘analysis of analysis’ (Glass, 1976), attempts to identify and partially correct artifacts and variations in findings due to sampling and measurement error, range restriction, correlation bias, etc. over a series of studies (Hunter & Schmidt, 1990). With variations, this is basically accomplished by identifying a set of studies for meta-analysis that are relevant to a research question, determining the suitability for inclusion in a meta-analysis in terms of the research respondents, variables, time period, research design, etc., assessing the effect size of the different studies with regard to the qualities and quantities under investigation (e.g. group mean differences, correlations, proportions, etc.), creating comparability between the effect sizes by a coefficient, and then examining the convergence and variability between the studies and their respective data (Lipsey & Wilson, 2001). A further variant is that of pooled data, i.e. combining data collected at multiple sites, different time periods, or a combination thereof (Beck, 2001; Halaby, 2004). However, it is often argued that pooling data is fraught with error due to heterogeneity problems across datasets (Maddala, 1999). Another problem is the issue of which studies to include: some argue that methodologically weaker studies should also be included, albeit with a different weighting, while others propose to include only methodologically sound studies (Abrami et al., 1988). The ‘file-drawer effect’ is yet another problem because meta-analyses often exclude non-significant findings as these are usually not published.

Combining datasets and their analysis is also often practiced in search of holism. For example, some researchers interested in voting behavior may use a multitude of available data in order to pursue complex theories, e.g. a general theory on public opinion (e.g. John Zaller, 1992) or the role of cognitive intelligence in society (Herrnstein & Murray, 1994). An appeal toward holistic research is also made by Brewer and Hunter (2006), who even argue that by integrating the ‘four major research styles’ – fieldwork, surveys, experiments, and non-reactive research – it would be possible to take advantage of the strength of each of these methods and, thus, arrive at ‘valid’ research results. The attractiveness of this approach is an underlying quest for systematization of the many competing and conflicting theoretical and empirical approaches. For numerous reasons elaborated in this chapter, however, theories and empirical findings on a research topic are bound to be conflicting and contradictory. Rather than attempting to isolate the one set of social science theories and empirical findings that are superior according to some set of criteria, presumably because they are closer to reality, social science research may indeed be marked not only by systematic thought and analysis, but also by eternal ambiguity about the validity, utility, and context dependence of different approaches. This may not necessarily be a bad thing. It could be argued that it is precisely this ambiguity, the competition between theories and empirical approaches, that can be considered a way of doing science; not necessarily a closing in on how things really are in a mind-independent reality but a negotiation of questions and their pursuit between different stakeholders in a particular time and space.

Links between recorded observations and data

Statistically oriented research is often not directly involved in problems associated with transformation between recorded observations and data. Nevertheless, a considerable information loss occurs during the recording of an observation, e.g. from the attitude of voters at the moment of recording to the recorded attitude statements in the questionnaire, from the lived experience of the interview situation
to the interview transcript, etc. This loss, however, should not only be considered as a potential source of bias in a classical sense, but also as a necessary step in the focusing of empirical phenomena to a set of relevant aspects as defined by the researcher’s focus and research question. Once observations have been recorded in whatever crude form, e.g. photos, ticks on questionnaires, piles of items sorted by respondents, interview recordings, etc., they must be turned into data before they can be analyzed quantitatively. At times, this is done simultaneously, such as in the encoding of responses with CAPI or CATI, where the interviewers encode responses directly into preexisting response categories, often with significant freedom and error when reinterpreting respondents’ answers (Elias, 1997a, 1997b). Turning the recorded observations into meaningful categories is an art in itself, as the following quote illustrates:

In the field one has to face a chaos of facts, some of which are so small that they seem insignificant; others loom so large that they are hard to encompass with one synthetic glance. But in this crude form they are not scientific facts at all; they are absolutely elusive, and can be fixed only by interpretation, by seeing them sub specie aeternitatis, by grasping what is essential in them and fixing this. (Malinowski, 1948: 238)

Turning observations into data is a form of taming and disciplining them, turning them into a form that is suitable for a particular type of analysis. Far too little attention is paid to this process, which, ultimately, includes a type of analysis at least as important as the subsequent analysis with the thus derived data. For example, before a quantitative content analysis can be performed, non-numeric material needs to be coded meaningfully. While there are some tentative suggestions about how to derive and verify these codes, e.g. via iterative procedures (Glaser & Strauss, 1967; Glaser, 2005) and inter-rater reliability (Gwet, 2001), the processes suggested in the literature are at best guidelines and recommendations. In this case, producing different datasets from the same set of recorded observations could be used for verification, convergence, or complementarity.

Links between potential observations and recorded observations

Despite the fact that the transformation from a potential observation to a recorded observation is so crucial to the research process, most quantitatively oriented researchers have not considered its complexity sufficiently. Thus, a brief digression into related fields will shed light on the complexity within which potential observations are ultimately transformed into data via their selection and recording. Originally explored by pre-Socratic philosophers in conjunction with the limits of our senses to provide us with true knowledge about the world (White, 1991; Dancy & Sosa, 1992), this first transition – from potential to recorded observations – has occupied a prominent position in cognitive and social psychology as well as social anthropology and ethnography.

Termed ‘sense data’ by twentieth-century philosophers, information from our senses appears to reproduce external objects in the mind via perception. According to sense-theorists such as Russell (1927) and Moore (1953), the book you are reading is represented by sense information relating to shape, texture, weight, color, etc. such that the object is represented by the mind according to this perceived sense information. Thus, sense data reflect the attributes that an object is believed to have. But sense data also relate to the awareness of perception and are, thus, always also mind dependent. From a materialist-realist perspective, even though the size and shape of this book vary if viewed from different angles or distances, these changes are variations in perspectives of the same external object. As part of cognitive development of infants, this and related issues have been studied by developmental psychologists such as Piaget (1955) under the heading of conservation and persistence. However, a number of philosophers (e.g. Austin, 1962;
Jackson, 1977) question the possibility of representation of external objects through sense data, listing in particular phenomena relating to illusions, hallucinations, double vision, and the time delay between existence and perception. Furthermore, sense data such as color, taste, smell, and sound do not exist in the external world but are recognized as attributes due to specific interactions between stimuli, physiology, and mind. As such, the perception of objects is fundamentally influenced by human physiology and psychology. More precisely, research in social cognition has revealed that perception and memory are shaped by prior knowledge and current context. Asch proposed two competing models for impression formation (1946): according to Asch’s configurational model, individual elements of perception are aligned to form an overall impression such that these can be changed according to context and expectations. His algebraic model proposes that individuals assemble all elements of perception and then come up with a combined impression thereof. Both of these models have received wide attention, while the latter has had a strong influence on attitude and value research (e.g. Fishbein & Ajzen, 1975). Heider’s balance theory (1944) is related to Asch’s in that perceived elements tend to be changed in people’s minds if they do not fit an existing model. Apparently, sense information is adapted to fit existing thought structures in order to maintain unified, overall impressions and knowledge structures. Less socially oriented, Bartlett (1932) explored how past behaviors and experiences are organized into patterns such that they facilitate future cognitions and behavior.

While psychological studies about social cognition and impression formation focus on general human processes including cognition, motivation, and behavior, the findings from these studies could well be applied to researchers and the research process. Researchers make sense of a confusing and complex environment and, here too, researchers may have the tendency to adjust and adapt elements to fit existing schemas, i.e. theories and ideas. It is quite likely that many data collectors, coders, and researchers are subject to similar tendencies when they are identifying, sorting, and interpreting as relevant a small subset of observations in the pursuit of a particular research question.

Indeed, social anthropologists and ethnographers initially attempted, and later contested the possibility of, an objective description, encoding, and presentation of meaning structures. Malinowski, the first modern anthropological explorer and specialized fieldworker, outlined the tools with which to understand the complexities of meaning structures external to one’s own mental context. Empathy and insight, acquired in part through long-term exposure to a socio-cultural environment of concern, are the tools that were believed to assist in the understanding of the meaning of such phenomena.

But careful and systematic empirical observations and detailed descriptions of socio-cultural phenomena have created inconsistencies and, ultimately, doubts about the feasibility of precisely this undertaking. At least since the 1970s, a time period marked by what Geertz named the ‘crisis of representation,’ it was clear that a reproduction of meaning or, more generally, the transport of meaning from one meaning system to another, is at least problematic. As Geertz states:

There is a lot more than native life to plunge into if one is to attempt this total immersion approach to ethnography. There is the landscape. There is the isolation. There is the local European population. There is the memory of home and what one has left. There is the sense of vocation and where one is going. And, most shakingly, there is the capriciousness of one’s passions, the weakness of one’s constitution, and the vagaries of one’s thoughts: the nigrescent thing, the self. It is not a question of going native…. It is a question of living a multiplex life: sailing at once in several seas. (1988: 77)

Clifford (1983, 1986) goes so far as to consider insights acquired through observations
as highly intersubjective engagements, i.e. where observations are ‘orchestrated’ within politically charged situations, far better reflecting the ethnographer’s view and position than that of the people and situations observed. While this position represents an extreme view in the social sciences, it nevertheless stresses correctly the intersubjective nature and selectivity of phenomena, long before these phenomena are recorded, transformed into analyzable data, and analyzed. From a practically infinite number of possible empirical phenomena, in themselves only a subgroup of all potential empirical phenomena that could have been chosen, researchers select as empirical evidence for their project that which they believe to be suitable, based on specific social, economic, political, cultural, etc. considerations (Bergman, 2002). In other words, even before a shred of empirical evidence has been conceived of as a potential source of data, the research results have been ‘compromised.’

The social sciences deal with this problem in three ways. The first entails a call for the abandonment of empirical research altogether, supported by the claim that research thus tainted would not yield what is often called objectivity or, in epistemology and the philosophy of science, is considered true knowledge or truth, i.e. knowledge that is not subject to argument and perspective. In this vein, Tyler (1986) proposes that the aims of science in general, and ethnography in particular, are now an evocation of an imagined reality between the author and the reader of scientific texts for therapeutic and aesthetic effect. The second, far more frequently practiced way to deal with this problem, particularly by quantitatively oriented research, is to ignore it. Concerns outlined above are drowned out by comfortable routines engrained in the craft and habits of doing research. These include: the formulation of a research question or hypothesis in line with the literature of respectable authors and journals, the operational definition of key constructs relating to the research question or hypothesis, data collection according to these definitions and with well-established tools, and the analysis, interpretation, and presentation of research results, also within the limits of well-established tools and forms. Inconsistencies between research findings, if detected at all, are usually attributed rather vaguely to differences in theoretical approaches, data collection and analysis methods, interpretations, etc. Sooner or later, the self-correcting nature of science, so it is hoped, will take care of these inconsistencies. The third way begins with the recognition that all knowledge derived from empirical research is partial, subject to argument, verification, and revision. This third option also paves the way for using more than one dataset for quantitative research, not merely for purposes relating to verification or convergence, but also for complementarity and holism.

All four reasons for the use of more than one dataset could be relevant in this permanently transitional phase. A researcher could use additional data to verify whether a construct has been adequately conceptualized and captured by existing studies. Similarly, convergence and complementarity could motivate a researcher to propose an alternative, shorter, or otherwise more convenient way to collect data, which would either test or elaborate on an existing study or theory. For example, values have been studied cross-nationally not only by Schwartz and his colleagues, but also by, for example, Hofstede (2001), Triandis (e.g. 1995) and Abramson and Inglehart (1995). There may exist theoretical and empirical reasons for combining different conceptualizations and empirical findings in order to get an additional perspective on how values are distributed across nations and in which combination they are distributed between regions or social groups. Finally, this third way also reconnects researchers to the human-made artifacts within research, e.g. that a ‘value’ is a complex construct and that it is part of a form of shorthand that allows researchers to explain to their public a set of phenomena, which they have crafted and identified as relevant within a particular space and time.
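As a rough illustration of how findings from more than one dataset might be checked for convergence, the sketch below follows the general meta-analytic logic outlined earlier in this section: effect sizes are put on a common scale, combined with weights, and then examined for variability. It is only a minimal sketch under stated assumptions, not the procedure of any author cited here: the function name `pool_fixed_effect`, the choice of a fixed-effect inverse-variance model, and the example numbers are all hypothetical and chosen for illustration.

```python
import math


def pool_fixed_effect(effects, variances):
    """Pool study-level effect sizes with inverse-variance weights.

    Assumes (hypothetically) that all effects are already expressed on a
    common scale, e.g. standardized mean differences, and that each comes
    with a known sampling variance. Returns the pooled estimate, its
    standard error, Cochran's Q, its degrees of freedom, and I^2.
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("need one positive variance per effect size")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # More precise studies (smaller variance) receive larger weights.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1

    # I^2: share of variability beyond what sampling error alone suggests.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    return {"pooled": pooled, "se": standard_error, "q": q, "df": df, "i2": i2}


if __name__ == "__main__":
    # Hypothetical effect sizes and variances from three datasets.
    result = pool_fixed_effect([0.30, 0.45, 0.10], [0.02, 0.05, 0.03])
    for name, value in result.items():
        print(name, round(value, 3))
```

A large Q relative to its degrees of freedom (or a high I²) would signal the kind of divergence between datasets that this chapter treats as a source for elaborating and qualifying findings, rather than as noise to be averaged away.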
CONCLUSION

Research results are a function of not only the research question and how it is embedded in its socio-cultural, political, economic, and situational context, but also of the choices that are made in relation to data selection and analytic strategies. Given the chapter’s focus on the use of more than one dataset in a quantitatively oriented research project, it was necessary to dispel some misconceptions about data and about which kind of data are used for statistical analysis. Next, four general reasons were presented for using more than one dataset in such studies: verification, convergence, complementarity, and holism. In the final part of this chapter, four phases of the research process were examined in order to illustrate different reasons for using more than one dataset. Overall, research results will remain contingent, and they will always only provide partial answers to a research question. What is argued here is to abandon both the relativistic and the positivistic approach; neither is it satisfactory any longer to attribute inconsistencies in findings to differences in methods and approaches, nor are efforts sustainable that aim at identifying objective structures behind the varying results. It is difficult to outline a third alternative at this time because most ontological and epistemological approaches seem to force researchers to choose sides. Nevertheless, using more than one dataset to explore, verify, complement, or qualify may point to a solution. As such, integrating this research strategy into quantitatively oriented projects can be considered an important development for modern social science practices well beyond classical validity issues. This possibility is supported by the expansion and user-friendliness of data archives, the availability of a tremendous number of datasets, the cost-efficiency and popularity of secondary data analysis, etc.

There are of course also disadvantages associated with the use of more than one dataset. Beyond economic and other resource-related constraints, it is always easier to tell a coherent story about one set of statistical results from one dataset. Indeed, studies using more than one dataset, particularly those that focus on complementarity, often appear rather disjointed. Such studies can give the impression that they consist of a set of loosely related research findings without sufficient connection to each other. However, Segal’s law, cited at the beginning of this chapter, does not propose that it is better to have just one watch; instead, it may simply be less confusing. But an absence of confusion due to divergent research findings should not be equated with coherence and the notion of truth in a scientific sense. Instead, divergences and inconsistencies are formidable sources for the elaboration and qualification of theory and research findings. Rather than ignoring complexities and inconsistencies, which experienced researchers are aware of anyway, a greater explicit attention to these, for instance by using more than one dataset in quantitatively oriented studies, would be an important development toward a more critical and differentiated approach to post-postmodern social science research.

ACKNOWLEDGMENTS

I would like to thank Eugène Horber, Bernhard Kittel, and two anonymous reviewers for their incisive comments on an earlier draft of this paper.

NOTES

1 Even though models relating to the research process vary in terms of their individual components and complexity, e.g. Leedy (1989) also includes, among other things, ‘identification of the problem’ and ‘statement of the problem’ as separate components, while Walliman (2005) adds the theoretical background and ethical issues among 14 components of the research process, they almost always share the four main components.

2 While many textbooks on research methods divide research approaches into inductive and deductive research, sometimes even connecting deductive research to quantitative research and inductive research to qualitative research, research practices differ. Indeed, while most exploratory
data analyses tend to emphasize induction and hypothetico-deductive research, exploratory analysis needs some form of ‘container’ that provides a minimal theoretical underpinning from which explorations are conducted. On the other hand, most statistical modeling, which formally is based on hypothesis testing, includes model adjustments for various theoretical and empirical reasons. Hence, empirical research in practice is rarely purely inductive or deductive (see also Bryman, 2001).

REFERENCES

Abrami, P.C., Cohen, P.A., & d’Apollonia, S. (1988). Implementation problems in meta-analysis. Review of Educational Research, 58, 2, 151–179.
Abramson, P.R., & Inglehart, R. (1995). Value Change in Global Perspective. Ann Arbor, MI: Michigan University Press.
Alexander, J.C., & Giesen, B. (1987). From reduction to linkage: The long view of the micro-macro debate. In J.C. Alexander, B. Giesen, R. Münch, N.J. Smelser (Eds.), The Micro-Macro Link. Berkeley: University of California Press.
Asch, S.E. (1946). Forming impressions of personality. Journal of Abnormal and Social Psychology, 41, 1230–1240.
Austin, J.L. (1962). Sense and Sensibilia. Oxford: Clarendon.
Bartlett, F.A. (1932). A Study in Experimental and Social Psychology. Cambridge: Cambridge University Press.
Beck, N. (2001). Time-series cross-section data: What have we learned in the past few years? Annual Review of Political Science, 4, 271–293.
Bergman, M.M. (2002). Reliability and validity in interpretative research during the conception of the research topic and data collection. Sozialer Sinn, 2, 317–331.
Bernard, H.R. (2005). Research Methods in Anthropology: Qualitative and Quantitative Approaches (4th ed.). Walnut Creek, CA: Alta Mira.
Brewer, J., & Hunter, A. (2006). Foundations of Multimethod Research: Synthesizing Styles (2nd ed.). Thousand Oaks, CA: Sage.
Bryman, A. (2001). Social Research Methods. Oxford: Oxford University Press.
Campbell, D.T., & Fiske, D.W. (1959). Convergent and discriminate validity by the multitrait-multimethod matrix. Psychological Bulletin, 54, 297–312.
Clifford, J. (1983). On ethnographic authority. Representations, 1, 2, 118–146.
Clifford, J. (1986). Introduction: Partial truths. In J. Clifford & G.E. Marcus (Eds.), Writing Culture: The Poetics and Politics of Ethnography. Berkeley, CA: University of California Press.
Collins, R. (2000). Situation stratification: A micro-macro theory of inequality. Sociological Theory, 18, 1, 17–43.
Coombs, C.H. (1964). A Theory of Data. New York: Wiley.
Coxon, A.P.M., & Jones, C.L. (1978). The Images of Occupational Prestige. London: Macmillan.
Dale, A., Arbor, S., & Proctor, M. (1988). Doing Secondary Analysis (Contemporary Social Research Series No. 17). London: Unwin Hyman.
Dancy, J., & Sosa, E. (1992). A Companion to Epistemology. Oxford: Blackwell.
Duncan, O.D., Fischhoff, B., & Turner, C.F. (1984). Domain of the study: Objective and subjective phenomena. In C.F. Turner & E. Martin (Eds.), Surveying Subjective Phenomena (vol. 1). New York: Sage.
Elias, P. (1997a). Social class and the standard occupational classification. In D. Rose & K. O’Reilly (Eds.), Constructing Classes: Towards a New Social Classification for the UK. Swindon: ESRC/ONS.
Elias, P. (1997b). Occupational Classification: Concepts, Methods, Reliability, Validity, and Cross-National Comparability. Occasional Papers, 20, OECD, Warwick: Institute for Employment Research.
ESS (2004). ESS documentation report 2002/2003. The ESS Data Archive. Norwegian Social Science Data Services. http://www.europeansocialsurvey.org/
Fishbein, M., & Ajzen, I. (1975). Belief, Attitude, Intention, and Behavior: An Introduction to Theory and Research. Reading, MA: Addison-Wesley.
Ganzeboom, H.B.G., & Treiman, D.J. (1992). International Stratification and Mobility File: Conversion Tools. Utrecht: Department of Sociology.
Ganzeboom, H.B.G., & Treiman, D.J. (1996). Internationally comparable measures of occupational status for the 1988 international standard classification of occupations. Social Science Research, 25, 201–239.
Geertz, C. (1988). Works and Lives: The Anthropologist as Author. Stanford, CA: Stanford University Press.
Giddens, A. (1991). Modernity and Self-Identity: Self and Society in the Late Modern Age. Stanford: Stanford University Press.
Glaser, B.G. (2005). The Grounded Theory Perspective III: Theoretical Coding. Mill Valley, CA: Sociology Press.
Glaser, B.G., & Strauss, A. (1967). Discovery of Grounded Theory: Strategies for Qualitative Research. Chicago: Aldine.
Glass, G.V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5, 3–8.
Gwet, K. (2001). Handbook of Inter-Rater Reliability. Gaithersburg, MD: StatAxis.
Halaby, C. (2004). Panel models in sociological research: Theory into practice. Annual Review of Sociology, 30, 507–544.
Heider, F. (1944). Social perception and phenomenal causality. Psychological Review, 51, 358–374.
Herrnstein, R.J., & Murray, C. (1994). The Bell Curve: Intelligence and Class Structure in American Life. New York: Simon & Schuster.
Hofstede, G. (2001). Culture’s Consequences: Comparing Values, Behaviors, Institutions, and Organizations across Nations. Thousand Oaks, CA: Sage.
Hunter, J.E., & Schmidt, F.L. (1990). Methods of Meta-Analysis: Correcting Error and Bias in Research Findings. Newbury Park, CA: Sage.
Hurley, S. (1998). Consciousness in Action. Cambridge, MA: Harvard University Press.
Jackson, F.C. (1977). Perception: A Representative Theory. Cambridge: Cambridge University Press.
Johnson, A.W. (1978). Quantification in Cultural Anthropology. Stanford: Stanford University Press.
Joye, D., Bergman, M.M., & Lambert, P. (2003). Intergenerational educational and social mobility in Switzerland. Swiss Journal of Sociology, 29, 2, 263–291.
Kiecolt, K.J., & Nathan, L.E. (1985). Secondary Analysis of Survey Data (Quantitative Applications in the Social Sciences). Newbury Park, CA: Sage.
Kuhn, T.S. (1970). The Structure of Scientific Revolutions (2nd ed.). Chicago: Chicago University Press.
Kuhn, T.S. (1983). Rationality and theory choice. Journal of Philosophy, 80, 10, 563–570.
Leedy, P.D. (1989). Practical Research: Planning and Design (4th ed.). London: Collier Macmillan.
Lipsey, M.W., & Wilson, D.B. (2001). Practical Meta-Analysis (Applied Social Research Methods). Thousand Oaks, CA: Sage.
Luhmann, N. (1982). The Differentiation of Society. New York: Columbia University Press.
Maddala, G.S. (1999). On the use of panel data methods with cross-country data. Annales d’économie et de statistique, 55–56, 429–448.
Malinowski, B. (1948/1916). Magic, Science and Religion, and Other Essays. Boston: Beacon.
Marsh, C. (1982). The Survey Method: The Contribution of Surveys to Sociological Explanation. Winchester, MA: Allen & Unwin.
Mill, J.S. (1961[1843]). A System of Logic. London: Longmans, Green & Co.
Moore, G.E. (1953). Some Main Problems of Philosophy. London: George, Allen and Unwin.
Münch, B., & Smelser, N.J. (1987). Relating the micro and macro. In J.C. Alexander, B. Giesen, R. Münch, N.J. Smelser et al. (Eds.), The Micro-Macro Link. Berkeley: University of California Press.
Pearson, K. (1904). Report on certain enteric fever inoculation statistics. British Medical Journal, 3, 1243–1246.
Piaget, J. (1955). The Construction of Reality in the Child. London: Routledge and Kegan Paul.
Raiffa, H. (1968). Decision Analysis. Reading, MA: Addison-Wesley.
Richard, R. (1991). Objectivity, Relativism, and Truth. Cambridge: Cambridge University Press.
Russell, B. (1927). The Analysis of Matter. New York: Harcourt, Brace.
Saam, N.J. (1999). Simulating the micro-macro link: New approaches to an old problem and an application to military coups. Sociological Methodology, 29, 43–79.
Schwartz, S.H. (1999). A theory of cultural values and some implications for work. Applied Psychology – an International Review, 48, 23–47.
Schwartz, S.H. (2005). Universalism values and the inclusiveness of our moral universe. In A.-M. Pirttilä-Backman, M. Ahokas, L. Myyry, & S. Lähteenoja (Eds.), Values, Morality and Society: Change and Diversity. Helsinki: Gaudeamus.
Schwartz, S.H., Verkasalo, M., Antonovsky, A., & Sagiv, L. (1997). Value priorities and social desirability: Much substance, some style. British Journal of Social Psychology, 36, 3–18.
Treiman, D.J. (1977). Occupational Prestige in Comparative Perspective. New York: Academic Press.
Triandis, H.C. (1995). Individualism and Collectivism. Boulder, CO: Westview.
Tyler, S.A. (1986). Post-modern ethnography: from document of the occult to occult document. In J. Clifford & G.E. Marcus (Eds.), Writing culture: The Poetics and Politics of Ethnography. Berkeley: University of California Press.
Walliman, N. (2005). Your Research Project (2nd ed.). London: Sage.
White, N.P. (1991). Plato’s epistemological metaphysics. In R. Kraut (Ed.), Cambridge Companion to Plato. Cambridge: Cambridge University Press.
Zaller, J.R. (1992). The Nature and Origin of Mass Opinion. Cambridge: Cambridge University Press.
36
Writing and Presenting
Social Research
Amir Marvasti
Traditionally, there has been a divide between ‘science’ and ‘literature,’ mostly due to the belief that representing ‘scientific facts’ requires a method of writing that is free from aesthetic whimsy and emotions. A procedural approach to writing was first developed by natural scientists (e.g. physicists) and later adopted by social scientists (e.g. sociologists) as the ideal model for disseminating facts. Thus grew the two representational cultures of science and literature, with the former presiding over the domain of ‘universal truths’ and the latter being relegated to the world of fiction and individualistic self-expression.

The divide between science and literature went unchallenged well into the second half of the twentieth century. However, a ‘third culture’ of representation (Shaffer 1998) is now questioning the necessity of treating science and literature as mutually exclusive realms of knowledge. This emerging interdisciplinary field focuses on the reflexive relationship between the two worlds of representation where literature influences science and science informs literature.

In the social sciences, while some remain devoted to the traditional divide, there is a growing awareness of the rhetorical dimensions of writing and representing facts, particularly among qualitative researchers (see, for example, Alasuutari 1995 and Gubrium and Holstein 1997). This reflexive or rhetorical turn, as it is often called, centers on the recognition that any effort to inscribe social reality invariably involves linguistic constructive practices as well. Perhaps the work that is most widely cited in connection with this movement in the social sciences is James Clifford and George Marcus’s Writing Culture: The Poetics and Politics of Ethnography (1986). This edited volume calls for social scientists, particularly ethnographers, to see writing as a craft that involves culture, aesthetics, and politics. As stated in this book’s introduction, ‘the making of ethnography is artisanal, tied to the worldly work of writing’ (p. 6).

Another important work in this area is John Van Maanen’s Tales of the Field (1988). This book is also concerned with ethnography and its stylistic conventions. Through secondary
analysis, Van Maanen identifies different genres of ethnographic texts (e.g. realist, confessional, and impressionist). He argues that rather than describing a single social reality seen from multiple perspectives, variations in writing construct realities of their own. For Van Maanen, ‘[T]here is no way of seeing, hearing, or representing the world of others that is absolutely, universally, valid or correct’ (p. 35).

In the analysis of writing as representational practice, some of the greatest contributions come from feminist scholars who have documented the absence or distortion of female subjectivity in dominant textual paradigms (e.g. Irigaray 1985 and Butler 1990). At the same time, feminists have turned our attention to the linguistic nuances and conventions of texts and their gendered tones. For example, Laurel Richardson (1990, 2000) shows the prevalence of literary devices (e.g. metaphors) in social science texts. For her, scientific writing is never neutral but is invariably embedded in practices of power and oppression. As she writes, ‘power is, always, a sociohistorical construction. No textual staging is ever innocent. We are always inscribing values in our writing. It is unavoidable’ (1990, p. 12).

As a whole, the textual shift in the social sciences relates to a larger movement that explicitly and intensely questions the value and presumably benign character of all scientific knowledge. This movement, largely referred to as ‘postmodernism’ or ‘poststructuralism,’ challenges the very authority and linguistic structures of science and their representations of ‘truth.’ For example, the renowned sociologist and postmodern thinker, Norman Denzin (1993), states

[i]f there is a center to recent critical poststructural thought, it lies in the recurring commitment to strip any text of its external claims to authority. Every text must be taken on its own terms. The desire to produce a valid and authoritarian text is renounced. Any text can be undone in terms of its internal-structural logic. (p. 136)

While some have dismissed the textual shift as a passing fad, others have embraced it as the new logic of social science and have proposed writing strategies for texts that are sensitive to postmodern sentiments. It has been suggested that these experiments or alternative representational forms expand the representational space of ‘value-free’ research, provide strategies for challenging dominant texts, and convey fresh perspectives on old questions. Alternative forms of writing also have been the subject of considerable criticism, which I take up in the conclusion.

In the remainder of this chapter, I offer a brief survey of these alternative writing practices by focusing on the following six genres: (1) writing with pictures, (2) performative writing, (3) writing factual fiction, (4) poetic representation, (5) writing the author, and (6) post-colonial writing. I end the chapter with a critical assessment of these genres.

WRITING WITH PICTURES

The old saying ‘a picture speaks a thousand words’ is now considered theoretically naïve: pictures, like written texts, are seen as constructive of the realities they represent. Gillian Rose’s Visual Methodologies (2001) offers an excellent postmodern analysis of the place of the visual in contemporary society and social research. According to Rose, rather than simply providing ‘realistic’ representations, the visual creates the reality under observation. Images provide ways of seeing social issues from particular cultural standpoints. Thus a given image can be interpreted in different ways depending on the viewers and their cultural sensibilities.

While the visual has always had a place in the social sciences, its use and analysis have fluctuated over the history of various disciplines. For example, more than a hundred years ago, the American Journal of Sociology, the flagship journal of the discipline, published a number of articles that used photos as data (Stasz 1979). According to Elizabeth Chaplin (1994: 201), the first manuscript of this type was F. Blackmar’s ‘The Smoky Pilgrims’ published in 1897. The study depicted poverty in rural Kansas using
posed photographs. Yet, this earlier interest in the visual waned as the written word accompanied by numerical analysis became the dominant mode of sociological analysis. In a way, statistical figures, charts, and tables became the visual centerpieces of professional sociological publications (Marvasti 2003). It is worth noting that this trend was not followed in the related discipline of anthropology where the visual has remained a strong and legitimate component of the discipline’s representational practice.

In the different editions of the Handbook of Qualitative Research, Douglas Harper offers thorough surveys of the growing field of visual research. In the most recent edition (2005), he notes, for example, that Contexts, a relatively new journal of the American Sociological Association, makes use of visual images in three ways. First, images can be used to illustrate the text. Second, they are used as part of visual essays where the images dominate the discussion and the text for the most part describes the images. Third, Contexts articles sometimes use images to visually depict the process of social change (748–749).

In the broader context of writing in the social sciences, one can think of the visual in two ways: (1) writing about pictures and (2) writing with pictures (as is the case with most typologies, these categories are not mutually exclusive). Writing about pictures involves the analysis of existing images, often for the purpose of cultural critique. For example, in his landmark sociological study, Gender Advertisements (1979), Erving Goffman analyzed how gender roles and expectations are reflected in magazine ads. Using over 500 photos, he critiqued the taken-for-granted nature of gender relations in Western societies. Goffman showed how magazine ads in the late 1970s depicted men in active roles (doing things like helping patients or playing in sports), whereas the women were depicted as mere spectators, passively watching the men’s activities.

Similarly, in Images of Postmodern Society (1991) and Cinematic Society: The Voyeur’s Gaze (1995), Norman Denzin rejects the notion that cinematic representations are mere entertainment with no social value. Instead, he argues that we understand and express ourselves and our social settings through Hollywood films. According to Denzin, cinematic representations both describe social realities and mandate a way of seeing or accepting these realities. Consider, for example, his analysis of the movie When Harry Met Sally:

The movie … is a ‘Field Guide to Single Yuppies’. … As such it takes a stand on and defines the following problematic terms; being single versus being married; sexuality and women’s orgasms; love, sexuality, and friendship; life after divorce, or after breaking up with a lover. These terms are presented as obstacles. … The solutions are gender specific. Women must not be single, must learn how to fake orgasms, so that males think they have sexual power. … Men, on the other hand, must have a woman who lets them think they can make them sexually happy. They need male friends to talk to, because women don’t understand male sexuality. In this battle between the sexes, sex must be overcome, before love and friendship can be achieved. (Denzin 1995: 117)

According to this analysis, such cinematic representations mandate a way of thinking about male-female relationships. When Harry Met Sally becomes a sort of how-to guide on heterosexual relations, constructing and describing the reality of how men and women should relate to one another. Over time cinematic representations become taken-for-granted truths that both construct and validate gender stereotypes.

In the field of anthropology, Catherine Lutz and Jane Collins’ Reading National Geographic (1993) offers a brilliant critique of the representations of non-Western cultures in the National Geographic. This analysis connects the magazine’s photographs with Western assumptions about ‘savage’ cultures and their exotic lifestyles. As Lutz and Collins put it, ‘Non-Westerners draw a look, rather than disattention or interaction, to the extent that their difference or foreignness defines them as noteworthy yet distant’ (188). The authors show how such ‘looks’ are reflected in the National Geographic’s representations of ‘foreignness.’ The magazine’s
photos can thus be seen as ‘gazes’ that construct the exotic other.

Aside from analyzing existing images, writing with pictures could also involve creating first-hand visual material for the purpose of illustrating, complementing, or transcending the written text. In the social sciences, anthropology is a leader in the use of pictorial and filmic materials for illustrative purposes. For example, G. Bateson and Margaret Mead’s Balinese Character: A Photographic Study (1942) juxtaposes text and the visual in a complementary way so that one would enhance the meaning of the other. In the words of the authors,

We are attempting a new method of stating the intangible relationship among different types of culturally standardised behavior by placing side by side mutually relevant photographs. … By the use of photographs, the wholeness of each piece of behavior can be preserved. (Bateson and Mead 1942: xii, as quoted in Harper 1994: 404)

For example, by placing a series of photos of a given native ritual on one page and related text on the opposite page, Bateson and Mead encourage their readers to see and read the story simultaneously.

In sociology, one of the most recognized voices of the visual has been Howard Becker, who in a 1975 article called for advancing beyond photography as an art form to seeing it as a mode of representing and analyzing social reality. He also promoted greater appreciation for the role of social theory in the production and analysis of photographic images (Harper 1994: 406). Becker subsequently published Exploring Society Photographically (1981), an edited book with a visual presentation style similar to that of Bateson and Mead.

Photographs can also be incorporated in writing personal narratives. For example, Richard Quinney (1996) uses photographs from his father’s trip to California in the 1920s to tell the intimate, nostalgic story of his relationship with his father. Even though Quinney’s photographs are interspersed with a good deal of writing, he gives greater weight to the visual impact of his work. In his words, ‘photographs are not to be subjected to “scientific” and “professional” discourse. Photography resists a language of analysis. The image speaks in silence. We give ourselves up to that which is beyond language and rational thought’ (p. 381). In a sense, Quinney uses photographs in the same way some social scientists use poetry to transcend the limits of scientific and ordinary language (poetic representations are discussed later in this chapter).

The use of photographs is most common in multidisciplinary fields like cultural studies. For example, Crossing the Divide: Strangers, Neighbors, Aliens in New America presents interviews with people from the multiethnic communities of Queens, New York. Here is how the authors describe the project:

We decide to become travelers in our own backyard. For three years we trek between the shadows of the block-long superstores that now dominate most of the major boulevards in Queens, down the side streets, into the bodegas, family-owned restaurants, homes, places of worship, libraries, and community rooms – looking for migrations stories, culture, and soul. (Lehrer and Sloan 2003: 12–13)

The still photos in this book show the interviewees’ faces, the places where they live and work, and the cultural artifacts that define their ethnic background. Even the written text itself is manipulated for visual effect with different font types, sizes, and colors adding more layers of textuality and meaning to the work.

Similarly, Body Type: Intimate Messages Etched in Flesh (Saltz 2006) tells the stories of tattoos and the people who wear them. The written text plays a minimal role in this book. Instead, the photographs of tattooed body parts dominate the book. Each photograph is accompanied by a direct quote explaining its significance for the tattooed person. Interestingly, the book does not contain any facial images; the respondents are identified only through their tattoos.

Writing with the visual continues to expand. As Douglas Harper (2005) notes, emerging computer technologies are revolutionizing the use of visual material in social research. Particularly, multimedia texts can now easily
combine pictures and written material in the same context, thanks to technology that is exceedingly affordable. Additionally, multimedia texts can be posted on internet websites accessible to users virtually from any location in the world. A key feature of internet-posted multimedia text (e.g. ‘hypertext’) is that the material does not have to be read or viewed linearly like a bound book. So-called ‘hot links’ or ‘hyperlinks’ allow the readers to jump from one passage to another. For example, while reading a hypertext ethnography, the reader can click on pictures from the field, see an image of a respondent, and click on his name to see excerpts from an interview with that respondent.

Sarah Pink (2001) suggests that hypertext brings a sort of reader-oriented coherence to ethnographic research. In her words, ‘The coherence of ethnographic hypermedia is created in the relationship between the design of the text and how it is interpreted. It depends on authors’ creativity for the former and users’ for the latter’ (169). Pink also notes that hypertext allows for continuous revisions of the original work:

Theoretically, this means neither knowledge itself nor representations of knowledge are ever complete. … Practically, this means that, unlike printed books and finished films, on-line hypermedia texts may be up-dated, added to, or altered. Video sequences may be re-edited, photographs manipulated in new ways, written words changed, and the hyperlinks between them modified. (p. 167)

For an example of hypermedia ethnographies discussed in Pink (2001), visit the following website: http:anthropology.ac.uk/Bhalot

PERFORMATIVE WRITING

This genre of writing is the most aesthetically conscious (Ellis and Bochner 1992; Paget 1995; Mienczakowski 1996; Denzin 1997, 2000, 2003). Like other genres discussed thus far, the goal here is to transcend the limits of ordinary language and to, overtly or covertly, rebel against mainstream academia and its conventions. In Sarah Finely’s words, ‘art-based research’

is an act of political emancipation from the dominant paradigm of science for new paradigm researchers to say “I am doing art” and to mean “I am doing research” – or vice versa. In either utterance, that art and research are common acts makes a political statement. (Finely 2003: 90, cited in Finely 2005: 685)

There are many variations to this approach where the author becomes an acting voice or body in evocative texts. For the purpose of this discussion, I present a research example that literally involves a staged performance. Specifically, I use Gray Ross et al’s ‘Making a Mess and Spreading It Around: Articulation of an Approach to Research-Based Theater’ to offer a summary of how social research is transformed into theater. The original research data for the staged performances discussed in this work come from Ross et al’s studies of cancer patients (i.e. women with breast cancer and men with prostate cancer).

The first step in staging research is preparing a script. The authors recommend avoiding ‘representations that fail to deliver the promise of an engaging and visceral connection with the research material’ (Ross et al 2002: 62) by consulting expert directors, scriptwriters, set designers, generally people with expertise about what does or does not work on stage. Additionally, Ross et al suggest that the following groups be included in the development of the script: (1) researchers who are familiar with the nuances of the data; (2) research participants whose stories are being told; and (3) people who are ‘naïve to the area under study’ (p. 63) and can provide insight about how outside audiences might respond to the performance.

The script itself can incorporate: (1) the original research findings; (2) a ‘second research process’ (64) where new insights emerge through secondary analysis and examination of the original data; and (3) invented scenes from rehearsals and improvisations. The script should then be read, reread, rehearsed, and revised.

Finally, the cast could include both original research participants and actors who
have become intimately familiar with the roles. To encourage audience participation, a traditional viewing can be followed by a discussion and question-and-answer session with the actors, researchers, and director.

Of course, this entire process involves deliberate choices about what is included and what is excluded from the research. For example, an important research finding may not be dramatically and aesthetically powerful and thus cannot be included in the script. Ross et al advise against improvising the material to the point where the original research participants no longer recognize themselves on the stage. This commitment to ‘real’ people seeing themselves on the screen serves two purposes. On a practical level, if a dramatization of a tragedy does not connect with the very people who endured the suffering, then there might be reason to believe that the work has failed theatrically. On a more analytical level, the matter of authenticity takes center stage here, so to speak. That is, we are once more faced with the question: To what extent does the performance represent ‘real’ life experience? As this example indicates, alternative practices do not necessarily resolve representational dilemmas; sometimes they simply transport the questions to a different arena. In the case of research-as-theater, as the written text is set aside in favor of bodily performance, the problem of representing ‘authentic selves’ migrates onto the stage.

WRITING FACTUAL FICTION

Despite the apparent contradiction in the phrase, factual fiction, or what is known as ‘creative nonfiction’ outside the social sciences, is an exciting and influential school of writing with a long and distinguished history of transgressing the divide between objective truth and imagination (see, for example, Truman Capote’s 1966 novel In Cold Blood). As Michael Agar notes (1995), although largely ignored by social scientists, creative nonfiction and literary journalism in many respects could serve as pedagogical and theoretical models for the kind of alternative writing or ‘creative analytical practices’ (Richardson and St. Pierre 2005: 962) that are now gaining momentum in the field. The same frustrations about the limitations of objectivity and the need to ‘bring the text to life’ inspired journalists to experiment with innovative modes of representing the stuff of everyday life and find ways of ‘writing about oneself in relation to the subject at hand’ (Brett Lott, cited in Moore 2007: 280). The sociological emphasis on the reflexive relationship between the self and the social world is echoed in the pedagogy of creative nonfiction. For example, in his introductory text for English courses about this genre, Dinty Moore delineates the link between reality and the imaginative author in this way:

A subject becomes noteworthy, in other words, because the author takes close notice, and then finds a way to transmit his or her own fascination with the subject to the curious reader. Moreover, a writer of creative non-fiction is not asked to be invisible. … In fact, voice and point of view are fundamental to what is creative about creative non-fiction. (2007: 11)

Creative nonfiction writers have offered insightful analyses of topics that are the mainstay of the social sciences. For example, through ‘total immersion’ (the equivalent of what Adler and Adler (1987) call ‘complete participant role’), Lee Gutkind explores the ‘humanistic aspects of the high-tech medical world’ (1998: 6). His book Many Sleepless Nights looks at the lives and practices surrounding organ transplantations. Gutkind observes that in their single-minded devotion to ‘saving lives’ surgeons become detached from the emotional health of the very lives they are saving:

I once listened to a prominent surgeon impatiently interrupt a resident who was carefully explaining a procedure to a family member, prompting him to “save lives first – answer questions later.” Another surgeon told me, in defense of his insensitive behavior, “Psychologic [sic] trauma and all that stuff is important, but it doesn’t make a goddamn difference if you are well-adjusted and dead.” (p. 7)
In contrast, Gutkind’s study of veterinary medicine titled An Unspoken Art notes that touch and emotions ironically play a more important role in the business of healing animals. He recounts a surgical procedure on a race horse where,

Eight exhausted veterinarians and nurses, all women, remained in the recovery area with Cam Fella (the horse), sitting in a circle, elbow to elbow, keeping him calm. Touching him. Kissing him. Talking to him. Until he was awake enough to stand on his own and navigate the winding path back to his stall. (p. 8)

A good example of the social science version of creative nonfiction can be found in Paul Rosenblatt’s ‘Interviewing at the Border of Fact and Fiction.’ This author relies on fictional and literary tropes for soliciting and narrating life experiences. Rosenblatt’s interview data, for example, are explicitly solicited in search of stories that are ‘good enough to be fiction’ (Rosenblatt 2002: 898). Likewise, his composition and narration styles do not just report the facts or present interview excerpts and analysis; rather, Rosenblatt’s text is constructed around aesthetic and reader-response priorities. He quite deliberately engages in the kind of character and plot building that one finds in the best of fiction:

I talk about how people sit as they talk, what they ask me, how they smell, how their language changes as who [is] present changes, how their dogs are players in family experience, their use of facial tissues when they cry, how they slide by family disagreements during a family interview, the ways they can blithely and unapologetically be inconsistent, and how much they seem trapped by culture, neighbors, property, ownership, and much else into thinking along certain lines and not others. (Rosenblatt 2002: 901)

To the degree that the social scientific genre can be viewed as different from ‘creative nonfiction,’ the difference is that the former has different disciplinary ties and is more explicitly committed to systematic and scholarly research. For example, Rosenblatt writes that at the end even the most creative social science writer ‘must still be a craftsperson, a consummate interviewer, a doubter, a systematic explorer, and a careful reporter in ways that are responsive to a community of researchers’ (Rosenblatt 2002: 907).

POETIC REPRESENTATION

At first glance representing science through poetry may seem impractical and contrary to the aphorisms regarding objectivity and detachment. Conventional wisdom suggests that poetry is the language of emotions and science the language of facts. The synthesis of the two, as in the phrase ‘poetic science,’ thus seems oxymoronic. Yet, as suggested throughout this chapter, such divisions are linguistic constructions in their own right and do not reflect inherent properties of texts. Indeed, the proponents of the third culture (alluded to earlier in the chapter) have noted that literary movements like Romanticism were directly influenced by scientific thought. For example, Joanne Merrison’s (1998) ‘The Death of the Poet: Coleridge and the Science of Logic’ highlights Samuel Coleridge’s appreciation for logic and empirical observation. According to Merrison, Coleridge was adamantly opposed to divorcing the ‘essence’ of ‘nature’ from lived experience and the social contexts that make it meaningful. This is shown in the following excerpt from a Coleridge poem:

In nature there is nothing melancholy!
But some night-wandering man, whose heart was pierced
With the remembrance of a grievous wrong,
Or slow distemper, or neglected love,
(Coleridge, Poetical Works, p. 264, cited in Merrison 1998: 177–178)

Similarly, the famed Persian poet Omar Khayam was considered an important astronomer and mathematician, and his poetry in the Rubaiyat is as much about the physical wonders of the universe as it is about aesthetics and self-exploration per se. So the recent attempt by social scientists to use poetry in conveying their observations is not entirely without precedent, nor is it entirely ‘new.’
The social scientist most widely associated with the use of poetic prose in qualitative texts is Laurel Richardson, who argues,

Poetic representation … is a practical and powerful, indeed transforming, method for understanding the social, altering the self, and invigorating the research community that claims knowledge of our lives. (Richardson 2002: 888)

It is worth noting that this method of writing does not imply an anything-goes approach to writing. Formal training and conventions still apply. In fact, Richardson recommends poetry classes for anyone interested in creative writing of social science. She reminds her would-be followers that writing poetry involves learning the basics of a craft like any other. Richardson draws attention to the importance of ‘sound, sight, and ideation’ (p. 881) (i.e. tone, imagery, and symbolism) in poetic representations and chides,

A line
break does
not
a poem
make. (p. 882)

The task of writing or rewriting research findings into poetic forms requires familiarity with the conventions of the form and a good deal of practice. Like traditional poetry, this kind of writing begins with an object or a thing in the real world but then tries to transcend the object through masterful description. The poetry is intended to be a condensed and more powerful version of the original text. For example, Richardson rewrote the transcripts from a five-hour interview with a Southern woman into a five-page poem. Here is an excerpt from the poeticized interview:

Well, one thing that happens
growing up in the South
is that you leave. I
always knew I would
I would leave. (p. 888)

The goal here is to convey the woman’s life narrative without losing its emotional tone to the very words that describe the experience. Like other social scientific texts, the basic objective is still representing human experience. The genre simply gives the author greater creative latitude in telling the story. As Richardson notes about the above poem, ‘The speech style is Louisa May’s, the words are hers, but the poetic representation, including the ordering of the material, are my own’ (883).

Again, initially, this kind of writing may seem a radical departure from mainstream representational practices in the social sciences, but in some ways it is simply an extension of existing practices. In particular, qualitative researchers have always had the discretion to use some material and not others. Arguably, the choices that shape the ‘final report’ have never been completely detached from aesthetic concerns. To the degree that ethnographers strive to tell a coherent story of their field experiences, they all engage in poetic revisions. Surprisingly, this observation equally applies to quantitative writing. I recently attended a job interview in which the candidate presented several colorful graphs of a regression analysis. In a sense, the statistical logic of the numbers projected on the screen was complemented by aesthetically pleasing colors and shapes (e.g. a continuous green line for one dependent variable and a fragmented red line for another). At one point, the candidate was openly complimented for his ‘nice graphs,’ making explicit the aesthetic criteria for the assessment of the quantitative representation of research findings.

WRITING THE AUTHOR

A few decades ago, including the subjective voice of the author in the scientific text was considered antithetical to the very essence of science. Today, at least in the realm of ethnographic texts, writing the author into the field notes, or autoethnography, has become an established method of representing research findings. There are many flourishing forms in this genre and a good deal of empirical and pedagogical literature.

A thorough survey of this type of writing can be found in the introductory chapter of Deborah Reed-Danahay’s Auto/Ethnography.
Stylistic variations notwithstanding, one can gather from Reed-Danahay’s discussion that most experts concede that autoethnographic writing is a self-reflexive account of social experience. The central criterion for autoethnographic text appears to be that the explicit voice of the author must be embedded in a broader social context. Autoethnographic text is expected to tie idiosyncratic stories with a larger universe of experiences and meanings. Reed-Danahay makes this point explicit in her definition of autoethnography as ‘self-narrative that places the self within a social context’ (1997: 9).

Having said that, how this is achieved and for what purposes is the subject of considerable debate and contention. In Reed-Danahay’s chapter there seems to be a continuum of representational strategies for autoethnographers. On the one end, there is the minimally self-referential text that simply adds the author’s own subjective voice to the many voices and observations from the field. On the other end, there is ‘pure,’ ‘native’ experience represented with little or no intervention from academic sources. For example, John Dorst’s The Written Suburb (1989, cited in Reed-Danahay 1997) treats suburbanites’ artistic creations (i.e. arts and crafts) as autoethnographic representations. For Dorst, autoethnography is a sort of ‘self-documentation’ done by ordinary people. In this context, expert social scientific description is unnecessary because in a postmodern society anyone can be an informed author of culture: ‘If the task of autoethnography can be described as the inscription and interpretation of culture, then postmodernity seems to render the professional ethnographer superfluous’ (Dorst 1989: 2, cited in Reed-Danahay 1997: 8).

Other advocates of autoethnography, who fall somewhere in the middle of the two extremes on the continuum, emphasize neither academic nor ordinary dimensions of this genre but its potential for political action and change. For example, Stacy Holman Jones (2005) introduces her paper titled ‘Autoethnography: Making the Personal Political,’ in this way: ‘This is a chapter about how looking at the world from a specific, perspectival, and limited vantage point can tell, teach, and put people in motion’ (2005: 763).

In the field of autoethnography, the works of Carol Ronai are exemplary because of her ability to combine the best analytical innovations of this genre with superior aesthetic sensibility. Ronai’s writing is both informative and politically brave. The story of how her father sexually abused her, titled ‘My Mother is Mentally Retarded,’ is a classic example of what she calls a ‘multi-layered account.’ In this particular form of autoethnography, the author’s experiential account is juxtaposed against academic and popular discourses. The descriptions are layered and deliberately disjointed using a set of asterisks. To better appreciate the potency of Ronai’s writing, consider the following excerpt:

I resent the imperative that all is normal with my family, an imperative that is enforced by silence, and “you don’t talk about this to anyone” rhetoric. Our pretense is designed to make event flow smoothly, but it doesn’t work. Everyone is plastic and fake around my mother, including me. Why? Because no one has told her to her face that she is retarded. We say we don’t want to upset her. I don’t think we are ready to deal with her reaction to the truth. … Because of [my mother] and because of how the family as a unit has chosen to deal [with] the problem, I have compartmentalized a whole segment of my life into a lie. (1996: 115)

As this excerpt shows, autoethnographic text can be a powerful method of representing a social issue. Ronai’s gripping and ‘authoritative’ voice compels the reader to engage the topic. For many readers of ethnography, this representation of Ronai’s suffering has become an inescapable memory.

POSTCOLONIAL (RE)WRITING

This method of representation in some ways is as much about rewriting or un-writing the canonical texts as it is about writing per se. In some ways, postcolonial writing has been the analytical engine of the many alternative forms of representation in the social sciences. The seminal contributions
of postmodernists and poststructuralists have played a crucial role in forming this body of knowledge. In particular, Jacques Derrida’s direct assault on the authority of the text in Writing and Difference (1978) and Michel Foucault’s analyses of the constructive power of text and discourse (1966, 1977) have been instrumental in defining the field of postcolonialism.

The postmodern critique of the authority of language enabled postcolonial writers to question the validity of so-called ‘scientific’ texts about others. For example, Edward Said’s Orientalism (1978) challenges Western representations of Arab or Eastern others. According to Said, the ‘Orient’ is textually constructed as the mirror opposite of the ‘Occident’ in support of Western stereotypes (e.g. where the West is rational, the Arab world is irrational and childlike). For Said, colonial dichotomies are primarily constructed and maintained through textual practices.

Similarly, in Nation and Narration (1990), Homi Bhabha advances the critique of colonialism by suggesting that the very idea of ‘nation’ is textually sustained through selective memories and a sort of textual amnesia where the errors (or horrors) of the empire are erased. Thus, it is not a factual history that defines the relationship between the colonists and the colonized but a set of self-serving myths that conveniently validate colonial authority and its oppression of others. But unlike Said, Bhabha is careful not to inadvertently reify the ‘self-other’ dichotomy through his own text. Instead, Bhabha argues that colonialism and its culture are ‘hybrid’ and fluid; they are constantly rearticulated through multiple discursive sources.

In addition to broad critiques of Western imperialism, postcolonial writing sometimes focuses on retelling particular stories of the colonized. For example, in ‘Inscribing the Emptiness: Cartography, Exploration, and the Construction of Australia,’ Simon Ryan (1994) shows how aboriginal inhabitants of Australia were made virtually invisible through cartographic texts that represented the continent as vacant space, ready for Western occupation. Ryan states that ‘constructing maps as innocently mimetic ignores the fact that maps are productions of complex social forces; they create and manipulate reality as much as they record’ (1994: 115–116). Ryan empirically demonstrates the constructive power of cartography through his analysis of maps and related texts, such as the following:

The soft, blue, harmless sky of Australia, the pale, white unwritten atmosphere of Australia. Tabula rasa. The world a new leaf. And on the new leaf nothing. The white clarity of the Australian, fragile atmosphere. Without a mark, without a record. (D. H. Lawrence 1950: 365, cited in Ryan 1994: 129)

Finally, postcolonial writing can be used to question mainstream culture. For example, in Anthropology as Cultural Critique, Marcus and Fischer offer ‘defamiliarization’ (1999: 137–164) as a writing strategy for challenging the dominant culture. This method of writing sometimes involves exoticizing the West’s representations of itself to underline the fact that any culture can be textually constructed as ‘irrational’ or ‘primitive.’ A famous example of this kind of textual subversion is ‘Body Ritual among the Nacirema.’ In this article, through a clever reversal of spelling (i.e. ‘American’ into ‘Nacirema’), Horace Miner (1956) transforms the familiar Western culture and selves into an exotic tribe. For example, he rewrites the significance of familiar Western hygiene rituals, as seen in the following excerpt:

In addition to the private mouth-rite, the people seek out a holy-mouth-man once or twice a year. These practitioners have an impressive set of paraphernalia, consisting of a variety of augers, awls, probes, and prods. The use of these objects in the exorcism of the evils of the mouth involves almost unbelievable ritual torture of the client. … In the client’s view, the purpose of these ministrations is to arrest decay and to draw friends. The extremely sacred and traditional character of the rite is evident in the fact that the natives return to the holy-mouth-men year after year, despite the fact that their teeth continue to decay. (pp. 504–505)

By casting the ordinary practices (e.g. a visit to a dentist) in an exotic light, Miner exposes the
textual ‘tricks’ underpinning the construction of the ‘savage’ other.

As a whole, postcolonial writing argues that the power of ‘the empire’ is mostly created and maintained through textual representations; therefore, it is through alternative texts that this power can be undone. As Chris Tiffin and Alan Lawson state in their book, aptly titled De-Scribing Empire:

just as fire can be fought with fire, textual control can be fought with textuality, the post-colonial is especially and pressingly concerned with the power that resides in discourse and textuality; its resistance, then, quite appropriately takes place in – and from – the domain of textuality. … The contestation of post-colonialism is a contest of representation. (1994: 10)

Two words of caution are in order in this discussion of postcolonialism. First, postcolonial writing is not synonymous with a naïve image of natives speaking for themselves, or an ‘essentialist Third World consciousness’ (Tiffin and Lawson 1994: 8, see also Griffiths 1994). While such works are important in adding complexity to the understanding of subaltern identities (see, for example, Yasmin Hussain’s Writing Diaspora 2005), we cannot assume that they are inherently ‘authentic’ and textually ‘innocent’ because they are written by the ‘natives’ themselves. Such a conceptualization would contradict the core argument of rhetoric theorists that all texts are embedded in culture and discourse.

Second, despite its apparent phrasing, postcolonialism is not an analysis of events and practices of the past. This point is passionately made by Bill Ashcroft in the following passage:

How many times must we insist that “post-colonialism” does not mean “after colonialism,” that it begins from the moment of colonization? Indeed, how often must we insist that post-colonialism exists? … How often must we wait for the occasional applause attending post-colonial theory to be matched by some small textual application by the applauders? (1994: 34–35)

CONCLUSION

The methods of writing discussed in this chapter overlap and there are many other forms that are not included. For example, much can be said about ‘collaborative ethnography’ and its inclusion of research participants in writing and editing of the findings (see, for example, Lassiter 2005). Likewise, entire books can and have been devoted to the feminist influence on writing (see, for example, Behar and Gordon 1995). Given these shortcomings, this chapter should be read as a necessarily selective map of an ever-changing terrain with many undiscovered territories. The topics discussed here can be thought of as relatively known landmarks in an otherwise elusive territory. Specifically, representational choices, authorship and authority debates, and the need and moral compulsion to ‘give voice’ to marginal groups, as discussed in relation to various writing forms in this chapter, continue to be the central themes that fuel the engine of textual experimentation in the social sciences.

Of course, the status of ‘alternative’ does not exempt these texts from critical assessment. Critics point out that some representational experiments result in bad writing. For example, in her review of Ellis’s The Ethnographic I: A Methodological Novel about Autoethnography, Pamela Moro writes:

The real question is, perhaps, whether Ellis is a good enough writer to pull off this heartfelt endeavor. Writing good fiction is hard; writing compelling dialogue is extremely hard. I am not entirely sure if what Ellis has written is a “novel.” … It is as though she has taken the shell of a novel and poured into it the material of textbook. (2006: 266)

Other critics question whether alternative writing forms are effective in achieving their emancipatory goals. For example, Atkinson and Delamont caution that some writing experimentations inadvertently (1) re-center the social scientist as the all-knowing author and (2) promote an individualized rather than an interactive view of social experience:

we warn against the wholesale acceptance of aesthetic criteria in the reconstruction of social life.
In many contexts, there is a danger of collapsing the various forms of social action into one aesthetic mode—that is, implicitly revalorizing the authorial voice of the social scientist—and of transforming socially shared and culturally shaped phenomena into the subject matter of an undifferentiated but esoteric literary genre. (2005: 823)

Of course, these criticisms signal the fact that alternative or experimental forms are becoming ‘in and of themselves, valid and desirable representations of the social’ (Richardson and St. Pierre 2005: 962). However, as avant-garde writing becomes institutionalized, it has to contend with its own epistemological inconsistencies. Concurrently, the mainstream academic establishment could change its strategy from dismissing the alternative forms to appropriating and formalizing them on its own terms (for example, see Leon Anderson’s (2006) article on ‘analytic autoethnography’ and the response from Ellis and Bochner (2006)).

Writers of alternative texts find themselves the target of attack from three fronts: (1) positivists who see their work as lacking scientific objectivity; (2) progressive sociologists with their warnings against individualism and self-absorption; and (3) the would-be literary who simply find the text lacking in craft, style, and substance. The response is sometimes moderate and sometimes decidedly oppositional, as in Denzin’s declaration of ‘guerrilla warfare’ (1999) on mainstream academia.

This tension and conflict may be unnecessary. Again, creative nonfiction could serve as an instructive example. Rather than opposing science, some creative nonfiction writers are in fact inspired and intrigued by the language of science. For example, Allison H. Deming (1998) uses scientific observations and terms in her poems. The following is a commentary on the scientific fascination with the wonders of nature:

When the naturalists
See a pile of scat,
They speed toward it
As if a rare orchid
Bloomed in their path
…
An Ancient music they try
to recall because,
although they can’t quite
hear the tune, they know
if they could sing it
that even their wild
rage and lust and death
terrors would seem
as beautiful as the
endolithic algae
that releases nitrogen
into rocks so that
junipers can milk them. (p. 18)

For writers like Deming, as languages that attempt to describe ‘the unknown,’ science and poetry are not mutually exclusive. On the contrary, she argues that ‘What science bashers fail to appreciate is that scientists, in their unflagging attraction to the unknown, love what they don’t know. It guides and motivates their work; it keeps them up at night; and it makes that work poetic’ (p. 15). Accordingly, the language of science, in its own peculiar way, is transcendental and poetic. Conversely, poetry often relies on the material objects that science tries to explain. Instead of opposition, Deming speaks of an ‘edge effect,’ a term that in the field of ecology describes the border between two ecosystems where new life forms flourish (p. 23).

Ultimately, what is indisputable is that writing is an ongoing and socially embedded practice. It is about ‘textwork’ (Van Maanen 2006: 14), or the practice, art, and craft of writing. Writing is also what Pertti Alasuutari calls a ‘literary process’ that:

resembles riding a bicycle. Not in that once you have learned it you’ll master it, but because riding a bike is based on consecutive repairments of balance. The staggerings or whole detours of the text have to be repaired over and over again so that they do not lead the story line in the wrong direction; and the rambling of the first draft cannot be seen in the final product. (1995: 178)

The best advice for writing ‘good’ social science may be to keep writing and always be open to constructive criticism. A social scientist committed to writing should be prepared to relentlessly improve her craft. Often adjustments may be necessary depending on the writing terrain in which one is traveling.
ACKNOWLEDGMENTS

I would like to thank Jaber Gubrium, Pertti Alasuutari, and anonymous reviewers for their comments on the earlier drafts of this chapter. I am particularly indebted to Jay Gubrium for providing the basic outline for this chapter.

REFERENCES

Adler, P. and P. Adler. 1987. Membership Roles in Field Research. Thousand Oaks, CA: Sage.
Agar, M. 1995. ‘Literary Journalism as Ethnography: Exploring the Excluded Middle.’ In Representation in Ethnography, edited by J. Van Maanen. Thousand Oaks, CA: Sage. pp. 112–129.
Alasuutari, P. 1995. Researching Culture: Qualitative Method and Cultural Studies. London: Sage.
Anderson, L. 2006. ‘Analytic Autoethnography.’ Journal of Contemporary Ethnography 35: 373–395.
Ashcroft, B. 1994. ‘Excess: Post-Colonialism and the Verandahs of Meaning.’ In De-Scribing Empire: Post-Colonialism and Textuality, edited by C. Tiffin and A. Lawson. London: Routledge. pp. 33–44.
Atkinson, P. and S. Delamont. 2005. ‘Analytic Perspectives.’ In The Handbook of Qualitative Research (3rd ed.), edited by N. Denzin and Y. S. Lincoln. Thousand Oaks, CA: Sage. pp. 821–840.
Bateson, G. and M. Mead. 1942. The Balinese Character: A Photographic Analysis. New York: New York Academy of Sciences.
Becker, H. 1975. ‘Photography and Sociology.’ Afterimage 3: 22–32.
Becker, H. 1981. Exploring Society Photographically. Chicago: University of Chicago Press.
Behar, R. and D. Gordon. 1995. Women Writing Culture. Berkeley, CA: University of California Press.
Bhabha, H. 1990. Nation and Narration. London: Routledge.
Blackmar, F. W. 1897. ‘The Smoky Pilgrims.’ American Journal of Sociology 2: 485–500.
Butler, J. 1990. Gender Trouble: Feminism and the Subversion of Identity. New York, NY: Routledge.
Capote, T. 1965. In Cold Blood. New York: Random House.
Chaplin, E. 1994. Sociology and Visual Representation. New York: Routledge.
Clifford, J. and G. Marcus (Eds.). 1986. Writing Culture: The Poetics and Politics of Ethnography. Berkeley, CA: University of California Press.
Deming, A. H. 1998. ‘Science and Poetry: A View from the Divide.’ Creative Nonfiction 11: 11–29.
Denzin, N. 1991. Images of Postmodern Society. Newbury Park, CA: Sage.
Denzin, N. K. 1993. ‘Rhetoric and Society.’ The American Sociologist 24: 135–146.
Denzin, N. 1995. The Cinematic Society: The Voyeur’s Gaze. Thousand Oaks, CA: Sage.
Denzin, N. 1997. ‘Performance Texts.’ In Representation and the Text: Re-framing the Narrative Voice, edited by W. G. Tierney and Y. S. Lincoln. Albany, NY: State University of New York Press. pp. 179–217.
Denzin, N. 1999. ‘Two Stepping in the 90’s.’ Qualitative Inquiry 5: 568–572.
Denzin, N. 2000. ‘Aesthetics and the Practices of Qualitative Inquiry.’ Qualitative Inquiry 6: 256–265.
Denzin, N. 2003. Performance Ethnography: Critical Pedagogy and the Politics of Culture. Thousand Oaks, CA: Sage.
Derrida, J. 1978. Writing and Difference. London: Routledge.
Dorst, J. 1989. The Written Suburb: An American Site, an Ethnographic Dilemma. Philadelphia: University of Pennsylvania Press.
Ellis, C. and A. P. Bochner. 1992. ‘Telling and Performing Personal Stories: The Constraints of Choice in Abortion.’ In Investigating Subjectivity: Research on Lived Experience, edited by C. Ellis and M. Flaherty. Thousand Oaks, CA: Sage. pp. 79–101.
Ellis, C. and A. Bochner. 2006. ‘Analyzing Analytic Autoethnography: An Autopsy.’ Journal of Contemporary Ethnography 35(4): 429–449.
Finley, S. 2005. ‘Arts-Based Inquiry: Performing Revolutionary Pedagogy.’ In The Handbook of Qualitative Research (3rd ed.), edited by N. Denzin and Y. S. Lincoln. Thousand Oaks, CA: Sage. pp. 681–695.
Foucault, M. 1966. The Order of Things. London: Tavistock.
Foucault, M. 1977. Discipline and Punish: The Birth of the Prison. London: Allen Lane.
Goffman, E. 1979. Gender Advertisements. New York: Harper.
Griffiths, G. 1994. ‘The Myth of Authenticity: Representation, Discourse and Social Practice.’ In De-Scribing Empire: Post-Colonialism and Textuality, edited by C. Tiffin and A. Lawson. London: Routledge. pp. 70–85.
Gubrium, J. and J. Holstein. 1997. The New Language of Qualitative Method. New York: Oxford University Press.
Gutkind, L. 1998. ‘Introduction: Doctors and Writers.’ Creative Nonfiction 11: 1–10.
Harper, D. 1994. ‘On the Authority of the Image: Visual Methods at the Crossroads.’ In Handbook of Qualitative Research, edited by N. Denzin and Y. Lincoln. Thousand Oaks, CA: Sage. pp. 403–412.
Harper, D. 2005. ‘What’s New Visually?’ In The Handbook of Qualitative Research (3rd ed.), edited by N. Denzin and Y. S. Lincoln. Thousand Oaks, CA: Sage. pp. 747–762.
Hussain, Y. 2005. Writing Diaspora: South Asian Women, Culture and Ethnicity. Burlington, VT: Ashgate.
Irigaray, L. 1985. This Sex Which is Not One. Ithaca, NY: Cornell University Press.
Jones, S. H. 2005. ‘Autoethnography: Making the Personal Political.’ In The Handbook of Qualitative Research (3rd ed.), edited by N. Denzin and Y. S. Lincoln. Thousand Oaks, CA: Sage. pp. 763–791.
Lassiter, L. E. 2005. The Chicago Guide to Collaborative Ethnography. Chicago: The University of Chicago Press.
Lawrence, D. H. 1950. Kangaroo. Middlesex, England: Penguin Press.
Lehrer, W. and J. Sloan. 2003. Crossing the Divide: Strangers, Neighbors, Aliens in a New America. New York: W. W. Norton & Company.
Lutz, C. A. and J. Collins. 1993. Reading the National Geographic. Chicago: The University of Chicago Press.
Marcus, G. E. and M. Fischer. 1999. Anthropology as Cultural Critique: An Experimental Moment in the Human Sciences (2nd ed.). Chicago: The University of Chicago Press.
Marvasti, A. 2003. Qualitative Research in Sociology. London: Sage.
Merrison, J. 1998. ‘The Death of the Poet: Coleridge and the Logic of Science.’ In The Third Culture: Literature and Sciences, edited by E. S. Shaffer. Berlin: Walter de Gruyter. pp. 170–181.
Mienczakowski, J. 1996. ‘An Ethnographic Act: The Construction of Consensual Theater.’ In Composing Ethnography: Alternative Forms of Qualitative Writing, edited by C. Ellis and A. Bochner. Walnut Creek, CA: AltaMira Press. pp. 244–266.
Miner, H. 1956. ‘Body Ritual among the Nacirema.’ American Anthropologist 58(3): 503–507.
Moore, D. 2007. The Truth of the Matter: Art and Craft in Creative Nonfiction. New York: Pearson Longman.
Moro, P. 2006. ‘It Takes a Darn Good Writer: A Review of Ethnographic I.’ Symbolic Interaction 29(2): 265–269.
Paget, M. A. 1995. ‘Performing the Text.’ In Representation in Ethnography, edited by J. Van Maanen. Thousand Oaks, CA: Sage. pp. 222–244.
Pink, S. 2001. Doing Visual Ethnography. London: Sage.
Quinney, R. 1996. ‘Once My Father Traveled West to California.’ In Composing Ethnography: Alternative Forms of Qualitative Writing, edited by C. Ellis and A. Bochner. Walnut Creek, CA: AltaMira Press. pp. 357–382.
Reed-Danahay, D. 1997. Auto/Ethnography: Rewriting the Self and the Social. Oxford, UK: Berg.
Richardson, L. 1990. Writing Strategies: Researching Diverse Audiences. Thousand Oaks, CA: Sage.
Richardson, L. 2000. ‘Writing: A Method of Inquiry.’ In Handbook of Qualitative Research, edited by N. Denzin and Y. Lincoln. Thousand Oaks, CA: Sage. pp. 923–948.
Richardson, L. 2002. ‘Poetic Representation of Interviews.’ In The Handbook of Interview Research: Context & Method, edited by J. Gubrium and J. Holstein. Thousand Oaks, CA: Sage. pp. 877–891.
Richardson, L. and E. A. St. Pierre. 2005. ‘Writing: A Method of Inquiry.’ In The Handbook of Qualitative Research (3rd ed.), edited by N. Denzin and Y. S. Lincoln. Thousand Oaks, CA: Sage. pp. 959–978.
Ronai, C. 1996. ‘My Mother is Mentally Retarded.’ In Composing Ethnography, edited by C. Ellis and A. Bochner. Walnut Creek, CA: AltaMira Press. pp. 109–131.
Rose, G. 2001. Visual Methodologies. London: Sage.
Rosenblatt, P. C. 2002. ‘Interviewing at the Border of Fact and Fiction.’ In The Handbook of Interview Research: Context & Method, edited by J. Gubrium and J. Holstein. Thousand Oaks, CA: Sage. pp. 893–909.
Ross, G., V. Invonoffski and C. Sinding. 2002. ‘Making a Mess and Spreading It Around: Articulation of an Approach to Research-Based Theater.’ In Ethnographically Speaking, edited by A. Bochner and C. Ellis. Walnut Creek, CA: AltaMira Press. pp. 57–75.
Ryan, S. 1994. ‘Inscribing the Emptiness: Cartography, Exploration, and the Construction of Australia.’ In De-Scribing Empire: Post-Colonialism and Textuality, edited by C. Tiffin and A. Lawson. London: Routledge. pp. 115–130.
Said, E. 1978. Orientalism. London: Routledge.
Saltz, I. 2006. Body Type: Intimate Messages Etched in Flesh. New York: Harry N. Abrams.
Shaffer, E. S. 1998. The Third Culture: Literature and Sciences. Berlin: Walter de Gruyter.
Stasz, C. 1979. ‘The Early History of Visual Sociology.’ In Images of Information: Still Photography in the Social Sciences, edited by J. Wagner. Beverly Hills, CA: Sage. pp. 119–136.
Tiffin, C. and A. Lawson. 1994. ‘Introduction: The Textuality of Empire.’ In De-Scribing Empire: Post-Colonialism and Textuality, edited by C. Tiffin and A. Lawson. London: Routledge. pp. 1–14.
Van Maanen, J. 1988. Tales of the Field. Chicago: University of Chicago Press.
Van Maanen, J. 2006. ‘Ethnography Then and Now.’ Qualitative Research in Organizations and Management 1(1): 13–21.
Index
1-parameter logistic (1PL) model 273, 279 analytic generalization, case studies 223
2-parameter logistic (2PL) model 274, 275, 279 analytic induction 198
3-parameter logistic (3PL) model 274–5, 279 Angel-Ajani, Asale 60–1
Angrist framework 120–2
abbreviated interrupted time series design with Angrist, Joshua and colleagues 120–3
a control series 143–51 applied research, paradigm wars 21–3
description 143–4 archival data, secondary analysis see secondary
examples 149–51 analysis, archival and survey data
strength of design 158 archival research, informed consent 100–1
treatment effects: multiple dimensions 145 archives 507–8, 521–2
unique characteristics: theoretical and empirical archiving and re-use of data
reasons 144–9 longitudinal studies 236
validity 144 secondary analysis, qualitative data 513–14
Abstracted Empiricism 72 Arnkil, Tom Erik 79
abstraction 77 Asch, S.E. 597
accuracy in parameter estimation (AIPE) 171–2 Ashenfelter, O., job training study 144
omnibus effect in multiple regression 182–5 assessment criteria, qualitative research 49
standardized regression coefficients 186–7 assessment measures, cultural equivalence 103–4
targeted effects in multiple regression 185–7 attitude scaling 30
unstandardized regression coefficients 185–6 attrition, longitudinal studies 235–6
actions, as situated 496 atypical case 217
actor-network-theory (ANT) 485 Australia, ethical guidelines 96–7
adaptive tests 283–4 autocorrelation, residuals 384
adjustment strategies for equating groups at baseline, autoethnography 609–10
comparing randomized experiments and
observational studies 421–5 Bakhtin, M.M. 456
Affluent Worker (Goldthorpe et al.) 16–17, 204–5, 234 Bartlett, F.A. 597
Agodini, R., dropout prevention study 155–6 Basics of Qualitative Research (Strauss and Corbin) 466
Aiken, L.S. and colleagues, TSWE study 140, 153–4 Bateson, G. 605
AIPE see accuracy in parameter estimation Beck-Gernsheim, Elisabeth 91
Alberoni, F. and colleagues 195–6 Beck, U. 91
Allen, C. 101 Becker, Howard S. 33, 72–3, 200, 207, 605
ambiguity, in standards of performance 77 Belmont Report 96
American Anthropological Association, ethical benefits of research, fair distribution 102–3
guidelines 96 Bertaux, Daniel 88, 91–2, 449
American pragmatism 83–5, 89–90 Beuchat, Henri 56
American Psychological Association (APA), ethical Beveridge, Sir William 70
guidelines 96, 98 Bhabha, Homi 611
American Sociological Association, ethical bias 45, 116, 171
guidelines 96 biographical material 83–4
analysis biographical research 266–7
early survey research 29 biographical interpretive method 346–7
qualitative data 372–4 comparison of methods 350–4
quantitative data 371–2 context 346
analysis of covariance 423 as illustrative of debates in social research 81
of documents 479–80 increased popularity 344
biographical research (cont’d ) comprehensive validation 222

interactivity 344 contextuality of cases 217
interpretive influence 353–4 contributions to knowledge and understanding
memory and pastness 352–3 216–17
narrative analysis 348–501 convenience sampling 223
older people 352–3 cultural competence 220
oral history 347–8 data analysis 218
positivism/interpretivism debate 92 depth of understanding 216
post-modernism and post-structuralism 88–90 direct observation 218
revival 87–8 emergent design 216
shift of focus 89 examples 202
sources of data 345 expansionism 216
structuring 345–6 experientiality 219
subjectivity 345, 353–4 externality 220
validity 92 generalizability 222
variety 345 interpretivist distinctions 216
wide scope 90 interpretivist generalization 222–3
Biography and Society: The Life History Approach to member checking 222
Social Sciences (Bertaux) 88 methodological approaches and issues 218
biomedical research ethics 95–6 methodological triangulation 222
Black, D. and colleagues, job training study 141 methods 218
blocking sample members, improving precision of narrative reporting 219–20
randomized experiments 124–5 naturalism 218
Bloom, H.S. and colleagues, JOBS study 145–9 nature of studies 214–15
Blumer, Herbert 1, 33, 84–6 petite generalizations 223
blurred genres 34–5 proximity 220
Boas, Franz 56, 57 purposive sampling 223
Boltanski, Luc 74–5 research ethics 220–1
Booth, Charles 28 role of theory 223
Bowley, A.L. 29 selection of cases 216–17
Breen, R. 525 semi-structured interviews 218
Briggs, Charles 61 thematic analysis 218
British Household Panel Survey (BHPS) 229–30, 527 theoretical triangulation 222
Brownlie, J. 510 triangulation 222
Bryman, A. 20, 83, 87 triangulation by data source 222
Buddelmeyer, H., PROGRESA study 140–1 triangulation by time 222
Bulmer, M. 102 typical case 217
burdens of research, fair distribution 102–3 use of term 82
Burgess, Ernest W. 30 validation 221–2
Burrell, G. 15 validity 219, 221
case study method 250
calibration 273 case study research, development 30
Cambrosio, A. and colleagues 487–8 case-to-case generalization, case studies 223
Campbell, D.T. and colleagues 44, 555 case-to-population generalization, case studies 222–3
Canada, ethical guidelines 97 causal effects, simple experimental estimator 117
Canadian Psychological Association, ethical causality 78
guidelines 96 case-based comparative cross-national research 252
Capps, Lisa 452–3 cross-sectional research 239–40
case-based comparative cross-national research longitudinal studies 239–40
causal explanation 252 Chamberlayne, Prue 347
difficulties 255–6 Charmaz, K. 18
understanding 252 Chiapello, Ève 74–5
case histories 242 Chicago School 30–1, 216, 234, 346
case studies 113 choices, individual 91
analytic generalization 223 Cicourel, Aaron 33
antecedents – historical and epistemological 215–16 classical test theory (CTT) 265, 270–1
atypical case 217 application to categorical data 273
case-to-case generalization 223 comparing different measures of same trait 277
complexity of cases 217 comparing groups 277
development 271 computer-based testing, standardization 283

information 279–80 computerized adaptive testing 283–4
internal consistency 278–9 computers, effect on quantitative research 34
invariance 276 concepts, self determination 58
measurement of change 277 confidence intervals 169, 171, 172
overview 272 determining sample size 182–5
reliability 278–80 non-central 177–8
standardization 283–4 regression coefficient 179
validity 281–3 regression parameters 176–80
classification of research methods (Smelser) 250 squared multiple correlation coefficient 178–9
Clifford, J. 602 standardized regression coefficient 179–80
clinical causal thinking 78 unstandardized regression coefficient 179
coefficient alpha 278 conflicts of interest 97–9
cognitive interviewing 261, 316 Connolly, P. 197
cognitive principles, development of measures 282–3 constant comparative method 218
cognitive psychology, influence on survey construct validity 281–2, 315
research 316 constructionism 85, 89
cognitive theory, narratives 455 content analysis 481–2
cohort studies 230–1 context
secondary analysis, archival and survey data 526–7 in biographical research 346
Collins, Jane 604–5 of social research 3–5
Columbia University 30 convenience sampling, case studies 223
Coming of Age in Samoa: A Psychological Study of convergence 591–2, 595
Primitive Youth for Western Civilisation (Mead) convergent validation 556–8
57–8 conversation analysis 437–40
community studies 234 emphasis on natural data 291–2
comparative approaches, research ethics 104 membership categories 439–40
comparative cross-national research 114 transcription notation 444
assessing similarity and difference 255 workplace studies 495–6
case-based 251–6 Corbin, Juliet 197–8, 466–8
categorical classifications, reliance on 255 Corti, L. 509–10
culturalist approach 251 Council for International Organizations of Medical
difficulties with case-based approach 255–6 Sciences (CIOMS) 96
equivalence 258–62 Council of National Psychological Associations
indicators 260 for the Advancement of Ethnic Minority
individualist fallacy 257–8 Interests 101
language 260 countries, as units of analysis 256–7
literal and functional equivalence 260–1 covariates, improving precision of randomized
Method of Agreement 252–3, 255 experiments 123–4
Method of Concomitant Variation 255 Cox Regression 238
Method of Difference 254, 255 crisis of representation 36
methodological equivalence 259–60 critical discourse analysis
small numbers 256 ideology 435–6
survey-based 256–62 interactional sociolinguistics 436–7
surveys, instability of measurements 258 power 435
surveys, limitations 257 critical ethnography 59–61, 63
surveys, nature of comparisons 258 Cronbach, L.J. 271
surveys, units of analysis 256–7 cross-cultural and cross-national research,
surveys, validity 260 focus groups 364
type of causal explanation 255–6 cross disciplinary research 4–5
types of research design 249–51 cross-sectional research, causality 239–40
universalist 251 Crossley, Michelle 433–4
comparative inference 205–6 cultural competence, case studies 220
comparative method 250 cultural relativism 56, 58
complementarity 592 cultural studies, use of photographs 605
composite multilevel model for change 382–4 cultural turn 91
comprehensive validation, case studies 222
computer-assisted qualitative data analysis (CAQDAS) d-index 538
567–8 Dalton, Melvin 207
data narrative analysis 433–4

censoring, in meta-analyses 546–7 overview 431
co-construction 335–6 Discovery of Grounded Theory (Glaser and Strauss)
missing data, detecting 547–8 463–6
missing data, imputing 548–50 discrepant cases 217
mixing qualitative and quantitative discursive psychology 440–2
see multiple-method research dispreference 297
natural and contrived 266, 290–2, 306–7 division of labour, methodological 556
objective data 590 documents 373–4
ownership 98 in action 484–8
pooling 595 in actor-networks 485
as product of method of collection 590 as immutable mobiles 485
qualitative vs. quantitative 589–90 as informants 480–1
sense data 590, 596 in interaction 489–90
subjective data 590 multidimensionality 490
typology 590–1 as resources and topics 483–4
data analysis studying content 480–4
case studies 218 ways of analysing 479–80
focus groups 361–4 Documents of Life (Plummer) 88
qualitative longitudinal 240–2 double pretest design 157
quantitative longitudinal 237–9 Douglas, Mary 59
data archives 34, 507–8, 521–2 Dumont, Louis 63
data collection Duranti, Alessandro 61
absence of researcher 305–6 Durkheim, Emile 29, 82, 210, 251, 528
longitudinal studies 229–36 Dynarski, M., dropout prevention study 155–6
Data Grids 568–9
data theory (Coombs) 592 eclectic discourse analysis 442–3
datasets, multiple 592–8 Economic and Social Research Council (ESRC) 49
Day, Dennis, membership category analysis 440 data archiving 507–8
deception research 101–2 Datasets Policy 513
Declaration of Helsinki 96 educational research
declarative knowledge 219 assessing validity 49–50
deduction 224 paradigm wars 22–3
deductive inference 204–5 Educational Research: A Critique (Tooley and
defamiliarization 611 Darby) 23
deficit approaches, research ethics 104 Edwards, Derek, discursive psychology 440–2
Denzin, N.K. 18, 35–6, 89, 196, 449, 603, 604 effect size (ES) 118–20
Derrida, Jacques 611 benchmarks 120
descriptive knowledge, validity 44 homogeneity, in meta-analyses 543–4
deviant cases 205, 217 meta-analyses 537–9, 540–2
Devine, Fiona 234 testing for moderators 544–6
dichotomous responses 270 effect sizes 169
difference-in-differences design email surveys 315
examples 155–7 embedded methods argument 19
job training studies 154–5 emblematic case 206–7, 210
validity 151–5 emergent design 216
differentiation, of methodological approaches 35–6 emic perspective 563
Dilthey, W. 215 empathy 333
direct observation 218 epistème 69–70
discourse analysis 372–3 epistemic dimension of knowledge 69–70
conversation analysis 437–40 epistemological foundationalism 46, 48
core features 432–3 epistemology
critical discourse analysis 434–5 assumptions about method 34–5
discursive psychology 440–2 multiple-method research 560–1
in documents 482–3 quantitative/qualitative divide 13
eclectic discourse analysis 442–3 equivalence, comparative cross-national research
generic 433 258–62
membership category analysis 439–40 ESDS Qualidata 507–8, 513, 532
methods 432 ethical codes 529
Ethics of Not Taking a Stand 77 focus groups 365

ethnographic competence 60–2 nature of interview data 334–5
ethnography research methods as theoretical issues 336–7
accounting for audience 63 use of contrived data 292–3
as comparative 55 feminist research project, biography
critical 59–60 background 293–4
early studies 56–9 discussion 303–7
focus groups 365 method and analysis 295–303
goals 55 participants 295
insiders and outsiders 62–3 prompts 295
intentionality 61 use of prompts 294–303, 304–5
longitudinal studies 234 feminist sociological theory 329
nature of method 54 Feuer, M.J. and colleagues 22–3
objective encoding 597 fieldwork, as preparatory activity 502
relation of authors to subjects 62–3 first person, use in narrative 219
research sites 60 fittingness 196
role of audience 54 fixed effects design see difference-in-differences design
workplace studies 494–6 flexibility, longitudinal studies 235
etic perspective 563 focus groups 267
European Social Survey 259, 261 analysing data 361–4
European Social Survey (ESS) 526 appropriate use 360
evaluation, of research 77 collaborative meaning-making 336
evaluation research 4–5 consensus and disagreement 363–4
fictions 76–8 cross-cultural and cross-national research 364
interest of knowledge 78 definitions 357
situational 79 as discussion or performance 362–3
event history modelling 237–8 dynamics 367–8
evidence-based policy and practice 4–5, 48–9 emergent themes and discourses 364
expansionism 216 ethical considerations 359–60
expectation analysis 456–7 ethnography 365
experientiality, case studies 219–20 feminist research 365
experimental design, in randomized experiments 116 group as unit of analysis 358, 362
experimental ethnographic writing 36 group dynamics 361
experimental method 250 history 356–7
experiments, types of 136–7 issues 367
explanatory knowledge, validity 44–5 market research 359
external validity 44, 282 moderator 360–1
case studies 221 online research 366–7
organization and dynamics 358–9
Fabian, Johannes 62 organizational research 365–6
face-to-face surveys 317–18 participants’ use 361
advantages and disadvantages 324 practical considerations 357–8, 359
response rates 314–15 reasons for using 357–8
factual fiction 607–8 sampling 358
Fairclough, Norman, Margaret Thatcher interview silences and omissions 364
434–6 strengths 358, 363, 367
fairness, research ethics 105–6 Foucault, Michel 69–70, 611
Faleomavaega, Eni Fa’aua’a Hunkin 57–8 framework approach to data analysis 242
fallibilism 48 frameworks, in governance 75–6
Faubion, James 60 Freeman, Derek 57, 58
feminist knowledge 337 functionalism, and survey research 33
feminist research 266 funding 5
characteristics 328–9 Furlong, J. 49
collaborative meaning-making 335–6 fuzzy discontinuity 142–3
contributions to interview research 1970s and 1980s
329–32 Geertz, C. 597
contributions to interview research 1990s–2000s Geisteswissenschaften (subjective meaning making) 215
332–7 gender bias, quantitative research 329–30
critique of positivism 330 general theory 55, 64
generalizability 43, 113 Hahn, J. and colleagues 140

case studies 222 Hammersley, M. 22, 23
emblematic case 206 Handbook of Qualitative Research (Denzin and
generalization Lincoln) 18
case-to-population 222–3 Harding, Sandra 82
five concepts 196–8 Harper, Douglas 604
naturalistic 197 Hatt, Paul 16
in qualitative research 195–9 Haviland, John 62
unavoidableness 198–9 Heidegger, Martin 88
Gibbons, M. and colleagues 71 Heider, F. 597
Giddens, Anthony 73, 74 Herman, David 454, 455
Giddings, L.S. 20–1 hermeneutics 88
Ginexi, E.M. and colleagues, job loss study 392 Heshusius, L. 19
Glaser, Barney 87, 462, 463–6 heteroscedasticity 384
Gluckman, Max 58 hierarchies, in research relations 330–1
goals, in governance 75–6 history of social research 27–8
Goffman, Erving 464–5, 604 Hodkinson, P. 23
Goldberger, A.S. 140 holism 592, 595
Gomm, R. and colleagues 198–9 Holland, J., Inventing Adulthoods 241–2
good enough belt 172 Hollister, R., Tennessee Project Star Study 155, 156
good practice 76 Holstein, James A. 453
Goode, William 16 Howe, K.R. 21
goodness, defining 75 Howson, A. 510
Gouldner, Alvin 73 humanism 346
gypsum mine study 202 Husserl, Edmund 88
governance hypertext 606
by goals and frameworks 75–6
by programmes and frameworks 77–8 ideal type (Weber) 210
governmentality 74 idealism, epistemological foundation 85
Grand Theory 72 identity 90
grid technology 532 conversation analysis 439–40
grounded theory 87, 373 in research relations 333–4
as art and science 473–4 identity equivalence 261
axial coding 466 ideology, critical discourse analysis 435–6
conditional/consequential matrix 466–7 idiographic sampling theory
conditional matrix 466 comparative inference 205–6
constant comparative method 218 deductive inference 204–5
and constructivism 469–70 emblematic case 206–7, 210
contradictions 474–5 interactive, progressive and iterative sampling 207–9
development by Glaser and Strauss 464–6 proposals 201–7
diagramming 467 representativeness without probability 201
emergence of the method 463–4 social regularities 204–7
flexibility 467–8 units of analysis 203–4
guidelines to method 471–3 variance 201–3
methodological importance 462–3 illicit drug use research, ethics 105
objectivism/constructivism 18 immutable mobiles 485
overview 461 implementation strategy, in randomized
and postmodernism 468–9 experiments 116
procedures vs. emergence debate 466–8 incentives, ethics 105
properties of method 470–1 indicators, in cross national research 260
range of use 462 individual heterogeneity 238–9
groups, assigning 414–15 individual, in qualitative and quantitative
growth, adjusting for 425 research 243
Guba, E.G. 196–7 individualization thesis 91
Gubrium, Jaber F. 453 individualist fallacy 257–8
Guttman, L. 271 individuals, as units of analysis 257
induction 224
Hacking, Ian 68, 70 inference 204–6
Hackman, J.R. and colleagues 157 information loss 595–6
informed consent future development 284–5

archival research 100–1 information 279–80
deception research 101 invariance 276
longitudinal studies 237 measurement of change 277
qualitative research 99–100 overview 269
unexpected circumstances 100 Rasch model 273
innovation thinking 76 standardization 283–4
insertion sequences 296 validity 281–3
Institutional Review Boards 95
intentionality 61, 62 Jacob, B.
interaction 494–5 Chicago schools study 142
interactional sociolinguistics 436–7 fuzzy discontinuity 142
interactivity, in biographical research 345
internal consistency, classical test theory Kant, Immanuel 215
(CTT) 278–9 Kapferer, Bruce 64
internal validity 44, 136 Kaplan, Abraham 82
International Committee of Medical Journal Editors Kelle, U. 467
(ICMJE) 97 Kemper, R. 234
Internet Kish method 320
multimedia texts 606 knowledge
secondary analysis, archival and survey data 532 declarative 219
Internet surveys 321–3 dimensions of 68–70
experiential 219
advantages and disadvantages 324
relativism 47
response rates 315
tacit 219
interpretive approaches, biographical material 90
knowledge claims 44
interpretive interactionism 89
Kohli, Martin 449
interpretivist generalization, case studies 222–3
Kuder-Richardson formulas 278–9
intersubjective agreement 221
Kuhn, Thomas 14, 46–7
interview research
collaborative meaning-making 335–6
Labov, William 451–2
empathy, rapport and reciprocity 333
language
feminist contributions 1970s and 1980s 329–32 in cross national research 260
feminist contributions 1990s–2000s 332–7 in qualitative research 18–19
nature of data 334–5 large-scale quantitative surveys 229–30
interview schedules 317 latent constructs 269–70
interviewers latent trait theory 272
effects of absence/presence 314, 318 latent variable models see structural equation modeling
training 318 Lazarsfeld, Paul 33, 38
interviews 266 Lefgren, L.
face-to-face 317–18 Chicago schools study 142
as interrogative 350–2 fuzzy discontinuity 143
longitudinal studies 234 level-1 model for individual change 378–80
non-hierarchical relations 332–3 level-2 model for inter-individual differences in change
relationship of interviewer and interviewee 380–2
353–4 Lewis, J. 242
telephone 318–20 life course research 87, 231
intra-paradigmatic differences 17–19 life histories, use of term 82
intrinsic case study 197 life stories 83–4
issue networks 486 Lincoln, Y.S. 18, 35–6, 196–7
item characteristic curve (ICC) 272–3 linguistics 88–9
item information function (IIF) 279–80 linked panel study 230
item response curve 272 LISREL model 396
item response function 272–5 literal and functional equivalence 260–1
item response theory (IRT) 265, 272–5 literature reviews 536
comparing different measures of same trait 277 local independence 272
comparing groups 277 logical positivism 215
comparison with classical test theory 275–6 longitudinal studies 113–14
development 271 analysing quantitative data 237–9
examples and extensions of model 275 anthropology 231, 234
longitudinal studies (cont’d) meaning

archiving and re-use of data 236 as object of study 74
attrition 235–6 reproduction of 597–8
case histories 242 measurement
causality 239–40 benefits of model-based approach 275–6
cohort studies 230–1 meaningful metrics 276–8
collecting data 229–36 quality of 270
combining methods 234 measurement error 270, 423
description 228–9 measurement methods, in randomized experiments 116
ethics 236–7 measurement of uncertainty, in randomized
ethnography 234 experiments 116
event history modelling 237–8 measures
examples 232–3 development from cognitive principles 282–3
flexibility 235 validity and reliability 43
framework approach to data analysis Mehan, Hugh, eclectic discourse analysis 442–3
242 member checking, case studies 222
individual heterogeneity 238–9 membership category analysis 439–40
influence of discipline 235 memory 352–3
informed consent 237 messages, understanding 61
interviews 234 meta-analyses 375
linked panel study 230 alternative indices of heterogeneity 550–1
mixed methods 231 averaging effect sizes 540–2
narrative positivism 240 combining slopes from multiple regressions 551–2
prospective research design 229 comparing randomized experiments and
qualitative data 231, 234–5 observational studies 420
qualitative data analysis 240–2 d-index 538
reflexivity 242–3 data censoring 546–7
repeated measures analysis 239 effect size (ES), homogeneity 543–4
retention 236 estimating effect sizes 537–9
retrospective research design 229 independent samples 539–40
sampling 235 missing data, detecting 547–8
theory 235 missing data, imputing 548–50
Lundberg, George 32 missing data, Trim-and-Fill procedure 549–50
Lutz, Catherine 604–5 models of error 542–3
Luxembourg Employment Survey (LES) 525 moderators of effect sizes 544–6
Luxembourg Income Survey (LIS) 525 new directions 550–2
odds ratio 539
mail surveys 320–1 overview 536
advantages and disadvantages 324 procedures 537–46
response rates 314–15 quantitative data, combining 595
Malinowski, B. 597 r-index 538–9
Manchester School 58–9 sensitivity analysis 546
Mannheim, Karl 230 units of analysis 539–40
mapping 30 validity 537
Marcus, George 59, 60, 63, 602 Method and Measurement (Cicourel) 33
Marienthal (Jahoda et al.) 15, 30 Method of Agreement 252–3, 255
market research, focus groups 359 Method of Concomitant Variation 255
Marsh, Catherine 29 Method of Difference 254, 255
Mascarenhas, Fernando 63 method, use of term 26, 82
matching groups 421–2 methodological approaches 35–7
Mauss, Marcel 56 methodological equivalence 259–60
Mauthner, N.S. 513 methodological features, comparing randomized
Maxcy, S.J. 19 experiments and observational studies 420–1
MBESS R 168, 178–9, 180, 181–2, 186, 187 methodological triangulation, case studies 222
McClannahan, L.E. and colleagues, group home study methodology
149–51 development 1–2
McKillip, J., student alcohol use study 149, 150 use of term 26, 82
Mead, Herbert 346 Methods in Social Research (Goode and Hatt) 16
Mead, Margaret 57–8, 605 Meyer, Geralyn A. 474
microdata from administrative records 522–4 multiple regression 173–4

Miller, Peter 74 statistical power for omnibus effects 180–1
Mills, C. Wright 33, 72 statistical power for targeted effects 181–2
Miner, Horace 611 multivariate effects 169
minimum detectable effect 117–18, 119 Myrdal, Alva and Gunnar 70
minimum detectable effect size (MDES) 119
Minton, J.H., Sesame Street study 158–9 narrative analysis 348–50, 433–4
mixed methods research 1–2 see also multiple-method narrative positivism 240
research narrative reporting, case studies 219–20
assessing validity 50 narrative reviews 536
as challenge to paradigm wars 15–17 narrative structure, in understanding of biographical
differences in position 19–21 accounts 90
increasing popularity 36 narrative turns 449–51, 457
longitudinal studies 231 narratives 373
study of research practice 20 cognitive theory 455
MMRD see multiple-method research definitions 448
mobile phones 319–20 events, states and genres 453–4
Mode 1 science 70–2, 73, 77 expectation analysis 456–7
Mode 2 science 72–5, 77 Labovian personal narrative 451–2
models of error, meta-analyses 542–3 move away from text 452–3
moderators, in focus groups 360–1 before narrative analysis 449
modern mental test theory 272 Proppian model 451
routines 455
moral neutrality 76
scripts, stories and narrativity 454–6
Morgan, G. 15
use in social research 447
multilevel model for change
use of first person 219
adding further predictors 387
narrativity 448
composite multilevel model 382–4
National Centre for Social Research 49
displaying prototypical trajectories 390–1
National Child Development Study 230
extensions 391–3
National Commission for the Protection of Human
fitting to data 384–5
Subjects of Biomedical and Behavioural
interpretation 385–93
Research 96
interpreting additional fitted models 387–90
National Institutes of Health Office of Extramural
level-1 model for individual change 378–80
Research (USA), ethical guidelines 97
level-2 model for inter-individual differences in
National Science Foundation (NSF) (USA), ethical
change 380–2
guidelines 97
modeling discontinuous individual change 392–3
National Statement on Ethical Conduct in Research
modeling nonlinear individual change 393 Involving Humans (Australia) 96–7
overview 377–8 naturalism, case studies 218
scope 393 naturalistic generalizations 197
time-varying predictors 392 negative cases 217
variably spaced measurement occasions 391–2 neoliberalism, and Mode 2 science 72–5
varying numbers of measurement occasions 392 non-central distributions 176
multilevel modelling, secondary analysis, archival and non-hierarchical relations in interviews 332–3
survey data 533 noncompliance, estimating causal effects 120–3
multiple choice responses 270 nonequivalent comparison group design see
multiple-method research 375 see also mixed methods difference-in-differences design
research nonexperiments, definition 136
core principles 560–6 Noro, Arte 55–6
data combination 558–60 Nowotny, Helga and colleagues 71
data conversion 559–60 null hypothesis 172–3
data linkage 568–9 regression coefficient equals zero 175–6
dimensions 558, 559 significance testing 169
Environment Agency studies 561–6 squared multiple correlation coefficient equals zero
epistemology 560–1 174–5
evaluation of method 566–9 nuomena 215
origins 555–8 Nuremberg Code 95
strategies 558
technology 567–9 Oakley, Ann 330
validity 556–8 Oancea, A. 49
observational studies petite generalizations, case studies 223

assigning to groups 415 phenomena 215
compared to randomized experiments 419–26 phenomenology 88, 215, 347
enhancing design 425–6 philosophy 14, 17, 20
multiple control groups 425–6 Pini, Barbara 336
multiple pretreatment measures over time 426 planned economy, and Mode 1 science 70–2
nonequivalent dependent variables 416 Platt, Jennifer 29–30, 33, 82, 463
observational units 203 Plummer, Ken 83, 88
Ochs, Elinor 452–3 pluralism in social research 3–5
odds ratio 539 poetic representation 608–9
older people, biographical research 352–3 policy makers, links with research 4
omnibus effects 166, 167, 180–1 Popkess-Vawter, S. and colleagues 510
online focus groups 366–7 population generalizability, research ethics 103
ontology population statistics 72
quantitative/qualitative divide 13–14 Portelli, Alessandro 347, 351
realist 88 positional dimension of knowledge 70, 73
Optimal Matching Analysis 240 positivism 215
oral history 87–8, 267, 347–8, 350–2 and biographical material 92
organizational research, focus groups 365–6 feminist critique 330
organization theory, paradigms 15 as foundation of statistical methods 85
origin myths 27 and life stories 84
revival 48
Pan African Bioethics Initiative (PABIN) 96 use of term 34
panel studies 229, 322 post-experimental moment 36
paradigm argument 19 post-modernism and post-structuralism
paradigm asynchronicity 19 and biographical research 88–90
paradigm shift, Mode 1 to Mode 2 science 79 lingering influence 23
paradigm wars see also quantitative/qualitative and writing 603
divide post-positivism 18
applied research 21–3 power
continuation 38 critical discourse analysis 435
of continuing importance? 23–4 in social research 47, 330–1, 332, 336–7
dimensions 23 power analysis 169
educational research 22–3 Power, Michael 74
in history of social research 34–5 power, statistical
levels of debate 15 defined 168–9
and mixed methods research 15–17 low 170
systematic review 21–2 Practice and Process in Integrating Methodologies
use of term 13 project (PPIMs) 573–83
paradigmatic mentality 14 creating a data repertoire 581–3
paradigms following a thread 576–83
application to mixed methods research 20–1 in-depth interviews 574
differences within 17–19 initial analysis 577–8
as incompatible 14, 15 integrating datasets 576
organization theory 15 interviews with homeless people 575
as successive 46–7 narrative analysis 575–6
parameter estimation, rationale of accuracy 170–3 narrative data 580–1
Parry, O. 513 picking up a thread 578–80
participant observation 55 social constructionism 576
pastness 352–3 thematic analysis 574–5
path analysis 396 visual study 575
path diagrams 397–8 practice theory 63
pattern matching, to strengthen quasi-experimental pragmatic thinking 76
designs 157–9 pragmatism, in mixed methods research 19–20
Pawson, Ray 198 pre-paradigms 14
payments, research ethics 104–5 pre-testing, questions 316
perception 215 precision 171
performative writing 606–7 and sample size 119
Petersen, Glenn 63 use of covariates and blocking to improve 123–5
preordinate design 216 links between patterns and interpretation 594–5

Presser, Lois 335–6 links between potential and recorded observations
probability sampling 204 see also sampling 596–8
difficulty of finding sampling frames 194 links between recorded observations and data 595–6
early survey research 29 meta-analyses 595
nonresponse 194 overview 585–6
problems and limits 194–5 reasons for 591–2, 598
representativeness and generalizability 194–5 verification 591
progress narrative in social research 2–3 quantitative/qualitative divide 2, 82–3, 87, 215
prompts 294–303, 304–5 see also paradigm wars
propensity scores 112, 152–3, 422 quantitative research
Proposed International Ethical Guidelines for assessing validity 45–7
Biomedical Research Involving Human Subjects development of computers 34
(CIOMS/WHO) 96 gender bias 329–30
Propp, Vladimir 451 rise to dominance 16
prospective longitudinal research design 229 validity 43–6
proximity, case studies 220 view of individual 243
Psathas, George 489 quasi-experiments 112
psychometrics 269 compared to random experiments: within and
Public Health Service (USA), ethical guidelines 97 between studies 135–6
public management doctrines 75 definition 136
pure case (Durkheim) 210 important design attributes 159
purposive sampling 235 pattern matching to strengthen designs 157–9
case studies 223 weak designs 157–8
question-answer process, four stages 316
Qualitative Analysis for Social Scientists (Strauss) 466 questionnaires 266, 316–17
qualitative data analysis, longitudinal studies 240–2 questions 314, 315–16
qualitative data, integration 375–6 Quételet, Adolphe 210
conceptualization 573 Quinney, Richard 605
as distinct from triangulation 573
integrated analysis 576 r-index 538–9
maintaining integrity of data 583 race, ethnicity and culture, research ethics 103–4
overview 572–3 Radin, Paul 56–7, 58, 61
Practice and Process in Integrating Methodologies Ragin, Charles 250–1
project (PPIMs) see under random digit dialling 319
separate heading randomized experiments 111–12
team research 583 assigning to groups 414–15
qualitative data, secondary analysis see secondary attrition 416–17
analysis, qualitative data compared to observational studies 419–26
qualitative research definition 136
1970s and 1980s 34–5 development and usage 115–16
assessing validity 47–8, 49 elimination of bias 116
diversity of approaches 18 equating groups 372, 415–16
emergence 17 essential elements 116
generalization 195–9 estimating causal effects with noncompliance 120–3
informed consent 99–100 as gold standard 134
origin myths 27 impact questions 120
predominance 5 improving group comparability at baseline 417–19
prehistory 28 measurement of uncertainty 116
use of language 18–19 possible future developments 127–8
validity 46–8 proper randomization 416
view of individual 243 randomizing groups to estimate intervention effects
quantitative data, analysis 371–2 125–7
quantitative data, combining 376 sample size and allocation 117–20
advantages and disadvantages 599 simple experimental estimator of causal effects 117
complementarity 592 stable-unit-treatment-value assumption 416–17
convergence 591–2, 595 treatment compliance 416
holism 592, 595 use of covariates and blocking 123–5
links between data and patterns 593–4 rapport 333
Rasch model 273 Research Ethics Committees 95

rating scales 270 research evaluation 77
realist ontology 88 research methods
reciprocity 333 epistemological assumptions 34–5
reductionism 216 Smelser’s classification 250
Reed-Danahay, Deborah 609–10 studies of usage 36–7
reflectivity see reflexivity research process
reflexivity 6 alternative view 587–9
in biographical research 89 chronology 586
in ethnography 59 conventional view 586–7
longitudinal studies 242–3 European Social Survey example 588
reflexivity problem 73 fragmentation 586
regression analysis, combining slopes 551–2 inevitability 586–7
regression coefficient research questions 207–8
confidence intervals 179 in randomized experiments 116
statistical significance 181–2 research theory 55
regression-discontinuity design 137–43 research traditions 18
cutoff 137, 139, 140, 143 researcher, absence from data collection 305–6
description and examples 137–9 responses, types of 270
examples 142–3 retention, longitudinal studies 236
strength of design 158 rhetoric, of programmes and evaluation 77
unique characteristics: theoretical and empirical Richardson, Laurel 609
reasons 139–42 Riessman, Catherine Kohler 349, 453–4
regression parameters Ronai, Carol 610
confidence interval formation 176–80 Rose, Gillian 603
non-central confidence intervals 177–8 Rose, Nikolas 74
null hypothesis significance tests 174–6 Rosenbaum, Paul 153
relativism, of knowledge 47 Rosenblatt, Paul 608
relativism/scepticism 48 Rosenthal, Gabriele 347
reliability Royce, A.P. 234
classical test theory (CTT) 278–80 Rules of Sociological Method (Durkheim) 82
of measures 43 Ryan, Simon 611
use of term 44
repeated measures analysis 239 Sacks, Harvey 292
representational dimension of knowledge 69 Said, Edward 611
Research Assessment Exercise (RAE) 49 Saldaña, J. 241
research design, elements to consider 167 Sale, J.E.M. and colleagues 20
research ethics salvage ethnography 56
as basis for good science 106 sample size planning 112–13, 187–8
case studies 220–1 accuracy in parameter estimation 171–2
comparative and deficit approaches 104 goal of research 166, 167
distribution of benefits and burdens 102–3 multiple regression with goal of statistical accuracy
ethical codes 529 182–7
fairness 105–6 multiple regression with goal of statistical power
focus groups 359–60 180–2
history of rules and regulations 95–6 power analytic approach 169–70
illicit drug use research 105 samples
longitudinal studies 236–7 blocking members 123–4
population generalizability 103 increasing size 171
principles 96 independent, in meta-analyses 539–40
race, ethnicity and culture 103–4 stratifying members 124–5
regulatory bodies 95 sampling see also probability sampling
research incentives 104 in contemporary science 199–201
research relations 331 development of techniques 30
secondary analysis, archival and survey data 529 early survey research 29
secondary analysis, qualitative data 512 idiographic see idiographic sampling theory
social science guidelines 96–7 interactive, progressive and iterative 207–9
and statistical power 170 interactive units 203–4
types of payments 104–5 longitudinal studies 235
palaeontology 200 practical considerations 512

probability 204 re-analysis 510
probability and non-probability 193 research participants’ views 516
randomized experiments 117–20 review of studies 508, 509, 510
secondary analysis, archival and survey data 530 secondary uses of qualitative data 509–11
Sapir, Edward 58 self-collected data 506
Savage, M. 16–17 supplementary analysis 510
scale information function (SIF) 279–80 supra-analysis 510
scales of measurement 270 Seikkula, Jaakko 79
Schütze, Fritz 346–7 self-administered questionnaires 314, 323
Seale, Clive 448 semi-structured interviews 218
Seasonal Variation of the Eskimo (Mauss and sensitivity analysis 546
Beuchat) 56 Shadish, W.R. and colleagues
secondary analysis, archival and survey data 374–5 Cincinnati Bell study 145
access to data and support, advances in 531–3 group coaching study 151–3
analytical and research value 524–8 single comparative studies, comparing randomized
available data 521–2, 523 experiments and observational studies 419–20
cohort studies 526–7 situational analysis 58
combining survey and qualitative research 528 Skoufias, E., PROGRESA study 140–1
cross-sectional surveys 526 Smelser, N.J., classification of research methods 250
data archives 521–2 Smith, Andrea 62
data linkage 533 Smith, Dorothy 330, 331
data sources, UK 524 Smith, J.K. 19
good practice 530–2 social action, as object of knowledge 72
grid technology 532 social constructionism 73
historical comparisons and change over time 526 social drama 58–9
international comparisons 525–6 social network analysis 486–8
Internet 532 social planning 70–2
item non-response 531 social practice, and social science 70, 74
large and nationally representative samples 524–5 social psychology, experimental/non experimental 18
locating datasets 521–2 social reality, dimensions of knowledge 68–70
looking after data 530 social research
methodological issues 528–9 1970s and 1980s 34–5
methods, developments 532 1990s onwards 35–6
microdata from administrative records 522–4 dominance of statistical methods 85
modelling and causality 531–2 before First World War 28–9
multilevel modelling 533 interwar period 29–31
non-response 530–1 locus of power 47
overview 520 Second World War and Post-War period 31–4
panel studies 527 social function 49–50
population sub-groups 527 social science
relationships within households 527–8 emancipating 72
research ethics 529 paradigm as inappropriate term 14
sampling 530 and planned economy 70–2
structural equation modeling 533 and social practice 70, 74
use of documentation 530 sociological theory, three genres 55–6
secondary analysis, qualitative data 374 sociology of knowledge dimension of knowledge
accessing data 507–9 70, 73
amplified analysis 511 software
archiving policy 513–14 statistical 384
assorted analysis 511 structural equation modeling 407
choice of data 514–16 text analysis 491
data archives 507–8 Spearman, Charles 271
epistemological issues 511–12 speech, creative/transformative nature 61
ethical and legal concerns 512 Spiegelman, C., fuzzy discontinuity 142–3
future policy and practice 512–16 squared multiple correlation coefficient
informal data sharing 509 confidence intervals 178–9
key issues 511–12 statistical significance 180–1
overview 506 Stake, Robert 197
standardization 283–4 comparative cross-national research 256–62

standardized interviews 313–14 development of techniques 34
standardized regression coefficient, confidence intervals disagreements over method 32–3
179–80
Star, Susan Leigh 474–5
statistical accuracy, sample size planning 182–5
statistical adjustment strategies based on measured baseline differences 422–3
statistical analysis, in randomized experiments 116
statistical method 250
statistical power 180–2
statistical power analysis 168–70
statistical sampling theory, and generalizability 43
statistical software 384
strategy 1
Strauss, Anselm 87, 197–8, 462, 463–6
strong true score theory 272
structural equation modeling 372–3
  basic cross-sectional model 402–3
  basic longitudinal model 404
  cross-lagged panel models 404–5
  cross-sectional model 402
  current status 396
  equivalent models 406–7
  estimation and testing 398–9
  first-order factor model 399–400
  future 407–8
  higher-order factor models 400
  history 395–6
  latent growth curve models 405
  limitations 405–7
  longitudinal models 404
  measurement error correction 407
  measurement invariance models 400–1
  measurement models 399
  measurement scale 406
  mediated effects 403
  model fit 406
  moderated effects 403–4
  multitrait-multimethod models 401–2
  overview 396–9
  path diagrams 397–8
  sample size 406
  secondary analysis, archival and survey data 533
  software 407
  trait-state-error models 402
structuring, biographical research 345–6
Stubbe, M. and colleagues, interactional sociolinguistics 436–7
styles of reasoning 68
subjectivity 221
  in biographical research 345, 353–4
survey data, secondary analysis see secondary analysis, archival and survey data
survey research
  administration 259
  advantages and disadvantages of methods 324
  available methods 313–14
  equivalence 259–62
  in Europe and America 35
  and functionalism 33
  Internet 321–3
  large-scale 229–30
  mail 320–1
  mixed mode 323–4, 325
  prehistory 28–9
  question development 315–16
  response/non-response 314–15
  rise in popularity 31–2
  self-administered questionnaires 323
  and social reform 28
  standardization 259
  use of term 82
Sweden, population policy programme 70
systematic review 4, 21–2

tacit knowledge 219
targeted effects 166, 167, 181–2
taxonomies 484
technology 235
  multiple-method research 567–9
telephone surveys
  advantages and disadvantages 324
  interviews 318–20
  response rates 314–15
testing, definition 269–70
The American Soldier (Stouffer et al.) 32
The Audit Society (Power) 74
The Discovery of Grounded Theory (Glaser and Strauss) 87
The Polish Peasant in Europe and America (Thomas and Znaniecki) 84–5, 449, 480–1
The Social Construction of Reality (Berger and Luckmann) 73
The Sociological Imagination (Wright Mills) 33, 86–7
the three Es 75
thematic analysis, case studies 218
theoretical knowledge, validity 45
theoretical sampling 197, 199, 235
Theoretical Sensitivity (Glaser) 464
theoretical triangulation, case studies 222
theory
  longitudinal studies 235
  role in case studies 224
Thomas, William 84–5, 449, 480–1
Thompson, P. 509–10
Thomson, R., Inventing Adulthoods 241–2
Thorne, Sally 509
Thurstone, L.L. 30
Tilley, N. 198
trace line 272
transcription notation 308, 444, 496
transferability 196
treatment effects 145, 414
triangulation
  case studies 222
  as distinct from integration 573
  multiple-method research 555–8
triangulation by data source, case studies 222
triangulation by time, case studies 222
Trim-and-Fill procedure 549–50
Trochim, W.
  compensatory reading study 142
  fuzzy discontinuity 142–3
true score theory 272
Turner, Victor 58–9
Tuskegee Syphilis Study 95–6
typical case 217

underdogs, siding with 73
United States of America, ethical guidelines 97
units of analysis 203–4
  comparative cross-national survey research 256–7
  focus groups 358, 362
  meta-analyses 539–40
unmeasured baseline differences, adjusting for 423–5
unstandardized regression coefficient, confidence intervals 179

validation, case studies 221–2
validity
  abbreviated interrupted time series design with a control series 143
  assessing 45–7, 49
  case studies 219, 221
  classical test theory (CTT) 281–3
  control of variables 43–4
  cross national surveys 260
  difference-in-differences design 151–5
  distinction between qualitative and quantitative research 42
  epistemic criteria 47–8
  exercise of judgment 51
  generalizability 43
  as ideology 47
  internal and external 44
  item response theory (IRT) 281–3
  of measures 43
  meta-analyses 537
  qualitative research 46–8
  quantitative research 43–6
  quasi-experiments 136–7
  relativism 48, 51
  social function of research 49–50
  threats to 45
  triangulation 555–8
  use of term 44
van Maanen, John 602–3
variables, control of 43–4
variance, true score 270
verification 591
Verstehen (understanding) 215
video
  additional data 502–3
  data collection 502–3
  development 494
  example of analysis 496–502
  fieldwork 502
  strengths and focus of method 503
  in study of everyday activities 493–4
  transcribing visual material 499–501
  workplace studies 494
Vienna School 215
visual images 374, 603–6
visual media, development of use 493

Waletzky, Joshua 451–2
Warner, W. Lloyd 205–6
web crawlers 486
Weber, Max 28–9, 210, 251
weighting 530–1
welfare state, Britain 70
Wengraf, Tom 347
what works 76
Whorf, Benjamin Lee 202
Wilde, E.T., Tennessee Project Star Study 155, 156
Wodak, Ruth, Margaret Thatcher interview 434–6
Wolcott, H.F. 241
working hypothesis 196–7
workplace studies 494–6
World Health Organization (WHO) 96
World Wide Web, pages as documents 485–8
Wortman, P.M. and colleagues, educational vouchers study 156–7
Wright Mills, C. 33, 86–7
Wright, Sewall 395–6
writing
  as action 479
  aesthetic criteria 612–13
  alternative texts 613
  defamiliarization 611
  factual fiction 607–8
  feminist approaches 603
  performative writing 606–7
  poetic representation 608–9
  poetry and science 613
  postcolonial writing 610–12
  procedural approach 602
  relationship with documentation 479
  as representational practice 603
  visual images 603–6
  writing the author 609–10

Xing Xu 200

Zaret, D. 250–1
Zeitdiagnose 55–6, 57, 59–60, 64
Znaniecki, Florian 84–5, 198, 449, 480–1
