The SAGE
Handbook of
Social Research
Methods
Edited by
Pertti Alasuutari,
Leonard Bickman,
Julia Brannen
Editorial arrangement and Introduction © Pertti Alasuutari, Leonard Bickman, Julia Brannen 2008
Chapter 2 © Alan Bryman 2008
Chapter 3 © Marja Alastalo 2008
Chapter 4 © Martyn Hammersley 2008
Chapter 5 © Karen Armstrong 2008
Chapter 6 © Pekka Sulkunen 2008
Chapter 7 © Ann Nilsen 2008
Chapter 8 © Celia B. Fisher and Andrea E. Anushko 2008
Chapter 9 © Howard S. Bloom 2008
Chapter 10 © Thomas D. Cook and Vivian C. Wong 2008
Chapter 11 © Ken Kelley and Scott E. Maxwell 2008
Chapter 12 © Giampietro Gobo 2008
Chapter 13 © Linda Mabry 2008
Chapter 14 © Jane Elliott, Janet Holland and Rachel Thomson 2008
Chapter 15 © David de Vaus 2008
Chapter 16 © James A. Bovaird and Susan E. Embretson 2008
Chapter 17 © Susan A. Speer 2008
Chapter 18 © Edith de Leeuw 2008
Chapter 19 © Andrea Doucet and Natasha Mauthner 2008
Chapter 20 © Joanna Bornat 2008
Chapter 21 © Janet Smithson 2008
Chapter 22 © Suzanne E. Graham, Judith D. Singer and John B. Willett 2008
Chapter 23 © Rick H. Hoyle 2008
Chapter 24 © Stephen G. West and Felix Thoemmes 2008
Chapter 25 © Charles Antaki 2008
Chapter 26 © Matti Hyvärinen 2008
Chapter 27 © Kathy Charmaz 2008
Chapter 28 © Lindsay Prior 2008
Chapter 29 © Christian Heath and Paul Luff 2008
Chapter 30 © Janet Heaton 2008
Chapter 31 © Angela Dale, Jo Wathan and Vanessa Higgins 2008
Chapter 32 © Erika A. Patall and Harris Cooper 2008
Chapter 33 © Jane Fielding and Nigel Fielding 2008
Chapter 34 © Ann Cronin, Victoria D. Alexander, Jane Fielding, Jo Moran-Ellis and Hilary Thomas 2008
Chapter 35 © Manfred Max Bergman 2008
Chapter 36 © Amir Marvasti 2008
Apart from any fair dealing for the purposes of research or private
study, or criticism or review, as permitted under the Copyright,
Designs and Patents Act, 1988, this publication may be reproduced,
stored or transmitted in any form, or by any means, only with the
prior permission in writing of the publishers, or in the case of
reprographic reproduction, in accordance with the terms of licences
issued by the Copyright Licensing Agency. Enquiries concerning
reproduction outside those terms should be sent to the publishers.
ISBN 978-1-4129-1992-0
Notes on Contributors
11. Sample Size Planning with Applications to Multiple Regression: Power and
Accuracy for Omnibus and Targeted Effects
Ken Kelley and Scott E. Maxwell
33. Synergy and Synthesis: Integrating Qualitative and Quantitative Data
Jane Fielding and Nigel Fielding
Index
Notes on Contributors
Marja Alastalo is a post-doctoral Research Fellow in the Department of Sociology and Social
Psychology, University of Tampere, Finland. She is interested in the history of research methods
and the sociology of knowledge and science. Currently she is doing research on the processes of
harmonizing social statistics in the European Union.
Pertti Alasuutari, PhD, is Professor of Sociology and Director of the International School of
Social Sciences at the University of Tampere, Finland. He is editor of the European Journal
of Cultural Studies, and has published widely in the areas of cultural and media studies and
qualitative methods. His books include Desire and Craving: A Cultural Theory of Alcoholism
(SUNY Press, 1992), Researching Culture: Qualitative Method and Cultural Studies (SAGE,
1995), An Invitation to Social Research (SAGE, 1998), Rethinking the Media Audience (SAGE,
1999), and Social Theory and Human Reality (SAGE, 2004).
Charles Antaki, PhD, is Professor of Language and Social Psychology at the University of
Loughborough, where he is a member of the Discourse and Rhetoric Group. He is Associate
Editor of Research on Language and Social Interaction, and among his books are Identities
in Talk (SAGE, 1998; with Susan Widdecombe) and Conversation Analysis and Psychotherapy
(CUP, 2007; with Anssi Perakyla, Sanna Vehvilainen, and Ivan Leudar). He has published widely
on language and interaction.
Andrea E. Anushko, MA, is a graduate student in the applied developmental psychology program
at Fordham University and the project coordinator for the Fordham Resident Alcohol Prevention
Program at the Center for Ethics Education. Her research interests include language development
and early education.
Manfred Max Bergman is Professor of Sociology at Basel University, Switzerland. His areas
of specialization are political sociology and research methods. His research interests relate to
stratification, identity, and inter-group relations, and his recent publications focus on poverty,
stratification and mobility, mixed methods research, and data quality.
Howard S. Bloom, Chief Social Scientist for MDRC, specializes in the design and analysis of
experimental and quasi-experimental studies of causal effects. He has conducted a number of
such studies and has written widely on methodologies for them.
Joanna Bornat is Professor of Oral History in the Faculty of Health and Social Care at the
Open University. She has researched and published in the areas of oral history and ageing for a
number of years. Her current research interests include the secondary analysis of archived data.
Julia Brannen is Professor of the Sociology of the Family, Institute of Education, University of
London. Her main interests are in research methodology; the family lives of parents, children,
and young people; and the relation between paid work and family life. She is a co-founder
and co-editor of the International Journal of Social Research Methodology. Books include:
Mixing Methods: Qualitative and Quantitative Research (Ashgate, 1992), Connecting Children:
Care and Family Life in Later Childhood (Falmer, 2000), Young Europeans, Work and Family
(Routledge, 2002), Rethinking Children’s Care (OUP, 2003), Working and Caring over the
Twentieth Century (Palgrave, 2004), and Coming to Care (Policy Press, 2007).
Kathy Charmaz is Professor of Sociology and Coordinator of the Faculty Writing Program
at Sonoma State University. Her books include Good Days, Bad Days: The Self in Chronic
Illness and Time (Rutgers, 1993) and Constructing Grounded Theory: A Practical Guide through
Qualitative Analysis (SAGE, London); she has also co-edited the forthcoming The SAGE
Handbook of Grounded Theory. She received the 2006 George Herbert Mead award for lifetime
achievement from the Society for the Study of Symbolic Interaction.
Thomas D. Cook has a BA from Oxford and a PhD from Stanford and is a Professor of sociology,
psychology, education and social policy, and Joan and Serepta Harrison Chair in Ethics and
Justice at Northwestern University. His main interests are in social science methodology and
contextual influences on adolescent development.
Harris Cooper is Professor of psychology and Director of the Program in Education at Duke
University. His research interests include research synthesis methodology and applications of
social psychology to education policies and practices.
Ann Cronin, BSc, PhD (Surrey) is Lecturer in Sociology at the University of Surrey. She
teaches a variety of courses relating to social theory, methodology, and the substantive topics
of gender and sexuality. Her research interests lie in the social construction of sexual identities
and qualitative methodologies.
Angela Dale is Professor of Quantitative Social Research at the Centre for Census and
Survey Research, University of Manchester. She is Director of the ESRC’s Research Methods
Programme and heads a team providing support for government datasets as part of the UK’s
Economic and Social Data Service. From 1993 to 2003, she led the academic team responsible for
the development and dissemination of samples of microdata from the UK Census of Population.
Jane Elliott, PhD, is Reader of Research Methodology and Principal Investigator of the 1958
and 1970 British Birth Cohort Studies at the Centre for Longitudinal Studies at the Institute
of Education, University of London. She has a long-standing interest in combining qualitative
and quantitative methodologies and has published in the areas of methodology, gender, and
employment. Her book Using Narrative in Social Research: Qualitative and Quantitative
Approaches was published by SAGE in 2005.
Jane Fielding is Senior Lecturer in Quantitative Sociology, University of Surrey, and teaches
statistics and computing at both undergraduate and postgraduate levels. Recent research projects,
supported by funding from the Environment Agency, include flood warning for vulnerable
groups and the public response to flood warning and, more recently, a study of environmental
inequalities. Her particular interest is in mapping and measuring environmental inequalities using
geographical information techniques. She was also a co-holder on an ESRC Methods Programme
project (2002–2005) exploring the integration of quantitative and qualitative methods in an
investigation of the concept of vulnerability.
Nigel Fielding is Professor of Sociology and co-Director of the Institute of Social Research,
University of Surrey. His research interests are in qualitative research methods, mixed methods
research design, and new technologies for social research. His books include Linking Data
(SAGE, 1986; with Jane Fielding), a study of methodological integration; Using Computers
in Qualitative Research (SAGE, 1991; edited with Raymond M. Lee), an influential book on
qualitative software; Computer Analysis and Qualitative Research (SAGE, 1998; with Raymond
M. Lee), a study of the role of computer technology in qualitative research; and Interviewing
(SAGE, 2002; editor), a four-volume set. He is currently co-editing the Handbook of Online
Research Methods (SAGE).
Celia B. Fisher holds the Marie Doty Chair in Psychology at Fordham University where she
also directs the Center for Ethics Education. Her professional interests are in developing ethical
standards for the discipline of psychology and federal guidelines for the protection of vulnerable
populations in research.
Martyn Hammersley is Professor of Educational and Social Research at the Open University.
His early research was in the sociology of education. Much of his more recent work has
been concerned with the methodological issues surrounding social and educational enquiry.
His most recent books are Taking Sides in Social Research (Routledge, 2000); Educational
Research, Policymaking and Practice (Paul Chapman, 2002); and Media Bias in Reporting
Social Research? The Case of Reviewing Ethnic Inequalities in Education (Routledge, 2006).
He is currently working on the issue of research ethics.
Christian Heath is Professor at King’s College London, and leads the Work Interaction and
Technology research group. He specializes in video-based studies of social interaction drawing
on ethnomethodology and conversation analysis. He is currently undertaking projects in areas
that include health care, museums and galleries, and auctions.
Janet Heaton, BA (Hons), is Research Fellow at the Social Policy Research Unit, University
of York. She is the author of Reworking Qualitative Data (SAGE, 2004), and has published a
number of articles based on her mainly qualitative research on health and social care services
for patients and their families in the UK.
Vanessa Higgins is based at the Centre for Census and Survey Research, University of
Manchester, where she works for ESDS Government, providing support for research and
teaching using the large-scale government datasets. Prior to this, Vanessa worked at the Office for
National Statistics and also on a number of policy-led research projects within academic settings.
Janet Holland is Professor of Social Research and co-Director of the Families and Social
Capital ESRC research group at London South Bank University. She also co-directs Timescapes:
Changing Relationships and Identities through the Life Course, a multi-university, large-scale
qualitative longitudinal study. Research interests cover youth, education, gender, sexuality and
family life, and methodology, and she has published widely in these areas. Examples are
Sexualities and Society (Polity Press, 2003; edited with Jeffrey Weeks and Matthew Waites);
Feminist Methodology: Challenges and Choices (SAGE, 2002; with Caroline Ramazanoglu);
and Inventing Adulthoods: A Biographical Approach to Youth Transitions (SAGE, 2007; with
Sheila Henderson, Sheena McGrellis, Sue Sharpe, and Rachel Thomson).
Ken Kelley is an Assistant Professor in the Inquiry Methodology Program at Indiana University,
where his research focuses on methodological and statistical issues that arise in the behavioral,
educational, and social sciences. More specifically, Dr. Kelley’s research focuses on the design
of research studies, with an emphasis on sample size planning from the power analytic and
accuracy in parameter estimation approaches, and the analysis of change, with an emphasis on
multilevel change models nonlinear in their parameters.
Paul Luff is Reader of Organisations and Technology at King’s College, University of London.
His recent publications include Technology in Action (Cambridge University Press, 2000; with
Christian Heath) and numerous articles in journals and books. He is co-editor of Workplace
Studies: Recovering Work Practice and Informing System Design (Cambridge University Press,
2000).
Amir Marvasti is Assistant Professor of Sociology at Penn State Altoona. His research focuses
on social construction and representation of deviant identities in everyday life. He is the author
of Being Homeless: Textual and Narrative Constructions (Lexington Books, 2003), Qualitative
Research in Sociology (SAGE, 2003), and Middle Eastern Lives in America (Rowman &
Littlefield, 2004; with Karyn McKinney). His articles have been published in the Journal of
Contemporary Ethnography, Qualitative Inquiry, and Symbolic Interaction.
Natasha Mauthner is a Senior Lecturer at the University of Aberdeen, where she teaches
courses on qualitative research methods, and gender, work, and organization. She has published
extensively on methodological and epistemological issues in qualitative research. Much of this
work has focused on the links between reflexivity, research practice, and the construction of
knowledge, and the implications for data analysis, data archiving, and the politics of research
management. Her empirical research has focused on issues of gender, work, and family and has
been published in a number of publications including The Darkest Days of My Life: Stories of
Postpartum Depression (Harvard University Press, 2002).
Judith D. Singer is the James Bryant Conant Professor of Education at Harvard University and
former academic Dean of the Harvard Graduate School of Education. As one of the nation’s
leading applied statisticians she is primarily known for her contributions to the practice of
multilevel modeling, survival analysis, and individual growth modeling.
Janet Smithson is a post-doctoral Research Fellow in the Schools of Law and Psychology at the
University of Exeter. She has worked on a variety of national- and European-funded research
projects, using both qualitative and quantitative research methods. Her main research interests
are in cross-national comparative research on work–family, youth, transitions to adulthood and
parenthood, gender and discourse, and qualitative methodology. She is currently working on a
Nuffield-funded study ‘The common law marriage myth and cohabitation law revisited’ with
Anne Barlow and Carole Burgoyne, University of Exeter.
Hilary Thomas is Professor of Health Care Research in the Centre for Research in Primary
and Community Care, School of Nursing and Midwifery, University of Hertfordshire. She was
previously Senior Lecturer in the Department of Sociology, University of Surrey. Her substantive
research interests include the sociology of health and illness, particularly reproduction and
women’s health, and recovery from illness and injury. She was convenor of the BSA Medical
Sociology Group (1991–1994) and president of the European Society for Health and Medical
Sociology (1999–2003).
Rachel Thomson is Professor of Social Research in the Faculty of Health and Social
Care at the Open University. Her research interests include youth transitions, gender/sexual
identities, and social change, and she has published widely in these fields. She is part of the
team that conducted a 10-year qualitative longitudinal study of youth transitions (Inventing
Adulthoods) and is currently researching the transition to motherhood. Forthcoming publications
include Researching Social Change: Qualitative Approaches to Personal, Social and Historical
Approaches (with Julie McLeod) published by SAGE in 2008.
David de Vaus is Professor of Sociology and Dean of the Faculty of Humanities and Social
Sciences at La Trobe University, Australia. He is the author of a number of internationally
renowned books on research methods including Surveys in Social Research (Routledge, 2001)
and Research Design in Social Research (SAGE, 2001). His main areas of research are family
sociology, living alone, life course transitions, and the sociology of ageing. Further details are
available at http://www.latrobe.edu.au/humanities/devaus.html.
Jo Wathan is Research Fellow at the Cathie Marsh Centre for Census and Survey Research.
She works as a member of two data support teams for British cross-sectional microdata: ESDS
Government and the Samples of Anonymised Records Support team. She also teaches classes
on statistical software and secondary analysis.
Stephen G. West is currently Professor of psychology at Arizona State University, and was
the editor of Psychological Methods for six years. His research interests are in field research
methods, multiple regression analysis, longitudinal data analysis, and multilevel modeling.
John B. Willett is Charles William Eliot Professor at Harvard University Graduate School of
Education. He is interested in all things quantitative, particularly statistical methods for analyzing
the timing and occurrence of events; methods for modeling change, learning, and development;
and longitudinal research design.
Vivian C. Wong is training to be a Research Methodologist in the field of education. Her interests
include examination of the following areas: recent shifts in methodology choice in education;
empirical tests of quasi-experimental designs such as regression-discontinuity (RD), abbreviated
interrupted time series, and difference-in-differences designs; and issues in implementation and
analysis of regression-discontinuity studies.
1
Social Research in Changing
Social Conditions
According to Herbert Blumer (1969), methodology refers to the ‘entire scientific quest’ that has to fit the ‘obdurate character of the social world under study’. Thus methodology is not some super-ordained set of logical procedures that can be applied haphazardly to any empirical problem. In short, methodology constitutes a whole range of strategies and procedures that include: developing a picture of an empirical world; asking questions about that world and turning these into researchable problems; and finding the best means of doing so, which involves choices about methods and the data to be sought, the development and use of concepts, and the interpretation of findings (Blumer 1969: 23). Methods per se are therefore only one small part of the methodological endeavor.
In producing this book we address the methodology of social science research and the appropriate use of different methods. The contributors describe and question different phases of the research process, with many focusing upon one or more methods, often in combination with others. What unites their contributions is the way they relate the discussion of method to the broader methodological work in which they were engaged. Thus, the contributors not only draw upon their own research experiences but also relate their discussions, in Blumer’s terms, to the larger issue of strategy, that is, tailoring methodological processes to fit the empirical world under study.
Across the social sciences and humanities, there are differences in the development and popularity of particular methods, differences that are also evident cross-nationally. From the 1930s onward, survey research and statistical methods have assumed a dominant position, whereas qualitative methods have gained ground more recently. There has also been a recent resurgence of interest both in the social sciences and humanities in quantitative methods and in mathematical modes of inquiry, for example, fuzzy logic (Ragin 2000). Mixing different methods (e.g. Goldthorpe et al. 1968) and the innovative use of statistical analysis (e.g. Bourdieu 1984) are not, however, recent phenomena. The growth of explicit interest in mixed-methods research designs dates from the late 1980s, resulting in a number of specialist texts (Brannen 1992, Bryman 1988, Creswell 2003,
Tashakkori and Teddlie 2003) but the practice has historically been intrinsic to many types of social science research. In qualitative research, many researchers have incorporated several quantitative approaches such as cross-tabulation of their data (Alasuutari 1995, Silverman 1985, 2000); and some have adopted a multivariate approach (Clayman and Heritage 2002). In 1987 Charles Ragin published his text on qualitative comparative methods (Ragin 1987), which lies in between qualitative and quantitative methods and draws upon logic rather than statistical probability. Historically there has been a plurality of practices of social research.
What distinguishes the social sciences today is a positive orientation toward engaging in different types of research practice. Present-day scholars undertaking empirical research view methods as tools or optics to be applied to several different kinds of research questions that they and their funders seek to address in carrying out research. Coding observations and subjecting them to statistical processes is one way of creating and explaining patterns. Case study and comparative approaches are others: the explication of the logic that brings together the clues about a case and has an explanatory purpose with reference to other cases. These two approaches can also be combined, as in embedded case studies that employ both a case study design and a survey design.
Although qualitative and quantitative methods have evolved from very different scientific traditions, as Charles Ragin (1994), among others, points out, from the viewpoint of how empirical data are used to validate and defend an interpretation, they form a continuum. It can be argued that the two concepts, ‘qualitative’ and ‘quantitative’, are not so much terms for two alternative methods of social research as two social constructs that group together particular sets of practices (see Chapter 2). For instance, quantitative research draws on many kinds of statistical approaches and is not necessarily epistemologically positivistic in orientation. While the social survey is the currently dominant, paradigmatic form, there is no uniform ‘quantitative research’. Similarly, there is no uniform ‘qualitative research’ either. Because much of the craft of empirical social research cannot be classified as either qualitative or quantitative, an increased permissiveness toward mixing methods and questioning of the binary system formed by the terms ‘qualitative’ and ‘quantitative’ are welcome trends.
In this new paradigmatic situation many contemporary scholars no longer regard it as reasonable to divide the field of methodology into opposing camps. On the one hand, researchers are willing to learn more about the possibilities of applying survey methods and statistics to their data analysis. On the other hand, what is known as ‘qualitative research’ has come a long way since Malinowski’s (1922) principles of ethnography or Glaser and Strauss’ (1967) grounded theory. Different methods of analyzing talk, texts and social interaction have multiplied the ‘optics’ available to scholars who want to study social reality from different viewpoints.
This book charts the new and evolving terrain of social research methodology in an age of increasing pluralism. By putting together different approaches to the study of social phenomena within a single volume, the Handbook serves as an invaluable resource for researchers who wish to approach research with an open mind and decide which methodological strategies to adopt in empirical research in order to understand the social world. Given the scope of the field of social research methodology, this volume concentrates on mapping the field rather than discussing each and every aspect and method in detail. In this way the Handbook serves not only as a manual but also as a roadmap. If and when the reader wants to learn more about a particular aspect of methodology or method, he or she can consult other literature.
CHALLENGING THE PROGRESS NARRATIVE
That social research seems to be heading toward greater open-mindedness in methodological strategies can easily be interpreted
as proof of scientific progress. It is tempting to think that after decades of hostility between different methodological camps, notably between qualitative and quantitative researchers, we have now finally acquired the wisdom to see that the best results can be achieved by addressing different ways of framing research questions and by bringing to bear the means to ensure the validity of data analysis and interpretation. This may imply the use of a mixed method design; in qualitative research it may mean employing innovative approaches such as hypermedia or, in social surveys, multi-mode approaches. When researchers adopt new methods they will require the guidance of methodological texts. The Handbook represents our attempt to provide such guidance.
When discussing developments in social research methodology, it is also common to justify change through a narrative in which problems and omissions in past research practices and paradigms have led to new approaches. For instance, in the influential Handbook of Qualitative Research Denzin and Lincoln recount the development of qualitative research in terms of a progress narrative (Denzin et al. 2000). According to them, the history of qualitative research in the social and behavioral sciences consists of seven moments or periods: the traditional (1900–1950); the modernist or golden age (1950–1970); blurred genres (1970–1986); the crisis of representation (1986–1990); the postmodern, a period of experimental and new ethnographies (1990–1995); post-experimental inquiry (1995–2000); and the future (2000–). As informative as their description of the development of qualitative research is, their story also testifies to the problems and dangers of such a narrative. Despite their caveats, their progress narrative functions implicitly as an enlightenment discourse, suggesting where up-to-date, well-informed researchers should be heading if they are not already there and likewise identifying exemplary studies that represent the avant-garde or the cutting edge of present-day qualitative research. It is hardly a surprise that the researchers in question are a very small band and that practically all of them are American, because both authors come from the United States. Moreover, the closer to the present, the more frequently there are new moments, and the narrower the group.
To follow suit in this book, it would be quite easy to find good reasons for arguing that the methods represented here are a natural outcome of scientific progress in social research methodology. One such argument may be that scientific progress constitutes the closure of the gap between qualitative and quantitative methods; that by pursuing a multi-method approach we can best tackle the tasks of the social sciences in today’s society.
Even though we are not unsympathetic to such a view, there are also problems with that argument. Unlike natural science, whose development can be described as the vertical accumulation of knowledge about the laws of nature, human sciences are quite different. They are more like a running commentary on the cultural turns and political events of different societies, communities, institutions and groups that change over time. Social science research not only speaks to particular social conditions; it reflects the social conditions of a society and the theories that dominate at the time. Because there is no unidirectional progress in social and societal development, the theoretical and methodological apparatus available to social scientists changes, as they too are shaped by historical, structural and cultural contexts. The notion that eventually methodology may consist in a collectively usable toolbox of methods is illusory. Methodological traditions vary across societies and they are also subject to fashion, with some more popular at one moment in time and in a particular context than others. In any case it is rare for a wholly new method to be developed.
METHODOLOGICAL PLURALISM AND EVIDENCE-BASED RESEARCH
From this viewpoint, changes in social research must always be seen in their social
and historical contexts. Thus, our assumption that there is a trend toward greater permissiveness in methodology stems from our own experience as scholars working in countries that belong to the Organisation for Economic Co-operation and Development (OECD).¹ In addition, our experience stems from primarily following the English language literature. According to our analysis, that trend is due to the position that social research has been required to adopt. During recent decades, the OECD countries have experienced a climate of increased accountability in public expenditure and a requirement that research should serve policy ends and ‘user’ interests.² In particular the promotion and dominance of the concept of new public management by the OECD and its member countries is a key factor. As part of the growing pervasiveness of neoliberal principles, public policy decisions are required to be grounded in evidence-based, scientifically validated research. This has also led to developments in social science research: the ‘systematic review’ process, one of the catchwords also promoted by the OECD, has become a major area of methodological investment in the social sciences.
For instance, in the United States, although the emphasis on policy is not as strong, the tradition of action research and the accountability of research to a diversity of ‘user’ groups is longstanding. Program evaluation is a significant player in the policy environment. Most government agencies require that their demonstration programs be evaluated. One research agency, the Institute of Education Sciences, has in the last few years shifted to rigorous randomized experiments. There are forces promoting evidence-based treatments in health, mental health and education. Even though the evidence-based medicine approach originated in Great Britain, the United States is emphasizing the existence of such evidence in the funding of health and mental health services. The U.S. Department of Education, through its No Child Left Behind program, is requiring quantitative evidence of academic improvement. The Campbell Collaboration, modeled after medicine’s Cochrane Collaboration, focuses on systematic evidence of the effectiveness of programs in mental health, education and criminal justice. At the federal level of government the agencies themselves are now responsible for providing formal reviews of their own performance through the Government Performance and Results Act (GPRA).
The systematic review of social research evidence is widespread in quantitative research, whose quality is seen to be measurable in ‘scientific terms’. Systematic review is also being applied to qualitative research, a process that is requiring researchers in this genre to develop more rigorous and convincing arguments for their evidence as well as criteria against which such studies may be measured.
Social research is also affected by the increasing prevalence of cross-disciplinary pilot or applied projects that serve as tools to develop solutions to social, economic and environmental problems. Typically such projects, often developed in co-operation between public, private and civil society sectors, include a practical research element and the evaluation of results. One of the aims is to generate ‘best practices’ that are to be promoted worldwide.³ Such a model for the improvement of governance creates new roles and requirements for social research. First, the close co-operation of researchers with policy-makers and the merging of the roles of project manager and researcher challenge the ideals of rigorous science, thus creating an increased interest in action research methodology. Second, the evaluation of pilot or demonstration projects has contributed to the further development of a whole evaluation research industry. Additionally, the marketing of such pilot projects as best practice creates an aura of research as scientifically systematic, although the emphasis is on practical, policy-directed research.
The growing market for policy-directed and practice-oriented social research does not necessarily or directly affect academic social science the same way in all contexts. In some contexts universities need to complement
shrinking public funding with money from external sources, while in other countries such as the UK universities are increasingly being seen and run as businesses, with research income from external sources sought at ‘full economic cost’. Within academe, one consequence of the growing market of policy-directed research is that the position of traditional disciplines is weakened as a result of the growth of cross-disciplinary theme-based research programmes, which are fishing in the new funding pools of research and development. This, in turn, affects the field of methodology. Cross-disciplinary applied research improves the transfer of knowledge between hitherto bounded disciplines, thus constructing methodology as an arena and area of expertise that spans disciplines. In some ways, this has also meant that methodology has become a discipline in itself, or at least it has assumed part of the role of traditional disciplines. Vocational apprenticeships conducted within a particular discipline have been overtaken by training courses for the new generation of researchers who are schooled in a broad repertoire of methods. While it is always useful to master a large toolbox of methods, the danger is that without a strong link between theory and practice via a particular discipline, for example sociology, people lack what C. Wright Mills (1959) called the ‘sociological imagination’. As methodology acquires a higher status across all the social sciences and more emphasis is placed on displaying methodological rigour, there is the need to be mindful of Lewis Coser’s admonition to the American Sociological Association in 1975 against producing researchers ‘with superior research skills but with a trained incapacity to think in theoretically innovative ways’ (Coser 1975).

THE RELEVANCE OF QUALITATIVE RESEARCH

In recent years advanced capitalist societies have indeed witnessed increasing methodological pluralism and a resurgence of interest in quantitative methods. This development, however, must be seen against the larger picture in which qualitative research can be placed at the forefront, because qualitative methods have gained popularity particularly during the past two or three decades. Despite increasingly pluralist attitudes toward quantitative methods, a major proportion of British sociologists, for instance, conduct qualitative inquiries. A recent study shows that only about one in 20 of published papers in the mainstream British journals uses quantitative analysis (Payne et al. 2004). The figures are about the same in Finland (Räsänen et al. 2005), and the same trend, a forward march of qualitative research particularly from the 1990s onward, can also be detected in Canada (Platt 2006) and the U.S. (Clark 1999).

The increase in the popularity of qualitative methods has coincided with new theoretical trends that have many names. One talks, for instance, of a linguistic or cultural turn, or about interpretive social science. Overall, we could say that constructionist approaches have gained ground from scientific realism and structural sociology. Along with this paradigm shift, personal experience, subjectivity and identity have become key concerns for many social researchers. For instance, in British sociology, as Carl May (2005: 522) points out, ‘after the political watershed of the early 1980s, much explicitly Marxist analysis disappeared, to be subsumed by social constructionism and postmodern theoretical positions that also privilege subjectivity and experience over objectification and measurement’. He emphasizes that in different ways, subjectivity seems to have been one of the central concerns of British sociology since the 1980s, which according to him also explains the popularity of qualitative investigation. Indeed, a recent study shows that only about one in 20 of published papers in the mainstream British journals uses quantitative analysis (Payne et al. 2004).

An interest in cultural studies and constructionist research grew up out of a desire by social scientists to distance themselves from economistic Marxism and structural sociology, particularly in the UK. Other political
influences were also important. For example, under the influence of the Women’s Movement in the 1970s, feminist social scientists sought to address gender inequality and to focus upon women’s perspectives in public and private spheres. By the early 1980s qualitative research had established a foothold, and by the early 1990s qualitative methods had become mainstream in Finnish sociology (Alastalo 2005) and pervasive in the UK. Theory-wise, different strands of constructionist thought have gained popularity, and the development has meant an increased interest in questions of identity.

In the United States qualitative research developed particularly in response to ‘scientistic’ sociology and to research techniques that require a deductive model of hypothesis testing. The more inductive approach of qualitative research was seen not only as a better way to explain social phenomena by understanding the meaning of action, but also as a way to ‘give voice’ to the underdog, to help see the world from the viewpoint of the oppressed rather than the oppressor (Becker 1967, Becker and Horowitz 1972). As in European sociology, the rise of qualitative research has meant a trend ‘away’ from determinism to active agency and to questions of subjectivity.

It seems that the increased interest in qualitative research is partly due to recent policy changes, which have foregrounded questions of subjectivity in many ways. For instance, when public services are marketized or privatized and citizens are turned into customers, there is demand for expertise on subjectivity (Rose 1996: 151). Sometimes the link between policy changes and an increasing demand for qualitative research can be quite direct. For instance, when the deregulation of the Finnish electronic media system started during the first part of the 1980s, YLE, the national public broadcasting company, quickly launched a fairly big qualitative research program to study the audiences, their way of life and viewing preferences, in order to fight for its share of the audience. There appears to be a similar link between media research and changes in media policy throughout the OECD countries: while the deregulation of public broadcasting, promoted and reviewed by the OECD (OECD 1993, 1999), was started during the 1980s, reception studies and qualitative audience research gained momentum from the 1970s onward.⁴ For the most part, however, the increased interest in subjectivity and identity construction within academic (qualitative) research is only indirectly related to its policy relevance.

THE IMPORTANCE OF REFLECTIVITY

All in all, social research is being forced to perform a more strategic role in society than hitherto. Our argument is not that this strategic role is the sole determinant of developments in social research, or the kinds of research methods that are used. However, we think it is important for social scientists to be conscious of the social conditions of our profession. In that way we are likely to be better equipped to meet the changing demands upon us, for instance the need to argue for the methodological strategies we employ and the way we interpret our data. On the one hand, we need to retain a sense of integrity about the claims we make for our research evidence while, on the other, we need to take part in a dialogue with the funders and users of social research. Reflectivity about the position of social scientists and their public role will enable them to retain a critical edge toward research.

Under the present conditions in which social research has an increasingly close link with policy-makers and methodology is assuming higher status in the social sciences, it is more important than ever to emphasize that methods cannot be seen as separate from the ‘entire scientific quest’ and should include the inspiration of theory. This is the spirit of this book. It is meant to be an aid to researchers in their attempt to perform innovative research. As researchers have always known, one of the keys to good research is to challenge one’s own assumptions and to carry out the study in
such a way that the data have the possibility of surprising the researcher.

USING THE HANDBOOK

The Handbook is structured around the different phases of the research process: research design, data collection and fieldwork, and the processes of analyzing and interpreting data. First, however, it begins with several chapters of more overarching importance that set out some important current issues and directions in social research, such as the history and present state of social research, the debate about research paradigms, the issue of judging the credibility of different types of social science research, and the importance now being placed upon research ethics.

The contents of the Handbook have several features that are not present in all such texts. As well as ranging widely across the field of social research methodology, we have been selective in including a number of chapters that discuss the combining of qualitative and quantitative methods and integrating different types of data. The book is also particularly strong in its section on data analysis and includes four chapters on the analysis of quantitative data, five devoted to qualitative data analysis, and three to the integration of data of different types. It also covers the secondary analysis of qualitative and quantitative data with one chapter on meta-analysis, and another on writing up and presentation of social research.

NOTES

1 Originally set up in 1947 with support from the United States and Canada to co-ordinate the Marshall Plan for the reconstruction of Western Europe after World War II, today the OECD consists of 30 member countries sharing a commitment to democratic government and the market economy. It plays a prominent role in fostering good governance in the public service and in corporate activity and helps governments to ensure the responsiveness of key economic areas with sectoral monitoring. By deciphering emerging issues and identifying policies that work, it helps policy-makers adopt strategic orientations. It is well known for its individual country surveys and reviews.

2 European Union funding requires research that produces ‘impacts’ and addresses the concerns of the social partners.

3 For this task, there is an international Best Practices database, maintained by the United Nations, UNESCO and non-profit organizations (http://www.bestpractices.org/index.html).

4 For the development of qualitative audience research, see Alasuutari 1999.

REFERENCES

Alastalo, Marja (2005) Metodisuhdanteiden mahti: Lomaketutkimus suomalaisessa sosiologiassa 1947–2000 [The Power of Methodological Trends: Survey Research in Finnish Sociology 1947–2000]. Tampere: Vastapaino.

Alasuutari, Pertti (1995) Researching Culture: Qualitative Method and Cultural Studies. London: Sage.

Alasuutari, Pertti (1999) ‘Three Phases of Reception Studies.’ Pp. 1–21 in Rethinking the Media Audience: The New Agenda, edited by Alasuutari, Pertti. London: Sage.

Becker, Howard S. (1967) ‘Whose Side Are We On?’ Social Problems 14(3): 239–47.

Becker, Howard S. and Irving Louis Horowitz (1972) ‘Radical Politics and Sociological Research: Observations on Methodology and Ideology.’ American Journal of Sociology 78(1): 48–66.

Blumer, Herbert (1969) Symbolic Interactionism: Perspective and Method. Berkeley, CA: University of California Press.

Bourdieu, Pierre (1984) Distinction: A Social Critique of the Judgement of Taste. London: Routledge & Kegan Paul.

Brannen, Julia (1992) Mixing Methods: Qualitative and Quantitative Research. Aldershot: Avebury.

Bryman, Alan (1988) Quantity and Quality in Social Research. London: Unwin Hyman.

Clark, Roger (1999) ‘Diversity in Sociology: Problem or Solution?’ American Sociologist 30(3): 22–41.

Clayman, Steven E. and John Heritage (2002) ‘Questioning Presidents: Journalistic Deference and Adversarialness in the Press Conferences of U.S. Presidents Eisenhower and Reagan.’ Journal of Communication 52(4): 749–75.

Coser, Lewis (1975) ‘Presidential address: Two methods in search of a substance.’ American Sociological Review 40(6): 691–700.

Creswell, John W. (2003) Research Design: Qualitative, Quantitative, and Mixed Methods Approaches. 2nd ed. London: Sage.
Denzin, Norman K. and Yvonna S. Lincoln (2000) ‘Introduction: The Discipline and Practice of Qualitative Research.’ Pp. 1–28 in Handbook of Qualitative Research, 2nd ed., edited by Denzin, Norman K. and Yvonna S. Lincoln. Thousand Oaks: Sage.

Glaser, Barney G. and Anselm L. Strauss (1967) The Discovery of Grounded Theory: Strategies for Qualitative Research. Chicago: Aldine Transaction.

Goldthorpe, John H., David Lockwood, Frank Bechhofer and Jennifer Platt (1968) The Affluent Worker: Industrial Attitudes and Behaviour. Cambridge: Cambridge University Press.

Malinowski, Bronislaw (1922) Argonauts of the Western Pacific. London: G. Routledge & Sons.

May, Carl (2005) ‘Methodological Pluralism, British Sociology and the Evidence-based State: A Reply to Payne et al.’ Sociology 39(3): 519–28.

Mills, C. Wright (1959) The Sociological Imagination. New York: Oxford University Press.

OECD (1993) ‘Competition Policy and a Changing Broadcast Industry.’

OECD (1999) ‘Regulation and Competition Issues in Broadcasting in the Light of Convergence.’

Payne, Geoff, Malcolm Williams and Suzanne Chamberlain (2004) ‘Methodological Pluralism in British Sociology.’ Sociology 38(1): 153–63.

Platt, Jennifer (2006) ‘How Distinctive Are Canadian Research Methods?’ Canadian Review of Sociology & Anthropology 43(2): 205–31.

Ragin, Charles C. (1987) The Comparative Method: Moving Beyond Qualitative and Quantitative Strategies. Berkeley: University of California Press.

Ragin, Charles C. (1994) Constructing Social Research: The Unity and Diversity of Method. Thousand Oaks: Pine Forge Press.

Ragin, Charles C. (2000) Fuzzy-Set Social Science. Chicago: University of Chicago Press.

Rose, Nikolas (1996) Inventing Our Selves: Psychology, Power, and Personhood. Cambridge, England; New York: Cambridge University Press.

Räsänen, Pekka, Jani Erola and Juho Härkönen (2005) ‘Teoria ja tutkimus Sosiologia-lehdessä [Theory and research in the Sosiologia journal].’ Sosiologia 42(4): 309–14.

Silverman, David (1985) Qualitative Methodology and Sociology: Describing the Social World. Aldershot: Gower.

Silverman, David (2000) Doing Qualitative Research: A Practical Handbook. London: Sage.

Tashakkori, Abbas and Charles Teddlie (2003) Handbook of Mixed Methods in Social and Behavioral Research. London: Sage.
PART I
Directions in Social Research

What is the state of the art of social research? What are its new directions in terms of methods, credibility, ethical questions, and its relationship to the users of research? As was discussed in the introduction, to understand the current trends better we need to place them in historical and societal context. Social research does not simply follow its own logic of scientific progress but also responds to, and at times influences, social change.

Part I of this book discusses the current state of social research and places it in historical context. The chapters approach the present condition of social research from different angles and complement each other in producing a picture of the field, in which some of the earlier controversies or tensions are left behind and new ones emerge.

It is interesting that methodology, as the means of knowing, has become a forum for furious disputes, generally known as the paradigm wars. More generally, there is a tendency in the field of social science for researchers to define themselves and the other in terms of differentness. As Alan Bryman argues in Chapter 2, while these differences are referred to as paradigms or philosophical positions, in practice they often represent technical decisions about the use of methods, qualitative or quantitative. In a similar vein, Marja Alastalo points out in Chapter 3 that the paradigm wars between qualitative and quantitative methods have contributed to an exaggerated distinction between two camps, when in fact social researchers using quantitative methods have always been innovative and pragmatic in applying different approaches. Because of the focus upon differences between methodologies, we tend to miss the continuing diversity that exists within qualitative and quantitative research. On the other hand, as Bryman (Chapter 2) notes, there is a hierarchy of status given to particular research designs within the quantitative tradition in which experimental methods with their superiority in offering causal explanations are positioned at the top. In contrast, qualitative research is represented by diversity rather than hierarchy. The trend is, however, towards an increase in the explicit use of mixed methods research designs and a growing pragmatism and diversity in the ways in which such researchers view the integration of qualitative and quantitative data.

Why is it, then, that the self-identity of social researchers is caught up in the idea of incommensurable paradigms, which tends to exaggerate differences and downplay diversity and a pragmatic use of methods? One possible explanation is given by Marja Alastalo in Chapter 3, in which she laments the scarcity of empirical research about the history of social research. Instead, method textbooks, for instance, contain histories of methodological development that aim at legitimating the writers’ own approaches.
Such descriptions tend to paint a picture of the field in black and white and ignore details that do not fit nicely into stereotypical representations of the different camps. For instance, in many accounts of the history of social research, the contradiction between case study and statistical methods is presented in terms of differences of tradition in the universities of Chicago and Columbia. Such accounts ignore the fact that the Chicago School, often mentioned as the birthplace of case study, also contributed to quantitative social research, and the Columbia School was a dominant force in the development of qualitative research. All in all it is evident that despite the paradigm wars between case study and statistical research, or qualitative and quantitative approaches, in actual practice many social researchers have always been quite flexible in applying different methods.

Currently methodological pluralism is on the rise, and this development calls for a rethinking of the nature of research, both quantitative and qualitative, and of how it can be assessed. Reflecting the exaggerated contrast drawn between qualitative and quantitative methods, it is often suggested that quantitative research has a clear set of assessment criteria, whereas in the case of qualitative inquiry no agreed validity criteria are available. However, Martyn Hammersley argues in Chapter 4 that the general standards in terms of which both the process and products of research should be judged are the same whichever approach is employed. Hammersley stresses that whether we are talking about quantitative or qualitative inquiry, there cannot be tests that measure validity; there is no substitute for judgement.

In addition to aiming at true findings or conclusions in their inquiries, social researchers also need to think about the questions they pose in their research. From which perspectives are they relevant, and whose interests does the knowledge produced serve? In light of Michel Foucault’s (Foucault 1977, 1980a, 1980b) point about the power-knowledge couplet, it is evident that no neutral observer position exists. Instead, forms of knowledge imply and produce forms and relations of power. However, this does not mean that researchers can select a standpoint and an audience of their own choice and only produce knowledge that serves interests of which they approve. First, as Karen Armstrong (Chapter 5) remarks, researchers are dependent on research funding; this affects the topics they study and often reflects the influence of dominant interests in society. Second, the audiences of ethnography with which Armstrong deals are increasingly global. The text may be written from the perspective of a Western academic (‘we’) but, as Armstrong points out, the audience may be any number of people with an interest in the place, the topic, or for other reasons. Ethnographers, and other social researchers, are faced, therefore, with the situation in which data are collected from a variety of people who themselves have a variety of interests, while a variety of readers bring their own interests to understanding the text. Thus the work produced will be read for its relevance by readers who assign meaning to it according to their own evaluations.

The observation that social research has an increasingly diverse audience and serves the interests of a diversity of social groups, as reflected in the trend towards participatory methods, is part of the general picture of the changes taking place in the role of social inquiry in advanced capitalist societies. These changes are outlined from different perspectives by Marja Alastalo (Chapter 3), Pekka Sulkunen (Chapter 6) and Ann Nilsen (Chapter 7). As Pekka Sulkunen discusses, there has been a major trend over the last three decades from Mode 1 ‘pure’ science to Mode 2 knowledge production, in which the latter relies on pragmatic criteria of evaluation and is trans-disciplinary (Gibbons et al. 1994).

This change in the role of social science knowledge in society is part of the regime change from Keynesian liberalism to neoliberalism, in which there has been a move from ‘resource steering’ to ‘market steering’ within public administration and in the privatization of many public services. The change has
affected social research in several ways. On the one hand, structural functionalism and other holistic theories of society, which served the interests of Keynesian-planned economy, have been challenged by constructionist approaches, which direct attention to questions of subjectivity and identity. Because the regulation of human beings is increasingly based on one’s own ability to foresee and manage ‘choices’, there is demand for expertise in subjectivity (Rose 1996: 151). Consequently, qualitative research has gained momentum from the 1970s onwards.

On the other hand, the requirement that public policies and practices are grounded in evidence-based, scientifically validated research has also gained momentum since the early 1990s (Dixon-Woods et al. 2006: 27). That is one reason why there is increased demand for quantitative research skills. Under these conditions it is predictable that along with the attitude of methodological pluralism there continues to be tension between realist and constructionist approaches, as discussed by Ann Nilsen in Chapter 7.

Although the role of social research in society is changing, its importance is not decreasing. As Celia B. Fisher and Andrea E. Anushko (Chapter 8) argue, increased public recognition of the value of social research has been accompanied by a heightened sensitivity to the obligation to conduct social science responsibly. Ensuring ethical competence in social research is a difficult task for social researchers and for institutional review boards. Social scientists are additionally challenged because of the historical biomedical bias in the way in which ethical questions are perceived and handled. More generally they are challenged by increased open access to information (Freedom of Information laws) and increased legal protection of informants.

REFERENCES

Dixon-Woods, Mary, Sheila Bonas, Andrew Booth, et al. (2006) ‘How Can Systematic Reviews Incorporate Qualitative Research? A Critical Perspective.’ Qualitative Research 6(1): 27–44.

Foucault, Michel (1977) Discipline and Punish: The Birth of the Prison. London: Penguin Books.

Foucault, Michel (1980a) The History of Sexuality/Vol. 1. An Introduction. New York: Vintage Books.

Foucault, Michel (1980b) Power/Knowledge: Selected Interviews and Other Writings, 1972–1977. Brighton, Sussex: Harvester Press.

Gibbons, Michael, Camille Limoges, Helga Nowotny, et al. (1994) The New Production of Knowledge: The Dynamics of Science and Research in Contemporary Societies. London: Sage.

Rose, Nikolas (1996) Inventing Our Selves: Psychology, Power, and Personhood. Cambridge, England; New York: Cambridge University Press.
2
The End of the Paradigm Wars?
Alan Bryman
qualitative research possibly exaggerate the differences between them.

It is striking that this contrast is drawn up in predominantly philosophical terms. The presence or absence of quantification, as symbolized by the terms quantitative and qualitative research, is not the issue that is the focus of conflict between the warring parties; rather, quantification and its absence act as ciphers for the underlying philosophical issues. Had the issue that divides the parties simply been a technical matter of the desirability or otherwise of quantification, it is likely (or at least possible) that the differences between the proponents of quantitative and qualitative research would not have been as intractable as they have been. It is the fact that debate about quantitative and qualitative research is to do with such fundamental philosophical matters as how humans and their society should be studied and the very nature of ‘the social’ that has contributed towards making the paradigm wars so resistant to mediation, although the parties sometimes alternate between philosophical and technical discourses (Bryman, 1984, 1988). Quite why philosophical issues became entwined with matters of research practice to this degree is unclear. One factor may be that drawing on philosophical ideas provided an intellectual rationale and legitimacy to qualitative research as it emerged from the shadows of quantitative research in the 1970s. Indeed, our understanding of quantitative research and its philosophical bases and biases is largely founded on the account of it provided by qualitative researchers since that time (Brannen, 2006). Quantitative researchers tend to be less reflective than qualitative researchers concerning the fundamental nature of their approach.

THE ISSUE OF INCOMPATIBILITY

The association of the two approaches with the idea of paradigms represented an implicit reference to the influential work of the American historian of science Thomas Kuhn (1970). Kuhn memorably argued that a science proceeds through successive scientific revolutions whereby one paradigm of scientific understanding is replaced by another. A paradigm, then, represents a cluster of beliefs about the proper conduct of science. One further important element in Kuhn’s argument was that paradigms within a field are incompatible. Their fundamental beliefs cannot be reconciled. There is no common ground between paradigms in terms of their underlying tenets.

One of the over-riding implications of construing quantitative and qualitative research as paradigms in Kuhn’s sense, and therefore as incompatible approaches, was that this implied to many commentators that it was not appropriate to combine them in an investigation. In other words, it denied the legitimacy of conducting a research project in a manner that combined, say, a survey with unstructured interviewing or with any other research method associated with qualitative research. While the term ‘paradigm wars’ may seem a rather dramatic (some might say overly dramatic) way of characterizing the debates that were going on about methodological issues, it does give a sense of the intensity of these debates.

Whether it is justifiable to treat quantitative and qualitative research as paradigms is a separate issue. It is probably the case that it is quite inappropriate to designate them as paradigms because neither of them can be viewed as indicative of the normal science of a discipline, which is how Kuhn employed the term, although it has to be recognized that his use of the term was somewhat slippery. Quantitative and qualitative research are probably closer to being ‘pre-paradigms’. As Kuhn noted: ‘it remains an open question what parts of social science have yet acquired … paradigms at all’ (1970: 15). However, the language of scientific paradigms is deeply ingrained in many discussions of social research methods and even when the term is not used, there is a sense that the ‘paradigmatic mentality’ (Hammersley, 1984) lies behind those discussions. Moreover, the notion of incommensurability is deeply ingrained so that any recourse to the language
classics such as these, can it make sense to date the paradigm wars from the 1970s and to associate the hostilities with the rise of qualitative research? The answer resides in large part in the rise of quantitative research as the dominant approach to the collection and analysis of data in the years after the Second World War. While this research strategy was especially dominant in North America, it held sway in many other countries as well, such as the UK. Qualitative research continued to enjoy support and to be practised but it was often regarded as unscientific and as merely occupying a preparatory role for the conduct of quantitative social research.

We can see such a perception if we briefly examine the chapter headings of Methods in Social Research, a key text published in 1952 by William Goode and Paul Hatt. This book was significant for two reasons. First, it was written by two leading figures in the field. Both authors were distinguished American social researchers who also had made significant contributions to social research methodology and to substantive areas. Second, the broad structure formed a kind of template that many other research methods texts would follow over the succeeding years.

Three things are striking about this chapter layout. First, virtually the first third of the book in terms of the number of chapters concerns issues to do with the scientific method. Not only are there references to science and scientific method but we also see key terms often associated with the approach – references to facts, hypotheses, proof, and testing. These activities were seen as the very stuff of scientific method at the time. Second, most of the following chapters are based on the discussion of methods that are associated with the implementation of the scientific method in social research – questionnaires, interviews, probability ideas and sampling, and scaling. Third, there is just one chapter – Chapter 19 – that includes a discussion of methods that stand outside the mainstream methods with their scientific connotations. This chapter covers the discussion of qualitative research and the examination of single cases. However, it is telling that unlike other chapters presenting specific methods, this one is about problems in qualitative and case analysis. In other words, the chapter is not just an exposition of these methods but a critique of them as well. Even the chapter on observation (Chapter 10) was not concerned with observation of the participant observation kind but that associated with structured observation – a quantitative approach to observation. This brief examination of a key text provides a small insight into the marginal status of qualitative research in the past.

An interesting insight into this neglect of qualitative research during these years is provided by Savage’s (2005) examination of the Affluent Worker studies conducted in Luton in England in the 1960s (Goldthorpe et al., 1966). In various reports of their findings, the Affluent Worker researchers emphasized findings that could be expressed in statistical terms. These were findings that reflected a high level of consistency between coders. As a result, the authors tended to ignore:

the more qualitative features of the interview and concentrating on those aspects of the respondent’s testimony which could be quantified … In the process, a huge amount of evocative material was left ‘on the cutting room floor’. Having gathered rich qualitative material, the researchers then effectively stripped out such materials in favour of more formal analytical strategies when they came to write up their findings. (Savage, 2005: 932)

Savage observes that his re-analysis of the qualitative data did not lead him to cast doubt on the broad conclusions Goldthorpe et al. proffered, such as their significant findings concerning the prevalence of instrumentalism among a broad swathe of the work force. However, there is evidence from the transcripts and the field notes that both the respondents and their interviewers thought in different ways about class from the researchers, especially David Lockwood, who was a member of the team and a prominent theorist of social stratification in the 1960s. It is plausible that had the researchers not been so clearly locked into a quantitative research approach, they might have taken the qualitative nuances in their data
more seriously. The general point is that Savage’s exercise sheds light on the relatively low esteem in which qualitative research was held at the time.

It is difficult and probably impossible to chart the point that qualitative research came out of the shadows and closer to the mainstream, although it is questionable how far it has entered the mainstream in North America. From 1970 onwards, there is evidence of a growing number of books (Filstead, 1970; Schwartz and Jacobs, 1979). Journals with a qualitative research emphasis began to appear: Qualitative Sociology was started in 1978 and Urban Life and Culture (later named Urban Life and then Journal of Contemporary Ethnography) began life in 1972. The reasons probably had a lot to do with a certain amount of disillusionment in some quarters regarding the utility of quantitative research and its outcomes. Critiques of the quantitative research orthodoxy like those written by authors like Cicourel (1964) and Phillips (1971, 1973) probably played a significant role in the rise of qualitative research, although qualitative research itself was not immune to their critical gaze. Further, as previously suggested, the growing awareness of theoretical ideas and philosophical positions that offered an alternative viewpoint to the positivist position that was seen as the motor behind quantitative research probably played a significant role and almost certainly accounts for the way in which quantitative and qualitative research became entangled with philosophical issues. Along with a growing awareness of theoretical ideas and philosophical positions that offered an alternative to positivism, it served to legitimate the use of qualitative methods in the face of the hegemony of quantitative research.

Thus, although there is evidence of earlier generations of researchers combining quantitative and qualitative research, the emergence of the paradigm wars was a product of the way in which philosophical issues became attached to research methods and the domination of social research by quantitative research.

There is little doubt, as previously noted, that there has been an increase in interest in and use of mixed methods research. I conducted a content analysis of articles using a mixed methods approach covering the period 1994–2003. This research is described in Bryman (2006a) but one unreported finding relevant to the present discussion is that if we compare the number of articles which combined quantitative and qualitative research in 2003 with the number in 1994, there was a threefold increase. However, it would be wrong to depict the paradigm wars as having totally come to an end. The growth of mixed methods research may give the impression that there has been an abatement in the hostilities but that is not the case.

THE CONTINUED EXISTENCE OF PARADIGM DISPUTES

In the rest of this chapter, I will draw attention to three areas which suggest that there are lingering signs of paradigm hostilities. In other words, although mixed methods research represents a sign that one of the main cleavages in the paradigm wars has been bridged, this is not to say that paradigm disputes have been totally resolved. First, it is important to appreciate that there are fundamental differences within both quantitative and qualitative research. Insofar as quantitative and qualitative research might be described as paradigms, these represent what could be termed ‘intra-paradigmatic differences’. Second, there are some fairly fundamental differences among social and other researchers concerning how mixed methods research should be viewed. Third, there are signs in fields that are very adjacent to social research that the dust has not settled on the paradigm wars and that in fact there are occasional paradigm skirmishes. Each of these three areas will form the basis for the remainder of this chapter.

Intra-paradigmatic differences

Quantitative research is sometimes viewed as though it is a monolithic, undifferentiated approach that is completely imbued
with positivism. However, there is a growing recognition of a post-positivist position that, while it shares many of positivism’s basic tenets, differs in certain respects. Post-positivism differs in its more accommodating stance towards qualitative data, which are given short shrift in traditional positivist conceptions other than in a very limited role. It typically shares with positivism the view that there is a reality that is independent of and external to the researcher but tends to recognize that reality can only be understood in a limited way because that understanding derives from the researcher’s conceptual tools. As such, post-positivism accommodates many of the critiques of the positivist view of science by recognizing that there cannot be theory-neutral observation (Wacquant, 2003).

Further, there are fundamental differences in some areas of social research, such as social psychology, between those who prioritize experiments and those who include non-experimental research methods, such as the sample survey, within their purview. For the former, it is not possible in non-experimental research unambiguously to attribute causality to relationships between variables, whereas the second group accepts that causal impacts can be gleaned through statistical controls. As an example of the former position, an experimentalist writes:

For strict experimentalists, factors that differentiate participants (e.g., sex, gender, religion, IQ, personality factors), and other factors not under the control of the researcher (e.g., homicide rates in Los Angeles), are not considered independent and thus are not interpreted causally. However, in some research traditions, variables under experimental control sometimes are suggested as causes. … Owing to the possibility of … third-variable causes, causal inferences based on correlational studies are best offered tentatively. (Crano, 2004: 484)

It is precisely for this reason that a hierarchy of research methods is sometimes presented which implies that evidence from experimental studies is or should be at the top after systematic reviews of experiments (Becker and Bryman, 2004: 57). Arguably, the ‘research traditions’ (to use Crano’s term) associated with experimentalists and non-experimentalists do not warrant the appellation ‘paradigms’. On the other hand, they do reflect a fundamental difference in the degree to which a strict positivist position should be followed and what value can and cannot be placed on non-experimental investigations. Such considerations also elide with disciplinary contexts, in that a view like Crano’s is more likely to be associated with a discipline like psychology which has a strong inclination towards experiments.

However, there are even more intra-paradigmatic differences within qualitative than within quantitative research. A glance at the latest edition of the Handbook of Qualitative Research (Denzin and Lincoln, 2005b) displays an extraordinary and apparently growing diversity of approaches within the qualitative research community. At one point in the volume, Denzin and Lincoln (2005a: 24) outline a table that presents this diversity. They delineate several paradigms (their term) that share three features – relativist ontologies, interpretivism at the epistemological level, and interpretive and naturalistic methods. They then outline several paradigms that share these three criteria but differ in other fundamental ways, including constructivism, feminism, ethnic, Marxist, cultural studies and queer theory.

Other writers have drawn attention to additional basic differences among qualitative researchers. Charmaz (2000, 2005) discusses a basic difference between objectivist and constructivist stances within expositions of and studies using grounded theory. Whereas the former is founded on the assumption that there is an ‘external world that can be described, analyzed, explained, and predicted’ (2000: 524), a constructivist grounded theory ‘recognizes that the viewer creates the data and ensuing analysis through interaction with the viewed’ (2000: 523). A further fundamental difference between forms of or approaches to qualitative research centres on the approach to the use of language. Much qualitative research treats language as a mechanism for understanding the social world, so that interviewees’ replies are treated as a means
of understanding the topics about which they are asked questions. For researchers working within traditions like conversation analysis and discourse analysis, language is a topic in its own right. It is viewed as constitutive of social reality and is a form of action in its own right, not simply a window on action. Given these different stances on the role of language in social research, it is not too fanciful to suggest that they represent paradigmatic differences in the ways in which social reality should be apprehended. For example, the conversation analyst’s disinclination to take context, as identified by researchers, into account in examinations of talk is in stark contrast to the significance of context for many qualitative researchers (Schegloff, 1997). For example, Morse (2001) talks about evidence of a degree of ‘paradigm asynchronicity’ when referring to the rise of a debate within qualitative research implying that approaches like grounded theory and narrative analysis are less rigorous than conversation analysis.

Differences in positions on mixed methods research

Mixed methods research has attracted a variety of positions on its prospects and on what it can and cannot achieve. Some writers have been extremely resistant to the idea that quantitative and qualitative research might be combined. Smith and Heshusius (1986) have provided one of the strongest and clearest statements of such resistance. These authors argue that treating quantitative and qualitative research as compatible and therefore as combinable neglects the fact that they are based on fundamentally different and irreconcilable foundations. Theirs is an example of what I have referred to as the ‘paradigm argument’, which stresses the differences between quantitative and qualitative research in terms of foundational assumptions about the nature of knowledge rather than in terms of technique (Bryman, 2004).

The paradigm argument rests upon another argument which is often employed in such discussions. This is the ‘embedded methods’ argument which depicts research methods as associated with a set of epistemological assumptions. A research method is thus a cipher for underlying philosophical ideas. Smith and Heshusius write:

This disregard of assumptions and preoccupation with techniques have had the effect of transforming qualitative inquiry into a procedural variation of quantitative inquiry. … That certain individual procedures can be mixed does not mean that there are no differences of consequence. (1986: 8, 9)

This is in reality a re-statement of the bases on which the paradigm wars were waged. It depicts two irreconcilable sides, so that no fraternizing with the enemy is legitimate.

In recent years, this position on mixed methods research has become less frequently voiced and in its place an attitude of pragmatism has permeated the field. Initially, this sense of a pragmatist position was most often in evidence in the more applied fields in the social sciences, such as evaluation research. Indeed, practitioners from such fields have been especially prominent advocates of and writers on mixed methods research (e.g. Greene et al., 1989). Essentially, the pragmatist position either ignores paradigmatic differences between quantitative and qualitative research or recognizes their existence but in the interests of exploring research questions with as many available tools as possible, it shoves them to the side. For example, Maxcy (2003: 79) argues that pragmatism ‘seems to have emerged as both a method of inquiry and a device for the settling of battles between research purists and more practical-minded scientists’. The point about pragmatism is that in place of an emphasis on philosophical issues and debates that were a feature of the paradigm wars and which were the province of the ‘research purists’ to which Maxcy refers, issues to do with the mixing of methods become matters of technical decisions about the appropriateness of those methods for answering research questions. Issues to do with the appropriateness of research methods for answering research questions or ensuring continuing funding in the modern competitive
academic environment became the criteria for judging the desirability or otherwise of mixing methods, rather than philosophical principles.

In 2003, I interviewed 20 UK social scientists who were known to be mixed methods research practitioners. The details of this research can be found in Bryman (2006b). The pragmatist stance was very much in evidence among these researchers. In the words of one of my interviewees: ‘So we’ve taken that pragmatic decision to do it that way because that’ll generate something that either method, standing alone, is not gonna give us’ (quoted in Bryman, 2006b: 117). Another referred to the fact that he/she was located in an entrepreneurial research centre where ‘there’s always been so much more of a pragmatic approach to doing things’ (quoted in Bryman, 2006b: 117). On other occasions, it was striking that although the term ‘pragmatism’ was not employed, it could be clearly discerned in interviewees’ replies. One interviewee replied that the crucial issue was:

attempting to better understand what it is you’re trying to understand, and in that way, you then have to ask how appropriate are the sorts of methods I’m using and are they going to give me the information to understand what it is I’m researching? (Quoted in Bryman, 2006b: 117)

Further evidence of the sidelining of philosophical issues among many mixed methods researchers is that the previously mentioned content analysis revealed that only 6 percent of the 232 articles examined referred to epistemological or ontological issues or to paradigm conflicts in the combined use of quantitative and qualitative research (Bryman, 2006a). The coding of this dimension required only a mention of these issues; it was not concerned with the way in which the issue was couched. Thus, the coding was neutral about whether paradigm issues were depicted in articles as impeding or irrelevant to the mixing of quantitative and qualitative research. This finding provides further suggestion that mixed methods researchers adopt a pragmatic view of the research process that prioritizes finding out whatever is needed to address the researcher’s objectives.

As such, there would seem to be two distinct stances on mixed methods research: one which emphasizes paradigm differences between quantitative and qualitative research and which stresses their incompatibility, and another which emphasizes a pragmatist position of depicting research as using whichever research methods are most appropriate regardless of the supposed epistemological location. These might usefully be labelled the paradigmatic and pragmatic stances on the prospects of doing mixed methods research, although these do not exhaust the range of possibilities (Greene and Caracelli, 1997).

The growth of mixed methods research has to a significant extent occurred because the pragmatic stance became ascendant in the years after Smith and Heshusius articulated their views, although it is important to appreciate that similar views continued to be expressed (e.g. Buchanan, 1992). However, the very surge of interest in doing mixed methods research has been accompanied by assessments of its prospects and potential. One of the themes that can be discerned among these appraisals is some recourse to paradigmatic arguments. Three examples can be used to illustrate this point. Sale et al. (2002) write that because they represent different paradigms with contrasting epistemological positions, quantitative and qualitative research involve the study of different phenomena and therefore cannot be compared. This means that they cannot be used for exercises like triangulation of findings, but can be employed to study complementary issues. This argument does not represent an outright rejection of mixed methods research at all, but it does imply that there are limits to its use. A second example is Giddings’ (2006) suggestion that mixed methods research ‘is positivism dressed in drag’. As she puts it: ‘mixed methods dwells within positivism; the ‘thinking’ of positivism continues in the ‘thinking’ of mixed methods. … [It] rarely reflects a constructionist or subjectivist view of the world’ (2006: 200). The point here is very consistent with Smith and Heshusius’s
concerns in that Giddings is arguing that in the service of mixing methods, qualitative research becomes what they called in the quotation above a ‘procedural variation’ of quantitative research. The concern here seems to be that by colonizing qualitative research, mixed methods research may marginalize philosophical traditions that have come to the fore in recent years and which have drawn significantly on qualitative methods (e.g. critical approaches, interpretivism). A similar kind of concern has been expressed by Howe (2004) who argues that in mixed methods research, qualitative methods have become adjuncts to quantitative ones. He suggests that such research is founded on the same epistemological principles as quantitative research and argues for mixed methods research that draws explicitly on interpretivism. We see here a clear example of a paradigmatic stance on mixed methods research.

The point of this brief discussion of these views that are critical of the use of mixed methods research is that they imply that paradigmatic views of the approach have not gone into abeyance and indeed may be involved in something of a renaissance in response to its growing prominence. What we see here as well is a suggestion that the paradigm wars are not over or that clashes continue even when a truce has been declared.

Paradigm wars in applied fields

It is very striking that, as previously noted, applied fields like evaluation research and nursing research have been very receptive to mixed methods research, as can be seen when the contents of the Handbook of Mixed Methods in Social and Behavioral Research (Tashakkori and Teddlie, 2003) are examined. However, at the same time, some applied fields continue to provide something of a battleground in which clashes akin to the paradigm wars can be encountered.

One of the most prominent forms of what I am suggesting here is the rise of systematic review in areas that overlap with social research, such as health research, education, social policy research, and organization studies. In these fields, systematic review is sometimes promoted as a yardstick for conducting literature reviews and, as previously noted, is often regarded as occupying the top spot in hierarchies of evidence in fields like social policy research (Becker and Bryman, 2004). It has emerged out of medical research, where it has been used to inform evidence-based medical decision-making. In this field, meta-analyses of trials and other kinds of investigation have become gold standards on which important decisions rest. Systematic review draws on and incorporates many of the insights and procedures with which meta-analysis is associated. Indeed, it is to all intents and purposes a form of systematic review.

Systematic review has been defined as: ‘a replicable, scientific and transparent process, in other words a detailed technology, that aims to minimize bias through exhaustive literature searches of published and unpublished studies and by providing an audit trail of the reviewer[’]s decisions, procedures and conclusions’ (Tranfield et al., 2003: 209). Systematic review begins with an explicit statement of the purpose of the review and specifies the criteria by which studies are to be included in the review. The issue of criteria operates on at least two levels. One is that the criteria should specify such things as the limits in terms of geography and time. The other is that the reviewer should specify quality criteria, that is, that only research that meets the pre-set criteria should be included in the review. This has become one of the most contentious areas of systematic review because it has sometimes been viewed as discriminating against the inclusion of qualitative studies within its purview, because they cannot meet the criteria that are specified which presume that the studies derive from quantitative research. Further, qualitative research, until fairly recently, has been viewed as less obviously capable of synthesis than quantitative research. These features have resulted in considerable interest since the late 1990s in the development of quality criteria for qualitative studies to inform their inclusion or exclusion from systematic reviews and of approaches to aggregating
qualitative studies. The issue of synthesizing qualitative studies has been explored in terms of both aggregating qualitative studies with quantitative ones and aggregating qualitative studies in domains where most of the literature draws on qualitative evidence.

Two things are relevant to the discussion of the supposed termination of the paradigm wars. One is that the systematic review approach is very much predicated upon principles that can be traced to a quantitative research stance and its association with positivism. These principles include an emphasis on: transparency, replicability, and the application of apparently neutral procedures. These principles can then be deployed against conventional reviews to suggest that they are lacking in rigour and are biased. For example, Tranfield et al. write: ‘applying specific principles of systematic review methodology used in the medical sciences to management research will help in counteracting bias by making explicit the values and assumptions underpinning a review’ (2003: 208). There is a glimpse in these discussions of the remnants of paradigm war issues or at least the potential for them. For example, Hammersley has argued that systematic review ‘assumes the superiority of what … can be referred to as the positivist model of research’ (2001: 544). Much like in qualitative research, the reviewer is almost seen as a contaminant whose biases and predilections have to be minimized. Hammersley also observes that evidence is not typically presented to suggest that systematic reviews are superior to non-systematic (increasingly called ‘narrative’) reviews. Instead, narrative reviews are condemned by innuendo – they are not systematic, they do not use explicit procedures, etc. Hammersley (2001) also argues that it is not easy to see how qualitative studies fit with a systematic review approach. In fact, one of the most notable aspects of the discussion of systematic reviews in the social sciences since he wrote this article is the growing discussion of ways of making qualitative research amenable to systematic review. As previously noted, this includes developing quality criteria specifically for qualitative studies and mechanisms for synthesizing such studies. However, at the time of writing there has been no agreement about either of these areas. Instead, there has been a proliferation of attempts to specify quality criteria for qualitative research, both within and beyond the context of systematic review (Bryman, 2006b; Dixon-Woods et al., 2004; Spencer et al., 2003). Also, several approaches to synthesis have been promoted but there is little consensus about which to use or when (Sparkes, 2001). The approaches include: meta-ethnography; content analysis; and critical interpretive synthesis (Dixon-Woods et al., 2006; Mays et al., 2005). In itself, the lack of agreement concerning how qualitative studies can best be incorporated into systematic reviews is not a problem. However, it does make it difficult for qualitative researchers to acquire legitimacy beyond the qualitative research community for their literature reviews. This is not unlike the situation that pertained in the early years of the paradigm wars when, from the point of view of many qualitative researchers, quantitative researchers were perceived as defining what constituted an appropriate approach to the research process.

What is not clear is how far the predilection for systematic reviews will diffuse beyond the applied fields where it has been especially promoted. Systematic review works best when research questions are of the ‘what works?’ kind but in less applied fields this kind of research question is uncommon or unlikely. The main point that is being registered at this juncture is that the creation of a contrast between systematic and narrative reviews, along with the problems of incorporating qualitative studies into the former, reveals vestiges of issues that were long associated with the paradigm wars.

A further example of a resurgence of paradigm hostilities can be found in educational research. In this field, there has been a recognition in both the USA and the UK that there have been attempts to restrict the acceptability of empirical research to just studies that conform to what is taken to be scientific research. Feuer et al. (2002) note that in the context of educational research
world and the diffusion of constructivist ideas has resulted in a greater tolerance of such paradigm diversity.

ACKNOWLEDGEMENTS

I wish to thank Martyn Hammersley for discussions of some of these issues as well as for his comments on this chapter. His ideas greatly helped to sharpen my thoughts on many of these topics, although I alone am responsible for the deficiencies in this chapter. I also wish to thank the Economic and Social Research Council for funding the research project ‘Integrating quantitative and qualitative research: prospects and limits’ (Award number H333250003) which made possible the research on which parts of this chapter are based.

REFERENCES

Becker, S. and Bryman, A. 2004. Understanding Research for Social Policy and Practice. Bristol: Policy Press.
Bloland, H.G. 2005. ‘Whatever happened to postmodernism in higher education?’ Journal of Higher Education 76: 121–150.
Brannen, J. 2006. ‘Mixed Methods Research: A Discussion Paper’ NCRM Methods Review Papers: ESRC National Centre for Research Methods.
Bryman, A. 1984. ‘The debate about quantitative and qualitative research: a question of method or epistemology?’ British Journal of Sociology 35: 75–92.
Bryman, A. 1988. Quantity and Quality in Social Research. London: Unwin Hyman.
Bryman, A. 2004. Social Research Methods. Oxford: Oxford University Press.
Bryman, A. 2006a. ‘Integrating quantitative and qualitative research: how is it done?’ Qualitative Research 6: 97–113.
Bryman, A. 2006b. ‘Paradigm peace and the implications for quality’. International Journal of Social Research Methodology 9: 111–126.
Buchanan, D.R. 1992. ‘An uneasy alliance: combining qualitative and quantitative research’. Health Education Quarterly 19: 117–135.
Burrell, G. and Morgan, G. 1979. Sociological Paradigms and Organisational Analysis. London: Heinemann.
Charmaz, K. 2000. ‘Constructivist and objectivist grounded theory’ in Denzin, N.K. and Lincoln, Y.S. (eds.) The Sage Handbook of Qualitative Research. Thousand Oaks, CA: Sage.
Charmaz, K. 2005. ‘Grounded theory in the 21st century’ in Denzin, N.K. and Lincoln, Y.S. (eds.) The Sage Handbook of Qualitative Research. Thousand Oaks, CA: Sage.
Cicourel, A.V. 1964. Method and Measurement in Sociology. New York: Free Press.
Crano, W.D. 2004. ‘Independent variable in experimental research’ in Lewis-Beck, M.S., Bryman, A. and Liao, T.F. (eds.) The Sage Encyclopedia of Social Science Research Methods (Vols. 1–3). Thousand Oaks, CA: Sage, pp. 483–4.
Creswell, J.W. 2003. Research Design: Qualitative, Quantitative, and Mixed Methods Approaches. Thousand Oaks, CA: Sage.
Deetz, S. 1996. ‘Describing differences in approaches to organizational science: rethinking Burrell and Morgan and their legacy’. Organization Science 7: 191–207.
Denzin, N.K. and Lincoln, Y.S. 2005a. ‘Introduction: the discipline and practice of qualitative research’ in Denzin, N.K. and Lincoln, Y.S. (eds.) The Sage Handbook of Qualitative Research. Thousand Oaks, CA: Sage.
Denzin, N.K. and Lincoln, Y.S. 2005b. The Sage Handbook of Qualitative Research. Thousand Oaks, CA: Sage.
Dixon-Woods, M., Cavers, D., Agarwal, S., Annandale, E., Arthur, A., Harvey, J., Hsu, R., Katbamna, S., Olsen, R., Smith, L.K. and Sutton, A.J. 2006. ‘Conducting a critical interpretive synthesis of the literature on access to healthcare by vulnerable groups’. BMC Medical Research Methodology 6: 35.
Dixon-Woods, M., Shaw, R.L., Agarwal, S. and Smith, J.A. 2004. ‘The problem of appraising qualitative research’. Quality and Safety in Health and Social Care 13: 223–225.
Feuer, M.J., Towne, L. and Shavelson, R.J. 2002. ‘Scientific culture and educational research’. Educational Researcher 31: 4–14.
Filstead, W.J. 1970. Qualitative Methodology: Firsthand Involvement with the Social World. Chicago: Markham.
Fine, G.A. and Elsbach, K.D. 2000. ‘Ethnography and experiment in social psychological theory building: tactics for integrating qualitative field data with quantitative lab data’. Journal of Experimental Social Psychology 36: 51–76.
Gage, N. 1989. ‘The paradigm wars and their aftermath: a ‘historical’ sketch of research on teaching since 1989’. Educational Researcher 18: 4–10.
Giddings, L.S. 2006. ‘Mixed-methods research: positivism dressed in drag?’ Journal of Research in Nursing 11: 195–203.
Goldthorpe, J.H., Lockwood, D., Bechhofer, F. and Platt, J. 1966. The Affluent Worker: Industrial Attitudes and Behaviour. Cambridge: Cambridge University Press.
Goode, W.J. and Hatt, P.K. 1952. Methods in Social Research. New York: McGraw-Hill.
Greene, J.C. and Caracelli, V.J. 1997. ‘Defining and describing the paradigm issue in mixed-method evaluation’ in Greene, J.C. and Caracelli, V.J. (eds.) Advances in Mixed-Method Evaluation: The Challenges and Benefits of Integrating Diverse Paradigms. San Francisco: Jossey-Bass.
Greene, J.C., Caracelli, V.J. and Graham, W.F. 1989. ‘Toward a conceptual framework for mixed-method evaluation designs’. Educational Evaluation and Policy Analysis 11: 255–274.
Hammersley, M. 1984. ‘The paradigmatic mentality: a diagnosis’ in Barton, L. and Walker, S. (eds.) Social Crisis and Educational Research. London: Croom Helm.
Hammersley, M. 1992. ‘The paradigm wars: reports from the front’. British Journal of Sociology of Education 13: 131–143.
Hammersley, M. 2001. ‘On ‘systematic’ reviews of research literatures: a ‘narrative’ response to Evans & Benefield’. British Educational Research Journal 27: 543–554.
Hammersley, M. 2005. ‘Countering the ‘new orthodoxy’ in educational research: a response to Phil Hodkinson’. British Educational Research Journal 31: 139–155.
Hodkinson, P. 2004. ‘Research as a form of work: expertise, community and methodological objectivity’. British Educational Research Journal 30: 9–26.
Howe, K.R. 2004. ‘A critique of experimentalism’. Qualitative Inquiry 10: 42–61.
Jahoda, M., Lazarsfeld, P.F. and Zeisel, H. 1972. Marienthal: the Sociography of an Unemployed Community. London: Tavistock.
Kuhn, T.S. 1970. The Structure of Scientific Revolutions. Chicago: University of Chicago Press.
Lather, P. 2004. ‘Scientific research in education: a critical perspective’. British Educational Research Journal 30: 759–772.
Maxcy, S.J. 2003. ‘Pragmatic threads in mixed methods research in the social sciences: the search for multiple modes of enquiry and the end of the philosophy of formalism’ in Tashakkori, A. and Teddlie, C. (eds.) Handbook of Mixed Methods in Social and Behavioral Research. Thousand Oaks, CA: Sage.
Mays, N., Pope, C. and Popay, J. 2005. ‘Systematically reviewing qualitative and quantitative evidence to inform management and policy-making in the health field’. Journal of Health Services Research and Policy 10: S6–S20.
Morrow, R.A. and Brown, D.D. 1994. Critical Theory and Methodology. Thousand Oaks, CA: Sage.
Morse, J.M. 2001. ‘A storm in an academic teacup’. Qualitative Health Research 11: 587–588.
Oakley, A. 1999. ‘Paradigm wars: some thoughts on a personal and public trajectory’. International Journal of Social Research Methodology 2: 247–254.
Phillips, D.L. 1971. Knowledge from What? Theories and Methods in Social Research. Chicago: Rand McNally.
Phillips, D.L. 1973. Abandoning Method. San Francisco: Jossey-Bass.
Ryan, K.E. and Hood, L.K. 2004. ‘Guarding the castle and opening the gates’. Qualitative Inquiry 10: 79–95.
Sale, J.E.M., Lohfeld, L.H. and Brazil, K. 2002. ‘Revisiting the quantitative-qualitative debate: implications for mixed-methods research’. Quality and Quantity 36: 43–53.
Savage, M. 2005. ‘Working-Class identities in the 1960s: revisiting the Affluent Worker study’. Sociology 39: 929–946.
Schegloff, E.A. 1997. ‘Whose text? Whose context?’ Discourse and Society 8: 165–187.
Schwartz, H.D. and Jacobs, J. 1979. Qualitative Sociology: A Method to the Madness. New York: Free Press.
Smith, J.K. and Heshusius, L. 1986. ‘Closing down the conversation: the end of the quantitative-qualitative debate among educational researchers’. Educational Researcher 15: 4–12.
Sparkes, A. 2001. ‘Myth 94: qualitative health researchers will agree about validity’. Qualitative Health Research 11: 538–552.
Spencer, L., Ritchie, J., Lewis, J. and Dillon, L. 2003. Quality in Qualitative Evaluation: A Framework for Assessing Research Evidence. London: Government Chief Social Researcher’s Office.
Tashakkori, A. and Teddlie, C. 2003. Handbook of Mixed Methods in Social and Behavioral Research. Thousand Oaks, CA: Sage.
Tooley, J. and Darby, D. 1998. Educational Research: A Critique. London: Ofsted.
Tranfield, D., Denyer, D. and Smart, P. 2003. ‘Towards a methodology for developing evidence-informed management knowledge by systematic review’. British Journal of Management 14: 207–222.
Wacquant, L.J.D. 2003. ‘Positivism’ in Outhwaite, W. (ed.) The Blackwell Dictionary of Modern Social Thought. Oxford: Blackwell.
3
The History of Social
Research Methods
Marja Alastalo
Not only theories but also methods change in the course of history, and these changes have had consequences for what is known about societies. However, less attention is paid to the history and formation of research methods than to the history of theoretical ideas and the thinking of key scholars (Platt, 1996: 1). There has also been a related tendency to discuss methods and methodological issues on a rather abstract and philosophical level, instead of studying what has actually been done.
In this chapter my aim is to briefly outline the history of social research methods on the basis of earlier accounts of that history. I try to cover the wide-ranging and incoherent histories of both quantitative and qualitative research methods. The focus is unavoidably but regrettably on the Anglo-American traditions. Anglo-American social research is often a starting point that is taken for granted (Alasuutari, 2004). To compensate for the brevity of this text, an extensive listing of references on the history of social research methods is provided.
In this chapter social research is understood as empirical research on society that can also be conducted in institutions other than universities¹. By the concept of ‘method’ I refer to techniques of gathering and analyzing data. I also make an analytical distinction between ‘a method of data collection’ and ‘a method of analyzing data’, because changes in the methods of data collection and the methods of analysis have not occurred simultaneously. Textbooks also often focus on either specific methods of gathering data (e.g. Gubrium & Holstein, 2002; Kvale, 1996) or methods of analysis (Hardy & Bryman, 2004) and they may contain different sections for each (Denzin & Lincoln, 2000a). Methodology is often understood and defined as a normative attempt to find and discuss ‘the good and the bad practices’. However, here methodology is understood as research performed on research methods. ‘Sociologists study man in society; methodologists study the sociologist at work’ (see Lazarsfeld, 1993a: 236).
first political defeat became the birthday of the first sociographic study’ (Zeisel, 2002: 100).
In the following, the history of social research will be reviewed from the beginning of the twentieth century to the turn of the millennium. My aim is to trace both the continuities and discontinuities and to present an outline of the history, drawing on earlier research.
The methods of social research before the First World War
A prehistory of qualitative methods has not been traced to the same extent as the prehistory of the survey, and especially the formation of ideas that led to the rise of modern statistics and statistical institutions, which have been carefully studied (Höjer, 2001; Lazarsfeld, 1977; Porter, 1986; Stigler, 1986; Zeisel, 2002). The history of empirical social research, and in particular the formation of the social survey from the end of the nineteenth century to the First World War, has also been outlined for several countries (Abrams, 1981; Converse, 1987: 11–53; Kent, 1981, 1985; Marsh, 1982; Oberschall, 1965; Young, 1949). With few exceptions (e.g. Converse, 1987; Young, 1949) these histories discuss the course of events in Europe, as the roots of empirical social research actually lie in Europe, not America:
All European countries have conducted empirical social research for nearly 200 years. As a matter of fact, many of the techniques which are now considered American in origin were developed in Europe 50 or 100 years ago and then they were exported from the United States after they had been refined and made manageable for use on a mass scale. (Lazarsfeld, 1965: v.)
The pioneer surveys in Britain and Germany dealt with poverty and the material and moral living conditions among working class and agricultural labour (Oberschall, 1965: 3). The aim was to provide information on contemporary social problems. The pioneers of social survey had various backgrounds from non-academics, such as Charles Booth and Seebohm Rowntree, to the classics of sociology such as Max Weber, who also conducted a considerable amount of empirical research during his career (Lazarsfeld, 1993b: 283–298).
The pioneers did not aim at testing theories but at collecting facts and sometimes also changing the state of affairs. At that time even the idea of collecting empirical material on ordinary people for research was novel. The early studies were influenced by various Christian, philanthropic and socialist ideas but also scientific ideas from statistics to national economy. The social reforms suggested by Booth and his successors are often interpreted as early steps taken towards the welfare state. In these interpretations the divergent suggestions – such as the segregation of the casual poor to ‘labour colonies’ and the loafers to detention centres – are forgotten (Kent, 1985: 55).
The early social survey in America was influenced by the European counterpart and at least one part of it has been defined as a social movement ‘dedicated to putting science (…) in the service of social reform’ (Converse, 1987: 21). In addition to the social surveys in the United States, election and opinion polls also started to evolve very early (Hoinville, 1985: 106). So, the new ideas of studying and describing society were applied and advanced by various actors and for various interests.
Neither the methods of data collection nor the methods of analysis in the pioneer surveys meet the definition of the modern survey. The data collected in the early surveys can be considered miscellaneous because structured questionnaires were not yet an established mode of data collection. For example Booth, with his assistants, ‘used a variety of methods, consulting existing statistics, conducting interviews with informants, and making countless observations of real conditions’ (Converse, 1987: 15). What was characteristic of Booth and also of Max Weber was that they collected the data from informants instead of relying on the poor people themselves. Weber assumed that direct interviewing was impossible with low-income people because they were not able to describe their own situation. Later Weber changed his mind on this
and became convinced that low-income people, too, are able to speak for themselves and thus can be directly interviewed (Lazarsfeld, 1993: 286, 290).
Catherine Marsh (1985) has noted that even the idea of a respondent who is both a subject of the study and an informant at the same time was slow to develop. Once the ideas of direct data collection and interviewing were invented, researchers started to pay attention also to questionnaire design and question wording.
These early surveys were not sample surveys, so in this respect too they differed from the modern surveys. Probability sampling was invented in statistics at the turn of the century, but its usefulness was not yet recognized in social research. The pioneers of survey aimed at covering everyone in the area that was chosen. This led to encyclopaedic endeavours where huge amounts of data were collected. A.L. Bowley discovered the useful properties of probability sampling for social research. He applied probability sampling for the first time in his study of five English towns in 1915.
The methods of analysis were also elementary before the First World War. The data drawn from various sources were usually counted, classified and presented in percentage tables and sometimes in cross-tabulations. Early surveys have been criticized for being unsophisticated as they did not connect with the developments in correlational techniques that were invented by the turn of the century (Selvin, 1985).
According to Catherine Marsh, major advances in survey technology were already made before the First World War (Marsh, 1982: 27). By major advances Marsh means the idea of probability sampling, the use of structured questionnaires, and the basic tools of statistical analysis such as correlation and regression coefficients. However, these innovations did not spread overnight. It took a long time before these methodological inventions were refined, made operational and widely accepted as self-evident, established practices. Backward technical conditions are probably one explanation for the slow diffusion of these ideas. For instance, most tabulations were carried out by hand, because machines for sorting and counting punch-cards were rare and mainly used by statistical offices. Random sampling was also technically difficult as it was laborious to compile lists of people suitable for sampling.
The imperfection of the methods used was not the only weakness of the early British social surveys; they were also often both conceptually and theoretically vague. As Raymond Kent has put it:
Investigators did attempt to explain their findings by looking for causes, but the attempt was not very successful. (…) What they failed to realize was that explanation of the facts could never be based on yet more facts. Such an explanation was always a question of interpretation of the facts, and for that they would have needed the kind of theories being proposed by political economists and academic sociologists of the day. (Kent, 1985: 68.)
On the Continent attempts to combine theory and methods in empirical research were made in the field of sociology. The first method textbook, The Rules of Sociological Method by Emile Durkheim, was published in 1895 in French. Later on Max Weber wrote some methodological texts⁵. Because of the language barrier these texts did not influence the Anglo-American tradition before they were translated into English at the end of the 1930s and 1940s (see Platt, 1996: 69–70, 117–119 on the reception of these classics). In the United States the European tradition was seen through the contemporary frame. For example, Emile Durkheim’s Suicide was presented as an early example of quantitative reasoning conducted in the Lazarsfeldian style (Selvin, 1958; also Madge, 1963; Riley, 1963).
The interwar period: A tension between case study and statistical method
Most writings on social research methods from the 1920s onwards deal with the development of methods in the United States. According to Jennifer Platt this emphasis
The Chicagoans’ contributions to statistical methods discussed above show that it is misleading to equate the Chicago School merely with qualitative methods (see also Platt, 1996: 264–65). Considerable advances in statistical methods were also made outside Chicago during the interwar period. Statistical methods were widely practised by social surveyors, social researchers, pollsters and market researchers; all of them made methodological contributions. At that time these fields were not separate but there was interaction as, for instance, some of the academic social researchers worked in community survey programmes and then moved back to the university¹⁰. Also, at least some of the academic departments and research institutes appear to have formed multidisciplinary – before the word was invented – environments, where social scientists, statisticians and psychologists met.
In the interwar years the development of sampling techniques continued, as did the discussion on the use and choice of sampling methods, which were far from being matters of course. By the end of the 1930s probability sampling became customary. Furthermore, advances were made by Louis Guttman and Rensis Likert in attitude scaling techniques, as they both invented scales which still carry their names (for details see Converse, 1987: 54–76).
Not surprisingly, these advances were not mobilized simultaneously in different disciplines and non-academic environments. They were also slow to spread, which can at least partly be explained by the material prerequisites of the time: ‘Tasks now routinely carried out by computer were then done by hand, very laboriously. (…) Quantitative analysis required much more intensive use of manpower than is the case today’ (Bulmer, 1984: 169).
Regarding these developments there is one study from Europe that is worth mentioning: Marienthal (Jahoda et al., 2002), published in Austria in 1933. This study, which became a classic of social research, dealt with unemployment during the depression in an industrial village. The study combined various data types such as life histories, time sheets, school essays, meal records and statistical data. The authors – Marie Jahoda, Paul Lazarsfeld and Hans Zeisel – crystallized the atmosphere of the moment:
But there is a gap between the bare figures of official statistics and the literary accounts, open as they invariably are to all kinds of accidental impressions. The purpose of our study of the Austrian village, Marienthal, is to bridge this gap. (Jahoda et al., 2002: 1)
The study is said not to be directly influenced by American sociology or German social research (Fleck, 2002: viii). This conclusion is difficult to draw from the book itself because it is unconventional in the sense that there are no references. As an afterword, there is a short history of sociography by Hans Zeisel where he writes about ‘the American survey’. This proves that the authors were at least to some extent aware of American social research and the writings of the Chicago School. However, it can be said with certainty that this trio influenced American social research more thoroughly after their immigration to the United States in the 1930s¹¹.
All in all, it would probably be more apt to refer to both traditions in the plural and speak about case studies and statistical methods. This would also direct more attention to the obvious diversity within the traditions, even though a similarity is found between the sides of the controversy as both of them are said to have adhered to the realistic approach (Hammersley, 1989). In America, the controversy between case study and statistical methods faded away before the Second World War (Platt, 1992). The case study vanished for decades and the conceptual repertoire changed so that the concept of ‘statistical methods’ was replaced by the concept of survey without the epithet ‘social’.
From the 1940s to the end of the 1960s: The rise of survey
The Second World War can be considered as a watershed in the sense that almost
everything written on social research methods in the post-war period has focused on survey methods from different angles. Just as the economic depression of the 1930s in America stimulated social research, the Second World War also fuelled empirical research and especially the diffusion of survey methods. The two volumes of The American Soldier are often recognized as the keystones of modern survey (Stouffer et al., 1949a, 1949b). They belong to the monumental four-volume study entitled Studies in Social Psychology in World War II, which was published in 1949–50. The huge volumes consisted of reanalysis and rewriting of the data collected during the wartime by the Research Branch of the Army.
With data gathered from individuals largely by written questionnaires, Stouffer and his colleagues tried to capture some of the dynamic influence of group membership and context on individual perceptions, attitudes, opinions, morale, adjustment, and behaviours. Though they had few means of measuring group process directly, through tireless replication and imaginative analysis, they were able to cast some light on the interplay between individual and group characteristics. (Converse, 1987: 220)
Most of the reviewers noticed the contributions The American Soldier made to social research. According to Platt the significance of the study was that it established survey as the leading method of data collection (Converse, 1987: 217–24; Madge, 1963: 287–332; Platt, 1996: 60–61).
If methodological advances were made in empirical research, the logic of survey analysis was recorded and established in method textbooks. Since the 1940s several influential textbooks were published (Lundberg, 1942; Jahoda et al., 1951a,b; Hyman, 1960) and they spread widely outside America¹². In his textbook Social Research (1942) George Lundberg formulated the steps to be taken in most advanced level scientific research: ‘The working hypothesis; the observation and recording data; the classification and organisation of the data collected; generalisation to a scientific law, applicable to all similar phenomena in the universe studied under given conditions’. Lundberg considered his model applicable to the social as well as the natural sciences (Platt, 1996: 78). Afterwards Lundberg has been labelled as an extreme operationalist and his approach has been criticized for being atheoretical (Platt, 1996: 93)¹³.
These decades are widely recognized as the heyday of survey. However, surprisingly, some of the best-known method textbooks do not focus in a blinkered way only on the collection of survey data (Jahoda et al., 1953a, 1953b; Riley, 1963; Selltiz et al., 1961; Young, 1949). On the contrary, the use of historical and personal documents, statistical data and field observation are also presented extensively, but when the focus turns to the methods of analysis then most of the pages are reserved for statistical methods. There were also exceptions to the dominance of survey analysis in the 1940s and 50s. For instance William Whyte used participant observation and attempted to systematize the case study method (Platt, 1996: 62–63).
After the war a change happened in social research in relation to theory. The British interwar sociology has been described in this way: ‘These individuals who conducted survey before 1939 were not for the most part consciously trying to develop or test sociological theory. Their motives lay elsewhere but the end result of their endeavours was often the formulation of ideas and theories’ (Kent, 1985: 52). This statement also appears to apply to the American counterpart. After the war empirical research was often explicitly grasped as an effort to test a theory. However, a slightly different conception of theory is implied by Stouffer and Lazarsfeld whose main goal, according to Converse, was to keep the scientists shuttling back and forth between theory and data (Converse, 1987: 219).
The controversies within survey are seldom taken into consideration either in origin myths or in the critiques of survey. In reality, in the 1940s and 50s, there were tensions and disagreements on various issues. For example, the usefulness of statistical tests in social sciences was disputed (Morrison & Henkel, 1970) and there was no consensus on whether questionnaires
should be based on open-ended or structured questions. Jean M. Converse claims that the controversy was settled in favour of structured questionnaires, but not on the basis of evidence (Converse, 1984, 1987). Many of these controversies can be interpreted as consequences of strong departmental traditions, which also influenced the style of analysis that was preferred (Platt, 1996: 133).
As the popularity of the survey rose, so did criticism of it. Because of his central position in the field, Paul Lazarsfeld was one of the main targets. ‘Great man theories of history may be unfashionable, but they are hard to avoid here; the whole pattern of publication after the war is marked by Lazarsfeld’s influence’ (Platt, 1996: 61). Altogether his reception, as it has emphasized only his impact on survey methods, has been criticized as lopsided compared to his contribution (Platt, 1996: 64). It has not been remembered for instance that he insisted that quantitative and qualitative analysis should be combined (Boudon, 1993: 23) and that he promoted research on the history of social research.
Herbert Blumer, the inventor of symbolic interactionism, had criticized statistical methods since the end of the 1920s. In the mid 1950s he targeted his critique especially on ‘variable sociology’ as a method of data collection and analysis and he saw Lazarsfeld as the main proponent of survey research. Blumer defined the process of interpretation as ‘the core of human action’ and considered variable sociology incapable of catching its essence. Blumer saw the potential of ‘variable sociology’ as very restricted. He notes that it is applicable to ‘those areas of social life and formation that are not mediated by an interpretative process’ but gives no examples of what such might be (Blumer, 1956; see also Hammersley, 1989: 113–36). Despite his searing criticism against survey, Blumer did not suggest an alternative way of doing social research as he conducted very little empirical research himself (Platt, 1996: 120)¹⁴.
In 1959 in The Sociological Imagination C. Wright Mills attacked what he called ‘abstracted empiricism’. Again Lazarsfeld was seen as its leading exponent. A few years later in Method and Measurement Aaron Cicourel discussed the problems that come up when sociologists try to measure meaningful action. He did not intend to offer a solution either; if anything he called for clarification of sociological theory (1964: iii). Since the 1950s Howard S. Becker contributed to the use of qualitative methods and especially to participant observation with his studies on collective action: ‘I conceive of society as collective action and sociology as the study of the forms of collective action’ (Becker, 1970: v). Becker’s methodological writings differed from the ones mentioned above as he did not concentrate on dissecting the weaknesses of the survey method. All these researchers prove that besides the mainstream of survey, there were efforts towards more qualitatively orientated methods of social research. Textbooks on qualitative methods did not appear until the end of the 1960s, when The Discovery of Grounded Theory was published (Glaser & Strauss, 1967).
In the late 1960s and 70s it was common to claim that there is a connection between functionalism and survey method since they were the leading tendencies in the post-war social research. These views rested on the assumption that ‘(t)he relationship between method and theory is one of elective affinity, but not symmetrical: theory is more fundamental, and leads to the corresponding method or (…) the epistemological leads to the technical’ (Platt, 1996: 106). Later on Jennifer Platt claims that it was more of a coincidence that functionalism and survey dominated at the same time and there is no causal or logical connection between them (Platt, 1996: 113–17; 2006a).
Treating three post-war decades together necessarily gives a rough-grained picture. It does not do justice to the variety of social research during this period. For instance, the year 1960 has sometimes been considered a watershed, because, first, the pioneers, e.g. Lazarsfeld, Stouffer and Likert, were no longer active in survey work and, second, the modern survey had also been established
(Denzin & Lincoln, 2000b: 15). On the list a wide range of theories is mentioned such as symbolic interactionism, ethnomethodology, critical theory, feminism and neo-Marxist theory. Furthermore Denzin and Lincoln remind us that ‘diverse ways of collecting and analysing empirical materials were also available, including qualitative interviewing (…) and observational, visual, personal experience, and documentary methods’ (Denzin & Lincoln, 2000b: 15). Exceptionally, the authors draw attention to computers that were also beginning to influence the methods of qualitative data analysis. Surprisingly, they do not recognize the impact of new technical devices (such as tape recorders and video cameras) on the methods of data collection¹⁵.
All in all, during these two decades qualitative methods were established in several method textbooks and journals that certainly do not make up a coherent unity. The naturalistic, postpositivistic and constructionist traditions of thinking have been seen as distinctive to qualitative methods of this period. By the 1980s the linguistic turn started to challenge the more naturalistic lines of thinking. The linguistic turn probably also redirected attention from the qualitative-quantitative divide to, for instance, the controversies within qualitative methods.
There is some indication that at this point the American and European methodological traditions differentiated at the level of empirical research. In America the success story of survey methods continued and there was serious work done to advance the methods of survey research. In Britain, and maybe more generally in Europe, survey methods gained a bad reputation in academic research and the listings of their failings started to spread (see e.g. Marsh, 1982). In the beginning of this period the quantitative and qualitative traditions were defined as incompatible, but as time went by the juxtaposition was questioned and by the end of the 1980s the possibility of mixing the methods was taken into consideration (e.g. Bryman, 1988). For example David Silverman (1985) ‘radically’ suggested combining quantitative and qualitative analysis of qualitative data.
From the 1990s onwards: Unavoidable fragmentation?
Apparently, the most difficult task for a historian is to try to find current patterns. Every reader can try to figure out the essential trends of contemporary social research after reading this handbook. However, two tendencies of the evolution of social research methods since 1990 will be discussed here with some, but not systematically selected, evidence. The first one is the fragmentation or diffusion of methodological approaches, and the second one is the increasing tolerance between various methods of analysis and data collection.
I claim that the differentiation of methodological approaches has continued to escalate both within qualitative and quantitative methods since the beginning of the 1990s. There are highly specialized approaches within both traditions – one can specialize in conversation or correspondence analysis, choose to construct structural equation or multilevel models or end up with one of the many variations of discursive or narrative analysis, just to mention a few alternatives. The increasing number of analytical approaches can partly be seen as a consequence of interaction between different disciplines and traditions. Simultaneously, numerous narrowly focused textbooks and journals have emerged to institutionalize them.
The abundance of different methodological and theoretical approaches or traditions comes out clearly from the periodization of qualitative methods presented by Norman Denzin and Yvonna Lincoln (2000b). They divide the field of qualitative methods since 1986 into four separate, but partly overlapping, phases that relate to successive waves of epistemological theorizing that have ensued from a crisis of representation. Each of the ‘moments’, as they are called, covers only a few years and takes a different stance to the crisis of representation.
The four moments are the crisis of representation, the postmodern period of experimental ethnographic writing, the post-experimental moment, and the future. The crisis of representation is associated with some methodological texts (e.g. Clifford & Marcus, 1986; Turner & Bruner, 1986) that made research and writing more reflexive and conscious of questions of gender, class and race. As the crisis of representation meant that researchers were no longer seen as able to capture the lived experience, it changed the relations of fieldwork, analysis and scientific writing. This led to the search for new models of truth, method and representation. The postmodern period of experimental ethnographic writing struggled with the triple crisis of representation (i.e. crisis of representation, legitimation and praxis). In this moment an effort was made to search for more local and small-scale theories instead of grand narratives and writers also looked for new ways of composing ethnography. According to Denzin and Lincoln the post-experimental moment and the future were upon ‘us’ by the turn of the millennium. In the post-experimental phase researchers try ‘to connect their writings to the needs of a free democratic society’ and to answer to the demands of a moral qualitative social science (Denzin & Lincoln, 2000a, 16–18; 2000b).

Even though this delineation has been criticized (e.g. Alasuutari, 2004), it proves that the field appears quite complex even to the insiders. The complexity of the qualitative methods is also pointed out by Jaber Gubrium and James Holstein (1997). Their overview is illuminating also historically as it goes to the roots of diverse lines of qualitative methods and takes into account the European tradition. What is still missing is a corresponding study of the ramifications of quantitative methods since the 1970s.

Concurrently with this fragmentation, tolerance between different methodological approaches seems to have slightly increased. A growing number of methodological texts has been published during this period, first exploring and pondering the possibility of mixed methods research (usually asking whether qualitative and quantitative methods can be combined) and later on more confidently proclaiming the use of mixed methods research (Brannen, 1992, 2005; Tashakkori & Teddlie, 1998). The number of textbooks that include chapters on both qualitative and quantitative traditions has recently increased (e.g. Bernard, 2000; May, 2003). Also the new Journal of Mixed Methods Research is an indicator of this kind of change. In its very first number, the journal presents an outline of a transition in relation to mixed methods research as well as a detailed analysis of various types of multi-methods research (Morgan, 2007). This tendency has been interpreted as a sign of increasing popularity of a more pragmatic approach to research methods (Tashakkori & Teddlie, 1998).

These two tendencies raise two questions. First, the motto of the mixed methods approach has proclaimed a ‘dictatorship of the research question’ in the choice of research methods (Tashakkori & Teddlie, 1998: 20–22), but how can one rationally choose the method in a situation where it is impossible even to master the whole spectrum of alternatives by name? Second, is the suggested tolerance between the various methodological traditions only superficial? Is dialogue and deeper understanding between the diverse lines of thinking on research methods possible16?

THE ACTUAL USE OF DIFFERENT METHODS

So far the evolution of social research methods has been the centre of attention and very little has been said about the actual use of research methods. However, there are some empirical studies that have grasped the actual use of different research methods, mainly during the post-war decades. They will be briefly discussed to shed more light on some points of the history that have been dealt with earlier on in this chapter.

These studies are indicative of the proportions of the different research methods at various points in time (Snizek, 1975;
Wells & Picou, 1981; cf. Platt, 1996: 124–25; Bechhofer, 1996; also Platt, 2006b). Most of the studies draw on analyses of journal articles and cover a time-span from the end of the 1930s to the mid 1970s; only one of the studies goes back to the interwar decades.

However, regardless of the differences in the periods and categorizations of the research methods, the main results are parallel. Not surprisingly, the studies show the rise of survey and other methods based on quantification especially in the leading journals of sociology in America during the post-war decades. But they also show that survey methods never – not even in the 1950s and 60s – were the only ones applied. Other apparently more qualitative approaches such as ‘observation’, ‘the interpretative method’ and ‘the qualitative method’ were always used to some extent, although clear trends can be found in the popularity of the different methods in America. One of the studies also shows that a small amount of experimental research was published around the Second World War (Wells & Picou, 1981). Because the experimental approach never gained success in social research, it is easily forgotten in method histories that it was regarded as a promising – and sometimes even the only rigorously scientific – method, supported, for example, by Samuel Stouffer.

Quite recently, on the basis of studying journal articles and conference abstracts, the decline of survey and more sophisticated statistical methods has been shown in Britain (Bechhofer, 1996; Payne et al., 2004). This data on the actual use of methods also provides some evidence for the assumption that social research has gone in different directions in America and Europe.

Given the attention that these studies have directed to the quantitative-qualitative divide, they appear to be motivated by contemporary methodological debates. Yet most of the articles have been descriptive, and attempts to explain the changes in the popularity of particular research methods have been rare. Not even sloppy explanations drawing on the concepts of science studies, like ‘paradigm’, can be found.

CONCLUSION AND DISCUSSION

Up to now sociologists have scarcely occupied themselves with the task of characterising and defining the method that they apply to the study of social facts. (Durkheim, 1982: 48)

Since Durkheim’s time social scientists have spared no effort when writing on research methods. Enormous amounts of methodological texts have been written and also numerous controversies have arisen on methodological issues.

The twentieth century has been a period of great expansion and institutionalization for social research and its methods. To summarize, not only the methods as such but also the relationships of different methods and methodological approaches have changed considerably during the period considered here. There have also been numerous methodological debates both within the quantitative (e.g. on probability sampling, questionnaire construction, statistical testing and causality) and qualitative approaches (Denzin & Lincoln, 2000b). Less attention is often paid to these controversies than to the dispute that is now being referred to as the paradigm war and which has drawn most of the attention.

There are some issues that seem to occur frequently in methodological writing. One is the relationship between theories and methods and another is the relationship of qualitative and quantitative methods (in whatever ways they are called). The first one is here passed by with only wonder as to whether there has been a shift in the interrelations between methods and theories during the past decade or two so that methods are more frequently seen as matters of a technical nature, not as theories of reality in themselves.

The controversy between qualitative and quantitative approaches is the most discussed topic; it has come up frequently under different names (case study vs. statistical method, participant observation vs. survey, qualitative vs. quantitative) (cf. Platt, 1996: 45). The divide has not only split methods textbooks and teaching but also the research on social research methods. There are only very few texts that even try to cover both approaches.
Denzin, Norman K. & Lincoln, Yvonna S. (eds.) (2000a) Handbook of Qualitative Research (2nd edn). Thousand Oaks: Sage (1st edn 1994).
Denzin, Norman K. & Lincoln, Yvonna S. (2000b) Introduction. The Discipline and Practice of Qualitative Research. In Norman K. Denzin & Yvonna S. Lincoln (eds.) Handbook of Qualitative Research (2nd edn). Thousand Oaks: Sage, 1–28.
De Vaus, D.A. (1995) Surveys in Social Research (4th edn). London: Routledge.
Devine, F. & Heath, S. (1999) Sociological Research Methods in Context. Houndmills: Palgrave.
Durkheim, Emile (1982) The Rules of Sociological Method and Selected Texts on Sociology and Its Method. London: The Macmillan Press Ltd.
Eskola, Antti (1992) Sosiologian uudistuminen 1950-luvulla. In Alapuro, Risto, Alestalo, Matti & Haavio-Mannila, Elina (eds.) Suomalaisen sosiologian historia. Porvoo: WSOY, 241–285.
Fleck, Christian (2002) Introduction to the Transaction Edition. In Marie Jahoda, Paul Lazarsfeld & Hans Zeisel (eds.) Marienthal. The Sociography of an Unemployed Community. New Brunswick: Transaction Publishers, vii–xxx.
Glaser, Barney G. & Strauss, Anselm L. (1967) The Discovery of Grounded Theory. Strategies for Qualitative Research. New York: Aldine.
Gubrium, Jaber F. & Holstein, James A. (1997) The New Language of Qualitative Method. New York: Oxford University Press.
Gubrium, Jaber F. & Holstein, James A. (eds.) (2002) Handbook of Interview Research: Context and Method. Thousand Oaks: Sage.
Halfpenny, Peter (1982) Positivism and Sociology: Explaining Social Life. London: Allen & Unwin.
Hammersley, Martyn (1989) The Dilemma of Qualitative Method. Herbert Blumer and the Chicago Tradition. London: Routledge.
Hardy, Melissa & Bryman, Alan (eds.) (2004) Handbook of Data Analysis. Thousand Oaks: Sage.
Harvey, Lee (1987) The Myths of the Chicago School of Sociology. Aldershot: Avebury.
Hoinville, Gerald (1985) Methodological Research on Sample Surveys: a Review of Developments in Britain. In Martin Bulmer (ed.) Essays on the History of British Sociological Research. Cambridge: Cambridge University Press, 101–120.
Hyman, Herbert H. (1960) Survey Design and Analysis. Principles, Cases and Procedures. Third Printing. Glencoe: The Free Press (Orig. 1955).
Höjer, Henrik (2001) Svenska siffror: nationell integration och identifikation genom statistik 1800–1870. Hedemora: Gidlunds.
Jahoda, Marie, Deutsch, Morton & Cook, Stuart W. (1953a) Research Methods in Social Relations with Especial Reference to Prejudice. Part I: Basic Processes. New York: Dryden Press.
Jahoda, Marie, Deutsch, Morton & Cook, Stuart W. (1953b) Research Methods in Social Relations with Especial Reference to Prejudice. Part II: Selected Techniques. New York: Dryden Press.
Jahoda, Marie, Lazarsfeld, Paul & Zeisel, Hans (2002) Marienthal. The Sociography of an Unemployed Community. New Brunswick: Transaction Publishers (Orig. 1933).
Kent, Raymond (1981) The History of British Empirical Sociology. Aldershot: Gower.
Kent, Raymond (1985) The Emergence of the Sociological Survey, 1887–1939. In Martin Bulmer (ed.) Essays on the History of British Sociological Research. Cambridge: Cambridge University Press, 52–69.
Kvale, Steinar (1996) InterViews. An Introduction to Qualitative Research Interviewing. Thousand Oaks: Sage Publications.
Lazarsfeld, Paul F. (1965) Preface. In Oberschall, Anthony (ed.) Empirical Social Research in Germany 1848–1914. Paris: Mouton & Co, v–viii.
Lazarsfeld, Paul F. (1972) Foreword. In Oberschall, Anthony (ed.) The Establishment of Empirical Sociology. Studies in Continuity, Discontinuity, and Institutionalization. New York: Harper & Row, vi–xvi.
Lazarsfeld, Paul F. (1977) Notes on the History of Quantification in Sociology – Trends, Sources and Problems. In Kendall, Maurice & Plackett, R.L. (eds.) Studies in the History of Statistics and Probability vol. II. London: Charles Griffin & Company Limited, 213–270 (Orig. 1961).
Lazarsfeld, Paul (1993a) Methodological Problems in Empirical Social Research. In Boudon, Raymond (ed.) On Social Research and Its Language. Chicago: University of Chicago Press, 236–254.
Lazarsfeld, Paul (1993b) Max Weber and Empirical Social Research. In Boudon, Raymond (ed.) On Social Research and Its Language. Chicago: University of Chicago Press, 283–298.
Lundberg, George (1942) Social Research: A Study in Methods of Gathering Data. New York: Green & co.
Madge, John (1963) The Origins of Scientific Sociology. London: Tavistock Publications.
Marsh, Catherine (1982) The Survey Method. The Contribution of Surveys to Sociological Explanation. London: George Allen & Unwin.
Marsh, Catherine (1985) Informants, Respondents and Citizens. In Martin Bulmer (ed.) Essays on the History of British Sociological Research. Cambridge: Cambridge University Press, 206–227.
May, Tim (2003) Social Research: Issues, Methods and Process (3rd edn). Buckingham: Open University Press.
Mills, C. Wright (1977) The Sociological Imagination. Harmondsworth: Pelican Book (Orig. in 1959).
Morgan, David L. (2007) Paradigm Lost and Pragmatism Regained. Methodological Implications of Combining Qualitative and Quantitative Methods. Journal of Mixed Methods Research, 1(1), 48–76.
Morrison, Denton & Henkel, Ramon E. (eds.) (1970) The Significance Test Controversy – A Reader. Chicago: Aldine.
Moser, Claus & Kalton, Graham (1986) Survey Methods in Social Investigation (2nd edn). Aldershot: Gower.
Oakley, Ann (2000) Experiments in Knowing. Gender and Method in Social Sciences. Cambridge: Polity Press.
Oberschall, Anthony (1965) Empirical Social Research in Germany 1848–1914. Paris: Mouton & Co.
Payne, G., Williams, M. & Chamberlain, S. (2004) Methodological Pluralism in British Sociology. Sociology, 38(1), 153–163.
Platt, Jennifer (1983) Weber’s verstehen and the History of Qualitative Research: The Missing Link. British Journal of Sociology, 26(3), 448–466.
Platt, Jennifer (1986) Qualitative Research for the State. Quarterly Journal of Social Affairs, 2, 87–108.
Platt, J. (1992) ‘Case Study’ In American Methodological Thought. Current Sociology, 40(1), 17–48.
Platt, Jennifer (1996) A History of Sociological Research Methods in America 1920–1960. Cambridge: Cambridge University Press.
Platt, Jennifer (2002) The History of Interview. In Gubrium, Jaber F. & Holstein, James A. (eds.) Handbook of Interview Research: Context and Method. Thousand Oaks: Sage, 33–53.
Platt, Jennifer (2006a) Functionalism and the Survey: The Relation of Theory and Method. In Williams, M. (ed.) Philosophical Foundations of Social Research Methods. London: Sage, 217–251 (orig. in Sociological Review, 34(3), 501–536).
Platt, Jennifer (2006b) How Distinctive are Canadian Research Methods? Canadian Review of Sociology and Social Anthropology, 43(2), 205–231.
Porter, Theodore M. (1986) The Rise of Statistical Thinking 1820–1900. Princeton: Princeton University Press.
Riley, Matilda White (1963) Sociological Research I. A Case Approach. New York: Harcourt, Brace & World, Inc.
Schaeffer, Nora Cate & Presser, Stanley (2003) The Science of Asking Questions. Annual Review of Sociology, 29, 65–88.
Selltiz, Claire, Jahoda, Marie, Deutsch, Morton & Cook, Stuart W. (1961) Research Methods in Social Relations. New York: Holt, Rinehart and Winston.
Selvin, Hanan C. (1958) Durkheim’s Suicide and Problems of Empirical Research. American Journal of Sociology, 63(6), 607–619.
Selvin, Hanan C. (1985) Durkheim, Booth and Yule: the Non-diffusion of an Intellectual Innovation. In Martin Bulmer (ed.) Essays on the History of British Sociological Research. Cambridge: Cambridge University Press, 70–82.
Silverman, David (1985) Qualitative Methodology and Sociology. Describing the Social World. Aldershot: Gower.
Snizek, W.E. (1975) The Relationship between Theory and Research: A Study in the Sociology of Sociology. Sociological Quarterly, 16, 415–428.
Stigler, Stephen M. (1986) The History of Statistics: The Measurement of Uncertainty Before 1900. Cambridge: Harvard University Press.
Stouffer, S.A., Suchman, E.A., de Vinney, L.C., Star, S.A. & Williams, R.M. (1949a) The American Soldier vol. I. Adjustment during Army Life. Princeton: Princeton University Press.
Stouffer, S.A., Suchman, E.A., de Vinney, L.C., Star, S.A. & Williams, R.M. (1949b) The American Soldier vol. II. Combat and Its Aftermath. Princeton: Princeton University Press.
Tashakkori, Abbas & Teddlie, Charles (1998) Mixed Methodology. Combining Qualitative and Quantitative Approaches. Thousand Oaks: Sage.
Turner, Victor W. & Bruner, Edward (eds.) (1986) The Anthropology of Experience. Urbana: University of Illinois Press.
Vidich, Arthur J. & Lyman, Stanford M. (1994) Qualitative Methods: Their History in Sociology and Social Anthropology. In Norman K. Denzin & Yvonna S. Lincoln (eds.) Handbook of Qualitative Research (2nd edn). Thousand Oaks: Sage, 23–59 (2nd edn 2000).
Wells, R.H. & Picou, J.S. (1981) American Sociology: Theoretical and Methodological Structures. Washington DC: University Press of America.
Young, Pauline V. (1949) Scientific Social Surveys and Research. An Introduction to the Background, Content, Methods, and Analysis of Social Studies (2nd edn). New York: Prentice-Hall (1st edn 1939).
Zeisel, Hans (2002 [1930]) Afterword. Toward a History of Sociography. In Jahoda, Marie, Lazarsfeld, Paul & Zeisel, Hans (eds.) Marienthal. The Sociography of an Unemployed Community. New Brunswick: Transaction Publishers, 99–125 (orig. 1933).
4
Assessing Validity in
Social Research
Martyn Hammersley
Much discussion of how validity should be assessed in social research has been organized around the distinction between quantitative and qualitative approaches, with arguments over whether or not the same criteria apply to both. It is often suggested that quantitative inquiry has a clear set of assessment criteria, so that readers (even those who are not researchers) can judge the quality of such research relatively easily, whereas in the case of qualitative inquiry no agreed or easily applicable set of criteria is available. While this is often presented as a problem, some qualitative researchers deny the possibility or even the desirability of assessment criteria.

In this chapter I will argue that this contrast between the two approaches is, to a large extent, illusory; that it relies on misleading conceptions of the nature of research, both quantitative and qualitative, and of how it can be assessed. I will suggest that the general standards in terms of which both the process and products of research should be judged are the same whichever approach is employed. Furthermore, when it comes to more detailed criteria of assessment these need to vary according to the nature of the conclusions presented, and the characteristics of the specific methods of data collection and analysis used. In the course of the chapter, I will raise questions about both older positivist conceptions of quantitative research, and of how it should be assessed, and those more recent relativist and postmodernist ideas, quite influential among qualitative researchers, which reject epistemic criteria of assessment, and perhaps even all criteria.

In the first section, I will examine the criteria normally associated with quantitative work. This discussion will raise several questions. One of these concerns what is being assessed, and the need to make some differentiation here, notably between assessing findings and assessing the value of particular research techniques. Another issue relates to what is meant by the term ‘criterion’ and what role criteria play in the process of assessment. In the second half of the chapter I will examine some of the arguments in
the qualitative research tradition about how studies ought to be evaluated.

QUANTITATIVE CRITERIA?

If we look at the methodological literature dealing with quantitative research, and indeed at many treatments of the issue of validity in relation to social inquiry more generally, several standard criteria are usually mentioned. These concern three main aspects of the process of research: measurement, generalization, and the control of variables.

In relation to measurement, the requirements usually discussed are that measures must be reliable and valid. Reliability is generally taken to concern the extent to which the same measurement technique or strategy produces the same result on different occasions, for example when used by different researchers. This is held to be important because if researchers are using standard measurement devices, such as attitude scales or observation schedules, they need to be sure that these give consistent results. Furthermore, it is often argued that any measure that is not reliable cannot be valid, on the grounds that, if its results are inconsistent, the measurements it produces cannot be consistently valid. As this argument indicates, validity of measurement is seen as important by quantitative researchers, even though it is usually taken to be more difficult to assess than reliability. Indeed, given the link between the two criteria, reliability tests are often treated as one important means for assessing validity. Nevertheless, separate validity tests may also be used, for instance checking whether different ways of measuring the same property produce the same findings, or whether what is found when measuring the property in a particular set of objects is consistent with the subsequent behaviour of those objects. These tests are often described as assessing different kinds of validity, in this case convergent and predictive validity1.

On the basis of this initial discussion, we can identify a first key question to be applied in assessing the validity of quantitative research: were the measurement procedures reliable and valid? And it is often suggested that, in evaluating a study, the way to go about answering this question is to ask whether reliability and validity tests were carried out, and whether the scores on these tests were high enough to warrant a positive evaluation. This, then, is one set of commonly used criteria.

The second key area to which well-known criteria of assessment relate concerns the generalizability of the findings. This is an especially prominent issue in the context of survey research, where data from a sample of cases are often used as a basis for drawing conclusions about the characteristics of a larger population. In this context, the issue is relatively clear: are the statements made about the sample also true of the population? Short of investigating the whole population, which would render sampling pointless, there is no direct means of answering this question. However, statistical sampling theory provides a basis for coming to a reasonable conclusion about the likely validity of inferences from sample to population. If the sample was sufficiently large, and was drawn from the population on the basis of some kind of probability sampling, then a statistical measure can be provided of how confident we can be that the findings are generalizable. The criteria involved here, then, are the sampling procedures employed and the results of a statistical significance test2.

The final area where quantitative criteria are well established concerns whether variables have been controlled in a sufficiently effective manner to allow sound conclusions to be drawn about the validity of causal or predictive hypotheses; this sometimes being referred to as causal validity. Experimental designs employing random allocation of subjects to treatment and control groups are often seen as the strongest means of producing valid conclusions in this sense. However, statistical control, through multivariate analysis, is an alternative strategy that is employed in much social survey research. Moreover, with both forms of control, statistical tests are often applied to assess the chances that the results
were a product of random error rather than of the independent variable. Here, then, the criteria concern whether physical or statistical control was applied, and the confidence we can have in ruling out random error.

Undoubtedly the most influential account of evaluative criteria for quantitative research that draws together these three different aspects into a single framework is that developed by Campbell and his colleagues (Campbell 1957; Campbell and Stanley 1963; Cook and Campbell 1979). This distinguishes between internal and external validity, where the former is usually seen as incorporating measurement and causal validity, while external validity refers to generalizability3. Campbell et al.’s scheme was originally developed for application to quasi-experimental research, but it has subsequently been applied much more widely.

There is no doubt that these three issues are potentially key aspects of any assessment of validity in quantitative research, and perhaps in social inquiry more generally. However, there are a number of important qualifications that need to be made.

First, we must be clear about what we are assessing. There is confusion in much discussion of measurement between a concern with assessing the findings of the measurement process and assessing the measurement technique or strategy employed. Validity relates only to the former, while reliability concerns the latter. We can talk about whether the findings are or are not valid, but it makes no sense to describe a measurement technique as valid or invalid, unless we are adopting a different sense of the term ‘validity’, using it to mean ‘appropriately applied’. It is, of course, true that we should be interested in whether a measurement technique consistently produces accurate results. In fact, as is sometimes done, there would be good reason to define ‘reliability’ of measurement techniques as the capacity to produce consistently valid measurements4.

Second, it is misleading to believe that there can be different types of validity. Validity is singular not multiple; it concerns whether the findings or conclusions of a study are true. The three aspects discussed above refer to areas where error can undermine research conclusions. For example, what was referred to as ‘causal validity’ is concerned with threats to valid inferences about causality arising from confounding factors. Furthermore, the distinction between types of measurement validity actually refers to ways in which we can assess whether our measurements are accurate. There is also the problem that the distinction between internal and external validity obscures the fact that ‘causal validity’ implies a general tendency, for the cause to produce the effect, that operates beyond the cases studied (Hammersley 1991). As a result, internal validity is not distinct from external validity.

Rather than differentiating types of validity, we need to distinguish between the different sorts of knowledge claim that studies produce. There are three of these: descriptive, explanatory, and theoretical5. Recognizing the particular sort of conclusion a study makes is important because each of the three types of knowledge claim has different requirements, and therefore involves somewhat different threats to validity. This is true even though there is some overlap caused by the way that these types of knowledge are interrelated: descriptive claims are required as subordinate elements in the other two kinds; and explanations always depend upon implicit or explicit theoretical knowledge6.

In assessing the validity of descriptions, we must be concerned with whether the features ascribed to the phenomena being described are actually held by those phenomena, and perhaps also with whether they are possessed to the degrees indicated. Also of importance may be whether any specification of changes in those features over time, or any account of sequences of events, are accurate.

In assessing the validity of explanations we first of all need to consider the validity of the subordinate descriptions: those referring both to what is being explained and to the explanatory forces that are cited. Second, we must assess the validity of the theoretical principle that provides the link between
proposed cause(s) and effect(s). Third, we need to consider whether that theoretical principle identifies what was the key causal process in the context being investigated.

Finally, in judging the validity of theoretical conclusions, we will also need to assess the validity of any descriptive claims on which they rely, both about the causal mechanism involved and about what it produces. In addition, we will need to find some means of comparing situations in which it does and does not operate, and of discounting other factors that could generate the same outcome.

There is also variation in the threats to validity operating on different sources of evidence, and this variation must also be taken into account in assessing knowledge claims. What is involved here is partly that some methods have distinctive validity threats associated with them. For example, if we rely on the accounts of informants about some set of events, then we must recognize that there are distinctive potential biases operating on these accounts, in addition to those operating on researchers’ interpretations, for example, to do with whether the informant is able or willing to provide accurate information in relevant respects. By contrast, in the case of direct observation by a researcher only one of these two sources of bias operates. (At the same time, it is perhaps worth underlining that closely associated with many sources of bias are sources of potential insight, for instance, informants may be able to recognize what is going on in ways that are less easily available to an external researcher.)

Equally important is the fact that particular threats to validity vary in degree across methods. Reactivity is little or no threat with some sources of data, such as the use of extant documents or covert observation of public behaviour. By contrast, it is a very significant danger in the case of laboratory experiments, where subjects’ actions may be shaped by the experimental setup and by the appearance and behaviour of the experimenter. At the same time, we should note that what is threatened

social world, is an issue that is relevant to all kinds of research, even those that manage to achieve low reactivity (Hammersley and Atkinson 2007: chapter 1).

In summary, then, validity is a crucial standard by which the findings of research should be judged, and it is a single standard that applies across the board. However, what is required for assessing likely validity varies according to the nature of the findings, and also according to the research methods employed. From this point of view, the argument that qualitative and quantitative approaches require different assessment criteria is defective both in drawing a distinction where none exists and in obscuring more specific and essential differences (in relation to types of knowledge claim and specific data sources).

Another important point relates to the notion of assessment criteria. There is sometimes a tendency within the literature of quantitative methodology to imply that there are procedures which can tell us whether or not, for instance, a measure is valid. Thus, reliability and validity tests are often said to measure validity. However, they cannot do that. They can give us evidence on which we can base judgements about the likely validity of the findings, but they cannot eliminate the role of judgement. Similarly, the use of experimental control, and random allocation of subjects to treatment and control groups, does not guarantee the validity of the findings; nor does the absence of these methods mean that the findings are invalid, or even that the studies concerned provide us with no evidence. In fact, there are usually trade-offs such that any research strategy that is more effective in dealing with one threat to validity generally increases the danger of other validity threats. Furthermore, making sound judgements about validity relies on background knowledge, both about the substantive matters being investigated and also about the sources of data and methods of investigation employed. This means that there
by reactivity, the extent to which we can safely will be significant differences between people
generalize our findings from the situations in how well placed they are to assess the
studied to other relevant situations in the validity of particular sets of research findings
46 THE SAGE HANDBOOK OF SOCIAL RESEARCH METHODS
at the world, or gaining the ability to speak a different language[7].

These developments led the way for some qualitative researchers to argue that older conceptions of validity, and of validity criteria, are false or outdated[8]. Many commentators claimed that we must recognize that there are simply different interpretations or constructions of any set of phenomena, with these being incommensurable in Kuhn’s sense; they are not open to judgement in terms of a universal set of epistemic criteria. At best, there can only be plural, culturally relative, ways of assessing validity. This argument, variously labelled ‘relativism’ or ‘postmodernism’[9], was reinforced by claims from feminists, anti-racists, and others. They argued that conventional social science simply reproduces the dominant perspectives in society, that it marginalizes other voices that rely on distinctive, and discrepant, epistemological frameworks. From this point of view, the task of social science should be to counter the hegemony of dominant groups and their discourses, and thereby to make way for marginalized discourses to be heard and their distinctive epistemologies to be recognized. In this way, the original conception of epistemic criteria, and perhaps even the very notion of validity or truth, are rejected as ideological and replaced by a political, ethical or aesthetic concern with valuing, appreciating, or treating fairly, multiple conceptions of or discourses about the world.

These critics of assessment criteria claim, then, that since there can be no foundation of evidence that is simply given and therefore absolutely certain in validity from which knowledge can be generated, or against which hypotheses can be tested, then all knowledge, in the traditional sense of that word, is impossible. We are, to quote Smith and Hodkinson (2005: 915) ‘in the era of relativism’. This means that we must recognize that any claims to knowledge, including those of researchers, can only be valid within a particular framework of assumptions; or within a particular socio-cultural context. And, as already noted, some writers have concluded from this that the main requirement is to challenge claims to universal knowledge and to celebrate marginalized and transgressive perspectives, perhaps in the name of freedom and democracy. Here, ethics and politics are foregrounded. Along these lines, Denzin and Lincoln argue that the criteria of assessment for qualitative research should be those of a ‘moral ethic (which) calls for research rooted in the concepts of care, shared governance, neighbourliness, love and kindness’ (Denzin and Lincoln 2005: 911).

Closely related to this line of argument is an insistence on seeing all claims to knowledge as intertwined, if not fused, with attempts to exercise power. Thus, the work of social scientists has often come to be analyzed both in terms of how it may be motivated by their own interests and/or in terms of the wider social functions it is said to serve, in particular the reproduction of dominant social structures. In the context of methodology, this has involved an emphasis on the senses in which researchers exercise power over the people they study; and this has led to calls for collaborative or practitioner research, in which decisions about who or what to research, as well as about research method, are made jointly with people rather than their being simply the focus of study. Indeed, some have argued that outside researchers should do no more than serve as consultants helping people to carry out research for themselves. These ideas have been developed within the action research movement, among feminists, and are also currently very influential in the field of research concerned with the lives of children and young people (see Reason and Bradbury 2001 and MacNaughton and Smith 2005). Almost inevitably, this breaking down of the barriers between researchers and lay people, designed to undermine any claim to authority based on expertise, leads to epistemic judgements being made in ways that diverge from those characteristic of traditional forms of research (qualitative as well as quantitative), and/or to them being mixed in with or subordinated to other considerations.

The problem with much of this criticism of epistemic criteria is that we are presented
with contrasting, old and new, positions as if there were no middle ground. Furthermore, the irony is that the radical critique of foundationalist epistemology inherits the latter’s definition of ‘knowledge’. Foundationalists define ‘knowledge’ as being absolutely certain in validity. The critics show, quite convincingly, that no such knowledge is possible. But why should a belief only be treated as knowledge when its validity is absolutely certain? There is a third influential tradition of philosophical thinking, fallibilism, that is at odds with both foundationalism and relativism/scepticism. This position can be found in the writings of some contemporaries of Descartes, such as Mersenne, in the work of pragmatists like Peirce and Dewey, and in the philosophy of Wittgenstein. From this point of view, while all knowledge claims are fallible – in other words, they could be false even when we are confident that they are true – this does not mean that we should treat them as all equally likely to be false, or judge them solely according to whether or not they are validated by our own cultural communities. While we make judgements about likely validity on the basis of evidence that is itself always fallible, this does not mean either that validity is the same as cultural acceptability or that different cultural modes of epistemic judgement are all equally effective. Furthermore, in the normal course of making sense of, and acting in, the world we do not (and could not) adopt those assumptions[10].

Where the sceptical/relativist position challenges the claims of science to superior knowledge, the fallibilist position does not do this, although it insists on a more modest kind of authority than that implied by foundationalism. It points to both the power of, and the limits to, scientific knowledge (Haack 2003). The normative structure of science is designed to minimize the danger of error, even though it can never eliminate it. Moreover, while science can provide us with knowledge that is less likely to be false than that from other sources, it cannot give us a whole perspective on the world that can serve as a replacement for practical forms of knowledge. Nor, in the event of a clash between the latter and scientific findings, can it be assumed that science must always be trusted. From this point of view, science, including social science, becomes a more modest enterprise than it was under foundationalism. But, at the same time, the specialized pursuit of knowledge is justified as both possible and desirable. By contrast with relativist and postmodernist positions, fallibilism does not reduce the task of social science to challenging dominant claims to knowledge or celebrating diverse discourses. Nor is it turned into a practical or political project directly concerned with ameliorating the world.

From this point of view, then, epistemic assessment of research findings is not only possible but is also the most important form of assessment for research communities. Moreover, while judgements cannot be absolutely certain, they can vary in the extent to which we are justified in giving them credence. In my view, it also follows from this position that the findings from qualitative research should be subjected to exactly the same form of assessment as those from quantitative studies, albeit recognizing any differences in the nature of the particular knowledge claims being made and in the particular methods employed.

OTHER RECENT DEVELOPMENTS

Within the last decade there has been a revival of older, positivist ideas about the function and nature of social research, and about how it should be assessed. With the rise of what is often referred to as the new public management in many Western and other societies (Pollitt 1990; Clarke and Newman 1997), along with the growing influence of ideas about evidence-based policy-making and practice, there have been increasing pressures for the reform of social research so as to make it serve the demands of policy and practice more effectively. These pressures have been particularly strong in the field of education, but are also increasingly to be found elsewhere[11]. The task of research,
ASSESSING VALIDITY IN SOCIAL RESEARCH 49
from the viewpoint of many policy-makers today, is to demonstrate which policies and practices ‘work’, and which do not; and this has led to complaints that there is insufficient relevant research, and that much of it is small-scale and does not employ the kind of experimental method that is taken to be essential for identifying the effects of policies and practices. To a large extent, this attitude reflects the fact that evidence-based practice has its origins in the field of medicine, where randomized, controlled trials are common[12].

At the same time, there have been attempts on the part of some qualitative researchers to show how their research can contribute to evidence-based policy and practice, and also to specify the criteria by which qualitative studies can be judged by ‘users’. For example, in the UK two sets of assessment criteria for qualitative research have recently been developed that are specifically designed to demonstrate how it can serve policy-making and practice. The first was commissioned by the Cabinet Office in the UK from the National Centre for Social Research, an independent research organization (Spencer et al. 2003). These authors provide a discussion of the background to qualitative research, and of previous sets of criteria, before outlining a lengthy list of considerations that need to be taken into account in assessing the quality of qualitative research. They take great care in making clear that these should not be treated as a checklist of criteria that can give an immediate assessment of quality. However, perhaps not surprisingly, the authors have been criticized, on one side, for producing too abstract a list of criteria and, on the other, for providing what will in practice be used as a checklist, one which distorts the nature of qualitative research[13].

Another recent set of criteria for assessing research emerged in the field of education (Furlong and Oancea 2005). While it was not restricted to qualitative research, being concerned with ‘applied and practice-based educational inquiry’ more generally, the authors clearly had qualitative work particularly in mind. This venture had rather different origins from the first, and differs significantly in character. The project was commissioned by the UK Economic and Social Research Council, and the background here was very much recent criticism of educational research for being of poor quality and little practical relevance. At the same time, a prime concern of the authors seems to have been to provide criteria for use in the upcoming Research Assessment Exercise (RAE) in the UK, a process that is used to determine the distribution of research resources across universities. A longstanding complaint on the part of some educational researchers has been that the RAE uses traditional scholarly criteria of assessment that discriminate against applied work directed at practitioner audiences. And there has been much discussion of how this alleged bias can be rectified. In addressing the problem, Furlong and Oancea produce four sets of criteria. The first is epistemic in character, being concerned with issues of validity and knowledge development. More striking, however, are the other three sets of criteria: technical, practical, and economic. Here educational research is to be judged in terms of the extent to which it provides techniques that can be used by policy-makers or practitioners; the ways in which it informs, or could inform, reflective practice; and/or the extent to which it offers ‘added value’ efficiently[14].

There is an interesting parallel between the emphasis placed by Furlong and Oancea on non-epistemic criteria and the move, outlined earlier, on the part of some qualitative researchers to abandon epistemic criteria completely. While many of the latter are hostile to the pressure for research to serve evidence-based policy-making and practice (see, for instance, Lather 2004), there is what might be described as a ‘third way’ approach championed by some, notably those associated with the tradition of qualitative action research. This redirects the pressure on research for policy- and practice-relevance away from a positivist emphasis on the need for quantitative methods to demonstrate ‘what works’ towards a broader view of worthwhile forms of research and of the ways in which
it can shape practice. It is seen as playing a much more interactive and collaborative role, at least in relation to practitioners ‘on the ground’. Advocates of this sort of position, such as John Elliott, are as critical of ‘academic’ educational research as the advocates of the new positivism. Where they differ is in the kind of research they believe is needed to inform policy-making and practice (see Elliott 1988, 1990, and 1991; see also Hammersley 2003).

We can see, then, that besides divergent philosophical orientations between and among quantitative and qualitative researchers, equally important in shaping ideas about how social research should be assessed are views about its social function. In crude terms, we can distinguish four broad positions. First, there are those who see most social science research, especially that located in universities, as properly concerned exclusively with producing knowledge about human social life whose relevance to policy and practice is indirect, albeit not unimportant. Second, there are those who share the belief that social research must retain its independence, rather than being subordinated to policy-making or professional practice, but who regard the criteria of assessment as properly political, ethical, and/or aesthetic. For example, the task may be viewed as to ‘disturb’ or ‘interrupt’ conventional thinking in a manner that is not dissimilar to Socratic questioning, in its most sceptical form. Third, there are those who, while they see the purpose of social science very much as producing knowledge, insist that for this to be worthwhile it must have direct policy or practice implications: the task is to document what policies and practices ‘work’. Finally, there are those who doubt the capacity of social science to produce knowledge about the social world, in the conventional sense of that term, and who believe the task of social researchers is to work in collaboration with particular groups of social actors to improve or transform the world[15]. Clearly, which of these stances is adopted has major implications for the question of how research should be evaluated.

Another recent development that has important implications for assessing the validity of research findings is a growing movement among some groups of social scientists towards championing the integration of quantitative and qualitative methods (see Bryman 1988; Tashakkori and Teddlie 2003a). ‘Mixed methods’ research is promoted as capitalizing on the strengths of both approaches. And this movement raises at least two issues of importance in the present context. First, there is the question of what sort of philosophical framework, if any, should underpin mixed methods research, since this has implications for how findings should be assessed. After all, simply combining the various types of validity identified by both quantitative and qualitative researchers produces a formidable list (see Teddlie and Tashakkori 2003: 13). A number of alternative ways of formulating mixed methods research as a ‘third way’ have been proposed, from the idea of an ‘aparadigmatic’ orientation that dismisses the need for reliance on any philosophical assumptions at all to the adoption of one or another alternative research paradigm, such as pragmatism or ‘transformative-emancipatory’ inquiry (see Teddlie and Tashakkori 2003b). It should be noted, though, that the reaction of many qualitative researchers to mixed methodology approaches is that, in practice, they force qualitative work into a framework derived from quantitative method, of a broadly positivist character. And there is some truth in this.

A second issue raised by mixing quantitative and qualitative approaches concerns whether new, distinctive, criteria of assessment are required, for instance relating specifically to the effectiveness with which the different kinds of method have been combined. Here, as elsewhere, there is often insufficient clarity about the difference between assessing research findings, as against assessing the effectiveness with which particular research projects have been pursued, the value of particular methods, the competence of researchers, and so on. Moreover, there is also the question of whether combining quantitative and qualitative methods is always
desirable, and of whether talk about mixing the two approaches does not in effect embalm what is, in fact, too crude and artificial a distinction.

CONCLUSION

Clearly, the assessment of research findings is not a straightforward or an uncontentious matter. In this chapter I began by outlining the criteria usually associated with quantitative research, and noted serious problems with these: that there is often confusion about what is being assessed, and a failure to recognize differences in what is required depending upon the nature of the knowledge claim made and the particular research method used. In addition, I argued that it is not possible to have criteria in the strict sense of that term, as virtually infallible indicators of validity or invalidity. Judgement is always involved, and this necessarily depends upon background knowledge and practical understanding.

In the second half of the chapter, I considered the relativist and postmodernist views that are currently influential among many qualitative researchers. These deny the relevance of epistemic standards of assessment, in favour of an emphasis on political, ethical, or practical ones. I tried to show how this stems from a false response to the epistemological foundationalism that has informed much thinking about quantitative research. Instead, I suggested that what is required is a fallibilist epistemology. This recognizes that absolute certainty is never justified but insists that it does not follow either that we must treat all knowledge claims as equally doubtful or that we should judge them on grounds other than their likely truth.

Of course, discussion of these issues never takes place in a socio-cultural vacuum, and I outlined some recent changes in the external environment of social science research, in the US and the UK and elsewhere, which have increased demands that they demonstrate their value. I examined a couple of the responses to these pressures, in terms of attempts to develop criteria that should be used to assess the quality of qualitative research. Finally, I considered the implications of the growing advocacy of ‘mixed methods’ research, which in some respects is not unrelated to these external pressures.

We are a long way from enjoying any consensus among social scientists on the issue of how social research ought to be assessed. However, the differences in view cannot be mapped onto the distinction between quantitative and qualitative approaches, even though the argument is often formulated in those terms. It is essential to engage with the complexities of this issue if any progress is to be made in resolving the disputes.

NOTES

1 These commitments to reliability and measurement validity, and distinctions between types of validity, are spelled out in many introductions to social research. For a recent example, see Bryman 2001: 70–4. As Bryman indicates, the checking of reliability and validity in much quantitative research is rather limited, sometimes amounting to ‘measurement by fiat’.
2 Of course, there are many other issues that survey researchers take into account, not least non-response.
3 The different accounts produced over several years allocate measurement somewhat differently: see Hammersley 1991.
4 On the considerable variation in definitions of ‘reliability’ and measurement ‘validity’, see Hammersley 1987.
5 There are also value claims: evaluations and prescriptions. I am taking it as given that research cannot validate these on its own: see Hammersley 1997.
6 The last of these claims is controversial: there are those, particularly among commentators on historical explanation, who deny that explanations always appeal to theoretical principles. For a discussion of this issue, see Dray 1964.
7 For valuable recent accounts of Kuhn’s complex, and often misunderstood, position, see Hoyningen-Huene 1993, Bird 2000, and Sharrock and Read 2002.
8 For an extended account of a more moderate position, see Seale 1999.
9 Smith 1997 and 2004 distinguishes between his own relativist position and that of some postmodernists. However, the distinction is not cogent, in my view (Hammersley 1998). At the very least, there is substantial overlap between relativist and postmodernist positions.
10 For a sophisticated recent fallibilist account in epistemology, see Haack 1993.
11 On the history of these developments in the UK, see Hammersley 2002: chapter 1. On parallel changes in the US, see Feuer et al. 2002, Mosteller and Boruch 2002, and Lather 2004.
12 For these arguments, see, for example, Oakley 2000 and Chalmers 2003; see also Hammersley 2005.
13 See Kushner 2004; Murphy and Dingwall 2004; Torrance 2004. One critique has dismissed it as a ‘government-sponsored framework’ (Smith and Hodkinson 2005: 928–9).
14 Hammersley 2006 provides an assessment of the case put forward by Furlong and Oancea for these criteria.
15 These four positions are intended simply to map the field; many researchers adopt positions which combine and/or refine their elements.

REFERENCES

Bird, A. (2000) Thomas Kuhn, Princeton, Princeton University Press.
Bryman, A. (1988) Quantity and Quality in Social Research, London, Allen and Unwin.
Bryman, A. (2001) Social Research Methods, Oxford, Oxford University Press.
Campbell, D. T. (1957) ‘Factors relevant to the validity of experiments in social settings’, Psychological Bulletin, 54, 4, pp. 297–312.
Campbell, D. T. and Stanley, J. (1963) ‘Experimental and quasi-experimental designs for research on teaching’, in N. L. Gage (ed.) Handbook of Research on Teaching, Chicago, Rand McNally.
Chalmers, I. (2003) ‘Trying to do more good than harm in policy and practice: the role of rigorous, transparent, up-to-date evaluations’, Annals of the American Academy of Political and Social Science, 589, pp. 22–40.
Clarke, J. and Newman, J. (1997) The Managerial State, London, Sage.
Cook, T. D. and Campbell, D. T. (1979) Quasi-Experimentation: Design and Analysis Issues for Field Situations, Boston, MA, Houghton-Mifflin.
Denzin, N. K. and Lincoln, Y. S. (2005) ‘The art and practices of interpretation, evaluation, and representation’, in Denzin, N. K. and Lincoln, Y. S. (eds.) Handbook of Qualitative Research, 3rd edition, Thousand Oaks, CA, Sage.
Dray, W. (1964) Philosophy of History, Englewood Cliffs, NJ, Prentice-Hall.
Elliott, J. (1988) ‘Response to Patricia Broadfoot’s presidential address’, British Educational Research Journal, 14, 2, pp. 191–4.
Elliott, J. (1990) ‘Educational research in crisis: performance indicators and the decline in excellence’, British Educational Research Journal, 16, 1, pp. 3–18.
Elliott, J. (1991) Action Research for Educational Change, Milton Keynes, Open University Press.
Feuer, M. J., Towne, L. and Shavelson, R. J. (2002) ‘Scientific culture and educational research’, Educational Researcher, 31, 8, pp. 4–14.
Furlong, J. and Oancea, A. (2005) Assessing Quality in Applied and Practice-focused Educational Research: A Framework for Discussion, Oxford, Oxford University Department of Educational Studies.
Goetz, J. P. and LeCompte, M. D. (1984) Ethnography and Qualitative Design in Educational Research, Orlando, Academic Press.
Haack, S. (1993) Evidence and Inquiry, Oxford, Blackwell.
Haack, S. (2003) Defending Science – Within Reason, Amherst, NY, Prometheus Books.
Hammersley, M. (1987) ‘Some notes on the terms “validity” and “reliability” ’, British Educational Research Journal, 13, 1, pp. 73–81.
Hammersley, M. (1991) ‘A note on Campbell’s distinction between internal and external validity’, Quality and Quantity, 25, pp. 381–7.
Hammersley, M. (1997) Reading Ethnographic Research, 2nd edition, London, Longman.
Hammersley, M. (1998) ‘Telling tales about educational research: a response to John K. Smith’, Educational Researcher, 27, 7, pp. 18–21.
Hammersley, M. (2002) Educational Research, Policy-making and Practice, London, Paul Chapman.
Hammersley, M. (2003) ‘Can and should educational research be educative?’, Oxford Review of Education, 29, 1, pp. 3–25.
Hammersley, M. (2005) ‘Is the evidence-based practice movement doing more good than harm?’, Evidence and Policy, 1, 1, pp. 1–16.
Hammersley, M. ‘Troubling criteria: a critical commentary on Furlong and Oancea’s framework for assessing educational research’, forthcoming British Educational Research Journal, 2008.
Hammersley, M. and Atkinson, P. (2007) Ethnography: Principles in Practice, 3rd edition, London, Routledge.
Hoyningen-Huene, P. (1993) Reconstructing Scientific Revolutions: Thomas S. Kuhn’s Philosophy of Science, Chicago, University of Chicago Press. (First published in German in 1989.)
Kuhn, T. S. (1970) The Structure of Scientific Revolutions, Chicago, University of Chicago Press.
Kushner, S. (2004) ‘Government regulation of qualitative evaluation’, Building Research Capacity, 8, May, pp. 5–8.
Lather, P. (1986) ‘Issues of validity in openly ideological research: between a rock and a soft place’, Interchange, 17, 4, pp. 63–84.
Lather, P. (1993) ‘Fertile obsession: validity after poststructuralism’, Sociological Quarterly, 34, pp. 673–93.
Lather, P. (2004) ‘This is your father’s paradigm: Government intrusion and the case of qualitative research in education’, Qualitative Inquiry, 10, pp. 15–34.
Lincoln, Y. S. and Guba, E. G. (1985) Naturalistic Inquiry, Beverly Hills, Sage.
MacNaughton, G. and Smith, K. (2005) ‘Transforming research ethics: the choices and challenges of researching with children’, in A. Farrell (ed.) Ethical Research with Children, Maidenhead, Open University Press.
Mosteller, F. and Boruch, R. (eds.) (2002) Evidence Matters: Randomized Trials in Education Research, Washington D.C., Brookings Institution Press.
Murphy, E. and Dingwall, R. (2004) ‘A response to “Quality in Qualitative Evaluation: a framework for assessing research evidence” ’, Building Research Capacity, 8, May, pp. 3–4.
Oakley, A. (2000) Experiments in Knowing: Gender and Method in the Social Sciences, Cambridge, Polity Press.
Pollitt, C. (1990) Managerialism and the Public Services, Oxford, Blackwell.
Reason, P. and Bradbury, H. (eds.) (2001) Handbook of Action Research: Participative Inquiry and Practice, London, Sage.
Seale, C. (1999) The Quality of Qualitative Research, London, Sage.
Sharrock, W. and Read, R. (2002) Kuhn: Philosopher of Scientific Revolution, Cambridge, Polity.
Smith, J. K. (1997) ‘The stories educational researchers tell about themselves’, Educational Researcher, 26, 5, pp. 4–11.
Smith, J. K. (2004) ‘Learning to live with relativism’, in H. Piper and I. Stronach (eds.) Educational Research: Diversity and Difference, Aldershot, Ashgate.
Smith, J. K. and Deemer, D. K. (2000) ‘The problem of criteria in the age of relativism’, in N. K. Denzin and Y. S. Lincoln (eds.) Handbook of Qualitative Research, 2nd edition, Thousand Oaks, Sage.
Smith, J. K. and Hodkinson, P. (2005) ‘Relativism, criteria, and politics’, in Denzin, N. K. and Lincoln, Y. S. (eds.) Handbook of Qualitative Research, 3rd edition, Thousand Oaks, CA, Sage.
Spencer, L., Ritchie, J., Lewis, J. and Dillon, L. (2003) Quality in Qualitative Evaluation: A Framework for Assessing Research Evidence, London, Cabinet Office. Available at: http://www.policyhub.gov.uk/docs/qqe_rep.pdf (Accessed 13.02.2006).
Suppe, F. (ed.) (1974) The Structure of Scientific Theories, Chicago, University of Chicago Press.
Tashakkori, A. and Teddlie, C. (eds.) (2003a) Handbook of Mixed Methods in Social and Behavioral Research, Thousand Oaks, CA, Sage.
Teddlie, C. and Tashakkori, A. (2003b) ‘Major issues and controversies in the use of mixed methods in the social and behavioral sciences’, in Tashakkori and Teddlie (eds.) Handbook of Mixed Methods in Social and Behavioral Research, Thousand Oaks, CA, Sage.
Torrance, H. (2004) ‘ “Quality in Qualitative Evaluation” – a (very) critical response’, Building Research Capacity, 8, May, pp. 8–10.
5
Ethnography and Audience
Karen Armstrong
The problem of audience appears again during the writing process. Whenever ethnographers write up their data they engage in an act of recontextualization (Duranti 1986: 244) by setting contextual clues, always selective, for the intended audience to judge the analysis. In anthropology, the ethnography (or monograph) is considered to be the account that pulls together the bits and pieces of data into a single whole. An ethnography is, by definition, comparative; it should address central questions about the nature of human existence through a specific society and its cultural system. Therefore, the audience is assumed to be both specific (academic, place, etc.) and general in the sense that anyone may engage with the broader questions addressed.

No good ethnography is self-contained. Implicitly or explicitly ethnography is an act of comparison. By virtue of comparison ethnographic description becomes objective. Not in the naïve positivist sense of an unmediated perception – just the opposite: it becomes a universal understanding to the extent it brings to bear on the perception of any society the conceptions of all the others (Sahlins 1996: 10).

There have been ongoing debates about the goals of ethnography. These debates are commonly related to changing historical conditions and the need for social scientists to analyze what is going on in the contemporary world. In the past, situations like colonialism generated the need for new theoretical and methodological approaches. It seems appropriate, now that we live in a world connected by the Internet, mobile phones, web cameras, extensive media coverage, and so on, that we should rethink problems of method as related to audience. This is especially true for anthropology since the ‘natives’ are professionals in many fields, including anthropology.

In the most general sense, anthropologically informed ethnography is based on long-term fieldwork, and participant observation in a society other than one’s own has been assumed and prioritized. Participant observation includes the assumption of a measure of fluency in the language of the society being studied, spending enough time among the people in order to know how they live, what they say about what they do, what they actually do, what they believe, and their system of valuation. The fieldworker may include archival and statistical data and discuss the influence of national and international organizations. Apart from these general procedures, it can be said that there is no distinct object of the anthropological fieldwork method (Faubion 2001: 39, my emphasis). What remains constant is the recurring problem of self and other: how do we know what we know, how do we assume to speak for others, and who is the audience being addressed? The first two issues have been addressed as problems of validity and representation; to address the third it is useful to begin by looking at the relation of theory to audience.

THEORY AND AUDIENCE

The sociologist, Arto Noro (2001, 2004), argues that there are three genres of sociological theory, each with an intended audience. One is general theory; theories of this type pose questions about how society in general is constituted and try to answer the questions. General theory is directed toward a scientific audience and aims for an interpretative synthesis by referring to earlier questions, which are readdressed to contemporary events. A second genre is research theory; this level consists of research projects that address or test the propositions of general theory and, in turn, provide material for general theory (2001: 1–2). As Noro points out, there is a significant relationship between these two. They lose their common ground only when research theory turns into administrative research or when general theory becomes philosophy. Research theory supplies material for general theory and is intended for a scientific audience; alternatively, it is directed toward specific social problems and is intended for instrumental use (for example, in forming social policy). Noro calls the third genre ‘Zeitdiagnose’; this is
56 THE SAGE HANDBOOK OF SOCIAL RESEARCH METHODS
theory that focuses on a diagnosis of the times we live in. Zeitdiagnose is directed toward a ‘group-We’ audience and intends to encourage ‘us’ to think about our situation and perhaps to change it accordingly.

Noro claims that Zeitdiagnose became popular in sociology in the 1980s and 1990s with books about risk society and modern identity (e.g. Beck 1992, 1994; Giddens 1991, 1992, among others)². The key characteristic of Zeitdiagnose is that it offers an insight, understanding or vision (Noro 2001: 5) about our own times, something we have an inkling about but cannot name without the synthesis provided by the author. Zeitdiagnose tends to be openly normative and political (ibid.). Such texts are intensely seductive because they tell us who ‘we’ are, although these theories cannot be used in the interpretation of empirical evidence because we would find in the material what the diagnoses have already named. As Noro says, the end result would be poor mimesis (2001: 11).

As the audience for ethnographic research becomes global and less contained, there can be problems with the goals of research theory and Zeitdiagnose theory. These issues were anticipated in early discussions of the object of ethnographic research and the use of analytic concepts.

THE SCIENTIFIC AUDIENCE: EARLY STUDIES

Ethnographies tend to fall into the above classification of theories. An early example of an ethnography framed by general theory is Seasonal Variation of the Eskimo by Marcel Mauss in collaboration with Henri Beuchat (1979[1950]). It is based on field research by Beuchat and others and organized around Emile Durkheim’s concept of social morphology to discuss the influence of seasonal variation on both social and cultural elements in Eskimo society and to propose that there may be similar variation in other societies. As with any good ethnography, details about culture (house styles, naming practices, hunting, etc.) and society (winter and summer social groups) are presented to argue the larger comparative point of social morphology.

The concept of culture was the general theory in American anthropology of the same period where cultural relativism focused on breaking the evolutionary model, a move which was especially relevant in the context of American society. Franz Boas and his students typically made visits to the field to collect cultural data and material artifacts. Much of their work was based on textual material collected from various North American Indian groups in order to record so-called aboriginal culture. This has been labeled ‘salvage ethnography’ because they were aware that most of the groups had been decimated by war with the American government at the end of the nineteenth century and they understood that what they were witnessing had been influenced and broken down by historical events. Nevertheless, they were looking at these groups to identify specific culture traits and their local patterning, not as an evolutionary process or a comparison of the primitive with the civilized.

One student of Boas, Paul Radin, did extensive fieldwork among the Winnebago for nearly 50 years and wrote a book on the method of studying culture (1987[1930]). Radin did not deny history, but he denied comparisons of cultures as being more or less advanced in direct or implicit comparison to ‘us.’ He was critical, therefore, of those who followed Malinowski’s universalistic and functional style of description of ‘primitives’: ‘… whereas I see no necessity for proving that culture is culture, they apparently feel that it is incumbent upon them to laboriously demonstrate that, among primitive people, we are dealing with human beings who think as we do, feel as we do, and act as we do’ (Radin 1987[1930]: 257). Radin’s method argued for a study of culture based on ‘reconstruction from internal evidence.’

The task, let me insist, is always the same: a description of a specific period, and as much of the past and as much of the contacts with other cultures as is necessary for the elucidation of the particular period. No more. This can be
done only by an intensive and continuous study of a particular tribe, a thorough knowledge of the language, and an adequate body of texts; and this can be accomplished only if we realize, once and for all, that we are dealing with specific, not generalized, men and women, and with specific, not generalized, events. (ibid. 184–85)

Radin was critical of the categories imposed by universalistic theory, although he recognized that his method was similar to that of Marcel Mauss: ‘In elucidating culture we must begin with a fixed point, but this point must be one that has been given form by a member of the group described, and not by an alien observer’ (ibid. 186). To demonstrate his method, Radin uses one Winnebago man’s (John Rave’s) account of his conversion to the Peyote Cult. Radin traces themes in the narrative and, along with other native texts and his own observations, Radin shows how Rave’s account is similar to and different from previous Winnebago practices. Radin thus analyzes how Rave could change his beliefs and still remain within the general Winnebago cultural framework. While the analysis remains self-contained (about the Winnebago), the method of eliciting native accounts of specific events and tracing how certain themes are replicated remains valid today.

Boas and his students often commented on issues in American society, especially about race or in their role as experts on Native American society. It has always been the practice of social science research to comment on contemporary issues; however, such comments are not the same as Zeitdiagnose when they are based on empirical research and linked to general theory (Noro 2001). A notable exception, Margaret Mead, came close to Zeitdiagnose in her popular writing and in her widely read ethnography, Coming of Age in Samoa: A Psychological Study of Primitive Youth for Western Civilisation (2001[1928]). This book – not written strictly for a scientific audience – caused furor inside and outside academic circles. Mead used her ethnographic knowledge about Samoa as a basis for a critique of American culture, and wrote the book for an American general audience³. The book presents the transition from youth to adulthood in Samoan culture as being easy and without the stress and rebellion found in American society. By using the contrast of Samoan culture, Mead proposed that the stress experienced in American adolescence had social and cultural causes which might be altered (see the discussion in Marcus and Fischer 1986; Stocking 1992). A friend of Mead, Edward Sapir, immediately complained that a student of culture cannot use what he knows as medicine for society (Handler 1986). Mead’s book has generated enormous commentary, the most famous being numerous books and articles written by anthropologist Derek Freeman to disclaim the validity of Mead’s ethnographic method and data (e.g. Freeman 1983, 1999, 2001). George Marcus and Michael Fischer argue that Mead failed because cultural juxtapositioning between ‘us’ and ‘them’ requires equal ethnography among ‘us’ (1986: 138). In the same period, Mead also wrote an ethnographic report on Samoa for the Bishop Museum in Hawai’i: Social Organization of Manu’a (1969[1930]). This is a standard research report about social organization (chiefs, titles, land arrangements) that does not attract much attention apart from an audience of anthropologists.

Coming of Age in Samoa reached an audience beyond the US. It remains significant in Samoa today, especially in American Samoa where Mead did fieldwork on the island of Ta’u in Manu’a. And, because texts extend beyond the moment of their production (Ricoeur 1991), Coming of Age continues to frame the meaning of anthropology in Samoa and of Samoa; my presence there in 2005 generated discussions of the book and the purpose of anthropology. Coming of Age in Samoa is cited by the American Samoan representative to Congress, Faleomavaega Eni Hunkin, as an insult to Samoan culture (Tavita 2004). He is upset by Mead’s categorization of Samoa as a primitive society and by her discussion of Samoan sexuality. Perhaps more importantly, Manu’a was at one time the sacred center of an elaborate hierarchical culture and Mead does not recognize this in the popular Coming of
Age (although she does recognize it in Social Organization of Manu’a). Faleomavaega feels that the world continues to get the wrong image of Samoa because, he claims, the book is taught in introductory courses of anthropology at American universities. Derek Freeman does not escape criticism either; he is accused of depicting Samoan culture as being excessively violent. Both anthropologists are criticized for their reduction of Samoan culture. Samoan culture is not represented according to Samoan norms; that is, the Samoan voice is missing. In the research and writing process Samoans were typed by anthropological categories and Samoans today reject the gloss.

Edward Sapir was a contemporary of Radin and Mead, also a student of Boas, and a linguist. Sapir noted that all people use general categories as a way of making sense of a huge amount of personal experience. Because of this general tendency, he was wary of concepts (such as ‘motivation’) because they are generalizations that are imposed on our perception of objects and events, useful for talking about, in an analogical sense, the actual phenomena but removed from the phenomena (Preston 1966: 1115). Concepts tend to become endowed with what Sapir called a ‘peculiar quality of self-determination’ (ibid. 1115). Social scientists tend to prefer concepts and categories because they offer precision and clarity and in fact Sapir was criticized for his lack of theory and refusal of categories (ibid. 1105). However, when the categories are given prime importance, the researcher tends to use people selectively and only insofar as they provide new material for the categories. Sapir insisted that this was wrong, that ‘the categories must be distinctively meaningful in and therefore derived from, the particular milieu, so that they will accurately describe the milieu’ (ibid. 1120). For Sapir, the locus of culture is in individuals and the experience of actual individuals brings the researcher closest to the inherent structure of culture. He demonstrated this in his analyses of life histories (Sapir 1922, 1995[1938]). As Sapir defined method, you have to know the overt forms (census materials, economic flow, geography, language, material culture, etc.) as well as how the forms are lived by individuals, which Sapir called the analysis of variation (Preston 1966: 1127). The cultural relativism of Radin and Sapir proposed a method that was based on internal evidence in order to avoid imposing categories on other cultures, with a focus on engaged individuals; this method was later criticized as being too particularistic.

Regarding audience, Radin and Sapir preferred to rely on texts, which turn out to have a longer ‘shelf life’ than concepts. The present-day Winnebago, or the Tikopians described so thoroughly by Raymond Firth, do not care about the concepts used by the anthropologists or their interpretations. The ‘native’ audience today is interested in these old ethnographies for their descriptive value as historical documents; they give them their own interpretation.

In another school, methods were developed to break out of the particularistic view and to address contemporary issues through general theory. Beginning in the 1940s, the so-called Manchester School of anthropology, headed by Max Gluckman, defined what became called situational analysis (a slightly different version was called social drama by his student, Victor Turner). Most of these anthropologists were working in Africa and trying to develop theories and methods appropriate for analyzing colonial relations. Gluckman (1958[1940]) insisted that Europeans and Africans had to be seen as a total system, not as isolated groups. This could be done through the analysis of situations or events where problems would become apparent; the concept of ‘social fields’ was used to recognize the unbounded nature of social relations. Victor Turner used this method to show the symbolic importance of events – for example, rituals or conflicts – for individual participants. In four volumes about the Ndembu (cf. 1957, 1962, 1967, 1968) Turner demonstrates, through the personal stories of named individuals, how cultural categories sustain a given social structure through an intermingling of meanings. For Turner, a social drama, which is often a moment of
conflict, reveals a ‘moment of translucence’ when the positions and conflicts among the involved individuals become apparent. Turner (1957) concluded that changes brought by colonial rule exacerbated the internal contradictions in Ndembu social structure. Whereas the contradictions caused by residence and descent could be tolerated or resolved before, they often collapsed into unrestrained conflict under colonial rule. Along with his analysis of conflict, Turner looked for replication in the symbols and concepts used by individuals in Ndembu society. Key (or root) metaphors were defined by Turner as those that occur at different times in different situations to structure meaning.

In a review of the Ndembu work, Mary Douglas claimed that Turner solved the problem of validation once and for all, although she worried about what the named individuals would think about their stories being public. ‘It should never again be permissible to provide an analysis of an interlocking system of categories of thought which has no demonstrable relation to the social life of the people who think in these terms’ (Douglas 1970: 303). Whereas Sapir and Radin focused on culture as a system, Turner linked culture to practice, to the concept of society, and to universal questions. All these authors were attempting to address broader contemporary issues – the decimation of North American indigenous groups and the disruption caused by colonialism in Africa. The intended audience consisted of academics and possibly administrators. Mary Douglas’ comment about named Ndembu individuals seems to anticipate that the audience was not going to be so contained in the future.

CRITICAL ETHNOGRAPHY

The emergence of a self-consciousness regarding ‘self and other’ in the last quarter of the twentieth century altered the way anthropologists and sociologists write ethnography and deal with data and representations of others (see, for example, Gubrium and Holstein 1997). ‘Critical ethnography’ introduced reflexivity about what ‘we’ do and cast a critical eye on writing practices, fieldwork topics and research sites. In anthropology, the emphasis has been on the production of ethnography, especially the relation between the fieldworker and those being researched. Two popular books, Paul Rabinow’s (1977) Reflections on Fieldwork in Morocco, and The Headman and I by Jean-Paul Dumont (1978), opened up the reflexive question in the US about the relation between the researcher and his or her informants⁴. These were followed by Writing Culture (Clifford and Marcus 1986) and Anthropology as Cultural Critique (Marcus and Fischer 1986). Writing Culture was a collection of articles that questioned how the process of writing established a self/other relationship in ethnographic description. It questioned the notion of ethnographic authority and how the ‘I’ of the anthropologist had fashioned the ‘we’ or the ‘other’ of the ‘natives.’ Anthropology as Cultural Critique called for a more politically active engagement of anthropologists in the issues of their times. Following these, anthropology was challenged to drop the ‘savage slot’ and to undertake critical research about the contemporary world (Trouillot 1991: 40). These works, and many others, opened an experimental current that continues today (e.g. Carucci and Dominy 2005).

As a result there have been various efforts in writing, such as teamwork between the anthropologist and the interlocutor in order to produce ‘dialogue’ or ‘polyphony,’ with different measures of success (Faubion 2001; Marcus and Mascarenhas 2005). Topics have broadened to include the contemporary world of elites, corporations, medicine, law and environmental issues, to name a few. Along with the focus on new topics, George Marcus (1998) talks about the ‘complicity’ of the fieldworker regarding his or her relation with the events or people being studied while others talk about ‘emergent practices’ (Mauer 2005: 1). Like Zeitdiagnose, these authors aim to study issues that they are involved in and to take a political, often a moral,
position in order to describe what these times are like.

The valuation of the sites of research has also been redefined. Akhil Gupta and James Ferguson (1997) critiqued the place orientation of anthropology and called for research that was not so place dependent. Since we live in a world of transnational flows, refugees, and exiles, a researcher should adjust his or her methods appropriately. For George Marcus (1998), multi-sited ethnography recognizes that individuals in today’s world are on the move; the anthropologist therefore tracks these individuals and their networks. James Faubion (2001: 52) notes that, although this type of research is easily justified, it has remained largely an ideal model since it is hard to find the time or funding to do it in practice. And even if funded and attempted, Ghassan Hage (2005) found that there are many pitfalls, primarily the exhaustion of the ethnographer and the unhappy expectations of reciprocity by individuals who expect him to take them seriously, not just to drop in for a short visit. Faubion suggests an alternative, that fieldwork might ‘proceed cross-sectionally and sequentially,’ and ends his review of American anthropology strongly in the spirit of Zeitdiagnose: ‘modernity is … many things; and it is up to the cultural (and social) fieldworker to explore, describe and diagnose at once what such a multi-scalar assemblage of artifacts is, or what it might be’ (Faubion 2001: 52).

The resultant political positioning generally puts the weight on categories or concepts like ‘power,’ ‘hybridity,’ and ‘race,’ and uses individuals to fill in the story. One example is a book about multi-sited memory and identity in the border region of Trieste that ‘breaks with relativism’ because it is ‘not a standard ethnography of empathy’ but ‘an ethnography of complicity’ (Ballinger 2003: 7). The author analyses how average citizens make sense of history by assimilating the events of their lives into long-standing narratives that ‘are legitimated or authorized precisely in moral terms’ (2003: 9). Asked to tell about the events of 1943–45, the speakers are said to draw the listener (anthropologist) in, so that the anthropologist shares their complicity in the violent events being described, while at the same time the narrators deny their own complicity (2003: 147). The author reports to a professional audience and determines the truth of the narratives. But, were life story narratives appropriate for talking about the tensions of state formation? It is likely that a different genre or domain was being addressed by her interlocutors and here is where, again, the problem of audience appears. Despite the attention paid to writing, topics and place, critical ethnography has not addressed adequately the relation of theory to method or the issues of ethnographic competence and intended audience.

ETHNOGRAPHIC COMPETENCE

The subject matter for anthropology has always been global but today its institutions, practitioners and audiences are also global (Lederman 2005: 321). The same can be said for the other social sciences and this expanded situation has implications for the methods used as well as for reception. However, with the exception of linguistic anthropology, very little of the discussion about fieldwork and representation addresses the need for new methods to interpret the data (e.g. Briggs 1986; Silverstein and Urban 1996). Although critical ethnography – and taking a political and moral stand – is often the goal, how do we know – and can we know? – the truth and intentionality intended by our interlocutors?

The move to critical ethnography defined privileged sites and privileged topics with sometimes unanticipated results. For example, the site of Asale Angel-Ajani’s research reveals a preference for certain sites, the problem of speaking for someone else, and the problem of audience. Angel-Ajani’s research with women prisoners in Italy, most of whom were from Africa, put her in the position of listening to dramatic testimony of chaos and violence, where what the prisoners say often does not seem to be ‘really real.’ She argues
work between the anthropologist George Marcus and a Portuguese nobleman, Fernando Mascarenhas. Their email exchanges are reproduced with little editing and no interpretation as an example of the ‘shifting “politics” between the tradition of letters in Portugal and the tradition of interviewing in anthropology’ (Marcus and Mascarenhas 2005: xv). The intended audience remains narrow: an academic audience interested in the study of elites or the general problems of ‘the presentation of ethnography expressed through the relations that produce it’ (ibid. xvi).

Another possibility is that one’s ethnographic competence will be used by the subjects of the study for their own purposes. This often occurs when the researcher has worked in an area over a long period of time. Glenn Petersen has done research in Micronesia for 30 years, but as he notes, ‘since Micronesians know about Micronesia, they have neither need for nor much interest in my ethnography; they already know about themselves, to put it simply’ (2005: 312). While his writing gave him ethnographic authority with a scholarly public (mostly outside Micronesia), the Micronesians put more value on his competence, that is, about how they could make use of his outside experience. Competence means taking into account all the ‘messy’ parts: disagreements, the tensions that link hierarchy and equality, and the discrepancies of everyday experience (Petersen 2005: 315). For Petersen, a good ethnographer knows about life as the individuals in the community experience it, and therefore knows something about the effect of cultural contradictions on their lives (ibid. 316). Competence is gained by recognizing the complexities, not glossing them over with general concepts. This was, after all, the point of practice theory, when Pierre Bourdieu quoted Jean-Paul Sartre: ‘Words wreak havoc when they find a name for what had up to then been lived namelessly’ (Bourdieu 1977: 170). Practice brings the contradictions to the surface, as Gluckman recognized; practice theory recognizes the political implications of categorization and internal critique (Kapferer 1997: 20). Naming creates authority; competence is the ability to live according to local systems of significance.

CONCLUSION: ACCOUNTING FOR AUDIENCE

Accounting for an expanded audience is a measure of the goals of the research and whether it addresses problems – even if unwittingly – defined by interests or categories that frame the results. If the ethnographic method aims to study every possible group and site, the goal of the research is a critical issue. One danger is that ethnography becomes a form of spying; another is that it reproduces dominant interests and discourses. The question of dominant interests is relevant in Finland, for example, where much of the research funding comes from the state and where the state often determines (beforehand) the topics that it will fund. Research theory, as defined earlier, can address two audiences: a scientific or an administrative audience. In many cases, research questions are designed for topics about which the state needs information (such as prison populations, area studies or Islam). The researcher in these cases is defined as an expert, despite the fact that expert predictions have proven to be unreliable and, ultimately, unaccountable for their errors (Menand 2005). Even when one avows to be critical – as in critical ethnography in the US – the research questions and results may unwittingly replicate central problems in American society (power, race, ethnicity, gender) in other places if one does not listen carefully to what is being said within the context of another social setting. This is what Louis Dumont meant when he warned that anthropology should not be subjected to non-anthropological concerns (Dumont 1986). Dumont argued that the proper study of society was based on enriching general theoretical questions through detailed ethnography in order to determine the valuations that distinguish one research context from another.

Zeitdiagnose has its own pitfalls. Because it is aimed at identity audiences, it is often based on fieldwork ‘at home’ and written for a defined ‘we.’ Since the content is already known, the only novelty is in the production, in the way the argument is written (Siikala 2004: 202). It is inevitable that certain identity audiences will have priority over others depending on the location and interests of the major publishing houses. As ethnographic texts become accessible – especially through the Internet – they are not tied to the frame of academic judgment or to a particular ‘we.’ If Zeitdiagnose defines a ‘we’ it runs the risk of excluding others since any ‘we’ implies a ‘not-we’ (Urban 1996).

General theory is written for a global scientific audience. When Marshall Sahlins states that comparison is at the heart of ethnography he is talking about general theory, not the comparison of ‘these to those.’ At the level of general theory, broad questions are addressed concerning the nature of society, the relationship of individuals to social structures, the way reciprocity creates social relations, the processes of social change, etc., and are argued with detailed ethnographic data. The questions can be revisited and revised in all sites as historical changes affect the nature of society and social relations. Ethnographies like The Fame of Gawa by Nancy Munn (1992[1986]), Marshall Sahlins’ Anahulu (1992) or Feast of the Sorcerer by Bruce Kapferer (1997) are explorations of general questions such as (respectively) value and reciprocity, cosmology and contact between different cultural orders, and sorcery’s relation to the conditions of human existence. These questions can be explored anew in new sites, with new data, according to new circumstances, because no one instance of a phenomenon accounts for all its dimensions (Kapferer 1997: 302).

For example, when Bruce Kapferer writes about sorcery in Sri Lanka he breaks the category expectations of sorcery, which are ‘deeply engaged in the very aims and methodology of anthropology,’ while at the same time avoiding the ‘dark cave of methodological relativism’ (1997: 11, 13). Kapferer makes an ethnographically informed argument about the general possibilities of sorcery – it is directed to the contradictions and discordances of life worlds – while acknowledging distinctions in Sri Lankan practices (ibid. 11, 15). Sri Lankan sorcery is not another example of exotic otherness; sorcery is a practical discourse about ‘human-generated social and political realities,’ part of the general problematic of ‘the alienating and constituting forces of power’ (ibid. 7, 303). Kapferer breaks with the category because categories structure the interpretation, as Sapir warned. At the same time, Kapferer avoids the particularistic view of cultural relativism and the moral positioning of Zeitdiagnose. Sorcery is not analyzed to determine the truth about violence and power; rather, it demonstrates the anguish of human beings in a social and political world (ibid. 25). When ethnography addresses general questions the audience is ‘human beings’ and there is the possibility to debate and disagree. The intent is relevance not truth; thus, it allows the possibility for a voice (response) for a global audience.

The move to critical ethnography opened the question of audience. Since that time, information has become more widely available, making the problem of audience more pronounced in all aspects of the research project. So long as the audience was primarily a scientific one, there were guidelines about how the analysis should be read and judged – often for the way it addressed problems within an academic discipline. However, it is less likely today that the audience will be so narrow; in fact, it is quite likely that the audience will be any number of people with an interest in the place, the topic, or for many other reasons. It means that one’s writing is read increasingly by ‘an undisciplined audience’ (Lederman 2005: 323). We are faced, therefore, with the situation where we collect data from a variety of people who themselves have a variety of interests, and publish our analyses in a variety of sites for a variety of readers, each of whom brings his or her own interests to the text. The text always escapes the author. The work produced will
not be read conclusively; it will be read for its relevance by readers who assign
meaning to it according to their own valuations.

NOTES

1 A caveat is necessary here: the ‘native’ audience is not necessarily a recent
phenomenon. It has been common in Finland for a long time for the general public to
read ethnology and folklore texts, among others, about themselves. However, while the
audience is ‘native,’ the writing is among ‘insiders’ and does not raise the issue of
‘self and other’ in the same way as when the researcher is from another culture.
2 These works are part of ‘reflexive modernity’ and, in fact, ‘modernity’ is marked by
reflexivity. The implications of how the information flow makes the world ‘modern’ and
‘reflexive’ on an institutional level, and the impact on anthropology, are discussed
by John Knight (1992).
3 There are no footnotes or references, as would be expected in scientific writing,
although there is an explanation of the methodology in an appendix.
4 The reflexive turn in American anthropology happened in the context of Project
Camelot and the Vietnam War. Both events opened debates about the purpose of
anthropological research: was it to supply information for the CIA and the US
military? Rabinow refers to the American political context in the introduction to
Reflections on Fieldwork in Morocco.

REFERENCES

Angel-Ajani, Asale 2004 ‘Expert Witness: Notes Toward Revisiting the Politics of Listening.’ Anthropology and Humanism 29(2): 133–144.
Ballinger, Pamela 2003 History in Exile: Memory and Identity at the Borders of the Balkans. Princeton: Princeton University Press.
Basso, Keith 1996 Wisdom Sits in Places: Landscape and Language among the Western Apache. Albuquerque: University of New Mexico Press.
Beck, Ulrich 1992 Risk Society: Towards a New Modernity. London: Sage.
Beck, Ulrich 1994 Ecological Politics in the Age of Risk. Cambridge: Polity.
Becker, Alton and Bruce Mannheim 1995 ‘Culture Troping: Languages, Codes, and Texts.’ In, The Dialogic Emergence of Culture, edited by Bruce Mannheim and Dennis Tedlock. Urbana: University of Illinois Press, pp. 237–252.
Bourdieu, Pierre 1977 Outline of a Theory of Practice. Cambridge: Cambridge University Press.
Brenneis, Donald 1987 ‘Talk and Transformation.’ Man, New Series 22(3): 499–510.
Briggs, Charles 1986 Learning How to Ask. Cambridge: Cambridge University Press.
Carucci, Lawrence and Michèle Dominy 2005 ‘Anthropology in the ‘Savage Slot’: Reflections on the Epistemology of Knowledge.’ Anthropological Forum 15(3): 223–233.
Clifford, James and George Marcus (eds.) 1986 Writing Culture: The Poetics and Politics of Ethnography. Berkeley: University of California Press.
Douglas, Mary 1970 ‘The Healing Rite (review article).’ Man, New Series 5(2): 302–308.
Dumont, Jean-Paul 1978 The Headman and I. Austin: University of Texas Press.
Dumont, Louis 1986 Essays on Individualism: Modern Ideology in Anthropological Perspective. Chicago: University of Chicago Press.
Duranti, Alessandro 1986 ‘The Audience as Co-Author: An Introduction.’ In, Special Issue: The Audience as Co-Author, edited by Alessandro Duranti and Donald Brenneis. Text 6–3.
Duranti, Alessandro 1993 ‘Truth and Intentionality: An Ethnographic Critique.’ Cultural Anthropology 8(2): 214–245.
Fabian, Johannes 1995 ‘Ethnographic Misunderstanding and the Perils of Context.’ American Anthropologist 97(1): 41–50.
Faubion, James 2001 ‘Currents of Cultural Fieldwork.’ In, The Handbook of Ethnography, edited by Paul Atkinson, Amanda Coffey, Sara Delamont, John Lofland and Lyn Lofland. London: Sage Publications, pp. 39–59.
Freeman, Derek 1983 Margaret Mead and Samoa. Cambridge, MA: Harvard University Press.
Freeman, Derek 1999 The Fateful Hoaxing of Margaret Mead: A Historical Analysis of her Samoan Research. Boulder: Westview Press.
Freeman, Derek 2001 ‘Words have no Words for Words that are not True’: A Rejoinder to Serge Tcherkézoff.’ Journal of the Polynesian Society 4: 301–311.
Giddens, Anthony 1991 Modernity and Self-Identity. Stanford: Stanford University Press.
Giddens, Anthony 1992 The Transformation of Intimacy. Stanford: Stanford University Press.
Gluckman, Max 1958[1940] Analysis of a Social Situation in Modern Zululand. Manchester: Manchester University Press.
Graham, Laura 2005 ‘Image and Instrumentality in a Xavante Politics of Existential Recognition: The Public Outreach Work of Eténhiritipa Pimentel Barbosa.’ American Ethnologist 32(4): 622–641.
Gubrium, Jaber and James Holstein 1997 The New Language of Qualitative Method. New York: Oxford University Press.
Gupta, Akhil and James Ferguson (eds.) 1997 Culture, Power, Place: Explorations in Critical Anthropology. Durham: Duke University Press.
Hage, Ghassan 2005 ‘A not so Multi-Sited Ethnography of a not so Imagined Community.’ Anthropological Theory 5(4): 463–475.
Handler, Richard 1986 ‘Vigorous Male and Aspiring Female: Poetry, Personality and Culture in Edward Sapir and Ruth Benedict.’ In, Malinowski, Rivers, Benedict and Others, edited by George Stocking. Madison: University of Wisconsin Press, pp. 127–155.
Haviland, John 1991 ‘ “That Was the Last Time I Seen Them, and No More”: Voices Through Time in Australian Aboriginal Autobiography.’ American Ethnologist 18(2): 331–361.
Heidegger, Martin 1971 Poetry, Language, Thought, translated by Albert Hofstadter. New York: Harper and Row.
Kapferer, Bruce 1997 Feast of the Sorcerer: Practices of Consciousness and Power. Chicago: University of Chicago Press.
Knight, John 1992 ‘Globalization and the New Ethnographic Localities: Anthropological Reflections on Giddens’s Modernity and Self-Identity.’ Journal of the Anthropological Society of Oxford 23(3): 239–251.
Lederman, Rena 2005 ‘Challenging Audiences: Critical Ethnography in/for Oceania.’ Anthropological Forum 15(3): 319–328.
Marcus, George and Michael Fischer 1986 Anthropology as Cultural Critique. Chicago: University of Chicago Press.
Marcus, George 1998 Ethnography Through Thick and Thin. Princeton: Princeton University Press.
Marcus, George and Fernando Mascarenhas 2005 Ocasião: The Marquis and the Anthropologist, A Collaboration. Walnut Creek, CA: Alta Mira.
Maurer, Bill 2005 ‘Introduction to “Ethnographic Emergences”.’ American Anthropologist 107(1): 1–4.
Mauss, Marcel (in collaboration with Henri Beuchat) 1979 [1950] Seasonal Variations of the Eskimo: A Study in Social Morphology, Translated with a Foreword, by James J. Fox. London: Routledge and Kegan Paul.
Mead, Margaret 1969 [1930] Social Organization of Manu’a. Bernice B. Bishop Museum Bulletin 76. Honolulu, Hawaii: Bishop Museum Reprints.
Mead, Margaret 2001[1928] Coming of Age in Samoa: A Psychological Study of Primitive Youth for Western Civilisation. New York: Harper Collins (Perennial Classics).
Menand, Louis 2005 ‘Everybody’s an Expert: Putting Predictions to the Test.’ Book Review in The New Yorker, December 5: 98–101.
Munn, Nancy D. 1992[1986] The Fame of Gawa: A Symbolic Study of Value Transformation in a Massim (Papua New Guinea) Society. Durham: Duke University Press.
Noro, Arto 2001 ‘Zeitdiagnose’ as the Third Genre of Sociological Theory?’ Paper presented at European Sociological Association Conference, Helsinki, August 28.
Noro, Arto 2004 ‘Sosiologian Kolmio: teoriat, käytöt ja yleisöt’ (‘A Sociological Triangle: Theory, Use and Audience’), unpublished paper.
Petersen, Glenn 2005 ‘Important to Whom? On Ethnographic Usefulness, Competence and Relevance.’ Anthropological Forum 15(3): 307–317.
Preston, Richard J. 1966 ‘Edward Sapir’s Anthropology: Style, Structure, and Method.’ American Anthropologist 68(5): 1105–1128.
Rabinow, Paul 1977 Reflections on Fieldwork in Morocco. Berkeley: University of California Press.
Radin, Paul 1957 [1927] Primitive Man as Philosopher. New York: Dover Publications.
Radin, Paul 1987 [1930] The Method and Theory of Ethnology. South Hadley, MA: Bergin and Garvey.
Ricoeur, Paul 1991 From Text to Action: Essays in Hermeneutics, II, translated by K. Blamey and J. B. Thompson. Chicago: University of Chicago Press.
Sahlins, Marshall 1992 Anahulu: The Anthropology of History in the Kingdom of Hawai’i. Volume One, Historical Ethnography. Chicago: University of Chicago Press.
Sahlins, Marshall 1996 [1993] Waiting for Foucault. Cambridge, UK: Prickly Pear Press.
Sapir, Edward 1922 ‘Sayach’apis, a Nootka Trader.’ In, American Indian Life, edited by Elsie Clews Parsons. New York: Viking.
Sapir, Edward 1995 [1938] Foreword. Left Handed, Son of Old Man Hat, Recorded by Walter Dyk. Lincoln: University of Nebraska Press.
Siikala, Jukka 2004 ‘Theories and Ideologies in Anthropology.’ Social Analysis 48(3): 199–204.
Silverstein, Michael and Greg Urban (eds.) 1996 Natural Histories of Discourse. Chicago: University of Chicago Press.
Smith, Andrea 2004 ‘Heteroglossia, “Common Sense,” and Social Memory.’ American Ethnologist 31(2): 251–269.
Stocking, George 1992 The Ethnographer’s Magic and Other Essays in the History of Anthropology. Madison: University of Wisconsin Press.
Tavita, Terry 2004 ‘Faleomavaega Tackles Mead-Freeman Debate.’ Samoan Observer Online, 25 October.
Trouillot, Michel-Rolph 1991 ‘Anthropology and the Savage Slot: The Poetics and Politics of Otherness.’ In, Recapturing Anthropology: Working in the Present, edited by Richard G. Fox. Santa Fe: School of American Research Press, pp. 17–44.
Turner, Victor 1957 Schism and Continuity in an African Society: A Study of Ndembu Religious Life. Manchester: Manchester University Press.
Turner, Victor 1962 Chihamba, the White Spirit. Manchester: Manchester University Press.
Turner, Victor 1967 The Forest of Symbols: Aspects of Ndembu Ritual. Ithaca: Cornell University Press.
Turner, Victor 1968 The Drums of Affliction: A Study of Religious Process among the Ndembu of Zambia. Oxford: Clarendon Press.
Urban, Greg 1996 Metaphysical Community. Austin: University of Texas Press.
6
Social Research and Social
Practice in Post-Positivist Society
Pekka Sulkunen
Scientific methods are not tool kits that researchers can select to suit their tastes
and preferences to compete with other techniques contending to reach the truth.
Research instruments in sociology are no more independent than in other sciences of
the concepts and problematics from which they emerge, and they in turn structure the
questions and theoretical concepts that they can be used to deal with. Instead of a
choice of methods it is more appropriate to talk about ‘styles of reasoning’, as does
Ian Hacking (1990: 6), who has argued that although the social world is constructed
differently by different styles of reasoning, this is not to say that the
constructions are arbitrary. It simply means that, for example, an explanation or
prediction formulated in probabilistic quantitative terms already implies a great deal
about the world in its concepts which, in turn, are integrated with a statistical
methodology. The same reality represented in another vocabulary and through a
biographical or ethnographic methodology would look different but still be no less
true.

How should we classify such styles of reasoning in sociology, and how could we explain
or understand the reasons for such differences? In this article I argue that a major
change in sociological styles of reasoning took place in the late 1970s and early
1980s both in the way sociology began to conceptualise the social world and in the way
sociological research was related to social practices or policy-making. One apparent
indication of the new style of reasoning was the boost in qualitative research and the
accompanying ‘cultural’ or ‘linguistic’ turn in sociology (see Chapter 1). These
changes reflect the role that social sciences first had in the three post-war decades
and then lost when the welfare state construction period had attained maturity.

REPRESENTATIONAL, EPISTEMIC AND POSITIONAL DIMENSIONS OF KNOWLEDGE

Sociological studies tell about social reality in three different ways. First, they report
knowledge about social realities. This knowledge depends on their conceptual framework
and on their instruments of observation such as ethnography, media analysis or the
survey technology, but within the constraints of the concepts and instruments,
knowledge it is (Bhaskar 1975). This is the representational dimension of knowledge.
For example, a study on the relationship between social capital and social exclusion
might be made with statistical methods, which require that the abstract categories
‘social capital’ and ‘social exclusion’ are operationalised as measurable indicators
that describe individuals or collectivities. Most likely, a fair number of drug users
would be found among the most excluded. Another study might compare Western countries
and come to the conclusion that most of them apply strict prohibitions on a selection
of pharmaceuticals – not all, like alcohol, but many such as opiates, cocaine,
amphetamine or MDMA (‘ecstasy’). Possession, distribution, production and import of
the prohibited drugs are legal offences with penal consequences. The criminal justice
system, as the interface between the state and the drug user, in many ways operates as
a mechanism of exclusion. The term ‘prohibition’ is also an abstract category and
describes at least part of the same reality as the quantitative study, but from a
completely different angle. Finally, a third study, made with ethnographic methods,
could analyse the social relationships in different types of public social and health
services offered to illicit drug users, and find that at the low-threshold needle
exchange clinic the (often voluntary) social workers are allies of their clients,
trying to help them to get medication and other help, whereas the workers at the
substitution treatment clinic require a great deal of ‘motivation’ and effort from
their clients, often with the consequence that they are felt to be part of the
penalising control system rather than a help. Again, we are observing mechanisms of
exclusion, including social capital and the lack of it, but from a completely
different angle than the other two studies. All of them report facts that represent
the same reality, but within very different styles of reasoning and methods.

Second, the style of reasoning itself tells us about society. The three studies of
social exclusion, with their different methods and concepts, involve very different
problematics although their subject matter is at least partly the same. The first
probably would be built on communitarian hypotheses on how social relationships
support people in their self-control, autonomy and integration into educational and
work life. The second would raise different kinds of questions concerning the
authority of the state, the basis of selecting some pharmaceuticals as legal and
others as illegal, and the intended and unintended consequences of prevention efforts.
The third would pay attention to the fact that social capital may be of very different
kinds, and that it is not entirely an independent variable in the processes of social
exclusion but depends, instead, on power relationships in society. All three studies
involve moral investments in the way they categorise their observations; they
represent not only the reality as facts but also wider frameworks in which they see
society, the state, the individual and the interface between citizens and the public
powers. In other words, they are motivated by different interests of knowledge.

The interests of knowledge which define the needs and dispositions to explain and
understand what happens in society determine the types of questions that can be asked
about the social reality: the epistème, to use Michel Foucault’s term (Foucault 1966:
197). Let us call this the epistemic dimension of sociological knowledge. Epistèmes
themselves are social facts that represent the relations of domination in the given
society. The master example is Foucault’s own account of the history of Western
science and its ways of relating human culture and nature. It evolved from classifying
and representing the natural world, including humans, in the natural history of the
seventeenth and eighteenth centuries, to the study of exchange and utility in
mercantilist and physiocratic economics, to the focus on work in classical economic
theory, and finally to the complete separation
between human and natural sciences towards the end of the nineteenth century. A
similar example is Ian Hacking’s analysis of the discovery of probability and
stochastic processes in the early nineteenth century. This opened up whole new areas
of scientific research concerning populations and mass phenomena. Such grand
transformations of the epistème reflect society’s interests in itself and its natural
environment in wide philosophical terms, but as the three examples above point out,
the kinds of questions society asks of itself are also reflected in research designs
on a smaller scale, and the designs and questions themselves tell us something
important about society.

Third, sociological studies report through their form and scientific practice quite
special facts about society, namely facts about the relationship between sociologists
themselves and the object of their study. This we can call the positional, or the
sociology of knowledge, dimension of sociological facts (Bourdieu 1982). The division
of sciences into disciplines in itself is an important fact about the society that
engenders it. The fact that social sciences are today separated from natural sciences,
and split into sub-disciplines each with their own dominant styles of reasoning, is
not simply a consequence of the accumulation of knowledge but also a real factor which
has an impact on what new knowledge it can produce. Another division, especially
important in sociology, is the way that scientific knowledge is entangled with but
sometimes also opposed to practical knowledge about society, held by ordinary people,
by policy-makers, by the media and other significant institutions.

All three of these dimensions must be accounted for when we discuss the relationship
between social science and social practice. Sociological studies should not be read
only as reports about their objects, but symptomatically, as manifestations of the
power fields of knowledge in which they operate, and of their relationships to these
fields. In all three respects the social sciences in advanced capitalist societies
have undergone a transformation which we must clearly understand to see precisely what
practical role they potentially serve today.

PLANNED ECONOMY AND MODE 1 SOCIAL SCIENCE

When the architect of the British welfare state, Sir William Beveridge, envisioned the
state’s role in the post-war society he considered that the ‘spectacular achievements
of the war-time planned economy’ (Beveridge 1944: 120) measured by the GNP and
employment should be applied to the economy in peace, which also could benefit from
state regulation, and not only by means of income redistribution. The state’s aim was
no longer to minimise public spending but to optimise all spending in society, in
regard to available labour power by means of ‘manpower budgeting’. The state budget
should be measured to maintain full employment but not to exceed the national manpower
capacity. The Keynesian principle of full employment was translated into income
equalisation in social policy and growth was its primary objective. Thus planning was
not uniquely a Socialist idea; a plan designed and supervised by the centralised
national state was a generally accepted European model of industrial development.

The planning did not only cover infrastructure, regional policy, monetary and fiscal
policy, but also the ways in which people should lead their lives. The Swedes Alva and
Gunnar Myrdal (1934) had in their famous population policy programme proposed that the
state should root out bad habits among its citizens and teach them good manners.
People had to be trained to take care of their households and bring up their children,
although the important and complicated task of education should primarily be yielded
up to professionals in nursery schools and other institutions. The state had to make
people conscious of their real interests. Psychological research about happiness was
needed to discover what makes life worth living according to people themselves, and
the institutions of society should be formed on the basis of these observations.
The sociology associated with the plan was an exemplary case of what Gibbons et al.
(1997) call Mode 1 science. Knowledge production in Mode 1 takes place at a distance
from the context of application, as ‘pure’ science at the far end of the continuum
from research to ‘development’. Mode 1 knowledge production respects rigorous
disciplinary boundaries. Its canon of accountability and quality control dictates that
only intra-disciplinary expert authority is qualified to judge the validity of
knowledge, the merits of the scientists and the value of their work. Mode 1 science is
enclosed in the universities, and – the authors claim in a second book (Nowotny et al.
2001) – in fact not accountable at all in practical terms, such as outcomes in welfare
or as impact in policy effectiveness.

Nowotny et al. (2001: 63) explain that the positivist virtue of a completely
self-controlling, context-free science was cultivated in a context that had an
unlimited appetite for meaning and certainty already from the eighteenth century, when
Western society was experiencing an enormous wave of modernisation. The same
explanation holds even more emphatically for the post-war decades in Western countries
where progress, change for the better, lurked in the future biographies of not only
the elites but of the great majority of people. Post-war industrialisation was
particularly dramatic for Europe which, with the exception of England and Belgium, was
still a continent dominated by small-holding agriculture on the eve of the Second
World War. Germany, Denmark, the Netherlands and Sweden all had well over one-fifth of
their labour force employed in agriculture; Spain and the eastern countries including
Finland had well over one-half. Thirty years turned first the west and then the
central and eastern part of Europe to economies dominated numerically by the
industrial working class, the peaks reaching up to almost half of the total (civilian)
labour force (48.5 percent in West Germany in 1970).¹

The post-war industrialisation produced a phenomenal growth in consumption
possibilities with no parallel in human history, not relatively speaking and certainly
not in absolute terms. The earlier consumer booms of the eighteenth century in England
(McKendrick et al. 1982; Mukerji 1983) and still in nineteenth-century Europe
(Williams 1982) were limited to small elites, but the new industry-based consumer
society was a phenomenon of the masses and encompassed the structural foundations of
industrial society. In retrospect this change was so drastic that it has been given
dramatic names, such as the European golden era (Therborn 1995), the golden years of
capitalism (Hobsbawm 1994), the glorious thirty years (Fourastié 1979) or even the
second French revolution (Mendras 1988). It changed the make-up and technology of
everyday life. It reconfigured both social structures and people’s way of thinking
about themselves and about their relationships with others. It brought to ordinary
people a quantity and diversity of goods, pleasures and uses of time that either had
never existed before or had only been accessible to the very privileged. Luxury was
democratised and became part of everyday life. The pleasures of consumption and
sensuality became publicly presentable, in everyday life as well as in the media and
in marketing, whereas they had earlier been excluded from public discourses and left
to the private sphere. The Weberian values of industrial society – frugality,
industriousness and achievement orientation – were replaced by post-industrial or
post-modern values that stress pleasure for its own sake and cherish its public
presentation as much as they spurn its public control. The romantic ethos of
capitalism seemed to get the upper hand.

At the same time parliamentary institutions were consolidated in all Western
countries. Europe only gradually recovered from quasi-totalitarian war-time regimes,
the USA from an era of ultra-nationalistic anti-communist suspicion. Value conflicts
over religion, nationalism, the family, sexuality and many forms of consumption and
culture gained political platforms and turned into protests and counter-protests or
moral panics (Cohen 1972).

The appetite for meaning and certainty was not only of a psychological nature. The plan
was a central instrument in progressive
national industrial policies, and the plan required reliable and impartial information
for its material. Also the moral ambivalences needed to be formulated in a language
and described more systematically than with anecdotal accounts by journalists and
writers or movie directors. The appetite was not only for meaning and certainty; it
was also for information.

Population statistics already had a solid foundation from the late nineteenth and
early twentieth century. To a lesser extent this was true also for economic and labour
statistics. However, household consumption data only began to become available in the
1950s. Income and mobility surveys have an even shorter history, and individual data
on specific consumption patterns (such as alcohol), sexual behaviour, political
opinions and attitudes about this or that aspect of everyday life, which today are
routinely provided by Eurostat, the European Science Foundation, and national
statistical offices, or which are industrially produced and commercialised by private
‘research’ companies, were still in the 1960s a rarity provided by specially funded
academic research programmes. All this information required a conceptual portrayal of
society – a language to describe its direction of change, and to interpret its
relevance.

Even though the epistemic dimension of the sociology associated with the plan was
strongly normative – preparing the good life for all – any sociology of knowledge was
an alien, if not hostile, idea to Mode 1 knowledge production. Science that speaks
with the voice of disciplinary authority does not highlight its subject and the
subject’s relationship with the reality it speaks about. To take an example from the
natural sciences, the mapping out of the human genome is a collective project which
advances at every new step independently of who makes that step and independently of
what the consequences of the genome project will be for diagnostic practices, for
treatment methods, for the lives of people with known genetic disorders, and for the
lives of many other people who live with them. In the same way, one might think that
if basic social science research could detect the determining elements in human social
conduct, it does not matter who participates in the production of knowledge, and from
what point of view.

Instead of engaging in the question of standpoints of knowledge, there was a strange
cleavage between ‘Grand Theory’ and ‘Abstracted Empiricism’ (Mills 1959) prevalent in
sociological texts of that era. The highly technical vocabulary of the former and the
bureaucratic ethos of the latter appear quite distinct from each other, theory
representing ‘basic’ or pure science with disinterested motives (beyond the interest
in the establishment of the discipline itself) while the empirical researchers apply
their measurements and methods to practical social issues of integration, cohesion,
equality, crime prevention, youth work, health promotion, etc. Neither theory nor
empiricism left much room for human agency, with understandable aspirations, goals and
hopes. For empiricist as well as theoretical sociologists, Mills argued, the object of
knowledge is social action – what makes members of society act in a meaningful and
orderly way from the point of view of society. According to Mills, it was the task of
emancipating social science to help out people who ‘need, and feel they need … a
quality of mind that will help them to use information and to develop reason in order
to achieve lucid summations of what is going on in the world and of what may be
happening within themselves’ (p. 5). That quality of mind, the sociological
imagination, is offered to them by the critical sociologist who is capable of using
the classical tradition to translate private problems to public issues and vice versa.

THE NEOLIBERAL TURN AND MODE 2 SOCIAL SCIENCE

By the 1970s social research in accordance with Mode 1 knowledge production was
criticised increasingly often. One of the objects of critique was the problematic
assumption about objective knowledge independent from the viewpoint of the knower. One
solution has been to make explicit ‘whose side we are
on’, as Howard Becker, the famous American sociologist of deviant minorities, asked in 1966, and argued that it is the task of the sociologist to side with the ‘underdogs’, the drug users, prostitutes, ethnic minorities or extremely poor people. The voice of such people is not heard in the media; they are not seen in the halls of power, and thus information about their lives must be produced by professional sociologists who are explicitly equipped with methodologies to make that information available (Becker 1970). But as Alvin Gouldner (1970) remarked in a famous and influential debate with Becker, such a position does not solve the problem itself, created by the division of labour between pure academic science and applied research. Being on the side of the underdog is in itself an ambiguous position. What is an underdog? There is always somebody above every overdog, and thus if we study drug users, for example, even the local police officer – an obvious overdog to the addicts – is under the authority of the police headquarters, of the municipal council, the President of the local Lions Club, and many others, not least the legislator who decided that drug use is illegal and thus a police affair. Moreover, Gouldner argued that even when sociologists take the underdog point of view they, knowingly or not, serve a constituency on whose interest their career possibilities depend.
A major blow to Mode 1 social science came from social constructionism, which pointed out that there cannot be any pure social science knowledge independent from ordinary people’s everyday knowledge about society. Anthony Giddens (1979: 245–253) gave this point a famous formulation in his state-of-the-art review of social theory by saying that the twentieth-century trend in social science has been to increasingly account for the fact that people always already, without any interference from social scientists, possess enormous amounts of knowledge about society. A landmark volume to realise this had already appeared in 1966: The Social Construction of Reality by Berger and Luckmann (1987). They had argued that not only do people know a great deal about their society – obviously, in order to go to school, to be employed or be an employee, to be husband and wife, to make one’s way in modern traffic, to be a consumer, a political or a social citizen, one has to know a very complicated set of rules and norms – but that the whole social structure is based on such shared knowledge. Thus the proper approach to the analysis of social structure is not abstract measurement such as statistics on income distributions or class divisions but sociology of knowledge.
Once it was recognised that people know a great deal about social life, and that social scientists’ knowledge is part of the same ‘stock of social knowledge’ in which other people also live, it is easy to dismiss Mode 1 science as an illusion. There is no pure social science, independent of the context of application, because the scientists’ knowledge is itself part of the context: it serves to define situations, to conceptualise social issues and to establish selections of feasible policy options, to exclude others and so on. Social sciences are permanently challenged by everyday thought; they cannot in actual fact justify themselves only with disciplinary canons, and their academic authority is constantly questioned. Such a view stresses the positional, or sociology of knowledge, dimension of social science: scientific concepts, methods and language which produce and express facts also reflect the relationship between the scientists and their object, the people they study. Sociology committed to this view always faces what is called ‘the reflexivity problem’. If social reality is significantly influenced by what people think or believe about it, and these beliefs are influenced by the believers’ interests, social scientists contribute to the shaping of this reality in a way that also is infected with their interests. In what way, then, can sociologists claim that their knowledge is superior or somehow less influenced by their situation than other knowledge? Berger and Luckmann said that sociology of knowledge is ‘like trying to push a bus in which one is riding’ (1987: 20). To pretend that disciplinary social
science is somehow neutral and virtuously outside of social reality, even in its basic theoretical part, is to make a fallacious claim of objectivity and a rather dubious attempt to cover up its partiality. Recently this view has been profusely advocated by Michael Burawoy (2005).
When Giddens made his observation that social sciences tend towards a recognition of the importance of everyday knowledge, he was in fact pointing at a major change in the relationships between social science and social practice that was occurring in all its three dimensions: representational, epistemic and sociology of knowledge, in the post-positivist transition. In representational terms, the so-called cultural, semiotic or linguistic turn drew sociologists’ attention to critical analyses of meaning in people’s everyday life, in the media, in cultural products and also in social science itself. In Erik Allardt’s terms (2006), the hermeneutic pole in social science gained dominance vis-à-vis its complementary opposite, the positivist vision. It was observed that beyond what was taken for fact there is a complex web of communication, from statistics collectors’ concepts and classifications, to respondents’ interpretations and responses to them, to statistical analysis and interpretation of results by researchers and by their readers. No part in this web can be taken for granted as evident and obvious. In cultural and media studies the same ambiguity of meaning appeared in many forms. Semioticians talked about the ‘referential fallacy’ (Greimas and Courtès 1979), media researchers focused on the user perspective, i.e. the interaction between the media and the audience (Sulkunen and Törrönen 1997; Alasuutari 1995), and literary criticism followed Roland Barthes (1977: 142–48) in believing that the ‘author is dead’ – the ‘meaning’ of literary texts escapes the intentions of their authors, and in the extreme case it even escapes the text itself. Meaning became a problem, the object of study, the referent, instead of being simply the medium of facts.
Why? It has by now become established that the end of the 1970s marked an end of a historical period in advanced capitalist countries if we look at it from the perspective of the principles of governance. Nikolas Rose and Peter Miller (1992) have associated this change with the Foucauldian idea of governmentality, the internalisation of power by its subjects in modern society, and found its locus in the changing role of the state. Since then, an extensive literature has demonstrated that essential reforms in public management (itself a new term signalling the change) have taken place in advanced capitalist states, at times to a point where the state seemed to be withering away from capitalism altogether. Luc Boltanski and Ève Chiapello (1999), on the other hand, have studied business management doctrines and found that a similar re-organisation has taken place in the private sector even earlier. In fact, the new style of governance has shifted from business to public management with more or less success. Michael Power (1997) has confirmed this phenomenon and used the term The Audit Society to describe the essential change that has occurred to the role of social sciences in the new mode of power: evaluation, of which auditing is one especially important part. Using the term coined by Gibbons and associates (1997), it depicted the change from Mode 1 to Mode 2 knowledge production. In contrast with Mode 1 ‘pure’ science, Mode 2 knowledge production takes place in the context of application; it is transdisciplinary and it is directly accountable also on grounds of its practical usefulness (Nowotny et al. 2001: 220).
Boltanski and Chiapello concluded that by the mid-1970s industrial life had entered a deep management crisis in all OECD (Organisation for Economic Co-operation and Development) countries. The bureaucratic management structures that had been copied from the military were inadequate for performance and unacceptable from the point of view of the increasingly educated labour force. The response was to create more democratic participatory work organisations, flexible employment schemes, subcontracting, autonomous quality circles or teams, outsourcing and competition within companies. The new organisational form was no
longer the hierarchy but the network, and its node was the project: a task-based uniquely funded team with autonomous leadership, targets and a deadline. Control was no longer directed from central management down to the divisions, departments and the shop-floor stewards; from now on it was not only internalised in the employees’ own individual interest but also externalised to peers and to competitive relationships between operational units and profit centres.
The public management doctrines that were adopted in a short time-span in the mid-1980s in the OECD and its member countries applied the same principles to state and local government. Similar problems of bureaucratic management were to be eliminated as in the private sector, but a moral dimension was also important: citizens should no longer be seen as subjects of the state; they were put in the position of clients, and the public service-providing agencies were re-organised to meet requirements that are often called the three Es: Economy (ensuring the best possible terms for endowed resources, implying competition between service producers), Efficiency (producing more value for money) and Effectiveness (ensuring that outcomes conform to intentions) (Power 1997: 50). The central government is no longer authorised to issue norms to local officials and service producers such as hospitals, schools, day care services etc., but only information and advice, and resources now measured against output rather than needs.
FROM THE GOOD LIFE TO GOOD PRACTICES
Governance – or management, borrowing again the language from the business world – by information is often used to describe the new power structure. A better term to highlight the moral dimension of the change would be ‘governance by programmes’ or ‘frameworks which have replaced the plan’. The moral and political authority of the state does not suffice to define what the good society is, what kind of life is good or bad or how to solve problems. There is no willingness to prescribe norms of how and what we should or should not do. Nevertheless, the political responsibility has to be attested and the officials have to be given grounds for decisions about how to direct the state’s money to different purposes, among other things. Frame laws and programmes that define goals, recommendations for programmes and criteria for standards are needed to achieve the purposes mentioned above. In very many areas supra-national bodies define the targets. For example, in the European Union framework programmes are formulated on many issues: development of technology, employment, prevention of exclusion, regional development, promotion of health, prevention of drug problems, harmonization of education and many other things. These are again translated into national strategies, policy programmes and eventually short-term action plans. Local and regional governments insert these into their own objectives and action plans. The formulations of these goals are of a very general nature in the programmes and their accentuations usually correspond to those of general public administration thinking: in alcohol and drug programmes the goals are the responsibility of citizens themselves, initiative, networking and relying on the support of neighbourhood communities, to name just a few.
From the epistemic point of view, governance by programmes and frameworks rather than by plans means that society asks itself different kinds of questions than before. Social sciences that were attached to the plan were expected to say what happens if we do X, and what should be done to make Y happen. Now the questions are: with regard to the three Es, which of the projects A, B, C … N best meet the objectives of the programme? For example, the objective might be to minimise alcohol-related problems. The central government does not have the means at its disposal to reduce alcohol consumption in the country, or is reluctant to use such policy instruments (price increases, permitted hours of sale and other regulations of the market); instead it asks local communities, non-governmental organisations (NGOs),
businesses, labour unions, churches, etc. to establish innovative projects and have them evaluated for economy, efficiency and effectiveness (Sulkunen 2006).
The central concept in goal and framework management, ‘innovation’, has been used in science and technology policy for a long time. The administration cannot predetermine the results of the researchers or the direction of the development interests of companies, but it can take a stand on the direction of the development in general and make strategic policy definitions. New ideas come from the ‘grassroots level’, from fieldworkers and citizens themselves. Transferred to social policy, the pattern of ‘innovation thinking’ has assimilated traits of romantic rationalism: people are thought to be creative and the solutions have to be given space to develop and grow upwards from below. The researchers should evaluate and strengthen these tendencies instead of planning. The primary tasks of evaluation are surveillance of expenses, ensuring quality and supervision of observance of rules and regulations: tasks which used to belong to inspectors and superintendents of state governance. Often they include, though, more ambitious goals of generalisation, which is called recognising good practices.
The expressions ‘good practice’ and ‘what works’ originate from prison administration (Garland 2001), and from there they have spread to social work and public administration in general. This manner of speech is an application of solution-oriented therapy or pedagogy, which detaches itself from analysing the reasons for problematic behaviour and instead concentrates on the recognition of the effects of alternative action models. The search for reasons is, according to this perspective, not only a waste of time but it might also have negative effects. When criminals learn about the causes of their behaviour, those causes become ‘vocabularies of motive’, justifications and rhetoric for escaping responsibility (Sykes and Matza 1957).
The recognition of good and working practices is pragmatic thinking. The behaviour of a person is a sum of such complicated factors that practical social work in prisons, for example, cannot commit only to one or a few explanation models and their conclusions concerning clients. It is more useful to observe the effects of the existing methods of social work itself and choose the methods that seem functional and cost-effective. Innovation thinking is dressed in the rhetoric of good practice, and it leads to a sort of new social Darwinism. Clients and employees are given a free hand to invent new kinds of action models, mutations, and eventually the fittest among them are chosen for additional refining on the basis of expert reports. Evaluation is then considered the unbiased and unemotional mechanism of social and natural selection.
The other side of pragmatic thinking is moral neutrality. The assumption that the methods of social work or the alternatives for control policies could be evaluated only with regard to their functionality and effectiveness presupposes a strong unanimity of goals – the employment, health and security of the population being considered good objectives and repeated offences a bad one, for example. In programme rhetoric neutrality leads to abstracticism and definitional – and at the same time administrative – ambiguity. Promotion of health is a good example of this. Another is management of security. This rhetoric gives the acts of officials a general name that has a morally neutral flavour. It is easy for everyone to accept, but at the same time it expands the range of goals of the officials and experts and blurs the boundaries of their actions. The other moral points of view related to the matter – the customers’ freedom of choice or the sense of justice of many citizens demanding more severe punishment for criminals, for example – can be forgotten from the standpoint of effectiveness.
THE FICTIONS OF EVALUATION RESEARCH
From the point of view of the sociology of knowledge, governance by programmes positions the sociologist in a new relationship
with social practice, exactly as Nowotny et al. (2001) describe as characteristic of Mode 2 knowledge production. Social research operates in the context of application; it is not constrained by disciplinary boundaries and the criteria of its accountability are less academic than practical: tell us what works, and we shall be pleased not to know why something else might not work.
If the idea of ‘pure science’ in the positivist Mode 1 knowledge production was an illusion, but an illusion in a real context with real consequences, are the ideals of Mode 2 social science more realistic and convincing? To some extent the answer is positive: social science that operates in a context and is aware of its own vested interests is more honest about itself and potentially also more relevant than social science built on the fiction of basic science and applied research. However, Mode 2 science attached to the programme rather than to the plan also has its illusions, as real as the fiction of Mode 1 science but in a different context and with different consequences. The first illusion arises from the logic of governance by programmes itself: abstract objectives.
Programme and evaluation rhetoric make politics look rational, and hierarchical decision-making just like business management. But what does the state need this rhetoric for? Why is it impossible, for example, for a ministry to decide on its strategy in alcohol policy and to follow that strategy in financing and other solutions? One reason for this is the pursuit of political neutrality already discussed above. The ministry does not want to decide, or it considers itself incapable of dictating, how municipalities, organisations, companies – or other ministries – should act in order to decrease problems caused by alcohol consumption. To preserve the autonomy of those actors the policy goals are defined with abstract concepts, of which employment, health and security are the most central ones. It is always possible to reach unanimity concerning those goals, even though the moral or power resources would not always suffice to make concrete policy decisions. The rhetoric of ‘what works’ and ‘best practices’ reflects what we have called the Ethics of Not Taking a Stand, quoting a fieldworker we interviewed on how she advises parents to behave in the drug issue: ‘The most ethical stand is not to take a stand at all, the parents should decide this for themselves’ (Määttä et al. 2003).
Abstraction also has another legitimating function. It protects the sphere of intimacy, which was the historical goal of the welfare state: the self-responsibility of citizens, individual agency and commitment to good choices to promote a person’s own health, security and well-being. This is not limited to rhetoric or ideological speech, but it is part of the everyday life of advanced capitalist society. For example, the health care expert system is relatively helpless if the patient is unwilling to co-operate: ‘only the medication that is taken will help’. But you cannot force anyone to co-operate. You cannot get overweight under control unless consumers eat less. Disciplining consumers’ food choices directly would be felt as unacceptable paternalism. They will have to take responsibility for their own choices.
In programmes with very concrete targets such as weight loss the outcomes are easily measured. However, in many cases standards of performance are more ambiguous, and the audit or evaluation of efficiency and effectiveness is in fact a process of defining and operationalising them, often with perverse effects on the actual operation of the system. A good example is research evaluation. In theory, university departments and research institutes are expected to produce relevant good quality research, but the auditing criterion, articles published in refereed journals, leads to an increase in the number of such journals, with the consequence that fewer people read them and the social relevance of research results declines. Nevertheless, money is invested in them because the effective alternative, such as taxing food or alcohol, is not included in the repertoire of acceptable policies.
Governance by programmes and frameworks thus supports what Nowotny et al. (2001) consider the key features of the
Mode 2 science. Abstract objectives of evaluation research in the context of application encourage transdisciplinarity and pragmatic division of labour. When the interest is not directed at explaining behaviour nor even at the mechanisms of effects of the measures taken, but only at the effectiveness of the alternative action models, there is no need for research on alcohol problems, youth culture or deviant behaviour but for skilful evaluation researchers who can flexibly move from one substance area to another. Corresponding abstracticism is visible in the training of fieldworkers and their division of work. As the French sociologist Robert Castel (1981: 135–44) has claimed, the professionalisation of social work has not actually led to the often anticipated medicalisation nor specialisation of any other kind. Instead there has developed a paraprofessional mixed type, the general task of which is social control.
The abstracticism of goal and framework management has resulted in efficiency and effectiveness becoming passkey concepts that are applied everywhere. Sometimes, however, they misrepresent the reality that they are supposed to evaluate. For example, every society will need to take care of addicts in some way. For the clients’ welfare as well as for the institutions – the police, social offices, penal and medical institutions – the most relevant questions relate not to outcomes in terms of recovery but to the division of labour between controlling and helping professions. This, however, is not an issue of performance but of ethics and values. Constrained to evaluating efficiency and effectiveness, Mode 2 social science may in fact sustain inefficient responses instead of asking pragmatically relevant questions about their rationale.
THE RETURN OF CAUSALITY AND ITS OLD PROBLEMS
The second illusion of the new mode of practical social science arises from the requirements of efficiency and effectiveness. Both are based on the notion of causality. The concept of effect is a part of the equipment of science, as well as of everyday thinking. We light the lamp, roast the ham, start the car, give advice to another person or call a meeting assuming, on the basis of our prior experience, that a certain state of affairs will follow. We do not usually ask why it results from that action. Only when the lamp does not light, the ham does not roast or advice or invitation are not followed, do we start investigating the error. Even then we don’t have to know much about the mechanisms of the causal chain, but we can lean on our prior experience. We routinely change the bulb, check the fuse and the position of the ignition key or whether our advice or invitation has actually been received. Only in very exceptional circumstances do we have to lean on expert support, that is to say we utilise research-based knowledge to explain the mechanism between the cause and the effect and this directs us to look for the error in the different parts of the chain.
In evaluation research the primary interest of knowledge is similar to our everyday causal thinking. The interest of knowledge is not to establish general laws about social life but to verify whether the action causes the desired effect or not. This could be called clinical causal thinking. Its objective is not to explain the mechanisms of effects, but only to test pragmatically if they are there, how much they vary and whether there are possibly some ill effects. Medicine that is based on evidence and the medicine-influenced social policy of the same type are examples of clinical causal thinking (note 2). Still, clinical causal thinking has logical conditions as limiting as those of the causality tests of research laboratories. The cause and the effect have to be logically independent and empirically dependent on one another; the cause factor has to be adjustable in an unambiguous and measurable manner; and the effect of other variables has to be eliminated experimentally or statistically. Also there have to exist unambiguous means for measuring the effect, which has to follow the cause temporally.
Some clinical medical research is able to meet these expectations. The medicament will stay the same in spite of who it is
given to and who hands it out, and the human body is approximately the same in different circumstances. Usually it is possible to control the effect of differences with the reliability that meets the expectations of the practice. In social work and social policy the conditions of clinical research can be met only in exceptional circumstances. As Tom Erik Arnkil and Jaakko Seikkula (2005: 60) have claimed, psychosocial work does not move from a certain place, actor or situation to another while remaining the same, as medication does. No ‘method’ or ‘model’ can be independent of the agent who delivers it or of the person who receives it, nor conceptually independent of the effect it aims at.
Evaluation is usually performed in a situation where a test or even comparative configuration of any kind is not possible. Ordinarily the evaluator is contacted when the funding of the project has already been granted, its staff and principal idea are decided, and the fieldwork of the project has already partly started. Some vested interests have already been created, the well-intentioned mission is an inspirational source for action, and there is no time or resources for the comparison presupposed by a real evaluation of effectiveness. The expectation of establishing causality turns into a thin fiction.
CONCLUSION AND DISCUSSION
In this article I have discussed the relationship of social science to social practice, and argued that a radical paradigm shift occurred in the 1980s in all advanced capitalist countries from the positivist mode associated with the idea of the plan to a more context-based science attached to governance by programmes and frameworks. The change reflects the new practices of governance that were introduced at the same historical period in the business world as well as in public management. In social science knowledge production the shift corresponds to what Gibbons et al. (1997) call a transition from Mode 1 to Mode 2 science. This shift has had implications at three levels: referential (what is studied), epistemic (what kinds of questions are asked) and sociology of knowledge in a narrow sense (position of scientists in relation to the object of their research and to those whose knowledge needs they serve).
I have also argued that Mode 1 social science was a deviation rather than a long tradition in modern social science. It was associated with governance by plan in the post-war decades of state-driven industrialisation and construction of the welfare states. It had important functions in providing a conceptual portrayal of society and the theoretical framework for growing needs for monitoring and information, which now are mostly covered by information systems other than the social sciences. However, Mode 1 social science was also an illusion, and many social scientists and critics were aware of this.
The shift to Mode 2 science was a reaction to internal developments within the social sciences but more importantly it reflects the epochal change in the logic of governance in capitalist societies from the plan to programmes and frameworks. This change is deeply rooted in the structure of capitalist societies which stress individuality and autonomy of agents. Fixity on abstract targets, good practices and causal relationships in Mode 2 science are fictions too, but on the other hand, science which is aware of its own context has a greater critical potential and capacity to act as ‘public sociology’ than a discipline that is divided between pure science and applied research.
NOTES
1 Therborn 1995, table 4.4, p. 66, and table 4.6, p. 69.
2 The so-called Cochrane Library collects the results of clinical treatment research, evaluates their validity and draws conclusions on the probabilities of the effects of the methods. Corresponding work has been done in social policy under the name of the Campbell Collaboration.
In order to set the discussion within the wider context of methodological issues, the starting point here is a brief overview of some main lines of questions and concepts associated with the methodological debates.
TERMS AND CONCEPTS IN METHODOLOGICAL DISCUSSIONS: A BRIEF OVERVIEW
The meaning of the term ‘method’ has changed over time. The earliest book on social research methods, Emile Durkheim’s The Rules of Sociological Method (1972 [1895]), was not widely known in English-speaking academia until its translation into English in 1938. Durkheim’s objective was to write a text to discuss methods explicitly (Durkheim 1972, p. 19) (note 3). As Platt (1996, p. 252) observes in a discussion about changes in interpretation over time, Durkheim’s work in the English-speaking world came to be associated with method and the kind of multivariate analysis advocated by Lazarsfeld and his colleagues in the 1950s because of the use they made of his writings in their discussions on method as technique (Kaplan 1964).
Methodology is a concept often used synonymously with the term method (note 4). Whereas the term ‘method’ in most cases refers to procedures or techniques for gathering evidence, methodology has a wider meaning. For the current purpose a definition of methodology that highlights the wider field of discussions about methods, and the relationship between method and theory, will be referred to. Kaplan (1964) gives the following definition of methodology: ‘I mean by methodology the study – the description, the explanation, and the justification – of methods, and not the methods themselves’ (p. 18). On the aim of methodology he continues: ‘[…] the aim of methodology is to help us to understand, in the broadest possible terms, not the products of scientific inquiry but the process itself’ (Kaplan 1964, p. 23). Harding (1987), discussing these issues along the same lines, broadens the meaning of methodology even more when she observes that, ‘A methodology is a theory and analysis of how research does or should proceed; it includes accounts of how the general structure of theory finds its application in particular scientific disciplines’ (p. 3). The connection between methods, methodology, epistemology (note 5) and ontology is complex and is often debated in writings about method. For Harding this relationship can be thought of as concentric circles where method forms the inner circle and ontology the outer (ibid). This could in some instances be thought to imply that choices of methods bring with them certain methodological and epistemological assumptions. However, in discussions about the quantitative-qualitative divide in social science methods, the claim that choice of method implies certain epistemological underpinnings is but one of several standpoints in the debate (see e.g. Platt 1996; Bryman 2004).
Throughout the history of the social sciences one of the most salient debates in the field of research methods has been the one about the ‘quantitative-qualitative’ divide. Even though the general understanding of the distinction between the two involves techniques for collecting and analysing data, the boundaries between them are not as clear-cut if aspects of methodology and epistemology are brought to bear on the discussion (Brannen 1995; Bryman 2004). As Platt (1996) points out in writing on the history of methods discussions in America, the terms and concepts for describing methods have changed over time. The quantitative-qualitative divide was described in terms of ‘case studies vs. statistical methods’ before World War II. ‘Survey’ was in this period used to describe a method in studies of whole communities, whereas its modern use is associated with large-scale statistical studies. The term ‘case study’ derived from social workers’ cases that were used by sociologists as data at a time when the boundaries between social work and sociology were not clearly defined (Platt 1996; Levin 2000). Life histories were used synonymously with case studies (Platt 1992, 1996). When the focus shifted in the 1950s from what data was about to the way it was collected, the debates changed and
FROM QUESTIONS OF METHODS TO EPISTEMOLOGICAL ISSUES 83
what was earlier known as case studies became known as ‘qualitative’ research. In circles where quantitative methods were regarded as the only truly objective methods for collecting objective data, qualitative methods were thought of as useful only in the initial stages of a study.
Current discussions about the quantitative-qualitative issue are more open to bridging the divide where data and methods are concerned (Brannen 1995; Bryman 2004). Bryman (2004) points out how different ways of approaching the discussion are decisive for whether multi-strategy research is deemed possible or not. If the divide is seen in terms of methods for collecting and analysing data – the technical version – the gap is easy to bridge. However, if the quantitative-qualitative divide is referred to in terms of different epistemologies, multi-method approaches are not easy to apply. This latter point goes to the heart of the discussion in this paper; different epistemological positions invite different standpoints on what data is, and indeed also on whether the very term ‘data’ is considered valid.
BIOGRAPHICAL MATERIAL
One definition of a biographical account is a story told in the present about a person’s experiences of events in the past and her or his expectations for the future (Nilsen 1997). The term ‘biographical material’ does however cover a wide range of empirical evidence: personal letters, diaries, photographs, written autobiographical accounts (life stories) and more (Plummer 1983, 2001; Roberts 2002). In this paper the discussion is focused on research material stemming from interviews.
Life stories come in many varieties and one way of classifying them is offered by Plummer (2001)⁶. He makes a distinction between long and short stories, where the first is the full-length story of one person’s life, and the latter is based on more stories. A further distinction is related to the ‘depth’ of the accounts and is between comprehensive, topical or edited life stories. The comprehensive is a story of one person’s life in depth where the subject’s voice is at the centre. A topical story focuses on one particular issue in persons’ lives and is aimed at researching a particular area of life, whereas edited stories leave the researcher’s voice at the forefront. Plummer also makes a distinction referring to researcher ‘interference’ with accounts: naturalistic stories are spontaneously given accounts that researchers have not prompted. Researched accounts are those that researchers have asked informants to provide, and reflexive stories are those where researchers reflect on their own part in creating and constructing a story (Plummer 2001, pp. 19–35). In current research there might be a focus on single individuals or groups of individuals such as families (Brannen et al. 2004) or, as in Bertaux and Thompson’s study of social mobility in families (1997) and Bourdieu’s study of socially excluded people in France, on issues such as social class, in an attempt to map out the meaning behind statistics (Bourdieu et al. 1999).
Ways of analysing such material vary with the overall approach taken by the researcher, as well as with the purpose for having collected it to start with. When using biographical material other sources of data are inevitably drawn on to map out and understand the different layers of context lives are embedded in (Nilsen and Brannen 2005). Although this paper focuses mainly on one perspective in particular, it is nevertheless clear, as the following will demonstrate, that there is no such thing as one correct way of approaching biographical research material, and as the method has evolved into multiple ways of collecting biographical material, methods of analysis have also become many and varied.
An important theoretical influence for the discussion in this paper is the tradition from which biographical research originates: American pragmatism as developed by Peirce and Mead, especially with reference to notions of self and the social world as well as the type of ontological perspective that informed the works of these two (Lewis and Smith 1980). This perspective has been influential in most European approaches to biographical
research, although as the following discussion will highlight, other epistemological standpoints and theoretical approaches have become more prominent over time.
THE MAKING OF A SOCIOLOGICAL METHOD: CHICAGO CA. 1920
One of the most comprehensive sociological studies to date is W. I. Thomas and F. Znaniecki’s The Polish Peasant in Europe and America, published in five volumes (1918–20)⁷. It is a study of Polish migrants in Poland and in Chicago, where they settled upon arrival in the USA. Based on a number of sources of data such as official documents and statistics, it also included personal letters, diaries and one autobiographical account: that of the peasant Wladek. For many reasons the study was not fully recognised as the accomplishment it was until nearly 20 years after it was published. In 1938 the American Sociological Association elected the study as one of the greatest works in sociology. The appraisal proceedings were convened by Herbert Blumer and were published in 1939. The book makes fascinating reading for anyone interested in social science methodology as the discussions in the panel are quoted verbatim. The debates were set in a time period when positivism⁸ defined the boundaries of what was to be considered science. Dilemmas discussed at the appraisal proceedings included whether ‘subjective factors’ should play a role in social science research, and if so, how this was to be accomplished. Following from this, another question – that of whether and to what extent ‘human documents’ could be considered reliable sources of data – became central. Wladek’s autobiographical account, the first ever to be used as a sociological source of data, was especially scrutinised with reference to whether or not it could be considered reliable.
The underlying ontological premise in positivistic thinking is that reality is fixed and exists independent of human observation and interpretation⁹. According to this position, the role of any scientific endeavour is to uncover the basic laws that govern any phenomenon under scrutiny. Thus Thomas and Znaniecki, in keeping with their time, sought through the study of Polish immigrant society in Chicago and in Poland to uncover ‘laws of social becoming’. Such laws were sought by focusing on the objective (values) and the subjective (attitudes) sides of social life. The insistence on including subjective factors in social analysis was new at the time the study was carried out. It had the potential to undermine one of the basic premises of positivist social science: that objective facts alone could constitute the data studies were to be based upon. Their methodological principle was formulated as follows: The cause of a social or individual phenomenon is never another social or individual phenomenon alone, but always a combination of a social and an individual phenomenon (Blumer 1979 [1939], p. 9). Their standpoint was contrary to positivist social science also in that they did not see physics as the paradigmatic science the social sciences should model themselves on:
[…] while the effect of a physical phenomenon depends exclusively on the objective nature of this phenomenon and can be calculated on the ground of the latter’s empirical content, the effect of a social phenomenon depends in addition on the subjective standpoint taken by the individual or group toward this phenomenon. (Thomas and Znaniecki 1918, p. 38 cited in Blumer 1939, p. 11)
The epistemological basis of pragmatist thought as represented by Peirce and Mead (Lewis and Smith 1980) could be thought of as a form of processual realism in that it does indeed presuppose independent reality, but this reality is not fixed as in positivist thinking. Reality itself changes in time and humans as social beings create reality as a collective activity. In contrast to a constructionist position, which highlights the socially constructed nature of reality and rejects any independent qualities of it, the form of realism found in Peirce and Mead defined itself in contrast to their contemporary variety of constructionism, namely idealism (Lewis and Smith 1980). Drawing this parallel is reasonable because what idealism and
Throughout his career Blumer sought to challenge ‘variable sociology’ by doing empirical studies that were in keeping with his notions that sociology was to be centred on studies of social interaction¹². His naming of symbolic interactionism as a strand of sociology that developed the heritage from G.H. Mead’s social behaviourism is evidence of this. Blumer’s symbolic interactionism has been very influential for the development of biographical research, not least in the work of Norman Denzin. This will, however, be the topic of a later section.
Other alternative strands of thought existed, also in American sociology. The most radical critique of the situation in the social sciences came from a scholar who by many was regarded as an outsider but who nevertheless made his mark in a distinctive way.
POSITIVISM CHALLENGED: THE SOCIOLOGICAL IMAGINATION
C. Wright Mills published The Sociological Imagination in 1959, three years before his death in 1962. The ideas presented in this book were developed throughout his career as an empirical researcher and a critic of much of the work of his contemporaries. He was especially critical of the dominance of what he on the one hand called ‘The Theory’, which referred to a tendency to seek explanations for social phenomena in large bodies of thought known as ‘Grand Theory’ of the Parsonian variety, and on the other hand what he called ‘The Method’: statistical techniques for analysing huge datasets. The most prominent advocate of the latter was one of Mills’ earlier superiors, Paul Lazarsfeld. Mills’ critique was grounded in an alternative vision of what sociology was to be about. The historical period known as the cold war did not take kindly to a politically radical figure such as Mills. However, he was a productive empirical researcher and studies such as White Collar and The Power Elite received wide acclaim.
Even though Mills himself did not carry out biographical research in the tradition of Thomas and Znaniecki, his thoughts on what empirical material sociology should concern itself with emphasised the value of such data. In early writings he outlined thoughts about methodological issues (Mills 1940) that were later presented more extensively in The Sociological Imagination. Mills’ work moved the discussion from method as technique to the realms of methodology and epistemology. Very much influenced by American pragmatist thought and also by Karl Mannheim’s sociology of knowledge, his views on social reality coincided with those of Mead in that he thought of the self as in process in social contexts that were also in continual development, hence his insistence that the proper subject for sociological study is the intersection between history and biography. Only in studying the actions, thoughts and feelings of individuals and contextualising them in particular moments in history can sociology fulfil its potential:
[The sociological imagination] is the capacity to range from the most impersonal and remote transformations to the most intimate features of the human self – and to see the relations between the two. Back of its use there is always the urge to know the social and historical meaning of the individual in the society and in the period in which he has his quality and his being. (Mills 1980 [1959], p. 14)
Evident in this are his notions of theory, which were closely linked to his thoughts on methodology and his epistemological and ontological beliefs. As to the latter he can be characterised as a realist of the variety found in the pragmatism of Peirce and Mead, meaning that he thought of social reality as existing beyond human interpretation; yet interpretation (what Thomas and Znaniecki termed the subjective side of social reality or attitudes) was an inescapable part of empirical data. His processual and double-natured view of social reality lay at the heart of his vision of what the sociological imagination was and what role sociology had in society. It also informed his thoughts on data and methods for analysing them: his methodological viewpoints. In the appendix to The Sociological Imagination he outlines in much detail how social science studies
can be carried out in order to collect and produce empirical material and how to analyse it in ways that shed light on the crucial questions of a particular period in history; identifying how private troubles and public issues are interconnected for people living in particular places at some defined period in history (ibid.). Empirical material includes biographical accounts as told and interpreted by individuals themselves, in addition to information of a more factual kind: records and facts about life courses in general and about the society in which individual lives unfold. He did in other words advocate the use of data from many different sources in order to understand the layers of context that people’s lives are embedded in.
Mills’ writings did not result in any revival of biographical research in his time. It took nearly two decades after the publication of The Sociological Imagination for this research tradition to re-emerge, this time in Europe.
METHOD DISCUSSIONS: CHANGES IN APPROACHES TO THE QUANTITATIVE-QUALITATIVE DIVIDE
Although positivism came under close scrutiny and critique from philosophers and social scientists alike throughout the 1960s and 70s, mainstream social science debates were still stuck within the parameters of discussion defined by positivist notions of science: those of methods as techniques. Questions about validity and reliability of data, of generalisations and representativeness were argued over across the borders between qualitative and quantitative research.
The publication of The Discovery of Grounded Theory (Glaser and Strauss 1967) was important for the development of qualitative research in its own right. The main thesis in this book challenged contemporary notions of both theory and method. Where surveys were analysed to test hypotheses based on theoretical assumptions formulated beforehand, grounded theory suggested a way of carrying out research and analysis starting from data and building concepts and theories from the ground up¹³. Qualitative data, such as observation and interviews, was the main source of empirical evidence in this approach. It thus helped to develop a logic of method that was said to be particular to qualitative analysis¹⁴.
Another approach that emerged in the 1970s was life course research, a quantitative way of analysing data with special attention to life course events seen in light of cohorts and historical periods. Age is of special relevance as social institutions in most societies are organised such that cohorts go through the same events at roughly the same chronological age, for instance in the system of education (Elder 1974; Riley 1988; Giele and Elder 1998). This perspective, which is quantitative and owes much to both demographic studies and more macro-oriented social research as found in the classic texts as well as to Mills’ approach to sociology, has been influential also in qualitative approaches in that both see temporal aspects of social processes, and the link between macro and micro, as central to social research (Giele and Elder 1998). Methodologically, life course research, with its large datasets that can span generations of individuals, is oriented towards debates on statistical analyses and methods as technique. Following Bryman’s (2004) distinction between an approach to the quantitative-qualitative divide as one based on data and methods on the one hand, and the more epistemologically founded one on the other, quantitative life course research and qualitative biographical approaches can easily be combined if the former stance is taken. However, as will be seen in the following, this combination of data is not possible with all types of approaches to biographical material.
THE REVIVAL OF BIOGRAPHICAL APPROACHES
Oral history had by the early 1970s emerged as a tradition to be reckoned with in history (Thompson 1978). Biographical accounts played an important role in this research, and debates in this field to start with often
centred on whether or not such data could be considered reliable sources of knowledge for historians; whether people’s recollections of the past could be considered accurate enough for this to qualify as scientific data, and the retrospective element in such interviews was scrutinised (see, e.g. Gittins 1979). What distinguished oral history from the work of Thomas and Znaniecki was first and foremost the use of interviews. Wladek’s autobiography was a written account, and therefore not the result of a life history interview¹⁵.
The most important phase in biographical research started in the late 1970s with the work of Daniel Bertaux in France. The publication of a collection of papers from the first ad hoc workshop on biographical research at the World Congress in Sociology in 1978 in Uppsala marks a revival of interest in biographical research in sociology. The book, entitled Biography and Society. The Life History Approach in the Social Sciences, contains papers that cover a broad spectrum of topics. Questions arising from written and oral biographical accounts were looked into, and perspectives from the social sciences and humanities were drawn on to explore them. Common to all papers in this volume is a concern with time, and life lived and interpreted in time. Questions about generalisations and representativeness are also explored but unlike earlier discussions they are set within a wider frame of understanding than a mere positivistic frame of reference¹⁶.
Another influential work from the early days is Ken Plummer’s Documents of Life (1983). In contrast to the volume edited by Bertaux, this book is a monograph that sets the biographical tradition within the frame of Chicago sociology in general and symbolic interactionism in particular. This book has also become a classic in biographical research because it was a first attempt to map the history of this particular sociological research tradition. Plummer’s epistemological perspective in this book is realist, and he pays much attention to interviewing and analysis of interviews in order to grasp the meaning inherent in biographical accounts. This perspective contrasts with that of his follow-up volume to this book, published in 2001. By then what could be called ‘the linguistic turn’ had taken hold in the social sciences, and most discussions related to methodology had taken on a new shape.
Epistemologically the discussions during the first revival phase of biographical research were carried out from a realist ontological position, i.e. underlying the debates was the notion that biographical material was able to give access to some form of truth about social life. When accounts were questioned it was from a perspective of reliability at a methodological level, whether people’s stories could be relied upon; notions of truth itself were not the object of debate during this phase.
‘THE LINGUISTIC TURN’: POST-MODERNISM AND POST-STRUCTURALISM
In Europe hermeneutical approaches have become prominent in discussions that highlight differences between the humanities and the natural sciences¹⁷. Husserl’s phenomenology was important for the development of the Heideggerian hermeneutics, but has also been influential in its own right in the social sciences, not least through Garfinkel’s ethnomethodology, which was developed in the intersection between Parsonian thought and A. Schutz’s expanding of Husserl’s work (Heritage 1984). Hermeneutics started out as a method to examine texts, and to try and read texts as part of the context they originated in – the hermeneutical circle. As this perspective gained more ground in social science methods debates, aspects of language and narrative structure in biographical accounts were highlighted.
Another important influence for this shift came from linguistics. As the structural linguistics of Lévi-Strauss was criticised by Foucault and Derrida, the grounds were laid for post-structuralism in language theory and social theory. But:
Despite their differences, structuralism and post-structuralism both contributed to the general
displacement of the social in favour of culture viewed as linguistic and representational. Social categories were to be imagined not as preceding consciousness or culture or language, but as depending upon them. Social categories only came into being through their expressions or representations. (Bonnell and Hunt 1999, p. 9)
The semiotics of Roland Barthes, Foucault’s critique of power and Lyotard’s critique of ‘grand narratives’ were all influential for the direction social science research took throughout the 1980s. Methodological questions were replaced by epistemological debates, and these centred on whether there was reality beyond language.
When influence from the humanities became more pronounced throughout the 1980s, a shift of focus also occurred in biographical research. From having been concerned with analyses of life stories and biographical accounts as empirical evidence of lived life, gradually more attention was given to the narrative itself, to the told life and to the different phases of interpretation of a biography. Questions about the role of the researcher in the production of the biographical account, whether this had originated as a written autobiography or was the outcome of an interview between an informant and a researcher, became important. Demands that the researcher be self-reflective in the writing up of biographical research material were frequently heard, and in many instances the biographical experiences of the researcher and his or her reactions to the story told by the informants became topics of interest (Iles 1992). This shift also marked a change in epistemological focus towards a more constructionist standpoint which implies a line of questioning that is premised on knowledge about reality as reality (Lewis and Smith 1980). A belief that reality is a human construction alone can lead to extreme relativism in the approach to any research material. A blurring of the boundaries between fact and fiction, between truth and non-truth, between the factual and the non-factual, implies a very different approach to biographical research from that of the classic studies.
Norman Denzin was one of the most prominent advocates of a shift in biographical research towards narrative approaches and a focus on language. A former student of Blumer’s, he changed the term used for his perspective from symbolic interactionism to interpretive interactionism (Denzin 1989a, 1989b).
The term ‘interpretive interactionism […] signifies an attempt to join traditional symbolic interactionist thought with participant observation and ethnographic research, semiotics and fieldwork, post-modern ethnographic research, naturalistic studies, creative interviewing, the case study method, the interpretive, hermeneutic, phenomenological works of Heidegger and Gadamer, the cultural studies approach of Hall, and recent feminist critiques of positivism. (Denzin 1989a, pp. 7–8)¹⁸
From this quote it becomes clear that biographical research epistemologically founded in realist pragmatist thought was no longer centre stage. A blending of many different – and in some instances incompatible – research approaches opened a wider field for biographical research, and also invited collaboration across disciplinary boundaries in ways that had earlier not been common. This was especially true in feminist biographical research¹⁹.
Denzin’s changed approach is symptomatic of the debates that occurred in biographical research during this period. From discussions about whether individuals’ accounts could be regarded as reliable in the sense of people telling the truth about their lives, interest gradually shifted towards debates on ontological and epistemological issues (Nilsen 1994, 1996). In many instances the underlying epistemological notions were not taken up explicitly but informed research design and choices of methods for data collection and analysis in empirical studies.
In Chicago during the 1920s a processual notion of the self, as developed in the pragmatist thought of Peirce and Mead, underpinned Thomas and Znaniecki’s research. A notion of self, and of life, as lived in time with access to memories of experiences in the past and the willingness and ability to recount these in some present, is central in classical
biographical research²⁰. In order for this approach to have merit, some form of realist epistemological position must bear upon the theoretical and methodological perspectives employed in research. Experiences²¹ cannot be recalled if there is no such thing as reality beyond language. Indeed, a strong constructionist position seems to annihilate the notion of time as process and leaves only a present with no relation to past or future as discourse and language replace time and material practice. Where there was earlier a concern with time as process and self as developing in social relationships that changed over time, more attention has been paid to the concept of identity, also in biographical research.
Identity was earlier discussed in relation to development and particularly with reference to the life course phase of youth (Erikson 1980 [1959]). The epistemological shift towards constructionist approaches introduced terms such as ‘fragmented identities’ and identities as matters of choice (Giddens 1991; Plummer 2001). Such notions are more spatial than temporal since identities in this sense bear no relation to development in time but can be regarded as constructed in discourse and as markers of life style rather than being related to development over life course phases (Brannen and Nilsen 2005). Where Erikson saw identity as part of a wider notion of self, identity has in many instances replaced the notion of self as ‘selves’ are thought of in terms of being constructed in discursive fields rather than developed in social relationships (Bonnell and Hunt 1999, p. 22).
METHODOLOGY DISCUSSIONS BEYOND THE QUANTITATIVE-QUALITATIVE, THE POSITIVIST-INTERPRETIVE AND THE REALIST-CONSTRUCTIONIST DIVIDES?
Biographical research currently spans a wide array of approaches and perspectives. As the blurring of boundaries between disciplines within the social sciences and between the social sciences and the humanities has increased over the past decade, and cross-disciplinary studies have been encouraged²², the debates over biographical and other methods of performing research are many and varied. The influence from hermeneutics and methodological approaches originating in humanistic disciplines, together with the epistemological shift towards constructionist/interpretive perspectives, has led some to subsume biographical material under the term interpretive approaches²³. In doing so the story as a told story is put at the forefront of attention. It goes without saying that biographical accounts are told stories. However, whether one believes there is a reality beyond the account, and hence some factual experiences informants talk about, and makes these part of the analysis, is an important distinction between a constructionist and a realist approach. In order to overcome the divide created by the epistemological debates, and for social science in general and biographical research in particular to maintain its critical potential, a return to agency as a key sociological notion is held by some to be crucial (Bonnell and Hunt 1999; Chamberlayne et al. 2000).
Exploring the way people talk about their lives is important for many reasons. Understanding narrative structure can add immensely to the overall understanding of a biographical account, not only in terms of language used, but also with reference to the social positioning of individuals in society (Reissman 1991; Nilsen 1996). Moreover, it can also give insight into and draw attention to the silences in biographical accounts, and thus make visible the taken-for-granted aspects of people’s lives that are more often than not structurally founded and thus important for understanding the informant in the context that the life unfolds within. In cross-national comparative research this aspect of biographical accounts is particularly important (Nilsen and Brannen 2002; Brannen and Nilsen 2005).
However, approaching biographical accounts from this perspective alone can render the more material structural contexts
that surround and inform the content of the story an individual has to tell less important. It therefore seems significant for biographical research to be equally aware of the questions that were raised early in its history as those that are currently in vogue.

The ontological and epistemological foundations of the ‘cultural turn’ make it difficult to envisage a social science that can produce convincing evidence of, for instance, social disparities between groups of people (Nilsen 1994; Bonnell and Hunt 1999). If the notion of culture replaces that of social structure, and individual narratives about lives become the most important objects of analysis rather than lived experiences as expressions of social and collective being, the question of whether there is a place for social science research that highlights power and systematic differences and inequalities between people may rightfully be posed. Whether there will indeed be room for the potential of social science to provide critical analyses of trends and development at different levels of society is another question that can be asked. As Chamberlayne et al. point out in a critique of cultural studies without agency, ‘“Cultural sociology” rather than “cultural studies” is what is needed’ (p. 9).

To illustrate some implications of these questions a current strand of thought may be taken as an example. It also highlights the importance of discussing methods in relation to theoretical perspectives and ideas that address themselves to particular topics in social research.

The individualisation thesis as formulated by Beck (1992) and Beck and Beck-Gernsheim (1995) is informed by a life course perspective and a biographical approach. Arguing from a life course perspective Beck and Beck-Gernsheim (1995) maintain that a ‘standard biography’ is being replaced by a ‘choice biography’, and that life course phases no longer follow the same pattern they used to since structural characteristics such as age, gender and social class are not as significant for shaping individuals’ lives as they once were. An individualisation is said to take place, where people are forced to make choices to a much larger extent than in ‘high modernity’. Individual choices take centre stage and the characteristics that form opportunity structures that make for systematic disparities in individuals’ life chances are not recognised as such. If social scientists carry out empirical biographical research with this type of theoretical backcloth as the main conceptual apparatus, analyses are taken to a level of abstraction where indeed discourse and narratives are more meaningful starting points than the intersection of history and biography. For the latter to be included in studies, attention to the complex and many-layered contexts in which people’s lives are embedded is needed. Empirical research has challenged the individualisation thesis on many fronts, especially the fact that it is not sensitive to variation but rather works as another ‘Grand Narrative’ that shapes the outlook on life rather than tells a sociological story about social diversity and inequality (Nilsen and Brannen 2002; Brannen and Nilsen 2005).

For biographical research it is especially important that the tradition which sets the stories informants tell into a multi-layered social framework, rather than merely analysing them from a discourse and narrative approach, is upheld. As Daniel Bertaux observes in a paper that highlights biographical research as a tool for comparative analysis,

Whenever [life stories] are used for probing subjectivities, life story interviews prove able to probe deep; perhaps because it is much easier to lie about one’s opinions, values and even behaviour than about one’s own life. […] it takes a sociological eye – some lay persons do possess it – to look through a particular experience and understand what is universal in it; to perceive, beyond described actions and interactions, the implicit sets of rules and norms, the underlying situations, processes and contradictions that have both made actions and interactions possible and that have shaped them in specific ways. It takes some training to hear, behind the solo of a human voice, the music of society and culture in the background. This music is all the more audible if, in conducting the interview, in asking the very first question, in choosing, even earlier, the right persons for interviewing, one has worked with sociological issues and riddles in mind. (Bertaux 1990, pp. 167–168)
This quote echoes Mills’ visions for sociology: what it should be about and the role of sociologists in society. However, it also draws attention to some of the debates surrounding the biographical research initiated by the work of Thomas and Znaniecki: can life stories be relied upon? To what extent can this type of material hope to be seen as representative of more than the individual story? Far from being dismissed as mere ‘positivist’ lines of questioning, such issues are real and are routinely faced by researchers working within this tradition.

The paradox is that both the positivist and interpretive sides of the divide question the validity of biographical research founded on a realist pragmatic starting point. From an extreme interpretive side of the divide debates about representativeness are easily rejected as irrelevant since they are considered positivist. Extreme positivism on the other hand would question biographical material because it does not qualify as objective data. This chapter has thus argued that a third position needs focusing on. In order to map out this third position the case has been made for a closer look into the ontological and epistemological standpoints that underpin methodological debates within biographical research. The parameters for the discussion have ranged from the starting point in debates about ‘method as technique’ that highlighted the quantitative-qualitative divide, to the current situation that focuses on epistemological questions and discussions across the boundaries of a realist-constructionist divide.

NOTES

1 Definitions of biographical research will be discussed in a later section. In this chapter the focus will be on overall debates within this field, thus variations in traditions for making use of this perspective will not be the focus here.
2 The terms ‘case’ and ‘case studies’ are referred to in different ways in current sociology. For an overview of themes and topics in debates over case studies, see Gomm et al. 2000 and Yin 2003.
3 In Durkheim’s original text the use of the term ‘method’ also encompasses what is being referred to here as methodology.
4 As pointed out by Platt (1996) the English terms for method and methodology create problems when used as adjectives; both are referred to as ‘methodological’. This chapter is concerned with methodology in the wider sense, not with method in the strict sense of ‘technique’ or ‘procedure’ for studying the social world.
5 As Kaplan (1964) observes, the term ‘methodology’ is often used synonymously with epistemology by philosophers (p. 20). The definition of epistemology referred to in the context of this chapter is ‘theory of knowledge’.
6 For other ways of classifying, see Miller 2000; Roberts 2002.
7 This study was carried out in Chicago where sociology was still very much influenced by American pragmatism. For a further discussion of this see Nilsen American Pragmatism and Biographical Research (work in progress).
8 See, e.g. Kaplan (1964) for a detailed discussion of different forms of positivism and their relevance for social science studies. Platt (1996) also gives a detailed account of different interpretations of positivism in relation to ‘scientism’: ‘Its meaning overlaps with that now attached to ‘positivism’. It is associated with a commitment to making social science like natural science, and thus with themes such as empiricism, objectivity, observability, operationalism, behaviourism, value neutrality, measurement and quantification’ (Platt 1996, pp. 67–68).
9 This ontological position is in Lewis and Smith’s (1980) terms a ‘materialist social nominalism’ (p. 8).
10 Theory in a strong positivist sense is aimed at building laws through hypothesis testing over time.
11 It should be kept in mind here that the situation in Hitler’s extended Germany was one where positivist ways of doing social science were actually the most effective way to challenge racist beliefs that underpinned the Third Reich’s ideology, and social scientists who advocated such research were persecuted and had to flee the country if they could. Paul Lazarsfeld was but one of these scientists who fled to the USA. The direct impact of the ‘Vienna Circle’ on the development of American and also European social science methods is, however, one that must be seen in view of other simultaneous tendencies within American social science itself (see Platt 1996 for a detailed discussion of this topic).
12 The difference between Blumer and Mead on approach to method, where the latter saw no problems in combining qualitative and quantitative methods, is pointed out by Deegan 2001. Blumer’s approach must be seen in view of the contemporary time of his writing, where the quantitative-qualitative divide was much more prominent than in Mead’s time.
13 In one sense Glaser and Strauss took Blumer’s notion of ‘sensitising concepts’ and developed it in a direction that ‘operationalised’ how to go about making use of sensitising concepts in actual empirical studies.
14 Grounded theory has been criticised for being too positivist and quantitative in its approach to data and method (see, e.g. Christensen et al. 1998). However, at the time it was published it represented a more radical approach than what it is thought of today.
15 This is not to say that life history interviews had not been conducted before the 1960s; in psychology there was much interest in biographical interviewing. However, an account of this falls outside the scope of this paper.
16 See in particular papers by Bertaux, Ferrarotti, Kohli and Thompson in the book.
17 Drawing on Dilthey’s notions of understanding meaning in context and Heidegger’s development of his ideas in Being and Time, Heidegger’s student Gadamer published Truth and Method in 1960 (Gadamer 1989), which has since become a standard reference within hermeneutical approaches. These works are mainly concerned with the interpretation of texts and were subjects for the humanities rather than the social sciences to start with. This was to change as post-structuralism and post-modernism gained more ground in the social sciences in the 1980s.
18 References in Denzin’s text are not included in this quote.
19 See Teresa Iles (1992) for an example of publications from meetings across disciplinary boundaries. Stanley (1992) also voices the need for more cross-disciplinary research in feminist biographical studies.
20 Pragmatist thought does not rest on a notion about truth as fixed, and thus a possibility to arrive at some final account of life. Events and individuals’ experiences of them are recalled at different points in time, which can make factual events take on different meanings in a personal life as time passes. This does not, however, mean events did not happen, or did not happen that particular way, rather that they are seen and interpreted in different ways depending on the present a story is told in and the context the interview takes place in (Nilsen 1996). The interview itself and the relationship between the interviewer and the informant also determine which aspects of factual events informants relate in their accounts. It is important to note here that this way of approaching interpretation does not imply a rejection of something ‘true’ and ‘factual’ in events, in personal lives as well as in historical and structural terms.
21 The notion of experience, for the very reasons mentioned here, came under debate and questions about experience itself were asked. It was not the ‘truth’ of people’s accounts of experiences that was called into question, but the ontological foundation that the notion of experience rests on: whether there is an independent reality.
22 This drift towards interdisciplinarity has its critics. As Bonnell and Hunt (1999, p. 14) observe: ‘Dialogue among the disciplines depends in part on a strong sense of their differences from each other: exchange is not needed if everything is the same; interdisciplinarity can only work if there are in fact disciplinary differences’.
23 See, e.g. Plummer 2001.

REFERENCES

Beck, Ulrich 1992. The Risk Society. London: Sage.
Beck, Ulrich and Elisabeth Beck-Gernsheim 1995. The Normal Chaos of Love. Cambridge: Polity Press.
Bertaux, Daniel (ed.) 1981. Biography and Society. London: Sage.
Bertaux, Daniel 1990. ‘Oral History Approaches to an International Social Movement’ in Öyen E. (ed.) Comparative Methodology. London: Sage.
Bertaux, Daniel and Paul Thompson 1997. Pathways to Social Class: A Qualitative Approach to Social Mobility. Oxford: Clarendon Press.
Blumer, Herbert 1979 [1939]. An Appraisal of Thomas and Znaniecki’s ‘The Polish Peasant in Europe and America’. New Brunswick: Transaction Books.
Blumer, Herbert 1954. ‘What is Wrong with Social Theory’, American Sociological Review 19, 3–10.
Bonnell, Victoria E. and Lynn Hunt 1999. ‘Introduction’ in Bonnell V. and Hunt L. (eds) Beyond the Cultural Turn. Berkeley: University of California Press.
Bourdieu, Pierre et al. 1999. The Weight of the World: Social Suffering in Contemporary Society. Cambridge: Polity Press.
Brannen, Julia 1995. Mixing Methods. Qualitative and Quantitative Research. Aldershot: Avebury.
Brannen, Julia, Peter Moss and Ann Mooney 2004. Working and Caring over the Twentieth Century. Change and Continuity in Four-Generation Families. Basingstoke: Palgrave Macmillan.
Brannen, Julia and Ann Nilsen 2005. ‘Individualisation, Choice and Structure: A Discussion of Current Trends in Sociological Analysis’, The Sociological Review 53(3), 412–428.
Bryman, Alan 2004. Social Research Methods. Oxford: Oxford University Press.
Chamberlayne, Prue, Joanna Bornat and Tom Wengraf 2000. ‘Introduction’ in Chamberlayne et al. (eds) The Turn to Biographical Methods in Social Science: Comparative Issues and Examples. London: Routledge.
Christensen, Karen, Else Jerdal, Atle Møen, Per Solvang and Liv J. Syltevik 1998. Prosess og methode (Process and Method). Oslo: Universitetsforlaget.
Deegan, Mary Jo 2001. ‘Introduction: George Herbert Mead’s First Book’ in Mead, George Herbert (ed.) Essays in Social Psychology. New Brunswick: Transaction Publishers.
Denzin, Norman 1989a. Interpretive Interactionism. London: Sage.
Denzin, Norman 1989b. Interpretive Biography. London: Sage.
Durkheim, Emile 1972 [1895]. Den sociologiske metode (Rules of Sociological Methods). København: Fremad.
Elder, Glen 1974. Children of the Great Depression: Social Change in Life Experience. Chicago: University of Chicago Press.
Erikson, Erik 1980 [1959]. Identity and the Life Cycle. New York: Norton.
Gadamer, Hans-Georg 1989. Truth and Method. London: Sheed and Ward.
Giddens, Anthony 1991. Modernity and Self-Identity. Cambridge: Polity Press.
Giele, Janet and Glen Elder 1998. ‘Life Course Research. Development of a Field’ in Giele J. and Elder G. (eds) Methods of Life Course Research. Qualitative and Quantitative Approaches. London: Sage.
Gittins, Diana 1979. ‘Oral History, Reliability, and Recollection’ in Moss and Goldstein (eds) The Recall Method in Social Survey. University of London Institute of Education: Studies in Education 9.
Glaser, Barney G. and Anselm L. Strauss 1967. The Discovery of Grounded Theory: Strategies for Qualitative Research. Chicago: Aldine.
Gomm, Roger, Martyn Hammersley and Peter Foster (eds) 2000. Case Study Method. London: Sage.
Harding, Sandra 1987. ‘Introduction: Is There a Feminist Method’ in Harding S. (ed.) Feminism and Methodology. Milton Keynes: Open University Press.
Heritage, John 1984. Garfinkel and Ethnomethodology. Cambridge: Polity Press.
Iles, Teresa (ed.) 1992. Biography. All Sides of the Subject. London: Pergamon Press.
Kaplan, Abraham 1964. The Conduct of Inquiry. Methodology for Behavioral Science. Scranton: Chandler Publishing Company.
Levin, Irene 2000. ‘Forholdet mellom sosiologi og sosialt arbeid’ (The relationship between sociology and social work), Sosiologisk tidsskrift (Journal of Sociology) 8(1), 61–71.
Lewis, David and Richard Smith 1980. American Sociology and Pragmatism. Mead, Chicago Sociology and Symbolic Interactionism. Chicago: The University of Chicago Press.
Miller, Robert 2000. Researching Life Stories and Family Histories. London: Sage.
Mills, C. Wright 1940. ‘Methodological Consequences of the Sociology of Knowledge’, American Journal of Sociology 46(3), 316–330.
Mills, C. Wright 1980 [1959]. The Sociological Imagination. London: Penguin Books.
Nilsen, Ann 1994. ‘Life Stories in Context. A Discussion of the Linguistic Turn in Contemporary Sociological Life Story Research’, Sosiologisk Tidsskrift 2(2), 139–153.
Nilsen, Ann 1996. ‘Stories of Life – Stories of Living. Women’s Narratives and Feminist Biographies in NORA’, Nordic Journal of Women’s Studies 1(4), 16–31.
Nilsen, Ann 1997. ‘Great Expectations? Exploring Men’s Biographies in Late Modernity’ in Grønmo, Sigmund and Bjørn Henrichsen (eds) Society, University and World Community. Essays for Ørjar Øyen, pp. 111–135. Oslo: Scandinavian University Press.
Nilsen, Ann and Julia Brannen 2002. ‘Theorising the Individual-Structure Dynamic’ in Brannen et al. (eds) Young Europeans, Work and Family: Futures in Transition, pp. 30–48. London: Routledge.
Nilsen, Ann and Julia Brannen 2005. Consolidated Interview Report from the Transitions Research Project for the EU Framework 5 funded study Gender, parenthood and the changing European workplace, printed by the Manchester Metropolitan University: Research Institute for Health and Social Change.
Nilsen, Ann (forthcoming) American Pragmatism and Biographical Research (work in progress).
Platt, Jennifer 1992. ‘“Case Study” in American Methodological Thought’, Current Sociology 40(1), 17–48.
Platt, Jennifer 1996. A History of Sociological Research Methods in America 1920–1960. Cambridge: Cambridge University Press.
Plummer, Ken 1983. Documents of Life. An Introduction to the Problems and Literature of a Humanistic Method. London: Allen & Unwin.
Plummer, Ken 2001. Documents of Life 2. An Invitation to a Critical Humanism. London: Sage.
Reissman, Catherine Kohler 1991. ‘When Gender is Not Enough. Women Interviewing Women’ in Lorber, Judith and Susan Farrell (eds) The Social Construction of Gender. London: Sage.
Riley, Mathilda W. (ed.) 1988. Social Structures and Human Lives. London: Sage.
Roberts, Brian 2002. Biographical Research. Buckingham: Open University Press.
Stanley, Liz 1992. The Autobiographical I. Manchester: Manchester University Press.
Thomas, William I. and Florian Znaniecki [1918–20] 1927. The Polish Peasant in Europe and America. New York: Knopf.
Thompson, Paul 1978. The Voice of the Past, Oral History. Oxford: Oxford University Press.
Yin, Robert 2003. Case Study Research. Design and Methods. Thousand Oaks: Sage.
8
Research Ethics in Social Science
Celia B. Fisher and Andrea E. Anushko
Unparalleled growth in the social and behavioral sciences in the last half of the twentieth century has made and will continue to make significant contributions to society’s understanding of persons as individuals, as members of familial and non-familial social groups, and as participants within cultural, social, economic and political macrosystems. Increased public recognition of the value of social research has been accompanied by heightened sensitivity to the obligation to conduct social science responsibly. The formidable task of ensuring ethical competence in social research depends upon sensitive and informed planning by ethically informed scientists and careful review by nationally mandated or independent Institutional Review Boards (IRBs) or Research Ethics Committees (RECs). The broad language of national and international regulations and the diversity of expertise and wide latitude in decision-making given to IRBs are often intimidating to social scientists who are required to apply for IRB approval as a condition of conducting their research. Social scientists are additionally challenged because of the historical and biomedical bias in the language and scope of regulations governing IRBs in the United States and RECs in Europe, Latin America, India, Thailand, and Africa among other developing countries (e.g. Council for International Organizations of Medical Sciences, 2002; Indian Council of Medical Research, 2000; National Consensus Conference on Bioethics and Health Research in Uganda, 1997; National Research Council, 2003; Thailand Ministry of Public Health Ethics Committee, 1995; World Medical Association, 2000).

A BRIEF HISTORY OF RESEARCH ETHICS RULES AND REGULATIONS

Biomedical research ethics have a long history formally beginning with the Nuremberg Code (1946), the international response to the atrocities committed in Nazi medical experimentation. However, because the acts committed by the Nazi scientists seemed so far removed from standard medical and social research, the Nuremberg Code had little influence on medical or social science research (Steinbock et al., 2005). Biomedical research ethics continued to evolve slowly in the United States and abroad (Declaration of Helsinki, 1964). In the United States it was not until the 1970s, when revelations of subjects’ abuse in the now infamous Tuskegee Syphilis
social scientists while still building upon the Nuremberg Code and the Declaration of Helsinki, highlighting such principles as research merit and integrity, justice, beneficence, and respect. Others have chosen to order their principles according to the weight they should receive when in conflict, specific to the types of dilemmas social scientists often face. For example, Canada prioritizes its four principles for social science researchers in the order of: (1) Respect for the Dignity of Persons; (2) Responsible Caring; (3) Integrity in Relationships; and (4) Responsibility to Society (CPA, 2000).

This chapter now turns to four specific areas of continued and emerging ethical concern in social research: conflicts of interest, informed consent, cultural equivalence, and the use of monetary incentives. The chapter concludes with a call for ethical commitment, ethical awareness and active engagement in the ongoing development of courses of action reflecting the highest ideals of responsible social science.

CONFLICT OF INTERESTS

Social researchers should strive to establish relationships of trust with research participants, the scientific community, and the public. When conflicting professional, personal, financial, legal or other interests impair the objectivity of data collection, analysis or interpretation, such trust and the validity of the research are compromised. Ethical steps to avoid potentially harmful or exploitative conflicts of interest are critical to ensure that data analysis and interpretation are led by data and not other interests. Impairment of objectivity can harm participants, the public, institutions, funders, and the integrity of social science as a field.

Several national bodies and organizations have produced guidelines for conflict of interest decision-making relevant to the conduct of social science research. For example, in the United States the National Institutes of Health Office of Extramural Research requires every institution receiving federal research funds to have written guidelines for the avoidance and institutional review of conflict of interest. These guidelines must reflect state and local laws and cover financial interests, gifts, nepotism, political participation, and other issues (see: http://grants.nih.gov/grants/policy/emprograms/overview/ep-coi.htm).

Of relevance to investigators is the U.S. Public Health Service and National Science Foundation (NSF) requirement that any funding application must include a statement on whether there are any significant financial interests that could directly and significantly affect the design, conduct or reporting of the research. Such interests can include consulting fees, honoraria, ownership or equity options, or intellectual property (e.g. patents, copyrights, and royalties) where such values exceed $10,000. Academic institutional salaries and lectures sponsored by non-profit or public entities are exempt from this policy (see: http://www.nsf.gov/policies/conflicts.jsp, http://grants2.nih.gov/grants/policy/nihgps_2001/nihgps_2001.pdf).

In addition, many IRBs in the United States are requiring researchers to include a conflict of interest statement in their informed consents, and journals are requiring a statement describing the absence or existence of a potential conflict of interest. For example, APA publications require authors to reveal any possible conflict of interest (e.g. financial interests in a test procedure, funding by pharmaceutical companies) in the conduct and reporting of research. According to the International Committee of Medical Journal Editors (ICMJE) (2003) editors may use information disclosed in conflict of interest and financial interest statements as a basis for editorial decisions. Prompted in large part by concerns about conflicts of interest stemming from the relationship between pharmaceutical companies and independent clinical research organizations, India and other developing countries are beginning to call for adoption of international and establishment of national regulations for research conflicts of interest (Editorial, The Hindu, 2005; Pan African Bioethics Initiative, 2001).
view social sciences as ‘hard science’ and IRB unfamiliarity with qualitative research methods has also posed challenges to anthropologists, sociologists, and other social scientists whose research often strays from the classical scientific method because of unique research questions or the nature of their population (Marshall, 2003). Informed consent is also problematic when working with immigrant populations or in international settings for reasons ranging from language barriers and fear of exploitation or deportation to authority to consent resting with an individual other than the participants, e.g. in countries where women are not permitted to consent to research without prior male permission (Marshall, 2003).

In studies where informed consent is obtained, it is often difficult to ensure fully informed consent at the start of a project because researchers may not be able to anticipate the full extent of information that will emerge (Haverkamp, 2005). Risks to privacy and confidentiality emerge when the information leads to unanticipated revelations regarding illegal behaviors (crimes, child or domestic abuse, illegal immigration), health problems (HIV status, genetic disorder) or other information that if revealed could jeopardize participants’ legal or economic status (Fisher & Goodman, in press; Fisher & Ragsdale, 2006). One way to address this issue is to develop in advance a re-consent strategy for situations in which unanticipated and sensitive issues emerge during the course of observation or discussion (Fisher, 2004; Haverkamp, 2005). The strategy can include a set of criteria to help the interviewer: (1) identify when unexpected information may lead to increased participant privacy and confidentiality risk; (2) determine whether the direction of the conversation is relevant to the research question; (3) if not relevant, find ways to divert the discussion; or (4) if relevant, alert the participant to the new nature of information and implement a mutually negotiated re-consent procedure.

Archival research

Similar, but more difficult issues emerge when consent is obtained for social research that will be archived. Social science has a prestigious history of archives (Young & Brooker, 2006). The purpose of archived data is to provide a rich set of data that can be used by future investigators to examine empirical questions about populations that may not be anticipated when information is first collected. Several organizations have begun to unite social science researchers and their data from around the world to create large, secure and accessible databases of archived information. For instance, the Inter-University Consortium for Political and Social Research (ICPSR) has over 500 college or university members and has four major operations units, one of which is data security and preservation. The Harvard-MIT data center also archives and protects various social science data to allow access for future generations of social science researchers. Participant identity is protected in these archives through a very detailed process of individual de-identification. However, the racial, ethnic, cultural, health, or other demographic-based populations from which participants were recruited in most instances must remain identifiable for the research questions to be meaningful.

Within the continuously changing social-political context in which science and society evolve, some investigators have begun to question the validity of informed consent to ongoing secondary analysis by unknown third parties with research questions that may be inconsistent with the consent understandings of those who initially agreed to participation and preservation. This becomes of particular concern when secondary analysis of data from historically oppressed or disenfranchised communities is requested (Young & Brooker, 2006) or if the circumstances under which the original data were collected are questionable, as in the 1968 Yanomami research conducted by Neel (http://members.aol.com/archaeodog/darkness_in_el_dorado/documents/0081.htm).

Requiring individual participants to reconsent to the use of archival data can be both harmful and infeasible. First, it would require that records linking responses to individually identifiable information be preserved over decades, where confidentiality protections may be vulnerable over time.
Second, it would require locating individuals after years or decades, which in many cases would be impossible, and the unavailability of segments of the initial population would compromise the validity of the sample. In response to these challenges, the Council of National Psychological Associations for the Advancement of Ethnic Minority Interests (CNPAAEMI, 2000) has recommended that social research archives consider setting up standing community (broadly defined) advisory boards as a means of helping archive administrators determine when newly proposed analyses may violate the intent of the informed consent.

Deception research and the ‘consent paradox’

In research using deceptive methods, the researcher intentionally misinforms participants about the purpose of the study, the procedures, or the role of individuals with whom the participant will be required to interact (Sieber, 1982). The use of deceptive techniques is not prohibited in any national research regulations and is explicitly permitted with stipulations in professional ethics codes including those of the American Psychological Association (2002), American Sociological Association (1999), Canadian Psychological Association (2000), British Psychological Society (2006), and the International Sociological Association (2001). Baumrind (1979) distinguished between nonintentional deception, in which failure to fully inform cannot be avoided because of the complexity of the information, and intentional deception, which is the withholding of information in order to obtain participation that the subject might otherwise decline. Simply not providing participants with specific hypotheses regarding the relationship among experimental variables does not in itself constitute deception.

Deception most obviously violates the principle of respect, by depriving prospective participants of the opportunity to make an informed choice regarding the true nature of their participation. What Fisher (2005) has termed the ‘consent paradox’ underscores the moral ambiguity surrounding consent for deception research when the investigator intentionally gives participants false information about the purpose and nature of the study. In such contexts consent for deception research distorts the informed consent process, because it leads prospective participants to believe they have autonomy to decide about the type of experimental procedures they will be exposed to, when in fact they do not.

The deception debate

Debate on the ethical justification for deceptive research practices reflects a tension between scientific validity and respect for participants’ right to make a truly informed participation decision (Fisher & Fyrberg, 1994). Arguments for deception emphasize the methodological advantage of keeping participants naïve about the purpose of the study to ensure responses to experimental manipulations are spontaneous and unbiased (Milgram, 1964; Resnick & Schwartz, 1973; Smith & Richardson, 1983). Arguments against deception emphasize the violation of participant autonomy, the potential to create public distrust in social science research in general and the harm resulting from infliction of self-knowledge that was unexpected, unwanted, shameful or distressful (Baumrind, 1964).

Sociologists have been at the center of the deception controversy and have members who are staunch advocates and opponents of the practice. Allen (1997) falls into the latter category, criticizing sociologists for befriending groups of interest without letting on that they were subjects of sociological research, misrepresenting the motives of their research, and adopting a false persona to conduct research. Particularly disturbing to Allen is the defense that personal time and effort prevented the feasibility of other methods, thus in order to get the research done deception was necessary.

Ethical options

Bulmer (1982) concludes that completely disguising the intent of research can affect
the quality of the data collected as well as exaggerate the unknown biases of the researcher. Instead he proposes such methods as retrospective participant observation, in which a sociologist uses retrospective observations from previous experience when she was a total participant prior to any research interest. He also supports the use of native as stranger, in which an already established member of the group is trained as a sociologist. The covert outsider is another suggested method, in which a legitimate role, such as a teacher in a prison, is taken on in order to observe behavior and gain access to an otherwise unreachable population (Bulmer, 1982).
According to U.S. federal guidance (OPRR, 1993), when considering the use of deception, investigators must first decide whether the information to be withheld during consent would, if known, influence the individual’s desire to participate in research. However, judging this prospectively is difficult. Some have argued that responses from previous participants during dehoaxing (revealing the true nature of the study at the end of participation) can be used to document the benign effects of different deceptive methodologies. This approach raises its own (debriefing) paradox (Fisher, 2005). Fisher and Fyrberg (1994) found that introductory psychology students (the most commonly recruited participants for deception studies) were likely to believe that the dehoaxing process was either simply a continued extension of the research or that the debriefing information was itself untrue. As a result, students reported they would be unlikely to reveal their true feelings to experimenters during the dehoaxing process, and some were concerned they would be penalized if they were truthful.
The APA Ethics Code (APA, 2002) attempts to balance the principles of beneficence, non-maleficence, and respect. First, the use of deceptive methods must be justified by the study’s prospective value in scientific, educational or applied areas. Second, even if the research is determined to have value, deception is prohibited if it is reasonably expected that the procedures will cause any physical pain or severe emotional distress. Third, the investigator must prove that the same hypotheses cannot be sufficiently explored and tested using non-deceptive designs. This standard thus prohibits the use of deception research if inconvenience or costs of performing non-deceptive research are the only reasons for proposing such methods (Fisher, 2003a). In addition, the true nature of the deception must be revealed to participants at the end of the study unless the debriefing might reasonably be expected to bias future participant responses or the debriefing itself would cause participant harm (APA, 2002, Standard 8.08b).
While the ethics codes of the APA and other organizations attempt to increase the ethical rigor of decisions to use deception methodologies, no guidance can erase the threat to participant autonomy that such procedures reflect. Neither debriefing (even when believed to be valid by participants) nor the opportunity to withdraw their data is a panacea for the ethical paradox of deception research. Consent can only be obtained prospectively (OPRR, 1993); subsequent procedures can never be considered an adequate substitute.
FAIR DISTRIBUTION OF THE BENEFITS AND BURDENS OF RESEARCH
The principle of justice is concerned with the fair and equitable distribution of research benefits and burdens. In social research, benefits are defined by the usefulness of data generated to help understand micro and macro social processes within and among different populations. The burdens of social research include exposure to research risks and required time and effort associated with participation. Justice in social research becomes a particular ethical challenge when racial or ethnic minority, disadvantaged, or disenfranchised populations are recruited for participation in research designs that fail to include consideration of unique population characteristics that may reduce the knowledge value of data generated or expose them to
greater risk or financial burden (Fisher, 1999; Trimble & Fisher, 2006).
Population generalizability
The constantly changing demographic U.S. and international landscapes pose the risk that research findings from one participant population will be inappropriately generalized to other populations. This can occur in at least two ways. First, injustices may occur when populations are intentionally or unintentionally excluded from recruitment, but results of the study are inappropriately generalized to apply to their social or psychological characteristics and circumstances. This becomes particularly problematic for social science when ethnic/racial characteristics are only vaguely described in journal articles. Typical descriptions that provide inadequate knowledge for assessing the relevance of the data to ethnic minority populations in the United States are, for example: ‘the majority of participants were non-Hispanic white’; or ‘eighty percent of participants were non-Hispanic white; the remaining 20 percent were African American and Hispanic’ (Fisher & Brennan, 1992).
Defining race, ethnicity, and culture
When participants’ race, ethnicity, or culture are described in greater detail there is often an absence of definition of what these terms mean or how decisions to identify participants by ‘race’ (physical similarities assumed to reflect phenotypic expressions of shared genotypes), ‘ethnicity’ (assumed cultural, linguistic, religious, and historical similarities), or ‘culture’ (group ways of thinking and living based upon shared knowledge, consciousness, skills, values, expressive forms, social institutions, and behaviors that allow individuals to survive in the contexts within which they live) reflect assumptions about the underlying causal mechanisms driving similarities or differences found among populations (Fisher et al., 1997). Further, there is often little recognition that social, economic, and political forces continuously shape and redefine these definitions for both individuals and society at large (Chan & Hume, 1995; Zuckerman, 1990). Investigators need to consider and explicitly describe the theoretical, empirical, and social frameworks driving the definitions of race, ethnicity, or culture used to select participant populations, to ensure the scientific validity of the research question and to allow their research findings to be evaluated within the context of continuously changing scientific and societal conceptions of these definitions (Fisher et al., 2002).
Within-group differences are also an important factor to consider when identifying population characteristics relevant to the study questions. Investigators often ignore the scientific implications of variation among populations described under broad panethnic labels. For example, failure to identify the national origins of participants categorized as ‘Hispanic’ (e.g. Mexico, Puerto Rico, Guatemala, Chile) can produce overgeneralizations that dilute or obscure moderating effects on social behavior resulting from national origin, immigration history, religion, and tradition. In addition, within even these more nationally defined categories, research participants may vary greatly in their identification with the ethnic group of family origin or in the degree to which they are acculturated to majority culture (Fisher et al., 1997).
Cultural equivalence of assessment measures
Investigators need to heed a second risk of producing research injustice: failure to recognize that a measure of a social construct established in one population may, when applied to another ethnic/cultural group, neither yield similar psychometric properties nor reflect a social phenomenon that has similar behavioral or psychological patterns of relationships (Hoagwood & Jensen, 1997; Laosa, 1990). The use of such measures risks the over- or under-identification of socially meaningful characteristics, compromising the scientific benefits of the research
Payments across research populations differing in financial need create a tension between fair compensation for the time and inconvenience of research participation and coercion.
Ideally, monetary incentives for research participation should strengthen generalizability by providing a balanced representation of individuals from all economic levels appropriate to the research question (Giuffrida & Togerson, 1997; Kamb et al., 1998). However, individuals from different economic circumstances can have different responses to cash inducement as fair or coercive (Levine, 1986). Payments that are unnecessarily low can reduce the generalizability of data through under-recruitment of economically disadvantaged populations. Payments that are too high raise different concerns. For example, large financial incentives can jeopardize the voluntary nature of participation, undermine altruistic motivations for engaging in research, tempt prospective participants to provide false information to become eligible for study participation, or lead them to lie in response to experimental questions to comply with investigator expectations (Attkisson et al., 1996; Fisher, 2003b; Saunders et al., 1999). Grady (2001) argues that offering arbitrary or large sums of money to entice participants is poor practice, while modest payments help to minimize possible undue inducement. She proposes that the informed consent process, in which participants are reminded of their freedom to refuse participation or withdraw their consent without repercussions, is adequate protection against potential coercion (Grady, 2001).
Based on an analysis of compensation practices of a representative sample of biomedical and psychosocial research conducted in 1997 and 1998, Latterman and Merz (2001) reported average research payments of $9.50/hour plus $12.00 for each additional task (U.S. dollars); larger compensation was related to longer participatory time, repeated interaction with the researcher, invasive tasks, and the number of tasks. From their small study these researchers concluded that payments in published studies are related to time and level of activity. In addition, they found no evidence that participants of these studies were being enticed with large monetary inducements.
Payment for participation in illicit drug use research
Cash payment for participation in illicit drug use research can create an ethical paradox if it is used by participants to purchase illegal drugs, encourages them to maintain their drug habits to continue earning research money, or leads them to provide answers to experimental questions that distort evaluation of the social correlates and consequences of drug use (Fisher, 2003b; Koocher, 1991; McGrady & Bux, 1999; Shaner et al., 1995). On the other hand, for those who have difficulty obtaining and holding jobs, the money may be ethically justified as a legal means of obtaining payment for unskilled labor. Policies aimed at addressing this problem include spreading out the payment of full compensation over a period of time, using food coupons or vouchers for other health-related products, making payments to third parties on behalf of the participant, or withholding payment if a participant is intoxicated or in withdrawal (Fisher, 2004; Gorelick et al., 1999). Such alternatives raise their own ethical quandaries. First, there is no evidence that any non-cash substitute for cash incentives deters participants with illicit drug habits from using the monetary value of the incentives to purchase drugs. For example, informal observations by social scientists working in the field suggest that, if need be, vouchers are easily sold by participants for cash. Furthermore, a decision not to pay substance abusers can reinforce economic inequities between drug abusing and non-abusing populations or deny them the right to apply their own value system to life risk decisions (Fisher, 1999).
Ensuring fairness
Social scientists are challenged to determine payments that are perceived by all participants
the APA ethics code. Washington, DC: American Psychological Association.
Chan, K. S., & Hume, S. (1995). Racialization and panethnicity: From Asians in America to Asian Americans. In W. D. Hawley & A. W. Jackson (Eds.), Toward a common destiny: Improving race and ethnic relations in America (pp. 205–236). San Francisco: Jossey-Bass.
Council for International Organizations of Medical Sciences. (2002). International guidelines for ethical review of epidemiological studies. Geneva: Council for International Organizations of Medical Sciences. Retrieved May 11, 2007 from http://www.cioms.ch/frame_guidelines_nov_2002.htm.
Council of National Psychological Associations for the Advancement of Ethnic Minority Interests. (2000). Guidelines for research in ethnic minority communities. Washington, DC: American Psychological Association.
Dench, S., Iphofen, R., & Huws, U. (2004). IES Report 412. An EU code of Ethics for Socio-economic Research. Brighton, UK: The Institute for Employment Studies.
Department of Health, Education, & Welfare (DHEW). (1978). The Belmont report: Ethical principles and guidelines for the protection of human subjects of research. Washington DC: US Government Printing Office.
Department of Health and Human Services. (2001). Title 45 Public Welfare, Part 46, Code of federal regulations, Protections of human subjects. Washington, DC: Government Printing Office.
Dickert, N., & Grady, C. (1991). What’s the price of a research subject: Approaches to the payment for research participation. New England Journal of Medicine, 341, 198–203.
Fisher, C. B. (1999). Relational ethics and research with vulnerable populations. Reports on research involving persons with mental disorders that may affect decision-making capacity (Vol. II, pp. 29–49). Commissioned Papers by the National Bioethics Advisory Commission, Rockville, MD. Retrieved March 21, 2006 from http://www.georgetown.edu/research/nrcbl/nbac/pubs.html.
Fisher, C. B. (2002). A goodness-of-fit ethic for informed consent. Fordham Urban Law Journal, 30, 159–171.
Fisher, C. B. (2003a). Decoding the ethics code: A practical guide for psychologists. Thousand Oaks, CA: Sage Publications.
Fisher, C. B. (2003b). Adolescent and parent perspectives on ethical issues in youth drug use and suicide survey research. Ethics & Behavior, 13, 302–331.
Fisher, C. B. (2003c). A goodness-of-fit ethic for child assent to nonbeneficial research. The American Journal of Bioethics, 3, 27–28.
Fisher, C. B. (2004). Informed consent and clinical research involving children and adolescents: Implications of the revised APA ethics code and HIPAA. Journal of Clinical Child and Adolescent Psychology, 33, 833–840.
Fisher, C. B. (2005). Deception research involving children: Ethical practices and paradoxes. Ethics & Behavior, 15, 271–287.
Fisher, C. B., & Brennan, M. (1992). Application and ethics in developmental psychology. In D. L. Featherman, R. M. Lerner, & M. Perlmutter (Eds.), Life-span development and behavior (pp. 189–219). Hillsdale, NJ: Lawrence Erlbaum Associates.
Fisher, C. B., & Fyrberg, D. (1994). College students weigh the costs and benefits of deceptive research. American Psychologist, 49, 417–426.
Fisher, C. B., & Goodman, S. J. (in press). Goodness-of-fit ethics for non-intervention research involving dangerous and illegal behavior. In D. Buchanan, C. B. Fisher, & L. Gable (Eds.), Ethical & legal issues in research with high risk populations: Addressing threats of suicide, child abuse, and violence. Washington, DC: APA Press.
Fisher, C. B., Hoagwood, K., Boyce, C., Buster, T., Frank, D. A., Grisso, T., Levine, R. J., Macklin, R., Spencer, M. B., Takanishi, R., Trimble, J. E., & Zayas, L. H. (2002). Research ethics for mental health science involving ethnic minority children and youth. American Psychologist, 57, 1024–1040.
Fisher, C. B., Jackson, J., & Villarruel, F. (1997). The study of African American and Latin American children and youth. In R. M. Lerner (Ed.), Handbook of child psychology (Vol. I, 5th ed., pp. 1145–1207). New York: Wiley.
Fisher, C. B., & Masty, J. K. (2006). A goodness-of-fit ethic for informed consent to pediatric cancer research. In R. T. Brown (Ed.), Comprehensive handbook of childhood cancer and sickle cell disease: A biopsychosocial approach (pp. 205–217). New York: Oxford University Press.
Fisher, C. B., & Ragsdale, K. (2006). A goodness-of-fit ethics for multicultural research. In J. Trimble and C. B. Fisher (Eds.), The handbook of ethical research with ethnocultural populations and communities (pp. 3–26). Thousand Oaks, CA: Sage Publications.
Fried, A. F., & Fisher, C. B. (in press). The ethics of informed consent for research in clinical and abnormal psychology. In D. McKay (Ed.), Handbook of research methods in abnormal and clinical psychology. Thousand Oaks, CA: Sage Publications.
Giuffrida, A., & Togerson, D. J. (1997). Should we pay the patient? Review of financial incentives to enhance
patient compliance. British Medical Journal, 315, 703–707.
Gorelick, D. A., Pickens, R. W., & Bonkovsky, F. O. (1999). Clinical research in substance abuse: Human subjects issues. In H. A. Pincus, J. A. Lieberman, & S. Ferris (Eds.), Ethics in psychiatric research (pp. 177–192). Washington, DC: American Psychiatric Association.
Grady, C. (2001). Money for research participation: Does it jeopardize informed consent? American Journal of Bioethics, 1, 40–44.
Haverkamp, B. E. (2005). Ethical perspectives on qualitative research in applied psychology. Journal of Counseling Psychology, 52, 146–155.
Heath, S. B. (1997). Culture: Contested realm in research on children and youth. Applied Developmental Science, 1, 113–123.
Heller, J. (1972). Syphilis victims in the U.S. study went untreated for 40 years. New York Times, 26 July 1972, 1, 8.
Hoagwood, K., & Jensen, P. S. (1997). Developmental psychopathology and the notion of culture: Introduction to the special section on ‘The fusion of cultural horizons: Cultural influences on the assessment of psychopathology in children and adolescents.’ Applied Developmental Science, 1, 108–112.
Indian Council of Medical Research. (2000). Ethical guidelines on biomedical research involving human subjects. New Delhi: Indian Council of Medical Research. Retrieved May 11, 2007 from http://www.icmr.nic.in/ethical.pdf.
International Committee of Medical Journal Editors. (2003). Uniform requirements for manuscripts submitted to biomedical journals: Writing and editing for biomedical publication. Retrieved April 2, 2004 from http://www.icmje.org/#conflicts.
International Sociological Association (2001). http://www.isa-sociology.org/about/isa_code_of_ethics.htm.
Jones, J. H. (1993). Bad blood: The Tuskegee syphilis experiment, new and expanded ed. New York: Free Press.
Kamb, M. L., Rhodes, F., Hoxworth, T., Rogers, J., Lentz, A., Kent, C., MacGowen, R., & Peterman, T. A. (1998). What about money? Effects of small monetary incentives on enrollment, retention, and motivation to change behavior in an HIV/STD prevention counseling intervention. Sexually Transmitted Infection, 74, 253–255.
Knight, G. P., & Hill, N. E. (1998). Measurement equivalence in research involving minority adolescents. In V. C. McLoyd & L. Steinberg (Eds.), Studying minority adolescents: Conceptual, methodological, and theoretical issues (pp. 183–211). Mahwah, NJ: Erlbaum.
Koocher, G. P. (1991). Questionable methods in alcoholism research. Journal of Consulting and Clinical Psychology, 59, 246–248.
Laosa, L. M. (1990). Population generalizeability, cultural sensitivity, and ethical dilemmas. In C. B. Fisher & W. W. Tryon (Eds.), Ethics in applied developmental psychology: Emerging issues in an emerging field (pp. 227–252). Norwood, NJ: Ablex.
Latterman, J., & Merz, J. F. (2001). How much are subjects paid to participate in research? The American Journal of Bioethics, 1, 45–46.
Levine, R. (1986). Ethics and regulation of clinical research (2nd ed.). Baltimore: Urban & Schwarzenberg.
Macklin, R. (1999). Moral progress and ethnical universalism. In R. Macklin (Ed.), Against relativism: Cultural diversity and the search for ethical universal in medicine (pp. 249–274). New York: Oxford University Press.
Marshall, P. L. (1992). Research ethics in applied anthropology. IRB: A Review of Human Subjects Research, 14, 1–5.
Marshall, P. L. (2003). Human Subjects Protections, Institutional Review Boards, and Cultural Anthropological Research. Anthropological Quarterly, 76, 269–285.
McGrady, B. S., & Bux, D. A. (1999). Ethical issues in informed consent with substance abusers. Journal of Consulting and Clinical Psychology, 67, 186–193.
Milgram, S. (1964). Issues in the study of obedience: A reply to Baumrind. American Psychologist, 19, 848–852.
National Advisory Council on Drug Abuse. (2000). Recommended guidelines for the administration of drugs to human subjects. DA-01-002. NIDA-CAMCODA. Retrieved January 11, 2004 from http://grants.nih.gov/grants/guide/noticefiles/NOT-DA-01-002.html.
National Consensus Conference on Bioethics and Health Research in Uganda. (1997). Guidelines for the Conduct of Health Research Involving Human Subjects in Uganda. Kampala, Uganda.
National Health and Medical Research Council (NHMRC). (2007). National Statement on Ethical Conduct in Human Research. http://www.nhmrc.gov.an/publications/synopses/_files/e72.pdf.
National Research Council. (2003). Protecting participants and facilitating social and behavior sciences research. In C. F. Citro, D. R. Ilgen, & C. B. Marret (Eds.). Washington, D.C.: The National Academies Press.
Nolan, R. W. (2002). Anthropology in practice: building a career outside the academy (directions in applied anthropology). Boulder: Lynne Rienner.
Nuremberg Code. (1946). Journal of the American Medical Association, 132, 1090.
Office for Protection From Research Risks, Department of Health and Human Service, National Institutes of Health. (1993). Protecting human research subjects: Institutional review board guidebook. Washington, DC: Government Printing Office.
Ogundiran, T. O. (2004). Enhancing the African bioethics initiative. BMC Medical Education, 4, 21.
Pan African Bioethics Initiative (PABIN). (2001). Terms of Reference. http://www.pabin.net/enindex.asp.
Resnick, J. H., & Schwartz, T. (1973). Ethical standards as an independent variable in psychological research. American Psychologist, 28, 134–139.
Saunders, C. A., Thompson, P. D., & Weijer, C. (1999). What’s the price of a research subject? New England Journal of Medicine, 341, 1550–1552.
Shaner, A., Eckman, T. A., & Roberts, L. J. (1995). Disability income, cocaine use, and repeated hospitalization among schizophrenic cocaine abusers: A government-sponsored revolving door? New England Journal of Medicine, 333, 777–783.
Sieber, J. E. (1982). Ethical dilemmas in social research. In J. E. Sieber (Ed.), The ethics of social research: Surveys and experiments (pp. 1–30). New York: Springer-Verlag.
Sieber, J. E. (1992). Planning ethically responsible research: A guide for students and internal review boards. Thousand Oaks, CA: Sage Publications.
Smith, S. S., & Richardson, D. (1983). Amelioration of deception and harm in psychological research. Journal of Personality and Social Psychology, 44, 1075–1082.
Steinbock, B., Arras, J. D., & London, A. J. (2005). Ethical issues in modern medicine. New York, NY: McGraw-Hill Higher Education.
Thailand Ministry of Public Health Ethics Committee. (1995). Rule of the medical council on the observance of medical ethics. Thailand: Ministry of Public Health.
The Hindu. (2005). Editorial: A dangerous conflict of interest. The Hindu. Retrieved May 11, 2007 from http://www.hindu.com/2005/11/30/stories/2005113002301000.htm.
Trimble, J. E., & Fisher, C. B. (2006). The handbook of ethical research with ethnocultural populations and communities. Thousand Oaks, CA: Sage Publications.
Varma, R. (1999). Women and people’s science movements in India. Technology & Society: Historical, Societal, and Professional Perspectives Proceedings, 1999 International Symposium, 29-21, 378–382.
Wendler, D., Rackoff, J. E., Emanuel, E. J., & Grady, C. (2002). The ethics of paying for children’s participation in research. Journal of Pediatrics, 141(2), 166–171.
Winslade, W. J., & Douard, J. W. (1992). Ethical issues in psychiatric research. In L. K. G. Hsu & M. Hersen (Eds.), Research in psychiatry: Issues, strategies, and methods (pp. 57–70). New York: Plenum.
World Medical Association. (2000). Declaration of Helsinki: Ethical principles for medical research involving human subjects. Edinburgh: World Medical Association. Retrieved May 11, 2007 from http://www.wma.net/e/policy/pdf/17c.pdf.
Young, C. H., & Brooker, M. (2006). Safeguarding sacred lives: The ethical use of archival data for the study of diverse lives. In J. E. Trimble & C. B. Fisher (Eds.), The handbook of ethical research with ethnocultural populations and communities. Thousand Oaks, CA: Sage Publications.
Zuckerman, M. (1990). Some dubious premises in research and theory on racial differences: Scientific, social and ethical issues. American Psychologist, 45, 1297–1303.
PART II
Research Designs
This section of the handbook provides diverse perspectives on the design of social research. It offers a sample of important issues in the design of qualitative and quantitative research rather than an integrated textbook approach to design. This approach allows a more in-depth exploration of topics that range from a detailed quantitative analysis of sample size planning for studies using multiple regression, to broader overviews of the conduct of qualitative case studies. The creation of randomized and quasi-experimental research designs is discussed in detail in the first two chapters. These chapters provide essential information on how to improve both of these research designs. From the in-depth quantitative perspective on sample size we move to a re-conceptualization of generalizability in qualitative research. The author of this chapter argues that correctly designed qualitative studies are as generalizable as representative sampling used in quantitative studies. An overview of the qualitative case study is provided in the following chapter. In the next chapter the similarities and differences in the design of qualitative and quantitative longitudinal and panel studies are discussed. The final chapter of this section discusses specific issues in the design of comparative and cross-national studies.
The first two chapters of this section deal with social science studies where an intervention is being tested to determine its impact. The priority for these research designs is to enhance the ability to draw valid conclusions about the attribution of cause.
Howard Bloom’s chapter on randomized experiments provides both a basic framework for understanding the design of experiments and a look at future developments and applications. Randomized designs require that individuals or aggregates such as organizations have an equal chance of being assigned to treatment or control groups. The major advantage of this design is that it is the best way to ensure that the groups are equivalent on both measured and unmeasured variables at the start of the study. Properly implemented, this design eliminates most threats to internal validity, i.e. the factors that threaten the ability to demonstrate that the treatment caused the effect and not something else. Familiarity with randomized designs is increasingly important as the number of studies using these designs increases. For example, in the U.S. one federal research agency (the Institute of Education Sciences) requires applicants for research grants to use a randomized design or justify why they did not. Randomized designs have been used in almost all substantive areas including such diverse topics as education, policing, and child care.
Bloom explains the five elements that need to be present in a randomized design. The research question must specify what
treatment is being tested and with what condition it will be compared. Typically the comparison group will not be a no-treatment group but a group receiving usual treatment. Second, the unit of randomization needs to be specified. One of the major advances in research has been the application of randomized designs to organizations and other aggregations such as schools or classes. The specification of the measurement methods is the third element. How will outcome and baseline characteristics be measured? The fourth element is of a practical nature. What is the implementation strategy? How will sites or individuals be recruited and randomly assigned, and how will the treatment be delivered? Fifth is the analysis plan, which addresses whether randomization was successful and whether the treatment was delivered as planned.
Planning a randomized experiment is more complex than just these five elements. Bloom explains some of these complications and suggests actions that the researcher can take to prevent or deal with potential problems. For example, he discusses the effects of non-compliance with the intervention and how to statistically adjust for it in the analysis. The chapter also suggests future directions for randomized designs.
The chapter by Tom Cook and Vivian Wong provides an excellent overview of experimental and quasi-experimental research designs, with a focus on the latter. The authors stress that while well-executed randomized experiments are the best choice for drawing causal conclusions, some quasi-experiments are excellent alternatives. The first section of the chapter carefully examines two strong designs. Both the regression-discontinuity design and the interrupted time series with a control series are good at reducing the plausibility of alternative explanations that threaten the internal validity of non-experimental designs. However, there are significant limitations to both approaches. The regression-discontinuity design requires that treatment and comparison conditions be assigned by a cut-off score on some assignment variable. This is feasible, for example, where there is screening before getting the intervention but more difficult in other situations. This design is less powerful than a randomized design and thus is less likely to detect an effect if one is there. The interrupted time series design requires several data points before and after the intervention.
The authors also discuss in some detail how to strengthen the most widely used design: the non-equivalent control design. This design compares a treated group with an untreated control group using one pre-test and one post-test. Random assignment to conditions is not used in this design. Cook and Wong note that many dismiss quasi-experiments as being grossly inferior to randomized experiments. However, they describe studies that show that under some circumstances the outcomes of well-executed quasi-experiments are comparable to those of randomized experiments. One of the most important conditions is how well the groups match before the study on both measured and unmeasured variables.
One of the more recent approaches to matching groups to enhance their equivalency involves the use of propensity scores. These scores are usually constructed from pre-treatment variables that are good predictors of group membership. These scores represent the differences in selection between the two groups. The authors provide excellent examples of other ways to strengthen quasi-experiments such as the use of double pre-tests.
One of the first questions experimental researchers need to consider in planning a study is the sample size. The availability and feasibility of collecting data from the sample is of prime consideration, especially when the sampling units are not individuals but organizations such as schools or clinics. The cost of collecting data and the number of units required will set the outer limit on the sample size. In planning a study two categories need to be considered. The first category is whether the research question is about an overall indicator (i.e. an omnibus test) or a targeted effect. The second category is whether the goal is to determine a point estimate that requires the calculation of statistical power or
if it is a confidence interval that requires the calculation of accuracy.
In the second category a power estimate is needed to test the null hypothesis, i.e. to test whether a specific value is different from zero. Concern over power is driven by needing to demonstrate statistical significance or how probable the result (a point estimate) is due to chance. An alternate approach to research questions favored by some is the use of confidence intervals. Here the question is concerned with how wide the band of uncertainty or error is. The authors use the term ‘accuracy’ to describe the narrowness of confidence intervals. The smaller the confidence interval, the better the accuracy. Accuracy is a function of precision and bias. Ken Kelley and Scott Maxwell provide an in-depth explanation of these concepts and how research questions can be categorized into a two by two table where the goal can be power versus accuracy and the effect can be targeted versus omnibus. They use this approach to help explicate how the determination of the sample size in multiple regression is dependent on these four factors. The chapter can be formidable for persons not well versed in statistical analysis. However, it provides an important way to conceptualize the decisions needed to determine sample size.
In the chapter ‘Re-Conceptualizing Generalization in Qualitative Research’ Giampietro Gobo makes the point that probability sampling cannot be advocated as the only model suited to the generalization of findings. On the other hand Gobo warns against the extreme postmodernist stance, which in fact agrees with and supports the positivist viewpoint that generalizability can only be based on random sampling. Instead, he promotes what he calls an idiographic sampling theory, which is in fact in use in several disciplines outside the human sciences. These are disciplines akin to qualitative research, for they work exclusively on few cases and have learnt to make a virtue out of necessity. Disciplines such as biology, astrophysics, genetics, paleontology, and linguistics work on non-probability samples regarded as being just as representative of their relative populations and therefore as producing generalizable results, because they start from the assumption that their objects of study possess quasi-invariant states on the properties observed. The (statistical) principle of variance is the key concept applied here. Under the variance principle, to determine the sample size, the researcher must first know the range of variance that one intends to measure. If the range of variance is high, the number of cases studied needs to be high, whereas if the range of variance is restricted, the number may be restricted as well. Gobo shows how the way in which representativeness is discussed and sought in many traditions of qualitative research is in line with the variance principle. By applying a theory-driven strategy of choosing additional cases and by defining their units of analysis in a sensible way, researchers are able to assess the variability of the phenomenon and to make sure that extreme cases are taken into account. Thus the explanation given can be argued to be generalizable to the defined population, although probability sampling is not used.
Linda Mabry’s chapter on case studies in social research provides an overview of the ways in which this approach has evolved and is used in the social sciences. Case studies are most useful for identifying and documenting the patterns of ordinary events in their social, cultural, and historical context. The case study is based on the inductive method and is a means to build a theoretical understanding of social phenomena. From this viewpoint, traditional hypothesis testing may restrict the researchers’ vision and may foster a premature conclusion and thus miss a deeper understanding of the object of study. Mabry emphasizes that an attitude of openness should be maintained in conducting a case study.
The particular strength of the chapter by Jane Elliott, Janet Holland, and Rachel Thomson on longitudinal research is that they cover both qualitative and quantitative research traditions, which are both well established and typically discussed separately. The chapter focuses on panel and cohort studies where the same group of individuals is followed through time. Elliott et al. show that in terms of the objectives for carrying
out longitudinal research, there isn’t much difference between qualitative and quantitative researchers. Longitudinal social research is done because it offers unique insights into process, change and continuity over time in phenomena ranging from individuals, families, and institutions to societies.
Elliott et al. point out that both qualitative and quantitative traditions have their strengths, which may be complemented in mixed methods studies. Quantitative methods offer refined techniques to analyze causal relations, whereas qualitative researchers tend to be shy in talking about causal relations even though some argue that because of its attention to detail, process, complexity, and contextuality, qualitative research is particularly valuable for identifying and understanding multi-causal linkages. In quantitative longitudinal research a priority is placed on collecting accurate data from a large representative sample about the nature and timing of life events, circumstances, and behavior. In qualitative longitudinal research the emphasis is far more on individuals’ understanding of their lives and circumstances and how these may change through time. While quantitative longitudinal analytic processes provide a more processual or dynamic understanding of the social world, they do so at the expense of setting up a static view of the individual. Quantitative longitudinal research provides a powerful tool for understanding the multiple factors that may affect individuals’ lives, shaping their experiences and behavior. But there is little scope for understanding how individuals use narrative to construct and maintain a sense of their own identity. Without this element there is a danger that people are merely seen as making decisions and acting within a pre-defined and structurally determined field of social relations rather than as contributing to the maintenance and metamorphosis of themselves, and the culture and community in which they live.
Comparative research, especially when it is conducted cross-nationally, is another important growth area in the social sciences in the context of the globalization of communications, technological progress, and growing internationalization. This is the focus of Chapter 15 by David de Vaus. As de Vaus concludes, such research raises the same methodological issues as other research, at least in abstracto. However, because of the complexity involved in comparative research, especially when applied cross-nationally, there are additional problems of how to deal with inter- and intra-societal differences of language and culture. The chapter explores the nature of comparative research and classifies it according to two broad types: case-based comparative studies and variable-based comparative research. The chapter explores their different logics and the problems that each confronts. The strength of case-based comparative methods lies in their understanding of specificities within the context of the whole case, a feature that is crucial to cross-cultural research. On the other hand, such research raises the problem of how to know the boundaries of the case, issues to do with the small number of cases that are typically involved, and issues around invariant causation. Variable-based comparative studies, notably discussed with reference to cross-national surveys, have their own problems related to equivalences of meanings and the standardization of procedures. However, it is arguable that case-based comparative research also has to contend with these challenges.
9
The Core Analytics of
Randomized Experiments
for Social Research
Howard S. Bloom
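[Figure residue: t-distribution axis labels t(α/2) or t(α), and t(1−β).]
[Table fragment, treatment receipt (D) by compliance type: Never-takers, D = 0 if Z = 1 and D = 0 if Z = 0; Defiers, D = 0 if Z = 1 and D = 1 if Z = 0.]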
in the overall difference between treatment and control groups. Hence, always-takers and never-takers do not contribute information about treatment effects.
If defiers do not exist¹⁶, which is reasonable to assume in many situations, the effect of treatment for compliers, termed by Angrist et al. (1996) the Local Average Treatment Effect (LATE), is¹⁷:
\[ \mathrm{LATE} = \frac{\mathrm{ITT}}{E(D \mid Z = 1) - E(D \mid Z = 0)} \tag{10} \]
Thus to estimate the LATE from an experiment, one simply divides the difference in mean outcomes for the treatment and control groups by their difference in treatment receipt rates, or:
\[ \widehat{\mathrm{LATE}} = \frac{\bar{Y}_T - \bar{Y}_C}{(\bar{D} \mid Z = 1) - (\bar{D} \mid Z = 0)} \tag{11} \]
The estimated LATE is the ratio of the estimated impact of randomization on outcomes and the estimated impact of randomization on treatment receipt¹⁸. Angrist et al. show that this ratio is a simple form of instrumental variables analysis called a Wald estimator (Wald, 1940).
Returning to our previous example, assume that there is a $1,000 difference in mean annual earnings for a treatment group and control group; half of the treatment group receives treatment and one-tenth of the control group receives treatment. The estimated LATE equals the estimated impact on the outcome ($1,000), divided by the estimated impact on treatment receipt rates (0.5 – 0.1). This ratio equals $1,000/0.4 or $2,500¹⁹.
When using this approach to estimate treatment effects, it is important to clearly specify the groups to which it applies, because different groups may experience different effects from the same treatment, and not all groups and treatment effects can be observed without making further
assumptions. The impact of ITT applies to the full treatment group. So both the target group and its treatment effect can be observed. The LATE, which can be observed, applies
multiple regression model for estimating intervention effects:
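The LATE arithmetic in the earnings example above (Equation 11) can be checked with a few lines of Python. This is only a minimal sketch: the function name `wald_late`, its parameter names, and the two mean-earnings figures (chosen so that they differ by $1,000) are hypothetical and do not come from the chapter.

```python
def wald_late(mean_outcome_treat, mean_outcome_control,
              receipt_rate_treat, receipt_rate_control):
    """Wald-type LATE estimate: the treatment-control difference in mean
    outcomes divided by the difference in treatment receipt rates."""
    itt_on_outcome = mean_outcome_treat - mean_outcome_control
    itt_on_receipt = receipt_rate_treat - receipt_rate_control
    if itt_on_receipt <= 0:
        # With no defiers, randomization should raise treatment receipt;
        # a zero or negative difference leaves the ratio undefined or
        # uninterpretable as a complier effect.
        raise ValueError("receipt-rate difference must be positive")
    return itt_on_outcome / itt_on_receipt


# Hypothetical mean earnings that differ by $1,000; half of the treatment
# group and one-tenth of the control group receive the treatment.
late = wald_late(11_000.0, 10_000.0, 0.5, 0.1)
print(f"Estimated LATE: ${late:,.2f}")  # $1,000 / 0.4
```

The guard on the denominator reflects the no-defiers assumption discussed above; with real data one would also want a standard error for the ratio, which this sketch does not attempt.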
typically past outcomes. For example, past student achievement is usually the best predictor of future student achievement. This is because past outcomes reflect most factors that determine future outcomes. Fourth, some outcomes are more predictable than others, and thus covariates provide greater precision gains for them. For example, the correlation between individual standardized test scores is typically stronger for high school students than for elementary school students (Bloom et al., 2005).
The second approach to improving precision is to block or stratify experimental sample members by some combination of their baseline characteristics, and then randomize within each block or stratum. The extreme case of two sample members per block is an example of matching. Factors used for blocking in social research typically include geographic location, organizational units, demographic characteristics, and past outcomes. Computing an unbiased estimate of the impact of ITT from such designs requires computing impact estimates for each block and pooling estimates across blocks. One way to do this in a single step is to add to the impact regression a series of indicator variables that represent each of the m* blocks, and suppress the intercept, α, yielding:
\[ Y_i = \beta_0 T_i + \sum_{m=1}^{m^*} \gamma_m S_{mi} + \varepsilon_i \tag{15} \]
where:
\(S_{mi}\) = one if sample member i is from block (or stratum) m and zero otherwise.
The estimated value of \(\beta_0\) provides an unbiased estimator of the effect of ITT. The MDES of this estimator can be expressed as:
\[ \mathrm{MDES}(\hat{\beta}_0) = M_{n-m^*-1} \sqrt{\frac{1 - R_B^2}{nP(1 - P)}} \tag{16} \]
where:
\(R_B^2\) = the proportion of unexplained variation in the outcome within experimental groups (pooled) predicted by the blocks.
There are two differences in the expressions for minimum detectable effects with and without blocking (Equations 16 and 5). The first difference involves the multipliers, \(M_{n-m^*-1}\) versus \(M_{n-2}\), which account for the loss of one degree of freedom per block and the gain of one degree of freedom from suppressing the intercept. With samples of more than about 40 members in total and 10 or fewer blocks, there is very little difference between these two multipliers. The second difference is the addition of the term \(1 - R_B^2\) in Equation 16 to account for the predictive power of blocking. The more similar sample members are within blocks and the more different blocks are from each other, the higher this predictive power is. This is where precision gains come from. Note, however, that for samples with fewer than about 10 subjects, precision losses due to reducing the number of degrees of freedom by blocking can sometimes outweigh precision gains due to the predictive power of blocking. This is most likely to occur in experiments that randomize small numbers of groups (discussed later).
Another reason to block sample members is to avoid an ‘unhappy’ randomization with embarrassing treatment and control group differences on a salient characteristic. Such differences can reduce the face validity of an experiment, thereby undermining its credibility. Blocking first on the salient characteristic eliminates such a mismatch.
Sometimes researchers wish to assure treatment and control group matches on multiple characteristics. One way to do so is to define blocks in terms of combinations of characteristics (e.g. age, race, and gender). But doing so can become complicated in practice due to uneven distributions of sample members across blocks, and the consequent need to combine blocks, often in ad hoc ways. A second approach is to specify a composite index of baseline characteristics and create blocks based on intervals of this index²⁴. Using either approach, the quality of the match on any given characteristic typically declines as the number
of matching variables increases. So it is important to set priorities for which variables to match²⁵.
Regardless of how blocks are defined, one’s impact analysis must account for them if they are used. To not do so would bias estimates of standard errors. In addition, it is possible to use blocking in combination with covariates. If so, both features of the experimental design should be represented in the experimental analysis.
RANDOMIZING GROUPS TO ESTIMATE INTERVENTION EFFECTS
This section introduces a type of experimental design that is growing rapidly in popularity: the randomization of intact groups or clusters²⁶. Randomizing groups makes it possible to measure the effectiveness of interventions that are designed to affect entire groups or are delivered in group settings, such as communities, schools, hospitals, or firms. For example, schools have been randomized to measure the impacts of whole school reforms (Cook et al., 2000; Borman et al., 2005) and school-based risk-prevention campaigns (Flay, 2000); communities have been randomized to measure the impacts of community health campaigns (Murray et al., 1994); small local areas have been randomized to study the impacts of police patrol interventions (Sherman and Weisburd, 1995); villages have been randomized to study the effects of a health, nutrition, and education initiative (Teruel and Davis, 2000); and public housing developments have been randomized to study the effects of a place-based HIV prevention program (Sikkema et al., 2000) and a place-based employment program (Bloom and Riccio, 2005).
Group randomization provides unbiased estimates of intervention effects for the same reasons that individual randomization does. However, the statistical power or precision of group randomization is less than that for individual randomization, often by a lot. To see this, consider the basic regression model for estimating ITT effects with group randomization:
\[ Y_{ij} = \alpha + \beta_0 T_j + e_j + \varepsilon_{ij} \tag{17} \]
where:
\(Y_{ij}\) = the outcome for individual i from group j
\(\alpha\) = the mean outcome without treatment
\(\beta_0\) = the average impact of ITT
\(T_j\) = 1 for groups randomized to treatment and 0 otherwise
\(e_j\) = an error that is independently and identically distributed between groups with a mean of 0 and a variance of τ²
\(\varepsilon_{ij}\) = an error that is independently and identically distributed between individuals within groups with a pooled mean of zero and variance of σ².
Equation 17 for group randomization has an additional random error, \(e_j\), relative to Equation 12 for individual randomization. This error reflects how mean outcomes vary across groups, which reduces the precision of group randomization.
To see this, first note that the relationship between group-level variance, τ², and individual-level variance, σ², can be expressed as an intra-class correlation coefficient, ρ, where:
\[ \rho = \frac{\tau^2}{\tau^2 + \sigma^2} \tag{18} \]
ρ equals the proportion of total variation across all individuals in the target population (τ² + σ²) that is due to variation between groups (τ²). If there is no variation in mean outcomes between groups (τ² = 0), ρ equals zero. If there is no variation in individual outcomes within groups (σ² = 0), ρ equals one.
Consider a study that randomizes a total of J groups in proportion P to treatment with a harmonic mean value of n individuals per group. The ratio of the standard error of this impact estimator to that for individual
randomization of the same total number of subjects (Jn) is referred to as a design effect (DE), where:
\[ \mathrm{DE} = \sqrt{1 + (n - 1)\rho} \tag{19} \]
As the intra-class correlation (ρ) increases, the DE increases, implying a larger standard error for group randomization relative to individual randomization. This is because a larger ρ implies greater random variation across groups. The value of ρ varies typically from about 0.01 to 0.20, depending on the nature of the outcome being measured and the type of group being randomized.
For a given total number of individuals, the DE also increases as the number of individuals per group (n) increases. This is because for a given total number of individuals, larger groups imply fewer groups randomized. With fewer groups randomized, larger treatment and control group differences are likely for a given sample²⁷.
The DE has important implications for designing group-randomized studies. For example, with ρ equal to 0.10 and n equal to 100, the standard error for group randomization is 3.3 times that for individual randomization. To achieve the same precision, group randomization would need almost 11 times as many sample members. Note that the DE is independent of J and depends only on the values of n and ρ.
The different standard errors for group randomization and individual randomization also imply a need to account for group randomization during the experimental analysis. This can be done by using a multilevel model that specifies separate variance components for groups and individuals (for example see Raudenbush and Bryk, 2002). In the preceding example, using an individual-level model, which ignores group-level variation, would estimate standard errors that are one-third as large as they should be. Thus, as Jerome Cornfield (1978: 101) aptly observed: ‘Randomization by group accompanied by an analysis appropriate to randomization by individual is an exercise in self-deception.’
Choosing a sample size and allocation for group-randomized studies means choosing values for J, n, and P. Equation 20 illustrates how these choices influence MDES (Bloom et al., 2005).
\[ \mathrm{MDES}(\hat{\beta}_0) = M_{J-2} \sqrt{\frac{\rho}{P(1 - P)J} + \frac{1 - \rho}{P(1 - P)Jn}} \tag{20} \]
This equation indicates that the group-level variance (ρ) is divided by the total number of randomized groups, J, whereas the individual-level variance (1 − ρ) is divided by the total number of individuals, Jn²⁸. Hence, increasing the number of randomized groups reduces both variance components, whereas increasing the number of individuals per group reduces only one component. This result illustrates one of the most important design principles for group-randomized studies: The number of groups randomized influences precision more than the size of the groups randomized.
The top panel of Table 9.2 illustrates this point by presenting MDESs for an intra-class correlation of 0.10, a balanced sample allocation, and no covariates. Reading across each row illustrates that, after group size reaches about 60 individuals, increasing it affects precision very little. For very small randomized groups (with fewer than about 10 individuals each), changing group size can have a more pronounced effect on precision.
Reading down any column in the top panel illustrates that increasing the number of groups randomized can improve precision appreciably. Minimum detectable effects are approximately inversely proportional to the square root of the number of groups randomized once the number of groups exceeds about 20.
Equation 21 illustrates how covariates affect precision with group randomization²⁹.
\[ \mathrm{MDES}(\hat{\beta}_0) = M_{J-g^*-2} \sqrt{\frac{\rho(1 - R_2^2)}{P(1 - P)J} + \frac{(1 - \rho)(1 - R_1^2)}{P(1 - P)Jn}} \tag{21} \]
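The design-effect figures quoted in this section (a standard error 3.3 times as large, almost 11 times as many sample members, and standard errors understated to about one-third) can be reproduced with a short Python sketch. It takes the DE of Equation 19 to be the square root of 1 + (n − 1)ρ, which is the reading consistent with those quoted figures since the DE is defined as a ratio of standard errors. The function names are hypothetical.

```python
import math


def intraclass_correlation(tau2, sigma2):
    """Share of total variance (tau2 + sigma2) that lies between groups."""
    return tau2 / (tau2 + sigma2)


def design_effect(n, rho):
    """Ratio of the standard error under group randomization to that under
    individual randomization of the same total number of subjects."""
    return math.sqrt(1 + (n - 1) * rho)


# The example from the text: rho = 0.10 and n = 100 individuals per group.
de = design_effect(n=100, rho=0.10)
print(round(de, 1))       # standard-error ratio
print(round(de ** 2, 1))  # sample-size multiple needed for equal precision
print(round(1 / de, 2))   # share of the true SE reported by a naive analysis
```

Squaring the DE gives the sample-size multiple because standard errors shrink with the square root of the sample size. The function takes no J argument, which mirrors the statement that the DE depends only on n and ρ.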
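Table 9.2 (top panel): MDES by number of groups randomized (rows) and group size (columns)
No covariates
Number of            Group size (n)
groups (J)      10      30      60      120     480
10              0.88    0.73    0.69    0.66    0.65
30              0.46    0.38    0.36    0.35    0.34
60              0.32    0.27    0.25    0.24    0.24
120             0.23    0.19    0.18    0.17    0.17
480             0.11    0.09    0.09    0.08    0.08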
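Equations 20 and 21 can be sketched in Python to explore how J, n, ρ and covariates move the MDES. This is an illustrative sketch under stated assumptions, not the author's own computation: the sum of the two variance terms is placed under a square root, and the multiplier is approximated with normal quantiles for a two-sided 0.05 significance level and 0.80 power, conventions that this excerpt does not state. The chapter's multiplier comes from a t distribution and is larger when there are few groups (see note 9), so an exact value can be passed in through `multiplier`. All function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def mde_multiplier_normal(alpha=0.05, power=0.80):
    """Large-sample approximation to the minimum detectable effect multiplier
    (assumed here: two-sided test at level alpha with the given power)."""
    z = NormalDist()
    return z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)


def mdes_group(J, n, rho, P=0.5, r2_group=0.0, r2_indiv=0.0, multiplier=None):
    """MDES for group randomization. With both R-squared values at zero this
    has the form of Equation 20; otherwise the variance terms follow
    Equation 21 (the degrees-of-freedom change is left to `multiplier`)."""
    if multiplier is None:
        multiplier = mde_multiplier_normal()
    group_term = rho * (1 - r2_group) / (P * (1 - P) * J)
    indiv_term = (1 - rho) * (1 - r2_indiv) / (P * (1 - P) * J * n)
    return multiplier * math.sqrt(group_term + indiv_term)


# Bottom row of the table above (J = 480), where the normal approximation
# to the multiplier is close: rho = 0.10, balanced allocation, no covariates.
for n in (10, 30, 60, 120, 480):
    print(n, round(mdes_group(J=480, n=n, rho=0.10), 2))
```

Under these assumptions the loop gives 0.11, 0.09, 0.09, 0.08 and 0.08, in line with the 480-group row. Rows with few groups need the exact t-based multiplier and will not be matched by the normal approximation.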
9 When the number of degrees of freedom becomes smaller, the multiplier becomes larger as the t distribution becomes fatter in its tails.
10 The subscript n − 2 equals the number of degrees of freedom for a treatment and control group difference of means, given a common variance for the two groups.
11 When the outcome measure is a one/zero binary variable (e.g. employed = 1 or not employed = 0) the variance estimate is p(1 − p)/n where p is the probability of a value equal to one. The usual conservative practice in this case is to choose p = 0.5, which yields the maximum possible variance.
12 The preceding discussion makes the conventional assumption that σ² is the same for the treatment and control groups. But if the treatment affects different sample members differently, it can create a σ² for the treatment group which differs from that for the control group (Bryk and Raudenbush, 1988). This is a particular instance of heteroscedasticity. Assuming that these two standard deviations are equal to each other can produce a bias in estimates of the standard error of the impact estimator (Gail et al., 1996). Two ways to eliminate this problem are to: (1) use a balanced sample allocation and (2) estimate separate variances for the treatment and control groups (Bloom, 2005b).
13 Weisburd (1993) among others, found that large samples can sometimes provide less statistical power than small samples because large samples may have weaker treatment implementation. Researchers should consider this possibility when designing experiments, although there are no clear quantitative guidelines for doing so.
14 Angrist (2005) and Gennetian et al. (2005) illustrate the approach.
15 This is a specific case of the exclusion principle specified by Angrist et al. (1996).
16 Angrist et al. (1996) refer to this condition as monotonicity.
17 This formulation assumes that the average effect of treatment on always-takers is the same whether they are randomized to treatment or control status.
18 The expression for LATE in Equation 10 simplifies to the expression for TOT in Equation 7 when there are no-shows but no crossovers. Both expressions represent ITT divided by the probability of being a complier. When there are crossovers (but no defiers), the probability of being a complier equals the probability of receiving the treatment if randomized to the treatment group, minus the probability of being an always-taker. When there are no crossovers, there are no always-takers.
19 In the present analysis, treatment receipt is a mediating variable in the causal path between randomization and the outcome. Gennetian et al. (2005) show how the same approach (using instrumental variables with experiments) can be used to study causal effects of other mediating variables.
20 The remainder of this chapter assumes a common variance for treatment and control groups.
21 One way to estimate \(R_A^2\) from a dataset would be to first estimate Equation 12 and compute residual outcome values for each sample member. The next step would be to regress the residuals on the covariates. The resulting R-squared for the second regression is an estimate of \(R_A^2\).
22 See Bloom (2005b) for a discussion of this issue.
23 Covariates can also provide some protection against selection bias due to sample attrition.
24 Such indices include propensity scores (Rosenbaum and Rubin, 1983) and Mahalanobis distance functions (http://en.wikipedia.org/wiki/Mahalanobis_distance).
25 One controversial issue is whether to treat blocks as ‘fixed effects,’ which represent a defined population, or ‘random effects,’ which represent a random sample from a larger population. Equations 15 and 16 treat blocks as fixed effects. Raudenbush et al. (2005) present random-effects estimators for blocking.
26 Bloom et al. (2005), Donner and Klar (2000), and Murray (1998) provide detailed discussions of this approach; Boruch and Foley (2000) review its applications.
27 The statistical properties of group randomization in experimental research are much like those of cluster sampling in survey research (Kish, 1965).
28 When total student variance (τ² + σ²) is standardized to a value of one by substituting the intra-class correlation (ρ) into the preceding expressions, ρ represents τ² and (1 − ρ) represents σ².
29 Raudenbush (1997) and Bloom et al. (2005) discuss in detail how covariates affect precision with group randomization.
30 The basic principles discussed here extend to situations with more than two levels of clustering.
31 Existing sources of this information include, among others: Bloom et al. (1999, 2005); Hedges and Hedberg (2005); Murray and Blitstein (2003); Murray and Short (1995); Schochet (2005); Siddiqui et al. (1996); and Ukoumunne et al. (1999).
32 Some other countries where randomized social experiments have been conducted include: the UK (Walker et al., 2006); Mexico (Shultz, 2004); Colombia (Angrist et al., 2002); Israel (Angrist and Lavy, 2002); India (Banerjee et al., 2005; Duflo and Hanna, 2005); and Kenya (Miguel and Kremer, 2004). For a review of randomized experiments in developing countries, see Kremer (2003).
33 Two studies that tried to open the black box of treatment effects experimentally are the Riverside, California Welfare Caseload Study, which randomized different caseload sizes to welfare workers (Riccio et al., 1994) and the Columbus, Ohio, comparison of
rapidly expanding range of social science broader application of this approach to social
questions; experimental designs have become research.
increasingly sophisticated; and statistical
methods have become more advanced. So
what are the frontiers for future advances?
One frontier involves expanding the geo- ACKNOWLEDGMENTS
graphic scope of randomized experiments in
the social sciences. To date, the vast majority This chapter was supported by the Judith
of such experiments have been conducted Gueron Fund for Methodological Innovation
in the United States, although important in Social Policy Research at MDRC, which
exceptions exist in both developed and was created through gifts from the Annie
developing countries32 . Given the promise of E. Casey, Rockefeller, Jerry Lee, Spencer,
the approach, much more could be learned by William T. Grant and Grable Foundations.
promoting its use throughout the world. Many thanks are due to Richard Dorsett,
A second frontier involves unpacking the Carolyn Hill, Rob Hollister, and Charles
‘black box’ of social experiments. Experi- Michalopoulos for their helpful suggestions.
ments are uniquely qualified to address ques-
tions like: what did an intervention cause to
happen? But they are not well suited to address
questions like: why did an intervention have NOTES
or not have an effect33 ? Two promising
approaches to such questions are emerging, 1 References to randomizing subjects to compare
treatment effects date back to the seventeenth
which combine nonexperimental statistical century (Van Helmont, 1662), although the earliest
methods with experimental designs. documented use of the method was in the late
One approach uses instrumental variables nineteenth century for research on sensory perception
analysis to examine the causal paths between (Peirce and Jastrow, 1884/1980). There is some
randomization and final outcomes by com- evidence that randomized experiments were used for
educational research in the early twentieth century
paring intervention effects on intermediate (McCall, 1923). But it was not until Fisher (1925 and
outcomes (mediating variables) with those on 1935) combined statistical methods with experimental
final outcomes34 . The other approach uses design that the method we know today emerged.
methods of research synthesis (meta-analysis 2 Marks (1997) provides an excellent history of this
or multilevel models that pool primary process.
3 See Bloom (2005a) for an overview of group-
data) with multiple experiments, multiple randomized experiments; see Donner and Klar (2000)
experimental sites, or both to estimate how and Murray (1998) for textbooks on the method.
intervention effects vary with treatment 4 For further examples, see Greenberg and Shroder
implementation, sample characteristics, and (1997).
local context35 . It is especially important 5 Absent treatment, the expected values of all
past, present, and future characteristics are the
for this latter approach to have high-quality same for a randomized treatment group and control
implementation research that is conducted in group. Hence, the short-term and long-term future
parallel with randomized experiments. experiences of the control group provide valid
Perhaps the most important frontier for estimates of what these experiences would have been
randomized experiments in the social sciences for the treatment group had it not been offered the
treatment.
is the much-needed expansion of organiza- 6 Three studies that used national probability
tional and scientific capacity to implement sampling and random assignment are the evaluations
them successfully on a much broader scale. of Upward Bound (Myers et al., 2004), Head Start
To conduct this type of research well requires (Puma et al., 2006) and the Job Corps (Schochet,
high levels of scientific and professional 2006).
7 The present discussion assumes a common
expertise, which at present exist only at a outcome variance for the treatment and control
limited number of institutions. It is therefore groups.
hoped that this chapter will contribute to a 8 Note that $Pn$ equals $n_T$ and $(1 - P)n$ equals $n_C$.
separate versus integrated job functions for welfare workers (Scrivener and Walter, 2001).
34 For example, Morris and Gennetian (2003), Gibson et al. (2005), Liebman et al. (2004), and Ludwig et al. (2001) used instrumental variables with experiments to measure the effects of mediating variables on final outcomes.
35 Heinrich (2002) and Bloom et al. (2003) used primary data from a series of experiments to address these issues.
REFERENCES
Aigner, Dennis J. 1985. ‘The Residential Time-of-Use Pricing Experiments: What Have We Learned?’ In Jerry A. Hausman and David A. Wise (eds.), Social Experimentation. Chicago: University of Chicago Press.
Angrist, Joshua D. 2005. ‘Instrumental Variables Methods in Experimental Criminology Research: What, Why and How.’ Journal of Experimental Criminology 2: 1–22.
Angrist, Joshua, Eric Bettinger, Erik Bloom, Elizabeth King, and Michael Kremer. 2002. ‘Vouchers for Private Schooling in Colombia: Evidence from a Randomized Natural Experiment.’ The American Economic Review 92(5): 1535–58.
Angrist, Joshua, Guido Imbens, and Don Rubin. 1996. ‘Identification of Causal Effects Using Instrumental Variables.’ JASA Applications invited paper, with comments and authors’ response. Journal of the American Statistical Association 91(434): 444–55.
Angrist, Joshua D., and Victor Lavy. 2002. ‘The Effect of High School Matriculation Awards: Evidence from Randomized Trials.’ Working Paper 9389. New York: National Bureau of Economic Research.
Banerjee, Abhijit, Shawn Cole, Esther Duflo, and Leigh Linden. 2005. ‘Remedying Education: Evidence from Two Randomized Experiments in India.’ Working Paper 11904. Cambridge, MA: National Bureau of Economic Research.
Bell, Stephen, Michael Puma, Gary Shapiro, Ronna Cook, and Michael Lopez. 2003. ‘Random Assignment for Impact Analysis in a Statistically Representative Set of Sites: Issues from the National Head Start Impact Study.’ Proceedings of the August 2003 American Statistical Association Joint Statistical Meetings (CD-ROM). Alexandria, VA: American Statistical Association.
Bloom, Dan, and Charles Michalopoulos. 2001. How Welfare and Work Policies Affect Employment and Income: A Synthesis of Research. New York: MDRC.
Bloom, Howard S. 1984. ‘Accounting for No-Shows in Experimental Evaluation Designs.’ Evaluation Review 8(2): 225–46.
Bloom, Howard S. 1995. ‘Minimum Detectable Effects: A Simple Way to Report the Statistical Power of Experimental Designs.’ Evaluation Review 19(5): 547–56.
Bloom, Howard S. (ed.). 2005a. Learning More from Social Experiments: Evolving Analytic Approaches. New York: Russell Sage Foundation.
Bloom, Howard S. 2005b. ‘Randomizing Groups to Evaluate Place-Based Programs.’ In Howard S. Bloom (ed.), Learning More from Social Experiments: Evolving Analytic Approaches. New York: Russell Sage Foundation.
Bloom, Howard S., Johannes M. Bos, and Suk-Won Lee. 1999. ‘Using Cluster Random Assignment to Measure Program Impacts: Statistical Implications for the Evaluation of Education Programs.’ Evaluation Review 23(4): 445–69.
Bloom, Howard S., Carolyn J. Hill, and James A. Riccio. 2003. ‘Linking Program Implementation and Effectiveness: Lessons from a Pooled Sample of Welfare-to-Work Experiments.’ Journal of Policy Analysis and Management 22(4): 551–75.
Bloom, Howard S., Larry L. Orr, George Cave, Stephen H. Bell, Fred Doolittle, and Winston Lin. 1997. ‘The Benefits and Costs of JTPA Programs: Key Findings from the National JTPA Study.’ The Journal of Human Resources 32(3): 549–76.
Bloom, Howard S., and James A. Riccio. 2005. ‘Using Place-Based Random Assignment and Comparative Interrupted Time-Series Analysis to Evaluate the Jobs-Plus Employment Program for Public Housing Residents.’ Annals of the American Academy of Political and Social Science 599 (May): 19–51.
Bloom, Howard S., Lashawn Richburg-Hayes, and Alison Rebeck Black. 2005. ‘Using Covariates to Improve Precision: Empirical Guidance for Studies that Randomize Schools to Measure the Impacts of Educational Interventions.’ Working Paper. New York: MDRC.
Borman, Geoffrey D., Robert E. Slavin, A. Cheung, Anne Chamberlain, Nancy Madden, and Bette Chambers. 2005. ‘The National Randomized Field Trial of Success for All: Second-Year Outcomes.’ American Educational Research Journal 42: 673–96.
Boruch, Robert F. 1997. Randomized Experiments for Planning and Evaluation. Thousand Oaks, CA: Sage Publications.
Boruch, Robert F., and Ellen Foley. 2000. ‘The Honestly Experimental Society: Sites and Other Entities as the Units of Allocation and Analysis in Randomized Trials.’ In Leonard Bickman (ed.), Validity and Social
Experimentation: Donald Campbell’s Legacy, vol. 1. Thousand Oaks, CA: Sage Publications.
Box, George E.P., J. Stuart Hunter, and William G. Hunter. 2005. 2nd ed. Statistics for Experimenters: Design Innovation and Discovery. New York: John Wiley and Sons.
Bryk, Anthony S., and Stephen W. Raudenbush. 1988. ‘Heterogeneity of Variance in Experimental Studies: A Challenge to Conventional Interpretations.’ Psychological Bulletin 104(3): 396–404.
Cochrane Collaboration. 2002. ‘Cochrane Central Register of Controlled Trials Database.’ Available at the Cochrane Library Web site: www.cochrane.org (accessed September 14, 2004).
Cochran, William G., and Gertrude M. Cox. 1957. Experimental Designs. New York: John Wiley and Sons.
Cohen, Jacob. 1977/1988. Statistical Power Analysis for the Behavioral Sciences. New York: Academic Press.
Cook, Thomas H., David Hunt, and Robert F. Murphy. 2000. ‘Comer’s School Development Program in Chicago: A Theory-Based Evaluation.’ American Educational Research Journal 37(1): 535–97.
Cornfield, Jerome. 1978. ‘Randomization by Group: A Formal Analysis.’ American Journal of Epidemiology 108(2): 100–02.
Cox, D.R. 1958. Planning of Experiments. New York: John Wiley and Sons.
Donner, Allan, and Neil Klar. 2000. Design and Analysis of Cluster Randomization Trials in Health Research. London: Arnold.
Duflo, Esther, and Rema Hanna. 2005. ‘Monitoring Works: Getting Teachers to Come to School.’ Working Paper 11880. Cambridge, MA: National Bureau of Economic Research.
Fisher, Ronald A. 1925. Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.
Fisher, Ronald A. 1935. The Design of Experiments. Edinburgh: Oliver and Boyd.
Flay, Brian R. 2000. ‘Approaches to Substance Use Prevention Utilizing School Curriculum Plus Social Environment Change.’ Addictive Behaviors 25(6): 861–85.
Gail, Mitchell H., Steven D. Mark, Raymond J. Carroll, Sylvan B. Green, and David Pee. 1996. ‘On Design Considerations and Randomization-Based Inference for Community Intervention Trials.’ Statistics in Medicine 15: 1069–92.
Gennetian, Lisa A., Pamela A. Morris, Johannes M. Bos, and Howard S. Bloom. 2005. ‘Constructing Instrumental Variables from Experimental Data to Explore How Treatments Produce Effects.’ In Howard S. Bloom (ed.), Learning More from Social Experiments: Evolving Analytic Approaches. New York: Russell Sage Foundation.
Gibson, C., Katherine Magnusen, Lisa Gennetian, and Greg Duncan. 2005. ‘Employment and Risk of Domestic Abuse among Low-Income Single Mothers.’ Journal of Marriage and the Family 67: 1149–68.
Greenberg, David H., and Mark Shroder. 1997. The Digest of Social Experiments. Washington, DC: Urban Institute Press.
Hedges, Larry V., and Eric C. Hedberg. 2005. ‘Intraclass Correlation Values for Planning Group Randomized Trials in Education.’ Working Paper WP-06-12. Evanston, IL: Northwestern University, Institute for Policy Research.
Heinrich, Carolyn J. 2002. ‘Outcomes-Based Performance Management in the Public Sector: Implications for Government Accountability and Effectiveness.’ Public Administration Review 62(6): 712–25.
Kane, Thomas. 2004. ‘The Impact of After-School Programs: Interpreting the Results of Four Recent Evaluations.’ Working Paper. New York: W.T. Grant Foundation.
Kemple, James J., and Jason Snipes. 2000. Career Academies: Impacts on Students’ Engagement and Performance in High School. New York: MDRC.
Kempthorne, Oscar. 1952. The Design and Analysis of Experiments. Malabar, FL: Robert E. Krieger Publishing Company.
Kish, Leslie. 1965. Survey Sampling. New York: John Wiley.
Kling, Jeffrey R., Jeffrey B. Liebman, and Lawrence F. Katz. 2007. ‘Experimental Analysis of Neighborhood Effects.’ Econometrica 75(1): 83–119.
Kremer, Michael. 2003. ‘Randomized Evaluations of Educational Programs in Developing Countries: Some Lessons.’ American Economic Review 93(2): 102–06.
Liebman, Jeffrey B., Lawrence F. Katz, and Jeffrey R. Kling. 2004. ‘Beyond Treatment Effects: Estimating the Relationship Between Neighborhood Poverty and Individual Outcomes in the MTO Experiment.’ IRS Working Paper 493 (August). Princeton, NJ: Princeton University, Industrial Relations Section.
Lindquist, E.F. 1953. Design and Analysis of Experiments in Psychology and Education. Boston: Houghton Mifflin Company.
Lipsey, Mark W. 1988. ‘Juvenile Delinquency Intervention.’ In Howard S. Bloom, David S. Cordray, and Richard J. Light (eds.), Lesson from Selected Program and Policy Areas. San Francisco: Jossey-Bass.
Lipsey, Mark W. 1990. Design Sensitivity: Statistical Power for Experimental Research. Newbury Park, CA: Sage.
Ludwig, Jens, Greg J. Duncan, and Paul Hirschfield. 2001. ‘Urban Poverty and Juvenile Crime: Evidence from a Randomized Housing-Mobility Experiment.’ The Quarterly Journal of Economics 116(2): 655–80.
McCall, W.A. 1923. How to Experiment in Education. New York: MacMillan.
Marks, Harry M. 1997. The Progress of Experiment: Science and Therapeutic Reform in the United States, 1900–1990. Cambridge: Cambridge University Press.
Miguel, Edward, and Michael Kremer. 2004. ‘Worms: Identifying Impacts on Education and Health in the Presence of Treatment Externalities.’ Econometrica 72(1): 159–217.
Morris, Pamela, and Lisa Gennetian. 2003. ‘Identifying the Effects of Income on Children’s Development: Using Experimental Data.’ Journal of Marriage and the Family 65(3): 716–29.
Munnell, Alicia (ed.). 1987. Lessons from the Income Maintenance Experiments. Boston: Federal Reserve Bank of Boston.
Murray, David M. 1998. Design and Analysis of Group-Randomized Trials. New York: Oxford University Press.
Murray, David M., and Jonathan L. Blitstein. 2003. ‘Methods to Reduce the Impact of Intraclass Correlation in Group-Randomized Trials.’ Evaluation Review 27(1): 79–103.
Murray, David M., Peter J. Hannan, David R. Jacobs, Paul J. McGovern, Linda Schmid, William L. Baker, and Clifton Gray. 1994. ‘Assessing Intervention Efforts in the Minnesota Heart Health Program.’ American Journal of Epidemiology 139(1): 91–103.
Murray, David M., and Brian Short. 1995. ‘Intraclass Correlation among Measures Related to Alcohol Use by Young Adults: Estimates, Correlates and Applications in Intervention Studies.’ Journal of Studies on Alcohol 56(6): 681–94.
Myers, David, Robert Olsen, Neil Seftor, Julie Young, and Christina Tuttle. 2004. The Impacts of Regular Upward Bound: Results from the Third Follow-up Data Collection. Washington, DC: Report prepared by Mathematica Policy Research for the U.S. Department of Education.
Myers, Jerome L. 1972. Fundamentals of Experimental Design. Boston: Allyn and Bacon.
Newhouse, Joseph P. 1996. Free for All? Lessons from the RAND Health Insurance Experiment. Cambridge, MA: Harvard University Press.
Nye, Barbara, Larry V. Hedges, and Spyros Konstantopoulos. 1999. ‘The Long-Term Effects of Small Classes: A Five-Year Follow-Up of the Tennessee Class Size Experiment.’ Education Evaluation and Policy Analysis 21(2): 127–42.
Olds, David L., John Eckenrode, Charles R. Henderson, Jr., Harriet Kitzman, Jane Powers, Robert Cole, Kimberly Sidora, Pamela Morris, Lisa M. Pettitt, and Dennis Luckey. 1997. ‘Long-Term Effects of Home Visitation on Maternal Life Course and Child Abuse and Neglect.’ The Journal of the American Medical Association 278(7): 637–43.
Orr, Larry L. 1999. Social Experiments: Evaluating Public Programs with Experimental Methods. Thousand Oaks, CA: Sage Publications.
Orr, Larry L., Judith D. Feins, Robin Jacob, Erik Beecroft, Lisa Sanbomatsu, Lawrence F. Katz, Jeffrey B. Liebman, and Jeffrey R. Kling. 2003. Moving to Opportunity: Interim Impacts Evaluation. Washington, DC: U.S. Department of Housing and Urban Development.
Peirce, Charles S., and Joseph Jastrow. 1884/1980. ‘On Small Differences of Sensation.’ Reprinted in Stephen M. Stigler (ed.), American Contributions to Mathematical Statistics in the Nineteenth Century, vol. 2. New York: Arno Press.
Puma, Michael, Stephen Bell, Ronna Cook, Camilla Heid, and Michael Lopez. 2006. Head Start Impact Study: First Year Impact Findings. (Prepared by Westat, Chesapeake Research Associates, The Urban Institute, American Institutes for Research, and Decision Information Resources, June.) Washington, DC: U.S. Department of Health and Human Services, Administration for Children and Families, Office of Planning, Research, and Evaluation.
Raudenbush, Stephen W. 1997. ‘Statistical Analysis and Optimal Design for Group Randomized Trials.’ Psychological Methods 2(2): 173–85.
Raudenbush, Stephen W., and Anthony S. Bryk. 2002. Hierarchical Linear Models: Applications and Data Analysis Methods, 2nd ed. Thousand Oaks, CA: Sage Publications.
Raudenbush, Stephen W., Andres Martinez, and Jessaca Spybrook. 2005. Strategies for Improving Precision in Group-Randomized Experiments. New York: William T. Grant Foundation.
Riccio, James, Daniel Friedlander, and Stephen Freedman. 1994. Benefits, Costs, and Three-Year Impacts of a Welfare-to-Work Program. New York: MDRC.
Robins, Philip K., and Robert G. Spiegelman (eds.). 2001. Reemployment Bonuses in the Unemployment Insurance System: Evidence from Three Field Experiments. Kalamazoo, MI: W.E. Upjohn Institute for Employment Research.
Rosenbaum, Paul R., and Donald B. Rubin. 1983. ‘The Central Role of the Propensity Score in Observational Studies for Causal Effects.’ Biometrika 70(1): 41–55.
Schochet, Peter A. 2005. Statistical Power for Random Assignment Evaluations of Education Programs. Princeton, NJ: Mathematica Policy Research.
Schochet, Peter A. 2006. National Job Corps Study and Longer-Term Follow-Up Study: Impact and Benefit-Cost Findings Using Survey and Summary Earnings Records Data. Princeton, NJ: Mathematica Policy Research.
Scrivener, Susan, and Johanna Walter, with Thomas Brock and Gayle Hamilton. 2001. National Evaluation of Welfare-to-Work Strategies: Evaluating Two Approaches to Case Management: Implementation, Participation Patterns, Costs, and Three-Year Impacts of the Columbus Welfare-to-Work Program. Washington, DC: U.S. Department of Health and Human Services, Administration for Children and Families, and Office of the Assistant Secretary for Planning and Evaluation; and U.S. Department of Education, Office of the Under Secretary and Office of Vocational and Adult Education.
Sherman, Lawrence W., and David Weisburd. 1995. ‘General Deterrent Effects of Police Patrol in Crime ‘Hot Spots’: A Randomized Control Trial.’ Justice Quarterly 12(4): 625–48.
Shultz, Paul T. 2004. ‘School Subsidies for the Poor: Evaluating the Mexican Progresa Poverty Program.’ Journal of Development Economics 74(1): 199–250.
Siddiqui, Ohidul, Donald Hedeker, Brian R. Flay, and Frank B. Hu. 1996. ‘Intraclass Correlation Estimates in a School-Based Smoking Prevention Study: Outcome and Mediating Variables, by Sex and Ethnicity.’ American Journal of Epidemiology 144(4): 425–33.
Sikkema, Kathleen J., Jeffrey A. Kelly, Richard A. Winett, Laura J. Solomon, Cargill, V.A., Roffman, R.A., McAuliffe, T.L., Heckman, T.G., Anderson, E.A., Wagstaff, D.A., Norman, A.D., Perry, M.J., Crumble, D.S., and Mercer, M.B. 2000. ‘Outcomes of a Randomized Community-Level HIV Prevention Intervention for Women Living in 18 Low-Income Housing Developments.’ American Journal of Public Health 90(1): 57–63.
Teruel, Graciela M., and Benjamin Davis. 2000. Final Report: An Evaluation of the Impact of PROGRESA Cash Payments on Private Inter-Household Transfers. Washington, DC: International Food Policy Research Institute.
Ukoumunne, O.C., Gulliford, M.C., Chinn, S., Sterne, J.A.C., and Burney, P.F.J. 1999. ‘Methods for Evaluating Area-Wide and Organisation-Based Interventions in Health and Health Care: A Systematic Review.’ Health Technology Assessment 3(5): 1–99.
Van Helmont, John Baptista. 1662. Oriatrik or, Physick Refined: The Common Errors Therein Refuted and the Whole Art Reformed and Rectified. London: Lodowick-Lloyd. Available at the James Lind Library Web site: www.jameslindlibrary.org/trial_records/17th_18th_Century/van_helmont/van_helmont_kp.html (accessed January 3, 2005).
Wald, Abraham. 1940. ‘The Fitting of Straight Lines If Both Variables Are Subject to Error.’ Annals of Mathematical Statistics 11(September): 284–300.
Walker, Robert, Lesley Hoggart, Gayle Hamilton, and Susan Blank. 2006. Making Random Assignment Happen: Evidence from the UK Employment Retention and Advancement (ERA) Demonstration. Research Report 330. London: Department for Work and Pensions.
Weisburd, David, with Anthony Petrosino and Gail Mason. 1993. ‘Design Sensitivity in Criminal Justice Experiments.’ In Michael Tonry (ed.), Crime and Justice, An Annual Review of Research, vol. 17. Chicago: University of Chicago Press.
10
Better Quasi-Experimental
Practice
Thomas D. Cook and Vivian C. Wong
of measuring and analyzing selection can lead to very close approximations of experimental results. The third section offers suggestions for improving quasi-experimental design, not through the use of matching (the current dominant strategy) but through an alternative pattern-matching strategy that depends on generating and testing multiple empirical implications from the same causal hypothesis. We use examples from education and job training to illustrate the specific design attributes we discuss and recommend because the debate over experiment versus quasi-experiment is most heated in these fields. However, the design principles presented here apply elsewhere as well. Finally, our intention is not to present a treatise on analytic methods for quasi-experimental designs, but rather to showcase the strongest quasi-experimental designs and design features and to suggest common areas of weakness in current practice. For more technical and theoretical discussions of the analytic methods described in this chapter, as well as additional examples, we include a list of suggested readings in Appendix 1.
EFFICACY TESTS OF QUASI-EXPERIMENTS RELATIVE TO EXPERIMENTS: BETWEEN-STUDY VERSUS WITHIN-STUDY APPROACHES
Two approaches have been employed in studies that have assessed the validity of quasi-experimental designs. In the between-study approach, researchers compare estimated effects from the set of experimental studies done on a topic with the estimated effects from whatever quasi-experimental studies were available on the same topic. Aiken et al. (1998) summarized findings from this tradition. Across many domains of application, they concluded that the average effect sizes were sometimes similar across the experiments and quasi-experiments, but that they were also often different. And even when the means did not differ, the variance in effect sizes tended to be greater among the quasi-experiments than among the experiments (Lipsey & Wilson, 1993; Glazerman et al., 2003). So, the average experiment and quasi-experiment cannot be relied on to generate the same causal conclusion.
In the within-study approach, researchers take the effect size from a randomized experiment and compare it to the effect size from a quasi-experiment that uses the same intervention group data as the experiment but compares it with data from a non-randomly formed control group. Most of the within-study comparisons conducted to date have been in the job training field, though some have involved educational topics. At first glance, the within-study approach seems a stronger empirical test of design type difference. After all, there is variation in whether a study is experimental or not, and settings, people, and treatments are more likely to be held constant by virtue of the shared experimental group (see note 1). In contrast, the between-study tradition can involve a set of experiments that differ in many ways on average from the comparison set of quasi-experiments, even though logic calls for variation in design types but not in anything else that might be correlated with study outcomes. This makes between-study results inherently ambiguous.
Our goal is to reexamine conclusions from the within-study comparison literature that have been most prominently discussed and cited in the fields of economics, job training, and education. We focus on studies that were included in Glazerman et al.’s (2002, 2003) meta-analysis of within-study comparisons, as well as comparisons that have been more recently published (see Appendix 1 for a list of within-study comparisons found in education, job training, and economics). However, since there are only 20 within-study comparison studies, we acknowledge that the basis for extrapolation is limited. Moreover, we discuss only a subset of these studies in detail: three with an RD design, one with an abbreviated interrupted time series design, and four with a difference-in-differences design. Thus while the conclusions presented here are meant to
spur further debate, discussion, and research, they are not meant to be the final word on the efficacy of quasi-experimental designs as assessed empirically from the results of within-study comparisons.
TYPES OF EXPERIMENTS: RANDOMIZED, NONEXPERIMENTS, AND QUASI-EXPERIMENTS
All experiments seek to test a causal hypothesis by demonstrating that the cause preceded the effect in time, that the two co-vary, and that there are no alternative interpretations of why they co-vary other than that the cause was responsible for the effect. Experiments in the social and behavioral sciences also have some similar structural attributes. There are always one or more outcome measures, plus groups of units that undergo either a treatment or some contrast experience. This last is often a no-treatment control group experience that seeks to function as a causal counterfactual, that is, as an assessment of what would have happened to units receiving the treatment if they had not in fact received it.
There are different types of experiments. The randomized experiment is characterized by assignment to treatment or control status on the basis of some equivalent of a fair coin toss. It creates two or more groups that are initially comparable within the limits of sampling error. This renders them valid as a no-treatment counterfactual, with the warrant for this judgment stemming from formal probability theory. Nonexperiments, in contrast, do not use random assignment. Quasi-experiments are the special subtype of nonexperiments that attempt to mimic randomized experiments in purpose and structure despite the absence of random assignment. In contrast to quasi-experiments, other nonexperiments do not directly manipulate treatments, nor do they have observations and comparison groups that are deliberately and originally designed to provide a causal counterfactual. Longitudinal observational studies are a common type of nonexperiment used for testing causal hypotheses that do not have these last features and are therefore not quasi-experimental. This chapter is concerned with the efficacy of quasi-experimental designs relative to the randomized experiment.
In quasi-experiments, assignment to treatment or control status may be determined by self-selection or administrator decision, and so initial differences between groups may come to mimic treatment effects, thus confounding population differences between the treatment and control groups with possible effects of the treatment and so creating what is called a ‘selection’ problem. The perfectly implemented randomized experiment rules out selection (and other alternative interpretations of why a potential cause and effect co-vary) by distributing these alternatives equally over the various experimental conditions. They are not removed from the research setting, as though by magic; they are merely removed as alternative interpretations by being equally represented in each of the groups under contrast.
A well-designed quasi-experiment can also rule out alternative explanations, but doing so requires more assumptions, is less transparent, and consequently yields a more uncertain causal answer than the randomized experiment provides. In particular, the use of quasi-experiments requires close attention to three related issues. The first is to identify all plausible alternative interpretations to the hypothesis that the independent and dependent variables are causally related, these alternatives being called threats to internal validity (see Shadish et al., 2002, for an extended discussion). While the randomized experiment takes care of these threats by distributing them equally across conditions, the quasi-experiment requires researchers to examine and assess the plausibility of each threat explicitly. The second is the assumption that experimental design principles enjoy a primacy over substantive theory or statistical adjustment procedures when it comes to ruling out validity threats. In practice, this entails reliance on carefully chosen comparison groups and/or pretest measures taken
at multiple times. The third principle for ruling out alternative explanations is the use of coherent pattern matching. This requires that existing substantive theory be specific enough to predict the specific pattern of multivariate results that should result from a given causal hypothesis, a pattern that few alternative explanations can match. We begin by discussing designs that exemplify the best of what quasi-experimental theory has to offer.
REGRESSION-DISCONTINUITY DESIGN
The regression-discontinuity (RD) design is still not widely used despite theoretical and empirical demonstrations of its ability to provide unbiased treatment effect estimates when its assumptions are met. Nonetheless, RD has gained prominence as an abstract alternative to experiments in health, economics, and education (for a history of RD, see Cook, in press). Indeed, a recent request for proposals from the Institute of Education Sciences, a United States Department of Education agency that funds education research, stated that if a randomized experiment was not possible for addressing a causal question, then acceptable alternatives included ‘appropriately structured regression-discontinuity designs’ (Institute of Education Sciences, 2004). In this section, we examine the basics of an RD design, theoretical and empirical reasons why RD is so special among quasi-experimental designs, and examples of RD in order to highlight practical considerations that are important for implementing the design.
The basics of RD
In an RD design, individuals are assigned to treatment and comparison groups solely on the basis of a cutoff score from some assignment variable. The assignment variable is any measure taken prior to the treatment intervention, and there is no requirement that the measure be reliable. The obtained fallible score suffices. Individuals who score on one side of the cutoff score are assigned to the treatment while individuals who score on the other side are assigned to the comparison. Thus, treatment assignment is completely observed and depends on one’s score on the cutoff variable and on nothing else. Treatment effects then are estimated by examining the displacement of the regression line at the cutoff point determining program receipt.
Figures 10.1 and 10.2 show a hypothetical RD experiment with and without treatment effects. In both cases, the cutoff is a score of 50: those scoring above 50 receive treatment and those scoring below it are the non-equivalent controls. The graphs show scatterplots of assignment scores against posttest scores, each depicting a linear, positive relationship between the two variables. In Figure 10.1, where a treatment effect is present, we see a vertical disruption, or discontinuity, at the cutoff, though treatments can obviously also cause an upward shift. The displacement in Figure 10.1 represents a change in the mean posttest scores, equivalent to a main effect of treatment. It is also possible for treatments to cause a change in slope at the cutoff, this being equivalent to a treatment-by-assignment statistical interaction, provided that the change in slope can be unambiguously attributed to the intervention rather than to some underlying non-linear relationship between the assignment and outcome. In Figure 10.1, which has linear and parallel regression lines, we interpret the effect size to be a negative change of 5 units because there is a vertical displacement of 5 points at the cutoff. In Figure 10.2, there is no displacement at the cutoff and the regression lines are again parallel. So we interpret this as no effect.
For a simple RD design one needs an assignment variable that has ordinal properties or better. Continuous measures such as income, achievement scores, or blood pressure work best, while nominal measurements such as race or gender do not work at all because they cannot lead to correct modeling of the regression line. However, the continuous assignment variable can take on any form. It can be a pretest measure of the
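[Figure 10.1: Hypothetical RD design with a treatment effect. Posttest score (Y) plotted against assignment score (X), with the cutoff score, the treatment group regression line, the ‘projected’ regression line, and the ‘discontinuity’ or treatment effect marked.]
[Figure 10.2: Hypothetical RD design without a treatment effect. Posttest score (Y) plotted against assignment score (X), with the cutoff score and the regression line marked.]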
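The displacement at the cutoff described for Figure 10.1 can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the authors' analysis: the cutoff of 50 and the effect of -5 come from the text, while the function names (`simulate_rd`, `solve`, `estimate_rd`), the intercept, slope, noise level, and the uniform spread of assignment scores between 40 and 60 are hypothetical choices made for illustration. It fits a single regression with a centered assignment score and a treatment indicator and reads the treatment effect off the indicator's coefficient.

```python
import random

def simulate_rd(n=2000, cutoff=50.0, effect=-5.0, slope=0.5, noise_sd=2.0, seed=0):
    """Hypothetical data: units scoring above the cutoff are treated (sharp assignment)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(40.0, 60.0)          # assignment score (assumed uniform)
        d = 1.0 if x > cutoff else 0.0       # treatment indicator set only by the cutoff
        y = 25.0 + slope * x + effect * d + rng.gauss(0.0, noise_sd)  # posttest
        rows.append((x, d, y))
    return rows

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        s = sum(m[r][c] * beta[c] for c in range(r + 1, k))
        beta[r] = (m[r][k] - s) / m[r][r]
    return beta

def estimate_rd(rows, cutoff=50.0):
    """Least-squares fit of y = b0 + b1*(x - cutoff) + b2*d via the normal equations."""
    k = 3
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for x, d, y in rows:
        z = (1.0, x - cutoff, d)
        for i in range(k):
            xty[i] += z[i] * y
            for j in range(k):
                xtx[i][j] += z[i] * z[j]
    # Returns [control-line level at the cutoff, common slope, displacement at the cutoff].
    return solve(xtx, xty)

if __name__ == "__main__":
    b0, b1, b2 = estimate_rd(simulate_rd())
    print(f"level at cutoff: {b0:.2f}, slope: {b1:.2f}, displacement: {b2:.2f}")
```

Centering the assignment score at the cutoff makes the intercept the control-line level at the cutoff, so the indicator's coefficient is the vertical displacement itself. The sketch assumes linear and parallel regression lines, as in the two figures; a change in slope at the cutoff would need an added interaction term.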
score on the assignment variable and so can be perfectly modeled. In other quasi-experiments, how units came to be assigned to treatment is usually not fully known. We cannot control for all the possible covariates that might discriminate between students who volunteer to participate in a dropout prevention program versus those who do not. Indeed, the methodology literature is replete with mostly unresolved debates about procedures that might control for selection in quasi-experiments other than RD. However, only in RD and the randomized experiment is the selection process completely known and measured. This is why strict adherence to assignment based on the cutoff is essential if RD is to yield unbiased results, just as strict adherence to the ‘coin toss’ allocation is crucial for interpreting a randomized experiment.
Goldberger (1972a, 1972b) proved that generalized treatment estimates obtained from RD are comparable to estimates from randomized-experimental designs. However, unbiased estimates require meeting the following key assumptions: that the cutoff is rigorously followed; that the functional form of the relationship between the assignment and posttest can be fully described; that there are enough assignment values to responsibly estimate the regression line on each side of the cutoff; and that the assignment variable is continuous. Under these conditions, and when the assignment and outcome variables are linearly related, a single regression function or ANCOVA can be used to estimate treatment effects, with the group assignment variable and the cutoff being included as covariates. However, as Goldberger (1972a, 1972b) also showed, the RD analysis will have approximately 2.75 times less statistical power than an experiment with the same sample size when the cutoff is at the midpoint of the assignment variable.
discontinuity can be estimated using non-parametric regression techniques, and other economists have attributed the virtues of RD to the near randomness of allocation decisions around the cutoff point itself. So they use analytic methods that give greatest weight to observations closer to the cutoff on grounds that this is where random error is most likely to determine treatment status. The disadvantage of this assumption and its attendant analysis strategy is that treatment effects are identified only at the cutoff, thus limiting external validity over what would be the case when slopes on each side of the cutoff have similar values.
Adding to the theoretical case for the bias-free nature of perfectly implemented RD are the results of three empirical studies that compare effect sizes from RD and experimental benchmarks. Aiken et al. (1998) examined how students enrolled in a college remedial writing class performed in essay writing and on a Test of Standard Written English (TSWE) when compared to students without the remedial course. Before the study began, students at this university were assigned to the remedial class on the basis of a cutoff score either on the ACT or SAT. The RD design used this feature to create the treatment group consisting of all those students scoring below the cutoff, and the comparison group from all those scoring above it. In addition to the RD design, the authors included a randomized experiment that took a sample of volunteers from just below the cutoff and randomly assigned them to the remedial course or Standard English writing class. Despite differences in where treatment effects were estimated for both the experimental and RD studies, the authors found that both designs produced similar patterns of results in significance levels and effect size.
The second experiment-RD contrast was
Econometricians have extended the discus- by Buddelmeyer and Skoufias (2003). They
sion of statistical analysis in RD by devising reanalyzed data from PROGRESA, a large-
methods that bypass the questions of which scale Mexican program aimed at alleviating
variables are needed to model outcomes and poverty through investments in education,
their functional form. Hahn et al. (2001) have nutrition, and health. The authors took
shown that treatment effects at the point of advantage of the fact that Mexican villages
were randomly assigned to PROGRESA, but that families within the experimental villages were then assigned into treatment conditions based on their score on a scale of material resources. For the experimental and RD studies, the authors examined whether PROGRESA improved school attendance and reduced labor force participation among girls and boys between the ages of 12 and 16. Overall, the authors found close correspondence in the experimental and RD results. However, there was one round of results where the RD and experimental findings diverged and, after additional analyses, the authors found evidence of spillover effects in the comparison group that produced dissimilar RD findings. This led the authors to conclude that, ‘it is the comparison group rather than the method itself that is primarily responsible for the poor performance of the RD.’

The third direct comparison of experiment/RD results is the most methodologically advanced. Black et al. (2005) reanalyzed data from a job training program in Kentucky that assigned those likely to exhaust unemployment insurance to mandatory reemployment services as a requirement for benefit receipt. The RD was claimants’ assignment into job training programs based on a single score derived from a 140-item test predicting the likelihood of long-term unemployment. For each local employment office in each week, new claimants were ranked by their assigned scores. Reemployment services were given to those with the highest scores, followed by those with the next highest scores until the slots for each office each week were filled. When offices reached their maximum capacity, and if there were two or more claimants with the same profiling scores, then random number generators were used to assign the remaining claimants with the same profiling scores into treatment condition. Thus, only claimants with marginal profiling scores (the point at which capacity constraint was reached in a given week and in a given local office) were randomly assigned into experimental groups. This sampling procedure resulted in a true tie-breaking experiment and ensured that the RD causal estimate was at the same average point on the assignment variable as the experiment, creating a more interpretable contrast of the two design types.

The experimental and RD analyses compared results for three outcomes: weeks receiving unemployment insurance (UI) benefits, amount of UI benefits received, and annual earnings. The RD analyses weighted data closer to the cutoff and examined how the correspondence between experimental and RD results varied with proximity to the cutoff. The assignment and outcome variables were not linearly related, but even so a close correspondence was obtained between the experimental and RD results in statistical significance patterns, magnitude of estimates, and in direct tests of differences between RD and experimental impacts. This was especially true when the RD observations were closest to the cutoff. The implication of all three attempts to check RD results against experimental ones is that the design generates bias-free results, not just in theory, but also in complex research practice.

Black et al.’s (2005) study further illustrates that researchers can handle non-linearity in the relationship between the assignment variable and the outcome. They did this by varying the range of the assignment variable and putting an a priori faith in estimates with the least range. It is also possible to use non-parametric regression or to include a range of models using higher order terms, interactions, and/or transformations of variables in order to probe the stability of results across alternative specifications of functional form. Best of all, though, is to get measures of the outcome variable from a period prior to the intervention. Such a pretest helps describe the functional form of the assignment/outcome relationship independently of the influence of the treatment in order to permit an analysis that, in essence, differences the pre- and post-intervention slopes each side of the cutoff. This design response to the problem of possible non-linear relationships stands in stark contrast to statistical responses
that are based on non-parametric regression, differential weighting, and willingness to limit the external validity of the causal relationship to just around the cutoff point.

Examples of RDs

Earlier, we suggested that RD is slowly becoming a more popular choice for evaluation in education, where standards-based reforms have allocated funds, resources, and penalties based on students’ or schools’ obtained scores on achievement tests. This section offers more examples for how RD can be used to evaluate treatment effects in education; and it also illustrates another common problem in RD that arises when the cutoff point is not the only criterion for treatment assignment and when other, more social or political factors, also enter into the allocation decision, making it fuzzy rather than sharp as is preferable for RD.

Trochim (1984) analyzed data to determine the effects of compensatory education on student achievement. He examined a second-grade compensatory reading program in Providence, Rhode Island where all children in the same pool were pre-tested using a reading test. Those who scored below the cutoff were assigned to a reading program while those who scored above the cutoff were not assigned treatment. His analysis of Rhode Island second-graders found that the program significantly improved children’s reading abilities. However, few other state compensatory education programs that Trochim examined yielded similar positive effects (1984).

Jacob and Lefgren (2004a, 2004b) examined the effects of teacher training and summer school participation and retention on student achievement in Chicago Public Schools (CPS). We describe the design of the teacher training study in detail only. In 1996, CPS introduced a reform that placed schools on academic probation if fewer than 15 percent of students met the national norms on standardized reading exams. To improve academic achievement, CPS provided probation schools with funds and resources to buy teacher development services from external organizations. Schools that were below the 15 percent cutoff were assigned to the treatment condition (teacher development) while schools above the cutoff served as the comparison group. The independent variable was resources for teacher training; the assignment was the percentage of students who met national norms in reading; and the outcome was math and reading achievement among elementary school students. Results found that teacher training had no statistically significant effect on either students’ math or reading achievement.

However, the cutoff for assignment in Jacob and Lefgren’s (2002) study was not as clean as one would want. First, several schools that scored below the probation cutoff were waived from the policy (15 of the 77). Second, 25 schools originally placed on probation raised student achievement by enough to be removed from probation even before the treatment was completed. On the other hand, 16 schools that missed the probation cutoff in the first year were placed on probation in the next two years. Finally, there was substantial student mobility between schools. Including overrides to the cutoff in the analysis sample is likely to produce bias in treatment effect estimates, as is failure to take up the assigned treatment and attrition from the sample after assignment.

Several statistical procedures have been proposed to address fuzzy discontinuity. In the first approach, suggested by Trochim and Spiegelman (1980), an estimated assignment variable is constructed for each unit. Its distribution resembles, not the step function of a sharp discontinuity, but an ogive or spline whose slope value depends on how much mis-assignment has occurred. A simulation study by Trochim (1984) and an evaluation of Title I (Trochim, 1984) show the use of such functions as an unbiased method for dealing with fuzzy discontinuity. The second approach, employed by Jacob and Lefgren (2004a, 2004b) and others (Angrist & Lavy, 1999; van der Klaauw, 2002), uses an instrumental variable (IV) framework. Here, fuzzy discontinuity is seen as an endogeneity issue, where the assignment variable is believed to
be correlated with unobservables in the error term. An ideal instrument in RD is a variable that affects the outcome only through its association with the endogenous assignment term. In principle, the use of an instrument expunges correlation between the assignment variable and the error term. In practice, it may be difficult to know what a good IV is because one cannot test whether the IV in question is truly uncorrelated with unobservables in the error.² Jacob and Lefgren (2004a) used discontinuities in school test scores for predicting whether teachers received training or not, and then used the predicted term as their instrument for the assignment variable in parametric RD models. They ran sensitivity tests to explore alternative pathways for how test scores could influence the outcome other than through its relationship with the assignment and found no such evidence. Thus, the authors concluded that they had a valid instrument for addressing fuzziness.³

Finally, it is important with RDs to examine empirically the social dynamics of the cutoff. In the Irish school-leaving examination, it was discovered that scores just below the passing cutoff score were underrepresented in the frequency distribution, presumably because examiners did not want to hurt a student’s chances by assigning them a 38 or 39 when 40 was the passing score. In other RD studies it is not unknown for social workers to misrepresent family income around cutoffs that determine eligibility for services. Researchers should control the assignment process as much as possible and observe the process directly, preferably in a pilot research phase so that potential problems can be addressed. This same advice holds for the experiment also. Its implementation needs to be directly examined and otherwise checked.

ABBREVIATED INTERRUPTED TIME SERIES DESIGN WITH A CONTROL SERIES

When a series of observations are available on the same variable, an interrupted time series (ITS) design can be used to assess whether a treatment administered at a known time during the series leads to a change in intercept or slope at the intervention point. In much social science practice, it is difficult to find studies with enough time points to estimate the error structure and provide responsible analysis at a district, school, class or student level (Box & Jenkins, 1970).⁴ Much more common are abbreviated ITSs with, say, 4 to 20 pretest time points. Indeed, standards-based reform in education has led to the repeated tracking of student test scores, providing many opportunities for abbreviated ITS design and analysis. This section discusses the design, the theory and empirical research supporting its validity, and examples of how it has been used.

The basics of controlled abbreviated ITS design

A time series requires repeated measurements made on the same variable over time. The observations can be made on the same units, as with multiple test scores on the same student, or on different but similar units, as with test scores from multiple cohorts of students within the same school. ITS also requires an intervention that is supposed to generate an interruption in the series at a known point in time corresponding to implementation of the treatment. The design also works better when a rapid response to the intervention is expected (or when the response interval is well known, as with 9 months in the case of the period from intercourse to birth), and when the intervals between observations are short. If a treatment is phased in slowly over time, or if it reaches different sections of the target population at differing times, then implementation is better described as a gradually diffusing process rather than as an abrupt intervention. In these cases of delayed intervention, the chance of other events influencing the outcome increases, making history a plausible threat to internal validity. At a minimum, the diffusion process should be directly observed and, where possible, modeled.
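The idea of a change in intercept or slope at the intervention point, judged against a control series, can be sketched in a few lines. The example below is a minimal illustration rather than a recommended analysis: it fits separate straight lines to the pre- and post-intervention segments of the treated-minus-control difference series and compares them at the first post-intervention observation. The function names (`fit_line`, `controlled_its`) and the ten-observation series are hypothetical, the data are noise-free, and the sketch ignores the error structure (for example autocorrelation) that a responsible time series analysis would need to address.

```python
from statistics import mean


def fit_line(times, values):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    slope = sxy / sxx
    return v_bar - slope * t_bar, slope


def controlled_its(times, treated, control, first_post_time):
    """Level and slope change in the treated-minus-control series."""
    diff = [a - b for a, b in zip(treated, control)]
    pre = [(t, d) for t, d in zip(times, diff) if t < first_post_time]
    post = [(t, d) for t, d in zip(times, diff) if t >= first_post_time]
    pre_b0, pre_b1 = fit_line(*zip(*pre))
    post_b0, post_b1 = fit_line(*zip(*post))
    # Compare where each fitted line sits at the intervention point.
    level_change = (post_b0 + post_b1 * first_post_time) - (pre_b0 + pre_b1 * first_post_time)
    slope_change = post_b1 - pre_b1
    return level_change, slope_change


# Hypothetical series: 10 observations, intervention between observations 5 and 6.
times = list(range(1, 11))
# A shared "history" shock from observation 7 onward affects both groups equally.
history_shock = [4.0 if t >= 7 else 0.0 for t in times]
control = [50.0 + 2.0 * t + h for t, h in zip(times, history_shock)]
treated = [
    45.0 + 2.0 * t + h + (10.0 if t >= 6 else 0.0)  # treatment adds 10 points
    for t, h in zip(times, history_shock)
]

level_change, slope_change = controlled_its(times, treated, control, first_post_time=6)
print(level_change, slope_change)
```

Because the shared shock enters both series, it cancels in the difference, which is the sense in which a control series helps rule out history as an alternative explanation. The groups may start at different levels, as they do here, without affecting the estimated change at the intervention point.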
[Figure 10.5 plot: Number of Calls (axis 0 to 1,000) by Year (1962 to 1976), with the intervention marked.]
slope, and variance), its permanence (continuous or discontinuous), and its immediacy (immediate or delayed). In March 1974, Cincinnati Bell began charging 20 cents per call to local directory assistance. Figure 10.5 shows an immediate and large drop in local directory assistance calls when this charge began. But treatment effects can be described along dimensions other than their means. A continuous treatment effect persists over time, while a discontinuous effect tends to drift back to pre-intervention level after the initial effect wears off. Figure 10.5 shows a continuous treatment effect because the change in level persisted well into 1976. Effects can also be immediate or delayed. Immediate treatment effects are easier to interpret, while delayed effects are more problematic because plausible alternative explanations may be introduced in the time interval between intervention onset and the recorded response. Therefore, a strong theoretical justification that predicts the length of a delay is helpful when examining delayed effects, such as the expectation of increased births nine months after a citywide electricity blackout, not three months after the event. In the Cincinnati Bell case, the treatment response was immediate, with a large drop in directory assistance calls occurring on intervention day. When interpreting an ITS study, it is helpful to describe effects in terms of changes in level, slope, and variance, thus assessing whether effects are immediate or not and continuous or not.

We are only aware of one study testing the validity of an abbreviated ITS design by comparing its results to those achieved from a randomized experiment that had the same intervention group. Bloom et al. (2005a; Michalopoulos et al., 2004) reanalyzed data from the 11-city NEWWS, a component of the Job Opportunity and Basic Skills (JOBS) program that mandated job training services for unemployed individuals. The study involved at least 8 pretest quarterly reports on earnings prior to intervention and 20 quarters of earnings post-intervention. Four cities (Oklahoma City, Riverside, Portland, and Detroit) included welfare recipients in one part of the city who were randomly assigned to treatment or control group, and the non-equivalent comparison group for the ITS study was composed of people from another part of the same city. In fact, the comparison groups were composed of individuals who had served as controls in the same experiment. A fifth comparison was in-state rather than within-city, involving treatment and comparison groups from Detroit and Grand Rapids. All the data we report here are at the site mean level, aggregated up from longitudinal individual data collected at the same times and on the same measures for both the experimental and the abbreviated ITS samples. The general logic with empirical
O1 O2 O3 O4 O5 X O6 O7 O8 O9 O10
O1 O2 O3 O4 O5 O6 O7 O8 O9 O10
Also, the inclusion of an untreated control group with multiple observations can help rule out plausible threats to validity such as history, maturation, and statistical regression that a simple ITS cannot rule out. So a quality abbreviated ITS requires both treatment and control groups for which there are multiple and frequent observations before and after the intervention. A simple design with a control group and 10 observations is depicted in Figure 10.4.

Unique characteristics of this design: Theoretical and empirical reasons

There are several potential advantages of the abbreviated ITS for assessing treatment effects. Ashenfelter (1978) examined the effects of participation in a job training program on earnings for Blacks and Whites and for males and females. The treatment group consisted of individuals who began job training under the Manpower Development and Training Act in the first 3 months of 1964. The comparison sample was constructed from the 0.1 percent Work History Sample of the Department of Labor, a random sample of earnings records on American workers. The outcome was earnings at 11 time points for each of the groups. In addition to multiple posttest observations, Ashenfelter had four years of earnings for the treatment and comparison groups prior to the intervention. Posttest results suggested that participation in the job training program increased earnings for all the treatment groups by race and gender. However, Ashenfelter noted that treatment group members had lower earnings than the comparison group in the year before intervention. While comparison group members remained in the labor force, those eligible for job training in 1964 were required as a condition of acceptance into the program to be out of work in 1963. So their reported earnings in 1963 had to be depressed.

Consider the numerous threats to validity if Ashenfelter had used only one pretest measure. First, he would not have been able to eliminate ‘maturation’ as an alternative explanation. Under the maturation hypothesis, training members’ earnings increased at a faster rate than comparison members’ but had started at a lower point than comparison units, even before 1963. With multiple years of earnings data, Ashenfelter was able to examine the data for group differences in maturation. Second, regression to the mean would have been difficult to discount. In this scenario, if unemployment of treatment group members in 1963 was temporary and necessary for program inclusion, then the increase in earnings after 1963 might have occurred even without participation in the treatment. Using multiple years of pre-intervention data, Ashenfelter (1978) found a small decrease in earnings for the treatment group between 1962 and 1963, but not enough that regression could have accounted for all treatment effects. Finally, history would have been another plausible alternative explanation. Under this threat, observed increases in earnings would have been due to upward trends in the economic cycle, and not to treatment effects. Multiple pretest observations allowed Ashenfelter to test for seasonal or cyclical patterns in the data. Note in this example that it is the length, number, and frequency of pre-intervention time points that permits the examination of common threats to validity. Multiple posttest observations help determine the temporal pattern of an effect, but they cannot rule out alternative explanations.

The second unique feature of ITS design is that treatment effects can be assessed along multiple dimensions. The next example demonstrates that treatment effects can be measured by the form of the effect (level,
tests of the correspondence in results between an experiment and quasi-experiment is that the randomly formed control group and the non-randomly formed comparison group would have to be identical if they were to produce the same causal effect size, given that both groups would be analyzed with the same treatment data. In the ITS case, though, the logic is slightly different. The means and slopes can differ, but not the behavior of the control or comparison group around the intervention point. Any temporal changes observed there can masquerade as alternative interpretations of an immediate program impact.

Figure 10.6 displays the means over time for the control and comparison groups at each of the five sites, the intervention point being designated as 0 on the time scale. Visual inspection suggests no shift in the intercept at the intervention point in three sites: Oklahoma City (N controls = 831; N comparisons = 3,184), Detroit (N controls = 955; N comparisons = 1,187), and Riverside (N controls = 1,459; N comparisons = 1,501). There were no reliable differences in slopes either, though the possibility of such is indicated in the later lags in both Detroit and Riverside. However, these small differences had opposite signs and basically cancelled each other out. Indeed, neither the means nor trends reliably differed at any of these three sites and would not differ if they
[Figure 10.6 plots: five panels, each showing a control group and a comparison group series against Quarters from Random Assignment (−8 to 20). Panels: Oklahoma City Rural (control group) and Oklahoma City Central (comparison group); Detroit Fullerton (control group) and Detroit Hamtramck (comparison group); Riverside City (control group) and Riverside County (comparison group); Grand Rapids (control group) and Detroit (comparison group); Portland West Office (control group) and Portland East and North Offices (comparison group).]
were aggregated into a single analysis. These results suggest that selecting these samples within cities induced so much comparability that further individual matching would hardly help control bias.

The comparability between control and comparison groups was not replicated in Portland, the smallest site where the randomized control cases were about a third of the next smallest control group (N controls = 328; N comparisons = 1,019). Figure 10.6 shows that while control and comparison groups were stably different at the series’ very beginning and throughout most of the post-intervention period, one group exhibited a large earnings dip immediately prior to the intervention. Thus, the control and comparison groups did not act similarly pre- and post-intervention, as ITS requires. The same was true when Grand Rapids (N = 1,390) was compared to its within-state comparison, Detroit (N = 2,142). Here, the pre-intervention group means differed, but not the post-intervention ones. This again implies that different causal conclusions would arise between the randomized experiment and the abbreviated ITS quasi-experiment yoked to it. However, design and city differences were confounded in this last analysis. While Detroit and Grand Rapids are in the same state, they are not in the same city and so would likely have different local labor markets with their unique economic pressures at different times.

If we were to sum the four within-city comparisons and weight Portland appropriately less than the other sites, there would be little or no difference between the control and comparison groups around the intervention point and hence, there would be little causal bias. The same would likely be true if all five sites were summed. In this particular case, the abbreviated ITS would not be biased relative to the experiment. However, Bloom et al. (2005a; Michalopoulos et al., 2004) concluded that the within-state control and comparison groups did not closely approximate each other. Their analysis of absolute bias was predicated on computing the difference between the randomly and non-randomly formed comparison groups across all 28 time points irrespective of the sign of these differences, thus capitalizing on random error of whatever source. By contrast, our analysis was of average bias, of the difference between the two types of comparisons across 28 time points per site when account is taken of the signs attached to each difference at each time point. Fortunately, Bloom et al. also compute average bias, reporting that the comparison and control group differed by between 1 percent and −3 percent at two years after the intervention and between 3 percent and −4 percent at five years, even when the less appropriate Grand Rapids/Detroit comparison was included in the calculation. Such a close correspondence between an experimental control group and a nonexperimental comparison group would lead to experimental and quasi-experimental effect sizes that do not differ when each is subsequently yoked to the same treatment group.

Even accepting Bloom et al.’s (2005a; Michalopoulos et al., 2004) analysis of absolute rather than average bias, we would still have reason to be concerned about generalizing the study’s findings to other research domains. Despite the 8 pretest observations, the earnings measures were not highly correlated across a year (by our rough estimate, about 0.42). As a point of comparison, for example, student test scores tend to correlate on a magnitude of about 0.58 to 0.74 in math and 0.60 to 0.74 in reading (Bloom et al., 2005b). The relatively low annual correlations for earnings suggest that the pretests were more limited in their usefulness as selection controls than would be the case when examining academic achievement, for example. Even so, the number of pretest observations still helps, for Bloom et al. report that constructing a pretest covariate out of pretest earnings data from varying numbers of waves led to less bias the more waves there were. The presumption is that creating a single pretest measure out of more waves leads to more reliable estimation of that pretest selection difference. Bloom et al. could not show, though, that constructing an individual level growth model helped reduce the selection threat they claimed to find when
analyzing absolute bias. But even so, the lower correlations among adjacent earnings measures suggest that growth trends were not stably estimated in this project, the more so since quarterly data were analyzed and these are presumably even less stable than the 0.42 correlations for annual data.

To summarize, when we look at the four within-city comparisons, there were no differences between control and comparison groups at three of the sites. The only difference was in the smaller, less stable Portland comparison. For the fifth within-state comparison, labor markets between Grand Rapids and Detroit were different enough that we would expect these sites to produce inferior matches to those of truly local comparison and control groups. Even so, when results were summed across all sites, the average biases cancelled each other out, and the quasi- and experimental studies yielded estimates with close correspondence. Thus, we disagree with Bloom et al.’s (2005a; Michalopoulos et al., 2004) conclusion about different effects attributable to the experiment and quasi-experiment. Fortunately, it is easy for readers to judge for themselves. Just look at Figure 10.6 and see whether there is a control/comparison difference around the intervention point for most of the within-city cases.

Examples of ITS design

In this section, we use examples of abbreviated ITS to highlight two design features over and above those already mentioned: a longer pretest time-series and a control series selected from non-equivalent but matched units. The two features we emphasize are nonequivalent dependent variables and switching replications. When they are thoughtfully incorporated into quasi-experimental designs, many common threats to validity can be addressed.

In a study that assessed the effects of a 1989 media campaign to reduce alcohol use among students at a university festival, McKillip (1992) added two nonequivalent dependent variables to strengthen inference in a short time series. His main dependent variable was a time series of 10 observations on awareness of alcohol abuse among college students. The two nonequivalent dependent variables, good nutrition and stress reduction, were conceptually related to health, and thus would reflect changes if the treatment effect was due to a general improvement in attitudes toward health. However, since good nutrition and stress reduction were not targeted by the campaign, they would not show improvements if the effect resulted from the treatment alone. As Figure 10.7 shows, awareness of alcohol abuse clearly increased during the media campaign, but awareness of other health-related issues did not.

McClannahan et al. (1990) employed a switching-replications feature to assess the effects of providing married couples who supervised group homes for autistic children with regular feedback about the daily personal hygiene and appearance of the children in their home. The authors used a short time series (21 observations), with feedback introduced after Session 6 in Home 1, Session 11 in Home 2, and Session 16 in Home 3. Figure 10.8 shows that after each introduction, the personal appearance of the children in that home increased above baseline, and the improvement was maintained over time. Both examples, however, demonstrate one limitation of abbreviated time series data: the difficulty in knowing the duration of an effect. For example, Figure 10.7 shows an apparent decrease in alcohol abuse awareness after the two-week intervention.

Two additional features, removing a treatment at a known time and adding multiple replications of a treatment, can strengthen inference in an abbreviated ITS design. In the former, treatment effects can be demonstrated by not only showing that the effects occur with the treatment but also that the effects stop when the treatment is removed later in the time series, making this design akin to having two consecutive ITS. In multiple replications, the treatment is introduced, removed, and then introduced again according to a planned schedule. A treatment effect is suggested if the outcome responds similarly each time the treatment is introduced and removed, with the
150 THE SAGE HANDBOOK OF SOCIAL RESEARCH METHODS
[Figure 10.7: line graph. Vertical axis: Awareness (2.0 to 3.2). Horizontal axis: Date of Observation (ten observations, 28-Mar to 27-Apr). Series: Good Nutrition, Stress Reduction, Responsible Alcohol Use, with the campaign period marked.]
Figure 10.7 The effects of a media program to increase awareness of alcohol abuse
Source: McKillip, 1992. Copyright 1992 by Plenum Press.
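The logic of nonequivalent dependent variables in a short series of this kind can be sketched numerically. The minimal Python sketch below uses made-up awareness scores, not the values plotted in Figure 10.7, for one targeted outcome and two control outcomes, and compares each outcome's mean during an assumed campaign window with its mean outside that window. The names `series`, `campaign`, and `mean_shift`, and the placement of the campaign window, are illustrative assumptions.

```python
from statistics import mean

# Hypothetical awareness scores at 10 observation points (NOT the Figure 10.7 data).
series = {
    "responsible_alcohol_use": [2.5, 2.5, 2.6, 2.5, 3.0, 3.1, 3.0, 3.1, 2.7, 2.6],  # targeted outcome
    "good_nutrition":          [2.6, 2.7, 2.6, 2.6, 2.6, 2.7, 2.6, 2.7, 2.6, 2.6],  # control outcome
    "stress_reduction":        [2.4, 2.4, 2.5, 2.4, 2.5, 2.4, 2.4, 2.5, 2.4, 2.4],  # control outcome
}
# Assumed campaign window: observations 5 through 8 (zero-based indices 4..7).
campaign = range(4, 8)

def mean_shift(scores, window):
    """Mean of the observations inside the window minus the mean outside it."""
    inside = [s for i, s in enumerate(scores) if i in window]
    outside = [s for i, s in enumerate(scores) if i not in window]
    return mean(inside) - mean(outside)

shifts = {name: mean_shift(scores, campaign) for name, scores in series.items()}
target = shifts["responsible_alcohol_use"]
controls = [v for k, v in shifts.items() if k != "responsible_alcohol_use"]

for name, value in shifts.items():
    print(f"{name}: shift = {value:+.2f}")

# A campaign effect is plausible only if the targeted outcome moves while the
# control outcomes, which share the same history and testing schedule, stay flat.
print("target exceeds every control:", all(target > c for c in controls))
```

A history or testing effect would be expected to move all three series together, which is why flat control outcomes make those explanations less plausible.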
[Figure 10.8: three stacked panels (Home 1, Home 2, Home 3). Vertical axis: Mean Personal Appearance Score (25 to 100). Horizontal axis: Sessions (1 to 21). Phases: No Feedback, Feedback.]
Figure 10.8 The effects of a parental intervention on the physical appearance of autistic
children in three different homes
Source: McClannahan et al., 1990. Copyright 1990 by The Society for the Experimental Analysis
of Behavior.
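The staggered introduction of feedback across the three homes can be examined home by home, using homes that are still in baseline as concurrent controls. The sketch below is a minimal illustration with made-up scores (a flat baseline followed by a step increase); only the session numbers 6, 11, and 16 and the 21 sessions come from the text. The helper names `simulate` and `shift` and the score levels are assumptions.

```python
from statistics import mean

N_SESSIONS = 21
# Last no-feedback session in each home, as given in the text.
last_baseline = {"Home 1": 6, "Home 2": 11, "Home 3": 16}

def simulate(cut, base=40.0, gain=35.0):
    """Made-up scores: flat baseline, then a step up once feedback starts."""
    return [base if s <= cut else base + gain for s in range(1, N_SESSIONS + 1)]

scores = {home: simulate(cut) for home, cut in last_baseline.items()}

def shift(series, cut, end=N_SESSIONS):
    """Mean of sessions cut+1..end minus mean of sessions 1..cut."""
    return mean(series[cut:end]) - mean(series[:cut])

report = {}
for home, cut in last_baseline.items():
    own = shift(scores[home], cut)
    # Homes still in baseline at this point serve as concurrent controls:
    # compare their scores before and after this cut, within their own baseline.
    controls = {
        other: shift(scores[other], cut, end=other_cut)
        for other, other_cut in last_baseline.items()
        if other_cut > cut
    }
    report[home] = (own, controls)
    print(home, "own shift:", own, "| untreated homes:", controls)
```

If each home's scores rise only after its own introduction point while the not-yet-treated homes stay at baseline, a single historical event is an unlikely explanation, because it would have had to occur three times at exactly the scheduled sessions.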
BETTER QUASI-EXPERIMENTAL PRACTICE 151
direction of responses being different for the introductions compared with the removals.

validity threats as maturation, regression, and history.
performance on math and vocabulary topics close to those taught. School records were also examined to obtain their grades in math and language arts courses as well as their SAT scores. For students exposed to math coaching, posttest math scores functioned as the intervention-relevant outcome while their vocabulary posttest results served as controls. For students who received the reading intervention, vocabulary performance was the intervention-relevant outcome and math performance served as the control. Half of the students were randomly assigned to a vocabulary or math treatment, while students in the quasi-experimental design condition were able to choose which treatment they received. The pretest results indicated no differences between treatment and control groups in the experimental condition, but such differences did appear in the quasi-experimental condition. Those who chose the vocabulary intervention had higher vocabulary pretest scores than those who chose the math intervention, and vice versa. Thus, the quasi-experimental groups were non-equivalent in a way that was originally intended by the authors.

Consider the ways in which Shadish et al.’s (2007) study meets our criterion for a strong test of design types. First, pretest scores for students in the experimental and quasi-experimental conditions indicate that there was variation between design types. Next, as we have already discussed, the random assignment of students into the experimental and quasi-experimental conditions and the uniformity in procedures for both conditions ruled out variation in features that were correlated with both design type and outcome. Third, the laboratory conditions meant that the random assignment process was entirely under experimenter control and its efficacy could be independently checked against the pretest means. The setting also prevented differential attrition from the two design type groups, as well as treatment contamination from math to vocabulary coaching and vice versa. So we feel confident that results obtained from the experimental design served as a valid standard against which to compare estimates from the quasi-experiment.

We would also expect that a fair test of design types uses a sophisticated quasi-experimental design, and appropriate statistical procedures. At a minimum, a quality quasi-experiment minimizes selection by clearly measuring and modeling it. Note the pool from which comparison group members were drawn: not only were comparison matches ‘local’ to treatment members, but they also attended the same institution, were of similar ages, and were exposed to similar experiences at the institution (a psychology course). In addition, the authors modeled a selection process where individuals’ motivation for choosing a field of coaching was related to their interests and cognitive strengths, measured by pre-intervention, psychometrically sound multi-item questionnaires assessing students’ motivation to learn about math and language arts, their test scores in math and language arts, past grades in math and language arts courses, and content-valid scales specifically constructed to assess math and vocabulary knowledge. This last was also used to measure post-intervention outcomes, with the expectation that pre- and post-intervention test scores would be highly correlated.

In all, access to rich covariates that modeled the selection process, and strong overlap in background characteristics between treatment and comparison group members, enabled the authors to use a statistical procedure called propensity score matching. Like other matching techniques, propensity scores seek to pair treatment and comparison group members on observable characteristics that are stably measured. One problem is that, as the number of matching variables increases, so does the dimensionality of matches, making it exponentially more difficult to find suitable matches for each treated unit. Propensity scores reduce this problem by creating a single index of the propensity to be exposed to the treatment through a first stage in the analysis where potential predictors of selection are used to see which ones are related to treatment exposure understood in binary fashion. A propensity score is the probability of receiving treatment conditional on these pretreatment covariates that are weighted and put into a single index. The advantage of propensity score matching is that it allows researchers to condition on a single
scalar variable rather than on a multidimensional space; this single variable is then used to analyze the outcome data in a number of different possible ways.

Because there is some art to the use of propensity scores, Shadish et al. (2007) consulted with one of the method’s developers, Paul Rosenbaum. They then used his recommendations, first, to calculate propensity scores from the array of covariates collected and, then, to achieve good balance across the five equal strata computed from these scores. (Another analysis used the propensity scores as covariates, after testing and adjusting for possible non-linearities; this was not the analysis Rosenbaum recommended.)

Within-study comparison results indicated that, under the conditions built into this study, the experiment and quasi-experiment yielded corresponding post-intervention effect sizes. Bias was not significantly reduced, however, when just the demographic variables (called predictors of convenience by the authors) were used, and in one case their use may even have increased it. Later analyses showed that a measure of the strength of motivation to be exposed to math or language arts was the most important single covariate for reducing bias, particularly for the effects of instruction in mathematics, followed by the measures of math and language arts achievement. The key assumption is that, in this case, the selection process was driven largely, but not exclusively, by individuals selecting themselves into coaching on the subject matter with which they felt more comfortable. It seems plausible to hypothesize, therefore, that the quality of the covariate structure played a role in reducing bias, and that this quality reflects how well the selection process was conceptualized and measured.

The second study we identified as a strong test of the difference-in-differences design is by Aiken et al. (1998). In addition to the RD design discussed earlier, the authors compared their experimental results on the efficacy of remedial English with estimates obtained from a carefully designed basic quasi-experiment. Their comparison group consisted of students who could not be contacted prior to the quarter they attended the university, or whose decision to enroll was made after information was collected for the RD study. Because the hard-to-reach and late-applying students still had SAT or ACT scores as requirements for admission, the authors were able to create a quasi-experimental comparison group that was restricted to those who scored within the same bandwidth as students in the randomized experiment.

Note that the matching took place within the sampling design and not ex post facto, when cases from obviously different populations would have to be individually matched by taking advantage of where they overlap. Indeed, the match at the sampling level was so close, as in Bloom et al. (2005a), that the control and comparison groups did not differ on any observables correlated with outcomes of interest, namely entry-level ACT/SAT scores or pretest essay writing and multiple choice exam scores. Moreover, the experimental and quasi-experimental samples underwent the same treatment and non-treatment experiences and the same measurement schedules in order to rule these out as sources of conceptually irrelevant variance. In all, this was a carefully constructed quasi-experiment despite the modest structure of just two non-equivalent groups and a single pretest measurement wave. The randomized experiment was also carefully managed. The authors demonstrated that pretest means did not differ and differential attrition did not occur. Given the close correspondence in means, the authors used ANCOVA to analyze the quasi-experiment, with each pretest outcome serving as a covariate for itself at a later date.

For the test of English knowledge outcome, effect size results were 0.57 standard deviations for the quasi-experiment and 0.59 for the randomized experiment, both being statistically different from zero. For the essay-writing outcome, the effect sizes were 0.16 and 0.06, neither being reliably different from zero. Thus, by criteria of both effect size magnitude and statistical significance patterns, the experimental and quasi-experimental designs produced comparable
results. Note that the close correspondence was achieved largely through careful selection of the non-equivalent comparison cases, prior to statistical adjustment that in this case served more to increase power than to control for selection.

We now briefly review within-study comparisons from the job training literature. These studies have had extraordinary influence over methodology choice in American evaluation policy, with conclusions from these studies suggesting that quasi-experiments fail to replicate benchmark experimental estimates (Glazerman et al., 2003). However, a close reading of these early papers (Cook et al., 2005; Smith & Todd, 2005) suggests that the experiments and quasi-experiments differed in many other ways than just in how treatments were assigned, thus making it unclear why the experiments and quasi-experiments differed in obtained effect sizes. Was it due to the mode of treatment assignment, the issue at stake, or was it due to extraneous differences between the two study types, for example in how outcomes were measured?

The earliest within-study comparisons in job training took the effect size from a randomized experiment and compared it to the effect size from a quasi-experiment that used the same intervention group (Fraker & Maynard, 1987; LaLonde, 1986). Comparison group members were drawn selectively and systematically from large, national datasets, such as the Panel Study of Income Dynamics or the Current Population Survey. Data from the quasi-experiments were then analyzed using various statistical models, including OLS and the Heckman selection models of the day. Here the emphasis was on selection adjustment via statistical manipulation rather than sample selection. When the resulting effect sizes were compared to the effect size from the yoked experiment, the authors concluded that the experimental and nonexperimental effect sizes were generally different whatever the mode of statistical adjustment for selection.

Unfortunately, the design comparisons were almost inevitably confounded with both location and manner of testing. In the quasi-experiments, unlike the experiments, comparison cases came from national registries rather than from the same local venue as the experiments, and earnings were measured at different times in each study, thus confounding the type of design with the state of local labor markets as they varied over time (Smith & Todd, 2005). In addition, selection models were sometimes estimated using few demographic variables that did not even contain pre-intervention earnings assessments, though many later studies did include these measures. However, the pre-intervention variables were not analyzed in the abbreviated time-series fashion detailed earlier, but combined into a single propensity score. In fairness, the early job training studies were conducted by pioneers, many of whom were anxious to investigate matching strategies with databases that had already been collected for non-evaluation purposes and that would be much less expensive than constructing randomly formed control groups as in an experiment. So their interests were pragmatic as well as theoretical.

The early within-study comparisons spawned more studies similar in conception and overall structure but differing in some details. Later studies used newer statistical tools for handling selection, more experimental datasets were added, and different ways evolved for constructing quasi-experimental comparison groups, moving away from the use of national datasets to comparison cases living close to the treated (Smith & Todd, 2005). This was to unconfound the mode of treatment assignment from differences in location and testing in order to draw clearer conclusions about the effects of random versus non-random assignment.

Overall, the job training literature yielded some important lessons about quasi-experimental design. We learned that ‘technically better’ designs had the following features: (1) pretests and longer pretest time series, especially those with higher pretest-outcome correlations; (2) local control groups, though this never went so far as to use twins, siblings or within-organization comparisons; (3) treatment and comparison groups assessed in exactly the same way at
exactly the same time by exactly the same assessment procedure; (4) testing a causal hypothesis with several implications in the data; and (5) directly and comprehensively measuring and modeling the selection process in its own right.

Examples of difference-in-differences design

In this section, we discuss examples of the nonequivalent comparison group design. Because of the wide variation in quality of quasi-experimental designs that use this design model, we present two studies that Cook and Campbell (1979) would identify as ‘generally uninterpretable,’ and two that we believe are exemplars of the design. We begin with two studies that attempt to strengthen quasi-experiments almost exclusively through the use of statistical matching procedures, in this case matching through propensity scores.

The original intent of the Wilde and Hollister (2007) and Agodini and Dynarski (2004) studies was to compare causal impact estimates from a randomized experiment with those from a quasi-experimental design that used propensity scores to match what were evidently different populations. Using data from 11 schools in the Tennessee Project Star Study, Wilde and Hollister looked at class size effects on student achievement in each of the 11 sites. For their quasi-experimental study, the researchers matched students from treatment classrooms within a school to students from untreated classrooms from all other schools in the Project Star study. Their propensity score matches were constructed using data from multiple levels, including information about the student (especially free-lunch status) and about the teacher and school. No pretest achievement measures were used at the student, classroom, or school levels since none were collected in the original study. The authors concluded that results from the experimental and quasi-experimental designs generally failed to replicate, and that experimental results should be preferred on a priori grounds.

A closer look at the quasi-experimental design, however, suggests several weaknesses. First, because the original study was a randomized experiment with large samples of students, Project Star did not require pretest achievement measures. Most non-time series quasi-experiments are considered to be causally uninterpretable if they are without pretest measures because it is so difficult to rule out selection effects in any transparent fashion (Cook & Campbell, 1979). A second concern is that treatment students were matched with control students who attended schools from all over the state of Tennessee, thus reducing the degree of localness. Yet with little extra effort the researchers could have first selected schools or classrooms in terms of their average student race or free lunch status, and then created their individual student-level matches from within this prior school and/or classroom matching. Better yet, since prior achievement data are routinely available at the school level, and sometimes even at the classroom level, why did the researchers not match on prior aggregate level achievement before then matching on individual level propensity scores? The matching procedure used by Wilde and Hollister created treatment and comparison units from such different aggregate worlds that there was little overlap on measured variables. The alternative sampling design we propose permits propensity scores to be calculated from worlds that overlap much more from the start. We suspect that the lack of pretest achievement measures and the weak matching procedure with samples of limited initial comparability led to a design that no sophisticated researcher would use if asked to create a quasi-experiment from scratch. In other words, a good experiment is being compared to a mediocre quasi-experiment in both design and analysis terms, thus confounding design type with quality of design in features other than the mode of treatment assignment.

Agodini and Dynarski (2004) is the second study we analyze. It examined how 16 middle and high school dropout prevention programs affected student dropout, absenteeism, and self-esteem two years later. They provided
volunteer students with targeted services such as mentoring, tutoring, individual counseling, and smaller class sizes in order to reduce student dropout and absenteeism and increase self-esteem and educational aspirations over two years. In the experiment, controls were randomly selected students who had applied to the intervention or were referred to participate in it. Two data sources were used to construct the non-equivalent comparison groups from which propensity scores were computed. For the first, researchers matched treatment group members with students attending four comparison schools in a quasi-experimental study of school restructuring. These were 7th graders in two middle schools, 9th graders in one high school, and 10th graders in another. For the second, Agodini and Dynarski constructed matches from a national dataset, the National Education Longitudinal Study (NELS). The researchers’ original plan was to generate 128 propensity score matches across four outcomes, 16 schools and their matched comparisons, and two types of comparison groups (NELS versus the four comparison middle schools). The number of pre-intervention covariates used in the propensity score calculations varied by data source, but there were never fewer than 13, including prior test scores but not much dropout information since so few students drop out and then return to school.

Agodini and Dynarski (2004) concluded that their quasi-experimental and experimental designs produced different results when the two could be compared, but that they could be compared in only 29 of the 128 planned cases. At first glance, the quasi-experimental design appears strong due to the presence of pretest scores and extensive baseline measures. However, close inspection of both comparison group sources suggests serious limitations in the sampling design. For the first comparison, students were drawn from four schools not in the same school districts as the treatment schools, nor necessarily even in the same part of the country. In addition, treated students were from all the middle and high school grades whereas the comparison students were fewer in number and restricted only to the 7th, 9th and 10th grades. Given the differences between the two groups in location and school age, and probably also in observed characteristics, it is understandable why so few acceptable matches were achieved between treatment students and comparison students from a few middle schools. The NELS comparison was likely no better. National datasets contain relatively few persons at risk for dropping out, and so the pool of potential matches was restricted to start with. Further, measurement specifics and geographic location vary between the treatment group and potential comparison students from NELS, making it all the more difficult to achieve suitable matches when using NELS for matching purposes.

Looking at Wilde and Hollister (2007) and Agodini and Dynarski (2004) together, one is reminded of the adage, ‘You cannot put right by statistics what you have done wrong by design’ (Light & Pillemer, 1984). Shadish et al. (2007) and Aiken et al. (1998) showed that the best nonequivalent group comparisons are from studies where matching was achieved through a careful sampling design and where statistical adjustment is relegated to the role of an auxiliary procedure to control for any remaining differences between groups. It is definitely not the first line of attack on initial group non-comparability.

Below, we discuss other design features that improve causal conclusion-drawing from quasi-experiments over and above the careful sampling discussed above that antedates any individual case-matching.

Wortman et al. (1978) examined how a program that provided parents with educational vouchers for their children to attend a local school of their choice affected students’ reading test scores. The program’s goal was to foster competition between schools in the system, and initial results by others suggested that vouchers decreased academic performance among students. However, Wortman et al. doubted these conclusions and so they followed groups of students from first to third grades in both voucher and non-voucher schools, and further divided voucher schools into those with and without traditional voucher programs.
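Following nonequivalent groups across three measurement waves, as in the design just described, makes it possible to compare growth before treatment with growth after it. The sketch below is a minimal illustration with made-up mean reading scores, not data from Wortman et al.; it assumes equally spaced waves, with treatment beginning after the second wave, and the names `waves` and `growth` are hypothetical.

```python
# Hypothetical mean reading scores at three equally spaced waves (made-up numbers):
# pretest 1, pretest 2, posttest. Treatment begins after pretest 2.
waves = {
    "treated":    (20.0, 26.0, 31.0),
    "comparison": (24.0, 32.0, 40.0),
}

def growth(pre1, pre2, post):
    """Return (pretreatment growth, post-treatment growth, change in growth)."""
    before = pre2 - pre1
    after = post - pre2
    return before, after, after - before

results = {group: growth(*scores) for group, scores in waves.items()}

# Selection-maturation check: were the groups already growing at different
# rates before any treatment was delivered?
pre_gap = results["treated"][0] - results["comparison"][0]

# Naive estimate from a single pretest: difference in gains from pretest 2 to posttest.
naive = results["treated"][1] - results["comparison"][1]

# Estimate that assumes each group's pretreatment growth would have continued:
# difference between groups in the change of growth rate.
adjusted = results["treated"][2] - results["comparison"][2]

print("pretreatment growth gap:", pre_gap)
print("naive gain difference:", naive)
print("growth-adjusted difference:", adjusted)
```

In these invented numbers the treated group was already growing more slowly before treatment, so most of the apparent deficit in the naive comparison reflects differential maturation rather than the treatment itself.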
The authors also reanalyzed the data using double pretest scores, which allowed them to compare pretreatment growth rates in reading with posttest change in rates. Wortman et al.’s analyses found that the decrease in reading scores previously attributed to voucher schools could actually be attributed to nontraditional voucher programs. Further, traditional voucher and non-voucher groups showed no differential effects that could not be explained by a continuation of the same maturation rates which had previously characterized the traditional and voucher control schools.

Double pretests allowed researchers to assess the threat of selection-maturation on the assumption that the rates between the first two pretests will continue between the second pretest and outcome measure. However, this assumption is testable only for the untreated group, and within-group growth rates will be fallibly estimated given measurement error and possible instrumentation shifts that make measured growth between the two pretests different from growth between the second pretest and the outcome measure. Thus, while the double pretest design with nonequivalent groups is not perfect, it can help assess the plausibility of selection-maturation by describing pretreatment growth differences. The double pretest design can also assess regression effects by showing whether the second pretest for either group is atypically low or high compared to the first pretest. Finally, the second pretest measure can help with statistical analysis by providing more precise estimates of correlation between observations at different times. Without the extra pretest measure, the correlation in observations without the treatment would be unclear.

Hackman et al. (1978) strengthened the nonequivalent comparison group design by adding a reversed-treatment control group feature to investigate how changes in motivational properties of jobs affect worker attitudes and behaviors. In a reversed-treatment control design, one group receives a treatment (+) to produce an effect in one direction and the other group receives a conceptually opposite treatment (−) to produce the reverse effect. In Hackman et al.’s study, technological innovations in a bank resulted in some clerical jobs becoming more complex and challenging (treatment +) and other jobs becoming less so (treatment −). The job changes were made without telling the employees of their possible motivational consequences, and measures of job characteristics, employee attitudes, and work behaviors were taken before and after the jobs were reconstituted. An effect would be detected if a statistical interaction resulted from improved scores among employees who received treatment (+) and lower scores among those who received treatment (−).

Consider how the reversed-treatment design can strengthen a study’s construct validity. In a design with only treatment (+) and no-treatment controls, a steeper pretest-posttest slope in the enriched condition could be explained by employees’ responding to novelty in their jobs, feelings of special treatment, or guessing the study’s hypothesis. These alternatives are less plausible if the reversed-treatment group exhibits a pretest-posttest decrease in job satisfaction because it is thought that knowledge of being in a study tends to elicit socially desirable responses from participants. Thus, to explain both an increase in the treatment (+) group and a decrease in the reversed group, each set of respondents would have to guess the hypothesis and corroborate it in their own different way. Interpretation of this design then depends on producing two effects with opposite signs, and the design assumes that little historical and/or motivation changes are otherwise taking place.

STRENGTHENING WEAK QUASI-EXPERIMENTAL DESIGNS THROUGH THE USE OF PATTERN-MATCHING

The quasi-experimental designs we believe are weak causal tests should be apparent by now: those without a pretest measure on the same scale as the outcome, those without a
comparison group, and those without baseline covariates that can be combined to create a plausible and well-measured selection model. Without a pretest, it is difficult to know whether a change has occurred and to rule out most threats to internal validity. Without a comparison group, it is difficult to know what would have happened in the treatment group had the intervention not been in place, which is the desired counterfactual. Finally, without relevant covariates to control for pre-intervention differences between groups, it is difficult to know whether selection is confounded with treatment effects. These factors point to the kinds of quasi-experiment that should be avoided because of the high risk of yielding results that Cook and Campbell (1979) have called ‘generally non-interpretable.’ However, what happens in circumstances where the ideal design conditions that Shadish et al. (2007) and Aiken et al. (1998) created are not possible (for example, in studies where pretest data are not available)?

The superiority of RD and ITS designs is based on an epistemology that is subtly different from the one that validates randomized experiments and most of the other quasi-experimental designs utilized today. There, the counterfactual is a single posttest mean or a form of ‘gain’ in the control group. In RD and ITS, on the other hand, the counterfactual is more complex and depends instead on a pattern match, on a causal hypothesis that predicts multiple implications in the data. Together, they form a multivariate pattern (Corrin & Cook, 1998; Shadish et al., 2007) that few if any alternative interpretations would be expected to create, though this last assertion has to be critically assessed.

Let us illustrate with an example. Minton (1975) examined the effects of Sesame Street by comparing the cognitive performance of children who were exposed to the show in kindergarten with the performance of their own siblings when they were in the same kindergarten one or two years earlier and when they could not have seen the show since it was not yet on the air. To compare just these siblings is a weak design that fails to account for selection and history differences. However, a content analysis showed that Sesame Street taught predominantly letter skills in its first year, and so Minton hypothesized that the younger siblings should do better than their older siblings in letter recognition but not in five other cognitive areas that are part of a child’s normal maturation. In other words, the difference between siblings should be greater in letter recognition than in other cognitive skills. She further hypothesized that children who watched the show more frequently would do better than their siblings on letter recognition to a degree that was different from that among lighter viewers and that was different from what was found for non-letter recognition skills. Thus, the hypothesis was of a difference of differences of differences. OLS analyses showed that heavier viewers did indeed do better than their siblings on letter recognition to an extent not found with the lighter viewers, and that this difference of differences was not as pronounced on the five other cognitive tests as on letter recognition. Few alternative interpretations can be offered for this predicted pattern of difference of differences of differences.

Note that this study’s finding appears valid even without pretests, and that measurement took place in different years for the treatment and comparison groups. Yet the design seems strong. Why? First, Minton compensated for some design weaknesses by having siblings in the treatment and comparison groups. They are not perfect matches, though, even if they do control for some environmental and family differences better than more distantly related individuals would. Second, the same general causal hypothesis about Sesame Street’s effectiveness was made to have a number of substantive and testable implications in the data, not just a single implication. In particular, effect sizes should vary by the outcome measure and dosage level. This still does not make causal inference ‘automatic.’ A case still has to be made that no other causal hypothesis can explain the predicted and obtained complex data pattern; and one has to develop such designs with one’s eyes wide open to the fact that the hypothesis involves a multi-way statistical analysis that requires
large sample sizes and quality measurement to test well. Nonetheless, following Minton’s example, we would like to see more use of patterned causal hypotheses when experiments or very high-quality quasi-experiments are not possible.

CONCLUSIONS

In some economics contexts, quasi-experiments are lumped together with causal studies that do not have any direct intervention, and the whole is called ‘nonexperiments.’ However, one tradition (Campbell & Stanley, 1963; Cook & Campbell, 1979; Shadish et al., 2002) makes finer distinctions than this, distinguishing between experiments and nonexperiments (based mainly on deliberate intervention into an ongoing activity) and between different kinds (and qualities) of quasi-experiments. Widespread use of the generic ‘nonexperiment’ label loses all this subtlety. At best, it serves as the contrast to experiment; at worst, it lumps together methods that vary radically in their ability to approximate the results of experiments. It should be a concept rarely invoked, though we realize we cannot legislate this.

In this chapter, we have chosen to highlight the best designs that quasi-experimental theory has to offer. Empirical research has shown that RD studies give the same causal answer as experiments on the same topic; abbreviated ITS studies may also do so when there is a control time series; even the lowly workhorse design with two non-equivalent groups and a pretest and posttest may give a close approximation if the treatment and comparison groups are carefully selected initially. Certain design attributes seem particularly important, including: (1) pretests and longer pretest time series, especially when the pretest-outcome correlation is high; (2) local comparison groups, whether these be monozygotic twins, identical twins, same-sex siblings, opposite-sex siblings, within-organization controls, within-city matched controls, and so on; (3) treatment and comparison groups that are assessed in exactly the same way at exactly the same time by exactly the same person; (4) a causal hypothesis with several testable implications in the data that can be addressed with larger samples and quality measurement; and (5) a study component that empirically examines the selection process into treatment, and then measures this process very carefully.

Because of the randomized experiment’s more elegant rationale and transparency of assumptions, no quasi-experiment provides a better warrant for causal inference. However, randomized experiments are not always possible, and so we ask, ‘How can quasi-experiments be crafted and justified because, on empirical grounds, they are likely to produce similar results to an experiment?’ A review of the empirical literature suggests that the best quasi-experiments tend to yield causal estimates close to those of the experiment, while the worst quasi-experiments do not. The time has come for us to move beyond the simplicity of the ‘experiment versus nonexperiment’ debate and to take a closer look at factors affecting the quality of quasi-experiments.

NOTES

1 Cook and Wong (in press) present the following seven criteria for conducting a high quality study of within-study comparisons:
    1 There must be variation in the design types being compared; that is, random assignment in one group of units and a contrasting form of assignment in another group.
    2 The assignment difference between the experiment and nonexperiment should not co-vary with theoretically irrelevant third variables that might be plausibly correlated with study outcome (Smith & Todd, 2005). For instance, in the earliest within-study comparisons, the randomly selected control cases came from the same sites as the intervention cases, but the non-random comparison cases came from national datasets like the Current Population Survey and hence from different physical locations than those in the experiment. The random and systematic controls also differed in many aspects of when and how outcome measurement occurred, thus also confounding the assignment variable of theoretical interest with measurement factors.
    3 The experiment and nonexperiment should also estimate the same average or local average treatment effect. For example, in an RD study the causal impact is assessed at the cutoff point on the assignment variable. Comparability demands that the average treatment effect in the experiment should also be estimated at this point. Otherwise, differences in results might be attributed to differences in design type whereas they are due to differences in where the effect is estimated. This will not matter with linear effects in RD, but it will with non-linear ones.
    4 The randomized experiment should demonstrably meet all the usual criteria for technical adequacy. That is, the treatment and control group should have been properly randomized; the correct randomization procedure should not have resulted in unhappy randomization by chance; there should be no differential attrition; nor should there be treatment crossovers. The importance of these features follows from the role the randomly formed control group is supposed to play as a benchmark of complete internal validity.
    5 The type of nonexperiment under analysis should also meet all of its technical criteria for being a good example of its type. This is a difficult criterion, but necessary for avoiding the situation that results when a good experiment is contrasted with a poor example of a particular type of observational study. The key here is an explicit theory of what constitutes a quality observational study in terms of its design, implementation, and analysis. This is better known for RD than for the difference-in-differences design, largely because the assignment process is more transparent and better modeled in RD, directing major attention to how the functional form is specified and how fuzziness around the cutoff is handled. This is not to argue that unbiased inference is impossible with the difference-in-differences design. However, the requirement is then that assignment processes have to be perfectly modeled or the outcome totally predicted and, in actual research practice, uncertainties always remain about how well these requirements are met. Clues are also offered by the results identifying bias-reducing features in past reviews of the within-study comparison literature in job training. But as we have seen, these are incomplete and have never completely reduced selection bias. At most, common sense can help identify clear cases of poor design and analysis even if it cannot help discriminate among the alternatives currently thought to be better.
    6 A within-study comparison should be explicit about the criteria it uses for inferring correspondence between experimental and nonexperimental results. Identical estimates are not to be expected. Even close replications of the same randomized experiment will not result in identical posttest sample means and variances. Assuming adequate statistical power (0.80), the same pattern of statistical significance will result in only 68 percent of comparisons: the probability of two significant findings across experiments is 0.80 × 0.80 = 0.64, and the probability of two non-significant findings is 0.20 × 0.20 = 0.04, which together sum to 0.68. Better than comparisons of significance test patterns are focused tests of the difference between mean estimates. But these are rare in the literature we review and require careful interpretation, especially when experimental and nonexperimental estimates with the same causal sign reliably differ from zero and are also reliably different from each other. Comparing magnitude estimates without significance tests is another option. But this is complicated by the need to determine what degree of difference is close enough to justify concluding that the experimental and nonexperimental estimates do or do not differ.
    7 The persons analyzing the non-experimental data should be blind to the results of the experiment so as not to bias which non-experimental analyses are conducted or offered for publication.
2 One of the cleanest examples of IV is the use of random assignment as an instrumental variable in order to examine the effects of assignment as it actually occurred as opposed to how it was supposed to occur (see Angrist et al., 1996 for full explanation).
3 Hahn et al. (2001) offer a formal discussion of instrumental variable methods for addressing fuzzy discontinuities, and suggest local linear regression as a non-parametric IV procedure for estimating treatment effects.
4 It is important to note that when doing analysis using an interrupted time series design, one must adjust for possible correlation between observations. For example, ordinary statistical tests (e.g. t-tests) that compare pre- and post-treatment observations assume that observations are independent and identically distributed. However, this assumption is often not met when analyzing time series data (think about autocorrelation of a student’s test score from year to year). Estimating autocorrelation requires a larger number of observations to facilitate correct model identification.

REFERENCES

Agodini, R., & Dynarski, M. (2004). Are experiments the only option? A look at dropout prevention programs. The Review of Economics and Statistics, 86(1), 180–194.
Aiken, L. S., West, S. G., Schwalm, D. E., Carroll, J., & Hsiung, S. (1998). Comparison of a randomized and two quasi-experimental designs in a single
Lipsey, M. W., & Wilson, D. B. (1993). The efficacy of psychological, educational, and behavioral treatment. American Psychologist, 48(12), 1181–1209.
McClannahan, L. E., McGee, G. G., MacDuff, G. S., & Krantz, P. S. (1990). Assessing and improving child care: A personal appearance index for children with autism. Journal of Applied Behavior Analysis, 23, 469–482.
Michalopoulos, C., Bloom, H. S., & Hill, C. J. (2004). Can propensity-score methods match the findings from a random assignment evaluation of mandatory welfare-to-work programs? The Review of Economics and Statistics, 86(1), 156–179.
McKillip, J. (1992). Research without control groups: A control construct design. In F. B. Bryant, J. Edwards, R. S. Tindale, E. J. Posavac, L. Heath & E. Henderson (Eds.), Methodological issues in applied psychology (pp. 159–175). New York: Plenum.
Mill, J. S. (1856). A system of logic: Ratiocinative and inductive. Honolulu, Hawaii: University Press of the Pacific.
Minton, J. H. (1975). The impact of ‘Sesame Street’ on reading readiness of kindergarten children. Sociology of Education, 48, 141–151.
Seaver, W. B., & Quarton, R. J. (1976). Regression-discontinuity analysis of dean’s list effects. Journal of Educational Psychology, 68, 459–465.
Shadish, W. R., Clark, M. H., & Steiner, P. M. (2007). Can nonrandomized experiments yield accurate answers? A randomized experiment comparing random to nonrandom assignment.
Shadish, W. R., Luellen, J. K., & Clark, M. H. (2006). Propensity scores and quasi-experiments: A testimony to the practical side of Lee Sechrest. In R. R. Bootzin (Ed.), Measurement, methods and evaluation. Washington, DC: American Psychological Association Press.
Smith, J. C., & Todd, P. (2005). Does matching overcome LaLonde’s critique of nonexperimental estimators. Journal of Econometrics, 125, 305–353.
Staw, B. M., Notz, W. W., & Cook, T. D. (1974). Vulnerability to the draft and attitudes toward troop withdrawal from Indochina: Replication and refinement. Psychological Reports, 34, 407–417.
Trochim, W. (1994). The regression-discontinuity design: An introduction. Chicago, IL: Thresholds National Research and Training Center on Rehabilitation and Mental Illness.
Trochim, W. M. K. (1984). Research design for program evaluation. Beverly Hills, CA: Sage Publications.
Trochim, W. M. K., & Spiegelman, C. (1980). The relative assignment variable approach to selection bias in pretest-posttest designs. Alexandria, VA: American Statistical Association.
van der Klaauw, W. (2002). Estimating the effect of financial aid offers on college enrollment: A regression-discontinuity approach. International Economic Review, 43(4), 1249–1287.
Wilde, E. T., & Hollister, R. (2007). How close is close enough? Testing nonexperimental estimates of impact against experimental estimates of impact with education test scores as outcomes. Journal of Policy Analysis and Management, 26, 455–477.
Wortman, P. M., Reichardt, C. S., & St. Pierre, R. G. (1978). The first year of the education voucher demonstration. Evaluation Quarterly, 2, 193–214.

APPENDIX 1: SUGGESTIONS FOR FURTHER READINGS ON TOPICS COVERED IN THIS CHAPTER

Regression discontinuity design

Aiken, L. S., West, S. G., Schwalm, D. E., Carroll, J., & Hsiung, S. (1998). Comparison of a randomized and two quasi-experimental designs in a single outcome evaluation: Efficacy of a university-level remedial writing program. Evaluation Review, 22(4), 207–244.
Angrist, J. D., & Lavy, V. (1999). Using Maimonides’ rule to estimate the effect of class size on scholastic achievement. Quarterly Journal of Economics, 114, 533–576.
Berk, R. A., & de Leeuw, J. (1999). An evaluation of California’s inmate classification system using a generalized regression discontinuity design. Journal of the American Statistical Association, 94(448), 1045–1052.
Berk, R. A., & Rauma, D. (1983). Capitalizing on nonrandom assignment to treatments: A regression-discontinuity evaluation of a crime-control program. Journal of the American Statistical Association, 78(381), 21–27.
Black, D., Galdo, J., & Smith, J. C. (2005). Evaluating the regression discontinuity design using experimental data. Working paper.
Buddelmeyer, H., & Skoufias, E. (2003). An evaluation of the performance of regression discontinuity design on PROGRESA. Bonn, Germany: IZA.
Cook, T. D. (in press). ‘Waiting for life to arrive’: A history of the regression-discontinuity design in psychology, statistics and economics. Journal of Econometrics.
Goldberger, A. S. (1972a). Selection bias in evaluating treatment effects: Some formal illustrations. Madison, WI: Institute for Research on Poverty.
Goldberger, A. S. (1972b). Selection bias in evaluating treatment effects: The case of interaction. Madison, WI: Institute for Research on Poverty.
Hahn, J., Todd, P., & Van der Klaauw, W. (2001). Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica, 69(1), 201–209.
Jacob, B., & Lefgren, L. (2004a). The impact of teacher training on student achievement: Quasi-experimental evidence from school reform efforts in Chicago. Journal of Human Resources, 39(1), 50–79.
Jacob, B., & Lefgren, L. (2004b). Remedial education and student achievement: A regression-discontinuity analysis. Review of Economics and Statistics, LXXXVI(1), 226–244.
Ludwig, J., & Miller, D. L. (2005). Does head start improve children’s life chances? Evidence from a regression discontinuity design. Cambridge, MA: National Bureau of Economic Research.
Seaver, W. B., & Quarton, R. J. (1976). Regression-discontinuity analysis of dean’s list effects. Journal of Educational Psychology, 68, 459–465.
Spiegelman, C. (1977). A technique for analyzing a pretest-posttest nonrandomized field experiment. Statistics Report M435.
Thistlethwaite, D. L., & Campbell, D. T. (1960). Regression-discontinuity analysis: An alternative to the ex-post facto experiment. Journal of Educational Psychology, 51, 309–317.
Trochim, W. M. K. (1984). Research design for program evaluation. Beverly Hills, CA: Sage Publications.
Trochim, W. M. K., & Spiegelman, C. (1980). The relative assignment variable approach to selection bias in pretest-posttest designs. Alexandria, VA: American Statistical Association.
van der Klaauw, W. (2002). Estimating the effect of financial aid offers on college enrollment: A regression-discontinuity approach. International Economic Review, 43(4), 1249–1287.

Interrupted time series design

Ashenfelter, O. (1978). Estimating the effects of training programs on earnings. Review of Economics and Statistics, 60, 47–57.
Bloom, H. S., Michalopoulos, C., & Hill, C. J. (2005). Using experiments to assess nonexperimental comparison-group methods for measuring program effects. In H. S. Bloom (Ed.), Learning more from social experiments (pp. 173–235). New York: Russell Sage Foundation.
Bloom, H. S., Michalopoulos, C., Hill, C. J., & Lei, Y. (2002). Can nonexperimental comparison group methods match the findings from a random assignment evaluation of mandatory welfare-to-work programs? Washington, DC: Manpower Demonstration Research Corporation.
Braver, M. W. (1991). The multigroup interrupted time-series design and analysis: An application to career ladder research. Arizona State University, Phoenix.
McArdle, J. J., & Wang, L. (2006). Modeling age-based turning points in longitudinal life-span growth curves of cognition. In P. Cohen (Ed.), Turning points research. Mahwah, NJ: Erlbaum.
McClannahan, L. E., McGee, G. G., MacDuff, G. S., & Krantz, P. S. (1990). Assessing and improving child care: A personal appearance index for children with autism. Journal of Applied Behavior Analysis, 23, 469–482.
McKillip, J. (1992). Research without control groups: A control construct design. In F. B. Bryant, J. Edwards, R. S. Tindale, E. J. Posavac, L. Heath & E. Henderson (Eds.), Methodological issues in applied psychology (pp. 159–175). New York: Plenum.

Difference-in-differences studies that use propensity score methods

Agodini, R., & Dynarski, M. (2004). Are experiments the only option? A look at dropout prevention programs. The Review of Economics and Statistics, 86(1), 180–194.
Dehejia, R., & Wahba, S. (1999). Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs. Journal of the American Statistical Association, 94(448), 1053–1062.
Heckman, J., Ichimura, H., & Todd, P. E. (1998). Matching as an econometric evaluation estimator. Review of Economic Studies, 65(2), 261–294.
Heckman, J., Ichimura, H., & Todd, P. E. (1997). Matching as an econometric evaluation estimator: Evidence from evaluating a job training programme. Review of Economic Studies, 64, 605–654.
Heckman, J., & Navarro-Lozano, S. (2004). Using matching, instrumental variables, and control functions to estimate economic choice models. Review of Economics and Statistics, 86(1), 30–57.
Hirano, K., & Imbens, G. W. (2001). Estimation of causal effects using propensity score weighting: An application to data on right heart catheterization. Health Services and Outcomes Research Methodology, 2, 259–278.
Imbens, G. W. (2000). The role of the propensity score in estimating dose-response functions. Biometrika, 87(3), 706–710.
Michalopoulos, C., Bloom, H. S., & Hill, C. J. (2004). Can propensity-score methods match the findings from a random assignment evaluation of mandatory welfare-to-work programs? The Review of Economics and Statistics, 86(1), 156–179.
Rosenbaum, P. (2002). Observational Studies. New York: Springer-Verlag.
Rosenbaum, P., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55.
Rosenbaum, P., & Rubin, D. B. (1984). Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association, 79, 516–524.
Rosenbaum, P., & Rubin, D. B. (1985). Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. The American Statistician, 39(1), 33–38.
Rubin, D. B. (1977). Assignment to treatment group on the basis of a covariate. Journal of Educational Statistics, 2(1), 1–26.
Rubin, D. B., & Thomas, N. (1996). Matching using propensity scores: Relating theory to practice. Biometrics, 52, 249–264.
Shadish, W. R., Luellen, J. K., & Clark, M. H. (2006). Propensity scores and quasi-experiments: A testimony to the practical side of Lee Sechrest. In R. R. Bootzin (Ed.), Measurement, methods and evaluation. Washington, DC: American Psychological Association Press.
Wilde, E. T., & Hollister, R. (2002). How close is close enough? Testing nonexperimental estimates of impact against experimental estimates of impact with education test scores as outcomes, Discussion paper no. 1242-02. Madison, WI: Institute for Research on Poverty.
Zhong, Z. (2004). Using matching to estimate treatment effects: Data requirements, matching metrics, and Monte Carlo evidence. The Review of Economics and Statistics, 86(1), 156–179.

Within-study comparison papers

Agodini, R., & Dynarski, M. (2004). Are experiments the only option? A look at dropout prevention programs. The Review of Economics and Statistics, 86(1), 180–194.
Aiken, L. S., West, S. G., Schwalm, D. E., Carroll, J., & Hsiung, S. (1998). Comparison of a randomized and two quasi-experimental designs in a single outcome evaluation: Efficacy of a university-level remedial writing program. Evaluation Review, 22(4), 207–244.
Bell, S. H., Orr, L. L., Blomquist, J. D., & Cain, G. C. (1995). Program applicants as a comparison group in evaluating training programs. Kalamazoo, MI: Upjohn Institute for Employment Research.
Bloom, H. S., Michalopoulos, C., & Hill, C. J. (2005). Using experiments to assess nonexperimental comparison-group methods for measuring program effects. In H. S. Bloom (Ed.), Learning more from social experiments (pp. 173–235). New York: Russell Sage Foundation.
Bloom, H. S., Michalopoulos, C., Hill, C. J., & Lei, Y. (2002). Can nonexperimental comparison group methods match the findings from a random assignment evaluation of mandatory welfare-to-work programs? Washington, DC: Manpower Demonstration Research Corporation.
Bratberg, E., Grasdal, A., & Risa, A. E. (2002). Evaluating social policy by experimental and nonexperimental methods. Scandinavian Journal of Economics, 104(1), 147–171.
Buddelmeyer, H., & Skoufias, E. (2003). An evaluation of the performance of regression discontinuity design on PROGRESA. Bonn, Germany: IZA.
Dehejia, R., & Wahba, S. (1999). Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs. Journal of the American Statistical Association, 94(448), 1053–1062.
Fraker, T., & Maynard, R. (1987). The adequacy of comparison group designs for evaluations of employment-related programs. Journal of Human Resources, 22(2), 194–227.
Friedlander, D., & Robins, P. (1995). Evaluating program evaluations: New evidence on commonly used nonexperimental methods. American Economic Review, 85(4), 923–937.
Glazerman, S., Levy, D. M., & Myers, D. (2002). Nonexperimental replications of social experiments: A systematic review. Washington, DC: Mathematica Policy Research, Inc.
Glazerman, S., Levy, D. M., & Myers, D. (2003). Nonexperimental versus experimental estimates of earnings impacts. The Annals of the American Academy, 589, 63–93.
Greenberg, D. H., Michalopoulos, C., & Robins, P. (2006). Do experimental and nonexperimental evaluations give different answers about the effectiveness of government-funded training programs. Journal of Public Policy and Management, 25(3), 523–552.
Gritz, M., & Johnson, T. (2001). National Job Corps Study: Assessing program effects on earnings for students achieving key program milestones. Seattle, WA: Battelle Memorial Institute.
Heckman, J. J., Ichimura, H., Smith, J. C., & Todd, P. (1998). Characterizing selection bias. Econometrica, 66(5), 1017–1098.
Hotz, V. J., Imbens, G. W., & Klerman, J. (2000). The long-term gains from GAIN: A re-analysis of the impacts of the California GAIN program. NBER technical working paper #8007.
Hotz, V. J., Imbens, G. W., & Mortimer, J. H. (1999). Predicting the efficacy of future training programs using past experience. NBER technical working paper #238.
LaLonde, R. (1986). Evaluating the econometric evaluations of training with experimental data. The American Economic Review, 76(4), 604–620.
Michalopoulos, C., Bloom, H. S., & Hill, C. J. (2004). Can propensity-score methods match the findings from a random assignment evaluation of mandatory welfare-to-work programs? The Review of Economics and Statistics, 86(1), 156–179.
Olsen, R., & Decker, P. (2001). Testing different methods of estimating the impacts of worker profiling and reemployment services systems. Washington, DC: Mathematica Policy Research, Inc.
Shadish, W. R., Luellen, J. K., & Clark, M. H. (2006). Propensity scores and quasi-experiments: A testimony to the practical side of Lee Sechrest. In R. R. Bootzin (Ed.), Measurement, methods and evaluation. Washington, DC: American Psychological Association Press.
Smith, J. C., & Todd, P. (2005). Does matching overcome LaLonde’s critique of nonexperimental estimators. Journal of Econometrics, 125, 305–353.
Wilde, E. T., & Hollister, R. (2002). How close is close enough? Testing nonexperimental estimates of impact against experimental estimates of impact with education test scores as outcomes, Discussion paper no. 1242-02. Madison, WI: Institute for Research on Poverty.

Threats to internal validity

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin Company.
11
Sample Size Planning with Applications to Multiple Regression: Power and Accuracy for Omnibus and Targeted Effects
Ken Kelley and Scott E. Maxwell
vital to scientific research, ensuring that the statistical methods chosen provide the information of interest is an important step for scientific progress. Even though science is often laborious and slow, by designing a well-planned study researchers can be in the best position to maximize their chances for success, where the ultimate goal is gaining a better understanding of the phenomenon of interest.

Designing research studies is arguably the most important single phase of research. With a poorly designed study, little or no understanding of the phenomenon of interest may be gained. Given the high economic and professional costs of poorly designed research, motivation of the researcher should clearly be on the side of beginning an investigation with a well-designed study. Many facets exist to research design and each one deserves attention. At a minimum, the following points must be considered when designing studies in the behavioral, educational, and social sciences:

(a) the question(s) of interest must be determined;
(b) the population of interest must be identified;
(c) a sampling scheme must be devised;
(d) selection of independent and dependent measures must occur;
(e) a decision regarding experimentation versus observation must be made;
(f) statistical methods must be chosen so that the question(s) of interest can be answered in an appropriate and optimal way;
(g) sample size planning must occur so that an appropriate sample size given the particular scenario, as defined by points a through f, can be used;
(h) the duration of the study and number of measurement occasions need to be considered;
(i) the financial cost (and feasibility) of the proposed study must be calculated.

Sample size planning (Point g) as it relates to the question(s) of interest (Point a) of an investigation is the focus of this chapter. Although sample size planning is an important part of research design, sample size planning cannot occur without some question of interest first being defined. There are multiple ways to plan sample size for a single study. The way in which sample size is planned depends heavily on the question(s) of interest that the investigator has defined. Thus, not defining the question of interest implies that a method for choosing sample size, and thus the sample size itself, cannot adequately be defined.¹

For example, suppose a researcher wishes to examine the relationship between five regressor variables and a criterion variable in a multiple regression context. However, the process of deciding on an appropriate sample size cannot begin until the question of interest has been clearly defined. There are at least four scenarios in which sample size planning can proceed in a multiple regression context:

(a) desired degree of statistical power for the overall fit of the model (i.e. power for the squared multiple correlation coefficient);
(b) desired degree of statistical power for a specific regressor variable (i.e. power for the test of a particular population regression coefficient);
(c) statistical accuracy for the overall fit of the model (i.e. a narrow confidence interval for the population squared multiple correlation coefficient);
(d) statistical accuracy for a specific regressor variable (i.e. a narrow confidence interval for one or more population regression coefficients).²

Thus, an appropriate sample size depends very much on the goals of the researcher. Not surprisingly, given the fundamental differences between power and accuracy for omnibus and targeted effects, necessary sample size can be very different in the four scenarios. More generally than in the multiple regression example, sample size planning can be conceptualized in a two-by-two table, where the effect of interest, either an omnibus or a targeted effect, is on one dimension and the goal, either power or accuracy, is on the other dimension. Such a conceptualization is given in Table 11.1 for sample size planning
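
Planning for accuracy, the goal in the confidence-interval scenarios listed above, can be illustrated with a minimal sketch. It is not the multiple-regression procedure of this chapter: as a simplifying assumption it uses a single correlation as the targeted effect and the Fisher z approximation for its confidence interval. The function names, the assumed population correlation of 0.30, and the target widths are all hypothetical.

```python
import math
from statistics import NormalDist


def expected_ci_width(rho, n, conf=0.95):
    """Approximate width of the confidence interval for a correlation.

    Assumption: Fisher z transformation with standard error 1 / sqrt(n - 3),
    evaluated at the assumed population value rho (sampling variability of
    the observed correlation is ignored).
    """
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width_z = z_crit / math.sqrt(n - 3)
    center = math.atanh(rho)
    return math.tanh(center + half_width_z) - math.tanh(center - half_width_z)


def n_for_accuracy(rho, target_width, conf=0.95):
    """Smallest n whose approximate interval is no wider than target_width."""
    n = 4  # the Fisher z standard error requires n > 3
    while expected_ci_width(rho, n, conf) > target_width:
        n += 1
    return n


# Hypothetical planning values: narrower target intervals need larger samples.
for width in (0.40, 0.20, 0.10):
    print(f"target width {width:.2f}: n = {n_for_accuracy(0.30, width)}")
```

The required sample size here depends on how narrow an interval is wanted rather than on the chance of rejecting a null hypothesis, which is why accuracy-based and power-based planning can give quite different answers for the same effect.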
a function of sample size, the sample size can and covariance have omnibus effect sizes
be manipulated so that a desired degree of that are generally not easy to interpret. One
power is reached. Power has been discussed option is to reduce such multivariate effects
in numerous book length treatments for many into simpler effects (e.g. pairwise, simple
statistical tests (e.g. Kraemer & Thiemann, main effects, specific effects, etc.) and then
1987; Cohen, 1988; Lipsey, 1990; Murphy & report their corresponding effect sizes and
Myors, 1998). confidence intervals. Even though such effects
The use of null hypothesis significance testing has been under fire for some time (e.g. Nickerson, 2000, for a review; the works contained in Rozeboom, 1960; Bakan, 1966; Morrison & Henkel, 1970; Meehl, 1978; Cohen, 1994; Schmidt, 1996). Even though we sympathize with many of the critiques leveled against the use of null hypothesis significance testing, null hypothesis significance testing has its place in science and there is little question that it will continue to be widely used (e.g. Chow, 1996; Hagen, 1997; Harris, 1997; Wainer, 1999; Mogie, 2004). There are two main reasons why null hypothesis significance tests are valuable in research: they help researchers decide if the population value of some effect differs from a specified quantity (generally zero), and for many tests they allow the researcher to decide the direction of the effect. For some questions of interest, the use of null hypothesis significance tests is not especially helpful. In those situations other techniques can be used.

One common alternative to null hypothesis significance testing is the use of effect sizes and their corresponding confidence intervals (e.g. Schmidt, 1996; Thompson, 2002; Smithson, 2003; Hunter & Schmidt, 2004; Steiger, 2004; Grissom & Kim, 2005). Effect sizes and their corresponding confidence intervals can better address issues involving the magnitude of an effect than can null hypothesis significance tests. However, some research questions do not lend themselves to being framed as an effect where the magnitude is meaningful and of interest. This is especially true with some multiparameter and multivariate hypotheses, as such tests are more difficult to transform into an effect size and corresponding confidence interval that is readily interpretable. For example, multivariate analysis of variance [...] are readily comprehensible, such simplified hypotheses generally fail to consider the complexity and multivariate nature of the original research question, requiring the questions to be addressed with multivariate techniques that may not have readily interpretable effect sizes. We will discuss the benefits of confidence interval formation in the next section, but we acknowledge that confidence intervals are not adequate for addressing all substantively interesting questions. In cases where a research question is best addressed with a null hypothesis significance test, the a priori power of the test should be as important as the obtained probability value.

Even though the conceptual rationale of power analysis is generally well understood, not often discussed are the implications and importance of mapping a power analysis onto the research question(s) of interest. In a given study, there are often numerous statistical hypotheses evaluated. Given a particular sample size and holding everything else constant, each of the potential statistical tests has a population effect size and model error (or simply a standardized effect size which simultaneously considers both) that must be estimated, and an associated level of statistical power. Sample size can thus be determined so that power is at some desired level for one or several tests. If power is set to a value, such as 0.85, it is likely that a different sample size would be necessary for each of the statistical tests of interest. Depending on the exact question of interest (i.e. for which test is the appropriate sample size determined), necessary sample size to achieve some desired goal will generally be different. Thus, before sample size planning from a power analytic approach can proceed, the exact question of interest must be specified (Point a from the designing research list).
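
The dependence of the necessary sample size on the effect being tested can be sketched in R with the base `power.t.test()` function. This is only an illustration under stated assumptions: the two-group t-test, the three standardized effect sizes, and the helper name `power_at()` are chosen for the example and do not come from the chapter's regression setting. In `power.t.test()`, `n` is the sample size per group.

```r
# Illustrative sketch (assumed scenario): a two-group t-test with three
# hypothetical standardized mean differences, all planned for power = 0.85.
effects <- c(small = 0.2, medium = 0.5, large = 0.8)

# Per-group sample size needed for each effect at alpha = 0.05.
# power.t.test() solves for whichever argument is left unspecified (here, n).
n_needed <- sapply(effects, function(d) {
  ceiling(power.t.test(delta = d, sd = 1, sig.level = 0.05, power = 0.85)$n)
})
print(n_needed)

# Conversely, the power obtained at a fixed per-group n for a given effect.
power_at <- function(n, d) {
  power.t.test(n = n, delta = d, sd = 1, sig.level = 0.05)$power
}
```

The same desired power leads to a different necessary sample size for each effect, which is why the question of interest has to be fixed before planning. Calls such as `power_at(20, 0.40)` and `power_at(100, 0.40)` show how sharply power changes with per-group sample size for a single standardized difference.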

When statistical tests are conducted in situations of low power, the literature of an area can become awash with contradictory results (e.g. Sedlmeier & Gigerenzer, 1989; Rossi, 1990; Hunter & Schmidt, 2004; Maxwell, 2004). For example, suppose several researchers each replicate the same previously reported study using multiple regression with several regressor variables. Further suppose that the power was low for each of the several regressors. It is entirely possible that each of the researchers obtained a different set of statistically significant regression coefficients, none of which mirror the previously reported study! By having low power across multiple parameters, there is often a high probability of obtaining statistical significance somewhere (Kelley et al., 2003), but a small probability of replicating the same set of statistically significant regressors (Maxwell, 2000, 2004). Consistency of research findings is thus difficult if power is low for some or all of the effects examined. Without ensuring that an adequate degree of power is achieved, low-powered studies riddled with Type II errors can permeate the literature and scientific growth can falter because of inconsistencies regarding statistically significant effects across multiple studies that examine the same effects (Rosenthal, 1993; Schmidt, 1996; Kraemer et al., 1998; Hunter & Schmidt, 2004, chapter 1).

Many times when a study has important implications, such as those often conducted in the behavioral, educational, social, and medical sciences, ignoring issues of power is irresponsible and potentially even unethical. This is true, for example, when individuals are subjected to an inferior treatment condition in a study with low power. The individuals in such studies are put at risk with little chance of determining whether some treatments are truly superior to others. A more tangible reason for seriously considering power analysis is that grant funding review boards now generally require explicit consideration of design and power in grant proposals in order to receive funding (e.g. Allison et al., 1997; Kraemer et al., 1998). Thus, not only can ignoring power issues lead to a study with little chance of achieving statistical significance for the parameter(s) of interest, it can prevent the study from even being conducted because funding is not secured.

Power analysis is also an important tool for protecting valuable resources. For example, suppose a study was conducted with a sample size of N = 20. Further suppose that the statistical test on the parameter of interest did not yield a statistically significant result. Such a result might be disappointing, but such a result might have also been avoided. Suppose that a power analysis (e.g. based on an independent group t-test where the population standardized mean difference is thought to be 0.40 with the Type I error rate set to 0.05) would have revealed that a sample size of 100 would be necessary in order for the power to equal 0.80, the researcher’s operational definition of ‘adequate power.’ Had such a power analysis been conducted by the researcher a priori, the researcher would have had at least three choices: (a) perform the study with N = 20 anyway, with the caveat that there would be only a small probability (specifically 0.23 under the anticipated effect size) of achieving statistical significance (i.e. low power); (b) modify the original design so that the sample size was changed to N = 100 in order for the researcher to have an adequate degree of power for detecting the effect of interest; or (c) realize that N = 100 is not practical given the difficulty of collecting data and conclude that the cost/benefit ratio is not worth conducting the study at the present time. Points b and c are both enlightening from a resource standpoint, because it may become apparent that N = 20 is not adequate and thus using a sample size of only 20 may not be a wise use of resources given the low probability of finding statistical significance.

RATIONALE OF ACCURACY IN PARAMETER ESTIMATION

In order for a piece of information to be meaningful, it is generally desirable for that piece of information to be accurate. In the context of parameter estimation, accuracy is
defined in terms of the (square) root of the mean square error (RMSE), and is a function of precision and bias. Formally, the accuracy of an estimate θ̂ is defined as

$$ \mathrm{RMSE} = \sqrt{E\left[(\hat{\theta} - \theta)^2\right]} = \sqrt{E\left[(\hat{\theta} - E[\hat{\theta}])^2\right] + \left(E[\hat{\theta} - \theta]\right)^2} = \sqrt{\sigma^2_{\hat{\theta}} + B^2_{\hat{\theta}}}, \tag{1} $$

where E[·] is the expected value of the quantity in brackets, θ is the parameter of interest with θ̂ as its estimate, $\sigma^2_{\hat{\theta}}$ is the population variance of the estimator (i.e. $E[(\hat{\theta} - E[\hat{\theta}])^2]$), and $B_{\hat{\theta}}$ is the bias of the estimator (i.e. $E[\hat{\theta} - \theta]$) (Rozeboom, 1966, p. 500). Whereas precision reflects the repeatability of measurements and is thus inversely related to the sample-to-sample variability, bias is the systematic (i.e. average) discrepancy between an estimate and the parameter it estimates. Notice that when the bias equals zero, the estimate is unbiased and accuracy and precision are equivalent concepts⁷. However, precision alone does not imply an accurate estimate⁸.

A narrow confidence interval has a tightly clustered set of plausible parameter values that will contain the parameter of interest with the degree of confidence specified. These plausible parameter values are those that cannot be rejected as the value of the population parameter. In the long run when the assumptions of the model are satisfied for an exact confidence interval procedure, (1 − α)100% of the confidence intervals formed under the same conditions will contain θ (Hahn & Meeker, 1991, p. 31). Holding the confidence level constant, the narrower the confidence interval width, the more values can be excluded from the plausible set of parameter values. The effect of this is a homing in on the population parameter. Because an appropriately constructed confidence interval will always contain the observed parameter estimate and will contain the parameter (1 − α)100% of the time, as the width of the interval decreases the expected accuracy of the estimate improves (i.e. the RMSE is reduced).

Increasing sample size potentially has two effects on accuracy. First, the larger the sample size generally the more precision the estimate will have (i.e. its variance decreases as N increases)⁹. For unbiased estimates, improving the precision necessarily improves accuracy. Estimators that are biased will many times become less biased as sample size increases. Indeed, for consistent estimators, regardless of whether the estimator is biased or unbiased, as sample size tends to infinity the probability that the sample estimate differs from the population quantity by any value tends to zero (Stuart et al., 1994, chapter 17). Thus, above and beyond any effect of precision, decreasing bias also improves accuracy. In fact, even for biased estimates, decreasing the confidence interval width can still be desirable. In such a scenario the point estimate itself might be biased but the range of plausible parameter values sufficiently small¹⁰.

Sample size planning is almost always regarded as being synonymous with power analysis. However, as previously discussed, sample size planning can also proceed with the goal of obtaining a sufficiently narrow confidence interval. We call this method of sample size planning accuracy in parameter estimation (AIPE; Kelley & Rausch, in press; Kelley et al., 2003; Kelley & Maxwell, 2003; Kelley, 2006), because when the width of the (1 − α)100% confidence interval decreases, implying that there is a smaller range of plausible parameter values at a given confidence level, the expected accuracy of the estimate necessarily increases. Because the accuracy can almost never be calculated for a single estimate, due to the fact that it depends on unknown population values, minimizing the confidence interval width to some acceptable value serves as a way to operationally define the expected accuracy of the estimate. Our usage of the term ‘accuracy in parameter estimation’ is consistent with that
used by Neyman in his seminal work on the theory of confidence intervals: ‘the accuracy of estimation corresponding to a fixed value of 1 − α may be measured by the length of the confidence interval’ (1937, p. 358, notation changed to reflect current system).

It can be argued that obtaining an estimate that has a narrow confidence interval is more beneficial scientifically than obtaining an estimate that reaches statistical significance. It has even been recommended that statistical significance tests be banned and replaced with point estimates and their corresponding confidence intervals (Schmidt, 1996, p. 116). In many situations, especially in observational research, it is known a priori that the null hypothesis is almost always false (Bakan, 1966; Meehl, 1967; Cohen, 1994; Schmidt, 1996; Harris, 1997), and in such situations reaching statistical significance is simply a function of having a large enough sample size (of course, the direction of some effects is often of interest and importance; see our discussion in the previous section)¹¹. However, when an effect is of interest, learning as much as possible about the size of the effect is almost always beneficial, and many times it can be more beneficial than learning only the direction and statistical significance of the parameter. Embracing the AIPE approach to sample size planning will help to facilitate the accumulation of scientific knowledge by yielding more accurate information about the parameter. Indeed, as Rosenthal (1993) discusses, there are really two results of interest: (a) the estimate of the magnitude of the effect; and (b) an indication of the accuracy of the effect ‘as in a confidence interval around the estimate’ (p. 521). Thus, rather than simply asking if an effect differs from some specified null value, in most cases it seems better to address the size of the effect, realizing that the more accurate the estimate of the effect the more information is learned.

Suppose there is no treatment effect in a two-group situation (i.e. the null hypothesis is true). Assuming its assumptions are met, the t-test will yield a p-value greater than α on (1 − α)100% of occasions. The corresponding confidence interval will, on (1 − α)100% of occasions, have its lower bound less than zero and its upper bound greater than zero (and thus the null value of zero is contained within the interval and cannot be rejected). Further suppose that the confidence interval contains zero, yet is wide relative to the scale of the measurement. Even though the null hypothesis of zero cannot be rejected, a large range of other plausible values (i.e. those values contained in the confidence limits) can also not be rejected. Contrast such a situation with one where zero is contained within the interval and the width of the confidence interval is narrow. In such a situation it is possible to exclude a wide range of values as being plausible (i.e. those not contained within the confidence limits) and thus narrow the range of plausible values.

When one wishes to show support for the null hypothesis (Greenwald, 1975), the accuracy of the obtained estimate as judged by the width of the corresponding confidence interval should be of utmost concern. The ‘good enough’ principle can be used and a corresponding ‘good enough belt’ can be formed for the null value, where the limits of the belt would define what constituted a nontrivial effect (Serlin & Lapsley, 1985, 1993). Suppose that not only is the null value contained within the good enough belt, but so too are the confidence limits. This would be a situation where all of the plausible values would be smaller in magnitude than what has been defined as a trivial effect (i.e. the confidence limits are contained within the good enough belt). In such a situation the limits of the (1 − α)100% confidence interval would exclude all effects of any ‘meaningful’ size. If the parameter is less in magnitude than what is minimally important, then learning this can be very valuable. This information may or may not support the theory of interest, but what is important is that valuable information about the size of the effect, and thus the phenomenon of interest, has been gained. Illuminating the size of the effect is something a null hypothesis test in and of itself cannot do. Furthermore, in order for future researchers
to incorporate the study into a meta-analysis, the size of the effect is required (e.g. Hunter & Schmidt, 2004).

OVERVIEW OF MULTIPLE REGRESSION

Let $Y_i$ be an observed score on some criterion variable for the ith individual (i = 1, ..., N) and $X_{ij}$ be the observed score for the jth regressor variable (j = 1, ..., p) for the ith individual¹²,¹³. The general univariate linear model can be written as

$$ Y_i = \beta_0 + X_{i1}\beta_1 + X_{i2}\beta_2 + \cdots + X_{ip}\beta_p + \varepsilon_i, \tag{2} $$

where $\beta_0$ is the population intercept, $\beta_j$ is the regression coefficient for the jth regressor, and $\varepsilon_i$ is the error in prediction for the ith individual generally assumed to be normally distributed with mean zero and variance $\sigma^2_\varepsilon$¹⁴. The matrix analog of Equation 2 can be written as

$$ \mathbf{y} = \beta_0 \mathbf{1} + \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \tag{3} $$

where y is an N length vector of observed criterion variables, $\beta_0$ is the intercept, 1 is an N length column vector of 1s, X is an N by p matrix of fixed regressor variables, β is a p length vector of regression coefficients, and ε is an N length vector of errors¹⁵. The p regression coefficients in the vector β can be obtained by manipulation of the normal equations as

$$ \boldsymbol{\beta} = \boldsymbol{\Sigma}_{XX}^{-1}\boldsymbol{\sigma}_{XY} = \boldsymbol{\Sigma}_{XX}^{-1}\boldsymbol{\sigma}_{YX}', \tag{4} $$

where $\boldsymbol{\Sigma}_{XX}$ is the p by p covariance matrix of the regressor variables with a minus one power representing the inverse of the matrix, $\boldsymbol{\sigma}_{XY}$ is the p length column vector of covariances of the p regressors with Y and $\boldsymbol{\sigma}_{YX}$ is the p length row vector of covariance of Y with the p regressors ($\boldsymbol{\sigma}_{XY} = \boldsymbol{\sigma}_{YX}'$, where prime denotes transposition). The intercept is defined as

$$ \beta_0 = \mu_Y - \boldsymbol{\mu}_X'\boldsymbol{\beta}, \tag{5} $$

where $\mu_Y$ is the population mean of Y and $\boldsymbol{\mu}_X$ is the p length vector of population means for the regressor variables (see, for example, Graybill, 1976; Darlington, 1990; Pedhazur, 1997; Rencher, 2000; Cohen et al., 2003 for comprehensive coverage of multiple regression and the general linear model).

Throughout the chapter we assume that the regressor variables are fixed, which implies that in theoretical replications of the study the same X matrix would be obtained. This would be the case, for example, when the X matrix is literally developed as part of the study design. Theoretical replications of the study would then have the same X matrix and the only variation would be the values of the criterion variables (and thus the error). When the regressors are random, and thus in theoretical repetitions of the study different X matrices would be obtained, the discussion that follows would need to be modified to take into consideration the increased randomness of the design (e.g. Sampson, 1974; Gatsonis & Sampson, 1989; Rencher, 2000).

Often of interest in a multiple regression context is the squared multiple correlation coefficient, sometimes termed the coefficient of determination. Recall that the squared multiple correlation coefficient is the proportion of variance in Y that is accounted for by the p regressor variables. The population multiple correlation coefficient, denoted with an uppercase Greek rho, squared, is defined as

$$ P^2_{Y \cdot X} = \frac{\boldsymbol{\sigma}_{YX}\boldsymbol{\Sigma}_{XX}^{-1}\boldsymbol{\sigma}_{XY}}{\sigma_Y^2}, \tag{6} $$

which is equivalent to the population squared product moment correlation coefficient between the observed scores ($Y_i$) and the predicted scores ($\hat{Y}_i$; i.e. $P^2_{Y \cdot X} = \rho^2_{Y\hat{Y}}$)¹⁶.

Equations 2–6 have used only population parameters. In practice, of course, only the sample means, variances, and covariances are known. The means and the variance/covariance matrix of the p + 1 variables (the outcome variable and the p regressor variables) are estimated with the usual unbiased estimates and substituted into Equations 4–6. The estimate of β corresponding to the p

[...] parameter indexes the magnitude of the difference between the null and alternative hypotheses. The larger the difference between the null and alternative hypotheses, the larger is the noncentrality parameter.

It can be shown that the noncentrality parameter of the sampling distribution for the F-statistic of Equation 10 is given as

$$ \Lambda = f^2 N, \tag{11} $$

where

$$ f^2 = \frac{P^2_{Y \cdot X}}{1 - P^2_{Y \cdot X}} \tag{12} $$

and where $f^2$ has an interpretation as the signal-to-noise ratio (Cohen, 1988; Stuart et al., 1999; Rencher, 2000; Smithson, 2001). As can be seen, Λ is a function of $P^2_{Y \cdot X}$ and N. As either of these quantities becomes larger, so too does Λ. The effect of a larger Λ is that the sampling distribution of the F-statistic in Equation 10 has a larger mean and for fixed sample size values will be more positively skewed. Thus, a larger proportion of the noncentral distribution will be larger than the critical value under the null hypothesis. This idea will become important in the discussion of power and for confidence interval formation.

The test of the null hypothesis that a regression coefficient equals zero

Let $P^2_{Y \cdot X_{-j}}$ be the population squared multiple correlation coefficient when Y is predicted from p − 1 regressor variables with $X_j$ excluded. Researchers are often interested in knowing if a specific regressor variable adds a statistically significant amount to the fit of the model, which translates into a test of $P^2_{Y \cdot X}$ being larger than $P^2_{Y \cdot X_{-j}}$. Such a test is equivalent to the test of the regression coefficient for $X_j$ when all of the p variables are included in the model.

One of the ways to test the hypothesis that $\beta_j$ is non-zero is to conduct a t-test directly on $b_j$ from the full model. A null hypothesis significance test for a regression coefficient, evaluated against a null value of zero, is based on a t-value with N − p − 1 degrees of freedom, and is given as

$$ t = \frac{b_j}{s_{b_j}}, \tag{13} $$

where $s_{b_j}$ is given as

$$ s_{b_j} = \sqrt{\frac{1 - R^2_{Y \cdot X}}{\left(1 - R^2_{X_j \cdot X_{-j}}\right)(N - p - 1)}}\,\frac{s_Y}{s_{X_j}}, \tag{14} $$

with $R^2_{X_j \cdot X_{-j}}$ being the squared multiple correlation coefficient using the jth regressor as the criterion on the remaining p − 1 regressors. $R^2_{X_j \cdot X_{-j}}$ is also indirectly available from $\mathbf{S}_{XX}$ as

$$ R^2_{X_j \cdot X_{-j}} = 1 - \left(s_j^2 c_{jj}\right)^{-1}, \tag{15} $$

where $s_j^2$ is the variance for the jth regressor and $c_{jj}$ is the jth diagonal element of $\mathbf{S}_{XX}^{-1}$ (Harris, 2001).

Similar to the situation described previously when the null hypothesis that $P^2 = 0$ is false and the F-statistic of Equation 10 follows a noncentral distribution, so too does the test statistic of Equation 13 when $\beta_j \neq 0$. It can be shown that when the null hypothesis that $\beta_j = 0$ is false, the t-statistic in Equation 13 has a noncentrality parameter which can be written as

$$ \lambda_j = f_j \sqrt{N}, \tag{16} $$

where

$$ f_j = \beta_j \sqrt{\frac{1 - P^2_{X_j \cdot X_{-j}}}{1 - P^2_{Y \cdot X}}}\,\frac{\sigma_{X_j}}{\sigma_Y}. \tag{17} $$

Because $\beta_j$ can be written (e.g. Hays, 1994) as

$$ \beta_j = \sqrt{\frac{P^2_{Y \cdot X} - P^2_{Y \cdot X_{-j}}}{1 - P^2_{X_j \cdot X_{-j}}}}\,\frac{\sigma_Y}{\sigma_{X_j}}, \tag{18} $$

$f_j$ from Equation 17 can be rewritten as

$$ f_j = \sqrt{\frac{P^2_{Y \cdot X} - P^2_{Y \cdot X_{-j}}}{1 - P^2_{Y \cdot X}}}. \tag{19} $$

As mentioned, the test of a specific regression coefficient is equivalent to the test of no change in $P^2_{Y \cdot X}$ when the jth regressor is removed from the regression equation (i.e. $P^2_{Y \cdot X} - P^2_{Y \cdot X_{-j}} = 0$). This is in turn equivalent to the test of the squared semi-partial (part) correlation of Y with the jth regressor being zero. Let $P^2_{Y \cdot (X_j \cdot X_{-j})}$ be the correlation of Y with the independent part of $X_j$ (i.e. the squared semi-partial correlation between Y and $X_j$). The definition of $P^2_{Y \cdot (X_j \cdot X_{-j})}$ is given as

[...] implying the noncentrality parameter for the jth regressor is

$$ \Lambda_{-j} = f^2_{-j} N. \tag{23} $$

It should be kept in mind that all derivations have been for the case where the regressors are considered fixed. This and the previous section laid out the formal distributional theory of $R^2_{Y \cdot X}$ and $b_j$. The derivations given in this section allow them to be used in a future section that deals with statistical power for the squared multiple correlation coefficient.
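
The step from Equation 17 to Equation 19 can be written out explicitly. The following is a sketch of the algebra rather than part of the original derivation: substituting Equation 18 into Equation 17, the ratio of standard deviations and the term involving $P^2_{X_j \cdot X_{-j}}$ both cancel. The last line additionally assumes that $f^2_{-j}$ in Equation 23 equals the square of $f_j$, in which case the t-based and F-based noncentrality parameters for the jth regressor agree.

$$
\begin{aligned}
f_j &= \underbrace{\sqrt{\frac{P^2_{Y \cdot X} - P^2_{Y \cdot X_{-j}}}{1 - P^2_{X_j \cdot X_{-j}}}}\,\frac{\sigma_Y}{\sigma_{X_j}}}_{\beta_j \text{ (Equation 18)}}
\;\sqrt{\frac{1 - P^2_{X_j \cdot X_{-j}}}{1 - P^2_{Y \cdot X}}}\,\frac{\sigma_{X_j}}{\sigma_Y} \\
&= \sqrt{\frac{P^2_{Y \cdot X} - P^2_{Y \cdot X_{-j}}}{1 - P^2_{Y \cdot X}}}, \\
\lambda_j^2 &= f_j^2 N = \frac{P^2_{Y \cdot X} - P^2_{Y \cdot X_{-j}}}{1 - P^2_{Y \cdot X}}\,N = \Lambda_{-j} \quad \text{(assuming } f^2_{-j} = f_j^2\text{)}.
\end{aligned}
$$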

[...] Fouladi, 1997; Cumming & Finch, 2001; Kelley, 2005), and standardized regression coefficients all require the use of noncentral distributions. The following subsection will discuss methods of forming confidence intervals when noncentral distributions are required.

[...] given as $p(\hat{\theta} \mid \theta)$. Calculation of a confidence interval for θ based on the inversion confidence interval principle involves finding $\theta_L$ such that $p(\hat{\theta} \mid \theta_L) = 1 - \alpha_L$ for the lower limit and $\theta_U$ such that $p(\hat{\theta} \mid \theta_U) = \alpha_U$ for the upper limit. The confidence interval for θ has coverage of $1 - (\alpha_L + \alpha_U)$ and is given as

[...] value having probability 1 − α/2 and α/2 for the lower and upper confidence limits, respectively. The values of the noncentrality parameter that would lead to the observed values occurring with the specified probabilities are then transformed into the quantity of interest. The resultant limits form the (1 − α)100% confidence interval for the population quantity of interest. Although true for confidence intervals based on central distributions when $\alpha_L = \alpha_U$, there is no requirement that the lower confidence interval width, $\hat{\theta} - \theta_L$, will equal the upper confidence interval width, $\theta_U - \hat{\theta}$, for confidence intervals based on noncentral distributions. Throughout the chapter, ‘width’ refers to the full confidence interval width, $\theta_U - \theta_L$.

Confidence interval for the squared multiple correlation coefficient

The squared multiple correlation coefficient is one of the most widely used statistics. $R^2_{Y \cdot X}$ is almost always reported in the context of multiple regression, but in its various forms $R^2_{Y \cdot X}$ can be used to describe the proportion of variance accounted for in a wide variety of situations (e.g. between subjects analysis of variance and covariance designs; as a measure of cross validation; as an index of comparison in meta-analyses, etc.). As Steiger states, ‘confidence intervals for the squared multiple correlation are very informative yet are not discussed in standard texts, because a single simple formula for the direct calculation of such an interval cannot be obtained in a manner that is analogous to the way one obtains a confidence interval for the population mean’ (2004, p. 167). However, confidence intervals for the population squared multiple correlation coefficient are available with certain software (e.g. R2, an MS-DOS program written by Steiger and Fouladi, 1992; MultipleR2, a Mathematica package written by Mendoza and Stafford, 2001; MBESS, an R package written by Kelley (2007); and indirectly with SAS and SPSS, Smithson, 2003). Difficulties arise when forming a confidence interval for $P^2_{Y \cdot X}$ because when $P^2_{Y \cdot X} \neq 0$, the test statistic given in Equation 10 follows a noncentral F-distribution with noncentrality parameter Λ, as given in Equation 11. In accord with the inversion confidence interval principle, $R^2_{Y \cdot X}$ must be converted into the estimated noncentrality parameter and then noncentral parameters must be found such that

$$ p(\hat{\Lambda} \mid \Lambda_L) = 1 - \alpha/2 \tag{24} $$

and

$$ p(\hat{\Lambda} \mid \Lambda_U) = \alpha/2, \tag{25} $$

where $\hat{\Lambda}$ is the observed noncentrality parameter, $\Lambda_L$ and $\Lambda_U$ are the noncentral values that have at their 1 − α/2 and α/2 quantiles $\hat{\Lambda}$ and are thus the lower and upper confidence limits, respectively (e.g. Mendoza and Stafford, 2001; Smithson, 2003; Steiger, 2004).

The MBESS R package includes a function, ci.R2(), for confidence interval formation for $P^2_{Y \cdot X}$, for fixed (or random) regressor variables. Although other options can be specified, a straightforward call to the ci.R2() function for fixed regressor variables would be of the form

R > ci.R2(R2 = $R^2_{Y \cdot X}$, N = N, p = p, conf.level = 1 − α, Random.Regressors = FALSE)

where $R^2_{Y \cdot X}$, N, p, and 1 − α are defined in the function in the same way as they have been defined previously and Random.Regressors identifies if the regressors are random (TRUE) or fixed (FALSE). For example, suppose a researcher conducts a study with five regressor variables on 145 individuals and obtains a squared multiple correlation of $R^2_{Y \cdot X}$ = 0.7854¹⁸. The ci.R2() function for 95% confidence interval coverage could be specified as

R > ci.R2(R2 = 0.7854, N = 145, p = 5, conf.level = 0.95, Random.Regressors = FALSE)

which yields a confidence interval of $CI_{0.95} = [0.7165 \le P^2_{Y \cdot X} \le 0.8206]$, where $CI_{0.95}$ represents a 95% confidence interval with the limits given in the brackets for the parameter of interest. Thus, we can be 95% confident that the population squared multiple correlation coefficient in this situation is somewhere between 0.7165 and 0.8206.

Confidence interval for a regression coefficient

Before forming a confidence interval for a regression coefficient, the distinction has to be made whether or not the regression coefficient will be standardized. An unstandardized regression coefficient is a pivotal quantity, whereas a standardized regression coefficient is a non-pivotal quantity (in an analogous fashion as the difference between two group means is pivotal but the standardized difference between two group means is nonpivotal). Thus, a confidence interval for an unstandardized regression coefficient requires only a critical value from a central distribution whereas a standardized regression coefficient requires the critical values to be obtained from a noncentral distribution (analogous to forming a confidence interval for $P^2_{Y \cdot X}$). The following two sections discuss confidence intervals for unstandardized and standardized regression coefficients.

Confidence intervals for an unstandardized regression coefficient

The t-test for the unstandardized regression coefficient, Equation 13, is a pivotal quantity implying that the test statistic can be manipulated into a confidence interval. The confidence interval for the unstandardized regression coefficient is thus given as

$$ \text{prob.}\left[b_j - t_{(1-\alpha/2;\,N-p-1)}\,s_{b_j} \le \beta_j \le b_j + t_{(1-\alpha/2;\,N-p-1)}\,s_{b_j}\right] = 1 - \alpha. \tag{26} $$

The confidence interval given above is the confidence interval given in standard textbooks that discuss multiple regression.

The MBESS R package includes a function, ci.reg.coef(), for confidence interval formation for $\beta_j$. A confidence interval for an unstandardized regression coefficient can be obtained by specifying the standard deviations of the variables (with the arguments s.Y and s.X) and specifying Noncentral = FALSE. In the situation described for the unstandardized regression coefficients ($b_j$ = 4.4245), where $s_Y$ = 150.0734 and $s_{X_j}$ = 9.3605, the ci.reg.coef() function could be specified as

R > ci.reg.coef(b.j = 4.4245, R2.Y_X = 0.7854, R2.j_X.without.j = 0.3607, N = 145, p = 5, s.Y = 150.0734, s.X = 9.3605, conf.level = 0.95, Noncentral = FALSE)

which yields a confidence interval of $CI_{0.95} = [2.8667 \le \beta_j \le 5.9823]$, where b.j is the unstandardized regression coefficient for the jth regressor variable, R2.Y_X is the squared multiple correlation coefficient, R2.j_X.without.j is the squared multiple correlation coefficient when the jth regressor variable is predicted from the remaining p − 1 regressor variables, conf.level is the confidence level specified (i.e. 1 − α), and Noncentral is an indicator of whether or not the noncentral method should be used (FALSE for unstandardized and TRUE for standardized regression coefficients).

Confidence intervals for a standardized regression coefficient

When a regression coefficient is standardized, the unstandardized regression coefficient is multiplied by the quantity $s_{X_j}/s_Y$ in order to remove the scale of $X_j$ and Y. Such a quantity is no longer pivotal because of the process of standardization, implying that the confidence interval necessarily depends on a noncentral t-distribution. The difficulties that arise when forming a confidence interval for ${}_{s}\beta_j$, the population standardized regression coefficient for the jth regressor, arise because
$b_j$ is multiplied by $s_{X_j}/s_Y$ (in order to obtain ${}_{s}b_j$, the sample standardized regression coefficient for variable j). The distribution of ${}_{s}b_j$ is not pivotal and it is necessary to form confidence intervals based on noncentral t-distributions. In accord with the inversion confidence interval principle, ${}_{s}b_j$ must be converted into the observed noncentrality parameter (via Equation 13), and then the noncentral parameters must be found such that

$$ p(\hat{\lambda} \mid \lambda_L) = 1 - \alpha/2 \tag{27} $$

and

$$ p(\hat{\lambda} \mid \lambda_U) = \alpha/2, \tag{28} $$

where $\lambda_L$ and $\lambda_U$ are the lower and upper confidence limits for ${}_{s}\beta_j$ and are noncentrality parameters from t-distributions.

The MBESS R package includes a function, ci.reg.coef(), for confidence interval formation for ${}_{s}\beta_j$, technically assuming fixed regressor variables. Although other options can be specified, a straightforward call to the ci.reg.coef() function would be of the form

R > ci.reg.coef(b.j = ${}_{s}b_j$, R2.Y_X = $R^2_{Y \cdot X}$, R2.j_X.without.j = $R^2_{X_j \cdot X_{-j}}$, N = N, p = p, conf.level = 1 − α, Noncentral = TRUE).

For example, in the previous example where N = 145 and $R^2_{Y \cdot X}$ = 0.7854, suppose that ${}_{s}b_j$ = 0.2760 and $R^2_{X_j \cdot X_{-j}}$ = 0.3607. The [...]

SAMPLE SIZE PLANNING FOR MULTIPLE REGRESSION GIVEN THE GOAL OF STATISTICAL POWER

This section discusses methods to plan sample size for statistical power in multiple regression. We begin with an overview of sample size planning for a desired power for the omnibus effect (i.e. $P^2_{Y \cdot X}$) and then provide an overview of sample size planning for a desired power for a targeted effect (i.e. $\beta_j$ or ${}_{s}\beta_j$).

Power for omnibus effects in multiple regression: Obtaining statistical significance for the squared multiple correlation coefficient

When interest concerns the omnibus effect of the model, recall that the noncentrality parameter was previously shown (Equations 11–12) to equal

$$ \Lambda = \frac{P^2_{Y \cdot X}}{1 - P^2_{Y \cdot X}}\,N. \tag{29} $$

This implies that sample size is given as

$$ N = \Lambda\,\frac{1 - P^2_{Y \cdot X}}{P^2_{Y \cdot X}}. \tag{30} $$
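
Because the critical value of the test also depends on N through the error degrees of freedom, sample size for a desired power is usually found by search rather than from Equation 30 alone. A minimal sketch in base R follows, assuming fixed regressors, an F-test with N − p − 1 error degrees of freedom, and a noncentrality parameter of $f^2 N$ as in Equations 11 and 23. The helper names `reg_power()` and `reg_sample_size()` and the input values are hypothetical illustrations; this is not the MBESS implementation.

```r
# Power of an F-test with u numerator df and N - p - 1 error df,
# using the noncentrality parameter f2 * N (Equations 11 and 23).
reg_power <- function(f2, N, p, u, alpha = 0.05) {
  df2  <- N - p - 1                      # error degrees of freedom
  crit <- qf(1 - alpha, u, df2)          # critical value under the null
  pf(crit, u, df2, ncp = f2 * N, lower.tail = FALSE)
}

# Smallest N whose power reaches the target, found by a simple upward search.
reg_sample_size <- function(f2, p, u, power = 0.80, alpha = 0.05) {
  N <- p + 2                             # smallest N leaving one error df
  while (reg_power(f2, N, p, u, alpha) < power) N <- N + 1
  N
}

# Illustrative (assumed) population values.
rho2_full    <- 0.40   # squared multiple correlation with all p regressors
rho2_reduced <- 0.30   # squared multiple correlation without regressor j
p <- 5

f2_omnibus  <- rho2_full / (1 - rho2_full)                    # Equation 12
f2_targeted <- (rho2_full - rho2_reduced) / (1 - rho2_full)   # Equation 19, squared

N_omnibus  <- reg_sample_size(f2_omnibus,  p, u = p)   # test of all p regressors
N_targeted <- reg_sample_size(f2_targeted, p, u = 1)   # test of one coefficient
```

Printing `N_omnibus` and `N_targeted` shows that, for these inputs, the test of a single coefficient needs a noticeably larger sample than the omnibus test.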

[...] function from MBESS can be specified as follows:

R > ss.power.reg.coef(Rho2.Y_X = 0.40, Rho2.Y_X.without.j = 0.30, p = 5, desired.power = 0.80, alpha.level = 0.05)

where Rho2.Y_X is the population squared multiple correlation coefficient predicting Y from X and Rho2.Y_X.without.j is the population squared multiple correlation coefficient predicting Y from $X_{-j}$. The necessary sample size in this example is 50.

SAMPLE SIZE PLANNING FOR MULTIPLE REGRESSION GIVEN THE GOAL OF STATISTICAL ACCURACY

AIPE for the omnibus effect in multiple regression: Obtaining a narrow confidence interval for the population squared multiple correlation coefficient

The way in which sample size can be determined in order for the expected width of the confidence interval for $P^2_{Y \cdot X}$ to be sufficiently narrow is quite involved. The method is computationally tedious and can only be carried out with the use of an iterative computer routine that uses noncentral [...]

The idea is to first use $P^2_{Y \cdot X}$, p, and α in order to determine the width of the confidence interval given some minimal sample size. If the width is larger than desired, the current estimate of N is incremented by 1 and then the expected width is determined again. This iterative process continues until the sample size is just large enough so that the expected confidence interval width is sufficiently narrow. Two caveats with such an approach arise: $R^2_{Y \cdot X}$ is a positively biased estimate of $P^2_{Y \cdot X}$ and the sample size calculated is only for the expected width.

Even though $R^2_{Y \cdot X}$ is the sample estimate of $P^2_{Y \cdot X}$, $R^2_{Y \cdot X}$ is positively biased. However, the confidence limits for $P^2_{Y \cdot X}$, and thus its width, are based on $R^2_{Y \cdot X}$. Even though the bias of $R^2_{Y \cdot X}$ decreases as N increases, holding everything else constant, basing the necessary sample size on $P^2_{Y \cdot X}$ directly would lead to inappropriate estimates of necessary sample size because the width of the computed confidence interval in part depends on $R^2_{Y \cdot X}$. The way in which this complication is overcome is by using the expected value of $R^2_{Y \cdot X}$ in place of $P^2_{Y \cdot X}$. The expected value of $R^2_{Y \cdot X}$ given $P^2_{Y \cdot X}$, N, and p when regressors are fixed does not have a known derivation. However, the expected value of $R^2_{Y \cdot X}$ given $P^2_{Y \cdot X}$, N, and p when regressors are random is known and is used as an approximation to the case where predictors are fixed, which is
F-distributions. As elsewhere in the chapter, given as
we have restricted the discussion to regressors
that are fixed. The case of random regressors is E RY2 ·X | P2Y ·X , N, p
fully developed in Kelley (2006)19 . It should N − p − 1
be noted that two methods are discussed. =1− 1 − P2Y ·X
N −1
The first method discussed provides necessary N +1 2
sample size for the expected confidence × H 1;1; ; PY ·X , (33)
2
interval width. The confidence interval width
is a random variable that will vary from where H is the hypergeometric function
sample to sample. A modified approach will (Stuart et al., 1999, section 28.32; Johnson
also be discussed so that the width will be et al., 1995).
sufficiently narrow with no less than some The sample size procedure is based on
specified degree of certainty. the expected value of RY2 ·X because it is the
The values that must be specified in order value expected to be obtained in the study.
to determine the necessary sample size given For a given , p, and N, the confidence
an expected confidence interval width that interval width depends only on RY2 ·X . Thus,
is sufficiently narrow are P2Y ·X , p, and . the expected confidence interval width can be
determined by forming a confidence interval with the expected $R^2_{Y\cdot X}$. The expected confidence interval width can be made sufficiently narrow by increasing sample size, implying that the expected value of $R^2_{Y\cdot X}$ changes, until the expected confidence interval width is equal to or just narrower than the desired width. Once the sample size is found so that the expected confidence interval width is sufficiently narrow, using the sample size in a study will ensure that the expected width of the confidence interval will be sufficiently narrow.

For example, suppose a researcher wishes to determine necessary sample size so that the expected width of a 95% confidence interval for $\mathrm{P}^2_{Y\cdot X}$ is 0.20 for 5 regressor variables in a situation where $\mathrm{P}^2_{Y\cdot X} = 0.5$. The ss.aipe.R2() function from MBESS would be used as

R > ss.aipe.R2(Population.R2 = 0.50, conf.level = 0.95, width = 0.20, p = 5, Random.Regressors = FALSE),

which returns a necessary sample size of 152. Thus, using a sample size of 152 would provide an expected width for the confidence interval of 0.20.

Since the width of the confidence interval is a random variable, having a sample size such that the expected width is sufficiently narrow does not ensure that any particular sample will have a confidence interval that is sufficiently narrow (e.g. see Hahn & Meeker, 1991, or Kupper & Hafner, 1989, for a discussion of these issues in simpler situations). What can be done is to specify some desired degree of certainty that the obtained confidence interval will in fact be sufficiently narrow. The way in which this additional step proceeds is by using the sample size obtained from the previously discussed procedure and from two $100\gamma\%$ one-sided confidence intervals for $\mathrm{P}^2_{Y\cdot X}$, where $\gamma$ is the desired degree of certainty that the obtained interval will be sufficiently narrow. The limits from the $100\gamma\%$ confidence intervals are then used to plan an appropriate sample size as before, but now using the confidence limits in place of $\mathrm{P}^2_{Y\cdot X}$ from the first procedure. The rationale of this approach is to base the sample size procedure on the largest and smallest plausible value for the obtained $R^2_{Y\cdot X}$ based on the original sample size and the degree of certainty specified.

The reason the upper and lower confidence limits are used is because, unlike many effects where the larger the noncentrality parameter the wider the confidence interval (holding everything else constant), there is a nonmonotonic relationship between $R^2_{Y\cdot X}$ and the confidence interval width. Depending on the particular situation, a larger sample size may be necessitated by the lower limit or the upper limit from the two $100\gamma\%$ one-sided confidence limits (or a value in between). The relationship between $R^2_{Y\cdot X}$ and the corresponding confidence interval width is illustrated in Figure 11.1 for 95% confidence intervals where p = 5 and N = 100.

The lack of monotonicity between the size of $R^2_{Y\cdot X}$ and the confidence interval width implies that, depending on the particular situation, the upper limit, the lower limit, or values in between the two one-sided $100\gamma\%$ confidence interval limits will yield wider confidence intervals for $\mathrm{P}^2_{Y\cdot X}$. Even though Figure 11.1 is helpful to illustrate why upper and lower limits are required, recall that the procedure always uses the expected value of $R^2_{Y\cdot X}$. Thus, an analog to the figure presented, and what is actually used in the procedure, is one where the values on the ordinate are a function of basing confidence interval width on the expected values of $R^2_{Y\cdot X}$ for corresponding values of $\mathrm{P}^2_{Y\cdot X}$.

Two issues arise when basing the sample size procedure on limits from the $100\gamma\%$ one-sided confidence intervals. First, it is possible that the point estimate itself requires a larger sample size than either of the confidence limits (e.g. suppose the corresponding point estimate is 0.35 from the figure). Second, the maximum confidence interval width could be between the limits (e.g. suppose the corresponding confidence limits are 0.2 and 0.6 from the figure). To ensure that an appropriate sample size is determined, an
[Figure 11.1 plot: the vertical axis, labeled "95% CI Width and SE($R^2$) Given $R^2 = \mathrm{P}^2$", runs from 0.00 to 0.25; the legend distinguishes CI Width and SE($R^2$).]

Figure 11.1 Relationship between the observed width of the 95% confidence interval for the population squared multiple correlation coefficient ($\mathrm{P}^2_{Y\cdot X}$) as a function of the observed squared multiple correlation coefficient ($R^2_{Y\cdot X}$) when the total sample size is 100 and there are five regressors
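The expected value of $R^2_{Y\cdot X}$ from Equation 33, which the sample size procedure substitutes for $\mathrm{P}^2_{Y\cdot X}$ when computing expected widths, can be evaluated with a truncated series for the hypergeometric function. The following is a minimal sketch in base R and is not the MBESS implementation; the function names hyper_2f1_11() and expected_R2() are hypothetical, and the series is assumed to be used only for $0 \le \mathrm{P}^2_{Y\cdot X} < 1$.

```r
# Series for H(1; 1; c; z) = sum_{k >= 0} k! z^k / (c)_k, valid for |z| < 1.
# (Hypothetical helper; base R has no built-in Gauss hypergeometric function.)
hyper_2f1_11 <- function(c, z, tol = 1e-12, max_terms = 10000) {
  term <- 1   # k = 0 term of the series
  total <- 1
  for (k in 0:(max_terms - 1)) {
    # Ratio of consecutive terms: (1 + k)(1 + k) z / ((c + k)(k + 1))
    term <- term * (k + 1) * z / (c + k)
    total <- total + term
    if (abs(term) < tol) break
  }
  total
}

# Expected value of the sample squared multiple correlation (Equation 33),
# given the population value rho2, sample size N, and p regressors.
expected_R2 <- function(rho2, N, p) {
  1 - ((N - p - 1) / (N - 1)) * (1 - rho2) *
    hyper_2f1_11((N + 1) / 2, rho2)
}

# Illustrative values: P^2 = 0.50, p = 5, at two sample sizes.
print(expected_R2(0.50, 152, 5))
print(expected_R2(0.50, 500, 5))
```

When the population value is zero the expression reduces to $p/(N - 1)$, and for a fixed positive population value the positive bias shrinks as N grows, which is the behavior the text describes.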
ss.aipe.R2() function is used in order to ensure a desired degree of certainty of 0.99 is given as follows:

R > ss.aipe.R2(Population.R2 = 0.50, conf.level = 0.95, width = 0.20, p = 5, degree.of.certainty = 0.99, Random.Regressors = FALSE),

which yields a necessary sample size of 189.

AIPE for targeted effects in multiple regression: Obtaining a narrow confidence interval for the population regression coefficient

Recall that when regression coefficients are unstandardized, the way in which confidence intervals are obtained is based on the central t-distribution. However, confidence intervals based on standardized regression coefficients require the use of noncentral distributions (since $b^s_j$ is not a pivotal quantity). Thus, the appropriate procedures are different for the two scenarios. The first procedure discussed will be for unstandardized regression coefficients followed by a procedure for standardized regression coefficients.

AIPE for unstandardized regression coefficients

Kelley and Maxwell (2003) discussed AIPE for a targeted regression coefficient. We will base the present discussion largely on an updated account of that work in the context of unstandardized regression coefficients. Recall from Equation 26 that the confidence interval for $\beta_j$ is straightforward to calculate given $b_j$, $s_{b_j}$ (which is a function of $N$, $p$, $R^2_{Y\cdot X}$, $R^2_{X_j\cdot X_{-j}}$), $N$, $p$, and $\alpha$. The population variance for the jth regression coefficient is given as

$$\sigma^2_{b_j} = \left(\frac{1 - \mathrm{P}^2_{Y\cdot X}}{\left(1 - \mathrm{P}^2_{X_j\cdot X_{-j}}\right)(N - p - 1)}\right)\frac{\sigma^2_Y}{\sigma^2_{X_j}}. \tag{34}$$

Given $\sigma^2_{b_j}$, the sample size can be solved for, yielding the necessary sample size in order for the expected width to be sufficiently narrow:

$$N = \left(\frac{t_{(1-\alpha/2;\,N-p-1)}}{\omega/2}\right)^2 \left(\frac{1 - \mathrm{P}^2_{Y\cdot X}}{1 - \mathrm{P}^2_{X_j\cdot X_{-j}}}\right) \frac{\sigma^2_Y}{\sigma^2_{X_j}} + p + 1, \tag{35}$$

where $\omega$ is the desired full width of the confidence interval. A complication is that the desired N is implicitly involved on the right side of the equation since the degrees of freedom of the t-value depend on N. It is thus necessary to solve Equation 35 iteratively.

Because the confidence interval width is itself a random variable, obtained values of $s^2_{b_j}$ larger than the population value used in the calculation of N will lead to confidence intervals wider than desired. In order to avoid obtaining a confidence interval wider than desired, the $100\gamma\%$ confidence limit for the standard error can be used in place of the population standard error when solving for N. The $100\gamma\%$ upper confidence limit for the population standard error of the jth regression coefficient, based on a chi-square distribution with $N - p - 1$ degrees of freedom, can then be substituted for the population variance from Equation 34. Doing so will ensure that the obtained confidence interval will be sufficiently narrow no less than $100\gamma\%$ of the time. Since the only way for a confidence interval to be wider than desired is to obtain a standard error larger than the population standard error, using the upper $100\gamma\%$ confidence limit of the standard error will ensure that the confidence interval will be sufficiently narrow no less than $100\gamma\%$ of the time.

The way in which the upper limit for the variance of the regression coefficient is determined is given as

$$\sigma^2_{b_j} = \left(\frac{1 - \mathrm{P}^2_{Y\cdot X}}{\left(1 - \mathrm{P}^2_{X_j\cdot X_{-j}}\right)(N - p - 1)}\right)\frac{\sigma^2_Y}{\sigma^2_{X_j}} \times \frac{\chi^2_{(\gamma;\,N-1)}}{N - p - 1}, \tag{36}$$
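Because N appears on both sides of Equation 35, through the degrees of freedom of the critical t-value, the equation can be solved by a simple upward search over N. The following is a minimal sketch in base R for the expected-width case only; it is not the MBESS implementation, the function and argument names (eq35_rhs(), ss_aipe_coef_sketch(), rho2_yx, rho2_xj, var_y, var_xj) are hypothetical, the population values in the example are assumed for illustration, and the degree of certainty adjustment of Equation 36 is not included.

```r
# Right-hand side of Equation 35 evaluated at a trial sample size N.
eq35_rhs <- function(N, rho2_yx, rho2_xj, var_y, var_xj, p, width,
                     conf_level = 0.95) {
  alpha <- 1 - conf_level
  t_crit <- qt(1 - alpha / 2, df = N - p - 1)  # critical value depends on N
  (t_crit / (width / 2))^2 *
    ((1 - rho2_yx) / (1 - rho2_xj)) * (var_y / var_xj) + p + 1
}

# Smallest integer N that is at least as large as the right-hand side.
# The right-hand side decreases as N grows, so the search terminates.
ss_aipe_coef_sketch <- function(rho2_yx, rho2_xj, var_y, var_xj, p, width,
                                conf_level = 0.95) {
  N <- p + 2  # smallest N with one error degree of freedom
  while (N < eq35_rhs(N, rho2_yx, rho2_xj, var_y, var_xj, p, width,
                      conf_level)) {
    N <- N + 1
  }
  N
}

# Assumed population values (not from the chapter): standardized variables,
# five regressors, and a desired full width of 0.30.
n_needed <- ss_aipe_coef_sketch(rho2_yx = 0.40, rho2_xj = 0.30,
                                var_y = 1, var_xj = 1, p = 5, width = 0.30)
print(n_needed)
```

Incrementing N one unit at a time mirrors the iterative logic described for the omnibus case; a root finder could be substituted when the required N is very large.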
Thus, an analog for the way a desired degree of certainty is incorporated into the unstandardized regression coefficient, where the confidence interval width depends on only one parameter, $\sigma^2_{b_j}$, is necessarily more difficult in the standardized case. Even though we believe that a method can and will be developed, at the present time a brute-force trial and error simulation-based method can be implemented in order to plan an appropriate necessary sample size. Such an approach would proceed by specifying the population parameters and simulating data based on a particular sample size. From there, confidence intervals could be formed for standardized regression coefficients as previously discussed. The proportion of confidence intervals that are narrower than the desired width can be determined for different sample size values. This could be done until the minimum sample size is found that yields no less than the desired degree of certainty specified.

The function ss.aipe.reg.coef.sensitivity() contained in the MBESS R package can be used to determine the appropriate sample size as well as perform general sensitivity analyses. When an estimated set of population parameters is specified (that differs from the true set), the sample size used is based on the estimated values, but the simulation is conducted based on the properties of the true set of parameter values. This allows one to perform a sensitivity analysis, where the effects of mis-specifying population parameters by varying amounts on the typical width and the percentage of confidence intervals narrower/wider than desired can be evaluated. Alternatively, a specific sample size can be used in order to evaluate the properties of the situation described by the true set of parameter values at the specified value of sample size. Using the specified sample size approach, one can run the simulation with different values of sample size until the percentage of confidence interval widths less than the desired width is equal to the degree of certainty of interest. Although generally more time consuming, the brute force method described works very well when one wants to incorporate a desired degree of certainty parameter into the sample size procedure for standardized regression coefficients and for sensitivity analyses in general20.

DISCUSSION

In the context of multiple regression, the question ‘What size sample should I use?’ does not have a simple answer. As this chapter has demonstrated, the answer is best addressed with the two-by-two conceptualization presented in Table 11.1. Specifically, the sample size that should be used depends on the goals of the study. If the goal is for the overall fit of the model, then interest concerns $\mathrm{P}^2_{Y\cdot X}$; if the goal is for a targeted effect, then interest concerns $\beta_j$ (or $\beta^s_j$). Of course, both $\mathrm{P}^2_{Y\cdot X}$ and $\beta_j$ (or $\beta^s_j$) might be of interest, which implies that the larger of the two sample sizes from the situations of interest should be used.

However, identifying only that one is interested in $\mathrm{P}^2_{Y\cdot X}$ and/or $\beta_j$ (or $\beta^s_j$) is still not enough to determine the necessary sample size. It is also necessary to determine if the goal is to reject the null hypothesis that the effect is zero in the population or if the goal is to obtain an accurate parameter estimate via a narrow confidence interval for the population parameter (possibly both). In multiple regression, although the idea is much more general, choosing an adequate sample size is not generally possible until a particular cell in Table 11.1 has been identified as the scenario of interest. Once the particular scenario from the two-by-two conceptualization has been determined, then and only then can an appropriate sample size be planned (recall Point f from the designing research studies list in the introduction of the chapter).

Even after the scenario has been determined, it is still necessary to use an appropriate value of an effect size parameter. One thing that has been conspicuously absent from the chapter is ways to choose an appropriate value for the effect size parameter so that all the sample size procedures can
ideal, as it conveys the goal of achieving a parameter estimate that is close to its population value.

8 As an extreme example, suppose that regardless of the observed data, a researcher always estimates the parameter to be a value that corresponds to an a priori theory irrespective of any observed data. In such a case there would be a high degree of precision but the accuracy would likely be poor due to the effect of bias in the estimation procedure unless the theory is perfect. Precision is thus a necessary but not a sufficient condition for achieving accurate parameter estimates.

9 A counter example is the Cauchy distribution, where the precision of the location estimate is the same regardless of the sample size used to estimate it (Stuart & Ord, 1994, pp. 2–3).

10 Some population parameters are typically estimated with biased estimators but have exact confidence interval procedures. Even though the estimator is biased, the point estimate may be necessary for calculation of the (exact) confidence interval, where the values within the interval represent plausible values and will contain the parameter with $(1 - \alpha)100\%$ confidence. Many such population parameters also have unbiased (or more unbiased) estimators. Examples include the standardized mean difference (e.g. Hedges & Olkin, 1985), the squared multiple correlation coefficient (e.g. Algina & Olejnik, 2000), the standard deviation (e.g. Hays, 1994, for the confidence interval method and Holtzman, 1950, for the unbiased estimate), and the coefficient of variation (e.g. Johnson & Welch, 1940, for the confidence interval method and Sokal & Braumann, 1980, for its nearly unbiased estimate). A strategy in such cases is to report the exact confidence interval and the unbiased estimate of the population parameter.

11 The direction of an effect is known if the upper and lower limits of the confidence interval are both in the same direction (i.e. both are positive or both are negative). Furthermore, the confidence limits determine whether or not a particular null hypothesis (such as zero) can be rejected. Confidence limits provide the same information as an infinite set of hypothesis tests. The values within the confidence limits are the values of the null hypothesis that would not be rejected. The values outside of the confidence limits are the values of the null hypothesis that would be rejected.

12 The term ‘regressors’ has been used throughout the chapter as a generic term for the X variables. A regressor variable is termed independent, explanatory, predictor, or concomitant variable in other contexts. The term criterion is used as a generic term for the Y variable. The criterion variable is termed dependent, outcome, or predicted variable in other contexts.

13 Notice that the regressor variables (i.e. the X variables) are not italicized in any of the equations. This is because we will regard the regressors as fixed throughout the chapter. Even though the distinction between fixed and random regressors is not often made in applied work, the sampling distribution of an estimated regression coefficient tends to depend on whether the regressors are fixed or random (e.g. Stuart et al., 1999; Rencher, 2000). Many applications of multiple regression implicitly or explicitly take the view ‘given this X’ so that the X variables can be considered fixed for purposes of the study (e.g. O’Brien & Mueller, 1993, p. 23). O’Brien and Mueller (1993) make the argument that the distinction is not important in the context of sample size planning for power in multiple regression by stating that ‘the practical discrepancy between the two approaches disappears as the sample size increases’ (p. 23). O’Brien and Mueller (1993) go on to say that ‘because the population parameters are conjectures or estimates, strict numerical accuracy of the power computations is usually not critical’ (p. 23). We will say more about the distinction between fixed and random regressors elsewhere in the chapter.

14 We use both standardized and unstandardized regression coefficients in various parts of the chapter. Observed standardized regression coefficients have at times been referred to as ‘beta weights’ in the behavioral and educational sciences. We will use $\beta_j$ to represent the unstandardized population regression coefficient of variable j with $b_j$ as its estimate. We use $\beta^s_j$ to represent the standardized population regression coefficient of variable j with $b^s_j$ as its estimate.

15 Notice that we have not used the standard general linear model equations, where the intercept is contained within $\beta$ and X contains a vector of ones for the intercept. The notation used here is equivalent to the standard general linear model equations, but it is especially helpful for presenting the necessary information for each of the four approaches to sample size planning for multiple regression.

16 Throughout the chapter, multiple correlation coefficients will be denoted with a subscript that identifies the variable being predicted separated by a dot from one or more regressor variables. Thus, the criterion variable is on the left of the dot and the regressor variable(s) are to the right of the dot, where the dot can literally be read as ‘regressed on,’ ‘predicted from’ or ‘explained by.’

17 A mean-shifted central distribution is one that follows a central distribution after subtracting the population value. For example, when comparing two independent group means, if there is a population mean difference between the two groups a priori, then that difference can be subtracted from the observed difference: $\bar{Y}_1 - \bar{Y}_2 - (\mu_1 - \mu_2)$, where $\bar{Y}_1$ and $\bar{Y}_2$ are the observed means for groups one and two, respectively, and $\mu_1$ and $\mu_2$ are the population means for groups 1 and 2, respectively.

18 The illustrative data are from Holzinger and Swineford’s (1939) Grant-White School data (available in MBESS), where the criterion variable, total
score (the sum of all of the 26 measured variables included in the dataset), is modeled as a function of the regressor variables flags, wordm, addition, object, and series. The standardized and unstandardized regression coefficients, presented in the next section, are for the series variable, which was a test that measured students’ ability to complete mathematical/numeric series. Notice that the squared multiple correlation coefficient is quite large by most behavioral, educational, and social science standards. The large squared multiple correlation coefficient is because the dependent variable is a sum of five positively correlated measures, where the zero-order correlations among the measures tended to be large.

19 Even though only fixed regressors are discussed in the chapter, the ss.aipe.R2() function in MBESS can be used for regressors that are fixed or random by specifying Random.Predictors=TRUE (for random predictors) or Random.Predictors=FALSE (for fixed regressors).

20 In addition to the ss.aipe.reg.coef.sensitivity() function described, there is also a ss.power.reg.coef.sensitivity() function that allows the effects of parameter mis-specification or selected sample size to be specified in order to assess empirical power, and other properties, for a targeted regression coefficient. These functions for confidence interval width and power have analogs for the omnibus effect with the ss.aipe.R2.sensitivity() and the ss.power.R2.sensitivity() functions.

REFERENCES

Algina, J., & Olejnik, S. (2000). Determining sample size for accurate estimation of the squared multiple correlation coefficient. Multivariate Behavioral Research, 35, 119–136.
Allison, D. B., Allison, R. L., Faith, M. S., Paultre, F., & Pi-Sunyer, F. X. (1997). Power and money: Designing statistically powerful studies while minimizing financial costs. Psychological Methods, 2, 20–33.
Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66, 423–437.
Chow, S. L. (1996). Statistical significance: Rationale, validity and utility. Newbury Park, CA: Sage Publications.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Cohen, J. (1994, December). The earth is round (p < 0.05). American Psychologist, 49, 997–1003.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Erlbaum.
Cox, D. R., & McCullagh, P. (1982). Some aspects of analysis of covariance. Biometrics, 541–561.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 532–574.
Darlington, R. B. (1990). Regression and linear models. New York, NY: McGraw-Hill.
Dunlap, W. P., Xin, X., & Myers, L. (2004). Computing aspects of power for multiple regression. Behavior Research Methods, Instruments, & Computers, 36, 695–701.
Gatsonis, C., & Sampson, A. R. (1989). Multiple correlation: Exact power and sample size calculations. Psychological Bulletin, 106, 516–524.
Graybill, F. A. (1976). Theory and application of the linear model. Pacific Grove, CA: Brooks/Cole.
Green, B. F. (1977). Parameter sensitivity in multivariate methods. Multivariate Behavioral Research, 12, 263–288.
Green, S. B. (1991). How many subjects does it take to do a regression analysis? Multivariate Behavioral Research, 26, 499–510.
Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82, 1–20.
Grissom, R. J., & Kim, J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ: Lawrence Erlbaum Associates.
Hagen, R. L. (1997). In praise of the null hypothesis statistical test. American Psychologist, 52(1), 15–24.
Hahn, G., & Meeker, W. (1991). Statistical intervals: A guide for practitioners. New York, NY: John Wiley & Sons, Inc.
Harris, R. J. (1997). Significance tests have their place. Psychological Science, 8, 8–11.
Harris, R. J. (2001). A primer of multivariate statistics (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Hays, W. L. (1994). Statistics (5th ed.). Belmont, CA: Wadsworth Publishing.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.
Holtzman, W. H. (1950). The unbiased estimate of the population variance and standard deviation. American Journal of Psychology, 63, 615–617.
Holzinger, K. J., & Swineford, F. (1939). A study in factor analysis: The stability of a bi-factor solution. Chicago, IL: The University of Chicago.
Huitema, B. E. (1980). The analysis of covariance and alternatives. New York, NY: Wiley.
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.
Johnson, N. L., Kotz, S., & Balakrishnan, N. (1995). Continuous univariate distributions (Vol. 2). New York, NY: John Wiley & Sons, Inc.
Johnson, N. L., & Welch, B. L. (1940). Applications of the noncentral t-distribution. Biometrika, 31, 362–389.
Kelley, K. (2005). The effects of nonnormal distributions on confidence intervals for the standardized mean difference: Bootstrapping as an alternative to parametric confidence intervals. Educational and Psychological Measurement, 65(1), 51–69.
Kelley, K. (2006). Sample size planning for the squared multiple correlation coefficient: Accuracy in parameter estimation via narrow confidence intervals. Manuscript under review.
Kelley, K. (2007). MBESS version 0.0.9: An R package. [computer software and manual]. Retrievable from http://www.cran.r-project.org/
Kelley, K., & Maxwell, S. E. (2003). Sample size for multiple regression: Obtaining regression coefficients that are accurate, not simply significant. Psychological Methods, 8, 305–321.
Kelley, K., Maxwell, S. E., & Rausch, J. R. (2003). Obtaining power or obtaining precision: Delineating methods of sample size planning. Evaluation and the Health Professions, 26, 258–287.
Kelley, K., & Rausch, J. R. (2006). Sample size planning for the standardized mean difference: Accuracy in parameter estimation via narrow confidence intervals. Psychological Methods, 11, 363–385.
Kraemer, H., Gardner, C., Brooks, J. O., & Yesavage, J. A. (1998). Advantages of excluding underpowered studies in meta-analysis: Inclusionist versus exclusionist viewpoints. Psychological Methods, 3, 23–31.
Kraemer, H. C. (1991). To increase power in randomized clinical trials without increasing sample size. Psychopharmacology Bulletin, 27, 217–224.
Kraemer, H. C., & Thiemann, S. (1987). How many subjects? Beverly Hills, CA: Sage.
Kupper, L. L., & Hafner, K. B. (1989). How appropriate are popular sample size formulas? American Statistician, 43, 101–105.
Lipsey, M. W. (1990). Design sensitivity: Statistical power for experimental research. Newbury Park, CA: Sage.
Maxwell, S. E. (2000). Sample size and multiple regression. Psychological Methods, 5, 434–458.
Maxwell, S. E. (2004). The persistence of underpowered studies in psychological research: Causes, consequences, and remedies. Psychological Methods, 9, 147–163.
Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison perspective (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Meehl, P. E. (1967). Theory testing in psychology and in physics: A methodological paradox. Philosophy of Science, 34, 103–115.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834.
Mendoza, J. L., & Stafford, K. L. (2001). Confidence intervals, power calculations, and sample size estimation for the squared multiple correlation coefficient under the fixed and random regression models: A computer program and useful standard tables. Educational and Psychological Measurement, 61, 650–667.
Mogie, M. (2004). In support of null hypothesis significance testing. Proceedings of the Royal Society of London, Series B, Biology Letters, 271, 82–84.
Morrison, D. E., & Henkel, R. E. (1970). The significance test controversy: A reader. Chicago, IL: Aldine Publishing Company.
Murphy, K. R., & Myors, B. (1998). Statistical power analysis: A simple and general model for traditional and modern hypothesis tests. Mahwah, NJ: Erlbaum.
Neyman, J. (1937). Outline of a theory of statistical estimation based on the classical theory of probability. Philosophical Transactions of the Royal Society of London. Series A, Mathematical and Physical Sciences, 236, 333–380.
Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5, 241–301.
O’Brien, R., & Mueller, K. E. (1993). A unified approach to statistical power for t-tests to multivariate models. In L. Edwards (Ed.), Applied analysis of variance in behavioral sciences (pp. 297–344). New York, NY: Marcel Dekker.
Pedhazur, E. J. (1997). Multiple regression in behavioral research: Explanation and prediction (3rd ed.). New York, NY: Harcourt Brace College Publishers.
R Development Core Team. (2007). R version 2.5.0: A language and environment for statistical computing [computer software and manual], R Foundation for Statistical Computing.
Rencher, A. C. (2000). Linear models in statistics. New York, NY: John Wiley & Sons, Inc.
Rosenthal, R. (1993). Cumulative evidence. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 519–559). Hillsdale, NJ: Lawrence Erlbaum Associates.
Rossi, J. S. (1990). Statistical power of psychological research: What have we gained in 20 years? Journal of Consulting and Clinical Psychology, 58(5), 646–656.
Rozeboom, W. W. (1960). The fallacy of the null-hypothesis significance test. Psychological Bulletin, 57, 416–428.
Rozeboom, W. W. (1966). Foundations of the theory of prediction. Homewood, IL: The Dorsey Press.
Sampson, A. R. (1974). A tale of two regressions. Journal of the American Statistical Association, 69, 682–689.
Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1, 115–129.
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105, 309–316.
Serlin, R., & Lapsley, D. (1985). Rationality in psychological research: The good-enough principle. American Psychologist, 40, 73–83.
Serlin, R. C., & Lapsley, D. K. (1993). Rational appraisal of methodological research and the good-enough principle. In G. Keren & C. Lewis (Eds.), Methodological and quantitative issues in the analysis of psychological data (pp. 199–228). Mahwah, NJ: Lawrence Erlbaum Associates.
Smithson, M. (2001). Correct confidence intervals for various regression effect sizes and parameters: The importance of noncentral distributions in computing intervals. Educational and Psychological Measurement, 61, 605–632.
Smithson, M. (2003). Confidence intervals. Thousand Oaks, CA: Sage Publications.
Sokal, R. R., & Braumann, C. A. (1980). Significance tests for coefficients of variation and variability profiles. Systematic Zoology, 29, 50–66.
Stallings, W. M., & Gillmore, G. M. (1971). A note on ‘accuracy’ and ‘precision’. Journal of Educational Measurement, 8, 127–129.
Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9, 164–182.
Steiger, J. H., & Fouladi, R. T. (1992). R2: A computer program for interval estimation, power calculation, and hypothesis testing for the squared multiple correlation. Behavior Research Methods, Instruments, and Computers, 4, 581–582.
Steiger, J. H., & Fouladi, R. T. (1997). Noncentrality interval estimation and the evaluation of statistical methods. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 221–257). Mahwah, NJ: Lawrence Erlbaum Associates.
Stuart, A., & Ord, J. K. (1994). Kendall’s advanced theory of statistics: Distribution theory (6th ed.). New York, NY: John Wiley & Sons.
Stuart, A., Ord, J. K., & Arnold, S. (1999). Kendall’s advanced theory of statistics: Classical inference and the linear model (6th ed., Vol. 2A). New York, NY: Oxford University Press.
Thompson, B. (2002). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher, 31(3), 25–32.
Wainer, H. (1999). One cheer for null hypothesis significance testing. Psychological Methods, 4(2), 212–213.
12
Re-conceptualizing
Generalization: Old Issues
in a New Frame
Giampietro Gobo
for three reasons. First, because the use of probability samples and statistical inference in social research often proves problematic. Second, because there are numerous disciplines, in both the social and human sciences, whose theories are based exclusively on research conducted on only a few cases. Third, because, pace the methodological orthodoxy, a significant part of sociological knowledge is idiographic. My intention is therefore not to criticize sampling theory or its applications; rather, it is to remedy a situation where statistical inference is deemed the only acceptable method, and idiographic generalization as scientifically ill-founded. Finally, qualitative researchers do not need to throw away the baby of generalization with the bathwater of probability sampling, because we can have generalizations without probability.
THE PROBLEMATIC USE OF PROBABILITY SAMPLES IN SOCIAL RESEARCH
Several authors (among them Goode and Hatt, 1952; Chain, 1963; Galtung, 1967; Capecchi, 1972) have stressed that the application of statistical sampling theory in sociological contexts gives rise to various difficulties. This theory, in fact, requires the researcher to construct a probability sample (one, that is, where each subject’s likelihood of being selected is known and also every item has an equal chance of being selected), and the cases must be selected in rigorously random manner. But these two requirements are not easy to satisfy in social research, because their fulfillment encounters a series of obstacles, not all of which can be overcome.
There is no space to describe in depth the problems and limits of statistical sampling theory (see Gobo, 2004). I will briefly examine three limits only:
1 The difficulty of finding sampling frames (lists of population) for certain population sub-sets, because these frames are often not available. How, for example, can a random sample of the unemployed be extracted if the whole list of unemployed people is not available beforehand? It is true that many unemployed people are enrolled at job placement offices, but it is equally true that not all unemployed people are so enrolled. Consequently, the majority of studies on particular segments of the population cannot make use of population lists: consider studies on blue-collar workers, the unemployed, home-workers, artists, immigrants, housewives, pensioners, football supporters, members of political movements, charity workers, elderly people living alone, and so on.
2 The phenomenon of nonresponse. The concept of random selection is theoretically very simple and, thanks to the ideal-typical image of the box, quite clear to the general public. This clarity is misleading, however, because human beings differ from balls in a ballot box in two respects: they are not immediately accessible to the researcher, and they are free to decide not to answer. In fact, account must be taken of the gap (which varies according to the research project) between the initial sample (all the individuals about whom we want to collect information) and the final sample (the cases about which we have been able to obtain information); the two sets may correspond, but usually some of the objects in the first sample are not surveyed. As Groves and Lyberg (1988: 191) pointed out, nonresponse error threatens the characteristic which makes the survey unique among research methods: its statistical inference from sample to population. If the sample is at odds with the probability model, nothing can be said about its general representativeness; that is, about whether it truly reproduces all the characteristics of the population.
3 Representativeness and generalizability: two sides of the same coin? The social science textbooks usually describe generalizability as the natural outcome of a prior probabilistic procedure. In other words, the necessary condition for carrying out a statistical inference is previous use of a probability sample. It is forgotten, however, that probability/representativeness and generalizability are not two sides of the same coin. The former is a property of the sample, whilst the latter concerns the findings of research. Put otherwise: between construction of a sample and confirmation of a hypothesis there intervene a complex set of activities which pertain to at least seven different domains: (1) the trustworthiness of operational definitions and operational acts; (2) the reliability of the data collection instrument;
(3) the appropriateness of conceptualizations; (4) the accuracy of the researcher’s descriptions, categorizations, and/or measurements; (5) to be successful with observational (or field) relations; (6) the validity of the data; and (7) the validity of the interpretation. These activities, and their relative errors (called ‘measurement errors’ in the literature), may impair the connection between probability/representativeness and generalizability – a not infrequent occurrence in a complex activity like social research.
These drawbacks do not signify that probability sampling and statistical inference are instruments by their nature unsuited to social research. Rather, according to the research setting, they are instruments with certain practical disadvantages that can sometimes be remedied and sometimes cannot.
In light of these difficulties, probability sampling cannot be propounded as the only model suited to the generalization of findings. As Geertz (1973: 21) points out, it is not only statistical inference that enables the move from ‘local truths to general visions.’ Moreover, as we have seen, not all sociological phenomena can be studied with rigorous application of the principles of sampling theory, the consequence being that the adoption of other forms of generalization has been vital for social research: otherwise, an important part of sociological theory (that based on research conducted on a few cases or even on haphazard or convenience samples as in the cases of, for example, Gouldner, Dalton, Becker, Goffman, Garfinkel, Cicourel) would never have been produced.
GENERALIZATION AS SEEN BY QUALITATIVE METHODOLOGISTS
Qualitative researchers have taken up a variety of positions in reaction to the pronouncement that those who do not use probability samples cannot generalize. The most extreme of them have (paradoxically) on the one hand accepted the verdict but on the other dismissed sampling as ‘a mere positivist worry’ (Lincoln and Guba, 1979; Denzin, 1983). Some have aptly pointed out that ‘most social anthropological and a good deal of sociological theorizing has been founded upon case studies’ (Mitchell, 1983: 188) or has been the product of exclusively theoretical inquiry (without, that is, being grounded on systematic research). The more moderate have complied with the injunction of the statisticians but reconceptualized the problem by claiming that there are two types of generalization (which they have termed in various ways): enumerative (statistical) vs. analytic induction (Znaniecki, 1934: 236; Mitchell, 1983: 191); formalistic/scientific vs. naturalistic generalization (Stake, 1978: 6); distributive vs. theoretical generalization (Hammersley, 1992: 186ff; Williams, 2000: 215; Payne and Williams, 2005: 296–7).
The first type of generalization involves estimating the distribution of particular features within a finite population; the second, eminently theoretical, is concerned with the relations among the variables in any sample of the relevant kind (moreover, the population of relevant cases is potentially infinite). The latter is usually based on identifying causal or essential relations among particular categories, whose character is defined by those relations, so that it is inferred that all instances of those categories are involved in the specified type of relation.
Even though some qualitative researchers may privately agree with Znaniecki (1934: 236–7) that analytical induction is the true method of science and it is the superior method (because it discovers the causal relations of a phenomenon rather than only the probabilistic ones of co-occurrence), the idea that there exist two types of generalization represents acceptance of the statisticians’ diktat. It also represents acceptance of a ‘political’ division into areas of competence: a compromise already envisaged by some members of the Chicago School, like Burgess (1927), who maintained that statistics and case studies were mutually complementary² with their own criteria of excellence.
The distinction between the two types of generalization has been drawn with exemplary clarity by Alberoni and colleagues, who
but to no others. (1990: 191, bold in the original text)
In other words the aim is not to generalize to some finite population but to develop theoretical ideas that will have general validity.
More practical are authors who engage in ‘evaluation research’ (Cronbach, 1982; Pawson and Tilley, 1997). These ground their reasoning on the notion of the cumulability of knowledge: case study after case study, in the course of time in a particular sector of research, there accumulates a repertoire or inventory of the possible forms that a particular object of study may assume. As Pawson and Tilley (1997: 119–20) put it, in polemic with Guba and Lincoln, what can be transferred between studies are not ‘lumps of cases’ but ‘sets of ideas’ which enable understanding of general mechanisms. In other words, cumulability is the prelude for qualitative generalizability.
The final position, and perhaps the oldest of them, is represented by Znaniecki’s method of analytic induction. The purpose of analytic induction is to uncover causal relations through identification of the essential characteristics of the phenomenon studied. To this end, the method starts not with a hypothesis but with a limited set of cases from which an initial explanatory hypothesis is then derived. If the initial hypothesis fails to be confirmed by one case, it is revised. Additional cases of the same class of phenomena are then selected. If the hypothesis is not confirmed by these further cases, the conceptual definition of the phenomenon is revised. The process continues until the hypothesis is no longer refuted and further study tells the researcher nothing new (Znaniecki, 1934: 236ff). The inner logic of analytic induction derives from Mill’s ‘method of agreement’ and ‘method of difference.’
There are several variants of Znaniecki’s method of analytic induction. One of them is Mitchell’s (1983) critical case study approach. Analytic induction revisited has also been widely used in comparative studies based on a small number of cases ‘when little more than a handful of nations or organizations – sometimes even fewer – are compared with respect to the forces driving a societal outcome such as a political development or an organizational characteristic’ (Lieberson, 1992, reprinted in 2000: 208).
The unavoidableness of generalization
Sampling and generalizing are unavoidable practices because, even before being scientific, they are everyday life activities deeply rooted in thought, language, and practice (Gobo, 2004). With regard to thought, cognitive psychologists have demonstrated the tendency of people to generalize on the basis of a few observed characteristics or events, a process called the heuristic of representativeness by Kahneman and Tversky (1972) and Tversky and Kahneman (1974). With regard to the world of language, the same function is performed, as Becker has stated, by ‘synecdoche, a rhetorical figure in which we use a part for something to refer the listener or reader to the whole it belongs to’ (1998: 67). Finally, in the world of action, the seller shows a sample of cloth to the customer; in a paint shop the buyer skims through the catalogue of color shades in order to select a paint; the buyer tastes in order to choose a wine or a cheese; the teacher asks a student questions to assess his or her knowledge about the syllabus. In everyday life, social actors constantly sample and generalize. As Gomm, Hammersley and Foster point out, ‘we all engage in naturalistic generalizations routinely in the course of our life, and this may take the form of empirical generalization as well as of informal theoretical inference. Given this, there is no reason in principle why case study research should not provide the basis for empirical generalization’ (2000: 104). This is also because the unavoidability of generalization is epistemologically and reflexively founded. As Gomm, Hammersley and Foster acutely observe:
the very meaning of the word ‘case’ implies that what it refers to is a case [instance or example]
hypotheses from the sending originating context may be applicable in the receiving context. (Lincoln and Guba, 1979, reprinted 2000: 40)
However, transferability is not an inferential process performed by the researcher (who cannot know all the other contexts of research). Rather, it is a choice made by the reader, who on the basis of argumentative logic and a thick description (of the case study) produced by the researcher, may decide (on his/her own responsibility – see Gomm, Hammersley and Foster, 2000: 102) to transfer this knowledge to other situations that she/he deems similar (Lincoln and Guba, 1979, reprinted 2000: 40). The reader, basing this on the persuasive power of the arguments used by the researcher, decides on the similarity between the (sending) context of the case studied and the (receiving) contexts to which the reader him/herself intends to apply the results (Guba and Lincoln, 1982: 246). To conclude, these authors are convinced that ‘generalizations are impossible since phenomena are neither time- nor context-free’; however, ‘some transferability of these hypotheses may be possible from situation to situation, depending on the degree of temporal and contextual similarity’ (Guba and Lincoln, 1982: 238).
A second, more moderate, approach has been proposed by Stake (1978, 1994), who argues that the purpose of case studies is not so much to produce general conclusions as to describe and analyze the principal features of the phenomenon studied. If these features concern an emblematic case of political, social, or economic importance (for example, the decision-making procedures of a large institution like the US Department of Defense), the ‘intrinsic case study’ will per se produce results of indubitable intrinsic relevance⁵, even though they cannot be generalized in accordance with the canons of scientific induction:
naturalistic generalization, arrived at by recognizing the similarities of objects and issues in and out of context and by sensing the natural covariations of happenings. To generalize this way is to be both intuitive and empirical, and not idiotic. Naturalistic generalizations develop within a person as a product of experience. They derive from the tacit knowledge of how things are, why they are, how people feel about them, and how these things are likely to be later or in other places with which this person is familiar. They seldom take the form of predictions but they lead regularly to expectations (…) These generalizations may become verbalized, passing of course from tacit knowledge to propositional; but they have not yet passed the empirical and logical tests that characterize formal (scholarly, scientific) generalizations. (Stake, 1978: 6)
A third position, which is contiguous to the intrinsic case study, has been put forward by Connolly (1998). It starts from the distinction between extensive vs. intensive studies. The aim of the former (like case studies) is to identify statistically significant and therefore generalizable causal relations; the aim of the latter is to reconstruct in detail the mechanisms that connect cause and effect. Like Stake, Connolly relieves the case study of responsibility for formal generalization, but he gives it a task complementary to such generalization, explaining (via the mechanisms) correlations whose statistical significance has already been documented by other studies.
These three positions have a common basis consisting in the concept of ‘theoretical sampling’ proposed by Glaser and Strauss (1967), Schatzman and Strauss (1973) and Strauss (1987): when we do not possess complete information about the population, cases are selected according to their status on one or more properties identified as the subject matter for research. As Mason writes, ‘theoretical sampling is concerned with constructing a sample which is meaningful theoretically because it builds in certain characteristics or criteria which help to develop and test your theory and explanation’ (1996: 94). And Strauss and Corbin are very explicit on the concept of generalization:
in terms of making generalization to a larger population, we are not attempting to generalize as such but to specify […] the condition under which our phenomena exist, the action/interaction that pertains to them, and the associated outcomes or consequences. This means that our theoretical formulation applies to these situations or circumstances
tackled the problems of representativeness and generalizability. In certain respects, these are disciplines akin to qualitative research, for they work exclusively on few cases and have learnt to make a virtue out of necessity. As Becker writes:
Archeologists and paleontologists have this problem to solve when they uncover the remnants of a now-vanished society. They find some bones, but not a whole skeleton; they find some cooking equipment, but not the whole kitchen; they find some garbage, but not the stuff of which the garbage is the remains. They know that they are lucky to have found the little they have, because the world is not organized to make life easy for archeologists. So they don’t complain about having lousy data. (1998: 70–1)
For reasons of space, it is not possible here to provide an exhaustive account of how these disciplines have dealt with the above issues. But by way of example, consider the following study, which is one of the dozens published on the subject. It appeared in the journal Nature on January 23, 2003.
The scientist Xing Xu and colleagues (2003) of the Institute of Vertebrate Palaeontology, Beijing, had found six fossils in the province of Liaoning, North China. The impression left in the rock was of two pairs of wings and a long feathered tail of what appeared to be a Microraptor gui: a dinosaur less than one meter in length which lived in that region of China around 130 million years ago. According to its discoverers, the fossil was the missing link between terricolous dinosaurs and modern birds, the intermediate evolutionary stage for which scientists had long been searching. The discovery has fuelled the debate among paleontologists on the origin of flight. Whilst the close kinship between birds and dinosaurs is accepted by almost all scientists, there is much disagreement on the evolutionary stages that led to winged flight. The predominant theory is that wings began to develop, not to enable flight but to help the ancestors of birds to run faster. The small dinosaur discovered in China instead appeared to support the opposite hypothesis, namely that the direct ascendants of birds were animals which climbed trees and used wings to glide back to earth. This was the theory propounded, for example, by the American naturalist, William Beebe, who as early as 1915 had predicted the existence of feathered dinosaurs exactly like Microraptor gui. However, the British journal urged caution when evaluating the importance of the discovery: the Microraptor could also be an evolutionary blind alley which had not left descendants.
There are therefore numerous disciplines which work on a limited number of cases, and do so consciously; in fact, there is animated discussion within them on sampling and generalizability. Moreover, this procedure is adopted by other disciplines as well: for instance, biology, astrophysics, history, genetics, anthropology, linguistics, cognitive science, psychology (whose theories are largely based on experiments, and therefore on research conducted on non-probabilistic samples consisting of psychology students). Why, we may ask, is this procedure acceptable for monkeys, rocks, and cells but not for human beings? Why do the majority of disciplines work with/on non-probability samples (regarded as being just as representative of their relative populations and therefore as producing generalizable results) while in sociology this is not possible? Why can a geneticist like Luca Cavalli Sforza of Stanford University argue that the evolution of language has had a direct impact on our genetic heritage, while in sociology a similar claim would require very different methodological support? The majority of these disciplines start from the assumption that their objects of study possess quasi-invariant states on the properties observed: that is, their states with respect to a property (e.g. size of the brain or the physique of a hominid) vary little and slowly among members of the class. Consequently, these disciplines are unconcerned about their use of only a handful of cases to draw inferences and generalizations about thousands of people, animals, plants, and other objects. Moreover, science studies the individual object/phenomenon not
in itself but as a member of a broader class of objects/phenomena with particular characteristics/properties.
FOUR PROPOSALS FOR AN IDIOGRAPHIC SAMPLING THEORY
The above survey of disciplines midway between the natural sciences and the social sciences yields a number of suggestions for formulation of an idiographic sampling theory. They can be summarized in the following four steps:
(a) abandon the (statistical) principle of probability;
(b) recover the (statistical) principle of variance;
(c) pay renewed attention to the units of analysis;
(d) identify social regularities.
Representativeness without probability
The use of probability samples does not automatically signify the use of representative samples. Random and representative are terms neither synonymous nor necessarily interrelated. ‘Randomness’ concerns a particular procedure used to select the cases to include in a sample, while ‘representativeness’ concerns the outcome of the selection. One may question whether the former is the obligatory path for the latter. Nor do representativeness and probability form a natural pair, since it may be possible to construct a representative sample using other procedures. Qualitative research (or at least a part of it) does not relinquish the aim of working with representative samples; it only rejects the obligatory nexus between probabilistic and representative (on the one hand), or between randomness and representativeness (on the other).
It is therefore not necessarily the case that a researcher must choose between an (approximately) random sample or an entirely subjective one – or between a sample which is (even only) partially probabilistic and one about whose representativeness absolutely nothing can be said. Between the rationalism and the postmodern nihilism underlying these two positions, one may attempt to address the problem in practical terms, doing so by examining the nature of the units of analysis considered, rather than adhering to standard procedural rules. As stressed by Rositi (1993: 198), we may reasonably doubt the generalizability of findings from
studies of 1,000–2,000 cases which claim to sample the whole population. We have to wonder if we should prefer such samples with such aims […]. Studies with samples of 100–200 conversational interviews, structured to ‘describe’ variables rather than a population are definitely more suitable for a new model of studying society. (1993: 198)
Variance: From (general) principle to (local) practice
The second step is to recover the (statistical) principle of variance, which has received less attention than the probability principle. Contrary to the latter’s standardizing intent and automatist inclination (which are among the reasons for its success), variance is a criterion which requires the researcher to reason, to conduct contextual analysis, and to take local decisions. Under the variance principle,
in order to determine the sample size, the statistician must first know the range of variance that the researcher intends to measure (at least in sufficiently close terms) because it is likely that, if the range of variance of variable X is high, n [the number of individuals to interview] will be high, whereas if the range of variance is restricted (for example to only two modalities), n may be very restricted as well. (Capecchi, 1972: 50)
Hence, it is more likely that a sample will be a miniature of the population if that population is tendentially homogeneous; and it is less likely to be so if the reference population is tendentially heterogeneous. Consequently, if the variance is high, the researcher will require a large number of cases (in order to include every dimension of the phenomenon studied in his/her sample). If, instead, the variance is low, the researcher will presumably
need only a few cases, and in some instances only one. In other words,
it is important to recognize that the greater the heterogeneity of a population the more problematic are empirical generalizations based on a single case, or a handful of cases. If we could reasonably assume that the population were composed of more or less identical units, then there would be no problem. (Gomm, Hammersley and Foster, 2000: 104)
As Payne and Williams (2005: 306–7) also point out:
the breadth of generalization can be extensive or narrow, depending on the nature of the phenomenon under study and our assumptions about the wider social world (…) [hence] the generalization may claim high or lower levels of precision of estimates (…) [and it] will be conditional upon the ontological status of the phenomena in question. We can say more, or make stronger claims about some things than others. A taxonomy of phenomena might look like this: 1° physical objects and their social properties; 2° social structures; 3° cultural features and artefacts; 4° symbols; 5° group relationships; 6° dyadic relationship; 7° psychological dispositions/behaviour (…) This outline taxonomy demonstrates that generalizations depend on what levels of social phenomena are being studied.
The conversation analyst Harvey Sacks (1992, vol. 1: 485, quoted in Silverman, 2000: 109) reminds us of the anthropologist and linguist Benjamin Lee Whorf, who was able to reconstruct Navajo grammar by extensively interviewing only one native Indian speaker. Grammars usually have low variance. However, had Whorf wanted to study how the Navajo educated their children, entertained themselves, etc., he would (perhaps) have found greater variance in the phenomenon and would have needed more cases. On this logic, the formal criteria that guide sampling are more informed by and embedded in sociological (rather than statistical) reasoning based on contingent reflection about the dimensions specific to the phenomenon investigated and the knowledge objectives of the research.
Moreover, as said, an authoritative part of sociological theory and a large part of anthropological theory are based on the case study: the quintessence of non-probability sampling. The research studies by Alvin G. Gouldner and Melville Dalton belong to this category. For example, Gouldner (1954) studied a gypsum mine situated close to the university where he taught (a convenience sample, therefore⁶). In his methodological appendix, Gouldner reported that his team conducted 174 interviews – and therefore on almost all the population (precisely 77 percent). One hundred and thirty-two of these 174 interviews were conducted with a ‘representative sample’ of the blue-collar workers at the company, for which purpose Gouldner used quota sampling stratified by age, rank, and tasks. He then constructed another representative sample of 92 blue-collar workers, to whom a questionnaire was administered.
Dalton (1959), who was a company manager at that time, conducted covert observation at Milo and Fruhuling, the fictitious names of two American companies for which he worked as a consultant (again a convenience sample, therefore). The ethnologist De Martino (1961) observed 21 people suffering from tarantism disease; Goffman (1961) stayed for several months at a psychiatric hospital; the anthropologist Geertz (1972) attended 57 cock fights; Sacks and colleagues described the mechanics of conversational interaction by analyzing a few telephone calls; the anthropologist Crapanzano (1980) studied Moroccan social relations through the experience of Tuhami, a tilemaker. The anthropologist Griaule (1948) reconstructed the cosmology of the Dogon, a tribe in Mali, by questioning only a small group of informants; Bourdieu’s book (1993) on professions was based on 50 interviews with policewomen, temporary workers, attorneys, blue-collar workers, civil servants, and unemployed workers.
Why, one may ask, have such circumscribed studies given rise to such wide-ranging theories? In other words, why have they been generalized to other contexts? I shall answer these questions later. For the moment I would stress (and avoid) the danger of the nihilistic or postmodern drift implied
by this approach, where any sample may serve and it is not worth bothering too much about it.
Instead, at a certain point of the inquiry, giving clear definition to the units of analysis
(an operation performed before the cases are selected, and therefore before the sample is
constructed) is of extreme importance if the research is not to be botched and empirically
inconsistent. On analyzing a series of Finnish studies on ‘artists,’ Mitchell and Karttunen
(1991) found that the results differed according to the definition given to ‘artist’ by the
researchers, a definition which then guided construction of the sample. In some studies, the
category ‘artist’ included (i) subjects who defined themselves as artists; (ii) those
permanently engaged in the production of works of art; (iii) those recognized as artists by
society at large; and (iv) those recognized as such by associations of artists. The obvious
consequence was that it was subsequently impossible to compare the results of these studies.
Units of analysis
The standard practice in sociology and political science is to choose clearly defined and
easily detectable individual or collective units: persons, households, groups, associations,
movements, parties, institutions, organizations, regions, or states. The consistency of these
collective subjects is vague. In practice, members of these groups are interviewed
individually: the head of the family, the human resources manager, the statistics department
manager, and so on. This means that the sampling unit (e.g. the family) is different from the
observational unit (i.e. the single respondent as a member of the family). Only a focus group
can (at least to some extent) preserve the integrity of the collective subject. Instead,
choosing individuals implies an atomistic rather than organic conception of society (Burgess,
1927), whose structural elements are taken for granted or reckoned to be mirrored in the
individual (Galtung, 1967: 37), while the sociological tradition that gives priority to
relations over individuals is neglected. As a consequence, the following more dynamic units
are neglected as well:
• beliefs, attitudes, stereotypes, opinions;
• emotions, motivations;
• behaviors, social relations, meetings, interactions, ceremonies, rituals, networks;
• cultural products (such as pictures, paintings, movies, theatre plays, television programs);
• rules and social conventions;
• documents and texts (historical, literary, journalistic);
• situations and events (wars, elections).
Hence, ‘a reliable sampling model that recognizes interaction must be adopted [so that
sampling is conducted on] interactive units (such as social relationships, encounters,
organizations)’ (Denzin, 1971: 269).
The researcher should focus his/her investigation on these kinds of units, not only because
social processes are more easily detectable and observable, but also because these units
allow more direct and deeper analysis of the characteristics observed.
Consider the following illustrative example. Assume that we want to study work practices at call
centers, which are technology-intensive workplaces. In Italy, it has been calculated that there were
1350 call centers in 2002. In order to construct a probability and representative sample, we may
proceed in two ways: randomly extract a certain number of cases from the population list (which is
possible because a complete list can be obtained from the Chambers of Commerce), or construct a
proportional stratified sample. In this latter case, we must first classify call centers according to the
properties that interest us:
• the ownership of the organization, so that we have private call centers (e.g. Vodafone), public ones (e.g.
the 911 emergency helpline), and non-profit ones;
• the ‘vocation,’ so that we have call centers that are ‘generalist’ (in the sense that they provide a variety
of services) or ‘vertical’ (i.e. dedicated to only one service, e.g. credit recovery);
• membership or otherwise of the organization for which the service is provided, so that we have call
centers ‘internal’ to the company, or ones to which the work is outsourced;
• the classic variables such as size of the organization (small, medium, large), geographical location
(north-west, north-east, centre, south, islands), etc.;
• the type of service furnished.
Note that many of these properties are mutually exclusive, so that the sampling decision must be
carefully pondered. In these cases, the usual practice is for the researcher to base the probability
sampling on the first property. However, this may be sociologically inadequate if the researcher’s
interest is in work practices, because these cannot be accessed via the variable ‘ownership.’ For some
authors (e.g. Capecchi, 1972), representativeness does not seem to transfer from one property to
another. Put otherwise: it is not the variance of the ownership of call centers that interests us
here, but the variance of work practices. It might be more satisfactory to choose the fifth
property listed above (the type of service furnished).
Experience of this sector of inquiry (but also the literature, previous research, interviews with
experts or operators in the sector, etc.) shows that call centers mainly provide the following services:
counseling, credit recovery, marketing, interviewing, and advertising. Constructing a probability
sample on this classification is practically impossible because a population list for each of these
activities does not exist. The only alternative is to use the method outlined in the previous section.
Again on the basis of experience, we note that only the first of these five activities has substantial
variance, while the four latter seem to have low variance. In fact, the counseling provided by call
centers is multiform: it consists of information, technical assistance, psychological help or support,
medical advice, or therapy. Consequently, in order to preserve the representativeness of the sample,
we must sample several cases for the specific work practice of counseling. If we have insufficient
resources to collect the necessary number of cases, we can restrict our research to only some
activities. Other studies in the future will account for the rest.
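The proportional stratified option mentioned in the example above can be made concrete with a
short sketch. It is only an illustration under stated assumptions: the total of 1350 call
centers comes from the text, but the ownership breakdown, the sample size of 135, and the
function names are hypothetical, and the largest-remainder rounding is one common way of
making the allocation sum to the desired sample size, not a procedure described in the
chapter.

```python
import random


def proportional_allocation(stratum_sizes, n):
    """Share n sample slots among strata in proportion to their size.

    Largest-remainder rounding is used so the slots add up to exactly n.
    """
    total = sum(stratum_sizes.values())
    quotas = {s: n * size / total for s, size in stratum_sizes.items()}
    alloc = {s: int(q) for s, q in quotas.items()}
    leftover = n - sum(alloc.values())
    # Give the remaining slots to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


def stratified_sample(population, stratum_of, n, seed=0):
    """Draw a proportional stratified random sample of n units."""
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    alloc = proportional_allocation({s: len(u) for s, u in strata.items()}, n)
    return {s: rng.sample(units, alloc[s]) for s, units in strata.items()}


# Hypothetical population list: 1350 call centers (the total given in the text)
# with an invented breakdown by ownership.
HYPOTHETICAL_COUNTS = {"private": 1000, "public": 250, "non-profit": 100}
population = [
    (f"{kind}-{i}", kind)
    for kind, count in HYPOTHETICAL_COUNTS.items()
    for i in range(count)
]

sample = stratified_sample(population, stratum_of=lambda unit: unit[1], n=135)
sizes = {stratum: len(units) for stratum, units in sample.items()}
print(sizes)
```

The sketch stratifies on ownership because that is the only property for which the text says a
population list is available; as the example argues, such a sample is representative of
ownership, not necessarily of work practices.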
It is evident that representativeness is not always possessed by the sample when research
begins. It is a resource also acquired ex post, progressively and iteratively, research
project after research project, with the gradual accumulation of expertise. This definition
of representativeness seems somehow to tie this property to the relation between the results
obtained by an individual research project and the experience of the researcher who conducts
it.
In search of social regularities
I now turn to the final aspect of the entire question. There are three broad criteria which
serve to orient the construction of a non-probability sample; and to each of them corresponds
a particular form of reasoning alternative to inductive or statistical inference: deductive
inference, comparative inference, and emblematic case.
The three criteria impose different cognitive objectives, and they are used according to the
type of generalization that the researcher wants to make. The first two criteria are in some
way opposed to each other: comparative inference maximizes the probability of extracting odd
cases; deductive inference selects only odd (deviant) cases. Theoretical inference instead
concentrates on emblematic cases, focusing on social similarities.
Deductive inference
The first criterion consists of the choice of a critical or deviant case which can be used
(à la Popper) to prove the refutability of an accredited or standard theory. An outstanding
example of its application is provided by Goldthorpe et al.’s study (1968) of workers in the
town of Luton. The distinctive feature of this inferential process is that it starts from a
theory of which it intends to prove the implausibility: in this case the embourgeoisement of
the working class. The theory is tested against a case comprising the largest number (and the
greatest intensity) of its founding properties or requirements of this theory. If, in these
optimal conditions, the consequences foreseen by the theory do not ensue, it is
extremely unlikely that the theory will work in all those empirical cases where those
requirements are more weakly present. Hence the theory is falsified, and its inadequacy can
be legitimately generalized. When the critical case study procedure is used, the cases are
selected according to their explanatory power, rather than according to the criteria of
probability theory or their typicality (Mitchell, 1983: 207, 209). Moreover, the legitimacy
of the generalization (of the scant explanatory capacity of the theory just falsified)
depends not only on the cogency of the rhetorical argument but also on the strength of the
connections established between theory and observations.
There are many other important studies (which follow in a very broad sense the Popperian
approach) which have focused on deviant cases in order to understand standard behavior:
Goffman (1961) on ceremonies and rituals in a psychiatric clinic; Cicourel and Boise (1972)
on the interpersonal communications of deaf children; Garfinkel (1967) on achievement of sex
status in an ‘intersexed’ person; Pollner and Winkler (1985) on interactions in a family with
a mentally retarded child; and many others.
This criterion can also be used to explore subcultures or emergent or avant-garde phenomena
which may become dominant or significant in the future, although at present they are still
marginal: see Festinger et al. (1956) on millennial groups after their predicted date for the
end of the world had passed; Becker (1953) on marijuana smokers; Hebdige (1979) on style
groups like mods, punks, skinheads; Fielding (1981) on right-wing political movements.
The deviant case can also be used to prove the refutability and falsifiability of a
well-known and received theory, as in Rosenhan’s (1973) study on the medical-organizational
origin of psychiatric illness, or the already-cited study by Goldthorpe et al. (1968) on
blue-collar workers in the town of Luton. This criterion (which is widely applied in biology,
astrophysics, history, genetics, anthropology, linguistics, paleontology, archaeology,
ethology, geology) does not determine the extent to which a phenomenon is widespread in the
population. It only directs the scientific community’s attention to the phenomenon’s
existence and the need to revise the dominant theory. The generalization to the population
comes about by default: that is by virtue of the non-occurrence of the event foreseen by the
theory under examination.
Obviously, the generalization must be carefully thought through. Otherwise, the danger arises
of lapsing into the determinism to which Popper’s falsificationism is susceptible. As
Lieberson (1992: 212) emphasizes:
it is very difficult to reject a major theory because it appears not to operate in some
specific setting. One is wary of concluding that Max Weber was wrong because of a single
deviation in some inadequately understood time or place. In the same fashion, we would view
an accident caused by a sober driver as failing to disprove the notion that drinking causes
automobile accidents.
Comparative inference
The second criterion is used to make generalizations similar to statistical inferences, but
without employing probability criteria. This can be done by identifying cases within extreme
situations as well as certain characteristics, or cases within a wide range of situations in
order to maximize variation, that is, to have all the possible situations in order to capture
the heterogeneity of a population.
We can choose two elementary schools where, from press reports, previous studies, interviews
or personal experiences, we know we can find two extreme situations: in the first school
there are severe difficulties of integration between natives and immigrants, while in the
second there are virtually none. We can also pick three schools: the first with severe
integration difficulties; the second with average difficulties; and the third with rare ones.
In the 1930s and 1940s, the American sociologist W. Lloyd Warner (1898–1970) and his team of
colleagues and students carried out studies on various communities in the United States. When
Warner set about choosing the samples, he decided to select communities whose social
structures mirrored important features of American society. He chose four
communities (given assumed names): a city in Massachusetts (Yankee City) ruled by traditions
on which he wrote five volumes; a lonely county of Mississippi (Deep South, 1941); a Chicago
black district (Bronzetown, 1945); and a city in the Midwest (Jonesville, 1949).
In comparative inferences, the cases are selected by making careful comparisons: first by
seeking to find cases which represent all the forms of heterogeneity in a target population,
and then by checking whether they are sufficiently homogeneous with the type that one wants
to represent. In this difficult but important analysis,
it is necessary to compare the characteristics of the case(s) being studied with available
information about the population to which generalization is intended (…) we are suggesting
that where information about the larger population (or about overlapping populations) is
available, it should be used. If it is not available, then the potential risks involved in
generalization still need to be noted, preferably via specification of likely types of
heterogeneity that could render the findings unrepresentative. (Gomm, Hammersley and Foster,
2000: 105–106)
We are therefore very distant from the concepts of naturalistic generalization and
transferability, which are unsatisfactory in various respects, for they ‘do not provide a
sound basis for the design, or justification, of case study research’ (Gomm, Hammersley and
Foster, 2000: 102). They assign the reader a function which should also be performed by the
researcher (assuming responsibility for affirming the generalizability of the study’s
findings). They therefore relieve the researcher of responsibility for the careful selection
of cases on the basis of the variance principle, and not solely on the basis of the
theoretical significance of theoretical sampling and of all research on variables (rather
than cases). As Schofield (1990) notes, all too often cases seem to be chosen for reasons of
convenience and are therefore atypical in various respects.
The emblematic case
If we bear the variance principle in mind, there emerges a third major criterion for the
construction of a sample: the typical or emblematic case.
Gouldner’s case studies (1954) on bureaucratization in medium-sized firms, or that by Cicourel
(1968) on the relational construction of the figure of the juvenile delinquent, have been
considered amply generalizable (by both researchers and readers), probably because they were
typical cases and consequently grasped structural aspects of the social action in the
organizations studied. Nor should we forget that the question of generalizability is closely
tied to the phenomenon being researched, according to the degree of variance in its states.
This means that it is possible to find cases which on their own can represent a significant
feature of a phenomenon. Generalizability thus conceived concerns more general structures and
is detached from individual social practices, of which they are only an instance. In other
words, the scholar does not generalize the individual case or event, which as Weber stressed
is unrepeatable, but the key structural features of which it is made up, and which are to be
found in other cases or events belonging to the same species or class. As Becker has recently
pointed out:
in every city there is a body of social practices – forms of marriage, or work, or
habitation – which don’t change much, even though the people who perform them are continually
replaced through the ordinary demographic process of birth, death, immigration, and
emigration. (2000: 6)
On this view, the question of generalizability assumes a different significance: for example
in the conclusions to his study on the relationship between a psychotherapist and a patient
suffering from AIDS, Peräkylä writes:
The results were not generalizable as descriptions of what other counselors or other
professionals do with their clients; but they were generalizable as descriptions of what any
counselor or other professional, with his or her clients, can do, given that he or she has
the same array of interactional competencies as the participants of the AIDS counseling
session have. (1997: 216, quoted in Silverman, 2000: 109)
Something similar happens in film and radio productions with noise sampling. The squeak of the
door (which gives us the shivers when we watch a thriller or a horror film) does not represent
all squeaks of doors, but we associate it with them. We do not think about the differences
between that squeak and the one made by our front door; we notice the similarities only. These
are two different ways of thinking, and most social sciences seek to find patterns of this
kind.
While the verbal expressions of an interactive exchange may vary, exchange based on the
question-answer pattern features a formal trans-institutional (though not universal)
structure. While laying a page of a newspaper on the floor and declaring one’s sovereignty
over it (Goffman, 1961) is a behavior observed in one psychiatric clinic only, the need to
have a private space and control over a territory has been reported many times, albeit in
different forms.
INTERACTIVE, PROGRESSIVE, AND ITERATIVE SAMPLING: SOME TIPS
Having outlined the theoretical premises of an idiographic sampling theory, I shall now
describe its procedural aspects. However, there is no precise logical itinerary to set out,
because methodological principles and rules do not have to stand on their own – as they are
instead required to do in statistical sampling theory – in that they have only a weak
relation to practice. It is instead necessary to approach the entire question of sampling
sequentially, and it would be misleading to plan the whole strategy beforehand. In order to
achieve representativeness, the sampling plan must be set in dialogue with field incidents,
contingencies, and discoveries. This is what I mean by ‘interactive, progressive, and
iterative sampling.’ An excellent instance of this procedure ‘is given in Glaser and
Strauss’s (1964, 1968) studies on dying in the hospital, where hypotheses were developed hand
in hand with data collection’ (Denzin, 1971: 269). Another example of changing or adding to
the sampling plan on the basis of something the researcher has learnt in the field is
provided by Becker:
Blanche Geer and I were studying college students. At a certain point, we became interested
in student ‘leaders,’ students who were heads of major organizations at the university (there
were several hundred of them). We wanted to know how they became leaders and how they
exercised their powers. So we made a list of the major organizations (which we could do
because we had been there for a year and knew what those were, which we would not have known
when we began) and interviewed twenty each of men and women student leaders. And got a great
result – it turned out that the men got their positions through enterprise and hustling,
while the women were typically appointed by someone from the university! (Howard Becker,
13/7/2002, personal communication)
Consistency must be given to the sampling reasoning, but not by mere application of
procedural steps. The reasoning could be as follows.
1 The researcher usually starts from his/her research questions. Melvin Dalton’s were:
Why did grievers and managers form cross-cliques? Why were staff personnel ambivalent toward
line officers? Why was there disruptive conflict between Maintenance and Operation? If people
were awarded posts because of specific fitness, why the disparity between their given and
exercised influence? Why among executives on the same formal level, were some distressed and
some not? And why were there such sharp differences in viewpoint and moral concern about
given events? What was the meaning of double talk about success as dependent on knowing
people rather than on possessing administrative skills? Why and how were ‘control’ staffs and
official guardians variously compromised? What was behind the contradictory policy and
practices associated with the use of company materials and services? Thus the guiding
question embracing all others was: what orders the schism and ties between official and
unofficial action? (1959: 274)
Research questions comprise the concepts and categories (behaviors, attitudes, and so on)
that the researcher intends to study.
2 The researcher conducts primary (or ‘provisional’ and ‘open’7: Strauss and Corbin, 1990:
193) sampling in order to collect cases in accordance
with the concepts. As Payne and Williams (2005: 295) suggest, ‘research design should plan
for anticipated generalizations, and that generalization should be more explicitly formulated
within a context of supporting evidence.’
3 Because not every concept can be directly studied, when the researcher constructs the
provisional sample, s/he considers the following aspects:
(a) specificity (focusing on specific social activities with distinctive features, like
rituals or ceremonies);
(b) the field’s degree of openness (open or closed places);
(c) intrusiveness (the endeavor to reduce the researcher’s visibility);
(d) institutional accessibility (free-entry versus limited-entry situations within the
organization);
(e) significance (frequent and high organizational significance of social activities).
4 It is advisable to sample types of actions or events: ‘not, then, men and their moments.
Rather moments and their men’ (Goffman, 1967: 3), ‘not only people but moments of lived life’
(Converse and Schuman, 1974: 1), ‘incidents and not persons per se!’ (Strauss and Corbin,
1990: 177), in contrast with the common practice of sampling bodies, and of seeking
information from these bodies about behaviors and events that are never observed directly
(Cicourel, 1996). There are two reasons for this important recommendation: first, it serves
to prevent the survey sampling mistake concerning the transferability of ideas about
representativeness; second, the same person may be engaged in overlapping activities. For
example, Dalton (1959), when studying power struggles in companies, found five ‘types of
cliques:’ vertical (symbiotic and parasitic), horizontal (defensive and aggressive), and
random. If we sample individuals, we find that they belong to more than one clique according
to the situation, intention, and so on. If we consider activities, everything becomes
simpler.
5 To date, four main types of sampling have been developed in social research: purposive,
quota, emblematic, and snowball. When cases are selected, attention should be paid to the
variance of concept, so that different voices or cases can be included in the sample.
6 As the research proceeds, the researcher will refine his/her ideas, categories and
concepts, or come up with new ones. The important thing is to make connections among them,
thus formulating working hypotheses. Even though not every hypothesis is testable (indeed the
most interesting ones often are not), if the reader is to be persuaded, they must be
formulated in a testable way.
7 When the researcher has formulated hypotheses, s/he restarts sampling in order to collect
cases systematically relating to each hypothesis, and seeking to make his/her analysis
consistent. Strauss and Corbin call this second sampling ‘relational and variational: is
associated with axial coding. It aims to maximize the finding of differences at the
dimensional level’ (1990: 176). They depict the research process as funnel-shaped: through
three increasingly focused steps (open, axial, and selective) the researcher clarifies
his/her statements because ‘consistency here means gathering data systematically on each
category’ (Strauss and Corbin, 1990: 178). When the researcher finds an interesting aspect,
she/he must always check whether it occurs in other samples.
8 Generalization must be ensured ‘across and within cases (…) [because] the danger of error
in drawing general conclusions from a small number of cases must not be underestimated’
(Gomm, Hammersley and Foster, 2000: 98). This concept has been sometimes rubricated as
‘internal generalization,’ and it implies different strategies which take account of diverse
dimensions: time, sites, days, and people. The researcher should collect cases of behavior
recurring at different moments of time. Because the researcher cannot observe the case-study
population twenty-four hours a day, s/he must take a decision on when and where s/he will
observe the population (Schatzman and Strauss, 1973: 39–41; Corsaro, 1985: 28–32).
Unfortunately,
case study researchers rarely make clear what they take to be the temporal boundaries of the
cases they have studied (…) it is not unusual for case studies of schools to focus on one
year-group or cohort of students and to assume that the experience of these students is
representative of other cohorts, past and future. (Gomm, Hammersley and Foster, 2000: 109)
Social practices always occur in certain places and at certain times of the day. Only if the
researcher knows all the rituals of the organization observed can s/he draw a representative
sample.
A classic illustration is provided by Berlak et al.’s study of progressive primary school
practice in Britain in the 1970s (Berlak and Berlak, 1981;
Berlak et al., 1975). They argued that previous American accounts had been inaccurate because
observation had been brief and had tended to take place in the middle of the week, not on
Monday or Friday. On the basis of these observations, the inference had been drawn that in
progressive classrooms children simply chose what they wanted to do and got on with it. As
Berlak et al. document, however, what typically happened was that the teachers set out the
week’s work on Mondays, and on Fridays they checked that it had been completed
satisfactorily. Thus, earlier studies were based on false temporal generalizations within
cases they investigated. (Gomm, Hammersley and Foster, 2000: 109–110)
Qualitative researchers do not seek to know the distribution of such behaviors (how many
times); they only seek to know whether they are recurrent and significant in the organization
under study. In addition, ‘our concern is with representativeness of concepts’ (Strauss and
Corbin, 1990: 190). And finally, in regard to people and sites,
there is also likely to be variation in the behavior of both teachers and pupils across
different contexts within a school. While most contact between members of the two groups
probably occur in classrooms, they also meet one another in other places as well: in assembly
halls, dining rooms, corridors, on game fields, and so on (…) Teacher-pupil relationships are
likely to vary across mathematics classrooms, drama studios and science laboratories, for
example. (Gomm, Hammersley and Foster, 2000: 111)
9 The researcher can sample new incidents or s/he can review incidents already collected:
‘Theoretical sampling is cumulative. This is because concepts and their relationships also
accumulate through the interplay of data collection and analysis […] until theoretical
saturation of each category is reached’ (Strauss and Corbin, 1990: 178, 188).
10 This interplay between sampling and hypothesis testing is needed because
(a) representative samples are not predicted in advance but found, constructed, and
discovered gradually in the field;
(b) it reflects the researcher’s experience, previous studies, and the literature on the
topic. In other words, the researcher will come to know the variance of a phenomenon
cumulatively, study by study. As Gomm, Hammersley and Foster (2000: 107) acknowledge:
it is possible for subsequent investigations to build on earlier ones by providing additional
cases, so as to construct a sample over time that would allow effective generalization. At
the present, this kind of cumulation is unusual (…) the cases are not usually selected in
such a way as to complement previous work;
(c) representative samples are used to justify the researcher’s statements.
It is therefore apparent that, although on the one hand ‘generalization is not an issue that
can be dismissed as irrelevant by case study researchers’ (Gomm, Hammersley and Foster, 2000:
111), on the other it is not the impossible undertaking that survey researchers have always
mocked. Finally, whilst probability sampling has a substantive aim – to construct a sample in
order to extend the findings to the population – interactive sampling has a further task: to
reflect, through its recursiveness, on the plausibility of generalizations.
CONCLUSION
Statistical inference (survey) and theoretical inference (experiment), as the two legitimate
ways to draw general conclusions, continue to be used even though their application is
fraught with difficulties; and they in fact end up by deviating from their theoretical
principles and assumptions. Hence one fails to understand why it is not possible to resort to
other forms of generalization which, though unsatisfactory, are no more unsatisfactory than
those deemed superior to them. For that matter, contemporary social scientists do not have to
choose between perfect and imperfect forms of generalization, but between forms of inference
whose strengths and weaknesses depend on the researcher’s cognitive aims, the research
situation, and the nature of the phenomenon under study.
The central idea of this essay lies midway between two highly authoritative
and well-known methodological proposals: States. Comparison between the results of the
Durkheim’s (1912) cas pur (the ‘pure case’), two research studies showed that the three
with positivist overtones, and Max Weber’s researchers had discovered almost identical
(1904) theory of ideal types. Durkheim patterns of behavior. The reason for this simi-
believed that the simplest society of all for larity was probably that the survey interview-
study of the elementary forms of religious ers had been trained with textbooks widely
life was the Australian tribe of the Arunta. used on both sides of the Atlantic, and that they
The Flemish statistician and sociologist had used artifacts – technological (telephone,
Adolphe Quételet (1796–1874) looked to the keyboard), cognitive (questionnaires), and
crowd for his homme moyen (the average organizational (scripts or interview formats) –
man), who represented the ‘normality’ of the which made the social activities very similar.
species. He was prompted to do so by the There are consequently numerous social
discovery that certain characteristics (physical research settings in which a few cases may
and biological) of individuals were distributed suffice to make a generalization. Provided
in the populations which he studied according they are chosen carefully.
to the ‘normal’ curve constructed by the
mathematician Gauss.
Conversely, Weber maintained that ‘feu-
dal society,’ ‘bureaucracy,’ ‘charisma’ were NOTES
genetic concepts (developed with a view to
a causal explanation) and limiting concepts.
They consequently could not be evaluated 1 To be stressed is that the distinction between
probability and non-probability does not mark
in terms of their reality-describing adequacy, the boundary between qualitative and quantitative
only in terms of their instrumental efficacy. research: in fact, non-probability samples are also used
For Weber (1904), an ideal type was not for surveys (quota, telephone, and so on) and for
a representation of the real; rather, it was experiments.
formed by a one-sided accentuation of one or 2 This compromise centered on the idea of
complementarity is still accepted by numerous
more points of view and by the connection of methodologists: see for instance Payne and Williams
a quantity of diffuse, discrete, more or less (2005: 297).
present and occasionally absent, particular 3 Indeed, there are some who maintain that
phenomena. Given the conceptual purity of generalizability is perhaps the wrong word for what
an ideal type, it could never be empirically qualitative researchers seek to achieve: ‘Generaliza-
tion is (…) [a] word (…) that should be reserved for
detected in reality; it was a utopian entity. surveys only’ (Alasuutari, 1995: 156–7).
The typical or emblematic case suggested 4 However, Denzin’s (1971) position was very
as a criterion for the construction of sample different at the end of the 1960s: he expressed himself
stands midway between the claim to have in favor of operationalization (‘this does not mean that
discovered the pure case (the quintessence operationalization is avoided – it merely suggests that
the point of operazionalization is delayed until the
of the phenomenon studied) and renunciation situated meaning of concepts is discovered,’ p. 268);
of the empirical search for cases of interest he believed that the use of indicators was important
because of their typicality. (‘a series of empirical indicators relevant to each data
At the end of the 1980s, in a study on base and hypothesis must be constructed, and, last,
the interview, I documented the rituals and research must progress in a formative manner in which
hypotheses and data continually interrelate,’ p. 269),
rhetorical strategies used by an interviewer and he argued that ‘it is necessary for researchers to
as he made telephone calls to 10 adolescents demonstrate the representativeness of those units in
in order to arrange subsequent face-to-face the total population of similar events’ (p. 269).
interviews (Gobo, 1990, 2001). The research 5 Gomm, Hammersley and Foster (2000: 112,
involved the recording of the telephone calls endnote 2) acutely point out: ‘there is some ambiguity
in Stake’s position. He also recognizes that case
and subsequent discourse analysis. Some studies can be instrumental rather than intrinsic, and
years later, Maynard and Schaeffer (1999) in an outline of the ‘major conceptual responsibilities’
conducted very similar research in the United of case study inquiry he lists the final one as
RE-CONCEPTUALIZING GENERALIZATION: OLD ISSUES IN A NEW FRAME 211
‘developing assertions or generalizations about the case’ (Stake, 1994, 244).’

6 For this reason, Payne and Williams’ statement that ‘opportunistic site selection will normally be incompatible with even moderatum generalization’ (2005: 310) appears too severe and lacks empirical justification.

7 As Strauss and Corbin (1990: 176) explain: ‘open sampling is associated with open coding. Openness rather than specificity guides the sampling choices.’ Open sampling can be performed purposively (e.g. pp. 183–4) or systematically (e.g. p. 184), or it occurs fortuitously (e.g. pp. 182–3). It includes on-site sampling.

REFERENCES

Alasuutari, Pertti 1995 Researching Culture, London: Sage.
Alberoni, Francesco et al. 1967 L’attivista di partito, Bologna: Il Mulino.
Bailey, Kenneth D. 1978 Methods in Social Research, New York: Free Press.
Barton, Allen H. and Lazarsfeld, Paul F. 1955 Some functions of qualitative analysis in social research, Frankfurter Beiträge zur Soziologie, 1: 321–361.
Becker, Howard 1953 Becoming a Marijuana Smoker, American Journal of Sociology, 59: 235–242.
Becker, Howard 1998 Tricks of the Trade, Chicago and London: University of Chicago Press.
Becker, Howard 2000 Italo Calvino as Urbanologist, paper.
Bourdieu, Pierre et al. 1993 La Misère du monde, Paris: Editions du Seuil, transl. The Weight of the World: Social Suffering in Contemporary Society, Cambridge: Polity, 1999.
Burgess, Ernest W. 1927 Statistics and case studies as methods of sociological research, Sociology and Social Research, 12: 103–120.
Burrell, Gibson and Morgan, Gareth 1979 Sociological Paradigms and Organizational Analysis, London: Heinemann.
Capecchi, Vittorio 1972 Struttura e tecniche della ricerca, in Pietro Rossi (ed.), Ricerca sociologica e ruolo del sociologo, Bologna: Il Mulino.
Chein, Isidor 1963 An introduction to sampling, in C. Selltiz and M. Jahoda (eds.), Research Methods in Social Relations, New York: Holt & Rinehart, pp. 509–45.
Cicourel, Aaron V. 1968 The Social Organization of Juvenile Justice, New York: Wiley.
Cicourel, Aaron V. 1996 Ecological Validity and White Room Effects, Pragmatics and Cognition, 4(2): 221–263.
Cicourel, Aaron V. and Boese, R. 1972 Sign language acquisition and the teaching of deaf children, in Courtney B. Cazden, Vera P. John, and Dell Hymes (eds.), Functions of Language in the Classroom, New York: Teachers College Press.
Connolly, Paul 1998 ‘Dancing to the wrong tune’: Ethnography, generalization, and research on racism in schools, in P. Connolly and B. Troyna (eds.), Researching Racism in Education, Buckingham: Open University Press, pp. 122–39.
Converse, Jean M. and Schuman, Howard 1974 Conversations at Random: Survey Research as Interviewers See it, New York: Wiley.
Corsaro, William A. 1985 Friendship and Peer Culture in the Early Years, Norwood, NJ: Ablex Publishing Corporation.
Crapanzano, Vincent 1980 Tuhami: Portrait of a Moroccan, Chicago: University of Chicago Press.
Cronbach, Lee J. 1975 Beyond the two disciplines of scientific psychology, American Psychologist, 30: 116–27.
Cronbach, Lee J. 1982 Designing Evaluations of Educational and Social Programs, San Francisco: Jossey-Bass.
Dalton, Melville 1959 Men Who Manage, New York: Wiley.
De Martino, Ernesto 1961 La terra del rimorso, Milano: Il Saggiatore, transl. The Land of Remorse: A Study of Southern Italian Tarantism, London: Free Association Books, 2005.
Denzin, Norman K. 1971 Symbolic interactionism and ethnomethodology, in J.D. Douglas (ed.), Understanding Everyday Life, London: Routledge and Kegan Paul, pp. 259–284.
Denzin, Norman K. 1983 Interpretive interactionism, in G. Morgan (ed.), Beyond Method: Strategy for Social Research, Beverly Hills, CA: Sage, pp. 129–46.
Durkheim, Emile 1912 Les formes élémentaires de la vie religieuse, Paris: Alcan, transl. The Elementary Forms of the Religious Life, London: G. Allen & Unwin, 1915.
Festinger, Leon, Riecken, Henry W. and Schachter, Stanley 1956 When Prophecy Fails, New York: Harper Torchbooks.
Fielding, Nigel 1981 The National Front, London: Routledge.
Galtung, Johan 1967 Theory and Methods of Social Research, Oslo: Universitetsforlaget.
Garfinkel, Harold 1967 Studies in Ethnomethodology, Englewood Cliffs, NJ: Prentice Hall.
Geertz, Clifford 1972 Deep play: notes on the Balinese Cockfight, Daedalus, 101: 1–37.
Geertz, Clifford 1973 The Interpretation of Cultures, New York: Basic Books.
Glaser, Barney G. and Strauss, Anselm L. 1967 The Discovery of Grounded Theory, Chicago: Aldine.
Gobo, Giampietro 1990 The First Call: Rituals and Rhetorical Strategies in the First Telephone Call with Italian Respondents, paper, Annual Meeting of the A.S.A., Washington D.C., August 11–15.
Gobo, Giampietro 2001 Best practices: rituals and rhetorical strategies in the ‘initial telephone contact,’ Forum Qualitative Social Research, 2(1), http://www.qualitative-research.net/fqs-texte/1-01/1-01gobo-e.htm.
Gobo, Giampietro 2004 Sampling, representativeness and generalizability, in Seale C., Gobo G., Gubrium J.F., Silverman D. (eds.), Qualitative Research Practice, London: Sage, pp. 435–56.
Goetz, J.P. and LeCompte, Margaret D. 1984 Ethnography and Qualitative Design in Educational Research, Orlando, FL: Academic Press.
Goffman, Erving 1961 Asylums, New York: Doubleday.
Goffman, Erving 1967 Interaction Ritual, New York: Doubleday Anchor.
Goldthorpe, John H., Lockwood, David, Bechhofer, Frank and Platt, Jennifer 1968 The Affluent Worker: Industrial Attitudes and Behaviour, Cambridge: Cambridge University Press.
Gomm, Roger, Hammersley, Martyn and Foster, Peter (eds.) 2000 Case Study Method, London: Sage.
Goode, William and Hatt, Paul K. 1952 Methods in Social Research, New York: McGraw-Hill.
Gouldner, Alvin G. 1954 Patterns of Industrial Bureaucracy, New York: The Free Press.
Griaule, Marcel 1948 Dieu d’eau: entretiens avec Ogotemmêli, Paris: Éditions du Chêne.
Groves, Robert M. and Lyberg, Lars E. 1988 An overview of nonresponse issues in telephone surveys, in R.M. Groves, P.P. Biemer, L.E. Lyberg, J.T. Massey, W.L. Nicholls II and J. Waksberg (eds.), Telephone Survey Methodology, New York: Wiley.
Guba, Egon G. 1981 Criteria for assessing the trustworthiness of naturalistic enquiries, Educational Communication and Technology Journal, 2(29): 75–92.
Guba, Egon G. and Lincoln, Yvonna S. 1981 Effective Evaluation: Improving the Usefulness of Evaluation Results Through Responsive and Naturalistic Approaches, San Francisco: Jossey-Bass.
Guba, Egon G. and Lincoln, Yvonna S. 1982 Epistemological and methodological bases of naturalistic inquiry, Educational Communication and Technology Journal, 30: 233–252.
Hammersley, Martyn 1992 What’s Wrong with Ethnography?, London: Routledge.
Hebdige, Dick 1979 Subculture: The Meaning of Style, London and New York: Routledge.
Kahneman, D. and Tversky, A. 1972 Subjective probability: A judgment of representativeness, Cognitive Psychology, 3: 430–454.
Lieberson, Stanley 1992 Small N’s and Big Conclusions: An examination of the Reasoning in Comparative Studies Based on a Small Number of Cases, reprinted in R. Gomm, M. Hammersley and P. Foster (eds.) (2000), op. cit.
Lincoln, Yvonna S. and Guba, Egon G. 1979 Naturalist Inquiry, Beverly Hills, CA: Sage. (Reprinted partially in Gomm, Roger, Hammersley, Martyn and Foster, Peter (eds.) 2000 Case Study Method, London: Sage, pp. 27–42.)
Mason, Jennifer 1996 Qualitative Researching, Newbury Park: Sage.
Maynard, Douglas W. and Schaeffer, Nora Cate 1999 Keeping the gate, Sociological Methods & Research, 1: 34–79.
Mitchell, Clyde J. 1983 Case and situation analysis, Sociological Review, 31: 187–211.
Mitchell, R. and Karttunen, S. 1991 Perché e come definire un artista?, Rassegna Italiana di Sociologia, XXXII(3): 349–64.
Pawson, Ray and Tilley, Nick 1997 Realistic Evaluation, London: Sage.
Payne, Geoff and Williams, Malcolm 2005 Generalization in qualitative research, Sociology, 39(2): 295–314.
Peräkylä, Anssi 1997 Reliability and validity in research based upon transcripts, in David Silverman (ed.), Qualitative Research, London: Sage, pp. 201–19.
Pollner, Melvin and McDonald-Wikler, Lynn 1985 The social construction of unreality: a case study of a family’s attribution of competence to a severely retarded child, Family Process, 24: 241–254.
Ragin, Charles C. and Becker, Howard S. (eds.) 1992 What is a Case? Cambridge: Cambridge University Press.
Rosenhan, David L. 1973 On being sane in insane places, Science, 179: 250–8.
Rositi, Franco 1993 Strutture di senso e strutture di dati, Rassegna Italiana di Sociologia, 2: 177–200.
Sacks, Harvey 1992 Lectures on Conversation, Oxford: Blackwell.
Schatzman, Leonard and Strauss, Anselm L. 1973 Field Research, Englewood Cliffs, NJ: Prentice Hall.
Schofield, Janet Ward 1990 Increasing the generalizability of qualitative research, in E.W. Eisner and A. Peshkin (eds.), Qualitative Inquiry in Education: The Continuing Debate, New York: Teachers College Press, pp. 201–232.
Silverman, David 2000 Doing Qualitative Research, London: Sage.
Stake, Robert 1978 The case study method in social enquiry, Educational Researcher, 7: 5–8 (Reprinted in Gomm, Roger, Hammersley, Martyn and Foster, Peter (eds.) 2000 Case Study Method, London: Sage, pp. 19–26).
Strauss, Anselm 1987 Qualitative Analysis for Social Scientists, Cambridge: Cambridge University Press.
Strauss, Anselm and Corbin, Juliet 1990 Basics of Qualitative Research, London: Sage.
Tversky, Amos and Kahneman, Daniel 1974 Judgment under uncertainty: Heuristics and biases, Science, 185: 1123–1131.
Weber, Max 1904 Die ‘Objektivität’ sozialwissenschaftlicher und sozialpolitischer Erkenntnis, Archiv für Sozialwissenschaft und Sozialpolitik, XIX: 22–87, transl. On the methodology of the social sciences, Illinois: The Free Press of Glencoe, 1949.
Williams, Malcolm 2000 Interpretativism and generalization, Sociology, 34(2): 209–24.
Xing Xu, Zhonghe Zhou, Xiaolin Wang, Xuewen Kuang, Fucheng Zhang and Xiangke Du 2003 Four-winged dinosaurs from China, Nature, 421: 335–339.
Yin, Robert K. 1984 Case Study Research, Thousand Oaks: Sage.
Znaniecki, Florian 1934 The Method of Sociology, New York: Farrar & Rinehart.
13
Case Study in Social Research
Linda Mabry
involves careful methodology to avoid such error.

Case study researchers in social science commonly scrutinize not only the demographics and other statistics of a case, such as how many persons are involved or affected and how indicators of impact vary over time, but even more closely the experiences and perceptions of participants. Understanding a case almost always requires going beyond countable aspects and trends. Inquiry into the social phenomenon of homelessness, for example, may benefit from counting the number of persons dispossessed, comparing the current with a past census, and identifying the homeless by age group, gender, and location. But this is not enough for deep understanding. Grasping why people live on the streets, along with such things as whether sufficient resources are available to support any who might choose not to do so, whether there are cross-generational effects, and which policies and social structures tend to push people into homelessness and which tend to protect them from it, will significantly improve understanding. What do the homeless think their opportunities and barriers are? Do social workers and law enforcement officers agree with them? What do policy-makers think the homeless need, and what do they think their constituents or budgets will support? Because the social reality of homelessness is co-constructed by people who participate in the phenomenon, their experiences, beliefs, and values must be studied in order to understand the phenomenon of homelessness in any place that it occurs, the political and ideological contexts that sustain it, and the capacity of participants to imagine or accept potential solutions.

Historical and epistemological antecedents

Because social reality is created by people and because it is complex, dynamic, and context-dependent, its study required the development of a highly nuanced research approach. Eighteenth-century views of natural science and social science were contrasted by Kant (1781) as the study of phenomena or things-as-they-appear and the measurement of things-as-they-are or noumena. As human beings were distinguished on the basis of their sense-making proclivities, perception emerged as an object of social science – how things appear to a participant in the scene (e.g. the homeless person, the manager of a soup kitchen, the police chief) and how they appear to an observer (e.g. the social scientist). To the extent that case study researchers work to document human perception and experiences, consciously using their own perceptions in the process, they engage in phenomenology.

The clashing motifs of natural science (sometimes referred to as hard science or quantitative or experimental research) and social science (contrastingly referred to as soft science or qualitative, interpretive, or hermeneutic research) may present themselves to case study researchers as a choice or may be resolved in mixed-methods inquiry. Resolution once seemed unlikely, and some still deem the two research paradigms incommensurable (Kuhn, 1962; Lincoln & Guba, 1985). Where case study researchers in social science choose between the two, their methodological choice is typically qualitative.

In this methodological distinction, differentiation has evolved over time. During the century after Kant, the Vienna School moved from positivism’s insistence on the measurability of an objective reality (Comte, 1822) and the notion that the truth of a statement depends upon its being in a one-to-one correspondence with an objective reality (David, 2005) to logical positivism’s less demanding requirement of the verifiability of real entities (Popper, 1935). Across the scientific aisle, simultaneously, Nietzsche (1882), urging subjective judgment, observed, ‘We behold all things through the human head’, and Dilthey (1883) advanced a general theory of understanding, Verstehen, whose research imperative involved subjective meaning-making, Geisteswissenschaften. From there, the Chicago School developed an urban sociology in the 1920s–30s employing ethnographic methods.
At the end of the next century, Erickson, describing qualitative methodology, was still encouraging researchers to ‘put mind back in the picture’ (1986, p. 127, italics original).

Qualitative or interpretivist study implies the constructivist theory that all knowledge is personally constructed (Piaget, 1955; see also Glassman, 2001; and Phillips, 1995). Personal experience, including the vicarious experience promoted in interpretivist case studies, provides the building blocks for the knowledge base constructed by each individual. In articulating a resonant interpretivist research methodology, Lincoln and Guba (1985) promoted an ontology of truth and a subjectivist epistemology in which meaning is personally or socially constructed (see Vygotsky, 1978). Similarly, hermeneutic methodology is marked by a search for the meanings people attribute to phenomena (Guba and Lincoln, 1994; see also Guba and Lincoln, 2005; Schutz, 1967; Schwandt, 1994).

Interpretivist distinctions

Three additional contrasts help to distinguish interpretivist case study in the social sciences, with the caution that these characteristics are better understood as complementary than as conflicting with quantitative methodology.

First, while large-scale quantitative studies sample from broad populations and produce grand generalizations, case studies provide deep understanding about specific instances. The contrast is one of breadth and depth, both needed for understanding complex social phenomena. For example, it may be helpful to know the correlations between exposure to the sun and the incidence of melanoma, treatment options, and survival rates; it may also be helpful to know how some patients deal with the effects of treatment and whether their personal strategies aided recovery and quality of life. For disaster planning, it may be helpful to know which community services are most frequently accessed during emergencies; it may also be helpful to know how a critical service agency mobilizes and rations access.

Second, the search for broad applicability of findings in quantitative research drives random sampling from large populations and data collection using standardized procedures. The quality of large-scale quantitative research depends largely on careful adherence to a prescriptive research design. In contrast to the preordinate design of quantitative studies, qualitative case studies employ emergent design. Rather than carefully adhering to a design specified at the outset, when relatively little is known about a case, a qualitative case researcher is expected to improve on the original blueprint as information emerges during data collection. For example, if unexpected sources of data become apparent or if unanticipated aspects of the case come to light, the researcher is expected to capitalize on the new opportunities and progressively focus the study on the features of the case which gradually appear to be most significant.

Finally, while large-scale quantitative studies reduce data to numbers for aggregation and statistical analysis, interpretivist case studies tend to expand datasets as new sources are discovered and questions articulated. The contrast is between the reductionism of quantitative studies and the expansionism of interpretivist studies. Reductionism allows quantitative researchers to utilize statistical analysis procedures; expansionism allows interpretivist case study researchers fuller access to a case’s contexts, conditionalities, and meanings.

Contributions to knowledge and understanding

Such characteristics position case studies to contribute substantively to social science by offering intense focus on cases of interest, their contexts, and their complexity.

Selection of cases

Cases abound – micro-lending, public transportation, the westernization of indigenous cultures, consolidation of rural high schools, access of the uninsured to hospitals, a social worker’s case load, an immigrant child’s struggle to learn. The identification of a
narratives intensifies the power of case study reports to deepen understanding because it promotes development of tacit knowledge.

There are trade-offs and drawbacks to case study as a research approach, including proximity, validity, and generalizability.

PROXIMITY

The psycho-socio-emotional distance between researchers and the cases establishes their proximity. Researchers are usually (but not always) outsiders to the cases they study, observers rather than true members of a case. On a continuum of possible roles from external observer to participant-observer, a researcher’s stance may be as passive as that of the proverbial fly on the wall or more active, like a participant in the case. In ethnography, researchers are sometimes cautioned against ‘going native’. Contrastingly, in action research or self-study, the case researcher may be a native from the outset – a project manager or a classroom teacher conducting research for the purpose of translating deeper understanding into immediate improvements.

Externality

A case study researcher can promote understanding by collecting and organizing information, focusing attention on meaningful aspects, and providing an external analytic perspective that may be helpful even to insiders intimately familiar with the case. Although researchers choose cases partly out of personal interest, externality suggests an absence of vested interest, one source of bias, which can promote the credibility of findings.

On the other hand, externality implies limited lived experience of the case and the danger that case studies may fail to ‘get it right’ (Geertz, 1973; Wolcott, 1994). The contextuality of cases and the phenomenological impulse of case study research create special burdens for external researchers attempting to grasp and represent multiple insider perspectives and cases’ complexity. Familiarity with the ethos helps outsiders attempt etic¹ representations of insiders’ experiences and meanings, representations which should be accompanied by appropriate qualification.

Cultural competence

Cultures and subcultures develop singular histories and respond to overlapping contexts and unique personalities in highly nuanced ways. For external researchers, the cultural competence needed for grasping local meanings cannot be presumed. Even when external researchers share nationality and language with case participants, they may be unable to detect the subtle or hidden meanings suggested by a pause in conversation, the type of refreshments offered, who is present and who is absent in a gathering, the items found (or not) on a meeting agenda, who gives and who receives gifts, who makes decisions and how. Reliance on knowledgeable participants acting as key informants can help surface local meanings, although debriefings and other discussions for the purpose of cultural translation will inject key informants’ own meanings into datasets and introduce new cautions for interpretation.

Where the researcher does not share the language(s) or dialect(s) indigenous to the case, dependence on translators is unavoidable. The transfer of meaning from speaker to hearer, never assured, is further compromised by introducing this mediating influence. Language structures and idioms are so culture-specific and dynamic that, even with highly competent and motivated translators, inaccuracies are difficult to avoid. These limitations, too, should be acknowledged in case reports.

Ethics

The misunderstandings which externality can generate have an ethical component. Participants have a stake in the accuracy of how they are presented and in whether case accounts are flattering or damning. For example,
and credibility are separate properties, these efforts tend to improve both.

Triangulation

During data collection, triangulation² by data source involves collecting data from different persons or entities. Checking the degree to which each source confirms, elaborates, and disconfirms information from other sources honors case complexity and the perspectives among participants and helps ascertain the accuracy of each datum. Methodological triangulation involves checking data collected via one method with data collected using another, for example, checking whether direct observation can confirm interview testimony. Triangulation by time involves repeated return to the site to track patterns of events and their trends and permutations. Because different observers might see different things or might interpret the same things differently, triangulation by observer can help expand meaning-making, balance interpretations, and guard against undue researcher subjectivity.

Theoretical triangulation in data analysis involves recourse to different abstractions that might explain the data. Various theories, models, typologies, and categorization systems may suggest different meanings. For example, analysis of the productivity of a working unit according to an economic model may suggest an interpretation quite different from one suggested by analysis of the unit’s procedures according to a model of democratic decision-making. Similarly, an analysis of classroom discussion regarding the degree to which the teacher’s questioning prompts student knowledge gains may yield quite different results from an analysis regarding the degree to which ethnically diverse students participate meaningfully.

Validation

The notion that accounts may be ‘made better by good readers and clearer by good opponents’ (Nietzsche, 1879) underlies processes of validation in interpretivist social science. Research subjects can help assure the accuracy of data by member-checking, a procedure in which groups representing those observed and interviewed are asked to confirm, elaborate, and disconfirm write-ups (Lincoln & Guba, 1985). In comprehensive validation, a more thorough approach, each human subject reviews data collected from his or her own interviews or observations prior to further dissemination (Mabry, 1998). Research subjects may be asked to validate interpretations as well as data.

In addition to triangulation and validation, peer review by critical friends, especially colleagues with expertise in the phenomenon or case or methodology, can provide a check on the sufficiency of the evidence, the logic of arguments, overall clarity and experientiality.

GENERALIZABILITY

Generalizability refers to the capacity of the case to be informative about a general phenomenon, to be broadly applicable beyond the specific site, population, time, and circumstances studied. The understanding that a single case studied in depth can offer is different from the generalizable explanation, often via theory or model, more easily provided by large-scale study (von Wright, 1971).

In quantitative research, generalizability, often referred to as external validity, drives design: hypothesis-testing to support a generalizable theory, random sampling to assure representativeness of a larger population, and team training for reliability or consistency in administration of data collection instruments to allow aggregability. While researchers schooled in the quantitative tradition have considered case studies problematic in their determined focus on single cases (e.g. Campbell & Stanley, 1963), case study researchers have made different types of arguments regarding generalizing their work.

Acceptable interpretivist generalization

The case-to-population generalizations (Firestone, 1993) important in quantitative
result from large-scale research. Large-scale quantitative research often involves causal analysis for the purpose of prediction and control of future behaviour. For example, physicians may prescribe a drug to patients based on studies suggesting the drug alleviates their disease. Often, an experimental study begins with a hypothesis³ derived from theory and tested empirically, a deductive approach in which theory propels data collection.

The inverse of this approach is more common to case study where theory development (if any) is inductive, following data collection and explaining the dataset. Theory may emerge, perhaps unexpectedly, through constant-comparative method, a dialogic cycle of data collection and interpretation which is incomplete until interpretations encompass all available data (Glaser & Strauss, 1967; Strauss & Corbin, 1990), as noted earlier. Small-scale studies can also refine existing theory, for example, physicians prescribing a drug to all patients except those from a specific ethnic group which tends to react negatively as revealed by prior case studies.

Whether or not research generates theory, personal theory plays a role in all research. Case study researchers who claim their work is ‘merely descriptive’ or ‘atheoretical’ inappropriately deny the effects of their own conceptualizations of the phenomena they study. From the outset of a case study, formation of a research question indicates an underlying personal theory, perhaps implicit, about the nature of the phenomenon. Even word choice signals theory, for example, ‘active classroom’ suggesting a theory that learners are active constructors of their knowledge bases; or ‘chaotic’ suggesting a theory that knowledge is delivered rather than constructed and that learning is passive. As part of the effort to discipline their subjectivities, interpretivist researchers may try to articulate their personal theories, making them explicit for readers as they consider the validity of descriptions and findings.

While theory development is not usually expected in small-scale studies, the use of theory to analyze data may nevertheless be a highly productive part of the interpretive process. Theoretical triangulation, noted earlier, facilitates interpretation by offering views of the data through different explanatory lenses. Different potential interpretations suggested by different theories help the interpretivist case study researcher to think deeply about meaning.

CONCLUSION

With deep understanding of a case as the prime goal of case study, an attitude of openness may be the most fortuitous item in a case study researcher’s dispositional toolkit. There is always more that can be learned about a case, more potential interpretations of existing data, and new events that create alterations in the case. Premature conclusions can foreclose on deeper understanding. Curiosity to know more and to understand better encourages delving deeply into the meaning of the case. Link by link, case by case, construction of meaning by the researcher, by the reader, and by the research community is how case study contributes to social science and to society. As accumulated case studies refine understandings of social phenomena, accumulated practice of case study may continue to refine these methods, resulting in ever more careful and nuanced social science.

NOTES

1 In contrast to a phenomenological emic approach to research is an etic approach in which an outsider’s – rather than an insider’s – perspective is offered to readers (see Seymour-Smith, 1986).

2 Triangulation, a term derived from nautical procedures for locating ships at sea based on three points, does not presume three sources (or methods, observers, data collection events, or theoretical perspectives). More or fewer, as needed and as available, may be consulted.

3 Actually, experimental studies generally begin with null hypotheses, testing to see whether the inverse of the actual hypothesis can be proved false – thus providing indirect evidence that the actual hypothesis is true. Note that this approach is essentially a matter of ruling out rival hypotheses to narrow the range of possible explanations for a phenomenon.
Mabry, L. (1998). Case study methods. In H. J. Walberg & A. J. Reynolds (Eds.), Evaluation research for educational productivity (pp. 155–170). Greenwich, CT: JAI Press.
Mabry, L. (2003). In living color: Qualitative methods in educational evaluation. In T. Kellaghan & D. L. Stufflebeam (Eds.), International handbook of educational evaluation (pp. 167–185). Boston: Kluwer-Nijhoff.
Mabry, L., Poole, J., Redmond, L. & Schultz, A. (2003). Local impact of state-mandated testing. Education Policy Analysis Archives, 11(22). Available at: http://epaa.asu.edu/epaa/v11n22/
Mahoney, K. K. (1999). Peer mediation: An ethnographic investigation of an elementary school’s program. Unpublished doctoral dissertation. Indiana University, Bloomington, IN.
Malone, D. L. (1997). Namel manmeri: Language and culture maintenance and mother tongue education in the highlands of Papua New Guinea. Unpublished doctoral dissertation. Indiana University, Bloomington, IN.
Maxwell, J. A. (1992). Understanding and validity in qualitative research. Harvard Educational Review, 62 (3), 279–300.
McDonald, S.-K., Keesler, V. A., Kauffman, N. J. & Schneider, B. (2006). Scaling-up exemplary interventions. Educational Researcher, 35 (3), 15–32.
Mertens, D. (2005). Research and evaluation in education and psychology: Integrating diversity with quantitative, qualitative, and mixed methods (2nd ed.). Thousand Oaks, CA: Sage.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: American Council on Education, Macmillan.
Nietzsche, F. (1879/1996). Human, all too human. (trans. by R. J. Hollingdale). Cambridge, MA: Cambridge University Press.
Nietzsche, F. (1882/1974). The gay science. (trans. W. Kaufmann). London: Vintage Books.
Peck, C. A., Mabry, L., Curley, J. & Conn-Powers, M. (1993, May). Implementing integration at the preschool and kindergarten level: A follow-along study of Washington’s efforts. Washington Office of the Superintendent of Public Instruction and Early Childhood Development Association of Washington’s Infant and Early Childhood Conference, Seattle, WA.
Peshkin, A. (1986). God’s choice. Chicago: University of Chicago Press.
Peshkin, A. (1988). In search of subjectivity–one’s own. Educational Researcher, 17 (7), 17–22.
Phillips, D. C. (1995). The good, the bad, and the ugly: The many faces of constructivism. Educational Researcher, 24 (7), 5–12.
Piaget, J. (1955). The language and thought of the child. New York: World.
Polanyi, M. (1958). Personal knowledge: Towards a post-critical philosophy. Chicago, IL: University of Chicago Press.
Popper, K. R. (1935). Logik der Forschung. Vienna: Julius Springer Verlag.
Schutz, A. (1967). Collected papers I: The problem of social reality. The Hague: Martinus Nijhoff.
Schwandt, T. A. (1994). Constructivist, interpretivist approaches to human inquiry. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (pp. 118–137). Thousand Oaks, CA: Sage.
Seymour-Smith, C. (1986). Dictionary of anthropology. Boston: G. K. Hall.
Shadish, W. R., Jr., Cook, T. D. & Leviton, L. C. (1991). Foundations of program evaluation: Theories of practice. Newbury Park, CA: Sage.
Smith, L. (1978). An evolving logic of participant observation, educational ethnography and other case studies. In L. Shulman (Ed.), Review of research in education (vol. 6, pp. 316–377). Itasca, IL: Peacock.
Spindler, G. D. (1997). Beth Anne–A case study of culturally defined adjustment and teacher perceptions. In G. D. Spindler (Ed.), Education and cultural process: Anthropological approaches (3rd ed., pp. 246–261). Prospect Heights, IL: Waveland Press.
Spiro, R. J., Vispoel, W. P., Schmitz, J. G., Samarapungavan, A. & Boerger, A. E. (1987). Knowledge acquisition for application: Cognitive flexibility and transfer in complex content domains. In B. C. Britton (Ed.), Executive control processes (pp. 177–199). Hillsdale, NJ: Erlbaum.
Stake, R. E. (1978). The case study method in social inquiry. Educational Researcher, 7 (2), 5–8.
Stake, R. E. (2005). Qualitative case studies. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (3rd ed., pp. 443–466). Thousand Oaks, CA: Sage.
Stake, R., Bresler, L. & Mabry, L. (1991). Custom and cherishing: The arts in elementary schools. Urbana, IL: Council for Research in Music Education, University of Illinois.
Strauss, A. & Corbin, J. (1990). Basics of qualitative research: Grounded theory procedures and techniques. Newbury Park, CA: Sage.
Tobin, J. J., Wu, D. Y. H. & Davidson, D. H. (1989). Preschool in three cultures: Japan, China, and the United States. New Haven, CT: Yale University Press.
von Wright, G. H. (1971). Explanation and understanding. London: Routledge & Kegan Paul.
Vygotsky, L. S. (1978). Mind in society: The development of higher mental process. Cambridge, MA: Harvard University Press.
Wolcott, H. F. (1987). The teacher as an enemy. In G. D. Spindler (Ed.), Education and cultural process: Anthropological approaches (2nd ed., pp. 136–150). Prospect Heights, IL: Waveland Press.
Wolcott, H. F. (1994). Transforming qualitative data: Description, analysis, and interpretation. Thousand Oaks, CA: Sage.
Wolcott, H. F. (1995). The art of fieldwork. Walnut Creek, CA: AltaMira Press.
Worthen, B. R., Sanders, J. R. & Fitzpatrick, J. L. (1997). Program evaluation: Alternative approaches and practical guidelines (2nd ed.). New York: Longman.
14
Longitudinal and Panel Studies
Jane Elliott, Janet Holland and
Rachel Thomson
information is also common in qualitative longitudinal studies.
A further type of prospective panel study is a linked panel, which uses census data or administrative data (such as information about hospital treatment or benefits records). This is the least intrusive type of quantitative longitudinal research study as individuals may well not be aware that they are members of the panel. Unique personal identifiers are used to link together data that were not initially collected as part of a longitudinal research study. For example, a 1 percent sub-sample of records from the 1971 British Census has been linked to records for the same sample of individuals in 1981, 1991 and 2001. This is known as the Longitudinal Study of the British Census. A similar study linking the 1991 and 2001 Census records for 5 percent of the population of Scotland has recently been established.
Cohort studies
A cohort has been defined as an ‘aggregate of individuals who experienced the same event within the same time interval’ (Ryder, 1965: 845). The notion of a group of people bound together by sharing the experience of common historical events was first introduced by Karl Mannheim in the early 1920s. Mannheim argued that people are more sensitive to social phenomena that occur during their formative years and this may shape a cohort’s future values and behaviour. The most straightforward type of cohort used in longitudinal quantitative research is the birth cohort, i.e. a sample of individuals born within a relatively short time period. We might also choose to study samples of a cohort of people who got married, or who were released from prison, in a particular month or year.
One major advantage of having longitudinal data on a series of separate cohorts is that it is possible to distinguish between ‘age effects’ (or lifecycle effects) and cohort effects. For example, we may discover, from a cross-sectional survey carried out in 2006 in Britain, that people over the age of 50 are more likely to vote for the Conservative Party than those under the age of 50. However, it is not clear whether this is an age effect such that as individuals grow older they are more likely to vote Conservative or whether it is a cohort effect so that those born before 1956 are more likely to vote Conservative than those born after 1956. In a longitudinal cohort study we would be able to track the voting intentions of those who reached age 50 in 2006 throughout their adult lives to see whether their political allegiances were stable or whether they became more Conservative as they grew older. This data could then be compared with the information from cohorts born at earlier and later time periods to see whether there were stable cohort differences in political beliefs.
Cohort studies allow an explicit focus on the social and cultural context that frames the experiences, behaviour and decisions of individuals. For example, in the case of the 1958 British Birth Cohort Study (the National Child Development Study), it is important to understand the cohort’s educational experiences in the context of profound changes in the organisation of secondary education during the 1960s and 1970s, and the rapid expansion of higher education, which was well underway by the time cohort members left school in the mid 1970s (Bynner and Fogelman, 1993). In a similar way, qualitative longitudinal studies, in following individuals, groups and institutions over time, can provide information on the impact of dramatic changes of policy on the lives and experiences of participants. Examples here are the 12–18 Project, which provides insight into the consequences of changing policies in different kinds of schools and communities in Australia, and Pollard and Filer charting the effects of rapidly changing education policy on children through critical years of their primary and secondary education in the UK (McLeod and Yates, 2006; Pollard and Filer, 1999, 2002). Qualitative studies constructed in this way tend to avoid the danger of producing findings that are disembodied from particular times and places. A similar argument has been made for quantitative approaches, where the use of data from a single cohort coupled with an awareness of how the historical context could
shape the experiences of that generation of individuals, has been argued to lead to a more ‘narrative’ understanding of the patterns of behaviour being investigated (Elliott, 2005). Comparisons between cohorts can also help to clarify how individuals of different ages may respond differently to particular sets of historical circumstances. This emphasis on the importance of understanding individuals’ lives and experiences as arising out of the intersection of individual agency and historical and cultural context has become articulated as the life course paradigm. The term ‘life course’ refers to ‘a sequence of socially defined events and roles that the individual enacts over time’ (Giele and Elder, 1998: 22). Research adopting the life course paradigm tends to use both qualitative and quantitative data (Elder, 1974; Giele, 1998; Laub and Sampson, 1998).
Studies can combine qualitative and quantitative methods in different ways, and although we advocate the need for both, a discussion of mixed methods is beyond the scope of this chapter, other than stating that the combination of methods varies considerably. For example, predominantly quantitative studies may have qualitative ‘add-ons’ (for example, see Gorell-Barnes et al., 1998), studies may integrate both approaches (Du Bois-Reymond, 1998), and studies may begin as primarily quantitative and become increasingly qualitative over time as sample size erodes (Dwyer and Wyn, 2001).
Table 14.1 provides a brief summary of a small selection of quantitative and qualitative studies that have used different longitudinal panel designs, focusing on those that are commonly used in Britain, North America and Europe. While some of these are individual research projects, others are multipurpose studies that generate datasets that can be used as resources by other researchers.
GENERATING QUALITATIVE LONGITUDINAL DATA
We can see from the examples of qualitative longitudinal studies in Table 14.1 that the type of data generated and methods employed are very different from those in quantitative studies, and that they can vary by social science discipline. Imagine the wealth of detailed data on all aspects of the life and culture of the Isthmus Zapotec generated in an ongoing, 40-year study of their community (Royce, 2005). In general the methods used to generate data in qualitative longitudinal research depend on the research questions, the substantive research area and the perspective of the researcher/discipline. Anthropology and community studies are the lead social science disciplines employing long-term fieldwork that can be seen as qualitative longitudinal research. The approach is also relatively common in the education field, relevant studies including Pollard and Filer, 2002; Gordon et al., 2000; Walkerdine et al., 2001; Yates et al., 2002; Ball et al., 2000 and Kuhn and Witzel, 2000. Qualitative longitudinal work is particularly apposite in developmental psychology and health – key studies include Cutting and Dunn, 1999; Hughes and Dunn, 2002; Brown and Gilligan, 1992; Gilligan, 1993; Gulbrandsen, 2003 and Woodgate et al., 2003. There is increasing use of this approach in sociology (Du Bois-Reymond, 1998) and policy studies, dealing with policy development and evaluation, impact and process (Molloy et al., 2002; Mumford and Power, 2003). Other sociology sub-disciplines where qualitative longitudinal research is prevalent include criminology, covering criminal, drug use and sex work ‘careers’ (Farrall, 2004; Plumridge, 2001; Smith and McVie, 2003), life course/life history studies (Elder and Conger, 2000; Laub and Sampson, 2003) and childhood and youth studies (Henderson et al., 2007; Neale and Flowerdew, 2003; White and Wyn, 2004). Areas investigated include, for example, gender, families, parenting, child development, children and young people, changing health status, all manner of transitions in life, sexuality, employment and the impact of new technology.
Two collections of anthropological studies, themselves providing a review of the field over time, yield a fascinating picture of the
Table 14.1 Examples of longitudinal studies
| Study | Type | Country | Date started | Frequency of data collection | Main focus | Key reference or website |
| Panel Study of Income Dynamics | Household | USA | 1968 | Annual | Income | http://psidonline.isr.umich.edu/ McGonagle and Schoeni, 2006 |
| National Longitudinal surveys | Cohort | USA | 1966, 1971 etc. | Annual | A series of cohort studies started at different times and with cohorts of different ages, with a primary focus on employment | http://www.bls.gov/nls/ NLS Handbook, 2005 http://www.bls.gov/nls/handbook/nlshndbk.htm |
| Survey of Income and Program participation | Household | USA | 1984 | Every 4 months | Income support | http://www.bls.census.gov/sipp/ SIPP users Guide 2001 available in PDF at http://www.bls.census.gov/sipp/pubs.html |
| National Longitudinal Study of Children and Youth | Cohort of children aged 0–11 | Canada | 1994 | Every 2 years | Well-being and development of children into early adult life | http://www.statcan.ca/english/sdds/ |
| British Birth Cohort Studies: National Survey of Health and Development; National Child Development Study; British Cohort Study 1970; Millennium Cohort Study | Cohort | Great Britain | 1946, 1958, 1970 and 2000 | Varies, but generally every 2–3 years at early stages of children’s development and every 4 years in adult life | Health and child development with a broader focus in adult life (the 1946 cohort study is more specifically focused on health) | http://www.cls.ioe.ac.uk/ http://www.nshd.mrc.ac.uk/ Dex and Joshi, 2005; Ferri et al., 2003 |
| Longitudinal Study of the Census in England and Wales | Linked panel using census data | England and Wales | 1971 | Links decennial census data | Demographic and employment topics included in the census | http://www.celsius.lshtm.ac.uk/ Blackwell et al., 2003; Akinwale et al., 2005 |
| German Socio-economic Panel | Household study | West Germany and now includes the former GDR | 1984 | Annual | Broad focus on living conditions, social change, education and employment | http://www.diw.de/english/sop/ |
| EU Survey on Income and Living Conditions (EU-SILC) formerly the European Community Household Panel (ECHP) | Household study | European Community | 2003 (ECHP from 1994 to 2001) | Annual | Living conditions, employment, income, health and housing | http://epunet.essex.ac.uk/EU-SILC_UDB.pdf http://www.iser.essex.ac.uk/epag/dataset.php Berthoud and Iacovou, 2002 |
| The Isthmus Zapotec | Anthropological, ethnographic includes dance, photography, art, artefacts, advocacy | Mexico (USA) | 1967 | Varies, but between every 1–3 years | Identity, language, culture, art; change/continuity | Royce, 1977, 1982, 1993, 2002 |
| The Harvard Chiapas Project Tzotzil and Tzeltal Indians | Anthropological, controlled comparative, ethnographic team approach | Mexico (USA) | 1957 | Continuous annually 1957–1980, more sporadic since | Determinants and processes of cultural change, language, conceptual system | Vogt, 1957, 1969, 1994, 2002 |
| Gwembe, Valley Tonga (Northern Rhodesia/Zambia) | Anthropological demographic census, ethnographic team approach | Northern Rhodesia/Zambia (UK) | 1956 | Initially 5-year intervals, then varies | Resettlement post-dramatic environmental change (Kariba Dam). Cultural, social, political change | Cliggett, 2002; Scudder and Colson, 1979, 2002 |
| 12–18 Project | Sociology/Psychology. Interview study of 4 schools, ethos, effect on young people | Australia | 1993 | 1993–2000 twice-yearly interviews | Gendered subjectivity, identity formation, interaction of institutional + social contexts | McLeod and Yates, 2006 |
| Identity and Learning Programme | Educational ethnography, of 17 children through ages 4–16. Multi-perspective, collaborative approach | UK | 1987 | Annually scheduled activities over the period of research | Identity, learning stance, dynamics of learning careers, differentiation | Pollard and Filer, 1999 |
| Growing up Girl | Multi-method psychosocial study of female subjectivities and transitions to womanhood | UK | 1977 | Revisits ages 4, 10, 16, 21 | Education, families, gender, ethnicity and social class | Walkerdine and Lucey, 1989; Walkerdine et al., 2001 |
| Inventing Adulthoods | Multi-method sociological study of 100 young people’s transitions to adulthood | UK | 1996 | 5/6 waves in 10 years | Values, identities, material & social resources | Henderson et al., 2007; Thomson, 2007 |
| Middletown | First classic US community study. Many others followed up | USA | 1924 | E.g. 1924, 1935, 1979, 1982, 2001 | To study synchronously the interwoven trends that are the life of a small American city | Caccamo, 2000; Caplow and Bahr, 1979; Caplow et al., 1982; Lynd and Lynd, 1929, 1935 |
range and complexity of qualitative longitudinal research undertaken in this field (Foster et al., 1979; Kemper and Royce, 2002). Each provides considerable insight into an established canon of long-term anthropological enterprise. This has involved the development of a necessarily flexible approach, adapting to changes in the nature of the community; in the needs, goals, options and world-views of community members; in the political landscape and in the relationships between researchers and community members. Importantly, it illustrates how projects need to be organised on the basis of personnel and project size. As Kemper and Royce indicate, it is impossible to take on issues of time without the research coming into the frame, including practical questions of how to organise and maintain a team, the domestic politics of a research team, funding and job security issues and intellectual fashions. Many of these issues are also relevant for quantitative longitudinal research.
The body of anthropological research and the issues taken into consideration provide models for other disciplines and illustrate some differences in the concerns of different disciplines. An example here is concern about anonymity and confidentiality that emerges for many qualitative researchers, inhibiting the sharing of data. Data sharing and participatory involvement with those studied are well established in anthropology, although perhaps in danger in a constrained funding climate.
Anthropological studies can in some ways be seen as community studies, but the community studies literature tends to straddle disciplinary boundaries, including sociology, anthropology and geography or urban studies, and many of the classic studies were conducted within these fields, often drawing on an ethnographic method in which time, and change through time, were critical elements. Important here are the urban ethnographic tradition of the Chicago School (Lynd and Lynd, 1929, 1935; Whyte, 1943, 1955; Wirth, 1938) and family and community studies in the UK. Examples include Young and Willmott’s studies of the family in Bethnal Green (1957) and Stacey’s Banbury studies (1960 and 1975). The temporal character of community studies has been heightened by the growing trend of researchers to return to the site of earlier research. Follow-up studies have involved the same researcher(s) (Stacey et al., 1975) or others (Warwick and Littlejohn, 1992). In the 1990s, for example, Fiona Devine returned to Luton where working-class car workers were first studied by Goldthorpe and his colleagues and described in The affluent worker (1968). Devine (1992) was interested to see what changes had occurred in the intervening period in relation to working-class lifestyles and political and social beliefs. The Lynd and Lynd study of Middletown became a benchmark for community studies that was revisited by the Lynds themselves and by many others up to the present day (Caccamo, 2000, see also Crow, 2002; Crow and Allen, 1994).
The types of methods used to generate data in qualitative longitudinal research are those of qualitative research in general, and can be combined in various ways, including with quantitative methods (for example surveys of varying sizes and types, the collection of baseline descriptive statistical and demographic data to enable assessment of change over time, social mapping of geographical areas). The basic method in anthropology, although now widely used in other disciplines, is ethnography, itself constructed from multiple qualitative methods. Critically, however, ethnography involves social exploration, protracted investigation and the interpretation of local and situated cultures grounded in attention to the singular and concrete (Atkinson and Hammersley, 1994; Atkinson et al., 2001). Amongst specific methods used in qualitative longitudinal research are interviews on a continuum from semi-structured to depth. Increasingly favoured are biographical interviews, which can relate to specific episodes in, or aspects of, a life, or be more holistic as in life history approaches. Also employed are case studies, observation and documents including diaries kept specifically for the research (written, audio-, video-, photo-diaries etc.; Thomson and Holland, 2005). Various standard instruments can also be used, particularly in psychology. Visual, play and
drawing methods have also been developed, the latter for example with children.
Further aspects of research design will also be influenced by the social science discipline or disciplines within which the investigation takes place. This includes the nature of the sample to be selected, the unit of analysis for the research (including individual, group, community, organisation, institution, events, time period, spatial or geographical entities) and the overall timeframe of the study (including time intervals if relevant).
A major value of qualitative longitudinal research is flexibility, with the potential for development and innovation to take place throughout the entire research process. For example, with technological development, types of visual data (photography, video and hypermedia) are becoming increasingly popular in qualitative longitudinal research as in qualitative research in general (Pink, 2004a, 2004b; Qualitative Sociology, 1997). Changing technology is enabling the development and enhancement of ways of storing, accessing and representing data. This flexibility can extend to sampling, methods, units of analysis and theorisation. Sampling in qualitative research tends to follow a theoretical, rather than a statistical logic and so is characteristically conceptually and purposively driven. There is less concern than in quantitative approaches for representativeness, and sample and sampling can change in the process of the research, even more so in the longer-term qualitative longitudinal research. Two major approaches are purposive and theoretical sampling. In the first, cases are chosen because they illustrate some feature or process in which the researcher is interested; in the second, samples are selected on the basis of their relevance to the research questions and theoretical position of the researcher, and characteristics or criteria which help to develop and test the theory underlying the work are built into the sample. In the course of ongoing research and analysis, purposively chosen confirming or negative cases can also be used to enrich the data and its analysis and interpretation (Mason, 2002; Morse, 1994; Patton, 1990).
Qualitative longitudinal research can generate and test theory, and both inductive and deductive approaches can be undertaken, the specific theory again depending on the discipline. Whatever the theoretical perspective of a qualitative longitudinal study, it requires a theorisation of temporal processes. The structure of a qualitative longitudinal study makes it possible to employ an iterative and reflexive approach through which theoretical interpretations can be revisited in subsequent contact with the participants leading to further development of the ideas. A view emerging in the field is that a qualitative longitudinal methodology might itself challenge or expose the static character of existing theoretical frameworks, and in this way might represent a theoretical orientation as much as a methodology (McLeod, 2003; Neale and Flowerdew, 2003; Plumridge and Thomson, 2003).
Vogt, an anthropologist who worked on the Harvard Chiapas Project for many years, notes some advantages of the qualitative longitudinal approach:
The principal advantage of a continuous long-range project over a short-range one, or a series of revisits, is the depth, quality, and variety of understandings achieved – understandings of the basic ethnography and of the trends and processes of change. If the long-range project also involves a sizable team of students and younger colleagues who make one or more revisits and keep abreast of all the publications, then there is the added advantage of having a variety of fieldworkers with varied training and different theoretical biases who are forced to reconcile their findings and their analyses with one another. Vogt (2002: 145)
Problems of attrition
A major methodological issue for both qualitative and quantitative longitudinal studies with the individual as the unit of analysis is the problem of attrition, i.e. the drop-out of participants through successive waves of a prospective study. Each time individuals in a sample are re-contacted there is the risk that some will refuse to remain in the study, some will be untraceable, and some may have emigrated or died¹. In the United States the National Longitudinal Survey of Youth (1979)
is regarded as the gold standard for sample retention against which other surveys are evaluated (Olsen, 2005). Olsen reports that in 2002, 23 years after the first data collection, there were 9,964 respondents eligible for interview and of these 7,724 (77.5 percent) were successfully interviewed.
The prospective nature of the majority of longitudinal studies means that information will have been collected in earlier sweeps about members of the sample who are not contacted, or refuse participation, in later sweeps. This makes it possible to correct for possible distortion in results due to missing cases. In quantitative research weights may be applied or models may be constructed explicitly to adjust for missing data. In both qualitative and quantitative studies new members of the panel may be brought in, and/or studies may over-sample particular groups from the outset in anticipation of uneven attrition.
There are a number of ways in which sample retention can be maximised in longitudinal studies. These include: using targeted incentive payments; allowing respondents to choose the mode in which they are interviewed, i.e. by telephone or in a face-to-face interview (Olsen, 2005); collecting ‘stable addresses’ such as the address of parents or other relatives who are less likely to move than the respondent themselves and can subsequently be used to trace the respondent; making regular contact with respondents and asking them to confirm their current address and notify the research group of changes of address. Some of these techniques are used in qualitative longitudinal studies, but an important element in retention here is the relationship that is built up between the researcher(s) and the participants. In studies where the unit of analysis is a group or community rather than an individual, these issues are not so important.
ARCHIVING AND RE-USE OF DATA
Archiving and the secondary analysis of longitudinal data are already well established in the context of quantitative research. In particular the relatively high cost of conducting quantitative longitudinal studies makes it important that the fullest possible use is made of the data resource. In the past, archiving and use of archived qualitative data for substantive and theoretical re-enquiry have been relatively limited, and proposals for such developments provoked mixed reactions, although attitudes are changing (Holland et al., 2004; Parry and Mauthner, 2004: 139). Again there are differences within social science, with anthropology and oral history leading the field in archiving and re-use particularly of longitudinal material (Sheridan, 2000; Webb, 1996). The iterative, processual nature of qualitative research and consequent re-formulation and refinement of research questions over time also makes clear definition of secondary, as opposed to primary, analysis difficult and may, to some extent, explain the relative lack of secondary analysis of qualitative data (Hinds et al., 1997). The literature on the ethical, methodological and epistemological re-use of qualitative data and practical support for its archiving is, however, growing. A recent review of secondary analyses of qualitative data in health and social care research identified 55 studies, mostly North American, and six different types of qualitative secondary analysis based on variations in the purpose of the secondary analysis, the extent to which the primary and secondary research question differed and differences in the number and type of datasets re-used (Heaton, 2000, 2004).
ETHICAL CONSIDERATIONS IN LONGITUDINAL RESEARCH
Many of the ethical issues in longitudinal research are similar to those in cross-sectional research. Major concerns, for both qualitative and quantitative research, are around consent, confidentiality, anonymity and the distortion of life experience through repeated intervention. Concerns around confidentiality and anonymity tend to be amplified in the context of longitudinal research, where typically more
LONGITUDINAL AND PANEL STUDIES 237
to evaluate the relative importance of a number of different variables, or ‘covariates’ for predicting the chance, or hazard, of an event occurring. The hazard is a key concept in event history analysis, and is sometimes also referred to as the hazard rate or hazard function. It can be interpreted as the probability that an event will occur at a particular point in time, given that the individual is at risk at that time. The group of individuals who are at risk of the event occurring are therefore usually referred to as the risk set.

Approaches to event history modelling

One of the most common approaches within the social sciences is to use Cox’s proportional hazard models or ‘Cox Regression’ (Cox, 1972). This provides a method for modelling time-to-event data and allows the inclusion of predictor variables (covariates). For example, a model could be estimated for duration of marriage based on religiosity, age at marriage and level of education. Cox Regression will handle the censored cases correctly, and it will provide estimated coefficients for each of the covariates, allowing an assessment of the relative importance of multiple covariates and of any interactions between them. Cox regression is known as a continuous time approach because it is assumed that the time that an event occurs is measured accurately.
Even though the Cox model is one of the most popular and widely applied approaches it has two main disadvantages. First, it is relatively inflexible in terms of modelling duration dependence i.e. for specifying exactly how the hazard may change over time, and, second, it makes it difficult to incorporate time-varying covariates. For this reason, many researchers, with an explicit interest in how the probability of an event occurring changes over time, prefer to use a ‘discrete-time’ approach. This requires that the data have a specific format. A separate unit of analysis is created for each discrete time interval. Each record therefore corresponds to a person/month or person/year (depending on the accuracy with which events have been recorded). Once the data has been reconfigured in this way, the unit of analysis is transferred from being the individual case to being a person-year and logistic regression models can be estimated for the dichotomous dependent variable (whether the event occurred or not) using maximum likelihood methods (Allison, 1984). This approach facilitates inclusion of explanatory variables that vary over time because each year, or month, that an individual is at risk is treated as a separate observation. It is also easy to include more than one measure of duration. Discrete time methods are therefore thought to offer a preferable approach when the researcher wants to include several time-varying covariates. A good example is provided by Heaton and Call’s research on the timing of divorce (Heaton and Call, 1995). This analytic approach is also frequently used by those looking at recidivism and wanting to understand the timing and correlates of repeat offending (Baumer, 1997; Benda, 2003; Gainey et al., 2000).

Individual heterogeneity

A major limitation with the simple approach to the analysis of discretized longitudinal data described above, is that it does not take account of the fact that the unit of analysis is the ‘person-year’ and therefore the individual cases are not fully independent (as they should be for a logistic regression) but are clustered at the level of the person. For example, in an analysis modelling duration of marriage, an individual who had been married for 10 years would contribute 10 observations or ‘person-years’ to the dataset. Another way to understand this problem is to consider that there may be additional variables which have a strong association with the dependent variable but which are not included in the model. The existence of such ‘unobserved heterogeneity’ will mean that models are mis-specified and in particular spurious duration effects may be detected. The use of more sophisticated models including fixed or random effects models can overcome these problems and allow the researcher to produce more robust estimates of duration dependence. It is beyond the scope of this
chapter to discuss these models but for a more detailed introductory treatment see Elliott (2002), Davies (1994) and Box-Steffensmeier and Jones (2004).

Repeated measures analysis

In some quantitative longitudinal research the focus is not on the timing of events but rather on change in an individual attribute over time, for example weight, performance score, attitude, voting behaviour, reaction time, depression etc. In particular, psychologists often use repeated measures of traits, dispositions or psychological well-being to examine which factors may promote change or stability for individuals. This approach can also be used to investigate what type of effect a particular life event may have on individual functioning. For example, several studies examining the potential consequences of parental divorce for children have compared behavioural measures and measures of performance in mathematics and reading in addition to other outcomes, before and after a parental divorce (Cherlin et al., 1991; Elliott and Richards, 1991; Ni Bhrolchain et al., 1994).

CAUSALITY IN CROSS-SECTIONAL AND LONGITUDINAL RESEARCH

Information about the temporal ordering of events is generally regarded as essential if we are to make any claims about a causal relationship between those events. Given the importance of establishing the chronology of events in order to be confident about causality it can be seen that longitudinal data is frequently to be preferred over cross-sectional data. In some substantive examples even when data is collected in a cross-sectional survey, it is clear that one event or variable, precedes another. For example, in an analysis that focuses on the impact of school-leaving age on occupational attainment there is unlikely to be confusion about the temporal ordering of the variables. However, there are a number of examples where the use of cross-sectional survey data prevents researchers from determining the causal ordering of variables. For example, there is a considerable body of research that has shown a strong association between unemployment and ill health. This can either be interpreted to imply that unemployment causes poor health or that those who are in poor health are more likely to become unemployed and subsequently find it more difficult to find another job, i.e. there is a selection effect such that ill health might be described as causing unemployment (Bartley, 1991; Blane et al., 1993). In this case, longitudinal data would be needed to follow a sample of employed individuals and determine whether their health deteriorated if they became unemployed, or conversely whether a decline in health led to an increased probability of becoming unemployed (for examples which make use of longitudinal data to untangle this issue see Montgomery et al., 1996 and 1999).
In quantitative studies, longitudinal data is also valuable for overcoming the problems of disentangling maturational effects and generational effects. As Dale and Davies (1994) explain, cross-sectional data that examines the link between age and any dependent variable confounds cohort and life course effects. As was discussed above, using the example of political allegiances, one advantage of having longitudinal data on a number of separate cohorts is that it enables the researcher to disentangle these effects.
Perhaps the major advantage of longitudinal data over cross-sectional data in understanding the possible causal relationships between variables is its ability to take account of omitted variables. Quantitative longitudinal data enables the construction of models that are better able to take account of the complexities of the social world and the myriad influences on individuals’ behaviour.
Qualitative researchers can be reluctant to use the term causality, seeing it as intrinsically part of a quantitative paradigm. Understanding phenomena in time enables a researcher to capture meaning, intention and consequence, rather than findings true for all times and places (Gergen, 1984). But some argue that because of its attention to
detail, process, complexity and contextuality, qualitative research is particularly valuable for identifying and understanding causal and multi-causal linkages, especially in relation to the temporal dimension of the longitudinal approach (Mason, 2002; Miles and Huberman, 1994). Again, whilst this might be the case, qualitative longitudinal researchers would not necessarily refer to their findings in this way; in ethnography, for example, causal theories are common if implicit. The focus on the meaning of experience for a participant active in the construction of her/his own identity and reflexive narrative of self could lead to explanations that might identify ‘causal’ or ‘multicausal’ sequences. Pollard and Filer (1999) in a study of primary school children’s identities and careers eschew a focus on the academic and social outcomes usually associated with school achievement, and what inputs would produce that output. Taking a holistic approach they highlight the dynamic, recursive nature of pupil experience, seeing these children as continuously shaping and maintaining their identity and status as a pupil as they move through different school settings, in a dynamic, fluctuating process, open to possibilities for change in varying degrees. Many elements are identified as contributing in various ways to this reflexive pupil identity – gender, social class and ethnicity, material, cultural and linguistic resources, physical and intellectual capability and potential and multiple and various experiences in school observed in the study. This is clearly a different understanding of causality than that found in quantitative approaches.
Even though the event history techniques described in the section on quantitative analysis above are powerful and flexible, they still have the disadvantage that they do not deal with sequences holistically. An alternative approach to the analysis of event history data is not to attempt to model the underlying processes, but rather to establish a systematic description or typology of the most commonly occurring patterns or sequences within them (Abbott, 1990, 1992). This approach has been termed ‘narrative positivism’. Abbott introduced the set of techniques known as Optimal Matching Analysis into sociology from molecular biology, where it had been used in the study of DNA and protein sequences. He has applied the method to substantive issues including the careers of musicians (Abbott and Hrycak, 1990), and the development of the welfare state (Abbott and DeViney, 1992). Following his lead, other sociologists have also begun to adopt this approach and in particular have found the method to be useful for the analysis of careers (Blair-Loy, 1999; Chan, 1995; Halpin and Chan, 1998; Stovel et al., 1996). However the technique is not as well developed or as widely used as the modelling approaches described above (Wu, 2000).
It is perhaps in this approach, which aims to provide a detailed description of the different types of pathways or trajectories followed by individuals, that qualitative and quantitative approaches to analysis of longitudinal data come closest. Abbott’s approach uses large samples and utilises sophisticated software to construct clusters of cases with similar longitudinal profiles. However, the research question addressed using this technique mirrors the type of research questions that form the focus of many qualitative longitudinal studies, although the two approaches provide rather different types of data on such trajectories.
The analysis of quantitative data largely involves statistical modelling of large datasets to identify patterns and relationships in the data at an aggregated level to be able to make probabilistic statements about particular populations. As we have just seen, more recent
holistic approaches are attempting to deal with describing and classifying individual trajectories through time, through clustering techniques. Qualitative longitudinal data provides a different type of detailed information about processes through time for individuals or groups of varying sizes, requiring different analytic strategies. These methods of analysis will also vary, depending on the discipline, the theoretical approach and the unit of analysis. A key aspect of qualitative longitudinal analysis in general, however, is that it is theoretically driven, and is characterised by a focus on meaning.
Saldana highlights colourfully the problems of analysis for the qualitative longitudinal researcher:

  The challenge for qualitative researchers is to rigorously analyze and interpret primarily language-based data records to describe credibly, vividly, and persuasively for readers through appropriate narrative the processes of participant change through time. This entails the sophisticated transformation and integration of observed human interactions in their multiple social contexts into temporal patterns or structures. (Saldana, 2003: 46)

The analysis of qualitative longitudinal research must then engage with and capture time, process and change. It requires working in two temporal dimensions: diachronically, through time, and synchronically cross-cutting at one point in time, and the articulation of these two through a third, integrative dimension. This is recognised as crucial for analysing change through time (Saldana, 2003). Even though both qualitative and quantitative longitudinal traditions have realised such analyses, this remains a challenging task both to execute and to describe. Here are some of the general approaches mooted.
Wolcott (1994) suggests three stages of increasing abstraction for the analytic process: description, analysis and interpretation. Description involves recording, chronicling and describing what kinds of change occur, in whom or what, at what time and in what context. Analysis accomplishes ‘the identification of essential features and the systematic interrelationships among them’ (Wolcott, 1994: 12). Finally, ‘Explaining the nature and meaning of those changes, or developing a theory with transferability of the study’s findings to other contexts, is the final stage of interpretation’ (Saldana, 2003: 63).
Saldana elaborates the Wolcott schema in his guidebook for qualitative longitudinal research, providing framing, descriptive, analytic and interpretive questions to guide the analytic process. ‘Framing questions’ (p. 63) address and manage the contexts of the particular study’s data, locating them in the process (e.g. what contextual and intervening conditions appear to influence and affect participant changes through time?). Descriptive questions (e.g. what increases or emerges through time? What kinds of surges or epiphanies occur through time?) generate information to help answer the framing questions, and the more complex analytic and interpretive questions. Analytic and interpretive questions integrate the descriptive information to guide the researcher to richer levels of analysis and interpretation (e.g. which changes interrelate through time? What is the through-line of the study? The through-line is ‘a single word, a phrase, a sentence, or a paragraph with an accompanying narrative that describes, analyzes, and/or interprets the participant’s changes through time by analyzing its thematic flow – its qualitative trajectory’ (Saldana, 2003: 151, see too Saldana, 2005).
Thomson and Holland (2003) provide an example of an analysis attempting two of the dimensions suggested above in their 10-year study of 100 young people’s transitions to and constructions of adulthood, Inventing Adulthoods. The cross-sectional analysis captures a moment in time in the life of the sample (at each interview or data generation point) to identify discourses through which identities are constructed. In this case the data was coded descriptively and conceptually (using NUD.IST²) to enable comparison across the sample on the basis of a range of factors, e.g. age, gender, social class, geographical location. These analyses form a repeat cross-sectional study on the same sample and analyses can be compared for change over
time, and each contextualised in social and historical time. They highlight differences and similarities within the sample, and help identify the relationship between individual narratives and wider social processes. The longitudinal analysis consists of examining the development of a particular narrative for each case over the course of the study, following the complexity and contingency of individual trajectories, and identifying critical moments and change. This individual temporal analysis can also be related to social and historical time and change. More recently Thomson (2007) has described the process of constructing longitudinal case histories.
Drawing on a significant body of policy evaluation research, Lewis (2005) outlines a multi-dimensional approach to qualitative longitudinal data analysis built around the ‘framework’ approach to qualitative analysis developed by the National Centre for Social Research (Ritchie and Lewis, 2003). Changes in evaluation studies are identified as occurring at the individual, service and policy levels. Change is manifest in a literal way through the chronology of the account, yet it is also evident in how this chronology is reinterpreted by a research participant over time. Lewis suggests that qualitative longitudinal data are characterised by ‘discordant data’ where subsequent re-interpretations conflict with original accounts. To complicate matters further, not only does the participant reinterpret their story, but the researcher also reinterprets their analysis in the light of new revelation and the passage of time. Lewis maps each longitudinal case within a two-by-two frame that enables them to plot a series of interviews with a single participant (vertical axis) against themes (horizontal axis). In a similar way to that described by Thomson and Holland (2003), the analysis proceeds in two directions: horizontally across themes and vertically through a case over time, as well as ‘zigzagging’ between themes and interviews within a single case to trace the development of a theme over time. But in order to move away from the single case to the wider dataset, Lewis encourages an approach to working with whole cases: undertaking comparison between cases and between groups of cases, asking questions such as why and how might something that is present in one case (or group) be absent in another?
As we can see, qualitative longitudinal studies produce complex and multi-dimensional datasets, which in turn demand innovative strategies for data analysis and display that operate on more than two dimensions.

CONCLUSIONS: THE CONSTRUCTION OF THE INDIVIDUAL IN QUALITATIVE AND QUANTITATIVE LONGITUDINAL RESEARCH

As we discussed earlier, one of the main advantages of both qualitative and quantitative longitudinal research is the ability to track individual lives through time. In quantitative longitudinal research a priority is placed on collecting accurate data from a large representative sample about the nature and timing of life events, circumstances and behaviour. In qualitative longitudinal research the emphasis is far more on individuals’ understanding of their lives and circumstances and how these may change through time.
Even though both qualitative and quantitative longitudinal research have the potential to provide very detailed information about individuals, what is obscured in the quantitative approach are the narratives that individuals tell about their own lives. While complex biographical case studies can be developed from survey data (Sampson and Laub, 1993; Singer et al., 1998), these accounts are clearly authored by the researcher and allow no access to the reflexivity of the respondents themselves. In contrast, in qualitative longitudinal research the whole emphasis of the study may be on understanding the reflexive process of identity work accomplished by individuals (Pollard and Filer, 1999; Thomson and Holland, 2003). It is important to be clear therefore that whereas the criticism that quantitative research is less detailed than qualitative research may be misplaced (particularly in
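The ‘person-year’ restructuring described earlier under ‘Approaches to event history modelling’ can be illustrated with a minimal Python sketch. It assumes a hypothetical input of one record per person (an identifier, the number of years observed, and whether the event was observed or the case was censored); the function names and the four example marriages are invented for illustration and are not taken from any study cited in this chapter. The sketch expands each record into one row per year at risk and then estimates the hazard in each year as the number of events divided by the size of the risk set.

```python
from collections import defaultdict


def to_person_years(spells):
    """Expand one record per person into one row per person-year.

    Each spell is (person_id, duration_in_years, event_observed).
    A person observed for d years contributes d rows. The event flag is 1
    only in the final year, and only if the event was observed
    (censored cases contribute rows with event = 0 throughout).
    """
    rows = []
    for person_id, duration, event_observed in spells:
        for year in range(1, duration + 1):
            event = 1 if (event_observed and year == duration) else 0
            rows.append({"person": person_id, "year": year, "event": event})
    return rows


def discrete_hazard(rows):
    """Hazard in each year = events in that year / size of the risk set."""
    at_risk = defaultdict(int)
    events = defaultdict(int)
    for row in rows:
        at_risk[row["year"]] += 1
        events[row["year"]] += row["event"]
    return {year: events[year] / at_risk[year] for year in sorted(at_risk)}


# Hypothetical marriages: (id, years observed, ended in divorce?)
spells = [("a", 2, True), ("b", 3, False), ("c", 3, True), ("d", 1, True)]
person_years = to_person_years(spells)
hazard = discrete_hazard(person_years)

for year, value in hazard.items():
    print(f"year {year}: hazard = {value:.2f}")
```

Rows in this format could then be passed to a logistic regression routine, with time-varying covariates added as further keys on each row; that modelling step, and any correction for the clustering of person-years within individuals, is not shown here.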
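The holistic comparison of sequences that underlies Optimal Matching Analysis, discussed earlier in the chapter, can also be sketched briefly. In one common formulation the dissimilarity between two sequences is the minimum total cost of the insertions, deletions and substitutions needed to turn one into the other. The Python sketch below assumes that formulation; the cost values, the function name and the state codes (S for study, E for employed, U for unemployed) are illustrative assumptions, not the settings used by Abbott or by any of the studies cited.

```python
def om_distance(seq_a, seq_b, indel=1.0, substitution=2.0):
    """Minimum cost of transforming seq_a into seq_b.

    Uses dynamic programming over prefixes: dist[i][j] holds the cheapest
    way to turn the first i states of seq_a into the first j of seq_b.
    The indel and substitution costs are assumed values for illustration.
    """
    n, m = len(seq_a), len(seq_b)
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * indel  # delete every state of the prefix
    for j in range(1, m + 1):
        dist[0][j] = j * indel  # insert every state of the prefix
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = seq_a[i - 1] == seq_b[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + indel,  # deletion
                dist[i][j - 1] + indel,  # insertion
                dist[i - 1][j - 1] + (0.0 if same else substitution),
            )
    return dist[n][m]


# Hypothetical five-year careers, one state per year.
careers = {"p1": "SSEEE", "p2": "SEEEE", "p3": "UUUEE"}
names = sorted(careers)
matrix = {
    (a, b): om_distance(careers[a], careers[b]) for a in names for b in names
}

for a in names:
    print(a, [matrix[(a, b)] for b in names])
```

A matrix of pairwise distances of this kind is the sort of input that a clustering procedure would then use to group cases with similar longitudinal profiles; the clustering step itself is omitted.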
International Journal of Offender Therapy and Comparative Criminology 47,1: 89–110.
Berthoud, R. and Iacovou, M. (2002). Diverse Europe: Mapping patterns of social change across the EU, Economic and Social Research Council.
Blackwell, L., Lynch, K., Smith, J. and Goldblatt, P. (2003). ‘Longitudinal Study 1971–2001: Completeness of Census Linkage’ (Series LS No. 10) (PDF 841K), http://www.celsius.lshtm.ac.uk/2001_data.html
Blair-Loy, M. (1999). ‘Career Patterns of Executive Women in Finance.’ American Journal of Sociology 104: 1346–1397.
Blane, D., Smith, G. and Bartley, M. (1993). ‘Social Selection: What Does it Contribute to Social Class Differences in Health.’ Sociology of Health and Illness 15,1: 1–15.
Blossfeld, H.-P. and Rohwer, G. (1995). Techniques of event history modeling: New approaches to causal analysis. Mahwah, NJ: Lawrence Erlbaum Associates.
Box-Steffensmeier, J. and Jones, B. (2004). Event history modeling, Cambridge: Cambridge University Press.
Brown, L.M. and Gilligan, C. (1992). Meeting at the crossroads: Women’s psychology and girls’ development, Cambridge, MA: Harvard University Press.
Bynner, J. and Fogelman, K. (1993). Making the grade: education and training experiences, in Ferri, E. (ed.) Life at 33: The fifth follow-up of the National Child Development Study, London: National Children’s Bureau, pp. 36–59.
Caccamo, R. (2000) Back to Middletown: Three generations of sociological reflections, Stanford: Stanford University Press.
Caplow, T. and Bahr, H. M. (1979) ‘Half a Century of Change in Adolescent Attitudes: Replication of a Middletown Survey by the Lynds.’ Public Opinion Quarterly 43,1: 1–17.
Caplow, T., Bahr, H. M., Chadwick, B. A., Hill, R. and Williamson, M. H. O. (1982). Middletown families: Fifty years of change and continuity, Minneapolis, MN: University of Minnesota Press.
Chan, T.-W. (1995). ‘Optimal Matching Analysis.’ Work and Occupations 22: 467–490.
Cherlin, A. J., Furstenberg, F., Chase-Landsdale, P. L. and Kiernan, K. (1991). ‘Longitudinal Studies of Effects of Divorce on Children in Great Britain and the United States.’ Science Technology & Human Values 252: 1386–1389.
Cliggett, L. (2002). ‘Multigenerations and Multidisciplines: Inheriting Fifty Years of Gwembe Tonga Research,’ in Kemper, R. and Royce, A. P. (eds) Chronicling cultures: Long-term field research in anthropology, Walnut Creek, CA: AltaMira, pp. 239–251.
Cox, D. R. (1972). ‘Regression Models and Life Tables.’ Journal of the Royal Statistical Society B 34: 187–202.
Crow, G. (2002) ‘Community Studies: Fifty Years of theorization.’ Sociological Research Online 7,3, http://www.socresonline.org.uk/7/3/crow.html
Crow, G. and Allen, G. (1994). Community life: An introduction to local social relations, London: Harvester Wheatsheaf.
Cutting, A. L. and Dunn, J. (1999). ‘Theory of Mind, Emotion Understanding, Language and Family Background: Individual Differences and Inter-relations.’ Child Development 70: 853–865.
Dale, A. and Davies, R. (1994). Analyzing social and political change: A casebook of methods, London: Sage.
Davies, R. B. (1994). ‘From cross-sectional to longitudinal analysis,’ in Dale, A. and Davies, R. B. (eds) Analyzing social and political change: A casebook of methods, London: Sage, pp. 20–40.
Devine, F. (1992) Affluent workers revisited: Privatism and the working class, Edinburgh: Edinburgh University Press.
Dex, S. (1995). ‘The Reliability of Recall Data: A Literature Review.’ Bulletin de Methodologie Sociologique 49: 58–80.
Dex, S. and Joshi, H. (2005). Children of the 21st century: from birth to nine months, Bristol: The Policy Press.
Dex, S. and McCulloch, A. (1998). ‘The reliability of retrospective unemployment history data.’ Work Employment and Society 12,3: 497–509.
Du Bois-Reymond, M. (1998). “‘I don’t want to commit myself yet”: Young people’s life concepts.’ Journal of Youth Studies 1,1: 63–79.
Dwyer, P. J. and Wyn, J. (2001). Youth, education and risk: Facing the future, London: RoutledgeFalmer.
Elder, G. and Conger, R. D. (2000). Children of the land: Adversity and success in rural America, Chicago: University of Chicago Press.
Elder, G. H. (1974). Children of the great depression: social change in life experience, Chicago: University of Chicago Press.
Elliott, B. J. (2002). ‘The Value of Event History Techniques for Understanding Social Processes: Modelling Women’s Employment Behaviour After Motherhood.’ International Journal of Social Research Methodology 5,2: 107–132.
Elliott, B. J. and Richards, M. P. M. (1991). ‘Children and Divorce: Educational Performance and Behaviour Before and After Parental Separation.’ International Journal of Law and the Family 5: 258–276.
Elliott, J. (2005). Using narrative in social research: Qualitative and quantitative approaches, London: Sage.
Farrall, S. (2004). ‘Social Capital and Offender Reintegration: Making Probation Desistance Focussed,’ in Maruna, S. and Immarigeon, R. (eds) After crime and punishment: Ex-offender reintegration and desistance from crime, Cullompton: Willan.
Featherman, D. L. (1980). ‘Retrospective Longitudinal Research: Methodological Considerations.’ Journal of Economics and Business 32: 152–169.
Ferri, E., Bynner, J. and Wadsworth, M. (2003). Changing Britain, changing lives: three generations at the turn of the century, London: Institute of Education.
Foster, G. M., Scudder, T., Colson, E. and Kemper, R. (1979). Long-term field research in social anthropology, New York: Academic Press.
France, A., Bendelow, G. and Williams, S. (2000) ‘A “Risky” Business: Researching the Health Beliefs of Children and Young People,’ in Lewis, A. and Lindsay, G. (eds) Researching children’s perspectives, Buckingham: Open University Press, pp. 231–263.
Gainey, R. R., Payne, B. K. and O’Toole, M. (2000). ‘The Relationship Between Time in Jail, Time on Electronic Monitoring, and Recidivism: an Event History Analysis of a Jail-Based Program.’ Justice Quarterly 17,4: 733–752.
Gergen, K. J. (1984). ‘An Introduction to Historical Social Psychology,’ in Gergen, K. J and Gergen, M. M. (eds) Historical social psychology, London: NJ: Lawrence Erlbaum Associates.
Giele, J. Z. (1998). Innovation in the typical life course. Methods of life course research: qualitative and quantitative approaches. J. Z. Giele and G. H. Elder. London: Sage, pp. 231–263.
Giele, J. Z. and Elder, G. H. (1998). Methods of life course research: qualitative and quantitative approaches, Thousand Oaks, CA: Sage.
Gilligan, C. (1993). In a Different Voice: Psychological Theory and Women’s Development, Cambridge, MA: Harvard University Press.
Goldthorpe, J. H., Lockwood, D., Bechofer, F. and Platt, J. (1968). The affluent worker in the class structure, Cambridge: Cambridge University Press.
Gorell-Barnes, L. G., Thompson, P., Barnes, P., Daniel, G. and Burchardt, N. (1998). Growing up in stepfamilies. Oxford: Oxford University Press.
Gordon, T., Holland, J. and Lahelma, E. (2000). Making spaces: Citizenship and difference in schools, London: Macmillan.
Gubrium, J. F. and Holstein J. A. (1995). ‘Individual Agency, The Ordinary and Postmodern Life.’ Sociological Quarterly 36,3: 555–570.
Gulbrandsen, L. M. (2003). ‘Peer Relations as Arenas for Gender Constructions Among Young Teenagers.’ Pedagogy, Culture and Society 11,1: 113–132.
Halbwachs, Maurice (1992). On collective memory. Translated and edited by Lewis A. Coser. Chicago: University of Chicago Press.
Halpin, B. and Wing Chan, T. (1998). ‘Class Careers as Sequences: an Optimal Matching Analysis of Work-Life Histories.’ European Sociological Review 14,2: 111–130.
Heaton, J. (2000). Secondary analysis of qualitative data: a review of the literature, Full Research report ESRC 1752 (8.00), Social Policy Research Unit, University of York
Heaton, J. (2004). Re-working qualitative data, London: Sage.
Heaton, T. B. and Call, V. R. A. (1995). ‘Modeling Family Dynamics with Event History Techniques.’ Journal of Marriage and the Family 57: 1078–1090.
Henderson, S., Holland, J., McGrellis, S., Sharpe, S. and Thomson, R. (2007). Inventing adulthood: A biographical approach to youth transitions, London: Sage.
Hinds, P., Vogel, R. and Clarke-Steffen, L. (1997). ‘The Possibilities and Pitfalls of Doing a Secondary Analysis of a Qualitative Data Set.’ Qualitative Health Research 7,3: 408–424.
Holland, J., Thomson, R. and Henderson, S. (2004). Feasibility study for a possible qualitative longitudinal study, Specification and Discussion Paper for Economic and Social Research Council, UK.
Hughes, C. and Dunn, J. (2002). “‘When I Say a Naughty Word”. A Longitudinal Study of Young Children’s Accounts of Anger and Sadness in Themselves and Close Others.’ British Journal of Developmental Psychology 20, 515–535.
Jacobs, S. C. (2002). ‘Reliability and Recall of Unemployment Events Using Retrospective Data.’ Work, Employment and Society 16,3: 537–548.
Kemper, R. and Royce, A. P. (eds) (2002) Chronicling cultures: Long-term field research in anthropology, Walnut Creek, CA: AltaMira.
Kuhn, T. and Witzel, A. (2000). School-to-work Transition, Career Development and Family Planning – Methodological Challenges and Guidelines of a Qualitative Longitudinal Panel Study. Forum: Qualitative Social Research 1, 2: http://www.qualtative–research.net/fqs-texte/2-00/2-00kuehnwitzel-e.htm
Lancaster, T. (1990). The econometric analysis of transition data, Cambridge: Cambridge University Press.
Laub, J. H. and Sampson, R. J. (1998). ‘Integrating Quantitative and Qualitative Data,’ in Giele, J. Z. and Elder, G. H. (eds) Methods of life course research: qualitative and quantitative approaches, Thousand Oaks, CA: Sage, pp. 213–230.
Laub, J. H. and Sampson, R. J. (2003). Shared Mott, F. (2002). ‘Looking Backward: Post hoc Reflections
beginnings, divergent lives: Delinquent boys to age on Longitudinal Surveys,’ in Phelps E., Furstenberg, F.
70, Cambridge, MA: Harvard University Press. and Colby A. (eds) Looking at lives: American
Lewis, J. (2005). ‘Qualitative Longitudinal Data for longitudinal studies of the twentieth century,
Evaluation Studies,’ SPRU (University of York) and New York: Russell Sage.
CASP (University of Bath), Friends Meeting House, Mumford, K. and Power, A. (2003). East Enders: Family
London 11th November 2005. and community in East London, Bristol: Policy Press.
Lynd, R. and Lynd, H. M. (1929). Middletown. A study Neale, B. and Flowerdew, J. (2003). ‘Time, Texture and
in American Culture, New York: Harcourt Brace. Childhood: The Contours of Longitudinal Qualitative
Lynd, R. and Lynd, H. M. (1935). Middletown in Research.’ International Journal of Social Research
transition: A study of cultural conflicts, New York: Methodology: Theory and Practice 6,3: 189–199.
Harcourt Brace. Ni Bhrolchain, M., Chappell, R. and Diamond, I.
Mannheim, Karl (1956). ‘On the Problem of Genera- (1994). ‘Educational and Socio-demographic Out-
tions,’ in Essays on the sociology of culture. New York: comes Among Children of Disrupted and Intact
Oxford University Press. Marriages.’ Population 36: 1585–1612.
Mason, J. (2002) (2nd edn.). Qualitative researching, Olsen, R. J. (2005). ‘The Problem of Respondent
London: Sage. Attrition: Survey Methodology is Key.’ Monthly Labor
Mauthner, N., Parry, O. and Backett-Milburn, K. (1998). Review 128,2: 63–70.
‘The Data are Out There, or are They? Implications for Parry, O. and Mauthner, N. (2004). ‘Whose Data
Archiving and Revisiting Qualitative Data.’ Sociology are They Anyway? Practical, Legal and Ethical
32,4: 733–745. Issues in Archiving Qualitative Data.’ Sociology 38,1:
McGonagle, K. A. and Schoeni, R. F. (2006). ‘The Panel 139–152.
Study of Income Dynamics: Overview & Summary Patton, M. Q. (1990). Qualitative evaluation and
of Scientific Contributions After Nearly 40 Years.’ research methods (2nd ed.), Newbury Park, CA: Sage.
Retrieved March 2006, from http://psidonline. Pink, S. (ed.) (2004a). Visual images, London:
isr.umich.edu/Publications/Papers/montrealv5.pdf Routledge.
McLeod, J. (2003). ‘Why We Interview Now – Pink, S. (2004b). Home truths: Gender, domestic objects
Reflexivity and Perspective in a Longitudinal Study.’ and the home, Oxford: Berg.
International Journal of Social Research Methodology Plumridge, L. (2001). ‘Rhetoric, Reality and Risk
6,3: 223–232. Outcomes in Sex Work.’ Health, Risk and Society 3,2:
McLeod, J. and Yates, L. (2006). Making modern lives: 119–215.
Subjectivity, schooling and social change, Albany: Plumridge, L. and Thomson, R. (2003). ‘Longitudinal
State University of New York Press. Qualitative Studies and the Reflexive Self.’ Interna-
Miles, M. B. and Huberman, A. M. (1994). Qualitative tional Journal of Social Research Methodology 6,3:
data analysis: An expanded sourcebook (2nd edn), 213–222.
London: Sage. Pollard, A. and Filer, A. (1999). The social world of pupil
Molloy, D. and Woodfield, K. with Bacon, J. (2002). career: Strategic biographies through primary school,
Longitudinal qualitative research approaches in London: Cassell.
evaluation studies, Working Paper No. 7, London: Pollard, A. and Filer, A. (2002). Identity and secondary
HMSO. schooling project. Full report to the ESRC.
Montgomery, S. M., Bartley, M. J., Cook, D. G. and Qualitative Sociology (Spring 1997) 20 (1) Special Issue:
Wadsworth, M. (1996). ‘Health and Social Precursors Visual methods in sociological analysis.
of Unemployment in Young Men in Great Britain.’ Ritchie, J. and Lewis, J. (2003). Qualitative research
Journal of Epidemiology and Community Health 50, practice: A guide for social science students and
415–422. researchers, London: Sage.
Montgomery, S. M., Cook, D. G., Bartley, M. J. and Ronai C. R. and Cross, R. (1998). ‘Dancing With Identity:
Wadsworth, M. (1999). ‘Unemployment Pre-dates Narrative Resistance Strategies of Male and Female
Symptoms of Depression and Anxiety Resulting in Stripteasers.’ Deviant Behaviour 19: 99–119.
Medical Consultation in Young Men.’ International Royce, A. P. (1977). The anthropology of dance,
Journal of Epidemiology 28,1: 95–100. Bloomington: Indiana University Press.
Morse, J. M. (1994). ‘Designing Funded Qualitative Royce, A. P. (1982). Ethnic identity: strategies of
Research,’ in Denzin, N. L. and Lincoln, Y. S. diversity, Bloomington: Indiana University Press.
(eds) Handbook of qualitative research, London: Royce, A. P. (1993). ‘Ethnicity, Nationalism, and the Role
Sage. of the Intellectual,’ in Toland, Judith D. (ed.) Ethnicity
LONGITUDINAL AND PANEL STUDIES 247
and the state, political and legal anthropology, Vol. 9, Years on,’ Forum Qualitative Sozialforschung/Forum:
New Brunswick, NJ: Transaction Press, pp.103–122. Qualitative Social Research, 1,3. Available at:
Royce, A. P. (2002). ‘Learning to See, Learning to http://qualitative-research.net/fqs/fqs-eng.htm
Listen: Thirty-five Years of Fieldwork with the Isthmus Singer, B., C. D. Ryff, D. Carr and Magee, W. J. (1998).
Zapotec,’ in Kemper, R. V. and Royce, A. P. ‘Linking Life Histories and Mental Health: A Person
(eds) Chronicling cultures: Long-term field research Centred Strategy.’ Sociological Methodology 28: 1–51.
in anthropology, Walnut Creek: Altamira Press, Smith, D. J. and McVie, S. (2003). ‘Theory and Method in
pp. 8–33. the Edinburgh Study of Youth Transitions and Crime.’
Royce, A. P. (2005). ‘The Long and the Short of British Journal of Criminology 43,1: 169–195.
it: Benefits and Challenges of Long-Term Ethno- Stacey, M. (1960). Tradition and change: A study of
graphic Research.’ Paper presented at Principles of Banbury, Oxford: Oxford University Press.
Qualitative Longitudinal Research: An International Stacey, M., Batstone, E., Bell, C. and Murcott, A. (1975).
Seminar, University of Leeds, UK, September 30, Power, persistence and change: A second study of
2005. Banbury, London: Routledge & Kegan Paul.
Ruspini, E. (2002). Introduction to longitudinal research, Stovel, K., Savage, M. and Bearman, P. (1996).
London: Routledge. ‘Ascription into Achievement: Models of Career
Ryder, N. B. (1965). ‘The Cohort as a Concept in Systems at Lloyds Bank, 1890–1970.’ American
the Study of Social Change.’ American Sociological Journal of Sociology 102,2: 358–399.
Review 30: 843–861. Taris, T. W. (2000). A primer in longitudinal data
Saldana, J. (2003). Longitudinal qualitative research: analysis, London: Sage.
Analyzing change through time, Walnut Creek, Thomson, R. (2007). ‘The QL ‘Case History’: Practical,
Lanham, New York, Oxford: Altamira Press. Methodological and Ethical Reflections.’ Social Policy
Saldana, J (2005). ‘Coding Qualitative Data to Analyze and Society 6,4.
Change.’ Paper presented at Principles of Qualitative Thomson, R. and Holland, J. (2003). ‘Hindsight,
Longitudinal Research: An International Seminar, Foresight and Insight: The Challenges of Longitudinal
University of Leeds, UK, September 30, 2005. Qualitative Research.’ International Journal of Social
Sampson, R. J. and Laub, J. H. (1993). Crime in the Research Methodology 6,3: 233–244.
making: pathways and turning points through life, Thomson, R. and Holland, J. (2005). “‘Thanks for
Cambridge, MA: Harvard University Press. the Memory”: Memory Books as a Methodological
Savage, M. and Egerton, M. (1997). ‘Social Mobility, Resource in Biographical Research.’ Qualitative
Individual Ability and the Inheritance of Class Research 5,2: 201–291.
Inequality.’ Sociology 31,4: 465–472. Tuma, N. B. and Hannan, M. T. (1979). ‘Dynamic
Schoon, I. and Parsons, S. (2002) ‘Competence Analysis of Event Histories.’ American Journal of
in the Face of Adversity: The Impact of Early Sociology 84,4: 820–854.
Family Environment and Long-term Consequence.’ Vogt, E. Z. (1957). ‘The Acculturation of the American
Children & Society 16,4, 260–272. Indians.’ Annals of American Academy of Political and
Scott, J. and Alwin, D. (1998). ‘Retrospective Versus Social Science 311: 137–146.
Prospective Measurement of Life Histories in Lon- Vogt, E. Z. (1969) Zinacantan: A Maya community in
gitudinal Research, in Giele, J. Z. and Elder, G. H. the Highlands of Chiapas, Cambridge, MA: Bellknap
(eds) Methods of life course research: qualitative and Press of Harvard University Press.
quantitative approaches, Thousand Oaks, CA: Sage, Vogt, E. Z. (1994). Fieldwork among the Maya: Reflec-
pp. 98–127. tions on the Harvard Chiapas Project, Albuquerque:
Scudder, T. and Colson, E. (1979). ‘Long-term Research University of New Mexico Press.
in Gwembe Valley, Zambia,’ in Foster G. M., Vogt, E. Z. (2002). ‘The Harvard Chiapas Project;
Scudder, T., Colson, E. and Kemper R. V. (eds) Long- 1957–2000,’ in Kemper, R. and Royce, A. P.
term field research in social anthropology, New York: (eds) Chronicling cultures: Long-term field research
Academic Press, pp. 277–254. in anthropology, Walnut Creek, CA: AltaMira,
Scudder, T. and Colson, E. (2002) ‘Long-term Research pp. 135–159.
in Gwembe Valley, Zambia,’ in Kemper, R. V. and Wajcman J. and Martin B. (2002). ‘Narratives of Identity
Royce, A. P. (eds) Chronicling cultures: Long-term in Modern Management: the Corrosion of Gender
field research in Anthropology, Walnut Creek, CA: Difference?’ Sociology 36: 985–1002.
AltaMira, pp. 197–238. Walkerdine, V. and Lucey, H. (1989). Democracy in
Sheridan, Dorothy (2000). ‘Reviewing Mass- the kitchen: Regulating mothers and socialising
Observation: The Archive and its Researchers Thirty daughters, London: Virago.
248 THE SAGE HANDBOOK OF SOCIAL RESEARCH METHODS
Walkerdine, V., Lucey, H. and Melody, J. (2001). Wolcott, H. F. (1994). Transforming qualitative data:
Growing up girl: Psychosocial explorations of gender Description, analysis, and interpretation, Thousands
and class, Houndmills: Palgrave. Oaks, CA: Sage.
Ward, J. and Henderson, Z. (2003). ‘Some Practical Woodgate, R., Degner, L. and Yanofsky, R. (2003).
and Ethical Issues Encountered While Conducting ‘A Different Perspective to Approaching Cancer
Tracking Research with Young People Leaving the Symptoms in Children.’ Journal of Pain and Symptom
“Care” System.’ International Journal of Social Management, 26,3: 800–817.
Research Methodology 6,3: 255–259. Wu, L.L. (2000). ‘Some Comments on “Sequence
Warwick, D. and Littlejohn, G. (1992). Coal, capital and Analysis and Optimal Matching Methods in Sociology:
culture: A sociological analysis of mining communities Review and Prospect”.’ Sociological Methods and
in West Yorkshire, London: Routledge. Research 29,1: 41–64.
Webb, C. (1996) ‘To Digital Heaven? Preserving Yamaguchi, K. (1991). Event History Analysis. Newbury
Oral History Recordings at the National Library Park, CA: Sage.
of Australia.’ Staff paper, http://www.nla.gov.au/ Yates, L. and McLeod, J. (1996). “‘And How Would
nla/staffpaper/archive/index1996.html You Describe Yourself?” Researchers and Researched
White, R. and Wyn, J. (2004). Youth and society: in the First Stages of a Qualitative, Longitudinal Research
Exploring the social dynamics of youth experience, Project.’ Australian Journal of Education 40,1: 88–103.
Oxford: Oxford University Press. Yates, L., McLeod, J. and Arrow, M. (2002). Self, school
Whyte, W.F. (1943 2nd edition 1955). Street Corner and the future: The 12 to 18 Project, University
Society: The social structure of an Italian slum, of Technology, Sydney, Changing Knowledges
Chicago: University of Chicago Press. Changing Identities Research Group.
Wirth, L. (1938). ‘Urbanism as a Way of Life.’ American Young, M. and Willmott, P. (1957). Family and kinship
Journal of Sociology, 44: 1–24. in East London, London: Routledge and Kegan Paul.
15
Comparative and
Cross-National Designs
David de Vaus
It can be argued that virtually all social research is comparative in that descriptions and explanations are derived from comparisons of groups, cases, periods or some other unit of analysis (Przeworski and Teune 1966). This chapter focuses on one type of comparative research – that which is based on cross-national comparisons. The discussion concentrates on two main matters.

First it outlines the nature and purpose of comparative cross-national research designs and how this broad design relates to other major types of research design. The purpose of this discussion is to argue that while most research can be considered comparative, there are quite distinctive elements of comparative cross-national research that deserve special attention.

The second goal of the chapter is to describe and evaluate two broad forms of comparative cross-national research – case based and survey based. Apart from demonstrating that comparative cross-national designs come in two main forms, the purpose of this discussion is to show that most of the problems encountered by researchers engaged in cross-national comparative research are confronted in one way or another by those in other forms of research.

PART 1: WHAT IS COMPARATIVE CROSS-NATIONAL RESEARCH?

While the chapter is restricted to cross-national comparative research, even this focus is not without its definitional problems. As we shall see, one of the purposes of cross-national research is to assess the role of culture in shaping outcomes. The problem in comparing nations is that nations and cultures are not synonymous. On the one hand, many countries consist of quite distinct cultures within the same national border; on the other, a single culture is not necessarily constrained by national borders (see discussion p. 258).

Types of research design

At its simplest, cross-national comparative research is research in which nations are compared on some dimension (Przeworski
and Teune 1966). The purpose of cross-national comparisons may either be simply to describe national differences or to draw on the logic of comparisons to explain cross-national similarities and differences. This chapter focuses on explanatory forms of comparative cross-national designs.

To understand the place of cross-national comparative designs within social science methods it is useful to review Smelser’s (1972) fourfold classification¹ of methodological approaches.

The first approach is the experimental method which Smelser, like many others, regards as the gold standard in research. The simplest experimental design involves the comparison of two groups at two time points. Initially these two groups are identical, a condition that is achieved by random allocation of cases to the two groups. Initial measures on an outcome variable are obtained from both groups prior to one of the groups (the experimental group) being exposed to an experimental intervention. The other group (the control group) is not exposed to the intervention. At some point following the intervention both groups are remeasured on the outcome variable. The effect of the intervention is measured by comparing the amount of change in the experimental group with that in the control group. Any significant difference in the amount of change between the two groups is attributed to the effect of the intervention since, ideally, this is the only difference between the two groups.

For ethical and practical reasons, the experimental method cannot be used for most social science research. This has led to many social scientists adopting what Smelser calls the statistical method. The logic of the statistical method is to simulate important aspects of the experimental method by ensuring that the groups that are compared are as similar as possible except in relation to the causal and outcome variables. The statistical method relies on multivariate analysis to compare groups that differ in regard to the key independent variables and statistically to remove other relevant differences between groups. Statistical techniques enable investigators to control or remove these differences to ensure group equivalence on specified characteristics.

Suppose a study was being planned to assess the impact of divorce on the educational performance of children. This would involve comparing comparable children from intact and divorced families. However, since children from certain types of circumstances are more likely than others to experience parental divorce it is necessary to distinguish between the effect of divorce and these other circumstances. This is achieved by statistically removing the effect of these other differences to then assess the impact of divorce – other things being equal. Statistical controls are an attempt to simulate the effect of random allocation to groups that is used in the experimental method.

A third approach outlined by Smelser is the comparative method. This approach can also be understood as simulating some of the features of the experimental method. This approach will be discussed in detail in Part 2.

The fourth approach that Smelser identifies is the case study method. This method can consist of either single cases or multiple cases. Where multiple case studies are used the logic of the case study method can be similar to that of the comparative method as outlined by Smelser.

While it is useful to view comparative cross-national designs within this framework of experimental, statistical, comparative and case study designs, this framework does not fully incorporate all the work covered by comparative or cross-national studies. Many studies that involve some comparisons between nations and cultures fit more readily under the heading of the statistical method. I will argue, along with Ragin, that there are at least two different approaches to comparative research – what Ragin (1987) calls the variable-based and the case-based methods. The variable-based method is equivalent to the statistical method outlined by Smelser and the case-based method is similar to Smelser’s description of the comparative method.
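
The logic of the experimental and statistical methods described in this section can be made concrete with a small sketch. The Python below is illustrative only and is not drawn from Smelser or from this chapter: the function names, the record fields (`divorced`, `income`, `score`) and all the figures are invented for the example. It computes an experimental effect as the difference in change between two groups, and then approximates a statistical control by comparing children from divorced and intact families only within strata of a single control variable.

```python
from statistics import mean


def experimental_effect(exp_before, exp_after, ctl_before, ctl_after):
    """Change in the experimental group minus change in the control group."""
    exp_change = mean(exp_after) - mean(exp_before)
    ctl_change = mean(ctl_after) - mean(ctl_before)
    return exp_change - ctl_change


def raw_difference(records):
    """Mean score of children of divorced parents minus that of intact families."""
    divorced = [r["score"] for r in records if r["divorced"]]
    intact = [r["score"] for r in records if not r["divorced"]]
    return mean(divorced) - mean(intact)


def controlled_difference(records, control_key):
    """Size-weighted average of the within-stratum differences.

    Children are compared only with others who share the same value of the
    control variable, a simple form of statistical control by stratification.
    """
    strata = {}
    for r in records:
        strata.setdefault(r[control_key], []).append(r)
    total, weighted = 0, 0.0
    for members in strata.values():
        if {r["divorced"] for r in members} != {True, False}:
            continue  # stratum lacks one of the two groups, so no comparison
        weighted += raw_difference(members) * len(members)
        total += len(members)
    if total == 0:
        raise ValueError("no stratum contains both groups")
    return weighted / total


# Hypothetical data: divorce is more common in the low-income stratum.
children = [
    {"divorced": True, "income": "low", "score": 50},
    {"divorced": True, "income": "low", "score": 54},
    {"divorced": False, "income": "low", "score": 52},
    {"divorced": False, "income": "low", "score": 56},
    {"divorced": True, "income": "high", "score": 70},
    {"divorced": False, "income": "high", "score": 72},
    {"divorced": False, "income": "high", "score": 72},
    {"divorced": False, "income": "high", "score": 72},
]

effect = experimental_effect([10, 12], [15, 17], [11, 11], [13, 13])
raw_gap = raw_difference(children)
controlled_gap = controlled_difference(children, "income")
```

With these invented figures the raw gap between the two groups of children is about 6.8 points, but it shrinks to 2 points once children are compared within the same income stratum. Multivariate analysis generalizes this idea to several control variables at once; stratification on one variable is used here only because it keeps the logic visible.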
characteristics could not be responsible for the common outcome.

When comparative cross-national analysis uses this reasoning it usually proceeds by beginning with the observation of the same behaviour across countries (e.g. that the countries share a high rate of solo living) and then seeks the single characteristic that the countries have in common that could explain this common behaviour.

This form of reasoning has important shortcomings which mean it must be used with care.

First, it is impossible to list and compare every possible characteristic of two countries. The method can, at best, concentrate on comparing relevant characteristics – in this case, characteristics that might affect national rates of solo living. But the selection of such factors is inevitably driven by theory or previous research and therefore risks missing factors not considered by the theories.

Second, the method is biased towards the concept of mono-causation – that an outcome has a single cause. In social life this is by no means true and many phenomena can have both multiple and alternative causes. While the example in Table 15.1 is consistent with prosperity (X1) being a cause of living alone rates it certainly does not demonstrate that it is the only cause. It may be the only cause identified within a limited set of factors but the method cannot, in reality, exhaustively eliminate all other factors.

Third, the Method of Agreement is completely unable to identify interaction effects or what is called ‘chemical causation’ (Mill 1879, Vol. 8, pp. 204–8). That is, some effects will take place only when two characteristics are present in a particular combination. For example, there is nothing in the example in Table 15.1 to preclude the argument that prosperity plus a high value placed on privacy or prosperity plus low levels of family solidarity result in high levels of solo living.

A final problem is the level of abstraction at which concepts are used. This point can be illustrated by the story of a man who, one evening, drank a great deal of scotch and soda and woke up the next morning with a hangover. The next evening he drank a great deal of brandy and soda and again woke up with a hangover. After drinking gin and soda the next evening and subsequently waking up with a hangover he concluded that the soda was causing the hangover. While this reasoning may appear logical by this method, the reasoning is flawed because of the conceptualization of the variables and the failure to recognize the common element of scotch, brandy and gin. Similarly, conceptualizing characteristics of a country at a highly specific level can cause an investigator to miss more abstract features that countries have in common. Alternatively, conceptualizing country characteristics at too general a level (e.g. democratic) may cause one to overstate the degree of similarity between the countries – a problem described by Ragin as the problem of ‘illusory commonality’.

However, for all its dangers, the Method of Agreement can play a useful role by eliminating possible explanations. If ‘nothing can be the cause of a phenomenon which is not a common circumstance in all instances of a phenomenon’ (Cohen and Nagel 1934), the Method of Agreement can be used to eliminate explanations that do not meet this criterion.
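
The eliminative use of the Method of Agreement, and the way its result depends on the level of abstraction, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a procedure given in the chapter: the function names and the `ABSTRACTION` recoding table are hypothetical, and the data simply encode the scotch, brandy and gin story above.

```python
def method_of_agreement(cases):
    """Return the characteristics shared by every case showing the outcome.

    Anything absent from at least one case is eliminated as a candidate cause.
    """
    trait_sets = [set(traits) for traits in cases.values()]
    if not trait_sets:
        return set()
    return set.intersection(*trait_sets)


def recode(traits, mapping):
    """Add the more abstract category for each trait that has one."""
    return set(traits) | {mapping[t] for t in traits if t in mapping}


# Each evening ended in the same outcome (a hangover).
evenings = {
    "evening 1": {"scotch", "soda"},
    "evening 2": {"brandy", "soda"},
    "evening 3": {"gin", "soda"},
}

# Hypothetical recoding of the specific drinks to a more abstract concept.
ABSTRACTION = {"scotch": "alcohol", "brandy": "alcohol", "gin": "alcohol"}

specific_level = method_of_agreement(evenings)
abstract_level = method_of_agreement(
    {name: recode(traits, ABSTRACTION) for name, traits in evenings.items()}
)
```

At the specific level only `soda` survives, which reproduces the drinker's mistaken conclusion. Once the drinks are also coded at the more abstract level, `alcohol` survives alongside `soda`, so the method has eliminated scotch, brandy and gin individually but cannot by itself choose between the two remaining candidates.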
understanding by helping specify the types of conditions under which a pattern applies.

Small numbers
Case-based comparative cross-national analysis that seeks to understand elements of the whole within their historical, cultural, social and economic context is time-consuming and difficult. The method limits the number of countries that can be thoroughly studied. In practice this means that case-based comparative designs frequently compare just two or three countries and this in turn results in the problem of too few cases (Lijphart 1971).

Clearly such a small number of cases precludes statistical generalization. But a small number of cases still allows for generalization based on the logic of replication – the same basis that is employed with most experiments. As findings are replicated and the range of conditions under which they apply are specified by repeated experiments (or comparisons between pairs of countries) the investigator becomes more confident about the results and can specify the range of situations to which they apply (de Vaus 2001).

The other problem with using so few cases is that it becomes difficult to apply the logic of the Methods of Agreement or Difference (Lieberson 1991). With a very small number of cases the patterns can be highly ambiguous and indeterminate. For example, the Method of Agreement relies on finding one common factor across cases. But where only three or four countries are included in a comparative cross-national study there may be many characteristics that such a limited number of countries share. Only through the examination of further cases do patterns of agreement begin to come into focus.

PART 3: SURVEY BASED CROSS-NATIONAL COMPARATIVE RESEARCH

Survey-based comparative cross-national research employs a variable-based method of comparison. While the case-based approach described above uses variables, these variables are placed and interpreted within the context of the whole case. The initial focus of the case-based approach is to understand the whole country so that specific attributes can be interpreted within the context of the whole. A variable-based approach pays little attention to the whole and largely uses variables without paying attention to the meaning of the attributes in particular cases. Attributes are more or less treated as meaning the same thing regardless of the country in which they are measured.

Two types of comparative survey studies

The two most important types of cross-national survey-based research designs are those in which the country is the unit of analysis and those in which individuals are the unit of analysis.

Country as the unit of analysis
With this design, data are collected about the country at an aggregate level. A set of characteristics of a country are delineated and each country is coded on each of these characteristics so that they are characteristics of nations/cultures rather than of the individuals in the nation/culture.

An example of this type of survey is the Human Relations Area File (http://www.yale.edu/hraf/). For each country or culture, codes are created to indicate the country’s or culture’s characteristics. The Human Relations Area File consists of a large number of variables that capture characteristics of each culture (e.g. kinship rules, marriage rules, language characteristics, religious characteristics, ways of thinking etc.). All these variables reflect the characteristics of the country or culture – not the individuals in the country.

Aggregate data of this type are also used widely by economists, criminologists, political scientists and others in comparative cross-national studies. While the nature of the
Furthermore, not all individuals contribute equally to shaping the national culture or mood. Verba (1993) suggests a variety of ways in which surveys might try to take into account the uneven impact of different types of individuals in shaping the national picture.

Instability of measurements
Survey research that discovers inter-country differences requires reliable measures. Long ago Scheuch (1968) reminded comparativists that many of the so-called differences between countries were in fact differences of only a few percentage points. To interpret these differences in terms of cultural characteristics requires that these inter-country differences are both real and persistent. However, given the many sources of measurement error in comparative research (see later discussion) it is a brave person who can confidently say that the observed differences between countries reflect real differences and are not simply an artefact of measurement error. Certainly one would want to be assured that the same pattern of inter-country differences persists over time and with alternative measures.

What is to be compared?
One of the purposes of cross-national research is to assess the role of culture in determining various outcomes. The problem confronted by cross-national survey research is that nation and culture are not synonymous. While country provides the frame from which survey data are collected (whether it be at the individual or aggregate level) these national boundaries do not necessarily correspond to cultural boundaries. Scheuch (1989) argues that ‘there exists a German culture … [but] this does not, nor ever did, coincide with the political boundaries of any one political entity’. Rokkan (1970) distinguishes between cross-national, cross-cultural and cross-societal comparisons. Dogan and Pelassy (1984) point out that ‘Juan Linz delineated eight Spains, Erik Allardt four Finlands, and Stein Rokkan as many Norways. Anyone knows that there are three Belgiums, four Italys and five or six Frances’.

The lack of cultural homogeneity of most nations means that it is difficult to infer culture from nation. However, most comparative surveys are based on national boundaries and thus identify national rather than cultural differences. Given national heterogeneity any differences between countries may be due to the impact of a particular part of a nation rather than any national culture. Indeed cultural variations within a country may even be greater than those between nations and cross-country differences may simply be a statistical artefact. Care therefore is required when interpreting cross-national differences. Variations within countries as well as between countries need to be explored if one is to avoid simplistic attributions of between-country differences to cultural differences.

There are, of course, valid reasons for using national rather than cultural boundaries. National boundaries are clearly defined and relate closely to the available statistical data. They also relate to policy and legislative frameworks and provide a means of evaluating the impact of national laws and policies – matters that are frequently of more interest to governments and funding agencies than the unique impact of particular cultures (Hantrais 1999).

The reverse problem, known as Galton’s problem, can also complicate the interpretation of cross-national differences. ‘Galton’s Problem’ is the problem of interpretation due to cultural diffusion whereby the culture of one country spreads to other countries and creates a degree of uniformity between countries. That is, each country is not truly independent of the other. Where this is the case comparative cross-national analysis may discover uniformity across nations (e.g. family forms or taboos) that is due to cultural diffusion rather than to the operation of universal principles.

Equivalence in cross-national comparisons

The goal of any cross-national survey is to collect data in such a way that any cross-national differences in survey findings can
be attributed to real differences between the countries rather than to differences in data collection methods. There are two key sources of what can be called non-equivalence error in cross-national surveys: the adoption of non-equivalent methodologies and the non-equivalence of the meaning of the data that are collected. The issues of equivalence are covered in some detail in Hantrais (1999).

Methodological equivalence
Cross-national differences in survey results can be due to methodological differences such as non-equivalent samples, data collection methods and coding frames in different countries.

Achieving such equivalence is difficult (Harkness 1999). The European Social Survey (ESS) stands out for the diligence with which it minimizes methodological non-equivalence error in cross-national surveys. By adopting a centralized structure the ESS imposes the same methodology on each of the participating countries. This standardization includes such matters as the organization of the survey group in each country, sampling methods, fieldwork, the ways in which response rates are calculated, the level of survey documentation and many other detailed aspects of conducting and reporting the survey in each country. These detailed specifications are available on the ESS website (http://naticent02.uuhost.uk.uu.net/index.htm).

Since sample design and size affect the error of estimates the ESS provides detailed rules about the way in which samples are obtained and on providing information by which sample quality can be assessed (http://naticent02.uuhost.uk.uu.net/methodology/sampling_strategy.htm).

Methods of administration can affect responses to different types of questions and result in quite different levels of non-response and response bias. A good cross-national survey will therefore specify the mode of data collection and the specifics of exactly how that mode will be implemented (e.g. http://naticent02.uuhost.uk.uu.net/fieldwork/index.htm). Ways of evaluating the quality of the data in each country are required. Common coding frameworks and ways of managing non-equivalent responses (e.g. political party supported) need to be specified. The ESS has made considerable advances in specifying the way in which equivalence in these areas can be achieved.

Until the ESS insisted on conformity to detailed survey requirements and established clear documentation standards, the information required to evaluate whether surveys conducted in different countries were actually comparable was frequently unavailable (Harkness 1999). This in turn has meant that we really do not know whether we can safely compare the data from many multi-country surveys. The use of ESS specifications will provide a major improvement in achieving methodological equivalence in comparative cross-national surveys.

However, even with detailed specifications and rules to achieve equivalence the reality remains that it is difficult to achieve equivalence in the implementation of surveys in different countries (Mitchell 1965). Not only are some countries better equipped to conduct quality surveys; countries also vary in the types of sampling frames that are available, the methods of administration that are possible and even the level of survey ‘literacy’ of the population (Bulmer 1998; Harkness 1999). Furthermore, cultural differences in matters such as politeness can affect both response rates and the presence of acquiescent response sets (Jones 1963).

All these factors stem from the culture in which the survey is administered, which in turn makes them difficult to standardize across cultures. Considerable work remains to be done to design ways of assessing the impact of these different methods of survey procedure in different contexts. Certainly, when using data from cross-national surveys investigators need to be aware of the survey design in each country and be aware of the way in which cultural practices may affect the way in which the survey is implemented. To use these datasets without this understanding risks confusing observed cross-national differences with real differences and failing to consider
that the differences are simply methodological artefacts.

Equivalence of meaning

It is one thing to enforce the same ways of collecting data in each country in a cross-national survey. It is another thing to ensure that the meaning of the data is equivalent in each country. The problem of the meaning of observations in different countries confronts all cross-national research. However, because survey responses are typically less contextualized than data collected with other methods, the problem of meaning is particularly acute in cross-national survey research.

Validity in different contexts. Problems in assessing the meaning of observations relate to the validity and reliability of survey questions. Cross-national surveys produce special problems for validity since the way in which questions are understood can vary sharply in different cultural contexts. Validity problems in comparative cross-national surveys arise from the difficulty of ensuring that questions mean and measure the same thing in different countries.

The problem of equivalent meaning is obvious when the questionnaire needs to be administered in different languages. Where this is the case the first task is to ensure that the equivalent meaning is contained in the different translations. A common approach to ensuring that the language is equivalent is to use blind back-translation methods (Brislin 1970). This involves beginning with a base language (e.g. English) and then translating the questionnaire into each of the languages used in the survey. To check on the accuracy of the translation, the translation is then independently translated back into the base language and the two versions of the questionnaire in the base language are compared.

However, it is not always possible to achieve a neutral or an accurate translation. Since language is a carrier of culture, the words can reflect culturally specific meanings and concepts that may have no equivalent in another culture. For example, a common question in international surveys has been to ask people to indicate their political orientation on a left wing/right wing continuum. But the concept of left and right does not translate well to countries where the very concept is foreign. A similar problem arises when asking about religious beliefs in different countries where the concept of God does not always translate well (Jowell 1998).

Equivalence is not just a matter of arriving at equivalent language but of achieving equivalent indicators. Even when equivalent words are used the questions do not always work in the same way in different cultures. Most questions are used to tap more abstract concepts but the specific indicators of the concept can differ from one country to the next.

Even questions designed to measure behaviour or personal attributes encounter problems. Here the problem may be less a matter of achieving equivalent wording but in determining how to interpret responses. The same response will not necessarily have the same meaning in different cultures. Educational level is measured in most surveys but in cross-national surveys working out equivalent levels of education is confounded by different systems and qualifications. Even age is problematic (Verba 1993) especially where age is used as a proxy for other concepts such as stage in the life cycle. Depending on the culture and society, knowing that a person is 20 years old indicates different things.

These simple examples highlight the fundamental characteristic of all social measurement. The meaning of the measurement must be derived from the culture. This means that the same responses (e.g. years of education, voting behaviour, occupation or age) may not have the same meaning in different cultures.

Literal and functional equivalence. One of the decisions any comparative survey researcher must make is whether to aim for literal or functional equivalence. Literal equivalence is achieved where identical stimuli are used in all countries and is exemplified in Almond and Verba’s (1963) The Civic
Culture, a classic study in comparative politics. Using this approach, literal translations and the same indicators of concepts are used in each country. The shortcomings of literal equivalence have already been outlined above.

The alternative is to aim for functional equivalence (Przeworski and Teune 1966; Scheuch 1968). Functional equivalence is achieved where the goal is to measure the same construct but the specific means by which the construct is measured can vary from place to place. The notion of functional equivalence is based on Lazarsfeld’s argument that indicators can be interchangeable. In cross-cultural research the argument is that measures must be culturally relevant and that therefore different measures will frequently be required to measure the same concept in different cultures.

The ESS seeks to achieve functional rather than literal equivalence of question wording. Rather than insisting on literal translations with the standard blind back-translation approach a Translation Panel works with the questionnaire design teams. This panel provides detailed annotations to the questionnaire that explain the purpose and meaning behind questions and concepts. The purpose of these annotations is to assist the translators in retaining the meaning of the concepts and to assist them in developing wordings that capture the meaning behind the question while freeing them from a strict literal translation (http://naticent02.uuhost.uk.uu.net/methodology/translation_strategy.htm).

The notion of functional equivalence is the most defensible approach in cross-national research as it recognizes that meaning derives from a context. However, the difficulty is in knowing whether one has achieved functional equivalence. It is one thing to accept that constructs can be measured in different ways in different cultures but it is quite another to demonstrate that the different ways are functionally equivalent. Przeworski and Teune (1966) proposed one method which they call the ‘identity-equivalence’ method for deriving functionally equivalent indices of concepts in different countries. This method involves developing measures of concepts in each country that consist of a mixture of country-specific indicators and indicators that are common to all the countries being compared. In this way there is some capacity to evaluate the extent to which the country-specific indicators capture the same underlying concept as do the common cross-national indicators.

Improving equivalence

Equivalence is a continuum. While the goal in comparative cross-national survey research is to achieve full equivalence this goal is unlikely to be realized. Nevertheless, there are ways in which equivalence can be improved.

At the measurement level equivalence is much more likely to be achieved by aiming for functional than literal equivalence. While methods such as identity-equivalence techniques can be useful they do not fully resolve the issue of establishing that different sets of indicators are functionally equivalent. Cognitive interviewing, by which means investigators try to access the meanings that respondents attach to questions and their answers, can assist in evaluating whether different questions are functionally equivalent in different countries. Of course the traditional ways of assessing the validity of any measure can be used to improve the functional equivalence of measures in different cultures.

At the level of executing comparable surveys with comparable samples and comparable data collection methodologies there is room for considerable improvement (Lynn 2003). Much more careful specification of standards and requirements for surveys in each participating country is essential. While it will not be possible to implement identical procedures in all countries, some variation could be eliminated by more rigorous specification requirements such as those used in the ESS model. More thorough documentation will assist investigators in interpreting inter-country differences in results and assist in analyzing data so as to minimize the effect of these inter-country survey differences.
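The evaluation idea behind mixing common and country-specific indicators, as described above for the identity-equivalence method, can be loosely sketched in code. The following is only an illustration under stated assumptions, not Przeworski and Teune's procedure: the scores are invented, the function names are hypothetical, and the 0.5 cut-off is an arbitrary placeholder. Within each country it correlates an index built from common indicators with an index built from country-specific indicators.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def check_equivalence(common_scores, specific_scores, threshold=0.5):
    """For each country, correlate the common-indicator index with the
    country-specific index. The threshold is an arbitrary assumption."""
    results = {}
    for country in common_scores:
        r = pearson(common_scores[country], specific_scores[country])
        results[country] = (r, r >= threshold)
    return results

# Invented respondent-level index scores for two hypothetical countries.
common = {"A": [1, 2, 3, 4, 5], "B": [2, 1, 4, 3, 5]}
specific = {"A": [2, 4, 6, 8, 10], "B": [5, 4, 3, 2, 1]}

results = check_equivalence(common, specific)
for country, (r, ok) in results.items():
    print(country, round(r, 2), "consistent" if ok else "questionable")
```

A high within-country correlation is at best weak evidence that the country-specific indicators tap the same concept; as the text notes, such checks do not settle the question of functional equivalence.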
The equivalence of survey methodologies and the difficulties that non-equivalence creates for cross-national comparisons is a problem and has been recognized as such. But the problem is not unique to cross-national surveys. Precisely the same issue confronts repeated cross-sectional studies that attempt to track trends within countries. The non-equivalence of question wording, samples and methodologies confronts any survey analyst trying to interpret trend studies (Kulka 1982).

These shortcomings in case-based and survey-based methodologies in cross-national comparative research are not reasons for avoiding cross-national research any more than they are for avoiding these methods in national or sub-national contexts. As the world becomes increasingly globalized we can only anticipate a growth in the need and opportunity for cross-national research. An awareness of the challenges faced in conducting such research is part of the solution to reducing the effect of these problems and for evaluating the claims made on the basis of cross-national comparative research.

NOTES

1 Smelser actually identifies five types but one of these – the method of heuristic assumption – is not particularly relevant to this discussion.

REFERENCES

Almond, G. and S. Verba (1963). The Civic Culture. Princeton, Princeton University Press.
Bendix, R. (1963). ‘Concepts and generalisations in comparative sociological studies’. American Sociological Review 28: 532–539.
Brislin, R. W. (1970). ‘Back-translation for cross-cultural research’. Journal of Cross Cultural Psychology 1: 185–216.
Bulmer, M. (1998). ‘The problem of exporting social survey research’. American Behavioral Scientist 42(2 Oct.): 153–167.
Cohen, M. R. and E. Nagel (1934). An Introduction to Logic and Scientific Method. New York, Harcourt Brace Inc.
Defelice, E. G. (1986). ‘Causal inference and comparative methods’. Comparative Political Studies 19(3): 415–437.
de Vaus, D. A. (2001). Research Design in Social Research. London, Sage.
de Vaus, D. A. (2002a). Surveys in Social Research, 5th edn. London, Routledge.
de Vaus, D. A. (ed.) (2002b). Social Surveys, 4 volumes. London, Sage.
de Vaus, D. A. (ed.) (2006). Research Design, 4 volumes. London, Sage.
Dogan, M. and D. Pelassy (1984). How to Compare Nations. Chatham NJ, Chatham House.
Hantrais, L. (1999). ‘Contextualization in cross-national comparative research’. International Journal of Social Research Methodology 2(2): 93–108.
Harkness, J. (1999). ‘In pursuit of quality: issues for cross-national survey research’. International Journal of Research Methodology 2(2): 125–140.
Jones, E. L. (1963). ‘The courtesy bias in South-East Asian survey’. International Social Science Journal 15(1): 70–76.
Jowell, R. (1998). ‘How comparative is comparative research?’ American Behavioral Scientist 42(2 Oct.): 168–177.
Kohn, M. L. (1987). Cross-National Research as an Analytic Strategy. Cross-National Research in Sociology. Newbury Park, Sage.
Kulka, R. A. (1982). ‘Monitoring social change via survey replication: prospects and pitfalls from a replication survey of social roles and mental health’. Journal of Social Issues 38(1): 17–38.
Lazarsfeld, P. F. and H. Menzel (1961). ‘On the relationship between individual and collective properties’. Complex Organisations. A. Etzioni. New York, Holt, Rinehart and Winston, pp. 422–440.
Lieberson, S. (1991). ‘Small N’s and big conclusions: an examination of the reasoning in comparative studies based on a small number of cases’. Social Forces 70: 307–20.
Lijphart, A. (1971). ‘Comparative politics and the comparative method’. American Political Science Review 65(3): 682–693.
Lijphart, A. (1975). ‘The comparable cases strategy in comparative research’. Comparative Political Studies 8: 158–177.
Lynn, P. (2003). ‘Developing quality standards for cross-national survey research: five approaches’. International Journal of Social Research Methodology 6(4): 323–336.
Mill, J. S. (1879). A System of Logic, 8th edn. London, Longmans Green.
Mitchell, R. E. (1965). ‘Survey materials collected in the developing countries: sampling measurement and interviewing obstacles to intranational
This section of the handbook delves into several different ways to collect data and conduct fieldwork. Social science methodology is very rich in the choices it provides an investigator in conducting research. Matching the appropriate method to the research question sometimes makes this richness overwhelming. However, the choices provide the tools that are needed to conduct research. A sharper tool should provide a more clearly detailed answer. While most of the world focuses on the answer the trained social scientist is aware that how the question is answered can be as important as the question itself. This handbook section provides a range of such tools that include both introductory and intermediate approaches.

The chapter by Bovaird and Embretson on tests and measurement may be difficult to read but it is worth the effort. The chapter deals with a well-established approach to measure development that has recently received more visibility in the social sciences. Item response theory (IRT) can be applied to survey research, marketing, and health contexts in addition to most substantive areas in education and psychology. The authors argue that classical test theory (CTT), which is the primary social science approach to measurement, not only makes unrealistic assumptions about the characteristics of the data needed but also lacks several important advantages of IRT. The latter is more flexible and has an advanced, model-based framework that relates test items to examinee and item characteristics. IRT analysis produces an equation that describes the relationship between the respondent and item parameters. Scores from CTT are dependent on the characteristics of the respondent and the specific test. These two characteristics cannot be separated in the CTT approach. The IRT model-based approach is not specific to the test or questionnaire used and the sample tested. With IRT different measures of the same trait can be used without expensive test-equating procedures. In CTT the reliability of the test increases with its length, which produces long tests or questionnaires with redundant items. IRT allows the selection of items of varying and non-overlapping difficulty so that tests can be considerably shorter than those developed under CTT.

IRT is most advantageous when computer-based adaptive testing is used. In conventional testing everyone gets the same or parallel versions of a test. With IRT each test can be individualized by selecting items of varying difficulty from a pool of items. This approach provides a more accurate estimate of the person’s ability in much less time. It is expected that the use of IRT will continue to grow and displace much of CTT.

Susan Speer’s chapter provides an overview and critical evaluation of the debate on the relative advantages and disadvantages of
‘natural’ versus ‘contrived’ data, or ‘unobtrusive’ and ‘obtrusive’ methods. She concludes by saying that by adopting a reflexive approach to interviews or other contrived data collection procedures we can obtain rich insights into interactional issues and the workings of normativity in culture. On the other hand she stresses that we can never achieve an unmediated access to participants’ realities, neutralize the context, or disinfect our data entirely of the researcher’s presence, because the knower is always intimately bound up in and partially constitutive of what is known. Finally, what are natural data cannot be decided on the basis of their type and/or the role of the researcher within the data. Rather, the status of pieces of data as natural or not depends largely on what the researcher intends to ‘do’ with them.

Obtrusive questionnaires and interviews form the lion’s share of the social research literature. The chapter by de Leeuw will help researchers plan their study using these approaches. One of the first problems researchers face is that fewer people are willing to answer questions. In many cases the only way to assure an appropriate sample is to offer to pay the respondent. However, de Leeuw provides several suggestions for how to optimize response rates.

Another issue discussed in this chapter is how to write the questions. It seems obvious that the answer to the question needs to reflect what we wanted to know. However, respondents may not understand the question in the way the person who wrote it expects. Education, culture, and experience all shape how we understand what we are being asked. Writing good questions requires pre-testing. The chapter introduces the use of cognitive psychology in question development.

The chapter also reviews several approaches to data collection. In-person or face-to-face interviews are the most flexible and can help and motivate respondents. Telephone interviews are less flexible and do not possess the visual cues that can be used during an in-person interview to determine if the respondent appears to understand the question. Mail surveys have the advantage of being relatively inexpensive but require accurate names and addresses. They are also subject to a low response rate. Internet- or Web-based surveys are also relatively inexpensive and allow complex skip patterns that written questionnaires cannot handle, but are clearly limited to those persons who have easy access to the Internet. Finally, group administration of questionnaires, as in a classroom, can be used if appropriate. The author provides an excellent summary of the advantages and disadvantages of each of these techniques that will aid the researcher in choosing which method to use.

In qualitative research, the most common methods of data collection are in-depth and semi-structured interviews. Feminist researchers have been active in developing these methods in recent years. Doucet and Mauthner discuss qualitative interviewing from the standpoint of feminism and view the research interview as a way of constructing knowledge. They argue that feminists have problematized key issues in the use of interviews as a research tool: who produces knowledge, with what politics, and from which locations. The discussion covers issues around rapport and the relational aspects of interviewer-interviewee relationships. In discussing power differences they show how feminists have come to see researchers as both ‘outsiders’ and ‘insiders’ in the way they relate to their interviewees and invest their identities in the research relationship but also in their relation to the data they produce. In referring to interview dynamics they point to the two-way nature of power between respondents and interviewers in the co-production of interview material. The discussion also moves to the power of researchers to represent the narratives of those they study including the links made with theory, the transcription, interpretation, and writing up.

While qualitative interviews are often directed to understanding the commonalities between those they study, biographical methods focus upon differences and upon the whole case. Biographical methods are
enjoying a resurgence of popularity albeit, as Joanna Bornat shows in Chapter 20, a growing number of approaches have developed under this umbrella. Bornat’s chapter is written from the perspective of an oral historian. Three main approaches are identified that have developed along rather different interdisciplinary lines: the biographic-interpretive approach, oral history, and narrative analysis. The biographic-interpretive method lends itself to more psychoanalytic interpretations of motivation and meaning; narrative analysis leans more toward sociolinguistics; while oral history draws from both sociology and history. Each gives centrality to the individual account and to individual agency in attempting to explain the changing nature and persistence of social relations and social structures; each makes use of the interview to generate data. They differ, Bornat argues, in three important respects: the dialogic or interactive aspects of the interview; the centrality of memories to their interpretation; and the role of the researcher in the interpretation of the data.

Reflecting her identification as an oral historian, Bornat argues that oral history places more emphasis on the dynamics of the interview process than the other two approaches. It also places more emphasis upon the importance of eliciting memories for their own sake so that the effects of time – a concern with ‘pastness’ is how she puts it – and an interest in change and continuity come to the fore. Oral history, developed through a political concern to capture the unheard ‘voices of the past’, represents a rather more democratic approach to data analysis and interpretation. While narrative analysis and the biographic-interpretive approach provide for a deep analysis of subconscious as well as conscious processes (what may be unspoken or unacknowledged by the interviewee), oral historians maintain a greater interpretive distance and tighter boundaries around their role as interpreters than the two other approaches.

Janet Smithson’s chapter on focus groups discusses practical and theoretical questions related to using focus groups in social research and suggests how to use them and analyse the data most effectively. According to her, the particular strength of the focus group method is that it enables research participants to discuss and develop ideas collectively, and articulate their ideas in their own terms, bringing forward their priorities and perspectives. The limitations of focus group research can be mitigated by awareness of the constraints, informed analysis, and by detailed consideration of the way the conversations are socially constructed in the group context.
16
Modern Measurement in the
Social Sciences
James A. Bovaird and Susan E. Embretson
While item response theory (IRT) is a viable and well-established methodology for educational measures, it is still relatively unused in psychology and the rest of the social sciences. Despite its underutilization in the mainstream of social research, IRT is appropriate for consideration in any context that postulates the presence of a latent construct and involves constructing and/or analyzing a multicomponent instrument designed to measure that construct, including survey research, marketing, and health contexts in addition to most substantive areas of education and psychology. Some attractive features of IRT include the possibility of more flexible construction of alternative test forms, shorter and more efficient tests, equating, and interpretation of scores without norms. This chapter will review and emphasize the benefits of contemporary IRT, including the technical advances of IRT over methods based on classical test theory (CTT), the role of modern measurement methods in computer-based testing (CBT) and computerized adaptive testing (CAT), and the potential of some IRT models to impact test design for targeted aspects of construct validity. We will begin with a brief discussion of what constitutes the area of testing and measurement followed by a direct contrast of IRT and CTT. The shortcomings of CTT will be used to illustrate the benefits of modern measurement techniques in the context of the characteristics of quality measurement. The chapter ends with a discussion of current trends and future directions.

TESTING AND MEASUREMENT

Most social scientists are interested in unobservable human attributes that are often referred to as latent constructs, raising the issue of imparting a clear meaning to the numbers that are assigned to represent levels of a construct, a process called measurement or psychometrics. Testing then refers to sampling the individual behavior that is observable at a given point in time. Unfortunately,
measurement instruments cannot exactly represent the latent construct, so the quality of measurement is defined by the presence of four characteristics: a standardized mode of test administration, a meaningful metric for obtained scores, score reliability, and score validity. These four characteristics contribute to the interpretability (or lack thereof) of scores obtained in testing and will be expanded upon in subsequent sections as a means of distinguishing between CTT and IRT.

While some attributes such as age or weight can be precisely measured with a single measurement, most constructs are much harder to test with single measures. Consequently, most tests or scales contain multiple measures, each representing a single observation of the characteristic. In education, and testing in general, simple measures are often called items, in survey research they may be called questions, and in experimental psychology they may be referred to as stimuli or cues. Consistent with the testing background from which measurement has primarily developed, we will collectively refer to questions, items, and stimuli as items. The number of items required in a scale depends on the complexity of the characteristic. Individual items tend to be poor measures and often partially reflect attributes other than the targeted construct. Thus, the variability among responses to an individual item contains a portion attributable to the targeted construct, or true score variance, and a portion attributable to random error and unrelated systematic sources, or measurement error.

The numerical representation of an observable behavior requires a clear and definitive rule for associating one and only one number with the magnitude of an individual’s construct level. Given a sample O of N distinct participants, any participant can be assigned a true score t(o_s). A procedure is then devised for pairing each participant o_s with its imprecise numerical measurement, m(o_s). Measurement scales can be classified as one of four scales of measurement: nominal, ordinal, interval, or ratio. Nominal and ordinal scales can be further classified as categorical data, and interval and ratio scales are often classified as continuous data.

In general, there are three basic item types in use in the social sciences. The first type is a response set that represents a range of trait levels ordered from low to high. Examples would be rating scales (often called Likert scales; Likert, 1932), physiological measurements, and any other ‘continuous’ measures. While not all of the response sets fully meet the requirements for interval level of measurement, there is an assumption of an underlying continuum. (See Goldstein & Hersen (1984) for a discussion of Likert-type items and interval properties.) The second type of item has a dichotomous (two response options) response format such as true/false questions or checklists (an endorsement constitutes the presence of the behavior, trait, event, etc. while the absence of endorsement indicates the absence of the behavior). The third item type is a dichotomous scoring of a polytomous (more than two response options) response set such as the case with multiple choice formats. Typically, there is a correct answer and a set of distractors and the resulting dichotomous data represents either a correct/incorrect response or a pass/fail decision.

CLASSICAL TEST THEORY AND MODERN MEASUREMENT

Historically, CTT has provided a general framework for the development, administration, and interpretation of assessment tools. Gulliksen (1950) is often referred to as the defining volume for CTT, but much of the work was first formalized by Spearman in the early 1900s, well before Lord and Novick (1968) laid the foundation for IRT. According to McDonald (1999), there are two views on the relationship between CTT and modern measurement. McDonald argues that CTT may be viewed as a reasonable approximation to IRT under certain conditions. Conversely, since the development of CTT occurred prior to the development of IRT, there exists the
accurate impression that IRT represents a significant change in theoretical perspective from CTT. According to Embretson and Reise (2000), CTT can be best described as representing a set of ‘Old Rules of Measurement’ that have served applied psychologists and psychometricians for decades.

Developed from common factor theory, CTT provides fairly accurate psychometric information for items resulting in continuous data. However, there are several inherent shortcomings involved when CTT is applied to categorical data that arise from polytomous response formats and dichotomous scoring. While CTT methods may provide reasonable approximations with binary data, it is only a linear approximation to a nonlinear system. As suggested by both the traditional label and the name given to them by Embretson and Reise, the old rules have been improved upon by two modern model-based frameworks for measuring abilities: the extension of biserial and tetrachoric correlation theory with the common factor model referred to as item factor analysis (Bock, Gibbons, & Muraki, 1988; Knol & Berger, 1991), and the development of essentially a nonlinear common factor model suitable for conditional probabilities, or item response theory. Item factor analysis is best discussed in the context of structural equation modeling and confirmatory factor analysis and will not be covered further in this chapter. The interested reader is referred to Mislevy (1986), Muthén (1978), or Takane and de Leeuw (1987) for more information. The following sections will present a summary of the classical rules of measurement, contrast them with the ‘new’ rules of measurement, and illustrate how IRT better addresses some of the shortcomings of the classical methods, primarily when applied to binary data.

Historical development

Classical test theory is frequently cited as having its roots in Gulliksen (1950), however, the procedures upon which CTT is based were developed much earlier by Charles Spearman (1927) who described how to recognize that tests measure a common factor and determine the amount of error in test scores. The identification of a common factor gave rise to the concept of a true score and the common factor theory. Spearman’s common factor theory was further developed and elaborated by Thurstone (see Thorndike & Lohman, 1990), Guttman (1957), Lawley (see McDonald, 1999), and Jöreskog (see McDonald, 1985). Spearman also showed how a correlation between two alternate forms of a test could be used to estimate the amount of measurement error in test scores which became the primary purpose of CTT. Guttman (1945) introduced the concept of internal consistency by showing how items within a test could also be used to determine test reliability, and Cronbach (1951) continued the work to the extent that the most common CTT measure of internal consistency reliability is named after him, Cronbach’s coefficient alpha (α).

IRT developed through the work of two traditions spanning both sides of the Atlantic Ocean. In the United States, Lazarsfeld (1950) introduced latent structure analysis, which eventually became known as IRT. IRT combines factor analysis with the phi-gamma hypothesis¹, one of the oldest laws in psychology that can be traced back as early as 1878 (see Guilford, 1954; McDonald, 1999). Another key development was Lord’s (1952) demonstration that Spearman’s single factor theory could be applied to binary items. Lord and Novick (1968) included four chapters from Allan Birnbaum on IRT. Bock and Aitken (1981) provided the elegant marginal maximum likelihood (MML) method for parameter estimation. In Europe, Rasch (1960) proposed what is now known as the Rasch model or 1-parameter logistic (1PL) model. Anderson (1972) elaborated on the MML estimation methods for Rasch item and person parameters. Gerhard Fischer (1973) extended the binary Rasch model to define parameters by incorporating stimulus properties, treatment conditions, etc. using a linear logistic latent trait model (LLTM). Others have progressed the field of IRT since this seminal work, but they are too numerous to name.
Classical test theory
The focus of CTT is to understand and improve the reliability of test scores. CTT is also synonymous with true score theory due to its decomposition of observed scores (X) into true score (T) and error (E). According to CTT, at the examinee level, any observation is a realization of a random variable X with a probability, or propensity, distribution. The examinee’s true score is then the expectation of this propensity distribution. That is, if an examinee were observed an infinite number of times, the true score would be the average of the multiple observations. The difference between the actual observation and the true score is the error in measurement, where error is also a random variable but with an expectation of zero. CTT also assumes that errors are normally distributed and uncorrelated with other variables.
However, CTT is applied at the level of the test rather than the examinee level, so when examinees are randomly sampled, T becomes a random variable also. Reliability then is the ratio of variability in true scores to variability in observed scores, where the square root of reliability is the correlation between true and observed scores. There have been a number of methods developed to estimate CTT reliability, some of which will be discussed in a later section. For a more detailed discussion of CTT see McDonald (1999) or Crocker and Algina (1986) in addition to the classic Gulliksen (1950) and Lord and Novick (1968) texts.
Item response theory
IRT, also referred to as latent trait theory, strong true score theory, or modern mental test theory, represents a more flexible and more sophisticated testing framework than CTT by making CTT hypotheses more explicit. IRT represents a collection of related model-based psychometric theories that relate item responses to examinee and item characteristics. For a more thorough discussion of the principles of IRT than what is presented here, including additional technical details, see the excellent texts by Baker and Kim (2004); De Boeck and Wilson (2004); Embretson and Reise (2000); Hambleton, Swaminathan, and Rogers (1991); and van der Linden and Hambleton (1996).
The purpose of IRT is to provide an equation, called an item response function (IRF), to maximize the relationship between examinee and item parameters and the probability of a discrete response outcome such as endorsing an item or answering an item correctly. While the only explicit assumptions in CTT pertained to the distribution of measurement errors and their relationship with other variables, IRT makes two strong assumptions. The first assumption, local independence, requires that an examinee has a true location on at least one continuous latent dimension (true score) that can explain performance, resulting in responses that are statistically independent. In other words, proper specification of the latent dimension(s) explains any relationship between observed responses. There may be more than one dimension underlying performance, but all dimensions relevant to explaining performance are specified. Secondary factors are assumed to be mutually independent and collectively orthogonal (unrelated). In the event that not all relevant dimensions are specified, research has shown that IRT is robust to minor violations of this assumption as long as there is a strong dominant factor (Drasgow & Parsons, 1983; Tate, 2002).
The second assumption is that the relationship between performance and the underlying dimension has a specific form. In most IRT applications, including the most common models presented here, the item-trait relationship can be adequately described by a monotonically increasing IRF where, as the level of the trait increases, the probability of a correct response or item endorsement increases as well, in accordance with the phi-gamma hypothesis. Also referred to as an item characteristic curve (ICC), item response curve, or trace line, the IRF maps examinees’ locations on the latent continuum across levels of a construct. Item characteristics,
or parameters, determine the shape of the ICC and will be described shortly. IRT models and the corresponding IRFs differ in the mathematical form of the IRF and/or the number of parameters in the model, but all will have at least one examinee trait parameter and one item parameter. The reliance on an adequate model means that IRT models are falsifiable (they may or may not be appropriate for a particular set of test data) and testable; thus, model-to-data goodness of fit testing is essential (see Embretson & Reise, 2000). Evidence of poor model fit may be an indication of a heterogeneous population and will be discussed later in the chapter.
By relating the probability of an individual item response to both examinee and item parameters, the IRT model explicitly states that an examinee’s response to a given item will be a joint function of examinee characteristics (i.e. level of the trait) and the characteristics of the item itself. When the model of examinee behavior is probabilistic, three fundamental problems with CTT exist when applied to categorical data (see McDonald, 1999). First, if the range of the construct is broad enough, CTT will result in a negative probability of response for examinees in the lower tail of the trait distribution and probability greater than 1.0 in the upper tail. Second, the linear common factor model used in CTT assumes that error variance is independent from true score variance, and this cannot be true for binary items. Third, CTT also assumes that measurement error (standard error of measurement) is constant over all levels of the trait, and this too is not realistic.
In order to represent probability, the IRF must be curvilinear since it is bounded by zero and one. The logistic function, L(Z), where Z represents a linear combination of item and person parameters that varies across types of IRT models, is most commonly used as the link function to relate the linear function of the parameters to the nonlinear probability of the keyed response. The logistic link function is appropriate for a binomial dichotomous variable.
The fundamental item response model is the Rasch model (Rasch, 1960), or 1PL model,
$$P(X_{is} = 1 \mid \theta_s, b_i) = \frac{e^{D(\theta_s - b_i)}}{1 + e^{D(\theta_s - b_i)}}, \tag{1}$$
where $X_{is}$ is the response of person $s$ to item $i$ (0 or 1). The linear combination of parameters Z is the simple difference between the trait level for person $s$, $\theta_s$, and the difficulty of item $i$, $b_i$. The person parameter, $\theta_s$, is the person location parameter indicating a person’s level of the trait. When estimating item parameters, a process referred to as calibration, the person parameter is assumed to be normally distributed; however, non-normal distributions may be accommodated using a prior distribution in a Bayesian framework. The Rasch model is called a 1-parameter model since it contains only one item parameter, $b_i$. The difficulty parameter is sometimes referred to as the item location parameter indicating the item’s position relative to the latent trait. Assuming that the latent trait metric is person-anchored, difficulty is interpreted in IRT as the point at which examinees have a 50 percent chance of answering the item correctly or endorsing the item. Thus, if an examinee’s ability level is equal to the difficulty of the item (i.e. $\theta_s - b_i = 0$), they will have a probability that $X_{is} = 1$ (a correct response or item endorsement) of 0.50. An item’s difficulty typically ranges from −2.0 to 2.0, where a negative value indicates an easier, more frequently endorsed item. In the ability context, an item with a negative difficulty parameter would be appropriate for an examinee of below-average ability. In a clinical context using a symptom checklist for depression (assuming that a high depression score indicates a depressed individual), an item with a negative difficulty parameter would indicate that a person who is below the average level of depression has a 50 percent chance of endorsing that item or exhibiting that symptom. The IRT difficulty parameter is comparable to the mean item response in CTT. The 1PL model assumes that all items have the same degree of relationship, or discrimination, with the construct. In CTT, this is referred to as parallel items (McDonald, 1999). The constant multiplier
D = 1.701 is sometimes added to the logistic function to make it virtually indistinguishable from the cumulative normal-ogive function (McDonald, 1999).
Figure 16.1 Item response functions for six hypothetical items (horizontal axis: Difficulty/Ability, −3.0 to 3.0; vertical axis: Probability, 0.0 to 1.0). A, B, and C are 1PL models; D and E are 2PL models, and F is a 3PL model. The numbers in parentheses correspond with the discrimination, difficulty, and guessing parameter estimates, respectively: A(1,0,0), B(1,1,0), C(1,−1,0), D(.5,1,0), E(1.5,−1,0), F(1,0,.1)
IRFs A, B, and C in Figure 16.1 reflect three items that differ in difficulty or location, but are equal in discrimination. IRF A is appropriate for an examinee of average ability ($b_i = 0$), while IRFs B and C are appropriate for examinees who are above average on the trait ($b_i = 1.0$) and below average on the trait ($b_i = -1.0$), respectively. These items are equal in discrimination because they have the same shape or slope indicating the same relationship with the trait, just offset in location.
The most commonly used IRT model, the 2-parameter logistic (2PL) model, allows items to vary in difficulty and in discrimination,
$$P(X_{is} = 1 \mid \theta_s, b_i, a_i) = \frac{e^{D a_i(\theta_s - b_i)}}{1 + e^{D a_i(\theta_s - b_i)}}, \tag{2}$$
where $a_i$ is the discrimination parameter and is proportional to the slope of the IRF where $\theta_s = b_i$. Discrimination parameters typically range from 0.5 to 1.5. The IRF for an item with high discrimination looks like a step function. The IRT discrimination parameter corresponds to the CTT item-total correlation, and a discrimination of 1.0 corresponds to a common factor loading of 0.70. IRFs C and E and IRFs B and D in Figure 16.1 reflect the effect of unequal discrimination on the probability of a correct response or item endorsement. IRFs C and E have the same location ($b_i = -1.0$), but differ in the slope of the IRF at the location parameter, with IRF E having a steeper slope indicating a more discriminating item. In CTT, one would say item E has a higher item-total correlation than item C. IRFs B and D also share the same location ($b_i = 1.0$), but IRF D has a lower slope at the location parameter and thus is a less discriminating item.
The 3-parameter logistic (3PL) model is represented as,
$$P(X_{is} = 1 \mid \theta_s, b_i, a_i, c_i) = c_i + (1 - c_i)\,\frac{e^{D a_i(\theta_s - b_i)}}{1 + e^{D a_i(\theta_s - b_i)}}, \tag{3}$$
where $c_i$ represents a lower asymptote, or guessing, parameter for the model to reflect the probability of a correct response by chance alone. IRF F in Figure 16.1 illustrates the impact of a guessing parameter on the IRF. Item F has a lower asymptote of $c_i = 0.10$, indicating that regardless of an examinee’s ability or location on the trait, an examinee always has at least a 10 percent chance of responding correctly or endorsing that item due to chance alone. In comparison, examinees of low ability or trait level have a near 0 percent chance of responding correctly to items A–E. There is no equivalent to the guessing parameter under CTT.
Several extensions of the basic IRT models have been developed. Bock (1972) extended the 2PL model to the nominal response model in order to use all information contained in examinee responses. Thissen and Steinberg (1984, 1986) showed that all other non-ordered polytomous models are special cases of the nominal response model. The partial credit model (PCM; Masters, 1982) and its derivation, the rating scale model (Andrich, 1978), were introduced for the case where partial credit may be necessary, as is often the case with math problems. The graded response model (Samejima, 1969) assumes available response categories can be ordered (e.g. Likert scales). The binomial trials model can be used for situations involving the probability that an examinee completes x of n trials, such as making 8 of 10 free throws in a basketball game. The Poisson counts model is appropriate for measurement situations in which the number and difficulty of events (e.g. push-ups, sit-ups) completed per period of time must be considered.
Other examples of IRT models include the multidimensional extensions of the 1PL and 2PL models: the multidimensional Rasch model (Reckase & McKinley, 1982) and the multidimensional 2PL model (Reckase, 1997). Fischer’s LLTM has been extended to the multicomponent latent trait model (MLTM; Whitely, 1980), the general component latent trait model (GLTM; Embretson, 1984), and the multidimensional Rasch model for learning and change (MRMLC; Embretson, 1991). Several models for continuous responses have been developed, such as Mellenbergh (1994), as well as models for exploring the multidimensionality of a scale akin to exploratory factor analysis, the exploratory multidimensional IRT model (Bock, Gibbons, & Muraki, 1988), and confirming the dimensionality of a scale akin to confirmatory factor analysis, the confirmatory IRT models for traits (Embretson, 1991, 1997; DiBello et al., 1995; Adams et al., 1997).
The benefits of a model-based approach
CTT has an advantage over IRT in that most CTT procedures have a closed form² and are computationally simple, whereas IRT requires complex estimation procedures (MML, Empirical Bayes, etc.). It is also true that the correlation between IRT person ability and the CTT summed scale score is usually very high, and so an argument can be made that not much is gained through IRT. However, just because two scalings (CTT and IRT) are equivalent (or nearly so) does not mean that they will produce similar experimental and applied results. IRT separates examinees in the extreme ranges of the ability distribution rather than in the middle by providing optimal scaling of individual differences. For instance, in a bivariate scatterplot of CTT and IRT trait estimates, a Loess fit line would take on an ogive form with examinees having a high degree of correspondence around the average trait level and more variability at the extreme ranges of the ability distribution. Several authors have reported problems with using CTT scores as a metric for scaling individual differences or comparing groups (Maxwell & Delaney, 1985; Yen, 1986; Bond & Fox, 2001), testing moderated effects (Embretson, 1996), and change (Bereiter, 1963; Embretson, 1998b, 2007; Fraley et al., 2000), where these problems were alleviated by IRT scaling. In addition, IRT’s unique properties are necessary to facilitate advanced measurement applications such as CAT (Weiss, 1982), detecting item bias or differential item functioning (DIF; Lord, 1980), and
test linking or equating (Cook & Eignor, 1983, 1989). Despite its historical popularity, CTT has many shortcomings. These shortcomings will be discussed in the context of the four characteristics of quality measurement: a meaningful metric for obtained scores, score reliability, score validity, and a standardized mode of test administration.
A meaningful metric
When constructs are considered latent (e.g. intelligence, depression, attitudes, etc.) and are not directly observable (unlike, e.g., pounds, liters, or kilometers), they have no inherent metric. Under CTT, in many cases, construct scores have little or no meaning unto themselves unless they can be compared to a normative group. Normative information serves as a reference by which to evaluate how an individual compares to others who took the same test. IRT improves on this limitation by providing a sample-free metric for interpretation of performance.
Invariance. Perhaps the most significant characteristics of CTT are the dependency of the true score estimate on the specific test and population and the dependency of item characteristics on the specific sample from which they are derived. This means that examinee and test characteristics cannot be separated. That is, ability estimates apply only to items on a specific test or to items on a parallel test with equivalent item properties, and item characteristics depend on the group of examinees from which responses are obtained. Under CTT, the trait level is estimated by calculating the unit-weighted summed scale score. The meaning of the score is obtained by comparing the individual’s performance to its position in a normative group in order to obtain a ‘true score,’ or the expected value of observed performance on the test of interest (Hambleton, Swaminathan, and Rogers, 1991). If an item is added or removed, the true score changes, resulting in a unique psychometric scale for every test. If a test is difficult, an examinee of average ability will appear to perform poorly, and if a test is easy, that same examinee will appear to perform at a high level. This is because the difficulty of a test, or individual items for that matter, is defined in CTT as the proportion of examinees in a group of interest who answer the item correctly. Thus, a difficult versus easy distinction depends on the examinees taking the test, and performance depends on whether items are hard or easy.
IRT provides person-free item parameter estimation and item-free person parameter estimation that are invariant within a linear transformation, meaning that item parameters from one sample can be linearly transformed to be equal to parameters from a second sample. IRT places person ability and item difficulty on the same scale, explicitly estimating the joint relationship between person and item properties. Therefore, responses from items with known IRFs can be used to estimate trait levels for other samples. In CTT, the model does not include item properties, so the trait level applies only to particular items on that test. In contrast to CTT, the meaning of a trait level in IRT applies to any item where item characteristics are known. This is essential for specific objectivity: the case in which comparison of examinees is independent of the specific items or tests administered. In IRT, a number of item properties can be incorporated into the model including item difficulty, discrimination, susceptibility to guessing, the nature of the response alternatives, impact of substantive item features, average response time, etc.
It is important to note that even in IRT, careful consideration must be given when selecting the sample of examinees to be used for item calibration. As noted earlier, item parameter estimation assumes that the trait is person-anchored. If the calibration sample is not a representative sample of the population that an item bank is being developed for, the researcher will have difficulty in interpreting the meaning of the resulting item parameters. However, once the representative calibration sample is selected and IRFs are known, items from a calibrated bank can be used to estimate trait levels for other samples and the resulting trait estimates are comparable across samples, administrations, and studies.
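The idea that responses to items with known IRFs can be used to estimate a trait level can be made concrete with a small sketch. The code below is an illustration only and is not taken from the chapter: the function names (`irf`, `log_likelihood`, `estimate_theta`) and the `ITEMS` table are hypothetical, the item parameters are those of the six hypothetical items in Figure 16.1, and a plain grid search over the trait continuum stands in for the estimation routines used by operational IRT software. The IRF follows the form of Equation 3, which reduces to the 2PL and 1PL models when the guessing parameter is zero and the discrimination is one.

```python
import math

D = 1.701  # scaling constant quoted in the chapter (McDonald, 1999)

# Hypothetical calibrated bank: (discrimination a, difficulty b, guessing c),
# copied from the six items of Figure 16.1.
ITEMS = {
    "A": (1.0, 0.0, 0.0),
    "B": (1.0, 1.0, 0.0),
    "C": (1.0, -1.0, 0.0),
    "D": (0.5, 1.0, 0.0),
    "E": (1.5, -1.0, 0.0),
    "F": (1.0, 0.0, 0.1),
}


def irf(theta, a=1.0, b=0.0, c=0.0):
    """3PL item response function; c=0 gives the 2PL, and a=1, c=0 the 1PL."""
    z = D * a * (theta - b)
    return c + (1.0 - c) / (1.0 + math.exp(-z))


def log_likelihood(theta, items, responses):
    """Log-likelihood of a 0/1 response pattern, assuming local independence."""
    total = 0.0
    for (a, b, c), x in zip(items, responses):
        p = irf(theta, a, b, c)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total


def estimate_theta(items, responses, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum likelihood estimate of the trait level on [lo, hi]."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))


if __name__ == "__main__":
    # Invented response pattern for one examinee on all six items.
    bank = list(ITEMS.values())
    print(estimate_theta(bank, [1, 0, 1, 0, 1, 1]))
    # The same examinee scored on a subset of the bank (items A, C, E only):
    subset = [ITEMS[k] for k in ("A", "C", "E")]
    print(estimate_theta(subset, [1, 1, 1 - 1]))
```

Because every item carries its own known parameters, the same function can score an examinee on any subset of the bank, and the resulting estimates are expressed on one common trait scale. Note that a response pattern that is all correct or all incorrect has no finite maximum likelihood estimate; in this sketch the estimate simply lands on the boundary of the grid.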
Comparing groups. In order to compare the performance of groups of examinees, the items on the test must function the same for all examinees regardless of group membership. That is, the scale items must exhibit measurement invariance (Vandenberg & Lance, 2000). Under the IRT framework, an item exhibits DIF if the IRF is not equivalent when estimated separately for each group. Such a comparison is only possible because of the parameter invariance properties of IRT. Proper identification of DIF is hindered under the CTT framework by the lack of sample-independent item statistics. DIF has increased in prominence, and will continue doing so, along with the increased emphasis on test fairness (see American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999). See Holland and Wainer (1993); Millsap and Everson (1993); Waller et al. (2000); or Reise et al. (2001) for further discussions and illustrations of DIF.
Comparing different measures of the same trait. Historically, when it was necessary to compare or relate test scores from two different administrations or test scores from two different measures of the same construct, test-equating procedures were necessary (see Dorans & Holland, 2000; Embretson & Reise, 2000). The development and refinement of IRT procedures allows for a more powerful approach referred to as scale linking (Choi & McCall, 2002). Scale linking through IRT solves two classic problems experienced under CTT: respondent non-response resulting in a different set of items for different examinees, and different measures for different examinees (see Vale, 1986). Under CTT, when non-response occurs, the average item response may be used instead of the unit-weighted summed score, or a missing data procedure such as multiple imputation (MI; Schafer, 1997) may be used, although this is rarely done or recommended. Under IRT, examinee non-response is not a problem because the trait level can be estimated from any set of items with known IRFs. The need to link different measures for different examinees often occurs due to changing content over time (e.g. revisions, short forms, etc.), different measures of a common construct, or the same measure administered in different languages. These situations are also easily remedied due to the invariance property of IRT.
Measurement of change. Under CTT, change scores can be meaningfully compared only when initial score levels are equivalent, because a small deviation from a high initial score on an easy test does not mean the same thing as a small score change from an average score; an interval scale level of measurement is not achieved (Embretson & Reise, 2000). If an interval scale of measurement is achieved through transformations, then it is specific to that particular test administration. However, in IRT, change scores can be meaningfully compared even when the initial scores are unequal³. This is largely due to the interval scale nature of item difficulty parameters and individual trait parameters.
Bereiter (1963) indicated three basic problems with using a simple CTT difference score to indicate change: a paradoxical relationship between the test-retest correlation and the reliability of the change score, a negative correlation between the initial score and the change score, and the aforementioned scaling issue. A fourth problem is whether the change score actually reflects change due to a condition or is simply error (Embretson, 1998a). A special Rasch-family model, the multidimensional Rasch model for learning and change (MRMLC; Embretson, 1991), addressed the four difficulties of CTT by resolving the scaling and reliability problems found with standard ‘change’ scores and removing some of the confounds that occur with initial status. Two of the problems are addressed by IRT in general. First, the Rasch model achieves interval scale properties (see Andrich, 1985; Fischer, 1995a). Second, the MRMLC, as an IRT model, provides individual standard error of measurement estimates. The MRMLC specifically addresses the two change score dilemmas: the issue of paradoxical reliabilities is addressed by modeling individual change directly in a model that explains changing test correlations, and the correlation between the initial score and the change
score is resolved by achieving interval scale properties (Embretson, 1998b).
Reliability
Reliability refers to the accuracy or precision of a measurement instrument. That is, scores must be reliable before they can be valid. It is important to note that tests themselves are not reliable; the resulting scores are. It is possible for a given test to yield highly reliable scores in some circumstances but not others. Responsible reporting of test results should always include the reliability estimate in order to reflect the impact of sample-specific characteristics on score reliability.
Internal consistency. The second CTT shortcoming concerns the definition of reliability and its complement, the standard error of measurement (SEM). Under CTT, a measure is reliable, or consistent, if an individual examinee can hypothetically be measured a large number of times and achieve the same score each time. Reliability quantifies the proportion of true score variance in a set of scores. Even though the CTT model, X = T + E, specifies that there are two independent variables (IVs) per person (T and E), these IVs are not actually separable for an individual score. Instead, communalities (correlations) between items are used to infer population estimates of true and error variance. Reliability is estimated as the correlation between test scores on parallel forms of a test or as a function of inter-correlations among items on a test. As a test-level estimate under CTT, scale reliability, as well as the SEM, applies equally to all individuals in a sample that takes the test or all scores obtained from a particular test administration. Thus, CTT is relevant to reliability only at the population level and not at the individual level.
There are two primary sources of measurement error in observed scores: inconsistency across time and/or test forms (between-test variability), and inconsistency across items within a test (within-test variability). Spearman (1927) illustrated that the correlation between two alternate test forms could be used to estimate test reliability (Lord & Novick, 1968); thus, between-test variability is often assessed by either repeated administrations of the same test (test-retest reliability) or by administrations of parallel forms (alternate forms reliability). Within-test consistency can be assessed with a single test administration either by use of split-half reliability or, most commonly, coefficient alpha (Guttman, 1945; Cronbach, 1951).
Coefficient alpha, which is based on the average inter-item correlation, quantifies the internal consistency within a test and is appropriate for multiple-item measures that measure a single common construct (i.e. are unidimensional). Coefficient alpha derivations assume that all items measure the same construct (i.e. the test is unidimensional), and all items are assumed to be equally related to the construct (i.e. parallel measures). For dichotomously scored items, the Kuder-Richardson Formula 20 (KR20) is identical to coefficient alpha, and if all items have the same degree of difficulty, the Kuder-Richardson Formula 21 (KR21) may be used. Several factors influence the reliability of test scores under CTT, including the heterogeneity of the sample, the level of the sample on the construct, and the number of items. Numerous other reliability coefficients have been developed for CTT that provide either lower bound estimates or estimates with unknown biases (see Hambleton & van der Linden, 1982).
Under CTT, the ‘quality’ of items or their relationship with the trait is evaluated based on the mean item response and the item-total correlation. The mean item response, or the proportion endorsing the item in the keyed direction, is a measure of the difficulty of the item, and the item-total correlation is an indication of how well the item taps the construct of interest. Such item statistics are not invariant across diverse samples and are thus sample dependent. Item difficulty changes depending on the average trait level of the respondent sample, and the item-total correlation is heavily influenced by the variability of scale scores on a given sample
and changes depending on whether items are where information reflects how well an item
added or deleted from the test. differentiates among respondents who are at
Unfortunately, coefficient alpha and the different levels of the latent variable. Under
Kuder-Richardson formulas themselves can IRT, the IIF and a scale information function
misestimate scale reliability. When items are (SIF) are calculated that allow measurement
not parallel, regardless of dimensionality, error to vary across levels of the trait. By
coefficient alpha is actually a lower-bound allowing for non-uniform precision across
reliability estimate (i.e. reliability is under- the entire range of trait levels with extreme
estimated; see Lord & Novick, 1968; Raykov, levels of a trait having more measurement
1997). Conversely, when unidimensionality error than the typical levels of the trait,
is violated by the inclusion of subscales, IRT provides a more realistic and valid
methods factors, or strict time limits (causing conceptualization of reliability.
the introduction of a speed of processing Information is a function of item parameters
factor), coefficient alpha can result in an at any given trait level. For the 1PL model,
overestimate of the scale precision. Recently, information is a product of the probability of
newer methods have been proposed to a correct response, pi (θ), and probability of an
more accurately estimate scale reliability and incorrect response, qi (θ ). Item information for
allow for establishing a confidence interval the 2PL and 3PL models further incorporate
around the point estimate (see Raykov, 1997; the discrimination and guessing parameters.
Raykov & Shrout, 2002). See Figure 16.2 for example IIFs relative to
Information. The most significant differ- three of the IRFs presented in Figure 16.1.
ence between IRT and CTT is the con- The IIF appears as a bell-shaped function
ceptualization of measurement error. Under with the maximum information provided at
CTT, there is a single index of reliability the location parameter. That is, information
for all examinees. Instead of item reliability, is greatest when the item’s difficulty and the
IRT uses an item information function (IIF), person’s ability are matched. The shape of
[Figure 16.2 plot: IRFs (Probability axis, 0.0 to 1.0) and IIFs (Information axis, 0.00 to 0.60) for items A, D, and E, plotted against Difficulty/Ability from −3.0 to 3.0.]
Figure 16.2 Item information functions contrasted with their corresponding item response functions for three of the items in Figure 16.1 differing in discrimination and difficulty
[Figure 16.3 plot: item information functions for items A to F, the scale information function (SIF), and the standard error (SE), plotted against Difficulty/Ability from −3.0 to 3.0; vertical axes are Information and Standard Error.]
Figure 16.3 Item information functions, scale information function, and test standard error for a hypothetical test that includes the six items originally presented in Figure 16.1. Note that item E has the highest discrimination and thus the most information. Even though the average difficulty is 0.0, maximum precision is obtained for examinees that are approximately 0.80 standard deviations below ‘average’
the IIF is indicated by the item discrimination, for models with varying discrimination parameters. Note that the highest IIF is for item E, which also has the highest discrimination ($a_i = 1.5$). Highly discriminating items provide more information over a narrow range, while low discriminating items provide less information over a broader range. For instance, the IIF for item E shows that a lot of information is available over a narrow range of abilities from about −2.0 to 0.0, centered at the item location parameter for that item ($b_i = -1.0$), while the IIF for item D shows that much less information is available for examinees with a much broader range of abilities.
Due to local independence (the latent variable explains any relationship between items), item information is additive, so test information represented by the SIF is a sum of item information. The SIF can be recomputed just as the CTT statistic of alpha-if-deleted is calculated. The standard error of measurement (SEM) at a given trait level is then the reciprocal of the square root of the SIF. Information in IRT allows the reliability of a test to be shaped for different ranges of ability. Figure 16.3 includes the SIF and SEM for a hypothetical test that includes the same six items illustrated in Figure 16.1. While the average difficulty of the six items is approximately 0.0, maximal precision is actually obtained for examinees at $\theta_s / b_i = -0.80$ because of the additional information provided by item E at $b_i = -1.0$ and the relatively little information provided by item D at $b_i = 1.0$. Thus, this hypothetical test would yield the most precise measurement for individuals who are approximately 0.80 standard deviations below average on the trait of interest.
As a parallel to coefficient alpha, an empirical reliability coefficient can be computed as an average reliability across examinees. The empirical reliability coefficient (see du Toit, 2003) may be given as the ratio of the variance in estimated scores for the sample, $\sigma_\theta^2$, to the sum of $\sigma_\theta^2$ and the mean square SEM ($\sigma_E^2$).
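The quantities described in this passage can be illustrated with a minimal sketch. It assumes a logistic metric with no scaling constant, the standard 3PL information formula (which reduces to $a_i^2 p_i(\theta) q_i(\theta)$ when the guessing parameter is zero), and a population (divide-by-n) variance for the empirical reliability coefficient. The function names and the item pool are hypothetical; only item E's discrimination and difficulty and item D's difficulty are taken from the text.

```python
import math


def p_correct(theta, a=1.0, b=0.0, c=0.0):
    """3PL item response function; c = 0 gives the 2PL, a common `a` the 1PL."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a=1.0, b=0.0, c=0.0):
    """Item information at theta; equals a^2 * p * q when c == 0."""
    p = p_correct(theta, a, b, c)
    q = 1.0 - p
    return a ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2


def scale_information(theta, items):
    """SIF: the sum of the item informations (relies on local independence)."""
    return sum(item_information(theta, **item) for item in items)


def standard_error(theta, items):
    """SEM at theta: the reciprocal of the square root of the SIF."""
    return 1.0 / math.sqrt(scale_information(theta, items))


def empirical_reliability(theta_hats, sems):
    """Variance of the estimated scores over that variance plus the mean squared SEM."""
    n = len(theta_hats)
    mean = sum(theta_hats) / n
    var_theta = sum((t - mean) ** 2 for t in theta_hats) / n  # assumption: divide by n
    mean_sq_sem = sum(s ** 2 for s in sems) / n
    return var_theta / (var_theta + mean_sq_sem)


# Hypothetical item pool, not the parameters behind Figure 16.1.
items = [
    {"a": 1.5, "b": -1.0},  # item E: a and b as stated in the text
    {"a": 0.5, "b": 1.0},   # item D: b from the text, a assumed
    {"a": 1.0, "b": 0.0},   # purely hypothetical third item
]

if __name__ == "__main__":
    for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(theta, scale_information(theta, items), standard_error(theta, items))
```

For a 2PL item the peak information is $a_i^2/4$ at $\theta = b_i$, so an item with $a_i = 1.5$ peaks at 0.5625. Dropping an item from `items` and calling `scale_information` again mirrors the alpha-if-deleted style of recomputation mentioned above.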
weighting and be comparable to the 1PL model. Reise and Henson (2003) report that in personality research, there is no substantial evidence to show that IRT increases the magnitude of the validity coefficient and thus external validity.
External validity, however, is not sufficient to elucidate construct meaning. The relationship between test scores and other variables elaborates the test’s nomological network, but this confounds the meaning of a construct with its significance. Even though construct significance is elaborated by empirically established relationships, construct meaning is not (Bechtoldt, 1959). Establishing substantive validity (Messick, 1994) more directly involves construct meaning. Embretson (1983) suggested that the theory behind a construct must be given a more central role in defining construct meaning by differentiating construct representation from nomothetic span. That is, the construct representation aspect of construct validity (see Embretson, 1983; Messick, 1994) is explained by understanding the cognitive processes and strategies that are involved in items, as well as by understanding the specific knowledge that is required for successful item completion. Whereas nomothetic span is supported by individual differences relationships, construct representation is supported by studying the impact of item and task features on item responses. This distinction results in several advantages for test development, including the capacity to design items to reflect specific cognitive constructs and to select items for stimulus features that influence targeted processes.
IRT and CTT differ greatly in their potential for explicating construct representation and for guiding test design. Four general criteria can be applied to evaluate a psychometric methodology for construct validity: relating individual test performance to the characteristics of item stimuli, providing a comparison of alternative theories of the task constructs, establishing specific terms for theoretical construct quantification, and measuring individuals on the constructs involved. CTT does not meet these criteria because it is test-score oriented and does not link item properties directly to the trait. Embretson (1983) noted that all four criteria could be met by multicomponent latent trait modeling, which combines IRT with mathematical modeling. In this approach, task decomposition is applied to test items as a basis for estimating the theoretical parameters. Some early examples of this approach are the linear logistic test model (LLTM; Fischer, 1973), the multicomponent latent trait model (MLTM; Whitely, 1980), and the general component latent trait model (GLTM; Embretson, 1984).
Development of measures from cognitive principles. Even though researchers in psychological and educational measurement have been interested in developing tests based on cognitive principles for quite some time, little progress has been made. Aptitude and ability tests are frequently described using cognitive terms, but the real utility of cognitive theory has been widely ignored (Pellegrino, 1998). Cognitive psychology principles can be useful for test design because justifiable operational definitions are required for construct measurement, the field frequently takes advantage of detailed descriptions of task stimulus properties, and they provide results on how item properties influence the cognitive processes involved in problem solving (Embretson, 1998a). Understanding the sources of cognitive complexity in items can lead to effective means of item generation. The stimulus features that are quantified to represent sources of cognitive demand can potentially be manipulated to develop items with specified sources and levels of cognitive complexity. Since the stimulus features are quantified in the cognitive model, item difficulty is predictable, depending on the strength of the model. This further leads to the possibility of quickly producing a large number of items that may require little or no empirical tryout, due to the priors from the model predictions (Mislevy, 1993). Effective cognitive models have been developed for many non-verbal intelligence tests (Bejar, 1993; Embretson & Gorin, 2001; Embretson, 2002) and several researchers have demonstrated the potential to generate test items based on a specific
cognitive theory (Hornke & Habon, 1986; Bejar & Yocam, 1991; Embretson, 1994). Despite these successes, psychometric tests designed from cognitive theory have been rare. See Embretson (1995, 1998a), Kyllonen (1993), Kyllonen and Christal (1989, 1990), and Draycott and Kline (1994) for examples of cognitive design systems and efforts towards developing psychometric measures from cognitive principles.
Standardization
A measure is standardized if there are uniform procedures to ensure that the measure is administered and scored the same way each time it is used. If so, two individuals who receive the same score can be interpreted to possess the same amount of the attribute. However, there is a great degree of variability in the procedures that are used for standardization. Measures scored through CTT can be easily standardized, but some of the special qualities of IRT allow for major advances in test administration.
Computer-based testing. The development of the computer and its application to testing has brought with it several improvements in test standardization. In contrast to the traditional pencil-and-paper mode of test administration, CBT has become a common form of test delivery. Perhaps the most notable advantage of CBT over the pencil-and-paper and interview formats is the level of administrative control given over the testing conditions. CBT simplifies administration, requires fewer resources, provides faster results, may be less prone to testing-related errors, may minimize examinee cheating, and has become more cost-effective as the cost of personal computers decreases and their prevalence increases (Mead & Drasgow, 1993). From a logistical perspective, CBT can reduce testing time, provide immediate scoring, allow more frequent testing, provide the opportunity for walk-in testing, allow individual administration, and increase test security by reducing the possibility that examinees can provide information to one another. More complex item types are available through increased capacity for expanded visualization, audition, and interaction, and the automated nature of computers can be capitalized on to develop or compile, process, and score tests, including complex responses such as open-ended questions. Finally, from a substantive perspective, CBT has the potential to assess new skills, some even better than other testing formats, and can allow access to data that are not readily available from a pencil-and-paper format (e.g. response time).
CBT is not without its unresolved problems, however. Access to computer testing centers or the internet, item security, test-delivery system reliability, and the expense of development are of primary concern. The psychometric quality of the tests, the adequacy of the supporting theoretical models, and the issue of whether test bias occurs due to the effect of access to technology on performance are very active areas of inquiry. See Mills et al. (2002) for the current state of the art in CBT.
Computerized adaptive testing. CBT itself is not an advance attributable to the benefits of IRT over CTT. However, an important issue related to CBT is item selection. In a conventional test, every examinee receives the same (or parallel) test form, with the same item set, in the same (or counterbalanced) presentation order. Conventional tests are usually administered through the pencil-and-paper format, but a computer may be used to administer the test as well. Conventional tests are usually geared towards the average examinee, so they are not the best estimators of ability for examinees at the extremes of the ability continuum (low or high). These tests can be time-intensive for the examinee, but as group tests, they are relatively convenient for the administrator.
Adaptive tests tailor item selection to meet the examinee’s individual ability levels by selecting from a pool of items that are the most appropriate for that particular examinee, so not all examinees will receive the same set of items. Traditional individual intelligence tests or subtests that require the administrator to determine a baseline ability level for the examinee and then administer increasingly more difficult items until a ceiling is reached
testing, no new issues emerge depending on the adequacy of the test administration programming. However, unproctored internet testing is quite controversial, prompting an American Psychological Association committee to outline the various issues involved in internet testing (see Naglieri et al., 2004). Even though unproctored testing cannot be generally recommended, research on the various issues is actively in progress.
SUMMARY AND CONCLUSIONS
This chapter has reviewed the principles of CTT and contrasted them with a modern measurement framework, IRT. IRT represents several important advances over classical methods, including the capacity to model the measurement process at the behavior level rather than at the instrument or person level; provide a meaningful and interpretable metric for comparing individual performance within a sample as well as between unrelated samples; provide a framework for acknowledging the non-uniform precision of measurement across the entire range of the trait; and provide the platform by which advances in computerized testing are possible.
IRT has its roots in educational testing and the general testing of mental abilities, and a vast majority of applications have been in these contexts. While CTT has historically been the dominant paradigm for measurement in the social sciences, and remains the preferred paradigm for a majority of applied researchers in the social sciences, the advances represented by IRT have been made apparent by this chapter, the numerous texts on the subject, and the rapidly expanding literature containing numerous applied examples. For instance, emerging work from a diverse set of applied research contexts is demonstrating the applicability of IRT in the broader social research context. Some emerging research contexts include personality assessment (Reise & Henson, 2003), stroke rehabilitation (Duncan et al., 1999; Andres et al., 2004), smoking cessation (Noel, 1999), attitude measurement (Roberts, 1995), cultural linguistic differences (Alderman & Holland, 1981), and physical functioning (McHorney & Cohen, 2000). Through the continued expansion of available and user-friendly software and an increase in approachable references and applications, IRT will continue to become better appreciated and further cemented in its status as the modern measurement framework.
NOTES
1 According to the phi-gamma hypothesis, when a series of stimuli is controlled to range in intensity from zero to high intensity, the probability that an observer can detect the increasing stimuli monotonically increases from zero to unity along a psychometric curve that can be represented by the cumulative normal distribution function. In modern measurement terms, as an examinee's trait level increases relative to the difficulty of an item, the probability that the examinee will correctly answer the item increases according to the cumulative normal distribution function.
2 An equation is said to have a closed form if it can be expressed in terms of so-called ‘elementary functions’ such as addition, subtraction, multiplication, division, or exponentiation. In other words, it has a finite and exact solution. In contrast, estimation procedures such as MML and empirical Bayes require iterative procedures and result in an approximate solution that maintains a reasonably small amount of error.
3 Classical test theory suffers from the ‘physicalism-subjectivism’ dilemma that equal raw score differences do not necessarily correspond to equal differences in the true latent trait. This is related to the problem of ceiling and floor effects common in raw scores (Bereiter, 1963; Harris, 1963; Lord, 1963). It is considered well known that these dilemmas are solved or are at least less critical when using IRT (see Fischer, 1987, 1989, 1995b).
REFERENCES
Adams, R. J., Wilson, M., & Wang, W. C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1–23.
Alderman, D. L., & Holland, P. W. (1981). Item performance across native language groups on the Test of English as a Foreign Language (Research Rep. No. 81-16). Princeton, NJ: Educational Testing Service.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
Andersen, E. B. (1972). The numerical solution of a set of conditional estimation equations. Journal of the Royal Statistical Society, Series B, 34, 42–54.
Andres, P. L., Black-Schaffer, R. M., Ni, P., & Haley, S. M. (2004). Computer adaptive testing: A strategy for monitoring stroke rehabilitation across settings. Topics in Stroke Rehabilitation, 11, 33–39.
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561–594.
Andrich, D. (1985). Rasch measurement models. Newbury Park, CA: Sage Publishers.
Baker, F. B., & Kim, S. H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). New York: Marcel Dekker.
Bechtoldt, H. (1959). Construct validity: A critique. American Psychologist, 14, 619–629.
Bejar, I. I. (1993). A generative approach to psychological and educational measurement. In N. Frederiksen, R. J. Mislevy, & I. I. Bejar (Eds.), Test theory for a new generation of tests (pp. 323–359). Hillsdale, NJ: Erlbaum.
Bejar, I. I., & Yocam, P. (1991). A generative approach to the modeling of isomorphic hidden-figure items. Applied Psychological Measurement, 15, 129–138.
Bereiter, C. (1963). Some persisting dilemmas in the measurement of change. In C. Harris (Ed.), Problems in measuring change (pp. 3–20). Madison, WI: University of Wisconsin Press.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–424). Reading, MA: Addison-Wesley.
Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29–51.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of parameters: An application of an EM algorithm. Psychometrika, 46, 443–459.
Bock, R. D., Gibbons, R., & Muraki, E. (1988). Full-information item factor analysis. Applied Psychological Measurement, 12, 261–280.
Bond, T. G., & Fox, C. M. (2001). Applying the Rasch model: Fundamental measurement in the human sciences. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Choi, S. W., & McCall, M. (2002). Linking bilingual mathematics assessments: A monolingual IRT approach. In G. Tindal & T. M. Haladyna (Eds.), Large-scale assessment programs for all students: Validity, technical adequacy, and implementation (pp. 317–338). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Cook, L. L., & Eignor, D. R. (1983). Practical considerations regarding the use of item response theory to equate tests. In R. K. Hambleton (Ed.), Applications of item response theory (pp. 175–195). Vancouver, BC: Educational Research Institute of British Columbia.
Cook, L. L., & Eignor, D. R. (1989). Using item response theory in test score equating. International Journal of Educational Research, 13, 161–173.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt, Rinehart and Winston.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.
De Boeck, P., & Wilson, M. (2004). Explanatory item response models. New York: Springer.
DiBello, L. V., Stout, W. F., & Roussos, L. (1995). Unified cognitive psychometric assessment likelihood-based classification techniques. In P. D. Nichols, S. F. Chipman, & R. L. Brennan (Eds.), Cognitively diagnostic assessment (pp. 361–389). Hillsdale, NJ: Erlbaum Publishers.
Dorans, N. J., & Holland, P. W. (2000). Population invariance and the equatability of tests: Basic theory and the linear case. Journal of Educational Measurement, 37, 281–306.
Drasgow, F., & Parsons, C. (1983). Applications of unidimensional item response theory models to multidimensional data. Applied Psychological Measurement, 7, 189–199.
Draycott, S. G., & Kline, P. (1994). Speed and ability: A research note. Personality and Individual Differences, 17 (6), 763–768.
Duncan, P. W., Wallace, D., Min Lai, S., Johnson, D., Embretson, S., & Laster, L. J. (1999). The stroke impact scale version 2.0: Evaluation of reliability, validity, and sensitivity to change. Stroke, 30, 2131–2140.
du Toit, M. (Ed.) (2003). IRT from SSI: BILOG-MG, MULTILOG, PARSCALE, TESTFACT. Lincolnwood, IL: Scientific Software International.
Embretson, S. E. (1983). Construct validity: Construct representation versus nomothetic span. Psychological Bulletin, 93, 179–197.
Embretson, S. E. (1984). A general multicomponent latent trait model for response processes. Psychometrika, 49, 175–186.
Embretson, S. E. (1991). A multidimensional latent trait model for measuring learning and change. Psychometrika, 56, 495–516.
Embretson, S. E. (1994). Application of cognitive design systems to test development. In C. R. Reynolds (Ed.), Cognitive assessment: A multidisciplinary perspective (pp. 107–135). New York: Plenum.
Embretson, S. E. (1995). The role of working memory capacity and general control processes in intelligence. Intelligence, 20, 169–189.
Embretson, S. E. (1996). Item response theory models and spurious interaction effects in factorial ANOVA designs. Applied Psychological Measurement, 20, 201–212.
Embretson, S. E. (1997). Structured ability models in tests designed from cognitive theory. In M. Wilson, G. Engelhard, & K. Draney (Eds.), Objective measurement III (pp. 223–236). Norwood, NJ: Ablex.
Embretson, S. E. (1998a). A cognitive design system approach to generating valid tests: Application to abstract reasoning. Psychological Methods, 3, 380–396.
Embretson, S. E. (1998b). Modifiability in lifespan development: Multidimensional Rasch Model for learning and change. Paper presented at the annual meeting of the American Psychological Association, San Francisco, August.
Embretson, S. E. (2002). Generating abstract reasoning items with cognitive theory. In S. Irvine & P. Kyllonen (Eds.), Generating items for cognitive tests: Theory and practice (pp. 219–250). Mahwah, NJ: Erlbaum.
Embretson, S. E. (2007). Impact of measurement scale in modeling development processes and ecological factors. In T. D. Little, J. A. Bovaird, & N. A. Card (Eds.), Modeling contextual effects in longitudinal studies. Mahwah, NJ: Erlbaum.
Embretson, S. E., & Gorin, J. (2001). Improving construct validity with cognitive psychology principles. Journal of Educational Measurement, 38, 343–368.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Fischer, G. H. (1973). The linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359–374.
Fischer, G. H. (1987). Applying the principles of specific objectivity and generalizability to the measurement of change. Psychometrika, 52, 565–587.
Fischer, G. H. (1989). An IRT-based model for dichotomous longitudinal data. Psychometrika, 54, 599–624.
Fischer, G. H. (1995a). Derivations of the Rasch model. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models: Foundations, recent developments and applications. New York: Springer-Verlag.
Fischer, G. H. (1995b). Some neglected problems in IRT. Psychometrika, 60, 459–487.
Fraley, R. C., Waller, N. G., & Brennan, K. A. (2000). An item response theory analysis of self-report measures of adult attachment. Journal of Personality and Social Psychology, 78, 350–365.
Glickman, M. E., Gray, J. R., & Morales, C. J. (2005). Combining speed and accuracy to assess error-free cognitive processes. Psychometrika, 70, 405–425.
Goldstein, G., & Hersen, M. (1984). Handbook of psychological assessment. New York: Pergamon Press.
Guilford, J. P. (1954). Psychometric methods (2nd ed.). New York: McGraw Hill.
Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10, 255–282.
Guttman, L. (1957). Simple proofs of relations between the communality problem and multiple correlation. Psychometrika, 22, 147–157.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage Publishers.
Hambleton, R. K., & van der Linden, W. J. (1982). Advances in item response theory and applications: An introduction. Applied Psychological Measurement, 6, 373–378.
Harris, C. W. (Ed.) (1963). Problems in measuring change. Madison: The University of Wisconsin Press.
Holland, P. W., & Wainer, H. (1993). Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Hornke, L. F., & Habon, M. W. (1986). Rule-based item bank construction and evaluation within the linear logistic framework. Applied Psychological Measurement, 10, 369–380.
Knol, D. L., & Berger, M. P. (1991). Empirical comparison between factor analysis and multidimensional item response models. Multivariate Behavioral Research, 26, 457–477.
Kyllonen, P. C. (1993). Aptitude testing inspired by information processing: A test of the Four-Sources Model. Journal of General Psychology, 120, 375–405.
Kyllonen, P. C., & Christal, R. E. (1989). Cognitive modeling of learning abilities: A status report of LAMP. In R. Dillon & J. W. Pellegrino (Eds.), Testing: Theoretical and applied issues (pp. 146–173). New York: Freeman.
Kyllonen, P. C., & Christal, R. E. (1990). Reasoning ability is (little more than) working-memory capacity?! Intelligence, 14, 389–433.
Lazarsfeld, P. F. (1950). The logical and mathematical foundation of latent structure analysis. In E. A. Schulman, P. F. Lazarsfeld, S. A. Starr, & J. A. Clausen (Eds.), Studies in social psychology in World War II.
Vol. 4: Measurement and prediction (pp. 362–412). Princeton, NJ: Princeton University Press.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 44–53.
Lord, F. M. (1952). A theory of test scores. Psychometric Monograph, No. 7.
Lord, F. M. (1963). Elementary models for measuring change. In C. W. Harris (Ed.), Problems in measuring change (pp. 21–38). Madison: The University of Wisconsin Press.
Lord, F. M. (1980). Application of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Masters, G. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.
Maxwell, S. E., & Delaney, H. (1985). Measurement and statistics: An examination of construct validity. Psychological Bulletin, 97, 85–93.
McDonald, R. P. (1985). Factor analysis and related methods. Hillsdale, NJ: Lawrence Erlbaum Associates.
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
McHorney, C. A., & Cohen, A. S. (2000). Equating health status measures with item response theory: Illustrations with functional status items. Medical Care, 38, 43–59.
Mead, A. D., & Drasgow, F. (1993). Equivalence of computerized and paper-and-pencil cognitive ability tests: A meta-analysis. Psychological Bulletin, 114, 449–458.
Mellenbergh, G. J. (1994). A unidimensional latent trait model for continuous item responses. Multivariate Behavioral Research, 29, 223–236.
Messick, S. (1994). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. Research Report RR-94-45. Princeton, NJ: Educational Testing Service.
Mills, C. N., Potenza, M. T., Fremer, J. J., & Ward, W. C. (Eds.) (2002). Computer-based testing: Building the foundation for future assessments. Mahwah, NJ: Erlbaum.
Millsap, R. E., & Everson, H. T. (1993). Methodology review: Statistical approaches for assessing measurement bias. Applied Psychological Measurement, 17, 297–334.
Mislevy, R. (1986). Recent developments in the factor analysis of categorical variables. Journal of Educational Statistics, 11, 3–31.
Mislevy, R. (1993). Foundations of a new test theory. In N. Frederiksen, R. Mislevy, & I. Bejar (Eds.), Test theory for a new generation of tests (pp. 19–39). Hillsdale, NJ: Lawrence Erlbaum Associates.
Muthén, B. (1978). Contributions to factor analysis of dichotomous variables. Psychometrika, 43, 551–560.
Naglieri, J. A., Drasgow, F., Schmit, M., Handler, L., Prifitera, A., Margolis, A., & Velasquez, R. (2004). Psychological testing on the internet: New problems, old issues. American Psychologist, 59, 150–162.
Noel, Y. (1999). Recovering unimodal latent patterns of change by unfolding analysis: Applications to smoking cessation. Psychological Methods, 4, 173–191.
Pellegrino, J. W. (1998). Mental models and mental tests. In H. Wainer & H. I. Brown (Eds.), Test validity (pp. 49–59). Hillsdale, NJ: Erlbaum.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Chicago, IL: The University of Chicago Press.
Raykov, T. (1997). Estimation of composite reliability for congeneric measures. Applied Psychological Measurement, 21, 173–184.
Raykov, T., & Shrout, P. E. (2002). Reliability of scales with general structure: Point and interval estimation using structural equation modeling. Structural Equation Modeling, 9, 195–212.
Reckase, M. D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21, 25–36.
Reckase, M. D., & McKinley, R. L. (1982). Some latent trait theory in a multidimensional latent space. In D. J. Weiss (Ed.), Proceedings of the 1982 item response theory and computerized adaptive testing conference (pp. 151–177). Unpublished manuscript, Minneapolis, University of Minnesota, Department of Psychology.
Reise, S. P., & Henson, J. M. (2003). A discussion of modern versus traditional psychometrics as applied to personality assessment scales. Journal of Personality Assessment, 81, 93–103.
Reise, S. P., Smith, L., & Furr, R. M. (2001). Invariance on the NEO PI–R Neuroticism scale. Multivariate Behavioral Research, 36, 83–110.
Roberts, J. S. (1995). Item response theory approaches to attitude measurement. (Doctoral dissertation, University of South Carolina, Columbia, 1995). Dissertation Abstracts International, 56, 7089B.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph, No. 17.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. New York: Chapman & Hall.
Spearman, C. (1927). The abilities of man. New York: Macmillan.
Takane, Y., & de Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52, 393–408.
Tate, R. (2002). Test dimensionality. In G. Tindal & T. M. Haladyna (Eds.), Large-scale assessment programs for all students: Validity, technical adequacy, and implementation (pp. 181–211). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Thissen, D., & Steinberg, L. (1984). A response model for multiple choice items. Psychometrika, 49, 501–519.
Thissen, D., & Steinberg, L. (1986). A taxonomy of item response models. Psychometrika, 51, 567–577.
Thissen, D., & Wainer, H. (2001). Test scoring. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Thorndike, R. M., & Lohman, D. F. (1990). A century of ability testing. Chicago: Riverside Publishers.
Tuerlinckx, F., & De Boeck, P. (2005). Two interpretations of the discrimination parameter. Psychometrika, 70, 629–649.
Vale, D. C. (1986). Linking item parameters onto a common scale. Applied Psychological Measurement, 10, 133–144.
Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3, 4–70.
van der Linden, W. J., & Hambleton, R. K. (Eds.) (1996). Handbook of modern item response theory. New York: Springer-Verlag.
Wainer, H. (2000). Computerized adaptive testing: A primer (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Waller, N. G., Thompson, J., & Wenk, E. (2000). Black-white differences on the MMPI: Using IRT to separate measurement bias from true group differences on homogeneous and heterogeneous scales. Psychological Methods, 5, 125–146.
Weiss, D. J. (1982). Improving measurement quality and efficiency with adaptive testing. Applied Psychological Measurement, 6, 473–492.
Whitely, S. E. (1980). Multicomponent latent trait models for ability tests. Psychometrika, 45, 479–494.
Yen, W. M. (1986). The choice of scale for educational measurement: An IRT perspective. Journal of Educational Measurement, 23, 299–325.
17
Natural and Contrived Data
Susan A. Speer
Schegloff, 1996a, 1996b; Silverman, 2006: 201).
Conversation analysts and discursive psychologists are among the chief advocates of this position, expressing a strong preference for working with ‘tapes and transcripts of naturally occurring interactions’ (Schegloff and Sacks, 1973: 291, emphasis added). Indeed, for many, this preference has become a requirement built into definitions of conversation analysis (CA). According to Hutchby and Wooffitt for example, CA is ‘the study of recorded, naturally occurring talk-in-interaction’ (1998: 14, emphasis in original). Similarly, Psathas argues that within CA ‘data may be obtained from any available source, the only requirements being that these should be naturally occurring’ (1995: 45, emphasis added). Others put this ‘requirement’ for natural data even more strongly. For example, Paul ten Have suggests that ‘it is essential for the CA enterprise to study recordings of natural human interaction’ (1999: 47, emphasis added) and that these recordings ‘should catch “natural interaction” as fully and faithfully as is practically possible’ (1999: 48). Likewise, Heritage and Atkinson assert that ‘within conversation analysis there is an insistence on the use of materials collected from naturally occurring occasions of everyday interaction’ (1984: 2, former emphasis added).
A variety of terms have been used alongside, and interchangeably with, references to ‘naturally occurring data’. Researchers work with ‘natural conversation’ (Sacks et al., 1974: 698), ‘natural conversational materials’ (Schegloff and Sacks, 1973: 291), ‘actual utterances in actual ordinary conversations’ (Schegloff, 1988a: 61), ‘actually occurring data’ (Heritage and Atkinson, 1984: 18), and ‘actual, empirical, naturally occurring garden variety actions’ (Schegloff, 1996a: 166). Here, the ‘natural’ or ‘actual’ is implicitly or explicitly contrasted with data that are ‘non-natural’, ‘contrived’ or ‘researcher-provoked’. So, Hutchby and Wooffitt argue that ‘naturally occurring’ refers to recorded interactions ‘situated as far as possible in the ordinary unfolding of people’s lives, as opposed to being prearranged or set up in laboratories’ (1998: 14, emphasis added). While naturally occurring data involve ‘real interests, investments, interactional trajectories’ which ‘are at stake and serve as formative context’ (Schegloff, 1998: 247), non-natural data are data that have been ‘got up’ by the researcher using an interview, an experiment, or a survey questionnaire (Potter, 2004: 205). Such data, then, ‘would not exist apart from the researcher’s intervention’ (Silverman, 2006: 201).
The issue of ‘researcher provocation’ appears central here: According to Schegloff and Sacks (1973: 291), natural interaction is not ‘coproduced with or provoked by the researcher’ (ten Have, 1999: 48), and the materials are ‘as uncontaminated as possible by social scientific intervention’ (Heritage, 1988: 130). Ten Have (1999: 49) argues that ‘the ideal is to (mechanically) observe interactions as they would take place without research observation’, while Drew (1989: 96) goes even further, asserting that the data must not have been ‘produced for the purpose of study’, or collected ‘for any pre-formulated investigative or research purposes’².
In what is still one of the clearest expositions of the ethnomethodological origins of CA, Heritage (1984) argues that CA’s insistence on the use of naturally occurring data is matched by an avoidance of data sources that are deemed ‘unsatisfactory’ (1984: 236). These include data from interviews, where participants’ reports of events are treated as an ‘appropriate substitute’ for a recording of the actual events; experiments and testing, which involve the ‘direction or manipulation of behaviour’; observational methods, where data are recorded in field notes or using pre-coded schemas (and which rely on the researcher’s post-hoc recollection or recall); and invented data (sentences, speech acts or exemplar dialogues) based on intuition or ‘idealizations about how interactions work’ (Heritage, 1984: 236; see also, Heritage and Atkinson, 1984: 2–5, ten Have, 1999: 53–4). In sum, advocates of ‘natural data’ overwhelmingly focus on ‘the details of actual events’ (Sacks, 1984: 26) and avoid the decontextualised kinds of data; the
‘hypotheticalized, proposedly typicalized versions of the world’ (1984: 25) commonly used in linguistic and philosophical approaches to language (see also Schegloff, 1988a).
Underlying this preference for natural data was Harvey Sacks’ desire to produce a stable (and hence reproducible) natural observational science of society (Schegloff, 1995, vol. 1: xxx–xxxii). As part of this, Sacks and his colleagues aimed to produce an inventory of ‘recognizable social actions in this culture … to find it and provide an account of it empirically and precisely, not imaginatively or typically or hypothetically or conjecturally or experimentally, and to use actual, situated occurrences of it in naturally occurring social settings to control its description’ (Schegloff, 1996a: 167). Sacks’ (1987: 54) argument was that if we are serious about producing empirically grounded descriptions of the social organisation of human interaction, then ‘sequences [or talk] are the most natural sorts of objects to be studying’. And yet, according to Sacks (1995, vol. 2: 5), researchers do not ‘have a strong intuition for sequencing in conversation’. Indeed, no matter how rich the researcher’s imagination (Sacks, 1995, vol. 2: 419), if we work with idiosyncratic, invented or hypotheticalised-typicalised data examples, then we risk producing what Schegloff calls a ‘sociology by epitome’ (1988b: 101), overlooking precisely those features of interaction and its sequencing that might tell us something new or surprising about the phenomena we are studying. As Sacks notes, it is only ‘from close looking at the world [that] you can find things that we couldn’t, by imagination, assert were there’ (1995, vol. 2: 419).
Likewise, where the researcher uses ‘written texts, monologues, talk or writing produced under experimental or quasi-experimental conditions’ (Schegloff, 1996b: 468), then the interactional practices which ‘undergird’ our ‘natural phenomena’ of interest may be ‘largely or totally absent … suppressed by specially designed circumstances of production’ (1996b: 468). Experimental control and standardisation ‘of stimuli, conditions, topics, etc.’ (Schegloff, 1996b: 468) suppress ‘the very heart of the phenomena we are trying to understand’ (1996b: 468), and ‘confront participants with quite distinctive, and potentially complicating, interactional exigencies’ (1999: 419)³. And yet, ironically, even interviews and experiments rely in their design on the identification of relevant variables for study taken from the observation of naturally occurring interaction. As Heritage puts it, ‘it is unlikely that an experimenter will be able to identify [control and manipulate] the range of relevant variables without previous exposure to naturally occurring interaction’ (1984: 238, see also Schegloff, 2004).
FEMINIST PERSPECTIVES ON NATURAL AND CONTRIVED DATA
One group of researchers for whom this preference for naturally occurring data has proven especially problematic is feminist researchers. Indeed, as I note above, most feminist research on gender frequently, if not habitually, studies talk generated using conventional social scientific research methods such as surveys, interviews and focus groups. As C. Kitzinger (2000: 170) observes, very little feminist research is conducted using naturalistic data where gender and sexuality ‘just “happen” to be present’.
There are three main reasons why feminist researchers have been reluctant to stray far from the use of such ‘contrived’ materials: First, it seems to be a widely held, tacit assumption that since gender is, for most ‘ordinary’ members, taken for granted and thus background to interaction, the researcher must artificially elicit talk about gender from participants (i.e. they must ‘topicalize’ gender) just to make it visible. I made precisely this assumption in my early research on masculinity, where I asked my respondents questions like ‘do you ever think you behave in a way that’s not traditionally masculine?’, and ‘do you think the fact you’re male affects your leisure in any way?’ (Speer, 2001; for a discussion of related issues see C. Kitzinger, 2006). For researchers who adopt this approach, far from suppressing
adopting these procedures might generate its own set of problems, or that my presence in the interactions would have the impact that it did.
I found that while the prompts were certainly useful and provocative, they did, nonetheless, often fail to work in the way I had intended them to work. In practice, the prompts did not seem to minimise my impact, encourage the respondents to set the priorities, or produce spontaneous or naturalistic ‘gender talk’. In fact, it was not always clear to the participants how they were supposed to respond to the prompts, and it often took further work on my part, and follow-up questioning, before I could elicit their (gendered) view.
METHOD AND ANALYSIS
In the remainder of this chapter, I want to revisit some of the data I obtained from this study in order to demonstrate what actually happened when the prompts were shown, and to consider what this might tell us about the relative virtues of natural and contrived materials.
The four excerpts I discuss below derive from a series of prompted one-to-one and group discussions. Research participants were drawn from several ‘naturally occurring’ friendship and family groups, and included a diverse range of men and women ranging in age from 20 to 70+ years. Visual prompts were drawn mainly from newspapers and magazines and showed images of men and women engaging in a variety of ‘non-traditional’ activities (men ballet dancing, and women boxing or playing rugby, for example). Some prompts showing men and women engaging in traditionally gendered activities were also used as a point of comparison (men playing rugby and women shopping, for example). All but one of the interviews and focus groups were moderated by myself. The remaining group was moderated by a second (female) moderator (referred to, respectively, as ‘Mod 1’ and ‘Mod 2’ below). All the data were transcribed verbatim in the first instance. Detailed transcripts were then worked up using conventions developed within CA by Gail Jefferson (2004a). A simplified version of these conventions is included in the Appendix.
In a search of the corpus I identified 58 occasions where a prompt was shown. In just under half of these instances the participants had no problems responding to the prompt, and engaging with the task set (for a discussion of some of these ‘successful’ instances see Speer, 2002c). However, in the remaining instances, the participants seemed to have some trouble identifying the content of the prompt. They sought clarification from the moderator of the grounds on which they were required to respond to the prompt, thus engaging her in work to disambiguate its content.
Consider the following two excerpts. In excerpt 1 (line 1) the moderator introduces a picture of a female football supporter in overlap with Keith and Alice’s discussion of rugby (Donald [line 6] is the first participant to respond to the prompt). In excerpt 2 (line 1), the moderator shows a picture of two women dancing in a club.
7 -> [it?]°=
8 Carole: Sins-> [Yeah.]=
9 Mod 2: Sins-> =Mm:.
10 (Sarah): ((sniff))
11 (1.2)
12 Carole: S -> >I think that’s prob’ly quite a normal (0.6)
13 -> normal thing [because-] coz of all this (.)=
14 (Sarah)?: [((sniffs))]=
15 Carole -> =weird (.) kind of lighting eff[ect
16 ?: [heh heh
17 heh [.h h h ]
18 Carole: -> [it makes] it look really quite biza:rre,
In both excerpts, when the moderator shows the prompt, she treats the task that the participants are engaged in as one that is already familiar to them. Additionally, as the ‘first pair part’ of the sequence, the turn that accompanies the showing of the prompt strongly implies that there is something ‘comment-worthy’ or ‘notable’ about the image that the recipients might be able to respond or react to. In other words, the showing of the prompt invites – or makes ‘conditionally relevant’ – an appropriately fitted ‘second pair part’ in which the recipients produce some sort of evaluative commentary on the prompt (for more on adjacency pairs see Schegloff, 2007: 13ff). However, in both instances, the recipients do not, initially at least, produce such an evaluative commentary. Instead, they defer their evaluations until later (excerpt 1, lines 10–12, excerpt 2, lines 12–13, 15, and 18) in order to first check with the moderator whether what they have seen in the prompt is what they are supposed to see.
In each case their checks take the form of a question-answer ‘insertion sequence’ (Schegloff, 2007: 97ff). Insertion sequences are sequences within a sequence: they come after the base first pair part (i.e. the showing of the prompt) and before the base second pair part (i.e. the evaluation of the prompt). These insertion sequences are addressed ‘to contingencies of what is to be done next’ (2007: 100). In other words, they help the participants to establish the information and resources they need in order to appropriately evaluate the prompt and thus ‘to implement the second pair part [the evaluation] which is [still] pending’ (Schegloff, 2007: 106). The different parts of this sequence are marked in the left-hand margin of the transcripts, above.
In the first part of the insertion sequence (excerpt 1, lines 6–7 and excerpt 2, lines 6–7), the recipients ask a question which puts forward a possible candidate interpretation of what it is they see in the prompt and the grounds on which they might evaluate it. Notice that both Donald and Carole treat gender as the ‘relevant thing’ about the prompt (Edwards, 1998; Hopper and LeBaron, 1998). So, for Donald it is not just a supporter of football, but a ‘woman supporter of foot↓ball’ (said with emphasis on the repaired gender category, ‘woman’), whereas for Carole it is not just people dancing, but ‘two girls dancing together’ (said with emphasis on the word ‘girls’). In both cases the moderator (and in excerpt 2, another group member [line 8]) confirms these candidate
the cigarette hanging out of her mouth at a:ll’ (lines 11–12) is not so much a negative evaluation of women football supporters as it is a personal view on an aspect of the image (cigarette smoking) that is seemingly unrelated (and built by Donald as unrelated) to the activity in question.
Similarly, in excerpt 2, the participants clearly take the task set by the moderator in which they are required to ‘react’ to the content of the prompt, as indicative of something potentially non-normative or incongruous about it. However, since, for them, they seem unable to find anything non-normative or ‘newsworthy’ about ‘two girls dancing together’, then rather than offer an evaluation which follows up on Sarah’s gender noticing at lines 6–7 (a noticing confirmed by Carole and the moderator at lines 8 and 9), they simply comment on the ‘normality’ of what the prompt depicts (‘>I think that’s prob’ly quite a normal (0.6) normal thing’ [lines 12–13]). Thus, their response is not so much an evaluation or commentary on the activity of ‘girls dancing together’ as it is an account for not having an evaluation. Indeed, the thing that seems most newsworthy about the image is not ‘girls dancing together’, but rather, non-normative features of the semiotics of the picture (‘because- coz of all this (.) weird (.) kind of lighting effect … it makes it look really quite biza:rre’ [lines 12–13, 15, and 18]).
In the two excerpts I’ve discussed so far the moderator responds to the inserted question about the prompt with the relevant second pair part in which she helps the recipients to disambiguate its content, thereby confirming that they have ‘correctly’ identified what the prompt depicts and the grounds on which they might appropriately evaluate it (excerpt 1, line 8, excerpt 2, line 9). One consequence of her participation in the insertion sequence is that the moderator helps progress the course of action toward her required interactional outcome – the respondent evaluations and the giving of (possibly gendered) views. However, at the same time, the help she provides may inadvertently reinforce the respondents’ presumption that she is a social science ‘expert’ with privileged access to, and knowledge about, the prompts. Indeed, in this context, the moderator’s first turn may be hearable by the respondents as an ‘exam’ or ‘test’ question for which there is a ‘right’ or ‘wrong’ answer (Levinson, 1992)⁶. The trouble and dispreference evident in these excerpts – and the very necessity for the question posed by the recipients in the insertion sequence – displays strongly the respondents’ presumption that the moderator already knows what is going on in the prompt, and that she has an expectation about what kind of reaction might be an ‘appropriate’ or the most ‘correct’ one.
This creates a paradoxical situation for the moderator. On the one hand, she uses picture prompts in order to generate non-hierarchical, participant-led discussion of topics that they draw out from the picture as relevant to them. On the other hand, the occasion is set up as one in which the prompt, and the moderator’s accompanying question, is taken by the respondents to be a ‘test question’, where their answer is actually not a free and unencumbered one, but rather one that is going to be measured against the knowledge that they surmise the moderator may already have about it.
In the next excerpt we see a possible attempt by the moderator to manage this paradox and re-establish a non-hierarchical research relationship. There is a considerable amount of complexity here which would repay a detailed analysis. However, for our present purposes I want simply to note that just as in excerpts 1 and 2, the recipients appear to have some trouble ascertaining the grounds on which they are required to respond to or evaluate the prompt. This trouble appears especially acute in this case because it revolves around the delicate problem of assigning a gender to the person in the image (lines 10–11 and 14–15). This trouble, combined with the moderator’s withholding of assistance at precisely those points where she could legitimately provide it (e.g. at lines 6, 8, 12, 16), provokes Alice to initiate the first pair part of an insertion sequence in which she reports her ‘first thoughts’ (Jefferson 2004b) on the gender of the person in the
image (constructed in a way that indicates she thinks she may well be wrong) (lines 10–11). When the moderator does not respond, she seeks confirmation regarding the correctness of her interpretation: ‘is it a wo↑man ↑Su°san?’ (lines 14–15). However, in this case, the moderator does not answer the inserted question, or confirm or disconfirm Alice’s candidate interpretation of what is going on in the prompt. Instead, she initially resists answering the question by using a conversational ‘counter’ (marked as ‘cnt’ in the left-hand margin of the transcript), which serves to throw Alice’s question about the prompt directly back to her for her to answer: ‘What do you ↑think?’ (line 17):
Where insertion sequences serve to defer the production of a second pair part which is conditionally relevant but temporarily held in abeyance, a counter serves to ‘replace’ the second pair part ‘with a question of their own. They thus reverse the direction of the sequence and its flow; they reverse the direction of constraint’ (Schegloff, 2007: 17,
emphasis in original). The interactional effect of this is that it is not the moderator who is now required to respond to the inserted question about the prompt, but the original questioner, Alice. In technical terms, the moderator essentially uses the counter to ‘redistribute the responsibility for producing a base second pair part’ (2007: 99).
Alice responds to the counter by reverting to the same first thoughts that she has already expressed twice (at lines 10–11 and 14) before asking her question about the prompt. This time she expands her candidate interpretation by adding the activity (rugby) to the gender element: ‘I thought it was a woman in a- playing rugby’ (lines 20–21). This response, as it turns out, offers the ‘wrong’ candidate, as the moderator’s subsequent turn – ‘>No it is a woman<’ (line 22), makes clear. Thus, although Alice has, in her reported first thoughts, correctly identified the gender of the person in the image, she has failed to correctly identify the activity that the woman is engaged in: it is not a woman rugby player but ‘It’s a woman with a cigarette in her mouth and a can o’ lager…. It’s a football supporter I think.’ (lines 24–5 and 27).
Note that the moderator’s conversational counter at line 17 does not, at this point in the sequence, project that she will go on to answer Alice’s question and provide the ‘correct’ interpretation of the prompt. Indeed, she could quite reasonably respond at line 22 with a further question: ‘what makes you think that?’, for example. However, as can sometimes happen with counters (see Schegloff, 2007: 17), in this case (and perhaps in part because Alice’s past tense ‘I thought it was’ construction may indicate that she still thinks she may be wrong), the moderator does end up producing the response to the inserted question that she has just thrown back to Alice for Alice to answer⁷.
So what should we make of the moderator’s use of the conversational counter in this excerpt? It is quite possible that she uses it in order to minimise her control over the research agenda, and to encourage the participants to define what they see in the prompt ‘in their own terms’. Indeed, the immediate effect of the counter is to return the conversational floor – and hence the responsibility for answering the question – directly to the recipients. However, one could argue that, in practice, by reversing the direction of the sequence and redirecting Alice’s question back to her, the counter constrains the recipients still further, putting them ‘on the spot’. Moreover, the positioning of the counter (after the first pair part of an insertion sequence) is doubly consequential in that, as I have already noted, insertion sequences tend to get launched in situations of dispreference. By throwing back a question to someone who asked it because they are already in the midst of trouble answering a just prior question, one strongly risks exacerbating rather than rectifying that trouble⁸.
Having finally established what it is about the prompt that they are responding to, the recipients turn their attention to providing the evaluation and/or commentary on the prompt that has so far been held in abeyance by the insertion sequence and conversational counter. As before, the respondents’ reactions to and evaluations of the prompt are marked as ‘dispreferred’, and preceded by a lengthy delay (line 29). In response to this delay, the moderator demonstrates that an evaluation (the base second pair part) is still pending, by offering to pass the prompt around the table (line 30). Rather like Donald in excerpt 1, Alice reacts with what looks like the start of a negative evaluation that disaffiliates with the activity shown in the prompt (line 31). However, there follows a further delay (e.g. line 35) before Alice unpacks what it is that she is getting at: ‘I mean probably, .hh (0.2) they dress up more (.) nowadays than they did’ (lines 36–7). Just as we have seen with the participants’ reactions in previous excerpts, even though her earlier identification problems revolved around assigning a gender to the person in the image, Alice’s subsequent prompt-related commentary does not follow up on this gender relevance, or evaluate the activity
depicted in the prompt in gendered terms. In fact, her response at lines 36–7 is not so much an evaluation and commentary on the activity of women football supporting, as it is a remark on an aspect of the image (what women football supporters wear nowadays) that is arguably only marginally related to the (gendered) activity in question. Similarly, Jan’s commentary on the prompt, ‘That’s all put on though, that seems to me as if it’s just a big act’ (lines 38–9) is delivered as a qualification of Alice’s evaluation (with the ‘though’ marking the qualification [Pomerantz, 1984: 97]), and instead of evaluating the activity depicted in the image ‘on its own terms’, Jan’s assessment treats it as somehow ‘staged’ or ‘non-genuine’ and thereby as something that is possibly not worthy of her evaluation.
So far I have demonstrated how, when they are shown a prompt, the respondents appear to have some significant difficulties both in working out the grounds on which they are required to respond to it, and in making their evaluations proper. Instead of responding ‘in their own terms’ (and as I, as a feminist, might have wished), they tend to treat the moderator as an ‘expert’ with privileged access to the prompt, and engage her in additional interactional work in order to disambiguate its content. Even where the moderator works explicitly to avoid answering the respondents’ questions about the prompt, and encourages them, through the use of a conversational counter, to put things in their own terms, she is still engaged in work to disambiguate the content of the prompt and progress the interaction towards her favoured interactional outcome (i.e. the production of [gendered] views or commentaries on the prompt). Finally, where evaluations are eventually elicited by the moderator, the participants do not follow up on the gender noticing made relevant in their own earlier candidate inquiries about the prompt’s content.
So what might account for these interactionally ‘troubled’ responses in which participants do not follow up their own initial indexing of gender? One possible explanation is that the participants may have picked up on what they take to be the researcher’s ‘elusive hypothesis’ (that the showing of prompts will allow her to access the participants’ gendered views) and that their responses to the prompt may therefore be used indirectly to reveal something negative about them as people that is not immediately evident or apparent. In other words, they may have correctly identified that what’s ‘up for grabs’ is not whether the image depicted in the prompt is good or bad, but whether they’re good or bad. It would hardly be surprising, given this context, if the recipients were to anticipate and work to avoid producing the kind of ‘identity implicative’ commentary that they assume the moderator is pursuing.
In sum, there may be a sense in which the respondents’ apparent trouble with the prompt, the insertion sequences in which they seek clarification from the moderator, the delays, and the inexplicitness of their subsequent commentaries on, and evaluations of, the prompt may be part of resisting giving gendered views. If they do provide such views, then they could be labelled sexist or homophobic – and, as I will show below, this ‘oriented to’ possibility creates the ideal environment for resistance.
Indeed, I want to propose that, in addition to fulfilling the task made relevant by the showing of the prompt, the design and delivery of participants’ responses to the moderators’ questions can perform resistive ‘identity work’. We can find clear evidence for this resistance in sequence organisational terms.
In a search of the corpus I found 12 instances in which respondents actively resist the production of a gendered view. In these instances, the interactions do not progress through the kinds of sequences identified above. Instead, they are characterised by an extended ‘series’ of (moderator) questions and (respondent) answers concerning the prompt. The moderator is more or less dissatisfied with the response she gets in each case, and doggedly pursues her course of
action until she either elicits some (gendered) commentary on, or evaluation of, the prompt or appears satisfied that none will be forthcoming⁹. Consider excerpt 4, below:
17 (2.2)
18 Ben: -> But if I could, (.) then I prob’ly would.
19 (1.8)
20 Ben: -> If I: (1.0) was interested in it.
21 Mod 1: Fpost a-> Do you think though that it breaks
22 stereotypes at all.
23 (.)
24 Ben: Spost a-> No:.
25 (0.4)
26 Mod 1: Fpost b-> It doesn’t.
27 (.)
28 Ben: Spost b-> [No,]
29 Mod 1: Fpost c-> [I] mean some people would say that he’s a
30 ‘poof’ or something.
31 (0.3)
32 Ben: Spost c-> I think that some people would.
33 (0.6)
34 Mod 1: Fpost d-> But you wouldn’t.
35 (.)
36 Ben: Spost d-> ↑No.
the moderator treats the recipient as having access to, and as able to display a stance toward, it (2007: 171). Finally, within topic proffering sequences, just like the prompted sequences shown here, the recipient ‘is likely to carry the burden of the talking’ (2007: 170).
If we consider the showing of the prompt as akin to the initiation of a topic proffering sequence, then the preferred response for this kind of sequence would be geared toward the expansion, rather than closure of the sequence. In each case, the recipients would display a stance toward the prompt that accepts, encourages, and embraces the proffered topic (they would literally talk about it) (Schegloff, 2007: 171), and their responses would be oriented toward being ‘more than minimal’. By contrast, a dispreferred response to a prompted topic proffer would be one in which the recipient rejects, declines, or discourages it (2007: 171). Dispreferred responses would therefore be designedly minimal, and the sequence would move toward ‘incipient closure’ (2007: 180).
Right from the start, Ben refuses to embrace the possibility for discussion engendered by the showing of the prompt, or to produce the extended evaluative commentary that it makes procedurally relevant. Instead, he responds with a delayed and starkly minimal, unmitigated, one-word answer to the moderator’s opening question: ‘Lovely’ (line 3). This is said with final intonation, and does not yield to the 0.8 second silence which follows. This silence provides ample opportunity for Ben to resume talking and thereby expand, unpack, or account for his (minimal) response. It is worth noting that his evaluation does respond in a ‘type conforming’ (Raymond, 2003) way to the moderator’s question (the question makes an assessment [what Ben ‘thinks of’ the prompt] relevant, and this is what Ben provides). However, it does not meet the requirement for expansion associated with the hitherto mentioned preference organisation for a topic proffer. Ben’s bald response stands out as resistive here, not only because it
to the thing he is being asked about (at least he has direct visual access to the image depicted in the prompt).
The moderator shows that she understands Ben’s response to be disaligned with, and resistant of, the topic proffer. Her question ‘Well what’s going on in it’ (line 5) is ‘well’ prefaced – something that, as we saw earlier, can signal disagreement or disaffiliation with the prior (Schegloff and Lerner, 2004). As a post-expansion, it orients to the starkly minimal nature of Ben’s answer, and constitutes a ‘second try’ at the topic proffer. Although addressed to the same ‘target’ (the prompt), this question makes relevant a different class of answer to the prior – not an evaluation of the prompt, but a description of what is going on in the image – something Ben arguably needs to do before he can evaluate it.
Now Ben identifies the content of the prompt, and, like the participants in excerpts 1–3, he does so using a gender-marked term ‘It’s a male ballet dancer’ (line 7). The moderator’s third position, ‘Ri:ght’ (line 9) shows that he is now on the right lines, grasping the nature of the task she is setting in showing him the prompt and closes this part of the sequence.
The moderator continues by asking ‘Would you do that?’ (line 11), thus turning the focus away from the picture to Ben’s own relationship to the activity it depicts – ballet dancing. After a lengthy delay (lines 12–14), Ben answers ‘No’, explaining that he would not do ballet because he has ‘got dodgy ankles’ (line 15). The moderator appears to laugh briefly here, and a series of gaps follow (lines 17 and 19) in which she withholds any further response, allowing Ben to incrementally unpack his account for why he would not do ballet (lines 18 and 20). There is much that could be said about the way Ben crafts this account. However, one of the most interesting features of it is that it seems designed so as to deflect the potential imputation that he would not want to do ballet for reasons of prejudice. Ben presents his
is designedly minimal, patently ‘not playing reasons for not wanting to do ballet as due
along with’ the task set by the moderator, but to his physical incapacity (his ‘dodgy ankles’
also because he clearly does have direct access [line 15]) and lack of interest (line 20) rather
304 THE SAGE HANDBOOK OF SOCIAL RESEARCH METHODS
than his conscious choice. As he makes clear: ‘if I could, (.) then I prob’ly would’ (line 18). (For more on ‘inability’ accounts see Drew, 1984.)
There follows a series of post-expansions where the moderator makes a concerted effort to elicit (or else initiate repair on) Ben’s view about male ballet dancers (e.g. lines 21–22, 26, 29–30, and 34). However, Ben actively resists responding to each successive intervention on the moderator’s terms, producing only minimal answers (lines 24, 28, 32, 36). Even where the moderator invites him to reconsider his response with the initiation of a disagreement-implicative, other-initiated repair (‘It doesn’t’ [line 26]; for more on the conversation analytic concept of repair see Schegloff, 2007: 151 and Schegloff et al., 1977), Ben does not work to resolve the misalignment by backing down, expanding his answer, or adjusting it to make it more acceptable to the moderator. Instead, he simply repeats his prior, bald ‘No’ response (line 28). That he does this is further evidence for resistance: he is pointedly refusing to ‘play along’ with the moderator’s agenda.
Ben’s resistance may be due, in part, to his being asked questions that may involve answers that could potentially place him in (what he takes to be) a negative identity category – as someone who is ‘effeminate’, ‘gay’ or ‘homophobic’, for example. His resistance to the latter is most obvious in his response to the moderator’s ‘[I] mean some people would say that he’s a “poof” or something’ (lines 29–30). This observation is clearly designed in continuity with the moderator’s previous line of questioning, and in response to Ben’s failure to repair his minimal answer. In citing others’ hypothetical, prejudiced views, the observation is designedly provocative – placing Ben in a position where he might discuss his views on the normativity (or otherwise) of male ballet dancing. However, instead of treating the observation as something that is designed to elicit his view, or as another attempt by the moderator at a topic proffer, Ben simply agrees with the moderator’s assertion, producing a second pair part to a factual statement about what ‘some people would’ say (line 32). The grammatical construction of Ben’s turn – in particular the repetition of ‘some people would’ – is another way to embody a minimal response (Schegloff, 2007: 171). Indeed, this turn is designedly not adding anything to what the moderator’s prior utterance has done, and does not progress or develop the course of action or prompt-related ‘topic talk’. Finally, when the moderator pursues the question of his view: ‘But you wouldn’t’ (line 34), he simply provides a further bald and final ‘↑No’ response (line 36).
This excerpt neatly highlights some of the interactional contingencies that participants’ responses to picture prompts may be designed to manage. In this instance, the prompt is not treated by Ben as a facilitator of talk in which he is free to set the priorities. Rather, his response is co-constructed within a context of mutual suspicion, and in which he exposes and seeks to manage what he takes to be the researcher’s (hidden) agenda. Specifically, Ben orients to the moderator’s questions, and his responses, as things that may reveal something negative about him (he may be effeminate, gay or prejudiced, for example). Instead of ‘playing along with’ the task set by the moderator by engaging in prompt-related topic talk (thus collaborating with the moderator in progressing the interaction toward the successful resolution of the sequence), Ben’s answers seem dedicated to pre-empting, deflecting, and actively resisting inferences that he is a certain sort of person, and which may have negative implications for his identity.
DISCUSSION
I began this chapter by summarising some key issues at the heart of debates about natural and contrived data. I suggested in particular that the strong preference for natural data expressed by conversation analysts and discursive psychologists derives from a concern not to suppress fundamental features of the natural interactional phenomena to which they wish to gain access. I argued that
NATURAL AND CONTRIVED DATA 305
this preference for natural data is especially problematic for feminist researchers who, for various reasons to do with assumptions about the observability, access to, and frequency of occurrence of the phenomena they wish to study, have tended to work with relatively contrived social science data sources such as surveys, interviews, and focus groups. For them, far from suppressing the kinds of ‘natural’ phenomena to which they wish to gain access, the artificially elicited ‘topic talk’ that they derive from contrived materials renders those phenomena observable – and hence studiable.
In order to explore the kinds of gender-relevant evidence and insights that close analysis of a relatively contrived dataset provides, I revisited some data from my own early research on gender and leisure, in which I used picture prompts in order to access people’s views about men’s and women’s participation in ‘non-traditional’ activities (such as men’s ballet and women’s rugby, for example). I showed that, when we subject the actual use of relatively ‘contrived’ techniques involving prompts to a detailed analysis, such techniques do not always work in the way the researcher might have intended them to work. Thus, in my data, the participants were invited to find something topical in, or ‘comment-worthy’ about, the prompt. In just under half the instances in the corpus, what they were invited to see was obviously and immediately self-evident to them, and they engaged in lively discussion about the (gendered) content of the prompt. In other instances, including the first three excerpts discussed in this chapter, the participants seemed to have trouble seeing what they were supposed to see in the prompt. They sought clarification from the moderator (in the form of an ‘insertion sequence’) of the grounds on which they were required to respond to it, engaging her in work to disambiguate its content. Thus, the participants routinely treated the moderator as ‘expert’ on the prompts and her opening question as a ‘test’ question for which there is a right or wrong answer. Even where the moderator tried to resist answering the participants’ questions about the prompt (through the use of a conversational counter, for example [as in excerpt 3]), she would often quickly re-engage in talk that would progress the course of action toward her favoured interactional outcome (i.e. the production of [gendered] commentary on/evaluations of the prompt). However, while the prompts initially appeared successful in getting the participants to notice gender, these initial gender noticings were rarely followed up in their subsequent evaluative commentaries.
Finally, in a number of instances (depicted here by excerpt 4), the participants strongly resisted seeing what they were supposed to see in the prompt, and it often took considerable constructive work on the moderator’s part, and further follow-up questioning, in order to produce the kind of non-minimal reaction to the prompt that the moderator was after. In these instances, the participants seemed suspicious about (what they took to be) the researcher’s ‘elusive hypothesis’, and oriented to the possibility that their responses might have negative implications for their identity. Far from being naive cultural dopes that passively accepted the doing of social science upon them, then, in these instances, participants would actively strive to subvert such an image. They resisted the potential inferences about their identities that were being imposed on them by researchers.
In sum, the prompts did not seem to minimise the researcher’s impact, generate non-hierarchical research relationships, or encourage the respondents to set the priorities ‘in their own terms’. As we have seen, their evaluations and commentaries on the prompt were rarely delivered in a spontaneous, unencumbered, or naturalistic fashion, and attempts to disguise researcher provocation as free-for-all opinion giving, or manipulation as complete freedom, did not work.
So what might these analyses tell us about the relative virtues of natural and contrived data? The interactional contingencies that I have shown the participants are oriented towards in their responses pose problems for researchers who treat prompts, or other ‘contrived’ techniques involving
on gender and language (which tends to ask members to comment on gender, seeks out their retrospective reports on how they do gender, or else is based on the researcher’s own recollections and post hoc reports of gendered events), this new, naturally occurring dataset has allowed me to examine examples of interactions in which members are currently engaged in the act of doing gender with the psychiatrist in the clinic. And once I began to look at this dataset, it became apparent that often the doing of gender (in particular – working to pass as ‘authentically’ male or female in this setting) does not involve its overt topicalisation at all (for more on this see Speer and Green, 2007; Speer and Parsons, 2006; and also C. Kitzinger, 2006, 2007).
Some researchers suggest that in the future, it is likely that the use of ‘naturalistic materials’ will become more common in qualitative research ‘and interviews and focus groups will be mainly an adjunct to those naturalistic studies’ (Potter, 2003: 614). In a recent debate Potter and Hepburn (2005a: 282) ‘challenge the taken-for-granted position of the open-ended interview as the method of choice in modern qualitative psychology’, suggesting that ‘The ideal would be much less interview research, but much better interview research’ (2005a: 282). Indeed, Schegloff (1996b: 471, emphasis added) suggests that ‘investigators should increasingly work with such [naturalistic] materials’.
Even though I would generally subscribe to these recommendations, and have used the data in this chapter to demonstrate the virtues of analysing naturally occurring data, one important caveat needs to be noted: I would not want to imply that existing feminist data collection practices and modes of analysis are wrong or bad, or that we should stop using contrived materials and other ‘non-directive’ techniques altogether. The rhetoric of social science data – just like essentialism – can be a useful tool in certain circumstances (e.g. in a court of law). Nor would I want to imply that we will never obtain naturalistic talk, or gain access to general features of the ‘doing’ of gender in purportedly ‘contrived’ materials. Our inability to strip data of its context should not (necessarily) be adequate justification for abandoning the use of contrived materials altogether. As I have shown here, by adopting a reflexive approach to our data, and by being sensitive to the ways in which the researcher herself is bound up in the production of that data, we can obtain rich insights into respondents’ ways of managing the interactional issues and dilemmas their participation throws up.
By turning what is commonly regarded as a ‘resource’ (albeit an inherently flawed one) into their ‘topic’, an increasing number of researchers using fine-grained analytic methods have been able to show how social science methods get done, identifying features which characterise, say, interview talk as interview talk, and which distinguish it from ‘mundane conversation’ (Drew et al., 2006; Maynard et al., 2002; Mishler, 1986; Suchman and Jordan, 1990). In such studies the researcher is treated – not as a potential ‘contaminant’ – but rather, as much of a ‘member’ as the other participants, and of equal status for the purposes of analysis.
Thus, I want to urge caution in applying the ‘natural-contrived’ distinction too rigidly. As I have argued elsewhere (Speer, 2002a, 2002b), from a discursive and CA perspective, it actually makes little theoretical or practical sense to map the natural/contrived distinction onto discrete ‘types’ of data or to treat the researcher as a potentially contaminating force. In this respect the natural-contrived distinction has been overplayed. What are natural data and what are not is not decidable on the basis of their type and/or the role of the researcher within the data. All data can be natural or contrived depending on what one wants to do with them.
Thus, it follows that it is fine if we, as feminist researchers, want to use contrived materials to explore how gender talk is derived in research contexts, paying close attention to the constructive processes involved (the data
Have, P. ten (2002) ‘Ontology or methodology? Comments on Speer’s “natural” and “contrived” data: A sustainable distinction?’ Discourse Studies 4(4): 527–30.
Henwood, K. (2007) ‘Beyond hypercriticality: Taking forward methodological inquiry and debate in discursive and qualitative social psychology’, Discourse Studies 4(9): 270–5.
Heritage, J. (1984) Garfinkel and Ethnomethodology. Cambridge: Polity Press.
Heritage, J. (1988) ‘Explanations as accounts: A conversation analytic perspective’. In C. Antaki (ed.) Analysing Everyday Explanation: A Casebook of Methods, pp. 127–44. London: Sage.
Heritage, J. and Atkinson, J.M. (1984) ‘Introduction’. In J.M. Atkinson and J. Heritage (eds) Structures of Social Action. Studies in Conversation Analysis, pp. 1–15. Cambridge: Cambridge University Press.
Hollway, W. (2005) ‘Commentary 2’, Qualitative Research in Psychology 2(4): 312–14.
Holstein, J.A. and Gubrium, J.F. (1997) ‘Active interviewing’. In D. Silverman (ed.) Qualitative Research: Theory, Method and Practice, pp. 113–29. London: Sage.
Holstein, J.A. and Gubrium, J.F. (2003) ‘Context: Working it up, down, and across’. In C. Seale, G. Gobo, J. Gubrium and D. Silverman (eds) Qualitative Research Practice, pp. 297–311. London: Sage.
Hopper, R. (2003) Gendering Talk. East Lansing, MI: Michigan State University Press.
Hopper, R. and LeBaron, C. (1998) ‘How gender creeps into talk’, Research on Language and Social Interaction 31(1): 59–74.
Hughes, R. (1998) ‘Considering the vignette technique and its application to a study of drug injecting and HIV risk and safer behaviour’, Sociology of Health and Illness 20(3): 381–400.
Hutchby, I. and Wooffitt, R. (1998) Conversation Analysis. Cambridge: Polity.
Jefferson, G. (2004a) ‘Glossary of transcript symbols with an introduction’. In G.H. Lerner (ed.) Conversation Analysis: Studies from the First Generation, pp. 13–31. Amsterdam: John Benjamins.
Jefferson, G. (2004b) ‘“At First I Thought”: A normalizing device for extraordinary events’. In G.H. Lerner (ed.) Conversation Analysis: Studies from the First Generation, pp. 131–67. Amsterdam/Philadelphia: John Benjamins.
Kitzinger, C. (2000) ‘Doing feminist conversation analysis’, Feminism & Psychology 10(2): 163–93.
Kitzinger, C. (2003) ‘Feminist approaches’. In C. Seale, G. Gobo, J. Gubrium and D. Silverman (eds) Qualitative Research Practice, pp. 125–40. London: Sage.
Kitzinger, C. (2006) ‘Talking sex and gender’. In P. Drew, G. Raymond and D. Weinberg (eds) Talk and Interaction in Social Research Methods, pp. 155–170. London: Sage.
Kitzinger, C. (2007) ‘Is “woman” always relevantly gendered?’, Gender and Language 1(1): 39–49.
Kitzinger, C. and Frith, H. (1999) ‘Just say no? The use of conversation analysis in developing a feminist perspective on sexual refusal’, Discourse and Society 10(3): 293–316.
Kitzinger, C. and Powell, D. (1995) ‘Engendering infidelity: Essentialist and social constructionist readings of a story completion task’, Feminism & Psychology 5: 345–72.
Kitzinger, C. and Wilkinson, S. (2003) ‘Constructing identities: A feminist conversation analytic approach to positioning in action’. In R. Harré and A. Moghaddam (eds) The Self and Others: Positioning Individuals and Groups in Personal, Political and Cultural Contexts, pp. 157–180. New York: Praeger/Greenwood.
Kitzinger, J. (1990) ‘Audience understandings of AIDS media messages: A discussion of methods’, Sociology of Health and Illness 12: 319–35.
Kitzinger, J. (1994) ‘The methodology of focus groups: The importance of interaction between research participants’, Sociology of Health and Illness 16: 103–21.
Kitzinger, J. and Barbour, R.S. (1999) ‘Introduction: The challenge and promise of focus groups’. In R.S. Barbour and J. Kitzinger (eds) Developing Focus Group Research: Politics, Theory and Practice, pp. 1–20. London: Sage.
Lakoff, R. (1973) ‘Language and woman’s place’, Language in Society 2: 45–79.
Levinson, S.C. (1992) ‘Activity Types and Language’. In P. Drew and J. Heritage (eds) Talk at Work: Interaction in Institutional Settings, pp. 66–100. Cambridge: Cambridge University Press.
Livia, A. (2003) ‘“One man in two is a woman”: Linguistic approaches to gender in literary texts’. In J. Holmes and M. Meyerhoff (eds) The Handbook of Language and Gender, pp. 142–58. Oxford: Blackwell.
Lykke, A. (ed.) (2005) ‘Transformative methodologies in feminist studies’, Special Issue, European Journal of Women’s Studies 12(3).
Lynch, M. (2002) ‘From naturally occurring data to naturally organized ordinary activities: Comment on Speer’, Discourse Studies 4(4): 531–7.
Maynard, D.W., Houtkoop-Steenstra, H., Schaeffer, N.C. and van der Zouwen, J. (eds) (2002) Standardization and Tacit Knowledge. Interaction and Practice in the Survey Interview. New York: John Wiley.
Mishler, E.G. (1986) Research Interviewing: Context and Narrative. Cambridge, MA: Harvard University Press.
Mishler, E. (2005) ‘Commentary 3’, Qualitative Research in Psychology 2(4): 315–18.
Pollak, S. and Gilligan, C. (1982) ‘Images of violence in thematic apperception test stories’, Journal of Personality and Social Psychology 42: 159–67.
Pomerantz, A. (1984) ‘Agreeing and disagreeing with assessments: Some features of preferred/dispreferred turn shapes’. In J.M. Atkinson and J. Heritage (eds) Structures of Social Action: Studies in Conversation Analysis, pp. 57–101. Cambridge: Cambridge University Press.
Potter, J. (1996) ‘Discourse analysis and constructionist approaches: Theoretical background’. In J.T.E. Richardson (ed.) Handbook of Qualitative Research Methods for Psychology and the Social Sciences, pp. 125–40. Leicester: BPS Books.
Potter, J. (2002) ‘Two kinds of natural’, Discourse Studies 4: 539–42.
Potter, J. (2003) ‘Discourse analysis’. In M. Hardy and A. Bryman (eds) Handbook of Data Analysis, pp. 607–24. London: Sage.
Potter, J. (2004) ‘Discourse analysis as a way of analysing naturally occurring talk’. In D. Silverman (ed.) Qualitative Research: Theory, Method and Practice, pp. 200–21. London: Sage.
Potter, J. and Hepburn, A. (2005a) ‘Qualitative interviews in psychology: Problems and prospects’, Qualitative Research in Psychology 2: 281–307.
Potter, J. and Hepburn, A. (2005b) ‘Action, interaction and interviews: Some responses to Hollway, Mishler and Smith’, Qualitative Research in Psychology 2: 319–25.
Potter, J. and Hepburn, A. (2007) ‘Life is out there: A comment on Griffin’, Discourse Studies 9(4): 276–82.
Potter, J. and Wetherell, M. (1987) Discourse and Social Psychology: Beyond Attitudes and Behaviour. London: Sage.
Potter, J. and Wetherell, M. (1995) ‘Natural order: Why social psychologists should study (A constructed version of) natural language, and why they have not done so’, Journal of Language and Social Psychology 14(1–2): 216–22.
Psathas, G. (1995) Conversation Analysis: The Study of Talk-in-Interaction. London: Sage.
Ramazanoglu, C. and Holland, J. (2002) Feminist Methodology: Challenges and Choices. London: Sage.
Raymond, G. (2003) ‘Grammar and social organization: Yes/No type interrogatives and the structure of responding’, American Sociological Review 68: 939–67.
Sacks, H. (1984) ‘Notes on methodology’. In J.M. Atkinson and J. Heritage (eds) Structures of Social Action: Studies in Conversation Analysis, pp. 21–7. Cambridge: Cambridge University Press.
Sacks, H. (1987) ‘On the preferences for agreement and contiguity in sequences in conversation’. In G. Button and J.R.E. Lee (eds) Talk and Social Organisation, pp. 54–69. Clevedon: Multilingual Matters.
Sacks, H. (1995) Lectures on Conversation, Vols. 1 & 2, ed. Gail Jefferson. Oxford: Blackwell.
Sacks, H., Schegloff, E.A. and Jefferson, G. (1974) ‘A simplest systematics for the organization of turn-taking for conversation’, Language 50(4): 696–735.
Schegloff, E.A. (1988a) ‘Presequences and indirection: Applying speech act theory to ordinary conversation’, Journal of Pragmatics 12: 55–62.
Schegloff, E.A. (1988b) ‘Goffman and the analysis of conversation’. In P. Drew and A. Wootton (eds) Erving Goffman: Exploring the Interaction Order, pp. 89–135. Cambridge: Polity Press.
Schegloff, E.A. (1995) Introduction (Volume 1). In Sacks, H. (ed.) Lectures on Conversation. 2 vols. Edited by Gail Jefferson. pp. ix–lxii. Oxford: Basil Blackwell.
Schegloff, E.A. (1996a) ‘Confirming allusions: Toward an empirical account of action’, American Journal of Sociology 104(1): 161–216.
Schegloff, E.A. (1996b) ‘Some practices of referring to persons in talk-in-interaction: A partial sketch of a systematics’. In B. Fox (ed.) Studies in Anaphora, pp. 437–85. Amsterdam: Benjamins.
Schegloff, E.A. (1998) ‘Reflections on studying prosody in talk-in-interaction’, Language and Speech 41(3–4): 235–63.
Schegloff, E.A. (1999) ‘Discourse, pragmatics, conversation, analysis’, Discourse Studies 1(4): 405–36.
Schegloff, E.A. (2004) ‘Experimentation or observation? Of the self alone or the natural world?’ Behavioral and Brain Sciences 27(2): 271–2.
Schegloff, E.A. (2007) Sequence Organization in Interaction: A Primer in Conversation Analysis, Vol 1. Cambridge: Cambridge University Press.
Schegloff, E.A., Jefferson, G. and Sacks, H. (1977) ‘The preference for self correction in the organisation of repair in conversation’, Language 53: 361–82.
Schegloff, E.A. and Lerner, G. (2004) ‘Beginning to respond’, Paper presented at the Annual Meeting of the National Communication Association, Chicago, IL, November.
Schegloff, E.A. and Sacks, H. (1973) ‘Opening up closings’, Semiotica 8: 289–327.
Schlesinger, P., Dobash, R.E., Dobash, R.P. and Weaver, C.K. (1992) Women Viewing Violence. London: British Film Institute.
Silverman, D. (2006) Interpreting Qualitative Data: Methods for Analysing Talk, Text and Interaction, 3rd edn. London: Sage.
Sleed, M., Durrheim, K., Kriel, A., Solomon, V. and Baxter, V. (2002) ‘The effectiveness of the vignette methodology: A comparison of written and video vignettes in eliciting responses about date rape’, South African Journal of Psychology 32(3): 21–8.
Smith, J. (2005) ‘Commentary 1: Advocating pluralism’, Qualitative Research in Psychology 2(4): 309–11.
Snelling, S.J. (1999) ‘Women’s perspectives on feminism: A Q-methodological study’, Psychology of Women Quarterly 23: 247–66.
Speer, S.A. (2001) ‘Reconsidering the concept of hegemonic masculinity: Discursive psychology, conversation analysis, and participants’ orientations’, Feminism and Psychology 11(1): 107–35.
Speer, S.A. (2002a) ‘Natural and contrived data: A sustainable distinction?’ Discourse Studies 4(4): 511–25.
Speer, S.A. (2002b) ‘Transcending the natural/contrived distinction: A rejoinder to ten Have, Lynch and Potter’, Discourse Studies 4(4): 543–8.
Speer, S.A. (2002c) ‘What can conversation analysis contribute to feminist methodology? Putting reflexivity into practice’, Discourse and Society 13(6): 801–21.
Speer, S.A. (2005) Gender Talk: Feminism, Discourse and Conversation Analysis. London: Routledge.
Speer, S.A. and Green, R. (2007) ‘On passing: The interactional organization of appearance attributions in the psychiatric assessment of transsexual patients’. In V. Clarke and E. Peel (eds) Out in Psychology: Lesbian, Gay, Bisexual, Trans and Queer Perspectives, pp. 336–68. Chichester: Wiley.
Speer, S.A. and Hutchby, I. (2003a) ‘From ethics to analytics: Aspects of participants’ orientations to the presence and relevance of recording devices’, Sociology 37(2): 315–37.
Speer, S.A. and Hutchby, I. (2003b) ‘Methodology needs analytics: A rejoinder to Martyn Hammersley’, Sociology 37(2): 353–9.
Speer, S.A. and Parsons, C. (2006) ‘Gatekeeping gender: Some features of the use of hypothetical questions in the psychiatric assessment of transsexual patients’, Discourse & Society 17(6): 785–812.
Suchman, L. and Jordan, B. (1990) ‘Interactional troubles in face-to-face survey interviews’, Journal of the American Statistical Association 85(409): 232–41.
Sunderland, J. (2004) Gendered Discourses. Basingstoke: Palgrave Macmillan.
Taino, L. (2003) ‘“When shall we go for a ride?” A case of the sexual harassment of a young girl’, Discourse & Society 14: 173–90.
Wilkinson, S. (1999) ‘Focus groups: A feminist method’, Psychology of Women Quarterly 23: 221–44.
Wilkinson, S. and Kitzinger, C. (2007) ‘Conversation analysis, gender and sexuality: A feminist perspective’. In A. Weatherall, B. Watson and C. Gallois (eds) Language, Discourse and Social Psychology, pp. 206–30. Basingstoke, UK: Palgrave Macmillan.
18
Self-Administered Questionnaires and Standardized Interviews
Edith de Leeuw
this source. For an overview of non-response sources and design implications on response propensity, see Dillman et al. (2002).
In general, face-to-face surveys tend to obtain higher response rates than comparable telephone surveys, but both methods show a decrease in response over time. Mail surveys tend to have a lower response rate than comparable face-to-face and telephone surveys. However, there is no evidence for a decrease of response over time in mail surveys. Thus, the differences in response between survey methods have become smaller both in Europe and in the USA and Canada (e.g. Goyder, 1987; Hox and De Leeuw, 1994). In recent years telephone response rates have further decreased, partly due to technological changes, such as call-screening devices which increase the non-contacts, and partly due to changes in attitude towards unwanted telephone calls (Curtin et al., 2005; Steeh and Piekarski, 2006). Systematic overviews of response rates in Internet surveys are scarce; studies comparing response rates among Internet, mail, and telephone surveys suggest that response rates are generally lower for online surveys (Matsuo et al., 2004). Empirical comparisons between e-mail and paper mail surveys of the same population indicate that response rates on e-mail surveys are lower than for comparable paper mail surveys (Couper, 2000); similar results are found for list-based Web surveys (Couper, 2001).
To reduce non-response in interview surveys, one has to reduce both the non-contact (e.g. through intensified field work), and the refusals. The fact that response rates in structured interviews are in general higher than in self-administered surveys is mainly due to the role of the interviewer as persuader of reluctant respondents (cf. Groves and Couper, 1998). Interviewers may differ in their individual success rate, but all interviewers can be trained to do a good job of convincing respondents to cooperate, both for face-to-face surveys (National Centre for Social Research, 1999; Snijkers et al., 1999) and for telephone interviews (Groves and McGonagle, 2001). To achieve a high response rate in mail surveys, a respondent-friendly questionnaire and cover letter in combination with well-timed reminders is necessary (Dillman, 1978, 2000), while for Internet surveys a well-written invitation, in combination with reminders and a good layout and respondent-friendly Web interface, is essential (Dillman, 2000, 2007; Lozar et al., 2008). Two measures are effective in all forms of data collection, that is, both in interviews and in self-administered mail and Internet surveys. Advance letters or prenotifications do have a positive influence on the response for all types of surveys (De Leeuw et al., 2007). The same goes for incentives, which are effective in raising response in both self-administered and interview surveys (e.g. Singer, 2002). It should be noted that, in general, incentives sent in advance, the ‘prepaid’ incentives, work better than ‘promised’ incentives. Furthermore, there is no clear evidence that ‘lotteries’ are effective in increasing response.
Question development
A sound questionnaire is essential for data gathering in both self-administered questionnaires and structured interviews. The questions asked should cover the research objectives in order to avoid specification errors and to get valid answers. Specification error – a term from survey methodology – occurs when the final version of the question, as printed in the questionnaire, fails to collect information that is essential to answer the research question (cf. Biemer and Lyberg, 2003). In the social sciences this is usually referred to as construct validity: does the question measure what it is supposed to measure? Does it measure the intended theoretical construct? (See: Cronbach and Meehl, 1955; see also Embretson and Bovaird, this book, on measurement and scaling).
But a good question needs to do more than cover the construct: it should be understandable and the respondent should be able to answer it. When constructing questionnaires a researcher should start by following the basic rules for general questionnaire
316 THE SAGE HANDBOOK OF SOCIAL RESEARCH METHODS
question. Visual design through the use of graphical tools and layout is very important to successfully transform a collection of questions into a well-designed questionnaire (see for instance Redline et al., 2003). It is important to note that structured interviews and self-administered questionnaires differ in necessary layout and in how the final questionnaire has to be constructed. The users are different and have different needs: interview schedules are designed for trained interviewers who have to guide a respondent through the question-answer process, while self-administered questionnaires should be totally self-explanatory to respondents.
Interview schedules constructed for structured interviews, whether over the telephone or face-to-face, contain not only the questions but also instructions for trained interviewers. As a consequence, a finalized interview schedule contains text to be read aloud by the interviewer, text that should never be read aloud at all, and text that should be read only in certain situations. Examples of texts that are always read aloud by the interviewer are the questions themselves, texts to make the transition from one group of questions to the next (e.g. ‘now I would like to ask you some questions on …’), and instructions to respondents (e.g. ‘I am going to read you a list of ... statements. For each, please indicate whether you think it is not important, somewhat important, or very important’). Examples of texts that are never read are specific interviewer instructions (e.g. ‘probe if the respondent does not answer’, or, ‘skip to question 13’), or certain response and/or coding categories (e.g. ‘refused, no opinion, does not apply’). An example of a text that is sometimes read aloud is: ‘if you are not sure, please give me your best guess’. To avoid interviewer mistakes and to help interviewers read aloud the correct information, it is advisable to use consistent graphical language, such as different fonts. Examples are using bold type for all questions, signalling that all text in bold should be read aloud. For other types of information, other styles should be used; for instance, instructions in italics, categories not to be read in capitals, and placing explicit interviewer instructions in parentheses (Salant and Dillman, 1994, pp. 130–132). This is all for the benefit of interviewers, not for the eyes of the respondent. Exceptions are response cards printed with the major answer categories, which are shown to respondents when long lists of response categories are presented in face-to-face interviews.
In contrast, in a self-administered questionnaire everything must be tailored to the respondent. There is no interviewer to motivate or help out, and the questionnaire itself should do it all. Visual design is here of the utmost importance. Salant and Dillman (1994) and Dillman (2000, 2007) give clear instructions and numerous examples of how to order questions, give instructions, and motivate respondents. Numbers, symbols, and graphical layout (e.g. spacing, location, brightness, contrast, and figure/ground arrangements) all communicate meaning, and should be used to optimize a questionnaire for self-administered use. A good example of how this has been done in a consistent way is described by Dillman et al. (2005). For a theoretical background see Jenkins and Dillman (1997), and Redline et al. (2003).
FACE-TO-FACE INTERVIEWS
Face-to-face interviews are the most flexible data collection method. The main advantages of the face-to-face interview are the availability of an interviewer to structure the interview situation and to help and motivate respondents. Furthermore, the face-to-face setting allows for optimal communication, as both verbal and non-verbal communication are possible. Structured or partly structured interview schedules with open questions can be used as the interviewer poses the questions, follows up with additional probes, bridges silences, and records answers. The presence of a well-trained interviewer also enables the researcher to use a variety of measurements besides simple question-answer sequences. For instance, respondents can be asked to sort objects or pictures,
perform specific tasks, or the interviewer may even do some physical measurements, e.g. in health-related studies. Also, respondents can be presented with all kinds of visual stimuli, ranging from simple response cards listing the answer categories for a question to pictures, advertisement copy or video clips. Finally, highly complex questionnaires can be successfully implemented as a trained interviewer takes care of navigating through the questionnaire. In computer-assisted face-to-face interviews (CAPI), the interviewer is guided through the (complex) questionnaire by a computer program. This lowers error rates even more and gives the interviewer more opportunities to concentrate on the interviewer-respondent interaction. (For an overview see De Leeuw, 1992, 2004.)
When one is interested in studying the general population, the face-to-face survey also has the greatest potential. Sophisticated sampling designs for face-to-face surveys have been developed, which do not require a detailed sampling frame or a list of persons or households. For instance, area probability sampling can be used to select geographically defined units (e.g. streets or blocks of houses) as primary units and households within these areas. Therefore, a main advantage of face-to-face interviews is their potential for a high coverage of the intended population. Elaborate techniques based on household listings (e.g. inventories of all household members derived by an interviewer) can then be used to randomly select one respondent from those eligible in a household (e.g. Kish, 1965).
The presence of an interviewer is a great advantage, but it can also be a disadvantage. Respondents may feel inhibited about answering more sensitive questions in the presence of an interviewer, and in general, more socially desirable and conventional answers are given in interviews than when a self-administered questionnaire is being used. If some questions have a very sensitive nature, but a face-to-face interview is preferable for other reasons (e.g. coverage, additional questions), a good strategy is to combine an interview with a self-administered questionnaire in a mixed-mode design (for more details, see De Leeuw, 2005). In general, interviewers affect respondents and their answers even when non-sensitive questions are being asked. Respondents who are interviewed by the same interviewer tend to give more similar answers; this is called the interviewer effect or interviewer variance. There are many reasons for this: interviewers vary in their ability to motivate respondents, they may use different probing techniques, or reword badly worded questions in different ways, etc. (for more detail see Japec, 2005). Well-tested questionnaires, standardized procedures, and thorough interviewer training are necessary to reduce unwanted interviewer effects.
Face-to-face interviews are the ‘Rolls Royce’ of data collection and, just like the car, they are extremely costly and take much care and time to get rolling. Interviewers have to be trained, not only in standard interview techniques, but also in how to implement sampling and respondent selection rules and in how to solve various problems that can arise when they are working alone in the field. In addition, an extensive supervisory network is needed to maintain quality control. Finally, an administrative manager is needed to make sure that new addresses and interview material are mailed to the interviewers on a regular basis.
TELEPHONE INTERVIEWS
Telephone interviews are less flexible than face-to-face interviews. Their major drawback is the absence of visual cues during the interview; telephone is auditory only. This limits interviewers in their tools for communication. For instance, as no non-verbal communication is possible, they have to say explicitly ‘thank-you’ or ‘yes’, instead of nodding or smiling. The absence of a visual channel of communication also limits the researcher in the type of questions that can be asked. For instance, questions using graphical techniques, like smiley faces, and ranking and sorting techniques are not possible. Semantic differentials and other rating tasks with many
potential response categories will be difficult to use. As no response cards with lists of answer categories are available in telephone interviews, the interviewer and respondent have to rely solely on the auditory channel of communication. The interviewer has to read aloud the question along with the available answer categories and the respondent has to try to keep all possibilities in memory. As a consequence, only very familiar scales, such as 0 to 10 scales (‘on a scale of 0 to 10 where …’) or questions with a limited number of response categories can be used. This has led to the development of special question formats in which the answer categories are split up, for questions with seven or more response categories. An example is the two-step or unfolding procedure in which respondents are first asked if they are ‘satisfied’, ‘dissatisfied’ or ‘somewhat in the middle’, and depending on their answer, are asked specific follow-up questions (e.g. ‘is this completely satisfied, mostly satisfied, or somewhat satisfied’). In general, over the telephone questions must be short and easily understandable.
However, just as in face-to-face interviews, well-trained interviewers are an advantage. In telephone surveys the interviewer can assist respondents in understanding questions, can administer questionnaires with a large number of screening questions, control the question sequence, and probe for answers on open questions. Again, as in CAPI, the use of CATI makes these tasks easier for the interviewer.
The personnel requirements for a telephone survey are less demanding than in face-to-face surveys. Usually, telephone interviews are conducted from a central setting where supervisors and quality controllers follow the process closely. Because the interviews are being conducted from a central location over the phone and interviewers do not have to travel to respondents, fewer highly trained interviewers and supervisors are needed. Interviewers should, of course, be well trained in standard interview techniques and in telephone conversations and know how to use this auditory-only medium of communication, but the variety of interviewer skills needed in one person is less than in face-to-face interviews. The majority of telephone interviewers no longer have to be prepared for every possible emergency and can concentrate on standard, but high-quality interviewing. Special respondents or problem cases can be dealt with by the available supervisor or can be allocated to specially skilled and trained or bi-lingual interviewers.
Because of the potential for close supervision and quality control, interviewer effects are in general smaller over the phone than in face-to-face interviews (e.g. Groves, 1989, chapter 8). Interviewers can affect the responses given in different ways: by the way they read the question and emphasize certain parts, by deviating from prescribed wording, by reacting in different ways to questions or problems of the respondents, and even by the way they look or sound. As interviewers are only a voice over the phone, many interviewer characteristics (e.g. those connected with appearance) will be less obvious. Furthermore, the close supervision and potential for immediate feedback on inadequate interviewer behaviour will lessen unwanted interviewer influence over the phone.
Telephone interviews are only feasible if telephone coverage is high, in other words if the non-telephone part of the population can be ignored. To be sure that persons with unlisted telephones are also included, one can employ random digit dialling. Random digit dialling techniques, which are based on the sampling frame of all possible telephone numbers, make it feasible to use telephone interviews in investigations of the general population. A new challenge to telephone survey coverage is the increasing popularity of mobile (cell) phones. If mobile phones are additional to fixed landline phones (i.e. a person has a mobile phone, but also a landline phone at home), this will not pose a major problem for under-coverage. But there is evidence that certain groups (e.g. the young, lower income, urban, more mobile) are over-represented in the mobile-phone-only proportion of the population. When mobile phones are excluded from telephone
surveys, this may result in serious under-coverage of these groups. Some countries have good listings of all phone numbers, including mobile phones, others have not; customs associated with mobile phone use also differ from country to country. How mobile phones affect the efficacy of telephone surveys is therefore country dependent. For an overview see Nathan (2001) and Steeh (2008).
In telephone interviews, as in face-to-face interviews, the Kish procedure based on a complete household listing can in theory be used to select respondents within a household. However, asking for a complete household listing over the phone is a rather complex and time-consuming procedure and increases the risk of break-offs. A good alternative to the Kish procedure is the last birthday or the next birthday method. In the last birthday method, the interviewer asks to speak with the household member who most recently had a birthday. Even though the birthday methods are very popular and seen as the standard way to select a particular respondent from a household in telephone surveys, they are not as precise as the complete Kish method. For an overview see Gaziano, 2005.
One of the main advantages of telephone interviews, besides the close supervision of interviewers for quality control, is the relatively low cost of telephone interviews both for completed interviews and for callbacks to non-respondents. As interviewers do not have to travel, a limited number of interviewers may call a large number of respondents in a relatively short time period. This is especially important in sparsely populated areas or countries.
MAIL SURVEYS
Mail surveys require an explicit sampling frame of names and addresses, and are advantageous when only addresses and no telephone numbers are available. Often, telephone directories or other lists are used for mail surveys of the general population. Using the telephone directory as a sampling frame has the drawback that people without a telephone and people with an unlisted telephone cannot be reached, but the advantage that telephone reminders or follow-ups can easily be implemented. Another reason for the frequent use of the telephone directory as sampling frame is the relative ease and the low costs associated with this method.
A distinct drawback of mail surveys is the limited control the researcher has over the choice of the specific individual within a household who in fact completes the survey. There is no interviewer available to apply respondent selection techniques within a household and all instructions for respondent selection have to be included in the accompanying letter. As a consequence, only simple procedures such as the male/female/youngest/oldest alternation or the last birthday method can be successfully used. The male/female/youngest/oldest alternation asks in a random 25 percent of the accompanying letters for the youngest female in the household to fill in the questionnaire; in a second random 25 percent of the letters the youngest male is requested to fill in the questionnaire, etc. When a complete list of the individual members of the target population is available, which can be the case in surveys of special groups or in countries with good administrative records, a random sample of the target population can be drawn regardless of the data collection method used. In that case, coverage and sampling will be as good as in interview methods.
The absence of an interviewer makes mail surveys the least flexible data collection technique when complexity of the questionnaire is considered. All questions must be presented in a fixed order, and only a limited number of simple skips and branches can be used. For routings, special written instructions and graphical aids, such as arrows and colours, have to be provided; for a great example see Dillman et al. (2005). Furthermore, in a mail survey, all respondents receive the same instruction and are presented with the questions without added interviewer probing or help in individual cases. In short, a mail questionnaire must be totally self-explanatory. But a big advantage is
that visual cues and stimuli can be used, and with well-developed instructions fairly complex questions and attitude scales can be implemented. The visual presentation of the questions makes it possible to use all types of graphical questions (e.g. ladder, thermometer), and to use questions with seven or more response categories. Also, information booklets or product samples can be sent by mail with an accompanying questionnaire for their evaluation. However, open-ended questions are difficult to ask, as no interviewer is present to probe for more details.
In general, self-administered questionnaires are less intrusive and allow for more privacy and less time pressure. The absence of an interviewer may in certain situations be a real advantage, especially when sensitive or socially desirable questions are being asked. Another advantage is that mail surveys can be completed when and where the respondent wants and are not dependent on interviewer time. A respondent may consult records if needed, which may improve accuracy. For an overview see de Leeuw (1992) and Dillman (2000).
From a logistical point of view mail surveys have two drawbacks: questionnaire length and turn-around time. The personal presence of interviewers in face-to-face interviews prevents break-offs and allows for longer questionnaires than in mail surveys, although telephone interviews do not have this advantage. According to Dillman (1978, p. 55) mail questionnaires up to 12 pages, which contain fewer than 125 items, can be used without adverse effects on the response. Turn-around in mail surveys is slower than in most other modes. Mail surveys are locked into a definite time interval of mailing dates with rigidly scheduled follow-ups, and therefore take longer than other modes of data collection, with the exception of large, geographically dispersed face-to-face interviews, which take the longest. When speed of completion is really important and data are needed fast, telephone and Internet surveys are best. If the data are needed in a couple of weeks, mail surveys are a good choice. Dillman (1978, p. 68) gives an example in which a survey unit of 15 telephones can complete roughly 3000 interviews during the 8 weeks it takes to perform a complete mail survey with reminders. Only if the telephone unit is smaller than 15 interviewers, or the number of needed completed interviews is larger than 3000, will a mail survey be faster.
Logistically, mail surveys also have two huge advantages: small staff and low costs. Organizational and personnel requirements for a mail survey are far less demanding than in interviews. Most of the workers are not required to deal directly with respondents, and the necessary skills are mainly generalized clerical skills (e.g. typing, sorting, response administration, and correspondence processing). Of course, a trained person must be available to deal with requests for information, questions, and refusals of respondents, but no interviewers or other field staff are needed. Thus, the number of different persons necessary to conduct a mail survey is far less than that required for interview surveys with equal sample sizes. Requirements for the organization and personnel do influence the cost of data collection; as a consequence mail surveys are among the least expensive and may be the only affordable mode in certain situations.
INTERNET SURVEYS
In Internet or Web surveys, coverage is still a major problem when surveying the general population (Couper, 2000, 2001). Even though Internet access is growing and around 70 percent of the US population has access to the Internet, the picture is diverse, ranging from 75 percent coverage for Sweden to 4 percent in Africa (www.internetworldstats.com). Furthermore, those covered differ from those not covered, with the elderly, lower educated, lower income, and minorities less well-represented online.
As a reaction to the differential coverage and the relatively low response rates of Internet surveys, so-called ‘access panels’ are gaining popularity in market research. In access
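The turn-around comparison quoted from Dillman (1978, p. 68) in the mail survey section can be restated as a small calculation. The sketch below is only an illustration under two assumptions that are not in the source: telephone output is taken to scale linearly with unit size and time (3000 / (15 × 8) = 25 completed interviews per telephone per week), and a complete mail survey is taken to need 8 weeks regardless of sample size. The function names are hypothetical.

```python
# Figures quoted in the text from Dillman (1978, p. 68).
REFERENCE_TELEPHONES = 15
REFERENCE_INTERVIEWS = 3000
MAIL_SURVEY_WEEKS = 8

# Assumption (not in the source): output scales linearly with unit size and time.
INTERVIEWS_PER_PHONE_PER_WEEK = REFERENCE_INTERVIEWS / (
    REFERENCE_TELEPHONES * MAIL_SURVEY_WEEKS
)


def telephone_weeks(completed_interviews: int, telephones: int) -> float:
    """Weeks a telephone unit would need under the linear-throughput assumption."""
    if telephones <= 0:
        raise ValueError("telephones must be positive")
    return completed_interviews / (telephones * INTERVIEWS_PER_PHONE_PER_WEEK)


def faster_mode(completed_interviews: int, telephones: int) -> str:
    """Compare telephone turn-around with the assumed fixed 8-week mail survey."""
    weeks = telephone_weeks(completed_interviews, telephones)
    if weeks < MAIL_SURVEY_WEEKS:
        return "telephone"
    if weeks > MAIL_SURVEY_WEEKS:
        return "mail"
    return "tie"


if __name__ == "__main__":
    for interviews, phones in [(3000, 15), (3000, 10), (6000, 15), (1500, 15)]:
        weeks = telephone_weeks(interviews, phones)
        print(interviews, phones, weeks, faster_mode(interviews, phones))
```

Under these assumptions the break-even point is exactly the quoted case (15 telephones, 3000 interviews); a smaller unit or a larger target tips the comparison towards mail, in line with the rule of thumb in the text.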
of Internet surveys: it can be used for the fast collection of large numbers of completed questionnaires at low costs. In addition, there are no data entry costs, an advantage Internet surveys share with all computer-assisted interview modes.
OTHER SELF-ADMINISTERED QUESTIONNAIRES
Mail and Internet surveys are only two forms in which self-administered questionnaires can be used. These forms are most often implemented in social science surveys and in polling. In psychology and education, other forms of self-administered questionnaires are frequently used. In educational research, group-wise administration of self-administered questionnaires is common, be it in paper form in the classroom, or in electronic form in the school’s computer laboratory (cf. Beebe et al., 1998; Van Hattum & De Leeuw, 1999). In psychological testing, self-administered tests are used either in an individual or a group setting. Again the administration can be either paper-and-pen or computer-assisted testing (cf. Weisband and Kiesler, 1996).
Examples of individual administration are questionnaires that are handed out by a nurse or health officer in a hospital waiting room, or by a receptionist in a day care centre. Sometimes, self-administered questionnaires are used with an interviewer present. This is usually done when sensitive questions have to be asked and the interviewer hands over a questionnaire for the respondent to fill in privately. When computer-assisted interviewing or CAPI is used, the interviewer hands over the computer to the respondent for a short period. The respondent can answer the specific questions in privacy and the interviewer remains at a respectful distance, but is also available for instructions and assistance.
Just as in mail and Internet surveys, the questionnaires should be well tested and attention should be paid to graphical tools and layout. Just as in mail and Internet surveys, these forms of self-administered questionnaires allow for more privacy and self-disclosure as no interviewer is directly involved in the question-answer process. But there are two main differences. The first is that it is the researcher and not the respondent who decides when and where the questionnaire has to be completed. The researcher also determines how long a session will take, and how much time subjects have to fill in the questionnaire. This may be a disadvantage when well-considered responses are needed, but an advantage when speed-tests or first associations are more appropriate. The second difference is that, although no interviewer is directly involved, usually a trained research assistant is present to give instructions, distribute the tests, and answer questions if necessary. Group-administered questionnaires can be seen as a hybrid between interview and mail survey, combining the advantages of both methods: enough privacy for subjects to answer more freely, and available assistance when needed.
SUMMARY
In survey research there are two main forms of data collection: self-administered questionnaires and standardized interviews. These are mainly characterized by the absence versus presence of an interviewer. But there are many variations possible, such as face-to-face and telephone interviews with their computer-assisted equivalents CAPI and CATI, and self-administered mail questionnaires and Internet surveys. Each method has its advantages and disadvantages, which are summarized below.
Deciding which data collection method is best in a certain situation is often complex and depends on many factors, such as population under investigation, topic, types of questions to be asked, available time, and funds. This presents researchers with a difficult choice indeed. It is no wonder that recently multiple modes of data collection or mixed modes have become popular. In mixed-mode surveys, two or more modes of data collection are
combined in such a way that the disadvantages of one method are counterbalanced by the advantages of another; for instance combining a Web survey with a telephone interview to compensate for under-coverage of the elderly and lower educated on the Internet. Other examples of mixed-mode designs are the use of face-to-face interviews for those who cannot be reached by telephone, or telephone interviews among non-respondents in mail or Internet surveys. In longitudinal surveys, mixed-mode designs are common as data collection methods often vary between waves; for instance (face-to-face) surveys during recruitment and in the base-line survey and less expensive survey methods (e.g. mail, Internet, or telephone) in the subsequent waves. Of course, when mixing modes particular attention should be paid to equivalence of question format, comparability of answers and data integrity. (For an extensive overview see De Leeuw, 2005.)
Which data collection mode or mix of modes is chosen is the result of a careful consideration of quality and costs. But certain survey design steps should always be taken, as they are extremely important for high-quality data. Among these are the careful construction and (pre)-testing of the questionnaire, the implementation of response-inducing features, such as advance letters and reminders, and, if the budget allows, the use of incentives. Finally, in the case of interviews, thorough training of interviewers in interview rules and non-response reduction is necessary.
SUGGESTED READINGS
On survey quality and data collection
Biemer, P.P. and Lyberg, L.E. (2003). Introduction to Survey Quality. New York: Wiley (especially chapters 5 & 6).
On practical aspects of surveys
Czaja, R. and Blair, J. (2005). Designing Surveys: A Guide to Decisions and Procedures. Thousand Oaks: Sage (Pine Forge Press series in research methods and statistics).
Dillman, D.A. (2007). Mail and Internet Surveys (with 2007 update). New York: Wiley (discusses establishment surveys and mixed mode too).
Fowler, F.J. (1995). Improving Survey Questions: Design and Evaluation (Vol. 38). Thousand Oaks: Sage Applied Social Research Methods Series (on question writing and testing).
For international studies
de Leeuw, E.D., Hox, J., and Dillman, D. (eds) (2008). International Handbook of Survey Methodology. Mahwah, N.J.: Erlbaum (especially chapters 9–14 & 16).
REFERENCES
Beebe, T.J., Harrison, P.A., McRae, J.A., Anderson, R.E., and Fulkerson, J.A. (1998). An evaluation of computer-assisted self-interviews in a school setting. Public Opinion Quarterly, 62: 623–632.
Biemer, P.P. and Lyberg, L.E. (2003). Introduction to Survey Quality. New York: Wiley.
Campanelli, P. (2008). Testing survey questions. In Edith de Leeuw, Joop Hox, and Don Dillman (eds) International Handbook of Survey Methodology. Mahwah, N.J.: Erlbaum.
Couper, M.P. (2000). Websurveys; A review of issues and approaches. Public Opinion Quarterly, 64, 4: 464–494. See also Couper (2000) the Good, the Bad, and the Ugly. University of Michigan, Institute for Social Research, Survey Methodology Program, Working paper series # 077.
Couper, M.P. (2001). The promises and perils of web surveys. Presentation at the ASC-Conference on the Challenge of the Internet. Available at www.asc.org.uk (accessed January, 2006).
Cronbach, L.J. and Meehl, P.E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52: 281–302.
Curtin, R., Presser, S., and Singer, E. (2005). Changes in telephone survey nonresponse over the past quarter century. Public Opinion Quarterly, 69: 87–98.
de Heer, W. (1999). International response trends: results of an international survey. Journal of Official Statistics, JOS, 15, 2: 129–142. Also available on www.jos.nu.
de Heer, W., de Leeuw, E., and van der Zouwen, J. (1999). Methodological issues in survey research: A historical review. BMS, Bulletin de Methodologie Sociologique, 64: 25–48.
de Leeuw, E.D. (1992). Data Quality in Mail, Telephone and Face-to-face Surveys. Amsterdam: TT-Publikaties. Available at http://www.xs4all.nl/~edithl/pubs/disseddl.pdf (accessed June 2006).
de Leeuw, E.D. (2004). New Technologies in Data Collection, Questionnaire Design and Quality. International Statistical Seminars Series # 44. San Sebastian: EUSTAT. Available at http://www.eustat.es/prodserv/datos/sem44.pdf (accessed June 2006).
de Leeuw, E.D. (2005). To mix or not to mix: data collection modes in surveys. Journal of Official Statistics, 21, 2: 233–255. Available at www.jos.nu (accessed June 2007).
de Leeuw, E.D., Callegaro, M., Hox, J.J., Korendijk, E., and Lensvelt-Mulders, G. (2007). The influence of advance letters on response in telephone surveys: A meta-analysis. Public Opinion Quarterly, 71, 3: 1–31.
de Leeuw, E.D. and de Heer, W. (2002). Trends in household survey nonresponse: A longitudinal and international comparison. In Dillman, D.A., Eltinge, J.L., Groves, R.M., and Little, R.J.A. (eds) Survey Nonresponse. New York: Wiley.
de Leeuw, E., Hox, J., and Kef, S. (2003). Computer-assisted self-interviewing tailored for special populations and topics. Field Methods, 15: 223–251.
Dillman, D.A. (1978). Mail and Telephone Surveys. The Total Design Method. New York: Wiley.
Dillman, D.A. (2000). Mail and Internet Surveys. The Tailored Design Method. New York: Wiley.
Dillman, D.A. (2007). Mail and Internet Surveys. The Tailored Design Method (2007 Update with Appendix). New York: Wiley.
Dillman, D.A., Eltinge, J.L., Groves, R.M., and Little, R.J.A. (2002). Survey nonresponse in design, data collection and analysis. In Dillman, D.A., Eltinge, J.L., Groves, R.M., and Little, R.J.A. (eds) Survey Nonresponse. New York: Wiley.
Dillman, D.A., Gertseva, A., and Mahon-Taft, T. (2005). Achieving useability in establishment surveys through the application of visual design principles. Journal of Official Statistics (JOS), 21, 2: 183–214. Also available at www.jos.nu.
Fowler, F.J. (1995). Improving Survey Questions: Design and Evaluation (Vol. 38). Thousand Oaks: Sage Applied Social Research Methods Series.
Gaziano, C. (2005). Comparative analysis of within-household respondent selection techniques. Public Opinion Quarterly, 69: 124–157.
Goyder, J. (1987). The Silent Minority: Nonrespondents on Sample Surveys. Cambridge: Polity Press.
Groves, R.M. and McGonagle, K.A. (2001). A theory guided training protocol regarding survey participation. Journal of Official Statistics (JOS), 17, 2: 249–265. Also available at www.jos.nu.
Hox, J.J. and de Leeuw, E.D. (1994). A comparison of nonresponse in mail, telephone, & face to face surveys: Applying multilevel modeling to meta-analysis. Quality & Quantity, 28: 329–344. Reprinted in: David de Vaus (2002) Social Surveys, part eleven, nonresponse error. London: Sage, Benchmarks in Social Research Methods Series.
Hyman, H.H. (1954). Interviewing in Social Research. Chicago: Chicago University Press.
Japec, L. (2005). Quality Issues in Interviewer Surveys: Some Contributions. Stockholm: Stockholm University: Department of statistics (ISBN 91-7155-155-7).
Jenkins, C. (now Cleo Redline) and Dillman, D.A. (1997). Towards a theory of self-administered questionnaire design. In Lyberg, L., Biemer, P., Collins, M., de Leeuw, E., Dippo, C., Schwarz, N., and Trewin, D. (eds) Survey Measurement and Process Quality. New York: John Wiley.
Kasprzyk, D., Duncan, G.J., Kalton, G., and Singh, M.P. (1989). Panel Surveys. New York: Wiley.
Kish, L. (1965). Survey Sampling. New York: Wiley.
Lee, S. (2006). Propensity score adjustment as a weighting scheme for volunteer panel web surveys. Journal of Official Statistics (JOS), 22, 2: 329–349. Also available at www.jos.nu.
Lozar, M.K. and Vehovar, V. (2008). Internet surveys. In de Leeuw, E., Hox, J., and Dillman, D. (eds) International Handbook of Survey Methodology. Mahwah, N.J.: Erlbaum.
Matsuo, H., McIntyre, K.P., Tomazic, T., and Katz, B. (2004). The online survey: its contributions and potential problems. American Statistical Association (ASA). Proceedings, 2004, ASA section on Survey Research Methods, pp. 3998–4000. Available at www.amstat.org/sections/srms/proceedings.
Nathan, G. (2001). Telesurvey methodologies for households: A review and some thoughts for the future. Survey Methodology, 27: 7–31.
National Centre for Social Research (1999). How to Improve Survey Response Rates: A Guide for Interviewers on the Doorstep. London, Thousand Oaks and New Delhi: Sage Publications.
O’Muircheartaigh, C. (1997). Measurement error in surveys: A historical perspective. In Lyberg, L., Biemer, P., Collins, M., de Leeuw, E., Dippo, C., Schwarz, N., and Trewin, D. (eds) Survey Measurement and Process Quality. New York: Wiley.
Groves, R.M. (1989). Survey Errors and Survey Costs. Presser, S., Rothgeb, J., M., Couper, M.P., Lessler,
New York: Wiley. J.T., Martin, E., Martin, J., and Singer, E. (2004).
Groves, R.M. and Couper, M.P. (1998). Nonresponse in Methods for Testing and Evaluating Survey Questions.
Household Interview Surveys. New York: Wiley. New York: Wiley.
SELF-ADMINISTERED QUESTIONNAIRES AND STANDARDIZED INTERVIEWS 327
Redline, C.D., Dillman, D.A., Carley-Baxter, L., and Handbook of Survey Methodology. Mahwah, N.J.:
Creecy, R. (2003). Factors that influence reading Erlbaum.
and comprehension in self-administered question- Steeh, C. and Piekarski, L. (2006). Accommodating
naires. Paper presented at the Workshop on Item- new technologies: the rejuvenation of telephone
Nonresponse and Data Quality, Basel Switzerland, surveys. Paper Presented at the second International
October 10, 2003. Available at http://survey.sesrc. Conference on Telephone Survey Methodology
wsu.edu/dillman/papers.htm (accessed June 2006). (TSMII), Florida.
Salant, P. and Dillman, D.A. (1994). How to Conduct Sudman, S., Bradburn, N.M., and Schwarz, N. (1996).
Your Own Survey. New York: Wiley. Thinking About Answers. The Application of Cognitive
Schwarz, N., Knäuper, B., Oyserman, D., and Stich, Processes to Survey Methodology. San Francisco:
C. (2008). The Psychology of Asking Questions. Jossey-Bass.
In Edith De Leeuw, Joop Hox, and Don Dillman Tourangeau, R., Rips, L.J., and Rasinski, K. (2000).
(eds) International Handbook of Survey Methodology. The Psychology of Survey Response. Cambridge:
Mahwah, N.J.: Erlbaum. Cambridge University Press.
Singer, E. (2002). The use of incentives to reduce Van Hattum, M.J.C. and de Leeuw, E.D. (1999). A disk by
nonresponse in household surveys. In Dillman, D.A., mail survey of pupils in primary schools: Data quality
Eltinge, J.L., Groves, R.M., and Little, R.J.A. (eds) and logistics. Journal of Official Statistics (JOS), 15,
Survey Nonresponse. New York: Wiley. 3: 413–429. Also available at www.jos.nu (accessed
Snijkers, G., Hox, J.J., and de Leeuw, E.D. (1999). June 2006).
Interviewers’ tactics for fighting survey nonresponse. Weisband, S. and Kiesler, S. (1996). Self disclosure
Journal of Official Statistics (JOS), 15, 2: 185–198 on computer forms: Meta analysis and implications.
(available at www.jos.nu). Reprinted in: David de CHI ’96. Available at http://acm.org/sigchi/chi96/
Vaus (2002), Social Surveys, part eleven, nonresponse proceedings/papersWeisband/sw_txt.htm (accessed
error. London: Sage, Benchmarks in Social Research July 2006).
Methods Series. Willis, G.B. (2004). Cognitive Interviewing. A Tool for
Steeh, C. (2008). Telephone surveys. In de Leeuw, E., Improving Questionnaire Design. Thousand Oaks:
Hox, J., and Dillman, D. (eds) International Sage.
19
Qualitative Interviewing and
Feminist Research
Andrea Doucet and Natasha Mauthner
in-depth face-to-face interview came to be seen as ‘the paradigmatic “feminist method”’ (Kelly et al. 1994, 34). The equation of feminist research with qualitative methods was criticized by a number of feminists early on (e.g. Jayaratne 1983). Since then, feminists have increasingly moved away from privileging particular methodological approaches and methods. There has been recognition that research methodologies and methods should reflect the specific research questions under investigation, and that key feminist concerns can usefully be addressed by adopting a range of different approaches and methods (Brannen 1992; Chafetz 2004a, 2004b; Kelly et al. 1994; Maynard 1994; McCall 2005; Oakley 1998; Westmarland 2001).

Whilst recognizing that current feminist research is characterized by the use of multiple and mixed methods and approaches, the focus of this chapter is specifically on the ways in which feminist scholars have sought to transform the classic social science interview in line with feminist aims. Just as feminist thinking around issues of method, methodology, and epistemology has had a profound effect on research practices and theories more generally, contributions that feminist scholars have brought to the interview as a site for knowing from and about women’s lives have been influential in reshaping the practice and theory of qualitative interviewing more broadly.

The aim of this chapter is therefore to examine feminist debates concerning the interview as a particular method of data collection. We begin by sketching out what we regard as some key historical trends in feminist approaches to interviewing, with a particular discussion of Ann Oakley’s (1981) now classic piece on the importance of non-hierarchical interviewing practices. While Oakley’s contribution initially stimulated discussions around the possibilities and limitations of creating rapport and friendliness within interviews, more recent challenges from black feminism, cultural studies, post-structural and postcolonial writing have questioned the extent to which ‘others’ can be known at all through interviews or, indeed, through any other method (Wilkinson and Kitzinger 1996). Our chapter also addresses the increasingly topical and critical question of how one can come to know others who are different from ourselves (such as in cross-cultural interviewing and women interviewing men) and highlights the most recent contributions of feminist scholarship to contemporary understandings of the research interview.

FEMINIST CONTRIBUTIONS TO THE INTERVIEW: 1970s AND 1980s

In the 1970s, feminist researchers began to engage with the intersections between feminist theory and methodologies, and turned their attention to the ways in which the methods available for studying and understanding women’s lives were flawed. As Dorothy Smith (1974, 2) noted, there was within sociology ‘a disjunction between how women find and experience the world beginning (though not necessarily ending up) from their place and the concepts and theoretical schemes available to think about it in.’ Early feminist sociological theory thus pointed to how women’s exclusion mattered both theoretically and methodologically. Turning their gaze to dominant methods used to generate theory, many feminist scholars expressed unease about quantitative data collection methods across the social and natural sciences and, more specifically, gender bias in the collection and interpretation of data on sex differences in behavioral, biological, and bio-behavioral scientific research. Feminist scientists documented, in particular, the exclusive use of male subjects in both experimental and clinical biomedical research, the selection of male activity and concomitant male-dominant animal populations for study, and the blatant invisibility of females in research protocols (Haraway 1988, 1991; Keller 1983, 1985; Keller and Longino 1998; Longino and Doell 1983; Rose 1994).

Whilst feminist scientists made such observations on the basis of experiments conducted
on rats and baboons, similar concerns were raised across the social sciences and humanities on research processes and protocols with human beings. Feminist social scientists noted how masculine bias permeated research, as perhaps best revealed in the valuing and incorporation of traditional masculine characteristics of reason, rationality, autonomy, and disconnection (see Code 1981; Gilligan 1977, 1982; Keller 1985; Lloyd 1983; Miller 1976; Smith 1974). Also within the social sciences and humanities, feminists waged a long and wide epistemological critique of positivism as a philosophical framework and its detached and ‘objective’ scientific approach that objectified research subjects.

Feminist scholars raised three particular concerns within this epistemological critique. First, women’s lives and female-dominated domains were largely absent in much social science research. Thus when Dorothy Smith argued that ‘sociology … has been based on and built up within the male social universe’ (Smith 1974, 7), this was a ‘social universe’ that left unstudied and invisible the female-dominated social sites of domestic work and the care of children, the ill and the elderly (see also Finch and Groves 1983; Graham 1983, 1991). Second, these sentiments were even more profoundly felt by particular groups of women, especially by women of color who watched as feminist movements and feminism within the academy unfolded in ways that did not speak to them or about them. In the United States, this sense was aptly described as one of ‘feelings of craziness’ by the infamous Combahee River Collective’s manifesto entitled: ‘A Black Feminist Statement’ (Combahee River Collective 1977/1986; see also Collins 1990; Hooks 1989, 1990; Lorde 1984). In Britain, women of African and Asian descent spoke to the invisibility of their experiences in public, political, and academic portrayals of women’s lives (see Bryan et al. 1985; Mirza 1998; Wilkinson and Kitzinger 1996). A third concern was over the preferred tool for research within positivist frameworks, namely, the quantitative survey, and the extent to which it could adequately capture the complexity of women’s lives. As Hilary Graham lamented, women’s experiences were being measured within surveys designed on the basis of men’s lives; her provocative question, posed at the beginning of the 1980s, summed up the growing dissatisfaction with surveys for understanding women’s experiences: ‘Do her answers fit his questions?’ (Graham 1983).

It was against this backdrop that feminist social scientists turned their attention to the possibilities and practices of interviewing. During the 1980s feminist researchers, especially those working within sociology, began to engage with the issue of how to interview in ways that would adhere to widely recognized feminist goals of conducting non-hierarchical and egalitarian research. This critique began early in the decade with Ann Oakley’s now highly cited article on ‘non-hierarchical’ relationships between female interviewers and interviewees (Oakley 1981). Her discussion sought to provide an alternative to what were presented as ‘proper interviews’ in sociological textbooks. More broadly, Oakley challenged positivist research methods that emphasized ‘objectivity,’ distance, and ‘hygienic’ research uncontaminated by the researcher’s values or biases. In contrast to an objective, standardized and detached approach to interviewing, Oakley argued that ‘the goal of finding out about people through interviewing was best achieved when the relationship of interviewer and interviewee is non-hierarchical and when the interviewer is prepared to invest his or her own personal identity in the relationship’ (1981, 41). Janet Finch (1984), writing a few years later, echoed Oakley’s concerns in emphasizing the rapport that could easily be struck between two women in an interview situation while others followed suit and argued for the importance of developing mutually reciprocal relationships during the interviewing stage (Mies 1983; Reinharz 1992; Stanley and Wise 1983, 1993).

A central preoccupation for feminist researchers writing in the 1980s was an acute sensitivity to the relations between researcher and researched, and power relations more widely (see Maynard and Purvis 1994;
Ramazanoglu and Holland 2002). In the 1990s, however, feminist social scientists began to challenge the notion of non-hierarchical interviews, the idea that power differentials could be equalized between women, as well as the assumption that reciprocity and mutuality between women necessarily leads to ‘better’ knowing. Indeed, feminists began to display a growing appreciation of the ‘dilemmas’ and tensions involved in coming to know and represent the narratives, experiences, or lives of their interview subjects (e.g. Ribbens and Edwards 1998; Wilkinson and Kitzinger 1996; Wolf 1992).

Western-based social scientists have exhibited profound ‘worry’ over resolving these tensions (Fine and Weis 1996, 251; see also DeVault 1999). However, the ethical dilemmas around coming to know ‘others’ have been particularly clearly articulated by Black feminist scholars (Lewis 2000; Mama 1995; Reynolds 2002a) and by feminists working in contexts where inequalities are especially acute, such as in low-income communities and in Third World countries (Patai 1991; Wolf 1992). One of the most vocal scholars on this issue has been Daphne Patai who has insisted that, due to socio-economic and global inequalities, research relations between First World women interviewing Third World women are not only intrinsically hierarchical, but can be unethical (Patai 1991). Questions of who produces knowledge, with what politics, and from which locations (Mohanty 1988, 1991) have, furthermore, become increasingly critical and urgent in feminist, postmodern, and post-colonial research. Throughout the 1990s, women of color working within western contexts and feminists working in Third World settings have highlighted systemic processes of exclusion, racism, and ethnocentrism in research. Key and much-debated issues have included: intersections of global capitalism and feminist transnational identities (Ferguson 2004; Schutte 1993, 1998, 2000; Shohat 2001); the extent to which feminists in dominant cultures can ever know subaltern cultures (Alexander and Mohanty 1997; Ladson-Billings 2000; Mohanty et al. 1991; Oyewumi 2000; Spivak 1993); the challenges of knowing transnational lesbian and gay identities (Bunch 1987; Stone 1991); and the role and representation of subordinate ‘others’ in the production of knowledge (Bernal 2002; Christian 1996).

A decade after Ann Oakley’s celebration of non-hierarchical woman-to-woman interviewing, and its ability to yield greater insight into knowledge of women’s lives, feminist work took a 180-degree turn and began to highlight the potential dangers associated with trying to pretend that interviews could be friendly or mutually beneficial for both researchers and interviewees. Judith Stacey (1991: 114) argued that the ‘ethnographic method exposes subjects to far greater danger and exploitation than do more positivist, abstract, and “masculinist” research methods. And the greater the intimacy – the greater the apparent mutuality of the researcher/researched relationship – the greater is the danger.’ Pamela Cotterill (1992: 597) similarly drew attention to the ‘potentially damaging effects of a research technique which encourages friendship in order to focus on very private and personal aspects of people’s lives.’ These criticisms have continued into the new millennium, with feminists commenting on the irony that feminist researchers may be reproducing the very practices they have been seeking to challenge:

It is perhaps ironic, then, that scholars are discovering that methodological changes intended to achieve feminist ends – increased collaboration, greater interaction, and more open communication with research participants – may have inadvertently reintroduced some of the ethical dilemmas feminist researchers had hoped to eliminate: participants’ sense of disappointment, alienation, and potential exploitation. (Kirsch 2005, 2163)

Three decades of ardent reflection on the usefulness of interviews as the most appropriate, or even the best, way of gathering knowledge from and for women have paved the way for broader theoretical and epistemological debates about ‘knowing’ others. Beginning in the 1990s, feminists have turned their attention to the difficulties and
challenges involved in creating knowledge from interview accounts.

FEMINIST CONTRIBUTIONS TO THE INTERVIEW: RECENT ISSUES AND CONCERNS (1990s–2000s)

While the issues raised by Oakley have been critiqued and displaced with other key concerns, it remains the case that her reflections on what was important to feminist interviewing still resonate as highly relevant in the new millennium. That is, issues of non-hierarchical relations, power, rapport, and empathy, and the investment of one’s identity in the interview process continue to dominate discussions of feminist research practices. However, these discussions have grown more complex and nuanced, and have incorporated a number of other concerns including: interviews as sites for collaborative meaning-making (the ‘how’ of interviews); the interrogation of ‘what’ constitutes data; and the theoretical assumptions and underpinnings of interviews, and research methods more generally.

Non-hierarchical relations in interviewing

Underlying early discussions of non-hierarchical interviewing was the assumption that differences between women could be muted or eliminated altogether. Decades of scholarship on differences between women, postmodern and post-structural critiques of the stability of a concept and identity such as ‘woman,’ and black feminist contributions to this debate have revealed the naivety and essentialism inherent within this position. Many feminist researchers have shown that structural characteristics other than gender, such as differences in class, ethnicity, age, sexuality, and global location can matter and that the ways in which power imbalances play out in the interview process are not straightforward. Tang, for example, in her interviews with peers – academic mothers in both China and the UK – argues that both the interviewer’s and interviewee’s perceptions of social, cultural, and personal differences have an impact on the power relationship in the interview and that the relational dynamics between the interview pair can matter in what kind of information is divulged (Tang 2002; see also Garg 2004). Others have focused on how other aspects of the research relationship can influence the content and conduct of the interview, including: shared proficiency by both interviewer and interviewee in the language of the interview (Garg 2004; Temple and Edwards 2002); generational differences between interviewers and interviewees (Casey 2003); shared racial position (such as Black women researchers conducting interviews with Black women on topics that are highly sensitive) (Few et al. 2003); and how class relations may influence the ‘telling’ of lesbian stories in research interviews (McDermott 2004).

Power relations in research have been discussed with an overwhelming focus on how interviews affect the researched. Recently, however, feminists have highlighted the ways in which research respondents can exercise power, creating a two-way flow of power relations between the researcher and the researched. Informed by Foucauldian understandings of power, Thapar-Bjorkert and Henry (2004) view power hierarchies in research as ‘shifting, multiple, and intersecting’ (Thapar-Bjorkert and Henry 2004, 364). Drawing on the multiple locations within which both researchers and research participants are located, they argue that their combined locations as ‘non-white/non-western and non-white/western researchers in a non-western setting’ enabled them to ‘closely examine the operation of power as it flows and ebbs in the context of a multiplicity of potential identities of researchers and research participants’ (2004, 363). They note, in particular, how age, generation, national location, and reciprocity during and after the interviews influence how these power relations play out. Similarly, drawing on her research with Black mothers,
Reynolds (2002b) questions the notion of the ‘powerful researcher.’ She notes that ‘the power relations between the mothers and myself, as researcher, involved a dynamic, fluid and two-way interactive process’ (2002b, 303). She found that power relations within her interviews shifted according to structural differences in race, class, age, and gender between researcher and researched. She writes:

‘Where the researcher and research participant share the same racial and gender position, such as Black female researcher interviewing Black women, power between the two groups is primarily negotiated through other facts such as social class and age difference. This interaction between race, class and gender suggests that power in social research is not a fixed and unitary construct, exercised by the researcher over the research participant. Instead … power is multifaceted, relational and interactional and is constantly shifting and renegotiating itself between the researcher and the research participant according to differing contexts and their differing structural locations.’ (2002b, 307–8)

Feminist reflections on the inevitability of hierarchy and power differences in interview settings and relationships do not suggest or imply abandonment of this method but rather invite researchers to be reflexive about their research practices by recognizing, debating, and working with these power differentials.

Empathy, rapport, and reciprocity

Feminists have deepened their reflections on issues of empathy, rapport, and reciprocity in interview situations, with a recent focus on how to navigate differences of social positioning. Questions about how much researchers should reveal about themselves, their situations and their views during interviews have continued to be asked (see Edwards 1993), particularly in cases of research on overtly political issues where researcher and researched may hold divergent perspectives. For example, in her research in the British Serbian community on Serbian liability for atrocities, Pryke (2004) challenges the methodological convention that the interviewer must never disagree with a respondent in qualitative research.

Issues of rapport and empathy in interviewing have tended to be discussed and conceptualized in relation to woman-to-woman interviewing. However, since the 1990s, feminists have increasingly been investigating the lives of men, thus raising questions around creating empathy and rapport with male research subjects. These challenges have emerged from the work of feminist researchers who, for example, have interviewed powerful, authoritative, and uniformed men (e.g. senior police officers) or violent male offenders (Campbell 2003; Presser 2004, 2005; Taylor and Rupp 2005). Researchers of fatherhood have further explored how feminist research relationships can be fostered with men. In recent research on divorced fathers, for example, Canadian feminists have reflected on the tensions in interviewing fathers in political climates where fathers’ rights groups have been gaining momentum. They highlight how fathers’ narratives can be heard as potentially damaging to women’s traditional caregiving interests (see Doucet 2004, 2006; Mandell 2002). Feminist research on men’s experiences demonstrates how the establishment of trustworthy relations in the interviewing setting can nevertheless exist within relations of considerable power inequities and conflict that can ultimately undermine larger feminist research objectives.

Investing one’s identity in the research relationship

In the early work of Ann Oakley (1981), the idea of investing one’s identity in the research relationship was marked by a tendency to frame a binary opposition between the researcher as an ‘insider’ or an ‘outsider’ to the research and to one’s research subjects. Oakley, and many other feminist researchers who followed her, illustrated this tendency in the argument that where the researcher has an area of shared identity with her research subjects, there was a reduced likelihood of unequal, exploitative, or unethical research. In the case of
Oakley, shared motherhood was the entry point for the researcher to have ‘insider’ status in the research. Other feminists were quick to contest this notion by underlining how differing, as well as shared, structural characteristics could impede mutuality and reciprocity (Cotterill 1992; Edwards 1990, 1993; Glucksmann 1994; Ramazanoglu 1989; Reynolds 2002b; Ribbens 1989; Song and Parker 1995). Feminist scholars also noted that even where researchers and respondents shared structural and cultural similarities of gender, ethnicity, class, and age, this did not guarantee mutual understanding or ‘better’ knowing. As Catherine Riessman pointed out, ‘gender and personal involvement may not be enough for full “knowing”’ (Riessman 1987, 189; see also Ribbens 1998). Since the early 1990s, feminist discussions of identity investment in interviews have, thus, debunked the view that any commonality in one’s social positionality, structural location, and biographical experience can guarantee that these axes of shared identification will establish an open or ‘better’ research exchange (see Dyck 1997).

At the same time, feminists began to recognize that the identity of being an ‘insider’ was riddled with contradictions and that there were varied degrees of being both an insider and an outsider in the research relationship (e.g. Narayan 1993; Olesen 1998; Stanley 1994; Zavella 1993). In this vein, Patricia Hill Collins has referred to herself as the ‘outsider within’ (Hill Collins 1990, 1998) as a way of describing ‘being on the edge’ of ‘intersecting power relations of race, gender and social class’ (Hill Collins 1999, 85; see also Anzaldua 1987; Braidotti 1994). Furthermore, post-structuralist discussions of the complexity of the theoretical concepts and empirical constructs of subjectivity and identity have further strengthened the problematization of what it means to be an insider or an outsider, both theoretically and methodologically.

Two key issues have come to the fore in these debates. First, there is now fairly widespread consensus among feminists that ‘“outsiderness” and “insiderness” are not fixed or static positions; rather they are ever-shifting and permeable social locations that are differentially experienced or expressed by community members’ (Naples 2003, 373; see also Naples 1996). Ongoing reflections on the complexities of ‘otherness’ have highlighted the increasing set of challenges that face researchers as they attempt to know others who are different from themselves across multiple axes of identities and experiences (see Fawcett and Hearn 2004).

Second, the question of who we are, while engaged concretely in the practice of research interviews, is also viewed as neither unitary nor static. Shulamit Reinharz, for example, in a book chapter entitled ‘Who Am I,’ reflects upon how she has ‘approximately 20 different selves’ (Reinharz 1997, 5) during her interviews and fieldwork. Recent feminist contributions to this debate have highlighted how the interview topics as well as the relational dynamics occurring in the research encounter influence how we present ourselves and which parts of our identity we choose to emphasize. Some researchers may adopt ‘in-between positions’ as they straddle different identities (Ghorashi 2005) while others have stressed the ‘border-making process that occurs during the social constructionist interview’ wherein ‘various pre-assumed roles are created by researchers and by their respondents’ (Gubrium and Koro-Ljungberg 2005, 690).

Interviews and an interrogation of ‘what’ constitutes data

Feminist researchers have also interrogated just ‘what’ emerges out of interview data. In the 1970s and 1980s, there was a tendency for feminist researchers, particularly those influenced by feminist standpoint theory (Harding 1987; Hartsock 1983, 1985; Smith 1987), to talk and write about seemingly coherent and transparent subjects whose experiences, voices, or subjectivities could be captured by well-formulated research questions. Going back to Hilary Graham’s point about ‘her answers’ not fitting ‘his questions,’ there was an implicit assumption
that if the questions could just be reformulated better, then ‘her answers’ would indeed provide pathways into understanding women’s experiences. In ensuing years, however, the influence of postmodern and post-structural critiques has meant that feminists have begun to strongly challenge this view. Researchers have named this as the recurring ‘transparent self problem’ and the ‘transparent account problem’ (Hollway and Jefferson 2000, 3; see also Frith and Kitzinger 1998, 304–307) within interviews and their analysis.

An extensive scholarship on post-structuralist conceptualizations of subjects is now well incorporated into feminist research and feminist approaches to the interview. Most notable has been post-structural theorizing about a non-unitary, constantly changing subject where there is no ‘core self’ (e.g. Weedon 1987). Even feminist scholars who have been critical of post-structuralist approaches have been influenced by such critiques. Sandra Harding, for example, has moved beyond her originally narrow conception of a feminist standpoint to argue that ‘the subjects of knowledge are … multiple, heterogeneous and contradictory or incoherent’ (Harding 1993, 65). Other scholars have remained unconvinced by the linguistic turn and have continued to hold onto some notion of coherent subjectivities, or to ‘knowing subjects’ in their interviewing, as well as knowledge-construction practices (see Code 1993; Smith 1999; Stanley 1994). Dorothy Smith, for example, has argued persuasively that post-structuralism ‘has rejected the unitary subject of modernity only to multiply it as subjects constituted in multiple and fragmented discourses’ (Smith 1999, 108) while Linda Alcoff has maintained: ‘Poststructuralist critiques pertain to the construction of all subjects or they pertain to none’ (Alcoff 1988, 409). These debates on ‘who’ or ‘what’ is being accessed within interviews have continued in discussion of feminist research into the new millennium against a backdrop of larger theoretical work on post-structuralist and materialist/interpretivist conceptions of the subject (see Benhabib 1995; Butler 1995; Fraser and Nicholson 1988; Weeks 1998), debates on theorizing the concept of ‘experience’ (Holt 1994; Scott 1992, 1994) as well as feminist critiques of Foucault’s varied conceptions of the subject (Deveaux 1994; McNay 1993; Sawicki 1991).

Interviews as collaborative meaning-making: The ‘how’ of interviews

Feminists, particularly those influenced by ethnomethodology, have highlighted the importance of the interview not only as a place to collect data, but also as a site where data is co-constructed, where identities are forged through the telling of stories, and where meaning-making begins. Researchers have focused on how the research interview has particularly strong meanings for the research participant (Hiller and DiLuzio 2004; see also Brannen 1988). The research interview can be a site for the construction of one’s ‘moral’ identity (Presser 2004) as well as a potential avenue for resistance and healing when topics are of a sensitive nature (Taylor 2002). In Presser’s qualitative work with men who had committed ‘serious violent crimes, including crimes against women – rape of girls and women and assault and murder of female partners’ (2005, 2067), she examines how the interview itself acted as a context for the creation of men’s narratives and their identities. Reflecting on her role as a researcher in these settings, she highlights how the men she interviewed presented themselves as ‘good and manly’ and ‘decent’ while simultaneously constructing her, the researcher, both as somebody ‘needing strength and guidance concerning relations with men’ as well as ‘an object of fantasies of domination’ (2005, 2086). Presser, thus, argues that feminist researchers need to pay closer attention to how power relations within the interview setting can become part of one’s data and she calls for a ‘close and deep (multilevel) examination of
Doucet, Andrea. 2006. Do Men Mother? Fathering, Care and Domestic Responsibility. Toronto: University of Toronto Press.
Doucet, Andrea and Natasha S. Mauthner. 2002. ‘Knowing Responsibly: Linking Ethics, Research Practice and Epistemology.’ In Ethics in Qualitative Research, edited by M. Mauthner, M. Birch, J. Jessop, and T. Miller. London: Sage.
Doucet, Andrea and Natasha S. Mauthner. 2006. ‘Feminist Methodologies and Epistemologies.’ In Handbook of 21st Century Sociology, edited by Clifton D. Bryant and Dennis L. Peck. Thousand Oaks, CA: Sage.
Dyck, Isabel. 1997. ‘Dialogue with Difference: A Tale of Two Studies.’ pp. 183–202 in Thresholds in Feminist Geography: Difference, Methodology, Representation, edited by J. P. I. Jones, H. Nast, and S. M. Roberts. Lanham, MD: Rowman and Littlefield.
Edwards, Rosalind. 1990. ‘Connecting Method and Epistemology: A White Woman Interviewing Black Women.’ Women’s Studies International Forum 13: 477–490.
Edwards, Rosalind. 1993. ‘An Education in Interviewing: Placing the Researcher and the Researched.’ In Researching Sensitive Topics, vol. 181–196, edited by C. M. Renzetti and R. M. Lee. Newbury Park: Sage.
Fawcett, Barbara and Jeff Hearn. 2004. ‘Researching Others: Epistemology, Experience, Standpoints and Participation.’ International Journal of Social Research Methodology 7: 201–218.
Ferguson, Ann. 2004. ‘Symposium: Comments on Ofelia Schutte’s Work on Feminist Philosophy.’ Hypatia 19: 169–181.
Few, April L., Dionne P. Stephens, and Marlo Rouse-Arnett. 2003. ‘Sister-to-Sister Talk: Transcending Boundaries and Challenges in Qualitative Research with Black Women.’ Family Relations 52: 205–215.
Finch, Janet. 1984. ‘“It’s Great to Have Someone to Talk to”: The Ethics and Politics of Interviewing Women.’ In Social Researching: Politics, Problems, Practice, edited by C. Bell and H. Roberts. London: Routledge and Kegan Paul.
Finch, Janet and Dulcie Groves. 1983. A Labour of Love: Women, Work and Caring. London: Routledge.
Fine, Michelle and Lois Weis. 1996. ‘“Writing the Wrongs” of Fieldwork: Confronting our Own Research/Writing Dilemmas.’ Qualitative Inquiry 2: 251–274.
Fonow, Mary M. and Judith A. Cook. 1991. Beyond Methodology: Feminist Scholarship as Lived Research. Bloomington: Indiana University Press.
Fonow, Mary M. and Judith A. Cook. 2005. ‘Feminist Methodology: New Applications in the Academy and Public Policy.’ Signs: Journal of Women in Culture and Society 30: 2211–2236.
Fraser, Nancy and Linda Nicholson. 1988. ‘Social Criticism without Philosophy: An Encounter between Feminism and Postmodernism.’ Theory, Culture and Society 5: 373–394.
Frith, Hannah and Celia Kitzinger. 1998. ‘“Emotion Work” as a Participant Resource: A Feminist Analysis of Young Women’s Talk-in-interaction.’ Sociology 32: 299–320.
Garg, Anupama. 2004. ‘Interview Reflections: A First Generation Migrant Indian Woman Researcher Interviewing a First Generation Migrant Indian Man.’ Journal of Gender Studies 14: 147–152.
Gelsthorpe, Lorraine. 1990. ‘Feminist Methodology in Criminology: A New Approach or Old Wine in New Bottles.’ In Feminist Perspectives in Criminology, edited by L. Gelsthorpe and A. Morris. Milton Keynes: Open University Press.
Ghorashi, Halleh. 2005. ‘When the Boundaries are Blurred: The Significance of Feminist Methods in Research.’ European Journal of Women’s Studies 12: 363–375.
Gilligan, Carol. 1977. ‘In a Different Voice: Psychological Theory and Women’s Development.’ Harvard Educational Review 47: 481–517.
Gilligan, Carol. 1982. In a Different Voice: Psychological Theory and Women’s Development. Cambridge, Mass.: Harvard University Press.
Glucksmann, Miriam. 1994. ‘The Work of Knowledge and the Knowledge of Women’s Work.’ In Researching Women’s Lives from a Feminist Perspective, edited by M. Maynard and J. Purvis. London: Taylor and Francis.
Graham, Hilary. 1983. ‘Do Her Answers Fit His Questions? Women and the Survey Method.’ In The Public and the Private, edited by E. Gamarnikow. London: Tavistock.
Graham, Hilary. 1991. ‘The Concept of Caring in Feminist Research: The Case of Domestic Service.’ Sociology 25: 61–78.
Gubrium, Erika and Mirka Koro-Ljungberg. 2005. ‘Contending with Border Making in the Social Constructionist Interview.’ Qualitative Inquiry 11: 689–715.
Haraway, Donna. 1988. ‘Situated Knowledges: The Science Question in Feminism and the Privilege of Partial Perspective.’ Feminist Studies 14: 575–599.
Haraway, Donna. 1991. Simians, Cyborgs and Women: The Reinvention of Nature. New York: Routledge.
Harding, Sandra. 1987. ‘Conclusion: Epistemological Questions.’ pp. 181–190 in Feminism and Methodology, edited by S. Harding. Bloomington, Indiana and Milton Keynes, UK: Indiana University Press and Open University Press.
Harding, Sandra. 1993. ‘Rethinking Standpoint Epistemologies: What is Strong Objectivity.’ In Feminist Epistemologies, edited by L. Alcoff and E. Potter. London: Routledge.
Harrison, Jane, Lesley MacGibbon, and Missy Morton. 2001. ‘Regimes of Trustworthiness in Qualitative Research: The Rigors of Reciprocity.’ Qualitative Inquiry 7: 323–345.
Hartsock, Nancy. 1983. ‘The Feminist Standpoint: Developing the Ground for a Specifically Feminist Historical Materialism.’ In Discovering Reality: Feminist Perspectives on Epistemology, Metaphysics, Methodology and Philosophy of Science, edited by S. Harding and M. Hintikka. Dordrecht: D. Reidel Publishing.
Hartsock, Nancy. 1985. Money, Sex and Power: Toward a Feminist Historical Materialism. Boston: Northeastern University Press.
Hesse-Biber, Sharlene Nagy, and Michelle L. Yaiser. 2004. Feminist Perspectives on Social Research. New York and London: Oxford University Press.
Hiller, Harry H. and Linda DiLuzio. 2004. ‘The Interviewee and the Research Interview: Analysing a Neglected Dimension in Research.’ The Canadian Review of Sociology and Anthropology 41: 1–26.
Holland, Janet and Caroline Ramazanoglu. 1994. ‘Coming to Conclusions: Power and Interpretation in Researching Young Women’s Sexuality.’ pp. 125–148 in Researching Women’s Lives from a Feminist Perspective, edited by M. Maynard and J. Purvis. London: Taylor and Francis.
Hollway, Wendy and Tony Jefferson. 2000. Doing Qualitative Research Differently: Free Association, Narrative and the Interview Method. London: Sage.
Holt, Thomas C. 1994. ‘Experience and the Politics of Intellectual Inquiry.’ In Questions of Evidence: Proof, Practice and Persuasion across the Disciplines, edited by J. Chandler, A. I. Davidson, and H. Harootunian. Chicago: University of Chicago Press.
Hooks, Bell. 1989. Talking Back: Thinking Feminist, Thinking Black. Boston: South End Press.
Hooks, Bell. 1990. Yearning: Race, Gender and Cultural Politics. Boston: South End Press.
Hyams, Melissa. 2004. ‘Hearing Girls’ Silences: Thoughts on the Politics and Practices of a Feminist Method of Group Discussion.’ Gender, Place and Culture 11: 105–119.
Jayaratne, T. E. 1983. ‘The Value of Quantitative Methodology for Feminist Research.’ In Theories of Women’s Studies, edited by G. Bowles and R. Duelli Klein. London: Routledge and Kegan Paul.
Keller, Evelyn Fox. 1983. A Feeling for the Organism: The Life and Work of Barbara McClintock. New York: W.H. Freeman.
Keller, Evelyn Fox. 1985. Reflections on Gender and Science. New Haven and London: Yale University Press.
Keller, Evelyn Fox and Helen E. Longino. 1998. ‘Feminism and Science.’ Oxford and New York: Oxford University Press.
Kelly, Liz, Sheila Burton, and Linda Regan. 1994. ‘Researching Women’s Lives or Studying Women’s Oppression? Reflections on what Constitutes Feminist Research.’ pp. 27–48 in Researching Women’s Lives from a Feminist Perspective, edited by M. Maynard and J. Purvis. London: Taylor and Francis.
Kirsch, Gesa E. 2005. ‘Friendship, Friendliness, and Feminist Fieldwork.’ Signs: Journal of Women in Culture and Society 30: 2163–2172.
Kitzinger, Jenny. 1994. ‘The Methodology of Focus Groups: The Importance of Interaction between Research Participants.’ Sociology of Health & Illness 16(1): 103–121.
Ladson-Billings, G. 2000. ‘Racialized Discourses and Ethnic Epistemologies.’ pp. 257–277 in Handbook of Qualitative Research, 2nd edition, edited by N. K. Denzin and Y. S. Lincoln. Thousand Oaks, CA: Sage.
Lather, Patti. 2001. ‘Postbook: Working the Ruins of Feminist Ethnography.’ Signs 27: 199–227.
Lather, Patti and Chris Smithies. 1997. Troubling the Angels: Women Living with HIV/AIDS. Boulder, CO: Westview.
Letherby, Gayle. 2003. Feminist Research in Theory and Practice. Buckingham: Open University Press.
Letherby, Gayle. 2004. ‘Quoting and Counting: An Autobiographical Response to Oakley.’ Sociology 38: 157–189.
Lewis, Gail. 2000. ‘Race,’ Gender and Social Welfare. London: Polity Press.
Lloyd, Genevieve. 1983. Man of Reason. London: Routledge.
Longino, Helen E. and Ruth Doell. 1983. ‘Body, Bias, and Behaviour: A Comparative Analysis of Reasoning in Two Areas of Biological Science.’ Signs: Journal of Women in Culture and Society 9: 206–227.
Lorde, Audre. 1984. Sister Outsider: Essays and Speeches. Berkeley, California: The Crossing Press.
Mama, Amina. 1995. Beyond the Mask: Race, Gender and Subjectivity. London: Routledge.
Mandell, Deena. 2002. Deadbeat Dads: Subjectivity and Social Construction. Toronto: University of Toronto Press.
Mason, Jennifer. 2002. ‘Qualitative Interviewing: Asking, Listening and Interpreting.’ pp. 225–241 in Qualitative Research in Action, edited by T. May. London: Sage Publications.
Mauthner, Natasha S. and Andrea Doucet. 1998. ‘Reflections on a Voice Centred Relational Method
of Data Analysis: Analysing Maternal and Domestic Voices.’ In Feminist Dilemmas in Qualitative Research: Private Lives and Public Texts, edited by J. Ribbens and R. Edwards. London: Sage Publications.
Mauthner, Natasha S. and Andrea Doucet. 2003. ‘Reflexive Accounts and Accounts of Reflexivity in Qualitative Data Analysis.’ Sociology 37: 413–431.
Maynard, Mary. 1994. ‘Methods, Practice and Epistemology: the Debate about Feminism and Research.’ pp. 10–26 in Researching Women’s Lives from a Feminist Perspective, edited by M. Maynard and J. Purvis. London: Taylor and Francis.
Maynard, Mary and June Purvis. 1994. Researching Women’s Lives from a Feminist Perspective. London: Taylor and Francis.
McCall, Leslie. 2005. ‘The Complexity of Intersectionality.’ Signs: Journal of Women in Culture and Society 30: 1771–1799.
McCormack, Coralie. 2004. ‘Storying Stories: A Narrative Approach to In-Depth Interview Conversations.’ International Journal of Social Research Methodology 7(3): 219–236.
McDermott, Elizabeth. 2004. ‘Telling Lesbian Stories: Interviewing and the Class Dynamics of ‘Talk’.’ Women’s Studies International Forum 27: 177–187.
McNay, Lois. 1993. Foucault and Feminism: Power, Gender and the Self. Boston, MA: Northeastern University Press.
Mies, M. 1983. ‘Towards a Methodology for Feminist Research.’ In Theories of Women’s Studies, edited by G. Bowles and R. Duelli Klein. London: Routledge and Kegan Paul.
Miller, Jean Baker. 1976. Towards a New Psychology of Women. London: Penguin Books.
Mirza, Heidi Safia. 1998. Black British Feminism: A Reader. London: Routledge.
Mohanty, Chandra Talpade. 1988. ‘Under Western Eyes: Feminist Scholarship and Colonial Discourses.’ Feminist Review 30: 61–88.
Mohanty, Chandra Talpade. 1991. ‘Under Western Eyes: Feminism and Colonial Discourse.’ In Third World Women and the Politics of Feminism, edited by C. T. Mohanty, A. Russo, and L. Torres. Bloomington: Indiana University Press.
Mohanty, Chandra Talpade, Ann Russo, and Lourdes Torres. 1991. ‘Third World Women and the Politics of Feminism.’ Bloomington: Indiana University Press.
Mol, Annemarie. 2002. The Body Multiple: Ontology in Medical Practice. Durham, NC: Duke University Press.
Munday, Jennie. 2006. ‘Identity in Focus: The Use of Focus Groups to Study the Construction of Collective Identity.’ Sociology 40: 89–105.
Naples, Nancy A. 1996. ‘A Feminist Revisiting of the ‘Insider/Outsider’ Debate: The ‘Outsider Phenomenon’ in Rural Iowa.’ Qualitative Sociology 19: 83–106.
Naples, Nancy A. 2003. Feminism and Method: Ethnography, Discourse Analysis, and Activist Research. New York and London: Routledge.
Narayan, Kirin. 1993. ‘How Native is a ‘Native’ Anthropologist?’ American Anthropologist 95: 671–686.
Oakley, Ann. 1974. Housewife. London: Allen Lane.
Oakley, Ann. 1981. ‘Interviewing Women: A Contradiction in Terms.’ pp. 30–61 in Doing Feminist Research, edited by H. Roberts. London: Routledge and Kegan Paul.
Oakley, Ann. 1998. ‘Gender, Methodology and People’s Ways of Knowing: Some Problems with Feminism and the Paradigm Debate in Social Science.’ Sociology 32: 707–731.
Olesen, Virginia. 1998. ‘Feminism and Models of Qualitative Research.’ In The Landscape of Qualitative Research: Theories and Issues, edited by N. K. Denzin and Y. S. Lincoln. Thousand Oaks, California: Sage.
Olesen, Virginia. 2005. ‘Early Millennial Feminist Qualitative Research.’ pp. 235–278 in The Sage Handbook of Qualitative Research, edited by N. K. Denzin and Y. S. Lincoln. Thousand Oaks, CA: Sage.
Oyewumi, Oyeronke. 2000. ‘Family Bonds/Conceptual Binds: African Notes on Feminist Epistemologies.’ Signs: Journal of Women in Culture and Society 25: 1093–1098.
Patai, Daphne. 1991. ‘U.S. Academics and Third World Women: Is Ethical Research Possible?’ pp. 137–153 in Women’s Words: The Feminist Practice of Oral History, edited by S. B. Gluck and D. Patai. New York: Routledge.
Pini, Barbara. 2002. ‘Focus Groups, Feminist Research and Farm Women: Opportunities for Empowerment in Rural Social Research.’ Journal of Rural Studies 18: 339–351.
Pollack, Shoshana. 2003. ‘Focus-Group Methodology in Research with Incarcerated Women: Race, Power, and Collective Experience.’ Affilia 18: 461–472.
Presser, Lois. 2004. ‘Violent Offenders, Moral Selves: Constructing Identities and Accounts in the Research Interview.’ Social Problems 51: 82–101.
Presser, Lois. 2005. ‘Negotiating Power and Narrative in Research: Implications for Feminist Methodology.’ Signs: Journal of Women in Culture and Society 30: 2067–2090.
Pryke, Sam. 2004. ‘“Some of Our People Can Be the Most Difficult.” Reflections on Difficult Interviews.’ Sociological Research Online 9.
Ramazanoglu, Caroline. 1989. ‘Improving on Sociology: The Problems of Taking a Feminist Standpoint.’ Sociology 23: 427–442.
Ramazanoglu, Caroline and Janet Holland. 2002. Feminist Methodology: Challenges and Choices. London: Sage Publications.
Reinharz, Shulamit. 1979. On Becoming a Social Scientist. San Francisco: Jossey-Bass.
Reinharz, Shulamit. 1992. Feminist Methods in Social Research. Oxford: Oxford University Press.
Reinharz, Shulamit. 1997. ‘Who Am I? The Need for a Variety of Selves in the Field.’ pp. 3–20 in Reflexivity and Voice, edited by R. Hertz. Thousand Oaks, CA: Sage.
Reynolds, Tracey. 2002a. ‘Re-thinking a Black Feminist Standpoint.’ Ethnic and Racial Studies 25: 591–606.
Reynolds, Tracey. 2002b. ‘On Relations Between Black Female Researchers and Participants.’ pp. 300–310 in Qualitative Research in Action, edited by T. May. London: Sage Publications.
Ribbens, Jane. 1989. ‘Interviewing – an Unnatural Situation?’ Women’s Studies International Forum 12: 579–592.
Ribbens, Jane. 1994. Mothers and their Children. London: Sage.
Ribbens, Jane. 1998. ‘Hearing my Feeling Voice? An Autobiographical Discussion of Motherhood.’ pp. 24–38 in Feminist Dilemmas in Qualitative Research: Private Lives and Public Texts, edited by J. Ribbens and R. Edwards. London: Sage.
Ribbens, Jane and Rosalind Edwards. 1998. Feminist Dilemmas in Qualitative Research: Private Lives and Public Texts. London: Sage.
Richardson, Laurel. 1988. ‘The Collective Story: Postmodernism and the Writing of Sociology.’ Sociological Focus 21: 199–208.
Richardson, Laurel. 1997. Fields of Play: Constructing an Academic Life. New Brunswick, NJ: Rutgers University Press.
Riessman, Catherine. 1987. ‘When Gender is not Enough: Women Interviewing Women.’ Gender and Society 1: 172–207.
Rose, Hilary. 1994. Love, Power and Knowledge: Towards a Feminist Transformation of the Sciences. Cambridge: Polity Press.
Sawicki, Jana. 1991. Disciplining Foucault: Feminism, Power and the Body. New York: Routledge.
Schutte, Ofelia. 1993. Cultural Identity and Social Liberation in Latin American Thought. Albany: SUNY Press.
Schutte, Ofelia. 1998. ‘Cultural Alterity: Cross-Cultural Communication and Feminist Thought in North-South Dialogue.’ Hypatia 13: 53–72.
Schutte, Ofelia. 2000. ‘Negotiating Latina Identities.’ In Hispanics/Latinos in the United States: Ethnicity, Race and Rights, edited by J. E. Gracia and P. De Grief. London: Routledge.
Scott, Joan W. 1992. ‘Experience.’ pp. 22–40 in Feminists Theorize the Political, edited by J. Butler and J. W. Scott. London: Routledge.
Scott, Joan W. 1994. ‘A Rejoinder to Thomas C. Holt.’ In Questions of Evidence: Proof, Practice and Persuasion across the Disciplines, edited by J. Chandler, A. I. Davidson, and H. Harootunian. Chicago: University of Chicago Press.
Shohat, Ella. 2001. ‘Area Studies, Transnationalism and the Feminist Production of Knowledge.’ Signs: Journal of Women in Culture and Society 26: 1269–1272.
Skeggs, Beverley. 1997. Formations of Class and Gender. London: Sage.
Smith, Dorothy. 1974. ‘Women’s Perspective as a Radical Critique of Sociology.’ Sociological Inquiry 4: 1–13.
Smith, Dorothy. 1987. The Everyday World as Problematic: A Feminist Sociology. Milton Keynes, UK: Open University Press.
Smith, Dorothy. 1989. ‘Sociological Theory: Methods of Writing Patriarchy.’ In Feminism and Sociological Theory, edited by R. A. Wallace. London: Sage.
Smith, Dorothy. 1999. Writing the Social: Critique, Theory and Investigations. Toronto: University of Toronto Press.
Song, Miriam and Ian Parker. 1995. ‘Commonality, Difference, and the Dynamics of Disclosure in In-depth Interviewing.’ Sociology 29: 241–256.
Spivak, Gayatri Chakravorty. 1993. Outside in the Teaching Machine. New York: Routledge.
Stacey, Judith. 1991. ‘Can There be a Feminist Ethnography?’ pp. 111–120 in Women’s Words: The Feminist Practice of Oral History, edited by S. B. Gluck and D. Patai. New York: Routledge.
Stanley, Liz. 1994. ‘The Knowing Because Experiencing Subject: Narratives, Lives, and Autobiography.’ pp. 132–149 in Knowing the Difference: Feminist Perspectives in Epistemology, edited by K. Lennon and M. Whitford. London: Routledge.
Stanley, Liz and Sue Wise. 1983. Breaking Out. London: Routledge and Kegan Paul.
Stanley, Liz and Sue Wise. 1990. Feminist Praxis: Research, Theory and Epistemology in Qualitative Research. London: Routledge.
Stanley, Liz and Sue Wise. 1993. Breaking Out Again. London: Routledge and Kegan Paul.
Stone, Sandy. 1991. ‘The Empire Strikes Back: A Post-Transsexual Manifesto.’ In Body Guards, edited by J. Epstein and K. Straub. New York: Routledge.
Tang, Ning. 2002. ‘Interviewer and Interviewee Relationships Between Women.’ Sociology 36: 703–721.
Taylor, Janette Y. 2002. ‘Talking Back: Research as an Act of Resistance and Healing for African American Women Survivors of Intimate Male Partner Violence.’ Women and Therapy 25: 145–160.
Taylor, Verta and Leila J. Rupp. 2005. ‘When the Girls are Men: Negotiating Gender and Sexual Dynamics in a Study of Drag Queens.’ Signs: Journal of Women in Culture and Society 30: 2115–2140.
Temple, Bogusia and Rosalind Edwards. 2002. ‘Interpreters/Translators and Cross-Language Research: Reflexivity and Border Crossings.’ International Journal of Qualitative Methods 1(2), Article 1. http://www.ualberta.ca/~ijqm/ Date of access: December 12, 2006.
Thapar-Bjorkert, Suruchi and Marsha Henry. 2004. ‘Reassessing the Research Relationship: Location, Position and Power in Fieldwork Accounts.’ International Journal of Social Research Methodology 7: 363–381.
Wahab, Stephanie. 2003. ‘Creating Knowledge Collaboratively with Female Sex Workers: Insights from a Qualitative, Feminist, and Participatory Study.’ Qualitative Inquiry 9: 625–642.
Warr, Deborah J. 2005. ‘“It Was Fun … But We Don’t Usually Talk About These Things”: Analyzing Sociable Interaction in Focus Groups.’ Qualitative Inquiry 11: 200–225.
Weedon, Chris. 1987. Feminist Practice and Poststructuralist Theory. Oxford: Blackwell Publishers.
Weeks, Kathi. 1998. Constituting Feminist Subjects. Ithaca and London: Cornell University Press.
Westmarland, Nicole. 2001. ‘The Quantitative/Qualitative Debate and Feminist Research: A Subjective View of Objectivity.’ Forum Qualitative Sozialforschung/Forum: Qualitative Social Research 2.
Wilkinson, Sue. 1999. ‘Focus Groups in Feminist Research: Power, Interaction and the Co-Construction of Meaning.’ Psychology of Women Quarterly 23: 221–244.
Wilkinson, Sue and Celia Kitzinger. 1996. Representing the Other: A Feminism and Psychology Reader. London: Sage.
Wolf, Margery. 1992. A Thrice Told Tale: Feminism, Postmodernism and Ethnographic Responsibility. Stanford: Stanford University Press.
Zavella, Patricia. 1993. ‘Feminist Insider Dilemmas: Constructing Ethnic Identity with ‘Chicana’ Informants.’ pp. 42–62 in Situated Lives: Gender and Culture in Everyday Life, edited by L. Lamphere, H. Ragone, and P. Zavella. London: Routledge.
20
Biographical Methods
Joanna Bornat
Had I been writing this chapter only a few years ago I would have had a much easier task. But now, in the first decade of the twenty-first century, containing developments in biographical methods in under eight thousand words, borders on the impossible. What was an area of work scarcely acknowledged beyond groups of committed oral historians, occasional sociologists, auto/biographers and ethnographers has become a vast and constantly changing and expanding ferment of creative work, drawing in new as well as career-old researchers. In critical pedagogy, cultural studies, critical race theory, gerontology, decolonising research, social policy, health studies, feminisms, identity theory, studies of sexuality, employment, family and management theory, the range of areas in which biographical methods have been taken up is vast. All reach for meaning and accounts in individual biographies to both confirm and complicate understandings of the working and emergence of social processes and relationships in place and through time. And this is only within academe. Telling your story, the public confessional, the personal account has become a totally pervasive form, as any quick check through the media will show. Simply putting a term such as ‘life story’ into Google brings hundreds and thousands of hits. This is all good news, if difficult to assimilate.

Biographical methods thrive on invention and have changed and adapted to methodological, theoretical and technological change. The arrival of the small portable audio recording machine has undoubtedly played a leading role. Indeed it would be impossible to imagine much of what is now recognised as biographical work without it. Gone are the days when using a machine to record interviews was seen as a form of journalism, to be eschewed by sociologists and anthropologists in the field¹. Now we have the capability to capture not only sounds but visual expression and to send the information round the world, or next door in a matter of seconds.

In this chapter, I focus on ways in which individual life experience is generated, analysed and drawn on to explain the social world. However generated, the common denominator is that accounts are solicited and told in the first person. I focus on three very different approaches, briefly outlining each in turn and finally look at some ways to distinguish each in a final, and unashamedly partisan argument for the contribution of oral history. There are,
the biographical interpretive methods, oral history and narrative analysis.

BIOGRAPHICAL METHODS

‘Biographical methods’ is an umbrella term for an assembly of loosely related, variously titled activities: narrative, life history, oral history, autobiography, biographical interpretive methods, storytelling, auto/biography, ethnography, reminiscence. These activities tend to operate in parallel, often not recognising each other’s existence, some characterised by disciplinary purity with others demonstrating deliberate interdisciplinarity. To explain and present such disparity feels like a demanding intellectual undertaking. History, psychology, sociology, social policy, anthropology, even literature and neurobiology at times, all have a part to play. By their very nature, biographical methods encourage a universalistic and encompassing approach, encouraging understanding and interpretation of experience across national, cultural and traditional boundaries, better to understand individual action and engagement in society. See for example, Prue Chamberlayne and Annette King’s comparative study of family caring in East and West Germany and Britain drawing on biographical interview data (Chamberlayne & King, 2000), James Hammerton and Alistair Thomson’s life history interviews with UK migrants to Australia in the 1950s and 1960s (Hammerton & Thomson, 2005), and African-American women’s accounts of their professional lives in Gwendolyn Etter-Lewis’s study (1993).

The personal and individual nature of biographical data adds an additional layer of complexity. Biographical researchers work with a range of different types of data including diaries, notebooks, interactive websites, videos, weblogs and written personal narratives with methods of collection varying from the directly interventionist in, for example, oral history interviewing, to a more detached encouragement and stimulation to write and record as in the collection of accounts through an archive like Mass Observation or on-line interactive websites.

How best then to give shape and meaning to this task? How to organise and communicate a framework which is an aid to understanding and which provides a manageable and yet inclusive approach to presenting biographical methods? In sorting through the various activities I looked for themes which would bring out the strengths of biographical approaches while highlighting what are for me the most innovative and creative aspects of the contribution they make to social research methods. On that basis the themes I will be working with are: interactivity, subjectivity and structuring. I’ll explain briefly what I mean by each of these themes.

By interactivity I mean the generation of data through some kind of direct social interaction. This is likely to be an interview or at least a situation which involves, or has involved, face-to-face verbal exchange. This leads to the inclusion of biographical interpretive methods, oral history, reminiscence, storytelling, life history and narrative, but not autobiography, auto/biography or ethnography. By choosing subjectivity I am highlighting the extent to which the method leads to the expression of the self, a focus on feelings and emotions providing insight into individual perceptions and understandings of situations and experiences. All the activities I have identified could be included under this theme, though some, for example oral history, have at different times, and in varying settings, shown less attention to the self, while others, for example auto/biography, see the positioning of the self, as generator or reader of the text, as a main focus of attention (Stanley, 1994).

With structuring I intend to convey the idea that biographical methods aim to generate accounts or data which, either by means of direct questioning, or through the nature of individuals’ own responses, have an obvious or implicit structure. Again, this feels all-inclusive as what account, either told or expressed, does not have some kind of narrative, a beginning or an ending? Or what story is not connected in some way
to the bigger picture, be it childbirth, war, schooling or sexuality? This may indeed be the case; however, by structuring, I mean the idea that the methods used rely on some kind of prior theorising or framework of ideas on the part of the researcher. This is not to rule out informal structuring or the kind of everyday theorising people develop in order to explain their lives but for my purposes here to emphasise the contribution which the theorising and methods of particular disciplines, such as psychology, sociology or history, make to the generation of the data. So, I would exclude storytelling and autobiography from this particular category.

Finally, context; by this I mean the ways in which an individual account, or set of accounts, is given meaning by its own framework of time and space and by those of the researcher and interpreter of the data. Context is not only to be seen in terms of setting or the historical time or social and political structures surrounding a particular account; it also includes the agency and agendas of researcher and researched, their biographical time. Autobiography and storytelling fit less well once again. Where the main source is the single-authored account generated independently for an audience, rather than with another, context has fewer dimensions for exploration.

The burgeoning of interest in the perspective of the individual, in what has been described as a more ‘humanistic’ approach in sociological research, has resulted in review articles and books which in their different ways have helpfully sketched out origins and developments in work with biography (Plummer, 2001; Thompson, 2000; Roberts, 2002; Seale et al., 2004; Thomson, 2007).

This is an exciting area in which to work. Biographical work engages with many of the most telling and enduring epistemological and methodological issues in the human sciences, taking in debates on validity, memory, subjectivity, standpoint, ethics, voice and representivity amongst others (Chamberlayne et al., 2000, p. 3).

The three methods I have chosen to concentrate on have shared antecedents in most respects, but with some individual differences which show the distinctiveness of each. In what follows I draw on several of the works cited above where these lineages and identities are drawn out. A familiar starting point is the group of sociologists known as the ‘Chicago School’ and their work in the first 40 years of the twentieth century. The focus on the collection of direct testimony and on observation under realistic conditions led to methodological innovation in a number of areas. Urban society came under scrutiny, with studies of poverty, street gangs, and high life. Alongside this strongly engaged and situated commitment came a new development in social psychology. Herbert Mead’s idea of ‘the self’ (1934) stressed the significance of language, culture and non-verbal communication, with its focus on social interaction and reflection in the development of the individual’s sense of who they are. His notion of the self as having its own meaning and sense of reality, identifiable and recognisable in relation to social or historical context, provided a challenge to arguments which gave primacy to the investigator’s or commentator’s perspective. Students, teachers and researchers associated with the Chicago School were to generate some of the most influential developments in sociology; amongst these were symbolic interactionism (Plummer, 1991) and grounded theory (Glaser & Strauss, 1968).

It is with this background in mind that I now go on to take a closer look at the first of the three methods I identified under the biographical ‘umbrella’: the biographical interpretive method.

Biographical interpretive method

Fritz Schütze, a sociologist writing in Germany in the 1980s, is usually credited with the originating work which led to the development of the biographical interpretive method. He was greatly influenced by ‘third generation Chicagoans’ such as Anselm Strauss, Howard Becker, Erving Goffman and others (Apitzsch & Inowlocki, 2000, p. 58). The interview method and
its subsequent analysis which he developed and which has been further refined by Gabriele Rosenthal (2004), who followed his theoretical and methodological lead, requires the separating out of the chronological story from the experiences and meanings which interviewees provide. The process depends on an understanding of the biographical interview as a process in which movement between past, present and future is constant and in which the interviewee may not be fully aware of contexts and influences in their life.

Rosenthal and her erstwhile collaborator, Wolfgang Fischer, developed this approach into what is now usually known as ‘biographical interpretive analysis’ or ‘biographic narrative interpretive analysis’ (Wengraf, 2001). She had been interested in explaining work and life ethics in post World War II West German society, being convinced that the sense which people made of their lives under the Third Reich played a central role (Rosenthal, 2004, p. 49). Since Rosenthal and Fischer’s early development, the method has been given much more elaborated treatment, using individual case study analysis, based on interview transcripts, by Prue Chamberlayne and colleagues. Their particular interest has been to theorise and explain the impact of social welfare policies through embracing the subjectivity and agency of welfare recipients, linking private and public spheres, as these are experienced, expressed and represented through individual accounts (Chamberlayne & King, 2000; Chamberlayne et al., 2000, 2004).

The systematisation inherent in this approach requires the elaborate codification of the interview in such a way as to identify themes, having separated out the ‘lived life’ from the ‘told story’ in the transcribed interview (Wengraf, 2001, p. 231). This distinction separates the chronological sequence of the events of a life from the way that the story is told. By identifying how someone relates to their story, in the telling, labelling text segments as to whether they are descriptive, argumentative, reporting, narrative or evaluative, biographical interpretive analysis addresses the qualitative data with hypotheses which draw on significant segments of text. Wengraf (2001) details the procedure for interpreting biographical data, showing with a detailed account how hypotheses are arrived at and then worked through, as the life story is explored. Life events, as told by the interviewee, are looked at and hypotheses and counter hypotheses drawn up and explored, preferably by groups of people working together, as to likely effects on someone’s later life.

This phenomenological approach to understanding biographical data focuses on the individual’s perspective within an observable and knowable historical and structural context, and what it is like to be the person describing their lives and the various decisions, turns and patterns of that life (Wengraf, 2001, pp. 305–6).

At one level what Wengraf is describing is a complex process of interpretation, a shared and carefully documented practice of searching for themes in data typical of a grounded theory approach (Wengraf, 2001, p. 280). However, at quite another level the analysis expects a deep level of explanation and interpretation, one which looks for hidden and explicit meanings in the transcript. Just how this differs from the other two approaches I’ve identified is a question I will come back to later in this chapter.

Oral history’s distinctive characteristic is its use of sociological approaches to data generation and analysis in what is an historical pursuit. Even though the development of the interview as a tool of investigation has a much longer history, the significance of the Chicago School, as Paul Thompson points out in his seminal text, The Voice of the Past, was its effect on the idea of the life history (2000). The interview became more than simply extraction of information around specific topics; it became an object in itself with shape and totality given by the individual’s told life events.

In an early essay, the Italian oral historian Alessandro Portelli argues ‘What makes oral history different’. Having identified oral history’s particular qualities as ‘the orality
of oral sources’, arguing for attention to the sounds and turns of speech as opposed to the written transcript, and as ‘narrative’, pointing out variations in narrative forms and styles, he goes on to argue oral history’s unique qualities. These are, he suggests, ‘that it tells us less about events than about their meaning’ (his emphases) and that ‘the unique and precious element which oral sources possess in equal measure is the speaker’s subjectivity’ (1981, p. 67). From this, he argues that ‘oral sources’ have a ‘different credibility’ (p. 100, his emphasis) and that ‘today’s narrator is not the same person as took part in the distant events he or she is relating’ (p. 102). It follows, therefore, that ‘Oral sources are not objective’; they are ‘artificial, variable and partial’ (p. 103, his emphases).

Portelli’s position has been taken up subsequently in studies of ethnicity, class, gender, colonialism, tradition, displacement, resistance and exclusion by oral historians who see the method as particularly suited to understandings of oppression and marginalisation. With this unashamedly political and partisan approach to history, a contribution to the histories of elites was always going to be less likely, though there have been some exceptions, for example Courtney & Thompson’s study of business elites in the city of London (1997) and Seldon and Pappworth’s case studies of elites in their handbook of elite oral history (1983).

Oral history in its early and subsequent development drew on sociology for methods of structuring data collection. Writing and researching in the context of the sociology department at the University of Essex in the mid 1960s (Thompson & Bornat, 1994), Thompson was familiar with the development of grounded theory as a solution to sampling from a population of survivors (2000, p. 151). While some studies have rested on only a handful of interviewees, for example Alessandro Portelli’s investigation into local memory of a massacre of civilians by German troops occupying Tuscany in 1944 (Portelli, 1997), or Al Thomson’s use of four life histories in his exploration of the legend of Anzac solidarity amongst Australian World War I veterans (Thomson, 1994), oral historians more typically seek ways of representivity through theoretical sampling, with contacts made opportunistically or through snowballing (see for example Thompson, 1975; Bertaux, 1981; Lummis, 1987; Bornat, 2002; Hammerton & Thomson, 2005; Merridale, 2005). As for data analysis, a range of approaches, some more familiar to historians and some to sociologists, are typically followed by oral historians, who tend to take a more eclectic approach methodologically than researchers using the biographical interpretive method. In the main these would be recognisable as thematic in approach, drawing directly or indirectly on the type of constant comparative analysis and theme searching typical in grounded theory (Glaser & Strauss, 1968).

Given oral history’s early commitment to a form of history-making which seeks to give expression to marginalised voices with emphasis on the importance of language, emotions and oral qualities generally, data analysis presents something of a moral challenge, as Thompson and others have pointed out (Borland, 1991; Portelli, 1997, pp. 64 ff.; Thompson, 2000, pp. 269 ff.; Bornat & Diamond, 2007). The tension lies in a commitment to the presentation of the actual words of interviewees while seeking a way to generalize from a number of stories without creating too much distance between the original recording or text and the resulting publication, be it hard copy, electronic or sound and vision presentation.

Narrative analysis

The third area of biographical activity I have identified, narrative analysis, also traces its origins back to the Chicago School. The move towards the subject as author and source of evidence, through the telling of their story, became its defining feature in the 1920s. However, where those early sociologists of the city were intent on capturing reality from accounts, narrative theorists see the story as a greater sum
of parts than the particularities of events, atmospheres, environments and relationships described. Catherine Kohler Riessman, a leading narratologist, explains how narratives, interpreted through use of language, symbolic representations and cultural forms, provide access to understanding the workings across and within time of gender, class, culture, ethnicity, place and age, to name but a few social divisions and differences (1993, p. 5). This plurality does, however, mean that, as she also points out: ‘There is considerable disagreement about the precise definition of narrative’ (1993, p. 17).

A focus on story or narrative sees telling, relating and recounting as a central and universal human activity. Lives, it is argued, are constructed, and presented to listeners in storied forms. As Widdershoven argues: ‘… a story is never a pure ideal, detached from real life. Life and story are not two separate phenomena. They are part of the same fabric, in that life informs and is formed by stories’ (Widdershoven, 2003, p. 109). For Polkinghorne, narrative has special significance for the human sciences. He argues that it is, ‘… the linguistic form uniquely suited for displaying human existence as situated action’. This very generality presents problems of definition, he goes on to admit (1995, pp. 5–7).

Riessman’s solution to the problem of definition is to account for narratives in terms of genre. Narratives are to be recognised to the extent to which they relate to a ‘narrative genre’ with its own ‘persistence of certain elements’. She argues that the conventional idea of a story having characters acting in various ways and moving towards some kind of conclusion is not a sufficiently broad definition. Her narrative genre includes accounts where the same event is described repeatedly – ‘habitual narratives’ – or which are ‘topic-centred’, where particular kinds of events are linked through a common theme or shared characteristic. She also includes ‘hypothetical narratives’ of events which never happened. What is distinctive, she seems to be arguing, is that there is a ‘teller’, an account of ‘a situation’ and an audience: ‘us’ (Riessman, 1993, pp. 18–19).

When it comes to analysing narrative data, Riessman and others point out (Andrews et al., 2004) ‘… there is no one (her emphasis) method’ (1993, p. 5). Indeed the pervasiveness of narrative studies with use in, for example, medicine (Greenhalgh & Hurwitz, 1998), anthropology (Skultans, 1998), psychology (Sarbin, 1986; Crossley, 2000), media studies (Ryan, 2004), feminist studies (Personal Narratives Group, 1989), linguistics (Bamberg, 1997), organisation studies (Denning, 2005), history (Roberts, 2001), and literature (Hawthorn, 1985) suggests a plethora of possible analytical procedures.

As a way to manage this diversity, to pull it within range of some reliable analytical framework which others can respond to and which for her preserves and acknowledges the performative and interactive nature of the interview, Riessman advocates use of poetic and literary forms as analytical tools. These, she argues, enable her to identify how a narrative is put together and to see what are its particularities in terms of characteristics of speech and discourse (1993, pp. 50–51). Seeking to keep ‘the teller’ in the centre of her analysis is ‘starting from the inside’, looking for meanings shown in the way the words are presented, not ignoring issues of power which may determine what is said and how (Riessman, 1993, p. 61). The perspective of the interpreter, their particular theoretical stance and even their personal history, is bound to play a part. As for the oral historians, this presents a dilemma for her but one which she feels can be resolved through a process of open reflection and questioning, as she puts it: ‘the comfort of a long tradition of interpretive and hermeneutic enquiry’ (1993, p. 61).

In these very brief sketches, I’ve identified what I see as the distinctive features of the biographical interpretive method, oral history and narrative method, focusing mainly on their antecedents and rather different approaches to the interpretation of personal accounts. To begin with, I used four themes
UK cyclist when I interviewed her and four other women for an edited collection of writing on older women (Bornat, 1993). I invited her to tell me her life story, as a cyclist and businesswoman (unusually for the cycling world she ran her own shop). She began with an unbroken account of her early years as a cyclist, replete with technical terms related to cycle racing and bike parts. I was keen to guide her towards talking more about the social world of cycling and took this opportunity with a question about her first husband:

So was your first husband a cyclist as well?
Yes, he was a cyclist, yes. But he used to go out with another club. We didn’t go out with our club, because there wasn’t any women in that club. I used to go out with the Actonia CC … But I also belonged to the Clarion, which was a union all over the country, the Clarion were. Supposed to be Labour club, but I mean, I didn’t go to it because it was a Labour club. Because they used to threaten to throw me out all the time, because I used to – didn’t agree with what they said. You know, you’re supposed to be Labour, you know, and half of them were communists. They used to go preaching down on the Dorking, on the hills and things like that. And I thought, I mean, wasting my time down there, you know, with that lot! So I used to go out on my own then.
Were they strict then, about that?
They were very strict about whether you were Labour or not, yes. Because if the heads there found you talking about you were – I mean, I wasn’t anything really, but I used to annoy them, you know, when I said, I’m not Labour, I don’t want to be Labour and all this. And they used to get ever so annoyed. And they said, well, we’re going to get you chucked out, you know. I says, I don’t care, you know. But, er, they never did.
I suppose cycling was, it was quite a kind of what you might call a more working-class sort of leisure thing.
It was mostly, oh yes, mostly poor people. I mean, there was never a car on the road when you raced. Only the time-keeper was the only car. I mean if you looked for the car, that was the start of your race …
And they’d all be people who would be, what working all week, like you, and spending all their weekends –
Oh yes, there was, oh, it took years and years for wealthy people to start cycling. Their sons might cycle, and they used to come out in their big cars, you know, and watch their son racing. But that kind of thing didn’t happen for years and years.
Did you feel that it was a sort of – was that a part of the feel of it, do you think, that you were with people who were, you know, you were like a kind of group who were rather the same, or – ?
Well, there wasn’t very many wealthy people around in those days. If there were they were nothing to do with us. You know, they’d be in a different society. There was sort of two societies, wealthy people and poor people. Or moderately poor. But there was never all running into one like they do now these days.
Did it feel like that did it? That you were very separate somehow?
Well yes. Because they never did the things we did. You’d hear about them going to these dinners and things up the town, but it never, you didn’t even know them, half of them. It was a different world. I mean, if we went to a dinner, it was only the one year dinner, our club dinner, that was the only dinner we ever went to. And I hadn’t got any clothes to go out in. I had nothing, only cycle clothes, that was all I had. I worked in them, I did the housework in them. The milkman would knock the door and I was in my shorts, you know …

As she answered my question about her husband I realised that she was beginning to talk about social and political divisions in the cycling world. This was something that interested me very much. Leaving behind, for the moment, the events of her life story, I began on a series of questions which I hoped would lead her into talk about the class politics of cycling between the two world wars in the UK. As is obvious from the transcript, I used various strategies. In the end she comes back to talk about herself as a cyclist, positioning herself as a cyclist first, then as a woman. It seems that for her, class and politics were an irrelevance, or in the case of the socialist Clarion movement, a means to an end: more cycling.

If I had used no prompts I might not have heard this particular account of her life, and the social world of cycling might well not have appeared at all. Biographical purists might argue that I was guilty of distorting Pat’s story. In fact I would argue the opposite, that I was encouraging her to develop it and to reframe it through my interrogative dialogue. She would have told her story differently on another occasion, to another listener or interviewer. Undoubtedly I was bringing my particular ‘cultural habitus’
(Hammersley, 1997) to that interview with all that this entailed. In oral history the idea that somehow it might be possible to render oneself invisible or non-interfering is regarded as mythical and certainly not desirable (Portelli, 1997, chapter 1; Thompson, 2000, p. 227; Bornat, 2004).

I make this point to contrast with both biographical interpretive and narrative approaches. As I have already shown, the preferred approach in the biographical interpretive method is for a contained non-interventionist initial interview to be followed by questioning led by the interviewer. This separation of interviewer and interviewed, through the privileging of the interviewee’s account in the first interview and of the interviewer’s interests in the second, excludes the possibility of a responsive interaction with joint initiative taking on both sides. In a contrasting way, though narrative approaches vary in their attitude to the part played by questions, their focus on the structure of the account in order to draw out the individual’s perspective similarly gives little weight to the dialogic possibilities of the interview. Context is relevant, as Riessman emphasises, ‘The text is not autonomous of its context’ (1993, p. 21), and she rejects the model of a narrativist such as Labov who leaves out the interviewer-interviewee relationship in their analysis (cited in Riessman, 1993, p. 20). However, even in her hands, context, both historical and immediate, is presented more as a framework than as part of the data, and evidence of the interviewer’s presence is typically excised from the text being analysed.

Memory as a source for ‘pastness’

Elizabeth Tonkin, an anthropologist and oral historian, prefers the term ‘representations of pastness’ to ‘history’. She argues that though it is less elegant, it conveys more of a sense of movement between past and present as people speak and others listen (Tonkin, 1992, p. 2). The active role of memory in oral history making again distinguishes it from biographical interpretive and narrative research. However, while memory gives us access to experience before our own time, to experience which might otherwise be unreachable since it may not be recorded in documentary formats, it is not necessarily always accurate. For Portelli this is one of its very strengths. Confronted by old communists whose tales of the past were sometimes partial, even plainly false, he turns the tables in a celebration of oral history’s ability to reveal what really mattered to people, ‘… uncovering the contradiction between reality and desire’ (Portelli, 1991, p. 116).

Memory also plays a function in the present and is as much about future hopes and intentions as it is about telling stories, bearing witness or confessing to past involvements and actions. It draws on and engages with collective representations and can change according to audience, stimuli and time of life (Coleman et al., 1998; Rose, 2003; Draaisma, 2004). Indeed the reliance of oral history on older people’s memories means being aware of the psychological tasks facing older people towards the end of life (Bornat, 2001). ‘Pastness’ for older people therefore needs to be seen as a multidimensional remembering, but none the less valuable for that. I’ll take this point further with an excerpt from an interview carried out for Margot Jefferys’ research into the founders of geriatric medicine (Ogg et al., 1999; Jefferys, 2000). Dr Ronald Dent, one of Jefferys’ interviewees, was in his mid eighties at the time of his interview:

What do you think of the new developments in the National Health Service? Do you have any views about that?
Well, I’m a bit scared that a vulnerable group like the elderly sick might not benefit as much as they should. In fact I think they might be neglected a bit again. And that’s what frightens me. One wouldn’t like to feel that the work that all of us who had been in geriatric medicine, the work we’ve done to make it a good thing to do, might find, find that our work has been let down a little bit because hospitals are so quick, so busy doing routine ops – operations – which they get paid a lot for rather than looking after strokes and other problems of the elderly which take a lot longer and need more resources. One hopes it’s not like that.²
Some of Jefferys’ interviewees had worked since before the NHS and in its very early days. Medical care of older people had been much neglected and was a major challenge for the health service. At the end of their careers these doctors were looking back at success, medically, and in policy terms. They had established a specialty and could point to a much better standard of care for older people, in hospital and in the community, than they had witnessed in the ex Poor Law hospitals at the start of their careers. However, they were being interviewed at a time of change for the health service. Many expressed concern at the introduction after 1979 of a market model and business methods into health care. To add another contextual layer, these doctors were now themselves old. Contemplating the possible end to what they had achieved had specific personal resonance for their own healthcare. ‘Pastness’ is thus represented through multiple time frames, in this interview as in other oral history interviews: remembered time; the time of the interview; the ‘time’ of the interviewee and of the interviewer and our own time in looking back at these particular archived interviews (Bornat, 2005).

Memory as an individual and social practice and a process with known and observable features and effects is of central interest to oral historians in ways that it does not appear to be in biographical interpretive and narrative analysis. It enables a perspective which includes the effect of time and the influence of change and continuity while maintaining the agency of the individual as the central focus of interest.

Interpretive influence

The last of the three areas of difference I identify here is interpretive influence. By this I am drawing attention to the ways in which the three approaches I’ve been looking at position the interpreter of the data in relation to its originator, the interviewee. Oral history’s early commitment to a democratic purpose has led to some pointed debates about ownership and partnership (see for example Frisch, 1990). Some feminist oral historians have led the way in questioning assumptions as to any essential understanding or solidarity across the microphone, as I have argued elsewhere (Borland, 1991; Bornat & Diamond, 2007; see also Armitage & Gluck, 2002). The result for many oral historians is a practice which seeks to maintain the integrity of the original interview, and of the interviewee, by maintaining interpretive distance.

To identify the subjectivity of the interviewee, to put oneself in their place, to draw out understandings which are not necessarily articulated in the words of the transcript, are all recognisable and shared interpretive practices. To look and listen for silences, experiences or relationships which are unspoken or unexpressed, is acknowledged as appropriate and rewarding, but to go beyond this and to seek out unconscious motivations, or ways of thinking, is perhaps to be guilty of over-interpretation. The researcher, who may or may not be the original interviewer, has a duty to ask questions of the data, to theorise about it and about the people and experiences represented in it, and to become more deeply embedded in it, but this risks distancing the interviewee from their own words. I’ll use one final example to show where I feel that the line is drawn between oral history and biographical interpretive and narrative approaches.

I spent more than two hours with Pat Hanlon recording her life history. She gave me a detailed account of her progression as a cyclist to becoming one of the best wheel builders in the country, owning a shop and being married twice, once early in her life and then again much later, as she retired. What she didn’t tell me was that she had a son, from whom she was estranged. She didn’t tell me and I didn’t ask her. She only finally told me when I gave her the book chapter in which she appears to check for accuracy and representation. She then let me know that it might be better to mention her son as otherwise her friends might think it was a little strange.
To be silent about such a defining experience as motherhood, could be attributed to some deep personal flaw. I might turn to some psychological explanations for this apparent pathology on her part; I could look back through the transcript for clues as to her mind-set and evidence of suppression of maternal instincts, her predilection for wearing shorts perhaps, or an apparently obsessive interest in mileage. I could hypothesise as to her decision-making and her reflection on her life from the way she accounts for the events in her life. I could counterpose her lived life to her told life, drawing out inferences as to her motivations and tendencies as a mother and a woman. But, in the end I find this to be a process of distancing and indeed of subjecting Pat to an over-interpreted reconstruction of her life. She may have actively chosen not to mention her son because to mention him would be upsetting. She may have decided to focus exclusively on her life as a cyclist; indeed she made few references to other aspects of her personal life, and only when prompted by me. She may have retold the narrative of her life for herself so that her son was given no role. She might also have felt, as a public person, that her private life would be of little interest to me. Least possible, she may simply have forgotten to mention her son. Whatever the reason, I can’t know and though I could speculate and develop a theory relating to some developmental deficiency I can see no advantage in this. To carry out more interviews with older women cyclists might give me a better idea of Pat’s life in context. As it is, I have only her testimony to go on. Perhaps what I can draw out of this experience is a sense of inadequacy as an interviewer. For once my interrogative powers failed me.

But there is also another angle to interpretive influence and this is the question of ethics. How far is it ethical to subject another person’s life to interpretation if the process and outcome are likely to be unrecognisable to them? How acceptable is an interpretation in which there is no possibility of continuing dialogue and discussion, particularly where the data originated in an interview relationship? These are difficult questions to answer, complicated by new debates about the ethics of the secondary analysis of archived data (Bornat, 2005).

CONCLUSION

The three biographical methods I have discussed in this chapter each has a distinctive practice and, though they share origins in the Chicago School of Sociology, they have developed along rather different interdisciplinary lines. Where the biographical interpretive method lends itself to more psychoanalytic interpretations of motivation and meaning, narrative analysis leans more towards sociolinguistics, while oral history draws across both sociology and history. Each gives centrality to the individual account in attempting to explain the changing nature and persistence of social relations and social structures. While each makes use of the interview to generate data, only oral history continues to focus on the dynamics of the interview through the process of interpretation and discussion. I have admitted a partisan position in my relationship with oral history but that is not to ignore the contribution of the other two approaches. In looking for ways to pin down the process of interrogating the data they force us to pay attention to explaining our thinking and analytical procedures, highlighting the detail which a phenomenological approach demands. My only concern is that in doing so we risk an over-interpretation which rather than emphasising the qualities of the original teller, eclipses them and puts the interpreter in a position of authority and control.

NOTES

1 Fieldwork training for some trainee sociologists in the 1960s involved making notes after the interview or observation. Taping was definitely frowned on as a poor substitute for skills in observation and recall (Graham Fennell, personal communication).
2 Margot Jefferys Interview number 306, deposited at the British Library Sound Archive.
of Old Age: Moving into the 21st Century, London, Sarbin, T.R. (1986) ‘The Narrative as a Root Metaphor for
Centre for Policy on Ageing, pp. 112–127. Psychology’, in T. R. Sarbin (ed) Narrative Psychology:
Personal Narratives Group (1989) Interpreting Women’s The Storied Nature of Human Conduct, New York,
Lives: Feminist Theory and Personal Narratives, Praeger, pp. 3–21.
Bloomington: Indiana University Press. Seale, C., Gobo, G., Gubrium, J. & Silverman, D. (2004)
Plummer, K. (1991) Symbolic Interactionism, vols 1&2, Qualitative Research Practice, London, Sage.
Aldershot, Edward Elgar. Seldon, A. & Pappworth, J. (1983) By Word of Mouth:
Plummer, K. (2001) Documents of Life 2, London, Sage. Elite Oral History, London, Methuen.
Polkinghorne, D. E. (1995) ‘Narrative configuration in Skultans, V. (1998) ‘Anthropology and narrative’, in
qualitative analysis’, in Hatch, J. A. & Wisniewski, R. Greenhalgh, T. & Hurwitz, B. eds, Narrative Based
eds, Life History and Narrative, London, Falmer. Medicine: Dialogue and Discourse in Clinical Practice,
Portelli, A. (1981) ‘What makes oral history different’, London, BMJ Books.
History Workshop, 12, 96–107. Stanley, L. (1994) ‘Sisters under the skin? Oral histories
Portelli, A. (1991) The Death of Luigi Trastulli and Other and auto/biographies’, Oral History, 22(2), 88–89.
Stories: Form and Meaning in Oral History, New York, Thompson, P. (1975) The Edwardians, London,
State University of New York Press. Weidenfeld & Nicholson.
Portelli, A. (1997) ‘The massacre at Civitella val di Chiani Thompson, P. (2000) The Voice of the Past, Oxford,
(Tuscany, June 29, 1944): myth and politics, mourning Oxford University Press, 3rd edition.
and common sense’, in Portelli, A. ed., The Battle Thompson, P. & Bornat, J. (1994) ‘Myths and memories
of Valle Giulia: Oral History and the Art of Dialogue, of an English rising 1968 at Essex’, Oral History 22(2),
Madison, University of Wisconsin Press, pp. 140–160. 44–54.
Riessman, C. K. (1993) Narrative Analysis, Newbury Thomson, A. (1994) Anzac Memories: Living with the
Park, Sage. Legend, Melbourne, Oxford University Press.
Roberts, B. (2002) Biographical Research, Buckingham, Thomson, A. (2007) ‘Four paradigm transformations in
Open University Press. oral history’, Oral History Review.
Roberts, G. (2001) The History and Narrative Reader, Tonkin, E. (1992) Narrating our Pasts: The Social
London, Routledge. Construction of Oral History, Cambridge, Cambridge
Rose, S. (2003) The Making of Memory: from Molecules University Press.
to Mind, London, Vintage, 2nd edition. Wengraf, T. (2001) Qualitative Research Interviewing,
Rosenthal, G. (2004) ‘Biographical research’, in London, Sage.
Seale, C., Gobo, G. & Gubrium, J. eds, Qualitative Widdershoven, G. A. M. (2003) ‘The story of
Research Practice, London, Sage, pp. 48–64. life: Hermeneutic perspectives on the relationship
Ryan, M.-L. ed. (2004) Narrative across Media the between narrative and life history’, in R. Miller ed.,
Languages of Storytelling, Lincoln, USA, University Biographical Research Methods, Vol. IV, London,
of Nebraska Press. Sage, pp. 108–123.
21
Focus Groups
Janet Smithson
(Kitzinger 1994, Agar and MacDonald 1995, Myers 1998, Wilkinson 1998), and there have been some recent considerations of interactive patterns within focus groups (e.g. Myers 1998, Kitzinger and Frith 1999, Puchta and Potter 1999). Wilkinson (1998) concludes that ‘there would seem to be considerable potential for developing new – and better – methods of analysing focus group data’ (1998: 197). The regularly occurring lack of theoretical and analytical discussions in the focus group literature, even in academic contexts, is perhaps partially explained by the roots of focus group usage as a market research tool. The perception that focus groups are a quick and useful way of gathering ‘opinions’ still informs mainstream debate on focus groups and focus group manuals, and affects how they are used – for example, they are often viewed as (only) suitable for the initial stages of a research project.
WHAT IS A ‘FOCUS GROUP’?
A focus group is generally understood to be a group of 6–12 participants, with an interviewer, or moderator, asking questions about a particular topic. Some researchers, such as Hughes and DuMont (1993: 776), characterise focus groups as group interviews: ‘Focus groups are in-depth group interviews employing relatively homogenous groups to provide information around topics specified by the researchers’. Others define them as group discussions: ‘a carefully planned discussion designed to obtain perceptions on a defined environment’ (Kreuger 1998: 88) or ‘an informal discussion among selected individuals about specific topics’ (Beck et al., 1986). These definitions show a tension between participant-researcher interaction and interaction between participants, with interactions between participants in the group being a particularly distinctive characteristic of focus group methodology, although this is not always apparent from analysis of focus group data. The data obtained in this method is neither a ‘natural’ discussion of a relevant topic, nor a constrained group interview with set questions, but it has elements of both these forms of talk. The different definitions of focus groups, as well as the origins of focus group methodology in very varied contexts, demonstrate some of the variations within this methodology; even within the social research context, focus groups are used by researchers with very different theoretical and analytical backgrounds, and these have implications for the use and analysis of focus groups.
REASONS FOR USING FOCUS GROUPS IN SOCIAL RESEARCH
A growing literature on the reasons for using focus groups in the social sciences, together with practical advice on how to organise and run them, is now available, for example by Kitzinger (1995), Vaughn et al. (1996), Greenbaum (1998), Morgan and Kreuger (1998) and Bloor et al. (2000). One often-stated advantage of using focus groups lies in the fact that they permit researchers to observe a large amount of interaction on a specific topic in a short time. They are sometimes viewed as a quick and easy way to gather data. However, there are often problems with setting up and organising groups and obtaining the right number and mix of people for groups. In practice, groups tend to be based on availability rather than representativeness of sample. Moderating focus groups can be complex, and the data obtained can be difficult to transcribe and analyse (Pini 2002).
From a practical perspective, the feasibility of arranging focus groups needs to be considered. For example, if interviewing people who are geographically distant, or who have very little time, or who will be interviewed in a second language, then focus groups may prove impossible (though telephone and online focus group methods are being developed, see the section entitled ‘Using focus groups in specific contexts’). Focus groups have been described as particularly useful at an early stage of research as a means of eliciting general viewpoints, which can be used to inform design of larger
studies (Vaughn et al., 1996). They are often used in conjunction with another method, such as individual interviews or survey questionnaires. While perceived convenience is a regularly cited reason for using focus groups, from a methodological perspective, the question should rather be whether focus groups will produce the best sort of data for the research question.
One of the perceived strengths of focus group methodology is the possibility for research participants to develop ideas collectively, bringing forward their own priorities and perspectives, ‘to create theory grounded in the actual experience and language of [the participants]’ (Du Bois 1983). Morgan (1988) views the hallmark of a focus group as ‘the explicit use of the group interaction to produce data and insight that would be less accessible without the interaction found in a group’ (Morgan 1988: 12). A central feature of focus groups is that they provide researchers with direct access to the language and concepts participants use to structure their experiences and to think and talk about a designated topic. ‘Within-group homogeneity prompts focus group participants to elaborate stories and themes that help researchers understand how participants structure and organize their social world’ (Hughes and DuMont 1993). Focus groups with children have been shown to be a very effective approach for collecting data in a setting which children feel comfortable with (Ronen et al., 2001).
DESIGN AND PROCEDURE
Sampling and selecting participants
In focus group methodology, the unit of analysis is taken to be the group (Morgan 1988, Kreuger 1998), and groups are typically homogeneous – for example, students on a certain course, or a group with a similar medical condition. Participants are chosen to fit in with the group’s demographic. According to the prescriptions about focus group methodology in the literature, there should be relatively homogeneous membership (Kreuger 1994, Ritchie and Lewis 2003). Guides to focus group research typically advocate having single-sex groups, and several groups whose members have comparable characteristics, to permit cross-group comparability. There are many other variables which may need to be taken into consideration, such as nationality, sexuality and ethnic background. Having people at similar life stages, or working in similar jobs, can be particularly relevant. However, heterogeneous groups can produce very interesting discussions. For example, mixed-sex groups can challenge the typical male and female discourses on these topics (Smithson 2000). Recruitment of group members has been shown to affect the group dynamics; for example, Agar and MacDonald (1995) point out how the ways in which respondents are recruited come to condition the group talk.
Organisation and dynamics of focus groups
While the literature often (e.g. Vaughn et al. 1996) recommends focus groups of up to 12 participants, there are practical and methodological reasons why many focus groups are smaller. Practically, it can be difficult to get an exact number of participants to turn up to a focus group, especially if trying to get a specific sub-group, for example new parents working in specific jobs, or expectant mothers of a particular age. In larger groups, there is a likelihood that some participants will remain silent or speak very little, while smaller groups (say 4–8 participants) often provide an environment where all participants can play an active part in the discussion. Smaller groups often yield interesting and relevant data, giving more space for all participants to talk and to explore the various themes in detail (Brannen et al., 2002). Ritchie and Lewis (2003) suggest that if groups are smaller than four they can lose some of the qualities of being a group, while they see triads and dyads as an effective hybrid of in-depth interviews.
Second, the researcher cannot guarantee that all discussion in this context will remain totally confidential. A useful strategy is to start the focus group with a list of ‘dos and don’ts’, including asking participants to respect each other’s confidences and not repeat what was said in the group; however this cannot be enforced. The moderator can guarantee from a personal perspective that the things said in a focus group context will be kept anonymous and confidential, but cannot guarantee that co-participants will not discuss the group, which can be a problem, especially in an institutional setting, such as a workplace or health care setting.
When are focus groups not appropriate?
Certain topics are commonly understood to be unsuitable for the focus group context. In particular, topics which participants may view as personal or sensitive are often better left for other methods, for example individual interviews. These may include people’s personal experiences or life histories, their sexuality, and topics such as infertility or financial status. What is viewed as a private issue varies between different cultural groups (and also depends on age, gender and other contexts). In institutional contexts, such as workplaces or schools, people may be particularly wary of presenting their views or talking about their personal experiences in front of colleagues, managers or peers. Focus groups may also be inappropriate when the aim of the research is to obtain in-depth personal narratives, for example of the experience of illness. The methodology may also be inappropriate for topics where people have strong or hostile views. However, in all these cases, much depends on the questions asked and the group dynamics.
There are perspectives which rarely come out in ‘mainstream’ groups, though these vary in different cultural contexts, and are affected by age, gender and background of the participants, as well as the setting and context of the focus group. Perspectives which rarely come out in focus groups, unless the groups are specifically designed, include gay and lesbian views and other non-standard family set-ups, as well as ethnic minority and religious minority perspectives. Separate focus groups can cover some aspects of these perspectives, and for other aspects, more ‘private’ methods such as individual interviews may be more suitable. However, the limitations of what is discussed and what is omitted vary and it is possible to get unexpected and extremely interesting discussions about topics which are not always ‘recommended’ in focus group manuals. Groups may be happy to discuss sensitive topics such as sexual orientation and parenting in a general way, but not to give personal details about their own lives. Sensitive topics can be discussed in a focus group context, but with the emphasis on general discussion rather than individual experience.
THE ROLE AND IMPACT OF THE MODERATOR
In market research moderators tend to be specifically trained and employed to perform this task, while in the social sciences researchers often moderate the group themselves. Specific issues that the moderator is expected to deal with include handling disagreement and arguments in the groups, including all participants, noticing when participants are uncomfortable with a discussion and dealing with this appropriately, and ensuring that essential topics are covered in the time available. The moderator is expected to strike a balance between generating interest in and discussion about a particular topic, while not pushing their own research agenda and so ending up confirming existing expectations (Vaughn et al., 1996, Sim 2002). They should be trying to ensure that discussion is between participants rather than between them and the moderator (Sim 2002).
In qualitative social science research, the role and subjectivity of the researcher is a vital part of the research context, and in this paradigm, the role and positioning of the
of analysis’ (Silverman 1993: 106). Focus groups, then, should not be analysed as if they are naturally occurring discussions, but as discussions occurring in a specific, controlled context.
There have been numerous critiques of qualitative techniques which appear to offer an ‘authentic gaze’ into participants’ views or lives (Silverman 2000). Focus group researchers have typically extolled the group context as one which limits the role and impact of the moderator, thereby permitting a more ‘natural’ discussion to emerge. This view needs to be treated with caution; the group context does not obliterate the role of the moderator, or the research context of the talk.
Consensus and disagreement
The emergence of dissonant views and opinions between participants – what Kitzinger (1994) calls ‘argumentative interactions’ – is a distinctive feature of the focus group method and often makes an important contribution to the richness of the data obtained (Sim 2002). However, there are limitations to how disagreements are expressed in this peer group context. The group context of this methodology, while appropriate for uncovering group discourses and stories, is also likely to reproduce the socially accepted, normative discourse for that group. People with unpopular views, or less confident group members, may be reluctant to air their views in a group context. People are often (though not always, as discussed below) reluctant to disagree openly with a stated view, especially in groups of strangers. It is important therefore not to assume consensus just because no one has disagreed openly (Sim 2002). If a divergence of views emerges, it is safe to assume that participants do hold different views; however if no divergence appears, this does not indicate consensus.
General questions can often elicit socially acceptable responses when it is likely that in fact the individuals in the group hold stronger views than this. The timing and stage of the focus group can also make a difference – a question asked in the first few minutes of the focus group may elicit a different response if asked later on when people are more comfortable with the group. Overall, a focus group is likely to elicit ‘public’ accounts (Smithson 2000, Sim 2002) in contrast to the private accounts which might emerge in individual interviews or in everyday interactions.
But detailed study of group data suggests the opposite can also happen and focus groups can be a forum for contrasting opinions to emerge and develop (Smithson 2000, Pini 2002). There are various powerful counter-examples to the expected ‘rule’ that focus groups replicate the dominant discourse. Sometimes participants make gentle or overt challenges to the status quo, and there are particular strengths in the challenging of views by other participants, rather than by the moderator. Kitzinger (1994) shows how difference can be examined in the focus group context, and how the method can be used as a way of studying how differences are negotiated and understood.
One of the strengths of the method (Smithson 2000, Pini 2002) is the way focus group discussions often range between discussion of personal experiences and collective experiences. Kitzinger and Farquhar (1999) contend that focus groups sometimes provide an opportunity for ‘sensitive’ topics to be raised, as there is the space for discussion and reflection and time to explore issues in a more in-depth way than might be the case in more routine dialogue. They argue that focus groups can be used to unpack the social construction of sensitive issues, uncover different layers of discourse, and illuminate group taboos and the routine silencing of certain views and experiences. Through attention to sensitive moments, researchers can identify unspoken assumptions and question the nature of everyday talk. Focus group talk, like everyday talk, can include many contradictions, norms, and both official and unofficial perspectives on a sensitive topic.
One of the claims made in favour of focus groups as a methodology is that they
research also needs to take note of cultural differences in emotional tone, feelings and reflexivity, which are particularly noticeable in focus group research. In some cultures it is not usual to directly disagree in a group situation, or to overtly criticise authority. Ways of interacting are of course cultural as well as responses to a particular method and the result of particular factors such as gender and status. For example, in a cross-European study of new parents’ orientations to work, focus groups in Sweden were described by the national research team as ‘consensual’, with turn taking easily managed. In the same cross-national study, focus groups in the UK were notable for high levels of criticism and outspokenness, while in the Bulgarian focus groups in the same study there was little cross talking or butting in (Brannen 2004).
Using focus groups in feminist research
Focus groups have been widely used in recent feminist research, and feminist social scientists have elaborated on the ways in which the methodology can be used to further feminist aims of giving various minority groups a voice through the research process. For example, Madriz (2000) starts an account of feminist focus group research with a quotation from a Dominican woman telling how she prefers the focus group context as she finds it less intimidating than being alone with an interviewer. Focus groups have been taken up as an appropriate method by both postmodernist and feminist standpoint researchers (Wilkinson 1998, Madriz 2000, Olesen 2000). They are seen as a way of lessening the impact of the researcher and permitting minoritised groups to develop and elaborate their own perspective on a research topic, in a ‘safe’ environment. Madriz argues that ‘the focus group is a collectivist rather than an individualistic research method that focuses on the multivocality of participants’ attitudes, experiences and beliefs’ (Madriz 2000: 836).
However, other feminist researchers are more cautious about the use of focus groups. In practice, while focus groups can be less directive and perhaps less intimidating than traditional research methods, there is wide variation in this, as described elsewhere in this chapter. The moderator is still exerting a strong influence over the group, and still retaining a high degree of control, typically, over the recruitment, procedure and subsequent analysis and reporting of the group. Using focus groups does not in itself make the research ‘collectivist’, or empower participants. A postmodernist feminist approach which views accounts gathered in a research process as stories, or narratives, can be well suited to focus group methodology, but the questions of how to represent these stories, which questions to ask and which replies to prioritise in analysis, and how to interpret or analyse these stories, are as pertinent for focus group research as for other feminist qualitative methodologies. A priority for feminist focus group researchers is how to make participants’ voices heard without being exploited or distorted, and taking account of ‘unrealised agendas’ of class, race and sexuality (Olesen 2000). Focus groups are not a ‘solution’ for highlighting the views of oppressed or minority groups, but can, used sensitively, help to facilitate listening to these narratives.
Ethnographic research and focus groups
Ethnographic researchers have made use of small group discussions for many years, although rarely using the term ‘focus groups’. Focus group methodology can fit neatly with certain streams of ethnographic thought, which place the research encounter in a wider social context, and emphasise the social and processual nature of experiences (Tedlock 2000). As with feminist research, focus groups have been viewed within ethnography as a way of emphasising the collective nature of experience, and the social context of accounts.
Focus groups in organisational research
Conducting focus groups in an organisational context has particular implications. While it
can be an advantage having people from the same departments and work teams, who have shared experiences and are often comfortable talking together, there can be problems with how freely people feel they can express themselves in a workplace situation. Shared workplace experiences such as restructuring, management experiences, enthusiasm or resistance to work-life initiatives, can encourage feelings of solidarity among team members. Groups can share common knowledge about relevant issues in the company even when the people are strangers. For example, in a study of new parents in organisations (das Dores Guerreiro 2004), everyone had a strong view about the change from formal to informal flexi-time, and there had clearly been a great deal of discussion over the past months about it which was continued in lively focus group discussion.
Possible drawbacks of using focus groups in organisational settings include people feeling unable to speak out in front of superiors or people from different parts of the organisation. It is generally not recommended to place managers and employees in the same group, although this will vary with the nature of the organisation. Privacy and ethical issues are of particular importance in an organisational context, where people are encouraged to talk freely in front of colleagues.
Online focus groups
The use of online interviewing, including group interviewing, is being increasingly taken up in social science research. Online focus group research methods are part of this rapid expansion of online methodologies (e.g. Murray 1997, Chappell 2003). There are various reasons for this. It can be a good way of including hard-to-reach groups in research. An online focus group method can bring together geographically distant participants in one online forum. It can also be used to bring together people with disabilities or illnesses who would not otherwise find it easy to participate in research, especially in group contexts. For example, Kralik et al. (2006) brought together people to explore experiences of chronic illness. It is also a potentially useful way of talking in a group context about sensitive or embarrassing issues, in a relatively anonymous context. Other reasons for the growing popularity of online focus group methods include cost savings, and attracting people who would otherwise have little time to participate (Edmunds 1999).
There are two main discussion options available when running an online focus group – synchronous and asynchronous (Chappell 2003). Synchronous discussions occur in ‘real time’ with the moderator and participants all logged onto a discussion at the same time, posting their comments on a joint board. While this is a close simulation of a face-to-face focus group, one of the advantages of an online method (the ability to participate at one’s own convenience) is no longer available. Additional drawbacks of this method are that the conversation can become hard to follow and participants tend to answer questions with short, ‘I agree’-type responses because they feel pressured to answer quickly. This can also pose problems for the moderator. It can become difficult to keep track of the conversations and responses of group members, as there is often more than one track of conversations running simultaneously (Montoya-Weiss et al., 1998).
The other main online focus group option is asynchronous discussions, which do not occur in real time. Messages are posted in response to the moderator and the group members at the participants’ convenience. Participants do not have to be logged on at the same time and can participate at any point during the day or night.
Edmunds (1999) points out that online groups can lead to greater anonymity for participants, which can lead to greater openness. The downside of this, and a particular issue for online groups, is the possibility of ‘fake’ participants – people joining in with false personas or providing false information (a regular problem in internet chat rooms, for example). While online methods might seem to be particularly susceptible to this sort of misinformation, it is useful to remember
that in ‘real’ focus groups, as with other forms of research, the participant is an actor constructing a performance (Goffman 1981). Newhagan and Rafaeli (1996) pointed out that using electronic media affected how people communicated. While it is important to be aware of the ways in which different media affect people’s communication patterns, this is an issue for all qualitative social research, and all focus group situations, not just for online groups.
There are ways of regulating participation to limit possible misuse, for example making contact individually with the focus group participants before the online group occurs. There is a growing literature on chat room behaviour and discourses, and the use of online methods in social science, which is particularly relevant when considering the use and analysis of online focus groups (Rezabek 2000).
CONCLUSIONS
The diverse nature of focus group research reflects the origins of focus groups, first in social science research before being taken up mainly by market researchers for several decades, and more recently becoming widely and increasingly popular in various social research fields. The method is used by researchers from very varied epistemological and theoretical research traditions, which is reflected in the variations of approaches, and specifically the techniques and approaches to analysing the talk produced in this context.
There are conceptual, methodological and ethical issues in focus group research. As with other qualitative research methods, there are opportunities for consciously or unconsciously manipulating the participants’ responses, and it is perhaps a feature of focus group methodology, with its seeming emphasis on ‘natural discussion’ and ‘collective accounts’, for there to be relatively little explicit awareness of the constructed nature of the discussion, and the salience of the moderator and research agenda throughout the process. The ‘collective stories’ which are produced in this way perhaps mitigate the awareness that the interactions occurring in this formalised research setting will differ in many ways from interactions in other contexts. As well as differing from individual interview data, focus group talk will also be substantially different from ‘natural’ conversation.
Focus groups have specific dilemmas, both ethical and procedural, such as respect for individuals’ privacy, and the difficulties of dealing with inappropriate group behaviour (for example, insensitive comments or reactions to another participant’s contribution), as well as the more ubiquitous dilemmas of qualitative research concerning respect for participants’ voices, and concerns for misrepresenting the experiences and discussions of vulnerable groups.
The focus group method does have particular strengths. It enables research participants to discuss and develop ideas collectively, and articulate their ideas in their own terms, bringing forward their own priorities and perspectives. Not only can a wide variety of opinions be given and considered, but also a wide variety of interactive techniques can be observed. Participants engage in a range of argumentative behaviours, which results in a depth of dialogue not often found in individual interviews. Moreover, some of these limitations can also be viewed as possibilities for the method. Myers suggests that ‘the constraints on talk do not invalidate focus group findings; in fact, it is these constraints that make them practicable and interpretable’ (Myers 1998: 107). Focus groups permit some insights into rhetorical processes, or contemporary discourses. Another plus is that participants often report that joining in a focus group has been an enjoyable and creative experience (Wilkinson 1998, Madriz 2000, Smithson 2000, Pini 2002).
The effects of group dynamics in the focus groups can therefore be of benefit in social research for exploring issues from the perspective of the participants, in a way that is culturally sensitive to participants’ priorities and experiences. While there are some limitations of focus group research,
these can be partially overcome by awareness of the constraints, by informed analysis, and by detailed consideration of the way the conversations are socially constructed in the group context, and are narratives produced jointly by the co-participants and also by the moderator.
REFERENCES
Agar, M. and MacDonald, J. (1995) Focus groups and ethnography. Human Organization 54: 78–86.
Beck, L. C., Trombetta, W. L. and Share, S. (1986) Using focus group sessions before decisions are made. North Carolina Medical Journal 47(2): 73–4.
Bloor, M., Frankland, J., Thomas, M. and Robson, K. (2000) Focus groups in social research. London: Sage.
Bradburn, N. M. and Sudman, S. (1979) Improving interview method and questionnaire design. San Francisco: Jossey-Bass.
Hughes, D. and DuMont, K. (1993) Using focus groups to facilitate culturally anchored research. American Journal of Community Psychology 21(6): 775–806.
Jacoby, S. and Ochs, E. (1995) Co-construction: An introduction. Research on Language and Social Interaction 28(3): 171–183.
Kitzinger, J. (1994) The methodology of focus groups: The importance of interaction between research participants. Sociology of Health and Illness 16(1): 103–121.
Kitzinger, J. (1995) Introducing focus groups. British Medical Journal 311: 299–302.
Kitzinger, J. and Farquhar, C. (1999) The analytical potential of ‘sensitive moments’ in focus group discussions. In Barbour, Rosaline S. and Kitzinger, Jenny (eds) Developing focus group research: Politics, theory and practice. London: Sage.
Kitzinger, C. and Frith, H. (1999) Just say no? The use of conversation analysis in developing a feminist perspective on sexual refusal. Discourse and Society 10(3): 293–316.
Kralik, D., Price, K., Warren, J. and Koch, T. (2006) Issues in data generation using email group conversations for nursing research. Journal of Advanced Nursing 53(2): 213–220.
Brannen, J. (2004) Methodological issues in the consolidated case studies. Research Report #5 for the EU Framework 5 funded study ‘Gender, parenthood
and the changing European workplace’. Printed by Kreuger, R. A. (1994) Focus groups: A practical guide for
the Manchester Metropolitan University: Research applied research, 2nd edition. Newbury Park: Sage.
Institute for Health and Social Change. Kreuger, R. A. (1998) Analyzing and reporting focus
Brannen, J., Lewis, S., Nilsen, A. and Smithson, J. (eds) group results. Focus group kit, Volume 6. California:
(2002) Young Europeans, work and family: Futures in Sage.
transition. London: Routledge. Madriz, E. (2000) Focus groups in feminist research. In
Bryman, A. (1988) Quantity and quality in social N. K. Denzin and Y. S. Lincoln (eds) Handbook of
research. London: Unwin Hyman. qualitative research. California: Sage.
Chappell, D. (2003) A procedural manual for the online Merton, R. K. and Kendall, P. L. (1946) The
work-family focus group. Centre for Families, Work focused interview. American Journal of Sociology 51:
and Well-being, Guelph, Canada. 541–557.
Das Dores Guerreiro, M. (2004) Case studies report. Montoya-Weiss, M. M., Massey, A. P. and Clapper, D. L.
Research report #3 for the EU Framework 5 (1998) On-line focus groups: Conceptual issues and
funded study ‘Gender, parenthood and the changing a research tool. European Journal of Marketing
European workplace’. ISBN 1-900139-46-4. Printed 32: 713–723.
by the Manchester Metropolitan University: Research Morgan, D. L. (1988) Focus groups as qualitative
Institute for Health and Social Change. research. Newbury Park, CA: Sage.
Drew, P. and Heritage, J. (eds) (1992) Talk at work. Morgan, D. L. (2002) Focus group interviewing. In
Cambridge: Cambridge University Press. J. F. Gubrium and J. A. Holstein (eds) Handbook
Du Bois, B. (1983) Passionate Scholarship: Notes on of interviewing research. Context and method.
values, knowing and method in feminist social Thousand Oaks, California: Sage.
science. In G. Bowles and R. D. Klein (eds) Theories Morgan, D. L. and Kreuger, R. A. (1998) The focus group
of women’s studies. London: Routledge. kit. California: Sage.
Edmunds, H. (1999) The focus group research handbook. Munday, J. (2006) Identity in focus: The use of focus
Lincolnwood, IL: NTC Business Books/Contemporary groups to study the construction of collective identity.
Publishing. Sociology 40/1: 89–105.
Goffman, E. (1981) Forms of talk. Oxford: Blackwell. Murray, P. J. (1997) Using virtual focus groups in
Greenbaum, T. (1998) The handbook for focus group qualitative research. Qualitative Health Research
research. Sage: London. 7(4): 542–554.
Myers, G. (1998) Displaying opinions: topics and disagreement in focus groups. Language in Society 27: 85–111.
Newhagen, J. E. and Rafaeli, S. (1996) Why communication researchers should study the internet: a dialogue. Journal of Communication 46(1): 4–13.
Oleson, V. L. (2000) In N. K. Denzin and Y. S. Lincoln (eds) Handbook of qualitative research. California: Sage.
Pini, B. (2002) Focus groups, feminist research and farm women: opportunities for empowerment in rural social research. Journal of Rural Studies 18/3: 339–351.
Poland, B. and Pederson, A. (1998) Reading between the lines: interpreting silences in qualitative research. Qualitative Inquiry 4/2: 293–312.
Pollack, S. (2003) Focus-group methodology in research with incarcerated women: race, power, and collective experience. Affilia 18/4: 461–472.
Puchta, C. and Potter, J. (1999) Asking elaborate questions: focus groups and the management of spontaneity. Journal of Sociolinguistics 3: 314–335.
Puchta, C. and Potter, J. (2002) Manufacturing individual opinions: market research focus groups and the discursive psychology of attitudes. British Journal of Social Psychology 41: 345–363.
Rezabek, R. (January, 2000) Online focus groups: electronic discussions for research. Forum for Qualitative Social Research [On-line Journal], 1(1). Available at: http://qualitative-research.net/fqs [2007, 08,08].
Ritchie, J. and Lewis, J. (eds) (2003) Qualitative research practice: a guide for social science students and researchers. Thousand Oaks, California: Sage.
Ronen, G. M., Rosenbaum, P., Law, M. and Streiner, D. L. (2001) Health-related quality of life in childhood disorders: a modified focus group technique to involve children. Quality of Life Research 10(1): 71–79.
Silverman, D. (1993) Interpreting qualitative data: methods for analysing talk, text and interaction. London: Sage.
Silverman, D. (2000) Analyzing talk and text. In N. K. Denzin and Y. S. Lincoln (eds) Handbook of qualitative research. California: Sage.
Sim, J. (2002) Collecting and analysing qualitative data: issues raised by the focus group. Journal of Advanced Nursing 28(2): 345–352.
Smithson, J. (2000) Using and analysing focus groups: limitations and possibilities. International Journal of Methodology: Theory and Practice 3(2): 103–119.
Stokoe, E. H. and Smithson, J. (2002) Gender and sexuality in talk-in-interaction: considering conversation analytic perspectives. In P. McIlvenny (ed.) Talking gender and sexuality. John Benjamins: Amsterdam.
Tedlock, B. (2000) Ethnography and ethnographic representation. In N. K. Denzin and Y. S. Lincoln (eds) Handbook of qualitative research. California: Sage.
Templeton, Jane F. (1987) A guide for marketing and advertising professionals. Chicago: Probus.
Vaughn, S., Shay Schumm, J. and Sinagub, J. (1996) Focus group interviews in education and psychology. California: Sage.
Wilkinson, S. (1998) Focus group methodology: a review. International Journal of Social Research Methodology, Theory and Practice 1(3): 181–204.
Wilkinson, S. and Kitzinger, C. (1996) Representing the other. London: Sage.
Willgerodt, M. A. (2003) Using focus groups to develop culturally relevant instruments. Western Journal of Nursing Research 25(7): 798–814.
PART IV
This section inevitably only covers some of the many analytic strategies available. It covers a number of types of analysis available in relation to quantitative and qualitative data and issues that the researcher will encounter. It also has a number of chapters that focus on the analysis of data derived via different methods.
ANALYSIS OF PRIMARY QUANTITATIVE DATA
Three chapters focus on quantitative data: one on the analysis of change; a second on the analysis of latent variables (variables that cannot be measured); and a third on the biases that are introduced into analysis when there are no comparison groups or control groups as in evaluation research.
Analysing change is difficult. Only in the past 35 years have approaches to statistical measures of change been developed. Chapter 22 by Graham, Singer and Willett provides an introduction to one approach to the analysis of quantitative longitudinal data. The chapter goes into enough depth to provide a basic understanding of longitudinal modelling but does not become so technical that it is difficult for a person who is not familiar with the terminology and concepts to follow. One of the problems for the student in this field is the surfeit of terms for similar approaches: individual growth modelling, random coefficient modelling, multilevel modelling, mixed modelling, and hierarchical linear modelling, together with the range of statistical packages that can be used. The term the authors use is multilevel modelling. This approach has several advantages that include: its ability to deal with any number of time points; that each wave of data can be collected with different time schedules; and that no data need be discarded because they are missing. The approach can be applied to linear, non-linear and discontinuous trends. The analysis can include both time-invariant predictors such as gender and race as well as ones that do change with time such as attitudes. Moreover, these predictors can be fixed or randomly varied across persons.
In Chapter 23, Hoyle addresses the analysis of complex quantitative data, focusing on latent variable modelling, which examines the presence or influence of constructs that cannot be measured. The chapter discusses the use of linear structural equation modelling (SEM) to evaluate social models, an approach that has many uses in the social sciences: in particular the evaluation of measurement models, mediated effects, moderator effects and longitudinal data using
several approaches including latent growth curve models. In all these cases a predicted model is compared to actually observed data to determine if the predicted model is a good fit with the data. The predicted model describes the relationships among constructs and can be regarded as a hypothesis of the mechanisms that produced the data. The use of latent variables is especially useful in decreasing the number of variables that need to be tested and in increasing the reliability of measurement. SEM’s measurement component is used to test the relationship among latent variables and their indicators. The structural component is concerned with the directional relationship. While the latter appears to be causal, because the path model specifies direction, Hoyle is quick to point out that unless the data are longitudinal then causal conclusions cannot be made. The measurement component can be used to test if the model is consistent across time or samples, which would indicate measurement invariance. He notes that although this is a very valuable function of SEM it is rarely used that way. Hoyle sets out six limitations of SEM including requiring a sample size of a minimum of 400 in order to obtain stable estimates but he also predicts that SEM’s use will grow because of its many advantages compared to other techniques.
In Chapter 24 West and Thoemmes address the issue of having appropriate control or comparison groups in research using an intervention or a programme evaluation, especially when the question being addressed is the effectiveness of an intervention. These techniques, even though they have important limitations, provide a safety net for experimental social research. The authors provide valuable advice for research where the design is intended to have non-equivalent groups or where there is a failure of random assignment, as well as research that sets out to have random assignment. They discuss several techniques that can be used in an attempt to deal with groups that are not equivalent at the start of the study. However, even when the design is labelled as random assignment, the implementation of the design may result in obtaining non-comparable groups. Unfortunately, West and Thoemmes conclude from their literature review that it is not clear whether the bias introduced by having non-equivalent groups will make the comparison between the two groups appear smaller or larger. The chapter also deals with the following issues: the importance of lack of bias in the assignment; the importance of delivering the intervention to everyone in the treatment group; issues of attrition; and questions concerning the information given to intervention and non-intervention groups.
ANALYSIS OF PRIMARY QUALITATIVE DATA
Five chapters focus on the analysis of primary qualitative data. Three chapters are devoted to the analysis of talk: a chapter on discourse analysis and conversation analysis; a chapter on the analysis of narrative and storytelling; and a chapter on grounded theory.
Charles Antaki’s chapter (25) on how to analyse discourse covers a lot of ground not only by talking about different varieties of discourse analysis (DA) but also by including conversation analysis (CA). Even though these approaches are often seen as separate or even belonging to opposing camps, both types of analysis address the organization of talk and text as ‘speech acts’ thereby emphasizing their agentic dimension. Among the plethora of methods used for analysing discourse, Antaki also discusses narrative analysis, critical discourse analysis, interactional sociolinguistics, membership category analysis, discursive psychology, and ethnomethodologically inspired DA. Social interaction as revealed through the lens of CA is similar to other ways in which discourse is analysed: it can discover things about interaction and language use that the participant did not suspect, or which have effects or functions which did not figure in the original aims of the encounter or speaker. Such revelations, whatever the method used in teasing them out, are the ultimate criteria for the right to claim to have carried out an analysis. As Antaki stresses, any researcher who claims to be a discourse analyst must
TYPES OF ANALYSIS AND INTERPRETATION OF EVIDENCE 373
‘add value’ to what can be read or heard in speech and claims must be backed up by evidence grounded in the words used (or not used). Thus the ‘argumentative steps’ leading to the conclusion must be available to the reader and fellow-scholar.
In discussing the analysis of narrative Hyvärinen makes a very rich contribution to the Handbook. Chapter 26 starts with a wide-ranging account of the different definitions of narrative, many of which are potentially confusing, ranging from ordinary talk to accounts that are ‘narratives’ and those that ‘possess narrativity’. The chapter goes on to suggest that narrative analysis includes as many genres as the term narrative itself and picks out two developments that have had great impact on social research: grand narratives and the notion of ‘life as narrative’. The discussion then turns to the methods of analysis that have been applied to different genres of narrative, in particular the Proppian model, in which Russian wonder tales were analysed in terms of the basic functions of actions performed by their different characters in the plots, and the textual approach adopted by Labov and Waletzky, who sought to identify the basic elements of narrative. The chapter then moves to recent developments: to the study of narratives as practices and in context, thereby making a distinction between the story and the storying process. The last part of the chapter discusses how narrative practices are transformed into cultural scripts, shape individual action and narration, and lead to breach and discordance.
Grounded theory has been an extremely important development in the analysis of talk although it does not need to be limited to such a form of data. In Chapter 27, Kathy Charmaz provides an illuminating analysis of its development according to its originators – Glaser and Strauss in their book The Discovery of Grounded Theory published in 1967. She discusses the development of their ideas from her own position as a long-time exponent and developer of the method. Her argument is that its clear appeal lay in the fact that The Discovery of Grounded Theory was the first methodological text to set out explicit systematic procedures for the analysis of qualitative data. Hitherto such strategies were largely learned by researchers in the field. In a context and time in which US research was largely quantitative, or rather status was accorded largely to quantitative research, the systematization of its approach bestowed on qualitative research some legitimacy. However, as Charmaz argues, in their enthusiasm followers of the approach sought to project a rigidity on to it, in particular a belief that disallowed macro social processes or structures that are left untapped at the interactional level, while a second considerable benefit of a grounded approach – namely to generate theory – was rarely exploited. Both developments are ironic, Charmaz notes, given grounded theory’s original openness to methodological innovation and development. On the other hand, this chapter represents an inspiring account of grounded theory and encouragement for its further use, notably for those who wish ‘by interrogating and following content, ... [to] construct form for their inquiry, rather than solely creating content from form used as a recipe for generating research’ (Charmaz Chapter 27).
Two chapters focus on the analysis of qualitative material of a different kind, the first on the analysis of documents and the next on the analysis of visual material.
Documents are a key source of data but methodological guidance to their analysis and use is rare. Typically documents are used by researchers as resources for trawling content. The grounded theorist distinction between form and content is taken up by Lindsay Prior in Chapter 28 on documentary research. Prior argues that documents can also be seen as a topic in their own right in which the focus is on documents as ‘informants’ that perform functions in social interaction. In arguing in favour of a focus upon discourse (as well as content) Prior gives a striking example of how the scientific discovery of DNA came to be represented in text as something that was endowed with creative action. Without the use of metaphors drawn from communication this would not have been possible to convey and hence for the public to comprehend.
Documents are also read and understood – as in Bernstein’s terms (Bernstein 2000; see Note 1) they are the object of recontextualization. They may ‘act’, as in the case of a will, and they may form part of a network of actors, as in the case of a genre of literature, and they are used in social interaction to structure and pattern their readers.
Like documentary methods, visual methods are a relatively ignored field of methodology with the exception of social anthropology where visual data have been used for some time. In Chapter 29, Christian Heath and Paul Luff set out a case for a particular approach within sociology that draws upon ethnomethodology and conversation analysis and directs analytic attention towards the social and interactional accomplishment of everyday activities and events. In their chapter, they draw upon their own study of auctions and auction houses to provide some practical guidance to using video recordings to address the social and interactional organization of naturally occurring events.
SECONDARY AND META-ANALYSIS
Three chapters are concerned with the secondary analysis of data: the first on qualitative data, the second on quantitative data, while the third is a discussion of meta-analysis.
The re-use of qualitative data is not established practice in social research, as Janet Heaton suggests in Chapter 30. However, it is a developing methodology, and the re-use of qualitative data is becoming more common, partly due to computer technology and partly due to the promotion of data sharing. Social researchers can access qualitative data for secondary analysis in three ways: through data archives, through informal data sharing, and by re-using data from their own previous research. The last of these is still the most common alternative, despite the increasing availability of qualitative data collected by others. Heaton lists several ways in which qualitative data can be re-used. In supra analysis, the focus of the secondary analysis transcends the primary data analysis in that new theoretical, empirical or methodological questions are explored. Supplementary analysis involves the in-depth investigation of an issue, or one aspect of the data, that was not addressed, or was only partly covered, in the original research. By contrast, the purpose of re-analysis is to verify and corroborate the findings of previous work. In amplified analysis, two or more datasets are utilized to form a larger dataset, or used to compare different populations. Finally, in assorted analysis, secondary analysis of qualitative data is combined with additional primary research. Despite recent advances in the re-use of qualitative data, Heaton stresses that further work is needed to explore and outline different strategies for re-using qualitative data, and to examine the acceptability of these strategies to research participants and the public.
Angela Dale and colleagues in Chapter 31 provide a mine of useful information about the secondary analysis of quantitative data. They present an excellent overview of the types of data available that are collected by academics, governments and supra-national organizations such as the European Union. These include: administrative datasets, national cohort and panel studies, international and national surveys, pooled samples from several surveys (where no one source provides sufficient numbers of a particular group that is of interest), and micro datasets that link together administrative records for the same individuals. The secondary analysis of large-scale datasets is moreover occurring in a context in which attempts are being made to take a more global view of available datasets. For example, the UK’s Economic and Social Research Council is now taking a strategic approach by providing a national map that will enable researchers to find their way through the myriad resources available. The chapter is highly practical and includes some tips on how to gain access to these datasets, with a particular focus upon data archives. It has the added advantage of covering datasets in a range of countries. It also makes reference to ways in which such datasets may be used in combination with qualitative methods as part of a mixed-methods strategy. The last
three sections of the chapter offer cautionary advice about using data collected for different purposes to those of the secondary analyst and discuss a variety of good practices. The chapter also raises ethical issues, stressing how secondary data analysts inherit responsibilities at the point of access to these data. A section is importantly devoted to advances in access to data via e-social science (grid technology).
Meta-analysis is the integration of data from similar studies that leads to a quantitative summary of the results of these studies. In Chapter 32 Patall and Cooper provide a comprehensive framework for understanding meta-analysis, which is increasingly used to make literature reviews of quantitative research more systematic, replacing the more traditional narrative review. However, they suggest that informed social scientists need to be aware of both the advantages and disadvantages of meta-analysis, regardless of their own use of this approach. They discuss a range of issues that include: the identification of studies for inclusion; coding frames; calculation of effect sizes; sample weighting; and so on. They also identify the problems to do with testing the same relationship in all the studies under review, issues concerning the independence of findings, and the variable quality of the studies included. This chapter provides an excellent way to obtain competence in addressing these issues.
INTEGRATING ANALYSES OF DATA FROM DIFFERENT SOURCES
Finally, we come to the key issue of how to integrate the analysis of data from different sources. One of the central themes of this section, to which three chapters are devoted, is the combination of different data collected through different methods. In Chapter 33, Jane Fielding and Nigel Fielding discuss the integration of qualitative and quantitative data, which is most commonly described as mixed-methods research. They emphasize that what is important is not the choice of design and use of different data sources per se but the logic that underlies the integration of data within the analysis, and the extent to which the combination of methods strengthens the validity of that analysis. As the authors put it, data integration should act as quality control. This does not in their view mean ignoring the epistemological assumptions underlying each method but recognizing that there are several ways of interpreting a research question, while being open to the benefits and constraints of each type of data.
The authors point to several different possible mixed-method research designs and discuss in some depth their own study, in which qualitative and quantitative methods were equally important. They show how, in their study of public responses to flood warning, one method (a survey) revealed that many of those identified according to external measures and perspectives as being at risk of flooding were unaware of the risks, while the qualitative method they used explained this lack of awareness. They conclude that, rather than seeing the different methods as generating competing findings, the complex social phenomena under investigation required the coordination of different perspectives and their associated methodologies.
Cronin et al. in Chapter 34 take a similar view about the integration of different types of data. Their concern is to describe the processes involved in analytic integration. Drawing upon their own research, this discussion is about research in which no one method is dominant. Through the use of in-depth interviews, life histories and visual methods they explored the meaning of vulnerability and safety in everyday life. They broadly defined these different data sources as qualitative. The process of analysis they describe is one in which they followed ‘different threads’: using one method they picked out one thread of the analysis, generated either inductively or imported from external theory, that they then pursued in the analysis of data produced by the other methods. The chapter is particularly useful in giving a very detailed account of the steps in the analytic process while at the same time demonstrating close attention to
epistemological and theoretical issues and the intrinsic form of the data. Thus it identifies how the researchers sought to preserve the integrity of the individual narrative accounts and cautions against the translation of one set of data into another – in this study the translation of visual data into textual data.
In Chapter 35, Max Bergman considers what data ‘are’, the reasons for using more than one dataset for a research question and how these reasons connect differently to various parts of the research process. The chapter reviews issues concerned with the analysis of different sources of largely quantitative data and discusses how data are always contingent and shaped by analytic strategies; analyses of data provide only partial answers to research questions. In making this case a number of arguments are presented for using a number of different (quantitative) data sources: verification, convergence, complementarity and holism, rationales that apply equally in research that combines quantitative and qualitative data. These ways of combining data are played out at different phases of the research process so that data in a qualitative form may be transformed into quantitative format at the point of data collection, for example through computer-assisted personal interviewing (CAPI) technology. Such processes of transformation Bergman refers to as ‘a form of taming and disciplining’ data for a particular type of analysis. The chapter begins and ends with a reference to Segal’s law that does not propose that it is better to have just one watch instead of two; instead, it may simply be less confusing.
The Handbook’s last chapter is about writing and presenting social research. Amir Marvasti (Chapter 36) suggests alternative ways of writing social science and argues that during the second half of the twentieth century a ‘third culture’ of representation has challenged the necessity of treating science and literature as mutually exclusive realms of knowledge. This means that in the social sciences there is a growing awareness of the rhetorical dimensions of writing and representing facts, so that efforts to inscribe social reality also involve linguistic constructive practice. As a consequence, in recent decades alternative forms of writing have emerged. These Marvasti classifies into six genres: (1) writing with pictures; (2) performative writing; (3) writing factual fiction; (4) poetic representation; (5) writing the author; and (6) post-colonial writing. Marvasti also discusses the ways in which alternative texts have been criticized. The chapter provides the reader with a map of an ever-changing terrain and suggests that many territories are still to be discovered.
NOTES
1 Bernstein, B. (2000) Pedagogy, Symbolic Control and Identity: Theory, Research, Critique. Lanham, Maryland: Rowman and Littlefield.
22
An Introduction to the
Multilevel Model for Change
Suzanne E. Graham, Judith D. Singer and
John B. Willett
Researchers often examine how individual change over time depends on selected predictors by fitting a multilevel model for change. Generations of behavioral scientists have been interested in measuring and investigating individual change, but for decades, the prevailing view was that it was impossible to do well (Cronbach and Furby, 1970). During the 1980s, however, methodologists working within a variety of different disciplines developed a class of appropriate methods (known variously as individual growth modeling, random coefficient modeling, multilevel modeling, mixed modeling, and hierarchical linear modeling) that permit the effective investigation of change. Today we know that it is indeed possible to model change, and to do it well, as long as you have longitudinal data available (Rogosa et al., 1982; Willett, 1988).
A multilevel model for change can be fit successfully to longitudinal data of many different kinds. The research design that generated the data can be either experimental or observational, prospective or retrospective. Time can be measured in whatever units make the most sense to the research question, from seconds to years, sessions to semesters. The data collection schedule can be fixed (everyone has the same periodicity) or flexible (each person has a unique schedule); the number of waves of data collected can be identical or vary from person to person. And don’t let the term ‘growth model’ fool you: these models are also appropriate for outcomes that decrease over time (e.g. weight loss among dieters) or exhibit complex trajectories that include plateaus and reversals.
Furthermore, fitting a multilevel model for change can be used to address research questions posed across many substantive disciplines. In medicine, we study change over time in aspects of health status, such as alcohol consumption among adolescents (Curran et al., 1997). In education, we examine changes in student academic achievement over time, for example, the development of the understanding of mathematical concepts during secondary school (Ai, 2002). In psychology, we investigate changes in behavioral outcomes, such as externalizing
behaviors or depressive episodes, over time (Keiley et al., 2000).

Perhaps the most intuitively appealing way of understanding how a multilevel model for change is postulated is to link its specification to two distinct substantive questions about change, each arising from a particular level in a natural hierarchy:

• At level-1 (the ‘within-person’ or intra-individual level) we can ask questions about each person’s individual change trajectory. Does a particular student’s mathematics achievement improve rapidly during secondary school? Does another student’s achievement increase less rapidly? Might yet another student’s mathematics achievement actually decrease over time? Are these changes linear or non-linear? The goal of addressing a level-1 research question is to interrogate the trajectory of each person’s individual growth over time.
• At level-2 (the ‘between-person’ or inter-individual level) we can ask how other variables may predict differences among the change trajectories of many individuals. On average, do girls’ and boys’ mathematics achievement trajectories start at the same initial level? Do boys and girls have the same rates of change over time? Do the change trajectories differ systematically by other important individual characteristics, such as a student’s race or socio-economic background? The goal of addressing a level-2 research question is to interrogate any heterogeneity in change among individuals in order to determine the relationship between predictors and the growth trajectories.

These two types of questions are natural precursors of the statistical models that together form an overall multilevel model for change.

In this chapter, we illustrate these ideas using five waves of mathematics achievement data collected as part of the Longitudinal Study of American Youth [LSAY], a national longitudinal study of U.S. secondary school students (Miller et al., 2000). LSAY data were collected from 5,945 students over the course of seven years, beginning in the fall of 1987 when the students were in either 7th or 10th grade. A primary focus of the LSAY investigation was on the measurement of students’ mathematics achievement over time, using items from the National Assessment of Educational Progress. Here, in our example, we present analyses of the mathematics achievement data from a sub-sample of 1,322 White and African-American students between 7th grade and 11th grade. We begin by examining the effects of race on changes in the students’ mathematics achievement over time. Then, we investigate whether individual mathematics achievement growth trajectories differ for students from different socio-economic backgrounds and whether girls’ trajectories differ from those of boys.

Level-1 model for individual change

In the left-hand panel of Figure 22.1, we plot the mathematics achievement (MATHACH) of one African-American girl from our dataset against her grade, between 7th and 11th grade. Notice the upward trend in the empirical growth record, which we have summarized in the figure by superimposing an ordinary least squares (OLS) ‘achievement on grade’ linear regression line, fitted for this girl. With few waves of data, it is difficult to argue that anything except a linear model is suitable for representing change, within-person. Here, with five waves of data, we need not be limited to thinking only in terms of linear trajectories, but for simplicity we begin here by focusing on linear growth over time. Later in the chapter we consider non-linear growth trajectories.

A level-1 statistical model, or individual growth model, can be specified to represent the change that we hypothesize each member of the population will experience during the time period under study. Assuming that true individual change is a linear function of grade, for instance, a reasonable level-1 model may be:

$$Y_{ij} = [\pi_{0i} + \pi_{1i}(\mathrm{GRADE}_{ij} - 7)] + [\varepsilon_{ij}] \tag{1}$$

This model asserts that, in the population from which this sample was drawn, $Y_{ij}$, the value of MATHACH for student i at time j, is constituted from two important parts.
[Figure 22.1: three panels (left, middle, right), each plotting mathematics achievement (vertical axis, 25 to 70) against grade (horizontal axis, 7 to 11). The trajectories in the right-hand panel are labeled White and African-American.]
Figure 22.1 Developing a multilevel model for change using data on mathematics
achievement over time. Left-hand panel contains the empirical growth record of one
African-American girl plotted against her grade in school. Middle panel presents exploratory
OLS-fitted trajectories for a random sample of 10 White and 10 African-American students
(coded using dashed lines for White students and solid lines for African-American students).
Right-hand panel presents fitted change trajectories for White and African-American
students, obtained by substituting prototypical predictor values into the fitted multilevel
model for change
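As a rough illustration of the OLS ‘achievement on grade’ line described for the left-hand panel, the sketch below fits a straight line to one student’s five waves after centering grade on 7, so that the fitted intercept estimates status in 7th grade and the fitted slope estimates change per grade. The scores are hypothetical values invented for this example (they are not LSAY data), and the function name `fit_individual_trajectory` is an assumption made here for illustration.

```python
import numpy as np

def fit_individual_trajectory(grades, scores, center=7):
    """OLS fit of scores on (grade - center) for a single person.

    Returns (intercept, slope): the fitted outcome at grade == center
    and the fitted change in the outcome per grade.
    """
    time = np.asarray(grades, dtype=float) - center
    y = np.asarray(scores, dtype=float)
    # np.polyfit returns coefficients from highest degree to lowest.
    slope, intercept = np.polyfit(time, y, deg=1)
    return intercept, slope

# Hypothetical growth record for one student (illustrative values only).
grades = [7, 8, 9, 10, 11]
scores = [48.0, 51.0, 53.0, 57.0, 60.0]

intercept, slope = fit_individual_trajectory(grades, scores)
print(f"fitted 7th-grade status: {intercept:.2f}")
print(f"fitted annual change:    {slope:.2f}")
```

Calling the same function with `center=0` leaves the slope unchanged but moves the intercept to the fitted value at grade 0, which is one way to see why the choice of centering constant controls what the intercept means.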
The first part, in brackets in equation (1), describes the underlying true change for this individual as a linear function of his (or her) grade in school on that occasion ($\mathrm{GRADE}_{ij}$). In our case, the model implicitly assumes that a straight line adequately represents the student’s true change trajectory over time. The second part of the individual growth model is a random error ($\varepsilon_{ij}$), which is intended to account for the scatter of the observed data around the individual true change trajectory. Even though everyone in our example was assessed on the same five occasions (grades 7, 8, 9, 10, and 11), this basic level-1 model can be used in a wide variety of other datasets, even those in which the timing and spacing of waves vary across people.

The brackets in equation (1) identify the model’s important structural component, which represents our hypotheses about each person’s true trajectory of change in mathematics achievement over time. The model stipulates that this linear trajectory is characterized by two critical individual growth parameters, $\pi_{0i}$ and $\pi_{1i}$, which determine its shape for the ith student in the population. If the model is appropriate, these parameters represent the fundamental features of each student’s true growth trajectory, and as such, become the objects of prediction in the linked level-2 model that we specify below.

An important feature of the level-1 specification is that the researcher controls the substantive meaning of these parameters by choosing an appropriate metric for the temporal predictor. For example, in this level-1 model, the intercept, $\pi_{0i}$, represents student i’s true mathematics achievement in 7th grade. This interpretation applies because we centered GRADE in the level-1 model by subtracting the constant ‘7’ from it, to provide the level-1 predictor (GRADE − 7). Had we not centered the predictor in this way, the intercept $\pi_{0i}$ would represent individual i’s true value of mathematics achievement at grade 0, which, corresponding with kindergarten, predates the onset of data collection! Centering the level-1 time predictor on the first wave of data collection, as we have done here, is a popular approach because it allows us to interpret $\pi_{0i}$ easily: it is student i’s true ‘initial’ status, at the beginning of the study.

Perhaps a more important individual growth parameter is the slope, $\pi_{1i}$, which represents the rate at which student i’s true mathematics achievement changes over time. Since time is measured in grades, in our example, individual growth parameter $\pi_{1i}$ represents student i’s true annual rate of
allow even individuals who share common predictor values to differ stochastically in their individual change trajectories, by permitting random variation in the individual growth parameters across people. These considerations suggest that the following level-2 model may be a useful specification for the inter-individual differences in change:

$$\begin{aligned}
\pi_{0i} &= \gamma_{00} + \gamma_{01}\,\mathrm{AFAM}_i + \zeta_{0i} \\
\pi_{1i} &= \gamma_{10} + \gamma_{11}\,\mathrm{AFAM}_i + \zeta_{1i}
\end{aligned} \tag{2}$$

Like all level-2 models, equation (2) has more than one component; but, taken together, they simultaneously treat the intercept ($\pi_{0i}$) and the slope ($\pi_{1i}$) of an individual’s growth trajectory as level-2 outcomes that are associated with predictors (here, AFAM). As in multiple regression analysis, we can modify the level-2 model to include other predictors, adding, for example, socio-economic status and gender. Each component of the level-2 model also has its own residual (here, symbolized by $\zeta_{0i}$ and $\zeta_{1i}$) that permits stochastic variation in the level-1 parameters, after the impact of the predictor has been accounted for. The stochastic part of the level-2 model allows the individual intercepts and slopes to differ across individuals, in the population.

The structural parts of the level-2 model in (2) contain four level-2 parameters, which we have labeled $\gamma_{00}$, $\gamma_{01}$, $\gamma_{10}$, and $\gamma_{11}$, that are known collectively as the fixed effects. These fixed effects capture the systematic inter-individual differences in change trajectories. Later, in our example, we estimate them all. In equation (2), $\gamma_{00}$ and $\gamma_{10}$ are level-2 intercepts; $\gamma_{01}$ and $\gamma_{11}$ are level-2 slopes. As in simple and multiple regression analysis, the level-2 slopes are of greater interest because they represent the effect of predictors (here, AFAM) on the individual growth parameters. We interpret the level-2 parameters much like linear regression coefficients, except that they describe variation in ‘outcomes’ that are themselves the level-1 individual growth parameters. For example, $\gamma_{00}$ represents the average true initial status (mathematics achievement in 7th grade) among White students in the population, while $\gamma_{01}$ represents the hypothesized population difference in average true initial status between African-American and White students. Similarly, $\gamma_{10}$ represents the average true annual rate of change in mathematics achievement for White students, in the population, while $\gamma_{11}$ represents the hypothesized population difference in average true annual rate of change between African-American and White students. The level-2 slopes, $\gamma_{01}$ and $\gamma_{11}$, then jointly capture the effects of AFAM. If either $\gamma_{01}$ or $\gamma_{11}$ is non-zero, the average population trajectories in true mathematics achievement differ between the two ethnic groups; on the other hand, if $\gamma_{01}$ and $\gamma_{11}$ are both 0, then the trajectories do not differ by race. These two level-2 slope parameters therefore address the following research question: What is the difference in the average trajectory of true change in mathematics achievement between White students and African-American students?

An important feature of both the level-1 and level-2 models is the presence of requisite stochastic terms: the residuals $\varepsilon_{ij}$ at level-1, and $\zeta_{0i}$ and $\zeta_{1i}$ at level-2. In the level-1 model, residual $\varepsilon_{ij}$ accounts for the difference between individual i’s true and observed value of the outcome, on occasion j. For our example, each level-1 residual represents that part of student i’s value of MATHACH at time j not predicted by his (or her) grade level. The level-2 residuals, $\zeta_{0i}$ and $\zeta_{1i}$, on the other hand, allow each person’s individual growth parameters to deviate from their relevant population averages. They represent those portions of the level-2 outcomes (the individual growth parameters) that remain ‘unexplained’ by the level-2 predictor(s). For our example, $\zeta_{0i}$ represents the difference between student i’s true mathematics achievement in 7th grade and the population average true mathematics achievement in 7th grade for this student’s racial group. Similarly, $\zeta_{1i}$ represents the difference between student i’s rate of true change in mathematics achievement and the population true slope for this student’s racial group.
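To make the two-level structure concrete, the following sketch simulates outcomes from the level-1 model in equation (1) and the level-2 model in equation (2). It is only an illustration under stated assumptions: the function name `simulate_growth_data`, the parameter values in the usage lines, and the choice of normally distributed residuals are all hypothetical choices made for this example, not estimates or code from the chapter.

```python
import numpy as np

def simulate_growth_data(n_students, afam, gammas, level2_cov, sigma_eps,
                         grades=(7, 8, 9, 10, 11), seed=0):
    """Simulate outcomes from the level-1/level-2 model for change.

    gammas:     (g00, g01, g10, g11), the four fixed effects.
    level2_cov: 2x2 covariance matrix of the level-2 residuals (zeta0, zeta1).
    sigma_eps:  standard deviation of the level-1 residual.
    Returns (pi0, pi1, y), where y has shape (n_students, number of waves).
    """
    rng = np.random.default_rng(seed)
    afam = np.asarray(afam, dtype=float)
    g00, g01, g10, g11 = gammas

    # Level-2 residuals: ASSUMED bivariate normal with mean zero.
    zeta = rng.multivariate_normal([0.0, 0.0], level2_cov, size=n_students)

    # Level-2 model: each student's own intercept and slope.
    pi0 = g00 + g01 * afam + zeta[:, 0]
    pi1 = g10 + g11 * afam + zeta[:, 1]

    # Level-1 model: outcome on each occasion, with grade centered on 7.
    time = np.asarray(grades, dtype=float) - 7
    eps = rng.normal(0.0, sigma_eps, size=(n_students, time.size))
    y = pi0[:, None] + pi1[:, None] * time[None, :] + eps
    return pi0, pi1, y

# Hypothetical parameter values, chosen only to exercise the function.
afam = np.array([0, 0, 1, 1])
pi0, pi1, y = simulate_growth_data(
    n_students=4, afam=afam, gammas=(50.0, -5.0, 3.0, -0.5),
    level2_cov=[[60.0, 6.0], [6.0, 3.0]], sigma_eps=6.0)
print(y.shape)  # one row per student, one column per wave
```

Setting `level2_cov` to all zeros and `sigma_eps` to zero removes every stochastic term, so each simulated record then lies exactly on the structural trajectory for that student’s group.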
As is the case with most residuals, we are usually less interested in their specific values than in their variability. Level-1 residual variance, $\sigma_\varepsilon^2$, for instance, summarizes the scatter of the level-1 residuals around each person’s true change trajectory, in the population. The level-2 residual variances, $\sigma_0^2$ and $\sigma_1^2$, summarize the population inter-individual variation in true individual intercept and slope around their averages that is left over after controlling for the effect(s) of any predictors included in the corresponding level-2 model. Conditional on adjusting for the impact of the level-2 predictors, therefore, $\sigma_0^2$ represents population residual variance in true initial status and $\sigma_1^2$ represents population residual variance in true annual rate of change, across all individuals in the population. The level-2 variance components therefore allow us to address the research question: how much heterogeneity in true initial status and true rate of change remains among students after accounting for the effects of race?

There is a final complication at level-2. In practice, it is entirely possible that there may be an association between initial status and rate of change across individuals in the population. For instance, students who begin 7th grade with higher mathematics achievement may have higher (or lower) rates of change. To allow for this possibility, we must permit the level-2 residuals to be correlated. Since $\zeta_{0i}$ and $\zeta_{1i}$ represent the deviations of the individual growth parameters from their population averages, their population covariance, $\sigma_{01}$, summarizes the association between true individual intercept and slope across all members of the population. Again because of their conditional nature, this population covariance, $\sigma_{01}$, summarizes the association between true initial status and true annual rate of change, controlling for race. This parameter then allows us to address the question: controlling for race, are the true mathematics achievement in 7th grade and the true rate of change in achievement related across students?

To fit any statistical model to data, including the multilevel model for change, we must make appropriate distributional assumptions about the residuals. At level-1, the situation is relatively simple. In the absence of evidence suggesting otherwise, we usually begin by invoking the classical normal-theory assumption that the level-1 residuals are independently and identically distributed with homoscedastic variance, $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$. At level-2, the presence of two (or sometimes more) residuals necessitates that we describe their underlying distribution using a bivariate (or multivariate) assumption, such as:

$$\begin{bmatrix} \zeta_{0i} \\ \zeta_{1i} \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{10} & \sigma_1^2 \end{bmatrix} \right) \tag{3}$$

This complete set of residual variances and covariances (both the level-1 residual variance, $\sigma_\varepsilon^2$, and the level-2 error variance-covariance matrix) is jointly referred to as the model’s variance components. Later, in our example, we estimate them all.

The composite multilevel model for change

This ‘level-1/level-2’ format is not the only way to specify the multilevel model for change. A more parsimonious representation results if you collapse the level-1 and level-2 models together into a single composite statistical model. The composite representation of the multilevel model for change, while identical to the level-1/level-2 specification mathematically, provides an alternative way of codifying hypotheses about change and is the specification utilized by many dedicated statistical software programs. To derive the composite specification (also known as the reduced form growth curve model) notice that any pair of linked level-1 and level-2 models share terms in common. Specifically, the individual growth parameters specified on the right-hand side of the ‘equals’ sign in the level-1 model become the outcomes on the left-hand side of the ‘equals’ sign in the level-2 model. We can therefore collapse the submodels together by substituting for $\pi_{0i}$ and $\pi_{1i}$ from the level-2 model in
equation (2) into the level-1 model in equation (1), as follows:

$$\begin{aligned}
Y_{ij} &= \pi_{0i} + \pi_{1i}\,\mathrm{TIME}_{ij} + \varepsilon_{ij} \\
&= (\gamma_{00} + \gamma_{01}\,\mathrm{AFAM}_i + \zeta_{0i}) + (\gamma_{10} + \gamma_{11}\,\mathrm{AFAM}_i + \zeta_{1i})\,\mathrm{TIME}_{ij} + \varepsilon_{ij}
\end{aligned} \tag{4}$$

where we have replaced the level-1 predictor, $(\mathrm{GRADE}_{ij} - 7)$, by the generic temporal representation, $\mathrm{TIME}_{ij}$, for simplicity. Multiplying out and rearranging terms yields the composite multilevel model for change:

$$Y_{ij} = [\gamma_{00} + \gamma_{10}\,\mathrm{TIME}_{ij} + \gamma_{01}\,\mathrm{AFAM}_i + \gamma_{11}(\mathrm{AFAM}_i \times \mathrm{TIME}_{ij})] + [\zeta_{0i} + \zeta_{1i}\,\mathrm{TIME}_{ij} + \varepsilon_{ij}] \tag{5}$$

where we once again use brackets to distinguish the model’s structural and stochastic components.

Even though the composite specification of the multilevel model for change in (5) appears more complex than the level-1/level-2 specification, the two forms are logically and mathematically equivalent. The level-1/level-2 specification is more substantively appealing; the composite specification is algebraically more parsimonious. In addition, the fixed effects (the $\gamma$’s) capture the patterns of change in the ways that we have described, but they function in the composite model in a different way. Rather than first postulating how MATHACH is related to TIME and individual growth parameters, and second how the individual growth parameters are related to AFAM, the composite specification postulates that MATHACH depends simultaneously on: (1) the level-1 predictor, TIME; (2) the level-2 predictor, AFAM; and (3) their cross-level interaction, AFAM × TIME. From this perspective, the composite model’s structural portion resembles a multiple regression model with two predictors, TIME and AFAM, that appear as both main effects (associated with parameters $\gamma_{10}$ and $\gamma_{01}$, respectively) and in a cross-level interaction (associated with parameter $\gamma_{11}$).

How did this cross-level interaction arise, when the level-1/level-2 specification of the multilevel model for change appears to have no similar term? Its genesis is in the ‘multiplying-out’ procedure used to generate the composite model. When we substitute the level-2 model for individual growth parameter $\pi_{1i}$ into its appropriate position in the level-1 model, level-2 parameter $\gamma_{11}$, previously associated only with level-2 predictor AFAM, gets multiplied by level-1 predictor TIME. In the composite model, then, this parameter becomes associated with the interaction term, AFAM × TIME. This association makes perfect sense if you consider the following logic. When $\gamma_{11}$ is different from zero in the level-1/level-2 specification, the slopes of the true change trajectories differ according to values of AFAM. In other words, the effect of TIME (whose effect is represented by the slopes of the change trajectories) differs by race. Generically, when the effects of one predictor (here, TIME) differ by the levels of another predictor (here, AFAM), we say that the two predictors interact. The cross-level interaction in the composite specification codifies this effect, modeling any difference in the average rate of true change in mathematics achievement between African-American and White students.

Another distinctive feature of the composite model is its ‘composite residual,’ the three terms in the second set of brackets on the right-hand side of equation (5) that combine together the effects of the single level-1 residual and the two level-2 residuals that appeared in the earlier level-1/level-2 specification:

Composite residual: $\zeta_{0i} + \zeta_{1i}\,\mathrm{TIME}_{ij} + \varepsilon_{ij}$

Even though the components that make up the composite residual have the same meaning under both the level-1/level-2 and composite specifications of the multilevel model for change, the composite residual provides valuable insight into our assumptions about the behavior of residuals over time in longitudinal data. Instead of being a simple sum, the second level-2 residual, $\zeta_{1i}$, in the composite
residual is multiplied by level-1 predictor, TIME. Despite this unusual construction, the interpretation of the composite residual is straightforward: it describes the difference between the observed and predicted value of Y for individual i on occasion j. Inspection of the mathematical form of the composite residual, however, reveals two important properties of the occasion-specific residuals not readily apparent in the level-1/level-2 specification for the multilevel model for change: the composite residuals can be both autocorrelated and heteroscedastic within-person. Fortunately, these are exactly the kinds of properties that you would expect among residuals associated with repeated measurements of a changing outcome over time, within-person.

When residuals are heteroscedastic, the unexplained portions of each person’s outcome have unequal variances from occasion to occasion. Even though heteroscedasticity has many roots, one cause is the effects of omitted predictors, that is, the consequences of failing to include variables that are, in fact, related to the outcome. Because their effects have nowhere else to go, they are bundled together, by default, into the residuals. If their impact differs across occasions, the residual’s magnitude may differ as well, creating heteroscedasticity. The composite model allows for heteroscedasticity via the level-2 residual $\zeta_{1i}$. Because $\zeta_{1i}$ is multiplied by TIME in the composite residual, its contribution can differ (linearly, at least, in a linear level-1 submodel) across occasions. If there are systematic differences in the magnitudes of the composite residuals across occasions, there will be accompanying differences in residual variance, and hence heteroscedasticity.

When residuals are autocorrelated, the unexplained portions of each person’s outcome are correlated with each other across repeated occasions. Once again, omitted predictors, whose effects are bundled into the residuals, are a common cause of this phenomenon. Because their effects may be present identically in each residual over time, an individual’s residuals may become linked across occasions. The presence of the time-invariant level-2 residuals, $\zeta_{0i}$ and $\zeta_{1i}$, in each of the composite residuals defined in equation (5) allows them to be autocorrelated. Because they have only an ‘i’ subscript (and no ‘j’), they feature identically in each individual’s composite residual on every occasion, generating the required autocorrelation across time.

Fitting the multilevel model for change to data

Many different statistical software programs can be used to fit the multilevel model for change to data. Some are specialized packages written expressly for this purpose (such as HLM, MLwiN, and MIXREG). Others are part of popular multipurpose software packages including SAS (PROC MIXED and PROC NLMIXED), SPSS (MIXED), STATA (xtmixed, xtreg, and gllamm) and SPLUS (NLME). At their core, each program does the same job: it fits the hypothesized multilevel model for change to data and generates parameter estimates, measures of precision, diagnostics, and so on. All of the different packages tend to produce the same, or very similar, answers to a given problem, regardless of their method of model-fitting and parameter-estimation (Kreft and De Leeuw, 1998). So, in one sense, it does not matter which computer program you choose for your data analysis. But the packages do differ in many other important ways, including the ‘look and feel’ of their interfaces, their ways of entering and pre-processing data, their approach to model specification (whether they require the multilevel model for change be specified in the level-1/level-2 or composite formats), their estimation methods (e.g. full vs. restricted maximum likelihood methods), their strategies for hypothesis testing, and their provision of diagnostics. It is beyond the scope of this chapter to discuss these details. Instead, we illustrate some of them by turning to the results of fitting the multilevel model for change that we have specified above to data on our example, using SAS
Table 22.1 Results of fitting a multilevel model for change to data (n = 1,322). This model predicts mathematics achievement between grades 7 and 11 as a function of (GRADE − 7) at level-1 and race (AFAM) at level-2

| | Parameter | Symbol | Estimate | (s.e.) |
|---|---|---|---|---|
| Fixed effects | | | | |
| Initial status, $\pi_{0i}$ | Intercept | $\gamma_{00}$ | 53.02*** | (0.26) |
| | AFAM | $\gamma_{01}$ | −5.93*** | (0.80) |
| Rate of change, $\pi_{1i}$ | Intercept | $\gamma_{10}$ | 2.87*** | (0.08) |
| | AFAM | $\gamma_{11}$ | −0.48* | (0.23) |
| Variance components | | | | |
| Level-1 | Within-person, $\varepsilon_{ij}$ | $\sigma_\varepsilon^2$ | 37.17*** | (0.86) |
| Level-2 | In initial status, $\zeta_{0i}$ | $\sigma_0^2$ | 59.05*** | (3.23) |
| | In rate of change, $\zeta_{1i}$ | $\sigma_1^2$ | 3.19*** | (0.29) |
| | Covariance between $\zeta_{0i}$ and $\zeta_{1i}$ | $\sigma_{01}$ | 6.18*** | (0.69) |

∼ p < 0.10; * p < 0.05; *** p < 0.001
Note: Full ML, SAS Proc Mixed.
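The estimates in Table 22.1 can be turned into concrete quantities with a few lines of arithmetic. The sketch below, offered only as an illustration, substitutes the fixed-effect estimates into the level-2 and level-1 models to obtain fitted trajectories for the two prototypical students, and then uses the variance-component estimates to compute the variance of the composite residual at each grade. The second calculation assumes that the level-1 residual is independent of the level-2 residuals, in which case the variance of $\zeta_{0i} + \zeta_{1i}\,\mathrm{TIME}_{ij} + \varepsilon_{ij}$ is $\sigma_0^2 + 2\sigma_{01}\mathrm{TIME} + \sigma_1^2\mathrm{TIME}^2 + \sigma_\varepsilon^2$. The function and constant names are hypothetical, introduced here for the example.

```python
# Estimates copied from Table 22.1 (full ML).
GAMMA_00, GAMMA_01 = 53.02, -5.93  # initial status: intercept, AFAM
GAMMA_10, GAMMA_11 = 2.87, -0.48   # rate of change: intercept, AFAM
VAR_EPS, VAR_0, VAR_1, COV_01 = 37.17, 59.05, 3.19, 6.18

def fitted_trajectory(afam):
    """Return (initial status, annual rate) for a prototypical student."""
    return GAMMA_00 + GAMMA_01 * afam, GAMMA_10 + GAMMA_11 * afam

def fitted_score(afam, grade):
    """Fitted mathematics achievement at a grade, with time = grade - 7."""
    pi0, pi1 = fitted_trajectory(afam)
    return pi0 + pi1 * (grade - 7)

def composite_residual_variance(time):
    """Variance of zeta0 + zeta1*time + eps implied by the estimates.

    ASSUMES the level-1 residual is independent of the level-2 residuals.
    """
    return VAR_0 + 2 * COV_01 * time + VAR_1 * time ** 2 + VAR_EPS

for afam, label in [(0, "White"), (1, "African-American")]:
    pi0, pi1 = fitted_trajectory(afam)
    gain = fitted_score(afam, 11) - fitted_score(afam, 7)
    print(f"{label}: grade-7 status = {pi0:.2f}, "
          f"annual rate = {pi1:.2f}, gain 7->11 = {gain:.2f}")

for grade in range(7, 12):
    variance = composite_residual_variance(grade - 7)
    print(f"grade {grade}: composite residual variance = {variance:.2f}")
```

Because the estimated covariance and slope variance are both positive, the implied composite residual variance grows with each grade, which is one concrete form the heteroscedasticity discussed for the composite residual can take.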
and that there is a statistically significant difference in the average true mathematics achievement of White students compared with their African-American peers.

Next examine the second part of the fitted model, for the annual rate of true change. In the population from which this sample was drawn, we estimate that the annual rate of true change in mathematics achievement for the average White student is 2.87 points per year; for the average African-American student, we estimate it to be nearly half a point lower (at 2.39). In rejecting (at the 0.001 level) the null hypothesis on $\gamma_{10}$, we conclude that the average White student experienced a statistically significant increase in true mathematics achievement over time. Because we also reject (at the 0.05 level) the null hypothesis on $\gamma_{11}$, we conclude that differences between African-American and White students in their annual rates of true change are also statistically significant. The estimated mathematics achievement for the average White student increased 11.48 points from 7th grade to 11th grade, while the increase for African-American students was about two points lower (9.56). African-American students begin 7th grade with lower average mathematics achievement than their White counterparts, and the achievement gap increases over time.

Another way of interpreting the estimated fixed effects is to plot fitted trajectories for prototypical individuals. For this particular model, only two prototypes are possible: an African-American student (AFAM = 1) and a White student (AFAM = 0). Substituting these predictor values into equation (6) yields the estimated initial status and annual growth rates for each:

$$\begin{aligned}
\text{When AFAM} = 0:\quad & \hat{\pi}_{0i} = 53.02 - 5.93(0) = 53.02 \\
& \hat{\pi}_{1i} = 2.87 - 0.48(0) = 2.87 \\
\text{When AFAM} = 1:\quad & \hat{\pi}_{0i} = 53.02 - 5.93(1) = 47.09 \\
& \hat{\pi}_{1i} = 2.87 - 0.48(1) = 2.39
\end{aligned} \tag{7}$$

We then substitute these estimates into the hypothesized level-1 model in equation (1) to obtain the fitted individual change trajectories:

$$\begin{aligned}
\text{When AFAM} = 0:\quad & \hat{Y}_{ij} = 53.02 + 2.87(\mathrm{GRADE}_{ij} - 7) \\
\text{When AFAM} = 1:\quad & \hat{Y}_{ij} = 47.09 + 2.39(\mathrm{GRADE}_{ij} - 7)
\end{aligned} \tag{8}$$

These fitted trajectories are plotted in the right-hand panel of Figure 22.1, and reinforce the numeric conclusions articulated above. In comparison to White students, the average African-American student has lower mathematics achievement in 7th grade and a slower rate of increase in mathematics achievement.

The estimated variance components assess the amount of outcome variability left, at either level-1 or level-2, after including the specified predictors. Because the variance components are harder to interpret in absolute terms, many researchers rely on the associated hypothesis tests, for at least they provide some benchmark for comparison. Some caution is necessary, however, because a null hypothesis on a variance necessarily falls at the border of the available parameter space (by definition, variances cannot be negative) and as a result, the asymptotic distributional properties that hold in simpler settings may not apply (Snijders and Bosker, 1999). The level-1 residual variance, $\sigma_\varepsilon^2$, summarizes the population variability in an average person’s outcome values around his or her own true change trajectory. Its estimate here is 37.17. Rejection of the associated null hypothesis test (at the 0.001 level) suggests the existence of additional outcome variation at level-1 (within-person) that may be predictable in subsequent analyses by time-varying predictors other than time itself.

The level-2 variance components, $\sigma_0^2$ and $\sigma_1^2$, summarize the variability in true initial status and rate of true change that remains after controlling for level-2 predictors (here, AFAM). Tests associated with these variance
components evaluate whether there is any related to predictors; and (2) multiple kinds of
remaining residual outcome variation that effects, both the fixed effects and the variance
could potentially be explained by further components. Hypothesizing a level-1 linear
predictors at level-2. For these data, we reject individual growth model has provided two
both of these null hypotheses (at the 0.001 level-2 outcomes; a more complex level-1
level). Because these are level-2 variance submodel specification may provide more.
components (describing the residual variation One simple strategy in specifying the level-2
in true initial status and rate of true change), models is to include each level-2 predictor
we would consider adding further time- simultaneously in all level-2 submodels.
invariant predictors to the multilevel model However, as we show below, they need not
for change. Finally, let’s turn to the level-2 all remain. Each individual growth parameter
covariance component, σ01 . Since we reject can have its own predictors at level-2,
the null hypothesis on this parameter too, we and one goal of model specification is to
can conclude that the intercepts and slopes identify which level-2 variables are important
of the individual true change trajectories are predictors of which level-1 individual growth
indeed correlated in the population, control- parameters. So, too, although each level-2
ling for student race—there is a positive submodel may contain both fixed and random
association between true initial status and effects, both are not necessarily required.
annual rate of true change, once the effects Sometimes hypothesizing a model that has
of AFAM have been removed. On average, fewer random effects will provide a more
African-American and White students who parsimonious representation of the data and
have higher true mathematics achievement in clearer substantive insights into the research
7th grade also have greater rates of increase questions being posed.
in true mathematics achievement between 7th In the data-analytic example that follows,
and 11th grade. we continue to ask whether race has an
impact on change in mathematics achieve-
ment between 7th grade and 11th grade, but
Adding further predictors to the
we now expand our analyses to include socio-
multilevel model for change
economic status and gender as important
Our discussion to this point has focused on controls. Model B in Table 22.2 includes
developing the foundation for understanding SES as a level-2 predictor of both true initial
the multilevel model for change by comparing status and rate of true change in mathematics
the average trajectories of two populations achievement. Model C then removes the effect
of students, African-American and White. of race on the rate of true change. In Model D,
We have seen that true change in both the effect of FEMALE on both true initial
groups is positive, on average, with White status and rate of true change is included in
students enjoying a more rapid increase in true the level-2 model, and in Model E, the model
mathematics achievement over time. How- is again simplified by removing the effect of
ever, through the analysis of the associated FEMALE on the rate of true change.
variance components, we have found that
heterogeneity remains at level-1 and in the Interpreting the additional fitted
true intercepts and slopes, even after the models
effects of time and race have been partialled
out. This suggests that it is important to We have already discussed fitted Model A,
consider the addition of further predictors to which includes AFAM as a predictor of both
the model. Here, as we fit selected additional true initial status and rate of true change. In
models, it is important to remain aware of Model B, we now add SES to the level-2
the complexities involved, of which there are: model, including it as a predictor of both
(1) multiple level-2 outcomes (the individual true initial status and rate of true change.
growth parameters), each of which can be There are therefore now six fixed effects
Table 22.2 Results of fitting a taxonomy of multilevel models for change to the mathematics achievement data (n = 1,322)

Parameter                        Model A    Model B    Model C    Model D    Model E
Fixed effects
Initial status  Intercept  γ00   53.02***   52.81***   52.82***   52.39***   52.40***
                                 (0.26)     (0.25)     (0.25)     (0.35)     (0.35)
                AFAM       γ01   −5.93***   −4.66***   −4.77***   −4.80***   −4.80***
                                 (0.80)     (0.77)     (0.77)     (0.77)     (0.77)
                SES        γ02              3.62***    3.61***    3.62***    3.62***
                                            (0.34)     (0.34)     (0.34)     (0.34)
                FEMALE     γ03                                    0.84∼      0.82∼
                                                                  (0.48)     (0.48)
Rate of change  Intercept  γ10   2.87***    2.85***    2.81***    2.85***    2.81***
                                 (0.08)     (0.08)     (0.07)     (0.11)     (0.07)
                AFAM       γ11   −0.48*     −0.35
                                 (0.23)     (0.24)
                SES        γ12              0.37***    0.40***    0.40***    0.40***
                                            (0.10)     (0.10)     (0.10)     (0.10)
                FEMALE     γ13                                    −0.08
                                                                  (0.146)
Variance components
Level-1: Within-person     σε²   37.17***   37.17***   37.16***   37.17***   37.16***
                                 (0.86)     (0.86)     (0.86)     (0.86)     (0.86)
Level-2: In initial status σ0²   59.05***   52.46***   52.46***   52.30***   52.30***
                                 (3.23)     (2.98)     (2.98)     (2.97)     (2.97)
         In rate of change σ1²   3.19***    3.13***    3.14***    3.14***    3.14***
                                 (0.29)     (0.29)     (0.29)     (0.29)     (0.29)
         Covariance        σ01   6.18***    5.50***    5.50***    5.51***    5.51***
                                 (0.69)     (0.66)     (0.66)     (0.66)     (0.66)

∼ p < 0.10; * p < 0.05; *** p < 0.001
Note: Full ML, SAS Proc Mixed.
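
When the variance components in Table 22.2 are compared across models, one simple summary is the proportional change between two table entries. The sketch below transcribes the estimates from the table and computes that proportion. The dictionary layout and the function name `proportional_reduction` are illustrative choices, not part of the chapter or of SAS Proc Mixed.

```python
# Variance-component estimates transcribed from Table 22.2.
# The data layout and function name are hypothetical, chosen for illustration.
VARIANCE_COMPONENTS = {
    "A": {"within_person": 37.17, "initial_status": 59.05, "rate_of_change": 3.19},
    "B": {"within_person": 37.17, "initial_status": 52.46, "rate_of_change": 3.13},
    "C": {"within_person": 37.16, "initial_status": 52.46, "rate_of_change": 3.14},
    "D": {"within_person": 37.17, "initial_status": 52.30, "rate_of_change": 3.14},
    "E": {"within_person": 37.16, "initial_status": 52.30, "rate_of_change": 3.14},
}


def proportional_reduction(baseline: str, comparison: str, component: str) -> float:
    """Share of the baseline model's variance component that is absent in the comparison model."""
    before = VARIANCE_COMPONENTS[baseline][component]
    after = VARIANCE_COMPONENTS[comparison][component]
    return (before - after) / before


if __name__ == "__main__":
    for component in ("within_person", "initial_status", "rate_of_change"):
        change = proportional_reduction("A", "B", component)
        print(f"Model A -> Model B, {component}: {100 * change:.1f}% reduction")
```

Read this way, adding SES in Model B leaves the within-person component untouched and removes about 11 percent of the level-2 variance in initial status, while the variance in rate of change barely moves.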
to interpret. We begin by interpreting the parameter estimates in the fitted level-2 submodel for initial status. The estimated intercept for this first part of the level-2 model provides an estimate of true initial mathematics achievement when all predictors in that part of the level-2 model are set to zero. As we know, when AFAM equals 0 we are dealing with White students. SES equals 0 for students of average socio-economic status since this measure was standardized in preliminary analysis to a mean of zero. Therefore, we estimate that the average 7th grade mathematics achievement for White students of average socio-economic status is 52.81. The next parameter, γ01, represents the effect of race on true initial status, controlling for socio-economic status. Here, we estimate that, controlling for the effects of SES, the true mathematics achievement of the average African-American 7th grader is 4.66 points lower than that of the average White 7th grader (p<0.001). Therefore, while the effect of AFAM is slightly attenuated by controlling for SES (from −5.93 in Model A to −4.66), there remains a statistically significant effect of race on 7th grade true mathematics achievement. The final parameter in the level-2 submodel for true initial status is γ02, representing the effect of SES, controlling for race. This parameter describes the difference in 7th grade true mathematics achievement for a one-unit difference in SES, for students of either race. We are not surprised to find a positive effect of SES: controlling for race, we estimate that average true mathematics achievement is 3.62 points higher for students whose SES is one point greater (p<0.001).

Turning now to parameter estimates associated with the rate of true change in Model B, we find that the estimated rate of true change in mathematics achievement for White students of average SES is 2.85 (p<0.001). While adding SES to the slope submodel has barely changed its estimated intercept (2.87 in Model A), the effect of AFAM, while still negative, is no longer statistically significant. Controlling for SES, the average rate of true change no longer differs for African-American and White students. Our final parameter estimate, γ̂12, represents the estimated effect of SES itself, controlling for race. Again we are not surprised that this estimate is positive: for students of either race, on average, those with SES one point higher have true growth rates that are 0.37 points per year greater (p<0.001).

Now examine the variance components associated with Model B. The statistically significant within-person variance component (σ̂ε²) for Model B is identical to that of Model A, reinforcing the need to explore the potential inclusion of time-varying predictors at level-1. We anticipated stability like this in our estimates because we have added no additional predictors at level-1 between Models A and B (although estimates may vary inadvertently because of uncertainties arising from iterative estimation). The estimated level-2 variance components, however, do differ: σ̂0² declines by 11.2 percent from Model A (from 59.05 to 52.46). Because it is still statistically significant, however, potentially explainable residual variation in true initial status remains. The estimated variation in rate of true change declines only minimally from 3.19 to 3.13, and also remains statistically significant, suggesting the continued presence of explainable residual variation in rates of true change.

Because the average rate of true change in mathematics achievement does not differ for African-American and White students once SES is controlled, in Model C we remove AFAM as a predictor of rate of true change, while retaining it as a predictor of true initial status. The parameter estimates associated with both the fixed and random effects are essentially unchanged with the removal of AFAM as a predictor of rate of true change. To include the effect of gender, we use a similar approach, first adding FEMALE as a predictor of both true initial status and true slope (Model D), then, because we find no differences in the average rate of true change for girls and boys, removing FEMALE as a predictor of rate of true change (Model E).

In interpreting Model E, we begin again by interpreting the model’s fixed effects. With FEMALE now a predictor of true initial status, the interpretation of the intercept term
for the initial status submodel has changed yet again. Now γ00 represents the average true mathematics achievement for White, male (FEMALE = 0) students of average SES. Therefore, we estimate that the average 7th grade mathematics achievement for White male students of average socio-economic status is 52.40. The next parameter, γ01, models the effect of race on true initial status, controlling for socio-economic status and now gender as well. We estimate that the 7th grade mathematics achievement of an African-American male from an average socio-economic background is 4.80 points lower than that of a comparable White student (p<0.001). The effect of AFAM is essentially unchanged when we control for FEMALE in addition to SES. Similarly, the effect of SES on true initial status does not change when controlling for FEMALE. The final parameter in the level-2 submodel for true initial status is γ03, representing the effect of FEMALE, controlling for race and socio-economic status. Average 7th grade mathematics achievement is almost one point higher (0.82) for girls than boys of comparable race and socio-economic status, but since the p-value is slightly larger than 0.05, the effect is not statistically significant at the conventionally accepted 0.05 level. Nevertheless, we choose to retain FEMALE as a predictor of true initial status in our model, given its substantive importance as a predictor of mathematics achievement and the fact that the p-value is less than 0.10.

In Model E, FEMALE is not a predictor of rate of true change, a substantively interesting finding that suggests that the rate of change in mathematics achievement from 7th to 11th grade does not differ by gender. Our rate of true change submodel now includes only SES as a predictor. We estimate that the rate of true change in mathematics achievement for a student of average SES is 2.81 (p<0.001), and that students whose SES is one unit higher have rates of change in mathematics achievement that are greater by 0.40 points per year (p<0.001).

Finally, examining the associated variance components for Model E, we see that the only estimate that has changed is the estimated variation in true initial status, which has declined only slightly from 52.46 in Model C to 52.30 in Model E. Because all of the variance components remain statistically significant, potentially explainable residual variation in true initial status and rate of true change remains for future consideration.

Displaying prototypical trajectories of change

For longitudinal analyses, we find that graphs of fitted trajectories for prototypical individuals are more powerful tools than numerical summaries for communicating our findings. In Figure 22.1, we presented plots of fitted individual growth trajectories for prototypical African-American and White students, using the estimates of the fixed effects from Model A to obtain estimates of true initial status and rate of true change for the two populations of students (equation (7)). We can extend these strategies to models with multiple predictors, as we have in Model E.

Figure 22.2 presents fitted trajectories derived from Model E for four prototypical students: African-American and White students of different SES. We have selected prototypical values of SES that correspond to the sample mean plus and minus one standard deviation (0.735 and −0.693, respectively)
Table 22.3 Fitted values of the individual growth parameters from Model E for four prototypical individuals

AFAM              SES    Initial status (π̂0i)                                      Rate of change (π̂1i)
White             Low    52.401 − 4.798(0) + 3.616(−0.693) + 0.818(1) = 50.713     2.808 + 0.395(−0.693) = 2.534
White             High   52.401 − 4.798(0) + 3.616(0.735) + 0.818(1) = 55.877      2.808 + 0.395(0.735) = 3.098
African-American  Low    52.401 − 4.798(1) + 3.616(−0.693) + 0.818(1) = 45.915     2.808 + 0.395(−0.693) = 2.534
African-American  High   52.401 − 4.798(1) + 3.616(0.735) + 0.818(1) = 51.079      2.808 + 0.395(0.735) = 3.098
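
The arithmetic in Table 22.3 can be reproduced with a few lines of code. The sketch below plugs the Model E fixed-effect estimates, to the three decimals printed in the table, into the two level-2 equations and then projects each prototypical trajectory forward. It assumes TIME is coded as grade minus 7, which is how 'initial status' at 7th grade is read here. All names (`GAMMA`, `growth_parameters`, `fitted_score`, `ses_gap`) are hypothetical.

```python
# Model E fixed-effect estimates as printed in Table 22.3 (three decimals).
# Names below are illustrative; they do not come from the chapter.
GAMMA = {
    "initial_intercept": 52.401,
    "initial_afam": -4.798,
    "initial_ses": 3.616,
    "initial_female": 0.818,
    "rate_intercept": 2.808,
    "rate_ses": 0.395,
}
SES_LOW, SES_HIGH = -0.693, 0.735  # sample mean minus/plus one standard deviation


def growth_parameters(afam: int, ses: float, female: int = 1):
    """Fitted initial status and annual rate of change for one prototypical student."""
    initial = (GAMMA["initial_intercept"] + GAMMA["initial_afam"] * afam
               + GAMMA["initial_ses"] * ses + GAMMA["initial_female"] * female)
    rate = GAMMA["rate_intercept"] + GAMMA["rate_ses"] * ses
    return initial, rate


def fitted_score(afam: int, ses: float, grade: int, female: int = 1) -> float:
    """Fitted achievement at a grade, assuming TIME = grade - 7."""
    initial, rate = growth_parameters(afam, ses, female)
    return initial + rate * (grade - 7)


def ses_gap(grade: int, afam: int = 0) -> float:
    """High-SES minus low-SES fitted achievement within one racial group."""
    return fitted_score(afam, SES_HIGH, grade) - fitted_score(afam, SES_LOW, grade)


if __name__ == "__main__":
    for label, afam in (("White", 0), ("African-American", 1)):
        for ses_label, ses in (("Low", SES_LOW), ("High", SES_HIGH)):
            initial, rate = growth_parameters(afam, ses)
            print(f"{label:17s} {ses_label:5s} initial={initial:.3f} rate={rate:.3f}")
    print(f"SES gap, grade 11 relative to grade 7: {ses_gap(11) / ses_gap(7):.2f}")
```

Because race enters only the initial status equation, the gap between high and low SES students is the same within either racial group, and under these estimates it is roughly 44 percent larger in 11th grade than in 7th grade.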
[Figure 22.2: line plot of estimated math achievement (vertical axis, 40 to 70) against grade (horizontal axis, 7 to 11), with four fitted trajectories: White, high SES; African-American, high SES; White, low SES; African-American, low SES.]
Figure 22.2 Fitted growth trajectories for prototypical African-American and White students
of high and low socio-economic backgrounds
and chose to present trajectories for females only. Since the gender effect is small, the plot would be essentially identical for males. We compute the fitted values of the individual growth parameters for these prototypical individuals as shown in Table 22.3.

Notice that the fitted trajectories of mathematics achievement differ by both race and socio-economic status, as anticipated. At each level of SES, the fitted trajectory for White students is consistently elevated above that of African-American students, and the differential in mathematics achievement between White and African-American students of the same SES does not differ over time. The effect of SES is more complex. Within racial groups, the trajectory for students of high socio-economic status is above that of students of low socio-economic status across all grades. Furthermore, the increase in mathematics achievement over time is more rapid for the high socio-economic status students than for their low socio-economic status peers, with the difference in estimated mathematics achievement between high and low SES 11th graders of the same race over 40 percent higher than the difference between these students in 7th grade.

Extensions of the multilevel model for change

While it permits considerable complexity in analysis, as evidenced in Table 22.2, the example that we have presented in this chapter has two structural features that simplify analysis. The example is both balanced and time-structured: all students are assessed on exactly five occasions and these occasions (7th grade to 11th grade) are identical across individuals. Our analyses are also straightforward in that we have used only: (1) time-invariant predictors that describe immutable characteristics of the students (except for TIME itself); and (2) a representation of TIME that forces the level-1 individual growth parameters to represent ‘initial status’ and ‘linear rate of change.’ However, the multilevel model for change is very flexible and can be used to address more complex problems, as we now describe.

Variably spaced measurement occasions

Researchers often collect longitudinal data in which the actual measurement occasions differ across individuals. These differences may result from the realities of fieldwork and
data collection. For example, when studying the psychological consequences of unemployment, Ginexi et al. (2000) designed a time-structured study, with interviews scheduled at 1, 5, and 11 months after job loss. Once in the field, however, the interview times varied substantially around these targets, so Ginexi and colleagues chose to use the number of days since job loss as a metric for the measurement of time in their study. Each individual in their study, therefore, had a unique data collection schedule: 31, 150, and 365 days for the first person in the dataset; 23, 162, and 401 days for the second person; and so on.

Differences in the actual measurement occasions across individuals may also occur by design. This is the case, for example, in accelerated cohort or accelerated longitudinal designs, in which multiple cohorts of different ages are followed longitudinally. Each cohort must have at least one age that overlaps with another cohort and then a single growth trajectory is estimated, extending from the youngest age to the oldest (Collins, 2006). The advantage of an accelerated cohort design is that change can be modeled over a longer temporal period using fewer waves of data. The disadvantage is that the researcher must rely more heavily on assumptions about the shape of the change trajectory. Miyazaki and Raudenbush (2000) discuss important assumptions of the analysis of data from accelerated longitudinal designs.

Varying numbers of measurement occasions

A major advantage of the multilevel model for change is that it is easily fit to unbalanced data. In our mathematics achievement data, the analytic sample used included only students with five waves of data; however, in the original dataset there are many additional students with fewer waves of data. It is straightforward to fit the multilevel model for change in the larger unbalanced dataset. With severely unbalanced datasets, however, there can be problems of convergence in the iterative methods used by standard computer packages to fit the models to the data. Practical problems that may arise when analyzing such datasets are described in Singer and Willett (2003).

The impact of time-varying predictors

A time-varying predictor is a variable whose values may differ over time. Some time-varying predictors have values that change naturally; others have values that change by design. For example, in the mathematics achievement data, students’ attitudes toward mathematics change naturally over time. We would expect students with more positive attitudes about mathematics to also have higher levels of mathematics achievement. In specifying a multilevel model for change that includes a time-varying predictor, we add the time-varying predictor to the level-1 submodel either as a main effect or as an interaction with time, or both. Thus, conceptually, we may still interpret the effects of the time-varying predictor in terms of its impact on true initial status and/or rate of true change. However, since the time-varying predictor is added to our level-1 submodel, we can also specify any additional main effect and interaction with time as either a fixed or a random effect, thereby allowing us to investigate whether these effects are constant or vary across members of the population.

While time-varying predictors offer exciting analytic possibilities to researchers, many present interpretive difficulties stemming from the problem of reciprocal causation (endogeneity), as in the case of our example of mathematics achievement and attitudes toward mathematics: if X is correlated with Y, can you conclude that X causes Y or is it possible that Y causes X? To address this problem it is important to first assess whether inferences are clouded by reciprocal causation. Second, if your data allow, consider coding time-varying predictors so that their values in each record of the person-period dataset refer to the previous point in chronological time.

Modeling discontinuous individual change

Not all individual change trajectories are continuous functions of time. If you believe
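
Returning to the suggestion about coding time-varying predictors so that each record refers to the previous point in time, the following is a minimal sketch of what that recoding could look like in a person-period dataset. The field names (`id`, `wave`, `attitude`, `math`), the function name `lag_predictor`, and the data values are all hypothetical and are not taken from the mathematics achievement data.

```python
from typing import Any, Dict, List


def lag_predictor(records: List[Dict[str, Any]], person_key: str = "id",
                  time_key: str = "wave", predictor_key: str = "attitude"
                  ) -> List[Dict[str, Any]]:
    """Copy each person-period record and add the predictor's value from that
    person's previous occasion (None on the person's first occasion)."""
    lagged = []
    previous_value: Dict[Any, Any] = {}
    # Sort so that each person's records are visited in chronological order.
    for record in sorted(records, key=lambda r: (r[person_key], r[time_key])):
        row = dict(record)
        row[predictor_key + "_lag"] = previous_value.get(record[person_key])
        previous_value[record[person_key]] = record[predictor_key]
        lagged.append(row)
    return lagged


# Hypothetical person-period data: one record per student per grade.
EXAMPLE = [
    {"id": 1, "wave": 8, "attitude": 3.5, "math": 54.0},
    {"id": 1, "wave": 7, "attitude": 3.0, "math": 50.0},
    {"id": 2, "wave": 7, "attitude": 2.0, "math": 47.0},
    {"id": 2, "wave": 9, "attitude": 2.5, "math": 52.0},
]

if __name__ == "__main__":
    for row in lag_predictor(EXAMPLE):
        print(row)
```

The lag here is taken from the previous available record for the same person, so with unbalanced data (person 2 above has no wave 8) the interval covered by the lag varies. Whether that is acceptable depends on the research question.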
including Diggle et al. (2002); Fitzmaurice et al. (2004); Hedeker and Gibbons (2006); Raudenbush and Bryk (2002); Singer and Willett (2003); Snijders and Bosker (1999); Verbeke and Molenberghs (2000); Walls and Schafer (2006); and Weiss (2005).

REFERENCES

Ai, X. (2002). Gender differences in growth in mathematics achievement: Three-level longitudinal and multilevel analyses of individual, home, and school influences. Mathematical Thinking and Learning, 4, 1–22.
Collins, L. M. (2006). Analysis of longitudinal data: The integration of theoretical model, temporal design, and statistical model. Annual Review of Psychology, 57, 505–528.
Cronbach, L. J., & Furby, L. (1970). How should we measure ‘change’: or should we? Psychological Bulletin, 74, 68–80.
Curran, P. J. (2003). Have multilevel models been structural equation models all along? Multivariate Behavioral Research, 38, 529–569.
Curran, P. J., Stice, E., & Chassin, L. (1997). The relation between adolescent and peer alcohol use: A longitudinal random coefficients model. Journal of Consulting and Clinical Psychology, 65, 130–140.
Diggle, P., Liang, K.-Y., & Zeger, S. (2002). Analysis of longitudinal data, 2nd edition. New York, NY: Oxford.
Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2004). Applied longitudinal analysis. New York, NY: Wiley.
Ginexi, E. M., Howe, B. W., & Caplan, R. D. (2000). Depression and control beliefs in relation to reemployment: What are the directions of effect? Journal of Occupational Health Psychology, 5, 323–336.
Hedeker, D., & Gibbons, R. D. (2006). Longitudinal data analysis. New York, NY: Wiley.
Keiley, M. K., Bates, J. E., Dodge, K. A., & Pettit, G. S. (2000). A cross-domain growth analysis: Externalizing and internalizing behavior during 8 years of childhood. Journal of Abnormal Child Psychology, 28, 161–179.
Kreft, I. G. G., & de Leeuw, J. (1998). Introducing multilevel modeling. Thousand Oaks, CA: Sage.
Miller, J. D., Kimmel, L., Hoffer, T. B., & Nelson, C. (2000). Longitudinal study of American youth: User’s manual. Chicago, IL: International Center for the Advancement of Scientific Literacy, Northwestern University.
Miyazaki, Y., & Raudenbush, S. W. (2000). Tests for linkage of multiple cohorts in an accelerated longitudinal design. Psychological Methods, 5, 44–63.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods, 2nd edition. Thousand Oaks, CA: Sage.
Rogosa, D. R., Brandt, D., & Zimowski, M. (1982). A growth curve approach to the measurement of change. Psychological Bulletin, 90, 726–748.
Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. New York, NY: Oxford.
Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel analysis: An introduction to basic and advanced multilevel modeling. London: Sage.
Verbeke, G., & Molenberghs, G. (2000). Linear mixed models for longitudinal data. New York, NY: Springer.
Walls, T. A., & Schafer, J. L. (2006). Models for intensive longitudinal data. New York, NY: Oxford.
Weiss, R. (2005). Modeling longitudinal data. New York, NY: Springer.
Willett, J. B. (1988). Questions and answers in the measurement of change. In E. Rothkopf (Ed.), Review of research in education (1988–1989) (pp. 345–422). Washington, DC: American Education Research Association.
Willett, J. B., & Sayer, A. G. (1994). Using covariance structure analysis to detect correlates and predictors of individual change over time. Psychological Bulletin, 116, 363–381.
23
Latent Variable Models of Social Research Data
Rick H. Hoyle

During the writing of this chapter, Rick Hoyle was supported by grant P20-DA017589 from the National Institute on Drug Abuse.

LATENT VARIABLE MODELS IN SOCIAL RESEARCH

Latent variable models concern the presence, definition, and/or influence of constructs that either cannot be observed or are, in principle, observable but have not been directly observed in a given dataset (Bollen, 2002; MacCallum & Austin, 2000; Sobel, 1994). The focus of this chapter is recent advances in the use of linear structural equation modeling with continuous variables to evaluate such models in social research. Alternative approaches to evaluating latent variable models not covered in the chapter include latent class analysis (Clogg, 1995), latent transition analysis (Collins & Wugalter, 1992), latent profile analysis (Gibson, 1959), latent logit modeling (McCutcheon, 1994), and growth mixture modeling (Muthén & Shedden, 1999). Variants of most of the models described in the chapter could, in principle, be evaluated using one or more of these alternative strategies.

After presenting a brief history of structural equation modeling, I provide an overview of the technique, with a particular focus on the representation of models in diagrams. I do not provide technical details or outline the steps involved in implementing a structural equation modeling analysis. Rather, I use the brief overview as a foundation for presenting, in conceptual terms, specific latent variable models relevant for social research. Even though I touch on basic models, the primary focus is more complex models that take full advantage of the capabilities of structural equation modeling. I conclude the chapter with a brief section on the limitations of the technique.

History

The origin of structural equation modeling typically is traced to the work of population geneticist Sewall Wright, best known as
modeling. This material is followed by a brief description of estimation and the logic of model fit in applications of structural equation modeling.

Path diagrams

A convenient and informative means of depicting a latent variable model is the path diagram (McArdle & McDonald, 1984; McDonald & Ringo Ho, 2002). An example appears in Figure 23.1. This path diagram includes all the elements necessary for depicting even the most complex models. The ovals represent latent variables, sources of influence not measured directly. The large ovals correspond to substantive latent variables, or factors. The large oval labeled F1 is an independent variable: it is not influenced by other variables in the model. The large ovals labeled F2 and F3 are dependent variables: their variance is, in part, accounted for by other variables in the model. Paths run from each of these latent variables to their indicators, represented by squares labeled x1 to x10. These paths are either labeled ‘1,’ which means the factor loading has been fixed at this value, or ‘*,’ indicating that the factor loading is to be estimated from the data. Variance in each indicator not attributable to the latent variable is allocated to measurement error, or uniqueness, indicated by the small ovals labeled u1 to u10. Associated with each of
[Figure 23.1: path diagram with latent variables F1, F2, and F3; indicators labeled x1 through x11 with uniquenesses u1 through u11; disturbances d2 and d3; paths marked 1 are fixed and paths marked ∗ are free parameters.]
Figure 23.1 Path diagram illustrating the specification of latent independent and dependent
variables and the designation of free parameters
fit the data. Because the assumptions of the basic statistical test of whether these matrices differ are rarely met in practice, a host of adjunct fit indices have been developed and informal criteria for applying them proposed (Hu & Bentler, 1995, 1999). When, by well-justified criteria, a model yields an acceptable account of the data, parameter estimates can be interpreted in a manner not unlike the interpretation of regression coefficients or factor loadings.

Before turning to a presentation of specific model specifications of potential interest to social researchers, it is important to establish that not all models that are specified can be estimated. For a model to be estimated, it must be specified in such a way that it is identified. Conceptually speaking, identification concerns the integrity of estimates of free parameters in a model. If a model is identified, a unique estimate for each and every free parameter can be obtained given the criteria of the estimator. If no value, or more than one value, of one or more free parameters can be obtained, then the model is unidentified and estimates of parameters are not valid. Even though most applications typical of social research yield models that are identified, it is wise to evaluate the identification status of a model before estimating. Even though application of a number of relatively straightforward identification rules can provide some assurance that the model is identified, the definitive evaluation of the identification status of individual free parameters and the model as a whole requires solving the structural equations using the variances and covariances (Bollen, 1989).

SPECIFIC MODELS FOR SOCIAL RESEARCH DATA

In this section I present a series of specific latent variable models relevant for social research. The models are presented in three groups: measurement models, which focus on latent variables but not the relations between them; models appropriate for cross-sectional data; and models appropriate for longitudinal data. Even though the focus is on innovative models that are not yet in wide use in social research, the presentation of each group is prefaced by a description of basic models of that type.

Measurement

As noted earlier, model specifications might comprise one or both of two components: measurement and structural (Anderson & Gerbing, 1988). The measurement component concerns the relations between latent variables and their indicators, and the structural component concerns the directional relations between the latent variables. Models need not include both components and, in fact, models that include only the measurement equations are relatively common. A focus strictly on the measurement component typically is motivated either by a desire to test specific hypotheses about the latent structure of a set of indicators or a need to ensure the integrity of a set of latent variables before testing hypotheses about the relations between them. The use of structural equation modeling in this way is referred to as confirmatory factor analysis (Hoyle, 2000).

Basic model

The most basic application of structural equation modeling to matters of measurement is the first-order factor model. In this model, one or more latent variables are posited to explain the commonality among a set of indicators. Returning to Figure 23.1, if the directional paths between latent variables were replaced by curved arrows indicating covariance, the model would be a basic first-order measurement model. Because there are no directional paths between latent variables, all latent variables are, in effect, independent variables. To illustrate, Funk (1999) used data from the National Election Studies to investigate the latent structure of trait ratings of presidential nominees. The hypothesized three-factor model proved superior to one- and two-factor models and held across all nominees for which data were available.
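
To make the first-order factor model and the idea of identification more concrete, the sketch below builds the covariance matrix implied by a small two-factor model and applies the counting rule, one of the simple identification checks. The notation is an assumption introduced here and does not appear in the chapter: loadings Λ, factor covariances Φ, uniquenesses Θ, with implied covariance ΛΦΛᵀ + Θ. All numerical values and function names are hypothetical. The counting rule is a necessary condition only; a non-negative result does not establish that a model is identified.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def implied_covariance(loadings: Matrix, factor_cov: Matrix,
                       uniquenesses: List[float]) -> Matrix:
    """Model-implied covariance: loadings * factor_cov * loadings' + diag(uniquenesses)."""
    common = matmul(matmul(loadings, factor_cov), transpose(loadings))
    size = len(common)
    return [[common[i][j] + (uniquenesses[i] if i == j else 0.0) for j in range(size)]
            for i in range(size)]


def counting_rule_df(n_indicators: int, n_free_parameters: int) -> int:
    """Unique variances and covariances minus free parameters (must be >= 0)."""
    return n_indicators * (n_indicators + 1) // 2 - n_free_parameters


# Hypothetical two-factor model, three indicators per factor; the first loading
# on each factor is fixed at 1 and there are no cross loadings.
LOADINGS = [[1.0, 0.0], [0.8, 0.0], [0.6, 0.0],
            [0.0, 1.0], [0.0, 0.7], [0.0, 0.9]]
FACTOR_COV = [[1.0, 0.3], [0.3, 1.0]]
UNIQUENESSES = [0.5] * 6
SIGMA = implied_covariance(LOADINGS, FACTOR_COV, UNIQUENESSES)

# Free parameters: 4 loadings + 6 uniquenesses + 2 factor variances + 1 covariance.
DF = counting_rule_df(n_indicators=6, n_free_parameters=13)

if __name__ == "__main__":
    for row in SIGMA:
        print(" ".join(f"{value:6.3f}" for value in row))
    print("degrees of freedom by the counting rule:", DF)
```

Estimation amounts to choosing the free values so that this implied matrix is as close as possible to the observed covariance matrix. The degrees of freedom left over by the counting rule are what allow the model's fit to be tested.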
An advantage of this basic application over traditional methods such as exploratory factor analysis is that competing models can be formally compared, specific aspects of models (e.g. correlations between factors) can be formally evaluated, and adjustments can be made to accommodate covariation among indicators not explained by the latent variables (i.e. correlated uniquenesses) or indicators influenced by more than one latent variable (i.e. cross loadings). Even though this approach to factor analysis is sometimes portrayed as contrasting sharply with exploratory factor analysis, it is possible to relax many of the restrictions on the standard confirmatory factor model (e.g. simple structure) and, in so doing, approximate applications of exploratory factor analysis (Hoyle & Duvall, 2004).

Higher-order factor models

For measurement models that specify four or more first-order factors, it is possible to test hypotheses about sources of commonality that underlie correlations among the factors. In so doing, one, in effect, combines a confirmatory factor analysis of the observed variables with a confirmatory factor analysis of the factors. As would be the case for factors at the first order, factors at the second order are a function of commonality, in this instance commonality among the first-order factors. Also, as would be the case in terms of observed variables at the first-order level, at least four first-order factors are necessary in order to allow for a test of that portion of the model.³ For example, Hoyle (1991) examined the second-order structure of a 20-item measure of self-esteem designed to yield four first-order factors corresponding to self-esteem domains (e.g. social competence, physical appearance). He reasoned that the correlation among the first-order factors could be attributed to a general self-esteem factor, which would be evidenced by a single second-order factor. The analyses indicated that the second-order model provided a good account of the data and, importantly, provided a better account than a first-order model with correlated factors. In principle and with enough indicators and factors, one could estimate third-order or higher models; however, in practice, such models are rare.

Models of measurement invariance

When relations between latent variables or mean levels on constructs represented by those latent variables are to be compared across samples or within a sample across time, a key concern is whether the meaning of the latent variables is consistent across levels of the dimensions on which they are to be compared such as nationality (e.g. Steenkamp & Baumgartner, 1998), measurement modality (e.g. Deutskens et al., 2006), and age (Pentz & Chou, 1994). To the extent that the measurement model for a latent variable is consistent across samples or time, it is invariant with respect to measurement. Despite the obvious importance of measurement invariance, it is rarely evaluated (Vandenberg & Lance, 2000).

In order to illustrate the various aspects of measurement invariance and how they are evaluated, it is useful to consult a path diagram. Displayed in Figure 23.2 is a single model with two correlated latent variables that is specified for two levels, a and b, of some dimension of interest (e.g. ethnicity, age). Note the presence of paths that are not present in the model shown in Figure 23.1. In the typical application of structural equation modeling, all variables are rescaled as deviations from their means, thereby setting intercepts in the measurement equations and means of the latent variables to zero. In tests of measurement invariance, the estimated values of these constants often are of interest. The triangle in the center of the two-factor model is a constant affecting the indicators and the latent variables. Paths running from the constant to the indicators correspond to intercepts. Paths running from the constant to the latent variables correspond to means.

Note that every parameter in the two-factor model on the left has a corresponding parameter in the two-factor model on the
[Figure 23.2 Path diagram illustrating parameters that can be compared in studies of measurement invariance. Indicators: x1a to x6a (level a) and x1b to x6b (level b).]
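The parameters compared in Figure 23.2 can also be written out in equation form. The following is only a sketch in conventional notation: the symbols (tau for intercepts, lambda for loadings, kappa for latent means, Phi and Theta for the latent and uniqueness covariance matrices) are assumptions introduced here for illustration and do not appear in the chapter.

```latex
% Assumed notation: g indexes the level (a or b), j the indicator,
% and k the latent variable on which indicator j loads.
\begin{align}
x_{jg} &= \tau_{jg} + \lambda_{jg} F_{kg} + e_{jg}
  && \text{(measurement equation with intercept } \tau_{jg}\text{)} \\
E(x_{jg}) &= \tau_{jg} + \lambda_{jg}\,\kappa_{kg}
  && \text{(mean structure; } \kappa_{kg} = E(F_{kg}),\ E(e_{jg}) = 0\text{)} \\
\Sigma_g &= \Lambda_g \Phi_g \Lambda_g^{\top} + \Theta_g
  && \text{(covariance structure)}
\end{align}
% Full invariance across a and b would correspond to
% \tau_{ja} = \tau_{jb}, \lambda_{ja} = \lambda_{jb}, \kappa_{ka} = \kappa_{kb},
% \Phi_a = \Phi_b, and \Theta_a = \Theta_b for every j and k.
```

Under this reading, the paths from the constant to the indicators are the intercepts, the paths from the constant to the latent variables are the latent means, and invariance tests compare sets of these parameters across the two levels rather than one pair at a time.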
right. These parameters are estimated from the augmented moment matrix, which adds means to the covariance matrix, and adds the mean structure to the standard covariance structure in the model. If the model were fully invariant for a and b, every pair of parameters would be equivalent. Rather than comparing each pair of parameters individually, invariance analyses usually involve comparing sets of parameters and doing so in a systematic manner (Widaman & Reise, 1997). The result is a determination of whether the observed variables reflect similar constructs, and therefore can be compared, across groups or time (Byrne et al., 1989; Steenkamp & Baumgartner, 1998).

Multitrait-multimethod models
The measurement models described to this point decompose variance in observed variables into two components: variance shared with other indicators of a single latent variable and uniqueness. It is, however, possible to further decompose variance by accounting for multiple sources of commonality. Such is the case in latent variable models of multitrait-multimethod data (Campbell & Fiske, 1959), which allow for the disentanglement of variance in observed variables attributable to what they represent from variance attributable to how they were measured. The multitrait-multimethod matrix is a covariance matrix comprising data on two or more characteristics obtained using two or more methods. For instance, McPherson and Rotolo (1995), in a study of the composition of voluntary groups, obtained data on four characteristics of such groups (e.g. group size, age composition) provided by three sources (e.g. group member, observer). Using the language of multitrait-multimethod analysis, in this example, group characteristics are ‘traits’ and sources are ‘methods.’ In the prototypic model specification, each observed score is influenced by a trait factor, a method factor, and a uniqueness component (Marsh & Grayson, 1995). Variance in the observed variables is decomposed into a portion attributable to the construct regardless of how it is measured (monotrait-heteromethod), a portion attributable to how it was measured without reference to the constructs (heterotrait-monomethod), and a portion attributable neither to the trait nor the method (uniqueness). Obtaining estimates of parameters in the prototypic model can be difficult, but alternative, more robust specifications have been proposed (Kenny & Kashy, 1992). Latent variable models of multitrait-multimethod data provide useful
information about the reliability and validity of observed variables.

Trait-state-error models
Like multitrait-multimethod models, trait-state-error models posit the influence of two latent variables on each observed variable (Cole et al., 2005; Kenny & Zautra, 1995). The univariate trait-state-error model is a sophisticated measurement model that, in effect, decomposes variance in a construct measured on four or more occasions into three components. The trait component is that part that does not change over time (the autoregressive component in panel and time-series designs). The state component is that portion of the variance that is reliable but variable over time. The error component is that portion of variance that is not reliable over time. For example, Zautra et al. (1995) obtained 10 monthly measures of pain and psychological distress. Their trait-state-error model revealed that 60 percent of the variance in pain and 75 percent of the variance in psychological distress was stable and therefore trait-like over the year of their study. Importantly, however, 35 percent of the variance in pain and 18 percent of the variance in psychological distress could be attributed to reliable variance at each assessment. In a bivariate form of the model, they were able to study the directional influences of pain and distress on each other, focusing only on reliable variance in the variables subject to change over time (i.e. states). The trait-state-error model is valuable both for the information it provides regarding the nature of variability on a construct and as a means of examining the causal influence of components of constructs subject to change over time.

Cross-sectional
I now turn to latent variable models that focus on the relations between latent variables. In such models, we assume that the specification of the relations between indicators and latent variables has been evaluated and deemed adequate, allowing the focus to shift to the structural portion of the model. Referring back to Figure 23.1, this focus concerns the directional relations between the latent variables, estimated by the *s on the directional paths, and the disturbance terms, d1 and d2. In the remainder of this section, I focus on models for data gathered at a single point in time.

Basic model
The most basic structural model includes one latent independent and one latent dependent variable (e.g. F1 and F3 in Figure 23.1). Such a model is equivalent to a simple regression model except that neither the predictor nor the outcome reflects sources of error that vary across the indicators (DeShon, 1998). Thus, latent variable models overcome a critical shortcoming of traditional approaches to modeling directional effects. Latent variable models also provide a means of evaluating the effects of independent variables on multiple dependent variables while also evaluating the effects of dependent variables on each other. For instance, in Figure 23.1, F1 affects both F2 and F3, which are specified as related due to the directional influence of F2 on F3. Importantly, however, latent variable models do not overcome the significant limitation of cross-sectional data for tests of directional effects. For instance, if data on the 11 observed variables in Figure 23.1 were gathered at a single point in time, the arrows between the latent variables could be reversed with no change in model fit (MacCallum et al., 1993). Thus, it is not possible to test in a definitive manner the direction of influence between variables measured at a single point in time (Gollob & Reichardt, 1991). This raises the question of why one would use structural equation modeling on cross-sectional data when more familiar models are available. Even though the ability to model predictors and outcomes as latent variables cannot address the directionality criterion associated with causal inferences, it provides significant benefits for addressing the two remaining criteria: association and isolation (Bollen, 1989). In terms of association, the removal of some forms of error from constructs before
estimating their association ensures that the association is not underestimated. In terms of isolation, the ability to model extraneous influences as latent variables operating at different points in a model optimizes statistical control when random assignment to levels of causal constructs is not feasible.

With this background, I now describe two useful latent variable models of cross-sectional data.

Mediated effects
Mediators are variables that represent constructs proposed to explain the association between two variables (Hoyle & Robinson, 2003). In social research, mediational hypotheses typically are evaluated using the measurement-of-mediation design (Spencer et al., 2005). In this design, the causal variable is either manipulated or measured and mediators and outcomes are measured. In cross-sectional designs the mediators and outcomes are assessed simultaneously despite the fact that mediators are presumed to exert a causal influence on the outcomes. The evaluation of a mediated effect involves partitioning the effect of a causal variable on an outcome into two portions: the direct effect and the indirect effect. The direct effect is that portion of the effect that is not transmitted through the mediator. Referring back to Figure 23.1, the path from F1 to F3 is the direct effect. In the three-variable case, the remaining portion of the effect is transmitted through the mediator as an indirect effect. In the model shown in Figure 23.1, the mediator is F2 and the magnitude and direction of the indirect effect is expressed in the product of the parameter estimates for the F1-F2 and F2-F3 relations. Statistically speaking, F2 mediates the relation between F1 and F3 if the indirect effect is significant. If the F1-F3 relation remains significant in the presence of the significant indirect effect, then the mediation is only partial; if the F1-F3 relation is nonsignificant, then the mediation is full. For example, using structural equation modeling in this way, Deardorff et al. (2003) found that the effect of stress on depressive symptoms among inner-city youth is mediated by control beliefs.

A key concern in tests of mediated effects is the reliability of the mediator. The more unreliable the mediator, the more the indirect effect is underestimated and the direct effect overestimated (Hoyle & Kenny, 1999). Thus, with an unreliable mediator, it is possible to conclude partial or no mediation of an effect when mediation is, in fact, full. For this reason, it is advisable to always model mediators as latent variables in tests of mediated effects.

Moderated effects
The evaluation of a moderated effect, in conceptual terms, involves an evaluation of the effect (direct or indirect) of an independent variable on an outcome at different levels of a moderator variable. In social research, moderated effects are sometimes referred to as interaction effects and evaluated as a matter of course in research involving factorial designs, from which data typically are analyzed using analysis of variance. When the independent variable and/or moderator variable are measured on a continuum rather than manipulated, the data are best analyzed using techniques that do not evaluate interaction effects as a matter of course (e.g. multiple regression). In such cases, researchers must manually construct interaction terms and evaluate them in strategically specified predictive equations.

Tests of moderated effects involving latent variables are rarer still. This is unfortunate because, as with tests of mediated effects, tests of moderated effects are adversely affected by measurement error (Busemeyer & Jones, 1983; McClelland & Judd, 1993). Even though the adverse effect of measurement error could be overcome by specifying the interaction term as a latent variable, historically this strategy has not been accessible to most social researchers because the loadings and uniqueness terms associated with these latent variables are nonlinear transformations of their counterparts in the latent variables for the independent and moderator variables (Kenny & Judd, 1984). This nonlinearity can
be incorporated into the specification of the latent variable representing the interaction term; however, if the number of indicators of the independent and moderator variables exceeds three, the specification becomes prohibitively complex. Fortunately, ignoring the theoretical nonlinearity in these parameters produces results that, in practical terms, are equivalent to results obtained using the more complex specification (Marsh et al., 2004). Published examples of moderated effects involving latent variables are rare; a substantive example can be found in Ping (1996).

Longitudinal
Well-designed longitudinal studies offer significant inferential advantages over cross-sectional studies, the foremost being the possibility of definitive tests of directionality (Halaby, 2004). In this section, I focus on latent variable models of data from studies involving at least two assessments.

Basic model
In the basic longitudinal latent variable model, variables are positioned in a model according to when they were assessed. Thus, for example, the specific arrangement of the latent variables in the model shown in Figure 23.1 would suggest a three-wave longitudinal design in which F1 was assessed in the first wave, F2 in the second wave, and F3 in the third wave. This rudimentary longitudinal model is an example of the sequential strategy of longitudinal research, in which the temporal order in which constructs are assessed corresponds to the presumed causal order of constructs in the model (Hoyle & Robinson, 2003). Data from this design are an improvement over data from the cross-sectional design because the directional paths can logically only go in the direction they are specified; however, the improvement is modest. This is because, using terminology from the trait-state-error model described earlier, the latent variables include both trait (i.e. stable) and state (i.e. time-specific) components. Thus, for instance, if the F1-F2 relation is significant, the inference regarding directionality is nonetheless ambiguous because it might reflect nothing more than correlation between the stable components of F1 and F2. This inferential ambiguity is overcome through the use of a replicative strategy, in which all variables are assessed and included in the model at each wave. This strategy allows for the evaluation of lagged effects from which temporal stability in constructs has been removed.

Cross-lagged panel models
In the simplest latent variable cross-lagged panel model, two constructs are measured using multiple indicators at two points in time. The name derives from the fact that, in addition to autoregressive effects (the effect of each construct on itself at subsequent waves, i.e. stability), the model specifies an effect of each construct on the other construct at the next wave. These latter effects, which are the focal part of the model, are the cross-lagged paths. As with tests of mediated and moderated effects, controlling for measurement error is vital in cross-lagged panel models. In such models the adverse effect of measurement error extends beyond the attenuation of associations. Because hypotheses about causal priority concern the relative magnitude of the cross-lagged paths, it is critical that the reliability of the variables be equivalent. When the variables are modeled as latent variables, the reliability of each variable is 1.0 and, therefore, differences in the cross-lagged path coefficients cannot be attributed to differences in the reliability of the measures.

In latent variable models of cross-lagged panel data, the primary concern is the absolute and the relative magnitudes of the coefficients associated with the cross-lagged paths. In absolute terms, the concern is whether, after controlling for stability in the constructs, there is evidence of an association between them. In relative terms, the concern is whether one cross-lagged path coefficient is larger than the other over the same span of time. If one
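The concern about unequal reliability in cross-lagged panel models can be illustrated numerically. The sketch below is not taken from the chapter: the function names and all parameter values (stability 0.5, true cross-lagged effects of 0.2 in both directions, wave-1 correlation 0.3) are hypothetical choices made for illustration. It computes the population regression slopes implied when the wave-1 scores are observed with a given reliability, assuming purely random measurement error.

```python
def ols_two_predictors(var1, var2, cov12, cov1y, cov2y):
    """Population OLS slopes for y regressed on two predictors, from (co)variances."""
    det = var1 * var2 - cov12 ** 2
    b1 = (var2 * cov1y - cov12 * cov2y) / det
    b2 = (var1 * cov2y - cov12 * cov1y) / det
    return b1, b2


def observed_cross_lagged(rel_x, rel_y, stability=0.5, cross=0.2, r=0.3):
    """Cross-lagged slopes implied when wave-1 scores have the given reliabilities.

    Hypothetical setup: wave-1 true scores X1 and Y1 have unit variance and
    correlation r, and both true cross-lagged effects equal `cross`.
    Random error inflates the variance of an observed score to 1 / reliability
    but leaves its covariances unchanged. Error in the wave-2 outcomes is
    ignored because random error in an outcome does not bias unstandardized slopes.
    """
    var_x, var_y = 1.0 / rel_x, 1.0 / rel_y
    cov_own = stability + cross * r    # cov(X1, X2) and cov(Y1, Y2)
    cov_other = cross + stability * r  # cov(X1, Y2) and cov(Y1, X2)
    # Y2 regressed on (x1, y1): the slope for x1 is the X -> Y cross-lagged path.
    x_to_y, _ = ols_two_predictors(var_x, var_y, r, cov_other, cov_own)
    # X2 regressed on (x1, y1): the slope for y1 is the Y -> X cross-lagged path.
    _, y_to_x = ols_two_predictors(var_x, var_y, r, cov_own, cov_other)
    return x_to_y, y_to_x


if __name__ == "__main__":
    print("error-free scores:       ", observed_cross_lagged(1.0, 1.0))
    print("reliabilities 0.9 and 0.5:", observed_cross_lagged(0.9, 0.5))
```

With error-free scores both slopes equal the true value of 0.2. With reliabilities of 0.9 for X and 0.5 for Y, the implied slopes are roughly 0.25 for X to Y and 0.10 for Y to X, so a comparison of observed coefficients would suggest a causal priority that the hypothetical true model does not contain.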
data (MacCallum et al., 1993). For instance, one might posit a second-order factor model in which a general factor accounts for the correlations among three first-order factors. Even though suitable absolute fit of this model would be consistent with the hypothesis, the fit of this model would be identical to the fit of a model in which the three factors were simply allowed to correlate. This issue is of greater concern in structural models estimated from cross-sectional data, for which plausible equivalent models often can be generated that reverse the specified direction of effects. For this reason, structural equation modeling alone cannot determine the direction of association between two constructs. The advantages structural equation modeling offers over other statistical approaches in this regard are the capacity to model relations between latent variables and, in quasi- or nonexperimental studies, to isolate putative causes and effects from extraneous variables.

Measurement error correction
As noted throughout the chapter, a significant advantage of latent variable models is the capacity for modeling relations between variables from which the effects of certain sources of measurement error have been removed. It is not uncommon, however, for social researchers to overstate the degree to which latent variables are error free (DeShon, 1998). In the typical case (cf. Bollen & Lennox, 1991) latent variables are a function of the commonality across all their indicators. Variance in indicators not shared with the remaining indicators is termed measurement error, or uniqueness, and potentially includes random and systematic components. Of relevance to measurement error correction is the fact that the measurement errors do not, and therefore the latent variables do, contain variance that is common to all the indicators. As such, if all indicators are subject to the same source of measurement error, the latent variable, in fact, is not free of the influence of that source of error. For instance, if error attributable to self-reports is a concern but all indicators are operationally defined as self-reports, that error is reflected in the latent variable rather than the measurement error terms. Only the influence of those sources of error that vary across indicators, as in multitrait-multimethod models, is removed from latent variables (DeShon, 1998).

Software
Historically, software programs for estimating structural equation models could be described as a limitation because their use assumed familiarity with statistical theory and notation at a level uncommon among social researchers. Ironically, the relative ease with which software programs can now be used for estimating structural equation models has introduced a new concern: that social scientists can specify and estimate models without adequate understanding of what they are doing. Steiger (2001) notes that, because of the ease with which such software can be used (e.g. specification through diagrams), ‘the newcomer is led to believe that there is this impressive, but easy-to-use technique that allows modeling of causality in a kind of flow diagram’ (p. 338). Given the many ways in which a structural equation modeling analysis can go awry, the complexity in evaluating model fit, and the caveats associated with inferences about models and parameters, the likelihood of misuses of structural equation modeling by novice users is higher than ever.

FUTURE DIRECTIONS
As the use of structural equation modeling has become more commonplace across the social sciences, the gap between what can legitimately be accomplished using the technique in its traditional form and the questions social scientists wish to ask of their data has become increasingly apparent. In effect, the direction of influence between statistical methodology and research application has reversed. From the mid-1970s to the late 1990s, as social researchers came to appreciate the potential of latent variable modeling,
they were inspired to address more complex research questions in a more holistic manner. At the dawn of the twenty-first century, with structural equation modeling having become more familiar to social scientists, they began contemplating research questions beyond the reach of standard specification and estimation strategies. Thus, an alternative direction of influence, from social researchers to statistical methodologists, has emerged. Spurred by the increasingly complex demands of social research data and questions, statistical methodologists are extending the boundaries of what traditionally would have been considered appropriate applications of structural equation modeling.

Three primary fronts on which this extension is taking place concern qualities of social research data. As noted earlier, the standard estimator in structural equation modeling, maximum likelihood, assumes multivariate normality and continuous measurement. In practice, these conditions often are not met. Even though the maximum likelihood estimator is reasonably robust to violations of these assumptions, the extent of non-normality or coarseness of measurement in social research data sometimes clearly exceeds the limits of this robustness. Advances in estimation from non-normal and categorical data that perform well in practice are increasingly available to social researchers (e.g. Muthén, 1984). A third characteristic of data with which social researchers often have to contend is missingness. Considerable progress has been made in the understanding and implementation of strategies for managing missingness that are not specific to a particular statistical strategy (e.g. Schafer & Graham, 2002). Also, however, statistical software for estimating latent variable models is increasingly likely to include an estimator that allows for the management of missingness within the context of specific models (Arbuckle, 1996; Enders, 2001). Because meeting minimum sample size recommendations for applications of structural equation modeling is a challenge in some social science literatures, the availability of a strategy for keeping all research participants in the analysis sample is critical.

Advances in the capacity for estimating from non-normal and categorical data have paved the way for advances in terms of the kinds of latent variable models that can be specified and estimated. For instance, a focus by methodologists on estimators and specification strategies for modeling nonlinear effects promises to increase the ease with which such effects can be incorporated into models such as the ones described in this chapter (Schumacker & Marcoulides, 1998). A particularly promising advance concerns the modeling of latent variables that are categorical. These latent variables can reflect latent classes in the traditional sense or reflect distinctive classes of latent growth trajectories (Muthén, 2001). Such applications illustrate the increasing generality of statistical models for estimating latent variable models in social research, potentially including in a single model continuous and categorical indicators, continuous and categorical latent variables (of which some are latent classes), and multiple levels of analysis (of which one might be a latent growth model of individual-level data).

CONCLUSION
Structural equation modeling is a flexible and general statistical approach to specifying and evaluating latent variable models in social research. In this chapter, I described and provided examples of basic and advanced applications of structural equation modeling relevant to social research. Measurement models focus strictly on the relations between observed variables and the latent variables they are assumed to reflect. They can be used to decompose variance in observed variables in ways that both increase understanding of the observed variables and produce latent variables that are relatively pure representations of the constructs the observed variables are assumed to reflect. Even though structural equation modeling is not a viable solution to the primary limitation of cross-sectional data (the inability to determine direction of influence), it is nonetheless useful for modeling such data by enabling some control
over the effects of measurement error on directional relations and the inclusion of multiple dependent variables and the relations among them. The ability to eliminate some sources of measurement error is particularly beneficial in ‘third-variable’ models such as mediation and moderation, in which the effects of such error are compounded. The full benefits of structural equation modeling are apparent in latent variable models of longitudinal data. In traditional autoregressive models, structural equation modeling allows for simultaneous estimation of directional effects across waves controlling for measurement error. In latent growth curve models, structural equation modeling allows for the estimation of patterns of change and the prediction of variation in those patterns across individuals. These models are illustrative of the broad range of latent variable models relevant to social research. A burgeoning didactic literature on applied structural equation modeling, coupled with software updated frequently to reflect the latest developments in estimation and testing, makes these models more appealing than ever.

NOTES
1 An important exception is Muthén’s more general framework, implemented in the Mplus software program (Muthén & Muthén, 2006).
2 Readers familiar with exploratory factor analysis will recognize this specification as corresponding to simple structure, which, in the exploratory case, is sometimes achieved through rotation. By forcing many loadings to zero, confirmatory factor analysis avoids the indeterminacy of parameter estimates in exploratory factor analysis.
3 With only two or three first-order factors, although a second-order factor could be specified, such a model would yield identical fit to a first-order model with correlated factors. Thus, although adequate fit of such models would suggest that a second-order model is consistent with the data, a first-order model with correlated factors would be equally consistent with the data. Nonetheless, if the loadings of a set of first-order factors on a second-order factor are high, the data favor interpretation of the second-order model.
4 As with orthogonal polynomials in analysis of variance or multiple regression analysis, the values corresponding to the various trajectory shapes must take into account the relative time lapses between waves. In Figure 23.3, the use of 0, 1, 2, and 3 for the coefficients corresponding to a linear trajectory indicates an assumption of equal spacing between waves. If, for example, there were six months between the first three waves and a year between the last two waves, the coefficients corresponding to a linear trajectory would be 0, 1, 2, 3, and 5.

REFERENCES
Anderson, J. C., & Gerbing, D. W. (1988). Structural equation modeling in practice: A review and recommended two-step approach. Psychological Bulletin, 103, 411–423.
Arbuckle, J. L. (1996). Full information estimation in the presence of incomplete data. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling techniques (pp. 243–277). Mahwah, NJ: Erlbaum.
Arbuckle, J. L. (2003). AMOS 5.0 update to the AMOS User’s Guide. Chicago, IL: SPSS.
Bentler, P. M. (1983). Simultaneous equation systems as moment structure models. Journal of Econometrics, 22, 13–42.
Bentler, P. M. (1990). Comparative fit indices in structural models. Psychological Bulletin, 107, 238–246.
Bentler, P. M. (1995). EQS structural equations program manual. Encino, CA: Multivariate Software.
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness-of-fit in the analysis of covariance structures. Psychological Bulletin, 88, 588–606.
Bentler, P. M., & Weeks, D. G. (1980). Linear structural equations with latent variables. Psychometrika, 45, 289–308.
Blalock, H. M. (1961). Correlation and causality: The multivariate case. Social Forces, 39, 246–251.
Blalock, H. M. (1964). Causal inferences in nonexperimental research. Chapel Hill: University of North Carolina Press.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Bollen, K. A. (2002). Latent variables in psychology and the social sciences. Annual Review of Psychology, 53, 605–634.
Bollen, K. A., & Curran, P. J. (2004). Autoregressive latent trajectory (ALT) models: A synthesis of two traditions. Sociological Methods and Research, 32, 336–383.
Bollen, K. A., & Curran, P. J. (2006). Latent curve models: A structural equation perspective. Hoboken, NJ: Wiley.
Bollen, K. A., & Lennox, R. D. (1991). Conventional wisdom on measurement: A structural equation perspective. Psychological Bulletin, 110, 305–314.
Bollen, K. A., & Long, J. S. (Eds.) (1993). Testing structural equation models. Thousand Oaks, CA: Sage Publications.
Breckler, S. J. (1990). Applications of covariance structure modeling in psychology: Cause for concern? Psychological Bulletin, 107, 260–273.
Browne, M. W. (1974). Generalized least squares estimators in the analysis of covariance structures. South African Statistical Journal, 8, 1–24.
Browne, M. W. (1984). Asymptotic distribution free methods in analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62–83.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136–162). Thousand Oaks, CA: Sage Publications.
Busemeyer, J. R., & Jones, L. D. (1983). Analysis of multiplicative combination rules when the causal variables are measured with error. Psychological Bulletin, 93, 549–562.
Byrne, B. M. (2001). Structural equation modeling with AMOS: Basic concepts, applications, and programming. Mahwah, NJ: Erlbaum.
Byrne, B. M. (2006). Structural equation modeling with EQS: Basic concepts, applications, and programming. Mahwah, NJ: Erlbaum.
Byrne, B. M., Shavelson, R. J., & Muthén, B. (1989). Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance. Psychological Bulletin, 105, 456–466.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.
Clogg, C. C. (1995). Latent class models. In G. Arminger, C. C. Clogg, & M. E. Sobel (Eds.), Handbook of statistical modeling for the social and behavioral sciences (pp. 311–359). New York: Plenum.
Cole, D. A., Martin, N. C., & Steiger, J. H. (2005). Empirical and conceptual problems with longitudinal trait–state models: Introducing a trait-state-occasion model. Psychological Methods, 10, 3–20.
Collins, L. M., & Wugalter, S. E. (1992). Latent class models for stage-sequential dynamic latent variables. Multivariate Behavioral Research, 27, 131–157.
Deardorff, J., Gonzales, N. A., & Sandler, I. N. (2003). Control beliefs as a mediator of the relation between stress and depressive symptoms among inner city adolescents. Journal of Abnormal Child Psychology, 31, 205–217.
DeShon, R. P. (1998). A cautionary note on measurement error corrections in structural equation models. Psychological Methods, 4, 412–423.
Deutskens, E., de Ruyter, K., & Wetzels, M. (2006). An assessment of equivalence between online and mail surveys in service research. Journal of Service Research, 8, 346–355.
Diamantopoulos, A., & Siguaw, J. A. (2000). Introducing LISREL: A guide for the uninitiated. London: Sage Publications.
Duncan, O. D. (1966). Path analysis: Sociological examples. American Journal of Sociology, 74, 119–137.
Duncan, O. D. (1969). Some linear models for two-wave, two-variable panel analysis. Psychological Bulletin, 72, 177–182.
Duncan, O. D. (1975). Introduction to structural equation models. New York: Academic Press.
Duncan, T. E., Duncan, S. C., Strycker, L. A., Li, F., & Alpert, A. (1999). An introduction to latent variable growth curve modeling. Mahwah, NJ: Erlbaum.
DuToit, S., Cudeck, R., & Sörbom, D. (Eds.) (2001). Structural equation modeling: Present and future. Lincolnwood, IL: Scientific Software International.
Enders, C. K. (2001). A primer on maximum likelihood algorithms available for use with missing data. Structural Equation Modeling, 8, 128–141.
Funk, C. L. (1999). Bringing the candidate into models of candidate evaluation. Journal of Politics, 61, 700–720.
Gibson, W. A. (1959). Three multivariate models: Factor analysis, latent structure analysis, and latent profile analysis. Psychometrika, 24, 229–252.
Goldberger, A. S. (1971). Econometrics and psychometrics: A survey of commonalities. Psychometrika, 36, 83–107.
Goldberger, A. S., & Duncan, O. D. (Eds.) (1973). Structural equation models in the social sciences. New York: Academic Press.
Gollob, H. F., & Reichardt, C. S. (1991). Interpreting and estimating indirect effects assuming time lags really matter. In L. M. Collins & J. L. Horn (Eds.), Best methods for the analysis of change (pp. 243–259). Washington, DC: American Psychological Association.
Halaby, C. N. (2004). Panel models in sociological research: Theory into practice. Annual Review of Sociology, 30, 507–544.
Hancock, G. R., & Mueller, R. O. (Eds.) (2006). Structural equation modeling: A second course. Greenwich, CT: Information Age Publishing.
Hoyle, R. H. (1991). Evaluating measurement models in clinical research: Covariance structure analysis of latent variable models of self-conception.
LATENT VARIABLE MODELS OF SOCIAL RESEARCH DATA 411
Journal of Consulting and Clinical Psychology, 59, Kenny, D. A., & Kashy, D. A. (1992). Analysis of the
67–76. multitrait-multimethod matrix by confirmatory factor
Hoyle, R. H. (Ed.) (1995). Structural equation modeling: analysis. Psychological Bulletin, 112, 165–172.
Concepts, issues, and applications. Thousand Oaks, Kenny, D. A., & Zautra, A. (1995). The trait-state-error
CA: Sage Publications. model for multiwave data. Journal of Consulting and
Hoyle, R. H. (2000). Confirmatory factor analysis. In Clinical Psychology, 63, 52–59.
H. E. A. Tinsely & S. D. Brown (Eds.), Handbook Kline, R. B. (2005). Principles and practice of structural
of applied multivariate statistics and mathematical equation modeling (2nd ed.). New York: Guilford
modeling (pp. 465–497). New York: Academic Press. Press.
Hoyle, R. H., & Duvall, J. L. (2004). Determining the num- Loehlin, J. C. (1992). Genes and environment in
ber of factors in exploratory and confirmatory factor personality development. Thousand Oaks, CA: Sage
analysis. In D. Kaplan (Ed.), Handbook of quantitative Publications.
methodology for the social sciences (pp. 301–315). MacCallum, R. C. (2003). Working with imperfect
Thousand Oaks, CA: Sage Publications. models. Multivariate Behavioral Research, 38,
Hoyle, R. H., & Kenny, D. A. (1999). Sample size, 113–139.
reliability, and tests of statistical mediation. In MacCallum, R. C., & Austin, J. T. (2000). Applications
R. H. Hoyle (Ed.), Statistical strategies for small sample of structural equation modeling in psychological
research (pp. 195–222). Thousand Oaks, CA: Sage research. Annual Review of Psychology, 51, 201–226.
Publications. MacCallum, R. C., Roznowski, M., & Necowitz, L. B.
Hoyle, R. H., & Robinson, J. I. (2003). Mediated and (1992). Model modifications in covariance structure
moderated effects in social psychological research: analysis: The problem of capitalization on chance.
Measurement, design, and analysis issues. In Psychological Bulletin, 111, 490–504.
C. Sansone, C. Morf, & A. T. Panter (Eds.), Handbook MacCallum, R. C., Wegener, D. T., Uchino, B. N., &
of methods in social psychology (pp. 213–233). Fabrigar, L. R. (1993). The problem of equivalent
Thousand Oaks, CA: Sage Publications. models in applications of covariance structure
Hu, L.-T., & Bentler, P. M. (1995). Evaluating model fit. analysis. Psychological Bulletin, 114, 185–199.
In R. H. Hoyle (Ed.), Structural equation modeling: Marcoulides, G. A., & Schumacker, R. E. (Eds.)
Concepts, issues, and applications (pp. 76–99). (2001). Advanced structural equation modeling:
Thousand Oaks, CA: Sage Publications. New developments and techniques. Mahwah, NJ:
Hu, L.-T., & Bentler, P. M. (1999). Cutoff criteria for fit Erlbaum.
indexes in covariance structure analysis: Conventional Marsh, H. W., & Grayson, D. (1995). Latent variable
criteria versus new alternatives. Structural Equation models of multitrait-multimethod data. In R. H. Hoyle
Modeling, 6, 1–55. (Ed.), Structural equation modeling: Concepts, issues,
Jöreskog, K. G. (1973). A general method for and applications (pp. 177–198). Thousand Oaks, CA:
estimating a linear structural equation system. In Sage Publications.
A. S. Goldberger & O. D. Duncan (Eds.), Structural Marsh, H. W., Wen, Z., & Hau, K.-T. (2004).
equation models in the social sciences (pp. 85–112). Structural equation models of latent interactions:
New York: Academic Press. Evaluation of alternative estimation strategies and
Jöreskog, K. G. (1994). On the estimation of polychoric indicator construction. Psychological Methods, 9,
correlations and their asymptotic covariance matrix. 275–300.
Psychometrika, 59, 381–389. Maruyama, G. M. (1998). Basics of structural equation
Jörskog, K. G. & Sörbom, D. (1999). LISREL 8: modeling. Thousand Oaks, CA: Sage Publications.
Structural equation modeling with the SIMPLIS McArdle, J. J., & McDonald, R. P. (1984). Some
command language. Lincolnwood, IL: Scientific algebraic properties of the reticular action model for
Software International. moment structures. British Journal of Mathematical
Kaplan, D. (2000). Structural equation modeling: and Statistical Psychology, 37, 234–251.
Foundations and extensions. Thousand Oaks, CA: McClelland, G. H., & Judd, C. M. (1993). Statistical
Sage Publications. difficulties of detecting interactions and moderator
Keesling, J. W. (1972). Maximum likelihood approaches effects. Psychological Bulletin, 114, 376–390.
to causal analysis. Unpublished doctoral dissertation, McCutcheon, A. L. (1994). Latent logit models with
University of Chicago. polytomous effects variables. In A. von Eye & C. C.
Kenny, D. A., & Judd, C. M. (1984). Estimating the Clogg (Eds.), Latent variables analysis: Applications
nonlinear and interactive effects of latent variables. for developmental research (pp. 353–372). Thousand
Psychological Bulletin, 96, 201–210. Oaks, CA: Sage Publications.
412 THE SAGE HANDBOOK OF SOCIAL RESEARCH METHODS
McDonald, R. P., & Ringo Ho, M.-H. (2002). Principles research (pp. 3–35). Thousand Oaks, CA: Sage
and practice in reporting structural equation analyses. Publications.
Psychological Methods, 7, 64–82. Spencer, S. J., Zanna, M. P., & Fong, G. T. (2005).
McPherson, J. M., & Rotolo, T. (1995). Measuring Establishing a causal chain: Why experiments are
the composition of voluntary groups: A multitrait- often more effective than mediational analyses
multimethod analysis. Social Forces, 73, 1097–1115. in examining psychological processes. Journal of
Muthén, B. O. (1984). A general structural equation Personality and Social Psychology, 89, 845–851.
model with dichotomous, ordered categorical and Steenkamp, J. E. M., & Baumgartner, H. (1998).
continuous latent variable indicators. Psychometrika, Assessing measurement invariance in cross-national
49, 115–132. consumer research. Journal of Consumer Research,
Muthén, B. O. (2001). Second-generation structural 25, 78–90.
equation modeling with a combination of categorical Steiger, J. H. (2001). Driving fast in reverse: The
and continuous latent variables: New opportuni- relationship between software development, theory,
ties for latent class/latent growth modeling. In and education in structural equation modeling.
L. M. Collins & A. Sayer (Eds.), New methods for Journal of the American Statistical Association, 96,
the analysis of change (pp. 291–322). Washington: 331–338.
American Psychological Association. Steiger, J. H. (2002). When constraints interact:
Muthén, L. K., & Muthén, B. O. (2006). Mplus user’s A caution about reference variables, identification
guide (4th ed.). Los Angeles, CA: Muthén & Muthén. constraints, and scale dependencies in structural
Muthén, B. O., & Shedden, K. (1999). Finite mixture equation modeling. Psychological Methods, 7,
modeling with mixture outcomes using the EM 210–227.
algorithm. Biometrics, 55, 463–469. Steiger, J. H., & Lind, J. C. (1980, May). Statistically
Myung, J. (2003). Tutorial on maximum likelihood based tests for the number of common factors. Paper
estimation. Journal of Mathematical Psychology, 47, presented at the Annual Meeting of the Psychometric
90–100. Society, Iowa City, IO.
Pentz, M. A., & Chou, C.-P. (1994). Measurement Tenko, R., & Marcoulides, G. (2000). A first course in
invariance in longitudinal clinical research Assum- structural equation modeling. Mahwah, NJ: Erlbaum.
ing change from development and intervention. Tepper, K., & Hoyle, R. H. (1996). Latent variable models
Journal of Consulting and Clinical Psychology, 62, of need for uniqueness. Multivariate Behavioral
450–462. Research, 31, 467–494.
Ping, R. A., Jr. (1996). Estimating latent variable Thompson, M. S. (2006). Evaluating between-group dif-
interactions and quadratics: The state of this art. ferences in latent variable means. In G. R. Hancock, &
Journal of Management, 22, 163–183. R. O. Mueller (Eds.), Structural equation modeling:
Reynolds, C. A., Finkel, D., McArdle, J. J., Gatz, M., A second course (pp. 119–169). Greenwich, CT:
Berg, S., & Pedersen, N. L. (2005). Quantitative Information Age Publishing.
genetic analysis of latent growth curve models Tomer, A. (2003). A short history of structural equation
of cognitive abilities in adulthood. Developmental models. In B. H. Pugesek, A. Tomer, & A. Von Eye
Psychology, 41, 3–16. (Eds.), Structural equation modeling: Applications in
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our ecological and evolutionary biology (pp. 85–124).
view of the state of the art. Psychological Methods, Cambridge, UK: Cambridge University Press.
7, 147–177. Vandenberg, R. J., & Lance, C. E. (2000). A review and
Schumacker, R. E., & Lomax, R. G. (2004). A beginner’s synthesis of the measurement invariance literature:
guide to structural equation modeling (2nd ed.). Suggestions, practices and recommendations for
Mahwah, NJ: Erlbaum. organizational research. Organizational Research
Schumacker, R. E., & Marcoulides, G. A. (Eds.) Methods, 3, 4–70.
(1998). Interaction and nonlinear effects in structural Wansbeek, T., & Meijer, E. (2000). Measurement error
equation modeling. Mahwah, NJ: Erlbaum. and latent variables in econometrics. Amsterdam:
Sher, K. J., Wood, M. D., Wood, P. K., & Raskin, G. Elsevier Science.
(1996). Alcohol outcome expectancies and alcohol Widaman, K. F., & Reise, S. P. (1997). Explor-
use: A latent variable cross-lagged panel study. ing the measurement invariance of psychological
Journal of Abnormal Psychology, 105, 561–574. instruments: Applications in the substance use
Sobel, M. E. (1994). Causal inference in latent variable domain. In K. J. Bryant, M. Windle, & S. G. West
models. In A. von Eye, & C. C. Clogg (Eds.), Latent (Eds.), The science of prevention: Methodologi-
variables analysis: Applications for developmental cal advances from alcohol and substance abuse
LATENT VARIABLE MODELS OF SOCIAL RESEARCH DATA 413
research (pp. 281–323). Washington, DC: American Wright, S. (1934). The method of path coef-
Psychological Association. ficients. Annals of Mathematical Statistics, 5,
Wiley, D. E. (1973). The identification problem 161–215.
for structural equation models with unmeasured Wright, S. (1968). Evolution and the genetics of
variables. In A. S. Goldberger & O. D. Duncan (Eds.), populations (vol. 1). Chicago: University of Chicago
Structural equation models in the social sciences Press.
(pp. 69–83). New York: Academic Press. Zautra, A. J., Marbach, J. J., Raphael, K. G., Dohrenwend,
Willett, J. B., & Sayer, A. G. (1994). Using covariance B. P., Lennon, M. C., & Kenny, D. A. (1995).
structure analysis to detect correlates and predictors The examination of myofascial face pain and
of individual change over time. Psychological Bulletin, its relationship to psychological distress. Health
116, 363–381. Psychology, 14, 223–231.
24
Equating Groups
Stephen G. West and Felix Thoemmes
are typically individual participants, but they may be larger aggregations such as schools or entire communities. This process implies that the expected mean of the units in the T group will equal the expected mean of the C group on any conceivable measured or unmeasured baseline variable so that $\bar{Y}_T - \bar{Y}_C$ may be taken as an unbiased estimate of the treatment effect. In contrast, the observational study uses an unknown process to assign participants to the groups. Participants may choose to receive the T versus C, or participants may receive the treatment because they are located in a single community, school, hospital, or other larger unit that has agreed to participate in the study. The process through which participants end up in the T versus C groups is unknown, implying that researchers should expect that there are potential mean differences on background variables between the T and C groups at baseline, even before treatment commences. Now $\bar{Y}_T - \bar{Y}_C$ no longer represents an unbiased estimate of the causal effect of the treatment, but rather a confounded estimate reflecting some combination of the true causal effect of treatment and preexisting differences between the groups on measured or unmeasured variables at baseline (Reichardt, 2006). Only by carefully assessing critical participant characteristics at baseline and developing methods to equate the T and C groups prior to the beginning of treatment can the researcher even approximately estimate the desired effect of the treatment.

We begin this chapter by briefly reviewing the randomized experiment. The randomized experiment is often described as the ‘gold standard’ design and it serves as an important benchmark for the observational study. We identify some ways in which even randomized experiments can be enhanced through the use of additional procedures designed to more closely equate the groups at baseline. We then briefly review studies comparing the treatment effect estimates from randomized experiments to those of observational studies studying similar treatments, to provide information about the conditions under which these two designs may lead to different estimates of the treatment effect. We then introduce modern methods of adjusting treatment effects in observational studies for measured differences at baseline. These methods can substantially reduce any bias in the estimate of the treatment effect. Other approaches attempt to bracket the size of the treatment effect so that it represents a reasonable estimate even if there are variations on important unmeasured differences at baseline. Finally, we consider design enhancements that help rule out likely effects of unmeasured variables that may provide alternative explanations for the observed effect of treatment.

RANDOMIZED EXPERIMENTS

Randomization approximately equates the T and C groups at baseline. More formally, randomization produces two important results (Holland, 1986; West et al., 2000). First, as we observed above, the expected mean on any participant characteristic at baseline will be equal in the T and C groups, $E(\bar{Y}_{T,\mathrm{baseline}}) = E(\bar{Y}_{C,\mathrm{baseline}})$, where $E(\,)$ is the expected value of the variable in parentheses. Second, the binary variable X (1 = T; 0 = C) indicating the treatment condition is expected to be unrelated to all possible participant characteristics at baseline, $E(r_{XY_{\mathrm{baseline}}}) = 0$. These two results imply that $\bar{Y}_T - \bar{Y}_C$ at posttest will be an unbiased estimate of the treatment effect so that no adjustment of this effect is needed. Note, however, that these results are expectations. They will hold exactly only given very large sample sizes or across a large number of exact replications of the same experiment conducted on a single population. In any single experiment using more modest sample sizes, ‘unfortunate randomization’, in which the T and C groups differ at baseline on some subset of important background variables, can be expected to occur with some regularity. For this reason many journals in the public health area formally require that means of the T and C groups on important baseline measures be reported as a check on the success of the randomization in the experiment.
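
The point that these results are expectations rather than guarantees can be illustrated with a small simulation. The following is a minimal sketch in Python, not part of the original chapter. It assumes a single standard-normal baseline score, equal group sizes, and an arbitrary cutoff of 0.5 standard deviations for calling a baseline gap large; the function names `baseline_gap` and `summarize` are hypothetical.

```python
import random
import statistics

def baseline_gap(n_per_group, rng):
    """Randomly split 2n units into T and C; return mean(T) - mean(C) at baseline."""
    scores = [rng.gauss(0.0, 1.0) for _ in range(2 * n_per_group)]
    rng.shuffle(scores)  # random assignment: first half to T, second half to C
    t, c = scores[:n_per_group], scores[n_per_group:]
    return statistics.fmean(t) - statistics.fmean(c)

def summarize(n_per_group, reps=5000, threshold=0.5, seed=1):
    """Average baseline gap over many replications, and the share of large gaps."""
    rng = random.Random(seed)
    gaps = [baseline_gap(n_per_group, rng) for _ in range(reps)]
    mean_gap = statistics.fmean(gaps)
    share_large = sum(abs(g) > threshold for g in gaps) / reps
    return mean_gap, share_large

for n in (10, 200):
    mean_gap, share_large = summarize(n)
    print(f"n per group = {n:3d}: average gap = {mean_gap:+.3f}, "
          f"share with |gap| > 0.5 SD = {share_large:.3f}")
```

Under these assumptions the average gap across replications should sit near zero for both sample sizes, while gaps beyond the cutoff should be fairly common with 10 units per group and very rare with 200.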
Following our presentation of additional requirements for randomized field experiments, we will discuss procedures that use these baseline measures to equate groups more adequately prior to treatment in order to provide more statistically powerful tests of the treatment effects.

Additional requirements

Randomized experiments involve additional requirements that must be met for valid estimation of the treatment effect (see Chapter 8). These requirements are routinely met in most laboratory experiments, but can be easily violated in community settings. Failure to meet these requirements may necessitate the use of special procedures, the inclusion of additional design features, or the use of special analysis procedures that adjust for the potential bias (Barnard et al., 1998). Four requirements over which the experimenter may only have limited control are of particular importance in randomized field experiments.¹

1 Proper Randomization. The randomization process must be properly carried out and adhered to. Treatment providers must not be permitted to alter the assignment of participants to the T and C conditions. Kopans (1994) presents evidence that reassignment of high-risk women to the treatment condition apparently occurred in a large national randomized trial evaluating the effectiveness of screening mammography. Connor (1977) provides other examples of experiments in which randomization failed or was not maintained by treatment providers. He suggests procedures that potentially minimize the likelihood of such randomization failures. Robins (1989) and Hernán et al. (2001) present methods of adjusting treatment effect estimates in complex longitudinal studies, for example, when participants are reassigned to another treatment, as in certain medical studies in which the patient does not respond to the assigned treatment.
2 Treatment Compliance. The participants must receive the intended treatment. In randomized experiments studying mammography screening, some participants have refused screening (T). Other participants in the C group have sought out mammography screening outside the experiment (Baker, 1998). West and Sagarin (2000; see also Angrist et al., 1996; Jo, 2002) review statistical procedures that can provide proper estimates of the treatment effect when there is treatment noncompliance.
3 Absence of Attrition. All participants who are assigned to T and C conditions must be measured on the outcome variable. Even though randomization serves to equate participants on average at baseline, this equating is potentially lost if some participants are not measured at posttest. Of most concern is differential attrition in which participants with different characteristics drop out of the two groups. For example, in an experiment investigating a new method of mathematics instruction, less mathematically talented students might find the new course too challenging and withdraw prior to the collection of the outcome measure. $\bar{Y}_T$ would only be based on the scores of the more talented students assigned to the T condition, leading to an overestimate of the effectiveness of the course.
Modern missing data techniques (Little & Rubin, 2002; Schafer & Graham, 2002) can improve the estimation of the treatment effect, particularly if variables that are highly related to the outcome (e.g. baseline measures on the outcomes of interest), to missingness, or ideally both are measured at baseline. Full information maximum likelihood estimation (FIML), now available in several statistical packages (e.g. Mplus), combines all of the observed data to produce optimal estimates and standard errors for the treatment effect and other parameters of interest in the statistical model. Multiple imputation (MI), also available in several statistical packages (e.g. SAS), makes multiple copies of the dataset. In each copy, the optimal predicted value for each missing datum is calculated, then random error matching that in the complete data is added. The step of adding random error ensures that the original variability of the observed data is retained in the values that are imputed. The statistical model testing the treatment effect is then estimated in each copy of the dataset. Finally, the estimates of the treatment effects (and other parameters of interest) in each copy of the dataset are recombined. FIML and MI will both produce unbiased estimates of the treatment effect with proper standard errors if missingness is related to measured variables in the dataset, but not if there are other aspects of the missing variables that are not captured by other variables in the dataset. Consider two potential reasons why participants might be missing from a measurement session in a study of health outcomes in a large company. In the first case,
each participant’s baseline measure of health (e.g. number of days of illness the previous year) is the only variable that systematically predicts whether the participant will be present for the session. In the second case, several of the participants in a division of the company are missing because they are suffering health problems from working day and night on an intensive new project. In the first case, either FIML or MI will produce unbiased estimates because the source(s) of missingness were measured at baseline and are present in the dataset. In the second case, both FIML and MI will produce biased estimates of the treatment effect unless information about project participation and the current project-related health problems is present in the dataset. Suppose, however, that the researchers had used available substantive theory and research to select an extensive set of baseline variables that were expected to be related to the outcome variables, missingness, or both. Once again, information about project participation and project-related health problems is not available in the dataset. In this case, the use of FIML or MI will typically lead to estimates of the treatment effect that are less biased, perhaps substantially so, than methods that ignore missing data or that use traditional approaches such as listwise deletion, pairwise deletion, and mean imputation to address missing data.
4 Stable-Unit-Treatment-Value Assumption. The response of the participant should not be affected by the treatments (or the participant’s knowledge thereof) that other participants receive. This condition is known as the stable-unit-treatment-value assumption (SUTVA); its purpose is to ensure that each participant can only have one true response in the treatment condition (see Rubin, 1978, 1980). Otherwise, the outcomes of the participants in the C group are likely to be atypical. For example, if cancer patients learn that other participants have been assigned to a more promising treatment condition, they may give up hope and stop performing their normal health supportive practices (e.g. proper diet) so that they will have worse outcomes than they would have had in the absence of this knowledge.

Some effects of improving group comparability at baseline

Randomization combined with meeting the four requirements outlined above assures that the estimate of the treatment effect is unbiased. Unfortunately, this only means that the treatment effect will be correct on average. There is no guarantee that unfortunate randomization will not occur in a particular experiment. If the T and C groups can be closely equated at baseline on variables thought to be important predictors of the outcome, then the likelihood of unfortunate randomization can be substantially reduced. Equating procedures thus reduce the potential of an incorrect estimate of the treatment effect in a specific experiment. Equating procedures can also have the benefit of increasing the statistical power of the test, the probability that a true treatment effect of a specified size can be detected. Finally, they may help reduce some of the uncertainty associated with statistical methods of correcting treatment estimates when the four additional requirements are not met. The use of equating procedures is particularly important when the number of units to be assigned is small, the units are not homogeneous, or the treatment effect is not constant, but rather differs in magnitude as a function of the variable(s) on which equating is based.

Consider the following example that captures the importance of equating with a small number of non-homogeneous units. Suppose a randomized experiment is conducted in which the units are six different US cities. Each city receives either an intensive mass media campaign of anti-smoking public service announcements (T) or it does not receive any smoking-related messages in the media (C). The cities chosen for study are from three groups: (a) large cities: Chicago, IL, Los Angeles, CA; (b) medium-sized cities: Baltimore, MD, Portland, OR; and (c) small cities: Terre Haute, IN and San Angelo, TX. Three cities are to be assigned to T and three cities to C. Assume that size of the city is known to be strongly related to the effectiveness of mass media campaigns in health. Following Cochran and Cox (1957), when there are equal numbers (n) of units in the T and C groups, there are $\binom{2n}{n}$ possible randomizations.
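
The counting rule can be checked by brute-force enumeration. The sketch below is an illustration added here, not the authors' procedure. It uses the six cities and the three-versus-three design from the example; the dictionary layout, the function names, and the definition of a size-balanced split (one large, one medium, and one small city in each group) are assumptions made for the illustration.

```python
from itertools import combinations
from math import comb

# City sizes as given in the example; the data structure itself is illustrative.
cities = {
    "Chicago": "large", "Los Angeles": "large",
    "Baltimore": "medium", "Portland": "medium",
    "Terre Haute": "small", "San Angelo": "small",
}

def all_randomizations(units, n_treated):
    """Every way of choosing which n_treated units receive T; the rest receive C."""
    names = sorted(units)
    return [set(treated) for treated in combinations(names, n_treated)]

def balanced_on_size(treated, units):
    """True when the T group holds exactly one city of each size (so C does too)."""
    sizes = sorted(units[name] for name in treated)
    return sizes == ["large", "medium", "small"]

splits = all_randomizations(cities, 3)
balanced = [t for t in splits if balanced_on_size(t, cities)]

print(len(splits), comb(6, 3))  # enumerated count and formula count
print(len(balanced))            # splits balanced on city size
```

The enumeration produces 20 splits, in agreement with the formula, and 8 of them place one city of each size in each group. The remaining 12 leave at least one size class concentrated in a single group.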
In the present example, there are $\binom{6}{3} = \frac{6 \times 5 \times 4 \times 3 \times 2 \times 1}{(3 \times 2 \times 1)(3 \times 2 \times 1)}$ or 20 possible randomizations. A randomization that compared Chicago, Baltimore, and Terre Haute to Los Angeles, Portland, and San Angelo would be desirable. In contrast, a randomization that compared Chicago, Los Angeles, and Baltimore to Portland, Terre Haute, and San Angelo would be unfortunate. To avoid this problem, the researcher could match the two large cities, the two medium cities, and the two small cities. Within each matched pair, one city would be randomly assigned to T and one to C, leading to a randomization in which the T and C groups will be more adequately balanced, particularly on the critical baseline variable of the size of city.

This procedure of pair matching followed by randomization is very general. For example, in a randomized experiment evaluating a new math instruction program, students could be assessed on a baseline measure of math ability that is expected to be highly related to the outcome variable, here math achievement. The students could be ranked based on their scores and pairs formed (the two highest; the next two highest; … down to the two lowest). Once again, within each pair students would be randomly assigned to T and C groups. This procedure ensures that the T and C groups will be closely equated on the important baseline variable of pretest math ability, preventing any possibility of unfortunate randomization with respect to this critical variable. A second advantage of this procedure is that it can lead to far more statistically powerful tests of the treatment.² For example, Student (1931) showed that an early randomized experiment on 10,000 children studying the effects of pasteurized (T) versus raw (C) milk on height and weight gains could have achieved the same level of statistical power with 50 pairs of identical twins. Matching followed by randomization may also lead to a third benefit, providing a stronger foundation for addressing failures to adequately meet the additional requirements of randomized experiments (presented above). For example, the existence of well-matched pairs may provide a stronger basis for modeling the effects of treatment non-compliance and attrition, particularly in experiments in which sample sizes are moderate rather than extremely large and the size of the treatment effect is not constant, but rather depends on the level of the baseline variable (i.e. a baseline × treatment condition interaction). Conceptually, matching followed by randomization may also have other potential advantages in certain contexts as it implicitly identifies a specific comparison participant with which each treatment recipient may be compared. For example, many clinicians would ideally like to understand the effects of treatments on single cases rather than the average effect of the treatment on patients in general. The matching and randomization procedure can permit a closer approximation of this ideal than simple randomization.

When many measures are collected at baseline, matching becomes more difficult. In some cases the multiple measures can be combined a priori into a single composite variable on which matching can occur. For example, in research related to breast cancer, a set of measures including age at menarche, number of first-degree relatives (mother, sister) with breast cancer, number of previous breast biopsies, and age are combined into a single risk score using a formula based on prior epidemiological research (Gail et al., 1989). Alternatively, measures can be collected on the entire sample prior to randomization. The researcher can generate several thousand different possible randomizations and calculate Hotelling’s $T^2$ for each randomization using the key variables measured at baseline. Hotelling’s $T^2$ describes the magnitude of the multivariate difference between the groups, here on the baseline variables. The randomizations are sorted from low to high in terms of the values of Hotelling’s $T^2$. From the 5 percent or 10 percent of the randomizations with the lowest values of Hotelling’s $T^2$, a randomization is chosen, thereby minimizing potential problems of unfortunate randomization.
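
The procedure of screening candidate randomizations on Hotelling's $T^2$ can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' implementation: it handles exactly two baseline variables so that the pooled covariance matrix can be inverted by hand, it uses simulated baseline data, and the function names, the 2000 candidates, and the 10 percent shortlist are hypothetical choices within the ranges the text mentions.

```python
import random
import statistics

def hotelling_t2(t, c):
    """Two-sample Hotelling's T^2 for groups of (x, y) baseline pairs."""
    def centered_sums(group):
        mx = statistics.fmean(x for x, _ in group)
        my = statistics.fmean(y for _, y in group)
        sxx = sum((x - mx) ** 2 for x, _ in group)
        sxy = sum((x - mx) * (y - my) for x, y in group)
        syy = sum((y - my) ** 2 for _, y in group)
        return mx, my, sxx, sxy, syy

    mx_t, my_t, axx, axy, ayy = centered_sums(t)
    mx_c, my_c, bxx, bxy, byy = centered_sums(c)
    df = len(t) + len(c) - 2
    # Pooled within-group covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx, sxy, syy = (axx + bxx) / df, (axy + bxy) / df, (ayy + byy) / df
    dx, dy = mx_t - mx_c, my_t - my_c
    det = sxx * syy - sxy ** 2
    # Quadratic form d' S^{-1} d, written out for the 2x2 case
    quad = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return len(t) * len(c) / (len(t) + len(c)) * quad

def choose_randomization(units, n_candidates=2000, keep_share=0.10, seed=7):
    """Generate candidate splits, shortlist the best balanced, pick one at random."""
    rng = random.Random(seed)
    half = len(units) // 2
    candidates = []
    for _ in range(n_candidates):
        order = list(range(len(units)))
        rng.shuffle(order)
        t_idx, c_idx = order[:half], order[half:]
        t2 = hotelling_t2([units[i] for i in t_idx], [units[i] for i in c_idx])
        candidates.append((t2, sorted(t_idx)))
    candidates.sort(key=lambda item: item[0])  # low T^2 means better balance
    shortlist = candidates[: max(1, int(keep_share * n_candidates))]
    median_t2 = statistics.median(t2 for t2, _ in candidates)
    return rng.choice(shortlist), median_t2

# Hypothetical baseline data: 40 units with two correlated baseline measures.
data_rng = random.Random(0)
units = []
for _ in range(40):
    ability = data_rng.gauss(50, 10)
    units.append((ability, 0.5 * ability + data_rng.gauss(0, 5)))

(chosen_t2, treated_indices), median_t2 = choose_randomization(units)
print(f"T^2 of chosen split: {chosen_t2:.3f}; median over candidates: {median_t2:.3f}")
```

Because the final split is drawn at random from the shortlist rather than taken as the single best candidate, assignment retains a chance element while the chosen split is guaranteed to be better balanced than the typical candidate.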
flow randomization in which participants are recruited over an extended period of time) are described in Friedman et al. (1998) and Matthews (2000).

RESEARCH COMPARING THE RESULTS OF RANDOMIZED EXPERIMENTS AND OBSERVATIONAL STUDIES

As a starting point for studying methods to improve the results of observational studies, it is useful to review literature comparing the results of randomized experiments with those of observational studies. Properly implemented randomized experiments serve as the ‘gold standard’: they typically provide the best, unbiased estimates of the magnitude of the treatment effect. In contrast, the unknown rules through which participants in observational studies are assigned to the T or C conditions lead to far greater uncertainty about the treatment effect estimate. The researcher would like to claim that some aspect of the treatment caused the observed results; however, it may be possible that a failure to successfully equate the groups at the beginning of the experiment provides a strong alternative explanation (Reichardt, 2006). Even when adjustments in the treatment effect can be made on the basis of measures collected at baseline, there may be less than complete certainty that the T and C groups have been properly equated.

Statistical theory clearly identifies failure to equate the T and C groups on important variables at baseline as an important plausible problem that may occur in observational studies. However, it provides little guidance as to the likely frequency of this problem in practice, nor to the contexts in which estimates of treatment effects are most likely to be biased. To gain some insights into this issue, below we briefly review literature comparing the results of randomized experiments with observational studies that employed similar treatments. We then turn to an examination of modern statistical and design solutions that attempt to address these issues.

Two types of comparisons have been made: (a) single investigations of parallel randomized experiments and observational studies using similar (possibly identical) treatments; and (b) extensive meta-analyses of research areas investigating the effect of a treatment. Of note, exact agreement of the estimates of treatment effects in randomized experiments and observational studies should not be expected; given sampling error, even exact replications of a randomized experiment using the same population would not be expected to produce identical treatment effects. In addition, other differences between the studies representing the two designs may exist. For example, the populations sampled in the two designs, the treatment delivery, the research setting, or other methodological features (e.g. a less adequate control condition is constructed in the observational study) may differ in addition to the focal difference of randomized versus non-randomized design (Cook et al., 2006; Reichardt, 2006; West et al., 2007).

Single comparative studies

Studies comparing treatment effect estimates from randomized experiments and observational studies have produced diverse results. A classic example is Meier’s (1972) large-scale evaluation of the effectiveness of the Salk polio vaccine in the US. In some states, a randomized experiment was used; in others, an observational study. Even though both designs led to the conclusion that the Salk vaccine was effective, the effect size in the randomized experiment was substantially larger. Gilbert et al. (1975) suggested that the difference in effect sizes primarily resulted from the different populations on which the polio rates were based in the C conditions. In the randomized experiment, the comparison group included only children who had permission to be vaccinated in contrast to the observational study in which the full population was represented.

Cook et al. (2006) reviewed a unique subset of investigations in which a single randomized treatment group was compared
with both a randomized control group (randomized experiment) and a second non-randomized comparison group (yoked observational study). Those observational studies that created a high-quality comparison group produced comparable results to those of the yoked randomized experiment. Investigations with a poorly selected comparison group, poor statistical adjustment for baseline differences, or which differed in other procedural or design features between the observational study and yoked randomized experiment often produced discrepant findings.

Meta-analyses

Across diverse substantive research areas, such as skill training, organizational development, psychotherapy, and medical interventions, meta-analyses have produced heterogeneous outcomes in which randomized experiments have shown larger, smaller, and no difference in treatment effect estimates relative to observational studies. An early influential meta-analytic investigation by Sacks et al. (1983) identified six medical therapies that had been studied using both randomized experiments and observational studies. Sacks et al. concluded that observational studies produced biased results in comparison to randomized controlled trials. Attempts to adjust treatment effects in observational studies for available prognostic factors did not remove this bias. More recently, Ioannidis et al. (2001) conducted meta-analyses of 45 medical interventions (e.g. vaccines for meningitis; local versus general anesthesia) involving a total of 240 randomized trials and 168 observational studies. Overall, there was no consistent pattern of over- or under-estimation of treatment effects by the observational studies relative to the randomized experiments. Significant differences between the randomized experiments and observational studies were found in only a small proportion of the meta-analyses. Ioannidis et al. provided evidence of smaller between-study variance in the randomized experiments than in the observational studies, an important finding that suggests that the effect size estimates of observational studies may be associated with more uncertainty than randomized experiments.

Reviews of other areas also suggest that the direction of mean bias is by no means certain. Lipsey and Wilson (1993) analyzed 74 meta-analyses of behavioral and educational interventions, finding no difference in the mean effect sizes of randomized experiments and observational studies. Heinsman and Shadish (1996) analyzed four meta-analyses in the areas of drug-use prevention, psychosocial interventions for surgery, coaching for the SAT, and ability grouping in secondary schools. They found a larger effect size for randomized experiments than for observational studies. Taken together, the meta-analytic results suggest that the magnitude of bias resulting from the use of an observational study rather than a randomized experiment is typically not large and its direction is uncertain. They also suggest that area-specific choices of samples and methodological features (e.g. type of comparison group) may be important determinants of any bias that is observed.

Methodological features

Heinsman and Shadish (1996) coded methodological features that might potentially account for the observed difference in effect sizes between randomized experiments and observational studies in four behavioral science research areas (e.g. SAT coaching, drug use prevention). Of importance, they found in a regression analysis that not allowing self-selection into T versus C conditions in observational studies, using a control group from the same population as the treatment group, minimizing the baseline effect size difference between the T and C groups, and minimizing both overall attrition and differential attrition made the treatment effect estimates more comparable in the two designs. Shadish and Ragsdale (1996) found similar results in a meta-analysis of randomized experiments and observational studies of marital or family psychotherapy. Consistent with these findings, Heckman and Robb (1986)
also point to conceptual and statistical reasons why allowing participants to self select into T and C groups is particularly likely to lead to biased estimates. These results suggest that it may be possible to improve estimates of treatment effects in observational studies through the careful use of design and analysis strategies.

Adjustment strategies for equating groups at baseline

Matching

Matching is used in observational studies to identify a set of participants in the T and C groups that are comparable. To illustrate, consider two small school classrooms, labeled A and B, one of which implements an innovative new math curriculum, whereas the other implements a standard math curriculum in 6th grade. Table 24.1 illustrates the basic process of simple 1:1 matching. All students in both classrooms are given an IQ test at the beginning of the school year. For each student in classroom A, an attempt is made to identify a student in classroom B who is closely equated on IQ. This matching process diminishes the mean difference in baseline IQ between the two groups in our example from $M_A - M_B = 5$ in the full unmatched sample to $M_A - M_B = 0.5$ in the reduced, matched sample. A variety of computer algorithms are available that match T and C participants to produce the minimum discrepancy on the pretest variable (see Ming & Rosenbaum, 2001; Rosenbaum, 2002). These computer algorithms are particularly useful when both the T and C groups are large, are of dramatically different sizes, or both. For example, observational studies of initial trials of innovative programs (T) may involve a relatively small number of participants, whereas there are a substantially larger number of participants in the standard program (C) that serve as the comparison. In such cases, the algorithm will select a variable number of optimal matches (e.g. up to 5) for each participant (see note 3). These variable matching procedures lead to more adequate equating of the groups on the matching variable and greater statistical power for the T versus C comparison, given the larger sample size (Ming & Rosenbaum, 2000).

Researchers are encouraged to measure many variables at baseline, particularly those that may be related to treatment group assignment or the outcome variable. Substantive theory and prior research can provide guidance in the selection of a set of measures that will capture as fully as possible potential baseline differences between the T and C groups. However, the availability of a large number of baseline variables makes matching far more complex. In rare cases, a composite variable can be created (e.g. the Gail score for breast cancer risk described earlier). More commonly, propensity scores are used. Propensity scores provide an estimate of the probability that a participant will be assigned to the treatment group (Rosenbaum, 2002; Rosenbaum & Rubin, 1983, 1984; Rubin, 1997; Shadish et al., 2006; Smith, 1997). The researcher uses all baseline variables (or a subset containing the most important ones

Table 24.1 Illustration of simple matching of two small classrooms on baseline IQ scores

Pair        Classroom A   Classroom B
unmatched   130           -
1           125           124
2           120           120
3           119           119
4           119           118
5           117           116
6           115           115
7           109           109
8           107           107
9           107           106
10          104           102
11          101           101
12          96            96
unmatched   -             90
unmatched   -             89

Note: Scores were ordered within units and represent pretest IQ scores of participants. Pairs of participants on the same line represent matched pairs. One person in Classroom A and two persons in Classroom B have no matched pairs. The mean IQ score for all participants in Classroom A is 113; the mean IQ score for all participants in Classroom B is 108. The mean difference ($\bar{Y}_A - \bar{Y}_B$) for the full unmatched sample is 5. The mean for the matched pairs of Classroom A is 111.6 and for Classroom B is 111.1, yielding a mean difference of 0.5. $n_A = 13$ and $n_B = 14$ for the full sample.
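The 1:1 matching illustrated in Table 24.1 can be sketched in a few lines of Python. This is a minimal illustration, not one of the optimal matching algorithms cited in the text: it uses a simple greedy nearest-neighbour rule, and the 2-point caliper and the function name `greedy_match` are assumptions introduced here. With these choices the sketch reproduces the 12 pairs and the means reported in the table note.

```python
from statistics import mean

# Baseline IQ scores taken from Table 24.1.
classroom_a = [130, 125, 120, 119, 119, 117, 115, 109, 107, 107, 104, 101, 96]
classroom_b = [124, 120, 119, 118, 116, 115, 109, 107, 106, 102, 101, 96, 90, 89]


def greedy_match(group_a, group_b, caliper):
    """Pair each A score with the closest unused B score within `caliper` points."""
    available = sorted(group_b, reverse=True)
    pairs = []
    for a in sorted(group_a, reverse=True):
        if not available:
            break
        closest = min(available, key=lambda b: abs(a - b))
        if abs(a - closest) <= caliper:
            pairs.append((a, closest))
            available.remove(closest)  # each B student is used at most once
    return pairs


# The 2-point caliper is an assumption chosen for this illustration.
pairs = greedy_match(classroom_a, classroom_b, caliper=2)

full_diff = mean(classroom_a) - mean(classroom_b)
matched_a = mean(a for a, _ in pairs)
matched_b = mean(b for _, b in pairs)

print(f"full sample:    {mean(classroom_a):.1f} vs {mean(classroom_b):.1f}, "
      f"difference {full_diff:.1f}")
print(f"matched sample: {matched_a:.1f} vs {matched_b:.1f}, "
      f"difference {matched_a - matched_b:.1f}")
print(f"matched pairs:  {len(pairs)}")
```

A greedy rule depends on the order in which participants are processed and need not minimize the total discrepancy across pairs, which is one reason dedicated matching algorithms are preferred for larger samples.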
if this number is very large) and predicts the probability that the participant will be in the T group. This probability is known as the propensity score.

There are two major issues in the creation of propensity scores. The first is to make sure that subject matter expertise in the form of prior research and theory has been used to select baseline measures that will capture as fully as possible important baseline differences between the T and C groups. The second is to choose a statistical model that adequately represents the form of the relationship between the variables and each participant’s propensity score. Rosenbaum and Rubin (1983) used simple linear logistic regression to produce these estimates. Dehejia and Wahba (1999) used more complicated logistic regression models involving specification of interactions and curvilinear effects of baseline variables. McCaffrey et al. (2004) used automated stepwise nonparametric regression tree methods to model possible complex relationships between the variables and the propensity score. In each case the goal is to achieve T and C groups that are balanced on all important baseline variables and for which the error of prediction in the sample has been minimized (Shadish et al., 2006). As an important check on the success of this procedure, the data are divided into five strata and the balance of the baseline variables within each stratum is compared. When balance is achieved, there is a strong basis for comparing the groups. If balance is not achieved within one (or more) stratum, the comparison of the treatment and control groups is carried out only over those strata on which balance has been achieved.

Each participant’s propensity score may then be taken as the best summary of the baseline information. The propensity score is used as the basis for equating the groups. The groups may be equated using the standard 1 to 1 or variable many to 1 matching procedures described above. Alternatively, analysis of covariance or blocking on the strata may be used (but see footnote 3). As an illustration of the matching strategy, Wu et al. (in press) constructed propensity scores for retention in first grade from a large set of baseline variables measured early in the school year. In the full sample (n = 769) of children at risk for grade retention, there were large differences between students on the Woodcock Johnson reading score at baseline. Students who were later retained in first grade had substantially lower scores than students who were later promoted to second grade, $\bar{Y}_{\text{baseline-retained}} = 420$ versus $\bar{Y}_{\text{baseline-promoted}} = 438$. Optimal 1 to 1 matching on propensity scores yielded 97 matched pairs with $\bar{Y}_{\text{baseline}} = 422.4$ for the retained students and $\bar{Y}_{\text{baseline}} = 423.4$ for the promoted students. Similar reductions in baseline differences were achieved for other variables measured at baseline. Theoretically, propensity scores will provide a proper adjustment for the unknown assignment rule if all important baseline variables have been included and the form of the propensity model has been correctly specified.

Matching has substantial strengths in that it does not require specification of the form of the relationship between the baseline and outcome variables, it clearly delimits the range of the baseline variables over which T and C can be appropriately compared, and it leads to efficient estimates of the treatment effect because of the small number of parameter estimates that are involved. Hypothesized treatment group × baseline level interactions can also be examined within the matched propensity score framework. There are two primary limitations of the matched propensity score framework. First, it does not adjust the treatment effect for measurement error in the baseline variables, giving rise to potential regression to the mean effects if very reliable and stable measures of important baseline variables are not available. Second, it does not adjust for other important variables (hidden variables) that are not measured at pretest, again emphasizing the importance of selection of the full range of potential baseline variables based on subject matter expertise.

Statistical adjustment strategies based on measured baseline differences

A variety of statistical models may be developed that attempt to adjust for baseline differences in measured variables. Perhaps,
the simplest is analysis of covariance (Huitema, 1980; Reichardt, 1979), which is used to provide an adjustment of the treatment effect for one or more baseline variables. Typically, a simple linear model is used, $\hat{Y} = b_0 + b_1\,\text{COV} + b_2 X$, where Y is the outcome variable, COV is the covariate measured at baseline and X is the binary treatment indicator. This model can be extended to include multiple covariates, other parametric relationships (e.g. addition of a $b_3\,\text{COV}^2$ term to represent a quadratic relationship between COV and Y), and treatment × covariate interactions (Cohen et al., 2003; Huitema, 1980; Reichardt, 1979). Nonparametric methods can be used to model more complex relationships between the covariate and Y (see Little et al., 2000). The primary limitation of ANCOVA methods is that their success in equating the T and C groups depends heavily on the correct specification of the adjustment model. For example, if the relationship between COV and Y is nonlinear and a simple linear ANCOVA model is used, the treatment effect estimate will be biased.

The basic ANCOVA approach shares the limitation with matching that baseline variables may be measured with less than perfect reliability. This problem is most serious when the T and C groups are selected from different populations, so that regression to the mean will occur (see Campbell & Kenny, 1999; Shadish et al., 2002). Even if the statistical adjustment model is otherwise correctly specified, measurement error will typically lead to under-adjustment of the treatment effect for baseline differences. Huitema (1980) provides an introduction and Fuller (1987) provides a more advanced treatment of methods for correcting for measurement error in the context of ANCOVA. Alternatively, when multiple indicators are available for each important construct measured at pretest, structural equation models can be used to provide measurement error-free estimates of the treatment effect. Aiken et al. (1994) provide a good discussion of the use of this approach and apply it to the evaluation of a drug treatment program. One limitation of the structural equation modeling approach is that the models to date have specified a linear relationship between the baseline measures and the outcome. Lee et al. (2004), Marsh et al. (2004), and Wall and Amemiya (2007) describe extensions of structural equation models that may account for curvilinear and interactive effects.

Correction for measurement error can also be desirable when treatment participants are selected on the basis of a variable that is unstable over time. For example, if T participants are selected based on high scores on a measure of depression (or because they are seeking treatment because of a severe depressive episode), it is likely that some of the participants are in a temporary state of high depression and would return to their typical level of depression in the absence of any treatment simply given the passage of time. Reliability correction methods that adjust the estimate of the treatment effect for the test-retest reliability for the time interval between the baseline and outcome measures in the absence of treatment can improve the estimate of the treatment effect. If repeated measures are collected on multiple indicators of the outcome variable at baseline and multiple other time points, special structural equation models can be used that partition the variance at each time point into state (temporary) and trait (true score) components (cf. Khoo et al., 2006; Steyer et al., 1992).

Adjusting for unmeasured baseline differences (hidden variables)

The matching and the statistical adjustment strategies described above can provide appropriate correction of the estimate of the treatment effect for variables measured at baseline. However, it is also possible that variables that are not measured at baseline could account for all or part of the estimated treatment effect. Three general strategies exist for addressing this problem.

First, a variety of methods have been proposed for conducting sensitivity analyses of treatment effect estimates (Marcus, 1997; McCaffrey et al., 2004; Rosenbaum, 2002). As an illustration of one simple method, imagine a researcher has found a
0.8 standard deviation difference (large effect size) between the T and C groups on the outcome variable. The researcher would then identify the largest standardized difference between the T and C groups on the set of variables measured at baseline. Suppose the largest baseline difference were d = 0.5 standard deviations. Then the researcher identifies the maximum correlation between any of the baseline measures and the posttest measure of the outcome of interest. Suppose the maximum correlation were r = 0.6. The product of these two quantities, $\text{adjustment} = \frac{\bar{Y}_{\text{baseline},T} - \bar{Y}_{\text{baseline},C}}{SD} \times r_{\text{baseline-outcome}}$, here adjustment = 0.5 × 0.6 = 0.3, provides a rough estimate of the maximum extent that this estimate of the standardized treatment effect would need to be reduced given what is a ‘worst case scenario’ for an important hidden variable. If the standardized treatment effect were reduced by this amount, to 0.8 − 0.3 = 0.5 in our example, we would have a plausible estimate of its lower bound. If this value were still statistically significant, it would provide evidence that the treatment effect is robust. Note that there is no theoretical reason why the actual adjustment required for hidden variables could not exceed this value. However, in practice, if a number of variables are measured at baseline and they can be presumed to be representative of important hidden variables, the adjustment will nearly always be an overestimate of the adjustment needed in practice.

Econometric approaches (e.g. Barnow et al., 1980; Heckman, 1979, 1989, 1990; Muthén & Jöreskog, 1983) have been proposed that adjust for the effects of both measured and unmeasured variables at baseline. Two separate equations are used in these models. The first (selection model) equation uses measured baseline variables to predict the assignment of the participant to the treatment or control group. The second uses this selection probability, an indicator variable (T = 1; C = 0) for treatment condition, and potentially other covariates to estimate the outcome. A key feature of this approach is the requirement of an instrumental variable (see note 4), a variable that strongly predicts treatment assignment in the first equation but which has no separate relationship to the outcome (see Figure 24.1). In essence, the instrument can be thought of as a naturally occurring randomization (Heckman, 1996). The instrumental variable can only affect the outcome indirectly through its effect on treatment assignment, an assumption known as the exclusion restriction. If the assumptions of this approach are met, the treatment effect estimate will include proper adjustment for both measured and unmeasured baseline variables. However, in practice, this method is extremely sensitive to violations of its underlying assumptions, particularly the exclusion restriction (Heckman, 1997; Stolzenberg & Relles, 1990; Winship &

Figure 24.1 Illustration of econometric selection bias model
[Path diagram not reproduced. Diagram labels: Instrumental Variable, Treatment Indicator, Outcome, $B_{OT}$, Residual 1, Residual 2.]

Note: The instrumental variable directly affects only the Treatment Indicator (T = 1; C = 0). This condition is known as the exclusion restriction. Residual 1 is the error of the prediction of the Treatment Indicator including error produced by hidden variables. The hidden variables may also be associated with the residual of the Outcome (Residual 2). If the model is correctly specified, an adjustment of the regression coefficient $B_{OT}$ will yield an unbiased estimate of the treatment effect controlling for the hidden variables. If the assumptions of the model are violated (notably the exclusion restriction), the estimate of the treatment effect may be severely biased.
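To make the role of the instrument concrete, the following Python sketch builds a small hypothetical population in which a hidden variable biases the naive T versus C comparison. It is not the two-equation selection model described above but the simplest instrumental variable (Wald) estimator, offered only as an illustration. All data values, the variable names, the helper `covariance`, and the true effect of 2.0 are assumptions made for this example; the exclusion restriction and a constant treatment effect hold by construction.

```python
from statistics import mean

# Hypothetical toy population: each (z, u) cell appears equally often, so the
# instrument z is unrelated to the hidden variable u by construction.
TRUE_EFFECT = 2.0
rows = []
for z in (0, 1):        # instrumental variable
    for u in (0, 1):    # hidden variable, never observed by the analyst
        t = 1 if (z or u) else 0          # treatment selection depends on z and u
        y = TRUE_EFFECT * t + 3.0 * u     # z affects y only through t
        rows.extend([(z, t, y)] * 25)


def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    return mean((x - mx) * (y - my) for x, y in zip(xs, ys))


z_vals = [r[0] for r in rows]
t_vals = [r[1] for r in rows]
y_vals = [r[2] for r in rows]

# Naive comparison of T and C means, confounded by the hidden variable.
naive = (mean(y for _, t, y in rows if t == 1)
         - mean(y for _, t, y in rows if t == 0))

# Wald / instrumental variable estimate: cov(z, y) / cov(z, t).
iv_estimate = covariance(z_vals, y_vals) / covariance(z_vals, t_vals)

print(f"true effect:  {TRUE_EFFECT:.1f}")
print(f"naive T - C:  {naive:.1f}")
print(f"IV estimate:  {iv_estimate:.1f}")
```

In this constructed example the naive difference is 4.0, twice the true effect, whereas the instrumental variable estimate recovers 2.0. If `y` were changed so that `z` entered it directly, the exclusion restriction would fail and the IV estimate would no longer equal the true effect.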
Mare, 1992). When assumptions are violated, the treatment effect estimates of econometric models can be far more biased than those based on simpler approaches like ANCOVA or matching. In addition, even if the assumptions of the approach are met, the standard errors of the estimate of the treatment effect can be extremely large if the instrument is not very strongly related to treatment assignment. Finally, the econometric approach assumes that the treatment effect is constant across all participants.

A third approach, suggested by Manski (1994), Manski and Nagin (1998), and Manski and Pepper (2000), has explored the effects of making weaker assumptions about instrumental variables in econometric selection models. This approach results in the estimation of a plausible range of values for the treatment effect within upper and lower bounds. However, in some cases, the bounds may be very large so that little information is conveyed about the size of the treatment effect.

Adjusting for growth

A final issue occurs when participants show different rates of natural growth (e.g. young children in math skills) or decline (e.g. Alzheimer’s patients in memory) on the outcome variable of interest. With observations taken only at baseline, no measure of the natural growth rate in the absence of treatment is available for the participants. Change score analysis (Judd & Kenny, 1981) can be used to estimate the treatment effect. Participants are measured on the same measure at baseline and outcome. These baseline and outcome measures are then transformed so that their variances are equated (see Huitema, 1980). The mean change in the T group is then compared with the mean change in the C group to provide an estimate of the treatment effect. This approach adequately models special situations in which growth is occurring at a constant rate across all participants or is of the fan spread variety in which growth is occurring at a rate proportional to the participant’s baseline score (e.g. those advantaged at baseline gain more). Treatment effects for other forms of growth are not well represented using this approach. More adequate modeling of growth requires the collection of additional data at multiple time points, ideally both before and after the treatment (Shadish et al., 2002; West et al., 2000). If sufficient additional time points are collected, the natural pattern of growth prior to treatment can be estimated; this pattern can be compared to the pattern of growth following the introduction of treatment in the T group. Singer and Willett (2002) describe multilevel modeling methods that estimate the treatment effect while allowing for differences between participants in growth rates.

Design enhancements

In line with the topic of this chapter, we have focused on methods of equating the T and C groups at baseline. However, we would be remiss if we did not remind readers of an important alternative strategy emphasized by Shadish and Cook (1999) and Shadish et al. (2002). This strategy involves adding design features that address specific threats to validity that arise in observational studies. Shadish and Cook (1999) argue that the use of design enhancements will often be preferable to the use of statistical adjustment strategies. We present three methods of enhancing the design of the basic observational study here (see Shadish & Cook, 1999, for an extensive list).

Multiple control groups

When a treatment and control group are selected in an observational study, they will be similar at baseline in some respects and different in others. This feature gives rise to the possibility that some hidden variable may be accounting for the result. If multiple control groups can be identified and the estimates of the treatment effects are similar when different control groups are used, the researcher’s confidence that the treatment effect is not biased is increased. For example, using a large database, Roos et al. (1978) compared children receiving tonsillectomies (T) with two different comparison groups: (a) children having a matched history of
respiratory illness; and (b) untreated siblings of the T child who were similar in age. Rosenbaum (2002) presents several examples of the use of this strategy.

Nonequivalent dependent variables

Other dependent variables that would be expected to be affected by the same factors as the outcome of interest, but not by the treatment, can sometimes be identified. Reynolds and West (1987) studied the effect of a promotional campaign (T) versus no campaign (C) on the sales of state lottery tickets in convenience stores. The sales of lottery tickets increased in the T stores relative to the C stores. However, sales of other classes of items (e.g. groceries, gasoline) did not change appreciably, providing support that the increase in ticket sales resulted from the promotional campaign rather than other factors (e.g. greater increase in customer traffic in T stores).

Multiple pretreatment measures over time

We noted earlier that the collection of multiple measurements over time prior to treatment permits estimation of the pattern of growth or decline in the absence of treatment. In one design reported by Reynolds and West (1987), sales figures were available from each store for each of the 12 weeks of the lottery game. Sales declined each week during the lottery. The sales campaign was introduced into the T stores during the middle of the lottery, permitting a strong basis for estimating the treatment effect despite different rates of decline in the individual participating stores.

SUMMARY AND CONCLUSIONS

In this chapter we have considered methods of equating groups at baseline in randomized experiments and in observational studies. In randomized experiments, groups are equated to avoid unfortunate randomization and to maximize statistical power. Equating groups at baseline can also be helpful in interpreting the results when there is a breakdown of the original randomization, for example through treatment noncompliance or attrition. In observational studies, groups are equated to help assure that the estimate of the treatment effect is unbiased and not the result of baseline differences on measured or unmeasured variables.

Initial attempts to compare the effect sizes of observational studies and randomized experiments studying the same treatment have suggested that the direction of bias, if any, observed in the observational study is not consistent, but rather depends on the research area and features of the design. Research by Shadish and colleagues suggests that three related factors are all associated with bias in treatment effect estimates in observational studies: (a) larger measured baseline differences; (b) self-selection into treatment; and (c) the use of comparison groups selected from a different population than the treatment group.

A variety of statistical adjustment and design approaches were considered to minimize the influence of these factors. Matching strategies, including matching on propensity scores, provide a strong basis for equating the T and C groups on measured variables. Key determinants of the success of this strategy include the use of content area expertise to select reliable variables that will capture baseline differences between groups as fully as possible and careful checking that the propensity score model leads to balance of the baseline variables within each stratum on the propensity score. Analysis of covariance and structural equation modeling can also properly equate the T and C groups on variables measured at baseline and can also provide adjustment for measurement error. The key determinant of the success of these strategies is whether the relationships between the baseline variables and the outcome variable have been properly specified. For example, structural equation models have only recently been extended beyond examination of linear relationships. Econometric approaches provide appropriate adjustment for both measured and unmeasured variables, but the results may be fragile as they are dependent on meeting strong
statistical assumptions. Other econometric methods make weaker assumptions and provide upper and lower bound estimates of treatment effects; however, if the bounds are wide, there will be considerable uncertainty as to the true size of the treatment effect. Change score analyses can estimate models for the special case in which there is constant or fan spread growth (or decline) in the absence of treatment. Addressing more complex forms of growth requires the collection of additional measurements over time both pre- and post-treatment.
A complementary and often preferable approach to statistical adjustment is the inclusion of design enhancements that address specific threats to internal validity that arise in observational studies. Potential nonequivalence can be addressed during the design of the study, ensuring that the participants in the T and C conditions are sampled from populations that are as comparable as possible. The use of additional design features that rule out specific threats to internal validity can often increase the confidence with which inferences about treatment effects may be made. These include the use of multiple control groups that address different threats to validity, nonequivalent dependent variables that would be expected to be affected by potential threats to validity, but not the treatment, and multiple pretreatment measures over time which permit estimation of patterns of natural growth and decline.
As researchers move from the ideal randomized experiment to weaker designs such as broken randomized designs involving noncompliance or attrition, to designs in which participants are assigned to T versus C conditions on the basis of a quantitative measure (Reichardt, 2006; see also Cook & Wong, this volume), and finally, to the observational studies that have unknown assignment rules, the estimate of the magnitude of the treatment effect becomes associated with increasing uncertainty. To the extent that researchers can bring substantive knowledge, additional design features that address specific validity threats, and good measurement to bear, this uncertainty can be reduced. Rosenbaum (2002) and Shadish et al. (2002) offer useful advice for planning studies to achieve this end.
NOTES
1 Other threats to internal validity are also possible, as when the experimenter uses different equipment or different observers to measure the outcome variable in the T and C conditions.
2 Blocking or analysis of covariance may also be used to increase statistical power. A priori matching is often preferred because it does not assume a specific form of relationship between the variable(s) on which participants are matched and the outcome variable. Matching can also make it easier to detect unexpected interactions between the matching variable(s) and treatment. Maxwell and Delaney (2004, pp. 448–452) provide a comparative discussion of the conditions under which matching, blocking, and analysis of covariance may be preferred.
3 As the ratio of the number of participants in the C to the T group approaches 5 or 6 to 1, the statistical power of the test approaches an asymptote. Adding additional C participants will lead to only very minimal increases in statistical power.
4 Earlier work within the econometric tradition proved that selection models were identified so that treatment effects could be estimated without an instrument. However, these models require the assumption of a specific distribution of the variables in the population. Theoretical work by Little (1985) showed that these models are extraordinarily sensitive to the specific distributional assumptions that were made. More recent work by Heckman (1997) has emphasized the importance of having a good instrument in producing unbiased estimates of treatment effects.
REFERENCES
Aiken, L. S., Stein, J. A., & Bentler, P. M. (1994). Structural equation analysis of clinical subpopulation differences and comparative treatment outcomes: Characterizing the daily lives of drug addicts. Journal of Consulting and Clinical Psychology, 62, 488–499.
Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of causal effects using instrumental variables (with commentary). Journal of the American Statistical Association, 91, 444–472.
Baker, S. G. (1998). Analysis of survival data from a randomized trial with all-or-none compliance: Estimating
the cost-effectiveness of a cancer screening program. Journal of the American Statistical Association, 93, 929–934.
Barnard, J., Du, J., Hill, J. L., & Rubin, D. B. (1998). A broader template for analyzing broken randomized experiments. Sociological Methods and Research, 27, 285–317.
Barnow, L. S., Cain, G. G., & Goldberger, A. S. (1980). Issues in the analysis of selection bias. In E. S. Stromsdorfer & G. Farkas (Eds), Evaluation studies review annual (Vol. 5, pp. 53–59). Beverly Hills, CA: Sage.
Campbell, D. T., & Kenny, D. A. (1999). A primer on regression artifacts. New York: Guilford.
Cochran, W. G., & Cox, G. M. (1957). Experimental designs (6th ed.). New York: Wiley.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Erlbaum.
Conner, R. F. (1977). Selecting a control group: An analysis of the randomization process in twelve social reform programs. Evaluation Quarterly, 1, 195–244.
Cook, T. D., Shadish, W. R., Jr., & Wong, V. C. (2006). Within-study comparisons of experiments and non-experiments: What the findings imply for the validity of different kinds of observational study. Unpublished Manuscript, Northwestern University. Available at: http://www.metheval.uni-jena.de/projekte/symposium2006/contributions.php
Dehejia, R. H., & Wahba, S. (1999). Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs. Journal of the American Statistical Association, 94, 1053–1062.
Friedman, L. M., Furberg, C. D., & DeMets, D. L. (1998). Fundamentals of clinical trials (3rd ed.). New York: Springer.
Fuller, W. A. (1987). Measurement error models. New York: Wiley.
Gail, M. H., Brinton, L. A., Byar, D. P., Corle, D. K., Green, S. B., Schairer, C., & Mulvihill, J. J. (1989). Projecting individualized probabilities of developing breast cancer for White females who are being examined annually. Journal of the National Cancer Institute, 81, 1879–1886.
Gilbert, J. P., Light, R. J., & Mosteller, F. (1975). Assessing social innovations: An empirical base for policy. In C. A. Bennett & A. A. Lumsdaine (Eds), Evaluation and experiment: Some critical issues in assessing social programs (pp. 39–193). New York: Academic.
Heckman, J. J. (1979). Sample bias as a specification error. Econometrica, 46, 153–162.
Heckman, J. J. (1989). Causal inference and nonrandom samples. Journal of Educational Statistics, 14, 159–168.
Heckman, J. J. (1990). Varieties of selection bias. American Economic Review, 80, 313–318.
Heckman, J. J. (1996). Randomization as an instrumental variable. Review of Economics and Statistics, 77, 336–341.
Heckman, J. J. (1997). Instrumental variables: A study of implicit behavioral assumptions used in making program evaluations. Journal of Human Resources, 32, 441–462.
Heckman, J. J., & Robb, R. (1986). Alternative methods for solving the problem of selection bias in evaluating the impact of treatments on outcomes. In H. Wainer (Ed.), Drawing inferences from self-selected samples (pp. 63–113). New York: Springer-Verlag.
Heinsman, D. T., & Shadish, W. R. (1996). Assignment methods in experimentation: When do nonrandomized experiments approximate answers from randomized experiments? Psychological Methods, 1, 154–169.
Hernán, M. A., Brumbach, B., & Robins, J. M. (2001). Marginal structural models to estimate the joint causal effect of nonrandomized treatments. Journal of the American Statistical Association, 96, 440–448.
Holland, P. W. (1986). Statistics and causal inference (with discussion). Journal of the American Statistical Association, 81, 945–970.
Huitema, B. E. (1980). The analysis of covariance and alternatives. New York: Wiley.
Ioannidis, J. P. A., Haidich, A.-B., Pappa, M., Pantazi, N., Kokori, S. I., Tektonidou, M. G., Contopoulous-Ioannidis, D. G., & Lau, J. (2001). Comparison of evidence of treatment effects in randomized and nonrandomized studies. Journal of the American Medical Association, 286, 821–830.
Jo, B. (2002). Statistical power in randomized intervention studies with noncompliance. Psychological Methods, 7, 178–193.
Judd, C. M., & Kenny, D. A. (1981). Estimating the effects of social interventions. New York: Cambridge.
Khoo, S.-T., West, S. G., Wu, W., & Kwok, O.-M. (2006). Longitudinal methods. In M. Eid and E. Diener (Eds), Handbook of multimethod measurement in psychology (pp. 301–317). Washington, DC: American Psychological Association.
Kopans, D. B. (1994). Screening for breast cancer and mortality reduction among women 40–49 years of age. Cancer, 74 (Supplement.), 311–322.
Lee, S. Y., Song, X. Y., & Poon, W. Y. (2004). Comparison of approaches in estimating interaction and quadratic effects of latent variables. Multivariate Behavioral Research, 39, 37–67.
Lipsey, M. W., & Wilson, D. B. (1993). The efficacy of psychological, educational, and behavioral treatment: Confirmation from meta-analysis. American Psychologist, 48, 1181–1209.
Little, R. J. (1985). A note about models for selectivity bias. Econometrica, 53, 1469–1474.
Little, R. J., Hyonggin, J., Johanns, J., & Giordani, B. (2000). A comparison of subset selection and analysis of covariance for the adjustment of confounders. Psychological Methods, 5, 459–476.
Little, R. J., & Rubin, D. B. (2000). Causal effects in epidemiological studies via potential outcomes: Concepts and analytical approaches. Annual Review of Public Health, 21, 121–145.
Little, R. J., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). New York: Wiley.
Manski, C. F. (1994). The selection problem. In C. Sims (Ed.), Advances in econometrics (Vol. 1, pp. 147–170). Cambridge, UK: Cambridge University Press.
Manski, C. R., & Nagin, D. S. (1998). Bounding disagreements about treatment effects. Sociological Methodology, 28, 99–137.
Manski, C. R., & Pepper, J. V. (2000). Monotone instrumental variables: With an application to the return to schooling. Econometrica, 68, 997–1010.
Marcus, S. (1997). Using omitted variable bias to assess uncertainty in the estimation of an AIDS education treatment effect. Journal of Educational and Behavioral Statistics, 22, 193–202.
Marsh, H. W., Wen, Z. L., & Hau, K. T. (2004). Structural equation models of latent interactions: Evaluation of alternative estimation strategies and indicator construction. Psychological Methods, 9, 275–300.
Matthews, J. N. S. (2000). An introduction to randomized clinical trials. New York: Oxford.
Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison perspective (2nd ed.). Mahwah, NJ: Erlbaum.
McCaffrey, D. F., Ridgeway, G., & Morral, A. R. (2004). Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychological Methods, 9, 403–425.
Meier, P. (1972). The biggest public health experiment ever: The 1954 field trial of the Salk poliomyelitis vaccine. In J. M. Tanur, F. Mosteller, W. H. Kruskal, R. F. Link, R. S. Pieters, & G. R. Rising (Eds), Statistics: A guide to the unknown (pp. 120–129). San Francisco: Holden Day.
Ming, K., & Rosenbaum, P. R. (2000). Substantial gains in bias reduction from matching with a variable number of controls. Biometrics, 56, 118–124.
Ming, K., & Rosenbaum, P. R. (2001). A note on optimal matching with variable controls using the assignment algorithm. Journal of Computational and Graphical Statistics, 10, 455–463.
Muthén, B., & Jöreskog, K. G. (1983). Selectivity problems in quasi-experimental studies. Evaluation Review, 7, 139–174.
Reichardt, C. S. (1979). The statistical analysis of data for nonequivalent group designs. In T. D. Cook and D. T. Campbell (Eds), Quasi-experimentation: Design and analysis issues for field studies (pp. 147–205). Boston: Houghton-Mifflin.
Reichardt, C. S. (2006). The principle of parallelism in the design of studies to estimate treatment effects. Psychological Methods, 11, 1–18.
Reynolds, K. D., & West, S. G. (1987). A multiplist strategy for strengthening nonequivalent control group designs. Evaluation Review, 11, 691–714.
Robins, J. M. (1989). The analysis of randomized and nonrandomized AIDS trials using a new approach to causal inference in longitudinal studies. In L. Sechrest, H. Freeman, & A. Mulley (Eds), Health services research methodology: A focus on AIDS (pp. 113–159). Washington, DC: US Public Health Service.
Roos, L. L., Jr., Roos, N. P., & Henteleff, P. D. (1978). Assessing the impact of tonsillectomies. Medical Care, 16, 502–518.
Rosenbaum, P. R. (2002). Observational studies (2nd ed.). New York: Springer-Verlag.
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41–55.
Rosenbaum, P. R., & Rubin, D. B. (1984). Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association, 79, 516–524.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66, 688–701.
Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. Annals of Statistics, 6, 34–58.
Rubin, D. B. (1980). Discussion of ‘Randomization analysis of experimental data in the Fisher randomization test,’ by D. Basu. Journal of the American Statistical Association, 75, 591–593.
Rubin, D. B. (1997). Estimating causal effects from large data sets using propensity scores. Annals of Internal Medicine, 127, 757–763.
Rubin, D. B. (2005). Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100, 322–331.
Sacks, H. S., Chalmers, T. C., & Smith, H. (1983). Sensitivity and specificity of clinical trials: Randomized
v. historical controls. Archives of Internal Medicine, 143, 753–755.
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147–177.
Shadish, W. R., & Cook, T. D. (1999). Design rules: More steps towards a complete theory of quasi-experimentation. Statistical Science, 14, 294–300.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton-Mifflin.
Shadish, W. R., Luellen, J. K., & Clark, M. H. (2006). Propensity scores and quasi-experiments: A testimony to the practical side of Lee Sechrest. In R. R. Bootzin & P. E. McKnight (Eds), Strengthening research methodology: Psychological measurement and evaluation (pp. 143–157). Washington, DC: American Psychological Association.
Shadish, W. R., & Ragsdale, K. (1996). Random versus nonrandom assignment in controlled experiments: Do you get the same answer? Journal of Consulting and Clinical Psychology, 64, 1290–1305.
Singer, J. D., & Willett, J. B. (2002). Applied longitudinal data analysis: Modeling change and event occurrence. New York: Oxford.
Smith, H. L. (1997). Matching with multiple controls to estimate treatment effects in observational studies. Sociological Methodology, 27, 325–353.
Steyer, R., Ferring, D., & Schmitt, M. J. (1992). States and traits in psychological assessment. European Journal of Psychological Assessment, 8, 79–98.
Stolzenberg, R. M., & Relles, D. A. (1990). Theory testing in a world of constrained research design. Sociological Methods and Research, 18, 395–415.
Student (W. S. Gosset). (1931). The Lancashire milk experiment. Biometrika, 23, 398–406.
Wall, M. M., & Amemiya, Y. (2007). A review of nonlinear factor analysis and nonlinear structural equation modeling. In R. Cudeck & R. C. MacCallum (Eds), Factor analysis at 100: Historical developments and future directions (pp. 337–361). Mahwah, NJ: Erlbaum.
West, S. G., Biesanz, J. C., & Pitts, S. C. (2000). Causal inference and generalization in field settings: Experimental and quasi-experimental designs. In H. T. Reis & C. M. Judd (Eds), Handbook of research methods in social and personality psychology (pp. 40–84). New York: Cambridge.
West, S. G., Duan, N., Pequegnat, W., Gaist, P., DesJarlais, D., Holtgrave, D., Szapocznik, J., Fishbein, M., Rapkin, B., Clatts, C., & Mullen, P. (2007). Alternatives to the randomized controlled trial. Manuscript under review, Arizona State University.
West, S. G., & Sagarin, B. J. (2000). Participant selection and loss in randomized experiments. In L. Bickman (Ed.), Research design: Donald Campbell’s legacy (Vol. 2, pp. 117–154). Thousand Oaks, CA: Sage.
Winship, C., & Mare, R. D. (1992). Models for sample selection bias. Annual Review of Sociology, 18, 327–350.
Winship, C., & Morgan, S. L. (1999). The estimation of causal effects from observational data. Annual Review of Sociology, 25, 659–707.
Wu, W., West, S. G., & Hughes, J. (in press). Short-term effects of grade retention on the growth rate of Woodcock-Johnson III broad math and reading scores. Journal of School Psychology.
25
Discourse Analysis and Conversation Analysis
Charles Antaki
There is no lack of methods available to discourse analysts once they have decided where their interests lie. Since the ‘linguistic turn’ in the social sciences of the nineteen seventies, qualitative methods textbooks have laid out an increasingly varied menu of discourse analytic methods, which have over the years moved from novel and marginal to familiar and central. Picking a method among these is apparently straightforward, once analysts have a clear idea of what interests them. In Table 25.1, I range interests alongside appropriate methods.
Students of discourse analysis (DA) will recognise that the column headings in Table 25.1 should only be used as a convenience, because I have pretended that one can just start with a simple notion of ‘what actions are to be revealed’, list them, then read off the corresponding theory, method and data. In fact, of course, theory and method have a large say in calling something an ‘action’ in the first place, and what counts as evidence for that action; so these three apparently solid columns are better thought of as fuzzy threads twined around each other. Indeed, not even the rows are discrete; they too are harder to separate than the simple table suggests. All that will become clearer as we see examples of discourse analytic work in practice.
The table below shows a variety of named discourse analytic methods, but I have reserved an entry for unadorned ‘discourse analysis’. That is useful for two reasons: it prompts us to ask what the core features are that make something recognisable as DA, and reminds us that many scholars are happy to use just these features without committing themselves to one or other specific variant.
Table 25.1 Discourse analytic methods and data according to researcher’s interests
What actions are to be revealed | Candidate theory/method | Typical data
Personal meaning-making | Narrative Analysis, Interpretative Phenomenological Analysis | Interviews, diaries, autobiographies, stories
Imposing and managing frames of meaning and identities | Interactional Sociolinguistics, Ethnography of speaking | Audio and video recordings, ethnographic observations
Accomplishing interactional life in real time | Conversation Analysis | Audio and video recordings
Displaying and deploying psychological states; describing the world and promoting interests | Discursive Psychology | Audio and video recordings, texts
Constituting and representing culture and society | [Generic] Discourse Analysis | Texts, interviews
Constituting and regulating the social and the political world; the operation of power | Critical Discourse Analysis | Official and unofficial texts, speeches, media accounts and representations, interviews
The four core features of any DA are these:
• The talk or text is to be naturally found (in the sense of not invented, as it might be in psycholinguistics, pragmatics or linguistic philosophy; some analysts admit interview data into this natural category, while others do not);
• The words are to be understood in their co-text at least, and their more distant context if doing so can be defended;
• The analyst is to be sensitive to the words’ non-literal meaning or force;
• The analyst is to reveal the social actions and consequences achieved by the words’ use – as enjoyed by those responsible for the words, and suffered by their addressees, or the world at large.
Before I give an account of some specific examples of discrete sorts of discourse analysis, it would be as well to recall that many social scientists find a serviceable use for what we might call ‘generic’ discourse analysis. This is work done without a strong commitment to the sorts of epistemologies and ontologies of the schools of analysis we shall see later on: it is a sort of working procedure, inspired by the four basic principles of discourse analysis, and brought off in bespoke ways to make sense of one particular topic or domain of experience. The method of choice in such work is often an inspection of textual material (e.g. news media reports) or interview transcripts (e.g. researchers’ interviews with informants chosen for their particular experiences). The author or speaker is not, however, taken to be a simple informant, reporting unvarnished facts; he or she is seen as producing (or reproducing) themes or representations (sometimes called ‘interpretative repertoires’, after the influential use made of the term, originating in Gilbert and Mulkay (1984), by Potter and Wetherell (1987)). The job of the analyst is to sift carefully among the material to extract these themes or repertoires, and thus uncover the underlying dimensions along which the author or interviewee makes sense of their experiences, or, if the interest is less psychological, to uncover the imprint that society has left on their lives. Generic discourse analysis is, however, difficult to illustrate with a given empirical example, precisely because different studies take a great deal of colouring from their topic of interest (which might be media reports of political events, or people’s experiences of health and illness, or organisational change, or educational practice, to name four typical examples).
We shall be on firmer ground if we turn now to see how particular styles of discourse analysts address the texts in front of them. In what follows, I won’t be able to describe all the varieties of DA that I list in Table 25.1, still less those which haven’t quite yet joined the canon. I have chosen five influential varieties that have been successful (and controversial) in different ways: narrative analysis, critical discourse analysis (CDA), interactional sociolinguistics, conversation analysis and discursive psychology. I have also appended a further example to illustrate the sort of eclectic analysis that borrows from more than one school. I have allocated space to these six according to their influence as I see it, acknowledging that other reviewers may see things differently. Setting them out in series will reveal, I think, that the differences between them are instructive about what is at stake in the discourse analytic project as a whole.
NARRATIVE ANALYSIS
The origins of narrative analysis lie in literary anatomies of folk stories. Since the publication of Vladimir Propp’s The Morphology of the Folktale (1928), folklorists and literary analysts have had an interest in discerning the underlying and possibly universal patterns in what seem to be discrete and individual stories (for example, in one of Propp’s most basic templates, the underlying pattern of ‘the quest’ or ‘the restitution of an object lost at the start of the tale’). Social scientists, as opposed to literary and folklore scholars, have seized on the idea of structure, but shied away from looking for universal primitives as such. Their interest is in finding how the narrator finds a pattern and chronology that makes sense of her or his own unique life and the events in it (see, for example, the work collected in Schiffrin et al., 2006). Such patterns and chronologies might be shared among a like-minded group, but can equally be wholly particular to the individual.
As illustration we may consider the work of Michelle Crossley, whose Introducing narrative psychology: self, trauma and the construction of meaning (2000) crystallised the application of narrative DA to the study of psychology, especially the psychology of health and wellbeing. Crossley analyses, among other kinds of narrative, the self-reflections of people who have undergone traumatic changes in their health. Here is an excerpt from such a reflection, in an autobiography:
Without even realising it, before my diagnosis I had been living in an open, expansive, interior space.
Now the walls and ceilings had moved uncomfortably close. Limits were everywhere I looked … Gone was my sense of feeling protected or secure. Gone, too, was any feeling of certainty about the future. As my treatment progressed, these invisible losses were to become more painful, in some ways, than the outward, physical losses and privations of the disease and its remedies. (Mayer, 1994, p. 54, cited in Crossley, 2000)
Crossley’s analysis points us towards the realisation that in words such as these, we see how psychologically important it is for the individual to have an articulable ‘story-line’ which maintains continuity and integrity: the trauma is destructive insofar as it radically disturbs one’s sense of trajectory and sense of selfhood. As Crossley puts it: ‘This sense is severely disrupted in the face of trauma, which demonstrates a devastating capacity to “unmake the world”’ (Crossley, 2000, p. 541). The promise of this sort of discourse analysis is that it will recast ‘facts’ as constructions, reveal heretofore unsuspected and perhaps marginalised experiences, give voice to those whose experiences are not well understood, and perhaps feed into policy-making in the domains of health and education: two areas where narrative analysis has a strong presence.
CRITICAL DISCOURSE ANALYSIS
The umbrella term ‘Critical Discourse Analysis’ shelters a broad family of analysts, but all have this in common: they approach texts from a certain prior point of departure, often an avowedly political one. That is the ‘critical’ in the term. ‘The way we approach these questions’, says van Dijk, one of the doyens of CDA, ‘is by focussing on the role of discourse in the (re)production and challenge of dominance. Dominance is defined here as the exercise of power by elites, institutions or groups, that results in social inequality, including political, class, ethnic, racial and gender inequality’ (van Dijk, 1993, p. 249; emphasis in the original). To be aware of the exercise of power, and its resulting social inequality, requires a political theory about social life; and to have such a theory is vital. Without such a theory, the CDA argument runs, one risks wasting time on non-problems or trivialities, or telling only part of the story, and missing its political significance. In the worst case, one’s mere technical analysis, by refusing to recognise political forces at work in the data, may implicitly condone or perpetuate them.
Within this broad family of analysts there are those who come from a post-structuralist background, to some degree independent of the linguistics traditions which inform a good deal of critical discourse work. In the post-structuralist tradition much use is made of Michel Foucault’s insights into the operation of power in discourses, and, increasingly, psychoanalytical concepts from the school of Jacques Lacan. An example of this sort of CDA can be found in the work of Ian Parker (see, for example, his programmatic statement, Parker, 2003), and in the narrative analysis of Wendy Hollway (see, for example, Hollway and Jefferson, 2000), among many others. Other critical discourse analysts come from a linguistics background, and bring with them an array of linguistic tools with which to unfold their data.
For an illustration of the more linguistically oriented kind of CDA, consider this exemplary analysis, taken from a joint account of CDA by two of its best-known (but of course not uniquely representative) proponents and theorists, Norman Fairclough and Ruth Wodak (1997). They give a 125-line-long extract from a question-and-answer radio interview with Margaret Thatcher during her time as Britain’s Prime Minister. It is not an event-led news interview; she is being asked generally, if I can offer a rough gloss, about her political beliefs and aspirations. Fairclough and Wodak present their analysis in eight facets, of which I select the two most emblematic examples. Inevitably this will impoverish what they say, but it will give a flavour of these authors’ CDA style, on two central CDA themes: power and ideology. I will quote part of the transcript to help illustrate their analysis.
Extract 1: From Fairclough and Wodak (1997, pp. 269–270) (MT = Prime Minister Margaret Thatcher.)
61 MT […] then you turn to internal security
62 and yes you HAVE got to be strong on law and order
63 and do things that only governments can do but
64 there it’s part government and part people because
65 you CAN’T have law and order observed unless it’s
66 in partnership with people then you have to be strong
67 to uphold the value of the currency and only
68 governments can do that by sound finance and then
69 you have to create the framework for a good
70 education system and social security and at that point
71 you have to hand over to people people are inventive
72 creative and so you expect PEOPLE to create thriving
73 industries thriving services yes you expect people
74 each and every one from whatever their background
75 to have a chance to rise to whatever level their own
76 abilities can take them […]
Power
Fairclough and Wodak see Thatcher’s display of power in a number of discourse features: her use of longish monologues; her interruption of her interviewer (not illustrated in the extract above); and her use of linguistic devices such as parallel constructions (‘it has to be strong to have defence’… ‘you HAVE got to be strong on law and order’… ‘you have to be strong to uphold the value of the currency’). Such rhetorical devices, the authors claim, are ‘the prerogative of professional politicians’ (ibid., p. 272). CDA’s willingness to use extra-textual claims (in this case, about what politicians generally do) is shared by many, but not all, kinds of DA.
Using their knowledge of the political scene, the authors are able to say that by using such privileged talk, Thatcher not only ‘circumvents and marginalises [the radio presenter’s] power as interviewer’, but also exercises her power over the radio audience. They go on to observe that ‘Thatcherism can … be partly seen as an ongoing hegemonic [power] struggle in discourse and over discourse, with a variety of antagonists – “wets” in the Conservative party, the other political parties, the trade unions, and so forth’ (p. 273). This is a good illustration of how CDA is able to make the kind of generalisation that allows it to link the immediate data back to the analysts’ prior political commitments.
Ideology
The authors note that, in the extract above, Margaret Thatcher formulates a free-market ideology explicitly; but their analysis aims to add value by showing how she expresses the ideology more subtly. This stretch of her words (and some 20 further lines not shown here), they say, ‘is actually’ (i.e. not as one might first naively think, without analytic help) ‘built around a contrast between government and people which we would see as ideological: it covers the fact that “people” who dominate the creation of “thriving industries” and so forth are mainly the transnational corporations, and it can help to legitimise existing relations of economic and political domination’ (pp. 265–266).
Fairclough and Wodak do not specify exactly where in the extract Thatcher’s failure to mention transnational corporations was significant (that it is a ‘fact’ that her words ‘cover’). This is an important analytic point. Claiming that something is a fact, and that it is significantly absent from a stretch of discourse, is a harder claim to ground than pointing to something that is significantly present (after all, there is an infinity of things that may be facts, and which are absent from any given stretch of talk or text; whereas what is there is at least there). Different DA traditions solve the problem in different ways. CDA notices absence not by working it out from the logical or pragmatic implications of the utterances around it, or from the
reaction of those who are there to hear it, as other schools of analysis do. It works it out by virtue of prior theorising about the political or social nature of the world to which the utterance refers. In this case, Fairclough and Wodak have a prior theory or account of what is happening in the British economy, what ‘thriving industries’ refer to, that these industries are owned by transnationals, and that this ownership is important in the discussion that Thatcher is currently having with her interviewer. They have a further belief, or expectation, that if given an opportunity, a speaker should express the politically relevant facts of the matter (as the analysts see them, and whether they are logically or pragmatically implied or not, or whether the speaker’s local interlocutors hold them to it or not). Margaret Thatcher was given the opportunity, and did not mention transnationals; therefore, it is analytically safe, as well as useful, to claim that she is masking their role in the economy.

If we translate these snippets of analysis back into the four core features of DA (data found naturally; interpreted in co-text; non-literally understood; actions achieved), we see that CDA will insist on a very wide sense of ‘co-text’ in its interpretation, and on drawing out implications which may not be visible to those who do not share the analyst’s prior political commitments, or hesitate to apply them to the data. Its prime candidate for ‘social action’ is the action, taken to be unequally shared in society, of constituting the social world. CDA is attractive to scholars who have the view that DA must ally itself to a social theory, and must be aware of inequalities in society. This is shared, in a more dilute form, in the next influential DA I shall look at.

INTERACTIONAL SOCIOLINGUISTICS

Interactional sociolinguistics emerged from quantitatively minded variation sociolinguistics of the 1960s (and which still continues today) which sought to correlate features of speech (like a glottal stop or a truncated verb form) with demographic factors like geographic location or socioeconomic class, or situational variables like the formality or informality of the speech setting. As interest shifted into what those features of speech might actively be doing in interaction, researchers dropped the survey method in favour of a close qualitative look at what was going on in the scene – what the founders of interactional sociolinguistics, Dell Hymes and John Gumperz, called the ‘ethnography of communication’.

Like CDA, interactional sociolinguistics means to explore the way that social and cultural forces (including power differentials) cash out in the details of talk. Unlike CDA, its proponents do not normally require a specific prior theory of politics or society, beyond a generic belief that society is structured along class, gender and cultural or ethnic lines, and an expectation that this structure will reveal itself in interaction. A further difference is interactional sociolinguistics’ preference for a great deal of ethnographic knowledge of the local scene in which the discourse takes place, and a fairly particular set of codes with which to analyse it.

To the degree that working interactional sociolinguists draw on pioneering work by John Gumperz, they will see people achieving their local goals (or being thwarted from doing so) by offering each other (and taking up, or failing to take up) ‘contextualisation cues’. These are various sorts of hints, codes and signals as to what speakers mean. (The requirement to call such things ‘contextualisation cues’ has been progressively relaxed as interactional linguistics becomes more widespread, but remains important for core proponents of the method.) To get a sense of what these contextualisation cues are doing, the interactional sociolinguist is committed to knowing something about the local ethnography of the speakers’ situation: what jobs they do, what their goals are and so on.

Here is an illustrative analysis, taken from an account meant to show off interactional sociolinguistics against a number of other discourse approaches (Stubbe et al., 2003).
DISCOURSE ANALYSIS AND CONVERSATION ANALYSIS 437
Before turning to the transcribed recording, the authors give us some background:

The discussion takes place between a senior public service manager, Tom, and an analyst, Claire, who is two ranks below him in the organisational hierarchy. From the ethnographic fieldwork that was done at the time of the data collection, we know that Claire is annoyed that she was overlooked for the shared acting manager position she believes she was promised by her own manager, and that she and some of her female colleagues interpret this as another example of gender discrimination within the organisation. We also know that she has expressed the intention to raise the issue with Tom [… continues …]. (Stubbe et al., p. 359).

The authors then invite us to read over the following lines to see how Claire gets across to Tom a way of framing what she is about to say or do in the interaction:

Extract 2: From Stubbe et al., p. 381 (transcription conventions in this extract: ‘+’ is a pause of up to one second; sloping lines indicate overlapping speech).

[…]

starts off with the ‘contextualisation cues’ of a complaint involving gender bias, and the authors can then proceed to see how these two interlocutors bring it off.

Interactional sociolinguistics’ version of the four core features of DA (data found naturally; interpreted in co-text; non-literally understood; actions achieved) gives generous place to the wider ethnographic context. It is willing to use information from prior scenes to guess at what participants are feeling and intending in this one. It admits into its analysis inferences from prior theories, or common assumptions, about interaction. In the extract above, for example, a speaker was judged to be ‘nervous’, and her nervousness was partly ascribed to a common-sensical fear that a woman risks being heard as making a gender-based complaint. Such theorising is less particular and explicit than is required by CDA, yet still contrasts starkly with conversation analysis’ distaste for what they consider to be ‘going native’.
clinic is talking to a mother about her five-year-old son:

Extract 5: From Maynard (2004, p. 63)

1  Dr Y:   From the:: test results (0.3)
2          he seems to function (0.6)
3          comfortably (0.2) you know and
4          (achieve) some kind of you
5          know happy and responsive
6          (0.2)
7  Mrs R:  Ye [e:s ]
8  Dr Y:      [ .h ]hh ON THE LEVEL of
9          about you know three
10         (0.1) and
11         a half year old child
12 Mrs R:  mm

The doctor is describing evidence: the boy seems to function comfortably at the level of a three and a half year old. She is not (yet) giving a diagnosis. The next extract follows the first (though some intervening talk has been omitted). But notice how the doctor manages to avoid actually stating the child’s condition even as she makes her recommendation.

Extract 6: From Maynard (2004, p. 63)

1   Dr Y:   I feel very strongly that, you
2           know, because he (0.4) tests
3           some kind you know, functions
4           between mildly retarded and
5           borderline level [.hhhhh ] he
6           needs special class placement.
7   Mrs R:                   [Mm hmm]
8   Dr Y:   (Yeah) the (.) class for (0.2)
9           .hh educable mentally retardet
10          (0.2) will be the best (.) for
11          his (0.8) you know?
12          functioning and emotional, he’s
13          still not ready you know
14          enough [to be more- ]
15→ Mrs R:         [Are y- are you tr]yin’ ta
16          tell me that you feel he
17          is: s:lightly mentally re
18          [tard]ed?
19  Dr Y:   [Yes.]

What the doctor has done is to glide from a statement of the evidence (from the tests) to a recommendation for treatment, passing over actually naming the child’s condition. It falls to the mother (at line 15) to make explicit what has so far been implicit. Maynard has noted this pattern in his work on news delivery in mundane conversation (Maynard, 2003). The news deliverer organises their hints at bad news in such a way that it is the recipient who is prompted actually to pronounce it. In ordinary social life that hinting has a set of implications which we might interpret as being to do with the complexities surrounding death and other taboo issues; in the clinic, it has all those, but also has more prosaic consequences as well. If the patient (or their representative, as in the case above) is the one who comes out with the news, it shows that he or she has been attending to what the doctor said, at least enough to work things out for themselves; it puts patient and doctor on something of an equal footing. Certainly it is more equal (or more equal-looking) than would be the case if the doctor simply pronounces the condition straight off.

CA AND ‘MEMBERSHIP CATEGORIES’

My account of CA so far has focused on sequential analysis. There is another strand of CA, traceable back to Sacks’ work in the early seventies, which, although it is alive to sequence and placement of utterances, is concerned with them insofar as they sustain the speaker’s version of events; and specifically, the speaker’s choice of identity or person categories. This is sometimes called Membership Category Analysis (though many in CA prefer to see it as merely a part of the broader CA project); but in any case, it is very different from other discourse work on identities. A generic DA of identities would look at material which explicitly names a given identity category (say, ‘asylum seeker’), and chart the ways in which that category is constructed. The aim of that sort of analysis would be to draw up a picture of ‘asylum seeker’ as it appears, explicitly and subtly, in the materials. Then a further stage of analysis takes over, and speculation is made about what interests such a picture serves in a general way in society. For CA, there is no need to go to such an abstract level and separate the use of the category from its consequences.
The speaker or writer’s use of (or hint at) an identity category is locally effective. If you call someone an asylum seeker (or hint that she or he is one) then you are doing it for local consumption, and the consequences will be interactionally visible. And this is true for mundane categories (like, say, ‘daughter’) as much as it is for more politically charged ones.

In the case of politically charged identities, consider what is happening here, in this extract from Dennis Day’s (1998) account of ‘ethnification’. Here, some workers in a factory in Sweden are in a coffee break and planning an upcoming works party.

Extract 7: Day, 1998, p. 163 (English translation from the Swedish)

1    L:  that one has wine and normal
2        drinks too,
3        right, of course like a party
4        ((writing))
5  → L:  that’s what we have at least
6        here in
7  →     Sweden one drinks wine, that’s
8        of course
9        what [one wants
10   R:       [of course, it’s like
11       different that
12       [to drink
13   L:  [what does one drink in what
14       does one drink
15   L:  ((points))
16   X:  [don’t drink wine but light beer
17       or just (soda)

Speaker ‘X’, Day tells us, is categorisable on sight as not ethnically white-Swedish; she is (or looks) Chinese. But notice that we hardly need even this minimal piece of ethnography (and the reader might compare it with the thick description and inference required by interactional sociolinguistics; see above). See how, in lines 5 to 7, it is one of the participants himself (L) who introduces the notion that Otherness is a live issue. That’s what (drink) we have, he says; at least here in Sweden one drinks wine. It is the ‘we’ and the ‘here in Sweden’ that do the work of setting national or ethnic identities on the table. From the CA point of view, the minimal observation is that L has ‘ethnified’ X to the extent that he has called into question what drinks should be made available at the staff party. But there is more. He has explicitly excluded X from ‘we … here in Sweden’. The effect is to exclude her not only from the fellow-national category but the locally operative category of fellow member of the current social group.

Both Day’s work, and that of Maynard that I described above, are examples of CA’s claim to deliver the substance of large-scale social phenomena. Their claim is that if we want to say that, for example, agreement between patient and clinician is at a premium in US consulting rooms; or that people can exclude fellow-workers from joint ventures by subtly casting them into ethnic categories; then CA will provide the evidence – unaffected, its adherents say, by prior theorising about context or social forces.

DISCURSIVE PSYCHOLOGY

The epistemological commitment of conversation analysis – to begin with what the participants in the scene make visible to each other – is shared by Discursive Psychology. This is a movement, impelled by a number of hands, to make Psychology treat the traditional psychological topics of perception and cognition (seeing, remembering, knowing and so on) not, in the first place, as mental and individual matters, but as resources that people use: a person will avow a belief, challenge another’s veracity, test a third person’s knowledge, admit a faulty memory and so on. This branch of DA, like others we have covered, comes in various versions. I will pick an illustrative example from what has probably been the most empirically productive form, the Discursive Psychology developed by Derek Edwards and Jonathan Potter (for programmatic statements of their project, see Edwards, 1997, Edwards and Potter, 1992, and Potter, 2003).

Consider Edwards’ work on emotions (see, for example, Edwards, 1999). At first sight, emotions find a natural home in traditional Psychology: they are (surely?) subjective,
Edwards’ analysis here of the emotion term angry is a good example of the respecification that Discursive Psychology intends for the entire realm of ‘the mental’. It reminds psychologists that emotions, like any other ostensibly mental state of mind, may be allegedly owned in private, but are manifestly traded in public. This makes Discursive Psychology especially attractive for application to any discourse in which play is made of psychological terms, and that of course is a wide field. But we should notice that Discursive Psychology is not limited to the study of the use of psychological terms, common though such usage is. Discursive Psychology’s radical anti-cognitivism aligns it with other discourse analyses which take discourse to be constitutive of social (and not just social) reality – see, for example, Potter’s Representing Reality (1996). Were space to permit, it would have been instructive to describe its close, ethnomethodologically – and Conversation Analytically – inspired investigation of people’s interested descriptions and accounts of events, for example in such charged encounters as the police interrogation (Edwards, 2006). In its concern for unpacking descriptions of reality, Discursive Psychology is applicable to discourse in its widest remit.

AN EXAMPLE OF AN ECLECTIC DA

I want to turn for my last example to a DA inspired – if distantly – by ethnomethodology. If ethnomethodology has a place in a survey such as this one, it is an uncomfortable one at best. Most practitioners of ethnomethodology would not describe themselves as doing DA. Their aim – as the term ‘ethno-methodology’ suggests – is to explicate the reasoning practices or rules that ordinary people display in prosecuting their ordinary lives. While some of those practices are made visible in their use of language, many others are embodied in the props and resources which furnish the daily scene; in the temporal organisation of people’s comings and goings; in the artefacts and documents available for people to consult or refer to; and in the affordances of the physical sites they live and work in. Thus if one wanted to find out how people solve the problem of (say) taking turns to be served (Garfinkel, 2002, ch. 8), one would not limit oneself to analysing people’s language, but would analyse the ebb and flow of bodily movement, synchronised occupation of space, gestures, gaze and so on, to see how queues form and are oriented to and policed. Much more than language needs to be mastered by the person who wants competently to join a line for service – as many of us who have tried the experience in unfamiliar places, perhaps when in foreign lands, can testify.

Nevertheless, ethnomethodology has inspired a kind of DA which, while wanting to explicate people’s public reasoning processes, privileges talk in its ethnographic setting. Perhaps the best label for such work is ‘eclectic’, since it combines the four canonical principles of DA with a concern for the physical and temporal location in which the event takes place. For an example of such work, I have chosen a much-anthologised study by Hugh Mehan (1996) on how children are sorted into various categories by educators. This picks up the theme of identities in the section on CA above, and shows how an eclectic discourse analyst can use non-talk elements of the scene.

Mehan follows the career of one nine-year-old boy (‘Shane’). Our first sight of him is when a teacher spots him behaving in a way that concerns her. He then becomes a case for the educational psychologist, who tests him, and the language in which he is described changes from the teacher’s common-sensical, teacherly talk (‘he’s very apprehensive about approaching anything …’, ‘whenever he’s given some new task to do it’s always like, too hard, “no way I can finish it” ’) to technical, quantitative norm-based terms (‘he was given the WISC-R and his IQ was slightly lower, full scale of 93 …’).

Mehan’s set-piece for analysis is a recording of a subsequent meeting of educators (teachers, educational psychologists and so on) and parents. At this point Shane’s fate,
as is that of a list of children who have come to the school’s attention as possibly needing special education, is to be decided. Each case will be decided by talk; and as the outcomes are quite dramatically different (the child might be classified then and there as ‘learning disabled’ and sent to one kind of school, or as ‘educationally handicapped’ and sent to another), the power of discourse is all too visible.

It is up to the Board to hear the various descriptions of Shane available from his teacher, his parents, the school nurse and the psychologist, and meld them into a decision as to just what kind of schoolboy he is. Mehan describes the props (for example, the psychologist’s thick bundle of forms, test scores and reports) or the lack of them (the child’s mother has no notes) as part of the action. The props round out his observations about the talk: that, for example, the psychologist refers to her official notes while delivering her account uninterrupted, while the mother’s unsupported account is drawn out by others’ questioning; or that the psychologist’s document-based story, although freighted with obscure jargon, is not challenged, whereas the mother is asked to explain what she means by her common-sense claims about her son’s behaviour (claims that would pass unremarked in a more mundane setting; for example, that ‘lots of times he comes home and he’ll write or draw’). Mehan ‘adds value’ of a startling kind when he claims that

The psychologist’s report gains its authority by the very nature of its construction. The psychologist’s discourse obtains its privileged status because it is ambiguous, because it is shot full of technical terms, because it is difficult to understand (p. 357; emphasis in original)

Mehan’s point is that the technicality of the psychologist’s claims meant that they could not easily be challenged, so her conclusions were never subject to the sort of test that the mother’s or the teacher’s could be. Because of its permitted obscurity, it is the psychologist’s report that carries the day, and Shane is classified as having a learning disability; he has been set on a career which may have profound consequences (for good or ill). Mehan has not simply noted that different sorts of evidence have been brought forward to reach this decision; by careful note of how descriptions are phrased and received he has offered us the analysis that (as he puts it) ‘these modes of representation are not equal’ (p. 356). It is a DA that delivers the generic promise not merely of describing talk but of explaining social action, and adds specific ethnomethodological value by charting participants’ treatment of each other and the distributions of powers and expertise that they allow themselves.

CONCLUDING COMMENTS: DISCOURSE ANALYSIS MEANS DOING ANALYSIS

A word is in order to remind the reader that this account of DA has been selective. Each example, in the sections above, elbowed its way past a dozen equally significant competitors. Some styles of analysis were crowded out entirely, and a longer chapter may well have found space for interpretative phenomenological analysis (Smith, 2004), psychoanalytically oriented Marxist critical discursive psychology (Parker, 2002), Foucauldian discursive psychology (Wetherell and Edley, 1999), free-association narrative inquiry (Hollway and Jefferson, 2000), and action-implicative discourse analysis (Tracy, 2005), among others. And I ought to say that many working discourse analysts claim no specific rules beyond the four canonical DA features of looking for social action in natural data, non-literally understood in its co-text. Indeed some discourse analysts have made an explicit virtue of keeping their independence from restrictive technicality. An eloquent defence of this way of thinking is Billig’s case in favour of critical scholarship over narrow method (Billig, 1988, 1999). It is better, on his argument, to have the core discourse analytic sentiments in mind, be guided by a critical
Smith, J.A. (2004) Reflecting on the development of interpretative phenomenological analysis and its contribution to qualitative research in psychology. Qualitative Research in Psychology, 1, 39–54.

Feminist discourse analysis

For a variety of examples of discourse analytic research projects that offer a specifically feminist approach, see:

Lazar, M. (Ed.) (2005). Feminist Critical Discourse Analysis: Gender, Power and Ideology in Discourse. Basingstoke: Palgrave.

Varieties of critical discourse analysis

There is a broad range within Critical Discourse Analysis. These sources, along with those cited in the text, will give an indication of the variety.

Rogers, R. (Ed.) (2003). An Introduction to Critical Discourse Analysis in Education. Mahwah, NJ: Lawrence Erlbaum.
Toolan, M. (Ed.) (2002). Critical Discourse Analysis: Critical Concepts in Linguistics (Vols 1–4). London: Routledge.
Wodak, R. & Meyer, M. (Eds.) (2001). Methods of Critical Discourse Analysis. London: Sage.
van Dijk, T. (1993) Principles of CDA. Discourse and Society, 4, 249–83.

Debate between conversation analysis and critics

This exchange is often cited as a useful crystallisation of the debate – not always temperate – between Conversation Analysts and their discourse analytically minded critics. I list the papers in their chronological order.

Schegloff, E. A. (1997) Whose text? Whose context? Discourse and Society, 8, 165–87.
Wetherell, M. (1998) Positioning and interpretative repertoires: conversation analysis and post-structuralism in dialogue. Discourse and Society, 9, 387–412.
Schegloff, E. A. (1998) Reply to Wetherell. Discourse and Society, 9, 413–6.
Billig, M. (1999) Whose terms? Whose ordinariness? Rhetoric and ideology in Conversation Analysis. Discourse and Society, 10, 543–558.
Schegloff, E. A. (1999) ‘Schegloff’s texts’ as ‘Billig’s data’: a critical reply. Discourse and Society, 10, 558–72.
Billig, M. (1999) Conversation analysis and the claims of naivety. Discourse and Society, 10 (4), 572–6.
Schegloff, E. A. (1999) Naiveté vs. sophistication or discipline vs. self-indulgence: a rejoinder to Billig. Discourse and Society, 10, 577–82.
Kitzinger, C. (2000) Doing feminist conversation analysis. Feminism and Psychology, 10, 163–93.

REFERENCES

Antaki, Charles, Billig, Michael, Edwards, Derek, and Potter, Jonathan (2003). Discourse analysis means doing analysis: A critique of six analytic shortcomings. Discourse Analysis On Line, 1(1). Available at: <http://www.shu.ac.uk/daol/articles/v1/n1/a1/antaki2002002.html>.
Billig, M. (1988). Methodology and scholarship in understanding ideological explanation. In C. Antaki (Ed.), Analysing Everyday Explanation: A Casebook of Methods. London: Sage.
Billig, M. (1999). Whose terms? Whose ordinariness? Rhetoric and ideology in conversation analysis. Discourse and Society, 10, 543–558.
Boden, D. (1994). The Business of Talk. Oxford: Polity.
Crossley, M. L. (2000). Narrative psychology, trauma and the study of self/identity. Theory & Psychology, 10, 527–546.
Day, D. (1998). Being ascribed, and resisting, membership of an ethnic group. In C. Antaki and S. Widdicombe (Eds.), Identities in Talk. London: Sage, pp. 151–170.
de Beaugrande, R. (1997). The story of discourse analysis. In T. van Dijk (Ed.), Discourse as Structure and Process. London: Sage, pp. 35–62.
Denzin, N. K., and Lincoln, Y. (2005). The discipline and practice of qualitative research. In N. K. Denzin and Y. Lincoln (Eds.), The Sage Handbook of Qualitative Research. London: Sage.
Edwards, D. (1997). Discourse and Cognition. London: Sage.
Edwards, D. (1999). Emotion discourse. Culture and Psychology, 5, 271–291.
Edwards, D. (2006). Discourse, cognition and social practices: The rich surface of language and social interaction. Discourse Studies, 8, 41–49.
Edwards, D., and Potter, J. (1992). Discursive Psychology. London: Sage.
Fairclough, N., and Wodak, R. (1997). Critical discourse analysis. In T. van Dijk (Ed.), Discourse Studies: A Multidisciplinary Introduction, Volume 2: Discourse as Social Interaction. London: Sage.
Garfinkel, Harold (2002). Ethnomethodology’s program: Working out Durkheim’s aphorism. Edited and introduced by Anne Rawls. Lanham, MD: Rowman & Littlefield.
Gilbert, G. N. and Mulkay, M. (1984). Opening Pandora’s Box: A Sociological Analysis of Scientists’ Discourse. Cambridge, UK: CUP.
Heritage, J. (1984). Garfinkel and Ethnomethodology. Cambridge: Polity Press.
Hollway, W. and Jefferson, T. (2000). Doing Qualitative Research Differently: Free Association, Narrative and the Interview Method. London: Sage.
Houtkoop-Steenstra, H. (2000). Interaction and the Standardised Survey Interview: The Living Questionnaire. Cambridge: Cambridge University Press.
Hutchby, I. and Wooffitt, R. (1998). Conversation Analysis. Oxford: Polity Press.
Levinson, S. C. (1983). Pragmatics. Cambridge, UK: Cambridge University Press.
Mayer, M. (1994). Examining Myself: One Woman’s Story of Breast Cancer Treatment and Recovery. Winchester, MA: Faber & Faber.
Maynard, D. W. (2003). Bad News, Good News: Conversational Order in Everyday Talk and Clinical Settings. Chicago & London: University of Chicago Press.
Maynard, D. W. (2004). On predicating a diagnosis as an attribute of a person. Discourse Studies, 6, 53.
Maynard, D. W. and Marlaire, C. (1992). Good reasons for bad testing performance: The interactional substrate of educational testing. Qualitative Sociology, 15, 177–202.
Mehan, H. (1996). The construction of an LD student: A case study in the politics of representation. In M. Silverstein and G. Urban (Eds), Natural Histories of Discourses. Chicago: University of Chicago Press. Reprinted in M. Wetherell, S. Taylor and S. Yates (2001) (Eds), Discourse Theory and Practice. London: Sage Publications.
Parker, I. (2002). Critical Discursive Psychology. London: Palgrave.
Parker, I. (2003). Psychoanalytic narratives: Writing the self into contemporary cultural phenomena. Narrative Inquiry, 13 (2), 301–15.
Peräkylä, A. and Vehviläinen, S. (2003). Conversation analysis and the professional stocks of interactional knowledge. Discourse & Society, 14 (6).
Potter, J. (1996). Representing Reality. London: Sage.
Potter, J. (2003). Discursive psychology: Between method and paradigm. Discourse & Society, 14, 783–794.
Potter, J., and Wetherell, M. (1987). Discourse and Social Psychology. London: Sage.
Sacks, H. (1992). Lectures on Conversation (Vols 1 and 2). Oxford: Basil Blackwell.
Schiffrin, D., De Fina, A., and Bamberg, M. (Eds.) (2006). From Talk to Identity: Methodological and Theoretical Issues in Identity Research. Cambridge University Press.
Stubbe, M., Lane, C., Hilder, J., Vine, E., Vine, B., Marra, M., Holmes, J., and Weatherall, A. (2003). Multiple discourse analyses of a workplace interaction. Discourse Studies, 5, 351–388.
Tracy, K. (2005). Reconstructing communicative practices: Action-implicative discourse analysis. In K. Fitch and R. Sanders (Eds), Handbook of Language and Social Interaction (pp. 301–319). Mahwah, NJ: Lawrence Erlbaum.
van Dijk, Teun A. (1993). Principles of critical discourse analysis. Discourse & Society, 4, 249–283.
Wetherell, M., and Edley, N. (1999). Negotiating hegemonic masculinity: Imaginary positions and psycho-discursive practices. Feminism & Psychology, 9, 335–356.
Wood, L. A., and Kroger, R. O. (2000). Doing Discourse Analysis. Thousand Oaks: Sage Publications.
Wooffitt, R. (2005). Conversation Analysis and Discourse Analysis. London and New York: Sage.
26
Analyzing Narratives and
Story-Telling
Matti Hyvärinen
turn, three partly separate turns are discussed. As early versions of narrative analysis, the models of Vladimir Propp (1968) and William Labov and Joshua Waletsky (1997) will be introduced next. The Labovian model will be systematically used as a comparative backdrop for further developments: the move from text to context and the contribution of recent semantic and cognitive studies to the analysis of narratives. The last section suggests expectation analysis as a way to connect the Labovian heritage, contextual orientation and the idea of positioning. The focus of the chapter is on the analytic procedures, not on the interpretive alternatives.

THE NOTION OF NARRATIVE

Social scientists have seldom considered definitions of narrative (cf. Brockmeier and Harré 1991; Riessman 1993, 17–18). Many scholars simply repeat Aristotle’s characterization of a good tragedy having a beginning, middle and end (Aristotle 1968, 1450b). For open, conversational or artistic narratives this is far too rigid a formula, emphasizing the clear sequence of events; on the other hand the terms are far too broad to reveal anything fundamental in the nature of what narratives actually do.

Barbara Herrnstein Smith (1981, 228) offers a useful, rhetorically oriented definition: ‘Someone telling someone else that something happened’. With a slight revision we can also include sensitivity to the context: ‘Somebody telling somebody else on some occasion and for some purpose(s) that something happened’ (Phelan 2005b, 18). The next step taken in this chapter is to suggest that one can also turn the term ‘somebody’ into the plural form, making shared tellership visible (Ochs and Capps 2001).

Cultural studies may be criticized for two confusing ways of discussing narrative. In the first case, all kinds of interview talk are understood as narrative, narration or story. In this way, the term narrative is itself at risk of becoming redundant. Ordinary talk may just as well include different genres of speech such as argumentation, instruction and narration (Linde 1993; Fludernik 2000). In the second case, narrative is a substitute for a general assumption, theory or ideological stance without temporal organization (Rimmon-Kenan 2006). Clive Seale, for example, suggests a far broader notion of narrative:

I understand narratives to be constructed through many things, including acts of consumption, for example, which can be made symbolically to tell stories about tastes, relationships (whether real or desired) or social standing. (Seale 2000, 37)

Seale points out convincingly how narrativity and narrative understanding are not something that only accounts for social action in retrospect. He also rejects, in a useful way, the too narrow textualist ways of understanding narrative and opens new areas for narrative analysis. Narrativity is woven into acting and planning in ways discussed more thoroughly a moment later. Yet, in order to ward off the tendency of ‘narrative imperialism’ (Strawson 2004; Phelan 2005a), the elegant solution suggested by Marie-Laure Ryan might be more sustainable:

The narrative potential of life can be accounted for by making a distinction between ‘being a narrative’, and ‘possessing narrativity’. (Ryan 2005, 347)

Narrativity may be understood as an aspect of texts, experiences and action; an aspect that invites more or less direct narrative responses. Narrativity is a matter of degree, rendering texts and speech more or less narrative. A wish for analytic clarity does not imply that narratives would exist as pure and distinct objects. It would be hopeless and misleading to assume that narratives are formally similar, always complete and always neatly distinct from other kinds of discourse (Ochs and Capps 2001). ‘Narrative is first and foremost a prodigious variety of genres’, asserts Roland Barthes (1966/1977, 79). This means that no definition will fit all narratives and that the desire for a conceptual consensus may be rather counter-productive.
and folktales had been studied for ages. But the new theoretical landscape was neither normative nor Aristotelian. What was new in the 1960s narrative inquiry was what Martin Kreiswirth identifies as ‘the institutional study of narrative for its own sake, as opposed to the examination of individual narratives’ (2005, 377–378). Marie-Laure Ryan (2005, 344) points out the birth of the new concept of narrative: ‘it is only in the past fifty years that the concept of narrative has emerged as an autonomous object of inquiry’. The abstract, theoretically rich, flexible, and thus quickly moving concept of narrative was a new thing even in literature and linguistics in the 1960s. Roland Barthes’s famous passage has been used to characterize the ubiquity of narrative:

Able to be carried by articulated language, spoken or written, fixed or moving images, gestures, and the ordered mixture of all these substances; narrative is present in myth, legend, fable, tale, novella, epic, history, tragedy, drama, comedy, mime, painting […], stained glass windows, cinema, comics, news item, conversation. (Barthes 1977, 79)

Looking from another angle, this passage indicates the existence of a new kind of concept of narrative. Structuralist narratology nurtured scientific ambitions and rhetoric. Its imagery ‘projects the illusion that narrative is knowable and describable, and therefore that its workings can be explained comprehensively. Narratology promised to provide guidelines to interpretation uncontaminated by the subjectivism of traditional literary criticism’ (Fludernik 2005, 38).

In education, psychology and sociology the narrative turn properly took place in the early 1980s, and often implied qualitative, humanistically oriented research – in stark contrast to the scientific, descriptive tenor of structuralist narratology and the growing post-structuralist discourse in cultural studies. The narrative turn signified both a new prospect and a new dilemma: many kinds of research materials were now to be theorized and analyzed as narratives – but often without the smallest consensus on what it actually meant.

Two major theoretical moves had a huge impact on social research. First, the critical reception of Jean-François Lyotard’s (1993 [1983]) rejection of grand narratives was emblematic of the gradual rehabilitation of the alternative, small, forgotten and untold stories, often first in feminist studies. If quantitative research foregrounded dominant trends, stories were to theorize the particular. The post-modern suspicion of authoritative professional, scientific and institutional truths legitimated the search for new voices. Second, the new metaphoric discourse on ‘life as narrative’ suggested that narratives should have a unique role in the study of human lives, action and psychology (MacIntyre 1984; Ricoeur 1984; Carr 1986; Sarbin 1986; Bruner 1987; McAdams 1988, 1993; Polkinghorne 1988; Ochberg and Rosenwald 1992; Widdershoven 1993; Brockmeier and Harré 2001; Plummer 2001; Bamberg 2004a; Hyvärinen 2006b).

The new theoretical perspective was not easily reconciled with the inherited structuralist, formal and scientifically oriented methods of reading. In many cases, the adopted way to interpret narratives might duly be characterized as the hermeneutic re-telling of the stories, or narrative ‘criticism’ (e.g. Freeman 1993, 2004; Josselson 2004). There is always the point to which good stories are informative as such, and able to evoke strong reader responses.

The metaphorical impulse for narrative studies created a huge search for methods. Here the story of the narrative turn is not so much progressive as often explicitly regressive: methods and theories were searched out from earlier decades and from other disciplines. Vladimir Propp (1968) and William Labov and Joshua Waletsky (1997 [1967]), for example, became widely topical in the 1980s. These retroactive moves of reception created substantial friction between dominantly structuralist methods and often post-structuralist, phenomenological and hermeneutic theorizing. Yet, many authors have tried to overcome this tension and have written introductions to narrative analysis, including for example Kohler Riessman (1993, 2001); Lieblich et al. (1998); Clandinin and Connelly (2000); Czarniawska (2004); Daiute and Lightfoot (2004).
The formative role of the model was reflected in the 1997 special issue of Journal of Narrative and Life History.

Emerging from the linguistic discourse, the model provided social research with one of the first tools to approach the studied narratives in a detailed way. Textually, the model offered clear criteria to recognize narrative, and to recognize its difference from other forms of talk (description, argument or question).

Labov and Waletsky tried to find the smallest, most elementary, oral version of narrative. Following the main trend of the time, their approach is formal, trying to locate the structural model of narrative. But in addition to this, there is a conscious functional element: narratives are for ‘recapitulating experience’, but this is not the only function. A sheer experiential narrative would be pointless, they argue, without the function of ‘evaluation’ (Labov and Waletsky 1997, 4).

The basic element of the model is a ‘narrative clause’. Narrative clauses are ordered sequentially, and a change in their order would change the whole narrative. Thus, ‘I fell in love with Paula. My wife left me’ would be an entirely different story if the order of clauses had been reversed. But still, only very elementary narratives are exclusively built on these narrative clauses; ‘free’ and ‘restricted’ clauses are needed as well. The model is based on sequence, narratives being ‘one method of recapitulating past experience by matching verbal sequence of clauses to the sequence of events that actually occurred’ (Labov and Waletsky 1997, 12). The model has the following parts (Labov 1972, 370):

1. Abstract;
2. Orientation;
3. Complicating action;
4. Evaluation;
5. Result;
6. Coda.

As Hymes (1996, 193) notes, this structure resembles models created earlier in literary studies. In comparison with the very theoretical discussion of life as narrative, it steered interest towards more empirically based problems. The Labovian approach and such influential works as Elliot Mishler’s (1986) Research Interviewing informed narrative studies in practice, and offered the means to approach fairly small stories in a detailed way.

Mishler, however, was among the first to voice a key problem with the Labovian model, when he ‘pointed to its relative inattention to the interview context in the production of narratives’ (Mishler 1997, 71). In a typically structuralist way, the model portrays stories as independent and fully formed texts, and ‘appears to take the story or narrative as already formed, as waiting to be delivered’ (Schegloff 1997, 100). Schegloff points out that nothing is told about the recipients during the telling or afterwards, and that no silences or hesitations are reported (Ibid., 100–101).

The strong emphasis on sequence is another problem. Mishler (1997, 72) conveys a broadly shared experience in noticing how ‘in intensive life history interviews, respondents rarely provided chronological accounts’. In other words, the model, strictly based on clause level narrative sequence, was all too narrow actually to capture the complex narration so typical in interview situations. This seems to lead to a marginalization in the model of other aspects such as place, by rendering it only as a static element of orientation. But from life stories to fiction, place may have a much more central and constitutive role in the narrative (e.g. Herman 2002; Georgakopoulou 2003).

FROM TEXT TO NARRATIVE PRACTICE

The changing reception of the Labovian model exhibits a more profound change from studying narratives as separate, complete and self-sufficient texts towards a study of narratives in context and interaction and the study of narrative practices (Gubrium and Holstein 2008). Within this emerging understanding, ‘emphasis is on narrative activity as sense-making process rather than as a finished product in which loose ends knit together into a single story-line’ (Ochs and Capps 2001, 15).

The work of Elinor Ochs and Lisa Capps (2001, 3) marks, in various ways, the end
of the dominance of the Labovian form in narrative analysis. Instead of full narratives, proceeding through the six steps, the authors suggest conversational narratives, many of which ‘seem to be launched without knowing where they will lead’ (Ibid., 2). If narrative, as ‘a cognitively and discursively complex genre’, often incorporates the elements of description, chronology, evaluation and explanation, then the conversational storytelling completes and complicates this picture with the respective elements of question, clarification, challenge and speculation (Ibid., 18–19). What seemed to be formal and stable elements are transformed into processes.

Jaber F. Gubrium and James A. Holstein (2008) argue for a similar shift from strictly textual study of stories towards investigating the storying process, or ‘narrative ethnography’, as they call their approach. They recognize the relevance of the conceptual distinction between the story and storying process, which offers ‘grounds for thinking about narrativity as something interesting on its own’ (Ibid., 1). The observation has profound consequences. When the interest moves from narratives as separate texts to storytelling and narrative practice within social institutions, the social functions of narrativity can be theorized in a new way. This move out from the confines of narrative structure invokes a whole new array of questions, and the authors emphatically invoke even larger contexts than Ochs and Capps, seeing them embedded like nested dolls:

Concern with the production, distribution, and circulation of stories in society requires that we step outside of narrative material and consider questions such as who produces particular kinds of stories, where are they likely to be encountered, what are their consequences, under what circumstances are particular narratives more or less accountable, what interests publicize them, how do they gain popularity, and how are they challenged? (Ibid., 19)

Distinctive of the work of Gubrium and Holstein is the recognition of two different layers of control: interactional and institutional (Ibid., 30–41). Within this approach, they welcome the study of ‘narrative environments’, which ‘challenge as well as affirm various stories’ (Ibid., 26) and ‘narrative control’. Arthur W. Frank’s influential study The Wounded Storyteller (1995) portrays ‘restitution narrative’ as one of the three basic models of illness narratives, but as the model that is heavily supported by medical institutions, advertising and media (Frank 1995, 78–79).

Events, states and narrative genres

Catherine Riessman (1990, 75–78) identifies three separate narrative genres in the interviewed divorce talk she studied, calling them ‘proper’ stories, ‘habitual narratives’, and ‘hypothetical narratives’. In her discourse, ‘story’ is reserved for the kind of oral narratives Labov and Waletsky studied. Indeed, how representative is the Labovian narrative?

Paul Ricoeur (1984) discusses ‘the semantics of action’, suggesting a strong relationship between the vocabularies of narrative and action (Hyvärinen 2006a). The narrative theorist David Herman takes this point further and unpacks the key Labovian terminology of ‘complicating action’ in his Story Logic (2002). Drawing on the work of language philosophers and semanticists, he suggests a far-reaching distinction between states, activities/processes, accomplishments and achievements (Herman 2002, 29–37):

[Zeno] Vendler […] proposed a fourfold distinction between activity terms (e.g. used to describe someone running or pushing a cart), accomplishment terms (used to describe someone running a mile or drawing a perfect circle), achievement terms (used to describe someone reaching the top of a hill), and state terms (used to describe someone as female, North-American, or in debt). (Herman 2002, 30)

Each of these categories presumes a different extension of time. For processes, the implied period of time is not definite, as it is for accomplishments. ‘Growing old takes a certain unspecified amount of time, whereas finishing a peanut butter sandwich entails a sequence of action that falls within a definite
when young children recount routine, scripted events, their narratives tend to be more detailed than those of depicting less common incidents. (Ochs and Capps 2001, 78; Nelson 2003, 28)

It is as if these routines and scripts were still, for children, an open and exciting world to be learned and accounted for. But it does not take many years to learn to focus on the unforeseen, exceptional; the diversions from routine. Mark Turner (1996, 19) calls these routine sequences stories, and argues that ‘most of our actions consist of executing small spatial stories: getting a glass of juice from the refrigerator, dressing, bicycling to the market. Executing these stories, recognizing them, and imagining them are all related because they are all structured by the same image schemas’. Turner is perfectly right in arguing for the relevance of such spatial sequences in organizing and perceiving human action. However, it is argued here that these sequences are not yet stories.

Cognitive theorists have discussed scripts, frames and schemata as mental ways of understanding new and old situations (Schank and Abelson 1977). The famous restaurant script informs us about understandings of choosing a table, having a menu, ordering food and paying the bill as relatively permanent parts of the script. Scripts organize shopping, political campaigning and sexual relationships. Scripts, in addition to being cognitive, cultural and normative, also seem to be future oriented. It is possible to think that, both in following such scripts in practice and in telling stories about visiting restaurants, each teller contributes to the construction of a script, or as I suggest, a master narrative on the issue. Michael Bamberg (2004a, 361) expresses a similar thought without explicitly making the connection between master narratives and scripts:

I would like to catch up with the concession that speakers constantly invoke master narratives, and that many, possibly even most, of the master narratives employed remain inaccessible to our conscious recognition and transformation. Master narratives structure how the world is intelligible, and therefore permeate the petit narratives of our everyday talk.

An interesting interplay occurs above, involving slightly different horizons of a cultural script or (at least partly) shared cultural knowledge, master narratives presenting normatively privileged accounts, counter narratives that resist and take distance from such culturally privileged ways of telling, and the high narrativity of good stories that do not simply recount the cultural scripts. Because master narratives are seldom explicitly told by anyone, the more formulaic term ‘script’ is preferred here to refer to the cultural and situational impacts on narration.

As Jens Brockmeier and Rom Harré (2001) argue, very little is known about how exactly cultural scripts impose their models on individual action or narration. There seem to be two different ways to reckon with cultural-cognitive scripts. One is conscious reflection on, resistance to or affirmation of what has been called ‘master narratives’ (Andrews 2004; Bamberg 2004a; Jones 2004).

But what should be said about the master narratives, which ‘remain inaccessible to our conscious recognition and transformation’ (Bamberg 2004a, 361)? One answer is that the human capacity of narrativity processes this scripting level in an automatic way. As children, we start by recounting the formulaic, normal course of events but learn step by step – in telling, listening and monitoring responses – to report on the exceptional. Our skill as narrators is established on expert understandings of such cultural scripts as ‘going to a restaurant’. Herman suggests ‘a direct proportion between a sequence’s degree of narrativity’ and the richness of ‘world knowledge’ that it triggers by using scripts. A clear paradox is made manifest here: narratives should invoke a rich density of scripts to provide thick narration, yet narration cannot merely constitute the repetition of these scripts:

Just as there is a lower limit of narrativity, past which certain ‘stories’ activate so few world models that they can no longer be processed as stories at all, refusing to be configured into action structures drawing on pre-storied scripts and frames, so there is an upper limit of narrativity, past which the tellable gives way to stereotypical, and the point of
a narrative, the reason for its being told, gets lost or at least obscured […]. (Herman 2002, 103)

Important conclusions can be drawn from this discussion. Narrativity is based on the processing of numberless cultural scripts. Scripts as such are not stories or narratives, because narrativity requires both ‘canonicity and breach’, as Jerome Bruner (1991) has put it. Scripts and formulaic narratives are used as resources both in living and telling; yet the whole point of narrativity grows out of surprise, betrayal of expectations, the ‘discordance’ of life (Ricoeur 1984). Beyond early childhood, there is no social telling of script-like sequences. But the told narratives can never be entirely individual, devoid of script-like resources. Narratives and narrativity thus move between cultural scripts (‘canonicity’) and totally idiosyncratic babble (breach in every moment).

If scripts and master narratives are vital parts of narrativity, so is the expectation they necessarily carry along. Labov and Waletsky (1997) noticed that recounted experiences are regularly contrasted with expectations. Reading, watching or listening to narratives triggers expectations that the stories either confirm or betray.

EXPECTATION ANALYSIS

Bakhtin (1986) not only understands all language use as response to earlier utterances, he also includes the aspect of expectation in every utterance: ‘As we know, the role of the others for whom the utterance is constructed is extremely great. […] From the very beginning, the speaker expects a response from them, an active responsive understanding. The entire utterance is constructed, as it were, in anticipation of encountering this response’ (Bakhtin 1986, 94).

Expectation analysis presumes that oral life stories essentially recount the story of changing, failing or realized expectations (in other words, they reflect ‘canonicity’). While experiences may be thought of as mainly personal and subjective, expectations are always social, local and conventional. The analysis of expectations focuses on the dialectics of recognizing, following and deviating from scripts. Originally presented by Hyvärinen (1994, 1998), the practice has been further elaborated by Komulainen (1998) and Löyttyniemi (2001).

The detailed way of reading owes much to Labov and Waletsky (1997), who already recognized the cognitive relevance of negative expressions, which paradoxically do not tell what happened, but what did not. On closer examination, there are a good many linguistic expressions reckoning expectations, not the actual experience. Deborah Tannen (1993) has summarized the following list of what she calls ‘evidence of expectation’:

(1) Repetition; especially repetition of whole utterances;
(2) False starts;
(3) Backtracks, breaking-down of the temporal order of telling;
(4) Hedges that flavour the relation between what was expected and what finally happened; indeed, just, anyway, however;
(5) Negatives. As a rule negative is only used when its affirmative is expected (Labov, 1972, 380–381);
(6) Contrastives;
(7) Modals;
(8) Evaluative language;
(9) Evaluative verbs;
(10) Intensifiers; including laughter.
(Löyttyniemi 2001, 181)

The point of the list is to illustrate the way narrative is accounting for and making relevant past futures and past expectations rather than just piecing together action sequences. The claim behind the analysis is that the key turning points of life stories exhibit thickness of expectation and a strong presence of the ‘I’.

The examples below are from a study on the 1970s Socialist Student Union (SOL) in Finland (Hyvärinen 1994, 1998). The female interviewee, ‘Kirsi’, used to be a secretary general in a local university organization and member of the national central government of the SOL at the end of her career as an activist (Hyvärinen 1994, 164–167):

1 I guess it has been the same year when I’ve been in the Central Government that
2 I was totally stuck up
3 that I knew that now everything will go totally wrong
4 but I couldn’t say it in a way that I’d believed
5 and probably the guys of SOL also loathed me […]
6 but I sulked there
7 To me, the visits to the government were horrible. Yuk.
8 But … the reason why I really had the horrible feeling
9 was that I was in a deadlock. In a way there was nothing to do

As a narrator, Kirsi is normally very determined and strongly enacts her identity in relation to the interviewer. The problem here is that she cannot position herself anymore as an agent within the received horizon of expectations. In the above, she takes the position of affective experiencer who is not able to be a competent reflective experiencer in the situation. This is also a habitual narrative: it is about the state of being stuck, and unbounded emotional processes (sulking, loathing). The whole section is full of intensified, colourful expressions. She hates the situation; it is almost unbearable, but it is against her expectations of being a ‘good comrade’ to withdraw. The conflict of expectations is dramatized on lines (3–4): she sees that everything is going totally wrong but she cannot explain it – that is, she cannot solve the conflict within the frame of enduring expectations, since she cannot take her position as a brave speaker of truths.

A bit later she talks about leaving the position in the organization. The usual dilemma in those days was to find a replacement for the post to achieve a loyal exit:

1 It was a horrible task
2 I just said that in any case I’ll quit
3 because I’d next start to go haywire
4 it was that tough
5 because I was [p]
6 afterwards one learned a lot, in a way, though
7 but it was a high price to pay
8 it was the worst situation I’ve gotten into in my life including my divorce

At last, Kirsi is able to reassume the role of an agent, in the verbal form of speaker. The conflict of expectations and the old structure of expectations as a dutiful activist are broken down on lines (2–3), where her words ‘in any case’ indicate that she no longer cares about the old expectations, whatever happens. On line (6) there is still the balancing role of a loyal ex-activist and reflecting experiencer, who appreciates the experience as such, but this is quickly counterbalanced again by the price of learning it. Kirsi moves to Helsinki, where no one knows her, and is able to experience a new adolescence with dancing and partying. The exhilaration is contrasted with the old expectation: ‘I really had hobbies no Bolshevik would have ever […] believed’ a secretary general to have. It is easy to see how this play with expectations signifies her re-positioning in relation to the organization and the Communist movement.

A SECOND NARRATIVE TURN?

The map of narrative analysis is changing rapidly. Textual and structuralist models of analysis are giving way to more contextual approaches that focus on narrative practices and storytelling. Semantic theories and cognitive narratology offer new tools to connect the vocabularies of action and narrative in productive ways. Recent theories of narrative offer a new sensitivity to stories that are incomplete or foreground mental events (of observation, feeling, and cognition) instead of physical action. Expectation and positioning analysis alike direct attention to the fact that narratives not only account for past experiences but position speakers within networks of social and cultural expectations (Bamberg 2004b). The dialectics of ‘master’ and ‘counter’ narratives highlight the continuous move between cultural canon and individual expression. The rich flow of post-classical literary theory of narrative accentuates the need to realize the original, interdisciplinary ethos of narrative studies. Considering all these new and dynamic elements, it is indeed plausible to argue for a ‘second narrative turn’, as Alexandra Georgakopoulou (2006) does. The key to fulfilling this promise, more than ever, seems to reside in realizing the interdisciplinary mission of the narrative turn.
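
As a rough illustration of how the ‘evidence of expectation’ list discussed under EXPECTATION ANALYSIS might support a first screening of a transcript, the following Python sketch flags a few lexical markers line by line. It is not the procedure used in the studies cited above, which rely on close reading. The names MARKERS, tag_line and marker_counts are hypothetical, and only the four hedges (indeed, just, anyway, however) come from the quoted list; the other word lists are assumptions made for the sake of the example.

```python
import re
from collections import Counter

# Illustrative marker lists. Only the hedges are taken from the list quoted
# in the chapter; every other word list is an assumption for this sketch.
MARKERS = {
    "hedge": {"indeed", "just", "anyway", "however"},
    "negative": {"not", "no", "never", "nothing"},
    "contrastive": {"but"},
    "modal": {"would", "could", "should", "might", "must"},
    "intensifier": {"totally", "really", "very"},
}

# A word, optionally followed by an apostrophe and more letters ("couldn't").
TOKEN = re.compile(r"[a-z]+(?:['’][a-z]+)?")


def tag_line(line):
    """Return {category: [matched tokens]} for one transcript line."""
    found = {}
    for token in TOKEN.findall(line.lower()):
        # Count contracted negations ("couldn't") as negatives, then let
        # the remaining stem ("could") be checked against the word lists.
        if token.endswith(("n't", "n’t")):
            found.setdefault("negative", []).append(token)
            token = token[:-3]
        for category, words in MARKERS.items():
            if token in words:
                found.setdefault(category, []).append(token)
    return found


def marker_counts(lines):
    """Count flagged markers per category over a whole passage."""
    counts = Counter()
    for line in lines:
        for category, tokens in tag_line(line).items():
            counts[category] += len(tokens)
    return counts


# Lines 2-4 of the first interview extract quoted in the chapter.
extract = [
    "I was totally stuck up",
    "that I knew that now everything will go totally wrong",
    "but I couldn't say it in a way that I'd believed",
]

for number, line in enumerate(extract, start=2):
    print(number, tag_line(line))
print(dict(marker_counts(extract)))
```

A keyword screen of this kind can only point the analyst to candidate lines. Repetition, false starts, backtracks and evaluative language depend on context and would still have to be coded by hand.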
Halliday, M.A.K. 1994. An Introduction to Functional Grammar. Second ed. London, Melbourne & Auckland: Edward Arnold.
Herman, David. 2002. Story Logic. Problems and Possibilities of Narrative. Lincoln and London: University of Nebraska Press.
Herman, David. 2005. Histories of narrative theory (I): A genealogy of early developments. In A Companion to Narrative Theory, edited by J. Phelan and P.J. Rabinowitz. Malden, MA: Blackwell.
Hymes, Dell. 1996. Ethnography, linguistics, narrative inequality. In Critical Perspectives on Literacy and Education, edited by A. Luke and J. Cook. London: Taylor & Francis.
Hyvärinen, Matti. 1994. Viimeiset taistot [The Last Battles]. Tampere: Vastapaino.
Hyvärinen, Matti. 1998. Thick and thin narratives: Thickness of description, expectation, and causality. In Cultural Studies: A Research Volume, edited by N.K. Denzin. Stamford: JAI Press.
Hyvärinen, Matti. 2006a. Acting, thinking, and telling: Anna Blume’s Dilemma in Paul Auster’s In the Country of Last Things. Partial Answers 4 (2):59–77.
Hyvärinen, Matti. 2006b. Towards a conceptual history of narrative. In The Travelling Concept of Narrative, edited by M. Hyvärinen, A. Korhonen and J. Mykkänen. Helsinki: Helsinki Collegium for Advanced Studies.
Jones, Rebecca L. 2004. ‘That’s Very Rude, I Shouldn’t be Telling You That’. Older women talking about sex. In Considering Counter-Narratives. Narrating, Resisting, Making Sense, edited by M. Bamberg and M. Andrews. Amsterdam/Philadelphia: John Benjamins.
Josselson, Ruthellen. 2004. The hermeneutics of faith and the hermeneutics of suspicion. Narrative Inquiry 14 (1):1–28.
Kohli, Martin. 1981. Biography: account, text, method. In Biography and Society. The Life History Approach in the Social Sciences, edited by D. Bertaux. Beverly Hills and London: Sage.
Komulainen, Katri. 1998. Kotihiiriä ja ihmisiä. Vol. 35, Joensuun yliopiston yhteiskuntatieteellisiä julkaisuja. Joensuu: Joensuun yliopisto.
Kreiswirth, Martin. 2005. Narrative turn in the humanities. In Routledge Encyclopedia of Narrative Theory, edited by D. Herman, M. Jahn and M.-L. Ryan. London and New York: Routledge.
Labov, William. 1972. Language in the Inner City. Oxford: Basil Blackwell.
Labov, William, and Joshua Waletsky. [1967] 1997. Narrative analysis: oral versions of personal experience. Journal of Narrative and Life History 7 (1–4):3–38.
Lieblich, Amia, Rivka Tuval-Mashiach, and Tamar Zilber. 1998. Narrative Research. Reading, Analysis, and Interpretation. Vol. 47, Applied Social Research Methods Series. Thousand Oaks, London & New Delhi: Sage.
Linde, Charlotte. 1993. Life Stories. The Creation of Coherence. New York, Oxford: Oxford University Press.
Lyotard, Jean-François. 1993 [1983]. The Postmodern Condition, edited by W. Godzich and J. Schulte-Sasse, Theory and History of Literature. Minneapolis: University of Minnesota Press.
Löyttyniemi, Varpu. 2001. The setback of a doctor’s career. In Turns in the Road. Narrative Studies of Lives in Transition, edited by D.P. McAdams, R. Josselson and A. Lieblich. Washington, DC: American Psychological Association.
MacIntyre, Alasdair. 1984. After Virtue. A Study in Moral Theory. Second ed. Notre Dame: University of Notre Dame Press.
McAdams, Dan P. 1988. Power, Intimacy, and the Life Story. Personological Inquiries into Identity. New York, London: The Guilford Press.
McAdams, Dan P. 1993. The Stories We Live By. Personal Myths and the Making of the Self. New York and London: The Guilford Press.
Mink, Louis O. 1987. Historical Understanding, edited by Brian Fay, Eugene O. Golob and R.T. Vann. Ithaca and London: Cornell University Press.
Mishler, Elliot G. 1986. Research Interviewing. Context and Narrative. Cambridge, MA: Harvard University Press.
Mishler, Elliot G. 1997. A matter of time: when, since, after Labov and Waletsky. Journal of Narrative and Life History 7 (1–4):61–68.
Nelson, Katherine. 2003. Narrative and the emergence of a consciousness of self. In Narrative and Consciousness. Literature, Psychology, and the Brain, edited by G.D. Fireman and T.E. McVay Jr. Oxford and New York: Oxford University Press.
Ochberg, Richard L., and George C. Rosenwald. 1992. Storied Lives: The Cultural Politics of Self-understanding. New Haven and London: Yale University Press.
Ochs, Elinor, and Lisa Capps. 2001. Living Narrative. Creating Lives in Everyday Storytelling. Cambridge, MA: Harvard University Press.
Phelan, James. 2005a. Editor’s column. Narrative 13 (3):205–210.
Phelan, James. 2005b. Living to Tell about It. A Rhetoric and Ethics of Character Narration. Ithaca: Cornell University Press.
Plummer, Ken. 1983. Documents of Life. An Introduction to the Problems and Literature of a Humanistic Method, Contemporary Social Science Series. London: Allen & Unwin.
460 THE SAGE HANDBOOK OF SOCIAL RESEARCH METHODS
Plummer, Ken. 2001. Documents of Life 2. An Invitation Schank, Roger, and Robert Abelson. 1977. Scripts, Plans,
to a Critical Humanism. London, Thousand Oaks, Goals and Understanding. New York: John Wiley &
New Delhi: Sage. Sons.
Polkinghorne, Donald E. 1988. Narrative knowing and Schegloff, Emanuel A. 1997. ‘Narrative Analysis’ thirty
the human sciences. In SUNY Series in Philosophy of years later. Journal of Narrative and Life History
the Social Sciences, edited by L. Langsdorf. Albany: 7 (1–4):97–106.
State University of New York Press. Seale, Clive. 2000. Resurrective practice and narrative.
Propp, Vladimir. 1968. Morphology of the Folktale. In Lines of Narrative. Psychosocial Perspectives,
Translated by L. Scott. 2nd ed. Austin: University of edited by M. Andrews, S.D. Sclater, C. Squire and
Texas Press. A. Treacher. London and New York: Routledge.
Propp, Vladimir. 1984. The structural and historical study Smith, Barbara Herrnstein. 1981. Narrative version,
of the wondertale. In Theory and History of Folklore, and narrative theories. In On Narrative, edited by
edited by A. Liberman. Manchester: Manchester W.J.T. Mitchell. Chicago: University of Chicago Press.
University Press. Squire, Corinne. 2004. Narrative genres. In Qualitative
Ricoeur, Paul. 1984. Time and Narrative 1. Translated by Research Practice, edited by C. Seale, G. Gobo,
K. McLaughlin and D. Pellauer. 3 vols. Vol. 1. Chicago J.F. Gubrium and D. Silverman. London, Thousand
and London: The University of Chicago Press. Oaks, New Delhi: Sage.
Riessman, Catherine Kohler. 1990. Divorce Talk. Women Strawson, Galen. 2004. Against narrativity. Ratio (New
and Men Make Sense of Personal Relationships. New Series) XVII (4):428–452.
Brunswick and London: Rutgers University Press. Tannen, Deborah. 1993. What’s in a frame? In Framing
Riessman, Catherine Kohler. 1993. Narrative Analysis, in Discourse, edited by D. Tannen. Oxford: Oxford
Qualitative Research Methods Volume 30. Newbury University Press. Original edition, Freedle, R.O. (Ed.)
Park, London & New Delhi: Sage. 1979 New Directions in Discourse Processing.
Riessman, Catherine Kohler. 2001. Analysis of personal Thomas, William I., and Florian Znaniecki. 1984.
narratives. In Handbook of Interview Research, edited The Polish Peasant in Europe and America, edited by
by J.F. Gubrium and J.A. Holstein. Thousand Oaks, E. Zaretsky. Urbana and Chicago: University of Illinois
London & New Delhi: Sage. Press. Original edition, 1918–1920.
Rimmon-Kenan, Shlomith. 2006. Concepts of narrative. Turner, Mark. 1996. The Literary Mind. The Origins of
In The Travelling Concept of Narrative, edited by Thought and Language. Oxford and New York: Oxford
M. Hyvärinen, A. Korhonen and J. Mykkänen. University Press.
Helsinki: Helsinki Collegium for Advanced Studies, White, Hayden. 1987 [1981] The value of narrativity
University of Helsinki. in the representation of reality. In The Content
Ryan, Marie-Laure. 2005. Narrative. In Routledge Ency- of the Form. Narrative Discourse and Historical
clopedia of Narrative Theory, edited by D. Herman, Representation, edited by H. White. Baltimore &
M. Jahn and M.-L. Ryan. London and New York: London: The Johns Hopkins University Press.
Routledge. Widdershoven, Guy A.M. 1993. The story of
Sarbin, Theodor. 1986. Narrative as a root metaphor life. Hermeneutic perspectives on the relationship
for psychology. In Narrative Psychology. The Storied between narrative and life history. In The Narrative
Nature of Human Conduct, edited by T. Sarbin. Study of Life, Volume I, edited by R. Josselsson and
New York: Praeger Press. A. Lieblich. Newbury Park and London: Sage.
27
Reconstructing Grounded Theory
Kathy Charmaz
In the 40 years since Barney G. Glaser and Anselm L. Strauss (1967) wrote their pioneering book, grounded theory has become a general qualitative method that cuts across disciplines and professions. The method consists of several distinctive strategies; however, scholars vary in what they adopt and major proponents differ on which strategies they see as integral to the method (see Charmaz, 2006; Clarke, 2005, 2006; Glaser, 1998, 2001; Strauss, 1987; Strauss and Corbin, 1990, 1998). What then is grounded theory? What does it include? The term refers to both a method of theory construction, my focus here, and the product of this construction, a theory that explains or elucidates a particular process or phenomenon.
The grounded theory method provides systematic, successive strategies for developing fresh ideas to collect, study, and analyze empirical data (see also Atkinson et al., 2003; Clarke, 2005, 2006; Glaser, 1978; Glaser and Strauss, 1967). Grounded theory starts with an inductive logic and emphasizes simultaneous data collection and analysis to construct middle-range theories.
Those who subscribe to grounded theory would accept this definition of the method. Yet which grounded theory strategies to adopt, what they entail and how to put them into practice have undergone change and reconstruction, even by the originators themselves (see Glaser, 1998, 2001; Strauss, 1987; Strauss and Corbin, 1990, 1998). Major differences among proponents arise from varied assumptions about what constitutes theory and from contrasting epistemological allegiances. These allegiances result in different constructions of the research process, the practice of theorizing, and what stands as erosion or evolution of the method (see Baker et al., 1992; Boychuk Duchscher and Morgan, 2004; May, 1996; Mills et al., 2006; Stern, 1994).
Throughout the chapter, I show how grounded theory, and its various iterations, have shifted and changed. I also address the following objectives: (1) to situate the original methodological contribution of grounded theory; (2) to look at the history and development of the method; (3) to outline postmodern challenges to the method and discuss its constructivist reconstructions; and (4) to analyze grounded theory as method and practice. I attend to debates about grounded theory and show how they are
played out as various proponents reconstruct the method and note its potential for creating imaginative interpretations.
SITUATING THE USE AND METHODOLOGICAL IMPORT OF GROUNDED THEORY
To understand why and how scholars, including its originators, have reconstructed grounded theory, one needs to know about the situations surrounding its development and current directions. These situations transcend the method itself as they include its followers and critics. Both followers and critics tend to have limited visions of the method. Followers commonly identify the version of grounded theory they first learned as representing the method in its entirety (Urquhart, 2007). Some followers and critics have scarcely read beyond Glaser and Strauss’ (1967) original exegesis. Critics often conflate the way the originators used the method as mirroring inherent characteristics of the method (see, for example, Burawoy, 1991; Layder, 1998). They argue that grounded theory cannot account for macro social processes or structures left untapped at the interactional level.
Grounded theory made its methodological mark by proposing explicit guidelines for theorizing from data. From Glaser and Strauss’ original treatise to recent major statements by Adele E. Clarke (2003, 2005) and Kathy Charmaz (2000, 2006), grounded theorists have emphasized constructing theory from inductive qualitative data through using successive analytic strategies. Grounded theory methods have appealed to diverse researchers from varied disciplines and professions who have claimed allegiance to using them. By now, spokespersons have emerged in a variety of disciplines, such as psychology (see, for example, Charmaz, 2003; Charmaz and Henwood, 2007; Henwood and Pidgeon, 1995, 2003; Pidgeon and Henwood, 1996, 2004; Rennie et al., 1988), management (Goulding, 2002; Locke, 2001), nursing (Benoliel, 1996; Chenitz and Swanson, 1986; Schreiber and Stern, 2001; Stern, 1980; Wilson and Hutchinson, 1996; Wuest, 1995, 2001) and information systems (Bryant, 2002, 2003; Urquhart, 2003). Specialists are beginning to appear within subfields (LaRossa, 2005), and have become established in grounded theory computer applications (see, for example, Fielding and Lee, 1998; Lonkila, 1995; Kelle, 2004).
The logic and explicit strategies of grounded theory have contributed to its wide appeal. Unlike earlier twentieth-century field research, Glaser and Strauss made simultaneous data collection and analysis an integral part of grounded theory. They proposed ways of focusing and integrating data collection while advancing the theoretical analysis of the collected data. The logic of grounded theory relies on starting with inductive data and subjecting them to close scrutiny through specific coding and analytic practices, while collecting data (see Charmaz, 2003, 2006; Glaser, 1978, 1998). Grounded theory coding practices lead to developing analytic categories, and then refining these categories and checking them empirically, as the analysis becomes increasingly theoretical. Thus, the logic of grounded theory means that researchers retain strong empirical foundations in their work and offer abstract, conceptual theories of the studied empirical phenomena.
Glaser and Strauss’ original statement was revolutionary for four reasons. First, they took discussion of qualitative inquiry beyond data collection techniques and field research roles. Instead, they explained how to streamline data collection by asking analytic questions and developing theoretical rendering of the data from the very beginning of the research endeavor. Second, they outlined inductive guidelines for coding data and developing emergent abstract categories. Third, Glaser and Strauss argued that their methodological strategies could advance data analysis to construct middle-level theories. Fourth, they provided powerful legitimation for conducting inductive qualitative research at a time when most social scientists were enamored with the promise of rigorous quantitative inquiry.
This last reason led social scientists to claim that they adopted grounded theory methods when they had conducted some sort of qualitative research or had only followed one or two grounded theory strategies but did not aim for theory development. Other researchers’ claims of adopting grounded theory strategies may have been more consistent with the method but their reductionist, mechanistic application of it undermined its potential for open-ended, creative theorizing. Miller’s (2000) argument still holds: the full potential of grounded theory methods for generating theory remains untapped. Researchers can profit from the flexible, open-ended strategies of grounded theory to conduct systematic, directed inquiry and to engage in imaginative theorizing from empirical data.
HISTORY AND DEVELOPMENT OF GROUNDED THEORY
The emergence of grounded theory
The history and development of grounded theory are intertwined with larger currents in social scientific inquiry, and particularly with tensions between qualitative and quantitative research in sociology in the United States. During the early decades of the twentieth century, sociologists, particularly at the University of Chicago, began building an empirical foundation in life histories and case studies¹. By mid-century this foundation had weakened due to the development of quantitative methods. Unlike strong British and European sociological traditions in critical debate and praxis in theorizing, U.S. sociology advanced quantification of various sorts and abstract macro theories devoid of solid empirical roots. As Jennifer Platt (1996) states, leading quantitative methodologists often borrowed procedures from other disciplines and some sociologists quantified measures to persuade outside audiences, not because they believed quantification to be necessary. At that time, however, the divide between theory and research deepened and the gap between inductive qualitative and deductive quantitative research widened. U.S. sociology steadily adopted more quantitative techniques and the distance between theory and methods grew (Charmaz, 2000, 2006).
Grounded theory methods arose from Glaser and Strauss’ (1967) efforts to explicate the strategies they had followed while conducting their qualitative studies of the social organization of dying in hospitals (Glaser and Strauss, 1965, 1968). Their efforts brought renewed attention to qualitative research at a pivotal point in time. Platt (1996) points out that the development of public opinion research and statistical techniques during World War II and the institution building of Kurt Lewin and Paul Lazarsfeld afterwards established the hegemony of the survey and the dominance of its proponents’ departments. Meanwhile, inductive qualitative inquiry in sociology in the United States had shifted from the case study to participant observation. This methodology had not been theorized, explicated, or codified in accessible ways. Nor, as Platt notes, did proponents talk about field methods. Paul Rock (1979) points out that novices learned Chicago school field research through a combination of mentoring and becoming immersed in field research settings. What researchers actually did while in the field and afterwards remained opaque. Early methodological texts emphasized data gathering and field work roles and relations rather than qualitative analytic strategies (see, for example, Adams and Preiss, 1960; Junker, 1960; Kahn and Cannell, 1957)².
By 1965, quantification with its positivist underpinnings framed methodological discussions in United States sociology³. Methods textbooks of the day outlined methodological objectives and procedures that did not fit qualitative research. Some mid-century quantitative researchers saw qualitative inquiry as a precursor to constructing quantitative instruments but most viewed qualitative studies as impressionistic, anecdotal, and biased. As such, qualitative research could not meet mid-century canons for reliability and validity. The inability of qualitative researchers to replicate their studies further marginalized qualitative research.
aimed for streamlined, general abstract statements removed from context. Goffman’s metaphor of the drama permitted readers to see social life anew. In keeping with his empirical emphasis, however, Glaser (1978) contended that Goffman relied too heavily on this metaphor. Glaser and Strauss called on qualitative researchers to raise their description to a theoretical level and to develop explicit theoretical statements.
The Discovery of Grounded Theory attacked reigning theoretical and methodological assumptions of the day and led the charge to win a new and renewed place for qualitative inquiry, for everyone. In this sense, Glaser and Strauss democratized qualitative research. For them, it consisted of a set of skills that students beyond elite Chicago circles could learn. Simultaneously, they demystified qualitative analysis by offering flexible guidelines. This combination of democratization and demystification struck a responsive chord among diverse audiences.
Grounded theory combined two competing traditions in mid-century American sociology in an unlikely marriage. Glaser wished to codify qualitative inquiry in a way analogous to how his mentor, Paul Lazarsfeld (Lazarsfeld & Rosenberg, 1955), had codified quantitative research⁵. Glaser’s Columbia University intellectual heritage in structural-functionalism, rigorous quantitative methods, and the quest for middle-range theories gave grounded theory its rigor, language, direction, and objectives. He borrowed terms from quantitative research design but gave them new, often inverted, meanings. Thus qualitative coding became something that emerged from data rather than applied to it; sampling became a strategy to fill out theoretical categories rather than to seek population representativeness; and core variables arose from tentative categories, not from deduced operations from abstract concepts. The language itself spawned confusion that has lasted until the present. When does grounded theorists’ coding emerge from their study of data rather than serving as codes applied to data? When, if ever, might representational sampling and theoretical sampling become blurred? How does a budding grounded theorist reconcile Glaser’s notion of a single core variable with the search for meanings and actions in a field of inquiry?
For Strauss, the search for meanings and actions formed the core of sociological research. Pragmatists John Dewey, George Herbert Mead, and Charles S. Peirce had left a lifelong imprint on him. During his doctoral studies Strauss’ immediate intellectual influences at the University of Chicago included Herbert Blumer, Everett Hughes, and Robert Park. Thus, Strauss brought symbolic interactionism and ethnographic field research to grounded theory and an emphasis on work to his empirical research. Strauss’ pragmatist heritage gave grounded theory its emphases on agency, emergence, meaning, and action.
Both Glaser and Strauss aimed to study social and social psychological processes. They first planned to generate substantive theories that explicated and explained a fundamental social or social psychological process within a social setting or a particular experience such as dying in hospitals. They argued that the resulting grounded theory could explain the major categories in the studied process, explicate their properties, demonstrate the causes and conditions under which these categories emerged and varied, and delineate their consequences. As Glaser and Strauss (1965, 1968) developed categories such as ‘mutual pretense,’ ‘open awareness,’ ‘closed awareness,’ and ‘time expectations,’ they began to move into formal theorizing because their categories and processes reached across substantive areas and could be further explored in these new areas. Thus, as Glaser and Strauss’ theories reached this level of generality, they advocated refining their emerging theories by seeking relevant data in varied settings that moved across substantive areas. The researchers would then refine the categories of the emerging formal theory, as informed by the new data these categories subsumed.
Glaser and Strauss’ arguments in the Discovery book contributed much to revitalizing qualitative research and to maintaining
and extending Chicago school ethnographic traditions in sociology. They inspired new scholars in diverse fields to pursue qualitative research and trained doctoral sociology and nursing students in grounded theory. They offered innovative strategies to move qualitative inquiry beyond description and into explanatory theory that conceptualized the studied phenomena in theoretical categories and demonstrated abstract relationships between these categories. And they contended that a completed grounded theory was useful, unlike mid-century grand theory. Glaser and Strauss proposed that a finished grounded theory would meet the following criteria: a close fit with the data, usefulness, density, durability, modifiability, and explanatory power (Glaser, 1978, 1992; Glaser and Strauss, 1967).
Procedures versus emergence in the reconstruction of grounded theory
Strauss’ publication of Qualitative Analysis for Social Scientists (1987) sowed the seeds of the first reconstruction of grounded theory. These seeds matured in his co-authored book with Juliet Corbin, Basics of Qualitative Research (1990, 1998) because in significant ways it revised grounded theory and set a new course for it. In his 1987 book, Strauss began to move grounded theory toward verification. His co-authored works with Corbin further this direction. In addition, Strauss and Corbin created several new technical procedures to be applied to the data rather than emerging from analyzing them. Glaser’s (1992) acrimonious response to the first edition of Basics disavows Strauss and Corbin’s innovations and proclaims his version of grounded theory to be the only authentic statement of the method. Glaser argues that Strauss and Corbin’s procedures force data and analysis into preconceived categories and, thus, contradict essential grounded theory guidelines based on comparative analysis and emergent categories. Glaser saw Strauss and Corbin’s innovations as usurping the method and imposing unnecessary complexity on the analytic process. At that time, Glaser remained consistent with his 1978 exegesis of the method, which relied on comparative approaches at each step of the analytic process, avoidance of extant theories, a delayed literature review, and on a direct and, often, narrow empiricism.
To an extent, Strauss and Corbin’s technical applications foster a formulaic approach rather than developing Glaser’s type of emergent analysis. They introduce axial coding as part of a complex ‘coding paradigm’ and the conditional matrix as techniques for viewing data and producing an analysis. In axial coding, researchers (1) treat a category as an axis; (2) specify the properties and dimensions of this category; (3) relate categories to their subcategories; and (4) delineate relationships between them (Strauss and Corbin, 1998, p. 123). Strauss and Corbin argue that axial coding brings the data back together again into a coherent whole after fracturing them during initial line-by-line coding (Charmaz, 2006, p. 186). Glaser (1992, 1998) viewed axial coding not only as forcing data into preconceived frameworks but also as sidestepping the families of theoretical codes that he laid out in Theoretical Sensitivity. Glaser views these codes as supplying the latent links and theoretical explanations that hold a researcher’s inductive categories together. He insists that theoretical codes must earn their way into the analysis; however, whether or not these codes constitute another form of forcing data remains ambiguous. Applying them mechanically would result in forcing data, and forcing one’s categories into a particular configuration, as Glaser (1992) acknowledges. Seeing and pursuing which theoretical directions, issues, and, possibly, concepts the data suggest makes more sense. These theoretical directions may spawn original ideas that move beyond Glaser’s theoretical codes or Strauss and Corbin’s axial coding.
Strauss and Corbin designed their other procedural innovation, the conditional/consequential matrix, to provide a technique for coding to make the intersections of micro and macro conditions/consequences on actions visible and to clarify connections
between them. By creating the conditional/consequential matrix, Strauss and Corbin intended to make connections between levels of analysis more visible.
Kelle (2005) reduces the controversy between Glaser and Strauss and Corbin to whether a researcher follows the coding paradigm systematically (perhaps rigidly?) or adopts ad hoc theoretical codes from Glaser’s coding families. Even though Kelle’s view makes sense, it undermines Glaser’s approach to constructing emergent categories. Kelle sees Glaser’s emphasis on emergence as a problematic methodological concept imbedded in Glaser’s exhortations to study data without adopting a preconceived theoretical frame. True, Glaser views emergence as contingent on not forcing data into extant theories and his resounding ‘Trust in emergence’ has the ring of a slogan. Yet an emphasis on emergence means more than a slogan. An apt approach combines Dey’s (1999) view of bringing an open mind to data with Henwood and Pidgeon’s (2003) notion of theoretical agnosticism. This approach is consistent with an injunction from the abductive logic that has always characterized grounded theory: remain open to all kinds of theoretical possibilities and gather more data to check the most plausible explanation (Peirce, 1938/1958; Rosenthal, 2004). Kelle correctly takes Glaser to task about assuming that facts stand alone, and that a theory-free observer can see them but also notes the conflicting assumptions about possessing ‘theoretical sensitivity’ (Glaser and Strauss, 1967, p. 3; Glaser, 1978).
A major difference between Glaser and Strauss and Corbin may lie in how and when each imports their respective form of coding into the analysis. For Glaser, theoretical coding comes after the grounded theorist has advanced tentative categories; for Strauss and Corbin, axial coding is a means of developing categories. Glaser and Strauss and Corbin each contend that their respective forms of coding put the previously fractured data back together in conceptual ways. A second difference lies in their use of comparisons. Glaser sticks to comparing data with data, data with code, code with code and so forth as the researcher moves up levels of abstraction.
The potential tensions between Glaser’s positivism and Strauss’ pragmatism are perhaps greater than their respective grounded theory books indicate. Strauss’ strong pragmatist roots are more evident in his early works (e.g. 1959/1969, 1961; Glaser and Strauss, 1965, 1967, 1968) and in Continual Permutations of Action (1993) than in his co-authored grounded theory texts with Juliet Corbin, Basics of Qualitative Research (1990, 1998), which contain positivist undercurrents. Both Strauss and Corbin’s and Glaser’s versions of grounded theory assume an external reality independent of the observer, a neutral observer, and the discovery of data. Notions about what researchers see, define, and describe as data do not permeate their texts. Glaser ignores the vital roles of perspectives and language for what we define as data, and Strauss and Corbin state, ‘Although we do not create data, we create theory out of data’ (1998, p. 56). Such approaches do not acknowledge the position from which the observer sees and speaks, much less how grounded theory is an inherently interactive method during every step of the process.
Whether or not researchers use axial coding or adopt the conditional/consequential matrix, Strauss (1987) and Strauss and Corbin (1990, 1998) have made diagramming an integral part of the method for their followers. Diagramming representations of relationships between categories fosters developing analytic complexity with multiple categories. In this sense, Strauss and Corbin’s reconstruction moves beyond Glaser’s variable analysis of one core variable and also provides a foundation for Adele E. Clarke’s (2003, 2006) postmodernist revision of grounded theory and methodological strategy of mapping empirical situations and positions. She creates positional maps that not only chart discourses but also locate silences and paths not taken as well as those taken.
In keeping with his positivist heritage, Glaser assumes an expert observer who makes neutral, unproblematic observations and offers slogans such as ‘All is data’
(2001, p. 145) that gloss what researchers may define as ‘all.’ Glaser explicitly promotes theorizing from outside the studied experience rather than from within it. For years he argued that study participants will tell researchers their main concern about what’s happening in their setting (see, for example, Glaser, 1992). Beyond any intent to focus an observer’s gaze on some issues and away from others, relying on participants’ directives can still result in an outsider’s analysis. Participants often take for granted the fundamental processes and conditions that shape their lives. Following participants’ overt statements may lead to unwitting acceptance of a public relations rhetoric and subsequent analysis from an outsider’s rather than an insider’s viewpoint. Interestingly, in a significant shift, Glaser later (2001, p. 51) acknowledges that the researcher identifies and conceptualizes participants’ main concern.
Overall, Glaser’s epistemology has remained consistent over the years. Yet he, too, has reconstructed grounded theory practice in both major and minor ways. Unlike Strauss and Corbin’s (1990, 1998) reconstructions, Glaser’s shifts are incremental and buried in the dense texts of his self-published books. He also presents his shifts as contributions to an evolving method. But who decides what represents its evolution, reconstruction or erosion? Glaser has disavowed his quest to define and analyze a basic social process or basic social psychological process because he now sees such a quest as forcing the data. This change is fundamental because earlier Glaser built grounded theory practice on the analytic explication of these processes. Similarly, another major change in methodological practice concerns initial coding. Glaser (1992, 2001) disavows his earlier prescription to do line-by-line coding to fracture the data and to see beyond the immediate story during the initial coding. Instead, he advocates seeking a core variable through comparisons of incidents. Minor shifts include adding more families of theoretical codes, changing the rules for memo-making, and narrowing the definition of theorizing to ‘a theory of a core category’ (2001, p. 206).
Since Glaser and Strauss’ (1967) original statement, several major grounded theorists have aimed beyond middle-range theories. Strauss (1987, 1993), independently as well as with co-author Juliet Corbin (Strauss and Corbin, 1990, 1998), began to move from the micro level of analysis to meso and macro levels, an effort that Clarke (2003, 2005, 2006) has extended. Elsewhere I have initiated a discussion of taking grounded theory methods into structural analysis with an explicit emphasis on social justice (Charmaz, 2005).
POSTMODERN CHALLENGES AND CONSTRUCTIVIST RECONSTRUCTIONS OF GROUNDED THEORY
By 1990, publication of Strauss’ Qualitative Analysis for Social Scientists (1987) and Strauss and Corbin’s Basics of Qualitative Research had made the method immensely popular throughout the social sciences and professions. The qualitative revolution had spread widely and Basics gave researchers a way to conduct qualitative research. Simultaneously, however, the positivist residues of early grounded theory statements came under increased scrutiny and postmodern and narrative turns undermined the method. Some scholars (see, for example, Conrad, 1990; Ellis, 1995; Richardson, 1993) viewed grounded theory as clinging to an outdated modernist epistemology. For them, grounded theory fragmented the respondent’s story, relied on the authoritative voice of the researcher, blurred difference, and accepted Enlightenment grand metanarratives about truth, universality, human nature, and world views. Such critiques melded grounded theory strategies with the originators’ early statements and how they used the method.
A reconstructed grounded theory can take into account many of the criticisms that varied critics have raised. Researchers can adopt, and may adapt, the flexible strategies that Glaser and Strauss (1967; Glaser, 1978, 2001) originally delineated. These strategies
begin with inductive cases and define an intriguing finding, which they attempt to explain. Abductive reasoning involves the imaginative interpretation of accounting for this finding by entertaining all possible theoretical interpretations, and then checking these interpretations against experience until arriving at the most plausible theoretical explanation (Hildebrand, 2000/2004; Peirce, 1938/1958; Reichert, 2000/2004; Rosenthal, 2004). Abductive logic builds checks into the research process and, therefore, keeps an emerging theory grounded in the data that it attempts to explain.
For Glaser (1992, 1998, 2001, 2003), the comparative methodology consists of a set of successive strategies for developing theoretical categories and renders these categories objective through abstraction of their properties. For Strauss and Corbin (1998), the comparative method corrects ‘possible distortion of meaning’ (p. 137). Their ‘far-out’ comparisons leap beyond the data but hearken back to Everett Hughes’ (1958) seemingly incongruent comparisons such as the similarities between psychiatrists and prostitutes, a comparison that long entranced American sociologists.
interpretation. Such an approach adopts a preconceived form for the method without attending to how the content of the research can re-form the form. Form and content shape each other, particularly in constructivist versions of grounded theory. Researchers study and focus data collection and analytic in a dialectical process. Therefore, the method itself becomes constructed and reconstructed throughout the research process. Maintaining this dialectic requires active, reflective researchers, whose reasoning directs their enactment of this method.
The fundamental property of emergence in grounded theory relies on active researchers who interact with their data and interpret these data, and their research practices. The image of neutral, passive researchers who discover data and theory is a mirage. Moving from data to theory requires researchers’ sustained interaction and actions with their data and emerging analyses. In short, grounded theorists study emergent processes, and the method itself is an emergent process.
Grounded theory guidelines
Several basic grounded theory guidelines
In the early years, both Glaser and Strauss have become standard fare in qualitative
treated grounded theory as an emergent inquiry. Nonetheless, the grounded the-
method. It is ironic that Strauss’ methodolog- ory emphasis on action and process, its
ical texts with Corbin became increasingly comparative approach, and its particular
procedural. Mead’s (1932) philosophy of coding and sampling strategies make the
time and conception of the emergent present method unique—and sometimes misunder-
had profoundly affected Strauss’ method- stood. Because these guidelines have been
ological practice and theoretical perspective. discussed at length elsewhere (Charmaz,
His methodology books do not fully portray 2003, 2005, 2006; Glaser, 1978, 1992, 1998,
the fluidity of his thinking or the creativity 2001, 2003; Glaser and Strauss, 1967; Locke,
enacted in his co-authored research with both 2001; Strauss, 1987; Strauss and Corbin,
Glaser and Corbin (see, for example, Corbin 1990, 1998) I merely outline them here.
and Strauss, 1988; Glaser and Strauss, 1968). Unlike most qualitative approaches,
A procedural approach to grounded theory grounded theory provides explicit strategies
dampens its emergent strengths and dimin- for defining and studying processes: this
ishes possibilities for theoretical innovation. method places priority on action. Glaserian
Researchers have long associated grounded versions of grounded theory build action
theory as having a particular form, but have into the analysis from the earliest coding.
not explicated the vital role of content for The comparative study of actions and codes
directing this form. They can become mired advances an inductive analysis. By invoking
in following procedures and subsequently comparative methods throughout the analysis
produce description rather than theoretical grounded theorists define analytic properties
of their codes. Essentially, from the start, grounded theorists code and analyze to illuminate actions, process, and potential theoretical meaning (Glaser, 1978). In brief, grounded theory guidelines include the following comparative research practices:
• Comparing data with data
• Labeling data with active, specific codes
• Selecting focused codes
• Comparing and sorting data with focused codes
• Raising telling focused codes to tentative analytic categories
• Comparing data and codes with analytic categories
• Constructing theoretical concepts from abstract categories
• Comparing category with concept
• Comparing concept and concept (see note 7)
Objectivist grounded theorists who follow Glaser aim to use comparative methods without preconceptions. Thus they prescribe entering the research setting and analysis uncontaminated by prior theory and disciplinary knowledge. Constructivist grounded theorists use their prior knowledge and disciplinary perspectives to sensitize them to conceptual issues at the beginning but seek new theoretical interpretations as they interrogate their data and emerging analyses.
At least two phases of coding characterize grounded theory: open, or initial, and selective, or focused. During the initial phase, line-by-line coding prompts the researcher’s active involvement in the analysis. To do line-by-line coding at all, researchers must view the data in greater depth than passively perusing it or looking for themes, as qualitative researchers generally do. Even though Glaser has jettisoned line-by-line coding, it remains an excellent heuristic strategy for scrutinizing data and for examining one’s preconceptions about the data as well as becoming aware of tacit alignments or shared assumptions with participants. By constructing active, specific, and short initial codes, the grounded theorist creates handles for making comparisons between data and between codes.
Focused or selective coding follows scrutiny of the initial codes. Focusing on both the most frequent and the most telling codes provides tentative leads to explore and check during subsequent data collection. Researchers use focused codes to sort large amounts of data and to construct tentative categories in their emerging theories.
Memo-writing is the crucial stage of analysis between coding and writing sections of a first draft of the study. In grounded theory practice, researchers write memos from the very beginning of their research and continue to write progressively more focused and analytic memos as they proceed. Memos lend form to fleeting ideas, take codes and categories apart, make comparisons explicit, mine descriptions, stories, and incidents for their analytic import, raise and discuss conjectures, and identify gaps and unanswered questions in the data. Writing memos becomes a means of actively engaging one’s data, codes, and categories. By including data in the memo, researchers build clear links to categories. Much comparative analysis occurs while memo-writing, from comparing data with data and codes early on to comparing category with category as researchers develop their theories. An emergent fit of the categories may then become apparent through writing memos.
Grounded theory builds checks on the analysis throughout the process. Memo-writing fosters checking hunches and keeping the analysis grounded. Theoretical sampling offers another pivotal, but often misunderstood, strategy for grounding the analysis and increasing its incisiveness. Theoretical sampling means sampling to flesh out or refine theoretical categories to increase the precision of the emerging theory. In short, this strategy invokes abductive reasoning because researchers test their tentative ideas. Theoretical sampling arises from researchers’ analyses, not from any representation of population traits or status attributes.
When does the iterative process of moving between collecting and analyzing data end? The standard grounded theory answer is when categories are saturated. That means that the researcher has explicated the properties
of each theoretical category and has sought data that fill each property. The emphasis on categories and properties makes saturation a theoretical concern, not merely a methodological measure indicating redundancy of data as in conventional qualitative research.
Yet the concept of theoretical saturation remains problematic in grounded theory. Like the assumption that grounded theorists share definitions of ‘theory,’ the standard answer of saturation does not address what constitutes a category, nor does it explain how one knows that all salient properties and their variations have been defined, much less been given adequate coverage. Grounded theorists usually assert that they have saturated the properties of a category rather than demonstrating it (Morse, 1995).
The last major grounded theory strategy involves integrating the analysis. How does one accomplish it? By this time, grounded theorists should have a set of well-developed analytic memos on their categories and concepts. Integrating them becomes part of theorizing and, thus, researchers next engage in theoretical sorting to best present the relationships between categories and concepts. Sorting memos occurs first in service to the emergent grounded theory and then, perhaps later, for presentation to an audience. The explanation of the sorting helps to integrate the theory and makes the analytic argument visible for the written report. Strauss (1987) and Strauss and Corbin (1990, 1998) propose diagramming major ideas and relationships, and Clarke (2003, 2005, 2006) offers a means of making structure and process visible.
ART AND SCIENCE IN GROUNDED THEORY STUDIES
Art and science in the originators’ works
Strauss and Corbin (1998) treat their approach as both science and art but their overlay of technical procedures and objectivist assumptions undermines its interpretive elements and their scientistic language undercuts its potential artfulness. Their dual emphases on science and art are also evident in their shared empirical works. They develop such concepts as ‘biographical body conceptions (or BBC) … [which] represents those three concepts—biographical time, body, and conceptions of self’ (p. 252) and the ‘BBC chain’ (Corbin and Strauss, 1987, p. 253), ‘the combination of the three working together.’ These concepts provide analytic tools that dissect experience but distance it from how people live it. Within the same paper, however, Corbin and Strauss offer some artful narrative descriptions that bring the experience to life. Below they discuss questions arising when people first receive a diagnosis of chronic illness and describe the properties of this temporal turning point:
… [W]hen past and future come crashing into the undesirable or dreaded present. This identity shock is followed by future images of what the illness will mean in terms of biographical performances such as: ‘I will be crippled.’ ‘I will no longer be able to,’ ‘I might die soon.’ The degree to which identity is jolted depends upon the number of aspects of self lost, their salience, and the possibility of comeback – regaining lost aspects of self. (p. 272)
Glaser’s version of grounded theory sticks to conventional social science. He does not take into account the potential power of artful interpretation and advises against attending to writing (2001). For Glaser, the ‘conceptual grab’ of the analysis trumps the writing of it (2001, p. 80). Not surprisingly, Glaser has expressed disdain for both qualitative researchers who aim to tell the overarching story in their research and the stories that support it. His remarks endorse a unitary treatment of grounded theory reportage, untouched by either the narrative turn in the social sciences or the demands of varied writing genres and publishing venues.
Artful interpretations in grounded theory works
Grounded theorists’ published works range from neutral reports to imaginative interpretations written with style and grace. Much work
conducted under the banner of grounded theory consists of routine description couched in academic conventions. Numerous analytic writings are stilted and mechanical. How might grounded theorists produce artful interpretations?
Typical grounded theory writing fosters making categories explicit in linear form. These categories represent authors’ construction of their respective research participants’ actions. This writing strategy can shrink the substance of a study to a list of mundane, loosely related processes or descriptors. When, however, authors present both the central idea and its major categories in vivid terms, they simultaneously integrate their analyses and engage readers in their theoretical renderings. Geralyn A. Meyer (2002) titles her article ‘The Art of Watching Out: Vigilance in Women Who Have Migraine Headaches,’ and then posits ‘owning the label’ and ‘making the connections’ as the two major conditions for her core category, ‘watching out,’ to occur. She aims to provide a substantive analysis of the vigilance she finds in her 22 interviews. Meyer breaks down both the conditions and the core category into sub-categories. Watching out included these sub-categories: ‘assigning meaning to what is, calculating the risk, staying ready, and monitoring the results’ (p. 1225). The names of the categories and sub-categories alone carry substantial weight and create the form of the analysis. Thus these categories may require less detailing and supporting evidence than more opaque categories because their analytic rendering aims for limited theoretical reach but makes sound intuitive sense. Meyer’s analysis resonates with readers’ experience. She keeps the analysis simple, the categories crisp, and the relationships between them sequential.
In the following passage from a much larger project, Susan Leigh Star (1989) adopts a different and more difficult analytic objective and writing strategy. She sets high analytic stakes by making a major theoretical argument about relationships between scientific work and shifts in scientific theorizing. Her argument challenges Thomas Kuhn’s contention that a critical mass of anomalies eventually causes change in scientific theorizing. In contrast, Star proposes that theoretical shifts in science are continual and routine and argues that brain localizationists’ victory over brain diffusionists in the late nineteenth century provides a case in point. She weaves description throughout the narrative to support her theoretical perspective and argument. For Star, abstract theorizing about scientific reasoning arises from the whole of her analysis rather than the disparate parts. In this sense, she reunites the fragmented data into a coherent—and fascinating—analytic story but she does so in such a way that its grounded theory underpinnings recede into the background and her theoretical points emerge in the foreground. In the passage below, Star explores her category ‘the contradictions’ [in the localizationists’ position] and also builds her case about how these scientists reconstruct the exigencies of their work to fit their theoretical proclivities. Star writes:
THE CONTRADICTIONS
Localizationists recognized that material and immaterial realms could not, without serious philosophical difficulties, simply be posited as causing action in one another. They also recognized that in principle ‘correlation is not causation,’ although they sometimes used correlation as proof. The major conceptual difficulties thus caused by parallelism [‘the doctrine that the mind and body operate as two separate but parallel realms’ (Star, 1989, p. 155)] were how the two realms (mind and brain) were brought together and by what mechanisms they were made to operate in tandem. Again, it is not surprising to find that the localizationists’ responses of these problems were neither unified nor consistent. They were facing multiple incommensurate audiences: philosophy, medicine, physiology, antivivisection, and evolutionary biology. In addition their everyday work posed serious technical difficulties and uncertainties.
In order to resolve the conflicting demands of the several audiences, localizationists adopted several general strategies. The first strategy was to refer philosophical problems to an expert within their ranks. This was someone who understood their daily work concerns but who would speak as a philosopher for them. The person elected to do this was John Hughlings Jackson. Because he addressed
many of the contradictions posed by parallelism and the mind/brain relationship, Jackson became a kind of symbolic leader for the localizationists….
The second strategy was to develop theories and concepts that could act as plausible bridges between the realms of the mind and the brain. These explanations were not, strictly speaking, philosophically accurate. However they were good enough as theoretical explanations to allow work to continue respectably.
As a final resort, when problems cannot be resolved, localizationists would simply jettison intractable problems into other lines of work. That is, those difficulties that could not easily be addressed by some physical or medical model were relegated to ‘mind’-related lines of work, such as psychiatry and psychology. In this way, psychophysical parallelism was reinforced on an organizational level. Such a division of labor effectively obscured many of the epistemological problems arising from the mind/brain gap. The contradictions were thus eradicated from immediate concern. (pp. 162–163)
Star crafts a convincing argument. Note how she weaves her evidence through the narrative to support her theoretical argument. She creates smooth transitions between description and her category of ‘contradictions’ that simultaneously direct the reader and build her case.
CONCLUSION
… construction nor in its seeming substance. Layers of meaning and action underlie both its construction and substance, which means researchers have rich soil to excavate. Doing grounded theory may simplify methodological decisions but it fosters developing complex and layered analyses, as the excerpt above from Star suggests. Given Glaser and Strauss’ (1967) original openness to methodological innovation and development, it is ironic that grounded theory has become a methodological template—of whichever version—for some researchers who seek mechanical means to stamp out qualitative studies.
Yet by interrogating and following content, grounded theorists can construct form for their inquiry, rather than solely creating content from form used as a recipe for generating research. Grounded theory gives researchers sufficient strategies that they can assume control of their research practice and advance their original ideas. Thus, the present points the way for future reconstruction of grounded theory to open further possibilities for making original theoretical contributions.
NOTES
correctly points out overlap quantitative and qualitative research.
4 Some blurring between theoretical treatises and empirical studies occurs when anything without numbers counts as ‘qualitative.’ Not all macro qualitative works are empirical.
5 Lazarsfeld also pursued qualitative methods but his contribution to quantitative methods became more widely known.
6 My comments here derive from my days as a student of both Glaser and Strauss and a long friendship with Strauss thereafter.
7 This list is congruent with Glaser’s comparative approach. For further details see Charmaz (2006) and Glaser (1978, 1992, 1998).
REFERENCES
Abbott, A. (1999). Department & Discipline: Chicago Sociology at One Hundred. Chicago: University of Chicago Press.
Adams, R. N. & Preiss, J. J. (Eds.) (1960). Human Organization Research. Homewood, IL: Dorsey Press.
Atkinson, P., Coffey, A., & Delamont, S. (2003). Key Themes in Qualitative Research: Continuities and Changes. New York: Rowman and Littlefield.
Baker, C., Wuest, J. & Stern, P. (1992). Method slurring: The grounded theory, phenomenology example. Journal of Advanced Nursing, 17:1355–1360.
Benoliel, J. Q. (1996). Grounded theory and nursing knowledge. Qualitative Health Research, 6(3):406–428.
Boychuk Duchscher, J. E. & Morgan, D. (2004). Grounded theory: Reflections on the emerging vs. forcing debate. Journal of Advanced Nursing, 48(6):605–612.
Bryant, A. (2002). Re-grounding grounded theory. Journal of Information Technology Theory and Application, 4(1):25–42.
Bryant, A. (2003, January). A constructive/ist response to Glaser. FQS: Forum for Qualitative Social Research, 4(1), www.qualitative-research.net/fqs/-texte/1-03/1-03bryant-e.htm [Accessed 03-14-2003].
Bulmer, M. (1984). The Chicago School of Sociology. Chicago: University of Chicago Press.
Burawoy, M. (1991). The extended case study. In M. Burawoy, A. Burton, A. A. Ferguson, K. Fox, J. Gamson, N. Gartrell, L. Hurst, C. Kurzman, L. Salzinger, J. Schiffman, & S. Ui (Eds.), Ethnography Unbound: Power and Resistance in the Modern Metropolis (pp. 271–290). Berkeley: University of California Press.
Castellani, B., Castellani, J., & Spray, S. L. (2003). Grounded neural networking: Modeling complex quantitative data. Symbolic Interaction, 26(4):577–589.
Charmaz, K. (2000). Constructivist and objectivist grounded theory. In N. K. Denzin & Y. Lincoln (Eds.), Handbook of Qualitative Research, 2nd ed. (pp. 509–535). Thousand Oaks, CA: Sage.
Charmaz, K. (2002). Grounded theory analysis. In J. F. Gubrium & J. A. Holstein (Eds.), Handbook of Interview Research (pp. 675–694). Thousand Oaks, CA: Sage.
Charmaz, K. (2003). Grounded theory. In Jonathan A. Smith (Ed.), Qualitative Psychology: A Practical Guide to Research Methods (pp. 81–110). London: Sage.
Charmaz, K. (2005). Grounded theory in the 21st century: A qualitative method for advancing social justice research. Forthcoming in N. Denzin & Y. Lincoln (Eds.), Handbook of Qualitative Research, 3rd ed. Thousand Oaks, CA: Sage.
Charmaz, K. (2006). Constructing Grounded Theory: A Practical Guide Through Qualitative Analysis. London: Sage.
Charmaz, K. & Henwood, K. (2007). Grounded theory. In C. Willig & W. Stainton-Rogers (Eds.), Handbook of Qualitative Research in Psychology (pp. 240–259). London: Sage.
Clarke, A. E. (2003). Situational analyses: Grounded theory mapping after the postmodern turn. Symbolic Interaction, 26:553–576.
Clarke, A. E. (2005). Situational Analysis: Grounded Theory After the Postmodern Turn. Thousand Oaks, CA: Sage.
Clarke, A. E. (2006). Feminism, grounded theory, and situational analysis. In S. Hesse-Biber & D. Leckenby (Eds.), Handbook of Feminist Research Methods. Thousand Oaks, CA: Sage.
Conrad, P. (1990). Qualitative research on chronic illness: A commentary on method and conceptual development. Social Science & Medicine, 30:1257–1263.
Corbin, J. & Strauss, A. L. (1987). Accompaniments of chronic illness: Changes in body, self, biography, and biographical time. In J. A. Roth & P. Conrad (Eds.), Research in the Sociology of Health Care, Vol. 6. The Experience and Management of Chronic Illness (pp. 249–281). Greenwich, CT: JAI Press.
Corbin, J. & Strauss, A. L. (1988). Unending Work and Care: Managing Chronic Illness at Home. San Francisco: Jossey-Bass.
Dey, I. (1999). Grounding Grounded Theory. San Diego: Academic Press.
Dey, I. (2004). Grounded theory. In C. Seale, G. Gobo, J. F. Gubrium, & D. Silverman (Eds.), Qualitative Research Practice (pp. 80–93). London: Sage.
Ellis, C. (1995). Emotional and ethical quagmires of returning to the field. Journal of Contemporary Ethnography, 24(1):68–98.
Fielding, N. G. & Lee, R. M. (1998). Computer Analysis and Qualitative Data. London: Sage.
Glaser, B. G. (1978). Theoretical Sensitivity. Mill Valley, CA: The Sociology Press.
Glaser, B. G. (1992). Basics of Grounded Theory Analysis. Mill Valley, CA: The Sociology Press.
Glaser, B. G. (1998). Doing Grounded Theory: Issues and Discussions. Mill Valley, CA: Sociology Press.
Glaser, B. G. (2001). The Grounded Theory Perspective: Conceptualization Contrasted with Description. Mill Valley, CA: The Sociology Press.
Glaser, B. G. (2002). Constructivist grounded theory? Forum qualitative Sozialforschung/ Forum: Qualitative Social Research [On-line Journal], 3. Available at: http://www.qualitative-research.net/fqs-texte/3-02/3-02glaser-e-htm
Glaser, B. G. (2003). Conceptualization Contrasted with Description. Mill Valley, CA: Sociology Press.
Glaser, B. G. & Strauss, A. L. (1965). Awareness of Dying. Chicago: Aldine.
Glaser, B. G. & Strauss, A. L. (1967). The Discovery of Grounded Theory. Chicago: Aldine.
Glaser, B. G. & Strauss, A. L. (1968). Time for Dying. Chicago: Aldine.
Goffman, E. (1959). The Presentation of Self in Everyday Life. Garden City, NY: Doubleday Anchor Books.
Goffman, E. (1961). Asylums. Garden City, NY: Doubleday Anchor Books.
Goffman, E. (1963). Stigma. Englewood Cliffs, NJ: Prentice-Hall.
Goulding, C. (2002). Grounded Theory: A Practical Guide for Management, Business, and Market Researchers. London: Sage.
Henwood, K. & Pidgeon, N. (2003). Grounded theory in psychological research. In P. M. Camic, J. E. Rhodes, & L. Yardley (Eds.), Qualitative Research in Psychology: Expanding Perspectives in Methodology and Design (pp. 131–155). Washington, DC: American Psychological Association.
Hildebrand, Bruno. (2000/2004). Anselm Strauss. In U. Flick, E. Von Kardorff, & I. Steinke (Eds.), A Companion to Qualitative Research (pp. 17–23). London: Sage.
Hughes, E. C. (1958). Men and Their Work. Glencoe, IL: Free Press.
Junker, B. H. (1960). Field work: An introduction to the social sciences. Chicago: University of Chicago Press.
Kahn, R. L. & Cannell, C. F. (1957). The Dynamics of Interviewing. New York: Wiley.
Kelle, U. (2004). Computer assisted qualitative data analysis. In Clive Seale, Giampietro Gobo, Jaber F. Gubrium, & David Silverman (Eds.), Qualitative Research Practice (pp. 479–483). London: Sage.
Kelle, U. (2005, May). ‘Emergence’ vs. ‘forcing’: A crucial problem of ‘grounded theory’ Reconsidered [52 paragraphs]. Forum Qualitative Sozialforschung/ Forum Qualitative Sociology [On-line journal] 6,2 Art. 27. Available at http/www.qualitative-research.net/fqs.texte-2-05/05-2-27-e.htm [Accessed: 05-30-2005].
LaRossa, R. (2005). Grounded theory methods and qualitative family research. Journal of Marriage and Family 67 (November):837–857.
Layder, D. (1998). Sociological practice: Linking theory and social research. London: Sage.
Lazarsfeld, P. & Rosenberg, M. (Eds.). (1955). The Language of Social Research: A Reader in the Methodology of Social Research. Glencoe, IL: Free Press.
Locke, K. (2001). Grounded Theory in Management Research. Thousand Oaks, CA: Sage.
Lofland, Lyn H. (1980). Reminiscences of classic Chicago. Urban Life, 9:251–281.
Lonkila, M. (1995). Grounded theory as an emerging paradigm for computer-assisted qualitative data analysis. In Kelle, U. (Ed.), Computer-aided Qualitative Data Analysis: Theory, Methods and Practice (pp. 41–51). London: Sage.
Maines, David R. (2001). The Faultline of Consciousness: A View of Interactionism in Sociology. New York: Aldine de Gruyter.
May, K. (1996). Diffusion, dilution or distillation? The case of grounded theory method. Qualitative Health Research, 6(3):309–311.
Mead, G. H. (1932). Philosophy of the present. LaSalle, IL: Open Court Press.
Melia, K. M. (1987). Learning and Working: The Occupational Socialization of Nurses. London: Tavistock.
Melia, K. M. (1996). Rediscovering Glaser. Qualitative Health Research, 6(3):368–378.
Meyer, G. A. (2002). The art of watching out: Vigilance in women who have migraine headaches. Qualitative Health Research, 12(9):1220–1234.
Miller, D. E. (2000). Mathematical dimensions of qualitative research. Symbolic Interaction, 23:399–402.
Mills, J., Bonner, A. & Francis, K. (2006). The development of constructivist grounded theory. International Journal of Qualitative Methods, 5(1):1–10.
Morse, J. M. (1995). The significance of saturation. Qualitative Health Research, 5:147–149.
Peirce, C. S. (1938/1958). Collected Papers. Cambridge: Harvard University Press.
Pidgeon, N. F. & Henwood, K. L. (1995). Grounded theory: Practical implementation. In J. T. E. Richardson
(Ed.), Handbook of Qualitative Research Methods for Psychology and the Social Sciences (pp. 86–101), Leicester: British Psychological Society Books.
Pidgeon, N. F. & Henwood, K. L. (2004). Grounded theory. In M. Hardy & A. Bryman (Eds.), Handbook of Data Analysis (pp. 625–648). London: Sage.
Platt, J. (1996). A History of Sociological Research Methods in America, 1920–1960. New York: Cambridge University Press.
Reichert, J. (2000/2004). Abduction, deduction and induction in qualitative research. In U. Flick, E. Von Kardorff, & I. Steinke (Eds.), A Companion to Qualitative Research (pp. 159–164). London: Sage.
Rennie, D., Phillips, J. R., & Quartaro, G. K. (1988). Grounded theory: A promising approach to conceptualisation in Psychology. Canadian Psychology, 29(2):139–150.
Richardson, L. (1993). Interrupting discursive spaces: Consequences for the sociological self. In N. K. Denzin (Ed.), Studies in Symbolic Interaction, Vol. 14 (pp. 77–83). Greenwich, CT: JAI Press.
Rock, P. (1979). The Making of Symbolic Interactionism. London: Macmillan.
Rosenthal, G. (2004). Biographical research. In C. Seale, G. Gobo, J. F. Gubrium, & D. Silverman (Eds.), Qualitative Research Practice (pp. 48–64). London: Sage.
Schreiber, R. S. & Stern, P. N. (Eds.) (2001). Using Grounded Theory in Nursing. New York: Springer Publication Company.
Seale, C. (1999). The Quality of Qualitative Research. London: Sage.
Star, S. L. (1989). Regions of the Mind: Brain Research and the Quest for Scientific Certainty. Stanford, CA: Stanford University Press.
Stern, P. N. (1980). Grounded theory methodology: its uses and processes. Image, 12:20–23.
Stern, P. N. (1994). Eroding grounded theory. In J. Morse (Ed.), Critical issues in qualitative research methods (pp. 212–223). Thousand Oaks, CA: Sage.
Strauss, A. (1987). Qualitative Analysis for Social Scientists. New York: Cambridge University Press.
Strauss, A. (1993). Continual Permutations of Action. New York: Aldine de Gruyter.
Strauss, A. L. (1959/1969). Mirrors and Masks. Mill Valley, CA: The Sociology Press.
Strauss, A. L. (1961). Images of the American city. Chicago: University of Chicago Press.
Strauss, A. & Corbin, J. (1990). Basics of Qualitative Research: Grounded Theory Procedures and Techniques. Newbury Park, CA: Sage.
Strauss, A. & Corbin, J. (1998). Basics of Qualitative Research: Grounded Theory Procedures and Techniques, 2nd edn. Thousand Oaks, CA: Sage.
Urquhart, C. (2003). Re-grounding grounded theory - or reinforcing old prejudices?: A brief response to Bryant. Journal of Information Technology Theory and Application, 4:43–54.
Urquhart, C. (2007 forthcoming). The evolving nature of the grounded theory method: The case of the information systems discipline. In A. Bryant & K. Charmaz (Eds.), Handbook of Grounded Theory. London: Sage.
Wilson, H. S. & Hutchinson, S. A. (1996). Methodologic mistakes in grounded theory. Nursing Research, 45(2):122–124.
Wuest, J. (1995). Feminist grounded theory: An exploration of the congruency and tensions between two traditions in knowledge discovery. Qualitative Health Research, 5(1):125–137.
Wuest, J. (2001). Precarious ordering: Toward a formal theory of women’s caring. Health Care for Women International, 22(1–2):167–178.
28
Documents and Action
Lindsay Prior
’Tis writ, ‘In the beginning was the Word’.
I pause, to wonder what is here inferred. …
The spirit comes to guide me in my need,
I write, ‘In the beginning was the Deed’.
Goethe, Faust, Part One.
The dynamic connection between words, writing, and action that is highlighted in the extract from Goethe’s Faust constitutes the central theme of this chapter. Oddly, it is a theme that is rarely taken up with issues relating to social research, despite the fact that writing plays such a large part in everyday culture. Indeed, in our age and our world, writing is more often than not seen as being somewhat divorced from action – as something static, immutable and isolated from human deed – lodged as it is in books, libraries and archives. Yet the plain fact is that writing is itself a form of action and can even serve to structure significant features of interaction.
Writing is not of course co-terminous with documentation; rather it is contained within documentation (along with numerous other human creations such as maps, architectural plans, film, photographs and electronic web pages). However, in this chapter I will not be overly concerned with drawing distinctions between writing, text, records and documentation, but will merely refer to documents in a generic sense – that is, as readable matter.
As someone who has called upon and extensively used documents in social research, it seems to me that they always enter into social affairs in two distinct modes: (a) as receptacles of content; and (b) as agents in networks of action. In what follows I intend to illustrate by the use of examples how a researcher might relate to these two modes. My examples are drawn mostly from my own work and therefore concern matters affecting health, illness and medicine – the areas in which I do my research. However, the discerning reader should not be misled by the specificity of the examples, and should be able to see how an investigator in other fields of inquiry might extend the strategies discussed herein to their own areas of interest.
As far as the social sciences are concerned, most of the research that uses or calls upon documents focuses mainly on the collection and analysis of document content – and that is where our own starting point is to be found. Indeed, a focus on documents as containers for content is well established in the social sciences. Documents in this frame can be approached as sources of
information, and the writing and images that they contain scoured for appropriate data. Thus, letters, texts, photographs, adverts, biographies and autobiographies, as well as documents containing statistical data, are typically regarded as a resource for the social science researcher – see, for example, Plummer (2001) and Scott (1990, 2006). Usually, various kinds of content analysis are adopted for such approaches – see Bryman (2004), Krippendorf (2004) and May (2001). Content analysis can also blend into discourse analysis – a form of analysis that examines how objects and relations between objects are represented and structured by means of text and talk (Wood, 2000).

On occasion, these relatively static forms of analysis can be extended so as to study documents as ‘topic’, rather than resource – in which case the focus is, in part, on the ways in which any given document came to assume its actual content and structure. This latter approach is akin to what Foucault (1972) might have called the ‘archaeology of documentation’ – looking, for example, at the first points at which certain objects in the world are mentioned and come into being via documentation, or revealing the ways in which systems of classification of things in the world – birds, flowers, viruses and the like – change at specific points in time. Some implications of this style of research will also be examined in the following section.

Approaching documents as topic rather than resource can, however, open up a further dimension of analysis. It concerns an examination of the ways in which documents are used in social interaction and how they function. Indeed, in this vein it is evident that during recent decades new approaches to the study of documents have emerged. In the field of sociology these new visions may be seen to relate, in part, to developments in actor-network theory or ANT (Law and Hassard, 1999). In history and the history of science they relate to the newly emergent ‘geographies of knowledge’ (Livingstone, 2005). In all cases the key theme involves a consideration of documents as objects and actors in a web of activity. This kind of strategy will be discussed in the section entitled ‘Studying documents in action’.

Examining the role of documents in a network generates questions about what documents ‘do’, rather than what they ‘say’ – though in the messy way of the world such distinctions hold only at a conceptual rather than an empirical level. Yet, by focusing on ‘doing’ we come to see that documents not only enter into human affairs as actors, but can also structure such affairs – often in fine detail. Consequently, in the section entitled ‘Documents in interaction’ I will concentrate on word and deed – showing how documents can influence episodes of human interaction and thereby enter into the research frame as active agents and something other than mere containers of content.

STUDYING CONTENT

Given that documents are normally viewed as little more than containers of content, the study of the material lodged within documents usually takes pride of place in relevant social scientific research strategies. Thus, letters, diaries, wills, biographies, newspaper stories, or whatever, can be scrutinised for their rhetoric, their syntax or even just for ‘themes’. In this respect, Glaser and Strauss (1967: 163) argued that, in matters of sociological research, documents ought to be regarded as akin ‘to an anthropologist’s informant or a sociologist’s interviewee’.

Naturally, the use of documents as ‘informants’ stretches much further back into the social sciences than the 1960s. For example, in one of the earliest sociological studies of the twentieth century Thomas and Znaniecki (1958; orig. 1918) collected together and analysed letters written by Polish immigrants to the USA. The use of immigrant letters as a source of social scientific data was probably not original – even in 1918 when the first volume of the ‘Polish Peasant’ was published – but it was, nevertheless, insightful. W. I. Thomas, in particular, was concerned with individual attitudes – towards possessions, the family, social relationships
and the like. The immigrant letter in this respect was seen to function as a repository of attitudes. For instance, the very fact that such letters were written at all indicated that Polish immigrants were ready to invest a considerable amount of time and effort in maintaining family links across two continents. On the other hand, the actual content of the letters suggested to Thomas that in many key respects social solidarity was breaking down in the Polish community. Thus, the letters were said to reveal a considerable degree of conflict about such matters as marriage partners and other family relationships. As with many researchers, Thomas and Znaniecki can be accused of finding in the data only what they wished to see – a common failing in analyses of content – and it is clear that the theme of ‘social disorganisation’ was already firmly implanted in the sociology of W. I. Thomas well before he had looked at any letters. It is not surprising, therefore, that social disorganisation in the American urban Polish community is what Thomas saw the letters to reveal, but the Polish Peasant nevertheless gave a spur to the use of such documents in the study of contemporary culture and history. In sociology and anthropology during subsequent decades there were a sizeable number of studies that used diaries, letters, biographies and autobiographies as life histories and as important sources of social scientific data (Angrosino, 1989). Plummer (2001) provides an excellent overview of the field and indicates how the use and study of such materials came to be associated with distinct methods of social scientific inquiry (as is the case with ‘biographical’ methods, for example).

Scouring newspapers and other documents for supportive stories or evidence is one way of approaching document content, but a more systematic approach would require both an appreciation of the ‘population’ of documents that may be available for sampling (Hill, 1993), and of the entire content of the documents selected – looking at the segments that fail to fit hypotheses and theories as well as those that support hypotheses and theories. In that respect Glaser and Strauss (1967) were probably among the first to suggest a rigorous approach to the study of documentation as ‘informant’. Insofar as rigour applies to content analysis – whether it is from a newspaper story, a life history, a police report on a crime scene or a social work report on a person with multiple problems – such analysis can take any one of a number of routes. In my own case, I usually like to begin by identifying all of the words used in a document as well as the number of times that any given word is used. (This can be achieved through the use of simple concordance programmes that are freely available on the WWW.) By implication, content analysis necessitates both enumeration and understanding of the various words lodged within a text. For example, in Table 28.1, I have provided an indication of the number of times that particular words appeared in a patient support group leaflet for people who suffer from chronic fatigue syndrome – CFS (also known in the UK as ‘M.E.’ and in the USA as CFIDS). Given the name of the condition, the appearance in the document of ‘fatigue’ and ‘chronic’ over 50 times apiece is not perhaps surprising. However, it is interesting to note that viruses seem to be associated with whatever is going on in the document (23 citations), as well as an entity referred to as fibromyalgia (18 citations), depression (14), genes (4)

Table 28.1 Occurrence of selected words in a 2315-word patient-support group leaflet on Chronic Fatigue Syndrome

Word                                  Occurrences
Fatigue                               55
Chronic                               51
Illness                               50
Syndrome                              46
Research                              29
Virus/Viral/Virology                  23
Disease                               19
Fibromyalgia                          18
Depression                            14
Immune/Immune-related/Immunology      9
Genetic                               4
Psychology/Psychological              4
Neurology/Neurological                4
Psychoneuroimmunology                 2
Psychiatric/Psychiatrists             2
Mental                                1
Mind                                  1

Source: Prior, 2003.
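The enumeration that underlies a table such as Table 28.1 can be sketched in a few lines of Python. This is a minimal illustration only, and not the concordance programme used for the original analysis: the sample text, the function names `word_counts` and `grouped_count`, and the decision to group word variants under a shared stem are all assumptions made for the example.

```python
import re
from collections import Counter


def word_counts(text):
    """Count every word in a text, ignoring case and punctuation."""
    words = re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())
    return Counter(words)


def grouped_count(counts, stem):
    """Total occurrences of all words that start with a given stem,
    e.g. 'vir' covers virus, viral and virology."""
    return sum(n for word, n in counts.items() if word.startswith(stem))


# Hypothetical sample text, NOT the leaflet analysed in Table 28.1.
sample = (
    "Chronic fatigue is a real illness. Fatigue may follow a viral "
    "infection, and research on the virus continues."
)

counts = word_counts(sample)
print(counts["fatigue"])             # 2
print(grouped_count(counts, "vir"))  # 2 (viral + virus)
```

Grouping by stem is a blunt instrument (the stem `vir` would also catch an unrelated word such as ‘virtue’), so any grouped total still has to be checked against the text itself, which is one practical sense in which enumeration depends on understanding.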
and something called psychoneuroimmunology (2). The simple presence of these words is worthy of note and for someone who knows the arguments and debates associated with the diagnosis and treatment of CFS they are all highly significant. In general, however, rather than a focus on individual words, it is usually more important for the researcher to grasp (a) how the words relate to each other and (b) what is being implied by their use. Let us consider a brief example, by moving up a level and looking at sentences and phrases rather than just words. Here is an extract from the aforementioned WWW document.

    ‘Is CFS genetic?’
    The cause of the illness is not yet known. Current theories are looking at the possibilities of neuroendocrine dysfunction, viruses, environmental toxins, genetic predisposition, or a combination of these. For a time it was thought that Epstein-Barr virus (EBV), the cause of mononucleosis, might cause CFS but recent research has discounted this idea. The illness seems to prompt a chronic immune reaction in the body, however it is not clear that this is in response to any actual infection – this may only be a dysfunction of the immune system itself.

A number of things are evident from the passage – such as the cause of the illness being unknown; the possibility of the illness being caused by toxins, viruses, or endocrine disorder; and the fact that the illness might be ‘genetic’, or caused by immune dysfunction. Indeed, the suggestion is that whatever the cause might be, it is likely to be a physiological (possibly neurological) rather than, say, a psychological cause. Indeed, later on in the document we get the following statement:

    Emerging illnesses such as CFS typically go through a period of many years before they are accepted by the medical community, and during that interim time patients who have these new, unproven illnesses are all too often dismissed as being "psychiatric cases". This has been the experience with CFS as well.

So it is also clear that somebody somewhere has argued that CFS might be related in some way to psychological or psychiatric conditions – but the author of this document rejects such a claim because that would be to suggest that CFS is being ‘dismissed’ or not ‘accepted’ as a real illness simply because it is ‘unproven’. In fact, were I to produce the document in full it would be reasonably easy to see that throughout the text there is a tension between the claims of the writer – who asserts variously that CFS is a ‘real’ and essentially ‘physical disease’ – and some unknown others who have claimed that CFS is related to depression, anxiety and other psychological problems. (Similar tensions are evident in debates concerning the nature of fibromyalgia – also cited above.) By examining such tensions in the chosen text, the analyst is drawn into an examination of a rhetoric of illness – concerning the ways in which a disorder of unknown cause is represented and understood by different parties. It is at that point, however, that content analysis tends to drift into discourse analysis.

Unlike content analysis, discourse analysis is an awkward concept to capture. It is essentially concerned with the ways in which things and our knowledge of things are structured and represented through text and talk. For instance, there is a considerable tradition within social studies of science and technology for examining the role of scientific rhetoric in structuring our notions of ‘nature’ and the place of human beings within nature. The role and structure of scientific rhetoric in text has, for example, figured in the work of Bazerman (1988), Gross (1996), Latour and Woolgar (1979), Myers (1990) and Woolgar (1988); and even been extended beyond text and into the realm of visual representations (Lynch and Woolgar, 1990) and everyday talk (Gilbert and Mulkay, 1984). And in this vein there have been numerous studies examining how the objects of science, medicine and technology have been, and are, structured through discourse. One particularly interesting set of studies has been those that have concentrated attention on the concept of the ‘gene’ and the human genome. For example, Lily Kay (2000) analysed the role of metaphors of the gene and genetics in genetic science between the 1950s and the twenty-first century – indicating how
the image of DNA as a code or text of instructions (recipe) or plan (blueprint) emerged only gradually during the second half of the twentieth century. Thus, she points out how, in the famous April 1953 Nature paper by Crick and Watson on DNA, the authors referred only to the structure of DNA – and she then investigates how the idea of using concepts of grammar and semantics to describe genetic processes emerged during the 1960s – particularly relating to work on ‘messenger’ RNA. Indeed, the first ‘word’ of the genetic code (the UUU of RNA) was not identified until 1961. Kay subsequently argues that the Nobel prize-winning work of Nirenberg and Matthaei (who discovered the first word) would simply not have been possible without calling upon and utilising metaphors of communication and information science such as we have referred to above. Other writers have chosen to focus on genetic discourse in everyday culture (as reflected through news stories and the like) with equally interesting results. Thus Nelkin (2001), for instance, has noted how, in popular culture, DNA is not simply regarded as a ‘code’ – carrying and expressing information – but that it is also endowed with executive action. In short, DNA is represented through text as something that ‘makes things’ (humans, cancers, and so forth), in a deterministic system.

In the following extract I present some of my own data (derived from talk between a doctor and a client of a cancer genetics service) to illustrate some possibilities of this kind of approach. Even though the data are derived from talk (rather than text per se), they serve to illustrate how analysis of a discourse can reveal detail about the ways in which, in any given culture, the world and the objects within it are represented and structured.

200 Doctor: And the genes are broken up into sections and so a gene that
201 controls a protein function in a body is not just one long coding
202 instruction it is in fact broken up into sections that then get joined
203 together. And those sections you can think of them as being volumes of
204 an encyclopaedia. Basically between the two genes there are effectively
205 50 volumes. And it takes our laboratory a week to check each one,
206 which you can then work out quite quickly that that is effectively a year
207 to check every single one. That is just the practicality of the time scale.
208 The other problem though, if you are dealing with something as big as
209 something like an encyclopaedia and you are looking for a mistake and
210 effectively what you dealing with is just a code, a series of letters, then
211 you are looking for something like a missing paragraph or sometimes just
212 a missing word, or sometimes just a missing letter. And right down to
213 just a change on one letter can be all that is needed to have disastrous
214 effects.
215 Patient: Yeah.

A number of issues deserve attention here. The first is the extensive use of metaphor in this exchange. In particular, genes are referred to as ‘coding instructions’ (lines 201–02, 210), ‘volumes of an encyclopaedia’ (203–04), a ‘series of letters’ (210), and words and/or paragraphs (211–12). And in accord with such rhetorical forms, mutations are referred to as ‘missing’ words, letters or paragraphs, as ‘mistakes’ possibly brought about by a ‘change in just one letter’ (213). The second issue of interest is in what may be called the actional components of the sentences that link genes to human physiology. Of particular significance is the way in which genes are said to ‘control’ protein functions (line 201), and genetic re-arrangements of DNA sequences (letters) are argued to be capable of having ‘disastrous effects’ (lines 213–14) on the human body. Such attention to the ways in which the use of tropes (such as metaphor) and syntax operate in text leads us to consider how ‘things’ and events in the world are structured through discourse.

It could be said that with both content and discourse analysis, researchers are essentially seeking to use documentation as ‘resource’ – that is, as a source of data for social scientific theorising (of varying degrees of complexity).

It is, however, possible to approach document content as ‘topic’. The very useful distinction between resource and topic was
first introduced by Zimmerman and Pollner (1971), and picking up on this distinction can encourage us to ask a different set of questions about documentation. So instead of focusing merely on what documents contain we can begin to ask how the documentation that we elect to examine came to assume the form that it did. This line of inquiry can be especially useful in the examination of the ways in which people ‘sort things out’ (Bowker and Star, 1999). For instance, it is often instructive in matters of social research to ask how things come to be classified in a particular way (and not other ways) and what rules are to be used to allocate objects to one realm rather than another. Thus we might, for example, ask questions concerning the ‘causes’ of death, disease and illness – such as what can one die of? The answer to that question is invariably constrained by the content of a World Health Organization (WHO) manual – namely, The International Classification of Diseases and Related Health Problems (WHO, 1992). It is often referred to in an abbreviated form as the ICD. The current edition of the manual is the tenth, and so the abbreviation is, more accurately, ICD-10. ICD-10 provides a list of all currently accepted causes of death, and they are classified into ‘chapters’. Thus, there are chapters relating to diseases and disorders of the respiratory system, the circulatory system, the nervous system and so on. In different decades different diseases and causes of death are added to and deleted from the manual. HIV/AIDS is an obvious case of an addition and it appears as a cause of death only in ICD-10, whilst ‘old age’ as a cause of death was eliminated in ICD-6. Such taxonomies reflect aspects of human culture and researching the ‘archaeology’ of such documents can be instructive in itself. A related publication – The Diagnostic and Statistical Manual of Mental Disorders (American Psychiatric Association, 2000) or DSM – is available for the classification of psychiatric (mental) conditions. One might say that the DSM provides the conceptual architecture in terms of which western culture comprehends disorders of the mind. Post-Traumatic Stress Disorder is, for instance, recognised as a disorder only from DSM-III (first published in 1980), whilst multiple personality disorder (MPD) has undergone a few transformations and is no longer listed in the 4th-revised edition of the DSM. The inclusion and deletion of such diagnostic categories can be used as key indicators of not merely how professional and technical discourse might have altered, but also how political, legal and socio-economic processes impinge on the affairs of science and medicine (for a detailed example of the relationships between a form of scientific classification and styles of professional practice see Keating and Cambrosio, 2000).

The manufacture and standardisation of taxonomies – as well as the deployment of rules for allocating ‘cases’ to appropriate categories – is important for various reasons, but not least because they are indispensable to generating images of the world. For example, the way in which events relating to crime, the economy, illness and disease or education are classified and counted is fundamental to our understanding of long-term trends and our image of contemporary happenings. And as numerous analysts of official statistical accounts of the world have demonstrated (see, for example, May, 2001; Prior, 2003), for any given society we can have as much or as little illness, crime, ‘success’ and ‘failure’ as we want – depending on how, exactly, we sort things out.

Unfortunately, once we are engaged with the routine messiness of the empirical world many of these distinctions between content and discourse, topic and resource are difficult to hold to. For documents, as with most phenomena, are fluid, messy and somewhat slippery objects for analysis. More importantly, and as I shall demonstrate in the next two sections, documents often appear as active agents in a universe of deeds.

STUDYING DOCUMENTS IN ACTION

A focus on documents in action tends to encourage a focus on how documents are used
(function) and how they are exchanged and circulate in various communities. Naturally, documents carry content – words, images, plans, ideas, patterns and so forth – but the ways in which such content is actually called upon and how it functions cannot be determined (though it may be constrained) by an analysis of its content. Indeed, once a text or document is sent out into the world there is simply no predicting how it is going to circulate and how it is going to function in specific social and cultural contexts. For this reason alone, a study of what the author(s) of a given document (text) ‘meant’ or intended can only ever add up to a limited examination of what a document ‘is’. Indeed, as the literary theorist De Certeau (1984: 170) has argued, ‘Whether it is a question of newspapers or Proust, the text has a meaning only through its readers; it changes along with them; it is ordered in accordance with codes of perception that it does not control’. In this regard an interest in the reception and reading of text has formed the focus for recent histories of knowledge that seek to examine how the ‘same’ documents have been received and absorbed quite differently into different cultural and geographical contexts (see, for example, Burke, 2000; Livingstone, 2005).

One possible starting point for inquiries into the dynamics of documentation rests in Latour’s notion of an ‘immutable mobile’ (1987). An immutable mobile is something that can move around, whilst – at the same time – holding its essential shape. Thus a book, or set of instructions, or a recipe, or map, can hold its shape in the ordinary everyday sense of such words, and it can also hold its shape in a relational manner. That is to say, a book has shape in (three-dimensional) space, but it also has shape as a member of a specific type of literature (say a science text, or a work of fiction, or work of science-fiction, or philosophy, or poetry or history of art). Yet for the book to retain its shape in this relational sense, a dynamic network of actors is needed. Such a network might include, for instance, authors and literary critics to identify the book as a work of science-fiction, book catalogues to classify the work, libraries in which to hold the book, librarians to identify the literary genre of the book, readers to search out the book as science-fiction and so forth. It is in such a way that we can begin to see the book as an object within a network. More importantly, however, it is likely that our mysterious book (or text) will not simply be at the mercy of the various ‘actors’ in such a network but will also become an actor itself.

Perhaps the clearest image of a document as an actor arises in the case of a legally constituted ‘last will and testament’, which, on the occasion of its final ‘reading’, acts. Or consider the role of various books of the Bible in the history of social and religious controversy – which have also served as actors (as sources of authority, as witness to evidence and so forth). And as with human actors, documents as actors can be recruited, suppressed, enrolled into the service of various interest groups – some examples of which are referred to in Prior (2003). Unfortunately, one of the problems with the concept of immutable mobiles is its emphasis on stasis. For as the objects in a network move they often become mutable and metamorphose into new objects.

A consideration of objects in a network is usually associated with a somewhat amorphous group of writers who favour what is called actor-network-theory or ANT (see Law and Hassard, 1999). ANT is of concern to us insofar as it opens a new dimension for social research – analysing how documents are positioned in actor-networks and also how they function (act) in such networks. (In terms of ANT, non-human agents are commonly referred to as actants rather than as actors.) From our point of view, the key research questions revolve around the ways in which documents are integrated into networks and how they influence the development of the network. This kind of focus has, in some cases, led to developments in research software to explore the relational aspects of humans and documentation. In what follows I shall outline a few examples. I shall concentrate first on WWW pages as documents and sketch out how they can be approached in a variety of social scientific frameworks.
In the first instance, of course, it is clear that WWW pages can be scoured for their content alone – that is, used and interrogated as informant. For example, in a study of anti-vaccination web sites, Wolfe et al. (2002) identified 22 such WWW sites and noted that in all cases the documentation asserted that vaccines caused idiopathic illness, in 95 percent of cases that vaccines erode immunity, and in 91 percent of cases that vaccination policy was driven by profit motives rather than concerns about health. These and other details concerning document content were acquired by the use of relatively simple coding techniques. The authors also noted that anti-vaccination sites used specific tactics for transmitting their messages. Thus, one favoured strategy involved the use of personal stories – often from parents who served as witnesses to the fact that vaccination caused severe illness in their children. Analysis of story structure would, of course, inveigle us into a specific style of discourse analysis – in this case perhaps one that focused on narrative rather than on rhetoric. However, there remains a further strategy for the examination of anti-vaccination sites and it involves looking at the networks that emerge out of the relations between such sites.

The possibility for examining relations between web sites is, of course, built into web sites ordinarily, for web sites contain hyperlinks (to other web pages), and by concentrating on the outlinks of the web pages it becomes possible to study how internet documents relate one to another. In recent years the task of tracing the links between such sites has been facilitated by the use of web crawlers. However, Richard Rogers, who has designed one such crawler (www.govcom.org), refers to issue networks and issue spaces rather than WWW networks (see Marres and Rogers, 2005). An issue network is a network of pages that acknowledge each other by way of hyperlinks. I have provided a simple example of such a network in Figure 28.1. The figure traces links between web pages of organisations who work with people with HIV/AIDS in Uganda. The starting point for the web crawl necessitated the identification of WWW addresses for two Ugandan non-governmental organisations (NGOs) working with people with HIV/AIDS. The results of this initial crawl indicate a number of features. I have highlighted only a few of these in Figure 28.1. They concern the centrality of international organisations such as the UN, Unicef and the World Health Organisation in the document network. Surrounding those organisations are the pages of various Ugandan government organisations (such as health.go.ug), and on the periphery are the local NGOs, whilst at the very edge is the page for the Ugandan parliament.

The links between such documentation may be considered as data in themselves – and they certainly point to factors such as position (degrees of centrality, for example), density of contact, directions of contact and so forth. The links could also be considered as a map for exploring the relationships between local NGOs, international organisations and the Ugandan government. Naturally, the exploration of such links would need to be supplemented by the use of other methods and techniques (such as interview techniques or a range of ethnographic techniques); nevertheless, the web map provides both a starting point and a ground on which hypotheses might be generated concerning notions of, say, ‘partnership’ in the field of HIV/AIDS in Africa. There is, however, a feature of social activity that is only touched upon – rather than confronted – by the use of a web crawler. It involves the fact that actor-networks contain human as well as non-human actors.

By tradition, a focus on relationships between people in a network has been associated with social network analysis. Such analysis concentrates on the number of links between specific individuals, the degree to which an individual is central or peripheral to a given network, the density of interactional or contact nodes and so forth (see Scott, 1999). However, as actor-network theorists emphasise, social networks cannot be reduced to relations between humans. Consequently, what is usually needed is an analysis of
[Figure 28.1: network diagram of hyperlinked web sites, with the labels ‘Local NGO’s’, ‘International Organisations’ and ‘Ugandan Parliament’. Legible nodes include amref.org, unaids.org, who.int, unicef.org, cdc.gov, undp.org, worldbank.org, parliament.go.ug, finance.go.ug and ubos.org.]
Figure 28.1 WWW links between organisations in Uganda concerned with HIV/AIDS (generated using Issuecrawler.net)
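The idea of tracing outlinks to see how internet documents relate to one another can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is not how Issuecrawler works, the pages and host names are invented, and the helper names (`outlinked_sites`, `link_network`, `mutual_links`) are hypothetical. It reads the hyperlinks in each page, records which other sites a page points to, and then picks out pairs of sites that link to each other.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class OutlinkParser(HTMLParser):
    """Collect the href of every <a> tag in one HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def outlinked_sites(page_url, html):
    """Return the set of external host names that a page links to."""
    parser = OutlinkParser()
    parser.feed(html)
    own_host = urlparse(page_url).netloc
    hosts = set()
    for href in parser.hrefs:
        # Resolve relative links against the page's own address.
        host = urlparse(urljoin(page_url, href)).netloc
        if host and host != own_host:
            hosts.add(host)
    return hosts


def link_network(pages):
    """Map each site to the set of other sites it links out to."""
    return {urlparse(url).netloc: outlinked_sites(url, html)
            for url, html in pages.items()}


def mutual_links(network):
    """Pairs of sites that acknowledge each other with hyperlinks."""
    return {tuple(sorted((a, b)))
            for a, targets in network.items()
            for b in targets
            if a in network.get(b, set())}


# Invented pages, for illustration only.
pages = {
    "http://ngo-a.example/":
        '<a href="http://agency-b.example/">B</a> <a href="/about">About</a>',
    "http://agency-b.example/":
        '<a href="http://ngo-a.example/">A</a> '
        '<a href="http://ministry-c.example/x">C</a>',
    "http://ministry-c.example/": "<p>No outlinks</p>",
}
network = link_network(pages)
```

Here `network` records one-way links (site B points to site C without being acknowledged in return), while `mutual_links(network)` keeps only the reciprocal pairs. Which of the two better approximates an issue network in a given study is a judgement for the researcher.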
relationships between humans, organisations, and ‘things’ (such as documents, machines, germs or whatever). For example, Cambrosio et al. (2004) studied the nature of collaborative research networks and innovation in a specific field of biomedicine. The researchers were interested in how the people in the network collaborated, as well as the role of such things as antigen, antibody reagents (contained in bottles) and antibodies in a research network. One component of their investigation concentrated on the relationships between
488 THE SAGE HANDBOOK OF SOCIAL RESEARCH METHODS
research workshops and research laboratories in the development of particular (HLDA) antibodies, and Cambrosio et al. sought to design a network map of the relations that linked the institutions and workshops to the antibodies. The resulting map is reproduced as Figure 28.2. In the context of this figure the points T, M and B represent different research workshops. The outer points represent the laboratories or research centres, and the size of the circles and squares is proportional to the number of antibodies submitted by each laboratory to each workshop. We can see immediately from the map how the relationships fan out, the relative importance of each of the three workshops and which institutions are linked to which antibodies. Antibodies are not documents, of course, but the network map illustrates how documents could be mapped into a scheme of social relations and how it could be the documents that form the focus of attention rather than the human beings. However, such maps require dedicated software that can generate visual traces of actor-networks. In the case discussed the relevant technology was provided by Réseau-Lu (see Mogoutov et al., 2005).
[Figure 28.2: network map. Legend: Workshops; Research Centres; Connections.]
Figure 28.2 Human leucocyte differentiation antigens (HLDA) workshops, research centres and antigens
Source: Cambrosio et al., 2004.
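The logic of a map such as Figure 28.2, together with the simpler measures of social network analysis mentioned earlier (number of links, density), can be illustrated with a small sketch. Everything here is an assumption made for illustration: the submission counts are invented, the function names are hypothetical, and sizing a node by its total number of antibodies is a simplification of the figure, not a description of what Réseau-Lu does.

```python
from collections import defaultdict

# Invented submissions: (laboratory, workshop, number of antibodies).
submissions = [
    ("Lab 1", "T", 4),
    ("Lab 1", "B", 1),
    ("Lab 2", "T", 2),
    ("Lab 3", "M", 3),
    ("Lab 3", "B", 2),
]


def node_sizes(edges):
    """Total antibodies attached to each node (used to scale its marker)."""
    sizes = defaultdict(int)
    for lab, workshop, count in edges:
        sizes[lab] += count
        sizes[workshop] += count
    return dict(sizes)


def degrees(edges):
    """Number of distinct links each node takes part in."""
    neighbours = defaultdict(set)
    for lab, workshop, _ in edges:
        neighbours[lab].add(workshop)
        neighbours[workshop].add(lab)
    return {node: len(linked) for node, linked in neighbours.items()}


def density(edges):
    """Observed lab-workshop links as a share of all possible such links."""
    labs = {lab for lab, _, _ in edges}
    workshops = {workshop for _, workshop, _ in edges}
    links = {(lab, workshop) for lab, workshop, _ in edges}
    return len(links) / (len(labs) * len(workshops))
```

With these invented figures, workshop T would be drawn largest (six antibodies), and five of the nine possible laboratory-workshop links are present. A map of documents rather than antibodies would use the same structure, with documents in place of the second set of nodes.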
CG = Clinical Geneticist
NC = Nurse Counsellor
Cyrillic – a programme that draws pedigrees and calculates risk
and in clear sequences, and finally serve to underline the ways in which the division of labour (between ‘doctors’ and ‘nurses’) is underpinned in both this episode and the clinic at large (lines 23–24).

This second example also raises a number of other important issues that lie beyond the scope of this chapter; namely, how talk is to be transcribed and translated into writing (as has been done in Figure 28.3), and what conventions are to be deployed so as to render active talk into inert text.

CONCLUSIONS

The closing example – as shown in Figure 28.3 – illustrates the multidimensional features of documentation in the social world. Documents have content – words, sentences, phrases – and content can be counted and classified and compared (one document to another). A study of document content can form an excellent starting point for social researchers – illustrating how ‘things’ are described and linked. Social researchers may also be interested in how those same things are represented and structured through language – in which case the researcher is drawn into various forms of discourse analysis. These days of course there are various types of software that can be called upon and used as aids to content and discourse analysis. At the most basic level a researcher can use a simple concordance programme. Such a programme would commonly provide a list and count of words used in a text (together with a facility for locating word
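A very simple concordance of the kind described here can be sketched as follows. This is a minimal illustration, not a description of any particular concordance package: the function names are hypothetical, the tokenisation rule (lower-case letters and apostrophes) is an assumption, and the keyword-in-context listing is one plausible reading of a ‘facility for locating’ words.

```python
import re
from collections import Counter


def tokenize(text):
    """Split a text into lower-case word tokens (a deliberately crude rule)."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text):
    """List and count the words used in a text."""
    return Counter(tokenize(text))


def keyword_in_context(text, keyword, width=3):
    """Locate each use of a word, with `width` words of context either side."""
    tokens = tokenize(text)
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


sample = ("Documents have content. Content can be counted "
          "and classified and compared.")
counts = word_counts(sample)
contexts = keyword_in_context(sample, "content", width=2)
```

`counts.most_common()` gives the word list ordered by frequency, and `contexts` shows each occurrence of the chosen word with its neighbours, which is usually the first step towards classifying how a term is used.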
Keating, P. and Cambrosio, A. (2000) ‘“Real compared to what?” Diagnosing leukemias and lymphomas’, in M. Lock, A. Young and A. Cambrosio (eds.) Living and Working with the New Medical Technologies. Intersections of Inquiry. Cambridge: Cambridge University Press. pp. 103–134.
Krippendorff, K. (2004) Content analysis. An Introduction to its Methodology. 2nd Ed. London: Sage.
Latour, B. (1987) Science in Action. How to Follow Scientists and Engineers Through Society. Milton Keynes: Open University Press.
Latour, B. and Woolgar, S. (1979) Laboratory Life. The Social Construction of Scientific Facts. London: Sage.
Law, J. and Hassard, J. (eds.) (1999) Actor-Network Theory and After. Oxford: Blackwell.
Livingstone, D.N. (2005) Text, talk, and testimony: geographical reflections on scientific habits. An afterword. British Society for the History of Science, 38:1:93–100.
Lynch, M. and Woolgar, S. (eds.) (1990) Representation in Scientific Practice. Cambridge, MA: MIT Press.
Marres, N. and Rogers, R. (2005) Recipe for tracing the fate of issues and their publics on the web, in B. Latour and P. Weibel (eds.) Making Things Public. Atmospheres of Democracy. Cambridge, MA: MIT Press. pp. 922–935.
May, T. (2001) Social Research. Issues, Methods and Process. 3rd Ed. Buckingham: Open University Press.
Mogoutov, A., Cambrosio, A. and Keating, P. (2005) Making collaborative networks visible, in B. Latour and P. Weibel (eds.) Making Things Public. Atmospheres of Democracy. Cambridge, MA: MIT Press. pp. 342–345.
Myers, G. (1990) Writing Biology. Texts in the Construction of Scientific Knowledge. London: University of Wisconsin Press.
Nelkin, D. (2001) Molecular metaphors. The gene in popular discourse. Nature Reviews, 2:555–559.
Plummer, K. (2001) Documents of Life 2. An invitation to critical humanism. London: Sage.
Prior, L. (2003) Using Documents in Social Research. London: Sage.
Psathas, G. (1979) Organizational features of direction maps, in G. Psathas (ed.) Everyday Language. Studies in Ethnomethodology. New York: Irvington Publishers. pp. 203–225.
Scott, J. (1990) A Matter of Record. Documentary Sources in Social Research. Cambridge: Polity Press.
Scott, J. (1999) Social Network Analysis. London: Sage.
Scott, J.P. (ed.) (2006) Documentary Research. 4 Vols. London: Sage.
Thomas, W.I. and Znaniecki, F. (1958) The Polish Peasant in Europe and America. New York: Dover.
Wolfe, R.M., Sharp, L.K. and Lipsky, M.S. (2002) Content and design attributes of anti-vaccination websites. Journal of the American Medical Association, 287:24:3245–3248.
Wood, L.A. (2000) Doing Discourse Analysis. Methods for Studying Action in Talk and Text. London: Sage.
Woolgar, S. (1988) Science: The Very Idea. London: Tavistock.
World Health Organisation. (1992) International Statistical Classification of Diseases and Related Health Problems. 10th Revision. London: HMSO. 3 Vols.
Zimmerman, D.H. and Pollner, M. (1971) The everyday world as a phenomenon, in J.D. Douglas (ed.) Understanding Everyday Life. London: Routledge and Kegan Paul. pp. 80–103.
29
Video and the Analysis of
Work and Interaction
Christian Heath and Paul Luff
recordings of everyday activities and events. The approach draws upon methodological developments within sociology, namely ethnomethodology and conversation analysis. It directs analytic attention towards the social and interactional accomplishment of everyday activities and events. Even though this analytic orientation is only one way in which video is used in social science research, it is an approach that has proved highly productive and is of growing significance within various disciplines including sociology, anthropology and linguistics. It is an approach that has begun to throw a new and distinctive light on a variety of long-standing topics and issues in the social sciences and an approach that provides the analytic resources to address the organisation of social action across a broad and complex range of everyday and institutional environments. In recent years, for example, we have seen the emergence of studies of scientific practice, surveillance, medical consultations, children’s play, museum visits, the household, computer-mediated communication, conversational interaction, political discourse, surgical operations and architectural practice (see for example Engeström and Middleton, 1996; Goodwin, 1981, 1995; Goodwin, 1990; Goodwin and Goodwin, 1994, 1996; Heath, 1986; Heath and Luff, 2000; Knoblauch et al., 2006; LeBaron and Koschmann, 2003; Luff et al., 2000; Mondada, 2003; Streeck and Kallmeyer, 2001; Suchman, 1987; Whalen, 1995; Whalen et al., 2002). In this chapter, we draw on materials from a study of auctions and auction houses to provide some practical guidance on using video recordings to address the social and interactional organisation of naturally occurring events.

WORKPLACE ORGANISATION & SOCIAL INTERACTION

An increasing body of video-based, qualitative research is concerned with work; in particular the social and interactional accomplishment of complex forms of organisational activity. This burgeoning corpus of research is very different from more traditional studies of work and occupational practice. However, in various ways it can be seen to evolve from some of the key methodological and analytic concerns that underpinned the emergence of organisational ethnographies. It is perhaps worthwhile providing a little background and raising one or two points that might give a sense of the potential contribution of video and this particular approach.

Work and workplace organisation have formed a pervasive concern for sociology and more generally the social sciences from their significant beginnings in the late nineteenth century. It has long been recognised that social interaction in the workplace produces and reproduces organisational forms and the various rules, procedures and dispositions that inform the daily transactions that arise between people in organisations. Parsons’ (1951) analysis of the ‘situation of medical practice’ is exemplary in this regard, and though commonly known more for its exposition of the sick role than for the organisational structure of the professional-client consultation, it powerfully demonstrates the ways in which patterned forms of social interaction, governed by expectations and dispositions, underpin medical work. The character of this interaction however, and the practices that enable its concerted and contingent accomplishment, remain largely unexplicated. Indeed, despite the widespread recognition that social interaction forms the foundation to work and occupational practice, there is a long-standing neglect in many forms of organisational analysis of what Goffman (1983) refers to as the ‘interaction order’. In turn, by neglecting the interactional foundations of organisations, we not infrequently find a disregard for the ways in which work is accomplished by participants themselves (Barley, 1996; Barley and Kunda, 2001; Silverman, 1970, 1997a, 1997b).

There are important exceptions. Since their early beginnings many qualitative studies of work and organisation have placed social interaction at the heart of the analytic agenda. For example, in his insightful discussion of the methodological commitments that
VIDEO AND THE ANALYSIS OF WORK AND INTERACTION 495
informed what came to be known as the post-war Chicago school, Everett Hughes suggests that the principal aim of the studies is to ‘discover patterns of interaction’ and that ‘the subject matter of sociology is interaction’ (Hughes, 1971). These methodological commitments, and in particular the recognition that work and occupational performance evolves in, and is sustained through, interaction, gave rise to a rich and insightful body of sociological and in particular ethnographic studies of work and organisation (see for example Becker, 1963; Goffman, 1963; Roth, 1963; Strauss et al., 1964). These studies have had a profound influence on successive generations of workplace ethnography including for example Barley, 1989; Hochschild, 1983; Star, 1996; Strong, 1978; Van Maanen, 1991, and directly and indirectly given rise to parallel developments in cognitive science, anthropology and emerging fields such as Computer Supported Cooperative Work. Despite these methodological commitments, and the richness and insightfulness of these ethnographies, the interaction that arises in, and sustains, organisations, the interaction through which work is accomplished in collaboration with others, can remain under-explored and sometimes unexamined. Indeed, many of the concepts that inform this ethnographic tradition, concepts such as negotiation, bargaining, career, and the like, tend to draw attention away from the details of organisational conduct – the talk, visible and material action through which people, in collaboration with others, produce and coordinate their workplace activities. Moreover, the concepts and methodological precepts that pervade qualitative studies of work and related forms of ethnography, whilst powerfully resonating with field studies and naturalistic observation, do not necessarily lend themselves to the analysis of video and in particular to examining the wealth of detail made available through audio-visual recordings of everyday events.

Over the past few decades however the social and interactional foundations of workplace activities have received sustained sociological attention. Perhaps the most significant contributions in this regard are studies that draw upon ethnomethodology and conversation analysis and form ‘part of a programme of work undertaken … to explore the possibility of achieving a naturalistic observation discipline that could deal with the details of social action(s) rigorously, empirically, and formally’ (Schegloff and Sacks, 1973:233). Building on the analysis of conversation, we have witnessed the emergence of a broad range of studies of talk in institutional settings, primarily based on audio-recordings, that address the organisation of a range of workplace activities including legal interrogation, news interviews, political oratory, diagnosis in medical consultations, the delivery of bad news, counselling and therapy and classroom instruction and teaching (see for example Atkinson, 1984; Atkinson and Drew, 1980; Boden, 1994; Boden and Zimmerman, 1991; Clayman and Heritage, 2002; Drew and Heritage, 1992; Heritage and Maynard, 2006; Maynard, 2003; Peräkylä, 1995; Silverman, 1997a, 1997b; Whalen et al., 1988; Zimmerman, 1992). As Heritage (1984, 1997) points out, the sequential and turn organisation of talk has provided a critical resource for these studies as they explicate the ways in which highly specialised forms of activity embody a re-specification of the interactional practices that inform conversational organisation; a re-specification that enables ‘institutional realities and their unique characteristics to be talked into being’.

Notwithstanding the significant contribution of these studies to our understanding of work and organisation, it is recognised that the interactional accomplishment of social actions and activities involves the interplay of talk and visible conduct such as gesture and bodily comportment. It is recognised that objects and artefacts, tools and technologies, play a critical part in many activities and that the use of material resources is a pervasive and integral feature of almost all human activities, not least those that arise in the workplace. In the last decade or so, audio-visual recordings of
and exchange of goods worth some billions of pounds each year.

To simplify matters we use ‘{B1 bids}’ to represent the bidding, the number giving an indication of the order in which different participants enter the bidding. Where the auctioneer (A) bids on behalf of a buyer who cannot attend the sale – what is known as a ‘commission bid’ – we have used ‘{A bids}’. Commission bids are where the buyer leaves a price with the auction house and the auctioneer bids on their behalf until they reach the maximum price of the commission.

FRAGMENT 1: TRANSCRIPT 1

A: Lot number: (0.2) Four Three
   Three (.) Four Three Three the lot
   number: now. Bidding here at one
   hundred pounds now.
   (.) {A bids}
A: A hundred pounds I’m bid straight
   away for this, at a hundred pounds:,
   (.) One hundred pounds (will do it)
   One hundred one ten (.) now:? (0.3)
   A hundred pounds only. One hundred
   pounds, one hundred pounds. One ten
   now quickly?
   (0.3) {B1 bids, B2 raises hand}
A: One ten is that. One ten I’m bid.
   One ten. One twenty on commission now.
   One thirty now:? One twenty still
   with me, at one twenty.
   {B2 bids}
A: One thirty bid there: fresh bid,
   one thirty, one thirty. Forty now:?
   (0.2)
A: At a hundred an thirty pounds (.)
   bids there at one thirty. Do show
   if you happen to have an extra bid.
   At one thirty over there.
   {knock}
A: One thirty that’s yours sir.
   The buyer number is?

Talk is transcribed using an orthography developed by Gail Jefferson and commonly used in ethnomethodology, conversation analysis and cognate approaches such as discourse analysis. The transcription system is designed to capture aspects of the articulation of the talk and in particular the interactional position and production of the participants’ utterances. Very briefly: talk is laid out turn by turn; the length of pauses or silences is captured in tenths of a second, for example, ‘(0.3)’. Pauses of less than two tenths of a second are represented by ‘(.)’; words or parts of words that are emphasised by the speaker are underlined, ‘is that’. Sounds that are elongated are captured by colons, the number of colons representing the length of the elongation, ‘number:’; and intonation is captured by punctuation marks, for example, for rising intonation: ‘One thirty now:?’. More detailed versions of the orthography can be found in various books and collections including for example Boden and Zimmerman (1991), Drew and Heritage (1992) and Maynard (2003).

Before considering the visible or nonverbal aspects of the participants’ conduct, we can begin to generate some initial observations concerning the talk that arises in the fragment. In the first place, we can see that the talk is primarily produced by one party, namely the auctioneer. He briefly introduces the lot and then repeatedly announces a series of figures. These figures escalate in terms of increments of ten pounds – beginning at one hundred pounds, with the goods finally being sold at one hundred and thirty pounds. Bidding appears to alternate between the auctioneer, bidding on behalf of a commission buyer (‘bidding here at one hundred’ and ‘one twenty on commission’), and buyers in the room (B1 bids ‘one ten’, B2 bids ‘one thirty bid there:’). In the first instance, the auctioneer appears to take a bid from B1 rather than B2 who also attempts to bid by raising his hand. The auctioneer not only takes bids from particular participants, but displays those bids to all who are present, for example announcing that the bid is ‘here’ at one hundred pounds, ‘there’ at one hundred and thirty, and ‘still with me’ at one twenty. It also appears that the auctioneer goes to some trouble to elicit bids from people in the audience before finally selling the goods, attempting to maximise the opportunities for anyone present to bid.

Whilst the auctioneer does most, if not all, of the speaking during the sale of the lot, the transcript begins to reveal the ways in which
sequences of action are critical to the structure of the activity. For example, the auctioneer’s repetition of a particular increment, such as one hundred pounds, involves an attempt to elicit a bid from a member of the audience. Once the bid is received, in this case by a participant raising his hand, it is acknowledged by the auctioneer with ‘one ten is that’. In turn, the auctioneer produces the next bid, on behalf of his commission buyer, ‘one twenty on commission now’ and invites a subsequent bid from the floor, ‘one thirty now:?’. The participant’s bid, indeed the attempt by both B1 and B2 to bid, is sensitive to the auctioneer’s invitation, ‘one ten now quickly?’, and in turn, the auctioneer accepts a bid from B1 and is able to announce the next bid, namely ‘one twenty’. In turn, the announcement of the commission bid at ‘one twenty still with me’ is followed by the auctioneer looking for a next bid at one hundred and thirty pounds. We can see therefore how particular actions of the auctioneer serve to elicit bids from members of the audience, just as those bids enable the auctioneer to announce the price and produce a subsequent bid. Each action is sensitive to the prior, indeed, may be elicited by the prior action, and in each case forms the basis for subsequent action and activity. These actions are organised with regard to distinct forms of sequential and interactional organisation that underpin the escalation of price. Where no further bids are forthcoming, the auctioneer is able to bring the sale to a successful completion with the fall of the hammer.

Transcribing talk provides the opportunity to become more familiar with the actions that arise within a particular activity and to begin to scrutinise not only what is said and how, but the location of particular utterances or actions and how they are produced with regard to the contributions of others. It enables the researcher to address why specific actions arise at particular moments within the emerging course of the activity. As Schegloff and Sacks suggest:

    a pervasively relevant issue (for participants) about utterances in conversation is ‘why that now,’ a question whose […] analysis may also be relevant to find what ‘that’ is. That is to say, some utterances may derive their character as actions entirely from placement considerations. (1974)

For instance, whilst the auctioneer repeatedly reiterates the first bid, one hundred pounds, it is only when he announces the next increment with a rising intonation that participants attempt to bid, in this case two at the same time. Transcription also begins to reveal the complexity of the action that arises even within a very brief fragment such as this, and provides the resources to begin to draw some preliminary observations concerning the structure and arrangement of the actions. In this case, the transcript also points to some more general features of interaction, be it within the workplace or any other environment for that matter – how the event contingently emerges, moment by moment, and the ways in which each contribution is sensitive to the actions of others, or the withholding of particular actions, and oriented to a determinate range of possibilities.

THE VISIBLE AND THE MATERIAL

It is clear that a range of actions that arise within the sale of the lot are not available through inspection of the talk alone and that the talk is accompanied by, and sensitive to, various visible and material actions. For example, at least two people bid using nonverbal or visible actions and these bids are critical to the escalation of the price and the final sale of the goods. How these actions arise with regard to the visible and accompanying talk of the auctioneer is not available using this limited transcript. Moreover, these gestured turns or bids are attributed by the auctioneer to particular individuals in the room, or even an absentee buyer, and yet the ascription of actions to the participants, for example ‘one ten is that’, ‘bids there at one thirty’, ‘bidding here at one hundred pounds now’, remains ambiguous without reference to the visible aspects of the activity. These gestured turns and their revelation are critical to the escalation of price and the sale of the goods
and feature in the sequence of action through which bids are elicited and acknowledged. Various artefacts also play an important role in the event. The fall of the gavel for example finalises the sale of the goods and their transfer of ownership. The auctioneer’s book not only provides information concerning commission bids, reserves and the like, but is referenced and referred to by the auctioneer during the course of the sale. Without taking the visible aspects of the participants seriously, their gestures, bodily orientation, use of artefacts and the like, it is difficult to address the organisation of the activity and the practices upon which the auctioneer relies to conduct the sale.

To examine how the visible, as well as talk, features in the accomplishment of the activity, we need to develop our transcript to enable us to begin to encompass various aspects of the participants’ visible conduct. Unfortunately, but not surprisingly, there is no general or widely accepted transcription system for the visible and material aspects of social interaction. Over some years however, those undertaking video-based studies informed by ethnomethodology and conversation analysis have developed ways of working with video that enable them to transcribe aspects of the participants’ bodily conduct, in particular with regard to the talk (see for example Goodwin, 1981; Heath, 1986). There is some individual variation in how this is done, but it ordinarily includes identifying the onset and completion of particular actions, such as a gesture, and demarcating significant aspects of its articulation – such as, for example, where it reaches its acme. These transcripts are primarily concerned with delineating the occurrence and position of particular aspects of the participants’ visible conduct. They may include details of head nods, gestures, visual orientation, changes in body position, the use of particular artefacts, and the like; indeed whatever arises within the developing course of a fragment. The transcript provides a resource to begin to discover the geography and organisation of action within a fragment and to document certain features of the participants’ conduct and interaction.

The following is a highly simplified version of a more complex transcript that is included later in the chapter, but it provides a sense of the ways in which we can begin to map out the participants’ conduct and identify some features of actions’ organisation.

Transcribing the visible, as well as the spoken aspects of the fragment, provides an important resource with which to begin to examine the participants’ conduct and to identify the potential relationship between particular actions. For example, in this fragment, we can notice that as he announces the current increment ‘one twenty on commission’ the auctioneer turns and gestures towards the first bidder, B1, inviting him to bid at the next increment, namely one hundred and thirty pounds. However, even as he voices the next increment ‘one thirty now:?’, he turns away from the first bidder and looks for an alternative participant who may be prepared to bid. The auctioneer’s actions reveal that the first bidder has declined the next increment and that ‘one thirty now:?’ serves as a generalised invitation for anyone in the room to bid. As he undertakes the search for a new bidder, he not only announces that the bid is ‘still with me’ but reveals the source of that bid, dramatically pointing first to the book that contains the commission bid and second to himself bidding on behalf of the absentee participant. A new bidder raises his hand and the bid is accepted ‘one thirty’. The bid is produced as the auctioneer announces ‘it’s still with me’ and in particular when the auctioneer’s search around the room arrives at the area where the bidder is sitting. In other words, both the auctioneer’s announcement ‘it’s still with me’ and his visual orientation serve to encourage the participant to bid and to bid at a particular moment. As he announces the bid, ‘one thirty’, the auctioneer gestures towards the bidder, and displays both to the bidder and all those present who has the bid of ‘one thirty’.

Such transcripts are far more detailed than the diagram shown above. They are primarily used by the researcher and enable a range of potentially relevant details of conduct
FRAGMENT 1: TRANSCRIPT 2

Auctioneer
Orientation:  B1; looks around room; B2
              ..........____________________,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,....____________________,,
Gesture:      open palm at B1; points at book; at self; at book; at B2
Talk:         One twenty on commission. One thirty now: It’s still with me. At one twenty. One thirty
to be identified, clarified and documented. They form the basis for generating notes and ideas about particular fragments and the organisation of particular actions and activities. The transcript below is part of the original from which our observations of this fragment are drawn and illustrates the ways in which such transcripts are primarily designed as a vehicle for the individual researcher to examine and document observations concerning a fragment. Transcription is an important, if not the critical, resource for the analysis of particular events, with video recording remaining the principal source of data.
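Because the orthography is regular, some of its conventions can be picked out mechanically, which may help when searching a large corpus of transcripts for candidate fragments. The sketch below is an illustration only: it covers just four conventions used in this chapter (timed pauses such as ‘(0.3)’, micro-pauses ‘(.)’, elongated sounds marked with colons, and visible actions written as ‘{B1 bids}’), the function name `annotate` is hypothetical, and it assumes speaker labels take the form ‘A:’ at the start of a line. It is no substitute for working with the recording itself.

```python
import re

# A speaker label such as "A:" or "B1:" at the start of a line.
SPEAKER = re.compile(r"^\s*[A-Z]\d*:\s+")

TOKEN = re.compile(
    r"\((?P<timed>\d+\.\d+)\)"   # timed pause, e.g. (0.3)
    r"|(?P<micro>\(\.\))"        # pause shorter than two tenths of a second
    r"|\{(?P<event>[^}]*)\}"     # visible action, e.g. {B1 bids}
    r"|(?P<stretch>\w+:+)"       # elongated sound, e.g. now:
)


def annotate(line):
    """Return the notation found in one transcript line, in order."""
    body = SPEAKER.sub("", line)  # drop the speaker label, if any
    found = []
    for match in TOKEN.finditer(body):
        kind = match.lastgroup
        if kind == "timed":
            found.append(("pause", float(match.group("timed"))))
        elif kind == "micro":
            found.append(("micropause", None))
        elif kind == "event":
            found.append(("action", match.group("event")))
        else:
            found.append(("elongation", match.group("stretch")))
    return found
```

For example, `annotate("(0.3) {B1 bids, B2 raises hand}")` yields a 0.3-second pause followed by the visible action, which is the kind of juncture where a bid is placed without any talk.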
FRAGMENT 1 TRANSCRIPT 3
particular actions within the practicalities of accomplishing everyday, socially organised, activities, in concert, with others.

DATA COLLECTION

This particular form of analysis has a significant bearing on the type of data that needs to be gathered and the ways in which we record and document action and activities within particular environments. Every setting poses its own unique demands on data collection and can raise particular difficulties for undertaking video recording. In almost every setting it is critical therefore that the researcher undertakes a period of field observation before considering the introduction of cameras and microphones. Fieldwork provides an opportunity for the researcher to become familiar with the setting – the socio-physical environment, the sorts of activities that arise and patterns of interaction and the like. It also enables the researcher to see the ways in which various material resources feature in particular activities, be they computers, paper documents, or even, as in the previous fragment, hammers, and to reflect upon the ways in which they constrain and of course provide opportunities for particular activities. Last but not least, a period of fieldwork enables the researcher to engage, where relevant, with the participants themselves, and to establish a relationship that can form the basis for securing their willingness to be video recorded and to clarify the ethical requirements that participants themselves see as important.

There are a number of practical issues that have to be addressed in undertaking video recording of naturally occurring activities. Each setting poses its own unique demands and it is unusual that one is able to gather quality data on the first occasion that one records. The lighting, the physical arrangement of the space, the position and movement of the participants, the ambient noise, the location of particular objects and technologies and the necessity to remain, as far as possible, unobtrusive, can all raise difficulties that have to be managed. Moreover, with the interest in exploring the social interactional organisation of naturally occurring activities, it is critical, as far as practically possible, to encompass the actions of all participants. In some settings, where there are two or three participants involved in what Goffman (1971) refers to as a ‘focused gathering’ it may well be possible to gather analytically fruitful data using a single camera with a built-in microphone. Settings that involve numerous participants, and in some cases a diverse range of material resources, settings such as classrooms, control rooms and operating theatres, may necessitate the use of multiple cameras and separate microphones placed in a number of locations. It is unlikely, even after a period of fieldwork, that the first recordings will provide the necessary quality or access to the action, and in many cases, the researcher will find that it is necessary to gather recorded data over a series of occasions before finding the most useful and appropriate position and perspective for recording. Indeed, it is not unusual during the course of a project to gather data from rather different positions to enable particular phenomena to be investigated. These phenomena, and the decision to collect particular forms of data, may change as the analysis develops during the course of a study; data collection is an iterative process in which materials may be progressively gathered in the course of examining, transcribing and analysing data.

While the audio-visual recordings are likely to form the principal data on which analysis is developed, fieldwork, and in some cases the fieldwork that accompanies the actual recording, remains an important, if not critical, part of the research. If we take the workplace for example, there are a range of practices, conventions and resources that bear upon, and inform, the accomplishment of particular activities, and it may well be necessary to augment video analysis with field observation and even interviews. In many cases it is necessary to gain access to the relevant material resources, such as records, work sheets, diagrams, plans and the like, and become familiar with the ways in which
they are used. In this regard, screen-based technologies can pose particular difficulties, since it may be necessary to record the contents of the screen. There are a number of solutions. In some circumstances it is possible to video record the screen with a camera (for some screens this may require the frame rate to be appropriately adjusted). The data that are gathered depend not only on the analytic approach that has been adopted but also on the sorts of phenomena that are addressed. Data, including audio-visual recordings, are always constrained by practicalities and resources. It is critical however that the materials that underpin the research can legitimately serve the insights and phenomena that are addressed in the analysis. In particular, we need to demonstrate the ways in which participants themselves orient to and rely upon the practices that inform the accomplishment of the action and activities at hand.

SUMMARY

Despite numerous calls for the social sciences to take the visual seriously, video remains a surprisingly neglected resource, relegated to a marginal role in some qualitative research and absent from most. When video is used, it often forms an accompaniment to other forms of data collection that prioritise fieldwork, and is used to illustrate events and activities that have been primarily identified and analysed using conventional ethnographic observations. Yet video provides a more significant opportunity for social science research, a resource that enables analysts to scrutinise social actions and activities in ways that hitherto were not possible, to begin to discover phenomena and aspects of socially organised practice unavailable to conventional fieldwork and ethnography. Moreover, audio-visual recordings of naturally occurring activities and events provide the opportunity of building a more cumulative data corpus than is possible within many other forms of qualitative research and to engage in forms of collaborative research and analysis that are unavailable within much ethnography or field research. The technology, and the analytic opportunities it affords, however, raises important methodological challenges for the social sciences and demands distinctive approaches to the study of social action and interaction.

Perhaps the most substantial corpus of video-based, naturalistic studies to emerge within sociology over the past couple of decades or so has been informed by ethnomethodology and conversation analysis. These studies have addressed the social and interactional organisation of a broad range of actions and activities and delineated ways in which seemingly mundane events are accomplished in, and through, the complex, yet systematic, interplay of talk, visible and material conduct. They have revealed the order and organisation that underlie and inform the production of everyday activities and begun to delineate the resources on which participants rely to make sense of and coordinate the actions in which they engage. In this regard, the emergence of workplace studies – studies of work, interaction and technology in complex organisational environments – is of particular interest. These studies provide the resources to address and re-specify some key concepts and ideas that inform more conventional analyses of occupational practice and institutional environments. Indeed, in various ways, these video-based studies of the workplace draw on, and transform, the long-standing recognition that social interaction underpins and preserves institutional arrangements, and enable a reorientation of studies of organisational practice. They also provide a vehicle for taking the object seriously, and reshaping the ways in which we address and reveal how the material, the environment, technology, artefacts and the like, feature in the practical accomplishment of social action and activity. The significance of video therefore is not simply that it provides another way of gathering data, but rather that, with an appropriate methodological framework, it enables the social sciences to build a rigorous and systematic analysis of the organised production of social action as it occurs in its everyday, natural environments.
Marks, D. (1995) Ethnographic film: from Flaberty to Silverman, D. (1997b). Discourses of Counseling: HIV
Asch and after. American Anthropologist, 97 (2), Counseling as Social Interaction. London: Sage.
337–347. Simmel, G. (1950). The Sociology of George Simmel,
Maynard, D. W. (2003). Bad News, Good News: Wolf, K. (ed). Glencoe, Illinois: Free Press.
Conversational Order in Everyday Talk and Clinical Smith, C. W. (1989). Auctions: The Social Construction
Settings. Chicago: University of Chicago Press. of Value. London: Harvester Wheatsheaf.
Mondada, L. (2003). Working with video: how surgeons Star, S. L. (1996). Working together: Symbolic interac-
produce video records of their actions. Visual Studies, tionism, activity theory and information systems, in
18, 58–73. Engeström, Y. & Middleton, D. (eds) Cognition and
Parsons, A. S. (1951). The Social System. Glencoe: Free Communication at Work (pp. 296–318). Cambridge:
Press. Cambridge University Press.
Peräkylä, A. (1995). Aids Counselling: Institutional Inter- Strauss, A., Schatzman, L., Bucher, R., Ehrlich, D.,
action and Clinical Practice. Cambridge: Cambridge & Sabshin, M. (1964). Psychiatric Ideologies and
University Press. Institutions. London: Free Press.
Pink, S. (2001a). Doing Ethnography: Images, Media and Streeck, J., & Kallmeyer, W. (2001). Interaction by
Representation in Research. London: Sage. inscription. Journal of Pragmatics, 33, 465–490.
Pink, S. (2001b) More visualising, more methodologies Strong, P. (1978). The Ceremonial Order of the Clinic:
on video, reflexivity and qualitative research. Patients, Doctors and Medical Bureaucracies. London:
Sociological Review, 49 (1), 586–599. Routledge Kegan Paul.
Prodger, P. (2003). Time Stands Still: Muybridge and Suchman, L. (1987). Plans and Situated Actions: The
the Instantaneous Photography Movement. Oxford: Problem of Human Machine Interaction. Cambridge:
Oxford University Press. Cambridge University Press.
Rose, G. (2001). Visual Methodologies: An Introduction Van Maanen, J. (1991) The smile factory: Work at
to the Interpretation of Visual Materials. London: Disneyland, in Frost, P. J., Moore, L. F., Louis, M. L.,
Sage. Lundberg, C. C., & Martin, J. (eds) Reframing
Roth, J. A. (1963). Timetables: Structuring and the Organisational Culture (pp. 58–76). London: Sage.
Passage of Time in Hospital Treatment and other Whalen, J. (1995). Expert systems vs. systems for
Careers. Indianapolis: Bobbs Merrill. experts: Computer-aided dispatch as a support
Ruby, J. (2000). Picturing Culture: Explorations of Film system in real-world environments, in P. Thomas (ed)
and Anthropology. Chicago: University of Chicago The Social and Interactional Dimensions of Human-
Press. Computer Interfaces (pp. 161–183). Cambridge:
Schegloff, E. A., & Sacks, H. (1973). Opening up closings. Cambridge University Press.
Semiotica, 7, 289–327. Whalen, J., Whalen, M., & Henderson, K. (2002).
Schegloff, E. A., & Sacks, H. (1974). Opening up closings, Improvisational choreography in a teleservice work.
in R. Turner (ed.) Ethnomethodology (pp. 233–264). British Journal of Sociology, 53 (2), 239–259.
Harmondsworth, U.K. and Baltimore, MD: Whalen, J., Zimmerman, D. & Whalen, M. (1988). When
Penguin. words fail: a single case analysis. Communication
Silverman, D. (1970). The Theory of Organisation. Yearbook, 11, 406–432.
London: Heinemann. Zimmerman, D. H. (1992). The interactional organization
Silverman, D. (1997a). Studying organisational interac- of calls for emergency assistance, in P. Drew &
tion: ethnomethodology’s contribution to the ‘new J. Heritage (eds) Talk at Work: Interaction in
institutionalism’. Administrative Theory and Praxis, Institutional Settings (pp. 418–469). Cambridge:
19 (2), 1. Cambridge University Press.
30
Secondary Analysis of
Qualitative Data
Janet Heaton
qualitative secondary analysis (Heaton, 1998, 2000, 2004). It focuses on developments in the UK, where there has been considerable work to promote the archiving and re-use of qualitative data, and describes examples of secondary analysis carried out internationally in social research (but not social research of a more historical nature). Most of the examples are from health-related research, where the vast majority of studies involving the re-use of qualitative research data have been published to date.

STATE OF THE ART

Accessing qualitative data

There are three ways in which social researchers can access qualitative research data for secondary analysis: through data archives, by informal data sharing and by re-using data from their own previous research (Heaton, 2004). These approaches, and some illustrative examples of studies using different sources of data, are described below.

Data archives

Many countries have national and other data archives which preserve datasets from the social sciences and make them available for further use by other researchers. Archived data tend to be quantitative rather than qualitative in nature, although some longitudinal studies include a qualitative component. Where archives do hold qualitative data, these tend to be collections of life stories retained for use in historical research, rather than other types of qualitative data often collected in social research. Information on worldwide archives is available through the Council of European Social Sciences Data Archives (CESSDA) website [1]. In Europe, there are a number of archives where qualitative datasets are already deposited, or which are planning to accept this type of data [2]. They include: the UK Data Archive (UKDA); the Finnish Social Science Data Archive (FSD); the Danish Data Archives (DDA); the Sociological Data Archive in The Czech Republic; the Norwegian Social Science Data Services (NSD); the Swedish Social Science Data Services (SSD); and the Institute für Geschichte und Biographie in Germany. In the USA, the Murray Research Center (A Center for the Study of Lives at Harvard University) holds over 270 datasets from research on human development and social change, including longitudinal datasets containing qualitative data (James and Sørensen, 2000).

Particular advances in qualitative data archiving have been made in the UK, where formal sharing of all types of qualitative data across the social sciences has been heavily promoted since the mid-1990s by a major funder of social research, the Economic and Social Research Council (ESRC). In 1994, the ESRC established the world’s first and only Qualitative Data Archiving Resource Centre (Qualidata), based at the University of Essex in England and directed by Paul Thompson. The role of this service has evolved over time (Corti, 2000, 2003; Corti and Backhouse, 2005; Corti and Thompson, 2004). Originally, Qualidata was set up to promote and facilitate the archiving of qualitative datasets in existing repositories across the UK. In 2003, Qualidata became part of the new Economic and Social Data Service (ESDS), an initiative jointly funded by the ESRC and the Joint Information Systems Committee (JISC). Renamed ESDS Qualidata, the service is now based within the UKDA. Following a consultation carried out for the ESRC on the use of qualitative research resources (Henwood and Lang, 2003), ESDS Qualidata has sought to improve the accessibility of archived material by making selected datasets available via the web, and by creating web-based samplers of a larger number of datasets so that researchers can more easily assess the potential for using them in teaching and/or for secondary research purposes.

The ESRC has further promoted qualitative data archiving and re-use through a number of related policy and funding initiatives. Since 1995, the ESRC has had a Datasets Policy making it a condition of its awards that researchers make available for archiving
qualitative datasets arising from their work; in applying for funding researchers also have to demonstrate that the proposed primary research cannot be carried out using existing archived datasets [3]. In addition, following the aforementioned consultation on qualitative research resources, the ESRC funded a feasibility study on the possibility of a qualitative longitudinal study (Holland et al., 2004). This, in turn, has been followed up with funding for a programme of work intended to develop resources for qualitative secondary analysis. This includes funding for a series of demonstration studies to investigate the value of innovative models of archiving, sharing and re-using qualitative data, commissioned in 2005 as part of the ESRC’s Qualitative Archiving and Data Sharing Scheme (QUADS) [4]. It also includes funding for a major qualitative longitudinal study, called Changing Lives and Times, commencing in 2006 [5].

As a result of the above strategies, there has been an increase in the availability of archived qualitative datasets in the UK, as well as an improvement in the cataloguing of these resources. By 2002, Qualidata had facilitated archiving of 140 qualitative datasets and added details of a further 150 existing collections to its catalogue (Corti, 2003; see also Corti and Backhouse, 2005) [6]. However, there have been difficulties collating figures on usage of these resources (Corti, 2000), and little is known about the extent to which existing datasets have been accessed [7]. Of course, many archived datasets have only just become available and work is ongoing to improve the accessibility of some of these, hence it will take time for researchers to complete work based on these resources and for resulting secondary studies to be published. Nonetheless, as Parry and Mauthner (2005) have argued, the ongoing case for qualitative data archiving (and different models for this) needs to be supported by information on the extent to which these datasets are accessed and re-used, by whom and for what purposes.

In a bid to examine whether and how researchers have re-used qualitative research data, in 1997 I began a review of secondary studies published in the international health and social care literature, which has been updated over time (Heaton, 1998, 2000, 2004). While this work is limited in that it focuses on one area of social research, it provides an indication of how researchers have re-used qualitative data in practice, and I have not found evidence to suggest that numerous secondary studies have been published in other areas of social research to date [8]. The review found that only nine (14%) of the 65 secondary studies identified involved the re-use of datasets collected by other researchers, and were carried out independently of the primary researchers (Heaton, 2004). Of these, two studies utilised publicly archived datasets. One was a study by Bloor (2000) of communal understanding of, and responses to, the disease popularly known as ‘Miners’ Lung’, using oral history material from South Wales Miners’ Library at the University of Wales Swansea. The other was a study by Bevan (2000) of the career choices of general practitioners, using life histories deposited with the British Library National Sound Library. Another two publications were based on data that Julius Roth had left with Paul Atkinson and which were used for teaching and in research. These data were re-used in a study of the cultural aspects of tuberculosis (Weaver, 1994), and also to illustrate a book on micro-computing and qualitative data analysis (Weaver and Atkinson, 1994).

Notable secondary studies which have been carried out using archived datasets in other areas of social research include Fielding and Fielding’s (2000) secondary analysis of Cohen and Taylor’s (1972) research on the long-term imprisonment of men in a maximum security prison (archived at the Institute of Criminology, Cambridge). And data from the ‘Affluent Worker’ study (available via Qualidata, at the University of Essex) have been re-used in a secondary study by Savage (2005a; see also Savage 2005b). Thompson (1998) has also reported that oral histories collected for ‘The Edwardians’ study (held at the University of Essex) have been re-used in numerous publications and for teaching.
verification; and for teaching and learning (no example of verification was provided, which the authors acknowledge researchers have not yet pursued, despite the availability of resources).

In my review of the health and social care literature, which looked in detail at how and why researchers had re-used qualitative datasets in published studies, I found that there were five main types of qualitative secondary analysis (Heaton, 2004). These are summarised below, together with a few examples of relevant studies drawn from the review and from a more recent search of the social research literature carried out to update the findings for this chapter.

Supra analysis

In this type of secondary analysis, the focus of the secondary study transcends that of the primary work. New theoretical, empirical or methodological questions are explored that are distinct from the aims of the original research. For example, three of the secondary studies reviewed focused on the use of metaphors in participants’ accounts of medical encounters (Jairath, 1999; Jenny and Logan, 1996; Pascalev, 1996). Another three studies used secondary analysis in methodological work concerning micro-computing and qualitative data analysis (Weaver and Atkinson, 1994), different methods of textual analysis (Atkinson, 1992), and the value of different approaches to biographical analysis (Jones and Rupp, 2000).

Supplementary analysis

Supplementary analysis was the most common type of secondary analysis identified in the review. This approach involves the in-depth investigation of an issue, or aspect of the data, that was not addressed, or was only partly covered, in the original research. The focus may be on a particular issue or theme that emerged from the primary work, or on a sub-set of the data. Unlike supra analysis, the subject of this type of secondary analysis is more closely related to that of the primary work. As a result, in some cases it may be difficult to distinguish where primary research stops and secondary analysis starts, particularly when the supplementary analysis is carried out by the same researchers who carried out the primary research.

An example of supplementary analysis is provided by Brownlie and Howson’s (2005) secondary analysis of two datasets on professional and parental views of the measles, mumps and rubella (MMR) vaccination. These data were collected in studies carried out by an independent research agency for the Health Education Board for Scotland (HEBS, now NHS Health Scotland) in 1999 and 2001. These organisations agreed to provide the secondary researchers with access to the datasets after they had been anonymised by the research agency. The secondary analysis focused on ‘emergent themes of trust and parental anxiety about risk’ (Brownlie and Howson, 2005: 223).

Re-analysis

Whereas the above types of secondary analysis involve the investigation of new questions or emergent issues, the purpose of re-analysis is to verify and corroborate the findings of previous work. Only one example approximating this type of secondary analysis was identified in the review. This was a study by Popkess-Vawter et al. (1998), where alternative methods of analysis were used, in a form of methodological triangulation, to re-examine data originally collected by the first author on women’s experiences of losing and gaining weight after dieting (‘weight-cycling’). Whereas the primary analysis was based on ‘reversal theory’, the secondary analysis was a content analysis performed by two independent coders ‘with no consideration for reversal theory’ (Popkess-Vawter et al., 1998: 71). The authors claim that secondary analysis was carried out to provide ‘a validity check for the primary coding and an accuracy check for complete interpretation’ (Popkess-Vawter et al., 1998: 71), although in reporting their results they do not comment on how the coding and findings from the secondary analysis related to those previously applied and obtained.
the fieldwork (Heaton, 1998, 2004). Some archivists and researchers have also argued that this problem can be reduced by primary researchers fully documenting their dataset, and by secondary analysts consulting the researchers who collected the data (Corti and Thompson, 2004; Fielding, 2004; Hinds et al., 1997).

Yet another concern is whether one suggested use of secondary analysis – re-analysis in order to confirm or discount previous research findings – is a realistic ambition or accordant with the principles of qualitative inquiry (Hammersley, 1997; Heaton, 2004). However, others support the concept of preserving data for replication in both quantitative and qualitative research (Schneider, 2004).

Discussion of technical issues has tended to focus more on issues of how to archive qualitative data than how to do qualitative secondary analysis. For example, there has been some discussion of how best to anonymise qualitative data while preserving the integrity of datasets (Thomson et al., 2005), and when best to obtain consent for archiving and re-using qualitative data (see below). However, unlike the literature on secondary analysis of quantitative data, there are no textbooks on how to re-use qualitative data and there has been only preliminary discussion of issues such as: how to design secondary studies re-using qualitative data; how to find and select relevant datasets; how to analyse secondary qualitative data; how to assure and assess the quality of secondary studies; and what to include in reports of such studies (Heaton, 2004; Hinds et al., 1997; Thorne, 1994, 1998). There is an urgent need for further research on these topics.

Ethical and legal concerns

Another set of concerns relates to the ethical and legal aspects of re-using qualitative data. These include the issue of whether and, if so, when researchers should seek consent to re-use data in secondary studies (Alderson, 1998; Corti et al., 2000; Heaton, 2004; Hood-Williams and Harrison, 1998; Parry and Mauthner, 2004; Richardson and Godfrey, 2003; Thorne, 1998). This could be done at the time data are collected. However, information on exactly how data will be re-used, by whom and for what purpose, is likely to be scant at this time. Alternatively, consent could be sought retrospectively, as and when particular secondary studies are planned. But this requires that participants’ identity and contact details are known and can be used for this purpose. Re-contacting participants also presents researchers with logistical and ethical difficulties where people have changed address or may have died; being re-contacted may also be unwelcome to some former participants. In addition, whether or not researchers decide to seek fresh consent for a secondary study may depend on who collected the data and on the type of qualitative secondary analysis planned; for example, in the case of a supplementary analysis carried out by the same researchers who collected the primary data, and where the aims of the secondary and primary research are relatively congruent, this may not be required (for example, see Brownlie and Howson, 2005).

From a legal perspective, data may be re-used in research in the UK under the Data Protection Act 1998 providing it has been anonymised. However, copyright law also has to be considered when publicly archiving and re-using qualitative data. Under the Copyright, Designs and Patents Act 1988, copyright of ‘original works’ (which include interview transcripts) is owned by the interviewee. While some use can be made of such material by non-copyright holders, researchers in the UK have been advised to have ownership of copyright of qualitative data transferred in writing from participants to themselves or an archive if the dataset is to be archived for re-use by others (Allen and Overy, 1998).

QUESTIONS FOR FUTURE POLICY AND PRACTICE

Ongoing developments in the secondary analysis of qualitative data raise a number of questions for future policy and practice
concerning the collection, archiving and re-use of qualitative data. Three of the most critical questions are discussed below.

Which qualitative datasets should be archived?

As we have seen, great advances in qualitative data archiving have been made in the UK, driven by policies of a major funder of research in the social sciences, the ESRC. Since 1995, the ESRC has had a Datasets Policy that requires researchers to provide qualitative datasets for archiving and possible use by third parties as a condition of their funding, although applicants may make a case for exemption or request that access to their datasets be made subject to conditions. Qualidata helped inform development of the ESRC’s Datasets Policy and has discussed archiving policies with other funders (Corti and Backhouse, 2005). While commending the ESRC’s policy lead, staff from ESDS Qualidata have recommended that the ESRC improves implementation of its Datasets Policy, to make it more ‘robust, systematic and accountable’ – for example, suggesting that penalties could be introduced for non-compliant researchers (Corti, 2003: 424; see also Corti and Backhouse, 2005). But what is the case for such a mandatory policy of data archiving? And what are the possible alternatives to this model of promoting secondary analysis of qualitative data?

Parry and Mauthner (2005: 338) have argued that, so far as the demand for archived data goes, the ‘jury are still out’. As they point out, there is no clear evidence of support for formal archiving of qualitative datasets. On the one hand, Qualidata carried out a survey of academics and researchers in the UK in 1999, which found that 92% of over 550 respondents wanted access to qualitative datasets (Corti and Thompson, 2004) [9]. On the other hand, a report of a consultation on ESRC Data Policy and Archiving found mixed support for, and highlighted ‘considerable concerns within the research community’ about, the archiving and re-use of qualitative data (Boddy, 2001). The report recommended that archiving policy should concentrate on retention of ‘classic’ or ‘key’ qualitative datasets and suggested the ESRC explore ‘alternative approaches to the re-use of qualitative data in order to demonstrate the possibilities’ (Boddy, 2001). But, as Parry and Mauthner (2005) point out, this begs the question of how some datasets come to be defined as ‘classic’ and selected for archiving. Furthermore, as we have seen, there is little evidence of the extent to which researchers have made use of qualitative datasets that have been officially archived so far across the UK. So far, reviews have shown that most of the (non-historical) secondary analyses of qualitative data published to date have been by researchers who have informally shared their data or re-used their own data.

Adoption of a blanket mandatory, rather than, say, an elective or invited, policy of formal data archiving would mean that all researchers would have to aim to meet minimum criteria for archiving datasets to a standard that could be used by third parties – regardless of the nature of the study, the potential value of the dataset as a secondary resource (which may be hard to predict in advance), and the associated work and costs involved in meeting this standard. The requirement to archive could also impact upon the conduct of primary qualitative research when consent for archiving data is sought at the time of data collection, adding to the amount of information that needs to be given and explained to potential research participants by primary researchers. While ESDS Qualidata provide guidelines on how to do this [10], it is not known whether prolonging and complicating the process of getting informed consent at this stage affects participants’ agreement to take part. Nor is it known if, having agreed to take part and have their contribution to the dataset deposited in an archive, participants’ disclosure to the primary researcher(s) is affected by the knowledge that the information will be available, albeit anonymously, to unknown third parties. In short, there is little research on this topic to help researchers, peer reviewers of grant applications, funding organisations,
ethics committees, and the public, decide whether or not archiving is, per se, a desirable scientific and personal option in social research.

Different models of quantitative and qualitative archiving have been previously discussed (see Boddy, 2001; Corti, 2000). In contrast to a mandatory qualitative data archiving policy, I would like to propose an alternative fourfold strategy, subject to support by the research community, including research participants and the public. First, this strategy would focus on making widely available datasets that have value for historical and/or contemporary secondary research. Where this differs from current policy and practice is that what counts as an ‘exemplary’ study would be decided retrospectively and by independent peer review, using agreed criteria for selecting studies which demonstrate value for teaching and/or secondary research purposes across social science disciplines. The selection of datasets for archiving would be a mark of prestige for the researchers involved, and include a financial award for their help in documenting and preparing the dataset for deposit.

Second, in support of the requirement of some funding organisations, and many publishers, that datasets are retained for a minimum period of time after work has been completed and/or published, funders would make available resources for the adequate in-house preparation and retention of qualitative datasets. At present, there is little provision in research grants, and limited facilities in university workplaces, for data to be adequately retained even for a limited period and to a lower standard than that required in formal data archiving (that is, where data are not purposely made available for use by third parties). Third, services such as Qualidata and data archives would provide advice and guidelines for researchers on protocols for informal data sharing and researchers’ re-use of their self-held datasets,

supply qualitative data for use in secondary research. These studies would be the equivalent of multi-purpose statistical surveys, and funding would include provision for archiving and associated costs. Both ‘exemplary’ and multi-purpose qualitative datasets would be available to registered users via the web. Whereas the first and last points are being advanced through the aforementioned ESRC initiatives on formal data sharing, less is being done to investigate, support and develop informal data sharing, or the private retention and re-use of qualitative datasets.

Whose qualitative data should be re-used?

As the availability of public and privately retained qualitative datasets grows, researchers will increasingly have a choice of not only whether to do primary or secondary research, but whether to re-use datasets available in dedicated archives, via informal data sharing, or from their own oeuvre. The advantages and limitations of re-using qualitative data vary depending on whose data are re-used.

The main advantage of re-using formally archived datasets is that these will have been specially prepared for use by third parties. Thus, issues of consent, copyright ownership, anonymity of data, meta-documentation of datasets and conditions of access to and use of the material should have been dealt with and be clear to potential secondary users. The main limitation is that these datasets will have been collected by other researchers, which presents two major problems for secondary analysts. One is how they can recapture the context in which the original study was devised and the data collected. As we have seen, some researchers believe that intimate knowledge of ‘being there’ in the field and ‘immersing’ oneself in data processing and analysis, are integral and
as well as procedures for depositing and re- essential to the process of doing qualitative
using qualitative datasets in official archives. research and making sense of people’s
Finally, funds would be dedicated for projects, experiences. While the meta-documentation
such as longitudinal studies, designed to of datasets provides some background and
insight, this can only ever be an approximation (Mauthner et al., 1998). The other problem, which is related to this, concerns the relative distance that secondary analysis of formally archived datasets imposes between the researcher and the researched (Thorne, 1994). Here, the researcher’s relationship to the data is reduced to (most likely) anonymised data, perhaps offset by wider personal experience of doing primary research with similar groups of people, and/or by contact with the primary researcher(s) who collected the data. While there may be some advantages to having this distance in some secondary studies (for instance, where re-analysis is the goal), nevertheless it is a different, less intimate, relationship compared to that of primary researchers and their subjects (see note 11).
Where datasets are informally shared between colleagues, and primary researchers are also involved in the secondary analysis, the secondary research team have the advantage of jointly holding and sharing the tacit, as well as the documented, knowledge of the researcher(s) who collected the data. In this situation, the process of doing secondary analysis is arguably no different to that of doing primary research in teams where interviews may be carried out and analysed by different members (Heaton, 1998, 2004). The co-involvement of the primary researchers means that, compared to re-using archived data, there may be greater awareness of the context of the primary work, and sensitivity to the feelings of the researched (and any other researchers who carried out the primary work). Other advantages are that secondary researchers may be able to gain quicker access to informally shared datasets, rather than have to wait for them to be processed and become available via an archive. They may also have access to and be able to re-use any electronic coding that was employed in the original analysis, carried out using software designed to assist qualitative data analysis. And where primary researchers are involved in the secondary research, they may retain direct control over the re-use of the dataset rather than rely on an archive.
The main disadvantages of informal data sharing are that datasets may not be prepared to as high a standard as in an archive, nor may all the aforementioned protocols be fully satisfied. For example, secondary researchers may have to re-contact participants for consent to re-use data where this is required. In addition, where primary researchers share their data but are not involved in the secondary analysis, the disadvantages of re-using formally archived data apply and may be compounded by the relatively poor documentation of datasets if they have not been prepared for sharing with third parties.
Many of the advantages and limitations of informal data sharing apply to secondary research carried out by researchers who choose to re-use their own data. Additional advantages are that researchers who have worked on related projects in their careers can draw on and utilise material from this work (for example, see Thorne, 1990a, 1990b and 1990c). Researchers may also identify and follow up spontaneous topics of analysis that emerge unexpectedly in the course of research and which otherwise may go unanalysed if they are not germane to the aims of the primary research, or if the data are not shared with others or archived for further use. However, this practice raises new issues. Where does primary research stop and secondary analysis start? At what point is further consent required from participants to re-use data for spontaneous studies, even if these are to be carried out by the same researcher(s) who collected the data? Finally, researchers who re-use their own data may also find that their memory of the original study changes over time, and that their perspective shifts as their own life experiences inform their subsequent analysis of the data (Mauthner et al., 1998).
Of course, the above are just some of the pros and cons of working with qualitative data drawn from different sources. Many other factors, including the accessibility of datasets, preference for data format, quality of the original study, degree of ‘fit’ between the aims of the secondary research and the content of the dataset(s), trust between researchers, compatibility of shared datasets,
3 ESRC’s Datasets Policy is set out in Annex C of ‘2005 ESRC Research Funding Guide Post fEC’, available at: http://www.esrcsocietytoday.ac.uk/ESRCInfoCentre/opportunities/research%5Ffunding/ [accessed 28/2/2006].
4 Information on QUADS is available at the UKDA website: http://quads.esds.ac.uk/about/introduction.asp [accessed 28/2/2006].
5 ‘Changing Lives and Times qualitative longitudinal initiative’. Call for outline proposals on ESRC website: http://www.esrcsocietytoday.ac.uk/ESRCInfoCentre/opportunities/current_funding_opportunities/index28.aspx [accessed 28/2/2006].
6 The ESDS Qualidata catalogue currently lists 162 datasets (and details of others remain to be transferred from the older Qualicat catalogue). Available at: http://www.data-archive.ac.uk/search/allSearch.asp?q1=qualidata&zoom_page=1&zoom_per_page=10&zoom_cat=-1&zoom_and=1&zoom_sort=1&ct=xmlAll [accessed 28/2/2006].
7 There is some public information on this, produced by JISC on ESDS performance and published on its website: http://www.mu.jisc.ac.uk/servicedata/esds/data/ [accessed 28/2/2006].
8 Provisional searches of ASSIA and selected electronic databases of research on criminology and education carried out by myself and independently by two colleagues (Rachel Pitman and Janette Colclough, University of York) in February 2006 provided little evidence of such studies. However, there are difficulties searching for secondary studies because there are no established key words for classifying such studies, and authors’ own definitions of secondary analysis vary. A renewed search of the health-related literature, using similar search strategies, did result in further studies being identified. In total, over 100 secondary studies in health, criminology and education have been identified to date.
9 A different response figure (99%) and date of the survey (2000) have been reported elsewhere (Corti, 2000). I have quoted the most recently published.
10 Guidelines on creating and depositing qualitative datasets are available on the ESDS Qualidata website: http://www.esds.ac.uk/qualidata/create/ [accessed 28/2/2006].
11 There are parallels here with concerns over the use of computer software in qualitative data analysis (see Gilbert, 2002).
12 See essays published in a special supplement of the Journal of Health Services Research and Policy, 2005, 8 (1).

REFERENCES

Alderson, P. (1998) ‘Confidentiality and consent in qualitative research’, Network – Newsletter of the British Sociological Association, 69: 6–7.
Allen and Overy (1998) ‘Copyright/confidentiality: final report to the Economic and Social Research Council’. Retrieved from: ftp://ftp.esrc.ac.uk/pub/guide.doc [accessed 14/9/1998].
Angst, D.B. and Deatrick, J.A. (1996) ‘Involvement in health care decisions: parents and children with chronic illness’, Journal of Family Nursing, 2 (2): 174–94.
Atkinson, P. (1992) ‘The ethnography of a medical setting: reading, writing and rhetoric’, Qualitative Health Research, 2 (4): 451–74.
Bevan, M. (2000) ‘Family and vocation: career choice and the life histories of general practitioners’, in Bornat, J., Perks, R., Thompson, P. and Walmsley, J. (eds.), Oral History, Health and Welfare. London: Routledge. pp. 21–47.
Bloor, M. (2000) ‘The South Wales Miners Federation, Miners’ Lung and the instrumental use of expertise, 1900–1950’, Social Studies in Science, 30 (1): 125–40.
Bloor, M. and McIntosh, J. (1990) ‘Surveillance and concealment: a comparison of techniques of client resistance in therapeutic communities and health visiting’, in Cunningham-Burley, S. and McKeganey, N.P. (eds.), Readings in Medical Sociology. London: Tavistock/Routledge. pp. 159–81.
Boddy, M. (2001) Data Policy and Data Archiving: Report on Consultation for the ESRC Research Resources Board. Bristol: University of Bristol.
Brownlie, J. and Howson, A. (2005) ‘“Leaps of faith” and MMR: an empirical study of trust’, Sociology, 39 (2): 221–39.
Cohen, M.H. (1995) ‘The triggers of heightened parental uncertainty in chronic, life-threatening childhood illness’, Qualitative Health Research, 5 (1): 63–77.
Cohen, S. and Taylor, L. (1972) Psychological Survival: The Effects of Long-Term Imprisonment. London: Allen Lane.
Corden, A. and Sainsbury, R. (2005) ‘Research participants’ views on use of verbatim quotations’. Final report to ESRC, ref 2094. York: Social Policy Research Unit (SPRU), University of York.
Corti, L. (2000) ‘Progress and problems of preserving and providing access to qualitative data for social research – the international picture of an emerging culture’, Forum: Qualitative Social Research [Online Journal], 1 (3): 58 paragraphs. Available at: http://www.qualitative-research.net/fqs-texte/3-00/3-00corti-e.htm [accessed 1/3/2006].
Corti, L. (2003) ‘Infrastructure services and needs for the provision of enhanced qualitative data resources’, International Social Science Journal, 55 (3): 417–32.
Corti, L. and Backhouse, G. (2005) ‘Acquiring qualitative data for secondary analysis’, Forum: Qualitative Social Research [Online Journal], 6 (2): 31 paragraphs. Available at: http://www.qualitative-research.net/fqs-texte/2-05/05-2-36-e.htm [accessed 1/3/2006].
Corti, L., Day, A. and Backhouse, G. (2000) ‘Confidentiality and informed consent: issues for consideration in the preservation of and provision of access to qualitative data archives’, Forum: Qualitative Social Research [Online Journal], 1 (3): 46 paragraphs. Available at: http://www.qualitative-research.net/fqs-texte/3-00/3-00cortietal-e.htm [accessed 1/3/2006].
Corti, L. and Thompson, P. (2004) ‘Secondary analysis of archived data’, in Seale, C., Gobo, G., Gubrium, J.F. and Silverman, D. (eds.), Qualitative Research Practice. London: Sage. pp. 327–43.
Fielding, N. (2004) ‘Getting the most from archived qualitative data: epistemological, practical and professional obstacles’, International Journal of Social Research Methodology, 7 (1): 97–104.
Fielding, N.G. and Fielding, J.L. (2000) ‘Resistance and adaptation to criminal identity: using secondary analysis to evaluate classic studies of crime and deviance’, Sociology, 34 (4): 671–89.
Gilbert, L.S. (2002) ‘Going the distance: ‘closeness’ in qualitative data analysis software’, International Journal of Social Research Methodology, 5 (3): 215–28.
Grinyer, A. (2002) ‘The anonymity of research participants: assumptions, ethics and practicalities’, Social Research Update, Issue 36, University of Surrey.
Hammersley, M. (1997) ‘Qualitative data archiving: some reflections on its prospects and problems’, Sociology, 31 (1): 131–42.
Heaton, J. (1998) ‘Secondary analysis of qualitative data’, Social Research Update, Issue 22, University of Surrey.
Heaton, J. (2000) ‘Secondary analysis of qualitative data: a review of the literature’. Final report to ESRC, ref R000222918. York: Social Policy Research Unit (SPRU), University of York.
Heaton, J. (2004) Reworking Qualitative Data. London: Sage.
Henwood, K. and Lang, I. (2003) Qualitative Research Resources: A Consultation with UK Social Scientists. Swindon, UK: ESRC.
Hinds, P.S., Vogel, R.J. and Clarke-Steffen, L. (1997) ‘The possibilities and pitfalls of doing a secondary analysis of a qualitative data set’, Qualitative Health Research, 7 (3): 408–24.
Holland, J., Thomson, R. and Henderson, S. (2004) ‘Feasibility study for a possible qualitative longitudinal study: discussion paper’. Available at: http://www.lsbu.ac.uk/inventingadulthoods/feasibility_study.pdf [accessed 23/2/2006].
Hood-Williams, J. and Harrison, W.C. (1998) ‘“It’s all in the small print …”: archiving and qualitative research’, Network – Newsletter of the British Sociological Association, 70: 8–9.
Jairath, N. (1999) ‘Myocardial infarction patients’ use of metaphors to share meaning and communicate underlying frames of experience’, Journal of Advanced Nursing, 29 (2): 283–89.
James, J.B. and Sørensen, A. (2000) ‘Archiving longitudinal data for future research: why qualitative data add to a study’s usefulness’, Forum: Qualitative Social Research [Online Journal], 1 (3): 57 paragraphs. Available at: http://www.qualitative-research.net/fqs-texte/3-00/3-00jamessorensen-e.htm [accessed 1/3/2006].
Jenny, J. and Logan, J. (1996) ‘Caring and comfort metaphors used by patients in critical care’, Image: Journal of Nursing Scholarship, 28 (4): 349–52.
Jones, C. and Rupp, S. (2000) ‘Understanding the carers’ world: a biographical-interpretive case study’, in Chamberlayne, P., Bornat, J. and Wengraf, T. (eds.), The Turn to Biographical Methods in Social Science: Comparative Issues and Examples. London: Routledge. pp. 276–89.
Mauthner, N.S., Parry, O. and Backett-Milburn, K. (1998) ‘The data are out there, or are they? Implications for archiving and revisiting qualitative data’, Sociology, 32 (4): 733–45.
May, C., Allison, G., Chapple, A., Chew-Graham, C., Dixon, C., Gask, L., Graham, R., Rogers, A. and Roland, M. (2004) ‘Framing the doctor-patient relationship in chronic illness: a comparative study of general practitioners’ accounts’, Sociology of Health & Illness, 26 (2): 135–58.
Nelson, L.G.L., Summers, J.A. and Turnbull, A. (2004) ‘Boundaries in family-professional relationships: implications for special education’, Remedial and Special Education, 25 (3): 153–65.
Parry, O. and Mauthner, N.S. (2004) ‘Whose data are they anyway? Practical, legal and ethical issues in archiving qualitative research data’, Sociology, 38 (1): 139–52.
Parry, O. and Mauthner, N.S. (2005) ‘Back to basics: who re-uses qualitative data and why?’, Sociology, 39 (2): 337–42.
Pascalev, A. (1996) ‘Images of death and dying in the intensive care unit’, Journal of Medical Humanities, 17 (4): 219–36.
Plummer, K. (1983) Documents of Life: An Introduction to the Problems and Literature of a Humanistic Method. London: George Allen & Unwin.
Plummer, K. (2001) Documents of Life 2: An Invitation to a Critical Humanism. London: Sage.
Popkess-Vawter, S., Brandau, C. and Straub, J. (1998) ‘Triggers of overeating and related intervention strategies for women who weight cycle’, Applied Nursing Research, 11 (2): 69–76.
Richardson, J.C. and Godfrey, B.S. (2003) ‘Towards ethical practice in the use of archived transcripted interviews’, International Journal of Social Research Methodology, 6 (4): 347–55.
Sandelowski, M. (1997) ‘“To be of use”: enhancing the utility of qualitative research’, Nursing Outlook, 45: 125–32.
Savage, M. (2005a) ‘Working-class identities in the 1960s: revisiting the Affluent Worker study’, Sociology, 39 (5): 929–46.
Savage, M. (2005b) ‘Revisiting classic qualitative studies’, Forum: Qualitative Social Research [Online Journal], 6 (1): 43 paragraphs. Available at: http://www.qualitative-research.net/fqs-texte/1-05/05-1-31-e.htm [accessed 1/3/2006].
Schneider, B. (2004) ‘Building a scientific community: the need for replication’, Teachers College Record, 106 (7): 1471–83.
Scott, J. (1990) A Matter of Record: Documentary Sources in Social Research. Cambridge: Polity Press.
Szabo, V. and Strang, V.R. (1997) ‘Secondary analysis of qualitative data’, Advances in Nursing Science, 20 (2): 66–74.
Thompson, P. (1998) ‘Sharing and reshaping life stories: problems and potential in archiving research narratives’, in Chamberlain, M. and Thompson, P. (eds.), Narrative and Genre. London: Routledge. pp. 167–81.
Thomson, D., Bzdel, L., Golden-Biddle, K., Reay, T. and Estabrooks, C.A. (2005) ‘Central questions of anonymization: a case study of secondary use of qualitative data’, Forum: Qualitative Social Research [Online Journal], 6 (1): 33 paragraphs. Available at: http://www.qualitative-research.net/fqs-texte/1-05/05-1-29-e.htm [accessed 1/3/2006].
Thorne, S.E. (1988) ‘Helpful and unhelpful communications in cancer care: the patient perspective’, Oncology Nursing Forum, 15 (2): 167–72.
Thorne, S.E. (1990a) ‘Constructive noncompliance in chronic illness’, Holistic Nursing Practice, 5 (1): 62–9.
Thorne, S.E. (1990b) ‘Mothers with chronic illness: a predicament of social construction’, Health Care for Women International, 11: 209–21.
Thorne, S.E. (1990c) ‘Navigating troubled waters: chronic illness experience in a health care crisis’. Unpublished thesis, The Union Institute of Advanced Studies: Cincinnati.
Thorne, S.E. (1994) ‘Secondary analysis in qualitative research: issues and implications’, in Morse, J.M. (ed.), Critical Issues in Qualitative Research Methods. London: Sage. pp. 263–79.
Thorne, S.E. (1998) ‘Ethical and representational issues in qualitative secondary analysis’, Qualitative Health Research, 8 (4): 547–55.
Weaver, A. (1994) ‘Deconstructing dirt and disease: the case of TB’, in Bloor, M. and Taraborrell, P. (eds.), Qualitative Studies in Health and Medicine. Aldershot: Avebury. pp. 76–95.
Weaver, A. and Atkinson, P. (1994) Microcomputing and Qualitative Data Analysis. Aldershot: Avebury.
Yamashita, M. and Forsyth, D.M. (1998) ‘Family coping with mental illness: an aggregate from two studies, Canada and United States’, Journal of the American Psychiatric Association, 4 (1): 1–8.
31
Secondary Analysis of
Quantitative Data Sources
Angela Dale, Jo Wathan and
Vanessa Higgins
good practice. Good practice has two aspects – ensuring data are used in a responsible way that maintains the confidentiality of the respondents, and also good practice in terms of analysis. Finally, we review some of the new developments in access that have resulted from web technologies.

DATA AVAILABILITY

In this section we discuss what a data archive does, what types of data are available and provide some generic advice on how to find a dataset.

Data archives

Data archives play a fundamental role in making data available for secondary analysis. A data archive is a storehouse of digitised data. The archive performs a set of related functions which include obtaining data, assessing its suitability for release, checking the data, adding the necessary data description and documentation and preserving the data for future use. All archives have some form of catalogue. Large archives usually have sophisticated search facilities that allow you to browse through major studies, search abstracts for keywords and so forth.
Many countries now have either a national archive, or a small number of major archives. This development can be traced back to the establishment of the Roper Center in the United States in 1957 (Dale et al, 1988) which continues to be a major source of public opinion datasets. The Inter-university Consortium for Political and Social Research (ICPSR) followed in 1962 and houses a broad range of social data, mainly from academic and government sources. The UK Data Archive was formed in 1967 and has been centrally funded by the Economic and Social Research Council throughout this period. By the start of the century archives were widespread as illustrated in Box 31.1. This list, which is far from comprehensive, illustrates the extent to which archives are found world-wide. More extensive lists of national archives can be found from the websites of the International Federation of Data Organisations (IFDO see http://www.ifdo.org/) and Council of European Social Science Data Archives (CESSDA see http://www.cessda.org).
The archives listed in Box 31.1 are typically created in an academic environment with academic re-use in mind. However, many data collectors are also involved with data distribution. In the United States a range of microdata are available directly from the website of the Census Bureau, whilst many other statistical offices, such as the UK Office for National Statistics (ONS) and the National Institute for Statistics and Economic Studies (INSEE) in France, make summary statistics available online. The United Nations Statistics Division provides a listing of national statistics offices as well as links to other statistical databases (http://unstats.un.org/unsd/methods/inter-natlinks/sd_natstat.htm, last accessed 06/02/07).

What data are available?

The range of data that is available for a specific country will vary with historical and cultural factors but may include many of the types described in Box 31.2. Data from private sector sources, or business surveys, may also be available.

Locating a dataset

The following points provide some guidance on how to search for a dataset.

1 Searching your local data archive
The most obvious place to search for a dataset is in the archives of your own country. Such archives will almost certainly have a website with a searchable catalogue. The CESSDA (for Europe) or IFDO websites are helpful in locating national archives.
2 Looking for data from data collectors
If you know a dataset exists but cannot find it in your local data archive it is worth finding out who collected or commissioned the data. National statistical organisations and other major social survey organisations may be able to provide
you with access to the data or direct you to another organisation that disseminates data on their behalf.
3 Using a dataset from another country
Some datasets are restricted to users within the country of origin. However, the archive website will usually describe access conditions. Often your local data archive will be able to help you to obtain the dataset. CESSDA has an international data browser and search facility which enables users to explore a range of data published by major European national archives.
4 Does a dataset exist?
A literature search on your research topic is a good way to find out about data availability. If a major data source is available it is likely that someone will already have used it. A good knowledge of the literature will help you to identify the sorts of data sources that may be available.
5 Other information sources
You may find that there are other resources (often web-based) that can help with your search. The United Kingdom, for example, has a range of resources that can help you to locate potential data sources. A list of these is given in Box 31.3.

Box 31.1 Some key national, and other, major data archives

Microdata based on administrative records

There are a growing number of datasets that are constructed by linking together administrative records for the same individuals. However, they may not be listed in a data archive catalogue and will almost certainly only be available under restricted conditions. The use of administrative records for research has been pioneered by countries such as Norway, Denmark, Finland and the Netherlands. In these countries a single identification number which is used across a wide range of official records provides a basis of record linkage. In Denmark a unique ID is allocated to individuals at birth and is used by government departments responsible
for employment, taxation, benefits, education, housing and health. This has enabled the Danish statistical office to create a research database by linking together records for each individual in the country (Smith et al, 2004). This has, for example, been used to model the effect of proposed tax and benefit changes on different sections of the population. Similarly, Sweden has a longitudinal database for education, income and employment that was set up to support research on changes in the Swedish labour market during the 1990s. In Norway, Sweden and Denmark, and also England and Wales, linked records have long been used for analysis of the social determinants of mortality. These studies have all been based on evidence from death records, linked to other information from vital statistics and, in some cases, census data. In the UK there is a growing focus on realising the research benefits of record linkage across a much wider range of topic areas, although the absence of a single reliable ID which is used across all administrative records hampers progress.
In all cases, where administrative records are used for research there are major concerns over protecting the anonymity and confidentiality of the individuals in the database. This means that databases are very carefully
controlled by the relevant national statistical offices and research access is subject to very tight security measures (see section entitled ‘Advances in access to data and support’).

Question Bank
http://qb.soc.surrey.ac.uk
This service contains information on survey content and survey questions. There are also links to the Survey Link Scheme which enables researchers to attend a survey briefing, and often to shadow an interviewer in the field.
Intute
http://www.intute.ac.uk
Intute is a general resource which provides links to key websites.

ANALYTICAL AND RESEARCH VALUE

The quantitative data sources available for secondary analysis offer enormous potential for research on a wide range of topics. Whilst tabular data provide an excellent source of material for many purposes (for example, national censuses of population provide essential information on the structure of the population and, in particular, the characteristics of small areas), these aggregate sources do not allow the analyst the flexibility available with microdata. For example, access to microdata provides a much more extensive range of variables, usually in a great deal of detail. This allows the creation of new categorisations and new definitions appropriate to the research question, rather than using those defined by the survey commissioner. It also means that the data can be used in much more sophisticated analyses than is possible with tabular outputs.
Large quantitative surveys tend to be collected by agencies with well-established reputations for quality research, for example, the US Census Bureau or the UK Office for National Statistics. Rigorous methodologies and sophisticated sampling methods are employed and interviewers are trained extensively to ensure good quality data. Usually the survey process is well documented and data are carefully checked and edited. Access to these very expensive resources provides considerable benefit to secondary analysts. In the following paragraphs we briefly review some of the key research benefits from secondary analysis of microdata files.

Large and nationally representative samples

Secondary analysis can provide the basis for making generalisations to the population as a whole. Large government surveys are usually
The European Social Survey (ESS) provides a contrast in that it is explicitly designed to support international comparison. The survey started in 2001 and has been conducted every two years since. It covers over 20 nations and is designed to chart and explain the interaction between Europe’s changing institutions and the attitudes, beliefs and behaviour patterns of its diverse populations. Achieving equivalence across all countries participating in the study is a principle that is applied to sample selection, translation of the questionnaire, and to all methods and processes. All procedures and outcomes are comprehensively documented in a standard way. More information and direct download of data is available from: www.europeansocialsurvey.org.
Clark and Lelkes (2005) used the 2002–2003 ESS to show that religion acts as a buffer between stressful life events and the ensuing economic and social implications. All denominations suffer less psychological harm from unemployment than the non-religious. Catholics and Protestants are less hurt by marital separation than the non-religious but, while Protestants are protected against divorce, Catholics suffer a greater fall in life satisfaction than other groups.

Historical comparisons and change over time

Many of the examples in the earlier section also included a time dimension and secondary analysis may be the only means by which historical comparisons can be made for information that cannot be collected retrospectively. Data archives allow the researcher to go back in time and find sources of information on, for example, what people thought, how they voted and how much they earned. Many surveys, such as the British General Household Survey (GHS), which collects data on a range of topics covering household, family and individual information, have now been running for 30 years or more. These surveys have retained a high degree of consistency in their core questions and therefore support time series analyses (e.g. Dickens et al, 2000; Marmot, 2003) or before-and-after policy analysis (Gregg et al, 2005).
The General Social Survey has been conducted in the US by NORC since 1972 and provides information on the changing attitudes of the US population. In a similar vein, the British Social Attitudes Survey has been conducted annually since 1983 and provides a unique insight into how attitudes in Britain have changed over this time period. Both studies form part of a larger programme: the International Social Survey Programme (ISSP), which provides comparative data for up to 41 countries world-wide (www.issp.org).
Cross-sectional surveys do not follow the same individual over time so they cannot be used to analyse individual level change over time. However, change across aggregated groups can be analysed. For example, Payne and Payne (1994) used the Labour Force Survey for 1979–1989 to model trends in the work chances of unemployed people relative to the chances of people in work. Longitudinal data such as cohort studies or panel studies are required to compare individuals at different points in time.

Cohort studies

In the UK a succession of birth cohort studies have followed people born in 1946, 1958, 1970 and, most recently, 2000–2001. These studies have been repeated at intervals since birth and thus grow richer as the respondents grow older. For example, the 1958 cohort study sampled all those children born in Great Britain during one week in March 1958 and conducted follow-up surveys of sample individuals at key stages (e.g. ages 7, 11, 16, 23, 33, 42). It is expected that all these cohort studies will continue throughout the lifetime of their members.
Longitudinal birth cohort studies are valuable for investigating the lifetime processes of individuals. For example, using the 1958 cohort, Butler et al (1971) identified the effect of smoking on low birth weight and perinatal mortality; Hobcraft and Kiernan (2001) showed that any experience of childhood poverty is clearly associated with
adverse outcomes in adulthood; and Elias and Blanchflower (1988) demonstrated the impact of early school achievement on occupational attainment.
A single cohort study is clearly limited in its ability to say anything about how outcomes vary between different cohorts. However, the ability to compare a number of cohorts born at time intervals from 1946 to 2000 becomes a very powerful analysis tool. Ferri et al (2003) provide an accessible account of cohort differences based on analysis of the 1946, 1958 and 1970 cohorts. Topics include: family and parenting, qualifications and employment, income and living standards, physical and mental health, lifestyles, health and citizenship. An account of the first findings from the Millennium cohort (births from 2000 to 2001) is given by Dex and Joshi (2005).
Panel studies
Panel studies such as the US Panel Study of Income Dynamics (PSID) and the British Household Panel Survey (BHPS) cover all ages, and are repeated at frequent intervals, usually annually. Whereas cohort studies are primarily suited to understanding developmental processes over a life course, a panel study is able to show the effect of short-term changes in levels of income, household composition and changes in the economy. For example, Jarvis and Jenkins (1999) use the BHPS to show the impact of marital break-up on income whilst Jenkins and Van Kerm (2006) examine trends in income inequality and income mobility. The similarity between PSID and BHPS lends support to comparative analyses between the US and Britain – for example Banks et al’s (2003) comparison of financial wealth inequality between these two countries. Both PSID and BHPS provide a wealth of information to support users and have published collections of papers that demonstrate very fully some of the research strengths of the data. Five Thousand American Families captures the first 13 years of PSID and is now available on-line from: www.psidonline.isr.umich.edu, whilst Berthoud and Gershuny (2000) provide analyses based on the first seven years of BHPS.
Small population sub-groups
Secondary analysis can provide a means of obtaining data on small groups within the population for whom there is no obvious sampling frame. However, a dataset must be large enough to ensure that sufficient numbers of the sub-group can be located, and should also be able to provide a representative sample. Some surveys occasionally have special boost samples for sub-groups; for example, the Health Survey for England contained ethnic minority boosts in 1999 and 2004 (Erens et al, 2001; Sproston and Mindell, 2006). The survey results highlighted some interesting ethnic differences in health outcomes. Bangladeshi and Pakistani men and women, and Black Caribbean women, were more likely than the general population to report that they had bad or very bad health. In relation to the general population (set at 1.0) the risk ratios for bad or very bad health were 3.77 for Bangladeshi men, 4.02 for Bangladeshi women, 2.33 for Pakistani men, 3.54 for Pakistani women, and 1.90 for Black Caribbean women (Sproston et al, 2006).
Additionally, datasets with comparable questions and data collection methods can be pooled to increase sample sizes. For example, Ginn and Price (2002) pooled a number of annual GHS datasets to look at the subpopulations of divorcees. Many analysts pool a number of years from the Labour Force Survey to allow analysis of ethnic minorities (Dale et al, 2006). When data are being pooled over successive years it is vitally important to check that there are no changes in sampling design, question wording or categorisation.
Relationships within households
Many datasets collect information about all members in the household, for example most of the UK government surveys, BHPS, PSID and the SIPP. This is valuable for analysing intra-household relationships and
supports research concerned with, for example, the impact of a partner’s characteristics on women’s employment. Other levels of analysis may also be possible, for example it is often possible to identify a family unit or, in the case of the UK Family Resources Survey, a social-security benefit unit.
Combining survey analysis with qualitative research
New data dissemination and analysis tools make it easier than in the past to conduct secondary analysis as part of a mixed methods approach. There are many ways in which this might be undertaken (Bryman, 1988).
• Secondary analysis can provide evidence to help in planning a qualitative study. For example, analysis of census data can help to target which geographical areas to use in an interview-based study.
• Secondary data can provide a nationally representative context for a small-scale study, such as a locality-based study or a study of divorcees or lone fathers.
• Qualitative studies are often very important in explaining relationships which are identified by quantitative analysis (for example, the low levels of economic activity amongst some groups of South Asian women in the UK; see Dale et al, 2006).
• Secondary analysis can often be used to test theories generated as the result of qualitative studies.
WHAT ARE THE METHODOLOGICAL ISSUES ASSOCIATED WITH SECONDARY ANALYSIS?
One of the earliest examples of secondary analysis is Durkheim’s classic study of suicide – routinely cited as the archetype of positivistic research in which the administrative records of suicides were treated as ‘social facts’ to be studied as ‘things, that is as realities external to the individual’ (Durkheim, 1952: 38). Whilst Durkheim’s study demonstrated that evidence on suicide rates showed relationships with particular characteristics of the individuals concerned, interpretivists pointed out that these social ‘facts’ were, in themselves, artefacts that resulted from socially mediated processes. For example, whether a suicide is recorded is influenced by legislation and coroner decisions (Atkinson, 1977). This prototype of secondary analysis not only came to be associated with positivism but also with a lack of reflection on data sources.
However, critical secondary analysts should now be aware that survey data are socially constructed artefacts of the processes that produced them. In this sense, secondary analysis is no different from other forms of social research. The results of a qualitative study based on in-depth interviews are, similarly, a product of the relationship between the subject and researcher, the researcher’s interpretation of that interaction, and the choices made over which aspects of the research to report.
However, secondary analysis is usually undertaken by researchers who did not conduct the primary data collection. For this reason they have a more distant relationship to the data and may not, therefore, fully appreciate the processes by which the data were constructed. Therefore it is vital that analysts find out as much detail as possible about how the survey was conducted and the strengths and limitations of the dataset. Axinn and Pearce (2006: 23) make a number of valuable suggestions for ways in which the secondary analyst can learn about the process of collecting the data. These include using a copy of the survey questionnaire to interview someone and then getting them to interview you, and visiting the organisation that collected the data and inspecting fieldwork notes to learn about the problems that occurred during fieldwork.
The secondary analyst is usually trying to answer a rather different research question than the primary data analyst. For example, data may have been collected by a government department to address a particular policy requirement and concepts will, therefore, reflect this. The secondary analyst needs to work through their own conceptual definitions
before starting the study rather than accepting, uncritically, those of the primary data collector. Often it is possible to combine data elements in new ways to construct the desired definitions. Where the data are less than ideal it is valuable to explain the shortcomings and seek evidence of how this may affect the results.
One of the benefits of secondary analysis is that documentation and data are available for others to use. This means that the research results can be critically assessed by other researchers and analyses replicated, perhaps using alternative assumptions or different models.
ETHICS IN SECONDARY ANALYSIS
At first sight secondary analysis may appear to bypass all the ethical issues that arise at the data collection stage of a study. The primary investigators will have been responsible for obtaining appropriate ethical approval for the study and made decisions about their procedures for informed consent and for protecting the confidentiality of the respondent. Data collection agencies take great care to ensure that procedures conform to high ethical standards. Many national statistical offices collect data under statutory requirement and, in these cases, the security of the data is governed by law. However, for all data collection agencies, maintaining the confidentiality of their respondents is of huge importance and a breach of confidentiality may have negative consequences for the respondent as well as a negative impact on the public’s willingness to participate in such studies.
Even though secondary analysts may not face these obligations at the point of data collection, they inherit responsibilities as a result of access to the data and must cooperate in ensuring the confidentiality of the data. In some cases this means that a researcher will not be able to obtain as much detailed data as wished. The relationship between the amount of detail released and the restrictions on access is discussed further in the section entitled ‘Advances in access to data and support’.
A further set of obligations arises with respect to professional conduct. Even though there are no specific guidelines on secondary research, the codes of professional organisations, whose remit covers secondary analysts, share some common features. Table 31.1 gives the common features of the codes of the British Social Research Association (SRA) (e.g. 2003), British Sociological Association (BSA) (e.g. 2002) and the Royal Statistical Society (RSS) (1993). These include maintaining awareness of necessary law and legislation, reporting the limitations of your data and method, respecting privacy and maintaining confidentiality of data.
Table 31.1 A comparison of the ethical codes of the British Sociological Association, Royal Statistical Society and Social Research Association
Conduct RSS BSA SRA
Ensure that you know the relevant law & regulations – abide by these ✓ ✓ ✓
Freely given informed consent wherever possible; be aware of power issues, explain the research fully and uses of data produced ✓ ✓
Do not produce misleading research; honestly & proportionately state problems and limitations of your data and method. Distinguish interpretation of results from opinion. Give readers enough information to assess the quality of work ✓ ✓ ✓
Seek to upgrade your own skills ✓
Only do research work that you are competent to do ✓ ✓
Respect privacy – don’t unnecessarily intrude on subjects ✓ ✓
Consider the effects of your research, including publication; minimise harm to research participants and self ✓ ✓
Maintain confidentiality of data – and inform research participants about the use to which data will be put ✓ ✓ ✓
against reading too much into small effects, even if statistically significant, and emphasise the importance of replication to see whether the explanatory variable in question is found repeatedly in independent studies.
ADVANCES IN ACCESS TO DATA AND SUPPORT
The technical developments of the web and remote access to data are mirrored in moves towards distributed services: that is, a service that is located at more than one physical site. The UK ESDS is one such service with specialist functions run by teams at two universities which are 200 miles apart. This geographical distance should not be apparent to users who access a single website and who are supported by a joined-up helpdesk. However, the ability to run a distributed service means that it can benefit from specialist groups irrespective of their geographical location. This has the potential to add more expertise to the service than would be possible if all staff were required to be in the same institution. In this sense ESDS might be considered to set a new standard in data services.
A second area of development builds on the potential for networked technology to provide linkages without the constraints of geography. Grid technology moves beyond the internet and provides the means for users to benefit from increased data storage facilities and processing power. The Grid offers the potential for researchers to link data from different sources, held at a range of locations (perhaps still within the control of the data collector), with prescribed access conditions, and then to analyse these using data processing power from one or more servers. In the UK, the academic community has made some investment in pilot projects to establish areas of potential development. Grid technology has been used to provide virtual meeting spaces called ‘access grid nodes’ which have been used for meetings between partners in the ESDS distributed data service. From a secondary analysis perspective, the potential of the grid might be to provide a controlled environment within which disclosive data could be analysed without the need to distribute the microdata to users. The processing capacity can also be harnessed for processing very large datasets and conducting very power-hungry analyses (Smith, 2004). These and other functions are being championed by the UK National Centre for e-Social Science (www.ncess.ac.uk).
The increased power of the web, including its search facilities, and the increased level of data availability in general, have led to growing concerns over ensuring the confidentiality of data for secondary analysis. This applies particularly to microdata where a great deal of information about an individual may be contained in a single record. By contrast, it is much harder to identify someone using aggregate statistics (e.g. a table from the Census).
We can define two interacting dimensions when considering access to data – the level of safety associated with the dataset and the level of safety associated with the access setting. The level of safety associated with the dataset will depend heavily on the degree of detail in the data, the proportion of the population in the sample and the ease of identifying the data – either through matching or spontaneous recognition. Thus a small sample with very restricted individual detail and little geographical information will be much ‘safer’ than a very large sample containing detailed individual information (e.g. occupation, educational qualification, ethnic group) and also information on the locality of residence. The level of safety associated with the access setting will range from a safe setting within a statistical office at one extreme, to unrestricted access to data by any user, at the other extreme.
The two dimensions interact so that, at one extreme, if the data are judged to be entirely safe, then the access arrangements can be very open. This is exemplified by the Public Use Microdata Files produced by the US Bureau of the Census, which can be downloaded without restriction from the website of the US Bureau of the Census. These files are samples – 1 percent and 5 percent – where
the amount of both individual detail and geographical information has been restricted to preserve confidentiality (Census Bureau, 2005).
By contrast, if the data are very detailed and/or contain information that could be readily used to identify someone, then greater safety needs to be built into the access conditions. An example is the ONS Longitudinal Study that contains data with a great deal of individual and geographical detail drawn from the census and from vital events (e.g. birth and death records). For this dataset access is only available within a secure setting inside the Office for National Statistics. An alternative is a remote access facility such as that used for the Luxembourg Income Study. Here researchers send requests for analysis (in the form of SPSS or Stata programs) which are run and checked before the non-disclosive results are emailed to the researcher.
In practice most data are made available under some kind of licence whereby the user agrees to a set of conditions designed to ensure the confidentiality of the data. However, as researchers need more detailed data, for example including information on locality, or more datasets are produced by linking administrative records, then they will be subject to tighter controls.
NEW DEVELOPMENTS IN METHODS
Developments in statistical analysis now provide more opportunities for building models that reflect some of the complexities of social life – for example, analysis of children’s attainment in school. Multilevel models allow one to define children by the class they are in (and the characteristics of their class teachers); the school they attend (and the characteristics of the school); and also the catchment area of the school. All these different levels are known to affect a child’s attainment and their impact can be modelled. Similarly, multilevel models can improve analyses of unemployment, for example, by allowing information about the local labour market to be included in the model. Software for multilevel modelling is now readily available (e.g. MLwiN, Stata, SAS) and there is abundant provision of training to help the new user (see www.ncrm.ac.uk/database).
Structural equation modelling (SEM) allows much greater flexibility in defining models than standard multiple regression. It introduces the concept of the latent variable which has multiple indicators and can correct for some of the measurement error in standard regression analysis. Models tested may have complex causal pathways, often with a two-way direction of causality. SEM allows specific pathways in the model to be tested as well as an overall test of a model. As for multilevel modelling, software for using SEM is becoming much more widely available and, for both, there are a growing number of courses and on-line resources.
Other developments in secondary analysis relate to the linkage of additional data sources to supplement or augment individual level records collected by a survey. The simplest example is where aggregate information about a respondent’s locality is attached to that individual (for example, area-level statistics from the census) and this can then be used to explain variation at the level of the locality in a multilevel model. In addition, external data may be matched to an individual, for example tax returns may be used to provide accurate information on earnings. This has been introduced in the Canadian Survey of Labour and Income Dynamics (SLID), where respondents can choose between providing detailed information on income or allowing this to be obtained from their tax return. This uses an exact matching method where it is vitally important to have identical keys in both data sources. An alternative that is used where income data cannot be obtained from the respondent is to add an estimated value to each individual. For example, a survey which does contain the required income information may be used to identify a set of explanatory variables that predict income well. If these explanatory variables are also contained in the dataset without income, then they can be used as a basis for predicting the expected income of each individual.
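The idea of predicting income from shared explanatory variables can be illustrated with a minimal sketch. It assumes a very simple prediction rule, the mean income within each combination of explanatory variables in the survey that records income, as a stand-in for whatever regression model an analyst would actually fit. All variable names and values (`occupation`, `degree`, the income figures) are hypothetical and are not taken from any of the surveys discussed.

```python
from collections import defaultdict
from statistics import mean


def fit_cell_means(donor_rows, predictors, target="income"):
    """Mean of the target within each combination of predictor values."""
    cells = defaultdict(list)
    for row in donor_rows:
        key = tuple(row[name] for name in predictors)
        cells[key].append(row[target])
    return {key: mean(values) for key, values in cells.items()}


def predict_income(recipient_rows, cell_means, predictors):
    """Attach an expected income to each record of the survey lacking income.

    Records whose predictor combination never occurs in the donor survey
    receive None rather than a guessed value.
    """
    predicted = []
    for row in recipient_rows:
        key = tuple(row[name] for name in predictors)
        predicted.append({**row, "expected_income": cell_means.get(key)})
    return predicted


# Hypothetical donor survey: contains income and the explanatory variables.
donor = [
    {"occupation": "manual", "degree": False, "income": 18000},
    {"occupation": "manual", "degree": False, "income": 22000},
    {"occupation": "professional", "degree": True, "income": 40000},
    {"occupation": "professional", "degree": True, "income": 44000},
]

# Hypothetical recipient survey: same explanatory variables, no income.
recipient = [
    {"id": 1, "occupation": "manual", "degree": False},
    {"id": 2, "occupation": "professional", "degree": True},
    {"id": 3, "occupation": "professional", "degree": False},
]

PREDICTORS = ("occupation", "degree")
cell_means = fit_cell_means(donor, PREDICTORS)
imputed = predict_income(recipient, cell_means, PREDICTORS)
```

The third recipient record has a combination of characteristics that the donor survey never observed, so no value is assigned. A regression model would extrapolate in such a case, which is one reason the choice of explanatory variables and model matters.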
Hakim, C. (1982) Secondary Analysis of Social Research. London: George Allen and Unwin.
Hobcraft, J., & Kiernan, K.E. (2001) Childhood poverty, early motherhood and adult social exclusion, British Journal of Sociology, Vol. 52, No. 3, pp. 495–517.
Jarvis, H., & Jenkins, S.P. (1999) Marital splits and income changes: Evidence from the British Household Panel Survey, Population Studies, Vol. 53, No. 2, pp. 237–254.
Jenkins, S.P., & Van Kerm, P. (2006) Trends in income inequality, pro-poor income growth and income mobility, Oxford Economic Papers, Vol. 58, No. 3, pp. 531–548.
Kenward, M., & Carpenter, J. (2005) Missing Data Methodology for Multilevel Models, Methods Briefing 5, www.ccsr.ac.uk/methods/publications/documents/kenward_000.pdf.
Kiecolt, K.J., & Nathan, L.E. (1986) Secondary Analysis of Survey Data. London: Sage.
Luxembourg Income Study Project http://www.lisproject.org/.
Marmot, M. (2003) Monitoring Socio-economic Differences in Health, presentation to the Health Surveys User Group, RSS, London, 11/07/03, slides available at www.esds.ac.uk
Payne, J., & Payne, C. (1994) Recession, restructuring and the fate of the unemployed: Evidence in the underclass debate, Sociology, Vol. 28, No. 1, pp. 1–19.
Plewis, I. (2004) Weighting for Non-response: Illustrative Examples www.ccsr.ac.uk/esds/events/2004-03-12/documents/plewisexamleshandout.doc
Rainwater, L., & Smeeding, T.M. (2003) Poor Kids in a Rich Country: America’s Children in Comparative Perspective. New York: Russell Sage Foundation.
Royal Statistical Society (1993) Royal Statistical Society Code of Conduct. London: RSS.
Samples of Anonymised Records: http://www.ccsr.ac.uk/sars
Smith, S. (2004) Grid Enabling the SARs. Manchester: Centre for Census and Survey Research, http://www.ccsr.ac.uk/sars/publications/
Smith, G., Noble, M., Anttila, C., Gill, L., Zaidi, A., Wright, G., Dibben, C., & Barnes, H. (2004) The Value of Linked Administrative Records for Longitudinal Analysis, Report to the ESRC National Longitudinal Strategy Committee.
Social Research Association (2003) Ethical Guidelines. London: Social Research Association.
Sproston, K., & Mindell, J. (eds) (2006) Health Survey for England 2004. Volume 1: The Health of Minority Ethnic Groups. London: The Information Centre.
32
Conducting a Meta-Analysis
Erika A. Patall and Harris Cooper
A literature review typically summarizes results of past studies, suggests potential reasons for inconsistencies in past research findings, and directs future investigations. Researchers often use a narrative approach to summarize and integrate research on a specific topic. The traditional narrative reviewer identifies articles relevant to the topic of interest, examines the results of each article to see whether the hypothesis was supported, and provides an overall conclusion.
Traditional narrative reviews have been criticized because, although they can provide a meticulous list of multiple tests of a hypothesis, they often fail to fully and accurately integrate the conclusions contained in them (Hunt, 1997). Narrative reviews are prone to allowing the biases of the reviewer to enter into conclusions, because information in the original studies can be discarded or improperly weighted.
More recently, systematic research syntheses that include meta-analyses have taken the place of purely narrative reviews of empirical literature. Meta-analysis is ‘the statistical synthesis of the data from separate but similar, i.e. comparable studies, leading to a quantitative summary of the pooled results’ (Last, 2001). Even though meta-analysis has the same goals as the traditional narrative review, many limitations of the narrative review can be addressed by using statistical procedures to combine the results of previous studies. For example, one advantage of quantitative synthesis was demonstrated empirically in a study by Cooper and Rosenthal (1980). Faculty members and graduate students were asked to draw summary conclusions using either a meta-analytic or narrative approach about studies that tested whether females showed greater persistence at tasks than males. Results showed that narrative review procedures led to inaccurate or imprecise characterizations of the cumulative research results; in particular, reviewers using a qualitative approach underestimated the size of the effect.
In this chapter we provide a framework for understanding meta-analysis. First, major meta-analytic procedures are described. This is followed by a discussion of the major challenges that face the meta-analyst and some new directions in the development of meta-analytic methods.
the degree of linear relation between two variables. The correlation coefficient is familiar to most researchers and is most appropriate when describing the relationship between two continuous variables.
Information such as the variances and covariances necessary to calculate a correlation coefficient is rarely provided in primary research reports. Luckily, most researchers provide r-indexes in cases where they apply. When only the t-value associated with the r-index is given, the r-index can be calculated with the following formula:
$$r = \sqrt{\frac{t^2}{t^2 + df_{error}}}$$
where all terms are defined as before. However, it should be noted that this formula will always produce a positive value. Consequently, the researcher should seek additional information in the primary research report, such as a verbal description of the relationship, which would allow the direction of the relationship to be determined.
The odds ratio
The odds ratio is applicable when both variables are dichotomous and findings are presented as frequencies or proportions. This measure of effect is used most in medical sciences, in which the researcher is often interested in the effect of a treatment on mortality or the appearance or disappearance of disease. It also appears frequently in studies of educational interventions when the outcome of interest is drop-out or retention rates or criminal justice studies where the outcome is recidivism. Take, for example, a case in which we are interested in whether playing pool with friends led to subsequent arrest. Suppose that the meta-analyst came across a study in which 200 people either played pool with friends or did not and then examined evidence for arrests later that night. The results of the study could have looked like the fictional data presented in Table 32.2.
Table 32.2 An example of odds ratio estimation
              Pool playing   Control
Arrested      a = 75         b = 60
Not arrested  c = 25         d = 40
First, the odds that a participant was arrested must be determined for each condition. When participants played pool, the odds of arrest were 3 to 1 (75 to 25). When participants did something else, the odds of arrest were 1.5 to 1 (60 to 40). The meta-analyst then simply forms the ratio of the playing-pool odds over the control activities odds. In this case, the odds ratio is 2, meaning the odds of arrest are twice as large in the pool-playing condition as in the control condition. The odds ratio can also be calculated by dividing the product of the main diagonal elements by the product of the off-diagonal elements. In this example, using the previously described formula,
$$OR = \frac{ad}{bc} = \frac{75 \times 40}{60 \times 25} = 2$$
where all terms are defined in Table 32.2.
Identifying independent samples
A statistical problem arises when a single study contains multiple effect size estimates taken on the same sample of participants. There are several approaches meta-analysts use to handle such dependent effect sizes. Some treat each effect size as independent, regardless of the number of effect sizes that come from the same sample of people. The strength of this technique is that it does not lose any of the within-study information regarding potential moderators. However, this strategy violates the assumption that the estimates are independent. This may cause the standard error associated with the overall effect to be underestimated and the robustness of the effect to be exaggerated. Further, the results of studies will not be weighted equally in any overall conclusion about results. Rather, studies will contribute to the overall effect in relation to the number of statistical tests contained in them.
Other meta-analysts use the study as the unit of analysis. They calculate the mean
effect size, or take the median result, or weight in calculating the average effect. In the
identify a preferred outcome measure, and use weighted procedure, each independent effect
this value to represent the study. This strategy ensures that the assumption of independence is not violated and that each study contributes equally to the overall effect. However, some within-study information may be lost in this approach.

Sophisticated statistical models also have been suggested as a solution to the problem of dependent effect size estimates (Gleser & Olkin, 1994; Raudenbush et al., 1988), but due to their complexity they are still rarely found in practice.

A compromise solution is to use a shifting unit of analysis (Cooper, 1998). In this procedure, each effect size is coded into the dataset as if it were an independent estimate. For example, if a study of playing pool used both the Satisfaction with Life Scale (Diener et al., 1985) and the Subjective Happiness Scale (SHS) (Lyubomirsky & Lepper, 1999) to measure well-being, two separate d-indexes would be calculated. In the shifting unit of analysis approach, for estimating the overall relation between playing pool with friends and well-being, statistical independence is maintained by averaging these two d-indexes prior to entry into the analysis, so that the study contributes only one effect size. However, in an analysis that examined the effect of measurement characteristics on effect size, each sample would contribute one estimate to the effect size for life satisfaction measures and one to the effect size for happiness measures. This shifting unit of analysis approach retains as much data as possible from each study while holding to a minimum violations of the assumption that data points are independent.

Averaging effect sizes

The most pivotal outcomes of a meta-analysis are the average effect sizes and measures of dispersion that accompany them. Both unweighted and weighted procedures are typically used to calculate average effect sizes across comparisons. In the unweighted procedure, each effect size is given equal

[...] size is first multiplied by the inverse of its variance and the sum of these products is then divided by the sum of the inverses. The weighting procedure is generally preferred because it gives greater weight to effect sizes based on larger samples, and larger samples provide more precise estimates of the population value. Also, confidence intervals are calculated for weighted average d-indexes and used as a test of the null hypothesis that no relation exists in the population. Hedges and Olkin (1985), Shadish and Haddock (1994), and Lipsey and Wilson (2001) provide procedures for calculating the appropriate weights and confidence intervals.

For the d-index this procedure requires the meta-analyst to calculate a weighting factor, $w_i$, which is the inverse of the variance associated with each d-index estimate:

$$w_i = \frac{2(n_{i1} + n_{i2})\, n_{i1} n_{i2}}{2(n_{i1} + n_{i2})^2 + n_{i1} n_{i2} d_i^2}$$

where $n_{i1}$ and $n_{i2}$ represent the number of data points in Group 1 and Group 2 of the comparison and $d_i$ represents the d-index of the comparison under consideration. Table 32.3 presents the group sample sizes, d-indexes, and $w_i$ associated with each comparison from our fictional pool-playing and well-being example. The next step in obtaining a weighted average effect size involves multiplying each d-index by its associated weight and dividing the sum of these products by the sum of the weights. The formula is:

$$d_\bullet = \frac{\sum_{i=1}^{N} d_i w_i}{\sum_{i=1}^{N} w_i}$$

where all terms are defined as before. Table 32.3 shows the average weighted d-index for the eight comparisons was found to be d = 0.21.

Finally, the confidence interval around the average effect size estimate can be calculated.
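As a hedged illustration of the weighting procedure for the d-index, the short Python sketch below computes the weighting factor $w_i$ and the weighted average d-index. The function names (`d_index_weight`, `weighted_average`) and the sample sizes and d-indexes in the example are hypothetical; they are not the values from Table 32.3.

```python
def d_index_weight(n1, n2, d):
    """Inverse-variance weight w_i for a d-index based on groups of size n1 and n2."""
    return (2 * (n1 + n2) * n1 * n2) / (2 * (n1 + n2) ** 2 + n1 * n2 * d ** 2)


def weighted_average(effects, weights):
    """Sum of (effect * weight) divided by the sum of the weights."""
    return sum(e * w for e, w in zip(effects, weights)) / sum(weights)


# Hypothetical comparisons as (n1, n2, d); illustrative values only.
comparisons = [(50, 50, 0.10), (20, 20, 0.60), (200, 200, 0.25)]

weights = [d_index_weight(n1, n2, d) for n1, n2, d in comparisons]
effects = [d for _, _, d in comparisons]

d_bar = weighted_average(effects, weights)   # weighted average d-index
unweighted = sum(effects) / len(effects)     # simple mean, for comparison
```

In this made-up example the largest study carries most of the weight, so the weighted average is pulled toward its d-index of 0.25 and away from the simple mean.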
$$CI_{d_\bullet 95\%} = 0.21 \pm 1.96 \sqrt{\frac{1}{171.89}} = 0.21 \pm 0.15$$

$$Q_t = 37.34 - \frac{36.44^2}{171.89} = 29.62$$

$$Q_w = 13.89 + 14.72 = 28.61$$

$$Q_b = 29.62 - 28.61 = 1.01$$
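The confidence-interval arithmetic shown above (0.21 ± 0.15) can be checked with a minimal Python sketch. The function name `confidence_interval` is hypothetical; the inputs are the average d-index (0.21) and the sum of the weights (171.89) reported for the example.

```python
import math


def confidence_interval(mean_effect, sum_of_weights, z=1.96):
    """Confidence interval for a weighted average effect size.

    The variance of the weighted average is the inverse of the sum of the
    weights; its square root is multiplied by the z score for the interval.
    """
    half_width = z * math.sqrt(1.0 / sum_of_weights)
    return mean_effect - half_width, mean_effect + half_width


# Values reported for the pool-playing example.
lower, upper = confidence_interval(0.21, 171.89)

# Both limits share a sign exactly when the interval excludes zero,
# which is the condition for rejecting the null hypothesis here.
excludes_zero = lower * upper > 0
```

Rounded to two decimals, the limits agree with the interval of 0.06 to 0.36 reported in the text.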
First, the inverse of the sum of the $w_i$s is found. Then, the square root of this variance is multiplied by the z score associated with the confidence interval of interest. Thus, the formula for a 95% confidence interval would be:

$$CI_{d_\bullet 95\%} = d_\bullet \pm 1.96 \sqrt{\frac{1}{\sum_{i=1}^{N} w_i}}$$

where all terms are defined as before. The 95% confidence interval for the eight pool-playing comparisons includes values of the d-index 0.15 above and below the average d-index. Thus, we expect 95% of estimators of this effect to fall between d = 0.06 and d = 0.36. Note that the interval does not contain the value d = 0. It is this information that can be taken as a test of the null hypothesis that no relation exists in the population. In this example, we would reject the null hypothesis that there is no difference in well-being between people who play pool with friends and those who do not.

A parallel procedure is conducted to find the average weighted r-index and confidence interval. However, because the sampling distribution for r is not symmetrical except when ρ equals 0, first r is transformed to its corresponding z score (Hedges & Olkin, 1985; Lipsey & Wilson, 2001; Rosenthal, 1994), $z_i$, using the following formula:

$$z_i = \frac{1}{2} \log_e \left( \frac{1 + r}{1 - r} \right)$$

where r is the correlation coefficient and $\log_e$ is the natural logarithm. Next, the following formula is applied to compute the average weighted z:

$$z_\bullet = \frac{\sum_{i=1}^{N} (n_i - 3) z_i}{\sum_{i=1}^{N} (n_i - 3)}$$

where $n_i$ represents the total sample size for the ith comparison and all other terms are defined as before. For the confidence interval, the formula is:

$$CI_{z_\bullet 95\%} = z_\bullet \pm \frac{1.96}{\sqrt{\sum_{i=1}^{N} (n_i - 3)}}$$

where all terms are defined as before. Finally, to present results, z is transformed back to the original r metric using the inverse of Fisher’s z to r transformation (Lipsey & Wilson, 2001):

$$r = \frac{e^{2z_i} - 1}{e^{2z_i} + 1}$$
where e is the base of the natural logarithm (2.718) and all other terms are defined as before.

Like the correlation coefficient, the odds ratio must also be transformed by taking the natural logarithm (Haddock et al., 1998; Lipsey & Wilson, 2001):

$$LOR = \log_e(OR)$$

Next, a weighting factor, $w_i$, which is the inverse of the variance associated with each logged odds ratio, is calculated using the following formula:

$$w_i = \frac{abcd}{ab(c + d) + cd(a + b)}$$

where all terms are defined in Table 32.2 illustrating the odds of being arrested after playing pool with friends.

The next step in obtaining a weighted average effect size involves multiplying each logged odds ratio by its associated weight and dividing the sum of these products by the sum of the weights. The formula to calculate the weighted average logged odds ratio is:

$$LOR_\bullet = \frac{\sum_{i=1}^{N} LOR_i w_i}{\sum_{i=1}^{N} w_i}$$

where $LOR_i$ represents the logged odds ratio for the ith comparison and all other terms are defined as before. For the 95% confidence interval, the formula is:

$$CI_{LOR_\bullet 95\%} = LOR_\bullet \pm 1.96 \sqrt{\frac{1}{\sum_{i=1}^{N} w_i}}$$

where all terms are as defined before. Finally, these summary statistics can be converted back to the original odds ratio metric by taking the antilogarithms.

$$OR = e^{LOR}$$

It should be noted that if any of the cell frequencies equal zero, 0.5 should be added to every cell. Even though this solution solves the problem of having cell frequencies equal to zero, this strategy will bias the estimate such that the strength of the relationship will be slightly underestimated (Fleiss, 1994). When only a few contingency tables contain zeros, this solution is acceptable. However, if there are many cases in which cell frequencies are equal to zero, the Mantel-Haenszel method of combining odds ratios should be used (Hauck, 1989). The interested reader may refer to Lipsey and Wilson (2001) or Shadish and Haddock (1994) for a full discussion of this method.

Models of error

Another aspect of conducting a meta-analysis that has recently received considerable attention involves the decision about whether a fixed effects or random effects model of error underlies the generation of study outcomes. In a fixed effects model, all studies are assumed to be drawn from a common population and are therefore estimating a common population effect. As such, variance in effect sizes is assumed to reflect only sampling error, that is, error solely due to participant differences. This type of error is the only error taken into account using the procedures just described for weighting effect sizes by sample size.

However, sometimes other features of studies can be viewed as random influences. For example, studies that look at the impact of pool playing on well-being might vary in the types of pool halls in which the studies were conducted, in the length of play, and in the game of pool being played. In this case, it may be most appropriate to consider pool halls as randomly sampled from all pool halls and pool games as randomly sampled from all games. That is, in a random-effect analysis, study-level variance is assumed to be present as an additional source of random influence.

The question each meta-analyst must ask is whether the effect sizes in a dataset are affected by a large number of these study-level random influences. If it is the case that the meta-analyst suspects a large number of these additional sources of random error
in effect sizes, then a random effects model is most appropriate in order to take these sources of variance into account. If the meta-analyst suspects that the data are most likely little affected by other sources of random variance, then a fixed effects model can be applied. Alternatively, Hedges and Vevea (1998, p. 3) state that fixed-effect models of error are most appropriate when the goal of the research is ‘to make inferences only about the effect size parameters in the set of studies that are observed (or a set of studies identical to the observed studies except for uncertainty associated with the sampling of subjects).’ A further statistical consideration is that in the search for moderators, fixed effect models may seriously underestimate error variance and random effects models may seriously overestimate error variance when their assumptions are violated (Overton, 1998).

In view of these competing sets of concerns, we recommend that the meta-analyst consider applying both models (e.g. Cooper et al., 2006). Specifically, all analyses could be conducted twice, once employing fixed effect assumptions and once using random effect assumptions. Differences in results based on which set of assumptions is used can be incorporated into the interpretation and discussion of findings.

Formulas to calculate random effects estimates of the mean effect size, confidence intervals, and homogeneity statistics are complex and involve a two-stage process. As such, the interested reader should refer to Hedges and Olkin (1985), Raudenbush (1994), and Lipsey and Wilson (2001) for a full discussion of random effects computation. In addition, several statistical packages have recently been developed specifically for meta-analysis that allow the meta-analyst to easily conduct analyses using both fixed and random effects assumptions (e.g. Comprehensive Meta-Analysis; Borenstein et al., 2005). For the remainder of this chapter, random effect estimates will be presented for our running example, although formulas and computations will not be shown.

In our fictional meta-analysis of the effect of playing pool with friends on well-being, the effect size estimate under a fixed effects model was d = 0.21 with a 95% confidence interval from 0.06 to 0.36. However, when a random effects model was used, the estimate was d = 0.31 with a 95% confidence interval from −0.01 to 0.63. Note that the mean estimate of d changes using the random-effect error model, because of a changed (lesser) effect of weighting studies by sample size on the result. Note also that in the random-effects error model, the variance around the mean estimate increases and the combined result of pool-playing studies no longer rejects the null hypothesis. In this case then, caution must be taken when considering the interpretation of the result that playing pool with friends has a positive effect on well-being, given that the effect is statistically different from zero only when a fixed-effects model is assumed.

Homogeneity of effect sizes

In addition to the confidence interval as a measure of dispersion, meta-analysts usually carry out homogeneity analyses. Homogeneity analyses allow the meta-analyst to explore whether effect sizes vary from one study to the next. A homogeneity analysis compares the amount of variance in an observed set of effect sizes with the amount of variance that would be expected by sampling error alone and provides a calculation of how probable it is that the variance exhibited by the effect sizes would be observed if only sampling error was making them different. If there is greater variation in effects than would be expected by chance, then the meta-analyst can begin the process of examining moderators of comparison outcomes. If the observed variance is not significantly different from that expected by sampling error alone, many statisticians advise the meta-analyst to stop the analysis there and not look for moderators. After all, chance is the most parsimonious explanation for the variation in effect sizes. We recommend that the meta-analyst may search for moderators in the absence of a statistically significant homogeneity analysis if there are good theoretical reasons for doing so.
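A homogeneity analysis of this kind can be sketched in a few lines of Python. The sketch assumes the usual computational form of the statistic, Q = Σwd² − (Σwd)²/Σw, which matches the arithmetic reported for the example (37.34 − 36.44²/171.89); reading 37.34 as Σwd² and 36.44 as Σwd is an interpretation of that arithmetic. The function names are hypothetical, and 14.07 is the chi-square critical value for 7 degrees of freedom at the .05 level.

```python
def q_from_sums(sum_wd2, sum_wd, sum_w):
    """Homogeneity statistic Q computed from three column sums."""
    return sum_wd2 - sum_wd ** 2 / sum_w


def q_statistic(effects, weights):
    """Homogeneity statistic Q from effect sizes and inverse-variance weights."""
    sum_w = sum(weights)
    sum_wd = sum(w * d for d, w in zip(effects, weights))
    sum_wd2 = sum(w * d ** 2 for d, w in zip(effects, weights))
    return q_from_sums(sum_wd2, sum_wd, sum_w)


# Column sums as they appear in the example's arithmetic
# (assumed to be sum(w*d^2), sum(w*d), and sum(w), respectively).
q_total = q_from_sums(37.34, 36.44, 171.89)

# Chi-square critical value for df = 8 - 1 = 7 at alpha = .05.
CRITICAL_CHI2_DF7 = 14.07

# True when the variation exceeds what sampling error alone would predict.
heterogeneous = q_total > CRITICAL_CHI2_DF7
```

When Q exceeds the critical value, as it does with these sums, the variation among effect sizes is larger than sampling error alone would be expected to produce, which is the usual trigger for a moderator search.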
subsets of studies, comparing the average effect sizes for different methods, types of programs, outcome measures, and participants and compares these to determine if they provide insight into what influences the strength and/or direction of the relationship. In fact, a major strength of meta-analysis is that the meta-analyst can ask questions about variables that moderate outcomes even if no individual study has included the moderator variable. In our example, we can ask whether the relationship between playing pool with friends and well-being differs for females compared to males, even if no single study has included both groups. The results of such a comparison of average effect sizes can suggest whether gender would be important to look at in future research.

The procedure to test whether a methodological or conceptual distinction between comparisons explains variance in effect sizes involves several steps. First, a $Q_t$ statistic is calculated using the formula just presented. Then, a Q statistic is calculated separately for each subgroup of studies. Then the values of these Q statistics are summed to form a value called $Q_w$. This value is then subtracted from $Q_t$ to obtain $Q_b$.

$$Q_b = Q_t - Q_w$$

This $Q_b$ statistic is used to test whether the average effects from the groupings of studies are homogeneous. It is compared to a chi-square table using degrees of freedom one less than the number of groupings. If $Q_b$ exceeds the critical value, then the grouping variable is a significant contributor to variance in effect sizes and remains a plausible moderator of effect. This test is analogous to conducting an analysis of variance in that a significant $Q_b$ indicates that at least one group mean differs from the others.

We use our example, illustrated in Table 32.3, to demonstrate how a search for moderators of outcomes might proceed. Let us compare effect sizes calculated from female samples with effect sizes calculated from male samples, given in the last column. First, we find that, using a fixed-effect error model, playing pool with friends has a significant impact on well-being for females, d = 0.29 (95% CI = 0.08/0.79), but not for males, d = 0.13 (95% CI = −0.09/0.35). As shown in Table 32.3, the $Q_t$ statistic for the eight studies was 29.61. The $Q_w$ statistic for females was 13.89 and for males was 14.72, and the total $Q_w$ for both groups is 28.61. From here, the $Q_b$ statistic comparing males to females can be calculated, $Q_b$(1) = 1.01, p = 0.32. This result was not significant with 1 degree of freedom. Using a random-effect error model, playing pool with friends does not have a significant effect on either females, d = 0.41 (95% CI = −0.08/0.89), or males, d = 0.23 (95% CI = −0.26/0.72). Further, the $Q_b$ statistic comparing males to females under random effects assumptions indicated that there was not a significant difference in the average weighted d-index between the groups, $Q_b$(1) = 0.26, p = 0.61.

In this way, the meta-analyst employs a formal means for testing whether different features of studies explain variation in their outcomes. This is an extension of the same rules of inference required of primary researchers. If reliable differences do exist, the average effect sizes corresponding to these differences will take on added meaning and will help the meta-analyst to guide future research or make policy recommendations. Further, in meta-analysis, tests of moderation may allow for the examination of certain forms of research bias. For example, moderator tests can be employed to explore whether stronger effects are more likely to come from certain researchers or whether allegiance effects in clinical research are present. Specifically, allegiance effects can be examined by using the preference researchers have for a particular treatment over others as a grouping variable when exploring explanations for the variation in study outcomes.

An alternative strategy for examining whether particular characteristics of studies are related to the sizes of the treatment effect is meta-regression. Unlike the strategy previously discussed, meta-regression allows the meta-analyst to explore the relationship between continuous, as well as categorical,
characteristics and effect size, and allows the effects of multiple factors to be investigated simultaneously (Thompson & Higgins, 2002). In our example, imagine that our studies ranged in the duration of the manipulation of playing pool with friends. One option would be to group studies into several distinct categories of duration of pool playing and continue with subgroup moderator analyses as previously discussed. However, an alternative would be to employ meta-regression, leaving this characteristic continuous. The interested reader may refer to Thompson and Higgins (2002) or Higgins and Thompson (2004) for a full discussion of this method.

Sensitivity analysis

An additional step in meta-analysis is the performance of sensitivity analyses. A sensitivity analysis is used to determine if and how the conclusions of an analysis might differ if it were conducted using different statistical procedures or assumptions. There are numerous points at which a meta-analyst might decide a sensitivity analysis is appropriate. For example, there might be a set of comparisons that fall at the edge of the conceptual definition of what constitutes an acceptably reliable measure of well-being. The effects of playing pool with friends might be tested with and without the inclusion of these comparisons. Or, some evaluations of the relation between playing pool with friends and well-being might have missing data. These comparisons might be omitted from one analysis and included in another analysis that makes conservative assumptions about what those values might be. The calculation of weighted, unweighted, and median effect sizes can be considered a form of sensitivity analysis. Lastly, averaging effect sizes and conducting homogeneity and moderator tests using both fixed and random effects models is another form of sensitivity analysis. In each case, the meta-analyst is seeking to determine whether a particular finding is robust across different sets of assumptions. If the answer is ‘conclusions do not change under different sets of assumptions’ then greater confidence can be placed in the conclusion.

THE ISSUE OF DATA CENSORING

Many meta-analysts go to great lengths to locate as much relevant research as possible. However, even after careful planning, searching, and coding of research reports, missing data can influence the conclusions drawn from the meta-analysis. Just as biases in the selection of study participants threaten the validity of primary research, data censoring threatens the validity of the meta-analysis (Rothstein et al., 2005). When data are systematically missing, not only is the size of the sample gathered for the research synthesis reduced, but the representativeness of the sample and the validity of the results are compromised, regardless of the quality of the meta-analysis in all other respects (Rothstein et al., 2005).

Types of data censoring

Data censoring occurs when primary researchers, journals, or publishers censor what research gets into print or what specific findings or aspects of the research are reported. This data censoring can often cause the research included in a meta-analysis to be systematically unrepresentative of the population of completed studies. As suggested by Pigott (1994), there are three kinds of missing data that can result from data censoring.

First, entire studies may be unavailable to include in a dataset. In particular, unpublished research findings are frequently missing from meta-analyses. The research synthesist can take extra precautions to include unpublished research that may be difficult to locate. For example, search techniques that include contacting professional networks and listservs, using conference programs, or searching databases that include dissertations and master's theses (Dissertation Abstracts) can improve the inclusiveness of the studies in the meta-analysis. However, inevitably, there will be relevant studies left undiscovered. This form of data censoring is problematic because it frequently reflects the bias against the null hypothesis found in
published research. That is, published articles tend to report statistically significant results, whereas unpublished research is less likely to include statistically significant results.

Evidence suggests that bias against the null hypothesis is present in the decisions made by both reviewers and primary researchers (Cooper, 1998). For example, Atkinson et al. (1982) found that significant results were more than twice as likely as non-significant results to be recommended for publication in two APA journals in counseling psychology even when research designs of studies were identical. Greenwald (1975) found that researchers said they were inclined to submit significant results for publication approximately 60% of the time. However, they would submit the study for publication only 6% of the time if the results failed to reject the null hypothesis. When examining actual decisions made by researchers, Cooper et al. (1997) found that approximately 74% of researchers submitted significant results for publication, but only 5% submitted non-significant results.

Second, even if all relevant studies have been uncovered, individual studies may be missing information necessary to calculate an effect size. Missing effect sizes will occur when the primary researcher does not report adequate statistics or descriptive information needed to calculate an effect size. The consequence of missing an effect size is similar to missing an entire study. That is, a study with a missing effect size cannot be included in the estimate of the average effect. Consequently, the generalizability of the results may be limited to the sample of studies which had complete data. Further, similar to reasons why entire studies may be missing from a review, effect sizes are frequently unreported in published reports when the relationship was not significant, and thus, the author fails to report the precise values of the means, standard deviations, statistical test, and/or p values (Pigott, 1994).

Finally, information about study characteristics used to examine moderators of an effect may be missing from individual reports. For example, when examining the effect of playing pool with friends on well-being, had particular studies failed to report the gender makeup of their participant sample, those studies could not have been included in the moderator analysis.

Detecting missing data

A number of graphical and statistical tests can be used to assess the possible presence of data censoring and the implications of this threat to the validity of the conclusions drawn from the meta-analysis. One way a meta-analyst can evaluate whether data censoring has affected a distribution of effect sizes is to create a funnel plot (Light & Pillemer, 1984). A funnel plot graphically depicts a measure of the sample size of studies, such as their given weight or precision, against their associated effect sizes (Greenhouse & Iyengar, 1994). If the meta-analyst has captured all the relevant studies, the funnel plot should be symmetric around the mean and approximate the shape of the normal distribution. However, publication biases can restrict the range of the distribution, resulting in overrepresentation of studies in one tail of the distribution (Sterne et al., 2005). In addition to graphical displays, statistical tests such as the Rank Correlation Test (Begg & Mazumdar, 1994) and Egger’s regression test (Egger et al., 1997) can be used to detect whether a bias is present (see Sterne & Egger, 2005 for a full discussion of these strategies).

Figure 32.2 presents the funnel plot illustrating the distribution of effect sizes from our example meta-analysis on the effect of playing pool with friends on well-being. Our plot suggests the presence of bias, as the bottom of the plot shows a higher concentration of studies on the right side of the mean compared to the left.

Another way to explore for possible data censoring is by using publication status as a moderator variable in a homogeneity analysis. As previously discussed, homogeneity analysis allows the meta-analyst to test whether sampling error alone accounts for variation in effect sizes or whether features of studies, in this case, publication status,
[Figure 32.2: funnel plot for the example meta-analysis. Vertical axis: Precision (1/Std Err); horizontal axis: Std diff in means.]
influence the observed effect sizes. In this way, the meta-analyst can use observed studies to assess whether publication status moderates an overall effect. Briefly, the meta-analyst calculates average effect sizes for published and unpublished studies and compares these to determine if there is a significant difference in the strength and/or direction of the relationship. A description of the procedure used to conduct a homogeneity analysis was given in the section ‘Testing for moderators of effect sizes’.

Strategies for imputing missing data

There are a number of strategies that meta-analysts can use to deal with data censoring. Rothstein et al. (2005) provide an in-depth treatment of numerous approaches. One way is to try to estimate the missing value using one of a number of imputation techniques. Vote-counting is one strategy that can be used to generate an effect size estimate (see Bushman, 1994; Hedges & Olkin, 1985 for a discussion of vote-counting techniques). That is, the underlying magnitude of a treatment’s effect can be estimated from the proportions of studies showing positive and negative directional outcomes. However, this approach requires that the vote-counter knows the direction of each test of the treatment and the sample size associated with each condition, treatment and control.

Pigott (1994) outlined several methods of imputing an estimate for missing values. One strategy is to assume that missing values are equivalent to a very conservative estimate, such as zero. Another option is to replace missing values with the mean value calculated from available cases for that variable. Regression techniques can also be used to impute missing values. Complete cases are used to generate a regression equation that can be used to estimate missing values. A final alternative that appears to be promising is multiple imputation (Rubin, 1987). Multiple imputation techniques use information from complete cases in the review to generate multiple estimates for each missing value. The advantage of using multiple imputation is that a range of estimates is provided for each missing observation. Therefore, results using each of the estimates can be compared.

Even though imputing an estimate for missing values allows the meta-analyst to include in the synthesis cases with missing data, data imputation methods force the meta-analyst to make assumptions that may not be accurate and can result in
other types of bias. In particular, when using single-value imputation methods, the assumption that missing values may be either smaller than or similar to observed values may simply be incorrect. Further, using single-value imputation methods can result in an artificially reduced variance for those variables for which values were imputed. This reduced variance is particularly problematic when testing the homogeneity of effect sizes. In fact, one advantage of the regression imputation technique is that an adjustment can be applied to correct for this underestimation of the sampling variance (Little & Rubin, 1987). While all but the zero imputation technique provide a reasonable estimate for the mean when information is missing completely at random, when information is missing for reasons related to the value itself or other observed or unobserved variables, these imputation results fail to generate an unbiased estimate (Sutton & Pigott, 2005). Given the growing awareness of publication bias, imputation techniques seem destined to remain an important area for the development of new meta-analytic techniques.

Regardless of which method is employed, meta-analysts are obligated to discuss how much data was missing from their reports, how they handled it, and why they chose the methods they did. Finally, it is becoming increasingly common practice for meta-analysts with large amounts of missing data to conduct their analyses using more than one strategy and to determine whether their findings are robust across different missing data assumptions (see Greenhouse and Iyengar, 1994).

The Trim-and-Fill procedure

There is an interesting imputation method that is gaining popularity because of its simplicity and ease of use. Duval and Tweedie (2000a, 2000b) have recently developed a Trim-and-Fill method that, through an iterative process, fills in possible values for effect sizes from studies that are not represented in the dataset. The Trim-and-Fill procedure tests whether the distribution of effect sizes used in the analyses is consistent with the variation in effect sizes that would be predicted if the estimates were normally distributed. The method first examines whether the distribution of observed effect sizes is skewed, indicating a possible bias created either by the study retrieval procedures or by data censoring on the part of authors. Then it provides a way to estimate the values from missing studies that need to be present to approximate a normal distribution. It imputes these missing values, permitting an examination of an estimate of the impact of data censoring on the observed distribution of effect sizes and the statistics resulting from including the imputed values.

More specifically, the Trim-and-Fill technique uses a nonparametric method that initially removes the asymmetric studies from the right side of the funnel plot (those indicating a positive effect) in order to compute an unbiased estimate of the effect. Missing effect sizes from the left side of the plot (those that would reduce the size of the positive effect) are then estimated based on the normal distribution. Finally, both removed and imputed studies are placed into the funnel plot and a new combined effect that includes these imputed effect sizes is computed. Consequently, the Trim-and-Fill method provides a sensitivity analysis in which the meta-analyst can compare the observed combined effect size to the hypothetical combined effect size when imputed missing effect sizes are included.

Figure 32.3 depicts the asymmetric funnel plot of effect sizes from our fictional meta-analysis with effect sizes imputed using the Trim-and-Fill method included to make the funnel plot symmetric. When looking for missing studies on the left side of the distribution (and based on a fixed-effect model), the Trim-and-Fill technique suggests that there are three missing studies. Recall that the fixed effects observed point estimate and 95% confidence interval for the combined studies is 0.21 (95% CI = 0.06/0.36). Using Trim-and-Fill, the imputed fixed effects point estimate is 0.04 (95% CI = −0.09/0.18). The random effects observed point estimate and 95% confidence interval for the combined
Figure 32.3 Funnel plot of observed and imputed d indexes for example meta-analysis (vertical axis: Precision (1/Std Err))
Note: Black dots represent imputed effect sizes, making the distribution symmetrical.
studies is 0.31 (95% CI = −0.01/0.63). Using the Trim-and-Fill method, the imputed random effects point estimate is 0.05 (95% CI = −0.29/0.38). Thus, this imputation technique changes our finding both in the statistical significance and magnitude of effect. Therefore, we may not be confident that the positive finding of our meta-analysis on the observed eight studies testing the effect of playing pool with friends on well-being is robust against a plausible assumption about data censoring. In such a case, we would certainly discuss the implications of this finding and take care to caution the reader about this important limitation.

NEW DIRECTIONS IN META-ANALYSIS

Alternative indices of heterogeneity

We have seen that studies addressing a common question will generally vary in terms of their design, interventions or other manipulations, sample characteristics, and/or outcomes. And, as previously mentioned, the most common way of assessing heterogeneity in a set of effect sizes is the Q test. A significant Q statistic indicates that the sample of effect sizes is heterogeneous, that is, that the differences between study outcomes exceed those that may have been found by chance alone. In contrast, a non-significant Q statistic indicates that the differences underlying the results of studies can be accounted for by sampling error alone.

However, the Q statistic itself has power characteristics. That is, it may fail to detect meaningful heterogeneity in the case in which just a few studies are being meta-analyzed, or it may detect ‘unimportant’ heterogeneity when a large number of studies are being synthesized (Hardy & Thompson, 1998). Consequently, it may be advisable for the meta-analyst to report another statistic, $I^2$, which provides a way to quantify the heterogeneity among effect sizes included in a synthesis. $I^2$ describes the percentage of total variation across studies that is due to heterogeneity rather than chance (Higgins & Thompson, 2002; Higgins et al., 2003). It can be derived from the Q test using the following formula:

$$I^2 = 100\% \times (Q - df)/Q$$

where all terms are as previously defined. Negative values for $I^2$ should be assumed
to be equivalent to zero and indicate no heterogeneity. Non-zero I² values represent the extent to which heterogeneity is present in the sample of studies, with 100% being the maximum value. For example, in our fictional meta-analysis examining the effect of playing pool with friends on well-being, I² = 100% × (29.616 − 7)/29.616 = 76.364%, indicating that over 76% of the variability between our eight studies cannot be explained by sampling error alone.

As suggested by Higgins and colleagues (2003), there are several important advantages of I². First, I² overcomes many of the drawbacks of the Q test because it does not depend directly on the number of independent effects included in the meta-analysis. Second, given that I² is a percentage, it can be easily compared across meta-analyses, even when they differ in the number of studies included, the outcome being assessed, or the effect size metric used. Finally, I² is easily computed from statistical tests that are normally conducted in a meta-analysis. Currently, I² is rarely reported in published meta-analyses outside of medicine, but its clear advantages, as well as the ease with which it can be interpreted, suggest it will soon be reported regularly in social science meta-analyses as well.

Combining slopes from multiple regressions

Up to this point, the procedures for combining and comparing study results have generally assumed that the measure of effect is a mean difference, correlation, or odds ratio. However, regression analysis is a commonly used technique in the social sciences, particularly for non-experimental studies. Like the standardized mean difference or correlation coefficient, the regression coefficient, b, and the standardized regression coefficient, β, are also measures of effect size. β will typically be used in meta-analyses because, like the d-index and r-index, it standardizes effect size estimates when different measures are used in different studies. β represents the standardized score change in the criterion variable, controlling for all other predictors, given a one standard deviation change in the predictor variable.

Syntheses of regression analyses are difficult to conduct for a variety of reasons. First, models using multiple regression generally differ from study to study. Each study may include different predictors in the regression model and therefore the slope for the predictor of interest will represent a different partial relationship in each study (Wu & Becker, 2004). Second, the scale of the predictor of interest and outcome may vary across studies (Wu & Becker, 2004). In some cases, a predictor such as SAT scores or monetary expenditures may have a common scale. However, in most cases the scale of both the predictor and outcome variable will vary, making comparisons across studies difficult. Still, this problem can be overcome by using β, the fully standardized estimate of the slope for a particular predictor, when the scaling of both the predictor and outcome variable differs across studies. ‘Half-standardizing’ is an alternative way to create similar slopes when only outcomes are dissimilar (Greenwald et al., 1996).

If slopes are independently and identically distributed, we can apply standard methods for meta-analysis. Slopes will be identically distributed across studies when the outcome and predictor of interest are measured in a similar fashion, the other predictors in the model are the same across studies, and when predictor and outcome scores are similarly distributed (Becker, 2005). If these conditions are met, the weighted average slope is obtained by multiplying each effect size by the inverse of its variance and dividing the sum of these products by the sum of the inverse variances. Standard statistics can then be computed, including the mean effect, confidence intervals, and homogeneity tests.

However, it is rare that datasets meet the assumption of being identically and independently distributed (Becker, 2005). Typically, measures differ across studies and regression models are diverse in terms of which additional variables are included in them. And, because few studies provide descriptive
statistics on the variables measured and included in the regression model, it remains difficult to assess whether the assumption that scores are distributed similarly across studies has been met. Given the current limitations, a common method for summarizing the results of the regression analyses has been to use a vote-count strategy (see Cooper et al., 2006; Hanushek, 1989; or Patall et al., 2007, for examples). What remains clear is that techniques for synthesizing results from multiple regression analyses need to be more extensively developed and studied.

CONCLUSIONS

In this chapter, the major meta-analytic procedures, challenges that face the meta-analyst, and new directions of meta-analysis were discussed. What should be evident is that meta-analysis is a powerful tool that can be used to inform future social science research, as well as social policy decision-making. While meta-analysis is not without limitations, it helps to meet rigorous standards that allow us to be more confident when drawing conclusions about the cumulative state of evidence on relationships in our social world.

NOTES

1 Hedges (1980) showed that the d-index may slightly overestimate the size of an effect in the entire population. However, the bias is minimal if the sample size is more than 20. If a meta-analyst is calculating d-indexes from primary research based on samples smaller than 20, Hedges’ (1980) correction factor should be applied.

REFERENCES

Atkinson, D. R., Furlong, M. J., & Wampold, B. R. (1982). Statistical significance, reviewer evaluations and scientific process: Is there a (statistically) significant relationship? Journal of Counseling Psychology, 29, 189–194.
Becker, B. J. (2005, November). Synthesizing Slopes in Meta-analysis. Paper presented at the meeting on Research Synthesis and Meta-Analysis: State of the Art and Future Directions, Durham, NC.
Begg, C. B. & Mazumdar, M. (1994). Operating characteristics of a rank correlation test for publication bias. Biometrics, 50, 1088–1101.
Borenstein, M., Hedges, L., Higgins, J., & Rothstein, H. (2005). Comprehensive Meta Analysis (Version 2.1) [Computer software]. Englewood, NJ: BioStat.
Bushman, B. J. (1994). Vote-counting procedures in meta-analysis. In H. Cooper & L. V. Hedges (Eds.). Handbook of Research Synthesis. New York: Russell Sage.
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. Hillsdale, NJ: Erlbaum.
Cooper, H. M. (1998). Synthesizing Research: A Guide for Literature Reviews (3rd ed.). Thousand Oaks, CA: Sage.
Cooper, H., DeNeve, K., & Charlton, K. (1997). Finding the missing science: The fate of studies submitted for review by a human subjects committee. Psychological Methods, 2, 447–452.
Cooper, H. & Hedges, L. V. (1994). Handbook of Research Synthesis. New York: Russell Sage.
Cooper, H., Robinson, J. C., & Patall, E. A. (2006). Does homework improve academic achievement?: A synthesis of research, 1987–2003. Review of Educational Research, 76, 1–62.
Cooper, H. M. & Rosenthal, R. (1980). Statistical versus traditional procedures for summarizing research findings. Psychological Bulletin, 87, 442–449.
Diener, E., Emmons, R. A., Larsen, R. J., & Griffin, S. (1985). The Satisfaction With Life Scale. Journal of Personality Assessment, 49, 71–75.
Duval, S. & Tweedie, R. (2000a). A nonparametric ‘trim and fill’ method of accounting for publication bias in meta-analysis. Journal of the American Statistical Association, 95, 89–98.
Duval, S. & Tweedie, R. (2000b). Trim and fill: A simple funnel plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics, 56, 276–284.
Egger, M., Davey Smith, G., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. British Medical Journal, 315, 629–634.
Fleiss, J. L. (1994). Measures of effect size for categorical data. In H. Cooper & L. V. Hedges (Eds.). Handbook of Research Synthesis. pp. 245–260. New York: Russell Sage.
Gleser, L. J. & Olkin, I. (1994). Stochastically dependent effect sizes. In H. Cooper & L. V. Hedges
(Eds.). Handbook of Research Synthesis. New York: Russell Sage.
Greenhouse, J. B. & Iyengar, S. (1994). Sensitivity analysis and diagnostics. In H. Cooper & L. V. Hedges (Eds.). Handbook of Research Synthesis. pp. 383–398. New York: Russell Sage.
Greenwald, A. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82, 1–20.
Greenwald, R., Hedges, L. V., & Laine, R. D. (1996). The effect of school resources on student achievement. Review of Educational Research, 66, 361–396.
Haddock, C. K., Rindskopf, D., & Shadish, W. R. (1998). Using odds ratios as effect sizes for meta-analysis of dichotomous data: A primer on methods and issues. Psychological Methods, 3, 339–353.
Hanushek, E. A. (1989). The impact of differential expenditures on school performance. Educational Researcher, 18, 45–51.
Hardy, R. J. & Thompson, S. G. (1998). Detecting and describing heterogeneity in meta-analysis. Statistics in Medicine, 17, 841–856.
Hauck, W. W. (1989). Odds ratio inference from stratified samples. Communications in Statistics, 18A, 767–800.
Hedges, L. V. (1980). Unbiased estimation of effect size. Evaluation in Education: An International Review Series, 4, 25–27.
Hedges, L. V. & Olkin, I. (1985). Statistical Methods for Meta-analysis. Orlando, FL: Academic Press.
Hedges, L. V. & Vevea, J. L. (1998). Fixed and random effects models in meta-analysis. Psychological Methods, 3, 486–504.
Higgins, J. P. T. & Thompson, S. G. (2002). Quantifying heterogeneity in a meta-analysis. Statistics in Medicine, 21, 1539–1558.
Higgins, J. P. T. & Thompson, S. G. (2004). Controlling the risk of spurious findings from meta-regression. Statistics in Medicine, 23, 1663–1682.
Higgins, J. P. T., Thompson, S. G., Deeks, J. J., & Altman, D. G. (2003). Measuring inconsistency in meta-analyses. British Medical Journal, 327, 557–560.
Hunt, M. (1997). How Science Takes Stock: the Story of Meta-analysis. New York: Russell Sage Foundation.
Hunter, J. E. & Schmidt, F. L. (2004). Methods of Meta-analysis: Correcting Error and Bias in Research Findings (2nd ed.). Thousand Oaks, CA: Sage.
Last, J. M. (2001). A Dictionary of Epidemiology. Oxford: Oxford University Press.
Light, R. J. & Pillemer, D. B. (1984). Summing Up: The Science of Reviewing Research. Cambridge, MA: Harvard University Press.
Lipsey, M. W. & Wilson, D. B. (2001). Practical Meta-analysis. Thousand Oaks, CA: Sage.
Little, R. J. A. & Rubin, D. B. (1987). Statistical Analysis with Missing Data. New York: Wiley.
Lyubomirsky, S. & Lepper, H. S. (1999). A measure of subjective happiness: Preliminary reliability and construct validation. Social Indicators Research, 46, 137–155.
Overton, R. C. (1998). A comparison of fixed-effects and mixed (random-effects) models for meta-analysis tests of moderator variable effects. Psychological Methods, 3, 354–379.
Patall, E. A., Cooper, H., & Robinson, J. C. (2007). Parent involvement in homework: A research synthesis. Manuscript submitted for publication.
Pigott, T. D. (1994). Methods for handling missing data in research synthesis. In H. Cooper & L. V. Hedges (Eds.). Handbook of Research Synthesis. New York: Russell Sage.
Raudenbush, S. W. (1994). Random effects models. In H. Cooper & L. V. Hedges (Eds.). Handbook of Research Synthesis. pp. 301–322. New York: Russell Sage.
Raudenbush, S. W., Becker, B. J., & Kalaian, H. (1988). Modeling multivariate effect sizes. Psychological Bulletin, 103, 111–120.
Rosenthal, R. (1984). Meta-Analytic Procedures for Social Research. Beverly Hills, CA: Sage.
Rosenthal, R. (1994). Parametric measures of effect size. In H. Cooper & L. V. Hedges (Eds.). Handbook of Research Synthesis. New York: Russell Sage.
Rothstein, H. R., Sutton, A. J., & Borenstein, M. (2005). Publication Bias in Meta-analysis: Prevention, Assessment and Adjustments. Chichester, UK: John Wiley & Sons, Ltd.
Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: Wiley.
Shadish, W. R. & Haddock, C. K. (1994). Combining estimates of effect size. In H. Cooper & L. V. Hedges (Eds.). Handbook of Research Synthesis. pp. 261–282. New York: Russell Sage.
Sterne, J. A. C., Becker, B. J., & Egger, M. (2005). The funnel plot. In H. R. Rothstein, A. J. Sutton, & M. Borenstein (Eds.). Publication Bias in Meta-analysis: Prevention, Assessment and Adjustments. pp. 75–98. Chichester, UK: John Wiley & Sons, Ltd.
Sterne, J. A. C. & Egger, M. (2005). Regression methods to detect publication and other bias in meta-analysis. In H. R. Rothstein, A. J. Sutton, & M. Borenstein (Eds.). Publication Bias in Meta-analysis: Prevention, Assessment and Adjustments. pp. 99–110. Chichester, UK: John Wiley & Sons, Ltd.
Sutton, A. J. & Pigott, T. D. (2005). Bias in meta-analysis induced by incompletely reported studies. In H. R. Rothstein, A. J. Sutton, & M. Borenstein (Eds.). Publication Bias in Meta-analysis: Prevention,
Assessment and Adjustments. pp. 223–240. Chichester, UK: John Wiley & Sons, Ltd.
Thompson, S. G. & Higgins, J. P. T. (2002). How should meta-regression analyses be undertaken and interpreted? Statistics in Medicine, 21, 1559–1573.
Wu, M. & Becker, B. J. (2004, April). Synthesizing Results from Regression Studies: What can we Learn from Combining Results from Studies Using Large Data Sets? Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
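Supplementary worked sketch for this chapter's section on new directions in meta-analysis. The Python below illustrates two calculations described there: the I² index, I² = 100% × (Q − df)/Q with negative values set to zero, and the inverse-variance weighted mean used when slopes can be combined by standard methods. The function names are illustrative, and the form of Q used here (the weighted sum of squared deviations from the weighted mean) is the usual homogeneity statistic, assumed rather than quoted from the chapter. Only Q = 29.616 with df = 7 comes from the chapter's fictional example; the two effect sizes and variances are hypothetical.

```python
def weighted_mean_effect(effects, variances):
    """Inverse-variance weighted mean: sum(w * d) / sum(w), with w = 1 / variance."""
    weights = [1.0 / v for v in variances]
    return sum(w * d for w, d in zip(weights, effects)) / sum(weights)


def q_statistic(effects, variances):
    """Homogeneity statistic Q: weighted squared deviations from the weighted mean."""
    mean = weighted_mean_effect(effects, variances)
    return sum((d - mean) ** 2 / v for d, v in zip(effects, variances))


def i_squared(q, df):
    """I-squared as a percentage; negative values are treated as zero."""
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)


if __name__ == "__main__":
    # Chapter's fictional example: Q = 29.616 from eight studies (df = 7).
    print(round(i_squared(29.616, 7), 3))

    # Hypothetical slopes and sampling variances, for illustration only.
    effects = [0.2, 0.4]
    variances = [0.01, 0.04]
    q = q_statistic(effects, variances)
    print(weighted_mean_effect(effects, variances), q, i_squared(q, len(effects) - 1))
```

With the chapter's values the first line prints 76.364, matching the worked example. In the hypothetical two-study case Q is smaller than its degrees of freedom, so I² is reported as zero.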
33
Synergy and Synthesis:
Integrating Qualitative and
Quantitative Data
Jane Fielding and Nigel Fielding
different methods, triangulation must assume that variations in findings arise from the phenomenon or the particularities of the methods being combined rather than methods haphazardly producing different findings on different occasions, or there being no predictable consistencies in the working of given methods. The latter is especially important in the convergent validation approach to triangulation, as it is premised on the combined methods having different and distinctive biases; if methods are susceptible to the same biases, combining them may simply multiply error. Further implied is that these sources of error can be anticipated and their effects can be traced during analysis. It is in this sense that Levins’ (1966: 423) declaration that ‘our truth is the intersection of independent lies’ is so apt.

The doctrine of convergent validation therefore requires agreement of results from diverse but systematic uses of methods, data sources, theories and investigators (Denzin 1989). Some maintain that combining methods or drawing on different data sources only enhances validity where each is associated with compatible ontological and epistemological perspectives (Blaikie 1991). Post-positivists have somewhat sidestepped the ontological/epistemological critique with the argument that datasets are open to interpretation from a range of theories. Another perspective is that combining different methodologies does not necessarily enhance validity but can extend the scope and depth of understanding (Fielding and Fielding 1986; Denzin and Lincoln 2000; Fielding and Schreier 2001).

Triangulation has also been informed by rationales for the methodological ‘division of labour’ (Sieber 1973). For Sieber, qualitative work can assist quantitative work in providing a theoretical framework, validating survey data, interpreting statistical relationships and deciphering puzzling responses, selecting survey items to construct indices and providing case studies. Quantitative data can identify individuals, groups and settings for qualitative fieldwork and indicate representative and unrepresentative cases. Quantitative data can counteract the ‘holistic fallacy’ that all aspects of a situation are congruent, and can demonstrate the generalisability of limited-sample observations. Qualitative research sometimes succumbs to ‘elite bias’, concentrating on respondents who are articulate, strategically placed and have a status that impresses researchers. Quantitative data can compensate by indicating the full range that should be sampled. Qualitative data can contribute depth to quantitative research, and suggest leads that the more limited kinds of quantitative data cannot address.

As well as combining methods, triangulation can also involve using a number of data sources (self, informants, other commentators), several accounts of events, or several researchers. Denzin’s (1970) original conceptualisation, which was related to Webb et al.’s (1966) work on ‘unobtrusive measures’, not only involved multiple methods (‘data triangulation’) but multiple investigators (‘investigator triangulation’) and multiple methodological and theoretical frameworks (‘theoretical and methodological triangulation’). Each main type has a set of sub-types. Data triangulation may include time triangulation, exploring temporal influences by longitudinal and cross-sectional designs; space triangulation, taking the form of comparative research; and person triangulation, variously at the individual level, the interactive level among groups and the collective level. In investigator triangulation, more than one person examines the same situation. In theory triangulation, situations are examined from different theoretical perspectives. Methodological triangulation has two variants: ‘within-method’, where the same method is used on different occasions (without which one could hardly refer to ‘method’ at all), and ‘between-method’, where different methods are applied to the same subject in explicit relation to each other.

While the classical approach represented by Campbell’s work seeks convergence or confirmation of results across different methods, the triangulation term has accumulated so many renderings that it is now clearer to use the terms ‘convergence’ or
Figure 33.1 Dimensions of methodological design (Original figure, drawing on Green et al., 1989)
[Figure not reproduced. Dimensions labelled in the original: Methods (different/similar); Paradigms (different/same); Phenomena (different/same); Status (equal/unequal); Sequencing (simultaneous/sequential); Interdependence (interactive/independent).]
may account. In combination, qualitative and quantitative methods can reveal more about the extent of regularities and the dimensions of the types. Numerous hybrid techniques interrelate quantitative and qualitative procedures. Where codes derived from qualitative data are recorded separately for each case, the presence/absence of each code can be used to create variables, from which case-by-variable matrices can be derived. Such matrices enable hypothesis testing, predictive modelling and exploratory analyses.

Statistical techniques like cluster analysis, correspondence analysis and multidimensional scaling can be applied to such ‘quantitised’ qualitative data. For example, non-standardised interviews documenting types of adaptation to labour force position can be used as the basis of a probabilistic cluster analysis. The proximity and probability of classification of each respondent towards the centre of the relevant cluster (i.e. type) can thus be visualised and categories reduced to fewer dimensions by multiple correspondence analysis. Kuiken and Miall (2001) used this technique to specify experiential categories derived from interview response in a study comparing different readers’ impressions of the same short story. Having identified attributes qualitatively, categories were specified by a quantitative cluster analysis that systematically varied the presence of individual attributes. Subsequent qualitative inspection of the clusters further differentiated the types. In her study of mixed-methods projects, Niglas (2004) used scales to capture variation amongst them on various characteristics of research design. Cluster analysis of variables from her quantitative content analysis produced eight distinctive groups and identified the characteristics best differentiating them. The findings were compared to discursive notes from her initial reading of the study to produce summary descriptions of each group. The descriptions were used to make the final assignment of studies into categories representing variables for further statistical analysis. These alternating quantitative and qualitative procedures do not challenge the essential integrity of the quantitative and qualitative components of the method. They represent moves to interrelation rather than juxtaposition of different forms of data.

CORE PRINCIPLES OF MULTIPLE-METHOD RESEARCH DESIGN

Epistemology and pragmatism

The advantages of combining methods do not require that we ignore that different approaches are supported by different epistemologies. Accepting the case for interrelating data from different sources is to accept a moderate relativistic epistemology, one that justifies the value of knowledge from many sources, rather than elevating one source. Taking a triangulation or multiple-method approach is to accept the continuity of all data-gathering and analytic efforts. Proponents are likely to regard all methods as both privileged and constrained: the qualities that allow us to access and understand one kind of information close off other kinds. A full understanding flows from tackling the research question in several ways.

Results from different methods founded on different assumptions may then be combined for purposes other than those associated with convergent validation. Theoretical triangulation does not necessarily reduce bias, nor does methodological triangulation necessarily increase validity. Combining results from different analytic perspectives or methods may offer a fuller picture but not necessarily a more ‘objective’ or ‘valid’ one. When we combine theories and methods we do so to add breadth or depth to our analysis, not because we subscribe to a single and ‘objective’ truth. In the social realm it is beyond our capacities to achieve absolute objectivity or axiomatic truth, but this is not the same as rejecting the attempt to be objective or the standard of truth. It is merely to accept that our knowledge is always partial and incomplete. We can make it less so by expanding the sources
Figure 33.3 Research design for Public Response to Flood Warning Project
[Flow diagram not reproduced. Components labelled in the original: secondary analysis of existing data establishing what flood victims did following flood warning (sample: respondents post a flood); interviews to discover actions taken following flood warning, and focus groups to investigate public understanding of flood warnings (sample: population at risk of flooding); results of these components inform the questionnaire; primary survey to discover what people say they would do following flood warning, giving baseline results of imputed actions following flood warning; focus groups with vulnerable groups to establish understanding of flood warning; key informant interviews; flood management policy as the outcome.]
victims following the Autumn 2001 floods. Phase 2 consisted of two qualitative components: focus group discussions and individual interviews. While the focus groups concentrated on public understanding and interpretation of the Environment Agency’s warning codes, the in-depth interviews explored how individuals said they would act in response to warnings. Another important difference was that while focus groups largely rely on the interaction between group members and a shared experience, the individual interviews were conducted in respondents’ own homes, with the potential to provide situational cues prompting responses. In the final phase, the survey used a questionnaire instrument developed from the responses obtained in phases 1 and 2. This was designed, using hypothetical flood scenarios, to establish how the public would respond to flood warning in the event of an emergency.

Note that the conventional sequence of pilot qualitative work enabling design of a survey instrument is here augmented by preliminary secondary analysis, and that the qualitative components were in two modes chosen because group discussions were thought best able to access people’s thinking about the issue while action was thought most reliably to be accessed by interviewing individuals.

Identification of risky places and risky people

The EA projects had multiple aims and outcomes but centrally depended upon the identification of risky places and risky people. Respondents were defined as those ‘at risk’ from tidal or fluvial flooding but who may never have actually experienced a flood event. The study’s multiple-method design enabled us to negotiate the controversies associated with identifying this population and their understanding of their risk. The ‘at risk’ samples were identified by the use of flood plain maps. It may seem obvious that residents within the flood plains are most at risk from flooding but measuring the extent of the flood plains and quantifying the likelihood of floods is a contentious exercise exacerbated by many factors ranging from climate change to the involvement of the insurance industry.
The EA maps identified the ‘risky places’ but were also used to identify the ‘at risk’ population living within them. Thus the quantitative data was used to define the sample for subsequent qualitative and quantitative analyses, exemplifying a ‘development’ strategy in research design (Green et al., 1989). This ‘at risk’ population was then targeted by the EA ‘awareness campaigns’ designed to educate the vulnerable public about flood facts. A potential five million people and two million homes and businesses were targeted. However, the flood maps were an etic, outsider measure of those at risk and recognition of their risk by those affected was clearly important for appropriate public action in preparation for any future disaster. This dichotomy of meaning and measurement, in terms of outsider (etic) and insider (emic) perspectives, will now be discussed.

Emic and etic conceptualisation of vulnerability

A useful conceptual framework for thinking about vulnerability to flood is in terms of ‘emic’ and ‘etic’ approaches (see Spiers 2000; Fielding and Moran-Ellis 2005). These concepts, re-interpreted from linguistics and anthropology, refer to two complementary perspectives. The etic perspective represents the ‘outsider’ viewpoint and the emic an ‘insider’ viewpoint. Pike (1967) linked emic and etic linguistic analysis to emic and etic perspectives on human behaviour, developing a methodology for cross-cultural comparisons. Pike regards emic and etic perspectives as being like the two images of a matching stereoscopic view. They may initially look alike but on close inspection are different, and, when combined, give a ‘startling’ and ‘tri-dimensional understanding’ of human behavior instead of a ‘flat’ etic one (Pike 1967: 41). The payoff from combination is key: ‘emic and etic data do not constitute a rigid dichotomy of data, but often present the same data from two points of view’ (ibid).

An etic viewpoint defines vulnerable individuals as those at greater risk based either on where they live (in vulnerable places) or on demographic characteristics (vulnerable people). These characteristics are usually seen as those which increase social dependence; i.e. old age, ill-health, disability and ethnicity (due to language barriers). Quantitative methods are nearly always used to identify vulnerable places (measuring the likelihood of an event occurring) and are also often used to identify vulnerable people. One negative consequence of this approach is that individuals may become stereotyped based on their defining functional ‘deficit’. Another problem is that ‘vulnerable groups’ defined in this way are not homogeneous.

In contrast, an emic viewpoint seeks to identify vulnerability on the basis of meanings held by individuals arising from their lived experience and tends to be aligned with qualitative methodology. Emic vulnerability is founded on a person’s/family’s/community’s sense of their own resilience and ability to respond in the face of a flood. Emic vulnerability can only be determined by the person experiencing it. So, a person who may be defined as belonging to an at-risk group (etic vulnerability) may only feel vulnerable if they consider some threat to their self to exceed their capacity to adequately respond, despite ‘rationally’ acknowledging their possession of vulnerable characteristics. They need to recognise that they are at risk before they can effectively prepare.

Public awareness of risk

Quantitative analysis of the ‘at risk’ population, based on a survey administered in 2001 (Fielding et al., 2005) and on figures more recently reported by the EA¹, found that 49 percent of residential respondents (41 percent in 2005) were not aware that their property was in a flood risk area. This made it clear that the EA’s message was not getting through. Nearly half those defined as ‘at risk’ were not aware of their risk. Thus, while the quantitative measurement of the extent of the flood plains had been used to identify the ‘at risk’ population, other quantitative analysis identified a differing perception of reality. The imposed, outsider view defining risky places was at odds with the lived experience of those defined ‘at risk’. The fact that
Table 33.2 Factors that influence awareness of flood risk of own property

Factor   Category   % aware property in flood risk   Total N   Significance(a)
Age      16–24      31%                              49        **
         25–34      43%                              207
         35–44      55%                              193
         45–54      57%                              150
         55–64      56%                              141
         65+        52%                              201
Class    A          86%                              29        ***
         B          62%                              160
         C1         49%                              259
         C2         47%                              175
         D          49%                              144
         E          43%                              175

Source: At Risk 2001 survey
(a) Chi-square test significance: ***p < 0.001; **p < 0.01
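As a rough check on the significance column of Table 33.2, the sketch below runs a chi-square test of independence (aware versus not aware, by category) in Python. It rests on several assumptions that are not stated in the source: the counts are reconstructed from the rounded percentages and the Total N column, so the statistics are approximate; each factor is tested separately; and the C1 entry printed as ‘49&’ is read as 49%. The function names are illustrative, and the critical values are the standard chi-square values for 5 degrees of freedom.

```python
# (percent aware, total N) per category, copied from Table 33.2.
AGE = [(31, 49), (43, 207), (55, 193), (57, 150), (56, 141), (52, 201)]
SOCIAL_CLASS = [(86, 29), (62, 160), (49, 259), (47, 175), (49, 144), (43, 175)]

# Standard chi-square critical values for df = 5.
CRITICAL_DF5 = {"**": 15.086, "***": 20.515}


def chi_square_from_percentages(groups):
    """Pearson chi-square for a k x 2 table (aware / not aware).

    Counts are approximated as N * percent / 100, so they may be fractional.
    Returns (chi-square statistic, degrees of freedom).
    """
    aware = [n * pct / 100 for pct, n in groups]
    total_n = sum(n for _, n in groups)
    overall = sum(aware) / total_n  # overall proportion aware
    chi2 = 0.0
    for (_, n), a in zip(groups, aware):
        for observed, expected in ((a, n * overall), (n - a, n * (1 - overall))):
            chi2 += (observed - expected) ** 2 / expected
    return chi2, len(groups) - 1


def significance_stars(chi2):
    """Map a df = 5 chi-square statistic onto the table's star notation."""
    if chi2 > CRITICAL_DF5["***"]:
        return "***"
    if chi2 > CRITICAL_DF5["**"]:
        return "**"
    return ""


if __name__ == "__main__":
    for label, groups in (("Age", AGE), ("Class", SOCIAL_CLASS)):
        chi2, df = chi_square_from_percentages(groups)
        print(label, round(chi2, 1), df, significance_stars(chi2))
```

Under these assumptions the reconstructed statistics fall in the same significance bands as the table reports: between the 0.01 and 0.001 critical values for age, and beyond the 0.001 critical value for social class.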
an emic perspective (risk awareness) was captured using an etic measure illustrates that the etic/emic perspectives are not simply questions of method.

Why were those who are vulnerable according to etic measures not aware of their risk? This was initially explored using the survey data relating other variables to ‘explain’ variation in the dependent variable, awareness. However, the other variables chosen, generally those indicating, in line with the literature, a social or financial dependency, drew on etic, or outsider, analysis to explain lack of awareness. This did establish a clear social class gradient, with the lower social classes, the young and the old least aware of their flood risk (see Table 33.2). One use made of the focus groups and interviews was to establish whether these most vulnerable groups feel most at risk, and to see whether there were other explanations for lack of awareness. Thus the qualitative data was used to complement and ‘explain’ the findings from the quantitative analysis: an example of ‘complementarity’ in the Green et al. (1989) typology.

Flood researchers regularly encounter respondents who deny that they live within the flood plains identified by the EA. Indeed, some actively campaign against their properties being included (possibly because it affects their insurance premiums and thus house prices). In their experience, and possibly their parents’ experience, they may not have suffered flooding and therefore feel perfectly safe. EA public safety materials, including targeted letters and leaflet drops about the ‘objective’ risk, simply reinforce a belief that the authorities do not know what they are talking about. Analysis of response to flood warnings and of relevant survey data (Fielding et al., 2005) found that the most influential factor on flood awareness and likely action in the event of a flood was previous flood experience. Evidence of scepticism based on local knowledge and experience was found not only in verbatim responses in the survey but in elaborated form in the individual interviews.

In response to why no action was taken upon receiving a flood warning, verbatim responses in the survey included:

‘Lived in [town] all my life and know where it floods and where it doesn’t’.

‘We were not flooded the first time so we did not expect to be flooded again’.

‘I don’t want to be ignorant but it is absolute trash to say that this property is at risk of being flooded. I have lived in [riverside town] all of my life and I am 84 years old, and this area has never been flooded in that time, and I am saying that with 30 years experience in the fire brigade. Whoever put this address on the at risk register was very wrong, if the flooding ever got to this area [town] would not exist’.

(Post Events Survey 2001 verbatim responses)
SYNERGY AND SYNTHESIS: INTEGRATING QUALITATIVE AND QUANTITATIVE DATA 565
While interviews yielded similar responses, e.g. TD: No, I’ve lived ‘ere thirteen years and I’ve never felt [at risk], never (Parent Interview, FWVG Project), the finer-grained data also contained indications that ignorance was a factor.

I knew about floodplains but I didn’t imagine for one minute that where we’re located was on a [floodplain], in fact I didn’t even know […] there was a bloody river, that was a surprise, I knew the hump back bridge [I] go over [it] every day but I didn’t know there was a river in that proximity.
(New residents focus group (FWVG Project))

Interviews suggested that experience could negate ‘objective’ awareness:

F: I don’t actually feel at risk. I mean I’m quite kind of aware that I live on [a floodplain because], … we have had leaflets through saying you’re in a blue zone and … knowing environmentally I could see there was a rise and you know floods that happened like … Lewes and Cornwall.
(Owner occupier focus group (PRFW Project))

This respondent was aware of the flood risk but discounted it from lack of experience of flooding.

F: I think that’s it, I think because I haven’t actually experienced anything either.

Several respondents recognised their lack of awareness but blamed it on lack of official warning when they moved into the area, which in turn was blamed on the long time lapse, and therefore reduced risk, since the last flood:

…It’s just ignorance on all of our parts because nobody had told us in the first place you know, if you only get flooded in the last time in 1968 everyone sort of forgets about it and if we’d have probably known that there was a chance that we were going to get flooded you might have done something about it sooner.
F: And [property] searches … you only have to give the last twenty years history.
(Families focus group (FWVG Project))

[Second participant] I looked for it you know because I phoned my solicitor up and gave him a piece of my mind and he said well … it does show up in your search and he told me the page it was on but he said it is 1968, it’s quite a long time ago so he said I never really mentioned it to you because I thought that … perhaps [because] that was a long time ago it’s not worth worrying about … Which I could understand.
(New residents interview (FWVG Project))

There is indeed ‘objective’ cause for scepticism about flood risk information. Flood plain maps underestimate risk in the case of flooding caused by inadequate storm-drains or groundwater and surface water runoff, and overestimate where flood defences or local topography have not been accounted for. In addition, the EA’s own literature concedes the maps ‘… cannot provide detail on individual properties’ (see note 2). There was evidence of disbelief in the integrity of the maps among ‘at risk’ respondents, who had taken no action when warned:

‘Being on first floor flat didn’t worry’
‘Because property is not in flood area’
(Post Events Survey 2001 verbatim)

There were hints of conspiracy between the EA and insurers from respondents:

But as soon as you give your postcode they immediately know you’re in a high risk flood area. […]
Participant 1: Even if you’re not, I mean I notice on the list of roads that you gave us one of those was … Hill, well I mean that’s literally up on the Downs, how can you possibly flood up there? [Laughter] […] And yet as far as … the insurance companies are concerned, all they have is your postcode […] The Environment Agency’s stated that you are in that area. […]
Participant 3: And in the harbour there are seven storey blocks … so if you live in the top of the storey […] You’re still going to be penalised.
(Owner occupier focus group (PRFW Project))

Depending on personal circumstances, recognition of vulnerability to flood risk, according to the ‘etic’ flood maps, may either be accepted and acted upon, a situation where the emic and etic perspective coincide, or rejected where etic and emic viewpoints are at variance. In the latter case there are two possibilities. First, the respondent is not actually at risk – due either to an error in the flood maps (the respondent lives on a hill or recent flood defences have not been taken into account) or personal circumstance (the respondent lives above the ground floor). Second, the respondent is at risk but does not
perceive this risk to be significant. Reasons for this are diverse: they may lack information about the risk; through past experience and local knowledge their perception of their coping ability may outweigh perceived risk; acknowledging the risk may have negative impacts (psychological and/or economic); or they may distrust the flood maps.

So, while there is value in identifying those ‘at risk’ to target awareness campaigns or to explore the environmental justice agenda, it must also be recognised that vulnerability is a quality of experience and produces different responses in different individuals. Rather than regard emic and etic perspectives as competing versions, complex social phenomena require coordination of the perspectives and their associated methodologies. The principal social science tool enabling such an approach is a mixed-method design that assigns different roles to different methods.

THE STANDING, USES AND FUTURE OF METHODOLOGICAL COMBINATION

The contemporary practice of multiple-method research

The status of MMRD contrasts in the academic and applied research spheres. MMRD remains controversial in the academic sphere. Since the canonical formulation of ‘triangulation’ in the 1950s, the social sciences have developed a range of considered objections on grounds of epistemology and incommensurability of methods. The situation contrasts with that in applied research, where many regard MMRD as a practical necessity. Bryman (2005) compared planned research design and actual practice in studies claiming MMRD, finding substantial divergence from the kind of planned use of MMRD that we might expect if the concept of MMRD was firmly established as part of the methodological canon. Researchers sometimes employed multiple methods without any rationale for why this was superior to using a single method; other researchers who declared such a rationale did not use multiple methods in the study itself, and yet other researchers who declared both a rationale and followed it through by using multiple methods actually relied on a single method for their analysis. These divergences reflect the fact that MMRD is not a technique, like calculating tests of significance or running a cross tabulation, but an attitude of inquiry, an approach to quality standards and to what constitutes adequate explanations of social phenomena.

The policy community – government, voluntary organisations and interest groups – is a growing consumer of social science research. In the UK and USA those engaged in commissioning research have increasingly construed adequate research as multiple-method research. At root, MMRD is a growing orthodoxy because of the ‘common sense’ appeal of the underlying logic (combined with either a measure of ignorance or indifference to the epistemological differences between methods), but the trend is also related to the increasing promotion of ‘evidence-based policy’, which has engendered significant institutional moves towards standardisation of research methods, manifest in professional reviews of research capacity, such as the Rhind Report in the UK (2003).

To overcome what are regarded as the constraints on the representativeness and generalisability of qualitative research, government has initiated both topic-specific reviews of quality standards for research (such as in health) and generic reviews of quality standards for particular methods, such as qualitative research (e.g. the Spencer Review for the UK’s Cabinet Office; Spencer et al., 2003). Such reviews tend to result in checklists of ingredients for reliable and valid research, and are uncomfortable reading for those who do not construe social research as a matter of following recipes, but there is no doubting the significance of such developments. In particular, qualitative research may have ‘arrived’, but it is welcome at the platform only provided its findings can be associated with findings from research using other methods.
Long before checklists emerged for qualitative research they were already a familiar part of the environment for quantitative researchers. Criteria in that area reflect the tidier characteristics of quantitative methodology and benefit from the benchmark standards that are intrinsic to work with statistical data, such as expected sample sizes, accepted tests of association and standard measures of effect size. So the checklist approach emerged earlier in relation to quantitative research and attracted less controversy. A major application of large-scale quantitative research is to health research and much of the heuristic associated with quality standards for quantitative research was laid down in the context of epidemiological research, which is associated with large samples and experimental/control designs. This approach is sufficiently embedded in the apparatus of policy-making that it has taken institutional form in organisations like the ‘Campbell collaboration’ (see note 3) in criminal justice and the ‘Cochrane collaboration’ (see note 4) in health. Membership represents a kind of official seal of approval to conduct research in this area and members must produce research that adheres to inflexible quality standards.

Ill-considered multiple-method research can lead to real methodological traps. We might take an example from the health field, concerning the UK controversy over the Measles, Mumps and Rubella (MMR) vaccine, a combined vaccination against common childhood diseases. A small sample study conducted by a medical researcher suggested a link between the vaccine and autism, and received considerable publicity. During the 1990s parental resistance to MMR vaccination grew, and many parents demanded that the National Health Service instead provide single vaccines against the various diseases. Other parents refused all vaccination. Both forms of parental resistance increased the incidence of the diseases. Health policy researchers were asked to address these problems. They wanted to add qualitative understanding to epidemiological and survey data. They proposed a ‘meta-analysis’ of qualitative studies. Initially their idea was to simply add together the samples from a number of qualitative studies of parental resistance until they had what they regarded as a large enough sample size from which to draw inferences. These researchers had no direct expertise in qualitative research. Their background was in epidemiology. It had to be explained that simply ‘adding together’ a cluster of qualitative studies would be to ignore the different modes of eliciting parental views, different analytic techniques, different degrees of experience of vaccination amongst the respondents and so on. ‘Adding together’ would do little more than multiply error.

Technological transformations

While the institutional frames within which multiple-method research is conducted cast a strong influence over what is understood as legitimate methodological practice, social research methodology is also responsive to new techniques, particularly those emergent from the computational field. In this section we consider some current and potential ‘transformative technologies’ for their potential impact on the future of multiple-method research.

A recent means of interrelating qualitative and quantitative data that embraces Caracelli and Green’s integrated approach has emerged largely by stealth. This is the development of quantification routines within computer-assisted qualitative data analysis (‘CAQDAS’). Most qualitative software counts ‘hits’ from specified retrievals (e.g. all single female interviewees who commented on divorce), and encourages triangulation by offering a port to export data to SPSS and import quantitative data tables. Some argue that such facilities represent a hybrid methodology transcending the quantitative/qualitative distinction (Bazeley 1999; Bourdon 2000). These claims relate to software that enables statistical information to be imported into qualitative databases and used to inform coding of text, with coded information then being exported to statistical
software for further quantitative analysis. For example, NUD*IST’s table import and export functions enable manipulation of exported data either as information about codes that have been applied to the text or a matrix built from cross-tabulated coded data. Some packages also have a command language for automating repetitive or large-scale processes, allowing autocoding of data. Quantitative data can be imported to inform interpretation before detailed coding, such as divisions within the sample that emerged from survey response.

Possibilities for interrelating data range from sorting qualitative comments by categorical or scaled criteria to incorporating the results of qualitative coding in correspondence analysis, logistic regression or other multivariate techniques. Categorised response sets exported to a statistics package for analysis are still linked to the qualitative data from which they were developed. For example, a table in N-Vivo provides access to qualitative data from each cell of the matrix produced when a cross-tabulation-type search is performed across data files. This enables users to show any number of socio-demographic characteristics against any number of selected codes. Supplementing counts of hits, colour-graduation of table cells flags the density of coding in each cell. Analytic searches can thus be composed of combinations of interpretive coding and coding representing socio-demographic details.

Since the emergence of Grid and High Performance computing in the late 1990s, a suite of new research tools has become available to social scientists (see Fielding 2003). Large gains in computing resource offer new data-handling capacities and analytic procedures, and new facilities to archive, curate and exploit social science data. A development relevant to methodological integration is in ‘scaling up’ findings from small-scale studies, which often have small sample sizes, non-standardised definitions and non-cumulative patterns of inquiry, in such a way that inquiries by cognate qualitative researchers can build on each other, and so that findings from integrated qualitative studies can in turn be related to findings from quantitative research, exploiting meta-analysis strategies. Studies of family formation, the household economy and health-related behaviour are amongst areas where a number of qualitative studies, rich in themselves, have proved unable to ‘talk to each other’ due to varying conceptualisations addressing fundamentally rather similar characteristics.

XML protocols provide the basis of a meta-data model to integrate individual analyses from cognate small-scale studies. In other words, we increasingly have just the tools the medical researchers wanted in the MMR example above. By creating a translation protocol between researchers, data, contexts and interpretations, using an XML data model and wrappers around each individual study, the meta-data model can access and query individual datasets. An ontology is used to specify a common vocabulary for both methodological and substantive facets. The ontology is in effect a practical conciliation of quantitative and qualitative epistemology. Defining it draws out and reconciles different constructions of the features of the same phenomenon. The procedure of matching up the disparate terminologies employed by different researchers in a number of independent studies enables a ‘scaling up’ of findings without the problem of multiplying error. The ontology ‘translates’ between projects (so that what study A calls ‘conflict over shared space’ is matched to ‘kids fight over bathroom rights’ in study B etc.), enabling generalisations and heuristics derived from the different studies to be reliably combined while genuine differences are identified and highlighted.

Another e-Research tool relates to the under-exploitation of archival data, particularly in the qualitative field. The capacity to link data is a key issue in exploiting archived data: linking qualitative and quantitative data, and linking material like personal biographies to census data, maps and so on. ‘Data Grids’ enable researchers to share annotations of data and access multimodal, distributed archival
material with a view to producing multiple, inter-linked analytic narratives. A given data event can be represented by multiple streams and captured using multiple tools (for sound, image, transcript, statistics). ‘Asset management’ software such as ‘Extensis Portfolio’ and ‘iVIEWMEDIA Pro’ enable a range of data types to be held in an integrated environment that supports data collection, analysis and authoring. Such an approach was used in a multimedia ethnographic study of a heritage centre (discussed in Fielding 2003). Grid computing resources were used to distribute large audio and video datasets for collaborative analysis. For example, ‘Hypercam’ software was used to record ‘physical’ interaction within a 3D graphical environment as a way of annotating and modelling different visitor behaviours in heritage centres. The 3D files could be streamed over networks via the Internet, enabling researchers at other centres to comment on and modify the behavioural models in real time. Data Grids also enable researchers to access image, statistical or audio files held in remote archives and to work on them over networks (e.g. collaboratively, or using specialist software not available locally) or download them. Thus, an image database compiled in one study can be systematically compared to those from others.

Technology opens up new types of mode comparison. The oldest ‘research’ technique is pure observation and we still gain much from carefully watching what people do. Multimedia tools like THEME combine multivariate methods to detect behaviour patterns over time (Koch and Zumbach 2002). THEME searches for syntactical real-time patterns based on probability theory. Applying it to digital film, interaction patterns relating to complex behaviours can be found that are not detectable by ‘eyeballing’ the data. Comparisons can then be made between what is found using observation recorded in conventional field notes and using THEME. Since MMRD is all about making connections, technologies that allow researchers to derive comparator datasets, open up their own data to collation with that gathered by others and detect points of disparity have a helpful part to play.

The potential analytic yield of multiple-method research from fully exploiting expensively gathered social science data and drawing on the analytic affordances of computational technologies is very attractive. Such applications interest several disciplines, including anthropologists working with visual archives, linguists with sound archives and humanities and social researchers interested in multimedia work. More significantly, the ability to interrelate a host of data sources offers the potential for multimethod research to address social science ‘grand challenges’, such as the relationship between social exclusion and educational achievement in a mixed economy, in such a way that the kind of predictive capacity and causal explanation associated with the natural sciences comes into frame for the social sciences.

NOTES

1 http://www.environment-agency.gov.uk/news/ Environment Agency launches campaign to tackle flood apathy (12/10/2005). Accessed 20/02/2006.
2 http://www.environment-agency.gov.uk/subjects/flood/826674/829803/858477/862632/?version=1&lang=_e#3
3 http://www.campbellcollaboration.org/index.html
4 http://www.cochrane.org/index.htm

REFERENCES

Bazeley, P. (1999) ‘The bricoleur with a computer’, Qualitative Health Research 9 (2): 279–287.
Bazeley, P. (2006) ‘The contribution of qualitative software to integrating qualitative and quantitative data and analyses’, Research in the Schools 13 (1): 63–73.
Becker, H. (1986) Writing for Social Scientists, Chicago: University of Chicago Press.
Blaikie, N. (1991) ‘A critique of the use of triangulation in social research’, Quality and Quantity 25 (2): 115–136.
Bourdon, S. (2000) ‘QDA software: Enslavement or liberation’, Social Science Methodology in the New Millennium: Proceedings of the Fifth International
Conference on Logic and Methodology, Köln: Zentralarchiv für Empirische Sozialforschung.
Bryman, A. (2005) ‘Why do we need mixed methods?’. Presented at ‘Mixed-methods: Identifying the issues’, Manchester, 26–27 October 2005.
Burningham, K., Fielding, J., Thrush, D. and Gray, K. (2005) Flood Warning for Vulnerable Groups: Technical Summary, Bristol: Environment Agency.
Campbell, D.T. (1981) ‘Comment: another perspective on a scholarly career’, in M. Brewer and H. Collins, eds., Scientific Inquiry and the Social Sciences, San Francisco: Jossey Bass, pp. 454–486.
Campbell, D.T. and Fiske, D.W. (1959) ‘Convergent and discriminant validity by the multi-trait, multi-method matrix’, Psychological Bulletin 56: 81–105.
Campbell, D.T. and Russo, M.J. (1999) Social Experimentation, Thousand Oaks CA: Sage.
Caracelli, V. and Green, J. (1993) ‘Data analysis strategies for mixed-method evaluation designs’, Educational Evaluation and Policy Analysis 15: 195–207.
Caracelli, V. and Green, J. (1997) ‘Crafting mixed method evaluation designs’, in J. Green and V. Caracelli, eds., Advances in Mixed Method Evaluation, San Francisco CA: Jossey Bass.
Creswell, J.W. (2003) Research Designs, Thousand Oaks, CA: Sage. Second edition.
Denzin, N. (1970) The Research Act, Chicago: Aldine.
Denzin, N. (1989) The Research Act, New York: McGraw Hill. Second edition.
Denzin, N. and Lincoln, Y.S. (2000) ‘Introduction: the discipline and practice of qualitative research’, in N. Denzin and Y. Lincoln, eds., Handbook of Qualitative Research, Thousand Oaks, CA: Sage, pp. 1–28.
Elliott, J. (2005) Using Narrative in Social Research, London: Sage.
Fielding, Jane and Jo Moran-Ellis (2005) ‘Synergies and tension in using multiple methods to study vulnerability’. Presented at ‘Mixed-methods: identifying the issues’, Manchester, 26–27 October 2005.
Fielding, J., Burningham, K., Thrush, D. and Catt, R. (2007) Public Response to Flood Warning, Bristol: Environment Agency.
Fielding, J., Gray K., Burningham K. and Thrush D. (2005) Flood Warning for Vulnerable Groups: Secondary Analysis of Flood Data, Bristol: Environment Agency.
Fielding, N. (2003) ‘Qualitative research and E-Social Science: Appraising the potential’, Swindon: ESRC, pp. 43.
Fielding, N. and Fielding, J. (1986) Linking Data, Beverly Hills: Sage.
Fielding, N. and Schreier, M. (2001, February). Introduction: On the Compatibility between Qualitative and Quantitative Research Methods [54 paragraphs]. Forum Qualitative Sozialforschung/Forum: Qualitative Social Research [On-line Journal], 2(1). Available at: http://www.qualitative-research.net/fqs-texte/1-01/1-01hrsg-e.htm [accessed 6 August 2007].
Green, J., Caracelli, V. and Graham, W. (1989) ‘Towards a conceptual framework for mixed-method evaluation design’, Educational Evaluation and Policy Analysis 11 (3): 255–274.
Hammersley, M. and Atkinson, P. (1995) Ethnography: Principles in Practice, London: Routledge. 2nd edition.
Johnson, R.B. and Onwuegbuzie, A.J. (2004) ‘Mixed methods research’, Educational Researcher 33 (7): 14–26.
Koch, S.C. and Zumbach, J. (2002, May). The Use of Video Analysis Software in Behavior Observation Research: Interaction Patterns in Task-oriented Small Groups [37 paragraphs]. Forum Qualitative Sozialforschung/Forum: Qualitative Social Research [On-line Journal], 3(2). Available at: http://www.qualitative-research.net/fqs-texte/2-02/2-02kochzumbach-e.htm [accessed 6 August 2007].
Kuiken, D. and Miall, D.S. (2001, February). Numerically Aided Phenomenology: Procedures for Investigating Categories of Experience [68 paragraphs]. Forum Qualitative Sozialforschung/Forum: Qualitative Social Research [On-line Journal], 2(1). Available at: http://www.qualitative-research.net/fqs-texte/1-01/1-01kuikenmiall-e.htm [accessed 6 August 2007].
Levins, R. (1966) ‘The strategy of model building in population biology’, American Scientist 54: 420–440.
Morgan, D. (1998) ‘Practical strategies for combining qualitative and quantitative methods’, Qualitative Health Research 8 (3): 362–376.
Needham, R. (1983) The Tranquillity of Axiom, Los Angeles: University of California Press.
Niglas, K. (2004) ‘The combined use of qualitative and quantitative methods in educational research’, Tallinn, Estonia: Tallinn Pedagogical University.
Pike, K.L. (1967) Language in Relation to a Unified Theory of Human Behavior, The Hague: Mouton.
Rhind, D. (2003) Great Expectations, London: Academy of Learned Societies in the Social Sciences.
Sieber, S. (1973) ‘The integration of fieldwork and survey methods’, American Journal of Sociology 78 (6): 1335–1359.
Spencer, L., Ritchie, J., Lewis, J. and Dillon, L. (2003) ‘Quality in qualitative evaluation: a framework for assessing research evidence’, Government Chief Social Research Office Occasional Paper 2. London: Cabinet Office.
Spiers, J. (2000) ‘New perspectives on vulnerability using emic and etic approaches’, Journal of Advanced Nursing 31 (3): 715–721.
Tashakkori, A. and Teddlie, C. (1998) Mixed Methodology, Thousand Oaks CA: Sage.
Webb, E., Campbell, D., Schwartz, R. and Sechrest, L. (1966) Unobtrusive Measures, Chicago: Rand McNally.
34
The Analytic Integration of Qualitative Data Sources
Ann Cronin, Victoria D. Alexander, Jane Fielding, Jo Moran-Ellis and Hilary Thomas
lack something. For example, people may be classified as ‘vulnerable’ because they are homeless, children are assumed to be essentially vulnerable and older people are seen as vulnerable when they lack power and capacity. Undoubtedly the uneven distribution of economic, social and political power in society leads to certain groups of people being at greater risk of adverse events such as ill-health, trauma or material loss. However, this one-sided approach tells us very little about the experiential nature either of being a member of such a group or of feeling vulnerable. Furthermore, designating specific groups of people ‘vulnerable’ and implying others are ‘not vulnerable’ leaves us unable to examine how people (regardless of their situation) experience and manage vulnerabilities in everyday life. As Wisner (1991: 128) argues, research on vulnerability needs to ‘create ways of analysing the vulnerability implicit in daily life’, and the coping strategies that people develop to manage these. This conceptualisation of vulnerability points towards the research methods which can capture these experiential aspects.

In the PPIMs project we used three methods to generate qualitative data in respect of the experiential nature of vulnerability: in-depth interviews, life histories, and visual methods. These different methods have the potential to tap into different dimensions of vulnerability. For example, verbal accounts of vulnerability elicited through in-depth interviews allow exploration of meanings of vulnerability whereas accounts generated through photo-elicitation interviews may connect with constructions connected to the visual realm. Similarly, accounts generated in interviews may emerge with different co-constructions of vulnerabilities than those generated through life history interviews. Critical reflection on these possibilities points towards the potentially heterogeneous nature of our qualitative datasets and the implications of this for integration of these particular data. One implication concerned analytic approaches to different sets of data, and the question of how to analyse each dataset using an approach appropriate to the nature of that data, so that its epistemological contribution to understanding the phenomenon is realised, whilst also being able to integrate the analyses to produce explanations and understandings which were greater than the sum of the parts.

The in-depth interviews were based on conventional practices of using a broad schedule of topics to guide the interview, and being responsive to participants’ own accounts of their experiences and meanings with regard to questions asked. On the basis of this, we considered that the most appropriate analytic approach to the dataset generated through in-depth interviews was that of a grounded thematic analysis. For this the researcher typically begins by examining the data line by line, identifying themes and coding these (see Coffey and Atkinson, 1996), then developing these codings to capture multiple meanings, coding convergence and divergence, and the relationship of codes to broader categories. The process is iterative, and involves segmenting the data. Analysis then proceeds through consideration of codes and categories to develop a thematic level of analysis.
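The mechanics of line-by-line coding and segmentation described above can be sketched as a small data structure. This is a hypothetical illustration only, not the procedure used in the PPIMs project: the names `Segment`, `code_transcript`, `segments_by_code` and `context`, the cue-word codebook and the sample transcript are all invented for the example, and matching cue words is a crude stand-in for the interpretive, iterative judgement that grounded thematic coding actually involves. What the sketch does show is one way of keeping each coded segment tied to its interview and line position, so that it can later be read in its original context.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    interview_id: str  # which interview the line came from
    line_no: int       # 1-based position, kept so context can be recovered
    text: str
    codes: tuple       # codes applied to this line


def code_transcript(interview_id, lines, codebook):
    """Apply every code whose (hypothetical) cue words occur in a line."""
    segments = []
    for line_no, text in enumerate(lines, start=1):
        lowered = text.lower()
        codes = tuple(sorted(
            code for code, cues in codebook.items()
            if any(cue in lowered for cue in cues)
        ))
        if codes:
            segments.append(Segment(interview_id, line_no, text, codes))
    return segments


def segments_by_code(segments):
    """Group coded segments under each code for comparison across data."""
    grouped = defaultdict(list)
    for seg in segments:
        for code in seg.codes:
            grouped[code].append(seg)
    return dict(grouped)


def context(lines, segment, window=1):
    """Return the segment together with `window` lines either side of it."""
    start = max(0, segment.line_no - 1 - window)
    return lines[start:segment.line_no + window]


# Invented example data, for illustration only.
CODEBOOK = {"safety": ["safe", "afraid"], "coping": ["manage", "cope"]}
TRANSCRIPT = [
    "I moved here about ten years ago.",
    "At night I do not feel safe walking home.",
    "But I manage by phoning a friend on the way.",
]

if __name__ == "__main__":
    segs = code_transcript("interview-01", TRANSCRIPT, CODEBOOK)
    for code, items in segments_by_code(segs).items():
        print(code, [(s.interview_id, s.line_no) for s in items])
    print(context(TRANSCRIPT, segs[0]))
```

Storing the interview identifier and line number with every segment is a deliberate design choice here: grouping by code pulls segments away from their source, and the `context` helper is what lets the analyst go back to the surrounding talk before interpreting a segment.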
THE ANALYTIC INTEGRATION OF QUALITATIVE DATA SOURCES 575
The practicalities of the process of comparison of segments of data lead to an enduring problem of this type of analysis, namely that the segments are to some extent removed from the contexts of their occurrences within the interview. The development of the thematic analysis requires the researcher to re-connect segments to contexts in order to derive legitimate interpretations of the data.

The visual study component of the project was based on two visually rooted methods: photo-elicitation interviews based on photographs, and video-recorded neighbourhood tours. In this chapter we focus on the verbal dataset generated through the photo-elicitation interviews. Photo-elicitation interviews involve participants discussing photographs with the researcher. In our study, participants themselves generated the photos about which they were interviewed. Collier (1967), an early advocate of this technique, suggests that the use of photographs during interviews helps frame and focus the discussion, sharpen memory, evoke rich descriptions and set the informant at ease. The interview enables participants to discuss their interpretation and meaning of the photographs and to provide an explanation for why they chose to photograph what they did. We felt that for this dataset a thematic approach to the photo-elicitation interviews was also appropriate. However, the presence in the photo-elicitation interview data of references to the photographs, and hence to the visual realm, both by participants and by the researcher, created a framing of participants’ experiences to which the thematic analysis also had to attend.

The third set of qualitative data was generated via interviews with people who were, or had recently been, homeless. Our experience of running a focus group with previously homeless individuals indicated that although participants were willing to engage in interactive discussion about their experiences of homelessness, they were concerned to present their own life accounts, or stories, of homelessness. Taking this into account, subsequent individual interviews specifically used a life history approach which enabled participants to narrate their experiences and thus situate the issue of vulnerability in a broader context. Consequently, these accounts required a different analytic approach. To have analysed such accounts through thematic analysis – which pulls short segments out of the whole interview, fragmenting it – would not have maintained the integrity of the participants’ stories. Accordingly they were analysed using a sociologically informed narrative analysis approach.

Narrative analysis focuses on the social construction of the story and the role that stories play in the construction and presentation of identity (Rosenwald and Ochberg, 1992). Moving beyond the idea that a story is representative of an individual life, attention is focused on the ‘joint actions’ involved in the production of the story. Plummer’s (1995) tri-partite model of the producers (those who tell their story), the coaxers (those who encourage and enable the story to be told) and the consumers (those who read/hear the story) is illustrative of this mode of thinking. Even though the producer (teller), encouraged by the coaxer, draws on real events and experiences to tell the story, the story is only ever an interpretation of the significance of past events and experiences. Finally the consumer will add another layer of meaning and interpretation onto the story. As Riessman (1993) notes, representation is ambiguous and always open to different interpretations. Thus, both the meaning and the consequences of a story are always contingent upon, first, the social location of those involved in the production and consumption of the story and, second, the wider social context in which the story is told. For our purposes here we focus on the producers (the participants) who tell their stories.

In contrast to thematic analysis, narrative analysis begins by identifying the ‘sequence’ of a story. While ‘sequencing’ can take many forms, including chronological, consequential or thematic sequencing, it focuses attention on the socially constructed nature of the story. Thus analysis moves beyond the mere identification of past events and experiences to concentrate on trying to understand the
young people in our study. On this basis we moved to Step 2 and took it up as a ‘promising’ thread, systematically identifying and analysing ‘physical safety’ in these and the other datasets we had generated. Through this we identified codes and categories, and generated emergent findings on ‘physical safety’ for each dataset. This led on to Step 3 where we juxtaposed these to create a data repertoire. This repertoire was then analysed further, with particular emphasis on analytic questions such as whether issues concerning vulnerability and physical safety were persistent features of experiences of vulnerability, the different facets that were revealed by different research methods, and the importance of contexts for how this form of vulnerability was experienced.

Whilst it is interesting that two of our sets of participants – people who are homeless, and children and young people – are usually classified as ‘vulnerable’ in policy terms, or categorised as members of a vulnerable group in objective measures and conceptualisations of their social position, they were not selected for particular analytic attention on this basis. Rather their data have been given prominence here because of the strong resonances we found between the emergent findings in the analysis of the photo-elicitation interviews, which were conducted with a range of people, and the initial analyses of the data generated with these two groups of participants. Our orientation to all the participants in the study was to their subjective understandings and constructions of vulnerability in their everyday lives, and their accounts of how they strategically manage these vulnerabilities. This precludes any assumptions being made about essentialised or inevitable vulnerabilities for any group of participants in our studies. In this respect, the use of multiple qualitative methods was particularly valuable as it enabled us to gain an extensive and intensive exploration of vulnerability as a subjective interpretive phenomenon. This allowed for people’s own understandings and agency and moved away from the overarching deterministic discourses which tend to characterise notions of vulnerable groups and individuals at an objective level.

Step 1 – Initial analysis

The photo-elicitation interview data

Thematic analysis of the photo-elicitation interviews suggested that participants associated a threat to physical safety with specific places, groups of people or hazards, with a distinction being made between physical assault and accidents. Photographs of dark alleyways, deserted paths, and graffiti were taken by respondents to represent unsafe places where assaults could occur. Participants often said that they avoided these places, especially at night. Photographs of a fast lorry, a dark street and a blind curve represented potential traffic hazards. Even though participants said that they had to exercise due care, these hazards were constructed as being beyond the control of the individual and responsibility was seen to rest with ‘the Council’. In relationship to a photo (of a blurry lorry), one participant commented:

I hate the lorries using this as a rat run to the industrial estate at the end because they make the house shake. The whole road is up in arms about that. (Jane, 37 years old)

Participants made a further distinction between potential threats (either malicious or accidental) to their own safety and threats to other people. In the latter case, participants talked about the threat to specific groups of people – children, the elderly or the disabled – suggesting they saw vulnerability as being an inherent characteristic of these particular groups. One respondent, for example, photographed an uneven pavement which she saw as a potential tripping hazard. She was not concerned for her own safety, but referenced ‘vulnerable old people’, perhaps with walking sticks, who could easily trip. The same respondent photographed the detritus of drug use but focused her concern on this being found near a primary school:

… it’s literally about 50 yards from the back end of the school field and there is a gate that goes from the junior school, to this. It is literally about 50 yards
and you go down there and they have got, they have made, bits of furniture that have been chucked away, like that was a table and all around there there is paraphernalia, what I call paraphernalia. There is drink cans, there is coke cans where they have made bombs to smoke drugs, there’s even silver foil where they have actually, we did have a look and it looked as though they had been smoking heroin and that is a concern, obviously, to the whole of the neighbourhood because any kids of any age can go down there. (Alice, aged 56-65)

Certain types of public space, represented by photographs of alleyways, overgrown passages between buildings and a subway were considered intrinsically unsafe, particularly at night time, due to the potential for physical assault. The canal had a more ‘fluid’ status as a safe/unsafe place, seen as a recreational amenity during the day but dangerous after dark. In addition, specific groups of people (the homeless, drunk people, local youth gangs) were labelled ‘trouble’ or ‘scary’, generally because they represented a potential threat to an individual’s safety. Even though participants did not take photographs of people whom they feared – participants cited safety reasons for not photographing these threatening people, but also said that they did not feel comfortable invading the privacy of such individuals – other means were used to indicate the sense of threat felt by participants. For one respondent, a photograph of graffiti was emblematic of a gang of youths who were considered unstoppable due to the support they enjoyed from older male relatives. In contrast, participants took images of graffiti to suggest that crime was generally prevalent in the area where it was found.

Photographs of CCTV cameras were presented as either representing the dual sword of security and surveillance or, in one instance, given that an old man had been physically assaulted twice under the photographed camera, used to question the notion of security implicit in the use of CCTV cameras. In contrast to these examples, photographs of personal spaces – homes, gardens or bedroom – were taken to indicate safe, comfortable places.

Whilst experiences and perceptions of vulnerability were represented visually in the photographs as well as elaborated in the accompanying photo-elicitation interviews, it was only in the latter interviews that people talked about how they managed potential threats to their physical safety. From this verbal data we identified three key strategies which participants used to minimise either actual risk or their perception of risk. The first strategy was to avoid places or people categorised as unsafe. The second was related to the degree of familiarity participants felt about their local environment. While a high degree of familiarity could be used to aid decisions about which places or groups of people to avoid, it was also used to ‘offset’ feelings of insecurity or a lack of safety. One woman, for example, claimed that she felt safe living in the neighbourhood despite knowing that other people had been assaulted there. She had lived in the neighbourhood for a long time and was familiar with it, and so felt it was safe. This links to the third strategy of displacing the perception of risk to oneself onto groups of people already designated vulnerable.

Step 2 – Picking up the promising thread of ‘physical safety’

The accounts of children and young people

Picking up the thread of physical safety in the interviews with children and young people, the analysis showed that these participants were often making decisions, and taking actions in relation to their safety, based on other people’s worries and concerns rather than their own. In particular they were subject to the worries and concerns of their parent(s), which varied in terms of what the worry was, and how strongly it was a factor in parental moves to constrain their child’s actions:

I: Are there any […] rules that your parents set [about using the internet]?
P: Not really but they don’t let us have hotmail because of the chat room, my sister had it but I don’t know what she did but then they banned it … so I don’t get the benefit which I think is really unfair as all my friends have it and I’m the only one who doesn’t have it
I: Do you understand the reasons why you can’t have it?
P: Not really, I asked but they wouldn’t tell me.
(Tom, age 13 years)

The children in our study, aged 10–13, indicated they were constrained concerning their actions, the places they could go, and how they got there. In general they accepted these limitations whilst also wishing for, and indeed trying to gain, greater autonomy in their movement in public spaces. Two of the children who had recently started cycling into the town centre on their own identified this as an extension of their usual domains beyond the house and garden. Undertaking this venture was accompanied by an acute awareness that they needed to guard their safety in respect of being in the town unaccompanied by an adult. Thus the threat, and their physical vulnerability, was associated with being in a particular place without the protection of an adult rather than the hazard of cycling on the roads (the latter being a safety issue they did not mention). Another child spoke of his sense of a particular threat to his safety when he was not in the company of a protective adult:

I: What is it about strangers that you worry about?
P: Kidnapped.
(Jack, 13 years old)

Indeed for some children the threat of being kidnapped or murdered framed their reflections on whether there were places in the town that they might not go, or where they had to be careful. These threats were ‘monstrous’ but at the same time the children outlined their strategies for maximising their safety, primarily through being able to identify people who might pose such a threat:

P: If I like see someone who doesn’t, if it’s late or something and I find, if I see someone who doesn’t look like normal than I just walk off with my mates and go somewhere else.
I: […]what kind of things do you look for when you’re trying to decide if someone’s OK or a bit?
P: It’s just like if he doesn’t look right, they’re watching and things.
(Stuart, age 12 years)

Constraints related to safety were also often contingent on time of day. The arrival of ‘the dark’ was a particularly important marker of a shift from a safe time to an unsafe time. In this respect, the temporality of safety and threat resonated with a similar framing by adults as well as children in the visual data accounts. However, in the visual data it was named places that became less safe with the arrival of night time, whilst in the interviews with the children parental fears were understood as being simply about ‘the dark’:

I: What about when you are outside playing? Are there rules about where you can go or what time?
P: Sometimes I am not allowed to go to the park I have to stay right in front [garden]. And we are not allowed to come home really late.
I: What is late, what would be late?
P: Well, when it gets dark. When it gets dark.
(Yasmin, 11 years old)

In contrast young people, generally aged 14–18, felt that these worries about safety belonged really to their parents and did not reflect the safety issues that they actually had to deal with when they were out and about in public spaces. These young people identified having to deal with threats of violence: some of the places they went – the amusement arcade, the town centre – opened up the possibility that they might encounter individuals who wanted to fight, gangs or general violence. Thus, it was important to know when to leave a place and who to avoid. Furthermore, young people often worked to manage their parents so they did not find out about these hazards, for example by withholding information as to their true whereabouts or by presenting themselves in ways designed primarily to reassure their parents:

I go to my friend’s house and we’ll go out, and I’ll just text my parents and say we’ve gone here, there or wherever. If I’m staying at a friend’s house, I will go out with them but won’t tell my parents. (Lucy, 14 years old)

While space does not permit a full discussion here, one of the girls in the study also talked about managing gendered threats to her safety
from men, whilst another indicated that this was a parental worry that she had to negotiate in order to be allowed out with her friends or on her own.

Young people who articulated a definition of vulnerability tended to associate it with the ability or inability to defend oneself physically from attack.

The narrative data: Stories of homelessness

Picking up the thread in the narrative interviews with people who were homeless, the analysis revealed the ways in which the topic of physical safety in the accounts of people who had experienced homelessness was a salient factor in both the construction of identity, and the material practices of daily life. Physical safety – the lack of it, the search for it, the meaning of it – was an integral part of individual stories.

Many participants presented biographical, chronologically structured accounts of their lives which highlighted the lived experience of vulnerability and its links to (a lack of) physical safety. This included, for example, physical, sexual and emotional abuse in childhood, experiences of being street homeless, the physical dangers inherent in alcohol and/or drug abuse or the transient nature of many homeless people’s lives. At the individual level it was evident that the majority of the stories were structured around the ‘quest’ for physical safety; taken collectively it was possible to chart the different ‘stages’ involved in homelessness, the strategies developed at each stage to deal with the experience, and the subsequent impact on identity.

One young man – David – for example, had become homeless in his home town and had lived for a short period of time in a car, yet had felt safe doing so because of his familiarity with the area and the people. This contrasted sharply with his recent experiences of living in a night shelter. His lack of familiarity with the area, coupled with his perception that local people were actively hostile to homeless people not only led him to reflect on the salience of this new identity for him but also to develop strategies to publicly hide this identity: he avoided mixing with other homeless people in public therefore hoping to ‘pass’ as a general member of the public, thus remaining safe. Another resident of the night shelter – Tom, a man in his late thirties and homeless since the age of 14 – had developed additional strategies to cope with the physical threats that arise from being street homeless. On arriving in a new and unfamiliar environment he applied knowledge gained in previous locations to the new location, in short constructing a ‘universalised’ safety ‘map’. For example, previous experiences had taught him that the chances of being physically attacked were higher if he slept in the centre of a town as opposed to the outskirts, thus he routinely avoided the centre of all towns.

Participants who had been through drug and/or alcohol rehabilitation and were currently living in residential move-on accommodation, from which they hoped to move to individual permanent accommodation, were at a different stage and this was reflected not only in the telling of their stories but also in their reflections on physical safety. While producing in-depth accounts of past threats to physical safety, the majority felt physically safe in the present although recognising that this was contingent upon remaining alcohol and/or drug free. Looking to the future, participants expressed concerns that they might be housed in areas populated by drug users and dealers, which would constitute a new threat to their physical safety.

These participants adopted a number of strategies to reduce threats to their physical safety, including avoidance, ‘invisibility’ and ‘passing’. Additionally, recovery from alcohol and/or drug abuse was often talked about in terms of a long-term strategy to reduce the risk of physical harm, in as much as the ultimate goal is permanent accommodation and reintegration into ‘mainstream’ society. In addition, participants’ explanations for why they left home could be construed as a strategic act of resistance, whereby being homeless was considerably preferable to being subjected to further abuse at home.

In the discussion of the previous two datasets it was possible to use data extracts
from the interviews to illustrate our analysis. Unfortunately, a combination of a lack of space and methodological considerations does not permit the inclusion of data extracts from the homeless accounts – one ‘extract’ ran to some 12 pages of transcription. In order to do justice to the data we would need to present extended data extracts to demonstrate the narrative nature of the accounts and the presentation of identity. One man, for example, began his interview by asking if the interviewer wanted the story of his life and then proceeded to provide a very detailed chronologically ordered account of his life, which attempted to provide a socially situated explanation for his homelessness. Drawing on the notion of ‘discredited identities’ (Goffman, 1963) it is possible to see how the interview provided the homeless participants with the opportunity to provide an alternative account of homelessness from the negative one traditionally portrayed in society.

Step 3 – Creating a data repertoire

The third step of this process of analytic integration involved juxtaposing both the initial analytic findings of the individual datasets and the data segments/elements that had been coded in the initial analysis to create a data repertoire for the theme of ‘physical safety’. This was then subjected to further analysis and interpretation, looking for commonalities and differences, convergences and divergences. Effectively this repeats the process of inductive analysis with the data identified as salient to the thread of physical safety whilst remaining mindful of the implications of the nature of the data and its origin. It is through the development of the analysis of the data repertoire that findings can be integrated to produce a more complex understanding of the thread and its relationship to the overall research question.

In relation to physical safety and vulnerability, further analysis of our data repertoire led us to understand physical safety as both a present and an embedded past feature of the lives of people who were homeless, a present but negotiable hazard for young people not living with violence and a visually locatable phenomenon for all participants. For young children the concept was often related to extraordinary events (kidnap, murder) rather than more ordinary or frequent threats to physical safety such as road traffic accidents, muggings or assault.

Contingent features of vulnerability also emerged out of the analytic integration of the three datasets. These included vulnerabilities associated with physical safety which were contingent on time of day/night as well as being linked to material-spatial-architectural aspects of public spaces. In terms of dealing with situations and locations which increased perceptions or senses of physical vulnerability, all participants identified strategies which they used to manage their (potential) vulnerability.

It became clear that perceptions, constructions and experiences of vulnerability also diverged in different domains for different groups of participants. For people who were homeless, vulnerability was closely tied to the biographies that had led them to be without a home. They identified physical assault as a recurrent feature of their childhood homes, their temporary homes in their adult lives, and of their times living on the street. In addition, threats to their physical safety were encountered, or anticipated, when moving into new areas or new towns and occasioned the need to make decisions about where they would stay and where they would locate themselves. Physical vulnerability was tied into the identity of being homeless in a profoundly biographical and narrative way.

In contrast the physical safety issues that concerned children and young people in our study reflected the ways in which they are positioned between structures which constrain their actions on the basis of their age, and their own desires, opportunities, and abilities to be (relatively) autonomous social actors (see Hutchby and Moran-Ellis, 1998; James and Prout, 1990). In this regard their constructions of physical safety and vulnerability were linked to the relative distributions of power between adults and children/young people, and the ways in which
these distributions intersect with their social worlds. For the children in the study, their sense of vulnerability in physical terms related to extending their usual geographical range from their immediate localities with known adults nearby to being unaccompanied in public spaces at a further distance from home. They managed their vulnerability by adhering to parental rules which they understood to be designed to maximise their safety, and by developing their own readings of other people in their vicinity in terms of whether or not they might present a threat. For the young people in the study their social worlds were already more extended both geographically and temporally, but they sought greater control and autonomy in their movements and activities. With this came an increased likelihood of having to deal with physical safety issues, with threats presented by others in the form of fights, gang actions, violent encounters, and possible sexual harassment or assault. Key for the young people in the study was managing parental concerns so that the young people could exercise physical autonomy in the face of other people’s worries about their vulnerability whilst balancing this with managing the potential risk of actual violence when they were in the public arena. Their perception of their own physical vulnerability was framed in the context of their strength or weakness relative to their potential assailant.

Physical safety and vulnerability took on a different dimension in the domain of the visual as represented in the photo-elicitation interviews. Here it was the material fabric of places which was invoked visually and verbally as increasing or decreasing vulnerability to physical hazards and assaults. The built environment was taken to be a context in which a person’s vulnerability may be accentuated – for example that of older people who might trip over loose paving, or children who were at secondary risk to the hazards of drug taking near their school. This material context intersected with ideas of time of day and sources of responsibility to produce physical vulnerability as a product of ecology on the one hand and an inherent characteristic for some on the other. Physical vulnerability can be understood in visual terms as readings of present dangers, future dangers, and attributed responsibility for causing the vulnerability to outside agents such as the Council, or a local group of youths. Strategies for managing safety were not manifested in the visual domain, emerging instead as accounts of actions including avoidance of the location.

In summary then, physical safety emerges as a dimension of vulnerability but how it emerges is contextual to the social worlds of the participants. How people experience vulnerability, and how they act on that, varies considerably whilst the environment presents different degrees of threat. For the homeless people in the study physical safety was a key strand in their narratives, interweaving with their identities and biographies. For the young people and children it was a site around which the relationship between their structural position in their families, and in society more generally, and their status as social actors is played out. Visually, the notion of physical safety can be framed by participants as a material, ecologically located phenomenon.

Uniting these dimensions brought us to considerations of vulnerability and safety which suggested that, whilst there were commonalities of dimensions across different genres of experience and perception, such as the significance of time and place, this form of vulnerability also intersected with individuals’ notions of their own and others’ identities. This led us towards theorising how this aspect of vulnerability and its intersection with (or contribution to) individual identities fits with other forms of vulnerability. To address this question we returned to the data to identify other ‘promising threads’ and followed those analytically across the datasets. The final goal was to synthesise these findings with other themes, to create multi-faceted understandings of vulnerability and its management in everyday life across a broad range of dimensions that emerged from our research. At the end our theoretical understanding of vulnerability
was a picture woven from these different approaches. In respect to this, the PPIMs
threads. team critically examined what happened to
the narrative accounts of the homeless people
when the data repertoire was created. Our
conclusion was that the data repertoire could
CONCLUSION
encompass narratives provided effort is made
to preserve their integrity by constantly re-
Challenges in this approach
examining the links between the themes
Our goal in this chapter has been to demon- and the narratives. In our case, the over-
strate how integration of different qualitative arching structural narrative feature, which
datasets, through an examination of each set was paramount to our understanding of the
of findings relating to safety and vulnerability, homeless participants’ accounts, draws on
increased our understanding of the experience notions of identity as a homeless person. The
of vulnerability. Each dataset contributed an theme of physical safety extended beyond
equal share to the analysis of vulnerability and specific instantiations to form a cornerstone
physical safety, and as the analysis proceeded, of identity. We suggest that the salience of
we were able to reflect on the complex nature identity in these accounts complements those
of vulnerability in this regard. produced by other participants and resulted
This process for generating analytic inte- in an increased understanding of the theme
gration is time-intensive and entails a number of physical safety and the overall theme of
of challenges. The first is identifying ‘promis- vulnerability.
ing’ threads. There are a number of strategies Nevertheless, the potential ‘risk’ of some
used in single dataset analyses which can types of data being ‘translated’ into other
be drawn on: inductive leads may arise types remains. We successfully retained
from within the project, through reference the narrative quality of the homeless data;
to the research question or sensitivity to the however, we were unable to convey a
content of the data, or it may be sparked sense of the story-ness of the data in a
externally, so to speak, by the stimulus of short chapter such as this. Similarly, in this
theoretical work and other empirical studies. chapter we described photographs, thereby
In addition, team discussions about dataset translating visual data into verbal, and relied
contents, emergent findings, and puzzling on the transcriptions of the photo-elicitations
questions are essential for establishing res- (textual data) leaving aside actual visual
onances between the datasets. Thus it is analysis which was also part of the study.
important that team research includes team The photo-elicitation draws on the visual
members with a range of expertise, allows knowledge of the study participants and is
for appropriate methodological divisions of therefore distinct from the other interview
labour, and includes sufficient opportunities data. We believe that visual data itself, as
for good communication within the team. with narrative data, can be part of analytical
Another key challenge is to allow each integration; however, conventional reporting
dataset its own integrity throughout the and publishing formats provide challenges in
integration process. Creating a data repertoire presenting such data in their own terms.
of systematically identified initial analyses, In this chapter we have argued, drawing
assembled for further analysis to produce an on our earlier work, that integration should
integrated story about a particular aspect of be thought of as a process which creates, and
the phenomenon (such as we have done analytically exploits, a particular relationship
in this chapter with respect to physical safety between different sets of data. We have
and vulnerability), might seem to privilege a also argued that since all qualitative data
thematic approach to analysis. If this were are not alike attention must be paid to
the case, it would be problematic for data the processes by which research generating
more appropriately handled by other analytic multiple qualitative datasets will achieve
584 THE SAGE HANDBOOK OF SOCIAL RESEARCH METHODS
integration, where that is the purpose of having a multiple methods research design. To this end we have presented a model for the practical accomplishment of integration at the level of analysis – ‘following the thread’ – which focuses on ensuring that the integrity of each type of dataset is preserved in the process of integration, and hence the epistemological contribution of each set of data is maintained. We also argue that this approach offers the opportunity for synergies between datasets in order to achieve one of the goals of multiple-methods research: the generation of an overall analysis which is greater than the sum of the (methodological) parts.

NOTES

1 ESRC Award H333250054 Investigating Practice and Process in Integrating Methodologies (PPIMs). The project is funded by the ESRC under the Research Methods Programme http://www.ccsr.ac.uk/methods/.
2 A pseudonym for a small town in the South of England. All participants are anonymised.
3 This alludes to a repertoire of dance or music pieces, rehearsed and developed, which provide a pool from which a selection is made to create a particular conceptual performance. We use this to capture the assemblage of initial analyses which are not ‘raw’ data, have their own (methodological) integrity, and which can be brought together to produce a coherent ‘story’. We would not, however, wish the metaphor to be taken too far: the intention is to provide some language to describe this part of the process of integrated analysis.

REFERENCES

Brannen, J. (2004). ‘Mixing methods: the entry of qualitative and quantitative approaches into the research process’, International Journal of Social Research Methodology, 8(3): 173–184.
Bryman, A. (2004). Social Research Methods, second edition. Oxford: Oxford University Press.
Caracelli, V.J. and Greene, J. (1997). Advances in Mixed-Method Evaluation: The Challenges and Benefits of Integrating Diverse Paradigms, New Directions for Evaluation, No. 74. San Francisco, CA: Jossey-Bass.
Coffey, A., and Atkinson, P. (1996). Making Sense of Qualitative Data Analysis: Complementary Strategies. Thousand Oaks, CA: Sage.
Collier, J. (1967). Visual Anthropology: Photography as a Research Method. New York: Holt, Rinehart and Winston.
Corden, A., and Sainsbury, R. (2006). ‘Exploring “quality”: research participants’ perspectives on verbatim quotations’, International Journal of Social Research Methodology, 9(2): 97–110.
Coxon, T. (2005). ‘Integrating qualitative and quantitative data: What does the user need?’ FQS (Forum: Qualitative Social Research), 6(2): e-paper. http://www.qualitative-research.net/fqs/fqs-eng.htm. Accessed July 2006.
Dicks, B., Soyinka, B., and Coffey, A. (2006). ‘Multimodal ethnography’, Qualitative Research, 6(1): 77–96.
Goffman, E. (1963). Stigma: Notes on the Management of Spoiled Identity. Englewood Cliffs, NJ: Prentice Hall.
Greene, J.C., Caracelli, V.J., and Graham, W.F. (1989). ‘Toward a conceptual framework for mixed-method evaluation designs’, Educational Evaluation and Policy Analysis, 11: 225–274.
Hutchby, I., and Moran-Ellis, J. (eds) (1998). Children and Social Competence: Arenas of Action. London: Falmer Press.
James, A., and Prout, A. (eds) (1990). Constructing and Reconstructing Childhood. London: Falmer Press.
Mason, J. (2006). ‘Mixing methods in a qualitatively driven way’, Qualitative Research, 6(1): 9–25.
Moran-Ellis, J., Alexander, V.D., Cronin, A., Dickinson, M., Fielding, J., Sleney, J., and Thomas, H. (2004). Following a Thread – An Approach to Integrating Multi-method Data Sets, paper given at ESRC Research Methods Programme, Methods Festival Conference, Oxford, July 2004.
Moran-Ellis, J., Alexander, V.D., Cronin, A., Dickinson, M., Fielding, J., and Thomas, H. (2006). ‘Triangulation and integration: Processes, claims and implications’, Qualitative Research, 6(1): 45–59.
Pawson, R. (1995). ‘Quality and quantity, agency and structure, mechanism and context, dons and cons’, BMS, Bulletin de Methodologie Sociologique, 47: 5–48.
Plummer, K. (1995). Telling Sexual Stories: Power, Change and Social Worlds. London: Routledge.
Riessman, C.K. (1993). Narrative Analysis. London: Sage.
Rosenwald, G.C. and Ochberg, R.L. (1992). Storied Lives: The Cultural Politics of Self-understanding. London: Yale University Press.
Wisner, B. (1991). ‘Rural livelihoods in Kenya, 1971–1990: Further reflections on justice and sustainability’. Paper presented to the Association of American Geographers, Miami.
35
Combining Different Types of
Data for Quantitative Analysis
Manfred Max Bergman
A man [sic.] with a watch knows what time it is.
A man [sic.] with two watches is never sure.
Segal’s Law

INTRODUCTION

Data do not occur naturally, nor do data ever speak for themselves, nor does there exist an obvious interpretation for a datum. Instead, data are manufactured and interpreted to fit a particular research purpose or line of argumentation. Empirical detection and interpretation of presences or absences, patterns, order, structure, or change, regardless of whether inductively or deductively derived, are the outcome of theoretical models and assumptions underlying analysis, of which data are an integral part. Already in 1964, Coombs wrote: ‘knowledge is the result of theory – we buy information with assumptions – “facts” are inferences, and so also are data and measurements and scales’ (1964: 5). If data production is part of the constitutive process of research, then from where do they come and of what are they made? And if data are indeed thus produced, what are the advantages of using more than one dataset for a particular research purpose?

This chapter is about what and how data are detected and used, and, as a consequence, how certain limitations thus arising may be overcome by using more than one dataset. Of particular interest are different types of data and how they are selected and combined in modern research designs. For this objective, it is necessary, first, to conceptualize data and their integral position within the research process, second, to understand the process of data production, and, third, to explain the possibilities and limits of using more than one dataset for a research project. This chapter will not deal with data analysis issues specifically but will nevertheless cover reasons for which more than one dataset could be used in quantitative research. In addition, while many of these issues could be applicable to qualitative or mixed-methods analysis, the explicit focus here is on quantitatively oriented research. Finally, there exists an excellent literature on validity and reliability, which connects in many ways to the use of
more than one dataset for a particular research purpose. In this text, however, such issues are not covered in detail. The utility of using multiple datasets transcends quality issues relating to classical validity concerns but tends to be under-theorized. This chapter addresses this omission.

DATA AND THE RESEARCH PROCESS

Prompted by various introductory texts and lectures on research methods and methodology, most people understand empirical research as a tripartite process: the conceptualization of a research question, the collection of data, and the analysis of these data, from which the research results emanate.

The conventional view of the research process

The conventional model about the research process connects the four principal research components, i.e. research question, data collection, data analysis, and the research results¹, in a specific way. Figure 35.1 illustrates this conventional view of the research process.

[Figure 35.1: conventional view of the research process; boxes labelled Research Question, Data, Analysis, Results]

There are three fundamental problems with this research model: chronology, fragmentation, and apparent inevitability:

• Chronology: This conventional model implies a chronological ordering of the different parts of the research process such that researchers appear to have settled on a research question before they collect or select appropriate data, and only then would they consider how these data are to be analyzed. Thus, the model strongly implies a deductive approach to research, while inductive research, including data exploration and visualization, is either ignored or spurned². In practice, researchers often either formulate or at least adjust their research questions according to the characteristics of the available data. This is particularly the case with secondary analysis of existing data, where researchers often create proxies from variables that may be related to, but do not fully connect with, a construct under investigation, or they adjust their research questions or models to create a more adequate fit between the constructs embedded in the research question and the data available. Moreover, few researchers are unclear about what analytic techniques they will use, at least in general terms, before they have collected their data, often selecting the analytic strategies and methods according to their analytic competences and habits. Quite often, specialists in multidimensional scaling, correspondence analysis, latent class analysis, etc. tend to stick with the technique with which they are familiar.
• Fragmentation: Due to the conventional tripartite division of the research process, researchers tend to focus on the details relating to the components of the research process – research question, data collection, and data analysis – while neglecting the intricate relations between them. However, the quality of the research process and its results is at least as dependent on the interconnectedness between the components as it is on the components themselves. Due in part to this fragmented research design, many research results are unconvincing or incommensurable with other research findings, despite the availability of appropriate data and the application of sophisticated analytic techniques. This connects to some extent to John Tukey’s suggestion that there exists an error source far more treacherous than the Type I or Type II error: the greatest threat to validity, the ‘Type III error,’ is asking the wrong questions of the data (cited in Raiffa, 1968).
• Inevitability: This model also implies a certain inevitability of the results that emerge from the research question. The research results are believed to be an inevitable consequence of the research question because the data were collected
or selected based on their suitability for answering a particular research question, and the analytic technique was selected according to the data at hand and in line with the research question. However, this is not necessarily the case. Just because a dataset is analyzed adequately, i.e. the analysis conforms to established standards and its output provides an answer to the research question, it does not mean that no other analyses are equally adequate for this dataset and research question. A different analytical model with the same data or a similar statistical model with different data is likely to produce variations in the results, even if the research question remains the same. What is neglected in this tripartite research process with one dataset and one analytic strategy is the awareness of equally suitable alternatives, i.e. other suitable datasets or analytic strategies, which could have served equally well to answer the research question. Due to the implied causal chain – from research question via a dataset to the research results – variations in results due to alternative data choices or analytic strategies are rarely considered.

An alternative view of the research process

Experienced researchers are not taken in by this traditional model and its implications. They are aware that the components of the research process are far more integrated, that many decisions about data analysis have been taken long before data have been collected, and that there exist many options to answer a particular research question. All research findings are contingent. The intricate interconnectedness between research results, research question, data, and analysis is illustrated in Figure 35.2.

Figure 35.2 Interdependence between the research question, data, analysis, and results

Even though this model is less parsimonious than the research model presented in Figure 35.1, it is more comprehensive, making explicit the complex interactions between different parts of the research process. As a more realistic representation of the research process, it implies that:

• The research question, data collection, and data analysis are interconnected reciprocally and are thus not connected to one another chronologically. For example, experienced researchers formulate precise research questions or hypotheses based in part on the data that can be or have been collected. They furthermore collect data such that they are suitable for a particular set of analyses. This does not only refer to the kind of data necessary to answer research questions but also to the ‘shape’ data need to have in order to be analyzed according to various analytical techniques.
• The intricate relations between the components of the research process make it necessary to consider the research process within a larger research framework, within which the relations between the components are as important as the components themselves. Thus, there may exist
different ways to analyze a particular dataset or there may exist many different datasets relevant for a research question. The criteria for data and analytic selection are not only based on the suitability in relation to the research question, but also on familiarity with the data or analytic technique, contemporary fashions and trends, institutional politics, access and cost, political and economic context, etc.
• The research results are a function of not only the research question, but also of choices relating to the selection and preparation of data and analysis. As different datasets and different analytic techniques respond to different parts of a particular research question, so will the results be a function of not only the research question per se, but also of the selection of the dataset (including how key concepts were operationally defined before data were collected, as well as the context within which these data were collected) and the analytic technique (including how data were prepared for analysis and which analyses were conducted). In other words, regardless of whether researchers frame their work in a materialist-realist or a constructivist paradigm, empirical research always has a constructivist slant to it because no objective manner exists to, for instance, define, measure, or analyze a cultural value, an attitude, a social class, a policy, an education level, or a poverty line. Empirical research results are framed by the way a research question has been phrased and operationally defined, as well as what and how empirical phenomena were selected and prepared as data. They are framed furthermore by how and in what context these data have been collected and prepared for analysis, how they have been analyzed, and how the results from the analysis have been interpreted and qualified.

While both models indicate that data should be collected or selected according to their suitability for the research question, the second model also shows – via the double arrows – that any specific dataset will only partially answer a research question. An example will clarify the arguments above. The European Social Survey (ESS, 2004) includes 21 items of a 56-item scale to measure ten ‘universal cultural values,’ as developed by Schwartz (1999). It should thus be possible to test Schwartz’s hypothesis that the 10 values indeed exist within a particular configuration among the countries participating in the survey. However, as only 21 items of the scale were included in the ESS survey, some aspects of the theory cannot be tested fully (Schwartz, 2005). An inclusion of the entire scale or a different subgroup of items, a different sample of individuals within the participating countries, a different set of participating countries, etc. may have changed the results generated by the testing of the hypothesis regarding the universal nature and structure of cultural values. Furthermore, values are studied in many other ways. For example, Schwartz labeled the value associated with prestige and social status ‘Power.’ Two items measure Power on a six-point ordinal scale in the ESS: ‘important to be rich, have money and expensive things’ and ‘important to get respect from others.’ It is debatable whether these two survey items adequately measure prestige and social status and to what extent prestige and social status encapsulate power as a desirable and trans-situational goal for survey respondents. For instance, Treiman (1977), Coxon and Jones (1978), and Ganzeboom and Treiman (1992, 1996) propose markedly different ways to conceptualize and measure prestige. Returning to the research model, it should be clear from this example that cultural value theory can indeed be tested with the ESS data, but always only partially. Other data could be used for the same theory and associated hypothesis, which may not only produce different results, but might also address a different aspect of the research question. For instance, including one of the omitted items relating to Power in the ESS, i.e. authority (‘the right to lead or command,’ Schwartz, 1997) may have not only changed the result in relation to the presence of this value in the participating countries, but also have had implications for the way Power was assessed with this item. Thus, the research question not only has obvious implications for the selection of data but, less obviously, the actual data selected have implications for what part of the research question is being answered. Accordingly, the relationship between the research question and data is reciprocal in nature.
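
The point about item selection can be illustrated with a small sketch. The Python example below is illustrative only: the country labels, item names, and scores are invented rather than taken from the ESS, and the items are assumed to be coded so that higher values mean the goal is rated as more important. It compares a Power index built from two items with one that also includes a third, authority-type item.

```python
from statistics import mean

# Hypothetical country-level mean item scores on a six-point scale.
# All values are invented for illustration; higher is assumed to mean
# "more important to the respondent".
ITEM_MEANS = {
    "Country A": {"wealth": 3.0, "respect": 4.0, "authority": 2.0},
    "Country B": {"wealth": 3.2, "respect": 3.6, "authority": 4.4},
}


def power_score(items, selected):
    """Average the selected items into a single Power index."""
    return mean(items[name] for name in selected)


def rank_countries(selected):
    """Order countries from highest to lowest Power index."""
    scores = {
        country: power_score(items, selected)
        for country, items in ITEM_MEANS.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Index based on the two items described in the text.
two_item_ranking = rank_countries(["wealth", "respect"])

# Index that additionally includes a hypothetical authority item.
three_item_ranking = rank_countries(["wealth", "respect", "authority"])

print("Two items:  ", two_item_ranking)
print("Three items:", three_item_ranking)
```

With these invented figures, the two-item index places Country A first, whereas adding the authority item reverses the order. This is the sense in which the data actually selected shape which part of the research question is being answered.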
It is a matter of purpose and debate, whether empirical research ought to begin with basic laws or theory, whether it should start with empirical observations from which laws and theory are deduced, or whether research iteratively vacillates between data and theory (e.g. Bryman, 2001). Nevertheless, empirical research is irreducibly connected with both theory and data. Indeed, it is argued here that no datum can be conceived of or understood in the absence of explicit or implicit theoretical assumptions, and that data can be understood and evaluated in terms of their suitability and quality only with regard to their relationship with a research question and how they are to be analyzed. In order to substantiate this argument, it is necessary to examine how the term ‘datum’ is used and in what way assumptions and interpretations are part of this usage when conducting research. To explain how and why different types of data can be used for empirical research, it is necessary to explore: first, what data are made of; second, the reasons for using more than one dataset for a research question; and, third, how these reasons connect differently to various parts of the research process.

WHAT ARE DATA MADE OF?

Etymologically, a datum, past participle of the Latin word dare, i.e. ‘to give,’ implies that a datum is something given or something that exists, and that in some way it reflects or at least is connected with an understanding of what colloquially is referred to as reality. Most students taking an introductory course in statistics may get the impression that social science data originate from spreadsheets, readymade and conveniently organized into rows and columns that correspond to cases and variables, respectively. But data are of course the result of a very long production chain, which includes operationalization, selection, translation, and transmogrification processes (e.g. Marsh, 1982).

By either habit or misconception, only certain kinds of data, usually the rows-and-columns kind, are believed to be suitable for statistical analysis. To pursue this problematic argument and, thus, shed light on why and how multiple datasets could be used for quantitatively oriented research, it is necessary to explore how data are classified more generally.

Qualitative vs. quantitative data

One of the most widespread and misleading classification systems divides data into quantitative and qualitative data. Within this tradition, there are three different practices relating to this nomenclature. First, it is used to differentiate between variables measured on so-called continuous and discrete scales. The age of respondents in months or years, the precise net annual household income, the estimated percentage of time dedicated to specific leisure activities, etc. are habitually presented as continuous and are thus considered quantitative variables, while place of residence, religious affiliation, ethnicity, etc. are often considered discrete and thus qualitative variables. Setting aside a critique of this particular practice, it should be evident that data thus coded have already gone through a theoretical and analytic process such that using the terms ‘qualitative’ and ‘quantitative’ in this narrow sense is most useful for the selection of a particular statistical technique with which these data may be analyzed. Second, the bifurcation of data into qualitative and quantitative data often refers to a more general form in which observations have been recorded, e.g. numbers vs. words or numerical vs. textual data. The problem with this form of classification is that, on the one hand, numbers often stand for words, concepts, or positions on axes of judgment and that, on the other hand, textual data could be easily, and often are, transformed into numerical form. Furthermore and related to this point, numbers and text do not share the same level of abstraction in that numbers often stand for text. Finally, dividing data into numbers and text does not do justice to the tremendous variety of data used in the social sciences, such as visual and audio data. Depending on the research question and
design, such data can be transformed into numerical or other kinds of data. Third, this bifurcation often reflects the way in which data are analyzed. Accordingly, quantitative data are ostensibly analyzed statistically, while qualitative data are not. However, any so-called qualitative data, e.g. texts, audio and video recordings, symbols, photos, drawings, etc. could be transformed into numerical form and then analyzed statistically. Quantitative content analysis is one of the numerous techniques in which non-numeric data are analyzed statistically. As such, one would add to the confusion by proposing that ‘qualitative data’ are analyzed quantitatively.

From these arguments, the terminology ‘qualitative data’ and ‘quantitative data’ should be considered misnomers, and they should be avoided because these three practices are confusing and misleading. The terms ‘qualitative’ and ‘quantitative,’ if they must be used, should be restricted to how data are analyzed, though even this usage is not entirely unproblematic.

Data as a product of the data collection method

Beyond dividing data into qualitative and quantitative data, another typical way to classify data is to associate them with the method with which they were collected. Accordingly, interviews, focus groups, participant observations, indirect measurement, surveys, experiments and quasi-experiments, etc. generate interview data, focus group data, observational data, experimental data, etc. This classification is far less problematic but neither makes a clear statement about data types or their content, nor about how these data will be analyzed. Despite the incorrect assumption that interview data or data from participant observations, for example, will be submitted to some form of qualitative analysis, it is not only conceivable but occasionally of particular interest to statistically analyze data from interviews or participant observations (e.g. Johnson, 1978; Bernard, 2005). Similarly, while it is usually assumed that data collected from surveys or experiments will be subjected to statistical analysis, it may be of interest to explore interactions between researchers and respondents in survey research or (quasi-) experiments non-statistically.

Sense data, objective data, and subjective data

A more elaborate way to classify data connects to their relation to a presumed external reality, dividing data into sense data, objective data, and subjective data. The most obvious way to think about where data come from and what they are made of relates to sense perception, i.e. acquiring and processing sensory information not only through the five senses – vision, audition, gustation, olfaction, and tactition, as proposed by Aristotle in De Anima, Book II – but also thermoception (heat), nociception (pain), equilibrioception (balance), and proprioception (body awareness), etc. (Hurley, 1998). More usual are empirical data that are derived from sense data. Derived data can be based on memory or experience such as attitude or value statements, or data that are inferred from sense or other derived data. Particularly the latter form of data gives rise to the central constructs in the empirical social sciences such as poverty, exclusion, class, networks, identity, family, household, etc. A further distinction of these indirect derivatives is termed ‘objective and subjective data.’

No longer requiring the perceived information to represent faithfully the external objects, objective data nowadays are more likely to refer to data that may be observed and possibly verified by more than one person. Duncan et al. state that ‘[o]bjective phenomena are those that can be known by evidence that is, in principle, directly accessible to an external observer. Often that evidence is actually a matter of record, although the relevant records may not be easily sampled for the population of interest’ (1984: 8). Experimental data or answers to survey questions relating to name, gender, age, commuting distance to work, annual gross
income from work, marriage status, number of unprotected sexual encounters in the past month, etc. are examples of objective data in that the information conveyed by the data could be verified by someone other than the respondent. However, whether confirmation by others would indeed render data truly objective is questionable. On the one hand, convergence between the respondents’ answers and an external observer about the phenomenon under investigation does not guarantee objectivity as the respondent and the external observer may misperceive or misjudge the phenomenon in a similar way. On the other hand, divergent information from verification through alternative sources may not automatically falsify the respondents’ declarations. In contrast, subjective data are data that ostensibly cannot be verified by external observers. In this vein, ‘[s]ubjective phenomena are those that, in principle, can be directly known, if at all, only by persons themselves’ (Duncan et al., 1984: 8). Examples of subjective phenomena are answers to questions relating to attitudes, values, preferences, judgments, etc. However, if this were true, i.e. if attitudes and values exist only in the minds of the persons in question, then they would be of little significance to social science research. Attitudes and values, for example, often have behavioral and symbolic correlates such that they can be inferred by external others. Hence, Duncan et al. add the qualification that ‘a person’s intimate associates or a skilled observer may be able to surmise from indirect evidence what is going on “inside”’ (ibid).

It should have become evident from these typologies that, contrary to habits and frequent misconceptions, all types of data presented above could be analyzed quantitatively, i.e. statistically. Indeed, it is important to differentiate between what data one wants to use and how these data need to be prepared for statistical analysis; the former is connected most closely to the research question, the latter to the statistical technique that will be performed. The typologies presented so far are based on the origin and uses of data but none captures sufficiently how data should be integrated in the research process more generally and in quantitative research more specifically. Another conceptualization of data is needed in order to make a convincing argument about why and how more than one dataset should be used in quantitative analysis. If it is not possible to make a convincing case about the use of data from existing typologies, then it needs to be made with regard to their purpose in the different stages of the research process.

FOUR GENERAL REASONS FOR COMBINING DATASETS IN QUANTITATIVE RESEARCH

There are four general reasons for using more than one dataset in one research project, particularly in quantitatively oriented research: verification, convergence, complementarity, and holism.

• Verification: Using data for the purpose of verification can take a number of different forms. Generally, verification here means to assess some form of fit, whether empirical or theoretical, between an ostensibly established dataset or theory and another, less well-established dataset. Verification is part of what is often referred to as convergent validity. However, this form of ‘validation,’ i.e. convergence, differs from verification, in that using more than one dataset for the purpose of convergence goes beyond a comparison of results with some empirically or theoretically established baseline.
• Convergence: Researchers often consider findings from different datasets and different studies in order to examine how results between different time periods, contexts, or samples converge. Convergence can be of importance with regard to data quality, changes across time periods, regional and situational variations, etc. The idea of convergence connects to convergent (and divergent) validity. Derived from measurement theory and well established in psychometrics, convergent validity relates to the extent to which items or sets of items that should be associated with one another theoretically indeed can be observed to relate to each other statistically (Campbell & Fiske, 1959). When using
unconnected datasets, i.e. when it is not possible to correlate items or sets of items with
each other, it is more difficult to assess convergence.
• Complementarity: In essence, complementarity stands for the use of more than one dataset
for the purpose of finding additional but directly related aspects that can be discerned
only in their combination. There may exist theoretical and empirical reasons for combining
different conceptualizations and empirical findings in order to get an additional
perspective on a particular theory or research finding. Important here is that researchers
do not simply examine additional data that in some way relate to the research topic, as
practically any additional dataset would provide further insights or qualifications.
Instead, the use of an additional dataset should go beyond the desire for an ‘additional
perspective.’ It should be either theory driven or at least pursue a specific purpose. As
with the first two reasons for using more than one dataset, complementarity is often not
mutually exclusive of the other reasons.
• Holism: Holism is an extension of complementarity but goes one step further. It is based
on a classical view of empirical research and stands for the aim of studying the phenomenon
under investigation as it exists ‘in reality,’ i.e. beyond the limits of research-related
errors, biases, and subjectivity. Thus, each dataset, together with the results associated
with it, is considered a piece of the puzzle that will eventually, if combined correctly,
reveal the true phenomenon and its dynamics (Brewer & Hunter, 2006). The extent to which
findings from many different datasets, often collected for different purposes and in
different contexts, are able to eliminate all kinds of errors and provide insight into how
things really are ‘out there’ is questionable. While holism explicitly aims at depicting
reality by piecing together evidence from different data sources, complementarity merely
uses different datasets for establishing, expanding, or testing an idea or theory.
Nevertheless, the use of multiple datasets in the pursuit of holism has been practiced
widely in the past although, with Kuhn (1970, 1983) and Rorty (1991), one wonders whether a
belief in the idea of objectivity and convergence of scientific progress toward some
external reality is necessary for even classical approaches to science.

The use of more than one dataset in a research project may be justified based on these
four general reasons. While these reasons were presented as if they were mutually
exclusive, researchers may actually pursue a combination of these reasons within one or
more research phases. What remains to be accomplished is to connect these general reasons
with the different phases in quantitatively oriented research in order to show that, on
the one hand, different reasons could be employed in the same research phase and, on the
other, that the same reason could be employed fruitfully for very different purposes,
depending on the particular research step and research aim.

REASONS FOR USING MULTIPLE DATASETS AND THE FOUR RESEARCH COMPONENTS

With regard to the term ‘data,’ Coombs (1964) distinguishes between recorded observations
and that which is analyzed. More precisely, his theory of data, implicitly emphasizing
inductive and exploratory approaches to quantitative research, includes three phases: the
selection and recording of observations from a universe of potential observations; the
production of data by interpreting, classifying, and labeling these observations; and, by
applying the data to an analytic model, the identification of relations, order, and
structure. Given that the results from an analytic model do not speak for themselves but
must be interpreted, one could propose a fourth phase: the transformation from the
relations, order, and structure as emergent from the analysis into research results,
usually by an interpretive process that links these to theories and research questions.
While Coombs’ Data Theory is predominantly concerned with the process of identifying
patterns and structures from existing data, this chapter extends it in three ways: it
explores how and why more than one dataset could be used in quantitative analysis; it
examines the research process beyond the identification of structures and patterns from
existing data; and it emphasizes the non-chronological ordering and interconnectedness of
the research components.
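
As a rough illustration of the phases just described, the following Python sketch walks a
handful of invented survey answers through recording, coding, and a simple analytic model.
The answers, the region labels, and the `code_trust` rule are hypothetical; they are not
taken from Coombs (1964) or from any dataset discussed in this chapter. The fourth,
interpretive phase is deliberately left out, since it cannot be reduced to code.

```python
from collections import Counter

# Phase 1: recorded observations, selected from a universe of potential ones.
# Each tuple is (respondent id, region, verbatim answer); all values are invented.
recorded = [
    ("r1", "urban", "I trust the courts"),
    ("r2", "urban", "I do not trust the courts"),
    ("r3", "rural", "I trust the courts"),
    ("r4", "rural", "I trust the courts completely"),
    ("r5", "rural", "Hard to say"),
]


def code_trust(answer: str) -> str:
    """Phase 2: classify a verbatim answer into a category (hypothetical rule)."""
    text = answer.lower()
    if "do not trust" in text:
        return "low"
    if "trust" in text:
        return "high"
    return "unclear"


# Phase 2 applied: recorded observations become data.
data = [(region, code_trust(answer)) for _, region, answer in recorded]

# Phase 3: a minimal analytic model, here a cross-tabulation of region by category.
crosstab = Counter(data)

for (region, category), n in sorted(crosstab.items()):
    print(region, category, n)
```

Applying a different coding rule to the same recorded answers would produce a different
dataset, which is one way in which several datasets can arise from a single set of
recorded observations.
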
All four research components are embedded in creative and interpretive processes. In the
absence of objective guidelines with regard to how researchers get from the
conceptualization of the research question to the interpretation of the statistical
results, each step requires creative decisions that are forced upon the researcher. Hence,
each phase delimits the research results in a particular way, where different types of data
and different types of analysis could be used for different purposes. The four research
components and the four general reasons for using more than one dataset allow for 16
possible combinations, i.e. using more than one dataset for verification, convergence,
complementarity, and holism in relation to identification/selection of observations,
recording/transforming them into data, identifying analytically patterns/structures, and
interpreting patterns/structures meaningfully. This section will not cover all 16
possibilities but only provide examples of how more than one dataset can be used for
different reasons and different components. For the following, it is also important to
realize that it is only possible in principle to separate the four general reasons for
using more than one dataset across the different research components. In practice, the line
of demarcation between categories can be rather difficult to identify.

Links between data and their patterns

For most of those conducting quantitatively oriented research, the relations within the
research process between raw data and their patterns and structures are the most
accessible, and so I will begin by outlining reasons for using more than one dataset within
this phase. Data are either explored for patterns, or the fit between theory-guided
patterns and the data is analyzed. Often, data are also prepared for further analysis, e.g.
by creating compound variables or components. For example, variables relating to income,
debts, savings, etc. of individuals living in a household may be used to produce a compound
variable reflecting household net income, which will then be used for further analysis.
Some transformations are so complicated that authors publish conversion tools that allow
users to recode certain variables according to such a key (e.g. Ganzeboom & Treiman, 1992).
But rather than using existing compound variables or using a conversion tool, one may want
to adapt or test these instruments by, for example, introducing or omitting variables, or
by weighting the importance of a variable differently.

With regard to convergence, for instance, researchers may not be satisfied with using only
a subset of an established scale to test a theory, such as the example about the universal
cultural values as described above, and may therefore collect additional data. Within
limits, they may also want to verify whether the ESS data from the subset of Schwartz’s
value scale are adequate to assess values as proposed by the full 56-item scale, as
discussed earlier. Researchers may also want to verify the universality of Schwartz’s
theory on values by collecting and analyzing data for a country not part of the ESS.
Verification may also include an examination of the suitability of a shortened scale or the
representativeness of a dataset with regard to some demographic indicators, e.g. gender,
ethnic composition, age, etc., by comparing them with national census data, for instance.
With regard to complementarity, researchers may be interested in testing or qualifying the
universality of value systems or Schwartz’s value structure by exploring alternative
cross-cultural datasets on values (e.g. Hofstede, 2001).

There are numerous problems associated with comparing the results of an analysis, including
the compatibility of the contexts within which data were collected, the compatibility of
the sample, the compatibility of the variables, etc. (Kiecolt & Nathan, 1985; Dale et al.,
1988).

Beyond these, there are also problems associated with combining micro and macro data. In
short, the social sciences often attempt to connect micro-level data such as individual
behaviors or family dynamics with
macro-level data such as social norms and power structures. For example, Alexander and
Giesen (1987) identified the five main approaches to micro-macro analysis in the social
sciences. According to the major strands in social theory, society is created by
(a) rational individuals, (b) interpretive individuals, (c) socialized individuals acting
as a collective force, (d) socialized individuals who reproduce the existing social
environment on a micro-level, and (e) rational individuals who acquiesce due to external
forces of social control (cf. Münch & Smelser, 1987). However, it should be noted that
there is nothing intrinsic about a level to be identified as micro or macro, i.e. they
represent relative points on a continuum. In other words, interactions, families, or
neighborhoods could represent the micro or macro level, depending on their integration
into a model. What is important, however, is that there are more micro-level units than
macro-level units, and that the micro units can be assigned to a macro unit. The
computational complexity of assessing the interrelation of systems that are formed
concurrently between micro units, between macro units, and between micro-macro units is
tremendous (Saam, 1999), but the greater problem is the lack of unity between theoretical
and computational models. Consequently, some researchers, particularly those engaged in
empirical research, often argue that the levels cannot be combined, i.e. that macro-level
data follow a different set of laws and logics than micro-level data. Theorists, on the
other hand, have long been involved in conceptualizing the relationship so central to
social science (e.g. Mill, 1961; Luhmann, 1982; Giddens, 1991; Collins, 2000), but have
difficulties with finding convincing empirical evidence for their sophisticated arguments.
While the combination and analysis of micro- and macro-level data, either separately or
pooled, may provide important insights into the complex relations between and within the
levels, it is likely that the gap between empirical results and theory will remain.
Combining micro- and macro-level data for verification, convergence, and complementarity
may nevertheless play an important role in advancing theory and empirical support thereof,
without necessarily providing (or needing to provide) the ‘real’ model of complex social
systems.

Links between patterns and their interpretation

The complex relations between patterns and structures on the one hand, and their
interpretation, on the other, are also part of the main focus in statistically oriented
research. An analysis of presences or absences, patterns, order, structure, etc. within
datasets still needs to be interpreted. A mere statistical description thereof is
insufficient because coefficients do not speak for themselves but must be linked
meaningfully to the research question and the underlying theories. Convergence can play an
important role at this stage. For instance, in our study on intergenerational social
mobility in Switzerland, we examined all large-scale data available in Switzerland that
contained information about social position between parents and their children (Joye
et al., 2003). Convergence was important in three respects. First, with regard to data
quality, data collected during approximately the same period, including census data, should
converge in order to cross-validate the datasets in relation to their representativeness of
the population under investigation. Second, convergence of datasets from different time
periods was examined in order to explore how intergenerational social mobility has changed
in Switzerland over time. Finally, the chronological trends identified in this study were
compared to those of other countries in order to explore how social mobility in Switzerland
converged with other European countries.

When using unconnected datasets, i.e. when it is not possible to correlate items or sets of
items with each other, it is more difficult to assess convergence. Meta-analysis is an area
of research in which multiple research findings are compared with each other. There often
exist many different studies with their own
data, all pursuing a similar research question. Even though the first meta-analysis was
performed merely to increase statistical power (Pearson, 1904), data and findings of
related studies can be pooled and compared with each other in relation to a substantive
theory. Meta-analysis, the ‘analysis of analysis’ (Glass, 1976), attempts to identify and
partially correct artifacts and variations in findings due to sampling and measurement
error, range restriction, correlation bias, etc. over a series of studies (Hunter &
Schmidt, 1990). With variations, this is basically accomplished by identifying a set of
studies for meta-analysis that are relevant to a research question, determining the
suitability for inclusion in a meta-analysis in terms of the research respondents,
variables, time period, research design, etc., assessing the effect size of the different
studies with regard to the qualities and quantities under investigation (e.g. group mean
differences, correlations, proportions, etc.), creating comparability between the effect
sizes by a coefficient, and then examining the convergence and variability between the
studies and their respective data (Lipsey & Wilson, 2001). A further variant is that of
pooled data, i.e. combining data collected at multiple sites, different time periods, or a
combination thereof (Beck, 2001; Halaby, 2004). However, it is often argued that pooling
data is fraught with error due to heterogeneity problems across datasets (Maddala, 1999).
Another problem is the issue of which studies to include: some argue that methodologically
weaker studies should also be included, albeit with a different weighting, while others
propose to include only methodologically sound studies (Abrami et al., 1988). The
‘file-drawer effect’ is yet another problem because meta-analyses often exclude
non-significant findings as these are usually not published.

Combining datasets and their analysis is also often practiced in search of holism. For
example, some researchers interested in voting behavior may use a multitude of available
data in order to pursue complex theories, e.g. a general theory on public opinion (e.g.
Zaller, 1992) or the role of cognitive intelligence in society (Herrnstein & Murray, 1994).
An appeal toward holistic research is also made by Brewer and Hunter (2006), who even argue
that by integrating the ‘four major research styles’ (fieldwork, surveys, experiments, and
non-reactive research) it would be possible to take advantage of the strength of each of
these methods and, thus, arrive at ‘valid’ research results. The attractiveness of this
approach is an underlying quest for systematization of the many competing and conflicting
theoretical and empirical approaches. For numerous reasons elaborated in this chapter,
however, theories and empirical findings on a research topic are bound to be conflicting
and contradictory. Rather than attempting to isolate the one set of social science theories
and empirical findings that are superior according to some set of criteria, presumably
because they are closer to reality, social science research may indeed be marked not only
by systematic thought and analysis, but also by eternal ambiguity about the validity,
utility, and context dependence of different approaches. This may not necessarily be a bad
thing. It could be argued that it is precisely this ambiguity, the competition between
theories and empirical approaches, that can be considered a way of doing science; not
necessarily a closing in on how things really are in a mind-independent reality but a
negotiation of questions and their pursuit between different stakeholders in a particular
time and space.

Links between recorded observations and data

Statistically oriented research is often not directly involved in problems associated with
transformation between recorded observations and data. Nevertheless, a considerable
information loss occurs during the recording of an observation, e.g. from the attitude of
voters at the moment of recording to the recorded attitude statements in the questionnaire,
from the lived experience of the interview situation
to the interview transcript, etc. This loss, however, should be considered not only as a
potential source of bias in a classical sense, but also as a necessary step in the focusing
of empirical phenomena to a set of relevant aspects as defined by the researcher’s focus
and research question. Once observations have been recorded in whatever crude form, e.g.
photos, ticks on questionnaires, piles of items sorted by respondents, interview
recordings, etc., they must be turned into data before they can be analyzed quantitatively.
At times, this is done simultaneously, such as in the encoding of responses with CAPI or
CATI, where the interviewers encode responses directly into preexisting response
categories, often with significant freedom and error when reinterpreting respondents’
answers (Elias, 1997a, 1997b). Turning the recorded observations into meaningful categories
is an art in itself, as the following quote illustrates:

In the field one has to face a chaos of facts, some of which are so small that they seem
insignificant; others loom so large that they are hard to encompass with one synthetic
glance. But in this crude form they are not scientific facts at all; they are absolutely
elusive, and can be fixed only by interpretation, by seeing them sub specie aeternitatis,
by grasping what is essential in them and fixing this. (Malinowski, 1948: 238)

Turning observations into data is a form of taming and disciplining them, turning them into
a form that is suitable for a particular type of analysis. Far too little attention is paid
to this process, which, ultimately, includes a type of analysis at least as important as
the subsequent analysis with the thus derived data. For example, before a quantitative
content analysis can be performed, non-numeric material needs to be coded meaningfully.
While there are some tentative suggestions about how to derive and verify these codes, e.g.
via iterative procedures (Glaser & Strauss, 1967; Glaser, 2005) and inter-rater reliability
(Gwet, 2001), the processes suggested in the literature are at best guidelines and
recommendations. In this case, producing different datasets from the same set of recorded
observations could be used for verification, convergence, or complementarity.

Links between potential observations and recorded observations

Despite the fact that the transformation from a potential observation to a recorded
observation is so crucial to the research process, most quantitatively oriented researchers
have not considered its complexity sufficiently. Thus, a brief digression into related
fields will shed light on the complexity within which potential observations are ultimately
transformed into data via their selection and recording. Originally explored by
pre-Socratic philosophers in conjunction with the limits of our senses to provide us with
true knowledge about the world (White, 1991; Dancy & Sosa, 1992), this first transition,
from potential to recorded observations, has occupied a prominent position in cognitive and
social psychology as well as social anthropology and ethnography.

Termed ‘sense data’ by twentieth-century philosophers, information from our senses appears
to reproduce external objects in the mind via perception. According to sense-theorists such
as Russell (1927) and Moore (1953), the book you are reading is represented by sense
information relating to shape, texture, weight, color, etc. such that the object is
represented by the mind according to this perceived sense information. Thus, sense data
reflect the attributes that an object is believed to have. But sense data also relate to
the awareness of perception and are, thus, always also mind dependent. From a
materialist-realist perspective, even though the size and shape of this book vary if viewed
from different angles or distances, these changes are variations in perspectives of the
same external object. As part of cognitive development of infants, this and related issues
have been studied by developmental psychologists such as Piaget (1955) under the heading of
conservation and persistence. However, a number of philosophers (e.g. Austin, 1962;
Jackson, 1977) question the possibility of representation of external objects through sense
data, listing in particular phenomena relating to illusions, hallucinations, double vision,
and the time delay between existence and perception. Furthermore, sense data such as color,
taste, smell, and sound do not exist in the external world but are recognized as attributes
due to specific interactions between stimuli, physiology, and mind. As such, the perception
of objects is fundamentally influenced by human physiology and psychology.

More precisely, research in social cognition has revealed that perception and memory are
shaped by prior knowledge and current context. Asch (1946) proposed two competing models
for impression formation: according to Asch’s configurational model, individual elements of
perception are aligned to form an overall impression such that these can be changed
according to context and expectations. His algebraic model proposes that individuals
assemble all elements of perception and then come up with a combined impression thereof.
Both of these models have received wide attention, and the latter has had a strong
influence on attitude and value research (e.g. Fishbein & Ajzen, 1975). Heider’s balance
theory (1944) is related to Asch’s in that perceived elements tend to be changed in
people’s minds if they do not fit an existing model. Apparently, sense information is
adapted to fit existing thought structures in order to maintain unified, overall
impressions and knowledge structures. Less socially oriented, Bartlett (1932) explored how
past behaviors and experiences are organized into patterns such that they facilitate future
cognitions and behavior.

While psychological studies about social cognition and impression formation focus on
general human processes including cognition, motivation, and behavior, the findings from
these studies could well be applied to researchers and the research process. Researchers
make sense of a confusing and complex environment and, here too, researchers may have the
tendency to adjust and adapt elements to fit existing schemas, i.e. theories and ideas. It
is quite likely that many data collectors, coders, and researchers are subject to similar
tendencies when they are identifying, sorting, and interpreting as relevant a small subset
of observations in the pursuit of a particular research question.

Indeed, social anthropologists and ethnographers initially attempted, and later contested
the possibility of, objectively describing, encoding, and presenting meaning structures.
Malinowski, the first modern anthropological explorer and specialized fieldworker, outlined
the tools with which to understand the complexities of meaning structures external to one’s
own mental context. Empathy and insight, acquired in part through long-term exposure to a
socio-cultural environment of concern, are the tools that were believed to assist in the
understanding of the meaning of such phenomena.

But careful and systematic empirical observations and detailed descriptions of
socio-cultural phenomena have created inconsistencies and, ultimately, doubts about the
feasibility of precisely this undertaking. At least since the 1970s, a time period marked
by what Geertz named the ‘crisis of representation,’ it has been clear that a reproduction
of meaning or, more generally, the transport of meaning from one meaning system to another
is at least problematic. As Geertz states:

There is a lot more than native life to plunge into if one is to attempt this total
immersion approach to ethnography. There is the landscape. There is the isolation. There is
the local European population. There is the memory of home and what one has left. There is
the sense of vocation and where one is going. And, most shakingly, there is the
capriciousness of one’s passions, the weakness of one’s constitution, and the vagaries of
one’s thoughts: the nigrescent thing, the self. It is not a question of going native…. It
is a question of living a multiplex life: sailing at once in several seas. (1988: 77)

Clifford (1983, 1986) goes so far as to consider insights acquired through observations
as highly intersubjective engagements, i.e. where observations are ‘orchestrated’ within
politically charged situations, far better reflecting the ethnographer’s view and position
than that of the people and situations observed. While this position represents an extreme
view in the social sciences, it nevertheless stresses correctly the intersubjective nature
and selectivity of phenomena, long before these phenomena are recorded, transformed into
analyzable data, and analyzed. From a practically infinite number of possible empirical
phenomena, in themselves only a subgroup of all potential empirical phenomena that could
have been chosen, researchers select as empirical evidence for their project that which
they believe to be suitable, based on specific social, economic, political, cultural, and
other considerations (Bergman, 2002). In other words, even before a shred of empirical
evidence has been conceived of as a potential source of data, the research results have
been ‘compromised.’

The social sciences deal with this problem in three ways. The first entails a call for the
abandonment of empirical research altogether, supported by the claim that research thus
tainted would not yield what is often called objectivity or, in epistemology and the
philosophy of science, is considered true knowledge or truth, i.e. knowledge that is not
subject to argument and perspective. In this vein, Tyler (1986) proposes that the aims of
science in general, and ethnography in particular, are now an evocation of an imagined
reality between the author and the reader of scientific texts for therapeutic and aesthetic
effect. The second, far more frequently practiced way to deal with this problem,
particularly in quantitatively oriented research, is to ignore it. Concerns outlined above
are drowned out by comfortable routines engrained in the craft and habits of doing
research. These include: the formulation of a research question or hypothesis in line with
the literature of respectable authors and journals, the operational definition of key
constructs relating to the research question or hypothesis, data collection according to
these definitions and with well-established tools, and the analysis, interpretation, and
presentation of research results, also within the limits of well-established tools and
forms. Inconsistencies between research findings, if detected at all, are usually
attributed rather vaguely to differences in theoretical approaches, data collection and
analysis methods, interpretations, etc. Sooner or later, the self-correcting nature of
science, so it is hoped, will take care of these inconsistencies. The third way begins with
the recognition that all knowledge derived from empirical research is partial, subject to
argument, verification, and revision. This third option also paves the way for using more
than one dataset for quantitative research, not merely for purposes relating to
verification or convergence, but also for complementarity and holism.

All four reasons for the use of more than one dataset could be relevant in this permanently
transitional phase. A researcher could use additional data to verify whether a construct
has been adequately conceptualized and captured by existing studies. Similarly, convergence
and complementarity could motivate a researcher to propose an alternative, shorter, or
otherwise more convenient way to collect data, which would either test or elaborate on an
existing study or theory. For example, values have been studied cross-nationally not only
by Schwartz and his colleagues, but also by, for example, Hofstede (2001), Triandis (e.g.
1995) and Abramson and Inglehart (1995). There may exist theoretical and empirical reasons
for combining different conceptualizations and empirical findings in order to get an
additional perspective on how values are distributed across nations and in which
combination they are distributed between regions or social groups. Finally, this third way
also reconnects researchers to the human-made artifacts within research, e.g. that a
‘value’ is a complex construct and that it is part of a form of shorthand that allows
researchers to explain to their public a set of phenomena, which they have crafted and
identified as relevant within a particular space and time.
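
The meta-analytic steps outlined earlier in this chapter (assessing effect sizes, making
them comparable, and examining convergence and variability between studies) can be sketched
in a few lines of Python. This is a minimal sketch under stated assumptions: the three
effect sizes and their variances are invented, and fixed-effect inverse-variance weighting
with Cochran's Q is only one common way of pooling, not a procedure prescribed by the
sources cited above.

```python
import math

# Hypothetical standardized mean differences and their sampling variances
# from three invented studies.
effects = [0.30, 0.50, 0.10]
variances = [0.04, 0.09, 0.02]

# Inverse-variance weights: more precise studies count more.
weights = [1.0 / v for v in variances]

# Fixed-effect pooled estimate and its standard error.
pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
pooled_se = math.sqrt(1.0 / sum(weights))

# Cochran's Q: weighted squared deviations of each study from the pooled estimate.
# Large values relative to the degrees of freedom suggest heterogeneity.
q_stat = sum(w * (d - pooled) ** 2 for w, d in zip(weights, effects))
df = len(effects) - 1

print(f"pooled effect = {pooled:.3f} (SE {pooled_se:.3f})")
print(f"Q = {q_stat:.3f} on {df} degrees of freedom")
```

A sketch like this does nothing about the inclusion and file-drawer problems noted earlier;
those are decided before any weights are computed.
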
data analyses tend to emphasize induction and hypothetico-deductive research, exploratory
analysis needs some form of ‘container’ that provides a minimal theoretical underpinning
from which explorations are conducted. On the other hand, most statistical modeling, which
formally is based on hypothesis testing, includes model adjustments for various theoretical
and empirical reasons. Hence, empirical research in practice is rarely purely inductive or
deductive (see also Bryman, 2001).

REFERENCES

Abrami, P.C., Cohen, P.A., & d’Apollonia, S. (1988). Implementation problems in
meta-analysis. Review of Educational Research, 58, 2, 151–179.
Abramson, P.R., & Inglehart, R. (1995). Value Change in Global Perspective. Ann Arbor, MI:
Michigan University Press.
Alexander, J.C., & Giesen, B. (1987). From reduction to linkage: The long view of the
micro-macro debate. In J.C. Alexander, B. Giesen, R. Münch, N.J. Smelser (Eds.), The
Micro-Macro Link. Berkeley: University of California Press.
Asch, S.E. (1946). Forming impressions of personality. Journal of Abnormal and Social
Psychology, 41, 1230–1240.
Austin, J.L. (1962). Sense and Sensibilia. Oxford: Clarendon.
Bartlett, F.A. (1932). A Study in Experimental and Social Psychology. Cambridge: Cambridge
University Press.
Beck, N. (2001). Time-series cross-section data: What have we learned in the past few
years? Annual Review of Political Science, 4, 271–293.

The Poetics and Politics of Ethnography. Berkeley, CA: University of California Press.
Collins, R. (2000). Situation stratification: A micro-macro theory of inequality.
Sociological Theory, 18, 1, 17–43.
Coombs, C.H. (1964). A Theory of Data. New York: Wiley.
Coxon, A.P.M., & Jones, C.L. (1978). The Images of Occupational Prestige. London:
Macmillan.
Dale, A., Arbor, S., & Proctor, M. (1988). Doing Secondary Analysis (Contemporary Social
Research Series No. 17). London: Unwin Hyman.
Dancy, J., & Sosa, E. (1992). A Companion to Epistemology. Oxford: Blackwell.
Duncan, O.D., Fischhoff, B., & Turner, C.F. (1984). Domain of the study: Objective and
subjective phenomena. In C.F. Turner & E. Martin (Eds.), Surveying Subjective Phenomena
(vol. 1). New York: Sage.
Elias, P. (1997a). Social class and the standard occupational classification. In D. Rose &
K. O’Reilly (Eds.), Constructing Classes: Towards a New Social Classification for the UK.
Swindon: ESRC/ONS.
Elias, P. (1997b). Occupational Classification: Concepts, Methods, Reliability, Validity,
and Cross-National Comparability. Occasional Papers, 20, OECD, Warwick: Institute for
Employment Research.
ESS (2004). ESS documentation report 2002/2003. The ESS Data Archive. Norwegian Social
Science Data Services. http://www.europeansocialsurvey.org/
Fishbein, M., & Ajzen, I. (1975). Belief, Attitude, Intention, and Behavior: An
Introduction to Theory and Research. Reading, MA: Addison-Wesley.
Ganzeboom, H.B.G., & Treiman, D.J. (1992). International Stratification and Mobility File:
Conversion Tools. Utrecht: Department of Sociology.
Bergman, M.M. (2002). Reliability and validity in Ganzeboom, H.B.G., & Treiman, D.J. (1996). Interna-
interpretative research during the conception of the tionally comparable measures of occupational status
research topic and data collection. Sozialer Sinn, 2, for the 1988 international standard classification of
317–331. occupations. Social Science Research, 25, 201–239.
Bernard, H.R. (2005). Research Methods in Anthro- Geertz, C. (1988). Works and Lives: The Anthropologist
pology: Qualitative and Quantitative Approaches as Author. Stanford, CA: Stanford University Press.
(4th ed.). Walnut Creek, CA: Alta Mira. Giddens, A. (1991). Modernity and Self-Identity: Self and
Brewer, J., & Hunter, A. (2006). Foundations of Society in the Late Modern Age. Stanford: Stanford
Multimethod. Research: Synthesizing Styles (2nd ed). University Press.
Thousand Oaks, CA: Sage. Glaser, B.G. (2005). The Grounded Theory Perspec-
Bryman, A. (2001). Social Research Methods. Oxford: tive III: Theoretical Coding. Mill Valley, CA: Sociology
Oxford University Press. Press.
Campbell, D.T., & Fiske, D.W. (1959). Convergent and Glaser, B.G., & Strauss, A. (1967). Discovery of
discriminate validity by the multitrait-multimethod Grounded Theory: Strategies for Qualitative Research.
matrix. Psychological Bulletin, 54, 297–312. Chicago: Aldine.
Clifford, J. (1983). On ethnographic authority. Represen- Glass, G.V. (1976). Primary, secondary, and meta-
tations, 1, 2, 118–146. analysis of research. Educational Researcher, 5, 3–8.
Clifford, J. (1986). Introduction: Partial truths. Gwet, K. (2001). Handbook of Inter-Rater Reliability.
In J. Clifford & G.E. Marcus (Eds.), Writing Culture: Gaithersburg, MD: StatAxis.
Halaby, C. (2004). Panel models in sociological research: Theory into practice. Annual Review of Sociology, 30, 507–544.
Heider, F. (1944). Social perception and phenomenal causality. Psychological Review, 51, 358–374.
Hofstede, G. (2001). Culture’s Consequences: Comparing Values, Behaviors, Institutions, and Organizations across Nations. Thousand Oaks, CA: Sage.
Herrnstein, R.J., & Murray, C. (1994). The Bell Curve: Intelligence and Class Structure in American Life. New York: Simon & Schuster.
Hunter, J.E., & Schmidt, F.L. (1990). Methods of Meta-Analysis: Correcting Error and Bias in Research Findings. Newbury Park, CA: Sage.
Hurley, S. (1998). Consciousness in Action. Cambridge, MA: Harvard University Press.
Jackson, F.C. (1977). Perception: A Representative Theory. Cambridge: Cambridge University Press.
Johnson, A.W. (1978). Quantification in Cultural Anthropology. Stanford: Stanford University Press.
Joye, D., Bergman, M.M., & Lambert, P. (2003). Intergenerational educational and social mobility in Switzerland. Swiss Journal of Sociology, 29, 2, 263–291.
Kiecolt, K.J., & Nathan, L.E. (1985). Secondary Analysis of Survey Data (Quantitative Applications in the Social Sciences). Newbury Park, CA: Sage.
Kuhn, T.S. (1970). The Structure of Scientific Revolutions (2nd ed.). Chicago: Chicago University Press.
Kuhn, T.S. (1983). Rationality and theory choice. Journal of Philosophy, 80, 10, 563–570.
Leedy, P.D. (1989). Practical Research: Planning and Design (4th ed.). London: Collier Macmillan.
Lipsey, M.W., & Wilson, D.B. (2001). Practical Meta-Analysis (Applied Social Research Methods). Thousand Oaks, CA: Sage.
Luhmann, N. (1982). The Differentiation of Society. New York: Columbia University Press.
Maddala, G.S. (1999). On the use of panel data methods with cross-country data. Annales d’économie et de statistique, 55–56, 429–448.
Malinowski, B. (1948/1916). Magic, Science and Religion, and Other Essays. Boston: Beacon.
Marsh, C. (1982). The Survey Method: The Contribution of Surveys to Sociological Explanation. Winchester, MA: Allen & Unwin.
Mill, J.S. (1961[1843]). A System of Logic. London: Longmans, Green & Co.
Moore, G.E. (1953). Some Main Problems of Philosophy. London: George, Allen and Unwin.
Münch, B., & Smelser, N.J. (1987). Relating the micro and macro. In J.C. Alexander, B. Giesen, R. Münch, N.J. Smelser et al. (Eds.), The Micro-Macro Link. Berkeley: University of California Press.
Pearson, K. (1904). Report on certain enteric fever inoculation statistics. British Medical Journal, 3, 1243–1246.
Piaget, J. (1955). The Construction of Reality in the Child. London: Routledge and Kegan Paul.
Raiffa, H. (1968). Decision Analysis. Reading, MA: Addison-Wesley.
Richard, R. (1991). Objectivity, Relativism, and Truth. Cambridge: Cambridge University Press.
Russell, B. (1927). The Analysis of Matter. New York: Harcourt, Brace.
Saam, N.J. (1999). Simulating the micro-macro link: New approaches to an old problem and an application to military coups. Sociological Methodology, 29, 43–79.
Schwartz, S.H. (1999). A theory of cultural values and some implications for work. Applied Psychology – an International Review, 48, 23–47.
Schwartz, S.H. (2005). Universalism values and the inclusiveness of our moral universe. In A.-M. Pirttilä-Backman, M. Ahokas, L. Myyry, & S. Lähteenoja (Eds.), Values, Morality and Society: Change and Diversity. Helsinki: Gaudeamus.
Schwartz, S.H., Verkasalo, M., Antonovsky, A., & Sagiv, L. (1997). Value priorities and social desirability: Much substance, some style. British Journal of Social Psychology, 36, 3–18.
Treiman, D.J. (1977). Occupational Prestige in Comparative Perspective. New York: Academic Press.
Triandis, H.C. (1995). Individualism and Collectivism. Boulder, CO: Westview.
Tyler, S.A. (1986). Post-modern ethnography: from document of the occult to occult document. In J. Clifford & G.E. Marcus (Eds.), Writing Culture: The Poetics and Politics of Ethnography. Berkeley: University of California Press.
Walliman, N. (2005). Your Research Project (2nd ed.). London: Sage.
White, N.P. (1991). Plato’s epistemological metaphysics. In R. Kraut (Ed.), Cambridge Companion to Plato. Cambridge: Cambridge University Press.
Zaller, J.R. (1992). The Nature and Origin of Mass Opinion. Cambridge: Cambridge University Press.
36
Writing and Presenting
Social Research
Amir Marvasti
Traditionally, there has been a divide between ‘science’ and ‘literature,’ mostly due to the belief that representing ‘scientific facts’ requires a method of writing that is free from aesthetic whimsy and emotions. A procedural approach to writing was first developed by natural scientists (e.g. physicists) and later adopted by social scientists (e.g. sociologists) as the ideal model for disseminating facts. Thus grew the two representational cultures of science and literature, with the former presiding over the domain of ‘universal truths’ and the latter being relegated to the world of fiction and individualistic self-expression.
The divide between science and literature went unchallenged well into the second half of the twentieth century. However, a ‘third culture’ of representation (Shaffer 1998) is now questioning the necessity of treating science and literature as mutually exclusive realms of knowledge. This emerging interdisciplinary field focuses on the reflexive relationship between the two worlds of representation where literature influences science and science informs literature.
In the social sciences, while some remain devoted to the traditional divide, there is a growing awareness of the rhetorical dimensions of writing and representing facts, particularly among qualitative researchers (see, for example, Alasuutari 1995 and Gubrium and Holstein 1997). This reflexive or rhetorical turn, as it is often called, centers on the recognition that any effort to inscribe social reality invariably involves linguistic constructive practices as well. Perhaps the work that is most widely cited in connection with this movement in the social sciences is James Clifford and George Marcus’s Writing Culture: The Poetics and Politics of Ethnography (1986). This edited volume calls for social scientists, particularly ethnographers, to see writing as a craft that involves culture, aesthetics, and politics. As stated in this book’s introduction, ‘the making of ethnography is artisanal, tied to the worldly work of writing’ (p. 6).
Another important work in this area is John Van Maanen’s Tales of the Field (1988). This book is also concerned with ethnography and its stylistic conventions. Through secondary
analysis, Van Maanen identifies different genres of ethnographic texts (e.g. realist, confessional, and impressionist). He argues that rather than describing a single social reality seen from multiple perspectives, variations in writing construct realities of their own. For Van Maanen, ‘[T]here is no way of seeing, hearing, or representing the world of others that is absolutely, universally, valid or correct’ (p. 35).
In the analysis of writing as representational practice, some of the greatest contributions come from feminist scholars who have documented the absence or distortion of female subjectivity in dominant textual paradigms (e.g. Irigaray 1985 and Butler 1990). At the same time, feminists have turned our attention to the linguistic nuances and conventions of texts and their gendered tones. For example, Laurel Richardson (1990, 2000) shows the prevalence of literary devices (e.g. metaphors) in social science texts. For her, scientific writing is never neutral but is invariably embedded in practices of power and oppression. As she writes, ‘power is, always, a sociohistorical construction. No textual staging is ever innocent. We are always inscribing values in our writing. It is unavoidable’ (1990, p. 12).
As a whole, the textual shift in the social sciences relates to a larger movement that explicitly and intensely questions the value and presumably benign character of all scientific knowledge. This movement, largely referred to as ‘postmodernism’ or ‘post-structuralism,’ challenges the very authority and linguistic structures of science and their representations of ‘truth.’ For example, the renowned sociologist and postmodern thinker Norman Denzin (1993) states:
[i]f there is a center to recent critical poststructural thought, it lies in the recurring commitment to strip any text of its external claims to authority. Every text must be taken on its own terms. The desire to produce a valid and authoritarian text is renounced. Any text can be undone in terms of its internal-structural logic. (p. 136)
While some have dismissed the textual shift as a passing fad, others have embraced it as the new logic of social science and have proposed writing strategies for texts that are sensitive to postmodern sentiments. It has been suggested that these experiments or alternative representational forms expand the representational space of ‘value-free’ research, provide strategies for challenging dominant texts, and convey fresh perspectives on old questions. Alternative forms of writing also have been the subject of considerable criticism, which I take up in the conclusion.
In the remainder of this chapter, I offer a brief survey of these alternative writing practices by focusing on the following six genres: (1) writing with pictures, (2) performative writing, (3) writing factual fiction, (4) poetic representation, (5) writing the author, and (6) post-colonial writing. I end the chapter with a critical assessment of these genres.
WRITING WITH PICTURES
The old saying ‘a picture speaks a thousand words’ is now considered theoretically naïve—pictures, like written texts, are seen as constructive of the realities they represent. Gillian Rose’s Visual Methodologies (2001) offers an excellent postmodern analysis of the place of the visual in contemporary society and social research. According to Rose, rather than simply providing ‘realistic’ representations, the visual creates the reality under observation. Images provide ways of seeing social issues from particular cultural standpoints. Thus a given image can be interpreted in different ways depending on the viewers and their cultural sensibilities.
While the visual has always had a place in the social sciences, its use and analysis have fluctuated over the history of various disciplines. For example, more than a hundred years ago, the American Journal of Sociology, the flagship journal of the discipline, published a number of articles that used photos as data (Stasz 1979). According to Elizabeth Chaplin (1994: 201), the first manuscript of this type was F. Blackmar’s ‘The Smoky Pilgrims’ published in 1897. The study depicted poverty in rural Kansas using
posed photographs. Yet, this earlier interest in the visual waned as the written word accompanied with numerical analysis became the dominant mode of sociological analysis. In a way, statistical figures, charts, and tables became the visual centerpieces of professional sociological publications (Marvasti 2003). It is worth noting that this trend was not followed in the related discipline of anthropology where the visual has remained a strong and legitimate component of the discipline’s representational practice.
In the different editions of the Handbook of Qualitative Research, Douglas Harper offers thorough surveys of the growing field of visual research. In the most recent edition (2005), he notes, for example, that Contexts, a relatively new journal of the American Sociological Association, makes use of visual images in three ways. First, images can be used to illustrate the text. Second, they are used as part of visual essays where the images dominate the discussion and the text for the most part describes the images. Third, Contexts articles sometimes use images to visually depict the process of social change (748–749).
In the broader context of writing in the social sciences, one can think of the visual in two ways: (1) writing about pictures and (2) writing with pictures (as is the case with most typologies, these categories are not mutually exclusive). Writing about pictures involves the analysis of existing images, often for the purpose of cultural critique. For example, in his landmark sociological study, Gender Advertisements (1979), Erving Goffman analyzed how gender roles and expectations are reflected in magazine ads. Using over 500 photos, he critiqued the taken-for-granted nature of gender relations in Western societies. Goffman showed how magazine ads in the late 1970s depicted men in active roles (doing things like helping patients or playing in sports), whereas the women were depicted as mere spectators, passively watching the men’s activities.
Similarly, in Images of Postmodern Society (1991) and Cinematic Society: The Voyeur’s Gaze (1995), Norm Denzin rejects the notion that cinematic representations are mere entertainment with no social value. Instead, he argues that we understand and express ourselves and our social settings through Hollywood films. According to Denzin, cinematic representations both describe social realities and mandate a way of seeing or accepting these realities. Consider, for example, his analysis of the movie When Harry Met Sally:
The movie … is a ‘Field Guide to Single Yuppies’. … As such it takes a stand on and defines the following problematic terms; being single versus being married; sexuality and women’s orgasms; love, sexuality, and friendship; life after divorce, or after breaking up with a lover. These terms are presented as obstacles. … The solutions are gender specific. Women must not be single, must learn how to fake orgasms, so that males think they have sexual power. … Men, on the other hand, must have a woman who lets them think they can make them sexually happy. They need male friends to talk to, because women don’t understand male sexuality. In this battle between the sexes, sex must be overcome, before love and friendship can be achieved. (Denzin 1995: 117)
According to this analysis, such cinematic representations mandate a way of thinking about male-female relationships. When Harry Met Sally becomes a sort of how-to guide on heterosexual relations, constructing and describing the reality of how men and women should relate to one another. Over time cinematic representations become taken-for-granted truths that both construct and validate gender stereotypes.
In the field of anthropology, Catherine Lutz and Jane Collins’ Reading National Geographic (1993) offers a brilliant critique of the representations of non-Western cultures in the National Geographic. This analysis connects the magazine’s photographs with Western assumptions about ‘savage’ cultures and their exotic lifestyles. As Lutz and Collins put it, ‘Non-Westerners draw a look, rather than disattention or interaction, to the extent that their difference or foreignness defines them as noteworthy yet distant’ (188). The authors show how such ‘looks’ are reflected in the National Geographic’s representations of ‘foreignness.’ The magazine’s
photos can thus be seen as ‘gazes’ that construct the exotic other.
Aside from analyzing existing images, writing with pictures could also involve creating first-hand visual material for the purpose of illustrating, complementing, or transcending the written text. In the social sciences, anthropology is a leader in the use of pictorial and filmic materials for illustrative purposes. For example, G. Bateson and Margaret Mead’s Balinese Character: A Photographic Study (1942) juxtaposes text and the visual in a complementary way so that one would enhance the meaning of the other. In the words of the authors,
We are attempting a new method of stating the intangible relationship among different types of culturally standardised behavior by placing side by side mutually relevant photographs. … By the use of photographs, the wholeness of each piece of behavior can be preserved. (Bateson and Mead 1942: xii, as quoted in Harper 1994: 404)
For example, by placing a series of photos of a given native ritual on one page and related text on the opposite page, Bateson and Mead encourage their readers to see and read the story simultaneously.
In sociology, one of the most recognized voices of the visual has been Howard Becker, who in a 1975 article called for advancing beyond photography as an art form to seeing it as a mode of representing and analyzing social reality. He also promoted greater appreciation for the role of social theory in the production and analysis of photographic images (Harper 1994: 406). Becker subsequently published Exploring Society Photographically (1981), an edited book with a visual presentation style similar to that of Bateson and Mead.
Photographs can also be incorporated in writing personal narratives. For example, Richard Quinney (1996) uses photographs from his father’s trip to California in the 1920s to tell the intimate, nostalgic story of his relationship with his father. Even though Quinney’s photographs are interspersed with a good deal of writing, he gives greater weight to the visual impact of his work. In his words, ‘photographs are not to be subjected to “scientific” and “professional” discourse. Photography resists a language of analysis. The image speaks in silence. We give ourselves up to that which is beyond language and rational thought’ (p. 381). In a sense, Quinney uses photographs in the same way some social scientists use poetry to transcend the limits of scientific and ordinary language (poetic representations are discussed later in this chapter).
The use of photographs is most common in multidisciplinary fields like cultural studies. For example, Crossing the Divide: Strangers, Neighbors, Aliens in New America presents interviews with people from the multiethnic communities of Queens, New York. Here is how the authors describe the project:
We decide to become travelers in our own backyard. For three years we trek between the shadows of the block-long superstores that now dominate most of the major boulevards in Queens, down the side streets, into the bodegas, family-owned restaurants, homes, places of worship, libraries, and community rooms—looking for migrations stories, culture, and soul. (Lehrer and Sloan 2003: 12–13)
The still photos in this book show the interviewees’ faces, the places where they live and work, and the cultural artifacts that define their ethnic background. Even the written text itself is manipulated for visual effect with different font types, sizes, and colors adding more layers of textuality and meaning to the work.
Similarly, Body Type: Intimate Messages Etched in Flesh (Saltz 2006) tells the stories of tattoos and the people who wear them. The written text plays a minimal role in this book. Instead, the photographs of tattooed body parts dominate the book. Each photograph is accompanied with a direct quote explaining its significance for the tattooed person. Interestingly, the book does not contain any facial images; the respondents are identified only through their tattoos.
Writing with the visual continues to expand. As Douglas Harper (2005) notes, emerging computer technologies are revolutionizing the use of visual material in social research. Particularly, multimedia texts can now easily
combine pictures and written material in the same context, thanks to technology that is exceedingly affordable. Additionally, multimedia texts can be posted on internet websites accessible to users virtually from any location in the world. A key feature of internet-posted multimedia text (e.g. ‘hypertext’) is that the material does not have to be read or viewed linearly like a bound book. So-called ‘hot links’ or ‘hyperlinks’ allow the readers to jump from one passage to another. For example, while reading a hypertext ethnography, the reader can click on pictures from the field, see an image of a respondent, and click on his name to see excerpts from an interview with that respondent.
Sarah Pink (2001) suggests that hypertext brings a sort of reader-oriented coherence to ethnographic research. In her words, ‘The coherence of ethnographic hypermedia is created in the relationship between the design of the text and how it is interpreted. It depends on authors’ creativity for the former and users’ for the latter’ (169). Pink also notes that hypertext allows for continuous revisions of the original work:
Theoretically, this means neither knowledge itself nor representations of knowledge are ever complete. … Practically, this means that, unlike printed books and finished films, on-line hypermedia texts may be up-dated, added to, or altered. Video sequences may be re-edited, photographs manipulated in new ways, written words changed, and the hyperlinks between them modified. (p. 167)
For an example of hypermedia ethnographies discussed in Pink (2001), visit the following website: http:anthropology.ac.uk/Bhalot
PERFORMATIVE WRITING
This genre of writing is the most aesthetically conscious (Ellis and Bochner 1992; Paget 1995; Mienczakowski 1996; Denzin 1997, 2000, 2003). Like other genres discussed thus far, the goal here is to transcend the limits of ordinary language and to, overtly or covertly, rebel against mainstream academia and its conventions. In Sarah Finely’s words, ‘art-based research’
is an act of political emancipation from the dominant paradigm of science for new paradigm researchers to say “I am doing art” and to mean “I am doing research” – or vice versa. In either utterance, that art and research are common acts makes a political statement. (Finely 2003: 90, cited in Finely 2005: 685)
There are many variations to this approach where the author becomes an acting voice or body in evocative texts. For the purpose of this discussion, I present a research example that literally involves a staged performance. Specifically, I use Gray Ross et al’s ‘Making a Mess and Spreading It Around: Articulation of an Approach to Research-Based Theater’ to offer a summary of how social research is transformed into theater. The original research data for the staged performances discussed in this work come from Ross et al’s studies of cancer patients (i.e. women with breast cancer and men with prostate cancer).
The first step in staging research is preparing a script. The authors recommend avoiding ‘representations that fail to deliver the promise of an engaging and visceral connection with the research material’ (Ross et al 2002: 62) by consulting expert directors, scriptwriters, set designers—generally people with expertise about what does or does not work on stage. Additionally, Ross et al suggest that the following groups be included in the development of the script: (1) researchers who are familiar with the nuances of the data; (2) research participants whose stories are being told; and (3) people who are ‘naïve to the area under study’ (p. 63) and can provide insight about how outside audiences might respond to the performance.
The script itself can incorporate: (1) the original research findings; (2) a ‘second research process’ (64) where new insights emerge through secondary analysis and examination of the original data; and (3) invented scenes from rehearsals and improvisations. The script should then be read, reread, rehearsed, and revised.
Finally, the cast could include both original research participants and actors who
have become intimately familiar with the roles. To encourage audience participation, a traditional viewing can be followed by a discussion and question-and-answer session with the actors, researchers, and director.
Of course, this entire process involves deliberate choices about what is included and what is excluded from the research. For example, an important research finding may not be dramatically and aesthetically powerful and thus cannot be included in the script. Ross et al advise against improvising the material to the point where the original research participants no longer recognize themselves on the stage. This commitment to ‘real’ people seeing themselves on the screen serves two purposes. On a practical level, if a dramatization of a tragedy does not connect with the very people who endured the suffering, then there might be reason to believe that the work has failed theatrically. On a more analytical level, the matter of authenticity takes center stage here, so to speak. That is, we are once more faced with the question: To what extent does the performance represent ‘real’ life experience? As this example indicates, alternative practices do not necessarily resolve representational dilemmas; sometimes they simply transport the questions to a different arena. In the case of research-as-theater, as the written text is set aside in favor of bodily performance, the problem of representing ‘authentic selves’ migrates onto the stage.
WRITING FACTUAL FICTION
Despite the apparent contradiction in the phrase, factual fiction, or what is known as ‘creative nonfiction’ outside the social sciences, is an exciting and influential school of writing with a long and distinguished history of transgressing the divide between objective truth and imagination (see, for example, Truman Capote’s 1966 novel In Cold Blood). As Michael Agar notes (1995), although largely ignored by social scientists, creative nonfiction and literary journalism in many respects could serve as pedagogical and theoretical models for the kind of alternative writing or ‘creative analytical practices’ (Richardson and St. Pierre 2005: 962) that are now gaining momentum in the field. The same frustrations about the limitations of objectivity and the need to ‘bring the text to life’ inspired journalists to experiment with innovative modes of representing the stuff of everyday life and find ways of ‘writing about oneself in relation to the subject at hand’ (Brett Lott, cited in Moore 2007: 280). The sociological emphasis on the reflexive relationship between the self and the social world is echoed in the pedagogy of creative nonfiction. For example, in his introductory text for English courses about this genre, Dinty Moore delineates the link between reality and the imaginative author in this way:
A subject becomes noteworthy, in other words, because the author takes close notice, and then finds a way to transmit his or her own fascination with the subject to the curious reader. Moreover, a writer of creative non-fiction is not asked to be invisible. … In fact, voice and point of view are fundamental to what is creative about creative non-fiction. (2007: 11)
Creative nonfiction writers have offered insightful analyses of topics that are the mainstay of the social sciences. For example, through ‘total immersion’ (the equivalent of what Adler and Adler (1987) call ‘complete participant role’), Lee Gutkind explores the ‘humanistic aspects of the high-tech medical world’ (1998: 6). His book Many Sleepless Nights looks at the lives and practices surrounding organ transplantations. Gutkind observes that in their single-minded devotion to ‘saving lives’ surgeons become detached from the emotional health of the very lives they are saving:
I once listened to a prominent surgeon impatiently interrupt a resident who was carefully explaining a procedure to a family member, prompting him to “save lives first—answer questions later.” Another surgeon told me, in defense of his insensitive behavior, “Psychologic [sic] trauma and all that stuff is important, but it doesn’t make a goddamn difference if you are well-adjusted and dead.” (p. 7)
To the degree that the social scientific genre can be viewed as different from ‘creative nonfiction,’ the difference is that the former has different disciplinary ties and is more explicitly committed to systematic and scholarly research. For example, Rosenblatt writes that at the end even the most creative social science writer ‘must still be a craftsperson, a consummate interviewer, a doubter,
Similarly, the famed Persian poet Omar Khayam was considered an important astronomer and mathematician, and his poetry in the Rubaiyat is as much about the physical wonders of the universe as it is about aesthetics and self-exploration per se. So the recent attempt by social scientists to use poetry in conveying their observations is not entirely without precedent, nor is it entirely ‘new.’
The social scientist most widely associated with the use of poetic prose in qualitative texts is Laurel Richardson, who argues,

    Poetic representation … is a practical and powerful, indeed transforming, method for understanding the social, altering the self, and invigorating the research community that claims knowledge of our lives. (Richardson 2002: 888)

It is worth noting that this method of writing does not imply an anything-goes approach to writing. Formal training and conventions still apply. In fact, Richardson recommends poetry classes for anyone interested in creative writing of social science. She reminds her would-be followers that writing poetry involves learning the basics of a craft like any other. Richardson draws attention to the importance of ‘sound, sight, and ideation’ (p. 881) (i.e. tone, imagery, and symbolism) in poetic representations and chides,

    A line
    break does
    not
    a poem
    make. (p. 882)

The task of writing or rewriting research findings into poetic forms requires familiarity with the conventions of the form and a good deal of practice. Like traditional poetry, this kind of writing begins with an object or a thing in the real world but then tries to transcend the object through masterful description. The poetry is intended to be a condensed and more powerful version of the original text. For example, Richardson rewrote the transcripts from a five-hour interview with a Southern woman into a five-page poem. Here is an excerpt from the poeticized interview:

    Well, one thing that happens
    growing up in the South
    is that you leave. I
    always knew I would
    I would leave. (p. 888)

The goal here is to convey the woman’s life narrative without losing its emotional tone to the very words that describe the experience. Like other social scientific texts, the basic objective is still representing human experience. The genre simply gives the author greater creative latitude in telling the story. As Richardson notes about the above poem, ‘The speech style is Louisa May’s, the words are hers, but the poetic representation, including the ordering of the material, are my own’ (p. 883).

Again, initially, this kind of writing may seem a radical departure from mainstream representational practices in the social sciences, but in some ways it is simply an extension of existing practices. In particular, qualitative researchers have always had the discretion to use some material and not others. Arguably, the choices that shape the ‘final report’ have never been completely detached from aesthetic concerns. To the degree that ethnographers strive to tell a coherent story of their field experiences, they all engage in poetic revisions. Surprisingly, this observation equally applies to quantitative writing. I recently attended a job interview in which the candidate presented several colorful graphs of a regression analysis. In a sense, the statistical logic of the numbers projected on the screen was complemented by aesthetically pleasing colors and shapes (e.g. a continuous green line for one dependent variable and a fragmented red line for another). At one point, the candidate was openly complimented for his ‘nice graphs,’ making explicit the aesthetic criteria for the assessment of the quantitative representation of research findings.

WRITING THE AUTHOR

A few decades ago, including the subjective voice of the author in the scientific text was considered antithetical to the very essence of science. Today, at least in the realm of ethnographic texts, writing the author into the field notes, or autoethnography, has become an established method of representing research findings. There are many flourishing forms in this genre and a good deal of empirical and pedagogical literature. A thorough survey of this type of writing can be found in the introductory chapter of Deborah Reed-Danahay’s Auto/Ethnography.
Stylistic variations notwithstanding, one can gather from Reed-Danahay’s discussion that most experts concede that autoethnographic writing is a self-reflexive account of social experience. The central criterion for autoethnographic text appears to be that the explicit voice of the author must be embedded in a broader social context. Autoethnographic text is expected to tie idiosyncratic stories to a larger universe of experiences and meanings. Reed-Danahay makes this point explicit in her definition of autoethnography as ‘self-narrative that places the self within a social context’ (1997: 9).

Having said that, how this is achieved and for what purposes is the subject of considerable debate and contention. In Reed-Danahay’s chapter there seems to be a continuum of representational strategies for autoethnographers. On the one end, there is the minimally self-referential text that simply adds the author’s own subjective voice to the many voices and observations from the field. On the other end, there is ‘pure,’ ‘native’ experience represented with little or no intervention from academic sources. For example, John Dorst’s The Written Suburb (1989, cited in Reed-Danahay 1997) treats suburbanites’ artistic creations (i.e. arts and crafts) as autoethnographic representations. For Dorst, autoethnography is a sort of ‘self-documentation’ done by ordinary people. In this context, expert social scientific description is unnecessary because in a postmodern society anyone can be an informed author of culture: ‘If the task of autoethnography can be described as the inscription and interpretation of culture, then postmodernity seems to render the professional ethnographer superfluous’ (Dorst 1989: 2, cited in Reed-Danahay 1997: 8).

Other advocates of autoethnography, who fall somewhere in the middle of the two extremes on the continuum, emphasize neither academic nor ordinary dimensions of this genre but its potential for political action and change. For example, Stacy Holman Jones (2005) introduces her paper titled ‘Autoethnography: Making the Personal Political,’ in this way: ‘This is a chapter about how looking at the world from a specific, perspectival, and limited vantage point can tell, teach, and put people in motion’ (2005: 763).

In the field of autoethnography, the works of Carol Ronai are exemplary because of her ability to combine the best analytical innovations of this genre with superior aesthetic sensibility. Ronai’s writing is both informative and politically brave. The story of how her father sexually abused her, titled ‘My Mother is Mentally Retarded,’ is a classic example of what she calls a ‘multi-layered account.’ In this particular form of autoethnography, the author’s experiential account is juxtaposed against academic and popular discourses. The descriptions are layered and deliberately disjointed using a set of asterisks. To better appreciate the potency of Ronai’s writing, consider the following excerpt:

    I resent the imperative that all is normal with my family, an imperative that is enforced by silence, and “you don’t talk about this to anyone” rhetoric. Our pretense is designed to make event flow smoothly, but it doesn’t work. Everyone is plastic and fake around my mother, including me. Why? Because no one has told her to her face that she is retarded. We say we don’t want to upset her. I don’t think we are ready to deal with her reaction to the truth. … Because of [my mother] and because of how the family as a unit has chosen to deal the problem, I have compartmentalized a whole segment of my life into a lie. (1996: 115)

As this excerpt shows, autoethnographic text can be a powerful method of representing a social issue. Ronai’s gripping and ‘authoritative’ voice compels the reader to engage the topic. For many readers of ethnography, this representation of Ronai’s suffering has become an inescapable memory.

POSTCOLONIAL (RE)WRITING

This method of representation in some ways is as much about rewriting or un-writing the canonical texts as it is about writing per se. In some ways, postcolonial writing has been the analytical engine of the many alternative forms of representation in the social sciences. The seminal contributions
Harper, D. 1994. ‘On the Authority of the Image: Visual Methods at the Crossroads.’ In Handbook of Qualitative Research, edited by N. Denzin and Y. Lincoln. Thousand Oaks, CA: Sage. pp. 403–412.
Harper, D. 2005. ‘What’s New Visually?’ In The Handbook of Qualitative Research (3rd ed.), edited by N. Denzin and Y. S. Lincoln. Thousand Oaks, CA: Sage. pp. 747–762.
Hussain, Y. 2005. Writing Diaspora: South Asian Women, Culture and Ethnicity. Burlington, VT: Ashgate.
Irigaray, L. 1985. This Sex Which is Not One. Ithaca, NY: Cornell University Press.
Jones, S. H. 2005. ‘Autoethnography: Making the Personal Political.’ In The Handbook of Qualitative Research (3rd ed.), edited by N. Denzin and Y. S. Lincoln. Thousand Oaks, CA: Sage. pp. 763–791.
Lassiter, L. E. 2005. The Chicago Guide to Collaborative Ethnography. Chicago: The University of Chicago Press.
Lawrence, D. H. 1950. Kangaroo. Middlesex, England: Penguin Press.
Lehrer, W. and J. Sloan. 2003. Crossing the Divide: Strangers, Neighbors, Aliens in a New America. New York: W. W. Norton & Company.
Lutz, C. A. and J. Collins. 1993. Reading the National Geographic. Chicago: The University of Chicago Press.
Marcus, E. M. and M. Fischer. 1999. Anthropology as Cultural Critique: An Experimental Moment in the Human Sciences (2nd ed.). Chicago: The University of Chicago Press.
Marvasti, A. 2003. Qualitative Research in Sociology. London: Sage.
Merrison, J. 1998. ‘The Death of the Poet: Coleridge and the Logic of Science.’ In The Third Culture: Literature and Sciences, edited by E. S. Shaffer. Berlin: Walter de Gruyter. pp. 170–181.
Mienczakowski, J. 1996. ‘An Ethnographic Act: The Construction of Consensual Theater.’ In Composing Ethnography: Alternative Forms of Qualitative Writing, edited by C. Ellis and A. Bochner. Walnut Creek, CA: AltaMira Press. pp. 244–266.
Miner, H. 1956. ‘Body Ritual among the Nacirema.’ American Anthropologist 58(3): 503–507.
Moore, D. 2007. The Truth of the Matter: Art and Craft in Creative Nonfiction. New York: Pearson Longman.
Moro, P. 2006. ‘It Takes a Darn Good Writer: A Review of Ethnographic I.’ Symbolic Interaction 29(2): 265–269.
Paget, M. A. 1995. ‘Performing the Text.’ In Representation in Ethnography, edited by J. Van Maanen. Thousand Oaks, CA: Sage. pp. 222–244.
Pink, S. 2001. Doing Visual Ethnography. London: Sage.
Quinney, R. 1996. ‘Once My Father Traveled West to California.’ In Composing Ethnography: Alternative Forms of Qualitative Writing, edited by C. Ellis and A. Bochner. Walnut Creek, CA: AltaMira Press. pp. 357–382.
Reed-Danahay, D. 1997. Auto/Ethnography: Rewriting the Self and the Social. Oxford, UK: Berg.
Richardson, L. 1990. Writing Strategies: Researching Diverse Audiences. Thousand Oaks, CA: Sage.
Richardson, L. 2000. ‘Writing: A Method of Inquiry.’ In Handbook of Qualitative Research, edited by N. Denzin and Y. Lincoln. Thousand Oaks, CA: Sage. pp. 923–948.
Richardson, L. 2002. ‘Poetic Representation of Interviews.’ In The Handbook of Interview Research: Context & Method, edited by J. Gubrium and J. Holstein. Thousand Oaks, CA: Sage. pp. 877–891.
Richardson, L. and E. A. St. Pierre. 2005. ‘Writing: A Method of Inquiry.’ In The Handbook of Qualitative Research (3rd ed.), edited by N. Denzin and Y. S. Lincoln. Thousand Oaks, CA: Sage. pp. 959–978.
Ronai, C. 1996. ‘My Mother is Mentally Retarded.’ In Composing Ethnography, edited by C. Ellis and A. Bochner. Walnut Creek, CA: AltaMira Press. pp. 109–131.
Rose, G. 2001. Visual Methodologies. London: Sage.
Rosenblatt, P. C. 2002. ‘Interviewing at the Border of Fact and Fiction.’ In The Handbook of Interview Research: Context & Method, edited by J. Gubrium and J. Holstein. Thousand Oaks, CA: Sage. pp. 893–909.
Ross, G., V. Invonoffski and C. Sinding. 2002. ‘Making a Mess and Spreading It Around: Articulation of an Approach to Research-Based Theater.’ In Ethnographically Speaking, edited by A. Bochner and C. Ellis. Walnut Creek, CA: AltaMira Press. pp. 57–75.
Ryan, S. 1994. ‘Inscribing the Emptiness: Cartography, Exploration, and the Construction of Australia.’ In De-Scribing Empire: Post-Colonialism and Textuality, edited by C. Tiffin and A. Lawson. London: Routledge. pp. 115–130.
Said, E. 1978. Orientalism. London: Routledge.
Saltz, I. 2006. Body Type: Intimate Messages Etched in Flesh. New York: Harry N. Abrams.
Shaffer, E. S. 1998. The Third Culture: Literature and Sciences. Berlin: Walter de Gruyter.
Stasz, C. 1979. ‘The Early History of Visual Sociology.’ In Images of Information: Still Photography in the Social Sciences, edited by J. Wagner. Beverly Hills, CA: Sage. pp. 119–136.
Tiffin, C. and A. Lawson. 1994. ‘Introduction: The Textuality of Empire.’ In De-Scribing Empire: Post-Colonialism and Textuality, edited by C. Tiffin and A. Lawson. London: Routledge. pp. 1–14.
Van Maanen, J. 1988. Tales of the Field. Chicago: University of Chicago Press.
Van Maanen, J. 2006. ‘Ethnography Then and Now.’ Qualitative Research in Organizations and Management 1(1): 13–21.