
SW 4 Quantitative Research Methodology Notes

Topics

What is quantitative research


Research Process Overview
Research Proposal Components
Problem Formulation
Literature Review
Objectives
Concepts, Variables, Levels of Measurement

Research Design and Types of Research Design


Research Ethics
Sampling
Tools for Data collection
Hypothesis and Hypothesis Testing
Data Analysis
Univariate and Bivariate Data Analysis
Graphic Representation of Data
Descriptive Statistics - Measures of Central Tendency,
Dispersion and Chi-Square test

Inferential Statistics
Correlation
T-test and ANOVA
Linear Regression

Report Writing

Steps of the Research Process


Selection of Topic (Problem identification/problem statement)
Formulate Research Questions, Study objectives
Conceptualize research design
Construct measure/instrument/tool
Collect data
Clean & Analyze data
Interpret data
Inform others (Write research report, disseminate findings)

Selecting Researchable Topics


Research topic is a concept, subject or issue that can be
studied through research
Research topics in social research are often about people, problems,
programmes or phenomena
Developing research questions
They can be about local or global phenomena
They can be very specific or more general
Can focus on the past, present or future
Can be related to social reality or seek answers to social
problems
Can be inductive or deductive in nature
Aim at exploring new knowledge or seek to fill gaps in
existing knowledge

Sources of Research Questions



Values and Science


Personal factors-intellectual curiosity
Social, political and economic climate
Research funding

Developing Researchable Question


Literature Review
o Feasibility
-Access
-Time & money
-Expertise

Literature Review
Based on the assumption that knowledge accumulates & that we
learn from & build on what others have done.
Goals of Literature Review
To show familiarity & establish credibility of researcher & the
importance of the problem
Provide theoretical background to the study
To show the path of prior research & its linkage with the current
project
To learn & improve research methodology, measures used
To integrate & summarize what is known in an area and identify gaps
To learn from others & stimulate new ideas
To contextualize the findings of the study & integrate them into the
existing body of knowledge once data collection & analysis is
complete

Types of Reviews
Self-study Reviews increase the reader's confidence


Context Review- Place specific project in larger context
Historical Review- trace development of an issue over time
Theoretical review- compare how different theories address
the issue
Methodological Review- different methodologies used by
earlier researchers
Integrative Reviews- Summarizing what is known at a point in
time

Sources of Research Literature



Scholarly journals
Books
Dissertations
Government documents/Policy reports
Conference papers/ Working papers,
monographs
Web search- www.scholar.google.com
Electronic databases- Sagepub, LexisNexis,
ERIC, JSTOR etc.

Conducting a Systematic Review



Define and refine a topic


Choose focused research questions
Locate research reports, books, documents,
policy reports, conference papers
Present it in a logical sequence

Fundamental Concepts of Social Research



Objectives
Concepts
Variables
Hypothesis
Assumptions

Selection of Topic

Personal experience
Curiosity
The state of knowledge in the field
Solving a problem
Personal values
Everyday life

Techniques for Narrowing a Topic


1. Published literature is an excellent source of
ideas for research questions
2. Talk over ideas with others
3. Apply to a specific context
4. Define the aim and desired outcome- type of
study
5. State specific objectives

Nature and Use of concepts


A concept is abstracted from many sense
impressions or percepts. Concepts may be subjective
and may vary markedly from individual to individual
Concepts are foundation of all human
communication and thought
Science requires a more precise communication
Problem in definition and communication
Scientific terms may have different meaning in other
frames of reference
With evolution of knowledge, concepts may be
redefined

Concepts
Concepts are words or signs that share common
characteristics
Concepts can be concrete and easily measurable or complex
and difficult to measure
Each science develops its own terms or concepts for
communicating its findings
Concepts have meaning only within some frame of reference,
some theoretical system
Concepts are building blocks of a theory
Natural science concepts are often expressed in symbolic
form
Most social science concepts are expressed in words
Concepts are everywhere and used all the time

Concepts
Concepts have two parts: Symbol (word or term) and
Definition
Concepts are created from personal experiences,
creative thought or observation
Social science concepts form a specialized language
or jargon
Every field of science has its own jargon
Concepts can refer to concrete objects or abstract
phenomena

Concepts
Concrete concepts- school, age, height, income, housing,
physical space etc.
Abstract concepts- Family, beliefs, social control, intelligence,
personal space etc.
Scientific concepts are more precisely defined than concepts
used in day to day communication.
We rarely use concepts in isolation. They form
interconnected groups or clusters.
Theories contain collection of associated concepts that are
consistent and mutually reinforcing
They can be one-dimensional or complex having multiple
dimensions

Conceptual and Operational Definitions

Conceptual definitions are abstractions,
articulated in words, that facilitate
understanding, e.g. definitions in
dictionaries or meanings understood in day
to day conversation
Operational definitions consist of a set of
instructions on how to measure a variable
that has been conceptually defined

Operational Definitions of Concepts


Operational definitions ensure that other
scientists will understand the terms in the
same way as the researcher
An operational definition defines the phenomenon with greater
precision and may leave out some elements
of an older concept.
An operational definition makes it easy to
measure or study an abstract concept

Variables
Measurability is the main difference between
a concept and a variable.
Variables take on more than one value and
those values can be numbers or words
Social research is based on defining variables,
looking for associations among them and
trying to understand whether and how one
variable causes another

Dimensions of Variables
Variables can be one-dimensional or
multidimensional.
Examples of one-dimensional variables- age,
height, weight, birth order, marital status
Multidimensional variables- stress, wealth,
attitude
A variable must have at least two categories

Types of Variables
Independent variable- The cause variable, or
the one that identifies forces or conditions
that act on something else, is an independent
variable
Dependent variable- The variable that is the
effect or is the result or outcome of another
variable is the dependent variable
Intervening variable- This comes between
the independent and dependent variable &
links them to each other

Variables
It is not simple to decide whether a variable is
dependent or independent
Simple theories have one dependent and one
independent variable
Complex theories can contain dozens of
variables with multiple independent,
dependent and intervening variables

Levels of Measurements
The categories of a variable should be exhaustive &
mutually exclusive
o Nominal variables- The values comprise a list of
names (religion, states, occupation)- Qualitative
measurement-- involves only classification
o Ordinal variable- the categories have names and
these values can be rank ordered (socio economic
class, opinions)-- involves classification+rank
ordering
o All ordinal variables can be treated as nominal
variables but not the other way round.

Levels of Measurements
Interval variables- They have all the properties of
nominal and ordinal variables. Their attributes are
exhaustive and mutually exclusive.
There is a standard distance between the adjacent
categories (temperature, height, weight,
measurement scales)
Ratio variables- have names, the categories can be
rank ordered, the adjacent categories have a standard
distance & one of the categories has a true zero
point. E.g. age, income

Types of Variables
Categorical variables- measured at nominal or
ordinal levels where variables are divided into
multiple categories
Continuous variables- have continuity in
measurement and are measured at an
interval or ratio level.
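
A minimal sketch of how the level of measurement limits the summary statistic that makes sense. The variable names and survey values below are hypothetical, chosen only for illustration: the mode is used for a nominal variable, a median for an ordinal variable and the mean for a ratio variable.

```python
from statistics import mean, median_low, mode

# Hypothetical survey responses, one list per level of measurement.
religion = ["Hindu", "Muslim", "Hindu", "Christian", "Hindu"]  # nominal
ses_class = ["low", "middle", "middle", "high", "low"]         # ordinal
age_years = [21, 34, 28, 45, 30]                               # ratio

# Nominal: categories can only be classified and counted, so use the mode.
most_common_religion = mode(religion)

# Ordinal: categories can be rank ordered, so a median category is meaningful.
rank = {"low": 1, "middle": 2, "high": 3}
label = {value: name for name, value in rank.items()}
median_class = label[median_low([rank[c] for c in ses_class])]

# Interval/ratio: adjacent values are a standard distance apart, so use the mean.
mean_age = mean(age_years)

print(most_common_religion, median_class, mean_age)
```

Treating the ordinal variable as nominal (taking its mode) would still be valid, but taking the mean of a nominal variable would not, which mirrors the "not the other way round" rule above.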

Units of Analysis
Populations of people
o Farms
o Communities
o Myths
o Cities
o Countries
Always collect data on the lowest unit of analysis. It is
easy to aggregate data collected on individuals but
not possible to disaggregate data collected on
groups
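
The point about aggregation can be sketched with a few hypothetical records (the community names and incomes are invented for illustration). Individual records roll up into group means, but the group means alone cannot be turned back into individual values.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical individual-level records: (community, monthly income).
individuals = [
    ("Village A", 4000), ("Village A", 6000),
    ("Village B", 3000), ("Village B", 5000), ("Village B", 7000),
]

# Aggregating upward: individual records become one mean per community.
by_community = defaultdict(list)
for community, income in individuals:
    by_community[community].append(income)
community_means = {name: mean(values) for name, values in by_community.items()}

print(community_means)
```

Both communities end up with the same mean even though their members differ, so a dataset that stored only the community means could not recover who earned what.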

Hypotheses
A hypothesis is a proposition to be tested or a
tentative statement of a relationship between
two variables.
Hypothesis is used to test the direction or
strength of a relationship between variables
3 types of relationships between variables-
Positive
Negative
Curvilinear
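
The three types of relationship can be illustrated with made-up data and a hand-written Pearson correlation (the helper `pearson_r` and all values are assumptions for this sketch). It also shows why a linear coefficient can miss a curvilinear relationship.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

x = list(range(11))                      # 0, 1, ..., 10
positive = [2 * v + 1 for v in x]        # rises as x rises
negative = [20 - 3 * v for v in x]       # falls as x rises
curvilinear = [-(v - 5) ** 2 for v in x] # inverted U: rises, then falls

r_positive = pearson_r(x, positive)
r_negative = pearson_r(x, negative)
r_curvilinear = pearson_r(x, curvilinear)

print(round(r_positive, 2), round(r_negative, 2), round(abs(r_curvilinear), 2))
```

The curvilinear series is perfectly determined by x, yet its linear correlation is zero, so a scatter plot is worth checking before concluding that two variables are unrelated.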

Hypothesis
Criteria of a good hypothesis
Conceptual clarity
Should have empirical referents
Specific
Should be related to available technique of testing
Should be related to a body of theory

Cause and Effect


A cause and effect relationship between two variables
can be established if the following conditions are met

The two variables covary

The covariation is not spurious

There is a logical time order

A mechanism is available to explain how an
independent variable causes changes in a
dependent variable.

Causal Hypothesis

At least two variables
Expresses the causal relationship between
the variables
Can be expressed as a prediction or an
expected future outcome
Logically linked to a research question or
theory
It is falsifiable

Expression of Causal Relationship


Different ways to express- Attendance in College and
performance in Exam
Causes
Leads
Relates
Influences
Associated with
Produces
Results
If, then
Higher /lower
Reduces

Covariation
Covariation is also called association
Association is a necessary but not a sufficient
condition for causality (time spent in library and
scores in exam)
Spurious correlation- No. of firefighters and
amount of damage, gender and lung cancer,
age and readiness to adopt innovation
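
The firefighters example can be sketched with constructed data. Everything here is an assumption made for illustration: fire size is treated as the third variable driving both the number of firefighters and the damage, and a first-order partial correlation (a technique not named in these notes) is used to hold fire size constant.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Constructed data: fire size (1 = small, 3 = large) drives both variables.
fire_size = [1] * 4 + [2] * 4 + [3] * 4
crew_noise = [1, 1, -1, -1] * 3      # variation unrelated to fire size
damage_noise = [1, -1, 1, -1] * 3    # variation unrelated to fire size or crew
firefighters = [5 * s + e for s, e in zip(fire_size, crew_noise)]
damage = [50 * s + 10 * e for s, e in zip(fire_size, damage_noise)]

r_xy = pearson_r(firefighters, damage)
r_xz = pearson_r(firefighters, fire_size)
r_yz = pearson_r(damage, fire_size)

# Partial correlation of firefighters and damage, controlling for fire size.
partial = (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5

print(round(r_xy, 3), round(abs(partial), 3))
```

The raw correlation is strong, but it disappears once fire size is held constant, which is what "the covariation is spurious" means in practice.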

Hypothesis- a Word of Caution


The fact that two variables go together does
not mean that change in one variable causes
change in another variable.
In social science, it is often difficult to
establish causality
Many times we are not sure that cause comes
first (conflicts with friends and low self
esteem)

Research proposal components


The research proposal should include

The problem identification
Aim of the study; rationale (why is this problem important to study)
A brief literature review to discuss what is known about this topic
The research questions (and hypothesis, if any) / objectives of the
study
Methodology including - type of research design, the sample (type of
sample, sampling frame, size of sample), methods and tool for data
collection.
Ethical concerns related to the study must be included.

PART II TOOL FOR DATA COLLECTION


Students are expected to prepare a tool for data collection which
should be included at the end of the research proposal.

All research proposals must have a list of references and in-text
citations.

Quantitative Research Process: Use of Logic

Deductive Process- Begins with abstract logical relationships
among concepts and moves towards concrete empirical
evidence
Theory- Hypothesis- Observations to test the hypothesis, e.g.
Ability to manage multiple tasks and gender; Mathematical
aptitude and gender; Effectiveness of experiential learning vs.
learning through lecture method
Inductive Process- Begins with detailed observations of the
world and moves towards abstract generalizations and ideas.
Examples: Observation of cases of malnutrition in a region,
Observations of different aspects associated with
malnutrition among infants/children below 6 years.
Generalization based on the association of factors affecting
status of nutrition among different age groups of children

Cyclic Model of Science


- Perpetual flow of theory building and theory testing
[Figure: cycle of Theories -> (logical deductions) -> Hypothesis -> (measurement) -> Observations -> (statistical or verbal summarization) -> Empirical Generalizations -> Theories]

Use of Research
Basic Research /academic research/pure
research- Research designed to add to our
knowledge and understanding of the social
world for the sake of contributing to
theoretical knowledge

Applied research- It is intended to be
useful in the immediate future and to
suggest action or increase effectiveness in
some area

Types of Applied Research


Action Research- focused on immediate application and
problem solving in a particular situation/community, not
so much on development of theory or making
generalizations.

Impact Assessment- main purpose is to determine
whether a given program has had the desired effect on
the target population (individual, household, institution)
and to determine whether the desired effect is
attributable to the program or to other factors. It is a fact
finding activity that describes conditions that exist at a
particular time.

Evaluation Research- its purpose is to collect information to
provide feedback about a project/scheme to assess its
worth or merit. Implies judgment on effectiveness, utility
or desirability of a program or policy.

Research Design
A study design is the blueprint presenting the
overall plan of why and how the study will be
conducted.
It is a research strategy specifying the number
of cases to be studied, the number of times
data will be collected, the number of samples
that will be used and whether or not the
researcher will try to control or manipulate
the independent variable in some way.

Decisions about Research Design


Key factors to consider-

Purpose of research
Researcher's interest
General use of theory

Research Design: Purpose of Research

Exploratory Research
Descriptive Research
Explanatory Research
Applied Research
Studies may have multiple purposes but
one purpose is usually dominant.

Goals of Exploratory Research


Ground breaking research on a relatively unstudied
topic or a new area. Exploratory research addresses
the "what" question
Become familiar with the basic facts, people, and
concerns involved
Develop a well grounded mental picture of what is
occurring
Generate many ideas and develop tentative theories
and conjectures
Formulate questions and refine issues for more
systematic inquiry
Develop techniques and a sense of direction for
future research

Goals of Descriptive Research


It is designed to describe groups, activities, situations
or events. It focuses on "how" and "who" questions
Provide an accurate profile of a group
Describe a process, mechanism, or relationship
Give a verbal or numerical picture
Find information to stimulate new explanations
Present basic background information or a context
Create a set of categories or classify types
Clarify a sequence, set of stages or steps
Document information that contradicts prior belief
about a subject

Goals of Explanatory Research

Explanatory research is designed to explain why
subjects vary in one way or the other. It addresses the
"why" questions.
Determine the accuracy of a principle or theory
Find out which competing explanation is better
Advance knowledge about underlying processes
Link different issues and topics under a common
general statement
Build and elaborate a theory so that it becomes more
complete
Extend a theory or principle into new areas or issues
Provide evidence to support or refute an
explanation or prediction

Types of Research Designs


1. Research Designs based on time dimensions
Cross Sectional Study
Longitudinal study- 3 types
o Panel
o Trend
o Cohort studies
Case Study

2. Experimental Study Designs

Time Dimension in Research


Cross sectional research

Longitudinal Research- time series
research, panel study research, cohort
analysis

Case Studies

Cross Sectional Study


Data are collected for all the variables of
interest using one sample at one time
Data are collected for one sample at one point
in time even if that one point lasts for hours,
days, months or years.
Most widely used design as they are useful to
describe samples or populations on a number
of variables
Usually less expensive and simpler to
implement

Cross Sectional Study


Statistical analysis is used to analyse
patterns of relationships
There is at least one independent and one
dependent variable
They are sometimes used to examine causal
relationships when the time order between
the variables is easy to determine

Longitudinal Designs
Data are collected at least two different times
Types of longitudinal study- panel, trend, or
cohort
Panel study- Same sample is followed over a
period of time- one can collect current data to
determine the order of events and experiences
Panel studies allow documentation of patterns of
change and establishment of the time order
of events

Longitudinal Design: Panel Study


Disadvantages:
o Higher cost in terms of time and money
o The loss of subjects from a study because of
disinterest, death, illness or inability to locate
them
o Changes in the methods of data collection,
and measurement techniques
o Panel conditioning
o Effect of age on the subjects

Longitudinal Design: Trend Study


Data collected at least twice selecting a new sample
each time
They avoid panel attrition and panel conditioning
Save expenses of relocating sample members
Measures aggregated changes and not changes in
individuals
May be no aggregated change even though there are
changes in individual characteristics
Difficult to establish causal relationship because a
new sample is examined
Could inform about overall change in society but may
not identify the reasons for this change- thus
informing about general trends
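
A small sketch of why aggregate figures can hide individual change. The respondents and answers are hypothetical, and for simplicity the same five people appear at both waves (as in a panel), so that the aggregate view and the individual view can be compared side by side; a real trend study would draw a new sample each time and would only see the aggregate shares.

```python
# Hypothetical answers to "Do you support the policy?" (1 = yes, 0 = no).
wave1 = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 1}
wave2 = {"A": 0, "B": 1, "C": 1, "D": 0, "E": 1}

# Aggregate view: the share of supporters at each wave.
share1 = sum(wave1.values()) / len(wave1)
share2 = sum(wave2.values()) / len(wave2)

# Individual view: who changed their answer between waves.
changed = sorted(person for person in wave1 if wave1[person] != wave2[person])

print(share1, share2, changed)
```

The share of supporters is identical at both waves, yet two respondents switched sides, which is the "no aggregated change even though there are changes in individual characteristics" case.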

Longitudinal Design: Cohort Study


o Cohort means people born within a given time
frame or experiencing a life event at
approximately the same time
o People can leave the cohort but no one can join
after its formation
o A cohort study is a specific kind of trend study that
studies a cohort over time.
o It can examine an entire generation

Experimental Design

One group post test only design

One group pre test post test design

Classical Controlled Experiments

Solomon Four group Experiment

Quasi Experimental

One group post test only design


Only one group
No group to compare
Data collection only after intervention
Can be useful in gathering information about how the
program is functioning
Often used for client satisfaction surveys to see their
perception of the programme
Numerous threats to its validity, both external and
internal
Internal validity- Not sure whether the outcome is the
result of the programme
External validity- No use for generalization

One group pre test post test design


Pre-test -> Programme intervention -> Post-test
Threats to internal validity- History (no method of
judging effects of other events), maturation,
effect of testing, changes in the questionnaire
Threats to external validity- History, reactive
effect (change in behaviour because of
participation in the study)

Classic Controlled Experiment


One experimental and one control group

Dependent variable is measured at least two times,
before and after the experiment

The independent variable is manipulated/controlled by
the researcher

Used for explanatory research testing causal hypothesis

Classic Controlled Experiment


Internal validity- Selection and maturation
interaction

External validity- Selection treatment
interaction and maturation treatment
interaction

Solomon Four Group Experiment


Groups                     Time 1 (pretest)             Time 2 (post test)
Experimental               Measure dependent variable   Measure dependent variable again (post intervention)
Control                    Measure dependent variable   Measure dependent variable again (no intervention)
Experimental (no pretest)  -                            Measure dependent variable (post intervention)
Control (no pretest)       -                            Measure dependent variable (no intervention)

Solomon Four Group Experiment



It takes care of
Testing the main effects of the
experiment
Understanding the interaction effect of
testing
Combined effect of maturation and
history

Solomon Four Group Experiment


Same as classic experiment with the addition of
post test only control and experimental groups
Four groups are- Experimental group with pre-test,
experimental group with only post test,
control group with pre-test and control group
without pre-test
Stimulus is introduced in both the experimental
groups
Post test measurements for all four groups

Experimental Study
Use of random assignment to select groups
Use of matching to select groups
Experimental group receives the stimulus and
control group receives nothing
Difference in dependent variable as measured at
pre test and post test is calculated for both the
groups
Field experiments are done in real situations so
better generalizability
Laboratory research allows researcher a better
control over setting
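
The logic of random assignment followed by a pre-test/post-test comparison can be sketched as below. The participant labels, the seed and all scores are hypothetical, and the comparison of mean gains is only one simple way to summarize such a design.

```python
import random
from statistics import mean

# Step 1: random assignment of hypothetical participants to two groups.
participants = [f"P{i}" for i in range(1, 9)]
rng = random.Random(42)              # fixed seed so the split is repeatable
shuffled = participants[:]
rng.shuffle(shuffled)
experimental, control = shuffled[:4], shuffled[4:]

# Step 2: hypothetical pre-test and post-test scores on the dependent variable.
exp_pre, exp_post = [50, 55, 60, 45], [62, 66, 70, 58]
ctl_pre, ctl_post = [52, 54, 58, 48], [54, 55, 61, 50]

# Step 3: the pre/post difference is calculated for both groups.
exp_gain = mean(post - pre for pre, post in zip(exp_pre, exp_post))
ctl_gain = mean(post - pre for pre, post in zip(ctl_pre, ctl_post))

# Step 4: the extra gain in the experimental group is attributed to the stimulus.
estimated_effect = exp_gain - ctl_gain

print(exp_gain, ctl_gain, estimated_effect)
```

Subtracting the control group's gain removes change that would have happened anyway (for example through maturation or testing), which a one group pre-test post-test design cannot do.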

Quasi Experimental Designs



No pre test
Post test only
Controlled group may be given alternate
treatment or placebo
Not as reliable as the other experimental
designs
Eliminates threats to internal validity and
can establish causality

Ethics in Research


Why are we talking about ethics in social sciences?
Ethics is concerned with the conduct of human
beings
Social science studies are conducted with the participation of
human beings
They have an impact on human beings or on the
wider society

Context specific
Can be universal
Can be specific to a particular context
Can be specific to a particular locality

Four well known moral principles of ethics
The Principle of Non-maleficence
Research must not cause harm to the participants in particular and to people
in general
The Principle of Beneficence
Research should also make a positive contribution towards the welfare of the
society
The Principle of Autonomy
Research must respect and protect the rights and dignity of participants
The Principle of Justice
The benefits and risks of research should be fairly distributed among people

08/09/11

Ten General Ethical Principles (1)


Essentiality
Necessary to make all possible efforts to obtain and give adequate
consideration to existing literature and knowledge on the
issue you research
Maximization of Public Interest and Social Justice
Research is a social activity, carried out for the benefit of
society (even if the reason is to get marks)
With the motive of maximization of public interest and social
justice

Ten General Ethical Principles (2)


Knowledge, Ability and Commitment to do Research
Sincere commitment to research and relevant subject
Readiness to acquire adequate knowledge, ability and
skills for a particular research
Respect and Protection of Autonomy, Rights and
Dignity of the Participants
Protect the autonomy, rights and the dignity
Participation of the individual MUST be Voluntary and
based on informed consent
Get permission of the respondent for photography and
recording

Ten General Ethical Principles (3)


Privacy, Anonymity and Confidentiality
All information from the participants is confidential
No identifiers
Use pseudonyms
Precaution and Risk Minimisation
All research carries some risk to the participants and to
society
Taking adequate precautions and minimizing risk is essential

Ten General Ethical Principles (4)


Non Exploitation
MUST not unnecessarily consume the time of
participants
MUST not incur undue loss of resources and income
MUST not expose them to risks due to participation
Should not exploit juniors or other team members
Contribution of each member should be properly
acknowledged and recognized

Ten General Ethical Principles (5)


Public Domain
Results should be in the public domain
Accountability and Transparency
Must be fair
Must be honest
Must be transparent
Must preserve the research records for a reasonable time
Must destroy all the records after certain periods

Ten General Ethical Principles (6)


Totality of Responsibility
All those involved directly and indirectly in the
research should adhere to the ethics


A few other important principles


Protection and Promotion of Integrity in
Research
Researchers should not undertake any secret research
There must be no fabrication, falsification or plagiarism

Participants have the right to get help


All possible help to the participants
Help the participants in case of adverse consequence
Informed consent where the gatekeepers are involved
Obtain permission of the gatekeepers
But it does not substitute the permission of the actual
respondent
Take back your results to the community
You check whether the observation you made is correct


NEVER use a wrong statistical analysis

Think about your interpretation
Do not generalize from the group level to the individual level
Do not generalize from a particular population to the general
population

Reference books (published by Centre for Studies in Ethics and Rights):
Ethical Guidelines for Social Science Research in
health (2006). National committee for Ethics in
Social Science Research in Health.
Amar Jesani and Tejal Barai-Jaitly (2005). Ethics
in health Research: A Social Science Perspective.

COMPOSITE MEASURES

QUANTITATIVE RESEARCH
METHODOLOGY
Madhura Nagchoudhuri

Composite Measures
Composite measures are used to measure variables that are
complex or multifaceted such that they cannot be measured
using a single item on a questionnaire e.g. stress, quality of life,
human development
Two types of composite measures-
Indexes
Scales

Both indexes and scales enable representation of complex
variables with scores that allow potential for greater variance.
Scores are derived from multiple items.
Indexes and scales provide ordinal measures of variable by rank
ordering people through overall score that combines items on
the scale or index.

Composite Measures
Factors to keep in mind while selecting items
to create a scale or an index-
Face validity
Items should have adequate variance- useful in
distinguishing people from each other. People
should not come up with uniform answer.

Index- The HDI example

An example of a well established and commonly used index- Human
Development Index
HDI provides a way of ranking countries on the issue of human
development and is used to monitor the progress of nations.
It indicates how far a country has to travel to provide essential choices to its
entire population.
Focus is on long term human development outcomes. It cannot reflect input
efforts in terms of policies or short term human development achievements
It is an average measure and masks certain disparities and inequalities
within countries.
Consists of
Educational attainment (access to knowledge)- measured by adult literacy
rate and combined gross primary, secondary and tertiary enrollment ratios
A decent standard of living- measured by GDP per capita (purchasing
power parity)
Long and Healthy Life- measured by life expectancy at birth

HDI is calculated by taking an average of deprivations in all three areas and
subtracting the average from 1
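
The calculation described above can be sketched as follows. The country values and the minimum/maximum goalposts are illustrative assumptions, not official figures, and the deprivation formula (distance from the maximum divided by the range) is one common way to put the three areas on a 0 to 1 scale.

```python
def deprivation(actual, minimum, maximum):
    """Share of the distance to the maximum still to be covered (0 = none, 1 = all)."""
    return (maximum - actual) / (maximum - minimum)

# Hypothetical country values with illustrative goalposts (assumptions only).
deprivations = {
    "long_and_healthy_life": deprivation(actual=65, minimum=25, maximum=85),
    "educational_attainment": deprivation(actual=70, minimum=0, maximum=100),
    "standard_of_living": deprivation(actual=40, minimum=0, maximum=100),
}

# Average the three deprivations and subtract the average from 1.
average_deprivation = sum(deprivations.values()) / len(deprivations)
hdi = 1 - average_deprivation

print(round(hdi, 3))  # 0.589
```

Because the index is an average, a country with one very poor dimension and two strong ones can score the same as a country that is moderate on all three, which is the masking effect noted above.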

Types of Scales
3 types of scales most commonly used
include-
Likert scales
Semantic differential scales
Guttman scales

Likert Scale
Format frequently used in contemporary survey
questionnaires.
Respondent is presented with a series of statements to which
s/he is to respond indicating whether s/he strongly agrees,
agrees, undecided/neutral, disagrees or strongly disagrees.
There is an unambiguous ordinality in the response categories.
Usually 3-point, 5-point or 7-point (odd no. with a midpoint)
Assumes that each item on the scale has equal intensity
Lends itself to a simple method of scaling with the possibility of
scoring being done in a uniform way, e.g. scores of 1 to 5 may
be assigned where a score of 5 is assigned to strongly agree
for positive items and strongly disagree for negative items.
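
A minimal sketch of uniform Likert scoring, assuming a 5-point format scored 1 to 5 and reverse scoring for negatively worded items. The item wordings and responses are hypothetical.

```python
SCORES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def score_item(response, positive=True):
    """Score one response; negatively worded items are reverse scored."""
    raw = SCORES[response]
    return raw if positive else 6 - raw

# Hypothetical job satisfaction items: (statement, positively worded?, response).
answers = [
    ("I enjoy my work", True, "agree"),
    ("I often feel like quitting", False, "strongly disagree"),
    ("My job is meaningful", True, "neutral"),
]

item_scores = [score_item(response, positive) for _, positive, response in answers]
total_score = sum(item_scores)

print(item_scores, total_score)  # [4, 5, 3] 12
```

Reverse scoring means a high total always points in the same direction (here, higher satisfaction), whichever way an individual item is worded.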

Semantic Differential Scales


Used extensively in social science research
It is easy to construct and easy to administer
Used to measure inanimate things, animate things, behaviors
and intangible concepts similar to Likert scales.
Usually a 7 point scale
Tests people's feelings about something
Determine the relevant dimensions and have terms to represent
extremes of each.
Uses pairs of adjectives which are opposites of each other (e.g.
boring- interesting)
Respondents are expected to rate each set of adjectives
indicating which of the pair they favor more eg. Boring or
interesting (very much, somewhat, neither, somewhat, very
much)

Guttman Scales
Clear difference in intensity in the way items
are structured moving from the least intense to
the most intense.
If a respondent agrees to the more intense
items (harder items) then one may assume that
s/he will agree to the less intense or easier
items.
E.g. Bogardus Social Distance Scale
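
The cumulative logic of a Guttman scale can be checked with a short function. This is a sketch under the assumption that items are already ordered from least to most intense and that answers are coded 1 (agree) and 0 (disagree); the function name and the response patterns are hypothetical.

```python
def is_guttman_consistent(answers):
    """answers: 1/0 codes for items ordered from least to most intense.

    A consistent pattern never agrees with a harder item after
    disagreeing with an easier one, i.e. it is a run of 1s followed by 0s.
    """
    return all(easier >= harder for easier, harder in zip(answers, answers[1:]))

consistent = is_guttman_consistent([1, 1, 1, 0, 0])    # agrees up to item 3
inconsistent = is_guttman_consistent([1, 0, 1, 0, 0])  # skips item 2 but accepts item 3

print(consistent, inconsistent)  # True False
```

When most respondents show consistent patterns, a single score (the number of items agreed with) is enough to reconstruct which items each person accepted.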

Complex measures- word of caution


Some factors to be taken into account while
creating and applying scales-
Language- shades of meaning as they are understood
may be different. If the scale is to be used in a context
where respondents don't know English, interviewers
who are bilingual may be used. Instrument may be
translated into another language and back translated to
ensure reliability and validity
Culturally sensitive questions- scales should be
tested in different cultural contexts to ensure their
reliability and validity in the context of that culture.

Reliability & Validity

Reliability
Deals with the indicators of dependability
A reliable indicator or measure gives the
same result every time
Three types of Reliability-
1. Stability reliability -reliability across time,
2. Representative reliability -across
subpopulation, groups of people and
3. Equivalence reliability -consistency across
different indicators

Sources of Error

Unclear Definition of variables


Use of retrospective information
Variation in conditions for data collection
Structure of the instrument (many open ended
questions may reduce the reliability)

Testing Reliability
Reliability is determined by obtaining two or
more measures of the same thing and seeing
how closely they agree.
Four methods of testing reliability
Test retest
Alternate form
Split Half
Observer reliability

Test-Retest
Repeatedly administering the same instrument to
the same set of people on separate occasions
They should not be subjects in actual study
If the results of repeated tests are similar, then
the reliability is high
Drawback- the first test has an influence on the
next
Measuring instruments that are strongly
affected by memory or repetition, should not be
tested for reliability using this method
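
One common way to judge whether "the results of repeated tests are similar" is to correlate the two sets of scores. The sketch below uses hypothetical scores for five people and a hand-written Pearson correlation; the choice of Pearson's r is an assumption, since the notes do not name a statistic.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores of the same five people on two occasions.
first_administration = [10, 12, 15, 18, 20]
second_administration = [11, 12, 16, 17, 21]

test_retest_r = pearson_r(first_administration, second_administration)

print(round(test_retest_r, 2))
```

A coefficient close to 1 suggests the instrument ranks people in nearly the same order on both occasions; it does not rule out the memory and repetition effects listed above.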

Alternate Form
Different but equivalent forms of the same test
are administered to the same group of
individuals usually close in time and then
compared
Drawback- developing equivalent tests can be
time consuming
Some problems associated with test-retest are
not completely eliminated

Split Half
Items of the instrument are divided into
comparable halves
The test is administered and the scores of the
two halves are compared.
If the scores are similar then the test is reliable
Major problem lies in designing two halves that
are equivalent
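
A sketch of the split-half procedure with hypothetical responses from five people to a six-item instrument. The odd/even split and the Spearman-Brown step-up formula are additions not mentioned in these notes; they are shown as one conventional way to compare the halves.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical item scores: one row per respondent, six items each.
responses = [
    [4, 5, 4, 5, 4, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 5, 5, 4, 5, 5],
    [1, 2, 1, 1, 2, 1],
]

# Split the items into two halves (1st, 3rd, 5th items vs 2nd, 4th, 6th).
first_half = [sum(row[0::2]) for row in responses]
second_half = [sum(row[1::2]) for row in responses]

half_r = pearson_r(first_half, second_half)

# Spearman-Brown step-up: estimates reliability of the full-length instrument.
full_length_reliability = 2 * half_r / (1 + half_r)

print(round(half_r, 2), round(full_length_reliability, 2))
```

The correlation between halves understates the reliability of the whole instrument because each half is only half as long, which is why a step-up correction is often applied.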

Observer Reliability
Comparing administration of an instrument
done by different observers or interviewers
The observers need to be thoroughly
trained
At least two people code the content of
the responses according to agreed criteria

Validity
Validity: A measure is valid if it measures
what it is supposed to measure
Four Types of Measurement Validity
- Face validity
- Content validity
- Criterion validity
- Construct validity

Face Validity
The easiest type of validity to achieve and
most basic
It is the judgment by the scientific community
that the indicator really measures the construct

Content Validity
It is a special type of face validity
Whether it captures the entire meaning
Is the full content of the definition
represented?
E.g. Feminism, empowerment

Criterion Validity
The validity of an indicator is verified by
comparing it with another measure of
the same construct in which the researcher has
confidence
Two subtypes:
- Concurrent
- Predictive

Concurrent Criterion Validity


An indicator must be associated with a
preexisting indicator that is judged to be valid,
e.g. intelligence test

Predictive Criterion Validity


Indicators predict future events that are
logically related to a construct
E.g. scores of competitive exams like
SAT or CAT and future performance of
the student

Construct Validity
It is for measures with multiple indicators
Two types
Convergent
Discriminant

Convergent Construct Validity


This applies when multiple indicators
converge or are associated with one another
E.g.- income, type of housing
Educational level, skills in writing or
computation and knowledge or awareness

Discriminant Construct Validity


Also known as divergent validity
If two constructs A and B are very different
then their measures should not be associated
e.g. belief in secularism and strong identity
with religious groups

Other Types of Validity


Internal- It means that there is no error internal
to the design of research project
External- It is the ability to generalize the
findings of a specific setting or group to a
broad setting or group
Statistical- Choice of correct statistical
procedure and meeting its assumptions fully

Relationship between Reliability & Validity
Reliability is necessary for validity but not
sufficient
A measure can be reliable but not valid, e.g. a
weighing scale that gives the same wrong reading every time
Validity and reliability are usually
complementary concepts

Sampling
Introduction
Inferential statistical methods use sample statistics to
make predictions about population parameters.
The quality of inferences depends crucially on how well
the sample represents the population.
To ensure good sample representation, randomization
is essential.
What is randomization?
Randomization is the mechanism for ensuring that the
sample representation is adequate for inferential
methods.

Methods of Sampling
Sampling is often used in day-to-day
practical life, where the purpose is to
determine the characteristics of a population
by observing only a finite subset of
individuals taken from it.
Sampling methods can be classified under
two heads namely,
1. Probability Sampling Methods
2. Non-probability Sampling Methods

Probability Sampling
Methods
1. Simple Random Sampling
2. Systematic Sampling
3. Stratified Sampling
4. Cluster Sampling
5. Multistage Sampling

1. Simple Random Sampling: A simple random
sample of n subjects from a population is one in which
each possible sample of that size has the same
probability of being selected.
How to Select a Simple Random Sample?
Before selecting a random sample, we need a list of
all subjects in the population. This list is called the
sampling frame. The most common method for
selecting a simple random sample from the sampling
frame is the use of a random number table.
A random number table is a table containing a
sequence of numbers that is computer generated
according to a scheme whereby each digit is equally
likely to be any of the integers 0, 1, 2, ..., 9
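As an illustration, a simple random sample can also be drawn with a pseudo-random number generator instead of a printed random number table. The sketch below assumes a hypothetical sampling frame of 200 labelled subjects; the function name and the seed are illustrative choices only.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n subjects without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(sampling_frame, n)

# Hypothetical sampling frame: a list of all 200 subjects in the population.
sampling_frame = [f"subject_{i:03d}" for i in range(1, 201)]

sample = simple_random_sample(sampling_frame, n=20, seed=42)
print(sample[:5])
```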

2. Systematic Sampling: Suppose a sample
of size n is to be selected from a
population of size N, and let k = N/n. A
systematic random sample
1. Selects a subject at random from the first k
names in the sampling frame, and
2. Selects every kth subject listed after that one.
The number k is called the sampling interval.
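The two steps above can be sketched as follows. This is a minimal example assuming a hypothetical frame of 100 numbered subjects; when N is not an exact multiple of n, the sketch simply rounds k down, which is one possible convention.

```python
import random

def systematic_sample(sampling_frame, n, rng):
    """Pick a random start among the first k subjects, then take every kth subject."""
    k = len(sampling_frame) // n      # sampling interval k = N/n (rounded down)
    start = rng.randrange(k)          # random position within the first k names
    return [sampling_frame[start + i * k] for i in range(n)]

# Hypothetical sampling frame of N = 100 subjects, numbered 1 to 100.
frame = list(range(1, 101))
sample = systematic_sample(frame, n=10, rng=random.Random(1))
print(sample)
```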

3. Stratified Sampling: A stratified
random sample divides the population
into separate groups, called strata, and
then selects a simple random sample
from each stratum.
The population is divided into k
homogeneous strata with stratum sizes
$N_1, N_2, \ldots, N_k$ such that
$N_1 + N_2 + \cdots + N_k = N$
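A short sketch of stratified sampling follows. The strata, their sizes and the use of the same sampling fraction in every stratum (proportional allocation) are assumptions made for the example; the notes themselves do not fix how many subjects to take from each stratum.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Take a simple random sample of the given fraction from each stratum."""
    sample = {}
    for name, members in strata.items():
        size = max(1, round(len(members) * fraction))
        sample[name] = rng.sample(members, size)
    return sample

# Hypothetical population of N = 100 split into two strata (N1 = 60, N2 = 40).
strata = {
    "urban": [f"urban_{i}" for i in range(60)],
    "rural": [f"rural_{i}" for i in range(40)],
}

sample = stratified_sample(strata, fraction=0.1, rng=random.Random(7))
for name, chosen in sample.items():
    print(name, len(chosen))
```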

4. Cluster Sampling: Simple, systematic and
stratified random sampling are very expensive or
even impossible to implement in many situations,
particularly when a complete and up-to-date
sampling frame is not available.
In cluster sampling, the population is divided into a
large number of groups, called clusters. A cluster
sample is one for which the sampling units are
the subjects in a random sample of the clusters.
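In code, cluster sampling amounts to sampling the clusters rather than the individuals. The sketch below uses invented village names and members; only a list of clusters is needed up front, not a full sampling frame of individuals.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly select clusters and include every subject in the selected clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return chosen, [subject for name in chosen for subject in clusters[name]]

# Hypothetical clusters: villages and the people living in them.
clusters = {
    "village_A": ["a1", "a2"],
    "village_B": ["b1"],
    "village_C": ["c1", "c2", "c3"],
}

chosen, subjects = cluster_sample(clusters, n_clusters=2, rng=random.Random(0))
print(chosen, subjects)
```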

5. Multistage Sampling: Multi-stage sampling methods
use combinations of various sampling techniques. For
example, to study various characteristics of adult
residents in Maharashtra state, one could treat districts
as clusters and select a random sample of a certain
number of them. Within each district selected, one
could take a cluster sample of villages. Within each
village selected, one could systematically sample every
10th household. Within each household selected, one
could select one adult at random.
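The Maharashtra example can be sketched as nested sampling stages. Everything concrete here is hypothetical: the nested dictionary stands in for real district, village and household lists, and the numbers of districts and villages drawn are arbitrary.

```python
import random

def multistage_sample(districts, n_districts, n_villages, interval, rng):
    """Districts -> villages -> every `interval`-th household -> one adult per household."""
    selected_adults = []
    for district in rng.sample(sorted(districts), n_districts):
        villages = districts[district]
        for village in rng.sample(sorted(villages), n_villages):
            households = villages[village]
            start = rng.randrange(interval)              # random start within the first interval
            for household in households[start::interval]:
                selected_adults.append(rng.choice(household))
    return selected_adults

# Hypothetical population: 5 districts x 4 villages x 30 households x 3 adults.
districts = {
    f"district_{d}": {
        f"village_{d}_{v}": [
            [f"adult_{d}_{v}_{h}_{a}" for a in range(3)] for h in range(30)
        ]
        for v in range(4)
    }
    for d in range(5)
}

adults = multistage_sample(districts, n_districts=2, n_villages=2,
                           interval=10, rng=random.Random(3))
print(len(adults), adults[:3])
```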

Non-Probability Sampling
Methods
Social research is often conducted in situations that do
not allow the kinds of probability sampling discussed
so far for large-scale social surveys. Suppose we want to
study homelessness. A list of all homeless
individuals is neither available nor possible to create. Moreover,
there are times when probability sampling wouldn't be
appropriate. Such situations call for non-probability
sampling.

Methods of Non-Probability
Sampling
1. Purposive or Judgement Sampling;
2. Volunteer Sampling;
3. Snowball Sampling;
4. Quota Sampling; and
5. Selecting Informants

1. Purposive or Judgement Sampling


Sometimes it is appropriate to select a sample
on the basis of knowledge of a population and
the purpose of the study. This type of sampling is
called purposive or judgement sampling.
2. Volunteer Sampling
One of the most common non-probability sampling
methods is volunteer sampling. In this method
subjects volunteer themselves for the sample. A good
example of volunteer sampling is visible almost any day
on television. Some T.V. programmes ask viewers to
offer their opinions on any issue or vote for any
celebrity for his/her performance through SMS or calling
a phone number. The danger in this method is that the
sample will poorly represent the population and may
yield misleading conclusions.

Snowball Sampling
The snowball sampling method is appropriate when the
members of a special population or individuals with a
rare characteristic are difficult to locate, such as
persons in a village who were bitten by a snake. In this
method, the researcher collects data from the few
members of the target population who can be located, then asks those
individuals to provide the information needed to locate
other members of the target population whom they
happen to know.

Quota Sampling
Quota sampling begins with a matrix, or table, describing the
relevant characteristics of the target population. Depending on
your research purposes, you may need to know what proportion
of the population is male and what proportion female as well as
what proportion of each gender fall into various age categories,
educational levels, ethnic groups, and so forth.
Once you have created such a matrix and assigned a relative
proportion to each cell in the matrix, you proceed to collect data
from people having all the characteristics of a given cell. You
then assign to all the people in a given cell a weight that is
appropriate to their share of the total population. When all the sample
elements are so weighted, the overall data should provide a
reasonable representation of the total population.
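One common way to compute such cell weights is to divide each cell's share of the population by its share of the sample. The sketch below assumes this rule; the matrix cells, population shares and sample counts are all invented for illustration.

```python
def cell_weights(population_share, sample_counts):
    """Weight for each cell = population share / sample share."""
    total = sum(sample_counts.values())
    return {
        cell: population_share[cell] / (sample_counts[cell] / total)
        for cell in sample_counts
    }

# Hypothetical quota matrix: gender x age group.
population_share = {
    ("female", "18-39"): 0.30,
    ("female", "40+"): 0.20,
    ("male", "18-39"): 0.28,
    ("male", "40+"): 0.22,
}
# Hypothetical number of people actually interviewed in each cell.
sample_counts = {
    ("female", "18-39"): 40,
    ("female", "40+"): 10,
    ("male", "18-39"): 30,
    ("male", "40+"): 20,
}

weights = cell_weights(population_share, sample_counts)
for cell, weight in weights.items():
    print(cell, round(weight, 2))
```

Cells that are under-represented in the sample receive weights above 1, and over-represented cells receive weights below 1.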

Sampling Error
The sampling error of a statistic is
the error that occurs when a statistic
based on a sample estimates or
predicts the value of a population
parameter.
Other Sources of Variability/Error
1. Under-coverage
2. Response Bias
3. Non-response
4. Missing Data
Sampling Distribution
A sampling distribution is a probability
distribution that determines probabilities of
the possible values of a sample statistic.
Standard Error
The standard deviation of the sampling
distribution of the sample statistic is called
the standard error of the statistic
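For the sample mean, the standard error is usually estimated as the sample standard deviation divided by the square root of the sample size. The sketch below applies that standard formula to a small invented sample; it is an illustration for the mean only, not for other statistics.

```python
from math import sqrt
from statistics import mean, stdev

def standard_error_of_mean(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

# Hypothetical sample of 8 observations.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

se = standard_error_of_mean(sample)
print(f"mean = {mean(sample)}, standard error = {se:.3f}")
```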

Research Methods
Tools & Techniques for
Data Collection

Some Techniques are
Interview
Questionnaire
Focus Groups
Observation

Methods of Interview
Self administered questionnaires
Face-to-face interviews
Telephonic Interview
Through the Internet

Interview
Types of Interviews

Informal

Unstructured

Semi Structured &


Structured

Interviewing
Unstructured Interviewing -
Get people to open up and express
themselves in their own terms at their own
pace.
Excellent for building initial rapport before
moving to more formal interviews
Often no formal written tool is used

Interview
Semi-structured Interviewing -
An Interview Guide is used. This is a written list of
questions and topics that need to be covered in a
particular order.

Structured Interviewing - people are asked to
respond to as nearly identical a set of stimuli as
possible.
An Interview Schedule is used: a written list of
questions, with the question order and
structure followed exactly for each interview.

Different Components
The Interviewer
The Interview Schedule/Interview Guide
The Researched/Respondent

The Skill of Interviewing


Assure people of anonymity and confidentiality
Explain that you simply want to know what
they think, and what their observations are
Encourage them to interrupt you during the
interview with anything they think is important.
Always ask for permission to record personal
interviews and to take notes.

Framing the Questions


Questions must be unambiguous and clear.
The vocabulary or words used must be appropriate to the
respondents.
There must be a clear purpose for every question.
Use of open and closed ended questions
- Open ended questions: respondent gives own answer (used
for qualitative data, sensitive information)
- Closed ended questions: choice of answers, respondent
picks the most appropriate one.
Never use loaded or double-barreled questions.
Always pre-test
See if the questions elicit the information needed to test the
hypothesis or answer research questions

Sequencing the Questions


Ask general questions or about some facts
before personal questions
Get respondents involved in the interview
Intersperse fact-based questions throughout
the interview to avoid long lists of fact-based questions

Sequencing the Questions


Ask questions about the present before
questions about the past or future.
The last questions might be to allow
respondents to provide any other
information they prefer to add and their
impressions of the interview

Skills in Interviewing
Probing
- Silent Probe
- Echo Probe
- Uh-huh Probe
- Tell Me-More Probe
- Long-Question Probe

Carrying the Interview


Ask one question at a time.
Attempt to remain as neutral as possible.
Encourage responses
Be careful about the appearance when note
taking.
Provide transition between major topics
Don't lose control of the interview

After the Interview


Verify if the tape recorder, if used, worked
throughout the interview.
Add any clarifications to your written notes
Write down any observations made during
the interview

Points to Note.
Importance of Language
Pace of the Study
Being Yourself
The little things!!
Using a Tape Recorder (recording equipment
etc)
Taking Notes

Response Effects
Response effects refer to measurable differences in the
interview data that are predictable from the characteristics
of respondents, interviewers and/or the environment.
Age, sex, culture and comfort level of the respondents impact
responses
The Deference Effect
Threatening Questions
Social Desirability Effect
Accuracy of responses (inability to recall, misleading)

Types of Interviews
Face to face
Telephone

Advantages of Face to Face Interviews
Can be used for all types of people - illiterate, bedridden,
old etc.
It is possible to clarify if the person does not understand
the meaning (noting such questions where explanation
was needed)
Use of different techniques is possible - open ended
questions, visual aids, graphs, etc.
Long interviews are better in a face to face situation
The respondent gets only one question at a time, so cannot flip
through to the next page to see what's coming
Possible to observe the body language

Disadvantages of Face to Face Interviews
They are intrusive and reactive
Costly in terms of time and money
Limits the sample size for a single
investigator, as the data collection has to be
finished in a short time (should not exceed
a year)
Training is needed for multiple investigators
and there can be some error

Telephone Interviews
Advantages:
Have the impersonal quality of the questionnaire
Inexpensive, need less time and energy
Can reach everyone who has a phone
Less influence of the interviewer's personality
Disadvantages:
Not useful for people without a telephone connection
The schedule cannot be long
Data can be false if investigators are not monitored
properly

Questionnaire
Include a brief explanation of the purpose of
the questionnaire.
Include a clear explanation of how to
complete the questionnaire.
Include directions about where to return
the completed questionnaire.
Note conditions of confidentiality

Advantages of Self Administered Questionnaires
Can be administered in various ways, including mail, drop and
collect, online administration with collection through email, or
administration to a group of people sitting in a room
A single researcher can collect data from a large sample in a
short time
Relatively cheaper
No interviewer bias
Possible to include questions with a long list of categories/long
battery questions
Possible to ask questions that are sensitive or difficult to answer in a face to face
interview

Disadvantages of Self Administered Questionnaires
No control over how people interpret the
questions
Response rate can be poor
Prone to serious sampling problems
The sequence may not be followed/cannot
avoid flipping through
Not useful for illiterate or visually impaired
population

When to Use What


No method is perfect
On average, the interview method can ensure
82% of fully filled schedules as against 68%
by the questionnaire method
For short schedules and a population with
telephone connections, telephone
interviews are possible

Dillman's Total Design Method to Improve Response
of Mail and Telephone Surveying
Mailed questionnaires must look professional (size and
colour of the paper, font size, layout)
Front and back covers - no questions on either cover;
interesting title, name and return address
Question order - start with questions related to the
topic and end with questions on personal data
Formatting - careful use of font and case, spacing

Dillman's Total Design Method to Improve
Response of Mail and Telephone Surveying
Length should not be more than 10 pages and 125
questions
The covering letter - brief and specific, guarantees
confidentiality
Inducement - some monetary incentives for responding
can also be thought of
Contact and follow up - sending the questionnaire
after prior intimation and following up after mailing
Sending a second cover letter and questionnaire to
non-respondents

Focus Groups

Interview through the group process


Focus groups typically have 6-12 members, plus a
moderator
Discussion Focused around a specific topic or theme
Homogenous group preferred
To the extent possible, participants should not know
one another
Focus groups have to be supportive and non-judgemental

Focus Group Discussion


Develop five to six questions
The major goal of facilitation is collecting useful
information to meet the goal of the meeting.
Carefully word each question before that question is
addressed by the group.
Facilitate discussion around the answers to each
question, one at a time.
After each question is answered, carefully reflect back
a summary of what you heard
Ensure even participation.
Closing the session

Observation
Observation is a data collection strategy involving the
systematic collection and examination of verbal and
non-verbal behaviours as they occur in a variety of
contexts. It includes both human activities and the
physical settings in which such activities take place.
Observation methods are also used to extend or
validate data collected by other data collection
methods.

Observation
Observation also has relevance in research studies
where the respondents are unable to communicate for
a variety of reasons, e.g. they are infants, or
they are adults who may not be able to articulate
complex emotions or certain life situations in an in-depth
manner.
Even in studies with direct interviews, researchers use
observational techniques to note body language and
other gestures to get an insight into the words spoken
by the persons being interviewed.

Observation
The purpose of observational research is to
record group activities, conversation and
interaction as they happen and to ascertain
the meanings of such events to
participants.
Observation may take place either in
laboratory settings designed by the
researcher or in field settings that are the
natural habitat of selected activities.

Types of Observational Research


Three types of observational research:
a) Descriptive observation generates a large
quantum of data as it involves the description of all
details by an observer
b) Focused observation, as the name indicates, entails
looking only at specific pertinent material relevant to
the area of study
c) Selective observation means identifying
specific areas from a more general category.

Types of Observation
Observation-in-person (participant observation and
non-participant observation)
Video recordings.
Structured or Unstructured.
In the majority of research, efforts are usually made to
observe participants in as natural a setting as possible.

Participant Observation
Usually involves field work
It is a strategic method that lets you collect any kind of
data: qualitative as well as quantitative, narrative or
numbers.
It can be in the form of life histories, attending rituals,
talking about sensitive issues. It is about immersing
yourself in a culture or process and documenting it.
Can be ethically problematic if not done properly

Two Different Roles as Participant Observer


Complete Participant - Becoming a member
of the group without disclosing your role as
a researcher
Participant Observer- Can be an insider who
observes and records some aspects of life
around him/her or can be an outsider who
participates in some aspects of life and
records whatever is possible

Advantages
It makes it possible to collect different types
of data
It reduces the problems of reactivity, i.e. people
changing their behaviour because they are being
studied
It helps to formulate sensible questions
Gives an intuitive understanding of a culture

Non participant Observer


Direct observation
Video recording and then analysing

Non Reactive Measurements


Observations of the researcher on a topic of
his/her interest without making the subjects
aware that they are being studied.
The evidence of social behaviour or action is
available naturally
Researcher infers from the evidence or
behaviour without disrupting those being
studied

Varieties of Non Reactive or Unobtrusive Observation
Family portraits of different historical eras - to
study gender relations
Contents of garbage dumps in an urban area - counting
how many bottles of liquor are thrown away
to study the under-reporting of liquor
consumption
Interest in different exhibits in a museum, judged from the
worn floor tiles
Bumper stickers of cars - to study political affiliation

Recording and Documentation


Construct conceptualization and linking it to
non-reactive observations
Deciding upon a system of observation
It can be done from physical traces, archives
and observation of events and behaviour in
natural settings

Content Analysis
Content analysis is a technique for gathering and
analyzing the content of the text
Content refers to words, meanings, pictures, symbols,
ideas, themes, and any message of communication
Text can be written, visual or spoken
The researcher uses objective and systematic counting
and recording procedures to produce a quantitative
description of the content
It is a non-reactive technique
It involves random sampling, precise measurement and
operational definitions for abstract constructs

Topics Appropriate for Content Analysis


Themes of popular songs and religious
symbols and hymns
Trends in newspaper covers and ideological
tone of the editorials
TV coverage of people from different
backgrounds
Gender exploitation in commercials

Units of Analysis in Content Analysis
Word, theme, plot, design, newspaper
article
The coding system can be built around four
characteristics
- Frequency
- Direction
- Intensity
- Space

Data Collection from Secondary Data
Books
Reports
Published compilations
Computerized records
The researcher can search through a collection of information
with a research question and variables in mind
Reassemble the information in new ways to address the
research question

Sources of Data Collection from Secondary Data
Any topic on which information has been
collected and is publicly available can be
selected
Existing statistics can also provide data for
the study, such as govt. records, census
data,
Assembly proceedings, biographical
information

Methods of Data Collection, Data Processing and Codebook

What is Data ?
Data refers to a collection of organized
information, usually the result of experience,
observation or experiment, other information
within a computer system, or a set of premises.
This may consist of numbers, words, or
images, particularly as measurements or
observations of a set of variables.

Data Processing
The Survey data which are collected from field
require certain operations before it can be used
for analysis.
The data processing requirements are to be
specified in an earlier stage of any research
study in terms of time, cost, manpower,
materials, etc.
Processing of collected data is required for
drawing out meaningful results. Data
processing involves the various steps, from
editing of questionnaires to analysis and
report-writing.
There are different stages of data processing:
Editing and Scrutinising
Coding and Data Entry
Validation, Checking & Updating
Analysis


Editing & Scrutinising


The first stage is scrutiny or editing of filled-in
questionnaires.
Editing means checking the schedules for:
Completeness,
Accuracy and
Uniformity

Completeness
By completeness, it is meant that the filled-in
schedule is complete in all respects. The first
point to check is whether there are answers for
every question. If an interviewer forgets to ask
a question or to record an answer, it may be
possible to deduce from other data on the
questionnaire what the answer should have
been and thus fill the gap at the editing stage.

Accuracy
By accuracy, it is meant that the answers are
correctly filled-in. It is not enough to check
that questions are answered; one must try to
check whether the answers are accurate.
Answers needing arithmetic even of the
simplest kind, should be edited carefully.


Uniformity
The editing stage gives every opportunity for
checking that interviewers have interpreted
questions and instructions uniformly.
For example, suppose a question on the occurrence of a
calamity is to be asked as follows:
"Whether any calamity has occurred in your
village during the past two years". Every
investigator should confine the question to the period of
two years only, so that there is uniformity
in the reference period.

Characteristics of Completed Questionnaire


A correctly completed questionnaire or
schedule will have, among others, the
following characteristics :
All answers are recorded in a legible and
comprehensible way.
There are no inconsistencies between answers.
Where information is missing, it should be
filled in (wherever possible) in accordance
with the correct pattern.

In general, you need a thorough knowledge of
the entire questionnaire and the ability to see
which answer to a certain question can be
verified against the answers to other questions.
When you encounter a problem for which
there is no ready solution, you should always
check against other relevant questions in the
schedule or questionnaire.

Coding
Coding is translating answers into numerical values or
assigning numbers to the various categories of a
variable to be used in data analysis.
Coding is generally done while preparing the questions
and interview schedules. Fieldwork is thus done with
pre-coded questions. However, sometimes, when
questions are not pre-coded, coding is done after the
fieldwork.
Coding is done on the basis of the instructions given in
the codebook. The codebook gives a numerical code
for each variable.

Example
If Age = 9 years, then code: 0 9
Do not code in either of the following ways:
- Blank instead of zero
- Wrong box used (the 9 written in the first box)
The last will be read as 9 0, i.e. 90 and not 9.
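When codes are prepared on a computer, the leading zero can be produced by formatting the number to a fixed width. The sketch below assumes a two-digit age field, as in the example; the function name and the width are illustrative.

```python
def code_age(age, width=2):
    """Return the age as a fixed-width code, padded with leading zeros."""
    if not 0 <= age < 10 ** width:
        raise ValueError(f"age {age} does not fit in a {width}-digit field")
    return f"{age:0{width}d}"

print(code_age(9))   # two-character code with a leading zero
print(code_age(90))
```

Rejecting values that do not fit the field avoids silently writing a code into the wrong boxes.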


Types of Questions
Different types of questions should be examined
before coding:
(i) Number (value) Questions.
(ii) Fixed Alternative Questions.
(iii) Semi-Open-ended Questions.
(iv) Open-ended Questions.
(v) Multi-coded Questions.

(i) Number (Value) Questions: These can be coded in
the same way as they were recorded on the
questionnaire at the time of interview.
Example: Age, No. of children, etc.
(ii) Fixed Alternative Questions: Questions of this type
are YES/NO, SEX, Month etc. (the number of
alternatives is decided in advance).
(iii) Semi-Open-ended Questions: These questions have
a fixed number of alternatives plus an `OTHER'
option. Example: Other contraceptive methods, sex
preferences of next child, etc.
(iv) Open-ended Questions: These questions are left
completely open for the interviewers, and no
alternatives are suggested in the questionnaire. The
reason for this may be either of the following:
(a) The alternatives are known, but there are too many to
make it practicable to list them all (Example:
Contraceptive methods).
(b) The possible replies cannot be foreseen, and as a
consequence the answers are taken down verbatim
and later classified into manageable groups (Example:
Occupational Status).
(v) Multi-Coded Questions: Multi-coded
questions belong to the group of `fixed
alternative' questions, as the number of
possible replies is fixed. However, in multi-coded
questions, the answers are not
necessarily mutually exclusive, so that two or
more answers are allowed for the same
respondent. The codes for this are developed
differently from the other types.
For this type of question, a binary system of
codes (1, 2, 4, 8, ...) is used, rather than a consecutive order.
The idea is that the codes of all the categories ticked can be
added together to form one code without any
loss of information, as each `sum' represents a
unique combination of answers.
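The binary coding idea can be sketched as follows. The question and its categories ("sources of information") are hypothetical; the point is that each category gets a power of two, so every sum can be decoded back to exactly one combination of ticked answers.

```python
# Hypothetical multi-coded question: "Which sources of information do you use?"
CATEGORY_CODES = {"radio": 1, "television": 2, "newspaper": 4, "internet": 8}

def encode(ticked):
    """Add the binary codes of all ticked categories into a single code."""
    return sum(CATEGORY_CODES[category] for category in set(ticked))

def decode(code):
    """Recover the ticked categories from the summed code."""
    return [name for name, value in CATEGORY_CODES.items() if code & value]

code = encode(["radio", "newspaper"])
print(code, decode(code))
```

With consecutive codes (1, 2, 3, 4) a sum of 3 could mean either category 3 alone or categories 1 and 2 together; powers of two avoid that ambiguity.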

A detailed coding manual or set of instructions should
be prepared before the coding begins. Since the
editing and coding operations are related, the timing
of the coding depends on that of the editing. In
general, the coding should not begin until there are an
adequate number of edited questionnaires available,
and there is assurance that there will be a continuous
flow of questionnaires. Once the coding starts, there
should not be delays due to the unavailability of
edited questionnaires. There must be adequate office
space so that questionnaires can be checked as they
are returned from the field.
The unedited questionnaires should be kept separate
from the edited ones. Likewise, those that have been
coded should be stored separately from those not yet
coded. Adequate working space should be provided
for each individual coder so that there is no
overcrowding and the work can proceed satisfactorily.
All coders should be given specific training for
sufficient understanding of the job. The most effective
way to train is to ensure that they are given enough
on-the-job practice, followed up with careful
evaluation of the work performed.

Data Validation and Updating


The data validation may be done in different
stages, which can be divided into following
parts:
Data Entry
Editing
Recoding
Tabulation
Archiving and further analysis
Data Entry: The data are entered into a
computer, for example into the SPSS
package.
Editing: The data are checked and corrected
on computer for format and structure errors to
ensure that all and only required data are
present. Also, the data are checked and
corrected for out of range and inconsistent
responses.
Recoding: The edited data are transformed
from the actual responses to a set of variables
convenient for analysis.
Tabulation: The recoded data are tabulated
according to the specifications laid down for
writing reports.
Archiving and further analysis: The different
data files with complete documentation are
organized for further research.
It is very important for any meaningful interpretation
of data that all possible errors and inconsistencies are
corrected before the analysis phase.
Thus cleaning or machine editing of data is an
extremely important function involving both the
researchers and data processors.
Essentially, computer editing is a repetition of the
manual editing and is necessary both because of
human error in the manual operation and to correct
errors introduced during coding and punching.

Machine Editing
After the office editing, a more comprehensive
checking must be carried out by the computer.
Machine editing can be divided into two main stages.
A) Format and structure checks, which involve
checking the following items:
Each part of the identification (e.g. sample area,
household, and line number) contains a valid value.
All sample households are present.
B) Range and Consistency Checks:
All codes are within the ranges specified for them in
the code book.
All skips in the questionnaire have been correctly
executed.
The information recorded is internally consistent.
Dates in the event histories flow in a sequential order
with a specified minimum elapsed time between
events.
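Range and consistency checks of this kind can be sketched as a function that only reports errors and leaves the record unchanged. The variables, allowed codes and skip rule below are hypothetical stand-ins for what a real codebook and questionnaire would specify.

```python
# Hypothetical codebook: allowed codes for each variable.
CODEBOOK_RANGES = {
    "age": range(0, 100),
    "sex": {1, 2},
    "ever_married": {1, 2},  # 1 = yes, 2 = no
}

def check_record(record):
    """Return a list of error messages; the record itself is never modified."""
    errors = []
    # Range check: every code must be among those listed in the codebook.
    for variable, allowed in CODEBOOK_RANGES.items():
        if record.get(variable) not in allowed:
            errors.append(f"{variable}: code {record.get(variable)!r} is out of range")
    # Skip check: never-married respondents should have no age at marriage.
    age_at_marriage = record.get("age_at_marriage")
    if record.get("ever_married") == 2 and age_at_marriage is not None:
        errors.append("age_at_marriage: should have been skipped")
    # Consistency check: age at marriage cannot exceed current age.
    age = record.get("age")
    if age_at_marriage is not None and age is not None and age_at_marriage > age:
        errors.append("age_at_marriage: greater than current age")
    return errors

clean = {"id": 1, "age": 34, "sex": 2, "ever_married": 1, "age_at_marriage": 22}
faulty = {"id": 2, "age": 130, "sex": 3, "ever_married": 2, "age_at_marriage": 25}

for record in (clean, faulty):
    print(record["id"], check_record(record))
```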

The computer is used to locate errors and not
to make corrections.
During the format, structure and consistency
checks, error reports are produced from the
computer.
Correct values are looked up in the original
questionnaires and written into suitable update
forms along with the identification of the
record to be corrected. This work is usually
done by the office editors.
It is, therefore, very important that: (i) the error
reports from the computer are clear and easily
comprehensible to the non-data processing
staff, and (ii) the update forms for writing
down the corrections are simple to fill out. It
should be ensured that careful organization is
done of the way corrections are to be made on
the computer. Questionnaires should be easily
accessible and located on shelves clearly
labelled with the cluster/region to which they
belong.
The editing staff looking up the corrections
must be thoroughly trained on how to interpret
error listings from the computer, how to look
up appropriate corrections and how to fill out
the update forms.
The contents of update forms are key punched
and used to update the computer files. The
whole checking and correction procedure must
be repeated until no more errors are
encountered.

What is a codebook?
A codebook describes and documents the
questions asked or items collected in a survey.
The codebook will describe the subject of the
survey or data collection, the sample and how
it was constructed, and how the data were
coded, entered, and processed.
The questionnaire or survey instrument will be
included along with a description or layout of
how the data file is organized.

Dr. Madhura Nagchoudhuri & Ms. Divya K.


SW 4- Quantitative Research Methodology

Questions & Hypotheses


- Survey questions should be based on hypotheses to be tested.
- Hypotheses expect or assume a relationship between certain factors/variables, where there is an independent and a dependent variable and the dependent variable is affected by the independent variable.
- Hypotheses are key to data analysis as they define what it is you want to find and guide analysis.

Elementary Quantitative Analysis


- Data: the numbers or measurements that are collected from the subjects/respondents.
- Statistic: a number calculated on the sample data that quantifies or describes a characteristic of the sample.
- Descriptive statistics: used to describe the data collected (e.g. measures of central tendency: mean, mode, median; measures of dispersion: range, standard deviation).
- Inferential statistics: used to make inferences about the population from which the sample was drawn through use of various tests (mostly used with ratio and interval level variables).
- Types of analysis: univariate, bivariate and multivariate analysis.

SPSS: What Can It and Can't It Do?

- SPSS is a Windows-based point-and-click program.
- SPSS helps organize and analyze data.
- SPSS can also help present data through graphs, charts etc.
- SPSS does not help make decisions related to analysis.
- SPSS does not interpret analysis.

Purpose of Data Analysis


- The main purpose of data analysis is to understand and make sense of the information/data the researcher has collected.
- Through various types of analysis the researcher understands patterns in the data:
  - How closely clustered are the data? How close are they to a central point? (measures of central tendency)
  - How spread out are the data? (measures of dispersion)
  - How frequently do certain data points occur? (frequency)
- Nature of relationships between variables: are certain variables related or unrelated? If related, do they move in the same direction or in opposite directions?
- Broadly, what do the answers to these questions mean in real terms while answering the research questions of the study?

Data Analysis
- Univariate analysis: tables and statistics that describe one variable at a time (e.g. frequency tables for categorical variables; measures of central tendency and dispersion for scale variables).
- Bivariate analysis: tables and statistics that describe the relationship between two variables.
- Multivariate analysis: tables and statistics that describe the relationship between multiple variables.

Univariate Table
Table 4.3 HELP TAKEN FOR ADMISSION PROCESS

Help in Admission Process                 No. of responses (N=38)
Self                                      18 (47.3%)
Superintendent of the transit hostel      8 (21.5%)
Trustees or volunteers                    3 (7.8%)
House master/sir of previous institute    3 (7.8%)
Hostel seniors                            2 (5.2%)
Friends                                   2 (5.2%)
Don't remember                            2 (5.2%)

Cross Tabulation (Bivariate)
Cross Tabulation between Literacy and Gender of person

Literate * Sex Crosstabulation

Literate                    Sex: female    Sex: male    Total
no       Count              2095           1517         3612
         % within Sex       21.1%          13.3%        16.9%
yes      Count              7847           9874         17721
         % within Sex       78.9%          86.7%        83.1%
Total    Count              9942           11391        21333
         % within Sex       100.0%         100.0%       100.0%
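
The "% within Sex" rows of the crosstabulation are column percentages: each count divided by the total for that sex. A small sketch of that arithmetic, using the counts from the table (the dictionary layout and function name are illustrative choices, not SPSS output):

```python
# Counts from the Literate * Sex crosstabulation in the notes.
counts = {
    "female": {"no": 2095, "yes": 7847},
    "male": {"no": 1517, "yes": 9874},
}


def percent_within(column):
    """Convert the counts of one column into percentages of the column total."""
    total = sum(column.values())
    return {category: round(100 * n / total, 1) for category, n in column.items()}


pct = {sex: percent_within(column) for sex, column in counts.items()}
for sex, values in pct.items():
    print(sex, values)
```

Percentaging within the independent variable (here, sex) is what makes the two groups comparable even though their totals differ.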

Different Ways of Data Presentation


Descriptive presentation
Table
Graph
Map
Essential information- Clear Title, Title for Columns,
Total number of cases and percentages, Key for
different colours or figures for graphical
presentation

Graphical Representation
Different types of Graphs: The primary purpose of
graphical representation is to highlight important
features of the data.
- Line graph- displays trends in data
- Bar chart/graph- used to show nominal or ordinal
level data
- Pie chart- used to show nominal or ordinal data
- Histogram- used to represent distributions of
interval or ratio data.

Line graph-is used to display trends in data

Bar charts: are used to display the distribution of subjects or cases in particular categories. Usually used for nominal or ordinal level data.

[Figure: bar chart of Count (vertical axis, 40 to 100) by SES category (low, middle, high)]

Pie diagram: like the bar diagram, it is another way of displaying the number of subjects or cases within different subsets of categorical data.

[Figure: pie chart of SES categories: high 29.0%, low 23.5%, middle 47.5%]

Measures of Central Tendency


Measures of central tendency are descriptive statistics that indicate the central location of a data distribution. These statistics summarise data by describing the most representative values in the dataset.
- Mean: the average, i.e. the sum of the scores divided by the number of scores.
- Median: the value below which 50% of the scores fall. It is the centre-most score if the number of scores is odd, and the average of the two centre scores if the number of scores is even.
- Mode: the most frequently occurring score.

Measures of Central Tendency: Mean


$\bar{X} = \frac{\sum X_i}{N}$, where $N$ is the total sample size.
- The mean is sensitive to the exact value of every score in the distribution.
- It is very sensitive to extreme scores.
- The mean is the measure of central tendency least subject to sampling variation. If repeated samples were taken from a population, the mean would vary somewhat from sample to sample, but it would vary a lot less than the median or mode. This is why it is used so frequently in inferential statistics.

Measures of Central Tendency: Median


- The median is less sensitive to extreme scores than the mean.
- The median is more subject to sampling variability than the mean but less than the mode.
- If the number of scores is odd, the median is the middle value.
- If the number of scores is even, Median = (centre score 1 + centre score 2) / 2.

Measures of Dispersion/Variability
Measures of dispersion/variability are descriptive statistics that indicate how far apart the scores are spread. These measures quantify the extent of dispersion from a central point. Measures of dispersion include:
- Range: the difference between the highest and the lowest scores in the distribution, i.e. Range = highest score - lowest score.
- Standard deviation: indicates the typical distance of the scores from the mean.
- Variance: the square of the standard deviation, used mostly in inferential statistics.

Measures of Dispersion: Standard Deviation


- Standard deviation gives the measure of dispersion relative to the mean.
- It is sensitive to each score in the distribution: if scores are moved closer to the mean the standard deviation becomes smaller, while if they are moved further from the mean it becomes larger.
- Like the mean, standard deviation is stable with regard to sampling fluctuations.
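
The measures of central tendency and dispersion described above can be computed with Python's standard `statistics` module. The scores below are hypothetical, and the sketch assumes the sample (n - 1) versions of variance and standard deviation, which the notes do not specify.

```python
import statistics

scores = [3, 5, 5, 6, 8, 9]  # hypothetical scores, even number of cases

mean = statistics.mean(scores)          # sum of scores / number of scores
median = statistics.median(scores)      # average of the two centre scores
mode = statistics.mode(scores)          # most frequently occurring score
value_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
sd = statistics.stdev(scores)           # square root of the sample variance

print(mean, median, mode, value_range, round(variance, 2), round(sd, 2))

# Adding one extreme score pulls the mean far more than the median.
with_outlier = scores + [60]
print(round(statistics.mean(with_outlier), 2), statistics.median(with_outlier))
```

The second print illustrates why the median is preferred for distributions with extreme scores, while the mean remains the usual choice for inferential statistics.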

Topics

Univariate, Bivariate Analysis


Descriptive and Inferential Statistics
Measures of Central Tendency
Measures of Dispersion
Measures of Association

Topics
Measures of Association
Correlation
Chi square test
t-test ( for independent samples)
One way ANOVA

Outline of the Session


Coding the questionnaire; identify
levels of measurement; type of
variables.
Frequency distributions
Measures of central tendency and
dispersion.
Parametric and non parametric.
Descriptive and inferential Statistics
31 Aug 2012


Types of Statistics
Broadly, statistics are of two types:
I. Descriptive and II. Inferential
Descriptive statistical procedures summarise large
groups of numbers. They are also called summary
statistics.
Ex: Measures of Central tendency, variance, S.D.,
correlation and so on.
The second category of statistics is called inferential
statistics. Inferential statistics are the statistical
techniques used by researchers to generalize from
characteristics of a small group to a larger group not
measured by the researcher.
Ex: t-test, ANOVA, Chi-Square etc.

Parametric and Non-parametric statistics

Depending on the nature of the sample distribution there are two types of statistics:
a) parametric and
b) non-parametric or distribution-free statistics.

The use of non-parametric statistics in the social sciences was on the increase since behavioural scientists and social work researchers rarely achieve the sort of measurement which permits the meaningful use of parametric tests.

Assumptions regarding parametric and non-parametric statistics

Parametric:
- Test hypotheses based on the assumption that the samples come from populations that are normally distributed.
- Assume homogeneity of variance.
- Parametric statistics are only for interval/ratio levels of measurement, though some use them on ordinal data also.

Non-parametric:
- Test hypotheses that do not specify normality or homogeneity of variance. Some researchers prefer to use these statistics when these two assumptions are violated.
- Non-parametric statistics are used for nominal/ordinal levels of measurement.

Parametric statistics and analogous non-parametric procedures

Parametric                                  Non-Parametric
Pearson's r                                 Spearman's rank correlation (rho)
t-test correlated samples                   Sign test
t-test independent samples                  Mann-Whitney U test
One-way ANOVA                               Kruskal-Wallis one-way ANOVA of ranks
One-way ANOVA with repeated measures        Friedman two-way ANOVA of ranks
(No similar parametric test)                Chi-square (single sample / independent samples)

The assumptions that need to be fulfilled to use:

Parametric statistical test
- The observations must be independent and the sample a random one.
- Observations must be drawn from normally distributed populations.
- At least, the level of measurement must be on an interval scale.
- Homogeneity of variance.

Non-parametric statistical test
- A non-parametric statistical test is a test whose model does not specify conditions about the parameters of the population from which the sample was drawn.
- Most non-parametric tests apply to data in an ordinal scale, and some to data in a nominal scale.
- Simply increasing the size of N increases the efficiency of non-parametric statistics.

Descriptive Stats:
Frequency Distributions
A frequency distribution is a display of the frequency
of occurrence of each value/score. It can be
presented either in a tabular form or as a graph.
Bar charts are suitable for nominal variables and for
interval/ratio variables histograms and frequency
polygons are useful.
Measures of Central tendency- mean, mode and
median.
The measures of variation include range, standard
deviation and variance.

Descriptive Stats:
Frequency Distributions
To obtain a frequency table, measures of central tendency (MCT), and variability:
Analyze > Descriptive Statistics > Frequencies > select variables > Statistics > Continue > Charts/Histograms > OK


Descriptive stats
To explain about the differences between
M,Md,Mdn,SD, range, variance.
To show the calculation of Standard deviation.
Mention briefly if necessary about z scores.
Then go to Cross tabulation and chi-square
Multiple response analysis
Correlation.


Cross tabulation
Cross tabulation helps us explore the relationship between
variables. It goes beyond descriptive
statistics.
The chi-square test, in turn, tells us whether two
variables are related (dependent) or independent.

Chi-Square test
There are different types of Chi-square analysis:
Test for goodness of fit
Applies to the analysis of single categorical
variable and determines if differences in
frequency exist across response categories
compared to the population from which the
sample is drawn.
Test of Independence
Applies to independence or relatedness
between two categorical variables. This is a
very common method used by researchers.

Chi-Square test
Questions addressed by Chi-square test.

1. Whether attitude toward abortion is


dependent on sex of respondent.
2. Whether mental well being is dependent on
sex or age of respondent.
3. Whether access to toilet facility or piped
water is dependent on household wealth.
4. Whether possession of assets is dependent
on sex of respondent.

Chi-Square test
Degrees of freedom = (r-1)(c-1), where r is the number of rows and c the number of columns in the table.
If the calculated value is higher than the table (critical) value, or equivalently if the p-value reported in the output is below the chosen significance level, we conclude that there is a significant association between the two variables.
The significance level usually selected is .05 or lower.
How to report: there is a significant relationship between sex of respondent and the possession of land as assets ($\chi^2$ = 34.21, df = 1, p < 0.001).
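
As a hedged illustration of the test of independence, the sketch below computes the chi-square statistic by hand for the Literate * Sex crosstabulation shown earlier in these notes. The function name is illustrative; the critical value 3.841 is the standard table value for df = 1 at the .05 level, and the closed-form p-value used here holds only for df = 1.

```python
import math

# Observed counts: rows = literate (no, yes), columns = sex (female, male).
observed = [
    [2095, 1517],
    [7847, 9874],
]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence: row total * column total / N
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


chi2, df = chi_square_independence(observed)

CRITICAL_05_DF1 = 3.841  # table value for df = 1 at the .05 level
# For df = 1 only: P(chi-square > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square = {chi2:.2f}, df = {df}")
print("significant at .05" if chi2 > CRITICAL_05_DF1 else "not significant at .05")
```

With a sample this large even modest percentage differences give a very large statistic, so the result should be read alongside the percentages in the crosstabulation rather than on its own.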

Objectives
Explain what is meant by a chi-square goodness of fit test
Conduct a chi-square goodness of fit test
Given a two-way table, compute conditional distributions
Conduct a chi-square test for homogeneity of populations
Conduct a chi-square test for association / independence
Use technology to conduct a chi-square significance test

Chi-Square Distribution
Total area under a chi-square curve is equal to 1
It is not symmetric, it is skewed right
The shape of the chi-square distribution depends on the
degrees of freedom (just like t-distribution)
As the number of degrees of freedom increases, the chi-square
distribution becomes more nearly symmetric
The values of $\chi^2$ are nonnegative; that is, values of $\chi^2$ are
always greater than or equal to zero (0). The curve rises to a
peak and then asymptotically approaches 0

Conditions
All Chi-Square tests (Goodness of Fit, Homogeneity, Independence):
- Independent SRSs (simple random samples)
- All expected counts are greater than or equal to 1 (all $E_i \geq 1$)
- No more than 20% of expected counts are less than 5
Remember that these conditions apply to the expected counts, not the observed counts

Chi-Square Test for Goodness of Fit

Chi-Square Test for Homogeneity


H0: distribution of response variable is the same for all c
populations
Ha: distributions are not the same

z-Test versus $\chi^2$ Test

- We use the $\chi^2$ test to compare any number of proportions
- The results from the $\chi^2$ test for 2 proportions will be the same as a z-test for 2 proportions
- The z-test is recommended to compare two proportions because it gives you a choice of a one-sided test and is related to the confidence interval for $p_1 - p_2$.

Test of Association/Independence

This test assesses whether this observed association is


statistically significant. That is, is the relationship in the sample
sufficiently strong for us to conclude that it is due to a
relationship between the two variables and not merely to chance.

Correlation

A correlation is a measure of relationship between two or more variables.
The variables used should be at interval or ratio level.
The correlation coefficient can range from +1 (perfect positive correlation) to -1 (perfect negative correlation), and 0 represents no correlation.
Some of the common applications of correlation are:
- Do people who smoke tend to have a higher incidence of cancer?
- Is there a relationship between level of literacy and fertility rate of women?
- Does child undernutrition consistently decline as maternal education improves?
- Are educational attainment and age at marriage related?
- How is height related to self-esteem?

The direction of Association

Variable X    Variable Y    Direction of association    r value
Increase      Increase      Positive                    r > 0
Decrease      Decrease      Positive                    r > 0
Increase      Decrease      Negative                    r < 0
Decrease      Increase      Negative                    r < 0
No association                                          r = 0

Correlation
Correlation and causation.
Association between variables doesn't mean causation.
Sometimes variables may be spuriously correlated.
Problems relating to multicollinearity
How to report the result?
r(N = 75) = .78, p < 0.05.

Person    Height (x)    Self Esteem (y)
1         68            4.1
2         71            4.6
3         62            3.8
4         75            4.4
5         58            3.2
6         60            3.1
7         67            3.8
8         68            4.1
9         71            4.3
10        69            3.7
11        68            3.5
12        67            3.2
13        63            3.7
14        62            3.3
15        60            3.4
16        63            4.0
17        65            4.1
18        67            3.8
19        63            3.4
20        61            3.6

Scatterplot of Height and Self Esteem

Pearson's Coefficient of Correlation (r)

Person    Height (x)    Self Esteem (y)    x*y       x*x      y*y
1         68            4.1                278.8     4624     16.81
2         71            4.6                326.6     5041     21.16
3         62            3.8                235.6     3844     14.44
4         75            4.4                330       5625     19.36
5         58            3.2                185.6     3364     10.24
6         60            3.1                186       3600     9.61
7         67            3.8                254.6     4489     14.44
8         68            4.1                278.8     4624     16.81
9         71            4.3                305.3     5041     18.49
10        69            3.7                255.3     4761     13.69
11        68            3.5                238       4624     12.25
12        67            3.2                214.4     4489     10.24
13        63            3.7                233.1     3969     13.69
14        62            3.3                204.6     3844     10.89
15        60            3.4                204       3600     11.56
16        63            4                  252       3969     16
17        65            4.1                266.5     4225     16.81
18        67            3.8                254.6     4489     14.44
19        63            3.4                214.2     3969     11.56
20        61            3.6                219.6     3721     12.96
Sum =     1308          75.1               4937.6    85912    285.45
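
The column sums in the table are exactly the quantities needed by the computational formula for Pearson's r, $r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$. The following sketch recomputes them from the raw height and self-esteem scores; the function name is an illustrative choice.

```python
import math

heights = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
           68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
self_esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
               3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]


def pearson_r(x, y):
    """Pearson's r from the column sums (computational formula)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = pearson_r(heights, self_esteem)
print(round(r, 2))  # 0.73
```

A value of about .73 indicates a fairly strong positive linear association between height and self-esteem in this sample; it says nothing about causation.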

MULTIPLE RESPONSE/CHOICE ANALYSIS

What are Multiple Choice Questions? How to code them?

Could you mention the purpose(s) for which you normally went out during last week?
i. To buy groceries
ii. To visit relatives
iii. To visit friends
iv. To run errands
v. To attend social function
vi. To places of worship
vii. Any other..

- Treat each choice as a separate question
- Assign one column for each choice
- Code Yes and No as 1 and 2
  1. Buy groceries: Yes=1, No=2
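
The coding scheme for multiple response questions (one column per choice, Yes = 1 and No = 2) can be sketched as follows. The four respondents and the three column names are hypothetical; only the coding convention comes from the notes.

```python
YES, NO = 1, 2

# Hypothetical respondents: one column per choice, coded Yes=1 / No=2.
responses = [
    {"groceries": YES, "relatives": NO, "worship": YES},
    {"groceries": YES, "relatives": YES, "worship": NO},
    {"groceries": NO, "relatives": NO, "worship": YES},
    {"groceries": YES, "relatives": NO, "worship": NO},
]


def multiple_response_table(rows):
    """For each choice, return (number of Yes answers, percent of respondents)."""
    n = len(rows)
    table = {}
    for choice in rows[0]:
        count = sum(1 for row in rows if row[choice] == YES)
        table[choice] = (count, round(100 * count / n, 1))
    return table


table = multiple_response_table(responses)
for choice, (count, percent) in table.items():
    print(f"{choice}: {count} ({percent})")
```

Because each respondent can say Yes to several choices, the percentages are based on the number of respondents and can add up to more than 100.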

Distribution of the sample elderly by places/purpose of going out

Place/purpose                                 n (%)*
To attend social functions                    172 (73.2)
To buy groceries                              160 (68.1)
To visit relatives                            157 (66.8)
To places of worship                          146 (62.1)
To visit friends / neighbors                  138 (58.7)
For a stroll                                  114 (48.5)
To run errands                                87 (37.0)
Shopping/visit mall                           71 (30.2)
To hang out in a place in neighborhood        36 (15.3)
To park                                       34 (14.5)
Attend exhibitions and events in the city     22 (9.4)

*Percentages do not add to 100 due to multiple responses.
N = 243

Statistical Inference and


Hypothesis Testing

Statistical Inference
The process of generalization in a prescribed manner
from a sample to its universe is known as
statistical inference.

Population parameters (universe/population)
$\mu$: Population mean
$\sigma$: Population standard deviation

Sample statistics (sample)
$\bar{x}$: Sample mean
$s$: Sample standard deviation

HYPOTHESIS TESTING
Hypothesis testing in inferential statistics involves
making inferences about the nature of the
population on the basis of observations of a sample
drawn from the population.
What is Statistical Hypothesis?
A Hypothesis is a statement/conjecture about one or
more population parameters.

What is null hypothesis?


A null hypothesis (H0) is a hypothesis of no
relationship or no difference.
Steps in hypothesis testing?
1. State the Hypothesis
2. Set the criterion for rejecting H0
3. Compute the test statistic
4. Decide whether to reject H0

Type I and Type II Errors


(Huck et al., 1974)
The researcher normally would state a null hypothesis or an
alternate hypothesis. If a null hypothesis states that there is
no difference, an alternative hypothesis states that there is
a difference.
Ex.: Alternate Hypothesis

Teacher behaviour changes as a function of changes in


student behaviour.

Null Hypothesis

There will be no teacher behaviour changes as a


function of changes in student behaviour.

Selecting a Level of Significance


The level of significance is a probability that defines how rare or unlikely the sample data must be
before the researcher can reject the null hypothesis in favour of the alternate hypothesis. The most
common levels used are .05 and .01.
The .95 level of confidence corresponds to the .05 level of significance.
The decision compares the calculated value from the data with the critical value from the statistical table.
Rejecting or Failing to Reject Null Hypothesis
If the critical value is larger than the calculated value, then the researcher accepts (or fails to
reject) the null hypothesis.
Possible Errors in Hypothesis Testing
There is a possibility that the researcher will make the wrong decision concerning the null
hypothesis, in either accepting or rejecting it.
A Type I Error is rejecting the null hypothesis when it is true (RNT).
A Type II Error is accepting the null hypothesis when it is false (ANF).
One and Two Tailed Tests
A two-tailed test is sensitive to significant differences in either direction (i.e. greater and
less); a one-tailed test is sensitive to differences in only one direction (i.e. greater or less).

1. State the Hypothesis


In inferential statistics, the term hypothesis has a very specific
meaning: a conjecture about one or more population parameters.
The hypothesis to be tested is called the null hypothesis and is
given the symbol $H_0$.
Example: We use a null hypothesis that the mean quantitative SAT
score of the population of XII standard psychology students is 455.
Thus, our null hypothesis, written in symbols, is

$H_0: \mu = 455$   OR   $H_0: \mu - 455 = 0$

where $\mu$ = population mean and 455 = hypothesized value to be tested.

We test the null hypothesis ($H_0$) against the alternative
hypothesis (symbolized $H_1$), which includes the
possible outcomes not covered by the null
hypothesis. For the above example we will use the
alternative hypothesis

$H_1: \mu \neq 455$

The alternative hypothesis, often considered the
research hypothesis, can be supported only by
rejecting the null hypothesis.

2. Set the Criterion for Rejecting H0


After stating the hypothesis, the next step in hypothesis
testing is determining how different the sample statistic
($\bar{X}$) must be from the hypothesized population
parameter ($\mu$) before the null hypothesis can be
rejected. For our example, suppose we randomly select
144 XII standard psychology students from the
population and find the sample mean ($\bar{X}$) to be 535. Is
this sample mean ($\bar{X}$ = 535) sufficiently different from
what we hypothesize for the population mean ($\mu$ = 455)
to warrant rejecting the null hypothesis? Before answering
this question, we need to consider three concepts: (i)
errors in hypothesis testing, (ii) level of significance, and
(iii) region of rejection

i. Errors in hypothesis testing


When we decide to reject or not reject the
null hypothesis, there are four possible
situations:
a. A true hypothesis is rejected.
b. A true hypothesis is not rejected.
c. A false hypothesis is not rejected.
d. A false hypothesis is rejected.

In a specific situation, we may make one of two types of errors, as shown in the table below:

                                  State of nature
Decision made                     Null hypothesis is true    Null hypothesis is false
Reject null hypothesis            Type I error               Correct decision
Do not reject null hypothesis     Correct decision           Type II error

- A Type I error is when we reject a true null hypothesis.
- A Type II error is when we do not reject a false null hypothesis.

ii. Level of significance


To choose the criterion for rejecting $H_0$, the
researcher must first select what is called the level of
significance. The level of significance or alpha ($\alpha$)
level is defined as the probability of making a Type I
error when testing a null hypothesis.
The level of significance is the probability of making a
Type I error: rejecting $H_0$ when it is true.

iii. Region of Rejection


The region of rejection is the area of the sampling
distribution that represents those values of the sample
mean that are improbable if the null hypothesis is true.
The critical values of the test statistic are those values
in the sampling distribution that represent the beginning
of the region of rejection.
When the alternative hypothesis is non-directional, the
region of rejection is located in both tails of the
sampling distribution. The test of the null hypothesis
against this non-directional alternative is called a two-tailed test

[Figure: region of rejection for the sampling distribution of the mean for null hypothesis $H_0: \mu = 455$ and $\sigma_{\bar{X}} = 8.33$]

3. Compute the Test Statistic

In our example
$\mu$ = 455, the hypothesized value for the parameter
$n$ = 144, the size of the sample
$\bar{X}$ = 535, the observed value for the sample statistic
$\sigma$ = 100, the value of the standard deviation in the population
First, using the concept of z scores, we determine how different $\bar{X}$ is from $\mu$, i.e. the
number of standard errors (standard deviation units) the observed sample
value is from the hypothesized value. The standard error is $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 100/\sqrt{144} = 8.33$.
In symbols,

$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$

For this example

$$z = \frac{535 - 455}{8.33} = 9.60$$

Calculating the z score using the above formula is called
computing the test statistic.
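
The computation of the test statistic for the SAT example can be written out as a short sketch. The values are those given in the notes; the variable names are illustrative.

```python
import math

mu_0 = 455         # hypothesized population mean under H0
n = 144            # sample size
sample_mean = 535  # observed sample mean
sigma = 100        # known population standard deviation

# Standard error of the mean: sigma / sqrt(n) = 100 / 12
standard_error = sigma / math.sqrt(n)

# Number of standard errors between the sample mean and the hypothesized mean
z = (sample_mean - mu_0) / standard_error

print(f"standard error = {standard_error:.2f}, z = {z:.2f}")
```

Running the same lines with `sample_mean = 465` gives z = 1.20, the second case discussed in the notes.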

4. Decide about H0
Suppose we had found that the sample mean ($\bar{X}$) for 144
students was not 535, but 465. Our hypotheses, sampling
distribution, and critical values (+1.96 and -1.96) remain
the same, but now the test statistic is

$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{465 - 455}{8.33} = 1.20$$

In other words, the observed sample mean ($\bar{X}$ = 465) is 1.20
standard errors above the hypothesized value of the
population mean.

[Figure: theoretical sampling distribution for the hypothesis $H_0: \mu = 455$, showing the critical values -1.96 and +1.96 and the test statistic z = 1.20 when $\bar{X}$ = 465]

Note that the test statistic does not exceed the critical value; it does not fall
into the region of rejection; and we should not reject the null
hypothesis.

Region of rejection : Directional Alternative Hypothesis

In the SAT example, we tested the null hypothesis against a non-directional alternative:
$H_0: \mu = 455$
$H_1: \mu \neq 455$
This test is called two-tailed or non-directional because the region of rejection was
located in both tails of the sampling distribution of the mean.
Suppose a direction of the results is anticipated. A directional hypothesis states that
a parameter is either greater or less than the hypothesized value.
For instance, in the SAT example we might use the alternative hypothesis that the
mean SAT level of our population is greater than 455. In symbols,
$H_0: \mu = 455$
$H_1: \mu > 455$
An alternative hypothesis can be either non-directional or directional. A directional
alternative hypothesis states that the parameter is greater than or less than the
hypothesized value. A non-directional alternative hypothesis merely states that the
parameter is different from (not equal to) the hypothesized value.

The test of the null hypothesis against a directional alternative is called a
one-tailed test; the region of rejection is located in one of the two tails of
the sampling distribution. The specific tail of the distribution is
determined by the direction of the alternative hypothesis.
Now suppose the alternative hypothesis states that the mean SAT was less
than 455. In symbols, the hypotheses are

$H_0: \mu = 455$
$H_1: \mu < 455$
Here the critical region lies in the left tail of the distribution
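
The difference between two-tailed and one-tailed regions of rejection can be sketched with the standard normal distribution from Python's `statistics` module. The function names and the labels "two-sided", "greater" and "less" are illustrative choices, not notation from the notes.

```python
from statistics import NormalDist


def critical_values(alpha, alternative):
    """Return (lower, upper) critical z values; None marks a tail that is not used."""
    z = NormalDist()  # standard normal distribution
    if alternative == "two-sided":
        upper = z.inv_cdf(1 - alpha / 2)  # alpha is split between both tails
        return -upper, upper
    if alternative == "greater":
        return None, z.inv_cdf(1 - alpha)  # all of alpha in the right tail
    if alternative == "less":
        return z.inv_cdf(alpha), None      # all of alpha in the left tail
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


def reject_h0(z_stat, alpha=0.05, alternative="two-sided"):
    """True if the test statistic falls in the region of rejection."""
    lower, upper = critical_values(alpha, alternative)
    in_left_tail = lower is not None and z_stat < lower
    in_right_tail = upper is not None and z_stat > upper
    return in_left_tail or in_right_tail


print(critical_values(0.05, "two-sided"))  # approximately (-1.96, 1.96)
print(reject_h0(9.60))                     # sample mean 535
print(reject_h0(1.20))                     # sample mean 465
print(reject_h0(1.80, alternative="greater"))
```

A statistic such as z = 1.80 falls short of the two-tailed critical value of 1.96 but exceeds the one-tailed value of about 1.645, which is why the direction of the alternative hypothesis should be fixed before the data are examined.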

Hypothesis Testing when $\sigma^2$ is Unknown

For testing the hypothesis about a population mean when
$\sigma$ is not known, we estimate the standard deviation of the
population ($\sigma$) by using the standard deviation of the
sample ($s$). The estimated standard error of the sampling
distribution of the sample mean ($s_{\bar{X}}$) is then given by

$$s_{\bar{X}} = \frac{s}{\sqrt{n}}$$

Student's t Distributions
Does the adjustment of using $s$ to estimate $\sigma$ have an effect on the statistical test?
Actually, it does, especially for small samples. The effect is that the normal
distribution is inappropriate as the sampling distribution of the mean. At the
beginning of the 20th century William S. Gosset found that, for small samples,
sampling distributions departed substantially from the normal distribution and that,
as sample sizes changed, the distributions changed. This gave rise to not one
distribution but a family of distributions.
The t distributions are a family of symmetrical, bell-shaped distributions that
change as the sample size changes.
Degrees of Freedom: The number of degrees of freedom is a mathematical
concept defined as the number of observations less the number of restrictions
placed on them.

[Figure: Student's t distributions for 1, 2, 5, 10, and $\infty$ degrees of freedom]

Computation of Test Statistic

When the variance in the population is known and
the normal distribution is used as the sampling
distribution, the test statistic is defined as

$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$

However, when the variance of the sample is used
as an estimate of the population variance, the test
statistic is defined as t:

$$t = \frac{\bar{X} - \mu}{s_{\bar{X}}}, \quad \text{where } s_{\bar{X}} = \frac{s}{\sqrt{n}}$$

In general,

$$\text{Test statistic} = \frac{\text{Statistic} - \text{Parameter}}{\text{Standard error of the statistic}}$$

This test statistic is then compared to the critical value. If the test statistic exceeds the critical
value in absolute value, then the null hypothesis is rejected
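
A minimal sketch of the t statistic when the population variance is unknown. The nine scores are hypothetical; the hypothesized mean of 455 is borrowed from the SAT example, and the critical value 2.306 is the standard two-tailed table value for df = 8 at the .05 level.

```python
import math
import statistics

scores = [460, 470, 455, 480, 465, 450, 475, 485, 470]  # hypothetical sample
mu_0 = 455  # hypothesized population mean (from the SAT example)

n = len(scores)
sample_mean = statistics.mean(scores)
s = statistics.stdev(scores)              # sample standard deviation (n - 1)
estimated_se = s / math.sqrt(n)           # estimated standard error of the mean
t = (sample_mean - mu_0) / estimated_se   # (statistic - parameter) / standard error
df = n - 1                                # one restriction: the sample mean

T_CRITICAL_05_DF8 = 2.306  # two-tailed table value for df = 8, alpha = .05
reject = abs(t) > T_CRITICAL_05_DF8

print(f"t({df}) = {t:.2f}, reject H0: {reject}")
```

Python's standard library has no t distribution, so the critical value is taken from a printed table here; statistical packages such as SPSS report the exact p-value instead.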

Point Estimates and Interval Estimates


A point estimate is a single value that represents the
best estimate of the population value. If we are
estimating the mean of a population ($\mu$), then the
sample mean ($\bar{X}$) is the best point estimate.
Interval estimation builds on point estimation to
arrive at a range of values that are tenable for the
parameter and that define an interval we are
confident contains the parameter.

Confidence Interval
When $\sigma^2$ is Known
$CI = \bar{X} \pm (Z_{CV})(\sigma_{\bar{X}})$
Where
$\bar{X}$ = Sample mean
$Z_{CV}$ = Critical value using the normal distribution and
$\sigma_{\bar{X}}$ = Standard error of the mean

Confidence Interval
When $\sigma^2$ is Unknown
$CI = \bar{X} \pm (t_{CV})(s_{\bar{X}})$
Where
$\bar{X}$ = Sample mean
$t_{CV}$ = Critical value using the appropriate t distribution and
$s_{\bar{X}}$ = Estimated standard error of the mean from the sample
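
As an illustration of the known-variance case, the sketch below builds a 95% confidence interval from the SAT example values used earlier in these notes (sample mean 535, $\sigma$ = 100, n = 144). Applying the interval formula to that example is an assumption made here for illustration; the notes themselves do not work it through.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    # Critical z value that leaves (1 - confidence) / 2 in each tail
    z_cv = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sigma / math.sqrt(n)
    margin = z_cv * standard_error
    return sample_mean - margin, sample_mean + margin


low, high = z_confidence_interval(sample_mean=535, sigma=100, n=144)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

The hypothesized value of 455 lies far outside this interval, which agrees with the decision to reject the null hypothesis in the two-tailed test at the .05 level.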

Introduction to Linear Regression and


Correlation Analysis

Goals
After this, you should be able to:

Calculate and interpret the simple correlation


between two variables
Determine whether the correlation is significant
Calculate and interpret the simple linear regression
equation for a set of data
Understand the assumptions behind regression
analysis
Determine whether a regression model is
significant

Goals
(continued)

After this, you should be able to:


Calculate and interpret confidence intervals
for the regression coefficients
Recognize regression analysis applications
for purposes of prediction and description
Recognize some potential problems if
regression analysis is used incorrectly
Recognize nonlinear relationships between
two variables

Scatter Plots and Correlation


A scatter plot (or scatter diagram) is used to
show the relationship between two variables
Correlation analysis is used to measure
strength of the association (linear
relationship) between two variables
Only concerned with strength of the
relationship
No causal effect is implied

Scatter Plot Examples

[Figures: scatter plots of y against x illustrating linear relationships, curvilinear relationships, strong relationships, weak relationships, and no relationship]

Correlation Coefficient
(continued)

- The population correlation coefficient $\rho$ (rho) measures the strength of the association between the variables
- The sample correlation coefficient $r$ is an estimate of $\rho$ and is used to measure the strength of the linear relationship in the sample observations

Features of $\rho$ and $r$
- Unit free
- Range between -1 and 1
- The closer to -1, the stronger the negative linear relationship
- The closer to 1, the stronger the positive linear relationship
- The closer to 0, the weaker the linear relationship

Examples of Approximate r Values

[Figure: scatter plots of y against x with r = -1, r = -.6, r = 0, r = +.3, and r = +1]

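Calculating the Correlation Coefficient
Sample correlation coefficient:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}}$$

or the algebraic equivalent:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}$$

where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable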

Calculation Example

Tree Height (y)    Trunk Diameter (x)    xy          y2           x2
35                 8                     280         1225         64
49                 9                     441         2401         81
27                 7                     189         729          49
33                 6                     198         1089         36
60                 13                    780         3600         169
21                 7                     147         441          49
45                 11                    495         2025         121
51                 12                    612         2601         144
Sum = 321          Sum = 73              Sum = 3142  Sum = 14111  Sum = 713

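Calculation Example
(continued)

[Figure: scatter plot of Tree Height, y (0 to 70) against Trunk Diameter, x (0 to 14)]

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}} = \frac{8(3142) - (73)(321)}{\sqrt{\left[8(713) - (73)^2\right]\left[8(14111) - (321)^2\right]}} = 0.886$$

r = 0.886: relatively strong positive
linear association between x and y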
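
The two formulas for r (the deviation form and its algebraic equivalent) should give the same value. The sketch below checks this on the tree data. The single-digit trunk diameters (8, 9, 7, 6, 7) are inferred here from the xy and x2 columns of the table, and the function names are illustrative.

```python
import math

trunk_diameter = [8, 9, 7, 6, 13, 7, 11, 12]    # x
tree_height = [35, 49, 27, 33, 60, 21, 45, 51]  # y


def r_deviation_form(x, y):
    """r from deviations about the means."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def r_computational_form(x, y):
    """r from the raw sums (the algebraic equivalent)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    return (n * sum_xy - sum_x * sum_y) / math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )


r_dev = r_deviation_form(trunk_diameter, tree_height)
r_comp = r_computational_form(trunk_diameter, tree_height)
print(round(r_dev, 3), round(r_comp, 3))  # both 0.886
```

The computational form is convenient for hand calculation from a table of sums, while the deviation form shows more directly what r measures.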

Introduction to Regression Analysis


Regression analysis is used to:
Predict the value of a dependent variable
based on the value of at least one
independent variable
Explain the impact of changes in an
independent variable on the dependent
variable

Dependent variable: the variable we wish to


explain
Independent variable: the variable used to
explain the dependent variable

Simple Linear Regression Model


Only one independent variable, x
Relationship between x and y is
described by a linear function
Changes in y are assumed to be
caused by changes in x

Types of Regression Models
[Figure: four scatter plots showing a positive linear relationship, a negative linear relationship, a relationship that is NOT linear, and no relationship]

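Population Linear Regression

The population regression model:

y = \beta_0 + \beta_1 x + \varepsilon

where:
y = Dependent variable
\beta_0 = Population y intercept
\beta_1 = Population slope coefficient
x = Independent variable
\varepsilon = Random error term, or residual

\beta_0 + \beta_1 x is the linear component; \varepsilon is the random error component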

Linear Regression Assumptions


Error values (ε) are statistically independent
Error values are normally distributed for any
given value of x
The probability distribution of the errors is
normal
The probability distribution of the errors has
constant variance
The underlying relationship between the x
variable and the y variable is linear

Population Linear Regression (continued)
[Figure: the line y = β0 + β1x + ε plotted against x, with intercept β0 and slope β1. For a given xi, the random error εi is the vertical distance between the observed value of y and the predicted value of y on the line.]

Estimated Regression Model

The sample regression line provides an estimate of
the population regression line

\hat{y}_i = b_0 + b_1 x

where:
\hat{y}_i = Estimated (or predicted) y value
b_0 = Estimate of the regression intercept
b_1 = Estimate of the regression slope
x = Independent variable

The individual random error terms e_i have a mean of zero

Introduction
o Correlation
  - the strength of the linear relationship between two variables
o Regression analysis
  - determines the nature of the relationship
o Is there a relationship between the number of units of alcohol consumed and the likelihood of developing cirrhosis of the liver?

Pearson's coefficient of correlation (r)
o Measures the strength of the linear relationship between one dependent and one independent variable
  - curvilinear relationships need other techniques
o Values lie between +1 and -1
  - perfect positive correlation r = +1
  - perfect negative correlation r = -1
  - no linear relationship r = 0

Pearson's coefficient of correlation
[Figure: scatter plots illustrating r = +1, r = -1, r = 0 and r = 0.6]

Scatter plot
[Figure: scatter plot of BMD (dependent variable, the one we make inferences about) against calcium intake (independent variable)]

Non-Normal data

Normalised

SPSS output: scatter plot

SPSS output: correlations

Interpreting correlation
o Large r does not necessarily imply:
  - strong correlation
    - r must be judged against the sample size: a large r from a small sample can arise by chance
  - cause and effect
    - e.g. a strong correlation between the number of televisions sold and the number of cases of paranoid schizophrenia
    - this does not mean that watching TV causes paranoid schizophrenia
    - may be due to indirect relationship

Interpreting correlation
o Variation in dependent variable due to:
  - relationship with independent variable: r²
  - random factors: 1 - r²
o r² is the Coefficient of Determination
  - e.g. r = 0.661
  - r² = 0.661² = 0.44
  - less than half of the variation in the dependent variable is due to the independent variable

Agreement
o Correlation should never be used to determine the level of agreement between repeated measures:
  - measuring devices
  - users
  - techniques
o It measures the degree of linear relationship
  - You can have high correlation with poor agreement
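
A small illustration of high correlation with poor agreement. This is a sketch with made-up readings: the two devices, their values and the constant 10-unit offset are hypothetical and are not taken from the notes.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample correlation coefficient from deviations about the means."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical readings for the same five subjects on two devices.
# Device B always reads 10 units higher than device A.
device_a = [50.0, 55.0, 60.0, 65.0, 70.0]
device_b = [a + 10.0 for a in device_a]

r = pearson_r(device_a, device_b)
mean_difference = sum(b - a for a, b in zip(device_a, device_b)) / len(device_a)

print(r)                # perfect linear relationship
print(mean_difference)  # yet the devices never agree: B is 10 units higher
```

The correlation is perfect because the points lie exactly on a straight line, but no pair of readings agrees. Agreement has to be judged from the differences between the paired measurements, not from r.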

Non-parametric correlation
o Make no assumptions about the distribution of the data
o Carried out on ranks
o Spearman's ρ
  - easy to calculate
o Kendall's τ
  - has some advantages over ρ
  - distribution has better statistical properties
  - easier to identify concordant / discordant pairs
o Usually both lead to same conclusions
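
"Carried out on ranks" can be made concrete: Spearman's coefficient is Pearson's r computed on the ranks of the data. The sketch below is ours, not from the notes. It reuses the tree example, with the single-digit diameters inferred from that table, and gives tied values the average of their ranks, which is the usual convention.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


diameter = [8, 9, 7, 6, 13, 7, 11, 12]
height = [35, 49, 27, 33, 60, 21, 45, 51]
rho = spearman_rho(diameter, height)
print(round(rho, 2))
```

For these data the rank-based coefficient is close to the Pearson value of 0.886, in line with the remark that the parametric and non-parametric approaches usually lead to the same conclusion.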

Role of regression
o Shows how one variable changes with another
o By determining the line of best fit
  - linear
  - curvilinear

Line of best fit
o Simplest case linear
o Line of best fit between:
  - dependent variable Y: BMD
  - independent variable X: dietary intake of calcium

Y = a + bX
a = value of Y when X = 0
b = change in Y when X increases by 1

Role of regression
o Used to predict
  - the value of the dependent variable
  - when value of independent variable(s) known
  - within the range of the known data
    - extrapolation risky!
    - relation between age and bone age
o Does not imply causality

SPSS output: regression


Multiple regression
o More than one independent variable
o BMD dependent on:
  - age
  - gender
  - calorific intake
  - use of bisphosphonates
  - exercise
  - etc

Logistic regression
o The dependent variable is binary
  - yes / no
  - predict whether a patient with Type 1 diabetes will undergo limb amputation given history of prior ulcer, time diabetic etc
    - result is a probability
o Can be extended to more than two categories
  - Outcome after treatment
    - recovered, in remission, died

Summary
o Correlation
  - strength of linear relationship between two variables
  - Pearson's: parametric
  - Spearman's / Kendall's: non-parametric
  - Interpret with care!
o Regression
  - line of best fit
  - prediction
  - multiple regression
  - logistic

Statistics for Health Research

Regression:
Checking the Model
Peter T. Donnan
Professor of Epidemiology and Biostatistics

Objectives of session
Recognise the need to check fit of
the model
Carry out checks of assumptions in
SPSS for simple linear regression
Understand predictive model
Understand residuals

How is the fitted line obtained?
Use method of least squares (LS)
Seek to minimise squared vertical
differences between each point and
fitted line
Results in parameter estimates or
regression coefficients of slope (b)
and intercept (a): y = a + bx
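
For simple linear regression the least squares estimates have a closed form: b = Sxy / Sxx and a = mean(y) - b * mean(x). The sketch below applies those standard formulas, which are not quoted from the slides, to a small made-up data set. The function name and the data are hypothetical.

```python
def least_squares(x, y):
    """Return (a, b) minimising the sum of squared vertical distances
    between the points (x, y) and the line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b


# Hypothetical data lying roughly on the line y = 2x.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares(x, y)
print(round(a, 2), round(b, 2))  # 0.05 1.99
```

Statistical packages such as SPSS report the same two quantities as the unstandardized coefficients for the constant and the predictor.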

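Consider fitted line of y = a + bx
[Figure: fitted line with the dependent variable (y) on the vertical axis, the explanatory variable (x) on the horizontal axis, and the intercept a marked on the y axis]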

Consider the regression of minimum LDL cholesterol achieved on age
Select Regression, then Linear
Dependent (y) - Min LDL achieved
Independent (x) - Age_Base

Output from SPSS linear regression

Coefficients(a)
                     Unstandardized Coefficients   Standardized Coefficients
Model 1              B         Std. Error          Beta       t         Sig.
(Constant)           2.024     .105                           19.340    .000
Age at baseline      -.008     .002                -.121      -4.546    .000

a. Dependent Variable: Min LDL achieved

N.B. 0.008 may look very small but represents:
The DECREASE in LDL achieved for each
increase in one unit of age i.e. ONE year

Output from SPSS linear regression

H0: slope b = 0
Test t = slope/se = -4.546 (SPSS divides the unrounded estimates; the rounded values -0.008/0.002 give about -4), with
p<0.001, so statistically significant
Predicted LDL = 2.024 - 0.008 x Age

Prediction Equation from linear regression
Predicted LDL achieved = 2.024 - 0.008 x Age
So for a man aged 65 the predicted LDL
achieved = 2.024 - 0.008 x 65 = 1.504

Age    Predicted Min LDL
45     1.664
55     1.584
65     1.504
75     1.424
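
The table of predictions can be reproduced directly from the fitted equation. A minimal sketch using the rounded coefficients from the SPSS output; the function name is ours.

```python
INTERCEPT = 2.024  # (Constant) from the SPSS coefficients table
SLOPE = -0.008     # Age at baseline coefficient (rounded)


def predicted_min_ldl(age):
    """Predicted minimum LDL achieved for a given age in years."""
    return INTERCEPT + SLOPE * age


for age in (45, 55, 65, 75):
    print(age, round(predicted_min_ldl(age), 3))
```

Because the slope is rounded to three decimals, predictions from this script can differ slightly from those SPSS computes with the unrounded coefficient.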

Assumptions of Regression
1. Relationship is linear
2. Outcome variable and hence
residuals or error terms are approx.
Normally distributed

Use Graphs and Scatterplot to obtain the Lowess line of fit
1. Create Scatterplot and then double-click to enter chart editor
2. Choose the icon "Add fit line at total"
3. Then select type of fit such as Lowess

Linear assumption: Fitted lowess smoothed line

Lowess smoothed line (red) gives a good eyeball examination of linear assumption (green)

Definition of a residual
A residual is the difference between
the predicted value (fitted line) and the
actual value or unexplained variation

r_i = y_i - E(y_i)

Or

r_i = y_i - (a + b x_i)
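
The definition can be applied row by row. This sketch assumes the fitted line Predicted LDL = 2.024 - 0.008 x Age from the SPSS output in these notes. The three patients and their observed values are made up for illustration.

```python
A = 2.024   # intercept from the fitted model
B = -0.008  # slope from the fitted model (rounded)


def residual(age, observed_ldl):
    """Observed value minus the value predicted by the fitted line."""
    predicted = A + B * age
    return observed_ldl - predicted


# Hypothetical patients: (age in years, observed minimum LDL).
patients = [(45, 1.90), (65, 2.00), (75, 1.10)]

residuals = [residual(age, ldl) for age, ldl in patients]
for (age, ldl), r_i in zip(patients, residuals):
    print(age, ldl, round(r_i, 3))
```

A positive residual means the observed value lies above the fitted line, and a negative residual means it lies below.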

Residuals

To assess the residuals in SPSS linear regression, select Plots...
o Normalised or standardised predicted value of LDL
o Normalised residual
o Select histogram of residuals and normal probability plot

In SPSS linear regression, select Statistics...
o Model fit
o Select confidence intervals for regression coefficients
o Select Durbin-Watson for serial correlation and identification of outliers

Output:
Scatterplot of residuals vs. predicted
Note
1) Mean of residuals = 0
2) Most of data lie within + or - 3 SDs of mean

Assumptions of Regression
1. Relationship is linear
2. Outcome variable and hence
residuals or error terms are approx.
Normally distributed

Output:
Histogram of standardised residuals
Plot of residuals with normal curve superimposed

Output:
Cumulative probability plot
Look for deviation from diagonal line to indicate non-normality

Output:
Description of residuals

Descriptive statistics for residuals

Residuals Statistics(a)
                       Minimum     Maximum      Mean        Std. Deviation   N
Predicted Value        1.314867    1.843205     1.556478    .0878548         1383
Residual               -1.65389    4.0658469    .0000000    .7181448         1383
Std. Predicted Value   -2.750      3.264        .000        1.000            1383
Std. Residual          -2.302      5.660        .000        1.000            1383

a. Dependent Variable: Min LDL achieved

Subjects with standardised residuals > 3. Worth investigation?

Casewise Diagnostics(a)
Case Number   Std. Residual   Min LDL achieved   Predicted Value   Residual
164           5.660           5.5840             1.518153          4.0658471
209           4.395           4.5260             1.368685          3.1573148
250           3.143           3.7875             1.529325          2.2581750
268           3.064           3.8730             1.671664          2.2013357
274           3.227           4.0953             1.777153          2.3180975
362           4.095           4.5350             1.593460          2.9415398
517           3.636           4.3240             1.711788          2.6122125
849           3.968           4.3290             1.478113          2.8508873
1047          4.207           4.4360             1.413686          3.0223141
1075          3.885           4.4040             1.613219          2.7907805
1103          3.519           3.9905             1.462584          2.5279157
1229          3.016           3.7660             1.599254          2.1667456
1290          3.975           4.2345             1.379107          2.8553933

a. Dependent Variable: Min LDL achieved
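
The standardised residuals in this output appear to be the raw residuals divided by the standard error of the estimate (.7184048 in the Model Summary). That reading is an inference from the numbers, not something the notes state. The sketch below checks it on three of the listed cases plus the minimum residual, and flags values beyond 3 in absolute size.

```python
STD_ERROR_OF_ESTIMATE = 0.7184048  # from the Model Summary table

# Raw residuals copied from the SPSS output. The "min" entry is the
# smallest residual reported in Residuals Statistics.
raw_residuals = {
    "164": 4.0658471,
    "209": 3.1573148,
    "250": 2.2581750,
    "min": -1.65389,
}

standardised = {
    case: value / STD_ERROR_OF_ESTIMATE for case, value in raw_residuals.items()
}

# Flag cases lying more than 3 standard deviations from zero.
flagged = [case for case, z in standardised.items() if abs(z) > 3]

for case, z in standardised.items():
    print(case, round(z, 3))
print("worth investigating:", flagged)
```

The computed values match the Std. Residual column to three decimals (5.660, 4.395, 3.143 and -2.302), which supports the assumption about how the standardisation is done.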

Output:
Model fit and serial correlation

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .121a   .015       .014                .7184048                     2.034

a. Predictors: (Constant), Age at baseline

R - correlation between min LDL achieved and Age at baseline, here 0.121
R² - % variation explained, here 1.5%, not particularly high
Durbin-Watson test - serial correlation of residuals; should be approximately 2 if no serial correlation
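
The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below uses that standard definition, which is not spelled out in the slides, on two made-up residual sequences.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in their observed order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residuals that stay on one side for several observations
# (positive serial correlation): the statistic falls well below 2.
runs = [1, 1, 1, -1, -1, -1]

# Hypothetical residuals that flip sign every observation
# (negative serial correlation): the statistic rises above 2.
alternating = [1, -1, 1, -1]

print(durbin_watson(runs))
print(durbin_watson(alternating))
```

The statistic ranges from 0 to 4. Values near 2, such as the 2.034 reported above, suggest little serial correlation in the residuals.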

Summary
After fitting any regression model check assumptions
Functional form: linearity is default, often not best fit, consider quadratic
Check Residuals for approx. normality
Check Residuals for outliers (> 3 SDs)
All accomplished within SPSS

Practical on Model Checking

Read in LDL Data.sav
1) Fit age squared term in min LDL model and
check fit of model compared to linear fit
(Hint: Use transform/compute to create age
squared term and fit age and age²)
2) Fit separate linear regressions with min
Chol achieved with predictors of (a) baseline
Chol (b) APOE_lin (c) adherence
Check assumptions and interpret results

What is ANOVA?

A statistical method for testing whether the means of a dependent variable are
equal across two or more groups (i.e., whether any differences in means across
several groups could be due solely to sampling error).
Variables in ANOVA (Analysis of Variance):
Dependent variable is metric.
Independent variable(s) is nominal with two or more levels, also called
treatment, manipulation, or factor.
One-way ANOVA: only one independent variable with two or more levels.
Two-way ANOVA: two independent variables each with two or more levels.
With ANOVA, a single metric dependent variable is tested as the outcome of a
treatment or manipulation.
With MANOVA (Multivariate Analysis of Variance), two or more metric dependent
variables are tested as the outcome of a treatment(s).

How Do We State The Null and Alternative Hypotheses?

H0: The means for all groups are the same (equal).
Ha: The means are different for at least one pair
of groups.
H_0: \mu_1 = \mu_2 = \dots = \mu_k
H_a: \mu_i \neq \mu_j for at least one pair of groups i, j

How do you determine which means are significantly different?

The F-statistic assesses whether you can conclude that
statistical differences are present somewhere between
the group means.
But to identify where the differences are you must use
follow-up tests called multiple comparison tests.
Many multiple comparison tests are available in SPSS.
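
For a one-way ANOVA the F-statistic is the between-group mean square divided by the within-group mean square. The sketch below uses that standard definition, which the slides do not spell out, on three small made-up groups. It does not compute the p-value, which would need the F distribution.

```python
def one_way_anova_f(groups):
    """F-statistic for a one-way ANOVA on a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        sum((v - m) ** 2 for v in g) for g, m in zip(groups, group_means)
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical scores for three treatment groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_statistic = one_way_anova_f(groups)
print(f_statistic)
```

A large F says only that at least one group mean differs. It does not say which, and that is why the multiple comparison tests mentioned above are needed.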

Writing the Research Report


The purpose of the written report is to
present the results of your research,
but more importantly to provide a
persuasive argument to readers of
what you have found.

Components of an Empirical
Research Paper in Economics

Title
Abstract
Table of Contents
Introduction and Literature Survey
Theoretical Analysis
Empirical Testing
Conclusions
References

Introduction
The purpose of the introduction to the
research report is to provide the rationale for
the research. This rationale should address
four issues:
What is the nature of the issue or problem the
research investigates?
Why is this worthy of investigation?

Introduction
What have previous researchers discovered
about this issue or problem?
What does your research attempt to prove?

The Written Literature Review


A literature review is a summary of the major
studies that have been published on a
research topic. Literature review is usually
included as part of the introduction in
research papers.

The Written Literature Review


The literature review should accomplish three goals:
o It should identify the major findings on a topic up to the
present;
o It should point out the principal deficiencies of these studies
or provide a sense of what is lacking in the literature; and
o It should conclude by leading into your research question, by
explaining how your research proposes to contribute to the
literature or address some shortcoming of a previous study.

The Most Frequently Asked


Question!
Students frequently ask how many sources
should be included in the literature survey.
What do you think the answer should be?

The Answer
It depends on how many major studies have been
completed on the topic.
If you only report one or two sources, readers may
suspect that you have not put enough effort into
searching the literature. You don't want to miss a
major study, since at best it will make you look
careless and at worst it may weaken the rationale for
your research.

What a Literature Survey is NOT


A list of potential sources of information
about your topic;
A list of sources that you reviewed, or even
A list of summaries of the sources you
reviewed.

Theoretical Analysis
The purpose of this part of research is to
present the theoretical analysis of the issue or
problem you are investigating. This is also
described as presenting your theoretical
model.

Empirical Testing of the Analysis


The purpose of the empirical testing part of
the research report is to provide the empirical
evidence for your research argument. The
theme of this section of the paper can be
summarized as: Given your hypothesis, how
did you test it and what were your findings?

Empirical Testing of the Analysis


This section should include:
The data used;
The empirical model and type of statistical
analysis you employed;
The results you hypothesized;
The actual results; and
Your interpretation of the results.

Conclusions
The purpose of this part of the research report is to
summarize your findings, that is, to restate your
argument and conclude whether or not it is valid. In
light of the statistical results, what can you infer
about your hypothesis? To what extent did your
empirical testing confirm your analysis?

Writing a Research Report


If research was not written up, did
it really occur?

Writing a Research Report


Academic sociologists conduct research to discover
facts, truths, and explanations about the social
world.
They write research reports to convey their own and
others' research findings.
Types of Research:
Library research refers to gathering information that
others have generated.
Primary research refers to generating information
through data collection, analysis, and reporting
findings.

Writing a Research Report


Sociologists' articles, papers, or research reports come in
different forms:
Literature Review: Library research that organizes facts and/or
theories others in the sociological community generated (Rarely
published)
Research Article or Book: One's own findings generated by a
primary research project that builds on previous research by the
sociological community. (Findings from basic research, most
common.)
Applied Research Report: One's findings from a primary research
project that evaluates a program without drawing much from
previous sociological research. (Findings from applied research,
rarely published.)
This class focuses on writing Research Articles.

Writing a Research Report


A sociological article, paper, or report generally
covers only one important topic of interest and
conveys evidence and interpretations of evidence.
Research reports are NOT creative writing, opinion
pieces, poems, novels, letters, musings, memoirs, or
interesting to read.

Writing a Research Report

A sociological article, paper, or report about primary research


generally takes a structure or form that seems difficult but is
intended to help make reading it or using it for research quick and
efficient.
A research report has seven components:
1. Abstract or Summary
2. Introduction
3. Review of Literature
4. Methods
5. Results
6. Conclusions and Discussion
7. References

Note:

Qualitative research reports will vary from what is presented here.


Applied research reports may vary from what is presented here.

Writing a Research Report

A research report has seven components:

1. Abstract or Summary
The abstract or summary tells the reader very briefly what the main
points and findings of the paper are.
This allows the reader to decide whether the paper is useful to them.
Get into the habit of reading only abstracts while searching for
papers that are relevant to your research.
Read the body of a paper only when you think it will be useful to
you.

Writing a Research Report


A research report has seven components:
1. Abstract or Summary: an example

Writing a Research Report

A research report has seven components:

2. Introduction
The introduction tells the reader:
o what the topic of the paper is in general terms,
o why the topic is important
o what to expect in the paper.

Introductions should:
o funnel from general ideas to the specific topic of the paper
o justify the research that will be presented later

Introductions are sometimes folded into literature reviews

Writing a Research Report


A research report has seven components:
2. Introduction: an example

Writing a Research Report

A research report has seven components:


3.

Review of Literature
The literature review tells the reader what other researchers
have discovered about the paper's topic or tells the reader
about other research that is relevant to the topic. Often what
students call a "research paper" is merely a literature review.

A literature review should shape the way readers think about a
topic: it educates readers about what the community of
scholars says about a topic and its surrounding issues.

Along the way it states facts and ideas about the social world
and supports those facts and ideas with evidence of where
they came from (empiricism).

Writing a Research Report

A research report has seven components:


3.

Review of Literature

Literature reviews have parenthetical citations running
throughout. These are part of a systematic way to document
where facts and ideas came from, allowing the skeptical reader
to look up anything that is questionable.

Parenthetical citation is our way of substantiating the claims in
our paper, without breaking our flow.

Each citation directs the reader to the references where
complete details on sources can be found. Therefore,
information such as authors' first names or titles of works does
not need to be written into the text.

Writing a Research Report

A research report has seven components:


3. Review of Literature
Citations consist of authors' last names and the year of publication. One
finds complete information on sources by looking up last names and
dates in alphabetized references, so there's no need to put all that
information in the text.
We have conventions that allow the reader to figure out where
information is coming from. Here are some examples of the conventions for
citing in the text of the literature review:
Just pointing out where info came from:
Form: blah blah (Author Year)
Example: the gays are different (Lee 2004).

More than one article in the same year:
Form: blah blah (Author Yeara) and also blah blah (Author Yearb)
Example: are different (Lee 2004a), but are more pickled (Lee 2004b)

Writing a Research Report

A research report has seven components:


3. Review of Literature
We have conventions that allow the reader to figure out where
information is coming from. Here are some examples of the conventions for citing
in the text of the literature review:
Where a researcher is quoted:
Form: blah, "Quote quote" (Author Year: Pages)
Example: reveals that "the gays are different." (Lee 2004: 340).

More than one source:
Form: blah blah (Author Year; Author Year)
Example: bis are more adept (Lee 2004; Seymour & Hewitt 1997).

Writing a Research Report

A research report has seven components:


3. Review of Literature
We have conventions that allow the reader to figure out where
information is coming from. Here are some examples of the conventions for citing
in the text of the literature review:
Using the author's name in a sentence:
Form: Author (Year) says that
Example: Lee (2004) claims that girls will rule the world

Quoting a person and using their name:
Form: Author (Year: Pages) says, "Quote quote"
Example: Lee (2004: 341) says, "Girls are more likely to rule the world"

Writing a Research Report


A research report has seven components:
3. Review of Literature: examples of citing

Writing a Research Report

A research report has seven components:


3.

Review of Literature
If an idea is used, but cannot be substantiated by the
community of sociologists, the literature review clearly shows
that the author is speculating and details the logic of the
speculation.

Do NOT discuss irrelevant information.

For example, a paper on attitudes about marijuana should not
detail the multiple uses of hemp such as in clothing, rope, hemp oil and so
forth.

The literature review is written in the author's voice. The
sources of information are not extensively quoted or copied
and pasted. Instead, the author puts facts and ideas into his
or her own words while pointing out where the
information came from.

Analogously, if you were discussing the exciting things you learned in a
sociology course at a cocktail party, you would use your own words. You
would NOT pull out a book or lecture notes and quote these word for
word.

Writing a Research Report

A research report has seven components:


3.
Review of Literature

Note: Explaining why social events occur as they do requires use (and
testing) of explanations that have worked before. THESE
EXPLANATIONS ARE CALLED THEORIES.

Most academic literature reviews have a guiding theory that is used to:
o Frame (or help us understand) facts in the literature.
o Establish expectations (or hypotheses) for the research.
o Justify speculation when no evidence to justify an idea specific to a
topic exists in the literature.

Sometimes the whole point of a research project is to:
o Determine whether a theory works
o Pit two or more theories against each other to see which works
better

You will most likely not refer to theories in your papers

Writing a Research Report

A research report has seven components:


3.
Review of Literature

Quantitative literature reviews typically end with:

1. Focused declarations of the particular issues the research
activity is addressing: ideas about a topic that will be
tested with quantitative methods

2. Research hypotheses
Hypotheses are statements of the expected relationship(s)
between two (or more) variables
For example:
Men will have higher investment income than women.
Older Americans are more likely to oppose abortion for a
woman who doesn't want her baby because she is poor.

Writing a Research Report


A research report has seven components:
3. Review of Literature: examples of hypotheses
Hypothesis 1. In a new social context, girls will be more sociable than boys, getting more involved with
others (interactional commitments) and forming more emotionally close relationships (affective
commitments), across activity domains.
Hypothesis 2. Given that commitments to new relationships positively determine identity prominence,
and identity prominence positively determines behaviors, if girls are more sociable with newer
persons, their identities and behaviors will change more across activity domains.
Hypothesis 3. However, girls and boys will experience the same identity processes, meaning that girls
and boys with the same sociability in new relationships will have equal identity and behavior
changes.

Writing a Research Report

A research report has seven components:


4.

Methods

A METHODS SECTION MUST CONTAIN:

1. Descriptions of Data (Think in terms of: Who, What, When, Where,
Why and How?)
Report:
A. The Target Population
B. The Ways Data were Collected:
1. Sampling
2. Delivery Methods
C. Response Rates
D. Sample sizes resulting from various decisions
Such as:
1. eliminating non-Christians from the sample
2. using only white respondents

Writing a Research Report

A research report has seven components:


4. Methods
A METHODS SECTION MUST CONTAIN:
2. Descriptions of Variables
First for dependent, then for independent variables, report:
A. Names for the variables: make them intuitive! (Do not use
GSS variable names.)
B. Word for word description of the questions. (sociology
differs from psychology and medicine)
C. Final coding scheme: the numbers you assigned to
responses.

Writing a Research Report

A research report has seven components:


4.

Methods
A METHODS SECTION MUST CONTAIN:

3. Manipulations of the variables or data
For example:
A. recoding income from 23 uneven intervals to five equivalent
categories
B. removing non-citizens if studying voting patterns

4. Reflection on ability of data to generalize to the target
population
A. Limitations of Data (omitted cases, biases, etc.)
B. Analyses that bolster claims that the data are appropriate

5. Statistical techniques that will be used to test your hypotheses
and the statistics program used.

Writing a Research Report


4. Methods

Writing a Research Report

A research report has seven components:

5. Results

The results section chronicles the outcome of
the statistical analyses, assessing whether your
hypotheses were correct and why or why not.

Writing a Research Report

A research report has seven components:


5.

Results
The results section includes:

o Narrative describing most relevant findings

o Professional tables showing descriptive and inferential
statistics

Tables must be numbered and have a descriptive title
There are conventions for formatting
For example:
o Asterisks are used to highlight results that are statistically significant
o All numbers in a column are aligned on decimals

Writing a Research Report


5. Results

Writing a Research Report

A research report has seven components:


5.

Results

The narrative and tables are complementary.

The narrative discusses ONLY VERY IMPORTANT Results and
leaves details for tables.

As different outcomes are described in the narrative, reference is
made to where the detailed information can be found in the
tables.

The tables contain almost all statistical information so that the
author does not have to write a narrative for every detail in the
analysis.

Writing a Research Report

A research report has seven components:


5. Results
The narrative highlights:
Evaluations of the hypotheses. Were the
research hypotheses supported?
Statements about new discoveries or
surprises encountered in the analyses

Writing a Research Report

A research report has seven components:


6. Conclusions and Discussion
This section assesses how one's research findings
relate to what the community of sociologists has
accepted as facts.
Things that should be done:
1. Summarize the most salient points of your research
(tell the reader what you found out about your
topic).
2. Discuss the general significance of your topic and
findings.

Writing a Research Report

A research report has seven components:


6. Conclusions and Discussion
3. Discuss the shortcomings of your study and how
these might affect your findings.
4. Discuss things future researchers should investigate
about your topic to advance knowledge about it.
5. Help the reader gain the knowledge that you think
he or she ought to have about the topic. You spent a
lot of time exploring the topic; you should share your
expertise.

Writing a Research Report

A research report has seven components:


7. References
The references are just as important as any other part of
your paper.
References are the empirical support for claims in a
paper that are not directly observed in the research.
They are needed for researchers to remain empirical in
their descriptions of topics.

Writing a Research Report

A research report has seven components:


7. References:
Link the paper to the community of scholars, permitting
readers to assess the worthiness of claims in a paper.
Make the research process much more efficient because
they make it very easy to look up sources of facts and
ideas.

Writing a Research Report

A research report has seven components:


7.
References
Style:
Hanging indented
Alphabetical on authors' last names (by increasing year within same author)
Invert only first author's name
Information within source in an order determined by type of source
Article:
Last Name, first name, first name last name, and first name last name. Year. "Article
title." Journal Name Volume(number): 1st Page-Last Page.
Lee, James Daniel. 2005. "Do Girls Change More than Boys? Gender Differences and
Similarities in the Impact of New Relationships on Identities and Behaviors."
Self and Identity 4:131-47.
Multiple authors:
Kroska, Amy and Sarah K. Harkness. 2008. "Exploring the Role of Diagnosis in the
Modified Labeling Theory of Mental Illness." Social Psychology Quarterly
71:193-208.

Writing a Research Report

A research report has seven components:


7.

References

Book Chapter:
Last Name, first name. Year. "Chapter Name." Pages in the book in Book Name, edited
by first name last name. City of Publisher: Publisher.
Bianciardi, Roberto. 1997. "Growing Up Italian in New York City." Pp. 179-213 in Adult
Narratives of Immigrant Childhoods, edited by Ana Relles. Rose Hill, PA:
Narrative Press.
Book:
Last name, first name. Year. Book Name. City of Publisher: Publisher.
Stryker, Sheldon. 1980. Symbolic Interactionism: A Social Structural Version. Menlo
Park, CA: Benjamin/Cummings.

Writing a Research Report

A research report has seven components:


7.

References

General Social Survey:


Davis, James Allan and Smith, Tom W.: General Social Surveys, 1972-2008. [machine-readable
data file]. Principal Investigator, James A. Davis; Director and Co-Principal Investigator, Tom W.
Smith; Co-Principal Investigator, Peter V. Marsden, NORC ed. Chicago: National Opinion
Research Center, producer, 2005; Storrs, CT: The Roper Center for Public Opinion Research,
University of Connecticut, distributor. 1 data file (53,043 logical records) and 1 codebook
(2,656 pp).
Website:
Last Name (if available), first name. Year (if available). "Article or web page title." Journal or
Report Name Volume (if available). Retrieved date (http://address).
Markowitz, Robin. 1991. "Canonizing the Popular." Cultural Studies Central. Retrieved
October 31, 2001 (http://culturalstudies.net/canon.htm).
Note: Do your best to replicate this style in the case of missing information. If there is no author,
use the title in that position. Always have a retrieved date and website address.

Writing a Research Report


A research report has seven components:
7. References: an example

Writing a Research Report


Some General Points
1. Make accurate sociological claims in your paper. Stake out
positions: a kind of "I think I have the answer to this issue"
position.

2. Cite facts to support your sociological claims.

3. If you can, use theories to support your sociological claims.

4. Every declaration or fact claim must be cited or overtly posed as
speculation.

5. Anticipate your reader's questions as you write:
A. help the reader understand why your topic is important
B. demonstrate to the reader that you adequately investigated your topic
C. help them anticipate what you'll say next; everything you say should seem reasonable to say

6. While writing, keep thinking that the point is to:
(1) establish hypotheses
(2) describe how to test the hypotheses
(3) give results of tests, and
(4) discuss what the reader should believe about the world.

7. There is no right answer in a research paper, just approximate representations of the truth that are closer to or further away from that truth.

The truth is:
- From the community of scholars: what they said about your topic in the journals, books, and other publications.
- From you: what your methods and analyses revealed about the topic.

Finally: Avoiding Plagiarism
What is it?

All knowledge in your head has either been copied from some place or originally discovered by you. Most knowledge was copied.
This is true in most settings. General knowledge is copied. Most teachers' lectures are copied knowledge.
Human culture would not exist without our keen ability to copy!
Humans are natural copiers, but that is not what is meant by the term plagiarism.

The Elements of Style endorses imitation as a way for a writer to achieve his own style:
"The use of language begins with imitation . . . The imitative life continues long
after the writer is on his own in the language, for it is almost impossible to
avoid imitating what one admires. Never imitate consciously, but do not worry
about being an imitator; take pains instead to admire what is good. Then
when you write in a way that comes naturally, you will echo the halloos that
bear repeating."

Copied from: http://www.answers.com/topic/writing-style-1

Among other things, plagiarism refers to taking others' work and representing it as if it were your own.

In academics this is bad because with plagiarism:
- One cannot assess students' development accurately.
- The person who makes his or her livelihood by scholarly pursuit is being robbed of credit.
- It masks the lineage of ideas and facts.

Plagiarism is to academics as Enron accounting is to corporate America.

Lineage of Ideas:

Original sources of research are all the proof we have for some facts.
Without the paper trail of academic thought:
- People could pass incorrect ideas off as facts.
- We would have to keep re-proving things.
- The contexts that generated facts and ideas get lost.
- Research becomes highly inefficient as it becomes incredibly difficult to find full information on a topic.

To avoid plagiarism:
1. Document every source for information that is not general knowledge; this includes facts and ideas.
2. Cite every time a fact or idea is used unless it is clear that one citation is referring to a group of facts or ideas.
3. If you quote material, put quotation marks around the quoted text and include a page number within the citation.
4. It is all right to paraphrase material, but you still have to cite where the paraphrased material came from.
5. When in doubt, cite the source.

Improper citing is grounds for failure on the course paper.
