
Construct validity

From Wikipedia, the free encyclopedia


Construct validity is the degree to which a test measures what it claims, or purports, to be measuring. In the classical model of validity, construct validity is one of three main types of validity evidence, alongside content validity and criterion validity.[4][5] Modern validity theory defines construct validity as the overarching concern of validity research, subsuming all other types of validity evidence.[6][7]

Construct validity is the appropriateness of inferences made on the basis of observations or measurements (often test scores), specifically whether a test measures the intended construct. Constructs are abstractions that researchers deliberately create in order to conceptualize the latent variable, which is the cause of scores on a given measure (although it is not directly observable). Construct validity examines the question: does the measure behave the way the theory says a measure of that construct should behave?

Construct validity is essential to the perceived overall validity of the test. It is particularly important in the social sciences, psychology, psychometrics, and language studies. Psychologists such as Samuel Messick (1989) have pushed for a unified view of construct validity as an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores.[8] Key to construct validity are the theoretical ideas behind the trait under consideration, i.e. the concepts that organize how aspects of personality, intelligence, etc. are viewed.[9] Paul Meehl states that "the best construct is the one around which we can build the greatest number of inferences, in the most direct fashion."[2]

Contents
1 History
2 Evaluation
   2.1 Convergent and discriminant validity
   2.2 Nomological network
   2.3 Multitrait-multimethod matrix
3 Construct validity in experiments
4 Threats to construct validity
   4.1 Inadequate preoperational explications of constructs
   4.2 Mono-operation bias
   4.3 Mono-method bias
   4.4 Interaction of different treatments
   4.5 Interaction of testing and treatment
   4.6 Restricted generalizability across constructs
   4.7 Hypothesis guessing
   4.8 Evaluation apprehension
   4.9 Experimenter expectancies
5 See also
6 References
7 External links
History
Throughout the 1940s scientists had been trying to come up with ways to validate experiments prior to publishing them. The result was a myriad of different validities (intrinsic validity, face validity, logical validity, empirical validity, etc.), which made it difficult to tell which were actually the same and which were not useful at all. Until the middle of the 1950s there were very few universally accepted methods for validating psychological experiments, mainly because no one had determined exactly which qualities of an experiment should be examined before publication. Between 1950 and 1954 the APA Committee on Psychological Tests met and discussed the issues surrounding the validation of psychological experiments.[2]

Around this time the term construct validity was first coined by Paul Meehl and Lee Cronbach in their seminal article Construct Validity in Psychological Tests. They noted that the idea of construct validity was not new at that point; rather, it was a combination of many different types of validity dealing with theoretical concepts. They proposed the following three steps to evaluate construct validity:[2]
1. articulating a set of theoretical concepts and their interrelations
2. developing ways to measure the hypothetical constructs proposed by the theory
3. empirically testing the hypothesized relations

Many psychologists note that an important role of construct validation in psychometrics was that it placed more emphasis on theory as opposed to validation. The core issue with validation was that a test could be validated without necessarily showing that it measured the theoretical construct it purported to measure. Construct validity has three aspects or components: the substantive component, the structural component, and the external component.[10] They are closely related to three stages in the test construction process: constitution of the pool of items, analysis and selection of the internal structure of the pool of items, and correlation of test scores with criteria and other variables.

In the 1970s there was growing debate between theorists who began to see construct validity as the dominant model, pushing towards a more unified theory of validity, and those who continued to work from multiple validity frameworks.[11]
Many psychologists and education researchers saw predictive, concurrent, and content validities as essentially ad hoc; construct validity was the whole of validity from a scientific point of view.[10] In the 1974 version of The Standards for Educational and Psychological Testing the inter-relatedness of the three different aspects of validity was recognized: "These aspects of validity can be discussed independently, but only for convenience. They are interrelated operationally and logically; only rarely is one of them alone important in a particular situation". In 1989 Messick presented a new conceptualization of construct validity as a unified and multi-faceted concept.[12] Under this framework, all forms of validity are connected to, and dependent on, the quality of the construct. He noted that a unified theory was not his own idea, but rather the culmination of debate and discussion within the scientific community over the preceding decades. There are six aspects of construct validity in Messick's unified theory of construct validity.[13]
They examine six items that measure the quality of a test's construct validity:
1. Consequential: What are the potential risks if the scores are, in actuality, invalid or inappropriately interpreted? Is the test still worthwhile given the risks?
2. Content: Do test items appear to be measuring the construct of interest?
3. Substantive: Is the theoretical foundation underlying the construct of interest sound?
4. Structural: Do the interrelationships of the dimensions measured by the test correlate with the construct of interest and the test scores?
5. External: Does the test have convergent, discriminant, and predictive qualities?
6. Generalizability: Does the test generalize across different groups, settings, and tasks?
How construct validity should properly be viewed is still a subject of debate among validity theorists. The core of the disagreement is epistemological, lying between positivist and postpositivist theorists.
Evaluation
Evaluation of construct validity requires examining the correlations of the measure with variables that are known to be related to the construct (purportedly measured by the instrument being evaluated, or for which there are theoretical grounds for expecting a relation). This is consistent with the multitrait-multimethod matrix (MTMM) approach to examining construct validity described in Campbell and Fiske's landmark paper (1959).[14] There are other methods to evaluate construct validity besides MTMM: it can be evaluated through different forms of factor analysis, structural equation modeling (SEM), and other statistical evaluations.[15][16] It is important to note that a single study does not prove construct validity. Rather, it is a continuous process of evaluation, reevaluation, refinement, and development. Correlations that fit the expected pattern contribute evidence of construct validity. Construct validity is a judgment based on the accumulation of correlations from numerous studies using the instrument being evaluated.[17]

Most researchers attempt to test construct validity before the main research. To do this, pilot studies may be used. Pilot studies are small-scale preliminary studies aimed at testing the feasibility of a full-scale test; they establish the strength of the research and allow any necessary adjustments to be made. Another method is the known-groups technique, which involves administering the measurement instrument to groups expected to differ due to known characteristics. Hypothesized relationship testing involves logical analysis based on theory or prior research.[3] Intervention studies are yet another method of evaluating construct validity. Intervention studies in which a group with low scores on the construct is tested, taught the construct, and then re-measured can demonstrate a test's construct validity. If there is a significant difference between pre-test and post-test scores, as analyzed by statistical tests, this may demonstrate good construct validity.[18]
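A pre-test/post-test comparison of this kind can be sketched with a paired t-test. The scores below are hypothetical, and a real analysis would also compare the statistic against the t distribution with n - 1 degrees of freedom to obtain a p-value.

```python
from statistics import mean, stdev

def paired_t(pre, post):
    """Paired t-statistic for pre/post scores from the same group.

    A large positive value suggests scores rose after the intervention,
    consistent with the test tapping the construct that was taught.
    """
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / n ** 0.5)

# Hypothetical scores before and after the construct was taught.
pre = [10, 12, 9, 11, 13, 10, 8, 12]
post = [15, 16, 14, 15, 18, 14, 13, 17]
t = paired_t(pre, post)
print(f"t = {t:.2f}")
```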

Convergent and discriminant validity
Main articles: convergent validity and discriminant validity
Convergent and discriminant validity are the two subtypes of validity that make up construct validity. Convergent validity refers to the degree to which two measures of constructs that theoretically should be related are, in fact, related. In contrast, discriminant validity tests whether concepts or measurements that are supposed to be unrelated are, in fact, unrelated.[14] Take, for example, a construct of general happiness. If a measure of general happiness had convergent validity, then constructs similar to happiness (satisfaction, contentment, cheerfulness, etc.) should relate closely to the measure of general happiness. If this measure has discriminant validity, then constructs that are not supposed to be related to general happiness (sadness, depression, despair, etc.) should not relate to the measure of general happiness. Measures can have one of the subtypes of construct validity and not the other. Using the example of general happiness, a researcher could create an inventory where there is a very high correlation between general happiness and contentment, but if there is also a significant correlation between happiness and depression, then the measure's construct validity is called into question. The test has convergent validity but not discriminant validity.
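The happiness example can be made concrete with a small numerical sketch. The scores below are invented for illustration; the correlation pattern, not the data, is the point.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two lists of scores."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x)
           * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Hypothetical scores from the same eight respondents on three scales.
happiness = [4, 7, 6, 8, 3, 5, 9, 2]
contentment = [5, 6, 6, 9, 2, 5, 8, 3]  # construct similar to happiness
depression = [5, 5, 3, 4, 6, 4, 5, 4]   # construct that should not relate

# Convergent evidence: happiness correlates highly with contentment.
print(round(pearson(happiness, contentment), 2))
# Discriminant evidence: happiness barely correlates with depression.
print(round(pearson(happiness, depression), 2))
```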
Nomological network
Main article: nomological network
Paul Meehl and Lee Cronbach (1955) proposed that the development of a nomological net was essential to the measurement of a test's construct validity. A nomological network defines a construct by illustrating its relation to other constructs and behaviors.[2] It is a representation of the concepts (constructs) of interest in a study, their observable manifestations, and the interrelationships among them. It examines whether the relationships between similar constructs are consistent with the relationships between the observed measures of the constructs. Thorough observation of constructs' relationships to each other can generate new constructs. For example, intelligence and working memory are considered highly related constructs. Through observation of their underlying components, psychologists developed new theoretical constructs such as controlled attention[19] and short-term loading.[20] Creating a nomological net can also make the observation and measurement of existing constructs more efficient by pinpointing errors.[2] Researchers have found that the dimensions of the human skull (phrenology) are not indicators of intelligence. By removing the theory of phrenology from the nomological net of intelligence, testing constructs of intelligence is made more efficient. The weaving of all of these interrelated concepts and their observable traits creates a net that supports their theoretical concept. For example, in the nomological network for academic achievement, we would expect observable traits of academic achievement (i.e. GPA, SAT, and ACT scores) to relate to the observable traits of studiousness (hours spent studying, attentiveness in class, detail of notes). If they do not, then there is a problem with the measurement of academic achievement or studiousness. If they are indicators of one another, then the nomological network, and therefore the constructed theory, of academic achievement is strengthened. Although the nomological network proposes a theory of how to strengthen constructs, it does not tell us how to assess the construct validity in a study.
Multitrait-multimethod matrix
Main article: Multitrait-multimethod matrix
The multitrait-multimethod matrix (MTMM) is an approach to examining construct validity developed by Campbell and Fiske (1959).[14] This model examines convergence (evidence that different measurement methods of a construct give similar results) and discriminability (the ability to differentiate the construct from other related constructs). It considers six aspects: the evaluation of convergent validity, the evaluation of discriminant (divergent) validity, trait-method units, multitrait-multimethods, truly different methodologies, and trait characteristics. This design allows investigators to test for convergence across different measures of the same thing and for divergence between measures of related but conceptually distinct things.[21]
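A minimal sketch of the logic, with invented data: two traits, each measured by two methods. Correlations between different methods measuring the same trait (the validity diagonal) should exceed correlations between different traits.

```python
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two lists of scores."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x)
           * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Hypothetical scores: two traits, each measured by two methods,
# for the same six people.
measures = {
    ("anxiety", "self-report"): [7, 2, 5, 8, 3, 6],
    ("anxiety", "observer"): [6, 3, 5, 7, 2, 6],
    ("sociability", "self-report"): [5, 6, 3, 4, 6, 2],
    ("sociability", "observer"): [4, 7, 3, 5, 6, 3],
}

# Print each off-diagonal entry of the MTMM matrix, labeled by kind.
for (t1, m1), (t2, m2) in combinations(measures, 2):
    r = pearson(measures[(t1, m1)], measures[(t2, m2)])
    kind = "validity diagonal" if t1 == t2 else "heterotrait"
    print(f"{t1}/{m1} vs {t2}/{m2}: r = {r:+.2f} ({kind})")
```

With these made-up scores, the validity-diagonal correlations come out high while the heterotrait entries stay lower in magnitude, which is the pattern Campbell and Fiske's criteria look for.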

Construct validity in experiments
A good way to illustrate construct validity is through a landmark experiment, such as Milgram's study of obedience. The purpose of the study was to examine whether a person would continue to do something they were uncomfortable with just because someone in authority was telling them to do so. Essentially, it was intended to test whether people are obedient or not.

Participants were recruited as volunteers through a newspaper advertisement.[22] They were all men of various ages, levels of education, and occupations. There was also an experimenter running the experiment and a learner who was an actor in the study.[22] The participants were instructed to listen to the experimenter and shock the learner every time the learner responded incorrectly to studied information. There were 30 different levels of shock to be administered.[22] The participants were allowed to hear the learner's reaction to the shock. If a participant did not want to continue with the shocks, they were heavily encouraged to continue. Participants who refused before the 13th level were considered defiant, and those who continued past the 13th level were considered obedient.[22]

Now consider this in regard to construct validity: does the level at which the person decides not to continue with the shocks really measure a person's level of obedience accurately? There are two ways of looking at this. The definitionalist view of construct validity states that it is essential to define exactly what we want to be looking for when we are testing something.[23] In this case: is the level of shock testing for level of obedience, and level of obedience only? The other view is the relationist view. It holds that, in this case, it is important to make sure the study tests for obedience, since that is its intention, but if other factors come into play, it is fine for them to be included in the testing as long as they relate to obedience.[23]

Milgram's study does seem to measure obedience very effectively, but it can be seen where other factors may come into play, so the study does appear to have construct validity. It is important to note, however, that this most likely aligns with the relationist view, because the experiment may also simply show that some people are oblivious to things going on around them. Participants could also have felt a sense of responsibility to finish because they were monetarily compensated for their time; they were guaranteed the money no matter what, but some may have taken that to heart.[22]

The shock level does measure level of obedience within the relationist view. Construct validity is present in Milgram's study, making it a valid study for its testing purposes at the time it was administered. Today, though, it would not be approved by an institutional review board due to the possible psychological harm done to the participants. Even so, it is a landmark study and one that contains a good example of proper construct validity.
Threats to construct validity
Since construct validity attempts to create a universal cohesion, there are many possible threats to it. Construct validity is threatened by participant reactivity to the study situation (e.g. the Hawthorne effect), altered behavior due to the novelty of a new treatment, researcher expectations, and diffusion or contamination of the treatment conditions. Developing a poor construct can be an issue: if a construct is too broad or too narrow, it can invalidate an entire experiment.[24] For example, a researcher might try to use job satisfaction to define overall happiness. This is too narrow, as somebody may love their job but have an unhappy life outside the workplace. Likewise, using general happiness to measure happiness at work is too broad. Construct confounding is another threat; it occurs when other constructs affect the measured construct.[25] For example, self-worth is confounded by self-esteem and self-confidence: one's perception of self-worth is affected by one's state of self-esteem or self-confidence. Another threat is hypothesis guessing. If subjects make assumptions about the aims of the research, their behavior changes and construct validity is affected. Similarly, if an individual becomes apprehensive during an experiment, it could affect his or her performance.[26] Researchers themselves can be threats to construct validity: researchers' expectancies and biases, whether overt or unintentional, can lower construct validity by clouding the effect of the research variable. To avoid these effects, interaction should be minimized and double-blind experiments should be used. Variance in scores can show weak construct validity. For example, if native English speakers score higher than individuals who speak English as a second language on a test written in English that is trying to measure intelligence, then the test has poor construct validity: it measures language ability rather than intelligence.
Another take on construct validity is described by William M. Trochim.[27] In his definitions of threats to construct validity he includes inadequate preoperational explication of constructs, mono-operation bias, mono-method bias, interaction of different treatments, interaction of testing and treatment, restricted generalizability across constructs, confounding constructs and levels of constructs, hypothesis guessing, evaluation apprehension, and experimenter expectancies.[27]

Inadequate preoperational explications of constructs
"Inadequate preoperational explication of constructs" means not defining the construct of the experiment well enough.[27]

Mono-operation bias
"Mono-operation bias" pertains to using only one variable, or suggesting only one way of dealing with a problem.[27] The problem with this is that it does not look at different aspects of the experiment and the true reasons for the experiment.[27]

Mono-method bias
"Mono-method bias" refers to the possibility that the way an experimenter measured or observed the experiment may correlate with things they did not expect.[27]

Interaction of different treatments
"Interaction of different treatments" concerns an experiment in which an experimenter implements a plan to help a certain population and assumes that the plan caused an effect, when the effect might also be due to other factors associated with that population.[27]

Interaction of testing and treatment
"Interaction of testing and treatment" involves labeling a program without addressing the issue of the treatment being a possible part of the program.[27]

Restricted generalizability across constructs
"Restricted generalizability across constructs" means that an experiment worked in one particular situation but might not work in other situations.[27]

Hypothesis guessing
In "hypothesis guessing" the participants in an experiment may try to figure out the goal of the experiment, and their ideas about what they think is being studied may affect how they respond and alter the results.[27]

Evaluation apprehension
"Evaluation apprehension" describes how a participant's mind and body react to knowing that they are being experimented on. This can alter the results of an experiment.[27]

Experimenter expectancies
"Experimenter expectancies" explains that the way a researcher wants or expects an experiment to go may itself affect how the experiment turns out.[27]


Introduction
In this section, you will be dealing with:
recognizing types of evidence used to establish validity
applying these types of evidence to formal and informal class assessments and to published tests that are used in schools
recognizing the role of validity in the interpretation and use of tests.
Validity of assessment has to do with how closely the test measures what it was meant to measure. If a test does not show evidence of validity, then it is useless. This evidence of validity can be discussed in three categories:
Content-related evidence - looks at how well the content of the test (i.e. the questions, observations, etc.) relates to what is being assessed.
Criterion-related evidence - the correlation between performance on the test and performance on a relevant criterion not in the test.
Construct-related evidence - looks at whether the test matches the capabilities or psychological construct it is trying to measure.
Content-related evidence
To determine whether the test questions, observation guidelines, etc. are valid before they are used with students, they are compared to the skills that the test is meant to measure. A Table of Specifications, as well as clear performance objectives, is used to do this.
A Table of Specifications is constructed from a two-dimensional chart comprising the content areas covered in the test and a list of categories of performance the test is to measure. With performance objectives, the emphasis is on the intersection of the content and the performance.
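The two-dimensional chart can be sketched as a nested mapping; the content areas, performance categories, and item counts below are hypothetical.

```python
# Rows are content areas, columns are performance categories, and each
# cell records how many test items sit at that intersection.
# (All names and counts here are hypothetical.)
table_of_specifications = {
    "fractions": {"recall": 3, "apply": 5, "analyze": 2},
    "decimals": {"recall": 2, "apply": 4, "analyze": 2},
    "percentages": {"recall": 2, "apply": 3, "analyze": 1},
}

# Check coverage: total items per performance category.
totals = {}
for row in table_of_specifications.values():
    for category, n in row.items():
        totals[category] = totals.get(category, 0) + n
print(totals)
```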
Criterion-related evidence
Criterion-related evidence of validity involves looking at how a student's performance on a test relates to how the student has done using related material in a different assessment situation, formal or informal. To evaluate whether a test is valid, you must be able to identify what the test can be correlated with. For example:

high score on college admission test → high performance in college
moderate score on published achievement test → moderate performance in school

Criterion-related evidence of validity is not to be confused with criterion-referenced interpretation of performance, which refers to the frame of reference used to interpret performance on a test: how well the student does is considered meaningful based on a precise description of the domain the test is measuring. Predictive validity and concurrent validity can be seen as subcategories of criterion-related validity.
Construct-related evidence of validity
Construct-related evidence of validity is concerned with whether or not "a test matches the capabilities or psychological construct that is to be measured" (Oosterhof, 1994). Examples of measures of constructs of learned knowledge are public achievement tests or aptitude tests.
To assess construct validity, the Multitrait-Multimethod Matrix approach is commonly used.

The many aspects of validity are interconnected, creating a means of checking that a test is effective for the student. The relationships among the above types of validation can also be seen in the Multitrait-Multimethod Matrix. However, the above validation framework reflects traditional thinking; current views about validation have been advanced by Messick (1989). It is now commonly accepted that construct validity should be the overarching concept for all types of validity, and that consequential evidence of validation should be an integral part of construct validity.
