18
Linkage, Allele Sharing, and Association
Mara Giordano
gametes are generated by the independent assortment of the parental alleles and are referred to
as “recombinants.” For two independent loci, the ratio parentals: recombinants is 50 : 50.
If two loci are syntenic (i.e., they lie on the same chromosome), they do not segregate inde-
pendently. Genetic linkage is the phenomenon whereby alleles at loci close together on the
same chromosome do not assort independently at meiosis, but they will tend to be inherited
together. If the linkage between two loci is complete, an individual heterozygous at both loci
(AaBb) will produce only two types of gamete. In Fig. 2, the loci A and B are closely linked:
All of the gametes will carry the parental combination (i.e., 50% AB and 50% ab).
An example of independent and linked loci segregating in the same family is shown in Fig. 3,
a large pedigree in which some members are affected by a rare autosomal dominant disease.
Each affected individual has inherited a mutant copy of the disease gene (D) and is a Dd
heterozygote, whereas unaffected subjects have two normal alleles at the disease
locus (dd genotype). All of the members of the family have been tested with two markers:
marker A on chromosome 2, with alleles A1 and A2, and marker B on chromosome 5, with
alleles B1 and B2. Following the inheritance of marker A alleles and the inheritance of the
disease, the affected members of the pedigree transmit either the marker allele A1 or A2 to
affected offspring with equal likelihood, as expected by random segregation of the disease and
the marker. Instead, when comparing the inheritance of marker B alleles and the disease, all
affected members of the pedigree transmit the allele B1 to the affected offspring and none of the
unaffected members of this pedigree has inherited this same allele. The precise cosegregation
of B1 and D alleles allows us to hypothesize that the marker locus B and the disease locus are
physically linked on chromosome 5.
In genetic terminology, two alleles at linked loci are said to be in coupling (or in cis) when
they are on the same parental chromosome (e.g., B1 and D in Fig. 3) and in repulsion (or in
trans) when they are on two different chromosomes (e.g., B1 and d). The only consequence of
different phases is that the gametes that are called recombinant in one case are nonrecombinant
in the other.
Fig. 3. Pedigree with members affected by an autosomal dominant disease. All the individuals
were genotyped for two markers, A (alleles A1 and A2) and B (alleles B1 and B2). Individuals
carrying the various gametic combinations of the disease locus alleles with each of the two
markers are reported in the table. For marker A, all of the possible allelic combinations are
detected, as expected for independent loci. For marker B, only two allelic combinations are
present (D-B1 and d-B2), suggesting that the disease and marker B locus are linked and that the
disease allele D is carried in cis with marker allele B1.
3. Recombination Fraction
The family reported in Fig. 4 shows the segregation of the nail–patella syndrome, a rare
autosomal dominant disorder involving nail and skeletal deformities. The affected individuals
are heterozygotes Dd for the disease allele, whereas the unaffected are homozygotes dd. All
of the family members were genotyped for the ABO blood group, which is determined by three
different alleles: A and B, which are codominant, and O, which is recessive to both. Individual
II-1 inherits the D-A allelic combination from her father (I-1) and the allelic combination d-B
from her mother (I-2). She
transmits to her eight children the combination D-A four times and the combination d-B three
times, and in just one case, she transmits the recombinant combination d-A (III-3). Thus, the
parental combinations D-A and d-B are cotransmitted to members of the third generation more
frequently than expected by chance. This suggests that the nail–patella and the ABO group loci
are linked.
As stated above, most but not all the individuals inherit the parental allelic combination.
Individual III-3 received the marker allele A from her mother but not the disease allele D. This
individual is a recombinant, originating from a crossing-over event between the ABO and
nail–patella loci that occurred during the maternal meiosis.
Fig. 4. Pedigree with members affected by nail–patella syndrome. The genotype at the ABO
locus is reported for each individual. The parental combinations of II-1 (D-A and d-B) and the
gametes transmitted to her eight children are reported. All of the children but III-3 (recombinant)
received a parental combination.
Crossing-over is the reciprocal exchange of corresponding segments between the nonsister
chromatids of two homologous chromosomes. It is assumed to be random along the length of the
chromosome. Thus, the more distant the loci, the greater is the probability for a crossing-over to
occur between them somewhere along the length of the chromosome. As a consequence alleles
at loci on the same chromosome co-segregate at a rate related to the distance between them. The
distance separating two loci is measured by the recombination fraction (ϑ), which corresponds
to the probability that a recombination event occurs between the two loci (see Fig. 5).
Because the frequency of recombinant gametes is a function of the distance between two
genes, it is used as a measure of the so-called “genetic distance” between the two genes. One
percent recombination is referred to as one map unit, or 1 centimorgan (cM), distance. The
human genome has been estimated by recombination studies to be about 3000 cM. As the
physical length of the haploid genome is 3000 Mb, 1 cM can be approximated to 1 Mb (1
million bases).
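As a rough illustration of how counted gametes translate into map units, the following Python sketch estimates ϑ as the proportion of recombinants and converts it to centimorgans with the 1% = 1 cM rule given above. The function names are hypothetical, and the counts used (one recombinant among eight meioses) are those of the nail–patella family of Fig. 4; the direct conversion is only a reasonable approximation for small ϑ.

```python
def recombination_fraction(recombinants, nonrecombinants):
    """Proportion of recombinant gametes among all scored gametes."""
    total = recombinants + nonrecombinants
    if total == 0:
        raise ValueError("no informative meioses were scored")
    return recombinants / total


def to_centimorgans(theta):
    """Convert a recombination fraction to map units (1% recombination = 1 cM)."""
    return 100.0 * theta


# Family of Fig. 4: one recombinant and seven nonrecombinant children.
theta = recombination_fraction(recombinants=1, nonrecombinants=7)
print(theta, to_centimorgans(theta))  # 0.125 12.5
```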
The maximum measurable genetic distance between two syntenic loci is 50 cM (50% recombination).
This is explained by the fact that if a single crossing-over occurs, only two of the
four chromatids are involved (see Fig. 6A). Considering two syntenic loci, a single crossover
event occurring between them generates two recombinant gametes out of four. Therefore, 100%
probability of a crossover event results in 50% recombinant gametes. However, two loci far
apart can be separated by more than one crossing-over event. Double crossovers can involve
two, three, or four chromatids (see Fig. 6B). As shown in Fig. 6B, because the three types of
double crossing-over occur randomly, the overall effects of this double recombination event
give, on average, 50% recombinants. Consequently, the proportions of parental and recombinant
gametic combinations when two loci are far apart on the same chromosome each approximate
50%, exactly as is seen for loci located on different chromosomes. Thus, when ϑ = 0.5,
the two loci are not linked, because they are either on different chromosomes or far apart on the
same chromosome.
Fig. 5. Gametic combinations for two linked loci: (1) closely linked loci: no crossing-over
occurs between the two loci. All of the gametic combinations are parental. (2) Crossing-over
between the two loci occurs at a rate related to the distance separating the two loci. The proportion
of recombinant gametes is equal to the recombination fraction (ϑ) and the proportion of
nonrecombinant gametes is 1–ϑ, each type being (1–ϑ)/2.
4. Informative Matings
Linkage analysis between two loci can be performed only when it is possible to distinguish
individuals who inherit recombinant allelic combinations from individuals who inherit
nonrecombinant (parental) combinations. Recombination events can be detected only when the
individual is heterozygous at both loci and the phase of the parental alleles is known. The phase can
be established with certainty only in three-generation pedigrees with available grandparents. In the
family in Fig. 7A, it is not possible to count recombinants and nonrecombinants because the
phase of the parental gametes is unknown. The same family is reported with the first-generation
genotypes available (see Fig. 7B). Now, the phase can be determined (B-D) and it is possible
to distinguish whether a gamete passed to the third generation is the product of a
recombination event or is a parental (nonrecombinant) gamete and to count recombinants and
nonrecombinants in the offspring. In some pedigrees, even when the grandparents are acces-
sible, the phase remains undetectable. This occurs when the parent useful for linkage analysis is
homozygous (see Fig. 7C) or when the parents, although heterozygous, share the same marker
alleles and their children are also heterozygous (see Fig. 7D).
Fig. 6. Each crossing-over event involves two of the four chromatids: (A) a single
crossing-over generates two recombinant and two nonrecombinant chromatids; (B) a double
crossing-over generates only nonrecombinant chromatids if it involves two strands; a
three-strand double crossing-over generates two recombinants and two nonrecombinants; a
four-strand double crossing-over generates only recombinants. The three types of double
crossing-over occur randomly, so the average effect is 50% recombinants.
gous. The heterozygosity of a marker corresponds to the chance that a randomly selected per-
son will be heterozygous for that marker. This depends on the number of alleles and their
relative frequency. If the marker alleles are A1, A2, A3,…, and their frequencies are p1, p2,
p3,…, the heterozygosity is
\[ H = 1 - \sum_i p_i^2 \]
where $p_i$ is the frequency of the $i$-th allele and $\sum_i p_i^2$ is the expected fraction of homozygotes.
For biallelic markers, the maximum heterozygosity is 0.5, corresponding to the presence of two
alleles with equal frequency [$H = 1 - (0.5^2 + 0.5^2) = 0.5$].
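The heterozygosity formula can be checked numerically with a short Python sketch. The function name is hypothetical, and the ten-allele marker with equal frequencies is an invented example meant only to show why multiallelic markers are more informative than biallelic ones.

```python
def heterozygosity(allele_frequencies):
    """Expected heterozygosity H = 1 - sum(p_i^2) for one marker."""
    if abs(sum(allele_frequencies) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_frequencies)


# Biallelic marker with equal allele frequencies: the maximum, H = 0.5.
print(heterozygosity([0.5, 0.5]))

# Hypothetical marker with ten equally frequent alleles: H is close to 0.9.
print(round(heterozygosity([0.1] * 10), 3))
```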
The first generation of DNA markers was represented by RFLPs (restriction fragment length
polymorphisms) corresponding to DNA variations located within a restriction enzyme recogni-
tion site. The analysis of RFLPs required isolation of a large amount of DNA, restriction digests,
Southern blotting, and hybridization with radiolabeled probes. The major limitation of RFLPs
was their low informativeness, as they are biallelic markers with a maximum heterozygosity of
0.5.
A great improvement was given by minisatellite VNTR (variable number tandem repeat)—
tandemly repeated DNA sequences of 100–1000 bp that show an elevated number of alleles in
the population and, thus, a high level of heterozygosity. However, they are not uniformly dis-
tributed all over the genome (they tend to cluster near telomeres) and have the drawback of
requiring typing by Southern blotting and radioactive detection.
Fig. 7. Informative and uninformative meioses for linkage analysis. (A) Uninformative meioses:
the grandparents are not available and the phase of II-1 (useful for linkage analysis) is
unknown. (B) Informative meioses: the genotypes of the first generation are now available and
the phase can be determined (B-D and A-D). (C) Uninformative meioses: II-1 is homozygous at
the ABO locus and it is not possible to distinguish which alleles she transmits to the offspring.
(D) Uninformative meioses: the two parents II-1 and II-2 are heterozygous for the same marker
alleles; the children could have inherited A from the mother and B from the father, or vice versa.
The last generation of highly informative polymorphic markers is represented by
microsatellites, which consist of particular DNA sequence motifs ranging from 1 to 6 base pair
(bp) tandem repetitions. They usually have multiple alleles differing in the repetition number
and show a high level of heterozygosity. The presence of numerous different alleles decreases
the probability that the two parents are heterozygous for the same alleles (as in the example of
Fig. 7D), thus decreasing the probability of uninformative meioses. Microsatellites are uniformly
distributed all over the genome and are easily genotyped by polymerase chain reaction
(PCR). These markers became the most popular genetic markers and a high-density
microsatellite map including more than 5000 microsatellites has been developed (2).
Fig. 8. Family with nail–patella syndrome, typed for the ABO locus. II-1 is a double
heterozygous (AB, Dd) individual but the phase is unknown. If the phase is D-A (1), there are five
recombinants and one nonrecombinant. Alternatively, if the phase is D-B (2), there are one
recombinant and five nonrecombinants. R = recombinant; NR = nonrecombinant.
In most human pedigrees, because of the small family size and the difficulty in collecting
grandparent DNA, it is impossible to determine the allele phase and to directly count the recom-
binants. In Fig. 8, individual II-1 is again double heterozygous, but the phase is unknown.
Therefore, among her children, there can be either five recombinants and one nonrecombinant
or five nonrecombinants and one recombinant. In this case, we cannot unambiguously identify
recombinants and nonrecombinants.
The test used to calculate the significance of the observations indicating that two loci are
linked at a certain genetic distance (ranging from ϑ = 0 to ϑ = 0.5) is the LOD score test. It
evaluates the overall likelihood of the pedigree, on the alternative assumption that the loci are
linked (ϑ < 0.5) or not linked (ϑ = 0.5). The ratio of these two likelihoods gives the odds of
linkage and it is expressed as the log of this ratio (LOD stands for “log of odds”). Being
expressed as a logarithm, the LOD score has the useful feature that scores from different
matings for which the same loci are analyzed can be added, hence providing a cumulative set of
data either supporting or not supporting some particular linkage value.
When performing a LOD score test we ask “what is the likelihood that the detected meioses
(births) come from linked loci with a certain percentage of recombination (0 < ϑ < 0.5) as
compared with the probability that the same sequence would have been produced by indepen-
dent loci (ϑ = 0.5)?” In each family, the probability for an individual being recombinant is ϑ
and the probability of being nonrecombinant is 1–ϑ. To determine the LOD score, different
values of ϑ are chosen. The Z value (LOD score) is calculated for each, and then the ϑ at which
the maximum LOD score (Zmax) is obtained gives the best estimate (maximum likelihood esti-
mate, MLE) of the recombination fraction between the two loci. This value of ϑ is the estimate
of the map distance between the marker and the disease locus.
6.1. Example of Lod Score Calculation When the Phase is Known
In the family of Fig. 4, we can establish that among the children, there are one recombinant
(a priori probability ϑ) and seven nonrecombinants (a priori probability 1 − ϑ) because the phase
in individual II-1 is known (A is in coupling with D). The overall likelihood given linkage is
$(1-\vartheta)^7\vartheta$. The likelihood given no linkage is $(0.5)^8$. The linkage/nonlinkage likelihood ratio is
$(1-\vartheta)^7\vartheta / (0.5)^8$ and the LOD score (logarithm of the ratio) is
\[ Z(\vartheta) = \log_{10}\left[ \frac{(1-\vartheta)^7\,\vartheta}{(0.5)^8} \right]. \]
We can calculate the Z values for different ϑ values. For example, determine Z at ϑ = 0.2:
Probability given linkage = $(1-0.2)^7 (0.2) = 0.0419$,
Probability given no linkage = $(0.5)^8 = 0.0039$,
$Z(0.2) = \log_{10}(0.0419/0.0039) \approx 1.03$.
Z reaches its maximum positive value (Zmax ≈ 1.10) at ϑ = 0.125, the observed proportion of
recombinants (1/8). However, this would not be considered a significant LOD value. In fact, the linkage
hypothesis is conventionally considered statistically significant when it is 1000-fold (LOD score
= 3) more likely than the independence hypothesis. To obtain such a high value, even if the
linkage hypothesis is correct, larger sample sizes are needed. This could be done either by
extending the analysis to other family members (if they are available) or by adding the Z value
calculated from other pedigrees.
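The phase-known calculation can be reproduced with a few lines of Python. This is a minimal sketch, assuming a base-10 logarithm (as implied by the 1000-fold = LOD 3 convention); the function name and the grid of ϑ values are illustrative choices, not part of any linkage package.

```python
import math


def lod_phase_known(theta, recombinants, nonrecombinants):
    """LOD score for counted recombinants/nonrecombinants when phase is known."""
    n = recombinants + nonrecombinants
    likelihood_linked = (1 - theta) ** nonrecombinants * theta ** recombinants
    likelihood_unlinked = 0.5 ** n
    if likelihood_linked == 0:
        return -math.inf  # e.g. theta = 0 with at least one recombinant
    return math.log10(likelihood_linked / likelihood_unlinked)


# Family of Fig. 4: one recombinant and seven nonrecombinants.
print(round(lod_phase_known(0.2, 1, 7), 3))  # 1.031

# Scan theta from 0.001 to 0.5 and keep the value giving the highest LOD.
grid = [i / 1000 for i in range(1, 501)]
best_theta = max(grid, key=lambda t: lod_phase_known(t, 1, 7))
print(best_theta, round(lod_phase_known(best_theta, 1, 7), 3))  # 0.125 1.099
```

The maximum falls at the observed recombinant proportion, 1/8, and stays well below the threshold of 3.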
In the family reported in Fig. 8, the phase of the parental gamete is unknown. This repre-
sents the situation more frequently encountered in human pedigrees. There is no way to know
for certain if the D allele at the nail–patella locus is in cis with the A allele at the ABO locus
(D-A; prior probability 1/2) or with the B allele (D-B; prior probability 1/2). If the correct phase is
D-A, there are five nonrecombinants and one recombinant. The overall likelihood of the observed
sequence of meioses is $\frac{1}{2}(1-\vartheta)^5\vartheta$. If the correct phase is D-B, there are five
recombinants and one nonrecombinant. The overall likelihood of the latter sequence of meioses is
$\frac{1}{2}(1-\vartheta)\vartheta^5$.
The LOD score is then calculated from the sum of the two likelihoods as
\[ Z(\vartheta) = \log_{10}\left\{ \left[ \tfrac{1}{2}(1-\vartheta)^5\,\vartheta + \tfrac{1}{2}(1-\vartheta)\,\vartheta^5 \right] / (0.5)^6 \right\} \]
This allows for either phase with equal prior probability. The calculation is performed at
different ϑ values giving the following results:
ϑ    0     0.1     0.2     0.3     0.4     0.5
Z    –∞    0.276   0.323   0.222   0.076   0
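The phase-unknown LOD scores can be recomputed with the following Python sketch, which averages the likelihoods of the two possible phases before comparing them with the no-linkage likelihood. The function name and argument names are hypothetical; the logarithm is assumed to be base 10.

```python
import math


def lod_phase_unknown(theta, k, n):
    """LOD score when phase is unknown.

    n is the number of meioses; k of them are recombinant under one phase
    and n - k are recombinant under the other. Each phase has prior 1/2.
    """
    phase_1 = (1 - theta) ** (n - k) * theta ** k
    phase_2 = (1 - theta) ** k * theta ** (n - k)
    likelihood_linked = 0.5 * phase_1 + 0.5 * phase_2
    if likelihood_linked == 0:
        return -math.inf
    return math.log10(likelihood_linked / 0.5 ** n)


# Family of Fig. 8: six children, one or five recombinants depending on phase.
for theta in (0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(theta, round(lod_phase_unknown(theta, 1, 6), 3))
```

Compared with the phase-known case, the same kind of family yields much lower LOD scores, because half of the likelihood is spent on the less plausible phase.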
Many families can be scored and the Z values obtained can be added to reach a significant
result demonstrating linkage between the two loci. The total results can be displayed graphi-
cally (see Fig. 9). The graph was obtained from the LOD scores calculated in different families
including the pedigrees of Figs. 4 and 8. In this example, the LOD score at ϑ = 0 tends to –∞.
This is generally the case if there has been at least one recombination event between the loci,
where it is clear that the two loci cannot be zero distance apart. In the graph, the maximum
value of Z occurs at a recombination fraction of 0.1 and it is higher than 3. Therefore, we can
feel confident that the two loci are linked and the best estimate of their genetic distance is
ϑ = 0.1, which corresponds to 10 cM.
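The additivity of LOD scores across families can be sketched in Python as follows. Only the two nail–patella pedigrees described in the text (Fig. 4, phase known; Fig. 8, phase unknown) are summed here, so the total remains far below the value of the hypothetical graph; the function names are illustrative, and the thresholds used are the conventional ones (Z of 3 or more for linkage, Z below –2 for exclusion).

```python
import math


def lod_phase_known(theta, recombinants, nonrecombinants):
    likelihood = (1 - theta) ** nonrecombinants * theta ** recombinants
    null = 0.5 ** (recombinants + nonrecombinants)
    return math.log10(likelihood / null) if likelihood > 0 else -math.inf


def lod_phase_unknown(theta, k, n):
    likelihood = (0.5 * (1 - theta) ** (n - k) * theta ** k
                  + 0.5 * (1 - theta) ** k * theta ** (n - k))
    return math.log10(likelihood / 0.5 ** n) if likelihood > 0 else -math.inf


def total_lod(theta):
    """Sum of the LOD scores of the two pedigrees at the same theta."""
    return lod_phase_known(theta, 1, 7) + lod_phase_unknown(theta, 1, 6)


def interpret(z):
    """Conventional reading of a LOD score."""
    if z >= 3:
        return "linkage"
    if z < -2:
        return "linkage excluded"
    return "inconclusive"


grid = [i / 100 for i in range(1, 51)]
best_theta = max(grid, key=total_lod)
print(best_theta, round(total_lod(best_theta), 3), interpret(total_lod(best_theta)))
```

With two small families the result is still inconclusive, which is why scores from many pedigrees have to be added before a threshold is crossed.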
Fig. 9. Graph of LOD score against recombination fraction from a hypothetical set of families.
LOD score values are reported for pedigrees 1 and 2 corresponding to the families of Figs. 4 and
8, respectively. All of the LOD score values from n families are added and the total LOD scores
are reported for different ϑ values. The maximum value of Z (Zmax = 3.9) occurs at a recombina-
tion fraction of 0.1, suggesting that the disease locus and the ABO locus are 10 cM apart.
Linkage is conventionally excluded when Z < –2, which corresponds to 100 : 1 odds against
linkage. Values of Z between –2 and 3 are considered inconclusive. In the examples of nail–patella
affected pedigrees reported above, the LOD scores could be easily calculated manually. However,
in many monofactorial diseases several factors can complicate linkage analysis. One of
these is incomplete penetrance. Nail–patella syndrome is a dominant disease with complete
penetrance; that is, all of the individuals carrying the Dd genotype manifest the clinical symp-
toms typical of the syndrome. In several dominant diseases, the penetrance is not 100%, but it
is incomplete (i.e., not all of the individuals carrying the Dd genotype are affected), thus the
genotype of unaffected family members cannot be known for certain. Another complicating
factor is genetic heterogeneity, which means that mutations in different genes are responsible
for the disease. Consequently, in different families, the disease will be coinherited with differ-
ent marker loci.
Therefore, several sophisticated software packages have been developed (such as the LINKAGE
package) to determine the LOD score values for a continuous range of ϑ from very complex
pedigree data, considering factors such as incomplete penetrance and genetic
heterogeneity.
6.1.1.1. MULTIPOINT LINKAGE ANALYSIS
Up to now, we have only considered linkage analysis with two loci (the disease locus and
one marker locus). However, by performing two-point linkage analysis, we have no informa-
tion about the order of the loci. In the example of the nail-patella locus, we know that it is at
about 10 cM from the ABO locus, but it is impossible to determine on which side with respect
to the marker locus it is located.
In multipoint analysis, the inheritance of multiple loci in an interval is considered simulta-
neously. Instead of calculating the recombination fraction between a single marker and the dis-
ease locus “D,” the recombination fraction is determined between the disease locus and several
markers at the same time. In this approach, a fixed map of known markers is used to establish the
position of an unknown disease locus. The locus D is moved across the map and the LOD score
values are calculated for each possible point of the map considering all the markers of the map
simultaneously. A great advantage of multipoint linkage analysis is that it gives a higher LOD
score because the LOD scores obtained at the multiple points of the map can be added. Programs
such as LINKMAP (part of the LINKAGE package) and GENEHUNTER are commonly
used to compute the overall likelihood of the pedigree at each position.
The basic approach currently used in genetic mapping of human diseases involves the analy-
sis of the segregation of many markers distributed all over the genome. To this aim, a map of
about 300 microsatellites spaced, on average, 10 cM apart is used. In affected families, one or a few
microsatellites will eventually be identified that cosegregate with the disease, giving significant
LOD score values.
The next step is collecting as many families as possible and using a denser marker map to
refine the chromosomal position in order to identify the shortest interval containing the disease
gene.
Fig. 10. Identical-by-state (IBS) and identical-by-descent (IBD) alleles. (A) The two sibs
have one identical allele (A1). As the parents’ genotypes are not available, we do not know if A1
is IBS or IBD. (B) The genotypes of the parents reveal that A1 is IBS. (C) The two children
share one IBD allele; they inherited it from the heterozygous mother. (D) Both of the two children
inherited A1 from their homozygous mother. The allele is IBS for sure, but we cannot distin-
guish if it is also IBD. (E) Parents and children are heterozygous for the same genotype; the
children could have inherited A1 from the mother and B1 from the father, or vice versa. They
have two IBS alleles; we cannot distinguish if they are also IBD.
Fig. 11. Affected sib pair linkage analysis. By random segregation, sib pairs share none,
one, or two IBD alleles 25%, 50%, or 25% of the times, respectively. If linkage is present, the
siblings will tend to share one or two IBD alleles more often than 50% and 25%, respectively.
Table 1
Affected Sib Pair Analysis in Type 1 Diabetes
                                                 No. of ASP who inherited from one parent
Chromosome   Marker locus   No. of informative   Same allele   Different alleles   Chi-square
                            meioses              (1 IBD)       (0 IBD)
1            D1S234         142                  73 (51%)      69                  0.062
1            D1S238         147                  75 (51%)      72                  0.061
2            D2S131         138                  67 (49%)      71                  0.116
2            D2S125         147                  68 (46%)      79                  0.823
3            D3S1297        125                  60 (48%)      65                  0.2
3            D3S1265        139                  70 (50%)      69                  0.007
6            D6S258         117                  90 (77%)      27                  33.923
6            D6S291         102                  54 (53%)      48                  0.353
7            D7S503         138                  71 (51%)      67                  0.116
7            D7S684         140                  68 (48%)      72                  0.114
9            D9S144         123                  61 (50%)      62                  0.008
9            D9S118         138                  68 (53%)      70                  0.029
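The chi-square column of Table 1 can be approximated with a simple goodness-of-fit test against the 50 : 50 sharing expected without linkage. The Python sketch below assumes that this one-degree-of-freedom test is the one used in the table (the text does not state it); the function name is hypothetical. Under that assumption it reproduces, for example, the values reported for D6S258 and D2S125.

```python
def sharing_chi_square(same, different):
    """Chi-square (1 df) for allele sharing against a 50:50 expectation."""
    n = same + different
    if n == 0:
        raise ValueError("no informative meioses")
    expected = n / 2
    return ((same - expected) ** 2 + (different - expected) ** 2) / expected


# Counts taken from Table 1.
print(round(sharing_chi_square(90, 27), 3))  # D6S258: 33.923
print(round(sharing_chi_square(68, 79), 3))  # D2S125: 0.823
```

Only the chromosome 6 marker D6S258 shows a large excess of shared alleles.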
markers (300–400) throughout the genome (whole-genome screening). When an elevated de-
gree of allele sharing is found with a marker, it means that a susceptibility gene might be
located nearby. As in the case of a parametric LOD score, it is crucial to evaluate the strength of
findings of linkage at different sites in the genome. This is usually reported as the LOD score or
the p value. A maximum LOD score ratio is used to test whether the degree of sharing is signifi-
cantly distorted from the expected sharing under the hypothesis of no linkage, just like the
parametric LOD score is used to test whether a recombination frequency less than 0.5 is significant.
The p value corresponds to the probability that the observed deviation in allele sharing
has arisen by chance under the independent assortment of the alleles. Several geneticists have
suggested some guidelines for interpreting the significance of evidence of linkage data obtained
from whole-genome screening of ASP to avoid false-positive results. At the same time, caution is
needed not to run the risk of missing true hints of linkage. The significance of a positive linkage
result depends on how often such deviation would arise by chance in a whole-genome search.
Table 2
Lander and Kruglyak Significance Criteria for Mapping Loci Involved
in Complex Diseases by Whole-Genome Scans in Affected Sib Pairs
Category                     Range of p-values      Range of LOD scores
No linkage                   1.00–0.0008            0–2.1
Suggestive linkage           0.0007–0.00003         2.2–3.5
Significant linkage          0.00002–0.0000004      3.6–5.4
Highly significant linkage   ≤ 0.0000003            ≥ 5.4
Confirmed linkage            Significant linkage in an initial study, confirmed
                             in an independent sample
Lander and Kruglyak (3) proposed significance criteria for mapping loci involved in
complex diseases by whole-genome scans, categorizing the results by p value, as shown in
Table 2. In their guidelines, the authors suggested that to reach significant linkage, a threshold
of LOD ≥ 3.6 must be imposed. Moreover, a linkage study that gave a significant LOD
score must be replicated in a second independent set of families (replication study) to be credible
and therefore confirmed.
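The LOD score ranges of Table 2 can be turned into a small lookup, sketched here in Python. The function name is hypothetical, the cut-offs 2.2, 3.6, and 5.4 are read from the lower bounds of the table ranges, and the treatment of values falling between two ranges (for instance 2.15) is an assumption of this sketch. The "confirmed linkage" category is not covered, because it depends on replication and not on a single score.

```python
def classify_lod(lod):
    """Assign a LOD score to a Lander and Kruglyak category (Table 2 cut-offs)."""
    if lod >= 5.4:
        return "highly significant linkage"
    if lod >= 3.6:
        return "significant linkage"
    if lod >= 2.2:
        return "suggestive linkage"
    return "no linkage"


# Illustrative LOD scores.
for score in (0.8, 2.6, 4.3, 65.8):
    print(score, classify_lod(score))
```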
The most widely used software for the nonparametric LOD score is the GENEHUNTER
program, which performs multipoint analysis in affected sib pairs. The results can be shown
graphically as curves reporting the LOD score values (y-axis) for each chromosome marker
(x-axis). Fig. 12 presents a linkage curve for chromosome 6 obtained from a
genomewide scan on sib pairs affected by celiac disease (CD) (4). A highly significant LOD
score is reached at the HLA region, which is known to play an important
role in CD susceptibility.
Fig. 12. Affected sib pair linkage analysis in celiac disease with markers on chromosome 6
(4). Multipoint LOD scores are plotted relative to markers along the chromosome. A highly
significant peak of linkage is detected at markers mapping within the HLA
region.
chromosome 2q33 (IDDM12) has been linked to type 1 diabetes by several groups (6–9) and it
maps near the CTLA4/CD28 gene complex. These two genes are both expressed by T-cells:
CD28 mediates T-cell activation, whereas CTLA4 induces apoptosis. A polymorphism in the
first exon and a microsatellite in the 3' untranslated region of the CTLA4 gene have been
found to be associated with T1DM in independent studies (10–12).
Among the remaining loci, replication of linkage evidence was reported in different data sets
for the susceptibility intervals IDDM4 (11q13), IDDM5 (6q25), IDDM6 (18q21), IDDM8 (6q27)
although there is not as yet an indication for the likely susceptibility gene in these regions.
It is often difficult to detect and replicate linkage data and considerable heterogeneity is
present among data sets. This can be attributed both to statistical errors (including false posi-
tives and false negatives) and to biological causes such as the large number of involved loci,
genetic heterogeneity, variable disease gene frequencies and environmental factors. Given this
situation, the study of many hundreds, or even thousands, of affected sib pairs would be
required to establish linkage and this material is not easy to collect.
Sib pair analysis has been used to investigate many other complex diseases such as asthma,
diabetes, multiple sclerosis, schizophrenia, and celiac disease. All of these studies have been
successful to a limited extent and progress has been extremely slow. Only a few genes and
some genetic regions outside the HLA involved in complex diseases have been identified and
replicated in different studies. Linkage analysis might be ineffective for mapping complex disease
loci with low phenotypic effect and it can succeed only with variants of moderate to large
effect. Association studies provide a more powerful approach to detect disease susceptibility
genes with a relatively modest effect.
Table 3
Reported Susceptibility Loci for Type 1 Diabetes
Locus      Chromosome     LOD score
IDDM1      6p21.3         65.8
IDDM2      11p15.5        4.28
IDDM3      15q26          0.01
IDDM4      11q13          1.37
IDDM5      6q25-q27       1.60
IDDM6      18q21          0.01
IDDM7      2q31-q33       2.62
IDDM8      6q25-q27       0.94
IDDM9      3q22-q25       0.50
IDDM10     10p11-q11      2.80
IDDM11     14q24.3-q31    1.22
IDDM12     2q33           0.77
IDDM13     2q34           0.23
IDDM15     6q21           2.36
IDDM17     10q25          1.38
IDDM18     5q33-34        0.05
D1S617     1q42           2.27
D16S3098   16q22-24       4.13
Table 4
Some Examples of HLA-Associated Diseases
                                      % Individuals carrying
                                      the associated allele
Disease                  Allele       Patients    Controls
Ankylosing spondylitis   B27          90%         9%
Type 1 diabetes          DR3          52%         23%
                         DR4          74%         24%
                         DR3 or DR4   93%         43%
Multiple sclerosis       DR2          60%         30%
Rheumatoid arthritis     DR4          81%         24%
Celiac disease           DQ2          95%         28%
Graves’ disease          DR3          65%         27%
Narcolepsy               DR2          95%         33%
two loci that was present in the ancestor might have been preserved through many generations.
(This conservation is referred to as “linkage disequilibrium.”) Thus, the particular allele combination
would be found more often than expected by chance (i.e., considering the frequencies
of the two alleles). The combination of alleles between more distant loci would be lost because
of the increased probability of crossing-over between the two. As a consequence, when it does
not involve the causal variations, association depends on the presence of linkage disequilib-
rium between the tested markers and the disease gene and it can be observed only using mark-
ers located very near to it. Conversely, detecting association of a marker with the disease
indicates that the marker is located very close to the disease gene. An international project
called the HapMap project has been launched to characterize the LD pattern throughout the
genome, in the hope of facilitating association studies. It has been discovered that LD is highly
structured into discrete islands, called haplotype blocks (13). Each block comes only in a few
common patterns and a few markers can represent a haplotype block. Thus, when the haplotypic
structure of the human genome has been characterized, the search for genes underlying common
diseases will require testing only a few single-nucleotide polymorphisms marking different versions of a
given chromosome region. In the future, whole-genome association studies can be envisaged.
The proportion of subjects with the risk factor who develop the disease is a/(a+b) and it can
be considered as a measure of the risk of developing the disease in the presence of the risk
factor. Similarly, the risk of developing the disease in the absence of the risk factor is c/(c+d).
The RR compares the risk of developing the disease in presence of the allele with the risk in
absence and it is defined by the ratio of the two risks:
\[ RR = \frac{a/(a+b)}{c/(c+d)} = \frac{a\,(c+d)}{c\,(a+b)} \]
The odds ratio is, by definition, the ratio of two odds, where the odds of an event is the ratio
between the probability that the event occurs and the probability that it does not occur. The OR is used in
retrospective studies in which one of the random samples consists of individuals who developed
the disease (cases) and the other random sample consists of unaffected individuals (controls).
In this kind of approach, the number of affected and unaffected is fixed by the researcher and,
thus, it makes no sense to estimate the risk of developing the disease in presence or in absence
of the risk factor. In this case, a comparison of the presence of a risk factor for disease in a
sample of affected subjects and unaffected controls is made: the number of individuals
who possess the allele and have the disease over those who possess it and do not have the
disease (a/b), divided by the number of individuals who do not possess the allele and have the
disease over those who do not possess it and do not have the disease
(c/d). Thus,
\[ OR = \frac{a/b}{c/d} = \frac{ad}{bc} \]
This measure should be used for case-control association studies, in which we retrospectively
look at a disease allele in patients and controls. For example, in one study of 91 patients
affected by late-onset Alzheimer's disease and 71 healthy controls, a particular allele (ε4) of
the apolipoprotein E (ApoE) gene was found in 58 patients and 16 controls:
                         Alzheimer's disease
Allele ε4          Present (patients)    Absent (controls)    Total
Present (ε4+)              58                    16             74
Absent (ε4–)               33                    55             88
Total                      91                    71
The OR is then (58 × 55)/(16 × 33) ≈ 6. This means that the odds of developing Alzheimer's disease
are about six times higher in individuals who carry the ApoE-ε4 allele than in individuals
who do not carry this polymorphism.
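As a check on the arithmetic, the OR for the ApoE-ε4 table can be computed directly from the four
counts. The sketch below also adds an approximate 95% confidence interval based on the standard
error of the log odds ratio (Woolf's method). The interval is not part of the original example and
is included here only as an assumption about how one might judge the precision of the estimate;
the function names are hypothetical.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio ad/bc from a 2x2 table (hypothetical helper)."""
    return (a * d) / (b * c)


def woolf_ci(a, b, c, d, z=1.96):
    """Approximate confidence interval for the OR on the log scale.

    Assumption: Woolf's method, SE(ln OR) = sqrt(1/a + 1/b + 1/c + 1/d);
    z = 1.96 corresponds to a 95% interval.
    """
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Counts from the ApoE-e4 table: carriers (patients, controls),
# then non-carriers (patients, controls).
a, b, c, d = 58, 16, 33, 55
or_apoe = odds_ratio(a, b, c, d)
ci_low, ci_high = woolf_ci(a, b, c, d)
print(f"OR = {or_apoe:.2f}, approximate 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

The point estimate is about 6.0, and under the stated assumption the interval runs from roughly
3 to 12, so it excludes an OR of 1.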
Fig. 13. The transmission disequilibrium test is performed on families with one affected child
(simplex families). n families are genotyped with a microsatellite marker with multiple alleles.
The alleles transmitted from heterozygous parents to the affected child are counted. In the
absence of association, each allele has a 50% chance of being transmitted. If an allele is
involved in the disease, its chance of being transmitted will be higher than 50%. Allele A3 is
transmitted 70% of the time and, thus, it might be in linkage disequilibrium with a
disease-susceptibility allele.
The problem of population stratification can be reduced to some extent by using genetically
isolated populations for case-control studies (e.g., the Finnish or the Sardinian populations).
In a mixed population, a solution comes from family-based association tests such as the
transmission disequilibrium test (TDT) (15).
The TDT is performed on nuclear families consisting of one affected offspring and the two
parents (simplex families; see Fig. 13). The TDT compares the number of times that each parent
transmits a given marker allele to an affected offspring with the number of times that the parent
does not transmit it. When the data of many families are combined, an allele unrelated to the
disease has a 50% chance of being transmitted and a 50% chance of not being transmitted. If an
allele at a locus is responsible for the disease, or if it is very close to the disease locus and
in linkage disequilibrium with it, its chance of being transmitted will be higher than 50%. This
method detects true association resulting from linkage disequilibrium and eliminates the
possibility of spurious association resulting from population stratification. As an example, in a
set of Type 1 diabetes families ascertained in the United States (4), the DQB1*0302 allele is
preferentially transmitted to affected offspring (see Table 5).
A limitation of the TDT is that it is informative only for parents who are heterozygous at the
marker locus. This creates a loss of statistical power because the number of individuals useful
for the analysis is confined to heterozygotes. Moreover, because it requires the parental
genotypes to be specified, it is often not feasible for late-onset diseases, such as Alzheimer's
disease, in which the parents of patients are frequently unavailable for genotyping. Consequently,
although insensitive to population stratification, family-based methodologies can be difficult to
implement and might require the collection of significantly more patients and families than
case-control studies.
Table 5
TDT Result for the HLA-DQB1*0302 Allele in Type 1 Diabetes Families

              No. of times the allele is
              Transmitted      Not transmitted      χ2
Observed      393 (80.3%)      96                   180
Expected      244.5 (50%)      244.5
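The χ2 value in Table 5 can be reproduced from the observed transmission counts. The sketch below
assumes the usual TDT statistic for a single allele, (T - NT)^2/(T + NT) with one degree of
freedom, which equals the sum of (observed - expected)^2/expected over the two cells of the table
when each expected count is (T + NT)/2. The function names are hypothetical, and the p-value
helper is an added illustration that does not appear in the original table.

```python
import math


def tdt_chi_square(transmitted, not_transmitted):
    """TDT statistic for one allele (hypothetical helper): (T - NT)^2 / (T + NT)."""
    return (transmitted - not_transmitted) ** 2 / (transmitted + not_transmitted)


def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Uses the identity P(X > x) = erfc(sqrt(x / 2)), valid for 1 df only.
    """
    return math.erfc(math.sqrt(x / 2.0))


# Observed counts from Table 5 (HLA-DQB1*0302, heterozygous parents).
t, nt = 393, 96
expected = (t + nt) / 2          # 244.5 under 50:50 transmission
chi2 = tdt_chi_square(t, nt)
p_value = chi2_1df_pvalue(chi2)
print(f"expected = {expected}, chi2 = {chi2:.2f}, p = {p_value:.1e}")
```

The statistic comes out at about 180.4, in agreement with the value of 180 reported in Table 5.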
Acknowledgments
The author thanks Professor Patricia Momigliano-Richiardi for her comments and for critically
reviewing this chapter.
References
1. Rommens, J. M., Iannuzzi, M. C., Kerem, B., et al. (1989) Identification of the cystic fibrosis gene:
chromosome walking and jumping. Science 245, 1059–1065.
2. Dib, C., Fauré, S., Fizames, C., et al. (1996) A comprehensive genetic map of the human genome
based on 5,264 microsatellites. Nature 380, 152–154.
3. Lander, E. and Kruglyak, L. (1995) Genetic dissection of complex traits: guidelines for interpreting
and reporting linkage results. Nature Genet. 11, 241–247.
4. Greco, L., Corazza, G., Babron, M. C., et al. (1998) Genome search in celiac disease. Am. J. Hum.
Genet. 62, 669–675.
5. Cox, N. J., Wapelhorst, B., Morrison, V. A., et al. (2001) Seven regions of the genome show
evidence of linkage to type 1 diabetes in a consensus analysis of 767 multiplex families. Am. J.
Hum. Genet. 69, 820–830.
6. Davies, J. L., Kawaguchi, Y., Bennett, S. T., et al. (1994) A genome-wide search for human type 1
diabetes susceptibility genes. Nature 371, 130–136.
7. Morahan, G., Huang, D., Tait, B. D., Colman, P. G., and Harrison, L. C. (1996) Markers on distal
chromosome 2q linked to insulin-dependent diabetes mellitus. Science 272, 1811–1813.
8. Esposito, L., Hill, N. J., Pritchard, L. E., et al. (1998) Genetic analysis of chromosome 2 in type 1
diabetes: analysis of putative loci IDDM7, IDDM12, and IDDM13 and candidate genes NRAMP1
and IA-2 and the interleukin-1 gene cluster. Diabetes 47, 1797–1799.
9. Mein, C. A., Esposito, L., Dunn, M. G., et al. (1998) A search for type 1 diabetes susceptibility
genes in families from the United Kingdom. Nature Genet. 19, 297–300.
10. Nisticò, L., Buzzetti, R., Pritchard, L. E., et al. (1996) The CTLA-4 gene region of chromosome
2q33 is linked to, and associated with, type 1 diabetes. Belgian Diabetes Registry. Hum. Mol. Genet.
5, 1075–1080.
11. Marron, M. P., Raffel, L. J., Garchon, H. J., et al. (1997) Insulin-dependent diabetes mellitus (IDDM)
is associated with CTLA4 polymorphisms in multiple ethnic groups. Hum. Mol. Genet. 6, 1275–1282.
12. Ueda, H., Howson, J. M., Esposito, L., et al. (2003) Association of the T-cell regulatory gene CTLA4
with susceptibility to autoimmune disease. Nature 423, 506–511.
13. Wall, J. D. and Pritchard, J. K. (2003) Haplotype blocks and linkage disequilibrium in the human
genome. Nature Rev. Genet. 4, 587–597.
14. Sachidanandam, R., Weissman, D., Schmidt, S. C., et al. (2001) A map of human genome sequence
variation containing 1.42 million single nucleotide polymorphisms. Nature 409, 928–933.
15. Spielman, R. S., McGinnis, R. E., and Ewens, W. J. (1993) Transmission test for linkage
disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am. J. Hum.
Genet. 52, 506–516.