2006 Stat 246 WK 5 Lec 1

Mapping mouse coat color genes
Statistics 246 Spring 2006

Week 5 Lecture 1
Inbred strains and their crosses
Our main players are the C57BL/6 (BL for black, abbreviated B6), a robust strain that has been around for about 90 years, and the NOD (non-obese diabetic) mouse strain, a delicate diabetes-prone strain discovered in 1980.
Coat colours: agouti is standard, B6 is black, NOD is
albino (i.e. white). There are many others (chocolate,
blue, etc.) but these are the three we meet here.
Normal (wild-type) mouse coat: color = agouti, a grizzled color of fur resulting from the barring of each hair in several alternating dark and light bands
Black mouse: C57BL/6 strain
Albino mouse: non-obese diabetic (NOD) strain
Coat color loci in mice
Four main loci: A, B, C and D
• Locus A – agouti
• Locus B – black
• Locus C (known as Tyr) – albinism
• Locus D – dilution gene
In the discussion that follows, we only see variation at loci A and C. Our mice all have the dominant (black) allele B rather than the recessive (chocolate) allele b at locus B, and the dominant (normal color) allele D rather than the recessive (diluted) allele d at locus D.
Alleles at the Agouti (A) locus
• Ay, Lethal dominant yellow
• Avy, Viable yellow
• Aw, White-bellied Agouti
• A, Agouti or Wild type
• At, Black and Tan
• Am, mottled agouti
• a, Non-agouti
• ae, Extreme non-agouti
A and a are a dominant/recessive allele pair
Alleles at the Albino (C) Locus
• C, full color gene
• cch, chinchilla
• ch, Himalayan
• c, albino gene
C and c are a dominant/recessive pair of alleles
Alleles at A and C interact
(called epistasis in genetics)
Here x, y and w denote arbitrary alleles at the relevant locus.
• If the mouse is aaCy, it is neither agouti nor albino (in our case it is a black mouse)
• If the mouse is AxCy, it is agouti and not albino
• If the mouse is wxcc, it is albino whatever its alleles at the agouti locus, because these are then irrelevant
Crosses
We will denote the NOD mice by A, and the B6 mice by B. The same notation will denote the two homozygotes at a polymorphic marker.
Two main crosses interest us. Both follow the first filial generation, or F1, which we denote by A×B → H; here H denotes heterozygote, which is what our F1s are. The backcross BC is obtained via H×B → BC (or the variant H×A → BC), while the F2 intercross (second filial generation) is denoted by H×H → IC = F2.
Our data
• An F2 intercross was performed starting with C57BL/6 and NOD parental lines.
• We have 133 female mice at the F2 generation: only females, because males fight, and fighting influences other phenotypes of interest (quantitative blood phenotypes).
• They were genotyped at 153 microsatellite markers spanning all 19 autosomes and the X chromosome.
We also have coat color and a few white blood cell phenotypes.
Our markers are microsatellites
In the schematic below, the two alleles differ in the number of CA repeats; genotypes are read by PCR and electrophoresis.
A:  ..AGTCCACACACACACACATGT..  /  ..AGTCCACACACACACACATGT..
H:  ..AGTCCACACACACACACATGT..  /  ..AGTCCACACACACACACACACACATGT..
B:  ..AGTCCACACACACACACACACACATGT..  /  ..AGTCCACACACACACACACACACATGT..
Desirable: to call the genotypes (A, H, or B) automatically.
Problems: stutter and noise, variability of the patterns, etc.
A small portion of the data (beginning)
The file is in Mapmaker (raw) format: the first line gives the cross type, the second the numbers of individuals (133), loci (153) and traits (7). Each marker line then gives the marker name, prefixed by *, followed by one genotype per mouse, starting with mouse 1.
data type f2 intercross
133 153 7
*D10M106 BBABBBBBHBBABBBBAABBBB-BABABABBABBBBBBBBBBBBB-BBBBBBABBAAABBBBBBBBB-HBABABB-ABBBBAB-BBBABABBB-BBBBBCBCBCBHBBBHCBBHBHHBCBBBBBBBHBHBHCH
*D10M14 AHHBHHHAHHABAHBHHBABAA-BHHAHAAHAHHHHHBAHHHAHHBAHBHABBBHAAHHHHAHBHHH--HHHHAHAHAHBHHHAHHABAHHHAHHHAHBHBBHHHAAHAAHHBHHAHAH-HBABAHAHBHHAH
*D10M163 AHBBHHB-HHAB-HBH-BAHBA-BHHAHAAHAAHHAHBAHHHHHHHAHBHABBBHAAHBBHAHBBHHBBHBHHHH-HBHHHHHAHHAHABH-AHHHAHBABBBBAAAHAAHHBHHAHHHBHBAHAHABHHHAH
*D10M20 HCBHAHBAHHAHAHBABAHHBH-HHHABAAHAAABHHBH-HAHBHAAHBCABABHAAABBHAHBHHBBBHBHAHH-HBHHHABAHHHHAHHBAAHHABHABHBHAAHBHAAHBHAAHBHBHBHHHHABAHAAH
D10M106 = a marker on chr 10 defined by MIT.
Incompleteness code: C = B or H, D = A or H, - = missing.
A small portion of the raw data (end)
*DXM210 --HAAAAHHHAHAAAAAHAH-HAHHAHAHHH-H
*DXM222 HAAHHAA-HHAAHAAHHAAAHH-HAAHAAHHH
*DXM39 HAAAHAA-HHAH-AAA-HAAHH-HAAAAHHHHHH
*trait1 112311231222111131111111
*trait2 8.90472059883773 8.62455170973674 8.4546
*trait3 16.0508869012649 16.1080453151048 16.167
*trait4 16.0138456295845 16.0907244541622 16.125
*trait5 13.8887610197039 14.1288603771646 13.986
*trait6 7.1066061377273 6.52209279817015 6.63331
*trait7 8.65927129000923 8.41405243249672 8.1586
The trait lines follow the marker lines: trait1 holds the coat color code, and trait2 (WBC) onward hold the white blood cell phenotypes.
Snapshot of the genotype data
Error Detection
Using the LOD_error statistic, based on close recombination events, which indicate the possible presence of genotyping error (see later).
R/qtl functions: calc.genoprob, calc.errorlod, plot.errorlod
Mendel’s laws for one locus
We can (and should) check Mendel with data from our 133 offspring at each of our 153 loci.
For example, at D7Mit126, we have 24 A, 29 B and 67 H genotypes, adding to 120, indicating 13 incomplete or missing genotypes.
What do we expect according to Mendel? How would we test whether the data agree with our expectations?
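One answer, sketched in Python: Mendel predicts A : H : B = 1 : 2 : 1 in the F2, and a Pearson chi-square goodness-of-fit test on the complete genotypes checks this. The function name chisq_121 is ours, and we use the closed-form upper tail of the chi-square distribution with 2 degrees of freedom rather than a statistics library.

```python
import math

def chisq_121(n_a, n_h, n_b):
    """Pearson chi-square test of the Mendelian 1:2:1 ratio A:H:B in an F2."""
    n = n_a + n_h + n_b
    observed = [n_a, n_h, n_b]
    expected = [n / 4, n / 2, n / 4]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # 3 classes and no estimated parameters -> 2 degrees of freedom; the
    # chi-square(2) upper tail probability has the closed form exp(-x / 2).
    p_value = math.exp(-stat / 2)
    return stat, p_value

# D7Mit126: 24 A, 67 H, 29 B among the 120 complete genotypes.
stat, p = chisq_121(24, 67, 29)
print(f"chi-square = {stat:.2f} on 2 df, p = {p:.2f}")  # chi-square = 2.05 on 2 df, p = 0.36
```

For D7Mit126 there is no evidence against 1:2:1. A likelihood-ratio (G) test would serve equally well.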
Mendel’s law for 2 loci
Mendel inferred the independent segregation of different factors from data on peas.
Here we check that this holds for our two coat color loci; it does not hold in general (linked loci do not segregate independently). We then go on to understand the more general situation.
Mating & Coat color outcomes in this cross
Parental lines: C57BL/6 males, Black (aaBBCC) × NOD females, Albinos (AABBcc)
F1: All Agouti (aABBCc)
F2: Agouti : Black : Albino = 9 : 3 : 4
We need to check these last proportions following Mendel’s reasoning.
Punnett square depicting F1 parental allele
combinations passed on to F2 offspring
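The 9 : 3 : 4 proportions can be checked by enumerating the Punnett square directly. This minimal sketch assumes, as stated earlier, that all mice are BB at locus B and that loci A and C segregate independently; coat_colour is a hypothetical helper encoding the epistasis rules given above.

```python
from collections import Counter
from itertools import product

def coat_colour(agouti_pair, albino_pair):
    """Phenotype from the epistasis rules (locus B assumed fixed at BB)."""
    if "C" not in albino_pair:      # cc: albino whatever the agouti alleles
        return "albino"
    if "A" in agouti_pair:          # at least one A, and not albino
        return "agouti"
    return "black"                  # aa, and not albino

# Each F1 parent is aABBCc. Assuming independent segregation of loci A and C,
# its 4 gametes (A or a) x (C or c) are equally likely.
gametes = [(a, c) for a in "Aa" for c in "Cc"]

# Classify the 16 cells of the Punnett square by phenotype.
counts = Counter(
    coat_colour(g1[0] + g2[0], g1[1] + g2[1])
    for g1, g2 in product(gametes, repeat=2)
)
print(counts["agouti"], counts["black"], counts["albino"])  # 9 3 4
```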
It’s not always like that
2-locus genotypes at D12Mit51 (rows) and D12Mit132 (columns):
51 \ 132   A    H    B    Total
A          26   10   0    36
H          10   46   9    65
B          0    5    23   28
Total      36   61   32   129
If we pool A and H, we do not get 9:3:3:1.
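To see the departure concretely, the sketch below pools A and H at each marker and compares the four resulting classes with the 9:3:3:1 expected under independent segregation. The Pearson chi-square test with 3 degrees of freedom is our choice of test; its upper tail is written in closed form to avoid any library dependency.

```python
import math

# 2-locus genotype counts: rows D12Mit51, columns D12Mit132, order A, H, B.
table = [
    [26, 10, 0],
    [10, 46, 9],
    [0, 5, 23],
]

# Pool A and H at each locus: index 0 -> "A or H", index 1 -> "B".
pool = [0, 0, 1]
pooled = [[0, 0], [0, 0]]
for i in range(3):
    for j in range(3):
        pooled[pool[i]][pool[j]] += table[i][j]

observed = [pooled[0][0], pooled[0][1], pooled[1][0], pooled[1][1]]
n = sum(observed)
expected = [n * w / 16 for w in (9, 3, 3, 1)]   # independent segregation
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# Upper tail of the chi-square distribution with 3 df, in closed form.
p_value = (math.erfc(math.sqrt(stat / 2))
           + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2))
print(observed, round(stat, 1), p_value)
```

The pooled counts are 92, 9, 5 and 23 against expectations of about 72.6, 24.2, 24.2 and 8.1, giving a chi-square near 57.6: strong evidence that these two chromosome 12 markers do not segregate independently.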
Let’s estimate the recombination fraction r between D12Mit51 and D12Mit132
The data are the 2-locus genotypes at D12Mit51 and D12Mit132 tabulated above: 129 offspring from H×H, where A×B→H.
Estimation of r
First note that we can’t simply count recombinants. Why?
Because recombination can occur in the paternal or the maternal meiosis, or both, and all we see are the genotypes of the offspring. In most cases the number of recombinant haplotypes can be inferred, though not which parent they came from, but not in every case.
Denoting the two markers by 1 and 2, the NOD alleles by a and the B6 alleles by b, the parental haplotypes are a1a2 on one chromosome and b1b2 on the other. Each parent passes on a1a2 with probability (1-r)/2, and similarly for b1b2, while it passes on each of the recombinant haplotypes a1b2 and b1a2 with probability r/2.
In practice, recombination fractions differ slightly between male and female meioses, but we ignore this refinement.
Probabilities of parentally transmitted haplotype combinations (×4)
Haplotype combinations resulting from crossing doubly heterozygous parents, each a1/b1 at locus 1 and a2/b2 at locus 2. This table is for coupling: the parental haplotypes are a1a2 and b1b2, i.e. the mother and father are both a1a2/b1b2.
Here P and M denote the paternally (rows) and maternally (columns) transmitted haplotypes, respectively.
P \ M    a1a2      a1b2      b1a2      b1b2
a1a2     (1-r)^2   r(1-r)    r(1-r)    (1-r)^2
a1b2     r(1-r)    r^2       r^2       r(1-r)
b1a2     r(1-r)    r^2       r^2       r(1-r)
b1b2     (1-r)^2   r(1-r)    r(1-r)    (1-r)^2
From the Punnett square to the table of 2-locus genotype probabilities
Terms in the Punnett square table can be summed to build up a table of probabilities for the 9 different 2-locus genotypes.
For example, we observe A (= a1/a1) at locus 1 and H (= a2/b2) at locus 2 if and only if the transmitted paternal and maternal haplotypes are the pairs a1a2 & a1b2 or a1b2 & a1a2, and this occurs with a combined probability of 2r(1-r)/4.
The other terms are built up similarly, the most complex case being the 2-locus genotype HH, where 4 different terms need to be considered, corresponding to the fact that a double heterozygote can result from 4 different combinations of parental or recombinant haplotypes.
Probabilities of 2-locus genotypes (×4)
L1 \ L2   A          H                B
A         (1-r)^2    2r(1-r)          r^2
H         2r(1-r)    2[r^2+(1-r)^2]   2r(1-r)
B         r^2        2r(1-r)          (1-r)^2
Looking at this table, we see that the number of recombinant haplotypes (0, 1 or 2) can be inferred, though not the parent in which a recombination occurred, in every case except HH. We can almost count recombinants.
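The summation described above can be checked mechanically. This sketch enumerates the 4 × 4 Punnett square of transmitted haplotypes for a coupling double heterozygote, assuming (as the text does) the same recombination fraction in both sexes, and collects the products into the 9 two-locus genotypes. Exact fractions make the comparison with the ×4 table exact; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

def haplotype_probs(r):
    """Transmission probabilities for a coupling double heterozygote a1a2/b1b2.

    Keys are (allele at locus 1, allele at locus 2).
    """
    return {
        ("a", "a"): (1 - r) / 2, ("b", "b"): (1 - r) / 2,   # parental
        ("a", "b"): r / 2, ("b", "a"): r / 2,               # recombinant
    }

def call(x, y):
    """Marker code at one locus from the paternal and maternal alleles."""
    if x != y:
        return "H"
    return "A" if x == "a" else "B"

def two_locus_table(r):
    """Sum the 16 Punnett-square cells into the 9 two-locus genotypes."""
    hap = haplotype_probs(r)
    table = {(g1, g2): 0 for g1 in "AHB" for g2 in "AHB"}
    for (pat, p_pat), (mat, p_mat) in product(hap.items(), repeat=2):
        table[(call(pat[0], mat[0]), call(pat[1], mat[1]))] += p_pat * p_mat
    return table

# Exact arithmetic with r = 1/10; print the table multiplied by 4, as in the text.
tab = two_locus_table(Fraction(1, 10))
for g1 in "AHB":
    print(g1, [str(4 * tab[(g1, g2)]) for g2 in "AHB"])
```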
Estimation of r, cont.
Using the table of probabilities we can write down a log likelihood function for any set of 2-locus frequencies. Label the cells of the table 1, …, 9 (row by row, so that cell 5 is HH), and denote the corresponding probabilities by $p_1(r), \dots, p_9(r)$ and the frequencies by $n_1, \dots, n_9$. Then the log-likelihood for the resulting multinomial model is
$\log L(r) = \sum_i n_i \log p_i(r)$.
The parameter r is then estimated by maximizing this function, and an approximate standard error or confidence interval obtained using the Fisher information or the asymptotic chi-square approximation.
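A minimal sketch of direct maximization for the D12Mit51 × D12Mit132 data, with cells numbered row by row from the ×4 table above. Golden-section search is our choice of one-dimensional optimizer (it assumes log L is unimodal on (0, 1/2]); any standard optimizer would do.

```python
import math

# Counts n_1..n_9, row by row: D12Mit51 (rows) x D12Mit132 (columns), A, H, B.
COUNTS = [26, 10, 0,
          10, 46, 9,
          0, 5, 23]

def cell_probs(r):
    """p_1(r), ..., p_9(r): the (x4) table divided by 4."""
    par, one, two = (1 - r) ** 2, 2 * r * (1 - r), r ** 2
    hh = 2 * (r ** 2 + (1 - r) ** 2)
    return [x / 4 for x in (par, one, two, one, hh, one, two, one, par)]

def log_lik(r, counts=COUNTS):
    """Multinomial log-likelihood; cells with n_i = 0 contribute nothing."""
    return sum(n * math.log(p) for n, p in zip(counts, cell_probs(r)) if n > 0)

def mle_r(counts=COUNTS, lo=1e-9, hi=0.5, tol=1e-10):
    """Golden-section search for the maximizer of log L on (0, 1/2]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if log_lik(c, counts) > log_lik(d, counts):
            b, d = d, c                 # maximum lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d                 # maximum lies in [c, b]
            d = a + g * (b - a)
    return (a + b) / 2

r_hat = mle_r()
print(f"r_hat = {r_hat:.4f}")
```

For these data the maximum is near r̂ ≈ 0.14.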
A frill: the M-step of an EM-algorithm
The function log L(r) can be maximized in a number of ways, but in general there is no closed-form expression for the maximum likelihood estimate $\hat r$. If we were able to decompose the count $n_5$ of HHs into the $n_5^P$ that are pairs of parental haplotypes and the $n_5^R$ that are pairs of recombinant haplotypes, with frequencies proportional to $(1-r)^2$ and $r^2$, respectively, then the recombinant haplotypes could be counted directly and the MLE would be
$$\hat r = \frac{2(n_3 + n_7 + n_5^R) + n_2 + n_4 + n_6 + n_8}{2n}.$$
The E-step
In general we don’t know $n_5^R$, but we can estimate it using the following formula:
$$E(n_5^R \mid n_5) = n_5 \, \frac{r^2}{(1-r)^2 + r^2}.$$
In practice, we need a value of r to begin with. We use it in the above formula, then get the next $\hat r$ from the M-step, and then iterate.
Exercise: Prove the above formula, and that the iteration is an instance of the EM-algorithm.
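The E- and M-steps translate directly into a short loop. This sketch uses the D12Mit51 × D12Mit132 counts with cells numbered row by row (so n5 is the HH count); the starting value r = 0.25 and the stopping tolerance are arbitrary choices of ours.

```python
# Counts n_1..n_9, row by row: D12Mit51 (rows) x D12Mit132 (columns), A, H, B.
COUNTS = [26, 10, 0,
          10, 46, 9,
          0, 5, 23]

def em_r(counts=COUNTS, r=0.25, tol=1e-12, max_iter=1000):
    """EM iteration for the recombination fraction r."""
    n = sum(counts)
    _, n2, n3, n4, n5, n6, n7, n8, _ = counts
    for _ in range(max_iter):
        # E-step: expected number of HH mice made of two recombinant haplotypes.
        n5_rec = n5 * r ** 2 / ((1 - r) ** 2 + r ** 2)
        # M-step: proportion of recombinant haplotypes among the 2n transmitted.
        r_new = (2 * (n3 + n7 + n5_rec) + n2 + n4 + n6 + n8) / (2 * n)
        if abs(r_new - r) < tol:
            return r_new
        r = r_new
    return r

print(f"EM estimate: r_hat = {em_r():.4f}")
```

Starting from r = 0.25 the iteration settles near r̂ ≈ 0.141, which should agree with any direct numerical maximization of log L.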
2-locus genotype frequencies for D12Mit132 and D13Mit6
132 \ 6   A    H    B    Total
A         10   21   7    38
H         15   29   17   61
B         5    21   6    32
Total     30   71   30   131
Exercise: Estimate r for these two loci. Is it different from 1/2?
Inferring linkage and mapping markers
We now turn to deciding when two marker loci are linked, and if so, estimating the map distance between them. Then we go on to create a full (marker) map of each chromosome, relative to which we can map trait genes. With these preliminaries completed, we can map trait loci.
The LOD score
Suppose that we have two marker loci, and we don’t know whether or not they are linked. A natural way to address this question is to carry out a formal test of the null hypothesis H: r = 1/2 against the alternative K: r < 1/2, using the marker data from our cross.
The test statistic almost always used in this context is the log10 of the ratio of the likelihood at the maximum likelihood estimate $\hat r$ to that at the null, r = 1/2, i.e.
$$\mathrm{LOD} = \log_{10} \frac{L(\hat r)}{L(1/2)}.$$
Calculating the LOD score
Recall that the (log) likelihood here is based on the multinomial distribution for the allocation of the n intercross mice into their nine 2-locus genotypic categories. As we saw earlier, it can be written
$$\log_{10} L(r) = \sum_i n_i \log_{10} p_i(r),$$
and so we take the difference between this function evaluated at $\hat r$ and at r = 1/2, which is
$$\mathrm{LOD} = \sum_i n_i \log_{10} \frac{p_i(\hat r)}{q_i},$$
where $q_i$ is 1/16, 1/8 or 1/4, depending on i.
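Putting the pieces together for D12Mit51 and D12Mit132, this sketch evaluates the LOD formula on a grid of r values and reports the maximizer. The grid (step 0.0001) is a crude stand-in for a proper optimizer; cells with zero counts are skipped, since they contribute nothing to the sum.

```python
import math

# Counts n_1..n_9, row by row: D12Mit51 (rows) x D12Mit132 (columns), A, H, B.
COUNTS = [26, 10, 0,
          10, 46, 9,
          0, 5, 23]
# q_i: the null probabilities (the table with r = 1/2).
NULL = [1 / 16, 1 / 8, 1 / 16,
        1 / 8, 1 / 4, 1 / 8,
        1 / 16, 1 / 8, 1 / 16]

def cell_probs(r):
    """p_1(r), ..., p_9(r): the (x4) 2-locus table divided by 4."""
    par, one, two = (1 - r) ** 2, 2 * r * (1 - r), r ** 2
    hh = 2 * (r ** 2 + (1 - r) ** 2)
    return [x / 4 for x in (par, one, two, one, hh, one, two, one, par)]

def lod(r, counts=COUNTS):
    """LOD(r) = sum_i n_i log10(p_i(r) / q_i), skipping zero-count cells."""
    return sum(n * math.log10(p / q)
               for n, p, q in zip(counts, cell_probs(r), NULL) if n > 0)

grid = [i / 10000 for i in range(1, 5001)]   # r in (0, 1/2]
r_hat = max(grid, key=lod)
print(f"r_hat ~ {r_hat:.4f}, LOD = {lod(r_hat):.1f}")
```

For these data the LOD is a little over 20 at r̂ ≈ 0.14, far above any conventional threshold.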
Null probabilities of 2-locus genotypes
L1 \ L2   A      H      B
A         1/16   1/8    1/16
H         1/8    1/4    1/8
B         1/16   1/8    1/16
This is just putting r = 1/2 in an earlier table.
Exercise: Suggest some different test statistics to discriminate between the null H and the alternative K. How do they perform in comparison to the LOD?
Using the LOD score
Normal statistical practice would have us set a type 1 error rate in a given context (cross, sample size), and determine the cut-off for the LOD which would achieve approximately the desired error rate under the null hypothesis. This approach is rarely adopted in genetics, where tradition dictates the use of more stringent thresholds, which take into account the multiple testing common in linkage mapping. The tradition was originally motivated by a Bayesian argument, and in fact Bayesian approaches to linkage analysis are increasingly popular. Let us use Bayes’ formula in the form
log10 posterior odds = log10 prior odds + LOD,
where the odds are for linkage. With 20 chromosomes, which we might assume to be of approximately the same size and not too long, the prior probability of two random loci being on the same chromosome, and hence linked, is about 1/20. In order to overcome these prior odds against linkage, and achieve reasonable posterior odds, say 100:1, we would want a LOD of at least 3.
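The arithmetic behind this rule of thumb, as a small sketch: a prior probability of linkage of 1/20 means prior odds of 1:19, so reaching posterior odds of 100:1 requires LOD = 2 + log10 19. The function name required_lod is ours.

```python
import math

def required_lod(prior_prob, target_posterior_odds):
    """LOD needed so that log10 posterior odds reaches the target."""
    prior_odds = prior_prob / (1 - prior_prob)
    return math.log10(target_posterior_odds) - math.log10(prior_odds)

# Prior probability of linkage about 1/20; target posterior odds 100:1.
print(round(required_lod(1 / 20, 100), 2))  # 3.28
```

This gives about 3.3, consistent with the traditional threshold of about 3.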
Linkage groups
And so it has come to pass that a LOD must be >3 to get people’s attention. We’ll be a little more precise later.
The next step is to define what are called linkage groups. These partition the markers into classes such that every pair of markers in a class is either closely linked (i.e. r small) or connected by a chain of markers, each consecutive pair of which is closely linked. In practice, we might define closely linked to be something like
a) $\hat r < c_1$, and b) $\mathrm{LOD}(\hat r) > c_2$, where e.g. $c_1 = 0.2$, $c_2 = 3$.
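A sketch of how linkage groups could be formed from pairwise results: treat each closely linked pair (r̂ < c1 and LOD > c2) as an edge and take connected components, which captures the "chain of markers" in the definition. We use a union-find structure; the function name and the pairwise values below are hypothetical, for illustration only.

```python
def linkage_groups(markers, pairwise, c1=0.2, c2=3.0):
    """Partition markers into linkage groups.

    pairwise maps (m1, m2) -> (r_hat, lod). A pair is 'closely linked' if
    r_hat < c1 and lod > c2; groups are connected components of that graph.
    """
    parent = {m: m for m in markers}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]   # path halving
            m = parent[m]
        return m

    for (m1, m2), (r_hat, lod) in pairwise.items():
        if r_hat < c1 and lod > c2:
            parent[find(m1)] = find(m2)     # merge the two groups

    groups = {}
    for m in markers:
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values())

# Hypothetical (r_hat, LOD) values, for illustration only.
markers = ["M1", "M2", "M3", "M4", "M5"]
pairwise = {
    ("M1", "M2"): (0.05, 12.0),
    ("M2", "M3"): (0.15, 4.1),
    ("M1", "M3"): (0.25, 2.0),   # not close, but joined through M2 (a chain)
    ("M4", "M5"): (0.10, 6.5),
    ("M3", "M4"): (0.45, 0.1),
}
print(linkage_groups(markers, pairwise))  # [['M1', 'M2', 'M3'], ['M4', 'M5']]
```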
Forming linkage groups, cont.
When one tries to form linkage groups, it is not unusual to have to vary c1 and c2 a little, until all markers fall into a group of more than just one marker. When this is done, it is hoped that the linkage groups correspond to chromosomes. If the chromosome number of the species is known, and it coincides with the number of linkage groups, this is a reasonable presumption. But much can happen to dash this hope: one may have two linkage groups corresponding to different arms of the same chromosome, and not know that; one can have a marker at the end of one chromosome “linked” to a marker at the end of another chromosome, though this should be rare if there is plenty of data; and so on.
Ordering linkage groups
Next we want to order the markers in a linkage group (ideally, a chromosome). How do we do that? An initial ordering can be obtained by starting with one of the markers of the most distant pair, M1 say, distance here being recombination fraction or map distance. Call M2 the marker closest to M1, and continue in this way.
Now we want to confirm our ordering. One way is to calculate a (maximized) log likelihood for every ordering, and select the one with the largest log likelihood. But if we have (say) 11 markers on a chromosome, this is 11! ≈ 4×10^7 orders. What people often do is take moving k-tuples of markers, and optimize the order of each, e.g. with k = 3 or 4. Whichever strategy one adopts, multilocus (i.e. >2 locus) methods are needed.
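A sketch of the initial ordering heuristic, under our reading of "continue in this way" as a nearest-neighbour chain: each next marker is the unplaced one closest to the last placed. The distance function and marker positions are hypothetical.

```python
from itertools import combinations

def greedy_order(markers, dist):
    """Initial marker order by a nearest-neighbour chain.

    dist(m1, m2) is a symmetric distance (recombination fraction or map
    distance). Start at one end of the most distant pair, then repeatedly
    append the closest marker not yet placed.
    """
    start, _ = max(combinations(markers, 2), key=lambda pair: dist(*pair))
    order = [start]
    remaining = set(markers) - {start}
    while remaining:
        nxt = min(remaining, key=lambda m: dist(order[-1], m))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Hypothetical map positions (cM), used only to generate pairwise distances.
pos = {"D": 40.0, "A": 0.0, "C": 25.0, "B": 12.0, "E": 61.0}
print(greedy_order(list(pos), lambda x, y: abs(pos[x] - pos[y])))
```

Such a greedy order is only a starting point; it should then be checked with multilocus likelihoods, e.g. over moving windows of 3 or 4 markers.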
Likelihoods for 3-locus data
Suppose that we have 3 markers M1, M2 and M3 in that order. How do we calculate the log likelihood of the associated 3-locus marker data from our intercross?
Recalling the discussion preceding the Punnett square of the last lecture, the parental haplotypes here are a1a2a3 and b1b2b3, while there are no fewer than 6 recombinant haplotypes: the four single recombinants a1a2b3, a1b2b3, b1b2a3 and b1a2a3, and the two double recombinants a1b2a3 and b1a2b3.
Proceeding as before, we calculate the probability of each of these in terms of the recombination fractions r1 and r2 across intervals M1-M2 and M2-M3, respectively. For simplicity, we assume the Poisson model, with independence of recombination across disjoint intervals. For example, a parent transmits a1a2a3 with probability (1-r1)(1-r2)/2, a1a2b3 with probability (1-r1)r2/2, and a1b2a3 with probability r1r2/2.
We would do this for every one of the 8 paternal and 8 maternal haplotypes, and then collect them up to assign probabilities to each of the 3^3 = 27 3-locus genotypes (AAA, AAH, …, BBB), and maximize the multinomial likelihood in the parameters r1 and r2. This is just as in the 2-locus case.
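The bookkeeping just described can be sketched as follows, assuming no interference (independent recombination in M1-M2 and M2-M3, as stated). It builds the 8 haplotype transmission probabilities, sums over all 8 × 8 (paternal, maternal) pairs, and returns the probabilities of the 27 genotypes; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def haplotype_probs(r1, r2):
    """P(parent a1a2a3/b1b2b3 transmits each 3-locus haplotype), no interference."""
    probs = {}
    for h in product("ab", repeat=3):
        p = Fraction(1, 2)                         # which allele at M1
        p *= r1 if h[0] != h[1] else 1 - r1        # recombination in M1-M2?
        p *= r2 if h[1] != h[2] else 1 - r2        # recombination in M2-M3?
        probs["".join(h)] = p
    return probs

def genotype_probs(r1, r2):
    """Collect the 8 x 8 haplotype pairs into the 27 genotypes AAA, ..., BBB."""
    code = {("a", "a"): "A", ("b", "b"): "B", ("a", "b"): "H", ("b", "a"): "H"}
    hap = haplotype_probs(r1, r2)
    out = {}
    for (pat, pp), (mat, pm) in product(hap.items(), repeat=2):
        g = "".join(code[(x, y)] for x, y in zip(pat, mat))
        out[g] = out.get(g, 0) + pp * pm
    return out

probs = genotype_probs(Fraction(1, 10), Fraction(1, 5))
print(len(probs), sum(probs.values()))   # 27 1
```

Maximizing the multinomial likelihood in r1 and r2 would then proceed as in the 2-locus case, with 27 cells instead of 9.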
Multilocus linkage: #loci >3
It should have become clear by now that the strategy just outlined is not going to work too easily when there are (say) 11 loci in a linkage group.
In that case, haplotypes are strings of the form a1a2b3…a10b11, of which just 2 are parental and 2^11 - 2 are distinct recombinant haplotypes. The number of (paternal, maternal) haplotype combinations is the square of the total number of haplotypes, (2^11)^2 = 2^22, and these must be mapped into 3^11 11-locus genotypes, and a multinomial MLE carried out to estimate 10 recombination fractions. What can be done?
In 1987 the first large-scale human genetic map was published, and at the same time a new algorithm was announced for both human pedigrees and experimental crosses, such as our intercross. This algorithm made use of hidden Markov models, and for the first time allowed full likelihood calculations in our current context without the exponential blow-up just described.
Multilocus mapping
Here we show how, using Rabiner’s notation, we can set up an HMM. Then we calculate our probabilities via the forward algorithm.
Note that in our case the Markov chain is non-stationary (time-inhomogeneous): its transition probabilities change from time (here locus) to time (locus). For simplicity, we omit the locus subscript.
State space: {aa, ab, ba, bb} = {a,b}×{a,b}.
Transition probabilities: P⊗P (Kronecker product), where
$$P = \begin{pmatrix} 1-r & r \\ r & 1-r \end{pmatrix}.$$
Note: using states {A, H, B} won’t work. Why?
Observation set: {A, H, B, C, D, -}. Here C = not A, D = not B.
Emission probabilities: here just the obvious ones, e.g. pr(emit A | aa) = pr(emit D | aa) = 1.
Initial probabilities: $\pi_i = 1/4$ for all i.
Multilocus mapping, cont.
I’m not going to cover this topic in any more detail this year, as I discussed it a few years ago, and those interested can read about it there:
www.stat.berkeley.edu/users/terry/Classes/s260.1998/index.html
We use the HMM forward algorithm on each mouse’s data, one by one, and multiply to get the likelihood, just as we described last week for the backcross. In practice we take logs, and need some tricks (such as rescaling) to deal with underflow. Parameter estimation can also be dealt with using another HMM algorithm (the forward-backward, or Baum-Welch, algorithm).
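A minimal sketch of the forward algorithm for intercross mice, using the state space, P⊗P transitions, compatibility-based emissions and uniform initial probabilities described above. It assumes no genotyping error and that the first letter of a state is the paternal allele (our labelling). Rescaling alpha at each locus and accumulating log10 of the scale factors is one standard way to avoid underflow; log-likelihoods are then added across mice.

```python
import math

# Hidden states: (paternal allele, maternal allele); a = NOD, b = B6.
STATES = ["aa", "ab", "ba", "bb"]

# Observed codes and the states each is compatible with (C = not A, D = not B).
COMPATIBLE = {
    "A": {"aa"}, "H": {"ab", "ba"}, "B": {"bb"},
    "C": {"ab", "ba", "bb"}, "D": {"aa", "ab", "ba"}, "-": set(STATES),
}

def transition(r):
    """4 x 4 matrix P (x) P: each parental meiosis switches origin with prob. r."""
    def step(x, y):
        return r if x != y else 1 - r
    return [[step(s[0], t[0]) * step(s[1], t[1]) for t in STATES] for s in STATES]

def emit(obs, state):
    """'Obvious' emission probabilities: 1 if compatible, else 0."""
    return 1.0 if state in COMPATIBLE[obs] else 0.0

def log10_lik_mouse(codes, rec_fracs):
    """Scaled forward algorithm: log10 P(codes) for one mouse.

    codes: one observed code per marker, in map order.
    rec_fracs: recombination fractions for the len(codes) - 1 intervals.
    """
    alpha = [0.25 * emit(codes[0], s) for s in STATES]     # pi_i = 1/4
    log_lik = 0.0
    for obs, r in zip(codes[1:], rec_fracs):
        scale = sum(alpha)
        log_lik += math.log10(scale)
        alpha = [x / scale for x in alpha]                 # avoid underflow
        T = transition(r)
        alpha = [emit(obs, t) * sum(alpha[i] * T[i][j] for i in range(4))
                 for j, t in enumerate(STATES)]
    return log_lik + math.log10(sum(alpha))

def log10_lik(mice, rec_fracs):
    """Mice are independent, so their log-likelihoods add."""
    return sum(log10_lik_mouse(codes, rec_fracs) for codes in mice)

# Made-up genotype strings for three mice at three markers.
print(log10_lik(["AHB", "HH-", "DHB"], [0.1, 0.2]))
```

With two loci and complete data this reproduces the entries of the 2-locus genotype table, e.g. (1-r)^2/4 for AA.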
Now suppose that we have ordered our marker loci as just described, either by maximizing the likelihood within linkage groups over all orders, or by doing so in moving windows of size 3-5. How do we look at the result?
Checking the map, after removal of bad markers
Top triangle: a transform of the recombination fraction, namely $-4(1 + \log_2 r)$. Bottom triangle: the LOD scores at the maximum likelihood estimate of the recombination fraction. Notice the “bad” bits in the top left-hand and bottom right-hand corners.
R/qtl functions: est.rf, plot.rf
Checking existing genetic maps
As indicated earlier, the markers in our cross came from MIT, and they were already mapped. Most researchers would simply use the pre-existing map, as this would usually (but not always) be based on many more recombinations than could be expected in a single cross. Why might we not just do the same?
Well, existing maps are rarely completely error-free, and one should always look at one’s own data.
An added benefit of looking at one’s own data in relation to an existing map is that this should bring to light markers with a large number of genotyping errors, assuming the map is correct.
Interplay between error
detection and maps
• Genotyping errors in mouse crosses can usually only be detected through the appearance of unusual numbers of close recombination events
• This depends entirely on the quality of the map
• The availability of the mouse genome sequence allows us to check genetic maps against the physical maps: we locate the (unique) PCR primers for our microsatellite markers. This has ushered in a new era in the quality of maps (including human genetic maps!).
The next slide depicts the genetic map we used.
Locations of our markers
After a commercial, we move on to mapping coat color genes.
R/qtl
Authors: Karl Broman, Hao Wu, Gary Churchill, Saunak Sen, & Brian Yandell
Benefits of using R/qtl
• Lots of graphics
• Good error detection with accompanying graphics
• Single- and two-QTL mapping (with interaction terms)
• Choice of several input formats
– Includes Mapmaker format
• Many alternatives for mapping methods
• Many different models for phenotypes, e.g. standard normal, nonparametric model, binary traits
Why map coat color genes in our C57BL/6 × NOD F2 intercross?
• the locations of these genes are known
• even with a modest number of mice we should be able to map these genes easily
• it is a useful check that everything is as it should be with our data
• and finally, it is a good exercise for us.
Exercise. Look up the agouti and albino loci at the Mouse Genome Informatics database.
Recall our earlier Punnett square
Segregation data at a “random” marker
Phenotype by genotype at D12Mit51 (complete data only)
           A    B    H
Agouti     19   18   35
Black      8    3    18
White      9    7    12
Mapping a segregating trait
We turn now to mapping the two coat color genes segregating in our cross, beginning with the albino locus, and then the agouti locus. To do so, we need a genetic model, that is, we need to know or guess the relation between genotypes at our trait loci and phenotypes, which is embodied in the notion of a penetrance function.
Looking at the preceding table, the albino trait segregates just as though governed by a recessive gene, so we postulate a locus with a recessive and a dominant allele for it. Although this is not precisely the case for the non-agouti trait, it almost is, and we do likewise.
Later we will consider their interaction.
Probabilities of albino-marker genotypes (×4)
Recall that the NOD mouse (A) is homozygous for the albino allele, while the C57BL/6 (B) is homozygous for the non-albino allele. We can collapse an earlier table to get (×4)
Colour \ Marker   A             H                B
Albino            (1-r)^2       2r(1-r)          r^2
Full color        1-(1-r)^2     2-2r(1-r)        1-r^2
Here r is the recombination fraction between a marker and the albino locus.
Segregation data at the marker closest to Tyrc
Phenotype by genotype at D7Mit126, @ 50 cM (the Tyrc locus is at 44 cM)
           A    B    H
Agouti     3    19   47
Black      0    10   19
White      21   0    1
Mapping the albino locus
Plot of LOD score at each marker along the genome
Chromosome 7 genotypes for the albino mice. A: homozygous NOD, B: homozygous B6, H: heterozygote. Genotypes are read down. Pale blue shading is conserved NOD haplotype.
D7Mit128 is near the Tyrc locus.
Approximate probabilities of agouti-marker genotypes (×4)
Recall that the C57BL/6 (B) is homozygous for non-agouti, while the NOD (A) is homozygous agouti. Ignoring the 1/16 of the intercross mice that would exhibit the non-agouti trait (and be black) if they weren’t albino, we get the following approximate table, under which 1/16 of the mice will be misclassified. Here r is the recombination fraction between a marker and the agouti locus.
Colour \ Marker   A             H                B
Non-black         1-r^2         2-2r(1-r)        1-(1-r)^2
Black             r^2           2r(1-r)          (1-r)^2
Segregation data at the marker closest to the agouti locus
@ 87 cM (agouti locus is at 89 cM)
           A    B    H
Agouti     24   2    46
Black      0    28   1
White      5    6    14
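Both single-marker analyses can be sketched with one function, assuming the recessive penetrance model of the two (×4) tables above: "affected" means albino (white) for the Chr 7 marker and black for the Chr 2 marker. Mice that are aa but also albino appear white and are counted as non-black, so the agouti fit inherits the 1/16 misclassification noted above. The grid search and function names are our choices, and these single-marker LODs are not meant to reproduce the values reported in the conclusion.

```python
import math

def cell_probs(r):
    """Probabilities for (affected, unaffected) x marker genotype (L, H, O).

    L: marker homozygous for the strain carrying the recessive trait allele;
    O: marker homozygous for the other strain; r: marker-trait rec. fraction.
    """
    aff = [(1 - r) ** 2 / 4, 2 * r * (1 - r) / 4, r ** 2 / 4]
    unaff = [1 / 4 - aff[0], 1 / 2 - aff[1], 1 / 4 - aff[2]]
    return aff + unaff

def lod(r, counts):
    """Single-marker LOD: sum n_i log10(p_i(r) / p_i(1/2)), zero cells skipped."""
    return sum(n * math.log10(p / q)
               for n, p, q in zip(counts, cell_probs(r), cell_probs(0.5)) if n > 0)

def fit(counts):
    """Grid search for r_hat on (0, 1/2]; returns (r_hat, max LOD)."""
    grid = [i / 10000 for i in range(1, 5001)]
    r_hat = max(grid, key=lambda r: lod(r, counts))
    return r_hat, lod(r_hat, counts)

# Albino at D7Mit126: recessive c comes from NOD, so L = A, O = B.
albino = [21, 1, 0,                     # white:       A, H, B
          3 + 0, 47 + 19, 19 + 10]      # full colour: agouti + black
# Black (non-agouti) at the Chr 2 marker: recessive a comes from B6, so L = B, O = A.
black = [28, 1, 0,                      # black:       B, H, A
         2 + 6, 46 + 14, 24 + 5]        # non-black:   agouti + white

for name, counts in [("albino", albino), ("agouti", black)]:
    r_hat, max_lod = fit(counts)
    print(f"{name}: r_hat = {r_hat:.3f}, LOD = {max_lod:.1f}")
```

Both fits give small r̂ (roughly 0.04 for albino and 0.07 for agouti) and LODs well above 3.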
Mapping the agouti locus
Plot of LOD score at each marker along the genome
Chromosome 2 genotypes for the black progeny. Mauve shading indicates conserved C57BL/6 haplotype.
Marker D2Mit48 is very close to the agouti locus.
Conclusion: single locus mapping
• agouti locus (A,a alleles) on Chr 2 at 89.9 cM
• albino locus (C,c alleles) on Chr 7 at 44 cM (now known as the Tyrc gene)
• In the data set:
  – at 89 cM on Chr 2 with a LOD score > 20 (marker D2M48, the 8th marker on Chr 2)
  – at 43 cM on Chr 7 with a LOD score > 20 (marker D7M126, the 4th marker on Chr 7)
The method worked for agouti, even though 1/16th of the mice were misclassified.
Acknowledgement
This lecture would not have been possible without the very substantial input of Melanie Bahlo and Tom Brodnicki of the Walter & Eliza Hall Institute of Medical Research (WEHI), Melbourne, Australia.
Tom (together with people from the WEHI mouse facility) carried out the cross and did all the phenotyping, while Melanie did all the data analysis presented, and contributed a lot to the presentation.
Overall, responsibility for the presentation (especially all the errors!) remains mine.
