Genetics M It

Genetics Lecture 1
We will begin this course with the question: What is a gene?
This question will take us four lectures to answer because there are actually several
different definitions that are appropriate in different contexts.
We will start with a physical definition of the gene. Conceptually this is the simplest and
it will give me an excuse to briefly review some of the molecular biology that you probably
already know.
Genes are made of DNA
For this course we will mostly think of DNA as an information molecule not as a chemical
substance.
In 1953, Watson and Crick deduced that the structure of DNA was a double helix. It
was not the helical structure per se, but the discovery of complementary base pairing
that revealed how information could be encoded in a molecule and how this information
could be exactly duplicated at each cell division. This is known as Replication.
In order to extract information from the DNA, the cell again uses complementary
base-pairing to copy the information onto an RNA molecule. This is known as
Transcription. RNA is chemically less stable than DNA, and mRNA can be
thought of as a temporary copy of DNA’s information.
[Figure: DNA → (Transcription) → mRNA → (Translation) → Folded proteins: enzymes, structural proteins, membrane channels, hormones]
Gene: DNA segment needed to make a protein
Genes are typically 10^3 - 10^4 base pairs in size although they can be much larger. For
example, the human dystrophin gene is 2 x 10^6 base pairs.
E. coli has about 4,200 genes, which isn’t very many considering that at least 1,000
different enzymes are needed to carry out just the basic biochemical reactions in a cell.
The smallest genome for a free-living organism (i.e. a cell, not a virus) is that of the
bacterium Mycoplasma genitalium, which encodes only 467 genes. Humans are at the
other end of the spectrum of complexity and have about 35,000 genes.
In the demonstration in class you see how a mutation in the Shibire gene in the fly
Drosophila gives a heat sensitive protein that is required for synaptic transmission.
When the flies that carry this mutation are warmed by the projector lamp they become
paralyzed.
Gene — Protein — Cell Process — Organism “disease”
(Shibire) (Dynamin) (Synaptic Signaling) (Paralyzed Fly)
This example illustrates two powerful aspects of genetic analysis. First, we can follow
microscopic changes in the DNA such as the Shibire mutation as they are revealed by the
macroscopic consequences of the mutation such as a paralyzed fly. Second, we have a
very precise way of studying the function of individual proteins by examining the
consequences of eliminating just that one protein function in an otherwise normal
organism.
Alleles: different versions of the same gene
Often alleles are referred to as mutants but actually this usage is often incorrect
particularly when we discuss naturally occurring variants in a population.
Mutation: an altered version of a gene when we have “witnessed” the alteration but not
when it is preexisting in the population.
Genotype: all alleles of an individual
Wild type: defined standard genotype
The concept of wild-type is used as a defined reference for organisms where we can do
breeding experiments. Of course there is no realistic way to define a standard genotype
for humans, therefore “wild type” has no meaning when we discuss human genetics.
The physical definition of the gene is a very good one but there are many instances where
we wish to study genes whose DNA sequences are not known. For example, say we have
isolated a new mutant fly that is also paralyzed and we want to know whether this
mutation is also in the Shibire gene. We will see in the next several lectures that we can
answer this question without knowledge of the DNA sequence either by a test for gene
function known as a complementation test or by a test of the chromosomal position of the
mutation by recombinational mapping. In practice, these other ways of defining genes by
function or by position are often much more useful than a definition based on the DNA
sequence.
Lecture 2
In this lecture we are going to consider experiments on yeast, a very useful organism for
genetic study. Yeast is more properly known as Saccharomyces cerevisiae, which is the
single-celled microbe used to make bread and beer. Yeast can exist as haploids of either
mating type α (MATα) or mating type a (MATa). Haploid cells of different mating types
when mixed together will mate to make a diploid cell.
Haploids and diploids are isomorphic – meaning that a given mutation will cause essentially
the same change in haploid and diploid cells. This allows us to look at the effect of having
two different alleles in the same (diploid) cell.
All yeast needs to grow are salts, minerals, and glucose (minimal medium). From these
compounds, yeast cells can synthesize all of the molecules such as amino acids and
nucleotides that are needed to construct a cell. The synthesis of complicated molecules
requires many enzymatic steps. When combined, these enzymatic reactions constitute a
biochemical pathway.
Consider the pathway for the synthesis of the amino acid histidine.
A → B → C → D → histidine → Protein
Enzyme: 1 2 3 4
Each intermediate compound in the pathway is converted to the next by an enzyme. For
example, if there is a mutation in the gene for enzyme 3 then intermediate C can not be
converted to D and the cell can not make histidine. Such a mutant will only grow if
histidine is provided in the growth medium.
This type of mutation is known as an auxotrophic mutation and is very useful for genetic
analysis.
                    growth on minimal    growth on minimal + histidine
His+ (wild type)            +                         +
His–                        –                         +
Phenotype: All traits of an organism (with an emphasis on trait under investigation)
Homozygote: diploid with two like alleles of same gene
Heterozygote: diploid with two different alleles of same gene
Recessive Allele: trait not expressed in heterozygote
genotype      phenotype    Mate to:       diploid genotype    diploid phenotype
MATa His3–    His–         MATα His3–     His3–/His3–         His–
MATa His3–    His–         MATα His3+     His3–/His3+         His+
Based on the His+ phenotype of the His3–/His3+ heterozygote, we would say that His3– is
recessive to wild type.
Let’s consider a different kind of mutation giving resistance to copper that occurs in a
gene known as CUP1.
genotype      phenotype           Mate to:       diploid genotype    diploid phenotype
MATa Cup1r    copper resistant    MATα Cup1+     Cup1r/Cup1+         copper resistant
Dominant Allele: trait is expressed in heterozygote
Cup1r is dominant to wild type (Cup1+).
The terms dominant and recessive are simply shorthand expressions for the results of
particular experiments. If someone says a particular allele is dominant that means that
at some point they constructed a heterozygous diploid and found that the trait was
expressed in that diploid.
Note: Sometimes an allele will have more than one phenotype and may be recessive for
one and dominant for another. In such cases, the phenotype must be specified when one
is making statements about whether the allele is dominant or recessive. Consider for
example, the allele for sickle cell hemoglobin in humans designated Hbs. Heterozygous
individuals (Hbs/Hba) are more resistant to malaria, thus Hbs is dominant for the trait of
malaria resistance. On the other hand, Hbs/Hba heterozygotes do not have the debilitating
sickle cell disease, but Hbs/Hbs homozygous individuals do. Therefore, Hbs is recessive
for the trait of sickle cell disease.
Once we find out whether an allele is dominant or recessive, we can already infer
important information about the nature of the allele. The following conclusions will
usually be true.
Recessive alleles usually cause the loss of something that is made in wild type
Dominant alleles usually cause increased activity or new activity
It turns out that the Cup1r allele actually carries more copies of the gene for a copper
binding protein and therefore increases the activity of the gene.
Last lecture we defined the gene structurally as the DNA needed to encode a protein.
We can now define a gene in a new way based on its function. Using the phenotypic
difference between wild type and a recessive allele we can use a Complementation test to
determine whether two different recessive alleles are in the same gene.
Say you isolate a new recessive histidine requiring mutation that we will call HisX–. In
principle, this mutation could be in His3 or it could be in any of the other genes in the
histidine biosynthetic pathway. In order to distinguish these possibilities we need a test
to determine whether HisX is the same as His3.
To carry out a complementation test, one simply constructs a diploid carrying both the
His3– and HisX– alleles.
An easy way to do this would be to mate a MATα HisX– strain to a MATa His3– strain.
possibility    genotype of diploid          phenotype of diploid    complementation
HisX = His3    His3–/His3–                  His–                    No
HisX ≠ His3    His3–/His3+, HisX–/HisX+     His+                    Yes
Having performed this test, if the two mutations don’t complement we conclude that they
are in the same gene. Conversely, if they do complement we conclude that they are in
different genes.
This test only works for recessive mutations. Think about what the outcome would be if
HisX– were dominant.
The complementation test can be thought of in the following way. If I have an allele
with an observable phenotype whose function can be provided by a wild type genotype
(i.e., the allele is recessive) — I can ask whether the function that was lost because of
the recessive allele can be provided by another mutant genotype. If not, the two
alleles must be defective in the same gene. The beauty of this test is that the trait
can serve as a read-out of gene function even without knowledge of what the gene is
doing at a molecular level.
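The logic of the complementation test can be sketched in a few lines of Python. This is only an illustrative model, not a description of any real software: each haploid strain is represented as a dictionary saying whether each pathway gene is functional, the gene names His1, His2 and His4 are hypothetical placeholders for the other genes in the pathway, and every mutation is assumed to be a recessive loss of function.

```python
# Illustrative model (assumption): a haploid strain maps each gene name to
# True (functional copy) or False (defective, recessive loss-of-function copy).
PATHWAY = ["His1", "His2", "His3", "His4"]  # hypothetical gene names


def diploid_phenotype(haploid_a, haploid_b, pathway_genes):
    """Return 'His+' if every pathway gene has at least one functional copy."""
    for gene in pathway_genes:
        if not (haploid_a[gene] or haploid_b[gene]):
            return "His-"
    return "His+"


def complement(haploid_a, haploid_b, pathway_genes):
    """Two recessive mutations complement if the diploid shows the wild-type trait."""
    return diploid_phenotype(haploid_a, haploid_b, pathway_genes) == "His+"


wild_type = {gene: True for gene in PATHWAY}
his3_mutant = dict(wild_type, His3=False)
hisx_same_gene = dict(wild_type, His3=False)   # HisX is an allele of His3
hisx_other_gene = dict(wild_type, His4=False)  # HisX is in a different gene

same_gene_result = complement(his3_mutant, hisx_same_gene, PATHWAY)
other_gene_result = complement(his3_mutant, hisx_other_gene, PATHWAY)
```

In this sketch the diploid is His– only when both haploid parents are defective in the same gene, which is exactly the "do not complement" outcome in the table.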
Definitions from the Language of Genetics
Gene: The fundamental unit of heredity, which can be defined in three ways: i) A gene can be
defined in molecular terms as a segment of DNA carrying the information necessary to express a
complete protein or RNA molecule, including the promoter and coding sequence. ii) A gene can be
defined by function with a group of recessive mutations that do not complement each other. iii) A
gene can be defined by position with a single-locus segregation pattern in a cross between lines with
different alleles. Examples are a 1:3 phenotypic ratio in the F2 generation in a cross between diploid
organisms or a 2:2 segregation pattern in yeast tetrad analysis.
Alleles: Distinguishable versions of the same gene.
Locus: The site on a chromosome where a gene is located. Usually defined by recombinational
mapping relative to neighboring loci.
Genotype: The allelic constitution of an individual, usually with emphasis on the gene or genes
under examination.
Phenotype: All of the traits or characteristics of an organism, usually with emphasis on traits
controlled by the gene or genes under examination.
Wild type: A standard genotype that is used as a reference in breeding experiments. Note that for
human crosses there is no standard genotype and the concept of wild-type is therefore not meaningful.
Haploid: A cell or organism with one set of chromosomes (1n).
Diploid: A cell or organism with two sets of chromosomes (2n).
Homozygous: The condition of having two like alleles in a diploid.
Heterozygous: The condition of having two different alleles in a diploid.
Dominant allele: An allele that expresses its phenotypic effect or trait in the heterozygous state.
Recessive allele: An allele whose phenotypic effect or trait is not expressed in a heterozygous state.
Incomplete dominance: The case where a heterozygote expresses a phenotype intermediate
between the corresponding homozygote phenotypes.
Complementation test: A test of gene function where two genotypes with recessive alleles are
combined by a cross to test whether the genotype of one parent can supply the function absent in the
genotype of the other parent.
F1: First generation produced by interbreeding of two lines.
F2: Generation produced by interbreeding of F1 individuals.
Incomplete penetrance: Cases where certain alleles are not always expressed to give observable
traits because of other environmental or genetic influences.
True-breeding: Refers to a line of individuals that on intercrossing always produce individuals of
the same phenotype. This can almost always be taken to mean that the individuals are homozygous.
Lecture 3
Now let’s consider diploid organisms:
The genotype of the zygote will depend on which alleles are carried in the gametes.
Allele in gamete         sperm
                      A        a
egg        A         A/A      A/a
           a         a/A      a/a
                      (Zygote)
When heterozygotes mate their offspring will have different phenotypes: If A is dominant
to a, the two possible phenotypes will be the phenotype of a/a or the phenotype of
A/A and A/a.
When we do breeding experiments it is important to know the genotypes of the parents.

But as you can see from the example above individuals with the dominant trait could be
either A/A or A/a. A method to control this type of variation is to start with populations
that we know to be homozygous. One way to do this is to keep inbreeding individuals until
all crosses among related individuals always produce identical offspring. This is known as
a true-breeding population and all individuals can be assumed to be homozygous.
True Breeding: homozygous for all genes
Say we have a true breeding line of shibire flies. These flies are paralyzed and have
genotype shi–/shi–.
First, we can test to see whether the shibire allele is dominant or recessive.
shi–/shi– x (wild type) shi+/shi+
↓
all are shi–/shi+
(The offspring from a cross of two true breeding lines are known as the F1 or first filial
generation.) The F1 flies appear like wild type; therefore shi– is recessive (not expressed
in the heterozygote).
Say we have isolated a new paralyzed mutant that we call par–.
We start with a true breeding par– strain that we mate to wild type. We find that the
mutation is not expressed in the F1 heterozygotes and therefore is recessive.
To find out whether par– is the same as shi– we can do a complementation test since
both mutations are recessive. For this test, we cross a true breeding par– strain to a
true breeding shi– strain.
par–/par– x shi–/shi–
↓
F1 (these flies must inherit both shi– and par–)
Possible outcome    Complementation?     Explanation                          Inferred genotype
F1 not paralyzed    shi– and par–        par– genotype can supply function    par–/par+, shi+/shi–
                    complement           missing in shi– and vice versa
F1 paralyzed        shi– and par–        par– has lost function needed        shi–/shi–
                    do not complement    to restore shi–
Let’s look more carefully at gene segregation in a cross between F1 flies.
shi–/shi+ x shi–/shi+
What is the probability of a paralyzed fly in the next (F2) generation?
Definition: $p(a) = \frac{n_a}{N}$
$n_a$ = number of outcomes that satisfy condition a
$N$ = total number of outcomes (of equal probability)
Probability problems can be solved by accounting for every outcome, but usually it is
easier to combine probabilities.
p(paralyzed F2 fly) = p(inherit shi- from mother and inherit shi- from father)
Product rule: p(a and b) = p(a) x p(b)
(note the product rule only applies if a and b are independent, which is the case here since
the allele from the mother does not affect the allele from the father)
p(shi– from mother) = 1/2
p(shi– from father) = 1/2
p(paralyzed) = 1/2 x 1/2 = 1/4
p(not paralyzed) = 1 – 1/4 = 3/4
Thus in the F2 generation the phenotypic ratio will be, 1 paralyzed : 3 not paralyzed
A 1 : 3 phenotypic ratio among the F2 in a breeding experiment shows that alleles of a
single gene are segregating.
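The same 1 : 3 result can be reached by accounting for every outcome, as in the definition of probability above. The following is a minimal sketch in Python; the function name `cross` and the allele labels "shi-" and "shi+" are choices made for this illustration only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(mother, father):
    """Enumerate the equally likely offspring genotypes of two diploid parents.

    Each parent is a pair of alleles; each parent passes on one allele at random.
    """
    counts = Counter()
    for maternal_allele, paternal_allele in product(mother, father):
        # Sort so that shi-/shi+ and shi+/shi- count as the same genotype.
        genotype = tuple(sorted((maternal_allele, paternal_allele)))
        counts[genotype] += 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


f1 = ("shi-", "shi+")
f2 = cross(f1, f1)

# Only the homozygous recessive class is paralyzed.
p_paralyzed = f2[("shi-", "shi-")]
p_not_paralyzed = 1 - p_paralyzed
```

Enumerating the four equally likely allele combinations gives the same answer as the product rule: one quarter of the F2 are shi–/shi– and paralyzed.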
This actually constitutes a third definition of a gene. Historically, this was the first
definition of the gene developed by Gregor Mendel in the 1860s. Mendel was able to
detect single genes segregating in pea plants because he looked at simple traits and
started with true-breeding strains.
Let’s see how these ideas can be applied to a very interesting problem in the evolution of
corn. Domestic corn is derived from the wild progenitor Teosinte. There is no historical
record of how the breeding was done to produce Maize, but there is a genetic record:
the genomic differences between these two species. Maize and Teosinte can be crossed
to give viable progeny.
Teosinte x Maize
↓
F1 all same and unlike either parent
↓
F2 50,000 plants
~1/500 look like Teosinte and ~1/500 look like Maize
How many genes contribute to the differences between the two kinds of plants?
Let’s designate the genes that differ as A, B, C, D ...
For each gene there are two alleles: the allele present in Teosinte and the allele present
in Maize.
For the A gene we will designate these alleles AT and AM respectively. For the B gene
there will be alleles BT and BM and so on for all the genes that differ.
Let’s follow the A gene through the cross between Maize and Teosinte
AT/AT x AM/AM
F1 : AT/AM
Because the F1 don’t look like either parent, let’s assume that the alleles are codominant.
Codominant: heterozygote different than either homozygote.
Incomplete dominance: heterozygote expresses a phenotype intermediate between those of the homozygous parents.
(Alternatively, the genes that differ could have a mixture of dominant and recessive
alleles)
F2:   AT/AT    AT/AM    AM/AM
        1    :   2    :   1
1/4 will look like Teosinte.
For two genes that differ: AT/AT, BT/BT
1/4 x 1/4 = 1/16 will look like Teosinte.
Similarly, for three genes the probability will be 1/64. For four genes it will be 1/256,
and for five genes it will be 1/1024.
Since ~1/500 look like Teosinte the conclusion is that 4–5 genes differ between wild corn
(Teosinte) and domestic corn (Maize). Using modern methods, it has been confirmed that
there are about five significantly different alleles and several of these have been located
using mapping methods.
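The estimate of 4–5 genes can be checked with a short calculation. This sketch assumes, as the argument above does, that the genes segregate independently and that a plant looks like Teosinte only if it is homozygous for the Teosinte allele at every gene; the function names are invented for this illustration.

```python
import math
from fractions import Fraction


def parental_fraction(n_genes):
    """Fraction of F2 plants homozygous for one parent's allele at all n genes."""
    return Fraction(1, 4) ** n_genes


def estimate_gene_count(observed_fraction):
    """Solve (1/4)^n = observed_fraction for n (n need not be a whole number)."""
    return math.log(1 / observed_fraction) / math.log(4)


# About 1 in 500 F2 plants looked like Teosinte.
estimate = estimate_gene_count(1 / 500)
```

The estimate falls between 4 and 5, bracketed by the expected fractions 1/256 and 1/1024 for four and five genes.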
Lecture 4
From the last lecture, we followed gene segregation in a cross of a true breeding shibire
fly with a wild type fly.
Shibire x wild type
↓
F1: all not paralyzed
↓
F2: 3 not paralyzed : 1 paralyzed
This is the segregation pattern expected for a single gene. But in an actual experiment
how do we know that the phenotypic ratio is really 3 : 1 ?
There is no logical way to prove that we have a 3 :1 ratio. Nevertheless, we can think of
an alternative hypothesis then show that the alternative hypothesis does not fit the
data. Usually, we then adopt the simplest hypothesis that still fits the data.
A possible alternative hypothesis is that recessive mutations in two different genes are
needed to get a paralyzed fly.
In this case a true breeding paralyzed fly would have genotype: a/a , b/b
Whereas wild type would have genotype: A/A , B/B
F1: A/a , B/b     not paralyzed
F2: p(a/a and b/b) = (1/4)^2 = 1/16     paralyzed
p(a/a and B/––) = 1/4 x 3/4 = 3/16
p(b/b and A/––) = 3/16
p(A/–– and B/––) = the rest = 9/16
This is the classic ratio for two-gene segregation: 9 : 3 : 3 : 1
For our hypothesis we should see a phenotypic ratio of 15 not paralyzed : 1 paralyzed.
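As a check on the 9 : 3 : 3 : 1 and 15 : 1 ratios, the sixteen equally likely egg and sperm combinations from the F1 cross can be enumerated directly. This is a sketch under the assumption that the two genes are unlinked; lowercase letters stand for the recessive alleles, and the function names are made up for the illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """All gametes from a genotype given as allele pairs, e.g. (("A","a"),("B","b"))."""
    return list(product(*genotype))


def f2_phenotype_classes(f1):
    """Probability of each phenotype class among the offspring of F1 x F1."""
    counts = Counter()
    for egg, sperm in product(gametes(f1), repeat=2):
        # A gene shows the recessive trait only if both inherited alleles are lowercase.
        phenotype = tuple(
            "recessive" if (e.islower() and s.islower()) else "dominant"
            for e, s in zip(egg, sperm)
        )
        counts[phenotype] += 1
    total = sum(counts.values())
    return {phenotype: Fraction(n, total) for phenotype, n in counts.items()}


classes = f2_phenotype_classes((("A", "a"), ("B", "b")))

# Under the two-gene hypothesis only the double recessive class is paralyzed.
p_paralyzed_two_gene = classes[("recessive", "recessive")]
p_not_paralyzed_two_gene = 1 - p_paralyzed_two_gene
```

The four classes come out as 9/16, 3/16, 3/16 and 1/16, so the two-gene hypothesis predicts 15 not paralyzed : 1 paralyzed.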
Therefore, to distinguish one-gene segregation from two-gene segregation we need a
statistical test to distinguish 3 : 1 from 15 : 1. Intuitively, we know that in order to get
statistical significance, we need to look at a sufficient number of individuals.
For a chi-square test you start with a specific hypothesis that gives a precise
expectation. The test is then applied to the actual experimental results and will give the
probability of obtaining the results under the hypothesis. The test is useful for ruling
out hypotheses that would be very unlikely to give the actual results.
Say we look at 16 flies in the F2 and observe 14 not paralyzed and 2 paralyzed flies.
Under the hypothesis of two genes we expect 15 not paralyzed flies and 1 paralyzed fly.
We calculate the value of χ2 using the formula below, where O is the number of individuals
observed in each class and E is the number of individuals expected for each class.
$$\chi^2 = \sum_{\text{all classes}} \frac{(O - E)^2}{E} = \frac{1^2}{15} + \frac{1^2}{1} = 0.067 + 1 = 1.067$$
degrees of freedom (df) = number of classes – 1
From the table using 1 df, 0.05 < p < 0.5
The convention we use is that p ≤ 0.05 constitutes a deviation from expectation that is
significant enough to reject the hypothesis. Therefore, on the basis of this sample of 16
flies we can’t rule out the hypothesis that two genes are required.
Say we look at 64 F2 flies and find that 12 are paralyzed. For the hypothesis of two
genes the expectation is that 4 would be paralyzed. The χ2 for this data:
$$\chi^2 = \frac{8^2}{60} + \frac{8^2}{4} = 1.07 + 16 = 17.1$$
From the table p < 0.005 so we reject the two-gene hypothesis.
Let’s use this data to test the hypothesis of one gene segregation which would be
expected to give 16 paralyzed flies from 64 F2 flies,
$$\chi^2 = \frac{4^2}{48} + \frac{4^2}{16} = 0.33 + 1 = 1.33$$
From the table using 1 df, 0.05 < p < 0.5. Thus the data still fit the hypothesis of
one-gene segregation.
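The three χ2 calculations above can be reproduced with a short script. Instead of a printed table, this sketch computes the p value directly, using the fact that for 1 degree of freedom the upper tail of the chi-square distribution equals erfc(sqrt(χ2 / 2)). The function names are illustrative, and the p value helper is valid only for two classes (1 df).

```python
import math


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_1df(chi2):
    """Upper-tail probability for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Classes are (not paralyzed, paralyzed).
chi2_16_flies = chi_square([14, 2], [15, 1])       # two-gene hypothesis, 16 flies
chi2_two_gene = chi_square([52, 12], [60, 4])      # two-gene hypothesis, 64 flies
chi2_one_gene = chi_square([52, 12], [48, 16])     # one-gene hypothesis, 64 flies

p_16_flies = p_value_1df(chi2_16_flies)
p_two_gene = p_value_1df(chi2_two_gene)
p_one_gene = p_value_1df(chi2_one_gene)
```

With 64 flies the two-gene hypothesis gives p well below 0.005 and is rejected, while the one-gene hypothesis gives p between 0.05 and 0.5 and is retained.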
So far, the hypothesis that one gene is responsible for the paralyzed trait is the simplest
explanation that fits the data.
The way to distinguish most easily between a heterozygote and a homozygote expressing
a dominant trait is to cross to a homozygous recessive test strain.
Test cross: cross to homozygous recessive:
A/A x a/a gives all A/a . i.e. all offspring will express the dominant trait.
A/a x a/a gives 1/2 A/a and 1/2 a/a. i.e. one half of the offspring will express
the dominant trait.
Mendelian inheritance in humans
For humans we can’t do test crosses, of course, but by following inheritance of a trait for
several generations the modes of inheritance can usually be identified by applying basic
principles of Mendel. The following are guidelines for identifying different modes of
inheritance in pedigrees.
Autosomal dominant
i) Affected individuals must have at least one affected parent
Exceptions to this rule will occur if a new mutation arises in one of the parents (in
real life a more likely explanation is extramarital paternity). Another possibility is
incomplete penetrance, where other genetic or environmental factors prevent the
trait from being expressed in one of the parents.
Autosomal recessive
i) When both parents are carriers, on average 1/4 of the children will be affected.
ii) When both parents are affected, then all of the children will be affected.
iii) If the trait is very rare then consanguinity is likely. That is, it is likely that
parents of affected children are themselves related (e.g. cousins).
X-linked inheritance
♀ XcX+ (carrier) x ♂ X+Y
↓
♀ XcX+ (carrier), ♀ X+X+, ♂ XcY (color blind), ♂ X+Y
i) When parents are a carrier ♀ and an unaffected ♂, then on average, 1/2 of the
daughters will be carriers and 1/2 of the sons will be affected.
If the trait is rare then the vast majority of affected individuals will be male
which is the hallmark of X-linked traits.
ii) Affected sons inherit the allele from mother
• Maternal uncles often affected
• Since inherited only from mother, inbreeding doesn’t increase the
probability of an affected ♂.
Conditional probabilities
Consider the following pedigree of a recessive trait.

[Pedigree figure not reproduced. Legend: female and male symbols; “?” marks the individual in question.]
p(affected child) = p(mother carrier and father carrier and affected child)
= 2/3 x 2/3 x 1/4 = 1/9
However, if they have a child that is affected we must reassess the probability that
their next child will be affected.
p(both parents carriers) = 1. So, p(next child affected) = 1/4
This example shows how probability calculations are based on information. The
probability changes not because the parents have changed but because our information
about them has.
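The factor of 2/3 is a conditional probability, and it can be recovered by counting outcomes. The pedigree itself is not reproduced here, so the sketch below rests on an assumption: each prospective parent is an unaffected child of two carriers (A/a x A/a), which is the usual situation that gives 2/3. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumption: each prospective parent is an unaffected child of A/a x A/a.
# Enumerate the four equally likely genotypes from such a cross.
offspring = [tuple(sorted(pair)) for pair in product("Aa", repeat=2)]

# We already know the person is unaffected, so a/a is excluded.
unaffected = [g for g in offspring if g != ("a", "a")]
carriers = [g for g in unaffected if "a" in g]
p_carrier_given_unaffected = Fraction(len(carriers), len(unaffected))

# Before any child is born: both must be carriers and the child must be a/a.
p_affected_child = p_carrier_given_unaffected ** 2 * Fraction(1, 4)

# After an affected child is born, both parents are known to be carriers.
p_next_child_affected = 1 * 1 * Fraction(1, 4)
```

Knowing that someone is unaffected removes one of the four outcomes, which is why the carrier probability is 2/3 rather than 1/2.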
HANDBOOK for PROBABILITY CALCULATIONS
Many problems in diploid genetics rely on basic concepts of probability. This is because each individual
inherits at random only one of two possible copies of a gene from each parent. Thus, breeding experiments
or inheritance in human pedigrees have probabilistic rather than absolute outcomes. Everyone
has an intuitive sense of probability but what we need is a precise definition that will allow probabilities
to be manipulated quantitatively.
Probabilities are usually defined in terms of possible outcomes of a trial. A trial could be the toss of a
coin, the roll of a die, or two parents having a child. If we define a specific event a, p(a) or the probability
of a, can be defined as follows: after a very large number of trials, p(a) is simply the fraction of trials
that give outcome a. In principle, we could determine p(a) by actually performing a large number of
trials and directly measuring the fraction of trials that produce event a. This is sometimes called the
“Monte Carlo method” named after a famous European casino and works well for computer simulations
of complicated phenomena. However, in many cases there is a much simpler way to calculate probabilities.
To directly calculate classical probabilities one must know enough about a process to break down
the possible outcomes of a trial into some number of equally probable events. In these cases the probability
of event a is:
$$p(a) = \frac{n_a}{N}$$
where $n_a$ is the number of outcomes that satisfy the criteria for a and $N$ is the total number of equally
probable outcomes. Note that since $N$ includes all possible outcomes, $n_a \le N$ and $0 \le p(a) \le 1$.
Example: A couple has two children, what is the probability that they are both girls? Assuming that the
chances of having a boy or a girl are equal, there are 4 equally probable ways of having two children
(boy, boy; girl, boy; boy, girl; girl, girl) and the probability of two girls is 1/4 or 0.25.
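The two approaches described above, counting equally probable outcomes and the Monte Carlo method, can be compared on the two-girls example. This is a minimal sketch that assumes boys and girls are equally likely; the function names, the number of trials and the random seed are arbitrary choices for the illustration.

```python
import random
from fractions import Fraction
from itertools import product

SEXES = ["boy", "girl"]


def classical_two_girls():
    """Count the favorable outcomes among all equally probable two-child families."""
    outcomes = list(product(SEXES, repeat=2))
    favorable = [o for o in outcomes if o == ("girl", "girl")]
    return Fraction(len(favorable), len(outcomes))


def monte_carlo_two_girls(trials, seed=0):
    """Estimate the same probability by simulating many two-child families."""
    rng = random.Random(seed)  # fixed seed so the simulation is repeatable
    hits = 0
    for _ in range(trials):
        first = rng.choice(SEXES)
        second = rng.choice(SEXES)
        if first == "girl" and second == "girl":
            hits += 1
    return hits / trials


exact = classical_two_girls()
estimate = monte_carlo_two_girls(100_000)
```

The simulated estimate approaches the exact value of 1/4 as the number of trials grows, but the classical count gives the answer immediately.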
For classical probability problems you will always be able to arrive at the correct answer by writing out
all of the possible outcomes of a trial and counting the fraction of outcomes that satisfy the criteria for a
given event. Often, enumerating all of the outcomes for a trial is time-consuming and error-prone. It is
usually faster and easier to break a problem down into simple parts and then to combine the probabilities
for the individual parts. The following are useful ways that probabilities can be combined to speed
probability calculations.
PRODUCT RULE
p(a and b) = p(a) x p(b) if a and b are independent.
Two events are considered independent if they do not influence one another. The criterion of independence
is very important: application of the product rule to events that are not independent will give
an incorrect answer.
Examples: To find the probability that a couple with three children have three boys we first note that the
sex of one child has no influence on the sex of another, so the three births constitute independent events. For
each child, p(boy) = 1/2 and by the product rule p(3 boys) = 1/2 x 1/2 x 1/2 = 1/8.
In a cross where both parents are heterozygous for recessive mutations in two unlinked genes, what is
the probability that one of their progeny will express both recessive traits?
First, for a recessive trait to be expressed the progeny must inherit the recessive allele from both the
mother and the father. Since the probability of inheriting a given allele from a heterozygote is 1/2,
p(mutant from mother and mutant from father) = 1/2 x 1/2 = 1/4. Second, since unlinked genes are
inherited independently, we can use the product rule again to calculate p(recessives at gene A and recessives
at gene B) = 1/4 x 1/4 = 1/16.
SUM RULE
The probability that either a or b will occur can be written as p(a or b). If two events a and b cannot
both occur they are mutually exclusive and the number of events that satisfy a or b is na + nb. It should
be apparent from our definition of probability that:
$$p(a \text{ or } b) = \frac{n_a + n_b}{N} = \frac{n_a}{N} + \frac{n_b}{N} = p(a) + p(b)$$
A useful special case of the sum rule arises when we consider p(not a). By definition p(a) and p(not a)
are mutually exclusive and they encompass all possible outcomes. Thus:
p(a or not a) = 1 = p(a) + p(not a) and p(not a) = 1 – p(a)
Examples: Find the probability that a family with three children has at least one girl. We begin by
noting that instead of trying to count all possible families with at least one girl it is easier to realize that
p(at least one girl) is the same as p( not all boys). Since p(all boys) = 1/8, p(not all boys) = 1– 1/8 = 7/8
= p(at least one girl).
In a cross where both parents are heterozygous for recessive mutations in two unlinked genes, what is
the probability that one of their progeny will express at least one of the dominant traits? p(at least one
dominant) = 1 – p(both recessive), and from above, p(both recessive) = 1/16. Therefore p(at least one
dominant) = 1 – 1/16 = 15/16.
In cases where two events a and b are independent but not mutually exclusive, we can still calculate
p(a or b). In this case we note that the two events a and (b and not a) are mutually exclusive and
encompass all outcomes that satisfy a or b or both. For these mutually exclusive events we can apply
the sum rule. Thus,
p(a or b) = p(a or [b and not a]) = p(a) + p(b and not a)
Since b and not a are independent:
p(a) + p(b and not a) = p(a) + p(b) x p(not a) = p(a) + p(b) x [1 – p(a)] =
p(a) + p(b) – [p(a) x p(b)]
Note that in the case where a and b are mutually exclusive, p(a) x p(b) = 0 giving the same formula as
for the sum rule.
Example: We can use this formula as another way to solve the last example, which is a case in which the
two events are independent but not mutually exclusive. p(at least one dominant) = p(dominant at gene A
or dominant at gene B) = p(dominant at gene A)+p(dominant at gene B) – [p(dominant at gene A) x
p(dominant at gene B )] = 3/4 + 3/4 – [3/4 x 3/4] =6/4 – 9/16 = 15/16.
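The two routes to 15/16 can be checked against each other with exact fractions. This is a small sketch; the function name `p_or_independent` is invented for the illustration, and it applies only when the two events are independent.

```python
from fractions import Fraction


def p_or_independent(p_a, p_b):
    """p(a or b) for independent events: p(a) + p(b) - p(a) x p(b)."""
    return p_a + p_b - p_a * p_b


# Probability of showing the dominant trait at one gene in a cross of heterozygotes.
p_dominant = Fraction(3, 4)

# Route 1: the formula for independent, not mutually exclusive events.
via_formula = p_or_independent(p_dominant, p_dominant)

# Route 2: the complement, 1 - p(both recessive).
via_complement = 1 - Fraction(1, 4) * Fraction(1, 4)
```

Both routes give 15/16, and setting the product term to zero recovers the simple sum rule for mutually exclusive events.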
Lecture 5
Until now our analysis of genes has focused on gene function as determined by phenotype
differences brought about by different alleles or by a direct test of function – the
complementation test.
For the next six lectures our analysis will be concerned with the tests of gene position
starting with the position of genes on chromosomes and finally mapping point mutations at
the resolution of single nucleotide pairs.
We’ve taken it for granted that genes reside on chromosomes, but how do we know this?
Let’s review the properties of gene segregation.
Consider two different traits.
A/A, B/B x a/a, b/b
The gametes from one parent will be A, B and from the other parent a, b
These gametes will then give an F1 generation of all A/a, B/b
Crosses between F1 individuals will give an F2 generation with a 9 : 3 : 3 : 1 phenotypic
ratio as shown before.
A better way to look at segregation is by a test cross of the F1 heterozygote to a
homozygous recessive individual.
A/a, B/b x a/a, b/b
The possible gamete genotypes from the F1 will be:

A, b a, b A, B a, B
(recomb.) (parental) (parental) (recomb.)
The corresponding genotypes of the offspring in the testcross will be:

A/a, b/b a/a, b/b A/a, B/b a/a, B/b
Each offspring receives either one or the other parental allele: gene segregation.
For most gene pairs, the frequency of each of the four classes of gametes is the same,
indicating that the two genes segregate independently: independent assortment.
At the turn of the twentieth century microscopes allowed people to watch chromosomes in the
nuclei of dividing cells. (Human cells, for example, contain 46 chromosomes.)
The chromosomes in dividing somatic cells go through Mitosis:
The net result of mitosis is to distribute a replica of each chromosome into the two
daughter cells.
The stages of mitosis are as follows:
i) Interphase. DNA replication
ii) Prophase. Chromosomes condense and centromeres attach to microtubule spindle
iii) Metaphase. Chromosomes align
iv) Anaphase. Sister chromatids move apart
v) Telophase. Nuclei reform
The cell has evolved a simple mechanical mechanism to ensure that after mitosis each
daughter cell has received exactly one copy of each chromosome. (Failure of proper
chromosome segregation is known as nondisjunction.) The steps in the mechanism are as
follows: 1) After DNA replication two daughter chromosomes known as sister chromatids
are held together by special proteins known as cohesins. 2) As chromosomes align in
metaphase microtubule spindles attach to centromeres on each chromatid. 3) Once all of
the chromatids are attached to spindles a protease known as separase becomes active.
(Actually, unattached chromatids produce a signal to keep separase inactive and only when
every chromatid pair is under tension generated by spindles pulling in opposite directions
is the inhibitory signal turned off.) 4) Finally, active separase cleaves the cohesin
proteins, detaching sister chromatids and allowing them to be pulled apart by the spindle
and distributed to different daughter cells.
lecture5 20
Cells that produce germ cells such as pollen undergo a very different kind of division,
Meiosis.
Meiosis differs from mitosis in two fundamental respects: 1) in meiosis there are two
rounds of chromosome segregation for one round of DNA synthesis so each germ cell receives
only one of the two homologous chromosomes and 2) in meiosis the homologs pair with one
another then move to opposite poles.
Chromosomes behave in meiosis the same way that Mendel showed genes to behave.
Each germ cell receives only one of the two homologs, a behavior that is analogous to gene
segregation.
The relative alignment of chromosomes is arbitrary which is analogous to independent

assortment of genes.
What was needed to show that genes are on chromosomes was a chromosome that could
be identified in the microscope and that carried an allele for a trait that could also be
followed. The proof for chromosome theory would then depend on correlating the
segregation of the trait with segregation of the chromosome.
T.H. Morgan proved chromosome theory in 1910 using Drosophila.
Flies normally have brick-red eyes. The first white-eyed mutant was found by Morgan’s
wife, Lillian, who worked in the lab.
white ♂ x red ♀ (wild type)
↓
F1: all red
↓
F2: 3 red : 1 white (white eyes only in males)
Thus, the white mutation behaves like a recessive allele, but there was something unusual
about the white mutation because only the male flies in the F2 have white eyes.
white ♂ x red ♀ (heterozygote from F1)
↓
1 red : 1 white
In this case equal numbers of males and females have white eyes. Again this is consistent
with white being a recessive trait. The most informative cross is the reciprocal of the
first cross.
white ♀ x red ♂ (wild type)
↓
red ♀, white ♂
Ignoring the sex of the flies, it looks as if the wild type ♂ is heterozygous: the wild type
allele is always passed on to the daughters and the white allele is passed on to sons.
The explanation is that eye color gene is on the sex determining chromosome X. Males
only have one copy of the X chromosome and daughters always get one copy of the X
from the mother and one copy from the father.
♀ (XX) x ♂ (XY)
Progeny: XX (daughters), XY (sons)
Xw = white allele on X, X+ = red allele on X
XwXw x X+Y
↓
XwX+ (red ♀), XwY (white ♂)
Thus, the trait for red eyes is always inherited along with the X chromosome from the
father. The absence of red (giving white eyes) always goes with the Y chromosome.
Lecture 6
When we talk about gene position the term locus is used to designate the chromosomal
location of a gene.
What we are going to do is to map genes relative to one another. To begin, we need two
genes on the same chromosome. Last lecture we saw how you could tell whether a gene is
on the X chromosome by how alleles of the gene are inherited differently by males and
females.
Consider two mutations on the X chromosome of Drosophila; crossveinless and white eye.
Genotype Phenotype
Xcv+w+ Y wild type
Xcv- w+ Y crossveinless wings
Xcv+w- Y white eye
Xcv-w+ Y x Xcv+w-Xcv+w- (true breeding)
All of the daughters from this cross will have two different X-chromosomes, which
differ at two loci: Xcv+w-Xcv-w+
We want to follow these X chromosomes into the next generation so after a cross we
look at male flies.
parental classes: Xcv+w-Y and Xcv-w+Y
crossover classes: Xcv-w-Y and Xcv+w+Y

(crossveinless, white) (wild type)
In the crossover classes the alleles appear to have separated and moved from one X to
the other. Genes on the same chromosome often do not assort independently. Such
behavior is known as Linkage.
unlinked: crossover classes appear at same frequency as parental classes.
(Note that traits that show independent assortment are unlinked.)
weakly linked: crossover classes appear often but less often than parental classes.
tightly linked: crossover classes appear rarely or never.
To see what’s really going on we need to look at the chromatids in prophase of meiosis in
the mother.
Crossovers between homologous chromosomes occur more or less at random during

meiosis. To give you a rough idea of how frequent these crossovers are, in several
different well studied organisms (Yeast, Drosophila, and humans) there is about one
crossover per chromosome arm per meiosis. The geneticist uses these random crossovers
as a tool to measure distance. Distance can be obtained because crossovers between two
points that are close together will rarely occur whereas crossovers between points that
are far apart will occur frequently.
Definition of genetic distance:
map distance (m.u. or cM) = 100 x (crossover gametes / total gametes)
cv- w+
------  ♀  x  ♂     (Note new notation)
cv+ w-
(In order to detect both dominant and recessive alleles, we look at males only)
Male progeny      Number
cv- w+            430
cv+ w-            450
cv- w-             52
cv+ w+             68
Number of crossover gametes = 120    Total gametes = 1,000
Distance = 100 x (120 / 1,000) = 12 cM
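The calculation above can be sketched in a few lines of Python. This is only an illustration of the definition of map distance; the function name `map_distance_cm` is our own and is not part of the lecture.

```python
def map_distance_cm(parental_counts, crossover_counts):
    """Map distance in cM: 100 x crossover gametes / total gametes."""
    crossovers = sum(crossover_counts)
    total = crossovers + sum(parental_counts)
    return 100 * crossovers / total

# Male progeny counts from the cross above.
parental = [430, 450]   # cv- w+ and cv+ w-
crossover = [52, 68]    # cv- w- and cv+ w+
distance = map_distance_cm(parental, crossover)
print(distance)
```

With the counts from the cross this gives the same 12 cM as the hand calculation.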
It is important to note that once a map distance between two genetic markers has been
established this distance can be used to calculate the expected numbers of each type of
progeny. For example, if you know that two mutations are 12 cM apart then you should
expect that 6% of the progeny from a cross will be of each recombinant class.
Things get interesting when we make several pairwise crosses between genes on the same
chromosome.
We can use this data to construct a Genetic Map
Genetic maps have the following properties:
i) Distance is proportional to frequency of crossover classes (this approximation
actually only holds for short distances of <20 cM)
ii) Distances are approximately additive: mapped points fall on a line.
iii) Maps are internally consistent and concise.
(The first genetic map was constructed in 1911 by Alfred Sturtevant when he was a
sophomore student in Morgan’s lab)
It is important to remember that genetic distances are measured using a property of

meiosis (genetic recombination) that varies from one organism to another. The
relationship between genetic distance and actual physical distance can be summarized in
this way:
Genetic distance = physical distance x recombination rate
The actual relationship between genetic distance in cM and physical distance in base pairs
(bp) depends on the recombination rate and is different for different organisms
For example: Human: 1.3 cM/Mbp Yeast: 360 cM/Mbp
Sometimes recombination rates in the male and female of a species are different. In
Drosophila there is no recombination in the male, so the genetic distance between markers
on the same chromosome is always zero when examined by meiosis in the male. In humans
the recombination rate (and therefore map distance) in the female is twice that
of the male.
Another issue that often causes confusion concerns the map distances of genes that are
far apart on the same chromosome.
The physical length of a genetic interval is proportional to the frequency of crossovers
that occur in that interval during meiosis, but in a cross we are not actually counting
crossovers; rather we are counting the number of recombinant progeny that are produced.
The number of recombinants provides a good approximation of distance for short intervals,
but as the interval length increases, multiple crossovers become possible, making the
relationship between frequency of recombinants and crossovers nonlinear.
Crossovers between same chromatids    Recombinant progeny
1                                     yes
2                                     no
3                                     yes
4                                     no
If the measured distance in a cross is statistically indistinguishable from 50 cM then we
say that the genes are unlinked. But this doesn’t mean that distances greater than 50 cM
cannot be obtained. By adding intervals, longer distances that are meaningful can be
obtained. For example, if all the intervals between linked genes in the human genome are
added together the total length of the genome (in males) is 2,500 cM.
Lecture 7
Last time we discussed how to measure the distance between two genes on the X
chromosome. To do this we used the trick of looking only at male progeny so the
genotype of the X chromosome could be scored directly since these flies only carry one
copy of the X. For autosomes, we can have the same ability to score all recombinant
classes by crossing to a homozygous recessive individual. This is known as a test-cross.
Consider the recessive traits vestigial wings and short bristles that are specified by two
different genes on the same autosome (non-sex chromosome).
vg sh / vg sh   x   + + / + +
↓
All F1: vg sh / + +
The F1 flies are heterozygous for both genes so we are in position to see how
often crossovers between these chromosomes occur in meiosis by doing a test-cross.
vg sh / + +  ♀   x   vg sh / vg sh  ♂
↓
Progeny genotype    Number
vg sh / vg sh       458
vg sh / + +         442
vg + / vg sh         47   (crossover class)
+ sh / vg sh         53   (crossover class)
(Note that the progeny have distinct phenotypes)
The distance between vg and sh = 100 x (100 / 1000) = 10 cM
Now we’ll do a second cross. Note that the key is to set up a parent that is heterozygous
at two loci.
sh + / sh +   x   + cn / + cn (cinnabar eyes)
↓
sh + / + cn  ♀   x   sh cn / sh cn  ♂
↓
Progeny genotype    Number
sh cn / sh cn        12   (crossover class)
+ + / sh cn           8   (crossover class)
+ cn / sh cn        493
sh + / sh cn        487
Distance between sh and cn = 2 cM
There are two possible orders. We could resolve them by measuring the cn to vg
distance, which should be either 8 cM or 12 cM depending on the order. However, it’s
difficult in practice to get a statistically significant measurement that would cleanly
distinguish between these possibilities.
A better way to find the order is to set up all three heterozygous markers at the same
time and to look at the frequencies of the eight different gamete genotypes.
This is known as a 3 factor cross.
cn sh vg / + + +  ♀   x   cn sh vg / cn sh vg  ♂
Gamete genotype    Number
cn sh vg           900
+ + +              912
cn + +               2
+ sh vg              1
cn sh +             75
+ + vg              70
cn + vg             18
+ sh +              22
These are all of the possible combinations. One pair of these gamete classes must be the
result of double crossovers. This class will be very rare (0.1 x 0.02 = 2 x 10^-3). By
finding the rare class we have a qualitative test to determine gene order.
The double crossover classes for the two possible orders are:
Order cn sh vg:   cn + vg   or   + sh +
(If this were the order, these would be the rare classes)
Order sh cn vg:   sh + vg   or   + cn +
(Since these are the rare classes we know this to be the order)
There is a simple system for evaluating 3-factor crosses:
1) Group recombinant classes into reciprocal pairs.
2) The most frequent pair is the parental classes.
3) Derive the gene order from the least frequent pair, which are the double crossover
classes.
4) The single crossover frequency for the two intervals can be obtained by adding
the frequency of each of the single crossover class pairs to the frequency of the
double crossover class pair. (In the present example the double crossovers are so
rare that their inclusion doesn’t matter).
Resulting gene order: sh cn vg
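The system for evaluating a 3-factor cross can be sketched in Python as follows. This is a minimal illustration, not a general mapping tool: the function name `analyze_three_factor` and the data layout (tuples of alleles in the order cn, sh, vg) are our own assumptions. The middle gene is taken to be the "odd one out" when the rarest class is compared with the most frequent class. Counting recombinants pair by pair automatically includes the double crossover classes in each adjacent interval.

```python
from itertools import combinations

def analyze_three_factor(counts, genes):
    """Find the middle gene and pairwise distances (cM) from gamete counts.

    counts maps a tuple of alleles (one per gene, in the order of `genes`)
    to the number of gametes of that genotype.
    """
    total = sum(counts.values())
    ranked = sorted(counts, key=counts.get)
    double, parental = ranked[0], ranked[-1]  # rarest and most frequent classes

    # Compared with a parental class, a double crossover class has the
    # middle gene as the odd one out (the only match or the only mismatch).
    same = [d == p for d, p in zip(double, parental)]
    odd_value = same.count(True) == 1
    middle = genes[same.index(odd_value)]

    distances = {}
    for i, j in combinations(range(len(genes)), 2):
        # Recombinant for genes i and j: alleles not in a parental combination.
        recombinant = sum(
            n for g, n in counts.items()
            if (g[i] == parental[i]) != (g[j] == parental[j])
        )
        distances[(genes[i], genes[j])] = 100 * recombinant / total
    return {"middle": middle, "distances": distances}

genes = ("cn", "sh", "vg")
counts = {
    ("cn", "sh", "vg"): 900, ("+", "+", "+"): 912,
    ("cn", "+", "+"): 2,     ("+", "sh", "vg"): 1,
    ("cn", "sh", "+"): 75,   ("+", "+", "vg"): 70,
    ("cn", "+", "vg"): 18,   ("+", "sh", "+"): 22,
}
result = analyze_three_factor(counts, genes)
print(result["middle"], result["distances"])
```

For the data above the middle gene comes out as cn, with sh to cn about 2 cM and cn to vg about 7 cM. The directly measured sh to vg distance is slightly less than the sum of the two intervals because double crossovers are not seen as recombinants for the outer pair.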
TETRAD ANALYSIS IN FUNGI
[Image removed due to copyright considerations.]
THE PRODUCTS OF A SINGLE MEIOSIS ARE PACKAGED IN A SAC (ASCUS)
THE PRODUCTS OF A SINGLE MEIOSIS
[Diagram: haploid A x haploid a gives an A/a diploid; after the first and second divisions
of meiosis the four products are A, A, a, a. Image removed due to copyright considerations.]
Mendel
1. Segregation: Equal numbers of A and a
• The phenotype resulting from a mutation in a single gene will segregate exactly 2A : 2a.
• Question in Tetradspeak: Does the phenotype segregate 2:2?
  Yes. A and a are alleles of a single gene.
HOW DO WE KNOW WHETHER TWO GENES ARE LINKED?
Cross: A B x a b
Parental Ditype (PD)    Non-Parental Ditype (NPD)
A B                     A b
A B                     A b
a b                     a B
a b                     a B
How do you determine linkage?
You cross AB x ab and find in 100 tetrads:
48 PD
52 NPD
Are A and B linked?
2. Independent Assortment (linkage): AB x ab
• Tetradspeak: Are PD = NPD?
  Yes: Two genes are unlinked
  No: Two genes are linked
Complete Linkage of Two Genes
Cross: AB x ab
Every tetrad is a Parental ditype: AB, AB, ab, ab
The Products of a Single Crossover
A single crossover between A and B involves one A B chromatid and one a b chromatid:
AB    Parental
Ab    Recombinant
aB    Recombinant
ab    Parental
This tetrad is a TETRATYPE.
2. Independent Assortment (linkage): AB x ab
Are PD = NPD?
  Yes: Two genes are unlinked
  No: Two genes are linked. TETRATYPE?
    No: 100% linked
    Yes: Map Distance in centiMorgans = (1/2 TT / TOTAL) x 100
90 PD
10 TT
Are they linked?
90 PD = 90 x 4 = 360 Non-Recombinant Progeny
10 TT = 10 x 2 = 20 Recombinant Progeny (plus 10 x 2 = 20 Non-Recombinant Progeny)
MD in centiMorgans = (Recombinants / Total progeny) x 100
MD = (20 / 400) x 100 = 5 centiMorgans
Equivalently, Map Distance in centiMorgans = (1/2 TT / TOTAL tetrads) x 100 = (5 / 100) x 100 = 5
WHAT ABOUT DOUBLE CROSSOVERS?
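DOUBLE CROSSOVERS
Two crossovers between A and B can involve two, three, or all four chromatids:
Double crossover         Products          Tetrad type
2-strand                 AB, AB, ab, ab    PD
3-strand (first kind)    AB, Ab, ab, aB    TT
3-strand (second kind)   Ab, AB, aB, ab    TT
4-strand                 Ab, Ab, aB, aB    NPD
Double crossovers give 1 PD : 1 NPD : 2 TT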
CROSSING OVER OCCURS AT THE 4 STRAND STAGE
[Diagram repeated from the previous slide: the four double crossover configurations, which
involve two, three, or four chromatids and give PD, TT, TT, and NPD tetrads.]
Linkage AB x ab
Are PD = NPD?
  Yes: Two genes are unlinked
  No: Two genes are linked. TETRATYPE?
    No: 100% linked
    Yes: NPD?
      No: Map Distance = (1/2 TT / TOTAL) x 100
      Yes: MD = ((1/2 TT + 3 NPD) / TOTAL) x 100
           (the 1/2 TT term counts SINGLE crossovers, the 3 NPD term corrects for DOUBLE crossovers)
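ESTIMATION OF DOUBLE CROSSOVERS
Singles give only TT
Doubles give 1 PD : 2 TT : 1 NPD
Tetrads with a double crossover = 4 NPD (of which 2 NPD are TT)
Tetrads with a single crossover = TT - 2 NPD
Each single counts 1/2 and each double counts 1:
MD = ([1/2 (TT - 2 NPD) + 4 NPD] / Total) x 100
MD = ([1/2 TT + 3 NPD] / Total) x 100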
70 PD
20 TT
10 NPD
Count crossover gametes = ((40 + 4(10)) / 400) x 100 = 20 cM
Analyze tetrads = ((10 + 30) / 100) x 100 = 40 cM
If we just count crossover gametes we would underestimate the distance.
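The two ways of estimating the distance can be compared with a short Python sketch. The function names are our own; the formulas are the ones on the preceding slides.

```python
def distance_from_spores(pd, tt, npd):
    """Count recombinant spores only: 2 per TT tetrad and 4 per NPD tetrad."""
    total_spores = 4 * (pd + tt + npd)
    return 100 * (2 * tt + 4 * npd) / total_spores

def distance_from_tetrads(pd, tt, npd):
    """Tetrad formula with the double crossover correction: (1/2 TT + 3 NPD) / total."""
    return 100 * (0.5 * tt + 3 * npd) / (pd + tt + npd)

by_spores = distance_from_spores(70, 20, 10)
by_tetrads = distance_from_tetrads(70, 20, 10)
print(by_spores, by_tetrads)
```

When there are no NPD tetrads (for example 90 PD and 10 TT) the two functions agree, since the correction term is zero.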
THE THREE TYPES OF TETRADS
Parental    Non-Parental    Tetratype
A B         A b             A B
A B         A b             A b
a b         a B             a B
a b         a B             a b
NO ONE TETRAD TYPE IS SUFFICIENT.
IT IS THE RELATIONSHIP THAT TELLS ALL.
Lecture 9
By way of review, let’s consider the general question of how genetic mapping studies can
be used to locate a gene that has been identified by an allele with an interesting
phenotype. For example, the CLOCK mutation in mouse was identified as a semi-dominant
mutation that disrupts the normal circadian rhythm. This mutation was isolated after
mutagenesis of animals with ethylnitrosourea and then screening their offspring for
abnormal activity at a time when normal mice would be sleeping.
Let’s assume that we have at our disposal a large number of genetic markers spaced
evenly along all of the mouse chromosomes (later in the course we will discuss DNA-based
markers which can be found in an almost unlimited number in the mammalian genome) and
that we have the capability to screen about 1000 mice for recombinants in a mapping
experiment. The question of how precisely we can locate the CLOCK gene can be
considered to be a question of the resolving power of a mapping experiment. Two points
on a genetic map can only be resolved if a recombination event that separates them can
be found. The smallest interval that can be resolved on average if 1000 progeny are
screened is 1 recombinant in 1000 which corresponds to a map distance of ~ 0.1 cM on
either side of the CLOCK mutation. Thus we would be able to map the CLOCK mutation to
an interval of 0.2 cM.
In mice, and in humans, a simple rule of thumb is that 1 cM corresponds to a physical

distance of about 1 Mbp. Therefore our mapping experiment would locate the CLOCK
gene to a region of DNA about 200 kbp in size. Such an interval may contain two to ten
gene sequences so additional work using recombinant DNA methods would be needed to
precisely identify the CLOCK gene. However, there is still no substitute for
recombinational mapping studies as a way to initially narrow down the approximate
location of a gene.
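The resolution estimate above is simple arithmetic, sketched here in Python. The function name and the default of 1 Mbp per cM (the rule of thumb quoted above for mice and humans) are illustrative assumptions.

```python
def mapping_resolution(progeny_screened, mbp_per_cm=1.0):
    """Smallest resolvable distance, mapped interval (cM), and its size in kbp."""
    smallest_cm = 100 * 1 / progeny_screened   # 1 recombinant among the progeny
    interval_cm = 2 * smallest_cm              # that distance on either side of the mutation
    interval_kbp = interval_cm * mbp_per_cm * 1000
    return smallest_cm, interval_cm, interval_kbp

smallest, interval, size_kbp = mapping_resolution(1000)
print(smallest, interval, size_kbp)
```

Screening ten times as many progeny would narrow the interval tenfold, which is why the number of animals that can be scored sets the practical limit.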
To review how we can categorize tetrads consider the following cross:
Trp1– Leu2+ x Trp1+ Leu2–
Trp1–Leu2+, Trp1+Leu2–, Trp1+Leu2+, Trp1–Leu2– This is a Tetratype
Trp1–Leu2+, Trp1–Leu2+, Trp1+Leu2–, Trp1+Leu2– This is a PD
Trp1+Leu2+, Trp1+Leu2+, Trp1–Leu2–, Trp1–Leu2– This is an NPD
Now consider a cross that involves mutations that have the same phenotype.
Leu1–Leu2+ x Leu1+Leu2–
Notice that even though we can’t always unambiguously assign genotypes to each spore
clone we can distinguish the three tetrad types.
Tetratype PD NPD
genotype phenotype genotype phenotype genotype phenotype
1+2+ Leu+ 1–2+ Leu– 1+2+ Leu+
1–2– Leu– 1–2+ Leu– 1+2+ Leu+
1+2– Leu– 1+2– Leu– 1–2– Leu–
1–2+ Leu– 1+2– Leu– 1–2– Leu–
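The categorization of tetrads can be sketched in Python. Both functions are illustrative: the names, and the encoding of each spore as a pair of alleles ("+" or "-"), are our own. The second function applies only to the Leu1–Leu2+ x Leu1+Leu2– cross, where the table above shows that the tetrad type can be read from the number of Leu+ spores alone.

```python
def classify_tetrad(spores, parent1, parent2):
    """Classify four spore genotypes as 'PD', 'T', or 'NPD'."""
    n_parental = sum(1 for s in spores if s in (parent1, parent2))
    return {4: "PD", 2: "T", 0: "NPD"}[n_parental]

def classify_by_leu_plus(n_leu_plus):
    """For Leu1-Leu2+ x Leu1+Leu2-: only 1+2+ spores are Leu+."""
    return {0: "PD", 1: "T", 2: "NPD"}[n_leu_plus]

# Trp1- Leu2+ x Trp1+ Leu2-, each spore written as (Trp1 allele, Leu2 allele).
p1, p2 = ("-", "+"), ("+", "-")
tetrads = [
    [("-", "+"), ("+", "-"), ("+", "+"), ("-", "-")],
    [("-", "+"), ("-", "+"), ("+", "-"), ("+", "-")],
    [("+", "+"), ("+", "+"), ("-", "-"), ("-", "-")],
]
types = [classify_tetrad(t, p1, p2) for t in tetrads]
print(types)
```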
Here are the steps used in tetrad analysis:
i) Count tetrads of each type to get four numbers: PD, T, NPD, total
ii) If the classes are in the following ratio:
PD : T : NPD
1 : 4 : 1
the genes are unlinked and you are finished.
iii) If PD >> NPD, then the genes are linked and the distance between them is given by:
Distance in cM = ((T + 6 NPD) / (2 Σ)) x 100, where Σ is the total number of tetrads
Gene Fine Structure
So far we have been considering the phenotypic consequences of mutations in different

genes and the relative positions of genes along chromosomes. Now we are going to look at
the internal structure of genes themselves and to consider the different kinds of
mutations that are found in genes. For the next eleven lectures we will examine the
genes of bacteria and their viruses, known as phage (some special features of eukaryotic
genes will come later). First, let’s see how the life cycle of phage can be exploited to
carry out classical genetic manipulations.
Phage cross
In order to do genetics with phage it is necessary to have recognizable variants of phage.
The easiest mutants to work with have obvious distinctive plaque morphologies.
λ phage mi–: make small plaques because fewer particles are released
λ phage normally infects cells to make more phage. But about 1% of the time phage
becomes quiescent in host cell and prevents other phage from infecting that cell. This
property of phage λ is known as immunity. The quiescent phage is known as a lysogen.
cI– mutants make clear plaques because they can’t form lysogens.
Phage Cross
Cross: Infect bacteria with ~10 of each phage per cell to ensure that every cell
has both kinds of phage replicating inside. Allow one round of growth then plate out the
phage that are produced to look at the morphology of the plaques.
Plaque type Frequency

lg clear (parent) 0.4
sm turbid (parent) 0.4
lg turbid (recomb.) 0.1
sm clear (recomb.) 0.1
For this cross the measured distance between mi– and cI– is:
(number of recombinants / total) x 100 = (20 / 100) x 100 = 20 m.u. (map units)
[Map: mi and cI, 20 m.u. apart]
We can also use this system to find the distance between two mutations in the same
gene. For example, let’s say that we have isolated two different cI– mutants that give
clear plaques. After infection of E. coli with ~10 phage per cell of each mutant, the
resulting phage are plated and their plaque morphology is examined. Most of the phage
will still be cI–, but recombination between mutations will produce wild type phage with
turbid plaques. If 2 out of 1,000 phage make turbid plaques then the measured map
distance between the two mutations is 0.4 m.u. Note that in this example we are only
counting wild type recombinants and there should be an equal number of double-mutant
recombinants — both classes of recombinants must be accounted for in calculating the
map distance.
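As a small sketch of this calculation (the function name is ours), the observed wild type recombinants are doubled to account for the double-mutant recombinants, which are assumed to be equally frequent but cannot be seen because they also make clear plaques:

```python
def phage_map_distance(wild_type_recombinants, total_plaques):
    """Map units between two mutations when only wild type recombinants are scored."""
    # Double the visible class to include the unseen double-mutant recombinants.
    return 100 * 2 * wild_type_recombinants / total_plaques

distance_mu = phage_map_distance(2, 1000)
print(distance_mu)
```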
[Map: cI-1 and cI-2, 0.4 m.u. apart]
Lecture 10
Analysis of Gene Sequences
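Anatomy of a bacterial gene:
[Diagram, left to right along the gene: Promoter; Transcription Start; S-D Sequence;
Translation Start (AUG); Coding Sequence (no stop codons); Translation Stop (UAG, UAA,
or UGA); Transcription Terminator. The mRNA runs from the Transcription Start to the
Transcription Terminator.]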
Sequence Element: Function
Promoter: To target RNA polymerase to DNA and to start transcription of a mRNA copy of
the gene sequence.
Transcription terminator: To instruct RNA polymerase to stop transcription.
Shine-Dalgarno sequence and translation start: S-D sequence in mRNA will load ribosomes
to begin translation. Translation almost always begins at an AUG codon in the mRNA (an
ATG in the DNA becomes an AUG in the mRNA copy). Synthesis of the protein thus begins
with a methionine.
Coding Sequence: Once translation starts, the coding sequence is translated by the
ribosome along with tRNAs which read three bases at a time in linear sequence. Amino
acids will be incorporated into the growing polypeptide chain according to the genetic code.
Translation Stop: When one of the three stop codons [UAG (amber), UAA (ochre), or UGA]
is encountered during translation, the polypeptide will be released from the ribosome.
Example: A gene coding sequence that is 1,200 nucleotide base pairs in length (including
the ATG but not including the stop codon) will specify the sequence of a protein 1200/3 =
400 amino acids long. Since the average molecular weight of an amino acid is 110 Da, this
gene encodes a protein of about 44 kDa, the size of an average protein.
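The same arithmetic as a small Python sketch; the function name and the default of 110 Da per amino acid (the average quoted above) are illustrative.

```python
def protein_size(coding_bp, avg_residue_da=110):
    """Return (number of amino acids, approximate mass in kDa) for a coding sequence."""
    residues = coding_bp // 3          # three bases per codon, stop codon not counted
    return residues, residues * avg_residue_da / 1000

residues, mass_kda = protein_size(1200)
print(residues, mass_kda)
```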
The Genetic Code
Classically, genes are identified by their function. That is, the existence of the gene is
recognized because of mutations in the gene that give an observable phenotypic change.
Historically, many genes have been discovered because of their effects on phenotype.
Now, in the era of genomic sequencing, many genes of no known function can be detected
by looking for patterns in DNA sequences. The simplest method which works for
bacterial and phage genes (but not for most eukaryotic genes as we will see later) is to
look for stretches of sequence that lack stop codons. These are known as “open reading
frames” or ORFs. This works because a random sequence should contain an average of
one stop codon in every 21 codons. Thus, the probability of a random occurrence of even
a short open reading frame of, say, 100 codons without a stop codon is very small:
(61/64)^100 = 8.2 x 10^-3
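Both the probability and the idea of scanning for stop-free stretches can be sketched in Python. This is a minimal illustration, not a real gene finder: the function names are ours, only one reading frame of one strand is scanned, and the short test sequence is made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def prob_no_stop(n_codons):
    """Chance that n random codons contain no stop codon (61 of 64 codons are sense)."""
    return (61 / 64) ** n_codons

def longest_open_stretch(dna, frame=0):
    """Longest run of codons without a stop codon in one reading frame."""
    best = run = 0
    for i in range(frame, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            run = 0
        else:
            run += 1
            best = max(best, run)
    return best

p = prob_no_stop(100)
example = longest_open_stretch("ATGAAATAGCCCGGG")  # ATG AAA TAG CCC GGG
print(p, example)
```

A fuller search would repeat the scan in all three frames on both strands and report the positions of long open stretches.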
Identifying genes in DNA sequences from higher organisms is usually more difficult than in
bacteria. This is because in humans, for example, gene coding sequences are separated
by long sequences that do not code for proteins. Moreover, genes of higher eukaryotes
are interrupted by introns, which are sequences that are spliced out of the RNA before
translation. The presence of introns breaks up the open reading frames into short
segments making them much harder to distinguish from non-coding sequences. The maps
below show 50 kbp segments of DNA from yeast, Drosophila, and humans. The dark grey
boxes represent coding sequences and the light grey boxes represent introns. The boxes
above the line are transcribed to the right and the boxes below are transcribed to the
left. Names have been assigned to each of the identified genes. Although the yeast
genes are much like those of bacteria (few introns and packed closely together), the
Drosophila and human genes are spread apart and interrupted by many introns. Sophisticated
computer algorithms were used to identify these dispersed gene sequences.
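[Figure: gene maps of three 50 kbp segments, each drawn on a scale from 0 to 50.
Saccharomyces cerevisiae, above the line: YFL046W, RGD2, FET5, TUB2, RP041, YFL040W,
YFL034W, HAC1, YFL030W, STE2; below the line: SEC53, ACT1, MOB2, RIM15, CAK1, BST1,
EPL1, YFL044C, YPT1, RPL22B, CAF16, YFL042C, GYP8.
Drosophila melanogaster, above the line: CG3131, syt, CG15400; below the line: CG16987,
CG2964, CG3123.
Human, above the line: GATA1, HDAC6, LOC139168; below the line: PCSK1N.]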
To see how gene sequences are actually obtained, we will first need to consider some
fundamentals of the chemical structure of DNA. Each strand of DNA is directional. The
different ends are usually called the 5’ and 3’ ends; referring to different positions on
the ribose sugar ring where the linking phosphate residues attach.
In a double stranded DNA molecule the two strands run anti-parallel to one another and
the general structure can be diagramed like this:
5’ ------------------------------ 3’
3’ ------------------------------ 5’
• Note about representation of DNA sequences.
1) Single strands are always represented in direction of synthesis – 5’ to 3’
2) For double stranded DNA, usually one strand is represented in the 5’ to 3’ direction.
For a gene, the strand represented would correspond to the sequence of the mRNA.
DNA polymerases are the key players in the methods that we will be considering. The
general reaction carried out by DNA polymerase is to synthesize a copy of a DNA
template starting with the chemical precursors (nucleotides) dATP, dGTP, dCTP, and
dTTP (dNTPs). All DNA polymerases have two fundamental properties in common.
(1) New DNA is synthesized only by elongation of an existing strand at its 3’ end.
(2) Synthesis requires nucleotide precursors, a free 3’ OH end, and a template strand.
A general substrate for DNA polymerase looks like this:
5’ ---------- 3’
3’ ------------------------------ 5’
Note that the template strand can be as short as 1 base or as long as several thousand
bases.
After addition of DNA polymerase and nucleotide precursors this product will be readily
synthesized:
5’ ------------------------------ 3’
3’ ------------------------------ 5’
DNA Sequencing
Consider a segment of DNA that is about 1000 base pairs long that we wish to sequence.
(1) The two DNA strands are separated by heating to 100˚C, which melts the base pairing
hydrogen bonds that hold the strands together.
(2) A short oligonucleotide (ca. 18 bases) designed to be complementary to the end of one
of the strands is allowed to anneal to the single stranded DNA. The resulting DNA
hybrid looks much like the general polymerase substrate shown previously.
(3) DNA polymerase is added along with the four nucleotide precursors (dATP, dGTP,
dCTP, and dTTP). The mixture is then divided into four separate reactions and to each
reaction a small quantity of a different dideoxynucleotide precursor is added.
Dideoxynucleotide precursors are abbreviated ddATP, ddGTP, ddCTP, and ddTTP.
(4) The polymerase reactions are allowed to proceed and, using one of a variety of
methods, radiolabel is incorporated into the newly synthesized DNA.
(5) After the DNA polymerase reactions are complete, the samples are melted and run on
a gel system that allows DNA strands of different lengths to be resolved. The DNA
sequence can be read from the gel by noting the positions of the radiolabeled fragments.
The crucial element of the sequencing reactions is the added dideoxynucleotides. These
molecules are identical to the normal nucleotide precursors in all respects except that
they lack a hydroxyl group at their 3’ position (3’ OH).
Thus dideoxynucleotides can be incorporated into DNA, but once a dideoxynucleotide has
been incorporated further elongation stops because the resulting DNA will no longer have
a free 3’ OH end. Each of the four reactions contains one of the dideoxynucleotides
added at about 1% the concentration of the normal nucleotide precursors. Thus, for
example, in the reaction with added ddATP about 1% of the elongated chains will
terminate at the position of each A in the sequence. Once all of the elongating chains
have been terminated there will be a population of labeled chains that have terminated at
the position of each A in the sequence.
A part of the final gel will look like this:
[Gel image: four lanes labeled +ddGTP, +ddATP, +ddTTP, +ddCTP, running from Top (-) to
Bottom (+).]
(Note that larger molecules migrate more slowly toward the positive electrode on these gels)
The deduced DNA sequence obtained from this gel is: 5’ GGATCCTATC 3’
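How the sequence is read off the gel can be sketched in Python. This is a simplified model under stated assumptions: chain lengths are counted from the first base added after the primer, every possible termination product is present, and the function names are our own.

```python
def termination_lengths(new_strand):
    """For each ddNTP lane, the lengths of the chains that end at that base."""
    lanes = {base: [] for base in "GATC"}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(position)
    return lanes

def read_gel(lanes):
    """Read bands from shortest (bottom) to longest (top) to get the 5' to 3' sequence."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)

lanes = termination_lengths("GGATCCTATC")
sequence = read_gel(lanes)
print(lanes["A"], sequence)
```

The shortest chains run fastest, so the band nearest the bottom of the gel corresponds to the 5’ end of the newly synthesized strand.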
Polymerase Chain Reaction
Now let’s consider how to obtain DNA segments that are suitable for sequencing. At
first, DNA sequences were obtained from cloned DNA segments (we will discuss some
methods to clone new genes in a subsequent lecture). Presently the entire DNA sequence
for E. coli, as well as a variety of other bacterial species, has been determined. If we
want to find the sequence of a new mutant allele of a known gene we need an easy way to
obtain a quantity of this DNA from a culture of bacterial cells. The best way to do this
is to use a method known as PCR or polymerase chain reaction that was developed by Kary
Mullis in the mid-1980’s. The steps in a PCR reaction are as follows.
(1) A crude preparation of chromosomal DNA is extracted from the bacterial strain of
interest.
(2) Two short oligonucleotide primers (each about 18 bases long) are added to the DNA.
The primers are designed from the known genomic sequence to be complementary to
opposite strands of DNA and to flank the chromosomal segment of interest.
(3) The double stranded DNA is melted by heating to 100˚C and then the mixture is
cooled to allow the primers to anneal to the template DNA.
(4) DNA polymerase and the four nucleotide precursors are added and the reaction is
incubated at 37˚C for a period of time to allow a copy of the segment to be synthesized.
(5) Steps 3 and 4 are repeated multiple times. To avoid the inconvenience of having to
add new DNA polymerase in each cycle a special DNA polymerase that can withstand
heating to 100˚C is used.
The idea is that in each cycle of melting, annealing and DNA synthesis the amount of the
DNA segment is doubled. This gives an exponential increase in the amount of the specific
DNA as the cycles proceed. After 10 cycles the DNA is amplified about 10^3-fold and after
20 cycles the DNA will be amplified about 10^6-fold. Usually amplification is continued
until all of the nucleotide precursors are incorporated into synthesized DNA.
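The doubling can be checked with a one-line Python function (the name is ours). It assumes ideal doubling in every cycle, which real reactions only approach before the precursors run out.

```python
def pcr_fold_amplification(cycles):
    """Ideal fold amplification if the segment doubles in every cycle."""
    return 2 ** cycles

after_10 = pcr_fold_amplification(10)
after_20 = pcr_fold_amplification(20)
print(after_10, after_20)
```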
Lecture 11
Gene Mutations
Let’s say that we are investigating the LacZ gene, which encodes the lactose hydrolyzing
enzyme β-galactosidase. There is a special compound known as X-gal that can be
hydrolyzed by β-galactosidase to release a dark blue pigment. When X-gal is added to
the growth medium in petri plates, Lac+ E. coli colonies turn blue whereas Lac– colonies
with mutations in the LacZ gene are white. By screening many colonies on such plates it is
possible to isolate a collection of E. coli mutants with alterations in the LacZ gene. PCR
amplification of the LacZ gene from each mutant followed by DNA sequencing allows the
base changes that cause the LacZ– phenotype to be determined. A very large number of
different LacZ mutations can be found but they can be categorized into three general
types.
Mutation Type Description
Missense A base change that converts one codon into another. Many missense
mutations are silent because the encoded amino acid remains the
same or the amino acid substitution is sufficiently subtle so as not to
compromise activity of the enzyme. Missense mutations that have a
marked effect often lie in the active site or grossly disrupt protein
folding.
Nonsense A base change that converts a codon within the coding sequence into
a stop codon. Note that there is only a limited set of sense codons
that can be converted to a stop codon by a single base change.
Nonsense mutations lead to a truncated protein product. Nonsense
mutations that lie early in the gene sequence will completely inactivate
the gene. Sometimes nonsense mutations that lie late in the gene
sequence will not disrupt gene function.
Frameshift The addition or deletion of a base or bases such that the coding
sequence is shifted out of register. Note that addition or deletion of
a multiple of three bases does not cause a frameshift. After the
frameshift mutation is encountered, missense codons will be read up
to the first stop codon. Like nonsense mutations, frameshift
mutations usually lead to complete inactivation of the gene.
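The note that only a limited set of sense codons can be converted to a stop codon by a single base change can be checked by enumeration. The Python sketch below (function name ours) tries every single-base substitution in each of the 61 sense codons, written as RNA.

```python
from itertools import product

STOPS = {"UAA", "UAG", "UGA"}
BASES = "UCAG"

def one_step_from_stop():
    """Sense codons that a single base change converts into a stop codon."""
    hits = set()
    for codon in ("".join(c) for c in product(BASES, repeat=3)):
        if codon in STOPS:
            continue
        for pos in range(3):
            for base in BASES:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if mutant in STOPS:
                    hits.add(codon)
    return sorted(hits)

convertible = one_step_from_stop()
print(len(convertible), convertible)
```

Only 18 of the 61 sense codons are one base change away from a stop codon, so nonsense mutations can arise at only a subset of positions in a gene.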
Although many different kinds of mutations occur spontaneously, the frequency with
which mutations occur can be increased as much as 10^3-fold by treatment of cells with a
mutagen. Here are some general categories of mutagens:
Type of Mutagen: Base Analog
  Mechanism: Analog is incorporated into DNA and can pair with more than one base
  Examples and type of mutations:
    5-bromouracil    A•T → G•C, G•C → A•T
    2-aminopurine    A•T → G•C

Type of Mutagen: Base Modifying Agent
  Mechanism: Chemical or photo damage to DNA can be repaired, but repair itself is error prone
  Examples and type of mutations:
    Hydroxylamine    G•C → A•T
    EMS              G•C → A•T, C•G or T•A
    UV               All changes

Type of Mutagen: Intercalating Agent
  Mechanism: Polycyclic compounds can fit between bases and cause miscopying by polymerase to add or delete bases
  Examples and type of mutations:
    Acridine         Frameshifts (+ or –)
    Proflavine       Frameshifts (+ or –)
    ICR-191          Frameshifts (+ or –)
Suppressor mutations
A powerful mode of genetic analysis is to investigate the types of mutations that can
reverse the phenotypic effects of a starting mutation. Say that you start with a mi– λ
phage mutant that makes small plaques. After plating a large number of these mutant
phage, rare revertants can be isolated by looking for phage that have restored the ability
to make large plaques. These revertants could have either been mutated such that the
starting mutation was reversed or they could have acquired a new mutation that somehow
compensates for the starting mutation. The possibilities are:
1) back mutation - true wild type
2) intragenic suppressor - compensating mutation in same gene
3) extragenic suppressor - compensating mutation in different gene
These possibilities can be distinguished in that a revertant that arose by suppression will
still carry the starting mutation (now masked by the suppressor mutation), whereas a
back mutation will produce a true wild type phage. The general test is to cross the
revertant to wild type and to note whether mi– recombinants are observed. A back
mutation crossed to wild type will not produce any mi– progeny, whereas a revertant that
results from an extragenic suppressor will produce many mi– recombinants. Intragenic
suppressors will produce an intermediate result that sometimes can be difficult to
distinguish from a back mutation in practice. For example, an intragenic suppressor that
lies very close to the original mi– mutation may be able to produce mi– recombinants in
principle but these recombinants may be too rare to be readily observed.
Nonsense suppressors.
An important class of extragenic suppressor mutations can suppress nonsense mutations

by changing the ability of the cells to read a nonsense codon as sense. Such extragenic
revertants were originally isolated by selecting for reversion of amber (UAG) mutations
in two different genes. Since simultaneous back mutations at two different sites are
highly improbable, the most frequent mechanism for suppression is a single mutation in
the gene for a tRNA that changes the codon recognition portion of the tRNA. For
example, one of several possible nonsense suppressors occurs in the gene for a serine
tRNA (tRNAser). One of six tRNAser normally contains the anticodon sequence CGA which
recognizes the serine codon UCG (by convention sequences are given in the 5’ to 3’
direction). A mutation that changes the anticodon to CUA allows the mutant tRNAser to
recognize a UAG codon and insert serine when a UAG codon appears in a coding
sequence.
Recognition of UCG (serine codon) by wild type tRNAser:
    mRNA:            5'---UCG---3'
    tRNA anticodon:     3'-AGC-5'   (tRNA carries Ser)

Recognition of UAG (stop codon) by amber suppressor mutant tRNAser:
    mRNA:            5'---UAG---3'
    tRNA anticodon:     3'-AUC-5'   (tRNA carries Ser)
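The pairing shown in the diagram can be checked with a short sketch. It assumes strict Watson-Crick pairing (no wobble) and writes both codon and anticodon 5' to 3'; the function name is only for illustration.

```python
# Watson-Crick partners for RNA bases.
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon):
    """Return the anticodon (written 5' to 3') that pairs with an mRNA codon (5' to 3')."""
    # Pairing is antiparallel, so the codon is read backwards while complementing.
    return "".join(PAIRS[base] for base in reversed(codon))


wild_type = anticodon_for("UCG")   # serine codon
suppressor = anticodon_for("UAG")  # amber stop codon
differences = sum(a != b for a, b in zip(wild_type, suppressor))
print(wild_type, suppressor, differences)
```

The two anticodons differ at a single position, which is why one base change in the tRNA gene is enough to create the suppressor.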
The presence of an amber suppressing mutation is usually designated Su+ whereas a wild-
type (nonsuppressing) strain would be designated Su–.
Example: Pam designates an amber (nonsense) mutation in the λ phage P gene, which is
required for λ phage DNA replication. When λ Pam phage are grown on E. coli with an
amber suppressor (Su+) the phage multiply normally, but when λ Pam phage infect a
nonsuppressing host (Su–) the phage DNA cannot replicate.
The combined use of amber mutations and an amber suppressor produces a conditional
mutant, which is a mutant whose phenotype is expressed under some circumstances but not under
others. Conditional mutants are especially useful for studying mutations in essential
genes. Another kind of conditional mutation is a temperature sensitive mutation for
which the mutant trait is exhibited at high temperature but not at low temperature. In a
sense, auxotrophic mutations are also conditional because auxotrophic mutants can be
grown in the presence of the required nutrient but the mutants will not grow when the
nutrient is not provided.
Lecture 12
Transposable elements
Transposons are usually from 10^3 to 10^4 base pairs in length, depending on the transposon
type. The key property of transposons is that a copy of the entire transposon sequence
can at a low frequency become inserted at a new chromosomal site. The mechanism by
which transposons insert into new sites differs from one kind of transposon to another,
but the details are not important to understand how transposons can be used. It is worth
contrasting the recombination events that occur during transposition to the homologous
recombination events that we have considered in meiosis and in phage crosses. In
homologous recombination, crossovers occur between like sequences. While this type of
recombination can generate new combinations of alleles the arrangement of genes is left
undisturbed. In contrast, transposition involves recombination between unrelated
sequences, namely the ends of the transposon and a site in the target sequence.
Transposition therefore results in a new arrangement of genes along the chromosome.
The generic structure of a transposon looks like this:

[Figure: transposon Tn5 inserted in host DNA. From left to right: host DNA, inverted repeat sequence, transposase gene, kanamycin resistance gene, inverted repeat sequence, host DNA]
Transposon Element     Function
Transposase            An enzyme that cuts the target DNA more or less at random
                       and splices the transposon ends to the target sequences.
                       Other steps in transposition are performed by host enzymes.
Inverted Repeats       These sequences direct transposase to act at the ends of
                       the transposon. Note that because the sequences are
                       inverted, the two ends have identical sequence.
Selectable Marker(s)   Transposons are thought to have evolved by providing a
                       selective advantage to the host cell. Many transposons carry
                       genes that confer antibiotic resistance or some other
                       benefit to the host.
The study of transposition mechanism and the biology of transposons is an interesting
subject in genetics but for our current purposes we are going to concentrate on how
transposons can be used for bacterial genetic analysis. For this purpose we will focus on
the transposon Tn5 which can function in E. coli as well as a wide variety of other
bacterial species. The selectable marker in Tn5 is a gene that confers resistance to the
antibiotic kanamycin. Thus bacteria without Tn5 are sensitive to kanamycin (Kans),
whereas bacteria that have Tn5 inserted into the chromosome are resistant to kanamycin
(Kanr).
To introduce random insertions of Tn5 into the E. coli chromosome we will start with Tn5
carried on a special λ phage vector: λ Pam int–::Tn5.
Pam allows conditional phage growth. When λ Pam phage infect E. coli with an amber
suppressor (Su+) the phage multiply normally, but when λ Pam phage infect a
nonsuppressing host (Su–) the phage cannot replicate.
int– is a mutation in the λ integrase gene. Phage with this mutation can not integrate into
the host chromosome to make a stable prophage.
::Tn5 designates that the λ phage carries an inserted copy of Tn5.
When λ Pam int–::Tn5 infects a wild type (Su– Kans) E. coli host, the phage DNA can not
replicate (Pam) nor can it integrate (int–); thus the only way for the E. coli to become Kanr
is for Tn5 to transpose from the λ DNA to some location on the E. coli chromosome. This
type of transposition is an inherently rare process and will occur in about one out of 10^5
phage-infected E. coli cells. This is how a transposon mutagenesis might be done:
1) Infect 2 x 10^9 wild-type E. coli cells with λ Pam int–::Tn5 so that each cell receives at
least one phage chromosome.
2) Select for Kanr by plating on medium that contains kanamycin. There should be a total
of about 2 x 10^4 Kanr colonies. Each of these should have Tn5 inserted into a different
site on the E. coli chromosome.
The genes of E. coli are densely spaced along the chromosome and about half of the Tn5
insertions will lie in one gene or another. There are 4,200 genes in E. coli so our
collection of 2 x 10^4 random Tn5 insertions will likely contain at least one insertion in each
gene. (Note that insertions in genes that are essential for E. coli growth such as the
genes for RNA polymerase or ribosomal subunits will not be recovered because these
insertion mutants will not form colonies on the kanamycin plates).
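The numbers in this mutagenesis can be worked through with a short calculation. The sketch below uses only the figures quoted above, plus one simplifying assumption that is not in the notes: insertions land at random and every gene is an equally large target, so the number of hits per gene is roughly Poisson distributed.

```python
import math

cells_infected = 2e9             # cells infected in step 1
transposition_frequency = 1e-5   # about 1 transposition per 10^5 infected cells
gene_count = 4200                # genes in E. coli
fraction_in_genes = 0.5          # about half of the insertions lie within a gene

# Expected number of Kanr colonies, each carrying an independent Tn5 insertion.
insertions = cells_infected * transposition_frequency

# Average number of insertions per gene, then the Poisson chance that a gene has none.
mean_hits_per_gene = insertions * fraction_in_genes / gene_count
p_gene_missed = math.exp(-mean_hits_per_gene)

print(f"insertions: {insertions:.0f}")
print(f"mean hits per gene: {mean_hits_per_gene:.2f}")
print(f"chance a given gene has no insertion: {p_gene_missed:.3f}")
```

Under this simple model a gene receives about 2.4 insertions on average and roughly 9% of genes would be missed, so "at least one insertion in each gene" is best read as an approximation. A larger collection would be needed to approach complete coverage.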
Let’s say that we are interested in the E. coli genes that are involved in synthesis of
histidine. To find insertion mutants that can not synthesize histidine (His–) we could
screen amongst our collection of 2 x 10^4 random Tn5 insertions to find those that are His–.
The easiest way to do this would be to plate out the collection of insertions at a density
of 200 colonies per plate (100 plates total). Each of these master plates would then be
replica plated (first by transfer to a sterile piece of velvet) to a plate that contains
histidine and also to a plate that lacks histidine. His– insertion mutants would be identified
as colonies that can not grow on the plates that lack histidine. Note that the same
collection of random Tn5 insertions can be screened multiple times to find interesting
mutations with different phenotypes.
3) Identify His– Tn5 insertion mutants by replica plating to find colonies that specifically
can not grow on plates that don’t contain histidine.
Once we have a set of His– insertion mutations (in the present example, one might expect
to find 10-20 different His– mutants), the affected gene(s) can be identified by the
simple fact that they will be “tagged” by the inserted Tn5 sequences. The easiest way to
identify the site of insertion is by performing a special PCR amplification of the DNA
fragment that corresponds to the novel junction between Tn5 and the bacterial chromosomal
sequences. Ordinarily PCR reactions are carried out using two DNA primers, each
of which corresponds to an end of the sequence to be amplified. When we want to
amplify a junction fragment we can use as one of the primers a sequence that lies near
the end of Tn5 but we won’t yet know the relevant chromosomal sequence to allow the
other primer to be designed. There are several tricks that can be used to circumvent
this problem, which are too complicated to describe here. Suffice it to say that there
are ways that the junction fragment can be amplified by PCR using only sequences defined
by the Tn5 portion of the junction fragment.
4) Use the known sequence of the end of Tn5 to PCR amplify a fragment that spans the
junction between the end of Tn5 and the E. coli chromosomal site that was the target for
insertion.
DNA sequencing of the amplified junction fragments will give the identity of the target
sequences. Since we know the DNA sequence of the entire E. coli chromosome, the gene
that was the target for Tn5 insertion can be identified unambiguously.
5) The DNA sequence of the junction fragments will identify all of the genes that have
been inactivated to give the His– phenotype.
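The last two steps amount to a sequence lookup, which can be sketched as follows. Everything here is a toy: the Tn5 end, the genome and the gene coordinates are hypothetical strings invented for the example, and the sketch assumes the junction read runs from Tn5 into the chromosome on the same strand as the reference sequence.

```python
# Hypothetical toy sequences for illustration only; none is real Tn5 or E. coli sequence.
TN5_END = "CCGGTTAACC"
GENOME = "ATGGCTAAACGTTTAGGCCCTGAAATCGACTGGTACCATTAA"
GENES = {"geneX": (0, 18), "geneY": (18, 42)}  # name -> (start, end), end exclusive


def map_insertion(junction_read, tn5_end, genome):
    """Return the genome coordinate (0-based) where the chromosomal flank begins, or None."""
    idx = junction_read.find(tn5_end)
    if idx == -1:
        return None  # read does not contain the known transposon end
    flank = junction_read[idx + len(tn5_end):]  # chromosomal part of the junction fragment
    if not flank:
        return None
    pos = genome.find(flank)
    return pos if pos != -1 else None


def gene_at(position, genes):
    """Return the name of the gene that contains a coordinate, or None if it is intergenic."""
    for name, (start, end) in genes.items():
        if start <= position < end:
            return name
    return None


# A sequenced junction fragment: a little Tn5 sequence followed by chromosomal DNA.
read = "TT" + TN5_END + GENOME[22:34]
site = map_insertion(read, TN5_END, GENOME)
print(site, gene_at(site, GENES))
```

A real analysis would also search the reverse complement and tolerate sequencing errors, typically with an alignment tool rather than exact string matching.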
The procedure just outlined can be used to isolate and characterize a wide variety of
useful mutations. A major limitation of this method is that as stated earlier, transposon
mutations usually completely disrupt the target gene and therefore lead to a complete
inactivation of the gene product. Often we will want to work with point mutations (such
as temperature sensitive mutations or nonsense mutations). In the next lecture we will
see how transposons can also be used to facilitate analysis and manipulation of point
mutations.
Lecture 13
Gene Manipulation in Bacteria
There is no meiosis in bacteria so special techniques have been worked out for
manipulating genes in bacteria so that mapping experiments, strain construction, and
complementation tests can be done.
First, we need a way of getting chromosomal DNA from one cell into another. There are
several ways to do this. All of the methods have in common the use of special
extrachromosomal elements for mobilizing chromosomal genes; the methods differ according
to which extrachromosomal element is used.
We will consider a method that uses phage and is known as Transduction.
[Figure: the E. coli chromosome is 4.6 x 10^6 base pairs; the phage P1 chromosome is 10^5 base pairs]
After infection of E. coli, the phage DNA is replicated by a mechanism known as a “rolling
circle” and the phage DNA is packaged into phage particles one headfull at a time.
About 1 in 300 phage particles mistakenly packages E. coli chromosomal DNA instead of phage DNA.
Each phage particle will package about 1/50 of the E. coli chromosome. By combining
probabilities we see that about 1/15,000 phage will carry a particular E. coli gene.
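The combination of probabilities can be written out explicitly. This small sketch uses only the two figures given above; the variable names are illustrative.

```python
from fractions import Fraction

p_packages_host_dna = Fraction(1, 300)   # phage particles that carry E. coli DNA
fraction_per_headfull = Fraction(1, 50)  # share of the chromosome in one headfull

# A particle carries a particular gene only if it packaged host DNA
# AND the packaged piece happens to include that gene.
p_carries_gene = p_packages_host_dna * fraction_per_headfull
print(p_carries_gene)

# Cross-check of the 1/50 figure from the sizes quoted earlier.
headfulls_per_chromosome = 4.6e6 / 1e5
print(round(headfulls_per_chromosome))
```

The chromosome corresponds to about 46 headfulls, which the notes round to 50.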
A basic transduction experiment to measure the linkage between markers A and B is done
as follows:
(1) Grow P1 on A+B+
(2) Infect A–B–
(3) Select for A+ and then screen for B+
The idea is that we are looking for the rare cases where some chromosomal DNA carrying
gene A is moved into the recipient. To find these recombinants, we select for A+. Then
we screen for B+ to see how often gene B comes along with gene A.
[Figure: donor DNA fragment carrying A+ B+ aligned with the recipient chromosome carrying A– B–]
The measured frequency of cotransduction of B with A gives a measure of distance

according to the following rules:
• If the distance between A and B is greater than one headfull (10^5 bp) then there will be no
cotransduction.
• If A and B are very close together then there will be 100% cotransduction.
• Cotransduction frequency is an inverse measure of distance.
[Figure: plot of cotransduction frequency (%) against distance between markers, falling from 100 at zero distance to 0 at one headfull]
The experiment just described is the bacterial equivalent of a 2-factor cross and will
give us relative distances between genes.
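The three rules above do not say how cotransduction frequency falls off with distance. One commonly used approximation, which is not part of these notes, is the formula attributed to Wu: frequency = (1 - d/L)^3, where d is the distance between markers and L is the length of a headfull. The sketch below assumes that formula and the headfull size of 10^5 bp quoted above.

```python
def cotransduction_frequency(distance_bp, headfull_bp=1e5):
    """Approximate cotransduction frequency, assuming the formula (1 - d/L)^3."""
    if distance_bp >= headfull_bp:
        return 0.0  # markers further apart than one headfull are never cotransduced
    return (1 - distance_bp / headfull_bp) ** 3


for d in (0, 2e4, 5e4, 1e5):
    print(f"{d:>8.0f} bp  ->  {cotransduction_frequency(d):.3f}")
```

The function reproduces all three rules: 100% at zero distance, 0% at one headfull or more, and a steady decrease in between.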
We can also do a 3-factor cross to determine gene order.
(1) Grow P1 on A+B+C+
(2) Infect A–B–C–
(3) Select for A+ and then screen for B+ and/or C+
Genotypes
A+B+C+ 2 crossovers (A to C distance)
A+B+C– 2 crossovers (A to B distance)
A+B–C– 2 crossovers
A+B–C+ 4 crossovers (very rare)
(Note that there are only four possible genotypes because we select A+)
[Figure: donor DNA fragment carrying A+ B+ C+ aligned with the recipient chromosome carrying A– B– C–]
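The reasoning behind the 3-factor cross is that the rarest class among the A+ transductants is the one that needs four crossovers, and which class that is depends on the gene order. A minimal sketch of that inference follows; the counts are invented for illustration and the function name is hypothetical.

```python
def infer_order(counts):
    """Infer gene order from A+ transductant classes.

    counts maps (B allele, C allele) to the number of A+ transductants observed,
    with alleles written as "+" (donor) or "-" (recipient).
    """
    rarest = min(counts, key=counts.get)
    if rarest == ("-", "+"):
        # A+ B- C+ needs four crossovers only if B lies between A and C.
        return "A-B-C"
    if rarest == ("+", "-"):
        # A+ B+ C- needs four crossovers only if C lies between A and B.
        return "A-C-B"
    return None  # no four-crossover class stands out, e.g. if A is the middle marker


# Hypothetical counts among 1000 A+ transductants.
observed = {("+", "+"): 300, ("+", "-"): 150, ("-", "-"): 540, ("-", "+"): 10}
print(infer_order(observed))
```

With real data the rarest class should be much rarer than the others before any order is concluded from it.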
A limitation of transduction experiments is the need for a good selectable marker. Tn5
insertions provide a way to extend the utility of transduction for mapping and strain
construction. For example, let’s say that we have isolated a new mutation in the MotA
gene. MotA is a component of the bacterial flagellar motor and MotA– mutants are
nonmotile, a phenotype easily detected by the inability of MotA– colonies to “swarm”
outward on soft agar plates. Imagine that we want to map the MotA– mutation or to
move this mutation into an E. coli strain with a new genetic background. Clearly direct
transduction of MotA– would not be possible since we have no way to select for rare
(1/15,000) transductants with the nonmotile MotA– phenotype. One solution would be to
use a nearby marker for which we can select to move MotA– by its cotransduction with
the selectable marker. Unfortunately, good selectable markers are not common and we
are unlikely to have a good selectable marker placed within cotransduction distance of
MotA– readily available. A powerful alternative approach would be to isolate a random
Tn5 insertion that is close to MotA– and to use the Kanr trait conferred by Tn5 as the
selectable marker for cotransduction.
The steps for finding a linked Tn5 insertion are as follows:
1) Start with a collection of random Tn5 insertions into wild type E. coli (the isolation of
such a collection was described in the last lecture). Grow phage P1 on the mixture of 2 x 10^4
different Tn5 insertion mutants. Note that this donor strain is MotA+.
2) Use the resulting P1 phage to infect a MotA– recipient strain. Select for transduction
of the Tn5 insertions by selecting for growth of the transductants on kanamycin plates.
Screen for cotransduction of MotA+ by testing each of the Kanr transductants for
motility on soft agar. The desired cotransductant will be Kanr and will be motile.
Given that one P1 phage headfull corresponds to about 1/50 of the E. coli chromosome,
about 1 in 500 Tn5 insertions will be close enough to the MotA gene to show 90%
cotransduction. Thus if we test about 10^3 Kanr transductants for motility, we are likely
to find at least one that has cotransduced the MotA+ marker.
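How likely "likely" is can be estimated directly. The sketch below assumes that each Kanr transductant tested is an independent draw and that 1 in 500 of them carries a closely linked Tn5, as stated above.

```python
def p_at_least_one(fraction_linked, transductants_tested):
    """Chance that at least one tested transductant carries a linked insertion."""
    return 1 - (1 - fraction_linked) ** transductants_tested


expected_hits = 1000 / 500
chance = p_at_least_one(1 / 500, 1000)
print(f"expected linked insertions: {expected_hits:.0f}")
print(f"chance of finding at least one: {chance:.2f}")
```

Testing 10^3 transductants therefore gives two linked insertions on average and a better than 85% chance of finding at least one.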
3) Once a Tn5 (Kanr) MotA+ transductant has been identified, grow P1 on Tn5 (Kanr)
MotA+.
4) Use the P1 phage from step 3) to infect a MotA– recipient strain. Select for
transduction of the Tn5 insertions by selecting for growth of the transductants on
kanamycin plates.
Test the resulting Kanr transductants for their motility. The transductants that have
cotransduced the MotA+ marker will be motile, whereas the transductants that still contain
the MotA– allele will be nonmotile. The fraction of the total transductants that are
motile will give the distance between MotA– and the Tn5 insertion as a cotransduction
frequency.
A Tn5 (Kanr) MotA– transductant isolated in step 4) can then be used to transduce the
MotA– marker into a new recipient strain by cotransduction with Tn5. Note that if we
had isolated a second MotA– mutant, transduction into this strain would amount to a
3-factor cross and would provide a way to determine the order of the two different
MotA– alleles.
Lecture 14
Gene Complementation in Bacteria
In order to perform tests for dominance or for complementation in bacteria we need a

way to make the bacteria diploid for part of the chromosome. To do this we need to
consider a different extrachromosomal element:
[Figure: The F plasmid (length 10^5 base pairs), showing Ori T and the Tra genes]
There are some special terms to describe the state of F in a cell: F– refers to a strain
without any form of F, whereas F+ refers to a strain with an F plasmid.
[Figure: a donor cell joined to a recipient cell by the F pilus, with Ori T marked]
F is very efficient at transferring itself from an F+ cell to an F– cell. After culturing F+

and F– cells together about 1/10 of the F– cells will become F+.
The property that makes F useful for genetic manipulation is that at low frequency the
plasmid will integrate into the chromosome. This occurs because F carries insertion
sequences that are also present at multiple locations on the chromosome. Crossing over
between insertion sequences on F and on the chromosome gives integration.
Hfr: a strain with F integrated into the chromosome that will give efficient transfer of
some chromosomal markers.
F+ plasmid: 1) Transfers itself at a frequency of 0.1
2) Does not transfer chromosomal markers
Hfr: 1) Transfers some chromosomal markers efficiently
2) Other markers transferred inefficiently - Gradient of transfer
(It takes about 100 minutes to transfer the entire chromosome)
Consider an F+ integrating to make an Hfr:
[Figure: an F+ plasmid recombining with the chromosome (markers A B C D) to give an Hfr]
This process can be reversed to go back to the F+ state:
[Figure: the Hfr chromosome (markers A B C D) recombining to release the F+ plasmid]
The recombination can occur at a different position to give an F plasmid that carries a
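part of the chromosome. This form of F is called an F’.
[Figure: recombination at a different position gives an F’ carrying markers B and C, with markers A and D remaining on the chromosome]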
F’s are usually isolated by selection for early transfer of a marker that is transferred
late in the Hfr. In the example above the F’ could have been isolated from a population
of Hfrs by selecting for early transfer of either B or C.
F’: 1) Very efficient transfer of markers carried on F’. These can be markers that
were transferred very late in the Hfr from which the F’ was derived.
2) No transfer of chromosomal markers not on F’.
F’s can be used to perform genetic tests of function because a cell containing an F’ will be
diploid for the region of the chromosome carried on the F’. Such a cell is known as a
merodiploid.
For example, if we isolated a new Lac– mutation we could use an F’ Lac+ to determine
whether the Lac– mutation is dominant or recessive.
Genotype              Growth on lactose
Lac+                        +
Lac–                        –
Lac– / F’ Lac+              +      (Lac– is recessive)
It is also possible to test for functional complementation of two linked mutations.

Consider two mutations, A– and B–, that are close together and have the same phenotype.
We can introduce an F’ carrying A– into a strain with a B– mutation. If the merodiploid
has a wild type phenotype then we know that the mutations complement and are
therefore in different genes.
[Figure: merodiploid with A– B+ on the F’ and A+ B– on the chromosome]
Lecture 15
Gene Cloning
F is one of many bacterial plasmids, most of which are also transmissible from one cell to
another.
R factors - This type of plasmid was discovered in Japan in the early 1950s. They came from
hospital patients that were infected with bacteria that were resistant to several
different antibiotics. This was surprising since antibiotics work by very different
mechanisms.
For example, resistance to ampicillin, kanamycin, tetracycline, and sulfonamide could be

conferred at once on transfer of a given R factor. In fact, most of the antibiotic
resistance genes are actually in transposons that are carried on the R factor.
[Figure: R factor plasmid carrying the resistance genes Sulr, Ampr, Tetr and Kanr]
Modern cloning vectors are stripped down versions of R factors. They usually carry one
or two drug resistance genes and an origin of replication.
[Figure: cloning vector carrying Ampr and an origin of replication]
Cloning involves the use of enzymes in vitro to make plasmids carrying pieces of the
chromosome. One of the important tools is a set of enzymes that can cleave DNA at
specific sites. These enzymes are known as Restriction Enzymes. They were discovered
in the following way:
                    Titer when plated on
                    E. coli C    E. coli K
λ (grown on C)      10^8/ml      10^3/ml
λ (grown on K)      10^8/ml      10^8/ml
This phenomenon known as “host restriction” behaves like a genetic change that reverses
at a high frequency. The explanation is that E. coli K makes an enzyme that cleaves λ DNA.
The K strain doesn’t destroy its own chromosome because it also makes an enzyme that
modifies the cleavage site.
The phage that grow on K have by rare chance escaped cleavage long enough to be
modified.
The genes for restriction enzymes usually come in pairs with the gene for the restriction
enzyme (R) residing next to the gene for the enzyme that modifies the same sequence (M).
[Figure: adjacent genes M (modifying enzyme) and R (restriction enzyme)]
Mutants that have a mutated version of the restriction enzyme but a wild type version of
the modifying enzyme (R– M+) are useful because they do not show host restriction but
phage grown on these strains are resistant to host restriction. It is a useful exercise to
think about why a strain with a mutated modifying enzyme but a wild type restriction
enzyme (R+ M–) would be inviable.
A large number of these enzymes have been isolated from different bacterial species.
Most of the enzymes recognize palindromic DNA sequences of 4 or 6 base pairs.
Restriction enzymes can be used to cut chromosomal DNA into fragments. These
fragments can be ligated into plasmid DNA that has been cut at a single site. This
procedure takes advantage of the fact that the DNA ends that remain after cleavage
with a restriction enzyme will base pair with other ends cut with the same enzyme. The
collection of a large number of random chromosomal fragments carried in plasmids is
known as a Library.
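A palindromic recognition site, as mentioned above, is one that reads the same on both strands when each is read 5' to 3'. The sketch below checks that property and estimates how often a site of a given length is expected to occur. The spacing estimate assumes a random sequence with equal base frequencies, which real genomes only approximate. GAATTC is the recognition site of the enzyme EcoRI.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the opposite strand of a DNA sequence, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_palindromic_site(seq):
    """True if the sequence is identical to its own reverse complement."""
    return seq == reverse_complement(seq)


def expected_spacing(site_length):
    """Average distance in bp between sites, assuming random equal-frequency bases."""
    return 4 ** site_length


print(is_palindromic_site("GAATTC"))
print(expected_spacing(4), expected_spacing(6))
```

On this estimate an enzyme with a 6 base pair site cuts about once every 4 x 10^3 base pairs and an enzyme with a 4 base pair site about once every 256 base pairs.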
Generation of a library yields a very large collection of plasmids each with a different
chromosomal insert.
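[Figure: E. coli chromosomal DNA and an Ampr plasmid are each cut with a restriction enzyme, then mixed and ligated to give Ampr plasmids with different inserts; these are introduced into cells by transformation (CaCl2 treatment) and Ampr transformants are selected]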
Cloning by Complementation
Say we wanted to clone the Lac operon. First a library would be made from DNA from a
Lac+ E. coli strain. This library would then be used to transform a Lac– strain.
Transformants would first be selected by Ampr. The resistant colonies would then be
screened for the ability to grow on lactose (Lac+). These clones should contain plasmids
carrying a functional Lac operon.
How many clones would we need to screen? Each plasmid carries about 5 x 10^3 bp of
chromosomal DNA. The chromosome is 5 x 10^6 base pairs so the entire genome will be
covered if several thousand clones are screened.
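"Several thousand" can be made more precise with a standard estimate that is not given in these notes, often attributed to Clarke and Carbon: N = ln(1 - P) / ln(1 - f), where f is the fraction of the genome in one insert and P is the desired chance that a given sequence is present in the library. The sketch assumes inserts are random and independent.

```python
import math


def clones_needed(insert_bp, genome_bp, p_covered):
    """Number of random clones needed so a given sequence is present with chance p_covered."""
    f = insert_bp / genome_bp  # fraction of the genome carried by one plasmid
    return math.ceil(math.log(1 - p_covered) / math.log(1 - f))


n_95 = clones_needed(5e3, 5e6, 0.95)
n_99 = clones_needed(5e3, 5e6, 0.99)
print(n_95, n_99)
```

With the sizes quoted above, roughly 3,000 clones give a 95% chance and roughly 4,600 clones a 99% chance of including any particular gene.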
All sorts of genes from E. coli have been cloned by looking for DNA fragments that can
restore function to a mutant. It is also possible to find genes from other bacteria. The
following is a dramatic example of a cloning experiment to find an important protein for a
pathogenic bacterium.
Yersinia is the bacillus that causes bubonic plague, a disease that killed 100 million people
in the 6th century A.D. One reason that Yersinia is such a deadly pathogen is that it
escapes the immune system by multiplying within cells. The problem was to find the
Yersinia genes that enable the bacterial cells to invade human cells. To do this an assay
was needed.
A test for bacterial invasion consists of a layer of mammalian tissue culture cells. The
bacteria are allowed to settle onto the cells for a while, then the bacteria that have not
entered the cells are killed with the antibiotic gentamicin, which can not cross the
membrane of tissue culture cells. The bacteria that have entered cells escape gentamicin
and can be recovered from the inside of the cells after the cells are lysed with detergent.
E. coli normally can not invade cells. The gene for invasion was found by transforming E.
coli with a library of Yersinia DNA and then selecting for E. coli that had invaded cells.
A single gene was found that encodes a surface protein known as invasin.
Lecture 16
Gene Regulation
We are now going to look at ways that genetics can be used to study gene regulation. The
issue is how cells adjust the expression of genes in response to different environmental
conditions. The principles of gene regulation were first worked out by Jacob and Monod
studying the E. coli genes required for cells to use the sugar lactose as a nutrient.
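[Figure: lactose (glucose-galactose) is taken up by LacY (permease) to give intracellular lactose, which is split by LacZ (β-galactosidase) into glucose + galactose]
[Figure: plot of log amount against time, showing cell mass and the amount of LacY or LacZ, with the time of lactose addition marked]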
The logic of the Lac operon is that the proteins required to use lactose are only made
when their substrate (lactose) is available. This prevents wasteful expression of
enzymes when their substrates are not available.
At first, scientists noted that lactose is both an inducer and substrate for the enzymes
of the Lac operon and they therefore concluded that lactose was somehow acting as a
template for the formation of the enzyme. Then compounds were discovered that could
act as inducers but were not themselves substrates for the Lac enzymes. The classic
example of such a “gratuitous inducer” is IPTG, which is an effective inducer of LacZ
expression but isn’t hydrolyzed by ß-galactosidase.
IPTG = galactose-S-CH(CH3)2
The existence of compounds such as IPTG shows that recognition of the inducer is a
separate molecular event from lactose breakdown.
The next major finding was the discovery of LacI– mutants. LacI– mutants are
constitutive, meaning that they always express β-galactosidase at high levels regardless
of whether there is an inducer present or not. LacI– mutants have apparently lost a
component of the machinery the cell uses to turn off β-galactosidase expression.
The regulatory system turns out to be quite simple and by isolation of mutants and simple
genetic tests Jacob and Monod were able to figure out the following scheme:
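[Figure: The Lac Operon. Gene order: LacI, promoter (P), operator (O), LacZ, LacY, LacA. LacI encodes the repressor protein, LacZ β-galactosidase and LacY permease. RNA polymerase is shown at the promoter, and the effect of adding inducer is indicated.]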
The idea is that the inducer has a net positive effect on expression because the inducer
is a negative regulator of the repressor, which is itself a negative regulator of the gene
for β-galactosidase.
We will now consider how regulatory mutants can be analyzed genetically. We will use as
examples different mutations in the Lac system but the genetic tests are very general
and can be applied to most regulatory systems.
Dominance test
                                  β-galactosidase
Genotype                          –IPTG   +IPTG   Interpretation
I+ Z+                               –       +
I– Z+                               +       +     I– is constitutive
I– Z+ / F’ I+ Z+                    –       +     I– is recessive
I+ Z–                               –       –     Z– is uninducible
I+ Z– / F’ I+ Z+                    –       +     Z– is recessive
I+ Z– / F’ I– Z+                    –       +     I– and Z– mutations complement each other,
                                                  i.e. the mutations are in different genes.
A second type of constitutive mutant inactivates the operator site and is known as a
LacOc mutation. LacOc mutations are dominant as revealed in tests of the appropriate
merodiploids:
Oc Z+ + + Oc is constitutive
Oc Z+ / F’ O+ Z+ + + Oc is dominant
You might think that on the basis of a dominance test we could tell whether we have a
LacOc or a LacI– mutation. However, life is not so simple, because it is possible to find
LacI– mutations that are dominant. Such mutations are known as LacI-d. They are
dominant because the repressor protein is a tetramer and LacI-d mutant subunits can
combine with normal subunits and interfere with their function.
I-d Z+ + + I-d is constitutive
I-d Z+ / F’ I+ Z+ + + I-d is dominant
We will now consider a new genetic test that will let us distinguish LacOc (operator
constitutive) from LacI-d (dominant repressor negative) mutations.
Cis/trans test
                                  β-galactosidase
Genotype                          –IPTG   +IPTG   Interpretation
I+ O+ Z+                            –       +
I-d Z+ / F’ I+ Z–  (cis)            +       +     I-d is dominant in cis or in trans with Z+;
I-d Z– / F’ I+ Z+  (trans)          +       +     therefore we say it is “trans-acting”.
Oc Z+ / F’ O+ Z–   (cis)            +       +     Oc is dominant only in cis with Z+;
Oc Z– / F’ O+ Z+   (trans)          –       +     therefore we say it is “cis-acting”.
If a mutation is cis-acting we take this as evidence that the mutation affects a site on
DNA like an operator. If a mutation is trans-acting we take this as evidence that the
mutation affects a diffusible gene product such as a repressor.
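The logic of the dominance and cis/trans tests can be captured in a small model. This is a sketch of the reasoning, not a description of the molecular details. It treats the repressor as a diffusible product shared by every DNA copy in the cell and the operator as a site that only affects the LacZ gene on the same copy. It also assumes that any I-d allele fully inactivates the repressor pool. The function and key names are invented for the example.

```python
def dna_copy(I="+", O="+", Z="+"):
    """One copy of the Lac region (chromosome or F'). I: '+', '-' or 'd'; O: '+' or 'c'."""
    return {"I": I, "O": O, "Z": Z}


def repressor_active(copies, inducer):
    """True if functional repressor is available to bind operators anywhere in the cell."""
    i_alleles = [c["I"] for c in copies]
    if "d" in i_alleles:
        return False  # I-d subunits combine with normal subunits and inactivate them
    # Wild type repressor works in trans, but only when no inducer is bound.
    return "+" in i_alleles and not inducer


def beta_gal(copies, inducer):
    """True if the cell makes beta-galactosidase."""
    blocked = repressor_active(copies, inducer)
    for c in copies:
        # The operator acts only on the copy that carries it: Oc can not be repressed.
        operator_open = c["O"] == "c" or not blocked
        if c["Z"] == "+" and operator_open:
            return True
    return False


def phenotype(*copies):
    """Return (expression without IPTG, expression with IPTG)."""
    return beta_gal(copies, inducer=False), beta_gal(copies, inducer=True)


print(phenotype(dna_copy(I="-"), dna_copy()))               # I- Z+ / F' I+ Z+
print(phenotype(dna_copy(O="c", Z="-"), dna_copy()))        # Oc Z- / F' O+ Z+
print(phenotype(dna_copy(I="d", Z="-"), dna_copy()))        # I-d Z- / F' I+ Z+
```

Because the repressor is modeled as shared and the operator as local, the model gives the trans-acting behavior of I-d and the cis-acting behavior of Oc without any extra rules.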
Lecture 17
Until now we have been considering mutations that lead to constitutive synthesis of
ß-galactosidase. It is also possible to get mutations that are uninducible. For example, a
mutation in the promoter (LacP–) is uninducible.
Genotype                          –IPTG   +IPTG   Interpretation
P– Z+                               –       –     P– is uninducible
P– Z+ / F’ P+ Z+                    –       +     P– is recessive
P– Z+ / F’ P+ Z– *                  –       –     P– is cis-acting
P– Z– / F’ P+ Z+                    –       +
*Note that this experiment can also be viewed as a complementation test that shows that
LacP– and LacZ– are mutations in the same gene. This fits with our primary definition of
a gene as the DNA segment needed to make a protein, since the promoter is certainly
needed for protein expression.
Promoter mutants in Lac operon can be distinguished from simple LacZ– mutations since
promoter mutations affect the LacY and LacA genes as well.
Is designates a “super repressor” which binds to the operator DNA but won’t bind
inducer.
Is Z+ – – Is is uninducible
Is Z+ / F’ I+ Z+ – – Is is dominant
Positive regulation.
Now we will consider how a different E. coli operon is regulated. The Mal operon encodes
several genes necessary to take up and degrade maltose, a disaccharide composed of two
glucose residues.
[Figure: maltose (glucose-glucose) is taken up by the maltose transport proteins to give intracellular maltose, which is converted by MalQ (amylomaltase) to glucose + glucose]
Much like the Lac operon, the products of the Mal operon are induced when maltose is
added to cells. Thus, maltose acts as an inducer.
[Figure: plot of log amount against time, showing cell mass and the amount of MalQ, with the time of maltose addition marked]
When mutants that affect the regulation of the Mal operon were isolated, the most
common type consisted of uninducible mutations in a gene known as MalT. We can apply
dominance tests and cis-trans tests to MalT mutations with the following results:
                                      maltase activity
Genotype                              –maltose  +maltose  Interpretation
Mal+                                     –         +      Maltose induces Mal operon
MalT–                                    –         –      MalT– is uninducible
MalT– / F’ MalT+                         –         +      MalT– is recessive
MalT– MalQ+ / F’ MalT+ MalQ–             –         +      MalT is trans-acting
MalT– MalQ– / F’ MalT+ MalQ+             –         +      (conclusion from these two rows)
From this table it looks as if the MalT– trait is not expressed either in cis or in trans.
Because MalT– is recessive, it makes more sense to consider the properties of the
dominant MalT+ allele in the cis/trans test. Viewed in this way, the MalT+ trait is
expressed in both cis and trans and therefore MalT is considered to be trans-acting.
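The logic of the MalT table can be sketched in a few lines of Python. This is only an illustrative model under simple assumptions that are not spelled out in the notes: each DNA molecule (chromosome or F’) is reduced to a pair of booleans, MalT protein is treated as freely diffusible, and the names `malq_activity`, `STRAINS`, and `row` are invented for this sketch.

```python
def malq_activity(copies, maltose):
    """Return True if maltase (MalQ) activity is expected.

    copies:  one (malT_ok, malQ_ok) pair per DNA molecule in the cell
             (the chromosome, plus the F' if present).
    maltose: True if the inducer is present.
    """
    # MalT protein is diffusible, so one functional MalT gene anywhere
    # in the cell can serve every copy of the operon (trans-acting).
    activator_available = any(malT_ok for malT_ok, _ in copies)
    # MalQ enzyme can only come from a copy with a functional MalQ gene.
    malq_gene_available = any(malQ_ok for _, malQ_ok in copies)
    return maltose and activator_available and malq_gene_available


# Genotypes from the table, written as (MalT functional?, MalQ functional?).
STRAINS = {
    "Mal+": [(True, True)],
    "MalT-": [(False, True)],
    "MalT- / F' MalT+": [(False, True), (True, True)],
    "MalT- MalQ+ / F' MalT+ MalQ-": [(False, True), (True, False)],
    "MalT- MalQ- / F' MalT+ MalQ+": [(False, False), (True, True)],
}


def row(copies):
    """Return the (-maltose, +maltose) activity pair as '+' / '-' symbols."""
    return tuple("+" if malq_activity(copies, m) else "-" for m in (False, True))


if __name__ == "__main__":
    for name, copies in STRAINS.items():
        print(f"{name:32s} {row(copies)}")
```

Because the activator is modeled as diffusible, the MalT+ allele rescues MalQ expression whether it sits on the same DNA molecule as MalQ+ or on the other one, which is the trans-acting behavior described above.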
This behavior is different from any of the Lac mutations that we have discussed. The
interpretation is that MalT encodes a diffusible gene product (not a site on DNA) that is
required for activation of transcription of the Mal operon. This type of gene is usually
called an activator. As shown in the diagram below, maltose binds to the MalT activator
protein causing a conformational change in MalT allowing it to bind near to the promoter
and to stimulate transcription. Note that the genes required for maltose uptake are
located in an operon elsewhere on the chromosome, but these genes are also regulated by
MalT.
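The Mal Operon
[Figure: the MalT gene encodes the activator protein. With the inducer (maltose), the
activator binds the initiator site next to the promoter (P), where RNA polymerase
transcribes MalP (maltodextrin phosphorylase) and MalQ (amylomaltase).]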
This model requires a site, called the initiator, which is where the activator binds near
the promoter to activate transcription. If you think about how mutations in an initiator
site should behave in dominance and cis/trans tests, you will see why in practice it is
difficult to distinguish initiator site mutations from promoter mutations.
It is also possible to isolate “super activator” mutants that will bind to the initiator site
and activate transcription regardless of whether the inducer maltose is present. Such
alleles of the MalT gene are called MalTc and their properties are given below.
                                    –maltose   +maltose   Interpretation
MalTc                                  +          +       MalTc is constitutive
MalTc / F’ MalT+                       +          +       MalTc is dominant
MalTc MalQ+ / F’ MalT+ MalQ–           +          +       MalTc is trans-acting (last two rows)
MalTc MalQ– / F’ MalT+ MalQ+           +          +
For a multimeric activator it should also be possible to isolate activator-d mutants that
will interfere with the binding of wild-type subunits to the initiator site. Actually, MalT-d
mutants have not been isolated, probably because MalT is a monomer.
Lecture 18
In the preceding examples of bacterial gene regulation, we have used known regulatory
mechanisms to see how mutations in different elements of the system would behave in
dominance tests and cis/trans tests. However, one is often trying to learn about a new
operon and is therefore faced with the problem of deducing mechanism from the
behavior of mutants.
The steps to analyzing a new operon are as follows:
1) Isolate mutants that affect regulation. These could be either constitutive or
uninducible. The most common regulatory mutations are recessive loss-of-function
mutations in trans-acting factors. This is because there are usually many more ways to
disrupt the function of a gene than there are ways to make a dominant mutation. Promoter,
operator, and initiator sites are usually much shorter than genes encoding proteins and
these sites present much smaller targets for mutation.
2) Check to see whether the mutation is recessive and trans-acting (most will be).
If the mutation is constitutive then it is likely in the gene for a repressor.
If the mutation is uninducible then it is likely in the gene for an activator.
[Diagrams: repressor –| enzyme (negative regulation, –); activator → enzyme (positive
regulation, +).]
Although loss of function mutations in genes for repressors or activators are generally
the most common type of regulatory mutation, the table below will help you to interpret
mutations in sites or more complicated mutations in proteins.
Type of Mutation                    Phenotype      Dominant/Recessive   Cis/Trans-acting
repressor–                          constitutive   recessive            trans-acting
activator–                          uninducible    recessive            trans-acting
operator–                           constitutive   dominant             cis-acting
promoter–                           uninducible    recessive            cis-acting
repressor-d or activator-s          constitutive   dominant             trans-acting
repressor-s or activator-d          uninducible    dominant             trans-acting
(-s denotes a "super" repressor or "super" activator allele)
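As a study aid, the summary table can be written as a lookup from the three test results to the candidate mutation types. This is a minimal sketch: the dictionary simply transcribes the table, and the names `MUTATION_TABLE` and `classify`, along with the plain-text labels such as "activator-s" for a super activator, are choices made for this example.

```python
# Keys: (phenotype, dominance test, cis/trans test).
# Values: mutation types consistent with that combination in the summary table.
MUTATION_TABLE = {
    ("constitutive", "recessive", "trans"): ["repressor-"],
    ("uninducible", "recessive", "trans"): ["activator-"],
    ("constitutive", "dominant", "cis"): ["operator-"],
    ("uninducible", "recessive", "cis"): ["promoter-"],
    ("constitutive", "dominant", "trans"): ["repressor-d", "activator-s"],
    ("uninducible", "dominant", "trans"): ["repressor-s", "activator-d"],
}


def classify(phenotype, dominance, action):
    """Return candidate mutation types; an empty list means 'not in the table'."""
    return MUTATION_TABLE.get((phenotype, dominance, action), [])


if __name__ == "__main__":
    # A MalT- style mutant: uninducible, recessive, trans-acting.
    print(classify("uninducible", "recessive", "trans"))
    # A MalTc style mutant: constitutive, dominant, trans-acting.
    print(classify("constitutive", "dominant", "trans"))
```

A combination that returns more than one candidate (for example constitutive, dominant, trans-acting) cannot be resolved by these three tests alone and needs further information about the system.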
Regulatory Pathways
So far we have been considering simple regulatory systems with either a single repressor
(Lac) or a single activator (Mal). Often genes are regulated by a more complicated set of
regulatory steps, which together can be thought of as a regulatory pathway. Although
there are good methods that can be used to determine the order of steps in a regulatory
pathway (as will be discussed shortly), it is usually difficult at first to tell whether a
given component identified by mutation is acting directly on the DNA of the regulated
gene or whether it is acting at a step upstream in a regulatory pathway. For example, it
will often be the case that a recessive trans-acting mutation that causes constitutive
expression lies not in the gene for an actual repressor protein, but in the gene for a
protein acting upstream in a regulatory pathway in such a way that the net effect of this
protein is to cause repression of gene function. The best way to represent this situation
is to call the gene product a negative regulator and to reserve the term repressor for
cases in which we know that the protein actually shuts off transcription directly by
binding to an operator site. Similarly, it is best to refer to a gene defined by a
recessive, trans-acting mutation that causes uninducible expression as a positive activator
until more specific information can be obtained about whether or not the gene product
directly activates transcription. The diagrams to be used are shown below.
[Diagrams: negative regulator –| enzyme (–); positive activator → enzyme (+).]
An important note about interpreting such diagrams is that the arrow or blocking symbol
does not necessarily imply direct physical interaction, simply that the negative regulator
or positive activator has a net negative or positive effect, respectively, on gene expression.
Ordering gene functions in a regulatory pathway
Imagine that we are studying the regulation of an enzyme and we find a recessive, trans-
acting mutation in gene A, that gives uninducible enzyme expression. The simplest
interpretation is that gene A is a positive activator of the enzyme:
Model 1
[Diagram: A → enzyme (+).]
Now, say that we find a recessive, trans-acting mutation in gene B that gives constitutive
enzyme expression. The following model takes into account the behavior of mutations in A
and B:
Model 2
[Diagram: A –| B –| enzyme (two negative steps).]
The idea is that the gene for the enzyme is negatively regulated by gene B which in turn
is negatively regulated by gene A. The net outcome is still a positive effect of gene A on
enzyme expression. To distinguish the two models we will need more mutations.
However, we can also modify Model 1 as shown below to fit the new data.
Model 1 (revised)
[Diagram: B –| A → enzyme (B negatively regulates A, which positively regulates the enzyme).]
The best way to distinguish the two possible models is to test the phenotype of a double
mutant. In one case the A– B– double mutant is predicted to be uninducible and in the
other case it is predicted to be constitutive.
              Model 1        Model 2
A– B–         uninducible    constitutive
This experiment represents a powerful form of genetic analysis known as an epistasis
test. In the example above, if the double mutant were constitutive we would say that the
mutation B– is epistatic to A–. Such a test allows us to determine the order in which
different functions in a regulatory pathway act. If the double mutant in the example
were constitutive, we would deduce that gene B functions after gene A in the regulatory
pathway. To perform an epistasis test, it is necessary that the different mutations
under examination produce opposite phenotypic consequences. When the double mutant
is constructed, its phenotype will be that of the function that acts later in the pathway.
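The double-mutant predictions can be checked with a small boolean sketch of the two pathways. Several details here are assumptions added for illustration and are not stated in the notes: the inducing signal is taken to enter at the top of each pathway (inactivating B in the revised Model 1, activating A in Model 2), and the function names `model1`, `model2`, and `phenotype` are hypothetical.

```python
def model1(a_ok, b_ok, inducer):
    """Revised Model 1: B --| A --> enzyme. Assumed: the inducer inactivates B."""
    b_active = b_ok and not inducer
    a_active = a_ok and not b_active
    return a_active  # the enzyme is expressed only when A is active


def model2(a_ok, b_ok, inducer):
    """Model 2: A --| B --| enzyme. Assumed: the inducer activates A."""
    a_active = a_ok and inducer
    b_active = b_ok and not a_active
    return not b_active  # the enzyme is expressed whenever B is inactive


def phenotype(model, a_ok=True, b_ok=True):
    """Summarize expression without and with inducer as a phenotype name."""
    without, with_inducer = model(a_ok, b_ok, False), model(a_ok, b_ok, True)
    if without and with_inducer:
        return "constitutive"
    if not without and not with_inducer:
        return "uninducible"
    if with_inducer:
        return "inducible"
    return "expressed only without inducer"


if __name__ == "__main__":
    for name, model in (("Model 1", model1), ("Model 2", model2)):
        print(name,
              "| A-:", phenotype(model, a_ok=False),
              "| B-:", phenotype(model, b_ok=False),
              "| A- B-:", phenotype(model, a_ok=False, b_ok=False))
```

Both models give the same single-mutant phenotypes (A– uninducible, B– constitutive) and differ only in the double mutant, which takes on the phenotype of whichever gene acts later in the pathway.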
Epistasis tests are of very general utility. If the requirement that two mutations have
opposite phenotypes is met, almost any type of hierarchical relationship between
elements in a regulatory pathway can be worked out. For example, the LacOc mutation is
in a site, not a gene, but it is still possible to perform an epistasis test between LacOc
and LacIs since these mutations satisfy the basic requirement for an epistasis test: one
mutation is uninducible while the other is constitutive for Lac gene expression. When the
actual double mutant, LacOc LacIs, is evaluated it is constitutive (this makes sense given
what we know about the Lac operon since a defective operator site that prevents
repressor binding should allow constitutive expression regardless of the form of the
repressor protein). Formally, this result shows that a mutation in LacO is epistatic to a
mutation in LacI. Even if we did not know the details of Lac operon regulation
beforehand, this epistasis test would allow us to deduce that the operator functions at a
later step than the repressor.
Stable regulatory circuits

We have been considering enzymes that are regulated in response to the availability of
nutrients. There is another general type of regulation whereby genes can be held in
stable on or off states. In development of multicellular organisms all cells (except for
the germ cells and cells of the immune system) have the same genotype yet cells in
different tissues express different sets of genes. Cell-type specification is in part a
program of gene transcription that is established by extracellular signals. In most cases,
after the cell type has been specified the cells do not readily change back when the
signals are removed. This general behavior of cells in development implies the existence
of stable regulatory states for gene control.
The best understood case of a stable switch is the lysis vs. lysogeny decision made by
phage λ. When phage λ infects cells there are two different developmental fates of the
phage.
1) In the lytic program the phage replicates its DNA, makes heads and tails, packages DNA,
and lyses host cells.
2) In the lysogenic program the phage integrates its DNA and shuts down phage genes. The
resulting quiescent phage integrated into the genome is known as a lysogen.
The decision between these two options must be made in a committed way so the proper
functions act in concert. The switch in the case of phage λ hinges on the activity of two
repressor genes, cI and cro. The cI and cro genes have mutually antagonistic regulatory
interactions that can be diagrammed as follows:
[Diagram: cI, cro, and the lytic genes, with cI and cro linked by mutually negative (–)
interactions.]
After an initial unstable period immediately after infection, either cro expression or cI
expression will dominate.
Mode 1: High cro expression blocks cI expression. In this state, all of the genes for lytic
growth are made and the phage enters the lytic program.
Mode 2: High cI expression blocks cro expression. In this state, none of the genes
except for cI are expressed. This produces a stable lysogen.
In gene regulation, as in good circuit design, stability is achieved by feedback. The result
is a bi-stable switch that is similar to a “flip-flop”, one of the basic elements of digital
electronic circuits.
Other genes participate in the initial period to bias the decision to one mode or the
other. These genes act so that the lytic mode is favored when E. coli is growing well and
there are few phage per infected cell, whereas the lysogenic mode is favored when cells
are growing poorly and there are many phage per infected cell.
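The flip-flop behavior of two mutually repressing genes can be illustrated with a toy numerical model. This is a generic mutual-repression sketch, not a quantitative model of phage λ: the rate equations, the parameter values (`alpha`, `n`), the simple Euler integration, and the function name `simulate` are all assumptions chosen for illustration.

```python
def simulate(cI, cro, alpha=4.0, n=2, dt=0.01, steps=5000):
    """Integrate a toy mutual-repression circuit with Euler steps.

    Each product is synthesized at a rate that falls as the other product
    rises, and is lost at a rate proportional to its own level:
        d(cI)/dt  = alpha / (1 + cro**n) - cI
        d(cro)/dt = alpha / (1 + cI**n)  - cro
    Returns the final (cI, cro) levels in arbitrary units.
    """
    for _ in range(steps):
        d_cI = alpha / (1.0 + cro ** n) - cI
        d_cro = alpha / (1.0 + cI ** n) - cro
        cI += dt * d_cI
        cro += dt * d_cro
    return cI, cro


if __name__ == "__main__":
    # A small initial bias toward cI ends in a high-cI, low-cro state ...
    print("cI-biased start :", simulate(cI=2.0, cro=0.1))
    # ... and the mirror-image bias ends in the opposite state.
    print("cro-biased start:", simulate(cI=0.1, cro=2.0))
```

With these assumed parameters the circuit settles into one of two stable states depending only on which product had the early advantage, which is the sense in which the initial unstable period biases a committed decision.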
Lecture 19
EUKARYOTIC GENES AND GENOMES I
For the last several lectures we have been looking at how one can
manipulate prokaryotic genomes and how prokaryotic genes are regulated. In
the next several lectures we will be considering eukaryotic genes and genomes,
and considering how model eukaryotic organisms are used to study eukaryotic
gene function. During the course of the next six lectures we will think about
genes and genomes of some commonly used model organisms, the yeast
Saccharomyces cerevisiae and the mouse Mus musculus. But first let’s look at how
the genes and genomes of these organisms compare to E. coli at one extreme,
and humans at the other.
genome = DNA content of a complete haploid set of chromosomes
       = DNA content of a gamete (sperm or egg)
                                      DNA content/   year sequence                genes/     genes have
Species           Chromosomes   cM    haploid (Mb)   completed                    haploid    introns?
E. coli                1        N/A        5         1997                          4,200     no
S. cerevisiae         16        4000      12         1997                          5,800     rarely
C. elegans             6        300      100         1998                         19,000     nearly all
D. melanogaster        4        280      180         2000                         14,000     nearly all
M. musculus           20        1700    3000         2002 draft, 2005 finished?   22,500?    nearly all
H. sapiens            23        3300    3000         2001 draft, 2003 finished    22,500?    nearly all
Note: cM = centimorgan = 1% recombination
Mb = megabase = 1 million base-pairs of DNA
Kb = kilobase = 1 thousand base-pairs of DNA
Let’s think about the number of genes in an organism and the size of the
organism’s genome. The average protein is about 300 amino acids long,
requiring 300 triplet codons, or roughly 1 Kb of DNA. Thus it makes sense that to
encode 4,200 genes E. coli requires a genome of 5 million base pairs. However,
the human genome encodes about 22,500 proteins, and this should require a
genome of, let’s say, 25 million base pairs. Instead, humans have a genome that
is ~ 3000 million base pairs, or ~ 3,000 Mb, i.e., ~ 3 billion base pairs. In other
words, there is about 100-fold more DNA in the human genome than is required
for encoding 22,500 proteins. What is it all doing? Some of it constitutes
promoters upstream of each gene, some is structural DNA around centromeres
and telomeres (the ends of chromosomes), some is simply intergenic regions
(non-coding regions between genes), but much of it is present as introns.
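The back-of-the-envelope arithmetic in this paragraph can be reproduced directly. The sketch below uses only the round numbers quoted above (about 1 Kb of coding DNA per gene, 4,200 and 22,500 genes, and a 3,000 Mb human genome); the function name `coding_mb` is chosen for this example.

```python
BP_PER_GENE = 1_000  # roughly 300 codons per average protein, rounded up to 1 Kb


def coding_mb(n_genes, bp_per_gene=BP_PER_GENE):
    """Estimate the DNA (in Mb) needed just to encode n_genes proteins."""
    return n_genes * bp_per_gene / 1e6


ecoli_needed = coding_mb(4_200)    # compare with the ~5 Mb E. coli genome
human_needed = coding_mb(22_500)   # compare with the ~3,000 Mb human genome
human_excess = 3_000 / human_needed

if __name__ == "__main__":
    print(f"E. coli coding estimate: {ecoli_needed:.1f} Mb of a 5 Mb genome")
    print(f"Human coding estimate:   {human_needed:.1f} Mb of a 3,000 Mb genome")
    print(f"Human genome / coding estimate: about {human_excess:.0f}-fold")
```

The ratio comes out somewhat above 100 because the text rounds the coding requirement up to 25 Mb; either way the conclusion is the same order of magnitude.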
What does it mean that “Genes Have Introns”? This represents one of the
fundamental organizational differences between prokaryotic and eukaryotic
genes. Eukaryotic genes turn out to be interrupted with long DNA sequences
that do not encode protein…these “intervening sequences” are called introns.
The DNA segments that are ultimately expressed as protein, i.e., the DNA
sequence that contains triplet codon information, are called exons. The
intronic sequences are removed from the primary transcript by splicing.
[Figure: a gene on the chromosome (ds DNA) with exons 1, 2, 3 separated by introns is
transcribed into a primary transcript (ss RNA). Addition of a 5’ cap, 3’ polyadenylation,
and splicing out of introns give the mRNA (ss RNA: MeG cap, exons 1 2 3 carrying the AUG
and stop codons, AAAAA tail), which is translated into protein (amino acids).]
A major consequence of this arrangement is the potential for alternative
splicing to produce different protein species from the same gene and primary
transcript. This gives the potential for tremendous amplification of the
complexity of mammals (and other eukaryotes) through many more thousands
of possible proteins.
Note that lower eukaryotes such as the yeast S. cerevisiae have only ~ 5% of
their genes interrupted by introns, but in multicellular organisms like humans
>90% of all genes are interrupted by anywhere between 2 and 60 introns; most
genes have between 5 and 12 introns.
[Figure by MIT OCW: gene maps of genomic regions from three organisms, each drawn on a
0 to 50 scale. Saccharomyces cerevisiae: RGD2, FET5, TUB2, RP041, HAC1, STE2, SEC53, ACT1,
MOB2, RIM15, CAK1, BST1, EPL1, YPT1, RPL22B, CAF16, GYP8, plus YFL046W, YFL044C, YFL042C,
YFL040W, YFL034W, YFL030W. Drosophila melanogaster: syt, CG3131, CG15400, CG16987, CG2964,
CG3123. Human: GATA1, HDAC6, LOC139168, PCSK1N.]

Gene Regulation in Yeast
In the next few lectures we will consider how eukaryotic genes and genomes can
be manipulated and studied, and we will begin with an example of examining
how genes are regulated in S. cerevisiae. First, let’s figure out how to use some
neat genetics to identify some regulated genes, and in the next lecture we will
figure out how one can use genetics to dissect the mechanism of that regulation.
Characterizing function and regulation of S. cerevisiae genes: We are
going to combine a few neat genetic tools that you learned about in Prof. Kaiser’s
lectures for this, namely a library of yeast genomic fragments cloned into a
bacterial plasmid, a modified transposon (mini-Tn7), and the lacZ gene
embedded within the transposon. In this experiment the lacZ gene is going to
be used as a reporter for transcriptional activity of yeast genes.
[Figure: the mini-Tn7 transposon, shown in E. coli and in yeast (inserted in yeast genomic
DNA): Tn7TR lacZ URA3 tet Tn7TR. Tn7TR: required for transposition; lacZ: reporter of
transcription; URA3: selection in yeast; tet: selection in E. coli.]
The mini-Tn7 is introduced into a population of E. coli that harbor a plasmid library of
the S. cerevisiae genome; i.e., each E. coli cell is home to a plasmid that contains a
different segment of the S. cerevisiae genome, such that the whole genome is represented
many times over in this population of E. coli. The mini-Tn7 is allowed to transpose by
integrating into either the plasmid DNA or the bacterial DNA; the original DNA that
carries the mini-Tn7 cannot replicate, but cells that have integrated the mini-Tn7 into
the plasmid or E. coli chromosome are selected as tetracycline-resistant colonies.
Plasmid DNA is purified from these transformants and retransformed into
tetracycline-sensitive E. coli; the resulting tetracycline-resistant bacteria harbor
only plasmids that have an integrated mini-Tn7 transposon. Plasmid is isolated
from these cells and the yeast genomic fragments are isolated by digestion with
an appropriate restriction enzyme.
[Figure: in E. coli, the Tn7 donor plus the yeast genomic plasmid library give a random
yeast insertion library.]
So now we have a library of yeast genomic fragments, each of which has the
transposon inserted; these genomic fragments can be transformed into S.
cerevisiae cells that are ura3-. Each Ura+ transformant colony will have
recombined a Tn7 transposon-containing genomic DNA fragment into its genome. This
essentially gives us a library of yeast strains with transposons randomly
integrated into the genome.
Note that the lacZ gene in the transposon does not carry its own transcription or
translation start site, but if the transposon inserts in the correct orientation
downstream of a yeast gene promoter, and in the correct triplet codon reading frame, the
lacZ gene comes under the control of that promoter and when transcription is activated
from that promoter a LacZ-fusion protein is expressed, and most LacZ-fusion proteins
display robust β-galactosidase activity.
[Figure: the transposon (Tn7TR lacZ URA3 tet Tn7TR) inserted downstream of the promoter of
gene X in either of the two possible orientations.]
• One in two insertions will be in the incorrect orientation and will not produce a
LacZ-fusion protein.
• Only one in three correct-orientation insertions can produce a LacZ-fusion protein.
• At most, only one in six insertions produce a functional LacZ-fusion protein.
Yeast cells expressing β-galactosidase activity can easily be detected by growth
in the presence of 5-bromo-4-chloro-3-indolyl-beta-D-galactopyranoside,
better known as X-gal. LacZ cleaves X-gal to release a chemical moiety that has
a brilliant blue color…and so the colonies turn bright blue!
[Figure: gene X with its promoter, transcription start, translation start, and
transcription stop, carrying a Tn7TR lacZ URA3 insertion. The mRNA begins with the gene X
AUG, and the fusion protein (N- to -C) consists of gene X encoded amino acids, mini-Tn7
encoded amino acids, and LacZ encoded amino acids. The fusion protein has β-galactosidase
activity.]
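The one-in-six estimate quoted with the figure follows from counting the equally likely combinations of insertion orientation and reading frame. The sketch below makes that counting explicit. It assumes, as the estimate itself does, that every orientation and frame is equally likely and that the insertion lies within the coding region of an expressed gene. The variable names are chosen for this example.

```python
from fractions import Fraction
from itertools import product

# Two possible orientations of lacZ relative to the disrupted gene.
orientations = ["same as gene", "opposite to gene"]
# Three possible reading-frame offsets of lacZ relative to the gene's codons.
frame_offsets = [0, 1, 2]

outcomes = list(product(orientations, frame_offsets))

# A fusion protein requires the correct orientation AND the correct frame.
productive = [(o, f) for o, f in outcomes if o == "same as gene" and f == 0]

p_orientation = Fraction(1, len(orientations))
p_frame = Fraction(1, len(frame_offsets))
p_fusion = Fraction(len(productive), len(outcomes))

if __name__ == "__main__":
    print("P(correct orientation)      =", p_orientation)
    print("P(correct frame)            =", p_frame)
    print("P(functional LacZ fusion) <=", p_fusion)
```

This is an upper bound ("at most" in the figure) because insertions outside coding regions, or into genes that are not transcribed under the test conditions, yield no β-galactosidase activity at all.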
There are at least two useful things to come out of such a collection of yeast
strains:
(1) Any transposon that integrated into a gene will essentially disrupt that
gene and is likely to cause a null mutation.
(2) For transposons that integrate into a yeast gene such that the lacZ gene
is in frame with the gene’s coding region, the level of β-galactosidase
activity in these cells becomes a reporter for the transcription
of that gene.
Here are just two examples of how such a library can be used: (1) to
identify genes that protect cells against a DNA damaging agent that causes
cancer (let’s take the example of one of the many compounds found in
tobacco smoke); and (2) to identify genes whose transcription is up-regulated in
response to exposure to this tobacco smoke chemical.
The chemical we’ll use as an example is 4-(methylnitrosamino)-1-(3-pyridyl)-1-
butanone (NNK). The yeast random insertion library
is first plated out so that individual cells give rise to a
colony; these colonies are then replicated onto test
plates. To screen the library for genes that protect
against the cell killing that can be induced by NNK, the
colonies are replica plated onto agar medium that either does or does not contain
a high dose of NNK. To screen the library for genes that are transcriptionally
regulated in the presence of this nasty carcinogenic compound, the colonies are
replica plated onto agar medium containing either X-gal alone or X-gal plus a low
dose of NNK.
[Figure: two replica-plating screens of the random library of Tn7lacZ insertion mutants.
Phenotypic screen for NNK sensitivity: plates minus NNK and plus NNK (high dose), with an
NNK-sensitive strain marked. Screen for NNK-regulated genes: plates with X-Gal alone and
with X-Gal + NNK (low dose).]
Interesting colonies can be retrieved from the master plate for further study and
for identification (and subsequent cloning) of the gene responsible for the
interesting phenotype.
Once we have identified a gene that is transcriptionally up or down regulated in
response to an environmental change, how can we use genetics to figure out
how regulation is achieved? This is the topic of the next lecture.
LECTURE 20
EUKARYOTIC GENES AND GENOMES II
In the last lecture we considered the structure of genes in eukaryotic
organisms and went on to figure out a way to identify S. cerevisiae genes that
are transcriptionally regulated in response to a change in environment. The
ability to regulate gene expression in response to environmental cues is a
fundamental requirement for all living cells, both prokaryote and eukaryote.
We considered how many genes each organism has, about 4,000 for E. coli,
6,000 for yeast and a little over 20,000 for mouse and humans. But only a
subset of these genes is actually expressed at any one time in any particular
cell. For multicellular organisms this becomes even more apparent…it is
obvious that skin cells must be expressing a different set of genes than liver
cells, although of course there must be a common set of genes that are
expressed in both cell types; these are often called housekeeping genes.
There are a number of ways that gene regulation in eukaryotes differs from
gene regulation in prokaryotes.
• Eukaryotic genes are not organized into operons.
• Eukaryotic regulatory genes are not usually linked to the genes they
regulate.
• Some of the regulatory proteins must ultimately be
compartmentalized to the nucleus, even when signaling begins at
the cell membrane or in the cytoplasm.
• Eukaryotic DNA is wrapped around nucleosomes.
Today we will consider how one can use genetics to begin to dissect the
mechanisms by which gene transcription can be regulated. For this we will
take the example of the yeast GAL genes in S. cerevisiae.
GALACTOSE METABOLISM IN YEAST

Reaction                                            Enzyme                       Gene
D-galactose → D-galactose-1-phosphate               Galactokinase                GAL1
D-galactose-1-phosphate → UDP-D-galactose           Galactose transferase        GAL7
UDP-D-galactose → UDP-D-glucose                     Galactose epimerase          GAL10
UDP-D-glucose → D-glucose-1-phosphate               UDP-glucose phosphorylase
D-glucose-1-phosphate → D-glucose-6-phosphate       Phosphoglucomutase
D-glucose-6-phosphate → GLYCOLYSIS
GAL1, GAL7, and GAL10 transcription are all induced in the
presence of galactose. How is this achieved?
Once a gene has been identified as being inducible under certain
inducing conditions, in this case in the presence of galactose, we can begin to
dissect the regulatory mechanism by isolating mutants; i.e., mutants that
constitutively express the GAL genes even in the absence of galactose, and
mutants that have lost the ability to induce the GAL genes in the presence of
galactose. If we were studying galactose regulation today we would probably
use a lacZ reporter system as we discussed in the last lecture. However,
when the Gal regulatory system was first genetically dissected, it was done by
actually measuring the induction of Gal1 encoded galactokinase activity, so this
is how we will discuss the genetic dissection of the system.
[Figure: mutagenized GAL1::Tn7lacZ fusion strain grown on glycerol + X-Gal and on
glycerol + galactose + X-Gal, with constitutive and uninducible colonies marked. Another
approach is to simply measure galactokinase activity in the presence or absence of
galactose.]
What we know is that Gal4 mutants are uninducible and that Gal80 and
Gal81 mutants constitutively express the Gal1 galactokinase gene, along
with the other Gal genes. Let’s analyze each mutant in turn:
Gal4 mutant: It was first established that, like Gal1-, the Gal4- mutant
phenotype is recessive, because heterozygous diploids generated by mating
Gal4- to wild type have normal regulation. It was then established that the
mutation in the Gal4- strain lies in a
new gene, and not simply in the GAL1
galactokinase gene; Gal1- mutants
don’t express galactokinase activity in
the presence of galactose, just as was
seen for the Gal4- mutant. That
Gal1- and Gal4- mutants have
mutations in different genes was
shown by complementation analysis,
(diploids from mating Matα Gal4-
with Mata Gal1- behave like wild
type) and the fact that the GAL4 and
GAL1 genes are unlinked was established by tetrad analysis. You should
think about what the tetrads from the aforementioned diploids would look like.
Put together, the simplest model is that Gal4 is a positive regulator of Gal1 (and the
other Gal genes). The + sign indicates that Gal4 increases Gal expression, but does not
indicate whether this is direct or indirect.
Gal80 mutant: The next useful regulatory mutant isolated was Gal80-, in which the Gal1
encoded galactokinase is expressed even in the absence of galactose and is not further
induced in its presence. Again, heterozygous diploids (Gal80-/wt) showed that Gal80- is
recessive, and tetrad analysis showed that Gal80 is not linked to Gal1, Gal4 or any of
the Gal genes.
If a mutant Gal80 results in constitutive Gal1 expression, the simplest model
is that Gal80 negatively regulates the Gal genes. Since Gal4 positively
regulates, and Gal80 negatively regulates, Gal1 expression, we have to figure
out how these two gene products work together to achieve such regulation.
Assuming that Gal4 and Gal80 act in series there are two formal possibilities:
Model 1 is that Gal4 positively regulates Gal1, and that Gal80 negatively regulates Gal4;
the presence of galactose somehow inhibits Gal80 function, thus releasing Gal4 to
positively activate Gal1 expression.
Model 2 is that Gal80 negatively regulates Gal1, and Gal4 negatively regulates Gal80;
here the presence of galactose positively activates Gal4, which in turn negatively
regulates Gal80, thus relieving inhibition of Gal1 expression.
We can distinguish between these two models by doing what’s called an
epistasis test to establish the epistatic relationship between Gal4- and
Gal80-. This involves making a double Gal4- / Gal80- mutant strain. The
phenotype of the double mutant will indicate which of the two models is most
likely to be true….take a look at the two models to predict what phenotype the
double mutant should have. For Model 1 the double mutant would become
uninducible, for Model 2 the double mutant should be constitutive.
We could make the Gal4- / Gal80- double mutant strain using molecular
biological approaches…but an easier way is to let yeast meiosis do the job for
you. If we mate the Gal4- / Gal80+ haploid strain with the Gal4+ / Gal80-
haploid strain (we know these two genes are unlinked) we should obtain
double mutants among the tetratype and non-parental ditype tetrads that
result from this cross.
Parental Ditype     Tetratype           Nonparental Ditype
Gal4- Gal80+        Gal4- Gal80+        Gal4- Gal80-
Gal4- Gal80+        Gal4- Gal80-        Gal4- Gal80-
Gal4+ Gal80-        Gal4+ Gal80-        Gal4+ Gal80+
Gal4+ Gal80-        Gal4+ Gal80+        Gal4+ Gal80+
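To see where the double mutants come from, the three tetrad classes can be scored programmatically. This is a minimal sketch for the specific cross Gal4- Gal80+ × Gal4+ Gal80-. Spores are written as (Gal4, Gal80) pairs of "+" or "-", and the names `PARENTAL`, `classify_tetrad`, and `double_mutants` are invented for this example.

```python
# Spore genotypes are (Gal4 allele, Gal80 allele).
PARENTAL = {("-", "+"), ("+", "-")}  # the two haploid parents of the cross


def classify_tetrad(spores):
    """Classify a four-spore tetrad as PD, NPD, or T (tetratype)."""
    n_parental = sum(1 for spore in spores if spore in PARENTAL)
    if n_parental == 4:
        return "PD"   # parental ditype: only parental genotypes
    if n_parental == 0:
        return "NPD"  # nonparental ditype: only recombinant genotypes
    return "T"        # tetratype: two parental and two recombinant spores


def double_mutants(spores):
    """Count Gal4- Gal80- spores in a tetrad."""
    return sum(1 for spore in spores if spore == ("-", "-"))


# The three columns of the table above.
TETRADS = {
    "parental ditype": [("-", "+"), ("-", "+"), ("+", "-"), ("+", "-")],
    "tetratype": [("-", "+"), ("-", "-"), ("+", "-"), ("+", "+")],
    "nonparental ditype": [("-", "-"), ("-", "-"), ("+", "+"), ("+", "+")],
}

if __name__ == "__main__":
    for name, spores in TETRADS.items():
        print(f"{name:20s} {classify_tetrad(spores):4s}"
              f" double mutants: {double_mutants(spores)}")
```

Tetratypes contribute one double-mutant spore and nonparental ditypes contribute two, while parental ditypes contribute none, which is why the double mutants are recovered from the T and NPD tetrads.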
These results clearly support Model 1, i.e., because the double mutant is
uninducible rather than constitutive, Gal4 likely behaves as a positive
activator of Gal1 expression, and in the absence of galactose Gal80 somehow
prevents Gal4 from activating Gal1 expression. When galactose is present
Gal80 can no longer prevent Gal4 from activating Gal1 expression.
Now let’s consider a new class of mutant that turned out to be quite
informative, Gal81-. Gal81- mutants, like Gal80- mutants, are constitutive
for Gal1 expression, but unlike Gal80-, Gal81- is dominant. (Gal81-/ Gal80-
diploids are constitutive).
An obvious question is whether Gal81- mutants are still constitutive in a
Gal4- background, since it was already established that Gal4 positively
regulates Gal1 (and the other Gal genes).
Mata Gal81- Gal4+ X Matα Gal4- Gal81+
The surprising finding was that from this cross all the tetrads were of the
parental ditype; in other words there were no tetratypes or nonparental
ditypes, indicating that Gal81- and Gal4- are very tightly linked. Indeed, it
turns out that the Gal81- mutation maps to the Gal4 gene, in the coding
region. The Gal81- mutation was redesignated as Gal481. Essentially Gal481
behaves as a super-activator that is impervious to the negative effects of
Gal80; Gal481 thus activates independently of galactose and Gal80.
So, how do all these genetic facts fit into a molecular model? Upstream of the
Gal1 gene (and other Gal genes) two cis-acting elements are needed for
transcriptional activation. First the TATA-binding protein (TBP) binds to the
TATA-consensus site, and provides a landing pad for a very large RNA
polymerase complex (RNAP). However, just binding to TBP does not enable
transcription; the complex must be activated by a transcriptional activator, in this
case the Gal4 protein. The Gal4 protein sits at another cis-acting element in the Gal1
promoter region, the upstream activator sequence (UAS) that tethers Gal4 to the
promoter. In the absence of galactose, Gal80 physically prevents Gal4 from recruiting
and activating RNAP. In the presence of galactose the Gal80 protein changes conformation
and binds to a different region of Gal4, unveiling the ability of Gal4 to recruit
and activate RNAP.
(Note that the mutation in the Gal481 allele interferes with Gal80 binding
allowing Gal4 to recruit and activate RNAP all the time, even in the absence
of galactose.)
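The genetic behavior of the Gal mutants discussed in this lecture can be summarized in a small boolean sketch of the model. This is an illustration for haploid strains only, under the simplifying assumption that Gal80 either fully blocks Gal4 or does not. The allele labels "wt", "null", and "81c" and the function names `gal1_expressed` and `phenotype` are invented for this example.

```python
def gal1_expressed(gal4="wt", gal80="wt", galactose=False):
    """Predict Gal1 expression in a haploid strain.

    gal4:  "wt", "null" (a Gal4- mutant), or "81c" (the Gal81 allele of Gal4,
           assumed here to be completely insensitive to Gal80)
    gal80: "wt" or "null" (a Gal80- mutant)
    """
    if gal4 == "null":
        return False  # no activator at the UAS, so no transcription
    # Gal80 blocks Gal4 only if Gal80 is functional, galactose is absent,
    # and the Gal4 protein can still be bound and inhibited by Gal80.
    gal80_blocks_gal4 = gal80 == "wt" and not galactose and gal4 != "81c"
    return not gal80_blocks_gal4


def phenotype(gal4="wt", gal80="wt"):
    """Name the regulatory phenotype from expression without/with galactose."""
    without = gal1_expressed(gal4, gal80, galactose=False)
    with_gal = gal1_expressed(gal4, gal80, galactose=True)
    if without and with_gal:
        return "constitutive"
    if not without and not with_gal:
        return "uninducible"
    return "inducible"


if __name__ == "__main__":
    for gal4, gal80 in [("wt", "wt"), ("null", "wt"), ("wt", "null"),
                        ("null", "null"), ("81c", "wt")]:
        print(f"Gal4 {gal4:4s} Gal80 {gal80:4s} -> {phenotype(gal4, gal80)}")
```

The sketch reproduces the observations used above: Gal4- is uninducible, Gal80- is constitutive, the double mutant is uninducible (Model 1), and the Gal81 allele is constitutive even with functional Gal80.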
One final comment about the model for induction of the Gal genes by galactose.
For many years it was assumed that galactose (or a derivative of galactose) actually
binds directly to the Gal80 protein, thus preventing it from inhibiting the Gal4 protein
from activating Gal1 transcription. However, it now seems that one extra protein is
involved in this chain of events. The Gal3 protein turns out to be directly
bound by galactose (or a derivative); this allows Gal3 to move from the
cytoplasm into the nucleus, where the galactose/Gal3 moiety binds to Gal80 to
facilitate moving Gal80 to a different site on the Gal4 protein, thus allowing
Gal4 to activate transcription. Thus while the model as written in this figure
does not actually include Gal3, the models are still formally correct.
In the next lecture we will be looking at promoter elements in eukaryotic
genes in more detail.
Lecture 21
Eukaryotic Genes and Genomes III
Cis-acting sequences
In the last lecture we considered a classic case of how genetic analysis could be
used to dissect a regulatory mechanism. This analysis was contingent upon
having “clean” phenotypes associated with the isolated mutants; e.g.,
mutations in the Gal80 gene produce a phenotype of constitutive Gal1
expression. However, it is sometimes very difficult to identify regulatory
proteins by isolating mutants, because regulators that influence the expression
of a wide variety of genes might be essential (i.e., mutations in these could be
lethal), or their mutant phenotypes may be extremely complex and difficult to
interpret.
One solution to this has been to work backwards from the cis-acting promoter
sequences for particular genes to identify the proteins that bind to them.
Let’s take the Gal1 gene as an example. We have considered the fact that in
the presence of galactose the Gal1 gene is transcriptionally upregulated (along
with other Gal genes). What I haven’t told you is the fact that if glucose is
present in addition to galactose, the induction of the Gal genes simply does not
occur! This is known as glucose repression. This makes physiological sense
because glucose is a more efficient energy source for yeast, and is therefore
the preferred carbon source over galactose. Why bother metabolizing galactose
as long as glucose is present? In fact, glucose represses a very large number
of genes whose products metabolize a wide range of carbon sources (sucrose,
maltose, galactose etc) that are less energy efficient than glucose, as well as
repressing a whole host of other genes.
It seems reasonable to expect that there is a transcriptional repressor that
responds to glucose levels; this repressor would be ineffective when glucose is
low or absent, and effective when glucose is present. It also seems reasonable
that one could isolate trans-acting mutants that fail to repress
galactose-induced Gal gene expression in the presence of glucose. However, it
turns out that the very fact that glucose represses such a large number of
different genes made it difficult to identify such mutants.
Figure: GLUCOSE REPRESSION (+ galactose and glucose).
Instead of looking for mutants that fail to execute glucose repression at
the Gal1 gene, studies of the Gal1 promoter region itself provided the key to
dissecting the mechanism of glucose repression. Specifically, the Gal1
promoter region was fused to the E. coli LacZ gene, on a plasmid that can
replicate autonomously in S. cerevisiae. It was first important to establish that
regulation of LacZ (β-galactosidase) from the plasmid mirrored the regulation
of Gal1 (galactokinase) from its chromosomal locus; i.e., that β-galactosidase
was induced by galactose in the absence of glucose, but not in its presence.
Having established that, it was possible to go on and interrogate subdomains of
the Gal1 promoter region for their role in induction of Gal1 by galactose, as
well as repression of Gal1 by glucose. The minimal length of DNA, stretching
upstream into the promoter region from the Gal1 transcription start site
(designated as adjacent to -1), that confers proper Gal1-like regulation upon
LacZ was 400bp. Once this functional promoter region was delineated, systematic
deletions of 50bp or so could be made all across the 400 bp region; this is easy
to do with some recombinant DNA tricks that are not important to know about
here. Suffice to say that this “deletion analysis” revealed two regions critical
for transcriptional control, as well as the location of the TATA sequence that is
required for loading of the basal transcription machinery.
Figure: 400 base pairs upstream of the Gal1 transcription start site is enough to
confer proper Gal1-like regulation upon LacZ. The Gal1 promoter region is drawn
upstream of the Gal1 transcription start site, with the promoter deletion
constructs numbered 1 to 8.
The expression of β-galactosidase from each of these promoter deletion
constructs under minus-galactose, plus-galactose, and plus-galactose-and-glucose
conditions is shown. From these data we can deduce the location of cis-acting
regulatory sequences for the Gal1 gene.
• Deletions 7 and 8 do not express the reporter under any conditions
because the deletions have removed some of the TATA sequence that is
required for assembly of the basal transcription machinery.
• Deletions 1 and 2 eliminate the ability of galactose to increase expression
from the Gal1 promoter, and since expression is not induced there is
nothing for glucose to repress. It turns out that the 75bp sequence
between -310 and -385 is the DNA binding site for Gal4. This kind of
region is generally called a UAS (upstream activation sequence), and in
this case UASGAL. We will come back to thinking about Gal4 binding to
the UAS recognition sequence later.
• Deletions 3, 5 and 6 have no effect on the ability of galactose to induce
expression because the UAS remains intact. Note that shortening the
distance between the UAS and the TATA region is not detrimental to
induction. Indeed increasing the distance by inserting extra DNA
between the UAS and the TATA sequence also has little effect on
inducibility. This has led to the idea that UAS sequences can work at
long distances (1,000 – 10,000 bp) away from the TATA sequence and
the transcription start sites. (In mammalian cells regions containing
binding sites for transcriptional activators are called enhancers; we will
come to these in a later lecture)
• Deletion 4 turns out to reveal information about glucose repression.
For this construct, while galactose induces expression, glucose is unable
to repress that expression. The deleted region defines the position of a
sequence element needed for glucose repression. A sequence element
that behaves this way (i.e., is required for repression) is generally
called a URS (upstream repressor sequence), and in this case URSGAL.
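The reasoning in the four bullets above can be written out as a small decision rule. The sketch below is only illustrative: the numeric reporter levels are hypothetical values chosen to match the qualitative results described in the text (the measured values were in the figure), and the function name `classify` is an assumption of this example.

```python
# Hypothetical reporter levels (arbitrary units), chosen only to match the
# qualitative results described in the text. Order of each tuple:
# (minus galactose, plus galactose, plus galactose and glucose)
READOUT = {
    1: (1, 1, 1), 2: (1, 1, 1),                    # galactose cannot induce
    3: (1, 10, 1), 5: (1, 10, 1), 6: (1, 10, 1),   # regulated like the intact promoter
    4: (1, 10, 10),                                # induced, but glucose cannot repress
    7: (0, 0, 0), 8: (0, 0, 0),                    # no expression under any condition
}


def classify(minus_gal, plus_gal, plus_gal_glc):
    """Infer which kind of cis-acting element a deletion has removed."""
    if max(minus_gal, plus_gal, plus_gal_glc) == 0:
        return "basal element lost (TATA)"
    if plus_gal <= minus_gal:
        return "activating element lost (UAS)"
    if plus_gal_glc >= plus_gal:
        return "repressing element lost (URS)"
    return "regulation intact"


calls = {deletion: classify(*levels) for deletion, levels in sorted(READOUT.items())}
for deletion, call in calls.items():
    print(f"deletion {deletion}: {call}")
```

The order of the tests matters: a construct that expresses nothing at all is assigned to the basal machinery first, because without any expression there is nothing to induce or repress.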
After determining that there was a URS element controlling glucose repression
at the Gal1 gene promoter, it was possible to go on to find the Mig1 protein
that binds the URSGAL sequence (which turns out to lie in the promoter regions
of many genes besides Gal genes). The Snf1 complex is a kinase that under low
glucose conditions actively phosphorylates the Mig1 repressor, preventing it
from entering the nucleus. This situation (low glucose) is permissive for
galactose induction of Gal1 gene expression via the UAS. In high glucose the
Snf1 kinase is inactivated, so Mig1 is not phosphorylated, and the
unphosphorylated Mig1 enters the nucleus to bind its URS sequence, where it
recruits two other proteins that together achieve repression of Gal1 expression.
Figure:
No/low glucose: Snf1 kinase active, phosphorylates Mig1; Snf1 prevents nuclear
localization, so Mig1 stays in the cytoplasm.
High glucose: Snf1 kinase inactive; Mig1 goes to the nucleus and binds in a
complex to the URS.
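As a way of checking the logic of the two control inputs, the pathway just described can be reduced to a pair of Boolean functions. This is a deliberately crude sketch: real regulation is graded rather than all-or-none, and the function names are assumptions made for this example.

```python
def mig1_in_nucleus(glucose: bool) -> bool:
    """Mig1 enters the nucleus only when it is not phosphorylated by Snf1."""
    snf1_active = not glucose            # Snf1 kinase is active in low glucose
    mig1_phosphorylated = snf1_active    # active Snf1 phosphorylates Mig1
    return not mig1_phosphorylated


def gal1_expressed(galactose: bool, glucose: bool) -> bool:
    """Gal1 is on when Gal4 can activate at the UAS and Mig1 is not at the URS."""
    gal4_can_activate = galactose        # galactose relieves inhibition by Gal80
    return gal4_can_activate and not mig1_in_nucleus(glucose)


for galactose in (False, True):
    for glucose in (False, True):
        state = "ON" if gal1_expressed(galactose, glucose) else "off"
        print(f"galactose={galactose!s:5} glucose={glucose!s:5} -> Gal1 {state}")
```

Only one of the four combinations (galactose present, glucose absent) turns Gal1 on, which is the behaviour of the intact promoter in the deletion analysis.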
Modular properties of Transcription Activators
The Gal4 transcriptional activator turns out to be one of the most well studied
proteins to carry out this kind of function. Once again, a LacZ reporter was
used in an imaginative way to establish that the Gal4 protein has two
functional domains that are separated by a flexible region in the protein. This
time, the Gal1 promoter region remains intact upstream of the LacZ reporter,
but deletions are made across the Gal4 protein; the inverse of keeping Gal4
intact and making deletions along the promoter, as described above.
Essentially, if the N-terminal domain of the Gal4 protein is deleted, the
protein cannot bind to the UASGAL sequence, and so is unable to activate
transcription of the reporter gene. But, in addition to DNA binding, Gal4 must
have a region near the C-terminal end that is responsible for recruiting and
activating the RNA polymerase, thus allowing expression of the reporter gene.
The most remarkable thing of all was that a large region in the center of Gal4
can be deleted; as long as the DNA binding domain is present at the N-terminus,
and the activating domain is present at the C-terminus, Gal4 can activate
transcription from the UASGAL sequence.
Figure: Gal4 protein deletion analysis. The LacZ reporter construct carries
UASGAL and TATA upstream of lacZ. Each Gal4 deletion, drawn from N- to -C with
the DNA binding domain and the activation domain marked, is scored for DNA
binding (+ or -) and LacZ activity (+++, + or -).
Figure: Gal4 missense mutations tend to map to the DB or the AD regions. Gal4-
mutations are recessive and uninducible; Gal481 mutations are dominant and
constitutive. The drawing also labels Gal80.
This remarkable separation of function between these two domains of Gal4 was
dramatically demonstrated by a series of experiments called domain
swapping. Essentially, using recombinant DNA techniques, the Gal4
transcription activation domain (AD) was fused to the DNA binding (DB)
domain of an E. coli protein called LexA; LexA is a repressor that binds to a
known DNA sequence, the LexA operator (LexA OP). Also, the Gal4 DB
domain was fused to the transcription activation domain of a viral protein
known to be a strong activator, VP16. These chimeric proteins were
introduced into yeast cells with the appropriate LacZ reporter gene constructs
and the results of these domain swapping experiments were dramatic.
Two derivatives of a Gal4- yeast strain were created, one containing the LacZ
reporter construct downstream of the Gal1UAS, and the other containing the
LacZ reporter construct downstream of the LexA OP. The two different
chimeric proteins were expressed in each strain and the ability to induce LacZ
activity monitored. In addition the following constructs were also introduced
into the two strains: the wild type Gal4 protein and a third chimeric protein
with the activation domain of the Gal481 mutant protein fused to the LexA DB
domain. The results from these experiments clearly show that the AD and the DB
domains function independently of one another.
Figure: Two LacZ reporter constructs; two chimeric proteins.
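The modular view of an activator can be stated very compactly: the reporter comes on only if the protein's DB domain recognizes the site upstream of the reporter and some functional AD is attached. The sketch below encodes just that rule. The outcomes it prints are what the modular model predicts, not the published data (which were in the figure), and the class and site names are assumptions of this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Activator:
    name: str
    binds: Optional[str]   # DNA site recognized by the DB domain, or None
    has_ad: bool           # True if a functional activation domain is attached


def reporter_on(protein: Activator, upstream_site: str) -> bool:
    """Modular rule: DB domain must match the site AND an AD must be present."""
    return protein.binds == upstream_site and protein.has_ad


PROTEINS = [
    Activator("Gal4 (wild type)", "UAS_GAL", True),
    Activator("LexA DB + Gal4 AD", "LexA_OP", True),
    Activator("Gal4 DB + VP16 AD", "UAS_GAL", True),
    Activator("Gal4 DB only", "UAS_GAL", False),
]

for site in ("UAS_GAL", "LexA_OP"):
    for protein in PROTEINS:
        state = "on" if reporter_on(protein, site) else "off"
        print(f"{site:8} {protein.name:18} LacZ {state}")
```

Note that the rule never asks where the AD came from, which is exactly the point of the domain swapping experiments.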
This series of experiments, while interesting and certainly revealing about
how the Gal genes are regulated, has turned out to have a profound effect on
all of biological research because it contributed to the development of a widely
used technology called the yeast two hybrid assay. This assay makes it
possible to determine whether two proteins interact with each other as a
complex with long-lived interaction, and sometimes even when two proteins
only interact transiently.
To determine whether protein X interacts with either protein Y or protein Z
one can do the following: fuse protein X to the Gal4 DB; this chimeric protein
is known as the bait, and it will attach to the UASGAL that lies upstream of a
reporter gene, usually a selectable marker or LacZ, or both. This bait lies in
wait for an interaction with another protein. The Gal4 AD is fused to either
protein Y or protein Z. Should either one of these proteins be able to interact
with protein X then the Gal4 AD region will become tethered to the UASGAL
region and will recruit and activate the RNA polymerase.
Figure: Gal4 Chimeric Proteins can Interrogate Protein-Protein Interactions;
Yeast Two-Hybrid Assay for Protein-Protein Interactions.
(A) No interaction: protein X fused to the Gal4 DNA Binding Domain (Gal4-BD)
sits at the GAL4-binding site upstream of the reporter gene; protein Y fused to
the Gal4 Activation Domain (Gal4-AD) does not bind X.
(B) Positive interaction: protein Z fused to the Gal4 Activation Domain binds
X at the GAL4-binding site, giving increased transcription of the reporter gene.
Figure by MIT OCW.
Note that proteins X, Y and Z do not have to be yeast proteins; the only
requirement is that the DNA coding sequence for the protein is available (which
is now true for all of the genes from a wide variety of organisms); these
sequences are then cloned such that they produce the appropriate Gal4
chimeric proteins.
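The readout of a two hybrid test is simply "does the reporter come on for this bait and this prey?". A minimal sketch of that bookkeeping follows, assuming a hypothetical ground truth in which X binds Z but not Y; the function name `screen` and the data are illustrative only.

```python
def screen(bait, preys, interactions):
    """Return the preys whose Gal4 AD fusion would switch the reporter on.

    bait: protein fused to the Gal4 DB (so it sits at UASGAL).
    preys: proteins fused to the Gal4 AD.
    interactions: set of frozenset pairs of proteins assumed to bind each other.
    """
    return [prey for prey in preys if frozenset((bait, prey)) in interactions]


# Hypothetical ground truth: X binds Z, and nothing else interacts.
KNOWN = {frozenset(("X", "Z"))}

hits = screen("X", ["Y", "Z"], KNOWN)
print(hits)
```

Storing each pair as a `frozenset` makes the lookup independent of which partner is the bait and which is the prey, although in a real assay the two orientations do not always give the same answer.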
In the last three lectures we have looked at one particular regulatory
network in S. cerevisiae, and have employed a wide range of tools to
understand this network. In the next lecture I will be telling you how these and
other tools have evolved into technologies that allow us to look globally at gene
regulation in eukaryotic cells.
Lecture 22
Eukaryotic Genes and Genomes IV
In the last three lectures we have thought a lot about analyzing a regulatory
system in S. cerevisiae, namely Gal regulation that involved a handful of genes.
These studies monitored the increased transcription of Gal genes in the presence
of galactose (and the absence of glucose); we saw that this regulation is achieved
by particular proteins, or multiprotein complexes, that bind to specific sequences
in the promoter region upstream from their target genes.
What if I told you that it is now possible to do the following in S. cerevisiae:
• Monitor mRNA expression level for every gene in S. cerevisiae, in one single
experiment.
• Monitor all the binding sites in the S. cerevisiae genome for each
transcription factor in a single experiment.
• Determine all possible pair-wise interactions for every S. cerevisiae protein.
Obviously I wouldn’t mention these possibilities if they weren’t already happening.

What I want to do today is to introduce you to the idea of carrying out genetic
analyses on a global, genome-wide scale, and hopefully give you some examples
that are relevant to what we have already learned along the way. So, this will be
a technology oriented lecture, but with some application to what we have already
learned about gene regulation in eukaryotes. It should also be mentioned that
what will be described for S. cerevisiae is theoretically possible for any
organism whose genome has been completely sequenced and in which the location of
all the genes has been established. What we will learn today is already being,
or will be, applied to higher eukaryotes and mammals.
Organism         Number of genes
S. cerevisiae     5,800
Drosophila       14,000
C. elegans       19,000
mouse            22,500
human            22,500
Figure by MIT OCW.
Monitor mRNA expression level for every gene in S. cerevisiae, in one
single experiment: Global transcriptional profiling.
Before we consider how it is possible to measure the levels of thousands of mRNA
species, we will have to step back to consider how the levels of one or two mRNA
species can be measured by Northern Blot analysis... and I know you must have
learned this in 7.01 if not in high school. Northern blot analysis is based upon the
fact that DNA and RNA molecules that possess complementary base sequences will
hybridize together to form a double stranded molecule. If the complementarity is
perfect the duplex molecule is stable; if it is imperfect (with base pair mismatches)
it is relatively less stable. This provides the specificity needed to identify perfectly
matched DNA:RNA duplexes (on Northern Blots) and DNA:DNA duplexes (on
Southern Blots). This specificity is needed to be sure we are measuring the level
of one particular transcript and that this is not contaminated with signal from
closely related transcripts. RNA is isolated from cells and size fractionated on a
gel; the thousands of mRNA species form a smear on the gel which is punctuated by
the strong ribosomal RNA bands (28S and 18S) that do not interfere with the
analysis.
Image removed due to copyright reasons.

Please see
http://www.accessexcellence.org/RC/VL/GG/nucleic.html
Figure by MIT OCW.
The breakthrough in developing microarrays for analyzing mRNA levels was to
reverse the logic: instead of immobilizing the mRNAs for hybridization with one
or two labeled complementary DNA (cDNA) probes, all possible cDNA probes are
immobilized on a solid surface (usually glass slides). The spotting of probes is
achieved robotically; the DNA probes are designed to specifically hybridize to
only one nucleic acid sequence that represents a single mRNA species. The
thousands of DNA probes are dispensed from 96-well, or 384-well plates to an
addressable site on the solid surface. The mRNA population from each cell type
is purified and then copied such that the copy is fluorescently labeled. This
fluorescent population is hybridized to the immobilized probes, and the
intensity of the fluorescence at each probe spot is proportional to the number
of copies of that specific mRNA species in the original mRNA population.
Northern Blots: immobilized mRNA population hybridized with labeled DNA probe
representing one or two genes.
DNA Microarrays: immobilized DNA probes representing all possible genes
hybridized with labeled mRNA population.
Figure: DNA clones; PCR amplification and purification; robotic printing;
hybridize target to microarray. Figure by MIT OCW.

So let’s look at how this would actually work in a real experiment. mRNA is
isolated from yeast cells in state A (e.g., minus galactose) and from yeast cells in
state B (e.g., plus galactose), and copies of each population are made such that
one fluoresces red and the other fluoresces green. After mixing, these fluorescent
molecules are hybridized to the slides containing ~5,800 DNA probes, each one
specific for detecting hybridization of many copies of an individual mRNA species.
Figure: What’s happening at each spot? Isolate mRNA populations from yeast in
state A and yeast in state B; label copies of the mRNA species with RED or
GREEN; mix; hybridize to the microarray.
The location and identity of each probe on the microarray slide is known, and
each probe is specific for a single mRNA. The color and intensity of the
fluorescence is measured by scanning the slide with lasers, and the relative
abundance of each mRNA in the cells of State A vs State B can be calculated from
the emitted fluorescence; i.e., the relative level of 5,800 mRNAs can be compared
between two populations of yeast cells.
Figure legend: the spot color distinguishes mRNA present much higher in State A
than State B, mRNA present much higher in State B than State A, and mRNA
present at equal levels in States A and B.
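The calculation of relative abundance from the two fluorescence signals is usually expressed as a log ratio, so that a two-fold increase and a two-fold decrease are symmetric about zero. The following is a minimal sketch of that step, not the analysis pipeline used in any particular study: the intensities are made up, geneX and geneY are placeholder names, and the two-fold threshold is an assumption.

```python
import math


def log2_ratio(state_a, state_b, floor=1.0):
    """log2(B/A) for one spot; intensities are floored to avoid dividing by zero."""
    return math.log2(max(state_b, floor) / max(state_a, floor))


def call(ratio, threshold=1.0):
    """Label a spot; a threshold of 1.0 corresponds to a two-fold change."""
    if ratio >= threshold:
        return "higher in B"
    if ratio <= -threshold:
        return "higher in A"
    return "about equal"


# Made-up intensities: (signal from the state A label, signal from the state B label)
SPOTS = {
    "Gal1": (100.0, 6400.0),
    "geneX": (2000.0, 2100.0),
    "geneY": (1600.0, 200.0),
}

for gene, (a, b) in SPOTS.items():
    ratio = log2_ratio(a, b)
    print(f"{gene}: log2(B/A) = {ratio:+.2f} -> {call(ratio)}")
```

Real data also need background subtraction and normalization between the two dyes before ratios are taken; those steps are left out here.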
Presenting data for thousands of mRNA transcripts is clearly a challenge. You
could present endless tables of data, but our brains are much more adept at
recognizing shapes, patterns and colors. Colored representations of up and down
regulation of transcript levels are the preferred way to present data.
Figure: Northern Blot vs. Microarray. Each colored vertical line in the
horizontal lane displays the relative expression level of a single mRNA.
Images removed due to copyright reasons. Please see Lodish, Harvey, et. al. Molecular Cell Biology.
5th ed. New York : W.H. Freeman and Company, 2004.
For our purposes here, let’s look at what genes are up-regulated when a glucose
grown culture of S. cerevisiae is shifted into galactose. Obviously transcripts
for Gal1, Gal7 and Gal10 genes will be up-regulated, as we have discussed in the
last couple of lectures. In addition Gal2 (galactose permease) and Gal80 (the
negative regulator of the Gal4 transcriptional activator) are also induced; this
was previously known, although we didn’t discuss it directly in the previous
lectures. But upon looking globally, it has become clear that some other genes
are also up-regulated. (This figure shows just a small snapshot of the
response.) These additional genes are Fur4, Gcy1, Mth1, and Pcl10, and their
co-regulation along with the Gal genes was previously unrealized. We will be
coming back to this later in the lecture.
Figure: What transcripts have increased levels when shifted from glucose to
galactose? Images removed due to copyright reasons. Please see Ren, Bing, et al.
"Genome-wide Location and Function of DNA Binding Proteins."
Science 290, no. 5500 (Dec. 22, 2000): 2306-9.
Monitor all the binding sites in the S. cerevisiae genome for each
transcription factor in a single experiment.
In the last lecture we talked about deletion analysis of cis-acting regulatory
sequences identifying the location of UAS and URS sequences upstream of the
Gal1 gene. That the Gal4 transcriptional activator protein binds to the DNA
sequence present at the UASGAL1 can be shown to happen in the test tube, but
showing that it is actually bound in a living cell is another matter. A method was
recently developed for doing just that, and this method has been further
developed to determine transcription regulator binding across the whole genome.
Figure: Chromatin Immuno Precipitation (ChIP)
• Living cells: formaldehyde (H2CO) treatment crosslinks proteins to DNA
• Isolate DNA with proteins crosslinked, shear into small fragments
• Immunoprecipitate specific transcription factor and its bound DNA
• Reverse the formaldehyde crosslinks and get rid of protein
• What remains: DNA fragments that the transcription factor was bound to in
the living yeast cell
Images removed due to copyright reasons. Please see
Figure 2 in Weinmann, Amy S. Novel ChIP-based Strategies
to Uncover Transcription Factor Target Genes in the Immune System.
Nature Reviews Immunology 4 (2004): 381-386.
This method takes advantage of the fact that formaldehyde crosslinks proteins to
DNA in a way that can later be reversed.
For galactose grown yeast cells, chromatin immunoprecipitation (ChIP) with an
antibody that pulls down the Gal4 protein revealed some surprises. In addition to
confirming that Gal4 binds to the promoter regions upstream of the expected Gal
genes, the Gal4 protein also binds to the promoter regions of 4 other genes,
namely Fur4, Pcl10, Mth1 (shown in the adjacent figure) and Gcy1 (not shown).
Note that these genes were shown to be induced by galactose in the previous
section. Just how the up-regulation of Fur4, Pcl10 and Mth1 might contribute to
optimizing the metabolism of galactose is shown in this figure, but the role Gcy1
plays is unclear. Taking a global look at what genes are up-regulated in the
presence of galactose, and taking a global look at what promoters are bound by
the Gal4 regulator, has clearly enriched our view of how S. cerevisiae adapts to
the presence of this sugar.
Figure: A more complete view of galactose induced gene expression in
S. cerevisiae. Images removed due to copyright reasons. Please see Ren, Bing,
et al. "Genome-wide Location and Function of DNA Binding Proteins."
Science 290, no. 5500 (Dec. 22, 2000): 2306-9.
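Combining the two global data sets amounts to intersecting two gene lists: genes induced by galactose (from the microarray) and genes whose promoters are bound by Gal4 (from ChIP). A small sketch using only the gene names mentioned in these notes follows. One assumption is flagged in the code: the "expected Gal genes" bound by Gal4 are taken here to be the five Gal genes named in the expression section.

```python
# Genes named in these notes as induced after the shift from glucose to galactose.
INDUCED = {"Gal1", "Gal7", "Gal10", "Gal2", "Gal80", "Fur4", "Gcy1", "Mth1", "Pcl10"}

# Genes already known to belong to the Gal regulon before the global experiments.
KNOWN_GAL = {"Gal1", "Gal7", "Gal10", "Gal2", "Gal80"}

# Promoters bound by Gal4 in the ChIP experiment. ASSUMPTION: the "expected Gal
# genes" are taken to be exactly the five genes in KNOWN_GAL.
BOUND_BY_GAL4 = KNOWN_GAL | {"Fur4", "Pcl10", "Mth1", "Gcy1"}

direct_targets = INDUCED & BOUND_BY_GAL4     # induced AND bound: likely direct targets
new_targets = direct_targets - KNOWN_GAL     # targets that were not known before

print(sorted(new_targets))
```

A gene that is induced but not bound would be a candidate indirect target, and a promoter that is bound without induction would suggest that binding alone is not sufficient; neither class is described for Gal4 in these notes.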
The ChIP approach, followed by hybridization to DNA microarrays, was originally
limited to monitoring binding of transcriptional regulators for which there
were good precipitating antibodies. However, this limitation was recently
eliminated by fusing an epitope TAG to each regulator gene. This epitope TAG is
recognized by a strong antibody, and so a single antibody can “pull down”
(immunoprecipitate) >100 different regulatory proteins, each of which is
expressed in its own yeast strain.
Figure: Arrayed probe sequences represent the upstream cis-acting regions of all
5,800 genes; a regulatory protein binds a gene promoter. Images removed due to
copyright reasons. Please see Ren, Bing, et al. "Genome-wide Location and
Function of DNA Binding Proteins." Science 290, no. 5500 (Dec. 22, 2000): 2306-9.
This has enabled a massive study to identify all of the target genes for
each of 106 transcriptional regulators in S. cerevisiae growing in a defined
medium. A compilation of all the data has revealed a number of fundamentally
different regulatory motifs; these are shown in the adjacent figure. For the
most part the Gal4 regulatory network (not shown) represents a simple Single
Input Motif.
This approach has already been extended to human cells and it will not be long
until detailed regulatory mechanisms are defined for humans, in the way it is now
happening in yeast. It is now possible to go on to monitor which genes the
transcriptional regulators bind to under different environmental conditions, and
from there to build more dynamic models for how these genetic regulatory
mechanisms operate and ultimately how they co-operate with each other.
Determine all possible pair-wise interactions for every S. cerevisiae
protein.
The third global scale analysis we will consider is the systematic determination of
protein-protein interactions in S. cerevisiae. This essentially involves a systematic
test of all pair-wise combinations between all 5,800 yeast proteins. Testing >33
million combinations by individual matings isn’t feasible, so mating pools of 100
strains in all combinations has become the preferred approach. Only the diploid
strains where the Gal4 DB-fusion and the Gal4 AD-fusion proteins interact will be
able to grow on galactose medium without uracil and histidine, as well as turning
blue when grown on galactose and X-gal. The plasmids present in such diploids are
then sequenced to determine which proteins are fused to the Gal4 AD and DB domains.
Figure: Gal4 chimeric proteins representing all 5,800 proteins fused to the Gal4
Activation Domain and to the DNA Binding domain.
• 5,800 Matα yeast strains: Gal4 DNA Binding Domain fused to one of 5,800 proteins
• 5,800 Mata yeast strains: one of 5,800 proteins fused to the Gal4 Activation Domain
• Reporter genes downstream of the GAL4-binding site: LacZ, URA3, HIS3
• Individual strains: 5,800 Matα strains X 5,800 Mata strains = 33,640,000 matings
• Pools of 100 strains: 58 pools Matα strains X 58 pools Mata strains = 3,364 matings
• Select for diploids that can grow in the absence of Uracil and Histidine and
which are blue on X-gal
Figure by MIT OCW.
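The saving from pooling is easy to check with a few lines of arithmetic. The numbers come straight from the figure (5,800 proteins, pools of 100 strains); the variable names are just for this example.

```python
import math

n_proteins = 5800   # one DB-fusion strain and one AD-fusion strain per protein
pool_size = 100     # strains combined into each mating pool

individual_matings = n_proteins ** 2                 # every DB strain x every AD strain
pools_per_type = math.ceil(n_proteins / pool_size)   # pools of each mating type
pooled_matings = pools_per_type ** 2                 # every DB pool x every AD pool

print(f"individual matings: {individual_matings:,}")
print(f"pools per mating type: {pools_per_type}")
print(f"pooled matings: {pooled_matings:,}")
print(f"reduction: {individual_matings // pooled_matings:,}-fold")
```

The reduction is the square of the pool size, because pooling shrinks both axes of the mating grid by a factor of 100. The price is that each positive pool-by-pool mating still contains 100 x 100 possible pairs, which is why the plasmids in the surviving diploids have to be sequenced.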
This systematic approach to cataloguing all possible protein-protein interactions
for yeast proteins yielded many more interactions than originally thought.
Admittedly the yeast two hybrid is quite noisy, giving many false positive
interactions, but even so alternative methods (that we do not have time to
consider in detail) have confirmed many of these interactions. When all of the
known protein-protein interaction data is assembled, we see the surprising fact
that > 5,000 proteins can be connected together by > 14,000 protein
interactions in a continuous web. Indeed, the interaction data for Gal4 embedded
within this web makes sense and adds some new information. Such
“Interactomes” are being developed for all the usual organisms, and the C.
elegans interactome is particularly well developed. One of the major revelations
has been that proteins from pathways that were previously thought to be totally
unconnected turn out to have interacting proteins.
Figure: Embedded in this complex web of interactions we can find those proteins
that bind Gal4 (the drawing shows Gal1, Gal3, Gal80, Gal11 and Gal4).
• Gal1 can pinch-hit for Gal3
• Gal11 turns out to be a subunit of the PolII transcription machinery, so Gal4
communicates with PolII via Gal11
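Saying that proteins are connected "in a continuous web" is a statement about a graph: proteins are nodes, interactions are edges, and the web is a connected component. A minimal sketch of finding the component that contains Gal4 is shown below. The edge list is an assumption pieced together from the text (which partner each Gal protein touches in the real figure is not recoverable here), and P1 and P2 are placeholder proteins added to show an unconnected pair.

```python
from collections import deque

# ASSUMED interaction pairs, for illustration only.
EDGES = [
    ("Gal4", "Gal80"), ("Gal80", "Gal3"), ("Gal80", "Gal1"),
    ("Gal4", "Gal11"), ("Gal11", "PolII"),
    ("P1", "P2"),   # placeholder pair that is not linked to the Gal proteins
]


def build_graph(edges):
    """Turn a list of undirected pairs into an adjacency dictionary."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def component(graph, start):
    """Breadth-first search: every protein reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


web = component(build_graph(EDGES), "Gal4")
print(sorted(web))
```

Run on the full data set, the same search is what would show that more than 5,000 proteins fall into a single component.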
Lecture 23
Transgenes and Gene Targeting in Mice I
In the next two lectures I will be telling you about some of the ways in
which we can study gene function in higher eukaryotes, more specifically in the
laboratory mouse Mus musculus. I will be doing this by telling you about a
remarkable number of manipulations that have been made to the mouse
genome in order to generate an experimental mouse model system for human
Sickle Cell Disease. The mouse that was developed to explore this human
disease turns out to be one of the most genetically modified mice on the
planet... and so it gives us an interesting framework in which to tell you about
making transgenic and knockout mice. To set the scene for genetically
modifying mice to mimic human sickle cell disease we need to step back a bit
and consider this devastating human disease and some of its features.
Human Sickle Cell Disease (a.k.a. sickle cell anemia): Sickle cell disease is a
human blood disorder that is caused by a single mutation in a gene that
encodes one of the subunits of hemoglobin (Hb), namely β-globin.
Hemoglobin is a tetrameric protein made up of two α-globin proteins, and two
β-globin proteins; ααββ. Each of the 4 globin proteins embraces an
iron-containing heme molecule (iron is what makes hemoglobin and Red Blood Cells
red) whose function is to bind oxygen in the lungs and release it in all the
tissues of the animal. The very simple change of the sixth amino acid in β-globin
(glutamic acid is substituted with a valine) causes devastating consequences. It
turns out that the sickle mutation does not directly interfere with the ability
of hemoglobin to store or release oxygen; rather, this amino acid change bestows
a novel property on the hemoglobin molecule. Hb containing β-globin subunits with
the sickle mutation is known as HbS; in its deoxygenated state the HbS molecules
aggregate together to form polymeric fibers, and the presence of these fibers
grossly distorts the shape of Red Blood Cells (RBCs). Instead of being shaped
almost like a doughnut (without the actual hole) and having tremendous
flexibility to squeeze through tiny capillaries within tissues, the RBCs are
made curved (like a sickle), rigid, prone to rupture and prone to clumping by
the aggregated HbS fibers; rupture causes anemia and clumping clogs small blood
vessels, leading to tissue damage.
Figure: Sickle Cell Disease: an autosomal recessive disorder of hemoglobin.
• Red blood cells (RBCs) make up 40% of the blood volume
• Hemoglobin makes up 70% of the proteins in RBCs
• A single mutation in the sixth amino acid of the β-globin chain
(Glutamic acid -> Valine) causes Sickle Cell Disease
Images removed due to copyright reasons.
It turns out that Sickle Cell Disease is very common in many parts of the world,
especially sub-Saharan Africa, and even among parts of the US population, in
particular African Americans and Hispanic Americans. The prevalence of such a
devastating disease allele is actually quite surprising since one would expect
it to be selected against as the human population expanded. However, it turns
out that people who are heterozygous for the sickle mutation in the β-globin
gene are resistant to malaria, and so this gives a survival advantage for people
who are carriers of the mutant allele; they are said to have the sickle cell
trait but they do not have sickle cell disease.
Freq. sickle cell disease in US born children
Ethnicity           HbS
African American    1/500
Hispanic            1/14,000
Middle Eastern      0/22,000
Native American     1/17,000
Caucasian           1/160,000
Asian               0/200,000
1/12 African Americans are carriers (heterozygous) for the HbS allele.
Heterozygotes for the β-globin sickle cell mutation turn out to be resistant to
MALARIA infection; the malaria parasite does not grow well in RBCs in
heterozygous individuals. You will consider such issues in the population
genetics lectures.
Images removed due to copyright reasons.
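The two African American figures in the table (1/500 affected, 1/12 carriers) can be checked against each other. The sketch below assumes Hardy-Weinberg proportions, i.e. that affected individuals occur at frequency q squared and carriers at 2pq; that assumption is introduced here for the check and is not made in the notes above.

```python
import math

disease_freq = 1 / 500            # affected births, from the table
q = math.sqrt(disease_freq)       # HbS allele frequency if affected = q**2
p = 1 - q                         # frequency of the normal allele
carrier_freq = 2 * p * q          # expected frequency of heterozygotes

print(f"q = {q:.4f}")
print(f"carriers = {carrier_freq:.4f}, about 1 in {1 / carrier_freq:.1f}")
```

Under that assumption the expected carrier frequency comes out close to the 1/12 quoted in the table, so the two numbers are consistent with each other.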
Organization and Expression of the Human globin genes: It turns out
that mammals have a number of different β-globin-like genes, and a number
of α-globin-like genes, i.e., a β-globin family and an α-globin family of genes.
These two gene families are found on separate chromosomes; some of the
family members are pseudogenes (genes that do not produce functional
proteins), and the functional family members turn out to be expressed at
different times during development. How did all of these globin genes appear
in mammalian genomes, and what are they doing there?
Figure: The Origins of Gene Families in Mammals. Expression of Human Globin
Genes is Developmentally Regulated. (Ψ marks a pseudogene.)
β-globin locus          ε     γ     γ     Ψ     δ     β
First 2 mo. in utero    +++   +     +           -     -
Till birth              +     +++   +++         -     -
After birth             -     -     -           +     +++
α-globin locus          ζ     Ψ     Ψ     Ψ     α2    α1    Ψ
First 2 mo. in utero    +++                     +     +
Till birth              -                       +++   +++
After birth             -                       +++   +++
Hemoglobin tetramers:
First 2 mo. in utero: ζζεε
Till birth: ααγγ
After birth: ααββ (relevant to Sickle Cell Disease)
Figure by MIT OCW.
Many genes in mammals exist as multi-gene families, and the globin genes are
a good example of this. During mammalian evolution it appears that gene
duplication was a common event, and this has allowed the duplicated genes to
accumulate mutations that sometimes inactivate the gene (leading to
pseudogenes that are non-functional) and sometimes lead to genes that produce
proteins that can carry out a slightly different function. For the globin genes,
soon after duplication of an ancestral gene to create the α-globin and β-globin
ancestral genes, these two genes were segregated to separate chromosomes
where they evolved their own gene families through further duplication and
mutation over many millions of years.
It is the ααβSβS hemoglobin molecule expressed after birth that is responsible
for aggregating and causing sickle cell disease. The ααββS hemoglobin tetramers
expressed in people heterozygous for the sickle mutation do not aggregate to
form fibers, and so do not cause disease; however, should such heterozygous
people live at high altitude some sickling can occur.
Figure:
Healthy people: ααββ
Sickle Cell Trait: ααββS
Sickle Cell Disease: ααβSβS
ααβSβS is soluble when oxygenated, but precipitates in low oxygen.
Images removed due to copyright reasons.
It is sobering to note that almost 50 years since the molecular basis of this
disease was discovered there still does not exist a really effective therapy for
the disease. Hemoglobin was one of the first proteins to be purified, its gene was
one of the first to be cloned, and the globin proteins were among the first to
have their structure determined by x-ray crystallography…and although some
progress has been made in therapy, much more still needs to be done. This is
precisely why having a robust mouse model for sickle cell disease to test
experimental therapies is absolutely critical. Tremendous strides have been
made in generating a mouse model for sickle cell disease.
How do we genetically modify the mouse genome?
(1) Transgenes
• adding genes by pronuclear injection
• random insertion with no replacement
(2) “Knock-outs”
• subtracting or deleting genes
• gene targeting
• specific insertion with replacement
There are two general ways to specifically modify the genetic makeup of a
mouse. One involves the random integration of a cloned gene somewhere into
the mouse genome (i.e., the introduction of a “transgene”). The other
involves precisely targeting a specific gene in the mouse and introducing a
known alteration of that gene, usually the deletion of the gene and the
insertion of a marker gene in its place (a gene knock-out by targeted
homologous recombination).
Introduction of the Human β-globin gene with the sickle cell mutation
(βSH) into the mouse genome: In the 1980’s and early 1990’s several
groups tried to make a mouse with sickle cell disease by introducing the
Human β-globin gene with the sickle mutation (βSH), in the hope that if the
protein was expressed at high levels it would precipitate Hb fibers that would
cause sickling of RBCs, thus mimicking sickle cell disease. How does one
make a transgenic mouse?
Mice are treated with a hormone to make them super-ovulate and then mated.
Soon after mating, the fertilized eggs are retrieved from the uterus. Eggs that
contain two pronuclei (one from the mother and one from the father), indicating
that the embryo is still at the one-cell stage, are identified under the
microscope. The male pronucleus is injected (still under the microscope) with
purified DNA fragments that contain the βSH gene along with an appropriate
promoter region to give it a good chance of being expressed once integrated
into the genome. The injected DNA quite often gets incorporated into the
genome, and about one in three eggs that are implanted into a foster mother
mouse will have the βSH gene integrated, and will go on to produce a baby
mouse. Animals that score positive for the human transgene are mated to
generate mice homozygous for the transgene. Among these progeny one is likely
to contain the mutated human β-globin protein in its RBCs.

STEP 1: Retrieve fertilized egg from recently mated female mouse.
STEP 2: Inject cloned human βSH gene (into male pronucleus).
STEP 3: Human “transgene” integrates into the mouse genome at a random site.
STEP 4: Transfer injected egg into the uterus of a foster mother.
STEP 5: Foster mother gives birth to pups, about 1 in three have the
transgene integrated into every cell of its body.
STEP 6: Breed transgenic offspring to get homozygous carriers of the transgene.

[Figure labels: Inject foreign DNA into one of the pronuclei. Fertilized mouse
egg prior to fusion of male and female pronuclei. Transfer injected eggs into
foster mother. About 10-30% of offspring will contain foreign DNA in
chromosomes of all their tissues and germ line. Breed mice expressing foreign
DNA to propagate DNA in germ line.]
Figure by MIT OCW.
This was indeed achieved, BUT, this mouse did not prove to be a good model
for sickle cell disease. It turns out that the human β-globin protein does
not complex well with the mouse α-globin protein (αM) and so the cloned gene
encoding the human α-globin protein (αH) was introduced into fertilized mouse
eggs to create a new transgenic mouse line, which was then mated with the βSH
transgenic mouse to produce a mouse expressing both βSH and αH human
proteins.

[Slide: Genotypes of the βSH Transgenic Mice. Unfortunately the RBCs of these
mice do not sickle efficiently……maybe because human α-globin is not present.
Add in the human α-globin transgene and breed mice.]

Note that the αH gene is almost certain to integrate into a different
location than the βSH gene did, and probably in a different chromosome. These
alleles will therefore sort independently when the two transgenic mouse lines
are bred together. The strong expectation was that the presence of the
αH αH βSH βSH hemoglobin tetramer in mouse RBCs would lead to the
precipitation of fibers and the sickling of the mouse RBCs. However, much to
the disappointment of the research teams involved, this was simply not the
case. It turns out that the presence of the normal mouse hemoglobin proteins
is enough to prevent the mutant hemoglobin tetramers from precipitating into
fibers, and so these mice do not make a good model for human sickle cell
disease.
It was decided that the only solution to this problem would be to eliminate
the endogenous mouse α and β globin genes. This will be the topic of the next
lecture.
PROBLEM: These mice still do not have RBCs that sickle very well.
The mouse still has mouse α and β globin molecules and their
presence is enough to prevent the human hemoglobins from forming
fibers, in much the same way that humans heterozygous for the
sickle mutation do not normally have RBCs that sickle.
SOLUTION: Need to get rid of the endogenous mouse α and β
globin genes by targeted homologous recombination to generate
“Knock-out” mice
Lecture 24
Transgenes and Gene Targeting in Mice II
In the last lecture we discussed sickle cell disease (SCD) in humans, and I
told you the first part of a rather long, but interesting, story describing how a
mouse model for this human disease has been generated. I only got half way
through the story…we will cover the rest today. In the last lecture we
discussed how the human β-globin gene with the sickle mutation (βSH) was
introduced as a transgene in mice, in the hope that it would cause the
precipitation of hemoglobin and the sickling of mouse red blood cells (RBCs);
had this happened this would have generated an animal model for SCD. If you
recall, the transgenic mouse did not have sickling RBCs, and to try to fix this,
the human α-globin gene was also introduced into the mouse genome…but still
the doubly transgenic mouse did not have sickling RBCs. The solution to this
was to inactivate the endogenous mouse α-globin and β-globin genes, and
that’s what we will cover today. BUT, before then, I want to share with you
some great questions that I got after the last lecture, and some responses to
those questions.
Great Questions from students after the last lecture
• How do you know it didn’t integrate into an important gene?
• Can’t the phenotype (if you get one) be because of the disruption of an
endogenous gene?
• How do you know that the human globin proteins were expressed?
• Why didn’t the human βS-globin gene recombine with the mouse β-globin gene?
• Could one inject the w.t. human β-globin gene into a human embryo to correct
the deficiency?

PROBLEM: These mice still do not have RBCs that sickle very well.
The mouse still has mouse α and β globin molecules and their
presence is enough to prevent the human hemoglobins from forming
fibers, in much the same way that humans heterozygous for the
sickle mutation do not normally have RBCs that sickle.
SOLUTION: Need to get rid of the endogenous mouse α and β
globin genes by targeted homologous recombination to generate
“Knock-out” mice
[Figure: pronuclear injection scheme. Figure by MIT OCW.]
So…how do we “get rid of” the endogenous mouse α-globin and β-globin
genes? Just like making transgenic mice this involves some manipulations of
the mouse embryo…but this is a much more complex process, and some background
about the preimplantation mouse embryo is needed. For about 4-5 days after
fertilization, the mouse embryo is free-floating (and therefore accessible)
and all of the cells that will eventually form the mouse remain totipotent,
meaning that they have the potential to differentiate into any, and every,
mouse cell type. This has been shown in various dramatic ways. For instance,
if the four-cell embryo is dissected and each cell implanted into a different
foster mother, four identical mice will be born. More interestingly, if cells
from two genetically different pre-implantation embryos (e.g., embryos
destined to produce mice with different fur colors) are simply mixed together
(they are sticky) and implanted into a foster mother, a single chimeric mouse
will be born.
[Images removed due to copyright reasons.]
[Slide: Early findings revealed that the preimplantation mouse embryo is
remarkably malleable, and that cells in the preimplantation embryo are
TOTIPOTENT. Images removed due to copyright reasons.]
Essentially the two types of totipotent cells mix together and produce an
animal that has a mixture of two types of cells in its body. This animal has
four genetic parents!!
The ability of these genetically different totipotent cells to mix together
in the preimplantation embryo is crucial for the mouse gene knock-out
technology.
In order to make a directed genetic change in a specific mouse gene we exploit
homologous recombination just as we have discussed for E. coli and S.
cerevisiae. However, this is much harder to do in mammalian cells than in
bacteria and yeast. In yeast, when a linear DNA duplex is introduced into the
cell, about 90% of the time that the DNA is integrated into the yeast genome
it is done by the homologous recombination machinery such that the incoming
DNA fragment is swapped for the endogenous gene. In mammalian cells the DNA
that is integrated into the genome is almost always at a non-homologous site,
and the frequency of homologous replacement of an endogenous sequence is
about 10⁻³ to 10⁻⁵.
What this means is that we have to allow thousands of integration events to
take place, and to be able to identify the integration event we want…namely an
integration event that took place by homologous recombination.

[Slide: In yeast (construct: Tn7TR lacZ URA3 tet Tn7TR, recombining with yeast
genomic DNA), homologous recombination to replace an endogenous gene with the
transfected DNA fragment occurs >90% of the time. In mammalian cells such
homologous recombination between genome and transfected DNA fragment is very
rare (<0.01% of the time). Have to have clever selection schemes to get the
rare cells that integrated a transfected DNA fragment by targeted homologous
recombination.]
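To get a feel for "thousands of integration events", here is a minimal sketch that asks how many independent integration events are needed for a 95% chance of recovering at least one homologous recombinant. It assumes each integration is independent and is homologous with a fixed probability f; the function names and the 95% threshold are illustrative choices, not part of the lecture.

```python
import math

def prob_at_least_one(f, n):
    """Chance that at least one of n independent integrations is homologous."""
    return 1.0 - (1.0 - f) ** n

def events_needed(f, confidence=0.95):
    """Smallest n giving at least `confidence` chance of one homologous event."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - f))

# Frequencies quoted in the notes for mammalian cells: 10^-3 to 10^-5.
for f in (1e-3, 1e-5):
    n = events_needed(f)
    print(f"f = {f:g}: about {n} integration events for a 95% chance")
```

Under these assumptions the answer runs from a few thousand events (f = 10⁻³) to a few hundred thousand (f = 10⁻⁵), which is why a selection scheme is needed rather than screening cells one at a time.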
The first crucial development for this technology was being able to grow the
totipotent cells from preimplantation embryos in culture in the lab; these
are called mouse embryonic stem cells (ES cells). The second crucial
development was to devise a clever way to select cells that had integrated a
DNA construct by homologous recombination.
Cells from the inner cell mass of a preimplantation embryo at the
blastocyst stage could be removed and cultured in the lab without the cells
losing their totipotency; i.e., even after being cultured in the lab for many
years these cells can still be introduced back into a preimplantation embryo and
go on to make all the tissues of a mouse. What this means, is that the cells
can be genetically manipulated whilst in culture…and then put back into a
mouse preimplantation embryo!!
[Figures by MIT OCW. Recoverable content of the three panels:]

Panel 1: Preimplantation blastocyst from an embryo that would produce a mouse
with GREY FUR. Can remove totipotent EMBRYONIC STEM CELLS (ES cells) and
culture in vitro.

Panel 2: Specifically replace your gene of interest (α or β-globin genes)
with a mutated version of that gene in cultured ES cells. The targeting
construct carries NeoR and TKHSV. Select for the NeoR gene and against the
TKHSV gene. The only cells to survive have undergone a targeted homologous
recombination event at the gene of interest. Select for the genetically
altered cells you want.

Formation of ES Cells Carrying a Knockout Mutation (Gene X replacement
construct carrying neoR and tkHSV):
                      Nonhomologous recombination    Homologous recombination
Insertion             Random, into other genes       Gene-targeted, into gene X
Gene X                No mutation in gene X          Mutation in gene X
G-418                 Resistant                      Resistant
Ganciclovir           Sensitive                      Resistant

Panel 3: Now you inject the genetically modified ES cells (originally from a
blastocyst for a mouse with GREY FUR) into a new blastocyst that would
normally give rise to a mouse with WHITE FUR. The blastocyst, now containing
two types of totipotent embryonic stem cells, is implanted into a foster
mother; she will give birth to the chimeric offspring.
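The logic of the double selection in the figure can be sketched as a small truth table. This is only an illustration: it assumes, as the figure indicates, that a randomly inserted construct keeps both neoR and tkHSV while a gene-targeted insertion keeps neoR but loses tkHSV; the function name and the cell labels are hypothetical.

```python
def survives(has_neo, has_tk, g418=True, ganciclovir=True):
    """Return True if a cell survives the drugs present in the medium."""
    if g418 and not has_neo:
        return False   # G-418 kills cells lacking the neoR gene
    if ganciclovir and has_tk:
        return False   # ganciclovir kills cells expressing tkHSV
    return True

# (has_neo, has_tk) for the three kinds of cell in the culture dish
cells = {
    "no integration": (False, False),
    "random (nonhomologous) insertion": (True, True),
    "targeted (homologous) insertion": (True, False),
}

survivors = [name for name, (neo, tk) in cells.items() if survives(neo, tk)]
print(survivors)
```

Only the targeted insertion passes both selections, which is how the rare homologous recombinants are enriched.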
Essentially, once you have identified mouse ES cells (originally from a grey
furred mouse) that have been genetically altered the way you wish…these cells
can be used to generate a living animal that contains descendants from these
totipotent ES cells. Let's see how you get from there to a mouse in which
every cell contains that genetic alteration.
[Slide: the foster mom gives birth to pups, some of which are chimeric.]
The goal is to have the GERM CELLS (sperm and eggs) derived from the
genetically modified ES cells; if so all the offspring would have GREY FUR
when mated with a white mouse (grey fur is a dominant trait).
Since the “grey” ES cells were heterozygous for the KO’d gene, only half the
sperm have the KO gene, so 50% of the grey offspring are heterozygous for the
KO.
Cross of two mice heterozygous for the knocked out gene: αM +/- X αM +/-
Offspring:  αM +/+ (25%)   αM +/- (50%)   αM -/- (25%)
Homozygous mutant mice… Viable??
The blastocysts implanted into the foster mother will produce animals with
varying contributions from the “white fur ES cells” and the “grey fur ES cells”,
the latter having been genetically manipulated to have an altered gene, e.g., a
mutated α-globin gene. The crucial step is that the gonads be derived from
the genetically altered “grey fur ES cells”, because then the genetic alteration
can be passed on to an offspring (which will have grey fur) in which every cell
carries the genetic alteration. These offspring can then be crossed to generate
a mouse that is homozygous for the altered gene. This can be done for
generating mice with deletion mutations in the α-globin gene and then again
for deletion mutations in the β-globin gene.
[Slide: mating scheme. Both parents carry the human βSH and αH transgenes;
for each of the mouse α-globin and β-globin genes they have one wild-type
copy and one copy disrupted by neoR.]

This is essentially an AaBb X AaBb cross where the A and B genes lie on
different chromosomes and are therefore unlinked.

Each cell gives the mouse α-globin genotype followed by the mouse β-globin
genotype:

                               SPERM
EGGS         αM+ βM+     αM+ βM-     αM- βM+     αM- βM-
αM+ βM+      +/+ +/+     +/+ +/-     +/- +/+     +/- +/-
αM+ βM-      +/+ +/-     +/+ -/-     +/- +/-     +/- -/-
αM- βM+      +/- +/+     +/- +/-     -/- +/+     -/- +/-
αM- βM-      +/- +/-     +/- -/-     -/- +/-     -/- -/-

The human transgenes are homozygous in both parents and so will be present
in all offspring.
1/16 offspring have the desired genotype (-/- -/-).
There are many different mating schemes that one could use to generate mice
that are homozygous for deletions in both the mouse α-globin gene and the
mouse β-globin gene, and that also carry the transgenes encoding the
human α-globin gene and the human β-globin gene with the sickle cell
mutation. What I have shown you is just one way to obtain this mouse. It
should be noted that after birth, this mouse ONLY expresses human
hemoglobin, and the mouse is therefore said to be humanized.
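The 1/16 expectation for the doubly homozygous knock-out can be checked by enumerating the cross. This sketch assumes the scheme described above: both parents are heterozygous (+/-) at the unlinked mouse α-globin and β-globin loci, and the human transgenes are ignored because they are homozygous in both parents. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Each gamete carries one allele of the alpha gene and one of the beta gene.
gametes = list(product("+-", repeat=2))   # 4 equally likely gamete types

counts = {}
for egg, sperm in product(gametes, repeat=2):
    alpha = "".join(sorted(egg[0] + sperm[0]))   # "++", "+-" or "--"
    beta = "".join(sorted(egg[1] + sperm[1]))
    counts[(alpha, beta)] = counts.get((alpha, beta), 0) + 1

total = sum(counts.values())                      # 16 egg x sperm combinations
desired = Fraction(counts[("--", "--")], total)   # both mouse genes knocked out
print(desired)
```

The same table shows that the double heterozygote is the most common class (4 of 16), so most pups from this cross are useful only for further breeding.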
Was it all worth it? Do we have a Sickle Cell Disease mouse model?
[Slide: Mouse RBCs Sickle!! Sickled mouse RBCs clog the small blood vessels
in tissues. The SCD mouse has a huge spleen (compared with wt)…working
overtime to clear defective RBCs.]
The outstanding news is that this mouse does indeed represent an excellent
model of Sickle Cell Disease which is now being used to explore therapies for
SCD that are very difficult to carry out on human populations. So far, these
mice have been used to explore the effectiveness of new drugs in ameliorating
the tendency of RBCs to sickle. Moreover, the mouse has been used to test out
Gene Therapy approaches to treating the disease. Both of these approaches
have been successful in the mouse, paving the way for trying out these
treatments in people.
[Images removed due to copyright reasons: lung of control mice and lung of
mice taking Niprisan. Please see figure 4 in Iyamu, E. W., E. A. Turner, and
T. Asakura. "Niprisan (Nix-0699) Improves the Survival Rates of Transgenic
Sickle Cell Mice Under Acute Severe Hypoxic Conditions." Br J Haematol. 122,
no. 6 (Sep. 2003): 1001-8.]
Gene therapy in the SCD mouse:
• Isolate mouse Bone Marrow stem cells
• Transfect with Human β-globin gene that produces a protein that prevents
sickling
• Put the modified bone marrow back into a mouse
• Monitor expression of the transgene and the health of the mouse
[Images removed due to copyright reasons: circulating RBCs and kidney tissue
damage in the Sickle Cell Disease (SCD) Mouse and in the SCD Mouse After Gene
Therapy.]
Lecture 25
Population Genetics
Until now, we have been carrying out genetic analysis of individuals; for the
next three lectures we will consider genetics from the point of view of
groups of individuals, or populations.
We will treat this subject entirely from the perspective of human population
studies where population genetics is used to get the type of information
that would ordinarily be obtained by breeding experiments in experimental
organisms.
At the heart of population genetics is the concept of allele frequency
Consider a human gene with two alleles: A and a
The frequency of A is f(A) ; the frequency of a is f(a)
Definition: p = f(A) q = f(a)
p and q can be thought of as probabilities of selecting the given alleles by

random sampling. For example, p for a given population of humans is the
probability of finding allele A by selecting an individual from that population
at random and then selecting one of their two alleles at random.
Since p and q are probabilities and in this example there are only two
possible alleles;
p+q=1
Correspondingly, there are three possible genotype frequencies:
f(A/A) + f(A/a) + f(a/a) = 1
We usually can't get allele frequencies directly but must derive them from
the frequencies of the different genotypes that are present in a population
p = f(A/A) + 1/2 f(A/a)
    (homozygote)  (heterozygote)
q = f(a/a) + 1/2 f(A/a)
Example: M and N are different blood antigens specified by alleles of the
same gene. The antigens are codominant so a simple blood test can
distinguish the three possible genotypes.
f(M/M) = 0.83, f(M/N) = 0.16, f(N/N) = 0.01
p = f(M) = 0.83 + 0.08 = 0.91
q = f(N) = 0.01 + 0.08 = 0.09
Note: we can get both p and q with just two of the genotype frequencies
because the three genotype frequencies must total to a frequency of 1.0:
f(M/M) + f(M/N) + f(N/N) = 1
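The M/N calculation above is a one-line formula, sketched here so it can be reused for other genotype frequencies. The function name is illustrative; the numbers are the ones from the example.

```python
def allele_frequencies(f_hom1, f_het, f_hom2):
    """Allele frequencies (p, q) from the three genotype frequencies.

    Each homozygote contributes all of its alleles to one class and each
    heterozygote contributes half of its alleles to each class.
    """
    p = f_hom1 + 0.5 * f_het
    q = f_hom2 + 0.5 * f_het
    return p, q

p, q = allele_frequencies(0.83, 0.16, 0.01)   # f(M/M), f(M/N), f(N/N)
print(round(p, 2), round(q, 2))
```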
Now let's think about how the inverse calculation would be performed. That
is, how to derive the genotype frequencies from the allele frequencies. To
do this we must make an assumption about the frequency of mating of
individuals with different genotypes. If we assume that the gametes mix at
random, we can calculate the compound probabilities of obtaining each
possible combination of alleles.
                      egg
               A (p)        a (q)
sperm  A (p)   A/A (p²)     A/a (pq)
       a (q)   A/a (pq)     a/a (q²)
Thus the genotype frequencies for the next generation are:
f(A/A) = p², f(A/a) = 2pq, f(a/a) = q²
We can now calculate the new allele frequency p1 for this generation using the
formula for deriving allele frequencies from genotype frequencies:
p1 = f(A/A) + 1/2 f(A/a)
   = p² + pq
   = p (p + q)
   = p
We obtain the simple but very important result that when mixing of gametes
occurs at random, the allele frequencies do not change from one generation
to the next.
This is a condition known as Hardy-Weinberg Equilibrium
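The algebra above can be checked numerically. This is a minimal sketch, assuming random mixing of gametes and nothing else (no selection, mutation, or drift); the function names are illustrative.

```python
def hardy_weinberg(p):
    """Genotype frequencies (A/A, A/a, a/a) expected under random mating."""
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q

def next_p(p):
    """Allele frequency of A in the next generation."""
    f_AA, f_Aa, _ = hardy_weinberg(p)
    return f_AA + 0.5 * f_Aa

p = 0.3
for generation in range(5):
    p = next_p(p)      # stays at 0.3 (up to rounding) every generation
print(round(p, 6))
```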
If we know the genotype frequencies and allele frequencies then we can ask
whether the population is in H-W equilibrium for that gene by determining
whether the genotype frequencies reflect random mixing of alleles. Consider
two different populations that have different genotype frequencies and
different allele frequencies.
                  M/M     M/N     N/N      p       q
US Caucasians     0.29    0.5     0.21     0.54    0.46
American Inuit    0.84    0.16    0.008    0.92    0.08
Although the allele frequencies are quite different, both populations have
the genotype frequencies and allele frequencies that fit H-W equilibrium.
Consider the two sample populations that have the same allele frequencies
but have different genotype frequencies.
                  A/A     A/a     a/a      p       q
Population I:     0.20    0.20    0.60     0.3     0.7
Population II:    0.09    0.42    0.49     0.3     0.7
Only population II satisfies H-W criteria: p² = 0.09, 2pq = 0.42, q² = 0.49
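The comparison of Populations I and II can be written as a small check: derive p from the observed genotype frequencies, compute the H-W expectations, and compare. The tolerance of 0.02 is an arbitrary assumption made for this sketch (it allows for rounding in the tabulated values); a real analysis would work from genotype counts with a statistical test.

```python
def fits_hw(f_AA, f_Aa, f_aa, tol=0.02):
    """True if observed genotype frequencies are within tol of p², 2pq, q²."""
    p = f_AA + 0.5 * f_Aa
    q = 1.0 - p
    expected = (p * p, 2.0 * p * q, q * q)
    observed = (f_AA, f_Aa, f_aa)
    return all(abs(o - e) <= tol for o, e in zip(observed, expected))

print(fits_hw(0.20, 0.20, 0.60))   # Population I
print(fits_hw(0.09, 0.42, 0.49))   # Population II
```

With the same tolerance the two M/N populations tabulated earlier also fit, even though their allele frequencies differ.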
Here is a helpful way to look at frequencies in H-W equilibrium:
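[Figure: genotype frequency (vertical axis, 0 to 1.0) plotted against allele
frequency, with p (A) running from 1.0 to 0 and q (a) running from 0 to 1.0.
The curves are ƒ(A/A) = p², ƒ(A/a) = 2pq and ƒ(a/a) = q²; at p = q = 0.5 the
heterozygote frequency peaks at 0.5 and each homozygote frequency is 0.25.]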
Before we needed at least two of the genotype frequencies to calculate

allele frequency but if we know that the population is in H-W equilibrium we
can get both allele frequencies and all genotype frequencies from just one of
the genotype frequencies or one of the allele frequencies.
How good is the random mating assumption in actual human populations? The
chief criterion necessary for a population to be in H-W equilibrium is random
mating among individuals in the population. These are some of the conditions
that affect the random mating assumption and therefore may affect H-W
equilibrium:
1) Genotypic effects on choice of partner:

Examination of allele frequencies and genotype frequencies for most
genes in the human populations reveals that they closely fit H-W
equilibrium. The implication is that in general, humans select their
mates at random with respect to individual genes and alleles. This may
seem odd given that personal experience says that choosing a mate is
anything but random. However the usual criteria for selecting mates
such as character, appearance, and social position are largely not
determined genetically and, to the extent that they are genetically
determined, these are all very complex traits that are influenced by a
large number of different genes. The net result is that our decision
of with whom we have children does not in general systematically
favor some alleles over others.
One of the exceptional conditions that produce a population that is
not in H-W equilibrium is known as Assortative Mating, which
means preferential mating between like individuals. For example,
individuals with inherited deafness have a relatively high probability
of having children together. But even this type of assortative mating
will only affect the genotype frequencies related to deafness.
2) New mutations:
Although new mutations continually arise, mutation rates are usually
sufficiently small that in any single generation their effect on allele
frequencies is negligible. As will be discussed in the next lecture, the
effect of mutations compounded over many generations can have a
significant effect on allele frequencies.
3) Selection (differences in survival or reproduction of different

genotypes)
Like new mutations, the effect of selection is usually small in any
single generation and therefore usually does not affect H-W
equilibrium. An exception would be a recessive lethal mutation that
would render the genotype frequency of the homozygote = 0
regardless of the genotype frequency of the heterozygote. As will be
discussed in the next lecture, the effect of selection can have a
significant effect over many generations.
4) Genetic drift/Founder effect:

For small populations only a small number of individuals pass their
alleles on to the next generation. Under these circumstances, chance
fluctuations in the alleles that are transmitted can cause significant
changes in allele frequency. These effects are usually insignificant
for large populations such as in the U.S.
To see how this would happen, consider a gene in a very large
population with a single major dominant allele A and 10 minor recessive
alleles a1, a2, a3 ... a10 with allele frequencies ƒ(a1) = ƒ(a2) = ƒ(a3) ... = 10⁻⁴
(and ƒ(A) ≈ 1).
Now imagine that a group of 500 individuals from this population move
to an island starting a new population. The aggregate frequency of
recessive alleles (an) is 10⁻³. Thus, only one of the recessive alleles will
likely be in the initial 1000 alleles included in the island population. If
the selected allele happens to be a1, the new frequencies in the island
population will be: ƒ(a1) = 10⁻³, and ƒ(a2) = ƒ(a3) = ƒ(a4) ... = 0.
Thus in a stochastic fashion, most of the minor alleles will be lost,
whereas an occasional rare allele will experience an increase in
frequency. The smaller the founding population the more likely that a
rare allele will be lost and the greater the increase in frequency
experienced by the alleles that happen to be selected.
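The island example can be put into numbers. This sketch assumes the 1000 founder alleles are an independent random sample from the large population; the variable names are illustrative.

```python
founders = 500
alleles_sampled = 2 * founders      # each founder carries two alleles
n_minor = 10                        # minor alleles a1 ... a10
freq_each = 1e-4                    # frequency of each minor allele

# Expected number of minor-allele copies among the founders (all ten combined)
expected_minor_copies = alleles_sampled * n_minor * freq_each

# Chance that one particular minor allele (say a2) is absent from the island
p_specific_lost = (1.0 - freq_each) ** alleles_sampled

# Frequency of a minor allele that arrives as a single copy
freq_if_one_copy = 1.0 / alleles_sampled

print(expected_minor_copies, round(p_specific_lost, 3), freq_if_one_copy)
```

Each minor allele has roughly a 90% chance of being lost at founding, while an allele that does arrive as a single copy jumps from 10⁻⁴ to 10⁻³, a tenfold increase.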
5) Migration of individuals between different populations

When individuals from populations with different allele frequencies
mix, the combined population will be in H-W equilibrium after one
generation of random mating. The combined population will be out of
equilibrium to the extent that mating is assortative.
If we are considering rare alleles we can make the following approximations

allowing us to avoid a lot of messy algebra in our calculations.
For f(a) = q, and f(A) = p,
If q << 1 then p ≈ 1
From H-W:
f(A/A) = p² ≈ 1, f(A/a) = 2pq ≈ 2q, f(a/a) = q²
Since most genetic diseases are rare, these approximations are valid for
many of the population genetics calculations that are of medical importance.
For example, albinism occurs in 1/20,000 individuals. Let's say that this
condition is due to a recessive allele a of a single gene that is in H-W
equilibrium.
f(a/a) = 5 x 10⁻⁵ = q²
q = √(5 x 10⁻⁵) = 7 x 10⁻³
f(A/a) = 2pq ≈ 2q = 1.4 x 10⁻²
We will now calculate the fraction of alleles for albinism that are in
individuals that are homozygous for albinism.
Number of alleles in homozygotes ≈ 2 x N (q²)     N = population size
Number of alleles in heterozygotes ≈ N (2q)
The ratio is: [2 x N (q²)] / [N (2q)] = q
Thus, for albinism (since q = 7 x 10⁻³) the fraction of alleles in homozygotes
is 7 x 10⁻³. That is, > 99% of the alleles are in heterozygotes.
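The albinism numbers can be reproduced without the rare-allele approximation, as a check that the approximation is harmless here. This sketch assumes H-W equilibrium and the 1/20,000 incidence quoted above; the variable names are illustrative.

```python
import math

incidence = 1 / 20000               # f(a/a) = q²
q = math.sqrt(incidence)            # allele frequency of a
p = 1.0 - q
carrier_frequency = 2.0 * p * q     # f(A/a), close to 2q

# Copies of allele a per individual, split by the genotype that carries them
in_homozygotes = 2.0 * q * q        # two copies in each a/a individual
in_heterozygotes = 2.0 * p * q      # one copy in each A/a individual
fraction_in_homozygotes = in_homozygotes / (in_homozygotes + in_heterozygotes)

print(round(q, 4), round(carrier_frequency, 4), round(fraction_in_homozygotes, 4))
```

Without the approximation the fraction of a alleles found in homozygotes still comes out equal to q, so about 99.3% of the alleles sit in unaffected carriers.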
Lecture 26
In this lecture we will consider how allele frequencies can change under the
influence of mutation and selection.
First consider the conversion of a wild type gene to an altered allele by
mutation:
A → a at rate µ, where µ = mutation rate (probability of a mutation/generation)
Δqmut = µ ƒ(A) = µp ≈ µ
Typical mutation rates vary from µ = 10⁻⁴ to 10⁻⁸
Thus, in the absence of any other effects, such as selection, for any given gene the
frequency of mutant alleles will increase a little each generation because of new
mutations
Consider the disease phenylketonuria (PKU), which is an autosomal recessive defect

in the enzyme phenylalanine hydroxylase. The absence of the enzyme prevents
phenylalanine from being metabolized causing unusually high levels of phenylalanine
in the body leading to severe mental retardation.
Say that for PKU, µ = 10⁻⁴. The frequency of PKU will then slowly increase each
generation.
When the allele frequency gets high enough selection against homozygotes will
counterbalance new mutations and q will stay constant. In order to treat selection
quantitatively we need an additional concept.
S = selective disadvantage; and fitness = 1–S
If a genotype has S = 0.75 then fitness = 0.25, meaning that individuals with this
genotype will reproduce at a rate of only 25% relative to an average individual.
Fitness can be thought of as a combination of survival and fertility.
Recall that for alleles in H-W equilibrium (random mating) the genotype
frequencies will be:
ƒ(A/A) = p², ƒ(A/a) = 2pq, ƒ(a/a) = q²
Genotype    frequency    after selection    Δ frequency
A/A         p²           p²                 0
A/a         2pq          2pq                0
a/a         q²           q² (1 – S)         –Sq²
Δqsel = –Sq²
In the steady state: Δqsel + Δqmut = 0, –Sq² + µ = 0, µ = Sq²
q = √(µ/S)
For PKU, q is 10⁻² and during human evolution S ≈ 1. Therefore, the estimated
value of µ is about 10⁻⁴. The actual mutation frequency is probably not this
high, and the relatively high q for PKU is probably due to a founder effect in
the European population or a balanced polymorphism (see below).
In modern times PKU can be treated by a low-phenylalanine diet so S < 1. So the
frequency of PKU should start to rise at a rate of at most Δqmut = 10⁻⁴ per
generation.
Thus, q will only increase by about 1% per generation and it will take a long
time for this change in environment to have a significant effect on disease
frequency.
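The recessive mutation-selection balance can be sketched in a few lines. It uses the same approximations as the notes (rare allele, Δqmut ≈ µ, Δqsel = –Sq²); the function names are illustrative, and the final step takes the extreme case in which treatment removes selection entirely (S = 0).

```python
import math

def equilibrium_q_recessive(mu, S):
    """Steady-state allele frequency for a recessive allele: q = sqrt(mu/S)."""
    return math.sqrt(mu / S)

def implied_mu_recessive(q, S):
    """Mutation rate needed to hold q steady against selection: mu = S q²."""
    return S * q * q

def delta_q(q, mu, S):
    """Net change in q per generation from mutation plus selection."""
    return mu - S * q * q

q_pku = 1e-2
mu = implied_mu_recessive(q_pku, S=1.0)           # about 1e-4, as in the notes
relative_rise = delta_q(q_pku, mu, S=0.0) / q_pku  # selection fully relaxed
print(mu, relative_rise)
```

Even with selection removed completely, q rises by only about 1% of its value per generation under these assumptions.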
Now let’s determine the steady state allele frequency for a dominant disease with
allele frequency q = ƒ(A). In contrast to the situation for recessive alleles, for
dominant alleles selection will operate against heterozygotes.
Note that for a rare dominant trait almost all affected individuals are
heterozygotes. q = ƒ(A/A) + 1/2 ƒ(A/a) ≈ 1/2 ƒ(A/a)
Genotype    frequency    after selection    Δ frequency
A/A         –            –                  –
A/a         2pq ≈ 2q     (1 – S) 2q         –2Sq
a/a         p²           p²                 0
Δqsel = 1/2 [Δ ƒ(A/a)] = 1/2 (–2Sq)
      = –Sq
(After selection, 2Sq heterozygotes are lost each generation but only 1/2 of their
alleles are A. So the net reduction in ƒ(A) is –Sq.)
In the steady state: Δqsel + Δqmut = 0, –Sq + µ = 0, µ = Sq
q = µ/S    For S = 1, q = µ
In other words, for dominant mutations with fitness = 0, the only instances of the
disease will be due to new mutations. This makes sense because mutant alleles
cannot be passed from one generation to the next. In this case, the frequency of
affected individuals will be 2µ, since each affected individual is a
heterozygote and ƒ(A/a) ≈ 2q.
When S<1 the frequency can get quite high. A good example of this is Huntington's
disease which has a late onset of degeneration of neuromuscular system at > 35
yrs. This disease is bad personally but doesn't decrease reproductive fitness
much.
For the final example of a balance between mutation and selection, consider an X-
linked recessive allele with frequency q = ƒ(a). For rare alleles the vast majority
of affected individuals who are operated on by selection are males, and new
mutations will increase the allele frequency Δqmut ≈ µ
Genotype    frequency    after selection    Δ frequency
XA Y        p            p                  0
Xa Y        q            (1 – S)q           –Sq
Note that in a population of equal numbers of males and females, 1/3 of the X
chromosomes will be in males.
Therefore,
Δqsel = 1/3 [Δ ƒ(Xa Y)] = 1/3 (–Sq)
      = –Sq/3
In the steady state: Δqsel + Δqmut = 0, –Sq/3 + µ = 0, µ = Sq/3
q = 3µ/S    For S = 1, q = 3µ
For X-linked recessive mutations with fitness = 0, exactly one third of the alleles
in a population will be new mutations. This relationship has been demonstrated for
the debilitating X-linked diseases hemophilia A and Duchenne muscular dystrophy.
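The dominant and X-linked results can be placed side by side in a short sketch. It uses the steady-state formulas derived above (q = µ/S and q = 3µ/S); the function names and the example mutation rate are illustrative assumptions.

```python
def equilibrium_q_dominant(mu, S):
    """Steady state for a rare dominant allele: q = mu / S."""
    return mu / S

def equilibrium_q_x_linked(mu, S):
    """Steady state for a rare X-linked recessive allele: q = 3 mu / S."""
    return 3.0 * mu / S

def fraction_new_mutations(mu, q):
    """Share of the mutant alleles present that arose in the current generation."""
    return mu / q

mu = 1e-5                                        # hypothetical mutation rate
q_dom = equilibrium_q_dominant(mu, S=1.0)        # equals mu
q_x = equilibrium_q_x_linked(mu, S=1.0)          # equals 3 mu
print(fraction_new_mutations(mu, q_dom), fraction_new_mutations(mu, q_x))
```

With S = 1 every dominant mutant allele is a new mutation, whereas for an X-linked recessive one third are new, matching the statement about hemophilia A and Duchenne muscular dystrophy.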
Balanced Polymorphism
Now we will consider a situation in which an allele is deleterious in the homozygous
state but is beneficial in the heterozygous state. The steady state value of q will
be set by a balance between selection for the heterozygote and selection against
the homozygote.
We will need a new parameter that represents the increased reproductive fitness
of the heterozygote over an average individual.
h = heterozygote advantage
Genotype   frequency    after selection   Δ frequency
A/A        p²           p²                0
A/a        2pq ≈ 2q     (1 + h) 2q        2hq
a/a        q²           (1 – S)q²         –Sq²
Δq = Δ ƒ(a/a) + 1/2 Δ ƒ(A/a) = –Sq² + 1/2 (2hq) = –Sq² + hq
Say S = 1, then Δq = 0 when q² = hq, i.e. h = q
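To see that this balance point is stable, the update can be iterated from a low starting frequency. This is only a sketch under the same approximation used above (Δq = –Sq² + hq, with no renormalization of genotype frequencies); the function name, starting frequency and number of generations are assumptions made for illustration.

```python
def balanced_polymorphism_equilibrium(h, S=1.0, q0=0.001, generations=5000):
    """Iterate q -> q + (-S*q**2 + h*q) and return the frequency reached."""
    q = q0
    for _ in range(generations):
        q = q + (-S * q * q + h * q)  # loss of a/a homozygotes, gain via A/a
    return q


for h in (0.01, 0.05, 0.16):
    q = balanced_polymorphism_equilibrium(h)
    print(f"h = {h}: q settles near {q:.4f} (h/S = {h:.4f})")
```

Starting from a rare allele, q rises until the heterozygote advantage is offset by the loss of homozygotes, at q = h/S.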
The possibility of a subtle selection for (or against) the heterozygote for an allele
that appears to be recessive means that in practice the estimates of µ from allele
frequencies are quite unreliable.
For example, q = 10⁻². This could mean µ = 10⁻⁴ and h = 0 or µ < 10⁻⁴ and h = 10⁻².
Since a 1% increase in heterozygote advantage would be essentially unmeasurable
we couldn't distinguish these possibilities.
The best understood case of balanced polymorphism is sickle-cell anemia.
The allele of hemoglobin known as HbS is recessive for the disease but is dominant
for malarial resistance. HbS is most prevalent in a number of different equatorial
populations where malaria is common: sub-Saharan Africa, the Mediterranean, and
Southeast Asia.
In parts of Africa the frequency of the disease can be as high as ~ 2.6 %, which
means that in these populations q = 0.16.
During human history sickle cell disease would almost certainly be fatal thus S ≈ 1
and therefore h must have been about 0.16. This indicates that during evolution
the reproductive advantage for an HbS heterozygote is 16%.
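The arithmetic behind these numbers is short enough to spell out. A minimal sketch, assuming random mating so that disease incidence equals q² (the function name is hypothetical):

```python
import math


def allele_frequency_from_incidence(incidence):
    """For a recessive disease under random mating, incidence = q**2."""
    return math.sqrt(incidence)


q = allele_frequency_from_incidence(0.026)  # disease frequency of ~2.6 %
S = 1.0                                     # homozygotes assumed not to reproduce
h = S * q                                   # balance condition: -S*q**2 + h*q = 0
print(f"q = {q:.2f}, implied heterozygote advantage h = {h:.2f}")
```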
Many of the most prevalent genetic diseases are suspected to be at a relatively
high frequency because of balanced polymorphism.
Cystic Fibrosis: Autosomal recessive mutations in CFTR (Cystic fibrosis
transmembrane conductance regulator). Mutations disrupt Cl– transport, leading to
disturbed osmotic balance across epithelial cell layers of the lungs and intestine.
Incidence in European populations ≈ 1/2000. Thus, q = √(1/2000) ≈ 0.02
This high frequency is probably not due to either high mutation frequency or
founder effect (many different alleles have been found although 70% are ΔF508).
The hypothesis is that heterozygotes may be more resistant to bacterial
infections that cause diarrhea such as typhoid or cholera and that this selection
was imposed in densely populated European cities.
A second example is a set of different autosomal recessive lysosomal storage
disorders

Disease        Enzyme               Allele frequency (maximum)
Gaucher        glucocerebrosidase   0.03
Tay-Sachs      hexosaminidase A     0.017
Niemann-Pick   sphingomyelinase     0.01
All three enzymes are involved in breakdown of glycolipids in the lysosome. When
these enzymes are defective (in individuals homozygous for the disease allele)
excessive quantities of glycolipids build up in cells and can have pathological
effects. In particular all three diseases are characterized by mental retardation
because of excess glycolipids in neurons.
All three diseases are ~ 100x more common in Ashkenazi Jewish populations.
This group arrived in central Europe in the 9th century AD and is currently distributed
among the US, Israel, and the former Soviet Union. The competing theories to explain
the unusually high allele frequencies are balanced polymorphism or founder effect.
Lecture 27
Effects of Inbreeding:
Today we will examine how inbreeding between close relatives (also known as
consanguineous matings) influences the appearance of autosomal recessive traits.
Note that inbreeding will not make a difference for dominant traits, because they need
only be inherited from one parent, or for X-linked traits in males, since these are
inherited only from the mother.
Consider an extreme case of inbreeding, namely a brother-sister mating.
[Pedigree figure: brother-sister mating, with the genotype of the child in question.]
A useful concept is the Inbreeding Coefficient = F which is defined as the likelihood of
homozygosity by descent at a given locus.
If we consider a locus with different alleles in each grandparent: A1, A2, A3, A4,
F is the probability that the grandchild will be either A1/A1, A2/A2, A3/A3, A4/A4
p(A1/A1) = 1/2 . 1/2 . 1/4 = 1/16
p(A2/A2) = " = 1/16
p(A3/A3) = " = 1/16
p(A4/A4) = " = 1/16
p(homozygous by descent) = 4 . 1/16, so F = 1/4
A brother-sister mating is the simplest case but is of little practical consequence in human
population genetics since all cultures have strong taboos against this type of
consanguineous mating and the frequency is extremely low.
However, 1st cousin marriages do happen at an appreciable frequency. Let's calculate F
for offspring of 1st cousins.
p(A1/A1) = 1/2 . 1/2 . 1/2 . 1/2 . 1/4 = 1/64
p(A2/A2) = “ = 1/64
p(A3/A3) = “ = 1/64
p(A4/A4) = “ = 1/64
p(homozygous by descent) = 4 . 1/64 , F for 1st cousins = 1/16
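Both calculations follow the same pattern: a grandparental allele must reach each mate (a factor of 1/2 per transmission) and then be passed on by both (1/4). The sketch below encodes just that pattern with exact fractions; the function and parameter names are hypothetical, and it assumes two shared grandparents carrying four distinct alleles, as in the examples above.

```python
from fractions import Fraction


def prob_homozygous_for_one_allele(steps_to_each_mate):
    """Chance that one specific grandparental allele ends up homozygous.

    steps_to_each_mate: transmissions from the shared grandparent down to each
    mate (1 for brother-sister, 2 for 1st cousins).
    """
    half = Fraction(1, 2)
    reaches_both_mates = half ** (2 * steps_to_each_mate)
    both_mates_transmit = half * half
    return reaches_both_mates * both_mates_transmit


def inbreeding_coefficient(steps_to_each_mate, founder_alleles=4):
    """F = sum over the four grandparental alleles A1..A4."""
    return founder_alleles * prob_homozygous_for_one_allele(steps_to_each_mate)


print("brother-sister:", inbreeding_coefficient(1))
print("1st cousins:   ", inbreeding_coefficient(2))
```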
Consider a rare recessive allele a at frequency f(a) = q = 10⁻⁴
For random mating the frequency of homozygotes is f(a/a) = q² = 10⁻⁸
Imagine a hypothetical situation where only 1st cousins mated. In that case the
frequency of homozygotes would be:
f(a/a) = p (homozygous by descent) x p(allele is a)
= F x q
f(a/a) = 1/16 x q = 6.3 x 10⁻⁶
Thus there would be about 600 times more affected individuals for 1st cousin matings than
for random mating. But 1st cousin marriages are rare and their actual impact on the
frequency of homozygotes in a population will depend on the frequency of 1st cousin
marriages.
In the U.S. the frequency of 1st cousin marriages is ≈ 0.001
p (affected because of 1st cousin mating) = 1/16 x q x 10⁻³ = 6.3 x 10⁻⁹
p (affected because of random mating) = q² = 10⁻⁸
Thus, ~1/3 of affected individuals will come from 1st cousin marriages.
Note that this proportion depends on allele frequency such that traits caused by very
rare alleles will more often be the result of consanguinity.
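The dependence on allele frequency is easy to tabulate. A minimal sketch, using the same approximations as above (random-mating contribution q², cousin contribution F x q x marriage frequency); the function name and default arguments are illustrative assumptions.

```python
def fraction_from_cousin_marriages(q, cousin_marriage_freq=0.001, F=1 / 16):
    """Share of affected (a/a) individuals whose parents are 1st cousins."""
    from_cousins = F * q * cousin_marriage_freq
    from_random_mating = q ** 2
    return from_cousins / (from_cousins + from_random_mating)


for q in (1e-2, 1e-3, 1e-4, 1e-5):
    share = fraction_from_cousin_marriages(q)
    print(f"q = {q:.0e}: {share:.1%} of affected individuals from 1st cousin marriages")
```

The share climbs steeply as q falls, which is why an excess of related parents among affected individuals is a useful sign that a rare disorder is inherited.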
For rare diseases, it is often quite difficult to tell whether or not they are of genetic
origin. A useful method to identify disorders that are likely to be inherited is to ask
whether an unusually high proportion of affected individuals have parents that are
related to one another.
Now let's consider the problem of recessive lethal mutations in the genome:
We have already seen that the frequencies of recessive, loss of function alleles are
usually in the range of 10⁻³ to 10⁻⁴
This may seem like a comfortably small number but given that the total number of human
genes is about 2 x 10⁴, each of us must be carrying many recessive alleles. Assuming that
about 50% of genes are essential, each person should carry an average of approximately
1-10 recessive lethal mutations!
Genetic Load: lethal equivalents per genome.
Usually the genetic load is not a problem since it is very unlikely that both parents will
happen to have lethal mutations in the same genes. However, that chance is considerably
increased for parents that are 1st cousins.
As we have already calculated, the probability that a grandparental allele will become
homozygous is 1/64 for 1st cousins.
Thus, each recessive lethal allele for which one of the grandparents is a carrier will
contribute an increased probability of 1/64 ≈ 0.016 that the grandchild will be homozygous
and therefore be afflicted by a lethal inherited defect.
To look for this effect we will use the frequency of stillbirth or neonatal death from 1st
cousin marriages. We must also be careful to subtract the background frequency of
stillbirths and neonatal deaths that are not due to genetic factors. These frequencies
can be obtained from the cases where parents are not related.
                                   unrelated parents   1st cousins   difference
Observed frequency of stillbirth   0.04                0.11          0.07
or neonatal death

Average number of recessive lethals in both grandparents = 0.07/0.016 = 4.4
Thus each grandparent has an average of 2.2 recessive lethal alleles.
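The same estimate written out as a short calculation. This is only a restatement of the arithmetic above; the variable names are illustrative, and it uses the exact per-allele risk of 1/64 rather than the rounded 0.016, so the result comes out slightly higher than 4.4.

```python
observed_unrelated = 0.04   # stillbirth or neonatal death, unrelated parents
observed_cousins = 0.11     # same, 1st cousin parents
risk_per_lethal = 1 / 64    # chance a grandparental lethal becomes homozygous

excess = observed_cousins - observed_unrelated
lethals_in_both_grandparents = excess / risk_per_lethal
lethals_per_grandparent = lethals_in_both_grandparents / 2

print(f"excess risk = {excess:.2f}")
print(f"lethals in both grandparents = {lethals_in_both_grandparents:.2f}")
print(f"lethals per grandparent = {lethals_per_grandparent:.2f}")
```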
7.03 Lecture 28-30 11/17/03, 11/19/03, 11/21/03
Lecture 28: Polymorphisms in Human DNA Sequences
•SNPs
•SSRs
The methods of genetic analysis that you have been learning are applicable to mammals — even
to humans. However, we need to combine these genetic principles with an understanding of the
physical realities of the human genome. To genetics we will add genomics.
Eukaryotic Genes and Genomes
genome = DNA content of a complete haploid set of chromosomes
       = DNA content of a gamete (sperm or egg)

Species           Chromosomes   cM     DNA content/   year sequence                genes/
                                       haploid (Mb)   completed                    haploid
E. coli           1             N/A    5              1997                         4,200
S. cerevisiae     16            4000   12             1997                         5,800
C. elegans        6             300    100            1998                         19,000
D. melanogaster   4             280    180            2000                         14,000
M. musculus       20            1700   3000           2002 draft; 2005 finished?   30,000?
H. sapiens        23            3300   3000           2001 draft; 2003 finished    30,000?

Note: cM = centimorgan = 1% recombination
Mb = megabase = 1 million base-pairs of DNA
Kb = kilobase = 1 thousand base-pairs of DNA
Let's add some columns to a table we constructed several lectures back:

Species           cM     DNA content/   generation   design     true breeding
                         haploid (Mb)   time         crosses?   strains?
E. coli           N/A    5              30 min       yes        yes
S. cerevisiae     4000   12             90 min       yes        yes
C. elegans        300    100            4 d          yes        yes
D. melanogaster   280    180            2 wk         yes        yes
M. musculus       1700   3000           3 mo         yes        yes
H. sapiens        3300   3000           20 yr        no         no

You might add a column indicating the number of offspring per adult. What are the implications of
this table for human genetic studies? Obviously they're difficult.
More specifically:
• Human genetics is retrospective (vs prospective). Human geneticists cannot test
hypotheses prospectively. The mouse provides a prospective surrogate.
• Can’t do selections
• Meager amounts of data. Human geneticists typically rely upon statistical arguments
as opposed to overwhelming amounts of data in drawing connections between
genotype and phenotype.
• Highly dependent on DNA-based maps and DNA-based analysis
The unique advantages of human genetics:
• A large population which is self-screening to a considerable degree
• Phenotypic subtlety is not lost on the observer
• The self interest of our species
Let's consider the types and frequency of polymorphisms at the DNA level in the human genome.
A locus is said to be polymorphic if two or more alleles are each present at a frequency of
at least 1% in a population of animals.
DNA polymorphisms are of many types, including substitutions, duplications, deletions, etc. Two
types of DNA polymorphisms are of particular importance in human genetics today:
1) SNPs = single nucleotide polymorphisms = single nucleotide substitutions
In human populations: Hnuc = average heterozygosity per nucleotide site = 0.001
This means that, on average, at a randomly selected locus, two randomly selected human alleles
(chromosomes) differ at about 1 nucleotide per 1000. This implies that your maternal genome (the
haploid genome that you inherited from your mother) differs from your paternal genome at about 1
nucleotide per 1000.
Similarities and differences: This also implies that the genomes of any two individuals are 99.9%
identical. Conversely, the genomes of two randomly selected individuals will differ at several million
nucleotides. (Identical twins are a notable exception.)
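The "several million" figure follows from multiplying Hnuc by the size of the haploid genome. A minimal sketch, assuming the 3000 Mb haploid genome size quoted earlier for H. sapiens (variable names are illustrative):

```python
haploid_genome_bp = 3000 * 1_000_000  # 3000 Mb, from the genome table
h_nuc = 0.001                         # average heterozygosity per nucleotide site

expected_differences = haploid_genome_bp * h_nuc
identity = 1 - h_nuc

print(f"expected differences between two haploid genomes: {expected_differences:,.0f}")
print(f"sequence identity: {identity:.1%}")
```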
The great majority (probably 99%) of SNPs are selectively “neutral” changes
of little or no functional consequence:
• outside coding or gene regulatory regions (>97% of human genome)
• silent substitutions in coding sequences
• some amino acid substitutions do not affect protein stability or function
• disadvantageous SNPs selected against --> further underrepresentation
A small minority of SNPs are of functional consequence and are
selectively advantageous or disadvantageous.
Affymetrix chip to identify SNPs
Image removed due to copyright considerations.
6000 datapoints, tabular and visual views of the data.
Note that only 1500 are showing in the image on the left, a few hundred at most on the right.
Following slides show how we visualize data.
PEDIGREE: DOMINANT TRAIT WITH SUPPRESSOR SEGREGATING
It looks like we've been lucky. Allele A at SSR37 appears to segregate with HD. But can you be
confident that the HD gene is in close proximity to the SSR37 locus, or even that it is on
chromosome 4?
AKR HAS A GENE THAT SUPPRESSES TUMORS

Parents:   C57black (TUMORS)   X   AKR (NON-TUMORS)
           AAbb                    aaBB
F1:        AaBb, all normal
F2:        13/16 normal : 3/16 tumors
           normal: A-B-, aaB-, aabb
           tumors: A-bb
LACTOSE
[Structure diagrams: lactose, a galactose residue joined to a glucose residue by a
β(1,4)-glycoside linkage; cellobiose, two glucose residues joined by a β(1,4)-glycoside
linkage.]
CANDIDATE GENE
LACTOSE TOLERANCE
LACTASE GENE
SNP
2) SSRs = simple sequence repeat polymorphisms = "microsatellites"
Most common type in mammalian genomes is CA repeat:
(CA)n on one strand paired with (GT)n on the other, flanked by PCR primer #1 and
PCR primer #2. Alleles differ in the number of repeats, n:

allele   n
A        11
B        12
C        13
D        14
E        15
F        16

[Figure: PCR products separated by gel electrophoresis. Each lane shows the two alleles
of one genotype: AB, CD, EF, AD, CF.]
Lectures 29-30: Statistical Evaluation of Genetic Linkage
•Phase
•Lod scores
genetic linkage mapping
We genotype the six members of the family for SSRs scattered throughout
the genome (which spans 3300 cM).
[Map figure: markers SSR12, SSR112, SSR31, SSR37 and SSR5 on a 20 cM scale, with the
position of HD unknown.]
One SSR must be within 10 cM of the Huntington's gene.
LOD0.06(family 1) = log10 (0.024/0.0039) = log10 (6.25) = 0.796
Same for families #2 and #3:
LOD0.06 (families 1, 2, 3) = 3 x 0.796 = 2.388
Family #4:
Maternal alleles   child 1   child 2   child 3   child 4
HD locus           HD        HD        +         HD
SSR37              D         D         D         D
P if linked at 0.06 = 1/2 (P if phase 1) + 1/2 (P if phase 2)
= 1/2 (0.47 x 0.47 x 0.03 x 0.47) + 1/2 (0.03 x 0.03 x 0.47 x 0.03) = 0.0016
LOD0.06 (family 4) = log10 (0.0016/0.0039) = log10 (0.41) = - 0.387
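The LOD calculations above can be reproduced with a few lines of code. This is a sketch under stated assumptions: each child is scored as nonrecombinant or recombinant under phase 1, each nonrecombinant gamete has probability (1 – θ)/2 and each recombinant gamete θ/2, and unlinked loci give 1/4 per child. The function names are hypothetical, and family 1 is assumed to have four children who are all nonrecombinant under phase 1.

```python
import math


def likelihood_if_linked(theta, n_nonrecomb, n_recomb, phase_known=False):
    """P(children | linked at theta); counts are scored under phase 1."""
    nonrec = (1 - theta) / 2  # 0.47 when theta = 0.06
    rec = theta / 2           # 0.03 when theta = 0.06
    phase1 = nonrec ** n_nonrecomb * rec ** n_recomb
    if phase_known:
        return phase1
    phase2 = rec ** n_nonrecomb * nonrec ** n_recomb  # roles swap under phase 2
    return 0.5 * phase1 + 0.5 * phase2


def lod_score(theta, n_nonrecomb, n_recomb, phase_known=False):
    """log10 of the odds of linkage at theta versus no linkage."""
    unlinked = 0.25 ** (n_nonrecomb + n_recomb)
    linked = likelihood_if_linked(theta, n_nonrecomb, n_recomb, phase_known)
    if linked == 0:
        return float("-inf")
    return math.log10(linked / unlinked)


family_1 = lod_score(0.06, 4, 0)  # assumed: 4 children, none recombinant
family_4 = lod_score(0.06, 3, 1)  # three HD-D children, one +-D child
print(f"family 1: {family_1:.3f}")
print(f"family 4: {family_4:.3f}")
print(f"all four families: {3 * family_1 + family_4:.3f}")
```

Because the code does not round intermediate values, the family 4 score comes out near -0.40 rather than the -0.387 obtained above from 0.0016/0.0039.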

LOD0.06 (families 1, 2, 3, 4) = 2.388 – 0.387 = 2.001
Still not sufficient to publish. What to do?
1. It's tempting to ignore family 4 and declare it to be irrelevant for some
reason or another. But this would not be an acceptable solution.
2. Calculate LOD scores for other values?
[Plot: LOD score (vertical axis, –0.5 to 2.5) against the assumed recombination
fraction (horizontal axis, 0 to 0.6).]
3. Get more families (always a good idea)
4. Determine phase in affected parents
In each of the four families, we were uncertain about phase, and our
LOD calculations embodied those uncertainties.
Family #4: two possible arrangements of alleles on mother's chromosomes

           HD locus   SSR37
Phase 1:   HD         D
           +          B
Phase 2:   HD         B
           +          D
Typing the maternal grandparents for SSR37:
[Pedigree figure for family #4 with a gel showing SSR37 alleles A to E for each
family member.]
Now we can deduce the phase in the mother:

           HD locus   SSR37
Phase 1:   HD         D
           +          B
Phase 2:   HD         B
           +          D
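Here is a more realistic version of the genotypic information we might obtain:
[Pedigree figure for family #4 with a gel showing SSR37 alleles A to E. The maternal
grandparents are annotated "dead" or "refused consent", and their genotypes are
marked "inferred".]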
Before we had written:
P if linked at 0.06 = 1/2 (0.47 x 0.47 x 0.03 x 0.47) + 1/2 (0.03 x 0.03 x 0.47 x 0.03) = 0.0016
But we now know that phase 1 was correct, so the factor of 1/2 and the phase 2 term drop out:
P if linked at 0.06 = 0.47 x 0.47 x 0.03 x 0.47 ≈ 2 x 0.0016 = 0.0032
LOD0.06 (family 4, phase known) = log10 (0.0032/0.0039) = log10 (0.82) = – 0.086
We can sum the LOD0.06 scores for all four families:
LOD0.06 (families 1, 2, 3, 4) = 2.388 – 0.086 = 2.302
Overall effect of determining phase in all four families:
Add increment of log10(2) = 0.301 to each family’s LOD score.
LOD0.06 (families 1,2,3,4: all phased) =
LOD0.06 (families 1,2,3,4: unphased) + 4 log10 (2)
= 2.001 + 4 (0.301) = 3.205
Publish!
What if we had not been able to obtain samples from any grandparents?
Try more markers
Search for SSR marker showing no recombination with HD: Where to look?
[Map figure of chromosome 4: markers SSR34, SSR35, SSR36, SSR37 and SSR38 on a 20 cM
scale, with HD to one side or the other of SSR37 and a marker showing no recombination
with HD at the HD locus.]
LOD0 (families 1,2,3,4: unphased) = 4 x 0.903 = 3.612
Very strong conclusion!!
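The value 0.903 per family is log10 (8). A short sketch of where it comes from, assuming four children per family, a marker with no recombinants (θ = 0) and unknown phase; the function name is hypothetical.

```python
import math


def lod_with_no_recombinants(n_children, phase_known=False):
    """LOD at theta = 0 when every child is consistent with complete linkage."""
    linked = 0.5 ** n_children  # each child receives one of two parental haplotypes
    if not phase_known:
        linked *= 0.5           # only one of the two possible phases fits the data
    unlinked = 0.25 ** n_children
    return math.log10(linked / unlinked)


per_family = lod_with_no_recombinants(4)
print(f"per family: {per_family:.3f}")
print(f"four families: {4 * per_family:.3f}")
```

Four such families give a total of about 3.61, above the threshold of 3 even without phase information.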
Lecture 31: Genetic Heterogeneity and Complex Traits
• Allelic heterogeneity
• Nonallelic heterogeneity
• r (coefficient of relationship)
• Twin studies
• Sib-pair analysis
Each of the heritable human traits that we have discussed in recent weeks is
monogenic: invariably caused by mutation in the same gene.
Each of these traits was quite straightforward from a Mendelian perspective:
Sickle cell disease: autosomal recessive
Phenylketonuria: autosomal recessive
Huntington's disease: autosomal dominant
genetic homogeneity: all affected individuals have the same mutation in the same gene
ALLELIC HETEROGENEITY
cystic fibrosis (CF):
• autosomal recessive disorder affecting 1/2500 newborns in populations of European origin
• phenotype: sticky viscous secretions --> obstruction of pancreas and airways -->
pancreatic insufficiency (treated with enzyme supplements) + lung infections
• mapped to chromosome 7 by genetic linkage analysis in 1985
• gene identified at molecular level in 1989: encodes a chloride channel protein
• > 600 mutant alleles in the gene have been identified: allelic heterogeneity
Would such allelic heterogeneity affect the outcome of combining LOD scores from
multiple families with affected children?
No, because all such families would show linkage to the same chromosomal locus.
TWO MUSCULAR DYSTROPHIES
DUCHENNE BECKER
retinitis pigmentosa (RP):
• degeneration of retina (accompanied by deposits of pigment in retina) -->
progressive visual impairment --> blindness
• population prevalence of 1/3,000
• one of most common causes of blindness among middle aged in developed countries
• autosomal recessive inheritance in 84% of affected families
• autosomal dominant inheritance in 10% of affected families
• X-linked recessive inheritance in 6% of affected families
• At least 66 different genetic loci implicated
but RP appears to be result of a single gene mutation in any given family, at least in
most cases
NON-ALLELIC HETEROGENEITY
How could one begin to genetically dissect a trait like RP that shows
nonallelic heterogeneity?
Approach 1: Linkage analysis on large families with many affected individuals.
Because different families with RP may show linkage to different loci, combining LOD
scores from different families might obscure rather than clarify the situation.
However, this trap can be avoided if one can identify a family with sufficient
numbers of affected individuals (and informative meioses) to provide, by
itself, a LOD score of 3.
Approach 2: Direct search for mutations in candidate genes.
In some diseases, one can make good guesses as to the biochemical structures or
pathways that are likely sites of causative mutations. In such cases, a direct search
for mutations at the DNA sequence level in "candidate genes" can be an effective
strategy, even in the absence of any prior genetic linkage analysis.
This "candidate gene" approach will become increasingly prominent given:
• Complete sequence of human genome (rough draft published in 2001;
reference grade sequence expected in 2003)
• Falling cost of sequencing
Perhaps 10 years from now, scientists will routinely sequence the
entire genomes of individuals with unexplained phenotypes.
r = coefficient of relationship between two individuals
= likelihood of sharing by descent a given allele at a given locus
= expected proportion of all alleles (at all genes) that two
individuals share by descent
coefficient of relationship, r ≠ inbreeding coefficient, F (likelihood that an
individual is homozygous by descent at a given locus)

Relationship    degree   r
Parent-child    1st      1/2
Siblings        1st      1/2
Aunt/niece      2nd      1/4
First cousins   3rd      1/8
Cleft lip is a common birth defect. Its incidence in the general population is
about 0.001, but relatives of affected children are at higher risk:

Relatives of affected child   degree   % affected   Risk (relative to general population)
Sibs                          1st      4.1          x40
Children                      1st      3.5          x35
Aunts and uncles              2nd      0.7          x7
Nephews and nieces            2nd      0.8          x8
First cousins                 3rd      0.3          x3
Are these findings consistent with autosomal dominant inheritance of cleft lip?
No, because the percentages of 1st and 2nd degree relatives who are
affected are too low (would expect 50% and 25%, respectively).
Are these findings consistent with autosomal recessive inheritance of cleft lip?
No, because the percentage of affected siblings is too low (would
expect 25%) and because the risk in children is nearly as high as that
in siblings.
Phenotypic concordance in monozygotic (MZ; identical) and dizygotic (DZ; fraternal) twins
MZ twins arise when a developing embryo (derived from one zygote; fertilization of
one egg by one sperm) splits into two parts, each giving rise to a baby.
DZ twins arise from two separate, but nearly simultaneous fertilization events.

Relationship    degree   r
Parent-child    1st      1/2
Siblings        1st      1/2
Aunt/niece      2nd      1/4
First cousins   3rd      1/8
MZ twins        0        1
DZ twins        1st      1/2
Twin studies:
Concordance = both twins display phenotype in question
Discordance = one twin displays phenotype in question, other does not

                             Concordance Rates in
                             MZ twins   DZ twins   Interpretation
Huntington's disease         100%       50%        autosomal dominant
Sickle cell disease          100%       25%        autosomal recessive
Cystic fibrosis              100%       25%        autosomal recessive
Measles                      97%        94%        environmental (contagious)
Cleft lip                    40%        4%         environment + multiple genes
Insulin-dependent diabetes   30%        6%         environment, ≥1 gene
Coronary heart disease       46%        12%        environment, ≥1 gene
Schizophrenia                46%        14%        environment, ≥1 gene
male homosexuality

                    Concordance Rates
MZ twins            57%
DZ twins            24%
Non-twin brothers   13%

In the early 1990's, Dean Hamer and colleagues at NIH embarked on genetic studies of male
homosexuality. They phenotyped the individuals by asking them to answer a number of
questions about their sexuality: self-identification, attraction, fantasy, and
behavior --> bimodal distribution of scores.
Pedigree figures removed due to copyright considerations.
Hamer and colleagues then employed concordant sib-pair analysis, a
variation on conventional genetic linkage analysis that
1. requires no knowledge of mode of inheritance
2. is unaffected by incomplete penetrance
3. can tolerate some degree of non-allelic heterogeneity
Sib-pair analysis = search for nonrandom sharing of alleles between phenotypically
concordant sibs
Hamer and colleagues (Science 261:321-327 [1993]) identified 40 nuclear
families in which there were two homosexual brothers. In each of the 40
families, they studied the transmission of X-linked SSRs from the mother to
the homosexual sons. For an X-linked SSR, there are two possible
genotypes in each son, and thus there are four possible combinations of
genotypes in the two sons:

Parents: father C/Y, mother A/B

Son 1   Son 2
A/Y     A/Y     Identical by descent (IBD)
B/Y     B/Y     Identical by descent (IBD)
A/Y     B/Y     Nonidentical
B/Y     A/Y     Nonidentical
If the region of the X chromosome being tested plays no causal role in
male homosexuality, then the four possible combinations should be equally
likely, and identity by descent and nonidentity should be equally likely.

           IBD   Nonidentical
Expected   20    20

On the distal long arm of the X chromosome, Hamer and colleagues observed a dramatic
departure from random expectations among the 40 families:

           IBD   Nonidentical
Observed   33    7

Chi-square = Σ (O − E)²/E = (33 − 20)²/20 + (7 − 20)²/20 = 16.9
The table of critical values of the χ² distribution has been removed
due to copyright considerations.
p = probability, given the null hypothesis, of observing the data (or data
even more diverged from the null expectations)
p <<< 0.005
Suggests that a gene on distal long arm of X chromosome contributes
to the development of male homosexuality -- in some but not all cases.
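Since the table of critical values is not available, the test can be checked directly. The sketch below recomputes the chi-square statistic and, as an assumption not stated in the notes, treats the comparison as having one degree of freedom (two classes, IBD versus nonidentical). For one degree of freedom the upper-tail probability is erfc(√(χ²/2)); the function names are hypothetical.

```python
import math


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_one_df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


observed = [33, 7]   # IBD, nonidentical
expected = [20, 20]
chi2 = chi_square(observed, expected)
p = p_value_one_df(chi2)
print(f"chi-square = {chi2:.1f}, p = {p:.1e}")
```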
Lecture 32: Numerical Chromosomal Abnormalities and Nondisjunction
• Meiosis I
• Meiosis II
• Centromere-linked markers
Female 46,XX
Male 46,XY
Human chromosomal abnormalities may be numerical or structural.

Numerical                                    Total # chromosomes / cell
Trisomy = 3 copies of a single chromosome    47
Monosomy = 1 copy of a single chromosome     45
Triploidy = 3N                               69
Tetraploidy = 4N                             92

Structural
Deletion
Duplication
Translocation (involves 2 chromosomes)
Chromosomal abnormalities manifest themselves in two ways:
1) Spontaneous abortions
50% of human pregnancies --> spontaneous abortion or miscarriage,
nearly all during first trimester of pregnancy, with many during the first month,
when pregnancy is recognized only by hormonal assays
50% of spontaneously aborted embryos and fetuses have chromosomal abnormalities
Therefore about 25% (50% x 50%) of all human embryos have chromosomal abnormalities.
Breakdown of chromosomal abnormalities in spontaneous abortions:
Trisomy
    16                        15%
    13, 18, 21                9%
    XXX, XXY, XYY             1%
    All others                27%
Monosomy X (45,X or XO)       18%
Triploidy                     17%
Tetraploidy                   6%
Other                         7%
Total                         100%
Chromosomal abnormalities manifest themselves in two ways:
2) Defects in newborns: 0.5% aggregate frequency
Among the most common:
XXY 1 / 1,000 males

XYY 1 / 1,100 males
XO 1 / 7,500 females
XXX 1 / 1,200 females
Trisomy 13 1 / 15,000
Trisomy 18 1 / 11,000
Trisomy 21 1 / 900
Structural anomalies 1 / 400
                Sex chromosomes
                MALE   FEMALE                SEX DETERMINING SIGNAL
Fruitfly        XY     XX                    # of X's
Mammals         XY     XX                    Y (+ or -)
Nematodes       XO     XX (hermaphrodite)    # of X's
Birds           ZZ     ZW                    ?
Some Reptiles                                temperature
Trisomy 21 Down Syndrome
Numerical chromosomal disorders are the result of
nondisjunction = failure of chromosomes to separate normally during cell division
Nondisjunction can occur during meiosis (before fertilization) or mitosis (after
fertilization).
How could you figure out whether nondisjunction for chromosome 21 had occurred during
meiosis or mitosis?
[Pedigree figure with gel results for two chromosome 21 markers: SSR 21.1 (alleles
A, B, C, D) and SSR 21.2 (alleles A’, B’, C’, D’).]
What can you conclude? At least two things:
1. The presence in the affected child of two different maternal alleles
for SSR 21.1 indicates that nondisjunction occurred before fertilization
(in meiosis) in the mother.
2. There has been recombination between the two chromosome 21’s
in the mother prior to nondisjunction.
Human Male Meiosis
Meiosis I: separate homologs
Meiosis II: separate sister chromatids
[Figure: replication takes a cell from 46 chromosomes to 92 chromatids. Meiosis I
gives two cells with 46 chromatids each; Meiosis II gives four gametes with 23
chromosomes each.]
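Human Female Meiosis
[Figure: replication takes the cell from 46 chromosomes to 92 chromatids. Meiosis I
gives the 1st polar body and a cell with 46 chromatids each; Meiosis II gives two
products with 23 chromosomes each, one the gamete and the other the 2nd polar body.]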
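Normal chromosome 21 segregation:
[Figure: the two chromosome 21 homologs carry alleles A and B at one marker and A’ and
B’ at a second marker. After replication (46 --> 92), Meiosis I (46, 46) and
Meiosis II, there are 4 possible gametes with 23 chromosomes each: AA’, AB’, BA’, BB’.]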
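Nondisjunction in female meiosis I leading to trisomy:
[Figure: after replication (46 --> 92), both chromosome 21 homologs stay together at
Meiosis I, giving one cell with 48 chromatids and a 1st polar body with 44 (no Chr 21!).
After Meiosis II the gamete has 24 chromosomes, including two chromosome 21's
(AA’ or AB’) and (BA’ or BB’); the 2nd polar body (24) receives what's left.]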
Nondisjunction in female meiosis II leading to trisomy:
[Figure: after replication (46 --> 92), Meiosis I proceeds normally (46, 46, with a
1st polar body). At Meiosis II the sister chromatids of chromosome 21 fail to separate:
the gamete has 24 chromosomes, including AA’ and AB’, and the 2nd polar body has 22
(no Chr 21!).]
The key to distinguishing Meiosis I vs Meiosis II nondisjunction is
the centromere-linked marker, which will segregate as follows:
Proper disjunction A or B
Meiosis I nondisjunction A and B
Meiosis II nondisjunction (A and A) or (B and B)
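The rule above can be written as a small decision function. This is a sketch only: the function name and its arguments are hypothetical, and it assumes the marker is tightly linked to the centromere (no recombination between marker and centromere) and that the mother is heterozygous, so that the marker is informative.

```python
def classify_maternal_segregation(mother_alleles, child_maternal_alleles):
    """Classify segregation from the maternal centromere-linked alleles in a child.

    mother_alleles: the mother's two alleles, e.g. ("A", "B")
    child_maternal_alleles: maternal alleles found in the child, e.g. ["A", "B"]
    """
    if len(set(mother_alleles)) != 2:
        raise ValueError("marker is uninformative: mother must be heterozygous")
    if any(a not in mother_alleles for a in child_maternal_alleles):
        raise ValueError("child carries an allele not present in the mother")
    if len(child_maternal_alleles) == 1:
        return "proper disjunction"
    if len(child_maternal_alleles) == 2:
        first, second = child_maternal_alleles
        if first != second:
            return "meiosis I nondisjunction"   # both homologs: A and B
        return "meiosis II nondisjunction"      # sister chromatids: A and A, or B and B
    raise ValueError("expected one or two maternal alleles")


print(classify_maternal_segregation(("A", "B"), ["A"]))
print(classify_maternal_segregation(("A", "B"), ["A", "B"]))
print(classify_maternal_segregation(("A", "B"), ["B", "B"]))
```

In practice two identical maternal copies show up as a doubled band intensity rather than as two bands, so the second argument would come from dosage information as well as allele size.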
[Pedigree figure with gel results: SSR 21.1 (alleles A, B, C, D) is very close to the
centromere; SSR 21.2 (alleles A’, B’, C’, D’) is farther from the centromere.]
Interpretation: The data for SSR21.1, the centromeric marker, demonstrate that
nondisjunction occurred in maternal meiosis I.
Taken together, the SSR21.1 and SSR21.2 data demonstrate that recombination between the
mother's two chromosome 21’s occurred prior to nondisjunction.
Studies of many individuals with trisomy 21 using centromere-linked markers have
revealed the following breakdown of cases:
Nondisjunction in maternal meiosis:   88%
    Meiosis I:                        65%
    Meiosis II:                       23%
Nondisjunction in paternal meiosis:   8%
    Meiosis I:                        3%
    Meiosis II:                       5%
Post-zygotic mitosis:                 3%
The risk of trisomy 21 rises dramatically with increasing maternal age:
Age of Mother Incidence of trisomy 21
20 1 per 1925 births
30 1 per 885 births
35 1 per 365 births
40 1 per 110 births
45 1 per 32 births
50 1 per 12 births
Trisomy 21 provides the major rationale for advising pregnant
women 35 years of age or older to undergo amniocentesis
(examination of fetus' chromosomes by light microscopy).
In human females, oocytes enter but arrest in prophase of
meiosis I during fetal development.
Each oocyte remains arrested in prophase of meiosis I until that
individual oocyte is ovulated, as much as 50 years later!
An oocyte proceeds through meiosis II only after (and if) it is
fertilized.
Most environmental causes of cancer are mutagens:
mutagenic compounds, X-rays, uv
[Structure figure: Aflatoxin B and 3,4-benzpyrene. Enzymatic conversion to a reactive
epoxide is followed by spontaneous reaction with N7 of guanosine (G) in DNA.]
Cancer tends to arise in actively dividing cells
• Epithelial cells (lining of intestine, lungs etc.) = carcinoma
• Blood and lymphatic cells = leukemia, myeloma, lymphoma
• Connective tissue (bones, tendons, muscle) = sarcoma
Cancer is a genetic disease of somatic cells
The underlying cause is mutations that release cells from the normal constraints
that exist in well organized tissues, allowing uncontrolled growth
Which are the key genes that are mutated?
[Plot: stomach cancer rate per 100,000 (vertical axis, 0 to 800) against age
(horizontal axis, 0 to 85), with separate curves for males (1975), males (1992),
females (1975) and females (1992).]
Incidence of stomach cancer as a function of age.
Figure by MIT OCW.
Major complications in understanding the genetic basis of cancer
• Multiple mutations are necessary to produce a tumor cell
• Different types of tumor have different genes mutated
• Early initiating events occur rarely in complex tissues
and are therefore extremely difficult to detect
• The key initiating event often leads to an increase in mutation rate,
so tumor cells often bear many fortuitous mutations
Important aspects of the disease that we won't discuss
relate to the spread of cancer cells
and the formation of large tumors
(metastasis and angiogenesis)
lecture33_1 214
3T3 cells in culture Transformed 3T3 cells
lecture33_1 215
Isolation of the Ras oncogene from human tumor cells
Figure by MIT OCW.
lecture33_1 216
Synergistic effect of oncogenic forms of myc and ras
Figure: tumor-free mice (%) versus age (days, 0 to 200) for mice carrying myc, rasD, or myc + rasD.
lecture33_1 217
Oncogene:
dominant gain-of-function mutations
promote cell transformation
Tumor suppressor gene:
recessive, loss-of-function mutations
lecture33_2 218
lecture33_2 219
Figure: the cell cycle (G0, G1, S (DNA synthesis), G2, M (Mitosis), daughter cells), showing extracellular growth control signals and intracellular quality control checks.
Figure by MIT OCW.

lecture33_2 220
Cell cycle in S. cerevisiae
Figure: the budding yeast cell cycle (G1, START, S, G2, M) with landmark events: growth; spindle pole body duplication; bud emergence; DNA replication; spindle formation; nuclear migration; chromosome segregation and nuclear division; cytokinesis into mother cell and daughter cell.
Figure by MIT OCW.
lecture33_2 221

A Genetic Screen for Cell Cycle Mutants
Figure by MIT OCW.
lecture33_2 222
Wild type cdc28 mutant cdc7 mutant
lecture33_3 223
What is the basic organization of the cell cycle?
Are the steps of the cycle mechanistically linked?
"substrate-product" model
Is there an autonomous "cell cycle clock"?
lecture33_3 224
Opposite effects of cdc2 alleles in S. pombe
cdc2+ (wild type)
cdc2– (recessive)
cdc2D (dominant)
Figure by MIT OCW.
lecture33_3 225
There is an autonomous "cell cycle clock"
which is composed of cyclin + CDK
The amount of cyclins oscillates
throughout the cell cycle
However, the steps of the cycle are linked
by negative feedback loops
known as "checkpoint control" pathways
lecture33_3 226
Figure: the cell cycle (G0, G1, S (DNA synthesis), G2, M (Mitosis), daughter cells) with extracellular growth control signals and checkpoint control pathways. Checkpoint labels: completion of DNA synthesis; repair of DNA damage; formation of spindle and alignment of chromosomes.
Figure by MIT OCW.

lecture33_3 227
Image removed due to copyright reasons.
lecture33_3 228
General types of oncogenic mutations
Activation of growth control pathways,
signaling quiescent cells to divide inappropriately
Inactivation of checkpoints,
allowing cells with damaged DNA or misaligned chromosomes
to divide, allowing high mutation rates and chromosome imbalances
Inactivation of DNA repair genes,
allowing high mutation rates, causing other oncogenic mutations
lecture33_3 229
Genetics of Cancer
Lecture 34
lecture34 230
Alterations in different kinds of
Genes cause Cancer
Oncogenes
Tumor suppressor genes

Mutator genes
Usually recessive, loss-of-function mutations
that increase spontaneous and environmentally
induced mutation rates
lecture34 231
Most of the mutations that contribute to cancer occur in
somatic cells – but germ line mutations can also contribute
Figure: egg and sperm form a zygote. One lineage: growth and differentiation, mitotic divisions, endoderm, colon. Other lineage: mitotic divisions, 2 meiotic divisions, gametes (eggs or sperm).
lecture34 232
Most of the mutations that contribute to cancer occur in
somatic cells – but germ line mutations can also contribute
Figure: egg and sperm form a zygote. Somatic lineage: growth and differentiation, mitotic divisions, endoderm, colon. Germ line: mitotic divisions, 2 meiotic divisions, gametes (eggs or sperm).
lecture34 233
Signal Transduction and Growth Regulation
Figure labels: Growth Factor; Cytoplasmic signal transduction proteins; Nuclear proteins; Genes.
lecture34 234
Great Targets for Dominant Acting Oncogenes
Secreted growth factors, e.g., EGF, PDGF
Specific receptors for growth factors, e.g., RET, EGFR
Cytoplasmic signal transduction proteins: G-proteins, kinases and their targets, e.g., RAS, ABL
Nuclear proteins: transcription factors, e.g., MYC, JUN, FOS, (RB)
Other figure labels: Growth Factor; Genes.
lecture34 235
Receptor Tyrosine Kinases (RTKs)
Figure: a receptor spanning the membrane between exterior and cytoplasm, with an extracellular domain, a transmembrane domain, and a cytoplasmic domain containing the kinase active site.
Figure by MIT OCW.
lecture34 236
lecture34 237
Extracellular growth factor engages with and dimerizes specific receptors on cell surface
Dimerized receptor activates cascade of molecular events
Machinery for increased cell proliferation is mobilized
Please see Figure 1 in Zwick, E., J. Bange and A. Ullrich.
"Receptor Tyrosine Kinases as Targets for Anticancer Drugs."
Trends Mol Med. 8, no. 1 (Jan 2002): 17-23.
lecture34 238
Kinases
Transcription Factors
lecture34 239
Constitutive Activation converts RTKs
to Dominant Acting Oncogenes

Please see Figure 2 in Zwick, E., J. Bange and A. Ullrich.
"Receptor Tyrosine Kinases as Targets for Anticancer Drugs."
Trends Mol Med. 8, no. 1 (Jan 2002):17-23.
lecture34 240
Genetic alterations leading to
Constitutive Activation of RTKs
• Deletion of extracellular domain
• Mutations that stimulate dimerization
without ligand binding
• Mutations of kinase domain
• Overexpression of ligand
• Overexpression of receptor
lecture34 241
Two Classic Examples
Her2 receptor
EGF receptor
Her2 = Human Epidermal growth factor receptor 2
EGFR = Epidermal growth factor receptor
Please see Lodish, Harvey, et al. Molecular Cell Biology.
lecture34 242
EGF Receptors signal through the RAS G-protein

lecture34 243
Figure (repeated overview): secreted growth factors (e.g., EGF, PDGF); specific receptors for growth factors (e.g., RET, EGFR); cytoplasmic G-proteins, kinases and their targets (e.g., RAS, ABL); nuclear transcription factors (e.g., MYC, JUN, FOS); RB.
lecture34 244
cABL – A non-receptor, cytoplasmic tyrosine
kinase that can be converted into an
oncoprotein
• cABL proto-oncogene product
signals to many of the same
molecules as the RTKs
• Signals cell cycle progression
and cell proliferation
lecture34 245
The Philadelphia Chromosome and Chronic
Myeloid Leukemia
lecture34 246
Human Chromosome Spread – G-banding Karyotype
lecture34 247
Human Chromosome Spread – G-banding Karyotype
lecture34 248
The Philadelphia Chromosome created by a
Translocation between Chrs 9 and 22
Chronic Myeloid Leukemia
lecture34 249
Myeloid Leukemia
lecture34 250
Myeloid Leukemia

lecture34 251
Fusion Protein
Uncontrolled ABL Kinase Activity
and Signal Transduction
Chronic Myeloid Leukemia
lecture34 252
Figure (repeated overview): secreted growth factors (e.g., EGF, PDGF); specific receptors for growth factors (e.g., RET, EGFR); cytoplasmic G-proteins, kinases and their targets (e.g., RAS, ABL); nuclear transcription factors; RB.
lecture34 253
Burkitt’s Lymphoma: A chromosome translocation causes
cMYC to be expressed inappropriately in B-cells
Figure labels: chromosome 14; IgH; c-myc.
Figure by MIT OCW.
cMYC drives cells from G1 to S
lecture34 254

Another way that oncogenic transcription factors
can be up-regulated: Gene Amplification
Chromosome from a TUMOR
Blue – staining of all chromosomes
Red – staining of chromosome 4
Green – staining of the N-MYC gene
(N-MYC and cMYC share many similar properties)
Please see Lodish, Harvey, et al. Molecular Cell Biology.
lecture34 255
One more example – with an interesting twist
A translocation between Chr 14 and Chr 18 to put
the BCL2 gene under the strong IgH promoter
Figure: chromosome 14 carries the immunoglobulin heavy chain gene (IgH) with its enhancer and a breakpoint; chromosome 18 carries the bcl2 gene (not active in B lymphocytes) and a breakpoint. Rejoining of breakpoints gives translocation 14;18, in which bcl2 is active in B lymphocytes.
Figure by MIT OCW.
The BCL2 protein PREVENTS programmed cell death, so B cells
live longer than normal, leading to B-cell lymphomas
lecture34 256
What chromosomal events convert proto-
oncogenes to dominantly acting oncogenes?
• Point mutations (e.g., RAS)
• Deletion mutations (e.g., RTKs)
• Chromosomal translocations that produce
novel fusion proteins (e.g., Bcr-Abl)
• Chromosomal translocation to juxtapose a
strong promoter upstream of the proto-
oncogene such that it is inappropriately
expressed (e.g., Bcl2)
• Gene amplification resulting in overexpression
(e.g., N-Myc)
lecture34 257
Figure (repeated overview): secreted growth factors (e.g., EGF, PDGF); specific receptors for growth factors (e.g., RET, EGFR); cytoplasmic G-proteins, kinases and their targets (e.g., RAS, ABL); nuclear transcription factors (e.g., MYC, JUN, FOS); RB.
lecture34 258
RB – the Retinoblastoma Gene – was the first example
of a Tumor Suppressor Gene (aka a Recessive Oncogene)
Loss of Function Mutations in
both RB genes lead to malignant
tumors of the retina during the
first few years of life
lecture34 259
Figure: the cell cycle (G0, G1, S (DNA synthesis), G2, M (Mitosis), daughter cells) with extracellular growth control signals and intracellular quality control checks.
RB prevents cells from leaving G1 to enter S-phase until the appropriate time
Figure by MIT OCW.
lecture34 260
Phosphorylation of RB at the appropriate time in
G1 allows release of the E2F Transcription Factor
Figure: a cell cycle kinase phosphorylates (P) RB, releasing E2F from RB.
E2F transcribes genes for replication and cell proliferation
Must lose function of both RB alleles in order to lose cell cycle control
lecture34 261
Two ways to get retinal tumors due to loss of
RB function
Mendelian: germline mutation, then one somatic mutation; multiple tumors; bilateral; early-onset
Sporadic: normal gene, then two somatic mutations; single tumors; unilateral; later-onset
Figure by MIT OCW.

lecture34 262
The Retinoblastoma disease behaves as an autosomal dominant mutation
• In order to lose cell cycle control, MUST lose function of both alleles
• But, for Mendelian inheritance of RB, children need inherit only one non-functional allele
• To explain this the “TWO HIT” hypothesis was proposed
• During development of the retina a second mutation is almost certain to occur
• RB is one of the very few cancers that seems to require defects in only one gene (but in both alleles)
Figure: germline mutation, then somatic mutation; multiple tumors; bilateral; early-onset.
Figure by MIT OCW.
lecture34 263
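The two-hit argument can be made roughly quantitative. The sketch below compares the chance that at least one retinal cell loses RB function when one allele is already mutant in every cell against the chance when both alleles must be hit in the same cell. The cell count and per-allele mutation probability are illustrative placeholders, not values from the lecture, and treating the two hits as independent is a simplification.

```python
import math

# Illustrative placeholder values, NOT measurements from the lecture
N_CELLS = 10_000_000   # assumed number of retinal cells at risk
MU = 1e-6              # assumed chance that one RB allele is inactivated in a given cell


def p_at_least_one(per_cell_probability, n_cells):
    """Probability that at least one of n_cells independent cells is affected."""
    # 1 - (1 - p)^n, computed with log1p/expm1 for numerical stability
    return -math.expm1(n_cells * math.log1p(-per_cell_probability))


# Germline case: every cell already carries one hit, so one more hit suffices.
p_inherited = p_at_least_one(MU, N_CELLS)
# Sporadic case: both alleles must be hit in the same cell (simplified as MU * MU).
p_sporadic = p_at_least_one(MU * MU, N_CELLS)

print(f"inherited first hit: P(tumor-initiating cell) = {p_inherited:.5f}")
print(f"no inherited hit:    P(tumor-initiating cell) = {p_sporadic:.2e}")
```

Under these assumed numbers a second hit is close to certain in carriers, while two independent hits in the same cell are rare. That matches the pattern of a dominant-looking inheritance with multiple tumors against rare sporadic single tumors.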
How is the second RB allele
rendered non-functional?
Loss of Heterozygosity (LOH)
This can happen in several ways
Figure: a cell heterozygous for the RB mutation (wt Rb, mutant RB).
lecture34 264
Figure: starting from wt Rb / mutant Rb. Panel labels: Point Mutation; Non-Disjunction (Chromosome loss; Chromosome loss & duplication); Recombination; Interchromosomal Recombination; Deletion; Translocation; Gene Conversion.
lecture34 265
Figure (repeated overview): secreted growth factors (e.g., EGF, PDGF); specific receptors for growth factors (e.g., RET, EGFR); cytoplasmic G-proteins, kinases and their targets (e.g., RAS, ABL); nuclear transcription factors; RB.
lecture34 266
Genetics of Cancer
Lecture 35
lecture35 268
What chromosomal events convert proto-
oncogenes to dominantly acting oncogenes?
• Point mutations (e.g., RAS)
• Partial deletion mutations (e.g., RTKs)
• Chromosomal translocations that produce
novel fusion proteins (e.g., Bcr-Abl)
• Chromosomal translocation to juxtapose a
strong promoter upstream of the proto-
oncogene such that it is inappropriately
expressed (e.g., cMyc, Bcl2)
• Gene amplification resulting in overexpression
(e.g., N-Myc)
lecture35 270
LOH - Loss of heterozygosity
Figure: starting from wt Rb / mutant Rb. Panel labels: Point Mutation; Non-Disjunction (Chromosome loss; Chromosome loss & duplication); Recombination; Interchromosomal Recombination; Deletion; Translocation; Gene Conversion.
lecture35 271
Figure labels: Sunlight; Oxidation; Pollution; Food; Cigarette Smoke.
Courtesy of Professor Bevin P. Engelward. Used with permission.
lecture35 272
Excision Repair
Proteins Detect Damage
Enzymes Excise DNA Segment with Damage
DNA Polymerase Copies the Undamaged Strand
DNA Ligase Seals the ends together
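The four steps above can be sketched as a toy string model. This is an illustration only: marking damage with "X", the fixed patch size, and writing the two strands in aligned order are assumptions made for the example, not details from the lecture.

```python
# Toy model of excision repair on a double-stranded DNA string.
# Assumptions (not from the lecture): damage is marked with "X",
# and a fixed-size patch around each damaged base is excised.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def excision_repair(damaged, template, patch=2):
    """Repair `damaged` using the undamaged complementary strand `template`.

    Both strands are written in aligned order, so position i of `damaged`
    pairs with position i of `template`.
    """
    strand = list(damaged)
    # Step 1: proteins detect damage
    sites = [i for i, base in enumerate(strand) if base == "X"]
    for site in sites:
        # Step 2: enzymes excise the DNA segment containing the damage
        start = max(0, site - patch)
        end = min(len(strand), site + patch + 1)
        for i in range(start, end):
            strand[i] = None
        # Step 3: DNA polymerase copies the undamaged strand
        for i in range(start, end):
            strand[i] = COMPLEMENT[template[i]]
    # Step 4: DNA ligase seals the ends together
    return "".join(strand)


template = "TACGGATCCA"
damaged = "ATGCXTAGGT"
print(excision_repair(damaged, template))  # ATGCCTAGGT
```

The key point the model captures is that the repaired sequence comes entirely from the undamaged strand, which is why damage to only one strand can be repaired without error.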
lecture35 273
lecture35 274
Figure by MIT OCW.
Figure labels: Sunlight; Oxidation; Pollution; Food; Cigarette Smoke.
lecture35 275
Xeroderma Pigmentosum: An
Autosomal Recessive Disease
2000-fold increased risk of skin cancer
lecture35 276
Complementation in fused cells reveals 7
genes that cause Xeroderma Pigmentosum
Figure: cell fusions (nucleus and cytoplasm shown), each scored as DNA excision repair after UV irradiation or no DNA excision repair after UV irradiation. Fusions shown:
WT + XPA
XPA + XPA
XPA + XPB
lecture35 277
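Counting genes from fusion results amounts to sorting mutant cell lines into complementation groups: two lines that fail to restore repair when fused are placed in the same group. The sketch below shows that bookkeeping. The cell line names and fusion results are hypothetical, chosen only to illustrate the procedure.

```python
from itertools import combinations


def complementation_groups(mutants, complements):
    """Group mutants so that lines that fail to complement share a group."""
    parent = {m: m for m in mutants}

    def find(m):
        # Follow parent links to the group representative
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in combinations(mutants, 2):
        # No repair after fusion: treat the two mutations as in the same gene
        if not complements[frozenset((a, b))]:
            parent[find(a)] = find(b)

    groups = {}
    for m in mutants:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical fusion results (illustration only, not data from the lecture):
# True = fused cells regain excision repair, False = no repair
mutants = ["xp1", "xp2", "xp3", "xp4"]
complements = {
    frozenset(("xp1", "xp2")): False,
    frozenset(("xp1", "xp3")): True,
    frozenset(("xp1", "xp4")): True,
    frozenset(("xp2", "xp3")): True,
    frozenset(("xp2", "xp4")): True,
    frozenset(("xp3", "xp4")): False,
}
groups = complementation_groups(mutants, complements)
print(groups)  # [['xp1', 'xp2'], ['xp3', 'xp4']]
```

In this made-up data set the four lines fall into two groups, which would indicate two genes. The same procedure applied to patient cell lines is what yields the seven groups named on the slide.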
Age at First Skin Cancer
Figure: cumulative cancer incidence (%) versus age (years, 0 to 90) for the XP population and the non-XP population.
lecture35 278
Figure by MIT OCW.
There are Many Other Human Cancer Prone
Syndromes Deficient in DNA Repair
Figure: if a DNA repair pathway is defective. Tumor sites labeled: Colon; Skin; Breast; Ovary; Endometrial; Leukemias.
lecture35 279
Hereditary Nonpolyposis Colon Cancer
DNA Mismatch Repair Defect
Syndrome inherited as Autosomal Dominant

lecture35 280
Hereditary Breast Cancer Susceptibility
DNA Recombination Repair Defect
Syndrome inherited as Autosomal Dominant
BRCA2 Family Pedigree

lecture35 281
Cells need time to repair DNA: DNA
Damage induces Cell Cycle Checkpoints
• DNA damage signals cell cycle check points
• If the damage is too great to fix by repair a signal is sent for the cell to undergo suicide
Figure: the cell cycle (G0, G1, S (DNA synthesis), G2, M (Mitosis), daughter cells) with extracellular growth control signals and intracellular quality control checks.
Figure by MIT OCW.
lecture35 282
Figure labels: Sunlight; Oxidation; Pollution; Food; Cigarette Smoke.
Courtesy of Professor Bevin P. Engelward. Used with permission.
DNA damage is sensed
Signal Transduction KINASES are activated
Figure: p53 and phosphorylated (P) p53, with outcomes:
G1, G2, & M arrest
Apoptosis
Increased DNA repair
lecture35 283
Loss of p53 function occurs in
more than 50% of human cancers!!
• These cancer cells are genetically
unstable because they are unable to do
the following:
• Stop the cell cycling to allow time for
DNA repair
• Carry out efficient DNA repair
• Undergo apoptosis
lecture35 284
Li-Fraumeni Syndrome –
Inheritance of one p53 null allele

lecture35 285
Most fully blown cancers require inactivation of
tumor suppressor genes and activation of oncogenes
Take the case of Colon Cancer
Inactivation of APC Tumor Suppressor gene
Activation of K-RAS Oncogene
Inactivation of p53 Tumor Suppressor gene
Figure: stages Normal Epithelium, Dysplastic Crypt, Early Adenoma / Late Adenoma, Carcinoma, Metastasis, with changes labeled APC, KRAS, TP53 and Other Changes; time span 20 – 40 Years.
Figure by MIT OCW.
lecture35 286
Xeroderma Pigmentosum ~ 1/250,000
Image removed due to copyright reasons. Please see Wei et al., Clinical Chemistry, Vol. 41, No. 12, 1995.
lecture35 287
lecture35 288