
THE JOURNAL OF BIOLOGICAL CHEMISTRY
© 1990 by The American Society for Biochemistry and Molecular Biology, Inc.
Vol. 265, No. 11, Issue of April 15, pp. 6104-6111, 1990
Printed in U.S.A.

Characterization of the Gene for Human Plasminogen, a Key Proenzyme in the Fibrinolytic System*


(Received for publication, November 27, 1989)

Torben E. Petersen‡, Mark R. Martzen§, Akitada Ichinose, and Earl W. Davie

From the Department of Biochemistry, University of Washington, Seattle, Washington 98195

The organization and structure of the gene coding for plasminogen have been determined by a combination of in vitro amplification of leukocyte DNA from normal individuals and isolation of unique clones from three different human genomic libraries. These clones were characterized by restriction mapping, Southern blotting, and DNA sequencing. The gene for human plasminogen spanned about 52.5 kilobases of DNA and consisted of 19 exons separated by 18 introns. DNA sequence analysis revealed that each of the five kringle structures in plasminogen was coded by two exons. The nucleotides in the introns at the intron-exon boundaries were GT-AG, analogous to those found in other eukaryotic genes. Three polyadenylation sites for plasminogen mRNA were also identified. When the amino acid sequences deduced from the genomic DNA and cDNAs of plasminogen were compared with that of the plasma protein determined by amino acid sequence analysis, an apparent amino acid polymorphism was observed in several positions of the polypeptide chain. Nucleotide sequence analysis of the amplified genomic DNAs and genomic clones also revealed that the plasminogen gene was very closely related to the genes for several other proteins, including apolipoprotein(a). This protein may have evolved via duplication and exon shuffling of the plasminogen gene. The presence of another plasminogen-related gene(s) in the human genomic library was also observed.

Downloaded from www.jbc.org at INSTITUTE OF MICROBIAL TECHNOLOGY LIBRARY: (CSIR), on July 27, 2011

* This work was supported in part by Research Grant HL 16919 from the National Institutes of Health. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) J05286.

‡ Supported by International Research Fellowship 3 F05 TW03433-01S1 from the John E. Fogarty International Center for Advanced Study in the Health Sciences. Present address: Dept. of Molecular Biology, University of Aarhus, C. F. Møllers Allé 130, DK-8000 Aarhus C, Denmark.

§ Supported by National Research Service Award 5 F32 HL07816 from the National Institutes of Health.

1 The abbreviations used are: tPA, tissue-type plasminogen activator; kb, kilobase(s); bp, base pair(s).

Plasminogen is a glycoprotein that circulates in plasma as a proenzyme. It is converted to plasmin by tissue plasminogen activator (tPA) in the presence of a fibrin clot, or by urokinase (1). Plasmin then digests the insoluble fibrin clot into soluble fragments during tissue repair and recanalization. The molecular weight of native Glu-plasminogen is about 93,000, as estimated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Plasmin also converts Glu-plasminogen to Lys-plasminogen by releasing an NH2-terminal fragment (Mr 8,000) called a preactivation peptide (2). Lys-plasminogen is more readily activated by plasminogen activators and binds to fibrin with greater affinity than native Glu-plasminogen (3, 4).

The primary structure of human plasminogen (791 amino acids) has been established by amino acid sequence analysis (5-7) and cDNA cloning (8, 9). It is a single-chain glycoprotein consisting of a preactivation peptide (77 amino acid residues), five tandem structures called kringle domains (about 90 residues each), an activation cleavage site (between Arg-561 and Val-562), and a catalytic domain including the serine protease triad of His-603, Asp-646, and Ser-741. The kringle structures are also found in a number of other proteins, such as tPA, urokinase, factor XII, prothrombin, and apolipoprotein(a). The last protein is highly homologous with plasminogen and contains up to 37 tandem repeats of plasminogen kringle 4 (10, 11). The first kringle in plasminogen (12, 13) and the second kringle in tPA (14, 15) function as a binding site for fibrin. The function of the kringles in the other proteins has not been established.

Since several cases of plasminogen abnormalities and deficiencies have been identified in association with thrombosis (16), it was important to determine the structure and organization of the normal gene in order to compare it with abnormal genes. Knowledge regarding the gene for plasminogen could also provide some insight as to its regulation as well as its evolution in relation to other closely related genes, such as the gene coding for apolipoprotein(a).

In previous studies, cDNAs (8, 9, 17) and several genomic clones (8, 17) coding for plasminogen were isolated and the sequence of the DNA coding for a portion of kringle 4 in the human gene was reported (8). In the present studies, the sequence of the 5'- and 3'-flanking regions, the exons, and the intron-exon boundaries of the entire gene coding for human plasminogen are presented and compared with several closely related proteins.

EXPERIMENTAL PROCEDURES

Restriction endonucleases, nuclease Bal-31, and T4 DNA ligase were purchased from Bethesda Research Laboratories or New England Biolabs. T7 DNA polymerase and sequencing kits were purchased from the United States Biochemical Corp. The Klenow fragment of Escherichia coli DNA polymerase, bacterial alkaline phosphatase, ATP, deoxynucleotides, dideoxynucleotides, M13mp18, M13mp19, pUC18, and pUC19 were supplied by Bethesda Research Laboratories. 32P-Labeled nucleotides were obtained from Du Pont-New England Nuclear, and [α-35S]dATP was provided by Amersham Corp. Two human genomic libraries cloned into Charon 4A (18) and EMBL3 (19) were kindly provided by Drs. Tom Maniatis and Shinji Yoshitake, respectively. Additional human leukocyte and lung fibroblast genomic libraries were obtained from Clontech and Stratagene, respectively. Oligonucleotides were synthesized using a nucleotide synthesizer (Applied Biosystems Inc.) and kindly provided by Dr. Patrick S. H.


Chou, Dr. Yim Foon Lee, and Jeff Harris, University of Washington.

Genomic clones containing the gene for human plasminogen were obtained by screening human genomic libraries by the in situ hybridization technique using a partial cDNA or the 5' and 3' portions of the cDNA coding for human plasminogen. Two cDNAs of 1.9 kb (8) and 2.7 kb (17) were principally employed. The 2.7-kb cDNA was isolated from a normal liver cDNA library and started with nucleotide 100 (Fig. 2). It contained the same 3' end as the 1.9-kb cDNA. A third cDNA of 3.4 kb was also employed. It was isolated from a human Hep G2 cDNA library and extended beyond the stop codon in the smaller cDNAs by 750 nucleotides. This 3.4-kb cDNA resulted from the utilization of an alternative polyadenylation site. The Hep G2 cDNA library was kindly provided by Dr. Fred Hagen, ZymoGenetics, Inc., Seattle, WA. To obtain genomic clones containing certain exons, appropriate restriction fragments from the cDNA or synthetic oligonucleotides were used for further screening or for identification of isolated clones by Southern blot analysis.

Phage DNA was prepared by the liquid culture lysis method (20), followed by centrifugation and banding on a cesium chloride step gradient (21). Genomic DNA inserts were isolated by digestion of the phage DNA with EcoRI or SalI and EcoRI endonuclease followed by subcloning into plasmid pUC18 or pUC19. Additional restriction fragments from the inserts were also subcloned into M13mp18 or M13mp19 to obtain overlapping sequences. The genomic DNA inserts were sequenced by the dideoxy method (22) employing [α-35S]dATP and buffer gradient gels (23). The DNA sequence was determined two or more times, and approximately 90% of the sequence was determined on both strands. Sequence data were obtained by employing at least two overlapping independent fragments. Digestions with nuclease Bal-31 were also performed to generate DNA fragments that provided overlapping sequences with restriction fragments (24). Oligonucleotides were synthesized as sequencing primers to obtain DNA sequence of the second strand for several regions in the gene.

The 5' and 3' portions of the gene for plasminogen were also established by in vitro amplification employing the polymerase chain reaction (25). Genomic DNA samples were prepared from the leukocytes of normal individuals by standard techniques (26). One to five µg of genomic DNA was amplified in a 100-µl reaction mixture containing 50 mM KCl, 10 mM Tris-HCl (pH 8.4), 2.5 mM MgCl2, two oligonucleotide primers each at 1-10 µM, the four deoxynucleotide triphosphates each at 200 µM, gelatin at 200 µg/ml, and 2.5-5.0 units of Taq DNA polymerase obtained from New England Biolabs or Perkin-Elmer-Cetus. Each sample was placed in a small Eppendorf tube and overlaid with 75 µl of mineral oil to prevent evaporation.

FIG. 1. EcoRI restriction map and location of the exons in the gene for human plasminogen. The 19 exons are shown with wide vertical bars and are numbered with Roman numerals, while the 14 EcoRI restriction sites are shown with narrow vertical bars. The six overlapping λ phage clones with DNA inserts coding for plasminogen are also shown. The 5' and 3' portions of the gene (14.5 and 3.5 kb, respectively) were amplified by the polymerase chain reaction, and these fragments are listed as PCR 1-13 for the 5' end and PCR 14-16 for the 3' end of the gene.

TABLE I
Nucleotide sequences of the primers employed for amplification of the gene coding for plasminogen

PCR fragment   Size (kb)   Primer   Sequence
 1             1.2         5'       GATCGAATTCCGCAGACATTCCACC
                           3'       CACAGAATTCCATGGCATATGTATTTTTACTAC
 2             3.2         5'       CTGCGAATTCTGGCAACCACTAATCTAC
                           3'       GGGTATTCACATAGTCATCCAGAGGCTCTCC
 3             2.4         5'       GATGAAGCTTGTAGTTTTATTTGAAAAGAAAGGT
                           3'       ATTAAAGCTTGTCGAGATATGGTCCACTTCAA
 4             3.8         5'       TGTAAGCTTCAGAGTGCAAGACTGGGAATGGAAAG
                           3'       TTGGAAGGAATGTATCCATGAGCGTGTGGG
 5             1.1         5'       GGGACCCACTTTCTGGGCACTGCTGGCC
                           3'       CCATAAGCTTGTATGCCTAAATGGGTGAATTC
 6             1.6         5'       AAGCAGCTGGGAGCAGGAAGTAT
                           3'       TTTTCAAATAAAACTACATCTCTCATC
 7             2.2         5'       ATTAAAGCTTACAAGTAGCAAGCAAACGGT
                           3'       GTAAAGCTTTCCATTCCCAGTCTTGCACTCTGA
 8             2.2         5'       ATTTGAATTCATCCATTTCAGTTTTCTTCTTC
                           3'       TGTAAGCTTTTGATTTCAAGAACAGGGC
 9             4.4         5'       GGGACCCACTTTCTGGGCACTGCTGGCC
                           3'       GGGTATTCACATAGTCATCCAGAGGCTCTCC
10             3.3         5'       GATGAAGCTTGTAGTTTTATTTGAAAAGAAAGGT
                           3'       GTAAAGCTTTCCATTCCCAGTCTTGCACTCTGA
11             2.2         5'       CATCGAATTCTGCCTTGCTAATAGCAAGC
                           3'       TTTACATGTGTAAAAATCACTCAACAGAAT
12             2.7         5'       TAGTAAGCTTCTTTATTTATGTCCAAATGCCCG
                           3'       TATTAAGCTTACCGTTTGCTTGCTACTTGTAA
13             2.3         5'       TGTAAGCTTCAGAGTGCAAGACTGGGAATGGAAAG
                           3'       ACACTCAAGAATGTCGCAGTAGTCATATCTC
14             1.8         5'       GGTAGTCAAGAGGAGCTTCCTCCCTGCAGC
                           3'       ACAGAGTTCGGTGGATTGGACTCTTCCATTCAG
15             1.8         5'       GGAAGAGTCCAATCCACCGAACT
                           3'       CACAGTCACTTGCAGTTTTGCTTTTCTCTG
16             3.5         5'       GGTAGTCAAGAGGAGCTTCCTCCCTGCAGC
                           3'       CACAGTCACTTGCAGTTTTGCTTTTCTCTG
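Some primers in Table I are complementary to one another over their gene-specific portions, which is one way adjacent amplified fragments can be made to overlap. The following is a minimal Python sketch of that check, not part of the original methods: the function names `reverse_complement` and `shared_core` and the choice of the fragment 3 and fragment 6 primers are illustrative assumptions.

```python
# Illustrative check of primer complementarity; sequences are copied from Table I.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def shared_core(a: str, b: str, min_len: int = 12) -> str:
    """Longest stretch of `a` whose reverse complement occurs in `b`.

    Returns an empty string if no stretch of at least `min_len` bases matches.
    """
    best = ""
    for i in range(len(a)):
        # Only try candidates longer than the best match found so far.
        for j in range(i + max(min_len, len(best) + 1), len(a) + 1):
            if reverse_complement(a[i:j]) in b:
                best = a[i:j]
            else:
                break  # a longer candidate from this start cannot match either
    return best


frag3_5prime = "GATGAAGCTTGTAGTTTTATTTGAAAAGAAAGGT"  # 5' primer, PCR fragment 3
frag6_3prime = "TTTTCAAATAAAACTACATCTCTCATC"         # 3' primer, PCR fragment 6

core = shared_core(frag3_5prime, frag6_3prime)
print(core, len(core))
# HindIII recognition sequence near the 5' end of the fragment 3 primer
print("AAGCTT" in frag3_5prime)
```

The two primers pair over an 18-base stretch, while the 5' end of the fragment 3 primer carries an AAGCTT (HindIII) sequence that lies outside the shared region.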


FIG. 2. Nucleotide sequence of the 5'- and 3'-flanking regions, the exons, and the intron/exon boundaries for the gene coding for human plasminogen. The DNA sequence upstream from the cytosine listed as nucleotide 1 is shown in the left margin with negative numbers. The amino acids in the signal sequence are also shown with negative numbers, while those in the mature protein are shown with positive numbers in the left margin. The amino acid sequence predicted by the coding region of each exon is indicated above the corresponding DNA sequence employing the one-letter amino acid code. CCAAT boxes and TATAA sequences are underlined, as well as the 3'-noncoding region with sequences that are apparently involved in mRNA processing. The 5' and 3' ends of each exon are enclosed in brackets. The sites of polyadenylation at the 3' end of the gene are shown with a diagonal slash. The sequences used for the preparation of amplifying primers are underlined or overlined and begin with an asterisk. The solid vertical arrows indicate the cleavage site for the signal peptide and the cleavage site for the conversion of plasminogen to plasmin.


The samples were then subjected to 25 or 30 cycles of amplification by heating at 94 °C for 1 min to denature the DNA, cooling to 60-70 °C for 2 min to anneal the primers, and incubating at 72 °C for 3 min to extend the annealed primers. At the end of the last cycle, the samples were incubated at 72 °C for 7 min to ensure the completion of the final extension step. After precipitation with ethanol and resuspension in 100 µl of 10 mM Tris-HCl and 1 mM EDTA buffer (pH 7.5), a 5- or 10-µl aliquot was applied to a 0.8 or 1.5% agarose (IBI) gel containing 0.5 µg/ml of ethidium bromide in 89 mM Tris base, 89 mM boric acid, and 20 mM EDTA buffer (pH 7.8).

A human leukocyte library (18) and a lung fibroblast library (Stratagene) were also screened employing the appropriate 5' or 3' region of the cDNA (8, 17) to obtain genomic clones containing the 5' or 3' portion of the gene coding for plasminogen, as described above. To select the correct genomic clones coding for plasminogen and to exclude those for a plasminogen-related gene(s), the isolated phage clones were first amplified by the polymerase chain reaction, as described. Several portions of the amplified phage DNA were then subjected to DNA sequence analysis. The phage clones that were shown to contain the nucleotide sequences coding for exons that matched the corresponding regions of the cDNAs for plasminogen were employed for further analysis.

DNA sequences were analyzed by the Genepro program (Version 4.1, Riverside Scientific Enterprises, Seattle, WA) employing a Tandy 3000 computer.
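The cycling conditions given above fix the programmed length of an amplification run. The short Python sketch below works through that arithmetic for 25 and 30 cycles; it counts hold times only and ignores ramping between temperatures, and the doubling figure is a theoretical upper bound rather than a measured yield. The function names are illustrative.

```python
# Programmed thermocycler time for the amplification protocol described in the text.
DENATURE_MIN = 1         # 94 °C step
ANNEAL_MIN = 2           # 60-70 °C step
EXTEND_MIN = 3           # 72 °C step
FINAL_EXTENSION_MIN = 7  # additional 72 °C incubation after the last cycle


def programmed_minutes(cycles: int) -> int:
    """Total hold time in minutes; ramp times are not included (an assumption)."""
    per_cycle = DENATURE_MIN + ANNEAL_MIN + EXTEND_MIN
    return cycles * per_cycle + FINAL_EXTENSION_MIN


def max_fold_amplification(cycles: int) -> int:
    """Theoretical ceiling, assuming perfect doubling of the target every cycle."""
    return 2 ** cycles


for n in (25, 30):
    print(n, programmed_minutes(n), max_fold_amplification(n))
```

Under these assumptions a 25-cycle run is programmed for 157 min and a 30-cycle run for 187 min.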

RESULTS AND DISCUSSION

The Middle Portion of the Gene for Plasminogen--Three clones (λ1, λ2, λ3) containing the gene for human plasminogen were initially isolated from approximately 2 × 10^6 phage of the AluI/HaeIII genomic library of Lawn et al. (18) using the cDNA of 1.9 kb as a probe (Fig. 1) (8). These three clones were found to be unique by restriction enzyme digestion and Southern blotting analysis. DNA sequence analysis revealed that these genomic clones contained the middle portion of the gene for plasminogen extending from exons VII to XVII (Fig. 1). This corresponded to the central part of the cDNA coding for the polypeptide chain of plasminogen extending from the second half of kringle 2 (Lys-204) to the middle portion of the catalytic chain (Gly-690).

Since these three clones did not contain the 5' and 3' portions of the gene, appropriate restriction fragments from the cDNA of 1.9 kb (8) or 2.7 kb (17) and synthetic oligonucleotides were used for further screening and isolation of additional clones for identification by Southern blot analysis. Two more clones obtained from the human fibroblast library (19) and human leukocyte library contained nucleotide sequences that apparently corresponded to the 5' and 3' regions of the cDNAs. Preliminary sequence analysis of these clones indicated, however, that they originated from a plasminogen-related gene(s), since there were a number of nucleotide changes and in-frame stop codons in the apparent exons from both the 5' and 3' regions when their sequences were compared with the plasminogen cDNA sequence (8, 9, 17).

The 5' and 3' Portions of the Gene for Plasminogen--To obtain the correct 5' and 3' portions of the gene coding for plasminogen, leukocyte DNA from normal individuals was amplified by the polymerase chain reaction employing oligonucleotide primers (Table I). The sequence of these primers was matched to the appropriate regions of the cDNA (8, 9, 17). Primers were also prepared from the 5' and 3' ends of the existing clones (λ1-λ3). In addition, the genomic sequences of the plasminogen-related gene(s) were utilized to design primers for the amplification of the 5'- and 3'-flanking regions. Altogether, 13 overlapping DNA fragments were prepared from the 5' end (PCR 1-13) and three fragments from the 3' end (PCR 14-16) of the gene by the polymerase chain reaction (Fig. 1). These fragments covered approximately 14.5 and 3.5 kb of genomic DNA from the 5' and 3' portions of the gene, respectively (Fig. 1). DNA sequence analysis of these fragments revealed that the 5' portion contained the genomic sequence coding from the signal peptide to the first half of kringle 2 (including exons I-VI), and that the 3' portion contained the sequence coding for exons XVIII and XIX.

Two additional λ phage clones (λ4 and λ5) were then isolated by rescreening of the genomic library of Lawn et al. (18) using the 5' portion of the cDNA as a probe, and one more clone (λ6) was obtained by screening a human lung fibroblast library using the 3' end of the cDNA (Fig. 1). The EcoRI restriction map of the genomic inserts in these clones was consistent with that obtained from the DNA amplified by the polymerase chain reaction. DNA sequence analysis also confirmed that the nucleotide sequences of these genomic clones were identical with those obtained from the amplified DNA generated by the polymerase chain reaction.

FIG. 3. Amino acid sequence for human plasminogen and the location of the 18 introns in the gene coding for plasminogen. The positions of the introns (A-R) are indicated by solid arrows at or between specific amino acids. The amino acid residues are numbered starting with the amino-terminal glutamic acid residue as number 1 and ending with residue 791. The signal peptide (shown in a box with negative numbers) contains 19 amino acids and is cleaved by signal peptidase at the Gly-Glu peptide bond. The preactivation peptide (PAP) is generated primarily by the cleavage between Lys-77 and Lys-78 (shown with an open straight arrow) by plasmin. The conversion of plasminogen to plasmin occurs by the cleavage between Arg-561 and Val-562 (shown by an open curved arrow). K1-K5 refer to kringles 1-5 in the A chain, while the active site His, Asp, and Ser residues in the B chain are circled. Carbohydrate attachment sites (Asn-289, Thr-346) are shown by diamonds.

Nucleotide Sequences of the Exons and Intron-Exon Boundaries--The DNA sequence of 7853 nucleotides coding for human plasminogen and the flanking regions of the gene is shown in Fig. 2. This sequence extended about 960 base pairs upstream from the cytosine which was arbitrarily labeled as nucleotide 1. Comparison of the DNA sequence of the gene with the cDNA sequence (8, 9, 17) indicated that the gene consisted of 19 exons (I-XIX) interrupted by 18 introns (A-R) (Fig. 3). The first exon contained the 5'-noncoding region and coded for a typical signal peptide including a hydrophobic core. The sequence of all the intron-exon splice junctions (Table II) agreed with the GT-AG rule of Breathnach et al. (27) and with the consensus sequence of Mount (28). Eight of the splice junctions were type I (introns A, C, E, G, I, K, M, and Q), eight were type II (introns B, D, F, H, J, N, O, and P), and two were type 0 (introns L and R) (29). The exons varied in size ranging from 75 to 387 nucleotides. Exon XIX was the largest of the 19 exons and included the coding region for the active site Ser, the COOH terminus of the protein, and the 3'-noncoding region of the gene. The average size of the 19 exons was 146 bp, which is similar to the average size of 150 bp found in other eukaryotic genes (30). The overlapping clones spanned about 52.5 kb. Thus, the gene for plasminogen is the largest of the known genes for serine proteases involved in blood coagulation and fibrinolysis (31).

Nucleotide Sequences of the 5'- and 3'-Flanking Regions--The DNA sequence analysis revealed that the 5'-flanking region of the gene for plasminogen contained two clusters of regulatory elements for transcription (32), including forward and reverse CCAAT boxes and TATAA sequences (Fig. 2). At present, it is not known whether or not either of these sequences functions as a promoter element. Two sequence elements of CTGGGA common to acute-phase reactant genes (33, 34) were found in the 5'-flanking region of the gene for plasminogen. However, no GC boxes were present. Sequences in the 3'-flanking region are also thought to play a role in polyadenylation and mRNA processing. A potential CAYTG signal (35) was identified 13 bp downstream from


TABLE II
Nucleotide sequence at the splice junctions and size of exons
Codons interrupted by introns are underlined. The exact size of exon I (shown in parentheses) is not known. PAP1, amino-terminal half of the preactivation peptide; PAP2, carboxyl-terminal half of the preactivation peptide; K1a-K5a, amino-terminal half of the kringle; K1b-K5b, carboxyl-terminal half of the kringle; Act, activation cleavage site; Cnt, connecting region to the A chain; Loop, disulfide loop prior to the active-site serine. ?, illegible nucleotide(s).

Exon    Domain               Size (bp)   Boundary sequence     Intron   Boundary sequence     Exon    Junction type(a)
                                         (exon intron)                  (intron exon)
I       5'-Noncoding         (169)       CTGAAATCAG GTAAGA     A        CTCTAG GTC?GAGA       II      I
II      PAP1                 136         TCACCTGCA? GTATTT     B        CTGCAG ?CATTCCAA      III     II
III     PAP2                 107         GAAAAGAA? GTGAGT      C        CTTCAG TGTATCTCTC     IV      I
IV      K1a                  115         ACAGACCTAC GTAAGA     D        CCCCAG ATTCTCACCT     V       II
V       K1b                  140         GAGTGTGAA? GTCAGG     E        GTCCAG ?GGAATGTAT     VI      I
VI      K2a                  121         TTCCTTCCA? GTAAGT     F        ATTCAG ATTTCCAAAC     VII     II
VII     K2b                  119         CCCCGCTG? GTGAGT      G        TTCAAG CAACACCTCC     VIII    I
VIII    K3a                  163         TTCCCTGCA? GTAAGT     H        TTTCAG ?AAATTTGGAT    IX      II
IX      K3b                  146         GCTCCCAC? GTAAGC      I        TTTCAG CACCACCTGA     X       I
X       K4a                  160         ACCCAAATG? GTATGT     J        TTCCAG ?CCTGACA       XI      II
XI      K4b                  182         TCCGAAGAA? GTAAGA     K        GTACAG ACTGTATGTT     XII     I
XII     K5a                  149         GGAAAAAAAT GTAAGC     L        TTTCAG ?TGCCGTA       XIII    0
XIII    K5b                  94          CCTCAGTGTG GTAGGT     M        CCACAG CGGCCCCTTC     XIV     I
XIV     Act                  121         TTAGAACAAG GTAAGA     N        TTCCAG ?TTGGAATG      XV      II
XV      His                  75          GCTTGGAGG GTATGT      O        TTCTAG GTCCCCAAGG     XVI     II
XVI     Asp                  141         AGCTAAGC? GTACTC      P        TTTCAG ?CCTGCCGTC     XVII    II
XVII    Cnt                  107         GAAACCCA? GTGAGA      Q        ACACAG GTACTTTTGG     XVIII   I
XVIII   Loop                 146         CAGTTGCCAG GTAAGC     R        GTATAG ?TGACAGTG      XIX     0
XIX     Ser + 3'-noncoding   387
Consensus(b)                             (A/C)AG GT(A/G)AGT             (C/T)(C/T)N(C/T)AG

(a) Sharp (29).
(b) Mount (28).
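The summary figures quoted in the text can be checked directly against Table II. The Python sketch below recomputes the average exon size, the fraction of the 52.5-kb gene occupied by exons, and the GT-AG rule, using only the exon sizes and the six intronic nucleotides at each boundary. The variable names are illustrative, and the size of exon I is the approximate value shown in parentheses in the table.

```python
from statistics import mean

# Exon sizes in bp for exons I-XIX (Table II); the value for exon I is approximate.
EXON_SIZES = [169, 136, 107, 115, 140, 121, 119, 163, 146, 160,
              182, 149, 94, 121, 75, 141, 107, 146, 387]

# First six intronic nucleotides of introns A-R (splice donor side, Table II).
DONORS = ["GTAAGA", "GTATTT", "GTGAGT", "GTAAGA", "GTCAGG", "GTAAGT",
          "GTGAGT", "GTAAGT", "GTAAGC", "GTATGT", "GTAAGA", "GTAAGC",
          "GTAGGT", "GTAAGA", "GTATGT", "GTACTC", "GTGAGA", "GTAAGC"]

# Last six intronic nucleotides of introns A-R (splice acceptor side, Table II).
ACCEPTORS = ["CTCTAG", "CTGCAG", "CTTCAG", "CCCCAG", "GTCCAG", "ATTCAG",
             "TTCAAG", "TTTCAG", "TTTCAG", "TTCCAG", "GTACAG", "TTTCAG",
             "CCACAG", "TTCCAG", "TTCTAG", "TTTCAG", "ACACAG", "GTATAG"]

GENE_SPAN_BP = 52_500  # approximate span of the gene reported in the text

total_exon_bp = sum(EXON_SIZES)
average_exon_bp = round(mean(EXON_SIZES))
exon_fraction = total_exon_bp / GENE_SPAN_BP

# GT-AG rule: every intron begins with GT and ends with AG.
obeys_gt_ag = (all(d.startswith("GT") for d in DONORS)
               and all(a.endswith("AG") for a in ACCEPTORS))

print(total_exon_bp, average_exon_bp, f"{exon_fraction:.1%}", obeys_gt_ag)
```

With these inputs the exons total 2778 bp and average 146 bp, so roughly 5% of the gene is exonic, and all 18 introns satisfy the GT-AG rule.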

FIG. 4. Location of the introns in the genes for five kringle-containing proteins. Solid arrows indicate the location of the introns in plasminogen, tPA (tissue-type plasminogen activator), urokinase-type plasminogen activator (uPA), prothrombin, and factor XII. Data are taken from the following references: tPA (37), urokinase-type plasminogen activator (38), factor XII (39), and prothrombin (40). PAP, preactivation peptide; EGF, epidermal growth factor; Gla, γ-carboxyglutamic acid.
Domains labeled in the figure: plasminogen (signal, PAP, kringles 1-5); tPA (signal, type I, EGF, kringles 1 and 2); uPA (signal, EGF, kringle); prothrombin (signal, Gla, kringles 1 and 2); factor XII (signal, type II, EGF 1, type I, EGF 2, kringle).

the conserved AATAAA sequence. This sequence was identical to the consensus sequence in four of the five nucleotide positions. An alternative polyadenylation site for the cDNA reported by Forsgren et al. (9) was found 31 nucleotides downstream from the first polyadenylation site. Potential CAYTG signals for this cDNA were found two nucleotides upstream and ten nucleotides downstream from the second polyadenylation site. The third polyadenylation site for the cDNA obtained from a human Hep G2 library that contained an extra 750 nucleotides of 3'-noncoding DNA was also identified in the gene. A potential CAYTG signal for this cDNA was present 26 bp downstream from the alternative poly(A) site. A consensus sequence of YGTGTTYY, which is required for efficient formation of the 3' terminus of mRNA (36), was not present within 50 nucleotides downstream from the first AATAAA sequence. This consensus sequence, however, was present 32 bp downstream from the AATAAA sequence for the third polyadenylation site.

Organization of the Gene for Plasminogen--Intron A was located between the nucleotide sequence coding for the signal sequence and the first half of the preactivation peptide, while the second intron (intron B) in the gene for plasminogen was located in the middle of the preactivation peptide (Fig. 3). Each of the five kringles was coded by two separate exons with a single intron inserted in the middle of each structure. The splice junctions of the introns within the kringles were usually type II, while the introns between the kringles were type I (Table II). This was the same pattern as that found in other genes containing one or more kringle structures, including tPA, urokinase, factor XII, and the first kringle in prothrombin (Fig. 4) (37-40).
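The junction types and the exon sizes in Table II constrain one another. Taking type 0 as an intron lying between codons, type I as an intron after the first nucleotide of a codon, and type II as an intron after the second, the type of each intron follows from the type of the preceding intron and the length of the exon between them. The Python sketch below propagates this from intron A (type I, as stated in the text) through exons II-XVIII, assuming those exons are entirely coding; the function and variable names are illustrative.

```python
# Sizes in bp of exons II-XVIII (Table II), i.e. the exons flanked by introns A-R.
EXON_SIZES_II_TO_XVIII = [136, 107, 115, 140, 121, 119, 163, 146, 160,
                          182, 149, 94, 121, 75, 141, 107, 146]

TYPE_NAMES = {0: "0", 1: "I", 2: "II"}
INTRONS = "ABCDEFGHIJKLMNOPQR"


def junction_types(first_phase: int, exon_sizes: list[int]) -> list[str]:
    """Propagate the codon position of each intron through the intervening exons.

    The phase of an intron is the number of codon nucleotides preceding it
    (0, 1, or 2); adding the next exon length modulo 3 gives the next phase.
    """
    phases = [first_phase]
    for size in exon_sizes:
        phases.append((phases[-1] + size) % 3)
    return [TYPE_NAMES[p] for p in phases]


predicted = dict(zip(INTRONS, junction_types(1, EXON_SIZES_II_TO_XVIII)))

for wanted in ("I", "II", "0"):
    print(wanted, [name for name, kind in predicted.items() if kind == wanted])
```

Under these assumptions the predicted assignment agrees with the junction types listed in the text and in Table II, including the type 0 introns L and R.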
These results are consistent with the concept that the kringle-containing proteins as well as other proteins with specific domains have evolved in part by gene duplication and exon shuffling (41), and this shuffling occurs primarily at type I intron-exon splice junction boundaries. The second kringle in prothrombin, however, was encoded by a single exon (40), suggesting that the internal intron in this kringle may have been lost during evolution.

The gene organization for the light chain of plasminogen was also similar to that of other serine proteases, especially tPA, urokinase, and factor XII (37-39), although the positions of the introns relative to the amino acid sequence were slightly different.

The data shown in Fig. 2 are consistent with the concept that the gene described in the present study is the one that expresses plasminogen, since the nucleotide sequence shown in Fig. 2 matches the cDNA prepared from human liver or Hep G2 cells. However, other very closely related genes have been identified during these studies and portions have been subjected to preliminary DNA sequence analysis. One of the related genes differed from the plasminogen cDNA in that it contained in-frame stop codons in the apparent exons, as well as a number of nucleotide substitutions, suggesting that it is a pseudogene.

The gene coding for plasminogen is also closely related to that of apolipoprotein(a), as previously discussed by McLean et al. (10) and Tomlinson et al. (11). Indeed, apolipoprotein(a) may have evolved via exon shuffling (41) and the deletion of exons II-IX in the plasminogen gene, followed by a recombination event that linked the signal peptide precisely at intron A to kringle 4 precisely at intron I (Figs. 5 and 6). Alternatively, exons II-IX in the plasminogen gene or portions of this DNA may be present in the gene for apolipoprotein(a) as a large intron, and this intron sequence is removed during the processing of the apolipoprotein(a) mRNA. Additional evolution of the plasminogen gene would involve multiple duplications of exons X and XI coding for plasminogen kringle 4, generating up to 37 kringles present in apolipoprotein(a), as well as a number of small insertions and deletions.

An exon coding for apolipoprotein(a) (or a very closely related gene) from the region that included the potential active site Asp residue was also amplified by the polymerase chain reaction during these studies and identified by preliminary sequence analysis. The intron/exon boundaries for this genomic DNA fragment were found to occur exactly in the same positions as those of exon XIV in the gene for plasminogen corresponding to amino acids 608 and 654 in the plasminogen polypeptide chain. These results also support the conclusion that the kringle-containing proteins are a family of proteins that have evolved from one or more common ancestral genes. These genes are all apparently localized on chromosome 6, band q26-27, because the results obtained employing the part of the gene for plasminogen containing exon X or XIV (42), the 3' end of the cDNA for apolipoprotein(a), and a cDNA fragment containing kringles 1-3 of plasminogen (43) are in agreement. Results obtained from linkage studies also support this conclusion (44, 45).

FIG. 5. An alignment of portions of the gene for plasminogen (PLG) and the cDNA for apolipoprotein(a). Intron A, exons II-IX, and intron I were inserted between serine at position -4 and alanine at position 347 in the gene for plasminogen. The serine at position -4 and alanine at position -3 are adjacent to each other in the cDNA for apolipoprotein(a) (10). An intron may be present between these 2 amino acid residues in the gene for apolipoprotein(a).

FIG. 6. Comparison of the structures for human plasminogen and apolipoprotein(a). Solid arrows indicate the location of introns in the gene coding for plasminogen, and the open arrows indicate those predicted in apolipoprotein(a) by homology in the amino acid and cDNA sequences of the two proteins. PAP, preactivation peptide.
Domains labeled in the figure: plasminogen (signal, PAP, kringles 1-5); apolipoprotein(a) (signal, kringles).

TABLE III
Apparent polymorphisms of the amino acid residues in plasminogen as deduced from the genomic sequence and cDNA sequence (8, 9, 17), as well as the peptide sequence (5-7)

Column order: Protein; cDNA(a) (1.9 kb); cDNA(b) (2.7 kb); cDNA(c); Gene; RFLP(d)

Protein    Codons                                  RFLP(d)
E-53       CAA(Q)   CAA(Q)   CAA(Q)
D-88       AAT      AAT      AAT
N-91       AAC      AAC      AAT
C-238      TGT      TGC      TGC                   MaeII
V-272      GT?      GTT      GTG
F-295      TTC      TTC      TTT      TTC          XmnI
E-342      CAA(Q)   CAA(Q)   CAA(Q)   CAA(Q)
N-453      GAT(D)   GAT(D)   GAT(D)   AAT(N)
V-563      GTA      GTA      GTG      GTA
G-743      GGT      GGT      GGT      GGG          AvaII-HaeIII
3'-NC45    GGAAC    GGAAC    GGGAC    GGGAC
3'-NC49    CGAGG    CGTGG    CGTGG    CGTGG

(a) From Malinowski et al. (8).
(b) E. Mulvihill and M. Martzen, unpublished data.
(c) From Forsgren et al. (9).
(d) Restriction fragment length polymorphism.

Apparent Polymorphism in the Gene for Plasminogen--A number of minor differences were found when the DNA sequence of the gene for human plasminogen was compared with that of the cDNAs isolated in different laboratories (8,

Organization

of the Gene for Human Plasminogen

6111

9, 17) (Table III). Five of the nucleotide substitutions occurring in the coding region had no influence on the amino acid sequence, as was the case for two changes in the 3-noncoding region located 45 and 49 bp downstream from the stop codon in the cDNA (9). Several differences in the genomic DNA sequence and the cDNAs did result, however, in the substitution of amino acids that are different from those determined by amino acid sequence analysis of the plasma protein (Table III). In addition, IIe-67 located in the second half of the preactivation peptide region encoded by the three nucleotides (CAA) in the genomic DNA sequence and the cDNA (9) was not identified by amino acid sequence analysis (5). Some of these differences may be due to amino acid sequencing artifacts. For instance, Glu and Gln (residues 53 and 342), and Asp and Asn (residue 88) sometimes were difficult to differentiate when phenylthiohydantoin derivatives were separated by two-dimensional chromatography. Alternatively, some of the differences, such as Asn-453, may be the result of polymorphisms in the normal human population, resulting in amino acid changes. Using the isoelectric focusing technique, several variant alleles for plasminogen have been reported in different populations (46-50). The substitution of a charged residue, such as Asp-453 for the uncharged Asn residue (Table III), may contribute in part to the differential electrophoretic mobility of the gene products and to the heterogeneity of plasminogen. This substitution would be in addition to the well known differences in carbohydrate in plasminogen that also leads to changes in electrophoretic mobility (51). Some of the nucleotide substitutions described above were also confirmed by restriction digestion of amplified genomic DNAs from normal individuals showing that apparent polymorphisms exist in the gene for plasminogen. 
A MaeII site in exon VII (at Cys-238) and an XmnI site in exon VIII (at Phe-295) were found in some of the amplified genomic DNAs, while other genomic DNAs lacked these sites. Also, an AvaII site in exon XIX (at Gly-743) was not present in some of the genomic DNAs, and these DNAs have an additional HaeIII site which does not exist in other DNAs. These apparent restriction fragment length polymorphisms might be helpful in studying various normal and abnormal genes.
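The way a single nucleotide substitution can abolish one restriction site while creating another, as described for the AvaII and HaeIII sites, can be illustrated with a minimal sketch. The two 15-bp alleles below are hypothetical and are not taken from the plasminogen sequence; the function names are likewise illustrative. The recognition sequences are the standard ones for the four enzymes named above (W = A or T, N = any base), and since all four are palindromic, scanning one strand is sufficient.

```python
import re

# IUPAC codes needed for the recognition sequences below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "N": "[ACGT]"}

# Recognition sequences (5'->3') of the enzymes named in the text.
SITES = {
    "MaeII": "ACGT",
    "XmnI": "GAANNNNTTC",
    "AvaII": "GGWCC",
    "HaeIII": "GGCC",
}


def find_sites(seq, site):
    """Return 0-based start positions of every match of `site` in `seq`."""
    pattern = "".join(IUPAC[base] for base in site)
    # A lookahead is used so that overlapping matches are all reported.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


def site_differences(allele_a, allele_b):
    """Map enzyme -> (positions in allele_a, positions in allele_b) where they differ."""
    diffs = {}
    for enzyme, site in SITES.items():
        a = find_sites(allele_a, site)
        b = find_sites(allele_b, site)
        if a != b:
            diffs[enzyme] = (a, b)
    return diffs


# Hypothetical alleles differing by one T->C substitution at index 7.
ALLELE_1 = "ATCAAGGTCCTGAAA"  # contains GGTCC (AvaII)
ALLELE_2 = "ATCAAGGCCCTGAAA"  # contains GGCC (HaeIII) instead

if __name__ == "__main__":
    for enzyme, (a, b) in site_differences(ALLELE_1, ALLELE_2).items():
        print(enzyme, "allele 1:", a, "allele 2:", b)
```

In this toy case the T-to-C change removes the AvaII site and introduces a HaeIII site at the same position, so digestion of the amplified fragment with either enzyme would distinguish the two alleles.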
Acknowledgments-We thank Drs. T. Maniatis and S. Yoshitake for kindly providing genomic libraries, and Dr. F. Hagen for a Hep G2 cDNA library. We also wish to thank Dr. J. E. Sadler for assistance in the initial screening of the genomic library, Dr. D. Chung for helpful discussions, E. Espling for technical assistance, and L. Swenson for help in the preparation of the manuscript.

Note Added in Proof-The primary structure of an additional member of the plasminogen gene family (human hepatocyte growth factor) containing four kringles has been reported (Nakamura, T., Nishizawa, T., Hagiya, M., Seki, T., Shimonishi, M., Sugimura, A., Tashiro, K., and Shimizu, S. (1989) Nature 342, 440-443 and Miyazawa, K., Tsubouchi, H., Naka, D., Takahashi, K., Okigaki, M., Arakaki, N., Nakayama, H., Hirono, S., Sakiyama, O., Takahashi, K., Gohda, E., Daikuhara, Y., and Kitamura, N. (1989) Biochem. Biophys. Res. Commun. 163, 967-973).
REFERENCES

1. Robbins, K. C. (1981) Prog. Fibrinol. 5, 3-13
2. Wallén, P. (1978) Prog. Chem. Fibrinol. Thrombol. 3, 167-181
3. Castellino, F. J., and Powell, J. R. (1981) Methods Enzymol. 80, 365-378
4. Hoylaerts, M., Rijken, D. C., Lijnen, H. R., and Collen, D. (1982) J. Biol. Chem. 257, 2912-2919
5. Wiman, B., and Wallén, P. (1975) Eur. J. Biochem. 50, 489-494
6. Wiman, B. (1977) Eur. J. Biochem. 76, 129-137
7. Sottrup-Jensen, L., Claeys, H., Zajdel, M., Petersen, T. E., and Magnusson, S. (1978) Prog. Chem. Fibrinol. Thrombol. 3, 191-209
8. Malinowski, D. P., Sadler, J. E., and Davie, E. W. (1984) Biochemistry 23, 4243-4250
9. Forsgren, M., Råden, B., Israelsson, M., Larsson, K., and Hedén, L.-O. (1987) FEBS Lett. 213, 254-260
10. McLean, J. W., Tomlinson, J. E., Kuang, W.-J., Eaton, D. L., Chen, E. Y., Fless, G. M., Scanu, A. M., and Lawn, R. M. (1987) Nature 330, 132-137
11. Tomlinson, J. E., McLean, J. W., and Lawn, R. M. (1989) J. Biol. Chem. 264, 5957-5965
12. Thorsen, S., Clemmensen, I., Sottrup-Jensen, L., and Magnusson, S. (1981) Biochim. Biophys. Acta 668, 377-387
13. Lerch, P. G., Rickli, E. E., Lergier, W., and Gillessen, D. (1980) Eur. J. Biochem. 107, 7-13
14. Ichinose, A., Takio, K., and Fujikawa, K. (1986) J. Clin. Invest. 78, 163-169
15. van Zonneveld, A. J., Veerman, H., and Pannekoek, H. (1986) Proc. Natl. Acad. Sci. U. S. A. 83, 4670-4674
16. Ichinose, A., Espling, E. S., Takamatsu, J., Saito, H., Shinmyozu, K., Maruyama, I., Martzen, M. R., Petersen, T. E., and Davie, E. W. (1989) Thromb. Haemostasis 62, 495 (abstr.)
17. Martzen, M. R., Petersen, T. E., Ichinose, A., and Davie, E. W. (1988) Fibrinolysis 2, 11 (abstr.)
18. Lawn, R. M., Fritsch, E. F., Parker, R. C., Blake, G., and Maniatis, T. (1978) Cell 15, 1157-1174
19. Yoshitake, S., Schach, B. G., Foster, D. C., Davie, E. W., and Kurachi, K. (1985) Biochemistry 24, 3736-3750
20. Silhavy, T. J., Berman, M. L., and Enquist, L. W. (1984) Experiments with Gene Fusions, pp. 140-141, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
21. Degen, S. J. F., MacGillivray, R. T. A., and Davie, E. W. (1983) Biochemistry 22, 2087-2097
22. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
23. Biggin, M. D., Gibson, T. J., and Hong, G. F. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 3963-3965
24. Poncz, M., Solowiejczyk, D., Ballantine, M., Schwartz, E., and Surrey, S. (1982) Proc. Natl. Acad. Sci. U. S. A. 79, 4298-4302
25. Saiki, R. K., Gelfand, D. H., Stoffel, S., Scharf, S. J., Higuchi, R., Horn, G. T., Mullis, K. B., and Erlich, H. A. (1988) Science 239, 487-491
26. Bell, G. I., Karam, J. H., and Rutter, W. J. (1981) Proc. Natl. Acad. Sci. U. S. A. 78, 5759-5763
27. Breathnach, R., Benoist, C., O'Hare, K., Gannon, F., and Chambon, P. (1978) Proc. Natl. Acad. Sci. U. S. A. 75, 4853-4857
28. Mount, S. M. (1982) Nucleic Acids Res. 10, 459-472
29. Sharp, P. A. (1981) Cell 23, 643-646
30. Blake, C. (1983) Nature 306, 535-537
31. Davie, E. W. (1987) in Hemostasis and Thrombosis (Colman, R. W., Hirsh, J., Marder, V. J., and Salzman, E. W., eds) 2nd Ed., pp. 242-267, J. B. Lippincott, Philadelphia, PA
32. Breathnach, R., and Chambon, P. (1981) Annu. Rev. Biochem. 50, 349-383
33. Fowlkes, D. M., Mullis, N. T., Comeau, C. M., and Crabtree, G. R. (1984) Proc. Natl. Acad. Sci. U. S. A. 82, 8710-8714
34. Adrian, G. S., Korinek, B. W., Bowman, G. H., and Yang, F. (1986) Gene (Amst.) 49, 167-175
35. Berget, S. M. (1984) Nature 309, 179-182
36. McLauchlan, J., Gaffney, D., Whitton, J. L., and Clements, J. B. (1985) Nucleic Acids Res. 13, 1347-1368
37. Ny, T., Elgh, F., and Lund, B. (1984) Proc. Natl. Acad. Sci. U. S. A. 81, 5355-5369
38. Riccio, A., Grimaldi, G., Verde, P., Sebastio, G., Boast, S., and Blasi, F. (1985) Nucleic Acids Res. 13, 2759-2771
39. Cool, D. E., and MacGillivray, R. T. A. (1987) J. Biol. Chem. 262, 13662-13673
40. Degen, S. J. F., and Davie, E. W. (1987) Biochemistry 26, 6165-6177
41. Gilbert, W. (1978) Nature 271, 501
42. Lindahl, G., Gersdorf, E., Menzel, H. J., Duba, C., Cleve, H., Humphries, S., and Utermann, G. (1989) Hum. Genet. 81, 149-152
43. Frank, S. L., Klisak, I., Sparkes, R. S., Mohandas, T., Tomlinson, J. E., McLean, J. W., Lawn, R. M., and Lusis, A. J. (1988) Hum. Genet. 79, 352-356
44. Weitkamp, L. R., Guttormsen, S. A., and Schultz, J. S. (1988) Hum. Genet. 79, 80-82
45. Murray, J. C., Buetow, K. H., Donovan, M., Hornung, S., Motulsky, A. G., Disteche, C., Dyer, K., Swisshelm, K., Anderson, J., Giblett, E., Sadler, E., Eddy, R., and Shows, T. B. (1987) Am. J. Hum. Genet. 40, 338-350
46. Hobart, M. J. (1979) Ann. Hum. Genet. Lond. 42, 419-423
47. Raum, D., Marcus, D., and Alper, C. A. (1980) Am. J. Hum. Genet. 32, 681-689
48. Nishimukai, H., Kera, Y., Sakata, K., and Yamasawa, K. (1981) Vox Sang. 40, 422-425
49. Dykes, D., Nelson, M., and Polesky, H. (1984) Electrophoresis 4, 417-420
50. Aoki, N., Tateno, K., and Sakata, Y. (1984) Biochem. Genet. 22, 871-881
51. Hayes, M. L., and Castellino, F. J. (1979) J. Biol. Chem. 254, 8772-8776
