
Transgenic Res (2008) 17:293-306 DOI 10.1007/s11248-007-9101-3

ORIGINAL PAPER

Transgene integration and organization in Cotton (Gossypium hirsutum L.) genome


Jun Zhang, Lin Cai, Jiaqin Cheng, Huizhu Mao, Xiaoping Fan, Zhaohong Meng, Ka Man Chan, Huijun Zhang, Jianfei Qi, Lianghui Ji, Yan Hong

Received: 9 January 2007 / Accepted: 18 April 2007 / Published online: 5 June 2007 Springer Science+Business Media B.V. 2007

Abstract While genetically modified upland cotton (Gossypium hirsutum L.) varieties are ranked among the most successful genetically modified organisms (GMO), there is little knowledge on transgene integration in the cotton genome, partly because of the difficulty in obtaining large numbers of transgenic plants. In this study, we analyzed 139 independently derived T0 transgenic cotton plants transformed by Agrobacterium tumefaciens strain AGL1 carrying a binary plasmid pPZP-GFP. It was found by PCR that as many as 31% of the plants had integration of

J. Zhang, L. Cai, J. Cheng, H. Mao, X. Fan, Z. Meng, K. M. Chan, H. Zhang, J. Qi, L. Ji, Y. Hong (corresponding author)
Temasek Life Sciences Laboratory, National University of Singapore, 1 Research Link, Singapore 117604, Singapore
e-mail: hongy@tll.org.sg
Present Address: J. Zhang, Shandong Cotton Research Center, Jinan, Shandong province 250100, P. R. China
Present Address: J. Cheng, Institute of Botany, Chinese Academy of Sciences, Beijing, P. R. China
Present Address: X. Fan, Z. Meng, H. Zhang, Shanxi Cotton Research Institute, Yuncheng, Shanxi, P. R. China

vector backbone sequences. Of the 110 plants with good genomic Southern blot results, 37% had integration of a single T-DNA, 24% had two T-DNA copies and 39% had three or more copies. Multiple copies of the T-DNA existed either as repeats in complex loci or in unlinked loci. Our further analysis of two T1 populations showed that segregants with a single T-DNA and no vector sequence could be obtained from T0 plants having multiple T-DNA copies and vector sequence. Out of the 57 T-DNA/T-DNA junctions cloned from complex loci, 27 had canonical T-DNA tandem repeats, while the rest (30) had deletions to T-DNAs or inclusion of vector sequences. Overlapping micro-homology was present for most of the T-DNA/T-DNA junctions (38/57). Right border (RB) ends of the T-DNA were precise while most left border (LB) ends (64%) had truncations to internal border sequences. Sequencing of collinear vector integration outside LB in 33 plants gave evidence that collinear vector sequence was determined in Agrobacterium culture. Among the 130 plants with characterized flanking sequences, 12% had the transgene integrated into coding sequences, 12% into repetitive sequences and 7% into rDNAs. Interestingly, 7% had the transgene integrated into chloroplast derived sequences. Nucleotide sequence comparison of target sites in the cotton genome before and after T-DNA integration revealed overlapping microhomology between target sites and the T-DNA (8/8), deletions to the cotton genome in most cases studied (7/8) and, in some cases, filler sequences (3/8).


This information on T-DNA integration in cotton will facilitate functional genomic studies and further crop improvement.

Keywords Transgene; Cotton (Gossypium hirsutum L.); Agrobacterium-mediated transformation; Transgene/plant genome junctions; Transgene repeats; Vector backbone integration

Introduction

The unique capability of Agrobacterium tumefaciens to integrate a defined segment of DNA into eukaryotic genomes has been adapted into a powerful tool for production of transgenic plants, random mutagenesis (Topping et al. 1995; Newell 2000) and functional genomic studies (Parinov and Sundaresan 2000). It has been suggested (De Buck et al. 1999; Tzfira et al. 2004) that integration of the T-DNA in the plant genome involves an illegitimate recombination mechanism based on microhomology, deletions, repair activities and insertions of filler DNA. The DNA to be transferred (T-DNA) by Agrobacterium is delineated by two similar, but not identical, 25 bp direct border repeats, the left border (LB) and the right border (RB) (Zambryski 1992). It was initially believed that T-DNA transfer was a polar process, starting from the right border and continuing through the T-DNA until the left border, and that only the DNA between the repeats, the T-DNA, was transferred to the plant cell (Zambryski 1988). However, many studies demonstrated that vector backbone sequences were also integrated very frequently into the plant genome (Cluster et al. 1996; Kononov et al. 1997; De Buck et al. 2000; Yin and Wang 2000; Olhoft et al. 2004; Lange et al. 2006). The transfer of vector backbone sequences was generally interpreted as the result of reading through loose left borders (LB) (Wenck et al. 1997; Lange et al. 2006). It was also suggested that this process occurred within the Agrobacterium cells where the T-strand was produced (De Buck et al. 2000). However, there has been no concrete evidence for this suggestion. Moreover, vector backbone sequences integrated into transgenic plants were seldom characterized at the sequence level. The presence of vector sequences not only influences transgene expression (Iglesias et al. 1997; Matzke and Matzke 1998; Jakowitsch et al. 1999) but also poses a biosafety concern.

Most of our knowledge on transgene integration and organization comes from studies on model plants like Arabidopsis, tobacco and rice. T-DNA integrates either at one locus or at several independent loci. Multiple T-DNA copies at one locus can be arranged as either direct or inverted repeats. Intact or partial LB and RB repeats are integrated into the plant genome (Brunaud et al. 2002). The recent finding of species-specific DNA double strand break repair mechanisms (Kirik et al. 2000) suggested possible variation in transgene integration among plants. A recent analysis of T-DNA integration into the barley genome found only directly repeated T-DNA integrations, a different situation from that reported in Arabidopsis and tobacco (Stahl et al. 2002).

For commercial development and for creating T-DNA tagged lines for functional genomics studies in crops, clean and simple insertions are generally required. In designing these projects, information on locus structure, frequency of T-DNA integration sites in the target plant genome and distribution of transgene loci is required to determine how many transgenic plants must be created. Such crop-specific information is scarce and generally lacks detail.

Upland cotton is the most important fiber crop for the textile industry, but little is known about transgene integration in cotton. In this study, we produced and analyzed 139 T0 transgenic cotton plants. Detailed characterization of the transgene in these T0 plants and some T1 plants was conducted with PCR and genomic Southern blot hybridization. T-DNA/cotton genome junctions and T-DNA/T-DNA junctions in complex loci were cloned by inverse PCR and sequenced. Our results indicated that as many as 31% of T0 transgenic plants had vector backbone sequence integrated, and around 60% had integration of a single copy or two copies of the T-DNA. Multiple copies of the T-DNA existed either as repeats in complex loci or in unlinked loci. Intact T-DNA repeats in complex loci were arranged mostly in tandem. Overlapping micro-homology was present for most T-DNA/T-DNA junctions. It was also found that T-DNA integration into the cotton genome was associated with micro-homology between target site and T-DNA, deletions and addition of filler sequences. Left border ends were found to be less precise, with many having internal border sequence truncated. Besides integration into unknown


sequences, T-DNAs also integrated frequently into coding sequences, repetitive sequences, rDNAs as well as chloroplast derived sequences. Our detailed characterization of vector sequences in 33 plants supports the suggestion that collinear vector sequence integration is determined within Agrobacterium.

Materials and methods

Binary vector construct and cotton transformation

To construct the binary plasmid pPZP-GFP (Fig. 1) used in this study, a 1863 bp fragment containing the 35S promoter and GFP2 gene was cut from the pGFP2 vector (Haseloff and Amos 1995) with the restriction enzymes PstI and EcoRI (New England Biolabs, Ipswich, MA) and inserted into the binary vector pPZP111 (Hajdukiewicz et al. 1994). The plasmid (Fig. 1) was subsequently transformed into Agrobacterium tumefaciens strain AGL1. Cotyledons of upland cotton (Gossypium hirsutum L.) variety Coker 312 were used as explants for generating transgenic cotton lines (Li et al. 2005). The antibiotic chloramphenicol was included in the Agrobacterium culture before co-cultivation. About 50 mg/l of Kanamycin was included in the somatic embryo induction media, and GFP fluorescence was also used for positive selection of transformants. T0 cotton plants were maintained in a greenhouse under natural lighting with a day length of about 12 h and temperatures ranging from 25 to 34 °C. They were allowed to self-pollinate in the greenhouse. Harvested T1 seeds were germinated and grown in the greenhouse for further analysis.

Genomic DNA isolation and genomic Southern blot analysis

Genomic DNAs were isolated from the leaf material of transgenic plants according to a modified cetyltrimethylammonium bromide (CTAB) method (Paterson 1993). A 20 µg portion of genomic DNA was digested with an appropriate restriction enzyme, separated on a 0.7% agarose gel and blotted onto Hybond + Nylon membrane (Stratagene, La Jolla, CA). Digoxigenin-11-dUTP labeled probes were prepared using Roche DIG Nuclear Detection System (Roche, Basel, Switzerland) and hybridized to blotted membranes (about 100 ng of probe per membrane). Probe labeling and hybridization were conducted according to Roche's DIG Application Manual. The hybridized membranes were incubated with anti-digoxigenin antibody conjugated with alkaline phosphatase (AP), and the fluorescent signal from the catalyzed substrate CDP-Star was detected on X-ray film.

PCR to characterize T-DNA integration and inverse PCR to amplify junction sequences

Table 1 lists all the PCR primers used in this study with location, orientation and sequence information. Inverse PCR was conducted based on Thomas et al. (1994). Primer pairs were used to amplify the gene of interest and the selection marker gene inside the T-DNA (1/2 for gfp and 3/4 for nptII) and vector backbone sequences outside LB and RB (5/6 and 7/8, respectively). About 20-50 ng of DNA was used as template for PCR amplification with 0.5 µM of each primer and 1 U of Taq polymerase (Qiagen, Chatsworth, CA) in a 25 µl reaction volume. The PCR reactions were initiated by heating the samples at 95 °C for 3 min, followed by 35 cycles of 95 °C for 20 s, 55 °C for 30 s and 72 °C for 60 s. An additional 10 min of extension at 72 °C was used to complete the PCR. All PCR reactions were performed in a PTC100 Thermal Cycler (MJ Research, Waltham, MA).

For inverse PCR cloning of junction sequences, 100 U of restriction enzyme EcoRI was used to digest 25 µg of DNA in a 100 µl reaction volume at 37 °C overnight. The digestion products were then purified

Fig. 1 Binary vector pPZP-GFP for cotton transformation. T-DNA sequence was numbered from RB towards LB. Primer pairs 1/2, 3/4 and 5/6 were used to detect the coding sequences for GFP (gfp), NPT-II (nptII) and chloramphenicol resistance protein (cmr), and also to generate probes for genomic Southern blot hybridization. Double arrowed nested PCR primers were used for cloning junction sequences. Restriction enzymes PstI, XbaI and EcoRI each had a unique cut site in the binary vector


Table 1 Details of PCR primers

Primer ID   Position and orientation on construct   Primer sequence (5′-3′)
1           F1251                                   ATGGGTAAGGGAGAAGAACTTTTCACTGG
2           R1792                                   TGGTCTGCTAGTTGAACGCTTCCATC
3           F3354                                   CTATTCGGCTATGACTGGGCACAACA
4           R4031                                   GTAAAGCACGAGGAAGCGGTCAGC
5           F4804                                   CAGTGGCGGTTTTCATGGCTTCTTGT
6           R5321                                   AGAAACTGCCGGAAATCGTCGTGGTA
7           F10141                                  CGGGTGCGGTCGATGATTAGGGAACG
8           R10729                                  TGATCCAACCCCTCCGCTGCTAT
9           F2017                                   AGATTGAATCCTGTTGCCGGTCTTG
10          F2026                                   CCTGTTGCCGGTCTTGCGATGATTAT
11          R2349                                   CATTAGGCACCCCAGGCTTTACACTT
12          R2340                                   CCCCAGGCTTTACACTTTATGCTTCC
13          R428                                    CATCTGTGGGTTAGCATTCTTTCTGA
14          R398                                    GAAAAGGCTAATCTGGGGACCTGCAG
15          F4059                                   ATCGCCTTCTATCGCCTTCTTGACG
16          F4062                                   GCCTTCTATCGCCTTCTTGACGAGT
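Basic properties of the primers in Table 1, such as length and GC content, can be checked directly from the listed sequences. The following is a minimal sketch and not part of the original methods; the helper names (gc_fraction, reverse_complement, primer_summary) are hypothetical, and only four of the sixteen primers are copied in for brevity.

```python
# Illustrative only: four primer sequences copied from Table 1 (IDs 1-4).
PRIMERS = {
    1: "ATGGGTAAGGGAGAAGAACTTTTCACTGG",
    2: "TGGTCTGCTAGTTGAACGCTTCCATC",
    3: "CTATTCGGCTATGACTGGGCACAACA",
    4: "GTAAAGCACGAGGAAGCGGTCAGC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def primer_summary(primers: dict) -> dict:
    """Map each primer ID to a (length, GC fraction) tuple."""
    return {pid: (len(seq), gc_fraction(seq)) for pid, seq in primers.items()}


if __name__ == "__main__":
    for pid, (length, gc) in primer_summary(PRIMERS).items():
        print(f"primer {pid}: {length} nt, GC = {gc:.0%}")
```

The remaining primers can be added to the dictionary in the same way; the reverse complement helper is convenient for reading the R-oriented primers on the same strand as the construct numbering.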

by phenol extraction and circularized with T4 ligase (New England Biolabs, Ipswich, MA) at 12 °C overnight. The nested primers numbered 9 to 16 were used for inverse PCR amplification of junction sequences. Inverse PCR was initiated using the ligation products as template with the first-round primers at relatively low annealing temperatures. The PCR was performed at 95 °C for 3 min followed by 35-40 cycles of 95 °C for 20 s, 55 °C for 30 s and 72 °C for 2.5 min, with a 10 min extension at 72 °C. The second round of nested PCR was carried out using 15 µl of first-round PCR product in a final volume of 100 µl with the nested primers at higher annealing temperatures.

Sequencing of junctions

PCR products were fractionated by 1% agarose gel electrophoresis and gel purified using the Qiagen Gel Extraction Kit (Qiagen). All purified fragments were used for direct sequencing and/or cloned into the pGEM-T Easy vector (Promega, Madison, WI) for sequencing. Direct PCR sequencing was carried out with the nested primers and BigDye chemicals, and sequences were analyzed on an ABI 3730XL DNA sequencer (Applied Biosystems, Foster City, CA). For fragments cloned into the pGEM-T Easy vector, T7 or SP6 primers were used for sequencing.

The flanking sequences in the cotton genome and the transgene junction sequences were analyzed by BLASTN DNA homology search against the NCBI (http://www.ncbi.nlm.nih.gov/BLAST) non-redundant nucleic acid database and aligned with the pPZP-GFP vector sequence. An E value of e-10 was adopted as the cut-off threshold for positive homology.

Characterization of pre-insertion sites in the cotton genome

For single-copy transgenic plants with good genomic flanking sequences at both RB and LB, one primer was designed from the RB flanking sequence and another from the LB flanking sequence. Such primer pairs were used to amplify the pre-insertion sites from non-transgenic (wild type) cotton genomic DNA. The amplified products were either directly sequenced with the PCR primers or cloned into the pGEM-T Easy vector and sequenced. Pre-insertion sequences were then aligned with insertion loci to evaluate changes caused by transgene integration.

Characterization of transgenic loci

Five transgene loci were fully characterized by combining sequences of their T-DNA junction(s) and plant genome/T-DNA junctions with reference to genomic Southern blot hybridization results.
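The E value cut-off used for the BLASTN homology search in the Methods can be illustrated with a short filter. This is a sketch under stated assumptions, not the procedure used in the study: it assumes the hits are available as 12-column tab-separated text (the standard BLAST+ tabular layout, with the E value in the eleventh column), treats the cut-off as inclusive, and uses a hypothetical function name (best_hits) and made-up query and subject identifiers.

```python
E_VALUE_CUTOFF = 1e-10  # threshold stated in the Methods; inclusive use is an assumption


def best_hits(tabular_text: str, cutoff: float = E_VALUE_CUTOFF) -> dict:
    """Return {query: (subject, evalue)} for the lowest-E-value hit per query.

    Assumes 12 tab-separated columns per line:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    Queries with no hit at or below the cut-off are absent from the result
    and would be scored as "unknown".
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > cutoff:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


if __name__ == "__main__":
    # Hypothetical hits for two flanking sequences.
    example = "\n".join([
        "flank_001\tsubjA\t98.5\t400\t6\t0\t1\t400\t1000\t1399\t3e-42\t170",
        "flank_001\tsubjB\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-5\t50",
        "flank_002\tsubjC\t85.0\t60\t9\t0\t1\t60\t1\t60\t0.002\t30",
    ])
    print(best_hits(example))
```

In this toy input only flank_001 retains a hit; flank_002 has no alignment at or below the cut-off and would fall into the "unknown" category.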

Results

PCR characterization of transgenic cotton plants

T0 transgenic cotton plants were produced by Agrobacterium-mediated transformation with the simple binary plasmid pPZP-GFP (Fig. 1). Transformants were selected by both Kanamycin resistance and GFP fluorescence during transformation and plantlet regeneration. With this selection strategy, plants with GFP silenced were not included. Various primers were designed to amplify specific regions inside and outside RB and LB. The PCR results are summarized in Table 2. PCR analysis showed that most plants (69%, 96/139) had both gfp and nptII and were free of vector sequence. As many as 31% of T0 plants (43/139) had integration of vector sequences. Most of these plants (37/43) had vector sequence outside LB. Of the 28 plants with vector sequence near RB, most (79%, 22/28) also had vector sequences near LB.

T-DNA copy number evaluation by genomic Southern blot analysis

To check the pattern and copy number of T-DNA insertions, genomic Southern blot analysis was conducted with probes specific to gfp and nptII (inside the T-DNA borders) and to cmr, which is outside the LB (see Fig. 1). DNAs (20 µg each) were digested with restriction enzyme XbaI (with a single cutting site before gfp) and hybridized to gfp, nptII and cmr probes. For 110 plants, we obtained a clear, clean and unique banding pattern for each. Among them, 41 (37%) had a single copy T-DNA, 26 (24%) had two copies while 43 (39%) had three or more copies (multiple-copy events). A total of 35 plants (32%) tested positive for the cmr probe, consistent with the percentage of outside-LB vector backbone sequence integration detected by PCR (27%, 37/139). An example of the hybridization results is shown in Fig. 2. Most transgenic T0 plants had the same number of copies for gfp (Fig. 2A) and nptII (Fig. 2B). There were eight with different copy numbers for gfp and nptII. Plants 2, 8, 17 and 23 had more gfp bands while plants 6, 9, 10 and 20 had more nptII bands. The results indicated partial integration of T-DNA for some copies. This partial integration had no obvious bias toward gfp or nptII, probably due to the use of positive selection for both GFP and Kanamycin resistance. Copy number for vector backbone sequence (cmr, Fig. 2C) was always the lowest. Vector backbone integration occurred not only in plants with multiple copies of the T-DNA but also in some plants with single copy T-DNA integration.

It has been suggested that multiple copy T-DNAs can either integrate into one single site as repeats to form a so-called complex T-DNA locus (De Buck et al. 1999) or insert into independent sites. Since such information is not available from the above genomic Southern blot hybridization with an enzyme cutting inside the T-DNA (XbaI), we digested the same genomic DNAs with an enzyme (BamHI) with no cut site on the plasmid to determine the number of insertion loci. Genomic Southern blot hybridization to the gfp probe after BamHI digestion clearly showed that some T0 plants had more than one insertion locus (Fig. 2D). Some plants had one single band much larger than the T-DNA (4.7 kb), suggesting the presence of complex T-DNA loci or possible inclusion of the whole binary plasmid (10.7 kb). To further study multiple loci T-DNA integration, we investigated the banding patterns of T1 populations generated from two T0 transgenic plants G9 and

Table 2 Summary of vector backbone integration in cotton genome

Category                                           Number   Percentage (%)
Total number of T0 plants                          139      100
Plants free of vector backbone                     96       69.1
Plants with vector backbone near LB or RB          21       15.1
  Near LB only                                     15
  Near RB only                                     6
Plants with vector backbone near both RB and LB    22       15.8
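The percentages quoted in the text for vector backbone integration can be re-derived from the counts in Table 2. The short script below is only an arithmetic cross-check added for illustration; the variable and function names are hypothetical, and the counts are copied from Table 2.

```python
# Counts copied from Table 2.
TOTAL = 139
counts = {
    "free of vector backbone": 96,
    "near LB only": 15,
    "near RB only": 6,
    "near both RB and LB": 22,
}


def percent(n: int, total: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * n / total, 1)


# The four mutually exclusive categories should account for every plant.
assert sum(counts.values()) == TOTAL

with_vector = TOTAL - counts["free of vector backbone"]            # any backbone
outside_lb = counts["near LB only"] + counts["near both RB and LB"]  # backbone at LB
outside_rb = counts["near RB only"] + counts["near both RB and LB"]  # backbone at RB

if __name__ == "__main__":
    print("free of vector:", percent(counts["free of vector backbone"], TOTAL))
    print("any vector backbone:", with_vector, percent(with_vector, TOTAL))
    print("outside LB:", outside_lb, percent(outside_lb, TOTAL))
    print("RB plants also positive at LB:",
          percent(counts["near both RB and LB"], outside_rb))
```

The derived values (43 plants with any backbone, 37 with backbone outside LB, 28 with backbone near RB) match the fractions given in the PCR results.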


Fig. 2 Transgene integration evaluation by genomic Southern blot analysis. Transgene copy numbers were evaluated by XbaI digestion of genomic DNAs followed by hybridization to gfp (A), nptII (B) and cmr (C). Number of insertion loci was evaluated by BamHI (an enzyme that does not cut the vector) digestion followed by hybridization to gfp (D). M: DIG-labeled Lambda DNA/HindIII ladders. Lanes 1-25 were independently derived T0 transgenic plants

47. XbaI or BamHI digested genomic DNAs were probed with gfp (Fig. 3A1, A2). Genomic Southern blot hybridization indicated that G9 had two T-DNAs in two insertion loci while 47 had five T-DNAs in >3 insertion loci (Fig. 3A1, A2). Both T0 plants had one copy of vector sequence (data not shown). In both T1 populations, segregants containing one single copy of gfp and without vector sequence were obtained. Genomic DNAs from 27 T1 segregants of T0 plant 47 were digested with XbaI and hybridized to gfp and cmr probes (examples in Fig. 3B1, B2). Eleven plants had five T-DNAs like the parent, seven plants had four T-DNAs, six plants had two T-DNAs and three plants had one single T-DNA copy (represented by 47T1-15, T1-22, T1-12 and T1-9 respectively, Fig. 3A). The segregants with one or two T-DNA copies (e.g., T1-41, T1-56, T1-59) were free of vector sequence, indicating that these T-DNA copies were not linked to the vector sequence in the T0 parent plant.

Canonical T-DNA ends in the cotton genome

We defined canonical T-DNA as T-DNA limited between the two borders with no vector backbone integration. Canonical T-DNA/plant genome junctions were cloned by inverse PCR using two pairs of nested primers after EcoRI digestion of genomic DNAs.

Primer pairs 9/14 followed by 10/13 were used to clone junctions with RB, and primer pairs 15/11 followed by 16/12 were used to clone junctions with LB (Fig. 1). A total of 121 RB junctions and 103 LB junctions were cloned and sequenced. The distribution of RB and LB borders is summarized in Fig. 4. We found that RB integration was more precise, with 51% (62/121) of junctions having at least part of the RB repeat and 41 (34%) ending at the same nucleotide in the RB repeat (base 196). Some plants had deletions of RB sequences (Fig. 4B). The longest deletion was 114 bp. For the LB junctions, only 36% (37/103) contained part of the LB repeat and most (66/103, 64%) had no LB repeat sequence at all. Deletions to LB sequences were more frequent and longer, with the longest deletion encompassing 196 bp of the LB sequence (Fig. 4C). Right border ends were mostly G or C (90/121, 74%) while LB ends were mostly A or T (67/103, 65%).

Characterization of integrated vector sequences

Integrated vector sequences outside LB were cloned by inverse PCR from 33 plants. Junctions of these vector sequences with the plant genome are listed in Fig. 5 with information on ending nucleotides, length of vector backbone integrated as well as their downstream sequences on the vector. These vector sequences


were all continuous with LB with no mutation detected. They were all larger than 1 kb and encompassed the complete cmr coding sequence and part of the promoter region. The ends of these integrated vector sequences were mostly G or C (27/33), similar to RB ends but different from LB ends (mostly A or T). Some vector sequences ended at the same nucleotides. Also, there was no significant base pair bias in upstream and downstream sequences of the ending nucleotides (Fig. 5).

T-DNA repeats in complex transgene loci

As shown in Fig. 2, many plants had multiple copies of T-DNA inserted into a single locus (complex loci). Our inverse PCR strategy also cloned 57 of the junctions between repeats in complex loci. Some of these repeat junction structures are listed in Fig. 6. An obvious feature is the overlapping micro-homology sequences at junctions between two T-DNA units in more than half of the repeat junctions (67%, 38/57). Such overlapping micro-homology sequences range from 1 to 7 bp in length (Fig. 6A). Filler sequences (Fig. 6B) of unknown origin were present in only a few junctions (12%, 7/57). Based on the report by Tzfira et al. (2004), we defined the RB end as head and the LB end as tail. Tail-to-head structures are in direct orientation while all other structures, formed by various tail-to-tail and head-to-head joining, are in inverted orientation. We found that nearly half of the junctions (47%, 27/57) had canonical T-DNA repeats linked tail-to-head. The rest of the junctions (53%, 30/57) had unusual structures with truncated T-DNA or inclusion of vector sequences (long head or tail). Figure 6C shows the following types of unusual junction structures:

(1) long tail to normal head (LT-NH)
(2) normal tail to long tail (NT-LT)
(3) truncated tail to truncated tail (TT-TT)
(4) truncated head to normal head (TH-NH)
(5) normal tail to long transfer head (NT-LH) and
(6) long tail to truncated head (LT-TH)

It is noted that microhomology and filler sequences were also present in these unusual junction structures.

Fig. 3 Segregation of transgenes in T1 populations. (A) DNA samples from the two T0 plants (G9T0, 47T0) and their T1 populations (G9T1-xx and 47T1-xx) were digested with XbaI (A1), BamHI (A2) and probed with gfp. M was the DNA ladder; (B) DNA samples for one T0 plant (47T0) and its T1 populations (T1-xx) were digested with XbaI and probed with gfp (B1) and cmr (B2)

Distribution of transgene loci in the cotton genome

For 130 transgenic plants, at least one T-DNA flanking cotton genome sequence was cloned and sequenced (all genome flanking sequences >400 bp). A homology search against the NCBI nucleic acid non-redundant database was conducted and identities of these genomic border sequences are listed in Table 3. More than half (55%, 71/130) of these cotton genome flanking sequences had no hits in the database (unknown). Due to the very limited coverage of the cotton genome in the database, these insertions could be either in not yet identified genes or in intergenic regions. Six (5%) had homologues in sequenced cotton BAC clones but with unknown functions (cotton genome). Sixteen plants (12%, 16/130) had the T-DNA inserted into protein coding sequences, which included those for an RNA recognition motif containing protein, embryogenesis abundant protein, glycerol-3-phosphate dehydrogenase,


Fig. 4 Cotton genome/T-DNA junctions in transgenic cotton plants. (A) Distribution of transgene ends near the RB and the LB. Y-axis represents frequency of occurrence. Border repeat sequences are boxed; (B) Plants with deletions further inside the RB; (C) Plants with deletions further inside the LB. Frequency of occurrence is indicated by number of arrows

Fig. 5 Ends of integrated vector sequences outside the LB. Vector backbone sequence/cotton genome junctions were cloned and sequenced for 33 transgenic cotton plants integrated with collinear vector backbone sequence outside LB. The ending nucleotides are listed with 25 bp upstream vector sequence and 25 bp downstream vector sequence. Most of the ending nucleotides were G or C (bolded)

cotton Lea4-A and one MADS box protein. It was noted that 13% of T-DNA integrations were in repetitive sequence regions, 7% in rDNA regions, 7% in chloroplast derived sequences and 1.5% in microsatellite regions.

Table 3 Distribution of transgene loci in cotton genome

Category               Frequency   Percentage (%)
Unknown                71          54.6
Coding sequence        16          12.3
Repetitive sequence    16          12.3
Cotton genome          6           4.6
Chloroplast derived    9           6.9
rDNA                   9           6.9
Microsatellite         2           1.5
Total                  130         100.0

Changes to cotton genome sequence by T-DNA integration

For single T-DNA copy plants with both RB and LB flanking sequences characterized, primers were designed to amplify pre-insertion sites (target sites) from a non-transgenic control plant. These pre-insertion sequences were compared with genomic sequences after T-DNA integration. Eight T-DNA integration sites are listed in Fig. 7. Microhomology of 1-5 bp between T-DNA borders (RB or LB) and genomic DNA was present for all the plants characterized. Deletion of genome sequence (7/8) was a general feature and ranged in length from a few base pairs to 67 base pairs. Filler sequences were also detected for three plants. The fillers had no strong homology with vector sequence and hence most likely originated from the plant genome. They all contained small repeats in direct or inverted orientations.

Examples of transgene loci in the cotton genome

Five transgene loci were fully characterized (Fig. 8) by combining sequencing data of T-DNA/T-DNA junctions, cotton genome/T-DNA junctions and genomic Southern blot hybridization results. There was one locus with integration of a single clean T-DNA copy (Fig. 8a; Fig. 2, lane 4). Another locus had two canonical T-DNAs arranged in tandem and linked to a partial T-DNA with only gfp (Fig. 8b), consistent with the extra gfp band revealed by genomic Southern blot hybridization (Fig. 2, lane 2). The third locus had one canonical T-DNA linked to a partial T-DNA with only the selection marker (nptII; Fig. 8c; Fig. 2, lane 7). From a multiple loci transgenic plant, one locus was fully characterized with two tandem repeated canonical T-DNAs, between which there was a long region of vector sequence >2 kb (Fig. 8d; Fig. 2, lane 10). A transgenic plant (Fig. 8e) had a single locus insertion but had the vector region outside LB translocated to the front of RB. This unusual structure explains why the cmr hybridizing band differed in size from those detected with the gfp and nptII probes (Fig. 2, lane 1).

Fig. 6 Junction structures for T-DNA repeats in complex loci. (A) Junction structures for tandem repeats. The bold italic letters are the overlapping micro-homology bases; (B) Junction structures with filler sequences between two transgene units; (C) Some unusual junction structures with either deletions to borders or inclusion of vector sequences


Fig. 7 Changes to cotton genome by T-DNA integration. (a-h) are eight pre-insertion sequences with information on T-DNA insertion (above). Deleted nucleotides are boxed. The bolded and italicized nucleotides had microhomology. Filler sequences are listed below pre-insertion sequences. Repeated sequences inside filler sequences are indicated with arrows
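The distinction between overlapping microhomology and filler sequence at a junction can be made concrete with a small comparison routine. This is an illustrative sketch only, not the alignment procedure used in the study: it assumes the junction read and the two parental sequences have already been trimmed to the same window, with the left parent aligned to the start of the junction and the right parent aligned to its end. The function names and the example sequences are hypothetical.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Number of leading positions at which a and b are identical."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def classify_junction(junction: str, left_parent: str, right_parent: str):
    """Classify a junction as microhomology, filler or blunt.

    left_parent is aligned to the START of junction and right_parent to its END.
    Bases that match both parents around the breakpoint are overlapping
    microhomology; bases that match neither parent are filler.
    """
    left_match = common_prefix_len(junction, left_parent)
    right_match = common_prefix_len(junction[::-1], right_parent[::-1])
    overlap = left_match + right_match - len(junction)
    if overlap > 0:
        start = len(junction) - right_match
        return ("microhomology", junction[start:left_match])
    if overlap < 0:
        return ("filler", junction[left_match:len(junction) - right_match])
    return ("blunt", "")


if __name__ == "__main__":
    # Made-up sequences: the junction shares 4 bases with both parents.
    print(classify_junction("AAACCTCTATTTGG", "AAACCTCTAGGGGG", "GGGGGTCTATTTGG"))
    # Made-up sequences: 3 bases at the junction match neither parent.
    print(classify_junction("AAACCGTGTTTGG", "AAACCAAAAAAAA", "CCCCCCCCTTTGG"))
```

Because the routine simply extends each parental match as far as it goes, any chance identity next to the breakpoint is counted as microhomology, which is how such overlaps are usually scored at the sequence level.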

Discussion

Border sequences in transgenic cotton

In our study, a total of 121 genome/RB junctions and 103 genome/LB junctions were sequenced. Right border ends were found to be more precise, with 51% (62/121) containing at least part of the RB repeat. Most of them (41/62) ended at base 196, a C nucleotide, presumably the point at which T-DNA nicking occurred. Left border ends were less precise, with only 36% having any LB repeat sequence and only one plant having the full length LB repeat. Others had deletions of up to 196 bp of left border sequence. Because of the proximity of the nptII gene to LB in our plasmid, we could have missed those with longer deletions. This is a different situation from that in Arabidopsis, for which T-DNA integration was precise for both LB and RB (Forsbach et al. 2003), and as many as 24% of transgenic plants had the full length LB repeat (Brunaud et al. 2002). This result suggests that in the cotton genome, recognition of the left border sequence of the T-DNA was less stringent. This agrees well with the high percentage of transgenic lines (27%) with collinear integration of vector sequence outside LB.

Complex transgene loci with vector backbone and truncated transgene in cotton

To explain the formation of transgene repeats, a model for T-DNA integration via double strand


Fig. 8 Examples of transgene loci in cotton genome. Numbering starts from RB of the T-DNA. Junction sequences of T-DNA units are listed above with overlapping microhomology shown in bold italic. One filler sequence (below) had short inverted repeats (bold italic) at the two ends

intermediates was proposed (De Neve et al. 1997). According to this model, the VirD2 molecule remains attached to the 50 -end (the head) of the T-strand even after its conversion to a double strand form. Potentially, VirD2 molecules that are attached to two T-DNAs can bind to each other and bring the T-DNAs together in a head-to-head orientation before their ligation by host factors. Such model suggests more head-to-head joining of T-DNAs than head-totail or tail-to-tail joining. Our nding was different from such a prediction with about half (27/57) of repeat junctions having canonical T-DNA repeats linked head-to-tail as the typical transgene repeat structure. Only three junctions had a canonical T-DNA joined head-to-head with another highly truncated T-DNA (one example is given in Fig 6C). The two junctions with T-DNAs tail-to-tail joined also had truncation to one T-DNA (Fig 6C). This is similar to the recent nding of mostly tandem repeats in transgenic soybean lines revealed by genomic Southern blot analysis (Olhoft et al. 2004). Since most of the tandem repeat junctions had microhomology sequences, we suggest that overlapping microhomology and some cotton specic host factor could have promoted head-to-tail ligation of T-DNA before their integration into a single locus in cotton. Despite the fact that some junction structures seemed to occur at relatively high frequencies, such as junctions formed by 4408/197 with microhomology of TCTA (7/57) and 4442/193 with CTGC (3/57, see

Fig. 6A), repeat junction structure is much less conserved than aspen in which an identical repeat junction was found in all ten transgenic lines that had direct repeats (Kumar and Fladung 2000). Our study also characterized many junction structures formed by truncated T-DNAs and T-DNAs with vector backbone sequences in direct and inverted orientations. Microhomology was common in these atypical transgene repeat structures (See Fig. 6C). Our analysis of complex loci structure supports the view that synapsis mediated by microhomology is important in the formation of complex loci. Generally, complex loci in cotton lack ller sequences (only in 5 of 57 junctions) but have frequent truncations to T-DNA. In summary, our analysis of T-DNA/T-DNA and T-DNA/genomic junctions in transgenic cotton plants suggested that transgene integration in cotton occurs by illegitimate recombination that is characterized by the presence of microhomology and ller sequences. It is noted that such illegitimate recombination was also implicated in transgenic plants generated by particle bombardment. Kohli et al. (1999) characterized 12 transgenic rice lines and observed microhomology at the junctions of ten lines. Southern blot analysis and sequencing of plasmid/plasmid junctions also identied multiple plasmids joined as repeats that were free of intervening genomic DNAs (Kohli et al. 1998; Kohli et al. 1999). In view of this similarity between agrobacterium generated cotton plants and particle bombardment generated rice plants, function of Vir protein in the formation of complex loci becomes questionable. Distribution of transgene loci In the Arabidopsis genome where gene density is high, mapping sequences of single copy T-DNA insertions on chromosomes did not reveal any apparent bias of T-DNA insertion for any chromosome. The observed distribution of T-DNA insertions in intergenic sequence versus gene sequence appeared randomly (Forsbach et al. 2003). 
However, different integration patterns have been reported in other plants. In rice, as many as 45% of T-DNA integrations occurred in genes, which represent only 10-25% of the genome (Jeong et al. 2006). In our study, among the 130 T0 transgenic cotton plants with genome flanking sequences, more than half (54.6%) had T-DNA integrated into regions with no sequence homology hits in the GenBank non-redundant nucleic acid database (nr). These regions may be novel genes or intergenic regions that have not yet been sequenced. Six had hits in cotton genome sequence derived from sequenced cotton BAC clones, most likely intergenic regions but possibly novel genes. About 12% had T-DNA integrated into protein-coding regions. About 13% of T-DNA integrations were in repetitive sequence, less than the predicted 30-36% share of repetitive sequences in the tetraploid cotton genome (Zhao et al. 1995).

There are only four rDNA loci in tetraploid G. hirsutum, two in each of the A and D subgenomes (Crane et al. 1993). We found that as many as 7% of T0 plants had T-DNAs inserted into rDNA. Such a high percentage of integration into the non-randomly distributed rDNA of the cotton genome suggests preferential T-DNA integration into these actively transcribed regions in cotton. This result is in line with the finding that in Arabidopsis, Ds insertion lines showed a preference for transposition to regions adjacent to the nucleolar organizer regions (Parinov et al. 1999; Raina et al. 2002). However, it differs from the report that in Arabidopsis, none of the 112 single-copy T-DNA insertion lines had integration into rDNA (Forsbach et al. 2003). Since T-DNA integration into actively transcribed regions like rDNA could potentially lead to efficient transcription, it is possible that our selection strategy favored transgenic plants with efficient expression of the selection marker gene and/or gene of interest. It remains to be investigated whether transcription of gfp or nptII in these lines is significantly higher or more stable than in other lines.

Another surprising finding was transgene integration into sequences homologous to the chloroplast genome (7%). It is unlikely that the T-DNAs were integrated into the chloroplast genome, since most of the sequences were only partially homologous to chloroplast sequences. A more likely explanation is that the T-DNAs integrated into chloroplast-derived regions in the nuclear genome. The reason for such a high frequency remains to be further investigated.
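The comparison between the observed share of insertions in repetitive sequence (about 13% of 130 plants) and the predicted 30-36% repeat content can be made concrete with a one-sided binomial tail probability. The sketch below is illustrative only and is not an analysis performed in this study. It assumes that insertions are independent, that under the null hypothesis each insertion lands in repetitive sequence with probability equal to the repeat fraction of the genome, and that the count of 17 plants (13% of 130, rounded) approximates the actual number. It also ignores any bias in recovering or annotating flanking sequences from repeats.

```python
from math import comb


def binom_lower_tail(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Number of plants with characterized flanking sequences (from the text).
n_plants = 130

# Assumed count: "about 13%" of 130 plants, rounded to a whole number.
observed = round(0.13 * n_plants)

# Null hypothesis: insertion probability equals the predicted repeat
# fraction of the tetraploid cotton genome (30-36%, Zhao et al. 1995).
for expected_fraction in (0.30, 0.36):
    p_value = binom_lower_tail(observed, n_plants, expected_fraction)
    print(
        f"repeat fraction {expected_fraction:.2f}: "
        f"P(X <= {observed}) = {p_value:.2e}"
    )
```

A small tail probability under these assumptions would be consistent with under-representation of insertions in repetitive sequence, although selection for expressing transformants could produce the same pattern.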

Collinear vector integration is determined in agrobacterium

In our study, 31% (43/139) of transgenic cotton plants had vector sequences outside LB or RB integrated. This confirmed the findings by others that vector backbone DNA can be transferred and integrated into the plant genome (Wenck et al. 1997; Yin and Wang 2000; Forsbach et al. 2003). We suggest that in cotton, vector sequence integration mostly happens collinearly with the LB of the T-DNA, for the following reasons. Firstly, 37 of the 43 vector-containing plants had vector sequence outside LB. Secondly, most plants with vector sequence outside RB also had vector sequence outside LB (22/28). Finally, inverse PCR cloned vector sequences linked to LB from 33 plants. In line with this suggestion, we also found that five of them might contain a full copy of the binary vector (data not shown).

Vector sequences in transgenic plants have rarely been characterized at the sequence level, and our sequence analysis of these 33 integrated vector sequences provided insight into the mechanism of vector sequence integration in the plant. All of them were found in conjunction with an LB without mutation, suggesting that the LB was skipped during T-DNA formation. All also had the full coding sequence of the chloramphenicol resistance protein (cmr) located outside LB. This surprising finding correlated with our inclusion of chloramphenicol in the agrobacterium culture medium. Such a strong correlation between the antibiotic selection in the agrobacterium culture medium and the integration of the antibiotic resistance gene in transgenic plants strongly supports the suggestion that collinear vector integration is determined in agrobacterium before co-cultivation with plant cells (De Buck et al. 2000). With this finding, minimizing long vector integration is possible by using a negative selection marker after LB. It was also noted that these vector sequences ended mostly at G or C (27/33), a pattern similar to RB ends. It is possible that recognition of the end of the collinear vector sequence is similar to that for RB.

Feasibility of getting clean single copy plants in T1 generations

Because of concerns about the expression and stability of a transgene, it has been general practice in commercial development to discard T0 transgenic lines with multiple copies of T-DNA. Such practice limits the number of useful lines available for further development. In this study, we demonstrated that clean single-copy transgenic plants could be obtained from two T1 populations derived from T0 plants with multiple copies of T-DNA and vector backbone sequence. This validates the alternative approach of proceeding with multiple-copy T0 plants with desirable phenotypes and obtaining clean single-copy segregants in the next generation.

Acknowledgements This project was supported by an internal research grant of Temasek Life Sciences Laboratory, Singapore. We would like to thank Tan Jason and Ng Khar Meng for technical help.

References

Brunaud V, Balzergue S, Dubreucq B, Aubourg S, Samson F, Chauvin S, Bechtold N, Cruaud C, DeRose R, Pelletier G, Lepiniec L, Caboche M, Lecharny A (2002) T-DNA integration into the Arabidopsis genome depends on sequences of pre-insertion sites. EMBO Rep 3:1152-1157
Cluster PD, O'Dell M, Metzlaff M, Flavell RB (1996) Details of T-DNA structural organization from a transgenic Petunia population exhibiting co-suppression. Plant Mol Biol 32:1197-1203
Crane CF, Price HJ, Stelly DM, Czeschin DG, McKnight TD (1993) Identification of a homeologous chromosome pair by in situ DNA hybridization to ribosomal RNA loci in meiotic chromosomes of cotton (Gossypium hirsutum). Genome 36:1015-1022
De Buck S, De Wilde C, Van Montagu M, Depicker A (2000) Determination of the T-DNA transfer and the T-DNA integration frequencies upon cocultivation of Arabidopsis thaliana root explants. Mol Plant Microbe Interact 13:658-665
De Buck S, Jacobs A, Van Montagu M, Depicker A (1999) The DNA sequences of T-DNA junctions suggest that complex T-DNA loci are formed by a recombination process resembling T-DNA integration. Plant J 20:295-304
De Neve M, De Buck S, Jacobs A, Van Montagu M, Depicker A (1997) T-DNA integration patterns in co-transformed plant cells suggest that T-DNA repeats originate from cointegration of separate T-DNAs. Plant J 11:15-29
Forsbach A, Schubert D, Lechtenberg B, Gils M, Schmidt R (2003) A comprehensive characterization of single-copy T-DNA insertions in the Arabidopsis thaliana genome. Plant Mol Biol 52:161-176
Hajdukiewicz P, Svab Z, Maliga P (1994) The small, versatile pPZP family of Agrobacterium binary vectors for plant transformation. Plant Mol Biol 25:989-994
Haseloff J, Amos B (1995) GFP in plants. Trends Genet 11:328-329
Iglesias VA, Moscone EA, Papp I, Neuhuber F, Michalowski S, Phelan T, Spiker S, Matzke M, Matzke AJ (1997) Molecular and cytogenetic analyses of stably and unstably expressed transgene loci in tobacco. Plant Cell 9:1251-1264
Jakowitsch J, Papp I, Moscone EA, van der Winden J, Matzke M, Matzke AJ (1999) Molecular and cytogenetic characterization of a transgene locus that induces silencing and methylation of homologous promoters in trans. Plant J 17:131-140
Jeong DH, An S, Park S, Kang HG, Park GG, Kim SR, Sim J, Kim YO, Kim MK, Kim J, Shin M, Jung M, An G (2006) Generation of a flanking sequence-tag database for activation-tagging lines in japonica rice. Plant J 45:123-132
Kirik A, Salomon S, Puchta H (2000) Species-specific double-strand break repair and genome evolution in plants. EMBO J 19:5562-5566
Kohli A, Griffiths S, Palacios N, Twyman RM, Vain P, Laurie DA, Christou P (1999) Molecular characterization of transforming plasmid rearrangements in transgenic rice reveals a recombination hotspot in the CaMV 35S promoter and confirms the predominance of microhomology mediated recombination. Plant J 17:591-601
Kohli A, Leech M, Vain P, Laurie DA, Christou P (1998) Transgene organization in rice engineered through direct DNA transfer supports a two-phase integration mechanism mediated by the establishment of integration hot spots. Proc Natl Acad Sci U S A 95:7203-7208
Kononov ME, Bassuner B, Gelvin SB (1997) Integration of T-DNA binary vector backbone sequences into the tobacco genome: evidence for multiple complex patterns of integration. Plant J 11:945-957
Kumar S, Fladung M (2000) Transgene repeats in aspen: molecular characterisation suggests simultaneous integration of independent T-DNAs into receptive hotspots in the host genome. Mol Gen Genet 264:20-28
Lange M, Vincze E, Moller MG, Holm PB (2006) Molecular analysis of transgene and vector backbone integration into the barley genome following Agrobacterium-mediated transformation. Plant Cell Rep 25:815-820
Li XB, Fan XP, Wang XL, Cai L, Yang WC (2005) The cotton ACTIN1 gene is functionally expressed in fibers and participates in fiber elongation. Plant Cell 17:859-875
Matzke AJ, Matzke MA (1998) Position effects and epigenetic silencing of plant transgenes. Curr Opin Plant Biol 1:142-148
Newell CA (2000) Plant transformation technology. Developments and applications. Mol Biotechnol 16:53-65
Olhoft PM, Flagel LE, Somers DA (2004) T-DNA locus structure in a large population of soybean plants transformed using the Agrobacterium-mediated cotyledonary-node method. Plant Biotechnol J 2:289-300
Parinov S, Sevugan M, Ye D, Yang WC, Kumaran M, Sundaresan V (1999) Analysis of flanking sequences from dissociation insertion lines: a database for reverse genetics in Arabidopsis. Plant Cell 11:2263-2270
Parinov S, Sundaresan V (2000) Functional genomics in Arabidopsis: large-scale insertional mutagenesis complements the genome sequencing project. Curr Opin Biotechnol 11:157-161
Paterson AH (1993) A rapid method for extraction of cotton genomic DNA suitable for RFLP or PCR analysis. Plant Mol Biol Reptr 1:122-127
Raina S, Mahalingam R, Chen F, Fedoroff N (2002) A collection of sequenced and mapped Ds transposon insertion sites in Arabidopsis thaliana. Plant Mol Biol 50:93-110
Stahl R, Horvath H, Van Fleet J, Voetz M, von Wettstein D, Wolf N (2002) T-DNA integration into the barley genome from single and double cassette vectors. Proc Natl Acad Sci U S A 99:2146-2151
Thomas CM, Jones DA, English JJ, Carroll BJ, Bennetzen JL, Harrison K, Burbidge A, Bishop GJ, Jones JD (1994) Analysis of the chromosomal distribution of transposon-carrying T-DNAs in tomato using the inverse polymerase chain reaction. Mol Gen Genet 242:573-585
Topping JF, Wei W, Clarke MC, Muskett P, Lindsey K (1995) Agrobacterium-mediated transformation of Arabidopsis thaliana. Application in T-DNA tagging. Methods Mol Biol 49:63-76
Tzfira T, Li J, Lacroix B, Citovsky V (2004) Agrobacterium T-DNA integration: molecules and models. Trends Genet 20:375-383
Wenck A, Czako M, Kanevski I, Marton L (1997) Frequent collinear long transfer of DNA inclusive of the whole binary vector during Agrobacterium-mediated transformation. Plant Mol Biol 34:913-922
Yin Z, Wang GL (2000) Evidence of multiple complex patterns of T-DNA integration into the rice genome. Theor Appl Genet 100:461-470
Zambryski P (1988) Basic processes underlying Agrobacterium-mediated DNA transfer to plant cells. Annu Rev Genet 22:1-30
Zambryski PC (1992) Chronicles from the Agrobacterium-Plant Cell-DNA Transfer Story. Annu Rev Plant Physiol Plant Mol Biol 43:465-490
Zhao X, Wing RA, Paterson AH (1995) Cloning and characterization of the majority of repetitive DNA in cotton (Gossypium L.). Genome 38:1177-1188
