R E P O R T
The plant genome: a socio-economic implication

Nomar Espinosa Waminal 1,2
Abstract | Crop genomics has received much attention recently, especially since the completion of the Arabidopsis thaliana genome sequencing project. Coupled with ever-advancing DNA sequencing technologies, genomics offers the scientific community a vast and growing resource. With several plant genomes now sequenced, novel knowledge has been gained about plant mechanisms for disease resistance and the biosynthesis of desired traits, areas that directly affect the economics of agriculture. Here, a detailed review of the genome sequencing and assembly of three socio-economically important plant species is presented.
1 Laboratory of Functional Crop Genomics and Biotechnology, Department of Plant Science, College of Agriculture and Life Sciences, Seoul National University, 151-742 Seoul, Korea
2 Plant Biotechnology Institute, Department of Life Science, Sahmyook University, 139-742 Seoul, Korea
Email: newaminal@snu.ac.kr
R E P O R T | CROP GENOME ANALYSIS
VOLUME 01 | APRIL 2012 | 1

INTRODUCTION
Plants existed long before humans and animals appeared. Through intelligence and trial-and-error learning, humans gradually developed tools for survival, from the crudest technologies of early human existence to the ever-evolving sophistication of our post-modern era. Alongside this technological evolution came an agricultural evolution, which is arguably of greater significance, because sustaining life (and hence society) requires food, most of which comes from plants [1]. Humans learned the value of plants and, through experience, selected those that were beneficial in a process called domestication [2]. Plants whose economic importance has led to their cultivation are generally called crops [3]. Crops are valued for their uses in food, medicine, materials, industry, landscaping, and more. To enhance the quality and yield of crops, breeders and scientists have worked hard to develop methods that address these objectives. Such methods logically depend on a proper understanding of the anatomy, physiology, genetics, and ultimately all the mechanisms responsible for plant growth and development, as well as the environmental factors that affect them. In other words, the more we know about a plant and the intricate interplay of the biological factors that limit or promote its optimal performance, the easier it is to manipulate particular parameters to obtain the desired phenotype.

We have come a long way in understanding the biological factors and mechanisms that contribute to plant growth and development, from identifying the hereditary molecules, to the isolation of the first gene [4], to recent studies of the genome, the transcriptome, the proteome, and the metabolome. Despite these remarkable advances, we still have a long way to go before this vast amount of information can be fully used to improve human health and lifestyle and to address recent international concerns about global warming.

The concerted efforts of scientists from various fields have contributed enormously to the understanding of the plant genome, its structure, and its function. Unlocking the genomic DNA sequence and understanding the interplay between DNA and other biomolecules have a profound impact on downstream research and applications. The range of downstream applications of a genome sequence is so wide that attempting to enumerate them risks understating its possibilities. Nevertheless, some of the apparent direct and indirect contributions of a genome sequence to the scientific community include (i) access to a relatively complete gene catalogue of a species, the regulatory elements that control gene function, and a foundation for understanding genome variation; (ii) understanding the structure, function, and evolution of organisms; (iii) understanding biochemical pathways; (iv) development of molecular markers to speed up genetic analysis, gene discovery, and breeding programs for crop improvement; and (v) a framework for further structural and functional genomics studies of model plants, essential food crops, animal feed, and energy crops.

To date, about 26 plant genomes have been sequenced [5]. The choice of which plant genome to sequence first is influenced by several factors, such as sequencing cost, genome size, and genome complexity. These factors have more influence on the decision than the direct economic significance of the species, as exemplified by Zea mays L., which could have been sequenced right after Arabidopsis and rice [6] but was instead sequenced later, after Vitis vinifera [7] and Populus trichocarpa [8], because of the huge amount of repetitive elements in its genome [9]. These highly repetitive elements make genome assembly computationally difficult [9], especially with next-generation sequencing (NGS) technologies that produce short reads [10]. However, scientists have developed approaches that combine long reads (Sanger sequencing) with NGS reads to produce more reliable results [11]. To date, several important crop species have been sequenced using whole genome shotgun (WGS) or BAC-by-BAC sequencing approaches (Table 1), and this number is increasing rapidly as sequencing technologies and bioinformatics tools are refined.

The growth in knowledge of the plant genome, greatly spurred by the completion of the Arabidopsis thaliana genome sequence in 2000 [5], has been accompanied by the incremental evolution of DNA sequencing technologies. The Sanger method dominated the DNA sequencing industry for nearly two decades and contributed to the sequencing of many genomes, including the monumental completion of the human genome [12]. However, the limitations of this technology (mainly low throughput and high cost) fueled the need for more advanced sequencing technologies that could produce enormous amounts of sequence data in less time and at lower cost. The result was a shift from the traditional first-generation technology of automated Sanger sequencing to the more advanced next-generation sequencing [12]. Recent development and refinement of these technologies have ushered in third-generation sequencing, with high accuracy, longer read lengths, very high coverage, and fast data acquisition [13].

Table 1. Overview of plant genomes that have been sequenced (adapted from Trends in Plant Science, February 2011, Vol. 16, No. 2, and "List of sequenced eukaryotic genomes" (2012, April 20), in Wikipedia, retrieved April 29, 2012)
| Organism | Relevance | Genome (Mb) | Chrom. no. (n) | Predicted genes | Sequencing strategy | Organization | Year of completion |
|---|---|---|---|---|---|---|---|
| Dicots | | | | | | | |
| Arabidopsis lyrata | Model plant | 207 | 8 | 32,670 | WGS | DOE-JGI and Max Planck Institute for Developmental Biology | 2011 |
| Arabidopsis thaliana | Model plant | 119 | | 25,498, 27,400, 31,670 | BAC-by-BAC | Arabidopsis Genome Initiative | 2000 |
| Brassica rapa | Crop and model organism | 284 | 10 | 41,174 | WGS | Multicenter collaboration | 2011 |
| Cannabis sativa | Hemp and marijuana production | 534 | 10 | 30,074 | WGS | Multicenter collaboration | 2011 |
| Cucumis sativus* | Vegetable crop | 367 | 7 | 26,682 | WGS | Chinese Academy of Agricultural Sciences, Beijing | 2009 |
| Fragaria vesca | Fruit crop | 280 | 7 | 25,050 | WGS | Multicenter collaboration | 2011 |
| Glycine max | Protein and oil crop | 1,100 | 20 | 46,430 | WGS | Purdue University | 2010 |
| Jatropha curcas* | Biodiesel crop | 410 | 11 | 40,929 | BAC-by-BAC and WGS | Kazusa DNA Research Institute | 2010 |
| Lotus japonicus | Model legume | 417 | | 30,799 | BAC-by-BAC | Multicenter collaboration | 2008 |
| Malus domestica | Fruit tree | 927 | | 57,000 | BAC-by-BAC | International consortium | 2010 |
| Medicago truncatula | Model organism for legume biology | 375 | | 62,388 | BAC-by-BAC | Multicenter collaboration | 2011 |
| Populus trichocarpa | Carbon sequestration, model tree, timber | 550 | 19 | 45,555 | WGS | The International Poplar Genome Consortium | 2006 |
| Ricinus communis | Oilseed crop | 320 | | 31,237 | WGS | Multicenter collaboration | 2010 |
| Solanum tuberosum | Crop plant | 844 | 12 | 39,031 | WGS | Multicenter collaboration | 2011 |
| Thellungiella parvula | Arabidopsis relative with high salt tolerance | 140 | | 28,901 | WGS | Multicenter collaboration | 2011 |
| Theobroma cacao* | Flavoring crop | 430 | 10 | 28,798 | WGS | CIRAD, multiple institutions (separate project, Mars Inc., USDA) | 2010 |
| Vitis vinifera | Fruit crop | 490 | | 30,434 | WGS | The French-Italian Public Consortium for Grapevine Genome Characterization | 2007 |
| Monocots | | | | | | | |
| Brachypodium distachyon | Model monocot (grass) | 272 | 5 | 26,500 | WGS | The International Brachypodium Initiative | 2010 |
| Oryza sativa ssp. indica | Crop and model organism | 420 | 12 | 32-50,000 | WGS | Beijing Genomics Institute, Zhejiang University and the Chinese Academy of Sciences | 2002 |
| Oryza sativa ssp. japonica | Crop and model organism | 466 | 12 | 58,000 | BAC-by-BAC | Syngenta and Myriad Genetics | 2002 |
| Oryza glaberrima | West African species of cultivated rice that was domesticated independently of Asian rice | 316 | 12 | ND | BAC pooling and WGS | Arizona Genomics Institute | 2010 |
| Phoenix dactylifera | Fruit tree (palm) | 658 | 36 | >25,000 | WGS | Genomics Core, Qatar | 2011 |
| Sorghum bicolor | Crop plant | 730 | 10 | 27,640 | WGS | Multiple institutions | 2009 |
| Zea mays | Cereal crop | 2,800 | 10 | 63,300 | BAC-by-BAC | NSF | 2009 |
ND: no data. *Species discussed in detail in this review.
Here, I review the genome sequencing results of three recently sequenced crops: Cucumis sativus, Jatropha curcas, and Theobroma cacao, covering the sequencing strategies used, genomic structure and arrangement, novel biosynthetic pathways, species-specific genes, implications for species evolution, and other areas of functional genomics.

REVIEWED GENOMES
Cucumis sativus L. (cucumber) belongs to the family Cucurbitaceae, which includes many economically significant species. It has served as a model system for sex determination studies [14]. The cucurbits are also model plants for vascular biology because their xylem and phloem sap are easily collected for long-distance signaling studies.

Jatropha curcas L. (jatropha) belongs to the family Euphorbiaceae. It has great potential for various uses, including biofuel production, because of its high oil yield per unit area, which is second only to that of oil palm. This holds great promise for reducing the problems caused by the continued consumption of fossil fuels, chiefly global warming.

Theobroma cacao L. (cocoa tree) of the Criollo variety is an important crop for chocolate production. However, fine-flavor cocoa accounts for less than 5% of global production, mainly because fine-flavor varieties are susceptible to fungal, oomycete, and viral diseases and to insect pests. Breeding improved Criollo varieties is needed for sustained production of fine-flavor cocoa.

Despite the great economic significance of these crops, genomic resources for them have been limited, which has hindered research toward their respective objectives. Unlocking their genomic sequences will undoubtedly open new frontiers for understanding their structure, function, and the control of desired traits. Consequently, independent organizations initiated and completed the sequencing of their genomes (Table 1).

STRUCTURAL GENOMICS
Sequencing and assembly

N50 is the length such that sequences of that length or longer together contain at least half of the total length of the sequence set.
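As a minimal sketch of how the N50 statistic can be computed, the following Python function sorts the sequence lengths from longest to shortest and returns the length at which the running total first reaches half of the overall length. The function name and the example lengths are illustrative assumptions and are not taken from any of the reviewed assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running total reaches half
        # of the assembly is the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths in kb (illustration only).
example_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_contigs))
```

In this example the total length is 300 kb, and the two longest contigs (80 and 70 kb) already sum to 150 kb, so the N50 is 70 kb. A larger N50 therefore indicates that more of the assembly sits in long sequences.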
A whole genome shotgun (WGS) sequencing approach was used for all three genomes, although the platforms and the methods used to improve assembly quality differed. The cucumber genome was sequenced by WGS using a combination of Sanger and NGS (GA by Illumina) sequencing. This combination produced longer contig and scaffold N50 values than assemblies built separately from the reads of either technology (Table 2). For J. curcas, a combination of BAC-end sequencing and shotgun sequencing was employed using the conventional Sanger method and NGS (GS-FLX by Roche/454 and GA by Illumina). For T. cacao, the strategy was WGS incorporating Sanger and NGS platforms (GS-FLX by Roche/454 and GA by Illumina). Different software was used to assemble the genomic sequences of the three species (Table 3). Combining conventional Sanger and NGS sequencing proved superior to using either technology alone because each compensates for the shortcomings of the other, allowing high-quality sequences to be obtained at lower cost and in less time; this has made the combined approach popular for sequencing eukaryotes [15].

Table 2. Genome assembly statistics of C. sativus
| Assembly | Contig N50 (kb) | Contig total (Mb) | Scaffold N50 (kb) | Scaffold total (Mb) |
|---|---|---|---|---|
| Sanger | 2.6 | 204 | 19 | 238 |
| Illumina GA | 12.5 | 190 | 172 | 200 |
| Sanger + Illumina GA | 19.8 | 226.5 | 1,140 | 243.5 |

Table 3. Summary of information about the genome, sequencing strategy and genome assembly statistics of the three reviewed species
| | Jatropha curcas | Theobroma cacao | Cucumis sativus |
|---|---|---|---|
| Genome size (Mb) | ~410 | ~430 | ~367 |
| Ploidy, chrom. no. | 2n=2x=22 | 2n=2x=20 | 2n=2x=14 |
| Date of completion | 2010 | 2010 | 2009 |
| Date of publication, journal | 2011, DNA Research | 2011, Nature Genetics | 2009, Nature Genetics |
| Sequencing/funding institution | Kazusa DNA Research Institute Foundation, Japan | The International Cocoa Genome Sequencing Consortium (ICGS), coordinated by CIRAD | Chinese Academy of Agricultural Sciences, Beijing, China |
| Sequencing strategy | BAC-by-BAC and WGS | WGS | WGS |
| Sequencing method: Sanger | for shotgun libraries and BAC ends | for BAC ends only | for BAC, plasmid, and fosmid sequencing |
| Sequencing method: NGS | GS-FLX (Roche, USA), GA II (Illumina, USA) | GS-FLX, GA II | GA II |
| Assembly program | PCAP.rep and MIRA | Newbler version 2.3, SOAP | RePS2 |
| Total length of assembled genome | 285.9 Mb | 326.9 Mb | 243.5 Mb |
| Percent of genome covered by the assembly | ~70% (if based on ~410 Mb genome), ~75% (if based on ~380 Mb genome) | ~76% (based on genotype B97-61/B2, 430 Mb) | ~66% (based on Chinese long inbred line 9930) |
| Coverage depth of raw data | ND | 61.1x | Total: 72.2x |
| Gene space covered | 95% | 97.8% | 96.8% |
| Sequence anchored to chromosomes | ND | 67% | 72.8% |
| Contig: total number | 120,586 | 25,912 | 62,410 |
| Contig: total length (Mb) | 276.7 | 291.4 | 226.4 |
| Contig: average length (kb) | 2.3 | 11.2 | ND |
| Contig: longest (kb) | 29.7 | 190 | ND |
| Contig: N50 (kb) | 3.8 | 19.8 | 19.8 |
| Scaffold: total number | 15,300 | 4,792 | 47,837 |
| Scaffold: total length (Mb) | 129.3 | 326.9 | 243.5 |
| Scaffold: average length (kb) | 8.4 | 68.2 | ND |
| Scaffold: longest (kb) | 56 | 3,145 | ND |
| Scaffold: N50 (kb) | ND | 473.8 | 1,144 |
| Reference | [15] | [18] | [11] |
ND: no data

Linkage analysis
Gy14 is a North American processing market-type cucumber cultivar. PI183967 is an accession of C. sativus var. hardwickii originating from India.
FISH: Fluorescence in situ hybridization is a cytogenetic technique used to detect and localize the presence or absence of specific DNA sequences on chromosomes.
The consensus genetic map of T. cacao was built from two mapping populations, whereas 77 recombinant inbred lines from an inter-subspecific cross between Gy14 and PI183967 were used for cucumber. No linkage analysis data were provided for J. curcas. About the same percentage of molecular markers was aligned to the newly assembled genomic sequences of C. sativus and T. cacao (Table 4). Interestingly, comparison of the genetic and physical maps of cucumber revealed regions of recombination suppression: two 10-Mb regions at either end of chromosome 4, a 20-Mb region on chromosome 5, and an 8-Mb region on chromosome 7 (Fig. 1a). Further FISH analysis revealed a segmental inversion on chromosome 5 between Gy14 and PI183967 (Fig. 1b). This chromosomal inversion helps explain the observed recombination suppression and adds insight into cucumber evolution during domestication.
Table 4. Summary of the linkage analysis
| Species | Mapping population | Total length (cM) | No. mol. markers | Aligned markers | Anchored sequences to chromosomes (%) |
|---|---|---|---|---|---|
| T. cacao | 2 | 750.6 | 1,259 | 1,192 (94%) | 67 |
| C. sativus | 77 | 581 | 1,885 | 1,763 (93.5%) | 72.8 |
Figure 1. The integrated genetic and physical maps of cucumber. (a) Genetic distance vs. physical distance of the seven cucumber chromosomes. The brackets denote the regions of recombination suppression. (b) Detection of segmental inversion on chromosome 5 between Gy14 and PI183967 through FISH (12-7 and 12-2 are fosmid clones used as probes). Bar = 5 µm.
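Comparing genetic and physical maps to find recombination suppression can be illustrated with a short Python sketch. It computes the local recombination rate (cM per Mb) between adjacent markers and reports the intervals that fall below a chosen threshold. The function names, the threshold, and the marker positions are hypothetical and are not taken from the cucumber study; the original authors' procedure may well have differed.

```python
def local_recombination_rates(markers):
    """Return (start_mb, end_mb, cM_per_Mb) for each pair of adjacent markers.

    `markers` is a list of (physical_position_mb, genetic_position_cm)
    tuples for a single chromosome.
    """
    ordered = sorted(markers)
    rates = []
    for (mb_a, cm_a), (mb_b, cm_b) in zip(ordered, ordered[1:]):
        span_mb = mb_b - mb_a
        if span_mb <= 0:
            continue  # skip markers that share a physical position
        rates.append((mb_a, mb_b, (cm_b - cm_a) / span_mb))
    return rates


def suppressed_intervals(markers, threshold_cm_per_mb):
    """Return the physical intervals whose rate is below the threshold."""
    return [
        (start, end)
        for start, end, rate in local_recombination_rates(markers)
        if rate < threshold_cm_per_mb
    ]


# Hypothetical markers: (physical position in Mb, genetic position in cM).
example_markers = [(0.0, 0.0), (2.0, 10.0), (4.0, 20.0), (14.0, 21.0), (16.0, 31.0)]
print(suppressed_intervals(example_markers, threshold_cm_per_mb=1.0))
```

In the example, the 10-Mb interval between 4 and 14 Mb spans only 1 cM, so it is the only interval flagged. On a plot of genetic versus physical distance such as Fig. 1a, a region like this appears as a nearly flat stretch of the curve.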
Repetitive sequences
In proportion to genome size, J. curcas has the highest transposon content (36.6% of its 410-Mb genome), followed by C. sativus (24% of its 367-Mb genome) and T. cacao (24% of its 430-Mb genome) (Table 5). In all three species, Class I transposable elements (retrotransposons) make up the majority of the repeat sequences in the genome. In J. curcas, gypsy-type retrotransposons outnumber copia-type ones, the opposite of the pattern in T. cacao. In T. cacao, a copia-like LTR retrotransposon named Gaucho, 11,297 bp long and repeated approximately 1,100 times, was identified and hybridized by FISH; it was found to occupy most of the interstitial regions (the regions between centromeres and telomeres) of the chromosome arms (Fig. 2b). Additionally, a 212-bp repeat named ThCen was confirmed by FISH to be centromere-specific (Fig. 2a) and may have contributed to the genome size variation of T. cacao.

Table 5. Summary of the transposable elements identified in the three species
C. sativus
Number of elements Class I LTR: copia LTR: gypsy LTR: other Others Class II Others TOTAL 20,119 16,972 135,464 266,232 1.75 1.24 11.64 24.01 119,339 (91,109)* Fraction of the genome (%) 12.16 (10.43) Number of elements 113,047 31,740 67,658 13,454 195 25,977 28,069 152,805
J. curcas
Fraction of the genome (%) 29.91 8.03 19.6 2.23 0.05 2.04 5.22 36.6 19260 21,882 ND 67,575 Number of elements 49,942 18,060 12,622
T. cacao
Fraction of the genome (%) ND
ND ND ~24
*Values in parentheses denote total value of all LTR families.
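The proportions quoted above can be converted into approximate absolute amounts of sequence. The short Python sketch below does this arithmetic using only the genome sizes, repeat fractions, and Gaucho length and copy number given in the text; the helper names are illustrative, and because the Gaucho copy number is approximate, its result should be read as a rough estimate.

```python
# Genome size (Mb) and transposable-element fraction (%) as quoted in the text.
GENOMES = {
    "J. curcas": (410, 36.6),
    "C. sativus": (367, 24.0),
    "T. cacao": (430, 24.0),
}


def repeat_mb(genome_mb, fraction_percent):
    """Convert a genome fraction in percent into megabases."""
    return genome_mb * fraction_percent / 100


def family_mb(element_bp, copies):
    """Total sequence (Mb) occupied by one repeat family."""
    return element_bp * copies / 1_000_000


for species, (size_mb, te_percent) in GENOMES.items():
    print(f"{species}: ~{repeat_mb(size_mb, te_percent):.1f} Mb of transposable elements")

# Gaucho: 11,297 bp repeated approximately 1,100 times in the 430-Mb T. cacao genome.
gaucho_mb = family_mb(11_297, 1_100)
gaucho_percent = gaucho_mb / 430 * 100
print(f"Gaucho: ~{gaucho_mb:.1f} Mb, ~{gaucho_percent:.1f}% of the genome")
```

On these figures, transposable elements amount to roughly 150 Mb in J. curcas, 88 Mb in C. sativus, and 103 Mb in T. cacao, while Gaucho alone would account for about 12.4 Mb, or close to 2.9% of the T. cacao genome.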
Figure 2. FISH analysis of T. cacao repetitive sequences. (a) T. cacao chromosomes counterstained with DAPI (blue) with ThCen (red) used as probe. (b) hybridization of ThCen (red) and Gaucho LTR retrotransposon probes (green).
Gene content
Not all classes of RNA-encoding genes were reported for the three species (Table 6). Because of limitations of the sequencing methods, the number of genes, especially ribosomal RNA (rRNA) genes, may be largely underestimated. In T. cacao, only six fragments of rRNA genes were recovered, far fewer than the number of repeats found in most eukaryotes, as can be observed through FISH analysis using rDNA probes [16]. MicroRNAs (miRNAs) are short non-coding RNAs that regulate gene expression transcriptionally or post-transcriptionally. Many miRNAs have roles in plant development and stress response. In T. cacao, most of the predicted miRNAs have homology to transcription factor sequences, suggesting that miRNAs are major regulators of gene expression in this species.

Three gene-prediction methods were used to identify protein-coding genes in the three species, namely ab initio prediction, cDNA-EST evidence, and homology searches, using gene-finder software and public databases (Table 7). Comparison of the gene families with those of other sequenced genomes identified 682 T. cacao-specific and 4,362 C. sativus-specific gene families, while 1,529 genes were found to be specific to the family Euphorbiaceae, to which J. curcas belongs.

Table 6. Summary of RNA-coding genes in the three species
| | C. sativus | J. curcas | T. cacao |
|---|---|---|---|
| rRNA | 292 | ND | 6 |
| tRNA | 699 | 597 | 473 |
| miRNA | 171 | ND | 83 |
| snoRNA | 238 | ND | ND |
| snRNA | 192 | 65 | ND |
Table 7. Summary of the gene-prediction analysis of the three sequenced genomes
| Species | No. of predicted genes | Mean coding sequence size (bp) | Mean exon size (bp) | Mean intron size (bp) | Mean exons per gene |
|---|---|---|---|---|---|
| C. sativus | 26,682 | 1,046 | 238 | 483 | 4.39 |
| J. curcas | 40,929 | 3,064 | 227 | 356 | ND |
| T. cacao | 28,798 | 3,346 | 231 | 6,319 | 5.03 |
Gene-prediction methods: ab initio, homology search, cDNA-EST (C. sativus); ab initio (J. curcas); ab initio (T. cacao).
Protein-coding region search programs: GlimmerHMM, Genscan, Augustus, BGF, SNAP, GeneMark.hmm (C. sativus and J. curcas); EUGene, SpliceMachine (T. cacao).
Similarity search databases: Arabidopsis, papaya, poplar, grapevine, rice, UniRef, TrEMBL (C. sativus and J. curcas); Swiss-Prot, TAIR, Malvaceae GenBank, Glycine max, T. cacao EST (T. cacao).
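To put the predicted gene numbers on a comparable footing, they can be divided by the estimated genome sizes to give an approximate gene density. The Python sketch below does this with the gene counts from Table 7 and the genome sizes quoted in this review; the helper name is illustrative, and because the calculation uses estimated genome size rather than assembled length, the densities are only rough indicators.

```python
# (predicted protein-coding genes, estimated genome size in Mb) from this review.
SPECIES = {
    "C. sativus": (26_682, 367),
    "J. curcas": (40_929, 410),
    "T. cacao": (28_798, 430),
}


def genes_per_mb(gene_count, genome_mb):
    """Approximate gene density as predicted genes per megabase."""
    return gene_count / genome_mb


for name, (genes, size_mb) in SPECIES.items():
    print(f"{name}: ~{genes_per_mb(genes, size_mb):.1f} genes per Mb")
```

On these figures, J. curcas has the highest apparent density (about 100 genes per Mb), compared with about 73 for C. sativus and 67 for T. cacao. Differences in prediction pipelines and assembly completeness could account for part of that spread.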
FUNCTIONAL GENOMICS
Disease resistance-related genes

Resistance genes (R genes) are subdivided into two classes: the nucleotide-binding site leucine-rich repeat (NBS-LRR) class and the receptor protein kinase (RPK) class [17]. A total of 297, 92, and 61 NBS genes were identified in T. cacao, J. curcas, and C. sativus, respectively. Three major resistance gene families were identified in T. cacao (NBS, LRR-RLK, and NPR1), and all three have been mapped onto the ten chromosomes. C. sativus may rely on alternative mechanisms to confer resistance to pathogens. Its relatively small number of NBS genes, compared with the other two species and with Arabidopsis (200), poplar (398), and rice (600) [8], is compensated by the expansion of its lipoxygenase (LOX) pathway, which produces short-chain aldehydes and alcohols involved in plant defense. Eukaryotic translation initiation factors confer recessive resistance to plant viral infections. Three EIF4E and EIF4G genes, which encode the eIF4E and eIF4G proteins, respectively, have been identified in the C. sativus genome, another mechanism that may compensate for its limited NBS-R-mediated pathogen resistance.
Genes for desired traits in respective species

One of the foremost goals of genome sequencing is to identify the putative gene families responsible for the quality traits that give a species its economic significance. In the case of T. cacao, gene families directly responsible for the fine quality and high yield of cocoa are of great interest. For J. curcas, gene families responsible for oil production are highly valued. For C. sativus, which is a model for plant vascular biology and the study of sex expression, among others, the gene families responsible for these traits are of great interest.

Genes for triacylglycerol (TAG) biosynthesis underpin biodiesel production, and J. curcas can synthesize and accumulate considerable amounts of TAGs in its seeds. To improve Jatropha oil quality for biodiesel, fatty acid synthesis can be modified by altering the genes involved, as predicted in the newly sequenced genome. Moreover, J. curcas is known to produce tumor-promoting phorbol esters. Lowering the expression of the genes responsible for phorbol ester production in high oil-yielding lines will promote the safe use of J. curcas for biodiesel production.

Oils, proteins, starch, flavonoids, alkaloids, and terpenoids are the principal components affecting the flavor and quality of cocoa. The unique fatty acid profile of cocoa butter enhances the aroma of chocolates and confectioneries. A total of 84 orthologous genes potentially involved in lipid biosynthesis were discovered, along with 96 genes involved in flavonoid biosynthesis and 57 genes encoding terpene synthases.

Cucurbits are known for producing cucurbitacin, a secondary metabolite that repels insects but also attracts specific insects for pollination. Four oxidosqualene cyclase (OSC) genes responsible for cucurbitacin production were identified in C. sativus. Moreover, 137 cucumber genes related to the biosynthesis of ethylene, a compound that stimulates femaleness in cucumber, have been identified. Auxin also regulates sex expression, and six auxin-related genes were identified in the C. sativus genome. In addition, three short-chain dehydrogenase/reductase genes homologous to the ts2 sex-determination gene of maize (Zea mays) have been identified.

The discovery of these genes will surely spur downstream studies aimed at improving varieties, which would eventually help address social and economic issues. A relevant example is the issue of global warming.
Genome evolution and comparative genomics

Eudicots are known to have undergone paleo-hexaploidization followed by lineage-specific whole genome duplication (WGD) events. It has been suggested that the T. cacao genome underwent 11 major chromosome fusions from the 21 chromosomes of the paleo-hexaploid ancestor to produce the present 10 chromosomes (Fig. 3). In contrast, collinear gene-order analysis of C. sativus revealed no recent WGD, only some segmental duplication events. Additionally, comparative genomics between C. sativus and its close relative C. melo (melon) suggests that five of the seven present cucumber chromosomes (chromosomes 1, 2, 3, 5, and 6) each arose from a possible fusion of two of ten ancestral chromosomes (Fig. 4). In T. cacao, seven blocks of duplicated genes were characterized after alignment of its gene models onto its genome (Fig. 5).
Figure 3. Evolutionary model of T. cacao. The eudicot ancestor chromosomes are presented in seven colors. The several lineage-specific shuffling events have shaped the present eudicot genomes. R: rounds of WGD, F: chromosomal fusions.
Although attributing the evolutionary history of genomes to common ancestry may be very enticing, perhaps because chromosomal segments are known to be rearranged after breeding, as in the case of C. sativus, I suggest that an alternative approach be considered. Given that nobody has lived a thousand years, let alone a million, and that homologous segments do not always imply common ancestry but may instead reflect common design and function, I recommend further unbiased research on genomic history to uncover the mechanisms that control genomic fusion, inversion, translocation, and similar events, and to test how much these events limit the sustenance of life. Do these similarities really mean common ancestry, or common functions and/or regulatory mechanisms?
Figure 4. Comparative genomics between melon and cucumber, showing chromosomes 1, 2, 3, 5, and 6 of cucumber largely syntenic to two chromosomes of melon.
Microsynteny with other genomes

As expected, a greater degree of synteny (53% of the assembled scaffolds) was observed between J. curcas and Ricinus communis, both in the family Euphorbiaceae, whereas less synteny was observed with more distantly related (or functionally less related) species such as Glycine max (11%) and Arabidopsis thaliana (16%). Meanwhile, 54% of the BAC sequences of melon were aligned to C. sativus. Moreover, 628, 540, 1,106, 772, and 795 syntenic blocks were identified between C. sativus and A. thaliana, Carica papaya, Populus trichocarpa, Vitis vinifera, and Oryza sativa, respectively.
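The idea of a syntenic block can be made concrete with a small Python sketch. It chains pairs of orthologous genes ("anchors") into collinear blocks when consecutive anchors are close together and in the same order in both genomes. This is a deliberately simplified greedy illustration: the function name, parameters, and anchor data are hypothetical, only forward-oriented blocks are handled, and it is not the method used in the reviewed genome papers, which rely on dedicated and more robust synteny tools.

```python
def collinear_blocks(anchors, max_gap=5, min_anchors=3):
    """Group ortholog anchors into collinear (syntenic) blocks.

    `anchors` is a list of (index_a, index_b) pairs giving the gene-order
    index of each ortholog pair on one chromosome of genome A and one
    chromosome of genome B.
    """
    blocks, current = [], []
    for a, b in sorted(anchors):
        if current:
            prev_a, prev_b = current[-1]
            # An anchor extends the block if it lies a short distance
            # downstream of the previous anchor in both genomes.
            extends = 0 < a - prev_a <= max_gap and 0 < b - prev_b <= max_gap
            if not extends:
                if len(current) >= min_anchors:
                    blocks.append(current)
                current = []
        current.append((a, b))
    if len(current) >= min_anchors:
        blocks.append(current)
    return blocks


# Hypothetical anchors: two collinear runs plus one isolated match.
example_anchors = [(1, 10), (2, 11), (4, 13), (5, 40), (6, 41), (7, 42), (20, 3)]
for block in collinear_blocks(example_anchors):
    print(block)
```

With these example anchors, two blocks of three genes each are reported and the isolated pair (20, 3) is discarded. Counting such blocks between two genomes gives figures of the kind quoted above for C. sativus and other species.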

The high degree of synteny between the genomes of C. sativus and C. melo will help the genetic analysis of C. melo, now that the genome of C. sativus has been sequenced. It will also help advance studies of phylogenetic relationships. Collectively, syntenic relationships among dicots, and among plants in a broader sense, will help in gene prediction and will eventually aid in understanding the relationship between sequence similarity and function, as well as the limitations of the theory of ancestry as the sole explanation for sequence similarity.

Figure 5. Duplicated gene segments of T. cacao. The seven colors represent the seven ancestral eudicot linkage groups.

CONCLUSION
Recent advances in sequencing technologies have revolutionized our experimental approaches to the study of plants. They have also shifted major scientific questions from "How do we sequence a genome?" to "Which platform is best for sequencing a particular genome of interest?" They have allowed scientists to study crops holistically at the genomic, transcriptomic, proteomic, and metabolomic levels. They will definitely aid in crop improvement and in understanding phylogenies and metabolic pathways, among other things. The direct and indirect consequences of genome sequencing ultimately come down to improvements in people's economic circumstances and lifestyles, which would hopefully be well distributed globally. The doors opened to science by these advanced technologies are promising and seemingly limitless. The main goal should be to use these technologies for the benefit of both humans and Mother Earth, not solely for humans.
REFERENCES (Entries marked with an asterisk are the main articles for the three species reviewed here.)
1. Yang TS. 2012. Plant and Culture: Another Interpretation of Human History. Journal of Jishou University (Social Sciences) 33(1): 1-7.
2. Hirst KK. Plant Domestication: Table of Dates and Places. http://archaeology.about.com/od/domestications/a/plant_domestic.htm
3. Crop. 2012, March 30. In Wikipedia, The Free Encyclopedia. Retrieved 04:36, April 27, 2012, from http://en.wikipedia.org/w/index.php?title=Crop&oldid=484675363
4. Shapiro J, Machattie L, Eron L, Ihler G, Ippen K and Beckwith J. 1969. Isolation of Pure lac Operon DNA. Nature 224: 768-774.
5. Feuillet C, Leach JE, Rogers J, Schnable PS and Eversole K. 2011. Crop genome sequencing: lessons and rationales. Trends in Plant Science 16: 77-88.
6. Messing J, Bharti AK, Karlowski WM, Gundlach H, Kim HR, Yu Y, Wei F, Fuks G, Soderlund CA, Mayer KFX and Wing RA. 2004. Sequence composition and genome organization of maize. PNAS 101: 14349-14354.
7. Jaillon O, Aury JM, Noel B, et al. 2007. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449 (7161): 463-467.
8. Tuskan GA, Difazio S, Jansson S, et al. 2006. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313 (5793): 1596-1604.
9. Haberer G, Young S, Bharti AK, Gundlach H, Raymond C, Fuks G, Butler E, Wing RA, Rounsley S, Birren B, Nusbaum C, Mayer KF and Messing J. 2005. Structure and architecture of the maize genome. Plant Physiol. 139: 1612-1624.
10. Schatz MC, et al. 2010. Assembly of large genomes using second-generation sequencing. Genome Res. 20: 1165-1173.
11. *Huang S, Li R, Zhang Z, Li L, Gu X, Fan W, et al. 2009. The genome of the cucumber, Cucumis sativus L. Nature Genetics 41: 1275-1281.
12. Metzker ML. 2010. Sequencing technologies - the next generation. Nature Reviews Genetics 11: 31-46.
13. Hayden EC. 2009. Genome sequencing: the third generation. Nature 457: 768-769.
14. Tanurdzic M and Banks JA. 2004. Sex-determining mechanisms in land plants. Plant Cell 16: S61-S71.
15. *Sato S, et al. 2011. Sequence analysis of the genome of an oil-bearing tree, Jatropha curcas L. DNA Research 18: 65-76.
16. Waminal NE, Kim NS and Kim HH. 2011. Dual-color FISH karyotype analyses using rDNAs in three Cucurbitaceae species. Genes and Genomics 33: 517-524.
17. Afzal AJ, Wood AJ and Lightfoot DA. 2008. Plant receptor-like serine threonine kinases: roles in signaling and plant defense. Mol. Plant Microbe Interact. 21: 507-517.
18. *Argout X, et al. 2011. The genome of Theobroma cacao. Nature Genetics 43: 101-108.
