
SAARC Regional Training

on

Molecular Genetic Characterization of Farm Animal Genetic Resources
April 20 to 26, 2015

Course Director
Arjava Sharma, Director
ICAR-National Bureau of Animal Genetic Resources
Karnal-132001 (Haryana), INDIA

Course Coordinator
R.S. Kataria, Principal Scientist
ICAR-National Bureau of Animal Genetic Resources
Karnal-132001 (Haryana), INDIA

Course Joint-Coordinator
S.K. Niranjan, Senior Scientist
ICAR-National Bureau of Animal Genetic Resources
Karnal-132001 (Haryana), INDIA

Sponsored by
South Asian Association for Regional Cooperation
Organized by
ICAR-National Bureau of Animal Genetic Resources, Karnal, India
&
SAARC Agriculture Centre, Dhaka, Bangladesh

Training Faculty
ICAR-National Bureau of Animal Genetic Resources, Karnal, India

Dr. Arjava Sharma, Director
E-mail: arjava@yahoo.com, director.nbagr@icar.gov.in
Dr. RK Vijh, Principal Scientist
E-mail: rameshkvijh@gmail.com
Dr. PK Vij, Principal Scientist
E-mail: pkvij@yahoo.com
Dr. MS Tantia, Principal Scientist
E-mail: tantiams@gmail.com
Dr. RAK Aggarwal, Principal Scientist
E-mail: rakaplp@gmail.com
Dr. PK Singh, Principal Scientist
E-mail: pksinghmathura@yahoo.com
Dr. RS Kataria, Principal Scientist
E-mail: katariaranji@yahoo.co.in
Dr. Satpal Dixit, Principal Scientist
E-mail: dixitsp@gmail.com
Dr. Monika Sodhi, Principal Scientist
E-mail: monikasodhi@yahoo.com
Dr. Manishi Mukesh, ICAR-National Fellow & Principal Scientist
E-mail: mmukesh_26@hotmail.com
Dr. Reena Arora, Principal Scientist
E-mail: rejagati@gmail.com
Dr. Avnish Kumar, Principal Scientist
E-mail: avnish@lycos.com
Dr. Rekha Sharma, Senior Scientist
E-mail: rekvik@gmail.com
Dr. SK Niranjan, Senior Scientist
E-mail: saketniranjan@gmail.com
Dr. Indrajit Ganguly, Senior Scientist
E-mail: drindrajit@gmail.com
Dr. Sanjeev Singh, Senior Scientist
E-mail: sssanjeev197@gmail.com
Dr. Karanveer Singh, Scientist (SS)
E-mail: karan_veer@yahoo.com

MESSAGE
It is a pleasure to know that the ICAR-National Bureau of Animal Genetic Resources (NBAGR),
Karnal is organizing a SAARC Regional Training on Molecular Genetic Characterization
of Farm Animal Genetic Resources during 20-26 April, 2015.
NBAGR has made a significant contribution to the characterization and conservation
of indigenous livestock and poultry biodiversity. With experienced and competent faculty
in the area of genetic characterization, the Bureau is the right institution to organize
such a training programme. I am sure the trainees from participating countries will gain
knowledge and be able to implement the FAO-recommended genetic characterization of
their respective farm animal genetic resources. Exchange of ideas through discussions
during the training course will also go a long way in exploring new areas of international
collaboration, and scientists will benefit from sharing their knowledge and
experiences.
I wish the training programme a grand success.

[S Ayyappan]

Indian Council of Agricultural Research
Krishi Bhawan, Dr. Rajendra Prasad Road, New Delhi-110114

Prof. K.M.L. Pathak
Deputy Director General
(Animal Sciences)

MESSAGE
I am happy to learn that a short course on Molecular Genetic Characterization of Farm
Animal Genetic Resources, sponsored by the SAARC Agriculture Centre (Dhaka), is being
organized during April 20-26, 2015 at ICAR-National Bureau of Animal Genetic Resources,
Karnal, a leading institute engaged in genetic characterization of livestock breeds. As all
SAARC countries are predominantly agriculture-based economies, farm animal genetic
resources play a very important role in ensuring the livelihood as well as the nutritional
security of farming communities. Genetic characterization of livestock resources is necessary
for their conservation and sustainable utilization. I am sure the participants from SAARC
countries will get an opportunity to learn the latest techniques being employed for
molecular characterization of livestock breeds. I am happy to note that the organizers have
given special emphasis to hands-on training, which the participants will be able to
utilize on their return for characterization and conservation of their native germplasm.
I wish to convey my gratitude to the SAARC Agriculture Centre for its financial support and
cooperation in organizing this training programme, and I look forward to more such
ventures in future.
I convey my best wishes to the organizers and participants of the training programme.

[K.M.L. Pathak]
Place : New Delhi
Dated : 8th April, 2015

SAARC AGRICULTURE CENTRE (SAC)

MESSAGE
I am delighted to write a brief foreword for the compendium of the regional training
programme on Molecular Genetic Characterization of Farm Animal Genetic Resources
jointly organized by SAARC Agriculture Centre (SAC), Bangladesh and ICAR-National
Bureau of Animal Genetic Resources (NBAGR), Karnal.
SAARC Agriculture Centre (SAC), under the framework of SAARC, has been working
for strengthening agriculture research and technology transfer through regional networks
among agricultural research/extension institutions and policy makers in the SAARC
member countries. ICAR-NBAGR is one of the reputed institutions in India undertaking
research and development activities to protect and conserve indigenous Farm Animal
Genetic Resources for sustainable utilization and livelihood security. This regional training
programme would provide hands-on and theoretical knowledge to the participants from
different SAARC member countries to strengthen research and extension activities in their
respective countries. I am extremely happy to see the contents covered in this training
programme. This compendium is a store of information related to characterization and
documentation of farm animal genetic resources. This book is unique and surely a work to
treasure for anyone who is interested in characterization of farm animal genetic resources.
I wish this regional training programme, and future endeavours, every success.

[Dr. Abul Kalam Azad]


Director, SAARC Agriculture Centre
BARC Complex, New Airport Road, Farmgate, Dhaka-1215, Bangladesh
Tel.: +880-2-8115353, PABX: 8113378, 8113380, 8113386, Fax: +880-2-9124596
E-mail: sac@saarcagri.org, Web: www.saarcagri.org

FOREWORD
India is blessed with large genetic biodiversity in its domestic animals, which contributes significantly to
the needs of the world's second most populous country. The National Bureau of Animal Genetic Resources
is a premier research institute of the Indian Council of Agricultural Research, with a broad mandate of
characterization and conservation of the animal genetic resources of India. Since its establishment
in 1984, the institute has made tremendous progress towards its mandate by working in close association with the
stakeholders of the poultry and livestock genetic resources of the country. During the last three decades, the Bureau
has developed strength through its well-trained scientific faculty, most of whom have worked
in international laboratories abroad. Besides, it has very well equipped laboratories with facilities such as an
automated Sanger DNA sequencer, microarray, real-time PCR and a high performance computer system.
I am happy that the South Asian Association for Regional Cooperation (SAARC) has chosen our Bureau for
imparting training to participants from member countries in the field of molecular characterization of farm
animal genetic resources. The programme has been designed with emphasis on hands-on training and on
interaction among participants and the faculty. Laboratory exercises will be based on FAO-recommended
tools for the genetic characterization of farm animals. I am sure that the topics covered during the training
programme will enhance the knowledge of the participants and that they will be able to apply these skills and
techniques after returning to their respective countries. The faculty and the coordinators have put a lot of
effort into bringing out this compendium of lectures, and I am sure this document will serve as useful resource
material for the participants.
I convey my sincere thanks to the Director and the Senior Program Officer (Livestock) of SAARC Agriculture
Centre, Dhaka (Bangladesh) for their guidance, cooperation and financial support extended for the training
programme. I acknowledge the support extended by the Deputy Director General (AS), ADG (AP&B), ADG
(IR) and other officers of ICAR and wish this training programme a grand success.

[ARJAVA SHARMA]

Course Director

PREFACE
Each country's sovereignty over its animal genetic resources (AnGR) is endorsed under the Convention on
Biological Diversity (CBD). During the last few decades, there has been an increasing trend of erosion of indigenous
livestock populations worldwide, for various reasons. In an era of globalization, protecting a country's valuable
indigenous AnGR by describing and cataloguing them is the need of the hour for their sustainable utilization. Therefore,
it is imperative to characterize the unique germplasm and its important traits at the phenotypic as well as the
genetic level. Importantly, during recent decades, newer and more effective molecular tools for genetic
characterization have emerged, with the advantage of identifying populations whose gene pool carries unique
alleles and biomolecules. Such tools are also useful in identifying the threat level to a particular breed by
defining its status, thus helping in designing conservation strategies.
This SAARC Regional Training on Molecular Genetic Characterization of Farm Animal Genetic Resources,
organized by ICAR-NBAGR, Karnal, India and SAARC Agriculture Centre, Dhaka, Bangladesh, during 20-26 April, 2015,
is a much-needed effort to propagate and share valued information and knowledge among the
SAARC members. A total of eighteen participants from Bangladesh, Maldives, Nepal, Pakistan, Sri Lanka and
India are expected to participate in this training programme. We hope that, back home, the training programme
will help in characterizing and documenting the indigenous AnGR for their management across member
countries. Further, it will help in prioritizing populations for conservation as well as value addition.
The compendium of lectures and practicals prepared for the training programme will be a useful document for
updating the participants' knowledge in the field of molecular genetic characterization of native livestock and
poultry.
We are highly thankful to the South Asian Association for Regional Cooperation, Secretariat, Kathmandu
(Nepal), for the financial support to the training programme. We are also grateful to Dr. Abul Kalam Azad,
Director and Dr. Md. Nure Alam Siddiky, Senior Program Officer (Livestock), SAARC Agriculture Centre,
Dhaka for their constant cooperation and guidance while preparing for the training programme. We sincerely
thank Dr. Arjava Sharma, Course Director & Director, ICAR-NBAGR for taking keen interest and providing
guidance throughout the training programme. We gratefully acknowledge the assistance received from
members of the various committees and all the scientific faculty of the Bureau for their contribution to the
training programme.

Course-Coordinators

List of Participants

01. Dr. Md. Omar Faruque, Professor
Department of Animal Breeding and Genetics
Bangladesh Agricultural University
Mymensingh-2202, Bangladesh
E-mail: faruque_mdomar@yahoo.com
Phone: +88 01714 07543

02. Dr. Md. Rakibul Hassan, Scientific Officer
Bangladesh Livestock Research Institute
Dhaka, Bangladesh
E-mail: mdrakibulhassan@gmail.com
Phone: +88 01712511183

03. Mr. Md. Panir Choudhury, Scientific Officer
Bangladesh Livestock Research Institute
Dhaka, Bangladesh
E-mail: cpanir@yahoo.com
Phone: +88 01717 629021

04. Mr. Ali Naseeh, Police Corporal
DNA Analyst
Maldives Police Service, Maldives
E-mail: int.affairs@police.gov.mv
Phone: +960 7967949

05. Mr. Hussain Rasheed, Police Sergeant
DNA Analyst
Maldives Police Service, Maldives
E-mail: int.affairs@police.gov.mv
Phone: +960 9996687

06. Mr. Raju Kadel, Senior Scientist
Nepal Agricultural Research Council
Animal Breeding Division,
Khumaltar, Nepal
E-mail: raju_kadel@yahoo.com
Phone: +977 9851149990

07. Dr. Pankaj Jha, Scientist
Nepal Agricultural Research Council
Animal Breeding Division
Khumaltar, Nepal
E-mail: drpankaj.np@gmail.com
Phone: +977 9841099494

08. Mr. Saroj Sapkota, Scientist
Animal Breeding Division
Nepal Agricultural Research Council
Khumaltar, Nepal
E-mail: sarose.sapkota@gmail.com
Phone: +977 9841596477

09. Dr. Ahmad Ali, Associate Professor
Department of Biosciences
COMSATS Institute of Information Technology
COMSATS Road, G.T. Road,
Sahiwal, Punjab, Pakistan
E-mail: ahmadali@ciitsahiwal.edu.pk
Phone: +92 40 4305001

10. Dr. Maqsood Akhtar, Senior Research Officer
Animal Breeding & Genetics Division
Buffalo Research Institute Pattoki,
District Kasur, Pakistan
E-mail: drmaqsood66@gmail.com, akhtar_maqsood@yahoo.com
Phone: +92-301-6402120

11. Ms. W.N.U. Perera, Lecturer (Probationary)
Department of Animal Sciences
Faculty of Agriculture, University of Peradeniya
Sri Lanka
E-mail: wnuperera@gmail.com
Phone: +94812395321

12. Mr. L. J. Ekanayake, Lecturer (Probationary)
Department of Animal Sciences
Faculty of Agriculture, University of Peradeniya
Sri Lanka
E-mail: jayampathiekn@gmail.com
Phone: +94716857442

13. Dr. D.M.W.C.B. Dissanayake, Veterinary Surgeon
Animal Breeding Division, Veterinary Research Institute,
Department of Animal Production and Health
Gannoruwa, Peradeniya, Sri Lanka, 20400
E-mail: chamindadissa@yahoo.com
Phone: +94718201935

14. Dr. Amit Kumar, Scientist (AG&B) & Incharge
Laboratory Animal Resources
ICAR-Indian Veterinary Research Institute
Izatnagar, Bareilly - 243122 (Uttar Pradesh), India
E-mail: vetamitchandan07@gmail.com
Phone: 0091-92196-14456

15. Dr. S.P. Yadav, Sr. Scientist (Biotech)
ICAR-Central Institute for Research on Buffaloes
Sirsa Road, Hisar - 125 001 (Haryana), India
E-mail: yadav.satyapal@gmail.com
Phone: 0091-99915-35791

16. Dr. M.S. Dighe, Scientist (AG&B)
ICAR-Central Institute for Research on Goats
Makhdoom, PO Farah, Mathura-281122
Uttar Pradesh, India
E-mail: dighemahesh@cirg.res.in
Phone: 0091-9720978970

17. Dr. Anupama Mukherjee, Senior Scientist
Dairy Cattle Breeding
ICAR-National Dairy Research Institute,
Karnal-132001 (Haryana) India
E-mail: writetoanupma@gmail.com
Phone: 0091-94362-64291

18. Dr. Sonika Ahlawat, Scientist
Animal Biotechnology Division
ICAR-National Bureau of Animal Genetic Resources
Karnal-132001 (Haryana) India
E-mail: sonika.ahlawat@gmail.com
Phone: 0091-94161-61369

CONTENTS

Theory Lectures

1. Indian Livestock Diversity and its Conservation
   - Arjava Sharma
2. Designing Field Strategies for Characterization of Farm Animal Genetic Resources
   - P K Singh and Karuna Asija
3. Breed Registration Process in India
   - P K Vij
4. Conservation Strategy through Network Programme
   - M S Tantia and Rekha Sharma
5. Conservation of Genome Resources - Concept of Gene Bank
   - Rajeev A K Aggarwal
6. Cytogenetic and Molecular Methods for Screening of Major Genetic Defects in Livestock
   - S K Niranjan and R S Kataria
7. Genetic Characterisation of Animal Genetic Resources: Principle, Methodology and Guidelines
   - Monika Sodhi, S K Niranjan, M Mukesh and R S Kataria
8. Microsatellite Markers for Genetic Diversity Analyses of Farm Animals
   - Reena Arora
9. Mitochondrial DNA as a Marker for Genetic Diversity and Evolution in Farm AnGR
   - Monika Sodhi, Amit Kishore and Manishi Mukesh
10. Y-Chromosome Based Genetic Diversity in Farm Animal Genetic Resources with Special Reference to Bovine
   - Indrajit Ganguly, Monika Sodhi, Suchit Kumar, Sanjeev Singh and K N Raja
11. Candidate Gene Polymorphism Approaches for Detection and Genotyping
   - R S Kataria and S K Niranjan
12. Dissection of Complex Traits and Identification of Quantitative Trait Loci in Livestock
   - R K Vijh and Upasna Sharma
13. An Introduction to Quantitative Real Time PCR for Expression Analysis of Candidate Genes
   - Indrajit Ganguly, Sanjeev Singh, Monika Sodhi and Manishi Mukesh
14. High Throughput Techniques for Transcriptome Analysis in Farm Animals with Special Reference to Expression Microarrays
   - Manishi Mukesh and Monika Sodhi
15. Strategies for Genotype and Phenotype Association Studies in Livestock
   - S P Dixit and Jayakumar Sivalingam
16. High Performance Computing for High Throughput Data Analysis
   - Avnish Kumar Bhatia

Practicals

1. Cryopreservation of Cauda Epididymal Spermatozoa for Conserving Caprine Genetic Biodiversity
   - Rajeev A K Aggarwal
2. Cytogenetic and Molecular Screening of Genetic Defects in Livestock
   - S K Niranjan and R S Kataria
3. Genomic DNA Isolation from Blood Samples
   - Reena Arora and Rekha Sharma
4. Analytical Approaches for Microsatellite Markers
   - Reena Arora
5. Approaches for Analysis of Mitochondrial Sequence Data
   - Monika Sodhi and Manishi Mukesh
6. SNPs Detection, Genotyping and Submission
   - R S Kataria, S K Niranjan, S K Mishra and Karanveer Singh
7. Web Resources and Tools for Genomic Research
   - S K Niranjan, Manika Sehgal and R S Kataria
8. Statistical Procedures for Identification of Quantitative Trait Loci
   - Upasna Sharma and R K Vijh
9. RNA Isolation and Real Time-Quantitative Polymerase Chain Reaction
   - Manishi Mukesh, Ankita Sharma, Indrajit Ganguly, Kiran Thakur and Preeti Verma
10. Expression Microarray Methodology Using Agilent Whole Genome Chip
   - Manishi Mukesh, Ankita Sharma and Monika Sodhi
11. Genotype and Phenotype Association Studies in Livestock
   - S P Dixit, Anurodh Sharma and Jayakumar Sivalingam

1
Indian Livestock Diversity and its Conservation
Arjava Sharma
ICAR-National Bureau of Animal Genetic Resources, Karnal (Haryana)

________________________________________________________________________________________
The livestock sector in developing countries accounts for almost 25-40 percent of overall agricultural
output, serving as a source of food, such as milk, meat and eggs; shelter and protection based on fiber
and hides; energy in the form of animal draught and transport; fuel and fertilizer utilizing animal
manure; savings based on the cash value of animals; and as part of cultural and traditional values.
Livestock are also the best insurance against the vagaries of nature like drought, famine and other
natural calamities. Estimates for 2012-13 indicate that this sector contributed 132.4 million tonnes of
milk, 69.7 billion eggs, 46.05 million kg of wool, and 5.95 million tonnes of meat in India. According to
estimates of the Central Statistics Office (CSO) of India, the value of output from the livestock sector at
current prices was about Rs. 4,59,051 crore during 2011-12, which is about 24.8% of the value of output
from the total agricultural and allied sector at current prices and 25.6% at constant (2004-05) prices. Milk
is the main output of the livestock sector, accounting for around two-thirds (67%) of its total
output. Meat and eggs account for 18.2% and 3.9% of the value of livestock output, respectively.
Animal genetic resource scenario in India
India has traditionally been a mega biodiversity center, and the rearing of domesticated animals of
different species, viz. cattle, buffalo, sheep, goat, pig, camel, horse, donkey, yak and mithun, by
livestock keepers has been practiced since time immemorial. In poultry, apart from chicken,
domesticated avian species such as ducks, geese, quails, turkeys, pheasants and partridges also exist
in India.
According to the Livestock Census (2012), the country had a livestock population of 512.6 million,
comprising mainly 190.9 m cattle, 108.7 m buffalo, 65 m sheep, 135.2 m goats and 10.3 m pigs, along with 0.32
m donkeys, 0.63 m horses and ponies, 0.19 m mules, 0.40 m camels, 0.077 m yaks and 0.298 m mithun,
besides 692.65 m chickens and 23.54 m ducks. Most of the vast and varied animal population that the country
possesses is indigenous, while a proportion that ranges from very small to sizable, depending on the species,
is represented by crossbreds between exotic germplasm and native stock. In pigs and cattle, the proportion of
crossbreds is relatively high, whereas in sheep it is only about 5%. Very few animals in the country belong to
exotic breeds, and these are maintained mostly on organized farms. A large proportion of the farm
animal population consists of non-descript native animals, which have not so far been characterized
systematically.
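The census figures above can be turned into species shares with a few lines of arithmetic. The following is only an illustrative sketch: the dictionary keys and the function name `species_shares` are assumptions introduced here, and the shares are taken against the sum of the species listed (which comes to slightly less than the 512.6 million total quoted in the text), not against an official census total.

```python
# Livestock numbers in millions, copied from the Livestock Census (2012)
# figures quoted in the paragraph above. Poultry is excluded because the
# text reports it separately from the livestock total.
livestock_millions = {
    "cattle": 190.9,
    "buffalo": 108.7,
    "sheep": 65.0,
    "goat": 135.2,
    "pig": 10.3,
    "donkey": 0.32,
    "horses and ponies": 0.63,
    "mules": 0.19,
    "camel": 0.40,
    "yak": 0.077,
    "mithun": 0.298,
}


def species_shares(counts):
    """Return each species' percentage share of the summed counts."""
    total = sum(counts.values())
    return {species: 100.0 * n / total for species, n in counts.items()}


# Assumption: shares are relative to the sum of the listed species.
listed_total = sum(livestock_millions.values())
shares = species_shares(livestock_millions)

# Print species from largest to smallest share.
for species, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{species:18s} {livestock_millions[species]:8.3f} m  {pct:5.1f}%")
print(f"Sum of listed species: {listed_total:.3f} m")
```

Sorting by share makes it easy to see that the large ruminants and small ruminants dominate the population, while equines, camels, yak and mithun together form a very small fraction.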
Presently, there are 151 registered breeds of livestock and poultry in India, which include 39
breeds of cattle, 13 of buffalo, 40 of sheep, 24 of goat, 6 of horses and ponies, 9 of camel, 3 of pig, 1 of
donkey and 16 of poultry, in addition to many more that have not been characterized and accredited so far.
Populations of other species like mules, yaks, mithuns, ducks, quails, etc. are yet to be classified
into well-described breeds.
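As a quick consistency check, the per-species breed counts can be summed and compared with the stated total of 151. This is a minimal sketch; the variable names are illustrative assumptions, and the counts are simply those quoted in the paragraph above.

```python
# Registered breed counts per species, as quoted in the text above.
registered_breeds = {
    "cattle": 39,
    "buffalo": 13,
    "sheep": 40,
    "goat": 24,
    "horses and ponies": 6,
    "camel": 9,
    "pig": 3,
    "donkey": 1,
    "poultry": 16,
}

stated_total = 151  # total given in the text
total_breeds = sum(registered_breeds.values())

# Share of all registered breeds contributed by each species.
breed_share = {
    species: 100.0 * n / total_breeds for species, n in registered_breeds.items()
}

print(f"Sum of per-species counts: {total_breeds} (stated: {stated_total})")
for species, n in sorted(registered_breeds.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{species:18s} {n:3d}  {breed_share[species]:5.1f}%")
```

The per-species counts add up to the stated total, so the breakdown in the text is internally consistent.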
SWOT analysis of Indian AnGR
Strengths:
- Mega livestock biodiversity, with almost all major domesticated farm animal species present.
- A large number of breeds in each farm animal species, adapted to specific agro-climatic
  conditions.
- Diversified draft, milch and dual-purpose cattle breeds. The draft breeds can contribute
  significantly to agricultural operations and so save fossil fuels.
- Adaptability of germplasm to diverse and changing climatic conditions (hot arid, humid tropical
  and temperate), and better resistance to parasites and diseases.
- Capability to survive and produce on coarse and poor-quality feed and fodder resources (low
  input).
- Availability of the best breeds of buffalo, a multipurpose farm animal species.
- Large network of Research Institutes, State Agricultural/Animal Science Universities, State
  Animal Husbandry Departments, Livestock Development Boards and NGOs engaged in
  conservation and development of AnGR.
- A large body of indigenous traditional knowledge (ITK) available with livestock keepers for
  management of AnGR.
- Seasonal migration of nomadic pastoralists helps overcome adverse conditions, especially
  during the winter and rainy seasons, enabling them to sustain the breed populations
  they maintain.
Weaknesses:
- Lack of reliable breed-wise livestock census data.
- Low productivity of indigenous livestock.
- Poor implementation of breeding policies.
- High population density vis-à-vis inadequate feed and fodder resources and pasture land
  availability.
- Lack of performance and pedigree recording at the farmers' level.
- Inadequate number of superior/proven bulls/bucks/rams/semen for AI and natural mating.
- Inadequate funding for conservation of AnGR.
- Insufficient patronage of native breeds.
- Lack of local institutions like breed societies or herders' groups/associations.
- Poor marketing system for animals, animal products and by-products.
- Inadequate insurance coverage of livestock and poultry.
- Lack of legal support for registration of livestock breeds and protection of farmers'/livestock
  keepers' rights.
- Poor orientation towards characterization and conservation of AnGR.
- IPR issues not clearly defined in the case of AnGR.
- Lack of harmony and coordination among different agencies.
Opportunities:
- Integral part of agriculture, with a synergistic relationship.
- Substantial contribution to GDP.
- Gainful employment, particularly for rural women and youth.
- Excellent potential of indigenous AnGR for low-cost conversion of poor-quality roughages
  into animal protein to cater to the fast-growing dietary demand of the human population.
- Large export potential for animal germplasm adapted to the tropics, including semen/embryos,
  and for animal products and by-products.
- Presence of large genetic variability within breeds for bringing about genetic improvement in traits
  of economic and environmental importance.
- Availability of technologies like genomics, phenomics, nano-biotechnology, cloning, etc. for
  faster genetic improvement in AnGR.
- Exploitation of animal draught power for better efficiency in farm operations.
- Scope for allele mining for biotic and abiotic stresses in indigenous AnGR.
- Increasing scope and market for organic agriculture.
- ITK provides researchable issues for animal production and health care.
Threats:
- Loss of superior germplasm due to uncontrolled breeding, migration and slaughter.
- Genetic dilution due to indiscriminate breeding.
- Trans-border illegal export of AnGR.
- Over-mechanization replacing draft animal power.
- Changes in production systems leading to intensive monoculture.
- Continued decrease in land under fodder production.
- Increased human population pressure.
- Increased pollution and degradation of the environment.
- Continuous decline in the population of some breeds due to changes in land use pattern.
- Traditional grazing areas in forests or revenue lands are planted or occupied with exotic
  species, viz. Lantana, Prosopis juliflora, Eucalyptus, etc., which has significantly reduced the
  grazing area and affected the population of breeds.
India possesses all major domesticated livestock species with rich diversity. The AnGR are well
adapted, with better heat and drought tolerance and disease resistance capabilities. Indigenous
livestock resources can survive and produce on coarse and poor-quality feed and fodder resources.
The sector is also a source of employment for a considerable number of rural people. However, the desired
progress in this sector is limited by low productivity, high population density vis-à-vis
inadequate pasture land availability, lack of performance and pedigree recording at the farmers' level,
inadequate superior/proven bulls for AI and natural mating, lack of legal support, poor marketing,
etc.
However, there exists large genetic variability which can be exploited for genetic improvement. The
large network of Research Institutes, State Agricultural/Animal Science Universities, State Animal
Husbandry Departments and NGOs can be utilized for sustainable improvement. Searching for
breed-specific biomolecules and niche markets for their products will enhance the utilization and
ensure the survival of these breeds. This sector also has great export potential in the form of animal
products and by-products. The rich indigenous traditional knowledge (ITK) provides great scope for
low-cost local management of AnGR.
Conservation and management of AnGR
Conservation of AnGR is generally defined as management of the biosphere for the benefit of the
present generation of mankind while maintaining its potential to meet future needs. Over the years,
intensification of animal husbandry and widespread introduction of exotic breeds have completely
altered the Animal Genetic Resources scenario. There has been a perceptible increase in a limited number of
specialised breeds, whereas genetic variability and population size have declined in
many local breeds which did not meet the increasing production requirements of the farming
enterprise. Social changes have also greatly influenced AnGR, especially small ruminants, because the
present generation is not keen to continue the ancestral occupation of rearing livestock in a
migratory system of grazing. Human population pressure, loss of access to resources, overgrazing
and resource degradation have also adversely affected animal genetic resources. As a result of these
developments, several indigenous livestock and poultry breeds have suffered decline
and degeneration over the years, but we may need to fall back on the adaptability and potential of indigenous
animal genetic resources to face an uncertain future. Genetic diversity is necessary for adaptation to
changing environments (production systems, climate, etc.), emerging diseases, changing markets,
changes in consumer demand, continued genetic improvement, cultural and historical reasons, etc.
There is a need for immediate action for systematic conservation, genetic enhancement and
sustainable utilization of indigenous breeds.
World-wide discussion on conservation of genetic resources in animal production started much
later than in plant production. The need for conservation of animal genetic resources has been
accepted globally for sustainable development. Several international and national agencies have
taken up conservation of rare and dwindling breeds of domestic animals in various parts of the
world.
History of Initiatives to Stop Genetic Erosion at Global and National Levels

Fifties of 20th century: Swedish AI studs conserved semen from each bull used for breeding.
Sixties: Scientific and farmer communities in Europe drew attention to the high rate of erosion of AnGR.
1972: The first United Nations Conference on Environment in Stockholm recognized these
  developments and problems and initiated the United Nations Environment Programme
  (UNEP), with one of its tasks being to monitor the breeds of livestock in danger of disappearing
  for whatever cause.
1974: FAO and UNEP initiated joint projects on conservation.
1980: Technical Consultation on Animal Genetic Resources Conservation and Management
  held in Rome.
1985: FAO introduced an expanded Global Strategy for the Management of Farm Animal
  Resources under the responsibility of the CGRFA.
1992: The Convention on Biological Diversity (CBD), signed in Rio by 150 governments,
  committed the nations of the world to conserve their biodiversity, to ensure its sustainable
  use, and to provide for equitable sharing of the benefits arising from its use.
1993: Initiation of the Global Strategy for the Management of Farm AnGR by FAO.
1997: Intergovernmental Technical Working Group (ITWG) on AnGR established by CGRFA.
1999: CGRFA requested FAO to develop a State of the World's Animal Genetic
  Resources for Food and Agriculture (SOW-AnGR).
2002-2003:
  FAO initiated development of country reports on AnGR.
  Govt. of India enacted the Biological Diversity Act (BDA) to address issues related to the
  Biodiversity Fund, and access and benefit sharing mechanisms.
  National Biodiversity Authority (NBA) established to implement India's Biological
  Diversity Act.
  Country report on State of Animal Genetic Resources of India submitted to FAO.
2004: CGRFA decided to finalize the SOW-AnGR, including Strategic Priorities for Action, at a
  first International Technical Conference on Animal Genetic Resources.
2007: Interlaken Conference launched the SOW-AnGR and adopted the Global Plan of Action
  through the Interlaken Declaration.
2009: Ranchi Declaration (India) stressed the need for a National Plan of Action on management
  and conservation of farm animal genetic resources.
2011: ICAR formed a National Advisory Board on Management of Genetic Resources
  (NABMGR).
2011: NABMGR recommended that a National Action Plan on Animal Genetic Resources be
  developed by NBAGR.
2011: India signed the Nagoya Protocol on access to genetic resources and the fair and
  equitable sharing of benefits arising from their utilization, and ratified it on 19 October
  2012.
2012: As per the guidance of NABMGR, Guidelines for Management of Genetic Resources
  were prepared.
2013: Second Country Report on AnGR submitted.

The conservation of Animal Genetic Resources is now a multidimensional activity which
encompasses not only preservation and maintenance of existing breeds but also their improvement
and proper management. The overall aim is sustainable utilization, restoration and enhancement of
resources so as to meet the needs of mankind at present and in future.
The task of conservation, management and evaluation of widely distributed native animal genetic
resources is of gigantic magnitude for our country. There is hardly any pedigree breeding with
proper data recording under field conditions; therefore, genetic improvement through selection is
limited to organized herds only. The concept of conservation of AnGR is poorly understood by
development agencies and farmers. While there are obviously cases where indigenous breeds
are indeed no longer profitable and costly conservation programmes may need to be established in
order to maintain diversity values, there are also many cases where it is possible to support de facto
conservation, i.e. through sustainable use. A further point to note on this issue is that it is likely to be
much cheaper to conserve local breeds through sustainable use before they become endangered.
Finding niche markets for their products is one possible way of ensuring the survival of these
breeds, and of enabling the people who keep them to earn more from their existing lifestyle.
Approach for conservation of AnGR
Although some precious livestock genetic resources have already been lost and have become
extinct, there is still time to take stock of current assets and conserve the fast declining genetic
treasure. It includes the preservation, protection and management of all natural resources. There can
be two approaches to conserving our genetic material:
In-situ conservation: In-situ conservation requires establishment of live cattle breeding farms and
their maintenance in their native tract. In-situ conservation strategies emphasize wise use of
indigenous cattle genetic resources by establishing and implementing breeding goals and strategies
for sustainable animal production systems. Major advantages of in-situ conservation are that this
system offers sufficient genetic diversity as the animals undergo a gradual process of adaptation.
Proper breeding plans can be executed, genetic defects can be detected and animals can be
evaluated and improved over the years. The selected animals are always available for immediate
and future use. However, in-situ conservation involves a large infrastructure of land, buildings,
feed and fodder resources, water supply, technical and supervisory manpower, etc. Therefore, new
establishments for in-situ conservation of farm cattle genetic resources are quite costly and even the
maintenance of existing ones is cumbersome. The cost factor needs to be estimated for each
ecosystem. That is why more than half of the breeds are not covered under this system. In-situ
models of AnGR conservation have been developed by NBAGR by providing technical inputs and
incentives to the farmers/breeders in the breeding tract of respective farm animal breeds.
Ex-situ conservation: Ex-situ conservation means literally, "off-site conservation". It is the process
of protecting an endangered species of animal/plant by removing part of the population from a
threatened habitat and placing it in a new location, which may be a wild area or within the care of
humans. While ex-situ conservation comprises some of the oldest and best known conservation
methods, it also involves newer, sometimes controversial laboratory methods. Ex situ conservation
of genetic material from livestock through cryopreservation is an important strategy to conserve
genetic diversity in many species. Conservation strategies benefit from advances made in
cryopreservation and reproductive technologies. Choice of type of genetic material to be preserved
for different species highly depends on objectives, technical feasibility (e.g., collection, cryoconservation), costs, and practical circumstances. National Animal Gene Bank has been established
at NBAGR, Karnal. A total of 1,09,200 frozen semen doses belonging to 277 breeding males
(Bulls/Rams/Bucks/Stallions) from 41 breeds representing cattle, buffalo, sheep, goat, camel, yak
and equine have been preserved at National Gene Bank. Animal Genomic Resource Bank has also
been established at NBAGR, Karnal which has collection of genomic DNA from 130
breeds/populations of livestock and poultry. It also has buffalo mammary gland EST library.
Strategies for future development
The onus for achieving goals of the national programme on conservation, sustainable management
and use of animal genetic resources lies with many players, such as farmers and livestock owners,
ministries, govt. departments, institutes, non-profit-making social and charitable NGOs, breeding
organisations, researchers, etc. Conservation and utilization of AnGR can be best achieved through
a joint approach by involving all stakeholders. These should understand and participate in all
activities relating to management of AnGR like implementation of improvement and conservation
programmes, animal identification, performance recording, marketing and branding of animal
products, development of pasture lands, fodder production, etc.
Breeding plans for long term conservation and continuous genetic improvement of indigenous
breeds need to be undertaken by establishing elite herds of the breed in its native habitat for
production of superior young males for breeding. A large number of government and non-government owned livestock farms exist in each state. Many of these farms maintaining indigenous
livestock breeds are not in good condition and are on the verge of closing down because of
inconsistent breeding plans; and inadequate availability of funds, manpower and other facilities.
These farms have the basic infrastructure which can be strengthened further for maintaining proper
herd size and implementing conservation, breed improvement, germplasm multiplication,
demonstration and utilization programmes. There should be effective networking of satellite
breeding flocks/herds of respective breeds with the established nuclear breeding units for
exchange of elite germplasm, multiplication, dissemination of germplasm, providing breeding
services in the breed tract and also supporting training and capacity building to livestock keepers.

2
Designing Field Strategies for Characterization of Farm Animal Genetic
Resources
P K Singh and Karuna Asija
ICAR- National Bureau of Animal Genetic Resources, Karnal (Haryana)

________________________________________________________________________________________

The term Animal Genetic Resources (AnGR) refers to those animal species and the populations
within each species that are used, or may be used for the production of food and agriculture. The
population within each species can be classified as wild and feral populations, landraces and
primary populations, standardized breeds, selected lines, varieties, strains and any conserved
genetic material- all of which are currently categorized as breeds. Thousands of years of natural and
human selection, genetic drift, inbreeding and crossbreeding have contributed to today's AnGR
diversity and allowed the development of sustainable livestock production in various agro-ecological zones and production systems. The 40+ livestock species contributing to today's
agriculture and food production are shaped by a long process of domestication and development.
Genetically diversified livestock populations provide a greater range of options for meeting future
challenges in changing environment, disease threat, nutritional requirement, market and human
demands. Among the world's 148 non-carnivorous species weighing more than 45 kg, fifteen could
be domesticated. Thirteen of these species are from Europe and Asia and 2 originate from South
America. Six species (cattle, sheep, goat, pig, horses and donkeys) are found in all the continents
while the remaining 9 (dromedaries, Bactrian camels, llamas, alpacas, reindeer, water buffalo, yak, Bali
cattle, mithun) are important in limited areas of the world. The proportion is even lower in the case
of birds (other than ornamental and recreational species), with only 10 species (chicken, domestic
ducks, Muscovy ducks, domestic geese, guinea fowl, ostriches, pigeons, quails and turkeys)
currently domesticated out of 10,000 avian species.
The breed has been defined by FAO as "Either a sub-specific group of domestic livestock with
definable and identifiable external characteristics that enable it to be separated by visual appraisal from other
similarly defined groups within the same species, or a group for which geographical and/or cultural
separation from phenotypically separate groups has led to acceptance of its separate identity".
Characterization of livestock biodiversity
Understanding the diversity, distribution, basic characteristics, comparative performance and the
current status of animal genetic resources is essential for their efficient and sustainable use and
conservation. Complete national inventories, supported by periodic monitoring of trends and
associated risks are basic requirements for effective management of animal genetic resources.
Without such information, some breed populations and the unique characteristics they contain may
decline significantly, or be lost, before their value is recognized and measures are taken to conserve
them. A major difficulty in completing the inventory of farm animal breeds results from the fact that
livestock breeds generally do not correspond to the notion of a herd book breed and are not pure
breeds with identifiable characteristics but are the result of haphazard breeding programmes under
field conditions.
Keeping this in view, the National Bureau of Animal Genetic Resources has initiated a structural
programme for phenotypic characterization and development of breed descriptors for animal
genetic resources. The technical programme of phenotypic characterization and development of
breed descriptors is quite comprehensive and envisages conducting scientific surveys by
following modern sampling designs and suitable formats, descriptors and questionnaires for
collecting all possible relevant information for a particular breed inhabiting a defined
zoo-geographical zone. Such surveys of breeds/animal types must ensure mandatory recording of
the following types of information:
(i) Demographical and geographical distribution
(ii) The native environment
(iii) Enumeration of breeds in terms of age, sex in a population
(iv) Management practices and utility
(v) Qualitative and quantitative characterisation of breeds in relation to morphological
traits, production potential and reproductive status etc.
(vi) Qualitative and quantitative description of unique animals, elite producers and rare or
unusual characteristics in certain specimens
Survey plan
The survey would be conducted preferably in three districts of the breeding tract of the breed under
consideration. Each district would have one supervisor and four enumerators. On the assumption
that the breeding tract of a breed is spread over adjoining/contiguous districts in one or more
states, a stratified two-stage sampling design may be adopted. Different zones within a district
would be identified, which may constitute the different strata. Villages within the stratum may
constitute the first unit and houses within the village, the second unit. In total, three districts,
four strata within each district and five villages within each stratum would be randomly selected.
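The selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tract layout, the district, zone and village names and the function draw_survey_sample are hypothetical, and households (the second-stage units) would be drawn within each selected village in the same way.

```python
import random


def draw_survey_sample(tract, strata_per_district=4, villages_per_stratum=5, seed=None):
    """Pick strata (zones) and then villages at random within each district.

    `tract` maps district -> zone -> list of village names (hypothetical layout).
    Returns district -> selected zone -> list of selected villages.
    """
    rng = random.Random(seed)  # seeded so that the draw can be reproduced
    sample = {}
    for district, zones in tract.items():
        # Strata: zones chosen at random within the district.
        chosen_zones = rng.sample(sorted(zones), min(strata_per_district, len(zones)))
        sample[district] = {}
        for zone in chosen_zones:
            villages = zones[zone]
            # First-stage units: villages chosen at random within the stratum.
            k = min(villages_per_stratum, len(villages))
            sample[district][zone] = rng.sample(villages, k)
    return sample


# Hypothetical breeding tract: 3 districts x 6 zones x 12 villages each.
tract = {
    f"District-{d}": {
        f"Zone-{z}": [f"D{d}Z{z}-village-{v}" for v in range(1, 13)]
        for z in range(1, 7)
    }
    for d in range(1, 4)
}

selected = draw_survey_sample(tract, seed=2015)
n_villages = sum(len(v) for zones in selected.values() for v in zones.values())
print(n_villages)  # 3 districts x 4 strata x 5 villages = 60
```

Fixing the seed is a convenience for documentation: the same village list can be regenerated later if the survey plan has to be audited.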
Demographical and geographical distribution of a breed
In the first quarter, the supervisor and enumerators would be engaged in determining
demographical and geographical distribution of the breeds. From each stratum, five villages may be
randomly selected for complete enumeration for the purpose of deriving demographic distribution
of the breed. This study would cover the following information:
a. Age wise and sex wise distribution.
b. Group enumeration for calves/ kids/ lambs, young stock and adults (milking females, dry
females, working males, stud bulls etc.).
c. Geographical distribution of the breed.
Complete information is obtained by stratified survey on data regarding group-wise, sex-wise
and breed-wise total population in the breeding tracts. During survey if individual animals with
exceptionally high producing capacity or with rare genetic variation are located, they would be
brought under organisational support or purchased for further studies.
Information would be recorded on 3000 animals covering three districts of the breeding tract. In
each district, 200-250 animals under each of the groups would be studied for the aspects given against
the group (Table 1).
Thus, there would be 1000 animals in a district which would be randomly selected from 4
randomly selected zones. National Bureau of Animal Genetic Resources has formulated five to six
questionnaires separately for different livestock and poultry species for collecting the required
information for phenotypic characterization of the breed, which may be used during the survey for
collection of information on different aspects of phenotypic characterization.
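The sample sizes quoted above follow from simple arithmetic, sketched below. The group counts are read from Table 1; the assumption that the 1000 animals of a district are split equally over the groups is ours, made only to show where the range of 200-250 animals per group comes from.

```python
DISTRICTS = 3
ANIMALS_PER_DISTRICT = 1000

# Number of groups per species, as listed in Table 1.
groups_per_species = {
    "cattle and buffaloes": 5,
    "sheep": 4,
    "goat": 5,
    "pig": 4,
    "poultry": 4,
}


def animals_per_group(n_groups, animals_per_district=ANIMALS_PER_DISTRICT):
    """Equal split of a district's animals over the groups (an assumption)."""
    return animals_per_district // n_groups


for species, n_groups in groups_per_species.items():
    print(f"{species}: {n_groups} groups x {animals_per_group(n_groups)} animals")

total_animals = DISTRICTS * ANIMALS_PER_DISTRICT
print(total_animals)  # 3000
```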
Native environment
Some important meteorological parameters are to be recorded for the breeding tract of a breed. These
include temperature, humidity and rainfall, in terms of maximum and minimum along with their
respective months and the average of the last 10 years. Annual duration of flood and drought along with
their months, maximum and minimum elevation of land, sub soil water depth during summer and
rainy season are also recorded. Other information like soil description, forest area (in sq. kms.), wet
cultivated area, dry cultivated area, uncultivated area, main cultivated cereals, main cultivated
pulses, other crops are recorded. Area of the pasture available for the grazing of animals along
with classification of the pastures as (mountainous/ sub-mountainous/ plains- irrigated/ rain
fed/ sandy) are also to be recorded.
Table 1: Group classification and study coverage for phenotypic characterization
Species | Group | Study coverage
Cattle and Buffaloes | Calves (up to 1 year) | Physical traits, feeding, management practices
Cattle and Buffaloes | Stock (1-3 years) | Physical, growth traits, feeding and management practices
Cattle and Buffaloes | Milking females | Physical traits, feeding and management practices, reproduction, production and growth traits
Cattle and Buffaloes | Working males | Physical traits and feeding and management practices
Cattle and Buffaloes | Breeding bulls | Physical and reproductive traits and feeding and management practices
Sheep | Lamb (1-3 months) | Physical traits, feeding, management practices and growth traits
Sheep | Young stock (6-12 months) | Physical traits, feeding, management practices and growth traits
Sheep | Milking ewes | Physical, productive, reproductive traits, feeding and management practices
Sheep | Stud rams | Physical, reproductive traits and feeding, management practices
Goat | Kids (1-3 months) | Physical traits, feeding, management practices and growth traits
Goat | Young stock (6-12 months) | Physical traits, feeding, management practices and growth traits
Goat | Yearlings | Physical, reproductive traits, feeding, management practices and growth traits
Goat | Milking does | Physical, productive, reproductive traits, feeding and management practices
Goat | Stud bucks | Physical, reproductive traits and feeding, management practices
Pig | Piglets (0-2 months) | Physical traits, feeding, management practices and growth traits
Pig | Young stock (2-8 months) | Physical traits, feeding, management practices, growth traits
Pig | Sows | Physical, productive, reproductive traits, feeding and management practices
Pig | Boars | Physical, reproductive traits, feeding and management practices
Poultry | Cockerels (up to 5 months) | Physical traits, feeding, management practices and growth traits
Poultry | Pullets (up to 5 months) | Physical traits, feeding, management practices and growth traits
Poultry | Cock (above 5 months) | Physical, reproductive traits, feeding, management practices and growth traits
Poultry | Hen (above 5 months) | Physical traits, feeding, management practices, utility, egg production and growth traits


Enumeration of breed
Population statistics is important for classification of breeds as per their risk status classes.
Therefore data is required for the estimation of population status of breed under consideration.
FAO (2013) has laid out criteria for classification of breeds according to their risk status. For high
reproductive capacity species like pig, dog, rabbit and all avian species, the criteria would be as
follows:

Risk status class | Total number of breeding females mated to males of same breed | Overall population | Total number of breeding males | Rate of inbreeding per generation
Extinct | No breeding male or female remaining and no cryo-preserved genetic material.
Cryopreserved only | No living male or female but sufficient cryo-preserved genetic material for reconstitution of breed.
Critical | ≤100, <20% CB | ≤80 and increasing trend OR ≤120 and decreasing or static trend | ≤5 | 3% or higher
Critical-maintained | As for Critical but for which active conservation programmes are in place or populations are maintained by commercial companies or research institutions.
Endangered | 100-1000, <20% CB | 80-800 and increasing trend OR 120-1200 and decreasing or static trend | >5 and ≤20 | 1-3%
Endangered-maintained | As for Endangered but for which active conservation programmes are in place or populations are maintained by commercial companies or research institutions.
Vulnerable | >1000 and ≤2000, <20% CB | 800-1600 and increasing trend OR 1200-2400 and decreasing or static trend | >20 and ≤35 | 0.5% to 1%
Not at Risk | If the population status is known and the breed does not fall in any of the above categories.
Unknown | If the population data is not available for the breed.

For low reproductive capacity species, e.g. cattle, buffalo, sheep, goat, equine, camel, yak, mithun
etc., all the figures would be three times those for high reproductive capacity species given in the above
table. If a breed falls short on any of the above criteria, it will be kept in the respective
class. Therefore, it is important to get number of breeding males, breeding females, population
trend, percent of breeding females bred to males of the same breed and overall population size of
the breed so as to classify the breeds as per their risk status. During the survey in the selected
villages, these figures may be collected and the estimated population may be obtained by extrapolating
these figures to census data. Along with the population data, average herd size and age-wise/sex-wise
classification within the herd are also required.
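As a rough illustration of the two steps above, the sketch below first extrapolates a breed population from survey counts to census data and then assigns a risk class. Several things here are assumptions rather than the prescribed procedure: the extrapolation uses a simple proportion (share of the breed among the animals of the species enumerated in the surveyed villages), only the breeding female and breeding male criteria are checked, the comparison signs are taken as "up to and including" where the printed table is unclear, and the function names and figures are hypothetical.

```python
# (class, max breeding females, max breeding males) for high reproductive
# capacity species, as read from the table above (signs assumed to be "<=").
RISK_THRESHOLDS = [
    ("Critical", 100, 5),
    ("Endangered", 1000, 20),
    ("Vulnerable", 2000, 35),
]


def estimate_breed_population(breed_surveyed, species_surveyed, census_species_population):
    """Scale the breed's share in the surveyed villages up to the census figure."""
    if species_surveyed <= 0:
        raise ValueError("species_surveyed must be positive")
    share = breed_surveyed / species_surveyed
    return round(share * census_species_population)


def risk_class(breeding_females, breeding_males, low_reproductive=False):
    """Return the most severe class triggered by either criterion.

    For low reproductive capacity species all figures are multiplied by three,
    as stated in the text.
    """
    factor = 3 if low_reproductive else 1
    for name, max_females, max_males in RISK_THRESHOLDS:
        if breeding_females <= max_females * factor or breeding_males <= max_males * factor:
            return name
    return "Not at Risk"


# Hypothetical figures: 300 of 1200 cattle enumerated belong to the breed,
# and the census reports 40000 cattle in the breeding tract.
estimated = estimate_breed_population(300, 1200, 40000)
print(estimated)  # 10000

print(risk_class(2500, 60, low_reproductive=True))   # Endangered
print(risk_class(2500, 60, low_reproductive=False))  # Not at Risk
```

A full classification would also need the population trend, the overall population size, the percentage of females bred to males of the same breed and the rate of inbreeding, which are left out here to keep the sketch short.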
General information about livestock keepers
Communities responsible for rearing of breed and description of communities (farmers/nomads/
isolated tribal/ any other) who are the keepers of the breed in question, is to be recorded along with
some socio-economic parameters on them. Information is to be collected about the livestock
keepers only once during the survey. This information includes agricultural land holding
(irrigated/ non-irrigated), feed and fodder grown during different seasons, profession, total annual
income in rupees, income generated through animal husbandry, number of family members,
number of literate members, number of members engaged in animal husbandry practices
(men/women/children), mode of sale and purchase of animals and animal products and utility of
the livestock/ poultry reared by them.
Management Practices
Housing and Hygiene: Duration of housing of the animals like during day/night/ both day and
night/ none is to be recorded. Nature of animal houses like open/closed; kutcha/ pucca; separate/
part of residence; kutcha floor/ pucca floor; full walled/ half walled; well ventilated or not;
sanitary condition (good or poor) and drainage system of house are to be recorded. Hygiene of
feeding/water trough and cleaning of milk utensils as well as animals etc may also be recorded.
Wallowing practices, if any, are also to be recorded indicating place and duration of the wallowing.
Feeding: Feeding practices for calves would be recorded every month in cattle and buffaloes and
every fortnight in sheep and goats. Feeding of mothers would be recorded once in three months.
However, for feeding and management practices for rest of groups, recording would be done once
in every 3 months. Grazing practices along with the distance and time covered is to be recorded in
morning as well as evening. Stall feeding is to be recorded in terms of individual or group feeding
and quantum of the feed offered as follows:

Feed | Morning: Name | Morning: Qty (kg) | Noon: Name | Noon: Qty (kg) | Evening: Name | Evening: Qty (kg)
Green fodder | | | | | |
Dry fodder | | | | | |
Concentrate | | | | | |
Minerals | | | | | |

In pigs, stall feeding/semi stall feeding/ scavenging alone/ scavenging with supplementation of
kitchen waste and vegetable waste is to be recorded. While recording stall feeding, the
name of the supplement, like cake/ concentrate/ mineral mixture/ green fodder, is also to be recorded. The
feeding practices of piglets are also needed. In poultry, the feeding through scavenging/ scavenging
with supplement feed/ free ranging/ free ranging with supplement feeding/ feeding with local
feeds/ feeding with branded concentrates are the major feeding practices which are to be recorded
in the management of breed under study.
Water: Adequate/inadequate along with the quantity and water source
Hay/ silage making practices for preserving the fodder are also to be recorded.
Breeding: Natural/ artificial insemination, along with information about the breeding males used, is to be recorded.
Treatment and prophylactic measures of the diseases: The type of diseases along with the treatment given
to the animals (herbal/ allopathic or local) is to be recorded. The prophylactic measures taken in
terms of de-worming and vaccination etc. are also to be recorded. During the visits in each season,
reproductive and disease management aspects would be recorded by observations as well
as interaction with the farmer. Breeding bulls might not be available in sufficient number and
therefore studies would be limited to whatever is available in the area of coverage. Prenatal and post
natal mortality at different age groups is to be recorded in all the species.

Physical Characters (separately in males and females):

Characters | Cattle/buffalo | Sheep | Goat | Pig | Poultry
Colour | Colour of coat, skin, muzzle, eyelids, tail and hoofs | % surface area in coat colour with distinctive colour markings | % surface area in coat colour with distinctive colour markings | % surface area in coat colour with distinctive colour markings | Plumage colour: White/ Black/ Blue/ Red/ Brown/ Gold/ Others (specify); Pattern: Solid/ Dull/ Striped/ Patchy/ Spotted/ Barred/ Others (specify); Skin colour: White/ Yellow/ Blue/ Black/ Other; Shank colour: White/ Yellow/ Black/ Blue/ Green; Earlobe colour: White/ Red/ Black/ White & Red/ Others (specify); Comb colour: Black/ Red/ Others (specify); Eye colour: Grey/ Black/ Brown/ Others (specify); Comb type: Single/ Pea/ Rose/ Walnut/ Cushion/ Strawberry/ Duplex/ V-shaped/ Double
Horns | Colour; Size; Shape (straight/ curved); Orientation | Colour; Size (small <15/ medium 15-25/ large >25 cm); Shape (straight/ curved); Orientation | Colour; Size (small <15/ medium 15-25/ large >25 cm); Shape (straight/ curved); Orientation | -- | --
Ears | Orientation (horizontal/ drooping); Length | Orientation (erect/ pendulous/ horizontal) | Orientation (erect/ pendulous/ horizontal) | Orientation (erect/ pendulous/ horizontal) | --
Head | Forehead (convex/ concave/ straight) | Forehead (straight/ convex/ slightly convex) | Forehead (straight/ convex/ slightly convex) | Snout profile (straight/ convex/ slightly convex/ concave) | --
Body | L/M/S | -- | -- | -- | --
Hump | L/M/S | -- | -- | -- | --
Dewlap | L/M/S | -- | -- | -- | --
Navel flap | L/M/S | -- | -- | -- | --
Penis sheath flap | L/M/S | -- | -- | -- | --
Basic temperament | (docile/ moderate/ tractable/ wild) | (docile/ moderate/ tractable/ wild) | (docile/ moderate/ tractable/ wild) | (docile/ moderate/ tractable/ wild) | --
Tail | Length (L/M/S); Colour of switch | Length (L/M/S); Type | -- | -- | --
Udder & Teat | Shape (bowl/ round/ trough/ pendulous); Fore-udder size (L/M/S); Rear-udder size (L/M/S); Teat shape (cylindrical/ funnel/ pear); Teat tip (pointed/ round/ flap); Milk vein (L/M/S) | Shape | -- | Number of teats and teat position | --
Beard | -- | (present/ absent) | (present/ absent) | -- | --
Wattles | -- | (present/ absent) | (present/ absent) | -- | --
Others | -- | Coat type (hair/ wool); Fineness (fibre diameter) (fine <21/ medium 22-26/ coarse >26 micrometers); Length (12 mo fleece) (short <5/ medium 5-10/ long >10 cm); Lustre (lustrous/ non-lustrous); Crimp/curl (straight/ low crimp = <4/ high crimp = >4 cm); Wool cover (covered/ bare) on a. Head b. Face c. Belly d. Legs are to be recorded | Coat type (hair/ cashmere/ pashmina/ mohair); Fineness (fibre diameter); cashmere/ pashmina/ down; mohair | Coat: a. Bristle (long/ medium/ short); b. Fineness (bristle diameter) | Dwarfism, feathered legs, naked neck, silky frizzle, multiple spurs, etc.
Top line | -- | -- | -- | Straight/ concave | --
L: Large, M: Medium, S: Small


Morphometric traits: Chest girth, body length and height at withers are important morphometric
parameters in mammalian livestock at different age and sex classes. Physical measurements for the
mothers would be recorded during the first/second and 8th-10th month of lactation in cattle and
buffaloes and during the first/second and 6th month of lactation in sheep and goats. For
calves/kids/lambs, measurements would be taken every month up to 6 months and thereafter
every six months. For stock (1-3 years) body measurements would be recorded once in every 6
months and for others only once in cattle and buffaloes. For young stock, body measurements
would be recorded once in every 3 months in sheep and goats, every two months in pigs and for
others only once. In pigs, neck girth is also recorded along with the above-mentioned morphometric
traits.
Production Performance
Body Weight (kg):
Species | Body weights recorded
Cattle/buffalo | Birth weight, pre-weaning weight, 12m weight, 24m weight, weight at first mating and weight at first calving
Sheep | Birth weight, pre-weaning weight, 3m weight, 6m weight, 12m weight, weight at first shearing, weight at first lambing and body weight at marketing with age
Goat | Birth weight, pre-weaning weight, 3m weight, 6m weight, 12m weight, weight at slaughter, weight at first kidding
Pig | Birth weight, pre-weaning weight, 3m weight, 6m weight, 12m weight, body weight at slaughter and at first farrowing
Poultry | Weight at hatching, at 8 and 12 weeks of age and at slaughter; body weight gain/kg feed for the periods 0-8 weeks, 8-12 weeks and 8-20 weeks is also recorded with the feed conversion efficiency


Dairy performance: Dairy performance of cattle, buffalo, goat and sheep is recorded for the first
four lactations. Milk recording would be done once a month from the first month of
lactation to the end in cattle and buffaloes; at fortnightly intervals for full lactation in goat; and on
7th and 50th day of lactation in sheep. Milk fat and SNF would be estimated every day from
morning milk only. The traits considered for cattle and buffalo include daily milk yield, peak milk
yield, days to reach peak yield, lactation length, lactation milk yield, fat%, SNF%, milking rate
(litres/min.), productive life span (month), dry period, feed conversion for milk, percentage of
animals in different lactations. In case of sheep and goat the traits peak milk yield, days to reach
peak yield, milking rate (litres/min.), and percentage of animals in different lactations are not
required. However, in case of sheep the traits like dry period, feed conversion for milk are also not
required. Abnormality of teats may also be recorded while recording the milk production
performance.
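Since milk is recorded only once a month, lactation milk yield has to be estimated from the test-day records. The text does not say which method is to be used; the sketch below assumes the test interval method, one widely used approach in which the average of two consecutive test-day yields is multiplied by the number of days between them. The function names and the records are hypothetical.

```python
def lactation_yield(test_days, lactation_length):
    """Estimate lactation milk yield (kg) by the test interval method.

    test_days: list of (day_in_milk, daily_yield_kg) test-day records.
    lactation_length: days from calving to drying off.
    """
    if not test_days:
        raise ValueError("at least one test-day record is needed")
    records = sorted(test_days)
    first_day, first_yield = records[0]
    last_day, last_yield = records[-1]
    if lactation_length < last_day:
        raise ValueError("lactation_length is shorter than the last test day")

    # Calving to first test day: first recorded yield applied to every day.
    total = first_day * first_yield
    # Between two test days: mean of the two yields times the interval.
    for (d0, y0), (d1, y1) in zip(records, records[1:]):
        total += (d1 - d0) * (y0 + y1) / 2
    # Last test day to drying off: last recorded yield applied to every day.
    total += (lactation_length - last_day) * last_yield
    return total


def peak_yield(test_days):
    """Return (day_in_milk, daily_yield_kg) of the highest test-day yield."""
    return max(test_days, key=lambda record: record[1])


# Hypothetical records for a short lactation of 90 days.
records = [(15, 6.0), (45, 8.0), (75, 7.0)]
print(lactation_yield(records, 90))  # 630.0
print(peak_yield(records))           # (45, 8.0)
```

With monthly recording the peak yield and the days to reach peak yield obtained this way are only as precise as the recording interval allows.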
Other Production Traits:
Goat
Mohair production: Sampling site (shoulder/mid-side/thigh), number of shearings per year, average
greasy fleece weight, clean fleece weight, staple length, fibre diameter (true
mohair/heterotypes/kemps), fleece colour and feed conversion for wool are to be recorded in
males and females for the first and later shearings during the year.
Cashmere/Pashmina production: Age at combing/collection, weight of fibre per combing /collection,
clean yield%, fibre length, fibre diameter
Hair production: Average weight of clipping (kg), hair length and hair diameter are to be recorded
along with age at clipping to be presented in the tabular form.
Skin production: Average skin weight, skin length and skin width are to be recorded in kids and adults.
Sheep
Wool production: Information on sampling site (shoulder/mid-side/thigh), number of shearings per
year and processing type (carpet/crossbred/merino wool) are recorded. Average greasy fleece
weight, clean fleece weight, staple length, fibre diameter (true mohair/heterotypes/kemps), fleece
colour and feed conversion for wool are to be recorded in males and females for the first and later
shearings during the year.
Pelt production: Pelt weight, pelt length and pelt width are to be recorded in foetus and lamb.
Pig
Bristle production: It is recorded in both the sexes in terms of number of cuttings per year and
average weight, length, diameter and colour of bristle in each cutting.
Carcass characters like carcass weight (kg), age at slaughter (d), weight (Hot/Cold), length,
dressing % (hot/cold), skin %, meat: bone ratio, fat thickness, lean %, bone %, fat % are recorded for
goat, sheep and pig.
Poultry
Egg production characteristics in terms of age at first egg, egg numbers, age at 50% production, age
at culling.
Egg quality traits like albumen index, yolk index, Haugh unit, shell weight, albumen weight, yolk
weight, specific gravity, egg weight (g) (40/50/60/70), shell colour (white/brown/cream or
tinted/other), shell strength (shell thickness and breaking strength), albumen quality, egg inclusion
bodies (blood spots/meat spots) are to be recorded.
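Two of the egg quality traits listed above are derived from simple measurements. The sketch below uses the standard Haugh unit formula, HU = 100 log10(H - 1.7 W^0.37 + 7.6), where H is the height of the thick albumen in mm and W the egg weight in g, and takes the yolk index as yolk height divided by yolk diameter. The function names and the example measurements are hypothetical.

```python
import math


def haugh_unit(albumen_height_mm, egg_weight_g):
    """Haugh unit from thick albumen height (mm) and egg weight (g)."""
    return 100 * math.log10(albumen_height_mm - 1.7 * egg_weight_g ** 0.37 + 7.6)


def yolk_index(yolk_height_mm, yolk_diameter_mm):
    """Yolk index as the ratio of yolk height to yolk diameter."""
    if yolk_diameter_mm <= 0:
        raise ValueError("yolk_diameter_mm must be positive")
    return yolk_height_mm / yolk_diameter_mm


# Hypothetical egg: 60 g, albumen height 8 mm, yolk 18 mm high and 40 mm wide.
hu = haugh_unit(8.0, 60.0)
yi = yolk_index(18.0, 40.0)
print(round(hu, 1), round(yi, 2))
```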
Qualitative and quantitative descriptions of individual animals other than the above which are
given in the breed descriptor would be covered once. During survey if individual animals with
exceptionally high producing capacity or with rare genetic variation are located, they would be
brought under organisational support or purchased for further studies.
Reproduction Performance:
Males:
(i) Age at first ejaculation (days) in case of cattle and buffalo only, (ii) Age at first mating
(days) (iii) If breed is under artificial insemination, semen quality parameters should also be
recorded.

Females: Age at first oestrus, oestrous cycle duration (days), oestrus duration (hrs), age at first
mating, age at first calving/kidding/lambing/farrowing etc., calving/kidding/lambing/farrowing
interval, gestation length and range, and twinning percentage are recorded in all
mammalian species. In cattle and buffalo it is desirable to record interval from calving to first
conception, conception rate, number of services per conception, service period and range, dystocia
percentage, placental retention (%), abortions (%), still births (%), post gestational mortality (%). In
the case of sheep, goat and pig, seasonality, litter size and lifetime number of kiddings/lambings/farrowings
are to be recorded. In pigs it would be desirable to record litter weight and litter size at weaning. The
infectious and non-infectious abnormalities, abortions and still births, and pre-weaning and adult
mortality should also be recorded.
Poultry: in terms of age at first egg, Broodiness (usual/ sometimes/rare/other), fertility (%) and
hatchability (%) on fertile eggs basis and on total eggs basis.
Mortality (%) in poultry: a) 0-1 weeks b) 1-8 weeks c) 8-20 weeks d) n-n weeks
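The difference between hatchability on fertile eggs basis and on total eggs basis is easiest to see with numbers. The sketch below uses the usual definitions (fertility = fertile eggs out of eggs set; hatchability = chicks hatched out of fertile eggs or out of all eggs set); the function names and counts are hypothetical.

```python
def percentage(part, whole):
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


def hatch_summary(eggs_set, fertile_eggs, chicks_hatched):
    """Fertility and hatchability (%) on fertile eggs and total eggs basis."""
    if not 0 <= chicks_hatched <= fertile_eggs <= eggs_set:
        raise ValueError("expected chicks_hatched <= fertile_eggs <= eggs_set")
    return {
        "fertility": percentage(fertile_eggs, eggs_set),
        "hatchability_fertile_eggs": percentage(chicks_hatched, fertile_eggs),
        "hatchability_total_eggs": percentage(chicks_hatched, eggs_set),
    }


# Hypothetical hatch: 100 eggs set, 90 fertile, 72 chicks hatched.
summary = hatch_summary(100, 90, 72)
print(summary)
```

Hatchability on total eggs basis can never exceed hatchability on fertile eggs basis, which gives a quick check on field records.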
Draft ability- type of work: Parameters like purpose of draft (ploughing, threshing, power etc.),
capacity for work (hard/medium/light) and average duration of work/day (hrs) are recorded to
know the draft ability of the animal.
Physiology and diseases: Rectal temperature, pulse rate and respiration rate are to be recorded in
males and females. Drought tolerance and heat tolerance are graded from 1 (lowest) to 5 (highest).
Common diseases and parasites, measures taken against them (including prophylactic measures),
and the resistance of the breed to infectious diseases and parasites are to be
recorded.
Documentation of Livestock biodiversity
For documentation of indigenous livestock breeds, the NBAGR, State Animal Husbandry
Departments and many Agricultural/Veterinary Universities are publishing breed
monographs, and up to now approximately 100 monographs have been published by different
organizations. NBAGR is also publishing the breed descriptors of different livestock and poultry
breeds in the Indian Journal of Animal Sciences as a special feature. NBAGR has also released
breed charts/calendars for cattle, buffalo, sheep, goat and chicken species in which one male and
one female animal of every breed of that species is depicted. It is therefore important to
document the information collected for the phenotypic characterization of the breed in the form of a
monograph, leaflet or video documentary. By doing so we may keep the information in the public
domain and avoid any kind of biopiracy.
Brief description of livestock breeds- Breed descriptors
The breed descriptor of a breed includes the minimum information in a summarized form so as to
describe the breed in all respects. Breed descriptors generally have five major parts, i.e. General
description, Physical characters, Performance traits, Physiology of the animal and Diseases. Under
general description, the name of the breed, its origin, the communities rearing the breed with their
socio-economic status, the different kinds of management practices including feeding, grazing,
housing, breeding and health management, the native tract of its distribution along with the native
environment, and the population status of the breed are described. Physical characters include the
qualitative and quantitative physical traits and biometric observations of animals belonging to
different ages and sexes. The performance of the animals is generally recorded in terms of growth,
production, reproduction and draft ability. Besides these factors, the uniqueness of
the animals and their adaptive traits should also be mentioned in the breed descriptors. It is also
appropriate to give photographs of a typical mature breeding male and female along with
the breed descriptor.
Epilogue
If any population of any livestock species is available in a particular geographical area, fulfils the
status of a breed and is kept under uniform management and utility, it should be studied,
characterized and registered following the details given in this paper. By doing so we will be able to
complete the inventories of our animal genetic resources and also reduce the proportion of the large
non-descript population of different livestock and poultry species. After recognition of a population
as a breed, suitable breeding and developmental strategies may be framed for the genetic as well as
overall development of the breed, thereby improving the livelihood of livestock keepers. It is
advisable to develop sets of questionnaires for every species, along with the typical format of breed
descriptors, before taking up the job of phenotypic characterization.
Reference

FAO, 2013. In vivo conservation of animal genetic resources. FAO Animal Production and Health
Guidelines No. 14. Rome.

Breed Registration Process in India


P.K.Vij
ICAR- National Bureau of Animal Genetic Resources, Karnal (Haryana)

________________________________________________________________________________________

India possesses a huge and diverse livestock population distributed over a large range of
geographical, ecological and climatic regions, and is globally acknowledged as one of the largest
centres of livestock diversity. The farm animal population comprises 512 million livestock and 729
million poultry (Livestock Census, 2012). Only 20 percent of this population belongs to the 151
well-defined registered indigenous breeds in the country; the remaining 80 percent belongs to
animal populations that are not assigned to any recognized breed. The populations which have not
been characterized and accredited so far are commonly referred to as non-descript or
traditional. Even though parts of these non-descript populations are known to be multiple
crosses of recognized breeds, some animals may belong to homogeneous groups distinguishable
from other populations on the basis of identifiable and stable phenotypic characteristics that
warrant their being distinguished as separate breeds.
The advent of a new era of national sovereignty over genetic resources under the Convention on
Biological Diversity (CBD) requires a new approach to describing and cataloguing livestock and
poultry breeds. The sustainable use of genetic resources, one of the main goals of the CBD, as
well as sustainable development can be achieved only by ensuring wide access to animal
genetic resources for farmers, herders, breeders and researchers. To this end, frameworks for
access, and for equitable sharing of the benefits derived from genetic resources, need to be put in
place. The global scenario of the World Trade Organisation (WTO) and Intellectual Property Rights
calls for protecting local animal genetic diversity and providing recognition to the developers of new
improved animal breeds. This in turn demands an authentic national documentation system of
valuable sovereign genetic resources with well defined characteristics.
Registration is nothing but a documentation of the knowledge, skills and techniques (KST), and
biological resources of local communities. The registration process is a critical pathway for public
description and documentation of genetic materials. Of utmost importance, once registered, these
genetic materials are incorporated into the public domain.
Recognizing the need for an authentic national documentation system of valuable sovereign
genetic resource with known characteristics, Indian Council of Agricultural Research (ICAR)
initiated a mechanism for Registration of Animal Germplasm at National Bureau of Animal
Genetic Resources (NBAGR), Karnal. This would provide protection to the valuable animal genetic
diversity and facilitate its access for genetic improvement of animal breeds. This mechanism is the
sole recognised process for registration of Animal Genetic Resources material at national level.
Guidelines for registration
Registration of new breeds
The registration of Indian livestock and poultry genetic resources revolves around the concept of a
breed. Distinct populations within species are usually referred to as breeds. Cultural and ecological
aspects of livestock keeping also serve as a means of identifying populations that merit being
treated as separate breeds. It is difficult to exactly define a breed. The broad definition of the term
breed used by FAO is a reflection of the difficulties involved in establishing a strict definition of
the term. According to this definition, the breeds are either

(a) a sub-specific group of domestic livestock with definable and identifiable external
characteristics that enable it to be separated by visual appraisal from other similarly defined
groups within the same species; or
(b) a group for which geographical and/or cultural separation from phenotypically similar
groups has led to acceptance of its separate identity.
Eligibility Criteria for Registration:
1. Populations of domesticated animals which are unique, stable and uniform, and have potential
attributes of academic, scientific or commercial value can be registered as breeds.
2. Any population having at least 1000 animals will be considered for registration as a breed.
These animals may be maintained by the applicant/ breed society/ NGO/ Govt. Agency/
farmers in field conditions.
3. All claims concerning the material submitted for registration should be accompanied by scientific
evidence for uniqueness, reproducibility and value in the form of
i. Publication in standard peer reviewed journal (a copy of reprint to be submitted).
AND/ OR
ii. Evaluation data for at least three years under research programmes like All India Coordinated
Research Project (AICRP), Network Project, Adhoc Schemes, etc. supported with
relevant extracts of the documents or verification by concerned Director/Project Director
(PD)/Project Coordinator (PC)
AND/ OR
iii. Publication of information on potential value of germplasm in institute annual report or any other
such reports
AND/ OR
iv. Recommendation of the State Animal Husbandry Department/Livestock Development
Board regarding the novelty and uniqueness of the breed claimed.
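The eligibility criteria above lend themselves to a simple pre-submission checklist. The sketch below is illustrative only: the class, field and function names are hypothetical, and it encodes just the two quantifiable conditions stated above (a population of at least 1000 animals and at least one accepted form of supporting evidence). It is not the procedure used by the Registration Committee, and the qualitative requirements (uniqueness, stability, uniformity) still need expert judgement.

```python
from dataclasses import dataclass
from typing import List

MIN_POPULATION = 1000        # minimum animals stated in the eligibility criteria
MIN_EVALUATION_YEARS = 3     # minimum years of evaluation data under research programmes


@dataclass
class BreedApplication:
    """Hypothetical summary of an application, used only for pre-screening."""
    population_size: int
    peer_reviewed_publication: bool = False
    evaluation_years: int = 0
    annual_report_publication: bool = False
    state_recommendation: bool = False


def has_supporting_evidence(app: BreedApplication) -> bool:
    """The evidence items are joined by AND/OR, so any one of them suffices."""
    return (
        app.peer_reviewed_publication
        or app.evaluation_years >= MIN_EVALUATION_YEARS
        or app.annual_report_publication
        or app.state_recommendation
    )


def screening_issues(app: BreedApplication) -> List[str]:
    """Return a list of unmet quantifiable criteria (empty if none)."""
    issues = []
    if app.population_size < MIN_POPULATION:
        issues.append(f"population below {MIN_POPULATION} animals")
    if not has_supporting_evidence(app):
        issues.append("no accepted form of scientific evidence supplied")
    return issues


if __name__ == "__main__":
    # Hypothetical application: 1450 animals, recommended by the state department.
    app = BreedApplication(population_size=1450, state_recommendation=True)
    print(screening_issues(app) or "passes the quantifiable checks")
```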
Who can Apply: Application can be submitted by any citizen of India / breed society registered as
per constitution of India / NGO / Govt. agency.
Validity of Registration: The period for validity of registration shall be 25 years.
Notification of Registered Materials: All breeds approved for registration would be officially notified
to the applicants along with Registration Number. A certificate will also be issued to this effect to
the applicant. Official Notification will be published along with a brief description of not less than
one page in the subsequent issue of
i. Indian Journal of Animal Sciences - Published by I.C.A.R., New Delhi 110 012
ii. An abstract form of the registered breed will also be published in following publications:
a. NBAGR Newsletter, Published by the Director, NBAGR, Karnal-132 001
b. ICAR News - Published by the Publication and Information Division, Krishi
Anusandhan Bhavan, ICAR, New Delhi 110 012
c. NBAGR, ICAR Website
De-notification: De-notification shall be done by the Registration Committee in case of false claim(s)
or disputed IPR claim. Appeal for counter claim, if any, should reach the Registration Committee
within a period of three months of the publication of the Notification in the Indian Journal of Animal
Sciences, published by the I.C.A.R.
Procedure for Submission of Proposal for Breed Registration:
1. Submission of Application and Material: All applications for registration of proposed breeds
should be submitted to the following address:
The Director, National Bureau of Animal Genetic Resources, P.O.Box. 129, Karnal 132001,
Haryana. Phone: 0184-226 7918, Fax: 0184-226 7654, Email: director.nbagr@icar.gov.in
2. The applicant should submit 3 copies of the application along with relevant documents,
literature, no matter how small (even one page), for the proper evaluation of the breed and
softcopy of the application, descriptor and photographs (original).
3. The application must be signed by the applicant and countersigned by Director, Department of
Animal Husbandry of the concerned state or his representative with rubber seal.
4. The application must be accompanied by complete description of the breed using standard
descriptors (as per concerned species).
5. Submit a detailed history of the breed.
6. List the difference, distinction and details that are specific for that breed in comparison to other
breeds in the vicinity or elsewhere.
7. Submit representative photographs of the breed (male, female, young ones and herd /flock).
8. Submit a list of the registered animals of the breed that conform to the breed standards laid
out by the applicant or his organization.
9. The breed must have completed a minimum of 10 generations.
10. Submit letters from at least three different breeders/owners of the breed, explaining:
Why do they believe it should become a recognized breed?
How long have they been breeding the breed?
What are the reasons for recognition of the breed as a separate identity?
What has been done to establish this breed (breeding strategies, parental stock etc.)?
What are the suggestions to further improve this breed in a long term perspective?
What makes this breed clearly different and distinctive from all other breeds?
Registration of Varieties/Strains/Lines of chicken

1. Submission of Application along with documents: All applications for registration of proposed
Varieties/Strains/Lines should be submitted to the following address:
The Director, National Bureau of Animal Genetic Resources, P.O.Box. 129, Karnal 132001,
Haryana. Phone: 0184-226 7918, Fax: 0184-226 7654, Email: director.nbagr@icar.gov.in
2. The applicant should submit 3 copies of the application along with relevant documents,
literature, no matter how small (even one page), for the proper evaluation of the
Variety/Strain/Line and softcopy of the application, descriptor and photographs (original).
3. The application must be signed by the applicant(s) and countersigned by the Head of the
Organisation with rubber seal.
4. The application must be accompanied by complete description of the Variety/Strain/Line using
prescribed descriptors.
5. Submit a detailed history of the development of the Variety/Strain/Line.
6. List the distinctive characteristics of the Variety/Strain/Line in comparison to other
Varieties/Strains/Lines available in the country.
7. Submit representative photographs of the Variety/Strain/Line (male, female, young ones and
flock).
8. The Variety/Strain/Line must have completed a minimum of 8 generations.
9. The Applicant must certify that:
The Variety/Strain/Line is distinct from other Lines/Strains whose existence is a matter of
common knowledge at the time of filing of application
It is sufficiently uniform and stable
Eligibility Criteria for Registration


Any population having at least 1000 birds will be considered for registration as a
Variety/Strain/Line. All claims concerning the material submitted for registration should
be accompanied by scientific evidence for uniqueness, reproducibility and value in the form of
(I) Publication in standard peer reviewed journal (a copy of reprint to be submitted).
AND/ OR
(II) Evaluation data for at least three years under research programmes like All India Coordinated
Research Project (AICRP), Network Project, Adhoc Schemes, etc. supported with
relevant extracts of the documents or verification by concerned Director/Project Director
(PD)/Project Coordinator (PC)
AND/ OR
(III) Publication of information on potential value of germplasm in institute annual report or any
other such reports
Status of Registration
a. Accession numbers have been given to each of the extant breeds of various species of livestock
and poultry. These have been published in The Indian Journal of Animal Sciences, 78(1): 127-130
(2008). (http://www.nbagr.res.in/Accessionbreed.html)
b. Guidelines, descriptor formats and application form have been prepared for registration of
new breeds (http://www.nbagr.res.in/guidelines.html) and for registration of
varieties/strains/lines of chicken.
(http://www.nbagr.res.in/guide.pdf)
c. Descriptors of 109 breeds (11 buffalo, 29 cattle, 23 goat and 26 sheep, 4 horse, 2 pig, one
donkey and 13 chicken breeds) have been published so far in Indian Journal of Animal
Sciences.
Newly registered breeds
Twenty-two new breeds of different species of livestock and poultry have been registered so far.
This includes nine breeds of cattle, three of buffaloes, three each of goat and pig, and one each of
sheep, camel, donkey and chicken. After including these newly registered breeds, total number of
indigenous breeds in the country now is 151, which include 39 for cattle, 13 for buffalo, 24 for goat,
40 for sheep, 6 for horses and ponies, 9 for camel, 3 for pig, 1 for donkey and 16 for chicken.
Indigenous pig and donkey breeds, and a line of chicken have been registered for the first time.
This documentation process has helped create awareness and enhance the sense of ownership
among local communities, policy makers, and research and development organizations. There may
still be many domestic animal populations having distinct characteristics, which have not been
recognized as breeds so far. These need to be documented and registered as breeds to protect and
exploit benefits of IPRs, if any, arising out of these resources.
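The species-wise counts quoted above can be cross-checked with a short tally. The sketch below uses only the figures given in the text; the variable names are illustrative. Subtracting the newly registered breeds from the current totals also shows which species had no registered breed before, which agrees with the statement that pig and donkey breeds were registered for the first time.

```python
# Current totals of registered indigenous breeds per species (from the text).
total_breeds = {
    "cattle": 39, "buffalo": 13, "goat": 24, "sheep": 40,
    "horses and ponies": 6, "camel": 9, "pig": 3, "donkey": 1, "chicken": 16,
}

# Newly registered breeds per species (from the text).
newly_registered = {
    "cattle": 9, "buffalo": 3, "goat": 3, "pig": 3,
    "sheep": 1, "camel": 1, "donkey": 1, "chicken": 1,
}

# Breeds already registered before the new registrations.
previously_registered = {
    species: total_breeds[species] - newly_registered.get(species, 0)
    for species in total_breeds
}

# Species whose first registered breeds are among the new ones.
first_time_species = sorted(
    species for species, count in previously_registered.items() if count == 0
)

if __name__ == "__main__":
    print("total breeds:", sum(total_breeds.values()))
    print("newly registered:", sum(newly_registered.values()))
    print("registered for the first time:", first_time_species)
```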

List of newly registered livestock breeds and lines

S.N.  Breed               Home Tract                               Accession number
CATTLE
01    Motu                Odisha, Chhattisgarh and Andhra Pradesh  INDIA_CATTLE_1526_MOTU_03031
02    Ghumusari           Odisha                                   INDIA_CATTLE_1500_GHUMUSARI_03032
03    Binjharpuri         Odisha                                   INDIA_CATTLE_1500_BINJHARPURI_03033
04    Khariar             Odisha                                   INDIA_CATTLE_1500_KHARIAR_03034
05    Pulikulam           Tamil Nadu                               INDIA_CATTLE_1800_PULIKULAM_03035
06    Kosali              Chhattisgarh                             INDIA_CATTLE_2600_KOSALI_03036
07    Malnad Gidda        Karnataka                                INDIA_CATTLE_0800_MALNADGIDDA_03037
08    Belahi              Haryana and Chandigarh                   INDIA_CATTLE_0532_BELAHI_03038
09    Gangatiri           Uttar Pradesh and Bihar                  INDIA_CATTLE_2003_GANGATIRI_03039
BUFFALO
01    Banni               Gujarat                                  INDIA_BUFFALO_0400_BANNI_01011
02    Chilika             Odisha                                   INDIA_BUFFALO_1500_CHILIKA_01012
03    Kalahandi           Odisha                                   INDIA_BUFFALO_1500_KALAHANDI_01013
GOAT
01    Konkan Kanyal       Maharashtra                              INDIA_GOAT_1100_KONKANKANYAL_06022
02    Berari              Maharashtra                              INDIA_GOAT_1100_BERARI_06023
03    Pantja              Uttarakhand and Uttar Pradesh            INDIA_GOAT_2420_PANTJA_06024
SHEEP
01    Katchaikatty Black  Tamil Nadu                               INDIA_SHEEP_1800_KATCHAIKATTYBLACK_14040
CAMEL
01    Kharai              Gujarat                                  INDIA_CAMEL__0400_CAMEL_02009
PIG
01    Ghoongroo           West Bengal                              INDIA_PIG_2100_GHOONGROO_09001
02    Niang Megha         Meghalaya                                INDIA_PIG_1300_NIANGMEGHA_09002
03    Agonda Goan         Goa                                      INDIA_PIG_3500_AGONDAGOAN_09003
DONKEY
01    Spiti               Himachal Pradesh                         INDIA_DONKEY__0600_SPITI_05001
CHICKEN
01    Mewari              Rajasthan                                INDIA_CHICKEN_1700_MEWARI_12016

Lines Registered

S.N.  Name                      Developed by                                     Accession number
CHICKEN-SYNTHETIC
01    PD1 (Vanaraja Male Line)  ICAR-Directorate of Poultry Research, Hyderabad  INDIA_CHICKEN_001_PD1_13001

4
Conservation Strategy through Network Programme
M S Tantia and Rekha Sharma
ICAR- National Bureau of Animal Genetic Resources, Karnal (Haryana)

________________________________________________________________________________________

The realization that animal genetic resources are at risk of being lost has stimulated
national livestock conservation efforts. The need for conservation is based on economic, cultural, and
ecological values; unique biological characteristics; shifts in market demand; and research needs. A
first step in assessing genetic conservation needs is development of baseline information
on population and genetic relationships. It is clear that livestock breeds are not biological taxa but
rather represent the outcome of social processes. They are therefore unlikely to survive outside the
social contexts and production systems that formed them. However, these losses weaken the
potential of breeding programs that could improve hardiness of livestock. Traditional pastoralists
have often tended to foster biodiversity, in both plants and animals. Many pastoral societies have
developed elaborate systems that result in the preservation of genetic resources. Pastoralists have
deliberately developed livestock to meet different needs and conditions.
The Network Project on Animal Genetic Resources is a fully ICAR-funded project coordinated
by NBAGR, in which different organizations working in the field of livestock and poultry research
and development are brought together in a network approach. Various agencies which have
infrastructure as well as manpower, such as state animal husbandry departments, state
veterinary/agricultural universities, livestock development boards, animal science institutes,
NGOs, etc., are the partners in this project. In network mode, the specific agency which is located in
the breeding/home tract of the targeted breed/population is approached with a well defined
technical programme to undertake conservation activities for the designated breed/population. The
animal keepers of the targeted breed/population are also motivated and involved in the
conservation activities.
Commercial breeds of livestock possess greater genetic variability than most crop varieties do.
This diversity allows intensification of selection within breeds to be a fruitful approach for
improving livestock productivity. However, if continued emphasis on breed replacement and
increasing selection intensity (e.g. for greater productivity) take place at the expense of maintenance
of genetic diversity, including the advantages of disease resistance and environmental adaptation,
there may be significant long-term costs. As an example, Holstein cattle have become the preeminent dairy breed world-wide and have enjoyed sustained improvements in milk production
potential, but only at the cost of declining genetic diversity within the breed.
The indigenous breeds are considered hardy and well adapted to the environment. The hardiness
of the indigenous breeds is believed to have resulted from natural selection under the management
practices of the native breeders/herders and from the adverse feed conditions. Indigenous breeds
show a high level of fertility and reproduction. In situ management of animal genetic resources can
only be successfully accomplished through breeder actions.
Factors affecting conservation
Indigenous breeds are populations that are the product of breeding or selection carried out by
farmers, either deliberately or not, continuously over many generations. They tend to contain high
levels of genetic diversity and to be adapted to specific environments, being especially important in
environmentally marginal areas. Developing countries typically rely on landraces for much of their
production. They are important genetic resources, representing an insurance policy against
uncertain markets and environmental conditions for food and agriculture in the future.
The characteristics of the indigenous breeds (low growth rate, lower level of production) imply
that the potential for altering gross income is lower than more prevalent breeds under current
marketing conditions. However, adaptation to the environment and reproductive performance may
alter this situation. Short-term ownership negatively affects breed conservation by creating an
unstable situation for maintaining or increasing animal numbers. However, it is doubtful that
any effective selection will be implemented; therefore, the population may behave as if it is a
randomly mated population, with minimal loss of alleles due to selection. With the relatively small
total population size and small individual flock sizes, genetic drift is an important factor
affecting within-breed genetic diversity. With the small flock/herd sizes, one should expect random
gene frequency changes that are cumulative over generations.
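The cumulative, random change in gene frequency expected in small flocks can be illustrated with a small simulation. The sketch below assumes a simple Wright-Fisher model (random mating, non-overlapping generations, no selection, mutation or migration), which is an assumption made here for illustration and not a method described in this paper. The function name, flock sizes and starting frequency are hypothetical.

```python
import random


def simulate_drift(n_animals, p0, generations, seed=0):
    """Track the frequency of one allele under pure genetic drift.

    Each generation, the 2 * n_animals gene copies of the offspring are drawn
    at random from the parental allele frequency (Wright-Fisher sampling).
    Returns the list of frequencies, starting with p0.
    """
    rng = random.Random(seed)          # seeded so that runs are repeatable
    n_copies = 2 * n_animals           # diploid animals carry two copies each
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Binomial sampling: count copies of the allele among n_copies draws.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        freqs.append(p)
    return freqs


if __name__ == "__main__":
    # Hypothetical comparison: a flock of 10 animals versus one of 500,
    # both starting at an allele frequency of 0.5, followed for 20 generations.
    for size in (10, 500):
        path = simulate_drift(n_animals=size, p0=0.5, generations=20, seed=1)
        print(size, [round(f, 2) for f in path])
```

Running the comparison with several seeds typically shows the small flock wandering far from the starting frequency, and sometimes losing or fixing the allele, while the large flock stays close to it.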
Given the above conditions, there are two areas in which to base conservation efforts. These
consist of developing a conservation infrastructure (a public service) and breeder actions (a
private-sector activity). Nongovernmental organizations have to play a key role in the conservation of
indigenous breeds, and their engagement is likely to continue by assisting breeders with technology
transfer. Conservation infrastructure consists of a set of actions taken by the public sector for the
public good. These actions include development of cryopreserved germplasm reserves that can
be used to regenerate the breed, reduce inbreeding levels, and use molecular genetic tools to
evaluate genetic diversity and/or genes of interest. A sufficient quantity of semen and, potentially,
embryos should be collected to regenerate the breed if necessary and to relieve potentially high
levels of inbreeding.
In-situ maintenance of the genetic diversity is the responsibility of the breeders. To aid in
conserving indigenous breeds, there is a need to develop a market for indigenous breeds that
provides breeders with an economic incentive for raising the respective breed. Breeder participation in
the breed association provides a linkage for technology transfer and marketing activities.
Some of the biotechnologies offer tremendous potential to address real problems facing farmers
in developing countries. For example, the area of genomics, allowing the identification and
characterization of individual genes influencing traits such as disease or stress resistance, growth
rate or yield, promises to be of great value. The genetic material (genomes) of several hundred
species, including mammals, plants, fish, bacteria and viruses, has already been sequenced or
sequencing is in progress and the information generated from genomics studies in other fields, such
as human medicine or basic science, may also be useful for the application of genomics to food and
agriculture.
Causes of genetic erosion in domestic animals
Three factors are considered as being largely responsible for the declining genetic diversity of
livestock:
Destruction of the native habitats of livestock breeds;
The development of genetically uniform livestock breeds;
Farmer and / or consumer preferences for certain varieties and breeds (and changes in these
consumer preferences over time).
Among these, commercial interests are considered as the most important pressure on livestock
diversity. Important factors in determining the direction and nature of change include: growth
performance (productivity), pest and disease resistance, ease of handling, adaptation to current
levels of technology, and to a relatively minor extent consumer choice.

Table 1: Causes of genetic erosion in domestic animals

Cause                      Description
Inappropriate Aid          Lack of appreciation of the value of indigenous breeds and their importance in
                           niche adaptation. Incentives to introduce exotic and more uniform breeds from
                           industrialised countries.
Product-focused selection  Undue emphasis placed on a specific product or trait, leading to the rapid
                           dissemination of one breed of animal at the expense of others.
Changes in land use        Conversion of rangelands and mixed farming systems for agriculture, game parks,
                           and industrial use.
Changes in knowledge       The idea that "modern/imported is best" has led to the loss of knowledge about
                           traditional livestock husbandry practices and to the erosion of domestic animal
                           diversity.
Change in Technology       Replacement of animal draught and transport by machinery, leading to permanent
                           change of farming system; artificial insemination and embryo transfer leading to
                           rapid replacement of indigenous breeds.
Change in Economy          Decline in economic viability of traditional livestock production systems.
Intensification            Livestock populations that rely on veterinary services and on improved feeding
                           conditions. Heavy investment in preventative and curative veterinary measures,
                           and in feeding, housing and management. Multipurpose local species and breeds
                           replaced by those with higher milk, meat, egg production (including cross-breeds
                           and pure-bred exotics).
Cross-breeding             Predominance of sires from a few selected breeds in widespread cross-breeding
                           programmes can lead to loss of features expressed by specialised breeds.
Storage                    Failure of cryopreservation equipment (used to freeze semen, ova and embryos) or
                           lack of refrigerant; inadequate maintenance of frozen semen from breeds that are
                           not in demand.
Conflict                   Wars and other forms of socio-political instability can lead to livestock owners
                           moving their stock out of their usual area, thus increasing the possibility of
                           mixing with other breeds, thereby potentially losing a location-specific breed.
Disaster                   Natural disasters such as floods, drought or famine can result in whole breeds
                           dying out.

Conservation strategies
The conservation strategies being followed are ex situ and in situ. Ex situ conservation means either
maintaining live animals of a breed at a nucleus farm or cryopreserving the germplasm for long-term
storage. In situ conservation means maintaining a sizable population of a breed in its home tract
with the livestock keepers.
Present status of conservation of Animal Genetic Resources
The economically important species/breeds are being maintained by the livestock keepers and are
being improved continuously. The population of these breeds is either growing or available in
sufficient numbers with sufficient genetic diversity. The breeds which are not economic to the
farmers need intervention. Most of these breeds are currently being maintained in cryocans.
In the majority of cases the semen of these breeds has been preserved. In a few cases other biological
material like ova, embryos or even somatic cells is being preserved.
Breeds which are facing extinction: Most of the draft cattle breeds, like Krishna Valley, Nagori, Khilar,
Bargur, Amritmahal, Punganur, Ponwar, etc., are under threat. Many of the buffalo breeds like
Bhadawari, Toda and Surti are facing threat as Murrah is being used as an improver breed throughout
the country due to increased demand for liquid milk. Due to very little value for the wool from the Indian breeds and
scarce grazing resources, most sheep breeds are losing ground. Sheep are being maintained
as meat animals but have to compete with goats, which are more prolific and have an advantage over
sheep for value of meat in a large part of the country. Almost all the native breeds of chicken face
extinction due to over-emphasis on commercial chicken farming. The pack animal species like
camel, equines, yak etc. are threatened due to their very limited utility and changing production
systems.
Which is more important - conservation of breeds, genes or unique characters: Conservation is of
paramount importance as the livestock resources contribute significantly to the rural economy,
especially for the downtrodden population. Livestock is more evenly distributed than land
resources and has great potential as a resource for poverty alleviation programmes. Genes
function in combination with a large number of other genes involved in different gene networks, and the
breeds which possess the desired/unique characters have these genes in the right combination. The
present-day research to find the unique alleles of various genes which have evolved over a
long period through adaptation of the indigenous livestock resources is also very important. These
have become more important due to increased concerns about global warming, as our resources are
better adapted to sustain their production in harsh climates and with scarce feed resources. Research
efforts are required to identify these alleles in different resources, which can boost the economic
value of indigenous livestock.
Effectiveness of the programme on conservation: The various conservation programmes being executed by
the agencies are yielding little of the desired results, as they are not able to improve the profits from
the uneconomic breeds/species of livestock. Conservation is a long-term activity and its
benefits are not generally appreciated by the planners and the masses, as one cannot account for these
benefits in the short, foreseeable term. Thus conservation activities have to be undertaken with
long-term commitment in the form of finances as well as continuity of the programmes. Most of the
time the conservation/improvement of livestock resources is compared with that of plant resources,
which are entirely different; owing to the long generation interval in livestock, gains per year are very nominal.
Suggestions/recommendations
The best way of conservation is to utilize the resources sustainably in their ecological niches so
that they continue to evolve and produce in changing environments. The effectiveness of these
programmes can be enhanced only if developmental agencies such as state animal husbandry
departments are sensitized and region/area-specific long-term plans are implemented for
genetic enhancement of resources with the involvement of stakeholders/farmers.


5
Conservation of Genome Resources- Concept of Gene Bank
Rajeev A K Aggarwal
ICAR- National Bureau of Animal Genetic Resources, Karnal (Haryana)

________________________________________________________________________________________

Introduction
The genetic resources of farm animals in India are represented by a broad spectrum of native
breeds of cattle, buffaloes, goats, sheep, swine, equines, camels and poultry. The genetic diversity
of this livestock has developed and stabilized over millions of years of evolution and has endowed
the indigenous breeds with the ability to withstand hostile climates, epidemic pests and diseases,
and to survive on inadequate quantities of feed, fodder and water. However, over the years the
population size of many breeds has been declining for a variety of reasons. As genetic diversity
equips farmers and breeders to utilize a wide range of production environments and develop diverse
products to meet the needs of local communities, the unavailability of such diversity in future may
hamper sustainable development. Hence, the need for conservation of animal genetic resources has
been accepted in India as well as globally.
Conservation methods
Conservation methods can be broadly categorized as in situ and ex situ. In situ conservation means
that animals are kept within their production system, in the area where the breed developed its
characteristics. Ex situ conservation applies to situations where animals are kept outside their
area of origin (herds kept in experimental farms, farm parks, protected areas or zoos) or, more
often, where genetic material is conserved and stored in gene banks in the form of semen, ova,
embryos or DNA. Each of these methods has its own merits and demerits.
1. Organized flocks/herds: Maintenance of a small population at a place away from the main breeding
tract of the breed constitutes ex situ conservation of live animals. This may take the form of an
organized herd maintained in a research institution, bull mother farm, state-owned livestock
farm, zoo or breed park. Such a population can be used for regeneration of an endangered breed,
development of new breeds and DNA studies.
2. Cryopreservation of embryos: This is ideal for breed improvement, conservation and revival of a
lost breed. Its main importance lies in the diploid nature of the embryo, which contains all the
genes. However, embryo conservation finds limited use, as embryo production and transfer require
highly skilled manpower and large resources.
3. Somatic cell banking: Somatic cells can be used as genetic material for conservation of endangered
animal genetic resources. They are diploid cells and contain the full genetic code of an animal.
The cost of maintaining these cells is very low, and they can be sampled quickly and cheaply even
from remote areas. They can also be used for production of therapeutic proteins. However, somatic
cells are less preferred in conservation programmes, as the success rate of cloning is still very
low.
4. Epididymal sperm banking: Epididymal spermatozoa, particularly caudal spermatozoa, are mature
and fully competent to undergo normal fertilization and support fetal development. In vitro
fertilization (IVF) experiments have revealed that epididymal spermatozoa possess binding sites for
important zona pellucida proteins. Collection of cauda epididymal semen from slaughtered
animals would be a rapid and cheap alternative for sperm conservation, as it would obviate the
time-consuming and extensive training of males for semen donation. Hence, cryostorage of
epididymal spermatozoa is a promising method of conservation, especially in small ruminants,
and further research efforts are needed in this direction.
5. Cryopreservation of embryonic stem cell lines: These can be excellent biological material for
producing live animals and genetically modified animals. They also find use in gene and cell
therapies and in producing vital therapeutic proteins. However, so far they have had limited use,
as stable embryonic stem cell lines have not been successfully generated in farm animals, unlike
in humans and rodents.
6. Cryopreservation of spermatogonial stem cell lines: Spermatogonial stem cells (SSCs) are adult
stem cells which transmit genetic information to the next generation and form the foundation of
spermatogenesis. Transplantation of SSCs from a donor mouse testis into the seminiferous tubules
of a recipient mouse testis results in donor-derived spermatogenesis. SSC transplantation has also
been demonstrated in goat, dog, cow, pig and baboon, and bovine SSCs have been shown to be capable
of colonizing recipient mouse seminiferous tubules. An in vitro system that supports the
proliferation and maintenance of SSCs could be used to preserve and expand SSC numbers as well as
to aid genetic modification. However, much needs to be done in farm animals before this potential
can be utilized for preserving domestic livestock diversity.
7. Storage of DNA: Cryogenic storage of DNA is another method of preserving genetic material.
It has several advantages over live germplasm: DNA is very easy to obtain, store and transport at
low cost, with no chance of disease transfer. DNA may find use in gene conservation through
introgression by transgenesis or knock-out technology, and can help in recreation of lost breeds
by cross-checking the different populations or genetic material used. However, the method is
limited by the fact that genome maps of different farm species are not yet available and that
life cannot be created from DNA alone.
8. Frozen semen: This is ideal for utilization of genetic resources, providing a sample of half of
the genetic material of the preserved breed in a form that permits convenient introgression into
a recipient population. However, regeneration of a cryopreserved breed from frozen semen in one
generation is possible only if living females of that breed are available; otherwise several
generations of grading up are required to re-establish a conserved breed. In spite of this
limitation, the availability of established semen freezing technology, especially in cattle and
buffalo, and the presence of semen freezing infrastructure across the country make it the method of
choice for conserving indigenous livestock biodiversity. The National Bureau of Animal Genetic
Resources (NBAGR) is playing a pivotal role in ex situ conservation through semen cryo-storage of
indigenous livestock for posterity by establishing a National Semen Bank at Karnal.
Conservation priority
The high cost of collection and the limited use of preserved material restrict the development of
ex situ collections. It is therefore appropriate to prioritize breeds for inclusion in ex situ
programmes, and an evaluation of many factors may form the basis of such prioritization. To
implement a conservation programme it is essential to have a breed-wise livestock census along with
population and production trends. However, the data are often available only species-wise, so
there is a need to explore quick population estimates and undertake conservation efforts for
threatened breeds. The unique genes possessed by a breed and the likelihood of its extinction may
be important parameters in setting its conservation priority.
The quantification of relatedness among breeds can group them into different sets, each set
consisting of genetically close/related breeds that differ from the breeds of other sets.
Such an arrangement would drastically reduce conservation costs, as conserving a single breed from
a set would represent all breeds of that set. Such phylogenetic differentiation of breeds is
possible by genotyping livestock populations with microsatellite markers. The usefulness of these
markers for estimation of genetic distances among closely related populations in different species
of livestock has been documented by numerous studies (Bowcock et al., 1994; Buchanan et al., 1994;
Ciampolini et al., 1995; Bradley et al., 1996; MacHugh et al., 1997). The Food and Agriculture
Organization (1996) has laid out a detailed technical programme for a large-scale international
conservation project using microsatellite markers under the MoDAD project.
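The idea of grouping breeds by genetic distance can be illustrated with a small sketch. The text does not say which distance measure is used; the example below assumes Nei's standard genetic distance, one commonly used measure, computed from microsatellite allele frequencies. The function name, the input layout (one dictionary of allele frequencies per locus) and the allele labels are hypothetical choices made for illustration only.

```python
import math

def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance between two populations.

    freqs_x, freqs_y: lists with one dict per locus, mapping an allele
    label to its frequency in that population (assumed input layout).
    """
    jx = jy = jxy = 0.0
    for px, py in zip(freqs_x, freqs_y):
        alleles = set(px) | set(py)
        jx += sum(px.get(a, 0.0) ** 2 for a in alleles)    # homozygosity in X
        jy += sum(py.get(a, 0.0) ** 2 for a in alleles)    # homozygosity in Y
        jxy += sum(px.get(a, 0.0) * py.get(a, 0.0) for a in alleles)
    if jxy == 0.0:
        return math.inf  # no shared alleles at any locus
    # Averaging over loci cancels in the ratio, so sums are used directly.
    return -math.log(jxy / math.sqrt(jx * jy))

# Hypothetical single-locus example with allele sizes as labels.
breed_a = [{"150": 0.5, "152": 0.5}]
breed_b = [{"150": 1.0}]
d_ab = nei_standard_distance(breed_a, breed_b)
d_aa = nei_standard_distance(breed_a, breed_a)
```

Breeds separated by small distances would fall in the same set, so that one breed could stand in for the set when resources are limited.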
Based upon the above-mentioned considerations, some breeds of different species have been taken up
for the ex situ conservation programme, and their frozen semen is kept in the National Semen Bank at
NBAGR (Table 1). Simultaneously, ex situ conservation in the form of DNA and somatic cells has also
been taken up at NBAGR.
Table 1: Germplasm (Frozen semen) stored in National GeneBank of NBAGR, Karnal

Species   Breeds
Cattle    Amritmahal, Dangi, Gangatiri, Gir, Hallikar, Hariana, Kangayam, Kankrej, Kherigarh,
          Khillar, Krishna Valley, Ongole, Ponwar, Punganur, Rathi, Red Kandhari, Red Sindhi,
          Sahiwal, Tharparkar, Vechur, Frieswal, Gaolao
Buffalo   Assamese Swamp, Banni, Bhadawari, Jaffarabadi, Murrah, Nili-Ravi, Pandharpuri, Surti,
          Tarai, Mehsana, Toda, Nagpuri
Goat      Black Bengal, Chegu, Osmanabadi, Assam Hill
Sheep     Garole
Equine    Marwari, Zanskari, Poitou
Yak       Arunachali
Camel     Jaisalmeri

Species=7; Breeds=44; Males=311; Semen Doses=1,29,174

Breed Population trends


The population trend of a breed is an important parameter in deciding whether the breed should be
taken up for conservation. As breed-wise censuses are not available, the endangered status of a
breed can be determined from the size of its breeding stock, expressed through the number of
breeding females and the sex ratio, which may differ among species. Minimum population sizes
defining the endangered status of a breed in different species have been suggested by different
workers (Table 2). Population sizes applicable to Indian conditions have also been suggested
(Table 3).
Table 2: Population estimates for endangered status of breed

Country        Cattle    Sheep     Goat      Pig       Horse     Reference
England        750       1,500     500       150       1,000     Alderson (1981)
West Germany   7,500     15,000    5,000     -         5,000     Simak (1991)
Europe         1,000     500       200       200       -         Maijala (1982)
General        10,000    10,000    10,000    10,000    10,000    FAO

Table 3: Population size of indigenous breeds for their status of endangerment (,000)

Species     Normal   Insecure   Vulnerable   Endangered   Critical   Reference
Cattle      25       15-25      5-15         2-5          <2         Nivsarkar et al., 1994
Buffaloes   30       20-30      10-20        5-10         <5
Sheep       50       30-50      15-30        8-15         <8
Goats       30       20-30      10-20        5-10         <5
Camels      20       15-20      5-15         2-5          <2
Horses      20       15-20      5-15         2-5          <2
Pigs        10       5-10       1-5          0.5-1.0      <0.5
Cattle      >30      20-30      10-20        5-10         <5         Nivsarkar et al., 2000
Buffaloes   >35      25-35      15-25        10-15        <10
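
The thresholds of Table 3 can be applied mechanically to a population estimate. The following is a minimal sketch using the 1994 figures (in thousands). The dictionary layout and function name are hypothetical, the camel "Endangered" band is taken as 2-5, and a population falling exactly on a boundary is assigned to the less severe class, which is an assumption because the table lists each boundary value in two adjacent classes.

```python
# Upper limits (in thousands) of the Critical, Endangered, Vulnerable and
# Insecure classes, following Table 3 (Nivsarkar et al., 1994).
THRESHOLDS = {
    "cattle":    (2, 5, 15, 25),
    "buffaloes": (5, 10, 20, 30),
    "sheep":     (8, 15, 30, 50),
    "goats":     (5, 10, 20, 30),
    "camels":    (2, 5, 15, 20),
    "horses":    (2, 5, 15, 20),
    "pigs":      (0.5, 1, 5, 10),
}
CLASSES = ("Critical", "Endangered", "Vulnerable", "Insecure")

def endangerment_status(species, population_thousands):
    """Return the status class for a breed population given in thousands."""
    for upper_limit, label in zip(THRESHOLDS[species], CLASSES):
        if population_thousands < upper_limit:
            return label
    return "Normal"  # at or above the largest threshold

status_cattle = endangerment_status("cattle", 1.5)
status_sheep = endangerment_status("sheep", 20)
status_pigs = endangerment_status("pigs", 12)
```

In practice the population figure would be the estimated breeding stock of a single breed, not the species total.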

Sample size in preservation programme


Sample size in preservation programmes is influenced by both genetic considerations and cost. For a
dominant gene, semen to produce 20 viable offspring (100 units; a unit of semen is the amount
appropriate for one insemination) should suffice, but somewhat more semen (possibly 200 units)
may be desirable for preserving a recessive gene. Preservation of the quantitative variation within
a population or breed would require about 100 units of semen from each of 10 to 20 unrelated males
(CAST, 1984). According to Smith (1984), a collection of frozen semen from 25 sires would be
adequate for all species. However, it is appropriate to have frozen stores that are large enough to
provide a good representation of the conserved stock and to prevent much genetic drift or
inbreeding.
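The arithmetic behind these recommendations can be sketched as follows. The semen-unit calculation uses the CAST (1984) figures quoted above. The effective population size and inbreeding-rate formulas are the standard textbook approximations for unequal numbers of breeding males and females; they are not given in this chapter and are included here only as an assumed illustration. The figure of 100 females is hypothetical.

```python
def semen_units_required(n_males, units_per_male=100):
    """Total semen units when each sampled male contributes the same amount."""
    return n_males * units_per_male

def effective_population_size(n_males, n_females):
    """Standard approximation for unequal sex ratio: Ne = 4*Nm*Nf / (Nm + Nf)."""
    return 4.0 * n_males * n_females / (n_males + n_females)

def inbreeding_rate(ne):
    """Expected increase in inbreeding per generation: delta F = 1 / (2*Ne)."""
    return 1.0 / (2.0 * ne)

# 10 to 20 unrelated males at about 100 units each (CAST, 1984).
units_low = semen_units_required(10)
units_high = semen_units_required(20)

# 25 sires (Smith, 1984) mated to a hypothetical 100 females.
ne = effective_population_size(25, 100)
delta_f = inbreeding_rate(ne)
```

With few sires the number of males dominates the effective size, which is why the number of donor males matters more than the number of doses stored per male.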
Conclusion
India is a repository of a large segment of biodiversity in livestock germplasm, with nearly 39
breeds of cattle, 13 of buffaloes, 40 of sheep, 24 of goats, 9 of camel, 6 of horses and ponies, 16 of
poultry, 3 of pig and 1 of donkey, along with many populations of other livestock such as yak and
mithun. With such large animal diversity spread over the vast territory of the country, it is a
gigantic task to conserve even those breeds whose populations are decreasing. The diversity of
methods available for ex situ conservation further complicates the task. This situation
necessitates the selection of a cost-effective ex situ conservation method and the involvement of
many agencies, working in a network at national level, in ex situ conservation programmes for
preserving the indigenous farm animal resources.
References

Alderson L. 1981. The conservation of animal genetic resources in United Kingdom. FAO Animal
Production and Health Paper No. 24, pp 53-76. FAO, Rome.
Bowcock A.M., Ruiz-Linares A., Tomfohrde J., Minch E., Kidd J.R. and Cavalli-Sforza L.L. 1994. High
resolution of human evolutionary trees with polymorphic microsatellites. Nature 368, 455-457.
Bradley D.G., MacHugh D.E., Cunningham P. and Loftus R.T. 1996. Mitochondrial diversity and the origins
of African and European cattle. Proceedings of the National Academy of Sciences of the USA 93, 5131.
Buchanan F.C., Adams L.J., Littlejohn R.P. et al. 1994. Determination of evolutionary relationships among
sheep breeds using microsatellites. Genomics 22, 397-403.
CAST 1984. Animal germplasm preservation and utilization in agriculture. Council for
Agricultural Science and Technology. Report No. 101, 30-31.
Ciampolini R., Moazami-Goudarzi K., Vaiman D. et al. 1995. Individual multilocus genotypes using
microsatellite polymorphisms to permit the analysis of the genetic variability within and between Italian
beef cattle breeds. Journal of Animal Science 73, 3259-3268.
Food and Agriculture Organization of the United Nations (FAO) 1996. Global project for the maintenance of
Domestic Animal Genetic Diversity (MoDAD).
MacHugh D.E., Shriver M.D., Loftus R.T., Cunningham P. and Bradley D.G. 1997. Microsatellite DNA
variation and the evolution, domestication and phylogeography of taurine and zebu cattle (Bos taurus and
Bos indicus). Genetics 146, 1071-1086.
Maijala K. 1982. Preliminary report of the working party on animal genetic resources in Europe. In
Conservation of Animal Genetic Resources. Session 1. Commission on Animal Genetics, EAAP, G.I.@
Leningrad.
Nivsarkar A.E., Gupta S.C., Vij P.K. and Sahai R. 1994. Identification and conservation of endangered breeds
of livestock: strategies and approach. Proceedings of the National Symposium on Livestock Production and
Management held at Gujarat Agricultural University, Anand, 21 to 23 February 1994.
Nivsarkar A.E., Vij P.K. and Tantia M.S. 2000. Strategies for conservation. In Animal Genetic Resources of
India: Cattle and Buffalo. pp 318-333. ICAR, India.
Simak E. 1991. The conservation of rare breeds in West Germany. In Genetic Conservation of Domestic Livestock.
(Ed.) Lawrence Alderson. pp 65-69. CAB International, London.
Smith C. 1984. Economic benefits of conserving animal genetic resources. Animal Genetic Resources
Information 3, 10-14.


6
Cytogenetic and Molecular Methods for Screening of Major Genetic
Defects in Livestock
S K Niranjan and R S Kataria
ICAR- National Bureau of Animal Genetic Resources, Karnal (Haryana)

________________________________________________________________________________________
An abnormality in a specific part of the genetic material causes genetic disease in an individual.
Such abnormalities range from a minor change, such as a point mutation at the nucleotide level, to a
large alteration at the chromosome level. Not all genetic defects result in disease; the outcome
depends on the type, location and intensity of the defect. Sometimes an individual that looks
normal may possess a genetic defect that remains unnoticed throughout its lifetime.
For example, a mutation at the nucleotide level in the heterozygous condition may not culminate in
disease, as the other, normal copy of the gene compensates for the effect. However, such an
individual acts as a carrier that transmits the defect to the next generation and may produce
progeny homozygous for the mutation after mating with a similar carrier or a homozygous mutant
individual. Similarly, some genetic diseases are expressed only at a later stage of life, before
which the individual may already have transmitted the defective copy of the gene or chromosome to
the next generation.
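The outcome of carrier matings described above follows simple Mendelian segregation at a single autosomal locus. The sketch below enumerates the gametes of two parents; the genotype notation ("N" for the normal allele, "m" for the mutant allele) and the function name are hypothetical choices made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def offspring_genotypes(parent1, parent2):
    """Expected genotype proportions among progeny for one autosomal locus.

    Each parent is written as two allele letters, e.g. "Nm" for a carrier.
    """
    counts = Counter(
        "".join(sorted(a + b)) for a, b in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}

# Carrier x carrier: one quarter of the progeny are expected to be affected.
carrier_x_carrier = offspring_genotypes("Nm", "Nm")

# Carrier x normal: no affected progeny, but half are expected to be carriers.
carrier_x_normal = offspring_genotypes("Nm", "NN")
```

The second case shows how a defect can spread unnoticed: a carrier sire mated to normal females leaves no affected calves, yet half of his progeny are expected to carry the mutation.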
In most cases genetic disorders are inherited from the parents; however, some may be
acquired de novo through mutation of the genetic material. If the defect has occurred in the
germline cells, it passes to the next generation through the gametes. Defects in somatic
cells cannot pass to the next generation, although they can cause genetic disease in that
individual. Genetic diseases due to a single-gene defect, such as a point mutation, show Mendelian
(monogenic) inheritance. Most genetic defects are rare because of continuous natural
selection against them. About 6000 single-gene disorders are known in humans. Important
diseases such as cystic fibrosis, sickle cell anaemia and Huntington's disease have a genetic
basis. Certain multifactorial diseases, such as cancer, are thought to be caused by defects in, or
the presence of specific undesirable alleles at, a number of genes or loci. Other factors such as
the environment also play an important role in precipitating such diseases, and their inheritance
is not of the simple Mendelian type. Another kind of genetic disorder, such as mitochondrial
encephalopathy (a kind of dementia), is caused by mutation in the mitochondrial DNA. Mitochondrial
DNA is inherited from the female parent only.
The inheritance of chromosomal abnormalities is less clear; however, some minor
chromosomal changes may be transmitted to the next generation in a Mendelian manner. The majority
of individuals with a chromosomal defect have very low survivability and/or fertility, so their
contribution to the next generation ends naturally.
In livestock, an individual with a genetic defect may transmit the defective gene or
chromosome to a large number of progeny, which makes such defects a major economic concern for the
livestock industry. Because most genetic diseases are inherited from carriers, which generally show
no noticeable signs, the undesirable trait can spread extensively in the absence of screening for
genetic defects.
It is now possible to diagnose genetic defects in individuals. Biotechnology allows genotypes to be
identified as normal, carrier or affected. Once the molecular basis of a defect is understood,
direct detection of heterozygous carriers is possible even at the embryonic stage. In livestock,
genetic screening has become essential in view of intensive selection in the dairy and meat
industries, which relies on only a few high-value males. Nowadays, cytogenetic and molecular
screening of all breeding males has been made essential under the new National Programme on Cattle
and Buffalo breeding (NPCBB) to keep our farm animals free from genetic defects arising from
chromosomal abnormalities or nucleotide mutations.
Cytogenetic methods
Each species of domestic animal has a characteristic set of chromosomes, in both number and
form. Table 1 shows the normal chromosome numbers of different livestock species.
Table 1. Chromosome numbers in different livestock species

Species           Scientific name           Chromosome number
Cattle            Bos taurus, B. indicus    60
River buffalo     Bubalus bubalis           50
Swamp buffalo     Bubalus bubalis           48
American bison    Bison bison               60
Mithun            Bos frontalis             58
Camel             Camelus bactrianus        74
Dog               Canis familiaris          78
Cat               Felis catus               38
Donkey            Equus asinus              62
Goat              Capra hircus              60
Horse             Equus caballus            64
Pig               Sus scrofa                38
Sheep             Ovis aries                54
Yak               Bos grunniens             60

Chromosome structure depends on the stage of mitosis. The chromosomes can be arranged
pairwise once the individual pairs have been identified. They are arranged according to size and/or
the position of the centromere. The shorter arm of a chromosome is called the p-arm and the longer
the q-arm. When chromosomes are presented, the q-arm is always turned downwards. The centromere
occupies a relatively constant position on a chromosome; the ratio of the two arm lengths therefore
remains constant and is important for identifying different chromosomes. The position of the
centromere and the ratio of arm lengths classify chromosomes into four categories: metacentric,
submetacentric, acrocentric and telocentric. Different species in the same family have similar, or
homologous, chromosomes. Cattle possess 29 pairs of acrocentric autosomes plus the X (submetacentric)
and Y (metacentric in taurine and acrocentric in indicine cattle) sex chromosomes. In river buffalo,
5 pairs of autosomes are submetacentric, while 19 pairs of autosomes and the X and Y sex chromosomes
are acrocentric. The five submetacentric chromosome pairs of buffalo are derived from centric fusion
of 10 pairs of cattle autosomes, i.e. 1/27, 2/23, 8/19, 16/29 and 5/28. Swamp buffalo has a further
fusion of chromosomes 4 and 9 of river buffalo. Like cattle, goat has 60 chromosomes, nearly all
identical to those of cattle except for the sex chromosomes X and Y. The X chromosome in goat is
acrocentric and the Y chromosome is much smaller than in cattle. In sheep the same differences in
the sex chromosomes are found, but in addition there are three centric fusions. Sheep possess 54
chromosomes owing to fusion of 3 pairs (1/3, 2/8 and 5/11) of goat chromosomes.
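
The diploid numbers quoted for buffalo and sheep follow directly from the fusions listed above: each fixed (homozygous) centric fusion joins two chromosome pairs into one pair and so lowers the diploid number by two. A short sketch of this arithmetic is given below; the function name is a hypothetical choice.

```python
def diploid_after_fusions(ancestral_2n, fused_pairs):
    """Diploid number after a set of fixed (homozygous) centric fusions.

    Each fusion merges two chromosome pairs into one pair, so 2n drops by 2.
    """
    return ancestral_2n - 2 * len(fused_pairs)

# River buffalo relative to the cattle-like complement of 60 chromosomes.
river_buffalo = diploid_after_fusions(60, ["1/27", "2/23", "8/19", "16/29", "5/28"])

# Swamp buffalo carries one further fusion of river buffalo chromosomes 4 and 9.
swamp_buffalo = diploid_after_fusions(river_buffalo, ["4/9"])

# Sheep relative to the goat complement of 60 chromosomes.
sheep = diploid_after_fusions(60, ["1/3", "2/8", "5/11"])
```

The results, 50, 48 and 54, agree with the chromosome numbers listed for river buffalo, swamp buffalo and sheep.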

[Figures: karyotype of male zebu cattle; karyotype of male taurine cattle; karyotype of male river buffalo]

Chromosomal abnormalities
Chromosome abnormalities usually arise from an error in cell division. In both mitosis and
meiosis, the correct number of chromosomes is supposed to end up in the resulting
cells. However, errors in cell division can produce cells with too few or too many copies of a
chromosome. Errors can also occur when the chromosomes are being duplicated. Other factors that
can increase the risk of chromosome abnormalities are maternal age and environment. In mammals,
the female is generally born with all her eggs. The eggs age along with the female, so older
females are at greater risk than younger ones of giving birth to offspring with chromosome
abnormalities. In males, sperm are produced anew throughout life, so paternal age does not
increase the risk in the same way. Sometimes specific environmental factors can also
cause chromosome abnormalities. It is also important to note that some human populations, and
some livestock breeds or animals from specific regions, may have a higher incidence of genetic
defects at the chromosomal or DNA level.
Abnormality in chromosomal numbers: Euploidy is the condition of having a normal number of
structurally normal chromosomes. Aneuploidy is any deviation from euploidy in which the individual
has fewer or more than the normal diploid number of chromosomes. It is the most frequently observed
type of cytogenetic abnormality. Monosomy is the lack of one of a pair of chromosomes. A monosomy
seen in many species is X chromosome monosomy, which is commonly lethal during
prenatal development. Trisomy is the presence of three chromosomes of a particular type. Another
numerical abnormality is triploidy. A triploid individual has three of every chromosome, that is,
three haploid sets of chromosomes. A triploid bovine would have 90 chromosomes (3 haploid sets of
30). Triploidy commonly arises by fertilization of one ovum by two sperm. However, birth of a live
triploid is extraordinarily rare, and such individuals are quite abnormal.
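The chromosome counts expected for these numerical abnormalities can be worked out from the normal diploid number, as in the following sketch. The function and dictionary key names are hypothetical; the cattle value of 60 is taken from the chromosome number table.

```python
def numerical_abnormality_counts(diploid_2n):
    """Chromosome counts expected for common numerical abnormalities."""
    haploid_n = diploid_2n // 2
    return {
        "normal diploid": diploid_2n,
        "monosomy": diploid_2n - 1,   # one chromosome of a pair is missing
        "trisomy": diploid_2n + 1,    # one extra copy of a single chromosome
        "triploidy": 3 * haploid_n,   # three complete haploid sets
    }

cattle_counts = numerical_abnormality_counts(60)
river_buffalo_counts = numerical_abnormality_counts(50)
```

For cattle this gives 59 for a monosomy, 61 for a trisomy and 90 for a triploid, the last matching the figure quoted above.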
Abnormality in chromosomal structure: A chromosome deletion occurs when the chromosome breaks
and a piece is lost, which involves loss of genetic information. A related abnormality is a
chromosome inversion, in which a break or breaks occur and the fragment is rejoined in inverted
orientation. Inversions thus do not involve loss of genetic material, although the breakpoints may
disrupt a gene. Generally, individuals carrying inversions have a normal phenotype. In a chromosomal
translocation, chromosome(s) break and the fragments rejoin to other chromosome(s). There is no
loss of genetic material, although a breakpoint can disrupt a critical gene or create a fusion
gene. Translocations manifest as reduced fertility or sometimes as disease conditions such as
cancer. When two non-homologous chromosomes break and exchange fragments, the event is termed a
reciprocal translocation. Individuals carrying such abnormalities may have a normal
phenotype but show subnormal fertility. A centric fusion is a translocation in which the
centromeres of two acrocentric chromosomes fuse to generate one large metacentric chromosome.
Centric fusions are also often called Robertsonian translocations. The karyotype of an individual
carrying a centric fusion has one less than the normal diploid number of chromosomes. The best
known is the 1/29 centric fusion of chromosomes 1 and 29 in cattle.
Chromosomal abnormalities may originate during gametogenesis or during or after fertilization.
The majority of aneuploids result from defective or abnormal gametogenesis, whereas most
haploids and polyploids arise during or after fertilization. About 25% of the abnormalities can be
attributed to errors during meiosis, while the rest occur around the time of
fertilization. Chromosomal abnormalities may account for approximately one fifth of total
embryonic and fetal loss. The rate of development is slower in chromosomally abnormal embryos
than in normal diploid embryos. Major deviations are
rarely compatible with survival, and such individuals usually die prenatally.
Incidence and Significance
Both the overall incidence and the occurrence of specific abnormalities clearly depend upon when
the data are collected relative to development. This bias is clearly understood by considering the
effect on survival of minor versus major genetic lesions. For example, when newborn children are
screened, it is found that roughly 1 in every 200 has a chromosomal abnormality. Some of these
children are phenotypically normal, while others have obvious, sometimes severe manifestations of
disease. By definition however, these children have chromosomal disorders at the "mild" end of the
spectrum because they are compatible with survival to term.
A much higher incidence of chromosomal disease is seen if one looks earlier in gestation.
Approximately half of the human fetuses that are spontaneously aborted during the first trimester
are chromosomally abnormal, reflecting chromosomal disorders severe enough to disrupt prenatal
development. If one looks at the chromosomes in pre-implantation embryos, even higher numbers
of abnormalities are seen: 5-10% of viable blastocysts collected from cattle and pigs were
cytogenetically abnormal. Finally, some chromosomal abnormalities are essentially never seen,
presumably because they are so profound as to cause death shortly after fertilization.
The concepts on incidence presented above refer to the broad spectrum of chromosomal
disorders. It is important to recognize that certain abnormalities can reach a very high and
important prevalence in small populations of animals. This has been vividly observed with certain
types of translocations, which reduce fertility yet cause little if any disease in carriers. A classic
example is the 1/29 centric fusion in cattle, which has at times reached a prevalence of up to 30% in
certain breeds within a particular country.
Multiple congenital malformations are seen with many types of chromosomal abnormalities,
particularly deletions and aneuploidy. Animals with a balanced set of chromosomes will generally
be phenotypically normal. If an individual does not have a balanced set of chromosomes, this will
normally be visible as a greater or lesser deviation of the phenotype from normal. Animals with an
unbalanced set of chromosomes will most often be sterile and have low vitality. Chromosome
deviations in animals with a normal phenotype are normally detected because of low fertility or
complete sterility. Trisomies are very rare in animals, but they occasionally occur. In cattle,
foetuses carrying trisomy of chromosome 28 are normally aborted or die straight after birth;
such animals show cleft palate and heart abnormalities. In most domestic animals less severe
chromosome errors occur. The subfertility is caused by problems in chromosome pairing and
segregation during meiosis. Affected animals in general show a substantial, often greater than 50%,
reduction in fertility, whereas a chromosomal fusion in heterozygous form lowers fertility only
slightly. For example, the karyotype of a bull with low fertility has been shown to carry a 1/8
translocation. In livestock, defects of the sex chromosomes usually influence the development and
function of the reproductive system. In buffaloes and some cattle with reduced fertility,
structural and numerical chromosomal aberrations were found more frequently, specifically
chromosomal gaps and deletions in autosomes and sex chromosomes as well as chromatid breaks and
centric fusions in autosomes. Chromosomal disorders such as XO, XXY and translocations reported in
livestock can reduce fertility or hamper the breeding of animals. However, defects in autosomes are
usually lethal, except for mosaicism and translocations.
In buffaloes, mosaicism of sex chromosomes has been observed in heterosexual and in some
homosexual twinning cases. XX/XY is found in the intersex female and XO/XX/XY in the co-twin bull.
On the other hand, XO/XY and XO/XX were recorded in one male and one female of homosexual twins,
respectively. In twin foetuses of different sex, blood is exchanged at the early foetal stage, so
that a mixture of stem cells for the white and red blood cells becomes established in each twin. If
the mixing is too extensive, the heifer of a mixed twin pair develops abnormal sexual organs and is
infertile; such a heifer is called a freemartin. The bull born from such a twinning generally has
normal fertility but might show the genotype of the other twin.
Cytogenetic screening
By studying chromosomes, we study the pattern of inheritance from one generation to the next.
It also gives an opportunity to locate genes and their arrangement on the
chromosomes, which is important for linked loci. Chromosomes are extended during interphase
and attain their condensed, characteristic shape during metaphase of cell division. The science
of studying chromosomes for their structure, function and anomalies, and of establishing their
relationship with phenotype, is called cytogenetics. It includes routine analysis of chromosomes
and their banding. Nowadays molecular cytogenetic techniques such as fluorescent in situ
hybridization (FISH) and comparative genomic hybridization (CGH) have also emerged, which analyse
chromosomes with greater refinement; however, their routine use is limited by high cost.
Chromosomal banding
Chromosomal banding is based on staining chromosomes with a specific dye. The most
commonly used banding methods are G (Giemsa), R (reverse), Q (quinacrine) and C (centromere) banding.
During staining, some parts of the chromosome stain more strongly than others, forming band-like
patterns. The darkly stained bands are referred to as positive G, R, Q or C bands, respectively.
Each of these techniques produces a pattern of dark and light (or fluorescent versus
non-fluorescent) bands along the length of the chromosomes. Importantly, each chromosome displays a
unique banding pattern, like a bar code, which allows it to be reliably differentiated from other
chromosomes of the same size and centromeric position. G-banding preferentially stains
regions that are rich in adenine (A) and thymine (T). R-banding is the reverse of G-banding:
regions rich in A and T stain lightly. C-banding stains the heterochromatin areas.
NOR-staining identifies the genes for ribosomal RNA in the nucleolar organizing region.
Molecular methods
Genes are located on chromosomes. An individual may be chromosomally sound and yet carry a
genetic defect at the DNA level. Mutations at the DNA level, particularly point mutations, are
frequent in nature, but a mutation in a functional part of the genome may cause a genetic
disorder. Such defects are also passed on to the next generation. In contrast to chromosomal
defects, individuals with gene defect(s) may survive and be fertile, at least if the defect is in
the heterozygous condition. For example, the frequency of several genetic diseases such as BLAD,
citrullinemia and factor XI deficiency and, most importantly, of infertility (among crossbred
males as well as females) has increased as a result of crossbreeding in cattle, leading to major
production losses.


Bovine Leukocyte Adhesion Deficiency (BLAD)


Bovine leukocyte adhesion deficiency (BLAD) is an autosomal recessive disease. It is a syndromic
condition that arises from a change in the protein structure of the integrin cell adhesion
molecules, caused by a point mutation in the β2 integrin gene (ITGB2). As a result, leukocytes
are unable to adhere to the vascular endothelium and therefore cannot reach the site of
infection. The point mutation causes an amino acid substitution at position 128 (D128G), and the
defect is expressed when the mutation is present in the homozygous condition. It is most
prevalent in black and white cattle, specifically Holstein Friesian. In India, BLAD has been
identified particularly in crossbred cattle, with 5-7% of animals being carriers.
Citrullinemia
Citrullinemia results from a deficiency of argininosuccinate synthetase (ASS), one of the enzymes
of the urea cycle, which leads to accumulation of citrulline in the blood and finally to ammonia
poisoning. The ASS deficiency is caused by a single-base substitution (C>T) that converts the CGA
codon to TGA, a termination codon, within exon 5. Affected calves are incapable of disposing of
ammonia on their own even shortly after birth and begin to show neurological problems. The calf
shows depression, followed by unsteady gait, aimless wandering, apparent blindness, head pressing,
collapse, convulsions and death within a week. The abnormality is reported mostly in HF cattle,
in which up to 10% of animals may be carriers.
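The effect of the substitution described above can be illustrated with a short sketch. The codon table is deliberately partial (only the two codons named in the text), and the helper name `substitute` is a hypothetical choice made for this example.

```python
# Partial codon table: only the two codons discussed in the text.
CODON_TABLE = {
    "CGA": "Arg",   # arginine, the normal codon
    "TGA": "Stop",  # termination codon created by the C>T substitution
}

def substitute(codon: str, position: int, new_base: str) -> str:
    """Return the codon with the base at `position` (0-based) replaced."""
    return codon[:position] + new_base + codon[position + 1:]

normal = "CGA"
mutant = substitute(normal, 0, "T")  # C>T at the first base of the codon

print(normal, "->", CODON_TABLE[normal])
print(mutant, "->", CODON_TABLE[mutant])
```

Because translation stops at the new TGA codon, the protein is truncated rather than merely altered, which is consistent with the severe phenotype described.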
Deficiency of Uridine Mono Phosphate Synthase (DUMPS)
Deficiency of Uridine Mono Phosphate Synthase (DUMPS) is transmitted as an autosomal recessive
trait, inherited at a single autosomal locus with two alleles. The homozygous genotype for the
deficient allele is lethal in utero at about 40 days post conception. Uridine monophosphate
synthase is necessary for de novo synthesis of pyrimidine nucleotides, and the disorder of
pyrimidine nucleotide biosynthesis results in early embryonic mortality of homozygous recessive
embryos.
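As a rough illustration of what autosomal recessive inheritance with embryonic lethality means for a mating plan, the sketch below enumerates the expected offspring genotypes at a single two-allele locus. The allele labels `D` (normal) and `d` (deficient) and the function name are assumptions made for this example only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def offspring_genotypes(sire: str, dam: str) -> dict:
    """Expected genotype proportions at one locus under Mendelian segregation.

    Each parent is given as a two-letter genotype, e.g. "Dd".
    """
    # Every sire allele can pair with every dam allele with equal probability.
    counts = Counter("".join(sorted(pair)) for pair in product(sire, dam))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}

carrier_x_carrier = offspring_genotypes("Dd", "Dd")
carrier_x_normal = offspring_genotypes("Dd", "DD")

print(carrier_x_carrier)
print(carrier_x_normal)
```

Under these assumptions, a quarter of the embryos from a carrier x carrier mating are expected to be homozygous for the deficient allele, whereas a carrier x normal mating produces none.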
Factor XI Deficiency
Factor XI deficiency is also an autosomal defect. It may or may not be accompanied by spontaneous
or induced bleeding episodes. Continued bleeding from the umbilical cord is sometimes seen in
affected calves, and some animals die of excessive bleeding. Although affected animals can
survive for years without evident clinical signs, as a group they appear to have higher
mortality and morbidity. The frequency of carriers is around 2%. The genetic cause is a 76 bp
insertion into exon 12 of the FXI gene; the inserted segment contains a stop codon, which
prevents translation of the full-length protein.
Reference

Marron, B.M., Robinson, J.L., Gentry, P.A. and Beever, J.E. 2004. Identification of a mutation associated with
factor XI deficiency in Holstein cattle. Animal Genetics 35: 454.
Dennis, J.A., Healy, P.J., Beaudet, A.L. and O'Brien, W.E. 1989. Molecular definition of bovine
argininosuccinate synthetase deficiency. Proceedings of the National Academy of Sciences of the United
States of America 86: 7947.
King, W.A. 1990. Chromosome abnormalities and pregnancy failure in domestic animals. Advances in
Veterinary Science and Comparative Medicine 34: 229.
Prakash, B., Balain, D.S. and Lathwal, S.S. 1992. A 49,XO sterile Murrah buffalo (Bubalus bubalis). Veterinary Record
130: 559.
Prakash, B., Balain, D.S., Lathwal, S.S. and Malik, R.K. 1995. Infertility associated with monosomy-X in a
crossbred cattle heifer. Veterinary Record 137: 436.
Schwenger, B., Schöber, S. and Simon, D. 1993. DUMPS cattle carry a point mutation in the uridine
monophosphate synthase gene. Genomics 16: 241.
Shuster, D.E., Kehrli, M.E., Ackermann, M.R. and Gilbert, R.O. 1992. Identification and prevalence of genetic defect
that causes leucocyte adhesion deficiency diseases in Holstein cattle. Proceedings of the National
Academy of Sciences of the United States of America 89: 9225.

Genetic Characterisation of Animal Genetic Resources: Principle, Methodology and Guidelines
Monika Sodhi, S K Niranjan, M Mukesh and R S Kataria
ICAR- National Bureau of Animal Genetic Resources, Karnal (Haryana)

________________________________________________________________________________________

Animal Genetic Resources (AnGR) include both the livestock and the poultry resources available for
food and agriculture. As per FAO's Global Databank, there are nearly eight thousand livestock
breeds in the world. About 20% of these breeds are classified as at risk. Moreover, a large
fraction has not been properly characterized, and the genetic similarity between breeds is largely
unknown. Such knowledge is useful in designing conservation programs as well as in formulating
breeding programs. The selection of breeds or strains of livestock for conservation or improvement
can be hampered by an inadequate description of population structure, both within and between
populations. The choice of appropriate populations for conservation or improvement should be
based on a combination of phenotypic and genetic data. Geographical isolation over time has built
up a plethora of genetic types, but the magnitude of genetic differentiation has rarely been
quantified. Indiscriminate crossbreeding further clouds the situation. A key element of a
conservation strategy for animal genetic resources must therefore be the characterization of
breeds and strains to provide an overall picture of genetic diversity. Although it is difficult to
characterize the differences between breeds in terms of agriculturally important genes, general
genetic variability is the most suitable criterion for identifying genetically unique breeds, and
genetic uniqueness is an important criterion when breeds are selected for conservation. The
underlying assumption is that breeds which are taxonomically distinct (Hall and Bradley, 1995) are
most likely to have special adaptations and gene combinations not found in other breeds. By
selecting for conservation those populations with unique evolutionary histories, a maximum amount
of diversity can be preserved. Genetic characterization of the native breeds is a first step in a
conservation programme, as it helps decision makers to identify genetically unique breeds so that
they may be prioritized for conservation. In order to facilitate and rationalize the maintenance
of domestic animal diversity, it is essential that simple assays be developed quickly, taking
advantage of the molecular genetic tools now available. At present, an array of DNA-based
techniques for typing polymorphic loci is available for detecting diversity at the DNA level, and
these are being exploited globally to construct genetic profiles of different populations, breeds
and strains of farm animals.
Molecular tools in diversity analysis
Traditionally, breeds were described by phenotypic characterization based on morphological
features, physical body measurements, production traits, reproductive traits and adaptive traits.
A number of reports are available (Acharya and Bhatt, 1984; Nivsarkar et al., 2000) in which the
assignment of indigenous breeds was based on phenotypic or subjective data and on information
generated from local sources. Efforts were also made to characterize genetic variation. Early
reports on the detection of genome variation focused on the analysis of protein and blood group
variation. These biochemical markers were used extensively but were not very effective for
characterization, as they often show a low level of polymorphism and are also sex limited and age
dependent. To overcome the limitations of biochemical markers, several DNA-based markers were
employed for genetic characterization. The exploitation of DNA polymorphisms as molecular markers
has opened many vistas in genetic characterization, improvement and molecular evolution studies
in farm animals. The molecular markers include microsatellites (simple tandem repeats, STRs),
single nucleotide polymorphisms (SNPs), variable number tandem repeats (VNTRs), random amplified
polymorphic DNA (RAPD), single strand conformation polymorphisms (SSCPs), amplified fragment
length polymorphisms (AFLPs) and restriction fragment length polymorphisms (RFLPs). Additionally,
in diversity and phylogeny studies, specific mtDNA and Y-chromosome markers are used for the
identification of maternal and paternal lineages. Of the molecular markers available,
microsatellite typing is one method of choice.
Genetic marker
Among the available neutral molecular markers, microsatellites or STRs (short tandem repeats),
maternally inherited mitochondrial DNA (mtDNA) and paternally inherited Y-chromosomal variation
have been used extensively to reveal the genetic structure, domestication events and male and
female demographic patterns of livestock species. The approach has also been demonstrated
extensively in human populations to understand gene flow, genetic structure and population
ancestry.
Among these, STRs, besides having several unique properties, are highly sensitive to population
bottlenecks and selection. mtDNA may be a poor indicator of overall genomic diversity because it
is a single locus and an extra-nuclear genetic marker with specific evolutionary dynamics. Also,
because it is maternally inherited, it does not detect male-mediated gene flow, which has had a
powerful influence on the evolution of a few species, such as the pig, in modern times. The
Y chromosome is paternally inherited (male mediated) and, despite its low polymorphism within a
species, the non-recombining part of the Y chromosome (NRY) maintains the original arrangement of
mutation events, making it possible to trace male lineages both within and among populations.
Microsatellite markers are therefore the most preferred DNA-based markers for genetic
characterization. Most of these loci are selectively neutral, which makes them compatible with
the assumptions of most population genetic theory. They remain unaffected by environmental
factors and generally do not have pleiotropic effects on quantitative trait loci (QTL).
Microsatellite DNA markers
With the availability of high-throughput systems, microsatellites have become the most frequently
used markers in genetic diversity studies. They are simple tandem repeat (STR) motifs of
1-5 nucleotides that are densely and evenly distributed throughout the genome and often exhibit
substantial polymorphism due to site-specific length variation, which is a consequence of
differences in the number of repeat units. The difference in repeat number can be reliably
distinguished, and the variants are inherited as alleles at each locus. The polymorphic nature of
this type of locus, with variation many times more common than in non-repetitive sequence, makes
microsatellites ideal for examining genetic variation within a species. Microsatellites occur at
a frequency of about 1 SSR per 10 kb of DNA, numbering a total of about 50-100 thousand in the
mammalian genome. Their short length makes them amenable to amplification by PCR and subsequent
separation on polyacrylamide gels, which resolve alleles differing by as little as a single base.
Additionally, with the automation of sequencing and genotyping technologies, it has become much
easier to genotype microsatellite loci in large numbers of samples.
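The idea that microsatellite alleles differ only in the number of repeat units can be sketched as follows. The flanking sequences, the `CA` motif and the function name are invented for illustration and do not correspond to any marker in the recommended panels.

```python
import re

def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of uninterrupted tandem copies of `motif`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence.upper())
    return max((len(run) // len(motif) for run in runs), default=0)

# Invented flanking sequences (they contain no 'CA' so they do not extend the run)
flank_left, flank_right = "ACGTTGCT", "TGGAAGCT"

allele_a = flank_left + "CA" * 12 + flank_right   # 12 repeat units
allele_b = flank_left + "CA" * 15 + flank_right   # 15 repeat units

repeats_a = longest_repeat_run(allele_a, "CA")
repeats_b = longest_repeat_run(allele_b, "CA")
size_difference = len(allele_b) - len(allele_a)    # in base pairs

print(repeats_a, repeats_b, size_difference)
```

The two alleles differ by three dinucleotide units, so the amplified fragments differ by six bases in length; this length difference is what fragment analysis detects.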
The FAO has formulated an integrated programme for the global management of the genetic
resources of various livestock species, using species-specific lists of microsatellite loci
(about 30 per species) for cattle, chicken, sheep, swine and buffalo in diversity studies, and a
number of diversity projects have been carried out worldwide. The advisory group of the FAO
MoDAD project has compiled a list of 25-30 highly polymorphic microsatellite markers for each
species, to be used for the analysis of genetic distances. For the selection of appropriate
microsatellites, the working group of the MoDAD project has issued the following criteria.
- The microsatellite marker should be in the public domain.
- Wherever possible, microsatellite loci that have been identified in mapping studies should
be used, and they should preferably be known to be unlinked.
- The microsatellite variants should be shown to exhibit Mendelian inheritance.
- Each microsatellite locus should exhibit at least four alleles.
- There should be information on the microsatellite loci in a published report.
- Microsatellite loci suitable for several related species (heterologous markers) should be
preferred.
- The microsatellite markers should be suitable for multiplexing on an automated
DNA sequencer.
These criteria were agreed at a meeting of the EU-AIR concerted action group on "Analysis of
genetic diversity in cattle to preserve future breeding options", held in Dublin in 1995. A list
of microsatellite markers was compiled as per the above recommendations for universal use in the
molecular genetic characterization of breeds, so that joint analysis of future data from different
laboratories would be possible for prioritizing breeds for conservation in terms of genetic
uniqueness.
The International Society for Animal Genetics (ISAG)-FAO Advisory Group on Animal Genetic
Diversity has recommended panels of 30 microsatellite markers for each of nine major livestock
species: cattle, buffalo, sheep, goat, horse, donkey, camel, pig and chicken (Molecular Genetic
Characterization of Animal Genetic Resources). The lists are also available at
www.globaldiv.eu/docs/Microsatellite%20markers.pdf.
NBAGR has standardized a panel of 25 markers each for cattle, buffalo, sheep, goat, camel and
equines, and 23 markers for pig, for genetic characterization. Using the full panel not only
yields more accurate data than using a subset of the markers, but also offers more opportunity
for comparison with results from previous studies.
Sampling design
When designing the sample collection, consider the structure of the production system, the
geographic locations and the pedigree relationships. For genetic characterization, samples should
be drawn in such a way that they cover most of the genetic variability in the population.
Samples should preferably be collected from the areas (breeding tract) closest to the site where
the breed was developed. Samples should also reflect the different agro-climatic zones in which
the breed is found. Typically, not more than 10 percent of any one herd or village population
should be sampled, and in any case not more than five animals from any herd. Always avoid
sampling animals that share common ancestors within at least three generations.
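The herd-level limits above can be turned into a simple planning helper. This is a minimal sketch under a strict reading of the two limits (10 percent of the herd and an absolute cap of five animals); the function name and its parameters are hypothetical.

```python
def max_animals_per_herd(herd_size: int, percent: int = 10, cap: int = 5) -> int:
    """Upper limit on animals to sample from one herd.

    Applies both limits from the text: at most `percent` percent of the herd
    and never more than `cap` animals. Integer arithmetic avoids rounding issues.
    """
    if herd_size < 0:
        raise ValueError("herd_size must be non-negative")
    return min(cap, herd_size * percent // 100)

for size in (8, 30, 45, 200):
    print(size, max_animals_per_herd(size))
```

Read strictly, the 10 percent limit gives zero animals for herds of fewer than ten, which is why the text says "typically"; in small village herds the limit has to be applied with judgment rather than mechanically.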
If there appear to be genetic subdivisions within a breed, it is desirable to collect samples
that represent all the subtypes. Also keep records of the animals and types that are sampled.
For breeds of hybrid origin (via introgression, upgrading or the planned creation of a synthetic
breed), it is essential to have data from the parental breeds. For breeds with a recent history
of intense selection and/or inbreeding, it may be appropriate to sample animals from previous
generations, which may be available in the form of cryopreserved semen.
For genetic characterization based on mitochondrial DNA (mtDNA), animals with a common maternal
origin should not be sampled. Similarly, for characterization based on Y-chromosomal markers,
animals with a common paternal origin should be avoided.
Sampling material
Blood is the most preferred tissue for sampling. Generally, 10-15 ml of blood should be collected
from each individual. Other materials such as semen, hide, bone, tissue (e.g. ear tissue), faeces,
fossils, plucked hair with root cells and feathers can also be used.
Number of samples
For reliable estimation of allele frequencies, at least 25 and preferably 50 animals per breed
should be typed. More than 50 animals should be sampled to allow for possible losses, mistyping
or missing data. If there are population subdivisions, different subtypes or several
agro-climatic zones, sampling a larger number of animals is recommended.
DNA extraction
A standard protocol (phenol:chloroform method) should be followed for DNA extraction from blood
or any other tissue. Several reliable protocols for DNA extraction are available. The most
commonly used protocols are based on proteinase K/SDS lysis of cells, organic extraction and
alcohol precipitation. Kit-based DNA extraction can be done as per the manufacturer's protocol.
Every sample of genomic DNA should contain a minimum of 100 µg at the concentration at which it
is used. The quality of the DNA should be as follows: A260/A280 = 1.7-2.0; A260/A230 > 1.5. One
agarose gel electrophoresis photograph with at least one size marker should be submitted.
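The purity thresholds above are easy to check programmatically once the absorbance readings are available. The sketch below is illustrative only: the function names are hypothetical, and the conversion of 50 µg/ml per A260 unit for double-stranded DNA (1 cm path length) is a widely used convention that is assumed here rather than taken from the text.

```python
def dna_quality_ok(a260: float, a280: float, a230: float) -> bool:
    """Check the two purity ratios quoted in the text."""
    ratio_260_280 = a260 / a280
    ratio_260_230 = a260 / a230
    return 1.7 <= ratio_260_280 <= 2.0 and ratio_260_230 > 1.5

def dsdna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration, assuming 1 A260 unit = 50 ug/ml."""
    return a260 * 50.0 * dilution_factor

# Invented spectrophotometer readings for two samples
good_sample = dna_quality_ok(a260=1.00, a280=0.55, a230=0.50)
poor_sample = dna_quality_ok(a260=1.00, a280=0.70, a230=0.50)
concentration = dsdna_concentration_ug_per_ml(0.5, dilution_factor=10)

print(good_sample, poor_sample, concentration)
```

A low A260/A280 ratio, as in the second invented sample, usually points to protein carry-over, so such a sample would be re-extracted or cleaned up before genotyping.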
Genotyping
Microsatellite genotyping can be performed either manually (urea-PAGE polyacrylamide gels
followed by silver staining) or through automation (amplification using fluorescent dye-labelled
primers and genotyping on an automated DNA sequencer). Manual genotyping is time consuming and
cumbersome, and it is not very successful because of the difficulty of i) determining the allele
size accurately, ii) comparing data across different breeds and iii) reproducing results. All
these factors hinder comparative studies across livestock and poultry breeds. With advances in
sequencing technologies and the amenability of microsatellites to automation, the switch from
manual genotyping to fragment analysis on a sequencer became feasible, and this is presently the
most widely used methodology.
Automated microsatellite genotyping, i.e. amplification using fluorescent dye-labelled primers
and genotyping on an automated DNA sequencer, should therefore be preferred over manual
genotyping on urea-PAGE polyacrylamide gels followed by silver staining.
While carrying out microsatellite-based genotyping, include at least one reference sample in
each experiment so as to cross-validate successive genotyping experiments. It is preferable that
one laboratory performs all typing for a given marker in order to exclude laboratory-dependent
scoring. Use the FAO recommended microsatellite panel and, if possible, include international
reference samples in order to link your data to other datasets.
Multiplexing of PCR products while performing fragment length analysis with automated DNA
sequencer can reduce the cost. However, care should be taken to multiplex amplicons of different
sizes and labeled with different dyes (FAM, VIC, NED or PET).

PCR products of different sizes and dyes can be pooled for maximizing the throughput. It is
important to pool PCR products together at the correct ratios, in order to get similar fluorescent
intensities across all loci in the pooling. The fluorescent dyes are detected with different efficiencies.
The pooling ratio, or amount of each dye-labeled product added with respect to the other products
in the pool, should be adjusted to ensure an appropriate detection of all the loci.
Post PCR multiplexing
To maximize the throughput, products amplified with different primers and labelled with
different dyes are pooled for one capillary injection. This relies on the fact that the ABI PRISM
3100 DNA Analyzer can automatically analyze PCR products of different sizes and dyes. In order to
get similar fluorescent intensities across different loci, it is important to pool the PCR
products in the correct ratios: the pooling ratio, i.e. the amount of each product added relative
to the other products in the pool, is crucial to ensure appropriate detection of all the alleles.
After optimization of the pooling ratios, the products with different fluorescent labels are
mixed in the following ratio.
FAM labeled PCR product - 1.0 µl
VIC labeled PCR product - 1.5 µl
NED labeled PCR product - 2.0 µl
PET labeled PCR product - 2.0 µl
GeneScan-500 LIZ Size Standard (Applied Biosystems) is used as the internal standard for
fragment sizing. This standard is used to size fragments between 50 and 500 bases and provides 16
single-stranded labeled fragments of 35, 50, 75, 100, 139, 150, 160, 200, 250, 300, 340, 350,
400, 450, 490 and 500 bases. Each DNA fragment is labeled with a proprietary fluorophore, which
results in a single peak when run under denaturing or native conditions. The internal lane size
standard is run with every sample for accurate sizing.
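The role of the internal size standard can be sketched with a simple interpolation. This is a simplified illustration, not the sizing algorithm used by the instrument software: it assumes plain linear interpolation between the two ladder peaks that flank a sample peak, and the migration positions below are invented.

```python
from bisect import bisect_left

# Fragment sizes (bases) of the size standard, as listed in the text
LIZ500_SIZES = [35, 50, 75, 100, 139, 150, 160, 200,
                250, 300, 340, 350, 400, 450, 490, 500]

def size_fragment(peak_position: float, ladder_positions: list) -> float:
    """Estimate fragment size by linear interpolation between flanking ladder peaks.

    ladder_positions: migration positions of the 16 ladder peaks, in ascending order.
    """
    if len(ladder_positions) != len(LIZ500_SIZES):
        raise ValueError("expected one position per size-standard fragment")
    if not ladder_positions[0] <= peak_position <= ladder_positions[-1]:
        raise ValueError("peak lies outside the range of the size standard")
    i = bisect_left(ladder_positions, peak_position)
    if ladder_positions[i] == peak_position:
        return float(LIZ500_SIZES[i])
    x0, x1 = ladder_positions[i - 1], ladder_positions[i]
    y0, y1 = LIZ500_SIZES[i - 1], LIZ500_SIZES[i]
    return y0 + (y1 - y0) * (peak_position - x0) / (x1 - x0)

# Invented ladder positions (arbitrary scan units), one per ladder fragment
ladder = [1000 + 10 * size for size in LIZ500_SIZES]
allele_size = size_fragment(2250, ladder)
print(allele_size)
```

Because the ladder runs in the same capillary as the sample, run-to-run differences in migration affect both equally, which is why the standard is included with every sample.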
Data analysis
Microsatellite data are used to address questions of within-breed and between-breed diversity,
based on various parameters:
- Intra-population analysis (allele diversity, gene diversity, deficiency of heterozygotes).
- Inter-population analysis (genetic distances and analysis of molecular variance).
A number of methods are available for analysing data recorded as genotype designations for each
individual across the microsatellite loci, and many software packages implementing different
analytical methods can be downloaded from the internet. Appropriate software can be used to
assess the within- and between-breed diversity.
The multilocus genotypes of individuals can also be used to analyse the accuracy with which
individuals are assigned to their respective populations. Individuals can be assigned either to
the population in which the likelihood of their genotype is highest or to the (genetically)
closest population.
Some of the software packages most commonly used in population genetics are as follows:
POPGENE (http://www.ualberta.ca/~fyeh/)
AMOVA (Analysis of Molecular Variance)
Arlequin (http://lgb.unige.ch/arlequin/)
GenAlEx (http://www.anu.edu.au/BoZo/GenAlEx/)
GENEPOP (http://wbiomed.curtin.edu.au/genepop/)
GDA (http://lewis.eeb.uconn.edu/lewishome)
GENETIX (http://lotka.stanford.edu/microsat/microsat.html)
Microsatellite Toolkit (http://oscar.gen.tcd.ie/~sdepark/ms-toolkit/)
FSTAT (http://www.unil.ch/izea/softwares/fstat.html)
Phylip (http://evolution.genetics.washington.edu/phylip/getme.html)
TreeView (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html)
The following checks should be carried out during data analysis in order to minimize the error
rate:
- Identify and critically evaluate samples with identical results, which may indicate errors
during sampling (related samples) or processing of samples.
- Examine unusual alleles, which may result from incorrect interpretation of electrophoretic
patterns. The reason may be bleed-through from other colours because of off-scale data, or
primers that are not fully optimized.
- Check for an excess of apparent homozygosity in samples with low DNA concentration, which may
be caused by allele dropout (i.e. the inability of the assay to detect certain alleles).
- Standardize allele calling with other laboratories, particularly for microsatellites.
- Compare allele frequencies with data from breeds that are likely to share the most frequent
alleles in order to detect inconsistent allele sizing.
- Check for absence of laboratory-dependent clustering of breeds, which may result from
systematic differences in allele calling. One cause may be lab-dependent differentiation of
microsatellite alleles that differ by only one bp in length.
- Determine whether any pairs of markers are in linkage disequilibrium (LD). Markers in LD in all
populations are probably genetically linked and thus provide less information about genetic
variability than would two independent markers.
- Check for markers that diverge from Hardy-Weinberg (HW) equilibrium. Markers that are not in
HW equilibrium in most breeds may have null alleles or be linked to loci under selection, hence
breaking the assumption of neutrality. Within single breeds, divergence from HW equilibrium may
indicate the presence of inbreeding or assortative mating.
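The first check in the list, finding samples with identical results, can be automated. The following is a minimal sketch; the data layout (a dictionary of sample IDs mapped to per-locus allele pairs), the sample IDs and the function name are assumptions made for illustration.

```python
from collections import defaultdict

def find_identical_genotypes(genotypes: dict) -> list:
    """Group sample IDs that share exactly the same multilocus genotype.

    genotypes: {sample_id: [(allele1, allele2), ...]} with loci in the same order.
    """
    groups = defaultdict(list)
    for sample_id, loci in genotypes.items():
        # Sort alleles within each locus so (120, 124) and (124, 120) match
        key = tuple(tuple(sorted(pair)) for pair in loci)
        groups[key].append(sample_id)
    return [ids for ids in groups.values() if len(ids) > 1]

samples = {
    "A01": [(120, 124), (98, 98)],
    "A02": [(124, 120), (98, 98)],   # same genotype as A01, alleles in other order
    "A03": [(120, 122), (98, 102)],
}
duplicates = find_identical_genotypes(samples)
print(duplicates)
```

Flagged pairs are not necessarily errors; with a small panel two unrelated animals can occasionally match, so each group should be checked against the sampling records.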
Calculation of within breed diversity indices
The observed number of alleles (No), effective number of alleles (Ne), observed heterozygosity
(Hobs), expected heterozygosity (Hexp), Polymorphism Information Content (PIC) and frequency
distribution at each locus can be calculated using POPGENE software. The allele frequency data
are further used to calculate the number of private alleles (alleles specific to one breed) and
the number of shared alleles, for which the software GDA (http://lewis.eeb.uconn.edu/lewishome)
can be used.
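The indices named above follow directly from the allele frequencies at a locus. The sketch below is not a substitute for POPGENE: it uses the simplest textbook estimators (Ne = 1/Σp², Hexp = 1 - Σp² without small-sample correction, and PIC = 1 - Σp² - ΣΣ 2p_i²p_j² for i < j), and the function name and example genotypes are invented.

```python
from collections import Counter

def locus_diversity(genotypes):
    """Summarise one locus in one breed.

    genotypes: list of (allele1, allele2) tuples, one per animal.
    """
    n = len(genotypes)
    counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [c / (2 * n) for c in counts.values()]
    sum_sq = sum(p * p for p in freqs)
    # Cross term of the PIC formula: sum over i < j of 2 * p_i^2 * p_j^2
    cross = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(len(freqs)) for j in range(i + 1, len(freqs)))
    return {
        "No": len(counts),                                  # observed number of alleles
        "Ne": 1.0 / sum_sq,                                 # effective number of alleles
        "Hobs": sum(a != b for a, b in genotypes) / n,      # observed heterozygosity
        "Hexp": 1.0 - sum_sq,                               # expected heterozygosity
        "PIC": 1.0 - sum_sq - cross,
    }

example = [(120, 120), (120, 124), (124, 124), (120, 124)]
result = locus_diversity(example)
print(result)
```

Dedicated packages typically apply sample-size corrections to the expected heterozygosity, so their values may differ slightly from this uncorrected version for small samples.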
Hardy-Weinberg and linkage equilibrium test
Three different tests, chi-square (χ²), likelihood ratio (G²) and the exact test, can be applied
to analyze deviation from Hardy-Weinberg equilibrium (HWE). The chi-square (χ²) test compares the
observed and expected genotypic frequencies, while G² measures the likelihood ratio. Fisher's
exact test is applied using a Markov chain procedure to compute unbiased estimates of the exact
probabilities (P values).
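The chi-square comparison of observed and expected genotype counts can be sketched as below. This is a bare-bones illustration with invented data; it returns only the statistic and its degrees of freedom (k(k-1)/2 for k alleles) and leaves the P value to a statistics package. It also ignores the problem of small expected counts, which is one reason exact tests are preferred for microsatellites.

```python
from collections import Counter
from itertools import combinations_with_replacement

def hwe_chi_square(genotypes):
    """Chi-square statistic for Hardy-Weinberg proportions at one locus.

    genotypes: list of (allele1, allele2) tuples, one per animal.
    Returns (chi2, degrees_of_freedom).
    """
    n = len(genotypes)
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}
    observed = Counter(tuple(sorted(pair)) for pair in genotypes)

    chi2 = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        # Expected count: n * p^2 for homozygotes, n * 2pq for heterozygotes
        expected = n * (freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b])
        chi2 += (observed.get((a, b), 0) - expected) ** 2 / expected

    k = len(freqs)
    return chi2, k * (k - 1) // 2

# Invented counts: 30 homozygotes of each type and 40 heterozygotes
data = [(1, 1)] * 30 + [(1, 2)] * 40 + [(2, 2)] * 30
chi2, df = hwe_chi_square(data)
print(chi2, df)
```

In this invented data set the heterozygotes are fewer than expected (40 observed against 50 expected), the pattern that null alleles or inbreeding would produce.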
Ewens-Watterson neutrality test
The neutrality of the markers can be checked with POPGENE software by applying the
Ewens-Watterson test. The test calculates the quantity F, which is equal to the sum of squared
allele frequencies.
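The observed F statistic itself is simple to compute from allele counts, as the short sketch below shows with invented counts. The test proper compares this observed value with its distribution under neutrality for the same sample size and number of alleles; that part is not reproduced here, and the function name is hypothetical.

```python
def observed_f(allele_counts):
    """Sum of squared allele frequencies at one locus (observed F)."""
    total = sum(allele_counts)
    return sum((count / total) ** 2 for count in allele_counts)

# Invented allele counts at one locus (100 gene copies, three alleles)
f_value = observed_f([50, 30, 20])
print(f_value)
```

An observed F well outside the range expected under neutrality would suggest that the locus is influenced by selection or by a recent demographic event.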
Estimation of bottleneck in cattle breeds
To detect bottleneck events in the investigated breeds, two different approaches can be
followed. The first approach, based on heterozygosity excess, consists of three tests: the sign
test, the standardized differences test and the Wilcoxon sign-rank test. These methods test for
departure from mutation-drift equilibrium based on heterozygosity excess or deficiency. The
probability distribution is established using 1000 simulations under three models: the infinite
allele model (IAM), the stepwise mutation model (SMM) and the two-phase mutation model (TPM).
The tests can be conducted using bottleneck v1.2.02 software (http://www.ensam.inra.fr/URLB).
The second approach is a graphical representation of the mode shift in the distribution of
allele frequencies. It assumes that in bottlenecked populations one or more of the common allele
classes contain a higher number of alleles than the rare allele class.
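The graphical mode-shift approach amounts to binning allele frequencies into classes and looking at which class holds the most alleles. The sketch below assumes ten equal-width classes and uses invented frequencies; the function names are hypothetical and the bottleneck software may define its classes differently.

```python
def allele_frequency_classes(frequencies, n_classes=10):
    """Count alleles in equal-width frequency classes [0, 0.1), [0.1, 0.2), ..."""
    counts = [0] * n_classes
    for p in frequencies:
        if not 0 < p <= 1:
            raise ValueError("allele frequencies must lie in (0, 1]")
        index = min(int(p * n_classes), n_classes - 1)
        counts[index] += 1
    return counts

def is_mode_shifted(frequencies, n_classes=10):
    """True if some more common class holds more alleles than the rarest class."""
    counts = allele_frequency_classes(frequencies, n_classes)
    return counts[0] < max(counts[1:])

# Invented allele frequencies pooled over loci for two hypothetical breeds
stable_breed = [0.02, 0.05, 0.08, 0.03, 0.15, 0.25, 0.42]
bottlenecked_breed = [0.15, 0.22, 0.25, 0.05, 0.33]

print(allele_frequency_classes(stable_breed), is_mode_shifted(stable_breed))
print(allele_frequency_classes(bottlenecked_breed), is_mode_shifted(bottlenecked_breed))
```

In the first invented breed the rare class is the most abundant (the usual L-shaped distribution); in the second, rare alleles have been lost and the mode has moved to an intermediate class.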
Measurement of F-statistics and gene flow
The degree of population differentiation among the breeds can be estimated using the
variance-based method of Weir and Cockerham (1984). The F-statistics estimates f
(within-population inbreeding estimate), F (total inbreeding estimate) and θ (measure of
population differentiation), which are analogous to FIS, FIT and FST respectively, can be
estimated using the FSTAT version 2.9.3.2 computer programme. Means and standard deviations of
the F-statistics parameters are obtained across breeds by jackknifing over loci. The level of
significance (P < 0.05) is determined from permutation tests, with the sequential Bonferroni
procedure applied over all loci. Wright's FST assesses the degree of genetic differentiation
between populations; this classical estimator is considered the most appropriate because genetic
drift is assumed to be the main factor in genetic differentiation among closely related
populations. The effect of migration and gene flow (Nem) on the genetic structure of populations
is estimated between each pair of populations. Nem values, indicating the average number of
effective migrants exchanged per generation, are calculated according to the formula:
Nem = (1 - FST) / (4 FST)
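As a worked illustration of the gene flow estimate, the sketch below assumes the commonly used island-model relation Nem = (1 - FST) / (4 FST); this relation and the function name are assumptions of the example, and the FST values are invented.

```python
def effective_migrants(fst: float) -> float:
    """Number of effective migrants per generation from pairwise FST.

    Assumes the island-model relation Nem = (1 - FST) / (4 * FST).
    """
    if not 0 < fst < 1:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 - fst) / (4.0 * fst)

for fst in (0.05, 0.20):
    print(fst, effective_migrants(fst))
```

The relation is strongly non-linear: a pairwise FST of 0.05 corresponds to several migrants per generation, whereas an FST of 0.20 corresponds to about one, so small differences in low FST values translate into large differences in the estimated gene flow.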
Determination of genetic divergence and relationships
The genetic divergence between each pair of breeds can be calculated using various genetic
distance estimates based on different assumptions. Genetic distance methods can be clustered into
three groups: I) genetic distances based on the infinite allele model; II) genetic distances that
assume a stepwise mutation model; and III) genetic distance based on the proportion of shared
alleles, a non-metric method. In addition, inter-individual genetic distances based on the
proportion of shared alleles, averaged over loci, can also be calculated.
The genetic distance matrix between the breeds is then used to reconstruct trees according to
the Neighbour-Joining (NJ) and unweighted pair group method with arithmetic averages (UPGMA)
algorithms, making use of the PHYLIP package. The robustness of the tree topology is assessed by
1000 bootstrap resamplings of loci.
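Two of the distance families mentioned above can be sketched from allele frequencies alone. The example uses Nei's standard genetic distance, D = -ln(Jxy / sqrt(Jx Jy)), as a representative of the first group, and one minus the proportion of shared alleles for the third; the choice of these two estimators, the data layout and the breed frequencies are assumptions made for illustration.

```python
import math

def nei_standard_distance(x, y):
    """Nei's standard genetic distance between two breeds.

    x, y: lists (one entry per locus, same order) of {allele: frequency} dicts.
    """
    jx = sum(p * p for locus in x for p in locus.values())
    jy = sum(p * p for locus in y for p in locus.values())
    jxy = sum(lx.get(a, 0.0) * py for lx, ly in zip(x, y) for a, py in ly.items())
    return -math.log(jxy / math.sqrt(jx * jy))

def shared_allele_distance(x, y):
    """One minus the proportion of shared alleles, averaged over loci."""
    shared = sum(min(lx.get(a, 0.0), py) for lx, ly in zip(x, y) for a, py in ly.items())
    return 1.0 - shared / len(x)

# Invented allele frequencies at two loci for two hypothetical breeds
breed_a = [{120: 0.5, 124: 0.5}, {98: 1.0}]
breed_b = [{120: 0.5, 126: 0.5}, {98: 0.5, 102: 0.5}]

d_nei = nei_standard_distance(breed_a, breed_b)
d_shared = shared_allele_distance(breed_a, breed_b)
print(round(d_nei, 4), d_shared)
```

Computing such a value for every pair of breeds gives the distance matrix that tree-building programs take as input.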
Multivariate correspondence analysis
The pattern of population differentiation can be evaluated by factorial correspondence analysis
(FCA) of the individual multilocus scores using GENETIX software
(http://www.univ-montp2.fr/~genetix/genetix/genetix.htm). This multivariate method is analogous
to principal component analysis and can condense the information from a large number of alleles
into fewer synthetic variables; it is appropriate for discrete variables. The factorial analysis
gives a simultaneous representation of breeds and loci as a cloud of points in a metric space.
For this approach, the allele frequencies at all the loci are used as variables, and the
population clusters are identified graphically.
Breed assignment
The multilocus genotypes of individuals can be used to analyze the assignment accuracy of
individuals to their respective population using the GENECLASS2 software. This program includes
two types of methods: likelihood-based methods and genetic distance-based methods. In the first
type of methods, individuals are assigned to the population in which the likelihood of their
genotype is highest. In the second type, individuals are assigned to the (genetically) closest
population.
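The likelihood-based idea can be sketched as follows, assuming Hardy-Weinberg proportions within each population and independence between loci. This is not the GENECLASS2 implementation; the `floor` value that replaces zero allele frequencies, and all breed names, loci and frequencies, are assumptions made for illustration.

```python
import math


def genotype_log_likelihood(genotype, freqs, floor=0.01):
    """Log-likelihood of a multilocus genotype in one population,
    assuming Hardy-Weinberg proportions and independent loci."""
    loglik = 0.0
    for locus, (a1, a2) in genotype.items():
        # Alleles not seen in the reference sample get a small floor
        # frequency so that the likelihood does not collapse to zero.
        p = max(freqs[locus].get(a1, 0.0), floor)
        q = max(freqs[locus].get(a2, 0.0), floor)
        loglik += math.log(p * p if a1 == a2 else 2 * p * q)
    return loglik


def assign(genotype, populations):
    """Return the population in which the genotype is most likely."""
    return max(populations,
               key=lambda name: genotype_log_likelihood(genotype, populations[name]))


# Hypothetical reference allele frequencies for two breeds.
populations = {
    "BreedA": {"L1": {120: 0.8, 122: 0.2}, "L2": {200: 0.9, 202: 0.1}},
    "BreedB": {"L1": {120: 0.1, 122: 0.9}, "L2": {200: 0.3, 202: 0.7}},
}
animal = {"L1": (120, 120), "L2": (200, 202)}

print(assign(animal, populations))
```

In practice the animal being assigned is usually left out when the reference frequencies of its own breed are computed, so that it does not bias its own assignment.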
Advantages of microsatellite as markers for genetic diversity studies


Microsatellites are considered the most powerful genetic markers as they overcome many of the
difficulties associated with other types of marker. Their short length makes them amenable to
amplification by PCR and subsequent separation either on polyacrylamide gels, which resolve
alleles differing by as little as a single base, or by genotyping on an automated sequencer. All
these features make them valuable for linkage analysis, genome mapping, parentage control and
phylogenetic analysis. Due to this, microsatellite DNA markers have become the preferred tool
for analysis of genetic variation in closely related populations of various animal species.
Microsatellite markers have essentially replaced other markers because of their highly
polymorphic nature.
The whole procedure of microsatellite genotyping using PCR can be automated, and as many
as 10 microsatellite loci can be analyzed simultaneously by multiplexing.
In contrast to RAPD and other multilocus genetic fingerprinting methods, where results may be
difficult to interpret genetically and difficult to compare and replicate in different
laboratories, results from microsatellite analysis are easily comparable between
laboratories.
Technically, microsatellite typing is easy to standardize and reproducible, and
genotyping can be done on most tissues and cell types with the same ease.
For most livestock species there are now many microsatellite markers to choose from, far
more than are necessary to compare allele frequencies between breeds.
Mitochondrial DNA (mtDNA) analysis
The mtDNA genome is approximately 16 kb in size in livestock species and is divided into two
sections: coding and noncoding. The noncoding region, also called the control region or
displacement loop (D-loop) region, contains a highly polymorphic segment called the hypervariable
region (HVR), which is approximately 250 bp in length. The coding region is responsible for the
production of various biological molecules involved in the process of energy production in the cell.
The control region is responsible for regulation of the mtDNA molecule. The HVR of mtDNA
within the D-loop region has been found to be highly polymorphic, or variable, within species.
The mtDNA is consequently a powerful tool for establishing the levels of genetic diversity and
phylogenetic structure within a species. mtDNA also reveals the recent demographic processes
affecting a population, for example whether a population has undergone a recent demographic
expansion or has a more complex history. Each individual has a single haplotype; therefore,
phylogenetic analyses are relatively straightforward to interpret.
The mtDNA sequencing is often used in cases where biological evidence may be degraded or
small in quantity. Cases in which hairs, bones, or teeth are the only evidence retrieved from a crime
scene are particularly well-suited to mtDNA analysis. This review will examine the process of
mitochondrial DNA typing, including the interpretation of results, the phenomenon of
heteroplasmy, the mtDNA population database, presentation of mtDNA population statistics,
quality assurance issues, and testimonial experience.
For genetic characterization based on mitochondrial DNA (mtDNA), animals with a
common maternal origin should not be sampled.
Sampling material
Typical sources of DNA recovered may include blood, semen, hair, bones, teeth, and body fluids
such as saliva.
Sample preparation
Ancient samples should be cleaned as per standard procedure prior to the mtDNA sequencing
process to remove contaminating materials surrounding or adhering to the sample. This step is
important to ensure that the sequence of the DNA obtained from the sample originates from the
sample and not from exogenous DNA.
DNA extraction
Standard procedure (phenol-chloroform method) should be followed for DNA extraction.
Genotyping
A standard polymerase chain reaction (PCR) procedure should be followed to amplify the small amount
of DNA, and the amplified product should then be sequenced.
Data analysis
The nucleotide sequences obtained from sequencing should be analyzed further for sequence
alignment, identification of nucleotide variations, generation of haplotypes, estimation of
population indices such as gene diversity, nucleotide diversity and pairwise nucleotide
differences, calculation of within-breed and among-breed differences through AMOVA,
determination of demography and population expansion, estimation of phylogenetic
relationships among different breeds of a species, identification of ancestral and descendant
haplotypes, and estimation of coalescence age using the estimator rho for divergence time, by using
various software programmes.
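Two of the population indices named above can be sketched directly from aligned sequences. The code assumes gap-free sequences of equal length and uses the usual sample estimators: haplotype (gene) diversity h = n/(n-1) x (1 - sum of squared haplotype frequencies), and nucleotide diversity as the mean number of pairwise differences per site. The sequences are made-up toy data, not real D-loop haplotypes.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """h = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    counts = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def nucleotide_diversity(seqs):
    """Mean number of pairwise nucleotide differences per site."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs) / length


# Toy aligned sequences (hypothetical, 8 bp each).
seqs = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACGAACGA"]

print(round(haplotype_diversity(seqs), 4))
print(round(nucleotide_diversity(seqs), 4))
```

Dedicated population genetics packages report the same indices together with their sampling variances, which this sketch omits.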
Comparative analysis of data should be performed essentially to define the new breeds and
assess the population structure at regular intervals in order to take necessary steps for the
prioritization of conservation.
Other Markers
Y-chromosomal markers
Y-chromosomal variation is a powerful tool with which to trace gene flow by male introgression. It
is the most powerful marker in human population genetics and is used more and more in domestic
animal species.
Single-nucleotide polymorphisms (SNP)
As the name indicates, a SNP is a DNA sequence variation that occurs through a change in the
nucleotide at a single location within the genome of a species or breed. SNPs usually have only two
alleles. Generally, SNPs can occur throughout the genome and may represent either neutral or
functional genetic diversity.
Copy number variations (CNV)
Genetic studies of the human genome indicate the presence of variation in copy number of certain
chromosomal segments, as well as a relationship between copy number and phenotypic variation.
It is anticipated that this category of genetic variation will also prove to be relevant for studying the
diversity of livestock.
Genome sequencing
Next-generation genomic technologies, several of which have already passed the proof-of-principle stage, will expand further the scope of molecular studies and likely allow in the near
future the affordable whole-genome sequencing of individual animals. Predictably, this will open
new avenues of research that lead to new insights into diversity and the estimation of conservation
values. Most notably, dense genetic maps allow the demarcation of footprints or signatures of
selection, while the growing amount of knowledge on genotype-phenotype relationships will also
reveal novel aspects of functional diversity. Clearly, this will require new software and hardware
for extracting and storing meaningful information from the huge amount of DNA sequence.

Recommendations of FAO for genetic characterization


FAO and the ISAG-FAO Advisory Group on Animal Genetic Diversity recommend that:
Current activities to genetically characterize the genome and establish the genetic
relationships among the breeds of each domestic animal species should be continued and
completed as a matter of urgency and should be complemented with phenotypic
characterization;
These guidelines and the recommendations herein should be taken into account during setup
and execution of studies of the diversity of AnGR, while monitoring closely the advances in
molecular technology and bioinformatics;
Particular attention should be given to standardization of results from existing and planned
studies for integration into a global analysis of AnGR diversity;
Breeds from white spots on the current phylogeographic map and samples relevant for
joining datasets should be analysed;
New frameworks for international cooperation should be established to create and distribute
reference samples of DNA for standardization and to develop a centralized database to store
and provide access to data;
National Coordinators for the Management of AnGR and National Advisory Committees for
AnGR should be made aware of all diversity projects at whatever geographic level, so that
results can contribute to the planning and development of national conservation and
sustainable use activities and so that FAO can help facilitate coordination among projects,
exchange information and promote funding;
New genomic tools for characterization of diversity that avoid ascertainment bias should be
implemented, and methods should be developed for combining datasets generated by
established and new technologies, respectively.
Conservation Genomics
Recently, taking advantage of the availability of draft genome sequences of different livestock species
(chicken, 2004; cattle, 2005; rabbit, 2006; pig, 2007; sheep, 2009; camel, 2010), SNP arrays have been
developed for chicken, cattle, pig and sheep, and are under development for other species.
Multiplex SNP genotyping permits simultaneous high-throughput investigation of hundreds of
thousands of loci with high measurement precision and provides a wealth of information useful for
many aspects of livestock breed conservation. Their genomic abundance and amenability to
cost-effective high-throughput genotyping have made SNPs the most preferred class of genetic
marker in genome-wide association studies, genomic selection, and the dissection of quantitative
traits. High-density panels of SNP markers are being used to assess the true extent of autosomal
diversity among cattle breeds. A whole-genome SNP panel applied to different breeds of cattle identified
several levels of population substructure, with the greatest level of genetic differentiation detected
between Bos taurus and Bos indicus breeds (McKay et al. 2008). The efficiency of clustering breeds
using SNP loci indicated that a large number of SNP markers are required for genetic diversity
studies to replace the 30 microsatellite markers recommended by FAO. SNP arrays also offer a way
to assess the extent of genetic diversity existing in breeding stocks so as to take care of future
needs. Global analysis of SNPs in chicken revealed that 50% or more of the genetic diversity in
ancestral breeds is absent in commercial pure lines (Muir et al. 2008). The inability to incorporate
ancestral alleles indicates that the extent of genetic diversity and the number of breeds incorporated
in the breeding stock should be large. The availability of dense SNP sets is driving investigations
into the pattern of linkage disequilibrium (LD), the consequences of selection, and genome-wide
selection as a method to accelerate genetic gain in livestock. The use of gene markers has attracted
researchers in recent years, as variation in their allele frequencies may provide information related
to functional differences between breeds. Phylogenetic studies using gene markers or SNPs
associated with traits of interest are relevant for breed utilization and conservation for future
production needs.
Conservation Genetics for Selection of Breeds
Large numbers of indigenous breeds are in danger of extinction and hence need conservation, but
the available resources are limited. Making a decision about which breeds should be targeted for
conservation is challenging. A conservation strategy for animal genetic resources has to be directed
towards maintenance of maximum genetic diversity in the global gene pool (Maijala and Kolstad
1992, Barker 1994) while maintaining within-breed diversity to reduce inbreeding and preserve
genetically differentiated groups. Diversity is generally measured as 1 - f [f = average kinship or
coancestry in a subpopulation] or 1 - F [F = average inbreeding in a subpopulation]. In order to
manage genetic diversity it is best to minimize the average kinship in a population.
Maximization of genetic diversity for the next generation is achieved by minimizing the average
relatedness of the parents. The principle of minimizing relatedness applies not only to the choice of
parents for producing the next generation in breeding programmes, but also to the choice of candidate
breeds for conservation. Eding and Meuwissen (2001) worked out the principles to estimate
average relatedness between different breeds based on microsatellite markers and to determine the
optimal contributions of different breeds to a gene pool. Eding et al. (2002) also quantified the
contribution of each breed to the maximum amount of genetic diversity in order to identify important
breeds for the conservation of genetic diversity in Northern European cattle breeds. In contrast to the
kinship approach, Weitzman's method for selecting breeds for conservation gives higher priority
to inbred populations. However, both methods rank the breeds according to their diversity content.
Piyasatian and Kinghorn (2003) developed a method to balance genetic diversity, genetic merit and
population viability in the establishment of conservation programmes. The method gives
appropriate balance to the three issues: diversity, merit and viability. To sum up, conservation
programmes should be based on wider information; in particular, they should be based on a relatively
large set of genetic marker loci giving good coverage of the genome. Further, other indicators
of genetic variability and genetic uniqueness should also be included. There is a need to complete
the work of detailed genetic characterization and select the breeds for conservation by pooling the
marker data with available information on degree of endangerment, traits of economic and
ecological value, specific adaptive features, presence of unique genes/phenotypes, and cultural
and historical value. The decision on conservation of breeds can be taken on the basis of the
independent selection principle, i.e. a breed can be conserved if it reaches a maximum value for at
least one of the above-mentioned criteria.
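The measure 1 - f can be made concrete with a small sketch. Given a matrix of average kinships between and within breeds and a vector of breed contributions c that sum to one, the average kinship of the conserved pool is the weighted sum of c_i c_j f_ij over all breed pairs, and diversity is one minus that value. The kinship values and breed labels below are hypothetical, and the code only compares two candidate sets; it does not perform the constrained optimization described by Eding and Meuwissen.

```python
def mean_kinship(contrib, kinship):
    """Average kinship f of a pool: sum over i, j of c_i * c_j * f_ij."""
    breeds = list(contrib)
    return sum(contrib[i] * contrib[j] * kinship[i][j]
               for i in breeds for j in breeds)


def diversity(contrib, kinship):
    """Genetic diversity of the pool, measured as 1 - f."""
    return 1.0 - mean_kinship(contrib, kinship)


# Hypothetical marker-estimated kinships between three breeds.
kinship = {
    "A": {"A": 0.30, "B": 0.15, "C": 0.02},
    "B": {"A": 0.15, "B": 0.20, "C": 0.03},
    "C": {"A": 0.02, "B": 0.03, "C": 0.25},
}

pool_ab = {"A": 0.5, "B": 0.5, "C": 0.0}   # two closely related breeds
pool_ac = {"A": 0.5, "B": 0.0, "C": 0.5}   # two distantly related breeds

print(round(diversity(pool_ab, kinship), 4))
print(round(diversity(pool_ac, kinship), 4))
```

In this toy case, conserving the two distantly related breeds retains more diversity than conserving the two closely related ones, even though breed B has the lowest within-breed kinship.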
Conclusion
Genetic variability is a major concern in defining any livestock breed and in preserving the maximum
amount of genetic diversity. Generally, by genetic characterization, we assess the genetic
constitution of a breed or population of a species. It assesses the genetic uniformity, admixture or
subdivisions, inbreeding, or introgression in the population. It is also helpful in providing insights
into breed formation, informing about closest wild ancestral species and localization of the site of
domestication. Phylogenetic relationships of populations based on genetic analysis unravel the
evolutionary history of the breeds or populations. Therefore, it is important to characterize different
breeds so that we can know how unique or different a breed is from other native populations. The
genetic characterization is a further step to answer questions on taxonomy, evolution,
domestication processes, management of genetic resources and setting conservation plans for their
effective utilization. Through this, we can prioritize the breeds for conservation using molecular
data and monitor its status in the defined geographical region.

References

Acharya R M and Bhat P N. 1984. Livestock and poultry genetic resources in India. IVRI Research Bulletin
No. 1. Indian Veterinary Research Institute, Izatnagar.
Barker J S F. 1994. Animal breeding and conservation genetics. In Loeschcke V., Tomiuk J. and Jain S.K. Eds.,
Conservation Genetics, Birkhauser Verlag, Basel, Switzerland 381.
Eding H, Crooijmans R P M A, Groenen M A M and Meuwissen T H E. 2002. Assessing the contribution of
breeds to genetic diversity in conservation schemes. Genetics Selection Evolution 34: 613.
Eding J H and Meuwissen T H E. 2001. Marker based estimates of between and within population kinships
for the conservation of genetic diversity. Journal of Animal Breeding and Genetics 118: 141.
FAO. 2007. The State of the World's Animal Genetic Resources for Food and Agriculture, edited by Barbara
Rischkowsky and Dafydd Pilling. Rome.
Hall S J and Bradley G. 1995. Conserving livestock breed diversity. Tree 10: 267.
Maijala K and Kolstad N. 1992. Gene banks for livestock conservation. In Sandlund O.T., Hindar K. and
Brown A.H.D. Eds., Conservation of Biodiversity for Sustainable Development. Scandinavian University
Press, Oslo. pp. 230.
McKay S D, Schnabel R D, Murdoch B M, Matukumalli L K, Aerts J, Coppieters W, Crews D, Neto E D, Gill C
A, Gao C, Mannen H, Wang Z, Van Tassell C P, Williams J L, Taylor J F and Moore S S. 2008. An assessment
of population structure in eight breeds of cattle using a whole genome SNP panel. BMC Genetics 9: 37.
Muir W M, Wong G K, Zhang Y, Wang J, Groenen M A M, Crooijmans R P M A, HendrikJan M, Zhang H,
Okimoto R, Vereijken A, Jungerius A, Albers G A A, Lawley C T, Delany M E, MacEachern S and
Cheng H H. 2008. Genome-wide assessment of worldwide chicken SNP genetic diversity indicates
significant absence of rare alleles in commercial breeds. Proceedings of the National Academy of Sciences
USA 105: 17312.
Nivsarkar A E, Vij P K and Tantia M S. 2000. Animal Genetic Resources of India: Cattle and Buffalo. pp. 50-54
and 135-139. Directorate of Information and Publications of Agriculture, Indian Council of Agricultural
Research, New Delhi.
Piyasatian N and Kinghorn B P. 2003. Balancing genetic diversity, genetic merit and population viability in
conservation programmes. Journal of Animal Breeding and Genetics 120: 137.


8
Microsatellite Markers for Genetic Diversity Analyses of Farm Animals
Reena Arora
ICAR-National Bureau of Animal Genetic Resources, Karnal (Haryana)
________________________________________________________________________________________
Genetic diversity plays a very important role in the survival and adaptability of a species. It is required
to meet current production needs in various environments, to allow sustained genetic improvement,
and to facilitate rapid adaptation to changing breeding objectives (Notter, 1999). A major drawback
in the formulation and implementation of conservation, breeding and management policies for Indian
livestock breeds is the lack of information regarding their current genetic status. Over the past
decade microsatellite markers have proven to be useful in genetic diversity studies in several
livestock species (Acosta et al., 2013; Arora et al., 2011; Kumar et al., 2006). The awareness of the
importance of this diversity at the phenotypic level has led to the assessment of diversity at the
genetic level as well. Genetic characterization enables the prioritization of breeds for conservation.
The amount of genetic divergence between populations is regarded as a major criterion for deciding
their uniqueness and therefore prioritizing their conservation (Eding et al., 2002).
Microsatellite markers have been liberally used globally to measure genetic diversity, gene flow,
migration and effective population size in livestock breeds (Kantanen et al., 2000; Peter et al., 2007,
Cinkulev et al., 2008). A plethora of information has been generated on the genetic characterization
of Indian livestock during the last decade, using neutral microsatellite markers. Coancestry and
kinship between breeds has also been determined through the use of microsatellite markers. Past
genetic bottlenecks have been detected in several livestock breeds using microsatellites. These
markers are also being used for assigning individuals to the population of origin as well as for
parentage verification. They are the best suited markers for differentiating closely related breeds.
Microsatellites- Markers of choice
In spite of the growing competition from new genotyping and sequencing techniques, the
microsatellite markers are still regarded as the most powerful DNA tools for genetic analysis owing
to their several unique characteristics and are globally being exploited to establish genetic profiles
of animal genetic resources. Since their discovery, microsatellites have been used in mapping
programmes and by population biologists for studies of population genetic structure and kinship
investigations.
Microsatellites have been recommended by FAO as first priority molecular tools for the
Measurement of Domestic Animal Diversity (MoDAD). The term microsatellite was introduced by
Litt and Luty (1989) to characterize simple sequence motifs, one to six nucleotides in length
(mono-, di-, tri-, tetra-, penta- and hexanucleotides), repeated in tandem. For example,
mononucleotide: AAAAAAAAAAA would be referred to as (A)11
dinucleotide: GTGTGTGTGTGT would be referred to as (GT)6
trinucleotide: CTGCTGCTGCTG would be referred to as (CTG)4
tetranucleotide: ACTCACTCACTCACTC would be referred to as (ACTC)4
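The (motif)n notation can be derived mechanically from a sequence. The following sketch uses a regular expression to find perfect tandem repeats of motifs one to six bases long; the function name and the `min_repeats` threshold of four are assumptions made for illustration, and imperfect or compound repeats are not handled.

```python
import re


def find_microsatellites(seq, min_repeats=4):
    """Return (start, motif, repeat_count) for perfect tandem repeats
    of 1-6 base motifs occurring at least `min_repeats` times."""
    # The lazy quantifier makes the shortest possible motif win, so
    # GTGTGTGT is reported as (GT)4 rather than (GTGT)2.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits


for example in ("AAAAAAAAAAA", "GTGTGTGTGTGT", "CTGCTGCTGCTG", "ACTCACTCACTCACTC"):
    for start, motif, n in find_microsatellites(example):
        print("(%s)%d" % (motif, n))

# A repeat embedded in hypothetical unique flanking sequence.
print(find_microsatellites("CCATG" + "GT" * 6 + "ACCT"))
```

The flanking sequence on either side of a hit is what primers are designed against, so the start position is returned along with the motif.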
Microsatellites are also known as simple sequence repeats (SSR), short tandem repeats (STR) and
sequence tagged microsatellite repeats (STMR). They occur at a frequency of about one SSR per 10 kb of
DNA, giving a total of 50,000 - 100,000 in the mammalian genome. They are found in a
wide variety of eukaryotes including plants. Microsatellites occur very frequently and randomly
in most eukaryotic DNA. Human genomic DNA contains on average one microsatellite every
6 kb (Beckman and Weber, 1992).
Major advantages of these highly polymorphic microsatellites are their locus specificity,
abundance and random distribution over the genome, co-dominant inheritance, ease and speed of
their application, and suitability for automated analysis. An advisory group of the International
Society for Animal Genetics (ISAG), in collaboration with FAO, has established, for each species of
interest, a set of microsatellite markers to be used as the standard set for the calculation of genetic
distances. Adherence to such recommendations allows for reasonable comparison of parallel or
overlapping studies and helps combine results in meta-analyses. To attain a certain precision for
different levels of resolution or discrimination among breeds, it is recommended to sample at least
25 animals per breed (mainly blood samples, although hair or tissue may also be taken) and investigate 25
microsatellite loci with 4-10 alleles per locus. The primer sequences and map position of each of
these markers can be obtained from Domestic Animal Diversity Information System (DAD-ISMoDAD) and are also available at site http://dad.fao.org/dad-is/data/molecula/index.html.
Isolation of microsatellite markers
Tandem repeat sequences (microsatellites) are first detected from the entire genome and their
unique flanking sequences are used to develop primers for amplification of the specific
microsatellites by PCR. Broadly, two strategies are used for the isolation of microsatellite markers.
(A) Cosmid derived microsatellite markers
In this strategy, the genomic DNA, after digestion with restriction enzymes, is cloned into suitable
vectors, mostly cosmids, thus forming a cosmid genomic library. The cosmids are then screened
with a labelled (CA)n or (GT)n polynucleotide probe. The clones that hybridize to the probe are
detected by autoradiography. The positive clones are isolated and the insert (microsatellite) which
they harbour is sequenced and characterized. Appropriate primers are designed from the flanking
regions.
(B) Microdissected chromosome derived microsatellite markers
In this methodology, a chromosome spread is obtained from a blood culture and the chromosome
of interest is identified under a microscope. This chromosome is dissected using a
micromanipulator. The microdissected chromosomal fragments are then used to construct a genomic
DNA library, which is screened with radiolabelled (CA)n or (GT)n probes. Positive clones are
isolated and subjected to PCR amplification. The PCR products are sequenced and the sequences
checked for uniqueness to develop PCR primers. A modification of this method involves
amplification of the microdissected chromosomal fragments by PCR using degenerate
oligonucleotide primers. Biotinylated (CA)n probes are added to the amplified products. After
denaturation and annealing, the annealed DNA is added to streptavidin paramagnetic particles and
incubated to capture DNA fragments hybridized to the biotinylated (CA)n probes. The bound DNA is
eluted and amplified using appropriate primers. The amplified products are purified and
sequenced to be used as markers.
Evolution of microsatellites
It is believed that errors occurring during DNA replication add extra sets of these
repeated sequences to the strand. Although a clear understanding of the origin and
evolution of microsatellites is still not available, the number of repeats usually increases or decreases by a
single repeat unit, and sometimes by more. Simple repeats are considered to be generated mostly
by slipped strand mispairing (Moxon and Wills, 1999) or by insertions or substitutions (Zhu et al.
2000).
Slipped strand mispairing
In this process, the number of microsatellite repeats increases or decreases during DNA replication.
An increase in the number of microsatellite repeats occurs when slippage occurs on the newly
synthesized strand in its binding to the template strand and the DNA polymerase adds the
nucleotides to fill in the gap, thereby increasing the strand by one repeat. Decrease in the repeat
number occurs when the old or template strand slips resulting in the repair enzymes deleting a
repeat. DNA polymerase has a very high rate of slippage on templates containing simple repeats in
vivo, but most of these errors are corrected by cellular mismatch repair systems. The instability
of simple repeats observed in some human diseases may be a consequence of either an increased
rate of DNA polymerase slippage or a decreased efficiency of mismatch repair (Strand et al. 1993).
Insertions and substitutions
Slippage of DNA polymerase depends on mispairing of tandem repeats during DNA replication, so
it may not occur when there are few tandem repeats. Studies of slippage mutations show that they
are more common in loci with longer repeats. Loci with fewer than five repeats are rarely
polymorphic. If they incur a few mutations which increase the number of repeats, their
polymorphism levels increase. For slippage to occur on longer repeats, some mechanism other than
slippage must act on the shorter repeats from which the longer repeats evolved. Microsatellite
sequences are exceptionally vulnerable to spontaneous insertion or deletion mutations, and non-triplet microsatellites located in coding sequences are expected to introduce frameshift
mutations at high frequency. Substitutions are much more common than insertions and they are
the dominant source of new two-repeat loci. Microsatellites have been estimated to mutate at a
rate of 10^-3 to 10^-5 mutations per gamete.
Theoretical Models of Microsatellite Mutations
Theoretical mutation models have been derived to explain the evolutionary processes of
microsatellites from which genetic distances and population differentiation are estimated. The
Infinite Allele Model (IAM) was given by Kimura and Crow (1964). According to this model, a
mutation can involve any number of tandem repeats and always results in a new allele state not
previously existing in the population. But this model does not conform to the slipped strand
mispairing mechanism responsible for microsatellite length variation. This mechanism leads to
small changes in the repeat numbers, and alleles may mutate towards allele states that are already
present in the population. In order to explain the discrepancies in the mutational processes, the
Step-wise Mutation Model (SMM) was introduced in the 1970s. The model assumes that the entire
sequence of allelic states can be expressed as integers and mutation results in a change in one repeat
unit either by insertion or deletion (Kimura and Ohta, 1978). In addition to this model, DiRienzo et
al. (1994) described the Two Phase Model (TPM), where a limited proportion of mutations involve
several repeats.
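The difference between the three models can be illustrated with a single mutation step. In this sketch an allele is represented by its repeat count; under the IAM each mutation returns a label never used before, under the SMM the count changes by exactly one repeat, and under the TPM most changes are single-step with an occasional larger jump. The proportion of single-step mutations (`p_single`), the maximum jump (`max_step`) and the label pool are illustrative assumptions, not values taken from the cited papers.

```python
import itertools
import random

# Hypothetical pool of allele labels that have never been seen before (IAM).
_new_labels = itertools.count(1000)


def mutate(allele, model, rng, p_single=0.9, max_step=5):
    """Return the allele state after one mutation event under a model."""
    if model == "IAM":
        return next(_new_labels)            # always a brand-new state
    if model == "SMM":
        step = 1                            # exactly one repeat unit
    elif model == "TPM":
        # Mostly single-step, occasionally a jump of several repeats.
        step = 1 if rng.random() < p_single else rng.randint(2, max_step)
    else:
        raise ValueError("unknown model: %s" % model)
    return max(1, allele + rng.choice((-1, 1)) * step)


rng = random.Random(1)
for model in ("IAM", "SMM", "TPM"):
    print(model, [mutate(12, model, rng) for _ in range(8)])
```

Under the SMM and TPM an allele can mutate back to a repeat count that already exists in the population, which is the source of size homoplasy; under the IAM that cannot happen.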
Limitations
Null Alleles
Failure of amplification of some alleles due to mutations in the binding regions results in reduction
or loss of PCR products. These are termed as null alleles and may lead to serious underestimation
of heterozygosity. In a heterozygote of two different alleles, if one allele fails to amplify due to
primer annealing difficulties then the phenotype will appear as a single banded homozygote. This
problem may be overcome by designing new primers but it is a very tedious task.
Slippage
This problem is due to the activity of the Taq polymerase used in the PCR. During PCR
amplification, the thermostable polymerase tends to slip, leading to production of differently sized
products. These products are less intense and are also referred to as shadow bands. Further,
Taq polymerase has a tendency to add an additional, non-templated adenine nucleotide at the 3' end of the amplified PCR
products. This can also lead to difficulties in scoring bands.
Homoplasy
Homoplasy can be defined as the co-occurrence of alleles that are identical in state but not identical by descent. If two
alleles are inherited without any mutation from the same ancestral allele, they are identical by
descent. But two alleles may have the same structure and even the same sequence without
having been inherited from the same ancestral allele. Such alleles are only identical in state (Jarne and
Lagoda, 1996).
Applications of Microsatellite Markers
Biodiversity Analysis:
By analyzing the microsatellite profiles for each individual across different loci inferences can be
made about overall magnitude of genetic diversity within breeds. The priority breeds for
conservation should be the ones with the largest within breed diversity. Microsatellites are most
suitable to determine the relationships, expressed as genetic distances among breeds, possible
levels of inbreeding in each breed, gene flow in livestock populations, most diverse and distinctive
i.e. genetically unique breeds/populations for higher priority in conservation programmes and
relative contribution of each breed to the total (species) genetic diversity. These markers have been
successfully used for differentiation of closely related breeds and assignment of individuals to
specific breeds.
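The basic within-breed diversity measures follow directly from the scored genotypes. As a minimal sketch for one locus in one breed, the code below computes allele frequencies, the number of alleles, observed heterozygosity (the proportion of heterozygous animals) and expected heterozygosity (one minus the sum of squared allele frequencies). The genotypes are hypothetical allele sizes in base pairs.

```python
from collections import Counter


def locus_summary(genotypes):
    """genotypes: list of (allele1, allele2) tuples for one locus.
    Returns allele frequencies, number of alleles, Ho and He."""
    n = len(genotypes)
    counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in counts.items()}
    ho = sum(a != b for a, b in genotypes) / n      # observed heterozygosity
    he = 1.0 - sum(p * p for p in freqs.values())   # expected heterozygosity
    return freqs, len(freqs), ho, he


# Hypothetical genotypes of four animals at one microsatellite locus.
genotypes = [(120, 122), (120, 120), (122, 124), (120, 122)]

freqs, n_alleles, ho, he = locus_summary(genotypes)
print(freqs, n_alleles, ho, he)
```

Averaging these values over all loci gives the breed-level estimates that are compared between breeds; a marked deficit of observed relative to expected heterozygosity may point to inbreeding, population subdivision or null alleles.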
Breed demarcation and phylogenetic studies:
Microsatellite markers have been successfully used to determine the genetic variation between
breeds. Characterization of breeds is necessary for the development of conservation programmes,
to determine which breeds should be conserved. Pihkanen et al. (1996) used microsatellite markers
to estimate dog breed differentiation. Microsatellites have been successfully used to assess the
genetic variation between various cattle breeds (Moazami-Goudarzi et al., 1997; Martin-Burriel et al.,
1999; Schmid et al., 1999; Kantanen et al., 2000). The relationships among breeds of species other than
cattle have also been estimated, viz., goat (Saitbekova et al., 1999), horse (Bjornstad et al., 2000) and
donkey (Jordana et al., 2001). Microsatellite loci have been successfully used to reconstruct
phylogenetic relationships among populations. Ritz et al. (2000) determined the phylogenetic
relationships in the tribe Bovini using 20 microsatellite markers. Takezaki and Nei (1996) have
suggested that microsatellite DNA is very useful for clarifying the evolutionary relationship of
closely related populations.
Parentage testing:
Microsatellite typing can be used as a tool for identity or paternity testing by detection of
hypervariable sequences. Identity testing and parentage determination are useful in artificial
insemination and progeny testing programmes and also in paternity related disputes. DNA
analysis allows a far greater accuracy of parent identification through comparison of microsatellite
sequences of an individual and its candidate parents. A DNA-based technique can be used to
identify parentage in situations with multiple sire matings. In addition, these molecular markers
also serve as a useful tool for animal identification, particularly for verification of the semen used
for artificial insemination. ISAG has recommended panels of microsatellite markers for parentage
verification in horse, dog, cattle, sheep, goat and pig
(www.isag.us/Docs/consignmentforms/02_PVpanels_LPCGH.doc).
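The comparison of an individual with its candidate parents rests on Mendelian exclusion: at every locus the offspring must share at least one allele with a true parent. A minimal sketch of that check follows; the locus names and genotypes are hypothetical, and this is not the ISAG procedure itself.

```python
def excluded_loci(offspring, candidate_parent):
    """Return the loci at which the candidate shares no allele with the offspring.

    Both arguments map locus name -> (allele1, allele2).
    Loci not typed in the candidate are skipped.
    """
    mismatches = []
    for locus, child_alleles in offspring.items():
        if locus not in candidate_parent:
            continue
        if not set(child_alleles) & set(candidate_parent[locus]):
            mismatches.append(locus)
    return mismatches

# Hypothetical genotypes (allele sizes in bp) at three hypothetical loci
calf = {"LOC_A": (180, 182), "LOC_B": (217, 219), "LOC_C": (83, 89)}
sire_1 = {"LOC_A": (182, 188), "LOC_B": (219, 219), "LOC_C": (89, 91)}
sire_2 = {"LOC_A": (178, 188), "LOC_B": (217, 223), "LOC_C": (91, 93)}

print(excluded_loci(calf, sire_1))  # no mismatching loci: sire_1 is not excluded
print(excluded_loci(calf, sire_2))  # mismatches at LOC_A and LOC_C
```

A single mismatching locus can also arise from mutation, null alleles or genotyping error, so laboratories usually require more than one mismatch before excluding a candidate.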
Population Bottleneck:
A population bottleneck is a drastic reduction in the size of a population that may be caused by
natural calamities, habitat destruction or endemic disease. The decrease in population number
directly impacts the genetic diversity which also decreases. When populations are under strong
natural selection or artificial selection, only a subset of individuals in the population will reproduce
and therefore relatively few individuals contribute alleles to subsequent generations. Alleles for gene
regions that are not under selection are present in the post-selection population as a random subset
of the original allelic diversity. The probability of an allele being present in subsequent generations
increases with its frequency in the original population; high-frequency alleles therefore have a
greater probability of being present in the post-selection population than low-frequency alleles
(Luikart et al., 1998a,b). If selection pressure lasts for many generations, rare alleles will be lost
simply by chance, resulting in a post-selection population with fewer alleles and lower
heterozygosity than the original population. Unfortunately, it is often difficult to identify losses of
variability because levels of genetic variability prior to a population decline are generally unknown
(Spencer et al., 2000). A number of statistical methods now make it possible to investigate a
population's history without the need for information on past population sizes (Spencer et al., 2000).
These tests typically quantify deviations from expected patterns in allele sizes, allele numbers,
heterozygosity levels or allele distributions, often using microsatellite data, as these molecular
markers are important modern tools for estimating the level of genetic diversity in endangered
populations (O'Brien, 1994).
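The preferential loss of rare alleles during a bottleneck can be illustrated with a simple sampling argument. Assuming, for this sketch only, that the N surviving animals are a random sample of 2N gene copies from the original population, an allele of frequency p is retained with probability 1 - (1 - p)^(2N). The frequencies below are hypothetical.

```python
def retention_probability(freq, n_founders):
    """Probability that an allele of frequency `freq` is carried by at least one
    of `n_founders` diploid animals sampled at random (2N gene copies)."""
    return 1.0 - (1.0 - freq) ** (2 * n_founders)

def expected_alleles_retained(freqs, n_founders):
    """Expected number of alleles at a locus that survive the bottleneck."""
    return sum(retention_probability(p, n_founders) for p in freqs)

# Hypothetical locus with one common, one intermediate and three rare alleles
freqs = [0.60, 0.25, 0.05, 0.05, 0.05]
for n in (5, 20, 100):
    print(n, round(expected_alleles_retained(freqs, n), 2))
```

Because rare alleles contribute little to heterozygosity, allele number falls faster than heterozygosity after a bottleneck, which is the kind of deviation the tests mentioned above look for.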
Identification of disease carrier:
Many incurable diseases result from defects in the genome. DNA polymorphism occurring within a
gene helps in understanding the molecular mechanism and genetic control of several genetic and
metabolic disorders and allows the identification of heterozygous carrier animals. Identification of
carrier animals of weaver disease (progressive degenerative myeloencephalopathy) in cattle has
been accomplished using the TGLA116 microsatellite marker. Georges et al. (1993) performed an
extensive linkage study in a bovine pedigree segregating for the weaver condition and identified a
microsatellite locus closely linked to the weaver gene; by extension, the weaver locus was
assigned to bovine synteny group 13. Microsatellite TGLA116 can therefore be used to identify weaver
carriers and to select against this genetic defect.
Mapping of QTL:
One of the most important applications of microsatellites is the mapping of QTL by linkage. Such
mapping information, if available for genes of economic importance, can be used in breeding
programmes, either for within-breed manipulations such as marker-assisted selection (MAS) of young
sires or for between-breed introgression programmes. Microsatellites have been adopted so widely in
linkage mapping studies of farm animals that they are now the favored
polymorphic marker for this purpose. Microsatellite marker D21S4 has shown significant
association with effects on milk and protein yields in cattle. The presence of QTL for milk
production on five chromosomes (namely chromosome no. 1, 6, 9, 10 and 20) has also been
demonstrated in 14 US Holstein half-sib families using 159 microsatellites. Significant association of
microsatellite markers with somatic cell score (SCS, an indicator of susceptibility to mastitis),
productive herd life and milk production traits has also been established. Potential QTL for SCS, fat
yield, fat percentage and protein percentage have also been identified using microsatellites (Ron et
al., 1994; Ashwell et al., 1997). Characterization of QTL for economically important traits using
microsatellite markers will help in formulating more efficient breeding programmes using MAS.
The map would also help in identifying, isolating and modifying candidate genes to obtain animals
with a predetermined phenotype.
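The logic of a marker-trait association test within a half-sib family can be sketched as a simple contrast: daughters of a heterozygous sire are grouped by the marker allele they inherited from him, and the two group means are compared. The code below is a simplified illustration with hypothetical data and a Welch t statistic; it is not the statistical model used in the studies cited above.

```python
import math
from statistics import mean, variance

def sire_allele_contrast(records):
    """Compare daughters grouped by the sire marker allele they inherited.

    records: list of (sire_allele, phenotype) pairs for one half-sib family.
    Returns (allele_1, allele_2, mean difference, Welch t statistic).
    Each group needs at least two daughters and some phenotypic variation.
    """
    groups = {}
    for allele, value in records:
        groups.setdefault(allele, []).append(value)
    if len(groups) != 2:
        raise ValueError("expected exactly two sire alleles")
    (a1, y1), (a2, y2) = sorted(groups.items())
    diff = mean(y1) - mean(y2)
    se = math.sqrt(variance(y1) / len(y1) + variance(y2) / len(y2))
    return a1, a2, diff, diff / se

# Hypothetical family: sire alleles 154 and 160, phenotype in arbitrary units
family = [(154, 10.0), (154, 12.0), (154, 14.0), (160, 8.0), (160, 9.0), (160, 10.0)]
print(sire_allele_contrast(family))
```

A large absolute t value suggests that a QTL linked to the marker is segregating in that sire; real analyses also account for recombination between marker and QTL and for multiple testing.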
Conclusion
SNP markers are gradually replacing microsatellite markers for diversity analysis within species.
However, SNPs are not free of limitations such as ascertainment bias (Schlotterer, 2004). In addition,
existing genetic programs and computer applications are limited in their ability to process
the huge amounts of data generated in genome-wide SNP studies (Decker et al., 2009). Further,
diversity analyses using SNPs/microarrays involve high costs. Therefore, despite some limitations
in sampling methods, number of markers used and type of analyses, microsatellite-based studies
remain viable for the analysis of biodiversity and for the conservation and sustainable utilization of
livestock genetic resources, particularly the indigenous breeds. Although the information generated
from microsatellite data facilitates the outlining of genetic management and conservation programs
for livestock breeds/populations, additional information on population trends, economic
importance and specific adaptive features needs to be taken into consideration.
References

Acosta A.C, Uffo O, Sanz A, Ronda R, Osta R, Rodellar C, Martin-Burriel I and Zaragoza P. 2013. Genetic
diversity and differentiation of five Cuban cattle breeds using 30 microsatellite loci. Journal of Animal
Breeding and Genetics. 130: 79.
Arora R, Bhatia S, Mishra B.P. and Joshi B.K. 2011. Population structure in Indian sheep ascertained using
microsatellite information. Animal Genetics. 42: 242.
Ashwell M.S, Rexroad Jr C.E, Miller R.H, VanRaden P.M. and Da Y. 1997. Detection of loci affecting milk
production and health traits in an elite US Holstein population using microsatellite markers. Animal
Genetics. 28: 216.
Beckmann J.S. and Weber J.L. 1992. Survey of human and rat microsatellites. Genomics. 12: 627-631.
Cinkulov M, Popovski Z, Porcu K, Tanaskovska B, Hodzic A, Bytyqi H, Mehmeti H, Margeta V, Djedovic R,
Hoda A, Trailovic R, Brka M, Markovic B, Vazic B, Vegara M, Olsaker I. and Kantanen J. 2008. Genetic
diversity and structure of the West Balkan Pramenka sheep types as revealed by microsatellite and
mitochondrial DNA analysis. Journal of Animal Breeding and Genetics. 125: 417-426.
Decker J.E, Pires J.C, Conant G.C. et al. 2009. Resolving the evolution of extant and extinct ruminants with
high-throughput phylogenomics. Proc. Natl. Acad. Sci. USA. 106: 18644.
DiRienzo A, Peterson A.C, Garza J.C, Valdes A.M, Slatkin M. and Freimer N.B. 1994. Mutational process of
simple sequence repeat loci in human populations. Proc. Natl. Acad. Sci. USA. 91: 3166.
Eding H, Crooijmans R.P.M.A, Groenen M.A.M. and Meuwissen T.H.E. 2002. Assessing the contribution of
breeds to genetic diversity in conservation schemes. Genetics Selection Evolution. 34: 613-633.
Georges M, Dietz A.B, Mishra A, Nielsen D, Sargeant L.S, Sorensen A, Steele M.R, Zhao X, Leipold H,
Womack J.E. and Lathrop M. 1993. Microsatellite mapping of the gene causing weaver disease in cattle
will allow the study of an associated quantitative trait locus. Proc. Natl. Acad. Sci. USA. 90: 1058.
Jarne P. and Lagoda P.J.L. 1996. Microsatellites, from molecules to populations and back. TREE. 11: 424-429.
Kantanen J, Olsaker I. and Holm L.E. 2000. Genetic diversity and population structure of 20 North European
cattle breeds. Journal of Heredity. 91: 446.
Kimura M. and Crow J.F. 1964. The number of alleles that can be maintained in a finite population.
Genetics. 49: 725-738.
Kimura M. and Ohta T. 1978. Stepwise mutation model and distribution of allelic frequencies in a finite
population. Proc. Natl. Acad. Sci. USA. 75: 2868.
Kumar S, Gupta J, Kumar N, Dikshit K, Navani N, Jain P. and Nagarajan M. 2006. Genetic variation and
relationships among eight Indian riverine buffalo breeds. Molecular Ecology. 15: 593.
Litt M. and Luty J.A. 1989. A hypervariable microsatellite revealed by in vitro amplification of a dinucleotide
repeat within the cardiac muscle actin gene. American Journal of Human Genetics. 44: 397.
Luikart G, Allendorf F.W, Cornuet J.M. and Sherwin W.B. 1998a. Distortion of allele frequency distributions
provides a test for recent population bottlenecks. J. Hered. 89: 238.
Luikart G, Sherwin W.B, Steele B.M. and Allendorf F.W. 1998b. Usefulness of molecular markers for detecting
population bottlenecks via monitoring genetic change. Mol. Ecol. 7: 963.
Moxon E.R. and Wills C. 1999. DNA microsatellites: agents of evolution? Sci. Am. 94.
Notter D.R. 1999. The importance of genetic diversity in livestock populations of the future. J. Anim. Sci. 77: 61.
O'Brien S.J. 1994. A role for molecular genetics in biological conservation. Proc. Natl. Acad. Sci. USA. 91: 5748.
Peter C, Bruford M, Perez T, Dalamitra S, Hewitt G, Erhardt G. and the ECONOGENE Consortium. 2007.
Genetic diversity and subdivision of 57 European and Middle-Eastern sheep breeds. Animal Genetics.
38: 37-44.

Ron M, Band M, Yanai A. and Weller J.I. 1994. Mapping quantitative trait loci with DNA microsatellites in a
commercial dairy cattle population. Animal Genetics. 25: 259.
Schlotterer C. 2004. The evolution of molecular markers: just a matter of fashion? Nature Reviews Genetics.
5: 639.
Spencer C.C, Neigel J.E. and Leberg P.L. 2000. Experimental evaluation of the usefulness of microsatellite
DNA for detecting bottlenecks. Mol. Ecol. 9: 1517.
Strand M, Prolla T.A, Liskay R.M. and Petes T.D. 1993. Destabilization of tracts of simple repetitive DNA in
yeast by mutations affecting DNA mismatch repair. Nature. 365: 274-276.
Zhu Y, Strassmann J.E. and Queller D.C. 2000. Insertions, substitutions and the origin of microsatellites. Genet.
Res. 76: 227.


9
Mitochondrial DNA as a Marker for Genetic Diversity and Evolution in
Farm AnGR
Monika Sodhi, Amit Kishore and Manishi Mukesh
ICAR-National Bureau of Animal Genetic Resources, Karnal, Haryana

________________________________________________________________________________________

Livestock breeds have been shaped by human and natural selection since the beginning of
domestication thousands of years ago, so as to best fit environmental conditions and human
needs. Detailing the evolutionary and demographic history of domesticated animals has always
been a focus of research. Genetic diversity has been exploited in livestock species to identify
new traits developed in response to changes in environment, diseases or market conditions
(Erhardt and Weimann, 2007). The evolutionary potential of a species depends mainly on the
genetic variation of its populations, which is the consequence of a balance between evolutionary
and demographic processes generating either heterogeneity or homogeneity among local
populations. Understanding the evolutionary relationships among livestock breeds can reveal the
origin of animal husbandry and the distinction between wild and domesticated forms of a species, and
can help elucidate the events of bovine prehistory inferred from archaeological and
anthropological data (Loftus et al., 1994a).
Recent developments in molecular genetics have provided powerful new tools, called molecular
markers, to assess the evolutionary and demographic history of livestock species, domestication
events and the geographic distribution of their diversity (Hanotte and Jianlin, 2006). DNA-based
marker methods are commonly used in ecological, evolutionary and genetic studies to analyze
genetic structure efficiently in both animal and plant species (Tarnita et al., 2009). These markers
have helped in identifying the wild ancestors of modern livestock and the nature of livestock
expansion in past millennia. Such information tells us about history and the way in which
extraordinary biological diversity has been shaped in a relatively short period of time. With the
development of molecular technologies, DNA-based polymorphisms became the markers of choice
for molecular surveys of genetic variation (Hanotte and Jianlin, 2006). Different genetic
markers provide different levels of information on genetic diversity; among them, mitochondrial DNA
(mtDNA) sequences are the markers of choice for insights into the domestication and
past migration history of livestock species. Use of mtDNA has broadened the perspective on the
origin and evolution of domesticated cattle (Maji et al., 2009). Further, one of the persistent
challenges in the analysis of population genetic data is to account for the spatial arrangement
of samples and populations (the nonrandom distribution of genetic variation among individuals
within populations). mtDNA data have been used extensively to understand the spatial distribution of
genetic lineages within species, allowing identification of the historical factors that most strongly
shaped the spatial patterns of those lineages. mtDNA has been used for the identification of maternal lineages
(Erhardt and Weimann, 2007) as well as to test hypotheses about the past genetic history and
evolution of different species (Hebsgaard et al., 2007). mtDNA can also tell us about the recent
demographic processes affecting a population, for example whether a population has undergone a
recent demographic expansion or has a more complex history. The recognition of the mitochondrial
DNA molecule as a genetic marker in population and evolutionary biology derives in part from the
relative ease with which clearly homologous sequences can be isolated and compared. Simple
sequence organization, maternal inheritance and absence of recombination make mtDNA an ideal
marker for tracing maternal genealogies.

Mitochondrial DNA vis a vis Genomic DNA


Mitochondrial DNA is cytoplasmic and maternally (uniparentally) inherited. In contrast to
nuclear DNA, it is present in abundance, as each cell may contain several hundred
mitochondria and each mitochondrion contains multiple copies of its own circular duplex genome.
This abundance facilitates the extraction of mtDNA even from forensic or ancient cells and
tissues and provides many insights into the genetic diversity, origin and taxonomy of breeds, besides
allowing the time depth of domestication history to be estimated. The mutation rate of mtDNA is
higher than that of nuclear DNA, making it useful for addressing genetic relationships within and between
populations and the presence of population bottlenecks (Berggren et al., 2005).
Cytoplasmically inherited mtDNA evolves five to ten times more rapidly than the nuclear genome
(Brown et al., 1979) due to the presence of a hypervariable displacement loop (D-loop) and tends to
accentuate recent demographic events. In general, the non-coding D-loop region exhibits elevated
levels of variation relative to coding sequences such as the cytochrome b gene, presumably due to
reduced functional constraints and relaxed selection pressure (Brown et al., 1993). The D-loop is also
known as the control region, since it is the site of transcriptional and replicational control (Anderson et
al., 1982): it contains the two major transcriptional promoters (PH and PL) and the origin of
replication (OriH). Overall, the mtDNA structure is highly conserved in higher animals.
Mitochondrial DNA has many advantages over genomic DNA in that it has been well characterized
at both population and molecular levels and yields data readily amenable to analysis. A rapid rate
of sequence divergence (at least in vertebrates) allows discrimination of recently diverged lineages.
Studies of mtDNA from a diversity of animal groups have revealed significant variation among
taxa in mtDNA sequence dynamics, gene order and genome size.
Mitochondrial DNA as genetic marker
To address the impact of historical population events, markers must meet certain criteria: 1) the
markers must be evolutionarily conserved enough to allow the identification of the wild taxon or
population from which the species descends; 2) they must be variable and structured enough across the
geographical range of the species so that the approximate locality of domestication can be
identified; and 3) they should evolve rapidly but at a constant rate so that the
polymorphism can be dated. mtDNA presents all these characteristics. Further, mtDNA is almost
exclusively maternally inherited, is effectively haploid and is free from genetic recombination,
implying that each individual has a single haplotype and that phylogenetic analyses are relatively
straightforward to interpret. Because of these features, mtDNA has been the predominant molecule
used to determine maternal phylogenies unobscured by genetic exchange (Figure 1). This
property of uniparental inheritance makes mtDNA distinct from the highly polymorphic
microsatellite markers, which are codominantly inherited. Also, the presence of the D-loop, which
evolves much more rapidly, gives mtDNA an edge over information based on Y-chromosome DNA
sequences, which are less variable within species (Bruford et al., 2003), making their routine use
difficult for phylogenetic analyses. Analysis of the Y chromosome also lacks the power of a
multiple-band profile. mtDNA is highly variable within species: in humans, for just one highly
variable section of the mtDNA control region, over 500 distinct haplotypes have been recorded. Thus,
owing to a combination of genetic characteristics such as uniparental inheritance, lack of
recombination and the presence of a hypervariable region, mtDNA diversity has been the primary focus
of maternal lineage-based genetic studies (MacHugh and Bradley, 2001).
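Because each animal carries a single mtDNA haplotype, within-population variability is commonly summarized as haplotype diversity, h = n/(n-1) * (1 - sum of squared haplotype frequencies). This estimator is introduced here as a standard illustration and is not described in the text above; the haplotype labels are hypothetical.

```python
from collections import Counter

def haplotype_diversity(haplotypes):
    """Unbiased haplotype diversity for a sample of haploid sequences.

    haplotypes: list with one haplotype label (or sequence string) per animal.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sampled animals are needed")
    counts = Counter(haplotypes)
    return (n / (n - 1)) * (1 - sum((c / n) ** 2 for c in counts.values()))

# Hypothetical D-loop haplotype labels for four animals
sample = ["H1", "H1", "H2", "H3"]
print(haplotype_diversity(sample))
```

The value is 0 when all animals share one haplotype and 1 when every animal carries a different one.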

Figure 1. Comparison of characteristics of mitochondrial DNA (mtDNA), microsatellites and single
nucleotide polymorphisms (SNPs) as genetic markers (adapted from Morin et al., 2004)

Size of mtDNA in farm animals


The size of the mtDNA molecule has been determined in most livestock species using
various methods, viz. electron microscopy, restriction fragment analysis and nucleotide sequencing.
The size of mtDNA in almost all livestock species is approximately 16 kb (Table 1). The
small size of the mitochondrial genome has been attributed to its limited coding capacity
(typically 37 genes: 22 tRNAs, 2 rRNAs and 13 polypeptides) as well as its compact and efficient
genomic organization (Lavrov, 2007).
Table 1. Size of mitochondrial genome and the control region in livestock animals

Species   Organism          Size (bp)   GenBank Acc.         First report              Control region (bp)
Cattle    B. taurus         16,338      V00654               Anderson et al., 1982     910
Cattle    B. indicus        16,339      NC_005971            Hiendleder et al., 2008   911
Buffalo   Bubalus bubalis   16,355      AY488491             Parma et al., 2004        914 to 928
Sheep     Ovis aries        16,616      AF010406, AF010407   Hiendleder et al., 1998   934*
Goat      Capra hircus      16,640      -                    Pietro et al., 2003       -
Pig       Sus scrofa        16,613      AF034253             Lin et al., 1999          1176
Horse     Equus caballus    16,660      X79547               Xu and Arnason, 1994      960*

* D-loop length excluding repeats.
- Not given.

Mitochondrial DNA as a molecular clock


With the highest nucleotide substitution rate (10⁻⁸ per year) within the mitochondrial genome, the
D-loop region is considered the mutational hot spot. Owing to this high rate of substitution, mtDNA
has been referred to as a molecular clock, as it can be used to estimate the time of origin of a breed, its
divergence and its phylogeography (Galtier et al., 2009). The classic example is in human
evolutionary genetics, where the molecular clock provided a recent date for mitochondrial Eve.
Besides, mtDNA has been useful for addressing genetic relationships within and between
populations and the presence of population bottlenecks (Berggren et al., 2005). mtDNA markers have
also been the workhorse of evolutionary studies, in particular phylogeography based
on maternal lineages (Kim et al., 2003), and of studies on domestication events among the
breeds of several species such as cattle (Cai et al., 2007), buffalo (Kumar et al., 2007a), goat (Joshi et al.,
2004), sheep (Arora et al., 2013), pig (Larson et al., 2005) and horse (Cozzi et al., 2004). The term
D-loop is used for the region of animal mtDNA where replication is believed to be initiated. The
newly synthesized daughter strand displaces the parental heavy strand and thus a loop structure is
observed under electron microscopy, which is termed the displacement loop or D-loop. The peripheral
D-loop domains, which contain the main regulatory elements, have evolved rapidly in a species-specific manner,
generating heterogeneity in length and base composition. Structurally, 3 domains are present in
the D-loop, viz. the left and right peripheral domains and the central domain. Because of the peculiar evolution
of the left and right domains, they cannot give correct estimates of the genetic distances between
mammalian species. On the other hand, the central domain is highly conserved during evolution and
behaves as a good molecular clock, giving reliable estimates of the times of divergence between
closely and distantly related species. D-loop analysis has been approached through
polymerase chain reaction (PCR) based techniques, viz. PCR-RFLP (restriction fragment length
polymorphism) and PCR-SSCP (single strand conformation polymorphism), or by direct sequencing
of the D-loop. Earlier reports were based on PCR-RFLP, but at present direct sequencing of
mtDNA/D-loop is used in most studies.
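The molecular clock idea described in this section can be sketched numerically: if two lineages differ by d substitutions per site and each accumulates substitutions at rate r, their divergence time is roughly T = d / (2r). The Python sketch below assumes aligned sequences of equal length, applies the Jukes-Cantor correction as one simple choice of substitution model, and reads the rate of 10⁻⁸ quoted above as substitutions per site per year; that reading, the toy sequences and the function names are assumptions made for illustration.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.
    Sites with gaps or ambiguous bases in either sequence are ignored."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    return sum(a != b for a, b in pairs) / len(pairs)

def jukes_cantor(p):
    """Correct a p-distance for multiple hits (valid for p < 0.75)."""
    return -0.75 * math.log(1 - 4 * p / 3)

def divergence_time(distance, rate_per_site_per_year):
    """Years since two lineages split, assuming a constant rate in both."""
    return distance / (2 * rate_per_site_per_year)

# Toy aligned fragments (hypothetical, far shorter than a real D-loop)
a = "ACGTACGTAC"
b = "ACGTACGTAT"
d = jukes_cantor(p_distance(a, b))
print(d, divergence_time(d, 1e-8))
```

With real data the estimate is only as reliable as the calibration of the rate and the assumption that it has stayed constant, which is why the conserved central domain is preferred for this purpose.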
Restriction mapping of mitochondrial genome for population structuring
Restriction maps, i.e. physical maps of the relative positions of cleavage sites of one or more
restriction endonucleases in mtDNA, have been established for most livestock species
(Hecht et al., 1990). Alignment of the physical map with specified genes encoded by mtDNA is also
available for several species. Anderson et al. (1982) presented a complete mitochondrial gene map
of bovine mtDNA derived from the complete sequence of this genome. Several reports are available on
the physical map of mtDNA in livestock species, viz. cattle (Loftus et al., 1994a), sheep and
riverine buffalo (Mishra et al., 2009). Structural analysis of mtDNA by restriction endonuclease
digestion and agarose gel electrophoresis has proven useful in assessing genetic relatedness in
systematic and population genetic studies. Various workers have used mtDNA-RFLP extensively
for intra- and inter-species comparisons. Polymorphism data have been used successfully to
characterize population/breed structure. This type of data has also helped in studying the evolution of
a species.
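When the mtDNA sequence is known, a restriction map can be derived in silico by locating recognition sites on the circular molecule and measuring the distances between them. The sketch below does this for five enzymes with unambiguous six-base recognition sequences; AvaII is left out because its site contains a degenerate base. The toy sequences are hypothetical and only a few bases long.

```python
# Recognition sequences of five six-base cutters
SITES = {
    "BamHI": "GGATCC",
    "BglII": "AGATCT",
    "HindIII": "AAGCTT",
    "HpaI": "GTTAAC",
    "PstI": "CTGCAG",
}

def site_positions(sequence, site):
    """0-based start positions of `site` on a circular sequence."""
    seq = sequence.upper()
    n = len(seq)
    extended = seq + seq[: len(site) - 1]  # lets a site span the origin
    return [i for i in range(n) if extended.startswith(site, i)]

def fragment_lengths(sequence, site):
    """Fragment sizes (largest first) after complete digestion of a circle."""
    n = len(sequence)
    pos = site_positions(sequence, site)
    if not pos:
        return [n]  # uncut circular molecule
    # distance from each site to the next one, wrapping around the circle
    frags = [(pos[(i + 1) % len(pos)] - p) % n or n for i, p in enumerate(pos)]
    return sorted(frags, reverse=True)

# Two hypothetical mitotypes: the second has lost one BamHI site
mitotype_1 = "AAGGATCCTTTTGGATCCAA"
mitotype_2 = "AAGGATCCTTTTGGATACAA"
print(fragment_lengths(mitotype_1, SITES["BamHI"]))
print(fragment_lengths(mitotype_2, SITES["BamHI"]))
```

Gain or loss of a site changes the fragment pattern seen on the gel, which is the polymorphism scored in mtDNA-RFLP studies.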
Watanabe et al. (1985) analysed mtDNA restriction patterns from cattle and pigs of Asian and
European descent and observed a clear distinction between the two groups. Bhat et al. (1990) studied
mtDNA polymorphism in Holstein (Bos taurus) and Haryana (Bos indicus) cattle breeds using 13
restriction endonucleases, including the 6 enzymes AvaII, BamHI, BglII, HindIII, HpaI and PstI. The two
Holsteins differed at 6 sites, whereas the Haryana breed did not show any site polymorphism. The
authors observed different mitotypes for Holstein, as reported by Anderson et al. (1982) and
Watanabe et al. (1985). The absence of polymorphic sites in the Haryana breed can be
understood in the light of the history of this breed, in which gene migration has been very limited. On
the basis of the polymorphism in the Holstein breed, which is known to be genetically
diverse, and the absence of polymorphism in zebu cattle, it was suggested that mtDNA polymorphism might
characterize the two breeds of cattle. Loftus et al. (1994a) analysed mtDNA from 13 cattle
breeds of European, African and Asian origin to determine the phylogenetic
relationships and level of variation among breeds. The presence of 26 different mitotypes described
by 20 polymorphisms indicated two major lineages, the Afro-European and Asian types. None of the
mitotypes found in the Asian lineage was detectable in the Afro-European lineage or vice versa.
Further, the grouping of all African indicine populations within the clade containing all Bos taurus
lineages pointed towards a hybrid origin of the humped cattle of the African continent and their
distinction from Bos indicus populations.
Origin and phylogeography of cattle based on mitochondrial DNA
Livestock breeds have been developed through centuries of natural and human selection to fit
different environmental conditions and human needs. The domestication of cattle was an important
step in human history leading to modification of diet and socio-economic structure of several
populations (Beja-Pereira et al., 2006). The process of cattle domestication (B. taurus and B. indicus)
probably started approximately 11,000 years ago, and the breeds and strains have been
morphologically differentiated primarily by the absence (Bos taurus) or presence (Bos indicus) of a
hump. Domesticated cattle breeds are derived from the wild aurochs (B. primigenius), and
domestication is believed to have originated in Southwest Asia (Anatolia/the Fertile
Crescent and its eastern margin, towards the Indus valley region). The wild aurochs (B. primigenius)
had 3 distinct subspecies: B. p. primigenius in the Near and Middle East, the progenitor of European
taurine cattle; B. p. opisthonomus in northern Africa, the progenitor of African taurine cattle; and
B. p. nomadicus in the northern Indian subcontinent, the progenitor of Asian zebu cattle (Table 2). On the basis of
mtDNA haplotypic diversity (haplogroups), humpless B. taurus has been reported to have diverged
from humped B. indicus ~1.7-2.0 million years ago (Bradley et al., 1996; Hiendleder et al., 2008;
Loftus et al., 1994a; Loftus et al., 1994b). The clear genetic distinctness between taurine (B. taurus)
and zebu (B. indicus) cattle breeds points to two distinct lineages, indicating two major sites of
domestication (Fig 2a), one in the Indian subcontinent and the other in the Near East, where zebu and
taurine breeds would have emerged independently from their respective aurochs
groups. It has been hypothesized that all extant European breeds descend from
cattle domesticated in the Near East that subsequently spread during the diffusion of herding and
farming lifestyles (Beja-Pereira et al., 2006).
Table 2. Origin and domestication of modern cattle breeds

Domestic species     Wild ancestor: aurochs        MtDNA    Domestication       Time B.P.   Location
                     (3 subspp., extinct)          clades   events (at least)
Bos taurus taurus    B. primigenius primigenius    4        1                   ~8000       Near and Middle East (West Asia)
Bos taurus taurus    B. p. opisthonomus            2        1                   ~9500       Northeast Africa
Bos taurus indicus   B. p. nomadicus               2        1                   ~7000       Northern Indian subcontinent

Source: Bruford et al., 2003; Hanotte and Jianlin, 2006

Further, information based on mtDNA, microsatellite DNA and Y-chromosome DNA sequence
variation has revealed the distinctness of Indian and African zebu cattle (Anderung et al., 2007;
Bradley et al., 1998) (Figure 3). African cattle breeds are taurine in origin, with independent
domestication events on the African continent. It is estimated that around 700 AD
male-mediated introgression and the wide spread of zebu cattle began, resulting in admixed
populations. The proportion of zebu alleles in these admixed populations decreases from East to West
Africa and then follows a steep north-south gradient in West Africa. Thus African zebu seem to be
hybrids, with the majority of their genome derived from Bos indicus introgression but with maternally
inherited mtDNA variation that is representative of the original Bos taurus domesticates of that
continent. Bovine mtDNA sequences follow the well-established taxonomic distinction between the
two domestic cattle forms, namely the humpless variety of Europe, the Middle East and West
Africa (B. taurus or taurine) and the humped cattle of South Asia (B. indicus or zebu) (Magee et al.,
2007).
Distribution of mitochondrial DNA haplogroups in cattle breeds
The variation in the mtDNA D-loop region has been extensively studied to determine the ancestral
haplogroups. Based on mitochondrial D-loop nucleotide sequence diversity from different
geographical locations, 5 maternal lineages with major taurine haplogroups (T and T1 to T4) in Bos
taurus and 2 major indicine haplogroups (I1 and I2) in Bos indicus (Chen et al., 2010; Jia et al.,
2010) have been defined and used as nomenclature for mtDNA haplotypes (Pellecchia et al., 2007).
Haplogroup T is defined by a transition at position 16,255 relative to the Anderson sequence
(GenBank accession: V00654); T1 by transitions at 16,050, 16,113 and 16,255; T2 by transitions at
16,185 and 16,255 plus a G to C transversion at 16,057; and T3 is identical to the Anderson sequence.
The central haplogroup T represents the root/nodal point for the other taurine haplogroups (Achilli et
al., 2008; Troy et al., 2001) (Figure 4a, b-i). However, for the I1 and I2 haplogroups, the positions of
the defining nucleotide changes have not been specified.
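The diagnostic positions listed above lend themselves to a simple rule-based assignment. The sketch below takes the positions at which a D-loop sequence differs from the Anderson reference (V00654) and returns the most specific taurine haplogroup whose defining positions are all present. It is an illustration only: it checks positions but not the type of substitution, it ignores T4, T5 and the indicine haplogroups, and the function name is hypothetical.

```python
# Defining positions relative to V00654, as listed in the text above
DIAGNOSTIC = {
    "T1": {16050, 16113, 16255},
    "T2": {16057, 16185, 16255},
    "T": {16255},
    "T3": set(),  # identical to the reference at the diagnostic positions
}

def assign_taurine_haplogroup(variant_positions):
    """Return the haplogroup whose defining positions are all found among the
    sample's variants, preferring the haplogroup with the most defining sites.
    If T1 and T2 both match, the first one listed (T1) is returned."""
    variants = set(variant_positions)

    def score(haplogroup):
        sites = DIAGNOSTIC[haplogroup]
        return len(sites) if sites <= variants else -1

    return max(DIAGNOSTIC, key=score)

print(assign_taurine_haplogroup([16050, 16113, 16255]))  # T1
print(assign_taurine_haplogroup([16255]))                # T
print(assign_taurine_haplogroup([]))                     # T3
```

Private mutations outside the diagnostic positions do not change the assignment, and a sequence carrying only part of the T1 or T2 motif falls back to T.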
The distribution of mtDNA haplogroups varies among cattle breeds across the world. This spatial
distribution of the major haplogroups is another noteworthy feature of mtDNA sequence diversity.
European cattle breeds originate from the Neolithic Near East, with haplogroup T3
predominating, along with T and T2, in Anatolia and the Middle East (Troy et al.,
2001). The African-derived mtDNA haplogroup (T1) has been found predominantly in
Caribbean and Brazilian Creole cattle and in Spanish cattle. Portuguese cattle
breeds are an admixture of these and carry both African and European taurine haplotypes. American
Creole cattle are reported to be descended from Spanish and Portuguese cattle, and thus contain African and
European haplotypes (Carvajal-Carmona et al., 2003; Magee et al., 2002; Mirol et al., 2003). Thus,
besides the strong geographical partitioning of these B. taurus haplogroups, geographical
exceptions have been observed as signatures of secondary migratory events, suggesting historical
importation of African cattle to these regions. With the identification of a T4 haplogroup, a fourth
domestication centre has been suggested in North-Eastern Asia. This haplogroup has not been
reported from Africa, Europe or the Near East region (Mannen et al., 2004). Thus, haplogroups T and
T2 predominantly cover the majority of Near Eastern variation; T1 and T3 predominantly represent
the African and European taurine clusters, respectively, while T4 represents the North-Eastern Asian clade
(Mannen et al., 2004). Another haplogroup, T5, has now been identified for B. taurus, with an initial
split of macro-haplogroup T into two sister subclades, T5 and T1T2T3 (Achilli et al., 2009). Jia et al.
(2007) reported the first evidence for the presence of taurine haplogroups/subhaplogroups T1A, T3A,
T3B and T5 in Asian cattle. Analysis of Indian zebu cattle (B. indicus) from three different
geographical regions in India revealed two nodes designated as Z1 and Z2 (Baig et al., 2005; Magee
et al., 2007). These studies, on the basis of diversity and time of separation, concluded that there was
a genetic signal of independent cattle domestication in India, in parallel to the Near Eastern
domestication for European and African cattle.
MtDNA-based genetic diversity has been analyzed in several cattle breeds, and the mitochondrial
haplotypes described by Troy et al. (2001) and Mannen et al. (2004) have been used extensively to
infer the origin of cattle breeds across the world. Similarly, mtDNA haplotypes have been described
in zebu cattle in India (Chen et al., 2010) and China (Lai et al., 2006). However, an internationally
accepted nomenclature for the mitochondrial haplotypes of the major Indian livestock species (cattle,
buffalo, goat and sheep) has not been put in place to date. This could be one reason why
different names have been assigned to the haplotypes described for zebu cattle. Thus, it would be important to have an
mtDNA nomenclature system at the international level to clearly define the matrilineal diversity across
cattle breeds. Such systematic information can be used by countries when exchanging
germplasm on the basis of genetic distance between breeds. Further, mtDNA haplotype systems for
other species, viz. buffalo, sheep and goat, might be developed for their potential application in
genetic improvement along the same lines as for cattle.
Studies conducted at the National Bureau of Animal Genetic Resources have revealed a distinct
dichotomy and independent domestication events for Indian cattle and Bos taurus cattle. The
phylogeography and network analysis revealed the southern breeds to be primitive in comparison to
the northern cattle breeds. In the phylogenetic tree for Indian, African, European, and Chinese zebu
cattle based on mtDNA haplotypes, Indian cattle were close to Chinese zebu cattle, whereas African
and European taurine cattle clustered together. Thus the phylogeny reflected a clear separation of Indian cattle
from the rest of the cattle types.

Origin and phylogeography of other livestock based on mitochondrial DNA


Of the two buffalo types, riverine buffalo (Bubalus bubalis bubalis) are distributed in the Indian
subcontinent, the Middle East and Eastern Europe, while swamp buffalo (Bubalus bubalis carabanesis) are
distributed in North-eastern India, Bangladesh, China and South-east Asia. Until 2006, it remained
disputed whether swamp and river buffaloes were domesticated independently by human selection
or resulted from a single domestication event (Kierstein et al., 2004). mtDNA analysis clearly
revealed two divergent lineages for riverine and swamp buffalo, indicating independent
domestication events and distinct genetic origins (Kumar et al., 2007b). The centre of domestication
for river buffalo is believed to have been in the Indus valley and/or the Euphrates and Tigris valleys
some 5000 years ago, and for swamp buffalo in China around 4000 years ago (Hanotte and
Jianlin, 2006). On the basis of the mtDNA D-loop, Southeast Asia appeared to be a hybrid zone where
swamp and river buffalo from China and India spread and interbred. However, Kumar et
al. (2007b) have proposed the western region of the Indian subcontinent (the present-day breeding tracts of
the Mehsana, Surti and Pandharpuri breeds) as the possible centre of domestication.
In sheep (Ovis aries), at least two potential domestication events have been suggested (Figure 2c),
whereas in goats (Capra hircus), mtDNA analysis has revealed a complex pattern of domestication
with at least three divergent lineages (Joshi et al., 2004) (Figure 2d). For Indian native goats, based
on hypervariable region-I sequence data, five lineages (lineages A-E) and a complex origin have been
suggested by Joshi et al. (2004). Among the lineages detected, lineage A, which dates back more than 35,000
years, was the most widely distributed. The phylogenetic analyses clustered descript and
nondescript breeds into two groups; all domestic goats clustered together with three sub-groups,
while the local nondescript goats formed a separate group. However, among descript breeds,
Pashmina goats exhibited a different demographic history from the other breeds, and the time to the most
recent common ancestor (TMRCA) for the most frequent lineage was estimated to be around 35,000
and 69,000 years.

Figure 2: Phylogenetic complexities in modern day (a) cattle, (b) horses, (c) sheep and (d) goat
Bruford et al., 2003

For Indian sheep breeds, the study conducted by Pardeshi et al. (2007) suggested a common
origin for the Deccani, Bannur and Garole breeds. Arora et al. (2013) analyzed mtDNA diversity among
19 Indian sheep breeds from 3 different agroecological regions. The lineage analysis showed that most of the
population carried the type A haplotype, which is in accordance with their Asian origin, while the type B
haplotype was observed in only a few animals (Chokla, Jaisalmeri, Kheri, Marwari and Nali) from
the Northwestern region, which might have resulted from crossbreeding with European breeds for
improvement of wool quality. Based on mitochondrial DNA sequence data, the modern horse (Equus
caballus) has been grouped into ~17 mtDNA phylogenetic lineages, indicating complex and
numerous domestication steps (Figure 2b). In pigs, mtDNA evidence supported multiple centres of
domestication (Larson et al., 2005). Summing up, mtDNA markers have been very useful in tracing the
ancestry, time of divergence and domestication events of various livestock species.
Challenges and future prospects
The characterization of genetic diversity within and between breeds, and the identification of the
geographical category of variation, will allow region-specific conservation measures to be put in
place. The identification of ancestral livestock populations could be important for the sustainable
utilization and preservation of animal genetic resources and to meet the needs and aspirations of
future generations. The use of mtDNA in diversity studies might be challenged by genetic
improvement programmes using exotic germplasm, which lead to genetic erosion in native breeds. In
such circumstances conservationists need to be cautious in selecting the sample of animals for
investigation. Thus, in areas where there is gene flow from introduced breeds and the population
size of purebreds is reduced, samples should be taken from areas which are relatively isolated by
geography and so retain a high chance of authentic genetic components. Secondly, samples from
areas where both males and females are bred within their own herd, reducing genetic exchange
with the outside gene pool, are ideal for the analysis of diversity.
Although mtDNA has been studied widely and is the part of the genome most resistant to introgression, its
use to infer the evolutionary and demographic past of both populations and species has been
questioned in recent years. It has also been argued that, being small in size and an extra-nuclear
genetic marker, mtDNA may not always be sufficient to answer all the questions related to the genetic
diversity of the species rather than of the organelle, and that, being maternally inherited, it does not detect
male-mediated gene flow, which has a significant influence on the evolution of species in
modern times (Bruford et al., 2003). Thus, for a holistic picture of diversity and of
historical population events, mtDNA analysis needs to be supplemented with information
from other neutral markers such as autosomal microsatellite markers and Y-chromosome DNA
variation. Also, the combination of mtDNA and microsatellites will avoid inheritance bias, since
they relay information on maternally inherited and codominantly inherited regions, respectively.
References

Achilli, A., Olivieri, A., Pellecchia, M., Uboldi, C., Colli, L., Al-Zahery, N., Accetturo, M., Pala, M., Hooshiar
Kashani, B., Perego, U.A., Battaglia, V., Fornarino, S., Kalamati, J., et al., 2008. Mitochondrial genomes of
extinct aurochs survive in domestic cattle. Curr Biol 18: R157.
Anderson, S., De Bruijn, M., Coulson, A., Eperon, I., Sanger, F., Young, I., 1982. Complete sequence of bovine
mitochondrial DNA conserved features of the mammalian mitochondrial genome. J Mol Biol 156, 683-717.
Arora, R., Yadav, H.S., Mishra, B.P., 2013. Mitochondrial DNA diversity in Indian sheep. Livestock Science
153: 50.
Beja-Pereira, A., Caramelli, D., Lalueza-Fox, C., Vernesi, C., Ferrand, N., Casoli, A., et al., 2006. The origin of
European cattle: Evidence from modern and ancient DNA. Proceedings of the National Academy of
Sciences 103: 8113.
Berggren, K., Ellegren, H., Hewitt, G., Seddon, J., 2005. Understanding the phylogeographic patterns of
European hedgehogs, Erinaceus concolor and E. europaeus using the MHC. Heredity 95: 84.

Bhat, P., Mishra, B., Bhat, P., 1990. Polymorphism of mitochondrial DNA (mtDNA) in cattle and buffaloes.
Biochemical genetics 28: 311.
Bradley, D.G., MacHugh, D.E., Cunningham, P., Loftus, R.T., 1996. Mitochondrial diversity and the origins of
African and European cattle. Proc Natl Acad Sci U S A 93: 5131.
Bruford, M.W., Bradley, D.G., Luikart, G., 2003. DNA markers reveal the complexity of livestock
domestication. Nat Rev Genet 4: 900.
Carvajal-Carmona, L.G., Bermudez, N., Olivera-Angel, M., Estrada, L., Ossa, J., Bedoya, G., Ruiz-Linares, A.,
2003. Abundant mtDNA diversity and ancestral admixture in Colombian criollo cattle (Bos taurus).
Genetics 165: 1457.
Chen, S., Lin, B.Z., Baig, M., Mitra, B., Lopes, R.J., Santos, A.M., Magee, D.A., Azevedo, M., Tarroso, P.,
Sasazaki, S., Ostrowski, S., Mahgoub, O., Chaudhuri, T.K., Zhang, Y.P., Costa, V., Royo, L.J., Goyache, F.,
Luikart, G., Boivin, N., Fuller, D.Q., Mannen, H., Bradley, D.G., Beja-Pereira, A., 2010. Zebu cattle are an
exclusive legacy of the South Asia neolithic. Mol Biol Evol 27: 1.
Cozzi, M.C., Strillacci, M.G., Valiati, P., Bighignoli, B., Cancedda, M., Zanotti, M., 2004. Mitochondrial D-loop
sequence variation among Italian horse breeds. Genet Sel Evol 36: 663.
Edwards, C.J., MacHugh, D.E., Dobney, K.M., Martin, L., Russell, N., Horwitz, L.K., McIntosh, S.K.,
MacDonald, K.C., Helmer, D., Tresset, A., Vigne, J.D., Bradley, D.G., 2004. Ancient DNA analysis of 101
cattle remains: limits and prospects. J Archaeol Sci 31: 695.
Erhardt, G., Weimann, C., 2007. Use of molecular markers for evaluation of genetic diversity and in animal
production. Archivos Latinoamericanos de Produccion Animal 15: 63.
Galtier, N., Nabholz, B., Glémin, S., Hurst, G., 2009. Mitochondrial DNA as a marker of molecular diversity: a
reappraisal. Mol Ecol 18: 4541.
Hanotte, O., Jianlin, H., 2006. Genetic characterization of livestock populations and its use in conservation
decision-making. The role of biotechnology in exploring and protecting agricultural genetic resources.
FAO, Rome, Italy, 89.
Hecht, W., Geldermann, H., Ellendorff, F., 1990. Studies on mitochondrial DNA in farm animals. Genome
analysis in domestic animals., 259.
Hiendleder, S., Lewalski, H., Janke, A., 2008. Complete mitochondrial genomes of Bos taurus and Bos indicus
provide new insights into intra-species variation, taxonomy and domestication. Cytogenetic and Genome
Research 120: 150.
Hiendleder, S., Lewalski, H., Wassmuth, R., Janke, A., 1998. The complete mitochondrial DNA sequence of
the domestic sheep (Ovis aries) and comparison with the other major ovine haplotype. J Mol Evol 47: 441.
Jia, S., Chen, H., Zhang, G., Wang, Z., Lei, C., Yao, R., Han, X., 2007. Genetic variation of mitochondrial D-loop
region and evolution analysis in some Chinese cattle breeds. Journal of genetics and genomics = Yi
chuan xue bao 34, 510-518.
Jia, S.G., Zhou, Y., Lei, C.Z., Yao, R., Zhang, Z.Y., Fang, X.T., Chen, H., 2010. A new insight into cattle's
maternal origin in six Asian countries. J Genet Genomics 37, 173-180.
Joshi, M.B., Rout, P.K., Mandal, A.K., Tyler-Smith, C., Singh, L., Thangaraj, K., 2004. Phylogeography and
origin of Indian domestic goats. Mol Biol Evol 21, 454-462.
Kierstein, G., Vallinoto, M., Silva, A., Schneider, M.P., Iannuzzi, L., Brenig, B., 2004. Analysis of mitochondrial
D-loop region casts new light on domestic water buffalo (Bubalus bubalis) phylogeny. Mol Phylogenet
Evol 30: 308.
Kim, K.-I., Lee, J.-H., Lee, S.-S., Yang, Y.-H., 2003. Phylogenetic relationships of northeast Asian cattle to other
cattle populations determined using mitochondrial DNA D-loop sequence polymorphism. Biochemical
genetics 41: 91.
Kumar, S., Nagarajan, M., Sandhu, J.S., Kumar, N., Behl, V., 2007a. Phylogeography and domestication of
Indian river buffalo. BMC Evol Biol 7: 186.
Kumar, S., Nagarajan, M., Sandhu, J.S., Kumar, N., Behl, V., Nishanth, G., 2007b. Mitochondrial DNA
analyses of Indian water buffalo support a distinct genetic origin of river and swamp buffalo. Animal
Genetics 38: 227.
Lai, S.J., Liu, Y.P., Liu, Y.X., Li, X.W., Yao, Y.G., 2006. Genetic diversity and origin of Chinese cattle revealed
by mtDNA D-loop sequence variation. Mol Phylogenet Evol 38: 146.

Larson, G., Dobney, K., Albarella, U., Fang, M., Matisoo-Smith, E., Robins, J., Lowden, S., Finlayson, H.,
Brand, T., Willerslev, E., Rowley-Conwy, P., Andersson, L., Cooper, A., 2005. Worldwide phylogeography
of wild boar reveals multiple centers of pig domestication. Science 307: 1618.
Lavrov, D.V., 2007. Key transitions in animal evolution: a mitochondrial DNA perspective. Integr Comp Biol
47, 734-743.
Lin, C.S., Sun, Y.L., Liu, C.Y., Yang, P.C., Chang, L.C., Cheng, I.C., Mao, S.J.T., Huang, M.C., 1999. Complete
nucleotide sequence of pig (Sus scrofa) mitochondrial genome and dating evolutionary divergence within
Artiodactyla. Gene 236: 107.
Loftus, R.T., MacHugh, D.E., Bradley, D.G., Sharp, P.M., Cunningham, P., 1994a. Evidence for two
independent domestications of cattle. Proc Natl Acad Sci U S A 91: 2757.
Loftus, R.T., MacHugh, D.E., Ngere, L.O., Balain, D.S., Badi, A.M., Bradley, D.G., Cunningham, E.P., 1994b.
Mitochondrial genetic variation in European, African and Indian cattle populations. Animal Genetics 25:
265.
MacHugh, D.E., Bradley, D.G., 2001. Livestock genetic origins: goats buck the trend. Proceedings of the
National Academy of Sciences 98: 5382.
Magee, D.A., Mannen, H., Bradley, D.G., 2007. Duality in Bos indicus mtDNA diversity: Support for
geographical complexity in zebu domestication. Vertebr Paleobiol Pa, 385.
Magee, D.A., Meghen, C., Harrison, S., Troy, C.S., Cymbron, T., Gaillard, C., Morrow, A., Maillard, J.C.,
Bradley, D.G., 2002. A partial african ancestry for the creole cattle populations of the Caribbean. J Hered
93, 429-432.
Maji, S., Krithika, S., Vasulu, T.S., 2009. Phylogeographic distribution of mitochondrial DNA
macrohaplogroup M in India. J Genet 88: 127.
Mannen, H., Kohno, M., Nagata, Y., Tsuji, S., Bradley, D.G., Yeo, J.S., Nyamsamba, D., Zagdsuren, Y.,
Yokohama, M., Nomura, K., Amano, T., 2004. Independent mitochondrial origin and historical genetic
differentiation in North Eastern Asian cattle. Mol Phylogenet Evol 32: 539.
Mirol, P.M., Giovambattista, G., Liron, J.P., Dulout, F.N., 2003. African and European mitochondrial
haplotypes in South American Creole cattle. Heredity 91, 248-254.
Mishra, B., Kataria, R., Bulandi, S., Prakash, B., Kathiravan, P., Mukesh, M., Sadana, D., 2009. Riverine status
and genetic structure of Chilika buffalo of eastern India as inferred from cytogenetic and molecular
marker-based analysis. J Anim Breed Genet 126, 69-79.
Parma, P., Erra-Pujada, M., Feligini, M., Greppi, G., Enne, G., 2004. Water buffalo (Bubalus bubalis): complete
nucleotide mitochondrial genome sequence. DNA Seq 15: 369.
Pellecchia, M., Negrini, R., Colli, L., Patrini, M., Milanesi, E., Achilli, A., Bertorelle, G., Cavalli-Sforza, L.L.,
Piazza, A., Torroni, A., Ajmone-Marsan, P., 2007. The mystery of Etruscan origins: novel clues from Bos
taurus mitochondrial DNA. P R Soc B 274: 1175.
Pietro, P., Maria, F., GianFranco, G., Giuseppe, E., 2003. The complete nucleotide sequence of goat (Capra
hircus) mitochondrial genome: goat mitochondrial genome. Mitochondrial DNA 14: 199.
Steinborn, R., Schinogl, P., Wells, D.N., Bergthaler, A., Muller, M., Brem, G., 2002. Coexistence of Bos taurus
and B. indicus mitochondrial DNAs in nuclear transfer-derived somatic cattle clones. Genetics 162: 823.
Troy, C.S., MacHugh, D.E., Bailey, J.F., Magee, D.A., Loftus, R.T., Cunningham, P., Chamberlain, A.T., Sykes,
B.C., Bradley, D.G., 2001. Genetic evidence for Near-Eastern origins of European cattle. Nature 410: 1088.
Xu, X., Arnason, U., 1994. The complete mitochondrial DNA sequence of the horse, Equus caballus: extensive
heteroplasmy of the control region. Gene 148: 357.
Tarnita, C.E., Antal, T., Ohtsuki, H., Nowak, M.A., 2009. Evolutionary dynamics in set structured populations.
Proceedings of the National Academy of Sciences 21: 8601.

10
Y- Chromosome Based Genetic Diversity in Farm Animal Genetic
Resources with Special Reference to Bovine
Indrajit Ganguly, Monika Sodhi, Suchit Kumar, Sanjeev Singh and K N Raja
ICAR- National Bureau of Animal Genetic Resources, Karnal, Haryana

________________________________________________________________________________________
Studies on Y chromosome are of particular interest in livestock species because in common
breeding strategies, only a few males contribute genetically to the next generation (Lindgren et al.,
2004). The mammalian Y chromosome is a gene-poor, male-specific chromosome in species with
male heterogamety, such as humans. It often determines sex in a dominant fashion and is inherited
clonally from father to son, so it is never present in females. The Y chromosome may thus complement
studies that use mitochondrial DNA to infer sex-specific population genetic
processes. The mammalian Y chromosome has two components, a pseudo-autosomal region, which
frequently recombines with the X chromosome and a male-specific region (MSY). In many higher
organisms (with an X-Y sex determining system), it is the only chromosome with truly haploid
characteristics; wherein no genetic material is exchanged with a homologue through recombination,
making all sites linked to each other. This non-recombining region is called the male-specific region
of the Y chromosome, the MSY, and comprises 95% of the length of the Y chromosome in humans.
The remaining 5%, which is genetically similar to the X, makes up the pseudo-autosomal region (PAR) at the
telomeric ends of the Y chromosome and recombines with the X chromosome during meiosis.
Markers on the MSY, which is paternally inherited in a haploid way, have been used for studying
the origin of species, range expansion, admixture of populations, and migration in animals (Pidancier et al., 2006). Molecular variation in the Y chromosome provides information about genetic
diversity, since it reveals the pattern of distribution of paternal lineages. For instance, it may
indicate stock upgrading, which is often performed by using sires from breeds with the desired
properties.
Origins and evolution of Y-chromosome
The evolutionary ancestor of the sex chromosomes was a pair of matched, autosomal chromosomes
that acquired sex-determining genes on one member of the pair. This occurred about 300 million
years ago in a reptile-like ancestor. Over time additional genes with male-specific functions
accumulated in this same chromosome, called proto-Y, which then lost its ability to recombine with
its counterpart chromosome, called proto-X. There are four regions of the proto-X chromosome,
which appear to have been involved in four different steps, resulting in the loss of recombination
with proto-Y. Each of the four regions accumulated mutations in those non-recombining regions of
proto-Y at four different times in evolution. Each time recombination was lost there was
degradation and loss of the non-recombining region. Over time this chromosome evolved into Y,
losing most of its genetic information as a result of the degradation of the non-recombining regions
of the chromosome. Its partner chromosome evolved into the X chromosome. The degeneration of
the Y was offset at various times by additions of autosomal genes to this chromosome (as well as to
X), leading to a pattern of loss and gain of genetic material over a period of about 170 million years.
If we take the human Y chromosome as an example, the degeneration seems to have occurred
in four discrete episodes, beginning about 300 million years ago when a reptile-like ancestor
acquired the SRY gene on one of its autosomal chromosomes. Each of the four episodes involved a
failure of recombination to occur between the X and the Y chromosomes, resulting in subsequent
decay of some genes in the non-recombining region. Around 166 million years ago, a huge chunk of
the Y chromosome in one of our mammalian ancestors was turned upside down and reinserted.
The change was so extreme that the Y chromosome no longer matched the X, and it became
impossible for the two to swap genes. The Y chromosome began collecting mutations and losing
genes, ultimately taking on its characteristic Y shape as a result. In humans, it now carries a mere 19
of the 800 genes it originally shared with the X. Given that rate of loss, some geneticists have
predicted that the chromosome will lose its final gene in 4.6 million years. Recently, Jennifer
Hughes and her colleagues from the Whitehead Institute for Biomedical Research in Cambridge,
Massachusetts sequenced the Y chromosome of the rhesus macaque, a primate that diverged from
humans around 25 million years ago. They found that the monkey's Y chromosome contains 20
genes that match its X chromosome, and 19 of them are the same as human Y genes. This suggests
that the human Y chromosome has lost only one gene since humans and macaques last shared a
common ancestor (Nature, DOI: 10.1038/nature10843). These empirical data suggest that the Y
chromosome has held steady over the last 25 million years and that the 19 surviving genes probably
have vital biological functions.
Characteristic features of the mammalian Y chromosome
The mammalian Y is the smallest chromosome of the genome, comprising < 3% of the haploid
genome (Krausz and Degl'Innocenti 2006). It is usually a metacentric or acrocentric chromosome
and contains a short (Yp) and long arm (Yq). A small region (5% of the Y) located in the distal part
of either Yp or Yq that mediates X and Y segregation is known as the pseudoautosomal region
(PAR), where X and Y chromosomes pair and recombine during meiosis. The rest of the Y (95%)
contains Y chromosome male-specific sequences (MSY) that do not recombine with the X during
meiosis (Rice 1996). Several special features set the MSY apart from the rest of genome: absence of
homologous recombination, male-limited transmission, abundance of Y-specific repetitive
sequences with unique genomic structures (i.e. massive palindromes, or palindrome-like
sequences), tendency of MSY genes to degenerate during evolution, acquisition of autosomal genes,
and accumulation and functional cluster of testis genes for maleness and reproduction (Lahn and
Page 1997, Tilford et al. 2001, Rozen et al. 2003, Gvozdev et al. 2005, Liu 2010). Investigating Y
chromosomes is challenging because the absence of recombination between the X and Y makes
classical linkage mapping of the MSY
virtually impossible, and the complexity of the repetitive sequences makes sequencing extremely
difficult (Liu and Ponce de León 2007). This explains why the Y was excluded from most
mammalian genome sequencing projects. Most of today's knowledge regarding the mammalian Y
chromosome is based on the three sequenced primate (human, chimpanzee and rhesus macaque) Y
chromosomes (Skaletsky et al. 2003, Hughes et al. 2010, Hughes and Rozen 2012, Hughes et al.
2012) and the partially sequenced mouse (Alföldi 2008) and bovine Y chromosomes (Chang et al.
2013b).
The Bos taurus Y chromosome (BTAY) is ~51 Mb in size and is the smallest chromosome in the
genome (Liu and Ponce de León 2007). The PAR is ~6 Mb (Das et al. 2009), and the MSY is ~45 Mb.
Cytogenetically, the size and morphology of the Y chromosome differ among bovid lineages (Di
Meo et al. 2005). BTAY is submetacentric, while the zebu (Bos indicus, BIN) and river buffalo
(Bubalus bubalis, BBU) Y chromosomes are acrocentric (Kieffer and Cartwright 1968). This
morphological difference is the consequence of Y chromosomal rearrangements through either
centromeric transposition or pericentric inversion as revealed by comparative fluorescent in situ
hybridization (FISH) (Di Meo et al. 2005). By using Y-linked repetitive sequences as FISH painting
probes, Di Meo and coworkers found that the Y chromosome in different bovid lineages has
undergone genomic rearrangements and accumulated various classes of repetitive sequences
during bovid evolution (Di Meo et al. 2005). The bovine Y is being sequenced
(http://www.ncbi.nlm.nih.gov/bioproject/20275), and a draft sequence assembly of ~ 43.3 Mb is
available (GenBank acc. no. CM001061.2).
Recent transcriptome analysis of the bovine MSY (bMSY) identified a total of 1,274 protein-coding genes/families
and 367 additional non-coding RNA (ncRNA) families, making the bMSY gene density (~31.2
genes/Mb) the highest in the genome, in comparison with ~9.4 genes/Mb for the bovine X (BTAX)
and ~10.2 genes/Mb for the entire genome (Chang et al. 2013b). The discovery of this higher gene
density, along with the high transcriptional activity observed for these Y chromosome genes (see
below), challenges the widely accepted hypothesis that the MSY is gene-poor and transcriptionally
inert.
Y-chromosome polymorphisms (STR and SNPs)
There are two classes of chromosomal markers with respect to mutation rate: unique-event and
recurrent-event markers. Unique mutations (with a mutation rate of about 1 x 10^-9 per site per
generation), for example SNPs and insertion-deletion events, are considered to occur only once in
the evolutionary history of a species (Hammer 1994; Thomson et al. 2000). In the MSY these binary
polymorphisms can be combined into haplogroups with a monophyletic origin, and the
phylogenetic relationship can thus be depicted in a single most parsimonious tree (a tree with the
least mutations to explain the topology). Microsatellites or short tandem repeats (STRs) are
tandemly repeated units of 1-6 bp scattered in the genome. They are considered to be recurrent
event markers because of their high mutation rate (about 2 x 10^-3) (Ellegren 2000; Kayser et al. 2000).
Their high level of polymorphism makes STRs useful, for example, for paternity testing and for
detecting recent population differentiation. STRs on the Y chromosome (outside the PAR) can
be combined into haplotypes, but because of the high level of homoplasy (identity by state but not
by descent) phylogenetic reconstruction is difficult. Given their different levels of polymorphism,
combining information from the two classes of mutations makes the Y chromosome a powerful tool
to detect male population structure and differentiation on different time-scales (Jobling and
Tyler-Smith 1995; Mitchell and Hammer 1996). Unique and recurrent mutations can be combined to
reveal Y chromosome genealogies. Unique mutations define deep lineages (haplogroups) while
recurrent mutations define terminal haplotypes (de Knijff et al. 1997). In this way many recurrent
events in a population causing loops in a network can be resolved if considered in their respective
haplogroups because each haplogroup is founded by a single male with no variation at the
multiallelic loci.
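
The nesting of STR haplotypes within SNP-defined haplogroups can be illustrated with a minimal Python sketch. The animal identifiers, the haplogroup labels (HG_A, HG_B) and the repeat counts are invented for illustration and do not come from any published data set. The sketch only shows how identical STR profiles arising in different haplogroups (candidate homoplasy) stay separate once they are grouped under their haplogroup.

```python
from collections import defaultdict

# Hypothetical genotypes: one SNP-defined haplogroup label per male plus
# repeat counts at two Y-STR loci (all names and values are invented).
SAMPLES = [
    ("bull_1", "HG_A", (12, 15)),
    ("bull_2", "HG_A", (12, 16)),
    ("bull_3", "HG_A", (12, 15)),
    ("bull_4", "HG_B", (12, 15)),  # same STR profile as bull_1, other haplogroup
    ("bull_5", "HG_B", (13, 15)),
]


def nest_haplotypes(samples):
    """Group STR haplotypes inside the haplogroup set by unique-event markers."""
    nested = defaultdict(lambda: defaultdict(list))
    for animal, haplogroup, str_profile in samples:
        nested[haplogroup][str_profile].append(animal)
    return {hg: dict(profiles) for hg, profiles in nested.items()}


def shared_str_profiles(nested):
    """Return STR profiles seen in more than one haplogroup (candidate homoplasy)."""
    seen = defaultdict(set)
    for haplogroup, profiles in nested.items():
        for str_profile in profiles:
            seen[str_profile].add(haplogroup)
    return {p: sorted(hgs) for p, hgs in seen.items() if len(hgs) > 1}


nested = nest_haplotypes(SAMPLES)
for haplogroup, profiles in sorted(nested.items()):
    print(haplogroup, profiles)
print("Shared across haplogroups:", shared_str_profiles(nested))
```

Treating the haplogroup as the outer key mirrors the idea that each haplogroup starts from a single founder male, so STR variation is interpreted only within that lineage.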
Factors affecting Y-chromosome diversity
The main hindrance to studying Y chromosome polymorphisms in natural populations of
nonhuman mammals is the lack of available Y-specific markers, for two main reasons. First, there
is the common assumption that the Y chromosome has low genetic variability resulting from the
effect of selective sweeps and/or a reduced effective population size (Hellborg and Ellegren, 2004).
As in mitochondrial DNA, the nonrecombining Y chromosome has an effective population size
one-quarter that of autosomal DNA and therefore is predicted to be sensitive to genetic drift (Petit et al.,
2002). Second, there are technical difficulties in finding Y-specific genetic markers due to the high
occurrence of gene conversion, degeneration, and repetitive sequences on this chromosome (Rozen
et al., 2003; Skaletsky et al., 2003). Intraspecific diversity as well as interspecific divergence can be
estimated by pi (π) or theta (θ). Theta is the proportion of nucleotide sites that are polymorphic in a
sample. Pi is the average number of nucleotide differences per site between two randomly chosen
sequences in the sample. While θ is a measure of nucleotide polymorphism in a sample and can be
corrected for sequence length and sample size, π measures the nucleotide diversity with regard to
the frequencies of the different alleles. Under neutrality, these estimates should be the same, but selection
and population structure will affect the estimates of π and θ in different ways (Li 1997). Levels of
genetic variation in the Y chromosome may differ from that of the rest of the genome for a number
of reasons discussed below.
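
As a worked illustration of the two diversity measures, the following Python sketch computes π and a per-site θ from a small set of aligned sequences. The four toy sequences are invented. The text does not name a specific estimator for the sample-size correction of θ, so the use of Watterson's estimator (number of segregating sites divided by the harmonic number a_n and by sequence length) is an assumption made here.

```python
from itertools import combinations

# Toy alignment of four equal-length sequences (invented for illustration).
SEQS = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAAAT", "AAAAACAAAA"]


def segregating_sites(seqs):
    """Count alignment columns that carry more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """Pi: mean number of pairwise differences per site."""
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total_diffs / (len(pairs) * len(seqs[0]))


def watterson_theta(seqs):
    """Theta per site: segregating sites corrected for sample size and length."""
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))  # harmonic number for n sequences
    return segregating_sites(seqs) / (a_n * len(seqs[0]))


print("S     =", segregating_sites(SEQS))
print("pi    =", round(nucleotide_diversity(SEQS), 4))
print("theta =", round(watterson_theta(SEQS), 4))
```

In this toy sample π is slightly larger than θ because one of the two variants is at intermediate frequency; π is sensitive to allele frequencies, whereas θ only counts whether a site is polymorphic.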
Effective Population Size: In sexual populations, half of the alleles are derived from females and
half from males. The number of chromosome variants maintained in the population is, among other
things, dependent on the effective population size (Ne) of each chromosome. In an ideal population
the relationship between Y, X and autosomes is Ne:3Ne:4Ne, suggesting an expected 1:3:4
relationship in diversity between the different chromosomes.
Mating systems: Different mating systems can cause differences in effective population size. Skewed
mating systems, for example in polygynous species, where one male mates with many females, will
affect the relative difference in effective population size between chromosomes. For example, the
Y:X:autosome relationship when one male mates with two females will be 1:5:6. If the ratio of
females to males is increased to ten, which is common in some species (McComb and Clutton-Brock
1994; Roed et al. 2002), the relationship would be 1:21:22.
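
The ratios quoted above can be reproduced by simply counting chromosome copies among the breeding animals: each male carries one Y, one X and two copies of each autosome, and each female carries two X and two copies of each autosome. The short Python sketch below does this counting; the function name is hypothetical. It is a copy-counting illustration only, and a formal effective-population-size treatment that weights the two sexes differently would give somewhat different values under a skewed sex ratio.

```python
from functools import reduce
from math import gcd


def copy_number_ratio(n_males, n_females):
    """Return the Y : X : autosome copy ratio among breeding animals."""
    y = n_males                             # one Y per male
    x = n_males + 2 * n_females             # one X per male, two per female
    autosome = 2 * (n_males + n_females)    # two copies in every animal
    divisor = reduce(gcd, (y, x, autosome))
    return (y // divisor, x // divisor, autosome // divisor)


for males, females in [(1, 1), (1, 2), (1, 10)]:
    print(f"{males} male : {females} female ->", copy_number_ratio(males, females))
```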
Selection: New mutations can be neutral, advantageous or disadvantageous. The probability of
fixation or elimination of the mutation in the population depends on the relative fitness of the new
phenotype. Exceptions occur for balancing selection, overdominance (where heterozygotes are
favored) and in limited populations. Negative selection will tend to eliminate disadvantageous
mutants or genotypes from the population and is the prevailing type of selection since the majority
of non-neutral mutations are deleterious or slightly deleterious. Positive selection increases the
probability for an advantageous mutation to become fixed in the population (Li 1997). However,
the chance of losing a new advantageous mutation from the population by random genetic drift
(change in allele frequency due to chance) can still be high (Hartl and Clark 1997). Selection at a
locus will also affect linked sites. In the absence of recombination, selection will tend to reduce
genetic variability at linked sites to the same extent as at the locus under selection. With
recombination, the effect becomes gradually smaller as the rate of recombination between the
selected locus and linked sites increases. In line with this thinking, levels of neutral variability have
been shown to correlate with recombination rate in humans (Nachman 2001), mice (Nachman
1997), plants (Stephan and Langley 1998) and fruit flies (Begun and Aquadro 1992). The Y
chromosome, which lacks recombination (except in the PAR), should be expected to have reduced
variation as compared to recombining chromosomes. Selective sweeps and background selection
may have severe effects on the MSY, where all sites are linked, compared to other genomic regions.
Selective Sweeps: A selective sweep, or the hitchhiking effect, is an effect of positive selection where a
favorable allele drives through the population to fixation together with its linked loci. This will
reduce the variation linked to the selected site and decrease the diversity in the population (Rice
1987). The impact of a selective sweep depends on the recombination rate and selection coefficient:
the lower the recombination rate and/or the higher the selection coefficient, the larger is the
genomic region affected by the sweep. In the Y chromosome, where 95% of the sites are linked, an
advantageous gene regulating a male specific trait, like one involved in spermatogenesis, may
sweep through the population and eliminate all variation in the MSY (Roldan and Gomendio 1999;
Wyckoff et al. 2000). Selective sweeps can produce Y chromosomes that are fixed within a species but
differ between species, while mutations will only slowly produce new variants in a population.
In contrast to the neutralist prediction of a positive correlation between intraspecific variation and
interspecific divergence, positive selection can lead to uncoupling of levels of polymorphism and
divergence (Li 1997).
Background selection: Background selection is an effect of negative selection where deleterious
mutations will be eliminated from the population together with their linked loci. This process, as
with selective sweeps, will reduce variation in the region around the selected site. As with sweeps,
the impact of background selection depends on the recombination rate and the selection coefficient.
Background selection is not thought to alter allele frequencies to the same extent as selective
sweeps, indicating that the two types of selection can be distinguished from each other
(Charlesworth et al. 1995). In a non-recombining region, mildly deleterious as well as weakly
advantageous alleles will survive linked to each other. In the absence of a strongly advantageous
mutation, a neutral or weakly selected mutation can only survive on a non-recombining
chromosome (like Y) if there is no strongly deleterious mutation, otherwise it will be eliminated
(Charlesworth, 1994).
Sex Specific Mutation Rates: Mutations are generated during DNA replication. The number of germ
cell divisions differs between spermatogenesis and oogenesis. In oogenesis, every mature oocyte has
gone through a total of 24 cell divisions irrespective of the age of the female. In spermatogenesis,
however, cell division is a continuous process, so the older the male, the more cell divisions his
sperm have gone through. For example, in a 20-year-old man every sperm has gone through about
150 cell divisions, and at the age of 40 about 610 cell divisions (Hurst and Ellegren 1998). The male-to-
female mutation rate ratio, m, is mostly dependent on the skewed number of cell divisions in the
germ lines of males and females. As m is generally larger than one, meaning that male germ cells
mutate more frequently than female germ cells, more mutations in the Y chromosome than in other
chromosomes can be predicted (Miyata et al. 1987). Estimates of m from X and Y comparisons
suggest that it co-varies with the mean age of reproduction: rodents (m=2) (Chang et al. 1994) <
felids (m=4) (Pecon Slattery and O'Brien 1998) < primates (m=6) (Chang et al. 1996).
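The arithmetic behind these figures can be sketched as follows. The division counts are the ones quoted above. The last function is our addition: it gives the expected Y-to-autosome mutation rate ratio, 2m/(1+m), under the simplifying assumption that the Y chromosome is always transmitted through males while an autosome spends half of its time in each sex. The function names are illustrative only.

```python
# Germ-line cell divisions quoted in the text (Hurst and Ellegren 1998)
OOCYTE_DIVISIONS = 24
SPERM_DIVISIONS = {20: 150, 40: 610}  # age in years -> divisions per sperm


def extra_divisions_per_year(by_age):
    """Average number of additional sperm-line divisions per year of age."""
    (age1, n1), (age2, n2) = sorted(by_age.items())
    return (n2 - n1) / (age2 - age1)


def division_ratio(age):
    """Male-to-female ratio of germ-line divisions at a given male age."""
    return SPERM_DIVISIONS[age] / OOCYTE_DIVISIONS


def y_to_autosome_rate(m):
    """Expected Y/autosome mutation-rate ratio for a male bias m.

    Assumption: Y is always in males; an autosome is in males half the time.
    """
    return 2 * m / (1 + m)


print(extra_divisions_per_year(SPERM_DIVISIONS))  # 23.0
print(division_ratio(20))                         # 6.25
for m in (2, 4, 6):
    print(m, round(y_to_autosome_rate(m), 2))     # 1.33, 1.6, 1.71
```

Under this simple expectation the Y chromosome mutates faster than the autosomes for any m above one, but the ratio can never exceed two however large m becomes.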
Other factors affecting Y diversity: Differences in migration between males and females can produce
variation in the patterns of genetic differentiation detected in maternally and paternally inherited
systems. In a patrilocal species, where females migrate more than males, this will imply less
variation in the Y chromosome locally. In a global perspective, this will lead to higher
differentiation in the Y chromosome than in the maternally inherited mitochondrial DNA (mtDNA)
(Seielstad et al. 1998). Spermatogenesis and sperm motility are energy demanding processes;
therefore, the function of the mitochondria is vital for reproductive success. Deleterious mutations
in the mtDNA that affect the energy production negatively would be expected to lead to impaired
reproduction and, consequently, reduced effective population size among males. This will lower
the effective population size of Y chromosomes and reduce their diversity (Gemmell and Sin 2002).
Origin and domestication of cattle
Y chromosome analyses have long been used to study the process of domestication. The aurochs, or
the wild ox (Bos primigenius), extinct since 1627, was once widespread throughout Europe, northern
Africa, and southern Asia during the Pleistocene and Holocene periods. Modern cattle were
probably domesticated from the wild aurochs in the Near East and Asia around 10,000 years
ago (Anderung et al., 2007; Freemann et al., 2006; Götherström et al., 2005). The estimated time of
divergence between Bos taurus and Bos indicus ranges from 117 000 to 275 000 years according to
mtDNA analyses (Bradley et al., 1996) and from 610 000 to 850 000 years according to microsatellite
data analyses (MacHugh et al., 1997). A more recent study based on mtDNA data indicated that the
time of divergence between B. taurus and B. indicus could be about 1.7 to 2.0 million years ago
(Hiendleder et al., 2008). B. taurus and B. indicus cattle were domesticated
independently from the aurochs in the Near East and in the Indian subcontinent, respectively,
around 10,000 years ago (Beja-Pereira et al., 2006; Bradley et al., 1998). Subsequently, cattle
accompanied human migrations, which led to the dispersal of domestic cattle of taurine, indicine,
or mixed origin over Asia, Africa, Europe, and the New World (Ajmone-Marsan et al., 2010).
Bovine Y-chromosome variations
Cattle Y-chromosome studies are generally affected by a lack of powerful sources of information.
There are limited numbers of informative segregating sites and polymorphic Y specific
microsatellites (Ginja et al., 2009; Götherström et al., 2005). The first analyses of the Y-chromosome
explored the karyological features of different species; the chromosome was identified as
metacentric/submetacentric in taurine (Bos taurus) and acrocentric in zebu/indicine (Bos indicus)
cattle (Potter and Upton, 1979; Halnan and Watson, 1982). Studies on cattle Y-chromosomes
have mainly focused on the assessment of male-mediated migration patterns and
admixture between B. taurus and B. indicus (Hanotte et al., 2000; Anderung et al., 2007; Edwards et
al., 2007) or the assessment of differences in diversity among different breeds (Ginja et al., 2009;
Kantanen et al., 2009). Y chromosome-specific markers have contributed a large share to the
understanding of the origin, relationships, and paternal inheritance of native breeds (Edwards et al.,
2000; Hellborg and Ellegren, 2004; Li et al., 2007). Y chromosome-specific markers are preferred for
testing paternity, examining contamination risks of DNA samples (analysis of the male component in
male/female mixtures), and handling criminal cases (Jobling et al., 1997; Jobling, 2001).
Götherström et al. (2005) identified five polymorphic sites (Table 1) on the cattle Y-chromosome,
allowing identification of three haplotypes, viz. Y1, Y2 and Y3 (Table 2), in contemporary cattle,
with Y1 being more frequent in north-western European B. taurus, Y2 being dominant in southern
European B. taurus and Anatolian cattle, and Y3 being exclusive to B. indicus. Recently, Li et al. (2013)
studied the Y chromosome genetic diversity and paternal origin of Chinese cattle, including 369
bulls from 17 Chinese native cattle breeds, 30 Holstein bulls and four bulls from Burma. In
total, the taurine Y1 and Y2 haplogroups and the indicine Y3 haplogroup were detected in 7 (1.9 %), 193
(52.3 %) and 169 (45.8 %) individuals of the 17 Chinese native breeds, respectively. Y2 was observed to
dominate in northern China (91.4 %) and Y3 in southern China (81.2 %), while Central China was an
admixture zone with Y2 predominating overall (72.0 %). The results also demonstrated that Chinese
cattle have two paternal origins, one from B. taurus (Y2) and the other from B. indicus (Y3). The Y1
haplogroup might have originated from beef cattle breeds imported from western countries.
Interestingly, the geographical distributions of the Y2 and Y3 haplogroup frequencies reveal a
pattern of male indicine introgression from south to north China, and male taurine introgression
from north to south China. SNP markers have also been used to identify genetic variations in both
the X and Y chromosomes of taurine and zebu cattle breeds in Africa and were reported to be useful in
determining zebu admixture in African cattle breeds (Anderung et al., 2007).
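As a small worked example, the haplogroup percentages quoted above for the 369 Chinese native bulls follow directly from the counts. The sketch below also computes Nei's haplotype diversity, h = n/(n-1) * (1 - sum of squared frequencies), a standard summary statistic that we add here for illustration; it is not a value reported in the text, and the function names are ours.

```python
# Haplogroup counts in Chinese native bulls as quoted above (Li et al. 2013)
counts = {"Y1": 7, "Y2": 193, "Y3": 169}


def frequencies(counts):
    """Relative frequency of each haplogroup."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def haplotype_diversity(counts):
    """Nei's unbiased haplotype diversity for a sample of n chromosomes."""
    n = sum(counts.values())
    freqs = frequencies(counts)
    return n / (n - 1) * (1 - sum(p * p for p in freqs.values()))


for name, p in frequencies(counts).items():
    print(f"{name}: {100 * p:.1f} %")          # Y1: 1.9 %, Y2: 52.3 %, Y3: 45.8 %
print(round(haplotype_diversity(counts), 3))   # 0.518
```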
The combined investigation of Y-chromosome SNPs and microsatellite alleles, which are highly
conserved (i.e., ~10^-8 per site per generation in humans) and highly mutable (i.e., ~10^-3 per locus per
generation in humans), respectively (Hurles and Jobling, 2001), facilitates the assessment of
Y-haplotype diversity among species and the taxonomic origins of the genes. Y chromosome-specific
single nucleotide polymorphisms (SNPs) and microsatellite markers have therefore been combined and
used to investigate genetic diversity and origins in cattle (Bradley et al., 1994; Budowle et al.,
2005; Cai et al., 2006; Yang et al., 2011), dogs (Bannasch et al., 2005; Erdoğan et al., 2013), sheep
(Niemi et al., 2013), and human populations of different regions (Cinnioglu et al., 2004; Rootsi et al.,
2004). More recently, the combination of 5 SNPs, 1 indel, and 7 STRs identified 13 Y-chromosome
specific haplotypes in Portuguese native cattle breeds (Ginja et al., 2009). The 13 Y-haplotypes
included 3 previously described patrilines (Y1, Y2, and Y3) and 10 new haplotypes within Bos
taurus. Native cattle contained most of the diversity with 7 haplotypes (H2Y1, H3Y1, H5Y1, H7Y2,
H8Y2, H10Y2, and H12Y2). H6Y2 and H11Y2 occurred in high frequency across breeds including
the exotics and thus had a common genetic signature (Ginja et al., 2009).
The genetic diversity of the Y chromosome has been found to be lower than that of the
autosomes (Liu et al., 2003; Hellborg and Ellegren, 2004; Ginja et al., 2009). Relatively low levels
of Y-chromosome genetic diversity have been reported in several mammalian species including
cattle (Hellborg and Ellegren 2004; Lindgren et al. 2004; Bannasch et al., 2005; Meadows et al., 2006;
Li et al., 2007). In the case of domestic animals, the effective Y chromosome contribution tends to be
reduced because a few selected males that produce a large number of offspring are commonly
used in breeding schemes (Hellborg and Ellegren 2004). For example, a demographic analysis of the
native Portuguese cattle breed Alentejana indicates that, from an original number of 671 founder
sires, only 24 Y chromosomes are currently represented, with an effective number of 2.73 males
(Carolino and Gama 2008). Despite these limitations, studies of male lineages contribute to a better
understanding of the origin and relationships among domestic breeds (Edwards et al., 2000;
Lindgren et al., 2004; Anderung et al., 2005; Götherström et al., 2005; Li et al., 2007). Details of
Y-chromosome-specific cattle STRs, including primer sequences, annealing temperature (Ta) and
fluorescence labels, are listed in Table 3 and are available from previous studies (Bishop et al.,
1994; Götherström et al., 2005; Vaiman et al., 1994; Kappes et al., 1997; Liu et al., 2002).
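To illustrate why an effective number of males can be far smaller than the number of Y chromosomes still represented, the sketch below uses one common definition of an effective number, the reciprocal of the sum of squared proportional contributions. Both the definition chosen and the descendant counts are assumptions made for illustration; they are hypothetical numbers, not the Alentejana data of Carolino and Gama (2008).

```python
def effective_number(contributions):
    """Effective number of lineages: 1 / sum(p_i^2), where p_i is the
    proportional contribution of lineage i (one common definition)."""
    total = sum(contributions)
    return 1.0 / sum((c / total) ** 2 for c in contributions)


# Hypothetical numbers of living male descendants for 24 sire lines
equal_lines = [10] * 24                   # every line contributes equally
skewed_lines = [500, 300, 100] + [5] * 21  # three popular sires dominate

print(round(effective_number(equal_lines), 2))   # 24.0
print(round(effective_number(skewed_lines), 2))  # 2.88
```

With equal contributions the effective number equals the number of lines, whereas a few heavily used sires pull it down to a value of the same order as the one reported in the text.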
Y chromosome study in other domestic animals
To date, however, few phylogenetic surveys involving the Y chromosome have been reported in
domestic species because of a lack of MSY variation. Indeed, very low levels of nucleotide diversity have
been reported within the MSY of horse (Lindgren et al., 2004), cattle (Hellborg and Ellegren, 2004),
and sheep (Meadows et al., 2006). In goat, recent studies based on mitochondrial DNA analyses
revealed a complete pattern of caprine domestication (Luikart et al., 2001; Naderi et al., 2007). In the
ECONOGENE project (http://econogene.eu/), sequence variation on the Y chromosome was
used together with information from mitochondrial and autosomal DNA to study the genetic
diversity of several goat breeds.
Future perspective
In the post-genomic era, whole Y-chromosome sequencing holds the promise of stretching the
paternal phylogeny to its maximal resolution. The absence of recombination enables all
Y-chromosome sequences to be placed within a single phylogenetic tree, although a single-locus hierarchy
may oversimplify the demographic history of a particular breed or individual. In the Y chromosome, the
order in which sequence variants have accumulated since the most recent common ancestor is preserved.
Because of this molecular encapsulation of male demographic history, Y-chromosome phylogeny has
become one of the pillars of archaeogenetics. Although it is possible to infer phylogenetic
relationships from the STRs and SNPs identified to date, the number of haplogroups that
may be informative for deriving Y chromosome-based phylogenies is expected to be much higher.
Also, reports based on Y-STR networks make it clear that several phylogenetic groups, which may be
relevant for applications of the Y-chromosomal tree, still cannot be distinguished.
Identification of thousands of as yet unknown Y-SNPs through whole-genome studies is the best resource
for deducing Y chromosome-based diversity and phylogeography. Informative markers
derived from whole-genome sequences, followed by genotyping in larger panels, can be used to
precisely delineate patterns of restricted geographic and/or population specificity. With
advancements in sequencing technology, a plethora of sequence information on the Y chromosome
will become available, leading to a rapidly growing number of new Y-SNPs and (sub-)haplogroups.
Whole-genome Y-SNP profiles will facilitate better resolution of the Y-chromosomal
phylogenetic tree and help us to understand our animal genetic resources in a better way.
Table 1. Polymorphic sites on the cattle Y-chromosome

Locus         Region      SNPs         GenBank #
DDX3Y-1       Intron 1    425 > C/T    AY928816
DDX3Y-7       Intron 7    123 > C/T    AY928819
UTY-19        Intron 19   423 > C/A    AY936543
ZFY-9         Intron 9    120 > C/T    AY928828
ZFY-10        Intron 10   665 > C/T    AF241271
ZFY_10indel   Intron 10   704 > -/GT   AF241271

Table 2. Description of bovine Y chromosome haplotypes

                         Marker
Haplotype   Origin       ZFY-9   DDX3Y-1   ZFY-10   DDX3Y-7   UTY-19
Y1          B. taurus    C       C         C        C         C
Y2          B. taurus    C       C         C        C         A
Y3          B. indicus   T       T         T        T         A
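The allele states in Table 2 can be used as a simple lookup for assigning a typed bull to one of the three haplotypes. The sketch below is a minimal illustration; the dictionary is transcribed from Table 2, while the function name and the rule of returning None for ambiguous or unmatched genotypes are our own assumptions.

```python
# Allele states per haplotype, transcribed from Table 2
HAPLOTYPES = {
    "Y1": {"ZFY-9": "C", "DDX3Y-1": "C", "ZFY-10": "C", "DDX3Y-7": "C", "UTY-19": "C"},
    "Y2": {"ZFY-9": "C", "DDX3Y-1": "C", "ZFY-10": "C", "DDX3Y-7": "C", "UTY-19": "A"},
    "Y3": {"ZFY-9": "T", "DDX3Y-1": "T", "ZFY-10": "T", "DDX3Y-7": "T", "UTY-19": "A"},
}


def call_haplotype(genotype):
    """Return the haplotype consistent with all typed markers.

    `genotype` maps marker name -> observed allele; untyped markers may be
    omitted. Returns None when zero or several haplotypes match.
    """
    matches = [
        name
        for name, alleles in HAPLOTYPES.items()
        if all(alleles.get(marker) == allele for marker, allele in genotype.items())
    ]
    return matches[0] if len(matches) == 1 else None


print(call_haplotype({"UTY-19": "C"}))                # Y1
print(call_haplotype({"UTY-19": "A", "ZFY-9": "C"}))  # Y2
print(call_haplotype({"UTY-19": "A", "ZFY-9": "T"}))  # Y3
print(call_haplotype({"UTY-19": "A"}))                # None (Y2 or Y3)
```

The last call shows that UTY-19 alone separates Y1 from the other two haplotypes, while one of the C/T markers is needed in addition to tell Y2 from Y3.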

Table 3. Primer sequences, annealing temperature (Ta), fluorescence label and references for Y chromosome STR loci

Locus name     Primer sequence (5'-3')      Ta (°C)   Label   Reference
BM861-F        TTGAGCCACCTGGAAAGC           58        Ned     Bishop et al. (1994)
BM861-R        CAAGCGGTTGGTTCAGATG
DDX3Y1STR-F    TGAACCACTAGGGAGGTCATC        58        Fam     Götherström et al. (2005)
DDX3Y1STR-R    TTCCAATTTAGCTGTGGTTATCTG
INRA124-F      GATCTTTGCAACTGGTTTG          58        Vic     Vaiman et al. (1994)
INRA124-R      CAGGACACAGGTCTGACAATG
INRA126-F      GTTGTTGCCTCTGCAGAGTAGG       58        Fam     Vaiman et al. (1994)
INRA126-R      GACACTCTTTCTATTTTCAAGG
INRA189-F      TACACGCATGTCCTTGTTTCGG       55        Fam     Kappes et al. (1997)
INRA189-R      CTCTGCATCTGTCCTGGACTGG
UMN0103-F      ACACAGAGTATTCACCTGAG         58        Fam     Liu et al. (2002)
UMN0103-R      ATTTACCTGGGTCAAAGCAC
UMN0307-F      GATACAGCTGAGTGACTAAC         58        Vic     Liu et al. (2002)
UMN0307-R      GTGCAGACATCTGAGCTGTG
UMN0504-F      AGGCCATCTGCATAGTGAAG         58        Ned     Liu et al. (2002)
UMN0504-R      TGCTGGACTGCTCATCTCTG
UMN0920-F      GTTGAGGACTCTTGCATCTG         55        Ned     Liu et al. (2002)
UMN0920-R      CACAGGCCTAGAAGATTGAG
UMN2001-F      TCAGGCAAGACTACTGGAGC         55        Fam     Liu et al. (2002)
UMN2001-R      TACCCTGGCGATTCTGCAA
UMN2303-F      TACTTGCTTGAGACTTACTG         58        Ned     Liu et al. (2002)
UMN2303-R      TGTGAACACATCTGATTCTG
UMN2404-F      GGTACAATTGAAAATATG           58        Fam     Liu et al. (2002)
UMN2404-R      TGTACCTACACTGATATGTT
UMN3008-F      TTGTGGAGGACTATTCATGG         58        Fam     Liu et al. (2002)
UMN3008-R      TCTGGACTCGACAGGACACC

References

Ajmone-Marsan P., Garcia J.F., Lenstra J.A. and the GLOBALDIV Consortium 2010. On the origin of cattle:
how aurochs became cattle and colonized the world. Evol Anthropol 19: 148.
Anderung C., Bouwman A., Persson P., Carretero J.M., Ortega A.I., Elburg R., Smith C., Arsuaga J.L., Ellegren
H., Götherström A. 2005. Prehistoric contacts over the Straits of Gibraltar indicated by genetic analysis
of Iberian Bronze Age cattle. Proc. Natl. Acad. Sci. USA. 102: 8431.
Anderung C., Hellborg L., Seddon J., Hanotte O., Götherström A. 2007. Investigation of X- and Y-specific
single nucleotide polymorphisms in taurine (Bos taurus) and indicine (Bos indicus) cattle. Anim. Genet.
38: 595.
Bannasch D.L., Bannasch M.J., Ryun J.R., Famula T.R., Pedersen N.C. 2005. Y chromosome haplotype analysis
in purebred dogs. Mamm Genome 16: 273.
Beja-Pereira A., Caramelli D., Lalueza-Fox C., Vernesi C., Ferrand N., Casoli A., Goyache F., Royo L., Conti S.,
Lari M., Martini A., Ouragh L., Magid A., Atash A., Zsolnai A., Boscato P., Triantaphylidis C., Ploumi
K., Sineo L., Mallegni F., Taberlet P., Erhardt G., Sampietro L., Bertranpetit J., Barbujani G., Luikart G.,
Bertorelle G. 2006. The origin of European cattle: evidence from modern and ancient DNA. Proc. Natl.
Acad. Sci. USA. 103: 8113.
Bradley D., Loftus R., Cunningham P., MacHugh D. 1998. Genetics and domestic cattle origins. Evol Anthropol
6: 79.
Bradley D.G., MacHugh D.E., Cunningham P. and Loftus R.T. 1996. Mitochondrial diversity and the origins
of African and European cattle. Proc Natl Acad Sci USA. 93: 5131.
Bradley D.G., MacHugh D.E., Loftus R.T., Sow R.S., Hoste C.H. and Cunningham E.P. 1994. Zebu-taurine
variation in Y chromosomal DNA: a sensitive assay for genetic introgression in West African
trypanotolerant cattle populations. Anim Genet 25: 7.
Budowle B., Adamowicz M., Aranda X.G., Barna C., Chakraborty R., Cheswick D., Dafoe B., Eisenberg A.,
Frappier R., Gross A.M. et al. 2005. Twelve short tandem repeat loci Y chromosome haplotypes: genetic
analysis on populations residing in North America. Forensic Sci Int 150: 1.
Cai X., Chen H., Wang S., Xue K. and Lei C. 2006. Polymorphisms of two Y chromosome microsatellites in
Chinese cattle. Genet Sel Evol 38: 525.
Charlesworth B. 1994. The effect of background selection against deleterious mutations on weakly selected,
linked variants. Genet Res 63: 213.
Charlesworth D., Charlesworth B. and Morgan M.T. 1995. The pattern of neutral molecular variation under
the background selection model. Genetics 141: 1619.
Edwards C.J., Baird J.F. and MacHugh D.E. 2007. Taurine and zebu admixture in Near Eastern cattle: a
comparison of mitochondrial, autosomal and Y-chromosomal data. Anim. Genet. 38: 520.
Edwards C.J., Gaillard C., Bradley D.G. and MacHugh D.E. 2000. Y-specific microsatellite polymorphisms in a
range of bovid species. Anim Genet. 31: 127.
Erdoğan M., Tepeli C., Brenig B., Akbulut M.D., Uğuz C., Savolainen P. and Özbeyaz C. 2013. Genetic
variability among native dog breeds in Turkey. Turk J Biol 37: 176.
Freemann A., Hoggart C., Hanotte O. and Bradley D.G. 2006. Assessing the relative ages of admixture in
the bovine hybrid zones of Africa and the Near East using X chromosome haplotype
mosaicism. Genetics 173: 1503-1510.
Gemmell N.J. and Sin F.Y. 2002. Mitochondrial mutations may drive Y chromosome evolution. Bioessays 24:
275.
Ginja C., Telo da Gama L. and Penedo M.C.T. 2009. Y chromosome haplotype analysis in Portuguese cattle
breeds using SNPs and STRs. J. Hered. 100: 148.
Götherström A., Anderung C., Hellborg L., Elburg R., Smith C., Bradley D.G. and Ellegren H. 2005. Cattle
domestication in the Near East was followed by hybridization with aurochs bulls in Europe. Proc Biol
Sci. 272: 2345.
Halnan C.R.E. and Watson J.I. 1982. Y chromosome variants in cattle Bos taurus and Bos indicus. Ann Genet
Sel Anim 14: 1.
Hanotte O., Tawah C.L., Bradley D.G., Okomo M., Verjee Y., Ochieng J. and Rege J.E. 2000. Geographic
distribution and frequency of a taurine Bos taurus and an indicine Bos indicus Y specific allele amongst
sub-Saharan African cattle breeds. Mol Ecol. 9: 387.
Hellborg L. and Ellegren H. 2004. Low levels of nucleotide diversity in mammalian Y chromosomes.
Mol Biol Evol 21: 158.
Hiendleder S., Lewalski H. and Janke A. 2008. Complete mitochondrial genomes of Bos taurus and Bos
indicus provide new insights into intra-species variation, taxonomy and domestication. Cytogenet
Genome Res 120(1-2): 150.
Hurst L.D. and Ellegren H. 1998. Sex biases in the mutation rate. Trends Genet 14: 446-452.
Jobling M.A. and Tyler-Smith C. 1995. Fathers and sons: the Y chromosome and human evolution. Trends
Genet 11: 449.
Jobling M.A., Pandya A. and Tyler-Smith C. 1997. The Y chromosome in forensic analysis and paternity
testing. Int J Legal Med 110: 118.
Jobling M.A. 2001. In the name of the father: surnames and genetics. Trends Genet 17: 353.
Kantanen J., Edwards C.J., Bradley D.G., Viinalass H., Thessler S., Ivanova Z. et al. 2009. Maternal and
paternal genealogy of Eurasian taurine cattle (Bos taurus). Heredity doi:10.1038/hdy.2009.68
Li M.H., Zerabruk M., Vangen O., Olsaker I. and Kantanen J. 2007. Reduced genetic structure of north
Ethiopian cattle revealed by Y-chromosome analysis. Heredity 98: 214.
Li R., Xie W.M., Chang Z.H., Wang S.Q., Dang R.H., Lan X.Y., Chen H. and Lei C.Z. 2013. Y chromosome
diversity and paternal origin of Chinese cattle. Mol Biol Rep. 40(12): 6633-6. doi: 10.1007/s11033-013-2777-y.
Lindgren G., Backstrom N., Swinburne J., Hellborg L., Einarsson A., Sandberg
K., Cothran G., Vila C., Binns M. and Ellegren H. 2004. Limited number of patrilines in horse
domestication. Nat Genet. 36: 335.
Liu W.S., Beattie C.W., Ponce de Leon F.A. 2003. Bovine Y chromosome microsatellite
polymorphisms. Cytogenet Genome Res. 102: 53-58.
Luikart G., Gielly L., Excoffier L., Vigne J.D., Bouvet J. and Taberlet P. 2001. Multiple maternal origins and
weak phylogeographic structure in domestic goats. Proc. Natl. Acad. Sci. USA. 98: 5927-5932.
MacHugh D.E., Shriver M.D., Loftus R.T., Cunningham P. and Bradley D.G. 1997. Microsatellite DNA
variation and the evolution, domestication and phylogeography of taurine and zebu cattle (Bos taurus
and Bos indicus). Genetics 146: 1071.
McComb K. and Clutton-Brock T. 1994. Is mate choice copying or aggregation responsible for skewed
distributions of females on leks? Proc R Soc Lond B Biol Sci 255: 13.
Meadows J.R., Hanotte O., Drogemuller C., Calvo J., Godfrey R., Coltman D., Maddox J.F., Marzanov N.,
Kantanen J., Kijas J.W. 2006. Globally dispersed Y chromosomal haplotypes in wild and domestic
sheep. Anim Genet. 37: 444.
Mitchell R.J. and Hammer M.F. 1996. Human evolution and the Y chromosome. Curr Opin Genet Dev 6: 737.
Miyata T., Hayashida H., Kuma K., Mitsuyasu K. and Yasunaga T. 1987. Male-driven molecular evolution: a
model and nucleotide sequence analysis. Cold Spring Harb Symp Quant Biol 52: 863-867.
Naderi S., Rezaei H.R., Taberlet P., Zundel S., Rafat S.A., Naghash H.R., El-Barody M.A., Ertugrul O. and
Pompanon F. 2007. Large-scale mitochondrial DNA analysis of the domestic goat reveals six
haplogroups with high diversity. PLoS One 2: e1012.
Niemi M., Bläuer A., Iso-Touru T., Nyström V., Harjula J., Taavitsainen J.P., Storå J., Lidén K. and Kantanen J.
2013. Mitochondrial DNA and Y-chromosomal diversity in ancient populations of domestic sheep
(Ovis aries) in Finland: comparison with contemporary sheep breeds. Genet Sel Evol 45: 2.
Pidancier N., Jordan S., Luikart G. and Taberlet P. 2006. Evolutionary history of the genus Capra
(Mammalia, Artiodactyla): discordance between mitochondrial DNA and Y-chromosome phylogenies.
Mol. Phylogenet. Evol. 40: 739-749.
Potter W.L., Upton P.C. 1979. Y chromosome morphology of cattle. Aust Vet J 55: 539.
Rice W.R. 1987. Genetic hitchhiking and the evolution of reduced genetic activity of the Y sex chromosome.
Genetics 116: 161.
Roed K.H., Holand O., Smith M.E., Gjostein H., Kumpula J. and Nieminen M. 2002. Reproductive success in
reindeer males in a herd with varying sex ratio. Mol Ecol 11: 1239.
Roldan E.R. and Gomendio M. 1999. The Y chromosome as a battle ground for sexual selection. Trends in
Ecology and Evolution 14: 58-62.
Rozen S., Skaletsky H., Marszalek J.D., Minx P.J., Cordum H.S., Waterston R.H., Wilson R.K. and Page D.C.
2003. Abundant gene conversion between arms of palindromes in human and ape Y chromosomes.
Nature 423: 873.
Seielstad M.T., Minch E. and Cavalli-Sforza L.L. 1998. Genetic evidence for a higher female migration rate in
humans. Nat Genet 20: 278.
Skaletsky H., Kuroda-Kawaguchi T., Minx P.J., Cordum H.S., Hillier L., Brown L.G. et al. 2003. The
male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature 423: 825-837.
Verkaar E.L.C., Nijman I.J., Beeke M., Hanekamp E., Lenstra J.A. 2004. Maternal and paternal lineages in
cross-breeding bovine species. Has Wisent a hybrid origin? Mol. Biol. Evol. 21: 1165.
Wyckoff G.J., Wang W. and Wu C.I. 2000. Rapid evolution of male reproductive genes in the descent of man.
Nature 403: 304.
Yang Y., Chang T.C., Yasue H., Bharti A.K., Retzel E.F., Liu W.S. 2011. ZNF280BY and ZNF280AY: autosome
derived Y-chromosome gene families in Bovidae. BMC Genomics 12: 13.


11
Candidate Gene Polymorphism Approaches for Detection and Genotyping
R S Kataria and S K Niranjan
ICAR-National Bureau of Animal Genetic Resources, Karnal (Haryana)
____________________________________________________________________________________________

Variations in the genomic sequence, along with various environmental forces, influence the
evolutionary process in different species. Efforts have long been made to exploit the genetic
variation or polymorphism among and within species for the benefit of mankind through selection
for better performance. Genetic polymorphism can be defined as the occurrence of two or more
alleles at the same locus in the same population, each with appreciable frequency. The locus is said to
be polymorphic and the population to exhibit polymorphism for that locus. A polymorphic locus is
usually defined as one for which the frequency of the most common allele is less than 0.99. At the
DNA sequence level there are two basic kinds of polymorphism: the first due to replacement of DNA
bases and the second due to insertion or deletion of base pairs. The term polymorphism is different
from mutation, which is generally used to refer to a change in DNA sequence that is not present in
most individuals of a species. During the last decade, genetic polymorphism has found an important
place in livestock genomics for developing molecular markers for early selection of animals, and
revealing polymorphisms at the DNA level is now a key activity in animal genetics. This has also been
used widely for studying the genetic variation existing within and between species, for parentage
verification and for understanding the process of evolution. Before the advent of next-generation
sequencing tools and genome-wide polymorphism discovery, nucleotide variations
within functional genes were exploited largely as selection markers, some of them finding
their way into commercial use. Candidate gene polymorphism with
significant effects on production traits has been exploited widely in livestock species in the past.
The utility of such molecular markers is governed by two major factors: the genotyping protocol, which
should be as simple as possible, and the cost of genotyping, which should be low, so that large
amounts of data can be generated for association and selection.
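The 0.99 criterion for a polymorphic locus given above can be checked directly from genotype data. The sketch below is a minimal illustration for a diploid locus; the function names and the genotype lists are hypothetical and are not taken from any data set discussed here.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotypes given as 2-tuples of alleles."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_polymorphic(genotypes, threshold=0.99):
    """True if the most common allele has a frequency below the threshold."""
    return max(allele_frequencies(genotypes).values()) < threshold


# Hypothetical samples of 100 animals each
variable_locus = [("C", "C")] * 95 + [("C", "T")] * 5   # C = 0.975, T = 0.025
fixed_locus = [("C", "C")] * 100                        # C = 1.0

print(allele_frequencies(variable_locus))  # {'C': 0.975, 'T': 0.025}
print(is_polymorphic(variable_locus))      # True
print(is_polymorphic(fixed_locus))         # False
```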
Single Nucleotide Polymorphism
The discovery of one kind of genetic polymorphism, referred to as single nucleotide polymorphisms
(SNPs), has paved new ways to harness genomic variation for developing markers most
suitable for faster genetic gain in livestock species. SNPs are stable genetic markers and have a low
mutation rate in comparison with other genetic markers. Even though SNPs are bi-allelic
co-dominant markers and individually less informative than multiallelic microsatellites, they are still
considered important because of their high density in the genome and the comparatively low cost of
genotyping. The advent of newer high-throughput, low-cost technologies like nanopore sequencing
will pave the way for their better use as markers for future genomic selection in the livestock sector.
Depending upon their location and nature, SNPs may be synonymous, i.e. present in a
coding region but not resulting in a change of amino acid because of the degeneracy of triplet codons,
or non-synonymous, i.e. present in a coding region and resulting in a change of amino
acid. Non-synonymous SNPs are of two further types: missense, where the change
results in a different amino acid, and nonsense, where the change results in a premature stop
codon. Two out of every three SNPs involve the replacement of C with T. SNPs occur in both the
coding and non-coding regions of the genome. The coding region SNPs may result in mutations
affecting protein function or in neutral mutations, which do not affect protein
function. The SNPs located outside the coding regions may serve as useful markers because of their
close proximity to disease loci. Polymorphic nucleotides present in the 5' or 3' UTR and promoter
regions could affect the expression of genes and hence the phenotype of the animal. Several SNP
markers in intronic regions have also been shown to be associated with phenotypes,
including disease resistance.
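The classification of coding SNPs into synonymous, missense and nonsense changes described above can be sketched with the standard genetic code. The code below is only an illustration: the function name and the example codons are ours, and cases such as the loss of an existing stop codon are not treated separately.

```python
from itertools import product

# Standard genetic code, built in TCAG order for each codon position
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon, position, alt_base):
    """Classify a single-base change within a codon.

    codon: reference codon (DNA, coding strand); position: 0, 1 or 2;
    alt_base: the alternative base at that position.
    """
    mutated = codon[:position] + alt_base + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


print(classify_coding_snp("GAA", 2, "G"))  # synonymous (Glu -> Glu)
print(classify_coding_snp("GAA", 1, "T"))  # missense   (Glu -> Val)
print(classify_coding_snp("CAA", 0, "T"))  # nonsense   (Gln -> stop)
```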
Identification of SNPs
Sequencing is the method of choice for the detection of SNPs when a particular region
of the genome is targeted, such as a candidate gene of interest selected from a pathway governing the
trait of interest. Genomic DNA samples are selected from the most diverse possible
breeds, races, populations or individuals and, after PCR amplification of the region of interest,
are sequenced either as pools or individually. Next-generation sequencing
techniques have now made it possible to screen the whole genome, generating enormous amounts of data
that can be used to prepare SNP chips, even for whole-genome selection. Once the SNPs have been
identified, the various techniques given below can be used to genotype large
numbers of individuals in order to find associations of SNPs with the trait of interest or to study the
genetic variation at that particular locus. Some techniques, such as SSCP and PCR-RFLP, have been
used for detecting as well as genotyping SNPs in known target regions of the genome.
Highly polymorphic genes like those of the Major Histocompatibility Complex (MHC) have been
analyzed for genetic variation using simple PCR-RFLP.
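The logic of PCR-RFLP genotyping can be illustrated with an in-silico digest: a SNP that creates or destroys a restriction site changes the fragment sizes seen on the gel. In the sketch below the amplicon sequences are hypothetical; the enzyme used as an example is EcoRI, which recognizes GAATTC and cuts after the first G. The function names are ours.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases of the recognition site that stay
    on the left-hand fragment (1 for EcoRI, G^AATTC).
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


def rflp_bands(allele1, allele2, site="GAATTC", cut_offset=1):
    """Distinct band sizes expected on a gel for a diploid genotype."""
    bands = set(digest(allele1, site, cut_offset)) | set(digest(allele2, site, cut_offset))
    return sorted(bands, reverse=True)


# Hypothetical 46 bp amplicons differing by one SNP inside the EcoRI site
allele_cut = "ACGT" * 5 + "GAATTC" + "TTGCA" * 4
allele_uncut = "ACGT" * 5 + "GAGTTC" + "TTGCA" * 4

print(rflp_bands(allele_cut, allele_cut))      # [25, 21]
print(rflp_bands(allele_uncut, allele_uncut))  # [46]
print(rflp_bands(allele_cut, allele_uncut))    # [46, 25, 21]
```

The three band patterns correspond to the two homozygotes and the heterozygote, which is what makes PCR-RFLP a co-dominant genotyping method.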
The Random Amplification of Polymorphic DNA (RAPD-PCR) technique has also been successfully
used in defining genetic diversity among different species. The RAPD method has been used to generate
specific fingerprint patterns of ten different species, including wild boar, pig, horse, buffalo, beef,
venison, dog, cat, rabbit, and kangaroo. RAPD markers have several advantages: no prior sequence
knowledge is necessary for designing the primers, which can then be used on different
templates; the amount of DNA required is very small because it is amplified by PCR; and RAPDs
are simple, quick, and cost effective compared to RFLP. However, the technique also has some
disadvantages: the repeatability and reliability of RAPD polymorphic profiles are poor, because some
non-specific and therefore non-reproducible binding of primers occurs, and RAPDs are dominant
genetic markers, which cannot be used to distinguish heterozygotes from dominant homozygotes.
Amplified fragment length polymorphism (AFLP) is a combination of the RFLP and PCR techniques
for the detection of polymorphism. In this technique the genomic DNA is first digested with a
restriction enzyme and the digested fragments are ligated to adaptors. A subset of the fragments
is then amplified with selective primers that are complementary to the adaptor sequence, and the
amplified fragments are separated by size using gel electrophoresis and visualized, for example
by autoradiography. AFLPs overcome the drawbacks of the labor-intensive, time-consuming RFLP
method and solve the reliability problem caused by non-specific amplification in RAPDs. AFLP has
the advantages of genetic stability and of being an effective, rapid and economical tool for
detecting a large number of polymorphic genetic markers that can be genotyped. The AFLP method is
well suited to population genetics and genome typing, and is consequently widely applied to detect
genetic polymorphisms and to evaluate and characterize animal genetic resources.
Genotyping of SNPs
The increasing need for large-scale genotyping applications of single nucleotide polymorphisms
(SNPs) in model and non-model organisms requires the development of low-cost technologies
accessible to minimally equipped laboratories. Many techniques have been developed and are
being utilized depending upon the facilities available and the number of individuals to be screened.
Direct sequencing: Sequencing is the best way to detect as well as genotype SNPs. It not only
helps to detect polymorphic sites but also confirms the alleles present. It is the gold standard
for confirming alleles at a site that was detected by other methods. This is the method of choice for
sequencing closely placed SNPs, which can be detected in a single sequencing pass; in that case it
is also more economical than many other genotyping methods, which may otherwise give ambiguous
results. In the chromatogram, homozygous as well as heterozygous positions can be scored manually
or with sequence alignment tools such as MegAlign. A heterozygous position typically shows a
double peak, and software such as Phred/Phrap can help in base calling and scoring of alleles.

(a)

(b)

Figure 1: Sequence chromatogram showing polymorphism with multiple peaks in heterozygous sample and
distinct homozygous peaks (G/T in a. and C/T in b.) for two alleles present at the site.
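
The double-peak rule for scoring heterozygous positions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the input format (one peak height per base at a single chromatogram position) and the 0.3 secondary-to-primary peak ratio are hypothetical choices, not values taken from any base-calling software.

```python
def call_genotype(peaks, het_ratio=0.3):
    """Score one chromatogram position from its four peak heights.

    peaks     -- dict mapping base ("A", "C", "G", "T") to peak height
    het_ratio -- assumed threshold: a second peak at least this fraction
                 of the main peak is scored as a second allele
    """
    ranked = sorted(peaks.items(), key=lambda item: item[1], reverse=True)
    (top_base, top_height), (second_base, second_height) = ranked[0], ranked[1]
    if top_height <= 0:
        return "N"  # no signal at this position
    if second_height / top_height >= het_ratio:
        # Double peak: report both alleles in alphabetical order
        return "/".join(sorted([top_base, second_base]))
    return top_base + "/" + top_base


print(call_genotype({"A": 5, "C": 8, "G": 900, "T": 20}))   # G/G
print(call_genotype({"A": 4, "C": 3, "G": 480, "T": 510}))  # G/T
```

In practice the threshold would need to be tuned to the sequencing chemistry and background noise, and ambiguous calls should still be checked by eye on the chromatogram.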

Restriction enzyme cutting (PCR-RFLP): This is a method of choice for genotyping, with several
advantages: it is easy to perform and genotypes are simple to record. However, a protocol cannot
be designed for every polymorphic locus identified. The limitation is that the SNP must create or
abolish a recognition site for an available restriction enzyme, and each enzyme recognizes only
its own specific cleavage site. Online software is available to design PCR-RFLP protocols.
PCR-RFLP can also be used for detecting as well as genotyping unknown polymorphic sites within
the amplified region, particularly for highly polymorphic genes such as Major Histocompatibility
Complex (MHC) genes. A different approach is followed for that purpose. First, the target region
is digested with selected restriction enzymes, preferably tetra-cutters, which have a higher
frequency of cutting sites. The allelic patterns are recorded and representative alleles are then
confirmed by sequencing. It is also possible to clone and confirm the alleles, particularly when
duplication in genes such as MHC class-I is observed during PCR-RFLP.

Figure 2: PCR-RFLP genotyping of amplified product with restriction enzyme and recording of genotypes.
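
Whether a SNP can be typed by PCR-RFLP depends on whether one allele creates or abolishes a restriction site. The sketch below checks this for two allelic versions of an amplicon and lists the expected fragment sizes. The two 22 bp sequences are invented for illustration, the function names are hypothetical, and only four common tetra-cutters are included; a real design would use a full enzyme catalogue.

```python
# Recognition sequence and cut offset (bases from the start of the site)
# for four common tetra-cutters: HaeIII GG^CC, TaqI T^CGA, MspI C^CGG, AluI AG^CT.
ENZYMES = {
    "HaeIII": ("GGCC", 2),
    "TaqI": ("TCGA", 1),
    "MspI": ("CCGG", 1),
    "AluI": ("AGCT", 2),
}


def fragment_sizes(seq, site, cut_offset):
    """Return fragment lengths after complete digestion of a linear amplicon."""
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


def diagnostic_enzymes(allele_1, allele_2):
    """Enzymes whose fragment pattern differs between the two alleles."""
    result = {}
    for name, (site, offset) in ENZYMES.items():
        pattern_1 = fragment_sizes(allele_1, site, offset)
        pattern_2 = fragment_sizes(allele_2, site, offset)
        if pattern_1 != pattern_2:
            result[name] = (pattern_1, pattern_2)
    return result


# Hypothetical amplicon: a G>A change at position 12 destroys an MspI site.
allele_g = "ATTACGTTACCGGTTAGCATTA"
allele_a = "ATTACGTTACCAGTTAGCATTA"
print(diagnostic_enzymes(allele_g, allele_a))  # {'MspI': ([10, 12], [22])}
```

With this pattern, a homozygote for the cut allele shows the 10 and 12 bp bands, a homozygote for the uncut allele shows only the 22 bp band, and a heterozygote shows all three.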

Single Strand Conformation Polymorphism Analysis (SSCP):


Principle of SSCP Analysis: The two strands of a PCR-amplified product can be separated into single
strands by heat. Each single strand can fold on itself to form a three-dimensional structure, or
conformation, through intramolecular (intra-strand) hydrogen bonds. The conformation attained
depends on the length of the strand and its base composition. The two complementary strands may
have different conformations because they are not identical. If the DNA fragment contains a single
base change, a heterozygous sample yields four different single strands on denaturation, which may
take four different forms. These strands may differ in their three-dimensional size and shape,
which affects their migration speed in a non-denaturing polyacrylamide gel. Hence different band
patterns are observed when a nucleotide change is present in the amplified DNA region.


Figure 3: Detection of SSCP variants in non-denaturing polyacrylamide gel and their confirmation by
sequencing.

Primer extension: includes the following methods.
Mass spectrometry: The principle of the commercially available mass spectrometry based
MassARRAY system (http://agenabio.com/) is the extension of an oligonucleotide probe over a
SNP site in a PCR product, with a mixture of deoxynucleotides and dideoxynucleotides, to produce
products of different size for each allele of a SNP. The extended products are analyzed by
SEQUENOM MALDI-TOF (matrix-assisted laser desorption/ionization time-of-flight) mass
spectrometry. The time of flight increases with mass, permitting precise determination of the
size of the products generated, which can be converted into genotype information. Because the
mass resolution of this method is very high, assays can routinely be multiplexed to analyze up to
6 SNPs in one PCR reaction/tube. Candidate genetic marker development with the MassARRAY system
is preferable over the other systems because of its flexibility. The highly efficient assay
design, short lead time and easy panel modification enable users to rapidly validate genetic
markers at low reagent and labor cost.

Figure 4: Genotyping by mass spectrometry using SEQUENOM MALDI-TOF MassARRAY
(https://www.empiregenomics.com/files/store/products/Genotyping/SNP_Genotyping_Using_the%20Sequenom.pdf).


Allele-specific primers: The method allows efficient discrimination of SNPs by allele-specific PCR
in a single reaction with standard PCR conditions. A common reverse primer and two forward
allele-specific primers with different tails amplify two allele-specific PCR products of different
lengths, which are then separated by agarose gel electrophoresis. PCR specificity is improved by
the introduction of a destabilizing mismatch within the 3′ end of the allele-specific primers.
This is a simple and inexpensive method for SNP detection that does not require PCR optimization.
Tetra-Primer Amplification Refractory Mutation System (Tetra-ARMS) PCR: This is a simple
technique that requires no special equipment and allows genotypes to be read directly on a gel
after PCR, without any post-PCR processing. The principle is simple: four primers are designed
and used together, as shown below. One outer forward and one outer reverse primer amplify a
control product without discriminating between the alleles. One inner forward and one inner
reverse primer, in opposite orientations, each discriminate one of the alleles. Each inner
allele-specific primer pairs with one of the outer primers, so products of different lengths are
amplified depending upon the allele present. The disadvantages of the technique are that it
requires a lot of standardization and that it may not always be possible to design suitable
primers around the polymorphic site.

Figure 5: Principle of tetra-Primers-ARMS PCR amplification.

Figure 6: A typical agarose gel showing tetra-ARMS PCR genotyping results. Note the product
amplified by the common outer primers at the top.
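
The band pattern expected from a tetra-ARMS assay follows directly from the primer coordinates. The sketch below is a minimal illustration: the coordinates are 1-based positions on a hypothetical amplicon, the function names are invented, and the 10 bp sizing tolerance is an assumption about gel resolution rather than a recommended value.

```python
def expected_products(outer_fwd_start, outer_rev_end, inner_fwd_start, inner_rev_end):
    """Product sizes (bp) for a tetra-ARMS assay.

    The inner forward primer (allele 1) pairs with the outer reverse primer;
    the inner reverse primer (allele 2) pairs with the outer forward primer.
    """
    return {
        "control": outer_rev_end - outer_fwd_start + 1,
        "allele_1": outer_rev_end - inner_fwd_start + 1,
        "allele_2": inner_rev_end - outer_fwd_start + 1,
    }


def arms_genotype(bands, products, tolerance=10):
    """Call a genotype from observed band sizes (bp) on the gel."""

    def seen(size):
        return any(abs(band - size) <= tolerance for band in bands)

    if not seen(products["control"]):
        return "failed"  # the outer-primer control band must always be present
    has_1 = seen(products["allele_1"])
    has_2 = seen(products["allele_2"])
    if has_1 and has_2:
        return "1/2"
    if has_1:
        return "1/1"
    if has_2:
        return "2/2"
    return "no call"


# Hypothetical design: outer primers span 1..500, SNP at position 200.
products = expected_products(1, 500, 176, 224)
print(products)                                   # {'control': 500, 'allele_1': 325, 'allele_2': 224}
print(arms_genotype([498, 327, 222], products))   # 1/2
```

Placing the inner primers asymmetrically, as here, is what makes the two allele-specific products differ enough in length to be resolved on an agarose gel.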


Single base extension: The SNaPshot Multiplex System is a primer extension-based method that
enables multiplexing of up to 10 SNPs (single nucleotide polymorphisms). SNaPshot labeling
chemistry relies on single-base extension and termination. The SNaPshot Multiplex Kit uses a
single-tube reaction to interrogate SNPs at known locations. The chemistry is based on the dideoxy
single-base extension of an unlabeled oligonucleotide primer (or primers). Each primer binds to a
complementary template in the presence of fluorescently labeled ddNTPs and DNA polymerase.
The polymerase extends the primer by one nucleotide, adding a single ddNTP to its 3′ end. The
fluorescence color readout reports which base was added.

Figure 7: SNaPshot single base extension labelling chemistry
(https://tools.lifetechnologies.com/content/sfs/brochures/cms_101014.pdf).

Submission of SNPs to database


SNP databases: Enormous amounts of data have been generated on SNPs of various mammalian and
non-mammalian species, and the public databases provide information about newly discovered SNPs.
In response to the need for a general catalog of genome variation to address the large-scale
sampling designs required by association studies, gene mapping and evolutionary biology, the
National Center for Biotechnology Information (NCBI) has established the dbSNP database.
dbSNP is the largest available database of SNPs across all species and one of the most popular
SNP databases (http://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi). Its current records
show 364,942,122 SNPs for human beings, 32,135,384 for chicken, 234,957,222 for cattle,
100,819,196 for sheep, 5,453,018 for horse, 82,712,833 for pig, 55,076,618 for goat and only 502
for buffalo. The discovery of new SNPs is progressing at a great pace and the database is updated
regularly. The latest release of dbSNP is build 143 (Table 1). dbSNP also gives the frequencies
of SNPs in selected populations and, for some of the SNPs, other details such as the genotyping
protocol. There are now also some specialized SNP databases, for example for cancer and cytokines
in human beings.
Submissions to dbSNP are integrated with other sources of information at NCBI such as
GenBank, PubMed, LocusLink and the Human Genome Project data. SNP data submitted to dbSNP
come from different sources, including individual laboratories, collaborative polymorphism
discovery efforts (e.g. species-specific consortia), large-scale genome sequencing centers and
private industry. dbSNP accepts submissions for variations in any species and from any part of a
genome. The data collected range from the characterization of particular genes associated with
traits of interest to broadly sampled levels of variation from random genomic sequence. It should
also be noted that, in serving its role as the variation complement to GenBank, dbSNP does not
restrict submissions to neutral polymorphisms. Submissions are welcome on all classes of simple
molecular variation, including those that cause rare clinical phenotypes.

Table 1: Build statistics of dbSNP Build 143

Organism | dbSNP Build | Genome Build | Number of Submissions (ss#'s) | Number of RefSNP Clusters (rs#'s) (# validated) | Number of (rs#'s) in gene
Bos taurus | 143 | 6.2 | 234,957,222 | 95,182,052 (44,427,026) | 45,672,343
Sus scrofa | 143 | 4.2 | 82,712,833 | 52,679,275 (27,768,562) | 20,672,310
Ovis aries | 143 | 1.1 | 100,819,196 | 54,004,457 (28,360,665) | 19,571,908
Capra hircus | 143 | 1.1 | 55,076,618 | 37,166,653 (0) | 13,055,034
Ovis orientalis | 143 | 1.1 | 29,547,990 | 29,263,208 (0) | 10,156,420
Capra aegagrus | 143 | 1.1 | 17,530,711 | 17,415,157 (0) | 6,073,671
Zea mays | 143 | 1.1 | 13,784,397 | 10,526,779 (2,343,328) | 3,098,050
Ciona intestinalis | 143 | 3.1 | 3,233,523 | 3,190,512 (0) | 1,900,242
Microtus ochrogaster | 143 | 1.1 | 14,699 | 14,545 (0) | 5,375
Total: 9 Organisms | | 0 genomes | 537,677,189 | 299,442,638 (102,899,581) | 120,205,353

Number of (ss#'s) with genotype: -
Number of (ss#'s) with frequency: 968, 161 and 173 for three of the organisms; total 1,302

(http://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi?view+summary=view+summary&build_id=143)

Figure 8: The structure of the flanking sequence in dbSNP, displaying a composite of bases either
assayed for variation or included from published sequence.


Table 2: SNP databases

Database | Emphasis | Details
dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP) | SNPs from the complete genome | More than 6 million validated SNPs
HapMap (http://hapmap.org/cgi-perl/gbrowse) | Whole genome SNPs in four populations | More than 1 million SNPs
HGVbase (http://hgvbase.cgb.ki.se/) | SNPs from the complete genome | More than 2.8 million SNPs
GVS (http://gvs.gs.washington.edu/GVS) | Access to dbSNP and HapMap SNPs | 4.3 million SNPs
Perlegen Genotype Data (http://genome.perlegen.com) | Whole genome SNP in three populations | More than 1.5 million SNPs
JSNP (http://snp.ims.u-tokyo.ac.jp/) | Common SNPs within Japanese population | More than 197,000 SNPs
PharmGKB (http://www.pharmgkb.org) | Genes involved in drug metabolism | SNPs from 167 genes
SNP500Cancer (http://snp500cancer.nci.nih.gov/home_1.cfm) | Genes involved in cancer | More than 13,400 SNPs
NIEHS SNPs Program (http://egp.gs.washington.edu) | Environmental response genes | More than 83,000 SNPs
Human Cytochrome P450 (CYP) Allele Nomenclature Committee (http://www.cypalleles.ki.se) | Human cytochrome P450 genes | SNPs from 25 CYP 450 genes
Cytokine Gene Polymorphism (http://www.nanea.dk/cytokinesnps/) | Cytokine gene polymorphisms in human disease | SNPs from more than 40 cytokine genes

(Source: Kim and Misra, Annual Review of Biomedical Engineering, Volume 9, 2007, pp. 289-320)

dbSNP has thus been designed to support submissions and research into a broad range of
biological problems. These include physical mapping, functional analysis and pharmacogenomics,
association studies, and evolutionary studies. Because dbSNP was developed to complement
GenBank, it may contain nucleotide sequences from any organism; currently, the majority of the
data is for human and mouse, with an enormous increase in submissions from livestock and poultry
species, mainly cattle, sheep, pig and chicken.
References

Brookes, A.J. 1999. The essence of SNPs. Gene, 234: 177.
Kim, S. and Misra, A. 2007. SNP Genotyping: Technologies and Biomedical Applications. Annual Review
of Biomedical Engineering, 9: 289.
Kwok, P. and Chen, X. 2003. Detection of Single Nucleotide Polymorphisms. Current Issues in Molecular
Biology, 5: 43.
Yang, W., Kang, X., Yang, Q., Lin, Y. and Fang, M. 2013. Review on the development of genotyping methods
for assessing farm animal diversity. Journal of Animal Science and Biotechnology, 4: 2.
Ye, S., Dhillon, S., Ke, X., Collins, A.R. and Day, I.N.M. 2001. An efficient procedure for genotyping single
nucleotide polymorphisms. Nucleic Acids Research, 29: e88. doi: 10.1093/nar/29.17.e88.


12

Dissection of Complex Traits and Identification of Quantitative Trait Loci in Livestock
R K Vijh and Upasna Sharma
ICAR- National Bureau of Animal Genetic Resources, Karnal (Haryana)

________________________________________________________________________________________
The vast majority of economically important traits in livestock production systems are quantitative,
that is, they show continuous distributions. In attempting to explain the genetic variation observed
in such traits, two models have been proposed: the infinitesimal model and the finite loci model.
The infinitesimal model assumes that traits are determined by an infinite number of unlinked and
additive loci, each with an infinitesimally small effect (Fisher 1918). This model has been
exceptionally valuable for animal breeding, and forms the basis for breeding value estimation
theory (e.g. Henderson 1984).
However, the existence of a finite amount of genetically inherited material (the genome), and the
revelation that there are perhaps only around 20 000 genes or loci in the genome (Ewing and Green
2000), mean that there must be some finite number of loci underlying the variation in
quantitative traits. In fact, there is increasing evidence that the distribution of the effects of
these loci on quantitative traits is such that there are a few genes with large effect and many
with small effect (Shrimpton and Robertson 1998, Hayes and Goddard 2001). In Figure 1.1.A, the
sizes of quantitative trait loci (QTL) effects reported in QTL mapping experiments in both pigs
and dairy cattle are shown. These histograms are not the true distribution of QTL effects,
however: the experiments can only detect effects above a certain size, determined by the amount
of environmental noise, and the effects are estimated with error. In Figure 1.1.B, the
distribution of effects adjusted for both these factors is displayed. The distributions in
Figure 1.1.B indicate that there are many genes of small effect and few of large effect. The
search for these loci, particularly those of moderate to large effect, and the use of this
information to increase the accuracy of selecting genetically superior animals, has been the
motivation for intensive research efforts in the last two decades. Any locus with an effect on the
quantitative trait is called a QTL, not just the loci of large effect.

Figure 1.1 A. Distribution of additive (QTL) effects from pig experiments, scaled by the standard deviation of
the relevant trait, and distribution of gene substitution (QTL) effects from dairy experiments scaled
by the standard deviation of the relevant trait. B. Gamma Distribution of QTL effect from pig and
dairy experiments, fitted with maximum likelihood (adapted from manual of Prof Ben Hayes).

Two approaches have been used to uncover QTL. The candidate gene approach assumes that a gene
involved in the physiology of the trait could harbor a mutation causing variation in that trait. The
gene, or parts of the gene, is sequenced in a number of different animals, and any variations in the

DNA sequences that are found are tested for association with variation in the phenotypic trait.
This approach has had some successes; for example, a mutation was discovered in the oestrogen
receptor locus (ESR) which results in increased litter size in pigs (Rothschild et al. 1991). There
are two problems with the candidate gene approach, however. Firstly, there are usually a large
number of candidate genes affecting a trait, so many genes must be sequenced in several animals
and many association studies carried out in a large sample of animals (the likelihood that the
mutation may occur in non-coding DNA further increases the amount of sequencing required and the
cost). Secondly, the causative mutation may lie in a gene that would not have been regarded a
priori as an obvious candidate for this particular trait.
An alternative is the QTL mapping approach, in which chromosome regions associated with
variation in phenotypic traits are identified. QTL mapping assumes the actual genes which affect a
quantitative trait are not known. Instead, this approach uses neutral DNA markers and looks for
associations between allele variation at the marker and variation in quantitative traits. A DNA
marker is an identifiable physical location on a chromosome whose inheritance can be monitored.
Markers can be expressed regions of DNA (genes) or more often some segment of DNA with no
known coding function but whose pattern of inheritance can be determined. When DNA markers
are available, they can be used to determine if variation at the molecular level (allelic variation at
marker loci along the linkage map) is linked to variation in the quantitative trait. If this is the case,
then the marker is linked to, or on the same chromosome as, a quantitative trait locus or QTL which
has allelic variants causing variation in the quantitative trait.
Until recently, the number of DNA markers identified in livestock genomes was comparatively
limited, and the cost of genotyping the markers was high. This constrained experiments designed
to detect QTL to using a linkage mapping approach. If only a limited number of markers per
chromosome is available, then the association between the markers and the QTL will persist only
within families and only for a limited number of generations, due to recombination. For example,
in one sire the A allele at a particular marker may be associated with the increasing allele of
the QTL, while in another sire the a allele at the same marker may be associated with the
increasing allele at the QTL, due to historical recombination between the marker and the QTL in
the ancestors of the two sires. To illustrate the principle of QTL mapping exploiting linkage,
consider an example where a particular sire has a large number of progeny. The parent and the
progeny are genotyped for a particular marker. At this marker, the sire carries the marker alleles
172 and 184 (Figure 1.2).

Figure 1.2. Principle of quantitative trait loci (QTL) detection, illustrated using an abalone example. A sire is
heterozygous for a marker locus, and carries the alleles 172 and 184 at this locus. The sire has a large
number of progeny. The progeny are separated into two groups, those that receive allele 172 and
those that receive allele 184. The significant difference in the trait of average size between the two
groups of progeny indicates a QTL linked to the marker. In this case, the QTL allele increasing size
is linked to the 172 allele and the QTL allele decreasing size is linked to the 184 allele (Figure adapted
from Nick Robinson).
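
The within-sire marker contrast described in the caption can be sketched as a comparison of the two progeny groups. The example below uses only the Python standard library; the progeny records are invented for illustration, and the Welch-type t statistic is one possible test choice, not necessarily the analysis used in the studies cited.

```python
from math import sqrt
from statistics import mean, variance


def marker_contrast(progeny):
    """Contrast progeny means by the marker allele inherited from the sire.

    progeny -- list of (sire_allele_inherited, phenotype) tuples
    Returns (allele_1, allele_2, mean difference, t statistic).
    """
    groups = {}
    for allele, phenotype in progeny:
        groups.setdefault(allele, []).append(phenotype)
    if len(groups) != 2:
        raise ValueError("the sire must be heterozygous: expected two marker alleles")
    (allele_1, y1), (allele_2, y2) = sorted(groups.items())
    difference = mean(y1) - mean(y2)
    # Standard error of the difference, allowing unequal group variances
    std_error = sqrt(variance(y1) / len(y1) + variance(y2) / len(y2))
    return allele_1, allele_2, difference, difference / std_error


# Hypothetical progeny of one sire carrying marker alleles 172 and 184
records = [(172, 52), (172, 55), (172, 53), (172, 56), (172, 54),
           (184, 48), (184, 50), (184, 49), (184, 51), (184, 47)]
print(marker_contrast(records))  # (172, 184, 5, 5.0)
```

A large t statistic suggests a QTL linked to the marker in this sire; with real data the statistic would be compared against an appropriate significance threshold, and the analysis repeated within each sire family because linkage phase can differ between sires.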


The progeny can then be sorted into two groups, those that receive allele 172 and those that receive
allele 184 from the parent. If there is a significant difference between the two groups of progeny,
then this is evidence that there is a QTL linked to that marker. QTL mapping exploiting linkage has
been performed in nearly all livestock species for a huge range of traits. The problem with mapping
QTL exploiting linkage is that, unless a huge number of progeny per family or half-sib family are
used, the QTL are mapped to very large confidence intervals on the chromosome. To illustrate this,
consider the formula that Darvasi and Soller (1997) gave for estimating the 95% CI for QTL location
for simple QTL mapping designs under the assumption of a high-density genetic map. The formula
is $CI = 3000/(kN\delta^2)$, where $N$ is the number of individuals genotyped, $\delta$ is the
allele substitution effect (the effect of getting an extra copy of the increasing QTL allele) in
units of the residual standard deviation, $k$ is the number of informative parents per individual,
which is equal to 1 for half-sib and backcross designs and 2 for F2 progeny, and 3000 is about the
size of the cattle genome in centi-Morgans. For example, given a QTL segregating on a particular
chromosome within a half-sib family of 1000 individuals, for a QTL with an allele substitution
effect of 0.5 residual standard deviations the 95% CI would be
$3000/(1 \times 1000 \times 0.5^2) = 12$ cM. Such large confidence intervals cause two problems.
Firstly, if the aim of the QTL mapping experiment is to identify the mutation underlying the QTL
effect, in such a large interval there are a large number of genes to be investigated (80 on
average with 20 000 genes and a genome of 3000 cM). Secondly, use of the QTL in marker assisted
selection is complicated by the fact that the linkage between the markers and QTL is not
sufficiently close to ensure that marker-QTL allele relationships persist across the population;
rather, marker-QTL phase within each family must be established to implement marker assisted
selection.
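
The confidence interval formula quoted above is easy to explore numerically. The short sketch below reproduces the worked example in the text; the function and parameter names are illustrative only, and the default genome length of 3000 cM and gene count of 20 000 are the figures given in the text.

```python
def qtl_ci_cm(n_individuals, delta, k=1, genome_cm=3000.0):
    """95% confidence interval (cM) for QTL location: 3000 / (k * N * delta^2).

    n_individuals -- number of individuals genotyped (N)
    delta         -- allele substitution effect in residual standard deviations
    k             -- informative parents per individual (1 half-sib/backcross, 2 F2)
    """
    return genome_cm / (k * n_individuals * delta ** 2)


def genes_in_interval(ci_cm, n_genes=20000, genome_cm=3000.0):
    """Average number of genes expected in an interval of ci_cm centi-Morgans."""
    return ci_cm * n_genes / genome_cm


ci = qtl_ci_cm(1000, 0.5)        # half-sib family of 1000, effect of 0.5 SD
print(ci)                        # 12.0
print(genes_in_interval(ci))     # 80.0
print(qtl_ci_cm(1000, 0.5, k=2)) # 6.0 for an F2 design of the same size
```

Because the interval shrinks with the square of the effect size, halving the effect to 0.25 standard deviations would require four times as many individuals to keep the same 12 cM interval.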
Designs for QTL detection in livestock
The designs used to detect QTL in livestock vary from experimental backcross and F2
populations to half-sib designs that use existing family structures within a commercial population.
The situation in livestock is, however, more difficult than in plant species. Inbred lines are
absent in livestock, and the maintenance of experimental populations is prohibitively expensive.
In addition, reproductive capacity and generation interval are often limiting in the choice of
experimental design. These factors have to be taken into account in both the design and the
analysis of QTL experiments. Using sparse marker maps with 10-20 cM spacings, several designs have
been used to detect QTL regions across the genomes of livestock.
Experimental crosses for QTL detection in livestock
Experimental crosses have been implemented in pigs and poultry, as the generation intervals
are relatively short and the number of offspring is moderate to high. Such crosses have been
established between domestic breeds and descendants of their wild progenitors as well as between
phenotypically divergent commercial breeds. The analysis is generally the same as that used for
inbred crosses, i.e. marker alleles in the second generation are traced back to their line of
origin and contrasts for putative QTL are estimated as differences between lines.
Exploiting existing family structures
In large ruminants, a common approach is to use the large paternal half-sib family structures that
occur where the use of artificial insemination is common. In these half-sib designs, genotypes are
collected on a number of sires (or grandsires) and their half-sib offspring. Phenotypes are
collected either on the half-sib offspring themselves or on a group of progeny from each half sib.
In dairy cattle, where each sire can have more than 100 daughters with phenotype records, the
three-generation half-sib design (the granddaughter design) has been common practice in several
advanced nations. The least squares analysis makes no assumptions about the number of QTL alleles
and estimates a unique QTL effect within each half-sib family. As half-sib family structures also
exist within experimental crosses, these models are sometimes also fitted on line

cross populations to gain further insight into identified QTL, by checking which F1 parents are
most likely to be heterozygous for a QTL and inferring the QTL genotypes of the F1 parents. The
same methodology can be extended to accommodate full-sib family structures, provided family
sizes are sufficient; such an approach has so far only been used in poultry breeding.
When a single trait or an index of traits is the main focus, genotyping resources can be used more
efficiently through selective genotyping, where only the individuals showing the most extreme
phenotypes within a family or a cross are genotyped. A further step is selective DNA pooling,
where marker allele frequencies in the high-phenotype pools are contrasted with those in the
low-phenotype pools within families. This approach can reduce the amount of genotyping further,
but requires more careful design, analysis and consideration of technical aspects if it is not to
lead to detection of spurious QTL.
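
Choosing which animals to genotype under selective genotyping amounts to picking the phenotypic tails within each family. The sketch below illustrates this for one family; the animal identifiers and phenotypes are invented, and the 20% tail fraction is an arbitrary assumption rather than a recommended design value.

```python
def select_extremes(family, fraction=0.2):
    """Pick the lowest and highest phenotypes within one family for genotyping.

    family   -- list of (animal_id, phenotype) tuples
    fraction -- assumed proportion of the family taken from each tail
    Returns (low_tail_ids, high_tail_ids).
    """
    ranked = sorted(family, key=lambda record: record[1])
    n_tail = max(1, int(len(ranked) * fraction))
    if 2 * n_tail > len(ranked):
        raise ValueError("family too small for non-overlapping tails")
    low = [animal_id for animal_id, _ in ranked[:n_tail]]
    high = [animal_id for animal_id, _ in ranked[-n_tail:]]
    return low, high


# Hypothetical half-sib family of ten progeny
family = [("p1", 50), ("p2", 62), ("p3", 47), ("p4", 55), ("p5", 70),
          ("p6", 58), ("p7", 44), ("p8", 66), ("p9", 53), ("p10", 60)]
print(select_extremes(family))  # (['p7', 'p3'], ['p8', 'p5'])
```

The same two lists could define the low and high pools for selective DNA pooling, in which case the DNA of each tail is pooled and marker allele frequencies are compared between pools.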
Utilising marker information in genetic evaluation programs
The value of genotypic information for predicting the genetic merit of animals depends on the
predictive ability of the marker genotypes. The three types of molecular loci, viz. direct
markers, LD markers (in population-wide linkage disequilibrium with the QTL) and LE markers (in
population-wide linkage equilibrium with the QTL), differ not only in methods of detection but
also in the way they are incorporated into genetic evaluation procedures. Whereas direct markers
and, to a lesser degree, LD markers allow selection on genotype across the population, the use of
LE markers must allow for different linkage phases between markers and QTL from family to family,
i.e. LE markers are family specific and family-specific information must be derived.
Utilising information of QTLs in selection models
By using QTL information in genetic evaluation, in principle, part of the assumed polygenic
variation is substituted by a separate effect due to a genetic polymorphism at a known locus. This
has the immediate effect of having a much better handle on the Mendelian sampling process, as
phenotypic covariance can be evaluated based on specific genetic similarity rather than on an
average relationship.
A number of different approaches have been described to accommodate marker information in
genetic evaluation. Roughly, these methods can be distinguished through their modeling of the
QTL effect and through the type of genetic marker information used. The QTL effect can be
modeled as random or fixed, while the molecular information comes from LE, LD or direct
markers.
With a fixed QTL model, regression on genotype probabilities would be used in genetic
evaluation to account for the effect of QTL polymorphisms. In the simplest additive QTL model,
suitable for estimating breeding values, simple regressions could be included on the probability of
carrying the favourable mutation. Regression can be on known genotypes (class variables), or
probabilities can be derived for ungenotyped animals in a general complex pedigree. A fixed QTL
model is sensible if few alleles are known to be segregating, and where dominance and/or epistasis
are important. The model also assumes effects being the same across families. The effects of various
genotypes could be fitted separately, giving power to account for dominance and epistasis in case
of multiple QTL. For selection purposes, a fixed QTL effect, if additive, would be added to the
polygenic estimated breeding values (EBVs), similar to breed effects in across breed
evaluations. The advantage of a fixed QTL model is the limited number of effects that need to be
fitted. Alternatively, QTL effects could be modeled as random effects, with each individual having
a different QTL effect. Covariances are based on the probability of QTL alleles being identical by
descent rather than on numerator relationships as in the usual animal model with polygenic
effects. With full knowledge about segregation, this would effectively fit all founder alleles as
different effects. The random QTL model makes no assumptions about number of alleles at a QTL
and it automatically accommodates possible interaction effects of QTL with genetic background

(families or lines). Therefore, the random QTL model is less reliant on assumptions about
homogeneity of QTL effects. The random QTL model is a natural extension to the usual mixed
model and seems therefore a logical way to incorporate genotype information into an overall
genetic evaluation system. These models result in EBVs for QTL effects along with a polygenic
EBV. The total EBV is the simple sum of these estimates. One of the main computational limitations
of this method, however, is the large number of equations that must be solved, which increases
by two per animal for each QTL that is fitted.
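
The bookkeeping of the random QTL model can be illustrated with a small sketch. It assumes the total EBV is the polygenic EBV plus the paternal and maternal allele effects at each fitted QTL, and counts one polygenic equation plus two QTL-allele equations per animal per QTL; fixed effects are ignored, and all names and numbers are hypothetical.

```python
def total_ebv(polygenic_ebv, qtl_allele_ebvs):
    """Sum the polygenic EBV and the QTL allele effects of one animal.

    qtl_allele_ebvs -- list of (paternal_effect, maternal_effect), one pair per QTL
    """
    return polygenic_ebv + sum(paternal + maternal for paternal, maternal in qtl_allele_ebvs)


def n_random_equations(n_animals, n_qtl):
    """Random-effect equations: one polygenic plus two per fitted QTL, per animal."""
    return n_animals * (1 + 2 * n_qtl)


# Hypothetical animal evaluated with two QTL
print(round(total_ebv(1.5, [(0.2, -0.1), (0.05, 0.0)]), 2))
print(n_random_equations(1000, 0))  # 1000 equations for the polygenic model alone
print(n_random_equations(1000, 2))  # 5000 equations once two QTL are fitted
```

The growth from 1000 to 5000 equations for only two QTL shows why the approach becomes computationally demanding in large populations.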
Genetic evaluation using direct markers
When the genotype of an actual functional mutation is available, no pedigree information is
needed to predict the genotypic effect, as QTL genotypes are measured directly. When there is
only a small number of alleles, the number of specific genotypes is limited. In genetic evaluation,
it would seem appropriate to treat the genotype effect as a fixed effect, i.e. the assumption is that
genotype differences are the same in different families and herds or flocks. Such assumptions
might be reasonable for a biallelic QTL model in a relatively homogeneous population.
Alternatively, random QTL models could be used with different effects for different founder alleles,
or even QTL by environment interactions. In both fixed and random QTL models, genotype
probabilities can be derived for individuals with missing genotypes.
Genetic evaluation using LE markers
When the genotype test is not for the gene itself, but for a linked marker, QTL probabilities derived
from marker genotypes will be affected by the recombination rate between marker and QTL
and by the extent of LD between the QTL and marker across the population. If LD between the
QTL and a linked marker only exists within families, marker effects or, at a minimum, marker-QTL
linkage phase must be determined separately for each family. This requires marker genotypes and
phenotypes on family members. If linkage between the marker and QTL is loose, phenotypic
records must be from close relatives of the selection candidate because associations will erode
quickly through recombination. With progeny data, marker-QTL effects or linkage phases can
be determined based on simple statistical tests that contrast the mean phenotype of progeny that
inherited alternate marker alleles from the common parent. A more comprehensive approach is
based on Fernando and Grossman's (1989) random QTL model, where marker information from
complex pedigrees can be used to derive covariances between QTL effects, yielding best linear
unbiased prediction (BLUP) of breeding value for both polygenic and QTL effects. Random
effects of paternal and maternal QTL alleles are added to the standard animal model with random
polygenic breeding values. The variance-covariance structure of the random QTL effects, also
known as the gametic relationship matrix (GRM), is based on probabilities of identity by descent
(IBD), and is now derived from co-segregation of markers and QTL within a family. Probabilities
of IBD derived from pedigree and marker data link QTL allele effects that are expected to be equal
or similar, therefore using data from relatives to estimate an individual's QTL effects. For example,
if two paternal half-sibs i and j have inherited the same paternal allele for markers that flank the
QTL (with recombination rate r), they are likely IBD for the paternal QTL allele and the
correlation between the effects of their paternal QTL alleles will be (1 − r)². The method is
appealing, but computationally demanding for large scale evaluations, especially when not all
animals are genotyped and complex procedures must be applied to derive IBD probabilities.
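To give a feel for how quickly the half-sib association erodes as the marker-QTL linkage loosens, the expression (1 − r)² quoted above can be tabulated for a few recombination rates. This is only a sketch of that one expression; the function name is hypothetical, and a full gametic relationship matrix requires IBD probabilities for every pair of alleles in the pedigree.

```python
def paternal_qtl_allele_correlation(r):
    """Correlation quoted in the text for two paternal half-sibs that inherited
    the same paternal marker allele: (1 - r)^2, with r the recombination rate."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination rate must lie between 0 and 0.5")
    return (1.0 - r) ** 2


for r in (0.01, 0.05, 0.10, 0.20, 0.50):
    print(f"r = {r:.2f}   correlation = {paternal_qtl_allele_correlation(r):.3f}")
```

At r = 0.5 (no linkage) the value falls to 0.25, so loosely linked markers carry little information about the QTL alleles shared by relatives.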
Genetic evaluation using LD markers
Most QTL projects have moved towards fine mapping where the final result is a marker or
marker haplotype in LD with the QTL, if not the direct mutation. A haplotype of marker alleles
close enough to the putative QTL is likely to be in LD with QTL alleles. Such a marker test
provides information about QTL genotype across families, and is in a sense not very different
from a direct marker. The most convenient way to include genotypic information from marker
haplotypes in genetic evaluation systems is through the random QTL model. In their original paper,
Fernando and Grossman (1989) derived IBD from genotype data on single markers and
recombination rates between marker and QTL. However, the random QTL model is more
versatile, and covariances based on IBD probabilities can also use information beyond pedigree,
based on LD. The latter can be derived from marker or haplotype similarity, e.g. based on a
number of marker genotypes surrounding a putative QTL. Meuwissen and Goddard (2001)
proposed using both linkage and LD information to derive IBD based covariances (termed LDL
analysis). Lee and van der Werf (2005) showed that with denser markers, the value of linkage
information, and therefore pedigree, reduces. Hence, when QTL positions become more
accurately defined, genetic information from close markers (within a few cM) can be used
increasingly to derive LD based IBD probabilities, thereby defining covariances between random
QTL effects without the need for a family structure or information through pedigree.
Lee and van der Werf (2006) have shown that LD information results in a very dense GRM.
Genetic evaluation, which is usually based on mixed model equations that are relatively sparse, is
currently not feasible computationally for the LDL method for a large number of individuals
and alternative models are needed. One approach is to model population wide LD by simply
including the marker genotype or haplotype as a fixed effect in the animal model evaluation,
as suggested by Fernando (2004). An advantage of modeling population wide LD effects as fixed
rather than random is that fewer assumptions about population history are needed. A disadvantage
is that estimates are not BLUPed, i.e. regressed towards a mean depending on the amount of
information that is available to estimate their effects. This will be important if some of the genotype
or haplotype effects cannot be estimated with substantial accuracy because the number of
individuals with that genotype or haplotype is limited. Haplotype effects could also be fitted as
random, but more development is needed in this area.
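The practical difference between a fixed and a "BLUPed" haplotype effect can be sketched numerically. The example below rests on strong simplifying assumptions that are not taken from the text: a single random effect, records already expressed as deviations from a known mean, and a known variance ratio lambda = residual variance / haplotype variance. Under those assumptions the random-effect estimate is the raw mean deviation multiplied by n / (n + lambda). Function names and data are hypothetical.

```python
def fixed_estimate(deviations):
    """Estimate of a haplotype effect fitted as fixed: the plain mean deviation."""
    return sum(deviations) / len(deviations)


def shrunken_estimate(deviations, variance_ratio):
    """Estimate of the same effect fitted as random (single-effect case):
    the mean deviation regressed towards zero by n / (n + lambda)."""
    n = len(deviations)
    return n / (n + variance_ratio) * fixed_estimate(deviations)


variance_ratio = 18.0          # assumed residual variance / haplotype variance
rare = [4.0, 6.0]              # haplotype seen in only two animals
common = [5.0] * 200           # haplotype seen in two hundred animals

print(fixed_estimate(rare), shrunken_estimate(rare, variance_ratio))
print(fixed_estimate(common), shrunken_estimate(common, variance_ratio))
```

Both haplotypes have the same fixed estimate, but the estimate for the rare haplotype is regressed strongly towards zero, whereas the estimate for the common haplotype is hardly changed. This is the protection against poorly estimated effects that is lost when the effects are fitted as fixed.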
In general, for the purpose of increased genetic change of economically important
quantitative traits, and in the context of well recorded and efficient breeding programmes, there is
no need to have knowledge of functional mutations since nearby markers will have a high
predictive value about genetic merit. Moreover, the benefit from the extra investment and time
spent on finding functional mutations might be superseded by the genetic change that can
be made in the breeding programme in the meantime.
Implementation of marker assisted genetic evaluation
It is important to note that most of the gene marker tests currently in use are not
integrated into genetic evaluation programmes. This is because the gene test is either for a Mendelian
characteristic, or it predicts phenotypic differences for traits that are not the same as those in
current genetic evaluation. Moreover, breeders would not only be interested in more accurate
EBVs based on gene markers, but they would also want to know the actual QTL genotypes
for their breeding animals. This information on individual genotype will become less relevant if
more gene tests become available and if testing becomes cheaper and more widespread. This might
still take some years. Thus, as gene marker testing is gradually introduced, it is more likely to
create additional selection criteria to consider and it will take some time before QTL
information is seamlessly and optimally integrated in existing genetic evaluation programmes. In
particular, if genetic evaluation is based on information from many different breeding units,
such as in cattle or sheep, genotyping information will initially be available for only a small
proportion of the breeding animals, possibly not justifying a total overhaul of the system for
genetic evaluation. Simple ad hoc procedures where QTL effects are estimated and presented
separately as additional effects are initially a more likely route to implementation.
Solutions for fixed QTL genotype effects, along with genotype probabilities as outputs of
genetic evaluation, might be interesting to breeders and, compared with random QTL
effects, may be more likely to be presented and used separately from polygenic EBVs. This
would also be the case for genotypic information on Mendelian characters, where there is no
polygenic component. Thus, molecular information can be used to enhance both the process of
integrating superior qualities of different breeds and within-breed selection.
Between breed selection
Crossing breeds results in extensive LD, which can be capitalized upon using MAS in a number
of ways. If a large proportion of breed differences in the trait(s) of interest are due to a small
number of genes, gene introgression strategies can be used. If a larger number of genes are
involved, MAS within a synthetic line is the preferred method of improvement. Introgression of the
desirable allele at a target gene from a donor to a recipient breed is accomplished by multiple
backcrosses to the recipient, followed by one or more generations of intercrossing. The aim of the
backcross generations is to produce individuals that carry one copy of the donor QTL allele but that
are similar to the recipient breed for the rest of the genome. The aim of the intercrossing phase is to
fix the donor allele at the QTL. Marker information can enhance the effectiveness of the
backcrossing phase of gene introgression strategies by: (i) identifying carriers of the target gene(s)
(foreground selection); and (ii) enhancing recovery of the recipient genetic background
(background selection). The effectiveness of the intercrossing phase can also be enhanced through
foreground selection on the target gene(s). If the target gene cannot be genotyped directly,
carrier individuals can be identified based on markers that flank the QTL at <10 cM, because of
the extensive LD in crosses. The markers must have breed-specific alleles in order to identify
line origin. For the introgression of multiple target genes, gene pyramiding strategies can be used
during the backcrossing phase to reduce the number of individuals required (Hospital
and Charcosset, 1997; Koudandé et al., 2000). For background selection, markers are used that are
spread over the genome at <20 cM intervals, such that most genes that affect the trait will be within
10 cM from a marker. Combining foreground and background selection, selection will be for the
donor breed segment around the target locus but for recipient breed segments in the rest of the
genome. Foreground selection will result in selection for both the target locus and for donor breed
loci that are linked to this locus, some of which could have an unfavorable effect on performance.
To reduce this so-called linkage drag around the target locus, in the molecular score used for
background selection greater emphasis can be given to markers that are in the neighborhood of
the target locus (apart from the flanking markers, which are used in foreground selection).
Most studies have considered marker assisted introgression (MAI) of single QTL (e.g. Hospital
and Charcosset, 1997) but often several QTL must be introgressed simultaneously. Koudandé et al.
(2000) showed that large populations are needed to obtain sufficient individuals that are
heterozygous for all QTL in the backcrossing phase. This would make MAI not feasible in livestock
breeding programmes. In many cases, however, immediate fixation of introgressed QTL alleles may
not be required. Instead, the objective of the backcrossing phase can be to enrich the recipient
breed with the favourable donor QTL alleles at sufficiently high frequency for selection
following backcrossing. The effectiveness of such strategies was demonstrated by Chaiwong et
al. (2002).
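Two simple expectations help to explain why background selection and population size matter in introgression. The Python sketch below assumes unlinked QTL, a backcross parent that is heterozygous at every target QTL, and no marker selection on the background; the function names are hypothetical.

```python
def expected_recipient_fraction(n_backcrosses):
    """Expected share of recipient genome without background selection.
    n_backcrosses = 0 is the F1 (one half), 1 is the first backcross (three quarters)."""
    return 1.0 - 0.5 ** (n_backcrosses + 1)


def prob_carrier_all_qtl(n_qtl):
    """Probability that a backcross progeny inherits the donor allele at every one
    of n unlinked target QTL from a parent heterozygous at all of them."""
    return 0.5 ** n_qtl


def expected_progeny_per_carrier(n_qtl):
    """Average number of progeny that must be genotyped to find one full carrier."""
    return 1.0 / prob_carrier_all_qtl(n_qtl)


for t in range(0, 5):
    print(f"backcross {t}: expected recipient genome = {expected_recipient_fraction(t):.4f}")

for n in (1, 2, 3, 4, 5):
    print(f"{n} QTL: one carrier per {expected_progeny_per_carrier(n):.0f} progeny on average")
```

The number of progeny needed per carrier doubles with each additional QTL, which is in line with the large population sizes reported for multiple-QTL introgression. Background selection on markers aims to recover the recipient genome faster than the expectation computed by the first function.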
Within breed selection
The procedures described previously for incorporating markers in genetic evaluation result in
estimates of breeding values associated with QTL, together with estimates of polygenic breeding
values. Alternatively, if molecular data are not incorporated into genetic evaluations, as will be
the case for more ad hoc approaches and for gene tests for Mendelian characteristics,
separate selection criteria will be available that capture the molecular information. The following
three selection strategies can then be distinguished (Dekkers, 2004):
1. Selection on the QTL information alone;
2. Tandem selection, with selection on QTL followed by selection on polygenic EBV;
3. Selection on the sum of the QTL and polygenic EBV.
Selection on QTL or marker information alone ignores information that is available on
all other genes (polygenes) that affect the trait and is expected to result in the lowest response to
selection unless all genes that affect the trait are included in the QTL EBV. This strategy does
not, however, require additional phenotypes other than those that are needed to estimate
marker effects, and can be attractive when phenotype is difficult or expensive to record (e.g.
disease traits, meat quality, etc.). Selection on the sum of the QTL and polygenic EBV is expected
to result in maximum response in the short term, but may be suboptimal in the longer term
because of losses in polygenic response (Gibson, 1994). Indexes of QTL and polygenic EBV can
be derived that maximize longer term response (Dekkers and van Arendonk, 1998) or a
combination of short and longer term responses (Dekkers and Chakraborty, 2001). However, if
selection is on multiple QTL and emphasis is on maximizing shorter term response, selection on
the sum of QTL and polygenic EBV is expected to be close to optimal. Optimizing selection on a
number of EBVs, indexes and genotypes, while also considering inbreeding rate and other
practical considerations is not a trivial task. Kinghorn et al. (2002) have proposed a mate selection
approach that could be used to handle such problems, and it can be expected that with more
widespread use of genotypic information for a larger number of regions, specific knowledge about
individual QTL becomes less interesting and will simply contribute to prediction of whole EBV or
whole genotype.
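The three selection strategies distinguished above can be compared on a toy set of candidates. The following Python sketch uses invented EBVs and hypothetical function names. Tandem selection is implemented here as a first cut on QTL EBV followed by ranking on polygenic EBV, which is one possible reading of the strategy.

```python
# (animal id, QTL EBV, polygenic EBV); all values invented for illustration
candidates = [
    ("A", 2.0, -1.0),
    ("B", 2.0, 3.0),
    ("C", 0.0, 6.0),
    ("D", -2.0, 9.0),
    ("E", 0.0, 1.0),
    ("F", 2.0, 0.5),
]


def select_on_qtl(cands, n):
    """Strategy 1: rank on QTL EBV only (ties keep their input order)."""
    ranked = sorted(cands, key=lambda c: c[1], reverse=True)
    return [c[0] for c in ranked[:n]]


def tandem_selection(cands, n_first_stage, n):
    """Strategy 2: keep the best n_first_stage animals on QTL EBV,
    then rank the survivors on polygenic EBV."""
    stage1 = sorted(cands, key=lambda c: c[1], reverse=True)[:n_first_stage]
    stage2 = sorted(stage1, key=lambda c: c[2], reverse=True)
    return [c[0] for c in stage2[:n]]


def select_on_sum(cands, n):
    """Strategy 3: rank on the sum of QTL and polygenic EBV."""
    ranked = sorted(cands, key=lambda c: c[1] + c[2], reverse=True)
    return [c[0] for c in ranked[:n]]


print(select_on_qtl(candidates, 2))        # ['A', 'B']
print(tandem_selection(candidates, 3, 2))  # ['B', 'F']
print(select_on_sum(candidates, 2))        # ['D', 'C']
```

In this example the three strategies pick different animals. Selection on the sum favours animals D and C, which have poor or average QTL EBVs but high polygenic merit, while selection on the QTL alone ignores that merit altogether.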
Meuwissen and Goddard (1996) reported a simulation study that looked at the main
characteristics determining efficiency of MAS using LE markers. They found that MAS could
improve the rate of genetic improvement up to 64 percent by selecting on the sum of QTL and
polygenic EBV. Their work also demonstrated that MAS is mainly useful for traits where
phenotypic measurement is less valuable because of: (i) low heritability; (ii) sex limited
expression; (iii) availability only after sexual maturity; and (iv) necessity to sacrifice the animal
(e.g. slaughter traits). Selection of animals based on (most probable) QTL genotype will allow
earlier and more accurate selection, increasing the short and medium term selection response.
Most simulation studies have assumed complete marker genotype information but in practice
only a limited number of individuals will be genotyped. However, in an advanced breeding
programme with complete information on phenotype and pedigree information, marker and QTL
genotype probabilities could be derived for ungenotyped animals and genotyping strategies
could be optimized to achieve a high value for the investments made. However, in breeding
programmes for more extensive production systems (beef, sheep), pedigree recording is often
incomplete and only a small proportion of animals are genotyped. Moreover, these genotyped
animals are not necessarily the key breeding animals. The utility of linked markers will be even
more limited if pedigree relationships cannot be used to resolve genotype probabilities and
marker QTL phase of ungenotyped individuals.
A second point of caution is that many studies on MAS have taken a single trait approach
and shown that genetic markers could have a large impact on responses for traits that are
difficult to improve by phenotypic selection. However, within the context of a multi-trait
breeding objective, the overall impact of such markers on the breeding goal may be less because a
greater response for one trait often comes at the expense of another. Therefore, the overall effect of
MAS on the breeding program will generally be much smaller than predicted for favorable
single-trait cases. The main effect of MAS would be to shift the selection response in favor of the
marked traits, rather than achieving much additional overall response. Hence, while it will be easier
to select for carcass and disease resistance, further improvement for these traits will be at the
expense of genetic change for production traits (growth, milk).
The impact of MAS on the rate of genetic gain may be limited in conventional breeding
programmes (ranging up to perhaps 10 percent extra gain) unless the variation in
profitability is dominated by traits that are hard to measure. However, new technologies often
lead to other breeding program designs being closer to optimal. Genotypic information has extra
value in the case of early selection and where within family variance can be exploited, which is
particularly the case in programs where reproductive technologies are used. Reproductive
technologies usually lead to early selection and more emphasis on between family selection. DNA
marker technology and reproductive technologies are therefore highly synergistic and
complementary (van der Werf and Marshall, 2005) and gene markers have much more value in
such programmes. Gene marker information is also clearly valuable in introgression programmes,
as demonstrated by simulation (Chaiwong et al., 2002; Dominik et al., 2007) as well as in practice
(Nimbkar et al., 2005). Yet, although these examples are favorable to the value of gene marker
information, the added value of MAS still relies heavily on a high degree of trait and pedigree
recording.
Marker assisted selection in Indian context
Complete phenotypic and pedigree information is often only available in intensive breeding units.
Therefore, in the context of low input production systems, some questions can be raised concerning
the validity and practicality of the simulation studies described above, and it would be more
difficult to realize the value of marker information. It would be harder and more expensive to
determine the linkage phase in the case of using linked markers. Moreover, even if the genetic
marker were a direct or LD marker, its effect on phenotype would have to be estimated for the
population and the environment in which it is used. This would require phenotypes and
genotypes on a sample of a rather homogeneous population to avoid spurious associations that
could result from unknown population stratification. Therefore, a gene marker for a QTL is likely to
be most successful in an environment with intensive pedigree and performance recording.
Nevertheless, in low input environments, direct and LD markers will be more useful than
LE markers because the latter require routine recording of phenotypes and genotypes to
estimate QTL effects within families.
In addition to MAS within local breeds, several other strategies for breed improvement could
be pursued in developing countries, including gene introgression and MAS within synthetic breeds.
This would be most advantageous for introducing specific disease resistance alleles into breeds
with improved production characteristics to make them more tolerant to the environments
encountered in developing countries. Gene introgression is, however, a long and expensive process
and only worthwhile for genes with large effects. MAS within synthetic breeds, e.g. a cross
between local and improved temperate climate breeds, can allow development of a breed that is
based on the best of both breeds (e.g. Zhang and Smith, 1992). Because of the extensive LD within
the cross, a limited number of markers would be needed. However, it is important to avoid the
impact of genotype × environment interactions if MAS is implemented in a more controlled
environment.
References

Chaiwong, N., Dekkers, J.C.M., Fernando, R.L. and Rothschild, M.F. 2002. Introgressing multiple QTL in
backcross breeding programs of limited size. Proc. 7th Wld. Congr. Genet. Appl. Livest. Prodn. Electronic
Communication No. 22: 08. Montpellier, France.
Darvasi, A. and Soller, M. 1997. A simple method to calculate resolving power and confidence interval of QTL
map location. Behavior Genetics 27: 125.
Dekkers, J.C.M. and Chakraborty, R. 2001. Potential gain from optimizing multi-generation selection on an
identified quantitative trait locus. J. Anim. Sci. 79: 2975.
Dekkers, J.C.M. and van Arendonk, J.A.M. 1998. Optimizing selection for quantitative traits with information
on an identified locus in outbred populations. Genet. Res. 71: 257–275.
Dominik, S., Henshall, J., O'Grady, J. and Marshall, K.J. 2007. Factors influencing the efficiency of a
marker-assisted introgression program in Merino sheep. Genet. Sel. Evol. 39: 495.
Ewing, B. and Green, P. 2000. Analysis of expressed sequence tags indicates 35,000 human genes. Nat. Genet.
25: 232–234.
Fernando, R. and Grossman, M. 1989. Marker assisted selection using best linear unbiased prediction.
Genet. Sel. Evol. 21: 467.
Fernando, R.L. 2004. Incorporating molecular markers into genetic evaluation. Session G6.1. Proc. 55th Meeting
of the European Association of Animal Production, 5–9 September 2004, Bled, Slovenia.
Fisher, R.A. 1918. The correlation between relatives on the supposition of Mendelian inheritance. Transactions
of the Royal Society of Edinburgh 52: 399.
Gibson, J.P. 1994. Short-term gain at the expense of long-term response with selection of identified loci. Proc.
5th Wld. Congr. Genet. Appl. Livest. Prodn. CD-ROM Communication No. 21: 201–204. University of
Guelph, Canada.
Hayes, B.J. and Goddard, M.E. 2001. The distribution of the effects of genes affecting quantitative traits in
livestock. Genet. Sel. Evol. 33: 209–229.
Henderson, C.R. 1984. Applications of linear models in animal breeding. Can. Catal. Publ. Data, Univ. Guelph,
Canada.
Hospital, F. and Charcosset, A. 1997. Marker-assisted introgression of quantitative trait loci. Genetics 147:
1469.
Kinghorn, B.P., Meszaros, S.A. and Vagg, R.D. 2002. Dynamic tactical decision systems for animal breeding.
Proc. 7th Wld. Congr. Genet. Appl. Livest. Prodn. Communication No. 23-07. Montpellier, France.
Koudandé, O.D., Iraqi, F., Thomson, P.C., Teale, A.J. and van Arendonk, J.A.M. 2000. Strategies to optimize
marker-assisted introgression of multiple unlinked QTL. Mammal. Genome 11: 145–150.
Lee, S.H. and van der Werf, J.H.J. 2004. The efficiency of designs for fine-mapping of quantitative trait loci using
combined linkage disequilibrium and linkage. Genet. Sel. Evol. 36: 145.
Lee, S.H. and van der Werf, J.H.J. 2006. An efficient variance component approach implementing an average
information REML suitable for combined LD and linkage mapping with a general complex pedigree.
Genet. Sel. Evol. 38: 25.
Meuwissen, T.H.E., Hayes, B. and Goddard, M.E. 2001. Prediction of total genetic value using genome-wide
dense marker maps. Genetics 157: 1819.
Meuwissen, T.H.E. and Goddard, M.E. 1996. The use of marker haplotypes in animal breeding
schemes. Genet. Sel. Evol. 28: 161.
Nimbkar, C., Pardeshi, V. and Ghalsasi, P. 2005. Evaluation of the utility of the FecB gene to improve the
productivity of Deccani sheep in Maharashtra, India, pp. 145–154. In H.P.S. Makkar and G.J. Viljoen, eds.
Applications of gene-based technologies for improving animal production and health in developing
countries. Netherlands, Springer.
Rothschild, M.F., Larson, R.G., Jacobson, C. and Pearson, P. 1991. PvuII polymorphisms at the porcine oestrogen
receptor locus (ESR). Anim. Genet. 22(5): 448.
Shrimpton, A.E. and Robertson, A. 1988. The isolation of polygenic factors controlling bristle score in
Drosophila melanogaster. II. Distribution of third chromosome bristle effects within chromosome
sections. Genetics 118: 445.
van der Werf, J.H.J. and Marshall, K. 2005. Combining gene-based methods and reproductive technologies to
enhance genetic improvement of livestock in developing countries, pp. 131–144. In H.P.S. Makkar and G.J.
Viljoen, eds. Applications of gene-based technologies for improving animal production and health in
developing countries. Netherlands, Springer.
Zhang, W. and Smith, C. 1992. Computer simulation of marker-assisted selection utilizing linkage
disequilibrium. Theor. Appl. Genet. 83: 813.

13

An Introduction to Quantitative Real Time PCR for Expression Analysis of Candidate Genes
Indrajit Ganguly, Sanjeev Singh, Monika Sodhi and Manishi Mukesh
ICAR- National Bureau of Animal Genetic Resources, Karnal, Haryana

________________________________________________________________________________________

Real-time PCR is a recent modification of polymerase chain reaction (PCR) that allows precise
quantification of specific nucleic acids in a complex mixture by fluorescent detection of labeled PCR
products. It is also known as kinetic PCR, qPCR, qRT-PCR and RT-qPCR. Both sequence-specific
probes and nonspecific fluorescent dyes may be used for detection. Real-time PCR is often used in the
quantification of gene expression levels. Before using real-time PCR to quantify a target message,
care must be taken to optimize the RNA isolation, primer design, and PCR reaction conditions so
that accurate and reliable measurements can be made. Here we will be discussing some basic
aspects of real-time PCR, primer and probe designing guidelines, real time chemistries, as well as
real time quantification using both relative and absolute quantification approaches. Useful Web
sites have been mentioned in the text during discussion.
Real Time PCR
Real time PCR is a technique used to monitor the progress of a PCR reaction in real time. A
relatively small amount of target nucleic acid (DNA, cDNA or RNA) can easily be quantified. Real time
PCR is based on the detection of the fluorescence produced by a reporter molecule, which increases
as the reaction proceeds because PCR product accumulates with each cycle of
amplification. These fluorescent reporter molecules include dyes that bind to double-stranded
DNA (e.g. SYBR Green) and sequence-specific probes (e.g. Molecular Beacons or TaqMan probes).
Real time PCR facilitates the monitoring of the reaction as it progresses. One can start with minimal
amounts of nucleic acid and quantify the product accurately. Moreover, there is no need for
post-PCR processing, which saves resources and time. These advantages of the fluorescence
based real time PCR technique have completely revolutionized the approach to PCR-based
quantification of DNA and RNA. Real time PCR assays are now easy to perform, have high
sensitivity and specificity, and provide scope for automation. When the starting material is RNA, the
technique is referred to as real time RT-PCR; it includes an additional reverse transcription step that
converts RNA into complementary DNA (cDNA). This step is needed because the DNA polymerases
used in PCR amplify DNA templates, and cDNA is also more stable than RNA.
Real Time PCR and Traditional PCR
Quantitative real-time PCR (qPCR) has become the most precise and accurate method for analyzing
gene expression. Prior to qPCR, the most common methods for determining expression levels were
northern blotting, RNase protection assays, or traditional endpoint reverse transcription (RT) PCR.
Endpoint RT-PCR was an improvement over the older methods due to its ease of use and the much
smaller amounts of RNA needed for the reaction. Traditional or conventional PCR uses gel
electrophoresis to detect the amplified product in the final phase, or end-point, of the PCR
reaction, so it is mainly useful for determining the presence or absence of a particular gene
product. In contrast, real-time PCR allows the accumulation of amplified product to be detected
during the early phases of the reaction and measured as the reaction progresses, that is, in real
time. The main advantage of real-time PCR over conventional PCR is that real-time PCR allows
you to determine the starting template copy number with accuracy and high
sensitivity over a wide dynamic range. Real-time PCR results can either be qualitative (presence or
absence of a sequence) or quantitative (number of copies of DNA). In contrast, conventional PCR is
at best semi-quantitative. Additionally, real-time PCR data can be evaluated without gel
electrophoresis, resulting in reduced experiment time and increased throughput. Finally, because
reactions are run and data are evaluated in a closed-tube system, opportunities for contamination
are reduced and the need for post amplification manipulation is eliminated.
Different phases of Real Time PCR amplification
A typical qPCR amplification plot may be divided into four major phases (Figure 1): the linear
ground phase, early exponential phase, log-linear phase, and plateau phase. During the linear
ground phase (usually the first 10–15 cycles), PCR is just beginning, and fluorescence emission is
yet to rise above the background. Baseline fluorescence is calculated at this point. At the early
exponential phase, the amount of fluorescence has reached a threshold where it is significantly
higher than background levels. The cycle at which this occurs is known as the threshold cycle (Ct) or crossing point (CP).
This value is representative of the starting copy number in the original template and is used to
calculate experimental results. During the log-linear phase, PCR reaches its optimal amplification
period with the PCR product doubling after every cycle in ideal reaction conditions. Finally,
amplification reaches a plateau as the reaction components are exhausted and the fluorescence
intensity is no longer useful for data calculation. In endpoint PCR, amplification can only be
viewed at the end of the reaction, and only the final plateau is observed; any differences in initial
abundance are obscured.
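The way a Ct value follows from an amplification curve can be sketched in a few lines of Python. The example assumes baseline-corrected fluorescence readings, an idealized curve that doubles every cycle until a plateau, and linear interpolation between the two cycles that bracket the threshold. Instrument software uses its own algorithms; the function names and numbers here are hypothetical.

```python
def ideal_curve(start_signal, n_cycles=40, plateau=10.0):
    """Baseline-corrected signal after each cycle, doubling every cycle until a plateau."""
    return [min(start_signal * 2 ** cycle, plateau) for cycle in range(1, n_cycles + 1)]


def threshold_cycle(signal, threshold):
    """Fractional cycle at which the signal first rises through the threshold.

    signal[i] is the reading after cycle i + 1. Returns None if the threshold is
    never crossed between two consecutive readings."""
    for i in range(1, len(signal)):
        before, after = signal[i - 1], signal[i]
        if before < threshold <= after:
            # linear interpolation between cycle i and cycle i + 1
            return i + (threshold - before) / (after - before)
    return None


low_input = ideal_curve(1e-6)        # arbitrary starting signal
high_input = ideal_curve(8 * 1e-6)   # eight times more starting template

ct_low = threshold_cycle(low_input, threshold=0.1)
ct_high = threshold_cycle(high_input, threshold=0.1)
print(round(ct_low, 2), round(ct_high, 2), round(ct_low - ct_high, 2))
```

With perfect doubling, an eight-fold (2³) difference in starting template shifts the Ct by three cycles. This is why Ct is representative of the starting copy number, whereas the plateau level is not.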

Figure 1. Phases of a typical qPCR amplification plot

Commonly Used Terms in Real Time PCR


1. Amplicon: A short segment of DNA generated by the PCR process
2. Amplification plot: The plot of Fluorescence signal versus cycle number
3. Standard: A sample of known concentration used to construct a standard curve. By running
standards of varying concentrations, a standard curve is created from which the quantity of an
unknown sample can be calculated.
4. NTC(no template control): A sample that does not contain template
5. Passive reference: A dye that provides an internal reference to which the reporter dye signal can
be normalized during data analysis. It helps to correct for fluctuations caused by changes in
concentration or volume.
6. Threshold: The average standard deviation of Rn for the early PCR cycles, multiplied by an
adjustable factor. It is the level of fluorescence at which reactions are in the exponential phase of
amplification.
7. Ct (threshold cycle): The cycle at which the amplification plot crosses the threshold, i.e. the
point at which there is a significant detectable increase in fluorescence.
8. Baseline: The noise level in early cycles (between cycles 3 and 15), where there is
no detectable increase in the fluorescence due to amplification products.
9. Unknown: A sample containing an unknown quantity of template. This is the sample
of interest (experimental sample as opposed to positive controls or standards) whose quantity is
being determined.
10. Background: Fluorescence in the reaction that is not due to PCR, e.g. from the presence of a large
amount of double stranded DNA or inefficient quenching of the fluorophore.
11. Endogenous reference gene: A gene whose expression level should not differ between
samples, such as a housekeeping gene (GAPDH, HPRT, beta-actin, etc.).
12. Slope: Mathematically calculated slope of the standard curve, e.g. the plot of Ct values against the
logarithm of ten-fold dilutions of target nucleic acid. This slope is used for efficiency calculation
(efficiency = 10^(−1/slope) − 1). Ideally, the slope should be about −3.32 (acceptable range −3.1 to
−3.6), which corresponds to 100% efficiency, i.e. two-fold amplification at each cycle. A slope of
exactly −3.3 gives an efficiency of 1.0092, i.e. 2.0092-fold amplification at each cycle.
13. Reference dye: Used in all reactions to obtain the normalized reporter signal (Rn), adjusted for
well-to-well variations by the analysis software. The most common passive reference dye is ROX, which is
usually included in the master mix. Not all instruments require the use of a reference dye (see
Table 1 in Real-Time PCR by Qiagen).
14. ROX: 6-carboxy-X-rhodamine. Most commonly used passive reference dye for normalization of
reporter signals in ABI instruments. The emission recorded from ROX during the baseline cycles
(usually 3 to 15) is used to normalize the emission recorded from the reporter due to
amplification in later cycles. The use of ROX improves the results by compensating for small
fluorescent fluctuations such as bubbles and well-to-well variations that may occur in the plate.
ROX or any other internal reference dye is not required by all machines (see the list in Table 1
in Qiagen Publication: Checklist for Multiplex Real-time PCR). If in a ROX requiring instrument,
a master mix with lower than required ROX concentration is used, the SD will be large and may
be reduced by using an appropriate master mix.
15. Rn (normalized reporter signal): The fluorescence emission intensity of the reporter dye divided by
the fluorescence emission intensity of the passive reference dye. Rn+ is the Rn value of a reaction
containing all components, including the template, and Rn- is the Rn value of an unreacted
sample. The Rn- value can be obtained from the early cycles of a real-time PCR run (those cycles
prior to a significant increase in fluorescence), or from a reaction that does not contain any template.
16. ΔRn (delta Rn, dRn): The magnitude of the fluorescence signal generated during the PCR at each
time point. The ΔRn value is determined by the following formula:
\[ \Delta R_n = (R_n^{+}) - (R_n^{-}) \]
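As a rough illustration of how the terms Rn, baseline, ΔRn, threshold and Ct fit together, the following minimal Python sketch normalizes a raw reporter trace by the passive reference, subtracts the baseline (taken here as the mean Rn over cycles 3 to 15, used as an estimate of Rn-), and reads off a fractional Ct by linear interpolation. The function names, the interpolation step and the example numbers are illustrative assumptions; instrument software uses its own algorithms.

```python
def delta_rn(reporter, reference, baseline_cycles=(3, 15)):
    """Return (rn, d_rn) for one well.

    reporter, reference: raw fluorescence per cycle (index 0 = cycle 1).
    baseline_cycles: inclusive, 1-based cycle window used to estimate Rn-.
    """
    # Rn: reporter emission divided by passive reference (e.g. ROX) emission
    rn = [r / p for r, p in zip(reporter, reference)]
    start, end = baseline_cycles
    window = rn[start - 1:end]
    rn_minus = sum(window) / len(window)  # assumed estimate of Rn-
    # delta Rn: normalized signal above the baseline at each cycle
    return rn, [value - rn_minus for value in rn]


def threshold_cycle(d_rn, threshold):
    """Fractional cycle at which d_rn first crosses the threshold, or None."""
    for i in range(1, len(d_rn)):
        lo, hi = d_rn[i - 1], d_rn[i]
        if lo < threshold <= hi:
            # d_rn[i - 1] belongs to cycle i (1-based); interpolate towards cycle i + 1
            return i + (threshold - lo) / (hi - lo)
    return None


# Hypothetical trace: flat baseline for 15 cycles, then rising signal
reporter = [100] * 15 + [110, 120, 140, 180, 260]
reference = [50] * 20
rn, d_rn = delta_rn(reporter, reference)
ct = threshold_cycle(d_rn, threshold=1.2)
```

In practice the threshold is placed within the exponential phase of amplification, as described in the definition of the threshold.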

Primer and Probe Designing Guidelines for Real Time PCR


Optimal primers are essential to ensure that only a single PCR product is amplified. In order to
avoid non-specific PCR products, primers should not have high sequence similarity with other
sequences. This can be checked using the Basic Local Alignment Search Tool (BLAST) from the
National Center for Biotechnology Information (http://blast.ncbi.nlm.nih.gov/Blast.cgi ). Primers
containing 16-28 nucleotides are enough for successful PCR amplification.
1. Small amplicon selection: Smaller amplicons in the 50 to 150 base-pair range are favoured because they
promote high-efficiency amplification. In addition, high-efficiency assays enable relative
quantitation to be performed using the comparative Ct (ΔΔCt) method (Livak and Schmittgen,
2001). This method increases sample throughput by eliminating the need for standard curves.
2. G/C content: Whenever possible, select primers and probes in a region with a G/C content of 30 to
80%. Regions with a G/C content >80% may not denature well during thermal cycling, leading
to a less efficient reaction. G/C-rich sequences are susceptible to nonspecific interactions that
may reduce reaction efficiency and produce nonspecific signal in assays using SYBR Green
reagents. Avoid primer and probe sequences containing runs of four or more G bases.
3. Melting temperature: When working with Primer Express software, you can select primers and
probes with the recommended melting temperature (Tm) using universal thermal cycling
conditions. It is generally recommended that the probe Tm should be 10 °C higher than that of
the primers. In Primer Express software the recommended Tm for the probe is 68-70 °C and for primers it
is about 58-60 °C.
4. 5' end of probes: Primer Express software does not select probes with a G on the 5' end. The
quenching effect of a G base in this position will be present even after probe cleavage. The
presence of a G base can result in reduced fluorescence values (ΔRn) that can negatively affect
assay performance. G bases in positions close to the 5' end, but not on it, have not been shown to
compromise assay performance.
5. 3' end of primers: To reduce the possibility of nonspecific product formation, ensure that the last
five bases on the 3' end of the primers do not contain more than two C and/or G bases. Under
certain circumstances, such as a G/C-rich template sequence, you may have to relax this
recommendation to keep the amplicon under 150 base pairs in length. In general, avoid primer
3' ends extremely rich in G and/or C bases.
6. General considerations:
   - Select the probe first, and then design the primers as close as possible to the
     probe without overlapping the probe.
   - An intron-spanning primer pair should be preferred in order to prevent potential signals from
     genomic DNA contamination in the sample.
   - Make TaqMan MGB probes as short as possible without being shorter than 13 nucleotides.
   - Finally, if oligo(dT) is used for priming in reverse transcription, primers should be located
     within 1000 bp of the 3' end of the mRNA (Wang et al., 2006).
Several free online tools and commercial software packages can be used for
primer design once the parameters described above are provided. A selected list of useful web
resources and some commercial programs is given in Table 1.
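Some of the sequence rules given above (primer length, G/C content, runs of G bases, the 3' end of primers and the 5' end of probes) are simple enough to screen for automatically. The following Python sketch is only an illustration of such a screen; the function names and message strings are hypothetical, and it does not replace Tm calculation, BLAST specificity checks or dedicated software such as the tools in Table 1.

```python
import re


def gc_percent(seq):
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def check_primer(seq):
    """Return a list of guideline violations for a primer (empty list = none found)."""
    seq = seq.upper()
    problems = []
    if not 16 <= len(seq) <= 28:
        problems.append("length outside 16-28 nt")
    if not 30.0 <= gc_percent(seq) <= 80.0:
        problems.append("G/C content outside 30-80%")
    if re.search(r"G{4,}", seq):
        problems.append("run of four or more G bases")
    # last five bases at the 3' end should contain at most two G and/or C
    if seq[-5:].count("G") + seq[-5:].count("C") > 2:
        problems.append("more than two G/C in the last five 3' bases")
    return problems


def check_probe(seq):
    """Return a list of guideline violations for a hydrolysis probe."""
    seq = seq.upper()
    problems = []
    if seq.startswith("G"):
        problems.append("G on the 5' end")
    if not 30.0 <= gc_percent(seq) <= 80.0:
        problems.append("G/C content outside 30-80%")
    if re.search(r"G{4,}", seq):
        problems.append("run of four or more G bases")
    return problems


# Hypothetical sequences, for illustration only
print(check_primer("ATGCATGCATGCATGCATAT"))
print(check_primer("ATGGGGATCGATCGATCGCC"))
print(check_probe("GATTACAGATTACAGATTACA"))
```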

Real Time detection chemistries


A key step in designing a qPCR assay is selecting the chemistry to monitor the accumulation of
amplified target sequence. A variety of fluorescent chemistries are available:
Sequence non-specific detection (intercalating dye): In sequence non-specific detection, DNA
binding dyes (SYBR Green I, YO-PRO, SYTO-9, BEBO, etc.) emit fluorescence after binding to
dsDNA. As the double-stranded PCR product accumulates during cycling, more dye can bind and
emit fluorescence. Thus, the fluorescence intensity increases proportionally to the dsDNA
concentration. SYBR Green I is the most frequently used dsDNA-specific dye in Real-time PCR and
its binding affinity to dsDNA is 100 times higher than that of ethidium bromide. After SYBR Green I
binds to dsDNA, it emits 1000-fold greater fluorescence than the unbound dye. This
technique is very flexible because one dye can be used for different gene assays, and it is
inexpensive and simple to use. On the other hand, multiplexing of reactions is not possible. Because DNA
binding dyes do not bind in a sequence-specific manner, these assays are also prone to false positives;
this problem can be addressed by dissociation curve (melting curve) analysis (Figure 2).
If two or more peaks are present after melting curve analysis, there is more than
one amplified product in the reaction, and amplification was therefore not specific to a single DNA
sequence. This method is not affected by the presence of variations (i.e., single
nucleotide polymorphisms or SNPs) in the target sequence. Moreover, less specialized knowledge
is required than for the design of fluorescently labeled oligonucleotide probes.
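The peak-counting logic of melting curve analysis can be sketched as follows. Melt curves are commonly displayed as the negative first derivative of fluorescence with respect to temperature (-dF/dT), so that each product appears as a peak at its melting temperature. The Python sketch below uses simple finite differences and a hypothetical minimum peak height; the function name, the cut-off and the data are assumptions for illustration, and real instrument software applies smoothing and its own peak calling.

```python
def melt_peaks(temps, fluorescence, min_height):
    """Return temperatures of local maxima of -dF/dT that reach min_height."""
    mids, neg_slope = [], []
    for i in range(1, len(temps)):
        dt = temps[i] - temps[i - 1]
        # -dF/dT evaluated at the midpoint between two successive readings
        mids.append((temps[i] + temps[i - 1]) / 2)
        neg_slope.append(-(fluorescence[i] - fluorescence[i - 1]) / dt)
    peaks = []
    for i in range(1, len(neg_slope) - 1):
        is_local_max = neg_slope[i] > neg_slope[i - 1] and neg_slope[i] >= neg_slope[i + 1]
        if is_local_max and neg_slope[i] >= min_height:
            peaks.append(mids[i])
    return peaks


# Hypothetical readings from 70 to 90 degrees C in 1 degree steps
temps = list(range(70, 91))
single_product = [100] * 10 + [98, 90, 60, 20, 5, 2, 1, 1, 1, 1, 1]
with_extra_product = [120] * 4 + [115, 105] + [100] * 4 + [98, 90, 60, 20, 5, 2, 1, 1, 1, 1, 1]

for name, trace in [("single", single_product), ("extra", with_extra_product)]:
    peaks = melt_peaks(temps, trace, min_height=5)
    print(name, peaks, "specific" if len(peaks) == 1 else "more than one product")
```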

Figure 2. Amplification data using SYBR Green reagents. (a) Amplification plot (linear view) demonstrating
suspected nonspecific amplification in negative control (NC) wells; (b) Melt curve analysis
confirming that the product in NC wells has a different melting temperature from the specific product.

Sequence specific detection: Fluorescent probe based technology in Real-time PCR allows
sensitive and specific detection. Three types of probes, with distinct
molecular structures and attached dyes, are mostly used: hybridization probes, hydrolysis probes and
hairpin probes. All detection methods using fluorescent probe technology rely on a process referred
to as fluorescence resonance energy transfer (FRET), in which light energy is transferred between
two adjacent dye molecules (Espy et al., 2006). Although both hydrolysis and hybridization
probes depend on FRET to change fluorescence emission intensity, the energy transfer works in
opposite ways in these two chemistries: FRET reduces fluorescence intensity in
hydrolysis probes, whereas it increases intensity in hybridization probes.
A. Hybridization probes
One or two hybridization probes can be used in an assay. When two hybridization probes
are used, they bind to the target sequence in close proximity to each other in a head-to-tail arrangement
(Figure 3). The upstream probe carries an acceptor (or quencher) dye on its 3' end, and the second
(downstream) probe is labeled with a donor (or reporter) dye on its 5' end. In the one-probe
method, on the other hand, the upstream primer is labeled with an acceptor dye on its 3' end in place of
a labeled probe. The labeled primer thus takes over the function of one of the probes used in the
two-probe method. In both cases, the energy transfer depends on the distance between the two dye molecules.
Because of the distance between two dyes in solution, donor dye emits only background
fluorescence. When the probes hybridize to their complementary sequence, this binding brings the
two dyes in close proximity to one another and FRET occurs at high efficiency. Since a fluorescent
signal is detected only when two independent probes hybridize to their correct target
sequence, the increase in measured fluorescence is proportional to the amount of DNA
synthesized during the PCR. Moreover, as the probes are not hydrolyzed, the fluorescence
signal is reversible and allows the generation of melting curves (Bustin, 2000).

Figure 3. Hybridization and hydrolysis probe based detection in Real-time PCR

B. Hydrolysis probes
Hydrolysis probes (also known as TaqMan probes, used in the 5' nuclease assay) contain a fluorescent
reporter dye at the 5' end and a quencher dye at the 3' end. If the probe is unbound, the reporter and
quencher dyes are maintained in close proximity, which allows the quencher to reduce the reporter
fluorescence intensity by FRET, and thus no reporter fluorescence is detected (Bustin, 2000) (Figure
3). After annealing to the target sequence, the bound and quenched probe is degraded by the
5' nuclease activity of the DNA polymerase during the extension step of the PCR. Probe degradation
separates the reporter from the quencher dye, resulting in increased fluorescence
emission (Figure 3). Minor groove binders (MGBs), such as dihydrocyclopyrroloindole tripeptide
(DPI3), may be added to these probes to increase their Tm and allow the use of a shorter probe.
These probes are less expensive, display reduced background fluorescence and have a larger dynamic
range due to increased efficiency of reporter quenching. Hydrolysis probes are usually made of
standard nucleic acids; however, more recently developed hydrolysis probes containing Locked Nucleic
Acids (LNA) are commercially available from Roche Applied Science under the name
Universal Probe Library (UPL) probes and can be accessed online
(www.universalprobelibrary.com). LNAs are DNA nucleotide analogues with increased binding
strengths compared to standard DNA nucleotides. In order to maintain specificity and Tm,
LNA bases are incorporated in each UPL probe.

C. Hairpin probes
Hairpin or stem-loop DNA probes display an increased specificity of target recognition compared
to linear DNA probes. Hairpin DNA probes are single-stranded oligonucleotides that contain a
sequence complementary to the target, flanked by self-complementary termini unrelated to the target.
The invention of hairpin probes made it possible to follow the hybridization process in real time. They are
widely used in different applications, and two major factors are responsible for such broad
application of these DNA probes: enhanced specificity of the probe-target interaction and the
possibility of closed-tube real-time monitoring formats (Broude, 2005). Several types of
hairpin probes are commercially available, including molecular beacons, Scorpions, LUX™ fluorogenic
primers and Sunrise™ primers (Figure 4).
a. Molecular beacons
This class of hairpin probes was first developed in 1996 (Tyagi and Kramer, 1996). A molecular
beacon is a dye-labelled oligonucleotide (25-40 nt) that forms a hairpin structure with a stem and a
loop (Figure 4A). The 5' and 3' ends of the probe have complementary sequences of 5-6 nucleotides
that form the stem structure. The loop portion of the hairpin is designed to hybridize specifically to
a 15-30 nucleotide section of the target sequence. A fluorescent reporter molecule is attached to the
5' end of the molecular beacon, and a quencher is attached to the 3' end. Formation of the hairpin
therefore brings the reporter and quencher together, so no fluorescence is emitted. During the
annealing step of the amplification reaction, the loop portion of the molecular beacon binds to its
target sequence, causing the stem to denature. The reporter and quencher are thus separated,
quenching is abolished, and the reporter fluorescence is detectable. Because fluorescence is only
emitted from the probe when it is bound to the target, the amount of fluorescence detected is
proportional to the amount of target in the reaction. The fluorescence of the probe increases
100-fold when it binds to its target. Molecular beacons have some advantages over other
chemistries. They are highly specific, can be used for multiplexing, and if the target sequence does
not match the beacon sequence exactly, hybridization and fluorescence will not occur, which is
especially desirable for allelic discrimination experiments. Unlike TaqMan probes, molecular
beacons are displaced but not destroyed during amplification, because a DNA polymerase lacking
5' exonuclease activity is used. The main disadvantage of molecular beacons is that they are
difficult to design. The stem of the hairpin must be strong enough that the molecule will not
spontaneously fold into non-hairpin conformations that result in unintended fluorescence. At the
same time, the stem of the hairpin must not be too strong; otherwise the beacon may not properly
hybridize to the target.
b. Scorpions primers
These assays use two primers. Scorpions combine the detection probe with the upstream PCR
primer and consist of a fluorophore on the 5' end, followed by a complementary stem-loop structure
(also containing the specific probe sequence), a quencher dye, a DNA polymerase blocker, and finally a
PCR primer on the 3' end (Figure 4B). During the first amplification cycle, the Scorpions primer is
extended, and the sequence complementary to the loop sequence is generated on the same strand.
After subsequent denaturation and annealing, the loop of the Scorpions probe hybridizes to the
internal target sequence, and the reporter is separated from the quencher. The resulting fluorescent
signal is proportional to the amount of amplified product in the sample. The Scorpions probe
contains a PCR blocker just 3' of the quencher to prevent read-through during the extension of the
opposite strand.
c. LUX™ fluorogenic primers
Light upon extension (LUX) primers (Invitrogen, Carlsbad, CA, USA) are self-quenched,
single-fluorophore-labeled primers. These assays employ two primers, one of which is a hairpin-shaped
primer with a fluorescent reporter attached near the 3' end (Figure 4C). The reporter is quenched by
the secondary structure of the hairpin. During amplification, the LUX primer is incorporated into
the product, eliminating the quenching hairpin structure, so fluorescence is emitted. LUX primers
are designed to have a G or C as the 3'-terminal nucleotide and a fluorophore attached to the second or
third base (a thymine nucleotide) from the 3' end. The primer also has a 5' tail of five to seven
nucleotides that is complementary to the 3' end of the primer. This design allows the molecule to form
a blunt-end hairpin structure with low fluorescence at temperatures below its Tm.
d. Sunrise™ primers
The Sunrise primer-probes, originally created by Oncor (Gaithersburg, MD, USA), are
bifunctional molecules similar to Scorpions primer-probes, which combine both the PCR primer and the
detection mechanism in the same molecule. The Sunrise primer-probes have a dual-labeled (reporter
and quencher fluorophores) hairpin loop on the 5' end, with the 3' end acting as the PCR primer
(Figure 4D). The unbound, intact hairpin causes reporter quenching via FRET. Upon integration into the
newly formed PCR product, the reporter and quencher are held far enough apart to allow reporter
emission.

Figure 4. Various types of commercially available hairpin probes

Real Time Quantification


Quantification of mRNA transcription can be performed by absolute or relative quantitative
Real-time PCR methods (Souazé et al., 1996; Pfaffl, 2001a; Bustin, 2002). In absolute quantification,
the amount of a target sequence in the sample (as copy number, or in picograms or nanograms of DNA or
RNA) is accurately measured, while relative quantification provides relative changes
in mRNA expression levels as a ratio of the amount of initial target sequence between control and
analyzed samples (Souazé et al., 1996; Pfaffl, 2001a; Fraga et al., 2008). Thus, relative quantification
simply allows us to determine the fold change between sample and control. If the purpose is to
measure the copy number of a target sequence accurately, the absolute quantification strategy, which
requires standards of known copy number, should be used. Moreover, these standards
should be amplified in the same run (Peirson et al., 2003). Both approaches are widely used, but
relative quantification requires less set-up time and is easier to perform than absolute quantification
because a standard curve is not essential (Livak, 2001; Fraga et al., 2008). Furthermore, it is
usually not necessary to know the absolute amount of mRNA in biological applications
examining gene expression (Bustin, 2002; Huggett et al., 2005).
Absolute Quantification: Absolute quantification requires a standard calibration curve built from
serially diluted standards of known concentrations in order to obtain highly specific, sensitive and
reproducible results. The linear relationship between Ct and the logarithm of the initial amount of
total RNA or cDNA in the standard curve allows the concentration of an unknown to be determined from
its Ct value. In this method, all
standards and samples are assumed to have equal amplification efficiency. It is necessary to control
the efficiency of the Real-time PCR reaction to quantify mRNA levels (Fraga et al., 2008). The
calibration curve and the target cDNA must have identical reverse transcription and Real-time PCR
amplification efficiencies to provide a valid standard for mRNA quantification (Pfaffl and Hageleit,
2001). The amplification efficiencies of the standard and the unknown target sequence should be
approximately equal, and the serial dilutions should span the concentration range of the
unknown(s) in order to ensure correct results. The standard and target sequence should have the
same primer binding sites and produce a product of approximately the same size and sequence
(Fraga et al., 2008). The standard can be based on known concentrations of double-stranded DNA
(dsDNA), single-stranded DNA (ssDNA), a commercially synthesized long oligonucleotide, or
complementary RNA (cRNA) bearing the target sequence.
DNA standards can be prepared by cloning the target sequence into a plasmid, by purifying a
conventional PCR product, or by direct chemical synthesis. These standards have a
larger quantification range, greater sensitivity, better reproducibility and higher stability
than RNA standards. However, DNA standards generally cannot be used for
absolute quantitation of RNA because there is no control for the efficiency of the reverse
transcription step (Livak, 2001; Wong and Medrano, 2005). Therefore, RNA molecules are strongly
recommended as standards for quantification of RNA.
For RNA standard preparation, an in vitro-transcribed sense RNA transcript is generated,
followed by digestion with RNase-free DNase to eliminate DNA contamination. A recombinant
RNA (recRNA) can be synthesized in vitro by cloning the DNA of the gene of interest (GOI) into a
suitable vector, typically containing SP6, T3, or T7 phage RNA polymerase promoters. Several
commercial kits are available that facilitate the production of RNA from these vectors. After the in vitro
transcribed RNA (standard RNA) is synthesized, its concentration is measured on a
spectrophotometer and the absorbance is converted to a target copy number per µg RNA (Bustin,
2000). Once the standard has been accurately quantified, it is serially diluted in increments of 5- to
10-fold, and each dilution should be run in triplicate (Fraga et al., 2008). To maximize accuracy, the
dilutions should cover the range of copy numbers that includes the amount of target mRNA expected to
be present in the experimental samples (Bustin, 2000; Fraga et al., 2008). The
average Ct values from each dilution are then plotted against the logarithm of the absolute amount of
standard present in the sample to generate a standard curve (Figure 5). Comparison of experimental Ct
values to this standard curve produces an estimate of the amount of target present in the initial
sample (Bustin, 2000; Fraga et al., 2008).
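The standard curve calculation can be sketched in a few lines of Python. The sketch below fits Ct against log10 of the standard quantity by ordinary least squares, derives the amplification efficiency from the slope as E = 10^(-1/slope) - 1, and estimates the quantity of an unknown from its Ct. The function names and the dilution data are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    x = [math.log10(q) for q in quantities]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(ct_values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def efficiency(slope):
    """Amplification efficiency from the slope (1.0 corresponds to 100%)."""
    return 10 ** (-1.0 / slope) - 1.0


def quantity_from_ct(ct, slope, intercept):
    """Estimate the starting quantity of an unknown from its Ct."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical ten-fold dilution series (copies per reaction) and mean Ct of triplicates
standards = [1e7, 1e6, 1e5, 1e4, 1e3]
mean_ct = [15.0, 18.32, 21.64, 24.96, 28.28]

slope, intercept = fit_standard_curve(standards, mean_ct)
print("slope:", round(slope, 2), "efficiency:", round(efficiency(slope), 3))
print("unknown with Ct 23.3:", round(quantity_from_ct(23.3, slope, intercept)), "copies")
```

An unknown whose Ct falls outside the range covered by the standards would be extrapolated rather than interpolated, which is why the dilutions should span the expected range of the samples.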

Figure 5. Standard curve for absolute quantification and estimation of the concentration of an unknown sample

Relative Quantification: Relative quantification analysis determines the level of expression of a
GOI and expresses it relative to the level of an internal control or reference gene (RG). Results are
given as the ratio of the GOI to one or more RGs. In this type of analysis, the function of the RG is to
normalize the data for differences in RNA (DNA) quantification and template input. Therefore,
expression of the RG has to be analyzed in the same sample as the GOI; it can be co-amplified in
the same tube as a multiplex assay (probes), or the same sample can be used in separate tubes as
a simplex assay. Reference genes are genes that are not affected by the treatment in any way and
are constant under the tested conditions. Hence, the reliability of the relative quantification analysis
is strongly dependent on the stability of the RG. Several tools are available for the determination of
the best RG, such as geNorm (http://medgen.ugent.be/~jvdesomp/genorm/) and
BestKeeper (http://www.gene-quantification.info). There are numerous mathematical models
available to calculate the mean normalized gene expression (relative expression ratio R, or fold
induction) from relative quantitation assays based on the comparison of the distinct cycle
differences. Depending on the method employed, these can yield different results and thus
discrepant measures of standard error. Relative quantification models in general use include the following:
(A) Relative quantification without efficiency correction, or the comparative Ct method: The
comparative Ct method is a mathematical model based on the delta-Ct (ΔCt) (Wittwer et al., 2001)
or, in most applications, the delta-delta-Ct (ΔΔCt) values described by Livak and Schmittgen
(2001), without efficiency correction. In this model, an optimal doubling of the target
sequence during each Real-time PCR cycle is assumed. This analysis can be performed
in two ways:
-Non-normalized expression (also known as the ΔCt method) and
-Normalized expression (also known as the ΔΔCt method)
i) Non-normalized expression (ΔCt) method: In relative quantification, the expression of a gene in the
sample is compared with the expression of the same gene in the control. Ct values are not normalized
to a housekeeping gene; instead, normalization is accomplished via equal loading of
samples. Quantitation is performed relative to the control by subtracting the Ct value of the gene in
the control from the Ct value of the gene in the sample (ΔCt). The fold difference of the target gene
between sample and control is calculated by using the negative of the resulting difference in cycle
number (ΔCt) as the exponent of base 2 (due to the doubling function of PCR), as given below in eq. 1 and 2.
\[ R = 2^{-\Delta C_t} \tag{1} \]
\[ R = 2^{-[C_t(\text{sample}) - C_t(\text{control})]} \tag{2} \]

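ii) Normalized expression (ΔΔCt) method: In this approach, loading differences are eliminated because
the Ct values of the target gene in both the control and the samples are normalized to an
appropriate housekeeping or reference gene. This method is also known as the
2^(-ΔΔCt) method. The formulas are given below in eq. 3 and 4.
\[ R = 2^{-\Delta\Delta C_t} \tag{3} \]
\[ R = 2^{-[\Delta C_t(\text{sample}) - \Delta C_t(\text{control})]} \tag{4} \]
where
\[ \Delta C_t(\text{sample}) = C_t(\text{target gene}) - C_t(\text{reference gene}) \quad \text{(in the sample)} \]
\[ \Delta C_t(\text{control}) = C_t(\text{target gene}) - C_t(\text{reference gene}) \quad \text{(in the control)} \]
\[ \Delta\Delta C_t = \Delta C_t(\text{sample}) - \Delta C_t(\text{control}) \]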

The reaction must be rigorously optimized and the PCR product size should be kept small (less than 150
bp). The comparative Ct method can be chosen when assaying a large number of samples because a
standard curve is unnecessary. This model is acceptable for a first approximation of the crude
expression ratio. However, efficiency (E) corrected models are needed to obtain reliable relative
expression data (Pfaffl et al., 2009).
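As a worked illustration of the comparative Ct calculations, the short Python sketch below computes the fold change by the non-normalized (ΔCt) and the normalized (ΔΔCt) approaches, assuming perfect doubling at every cycle and the usual sign convention R = 2^(-ΔΔCt). The function names and Ct values are hypothetical.

```python
def fold_change_dct(ct_sample, ct_control):
    """Non-normalized fold change: R = 2^-(Ct sample - Ct control)."""
    return 2.0 ** -(ct_sample - ct_control)


def fold_change_ddct(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Normalized fold change: R = 2^-(dCt sample - dCt control)."""
    dct_sample = ct_target_sample - ct_ref_sample      # target vs reference gene, sample
    dct_control = ct_target_control - ct_ref_control   # target vs reference gene, control
    ddct = dct_sample - dct_control
    return 2.0 ** -ddct


# Hypothetical mean Ct values
print(fold_change_dct(ct_sample=22.0, ct_control=24.0))
print(fold_change_ddct(ct_target_sample=22.0, ct_ref_sample=18.0,
                       ct_target_control=25.0, ct_ref_control=19.0))
```

In both examples the target crosses the threshold two cycles earlier in the sample than in the control (after normalization in the second case), which corresponds to a four-fold higher expression under the doubling assumption.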
(B) Relative quantification with efficiency correction: Pfaffl Method
The 2^(-ΔΔCt) method for calculating relative gene expression is only valid when the amplification
efficiencies of the target and reference genes are similar. If the amplification efficiencies of the two
amplicons are not the same, an alternative formula must be used to determine the relative expression
of the target gene in different samples. To determine the expression ratio between the sample and the
calibrator, use the following formula:
\[ \text{Ratio} = \frac{(E_{\text{target}})^{\Delta C_t,\ \text{target}\ (\text{calibrator} - \text{sample})}}{(E_{\text{ref}})^{\Delta C_t,\ \text{ref}\ (\text{calibrator} - \text{sample})}} \]
Here E_target and E_ref are the amplification efficiencies of the target and reference assays, expressed
as the fold amplification per cycle (E = 2 for perfect doubling).
This Pfaffl model combines gene quantification and normalization into a single calculation. The
model incorporates the amplification efficiencies of the target and reference (normalization) genes
to correct for differences between the two assays. The relative expression software tool (REST),
which runs in Microsoft Excel, automates data analysis using this model (Pfaffl et al., 2002; Pfaffl et
al., 2009). REST uses the Pairwise Fixed Reallocation Randomization Test to calculate the significance
of results and will indicate whether the reference gene used is suitable for normalization.
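The efficiency-corrected ratio can be computed directly once the efficiencies of both assays are known. The following Python sketch is a minimal illustration, assuming that efficiencies are expressed as fold amplification per cycle (2.0 for perfect doubling, obtainable from a standard curve as 10^(-1/slope)); the function name and the values are hypothetical, and the sketch does not reproduce the statistical testing performed by REST.

```python
def pfaffl_ratio(e_target, e_ref,
                 ct_target_calibrator, ct_target_sample,
                 ct_ref_calibrator, ct_ref_sample):
    """Efficiency-corrected expression ratio of sample versus calibrator."""
    d_ct_target = ct_target_calibrator - ct_target_sample  # calibrator minus sample
    d_ct_ref = ct_ref_calibrator - ct_ref_sample
    return (e_target ** d_ct_target) / (e_ref ** d_ct_ref)


# With both efficiencies equal to 2.0 the result equals the 2^(-ddCt) value
equal_eff = pfaffl_ratio(2.0, 2.0, 25.0, 22.0, 19.0, 18.0)

# Hypothetical case in which the target assay amplifies less efficiently
corrected = pfaffl_ratio(1.9, 2.0, 25.0, 22.0, 19.0, 18.0)
print(equal_eff, round(corrected, 3))
```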
(C) Relative quantification by standard curve method: This method is used when the amplification
efficiencies of the target and the endogenous control are not the same. Standard curves
are prepared for both the target and the endogenous control. For each experimental sample, the
amount of target and of endogenous control is determined from the appropriate standard curve.
Then, the target amount is divided by the endogenous control amount to obtain a normalized target
value. One of the experimental samples is designated as the calibrator, or 1x sample. The calibrator
is usually the expression level at baseline, and the experimental samples are those collected after
treatment or some intervention. Each normalized target value is divided by the calibrator
normalized target value to generate the relative expression levels.
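The steps of the standard curve method can be illustrated with a minimal Python sketch. It assumes that each standard curve has already been fitted as Ct = slope * log10(quantity) + intercept; the curve parameters, sample names and Ct values below are hypothetical.

```python
def quantity_from_ct(ct, slope, intercept):
    """Read a quantity off a standard curve Ct = slope * log10(quantity) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def relative_expression(samples, target_curve, reference_curve, calibrator):
    """samples: {name: (ct_target, ct_reference)}; curves: (slope, intercept) tuples."""
    normalized = {}
    for name, (ct_target, ct_reference) in samples.items():
        target_amount = quantity_from_ct(ct_target, *target_curve)
        reference_amount = quantity_from_ct(ct_reference, *reference_curve)
        # normalized target value: target amount divided by endogenous control amount
        normalized[name] = target_amount / reference_amount
    # express every sample relative to the calibrator (1x sample)
    return {name: value / normalized[calibrator] for name, value in normalized.items()}


# Hypothetical standard curves with different slopes (unequal efficiencies)
target_curve = (-3.4, 38.0)
reference_curve = (-3.3, 36.0)
samples = {"untreated": (27.8, 19.5), "treated": (24.4, 19.5)}

print(relative_expression(samples, target_curve, reference_curve, calibrator="untreated"))
```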
Technical and biological replicates
Depending on the application, the use of technical replicates, biological replicates, or both has to be
considered. Often, the same cDNA sample is analyzed in triplicate in one RT-qPCR run. This type
of technical replicate only reflects the pipetting skills of the operator and the accuracy
of the PCR instrument. Biological replicates refer to the application of the same treatment to two or
more samples. From each of the samples, RNA isolation and cDNA synthesis are performed
independently but under identical conditions. Each of the resulting cDNA samples can be analyzed
once by RT-qPCR. Both types of replicates (technical or biological) provide information about the
experimental variation and allow statistics to be applied to identify differences in expression levels
between samples. For beginners, it is good practice to include technical replicates to test
pipetting skills. When testing the amplification efficiency of a new primer set, it is advisable to
include at least a triplicate of each dilution point. When investigating the effects of a treatment,
biological replicates are of greater value. For example, in an in vitro experiment, cells
are incubated in the presence or absence of a stimulus, and the treatment is repeated in at least three
replicate wells. Each of the three wells is a biological replicate; however, the cells are derived from a
single individual. It would be more relevant to repeat the same in vitro experiment on cells isolated
from three different individuals, each of them being a biological replicate.
Table 1: Useful web links and some commercial programs for primer and probe design

Software | Purpose | URL
Primer3 | Picking primers and hybridization probes | http://frodo.wi.mit.edu/primer3/input.htm
Primer-BLAST | For making primers. It uses Primer3 to design primers and then submits them to BLAST search | http://www.ncbi.nlm.nih.gov/tools/primer-blast/
RTPrimerDB | Public database for primer and probe sequences used in real-time PCR assays employing popular chemistries (SYBR Green I, TaqMan, Hybridisation Probes, Molecular Beacon) | http://www.rtprimerdb.org/
PrimerBank | Public database of Real-time PCR primers. Contains over 306,800 primers, mostly for human and mouse | http://pga.mgh.harvard.edu/primerbank/
OligoCalc | Calculates oligonucleotide properties | http://www.basic.northwestern.edu/biotools/oligocalc.html
Universal Probe Library | Designing primers and UPL hydrolysis probes | www.universalprobelibrary.com or http://www.roche-appliedscience.com
Primer Express | Designing primers and TaqMan probes | www.appliedbiosystems.com
Beacon Designer | Real-time PCR primers and probes | http://www.premierbiosoft.com
Primer Premier | Primer design | http://www.premierbiosoft.com

References

Broude, N. E. 2005. Molecular Beacons and Other Hairpin Probes. In: Encyclopedia of Diagnostic Genomics and
Proteomics, pp. 846-850, Marcel Dekker, Inc., New York.
Bustin, S. A. 2000. Absolute quantification of mRNA using real-time reverse transcription polymerase chain
reaction assays. J. Mol. Endocrinol., 25: 169.
Bustin, S. A. 2002. Quantification of mRNA using real-time reverse transcription PCR (RT-PCR): trends and
problems. J. Mol. Endocrinol., 29: 23.
Espy, M. J., Uhl, J. R., Sloan, L. M., Buckwalter, S. P., Jones, M. F., Vetter, E. A., Yao, J. D., Wengenack, N. L.,
Rosenblatt, J. E., Cockerill, F. R. 3rd., and Smith, T. F. 2006. Real-time PCR in clinical microbiology:
applications for routine laboratory testing. Clin. Microbiol. Rev., 19: 165.
Fraga, D., Meulia, T., and Fenster, S. 2008. Real-Time PCR. In: Current Protocols Essential Laboratory
Techniques, Gallagher, S. R., and Wiley, E. A. (Eds), 10.3.1-10.3.34, John Wiley and Sons, Inc. Retrieved
from http://onlinelibrary.wiley.com/doi/10.1002/9780470089941.et1003s00/full
Huggett, J., Dheda, K., Bustin, S., and Zumla, A. 2005. Real-time RT-PCR normalization; strategies and
considerations. Genes Immun., 6: 279.
Livak, K. J., and Schmittgen, T. D. 2001. Analysis of relative gene expression data using real-time quantitative
PCR and the 2(-Delta Delta C(T)) Method. Methods, 25: 402.
Livak, K. J. 2001. Relative quantification of gene expression. ABI Prism 7700 Sequence Detection System User
Bulletin #2. http://docs.appliedbiosystems.com/pebiodocs/04303859.pdf
Peirson, S. N., Butler, J. N., and Foster, R. G. 2003. Experimental validation of novel and conventional
approaches to quantitative real-time PCR data analysis. Nucleic Acids Res., 31: e73.
Pfaffl, M. W. 2001a. A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids
Res., 29(9): e45.
Pfaffl, M. W., and Hageleit, M. 2001. Validities of mRNA quantification using recombinant RNA and
recombinant DNA external calibration curves in real-time RT-PCR. Biotechnology Letters, 23(4): 275-282.
Pfaffl, M. W., Horgan, G. W., and Dempfle, L. 2002. Relative expression software tool (REST) for group-wise
comparison and statistical analysis of relative expression results in real-time PCR. Nucleic Acids Res.,
30: e36.
Pfaffl, M. W., Vandesompele, J., and Kubista, M. 2009. Data analysis software. In: Real-time PCR: Current
Technology and Applications, Logan, J., Edwards, K., and Saunders, N. (Eds), pp. 65-83, Caister Academic Press,
978-1-90-44-55-39-4, Norfolk, UK.
Souazé, F., Ntodou-Thomé, A., Tran, C. Y., Rostène, W., and Forgez, P. 1996. Quantitative RT-PCR: limits and
accuracy. Biotechniques, 21(2): 280-285.
Tyagi, S., and Kramer, F. R. 1996. Molecular beacons: probes that fluoresce upon hybridization. Nat.
Biotechnol., 14: 303-308.
Wang, X., and Seed, B. 2006. High-throughput primer and probe design. In: Real-time PCR, Dorak, T. M. (Ed), pp.
93-106, International University Line, 0-4153-7734-X, New York, USA.
Wittwer, C. T., Herrmann, M. G., Gundry, C. N., and Elenitoba-Johnson, K. S. 2001. Real-time multiplex PCR
assays. Methods, 25(4): 430-442.
Wong, M. L., and Medrano, J. F. 2005. Real-time PCR for mRNA quantitation. Biotechniques, 39(1): 75-85.


14
High Throughput Techniques for Transcriptome Analysis in Farm
Animals with Special Reference to Expression Microarrays
Manishi Mukesh and Monika Sodhi
ICAR-National Bureau of Animal Genetic Resources, Karnal (Haryana)

________________________________________________________________________________________
Gene expression analysis is becoming increasingly important in many fields of biological
research, including livestock research. Understanding the pattern of expressed genes is critical for
gaining insight into complex regulatory networks and for identifying genes relevant to new
biological processes. Developments in bioinformatics and molecular biology have added
several tools to the arsenal of molecular biologists for studying gene expression and discovering
novel genes. The techniques for evaluating gene expression have progressed from methods
developed for the analysis of single, specific genes, such as Northern blotting, to techniques
aimed at identifying all genes that differ in expression between or among experimental
samples, such as subtractive hybridization, expressed sequence tags (ESTs), serial analysis of gene
expression (SAGE) and microarrays.
High throughput gene expression profiling has emerged over the last decade as one of the most
important and powerful approaches in livestock genomic research. Progress in this area
has largely been driven by microarray technology, in which the mRNA expression level of
potentially every gene in the genome can be assessed simultaneously in a particular tissue or cell
type. The rapidly increasing popularity of this technology for dissecting the transcriptome of
livestock species is evidenced by the number of publications in recent years. Microarray technology
has revolutionized the study of gene expression and has greatly increased the rate of data
acquisition in the analysis of transcript regulation in complex eukaryotic genomes, enabling large
numbers of genes, on the order of tens of thousands, to be evaluated simultaneously. The
objective of a microarray experiment might be to identify genes that are differentially up- or
down-regulated between cells of a control group and cells that have undergone some treatment,
between cells of animals of different genetic background (e.g. control mice compared to knockout
mice), between cells in healthy and diseased tissues, or between cells at different time
points (e.g. in developmental biology). Because the expression pattern of a gene is tied to its biological
role, microarray studies of global gene expression can provide detailed insights into the regulation
of specific sets of genes linked by function. In the past decade or so, there has been rapid progress
in the development of new methods to quantify gene expression at the genome-wide level.
Expression microarrays and RNA-seq are currently the two most widely used genome-wide gene
expression quantification methods. High throughput techniques such as DNA microarrays have
proved to be revolutionary tools for linking whole-genome expression to whole-organism
function, because they allow the expression of vast numbers of genes to be studied under a range of
experimental conditions.
Numerous studies have addressed the critical issues of microarray experimental
design, data analysis, and the application of microarray technology to investigate normal physiology
and disease pathogenesis. The method is based on preferential complementary
base pairing, known as hybridization, and produces its signal by parallel hybridization of labeled
targets to specific probes that have been immobilized on a solid surface in an ordered array. The
core principle behind microarrays is hybridization between two nucleic acid strands, that is, the
property of complementary nucleic acid sequences to pair specifically with each other by forming
hydrogen bonds between complementary nucleotide bases. Thus, DNA microarrays are an orderly array
of probe DNA immobilized onto a substrate, normally a coated glass microscope slide, in
a precise, known pattern. Each probe, which is tethered onto the array, corresponds to either a
complete transcript or part of a transcribed sequence, and the target is a labeled pool of cDNA that is
complementary to the sample mRNA. Two of the major requirements of any good microarray platform are
system reproducibility, which provides the means for high confidence experiments and accurate
comparison across multiple samples, and high sensitivity, for the detection of significant gene
expression changes, including small fold changes across multiple gene sets. All components of the
microarray workflow (such as probe design, printing process, RNA sample quality, labeling,
microarray processing, scanning of the images and feature extraction algorithms) can affect the
quality of the data acquired.
Types of microarray platforms
There are two principal DNA microarray methods, distinguished by the nature of the arrayed
DNA (cDNA or oligonucleotide microarrays) and the method of depositing it (mechanical
microspotting or photolithography). The number of genes represented on an array can range
from a small number of specific, well-characterized genes to thousands of genes that may
cover an entire genome. For certain model organisms, including Arabidopsis, yeast, mouse and
human, both cDNA and oligonucleotide arrays are commercially available and are suited to
medical diagnostics and drug discovery applications. For many non-model organisms used in
physiological studies, custom arrays can be constructed from a number of different DNA
sources, including cDNA clones obtained from normalized libraries, ESTs, oligonucleotides,
genomic clones or genomic DNA. Obtaining this DNA material remains a costly barrier to
employing microarray technology for a large number of physiologically interesting non-model
organisms. These days, oligo arrays and whole genome arrays have superseded cDNA arrays in
terms of quality, reliability and spot uniformity, and they avoid some of the technical pitfalls of
cDNA arrays. The oligos representing transcripts/genes are physically spotted or printed onto a solid
surface. Various types of microarray platforms are commercially available for
different species. Arrays can be tissue specific (e.g. mammary or immune response gene specific) or
whole genome (representing all genes expressed in an organism). The Agilent whole genome bovine
44K chip, which carries 60 mer oligos, is one popular platform for detecting differential
expression. Bovine whole genome platforms from Affymetrix use shorter oligos (25-35 mer)
synthesized on the array using photolithographic masks. Microarray platforms from Illumina are also
available for bovine and other species. The bead chip from Illumina consists of 50 mer oligos
attached to randomly arranged beads. Generally, the cost of spotted arrays is lower than that of
Affymetrix or Illumina arrays.
Strategies to utilize gene expression microarrays
Usually, microarrays allow direct comparison of the expression patterns of all the genes
represented on an array between samples taken under two conditions or treatments. Different
fluorophores are used to label cDNA prepared from either total RNA or messenger RNA, typically
representing control and experimental conditions. Many types of fluorescent dyes are available for
microarray experiments; the most common dyes used for microarray studies are Cy3 and
Cy5. The fluorescently labeled cDNAs (the targets) are mixed and hybridized to the probes
on the array, where the labeled sequences anneal quantitatively to their complementary probe
sequences. However, the two dyes have non-linear sample labeling and hybridization
kinetics, which means that they do not provide equal sensitivity across the whole range of
transcripts in a sample. More specifically, they have differential labeling and scanning efficiencies
and also exhibit gene-specific bias. To combat this, the experiment is often repeated with the
dye labels exchanged between the samples, a design known as a dye-swap. Taking a suitable average
of the two ratios in a dye-swap pair removes dye bias,
giving more reliable results. If a dye-swap has not been performed, gene-specific dye bias cannot
easily be removed. The cause of gene-specific dye bias and its contribution to the underlying
variation have not been properly characterized; however, recent research in this area has aimed at
modeling this effect.
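The dye-swap correction described above can be illustrated with a small calculation. The following is a minimal Python sketch, not a prescribed protocol: it assumes that the dye effect adds the same bias to the log2 ratio of a spot on both arrays of the pair, so that half the difference of the two log ratios cancels it. The function names and intensity values are hypothetical.

```python
import math

def log2_ratio(cy5, cy3):
    """Log2 ratio of Cy5 (red) over Cy3 (green) intensity for one spot."""
    return math.log2(cy5 / cy3)

def dye_swap_average(forward, swapped):
    """Combine the two arrays of a dye-swap pair for one spot.

    forward: (cy5, cy3) with treatment labeled Cy5 and control labeled Cy3.
    swapped: (cy5, cy3) with control labeled Cy5 and treatment labeled Cy3.
    Returns the treatment/control log2 ratio with the assumed additive
    dye bias cancelled.
    """
    m_forward = log2_ratio(*forward)   # true log ratio + dye bias
    m_swapped = log2_ratio(*swapped)   # -(true log ratio) + dye bias
    return (m_forward - m_swapped) / 2.0

# Hypothetical spot: treatment is twice the control, and Cy5 reads 1.5x too bright.
forward = (3000.0, 1000.0)   # treatment on Cy5, control on Cy3
swapped = (1500.0, 2000.0)   # control on Cy5, treatment on Cy3
print(dye_swap_average(forward, swapped))  # close to 1.0, i.e. a two-fold change
```

Without the swapped array, the forward array alone would suggest a three-fold change for this spot, which shows why the text recommends the dye-swap design.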
Recently, because of the availability of high quality microarrays and robust workflows, several
groups have started using one colour (intensity based) microarrays, which are much simpler to
perform than the traditionally more common two colour (ratio based) microarrays. In one colour
intensity based microarrays, researchers simply hybridize each sample on its own microarray.
A one colour microarray therefore makes it possible to compare the measured gene expression
output of one microarray directly with that of other microarrays to generate new and multiple
ratio-metric measurements. This differs from the two colour approach, where each gene
expression ratio is generated only from the two samples compared on the same microarray. For one
colour microarray experiments, Cy3 is mostly chosen because it is less susceptible than Cy5 to
degradation by environmental factors such as ozone, pH and organic solvents. In
general, between- and within-slide replication, as well as well-characterized control genes,
are used to ensure accuracy. Automated processes calculate a relative measure of gene expression
between the two samples for each of the probes present on the array. The overall
expression pattern of all genes collectively is known as an expression profile, in which genes that
are up-regulated or down-regulated can easily be identified. Detailed descriptions of DNA
microarray protocols are available from several web-based sources
(e.g. http://www.gene-chips.com; http://cmgm.stanford.edu/pbrown/mguide/index.html).
Analysis of microarray gene expression data
With the generation of large amounts of microarray data, it has become increasingly important to
address the challenges of data quality and standardization related to this technology. The major
concern of data quality control (QC) is to detect problematic raw probe-level data (for example, an
array with spatial artefacts or with poor RNA quality) and so facilitate the decision of whether to
remove that array from further analysis. Data QC is followed by two other pre-processing steps. The
first is data normalization, a fundamental step which aims at removing systematic bias and noise
caused by technical and experimental artefacts. The second is data
filtering, which discards the probe sets with very low expression across the samples (which
provide no biological information) in order to reduce noise in the data and to avoid wrong
interpretations of the final results. One of the most common normalization procedures is to choose a
gene set consisting of genes whose expression levels should not change under the
conditions studied (housekeeping genes); that is, the expression ratio for all genes in the gene set is
expected to be 1. From that set, a normalization factor, which is a number that accounts for the
variability seen in the gene set, is calculated. It is then applied to the other genes in the microarray
experiment. The normalization procedure is carried out only on the background-corrected values
for each spot.
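The housekeeping-gene normalization described above can be sketched in a few lines of Python. This is an illustrative example under stated assumptions rather than the procedure of any particular software: the normalization factor is taken here as the median log2 ratio of the housekeeping set (which should be 0, i.e. a ratio of 1), background is removed by simple subtraction with a floor, and all gene labels and values are hypothetical.

```python
import math
from statistics import median

def corrected_log_ratio(fg_test, bg_test, fg_ref, bg_ref, floor=1.0):
    """Log2 ratio of background-corrected test over reference intensity.

    The floor (an assumption of this sketch) avoids zero or negative
    values after background subtraction.
    """
    test = max(fg_test - bg_test, floor)
    ref = max(fg_ref - bg_ref, floor)
    return math.log2(test / ref)

def normalization_factor(log_ratios, housekeeping):
    """Median log2 ratio of the housekeeping genes (expected value: 0)."""
    return median(log_ratios[gene] for gene in housekeeping)

def normalize(log_ratios, housekeeping):
    """Subtract the housekeeping-derived factor from every gene on the array."""
    factor = normalization_factor(log_ratios, housekeeping)
    return {gene: value - factor for gene, value in log_ratios.items()}

# Hypothetical background-corrected log2 ratios from one array.
log_ratios = {"HK1": 0.4, "HK2": 0.5, "HK3": 0.6, "GENE_A": 1.5, "GENE_B": -0.5}
normalized = normalize(log_ratios, ["HK1", "HK2", "HK3"])
print(normalized["GENE_A"], normalized["GENE_B"])
```

In this example the housekeeping genes show a systematic shift of 0.5 log2 units, so every gene is shifted down by that amount before any comparison is made.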
Computational data analysis tasks such as data mining, which includes classification and
clustering, are used to extract useful knowledge from microarray data. In addition, relating gene
expression data to other biological information can lead to biological discoveries through, for
example, transcription factor binding site analysis, pathway analysis and protein-protein interaction
network analysis. Identification of differential gene expression is the first task of an in-depth
microarray analysis. Differentially expressed genes are genes whose expression levels are
significantly different between two groups of experiments. There are two common methods for in
depth microarray data analysis, i.e. clustering and classification. Clustering is an
unsupervised approach that groups genes or samples with similar expression patterns that
are characteristic of the group. Classification, by contrast, is a process of learning from examples:
given a set of pre-classified examples, the classifier learns to assign an unseen test case to one of the
classes. Clustering is currently the most popular method for analysing the gene expression data
matrix. It is used for finding co-regulated and functionally related groups of genes. There are three
common types of clustering methods: hierarchical clustering, k-means clustering and
self-organizing maps. Classification is also known as class prediction, discriminant analysis or
supervised learning.
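As an illustration of the unsupervised approach, the following minimal Python sketch groups gene expression profiles by k-means clustering. It is a simplified version written for this text: the starting centroids are simply the first k genes (real analyses usually use random or repeated starts), squared Euclidean distance is assumed as the similarity measure, and the gene labels and expression values are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_profile(profiles):
    """Element-wise mean of a list of equally long profiles."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]

def k_means(profiles, k, iterations=20):
    """Cluster genes by expression profile.

    profiles: dict mapping gene -> list of expression values across samples.
    Returns a dict mapping gene -> cluster index (0 .. k-1).
    """
    genes = list(profiles)
    centroids = [list(profiles[g]) for g in genes[:k]]  # naive deterministic start
    assignment = {}
    for _ in range(iterations):
        # Assign each gene to its nearest centroid.
        new_assignment = {
            g: min(range(k), key=lambda c: squared_distance(profiles[g], centroids[c]))
            for g in genes
        }
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Recompute each centroid as the mean of its member profiles.
        for c in range(k):
            members = [profiles[g] for g in genes if assignment[g] == c]
            if members:
                centroids[c] = mean_profile(members)
    return assignment

# Hypothetical log expression values over four time points.
profiles = {
    "up1": [1.0, 2.0, 3.0, 4.0],
    "down1": [4.0, 3.0, 2.0, 1.0],
    "up2": [1.1, 2.1, 2.9, 4.2],
    "down2": [3.9, 3.1, 2.0, 0.8],
}
clusters = k_means(profiles, k=2)
print(clusters)
```

Genes with rising profiles end up in one cluster and genes with falling profiles in the other, which is the sense in which clustering finds potentially co-regulated groups.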
In the early days, a simple fold change approach was used to find differences, under the
assumption that changes above some threshold (for example, two-fold) were biologically
significant. Several univariate statistical methods were later used to test for differences in the
expression or relative expression of a gene from normalized microarray data, including two-sample
t-tests, the modified t-test known as SAM (significance analysis of microarrays), the F-statistic and
Bayesian models. For more complex datasets
with multiple classes, Analysis of Variance (ANOVA) techniques were also used. Because of the large
number of genes represented on a microarray, testing each gene separately may lead to a large number
of false positive calls (Type I errors). This problem is addressed by the concept of the False
Discovery Rate (FDR), the expected proportion of false positives among the genes declared
differentially expressed. Factors determining the FDR are the proportion of truly differentially
expressed genes, the distribution of the true differences, measurement variability and sample size.
Benjamini and Hochberg
described a procedure to control the FDR under the assumption that the test statistics arising from
the true null hypotheses are independent. The number of false discoveries must be small relative to
the number of real differences that one finds, which in turn depends on the size of the differences
and the variability of the measured expression values.
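The Benjamini and Hochberg procedure mentioned above can be written compactly. The Python sketch below computes adjusted p-values by the standard step-up rule (each sorted p-value is multiplied by the number of tests divided by its rank, and monotonicity is enforced from the largest p-value downwards). The p-values and gene labels are hypothetical, and the per-gene tests that would produce them are not shown.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping the running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def significant_genes(p_by_gene, fdr=0.05):
    """Genes whose adjusted p-value is at or below the chosen FDR level."""
    genes = list(p_by_gene)
    adjusted = benjamini_hochberg([p_by_gene[g] for g in genes])
    return [g for g, q in zip(genes, adjusted) if q <= fdr]

# Hypothetical raw p-values from per-gene tests.
p_by_gene = {"GENE_A": 0.01, "GENE_B": 0.04, "GENE_C": 0.03, "GENE_D": 0.5}
print(benjamini_hochberg(list(p_by_gene.values())))
print(significant_genes(p_by_gene, fdr=0.05))
```

In this small example three genes have raw p-values below 0.05, but only one remains significant after adjustment, which illustrates how FDR control reduces false positive calls.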
Classification, clustering and identification of differentially expressed genes can be considered
the basic microarray data analysis tasks that use gene expression profiles alone. However, gene
expression profiles can be linked to external resources to make new discoveries. The
identification of functional elements such as transcription factor binding sites (TFBS) at the
whole-genome level is one of the challenging tasks. Transcription factors play a prominent role in
transcription regulation, so identifying their binding sites is central to annotating genomic
regulatory regions and understanding gene regulatory networks. Protein-protein interactions (PPI)
are also useful for investigating the cellular functions of genes, as they form the core of the entire
interactome of any living cell. Several databases have been developed to store protein
interactions, such as the Biomolecular Interaction Network Database (BIND), the Database of
Interacting Proteins (DIP), IntAct, STRING and the Molecular INTeraction database (MINT). By
combining co-expressed and interacting genes in the same cluster, several meaningful predictions
about gene functions, evolutionary relationships and pathways can be made.
Another promising method for analysing microarray data is pathway analysis, as it takes into account
the cascade of network interactions. Analysing microarray data from a pathway perspective can
lead to a higher level of understanding of the system. It integrates the normalized array data with
their annotations, such as metabolic pathways, gene ontology and functional classifications.
Metabolic pathway analysis can identify more subtle changes in expression than the gene lists that
result from univariate statistical analysis. Gene Set Enrichment Analysis (GSEA) is a computational
method that determines whether a set of genes shows statistically significant and concordant
differences between two biological states. The gene sets are defined on the basis of prior biological
knowledge, e.g. published information about biochemical pathways, location in the same
cytogenetic band, membership of the same Gene Ontology category, or any user-defined set. The genes
are first ranked by their differential expression between the two states; the goal of
GSEA is to determine whether members of a gene set tend to occur toward the top (or bottom) of
this ranked list, in which case the gene set is correlated with the phenotypic class distinction.
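The idea of testing whether a gene set clusters toward the top or bottom of a ranked list can be shown with a simplified running-sum statistic. The Python sketch below is an unweighted, Kolmogorov-Smirnov-like enrichment score written for illustration only: it is not the GSEA software, it omits the weighting by expression correlation and the permutation test used to assess significance, and the gene names are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    ranked_genes: genes ordered from most up-regulated to most down-regulated.
    gene_set: set of genes of interest (e.g. one pathway).
    Walking down the list, the running sum rises at each member of the set
    and falls at each non-member; the score is the largest deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]    # hypothetical ranked list
print(enrichment_score(ranked, {"g1", "g2"}))    # set at the top: positive score
print(enrichment_score(ranked, {"g5", "g6"}))    # set at the bottom: negative score
```

A score near +1 or -1 indicates that the set members are concentrated at one end of the list, while a set scattered evenly through the list gives a score near zero.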
Representative microarray based transcriptomic research in livestock


With recent progress in genome sequencing of different livestock species, the availability
of species specific microarray platforms has enabled researchers to use this powerful
technology to discover genes and address a variety of questions relating to normal physiological
processes, such as cell differentiation, pregnancy, lactation and parturition, in different livestock
species. Results from global transcriptomics studies have started to reveal critical
aspects of bovine health and normal physiology.
Several microarray based attempts have been made to study host-pathogen interactions in
animal species in order to better understand immune function and the regulation of genes
controlling immunity traits (Jiang et al., 2008). Substantial progress has also been made in
understanding the physiology and tissue (mammary gland, liver) genomic responses of high producing
Holstein Friesian cattle during the stressful periparturient stage, and during infectious diseases
like mastitis and metabolic disorders like ketosis (Loor et al., 2007; Moyes et al., 2010). Using a
bovine microarray chip, Loor et al. (2007) highlighted the changes in key metabolic and signaling
network signatures during nutrition induced ketosis and liver lipidosis in periparturient dairy cows.
In their study, several genes playing key roles in hepatic metabolic adaptations to negative energy
balance and the changing physiological state near the time of parturition were identified.
Some insights into bovine muscle biology (beef biology) have been obtained by profiling cattle
muscle with microarrays. Byrne et al. (2005) undertook gene expression profiling of
muscle tissue in Brahman steers to understand the processes associated with remodeling of muscle
tissue in response to nutritional stress. Gene expression profiling was also conducted in different
muscle types to better understand the muscle characteristics that determine meat quality traits
across muscles and are a major source of variability in meat tenderness. Australian and Japanese
scientists undertook a microarray-based comparison of the longissimus muscle (LM) from Japanese
Black and Holstein cattle over an extended intensive feeding period to identify genes that may be
involved in determining the unique ability of Japanese Black cattle to deposit intramuscular fat with
a lower melting temperature (Wang et al., 2005). Other transcriptomic studies of bovine muscle
identified some markers of meat tenderness and gave insight into muscle growth in cattle
(Sudre et al., 2005; Reecy et al., 2006). Gene expression profiles were compared between Charolais
bulls with high and low meat quality scores for tenderness, flavour and juiciness (Bernard et al., 2007).
Microarray technology has also been used extensively to gain key insights into reproductive
biology in different livestock species. Caetano et al. (2004) identified differentially expressed genes
in ovaries and ovarian follicles of pigs selected for increased ovulation rate to seek new insights into
ovarian physiology and the quantitative genetic control of reproduction in swine. Ushizawa et al.
(2004) undertook cDNA microarray analysis of bovine embryo gene expression profiles during the
pre-implantation period to identify genes involved in embryonic development. More recently, Hayashi
et al. (2010) carried out differential genome-wide gene expression profiling of the largest and
second-largest bovine follicles to identify genes associated with the growth of dominant follicles.
With the goal of better understanding bovine mammary gland biology, Suchyta et al. (2003)
compared the gene expression profiles of lactating and non-lactating bovine mammary
tissue on a bovine microarray chip, which yielded many novel and interesting genes expressed
specifically in lactating mammary tissue. Microarray expression data generated in
different studies have provided insight into the complexity that underlies mammary
gland development and function, and into the mechanisms that ultimately allow the mammary gland
to function in a coordinated fashion throughout puberty, pregnancy, lactation and involution.
These days, initiatives such as elucidating the signaling mechanisms underlying the functional
development of the mammary gland and the regulation of milk fat/protein synthesis throughout the
lactation cycle, by generating whole genome expression patterns coupled with metabolic/hormonal
pathways, have become a high priority area of research in animal genomics that can yield a wealth of
information on as yet unknown molecular adaptations in response to the physiological stage of the
animal. Such inputs can relate the functional development of the mammary gland of dairy animals to
coordinated changes in the global expression pattern, helping to fill gaps in our understanding of
the basic biology of mammary gland development, which is far from complete.
RNA sequencing (RNA Seq)
Recent advances in high throughput sequencing technologies (Next or Second Generation
Sequencing) have introduced a new alternative to microarrays, namely RNA-seq. After years of
extensive investigations based on the characterization of genome-wide gene expression through
oligonucleotide-based array technologies, transcriptomics has gained new momentum thanks to
the advent of Next Generation Sequencing (NGS). RNA-seq quantifies gene expression by
sequencing short fragments of cDNA, aligning the sequences obtained back to the genome or
transcriptome, and counting the aligned reads for each gene.
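The final counting step of this workflow can be sketched as follows. The Python example below assumes that alignment has already been done and that each aligned read has been assigned to a gene; it then tallies reads per gene and scales the counts. The text does not name a normalization, so counts per million (CPM) and reads per kilobase per million mapped reads (RPKM, the measure used by Mortazavi et al. 2008) are shown here as assumptions, and the gene labels, read assignments and gene lengths are hypothetical.

```python
from collections import Counter

def count_reads(read_assignments):
    """Tally aligned reads per gene.

    read_assignments: iterable with one gene identifier per aligned read.
    """
    return Counter(read_assignments)

def counts_per_million(counts):
    """Scale raw counts by library size (total assigned reads)."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}

def rpkm(counts, gene_lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts.values())
    return {
        gene: n * 1e9 / (total * gene_lengths_bp[gene])
        for gene, n in counts.items()
    }

# Hypothetical output of an aligner: the gene hit by each of ten reads.
assignments = ["GENE_A"] * 6 + ["GENE_B"] * 3 + ["GENE_C"] * 1
counts = count_reads(assignments)
lengths = {"GENE_A": 2000, "GENE_B": 1000, "GENE_C": 500}  # transcript length in bp

print(counts_per_million(counts))
print(rpkm(counts, lengths))
```

Dividing by gene length matters when comparing different genes within a sample, because a longer transcript yields more fragments, and therefore more reads, at the same expression level.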
Until the advent of RNA-seq, microarrays were the standard tool for gene expression
quantification. With the development of new sequencing technologies and bioinformatic tools,
RNA-seq has emerged as an appealing alternative to classical microarrays for measuring global
gene expression (Wolf et al. 2010). RNA-seq, also called whole-transcriptome shotgun
sequencing, refers to the use of high-throughput sequencing technologies for characterizing the
RNA content and composition of a given sample. RNA-seq technology, unlike microarrays, does not
depend on prior knowledge of the reference transcriptome. Further, RNA-seq data have
very low background signal and a higher dynamic range of expression levels, and a
relatively small amount of total RNA is required for quantification, compared to microarrays.
Gene detection in RNA-seq, unlike microarrays, is not dependent on probe design; rather
it relies on the mapping of short nucleotide reads, which can attain exceedingly high resolution.
Although RNA-seq and microarrays are generally in good agreement when it comes to
relative gene expression quantification (Nookaew et al. 2012), RNA-seq has clear advantages: given
sufficient coverage, it captures a wider range of expression values. It is able to identify
transcripts that have not been previously annotated, and it can quantify both very lowly expressed
transcripts (unlike microarrays, where background noise interferes) and very highly expressed ones.
As a digital measure (count data), it scales linearly even at extreme values, whereas microarrays
show saturation of analog-type fluorescent signals (Marioni et al. 2008). RNA-seq further provides
information on RNA splice events, which are not readily detected by standard microarrays
(Mortazavi et al. 2008). However, microarray technology is still widely used because of lower costs
and wider availability. While RNA-seq will most likely take the lead role in transcriptome analysis
in the near future, one should not forget that methods for RNA-seq data collection and statistical
analysis are still under development. Several problems, related to read errors, the overwhelming
amount of ribosomal RNA (rRNA) in the data, short reads and variation of read density along the
length of the transcript, pose a challenge for this high-throughput method. Additionally, the cost of
NGS and the necessary computing and data storage facilities and bioinformatics expertise associated
with these technologies are still quite demanding compared to microarrays. Thus, microarrays should
not be dismissed by default, and it is worth considering which technology is best suited to addressing
the question at hand before engaging in a large RNA-seq experiment.
The continuous development of bioinformatics and genomic approaches for improved
annotation, combined with new data analysis tools that enable cross-species comparisons, will
greatly enhance the extraction of biological information from species specific microarrays and
RNA-seq and advance our understanding of livestock biology. From the economic point of view, the
importance and impact of genome-wide tools in modern agriculture are likely to increase in coming
years. Over the longer term, these high-throughput technologies could reshape livestock
biology through functional annotation and discovery of new genes regulating traits of economic
importance, a more complete description and understanding of cellular pathways (e.g. metabolism,
proliferation, cell-cell interaction), and a better understanding of genome-environment interactions
(e.g. developmental pathways, abiotic stress, nutritional genomics and infectious diseases). This would
further help in identifying target molecules for improvement and in selecting better
performing livestock to ensure food security and meet the challenges of an increasing global
population.
References

Bernard C et al. (2007). New indicators of beef sensory quality revealed by expression of specific genes. J Agric
Food Chem. 55: 5229-5237
Byrne KA, Wang YH, Lehnert SA, Harper GS, McWilliam SM, Bruce HL and Reverter A (2005). Gene
expression profiling of muscle tissue in Brahman steers during nutritional restriction. J. Anim. Sci. 83: 1-12
Caetano AR, Johnson RK, Ford JJ, Pomp D. (2004). Microarray profiling for differential gene expression in
ovaries and ovarian follicles of pigs selected for increased ovulation rate. Genetics, 168: 1529-1537
Hayashi KG, Ushizawa K, Hosoe M and Takahashi T. (2010). Differential genome-wide gene expression
profiling of bovine largest and second-largest follicles: identification of genes associated with growth of
dominant follicles. Reproductive Biology and Endocrinology, 8: 11
Loor JJ, Everts RE, Bionaz M, Dann HM, Morin DE, Oliveira R, Rodriguez-Zas SL, Drackley JK, and Lewin
HA (2007). Nutrition-induced ketosis alters metabolic and signaling gene networks in liver of
periparturient dairy cows. Physiol. Genomics 32: 105-116
Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y (2008). RNA-seq: an assessment of technical
reproducibility and comparison with gene expression arrays. Genome Research, 18: 1509-1517
Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008). Mapping and quantifying mammalian
transcriptomes by RNA-Seq. Nature Methods, 5: 621-628
Moyes KM, Drackley JK, Morin DE, Rodriguez-Zas SL, Everts RE, Lewin HA, and Loor JJ. (2010). Mammary
gene expression profiles during an intramammary challenge reveal potential mechanisms linking
negative energy balance with impaired immune response. Physiol Genomics, 41(2): 161-170
Nookaew I, Papini M, Pornputtpong N et al. (2012). A comprehensive comparison of RNA-Seq-based
transcriptome analysis from reads to differential gene expression and cross-comparison with
microarrays: a case study in Saccharomyces cerevisiae. Nucleic Acids Research, 40: 10084-10097
Reecy J M, Moody SD and CH Stah (2006). Gene expression profiling: Insights into skeletal muscle growth
and development. Journal of Animal Sciences, 84: E150-E154
Suchyta SP, Sipkovsky S, Halgren RG, Kruska R, Elftman M, Weber-Nielsen M, Vandehaar MJ, Xiao L,
Tempelman RJ, Coussens PM (2003). Bovine mammary gene expression profiling using a cDNA
microarray enhanced for mammary-specific transcripts. Physiol Genomics, 16: 8-18
Sudre K, Cassar-Malek I, Listrat A, Ueda Y, Loroux C, Jurie C, Auffrag C, Renand G, Martin P, and
Hocquette JF. (2005). Biochemical and transcriptomic analyses of two bovine skeletal muscles in
Charolais bulls divergently selected for muscle growth. Meat Sci. 70: 267-277
Ushizawa K, Herath CB, Kaneyama K, Shiojima S, Hirasawa A, Takahashi T, Imai K, Ochiai K, Tokunaga T,
Tsunoda Y, Tsujimoto G, Hashizume K. (2004). cDNA microarray analysis of bovine embryo gene
expression profiles during the pre-implantation period. Reprod Biol Endocrinol. 2: 77
Wang YH, Byrne KA, Reverter A, Harper GS, Taniguchi M, McWilliam SM, Mannen H, Oyama K, and
Lehnert SA. (2005). Transcriptional profiling of skeletal muscle tissue from two breeds of cattle. Mamm.
Genome, 16: 201-210
Wolf JBW, Bayer T, Haubold B et al. (2010). Nucleotide divergence versus gene expression differentiation:
comparative transcriptome sequencing in natural isolates from the carrion crow and its hybrid zone with
the hooded crow. Molecular Ecology, 19: 162-175


15

Strategies for Genotype and Phenotype Association Studies in Livestock


S P Dixit, Anurodh Sharma and Jayakumar Sivalingam
ICAR- National Bureau of Animal Genetic Resources, Karnal (Haryana)

________________________________________________________________________________________

Genetic variation
Genetic variability is a measure of the tendency of individuals to differ from the average of the
population to which they belong. Genetic variability also underlies the differential susceptibility of
organisms to diseases and their sensitivity to toxins or drugs, a fact that has driven increased
interest in personalized medicine given the rise of the human genome project and efforts to map the
extent of human genetic variation. Genetic recombination is one of the sources of variability. During
meiosis, homologous chromosomes of paternal and maternal origin pair up, cross over
at random positions and exchange DNA segments, so that the gametes passed on to the offspring
carry new combinations of alleles. This process is itself governed by sets of genes that determine
where crossovers can occur and the mechanism by which DNA segments are exchanged. Recombination
also varies in frequency and location and can therefore itself be subject to natural selection. More
recombination results in more variability, which makes it easier for a population to adapt to a
changing environment.
Genetic polymorphism
Genetic polymorphism refers to differences in DNA sequence among individuals, groups, or
populations, and can be caused by mutations ranging from a single nucleotide base change to
variations in several hundred bases. The progress of molecular genetic technology during the last two
decades has generated many advances, including the discovery of DNA-based markers, which
have contributed immensely to the development of gene mapping. This makes it possible to identify
genes that control part of the variability of phenotypic traits. Broadly, two experimental
strategies have been developed for this purpose: linkage studies and the candidate gene approach.
Linkage studies rely on knowledge of the genetic map, searching for quantitative trait loci (QTL) by
using family information and comparing the segregation patterns of genetic markers and the traits
being analyzed. Markers that tend to co-segregate with the analyzed trait provide the approximate
chromosomal location of the underlying genes.
SNP genotyping methods
Many techniques are available for SNP genotyping. One key feature of most SNP genotyping
techniques, apart from those based on direct hybridization, is that they proceed in two steps: 1)
generation of allele-specific molecular reaction products; 2) separation and detection of the
allele-specific products for their identification.
Direct Hybridization Techniques:
i. Dot blot
ii. Reverse dot blot technique
Techniques Involving Generation and Separation of an Allele-Specific Product:
i. Restriction fragment length polymorphism (RFLP)
ii. Single strand conformation polymorphism (SSCP)
iii. Primer extension
iv. Oligonucleotide ligation assay (OLA)
v. Invasive cleavage of oligonucleotide probes (Invader assay)
vi. Pyrosequencing
vii. Array-based high throughput genotyping

Table: Illumina BeadChips developed for important domestic animals

Species  BeadChip name    No. of SNPs    No. of mapped SNPs   Average interval   Average MAF across  Release status
                          (approximate)  (assembly)           between SNPs (kb)  tested populations
Chicken  Multiple chips*  -              -                    -                  -                   Open with restriction
Dog      CanineSNP20      22,362         22,000 (CanFam2.0)   125                0.27                Commercially available
Dog      CanineHD         170,000        170,000 (CanFam2.0)  14.3               0.23                Commercially available
Cattle   BovineSNP50      54,001         52,255 (Btau4.0)     51.5               0.25                Commercially available
Cattle   BovineHD         777,962        >90% (Btau)          <3                 >0.05               Commercially available
Cattle   Bovine3K**       3,000          3,000 (Btau4.0)      -                  -                   Commercially available
Horse    EquineSNP50      54,602         54,602 (EquCab2.0)   43.2               0.21                Commercially available
Pig      PorcineSNP60     64,232         55,446 (Sscrofa9)    40.7               0.27                Commercially available
Sheep    OvineSNP50       54,241         -                    46                 ~0.3                Commercially available

- = not given.
* Multiple chips were produced, including a 60K SNP array.
** Selected from SNPs on the BovineSNP50, with potential use for selecting breeding cattle prior to purchase in the
dairy industry.

Principles and methods of association studies utilizing SNPs


Any two individual genomes differ in millions of ways. There are small variations in individual
nucleotides of the genome (SNPs) as well as many larger variations such as deletions,
insertions and copy number variations. Any of these may cause alterations in an individual's traits,
or phenotype, which can be anything from production performance and disease risk to physical
features. The variability within genes coding for protein products involved in key physiological
mechanisms and metabolic pathways that directly or indirectly determine an economic
trait (e.g. feed efficiency, muscle mass accretion, reproduction efficiency, disease resistance, etc.)
may explain a fraction of the genetic variability of the production trait itself.
There are two main approaches, one based on candidate genes (CG) and the other based on
testing the entire genome (genome-wide association, GWA). Each approach has its own combination
of benefits and drawbacks. Mutations identified in candidate genes can be analyzed in
association studies. The association may be demonstrated at the level of the chromosome (haplotype)
or at the level of the individual (genotype). The study of candidate gene associations is a step towards
understanding the genetic basis of productive traits and, compared to other genomic approaches
(QTL detection), is potentially more easily and efficiently implemented in breeding programs. Based
on this knowledge, new strategies of gene-assisted selection or introgression could be designed to
modify production traits. Broadly, CG studies tend to have rather high statistical power but are
incapable of discovering new genes or gene combinations, while GWA studies can pinpoint genes
regardless of whether their function was known before but have low power owing to the number of
independent tests performed. Association analysis is a common method for dissecting complex
traits in humans. Similarly, SNP markers in livestock allow association analysis of
economically important traits. Advances in the field of polymorphism studies have led to scaling up
from a few SNPs to genome-wide SNPs. In most studies of associations between
genotype and production traits, the results are highly dependent on the breed, the animal population,
and even the herd. The question now is which methods and models are appropriate to use on these
data.
Binary traits: A complex binary trait is a character that has a dichotomous expression but a
polygenic genetic background. Traits such as disease come under this category. Association
between genotype at a particular locus and disease can arise in three ways: 1) the locus
may be causally related to the disease, with different alleles carrying different risks; 2) the locus may
not itself be causal but may be sufficiently close to a causal locus to be in linkage disequilibrium with
it; 3) the association may be due to confounding by population stratification or admixture. If the
association is due to confounding, it is of little interest and should be excluded. Association for
disease traits can be measured in the form of relative risks or odds ratios.
Genotype    Disease: Yes    Disease: No
AA          $X_{AA}$        $1 - X_{AA}$
Aa          $X_{Aa}$        $1 - X_{Aa}$
aa          $X_{aa}$        $1 - X_{aa}$

If there are three genotypes AA, Aa and aa with penetrances as shown in the table above, and
allele a is the more common form, then the aa genotype is taken as the reference. The relative risks
are calculated as $RR_{AA} = X_{AA} / X_{aa}$ and $RR_{Aa} = X_{Aa} / X_{aa}$. The odds ratios are
calculated as $X^{*}_{AA} = \frac{X_{AA}(1 - X_{aa})}{(1 - X_{AA}) X_{aa}}$ and
$X^{*}_{Aa} = \frac{X_{Aa}(1 - X_{aa})}{(1 - X_{Aa}) X_{aa}}$.
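The two measures can be illustrated with a short calculation. This is a minimal sketch: the penetrance values are hypothetical and the function names are chosen only for this example.

```python
def relative_risk(x_genotype: float, x_reference: float) -> float:
    """Relative risk of a genotype against the reference genotype (aa)."""
    return x_genotype / x_reference


def odds_ratio(x_genotype: float, x_reference: float) -> float:
    """Odds ratio of a genotype against the reference genotype (aa)."""
    return (x_genotype * (1.0 - x_reference)) / ((1.0 - x_genotype) * x_reference)


# Hypothetical penetrances: probability of disease given the genotype.
penetrance = {"AA": 0.30, "Aa": 0.20, "aa": 0.10}

for genotype in ("AA", "Aa"):
    rr = relative_risk(penetrance[genotype], penetrance["aa"])
    oratio = odds_ratio(penetrance[genotype], penetrance["aa"])
    print(f"{genotype}: relative risk = {rr:.2f}, odds ratio = {oratio:.2f}")
```

With these assumed penetrances the odds ratio is larger than the relative risk for both genotypes, which is expected whenever the disease is not rare.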
The test of association is done using the log likelihood ratio test or the score test. Both
statistics are asymptotically distributed as chi-squared with two degrees of freedom, and both can be
expressed as simple functions of the observed frequencies and the corresponding expected
frequencies. The chi-squared test can be done under the null assumption that all three genotypes have
an equal rate of disease, against the alternative that each genotype has a different disease rate. A
2 x 2 (allele) or 3 x 2 (genotype) table is analyzed. Mixed models are used if fixed effects are thought
to affect the incidence of the condition.
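A 3 x 2 genotype table can be tested as sketched below. The example uses the Pearson chi-squared statistic built from observed and expected counts; the counts are hypothetical, and the closed-form tail probability used here holds only for two degrees of freedom.

```python
import math


def chi_square_genotype_test(counts):
    """Pearson chi-squared test on a genotype-by-disease table.

    counts maps genotype -> (affected, unaffected). With three genotypes and
    two disease classes the statistic has (3 - 1) * (2 - 1) = 2 degrees of freedom.
    """
    row_totals = {g: a + u for g, (a, u) in counts.items()}
    total_affected = sum(a for a, _ in counts.values())
    total_unaffected = sum(u for _, u in counts.values())
    n = total_affected + total_unaffected

    chi2 = 0.0
    for g, (a, u) in counts.items():
        # Expected counts if every genotype had the same disease rate.
        exp_a = row_totals[g] * total_affected / n
        exp_u = row_totals[g] * total_unaffected / n
        chi2 += (a - exp_a) ** 2 / exp_a + (u - exp_u) ** 2 / exp_u

    df = (len(counts) - 1) * (2 - 1)
    # For exactly 2 degrees of freedom the upper tail probability is exp(-x / 2).
    p_value = math.exp(-chi2 / 2.0) if df == 2 else None
    return chi2, df, p_value


# Hypothetical counts: genotype -> (diseased, healthy).
observed = {"AA": (30, 70), "Aa": (40, 160), "aa": (30, 270)}
chi2, df, p_value = chi_square_genotype_test(observed)
print(f"chi-squared = {chi2:.2f} on {df} df, p = {p_value:.2e}")
```

A mixed model, as mentioned above, would be needed once other fixed effects have to be accounted for; this sketch covers only the plain contingency table.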
Continuous traits: For association studies, the traits of interest can be analyzed using the
General Linear Model (GLM) procedure of software packages like SAS, SPSS or SYSTAT, and
the least squares means of the genotypes can be compared by the Tukey test. The linear model used
to fit the quantitative variables can include, in addition to the genotype effect, other factors which
affect the trait. A simple model involving the genotype effect can be represented as:
$Y_{ijk} = A + G_i + C_j + e_{ijk}$
where $Y_{ijk}$ = production trait, $A$ = overall mean, $G_i$ = fixed effect of the ith
genotype, $C_j$ = any other fixed effect and $e_{ijk}$ = random error. To exclude genotypes with a
small number of animals from the analysis and to avoid confounding between genetic groups and
genotype effects on the traits of interest, genotypes with very low frequency in the total animal sample,
or genetic groups showing a single genotype, are not analyzed. The $G_i$ estimates will contain the
additive effects of each of the two gene regions constituting the genotype, the general and specific
dominance interactions between the two gene regions, and the average epistatic effects between
each of the two gene regions and the remaining genotype of each of the individuals within each $G_i$
group. Genotype can be fitted as a random effect when the aim is to identify the best genotype for
selection and to obtain Best Linear Unbiased Predictions (BLUPs). The most common use of a
mixed model to test the association between a genetic marker and a phenotype is to fit the marker
as a fixed effect and a polygenic component as a random effect. The random effect is the
individual taxon (strains, inbred lines or varieties). A likelihood ratio test against the chi-square
distribution (when using maximum likelihood), or the Wald test against either the chi-square or
normal distribution (when using restricted maximum likelihood, REML), is performed to assess the
significance of the effect of a polymorphic marker.
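The fixed-effect model for a continuous trait can be sketched in its simplest form, with the genotype effect only (the other fixed effect $C_j$ is left out here as a simplifying assumption). In that one-way case the least squares means are the plain genotype means, and the genotype effect is tested with an F statistic. The records below are hypothetical.

```python
from statistics import mean


def one_way_genotype_anova(records):
    """records: list of (genotype, trait value) pairs.

    Returns the genotype means, the F statistic for the genotype effect
    and its (numerator, denominator) degrees of freedom.
    """
    groups = {}
    for genotype, value in records:
        groups.setdefault(genotype, []).append(value)

    overall = mean(value for _, value in records)
    means = {g: mean(values) for g, values in groups.items()}

    # Sum of squares between genotypes and within genotypes.
    ss_between = sum(len(v) * (means[g] - overall) ** 2 for g, v in groups.items())
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)

    df_between = len(groups) - 1
    df_within = len(records) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return means, f_stat, (df_between, df_within)


# Hypothetical trait records for three genotypes.
data = [("AA", 10), ("AA", 12), ("AA", 14),
        ("Aa", 13), ("Aa", 15), ("Aa", 17),
        ("aa", 16), ("aa", 18), ("aa", 20)]
means, f_stat, dfs = one_way_genotype_anova(data)
print(means, f"F = {f_stat:.2f} on {dfs} df")
```

The F statistic would then be compared with the F distribution on the reported degrees of freedom; packages such as those named above do this step, together with the Tukey comparison of means.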
Genome Wide Association studies (GWAS): GWAS involves correlating allele frequencies at each
of several hundred thousand markers spaced throughout the genome with trait variation in a
population-based sample. GWAS is based on the premise that a causal variant is located on a
haplotype, and therefore a marker allele in Linkage Disequilibrium with the causal variant should
show an association with a trait of interest. One of the advantages of the GWAS approach is that it
is unbiased with respect to genomic structure and previous knowledge of the trait etiology, in
contrast to candidate gene studies, where knowledge of the trait is used to identify candidate loci
contributing to the trait of interest. Therefore, GWAS results hold the promise to reveal causal
genes not previously suspected in disease etiology and to estimate relatively complete genetic
effects (additive and non-additive) and pleiotropy in an unbiased way.
In a GWAS, allele frequencies at thousands if not millions of loci are compared in individuals of
varying phenotype. Defining the phenotype is an important consideration because phenotypic
heterogeneity can reduce power. Other complexities, including data quality per individual and per
SNP, batch effects, relatedness among samples and genetic outliers, must be accounted for
to avoid systematic bias. GWAS analysis tests each SNP for association with a qualitative or
quantitative trait value in hundreds to tens of thousands of individuals. For quantitative traits,
linear regression or Spearman's rank correlation is used to test each SNP for association between
trait values and genotype. For categorical traits (e.g., case-control status or phenotypic extremes),
chi-square or contingency table-based tests can be used in addition to logistic regression tests. The
statistical power of a GWAS is a function of sample size, effect size, causal allele frequency, and
marker allele frequency and its correlation with the causal variant. Population stratification must be
addressed in these analyses. Stratified analysis (e.g., using a Cochran-Mantel-Haenszel test) and
population structure covariates (e.g., inferred population assignments or principal component
analysis (PCA) eigenvectors) are approaches for dealing with cryptic population structure. Various
strategies exist for testing associations between markers and traits. The most common methods of
association analysis involve fitting one marker at a time. An iterative, stepwise regression proceeds
by fitting the marker with the strongest association first, then retesting the remaining markers for
significance. Additional markers are added in a similar fashion until a stopping criterion is
met. A different strategy is to fit all the markers simultaneously as random effects. The distribution
of the marker effects can then be modeled in a Bayesian framework. EMMA/R, TASSEL,
ASReml, SAS PROC MIXED and WOMBAT are some of the software packages suitable for
association analysis.
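The one-marker-at-a-time strategy for a quantitative trait can be sketched as a simple regression of the trait on allele count at each SNP. Several things here are assumptions made for illustration: genotypes are coded as 0, 1 or 2 copies of one allele, the data are hypothetical, population structure is ignored, and a Bonferroni per-test threshold is used as one common way of allowing for the number of tests.

```python
import math


def single_marker_regression(allele_counts, trait):
    """Regress a trait on allele count at one SNP; return (slope, t statistic)."""
    n = len(trait)
    mean_x = sum(allele_counts) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in allele_counts)
    if sxx == 0.0:
        return None  # monomorphic marker: no test is possible
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(allele_counts, trait))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(allele_counts, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else math.inf
    return slope, t_stat


def scan(genotypes, trait, alpha=0.05):
    """Test every SNP separately and return the results with a Bonferroni threshold."""
    results = {snp: single_marker_regression(x, trait) for snp, x in genotypes.items()}
    tested = sum(1 for r in results.values() if r is not None)
    return results, alpha / max(tested, 1)


# Hypothetical data: six animals, three SNPs.
trait = [10.0, 11.0, 12.0, 13.0, 14.5, 15.5]
genotypes = {
    "snp1": [0, 0, 1, 1, 2, 2],
    "snp2": [0, 1, 2, 0, 1, 2],
    "snp3": [1, 1, 1, 1, 1, 1],
}
results, threshold = scan(genotypes, trait)
print(results, threshold)
```

Each t statistic would be converted to a p-value on n - 2 degrees of freedom and compared with the per-test threshold. The packages listed above additionally fit polygenic or structure terms, which this sketch does not attempt.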
Genomic breeding value and genomic selection
Genomic selection is the ultimate application of markers in animal breeding. Genomic evaluation
has been developed to predict breeding values using dense marker maps. The introduction of
high-throughput single nucleotide polymorphism (SNP) genotyping methods has cleared the way for
implementation of genomic selection. Several studies have shown that genomic selection is
significantly more accurate than traditional selection of young animals, especially for
low-heritability traits. This has led to a great need for flexible and efficient software for
genomic evaluation in livestock. Methods commonly used to estimate genomic breeding values
(GEBV) are best linear unbiased prediction from mixed model analysis using a genomically
estimated relationship matrix (G-BLUP), random regression BLUP (R-BLUP) and different
non-linear methods. For most of the economically important traits in livestock, linear
models were shown to be as accurate as non-linear methods, or even more accurate. Only for traits
that are lowly heritable and controlled by a few large QTL were the non-linear methods more accurate.
The model to predict GEBVs, considering only additive genetic effects, is described as:
$y_i = \text{fixed effects} + \text{animal}_i + \sum_j \sum_k \text{SNP}_{ijk} + e_i$
where $y_i$ may be a phenotype, EBV, daughter-yield deviation, deregressed EBV or average
offspring performance of animal i; fixed effects are a set of fixed effects, which may only be the
overall mean; $\text{animal}_i$ is a polygenic effect; and $\sum_j \sum_k \text{SNP}_{ijk}$ is the sum
of both SNP effects (k = 1, 2), summed across all loci (j) for animal i. Note that a separate effect may
be estimated for each of the two alleles (k = 1, 2) or, alternatively, one allele substitution effect a may
be estimated per locus. The GEBV of animal i can be obtained as follows:
$\text{GEBV}_i = \text{animal}_i + \sum_j \sum_k \text{SNP}_{ijk}$
Independent of the model applied to estimate marker effects, the accuracy of GEBVs strongly
depends on the linkage disequilibrium between marker and QTL loci that is consistent between the
reference population and the animals for which GEBVs are predicted. The accuracy of the estimated
marker effects depends on the characteristics of the reference population, such as the number of
included phenotypes, the sampling of animals from the population and the heritability of the trait.
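Once marker effects have been estimated, obtaining a GEBV is a simple sum. The sketch below uses the alternative parameterisation mentioned above, with one allele substitution effect per locus; the coding of genotypes as 0, 1 or 2 copies of the counted allele and all numerical values are assumptions made for illustration.

```python
def gebv(polygenic_effect, allele_counts, substitution_effects):
    """GEBV_i = animal_i + sum over loci of (allele count x allele substitution effect)."""
    if len(allele_counts) != len(substitution_effects):
        raise ValueError("one allele count is needed per estimated SNP effect")
    marker_part = sum(x * a for x, a in zip(allele_counts, substitution_effects))
    return polygenic_effect + marker_part


# Hypothetical estimates for three loci and one selection candidate.
effects = [0.5, -0.2, 0.1]   # allele substitution effect per locus
candidate = [2, 1, 0]        # copies of the counted allele at each locus
print(round(gebv(0.3, candidate, effects), 3))
```

A young animal without phenotypes of its own can be evaluated this way as soon as it is genotyped, which is what makes the approach attractive for early selection.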


16

High Performance Computing for High Throughput Data Analysis


Avnish Kumar Bhatia
ICAR- National Bureau of Animal Genetic Resources, Karnal (Haryana)

________________________________________________________________________________________

High performance computing


Computers consist of a processing component to perform computations and a memory component
to store software and data. A computer with a single processor can easily perform day-to-day
tasks. However, multiple processors are required to solve problems with complex
operations. High performance computing (HPC) involves the use of supercomputers or clusters of
computers to solve computational problems that involve huge data sizes and intensive computing.
These problems are too large to be solved on a workstation because of their requirement of huge
memory and computational resources. HPC systems often derive their computational power by
exploiting parallelism, which enables a number of processors to work on a problem at the same time.
There are two general models for managing and coordinating large numbers of processors:
supercomputers and distributed computing, or clusters of computers (Korte, 2014). Supercomputers
are large, expensive systems housed in a single room where multiple processors are connected by a
fast local network. Clusters are systems in which processors connected through networks are not
necessarily located in close proximity and can even be housed at different locations.
Supercomputers have distributed or shared memory, use custom components (processor, network),
are constructed by a major vendor (for example IBM, HP), and use custom (Unix-like) operating
systems. Clusters consist of a network of workstations assembled by a vendor or by users from
off-the-shelf components, and use the Linux operating system. Supercomputers are very expensive
to buy but offer high levels of availability and scalability. Clusters are much cheaper to buy, but
expensive to own, with lower overall availability and scalability.
High Performance Computing technologies are making it possible to carry out radical biological
and medical breakthroughs by high throughput data analysis. Graphics Processing Units (GPUs)
are providing a unique opportunity to tremendously increase the effective computational capability
of commodity PCs, allowing desktop supercomputing at very low prices. Moreover, large
clusters are adopting the use of these relatively inexpensive and powerful devices as a way of
accelerating the processing of applications.
Since the emergence of supercomputers in the 1960s, computing performance has often been
measured in floating point operations per second (FLOPS). An early supercomputer, the CDC 6600,
reached a peak processing speed of 500 kilo-FLOPS in the mid-1960s. The world's fastest
supercomputer in 2013, China's Tianhe-2, operated at a peak speed of 34 peta-FLOPS (Korte, 2014).
In the November 2013 list, the two top supercomputers in the world use co-processors to improve
their performance (Perez-Sanchez et al., 2014). Tianhe-2 has more than 16,000 nodes, each composed
of two Intel Xeon E5 processors and three Intel Xeon Phi 31S1 coprocessors. Of the top ten
supercomputers, four make use of co-processors to enhance their performance. Computing power is
also analyzed in terms of power consumption: Tianhe-2 has a power consumption of 17 MW for a
total of 33 peta-FLOPS. Titan has a power consumption of 8 MW for a total of 17 peta-FLOPS.
Amazon cloud, one of the world's fastest distributed systems, achieved a speed of 1.2 peta-FLOPS
in 2013. While it doesn't compete with supercomputers like the Tianhe-2, distributed
systems can be built much more cheaply than supercomputers. A 2013 HP study found that the
hourly cost of renting a processor on a dedicated supercomputer was approximately 2-3 times as
high as on a comparable distributed cloud-based system (Korte, 2014).

Data measurements
Some data measures are: Mega: $2^{20} \approx 10^{6}$; Giga: $2^{30} \approx 10^{9}$; Tera:
$2^{40} \approx 10^{12}$; Peta: $2^{50} \approx 10^{15}$; Exa: $2^{60} \approx 10^{18}$. Some
typical data sizes are: 1,000 bytes (1 KB) is the size of an email; the size of human chromosome 1 is
250 MB; 4 GB is the size of a DVD; 1,000,000,000,000 bytes (1 terabyte) is 1/15th of the Library of
the US Congress (256 DVDs); 5 TB is the size of the primary data from an Illumina HiSeq 2000.
Main memory sizes are: personal computer: 1 GByte; top supercomputer: 10 TByte. Disk space:
single disk (2004): 200 GByte; top supercomputer: 700 TByte (Source: slides by Thomas Ludwig).
Computational performance (floating point operations per second, FLOPS): modern
processor: 3 giga-FLOPS; top supercomputer: 33 peta-FLOPS. Network performance:
personal computer: 10/100 MByte/s; supercomputer networks: gigabytes/s.
Applications in bioinformatics
As the number of sequenced genomes has increased considerably, inadequate computational power
has become a bottleneck for research in evolutionary bioinformatics. Finding all genes shared
between any two different species that descend from a single gene of the last common ancestor,
referred to as orthologs, will take more than 60 years of computation with a modern personal
computer. If a researcher aims to finish the ortholog computation in a week, it requires at least
60 x 52 = 3,120 computing nodes for a whole week without interruptions (Kim et al., 2012). Although
many research institutions now provide cluster computing services that enable users to execute
multiple computing jobs in parallel, computing resources of this size may not be available
because the scalability of a system is limited by the total hardware capacity of the hosting institution,
which is shared by a number of users. A few bioinformatics applications of high performance
computing from the literature are listed in the following subsections.
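The node estimate for the ortholog computation quoted above can be reproduced with a few lines. The sketch assumes perfect parallel scaling with no communication overhead, which is an idealisation; the function name is chosen only for this example.

```python
import math


def nodes_needed(single_machine_years, deadline_weeks, weeks_per_year=52):
    """Nodes required to finish within the deadline, assuming perfect parallel scaling."""
    total_weeks = single_machine_years * weeks_per_year
    return math.ceil(total_weeks / deadline_weeks)


print(nodes_needed(60, 1))   # 60 years of work finished in one week
print(nodes_needed(60, 4))   # the same work finished in four weeks
```

Relaxing the deadline reduces the node count proportionally, which is the trade-off a user faces when the hosting institution cannot provide thousands of nodes at once.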
Next Generation Sequencing
Next Generation Sequencing (NGS) data analysis is one of the most demanding applications in
bioinformatics (Perez-Sanchez et al., 2014). A major limitation associated with NGS data analyses is
the requirement of large data storage and High Performance Computing facilities (Kadarmideen,
2014). From procedures like alignment and variant calling to more complex challenges like
genome-wide annotation and correlation of biomarkers with diseases, NGS analyses are
time-consuming. High performance computing provides advantages in this field of genomics applied
to medicine and healthcare.
Global aligners are very fast when they use particular data representation approaches, such as the
Burrows-Wheeler transform (BWT). However, they are quite slow in reaching the optimal result
through the backtracking approach, despite the use of efficient data representations. The task
becomes more complex when local alignments are needed. GPU-based solutions are available, such
as CUSHAW, a CUDA (Compute Unified Device Architecture) compatible short read alignment
algorithm for multiple GPUs sharing a single host. It provides support for un-gapped alignment, and
its results are comparable with those of BWT-based aligners such as Bowtie and SOAP2. Another
aligner, BarraCUDA, is directly based on BWA; it delivers a high level of alignment fidelity and is
comparable to other mainstream alignment programs. It can perform alignments with gap
extensions, in order to minimize the number of false variant calls in re-sequencing studies.
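The idea behind BWT-based read matching can be shown on a toy scale. The sketch below builds the transform from sorted rotations and counts exact occurrences of a pattern by backward search. It is only an illustration: real aligners such as those named above use compressed indexes, sampled occurrence tables and inexact matching, none of which is attempted here.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (suitable for short strings only)."""
    text = text + "$"  # unique end marker that sorts before the letters used here
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_string, pattern):
    """Count exact matches of pattern in the original text by backward search."""
    # first[c]: position of the first suffix starting with c in sorted order,
    # i.e. the number of characters in the text that are smaller than c.
    first = {}
    for index, char in enumerate(sorted(bwt_string)):
        first.setdefault(char, index)

    def occ(char, end):
        """Occurrences of char in bwt_string[:end] (linear scan, kept simple)."""
        return bwt_string[:end].count(char)

    low, high = 0, len(bwt_string)  # half-open range of matching suffixes
    for char in reversed(pattern):
        if char not in first:
            return 0
        low = first[char] + occ(char, low)
        high = first[char] + occ(char, high)
        if low >= high:
            return 0
    return high - low


reference = "ACGTACGA"  # hypothetical reference sequence
index = bwt(reference)
for read in ("ACG", "CGA", "TT"):
    print(read, count_occurrences(index, read))
```

Each pattern character narrows a range over the sorted suffixes, so the cost of a lookup depends on the read length rather than on the size of the reference, which is why this representation suits short-read alignment.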
Genetic Algorithms based docking
A Genetic Algorithm (GA) has been used to find the optimal docking conformation of a ligand with
respect to a protein. All the data relative to the GA state is maintained in GPU memory,
avoiding data movement through the PCI Express bus. The GA generates the random numbers on
the CPU instead of on the GPU for two reasons: (i) it enables one-to-one comparisons of
CPU and GPU results, and (ii) it reduces the design, coding and validation effort of generating
random numbers on the GPU (Perez-Sanchez et al., 2014).
An enhanced version of the PLANTS approach for protein-ligand docking using GPUs is also
available. The GPU implementation exhibits speedup factors of up to 50x compared to an
optimized CPU-based implementation for the evaluation of interaction potentials in the context of a
rigid protein. The GPU implementation has been carried out using OpenGL to access the GPU's
pipeline and Nvidia's Cg language for implementing the shader programs. The speedup factors
observed are limited by several factors. First, only the generation of the ligand-protein
conformations and the scoring function evaluation are carried out on the GPU, whereas the
optimization algorithm is run on the CPU. This algorithmic decomposition implies time-consuming
data transfers through the PCI Express bus. The optimization algorithm used in PLANTS is the Ant
Colony Optimization (ACO) algorithm. A parallel scheme for this algorithm on a CPU cluster has
also been proposed, which uses multiple ant colonies in parallel, exchanging information
occasionally between them.
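The genetic algorithm described at the start of this subsection can be sketched on the CPU. Everything specific below is an assumption made for illustration: the scoring function is a toy stand-in for a docking score, a pose is reduced to three numbers, and the selection, crossover and mutation settings are arbitrary. The seeded generator mirrors the point made above that host-side random numbers keep runs reproducible and comparable.

```python
import random


def score(pose, target=(1.0, -2.0, 0.5)):
    """Hypothetical stand-in for a docking score: squared distance to a target pose (lower is better)."""
    return sum((p - t) ** 2 for p, t in zip(pose, target))


def run_ga(generations=50, population_size=30, seed=1):
    rng = random.Random(seed)  # seeded host-side generator keeps runs reproducible
    population = [tuple(rng.uniform(-5, 5) for _ in range(3)) for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=score)
        history.append(score(population[0]))
        next_generation = population[:2]  # elitism: keep the two best poses unchanged
        while len(next_generation) < population_size:
            # Tournament selection of two parents.
            mother = min(rng.sample(population, 3), key=score)
            father = min(rng.sample(population, 3), key=score)
            # Uniform crossover followed by a small Gaussian mutation.
            child = tuple(
                (m if rng.random() < 0.5 else f) + rng.gauss(0.0, 0.1)
                for m, f in zip(mother, father)
            )
            next_generation.append(child)
        population = next_generation
    population.sort(key=score)
    history.append(score(population[0]))
    return population[0], history


best_pose, history = run_ga()
print(best_pose, history[0], history[-1])
```

In a GPU implementation the scoring of all poses in a generation is the part that is evaluated in parallel, since each pose is scored independently of the others.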
DAIRRy-BLUP
DAIRRy-BLUP, a parallel, distributed-memory RR-BLUP implementation based on single-trait
observations (y), uses the Average Information algorithm for restricted maximum-likelihood
estimation of the variance components (De Coninck et al., 2014). DAIRRy-BLUP enables the analysis
of large-scale data sets to provide more accurate estimates of marker effects and breeding values. A
distributed-memory framework is required since the dimensionality of the problem, determined by
the number of SNP markers, becomes too large to be analyzed by a single computer. DAIRRy-BLUP
enables the analysis of very large-scale data sets of up to 1,000,000 individuals and 360,000 SNPs.
Increasing the number of phenotypic and genotypic records has a more significant effect on the
prediction accuracy than increasing the density of SNP arrays. The Gengar cluster on Stevin, the
high performance computing (HPC) infrastructure of Ghent University, has been used. It consists of
194 computing nodes (IBM HS 21 XM blades) interconnected with a 4X DDR InfiniBand network (20
Gbit/sec). Each node contains a dual-socket quad-core Intel Xeon L5420 2.5-GHz CPU (eight cores)
with 16 GB RAM. A one-to-one mapping of processes to CPU cores has been applied to achieve
high performance.
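The core linear algebra of RR-BLUP can be sketched on a single machine for a handful of markers. This is not the DAIRRy-BLUP implementation: the ridge parameter is simply supplied by the user rather than obtained from REML variance components, the matrices are small dense Python lists rather than distributed blocks, and the data are hypothetical.

```python
def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def rr_blup_marker_effects(genotypes, phenotypes, ridge):
    """Solve (Z'Z + ridge * I) u = Z'y for marker effects u, with Z and y centred."""
    n_animals = len(genotypes)
    n_markers = len(genotypes[0])
    y_mean = sum(phenotypes) / n_animals
    col_means = [sum(row[j] for row in genotypes) / n_animals for j in range(n_markers)]
    z = [[row[j] - col_means[j] for j in range(n_markers)] for row in genotypes]
    y = [value - y_mean for value in phenotypes]
    ztz = [
        [sum(z[i][j] * z[i][k] for i in range(n_animals)) + (ridge if j == k else 0.0)
         for k in range(n_markers)]
        for j in range(n_markers)
    ]
    zty = [sum(z[i][j] * y[i] for i in range(n_animals)) for j in range(n_markers)]
    return solve(ztz, zty)


# Hypothetical data: four animals, two markers coded 0/1/2.
genotypes = [[0, 0], [1, 0], [0, 1], [2, 1]]
phenotypes = [10.0, 12.0, 9.0, 13.0]
print(rr_blup_marker_effects(genotypes, phenotypes, ridge=1.0))
```

The coefficient matrix has one row and column per marker, so with hundreds of thousands of SNPs it no longer fits in the memory of one node, which is the motivation for the distributed-memory design described above.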
Computational analysis of large-scale proteome data sets
Computational analysis of shotgun proteomics data can be performed in an automated
and statistically rigorous way by the freely available MaxQuant environment. The sophisticated
algorithms and the amount of data place very high demands on computation. Parallelization
and memory optimization of the MaxQuant software, with the aim of executing it on a large
computer cluster, has been implemented (Neuhauser et al., 2013). An analysis of the bottlenecks
in overall performance found that the most time-consuming algorithms are those detecting
peptide features in the mass spectrometry (MS) data, as well as the fragment spectrum search. These
tasks scale with the number of raw files and can readily be distributed over many CPUs. The
performance of a parallelized version of MaxQuant running on a standard desktop has been
compared with an I/O performance optimized desktop computer (game computer) and a cluster
environment. The modified gaming computer and the cluster vastly outperform a standard desktop
computer when analyzing more than 1000 raw files. The resulting MaxQuant version is highly
parallelized and memory optimized. The high performance platform has been applied to investigate
the incremental coverage of the human proteome by high resolution MS data originating from
in-depth cell line and cancer tissue proteome measurements.
Close to 1000 raw files can be efficiently processed in the standard workflow in a matter of a few
days. For the future, both the power of computational hardware and the size of the data acquired
in proteomic investigations will increase. For instance, the number of MS and MS/MS scans used in
standard acquisitions could increase several fold over the next few years, just as it has over the last
several years. Countering this additional computational load, current desktop chips with 12 virtual
cores already exist. Based on the trends it is expected that the computational demands of the
standard workflow for in-depth shotgun proteomics can be comfortably handled for the
foreseeable future. But specialized tasks, such as searches in six-frame translations of large genomes,
and other extremely computing-intensive tasks may benefit from large clusters.
Analysis of SNPs Interaction in Genome Wide Association Studies
Genome-wide association studies (GWAS) lead to the systematic discovery of single
nucleotide polymorphisms (SNPs) which are associated with a given disease. Univariate analysis
approaches may miss important SNP associations that only appear through multivariate analysis in
complex diseases. However, multivariate SNP analysis is currently limited by its inherent
computational complexity. Goudey et al. (2015) present a computational framework that harnesses
supercomputers. They estimate that a three-way interaction analysis on 1.1 million SNP GWAS data
requires over 5.8 years on the full Avoca IBM Blue Gene/Q installation at the Victorian Life
Sciences Computation Initiative. This is hundreds of times faster than estimates for other CPU-based
methods and four times faster than runtimes estimated for GPU methods. It is becoming feasible to
carry out exhaustive analysis of higher order interaction studies on large modern GWAS. Near-linear
scalability of runtime with the number of threads on a parallel, distributed-memory
supercomputer allows for a reduction in analysis runtime that has not been achieved previously.
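The computational complexity mentioned above comes from the number of SNP combinations that an exhaustive analysis has to visit. A short calculation, using only the SNP count quoted in the text, makes the growth with interaction order concrete.

```python
import math


def n_interactions(n_snps, order):
    """Number of distinct SNP combinations of the given interaction order."""
    return math.comb(n_snps, order)


n_snps = 1_100_000
for order in (1, 2, 3):
    print(order, f"{n_interactions(n_snps, order):.3e}")
```

Going from pairs to triples multiplies the number of tests by a factor of several hundred thousand, which is why three-way analysis at this scale is estimated to need years even on a full supercomputer installation.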
Summary
High throughput data analysis in the field of bioinformatics and computational biology can take
advantage of improvements in high performance computing systems to overcome computational
limitations. These applications provide the opportunity to create exciting new therapeutic strategies
for more productive and healthier lifestyles that were unfeasible not so long ago.
Cloud computing services have emerged as a cost-effective alternative to locally installed
computing clusters. They provide computing resources and data storage that are virtually
without limit, are not interrupted by other users' applications or system maintenance, and are
charged by usage only (Kim et al., 2012).
References

De Coninck, A., Fostier, J., Maenhout, S. and De Baets, B. 2014. DAIRRy-BLUP: A high-performance computing
approach to genomic prediction. Genetics, 197: 813.
Goudey, B., Abedini, M., Hopper, J.L., Inouye, M., Makalic, E., Schmidt, D.F., Wagner, J., Zhou, Z., Zobel, J.
and Reumann, M. 2015. High performance computing enabling exhaustive analysis of higher order single
nucleotide polymorphism interaction in Genome Wide Association Studies. Health Information Science
and Systems, 3(Suppl 1): S3.
Kadarmideen, H.N. 2014. Genomics to systems biology in animal and veterinary sciences: progress, lessons
and opportunities. Livestock Science, 166: 232-248.
Kim, I., Jung, J.Y., DeLuca, T.F., Nelson, T.H. and Wall, D.P. 2012. Cloud computing for comparative genomics
with Windows Azure Platform. Evolutionary Bioinformatics, 8: 527.
Korte, T. 2014. Supercomputing vs. distributed computing: A government primer.
URL: http://www.datainnovation.org/2014/01/supercomputing-vs-distributed-computing-agovernment-primer/
Neuhauser, N., Nagaraj, N., McHardy, P., Zanivan, S., Scheltema, R., Cox, J. and Mann, M. 2013. High
performance computational analysis of large-scale proteome data sets to assess incremental contribution to
coverage of the human genome. Journal of Proteome Research, 12: 2858.
Perez-Sanchez, H., Cecilia, J.M. and Merelli, I. 2014. The role of high performance computing in
bioinformatics. Proceedings IWBBIO. Granada, 7-9 April, 2014.


Cryopreservation of Cauda Epididymal Spermatozoa for Conserving Caprine Genetic Biodiversity
Rajeev A K Aggarwal
ICAR- National Bureau of Animal Genetic Resources, Karnal (Haryana)

________________________________________________________________________________________

Procedure
i. Goat testes are collected from the slaughterhouse and brought to the laboratory under cold
conditions within 2-3 hours of slaughter.
ii. The cauda region of the epididymis is sliced off the testis, adhering tissue is removed, and the
cauda is washed in normal saline at room temperature.
iii. The cauda is given three to four longitudinal cuts and kept in buffer solution for about 30
minutes so that the sperm can swim out of the epididymal tubules.
iv. The buffer containing the sperm is centrifuged at a suitable speed to concentrate them.
v. The sperm are then extended in a buffer containing cryoprotectant, sugar, buffering salt,
antibiotics etc. at a suitable pH and diluted to a concentration of about 100 million/ml.
vi. The extended sperm are filled into straws and the straws are sealed.
vii. The straws are stacked in a programmable freezer, cooled to 5°C at 0.25°C/min and
then stabilized at this temperature for 30 minutes.
viii. The straws are then cooled to -20°C at 5°C/min and then to -100°C at 20°C/min.
ix. The straws are then plunged directly into liquid nitrogen.
x. After a suitable period of storage, the straws are thawed at 37°C and sperm motility is
evaluated.
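
The cooling programme in steps vii and viii (to 5°C at 0.25°C/min, a 30-minute hold, to -20°C at 5°C/min, then to -100°C at 20°C/min) can be checked with simple arithmetic before it is entered into a programmable freezer. The sketch below is only an illustration: the starting temperature of 25°C is an assumption (the procedure does not state it), and the function names are hypothetical.

```python
# Rough duration check for the straw-freezing programme.
# START_TEMP_C is an assumption; the procedure does not give a starting temperature.
START_TEMP_C = 25.0
HOLD_AT_5C_MIN = 30.0

# (label, start temperature in C, end temperature in C, rate in C per minute)
SEGMENTS = [
    ("cool to 5 C", START_TEMP_C, 5.0, 0.25),
    ("cool to -20 C", 5.0, -20.0, 5.0),
    ("cool to -100 C", -20.0, -100.0, 20.0),
]


def ramp_minutes(start_c, end_c, rate_c_per_min):
    """Minutes needed to move between two temperatures at a constant rate."""
    return abs(end_c - start_c) / rate_c_per_min


def programme_minutes(segments, hold_min):
    """Total time: all ramps plus the stabilization hold."""
    return sum(ramp_minutes(s, e, r) for _, s, e, r in segments) + hold_min


if __name__ == "__main__":
    for label, s, e, r in SEGMENTS:
        print(f"{label}: {ramp_minutes(s, e, r):.1f} min")
    print(f"total: {programme_minutes(SEGMENTS, HOLD_AT_5C_MIN):.1f} min")
```

Under these assumptions the slow first ramp dominates the run time, which is why the starting temperature matters most when planning a freezing session.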

Cytogenetic and Molecular Screening of Genetic Defects in Livestock
S. K. Niranjan and R. S. Kataria
National Bureau of Animal Genetic Resources, Karnal, Haryana, India
Cytogenetic screening
Cytogenetic screening involves several main steps: inducing the cells to divide, arresting the
cells at metaphase, treating the cells with hypotonic solution, making the spread and finally
staining the chromosomes. Nearly all chromosome banding methods rely on these steps, most
importantly on the first two. Peripheral blood is the most convenient tissue and the common source
for karyotype preparation in livestock species. Since lymphocytes are capable of division, they
are induced to undergo mitosis with a suitable mitogen and allowed to propagate in a suitable culture
medium supplemented with essential ingredients, at an appropriate incubation temperature and period.
Chromosomes are harvested using inhibitors such as colchicine or colcemid, which
bind tubulin, depolymerize the mitotic spindle and ultimately arrest cell division at
metaphase. In a metaphase spread, all the chromosomes lie in the same
plane on the slide. Spreads that do not overlap are selected and individual chromosomes are
identified.
About 8-10 ml of blood should be collected under strictly sterile conditions in heparin-coated
vacutainer tubes (green top). The Animal ID should be clearly marked on the collection
tube. A description of the samples, with Animal ID, breed, sex and age, must be provided
separately. For cytological screening, information about the fertility status of the animal should
also be provided along with the other requisites. Blood samples must reach the laboratory as
soon as possible (not later than 48 hours after collection for cytological analysis) under cooled
conditions (about 4°C).
Preparation of reagents
Media
8.1 g RPMI medium
0.8 g NaHCO3
5.0 ml antibiotic-antimycotic (Anti-Anti)
1 ml (2.5 mg/ml) lectin, phytohaemagglutinin (PHA)
1 ml (1 mg/ml) lectin (pokeweed)
1 ml (5 mg/ml) concanavalin A
500 ml autoclaved distilled water
Mix all the contents properly and filter (Nalgene Filter Units MF75 Series, SFCA
membrane, 90 mm diameter, pore size 0.45 µm)
Add 100 ml fetal bovine serum
Mix and store at -20°C
Hypotonic solution
1.667 g KCl dissolved in 300 ml distilled water; keep at 37°C (about 7 ml needed per sample).
Fixative solution:
3 methanol : 1 acetic acid (keep in freezer; about 15 ml needed per sample)
Staining solution (2% Giemsa stain)
49 ml GURR buffer
1 ml Giemsa stain
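
The hypotonic and staining recipes above can be sanity-checked numerically. The following is a minimal sketch, assuming the "300" in the hypotonic recipe means 300 ml and using the standard molar mass of KCl (74.55 g/mol); the function names are illustrative only.

```python
# Sanity checks for the hypotonic (KCl) and staining (Giemsa) solutions.
KCL_MOLAR_MASS_G_PER_MOL = 74.55  # standard value for KCl


def molarity(mass_g, molar_mass_g_per_mol, volume_ml):
    """Molar concentration (mol/l) of a solute dissolved in volume_ml."""
    return mass_g / molar_mass_g_per_mol / (volume_ml / 1000.0)


def percent_v_v(part_ml, total_ml):
    """Volume/volume percentage of one component in a mixture."""
    return 100.0 * part_ml / total_ml


if __name__ == "__main__":
    # 1.667 g KCl in an assumed 300 ml of distilled water
    print(f"KCl: {molarity(1.667, KCL_MOLAR_MASS_G_PER_MOL, 300):.4f} M")
    # 1 ml Giemsa stain in 49 ml buffer (50 ml total)
    print(f"Giemsa: {percent_v_v(1, 49 + 1):.1f} % v/v")
```

With these inputs the KCl solution works out to roughly 0.075 M, the concentration commonly used for hypotonic treatment, and the stain to 2% as stated in the recipe.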
Culture setting:
Take 5 ml of media in a 15 ml sterilized tube and add 0.7-1.0 ml whole blood. Mix the contents properly.
This step should be done under strictly sterile conditions, using laminar flow, to avoid any
contamination during culture. Incubate the culture at 37°C for 72 hours. Mix the
contents of the tube about every 12 hours. Normally the blood cells settle in the culture
media after a few hours. If the culture is contaminated, the contents generally do not settle and
turn black.
After 72 hours of culture, take the culture out of the incubator, add colchicine
(colcemid) at 28 µl per sample and incubate again at 37°C for 1 hour.
Remove the tubes and centrifuge at 2,000 rpm for 20 minutes. After centrifugation, discard the
supernatant cautiously.
Add 7 ml of hypotonic solution and mix with a glass/plastic pipette. Incubate the contents
at 37°C for 20 minutes.
Add 1 ml of chilled fixative to the contents and mix properly with a glass/plastic
pipette (the colour changes to blackish). Centrifuge the tube at 2,000 rpm for 20 minutes.
Discard the supernatant cautiously and add 5 ml of fixative solution. Mix the contents properly.
Centrifuge at 2,000 rpm for 20 minutes.
Discard the supernatant very cautiously, as the sedimented content turns almost colourless, and add
4 ml of fixative solution. Mix the contents properly. Centrifuge at 2,000 rpm for 20 minutes.
Discard the supernatant very cautiously and add 3 ml of fixative solution. Mix the contents properly.
Centrifuge at 2,000 rpm for 20 minutes.
Discard half of the supernatant, keeping 1.5 ml, and resuspend the contents for slide
preparation. The contents may also be preserved at -20°C.
Preparation of slide
Wash the glass slides and keep them in ice-cold water before making the spread.
Take about 0.5 ml of culture content into the pipette and drop 4-5 droplets onto the slide from a
height of about 1 meter. While dropping, hold the slide slightly tilted
towards the ground, so that the cells burst and the chromosomes spread
evenly.
Air-dry the slide overnight and label it after drying.
Staining and mounting
Dip the slide in 2% Giemsa staining solution for 15-20 min (the slide should be completely dry before
staining). After staining, wash the slides in running tap water and rinse with distilled water.
Dry the slide overnight in an incubator (dry completely).
Dip the dried slides in xylene for 15-20 min.
Put 2-3 drops of mountant (DPX or Eukitt quick-hardening mounting medium) on the slide.
Mount the coverslip (dipped in xylene) on the slide and fix properly. Remove air bubbles by applying
slight pressure on the coverslip. Air-dry the slide overnight.
Clean the slide using xylene.
Microscopic examination
Once stained slides are prepared, they are scanned to identify "good" chromosome spreads (i.e. the
chromosomes are not too long or too compact and are not overlapping), which are photographed.
The images of each chromosome then are cut out and pasted to a backing sheet in an orderly
manner. Alternatively, a digital image of the chromosomes can be cut and pasted using a computer.
If standard staining was used, the orderly arrangement is limited to grouping like-sized
chromosomes together in pairs, whereas if the chromosomes were banded, they can be
unambiguously paired and numbered.
Generally, several metaphases are processed because it is not uncommon for a single spread to
artifactually have extra chromosomes or be missing chromosomes. This is particularly important if
one is to diagnose an abnormality in an individual. It also allows one to diagnose cases of
mosaicism, in which an individual has multiple, cytogenetically distinct populations of cells.
One final point, the discussion above has focused on initial evaluation of an individual's
cytogenetic status. If abnormalities are found in peripheral blood, it is sometimes desirable to
determine whether that abnormality is present throughout the individual, and further studies with
tissues other than blood can be performed. Also, analysis of diseased tissues can often provide
useful information. A prime example of this is the cytogenetic evaluation of cancers, which is not
only used diagnostically, but has provided valuable understanding of the pathogenesis of certain
types of neoplasia.
Cytogenetic nomenclature
Chromosomes and chromosomal abnormalities are named as per the guidelines of the
International System for Human Cytogenetic Nomenclature (ISCN) 2009. A few examples of
cytogenetic nomenclature are as follows.
Karyotypes are presented in a standard form. First, the total number of chromosomes is given,
followed by a comma and the sex chromosome constitution. This shorthand description is followed
by coding of any autosomal abnormalities. A few (simple) examples of this format are:
A normal male cattle: 60, XY
Horse with three X chromosomes (trisomy X): 65, XXX
Female sheep with increased length of the short (p) arm of chromosome 2: 54, XX, 2p+
Male pig with a deletion from the long arm (q) of chromosome 10: 38, XY, 10q-

Symbol   Meaning and example
46, XX   Normal female karyotype of human
46, XY   Normal male karyotype of human
p        short arm of chromosome
q        long arm of chromosome
cen      centromere
+        gain of; e.g. 47,XX,+21 = female with trisomy 21
-        loss of; e.g. 45,XX,-14,-21,+t(14q21q) = normal female carrier of a Robertsonian
         translocation between the long arms of chromosomes 14 and 21; the karyotype is
         missing a normal 14 and a normal 21. 4p- = chromosome 4 with part of the short arm deleted
:        break; e.g. 5qter-->5p15: = deleted chromosome 5 in a patient with cri du chat syndrome,
         with a deletion breakpoint in band p15
::       break and join; e.g. 2pter-->2q21::8p13-->8pter = description of the der(2) portion of t(2;8)
/        mosaicism; e.g. 46,XX/47,XX,+8 = female with two populations of cells, one with a normal
         karyotype and one with trisomy 8
del      deletion; e.g. 46,XX,del(5p) = female with deletion of part of the short arm of one
         chromosome 5
der      derivative chromosome; e.g. der(1) = translocation chromosome derived from
         chromosome 1 and containing the centromere of chromosome 1
dic      dicentric chromosome; e.g. dic(X;Y) = translocation chromosome containing
         centromeres from both the X and the Y chromosomes
dup      duplication
fra      fragile site; e.g. 46,Y,fra(X)(q27.3) = male with fragile X chromosome
i        isochromosome; e.g. 46,X,i(Xq) = female with isochromosome for the long arm of the X
         chromosome
ins      insertion
inv      inversion; e.g. inv(3)(p25:q21) = pericentric inversion of chromosome 3
mar      marker chromosome; e.g. 47,XX,+mar = female with an extra unidentified chromosome
r        ring chromosome; e.g. 46,X,r(X) = female with ring X chromosome
rcp      reciprocal translocation
rob      Robertsonian translocation
t        translocation; e.g. 46,XX,t(2;8)(q21;p13) = female with balanced translocation between
         chromosome 2 and chromosome 8, with breaks in 2q21 and 8p13
ter      terminus; e.g. 46,X,Xq-(pter-->q21:) = female with partial deletion of the long arm from
         Xq21 to Xqter (the nomenclature shows the portion of the chromosome that is present)
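
The standard form described above (chromosome count, comma, sex chromosome constitution, then any abnormalities, with "/" separating mosaic cell lines) lends itself to simple record keeping. The sketch below is a hypothetical helper for splitting such strings into fields; it is not an ISCN validator and only handles the simple comma-separated layout used in the examples.

```python
def parse_karyotype(text):
    """Split a simple karyotype string into its fields.

    Returns one dict per cell line (mosaics are separated by '/'),
    each with the chromosome count, the sex chromosome constitution
    and a list of abnormality codes exactly as written.
    """
    cell_lines = []
    for part in text.split("/"):
        fields = [f.strip() for f in part.split(",") if f.strip()]
        if len(fields) < 2 or not fields[0].isdigit():
            raise ValueError(f"not a simple karyotype: {part!r}")
        cell_lines.append(
            {
                "count": int(fields[0]),
                "sex": fields[1],
                "abnormalities": fields[2:],
            }
        )
    return cell_lines


if __name__ == "__main__":
    for example in ("60, XY", "65, XXX", "46,XX/47,XX,+8",
                    "45,XX,-14,-21,+t(14q21q)"):
        print(example, "->", parse_karyotype(example))
```

Keeping the abnormality codes as plain strings is a deliberate choice here: interpreting them fully would require the complete ISCN rules, which this sketch does not attempt.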

Molecular screening
Blood collection: About 10 ml of venous blood is collected in a sterile centrifuge tube containing 0.5 ml of
2.7% EDTA as an anticoagulant and immediately transferred to the laboratory in an ice bucket. The
Animal ID should be clearly marked on the collection tube. Blood samples must reach
the laboratory as soon as possible (not later than 48 hours after collection) under cooled conditions
(about 4°C).
Isolation of genomic DNA: Genomic DNA is isolated from 10 ml of blood by the standard phenol-chloroform
extraction method (Sambrook and Russell, 2001) with slight modifications.
PCR amplification: To amplify the genomic region of a particular gene, the following sets of primers are used.
List of primers

Locus/allele           Name of primer   Primer sequence
BLAD                   Forward          5'-CCTGCATCATATCCACCAG-3'
                       Reverse          5'-GTTTCAGGGGAAGATGGAG-3'
Citrullinemia          Forward          5'-GGCCAGGGACCGTGTTCATTGAGGACATC-3'
                       Reverse          5'-TTCCTGGGACCCCGTGAGACACATACTTG-3'
DUMPS                  Forward          5'-GCAAATGGCTGAAGAACATTCTG-3'
                       Reverse          5'-GCTTCTAACTGAACTCCTCGAGT-3'
Factor XI deficiency   Forward          5'-CCCACTGGCTAGGAATCGTT-3'
                       Reverse          5'-CAAGGCAATGTCATATCCAC-3'
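
Before ordering or using the primers listed above, it can be useful to check their length, GC content and an approximate melting temperature. The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a rough guide intended for short oligonucleotides and tends to overestimate for longer primers such as the 29-mers used for citrullinemia. The dictionary layout and function names are illustrative assumptions; the annealing temperature still has to be standardized experimentally.

```python
# Primer sequences as listed in the table above (5' to 3').
PRIMERS = {
    ("BLAD", "forward"): "CCTGCATCATATCCACCAG",
    ("BLAD", "reverse"): "GTTTCAGGGGAAGATGGAG",
    ("Citrullinemia", "forward"): "GGCCAGGGACCGTGTTCATTGAGGACATC",
    ("Citrullinemia", "reverse"): "TTCCTGGGACCCCGTGAGACACATACTTG",
    ("DUMPS", "forward"): "GCAAATGGCTGAAGAACATTCTG",
    ("DUMPS", "reverse"): "GCTTCTAACTGAACTCCTCGAGT",
    ("Factor XI", "forward"): "CCCACTGGCTAGGAATCGTT",
    ("Factor XI", "reverse"): "CAAGGCAATGTCATATCCAC",
}


def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature (C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for (locus, direction), seq in PRIMERS.items():
        print(f"{locus:14s} {direction:8s} len={len(seq):2d} "
              f"GC={gc_percent(seq):.1f}% Tm~{wallace_tm(seq)} C")
```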

PCR conditions: The following reaction mixture can be used for the amplification of these alleles
S.N.  Reaction component                        Concentration   Amount
1.    Template (genomic DNA)                    140 ng          2.00 µl
2.    Forward primer                            30 pmole        1.00 µl
3.    Reverse primer                            30 pmole        1.00 µl
4.    10X PCR buffer (with 1.5 mM MgCl2)        1X              5.00 µl
5.    dNTPs mix (2 mM)                          200 µM          5.00 µl
6.    Autoclaved triple distilled water                         34.75 µl
7.    Taq DNA polymerase (5 U/µl)               1.25 U          0.25 µl
      Total                                                     50.00 µl
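
When several samples are screened together, the reaction mixture is usually prepared as a master mix without template. The following is a minimal sketch of that scaling. Two points are assumptions rather than part of the protocol: the 10% pipetting overage, and the decision to compute water as the remainder of the 50 µl reaction (the listed component volumes add up to 49.00 µl, so the remainder comes to 35.75 µl rather than the 34.75 µl shown).

```python
# Per-reaction volumes (microlitres) taken from the reaction table.
PER_REACTION_UL = {
    "forward primer": 1.00,
    "reverse primer": 1.00,
    "10X PCR buffer": 5.00,
    "dNTPs mix (2 mM)": 5.00,
    "Taq DNA polymerase (5 U/ul)": 0.25,
}
TEMPLATE_UL = 2.00       # added to each tube separately, not to the master mix
FINAL_VOLUME_UL = 50.00


def water_per_reaction():
    """Water needed to bring one reaction to the final volume."""
    return FINAL_VOLUME_UL - TEMPLATE_UL - sum(PER_REACTION_UL.values())


def master_mix(n_reactions, overage=0.10):
    """Volumes for n reactions; overage (an assumption) covers pipetting loss."""
    factor = n_reactions * (1.0 + overage)
    mix = {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()}
    mix["autoclaved water"] = round(water_per_reaction() * factor, 2)
    return mix


if __name__ == "__main__":
    for name, vol in master_mix(24).items():
        print(f"{name:30s} {vol:8.2f} ul")
    print(f"dispense {FINAL_VOLUME_UL - TEMPLATE_UL:.2f} ul per tube, "
          f"then add {TEMPLATE_UL:.2f} ul template")
```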
PCR amplification conditions: General PCR conditions are given below; they need to be
standardized for each test.
S.N.  Step                   Temperature              Time
1.    Initial denaturation   94°C                     3 min
2.    Denaturation           94°C                     30 sec
3.    Annealing              as specified (52-58°C)   30 sec
4.    Extension              72°C                     1 min
      Go to step 2, 35 times
5.    Final extension        72°C                     10 min
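
The programmed hold times give a lower bound on how long a run will take. The sketch below adds them up, reading the programme as 35 cycles of steps 2 to 4 (an interpretation of "go to step 2" above) and ignoring ramping between temperatures, which depends on the thermal cycler.

```python
def run_time_minutes(initial_s, cycle_steps_s, cycles, final_s):
    """Sum of programmed hold times in minutes (ramping time not included)."""
    return (initial_s + cycles * sum(cycle_steps_s) + final_s) / 60.0


if __name__ == "__main__":
    # 3 min initial denaturation, 35 x (30 s + 30 s + 60 s), 10 min final extension
    minutes = run_time_minutes(180, (30, 30, 60), 35, 600)
    print(f"programmed hold time: {minutes:.0f} min")
```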
Checking of amplified products: After completion of the PCR programme, the PCR products were checked
by electrophoresis on a 1.5% agarose gel at 60 V for 1 hour. After electrophoresis, the products were
visualized and documented under a gel documentation system.
Restriction enzyme digestion of PCR products: Six restriction enzymes, namely AvaII, HinfI, TaqI,
Tru1I, HaeIII and MspI, were used for the PCR-RFLP as well as haplotype analysis of DQA and DQB
genes.
Enzymes used for PCR-RFLP of BuLA-DQA genes
Restriction enzyme   Incubation temp.   Recognition sequence
HinfI                37°C               5'-GANTC-3'
AvaII                37°C               5'-GG(A/T)CC-3'
HaeIII               37°C               5'-GGCC-3'
The reaction mix for digestion
The digestion was carried out in a 0.2 ml PCR tube in a total reaction volume of 15 µl at the specified
temperature.
Reaction component             Amount
Restriction enzyme (10 U/µl)   1.0 µl
10x assay buffer for RE        1.5 µl
PCR product                    10.0 µl
Autoclaved dist. water         2.5 µl
Total volume                   15 µl
The samples were incubated overnight to ensure complete digestion. The digested PCR
products were checked by electrophoresis on a 2.0% agarose gel at 70 V for 1 hr 30 min. After
electrophoresis, the products were visualized and documented under a gel documentation system.
Bovine Leukocyte Adhesion Deficiency (BLAD)
This genetic defect can be identified by the PCR-RFLP method. BLAD-free and BLAD-carrier
cows are distinguished by their PCR-RFLP patterns after restriction digestion with TaqI. The PCR
is performed using the primers 5'-CCTGCATCATATCCACCAG-3' and 5'-GTTTCAGGGGAAGATGGAG-3',
which amplify a fragment of 343 bp. The fragment is cut with TaqI
restriction enzyme. After digestion, two bands of 152 and 191 bp indicate a homozygous normal
individual, a single band of 343 bp indicates a homozygous affected individual, and three bands of 152,
191 and 343 bp indicate a heterozygous carrier.

Figure 1. Pictorial diagramme showing different BLAD genotypes by PCR-RFLP

Citrullinemia
The PCR-RFLP method permits diagnosis of the genotypes for bovine citrullinemia. The
ASAS locus is amplified and the mutation at codon 86 is detected by PCR followed by restriction
digestion of the amplified products. The sense primer (5'-GGCCAGGGACCGTGTTCATTGAGGACATC-3')
and antisense primer (5'-TTCCTGGGACCCCGTGAGACACATACTTG-3') are used. PCR-RFLP of the
ASAS locus with AvaII gives fragments of 103 bp and 82 bp for normal animals; 185 bp, 103 bp and 82 bp
for carrier animals; and only 185 bp for diseased (homozygous recessive) animals.

Figure 2. Pictorial diagramme showing different Citrullinemia genotypes by PCR-RFLP

Deficiency of Uridine Mono Phosphate Synthase (DUMPS)


Genetic testing of animals for DUMPS involves PCR amplification of a 108 bp fragment of the
UMPS gene using the primers 5'-GCAAATGGCTGAAGAACATTCTG-3' and
5'-GCTTCTAACTGAACTCCTCGAGT-3'. The PCR product is digested with AvaI; the normal homozygote
shows bands of 53, 36 and 19 bp, the heterozygote 89, 53, 36 and 19 bp, and the recessive
homozygote 89 and 19 bp.

Figure 3. Pictorial diagramme showing different DUMPS genotypes by PCR-RFLP

Factor XI Deficiency
Detection of Factor XI deficiency is based on PCR amplification of the target gene fragment. The
primers 5'-CCCACTGGCTAGGAATCGTT-3' and 5'-CAAGGCAATGTCATATCCAC-3' can be
used to amplify the region containing the mutation in exon 12. The normal FXI allele yields a
244 bp fragment, whereas the mutated FXI allele in the homozygous condition yields a single
320 bp fragment. Heterozygous (carrier) individuals exhibit both the 244 and 320
bp fragments.

Figure 4. Pictorial diagramme showing different FXI genotypes
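
The band patterns described for the four tests above can be collected into a lookup table so that gel readings are scored consistently. The sketch below is illustrative only: the 5 bp matching tolerance and all names are assumptions, band sizes read from a gel are approximate, and very small fragments (such as the 19 bp DUMPS band) may not be visible on an agarose gel at all.

```python
# Expected fragment sizes (bp) per genotype, as described in the text above.
PATTERNS = {
    "BLAD (TaqI)": {
        "normal": {152, 191}, "carrier": {152, 191, 343}, "affected": {343}},
    "Citrullinemia (AvaII)": {
        "normal": {82, 103}, "carrier": {82, 103, 185}, "affected": {185}},
    "DUMPS (AvaI)": {
        "normal": {19, 36, 53}, "carrier": {19, 36, 53, 89}, "affected": {19, 89}},
    "Factor XI (PCR only)": {
        "normal": {244}, "carrier": {244, 320}, "affected": {320}},
}


def call_genotype(test, observed_bp, tolerance_bp=5):
    """Match observed band sizes to the expected pattern for one test.

    Each observed band is snapped to the nearest expected size; a band
    further than tolerance_bp (an assumed value) from every expected
    size makes the sample unscorable.
    """
    patterns = PATTERNS[test]
    known_sizes = set().union(*patterns.values())
    snapped = set()
    for band in observed_bp:
        nearest = min(known_sizes, key=lambda size: abs(size - band))
        if abs(nearest - band) > tolerance_bp:
            return "unscorable"
        snapped.add(nearest)
    for genotype, expected in patterns.items():
        if snapped == expected:
            return genotype
    return "unscorable"


if __name__ == "__main__":
    print(call_genotype("BLAD (TaqI)", [150, 190, 345]))
    print(call_genotype("Factor XI (PCR only)", [244]))
```

Returning "unscorable" rather than guessing keeps ambiguous lanes visible for a repeat run.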

References
Grupe, S., Dietl, G. and Schwerin, M. 1996. Population survey of Citrullinemia on German Holsteins. Livest.
Prod. Sci., Amsterdam 45: 35.
Marron, B. M., Robinson, J. L., Gentry, P. A. and Beever, J. E. 2004. Identification of a mutation associated with
factor XI deficiency in Holstein cattle. Animal Genetics, 35: 454.
Prakash, B., Balain, D.S., Lathwal, S.S. and Malik, R.K. 1995. Infertility associated with monosomy-X in a
crossbred cattle heifer. Veterinary Record, 137(17): 436.
Shuster, D.E., Kehrli, M.E., Ackerman, M.R. and Gilbert, R.O. 1992. Identification and prevalence of genetic
defect that causes Leucocyte Adhesion Deficiency Diseases in Holstein Cattle. Proceedings of the
National Academy of Sciences of the United States of America, 89: 9225.
http://www.radford.edu/~rsheehy/cytogenetics/Cytogenetic_Nomeclature.html


Genomic DNA Isolation from Blood Samples


Reena Arora and Rekha Sharma
ICAR- National Bureau of Animal Genetic Resources, Karnal (Haryana)

________________________________________________________________________________________

Nucleic acid extraction is a key step in laboratory procedures required to perform further molecular
analysis. Downstream applications benefit from DNA of high quantity and high quality. It is
therefore imperative that the DNA extracted for subsequent
use is free of proteins and other inhibitors. DNA can be extracted from fresh or frozen whole
blood, blood stains, sperm cells, cultured cells/tissue, amniotic fluid and hair roots. The basic extraction
procedure remains the same except for minor modifications for each type of material. Phenol-chloroform
extraction followed by ethanol precipitation is the most routinely used method for DNA
isolation. Phenol-chloroform extraction is a liquid-liquid extraction method that separates mixtures
of molecules based on the differential solubility of the individual molecules in two
immiscible liquids.
DNA extraction methods follow some common procedures aimed at effective disruption
of cells, denaturation of nucleoprotein complexes, inactivation of nucleases and other enzymes,
removal of biological and chemical contaminants, and finally DNA precipitation. Most of the
methods follow similar basic steps and include the use of organic and inorganic reagents and
centrifugation.
Because the goal of genomic DNA extraction depends on the intended applications of the DNA,
its purity, source, quantity and quality all need to be
addressed before extraction. A plethora of methods, technologies and kits
is now available to researchers for isolating genomic DNA from cells. Selecting the most suitable
method/technology/kit for DNA extraction depends on the following factors.
Quantity of DNA needed
Molecular weight and size of DNA
Purity of DNA required
Downstream applications of DNA
Time available
Ease of DNA extraction technique or method
Expense or money available
Almost all protocols for isolation of DNA from blood and tissues involve four major steps:
Lysis of cells using a detergent such as sodium dodecyl sulphate (SDS)
Digestion of proteins released from cell lysis with proteinase K
Extraction of DNA with phenol
Precipitation of DNA with alcohol
Principle of DNA extraction
The basic principle of extraction is that all other components of the chromatin are removed, leaving
behind the DNA. The proteins are digested by the enzyme proteinase K in the presence of SDS, which
aids the digestion and also helps in lysis of cells (WBC, bacterial cells etc.). Phenol-chloroform
extraction is used to remove the proteins from the aqueous phase. Chloroform eliminates any traces
of phenol, which can cause phosphodiester breakage. The pH of phenol should be maintained above
7.8 to prevent DNA from becoming trapped at the interphase between the organic and aqueous
phases. DNA is finally precipitated with ethanol or isopropanol from a salt solution with a moderate
concentration of monovalent cations. On average, about 400 to 500 µg of DNA can be extracted
from 10 ml of blood.
Materials required
Plastic (Oakridge tubes)
Borosil tubes (autoclaved)
Crushed ice
Weighing balance
Pipettes (sterile)
Sterile plasticware and glassware
High speed centrifuge
Vortex mixer
Stock solutions
1 M Tris (pH 8.0)
1 M NH4Cl
1 M KHCO3
0.5 M EDTA (pH 8.0)
0.5 M NaCl
20% SDS
Proteinase- K (20 mg/ml)
3 M sodium acetate (pH 5.2)
Phenol (equilibrated with Tris at pH > 7.8)
Chloroform
Isoamyl alcohol
Absolute alcohol
70% alcohol
RBC lysis buffer
Ammonium chloride       155 mM
Potassium bicarbonate   10 mM
EDTA (pH 8)             0.1 mM
DNA extraction buffer
NaCl                    400 mM
Tris (pH 8)             10 mM
EDTA (pH 8)             2 mM
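
Both working buffers can be made up from the stock solutions listed above using C1V1 = C2V2. The sketch below shows that calculation; the dictionary names and the batch volumes in the example are illustrative assumptions, and the stock concentrations are taken as listed (for example 0.5 M NaCl).

```python
# Stock concentrations in mM, as given in the stock solution list.
STOCKS_MM = {"NH4Cl": 1000.0, "KHCO3": 1000.0, "EDTA": 500.0,
             "Tris": 1000.0, "NaCl": 500.0}

# Target concentrations in mM for the two working buffers.
RBC_LYSIS_BUFFER = {"NH4Cl": 155.0, "KHCO3": 10.0, "EDTA": 0.1}
DNA_EXTRACTION_BUFFER = {"NaCl": 400.0, "Tris": 10.0, "EDTA": 2.0}


def stock_volumes_ml(recipe_mm, final_volume_ml):
    """Volume of each stock (ml) and of water needed for final_volume_ml."""
    volumes = {name: final_volume_ml * target / STOCKS_MM[name]
               for name, target in recipe_mm.items()}
    water = final_volume_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for this recipe")
    volumes["water"] = water
    return volumes


if __name__ == "__main__":
    for label, recipe, volume in (
            ("RBC lysis buffer", RBC_LYSIS_BUFFER, 1000.0),
            ("DNA extraction buffer", DNA_EXTRACTION_BUFFER, 500.0)):
        print(label, f"({volume:.0f} ml)")
        for name, ml in stock_volumes_ml(recipe, volume).items():
            print(f"  {name:6s} {ml:8.2f} ml")
```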

Collection and storage of blood


The blood is collected aseptically from the jugular vein of animals in vacutainers containing EDTA
as anticoagulant. In avian species, the blood is collected from the wing vein. The collected
blood sample has to be transported to the laboratory in ice-cold condition to prevent lysis of the
blood cells. The blood can be stored for up to 48 hours at 4°C or for a few days at -70°C, but
the quality and quantity of extractable DNA decrease with time.
Procedure
Cell Lysis
1. Transfer 10 ml of blood into a labeled autoclaved Oakridge tube. Add two volumes of ice-cold
RBC lysis buffer and mix gently. Incubate the tubes on ice for 10 minutes.
2. Centrifuge the sample at 8500 rpm for 10 minutes at 4°C and discard the supernatant without
disturbing the pellet.
3. Resuspend the pellet in one volume of RBC lysis buffer and keep on ice for 10 minutes.
Centrifuge as in step 2.
4. Repeat steps 2 and 3 until most of the red blood cells are lysed and a clear pellet of white
blood cells is obtained.
5. Add an equal volume of DNA extraction buffer to the WBC pellet and mix well by vortexing.
6. Add 20% sodium dodecyl sulfate (SDS) at 200 µl per 10 ml of whole blood and mix gently.
(SDS is a popular detergent used to solubilize cell membranes)
7. Add proteinase K (20 mg/ml stock solution) at 40 µl per 10 ml of blood and incubate at 56°C
overnight.
(Enzymes are combined with detergents to target cell surface or cytosolic components. Proteinase K
cleaves glycoproteins and inactivates RNases and DNases)
Phenol extraction
1. After overnight incubation, add an equal volume of Tris (pH 8.0) saturated phenol to the above
mixture and mix gently by inverting the tube for 5-10 minutes to form a uniform suspension.
2. Centrifuge the mixture at 10000 rpm for 15 minutes at 25°C.
3. Aspirate the upper aqueous phase gently using a wide-bore sterile Pasteur pipette without
disturbing the protein interphase and transfer it to a fresh Oakridge tube.
(The nucleic acid will tend to partition into the organic phase if the phenol has not been adequately
equilibrated to a pH of 7.8-8.0)
4. Add an equal volume of phenol: chloroform: isoamyl alcohol (25:24:1) and mix gently by inverting
the tube until a uniform suspension is formed.
(Isoamyl alcohol prevents frothing during mixing)
5. Centrifuge the mixture at 10000 rpm for 15 minutes at 25°C, again aspirate the upper
aqueous phase gently using a wide-bore sterile Pasteur pipette and transfer it into a fresh
Oakridge tube.
6. Add an equal volume of chloroform: isoamyl alcohol (24:1) and mix properly by inverting the
tube several times.
7. Centrifuge the mixture again at 10000 rpm and transfer the aqueous phase into a fresh
sterile glass tube without disturbing the interphase.
DNA precipitation
1. To the separated aqueous phase, add 0.1 volume of sodium acetate (3 M, pH 5.2) and mix gently.
(DNA precipitation is achieved by adding high concentrations of salt to DNA-containing solutions, as
cations from salts counteract the repulsion caused by the negative charge of the phosphate backbone)
2. Add 2 to 2.5 volumes of chilled ethanol and mix gently by inverting the tube to precipitate the
DNA.
(A mixture of DNA and salts in the presence of solvents like ethanol at final concentrations of 70-80%
causes nucleic acids to precipitate)
3. Spool out the precipitated DNA with a Pasteur pipette and transfer it to an Eppendorf
tube.
4. Wash the DNA with 1 ml of 70% ethanol by mixing well and centrifuging at 10000 rpm for 5
minutes at 4°C. Repeat this step once.
(Washing with 70% ethanol removes excess salts from the DNA)
5. Keep the Eppendorf tubes open in a sterile incubator at 37°C to dry the pellet by evaporating the
alcohol.
6. Dissolve the DNA in 500 µl TE buffer (pH 8.0) and incubate at 65°C for 30 minutes.
7. Store the DNA sample at 4°C for 4-5 days to ensure complete dissolution of the DNA in the buffer.
The dissolved DNA can then be stored at -20°C as stock solution for future use.
Critical Points in DNA isolation


Incubation on ice must last at least 10 min so that all RBC enzymes are inactivated.
Over incubation in lysis buffer may cause the lysis of WBCs also.
Lysis buffer and ethanol must be chilled.
All the glassware, plastic ware, pipettes must be clean, autoclaved and dried.
Always put on apron and disposable gloves during sample preparation and DNA isolation.
Do not allow the pellet to dry completely; otherwise, it will be difficult to dissolve.
Kit based DNA isolation
Alternatively, solution or column based blood DNA isolation kits can be used for genomic DNA
isolation, yielding good quality DNA without handling hazardous chemicals like phenol and
chloroform. The protocol given below is for the 'HiPurA™ SPP Blood DNA Isolation Kit' supplied
by HiMedia Laboratories, India.
DNA purification protocol for 2 ml whole blood:
1. Take initial volume 2 ml of whole blood in a clean 15 ml centrifuge tube.
2. RBC Lysis- Add 6.0 ml of RBL Buffer (R075). Mix well by inverting the tube a few times. Incubate
the tubes at room temperature for 5 minutes. Mix the tube contents intermittently by inverting
several times during incubation.
3. Centrifuge the tube at 2,000 x g (5,000 rpm) for 5 minutes at room temperature. Discard the
supernatant containing the lysed red blood cells carefully without disturbing the white pellet.
Leave about 150 µl of residual liquid in the tube. If the blood sample has been frozen, repeat
steps 2-3 until the pellet is white.
NOTE: If some red blood cells or cell debris are observed along with the white blood cell pellet,
resuspend the white blood cell pellet and mix with 4 ml of RBL buffer (R075). Incubate at room
temperature for 2 minutes. Pellet down the white blood cells by repeating Step 3.
4. Vortex the tube vigorously so as to resuspend the white blood cells completely.
5. WBC Lysis- Add 2 ml WBL Buffer (DS0046) to the resuspended white blood cells and pipette up
and down to lyse the cells. The solution should become viscous. Incubate the solution at 37°C if any
cell clumps are still present.
Optional RNase A treatment- If RNA-free genomic DNA is required, add 10 µl of RNase A Solution
(DS0003). Invert the tube 20-25 times to ensure thorough mixing of the enzyme and incubate for
approximately 10 minutes at 37°C.
6. Cool the sample to room temperature before further processing.
7. Protein Precipitation- Add 670 µl PBP Buffer (DS0047) to the cell lysate. Mix by vortexing for 30
seconds at high speed. Incubate on ice for 5 minutes.
8. Centrifuge at maximum speed 2,000 x g (5,000 rpm) for 5 minutes at room temperature.
9. DNA Precipitation- Transfer the supernatant to a new 15 ml centrifuge tube containing 2 ml of
100% isopropanol. Ensure that no protein pellet gets transferred along with the supernatant.
Mix by inverting the tube 40-50 times gently.
10. Centrifuge at 2,000 x g (5,000 rpm) for 3 minutes at room temperature. A small white pellet of
DNA will be visible.
11. Wash- Discard the supernatant and dry the pellet by inverting the tube on a clean absorbent
paper towel. Wash the DNA pellet by adding 2 ml of 70% ethanol and inverting the tube a few times.
12. Centrifuge at 2,000 x g (5,500 rpm) for 2 minutes at room temperature. Carefully pour off the
ethanol. The pellet may be very loose at this point, so the supernatant should be poured
off carefully without disturbing the pellet. Repeat steps 11-12 for a second ethanol wash.
13. Invert the tube on a paper towel and air-dry the pellet for 10-15 minutes.

14. DNA Elution- Add 200 µl of Elution Buffer (ET) (DS0040) and vortex for 1 minute to dissolve the
DNA pellet properly. Incubate the tube at 65°C for 1 hour and at room temperature overnight
to rehydrate the DNA. Gently shake the tube several times intermittently during the incubation to
dissolve the DNA completely.
15. Storage of the eluate with purified DNA- The eluate contains pure genomic DNA. For short-term
storage of the DNA, 2-8°C is recommended, and for long-term storage, -20°C. Avoid
repeated freezing and thawing of the sample, which may cause denaturing of the DNA. The Elution
Buffer will help to stabilize the DNA at these temperatures.
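
The kit protocol quotes centrifugation settings both as relative centrifugal force (x g) and as rpm. The two are related through the rotor radius by RCF = 1.118 x 10^-5 x r (cm) x rpm^2, so an rpm value only corresponds to a given x g for a particular rotor. The sketch below applies this standard relation; the 7.2 cm radius is a hypothetical value chosen for illustration and should be replaced by the radius of the rotor actually used.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard constant for radius in cm and speed in rpm


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Speed (rpm) needed to reach a given x g with a given rotor radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius_cm = 7.2  # hypothetical rotor radius; check the rotor manual
    print(f"5,000 rpm -> {rcf_from_rpm(5000, radius_cm):.0f} x g")
    print(f"2,000 x g -> {rpm_from_rcf(2000, radius_cm):.0f} rpm")
```

Setting the centrifuge by x g where the instrument allows it avoids depending on a rotor-specific rpm figure.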
Estimation of quality and quantity of DNA
The most commonly used methods to estimate DNA concentration and purity are UV
spectrophotometric measurements and agarose gel electrophoresis.
1. UV spectrophotometry: Measuring ultraviolet light absorbance with a spectrophotometer at
different wavelengths (230 nm, 260 nm and 280 nm) is a quick and efficient initial way of
determining the purity and concentration of nucleic acid samples.
230nm: Phenol (used in phenol-chloroform extraction) absorbs strongly at 230nm, therefore high
absorbance at this wavelength can be indicative of carry-over of this compound into the sample.
260nm: DNA absorbs light most strongly at 260nm, so the absorbance value at this wavelength
(called A260) can be used to estimate the DNA concentration using the Beer-Lambert law:
Concentration (µg/ml) = (A260 reading - A320 reading) × 50
280nm: Since tyrosine and tryptophan residues absorb strongly at this wavelength, the absorbance
at 280nm is used as an indicator of protein contamination.
Purity of nucleic acid samples is assessed from the 260/280 absorbance ratio, and values in the range of
1.8-2.0 are generally considered acceptable.
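
The concentration formula and the 260/280 purity criterion above translate directly into a few lines of code. This is a minimal sketch: the dilution factor and the total-yield helper are additions for convenience (assumptions, not part of the text), and the factor of 50 applies to double-stranded DNA.

```python
def dsdna_conc_ug_per_ml(a260, a320=0.0, dilution_factor=1.0):
    """Concentration (ug/ml) = (A260 - A320) x 50, times any dilution made
    before reading (dilution_factor is an added convenience, default 1)."""
    return (a260 - a320) * 50.0 * dilution_factor


def purity_ratio(a260, a280):
    """A260/A280 ratio used as an indicator of protein contamination."""
    return a260 / a280


def is_acceptable(ratio, low=1.8, high=2.0):
    """True when the ratio falls in the generally accepted 1.8-2.0 range."""
    return low <= ratio <= high


def total_yield_ug(conc_ug_per_ml, volume_ul):
    """Total DNA (ug) in a sample of the given volume."""
    return conc_ug_per_ml * volume_ul / 1000.0


if __name__ == "__main__":
    # Example readings (made-up values) for a sample diluted 1:20
    conc = dsdna_conc_ug_per_ml(a260=0.50, a320=0.02, dilution_factor=20)
    ratio = purity_ratio(0.50, 0.27)
    print(f"concentration: {conc:.0f} ug/ml")
    print(f"A260/A280: {ratio:.2f} (acceptable: {is_acceptable(ratio)})")
    print(f"yield in 500 ul: {total_yield_ug(conc, 500):.0f} ug")
```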
2. Agarose gel electrophoresis (AGE): Electrophoresis is the migration of charged molecules in
solution in response to an electric field. Their rate of migration depends on the strength of the field,
the net charge, size and shape of the molecules, and also on the ionic strength, viscosity and
temperature of the medium in which the molecules are moving. As an analytical tool,
electrophoresis is simple, rapid and highly sensitive. It is used to study the properties of a single
charged species and as a separation technique.
AGE with a quantitative dye such as ethidium bromide can be used as an alternative approach
to measure the DNA concentration in the sample. The DNA concentration of a sample can be
roughly calculated by comparing the sample band intensity with that of a molecular weight
marker band whose DNA concentration is known. Contaminating RNA can also be distinguished
from genomic DNA on an agarose gel, since RNA runs as a low molecular weight smear and
genomic DNA as a high molecular weight band.


Analytical Approaches for Microsatellite Markers


Reena Arora
ICAR-National Bureau of Animal Genetic Resources, Karnal (Haryana)

________________________________________________________________________________________

Sampling procedure
Any of the biological materials like fresh blood, tissue, hair, bone etc. may potentially be used for
DNA analysis. However, fresh blood is preferred as a sample material as high quality of DNA can
easily be obtained from peripheral blood. Sample should be collected from unrelated animals by
visiting the breeding tract of the breed in question and not more than 10% of a herd or village
population should be sampled. Whenever possible, pedigree records should be consulted for
identifying unrelated individuals. To achieve clearer differentiation among closely related
populations/ breeds, FAO recommends that per breed 50 unrelated animals (preferably 25 each of
both the sexes) should be assayed.
DNA extraction
The blood samples, collected in vacutainer tubes containing an anticoagulant such as EDTA, are
transported to the laboratory under chilled conditions for further processing. Genomic DNA from
whole blood is then isolated using proteinase-K digestion followed by standard phenol/chloroform
extraction. Both the quality and the quantity of the isolated genomic DNA are assessed, and the
DNA is subsequently stored at -20°C/4°C for further analysis with microsatellite markers. Blood
samples can also be collected on FTA cards.
Amplification and resolution of microsatellites
Microsatellites can be amplified by the polymerase chain reaction (PCR) technique with unlabelled or
labelled primers. After amplification with unlabelled primers, the number of repeat units that an
individual has at a given locus can be resolved on polyacrylamide gels by silver staining, which
involves three steps: 1) fixing the DNA bands on the gel with 10% acetic acid, 2) incubating the
gel in silver nitrate solution for 30 minutes, and 3) developing the DNA bands with the help of
developer. The resulting stained gels are dried and stored for data recording and data analysis.
On the gels, two bands can be seen for most individuals, as each individual inherits one length of
nucleotide repeats from the mother and the other from the father (individuals with one band have
received the same allele from the mother and the father).
PCR primers labelled with fluorescent dyes, viz. FAM, HEX, NED and PET, which have different
emission spectra, permit the simultaneous analysis of microsatellites that overlap in size on an
automated DNA fragment analyzer/sequencer. The sequencer allows a much higher resolution of
microsatellites than is possible with the other methods of analysis.
Data processing
A number of software programmes, downloadable from the internet and offering different analytical
methods, are available for the analysis of microsatellite data recorded as genotype designations for
each individual across the microsatellite loci. The data generated in terms of alleles, as
photographs or preserved gels, first need to be converted into genotypes. Two main steps are
involved in preparing molecular data for statistical analysis in diversity studies:
Genotyping: Each individual can be genotyped manually by scoring the bands (alleles), either as
two-digit codes or as their integer size in base pairs. Heterozygous individuals
yield two bands and homozygous individuals yield one band.
Entry of band/allele information into the computer: This can be done manually, or the gel can be read
directly by a computer installed with suitable software.
Statistical analysis of data:
Data analysis can be grouped into two main categories:
1. Intra-population analysis (allele diversity, gene diversity, deficiency of heterozygotes)
2. Inter-population analysis (genetic distance and analysis of molecular variance)
Measures of molecular variation at individual or population levels
Genetic diversity indices, viz. allele number, allele frequency/gene frequency, effective allele
number and heterozygosity, as well as statistics that express the level of inbreeding, genetic
bottleneck, Polymorphism Information Content (PIC) of loci, F-statistics and genetic distance, can be
estimated by several software packages with different analytical methods that can be downloaded
from the internet. Most of them can be obtained free of charge.
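As an illustration of what these packages compute for a single locus, the following minimal Python sketch derives allele frequencies, observed heterozygosity (Ho), expected heterozygosity (He, here without small-sample correction), effective allele number (Ne = 1/Σp²) and PIC (the standard formula PIC = 1 - Σp² - ΣΣ 2p_i²p_j²). The function name and the five example genotypes are assumptions chosen for illustration; the dedicated software listed below should be used for real analyses.

```python
from collections import Counter
from itertools import combinations


def locus_summary(genotypes):
    """Summarise one co-dominant locus from (allele, allele) tuples."""
    alleles = [a for g in genotypes for a in g]
    n = len(alleles)
    freqs = {a: c / n for a, c in Counter(alleles).items()}
    sum_p2 = sum(p * p for p in freqs.values())
    ho = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    he = 1.0 - sum_p2                      # uncorrected gene diversity
    ne = 1.0 / sum_p2                      # effective number of alleles
    pic = he - sum(2 * (p * p) * (q * q)
                   for p, q in combinations(freqs.values(), 2))
    return {"n_alleles": len(freqs), "freqs": freqs,
            "Ho": ho, "He": he, "Ne": ne, "PIC": pic}


# Hypothetical genotypes (allele sizes in bp) of five animals at one locus
example = [(113, 113), (113, 113), (113, 113), (109, 113), (109, 113)]
summary = locus_summary(example)
print(summary)
```

With only two alleles at frequencies 0.8 and 0.2 the locus is only modestly informative, which is reflected in the low PIC value.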
Software for microsatellite data analysis
1. PopGene32 (http://www.ualberta.ca/-fyeh/fyeh)
2. GenAlEx (http://www.anu.edu.au/BoZo/GenAlEx/)
3. Arlequin (http://lgb.unige.ch/arlequin/)
4. GDA (http://hydrodictyon.eeb.uconn.edu/people/plewis/software.php)
5. Phylip (http://evolution.genetics.washington.edu/phylip/getme.html)
6. Microsatellite Toolkit (http://oscar.gen.tcd.ie/~sdepark/ms-toolkit/)
7. TreeView (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html)
Microsatellite genotyping using automated sequencer ABI3100
(Note: The information presented in this section has been sourced from the Genescan Reference
Guide).
In an automated DNA sequencer, fluorescent dyes such as FAM, HEX and NED are used for 5'-end
labelling of the primers. After amplification, the PCR product is mixed directly with a size standard,
denatured and electrophoresed.
Multiplexing reactions
Using the 3130xl and the Applied Biosystems GeneMapper system, different DNA fragments can be
labelled with up to four different fluorescent dyes (6-FAM, VIC, NED or PET). LIZ or ROX dyes
are reserved for the internal standard. To exploit the potential for increased throughput using this
system, PCR may be multiplexed. Multiplexing may be done by
1. Combining more than one pair of primers in the same PCR reaction tube or
2. Pooling PCR products with different dye labels and overlapping product size.
However, primers labelled with the same dye should not be multiplexed for loci with overlapping
allele size ranges. Compatibility between the primers for successful co-amplification should also be
checked. Since the intensity of emitted fluorescence differs between dyes (6-FAM, VIC > NED >
PET > LIZ), a greater amount of PCR product labelled with dyes of low emission intensity should be
used than of product labelled with dyes of high emission intensity, in order to get similar fluorescent
intensities for all products.
Internal size standard
GeneScan 500 LIZ size standard is used as an internal lane size standard. The fragment sizes range
from 35 to 500 bp, with 16 single-stranded labelled fragments of 35, 50, 75, 100, 139, 150, 160, 200, 250,
300, 340, 350, 400, 450, 490 and 500 bases. When designing the primers, ensure that the fragments
produced are > 75 bp and < 490 bp (for LIZ500) or < 580 bp (for LIZ600). It is critical to the sizing
method employed by the GeneMapper software that there are at least two size standard fragments
larger than your largest unknown fragment.
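The two multiplexing rules above (no shared dye for loci with overlapping size ranges, and at least two size standard fragments above the largest allele) can be checked before ordering primers. The sketch below is a minimal illustration; the marker names, dyes and size ranges are hypothetical, and the function names are assumptions rather than part of any Applied Biosystems software.

```python
from itertools import combinations

# Fragment sizes (bases) of the GeneScan 500 LIZ standard listed above
LIZ500 = [35, 50, 75, 100, 139, 150, 160, 200, 250,
          300, 340, 350, 400, 450, 490, 500]


def ranges_overlap(a, b):
    """True when two inclusive (min, max) size ranges share any size."""
    return a[0] <= b[1] and b[0] <= a[1]


def multiplex_conflicts(markers):
    """Return marker pairs that share a dye and overlap in size range.

    markers maps a marker name to (dye, min_size, max_size).
    """
    conflicts = []
    for (n1, (d1, lo1, hi1)), (n2, (d2, lo2, hi2)) in combinations(markers.items(), 2):
        if d1 == d2 and ranges_overlap((lo1, hi1), (lo2, hi2)):
            conflicts.append((n1, n2))
    return conflicts


def sized_reliably(max_size, standard=LIZ500, needed_above=2):
    """True when enough standard fragments are larger than max_size."""
    return sum(1 for s in standard if s > max_size) >= needed_above


# Hypothetical panel: marker -> (dye, min size, max size)
panel = {"A": ("NED", 109, 149), "D": ("FAM", 86, 160), "E": ("FAM", 251, 311)}
print(multiplex_conflicts(panel))                      # no shared-dye overlaps
print(all(sized_reliably(hi) for _, _, hi in panel.values()))
```

Adding a second FAM-labelled marker in the 150-200 bp range to this panel would be flagged, because it overlaps marker D in the same colour.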


Control DNA
It is recommended to analyse at least one control DNA sample in every PCR run. The control
DNA serves as a positive control for troubleshooting problems with the PCR amplification, and it
also allows the fragment sizes obtained in different runs or by different people to be monitored
and compared.
Preparation of PCR samples
1) Amplify your target product using primer pairs in which the forward primer is labelled with a
capillary-compatible dye: 6-FAM (Blue), PET (Red), VIC (Green) or NED (Yellow)
2) Dilute the PCR product in Milli-Q water (e.g. 1:20; the dilution varies with individual markers)
3) Prepare the internal standard by adding 10 µl of LIZ 500 standard (stored at 4°C)
to 1 ml of HiDi formamide (stored in aliquots of 1 ml at -20°C) and mix by pipetting
4) Pipette 1 µl of diluted PCR product into individual wells of the microtitre plate
5) Add 9 µl of the standard/formamide mix into each well and centrifuge briefly
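When a full plate is loaded it is convenient to scale the standard/formamide mix to the number of wells. The following sketch keeps the 10 µl : 1 ml ratio and the 9 µl per well given above; the 10% pipetting overage and the function name are assumptions added for illustration.

```python
def size_standard_mix(n_wells, per_well_ul=9.0, liz_ul=10.0,
                      formamide_ul=1000.0, overage=0.10):
    """Volumes of LIZ standard and HiDi formamide for n_wells.

    The ratio liz_ul : formamide_ul follows the protocol; the overage
    (extra fraction to cover pipetting losses) is an assumed value.
    """
    total = n_wells * per_well_ul * (1.0 + overage)
    liz_fraction = liz_ul / (liz_ul + formamide_ul)
    return {
        "total_ul": total,
        "liz_ul": total * liz_fraction,
        "formamide_ul": total * (1.0 - liz_fraction),
    }


mix = size_standard_mix(96)
print({name: round(volume, 2) for name, volume in mix.items()})
```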
Troubleshooting
Too much signal is the most common problem. For optimal results, the fluorescent signal should be
between 150 and 6000 RFUs. Above this range, the instrument cannot measure the true value of the
signal and therefore cannot apply the matrix correctly. This results in artifact pull-up peaks that
can appear in other colours. Artifact peaks can corrupt both the automated size calling and the
analysis of co-loaded samples.
If you intend to pool PCR products, it is important to pool PCR products together at the correct
ratios in order to get similar fluorescent intensities across all fragments in the pool. The fluorescent
dyes are detected with different efficiencies; therefore the amount of each dye-labelled product in
the pool will require adjustment to ensure even detection.
A good way to proceed is to test a few combinations of pooled PCR reactions to determine the
pooling ratio that will provide similar fluorescent intensities across all the pooled fragments. Then
carry out a series of dilutions on the pooled reactions in order to determine the optimal
fluorescence for running on the 3130xl.
After determining the optimal pooling ratio and/or dilution ratio, you can then use the same
dilutions for subsequent analyses, as PCR yields should be relatively consistent.


Using GENEMAPPER
Creating a new panel
(that contains details of your marker including name, colour, size range of fragment)
Open Genemapper
Tools
Panel manager
Double click root of panel manager
Right click top most cell of panel column
Close the dialog box but panel cell still highlighted
File
New kit
Kit name-Test
Click on test (test highlighted)
File
New Panel-Test Plex 1
Double click on test
Select plex 1
File

New marker
Enter information for marker

Marker Name   Dye   Dye colour   Min size   Max size
A             NED   Yellow       109        149
B             PET   Red          116        150
C             VIC   Green        111        141
D             FAM   Blue         86         160
E             FAM   Blue         251        311

Apply
Okay
Close panel window

Data Analysis
Open Genemapper
File
Add samples to project
Browse
Highlight entire folder
Add to list
Add
Ok
Analyse
(choose microsatellite default)
Select column, ctrl D to fill down
select panel of markers (test plex 1)
Select column, ctrl D to fill down
Select internal marker GSLIZ500
Select column, ctrl D to fill down
Run
Open genotypes
Sort by marker /sample no. etc
Export file as text tab delimited
Open table in MS Excel
Edit table to remove the unwanted columns
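The exported table has one row per sample and marker, whereas the population genetics packages described below expect one row per animal with two allele columns per locus. The sketch below shows one possible way to reshape the export in Python. The column names ("Sample Name", "Marker", "Allele 1", "Allele 2"), the blank second allele for homozygotes and the use of 0 for missing data are assumptions about the export and should be checked against the actual file.

```python
import csv
import io
from collections import defaultdict

MISSING = "0"  # assumption: missing alleles are coded as 0 in the wide table


def genotypes_to_wide(tsv_text, markers):
    """Turn one-row-per-sample-and-marker text into one row per sample."""
    calls = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        a1 = (row.get("Allele 1") or "").strip()
        a2 = (row.get("Allele 2") or "").strip()
        if not a1:
            pair = (MISSING, MISSING)          # no call for this marker
        else:
            pair = (a1, a2 or a1)              # blank allele 2 = homozygote
        calls[row["Sample Name"]][row["Marker"]] = pair
    wide = []
    for sample, by_marker in calls.items():
        line = [sample]
        for marker in markers:
            line.extend(by_marker.get(marker, (MISSING, MISSING)))
        wide.append(line)
    return wide


# Hypothetical three-line export used only to demonstrate the reshaping
demo = ("Sample Name\tMarker\tAllele 1\tAllele 2\n"
        "AA1\tlocus1\t113\t\n"
        "AA1\tlocus2\t136\t136\n"
        "AA4\tlocus1\t109\t113\n")
wide_rows = genotypes_to_wide(demo, ["locus1", "locus2"])
for line in wide_rows:
    print("\t".join(line))
```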
Using GenAlEx
(Peakall, R. and Smouse P.E. (2012) GenAlEx 6.5: genetic analysis in Excel. Population genetic
software for teaching and research - an update. Bioinformatics 28, 2537-2539).
GenAlEx (Genetic Analysis in Excel) is a popular cross-platform package for population genetic
analysis that runs within Microsoft Excel. GenAlEx offers analysis of codominant, haploid and binary
genetic loci and DNA sequences. Both frequency-based (F-statistics, heterozygosity, HWE, population
assignment, relatedness) and distance-based (AMOVA, PCoA, Mantel tests, multivariate spatial
autocorrelation) analyses are provided. GenAlEx 6.5 introduces new features, including
calculation of new estimators of population structure (G'ST, G''ST, Jost's Dest and F'ST
via AMOVA), Shannon information analysis, linkage disequilibrium analysis for biallelic data, and
heterogeneity tests for spatial autocorrelation analysis. Data export is provided to more than 30
other software packages.


Input file format for diploid data, co-dominant marker


Each locus occupies two adjacent columns, one per allele (sizes in bp). The three populations
(Pop1, Pop2 and Pop3) correspond to the sample prefixes AA, BB and CC.

Sample  Pop    locus1    locus2    locus3    locus4    locus5    locus6
1       AA1    113 113   136 136   182 182   126 140   212 218   123 123
2       AA2    113 113   136 136   182 182   122 124   218 218   123 123
3       AA3    113 113   136 136   182 184   126 128   218 224   123 131
4       AA4    109 113   130 136   184 198   124 128   214 214   121 129
5       AA5    109 113   136 136   180 184   122 122   212 216   123 123
6       AA6    109 113   136 144   184 184   124 124   212 220   123 123
7       AA7    113 117   136 136   180 180   120 122   212 216   123 123
8       AA8    109 109   130 130   182 182   120 120   212 216   123 123
9       AA9    109 109   134 134   184 198   120 120   224 224   123 123
10      AA10   113 113   134 134   176 184   120 120   212 216   123 123
11      AA11   109 113   134 136   182 184   120 122   212 214   121 123
12      AA12   113 117   132 132   182 184   124 128   218 218   121 129
13      AA13   109 109   124 124   184 184   122 124   212 218   123 123
14      AA14   113 115   136 136   180 198   124 124   214 218   121 121
15      AA15   113 113   130 130   182 184   120 124   214 218   123 123
16      BB1    111 115   136 144   184 184   120 130   214 222   123 123
17      BB2    109 113   132 136   184 184   120 120   220 220   123 133
18      BB3    109 113   136 144   180 184   120 120   214 218   125 133
19      BB4    111 113   130 136   184 184   120 120   220 220   121 123
20      BB5    113 113   136 136   184 184   120 124   218 218   123 123
21      BB6    111 113   128 128   184 184   120 124   218 218   125 133
22      BB7    105 111   136 138   184 184   120 128   214 218   123 123
23      BB8    109 113   130 144   182 184   120 126   220 220   123 133
24      BB9    113 115   130 136   182 182   120 128   218 218   121 125
25      BB10   105 109   130 144   184 184   120 120   214 218   123 131
26      BB11   111 113   130 136   184 184   126 126   220 220   121 133
27      BB12   111 113   130 136   184 184   120 124   218 218   123 127
28      BB13   105 117   130 136   184 184   120 124   216 218   121 123
29      BB14   113 117   134 144   184 184   120 124   220 220   123 133
30      BB15   105 113   130 136   184 184   124 138   220 222   125 133
31      CC1    111 111   126 132   180 184   124 124   218 220   123 127
32      CC2    105 115   128 136   178 184   124 132   214 220   123 145
33      CC3    111 113   130 136   182 198   122 124   220 220   123 125
34      CC4    111 111   132 132   180 180   124 138   220 220   121 123
35      CC5    111 111   130 130   184 198   128 128   218 220   117 121
36      CC6    111 111   136 146   180 180   118 128   220 220   121 123
37      CC7    111 113   130 136   170 180   118 134   220 224   123 145
38      CC8    111 113   128 132   178 184   124 124   218 220   123 127
39      CC9    111 111   134 136   180 198   124 134   218 220   123 145
40      CC10   111 111   128 132   178 184   116 124   220 220   123 145
41      CC11   111 115   132 136   184 184   118 130   218 220   121 123
42      CC12   107 111   130 136   180 182   124 124   214 214   123 133
43      CC13   111 111   130 130   180 198   124 124   214 214   123 133
44      CC14   111 111   130 136   180 184   118 124   220 222   121 121
45      CC15   111 113   122 136   184 198   120 124   214 222   121 123

Using Popgene32
(Yeh, F.C.; Boyle, T.; Rongcai, Y.; Ye, Z. and Xian, J.M. 1999. POPGENE version 1.31. A Microsoft
Windows-based freeware for population genetic analysis. University of Alberta, Edmonton).
POPGENE is a user-friendly Microsoft Windows-based computer package for the analysis of genetic
variation among and within natural populations using co-dominant and dominant markers and
quantitative traits. It is designed specifically for the analysis of co-dominant and dominant markers
using haploid and diploid data. It performs most types of data analysis encountered in population
genetics and related fields. It can be used to compute summary statistics (e.g., allele frequency, gene
diversity, genetic distance, F-statistics, multilocus structure, etc.) for (1) single-locus, single
populations; (2) single-locus, multiple populations; (3) multilocus, single populations and (4)
multilocus, multiple populations. The modules for co-dominant and dominant markers are
currently limited to a maximum of:
1400 populations;
150 groups;
1000 loci;
10 characters (alphanumeric) for a locus name (names longer than 10 characters are automatically
truncated to 10).
The number of alleles per locus is limited to 9 (1-9) if you use numerals to code your alleles, or to
52 if you use alphabetic letters (capital letters A-Z for alleles 1 to 26 and lower-case
letters a-z for alleles 27 to 52).
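Microsatellite alleles scored as fragment sizes therefore have to be recoded as letters before they can be entered in a POPGENE file. A minimal sketch of such a recoding is given below; ordering the alleles by size before assigning letters is an assumption made here for convenience, and missing data are not handled.

```python
import string

# A-Z for alleles 1 to 26, a-z for alleles 27 to 52
LETTERS = string.ascii_uppercase + string.ascii_lowercase


def code_locus(genotypes):
    """Recode (size, size) genotypes of one locus as two-letter strings."""
    sizes = sorted({allele for genotype in genotypes for allele in genotype})
    if len(sizes) > len(LETTERS):
        raise ValueError("more than 52 alleles cannot be coded with letters")
    code = {size: LETTERS[i] for i, size in enumerate(sizes)}
    coded = ["".join(code[allele] for allele in genotype) for genotype in genotypes]
    return coded, code


# Hypothetical genotypes (allele sizes in bp) at one locus
coded, key = code_locus([(113, 113), (109, 113), (113, 117)])
print(coded, key)
```

The returned key should be kept with the data file so that the letter codes can be translated back to allele sizes.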
Input file format for diploid data, co-dominant marker
/* Diploid alphabetic data of 3 populations each with varying records (genotypes) and 21 loci */
Number of populations = 3
Number of loci = 21
Locus name :
AAT-1 AAT-2 AAT-3 ACO ADH DIA-1 DIA-3 EST-2 GDH G6P HA
IDH MDH-1 MDH-2 MDH-3 MDH-4 PEP-1 PEP-2 PGI-2 PGM SPG-2
AA AAAAAAAAAAAA BB AA AAAAAAAAAAAAAAAAAAAAAAAA
AA AAAA AB BB A3 AA AB BB AA AAAAAAAAAAAAAAAA AB AA AA
AA AAAAAA BC AC AA AB ABAB AA AAAAAAAAAAAAAAAAAAAA
AA AAAAAA BB CC AA BB AB AA AAAAAAAAAAAAAAAAAAAAAA
AA AAAAAA AB AC AA BB AA AAAAAAAAAAAA AC AA AAAA AB AA
AA AAAA AB AB AC AA AB AB AA AAAAAAAAAA AB AA AAAAAAAA
AB AA AAAA BC AC AA AB AB AA AAAAAA AB AA AB AA AAAAAAAA
AA AAAAAA BB AA AAAAAAAAAAAAAAAAAAAAAAAA AC AA AA
AA AAAAAAAA BC AA AB AA AAAAAAAAAAAA AB AA AAAAAAAA
AA AAAA AB BC BC AA BB AB AA AAAAAA AB AA AAAAAAAA AC AA
AA AAAA AB AC AB AA BB BB AA AAAAAAAAAAAAAAAAAAAAAA
AA AAAAAA AB AC AA BB AA AAAAAAAAAAAAAAAAAAAAAAAA
AA AAAAAAAA BC AB AB AA AAAAAAAAAAAAAAAAAA AC AA AA
AA AAAA AB AA AC AA AB AB AA AAAAAAAAAA AB AA AAAAAAAA
AA AAAAAA BB BB AA AAAAAAAAAAAAAAAA AB AA AAAAAAAA
AA AAAAAA AC AC AA BB AB AA AAAAAAAAAAAAAAAAAAAAAA
AA AAAA AB BB BC AA BB AA AC AA AAAAAAAAAAAAAA AC AD AA
AA AAAAAA AB BC AA AB AB AA AAAAAAAAAAAAAAAAAAAAAA
AA AAAAAA BB BC AA BB AA AAAAAAAAAAAA AB AA AAAA AC AA
AA AAAAAAAA BC AA AB AB AA AAAAAAAAAA BC AA AAAAAAAA
AA AAAAAA AB BC AA AB AB AA AAAAAAAAAA AB AA AAAAAAAA

AA AAAAAA BD BC AA AB AB AA AAAAAAAAAAAAAAAA AC AE AA
AA AAAA AB CC BB AA BB AA AAAAAAAAAAAAAAAAAAAA AB AA
AA AAAAAA BC BB AA AB AB AA AAAAAAAAAAAAAAAAAAAAAA
AA AAAA AB CC BB AA AA AB AA AAAAAAAAAAAAAAAAAA AB AA
AA AAAAAA BB BC AA AB AB AA AAAAAAAAAAAAAAAAAAAAAA
AA AAAAAA AB BC AA AA BB AA AAAAAAAAAA AB AA AAAAAAAA
AA AAAA AB AC AB AA BB AA AAAAAAAAAAAAAAAAAAAA AB AA
AA AAAA AB BC BB AA BB BB AA AAAAAAAAAA AB AA AAAAAAAA
AA AAAA AB ABAB AA BB AA AAAAAAAAAAAAAAAAAAAAAAAA
AA AAAAAA BB AB AA AAAAAAAAAAAAAAAA BB AA AAAAAAAA
AA AAAAAA AC BC AA BB AA AAAAAAAAAAAAAAAAAAAAAAAA
AA AAAAAA CC BC AA AB AB AA AAAAAAAAAA AB AA AAAA AC AA
AA AAAAAA BB AC AA BB AA AAAAAAAAAAAAAAAAAA CC AA AA
AA AA AB AA BE BB AA BB AB AA AAAAAAAAAA AB AA AAAAAAAA
AA AAAAAA AB BC AA AB AA AAAAAAAAAAAA AB AA AAAAAAAA
AA AAAA BB AA CC AA AB AB AA AAAAAAAAAA BC AA AAAAAAAA
AA AAAA AB AC BC AA BB BB AA AAAAAAAAAAAAAAAAAAAAAA
AA AAAA AB AA BB AA AC AB AA AAAAAAAAAA AB AA AAAAAAAA
AA AAAA AB BB AC AA BB AA AAAAAAAAAAAA AB AA AAAAAAAA
AA AAAAAA AC AC AA AA AB AA AAAA AB AA AA AB AA AAAAAAAA
AA AAAAAAAA AB AA AB AB AA AAAAAAAAAA AB AA AAAAAAAA
AA AAAAAA BC BC AA BB AB AA AAAAAAAAAA AB AA AAAA AC AA


5
Approaches for Analysis of Mitochondrial Sequence Data
Monika Sodhi and Manishi Mukesh
ICAR-National Bureau of Animal Genetic Resources, Karnal, Haryana

________________________________________________________________________________________

Source of template and DNA extraction


For mtDNA profiling, biological materials including blood, tissue, hair, bone etc. can be used for
extracting DNA. Amongst these, fresh blood collected from the jugular vein of the animal is generally
preferred because of the ease of collection and DNA extraction and the better yield. The blood samples
can be collected in vacutainer tubes containing an anticoagulant such as EDTA and transported to the
laboratory under chilled conditions. For analysis of mtDNA-based diversity, samples should be
collected from random unrelated animals from their respective breeding tracts. Care should be
taken that the animals are unrelated for three generations, and not more than 10% of a herd or village
population should be sampled. Genomic DNA from whole blood is isolated using proteinase-K
digestion followed by standard phenol/chloroform extraction. The quality and quantity of the isolated
genomic DNA are assessed using agarose gel electrophoresis and a NanoDrop spectrophotometer,
respectively. The isolated DNA can be stored at -20°C till further analysis.
Mitochondrial D-loop region amplification and sequencing
To amplify a 1142 bp region of bovine mtDNA between positions 15601 and 404, which includes the
whole D-loop and flanking sequences at both ends, the specific primers used are
F: 5'-TAGTGCTAATACCAACGGCC-3'; R: 5'-AGGCATTTTCAGTGCCTTGC-3'. PCR can be
performed using 200-400 ng DNA in a 25 µl reaction volume containing 2.5 µl of 10X Buffer, 0.5 µl
of 10 mM dNTPs, 1.0 unit of Taq Polymerase (Invitrogen, CA), and 0.5 µl of each 10 pM primer. PCR
conditions include an initial denaturation at 95°C for 6 min followed by 30 cycles of a three-step
process of 45 sec at 94°C, 30 sec at 65°C and 1 min at 72°C, followed by a final step of 6 min at
72°C. PCR products can be analyzed by electrophoresis in a 1% agarose 1X TAE gel and visualized
by ethidium bromide staining and UV light. Purified PCR products are subjected to sequencing
using the forward primer and the Big Dye Terminator v. 3.1 Cycle Sequencing Kit.
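Because the mitochondrial genome is circular, the amplified region runs from position 15601 through the end of the reference sequence and on to position 404. The short sketch below checks that this gives the stated 1142 bp; the genome length of 16338 bp for the bovine reference mtDNA is an assumption supplied here and is not stated in the text above.

```python
BOVINE_MTDNA_LENGTH = 16338  # assumed length (bp) of the bovine reference mtDNA


def circular_amplicon_length(start, end, genome_length):
    """Length of a region on a circular genome, positions 1-based inclusive."""
    if end >= start:
        return end - start + 1
    # The region wraps around the origin of the circular sequence
    return (genome_length - start + 1) + end


length = circular_amplicon_length(15601, 404, BOVINE_MTDNA_LENGTH)
print(length)
```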
Template preparation is the most crucial part of automated DNA sequencing. The PCR product
should be free of primer dimers and non-specific amplification products. After the polymerase chain
reaction (PCR), the amplified DNA is purified to remove excess reaction components. This can be
done by ExoSAP digestion of the PCR product as given below:
1. Make a master mix of Exonuclease I and Shrimp Alkaline Phosphatase as per the composition
given in the table.

Component          Final conc. / volume
Exo I (20 U/µl)    0.5 U/µl
SAP (1 U/µl)       0.5 U/µl
PCR Buffer 10X     1 µl
Milli-Q water      Make up the final volume to 10 µl

2. Add 1 µl of the master mix to 10 µl of PCR product (50 to 100 ng) and set up the following
incubation protocol in the thermal cycler:
37°C for 120 minutes
85°C for 15 minutes
4°C hold
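The stock volumes needed to reach the final concentrations in the master mix follow from C1 × V1 = C2 × V2. A minimal sketch of this arithmetic is given below, under the assumption that the 0.5 U/µl final concentrations refer to the 10 µl master mix; the function and variable names are illustrative only.

```python
def stock_volume_ul(final_conc, stock_conc, final_volume_ul):
    """Volume of stock needed, from C1 x V1 = C2 x V2."""
    return final_conc * final_volume_ul / stock_conc


final_volume = 10.0                                   # µl of master mix
exo = stock_volume_ul(0.5, 20.0, final_volume)        # Exo I stock at 20 U/µl
sap = stock_volume_ul(0.5, 1.0, final_volume)         # SAP stock at 1 U/µl
buffer_ul = 1.0                                       # 10X PCR buffer
water = final_volume - exo - sap - buffer_ul          # Milli-Q to 10 µl

print(f"Exo I {exo} µl, SAP {sap} µl, buffer {buffer_ul} µl, water {water} µl")
```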
3. Make the final volume of the PCR product up to 100 µl with Milli-Q water.
4. Add 10 µl of 3 M Na acetate pH 5.5 and 250 µl of chilled 95% ethanol.
5. Mix the tube well and incubate on ice for 20-30 minutes.
6. Centrifuge at 13,000 rpm for 20 minutes and aspirate the supernatant.
7. Wash the pellet by adding 500 µl of 70% ethanol at room temperature and centrifuge at top speed
for 5 minutes.
8. Aspirate the supernatant and repeat the 70% ethanol wash once more.
9. Air dry the pellet, resuspend it in a suitable volume of water and check by agarose gel
electrophoresis. For a product size of 1000-2000 bp, 10-40 ng of template is sufficient for
sequencing.
Setting up of cycle sequencing reaction
The ready reaction composition:

Component                  Amount
PCR product                10-40 ng
Ready Reaction Mix         1 µl
5X Sequencing Buffer       1.5 µl
Primer (forward/reverse)   5 pmol
Milli-Q water              Make up the volume to 10 µl

Mix the contents briefly and place the tube in a thermal cycler set to the following reaction conditions:
96°C for 1 minute (initial denaturation)
30 cycles of:
96°C for 10 seconds
50°C for 5 seconds
60°C for 4 minutes
4°C for final storage
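For planning instrument time, the programme above can be written down as data and its hold time summed. The sketch below is only an illustration; it ignores ramping between temperatures, so the actual run will take somewhat longer, and the data layout is an assumption made here.

```python
# Each stage: (number of cycles, [(temperature in °C, seconds), ...])
CYCLE_SEQUENCING = [
    (1, [(96, 60)]),                         # initial denaturation
    (30, [(96, 10), (50, 5), (60, 240)]),    # denature, anneal, extend
]


def total_hold_seconds(program):
    """Sum of all hold times, excluding ramping and the final 4 °C hold."""
    return sum(cycles * sum(seconds for _, seconds in steps)
               for cycles, steps in program)


seconds = total_hold_seconds(CYCLE_SEQUENCING)
print(f"{seconds} s = {seconds / 60:.1f} min")
```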
Purification of the sequencing product
After the sequencing reaction, the products are purified by the following protocol:
Add 2 µl of 125 mM EDTA to stop the reaction and mix well.
Add 2 µl of 3 M sodium acetate pH 4.6 to each reaction well.
Ensure proper mixing of the contents.
Add 50 µl of 95% ethanol to each well and incubate at room temperature for 15 minutes.
Spin at 1650 g for 45 minutes at room temperature.
Invert the plate on a paper towel and give a short spin at 180 g to remove the supernatant.
Add 200 µl of 75% ethanol and spin at 1650 g for 5 minutes.
Invert the plate slowly on a paper towel and spin at 180 g for 1 minute.
Denaturation and sequencing
Add 10 µl of Hi-Di formamide, denature the products at 95°C for 5 minutes and chill on ice
immediately for 5 minutes. The samples are then ready for sequencing on an automated DNA
sequencer. The sequences, with their chromatograms, can be visualized and saved using the ABI PRISM
DNA Sequencing Analysis Software.
mtDNA sequence data analysis
Individual chromatograms are checked manually and ambiguous bases are disregarded. Base
calling is performed with Phred; poor-quality sequence data (judged on signal and spacing) and
the primer sequences are removed. The final sequences obtained from a panel of samples are
aligned using Sequencher, MEGA version 6.0 or any other software, and a contig sequence is
generated for further analysis. A number of software packages are freely available for diversity and
phylogenetic analysis using mtDNA sequence data. These include:

DnaSP - DNA sequence polymorphism: (http://dnasp.software.informer.com)
Arlequin: (http://lgb.unige.ch/arlequin/)
Phylip: (http://evolution.genetics.washington.edu/phylip/getme.html)
TreeView: (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html)
MEGA - Molecular evolutionary genetics analysis: (http://www.megasoftware.net/)
Network 4.612: (http://www.fluxus-engineering.com)
Intra-population variation can be estimated by computing haplotype diversity (HD), nucleotide
diversity (Pi), nucleotide diversity with Jukes-Cantor correction (PiJC) and the average number of
pairwise differences (K) for each breed using DnaSP or Arlequin 3.11 software. mtDNA mismatch
distributions for the combined data set and the individual breeds can be assessed using Network or
DnaSP, while Fu's Fs statistic, which is sensitive to population growth, can be calculated using
DnaSP. To understand the extent of divergence and the genetic relationship of the analyzed breeds
to populations from different continents/countries (data available in GenBank), pair-wise FST values
can be estimated using MEGA or Arlequin software, and phylogenetic trees can be drawn using
various algorithms. Network software is used to reconstruct phylogenetic networks and trees, infer
ancestral types and potential types, evolutionary branching and variants, and also to estimate datings.
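To make the intra-population indices concrete, the sketch below computes haplotype diversity as HD = n/(n-1) × (1 - Σp²), the average number of pairwise differences (K) and nucleotide diversity (Pi = K divided by the number of sites) for a set of aligned sequences. The four short sequences are hypothetical, sites with gaps or missing data are not treated specially, and the results of dedicated programs such as DnaSP may therefore differ on real alignments.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """HD = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def mean_pairwise_differences(seqs):
    """K: average number of differing sites over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    diffs = [sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs]
    return sum(diffs) / len(pairs)


def nucleotide_diversity(seqs):
    """Pi: K divided by the alignment length (no gap handling)."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


# Hypothetical aligned sequences of equal length from four animals
aligned = ["TCCGCTCC", "TCCGCTCC", "TCCGCTCT", "TTCACTTT"]
hd = haplotype_diversity(aligned)
k = mean_pairwise_differences(aligned)
pi = nucleotide_diversity(aligned)
print(f"HD = {hd:.4f}, K = {k:.4f}, Pi = {pi:.4f}")
```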
Functionality of DnaSP
DnaSP is one of the most widely used software packages in population genetics and has been cited
extensively. The contig sequence generated using MEGA/CLUSTALW or Sequencher can serve as the
input file for DnaSP, as it supports FASTA, MEGA, NBRF/PIR and NEXUS files. Blank spaces, tabs
and carriage returns are ignored (i.e. they can be used to separate blocks of nucleotides). By
default DnaSP uses: the hyphen character '-' to specify an alignment gap; the dot character '.' to
specify that the nucleotide at this site is identical to that at the same site of the first sequence (i.e.
identical site or matching symbol); and the symbols '?', 'N', 'n' to designate missing data. These
symbols can, however, be changed in the dialog box that appears when opening a data file. The
sequence name can be up to 20 characters long. Blank spaces and tabs are not allowed in it
(underscores should be used in place of blank spaces).
Data import

After importing the data, go to the Analysis menu, from which various tests for intra-population
diversity and inter-population genetic distance can be performed. The commonly conducted analyses
for mtDNA include DNA polymorphism/divergence; haplotype/nucleotide diversity and divergence;
Fu's Fs; and genetic differentiation and gene flow among populations.

[Figure: DnaSP analysis menus, showing general analysis (polymorphism and population genetics
tests), general statistics and popular population genetics tests]

DnaSP can compute several measures of DNA sequence variation within and between populations, as
well as gene flow, gene conversion and linkage disequilibrium parameters. In addition, DnaSP can
perform Fu's Fs statistical tests. It takes advantage of Microsoft Windows capabilities and can handle a
large number of sequences of thousands of nucleotides each on a microcomputer. Furthermore,
DnaSP can easily exchange data with other programs, for example, programs that perform multiple
sequence alignment, phylogenetic tree analysis or statistical analysis.
Steps to create a network
The input file for Network 4.612 consists of a nucleotide multiple sequence alignment (MSA) in RDF
format. DnaSP software can be used to generate this kind of file.
An example of an RDF file is:
1111111111111111111111111166666666666666666666666666011111112222222222233333339112478811245777999122242149725937326048048
9457227
NC1a TCCGCTCCATTCCCGTCCTGTTCTTA 1
NC1b TCCGCTCCATTCCCGTCCTGTTCTTA 1
NC1c TCCGCTCCATTCCCGTCCTGTTCTTA 1
NC2a TCCGCTCTATTCCCGTCCTGTTCTTA 1
NC2b TCCGCTCTATTCCCGTCCTGTTCTTA 1
NC3 TCCGCTCTGTTCCCGTCCTGTTCTTA 1
NC4 TCCGCTCTGTTCCCGTCCTGTTCTCA 1
NC5a TTCACTTTGTTCCCGCTCTATTCTCA 1
NC5b TTCACTTTGTTCCCGCTCTATTCTCA 1
NC6a TTCACTCTGTTCCCGCCCTATTCTCA 1
NC6b TTCACTCTGTTCCCGCCCTATTCTCA 1
NC7 CTCACTCTGTTCCTGCCCTATTCTCA 1
NC8a TTCACTCTGTTCCCGCTCTATTCTCA 1
NC8b TTCACTCTGTTCCCGCTCTATTCTCA 1
NC9a CTCGCTCTGTTCCCGCTCTATTCTCA 1
NC9b CTCGCTCTGTTCCCGCTCTATTCTCA 1
NC10 TTCGCTCTGTTCCCGCTCTATTCTCG 1
NC11a TTCGCTCTGTTCCCGCTCTATTCTCA 1
NC11b TTCGCTCTGTTCCCGCTCTATTCTCA 1
NC11c TTCGCTCTGTTCCCGCTCTATTCTCA 1
NC11d TTCGCTCTGTTCCCGCTCTATTCTCA 1
NC11e TTCGCTCTGTTCCCGCTCTATTCTCA 1
NC12a TTCGCTCTGTCCCCGCTCTATTCTCA 1
NC12b TTCGCTCTGTCCCCGCTCTATTCTCA 1
NC12c TTCGCTCTGTCCCCGCTCTATTCTCA 1
NC12d TTCGCTCTGTCCCCGCTCTATTCTCA 1
NC12e TTCGCTCTGTCCCCGCTCTATTCTCA 1
NC12f TTCGCTCTGTCCCCGCTCTATTCTCA 1
NC12g TTCGCTCTGTCCCCGCTCTATTCTCA 1
NC12h TTCGCTCTGTCCCCGCTCTATTCTCA 1
NC12i TTCGCTCTGTCCCCGCTCTATTCTCA 1
NC13 TTCGCTCTATCCCCGCTCTATTCTCA 1
NC14 TTCGCTCTGTTCCCGTTCTATTCTCA 1
NC15a TTCGCTCTGTTCCCGTTCTACTCTCA 1
NC15b TTCGCTCTGTTCCCGTTCTACTCTCA 1


NC16 TCCGCTCTGTTTCCGCCCTGTTTTCA 1
NC17 TCCGTTCTGTTCCCGCCCTGCTCTCA 1
NC18a TCCGTTCTGTTCCCGCCCTGTCCTCA 1
NC18b TCCGTTCTGTTCCCGCCCTGTCCTCA 1
1010101010101010101010101010101010101010101010101010

Follow the steps listed below to generate the RDF file:
Open the MSA with DnaSP software;
File / Save export data as / Roehl File Format
Steps to generate your network
Open the Network software;
Data entry / Import rdf file / continue;
Find the rdf file generated with DnaSP and click Open;
Next, click Save. Rename the file and click Ok;
Close this window;
Click Calculate network / Network calculations / Median joining;
Click File / Open (choose the rdf file generated by DnaSP);
Click Calculate network;
Rename and save the file as *.out;
Click Draw network;
Click File / Open (find your *.out file) and click Open.
Following these steps, a haplotype network can be generated. Click at the bottom of the window to
finalize the drawn network.
When using different software packages, it is not necessary to create a separate input file for each one.
The output of one package can act as an input file for another, as described in the figure below.
Flow chart of possible data exchange between different population genetics softwares


6
SNPs detection, Genotyping and Submission
R.S. Kataria, S.K. Niranjan, S.K. Mishra and Karanveer Singh
ICAR-National Bureau of Animal Genetic Resources, Karnal (Haryana)

________________________________________________________________________________________

Restriction enzyme cutting (PCR-RFLP):


NEBcutter is an online tool, available at http://tools.neb.com/NEBcutter2/, for designing RFLP
protocols for an identified SNP. Because the method depends on the presence or absence of a
restriction site at the polymorphic site, it cannot be applied to every SNP discovered. The
software requires an input sequence at least 20-25 nucleotides long, with the polymorphic site
placed at the center. Two files are created, one for each allele, and the software is run on each to
display the restriction enzyme sites. The enzymes that cut, or fail to cut, at the polymorphic site are
documented for each allele, and comparing the two results identifies the enzymes able to
differentiate the two alleles in the sequence. The RFLP is then standardized on known homozygous
and heterozygous PCR-amplified products and the patterns are recorded. The test is then ready for
running on large numbers of unknown PCR-amplified samples of the same region. It is always better
to run the three known patterns, i.e. the two homozygotes (one for each allele) and a heterozygote,
alongside the unknown samples as controls to confirm RE digestion, because incomplete digestion
sometimes gives erroneous results.
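The comparison step described above (scan each allele for recognition sites, then keep the enzymes whose sites differ) can be sketched in a few lines of Python. This is only an illustration and not how NEBcutter itself is implemented: the enzyme list is a small hand-picked subset, and the two 21-nucleotide allele sequences are hypothetical.

```python
# Recognition sequences of a few common restriction enzymes (small illustrative subset).
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "HaeIII": "GGCC",
    "MspI": "CCGG",
    "TaqI": "TCGA",
    "AluI": "AGCT",
}


def site_positions(seq, site):
    """Return the 0-based start positions of every occurrence of site in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def discriminating_enzymes(allele_1, allele_2, enzymes=ENZYMES):
    """Return {enzyme: (sites in allele 1, sites in allele 2)} for enzymes whose sites differ."""
    result = {}
    for name, site in enzymes.items():
        p1 = site_positions(allele_1, site)
        p2 = site_positions(allele_2, site)
        if p1 != p2:
            result[name] = (p1, p2)
    return result


# Hypothetical 21-nt window with a C/G SNP at the centre (index 10).
allele_c = "ATGCATTAGGCCTTAGCATGA"
allele_g = "ATGCATTAGGGCTTAGCATGA"
print(discriminating_enzymes(allele_c, allele_g))
```

In this made-up example only HaeIII (GGCC) is reported, because its site is present in the C allele and abolished in the G allele.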

Figure 1: Input file for NEBcutter showing the restriction enzyme site, with the 4th nucleotide being
polymorphic (C/G).

Figure 2: Output file from NEBcutter showing the restriction enzyme sites at the polymorphic site due to
nucleotide C at the polymorphic site.


Figure 3: Input file for NEBcutter showing the restriction enzyme site, with the 4th nucleotide being
polymorphic (C/G).

Figure 4: Output file from NEBcutter showing the restriction enzyme sites at the polymorphic site due to
nucleotide G at the polymorphic site. Note the abolition of sites when it is C and the creation of a new
site when it is G.

A typical PCR-RFLP reaction includes the following components:

PCR product                        200-300 ng
10X restriction enzyme buffer      2.0 µl
Restriction enzyme                 1-2 units
Water                              to 20 µl

The reaction is carried out overnight at 37°C, or at the incubation temperature recommended for
the restriction enzyme being used. After completion of the reaction, the products are run on a 2-3%
agarose gel, depending upon the fragment sizes, along with a molecular size marker.
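When reading such a gel, it helps to work out the expected band sizes for the three genotypes beforehand. The following Python sketch does this for the simplest case of one polymorphic cut site within the amplicon; the function names and the 300 bp amplicon cut at position 180 are hypothetical and serve only as an illustration.

```python
def digest_fragments(amplicon_len, cut_positions):
    """Fragment lengths after cutting a linear amplicon at the given positions (bp from the 5' end)."""
    cuts = sorted(p for p in set(cut_positions) if 0 < p < amplicon_len)
    edges = [0] + cuts + [amplicon_len]
    return [b - a for a, b in zip(edges, edges[1:])]


def rflp_patterns(amplicon_len, snp_cut_position):
    """Expected band sizes (bp) for the three genotypes at a single polymorphic cut site."""
    cut = sorted(digest_fragments(amplicon_len, [snp_cut_position]), reverse=True)
    uncut = [amplicon_len]
    return {
        "homozygous cut": cut,
        "homozygous uncut": uncut,
        # A heterozygote carries one cut and one uncut allele, so all bands appear.
        "heterozygous": sorted(set(cut + uncut), reverse=True),
    }


# Hypothetical example: 300 bp amplicon, polymorphic site cut at position 180.
for genotype, bands in rflp_patterns(300, 180).items():
    print(genotype, bands)
```

Note that an incompletely digested "homozygous cut" sample shows the same three bands as a heterozygote, which is why known controls are run alongside the unknown samples.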
Tetra-ARMS PCR:
Primer designing
For designing primers of tetra-ARMS PCR, an online web tool is available at
http://cedar.genetics.soton.ac.uk/public_html/primer1.html, requiring the input sequence file
along with SNP position and nucleotide change at polymorphic site.


Figure 5: Tetra primers ARMS PCR primer designing tool.

Figure 6: An output file of tetra-ARMS PCR primer designing tool, showing four primers' choice along with
sequences, melting temperature and expected products size information.


Setting-up Tetra-ARMS PCR


A typical tetra-ARMS PCR reaction includes the following reactants and PCR conditions:

Template genomic DNA                 50-100 ng
Forward outer primer (10 pmol/µl)    1 µl
Reverse outer primer (10 pmol/µl)    1 µl
Forward inner primer (10 pmol/µl)    0.5 µl
Reverse inner primer (10 pmol/µl)    0.5 µl
10X PCR buffer                       2 µl
10 mM dNTPs                          0.5 µl
Taq DNA polymerase                   0.3 µl (1 unit)
Water                                to 20 µl

Cycling conditions are:
Initial denaturation at 95°C for 3 min.
Followed by 35 cycles of
Denaturation at 94°C for 30 sec.
Annealing at a temperature depending upon the primers for 30 sec.
Extension at 72°C for 1 min.
After the cycles of amplification, final extension at 72°C for 5 min.
Alternatively, a PCR master mix such as GoTaq from Promega can also be used for amplification,
sometimes yielding better results.
The products are run on a 2-3% agarose gel, depending upon the expected product sizes, along with
a molecular size marker. If the amplified products are too close in size, they may have to be run on
a polyacrylamide gel (PAGE), or alternatively another set of primers can be designed that yields
product sizes different enough to resolve on an agarose gel.
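When many samples are genotyped, the per-reaction volumes listed above are usually combined into a master mix. The Python sketch below scales those volumes to a given number of reactions. The template volume of 1 µl per reaction and the 10% pipetting overage are assumptions made for illustration and are not part of the protocol above.

```python
# Per-reaction volumes (ul) taken from the tetra-ARMS protocol above.
PER_REACTION_UL = {
    "Forward outer primer (10 pmol/ul)": 1.0,
    "Reverse outer primer (10 pmol/ul)": 1.0,
    "Forward inner primer (10 pmol/ul)": 0.5,
    "Reverse inner primer (10 pmol/ul)": 0.5,
    "10X PCR buffer": 2.0,
    "10 mM dNTPs": 0.5,
    "Taq DNA polymerase": 0.3,
}
FINAL_VOLUME_UL = 20.0


def master_mix(n_reactions, template_ul=1.0, overage=0.10):
    """Return total volumes (ul) of each master mix component, template excluded.

    template_ul and overage are assumed values; adjust them to the DNA
    concentration and pipetting practice of the laboratory.
    """
    water = FINAL_VOLUME_UL - sum(PER_REACTION_UL.values()) - template_ul
    if water < 0:
        raise ValueError("components exceed the final reaction volume")
    factor = n_reactions * (1 + overage)
    mix = {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()}
    mix["Water"] = round(water * factor, 2)
    return mix


for component, volume in master_mix(10).items():
    print(f"{component:36s}{volume:8.2f} ul")
```

Each tube then receives the final volume minus the template volume of master mix (19 µl under the assumptions above), followed by the template DNA.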
Submitting SNP data to dbSNP
The dbSNP database has two major classes of content: the first class is submitted data, i.e. original
observations of sequence variation, and the second class is computed content, i.e. content
generated during the dbSNP build cycle by computation on the original submitted data. Each
variation submitted to dbSNP must have an identifier provided by the submitter, called a local
identifier by dbSNP, and each submitted SNP is issued a unique identifier, formatted as an integer
prefixed with ss (for submitted SNP), for example, ss6759231. An ss number is thus permanently
associated with the submitter's identifier, and it can be treated as a formal accession number by the
scientific publishing community. Data submitted to dbSNP are clustered to provide a non-redundant
set of variations for each organism in the database. The computed content consists of refSNPs, other
computed data, and links that increase the utility of dbSNP. These clusters are maintained as
refSNPs in dbSNP in parallel to the underlying submitted data. refSNPs are distinguished from assay
submissions by an rs-prefixed (refSNP) accession number instead of the ss-prefixed (submitted
SNP) accession number assigned to individual submissions. refSNPs are thus compact sets of
identifiers that are used to annotate variations on other NCBI resources. A refSNP has a number of
summary properties that are computed over all cluster members. The entire refSNP set is exported
in many report formats on the FTP site [ftp.ncbi.nih.gov/snp/] and as sets of results through a
dbSNP batch query. Both refSNPs and submitted SNPs are maintained as FASTA databases for
BLAST searches [http://www.ncbi.nlm.nih.gov/SNP/snpblastByChr.html] of dbSNP. Terms used in
the documentation such as "submitted SNP" or "reference SNP" thus refer to all classes of variation
in the database and should be regarded as meaning "a submitted report of variation" and "a
reference report of variation". Each dbSNP entry includes the sequence context of the polymorphism
(i.e., the surrounding sequence), the occurrence frequency of the polymorphism by population or
individual, and the experimental method(s), protocols, and conditions used to assay the variation.
Online instructions are available for preparing a submission to dbSNP
[http://www.ncbi.nlm.nih.gov/SNP/get_html.cgi?whichHtml=how_to_submit]. A short tag or
abbreviation called the Submitter HANDLE uniquely defines each submitting laboratory and groups
the submissions within the database.
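When handling lists of variants, it is often useful to separate the three kinds of identifier described above (ss numbers, rs numbers and local submitter identifiers). The following Python sketch shows one possible way to do so. It only inspects the ss/rs prefix format and does not query dbSNP; the rs number and the local identifier in the example are hypothetical.

```python
import re

# ss/rs prefix followed by an integer, as described in the text above.
_ID_PATTERN = re.compile(r"^(ss|rs)(\d+)$", re.IGNORECASE)


def classify_dbsnp_id(identifier):
    """Classify an identifier as a submitted SNP (ss), a refSNP cluster (rs) or a local ID."""
    identifier = identifier.strip()
    match = _ID_PATTERN.match(identifier)
    if match is None:
        # Anything else is treated as a submitter-defined local identifier.
        return ("local", identifier)
    prefix, number = match.group(1).lower(), int(match.group(2))
    kind = "submitted SNP" if prefix == "ss" else "refSNP cluster"
    return (kind, number)


# ss6759231 is the example from the text; the other two identifiers are made up.
for name in ["ss6759231", "rs123", "LAB_GENE_EX3_C120G"]:
    print(name, classify_dbsnp_id(name))
```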

Figure 7: dbSNP home page @ http://www.ncbi.nlm.nih.gov/SNP/

The basic submission steps are:


1. Get a handle assignment from NCBI if your lab doesn't already have one. Send your handle
request to snp-admin@ncbi.nlm.nih.gov or use the online handle request form.
2. Prepare a submission file with your data and send it to snp-sub@ncbi.nlm.nih.gov. Several
submission scenarios and their respective file components are provided as a guide.
3. You will receive a submission report from NCBI indicating what was loaded into the database
and a list of error or warning messages if problems were encountered while processing your
submission file.
4. Resubmissions of corrected files (returned by NCBI because of excessive errors) should be sent
to snp-update@ncbi.nlm.nih.gov.

Figure 8: Submitting a Handle Request Form as part of SNP submission tool


Figure 9: Summary of current release (Build 142) of dbSNP, showing new submissions and build statistics @
http://www.ncbi.nlm.nih.gov/SNP/snp_summary.cgi.

Searching dbSNP
The SNP database can be explored from the dbSNP homepage [http://www.ncbi.nlm.nih.gov/SNP/]
by using the Entrez SNP search box at the top of the page or by using the links to eight basic dbSNP
search options located just below the Entrez SNP search box. For a single-record query in dbSNP,
the Search by IDs query module is used to select SNPs based on dbSNP record identifiers. These
include reference SNP (refSNP) cluster ID numbers (rs#), submitted SNP accession numbers (ss#),
and local (or submitter) IDs for the same variations. Different options for searching SNPs of
interest are described at
[http://www.ncbi.nlm.nih.gov/books/NBK44371/#Search.how_do_i_search_dbsnp].


7
Web Resources and Tools for Genomic Research
S K Niranjan, Manika Sehgal and R S Kataria
ICAR-National Bureau of Animal Genetic Resources, Karnal, Haryana

________________________________________________________________________________________
Before starting any molecular work, it is essential to obtain reference information on the nucleotide
sequence(s), gene(s), extragenic region(s), chromosome, genome, rRNA, cDNA, EST or amino acid
sequence(s) of the species concerned, or at least of common species. A number of databases
available on the internet can be searched for such data. Worldwide, three public databases bear the
major responsibility for storing and sharing almost all types of nucleotide and protein sequence data:
GenBank at the NCBI, the DNA Data Bank of Japan (DDBJ) and the European Molecular Biology
Laboratory (EMBL) Nucleotide Sequence Database at EBI, England. GenBank, one of the largest
databases, holds 173,353,076 non-WGS, non-CON records containing 161,822,845,643 base pairs of
sequence data. In addition, there are 175,779,064 WGS records containing 719,581,958,743 base
pairs of sequence data (GenBank Release 202.0; June, 2014). From 1982 to the present, the number
of bases in GenBank has doubled approximately every 18 months. Some other specific databases,
such as Whole Genome Shotgun (WGS), Ensembl and Pfam, are also available. For candidate gene
analysis, we generally use the NCBI and Ensembl databases to search for genomic data. Here, we list
different databases that can be used to search for reference sequences.
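The doubling time quoted above corresponds to exponential growth, N(t) = N0 x 2^(t/18) with t in months. The short Python sketch below applies this formula to the non-WGS base count given above. It is a simple extrapolation under the assumption that the 18-month doubling time continues to hold, not a statement about actual later GenBank releases.

```python
def projected_bases(current_bases, months_ahead, doubling_months=18.0):
    """Extrapolate a base count assuming a constant doubling time (in months)."""
    return current_bases * 2 ** (months_ahead / doubling_months)


# Non-WGS base pairs in GenBank Release 202.0 (June 2014), as quoted in the text.
bases_june_2014 = 161_822_845_643

for months in (18, 36, 54):
    print(f"after {months} months: about {projected_bases(bases_june_2014, months):.3e} bp")
```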
WEB RESOURCES
Databases used in genomic and proteomics research
NCBI
National Center for Biotechnology Information (NCBI). www.ncbi.nlm.nih.gov
GenBank
GenBank is the NIH genetic sequence database, an annotated collection of all
publicly available DNA sequences. www.ncbi.nlm.nih.gov/genbank/
RefSeq
NCBI Reference Sequence Database is a collection of sequences, which provides a
comprehensive, integrated, non-redundant, well-annotated set of sequences,
including genomic DNA, transcripts, and proteins. www.ncbi.nlm.nih.gov/refseq
PubMed
PubMed comprises more than 23 million citations for biomedical literature from
MEDLINE, life science journals, and online books. Citations may include links to
full-text content from PubMed Central and publisher web sites.
www.ncbi.nlm.nih.gov/pubmed
OMIM
OMIM is a comprehensive compendium of human genes and genetic phenotypes. Its
official home is omim.org. www.ncbi.nlm.nih.gov/omim
dbSNP
Database of single nucleotide polymorphisms (SNPs) and multiple small-scale
variations that include insertions/deletions, microsatellites, and non-polymorphic
variants. www.ncbi.nlm.nih.gov/snp/
EST
The EST database is a collection of short single-read transcript sequences from
GenBank. These sequences provide a resource to evaluate gene expression, find
potential variation, and annotate genes. http://www.ncbi.nlm.nih.gov/est
dbEST
dbEST is a division of GenBank that contains sequence data and other information
on "single-pass" cDNA sequences, or "Expressed Sequence Tags", from a number of
organisms. www.ncbi.nlm.nih.gov/genbank/dbest
WGS
Whole Genome Shotgun (WGS) sequencing projects are incomplete genomes or
incomplete chromosomes that are being sequenced by a whole genome shotgun
strategy. www.ncbi.nlm.nih.gov/genbank/wgs


HTG

The High Throughput Genomic (HTG) Sequences division contains unfinished DNA
sequences generated by the high-throughput sequencing centers. Sequence data in
this division are available for BLAST homology searches against either the "htgs"
database or the "month" database, which includes all new submissions for the prior
month. The division was created in a coordinated effort among the International
Nucleotide Sequence databases: DDBJ, EMBL, and GenBank.
www.ncbi.nlm.nih.gov/genbank/htgs
EMBL
European Bioinformatics Institute; Website: www.embl.org/
1000 Genomes - It is a catalog of shared human genetic variation in population groups worldwide.
www.1000genomes.org/
ArrayExpress - This is a database of functional genomics experiments that can be queried and the
data downloaded. It includes gene expression data from microarray and high
throughput sequencing studies.
www.ebi.ac.uk/arrayexpress/
Database of Genomic Variants archive - The Database of Genomic Variants archive (DGVa) is a
repository that provides archiving, accessioning and distribution of publicly
available genomic structural variants, in all species.
www.ebi.ac.uk/dgva/
PromoterWise - It compares two DNA sequences allowing for inversions and translocations, ideal
for promoters.
EBI Metagenomics - A resource for the analysis and archiving of metagenomic data.
www.ebi.ac.uk/metagenomics/
EMBOSS Tools - Selected EMBOSS tools for sequence analysis, providing: pairwise sequence
alignment, sequence format conversion, sequence translation and back-translation,
and sequence statistics. www.ebi.ac.uk/Tools/emboss/
Ensembl
The Ensembl project produces genome databases for vertebrates and other
eukaryotic species. www.ensembl.org/index.html
European Nucleotide Archive - http://www.ebi.ac.uk/ena/home. The European Nucleotide
Archive (ENA) provides a comprehensive record of the world's nucleotide
sequencing information, covering raw sequencing data, sequence assembly
information and functional annotation.
Immuno Polymorphism Database - http://www.ebi.ac.uk/ipd/ The Immuno Polymorphism
Database (IPD) was developed in 2003 to provide a centralised system for the study
of polymorphism in genes of the immune system. The IPD project was established by
the HLA Informatics Group of the Anthony Nolan Research Institute in close
collaboration with the European Bioinformatics Institute.
Pfam
http://pfam.sanger.ac.uk/ The Pfam database is a large collection of protein families,
each represented by multiple sequence alignments and hidden Markov models
(HMMs). (Sanger Centre)
Rfam
http://rfam.sanger.ac.uk/ This database is a collection of RNA families, each
represented by multiple sequence alignments, consensus secondary structures and
covariance models (CMs).
GenomeNet www.genome.jp. GenomeNet is a Japanese network of database and computational
services for genome research and related research areas in biomedical sciences,
operated by the Kyoto University Bioinformatics Center.
GenomeNet Database Resources
DBGET: Integrated Database Retrieval System


KEGG: Kyoto Encyclopedia of Genes and Genomes
www.genome.jp/kegg/pathway.html
KEGG PATHWAY - Systems information: pathways
KEGG BRITE - Systems information: ontologies
KEGG Organisms - Organism-specific entry points
KEGG GENES - Genomic information
KEGG LIGAND - Chemical information
GenomeNet Bioinformatics Tools
KEGG
A database resource for understanding high-level functions and utilities of the
biological system, such as the cell, the organism and the ecosystem, from
molecular-level information. It is used for mapping molecular datasets in genomics,
transcriptomics, proteomics and metabolomics for biological interpretation.
KEGG PATHWAY A collection of manually drawn pathway maps for various prokaryotes and
eukaryotes representing molecular interaction and reaction networks for Metabolism
(Carbohydrate, Energy, Lipid, Nucleotide, Amino acid, Other amino acid, Glycan,
Cofactor/vitamin, Other secondary metabolite, Xenobiotics, Chemical structure),
Genetic Information Processing, Environmental Information Processing, Cellular
Processes, Organismal Systems and Human Diseases.
DDBJ (DNA Data Bank of Japan) http://www.ddbj.nig.ac.jp/
DNA Data Bank of Japan (DDBJ) is the sole nucleotide sequence data bank in Asia
that is officially certified to collect nucleotide sequences from researchers and to
issue internationally recognized accession numbers to data submitters. Since DDBJ
exchanges the collected data with ENA/EBI (European Bioinformatics Institute) and
NCBI (National Center for Biotechnology Information) on a daily basis, the three data
banks share virtually the same data at any given time. The virtually unified database
is called "INSD; International Nucleotide Sequence Database". DDBJ collects
sequence data mainly from Japanese researchers, but also accepts data from, and issues
accession numbers to, researchers in any other country.
GenomeReviews-European Bioinformatics Institute, www.ebi.ac.uk/GenomeReviews
UniProt
European Bioinformatics Institute, www.uniprot.org/. The mission of UniProt is to
provide the scientific community with a comprehensive, high-quality and freely
accessible resource of protein sequence and functional information.
UniProtKB - The protein knowledgebase consists of two sections: Swiss-Prot, which is
manually annotated and reviewed, and TrEMBL, which is automatically annotated and is
not reviewed. It includes complete and reference proteome sets.
UniProt/SwissProt - (Swiss Institute of Bioinformatics) www.ebi.ac.uk/swissprot/
Protein Data Bank in Europe (PDBe) - http://www.ebi.ac.uk/pdbe
PDBe is the European resource for the collection, organisation and dissemination of
data on biological macromolecular structures. In collaboration with the other
worldwide Protein Data Bank (wwPDB) partners - the Research Collaboratory for
Structural Bioinformatics (RCSB) and BioMagResBank (BMRB) in the USA and the
Protein Data Bank of Japan (PDBj) - PDBe works to collate, maintain and provide access
to the global repository of macromolecular structure data.
PDBj
Protein Data Bank Japan; http://pdbj.org/
It maintains a centralized PDB archive of macromolecular structures and provides
integrated tools, in collaboration with the RCSB and the BMRB in the USA and the PDBe in
the EU. PDBj is supported by JST-NBDC and Osaka University.


wwPDB
http://www.wwpdb.org/
The Worldwide Protein Data Bank (wwPDB) consists of organizations that act as
deposition, data processing and distribution centers for PDB data. Members are:
RCSB PDB (USA), PDBe (Europe), PDBj (Japan) and BMRB (USA). The wwPDB's
mission is to maintain a single PDB archive of macromolecular structural data that is
freely and publicly available to the global community.
PDB-Protein Data Bank - www.rcsb.org/ An information portal to biological macromolecular
structures.
PROSITE (Swiss Institute of Bioinformatics) prosite.expasy.org/
UniProt/PIR (National Biomedical Research Foundation) http://pir.georgetown.edu/
Primer Designing
A number of primer designing tools are available on the internet. Most of the designing tools or
programmes are paid, but some are free to use online. Among the programmes freely available
online, Primer3, Primer-BLAST and PerlPrimer are the easiest and most user-friendly.
PRIMER3 programme (http://bioinfo.ut.ee/primer3-0.4.0/)
It is a widely used program for designing PCR primers. It can also design
hybridization probes and sequencing primers. Primer3 has many different input
parameters that you control and that tell Primer3 exactly what characteristics make
good primers for your goals. This programme gives a choice to specify the target, such
as a simple sequence repeat site or SNP, a specific region to exclude, the included
region, product size, 3' stability of the primer, primer size, melting temperature (Tm),
primer GC content, complementarity etc.
Primer-BLAST It was developed at NCBI to help users make primers that are specific to the input
PCR template. It uses Primer3 to design PCR primers and then submits them to
BLAST search against user-selected database. The blast results are then
automatically analyzed to avoid primer pairs that can cause amplification of targets
other than the input template.
PerlPrimer
It is an open-source GUI application written in Perl that designs primers for
standard PCR, bisulphite PCR, real-time PCR (QPCR) and sequencing. It aims to
automate and simplify the process of primer design.
OLIGO Primer Analysis Software
It is a tool for designing and analyzing sequencing and PCR
primers, synthetic genes, and various kinds of probes including siRNA and
molecular beacons. Based on up-to-date nearest neighbor thermodynamic
data, Oligo's search algorithms find optimal primers for PCR, including TaqMan,
highly multiplexed, consensus or degenerate primers. Multiple file batch processing
is possible. It is also a useful tool for site directed mutagenesis.
ExonPrimer It helps to design intronic primers for the PCR amplification of exons. The script
needs a cDNA and the corresponding genomic sequence as input. It aligns these
sequences using Blat and designs PCR primers to amplify each exon using Primer3.
The positions of the exons are deduced from the alignment of the genomic and the
cDNA sequences. Insertions/deletions up to 6 base pairs are bridged by
postprocessing. Exons with small introns in-between are combined. The user can
define the maximum exon size. Exons larger than this size will be divided into
several parts.
GeneFisher (Interactive PCR Primer Design) is another good site for primer designing. There are
certain programmes that allow primer designing from amino acid sequences.
A few such sites are: Reverse Translate a Protein, iCODEHOP.

GenScript Real-time PCR (TaqMan) Primer Design:


This online tool helps you to design primers and probes for your Real-time PCR
(TaqMan) experiments. You can customize the potential PCR amplicon's size range,
Tm (melting temperature) for the primers and probes, as well as the organism. You
can also decide how many primer/probe sets you want the tool to return. Using the
GenBank accession number to input the target sequence is recommended.
However, you can choose to input the sequences manually in raw format.
RealTimeDesign and QuantPrime are other programmes for designing primers for Real Time-PCR.
PCR reactions can also be set up easily by using the PCR Box Titration Calculator and PCR
Reaction Mixture Setup programmes.
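Some of the primer characteristics mentioned above for Primer3 (primer size, GC content and melting temperature) can be checked quickly by hand or with a few lines of Python. The sketch below uses the simple Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a rough estimate for short oligonucleotides; it is an assumption made for illustration, since Primer3 and OLIGO rely on nearest neighbor thermodynamic calculations. The primer sequence is hypothetical.

```python
def gc_content(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


# Hypothetical 20-mer used only to illustrate the calculation.
primer = "AGCTGACCTGAGGTCATGCA"
print(len(primer), "nt; GC =", gc_content(primer), "%; Tm ~", wallace_tm(primer), "deg C")
```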
Sequence Analysis
The sequence obtained after sequencing needs to be checked manually by using the programme
Chromas (www.technelysium.com.uk). In this programme the nucleotide sequence is displayed as
peaks of different colours, each colour reflecting a different nucleotide. It should be ensured that
the peaks are prominent enough. Chromas also allows the sequence to be examined for
heterozygous positions, which appear as two overlapping peaks at the same nucleotide position. If
necessary, the sequence can be edited. After confirming the sequence, it can be analysed further
through BLAST or by alignment with other homologous sequences.
ORF finding
There are certain graphical analysis tools that find all open reading frames (ORFs) of a sequence.
These tools identify all open reading frames using the standard or alternative genetic codes. The
DNA sequence is first transcribed into RNA and then translated into all the potential ORFs encoded
within each of the six translation frames (3 in the forward direction and 3 in the reverse direction),
and the longest protein-coding sequence is ultimately reported. Some programmes are:
ORF Finder
http://www.ncbi.nlm.nih.gov/gorf/gorf.html
StarORF
http://star.mit.edu/orf/
ORF Finder
http://www.bioinformatics.org/sms2/orf_find.html
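The six-frame search described above can be illustrated with a minimal Python sketch. It is not the algorithm of any of the listed programmes: it works directly on the DNA sequence, uses only the standard stop codons, and reports only ORFs that begin with ATG and end with a stop codon. The test sequence is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame):
    """Yield (start, end) of ATG...stop ORFs in one frame; end is exclusive and includes the stop."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            yield (start, i + 3)
            start = None


def find_orfs(dna, min_codons=1):
    """Return (strand, frame, start, end, sequence) for ORFs in all six frames, longest first.

    Coordinates on the "-" strand refer to the reverse-complemented sequence.
    """
    dna = dna.upper()
    found = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            for start, end in orfs_in_frame(seq, frame):
                if (end - start) // 3 - 1 >= min_codons:  # codons excluding the stop
                    found.append((strand, frame + 1, start, end, seq[start:end]))
    return sorted(found, key=lambda orf: len(orf[4]), reverse=True)


# Hypothetical sequence with one short ORF in forward frame 3.
for orf in find_orfs("CCATGAAATTTGGGTAACC"):
    print(orf)
```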
Sequence Alignment
Sequence similarity searching: It is a method of searching sequence databases by using alignment
to a query sequence. By statistically assessing how well database and query sequences match one
can infer homology and transfer information to the query sequence.
FASTA
FASTA is a commonly used sequence similarity search tool that uses heuristics for
fast local alignment searching. Available for protein, nucleotide, genome and whole
genome shotgun databases.
SSEARCH
is an optimal (as opposed to heuristics-based) local alignment search tool using the
Smith-Waterman algorithm. Optimal searches guarantee you find the best alignment
score for your given parameters. Available for protein, nucleotide, genome and whole
genome shotgun databases.
PSI-Search
combines the sensitivity of the Smith-Waterman search algorithm (SSEARCH) with
the PSI-BLAST profile construction strategy to find distantly related protein
sequences.
GGSEARCH performs optimal global-global alignment searches using the Needleman-Wunsch
algorithm.
GLSEARCH performs an optimal sequence search using alignments that are global in the query
but local in the database sequence. This can be useful when you want to match all of
a short query sequence to part of a larger database sequence.


FASTM/S/F These specialist programs allow searches of databases using a group of short
peptides as the query.
BLAST
NCBI BLAST is the most commonly used sequence similarity search tool. It uses
heuristics to perform fast local alignment searches. Available for protein, nucleotide
and vector databases.
WU-BLAST is similar to NCBI BLAST but combines multiple parameter options into a simpler
'sensitivity' setting. Available for protein and nucleotide databases.
PSI-BLAST allows users to construct and perform a BLAST search with a custom,
position-specific scoring matrix, which can help find distant evolutionary relationships.
PHI-BLAST functionality is also available to restrict results using patterns. Available for
protein databases.
Statistical Analysis of Protein Sequences (SAPS) http://www.ebi.ac.uk/Tools/seqstats/saps/
SAPS evaluates a wide variety of protein sequence properties using statistics.
Properties considered include compositional biases, clusters and runs of charge and
other amino acid types, different kinds and extents of repetitive structures, locally
periodic motifs, and anomalous spacings between identical residue types.
Sequence Analysis from GenomeNet Database Resources www.genome.jp
BLAST / FASTA - Sequence similarity search
MOTIF - Sequence motif search
CLUSTALW / MAFFT / PRRN - Multiple alignment
Alignment
Clustal Omega is a new multiple sequence alignment program that uses seeded guide trees and
HMM profile-profile techniques to generate alignments.
http://www.ebi.ac.uk/Tools/msa/clustalo/
ClustalW2-Phylogeny
Commonly used phylogenetic tree generation methods provided by the
ClustalW2 program.
http://www.ebi.ac.uk/Tools/phylogeny/clustalw2_phylogeny/
DaliLite
Pairwise alignment of protein structures. DaliLite computes optimal and suboptimal
structural alignments between two protein structures. It compares all chains in the
first structure against all chains in the second (unless specific chain IDs are given).
The resulting superimposed coordinate files can be downloaded or viewed
interactively in Jmol. http://www.ebi.ac.uk/Tools/structure/dalilite/
Multiple Sequence Alignment (MSA) http://www.ebi.ac.uk/Tools/msa/
Clustal Omega
New MSA tool that uses seeded guide trees and HMM profile-profile techniques to
generate alignments. Suitable for medium-large alignments.
ClustalW2
Popular MSA tool that uses tree-based progressive alignments. Suitable for medium
alignments.
DbClustal
Create a Multiple Sequence Alignment from a protein BLAST result using the
DbClustal program.
Kalign
Very fast MSA tool that concentrates on local regions. Suitable for large alignments.
MAFFT
MSA tool that uses Fast Fourier Transforms. Suitable for medium-large alignments.
MUSCLE
Accurate MSA tool, especially good with proteins. Suitable for medium alignments.
MView
Transform a Sequence Similarity Search result into a Multiple Sequence Alignment
or reformat a Multiple Sequence Alignment using the MView program.
T-Coffee
Consistency-based MSA tool that attempts to mitigate the pitfalls of progressive
alignment methods. It is suitable for small alignments.
WebPRANK The EBI has a new phylogeny-aware multiple sequence alignment program which
makes use of evolutionary information to help place insertions and deletions.
Pairwise Sequence Alignment http://www.ebi.ac.uk/Tools/psa/

Global Alignment: Global alignment tools create an end-to-end alignment of the sequences to be
aligned. There are separate forms for protein or nucleotide sequences.
Needle
EMBOSS Needle creates an optimal global alignment of two sequences using the
Needleman-Wunsch algorithm.
Stretcher
Stretcher uses a modification of the Needleman-Wunsch algorithm that allows larger
sequences to be globally aligned.
Local Alignment: Local alignment tools find one, or more, alignments describing the most similar
region(s) within the sequences to be aligned. There are separate forms for protein or
nucleotide sequences.
Water
Water uses the Smith-Waterman algorithm (modified for speed enhancements) to
calculate the local alignment of two sequences.
Matcher
Matcher identifies local similarities between two sequences using a rigorous
algorithm based on the LALIGN application.
LALIGN
LALIGN finds internal duplications by calculating non-intersecting local alignments
of protein or DNA sequences.
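The difference between the global (Needleman-Wunsch) and local (Smith-Waterman) approaches mentioned above lies mainly in how the dynamic programming table is initialised and whether negative scores are reset to zero. The Python sketch below computes only the best alignment score for both modes. The scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions for illustration; the EMBOSS tools use substitution matrices and separate gap opening and extension penalties.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score of a and b by dynamic programming with a linear gap penalty.

    local=False gives the Needleman-Wunsch (global) score,
    local=True gives the Smith-Waterman (local) score.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalised.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


print(alignment_score("ACGT", "ACT"))                            # global score
print(alignment_score("TTTACGTAAA", "CCACGTCC", local=True))     # local score
```

With these hypothetical sequences, the global score reflects three matches and one gap, while the local score reflects only the shared ACGT segment.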
Genomic alignment tools concentrate on DNA (or to DNA) alignments while accounting for
characteristics present in genomic data.
Wise2DBA
Wise2DBA (DNA Block Aligner) aligns two sequences under the assumption that the
sequences share a number of colinear blocks of conservation separated by potentially
large and varied lengths of DNA in the two sequences.
GeneWise
GeneWise compares a protein sequence to a genomic DNA sequence, allowing for
introns and frameshifting errors.
SNP Analysis
HaploBlock SNP Haplotyping and Linkage Disequilibrium Mapping using Models of Haplotype
Block Variation. HaploBlock is a software program which provides an integrated
approach to haplotype block identification, haplotyping SNPs (or haplotype phasing,
resolution or reconstruction) and linkage disequilibrium (LD) mapping (or genetic
association studies). HaploBlock is suitable for high density haplotype or genotype
SNP marker data and is based on a statistical model which takes account of
recombination hotspots, bottlenecks, genetic drift and mutations and has a Markov
Chain at its core.
bioinfo.cs.technion.ac.il/haploblock/
POPGENE
It is user-friendly computer freeware for the analysis of genetic variation among and
within populations using co-dominant and dominant markers. It computes both
comprehensive genetic statistics (e.g., allele frequency, gene diversity, genetic
distance, G-statistics, F-statistics) and complex genetic statistics (e.g., gene flow,
neutrality tests, linkage disequilibria, multi-locus structure).
http://www.ualberta.ca/~fyeh/
ARLEQUIN Population genetic analysis package that includes haplotype estimation by the
expectation maximization (EM) algorithm and LD analysis for locus pairs, with
significance tested by a permutation method. http://anthro.unige.ch/arlequin/
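Two of the basic statistics computed by such packages, allele frequency and gene diversity (expected heterozygosity, He = 1 - sum of squared allele frequencies), can be calculated directly from genotype counts, for example those obtained by PCR-RFLP. The Python sketch below shows the arithmetic for a single locus. The function names and the genotype counts are hypothetical, and the simple formula shown here omits the small-sample correction that some packages apply.

```python
def allele_frequencies(genotype_counts):
    """genotype_counts maps (allele1, allele2) to the number of animals; returns {allele: frequency}."""
    totals = {}
    for (a1, a2), n in genotype_counts.items():
        totals[a1] = totals.get(a1, 0) + n
        totals[a2] = totals.get(a2, 0) + n
    n_alleles = sum(totals.values())
    return {allele: count / n_alleles for allele, count in totals.items()}


def expected_heterozygosity(freqs):
    """Gene diversity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def observed_heterozygosity(genotype_counts):
    """Proportion of animals carrying two different alleles."""
    total = sum(genotype_counts.values())
    het = sum(n for (a1, a2), n in genotype_counts.items() if a1 != a2)
    return het / total


# Hypothetical genotype counts for a C/G SNP in 100 animals.
counts = {("C", "C"): 50, ("C", "G"): 40, ("G", "G"): 10}
freqs = allele_frequencies(counts)
print(freqs, expected_heterozygosity(freqs), observed_heterozygosity(counts))
```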
Protein Analysis Web Resources
The protein functional analysis tools listed here are provided by EBI through its bioinformatics
analysis tools framework. At present, only a subset of the protein functional analysis tools
available at EBI is offered in this framework.
http://www.ebi.ac.uk/Tools/pfa/
CENSOR
Identify and/or mask repeat sequences in protein sequence data.

FingerPRINTScan Identify the closest matching PRINTS sequence motif fingerprints in a protein
sequence.
InterProScan Protein Functional Analysis using the InterProScan program.
Phobius
Prediction of transmembrane topology and signal peptides using the Phobius
program.
Pratt
Search patterns conserved in sets of unaligned protein sequences.
PROSITE Scan Compare a protein sequence against the signatures in PROSITE.
RADAR
Detection and alignment of repeats in protein sequences.
InterPro
Protein sequence analysis and classification www.ebi.ac.uk/interpro/
InterPro
Provides functional analysis of proteins by classifying them into families and
predicting domains and important sites. It combines protein signatures from a
number of member databases into a single searchable resource, capitalising on their
individual strengths to produce a powerful integrated database and diagnostic tool.
PRF SEQDB Protein Research Foundation, Japan
www.prf.or.jp/seqdb-e.html
ExplorEnz
Trinity College Dublin
www.enzyme-database.org/
EPD
Swiss Institute for Experimental Cancer Research http://epd.vital-it.ch/
ProDom
Rhone-Alpes Bioinformatics Center
prodom.prabi.fr
AAindex
Kyoto University Bioinformatics Center
www.genome.jp/aaindex/
LinkDB
Kyoto University Bioinformatics Center
www.genome.jp/linkdb/
BLASTp
Protein database searching tool
NRDB
Database containing non-redundant protein sequences
DEG
Database of essential genes required for survival of the organism
COBALT
Multiple protein sequence alignment tool
Pfam
The Pfam database is a large collection of protein families, each represented by
multiple sequence alignments and hidden Markov models. It can be used for
prediction of Pfam families, motifs, repeats and clans.
Motif Search
for prediction of protein motifs
ProtParam
for prediction of physico-chemical attributes of the protein
GlobPlot
Protein globularity analysis
PDB
Database of 3D structures of biomolecules
ClustalW
Multiple sequence alignment tool
HHpred
for protein homology detection and structure prediction
MODELLER
for 3D structure prediction of proteins
PROCHECK
Evaluates the stereochemical quality of a protein structure based on the Ramachandran plot
ProSA
for recognition of errors in the 3D structure of a protein
metaPocket 2.0
for predicting the active site of a protein
AutoDock Vina
Virtual screening of ligands against protein receptors
ZINC
Ligand database
LigPlot
for protein-ligand interaction analysis
MATLAB
to calculate properties of peptide sequences, amino acid composition, ORF finding
GROMACS
to perform molecular dynamics of proteins, specifically bonding and interactions
Discovery Studio
comprehensive life science modelling and simulation suite of applications for the drug discovery process: protein-ligand docking, protein homology modelling, protein-protein docking, antibody modelling

WEB TOOLS
Candidate genes can be analysed in many ways using the bioinformatics tools and programmes available on the web. However, using a programme without understanding its underlying concept or principle can easily lead to misinterpretation of the data. Before analysing data with any web-based programme, it is therefore essential to read its documentation: the method or model on which it is based, its strengths (what kind of data it can analyse) and its weaknesses or constraints. These programmes are also upgraded frequently, so it is equally important to keep track of the current versions.
Primer designing using PRIMER3 programme
The specificity and efficiency of a primer depend on several factors that must be taken into account during design. The optimal length of general PCR primers is 18-24 bases; for multiplexing, primers may be as long as 30-35 bases. A primer that is too short has low specificity and therefore gives non-specific amplification. Conversely, very long primers bind the template less efficiently at normal annealing temperatures because they are more likely to form secondary structures such as hairpins. Longer primers also need more time to anneal to the complementary target sequence and to denature in the next cycle, which reduces the yield of amplicon. In general, the optimal G/C content is 45-55%, with an acceptable range of 40-60%. The G/C content largely determines the melting temperature (Tm) and hence the annealing temperature. The Tm difference between the two primers should be less than 5 °C, preferably within 2 °C, because a Tm mismatch can lead to poor amplification: the primer with the higher Tm will misprime at lower temperatures, while the primer with the lower Tm may not anneal at higher temperatures. The 3′-terminus of the primer is very important, since DNA synthesis proceeds in the 5′ to 3′ direction and extension starts from the 3′ end. A G/C clamp refers to the presence of G or C within the last 4 bases from the 3′-end of the primer; it prevents mispriming and enhances specific primer-template binding.
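The guideline ranges above can be checked quickly before a sequence is submitted to Primer3. The following is a minimal sketch, not the method used by Primer3: it assumes the simple Wallace rule, Tm = 2(A+T) + 4(G+C), as a rough Tm estimate, whereas Primer3 relies on nearest-neighbour thermodynamic models. The function names and the two primer sequences are hypothetical and serve only as illustration.

```python
def gc_content(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * sum(primer.count(base) for base in "GC") / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough Tm in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def has_gc_clamp(primer: str, window: int = 4) -> bool:
    """True if a G or C occurs within the last `window` bases at the 3' end."""
    return any(base in "GC" for base in primer.upper()[-window:])


def check_primer_pair(forward: str, reverse: str) -> dict:
    """Compare both primers with the guideline ranges given in the text."""
    report = {}
    for name, primer in (("forward", forward), ("reverse", reverse)):
        report[name] = {
            "length_ok": 18 <= len(primer) <= 24,        # general PCR primers
            "gc_ok": 40.0 <= gc_content(primer) <= 60.0,  # acceptable G/C range
            "gc_clamp": has_gc_clamp(primer),
            "tm": wallace_tm(primer),
        }
    tm_gap = abs(report["forward"]["tm"] - report["reverse"]["tm"])
    report["tm_difference_ok"] = tm_gap < 5  # less than 5 degrees C apart
    return report


# Hypothetical primer pair, for illustration only
fwd = "AGCTTGACCATGGCTAGCTA"
rev = "TCGATCGGATCCATGCAAGT"
print(check_primer_pair(fwd, rev))
```

A pair that fails any of these checks is worth redesigning before moving on to checks for hairpins and primer-dimers, which this sketch does not cover.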
Steps for Primer Designing:
1. Open the online Primer3 (version 4) software using the URL http://www.frodo.wi.mit.edu/.
2. Paste the nucleotide sequence (in FASTA format) into the source sequence box on the Primer3 page.

Primer3 home-page for primer designing

3. Set the required parameters; the default values can be retained where no change is needed.
4. Click on the Pick Primers option to get the primers.
5. The output window will show 3-5 sets of primers. The preferred primer set can be selected based on product size, the target region covered and the other primer parameters.

Primer3 output showing the left and right primers

Sequence Submission
Sequences may be submitted to any of the three major sequence databases, i.e. NCBI GenBank, DDBJ and EMBL; GenBank is the preferred route here. Sequences can be submitted to GenBank using either BankIt or Sequin.
BankIt: It is the online submission tool at NCBI. BankIt is used for a single sequence or a small batch of different sequences, and is the preferred method if the feature annotation for your sequences is not complicated. The submitter needs to open an account. Once a submitter registers to use BankIt, the submitter's contact information is saved and is automatically displayed each subsequent time the submitter logs in to submit. BankIt allows submitters to navigate and edit previously visited pages. Sequence data can be either cut-and-pasted as text or uploaded as a file. BankIt does not have a direct update option. The GenBank Submissions Handbook [Internet] can be consulted for GenBank submission using either BankIt or Sequin (http://www.ncbi.nlm.nih.gov/books/NBK63585/).

BankIt home page for submitting the sequence(s)

Sequin: Sequin (http://www.ncbi.nlm.nih.gov/Sequin/) is a stand-alone software tool developed by the National Center for Biotechnology Information (NCBI) for submitting and updating sequences to the GenBank, EMBL, and DDBJ databases. Sequin is the preferred method of off-line submission when a sequence or set of sequences is complex. It is capable of handling simple submissions that contain a single short mRNA sequence, as well as complex submissions containing long sequences, multiple annotations, gapped sequences, or phylogenetic and population studies. It also allows sequence editing and updating, and provides complex annotation capabilities. In addition, Sequin contains a number of built-in validation functions for enhanced quality assurance. A single Sequin file should contain fewer than 10,000 sequences for maximum performance. The software, along with instructions, can be downloaded from ftp://ftp.ncbi.nih.gov/sequin/.

Sequin home page

Sequin sequence submission start page

The input sequence file for Sequin has to be in FASTA format. For a protein-coding gene sequence, a separate FASTA file of the amino acid sequence is also required. Drop-down menus are provided to select the required information such as species name and sequence features like CDS start site, completeness of sequence, UTRs etc. Once the Sequin file is in order, i.e. error free, a message at the end prompts you to submit the file to NCBI by email at gb-sub@ncbi.nlm.nih.gov. The data can be withheld from the public domain until a release date specified by the submitter. Once the sequence file is submitted to GenBank, an accession number is assigned and sent by email to the submitting author within 2-3 days of submission. Later, after the file is processed, a flat file is sent for checking and approval, and the sequence is released in the public domain on the due date or immediately after publication of the data, whichever is earlier. Larger submissions should be made with the command-line program tbl2asn, which automates the creation of sequence records for submission to GenBank and uses many of the same functions as Sequin.
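The two FASTA inputs mentioned above (the nucleotide sequence and its amino acid translation) can be prepared with a few lines of code. The sketch below is only illustrative: it assumes the standard genetic code (mitochondrial genes need a different table), a coding sequence that starts in frame at its first base, and hypothetical record names and sequence. It does not add any Sequin-specific annotation to the definition line.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence up to the first stop codon."""
    cds = cds.upper()
    protein = []
    for start in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[start:start + 3], "X")  # X = unknown codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def fasta_record(name: str, sequence: str, width: int = 60) -> str:
    """Format one FASTA record with a '>' definition line and wrapped sequence."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Hypothetical coding sequence, for illustration only
cds = "ATGGCCAAATAA"
print(fasta_record("example_gene_cds", cds), end="")
print(fasta_record("example_gene_protein", translate(cds)), end="")
```

Each record can then be saved to its own text file and loaded into Sequin as the nucleotide and protein input respectively.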
PolyPhen-2 (Polymorphism Phenotyping v2): It is a tool that predicts the possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations. http://genetics.bwh.harvard.edu/pph2/

Phylogenetic Analysis
A phylogenetic tree is a graph composed of branches and nodes. Phylogenetic analysis aims to deduce, from molecular sequence data, the tree that describes the evolution of genes and protein families. It can also estimate the time of divergence between organisms since they last shared a common ancestor. In practice, we generate an inferred tree from the available data under some model, and this tree should be as close as possible to the true tree that reflects the actual events of evolution. The number of possible trees grows very quickly with the number of sequences: for example, with 10 taxonomic units, about 34 million rooted trees are possible. An exhaustive search examines all possible trees and selects the one with the most optimal features, such as the shortest overall sum of the branch lengths.
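The figure of about 34 million rooted trees for 10 taxa follows from the standard counting formulas for strictly bifurcating trees: (2n - 3)!! rooted trees and (2n - 5)!! unrooted trees for n taxa, where !! denotes the double factorial. A short sketch (function names are illustrative) shows how quickly these numbers grow and why an exhaustive search soon becomes impractical.

```python
def double_factorial(k: int) -> int:
    """Product k * (k - 2) * (k - 4) * ... down to 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def rooted_trees(n_taxa: int) -> int:
    """Number of rooted bifurcating trees for n_taxa >= 2: (2n - 3)!!"""
    return double_factorial(2 * n_taxa - 3)


def unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted bifurcating trees for n_taxa >= 3: (2n - 5)!!"""
    return double_factorial(2 * n_taxa - 5)


for n in (4, 6, 8, 10, 20):
    print(n, rooted_trees(n), unrooted_trees(n))

print(rooted_trees(10))  # 34459425, i.e. about 34 million
```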
A phylogenetic tree is developed in five stages, viz. selection of sequences for analysis, multiple sequence alignment, determination of a statistical model of nucleotide/amino acid evolution, tree building and tree evaluation. Multiple alignment is a critical step in phylogenetic analysis: if a group of sequences is misaligned, the resulting tree will not reflect true biological evolution. Make sure that all the sequences are homologous. This can be tested by performing pairwise alignments and checking that the expect value (E-value) is significant. Always remove non-homologous sequence(s) from the group. Proteins that share a domain but not the other regions should be analysed for the shared domain only. For a large number of sequences an exhaustive search is not feasible, and a heuristic algorithm is used to find a near-optimal tree by discarding vast numbers of unpromising trees without examining them. Heuristic algorithms have an inherent trade-off between search time and confidence in the search result; one can assume that they provide an approximation of the best tree. Phylogenetic trees are built using two kinds of methods, viz. distance based and character based. Distance-based methods work on the number of DNA or amino acid differences found in pairwise comparisons. Commonly used distance-based methods are Minimum Evolution (ME), Fitch-Margoliash (FM), UPGMA and Neighbor Joining (NJ); the NJ method is very fast. Maximum Parsimony and Maximum Likelihood are commonly used character-based methods. The UPGMA method assumes that the rate of evolution has remained constant throughout the evolutionary history of the included sequences/taxa and therefore produces a rooted tree. Maximum Parsimony can be used when sequence similarity is very high, whereas Maximum Likelihood may be better when sequence similarity is very low. The reliability of a tree can be evaluated by the bootstrap method, which estimates how consistently the members of a clade group together when the alignment is resampled. The higher the score (the closer to 100), the more reliable the grouping of the branches.
Phylogenetic Analysis using MEGA 6: Phylogenetic trees can be drawn using various software programmes. MEGA6 includes many statistical methods for the study of molecular evolution. It may be downloaded from www.megasoftware.net. It also contains a fully functional web browser for retrieving sequence(s) directly from the web, and gives direct access to NCBI for sequence alignment and for inferring the phylogenetic tree. In MEGA 6, phylogenetic trees are derived using models of DNA or amino acid substitution. These models are also used to estimate the evolutionary distance between sequences and the divergence time. Commonly used substitution models are Number of differences, p-distance, Jukes-Cantor, Tajima-Nei, Kimura 2-parameter, Tamura 3-parameter, Tamura-Nei, Maximum Composite Likelihood and Nei-Gojobori for nucleotide sequences, and Number of differences, p-distance, Poisson and Dayhoff for amino acid sequences.
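To make the idea of a substitution model concrete, the two simplest nucleotide distances in the list above can be computed by hand. The p-distance is the proportion of aligned sites that differ, and the Jukes-Cantor distance corrects it for multiple substitutions at the same site: d = -(3/4) ln(1 - 4p/3). The sketch below is illustrative only (hypothetical sequences and function names); it skips gapped sites and is not a description of how MEGA implements these options.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned sequences (gaps skipped)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a != "-" and b != "-"]
    if not sites:
        raise ValueError("no ungapped sites to compare")
    differences = sum(1 for a, b in sites if a != b)
    return differences / len(sites)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance d = -(3/4) * ln(1 - 4p/3), defined for p < 0.75."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical aligned sequences differing at 1 of 10 sites
seq_a = "ACGTACGTAC"
seq_b = "ACGTACGTAA"
p = p_distance(seq_a, seq_b)
print(p, round(jukes_cantor(p), 4))
```

The corrected distance is always at least as large as the p-distance, and the gap between the two widens as sequences become more divergent.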
The steps for MEGA6 are described in the programme itself. The major steps are: assembling data for analysis; building a sequence alignment using MUSCLE or ClustalW; evolutionary analysis (computing basic statistical quantities for sequences, and computing evolutionary distances using different nucleotide, amino acid and synonymous/non-synonymous substitution models); constructing a phylogenetic tree using different methods (including statistical and bootstrap tests of reliability); molecular clock tests (including Tajima's relative rate test); and tests of selection (codon-based tests on synonymous/non-synonymous substitutions and Tajima's test of neutrality).
Steps for MEGA6
Download MEGA6 from the web and open it.
Go to Align at the left and choose Edit/Build alignment.
Select Create New alignment and click OK.

Select DNA for DNA sequence analysis or Protein for protein sequence analysis under Datatype for Alignment.
This opens the next window. Go to Edit and select Insert sequence from file.
Select the file (FASTA/txt file) from its source. (You can also make a txt file by pasting the sequence into Notepad and saving it, then select this file, e.g. Test.txt.)

The sequence will automatically appear in the MEGA window. You can change its name by selecting Test at the extreme left (shown under Species/abbrv).

If you don't have a reference sequence, select Web and then Do BLAST Search; this links you automatically to NCBI BLAST. If you have a reference sequence, convert it into a FASTA/.txt file by pasting it into Notepad and follow the same procedure.

Select BLAST as in NCBI BLAST. It will show the same results as NCBI BLAST.

Now select the sequences you want to align as reference sequences, then click add To Alignment.
This opens the window M6: Input Sequence Label. Here you can label each sequence separately, e.g. First word (for naming the first reference sequence, e.g. Buffalo/Bubalus bubalis), Second Word, and so on. Or you can skip this by clicking OK directly (i.e. without naming).
You can change a name in the same way as for the Test sequence.
Now click Alignment in the M6: Alignment Explorer window.

Select align by Clustal W, or select any other method. If the sequence is cDNA/mRNA, you can opt for the ----(codon) option.

Click OK when asked whether to select all. The M6: ClustalW parameters window will then open. Click OK in that window (keep the values at default).
This aligns the sequences in the M6: Alignment explorer window. You can now remove the unaligned parts by selecting them and pressing Delete on the keyboard (you can select a block as in Excel, or select unaligned parts individually). Deleting the unaligned parts from both ends gives better results.
Now open Data and select Phylogenetic Analysis from the drop-down menu.

A dialog will ask you to confirm whether the data are protein-coding nucleotide sequences. Click Yes if your sequence is protein coding (select Yes even if it contains introns).
Now come to MEGA6.06 (6140226) and select Phylogeny. Select the option Construct/Test Neighbor-Joining Tree or another method.

It will ask whether you would like to use the currently active data file. Click Yes. This opens the M6: Analysis Preferences window.
In this window you can set what kind of tree is to be generated, on what basis, and by which method. Keep a note of these preferences/options, whether left at default or modified. All fields shown as yellow strips can be changed using the options given (for example, the tree below was generated with the Tajima-Nei model under Model/Method).
Go to the Test of phylogeny option and select Bootstrap method from the drop-down list.
At the next option, the number of bootstrap replications, you can increase or decrease the value; however, the default value of 500 will be adequate. For the other options, you can keep the default values.

Then click Compute. This opens M6: Tree Explorer, which displays the tree. For details about the tree, click Caption in this window; it gives the details of the phylogeny method used. These details are needed for publication, so save them. You can change the tree type from the menu showing the tree styles (it has no title), selecting radiation or another style. To copy the tree, go to Image, then copy to clipboard, and paste the tree at the desired place, e.g. a Word file.
You can copy the content of the caption as well.

You can save the tree at a desired location. Next time you can open this file directly, which will show the tree as well as the window (MEGA6.06) with the aligned sequence files. You can estimate divergence by selecting Distance and then Compute pairwise distance.
It will ask whether to use the active file; say Yes. This again leads to M6: Analysis preference, where you can keep the default values, but make a note of them. Then click Compute.
A new window, M6: pairwise distances, will open showing the distances.

References

Breslauer, K.J., Frank, R., Blöcker, H. and Marky, L.A. 1986. Predicting DNA duplex stability from the base sequence. PNAS (USA), 83: 3746-3750. (http://www.pnas.org/content/83/11/3746).
Markoff, A., Savov, A., Vladimirov, V., Bogdanova, N., Kremensky, I. and Ganev, V. 1997. Optimization of single-strand conformation polymorphism analysis in the presence of polyethylene glycol. Clin. Chem., 43(1): 30-3. (http://www.clinchem.org/content/43/1/30.long). (A correction has been published in http://www.clinchem.org/content/43/4/692).
Rozen, S. and Skaletsky, H.J. 2000. Primer3 on the WWW for general users and for biologist programmers. In: Krawetz S, Misener S (eds) Bioinformatics Methods and Protocols: Methods in Molecular Biology. Humana Press, Totowa, NJ, pp 365-386. Source code available at http://fokker.wi.mit.edu/primer3/.
SantaLucia, J., Jr. 1998. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. PNAS, 95: 1460-1465. DOI: 10.1073/pnas.95.4.1460
Tamura, K., Stecher, G., Peterson, D., Filipski, A. and Kumar, S. 2013. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Molecular Biology and Evolution, 30: 2725-2729.
Thornton, B. and Basu, C. 2011. Real-Time PCR (qPCR) Primer Design Using Free Online Software. Biochemistry and Molecular Biology Education, 39: 145-154. DOI: 10.1002/bmb.20461
Ye, J., Coulouris, G., Zaretskaya, I., Cutcutache, I., Rozen, S. and Madden, T. 2012. Primer-BLAST: A tool to design target-specific primers for polymerase chain reaction. BMC Bioinformatics, 13(1): 134. doi:10.1186/1471-2105-13.
Zuker, M. 2003. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res., 31(13): 3406-3415. doi: 10.1093/nar/gkg595


8
Statistical Procedures for Identification of Quantitative Trait Loci
Upasna Sharma and R K Vijh
ICAR- National Bureau of Animal Genetic Resources, Karnal (Haryana)

________________________________________________________________________________________

A QTL is a region of the genome that is responsible for variation in the quantitative trait of interest. The goal of identifying all such regions that are associated with a specific complex phenotype might, at first, seem quite simple, especially with all the genomic and computational tools available to help us. Detecting a single QTL was once the motivation for many scientific investigations and was an achievable goal. Presently, the trend is towards locating multiple interacting QTL that are associated with multiple traits, using continually evolving, sophisticated statistical analyses. As more and more new technologies and methodologies are developed, we must remember that no single technological advance or statistical method will unravel the genomic mystery. Instead, it will be the conglomeration of ideas, techniques and analyses that brings this endeavour to its end. Unfortunately, the task of detecting QTL and their interactions is difficult because of the sheer number of QTL and the possible epistasis, or interactions, between QTL. To combat this, QTL experiments can be designed with the aim of containing the sources of variation to a limited number, so that dissection of a complex phenotype might be possible. In general, a large sample of individuals has to be collected to represent the total population, to provide an observable number of recombinants and to allow a thorough assessment of the trait under investigation. Using this information, coupled with one of several methodologies to detect or locate QTL, associations between quantitative traits and genetic markers are made as a step towards understanding the genetic basis of complex traits.
The first step in any QTL-mapping experiment is usually to construct populations that originate from homozygous, inbred parental lines. The resulting F1 lines will tend to be heterozygous at all markers and QTL. From the F1 population, crosses are made, and the segregation of markers and QTL is statistically modelled. In general, experimenters assume that markers are segregating randomly; if, in fact, markers are subject to segregation distortion, it is not possible to anticipate how the resulting estimates of recombination, or any potential QTL locations, will be affected. Once the data are collected on each individual, statistical associations between the markers and the quantitative trait are established through approaches that range from simple techniques, such as analysis of variance (ANOVA), to models that include multiple markers and interactions. The simpler statistical approaches tend to be methods of QTL detection that assess differences in the phenotypic means of single-marker genotypic classes. Actually locating a QTL involves an estimated genetic map with known distances between markers, and evaluation of a likelihood function that is maximized over the established parameter space.
Single-marker tests
Simple single-marker tests (for example, using the t-test, ANOVA and simple linear regression) that assess the segregation of a phenotype with respect to a marker genotype indicate which markers are associated with the quantitative trait of interest and, therefore, point to the existence of potential QTL. Typically, the null hypothesis tested is that the mean of the trait value is independent of the genotype at a particular marker. The null hypothesis is rejected when the test statistic is larger than a critical value, and the implication is that a QTL is linked to the marker under investigation. Although the t-test, ANOVA and simple linear regression approaches are all equivalent to each other when their hypotheses test for differences in the phenotypic means, they fail to provide a closed-form estimate of the QTL location, or of the recombination frequency between the marker and the QTL. This is because the QTL effect and the location are confounded, i.e. they cannot be estimated separately.
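A single-marker test of the kind described above can be sketched in a few lines. The example assumes a backcross-type population with only two genotype classes at the marker and uses the pooled-variance two-sample t statistic; the genotype codes, phenotype values and function name are hypothetical. The resulting |t| is compared with the tabulated critical value for the chosen significance level and degrees of freedom.

```python
import math
from statistics import mean, variance


def single_marker_t(genotypes, phenotypes, class_a, class_b):
    """Pooled-variance t statistic comparing trait means of two marker classes."""
    group_a = [y for g, y in zip(genotypes, phenotypes) if g == class_a]
    group_b = [y for g, y in zip(genotypes, phenotypes) if g == class_b]
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled estimate of the common within-class variance
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (mean(group_a) - mean(group_b)) / standard_error
    return t_stat, df


# Hypothetical data: marker genotype and trait value for eight individuals
genotypes = ["AA", "AA", "AA", "AA", "AB", "AB", "AB", "AB"]
phenotypes = [10, 12, 11, 13, 14, 16, 15, 17]
t_stat, df = single_marker_t(genotypes, phenotypes, "AA", "AB")
print(round(t_stat, 3), df)
```

A large |t| says only that the marker is associated with the trait; as noted above, it cannot separate a QTL of small effect close to the marker from one of large effect further away.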
Interval Mapping
Confounding, in these situations, is addressed by (incrementally) fixing the location of the QTL and estimating the QTL effect between intervals of markers. These intervals of markers lead naturally to a method that estimates both the QTL effect and the location, known as 'interval mapping'.
Detecting QTL by the single-marker approach is a simple procedure that can be accomplished with any standard statistical analysis software package, and has the potential to identify numerous significant markers. Two important issues should be considered when assessing these statistical results. The first consideration is sample size. The number of individuals studied provides the information for the estimation of phenotypic means and variances. A large sample of individuals provides the opportunity to observe recombination events and to estimate parameters with greater accuracy and, therefore, gives a greater ability to detect QTL through a single-marker test.
The second issue concerns the problem of multiple testing, which arises when many markers are investigated through independent statistical tests. This problem is coupled with the level of statistical significance that is set by the investigator and can lead to the detection of false-positive QTL. Typically, an investigator is willing to tolerate incorrectly detecting a QTL in, for example, 5% of cases. Therefore, given a 5% level of significance and 100 independent single-marker tests, about five of the 100 markers would be expected to indicate a QTL incorrectly. This problem can be accounted for through a multiple-test adjustment, such as Bonferroni or Tukey, that corrects the level of significance according to how many independent statistical tests are made. Single-marker analyses are still used as a means to identify markers that are segregating with a trait. Most of these applications deal primarily with detecting individual markers, rather than genomic regions, and are a quick and efficient means to screen large populations for specific traits, such as disease resistance.
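The arithmetic behind the multiple-testing problem is easy to reproduce. The sketch below assumes independent tests and no true QTL; it shows the expected number of false positives, the chance of at least one false positive, and the Bonferroni-adjusted per-test significance level. The function names are illustrative.

```python
def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of false positives when no QTL is present."""
    return alpha * n_tests


def familywise_error(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the overall level near alpha."""
    return alpha / n_tests


alpha, n_tests = 0.05, 100
print(expected_false_positives(alpha, n_tests))    # about 5 markers
print(round(familywise_error(alpha, n_tests), 4))  # close to 1
print(bonferroni_alpha(alpha, n_tests))            # about 0.0005 per test
```

Linked markers are not independent, so a Bonferroni adjustment based on the raw number of markers tends to be conservative in practice.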
Typically, when investigations focus on questions of genomic location, more sophisticated methods of QTL analysis, which rely on the estimated order of markers, are used. The added information that is gained from knowing the relationships between markers is essential to QTL methodologies that aim to locate QTL.
Genetic maps
Single-marker analyses investigate individual markers independently, and without reference to their position or order. When markers are placed in genetic (linear) map order, so that the relationships between markers are understood, the additional genetic information gained from knowing these relationships provides the necessary setting to address the confounding between QTL effect and location. A genetic map also provides a genetic representation of the chromosome on which the markers and QTL reside. Pairwise information, or recombination, is first estimated for all markers that are segregating as expected, and then any marker that is linked to any other marker is placed in the same linkage group. The linear arrangement of markers into linkage groups, or chromosomes, provides the genetic map for locating QTL relative to intervals of markers (or statistically related sets of markers). In addition to supplying the structure in which to search for QTL, the estimated genetic map helps in the estimation of missing marker information, by using the surrounding marker genotypes to infer the missing marker genotypes.
When using genetic maps in this way, it is important to distinguish between recombination frequency and genetic distance. The essential difference is that genetic distances are additive, whereas recombination frequencies are not, because they are probabilities and also because of genetic interference. Recombination frequencies and genetic distances can be converted into each other by using a map function (such as the Haldane and Kosambi map functions). The practical value of a genetic map is that a QTL can be mapped more easily in an interval of defined genetic distance. The methods for linearly ordering the molecular markers rely on minimizing the recombination between pairs of markers. As the estimated genetic distance between markers is a function of the average number of observed recombination events between them, minimizing these values best represents the frequency of recombination. The unit for expressing the genetic distance between markers on a chromosome is the Morgan (or, more usually, the centiMorgan, cM), defined as the distance along which one recombination event is expected to occur per gamete per generation. When several markers are considered, they are ordered simultaneously on the basis of minimized recombination, an approach called multipoint linkage mapping.
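The two map functions named above have simple closed forms. For a recombination frequency r (0 <= r < 0.5) and a distance d in Morgans, Haldane's function is d = -(1/2) ln(1 - 2r), which assumes no interference, and Kosambi's function is d = (1/4) ln((1 + 2r) / (1 - 2r)), which allows for some interference. The sketch below (function names are illustrative) converts in both directions and shows that map distances add across adjacent intervals while recombination frequencies do not.

```python
import math


def haldane_cm(r: float) -> float:
    """Haldane map distance in cM for recombination frequency r (no interference)."""
    if not 0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return -50.0 * math.log(1 - 2 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM for recombination frequency r."""
    if not 0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def haldane_r(d_cm: float) -> float:
    """Recombination frequency implied by a Haldane distance in cM."""
    return 0.5 * (1 - math.exp(-2 * d_cm / 100))


def kosambi_r(d_cm: float) -> float:
    """Recombination frequency implied by a Kosambi distance in cM."""
    return 0.5 * math.tanh(2 * d_cm / 100)


# Markers A-B and B-C each show 10% recombination
d_ab = haldane_cm(0.10)
d_ac = d_ab + d_ab                    # map distances are additive
print(round(d_ab, 2), round(kosambi_cm(0.10), 2))
print(round(haldane_r(d_ac), 4))      # about 0.18, not 0.10 + 0.10 = 0.20
```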
The accuracy of locating QTL is limited by the available information, in particular the number of recombinants, that is gained from observing the genotypic states of the markers. The observed recombinants can be limited by both small sample size and missing genotypic data. With this in mind, a commonly asked question is: 'Should I genotype more markers on fewer individuals, or score more individuals (for genotype and phenotype) on fewer markers?' Because observed recombinants provide the information, scoring more individuals addresses both of the previously mentioned concerns.
Interval mapping uses an estimated genetic map as the framework for the location of QTL. The intervals that are defined by ordered pairs of markers are searched in increments, and statistical methods are used to test whether a QTL is likely to be present at each location within the interval. It is important to realize that interval mapping, as defined by Lander and Botstein, statistically tests for a single QTL at each increment across the ordered markers in the genome. The results of the tests are expressed as LOD (logarithm of the odds) scores, which compare the evaluation of the likelihood function under the null hypothesis (no QTL) with that under the alternative hypothesis (QTL at the testing position) for the purpose of locating probable QTL. Interval mapping searches through the ordered genetic markers in a systematic, linear (also referred to as one-dimensional) fashion, testing the same null hypothesis and using the same form of likelihood at each increment. In addition, because the LOD scores taken together form a LOD profile across the genetic map, the peaks of the LOD profile have the potential to indicate multiple or 'ghost' QTL incorrectly when a single-QTL model is used.
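The LOD score described above is the base-10 logarithm of the ratio of the maximized likelihoods under the alternative and null hypotheses. The sketch below is a deliberate simplification, not full interval mapping: it evaluates the LOD only at a marker position, where every genotype is observed, so no mixture distribution is needed. Under a normal model with a common variance the LOD then reduces to (n/2) log10(RSS0 / RSS1), where RSS0 and RSS1 are the residual sums of squares around the overall mean and around the genotype-class means. The data and function name are hypothetical.

```python
import math


def lod_at_marker(genotypes, phenotypes):
    """LOD at a fully genotyped marker: (n/2) * log10(RSS_null / RSS_alt)."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)  # no-QTL model
    rss_alt = 0.0                                              # one mean per class
    for genotype in sorted(set(genotypes)):
        group = [y for g, y in zip(genotypes, phenotypes) if g == genotype]
        group_mean = sum(group) / len(group)
        rss_alt += sum((y - group_mean) ** 2 for y in group)
    if rss_null == 0:
        return 0.0        # no phenotypic variation to explain
    if rss_alt == 0:
        return math.inf   # genotype classes explain the trait perfectly
    return (n / 2) * math.log10(rss_null / rss_alt)


# Hypothetical backcross data for one marker
genotypes = ["AA", "AA", "AA", "AA", "AB", "AB", "AB", "AB"]
phenotypes = [10, 12, 11, 13, 14, 16, 15, 17]
print(round(lod_at_marker(genotypes, phenotypes), 3))
```

Between markers the QTL genotype is unknown, so interval mapping replaces the class means by a mixture weighted by the genotype probabilities given the flanking markers and repeats the calculation at each increment to build the LOD profile.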
Determining which of the many peaks indicates a single QTL leads to the issue of determining statistically significant results. Because the likelihood is usually a function of mixtures of (normal) distributions, its maximization under the null and alternative hypotheses leads to test statistics that fail to follow standard statistical distributions, so it is difficult to declare a QTL with confidence. This happens even when the previously noted issues of multiple testing are taken into consideration. Both multiple testing and the distributional assumptions of the test statistic can be accounted for through an application of resampling methodology. Nevertheless, although interval mapping is certainly more powerful than single-marker approaches for detecting QTL (because of the structure and additional genotypic information supplied by the genetic map), it is limited both by the model that defines it as a single-QTL method and by the one-dimensional search, which does not allow interactions between multiple QTL to be considered.
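One common form of the resampling approach mentioned above is a permutation test; the text does not name a specific method, so this choice is an assumption made for illustration. Trait values are shuffled among individuals to break any marker-trait association, the largest LOD across all markers is recorded for each shuffle, and an upper percentile of these maxima serves as a genome-wide threshold. The sketch uses the marker-position LOD for simplicity; the data, function names and default settings are hypothetical.

```python
import math
import random


def lod_at_marker(genotypes, phenotypes):
    """LOD at a fully genotyped marker: (n/2) * log10(RSS_null / RSS_alt)."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)
    rss_alt = 0.0
    for genotype in sorted(set(genotypes)):
        group = [y for g, y in zip(genotypes, phenotypes) if g == genotype]
        group_mean = sum(group) / len(group)
        rss_alt += sum((y - group_mean) ** 2 for y in group)
    if rss_null == 0:
        return 0.0
    if rss_alt == 0:
        return math.inf
    return (n / 2) * math.log10(rss_null / rss_alt)


def max_lod(markers, phenotypes):
    """Largest LOD over all markers (one genotype list per marker)."""
    return max(lod_at_marker(marker, phenotypes) for marker in markers)


def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=1):
    """Empirical (1 - alpha) quantile of the maximum LOD under shuffled phenotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks the link between genotype and trait
        null_maxima.append(max_lod(markers, shuffled))
    null_maxima.sort()
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return null_maxima[index]


# Hypothetical data: three markers scored on eight individuals
markers = [list("AAAAHHHH"), list("AAHHAAHH"), list("AHAHAHAH")]
phenotypes = [10, 12, 11, 13, 14, 16, 15, 17]
observed = max_lod(markers, phenotypes)
threshold = permutation_threshold(markers, phenotypes, n_perm=200)
print(round(observed, 3), observed > threshold)
```

Because the maximum over all markers is taken within each permutation, the threshold accounts for the number of tests and makes no assumption about the distribution of the test statistic.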
Multiple quantitative trait loci
Statistical approaches for locating multiple QTL aremore powerful than single QTL approaches
becausethey can potentially differentiate between linkedand/or interacting QTL.When the alleles of
two ormore QTL interact (epistasis), this has great potentialto alter the quantitative trait in a
manner that is difficultto predict. One of the most extreme (and simplest) cases is the complete loss
of trait expression inthe presence of a particular combination of alleles atmultiple QTL. The
ultimate challenge in the searchfor multiple QTL is to consider every position in thegenome
simultaneously, for the location of a potentialQTL that might act independently, be linked
toanother QTL, or interact epistatically with otherQTL. Interacting QTL are of particular interest
asthey indicate regions of the genome that might nototherwise be associated with the quantitative
traitusing a one-dimensional search.Although the concept of locating multiple, interactingQTL is
174

Molecular Genetic Characterization of Farm Animal


Genetic Resources

straightforward, implementation is quite difficult due to the tremendous number of potential
QTL and their interactions, which lead to innumerable statistical models and heavy computational
demand. One heuristic approach that has been taken is to first locate all single QTL, then to build a
statistical model with these QTL and their interactions and, finally, search in one dimension for
significant interactions. Kao et al. made such a proposal through a direct extension of interval
mapping to include a simultaneous search for multiple epistatic QTL. Owing to the computational
intensity of a multidimensional search, a simultaneous investigation is not possible,
and the search is referred to as a quasi-simultaneous investigation. Approaches like this have the
potential to work in many situations, but are limited to the pool of QTL that resulted from the
first-pass QTL analyses, and have little hope of establishing true epistatic effects for QTL that are not
individually significant. Searching through all potential models is a problem known as model
selection, and remains an active area of research in theoretical statistics.
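To give a feel for why an exhaustive search is impractical, the following minimal sketch counts candidate models under a deliberately simplified assumption: each model picks up to a fixed number of QTL from a grid of candidate positions, and each pair of chosen QTL either carries an epistatic term or does not. The function name and the counting scheme are illustrative only and are not taken from any QTL package.

```python
from math import comb


def count_qtl_models(n_positions: int, max_qtl: int) -> int:
    """Count candidate models with up to max_qtl QTL among n_positions.

    Assumption: every pair of QTL in a model may or may not interact,
    so a model with k QTL has 2 ** C(k, 2) possible interaction sets.
    """
    total = 0
    for k in range(max_qtl + 1):
        placements = comb(n_positions, k)      # which positions carry a QTL
        interaction_sets = 2 ** comb(k, 2)     # which pairs are epistatic
        total += placements * interaction_sets
    return total


if __name__ == "__main__":
    for max_qtl in (1, 2, 3, 4):
        print(max_qtl, count_qtl_models(100, max_qtl))
```

Even with only 100 candidate positions and at most three QTL, this count already exceeds one million, which is why heuristic or stochastic searches are used in practice.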
Composite interval mapping
Composite interval mapping as introduced by Zhao-Bang Zeng in 1993 and, similarly, multiple
QTL mapping as introduced by Ritsert Jansen in the same year, achieve the same result by reducing
the number of potential models under consideration. Both methods extend the ideas of interval
mapping to include additional markers as cofactors outside a defined window of analysis for
the purpose of removing the variation that is associated with other (linked) QTL in the genome. The
limitations of both approaches are that they are restricted to one-dimensional searches across the
genetic map, and are challenged at times by the multiplicity of epistatic QTL effects. There is also a
risk of putting too many markers in the model as cofactors, and care should be taken to preserve the
amount of information that is available for estimation of the QTL effect.
The importance of developing models with multiple QTL is well understood for linked QTL, and
has an even greater role in the estimation and location of epistatic QTL. The limiting feature in
successfully using multiple QTL models is not our inability to write an equation for a model; it is our
inability to identify the best model, or subset of models (from potentially millions). Enumeration of
all possible QTL models that consider the appropriate genetic architecture for the experiment, as
well as linkage and epistasis, is a daunting task. Accurate and fast simultaneous multidimensional
searches through the most likely models, and their comparisons, are required to
determine the most feasible models. Interval mapping and composite interval mapping have
benefited the mapping community, but are limited by their inability to accommodate multiple linked
QTL. Because a stepwise linear approach to model building, adding and deleting every
combination of multiple (linked) QTL and their interactions, is not computationally feasible, one
approach is to search globally for the optimum multiple QTL genotype using various genetic
algorithms. The application of genetic algorithm(s) to multiple QTL problems is one of many
beneficial approaches because it allows a sampling of the QTL models across unequal QTL numbers
to be considered, and because it can be used in conjunction with any QTL-mapping methodology
that is implemented for a multidimensional search of a genome.
The approach of Sen and Churchill breaks the QTL problem into two distinct parts: the relationship
between the QTL and the quantitative trait, and the location of the QTL. Disjoining these two
independent relationships allows the initial focus to be placed on estimation of the unknown QTL
genotypes, and then on the search for different models and their comparisons with the information gained
from completing the QTL genotype information. The power in breaking a problem into two
independent parts is not new, as it was dealt with by Jansen in 1993, and lies in the fact that
information is gained in the first part that can be used in the second part. Once the QTL genotypes
are estimated, Sen and Churchill explore all possible models using an approach that allows distinct
models of different QTL numbers to be considered. As the QTL genotypes are calculated
independently from the QTL effect and location, previous issues of epistasis and linked
QTL are eliminated because the state of the QTL genotype and QTL number are known before the
estimation of their effects and interactions. Multi-trait QTL mapping can also benefit from
the computational framework of Sen and Churchill by simply extending from a single phenotype to
multiple correlated phenotypes, and by dissecting the problem in a similar manner. The additional
information gained from knowing the covariation between multiple traits is the same as the
treatment originally detailed by Jiang and Zeng (1995), but the computational mechanics of
the solution follow the Sen and Churchill approach. Although the Sen and Churchill view has been
shown to benefit QTL mapping, it might have an even larger potential for accommodating other
types of problem and data structure.
Joint trait analysis:
Many data sets for mapping quantitative trait loci (QTL) contain observations on multiple traits or on
one or several traits in multiple environments. With such data, we can ask questions like the
following: Does a QTL have pleiotropic effects on multiple traits? Does a QTL show
genotype × environment interaction? What is the nature of the genetic correlation between different traits?
Is the correlation due to pleiotropy or linkage in certain regions of a genome? Statistically this
involves multiple trait analysis, because the expression of a trait in different environments can be
regarded as different traits or different trait states. At present, the QTL for various traits are
usually analysed separately. This approach does not take advantage of the correlated structure of the data and
has a number of disadvantages for mapping QTL and also for understanding the nature of genetic
correlations. The statistical power of hypothesis tests tends to be lower and the sampling variances
of parameter estimates tend to be higher for separate analysis. Also, it would be difficult to test a
number of biologically interesting questions involving multiple traits by analyzing different traits
separately. Different traits are correlated genetically due to pleiotropy and linkage. With
observations on a number of polymorphic genetic markers and on a number of quantitative traits, it
is possible to dissect a portion of genetic variation and co-variation among traits by localizing and
estimating the responsible QTL. It is also possible to test whether the genetic correlation is due to
pleiotropy or linkage for certain regions of a genome.
Many data sets in QTL studies contain multiple traits. These traits are often correlated genetically
and non-genetically (or environmentally). One way to analyze these data is to map QTL on each
trait separately. Alternatively and preferably, different traits are analyzed together to map QTL
affecting one or more traits by taking the correlated structure of the data into account. There are
generally three advantages to this joint analysis. First, the joint analysis may increase the statistical
power to detect QTL. Second, the joint analysis can improve the precision of parameter
estimation. Third and probably most importantly, the joint analysis provides appropriate
procedures to test a number of biologically interesting hypotheses involving multiple traits.
Single-marker regression analysis, interval mapping, composite interval mapping and joint
trait analysis will be carried out with the software "QTL Cartographer", which has been
provided along with the buffalo test data to run the analysis.
QTL Cartographer (http://statgen.ncsu.edu/qtlcart/index.php)
How to use the software Win QTL cartographer
Single-marker analysis
When to use?
For quick scanning of the entire genome (all chromosomes) to find the best possible QTLs and to identify
missing (or incorrectly formatted) data. Use single-marker analysis first to ensure your data file is
clean; then move on to more sophisticated analysis methods, such as Interval Mapping and
Composite Interval Mapping.

How it works?
Single-marker analysis is based on the idea that if there is an association between a marker
genotype and trait value, it is likely that a QTL is close to that marker locus.
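The idea can be illustrated outside WinQTLCart with a minimal sketch that regresses the trait on a single marker coded 0/1 (for example, the two genotype classes of a backcross) and skips individuals with missing data. The coding scheme, the function name and the example values are assumptions made for illustration; this is not WinQTLCart's own implementation.

```python
from math import sqrt


def single_marker_regression(genotypes, trait):
    """Regress trait values on a 0/1 marker code.

    Individuals with a missing genotype or trait (None) are dropped.
    Returns (slope, t_statistic, n_used). A large |t| suggests that a QTL
    may lie close to the marker.
    """
    pairs = [(g, y) for g, y in zip(genotypes, trait)
             if g is not None and y is not None]
    n = len(pairs)
    mean_g = sum(g for g, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((g - mean_g) ** 2 for g, _ in pairs)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in pairs)
    slope = sxy / sxx                       # difference between class means
    intercept = mean_y - slope * mean_g
    rss = sum((y - intercept - slope * g) ** 2 for g, y in pairs)
    std_error = sqrt(rss / (n - 2) / sxx)   # standard error of the slope
    return slope, slope / std_error, n


if __name__ == "__main__":
    marker = [0, 0, 0, 1, 1, 1, None]       # last animal not genotyped
    milk_yield = [10.0, 11.0, 12.0, 14.0, 15.0, 16.0, 13.0]
    print(single_marker_regression(marker, milk_yield))
```

Repeating this test marker by marker gives the quick genome scan described above; the count of individuals actually used also exposes markers with a lot of missing data.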
Comments
Single-marker analysis can be somewhat useful for a quick look at data, but it has been superseded
by Interval Mapping and Composite Interval Mapping. IM and CIM are more thorough and
accurate indicators of QTL. The prime value of WinQTLCart's single-marker analysis is its
identification of missing data that could affect later analysis.
Running a single-marker analysis
1. Open a mapping source data file (an .MCD file) into the WinQTLCart main window.
2. Select Method>Single-Marker Analysis. WinQTLCart analyzes the data and displays the single
marker analysis controls in the form pane. The information pane on the right includes the
analysis results.
3. Select a trait for display from the Trait Selection pull-down list. All the traits present in the file
will be on the list.
4. For each trait, the information pane on the right displays WinQTLCart's statistical summary of
the file. (You can view this summary in a larger window by clicking the Result button in the
Statistical Summary group box, just to the left of the information pane.)
5. In the Single Marker Analysis group box, click Result to view the analysis result for the selected
trait. You can change the font used by the display window to make the results easier to read.
Click the Save button in this group to save the marker analysis results to a text file.
6. In the Statistical Summary group box, click Result to view the summary in a larger display
window. Click the Save button to save the statistical results to a text file.
The statistical summary includes: a basic summary of the data; a histogram for the quantitative
trait; WinQTLCart's summary of missing individuals that should be present, as indicated by the data
(if markers show 0% data, there was likely an import problem); and a summary of marker
segregation (combines LR map QTL and Q stats).
7. Click the Graphic File button to save the results to a QTL mapping result file (*.QRT). You
can open this .QRT file later to view the results as a graph.
8. Click Close to end the single-marker analysis session and return to the Form View of Source
Data.
Interval Mapping
What it is?
Interval mapping (IM) is an extension of single-marker analysis. In single-marker analysis, only one
marker is used at a time in QTL mapping, so effects are underestimated and the QTL position cannot be
determined. Interval mapping provides a systematic way to scan the whole genome for evidence of
QTL. IM uses two observable flanking markers to construct an interval within which to search for
QTL. A map function (either Haldane or Kosambi) is used to translate from recombination
frequency to distance or vice versa. Then, a LOD score is calculated at each increment (walking step)
in the interval. Finally, the LOD score profile is calculated for the whole genome. When a peak
exceeds the threshold value, we declare that a QTL has been found at that location.
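The two map functions and the LOD score can be written down compactly. The sketch below uses the standard textbook forms of the Haldane and Kosambi functions (distances in centimorgans, recombination fractions between 0 and 0.5) and defines the LOD score as the base-10 logarithm of the likelihood ratio between the QTL and no-QTL models. The function names are illustrative and are not part of WinQTLCart.

```python
from math import exp, log, tanh


def haldane_r(distance_cm: float) -> float:
    """Recombination fraction from map distance (Haldane, no interference)."""
    return 0.5 * (1.0 - exp(-2.0 * distance_cm / 100.0))


def haldane_cm(r: float) -> float:
    """Map distance in cM from recombination fraction (Haldane)."""
    return -50.0 * log(1.0 - 2.0 * r)


def kosambi_r(distance_cm: float) -> float:
    """Recombination fraction from map distance (Kosambi, with interference)."""
    return 0.5 * tanh(2.0 * distance_cm / 100.0)


def kosambi_cm(r: float) -> float:
    """Map distance in cM from recombination fraction (Kosambi)."""
    return 25.0 * log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def lod_score(loglik_qtl: float, loglik_null: float) -> float:
    """LOD = log10 of the likelihood ratio, from natural-log likelihoods."""
    return (loglik_qtl - loglik_null) / log(10.0)


if __name__ == "__main__":
    for d in (5.0, 20.0, 40.0):
        print(d, round(haldane_r(d), 4), round(kosambi_r(d), 4))
```

For the same map distance the Kosambi function gives a slightly larger recombination fraction than Haldane, because it allows for crossover interference; the difference is small for closely spaced markers.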
When to use it?
IM is a good general standard to use for all datasets.
Use it in combination with, or as part of a process that includes, single-marker analysis:
you may wish to start with a single-marker analysis and then run IM to further refine the analysis.

High-level process
Here's a quick overview of how to use WinQTLCart's IM implementation. The first few times you
run this analysis, go with the WinQTLCart default values for the form's parameters. The defaults
provide the best all-around parameter settings, especially for initial analysis sessions.
1. Select the IM analysis method.
2. Select the chromosome(s) and trait(s) you want to analyze.
3. Select a threshold level to apply to the selected trait(s). Select either By manual input (the
WinQTLCart default) or By permutations (to have WinQTLCart determine an optimum
threshold). See setting the threshold level for more information on the impact of each of these
choices.
4. Click OK to start the calculations for the threshold level.
5. Following threshold calculation, set the IM form parameters. Select a walk speed in cM. It's
recommended you use the same walk speed for your entire dataset. Don't reset the walk speed
between runs or your results will not be comparable.
6. Click Start to begin the analysis.
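The "By permutations" option in step 3 can be understood through the following minimal sketch. It shuffles the trait values relative to the genotypes, records the highest test statistic across the genome for each shuffle, and takes an upper percentile of those maxima as the threshold. For brevity the scan here is a single-marker regression LOD, computed as -(n/2) log10(1 - r²), rather than a full interval-mapping scan; the function names, the 0/1 genotype coding and the defaults are assumptions, not WinQTLCart settings.

```python
import random
from math import log10


def marker_lod(genotypes, trait):
    """LOD score for the regression of the trait on one marker (0/1 codes)."""
    n = len(trait)
    mean_g = sum(genotypes) / n
    mean_y = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((y - mean_y) ** 2 for y in trait)
    if sxx == 0 or syy == 0:
        return 0.0                      # no variation, no evidence
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, trait))
    r2 = min(sxy * sxy / (sxx * syy), 1.0 - 1e-12)
    return -(n / 2.0) * log10(1.0 - r2)


def permutation_threshold(marker_matrix, trait, n_perm=1000, alpha=0.05, seed=1):
    """Genome-wide LOD threshold from permuted trait values.

    marker_matrix holds one list of genotype codes per marker.
    """
    rng = random.Random(seed)
    shuffled = list(trait)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)           # break any marker-trait association
        maxima.append(max(marker_lod(m, shuffled) for m in marker_matrix))
    maxima.sort()
    index = min(int((1.0 - alpha) * n_perm), n_perm - 1)
    return maxima[index]
```

Because every permutation repeats the whole genome scan, the run time grows with the number of permutations, which is why threshold calculation by permutation takes much longer than manual input.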
Composite Interval Mapping
What it is?
Composite interval mapping (CIM) adds background loci to simple interval mapping (IM). CIM fits
parameters for a target QTL in one interval while simultaneously fitting partial regression
coefficients for "background markers" to account for variance caused by non-target QTL. "In theory,
CIM gives more power and precision than simple IM because the effects of other QTL are not
present as residual variance. Furthermore, CIM can remove the bias that would normally be caused
by QTL that are linked to the position being tested." Background markers are usually 20-40 cM
apart.
High-level workflow
Here's a quick overview of how to use WinQTLCart's CIM implementation. The first few times you
run this analysis, go with the WinQTLCart default values for the form's parameters. The defaults
provide the best all-around parameter settings, especially for initial analysis sessions.
1. Select the CIM analysis method.
2. Select the chromosome(s) and trait(s) you want to analyze.
3. Select a threshold level to apply to the selected trait(s). Select either By manual input (the
WinQTLCart default) or By permutations (to have WinQTLCart determine an optimum
threshold). See the "Setting the threshold level" topic for more information on the impact of each
of these choices.
4. Click OK to start the calculations for the threshold level. This may take from several minutes to
several hours to run.
5. Following threshold calculation, set the CIM form parameters. Select a walk speed in cM. It's
recommended you use the same walk speed for your entire dataset. Don't reset the walk speed
between runs or your results will not be comparable.
6. Click Start to begin the analysis. The analysis may take from 20 minutes to several hours to run.
Multiple Interval Mapping
What it is?
Multiple interval mapping (MIM) uses multiple marker intervals simultaneously to fit multiple
putative QTL directly in the model for mapping QTL. The MIM model is based on Cockerham's
model for interpreting genetic parameters and the method of maximum likelihood for estimating
genetic parameters. MIM is well suited to the identification and estimation of genetic architecture
parameters, including the number, genomic positions, effects and interactions of significant QTL
and their contribution to the genetic variance.
High-level process
Here's a quick overview of how to use WinQTLCart's MIM implementation:
1. Select the MIM analysis method.
2. Pick a trait you want to work with. (MIM works with only one trait at a time.)
3. Decide if you want to create a model using WinQTLCart's default search procedures or an
alternative (such as Forward, Backward, or CIM).
4. Run the analysis to generate the model.
5. Refine the model as needed by editing individual cells in the model, adding or deleting QTL,
searching and testing QTLs or epistatics, and re-estimating. This part of the analysis can
iterate for as long as you want to search for QTLs.
6. Save the model as a .MDS file (or as a result file using the Refine Model function).
References

Jansen, R. C. and Stam, P. 1994. High resolution of quantitative traits into multiple loci via interval mapping.
Genetics 136, 1447-1455.
Jansen, R. C. 1992. A general mixture model for mapping quantitative trait loci by using molecular
markers. Theor. Appl. Genet. 85, 252.
Jansen, R. C. 1995. Genetic Mapping of Quantitative Trait Loci in Plants: a Novel Statistical Approach. Ph.D.
thesis, CIP-data Koninklijke Bibliotheek, Den Haag, The Netherlands.
Jansen, R. C. 1993. Interval mapping of multiple quantitative trait loci. Genetics 135, 205-211.
Jiang, C. and Zeng, Z.-B. 1995. Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics
140, 1111.
Kao, C. H., Zeng, Z.-B. and Teasdale, R. D. 1999. Multiple interval mapping for quantitative trait loci. Genetics
152, 1203.
QTL CARTOGRAPHER: A Reference Manual and Tutorial for QTL Mapping. 1995-2001. Department of
Statistics, North Carolina State University, Raleigh, North Carolina.
Zeng, Z.-B. 1993. Theoretical basis of precision mapping of quantitative trait loci. Proc. Natl Acad. Sci. USA
90, 10972.


9
RNA Isolation and Real time-Quantitative Polymerase Chain Reaction
Manishi Mukesh, Ankita Sharma, Kiran Thakur, Preeti Verma and Indrajit Ganguly
ICAR-National Bureau of Animal Genetic Resources, Karnal, Haryana

________________________________________________________________________________________

RNA isolation
Principle:
RNA (ribonucleic acid) is a polymeric substance present in living cells and many viruses,
consisting of a long single-stranded chain of phosphate and ribose units with the nitrogen bases
adenine, guanine, cytosine, and uracil, which are bonded to the ribose sugar. RNA is used in all the
steps of protein synthesis in all living cells and carries the genetic information for many viruses.
The isolation of high-quality RNA is a crucial step in many molecular
biology experiments. TRIzol Reagent is a ready-to-use reagent for RNA isolation from cells and
tissues. The reagent, a mono-phasic solution of phenol and guanidine isothiocyanate, is an
improvement to the single-step RNA isolation method. During sample homogenization or lysis,
TRIzol Reagent maintains the integrity of the RNA while disrupting cells and dissolving cell
components. Addition of chloroform, followed by centrifugation, separates the solution into
aqueous and organic phases. RNA remains only
in the aqueous phase. After transfer of the aqueous phase, the RNA is recovered by precipitation
with isopropyl alcohol.
The following protocol is used to isolate RNA from peripheral blood mononuclear cells (PBMC):
Note: All the steps should be done on ice and while wearing latex-free gloves.
1. Thaw the frozen cells stored in TRIzol and homogenize properly using a hand-held homogenizer
(Labgen, Cole Parmer, USA).
2. Add 1 µl linear acrylamide (Ambion, USA) per ml of TRIzol, vortex the contents and
centrifuge at 10,000 × g for 10 minutes at 4°C.
3. Transfer the supernatant into a fresh 1.5 ml tube and add 200 µl chloroform per ml of TRIzol. Mix
vigorously for 30 sec and keep at room temperature for 2-3 min. Centrifuge the contents of the
tubes again at 10,000 × g for 10 min at 4°C.
4. Gently aspirate the upper aqueous phase (containing RNA) without disturbing the
interface, and transfer it to a fresh tube.
5. For denaturation, add 600 µl acid phenol:chloroform (5:1) to the aqueous phase and centrifuge
at 13,000 × g for 15 min at 4°C.
6. Carefully transfer the separated upper aqueous phase to a fresh tube. Add 500 µl of isopropanol
and keep for 30 minutes at RT. Centrifuge the mixture at 15,000 × g for 15 min at 4°C.
7. Discard the supernatant carefully, add 1 ml of 75% ethanol to the pellet and vortex for 1 min
to wash the RNA. Centrifuge the contents at 15,000 × g for 5 min at 4°C and discard the supernatant.
8. Air dry the RNA pellet and dissolve it in 30-50 µl RNA storage solution (1 mM Na-citrate). For
quantification, measure the O.D. of the RNA using a NanoVue Plus (GE Healthcare).
Purification of RNA
To remove traces of genomic DNA, RNeasy Mini kit columns (Qiagen, Germany) were used along with
on-column digestion with RNase-free DNase (Qiagen, Germany).
Principle:
The RNeasy procedure represents a well-established technology for RNA purification. This
technology combines the selective binding properties of a silica-based membrane with the speed of
microspin technology. A specialized high-salt buffer system allows up to 100 µg of RNA longer than
200 bases to bind to the RNeasy silica membrane. Ethanol is added to provide appropriate binding
conditions, and the sample is then applied to an RNeasy Mini spin column, where the total RNA
binds to the membrane and contaminants are efficiently washed away. High-quality RNA is then
eluted in 30-100 µl water. With the RNeasy procedure, all RNA molecules longer than 200
nucleotides are purified.
Steps:
1. Adjust each sample volume to 100 µl with RNase-free water. Add 350 µl of buffer RLT and mix
well. Immediately add 250 µl ethanol (96-100%) to the diluted RNA, and mix well again by
pipetting.
2. Transfer the sample (700 µl) to an RNeasy Mini spin column placed in a 2 ml collection tube
(supplied). Close the lid gently, and centrifuge for 15 s at 8000 × g (10,000 rpm). Discard the
flow-through.
3. Add 350 µl buffer RW1 to the RNeasy spin column and centrifuge for 15 sec at 10,200 rpm to
wash the spin column membrane. Discard the flow-through carefully.
4. Add 80 µl DNase mix (10 µl DNase I + 70 µl RDD buffer) to the spin column membrane and
leave it on the benchtop for 15 min.
5. Add 350 µl of buffer RW1 to the RNeasy spin column and centrifuge for 15 sec at
10,200 rpm. Discard the flow-through.
6. To wash the spin column membrane, add 500 µl buffer RPE to the RNeasy spin column
and centrifuge for 15 sec at 10,200 rpm. After discarding the flow-through, add 500 µl buffer
RPE to the RNeasy spin column again and centrifuge for 2 min at 10,200 rpm.
7. Place the RNeasy spin column in a new 2 ml collection tube, discard the old collection tube
with the flow-through, and centrifuge at full speed for 1 min.
8. To elute the RNA, place the spin column in a new 1.5 ml collection tube, add 30 µl of RNase-free
water directly to the spin column membrane and centrifuge for 1 min at 10,200 rpm. Repeat this
step twice to obtain maximum yield.
Evaluation of RNA quality/integrity
1. Total RNA concentration and purity were measured using a NanoVue Plus (GE Healthcare). The
purity of RNA (A260/A280) for all samples was above 1.9.
2. Denaturing agarose gel electrophoresis was performed to check the integrity of all the extracted RNA.
The extracted RNA was stored at -80°C till further use.
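Spectrophotometers such as the NanoVue report these values directly, but the arithmetic behind them is simple. The sketch below assumes the widely used convention that an A260 of 1.0 corresponds to about 40 µg/ml of single-stranded RNA and applies the A260/A280 cutoff of 1.9 quoted above; the function names and the example readings are illustrative only.

```python
RNA_UG_PER_ML_PER_A260 = 40.0   # assumed conversion factor for RNA


def rna_concentration_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """RNA concentration in ng/µl (numerically equal to µg/ml)."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio; low values point to protein or phenol carry-over."""
    return a260 / a280


def passes_purity(a260: float, a280: float, cutoff: float = 1.9) -> bool:
    """True when the A260/A280 ratio is above the chosen cutoff."""
    return purity_ratio(a260, a280) > cutoff


def total_yield_ug(a260: float, elution_volume_ul: float,
                   dilution_factor: float = 1.0) -> float:
    """Total RNA recovered in µg for a given elution volume."""
    return rna_concentration_ng_per_ul(a260, dilution_factor) * elution_volume_ul / 1000.0


if __name__ == "__main__":
    print(rna_concentration_ng_per_ul(0.5), passes_purity(0.5, 0.25))
```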
Real time Quantitative Polymerase Chain Reaction
The polymerase chain reaction (PCR) is a technique in molecular biology used to amplify a
single or a few copies of a piece of DNA across several orders of magnitude, generating thousands
to millions of copies of a particular DNA sequence. PCR was developed in
1983 by the American biochemist Kary Mullis. In traditional (endpoint) PCR, detection and
quantitation of the amplified sequence are performed at the end of the reaction after the last PCR
cycle, and involve post-PCR analysis such as gel electrophoresis and image analysis. In real-time
quantitative PCR (qPCR), the amount of PCR product is measured at each cycle. This ability to
monitor the reaction during its exponential phase enables users to determine the initial amount of
target with great precision. In real-time PCR, the amount of DNA is measured after each cycle by
the use of fluorescent markers that are incorporated into the PCR product. The increase in
fluorescent signal is directly proportional to the number of PCR product molecules (amplicons)
generated in the exponential phase of the reaction. Fluorescent reporters used include
double-stranded DNA (dsDNA)-binding dyes, or dye molecules attached to PCR primers or probes that are
incorporated into the product during amplification. The change in fluorescence over the course of
the reaction is measured by an instrument that combines thermal cycling with scanning capability.
By plotting fluorescence against the cycle number, the real-time PCR instrument generates an
amplification plot that represents the accumulation of product over the duration of the entire PCR
reaction.
Overview of real-time PCR
qPCR steps
There are three major steps that make up a qPCR reaction. Reactions are generally run for 40 cycles.
1. Denaturation- The temperature should be appropriate to the polymerase chosen (usually 95°C).
The denaturation time can be increased if template GC content is high.
2. Annealing- Use appropriate temperatures based on the calculated melting temperature (Tm)
of the primers (5°C below the Tm of the primer).
3. Extension- At 70-72°C, the activity of the DNA polymerase is optimal, and primer extension
occurs at rates of up to 100 bases per second. When an amplicon in qPCR is small, this step
is often combined with the annealing step using 60°C as the temperature.
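As a rough illustration of the annealing rule in step 2, the sketch below estimates the primer Tm with the simple Wallace rule (2°C for each A or T, 4°C for each G or C) and subtracts 5°C from the lower of the two primer values. The Wallace rule is only an assumption made here for illustration; it is a crude approximation for short oligonucleotides, and primer design software normally uses nearest-neighbour calculations instead. The primer sequences in the example are hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Approximate Tm in °C using the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = primer.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count


def annealing_temperature(forward: str, reverse: str, offset: int = 5) -> int:
    """Annealing temperature set `offset` °C below the lower primer Tm."""
    return min(wallace_tm(forward), wallace_tm(reverse)) - offset


if __name__ == "__main__":
    fwd = "AGCTGACCTGAAGCTCATCC"   # hypothetical forward primer
    rev = "TGCATGGTCAGTCCAAGTTC"   # hypothetical reverse primer
    print(wallace_tm(fwd), wallace_tm(rev), annealing_temperature(fwd, rev))
```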
Real-time PCR fluorescence detection systems:
Several different fluorescence detection technologies can be used for real time PCR, and each has
specific assay design requirements. All are based on the generation of a fluorescent signal that is
proportional to the amount of PCR product formed. The three main fluorescence detection systems
are:
DNA-binding agents (e.g., SYBR Green and SYBR GreenER technologies)
Fluorescent primers (e.g., LUX Fluorogenic Primers and Amplifluor qPCR primers)
Fluorescent probes (e.g., TaqMan probes, Scorpions, Molecular Beacons)
DNA-binding dyes
The most common system for detection of amplified DNA is the use of intercalating dyes that
fluoresce when bound to dsDNA. SYBR Green I and SYBR GreenER technologies use this type
of detection method. The fluorescence of DNA-binding dyes increases significantly when they are bound to
double-stranded DNA (dsDNA), and the intensity of the fluorescent signal depends on the amount of
dsDNA that is present. As dsDNA accumulates, the dye generates a signal that is proportional to
the DNA concentration and can be detected using a real-time PCR instrument.
Probe-based detection systems
Probe-based systems provide highly sensitive and specific detection of DNA and RNA and use the
phenomenon of fluorescence resonance energy transfer (FRET). TaqMan probes require a pair of
PCR primers in addition to a probe with both a reporter dye (such as FAM, 6-carboxyfluorescein) and a
quencher dye (such as TAMRA, 6-carboxytetramethylrhodamine) attached. The probe is designed to bind
to the sequence amplified by the primers. During qPCR, the probe is cleaved by the 5′ nuclease
activity of the Taq DNA polymerase; this releases the reporter dye and generates a fluorescent
signal that increases with each cycle.
Primer-based detection systems
Primer-based fluorescence detection technologies can provide highly sensitive and specific
detection of DNA and RNA. In these systems, the fluorophore is attached to a target-specific PCR
primer that increases in fluorescence when incorporated into the PCR product during amplification.
Passive reference dyes (such as ROX dye) are frequently used in real-time PCR to normalize the
fluorescent signal of reporter dyes and correct for fluctuations in fluorescence that are non-PCR
based.

Procedure:
Gene amplification by qPCR is performed using a qPCR system [Applied Biosystems StepOnePlus (ABI,
California), LightCycler 480 (Roche)]. Each reaction in a 96-well plate comprises 10 µl of mix.
1. Thaw the cDNA samples and vortex gently; add 4 µl of cDNA sample to each of the
duplicate wells.
2. Prepare the master mix as given in Table 1, including forward and reverse primer (0.4 µl
each), nuclease-free water (0.2 µl) and the available SYBR Green (Roche or Thermo Scientific,
5 µl).
For each gene, samples are to be run in duplicate (technical replicates) along with a 6-point relative
standard curve plus the non-template control (NTC). The amplification conditions of the reactions
are: 10 min at 95°C, 40 cycles of 15 s at 95°C (denaturation) and 1 min at 60°C (annealing +
extension). A dissociation protocol with an incremental temperature of 95°C for 15 s plus 65°C for
15 s was used to investigate the specificity of the qPCR reaction and the presence of primer dimers.
Table 1. Reaction mixture for qPCR (10 µl reaction)

Sr. No.   Constituents                    Volume
1         cDNA                            4.0 µl
          Master mix
2         SYBR Green master mix (2X)      5.0 µl
3         Forward primer (100 pm)         0.4 µl (10 pm)
4         Reverse primer (100 pm)         0.4 µl (10 pm)
5         Nuclease-free water             0.2 µl
          Total volume                    10 µl
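When many wells are set up, the per-reaction volumes in Table 1 are scaled up to a single master mix. The sketch below does that arithmetic; the 10% pipetting overage and the assumption that the standard-curve points are run in duplicate like the samples are illustrative choices, not values given in this protocol.

```python
# Per-reaction volumes (µl) of the master mix components from Table 1.
PER_REACTION_UL = {
    "SYBR Green master mix (2X)": 5.0,
    "Forward primer": 0.4,
    "Reverse primer": 0.4,
    "Nuclease-free water": 0.2,
}
CDNA_UL = 4.0   # cDNA is added to each well separately


def master_mix_volumes(n_reactions: int, overage: float = 0.10) -> dict:
    """Volumes (µl) of each component for n_reactions plus a pipetting overage."""
    scale = n_reactions * (1.0 + overage)
    return {name: round(volume * scale, 2) for name, volume in PER_REACTION_UL.items()}


def wells_per_gene(n_samples: int, replicates: int = 2,
                   standard_points: int = 6, ntc_wells: int = 1) -> int:
    """Wells needed for one gene: samples and standards in replicate, plus NTC."""
    return (n_samples + standard_points) * replicates + ntc_wells


if __name__ == "__main__":
    wells = wells_per_gene(n_samples=8)
    print(wells, master_mix_volumes(wells))
```

Each well then receives 6 µl of master mix and 4 µl of cDNA to give the 10 µl reaction.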

Data Analysis:
Melting curve analysis
The specificity of a real-time PCR assay is determined by the primers and reaction conditions used.
However, there is always the possibility that even well designed primers may form primer-dimers
or amplify a nonspecific product. There is also the possibility when performing qRT-PCR that the
RNA sample contains genomic DNA, which may also be amplified. The specificity of the qPCR or
qRT-PCR reaction can be confirmed using melting curve analysis. When melting curve analysis is
not possible, additional care must be used to establish that differences observed in Ct values
between reactions are valid and not due to the presence of nonspecific products.
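Instrument software draws the melting curve automatically, but the underlying calculation can be sketched as follows: the negative first derivative of fluorescence with respect to temperature (-dF/dT) is computed, and its peaks are taken as the melting temperatures of the products. A single peak is consistent with one specific product, while extra peaks (often at lower temperature) suggest primer-dimers or nonspecific amplification. The function name and the rule of ignoring peaks below 20% of the tallest one are assumptions made for illustration.

```python
def melt_peaks(temps, fluorescence, min_fraction=0.2):
    """Return the temperatures at which -dF/dT has a local maximum.

    temps and fluorescence are equal-length lists ordered by increasing
    temperature. Peaks lower than min_fraction of the tallest peak are ignored.
    """
    midpoints, slopes = [], []
    for i in range(len(temps) - 1):
        step = temps[i + 1] - temps[i]
        midpoints.append((temps[i] + temps[i + 1]) / 2.0)
        slopes.append(-(fluorescence[i + 1] - fluorescence[i]) / step)
    tallest = max(slopes)
    peaks = []
    for i in range(1, len(slopes) - 1):
        is_local_max = slopes[i] > slopes[i - 1] and slopes[i] >= slopes[i + 1]
        if is_local_max and slopes[i] >= min_fraction * tallest:
            peaks.append(midpoints[i])
    return peaks


def is_single_product(temps, fluorescence) -> bool:
    """True when the melting curve shows exactly one peak."""
    return len(melt_peaks(temps, fluorescence)) == 1
```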

Melting curve

Normalization methods
Variation at any stage of the process will impair the ability of researchers to compare data and
will lead to erroneous conclusions if not factored out of the study. Sources of variability include the
nature and amount of starting sample, the RNA isolation process, reverse transcription, and lastly
real-time PCR amplification. Normalization is essentially the act of neutralizing the effects of
variability from these sources. While there are individual normalization strategies at each stage of
real-time PCR, some are more effective than others.
Normalizing to a reference gene: The use of a normalizer gene (also called a reference gene or
housekeeping gene) is the most thorough method of addressing almost every source of variability
in real-time PCR. However, for this method to work, the gene must be expressed at a consistent level
among all samples being compared. An effective normalizer gene controls for RNA quality and
quantity, differences in reverse transcription efficiency, and real-time PCR amplification efficiency.
If the reverse transcriptase transcribes or the DNA polymerase amplifies a target gene in two
samples at different rates, the normalizer transcript will reflect the variability.
General process
1. Viewing the amplification plots for the entire plate
2. Setting the baseline and threshold values
3. Using the methods detailed in this section to determine results.
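The baseline and threshold settings in step 2 determine the Ct value of each well. A minimal sketch of that calculation is given below: the mean fluorescence over early cycles is subtracted as the baseline, and the Ct is the fractional cycle at which the corrected signal first crosses the threshold. The default baseline window of cycles 3 to 15 and the linear interpolation between cycles are assumptions for illustration; instrument software may use different rules.

```python
def ct_value(fluorescence, threshold, baseline_cycles=(3, 15)):
    """Fractional cycle at which baseline-corrected fluorescence crosses threshold.

    fluorescence[0] is the reading for cycle 1. Returns None when the signal
    never reaches the threshold (for example, a clean non-template control).
    """
    start, end = baseline_cycles
    window = fluorescence[start - 1:end]
    baseline = sum(window) / len(window)
    corrected = [value - baseline for value in fluorescence]
    for i in range(1, len(corrected)):
        previous, current = corrected[i - 1], corrected[i]
        if previous < threshold <= current:
            # corrected[i - 1] belongs to cycle i, corrected[i] to cycle i + 1
            return i + (threshold - previous) / (current - previous)
    return None
```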
Relative Quantification
Relative quantification describes the change in expression of the target gene in a test sample relative
to a calibrator sample. The calibrator sample can be an untreated control or a sample at time zero in
a time-course study (Livak and Schmittgen, 2001). Relative quantification provides accurate
comparison between the initial levels of template in each sample.
Calculation methods for relative quantification
Relative standard curve method- Running the target and endogenous control amplifications in
separate tubes and using the relative standard curve method of analysis requires the least
amount of optimization and validation.
Comparative Ct method (ΔΔCt)- To use the comparative Ct method, a validation experiment
must be run to show that the amplification efficiencies of the target and endogenous control are
approximately equal. This method involves a double normalization, first with the endogenous control
and then with the calibrator sample.
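The relative standard curve method described above can be sketched as follows. Ct values from a dilution series are regressed on log10 of the relative quantity, the amplification efficiency is derived from the slope as 10^(-1/slope) - 1, and sample quantities are read back from the fitted line before the target is divided by the endogenous control. The function names are illustrative; a slope close to -3.32 corresponds to an efficiency of about 100%.

```python
from math import log10


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct against log10(quantity); returns (slope, intercept)."""
    xs = [log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def efficiency(slope: float) -> float:
    """Amplification efficiency; 1.0 means the product doubles every cycle."""
    return 10 ** (-1.0 / slope) - 1.0


def quantity_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Relative quantity of a sample read from the standard curve."""
    return 10 ** ((ct - intercept) / slope)


def normalized_expression(target_quantity: float, control_quantity: float) -> float:
    """Target quantity normalized to the endogenous control quantity."""
    return target_quantity / control_quantity
```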
Formula used for calculation:
Fold change = 2^(-ΔΔCt) (Livak and Schmittgen, 2001)
Steps:
1. Prepare the set up for the plate and start the run.
2. Collect the Ct values and calculate the average of duplicates for each sample..
3. Determine the Ct by subtracting the average Ct of your endogenous control from the average
of your target.
4. Determine the Ct by subtracting the Ct of your calibrator from the Ct of your test sample
or treated sample.
5. The calculate the fold change ratio with the formula- 2^ (-(Ct) .
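The steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original protocol: the function names and the Ct readings are hypothetical, and the calculation assumes that the target and endogenous control amplify with approximately equal efficiency.

```python
from statistics import mean


def delta_ct(target_cts, control_cts):
    """Delta Ct: average Ct of the target minus average Ct of the endogenous control."""
    return mean(target_cts) - mean(control_cts)


def fold_change(test_target, test_control, cal_target, cal_control):
    """Relative expression of the test sample versus the calibrator, 2^(-ddCt)."""
    ddct = delta_ct(test_target, test_control) - delta_ct(cal_target, cal_control)
    return 2 ** (-ddct)


# Hypothetical duplicate Ct readings (illustrative values only).
ratio = fold_change(
    test_target=[24.0, 24.2],   # target gene, treated sample
    test_control=[18.0, 18.2],  # endogenous control, treated sample
    cal_target=[26.1, 25.9],    # target gene, calibrator sample
    cal_control=[18.1, 17.9],   # endogenous control, calibrator sample
)
print(round(ratio, 2))
```

With these made-up readings the ΔCt values are 6.0 (test) and 8.0 (calibrator), so ΔΔCt is -2.0 and the target appears about four-fold up-regulated relative to the calibrator.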
References

Rio D.C., Ares M. Jr., Hannon G.J. and Nilsen T.W. 2010. Purification of RNA using TRIzol (TRI reagent). Cold
Spring Harb Protoc. (6). doi: 10.1101/pdb.prot5439.


10
Expression Microarray Methodology Using Agilent Whole Genome Chip
Manishi Mukesh, Ankita Sharma, Monika Sodhi
ICAR- National Bureau of Animal Genetic Resources, Karnal, Haryana

________________________________________________________________________________________
Step-1: Sample preparation

Trizol based RNA Isolation from cell/ tissue


1. Thaw samples at room temperature.
2. Homogenize the tissue (100 mg) in 1 ml ice-chilled TRIzol.
3. Centrifuge at 10,000 g for 10 minutes at 4°C.
4. Collect the supernatant in a fresh 1.5 ml tube.
5. Add 200 µl chloroform per ml of TRIzol and keep at RT for 3 min.
6. Mix properly and centrifuge as in step 3.
7. Collect the aqueous phase and add 600 µl acid phenol:chloroform.
8. Mix and centrifuge at 12,000 g for 15 min at 4°C.
9. Collect the aqueous phase and add 500 µl of isopropanol. Keep for 30 min at RT. Centrifuge at
12,000 g for 15 min and discard the supernatant.
10. Add 1 ml of 75% ethanol and vortex for 1 min to wash the RNA.
11. Centrifuge at 10,000 g for 5 min at 4°C and discard the supernatant.
12. Air-dry the pellet and dissolve in 30-50 µl RNA storage solution (1 mM Na-citrate).
13. Quantify using NanoDrop.
RNA Kit purification (using the RNeasy Kit, Qiagen)
1. Adjust the RNA sample volume to 100 µl using RNase-free water. Add 350 µl Buffer RLT, and mix.
2. Add 250 µl ethanol (96-100%) to the diluted RNA, and mix well by pipetting. Do not centrifuge.
Proceed immediately to the next step.
3. Transfer the sample (700 µl) to an RNeasy Mini spin column placed in a 2 ml collection tube.
Close the lid gently, and centrifuge for 15 sec at 10,200 rpm. Discard the flow-through.
4. Add 350 µl Buffer RW1 to the RNeasy spin column. Close the lid gently, and centrifuge for 15
sec at 10,200 rpm to wash the spin column membrane. Discard the flow-through.
5. Add another 350 µl of Buffer RW1 to the RNeasy spin column. Close the lid gently, and centrifuge
for 15 sec at 10,200 rpm. Discard the flow-through.
6. Add 500 µl Buffer RPE to the RNeasy spin column. Close the lid gently, and centrifuge for
15 sec at 10,200 rpm to wash the spin column membrane. Discard the flow-through.
7. Add another 500 µl Buffer RPE to the RNeasy spin column. Close the lid gently, and centrifuge
for 2 min at 10,200 rpm to wash the spin column membrane.
8. Place the RNeasy spin column in a new 2 ml collection tube and discard the old collection
tube with the flow-through. Close the lid gently, and centrifuge at full speed for 1 min.
9. Place the RNeasy spin column in a new 1.5 ml collection tube. Add 30 µl of RNase-free water
directly to the spin column membrane. Close the lid gently, and centrifuge for 1 min at 10,200
rpm to elute the RNA. Repeat the elution step.
10. Quantify using NanoDrop and check RNA quality using a Bioanalyzer.

Step-2: Preparation of One-Color Spike Mix


1. Equilibrate water baths to 37°C, 65°C, 80°C, 40°C and 70°C.
2. Vigorously mix the One-Color Spike Mix stock solution on a vortex mixer.
3. Heat at 37°C for 5 minutes, and mix on a vortex mixer once more.
4. Briefly spin in a centrifuge to drive contents to the bottom of the tube prior to opening.
The solution may settle on the sides or lid of the tubes during shipment and storage.
Table 1 provides the dilutions of Agilent One-Color Spike Mix for a range of total RNA
input amounts. For inputs not shown in Table 1, make sure that the amount of spike mix is
proportional to the amount of RNA input. If you start with 5 ng mRNA as the input mass,
follow the dilution scheme as described in Table 1.
Table 1. Dilutions of Agilent One-Color Spike Mix for Cyanine 3-labeling

Starting amount of RNA (ng)    Serial dilution                       Spike Mix volume to be used
Total RNA    PolyA RNA         First    Second    Third    Fourth    in each labelling reaction (µL)
10           -                 1:20     1:25      1:20     1:10      2
25           -                 1:20     1:25      1:20     1:4       2
50           -                 1:20     1:25      1:20     1:2       2
100          -                 1:20     1:25      1:20     -         2
200          -                 1:20     1:25      1:10     -         2
-            5                 1:20     1:25      1:20     -         2


For example, to prepare the Agilent One-Color Spike Mix dilution appropriate for 25 ng of
total RNA starting sample:
1. Create the First Dilution:
a. Label a new sterile 1.5 mL microcentrifuge tube "Spike Mix First Dilution."
b. Mix the thawed Spike Mix vigorously on a vortex mixer.
c. Heat at 37°C in a circulating water bath for 5 minutes.
d. Mix the Spike Mix tube vigorously again on a vortex mixer.
e. Spin briefly in a centrifuge to drive contents to the bottom of the tube.
f. Into the First Dilution tube, put 2 µL of Spike Mix stock.
g. Add 38 µL of Dilution Buffer provided in the Spike-In kit (1:20).
h. Mix thoroughly on a vortex mixer and spin down quickly to collect all of the liquid at the
bottom of the tube. This tube contains the First Dilution.
2. Create the Second Dilution:
a. Label a new sterile 1.5 mL microcentrifuge tube "Spike Mix Second Dilution."
b. Into the Second Dilution tube, put 2 µL of First Dilution.
c. Add 48 µL of Dilution Buffer (1:25).
d. Mix thoroughly on a vortex mixer and spin down quickly to collect all of the liquid at the
bottom of the tube. This tube contains the Second Dilution.
3. Create the Third Dilution:
a. Label a new sterile 1.5 mL microcentrifuge tube "Spike Mix Third Dilution."
b. Into the Third Dilution tube, put 2 µL of Second Dilution.
c. Add 38 µL of Dilution Buffer (1:20).
d. Mix thoroughly on a vortex mixer and spin down quickly to collect all the liquid at the
bottom of the tube. This tube contains the Third Dilution.
4. Create the Fourth Dilution:
a. Label a new sterile 1.5 mL microcentrifuge tube "Spike Mix Fourth Dilution."
b. Into the Fourth Dilution tube, add 10 µL of Third Dilution to 30 µL of Dilution Buffer
(1:4).
c. Mix thoroughly on a vortex mixer and spin down quickly to collect all of the liquid at the
bottom of the tube. This tube contains the Fourth Dilution (now at a 40,000-fold final
dilution).
d. Add 2 µL of Fourth Dilution to 25 ng of sample total RNA as listed in Table 1 and continue
with cyanine 3 labeling using the Agilent Low Input Quick Amp Kit protocol as described in
Step-3.
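The 40,000-fold figure quoted for the Fourth Dilution follows from multiplying the four serial dilution steps. A minimal Python sketch of that arithmetic is given here; the function name and the (transferred volume, buffer volume) representation are illustrative choices, not part of the Agilent protocol.

```python
from fractions import Fraction


def cumulative_dilution(steps):
    """Return the overall fold dilution for a series of (transferred_uL, buffer_uL) steps."""
    fold = Fraction(1)
    for transferred, buffer in steps:
        # Each step dilutes by (final volume) / (volume carried over).
        fold *= Fraction(transferred + buffer, transferred)
    return fold


# Volumes from the 25 ng total RNA example: 1:20, 1:25, 1:20 and 1:4.
steps_25ng = [(2, 38), (2, 48), (2, 38), (10, 30)]
print(cumulative_dilution(steps_25ng))  # 40000
```

The same function can be used to check any other row of Table 1 by substituting the volumes actually pipetted.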
Storage of Spike Mix dilutions
Store the Agilent RNA Spike-In Kit, One-Color at -70°C to -80°C in a non-defrosting freezer for up
to 1 year from the date of receipt. The first dilution of the Agilent One-Color Spike Mix positive
controls can be stored up to 2 months in a non-defrosting freezer at -70°C to -80°C and
freeze/thawed up to eight times. After use, discard the second, third and fourth dilution tubes.
Step-3: Prepare labeling reaction
For each assay, make sure that the volume of the total RNA sample plus diluted RNA spike-in
controls does not exceed 3.5 µL. Because the 1x reaction involves volumes of less than 1 µL, prepare
components in a master mix and divide into the individual assay tubes in volumes >1 µL. When
preparing 4 samples, use the 5x master mix. When preparing 8 samples, use the 10x master mix.
1. Add 200 ng of total RNA to a 1.5-mL microcentrifuge tube in a final volume of 1.5 µL (from
working RNA concentrations of 100 ng/µL).
2. Add 2 µL of diluted Spike Mix to each tube. Each tube now contains a total volume of 3.5 µL.
3. Prepare and add T7 Promoter Primer:
a. Mix the T7 Promoter Primer and water to prepare the T7 Promoter Primer Master Mix as
listed in Table 2.
Table 2. T7 Promoter Primer Mix

Component                          Volume (µL) per reaction
T7 Promoter Primer (green cap)     0.8
Nuclease-free water (white cap)    1
Total Volume                       1.8

b. Add 1.8 µL of T7 Promoter Primer Mix to the tube that contains 3.5 µL of total RNA and
diluted RNA spike-in controls. Each tube now contains a total volume of 5.3 µL.
c. Denature the primer and the template by incubating the reaction at 65°C in a circulating
water bath for 10 minutes.
d. Place the reactions on ice and incubate for 5 minutes.
4. Prewarm the 5X first strand buffer at 80°C for 3 to 4 minutes to ensure adequate resuspension
of the buffer components. For optimal resuspension, briefly mix on a vortex mixer and spin the
tube in a microcentrifuge to drive down the contents from the tube walls. Keep at room
temperature until needed.

5. Prepare and add cDNA Master Mix:
a. Immediately prior to use, add the components for cDNA Master Mix listed in Table 3, use a
pipette to gently mix, and keep at room temperature. The AffinityScript RNase Block mix is
a blend of enzymes. Keep the AffinityScript RNase Block mix on ice and add to the cDNA
Master Mix immediately prior to use.
Table 3. cDNA Master Mix

Component                                      Volume (µL) per reaction
5X First Strand Buffer (green cap)             2
0.1 M DTT (white cap)                          1
10 mM dNTP mix (green cap)                     0.5
AffinityScript RNase Block Mix (violet cap)    1.2
Total Volume                                   4.7

b. Briefly spin each sample tube in a microcentrifuge to drive down the contents from the tube
walls and the lid.
c. Add 4.7 µL of cDNA Master Mix to each sample tube and mix by pipetting up and down.
Each tube now contains a total volume of 10 µL.
d. Incubate samples at 40°C in a circulating water bath for 2 hours.
e. Move samples to a 70°C circulating water bath and incubate for 15 minutes.
f. Move samples to ice. Incubate for 5 minutes.
g. Spin samples briefly in a microcentrifuge to drive down tube contents from the tube walls
and lid.
Stopping Point. If you do not immediately continue to the next step, store the samples at -80°C.
6. Prepare and add Transcription Master Mix:
a. Immediately prior to use, gently mix the components listed in Table 4 in the order indicated for
the Transcription Master Mix by pipetting at RT. The T7 RNA Polymerase Blend is a blend of
enzymes. Keep the T7 RNA Polymerase Blend on ice and add it to the Transcription Master Mix
just before use.
Table 4. Transcription Master Mix

Component                            Volume (µL) per reaction
Nuclease-free water (white cap)      0.75
5X Transcription Buffer (blue cap)   3.2
0.1 M DTT (white cap)                0.6
NTP mix (blue cap)                   1
T7 RNA Polymerase Blend (red cap)    0.21
Cyanine 3-CTP                        0.24
Total Volume                         6

b. Add 6 µL of Transcription Master Mix to each sample tube. Gently mix by pipetting. Each tube
now contains a total volume of 16 µL.
c. Incubate samples in a circulating water bath at 40°C for 2 hours.
Stopping Point. If you do not immediately continue to the next step, store the samples at -80°C.
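Because several per-reaction volumes are below 1 µL, the master mixes are prepared at a multiple of the 1x recipe (5x for 4 samples, 10x for 8 samples, as stated at the start of this step). A minimal Python sketch for scaling the Transcription Master Mix is shown here. The per-reaction volumes are taken from Table 4; the dictionary and function names are illustrative assumptions.

```python
# Per-reaction volumes (µL) of the Transcription Master Mix, as listed in Table 4.
TRANSCRIPTION_MIX_UL = {
    "Nuclease-free water": 0.75,
    "5X Transcription Buffer": 3.2,
    "0.1 M DTT": 0.6,
    "NTP mix": 1.0,
    "T7 RNA Polymerase Blend": 0.21,
    "Cyanine 3-CTP": 0.24,
}


def scale_master_mix(per_reaction_ul, multiplier):
    """Multiply every component volume by the master-mix multiplier (e.g. 5 or 10)."""
    return {name: round(volume * multiplier, 2) for name, volume in per_reaction_ul.items()}


def total_volume(mix_ul):
    """Sum of all component volumes in µL, rounded to 2 decimals."""
    return round(sum(mix_ul.values()), 2)


five_x = scale_master_mix(TRANSCRIPTION_MIX_UL, 5)  # for 4 samples
for component, volume in five_x.items():
    print(f"{component}: {volume} µL")
print("Total:", total_volume(five_x), "µL")
```

The 1x recipe sums to 6 µL, which matches the volume added to each sample tube; the 5x mix therefore totals 30 µL, leaving one reaction's worth of overage for 4 samples.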


Step- 4: Purify the labeled/amplified RNA


a) Add 84 µL of nuclease-free water to your cRNA sample, for a total volume of 100 µL.
b) Add 350 µL of Buffer RLT and mix well by pipetting.
c) Add 250 µL of ethanol (96% to 100% purity) and mix thoroughly by pipetting. Do not
centrifuge.
d) Transfer the 700 µL of the cRNA sample to an RNeasy mini column in a 2 mL collection
tube. Centrifuge the sample at 4°C for 30 seconds at 13,000 rpm. Discard the flow-through
and collection tube.
e) Transfer the RNeasy column to a new collection tube and add 500 µL of buffer RPE
(containing ethanol) to the column. Centrifuge the sample at 4°C for 30 seconds at 13,000
rpm. Discard the flow-through. Re-use the collection tube.
f) Add another 500 µL of buffer RPE to the column. Centrifuge the sample at 4°C for 60
seconds at 13,000 rpm. Discard the flow-through and the collection tube.
g) If any buffer RPE remains on or near the frit of the column, transfer the RNeasy column to a
new 1.5 mL collection tube and centrifuge the sample at 4°C for 30 seconds at 13,000 rpm to
remove any remaining traces of buffer RPE. Discard this collection tube and use a fresh tube
to elute the cleaned cRNA sample.
Note: Qiagen's RNeasy mini spin columns are recommended for purification of the amplified
cRNA samples. If sample concentration causes difficulty, you can use the Stratagene
Absolutely RNA Nanoprep kit as an alternative. See "Absolutely RNA Nanoprep Purification".
h) Elute the cleaned cRNA sample by transferring the RNeasy column to a new 1.5 mL
collection tube. Add 30 µL RNase-free water directly onto the RNeasy filter membrane. Wait
60 seconds, then centrifuge at 4°C for 30 seconds at 13,000 rpm.
i) Maintain the cRNA sample-containing flow-through on ice. Discard the RNeasy column.
Step- 5: Quantify the cRNA
1. Start the NanoDrop software.
2. Click the Microarray Measurement tab.
3. Before initializing the instrument as requested by the software, clean the sample loading area
with nuclease-free water.
4. Load 1.0 to 2.0 µL of nuclease-free water to initialize. Then click OK.
5. Once the instrument has initialized, select RNA-40 as the Sample type (use the drop down
menu).
6. Make sure the Recording button is selected. If not, click Recording so that the readings can be
recorded, saved, and printed.
7. Blank the instrument by pipetting 1.0 to 2.0 µL of nuclease-free water (this can be the same water
used to initialize the instrument) and click Blank.
8. Clean the sample loading area with a laboratory wipe. Pipette 1.0 to 2.0 µL of the sample onto
the instrument sample loading area. Type the sample name in the space provided and click
Measure. Be sure to clean the sample loading area between measurements and ensure that the
baseline is always flat at 0, which is indicated by a thick black horizontal line. If the baseline
deviates from 0 and is no longer a flat horizontal line, reblank the instrument with nuclease-free
water, then remeasure the sample.

9. Record the following values:
Cyanine 3 dye concentration (pmol/µL)
RNA absorbance ratio (260 nm/280 nm)
cRNA concentration (ng/µL)
10. Determine the yield and specific activity of each reaction as follows:
a) Use the concentration of cRNA (ng/µL) to determine the µg cRNA yield as follows:
µg of cRNA = (Concentration of cRNA × 30 µL (elution volume)) / 1000
b) Use the concentrations of cRNA (ng/µL) and cyanine 3 (pmol/µL) to determine the specific
activity as follows:
pmol Cy3 per µg cRNA = (Concentration of Cy3 / Concentration of cRNA) × 1000
11. Examine the yield and specific activity results.
CAUTION: If the yield is <1.65 µg and the specific activity is <9.0 pmol Cy3 per µg cRNA do not
proceed to the hybridization step. Repeat cRNA preparation.
Table 5. Recommended Yields and Specific Activity for hybridization

Microarray format    Yield (µg)    Specific Activity (pmol Cy3 per µg cRNA)
1-pack               5             6
2-pack               3.75          6
4-pack               1.65          6
8-pack               0.825         6
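The yield and specific activity formulas of point 10 can be sketched in Python as follows. The function and variable names are hypothetical, the NanoDrop readings in the example are made up, and the check uses the recommended values of Table 5 (not the threshold quoted in the CAUTION note).

```python
# Recommended minimum yield (µg) per microarray format, from Table 5.
RECOMMENDED_YIELD_UG = {"1-pack": 5.0, "2-pack": 3.75, "4-pack": 1.65, "8-pack": 0.825}
# Recommended specific activity (pmol Cy3 per µg cRNA), from Table 5.
RECOMMENDED_SPECIFIC_ACTIVITY = 6.0


def crna_yield_ug(crna_ng_per_ul, elution_volume_ul=30.0):
    """cRNA yield in µg = concentration (ng/µL) x elution volume (µL) / 1000."""
    return crna_ng_per_ul * elution_volume_ul / 1000.0


def specific_activity(cy3_pmol_per_ul, crna_ng_per_ul):
    """Specific activity in pmol Cy3 per µg cRNA."""
    return cy3_pmol_per_ul / crna_ng_per_ul * 1000.0


def meets_recommendation(yield_ug, activity, array_format):
    """True if both yield and specific activity reach the Table 5 values."""
    return (yield_ug >= RECOMMENDED_YIELD_UG[array_format]
            and activity >= RECOMMENDED_SPECIFIC_ACTIVITY)


# Hypothetical NanoDrop readings: 120 ng/µL cRNA and 1.5 pmol/µL Cy3.
y = crna_yield_ug(120.0)
sa = specific_activity(1.5, 120.0)
print(round(y, 2), round(sa, 2), meets_recommendation(y, sa, "4-pack"))
```

For these example readings the yield is 3.6 µg and the specific activity 12.5 pmol Cy3 per µg cRNA, which would be sufficient for a 4-pack array but not for a 2-pack array.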

Step- 6: Hybridization
Prepare the 10X Blocking Agent
1. Add 500 µL of nuclease-free water to the vial containing lyophilized 10X Blocking Agent
supplied with the Agilent Gene Expression Hybridization Kit, or add 1250 µL of nuclease-free
water to the vial containing lyophilized large volume 10X Blocking Agent (Agilent p/n 5188-5281).
2. Mix by gently vortexing. If the pellet does not go into solution completely, heat the mix for 4
to 5 minutes at 37°C.
3. Drive down any material adhering to the tube walls or cap by centrifuging for 5 to 10 seconds.
10X Blocking Agent can be prepared in advance and stored at -20°C for up to 2 months. After
thawing, repeat the vortexing and centrifugation procedures before use.
Prepare hybridization samples
1. Equilibrate water bath to 60°C.
2. For each microarray, add each of the components as indicated in Table 6 below to a 1.5 mL
nuclease-free microfuge tube:

3. Mix well but gently on a vortex mixer.
For 1-pack and 2-pack microarrays, if you did not generate enough labeled cRNA, add the amount
of labeled cRNA to the fragmentation mix such that the same amount is used for each microarray
within the same experiment.
Table 6: Fragmentation mix for 4-pack or 8-pack microarray formats

Components                                    Volume/Mass (4-pack microarrays)
Cyanine 3-labeled, linearly amplified cRNA    1.65 µg
10X Blocking Agent                            11 µL
Nuclease-free water                           bring volume to 52.8 µL
25X Fragmentation Buffer                      2.2 µL
Total Volume                                  55 µL

CAUTION: Do not incubate sample in the next step for more than 30 minutes. Cooling on ice and
adding the 2x Hybridization Buffer will stop the fragmentation reaction.
4. Incubate at 60°C for exactly 30 minutes to fragment RNA.
5. Immediately cool on ice for one minute.
6. Add 2x GEx Hybridization Buffer HI-RPM to the 4-pack microarray format at the appropriate
volume to stop the fragmentation reaction as mentioned in Table 7.
Table 7: Hybridization mix

Components                             Volume per hybridization for 4-pack
cRNA from Fragmentation Mix            55 µl
2x GEx Hybridization Buffer HI-RPM     55 µl

7. Mix well by careful pipetting. Take care to avoid introducing bubbles. Do not mix on a vortex
mixer; mixing on a vortex mixer introduces bubbles.
8. Spin for 1 minute at room temperature at 13,000 rpm in a microcentrifuge to drive the sample
off the walls and lid and to aid in bubble reduction.
9. Use immediately. Do not store.
10. Place sample on ice and load onto the array as soon as possible.
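The volumes for the 4-pack fragmentation mix depend on the measured cRNA concentration. The following Python sketch works them out under stated assumptions: the function name is hypothetical, and the 11 µL of 10X Blocking Agent is an assumed value (a 10X stock diluted to 1X in the 110 µL hybridization volume) rather than a figure confirmed in this text.

```python
def fragmentation_mix_4pack(crna_ng_per_ul, crna_mass_ug=1.65,
                            blocking_agent_ul=11.0,    # assumed: 10X stock in 110 µL final
                            volume_before_buffer_ul=52.8,
                            fragmentation_buffer_ul=2.2):
    """Return component volumes (µL) for one 4-pack fragmentation reaction."""
    crna_ul = crna_mass_ug * 1000.0 / crna_ng_per_ul
    water_ul = volume_before_buffer_ul - blocking_agent_ul - crna_ul
    if water_ul < 0:
        # The cRNA is too dilute to fit 1.65 µg into the available volume.
        raise ValueError("cRNA concentration too low for this reaction volume")
    return {
        "Labeled cRNA": round(crna_ul, 2),
        "10X Blocking Agent": blocking_agent_ul,
        "Nuclease-free water": round(water_ul, 2),
        "25X Fragmentation Buffer": fragmentation_buffer_ul,
        "Total Volume": round(volume_before_buffer_ul + fragmentation_buffer_ul, 2),
    }


# Hypothetical cRNA concentration of 150 ng/µL.
for component, volume in fragmentation_mix_4pack(150.0).items():
    print(f"{component}: {volume} µL")
```

At 150 ng/µL, 11 µL of cRNA supplies the 1.65 µg required, so 30.8 µL of water brings the volume to 52.8 µL before the fragmentation buffer is added.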
Prepare the hybridization assembly
1. Load a clean gasket slide into the Agilent SureHyb chamber base with the label facing up and
aligned with the rectangular section of the chamber base. Ensure that the gasket slide is flush
with the chamber base and is not ajar.
2. Slowly dispense the volume of hybridization sample (see Table 8) onto the gasket well in a
"drag and dispense" manner.
Table 8: Hybridization Sample

Components             Volumes per hybridization (4-pack)
Volume Prepared        110 µL
Volume to Hybridize    110 µL

3. Slowly place an array active side down onto the SureHyb gasket slide, so that the
Agilent-labeled barcode is facing down and the numeric barcode is facing up. Make sure
the sandwich-pair is properly aligned.
4. Place the SureHyb chamber cover onto the sandwiched slides and slide the clamp assembly
onto both pieces.

5. Hand-tighten the clamp onto the chamber.


6. Vertically rotate the assembled chamber to wet the gasket and assess the mobility of the
bubbles. If necessary, tap the assembly on a hard surface to move stationary bubbles.
7. Place assembled slide chamber in rotisserie in a hybridization oven set to 65°C. Set your
hybridization rotator to rotate at 10 rpm when using 2x GEx Hybridization Buffer HI-RPM.
8. Hybridize at 65°C for 17 hours.
Prewarm Gene Expression Wash Buffer 2
Warm the Gene Expression Wash Buffer 2 to 37°C as follows:
1. Dispense 1000 mL of Gene Expression Wash Buffer 2 directly into a sterile 1000-mL bottle.
Repeat until you have enough prewarmed Wash Buffer 2 solution for your experiment.
2. Tightly cap the 1000-mL bottle and place in a 37°C water bath the night before washing
arrays. Alternatively, remove the plastic cubitainer from the box and place it in a 37°C water
bath the night before washing the arrays.
Microarray Wash
1. Add Triton X-102 (0.005%) to Gene Expression Wash Buffers 1 and 2.
2. Wash Buffer 2 should be prewarmed overnight at 37°C.
3. Wash the slide with Wash Buffer 1 and Wash Buffer 2 as listed in Table 9.
Table 9: Wash conditions

Step           Dish    Wash Buffer         Temperature             Time
Disassembly    1       GE Wash Buffer 1    Room temperature        -
1st wash       2       GE Wash Buffer 1    Room temperature        1 minute
2nd wash       3       GE Wash Buffer 2    Elevated temperature    5 minutes

4. Use Ozone barrier slide cover to dry the slide


Step- 7: Scanning of Slides and analysis of data
1. Scan the slide immediately using a GenePix 4000B (Molecular Devices) or Agilent scanner.
2. Carry out feature extraction and microarray image analysis.
3. After loading the images, load the grid that references your image. This is the GAL
(Gene Array List) format file.
4. At the end of the spot validation, launch the analysis using feature extraction software, which
converts the pixels into digital intensity values.
The data are then ready for normalization and analysis using microarray analysis software.


11
Genotype and Phenotype Association Studies in Livestock
S P Dixit, Anurodh Sharma and Jayakumar Sivalingam
ICAR- National Bureau of Animal Genetic Resources, Karnal (Haryana)

________________________________________________________________________________________

Input file preparation: The genotype information available in an Excel sheet can be imported
directly into the SAS software and used as an input file.
Gene and genotype frequencies in animals: The gene and genotype frequencies can be estimated
by using the SAS software.
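As an illustration of the counting behind these estimates (independent of SAS), a minimal Python sketch is given here. It assumes biallelic SNP genotypes coded as two-letter strings such as "AA", "AG" and "GG"; the function names and the sample data are hypothetical.

```python
from collections import Counter


def genotype_frequencies(genotypes):
    """Proportion of animals carrying each genotype."""
    counts = Counter(genotypes)
    n = len(genotypes)
    return {genotype: count / n for genotype, count in counts.items()}


def allele_frequencies(genotypes):
    """Proportion of each allele; every animal contributes two alleles."""
    alleles = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(alleles.values())
    return {allele: count / total for allele, count in alleles.items()}


# Hypothetical genotypes of eight animals at one SNP.
sample = ["AA", "AG", "AG", "GG", "AA", "AG", "AA", "AA"]
print(genotype_frequencies(sample))
print(allele_frequencies(sample))
```

In this made-up sample the genotype frequencies are 0.5 (AA), 0.375 (AG) and 0.125 (GG), and the allele frequencies are 0.6875 (A) and 0.3125 (G).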
Genotype and association: Statistical analysis can be carried out using PROC GLM of SAS version
9.3 to test the association between the genotypes of the polymorphic SNPs of the genes studied
and the traits of interest. Duncan's Multiple Range Test (DMRT) as modified by Kramer (1957) can
be used for testing the differences among least-squares means.
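/* dataset, genotype, breed, trait1 and trait2 are placeholder names */
proc glm data=dataset;
class genotype breed; /* fixed effects */
model trait1 trait2 = genotype breed / solution; /* dependent variables = fixed effects */
lsmeans genotype breed; /* least-squares means of the fixed effects */
manova h=_all_ / printe printh;
run;
quit;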
Haplotype construction and association: The linkage disequilibrium analysis can be carried out
using PROC ALLELE of SAS software, version 9.3 (2011). The SNPs found to be in linkage
disequilibrium are then used for the construction of haplotypes using Arlequin version 3.0
software (Excoffier et al 2005). The association between haplotypes and the trait of interest can
then be studied using SAS version 9.3. The commands used for the genotype and association
study may also be used for the haplotype and association study, by using the constructed
haplotypes instead of the genotypes.
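To make the idea of linkage disequilibrium concrete, the following Python sketch computes the standard pairwise measures D and r² for two biallelic SNPs. It is an illustration only and is not a description of what PROC ALLELE or Arlequin do internally: it assumes phased haplotype counts are already available, and the function name, dictionary keys and counts are hypothetical.

```python
def linkage_disequilibrium(haplotype_counts):
    """Return (D, r2) for two biallelic loci.

    haplotype_counts maps the four haplotypes 'AB', 'Ab', 'aB', 'ab'
    (alleles A/a at locus 1, B/b at locus 2) to their observed counts.
    """
    n = sum(haplotype_counts.values())
    p_ab = haplotype_counts["AB"] / n                            # frequency of haplotype AB
    p_a = (haplotype_counts["AB"] + haplotype_counts["Ab"]) / n  # frequency of allele A
    p_b = (haplotype_counts["AB"] + haplotype_counts["aB"]) / n  # frequency of allele B
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r2 is undefined when a locus is monomorphic")
    d = p_ab - p_a * p_b
    return d, d * d / denominator


# Hypothetical haplotype counts from 100 chromosomes.
d, r2 = linkage_disequilibrium({"AB": 40, "Ab": 10, "aB": 10, "ab": 40})
print(round(d, 3), round(r2, 3))
```

With these counts D is 0.15 and r² is 0.36; values of r² close to 1 indicate SNP pairs in strong linkage disequilibrium, which are the candidates for haplotype construction.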
Whole genome association study: The whole genome association study can be carried out using
the genotype information of the SNPs and the trait of interest using the PLINK software.
